Why manually collected EPOS data is *always* wrong
OK, so it isn't always wrong; forgive me a little poetic licence. It is, however, usually wrong if you collect and store it day by day.
Yes, really. Let me explain why...
Working in the consumer goods sector, we (Atheon Analytics) deal with a broad spectrum of companies from global brands to local producers, yet size, age and apparent sophistication are no predictor of attitudes to data. We are just as likely to find a purposeful data-driven regional supplier with a strong and detailed understanding of its trading performance as we are to find a household name with account managers and demand planners who poke around in data irregularly, producing questionable reports for managers unable to articulate which products are best-sellers and how available they are to shoppers.
One common challenge for the thousands of businesses that supply products to the supermarkets is how to collect and manage sales data (let alone the full flow-of-goods data set). Sales data is provided freely by many of the biggest supermarkets on a daily basis in the form of "EPOS" (Electronic Point of Sale - till data, in simple terms).
So if supermarket suppliers can access daily sales data (and the broader flow-of-goods set from some) why aren't they rushing to do so? Our experience in this specific area over the past five years suggests that fewer than 5% of supermarket suppliers collect and use daily EPOS data. Why?
The first challenge is just collecting data. For most companies this is a manual activity which involves an employee logging onto a 'supplier portal' (supermarket website for suppliers), navigating to the reporting system (different in features, appearance and output for each supermarket) and requesting one or more reports. In some cases reports are run in real-time and so the user needs to wait for the results, whereas in others a notification is issued when the data is available to download.
Unfortunately, not all of the supermarket portals play nicely. Some are incredibly slow on Mondays, with data only available late in the day after multiple attempts. Some have rather rocky reputations, with regular login and access problems. Some are famed for advising, after the fact, that (some of the) data received was in fact wrong. The result is that over time, users learn the idiosyncrasies, oddities and quirks of each retailer system and find workarounds and, in the main, decide that daily data collection is too much like hard work.
For all but the most dedicated of data users, collecting data at the weekend (and whilst on holiday) is definitely too much like hard work and rightly so. As a result, data gaps - a day here, a day there - appear and (without appropriate tools) can be difficult to track down and correct. So weekly reports are run, or end-of-month summaries, which are fine for "look back reports" but not enough for actionable analytics-driven interventions - like tackling availability dips before they result in material lost sales.
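Tracking down those gaps need not be difficult. As a minimal sketch, assuming you keep a record of the dates for which a file was successfully collected (the function name and sample dates here are purely illustrative), a few lines of Python will list the missing days:

```python
from datetime import date, timedelta

def missing_days(collected, start, end):
    """Return every date in [start, end] for which no data was collected."""
    have = set(collected)
    span = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(span))
    return [d for d in all_days if d not in have]

# Illustrative record of collected days: nothing for the 3rd, 5th or 6th.
collected = [date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 4), date(2024, 3, 7)]
gaps = missing_days(collected, date(2024, 3, 1), date(2024, 3, 7))
print(gaps)
```

Whether a missing day can still be re-collected depends on how much history the retailer portal retains, so a check like this is best run daily rather than at month end.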
Even a complete daily data set - one where, definitively, all days are represented - is prone to inaccuracies. Some of these risks are fairly obvious, some less so:
- Have you checked that all products are present on every day? Perhaps there were no sales that day for every missing product, but this is unlikely on well-distributed products (unless entirely out of stock), so do you re-collect data the next day?
- How are you handling on/off promo? Some retailer systems will report two data rows per product - one for sales on promotion, one for sales off promotion (where a promotion is not active in all stores) - do you know how to interpret these and check for total sales?
- Have you checked for empty data columns? Sometimes one or more columns (sales value, sales volume, availability, lost sales etc.) may be missing entirely or reported as zero - do you have data quality checks in place to trap and flag these?
- How do you handle changing data values? Yes, even when you receive a complete data file (all products, no nulls etc.) it's possible that the data present isn't a 'full EPOS read', i.e. some stores didn't contribute their numbers, or waste figures are recalculated retrospectively. For some retail systems it's a virtual certainty that sales numbers for a given day will change slightly for the next 3-4 days after they are first reported. Do you collect multiple days of data each time, to correct for these subtle changes?
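The first three checks above can be sketched in a few lines. This is an illustration only: it assumes each report row has been parsed into a dictionary, and the column names (`product`, `on_promo`, `sales_value`, `sales_volume`) are hypothetical, since every retailer labels its reports differently.

```python
from collections import defaultdict

MEASURES = ("sales_value", "sales_volume")  # assumed column names

def missing_products(rows, expected_codes):
    """Products we expected to see but which have no row in this file."""
    seen = {r["product"] for r in rows}
    return sorted(set(expected_codes) - seen)

def total_by_product(rows):
    """Add on-promo and off-promo rows together to give one total per product."""
    totals = defaultdict(lambda: dict.fromkeys(MEASURES, 0.0))
    for r in rows:
        for m in MEASURES:
            totals[r["product"]][m] += r[m] or 0.0
    return dict(totals)

def empty_columns(rows, columns=MEASURES):
    """Columns that are absent, null or zero on every row of the file."""
    return [c for c in columns if all(not r.get(c) for r in rows)]

# Illustrative day file: product A1 is split across on/off promo rows.
rows = [
    {"product": "A1", "on_promo": True, "sales_value": 30.0, "sales_volume": 20},
    {"product": "A1", "on_promo": False, "sales_value": 20.0, "sales_volume": 10},
    {"product": "B2", "on_promo": False, "sales_value": 12.5, "sales_volume": 5},
]
print(missing_products(rows, ["A1", "B2", "C3"]))
print(total_by_product(rows)["A1"])
print(empty_columns(rows, ("sales_value", "availability")))
```

A flagged product or column is a prompt to re-collect the file, not proof of an error: a product may genuinely have had no sales that day.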
It quickly becomes apparent that a full and accurate EPOS read is a non-trivial undertaking. It's why automated EPOS (and flow-of-goods) data services *can* produce slightly different numbers than those that appear in manual reports; their automated nature allows them to address all of the above (and a host of other data quality checks) to re-collect questionable data, double-check against the original and amend recent history where appropriate.
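To illustrate the "amend recent history" step, here is a minimal sketch, not a description of how any particular service works. It assumes stored history is keyed by day and product, and that each collection run re-reads a rolling window of recent days; the four-day default is an assumption drawn from the 3-4 days mentioned above, and all names are hypothetical.

```python
from datetime import date, timedelta

def recollection_window(latest_day, days_back=4):
    """Days to re-read on each run: the latest day plus the days_back before it."""
    return [latest_day - timedelta(days=i) for i in range(days_back, -1, -1)]

def apply_reread(history, reread):
    """Overwrite stored values with the newer read; return what changed."""
    changes = []
    for key, new in reread.items():
        old = history.get(key)
        if old != new:
            changes.append((key, old, new))
            history[key] = new
    return changes

# Stored sales value per (day, product), followed by a fresh read of the same days.
history = {(date(2024, 3, 4), "A1"): 100.0, (date(2024, 3, 5), "A1"): 80.0}
reread = {
    (date(2024, 3, 4), "A1"): 103.5,  # restated by the retailer
    (date(2024, 3, 5), "A1"): 80.0,   # unchanged
    (date(2024, 3, 6), "A1"): 95.0,   # new day
}
changes = apply_reread(history, reread)
print(recollection_window(date(2024, 3, 6)))
print(changes)
```

Keeping the list of changes, rather than silently overwriting, makes it possible to explain later why a report run on Tuesday no longer matches one run on Monday.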
Finally, if you get all of the above right, there is still the challenge of transcribing EPOS data recorded against retailer product codes (yes, UK supermarkets still insist on reporting using proprietary product codes, despite the prevalence of GTINs) into supplier-usable forms. There are typically two elements to this:
- Converting from retailer code to supplier code (and/or using GTIN)
- Organising products into supplier-recognised categories (and enriching with other attributes)
This is made more complex by the many-to-many nature of product coding - it is perfectly possible for two retailer codes to refer to the same supplier product (for example, where retailers create additional codes for promotional variants), and for a retailer to use one code to describe two or more supplier products (for example, where the supplier makes a subtle packaging change).
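One way to picture the many-to-many relationship is as a list of links rather than a simple lookup table. The sketch below uses invented retailer and product codes and assumes nothing about any real retailer's coding scheme; it builds an index in both directions and flags the retailer codes that cannot be translated without further information.

```python
from collections import defaultdict

# Hypothetical links: (retailer, retailer code, supplier code).
LINKS = [
    ("RetailerX", "R100", "SUP-1"),
    ("RetailerX", "R100P", "SUP-1"),  # promotional variant of the same product
    ("RetailerX", "R200", "SUP-2"),
    ("RetailerX", "R200", "SUP-2B"),  # same retailer code after a packaging change
]

def build_index(links):
    """Index the links in both directions, keeping every match."""
    to_supplier = defaultdict(set)
    to_retailer = defaultdict(set)
    for retailer, rcode, scode in links:
        to_supplier[(retailer, rcode)].add(scode)
        to_retailer[(retailer, scode)].add(rcode)
    return to_supplier, to_retailer

def ambiguous_codes(to_supplier):
    """Retailer codes that map to more than one supplier product."""
    return sorted(key for key, codes in to_supplier.items() if len(codes) > 1)

to_supplier, to_retailer = build_index(LINKS)
print(to_retailer[("RetailerX", "SUP-1")])
print(ambiguous_codes(to_supplier))
```

Resolving an ambiguous code needs extra information that the links alone do not carry, such as the date on which the new packaging replaced the old.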
The product code challenge continues to exist whether data is collected manually or automatically - and whether gathered daily, weekly or monthly - but some automated flow-of-goods data management services offer product code translation and auto-categorisation capabilities. In these cases, the supplier can retrieve a fully encoded EPOS data file on a rolling basis, or even connect directly to a fully-managed Demand Signal Repository which is always up-to-date and available for live query in supplier systems.
Modern automated data collection tools and services provide the best way to tackle the perils of daily EPOS data collection, but it is possible to 'roll your own' if you have the requisite software development skills and a thorough knowledge of the idiosyncrasies of the relevant supermarket systems. The four key elements are:
- Know the source system - be aware of how to deal with system outages and re-collection of slightly-changing data
- Automate data collection - but make sure you use a light touch so as not to unwittingly overload the system in question
- Apply layered data quality checks - test and check all data received, and be prepared to re-collect when issues arise
- Design and manage appropriate master data lookup - ensure that you maintain an accurate cross index between your product codes and those for each of your grocery customers
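As a rough sketch of the "light touch" idea in the second point, the loop below pauses between requests and backs off when a request fails, instead of hammering a struggling portal. Everything here is an assumption made for illustration: `fetch` stands in for whatever code actually downloads one day's report, failures are assumed to surface as `OSError` (which includes `ConnectionError`), and the pause and retry settings are arbitrary.

```python
import time

def collect_with_backoff(fetch, days, pause=2.0, max_attempts=4, sleep=time.sleep):
    """Fetch one report per day, politely: pause between requests and
    double the wait after each failure before trying again."""
    results, failed = {}, []
    for day in days:
        delay = pause
        for attempt in range(1, max_attempts + 1):
            try:
                results[day] = fetch(day)
                break
            except OSError:
                if attempt == max_attempts:
                    failed.append(day)  # give up; re-collect on a later run
                else:
                    sleep(delay)
                    delay *= 2
        sleep(pause)  # breathing space before the next request
    return results, failed

# Stand-in for a real portal: one day fails twice, another never works.
fail_times = {"2024-03-04": 2, "2024-03-05": 99}
attempts = {}

def flaky_fetch(day):
    attempts[day] = attempts.get(day, 0) + 1
    if attempts[day] <= fail_times.get(day, 0):
        raise ConnectionError(f"portal unavailable for {day}")
    return f"report for {day}"

waits = []  # record the waits instead of really sleeping
results, failed = collect_with_backoff(
    flaky_fetch, ["2024-03-03", "2024-03-04", "2024-03-05"], sleep=waits.append
)
print(sorted(results), failed, waits)
```

Days that end up in `failed` are exactly the gaps that a later run needs to fill, which is where the first and third points come back in.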
If that all seems too much to worry about then look for an EPOS - or better still, a full flow-of-goods - data and/or analytics service which can address the problem for you.
[Hint: you might want to take a look at SKUtrak®]
Guy started his career as a software engineer, and after progressing through technical and commercial roles founded Atheon Analytics in 2005, currently performing the CEO and CTO roles.