Data discrepancy: How do you end it?
The impossible solution to discrepancy
For a long time I have been faced, as most of you probably have too, with the problem of data discrepancy. Not discrepancy in general, as that is really hard to avoid, but discrepancy on one of your key metrics, be it Orders, Clicks, or any other KPI.
To be honest with ourselves, we know that having no discrepancy between what happens in “reality” and what our tools measure is an almost impossible task. You may work in a department that is “famous” for its discrepancy while, at the same time, another department is famous for holding the real data, the truth.
You know that feeling:
“Thanks for your analysis. I will now ask the people from the “X&Y” department to do the same … you know… to make sure…. ”
To be fair, if I had the time and a good connection with the “X&Y” department, I would probably do the same. More light never made anything darker… (It actually can… but that is not the point 😉).
Everyone has discrepancy
However, I have a scoop for you: they probably also have discrepancy in their data! Yes, even them!
I know it may hurt that other departments trust them more, but there are probably reasons for that, whether it be:
- Historical reasons (the famous “they have always provided us with the data”)
- Political reasons (Head of the department is in the COMEX)
- They actually have better data
How did they achieve that?
They are actually good at one particular thing: measuring the data and validating it. If your department consists of 2-3 people and you do everything (data retrieval, data management, data extraction, data analysis, pipelines for other departments, training), there is not much you can do… sorry… come back when you have a real team ;).
When you have more colleagues, you can devote more time to actually proving the data that you are retrieving. You run analyses and check whether every combination you run makes sense.
You actually bulletproof your data for any particular analysis, so you can explain any behavior that may occur in it. This type of checking is painful, time-consuming and often underestimated, as you would rather retrieve the data than really check it. You are right to say that we can always clean it afterwards, but that only holds when you are sure that you have not missed any of it.
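To make this kind of checking concrete, here is a minimal sketch of one possible check: comparing the order IDs held by the "truth" department with the ones your tool recorded. The function name, the field names and the sample IDs are all made up for illustration, and your own validation will certainly need more than this.

```python
def reconcile_orders(backend_ids, tool_ids):
    """Compare order IDs from the source of truth with those seen by the tool."""
    backend, tool = set(backend_ids), set(tool_ids)
    missing = backend - tool      # orders the tool never recorded
    unexpected = tool - backend   # orders only the tool knows about
    # Share of real orders that the tool missed (0.0 if there are no orders).
    rate = len(missing) / len(backend) if backend else 0.0
    return {
        "missing": sorted(missing),
        "unexpected": sorted(unexpected),
        "discrepancy_rate": rate,
    }


# Hypothetical sample data: the backend has 4 orders, the tool saw 3 of them
# plus one order ID that the backend does not know.
report = reconcile_orders(["A1", "A2", "A3", "A4"], ["A1", "A2", "A4", "Z9"])
print(report)
```

Looking at the individual missing IDs, and not only at the rate, is what lets you explain where the discrepancy comes from.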
Reducing the discrepancy
The secret to reducing discrepancy is to get as close as possible to the machine. Always try to reduce the number of layers that exist between the solution (application, website) and your tool.
As an example for a website, take the order discrepancy in online web analytics tools. I have worked with many tools and all of them face this issue. It is not about the tool but about the way you retrieve the data: however good your website integration is, there are always users who will not fire the tags you have placed there.
It may be that the user left the page before the tag fired, or that the page had an issue and the tag could not fire, or even that the user never reached the final page because of some error.
The way to resolve this issue is to integrate your tag directly on the server, so that whenever the server code runs, it sends this information directly to your tool (Google Analytics, Adobe, AT Internet, etc.).
The discrepancy will be greatly reduced through this technique.
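A rough sketch of what such a server-side call could look like, using only the Python standard library. The endpoint URL and the payload field names are placeholders I made up; every tool (Google Analytics, Adobe, AT Internet) documents its own collection endpoint and parameters, so treat this as an illustration of the idea and not as any vendor's API.

```python
import json
import urllib.request

# Hypothetical collection endpoint: replace with the one your tool documents.
COLLECT_URL = "https://collect.example.com/hit"


def build_order_hit(visitor_id, order_id, revenue):
    """Build (but do not send) a POST request describing one order."""
    payload = {
        "visitor_id": visitor_id,  # assumed name for the shared primary key
        "event": "order",
        "order_id": order_id,
        "revenue": revenue,
    }
    return urllib.request.Request(
        COLLECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_order_hit(request, timeout=2.0):
    """Send the hit; a tracking failure must never break the order itself."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, HTTPError and timeouts
        return False


hit = build_order_hit("visitor-123", "A1", 49.90)
```

The server would call `send_order_hit(hit)` right after the order is saved, so the hit no longer depends on the user's browser reaching the final page.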
Server-side tracking – Checklist
When implementing server-side tracking, you need to pay extra attention to specific details:
- Be sure to pass the primary key that is used in the rest of your tracking (a user ID of some sort, the MCID/ECID for Adobe Analytics)
- Be sure to pass enough knowledge to the IT person who implements the tracking. That means spending a lot of time with him/her to explain the consequences of their actions
- Go through the testing phase with the QA team (don’t leave them on their own: you know best what needs to be set)
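During the testing phase, a small check like the one below can help the QA team spot hits that would arrive without the primary key. This is only a sketch: the field names are hypothetical and should be replaced by whatever your tracking plan actually defines.

```python
# Hypothetical required fields; use the names from your own tracking plan.
REQUIRED_FIELDS = ("visitor_id", "order_id")


def missing_fields(payload, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a hit payload."""
    return [name for name in required if not payload.get(name)]


# An empty visitor ID is as useless as a missing one for joining the data.
problems = missing_fields({"visitor_id": "", "order_id": "A1"})
print(problems)
```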
In the end, you will lose most of the control you have over this specific metric but will gain better consistency in your reports.
##tip : send a POST request, as the request body is not bound by the URL length limit (commonly around 2048 characters) that applies to GET requests. Your analytics tool may still cap the payload size, so check its documentation.
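To see why this matters, here is a small sketch that measures how long the URL of a GET hit would become. The 2048 figure is the commonly cited practical limit and not something every client or server enforces, and the endpoint and parameter names are invented for the example.

```python
from urllib.parse import urlencode

URL_LIMIT = 2048  # commonly cited practical limit; an assumption, not a standard


def get_url_length(endpoint, params):
    """Length of the full URL if the parameters were sent with GET."""
    return len(f"{endpoint}?{urlencode(params)}")


def pick_method(endpoint, params, limit=URL_LIMIT):
    """Use GET only when the whole URL fits under the limit, otherwise POST."""
    return "GET" if get_url_length(endpoint, params) <= limit else "POST"


endpoint = "https://collect.example.com/hit"  # hypothetical endpoint
small = {"order_id": "A1"}
# An order with 500 products easily pushes the query string past the limit.
large = {"order_id": "A1", "products": ",".join(f"sku{i}" for i in range(500))}

print(pick_method(endpoint, small), pick_method(endpoint, large))
```

In practice it is simpler to always send POST from the server, which is what the tip recommends; the check above only illustrates where GET would start to truncate or fail.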