Why do I hate my job so much? (sometimes)
I have to be honest here and tell you that, sometimes, I hate my job. As a web analytics expert, I dread the moment when I have to carry out the task that no web analyst can skip: the implementation checking.
Others may call it acceptance checking, but the reality of the job is that you have to test every possible page implementation (or at least a wide variety of page templates) in order to check that the value you retrieve is exactly the one you are expecting.
This process is important and cannot be skipped by any means. Actually, it is quite the opposite: if there is one step where you have to focus all of your attention, it is this one. If you let something slip through, it can impact your live data, and the users of those data will get false information and derive wrong hypotheses or actions to be taken.
To tell the truth, even with the dreadful implications of doing this step wrongly, it is quite hard to carry out this checking thoroughly. Looking at each page and seeing which dimensions or events are actually sent is tough, and your attention quickly fades because the task is so repetitive. Having got to know lots of QA managers during my (short) career, I was amazed at how they created processes for carrying out such tasks, and how good their automated scripts were at doing some of the possible checks. Unfortunately, most of their scripts are/were Java based and I don’t know that language, nor do I have the will to learn it.
The different challenges faced when automating the implementation checking
I had the idea of building an automatic version of this checking a long time ago. I think everyone working in analytics has had this idea at some point. It is hard work, yet not very challenging intellectually, and that is exactly the difficulty.
The first version was actually a VBA script that automatically parsed the requests sent to Adobe Analytics and returned the value of a specific dimension.
You would have something looking like this:
| request | URL | Campaign | v1 | c1 |
|---|---|---|---|---|
| http://subdomain.112.2o7.net/b/ss/rsid/10/JS-2.9.0?AQB=1&ndh=1&pf=1&…. | www.datanalyst.info/ | no campaign | something | something else |
This was already helpful, but you still had to check all the URLs manually and copy the requests into Excel.
I like Excel, but this step was quite manual. For actually checking the values, you can use the Adobe Experience Cloud Debugger. Here is the link for the Chrome extension.
Having a history of your checks
Even if this VBA code still required a lot of effort, it made it possible to keep a history of your tests. It was easier to tell the client what was actually missing and when the test had last been done, and you could “easily” compare files with each other.
This possibility was already a big win, and I wanted my automatic script to be able to return / write / create an output stating what has been tested, when and where.
This is definitely something an automatic script should keep.
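To give an idea of what such an output could look like, here is a minimal sketch using Python's standard csv module; the file name and the columns are my own assumptions for the example, not the format of the final script:

```python
import csv
from datetime import datetime

def write_check_results(rows, filename="implementation_checks.csv"):
    """Append the results of a test run, keeping a timestamped history."""
    with open(filename, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for url, dimension, value in rows:
            # what was tested, where, which value was found, and when
            writer.writerow([datetime.now().isoformat(), url, dimension, value])

# Example: two dimensions checked on the home page
write_check_results([
    ("https://www.datanalyst.info/", "v1", "something"),
    ("https://www.datanalyst.info/", "c1", "something else"),
])
```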
Running JavaScript
More of a prerequisite than a challenge, isn’t it? As you know, most analytics solutions (99%) actually run on JavaScript, and Adobe Analytics is no exception. Therefore, when I wanted to build this kind of automated script, I quickly ran into the issue that I needed my script to run JavaScript.
You know what a big fan of Python I am. One of my favorite libraries is “requests“, with which you can request URLs quite easily, but unfortunately it fetches the raw HTML document without rendering the JavaScript, let alone firing the tags / pixels.
Fortunately, I recently discovered Selenium, which can render JavaScript through a browser. This library is the one I used, and even if it is a bit difficult to install, it is quite powerful, so I would recommend doing it, as you will use it for other tasks (like scraping train ticket prices).
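As an illustration, a minimal sketch of how Selenium can load a page and let its JavaScript (and therefore its tags) run; the URL, the wait time and the use of chromedriver are assumptions for the example, not the final script:

```python
import time
from selenium import webdriver

# Minimal sketch: open a Chrome-driven browser, load the page and let its JavaScript run.
driver = webdriver.Chrome()          # assumes chromedriver is installed and on your PATH
driver.get("https://www.datanalyst.info/")
time.sleep(5)                        # crude wait so the tags / pixels have time to fire

html_after_js = driver.page_source   # the DOM after JavaScript execution, unlike requests
driver.quit()
```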
Catching the request
Rendering the JavaScript of your page is not enough: you also need to make sure that you can catch the requests that are being sent to Adobe Analytics (or Google Analytics, if you want to create your crawler for that). This is quite tough, but JavaScript is a wonderful world, especially the Performance API of the browser.
This API (documentation here) enables you to retrieve the requests that are sent from your page. As you know, most analytics tools use GET requests to retrieve a pixel, and the tool attaches query parameters to that request. Your job is then just to read those parameters. It can be tough (see the next point) but manageable.
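A small sketch of the idea, querying the Performance API from Selenium and keeping only the resources that look like Adobe Analytics beacons; the “/b/ss/” filter is my assumption about what the tracking call contains:

```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.datanalyst.info/")

# The Performance API lists every resource the page requested, including tracking pixels.
entries = driver.execute_script(
    "return window.performance.getEntriesByType('resource').map(e => e.name);"
)

# Keep only the beacons sent to Adobe Analytics ("/b/ss/" is typical of those calls).
adobe_hits = [url for url in entries if "/b/ss/" in url]
for hit in adobe_hits:
    print(hit)

driver.quit()
```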
The real problem begins when you want to catch POST requests, because Adobe Analytics automatically transforms the GET request into a POST request when the URL is longer than 2083 bytes. This is because larger requests would not be supported by IE.
The POST request transfers the data in what is called the “payload” of the request, and unfortunately it is not possible to get that information easily. On top of that, on HTTPS websites the data of the POST request is encrypted…
After some time trying to solve this issue with a proxy, I gave up on that idea as it was really complex to realize. I came up with another way to retrieve that information. It will be explained in the second part of this article series.
Understanding the requested information
Another challenge that you face when you retrieve the information from the crawler is that the data are abbreviated. For example, eVar1 is called “v1”, the channel is “ch”, and so on…
So you have to understand exactly what you retrieved and how to interpret it. If you are willing to create the script and / or do implementations yourself, you probably know all of that, but the analysts may not, and, let’s be ambitious, you want anyone to be able to use that script so that you don’t have to do those very long implementation checks alone (or even not have to do them anymore, as you are probably more useful doing something else).
So not only do you have to understand it, you will also want to interpret it for the other users of your script… but you don’t have to.
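As a small illustration of that translation step, here is a sketch that parses the query string of a beacon and renames the abbreviated parameters through a mapping dictionary; the mapping shown is only a partial example:

```python
from urllib.parse import urlparse, parse_qs

# Partial mapping from Adobe Analytics query parameters to friendlier names (example only).
FRIENDLY_NAMES = {
    "v1": "eVar1",
    "c1": "prop1",
    "ch": "channel",
    "events": "events",
    "pageName": "page name",
}

def interpret_beacon(url):
    """Return the beacon parameters with human-readable names where we know them."""
    params = parse_qs(urlparse(url).query)
    return {FRIENDLY_NAMES.get(key, key): value[0] for key, value in params.items()}

beacon = "http://subdomain.112.2o7.net/b/ss/rsid/10/JS-2.9.0?v1=something&c1=something%20else&ch=home"
print(interpret_beacon(beacon))
# {'eVar1': 'something', 'prop1': 'something else', 'channel': 'home'}
```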
Other stuff
Here is a (non-exhaustive) list of other challenges you are going to face:
- Creating a way to deal with URLs already retrieved and URLs already checked (see the sketch after this list)
- Creating methods that are easy to re-use
- Creating a crawler that works on any type of website (and implementation)
- Checking for performance management so you don’t crash your script (too often)
- etc…
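For the first bullet, a minimal sketch of how the crawler could keep track of which URLs have already been checked (a simple queue plus a “seen” set; fetch_links is a hypothetical helper, not part of the actual script):

```python
from collections import deque

def crawl(start_url, fetch_links, max_pages=100):
    """Visit pages breadth-first, skipping URLs that have already been checked."""
    to_visit = deque([start_url])
    seen = {start_url}
    while to_visit and len(seen) <= max_pages:
        url = to_visit.popleft()
        for link in fetch_links(url):   # fetch_links: hypothetical helper returning the page's links
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return seen
```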
In the next post I will present and explain the Python module I wrote. This post was an introduction to the script, so that before you criticize the provided script, you understand the work done on it and the different issues already faced.
This may also help you if one day you face one of those challenges: you could take a look at the code to see how I solved them.