Python Module for Automatic Implementation Checking
When I was responsible for web analytics at a company, I wasn't checking my implementation often.
The truth is that I was the one doing the implementation and using the data. 90-95% of the time, I was the only one concerned by the implementation and the only one who could touch it.
When I started working as a consultant again, I found that you need to check a client's implementation very often: you may not be the only one working on the project, and you may have a two-week mission with no time to actually understand how the implementation is built. You want to know what has been (or is) set on the pages and be able to compare what has changed recently.
After everything I have learned and done with Python over recent years, I had to try to automate this task.
In this article (part 1), you will find the different challenges that I faced when creating this module.
Logic of the implementation checker
The implementation checker currently has two methods:
- one method (checker) crawls the website you want
- one method (compare) compares two different crawls so you can see exactly what has changed
The implementation checker will probably receive major updates in the future, but I think this version can already satisfy most implementation managers, or simply web analysts who want to run regular checks of their pages' implementation.
Requirements for the implementation checker
It seems obvious, but as with every tool I create, you need to have some programs installed before you run the module.
Most of the requirements concern Python, so if you already know this language, it should be quite easy. The necessary modules are the following:
- pandas
- numpy
- Path
- urllib
- xlwt
- openpyxl
- selenium
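If pip is your package manager, the third-party packages among these can be installed in one go (urllib ships with Python itself, and Path is typically available through the built-in pathlib module):

```
pip install pandas numpy xlwt openpyxl selenium
```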
For Selenium, I have already discussed this module, which is quite powerful for driving a browser (see my article on how to scrape ticket prices).
One point to note is that my module uses Chrome by default (this can be changed in the code if you want), so you need to install the Chrome driver and make sure it is accessible in your PATH.
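As an illustration, here is a minimal sketch of how a Chrome driver with optional mobile emulation can be created with Selenium; this is not the module's exact code, just the general pattern, and it assumes chromedriver is on your PATH:

```python
from selenium import webdriver

def create_driver(mobile=False):
    """Create a Chrome driver; assumes chromedriver is available on your PATH."""
    options = webdriver.ChromeOptions()
    if mobile:
        # Chrome's built-in device emulation; "Nexus 5" matches the default
        # mobile device mentioned later in this article.
        options.add_experimental_option("mobileEmulation", {"deviceName": "Nexus 5"})
    return webdriver.Chrome(options=options)

driver = create_driver(mobile=False)
driver.get("https://www.example.com")
print(driver.title)
driver.quit()
```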
Implementation checker module explained
The next part of the article explains what the different parameters do and what exactly each method does.
The implementation checker module can be found on my GitHub account: https://github.com/pitchmuc/adobeanalytics_impementationchecker
Why aachecker for the module name? Because it is an Adobe Analytics checker 😉
The checker method
The checker method is the one that performs the crawl of your website. It takes a minimum of one parameter: your starting URL. It returns a pandas DataFrame by default, but it can also export the result of your crawl to an Excel file if you want. I would recommend leaving this option set to True.
Parameters:
- url : REQUIRED : This parameter can be two things:
- a starting URL; combined with the counter, the module crawls the number of URLs you have selected.
- a list of URLs; in this case, the module will crawl exactly the URLs in your list.
- counter : OPTIONAL : as described above, the number of URLs you would like to crawl when you give the checker a starting URL. This counter is ignored if you pass a list of URLs; in that case, the number of URLs in your list determines how many will be crawled.
- mobile : OPTIONAL : set to False by default. Determines whether the crawl should be performed with a mobile user agent. The default mobile device is the Nexus 5.
- verbose : OPTIONAL : set to False by default. Determines whether you would like comments printed to your console while the script is running.
- export : OPTIONAL : set to True by default. Determines whether you would like the result of the crawl written to an Excel file.
- fast_method : This parameter is the most complicated one to understand, so I kept it for the end. It takes a Boolean (True or False) and has major implications for how the crawler retrieves the data. Once you have used one method for a crawl, it is imperative that you use the same method for future crawls; otherwise the crawls cannot really be compared. A hypothetical call is sketched after this list.
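To make these parameters concrete, here is a hypothetical call based on the names described above; the exact import path and signature may differ slightly, so check the repository:

```python
import aachecker  # the module name, per the repository

# Crawl 10 pages starting from a home page and export the result to Excel
# (parameter names follow the description above; treat this as a sketch).
df = aachecker.checker(
    "https://www.example.com",
    counter=10,
    mobile=False,
    verbose=True,
    export=True,
    fast_method=True,
)

# Alternatively, pass an explicit list of URLs; counter is then ignored.
df = aachecker.checker(
    ["https://www.example.com/", "https://www.example.com/products"],
)
```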
When fast_method equals True
When you set this parameter to True, the crawler will load the page and retrieve the requests that are sent to your Adobe account.
By default, the crawler looks for requests to “sc.omtrdc.net”. This is important, as you need the regional data centers to process your data for most of Adobe's new features. I hope your implementation does this; if it does, the crawler will retrieve the request, then parse it and extract the different dimensions that you are sending to Adobe Analytics.
The good part of this technique is that you retrieve exactly what is being sent to Adobe Analytics. However, there is a big caveat: when the request exceeds 2080 characters, the Adobe Analytics library sends a POST request instead of a GET request. In that case, especially because most websites use HTTPS, there is no way to read the payload that is sent to Adobe Analytics.
So when you use this method, there will be URLs for which the data cannot be retrieved.
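To illustrate the GET case, here is a minimal sketch of how the dimensions can be parsed out of such a request with the standard library; the beacon URL below is made up, but in a real Adobe Analytics GET beacon, props arrive as c1..c75 and eVars as v1..v75 query parameters:

```python
from urllib.parse import urlsplit, parse_qs

# A made-up Adobe Analytics GET beacon, shortened for readability.
beacon = (
    "https://company.sc.omtrdc.net/b/ss/myrsid/1/JS-2.22.0/s12345"
    "?pageName=home&events=event1&c1=homepage&v1=direct"
    "&g=https%3A%2F%2Fwww.example.com%2F"
)

params = parse_qs(urlsplit(beacon).query)

# Props come in as c<number>, eVars as v<number>.
props = {k: v[0] for k, v in params.items() if k.startswith("c") and k[1:].isdigit()}
evars = {k: v[0] for k, v in params.items() if k.startswith("v") and k[1:].isdigit()}

print(params.get("pageName", [""])[0])  # home
print(params.get("events", [""])[0])    # event1
print(props)                            # {'c1': 'homepage'}
print(evars)                            # {'v1': 'direct'}
```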
When fast_method equals False
When you set fast_method to False, the crawler evaluates the props, the eVars, the events and the other dimensions on the page and retrieves them directly. This takes quite a while for each page, which is why it is not the fast method, but it enables you to retrieve all the dimensions that have been set on your page.
The good part of this method is that, as stated above, all of the dimensions you have set are captured. The caveat is that not all of the set dimensions are necessarily sent to Adobe.
As I said, it is recommended to stick to one method when doing your checks, as each of them will not necessarily retrieve the same dimensions. Using the same method makes the crawls comparable later on, which is done with the next method.
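As an illustration of this slower approach, here is a minimal sketch of how the dimensions set on a page can be read with Selenium by querying the Adobe Analytics tracking object in JavaScript; it assumes the classic AppMeasurement setup, where the tracker is a global object named s:

```python
from selenium import webdriver

driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
driver.get("https://www.example.com")

# Collect props/eVars from the global AppMeasurement object, if it exists.
script = """
if (typeof s === 'undefined') { return null; }
var out = {};
for (var i = 1; i <= 75; i++) {
    if (s['prop' + i]) { out['prop' + i] = s['prop' + i]; }
    if (s['eVar' + i]) { out['eVar' + i] = s['eVar' + i]; }
}
if (s.events) { out.events = s.events; }
if (s.pageName) { out.pageName = s.pageName; }
return out;
"""
dimensions = driver.execute_script(script)
print(dimensions)  # e.g. {'pageName': 'home', 'prop1': '...'} or None
driver.quit()
```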
The compare method
The module contains another method that enables you to compare two crawls, so you can easily see which props or eVars have changed between the two points in time.
This method takes four arguments:
- df1 : REQUIRED : DataFrame of your first crawl result. It is important that you keep the same column names when you import this DataFrame.
- df2 : REQUIRED : DataFrame of your second crawl result. It is important that you keep the same column names when you import this DataFrame.
- export : OPTIONAL : set to True by default. It writes an Excel file with the comparison result. In that result:
- True means the values are the same.
- False means the values are different.
- verbose : OPTIONAL : set to False by default. It tells you where the file is going to be written.
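Here is a hypothetical comparison workflow, reading back two exported crawls with pandas; the file names are made up, and the compare signature follows the parameters above:

```python
import pandas as pd
import aachecker

# Load two earlier crawl exports; keep the original column names intact.
df1 = pd.read_excel("crawl_january.xlsx")
df2 = pd.read_excel("crawl_february.xlsx")

# In the result, True means a value is unchanged, False means it differs.
result = aachecker.compare(df1, df2, export=True, verbose=True)
```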
I hope you like this new module.
If you have any comments or ideas for improvement, feel free to contact me and I will try to implement them in the next iteration of this module.