Hello everyone,
As some of you may have noticed on Twitter, I have recently released an Audience Manager Python SDK.
As usual, this wrapper helps you call the different Audience Manager API endpoints, and you will need a JWT integration to connect to it.
You can find the module on my GitHub account, and a get-started page already gives you some hints on how to use it.
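To give you an idea of the setup, here is a minimal sketch of what the connection could look like. The module name, the configuration file name and the import/instantiation calls below are assumptions on my side; the get-started page on GitHub gives you the exact names to use.

import audiencemanager as aam  # hypothetical import name, check the GitHub documentation

# JWT credentials from your Adobe IO integration (hypothetical config file name)
aam.importConfigFile('config_aam.json')

# instance used for all the API calls in the rest of this article
aamInstance = aam.AAM()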
The Audience Manager API documentation can be found here: https://bank.demdex.com/portal/swagger/index.html#/
For your information, I am using the aam.adobe.io endpoint and not the legacy api.demdex.com endpoint.
This article will not describe the API as I usually do, because most of the description is contained in the different docstrings, and the GitHub documentation already shows you how to use the module.
Instead, I would like to walk you through why I built the API wrapper, and you may be able to transpose that idea to other projects you are working on.
Bulk Management Capabilities
Be it Audience Manager, Adobe Analytics, or any other tool you are using at an enterprise level, there is always a moment where managing that tool becomes impossible because of the amount of data you have to deal with.
The manual processes that started when the tool was acquired need to become scalable, so that automation tools can take care of that part.
This is the reason to use the API in the first place.
The problem when you use these APIs is that you will need to know a bit of programming… usually.
Thankfully, some folks at Adobe Consulting developed the Bulk Adobe Audience Manager via Excel (the BAAM tool).
This Excel workbook enables you to make API calls without knowing any programming language. It is the equivalent of Report Builder for Adobe Analytics.
I would even say it is better than Report Builder, because you can also send information back to Audience Manager.
You can find the whole documentation about that tool here: https://docs.adobe.com/content/help/en/audience-manager/user-guide/reference/bulk-management-tools/bulk-management-intro.html
The question now is:
Why create a tool that already exists?
Each tool has a purpose
Behind each tool that you are using hide the use cases that the tool supports.
For the BAAM tool, it is to help with bulk import and export of data in your Audience Manager instance.
This is great, but the BAAM tool is built on Excel, and that implies some limitations:
- You can extract only 1000 elements per request
- You can’t loop your requests
- You cannot link the results together in an easy way.
These limitations are usually not a problem for anyone working with Audience Manager on a daily basis. So I would always encourage you to use the BAAM tool for your daily work at scale.
On my end, however, I had a use case that the BAAM tool wasn’t supporting:
How can I download 12 000 elements and automatically map them to additional information contained in other elements?
From Excel itself, downloading these 12 000 elements would take 12 rounds of Excel running.
Let’s say 12 minutes; not too bad.
But the mapping with other elements is where it starts to be limited: the BAAM tool has some methods to connect elements, but these need to be run individually. Imagine doing that for 12 000 elements… This would take more than a week (about 8 days non-stop, counting 1 minute per element: 12 000 minutes ≈ 200 hours ≈ 8.3 days).
Audience Manager API Wrapper Use Case
If you know Audience Manager, the following paragraphs will help you better understand my use case:
We wanted to retrieve all of the traits (the conditions for a user qualification) for a company. As described above, this would take 12 minutes for 12 000 traits with the BAAM tool. It took 17 seconds with the Python wrapper.
The traits are always retrieved with a Data Source ID and a Folder ID. These can be useful for a machine, but for analysis we would like to know the names behind these Data Sources and Folders, so we can better classify them.
We now need to download all of the data sources and folders in order to retrieve this information (respectively 1.7 seconds for 3 000 trait folders, and 2 seconds for 3 500 data sources).
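As a rough sketch, the retrieval of these three datasets could look like the snippet below. The method names (getTraits, getDataSources, getTraitFolders) and the fact that they return pandas DataFrames are assumptions on my side; the docstrings of the module give you the actual methods to use.

# Sketch only: method names are assumptions, check the module docstrings.
# Each call is supposed to return the full list of elements as a pandas DataFrame.
df_traits = aamInstance.getTraits()        # ~12 000 traits
df_ds = aamInstance.getDataSources()       # ~3 500 data sources
df_folder = aamInstance.getTraitFolders()  # ~3 000 trait folders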
Then the fun part begins.
You have 3 files looking like this:
Trait ID | Trait Name | Trait Rule | Folder ID | Data Source ID |
… | … | … | … | … |
Data Source ID | Data Source Name |
… | … |
Folder ID | Folder Name |
… | … |
Each of these datasets is represented as a pandas DataFrame.
To combine them into a single one, you can use the following easy trick in Python:
import pandas

# df_traits is the trait file
# df_ds is the data source file
# df_folder is the folder file

# first step is to merge with the data source file
temp_df = pandas.merge(df_traits, df_ds, how='left', left_on="Data Source ID", right_on="Data Source ID")
# second step is to merge with the folder file
full_df = temp_df.merge(df_folder, how="left", left_on="Folder ID", right_on="Folder ID")
I always keep the full set of traits by doing a “left join”. In case a trait doesn’t have a data source or a folder (which should not happen), we still keep those traits.
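As a quick illustration (with toy data, not actual AAM output) of why the left join matters:

import pandas

# toy data: one trait points to a data source that does not exist in the data source file
df_left = pandas.DataFrame({'Trait ID': [1, 2], 'Data Source ID': [10, 99]})
df_right = pandas.DataFrame({'Data Source ID': [10], 'Data Source Name': ['CRM onboarding']})

# how='left' keeps both traits; the missing name simply shows up as NaN
print(df_left.merge(df_right, how='left', on='Data Source ID'))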
These operations take 1 second in total, so getting all of the information combined from Python took about 22 seconds instead of one hour to one and a half hours, depending on how fast you are in Excel.
That is already a win for me and for my client.
The loop benefit
One of the main reasons I gave for building the wrapper was the ability to loop, and so far, no loop has been used. So let’s dig into that use case of the Python wrapper.
The idea here was not only to have the information about these traits, but also to know whether these traits are used within a segment. The segment itself doesn’t matter, but we want to identify the most important traits with that technique.
This will give us a hierarchy of which traits are the most important for segmentation, but it does not tell you whether these segments are used in a destination. Destinations are used to activate the segment population towards a partner, be it Facebook, Google, etc.
So we will need to look at the segments and destinations as well. We will do it in reverse, so that we have the full information at the end to merge with the traits.
In the end, we will use the API method to know which segments are actually mapped to a destination, and then check whether these segments are the ones returned for a trait.
The following method: https://bank.demdex.com/portal/swagger/index.html#/Destination%20API/get_destinations__destinationId__mappings_ retrieves all segments mapped to a destination.
Because you cannot use more than one destination ID in this parameter, this is where we will use the loop.
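Before looping, we need the list of destination IDs. A minimal sketch, assuming the wrapper exposes a getDestinations() method and that the response carries a 'destinationId' key (check the module documentation for the exact names):

# Sketch: getDestinations() and the 'destinationId' key are assumptions on my side
destinations = aamInstance.getDestinations()
destination_ids = [dest['destinationId'] for dest in destinations]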
list_segment_mappings = []  # will gather all of the segment information
for destination_id in destination_ids:  # destination_ids is the list of all destination IDs
    list_segment_mappings += aamInstance.getDestinationMappings(destination_id)
aamInstance is the instance of the AAM connector that enables the API calls to Audience Manager.
The list_segment_mappings variable now holds all the segment information that has been mapped to a destination.
The next step is to retrieve only the unique segment IDs; we can do that with the following line:
list_segment_mapped = set([element['sid'] for element in list_segment_mappings])
Once we have this set, we can use it in the loop that will check whether each trait has been used in a segment. This requires a call to this method: https://bank.demdex.com/portal/swagger/index.html#/Segments%20API/get_segments
Using the “containsTrait” parameter, we can check the traits one by one.
dict_trait_used = {}  # will hold the usage information for each trait
for trait_id in traits_ids:  # traits_ids is the list of all trait IDs extracted earlier
    dict_trait_used[trait_id] = {'UseInSegment': False, 'useMapped': False}
    check_seg = aamInstance.getSegments(containsTrait=trait_id, format='raw')
    if len(check_seg) > 0:
        dict_trait_used[trait_id]['UseInSegment'] = True
        for element in check_seg:
            if element['sid'] in list_segment_mapped:
                dict_trait_used[trait_id]['useMapped'] = True
We first create a dictionary that will contain this information, then, for every trait extracted in the traits_ids variable, we check whether it is used in a segment. By default, a trait is not used in a segment and not part of a mapped segment (UseInSegment = False, useMapped = False).
When some segments are returned, we set the “UseInSegment” key to True, and we then check for each of them whether they are in the set of mapped segments. If yes, we set the “useMapped” key to True.
NOTE: This operation is quite expensive in terms of network resources and therefore takes quite a long time. In my experience, 10 000 traits were checked in a bit less than 2 hours. This is long, but imagine doing that by hand.
That gives us a nested dictionary that we turn into a DataFrame and then merge again with the previous information.
import pandas as pd

# turn the nested dictionary into a DataFrame (one row per trait ID)
df_trait_segments = pd.DataFrame(dict_trait_used).T
# merge it back with the full trait overview built earlier
full_trait_infos_use = full_df.merge(df_trait_segments, how="left", left_on="sid", right_index=True)
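From there, the overview can be shared; for example, by exporting the merged DataFrame to a CSV file (the file name below is just an example):

# export the final overview so it can be shared with the team or the client
full_trait_infos_use.to_csv('trait_usage_overview.csv', index=False)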
I hope that this example of why and how I use a Python wrapper for this API was interesting.
Let me know in the comments if you have any questions.