Recently, when I was browsing for a smartphone, I found a web page with really helpful articles and discussions. They even courteously provided a link to a shop where you can buy some of the phones. And so I gave it a go. Sadly though, it was one of those shops where you find everything but what you’re looking for. I clicked through some categories, used the search, clicked on one or two products and then ended up typing Amazon into the URL bar of the browser. Maybe they have this stuff, too. Customers, and I’m no exception here, expect usability (or buy on Amazon). Shops have to be clean and attractive. They have to provide the right content to the right customers in the right way. When you look for products, the ones shown need to be relevant. For example, when you are looking at smartphones, the shop should recommend smartphones. When you type in a search phrase, smartphones should be ranked higher.
For many of these use cases a recommendation engine can be used. Intershop knows that and recently built a proof-of-concept (POC) together with Microsoft. In a couple of days we hacked together the first quick and dirty version. And of course, it worked. Currently it only covers product recommendations, but in the future it could be used for all kinds of personalization improvements. In this article I will tell you a bit more about what we did.
Let’s start with the use case for the shop of our demo company, the famous inTRONICS. In the shop, carousels are used to recommend products. The products displayed are the same for all users, regardless of relevance. If you were looking for a smartphone, the carousel’s recommendations wouldn’t be of much help.
With the recommendation engine we developed, we can show each logged-in user the products they are most likely to prefer. And the same principle can be used for a lot of other content in the whole shop as well. In the future, users who are not logged in could be included as well.
Let’s get into the technical details. Every Machine Learning project starts with data. Our data comes from tracking test users and bots. We collect the data by logging it directly in the shop. In a real implementation the data would come from tracking tools like Google Tag Manager or Matomo.
In the next step we need to prepare the data. In our POC we use one JSON file per tracking event. Depending on the source, your data will come in a different format. With data from Google, for example, you would get fewer files. The first thing to do is to store the data in Azure Blob Storage. It’s super cheap and has good performance. For the data ingestion step and the actual data preparation step there are two tools in Azure that we use: Azure Data Factory and Databricks.
Azure Data Factory is a good choice for copying data and renaming variables. What seems very simple can get very challenging when there is a lot of data and there are different data sources such as XML files and SQL Server. In our case we only had to copy JSON files to our Blob storage. Easy.
The second data step is the actual ETL. ETL stands for “extract, transform, load”. We extract the data we need from our JSON files (extract), group it by user and sum it up (transform) and store the result in a CSV file in the same blob storage (load). The CSV contains scores for each customer, one score per product we had tracking data for. A pretty tiny file. The score is computed from all the tracking events the customer triggered on their journey through the shop. The most important of these are product clicks and purchases.
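To make the three ETL steps a bit more tangible, here is a minimal sketch in plain Python. It is not the Databricks code from the POC. The event fields (`user`, `product`, `type`) and the weights for clicks and purchases are assumptions made up for this illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical weights: the real scoring of the POC is not shown in this article.
EVENT_WEIGHTS = {"click": 1.0, "purchase": 5.0}


def extract(raw_events):
    """Extract: parse each JSON event and keep only the fields we need."""
    for raw in raw_events:
        event = json.loads(raw)
        yield event["user"], event["product"], event["type"]


def transform(events):
    """Transform: group by (user, product) and sum up the event weights."""
    scores = defaultdict(float)
    for user, product, event_type in events:
        if event_type in EVENT_WEIGHTS:  # ignore event types we do not score
            scores[(user, product)] += EVENT_WEIGHTS[event_type]
    return dict(scores)


def load(scores, out):
    """Load: write one CSV row per known (user, product) score."""
    writer = csv.writer(out)
    writer.writerow(["user", "product", "score"])
    for (user, product), score in sorted(scores.items()):
        writer.writerow([user, product, score])


# One JSON document per tracking event, as in the POC (content is invented).
raw_events = [
    '{"user": "u1", "product": "phone-a", "type": "click"}',
    '{"user": "u1", "product": "phone-a", "type": "purchase"}',
    '{"user": "u1", "product": "phone-b", "type": "click"}',
    '{"user": "u2", "product": "phone-a", "type": "click"}',
    '{"user": "u2", "product": "phone-b", "type": "scroll"}',
]

scores = transform(extract(raw_events))
buffer = io.StringIO()  # stands in for the CSV file in blob storage
load(scores, buffer)
csv_text = buffer.getvalue()
```

Only customer and product pairs with at least one scored event end up in the file, which is why the result stays small.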
The data in the CSV file is sparse. That means we don’t have many scores. Most customers didn’t look at most of the products. And that’s where the Machine Learning model comes in. We use a technique called Collaborative Filtering that, almost magically, fills in all the scores we don’t know. It estimates them from the behavior of customers with similar preferences. These scores can then be used for recommendations by presenting to a customer the products with the highest score. The computation of our final set of scores is done with Azure Machine Learning Service. It’s a cloud service that makes it easy to create the compute resources we need to compute our model. It also supports development with tools around model creation, management and deployment. For example, it’s super easy to host a model, in our case the final scores, as a REST-based web service with just a few lines of code.
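Collaborative Filtering comes in many flavors, and this article does not go into which one the POC uses. As an illustration of the general idea, the following sketch uses a simple item-based variant: two products count as similar when the same customers scored them, and an unknown score is estimated as the similarity-weighted sum of the scores the customer already has. All names and numbers are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fill_missing_scores(scores):
    """Return a copy of {user: {product: score}} with estimates for unknown products."""
    # Turn the table around: product -> {user: score}
    by_product = {}
    for user, items in scores.items():
        for product, score in items.items():
            by_product.setdefault(product, {})[user] = score

    filled = {}
    for user, items in scores.items():
        filled[user] = dict(items)  # known scores stay untouched
        for product, vector in by_product.items():
            if product in items:
                continue
            # Similarity-weighted sum over the products this user already scored
            filled[user][product] = sum(
                cosine(vector, by_product[known]) * score
                for known, score in items.items()
            )
    return filled


def recommend(scores, filled, user, n=3):
    """Top-n products the user has no known score for, best estimate first."""
    candidates = [
        (product, score)
        for product, score in filled[user].items()
        if product not in scores[user] and score > 0
    ]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [product for product, _ in candidates[:n]]


# Invented sparse scores, e.g. read from the CSV file of the ETL step.
known_scores = {
    "anna": {"phone-a": 6.0, "phone-b": 5.0},
    "ben": {"phone-a": 5.0, "phone-b": 6.0, "phone-case": 6.0},
    "cara": {"tv": 6.0, "hdmi-cable": 5.0},
}

filled_scores = fill_missing_scores(known_scores)
print(recommend(known_scores, filled_scores, "anna"))
```

In this toy data, anna and ben looked at the same phones, so ben's phone case gets a positive estimate for anna, while the TV and the cable, which only cara looked at, do not.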
The diagram below shows the whole architecture of our POC from data collection to the deployed model as a REST-based service. The service can be integrated directly into the front-end. To set up the process, we used Azure DevOps. The whole process can be run, for example, once a day to update the preferences of users.
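As a rough idea of the serving end of this pipeline: scoring scripts for Azure Machine Learning web services conventionally expose an `init()` function that runs once at start-up and a `run()` function that handles each request. The sketch below follows that convention without any Azure dependencies. The request fields (`user`, `top_n`), the response shape and the hard-coded scores are assumptions for illustration, not the POC's actual interface.

```python
import json

# Filled by init(); maps user -> {product: score}.
SCORES = {}


def init():
    """Runs once when the service starts and loads the final scores."""
    global SCORES
    # Assumption: a real service would read the computed scores from storage here.
    SCORES = {
        "anna": {"phone-case": 7.7, "tv": 0.4, "hdmi-cable": 0.1},
    }


def run(raw_data):
    """Runs per request: takes a JSON request body, returns a JSON response."""
    request = json.loads(raw_data)
    user_scores = SCORES.get(request["user"], {})
    top_n = int(request.get("top_n", 3))
    # Highest score first; ties are broken by product name
    ranked = sorted(user_scores, key=lambda product: (-user_scores[product], product))
    return json.dumps({"user": request["user"], "products": ranked[:top_n]})


init()
response = run('{"user": "anna", "top_n": 2}')
print(response)
```

The front-end would then only need to send the ID of the logged-in user and render the returned products in the carousel.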
I hope you enjoyed having a brief look into our recommendation engine prototype. Maybe your next smartphone will be recommended by an improved version of it.