In this blog article we show, from a developer-experience perspective, how we implemented a B2B skill and what that involves in practice. For a deeper technical look, we refer to two other Alexa-related articles on this tech blog.
Speech assistants from Apple, Microsoft, Google and Amazon have made a huge leap forward in the last four years, conquering mobile phones, living rooms and even cars. Microsoft has dropped out of the race and increasingly limits Cortana to its own product world. Apple's Siri, Google Assistant and Amazon Alexa, however, are still available and lend themselves to extending your own applications.
Acceptance of voice assistants in the private sector is now consistently high. This motivated us to try them out for professional use and thereby enable our customers to achieve concrete benefits in the B2B sector as well. The technological basis was to be Alexa, Amazon's assistant, which is known to 92% of people in Germany [1].
The application scenario came from one of our clients, who supplies his customers with building materials and tools. Very often, customers reorder standard products they know well and that have proven themselves, directly from the construction site. These reorders are frequently placed via mobile phone in the client's mobile shop, where touch-screen operation, at least on a construction site, proved error-prone and cumbersome. A reordering solution via a voice assistant was meant to provide a remedy.
Using Alexa, four main components are needed to build a speech-based connection to the reorder process:
- An input device running the Alexa app that allows customer interaction.
- An Alexa skill that represents the conversation logic of the speech assistant.
- An endpoint that hosts the skill's application code and connects to the business application.
- The business application, in this case an Intershop Commerce Management e-shop system.
In our implementation, a mobile phone with the Alexa app acts as the input device (1): today every construction-site worker owns one, the Alexa app is free, and so, from this point of view, almost everyone has access to the reorder skill without limitations. The Alexa app streams the customer's voice audio to the Alexa Service, which is invisible from a developer's point of view. This is where Alexa's impressive speech recognition, machine learning, natural language processing and speech-to-text functionality comes into play in the background. The Alexa Service interprets the voice stream, derives intentions and executes appropriate actions. With its skill (2) offering, Amazon provides all the functionality needed to create our own conversation model that transparently interacts with the Alexa Service.
Once the skill is created, it must be hosted at an endpoint: we decided to use an AWS Lambda function (3), hosted by Amazon, that interacts with the client's Intershop Commerce Management (ICM) e-shop system (4).
All four main components must be configured or programmed to realize our reordering assistant. After creating an Alexa developer account, the first step is to program and configure the so-called conversation model (or language model). This is the conversational interface between the customer and the business logic of the skill.
At this point so-called intents are defined, which represent the tasks the skill can perform for the customer. Each intent has one or more slots: containers for the information the skill needs from the customer to complete the intent. A slot is defined by its slot type, a construct similar to a data type, which determines the kind of information the slot holds. Amazon provides built-in slot types such as "number", but it is also possible to create your own. These custom types have a name and a list of possible values, which the developer must maintain.
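To make this concrete, here is a minimal sketch of what such a language model looks like, built as a Python dictionary in the shape of an Alexa interaction model. The invocation name, intent name, slot names and sample values are our illustrative assumptions, not the production model:

```python
import json

# Sketch of an Alexa interaction model with one intent and a custom slot
# type. All names and values here are illustrative assumptions.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "intershop reorder",
            "intents": [
                {
                    "name": "ReorderIntent",
                    "slots": [
                        # Each slot's type decides what kind of value it accepts.
                        {"name": "product", "type": "PRODUCT_CATEGORY"},
                        {"name": "quantity", "type": "AMAZON.NUMBER"},
                    ],
                    # Sample utterances that trigger this intent.
                    "samples": [
                        "please reorder {product}",
                        "reorder {quantity} boxes of {product}",
                    ],
                }
            ],
            # Custom slot type: a name plus a maintained list of values.
            "types": [
                {
                    "name": "PRODUCT_CATEGORY",
                    "values": [
                        {"name": {"value": "screw", "synonyms": ["screws"]}},
                        {"name": {"value": "drill", "synonyms": ["drills"]}},
                    ],
                }
            ],
        }
    }
}

model_json = json.dumps(interaction_model, indent=2)
```

In the developer console this JSON can be edited directly in the JSON editor instead of clicking the intents and types together one by one.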
There are two ways to fill a slot by asking the customer. First, you can trigger the queries in code; this is especially useful if you want to refer to previous information. Second, for less complex queries, you can let the skill manage the queries itself by marking a slot as "required" in the developer console. The skill will then query the marked slots in the order of their creation. For this, two things still must be configured on the slot: the speech prompts and the utterances. Speech prompts are the questions Alexa asks to obtain the needed information. Utterances are all the speech inputs the customer can make in the context of the skill or, in other words, the possible answers to the corresponding speech prompt.
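For the first, code-driven variant, the endpoint answers with a `Dialog.ElicitSlot` directive, which tells the Alexa Service to keep the current intent open and re-prompt for exactly one slot. A minimal sketch of such a response (the slot name and prompt text are our own examples):

```python
def elicit_slot(slot_to_elicit, prompt):
    """Build an Alexa response that asks the customer for one specific slot.

    The Dialog.ElicitSlot directive keeps the current intent open and
    re-prompts for exactly this slot.
    """
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": prompt},
            "directives": [
                {"type": "Dialog.ElicitSlot", "slotToElicit": slot_to_elicit}
            ],
            # Keep the session open so the customer can answer.
            "shouldEndSession": False,
        },
    }

# Example: after the product was chosen, ask for the payment method.
response = elicit_slot("paymentMethod", "Which payment method should I use?")
```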
While programming the skill it is important to be careful wherever possible utterances are created and set up, because this is where the machine-learning effect of the skill comes into play. The more sample utterances we enter, the better the skill's language model is trained and the better Alexa becomes at understanding variations or combinations of the possibilities we have provided. As a result, the customer's speech interaction with the skill is much more fluent, feels natural, and spares the customer from having to learn predefined phrases.
In the next step, our skill needs a runtime environment that hosts the skill application and serves as a communication endpoint. This can be an AWS Lambda function or a self-hosted service. The endpoint receives the information the language model has requested from the customer. The logic then processes this information and generates a voice response: either a request for additional slots or a confirmation that the intent is complete. In our reordering skill, the logic also handles communication with the ICM. This takes place via REST requests and is used to retrieve the needed business information from the shop application.
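Stripped down to its essence, the Lambda endpoint is a function that receives the Alexa request JSON, dispatches on the intent name and returns a speech response. The following sketch shows this skeleton with an injectable HTTP getter for the shop call; the host, REST path and prompt texts are illustrative assumptions, not the real ICM endpoint:

```python
import json
import urllib.request

# Illustrative base URL, not the client's real shop host.
ICM_BASE_URL = "https://shop.example.com/INTERSHOP/rest"

def fetch_reorder_list(token, http_get=None):
    """Query the shop's REST API for the customer's reorder products.

    `http_get` can be injected for testing; by default urllib is used.
    """
    if http_get is None:
        def http_get(url, headers):
            req = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(req, timeout=5) as resp:
                return json.load(resp)
    return http_get(ICM_BASE_URL + "/customers/-/reorderlist",
                    {"Authorization": "Bearer " + token})

def lambda_handler(event, context=None):
    """Entry point: dispatch on the intent name in the Alexa request.

    Launch and session-ended requests are omitted in this sketch.
    """
    intent = event["request"]["intent"]["name"]
    if intent == "ReorderIntent":
        speech = "Which product would you like to reorder?"
    else:
        speech = "Sorry, I did not understand that."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": False,
        },
    }
```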
Enough theory – let us take a closer look at the Intershop reordering skill. The language model currently consists of only one intent – the reorder intent. Its task is to gather the desired products, payment method and delivery address from the customer and then place the order in the ICM. The best way to explain the implementation is to take a closer look at the dialog flow.
Let’s use the following scenario: our customer, a technician on the building site, notices that he is about to run out of screws. He therefore wants to reorder some well-known products directly and starts the Intershop reordering skill. With the phrase “Please reorder screws”, he can start the reordering process, and the skill will then search for suitable products related to the term “screw”. To do so, the skill iterates over a limited assortment, realized as a specialized Alexa reorder list, which the customer can fill with his favorite products in the e-shop.
Having received the answer to the e-shop query, the skill offers the technician the first hit. He can select it directly or have two further products offered. After the product selection, the payment method and delivery address are requested. Only information already stored in the shop can be used here, mainly for security reasons: nobody wants to speak sensitive data such as bank account details out loud into a smartphone. Furthermore, Alexa does not offer a general out-of-the-box solution for capturing postal codes and the like.
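The offer logic described above can be sketched as a small generator (function and variable names are our own): the top hit is read out first, then two further products per request, so the customer is never flooded with the whole result list at once:

```python
def offer_batches(hits):
    """Yield product suggestions the way the skill reads them out:
    first a single hit, then two further products at a time."""
    if not hits:
        return
    yield hits[:1]            # the best match, offered on its own
    for i in range(1, len(hits), 2):
        yield hits[i:i + 2]   # "have two further products offered"

batches = list(offer_batches(["Spax 4x40", "Torx 3.5x30", "Hex 5x60", "Drywall 3.9x25"]))
```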
Now that the skill has all the information, the customer gets a summary of the order. If everything is correct and the customer approves all order information, the skill places the order in the ICM.
What experience have we gained in implementing the skill? Let us first look at the creation of the dialog.
During the first cut-through it quickly became apparent that implementing a linear dialog works well. Here, Alexa's dialog handling gives us good options for putting the prompts in the desired order. It becomes more difficult when implementing the first branches, e.g. when a payment method other than the standard one should be used. The main difficulty is keeping an overview. Behind the scenes the skill works with intent handlers, and whether such a handler is executed is determined by a Boolean expression. However, the conditions must be true for exactly one handler at a time; otherwise the skill may stall at this point, which results in a thrown error, and the customer needs to restart the skill. Even with only a few branches, this can lead to very long, hard-to-understand expressions. Adding branches is therefore not necessarily complicated, but somewhat cumbersome.
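A simplified model of this dispatch mechanism (without the ASK SDK, and with session attribute names of our own choosing) shows why the predicates grow awkward: every handler's condition must exclude every other handler's, and zero or multiple matches stall the skill:

```python
# Each handler pairs a Boolean predicate with an action. The predicates
# must be mutually exclusive; the extra clauses below exist only to keep
# the handlers from overlapping. Attribute names are illustrative.

def can_handle_payment(session):
    return (session.get("state") == "payment"
            and not session.get("payment_confirmed"))

def can_handle_address(session):
    return (session.get("state") == "address"
            and session.get("payment_confirmed"))

HANDLERS = [
    (can_handle_payment, "ask for payment method"),
    (can_handle_address, "ask for delivery address"),
]

def dispatch(session):
    matches = [action for predicate, action in HANDLERS if predicate(session)]
    # Exactly one handler may match; zero or several would stall the skill.
    if len(matches) != 1:
        raise RuntimeError("no unique handler - skill would stall here")
    return matches[0]
```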
More disappointing, especially from the customer's point of view, is that there is no real way to jump back in the dialog or to run through individual parts several times. This means that different products must be ordered individually, and if you make a mistake, for example in the selection of the delivery address, the only remedy is to restart the ordering process. Alexa's dialog system is therefore well suited to dialogs with low complexity and few branches; as soon as your scenario's complexity increases, you quickly reach its limits.
After the dialog structure was implemented, communication with the ICM had to be established. Thanks to Intershop's comprehensive REST API, the basic "how" was quickly answered, but some questions remained open. First, how can the customer request a product? We quickly discarded the idea of creating a slot type containing all product names: the maintenance effort would have been far too high, as would the susceptibility to errors, especially with the cryptic names technical products often bear. We therefore decided on a mapping via umbrella terms. Two things were done for this. First, we created a slot type in the language model with the corresponding umbrella terms, i.e. replacement or similar words, e.g. "screw" for "screws" and so on. Second, the mapping had to be enabled on the ICM side. Here we decided on a special custom attribute on the product object: every screw product carries a corresponding custom attribute with the value "screw". This attribute is queried via a REST call, so that all products of the desired type can be filtered from the list.
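The shop-side half of this mapping can be sketched as follows. In the real system the custom attribute is read via the ICM REST API; here we filter an in-memory product list, and the attribute name and SKUs are our own illustrative assumptions:

```python
# Illustrative custom attribute name on the product object.
UMBRELLA_ATTRIBUTE = "ReorderCategory"

products = [
    {"sku": "10001", "name": "Spax 4x40 countersunk",
     "attributes": {"ReorderCategory": "screw"}},
    {"sku": "10002", "name": "HSS drill bit 6mm",
     "attributes": {"ReorderCategory": "drill"}},
    {"sku": "10003", "name": "Torx chipboard screw 3.5x30",
     "attributes": {"ReorderCategory": "screw"}},
]

def filter_by_umbrella_term(products, term):
    """Return all reorder-list products whose custom attribute matches
    the umbrella term resolved from the customer's utterance."""
    return [p for p in products
            if p["attributes"].get(UMBRELLA_ATTRIBUTE) == term]

screws = filter_by_umbrella_term(products, "screw")
```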
Since wholesale assortments are usually very large, another challenge was the potential number of search results for a requested reorder product. A question about screws should not end in a five-minute Alexa monologue about dozens of variants of hundreds of products. At that point we would have long since exceeded the practical limits of a pure voice assistant.
So the challenge was to narrow down the assortment so that the product suggestions are well adapted to the customer's needs. The first idea was to use the order history, because it only contains products that were already bought and are most likely well known. However, given the combination of long order histories and complex order structures, we soon noticed that such requests also take long, sometimes too long. Alexa has a limited processing time: a maximum of eight seconds may pass between the customer's speech input and Alexa's speech output. If processing a request takes longer, the skill throws an error and aborts. From a usability perspective this time limit makes sense, because the customer does not want to wait forever for an answer, but in complex B2B shop scenarios it has a quite restrictive effect. A little more flexibility would be desirable here.
As the basis for the reorder search assortment, we finally settled on an order template, known in other scenarios as a wish list. This way the customer must maintain the list himself but can decide which products he prefers for Alexa reordering. Furthermore, it limits the number of products far enough that the eight-second processing time can be kept.
For the future, an AI-based process that automatically fills the list based on the customer's search and purchasing behavior could also be considered. For the moment, however, this exceeds the scope of our first reordering skill.
So far, all challenges could be solved, but one question remains unanswered: how is the login to the shop system realized? Alexa works with the customer's Amazon account credentials, but in B2B scenarios these do not correspond to the e-shop credentials. Additionally, no one wants to speak their credentials out loud into a smartphone. Unfortunately, there is no way to create an input mask on smartphones or the Alexa Echo Show that could be used to manage credentials and access. A creative solution still needs to be found. Stay tuned …
[1] Statista, „Umfrage zu Bekanntheit und Nutzung verschiedener Sprachassistenten in Deutschland," April 2019. [Online]. Available: https://de.statista.com/statistik/daten/studie/1031358/umfrage/umfrage-zu-bekanntheit-und-nutzung-verschiedener-sprachassistenten-in-deutschland/. [Accessed 25 June 2020]