Moving existing documentation to a new format can quickly become a daunting endeavor with a variety of challenges: How exactly are existing documentation files converted to the new format? How is the translation done? Which processes have to be adapted and how?
In this article, I will try to answer these questions with regard to the creation of our new online help.
Over the past few months, we, the Intershop documentation team, have been working on renewing our online help, the underlying technology and also the visual presentation.
For us, this means that we switched from DocBook to DITA, that our documentation sources were moved from SVN to Git and that all technical writers now use the same editor: Oxygen XML.
For our customers, this means that our online help can be accessed on demand from anywhere – even from mobile devices. Furthermore, there is an improved search function and a completely overhauled design.
Another Intershop Hackathon Project
The project originated in the Intershop Hackathon in Autumn 2018. We tested various ways to actually publish our online help online. We also wanted to improve the design and usability (e.g., search functionality) and our internal creation process.
The technologies and tools we considered were essentially:
- Confluence
- Markdown
- DITA with Oxygen XML
Confluence was voted out mainly for cost reasons: we would have needed some costly plugins for direct publishing. Also, the migration effort would have been very high.
The main reason why we decided against Markdown was that it was not supported by our translation memory system at that time (2018). Furthermore, there were still many open questions regarding file management.
We finally decided to work with DITA in conjunction with Oxygen XML because this approach suits our expectations and requirements best.
DITA in a Nutshell
DITA (Darwin Information Typing Architecture) is a free XML-based format defined and maintained by OASIS. It follows the principles of single-source publishing. That means we can create HTML files as well as PDFs (and theoretically many other formats) from the same DITA sources.
Some of the most important features are:
- Topic Orientation:
In contrast to other XML formats, such as DocBook, which tend to resemble a continuous book, DITA is topic-based. This means that information is arranged in information blocks, so-called “topics”, which are represented in individual XML files and arranged in a DITA map.
By default, DITA offers several types of topics like concept, task and reference.
Topics can be reused in various maps (or multiple times in one map) for various publications and be maintained centrally at one location.
- Specialization:
DITA can be adapted to the needs of users, i.e., it can be specialized. Currently, DITA has several hundred elements, far more than any single team is likely to need. With specialization it is possible to exclude certain elements or entire domains (topic-related groups of elements, e.g., for hazard statements). Furthermore, it is possible to restrict the content models of DITA with the help of constraint modules. For example, mandatory elements can be made optional, or optional elements can be made mandatory.
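To make the idea of topics and maps more concrete, here is a minimal sketch in Python that builds two small DITA maps which both reference the same topic file. The file names and map titles are hypothetical and not taken from our actual sources, and a real map would also carry a DOCTYPE declaration, which is omitted here.

```python
import xml.etree.ElementTree as ET


def build_map(title, topic_hrefs):
    """Return a minimal DITA map that references the given topic files in order."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    for href in topic_hrefs:
        # Each topicref points at a standalone topic file. The same file
        # may be referenced from several maps (or several times in one map).
        ET.SubElement(root, "topicref", {"href": href})
    return root


# Hypothetical file names: the same introductory topic is reused in two publications.
b2c_map = build_map("B2C Help", ["intro.dita", "checkout_task.dita"])
b2x_map = build_map("B2X Help", ["intro.dita", "quotes_task.dita"])

print(ET.tostring(b2c_map, encoding="unicode"))
```

The point of the sketch is that intro.dita exists only once on disk. Both maps merely point to it, so a correction in that one file shows up in every publication that references it.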
DocBook vs DITA
Several articles and forum posts on the web already compare DocBook with DITA (see References), so I won’t repeat the basic differences here. For us, the most important advantages over DocBook are the following:
- Reuse capabilities:
DITA features extensive reuse capabilities. These allow us to reuse content for similar products and offerings such as CaaS in various gradations or B2C and B2X content in our Commerce Management.
- Customization:
DITA is highly customizable, i.e., elements or whole domains (e.g., for hazard statements) can be removed, mandatory elements can be made optional and vice versa. Defining our own rules also helps to improve consistency and therefore the quality of the content.
- Simplified translation:
The conversion to DITA simplifies the translation process. Especially with the introduction of shorter release cycles, there are often small changes in the software that must be documented (and translated). The compact DITA topics are much easier to handle than large DocBook files.
Editing Made Easy
In connection with DITA, you often stumble across one name in particular: Oxygen XML by Syncro Soft.
The company from Romania has made it its business to offer a complete solution for XML authoring, development and collaboration.
After a short evaluation phase, we decided to use the Oxygen XML Editor. It offers some decisive advantages over the many freely available editors. Some of our highlights are:
- WYSIWYG editor for XML content:
The Author Mode lets you view the finished text with a simple layout. This makes it easy to spot small errors that might otherwise get lost in the tag clutter.
- Built-in XML validator:
For our DocBook-based help this was done with an external tool – so a built-in validator is very useful and saves time.
- Link refactoring:
This means, for example, that if a file name is changed, all references to this file are updated automatically (Master Files Support).
- Auto-Completion of Tags:
Oxygen always suggests possible XML elements at the current position and prevents you from selecting elements that are not allowed. This way, validation errors can be avoided at an early stage.
- Responsive Webhelp:
In addition to the many output formats already available with the DITA Open Toolkit, Oxygen offers a responsive web help. It can be adjusted with simple CSS and also features a search function.
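To illustrate what link refactoring has to take care of, here is a conceptual sketch in Python. It is not how Oxygen implements the feature. It assumes that all files live in the same folder, that references are stored in href attributes, and that the file set fits into a dictionary in memory. The file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def rename_target(files, old_name, new_name):
    """Rename one file in an in-memory file set and update every href that points to it."""
    updated = {}
    for name, xml_text in files.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            href = elem.get("href")
            if href is None:
                continue
            # Keep an optional "#topicid/elementid" fragment untouched.
            path, sep, fragment = href.partition("#")
            if path == old_name:
                elem.set("href", new_name + sep + fragment)
        new_key = new_name if name == old_name else name
        updated[new_key] = ET.tostring(root, encoding="unicode")
    return updated


# Hypothetical file set: a map and a topic both reference install.dita.
files = {
    "help.ditamap": '<map><topicref href="install.dita"/></map>',
    "upgrade.dita": '<topic id="upgrade"><title>Upgrade</title><body>'
                    '<p>See <xref href="install.dita#install/prereq"/>.</p></body></topic>',
    "install.dita": '<topic id="install"><title>Install</title></topic>',
}
renamed = rename_target(files, "install.dita", "installation.dita")
```

Even in this tiny example, two different files have to be touched for a single rename, which is why doing this by hand in a large help project is so error-prone.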
Migrating from DocBook to DITA
The migration from DocBook to DITA was a rather complex and time-consuming process, but with some positive side effects.
First, we cleaned up our DocBook files. One big problem, for example, was that entities we had defined ourselves were not recognized during the transformation and had to be resolved beforehand. We did the initial conversion with Oxygen by using the “DocBook to DITA” transformation, which puts the entire DocBook content into one large DITA topic file. That’s all there is to it, one might think, but that is just the beginning.
Even if a DITA file is created during this process, it does not correspond to the principles behind DITA, i.e., we do not have any topic orientation and the reuse of content is hardly possible.
Therefore, we had to do some refactoring work, which can be done partly automatically with Oxygen and XPath (a query language for selecting nodes from an XML document):
- Converting nested elements to topics:
This allowed us to create single DITA files for each actual topic.
- Deleting elements and comments:
This helped to remove, for example, empty prolog elements and comments that were no longer required.
- Converting topics to suitable topic types, i.e., tasks, concepts and references:
The conversion to tasks in particular required some additional rework, since elements like cmd were not added at all or not in the right place. Oxygen’s XPath capabilities helped a lot here.
By using the following refactoring commands in Oxygen, a converted topic can be turned into a valid DITA task:
- Rename Element:
Target elements (XPath):
New local name:
- Rename Element:
Target elements (XPath):
New local name:
- Unwrap Element:
Target elements (XPath):
The following example shows a task before and after refactoring:
- Renaming files and arranging them in DITA maps:
After the files were ready, we did some bulk operations to adjust the naming and then we arranged everything in DITA maps.
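The rename and unwrap operations mentioned above can be sketched in a few lines of Python. This is only an illustration of the technique, not the exact set of operations we ran in Oxygen. The sample task, its element nesting and the path expressions are hypothetical, and Python's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET


def rename_elements(root, path, new_name):
    """Give every element matched by the path a new local name."""
    for elem in root.findall(path):
        elem.tag = new_name


def unwrap_elements(root, path):
    """Replace every matched element with its own child elements."""
    for elem in root.findall(path):
        parent = next(p for p in root.iter() if elem in list(p))
        index = list(parent).index(elem)
        children = list(elem)
        # Text placed directly inside the wrapper is dropped in this sketch;
        # in the sample below the wrapper contains child elements only.
        parent.remove(elem)
        for offset, child in enumerate(children):
            parent.insert(index + offset, child)


# Hypothetical result of an automatic conversion: a list instead of task steps.
converted = """<task id="save_file">
<title>Saving a file</title>
<taskbody>
<section><ol><li><p>Open the file.</p></li><li><p>Choose Save.</p></li></ol></section>
</taskbody>
</task>"""

root = ET.fromstring(converted)
unwrap_elements(root, ".//taskbody/section")
rename_elements(root, ".//taskbody/ol", "steps")
rename_elements(root, ".//steps/li", "step")
rename_elements(root, ".//step/p", "cmd")

print(ET.tostring(root, encoding="unicode"))
```

After these four operations the body consists of steps, step and cmd elements, which is the structure a DITA task expects. Whether the result is actually valid still has to be checked with a validator.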
Translating the Documentation
Currently, online help files are available in English and most of the documents are also available in German.
For translation, we use Across Language Server, a translation memory system.
Since the texts themselves were only slightly changed during the move, we were able to keep most translations from Across for the new DITA format.
Everything in Its Right Place
Until recently, the documentation team maintained the documentation sources in SVN and generated the online help locally via the command line. The finished results were then added to a Bitbucket project via Git. A big problem was that our DocBook transformation added random IDs to all sections of the generated HTML files. As a result, Bitbucket marked every file as changed after each build, which made the review process very difficult.
All in all, it was a very cumbersome process.
Now we manage both the source files and the generated help files via Git in Bitbucket. Another novelty is the use of Oxygen WebHelp, which lets us generate the online help and PDF files automatically via Bamboo as soon as new data is committed. This saves a lot of time that was previously spent on creating the output manually and checking it in.
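As a rough idea of what such an automated build step could look like, here is a small Python sketch that assembles DITA Open Toolkit command lines for a web help and a PDF build. It is not our actual Bamboo configuration. The map name and output folders are hypothetical, and the webhelp-responsive format assumes that the Oxygen WebHelp plugin is installed in the DITA-OT distribution on the build agent.

```python
import subprocess


def dita_command(map_file, transtype, output_dir):
    """Assemble a DITA Open Toolkit command line for one output format."""
    return [
        "dita",
        f"--input={map_file}",
        f"--format={transtype}",
        f"--output={output_dir}",
    ]


# Hypothetical map name and output folders.
# "webhelp-responsive" is only available if the Oxygen WebHelp plugin is installed.
BUILDS = [
    dita_command("onlinehelp.ditamap", "webhelp-responsive", "out/webhelp"),
    dita_command("onlinehelp.ditamap", "pdf", "out/pdf"),
]


def run_builds(commands):
    """Run each transformation in turn, as a script task on a build server might."""
    for command in commands:
        # check=True aborts at the first failing transformation.
        subprocess.run(command, check=True)
```

A script task in the build plan would then call run_builds(BUILDS) after checking out the latest commit. The function is deliberately not called here, because it needs a DITA-OT installation on the machine.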
Was It Worth It?
Even though the conversion was very time-consuming, it was definitely worth the effort. The automatic build process saves a lot of time, we are now much faster with translations, and with the reuse capabilities we are well equipped for the future. Working with the small DITA topics is also much more pleasant than working with the large DocBook files we had before.
Beyond that, we now have a responsive online help that doesn’t have to hide from anyone – so for my part, I can answer the question with a clear yes.
For the future, we are planning further revisions to the document structure as well as a strong focus on search engine optimization so that our customers can easily find answers to their questions via search engines.
References
- Intershop Online Help
- Radu Coravu: DocBook vs DITA
- Don Day, Michael Priestley, and David Schell: Introduction to the Darwin Information Typing Architecture
- DITA Open Toolkit