Just about everyone has some old letters from grandma or handwritten diaries from a great-aunt lying around at home somewhere. For some people living in Germany today, however, these could be difficult to read because their ancestors wrote in another language, like Russian, Serbian, Ottoman Turkish, or Arabic. This is where the project MultiHTR comes into play. HTR stands for handwritten text recognition. Professor of Slavic Studies Dr. Achim Rabus from the University of Freiburg originally began training MultiHTR, which is an application based on artificial intelligence (AI), to decipher manuscripts written in Old Church Slavonic. In a new project, he is using HTR to decipher pieces of writing like old holiday postcards and recipes that are submitted by the general public. Jürgen Reuß talked to him about his research goals.
Achim Rabus says: "What some people may regard as mundane bits of writing are actually very interesting sources for linguists to work with. This is because they often reveal how people actually spoke a language at a certain time. When I found out about the call for applications for a grant from the Ministry of Science in Baden-Württemberg that focuses explicitly on smaller departments and the use of artificial intelligence, it was clear to me that we fit the funding criteria extremely well. First of all, Slavic Studies is a comparatively small department. Second, we already have experience applying AI to language. Third, we identified an area that has been little researched to date where we could combine our expertise in our field with our AI skills to benefit the public in a meaningful way."
There are many people in Germany who, for whatever reason, have lost touch with their cultural heritage. This is often the case when linguistic traditions don't get passed down. A good example is the generations of grandparents and great-grandparents who learned Kurrent or Sütterlin cursive script in school. Because the script is so different, most people today cannot read their letters and diaries, even though they were written in German. The program the team are using for recognizing old Slavonic handwriting, called Transkribus, can be trained for other languages and scripts as well, including old German cursive handwriting.
"We are offering this service because every program gets better the more you train it. Also, because we are Slavic Studies researchers, we are of course especially interested in reaching the many Germans with roots in Russia or former Yugoslavia. Some may even speak a little of the language of their grandparents, but if they have roots in the Serbian Orthodox part of former Yugoslavia, for example, they won't be able to read the recipes or diaries that their grandparents wrote because these were written in Cyrillic. You can find several examples of the kind of writing people have sent to us to decipher on our Instagram page," says Achim Rabus.
Dr. Rabus's colleague Prof. Dr. Johanna Pink from the Department of Southeast Asian Studies is also involved in the MultiHTR project. Ottoman Turkish was traditionally written in Arabic script until Atatürk decided to use the Latin alphabet instead. This means that if you grew up in Germany but your grandparents immigrated from Turkey, you probably won't be able to read handwriting from the Ottoman era. That is why the team want to develop smart Transkribus models that are able to decipher such handwriting and convert it into the Latin alphabet. However, one considerable challenge when training the AI is that Arabic script, for example, is written from right to left.
Dr. Rabus adds: "Our HTR AI gets better and better the more it is trained. We see our online service as a kind of crowdsourcing. The more people take advantage of it and help to correct the results, the more we and subsequent users will benefit."
Source: University of Freiburg
Top image: Pixabay