Language Tech through Time: A Lookback at the Linguist’s Landscape
Once upon a time, languages were steadfastly analogue. The art of translation was something accomplished by language experts with the power of the human brain alone. These days, things look vastly different – there are endless programs and tools designed specifically to make the professional linguist’s job easier. And they’re getting more sophisticated year on year. However, it would be wrong of us to assume that all of these concepts have sprung up in the last few years …
The most apparent everyday use of multilingual content, for most people at least, is the internet – which is perhaps the single biggest advance in communications technology in the last century. Translations are more visible, and more instantly impactful, than ever before. But it would be careless to claim that innovation in language technologies has been solely driven by the rise of the internet. In fact, software and tools designed to aid the art of translation have been around for considerably longer than we might imagine.

Mention “language technology”, and the first thing that comes to mind for most people is likely to be trusty translation memory (TM). The concept of keeping translated strings in a database for repeated use was first proposed as early as the 1970s, and proprietary computer-assisted translation (CAT) software has been commercially available since the early 1990s. Terminology systems, too, have been available for many moons. Recognizing that TM and terminology go hand-in-hand, most CAT tools now provide both functions – marrying powerful terminology management with increasingly advanced translation memory recall.
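To make the core idea of translation memory concrete, here is a minimal sketch of a TM lookup with fuzzy matching, the mechanism behind the familiar “match percentage” in CAT tools. The segment pairs, function name, and similarity threshold are purely illustrative assumptions, not drawn from any particular product.

```python
from difflib import SequenceMatcher

# A toy translation memory: previously translated source/target segment pairs.
# (Illustrative English–German entries only.)
tm = {
    "The printer is out of paper.": "Der Drucker hat kein Papier mehr.",
    "Restart the printer.": "Starten Sie den Drucker neu.",
}

def lookup(segment, threshold=0.75):
    """Return (stored source, stored translation, match %) for the best
    fuzzy match above the threshold, or None if nothing is close enough."""
    best = max(tm, key=lambda s: SequenceMatcher(None, segment, s).ratio())
    score = SequenceMatcher(None, segment, best).ratio()
    if score >= threshold:
        return best, tm[best], round(score * 100)
    return None
```

An exact repeat of a stored segment comes back as a 100% match, while a near-repeat still surfaces the earlier translation for the linguist to adapt – which is precisely why repetitive technical content benefits so much from TM.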
In any discussion of technology and language, it’s impossible to ignore what for a long time has been the elephant in the room – neural machine translation (NMT). Whilst the seismic shift to NMT, using AI and neural networks to make the output more fluent, has largely occurred in the last few years, the study of machine translation (MT) dates back over 60 years. More surprisingly, some of the underlying concepts of modern MT have been studied for centuries. Techniques still used in MT today, such as frequency analysis and cryptanalysis, can be traced all the way back to the 9th century. So, far from being a shiny new 21st-century idea, MT is in fact one of the oldest devices in the language tech toolkit.
Evidently, whilst these ideas have been around for a while, the advances made in the last few years have been staggering. The move from traditional statistical and rule-based MT to neural has remodelled the way we view translation – not just from within the language industry, but from outside of it too. The immediate availability of high-quality, free, web-based NMT systems means that anyone, anywhere, can have anything translated into almost any language, at any given time. But of course, this comes with certain trade-offs. Although translations using NMT are much more fluent than the output from the older models, their reliance on neural networks means the results can be unpredictable. Add to this the fact that neural models still miss out parts of sentences, as statistical and rule-based MT did, and you have a potential problem on your hands. And because the output appears more fluent, it can be harder for the linguist to spot these omissions, or to identify inconsistencies.
No Substitute for Nuance
After repeated claims over the last few years that one tech giant has “solved machine translation” or another has “achieved human parity”, you would be forgiven for thinking that the days of the professional linguist are numbered. While it’s true that most modern MT systems are able to provide a level of quality that was previously thought impossible, they are still a long way off the elusive “human parity”. Such output might be satisfactory for someone wanting a quick translation of a few paragraphs from a business letter or a blog post, but it can’t match up to the skillset required for a medical translation, nor the creative evocation of a mood or reaction essential for effective advertising. AI is clever, but there’s a mountain to climb before machines can master cultural nuance. Highly skilled linguists remain the most crucial and valuable assets in the language industry, and even more so when they are post-editing NMT rather than translating from scratch. And that is unlikely to change any time soon.
After a frenetic few years, the NMT hype cycle seems to have settled, as we’re now witnessing a plateau of productivity. While some individual language pairs may see big quality improvements in the near future, it’s difficult to envisage another paradigm shift in the NMT field. The modern linguist now sits at the centre of an almost limitless array of tools and software designed to help them reach peak performance. Arguably, the difficulty now lies with identifying which tools are the most appropriate for the job at hand.
The next shift for language technology is already in motion, and it will go some way to addressing the bewildering choices faced by linguists and language service providers (LSPs). The move away from desktop began some time ago, but the circumstances of the pandemic, with so many people working from home, have accelerated that shift. Most of the big players in the CAT sector already provide cloud-based versions of their systems at both individual and enterprise scale – and there will come a point when the desktop versions are but dinosaurs.
The race is now on to create a one-stop shop that offers the best-in-class of everything the modern linguist needs – sophisticated TM management with fragment matching and cached lookup, dynamic terminology, adaptive NMT from multiple providers, automated QA, smart tagging, the ability to handle almost any file type, easy interoperability with other systems, and full integration. Now, that’s a long and ambitious list – but there’s certainly promising progress on the horizon for linguists and LSPs alike.