You already know Twitter is an international phenomenon. Social media played such a key role in revolutions across the Arab world that “Facebook” is an emerging name for newborns of the Arab Spring. There’s even a baby called “Hashtag” here in the U.S.
Twitter has now been translated into 36 languages, most recently Basque, Catalan, Romanian, Galician and, yes, lolcat. As of 2012, over two-thirds of Twitter accounts were from outside the U.S., and though English is still the most prominent language, more than half of all tweets were in other languages, according to the social-media research firm Semiocast. What doesn’t change, from Arabic to Malay to Urdu, is the 140-character limit. Of all these languages, which one can pack the most information into a tweet?
The answer to this question isn’t as simple as going around and asking linguists which languages are the most compact. The responses to such queries were tepid, and many of those who did answer, sincerely trying to help, said something along the lines of British linguist David Crystal’s response: “These are unanswerable questions I’m afraid.” The endpoint of this search was less an answer than an explanation of why the question is so unanswerable. Though counting characters is straightforward enough, quantifying information is very hard.
It’s a difficult question because it requires you to define information, to decide what counts as information and how to value it, Google computational linguist Richard Sproat told The Connectivist. “Different languages choose to indicate different kinds of things, and it’s really hard to say [that] one carries more information than another.” Sproat illustrated this point by translating “my grandfather died” (19 characters in English, with spaces) into Korean: “할아버지께서 돌아가셨어요.” This took fewer characters (12, not counting the space) to express the idea. But some information was lost in translation. The Korean, Sproat said, included many honorifics, or marks of respect, for the grandfather that do not come out in English. That is true not just of this example but of the whole language, in which social status information is built in.
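The character counts in Sproat’s example can be checked with a few lines of Python. This is only a sketch of the counting step, not a measure of information: the helper name `count_chars` is hypothetical, the Korean sentence is written with the standard honorific particle 께서, and the text is normalized to NFC on the assumption that each Hangul syllable should count as one character.

```python
import unicodedata

def count_chars(text, include_spaces=True):
    """Count characters (Unicode code points) after NFC normalization.

    Hypothetical helper for illustration; NFC makes each precomposed
    Hangul syllable count as a single character.
    """
    normalized = unicodedata.normalize("NFC", text)
    if not include_spaces:
        # Drop whitespace so only the written characters are counted.
        normalized = "".join(ch for ch in normalized if not ch.isspace())
    return len(normalized)

english = "my grandfather died"
korean = "할아버지께서 돌아가셨어요"  # standard spelling with the honorific particle 께서

english_count = count_chars(english)                         # 19, spaces included
korean_with_space = count_chars(korean)                      # 13, space included
korean_no_space = count_chars(korean, include_spaces=False)  # 12, space excluded
```

Counted this way, the Korean sentence comes to 12 characters without the space and 13 with it, against 19 for the English. The count says nothing about the honorifics the Korean carries, which is exactly Sproat’s point.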
“There are all these different kinds of information,” Sproat commented, “the fact that grandfather died, the fact that you’re being particularly deferential to your grandfather, and so on and so forth…In order to equate them, you have to know how to characterize the relative importance of these pieces of information.”
Another problem, as Steven Bedrick, a linguist from Oregon Health and Science University, pointed out, is that not all information in language is explicit. He gave the example of the six-word story often attributed to Hemingway, “For sale, baby shoes, never worn.” (33 characters, with spaces.) There, most of the information is implied or left to the imagination. Though few tweets are Hemingway caliber, reading between the lines is generally an important part of reading comprehension and literary interpretation. On Twitter, do people read between the words? Quantify that.
This all boils down to the conclusion that deciding which language conveys the most information in 140 characters is messy at best and impossible at worst. Nonetheless, there is one clear conclusion: “For languages like Japanese and Chinese, because of the way their writing systems work, you can certainly encode more information in a tweet. That’s pretty safe to say,” Sproat told The Connectivist. “Beyond that, it would be hard to make any claims that would hold up under scrutiny.”
One person ran an experiment to answer the unanswerable question, and he reached the same basic conclusion as Sproat. As reported in The Atlantic, Ben Summers, an IT specialist in the UK, used Google Translate to translate tweets from several languages that use non-Latin writing systems into English, then compared the character counts of the originals and their English translations. Summers found that Japanese seemed to provide the greatest advantage, with translated tweets ballooning to 260 characters in English. Thai tweets were in second place, at 185 characters. Russian, it seemed, did not give the tweeter much advantage.
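To put those figures in proportion, here is a rough sketch of the arithmetic. It assumes each reported number is the English length of an original tweet that used the full 140 characters, which the account of the experiment does not state; the function name `expansion_ratio` is hypothetical.

```python
TWEET_LIMIT = 140  # Twitter's character limit at the time

def expansion_ratio(translated_length, original_length=TWEET_LIMIT):
    """How many times longer the English translation is than the original.

    Assumes the original used the full limit unless told otherwise.
    """
    return translated_length / original_length

# English lengths reported for translated tweets.
reported = {"Japanese": 260, "Thai": 185}

ratios = {lang: round(expansion_ratio(n), 2) for lang, n in reported.items()}
# Japanese: 1.86, Thai: 1.32
```

Under that assumption, a full-length Japanese tweet carries a bit less than two English tweets’ worth of text, and a Thai one about a third more than a single English tweet.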
Chinese, too, lets a tweeter say more. “Chinese characters pack a lot more information into them than English letters,” Sproat told The Connectivist. “In linguistic terms, they represent whole morphemes,” or units of meaning. Twitter remains blocked by China’s government, but there are other Twitter-like micro-blogging sites, led by Sina Weibo, which also has a 140-character limit for posts. Ashley Zhang, a recent graduate of Boston College, uses Weibo to keep up with friends back home in China. “Essentially,” Zhang told The Connectivist, “you can write a short story with 140 characters.”
It’s debatable, however, whether being able to say more on Twitter, in another language or otherwise, is an advantage. In the summer of 2011, bloggers debated whether Twitter should raise the character limit. Slate’s Farhad Manjoo argued that the character limit “turns otherwise straightforward thoughts into a bewildering jumble of txtese” and that Twitter should double the limit so that people can write with more depth.
Others argued that less is more on Twitter, and that rigorous structures promote creativity and improve writing. Journalist Mathew Ingram, who covers the media for GigaOM, wrote: “As far as I’m concerned, the 140-character limit is one of the most brilliant things Twitter has ever done…it restricted what people could post, so that Twitter didn’t become a massive time-sink of 1,000-word missives and rambling nonsense.” Mallary Jean Tenore thinks the character limit enforces good writing: “It’s like an electronic editor that forces us to find a focus and make every word count.”
Beyond questions of writing quality, does being able to write more on a micro-blog defeat the purpose? If writing in Japanese or Chinese removes the constraint element, does it become something else altogether? “I wonder if composing a good Weibo message counts as a different communicative act than composing a good tweet,” says Bedrick. When Twitter is translated to another language, the whole enterprise can become somewhat foreign.