
Zawgyi vs. Unicode

Have you ever thought about what goes into making text appear on your screen?


How an app that’s created in Estonia can be used by someone in Canada? How a blog post written on a MacBook can be read on a Samsung Galaxy S7?


Or how the Wingdings fonts - the most fascinating thing to hit my family computer before Full Tilt! Pinball - worked; and more importantly, why we needed 3 of them?


I hadn’t. Until I was investigating the impact of one country not using the international standard for character encoding.


Which really should be the case study for why international standards are a really big deal. And why understanding the local culture and user behaviour is, in many cases, more important than just providing language translations.

What Happened in Myanmar



When I think about Myanmar I think about the iconic hot-air balloons rising over the Bagan temples. About the views from Mandalay Hill as my desktop background. And about the longest civil war in the history of humankind.


It was - and in some areas still is - a series of ongoing insurgencies that began in 1948, shortly after the country gained independence from the United Kingdom that same year.


But even with the insurgencies, Myanmar of the early 1960s was the largest exporter of rice in the world. With an educated workforce and well-run economic and legal systems.


In Asia’s high-school yearbook of 1961, Myanmar was voted most likely to become fully industrialised.


So how did Asia’s rising star end up with decades of economic stagnation?


A coup. A coup in 1962. And economic sanctions by the US and EU. And borders that were closed to mass immigration and emigration.


This left Myanmar under-developed and far behind its neighbouring nations. SIM cards cost as much as $3,000 USD. And with per capita income at $800 USD, mobile phones, landlines and internet access were limited to the wealthy.


This started to change in 2010 when the elected president, and the ruling party that succeeded him, gradually sought to reverse decades of mismanagement. This eventually led to more open borders, influences from the outside world, the introduction of $7 USD SIM cards and a reduction in internet censorship. Which in turn led to widespread connectivity through mobile phones and the internet.


But while Myanmar was enclosed within its own borders, fighting off insurgency, the outside world was discovering antibiotics. Looking for life on Mars. Reeling from the debut and shocking disbandment of the Beatles. And using the internet to facilitate communication by eliminating geographic, cultural and linguistic barriers.

Insert Unicode

Insert Unicode! Unicode to the rescue?

 

The international standard for encoding, representing and handling text, first published in 1991. Basically, if you’re creating or using any app or website that involves showing text on a screen, Unicode is most likely involved. It defines the way in which text characters should be stored and processed, so that they are correctly rendered by any Unicode-compliant font anywhere in the world.


It covers all the characters for all the writing systems of the world, modern and ancient. And yes, this includes Burmese.


I hope you’ve been paying attention to the dates. And everything else I’ve been saying. If you have, you’ll notice that this international standard was developed while Myanmar was closed off to the world. So while the rest of the world nodded in agreement to use this standard, Myanmar could barely hear the conversation. And no one could see Myanmar furiously shaking its head.

Insert Zawgyi

The de facto Burmese scheme for encoding, representing and handling Burmese text. The concept is similar to Unicode. Except Zawgyi only covers Burmese characters in the Burmese language.


Both encodings only display correctly with fonts that are compliant with them. With Unicode, this is easy, as it’s an international standard. With Zawgyi, compliant fonts are restricted to specific phones and one specific country - Myanmar.


Sadly, as Unicode and Zawgyi assign overlapping ranges of code points to different characters, no single font can render both of them correctly.
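
To make the clash concrete, here is a minimal Python sketch. It assumes one well-known difference between the two schemes: Unicode stores the vowel sign E (U+1031) after the consonant it belongs to, while Zawgyi stores it before the consonant, in the order it is drawn. The `describe` helper and the two sample strings are made up for this illustration.

```python
import unicodedata


def describe(text):
    """Return a 'U+XXXX NAME' label for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in text]


# Assumption for illustration: the same syllable (KA + vowel sign E)
# stored in Unicode order and in Zawgyi order.
unicode_order = "\u1000\u1031"  # consonant first, then the vowel sign
zawgyi_order = "\u1031\u1000"   # vowel sign first, as it is drawn

for label, text in (("Unicode", unicode_order), ("Zawgyi", zawgyi_order)):
    print(label, describe(text))

# Same code points, different order, so a plain comparison fails.
print(unicode_order == zawgyi_order)
```

Both strings are built from the same two code points, so nothing in the stored text says which scheme was used. A font has to pick one interpretation, and it will draw the other one wrongly.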

What this Means for App & Website Developers

In case you’re wondering, this is what it would look like:


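Encoding              With Unicode (Padauk) font     With ZawgyiOne font
Unicode text          [sample image not available]   [sample image not available]
Zawgyi-encoded text   [sample image not available]   [sample image not available]

Table 1: Showing how different characters are rendered depending on the encoding and fonts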

Source: http://www.unicode.org/faq/myanmar.html


Today this presents a unique and challenging problem. Apps and websites developed outside of Myanmar assume that users have devices - phones, laptops, tablets, smart watches, car displays, etc. - that use Unicode.


Once tech companies around the world realised the internet was becoming more widespread in Myanmar they offered Burmese translations. And they patted themselves on the back. And then sat back as their data on Myanmar barely changed.


They didn’t realise that users with Zawgyi devices won’t be able to accurately view the UI or any content created using Unicode. This could be anything from articles and blog posts to social media posts or even comments.


And over 90% of devices in Myanmar use Zawgyi.


Any features involving data validation - I’m mainly thinking of names, phone numbers and email addresses at registration - are also at risk if the user enters Zawgyi characters where Unicode was expected or vice versa.


If you’re hoping people land on your website through a search engine, sit up and take note - it’s unlikely that search engines are built to understand all text encodings. Especially when there’s a widely-accepted international standard. Unicode text on your site can be consistently misinterpreted and thus excluded from search results on Zawgyi devices.


If you’ve got an internal search feature, think about what would happen if a user enters the search text in Zawgyi but all your content is in Unicode - again I’m expecting misinterpretation and exclusion.
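
A rough Python sketch of the internal search case, assuming all stored content is Unicode and only the query may be Zawgyi. The `search` function and its `is_zawgyi` and `to_unicode` parameters are hypothetical names; in practice you would plug in a real detector and converter. The toy stand-ins below only handle a string that starts with the vowel sign E (U+1031), which Zawgyi places before the consonant.

```python
from typing import Callable, List


def search(
    query: str,
    documents: List[str],
    is_zawgyi: Callable[[str], bool],
    to_unicode: Callable[[str], str],
) -> List[str]:
    """Return the documents containing the query, normalising the query first."""
    if is_zawgyi(query):
        query = to_unicode(query)
    return [doc for doc in documents if query in doc]


# Toy stand-ins (assumptions, not real tools).
def toy_is_zawgyi(text: str) -> bool:
    return text.startswith("\u1031")


def toy_to_unicode(text: str) -> str:
    # Move the leading vowel sign to just after the next character.
    return text[1:2] + text[:1] + text[2:]


documents = ["\u1000\u1031"]   # content stored in Unicode order
zawgyi_query = "\u1031\u1000"  # the same syllable typed on a Zawgyi keyboard

print(zawgyi_query in documents[0])  # naive match on the raw query
print(search(zawgyi_query, documents, toy_is_zawgyi, toy_to_unicode))
```

The naive match misses the document even though the user typed the same syllable. Converting the query before comparing is what lets the search find it.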


And think about whether your users will have to constantly switch between your app and a converter, copying and pasting text just to understand what the UI is saying. I’m not sure that’s the user flow you’re aiming for. Or whether your app (or their phones) can handle that constant switching.


And if it were me, I’m not sure I’d stick around long enough to find out the answer. Also, as we’re talking about what I’d do, I’m pretty sure I wouldn’t recommend your app to friends or family - there goes any hope you had of gaining users by word of mouth.


Burmese is spoken by 33 million people. 55 million people live in Myanmar.


Even if only 60% of speakers had access to the internet, that’s 20 million people. Even if only 50% of those had devices that worked with your website/app, that’s 10 million. Even if only 10% of those would use your website/app, that’s 1 million people. 1 million users you could have. But don’t.
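
The figures above are rounded at each step. As a quick check, here is the same funnel worked out exactly in Python, using only the numbers quoted in the text (the variable names are mine):

```python
speakers = 33_000_000  # Burmese speakers, from the figure above

online = speakers * 60 // 100    # 60% with internet access
compatible = online * 50 // 100  # 50% with devices that work with your app
users = compatible * 10 // 100   # 10% who would actually use it

print(f"{online:,} online, {compatible:,} compatible, {users:,} potential users")
```

The exact result is 990,000, which the text rounds up to 1 million.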

So what can you do about that?

I don’t have all the answers, but I have some solutions for you to consider. And I also don’t have the details on how you’d implement these solutions, but a quick Google would probably point you in the right direction.


  1. Detect the encoding and convert to Unicode. Now’s a good time to tell you that correctly detecting the encoding at all times is impossible. But you can use tools that make educated guesses about a text’s encoding, using algorithms trained on large volumes of text. chardet does this for byte-level encodings; for Burmese you’d need a detector built to tell Zawgyi from Unicode. Once detected, convert the text to Unicode and display it with a Unicode-compliant font.

  2. Use webfonts (only for websites). For websites, once the encoding is detected, you can use webfonts on your site so that each block of text is displayed correctly. By using webfonts, you jump the barriers of device limitations, as the fonts are loaded along with the page instead of having to be installed on the device. For your UI, this is easy as you don’t even need to detect the encoding - you know which one you used. But it is slightly trickier for displaying user-generated content, as you’ll need to detect the encoding first.

  3. Use bundled fonts (only for mobile apps). Some devices and operating systems don’t allow users to change or replace fonts - I’m looking at you, Apple. You can bundle the font within the app, but it can only be used for this one specific app. A better solution for you than it is for the users. But a solution nonetheless.

  4. Let users switch. This adds complexity to the client and the server. But you could give users the option to switch themselves, and trust them to know which one they need. And from what I understand of Myanmar, the majority of users are aware that there’s a text display issue, and will likely know what to do.

Think of it like the way you’d switch languages using a language selector - if the label of each language is in that language then the users know which to choose. You just need to make it obvious where the selector is. Hint: don’t bury it in settings where the user would have to go through multiple pages of text they can’t read.


  5. Don’t do anything. The final solution will limit your usability and reach. You can completely ignore this problem. And let the users decide whether they want to use your website or app.
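
Detection and a user-facing switch can work together. The sketch below is a rough heuristic in Python, not a production detector. It assumes two hints that a string is Zawgyi: a vowel sign E (U+1031) with no Myanmar character before it, and code points in the U+1060 to U+1097 range, which Zawgyi reuses for stacked glyphs. The names `looks_like_zawgyi` and `choose_encoding` are hypothetical.

```python
import re
from typing import Optional

# Assumed heuristic, for illustration only:
#  - vowel sign E (U+1031) at the start of the text or after a
#    non-Myanmar character, because Zawgyi stores it before the consonant;
#  - any code point from U+1060 to U+1097, which Zawgyi reuses.
_ZAWGYI_HINTS = re.compile("(?:^|[^\u1000-\u109F])\u1031|[\u1060-\u1097]")


def looks_like_zawgyi(text: str) -> bool:
    """Guess whether text is Zawgyi-encoded. This can be wrong."""
    return _ZAWGYI_HINTS.search(text) is not None


def choose_encoding(text: str, user_choice: Optional[str] = None) -> str:
    """Return 'zawgyi' or 'unicode'; an explicit user choice beats the guess."""
    if user_choice in ("zawgyi", "unicode"):
        return user_choice
    return "zawgyi" if looks_like_zawgyi(text) else "unicode"


print(choose_encoding("\u1031\u1000"))             # guessed from the text
print(choose_encoding("\u1031\u1000", "unicode"))  # user override wins
```

Letting the user’s choice override the guess matters because the guess will sometimes be wrong. For example, Unicode text in other languages of Myanmar, such as Shan, legitimately uses code points in that upper range and would be flagged as Zawgyi by this sketch.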

Conclusion

The people of Myanmar are used to this: they, like the rest of the world, slowly caught on to the implications of Zawgyi and Unicode. And they’ve found ways to deal with it:


  • Using converters. The flow of copying and pasting and switching between apps is not the most user-friendly but for a very long time, it was the only way to understand what international apps were saying.
  • User-loaded fonts. Once the tech-savvy caught on to what was happening, they’d install either the Unicode fonts or the Zawgyi fonts, depending on which device they bought. The main downside was that they could then use either international apps/websites (Unicode) or Burmese apps/websites (Zawgyi), but not both. The users would also have to download the font using precious wifi, which still isn’t what you’d consider affordable to the masses in Myanmar.
  • Store-loaded fonts. Once the phone shops picked up on this issue, they began offering to load the fonts in store when a device was bought. This got around the wifi issue as they could use copies of the same files. This type of service is similar to phone shops in Nigeria and India that pre-load specific applications, for the same goal of saving internet allowances.
  • Factory-loaded fonts. Once the manufacturing companies, most notably Chinese device manufacturers and Samsung, found out about Zawgyi and the market for phones with Zawgyi and Unicode, they started pre-loading the devices with both fonts. They looked at the numbers, the ones that I was telling you about earlier, and realised that even a small percentage of a 55 million population is significant.
  • Recognising symbols. As the Burmese people started using international apps and websites on their Zawgyi phones, they began to recognise the distorted characters. It’s like when the archeologists and historians began to decode the hieroglyphics. Which when you think about it, is really amazing. While some characters are unreadable, or replaced with empty space, others allowed the users to understand enough to get to specific pages, decide which text to paste into a converter or figure out how to change the settings.

So if you’re going to go for the do-nothing option, you’ll need to consider these user behaviours.


From colonisation to independence to insurgencies to the brink of industrialisation to economic mismanagement and diplomatic isolation to democratic reforms, Myanmar has had a tumultuous time over the last few decades.


As they emerge from their civil war and reform their country and economy, the internet will be more involved than just lending a helping hand. As they introduce new technologies and develop their own, it’s important to sit up and take notice of these 55 million people.  


And it’s important to recognise the technological and cultural differences that you’ll need to address to serve this user-base.

 

Sasha Nagarajah


Sasha is a self-professed data aficionado. She spends her days analysing data to understand how people all over the world use apps and the internet. And her nights practising for the next time someone suggests karaoke. Project manager. Former engineer. LinkedIn user.
