Internationalization Misconceptions

Code For Cash

In an increasingly global, multi-lingual, and multi-cultural world, it becomes ever more important that your software be usable by people who speak different languages. While you might think that making your application work well for non-English speakers simply requires you to do some string replacement and call it a day, there are a variety of pitfalls waiting for you if you try this.

Expanding your audience to multiple languages is either a major development effort on an existing product, or involves adding a lot of support infrastructure up front on a new product. It adds complexity to every phase of product ideation, planning, development, testing, documentation, support, marketing, and even (sometimes) deployment. In short, getting internationalization and localization right will completely change your development process in a profound way. Like many things that provide you with a significant advantage, it’s worth doing properly to make sure that you get the best result that you can.

Making your application capable of being localized to multiple language and cultural environments is almost a requirement these days. The world is getting smaller, and more interconnected.

Episode Breakdown

What is internationalization and localization?

“Localization” refers to the adaptation of a product, application, or document content to meet the language, cultural, and other requirements of a specific target market (a ‘locale’). Localization is sometimes written as L10N, where 10 is the number of letters between ‘l’ and ‘n’ in the word “Localization”. Localization is best thought of as the process of making your application available to users in a different locale.

“Internationalization” is the design and development of a product, application, or document content that ENABLES easy localization for target audiences that vary in culture, region, or language. You typically want to have your internationalization process well in hand before you start your localization process. This typically entails things such as separating user-facing assets from source code and loading the correct assets based on locale. Internationalization is best thought of as an internal process that allows your organization the flexibility to bring on new locales with as few barriers as possible.
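As a rough illustration of separating user-facing strings from code and loading them by locale, here is a minimal Python sketch. The catalog contents, the `translate` and `fallback_chain` names, and the fallback order are all assumptions made up for this example; a real application would more likely load per-locale resource files through an established library.

```python
# Hypothetical in-memory catalogs. A real app would load these from
# per-locale resource files kept outside the source code.
CATALOGS = {
    "en": {"greeting": "Hello, {name}!", "farewell": "Goodbye!"},
    "en-GB": {"farewell": "Cheerio!"},
    "de": {"greeting": "Hallo, {name}!"},
}
DEFAULT_LOCALE = "en"


def fallback_chain(locale):
    """Return the locales to try, most specific first: en-GB, then en, then the default."""
    chain = [locale]
    if "-" in locale:
        chain.append(locale.split("-", 1)[0])
    if DEFAULT_LOCALE not in chain:
        chain.append(DEFAULT_LOCALE)
    return chain


def translate(key, locale, **params):
    """Look up a user-facing string by key, falling back to broader locales."""
    for candidate in fallback_chain(locale):
        template = CATALOGS.get(candidate, {}).get(key)
        if template is not None:
            return template.format(**params)
    return key  # Last resort: show the key so a missing string is visible.
```

The code only ever refers to keys like `greeting`, so adding a locale means adding a catalog, not touching the source.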

Think of localization as a repeating process and internationalization as an ongoing process that supports it. Keep your app internationalizable and then you can more easily localize it. This probably doesn’t mean that localization is ever going to be a perfectly smooth process, even if the languages are very close together. Even American English and British English are different enough in places to cause problems.

Can’t I just use Google Translate?

Google’s translation is extremely lossy. Try the following: Enter a decent-sized piece of text into Google Translate and pick a commonly used language to translate into. Now hit the button to switch them. Does the English still match what you said? In general, it won’t. Now imagine if this was technical documentation.

If you are relying on an automatic translation service, people can tell. Automatic translations tend to miss common idioms. There are also concepts in other languages that we don’t really have in English.

Dates don’t change though, right?

If you went through our episodes on date and time you already know the answer to this one. The way dates are formatted varies widely. This impacts both dates that are input as well as those being output.

This can also get fun when you store what was originally entered and sort in dumb ways. You’d be surprised how frequently people do light string manipulation on dates and then store them in the database as strings, especially if the date isn’t particularly useful to the system (DOB). Then they try to sort them as strings and mayhem ensues as international clients get sorted in an unexpected way.
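To make the string-sorting problem concrete, here is a small Python sketch. The sample dates and the `parse_dmy` helper are invented for illustration, and the sketch assumes the strings were entered as day/month/year.

```python
from datetime import datetime

# Hypothetical dates of birth, stored as day/month/year strings the way
# a user outside the US might have typed them.
stored = ["03/11/1990", "25/01/1985", "14/07/1990"]

# Sorting the raw strings compares characters, not calendar dates.
sorted_as_text = sorted(stored)


def parse_dmy(text):
    """Parse a day/month/year string into a real date object."""
    return datetime.strptime(text, "%d/%m/%Y").date()


# Sorting on parsed dates gives chronological order.
sorted_as_dates = sorted(stored, key=parse_dmy)

# The same string is a different day depending on the assumed format.
as_dmy = datetime.strptime("03/11/1990", "%d/%m/%Y").date()  # 3 November
as_mdy = datetime.strptime("03/11/1990", "%m/%d/%Y").date()  # 11 March
```

Here the text sort puts the 1985 birthday last, and the last two lines show why the input format has to be known before the string can be trusted at all.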

But the keyboard codes are the same, right?

Also not true. For instance, letters that look the same may be in different places. If you have a keyboard from a non-Euro culture, you won’t even have that.

This doesn’t just impact how you read keystrokes, but also impacts your documentation, menu shortcuts, shortcut key combinations, etc. Even better, you probably need to look and see what common keyboard shortcuts look like for other countries. They may or may not be the same. By the way, Windows Notepad doesn’t do this correctly. Ctrl + V pastes. With a Cyrillic keyboard layout the key codes are the same, but the letters on the keys are different (Ctrl + М).

I can still sort stuff the same way….?

If you listened to our previous episode on the difficulties with strings, you also know the answer to this one. You have to assume that Unicode is going to be in the mix. Text may need to be printed from right to left.

It can very quickly become non-trivial to sort a bunch of names or other text, especially when the items in the list are intermixed between different locales. Turns out sort orders for characters that look the same aren’t preserved across cultures either. Also, unless you explicitly specify a collation and sort configuration, many databases make assumptions. This can also have implications in regards to case sensitivity.
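Here is a small Python sketch of how a default sort can surprise you. The names and the `rough_sort_key` helper are made up for this example, and the key function is a crude, locale-agnostic stand-in, not real collation.

```python
import unicodedata

names = ["Zoe", "Émile", "adam", "Ömer", "Bob"]

# The default sort compares code points: uppercase before lowercase,
# and accented letters after everything in the ASCII range.
by_code_point = sorted(names)


def rough_sort_key(text):
    """Crude, locale-agnostic key: case-fold and drop combining accents.

    This is an illustration only. Real collation rules differ per locale
    (for example, Swedish sorts 'ö' after 'z'), so production code should
    use a collation library or the database's collation settings.
    """
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


by_rough_key = sorted(names, key=rough_sort_key)
```

The code point sort gives Bob, Zoe, adam, Émile, Ömer, which no human would call alphabetical. The rough key looks better, but it is still wrong for any locale that treats accented letters as separate letters.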

At least I don’t have to change symbols, colors, and that kind of stuff, right?

Symbols are culturally dependent as well. For instance, a symbol of a woman in a skirt might be an appropriate emoticon in certain European cultures, while being scandalous at best in some other cultures. Many images, icons, and symbols either contain or are based on our own language. For instance, an upward pointing arrow with an “N” would be easily understood to mean “North” by most English speakers. A Polish compass has a “P” where the “N” would be on an English one. These subtle differences will come out in the images you use for core application concepts.

You also have to be careful in regards to colors. White in America is often associated with weddings, purity, that sort of thing, while red is a warning sign. In the Far East, white is often associated with death, while red is associated with luck and vitality. This means not only being required to change colors, but having to change the other colors that interact with them in your design system so that the system is usable.

Symbols are another bit of fun. Mailboxes, electrical outlets, hand gestures, and street signs vary between cultures. This means icons based on them will have to do the same, or you’ll risk confusing the users.

At least the legal requirements are the same?

The legal requirements are never the same. Even across states in the US, or countries in the EU, legal requirements around all kinds of things are different. This means things like security policies, data sharing rules, and contracts are in play and have to all be considered in the context of whatever culture you are targeting.

Sometimes the legal requirements will force major changes in design as well. For instance, you might not want to show the flag of Tibet in an application targeted for China. You might also run into legal trouble if you are showing things like national boundaries, timezone boundaries, or even satellite data if the country you are selling into has a different opinion on things. What’s more, you might as well consider the future and the past as being different countries as well, given that you can’t even count on words retaining their meaning across longer time periods.

At least everything will fit in my layout.

Nope. If you have a limited width layout, there is a German compound word that will break it and won’t wrap. You may also have issues with things like the height of characters, causing them to get cut off at one of the sides. To test, you’re more than likely going to have to literally look through every screen in your app, for every language you support.

Screens differ too. You may find that screen sizes and resolutions differ wildly, both on mobile and on the desktop in different regions. This means that you can’t assume that someone on a desktop necessarily has 4K (or sometimes even 1080p). You might even have a hard time finding a mobile device to test with in your country.

At least all the text goes left to right.

Unfortunately, no. Some languages are right to left. While you might reasonably conclude that this isn’t so bad, there are also a lot of other things you’ll have to worry about, such as margins, padding, wrapping, etc. If you have images inline with your text, this may make them look like nonsense based on how they are aligned.

These layout changes in text also impact the way everything else is laid out. For instance, if your layout has changed due to directional text changes, that means that the way you position things like tooltips may have to change as well. In tabular reports, this may also mess with things like column order, especially if the last column was freeform text.
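As a sketch of how a layout decision might key off text direction, the Python below guesses direction from the first strongly directional character in a string. The `base_direction` and `tooltip_side` functions and the tooltip rule itself are assumptions for illustration. Real UI toolkits usually expose a direction setting per locale, which is the better source of truth.

```python
import unicodedata


def base_direction(text):
    """Guess text direction from the first strongly directional character.

    Returns "rtl", "ltr", or "neutral" (no strong characters, e.g. digits).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-style or Arabic-style letters
            return "rtl"
        if bidi_class == "L":  # Latin, Cyrillic, CJK, and so on
            return "ltr"
    return "neutral"


def tooltip_side(text):
    """Hypothetical layout rule: tooltips open on the trailing side of the text."""
    return "left" if base_direction(text) == "rtl" else "right"
```

For example, `tooltip_side("Hello")` returns `"right"`, while a Hebrew or Arabic label flips it to `"left"`. Margins, padding, and column order need the same kind of switch.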

Well, at least math is a universal language and numbers are the same.

Well…kinda. While numbers are the same in many cases, the way they are formatted is not. For instance, in the Brazilian Portuguese locale, they use periods to break numbers apart into groups of three digits, where we use commas in US English, and a comma where we use a decimal point. Some locales may not group digits in the same way either; Indian English, for example, groups the digits after the first three in twos.

This gets even more interesting when you start having to worry about user input. When a Brazilian Portuguese user is visiting India and using his account on his boss’s computer (English-India), but the server being contacted is using US English, how do you determine what a decimal means? Bear in mind the impact this will have on any regular expressions you are using on front end validation. You’ll probably be switching those out too. This means very extensive testing, per language supported.
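Here is a minimal Python sketch of why the locale has to travel with the input. The `SEPARATORS` table and the `parse_number` function are hypothetical and cover only three locales. A real application would pull separators from proper locale data instead of hard-coding them.

```python
from decimal import Decimal

# Hypothetical separator table for illustration. Real code should take
# these values from locale data, not a hand-written dictionary.
SEPARATORS = {
    "en-US": {"group": ",", "decimal": "."},
    "pt-BR": {"group": ".", "decimal": ","},
    "en-IN": {"group": ",", "decimal": "."},
}


def parse_number(text, locale):
    """Parse user input using the separators of the locale it was typed in."""
    seps = SEPARATORS[locale]
    # Drop grouping separators first, then normalize the decimal separator.
    without_groups = text.strip().replace(seps["group"], "")
    return Decimal(without_groups.replace(seps["decimal"], "."))
```

With this sketch, `parse_number("1.234", "pt-BR")` is one thousand two hundred thirty-four, while `parse_number("1.234", "en-US")` is a little over one. The string alone cannot tell you which one the user meant, so the locale of the person typing (not the machine or the server) has to be known.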

Well, at least the translation only has to be done once by some guy we hire on Fiverr.

While this sounds like the misconception listed above about using Google Translate, it’s actually worse. People tend to hire translators for this stuff for the short term, but their software is constantly evolving. This means that the “international audience” ends up being behind on updates, support, security patches, etc.

Internationalization is a continual process that supports localization across time. This means that it is no more fire and forget than any other business process that has to support changing circumstances. It generally also means that developers will not be the primary people dealing with internationalized resources. Unless the developer speaks that language, it’s a waste of time. Instead, the developers have to make the system dynamic enough to have reasonable support for a variety of languages and they have to maintain things such that the support remains in place. QA, documentation staff, and the support team will also have to support other locales as well.

Well, at least my existing server infrastructure will work.

Maybe? However, there are probably assumptions that will become invalid in an internationalized application. You may have to have servers closer to the bulk of these users, since internationalization efforts tend to go along with trying to grab other markets in other countries. Your app may have architectural assumptions that have to change because of sudden increases in latency between servers. You may also not be able to store your data about users in another country outside of that country.

Your code structure may have to change as well. You may find that off-hours batch processing now has to happen at different times for different locales (rather than just doing stuff at 3:00 AM). You will have to consider security even more thoroughly than you have been. You’re a bigger target and likely have a big data transport pipe sitting in the middle of your infrastructure now. You are likely going to have to collate data for reporting purposes from a variety of sources, and if management expects up to the minute reporting on anything they either have to stop expecting that or pay the cost of that expectation.
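To illustrate the batch timing point, here is a small Python sketch that works out which regions are currently in their off-hours window. The region names, the fixed UTC offsets, and the `regions_due` function are assumptions for this example. Fixed offsets ignore daylight saving time, so real code should use a time zone database instead.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical regions with fixed UTC offsets in hours. Fixed offsets
# ignore daylight saving time, which a real scheduler must handle.
REGION_UTC_OFFSETS = {"us-east": -5, "brazil": -3, "india": 5.5, "japan": 9}
BATCH_HOUR_LOCAL = 3  # Run batch jobs during the 3:00 AM hour, local time.


def regions_due(now_utc):
    """Return the regions whose local clock is currently in the batch hour."""
    due = []
    for region, offset_hours in REGION_UTC_OFFSETS.items():
        local = now_utc.astimezone(timezone(timedelta(hours=offset_hours)))
        if local.hour == BATCH_HOUR_LOCAL:
            due.append(region)
    return due
```

Called once an hour with the current UTC time, this turns one 3:00 AM job into several, each firing when its own region is quiet.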

Book Club

The Healthy Programmer: Get Fit, Feel Better, and Keep Coding

Joe Kutner

Chapter 10 is titled Refactoring Your Fitness. In the intro, Kutner talks about John Gill, who is known for his contributions to rock climbing, specifically bouldering. He talks about how Gill developed an all-around exercise routine to train for climbing the boulders in North Georgia. The first section talks about warming up. He emphasizes the importance of preparing your body for intense exercise by increasing the blood flow to your muscles and improving the range of motion of your joints. He goes into how combining aerobic exercise with complex movements improves cognitive functioning. In the next section he discusses the dimensions of fitness: body composition, cardiovascular, flexibility, muscular endurance, and muscle strength. The following section discusses unit testing your fitness. In it he lists out several tests including run/walk a mile, push-ups, half sit-ups, sit-and-reach, and body composition. For each one he provides the 50th percentile based on age and sex as a goal. In the final section he talks about how to improve your abilities. In it he provides specific training regimens for cardio and strength training. Key takeaways are to warm up before exercising, participate in sports with complex movement, and know your BMI.

Tricks of the Trade

Think bigger.
