Internationalization (commonly abbreviated ‘i18n’) covers many areas beyond translating UI strings: it involves changing settings and defaults to match the customs and conventions of the locale a program is run in, for example days of the week, the format of people’s names, and currencies.
- Design projects to be internationalized from the beginning.
- Use gettext (not intltool) for string translation.
- Remember that all strings are in UTF-8, and may contain multi-byte characters.
- Programs cannot reasonably implement changing locales at runtime.
Documenting the whole process of preparing a project for internationalization is beyond the scope of this document, but some good guides exist:
- GNOME developer translation guidelines
- gtkmm translation guidelines (aimed at C++ programmers, but widely applicable to C programmers)
- GLib internationalization API reference
It is important to prepare a project for internationalization early in its lifetime; otherwise, non-internationalizable programming practices creep in and are hard to eliminate. One example is splitting a sentence into several strings that are translated separately, so that translators never see, or get to reorder, the sentence as a whole.
To add internationalization support to a project, follow the instructions here, which can be summarized as adding the following to configure.ac:
AM_GNU_GETTEXT_VERSION([0.19])
AM_GNU_GETTEXT([external])

GETTEXT_PACKAGE=AC_PACKAGE_TARNAME
AC_DEFINE_UNQUOTED(GETTEXT_PACKAGE, ["$GETTEXT_PACKAGE"], [Define to the Gettext package name])
AC_SUBST(GETTEXT_PACKAGE)
Note that intltool is outdated, and we only need to use gettext.
In Makefile.am, add po to SUBDIRS. Then create an empty po/POTFILES.in file (which will be modified when files are marked for translation), an empty po/LINGUAS file (which will be modified when extra translation languages are added), and a po/Makevars file with the following content:
DOMAIN = $(PACKAGE)-$(VERSION)

COPYRIGHT_HOLDER =
MSGID_BUGS_ADDRESS =
EXTRA_LOCALE_CATEGORIES =
PO_DEPENDS_ON_POT = no

XGETTEXT_OPTIONS = \
	--from-code=UTF-8 \
	--keyword=_ --flag=_:1:pass-c-format \
	--keyword=N_ --flag=N_:1:pass-c-format \
	--flag=g_log:3:c-format --flag=g_logv:3:c-format \
	--flag=g_error:1:c-format --flag=g_message:1:c-format \
	--flag=g_critical:1:c-format --flag=g_warning:1:c-format \
	--flag=g_print:1:c-format \
	--flag=g_printerr:1:c-format \
	--flag=g_strdup_printf:1:c-format --flag=g_strdup_vprintf:1:c-format \
	--flag=g_printf_string_upper_bound:1:c-format \
	--flag=g_snprintf:3:c-format --flag=g_vsnprintf:3:c-format \
	--flag=g_string_sprintf:2:c-format \
	--flag=g_string_sprintfa:2:c-format \
	--flag=g_scanner_error:2:c-format \
	--flag=g_scanner_warn:2:c-format

subdir = po
top_builddir = ..
These should be committed to git.
No other translation infrastructure files should be committed to git, especially not the following. See the module setup guidelines for more information.
All strings in GLib, unless otherwise specified, are in Unicode, encoded as UTF-8. They must be handled as such, which means all string manipulation must be done in terms of Unicode characters rather than bytes. In many cases, string manipulation functions do not need to differentiate between the two; manual array indexing is one situation where care is needed, since a byte offset can point into the middle of a multi-byte character.
GLib provides a set of UTF-8-safe versions of standard C string manipulation functions, which should always be used instead of the standard C ones.
When displaying sorted strings in the UI, care needs to be taken to ensure the strings are sorted using Unicode algorithms, rather than plain ASCII algorithms. This means using g_utf8_collate() rather than strcmp() to establish an order between two strings.
Furthermore, if a list is split into alphabetical sections with headings, the headings need to be generated using the current locale’s alphabet, rather than just the A–Z English alphabet. One approach is to extract the first character of each item’s name (using g_utf8_get_char_validated()) and then use it as a section heading if it is alphabetic (using g_unichar_isalpha()).
Changing locale at runtime is not safe, as it requires calling setenv(), which is explicitly not thread safe. In principle, it also involves more than just changing UI strings: it involves changing date formats, number formats, and the output of any code which is predicated on those. The impacts of changing locale can be far-reaching and subtle.
To change the locale of an application, the application has to be restarted.
When referring to languages (e.g. in configuration files or preferences), always use the ISO 639 language codes, as used by gettext.