Guidelines/Internationalization

From Apertis

Internationalization

Internationalization (commonly abbreviated ‘i18n’) covers many areas: beyond translating UI strings, it involves changing settings and defaults to match the customs and conventions of the locale a program is being run in, such as the days of the week, human name formats and currencies.

Summary

  • Design projects to be internationalized from the beginning. (#Basics)
  • Use gettext (not intltool) for string translation. (#Basics)
  • Remember that all strings are in UTF-8, and may contain multi-byte characters. (#Unicode)
  • Programs cannot reasonably implement changing locales at runtime. (#Changing locale)

Basics

Documenting the whole process of preparing a project for internationalisation is beyond the scope of this document, but some good guides exist:

It is important to prepare a project for internationalization early in its lifetime, otherwise non-internationalizable programming practices creep in and are hard to eliminate. A common example is splitting a single sentence across multiple translatable strings, which prevents translators from reordering the words to suit their language.
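As an illustration of the string-splitting problem, the following minimal sketch contrasts a fragmented message with a single translatable sentence. The function name format_open_error() and the message text are hypothetical; gettext() and the conventional _() macro are the standard gettext API.

```c
#include <libintl.h>
#include <stdio.h>

#define _(str) gettext (str)

/* Avoid: the sentence is split into fragments, so a translator sees each
 * piece without context and cannot reorder the words:
 *
 *   printf ("%s%s%s", _("Could not open file "), filename, _(" for reading"));
 */

/* Prefer: one complete sentence per translatable string. The %s
 * placeholder lets the translator put the filename wherever their
 * language needs it. Returns the value of snprintf(). */
int
format_open_error (char *buf, size_t buf_size, const char *filename)
{
  return snprintf (buf, buf_size,
                   _("Could not open file %s for reading"), filename);
}
```

When no translation catalogue is installed, gettext() returns the original English string, so the function remains usable during development.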

To add internationalization support to a project, follow the instructions here, which can be summarised as adding the following to configure.ac:

AM_GNU_GETTEXT_VERSION([0.19])
AM_GNU_GETTEXT([external])

GETTEXT_PACKAGE=AC_PACKAGE_TARNAME
AC_DEFINE_UNQUOTED(GETTEXT_PACKAGE, ["$GETTEXT_PACKAGE"], [Define to the Gettext package name])
AC_SUBST(GETTEXT_PACKAGE)

Note that intltool is outdated; gettext alone is sufficient.
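On the C side, the program has to initialise gettext once at startup using the GETTEXT_PACKAGE value defined above. The following is a minimal sketch, assuming the catalogues are installed under the GETTEXT_PACKAGE domain and that the build system passes a LOCALEDIR definition (an assumption, typically a -D flag in Makefile.am). The function name init_i18n() and the fallback values are hypothetical.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Normally GETTEXT_PACKAGE comes from config.h and LOCALEDIR from the
 * build system. The fallbacks below are placeholders so that this sketch
 * compiles on its own. */
#ifndef GETTEXT_PACKAGE
#define GETTEXT_PACKAGE "my-project"
#endif
#ifndef LOCALEDIR
#define LOCALEDIR "/usr/share/locale"
#endif

/* Call once at the very start of main(), before any threads are created.
 * Returns the active text domain, or NULL on failure. */
const char *
init_i18n (void)
{
  /* Adopt the locale from the environment (LANG, LC_*). */
  setlocale (LC_ALL, "");

  /* Tell gettext where the compiled .mo catalogues are installed. */
  if (bindtextdomain (GETTEXT_PACKAGE, LOCALEDIR) == NULL)
    return NULL;

  /* Always return translations as UTF-8, whatever the locale's encoding. */
  bind_textdomain_codeset (GETTEXT_PACKAGE, "UTF-8");

  /* Make this the default domain for plain gettext() calls. */
  return textdomain (GETTEXT_PACKAGE);
}
```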

Add po/Makefile.in to AC_CONFIG_FILES and po to SUBDIRS in Makefile.am. Then create an empty po/POTFILES.in file (which will be modified when files are marked for translation), an empty po/LINGUAS file (which will be modified when extra translation languages are added), and create po/Makevars containing:

DOMAIN = $(PACKAGE)-$(VERSION)
COPYRIGHT_HOLDER =
MSGID_BUGS_ADDRESS =
EXTRA_LOCALE_CATEGORIES =
PO_DEPENDS_ON_POT = no

XGETTEXT_OPTIONS = \
  --from-code=UTF-8 \
  --keyword=_ --flag=_:1:pass-c-format \
  --keyword=N_ --flag=N_:1:pass-c-format \
  --flag=g_log:3:c-format --flag=g_logv:3:c-format \
  --flag=g_error:1:c-format --flag=g_message:1:c-format \
  --flag=g_critical:1:c-format --flag=g_warning:1:c-format \
  --flag=g_print:1:c-format \
  --flag=g_printerr:1:c-format \
  --flag=g_strdup_printf:1:c-format --flag=g_strdup_vprintf:1:c-format \
  --flag=g_printf_string_upper_bound:1:c-format \
  --flag=g_snprintf:3:c-format --flag=g_vsnprintf:3:c-format \
  --flag=g_string_sprintf:2:c-format \
  --flag=g_string_sprintfa:2:c-format \
  --flag=g_scanner_error:2:c-format \
  --flag=g_scanner_warn:2:c-format

subdir = po
top_builddir = ..

These should be committed to git.

No other translation infrastructure files should be committed to git (see the module setup guidelines for more information). In particular, do not commit the following:

  • po/ChangeLog
  • po/Makefile.in.in
  • po/POTFILES
  • po/stamp-it
  • po/*.mo

Unicode

All strings in GLib, unless otherwise specified, are in Unicode, encoded as UTF-8. They must be handled as such, which means all string manipulation must be done in terms of Unicode characters, rather than bytes. In many cases, string manipulation functions do not need to differentiate between the two; manual array indexing is a situation where you should be careful.

GLib provides a set of UTF-8-safe versions of standard C string manipulation functions, which should always be used instead of the standard C ones.

Sorting strings

When displaying sorted strings in the UI, care needs to be taken to ensure the strings are sorted using Unicode algorithms, rather than plain ASCII algorithms. This means using g_utf8_collate() rather than strcmp() to establish an order between two strings.

Furthermore, if section headings need to be used for splitting a list into alphabetical sections, they need to be generated from the alphabet actually used by the items, rather than just the A–Z English alphabet. One approach is to extract the first character of each item’s name (using g_utf8_get_char_validated()) and then use it as a section heading if it is an alphabetic character (checked using g_unichar_isalpha(), which follows the Unicode character properties).

Changing locale

Changing locale at runtime is not safe, as it requires calling setenv(), which is explicitly not thread safe. It also involves more than just changing UI strings: date formats, number formats, and the output of any code which depends on them would all have to change too. The impacts of changing locale can be far-reaching and subtle.

To change the locale of an application, the application has to be restarted.

Language identifiers

When referring to languages (e.g. in configuration files or preferences), always use the ISO 639 language codes, as used by gettext.
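As an example of working with such codes, the following sketch extracts the language code from a POSIX-style locale name of the form language[_territory][.codeset][@modifier], which is the form gettext uses. The function name language_from_locale() is hypothetical, and the sketch does not check that the result is a registered ISO 639 code.

```c
#include <stddef.h>
#include <string.h>

/* Copy the language part of a locale name such as "pt_BR.UTF-8" into buf.
 * Returns 0 on success, or -1 if the language part is empty or does not
 * fit in buf (including the terminator). */
int
language_from_locale (const char *locale, char *buf, size_t buf_size)
{
  /* The language part ends at the first '_', '.' or '@'. */
  size_t len = strcspn (locale, "_.@");

  if (len == 0 || len >= buf_size)
    return -1;

  memcpy (buf, locale, len);
  buf[len] = '\0';
  return 0;
}
```

Note that special locale names such as "C" or "POSIX" pass through unchanged and are not language codes, so callers may want to handle them separately.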
