Internationalisation of scalable font backends.

Fonts in XFree86 : Internationalisation of scalable font backends.

3. Internationalisation of scalable font backends.

The scalable font backends (Type 1, Speedo, TrueType) can now automatically re-encode fonts to the encoding specified in the XLFD in `fonts.dir'. For example, a `fonts.dir' file can now contain entries for the Type 1 Courier font such as

cour.pfa -adobe-courier-medium-r-normal--0-0-0-0-m-0-iso8859-1
cour.pfa -adobe-courier-medium-r-normal--0-0-0-0-m-0-iso8859-2

which will lead to the font being recoded to ISO 8859-1 and ISO 8859-2 respectively.
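As an illustration of how the encoding can be read off such an entry, the following Python sketch splits a `fonts.dir' line into a file name and an XLFD, then takes the last two XLFD fields as the encoding name. The function names are hypothetical and do not belong to any X library, and the sketch assumes a well-formed, fully specified 14-field XLFD.

```python
def parse_fonts_dir_entry(line):
    """Split a fonts.dir entry into (font file name, XLFD name)."""
    file_name, xlfd = line.split(None, 1)
    return file_name, xlfd.strip()


def xlfd_encoding(xlfd):
    """Return the 'registry-encoding' suffix of a full XLFD name."""
    fields = xlfd.split("-")
    # A full XLFD starts with a dash and has 14 fields,
    # so splitting on '-' yields an empty string plus 14 pieces.
    if len(fields) != 15 or fields[0] != "":
        raise ValueError("not a full XLFD name: %r" % xlfd)
    return fields[13] + "-" + fields[14]


entry = "cour.pfa -adobe-courier-medium-r-normal--0-0-0-0-m-0-iso8859-2"
font_file, name = parse_fonts_dir_entry(entry)
requested = xlfd_encoding(name)  # the encoding the backend would re-encode to
```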

3.1. The `fontenc' layer

Three of the scalable backends (Type 1, Speedo, and the `freetype' TrueType backend) use a common `fontenc' layer for font re-encoding. This allows those backends to share their encoding data, and allows simple configuration of new locales independently of font type.

Please note: the X-TrueType (X-TT) backend does not use the `fontenc' layer, but instead uses its own method for font re-encoding. Readers only interested in X-TT may want to skip to Using symbol fonts, as the intervening information does not apply to X-TT. X-TT itself is described in more detail in X-TrueType.

In the `fontenc' layer, an encoding is defined by a name (such as `iso8859-1'), optionally a number of aliases (alternate names), and an ordered collection of mappings. A mapping defines the way the encoding can be mapped into one of the ``target'' encodings known to the `fontenc' layer; currently, those consist of Unicode, Adobe glyph names, and arbitrary TrueType `cmap's.

A number of encodings are hardwired into `fontenc', and are therefore always available; the hardcoded encodings cannot easily be redefined. These include:

`iso10646-1': Unicode;
`iso8859-1': ISO Latin-1 (Western Europe);
`iso8859-2': ISO Latin-2 (Eastern Europe);
`iso8859-3': ISO Latin-3 (Southern Europe);
`iso8859-4': ISO Latin-4 (Northern Europe);
`iso8859-5': ISO Cyrillic;
`iso8859-6': ISO Arabic;
`iso8859-7': ISO Greek;
`iso8859-8': ISO Hebrew;
`iso8859-9': ISO Latin-5 (Turkish);
`iso8859-10': ISO Latin-6 (Nordic);
`iso8859-15': ISO Latin-9, or Latin-0 (Revised Western-European);
`koi8-r': KOI8 Russian;
`koi8-u': KOI8 Ukrainian (see RFC 2319);
`koi8-ru': KOI8 Russian/Ukrainian;
`koi8-uni': KOI8 ``Unified'' (Russian, Ukrainian, and Byelorussian);
`koi8-e': KOI8 `European', ISO-IR-111, or ECMA-Cyrillic;
`microsoft-symbol' and `apple-roman': these are only likely to be useful with TrueType symbol fonts.

New encodings can be added by defining encoding files. When a font encoding is requested that the `fontenc' layer doesn't know about, the backend checks the directory in which the font file resides (not the directory with `fonts.dir'!) for a file named `encodings.dir'. If found, this file is scanned for the unknown encoding, and the requested encoding definition file is read in. The mkfontdir(1) utility, when invoked with the `-e' option followed by the name of a directory containing encoding files, can be used to automatically build `encodings.dir' files. See the mkfontdir(1) manpage for more details.

A number of predefined encoding files have been included with the distribution. Information on writing new encoding files can be found in Format of encodings directory files and Format of encoding files.

3.2. Backend-specific notes about fontenc

3.2.1. Type 1

The Type 1 backend first searches for a mapping with a target of PostScript. If one is found, it is used. If none is found, the backend searches for a mapping with target Unicode, which is then composed with a built-in table mapping codes to glyph names. Note that this table only covers part of the Unicode code points that have been assigned names by Adobe.

If neither a PostScript nor a Unicode mapping is found, the backend defaults to ISO 8859-1.

Specifying an encoding value of `adobe-fontspecific' disables the encoding mechanism. This is useful with symbol and wrongly encoded fonts (see below).

The Type 1 backend currently limits all encodings to 8-bit codes.

3.2.2. Speedo

The Speedo backend searches for a mapping with a target of Unicode, and uses it if found. If none is found, the backend defaults to ISO 8859-1.

The Speedo backend limits all encodings to 8-bit codes.

3.2.3. The `freetype' TrueType backend

The TrueType backend scans the mappings in order. Mappings with a target of PostScript are ignored; mappings with a TrueType or Unicode target are checked against all the cmaps in the file. The first applicable mapping is used.

Authors of encoding files to be used with the TrueType backend should ensure that mappings are mentioned in decreasing order of preference.
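The three selection policies described in this section can be compared side by side in a short Python sketch. It models an encoding's mappings as an ordered list of (target, table) pairs; this data model and all function names are hypothetical. The notes above do not say when a Unicode mapping counts as applicable to a TrueType font, so the sketch assumes it applies when the font has a cmap with platform 0 or with platform 3, encoding 1.

```python
# Targets are "postscript", "unicode", or ("cmap", platform_id, encoding_id).

def select_type1(mappings):
    """Prefer a PostScript mapping, then a Unicode one."""
    for wanted in ("postscript", "unicode"):
        for target, table in mappings:
            if target == wanted:
                return target, table
    return None  # caller falls back to ISO 8859-1


def select_speedo(mappings):
    """Use the first Unicode mapping, if any."""
    for target, table in mappings:
        if target == "unicode":
            return target, table
    return None  # caller falls back to ISO 8859-1


def is_unicode_cmap(cmap):
    # ASSUMPTION: which cmaps count as Unicode is not stated in the text.
    platform_id, encoding_id = cmap
    return platform_id == 0 or (platform_id, encoding_id) == (3, 1)


def select_truetype(mappings, font_cmaps):
    """Scan mappings in file order; the first applicable one wins."""
    for target, table in mappings:
        if target == "postscript":
            continue  # ignored by the TrueType backend
        if target == "unicode":
            if any(is_unicode_cmap(c) for c in font_cmaps):
                return target, table
        elif target[0] == "cmap" and (target[1], target[2]) in font_cmaps:
            return target, table
    return None


mappings = [
    ("postscript", {0x41: "A"}),
    (("cmap", 1, 0), {0x41: 0x41}),
    ("unicode", {0x41: 0x0041}),
]
```

With these mappings, a font offering both a (1, 0) and a (3, 1) cmap is served by the `cmap 1 0' mapping because it is listed first, which is why the order of mappings in the encoding file matters for this backend.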

3.3. Format of encodings directory files

In order to use a font in an encoding that the font backend does not know about, you need to have an `encodings.dir' file in the same directory as the font file used. `encodings.dir' has the same format as `fonts.dir'. Its first line specifies the number of encodings, while every subsequent line has two columns: the name of the encoding, and the name of the encoding file, which can be relative to the current directory or absolute. Every encoding name should agree with the encoding name defined in the encoding file. For example,

3
mulearabic-0 encodings/mulearabic-0.enc
mulearabic-1 encodings/mulearabic-1.enc
mulearabic-2 encodings/mulearabic-2.enc

Note that the name of an encoding must be specified in the encoding file's STARTENCODING or ALIAS line. It is not enough to create an `encodings.dir' entry.
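The layout of `encodings.dir' is simple enough to read with a few lines of Python. The sketch below is only an illustration with a hypothetical function name; it assumes that relative file names are resolved against the directory holding `encodings.dir', and it uses `posixpath' so that the result does not depend on the host platform.

```python
import posixpath


def parse_encodings_dir(text, base_dir):
    """Return a dict mapping encoding names to encoding file paths."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count = int(lines[0])  # first line: number of entries
    entries = {}
    for line in lines[1:1 + count]:
        name, path = line.split(None, 1)  # two whitespace-separated columns
        # join() leaves absolute paths untouched and anchors relative ones.
        entries[name] = posixpath.normpath(posixpath.join(base_dir, path.strip()))
    return entries


sample = """3
mulearabic-0 encodings/mulearabic-0.enc
mulearabic-1 encodings/mulearabic-1.enc
mulearabic-2 encodings/mulearabic-2.enc
"""
table = parse_encodings_dir(sample, "/usr/share/fonts/arabic")  # hypothetical directory
```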

If your platform supports it (it probably does), encoding files may be compressed or gzipped.

`encodings.dir' files are best maintained by the mkfontdir(1) utility. Please see the mkfontdir(1) manpage for more information.

3.4. Format of encoding files

The encoding files are ``free form,'' i.e. any string of whitespace is equivalent to a single space. Keywords are parsed in a non-case-sensitive manner, meaning that `size', `SIZE', and `SiZE' all parse as the same keyword; on the other hand, case is significant in glyph names.

Numbers can be written in decimal, as in `256', in hexadecimal, as in `0x100', or in octal, as in `0400'.

Comments are introduced by a hash sign `#'. A `#' may appear at any point in a line, and all characters following the `#' are ignored, up to the end of the line.
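The lexical rules above (free-form whitespace, case-insensitive keywords, three number notations, and `#' comments) can be sketched in Python as follows. The helper names are hypothetical. Note that Python's own `int(text, 0)' rejects a bare leading zero such as `0400', so the octal case is handled explicitly.

```python
def tokenize(line):
    """Drop a trailing '#' comment and split on any run of whitespace."""
    return line.split("#", 1)[0].split()


def keyword(tokens):
    """Keywords compare case-insensitively; glyph names must not be folded."""
    return tokens[0].upper() if tokens else None


def parse_number(token):
    """Accept decimal ('256'), hexadecimal ('0x100') and octal ('0400')."""
    t = token.lower()
    if t.startswith("0x"):
        return int(t, 16)
    if len(t) > 1 and t.startswith("0"):
        return int(t, 8)
    return int(t, 10)


tokens = tokenize("SiZE    0x100   # 256 codes")
```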

The encoding file starts with the definition of the name of the encoding, optionally followed by its alternate names (aliases):

STARTENCODING mulearabic-0
ALIAS arabic-0
ALIAS something-else

Each name of the encoding must be suitable for use in an XLFD font name, and must therefore contain exactly one dash `-'.

The encoding file may then optionally declare the size of the encoding. For a linear encoding (such as Mule Arabic, or ISO 8859-1), the SIZE line specifies the maximum code plus one:

SIZE 0x2B

For a matrix encoding, it should specify two numbers. The first is the number of the last row plus one; the second is the highest column number plus one. For example, in the case of `jisx0208.1990-0' (JIS X 0208(1990), double-byte encoding, high bit clear), it should be

SIZE 0x75 0x80

Codes outside the region defined by the SIZE line are assumed to be undefined. Encodings default to a linear encoding with a size of 256 (0x100). This means that you must declare the size of all 16-bit encodings.
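The effect of the SIZE line can be illustrated with a small range check in Python. The function name is hypothetical, and the sketch assumes that in a matrix encoding the high byte of a 16-bit code is the row and the low byte is the column, which the text above does not state explicitly.

```python
DEFAULT_SIZE = (0x100,)  # linear encoding with 256 codes


def code_in_range(code, size=DEFAULT_SIZE):
    """Return True if 'code' lies inside the region declared by SIZE.

    'size' is (n,) for a linear encoding, or (rows, columns) for a matrix
    encoding; each value is the highest valid index plus one.
    """
    if code < 0:
        return False
    if len(size) == 1:
        return code < size[0]
    # ASSUMPTION: row = high byte, column = low byte.
    row, col = code >> 8, code & 0xFF
    return row < size[0] and col < size[1]
```

Under these assumptions, `SIZE 0x2B' admits codes 0x00 to 0x2A, and `SIZE 0x75 0x80' admits codes whose row is at most 0x74 and whose column is at most 0x7F.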

What follows is one or more mapping sections. A mapping section starts with a `STARTMAPPING' line stating the target of the mapping. The target may be one of:

Unicode (ISO 10646):
STARTMAPPING unicode
a given TrueType `cmap':
STARTMAPPING cmap 3 1
PostScript glyph names:
STARTMAPPING postscript

Every line in a mapping section maps one code from the encoding being defined to the target of the mapping. In mappings with a Unicode or TrueType target, codes are mapped to codes:

0x21 0x0660
0x22 0x0661
...

As an abbreviation, it is possible to map a contiguous range of codes in a single line. A line consisting of three integers

start end target

is an abbreviation for the range of lines

start target
start+1 target+1
...
end target+end-start

For example, the line

0x2121 0x215F 0x8140

is an abbreviation for

0x2121 0x8140
0x2122 0x8141
...
0x215F 0x817E
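The range abbreviation can be expanded mechanically, as this Python sketch shows. The function name is hypothetical; it takes the already-parsed integers of one mapping line and returns the individual (code, target) pairs.

```python
def expand_mapping_line(numbers):
    """Expand one code-to-code mapping line into (code, target) pairs."""
    if len(numbers) == 2:
        code, target = numbers
        return [(code, target)]
    if len(numbers) == 3:
        start, end, target = numbers
        # Both ends are inclusive; the target advances in step with the code.
        return [(start + i, target + i) for i in range(end - start + 1)]
    raise ValueError("expected two or three integers")


pairs = expand_mapping_line([0x2121, 0x215F, 0x8140])
```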

Codes not listed are assumed to map through the identity (i.e. to the same numerical value). In order to override this default mapping, you may specify a range of codes to be undefined by using an `UNDEFINE' line:

UNDEFINE 0x00 0x2A

or, for a single code

UNDEFINE 0x1234

This works because later values override earlier ones.
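The interaction between the identity default, `UNDEFINE', and the rule that later lines win can be sketched as follows. The directive tuples and function names are a hypothetical in-memory representation, not the format itself, and the sketch applies only to code-to-code mappings (Unicode and cmap targets).

```python
UNDEFINED = None  # marker for a code that maps to nothing


def build_table(directives):
    """Apply directives in file order; later entries overwrite earlier ones.

    Each directive is ("map", start, end, target) or ("undefine", start, end),
    with inclusive ranges.
    """
    table = {}
    for directive in directives:
        if directive[0] == "map":
            _, start, end, target = directive
            for i in range(end - start + 1):
                table[start + i] = target + i
        else:
            _, start, end = directive
            for code in range(start, end + 1):
                table[code] = UNDEFINED
    return table


def lookup(table, code):
    """Codes never mentioned map through the identity."""
    return table.get(code, code)


table = build_table([
    ("undefine", 0x00, 0x2A),     # UNDEFINE 0x00 0x2A
    ("map", 0x21, 0x21, 0x0660),  # 0x21 0x0660, overrides the line above
])
```

Here code 0x21 ends up mapped to 0x0660 because its mapping line comes after the `UNDEFINE' line; reversing the two lines would leave it undefined.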

PostScript mappings are different. Every line in a PostScript mapping maps a code to a glyph name

0x41 A
0x42 B
...

and codes not explicitly listed are undefined.

A mapping section ends with an ENDMAPPING line

ENDMAPPING

After all the mappings have been defined, the file ends with an ENDENCODING line

ENDENCODING

Lines of the form

UNASSIGNED 0x00 0x1F

UNASSIGNED 0x1234

are ignored by the server, but may be used by supporting utilities.

In order to make future extensions to the format possible, lines starting with an unknown keyword are ignored, as are mapping sections with an unknown target.
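To show how the pieces of the format fit together, here is a rough line-oriented reader in Python. It is a sketch only, not the parser used by the `fontenc' layer: the function name, the returned dictionary layout, and the sample encoding `example-0' are all hypothetical, numbers are left as unparsed tokens, and there is no error handling for malformed lines.

```python
KNOWN_TARGETS = {"unicode", "postscript", "cmap"}


def parse_encoding_file(text):
    enc = {"name": None, "aliases": [], "size": None, "mappings": []}
    current = None  # mapping section being filled, or None
    for raw in text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # strip comment, split on whitespace
        if not tokens:
            continue
        key = tokens[0].upper()  # keywords are case-insensitive
        if key == "STARTENCODING":
            enc["name"] = tokens[1]
        elif key == "ALIAS":
            enc["aliases"].append(tokens[1])
        elif key == "SIZE":
            enc["size"] = tokens[1:]
        elif key == "STARTMAPPING":
            target = tuple(t.lower() for t in tokens[1:])
            if target and target[0] in KNOWN_TARGETS:
                current = {"target": target, "lines": []}
                enc["mappings"].append(current)
            else:
                current = None  # unknown target: skip the whole section
        elif key == "ENDMAPPING":
            current = None
        elif key == "ENDENCODING":
            break
        elif current is not None and (key == "UNDEFINE" or tokens[0][0].isdigit()):
            current["lines"].append(tokens)
        # Anything else (UNASSIGNED, unknown keywords) is ignored.
    return enc


SAMPLE = """\
# Hypothetical encoding, for illustration only
STARTENCODING example-0
ALIAS example-zero
SIZE 0x80
STARTMAPPING unicode
UNDEFINE 0x00 0x1F
0x21 0x0660
ENDMAPPING
STARTMAPPING klingon   # unknown target, skipped
0x21 0x22
ENDMAPPING
FROBNICATE 1 2         # unknown keyword, ignored
ENDENCODING
"""
enc = parse_encoding_file(SAMPLE)
```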

3.5. Using symbol fonts

Type 1 symbol fonts should be installed using the `adobe-fontspecific' encoding.

In an ideal world, all TrueType symbol fonts would be installed using one of the `microsoft-symbol' and `apple-roman' encodings. A number of symbol fonts, however, are not marked as such; such fonts should be installed using `microsoft-cp1252', or, for older fonts, `microsoft-win3.1'.

In order to guarantee consistent results (especially between Type 1 and TrueType versions of the same font), it is possible to define a special encoding for a given font. This has already been done for the `ZapfDingbats' font; see the file `encodings/adobe-dingbats.enc'.

3.6. Using badly encoded font files

A number of text fonts are incorrectly encoded. Incorrect encoding is sometimes done by design, in order to make a font for an exotic script appear like an ordinary Western text font. It is often due to the font designer's laziness or incompetence; in particular, most people seem to find it easier to invent idiosyncratic glyph names rather than follow the Adobe glyph list.

There are two ways of dealing with such fonts: using them with the encoding they were designed for, and creating an ad hoc encoding file.

Of course, most of the time the proper fix would be to hit the font designer very hard on the head with the PLRM (preferably the first edition, as it was published in hardcover).

3.6.1. Using fonts with the designer's encoding

In the case of Type 1 fonts, the font designer can specify a default encoding; this encoding is requested by using the `adobe-fontspecific' encoding in the XLFD name. Sometimes the font designer has omitted to specify a reasonable default encoding; in this case, you should experiment with `adobe-standard', `iso8859-1', `microsoft-cp1252', and `microsoft-win3.1' (`microsoft-symbol' doesn't make sense for Type 1 fonts).

TrueType fonts do not have a default encoding, and use of the Microsoft Symbol encoding yields strange results with text fonts on some (non-X11) platforms. However, most TrueType fonts are designed with either Microsoft or Apple platforms in mind, so one of `microsoft-cp1252', `microsoft-win3.1', or `apple-roman' should yield reasonable results.

3.6.2. Specifying an ad hoc encoding file

It is always possible to define an encoding file to put the glyphs in a font in any desired order. Again, see the `encodings/adobe-dingbats.enc' file for an example of how this is done.

3.6.3. Specifying font aliases

By following the directions above, you will find yourself with a number of fonts with unusual names -- specifying encodings such as `adobe-fontspecific', `microsoft-win3.1' etc. In order to use these fonts with standard applications, it may be useful to remap them to their proper names.

This is done by writing a `fonts.alias' file. The format of this file is similar to the format of the `fonts.dir' file, except that it maps XLFD names to XLFD names. A `fonts.alias' file might look as follows:

1
"-ogonki-alamakota-medium-r-normal--0-0-0-0-p-0-iso8859-2" \
"-ogonki-alamakota-medium-r-normal--0-0-0-0-p-0-adobe-fontspecific"

(both XLFD names on a single line). The syntax of the `fonts.alias' file is described in the mkfontdir(1) manual page.
