This document recommends how to mark the primary language(s) in an HTML document. It can be considered a clarification of the HTML 4.0 Specification [HTML40]; in particular, it does not contradict that specification. The objective is to establish a best practice in this field, where there is currently some confusion.
The lang attribute specifies the natural language.
This document is mostly concerned with how to specify the primary language(s) (there could be more than one) and the base language (there is only one) in HTML documents.
Some documents are bilingual, and a few are trilingual or n-lingual. Bilingual documents are usually short; i.e., a few paragraphs. N-lingual documents are usually very short; i.e., a few sentences.
The main reason for the existence of n-lingual documents is political; i.e., in certain situations it is not politically correct to assume a base language. A common practice is to have one small document that is a menu of languages, as on the Europa server of the European Commission [EUR].
Another way to choose the language is to set the preferred language(s) in the client (e.g., the browser). The client transmits the language(s) in the Accept-Language field of HTTP, and the server responds with an appropriate document. For example, the Spanish version will be presented if the language preferences (in the browser) are Spanish and French and the document is available (on the server) in French, German and Spanish.
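The selection described above can be sketched in Python. This is only an illustration, not the behaviour of any particular server: the function names `parse_accept_language` and `choose_version` are hypothetical, the handling of `q` weights follows the general HTTP convention, and falling back from a regional tag such as `es-MX` to `es` is an assumption that real servers may not share.

```python
from typing import List, Optional


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags of an Accept-Language value, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit weight counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # assumption: treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending weight, then by position in the header.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def choose_version(accept_language: str, available: List[str]) -> Optional[str]:
    """Pick the available language that best matches the client's preferences."""
    offered = {code.lower(): code for code in available}
    for tag in parse_accept_language(accept_language):
        if tag in offered:
            return offered[tag]
        primary = tag.split("-")[0]  # assumption: "es-MX" may be served as "es"
        if primary in offered:
            return offered[primary]
    return None  # no match; the server must fall back to some default


# The example from the text: preferences Spanish and French,
# document available in French, German and Spanish.
print(choose_version("es, fr", ["fr", "de", "es"]))
```

With the preferences and available versions from the example, the sketch selects the Spanish version because Spanish is listed first by the client.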
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Language" Content="fr">
<TITLE>Mon doc</TITLE>
</HEAD>
<BODY>
Je suis un Berlinois.
</BODY>
</HTML>
The value of the Content attribute of the META element is the same as the value of the Content-Language header in HTTP; i.e., a comma-separated list of language codes.
For example:
<META HTTP-EQUIV="Content-Language" Content="fr,en">
These language codes are the same as those used in the lang attribute of some HTML elements.
For example:
<BODY LANG=fr>
The language codes are defined in [RFC1766]. See also section 8.1.1, Language codes, of the HTML 4.0 Specification [HTML40] and [RFC2068].
The order of the languages in the Content-Language value is significant. The first language in the list is the base language of the document; i.e., any text not re-specified with the lang attribute is in the base language.
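As a minimal sketch of the ordering rule, the following Python function splits a Content-Language value into the base language and the full list of primary languages. The name `split_content_language` is hypothetical, and raising an error on an empty value is an assumption made for the illustration.

```python
from typing import List, Tuple


def split_content_language(value: str) -> Tuple[str, List[str]]:
    """Split a Content-Language value into (base language, primary languages)."""
    # The value is a comma-separated list; ignore surrounding whitespace and empty items.
    codes = [code.strip() for code in value.split(",") if code.strip()]
    if not codes:
        raise ValueError("Content-Language value lists no language codes")
    # The first code is the base language; the list as a whole names the primary languages.
    return codes[0], codes


base, primary = split_content_language("fr,en")
print(base, primary)
```

For the value `fr,en` used earlier, the base language is `fr` and the primary languages are `fr` and `en`.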
The META element should not list more than one language when a document contains only minor fragments in other languages. The rules to specify a document as monolingual, bilingual or n-lingual are the same as for printed books.
The reasons for recommending META, as opposed to the lang attribute on the HTML element, are:
A lang attribute in the HTML element overrides the language specified in the META element.
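The override can be illustrated with a short Python sketch built on the standard `html.parser` module. The class `LanguageScanner` and the function `base_language` are hypothetical names. The sketch resolves only the document-level base language and ignores lang attributes on other elements, so it is not a full implementation of the inheritance rules.

```python
from html.parser import HTMLParser
from typing import Optional


class LanguageScanner(HTMLParser):
    """Record the lang attribute of HTML and the Content-Language META, if present."""

    def __init__(self) -> None:
        super().__init__()
        self.html_lang: Optional[str] = None
        self.meta_languages: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            self.html_lang = attributes["lang"]
        elif tag == "meta":
            http_equiv = (attributes.get("http-equiv") or "").lower()
            if http_equiv == "content-language" and attributes.get("content"):
                self.meta_languages = attributes["content"]


def base_language(markup: str) -> Optional[str]:
    """Return the document's base language, or None if none is declared."""
    scanner = LanguageScanner()
    scanner.feed(markup)
    scanner.close()
    if scanner.html_lang:
        # A lang attribute on the HTML element overrides the META element.
        return scanner.html_lang.strip()
    if scanner.meta_languages:
        # Otherwise the first code in the META list is the base language.
        return scanner.meta_languages.split(",")[0].strip() or None
    return None


doc = ('<HTML LANG="de"><HEAD>'
       '<META HTTP-EQUIV="Content-Language" Content="fr,en">'
       '</HEAD><BODY></BODY></HTML>')
print(base_language(doc))
```

In this hypothetical document the META element lists `fr,en`, but the lang attribute on the HTML element takes precedence, so the sketch reports `de`.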
The inheritance rules are in section 8.1.2, Language information and text direction, of the HTML 4.0 Specification [HTML40].
In particular, thanks to