Diffstat (limited to 'docs/title.txt')
-rw-r--r-- | docs/title.txt | 60 |
1 files changed, 32 insertions, 28 deletions
diff --git a/docs/title.txt b/docs/title.txt
index fd449c544f9a..2432bae50b36 100644
--- a/docs/title.txt
+++ b/docs/title.txt
@@ -9,11 +9,10 @@ these forms and can be queried for the others, and for other
 attributes of the title. This is intended to be an
 immutable "value" class, so there are no mutator functions.
 
-To get a new instance, call one of the static factory
-methods Title::newFromURL(), Title::newFromDBKey(),
-or Title::newFromText(). Once instantiated, the
-other non-static accessor methods can be used, such as
-getText(), getDBKey(), getNamespace(), etc.
+To get a new instance, call Title::newFromText(). Once instantiated,
+the non-static accessor methods can be used, such as getText(),
+getDBKey(), getNamespace(), etc. Note that Title::newFromText() may
+return false if the text is illegal according to the rules below.
 
 The prefix rules: a title consists of an optional interwiki
 prefix (such as "m:" for meta or "de:" for German), followed
@@ -42,35 +41,40 @@ a namespace or interwiki prefix.
 
 An initial colon in a title listed in wiki text may however
 suppress special handling for interlanguage links, image links,
-and category links.
+and category links. It is also used to indicate the main
+namespace in template inclusions.
 
-Character mapping rules: Once prefixes have been stripped, the
-rest of the title processed this way: spaces and underscores are
-treated as equivalent and each is converted to the other in the
-appropriate context (underscore in URL and database keys, spaces
-in plain text). "Extended" characters in the 0x80..0xFF range
-are allowed in all places, and are valid characters. They are
-encoded in URLs. Other characters may be ASCII letters, digits,
-hyphen, comma, period, apostrophe, parentheses, and colon. No
-other ASCII characters are allowed, and will be deleted if found
-(they will probably cause a browser to misinterpret the URL).
-Extended characters are _not_ urlencoded when used as text or
-database keys.
+Once prefixes have been stripped, the rest of the title processed
+this way:
 
-Character encoding rules: TODO
+* Spaces and underscores are treated as equivalent and each
+is converted to the other in the appropriate context (underscore in
+URL and database keys, spaces in plain text).
+* Multiple consecutive spaces are converted to a single space.
+* Leading or trailing space is removed.
+* If $wgCapitalLinks is enabled (the default), the first letter is
+capitalised, using the capitalisation function of the content language
+object.
+* The unicode characters LRM (U+200E) and RLM (U+200F) are silently
+stripped.
+* Invalid UTF-8 sequences or instances of the replacement character
+(U+FFFD) are considered illegal.
+* A percent sign followed by two hexadecimal characters is illegal
+* Anything that looks like an XML/HTML character reference is illegal
+* Any character not matched by the $wgLegalTitleChars regex is illegal
+* Zero-length titles (after whitespace stripping) are illegal
 
-Canonical forms: the canonical form of a title will always be
-returned by the object. In this form, the first (and only the
-first) character of the namespace and title will be uppercased;
-the rest of the namespace will be lowercased, while the title
-will be left as is. The text form will use spaces, the URL and
-DBkey forms will use underscores. Interwiki prefixes are all
-lowercase. The namespace will use underscores when returned
-alone; it will use spaces only when attached to the text title.
+All titles except special pages must be less than 255 bytes when
+encoded with UTF-8, because that is the size of the database field.
+Special page titles may be up to 512 bytes.
+
+Note that Unicode Normal Form C (NFC) is enforced by MediaWiki's user
+interface input functions, and so titles will typically be in this
+form.
 
 getArticleID() needs some explanation: for "internal" articles,
 it should return the "page_id" field if the article exists, else
 it returns 0. For all external articles it returns 0. All of the
 IDs for all instances of Title created during a request are
 cached, so they can be looked up quickly while rendering wiki
-text with lots of internal links.
+text with lots of internal links. See linkcache.txt.
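
The title-processing rules added by this change can be illustrated with a rough sketch. It is written in Python for brevity and is not MediaWiki's PHP implementation: normalize_title(), to_db_key(), and the capital_links parameter are hypothetical names standing in for Title::newFromText(), getDBKey(), and $wgCapitalLinks. ILLEGAL_CHARS is an assumed placeholder for the complement of $wgLegalTitleChars, and the character-reference pattern is only an approximation. Prefix handling, the 512-byte limit for special pages, and content-language capitalisation are left out.

```python
import re

# Assumed placeholder for "not matched by $wgLegalTitleChars"; the real
# setting is a configurable regex and may differ from this set.
ILLEGAL_CHARS = re.compile(r"[#<>\[\]|{}\x00-\x1f\x7f]")
# A percent sign followed by two hexadecimal characters.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")
# Rough approximation of "looks like an XML/HTML character reference".
CHAR_REFERENCE = re.compile(r"&[A-Za-z0-9]+;|&#[0-9]+;|&#x[0-9A-Fa-f]+;")
# Non-special titles must be less than 255 bytes in UTF-8.
MAX_BYTES = 255


def normalize_title(text, capital_links=True):
    """Return the text form of a prefix-free title, or None if illegal."""
    # LRM (U+200E) and RLM (U+200F) are silently stripped.
    text = text.replace("\u200e", "").replace("\u200f", "")
    # Spaces and underscores are equivalent; collapse runs, trim the ends.
    text = re.sub(r"[ _]+", " ", text).strip(" ")
    # Zero-length titles (after whitespace stripping) are illegal.
    if text == "":
        return None
    # The replacement character marks invalid input.
    if "\ufffd" in text:
        return None
    if PERCENT_ESCAPE.search(text) or CHAR_REFERENCE.search(text):
        return None
    if ILLEGAL_CHARS.search(text):
        return None
    # Capitalise the first letter (plain str.upper() here, not the
    # content language's capitalisation function).
    if capital_links:
        text = text[0].upper() + text[1:]
    if len(text.encode("utf-8")) >= MAX_BYTES:
        return None
    return text


def to_db_key(text_form):
    """Database keys and URLs use underscores where text uses spaces."""
    return text_form.replace(" ", "_")


print(normalize_title("  foo__bar  baz "))  # Foo bar baz
print(to_db_key("Foo bar baz"))             # Foo_bar_baz
print(normalize_title("100%25"))            # None
```

Returning None for an illegal title mirrors the note that Title::newFromText() may return false, so callers have to check the result before using it.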