author: Brad Jorsch <bjorsch@wikimedia.org> 2013-08-27 15:28:52 -0400
committer: Tim Starling <tstarling@wikimedia.org> 2013-10-24 09:44:33 +0000
commit: c8006382739518fda279344638b8e263d84d72dc
tree: b7948d59b7c064088440eb96b335f2dcc28aa9e4 /languages/messages/MessagesId.php
parent: de7af7ac2c651d747221dd322fa9e40956681cb9
Improve linkprefix regular expressions
The regular expression in the linkprefix message is run against the entire page up to each wikilink, and is expected to capture one group having everything except the prefix and another having only the prefix. For long pages this winds up being a lot of text, so inefficient regular expressions are going to cause problems.

The current regex is this:

    /^(.*?)([a-zA-Z\\x80-\\xff]+)$/sD

This is not efficient: it will scan through the string trying to match against every run of one or more letters/non-ASCII characters, backtracking at every one except possibly the last. The only reason this hasn't been a huge problem everywhere is because only a few languages have this feature enabled.

This change replaces this with this regex:

    /^((?>.*(?<![a-zA-Z\\x80-\\xff])))(.+)$/sD

This is rather more efficient: it will grab the whole string (which is actually fast even for huge strings), then back off character by character until it finds one that isn't a letter/non-ASCII.

Note that the above could be simplified somewhat:

    /^((?>.*[^a-zA-Z\\x80-\\xff]|))(.+)$/sD

The performance improvement here is minor, and Gujarati, Church Slavic, Udmurt, and Ukrainian would still need the other style for their current implementations.

For Gujarati, we also use another regex trick: a look-behind assertion in PCRE must be fixed length, so something like (?<!a|bb) won't work. But that regex fragment is equivalent to (?<!a)(?<!bb) which is allowed, so we use that instead.

Bug: 52865
Change-Id: Iaa7eaa446b3f045a9ce970affcb2a889f44bdefd
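
As a rough illustration of the behaviour described above, the following sketch runs the old and new patterns against the same text and checks that both yield the same two capture groups. The helper name `splitLinkPrefix` and the sample strings are hypothetical and exist only for this example; this is not how the parser itself invokes the linkprefix message.

```php
<?php
// Patterns copied from the message file. In a single-quoted PHP string,
// '\\x80' yields the four characters \x80, which PCRE reads as a byte escape.
$oldPrefix = '/^(.*?)([a-zA-Z\\x80-\\xff]+)$/sD';
$newPrefix = '/^((?>.*(?<![a-zA-Z\\x80-\\xff])))(.+)$/sD';

/**
 * Hypothetical helper (not part of MediaWiki): split the text preceding a
 * wikilink into [everything except the prefix, the prefix], or return null
 * when the text does not end in a letter/non-ASCII run.
 */
function splitLinkPrefix(string $pattern, string $text): ?array {
    if (preg_match($pattern, $text, $m) === 1) {
        return [$m[1], $m[2]];
    }
    return null;
}

// Text ending in a run of letters: both patterns capture the same groups.
$before = "Lihat juga pra";
var_dump(splitLinkPrefix($oldPrefix, $before) === ["Lihat juga ", "pra"]); // bool(true)
var_dump(splitLinkPrefix($newPrefix, $before) === ["Lihat juga ", "pra"]); // bool(true)

// Text ending in a space: there is no prefix, so neither pattern matches.
var_dump(splitLinkPrefix($oldPrefix, "Lihat juga "));                      // NULL
var_dump(splitLinkPrefix($newPrefix, "Lihat juga "));                      // NULL
```

The results are the same; the difference lies in how the match is found. The old pattern retries from each earlier run of letters, while the new one takes the whole string first and then backs off from the end, with the atomic group `(?>...)` preventing any further backtracking once the boundary is found.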
Diffstat (limited to 'languages/messages/MessagesId.php')
-rw-r--r-- languages/messages/MessagesId.php | 2
1 files changed, 1 insertions, 1 deletions
diff --git a/languages/messages/MessagesId.php b/languages/messages/MessagesId.php
index e4442bf02097..2c3b33d42ea8 100644
--- a/languages/messages/MessagesId.php
+++ b/languages/messages/MessagesId.php
@@ -481,7 +481,7 @@ $messages = array(
'broken-file-category' => 'Halaman dengan gambar rusak',
'categoryviewer-pagedlinks' => '($1) ($2)',
-'linkprefix' => '/^(.*?)([a-zA-Z\\x80-\\xff]+)$/sD',
+'linkprefix' => '/^((?>.*(?<![a-zA-Z\\x80-\\xff])))(.+)$/sD',
'about' => 'Tentang',
'article' => 'Halaman isi',