path: root/includes/collation/IcuCollation.php
Commit message | Author | Age | Files | Lines
* Use type declaration on undocumented private functions | Umherirrender | 2025-04-02 | 1 | -1/+1
|
| Change-Id: I0d8d2237500ed6f18439410c902d47c42e4119bc
* Add uppercase collation for Inari Sámi | Jon Harald Søby | 2025-03-16 | 1 | -1/+1
|
| Add custom uppercase collation for Inari Sámi, which isn't properly supported in ICU (two letters are missing, so they appear at the end of the list if you try to use `uca-smn-u-kn`).
|
| Bug: T388970
| Change-Id: I3ec1966f1621dce0bb266281a54fc2c7cf08f405
* Add "Ё" to first letter list for Kazakh | Jon Harald Søby | 2025-02-14 | 1 | -1/+1
|
| This letter was missing from the list of first letter tailorings in IcuCollation.php, leading it to be sorted under "Е" when using 'uca-kk' and 'uca-kk-u-kn' collations.
|
| Bug: T384395
| Change-Id: I9183aedc77efd3e998e4a90f75072a97aed09077
* Use namespaced classes | Umherirrender | 2024-10-21 | 1 | -0/+1
|
| Changes to the use statements done automatically via script
| Addition of missing use statement done manually
|
| Change-Id: I73fb416573f5af600e529d224b5beb5d2e3d27d3
* fix: use objectcachefactory methods instead of deprecated objectcache methods | Irina Balaban | 2024-05-05 | 1 | -1/+3
|
| Bug: T363770
| Change-Id: Ie732f6925ec2b1316a60bebbe3c27f963c9dacb1
* Replace some more usages of deprecated MWException | Daimona Eaytoy | 2023-06-09 | 1 | -3/+2
|
| Bug: T328220
| Change-Id: I3c36835fbd90acc301731e2b33ae4815cd4b0cc5
* Clean up old ICU version checks | Kevin Israel | 2023-01-26 | 1 | -52/+0
|
| Since 1.36, intl has been a required PHP extension, and PHP 7.4 dropped support for ICU < 50.1 (Unicode 6.2), so:
|
| * In SpecialVersion, don't check whether INTL_ICU_VERSION is defined.
| * Remove check in the installer for outdated Unicode normalization. It was added over twelve years ago in r70126 (a21fb8651f20f1ef) with a comment that it should be kept up to date, but no one ever did.
| * Remove IcuCollation::getUnicodeVersionForICU(), which contained a long list of ICU versions and corresponding Unicode versions that had to be kept up to date manually. Instead use IntlChar::getUnicodeVersion(), which was added in PHP 7.0. There are no known callers outside core.
| * Remove LinkFilter::supportsIDN(), as ICU has had support for UTS#46 since version 4.6. There are no known callers outside core. Also remove $flags and $variant from the idn_to_utf8() call, which match PHP 7.4's defaults. (INTL_IDNA_VARIANT_2003 was the default in 7.3.)
| * Display the ICU and Unicode versions in the installer, just below the PHP version. The ICU version is shown on Special:Version near the PHP version, and it probably makes sense to show it there as well.
|
| Change-Id: Ibdfac1a6f46fd56b84de1140292e0ec863f043ee
* Use str_starts_with/str_ends_with/str_contains | Umherirrender | 2022-12-12 | 1 | -1/+1
|
| Use the new function in conditions to avoid creating substrings or to search the whole string
|
| Change-Id: Ibad6b1b447a4f62cceb34359231f88ebb967a90b
* IcuCollation: Add mappings for versions 70 and 71 | Reedy | 2022-08-28 | 1 | -0/+2
|
| Change-Id: I2fbb90b601beff17c9d222513cf79a2bfbee67ef
* IcuCollation: Fix some typos and a broken link in a comment | Kevin Israel | 2022-05-21 | 1 | -9/+9
|
| Change-Id: Ib0f9f8518c53a90733f4599eec1bf89dfc5420f5
* IcuCollation: Remove unnecessary rtrim() and unset() | Kevin Israel | 2022-05-21 | 1 | -19/+4
|
| - MediaWiki no longer supports PHP versions that are old enough to include the null terminator in the sort key, so there's no need to remove it using rtrim(). See T137642 for more information.
| - Simplify prefix comparison by setting the initial value of $prev to the empty string rather than false. A PHP array key can only be an integer or string, so there's no other reason to check for false.
| - $letterMap is unset to "Reduce memory usage before caching", which made sense at the time (r80443 / eaeea84b44dc). In 2a86b5a17a54, the code was refactored such that $letterMap goes out of scope before we save to the cache, so unset() should no longer be needed.
|
| Change-Id: Id99ab3a6a29d037912f4ab3e817dba52d2ac24e8
* Use str_starts_with/str_ends_with | Aryeh Gregor | 2022-05-02 | 1 | -1/+1
|
| All the other ways of doing it were ridiculous and much harder to read, and usually required repeating the needle expression (to get its length).
|
| I found these occurrences by grepping for various expressions, but I undoubtedly missed some. I didn't try replacing the many instances of strpos(...) === 0 with str_starts_with(...), because I think they're readable enough as-is (although less efficient). Likewise I didn't try porting strpos(...) !== false to str_contains(...).
|
| For case-insensitive comparisons, Tim Starling requested that we stick with substr_compare() because it's more efficient than calling strtolower().
|
| On PHP < 8 these functions will be included with a polyfill via vendor/autoload.php. This is included at the beginning of includes/AutoLoader.php, so if our autoloader has been included the polyfill will be available. This means it should be safe to call these functions from any code that would not be usable without our autoloader.
|
| Three uses that Tim Starling identified as being performance-sensitive have been split out to a separate commit for porting after the switch to PHP 8.
|
| Change-Id: I113a8d052b6845852c15969a2f0e6fbbe3e9f8d9
* phan: Disable null_casts_as_any_type setting | Umherirrender | 2022-03-21 | 1 | -0/+1
|
| Make phan stricter about null types by setting null_casts_as_any_type to false (the default in mediawiki-phan-config)
|
| Remaining false positive issues are suppressed. The suppression and the setting change can only be done together
|
| Bug: T242536
| Bug: T301991
| Change-Id: I0f295382b96fb3be8037a01c10487d9d591e7e01
* IcuCollation cleanup | Tim Starling | 2021-12-01 | 1 | -69/+22
|
| * Fix calls to deprecated function utf8ToCodepoint()
| * Use a closure in ArrayUtils::findLowerBound() and optimise the code.
| * Fix the fixme and remove unused method getPrecompiledData().
| * Use __DIR__ not $IP when fetching data.
| * Make getFirstLetterData() and getPrimarySortKey() private. Remove getLetterByIndex(), getSortKeyByLetterIndex(), getFirstLetterCount(). Confirmed no callers.
|
| Partly based on Ppchelko's I9204bb3f633c249519f8d20d6a033ffeb8cce758.
|
| Change-Id: I66ba384c863928ca0c2742a1523d02bd78f241f9
* collation: Improve IcuCollation for static code analyzer | Umherirrender | 2021-11-08 | 1 | -2/+3
|
| phan says that $this->mainCollator is not documented to get null assigned. Use a local variable to check for null and then set the class property.
|
| Change-Id: I000c935da8d99184f2ae0382fc5caac81e80c8d7
* Update link target | Meno25 | 2021-10-16 | 1 | -2/+2
|
| Link redirected on website
|
| Change-Id: Ib7dafdf63dfe1240b2f78c9296999db6ef454ed3
* IcuCollation: Add some more icu to unicode version mappings | Reedy | 2021-10-01 | 1 | -0/+2
|
| Change-Id: I08cf93e45a6422e819ba333e01a5b34e1c03a398
* Inject services into Collation classes | DannyS712 | 2021-07-21 | 1 | -4/+10
|
| Might be worth converting Collation::singleton/::factory to a service at some point...
|
| Change-Id: Ifc96f851e6091ce834dbaf0e91695c648a42169c
* Deprecate a bunch of global functions | DannyS712 | 2020-12-18 | 1 | -1/+21
|
| * wfAcceptToPrefs
| * wfClearOutputBuffers
| * wfConfiguredReadOnlyReason
| * wfDebugMem
| * wfGetPrecompiledData
| * wfNegotiateType
|
| Bug: T264976
| Bug: T264979
| Bug: T264981
| Bug: T264983
| Bug: T264984
| Bug: T264985
| Change-Id: Ia05bc84e4d1be7c8a02472f32e2c009e4bb32032
* Remove some checks for extension_loaded( 'intl' ) | Reedy | 2020-11-29 | 1 | -5/+0
|
| ext-intl is required by MW now, so composer will enforce requirement
|
| Bug: T267669
| Change-Id: I6fcc19e06b95e58def4364a24b8de100dd6f3f90
* Use consts in IcuCollation class | Umherirrender | 2020-11-21 | 1 | -7/+7
|
| Change-Id: I664e7ea57b98975a3ff1c0c78477c18eb56837b4
* Language: hard deprecate the `noSeparators` parameter to ::formatNum | C. Scott Ananian | 2020-10-21 | 1 | -1/+1
|
| Code should use Language::formatNumNoSeparators() instead, which has existed since MW 1.21.
|
| Code search: https://codesearch.wmcloud.org/search/?q=formatNum%5C%28%5B%5E%29%5D*%2C&i=nope&files=&repos=
|
| Depends-On: I95c365e2535bb3c47bed69a9b702c8f13d9fab87
| Depends-On: I012434d5f6c749fec45a6c160e8d5d03686192e9
| Depends-On: If3de5645a92514f605d4117fea3a820ed6c86624
| Change-Id: I58a66975e505f16d8db5d663a9ca225535277983
* Remove terminating line breaks from debug messages | Tim Starling | 2020-06-03 | 1 | -2/+2
|
| A terminating line break has not been required in wfDebug() since 2014, however no migration was done. Some of these line breaks found their way into LoggerInterface::debug() calls, where they mess up the formatting of the debug log. So, remove terminating line breaks from wfDebug() and LoggerInterface::debug() calls.
|
| Also:
|
| * Fix the stripping of leading line breaks from the log header emitted by Setup.php. This feature, accidentally broken in 2014, allows requests to be distinguished in the log file.
| * Avoid using the global variable $self.
| * Move the logging of the client IP back to Setup.php. It was moved to WebRequest in the hopes that it would not always be needed, however $wgRequest->getIP() is now called unconditionally a few lines up in Setup.php. This means that it is put in its proper place after the "start request" message.
| * Wrap the log header code in a closure so that variables like $name do not leak into global scope.
| * In Linker.php, remove a few instances of an unnecessary second parameter to wfDebug().
|
| Change-Id: I96651d3044a95b9d210b51cb8368edc76bebbb9e
* Fix numerous PSR12.Properties.ConstantVisibility.NotFound | Reedy | 2020-05-11 | 1 | -6/+1
|
| Change-Id: I9b08bde11727f47e262f5f7f422eac5585ea7fca
* Add missing public visibility on some methods | Umherirrender | 2020-05-08 | 1 | -1/+1
|
| RSSFeed::formatTime and AtomFeed::formatTime are private
|
| Change-Id: I6bf081c31c92e7130ae0ae527ba4a8f4635c7de2
* collation: Add 64-67 ICU->Unicode mappings | Reedy | 2020-03-21 | 1 | -0/+4
|
| 67 not released yet, but due next month according to schedule
|
| Change-Id: I3dedc025e9800bc46040fc606af2b16eb52841a0
* Avoid PHP scalar type juggling in includes/ (part 2) | Daimona Eaytoy | 2019-12-30 | 1 | -1/+1
|
| Continuation of e5444ea55a8000f0040.
|
| Change-Id: I9f95e7de4e219dee3abcdd210bb708d949f378d0
* Remove Language::factory and getParentLanguage use | Aryeh Gregor | 2019-10-27 | 1 | -1/+4
|
| Change-Id: I11f8801ef47ec1a1f63d840116e69667e6f3ae3c
* Remove several methods, deprecated in 1.32 | Derick Alangi | 2019-05-09 | 1 | -17/+0
|
| I've checked and double-checked that these methods are no longer used anywhere in core or extensions, hence removed them. They were hard deprecated in MediaWiki 1.32.
|
| * OutputPage:
| ** `::showFileCopyError()`
| ** `::showFileRenameError()`
| ** `::showFileDeleteError()`
| ** `::showFileNotFoundError()`
| * ApiBase:
| ** `::truncateArray()`
| * IcuCollation:
| ** `::getICUVersion()`
| * HTMLForm:
| ** `::setSubmitProgressive()`
| * ResourceLoaderStartUpModules:
| ** `::getStartupModules()`
| ** `::getLegacyModules()`
| * BaseTemplate:
| ** `::msgHtml()`
| * QuickTemplate:
| ** `::msgHtml()`
| * WatchAction:
| ** `::getUnwatchToken()`
|
| Bug: T220656
| Change-Id: Ic1a723a991f4ff63fcb5f045ddcda18d1f8c3c68
* Collapse some nested if statements | Reedy | 2019-04-04 | 1 | -9/+7
|
| Change-Id: I9a97325d738d09370d29d35d5254bc0dadc57ff4
* Make uca-tr use I as uppercase of dotless ı instead of reverse | Brian Wolff | 2019-02-20 | 1 | -1/+2
|
| The primary collision resolution makes the wrong choice
|
| Bug: T203158
| Change-Id: Id677476937cc6575950496767b50c1e8c21f2fbc
* Add ICU mapping for versions 62 and 63 | Reedy | 2018-10-18 | 1 | -0/+2
|
| Change-Id: I5e1238e856d4149c30806e6b2cb3619c0c9c1dbf
* Write Latin and other scripts with capital letter | Fomafix | 2018-10-05 | 1 | -2/+2
|
| Change-Id: I16c660e54191b63cd6eb3407cb00504665930c4e
* Remove xx-uca-et collation workaround | Pikne | 2018-09-11 | 1 | -1/+1
|
| Remove workaround introduced in I3e8031b9. No longer needed.
|
| Bug: T202977
| Change-Id: I39921ef83cddc33535b99bd9c0b75f8afb52ea9a
* collation: Move first-letters-root to includes/collation/data | Timo Tijhof | 2018-08-01 | 1 | -6/+12
|
| For consistency with other data files.
|
| Also, like the other data files:
|
| * For automated fetching of the Unicode files, move the steps from Makefile to a bash script.
| * Switch to a static array file format.
|
| Change-Id: If07487950a270283b8eaeda9a507e723ed2d89c4
* Use PHP 7 '??' operator instead of if-then-else | Fomafix | 2018-06-12 | 1 | -5/+1
|
| Change-Id: I790b86e2e9e3e41386144637659516a4bfca1cfe
* Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals | Bartosz Dziewoński | 2018-06-04 | 1 | -18/+18
|
| In cases where we're operating on text data (and not binary data), use e.g. "\u{00A0}" to refer directly to the Unicode character 'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h (which correspond to the UTF-8 encoding of that character). This makes it easier to look up those mysterious sequences, as not all are as recognizable as the no-break space.
|
| This is not enforced by PHP, but I think we should write those in uppercase and zero-padded to at least four characters, like the Unicode standard does.
|
| Note that not all "\xNN" escapes can be automatically replaced:
|
| * We can't use Unicode escapes for binary data that is not UTF-8 (e.g. in code converting from legacy encodings or testing the handling of invalid UTF-8 byte sequences).
| * '\xNN' escapes in regular expressions in single-quoted strings are actually handled by PCRE and have to be dealt with carefully (those regexps should probably be changed to use the /u modifier).
| * "\xNN" referring to ASCII characters ("\x7F" and lower) should probably be left as-is.
|
| The replacements in this commit were done semi-manually by piping the existing "\xNN" escapes through the following terrible Ruby script I devised:
|
|   chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
|   puts chars.split('').map{|char| '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}' }.join('')
|
| Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
* Use PHP 7 '<=>' operator in 'sort()' callbacks | Bartosz Dziewoński | 2018-05-30 | 1 | -2/+1
|
| `$a <=> $b` returns `-1` if `$a` is lesser, `1` if `$b` is lesser, and `0` if they are equal, which are exactly the values 'sort()' callbacks are supposed to return.
|
| It also enables the neat idiom `$a[x] <=> $b[x] ?: $a[y] <=> $b[y]` to sort arrays of objects first by 'x', and by 'y' if they are equal.
|
| * Replace a common pattern like `return $a < $b ? -1 : 1` with the new operator (and similar patterns with the variables, the numbers or the comparison inverted). Some of the uses were previously not correctly handling the variables being equal; this is now automatically fixed.
| * Also replace `return $a - $b`, which is equivalent to `return $a <=> $b` if both variables are integers but less intuitive.
| * (Do not replace `return strcmp( $a, $b )`. It is also equivalent when both variables are strings, but if any of the variables is not, 'strcmp()' converts it to a string before comparison, which could give different results than '<=>', so changing this would require careful review and isn't worth it.)
| * Also replace `return $a > $b`, which presumably sort of works most of the time (returns `1` if `$b` is lesser, and `0` if they are equal or `$a` is lesser) but is erroneous.
|
| Change-Id: I19a3d2fc8fcdb208c10330bd7a42c4e05d7f5cf3
* Follow-up If8dfdaf1: Hard-deprecate, drop two uses, other pre-5.3 back-compat code | James D. Forrester | 2018-05-24 | 1 | -0/+1
|
| Change-Id: I1c5eee3fe30d6687d88e07011a3d40b6770d0daf
* Merge "Add unicode mapping for ICU 60 and 61" | jenkins-bot | 2018-05-24 | 1 | -0/+2
|\
| * Add unicode mapping for ICU 60 and 61 | Reedy | 2018-05-24 | 1 | -0/+2
| |
| | Change-Id: Ifbbc8d7ecc788bc2c6b07a8ebba46a9648545786
* | IcuCollation: Deprecate getICUVersion(), no need for PHP53 back-compat | James D. Forrester | 2018-05-24 | 1 | -10/+6
|/
|
| Change-Id: If8dfdaf187b32b7b9a2c09a240416b9f481593f1
* IcuCollation: Use codepoint as tiebreaker when getting first-letters | Bartosz Dziewoński | 2018-05-11 | 1 | -3/+11
|
| This prevents unexpected cuneiform digits from acting as headings for 2 and 3 on category pages.
|
| Bug: T187645
| Change-Id: I0424a24769899cb23b28704f97e1002fa44999fd
* Improve some parameter docs | Umherirrender | 2018-01-07 | 1 | -1/+0
|
| Change-Id: I31e983d7ac287158101b18ad95779d83537302a2
* Add Unicode to ICU mappings for versions 58 and 59 | Reedy | 2017-10-25 | 1 | -0/+2
|
| Change-Id: I87a5e6ce3a44a2be1e6bf8adf2f98cd0a4745574
* Improve some parameter docs | Umherirrender | 2017-09-10 | 1 | -0/+8
|
| Add missing @return and @param to function docs and fixed some @param
|
| Change-Id: I810727961057cfdcc274428b239af5975c57468d
* Use short type bool/int in param documentation | Umherirrender | 2017-08-20 | 1 | -1/+1
|
| Enable the phpcs sniffs for this and used phpcbf
|
| Change-Id: Iaa36687154ddd2bf663b9dd519f5c99409d37925
* build: Update mediawiki/mediawiki-codesniffer to 0.10.1 | Kunal Mehta | 2017-07-22 | 1 | -1/+1
|
| And auto-fix all errors.
|
| The `<exclude-pattern>` stanzas are now included in the default ruleset and don't need to be repeated.
|
| Change-Id: I928af549dc88ac2c6cb82058f64c7c7f3111598a
* IcuCollation: Fix diacritic characters for Aromanian (rup) and Moldovan (mo) headings | Bartosz Dziewoński | 2017-07-19 | 1 | -2/+2
|
| They should be Ș, Ț (comma-below) and instead they were cedilla-below (Ş, Ţ). Same as for Romanian (ro) in 486f64f28302ecceed04977180fd21470cb54c81.
|
| Both of these languages are unsupported by libicu and so the collations are unlikely to have been used in practice.
|
| Bug: T171043
| Bug: T171044
| Change-Id: Idd0d593e73cd784fbef7b75e8985f988f5555e26
* Update FIRST_LETTER_VERSION for rowiki changes | Brian Wolff | 2017-07-19 | 1 | -1/+1
|
| Can't just clear cache on production, as this now uses per-server apc instance.
|
| Follow-up 486f64f28302ecceed04977
|
| Change-Id: I88df6d5a91c86ef687543d1a6988e0ec050bbfce