path: root/includes/parser/Preprocessor_Hash.php
Commit message (Author, Age, Files, Lines)
* Various doc fixes about false and null on method arguments/return types (Umherirrender, 2022-11-03, 1 file, -1/+1)
|
| Doc-only changes
|
| Change-Id: Ice974b3ba41708859dfe646e94b31c5ebbf26410
* Use str_starts_with/str_ends_with (Aryeh Gregor, 2022-05-02, 1 file, -1/+1)
|
| All the other ways of doing it were ridiculous and much harder to read, and usually required repeating the needle expression (to get its length).
|
| I found these occurrences by grepping for various expressions, but I undoubtedly missed some.
|
| I didn't try replacing the many instances of strpos(...) === 0 with str_starts_with(...), because I think they're readable enough as-is (although less efficient). Likewise I didn't try porting strpos(...) !== false to str_contains(...).
|
| For case-insensitive comparisons, Tim Starling requested that we stick with substr_compare() because it's more efficient than calling strtolower().
|
| On PHP < 8 these functions will be included with a polyfill via vendor/autoload.php. This is included at the beginning of includes/AutoLoader.php, so if our autoloader has been included the polyfill will be available. This means it should be safe to call these functions from any code that would not be usable without our autoloader.
|
| Three uses that Tim Starling identified as being performance-sensitive have been split out to a separate commit for porting after the switch to PHP 8.
|
| Change-Id: I113a8d052b6845852c15969a2f0e6fbbe3e9f8d9
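As a rough illustration of the idioms this message contrasts, the sketch below compares the older substr()-based checks with str_starts_with()/str_ends_with(), and adds a case-insensitive prefix check via substr_compare(). The file name and needles are made-up sample values, not taken from the changed code.

```php
<?php
// Sample values are hypothetical; they only illustrate the idioms.
$name = 'Preprocessor_Hash.php';
$prefix = 'Preprocessor';
$suffix = '.php';

// Older idiom: the needle has to be repeated to obtain its length.
$oldStarts = substr( $name, 0, strlen( $prefix ) ) === $prefix;
$oldEnds = substr( $name, -strlen( $suffix ) ) === $suffix;

// PHP 8 functions (available through a polyfill on older versions).
$newStarts = str_starts_with( $name, $prefix );
$newEnds = str_ends_with( $name, $suffix );

// Case-insensitive prefix check without calling strtolower().
$ciStarts = substr_compare( $name, 'PREPROCESSOR', 0, strlen( 'PREPROCESSOR' ), true ) === 0;

var_dump( $oldStarts === $newStarts, $oldEnds === $newEnds, $ciStarts );
```

Both styles agree on the result; the newer functions simply avoid repeating the needle.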
* Remove or replace usages of "sanity" (Reedy, 2021-11-19, 1 file, -1/+0)
|
| Bug: T254646
| Change-Id: I2b120f0b9c9e1dc1a6c216bfefa3f2463efe1001
* build: Update mediawiki/mediawiki-phan-config to 0.11.0 (Umherirrender, 2021-09-07, 1 file, -0/+1)
|
| Addition and remove of suppression needs to be done with the version update.
|
| Change-Id: I3288b3cefa744b507eadebb67b8ab08c86517c1c
* parser: convert Preprocessor to WANCache and inject dependencies (Aaron Schulz, 2021-01-11, 1 file, -49/+51)
|
| Make the caching logic use getWithSetCallback() and simplify the code given that there is only one Preprocessor subclass. Also, keep the cached values JSON serializable but rely on the serialization in BagOStuff instead for simplicity.
|
| Add related class constants for injecting preprocessor flags.
|
| Bug: T254608
| Change-Id: I72f9f0c0bc352ed5120469090c71294ff0c24999
* Replace $wgDisable{Lang,Title}Conversion with LanguageConverterFactory methods (C. Scott Ananian, 2020-11-25, 1 file, -2/+5)
|
| Replace direct access to $wgDisableLangConversion with LanguageConverterFactory::isConversionDisabled(), and replace direct access to $wgDisableTitleConversion with LanguageConverterFactory::isTitleConversionDisabled(). However, most places that check ::isTitleConversionDisabled() actually want ::isLinkConversionDisabled(), so add that too (and deprecate isTitleConversionDisabled()).
|
| Code search: https://codesearch.wmcloud.org/search/?q=Disable%28Lang|Title%29Conversion&i=nope&files=&repos=
|
| This change removes a number of spurious dependencies on the global configuration and reduces code duplication (for example, if the logic for disabling language conversion were ever to change).
|
| Depends-On: I6fa8230ae97b0e34c381003548e61f9b7387d363
| Change-Id: Icc4687638ff1815003dd903854efdbd904854f1e
* Fix PHP 8 compat with strcspn() $length parameter exceeding string (Florian, 2020-10-04, 1 file, -1/+2)
|
| Bug: T264502
| Change-Id: I25cb8f56e6f56a9233e38844dc62cda9a06cb5e6
* Fix even more PSR12.Properties.ConstantVisibility.NotFound (Reedy, 2020-05-16, 1 file, -2/+2)
|
| Change-Id: I6d98efcfac1f1c0ab6a442e0af6d5daa6ef7801a
* parser: Declare some dynamic properties (Daimona Eaytoy, 2019-09-08, 1 file, -6/+0)
|
| Mostly via the @property annotation. This is to make phan a little happier.
|
| Change-Id: I3fde33955240dab20870821e9db93caba163845b
* Unsuppress another phan issue (part 7) (Daimona Eaytoy, 2019-09-03, 1 file, -0/+2)
|
| Bug: T231636
| Depends-On: I2cd24e73726394e3200a570c45d5e86b6849bfa9
| Depends-On: I4fa3e6aad872434ca397325ed7a83f94973661d0
| Change-Id: Ie6233561de78457cae5e4e44e220feec2d1272d8
* Improve type hints in parser related classes (Umherirrender, 2019-07-05, 1 file, -0/+3)
|
| Change-Id: Ia07a2eb32894f96b195fa3189fb5f617e68f2581
* Split parser related files to have one class in one file (Zoranzoki21, 2019-04-27, 1 file, -1456/+0)
|
| Change-Id: I36b26609ccb3f135a22961b32a46cdc06603b3e4
* parser: Fix return type for methods and match phpdoc comments (Derick Alangi, 2019-04-12, 1 file, -1/+3)
|
| Change-Id: I867d7eb6fc56cc52eb8e129977b7a62607a11268
* Get rid of unnecessary func_get_args() and friends (Aryeh Gregor, 2019-04-12, 1 file, -14/+8)
|
| HHVM does not support variadic arguments with type hints. This is mostly not a big problem, because we can just drop the type hint, but for some reason PHPUnit adds a type hint of "array" when it creates mocks, so a class with a variadic method can't be mocked (at least in some cases). As such, I left alone all the classes that seem like someone might like to mock them, like Title and User. If anyone wants to mock them in the future, they'll have to switch back to func_get_args(). Some of the changes are definitely safe, like functions and test classes.
|
| In most cases, func_get_args() (and/or func_get_arg(), func_num_args()) were only present because the code was written before we required PHP 5.6, and writing them as variadic functions is strictly superior.
|
| In some cases I left them alone, aside from HHVM compatibility:
|
| * Forwarding all arguments to another function. It's useful to keep func_get_args() here where we want to keep the list of expected arguments and their meanings in the function signature line for documentation purposes, but don't want to copy-paste a long line of argument names.
| * Handling deprecated calling conventions.
| * One or two miscellaneous cases where we're basically using the arguments individually but want to use them as an array as well for some reason.
|
| Change-Id: I066ec95a7beb7c0665146195a08e7cce1222c788
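The difference between the two calling styles can be sketched roughly as follows. All function names here are hypothetical and exist only for illustration; the variadic parameter is left without a type hint, in line with the HHVM limitation mentioned in the message.

```php
<?php
// Hypothetical helper names, for illustration only.

// Pre-PHP 5.6 style: the signature hides the fact that arguments are accepted.
function joinArgsOld() {
	$args = func_get_args();
	return implode( ', ', $args );
}

// Variadic style: the signature documents the argument list directly.
function joinArgsNew( ...$args ) {
	return implode( ', ', $args );
}

// Forwarding case: the named parameters stay in the signature for
// documentation, and func_get_args() passes them on without retyping them.
function describe( $title, $user ) {
	return joinArgsNew( ...func_get_args() );
}

echo describe( 'Main Page', 'Alice' ), "\n";
```

The forwarding function shows why func_get_args() can still be worth keeping: the signature stays readable while the argument list is passed along unchanged.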
* Use PHP 7 '??' operator instead of if-then-else (Fomafix, 2018-10-21, 1 file, -4/+1)
|
| Change-Id: If9d4be5d88c8927f63cbb84dfc8181baf62ea3eb
* build: Updating mediawiki/mediawiki-codesniffer to 22.0.0 (Umherirrender, 2018-09-16, 1 file, -3/+3)
|
| Added spaces around .
| Removed empty return statement which are not required
| Removed return after phpunit markTestIncomplete, which is throwing to exit the test, no need for a return
|
| Change-Id: I2c80b965ee52ba09949e70ea9e7adfc58a1d89ce
* Use PHP 7 '??' operator instead of '?:' with 'isset()' where convenient (Bartosz Dziewoński, 2018-05-30, 1 file, -1/+1)
|
| Find: /isset\(\s*([^()]+?)\s*\)\s*\?\s*\1\s*:\s*/
| Replace with: '\1 ?? '
|
| (Everywhere except includes/PHPVersionCheck.php)
| (Then, manually fix some line length and indentation issues)
|
| Then manually reviewed the replacements for cases where confusing operator precedence would result in incorrect results (fixing those in I478db046a1cc162c6767003ce45c9b56270f3372).
|
| Change-Id: I33b421c8cb11cdd4ce896488c9ff5313f03a38cf
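To see what the quoted find/replace pair does, it can be applied to a single line with preg_replace(). This is only a sketch: the input line is a made-up sample, and the actual change may well have been made with an editor or another tool rather than PHP.

```php
<?php
// The find/replace pair quoted in the message; the input line is a made-up sample.
$find = '/isset\(\s*([^()]+?)\s*\)\s*\?\s*\1\s*:\s*/';
$replace = '\1 ?? ';

$before = '$limit = isset( $opts[0] ) ? $opts[0] : 50;';
$after = preg_replace( $find, $replace, $before );

echo $after, "\n"; // prints: $limit = $opts[0] ?? 50;
```

The backreference \1 ensures the rewrite only fires when the expression tested by isset() is exactly the one returned, which is the case where '??' is a drop-in replacement.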
* Don't globally disable PHPCS's prohibition of assert() (Kunal Mehta, 2018-05-07, 1 file, -0/+2)
|
| Whitelist the remaining usages of assert(), and reinstate the PHPCS sniff that forbids usage of it.
|
| Add FIXME comments as well, so any casual readers of the code will not think that the disabling and usage is intentional.
|
| Change-Id: I7cabe715c0e6aa6a9ef3ffe5657f3de7fd8e662b
* Clarify -{ => {{ transition (Arlo Breault, 2018-03-15, 1 file, -17/+30)
|
| Ensure we have the correct rule on the stack.
|
| Change-Id: Ie814df7b759a2381be0b815eeefdb5d1f7adcde0
* Remove "dash" case in preprocessToObj (Arlo Breault, 2018-03-09, 1 file, -3/+0)
|
| This was introduced in 2877402 and removed in 186a182
|
| Change-Id: Ibfa1ae1597bfc50ae6ea49402c7966ca042f12e5
* Use ::class to resolve class names in includes files (Umherirrender, 2018-01-27, 1 file, -5/+5)
|
| This helps to find renamed or misspelled classes earlier. Phan will check the class names
|
| Change-Id: I07a925c2a9404b0865e8a8703864ded9d14aa769
* build: Updating mediawiki/mediawiki-codesniffer to 15.0.0 (Umherirrender, 2018-01-01, 1 file, -22/+11)
|
| Clean up use of @codingStandardsIgnore
| - @codingStandardsIgnoreFile -> phpcs:ignoreFile
| - @codingStandardsIgnoreLine -> phpcs:ignore
| - @codingStandardsIgnoreStart -> phpcs:disable
| - @codingStandardsIgnoreEnd -> phpcs:enable
|
| For phpcs:disable always the necessary sniffs are provided. Some start/end pairs are changed to line ignore
|
| Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
* Merge "Change php extract() to explicit code" (jenkins-bot, 2017-12-28, 1 file, -4/+40)
|\
| * Change php extract() to explicit code (Umherirrender, 2017-12-08, 1 file, -4/+40)
| |
| | Avoid php magic and make var settings more visible
| |
| | Change-Id: I223874fd871104b0ac6a80d7f39c6dd997d0551d
* | Require indentation of CASE statements in PHP code (Huji Lee, 2017-12-10, 1 file, -21/+21)
|/
| Bug: T182546
| Change-Id: I91a9555893a08e4ec58da97c6cc4d1e70000ff6b
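For the extract() change merged above, the general pattern can be sketched as below. The array keys and function names are hypothetical and are not copied from Preprocessor_Hash.php; the point is only that explicit assignments make it visible which variables exist.

```php
<?php
// Hypothetical rule array; the keys are illustrative only.
$rule = [ 'end' => '}', 'min' => 2, 'max' => 3 ];

// Before: extract() creates $end, $min and $max implicitly, so neither
// readers nor static analysis can see where $max comes from.
function maxWithExtract( array $rule ): int {
	extract( $rule );
	return $max;
}

// After: each variable is assigned explicitly from the array.
function maxExplicit( array $rule ): int {
	$max = $rule['max'];
	return $max;
}

var_dump( maxWithExtract( $rule ) === maxExplicit( $rule ) );
```

The explicit form is longer, which is consistent with the -4/+40 line count of the change, but every variable has a visible origin.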
* Improve some parameter docs (Umherirrender, 2017-09-10, 1 file, -0/+8)
|
| Add missing @return and @param to function docs and fixed some @param
|
| Change-Id: I810727961057cfdcc274428b239af5975c57468d
* Use short type bool/int in param documentation (Umherirrender, 2017-08-20, 1 file, -6/+6)
|
| Enable the phpcs sniffs for this and used phpcbf
|
| Change-Id: Iaa36687154ddd2bf663b9dd519f5c99409d37925
* update mediawiki-codesniffer to 0.11.0 and fix issues (WMDE-Fisch, 2017-08-11, 1 file, -1/+1)
|
| - mostly auto fixes
| - some too long lines fixed
| - ignore amp space in one case passing by reference
|
| Change-Id: I6472f83bc3cbf4bd629d83050cc3319b19ec465c
* Remove empty lines at begin of function, if, foreach, switch (Umherirrender, 2017-07-01, 1 file, -1/+0)
|
| Organize phpcs.xml a bit
|
| Change-Id: Ifb767729b481b4b686e6d6444cf48b1f580cc478
* Protect language converter markup in the preprocessor (take 2). (C. Scott Ananian, 2017-05-23, 1 file, -15/+48)
|
| This revises 28774022769d2273be16c6c6e1cca710a1fd97ef, which was reverted in master due to unexpected issues with `-{{...}}` markup on translatewiki and enwiki. Test cases are added to ensure that this is parsed as a template, not as language converter markup.
|
| https://www.mediawiki.org/wiki/Preprocessor_ABNF is the canonical documentation for the preprocessor; this will be updated after this patch is merged.
|
| The basic principles described in that page are maintained in this patch:
|
| * Rightmost opening structure has precedence: `-{{` is parsed as a dash followed by template opening.
| * `{{{` has precedence over `{{` and `-{`: `-{{{{` is parsed as `-{` `{{{` since we first grab the rightmost `{{{`.
|
| A bunch of test cases were added to verify the "ideal precedence" order described on that wiki page.
|
| This patch introduced some minor incompatibilities in existing markup, in particular with chemical formulae in templates. Fixes for these are being tracked at https://www.mediawiki.org/wiki/Parsoid/Language_conversion/Preprocessor_fixups
|
| Bug: T146304
| Bug: T153761
| Change-Id: I2f0c186c75e392c95e1a3d89266cae2586349150
* includes: Replace implicit Bugzilla bug numbers with Phab ones (James D. Forrester, 2017-02-21, 1 file, -1/+1)
|
| It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345" except where it doesn't, so let's just standardise on the real numbers.
|
| Change-Id: I6f59febaf8fc96e80f8cfc11f4356283f461142a
* Revert "Protect language converter markup in the preprocessor." (C. Scott Ananian, 2017-01-03, 1 file, -1/+2)
|
| This effectively reverts commit 28774022769d2273be16c6c6e1cca710a1fd97ef in order to unblock the deploy train. The underlying behavior might not be incorrect, but it was unexpected.
|
| Bug: T153761
| Change-Id: Ifc9c7cf3482dd5d222ff4da24a6d4cc401e9d965
* Protect language converter markup in the preprocessor. (C. Scott Ananian, 2016-12-15, 1 file, -6/+29)
|
| This ensures that `{{echo|-{R|foo}-}}` is parsed correctly as a template invocation with a single argument, not as two separate arguments split by the `|`.
|
| Bug: T146304
| Change-Id: I709d007c70a3fd19264790055042c615999b2f67
* Remove all assert() calls with string parameters (Tim Starling, 2016-08-15, 1 file, -1/+1)
|
| These fail when HHVM is in RepoAuthoritative mode
|
| Change-Id: Ifb1628f8269b2b651154b740b95cc14163a1b186
* Preprocessor_Hash: use child arrays instead of linked lists (Tim Starling, 2016-07-22, 1 file, -382/+455)
|
| The singly-linked list data structure of Preprocessor_Hash was causing stack exhaustion due to the need for a recursion depth proportional to the number of children of a given PPNode, in serialize() and on object destruction. So, switch to array-based storage.
|
| PPNode_* becomes a temporary proxy around the underlying storage, which avoids circular references and keeps the storage very compact. Preprocessor_DOM uses similar temporary PPNode objects, so the fact that $node->getFirstChild() !== $node->getFirstChild() should not cause any new problems.
|
| * Increment cache version
| * Use JSON serialization of the store array instead of serialize(), since JSON is more compact, even after gzipping.
| * For efficiency, make $accum a plain array, and use it as an array where possible, instead of using helper functions.
|
| Performance and memory usage for typical input are slightly improved: something like 4% faster for the whole parse, and 20% less memory for the tree.
|
| Bug: T73486
| Change-Id: I0d6c162b790d6dc1ddb0352aba6e4753854f4c56
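The idea of array-based child storage can be sketched as follows. The node shape used here (a [ name, children ] pair, with plain strings as text nodes) is an assumption made for illustration and may not match the real store format; the function name is hypothetical too.

```php
<?php
// Assumed node shape: [ name, children ]; text nodes are plain strings.
$tree = [ 'root', [
	'Hello ',
	[ 'template', [
		[ 'title', [ 'echo' ] ],
		[ 'part', [ 'world' ] ],
	] ],
] ];

// Siblings are visited with a loop, so the recursion depth follows the
// nesting depth of the tree rather than the number of children per node.
function countNodes( $node ): int {
	if ( is_string( $node ) ) {
		return 1;
	}
	$count = 1;
	foreach ( $node[1] as $child ) {
		$count += countNodes( $child );
	}
	return $count;
}

// Nested arrays of strings survive a JSON round trip unchanged.
$json = json_encode( $tree );
$restored = json_decode( $json, true );

var_dump( countNodes( $tree ), $restored === $tree );
```

With a linked list of siblings, a recursive walk or destructor would go one level deeper per sibling; with child arrays, a node with thousands of children is handled by one loop.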
* Preprocessor: Don't allow unclosed extension tags (matching until end of input) (Bartosz Dziewoński, 2016-04-05, 1 file, -5/+20)
|
| (Previously done in f51d0d9a819f8f1c181350ced2f015ce97985fcc and reverted in 543f46e9c08e0ff8c5e8b4e917fcc045730ef1bc.)
|
| I think it's saner to treat this as invalid syntax, and output the mismatched tag code verbatim. The current behavior is particularly annoying for <ref> tags, which often swallow everything afterwards.
|
| This does not affect HTML tags, though. Assuming Tidy is enabled, they are still auto-closed at the end of the page content. (For tags that "shadow" a HTML tag name, this results in the tag being treated as a HTML tag. This currently only affects <pre> tags: if unclosed, they are still displayed as preformatted text, but without suppressing wikitext formatting.)
|
| It also does not affect <includeonly>, <noinclude> and <onlyinclude> tags. Changing this behavior now would be too disruptive to existing content, and is the reason why previous attempt was reverted. (They are already special-cased enough that this isn't too weird, for example mismatched closing tags are hidden.)
|
| Related to T17712 and T58306. I think this brings the PHP parser closer to Parsoid's interpretation.
|
| It reduces performance somewhat in the worst case, though. Testing with https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with 3000 opening tags of 15 different types), parsing time rises from ~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN), where N is bytes of input and k is the number of types of tags present on the page. Maximum k shouldn't exceed 30 or so in reasonable setups (depends on installed extensions, it's 20 on English Wikipedia).
|
| Change-Id: Ide8b034e464eefb1b7c9e2a48ed06e21a7f8d434
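One way an O(kN) bound like the one described can arise is to remember, per tag name, that a search for its closing tag has already failed, so each tag type triggers at most one fruitless scan to the end of the input. The sketch below illustrates that idea only; the function and variable names are hypothetical and it is not a copy of the patch.

```php
<?php
// Returns the byte offsets of opening tags that have no closing tag.
// Illustrative only: attributes, self-closing tags and nesting are ignored.
function findUnclosedTags( string $text, array $tagNames ): array {
	$noMoreClosingTag = []; // tag name => true once a closer search has failed
	$unclosed = [];
	$pattern = '/<(' . implode( '|', array_map( 'preg_quote', $tagNames ) ) . ')>/i';
	$offset = 0;
	while ( preg_match( $pattern, $text, $m, PREG_OFFSET_CAPTURE, $offset ) ) {
		$name = strtolower( $m[1][0] );
		$afterOpen = $m[0][1] + strlen( $m[0][0] );
		// Skip the scan entirely if this tag type is known to have no closer left.
		$closePos = isset( $noMoreClosingTag[$name] )
			? false
			: stripos( $text, "</$name>", $afterOpen );
		if ( $closePos === false ) {
			$noMoreClosingTag[$name] = true;
			$unclosed[] = $m[0][1];
			$offset = $afterOpen;
		} else {
			$offset = $closePos + strlen( "</$name>" );
		}
	}
	return $unclosed;
}

print_r( findUnclosedTags( '<ref>a</ref><ref>b<ref>c', [ 'ref', 'pre' ] ) );
```

Successful searches advance past the closing tag, so they cost O(N) in total; failed searches cost at most one full scan per tag name, which gives the k factor.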
* Fix @param and @return types on all PPFrame::getArgument methods (Thiemo Mättig, 2016-03-29, 1 file, -8/+8)
|
| This is about template parameters. They can be indexed by position (int) or name (string). The returned value is always a string, or false (bool) on failure.
|
| Change-Id: I565210ad485505281246ef2bb3086a675b905976
* Fix unmatched @codingStandardsIgnore in parser folder (umherirrender, 2016-02-17, 1 file, -2/+2)
|
| Fix outstanding phpcs errors
|
| Change-Id: I7b857be88354f2ffa27d76406253ec9e9710b91d
* Convert all array() syntax to [] (Kunal Mehta, 2016-02-17, 1 file, -45/+45)
|
| Per wikitech-l consensus: https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html
|
| Notes:
| * Disabled CallTimePassByReference due to false positives (T127163)
|
| Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
* Revert "Preprocessor: Don't allow unclosed extension tags (matching until end of input)" (Legoktm, 2016-02-04, 1 file, -11/+5)
|
| This reverts commit f51d0d9a819f8f1c181350ced2f015ce97985fcc.
|
| Breaks templates with non-closed </noinclude> tags, which were previously acceptable.
|
| Bug: T125754
| Change-Id: I8bafb15eefac4e1d3e727c1c84782636d8b82c2b
* Preprocessor: Don't allow unclosed extension tags (matching until end of input) (Bartosz Dziewoński, 2016-01-21, 1 file, -5/+11)
|
| I think it's saner to treat this as invalid syntax, and output the mismatched tag code verbatim. The current behavior is particularly annoying for <ref> tags, which often swallow everything afterwards.
|
| This does not affect HTML tags, though. Assuming Tidy is enabled, they are still auto-closed at the end of the page content.
|
| Related to T17712 and T58306. I think this brings the PHP parser closer to Parsoid's interpretation.
|
| It reduces performance somewhat in the worst case, though. Testing with https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with 3000 opening tags of 15 different types), parsing time rises from ~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN), where N is bytes of input and k is the number of types of tags present on the page. Maximum k shouldn't exceed 30 or so in reasonable setups (depends on installed extensions, it's 20 on English Wikipedia).
|
| To consider:
|
| * Should we keep previous behavior for unclosed <includeonly> / <noinclude>? This would be particularly disruptive for these if someone relied on the old behavior, and they're already special-cased in places.
| * Unclosed <pre> tags are now treated as HTML tags, and are still displayed as preformatted text, but without suppressing wikitext formatting.
|
| Change-Id: Ia2f24dbfb3567c4b0778761585e6c0303d11ddd0
* Remove various double empty newlines (umherirrender, 2015-12-27, 1 file, -1/+0)
|
| The double empty newline is not needed between functions, variable or at end of file
|
| Change-Id: Ib866a95084c4601ac150a2b402cfa184ebc18afa
* Fix PPNode_Hash_Tree::getChildrenOfType return value (Brad Jorsch, 2015-12-17, 1 file, -1/+1)
|
| PPNode defines it as returning an array-type PPNode, not an array.
|
| Change-Id: I9a6c5cea408aae449bfbf808d067837c4337c672
* Move brace matching rules to Preprocessor class (Ori Livneh, 2015-11-03, 1 file, -22/+4)
|
| Instead of declaring the array of rules within both Preprocessor_DOM:: and Preprocessor_Hash::preprocessToXml(), declare it as a protected property of the parent Preprocessor class.
|
| Change-Id: I6193de66566c164fe85cdd6a88c04fa9c565f1a9
* Consolidate common Preprocessor caching code (Ori Livneh, 2015-10-25, 1 file, -34/+9)
|
| * Consolidate nearly-identical caching code in Preprocessor_DOM and Preprocessor_Hash by making Preprocessor an abstract class rather than an interface and by implementing Preprocessor::cacheSetTree() and Preprocessor::cacheGetTree().
| * Cache trees for wikitext blobs that have length equal or greater to PreprocessorCacheThreshold. Previously they needed to be greater than PreprocessorCacheThreshold, so this changes the requirement by one character. I did it because it seems more natural.
| * Modernize the code to use singleton service objects rather than globals.
|
| We spend a lot of time in the Preprocessor, so it would be nice for this code to be well-factored and clear.
|
| Change-Id: Ib71c29f14a28445a505e12c774a24ad964330b95
* Use line comments for @codingStandardsIgnoreStart (umherirrender, 2015-10-07, 1 file, -12/+12)
|
| In Preprocess_DOM.php and Preprocess_Hash.php the @codingStandardsIgnoreStart is inside a doc comment, but phpcs does not see this tag and does not ignore the error. Using line comments fix this problems.
|
| See https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs/842/console
|
| Change-Id: Id0edf6edb2902466748165c2e820d2cf4b7fcf75
* Fix issues identified by SpaceBeforeSingleLineComment sniff (Vivek Ghaisas, 2015-09-26, 1 file, -2/+2)
|
| Change-Id: I048ccb1fa260e4b7152ca5f09b053defdd72d8f9
* Decline to cache preprocessor items larger than 1 Mb (Ori Livneh, 2015-09-02, 1 file, -2/+6)
|
| This is a temporarily workaround for T111289. The data ought not be so large, but it frequently is, and the problem is almost exclusive to this code path. For now, just avoid attempting to cache the value if its size exceeds a million bytes.
|
| Bug: T111289
| Change-Id: Idd1acd903193f0753cc5548bd32800705716dd9f
* Make PPFrame::RECOVER_COMMENTS actually work (Kevin Israel, 2015-08-15, 1 file, -1/+3)
|
| Because of a missing condition, it generally only had an effect on output type Parser::OT_WIKI, and thus {{msgnw:}} would strip comments except when substituted during a pre-save transform.
|
| Bug: T98841
| Change-Id: I1e47696434fe87475f9902e6bfb8990566456e2f
* Remove unneeded empty lines at begin of if/else/foreach body (umherirrender, 2015-06-19, 1 file, -1/+0)
|
| An if body must not begin with an empty line
|
| Change-Id: I62b058be337fcc85a120fcd3dadce564db59a271