path: root/includes/parser
Commit message (author, date, files changed, lines -removed/+added)
* Namespace all remaining classes in includes/parser (James D. Forrester, 2024-10-15, 47 files, -56/+195)
| | | | | Bug: T353458 Change-Id: If02cc9b1ff78e26c1cf8c91ee4695845eb133829
* Merge "ParsoidParser: pass render reason to Parsoid; fix case of 'sampleStats'" (jenkins-bot, 2024-10-12, 1 file, -1/+2)
|\
| * ParsoidParser: pass render reason to Parsoid; fix case of 'sampleStats' (C. Scott Ananian, 2024-09-28, 1 file, -1/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | Every other option passed to parsoid (except `body_only`) is in camelCase, so make 'sampleStats' into a camel as well. Pass the render reason to Parsoid so that parsoid-specific parse stats can be correlated with stats coming from the ParserOutputAccess. Used in I88ba26fefd9d69ad3e2354d1e235b1e42d1914a0 but does not depend on that patch. Change-Id: I2e5c897c55e41224567ed94bbf903c8fff96e841
* | Merge "Add static return type for `ParserOutput::getExternalLinks`" (jenkins-bot, 2024-10-10, 1 file, -2/+2)
|\ \
| * | Add static return type for `ParserOutput::getExternalLinks` (Arthur Taylor, 2024-10-08, 1 file, -2/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PHPUnit tests that mock the ParserOutput object are unable to correctly infer that the mock should return an empty array rather than null for `getExternalLinks`. This is currently causing test failures in SpamBlacklist in CI. Add the return type definition to the function field definition so that PHPUnit has a better chance at doing the right thing. Note that `getExternalLinks` returns `$this->mExternalLinks` by reference; if there’s some existing code which reassigns a non-array value to that reference (and, consequently, to `$this->mExternalLinks`), such code will start to throw TypeErrors during the assignment. Bug: T376633 Change-Id: I246d5541200c9d0c405f30ea9de091ff9c0e759c
* | | Merge "Remove meaningless @var documentation from constants" (jenkins-bot, 2024-10-09, 1 file, -1/+0)
|\ \ \
| * | | Remove meaningless @var documentation from constants (thiemowmde, 2024-10-09, 1 file, -1/+0)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A constant is not a variable. The type is hard-coded via the value and can never change. While the extra @var probably doesn't hurt much, it's redundant and error-prone and can't provide any additional information. Change-Id: Iee1f36a1905d9b9c6b26d0684b7848571f0c1733
* | | | ParsoidParser: ensure magic variable expansion uses pageLanguageOverride (C. Scott Ananian, 2024-10-09, 1 file, -0/+1)
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds tests for the caching fix in Ie76020dc4fa3545f827e1674051530b479f01f31, but these tests also revealed that the recursive invocation of the legacy parser to expand magic variables like {{PAGELANGUAGE}} wasn't using the pageLanguageOverride, aka ParserOptions::getTargetLanguage(). The page language override is used when parsing new context which doesn't currently exist in the database and therefore doesn't have a page language set by its title (which doesn't yet exist). Bug: T376783 Follows-Up: Ie76020dc4fa3545f827e1674051530b479f01f31 Change-Id: If6fe7cf00be6e78ef46181b17f01138383e95e46
* | | Merge "ParserOutput::setPageProperty(): emit deprecation warnings for non-strings" (jenkins-bot, 2024-10-08, 1 file, -2/+9)
|\ \ \ | |/ / |/| |
| * | ParserOutput::setPageProperty(): emit deprecation warnings for non-strings (C. Scott Ananian, 2024-10-04, 1 file, -2/+9)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was deprecated in 1.42 but did not previously emit deprecation warnings. Depends-On: I072b111b047cfe13e32a822678d68165d1c76f84 Depends-On: I2734383207b92f71bffc66ba2392a592a1df0954 Depends-On: I79bb5030c13e83f664da1635254f4bc171ed4f3e Depends-On: If64a5239a40953f244657e60f95b2e938abfe447 Change-Id: Ifefd3dab43247d988b7c7ff7874c05c90fc8ce1f
* | | Merge "ParserOutput: ensure all created ParserOutputs have a "start of parse" time set" (jenkins-bot, 2024-10-07, 1 file, -2/+20)
|\ \ \
| * | | ParserOutput: ensure all created ParserOutputs have a "start of parse" time set (C. Scott Ananian, 2024-10-04, 1 file, -2/+20)
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | *Most* implementations of ContentHandler::fillParserOutput() ensure that the returned ParserOutput has had ParserOutput::resetParseStartTime() called on it at an appropriate time -- but not *all*. This is a belt-and-suspenders fix that ensures that every code path which creates a ParserOutput has *some* "start time" defined. This could be misleading if the parsing is done first and the parser output is created at the very end of the parse, but in all the code that I've looked at the ParserOutput is the first thing created and so this default should be reasonable. While we're at it, remove the parseStartTime from the serialized form of the ParserOutput, because it is useless after the object is unserialized. Bug: T376433 Change-Id: I3bdf3996401a7d5ac4d8e1e5e6afb7ca410cbe6c
* / / Provide a prefixed StatsFactory in parsoid config (Yiannis Giannelos, 2024-10-04, 1 file, -4/+11)
|/ / | | | | | | Change-Id: Ic3fc353b030a292952091813c9847cd697b25444
* | Switch over a bunch of class_alias uses to actuals (James D. Forrester, 2024-10-03, 2 files, -0/+3)
| | | | | | | | Change-Id: Id175a83e71cc910eaee5d5890a9106872a3ca3b8
* | Merge "Add namespace to remaining parts of Wikimedia\Mime and Wikimedia\Stats" (jenkins-bot, 2024-10-03, 1 file, -1/+1)
|\ \
| * | Add namespace to remaining parts of Wikimedia\Mime and Wikimedia\Stats (James D. Forrester, 2024-09-27, 1 file, -1/+1)
| | | | | | | | | | | | | | | Bug: T353458 Change-Id: If0137003ab625017d322d57870448a02569668c3
* | | Merge "Add namespace to remaining parts of Wikimedia\ObjectCache" (jenkins-bot, 2024-10-03, 7 files, -3/+7)
|\| |
| * | Add namespace to remaining parts of Wikimedia\ObjectCache (James D. Forrester, 2024-09-27, 7 files, -3/+7)
| | | | | | | | | | | | | | | Bug: T353458 Change-Id: I3b736346550953e3b2977c14dc3eb10edc07cf97
* | | Merge "Deprecate ParserOutput::setLanguageLinks(null)" (jenkins-bot, 2024-10-02, 1 file, -3/+12)
|\ \ \
| * | | Deprecate ParserOutput::setLanguageLinks(null) (C. Scott Ananian, 2024-10-02, 1 file, -3/+12)
| | |/ | |/| | | | | | | | | | | | | Bug: T376323 Follows-Up: I82a05a51d94782ebb9fa87ff889ca0f633b3e15c Change-Id: I0952659ab245326e9e8352170fb0a629ec109e72
* | | Merge "Allow localized gallery widths; avoid spurious "double px" tracking category" (jenkins-bot, 2024-10-02, 1 file, -2/+13)
|\ \ \ | |/ / |/| |
| * | Allow localized gallery widths; avoid spurious "double px" tracking category (C. Scott Ananian, 2024-09-11, 1 file, -2/+13)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The `widths` and `heights` attributes to the <gallery> tag weren't being properly localized with the `img_width` magic word, which meant that trailing 'px' wasn't stripped causing it to trigger the "double-px" tracking category when it shouldn't. Bug: T374311 Change-Id: I538bc0975f858f62cdd20619fc6f337abb9698eb
* | | Merge "Deduplicate language links in ParserOutput and OutputPage" (jenkins-bot, 2024-09-27, 2 files, -27/+49)
|\ \ \ | |_|/ |/| |
| * | Deduplicate language links in ParserOutput and OutputPage (C. Scott Ananian, 2024-09-26, 2 files, -27/+49)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move deduplication of language links out of Parser.php and into the ParserOutput in order to be compatible with alternate Parsers (Parsoid). Clean up various inconsistencies: ensure deduplication also happens in OutputPage when multiple ParserOutputs are merged into the final output, and ensure that the deduplication in LinksUpdate is done in the same order (first link prevails) as in Parser/ParserOutput/OutputPage. Deprecate OutputPage::setLanguageLinks() (the matching ParserOutput::setLanguageLinks() was deprecated in 1.42). As a breaking change, return an array, not an array *reference*, from ParserOutput::getLanguageLinks(). This allows us to safely modify the internal representation of language links. As far as I can tell, no one used the returned reference to sneakily modify the list of language links, and there was not a good way to deprecate this before making the breaking change. While we're at it, we've added tests to ensure that language link fragments are preserved. Bug: T26502 Bug: T358950 Bug: T375005 Change-Id: I82a05a51d94782ebb9fa87ff889ca0f633b3e15c
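The "first link prevails" deduplication described in the commit above can be sketched as follows. This is a minimal Python illustration, not the MediaWiki code (which is PHP). The function name `dedupe_language_links` is hypothetical, and the sketch assumes that each language link is a `prefix:Title` string and that two links are duplicates when they share the prefix before the first colon.

```python
def dedupe_language_links(links):
    """Keep only the first link seen for each language prefix.

    Assumption for this sketch: a link looks like "prefix:Title#fragment"
    and the part before the first colon identifies the language.
    """
    first_by_prefix = {}
    for link in links:
        prefix, _, _ = link.partition(":")
        # setdefault only stores the link if the prefix is new,
        # so the earliest link for each prefix wins.
        first_by_prefix.setdefault(prefix, link)
    # Python dicts preserve insertion order, so the original ordering
    # of the surviving links is kept.
    return list(first_by_prefix.values())


if __name__ == "__main__":
    print(dedupe_language_links(["en:Foo", "fr:Bar", "en:Baz"]))
```

Because the link strings are stored unchanged, any fragment on the surviving link is preserved, which matches the behaviour the commit message says is covered by tests.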
* | | Add namespace to IDBAccessObject and DBAccessObjectUtils (James D. Forrester, 2024-09-27, 1 file, -1/+1)
|/ / | | | | | | | | Bug: T353458 Change-Id: I23cf7991f8792d4d000d1780463d8ce76dc0aee0
* | parsoid: use real ParserOutput, not StubMetadataCollector in wikitext2lint (C. Scott Ananian, 2024-09-24, 1 file, -1/+3)
| | | | | | | | | | | | | | Bug: T374149 Bug: T331084 Depends-On: Iec2ab2a831a56b6bed0615df8a437e84ec63e799 Change-Id: I7a43df32cf0cc0c062374f404edb468a4e59f386
* | Fix names of parsercache_selective_* stats (C. Scott Ananian, 2024-09-19, 1 file, -2/+2)
| | | | | | | | | | | | | | Rename to use a unit type as a suffix to match the guidance in https://www.mediawiki.org/wiki/Manual:Stats#Metrics Change-Id: Ied4c1c3a1ab7fa6148d10a7fc89094c46f568453
* | Re-order arguments to DataAccess::addTrackingCategory (Arlo Breault, 2024-09-18, 1 file, -6/+7)
| | | | | | | | | | | | | | Match the order of other methods of that class. Follows-Up: Id4b29c6d09c79649c94d2da2e678af52a967bbe5 Change-Id: Ic8f70ba2b6466bb91f467d4e87e9de9e09c0245a
* | Merge "Randomly sample statistics for Parsoid Selective Update" (jenkins-bot, 2024-09-16, 1 file, -4/+38)
|\ \
| * | Randomly sample statistics for Parsoid Selective Update (C. Scott Ananian, 2024-09-13, 1 file, -4/+38)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Controlled by $wgParsoidSelectiveUpdateSampleRate (which defaults to off) randomly sample 1 in N parses to collect statistics to inform the design of Parsoid selective update: * For both legacy parses and Parsoid, count how many times a previous parse is in the cache when a new parse is requested. This needs to sample the legacy parser as well as Parsoid because Parsoid is not yet invoked from the RefreshLinksJob. We also count the relative number of parses from the different RevisionRenderer::getRenderedRevision() call sites to determine which pathways might account for the most opportunities for optimized selective update. * For sampled parses using the Parsoid parser where a previous parse result is available, also fetch the previous wikitext source from the database. Bug: T371713 Change-Id: I208aeac1b315a96bdb9669427cd03de461b914b4
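The "1 in N" sampling described in the commit above can be sketched as follows. This is a hedged Python illustration only: the function name `should_sample` is hypothetical, and the sketch assumes that a rate of 0 (or any value below 1) stands for the "off" default. It does not reproduce how MediaWiki reads $wgParsoidSelectiveUpdateSampleRate.

```python
import random


def should_sample(sample_rate, rng=random):
    """Return True for roughly one call in `sample_rate`.

    Assumption for this sketch: a falsy or sub-1 rate means sampling is off.
    """
    if not sample_rate or sample_rate < 1:
        return False
    # randrange(n) returns an integer in [0, n), so exactly one of the
    # n equally likely outcomes selects this parse for statistics.
    return rng.randrange(sample_rate) == 0


if __name__ == "__main__":
    sampled = sum(should_sample(100) for _ in range(10_000))
    print(f"sampled {sampled} of 10000 parses")
```

Passing the random source as a parameter keeps the decision easy to test with a seeded `random.Random` instance.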
* | | Merge "ParserOutput::collectMetadata(): fix handling of links" (jenkins-bot, 2024-09-16, 1 file, -3/+19)
|\ \ \ | |/ / |/| |
| * | ParserOutput::collectMetadata(): fix handling of links (C. Scott Ananian, 2024-09-13, 1 file, -3/+19)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For language links, when there are conflicts between namespaces and interwiki prefixes, it is important to use TitleValue for language links rather than to try to reparse the Title. Language links also preserve fragments, unlike other link types in ParserOutput; added tests to document this. Added handling for interwiki links and template links. Bug: T363538 Change-Id: I6e8ff8ed7f8819000cc3f80e49c0739b568217a4
* | | Merge "parser: Add missing documentation to class properties" (jenkins-bot, 2024-09-12, 7 files, -2/+27)
|\ \ \
| * | | parser: Add missing documentation to class properties (Umherirrender, 2024-09-07, 7 files, -2/+27)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add doc-typehints to class properties found by the PropertyDocumentation sniff to improve the documentation. Once the sniff is enabled, it prevents new code from omitting type declarations. This is focused on documentation and does not change code. Change-Id: I3afaba387663320187c49ff1cdb2ff3ae01681ad
* | | | Merge "Use type declaration for class properties holding type hinted arguments" (jenkins-bot, 2024-09-12, 1 file, -1/+1)
|\ \ \ \ | |_|/ / |/| | |
| * | | Use type declaration for class properties holding type hinted arguments (Umherirrender, 2024-09-11, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Provided arguments already have a type declaration on the constructor, and it is safe to use the same type on the class property. Change-Id: Ia8bbdc4dee59dfb487582dd514486ec8542951be
* | | | parser: Remove PPDStack_Hash::$false (Umherirrender, 2024-09-11, 1 file, -3/+1)
| |_|/ |/| | | | | | | | | | | | | | The value is never changed, just use false Change-Id: I6ff2b1e91f28340f8f467a1d5e50cc132e142b95
* | | Migrate all uses of deprecated URL global functions to use wfGetUrlUtils() (James D. Forrester, 2024-09-10, 1 file, -2/+3)
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | wfGetUrlUtils() is also deprecated, but less so, so we can do this first and then properly replace the individual uses with dependency injection in local pieces of work. Also: * Switching Parser::getExternalLinkRel to UrlUtils::matchesDomainList exposed a type error in media.txt where $wgNoFollowDomainExceptions was set to a string (which is invalid) instead of an array. Bug: T319340 Change-Id: Icb512d7241954ee155b64c57f3782b86acfd9a4c
* | Merge "Avoid use of deprecated wfExpandUrl in various places" (jenkins-bot, 2024-09-10, 2 files, -2/+3)
|\ \
| * | Avoid use of deprecated wfExpandUrl in various places (Ebrahim Byagowi, 2024-09-09, 2 files, -2/+3)
| |/ | | | | | | | | Bug: T319340 Change-Id: I98e8e3a8fd135a554a85f6399033756c88ea415f
* | Merge "Avoid use of deprecated wfAssembleUrl" (jenkins-bot, 2024-09-09, 1 file, -2/+2)
|\ \
| * | Avoid use of deprecated wfAssembleUrl (Ebrahim Byagowi, 2024-09-09, 1 file, -2/+2)
| |/ | | | | | | Change-Id: I198e862d1dd6eb73f4610a771b5c5e0cd43ce8c7
* | Merge "parser: Add a new {{USERLANGUAGE}} magic word for use in wikitext" (jenkins-bot, 2024-09-08, 3 files, -0/+8)
|\ \ | |/ |/|
| * parser: Add a new {{USERLANGUAGE}} magic word for use in wikitext (dvorapa, 2024-09-07, 3 files, -0/+8)
| | | | | | | | | | | | | | | | Depending on configuration, this returns either the interface language code of the current user or the current page language. Bug: T4085 Change-Id: Iab7fda272ec81af88c74612727ff6bed014d4a81
* | Merge "Remove ParserOutput::getText() calls from core (runOutputPipeline)" (jenkins-bot, 2024-09-06, 1 file, -3/+5)
|\ \
| * | Remove ParserOutput::getText() calls from core (runOutputPipeline) (Isabelle Hurbain-Palatin, 2024-09-06, 1 file, -3/+5)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the fourth patch of a series of patches to remove ParserOutput::getText() calls from core. This series of patches should be functionally equivalent to I2b4bcddb234f10fd8592570cb0496adf3271328e. Here we replace calls to getText, where a ContentRenderer is available close by, with temporary ParserOutput::runOutputPipeline calls that will eventually be replaced by a call to (probably) ContentRenderer (T371004). Doing this work in stages allows us to separate the work of "bringing ParserOptions to the call site" from the work of "bringing ContentRenderer(ish) to the call site", since both need to be done to make ParserOutput a value object (T293512). Change-Id: Ib4f9357293dc230df6e0ca2379a1e2a4cc1b91b7 Bug: T293512
* | | Merge "Introduce runOutputPipeline and clone by default" (jenkins-bot, 2024-09-06, 1 file, -9/+69)
|\| |
| * | Introduce runOutputPipeline and clone by default (Isabelle Hurbain-Palatin, 2024-09-06, 1 file, -9/+69)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the third patch of a series of patches to remove ParserOutput::getText() calls from core. This series of patches should be functionally equivalent to I2b4bcddb234f10fd8592570cb0496adf3271328e. Here we temporarily introduce runOutputPipeline in ParserOutput. It creates and runs the pipeline with default options, and is called by getText. (This is not entirely truthful because we go through a runPipelineInternal transient method for null-argument-passing reasons, but let's not over-complicate this commit message.) getText is responsible for maintaining the current behaviour, that is "disallow the cloning of the ParserOutput and put the text back as it was" to mitigate T353257. As we get rid of getText, this behaviour should be moved, if necessary, to the call site. The new method is currently added to ParserOutput so that further refactorings are, for the moment, simpler. It will eventually be moved to another place within the Content framework. We also rename 'suppressClone' to 'allowClone' (which is actually its negation) to avoid multiple levels of negations that make the code confusing. Note that the default value of 'allowClone' is true, and is currently overridden in two places: getText and OutputPage::getParserOutputText (which calls the pipeline directly and not through ParserOutput). Bug: T293512 Bug: T371022 Change-Id: Ibf04af1079aaa1934dc78685b00e636ff4d38a9a
* | | Remove wfRemoveDotSegments, deprecated since 1.39 (Ebrahim Byagowi, 2024-09-06, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | It didn't have any use outside the core so went for the removal instead of raising warning and hard deprecation. Change-Id: I08dab348a89f1fe1adccfad4f003d9fb8b233f0d
* | | Merge "Use stashed temp name in ParserOptions::newFromContext over anon" (jenkins-bot, 2024-09-06, 1 file, -1/+19)
|\ \ \