path: root/tests/phpunit/includes/parser/ParserOutputTest.php
Commit message    Author    Age    Files    Lines
* Merge "Re-apply "Use Remex for DeduplicateStyles transform""    jenkins-bot    2025-02-18    1    -4/+4
|\
| * Re-apply "Use Remex for DeduplicateStyles transform"    Bartosz Dziewoński    2025-01-10    1    -4/+4
| | This reverts commit 7f63d5250e8db443d8fce3016abd9757521b590d, re-applying commit 82da9cf14be08e9458f58fa96be51966a2fe7cb1. It can be re-applied safely after T354361 was fixed.
| | Most of the incidental changes from the original patch are no longer needed, as they were made unnecessary by other work, or were applied in I4cb2f29cf890af90f295624c586d9e1eb1939b95.
| | Change-Id: I1ff9a7c94244bffffe5574c0b99379ed1121a86d
* | Add ParserOutputFlags::{HAS_ASYNC_CONTENT,ASYNC_NOT_READY}    C. Scott Ananian    2025-01-29    1    -0/+40
| | This allows Parsoid to mark parses which contain async content which is "not ready yet". At the moment this output is cached with a reduced TTL, although in the future it might still be treated as uncacheable, cached until evicted, or some other option.
| | The HAS_ASYNC_CONTENT flag along with ParserOutput::hasReducedExpiry() ensures that RefreshLinksJob is opportunistically reinvoked whenever the page is reparsed, since the asynchronous content may change the metadata for the page when it becomes ready.
| | As described in T373256, ::hasReducedExpiry() is misnamed now, and a follow-up patch will probably rename it to ::hasDynamicContent() or something like that. What it really means is "RefreshLinksJob must be re-run on every parse, because the content may change on each parse". In the past we would *also* reduce the cache time for pages like this. But for asynchronous content, "the content may change on each parse" only *until* the asynchronous content is "ready". Once it is ready the contents will no longer change, and the cache lifetime can be raised again -- but ::hasDynamicContent() still needs to be set, which in the future will mean "you need to check that RefreshLinksJob has last run" not "you must always run RefreshLinksJob".
| | Asynchronous content will always set HAS_ASYNC_CONTENT, even after the content is "ready", but will only set ASYNC_NOT_READY if it needed to use placeholder content in this render.
| | Bug: T373256
| | Change-Id: I71e10f8a9133c16ebd9120c23c965b9ff20dabd2
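The flag semantics described in this message can be summarized in a small sketch. It is illustrative only and is not MediaWiki code: the two flag names come from the commit message, while the TTL values and the helper names `cache_ttl` and `must_rerun_refresh_links` are hypothetical.

```python
from enum import Flag, auto


class OutputFlag(Flag):
    """Stand-in for the two ParserOutputFlags named in the commit message."""
    NONE = 0
    HAS_ASYNC_CONTENT = auto()
    ASYNC_NOT_READY = auto()


# Hypothetical cache lifetimes in seconds; the real values are not in the log.
NORMAL_TTL = 86400
REDUCED_TTL = 60


def cache_ttl(flags: OutputFlag) -> int:
    """Use the reduced TTL only while placeholder content was rendered."""
    if OutputFlag.ASYNC_NOT_READY in flags:
        return REDUCED_TTL
    return NORMAL_TTL


def must_rerun_refresh_links(flags: OutputFlag) -> bool:
    """Async content keeps requesting a links refresh, even once it is ready."""
    return OutputFlag.HAS_ASYNC_CONTENT in flags
```

The point of keeping two flags is that the cache lifetime and the links-refresh decision can diverge: once the content is ready, only the second check still fires.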
* | Remove 2-line PHPDocs that just repeat the types from the code    thiemowmde    2025-01-17    1    -16/+0
|/
| Same as Ia294bf4 did for 1-line comments. This patch removes slightly more complex 2-line PHPDoc comments that don't add any new information to the code, but literally repeat what the code already says.
| They say "don't document the code, code the documentation", and we are doing this more and more. We just tend to forget to remove the obsolete comments.
| Note I'm also removing a line of text in a few cases when it's very short and literally says the same as the method name. Again, such comments add zero new information.
| Change-Id: I01535404bab458c6c47e48e5456403b7a64198ed
* ParserOutput: Introduce ParserOutput::getLinkList()    C. Scott Ananian    2024-10-18    1    -4/+132
| This deprecates a number of methods which returned arrays by reference and exposed internal representation details of the ParserOutput. It also regularizes the return values to return consistent LinkTarget values, working around the wide variety of different internal storage formats used for links.
| In the future, once these methods which expose the internal representation are removed, we can simplify our internal storage as well. But for the moment we add the new getter without changing the internal representation.
| Note that by returning TitleValue objects this new interface also provides a means to fix the issue identified in T204792 where interwiki and namespace prefixes were getting confused. A TitleValue properly distinguishes between these -- although the callers will still have to be careful to use it as a TitleValue and not attempt to reparse it.
| These methods also correctly handle fragments, which are present for the language link type but stripped for the other link types.
| Bug: T204792
| Change-Id: I48a2077b9645124f83082afd953d6bf7a861270b
* Namespace all remaining classes in includes/parser    James D. Forrester    2024-10-15    1    -4/+4
| Bug: T353458
| Change-Id: If02cc9b1ff78e26c1cf8c91ee4695845eb133829
* ParserOutput::setPageProperty(): emit deprecation warnings for non-strings    C. Scott Ananian    2024-10-04    1    -2/+5
| This was deprecated in 1.42 but did not previously emit deprecation warnings.
| Depends-On: I072b111b047cfe13e32a822678d68165d1c76f84
| Depends-On: I2734383207b92f71bffc66ba2392a592a1df0954
| Depends-On: I79bb5030c13e83f664da1635254f4bc171ed4f3e
| Depends-On: If64a5239a40953f244657e60f95b2e938abfe447
| Change-Id: Ifefd3dab43247d988b7c7ff7874c05c90fc8ce1f
* Deduplicate language links in ParserOutput and OutputPage    C. Scott Ananian    2024-09-26    1    -3/+0
| Move deduplication of language links out of Parser.php and into the ParserOutput in order to be compatible with alternate Parsers (Parsoid).
| Clean up various inconsistencies: ensure deduplication also happens in OutputPage when multiple ParserOutputs are merged into the final output, and ensure that the deduplication in LinksUpdate is done in the same order (first link prevails) as in Parser/ParserOutput/OutputPage. Deprecate OutputPage::setLanguageLinks() (the matching ParserOutput::setLanguageLinks() was deprecated in 1.42).
| As a breaking change, return an array, not an array *reference*, from ParserOutput::getLanguageLinks(). This allows us to safely modify the internal representation of language links. As far as I can tell, no one used the returned reference to sneakily modify the list of language links, and there is not a good way to have deprecated this before making the breaking change.
| While we're at it, we've added tests to ensure that language link fragments are preserved.
| Bug: T26502
| Bug: T358950
| Bug: T375005
| Change-Id: I82a05a51d94782ebb9fa87ff889ca0f633b3e15c
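A minimal sketch of "first link prevails" deduplication follows. It assumes language links are plain `prefix:Title#fragment` strings keyed by their language prefix; that representation and the name `dedupe_language_links` are assumptions for illustration, not the ParserOutput implementation.

```python
def dedupe_language_links(links):
    """Keep the first link seen for each language prefix, in input order.

    Fragments are left untouched, so 'de:Seite#Abschnitt' survives as-is.
    """
    seen = set()
    result = []
    for link in links:
        # Everything before the first colon is treated as the language prefix.
        prefix, _, _ = link.partition(":")
        if prefix not in seen:
            seen.add(prefix)
            result.append(link)
    return result
```

Applying the same function wherever lists are merged is what keeps the order-dependent result identical across the call sites the message lists.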
* ParserOutput::collectMetadata(): fix handling of links    C. Scott Ananian    2024-09-13    1    -15/+43
| For language links, when there are conflicts between namespaces and interwiki prefixes, it is important to use TitleValue for language links rather than to try to reparse the Title. Language links also preserve fragments, unlike other link types in ParserOutput; added tests to document this.
| Added handling for interwiki links and template links.
| Bug: T363538
| Change-Id: I6e8ff8ed7f8819000cc3f80e49c0739b568217a4
* Remove ParserOutput::getText() calls from core (runOutputPipeline)    Isabelle Hurbain-Palatin    2024-09-06    1    -41/+3
| This is the fourth patch of a series of patches to remove ParserOutput::getText() calls from core. This series of patches should be functionally equivalent to I2b4bcddb234f10fd8592570cb0496adf3271328e.
| Here we replace calls to getText where a ContentRenderer is available close by with a temporary ParserOutput::runOutputPipeline call that will eventually be replaced by a call to (probably) ContentRenderer (T371004).
| Doing this work in stages allows us to separate the work of "bring ParserOptions to the call site" from the work of "bringing ContentRenderer(ish) to the call site", since both need to be done to make ParserOutput a value object (T293512).
| Change-Id: Ib4f9357293dc230df6e0ca2379a1e2a4cc1b91b7
| Bug: T293512
* Remove ParserOutput::getText() calls from core (direct pipeline)    Isabelle Hurbain-Palatin    2024-08-23    1    -6/+9
| This is the second patch of a series of patches to remove ParserOutput::getText() calls from core. This series of patches should be functionally equivalent to I2b4bcddb234f10fd8592570cb0496adf3271328e.
| This patch replaces the calls to getText where the legacy parser is called directly by creating a pipeline and invoking it on the generated output.
| These should probably eventually use the Content framework to generate output instead of using Parser directly (T371008), which will also allow them to transparently support Parsoid.
| Bug: T293512
| Change-Id: I45951a49e57a8031887ee6e4546335141d231c18
* Remove ParserOutput::getText() calls from core (getRawText)    Isabelle Hurbain-Palatin    2024-08-19    1    -4/+4
| This is the first patch of a series of patches to remove ParserOutput::getText() calls from core. This series of patches should be functionally equivalent to I2b4bcddb234f10fd8592570cb0496adf3271328e.
| This first patch replaces the calls to getText with calls to ParserOutput::getRawText when the pipeline has no reason to be executed.
| Bug: T293512
| Change-Id: I0ad53cd074ca9cf13e96e6b08179e13aeea180f4
* SerializationTestTrait: make phpunit data providers static    C. Scott Ananian    2024-05-22    1    -5/+5
| PHPUnit 10 requires test data providers to be static. This patch doesn't completely do the job, as some providers currently call the non-static method `$this->markTestSkippedIfPhp()`, but it gets us closer to PHPUnit 10 compatibility.
| This also lets us simplify the validateParserCacheSerializationTestData maintenance script and make it slightly more generic.
| Change-Id: Ie3696bfaa29aca9da45f54239126222e8c847ea9
* Move section edit links outside headings (new heading HTML)    Bartosz Dziewoński    2024-05-06    1    -16/+16
| Legacy parser can now output headings using a more accessible markup, which is also identical to the markup used by the Parsoid parser.
| Changes to client-side JS and CSS necessary to support the new markup have already been merged in earlier commits.
| includes/skins/Skin.php
| includes/ServiceWiring.php
| * Define a new skin option, 'supportsMwHeading', which can be used to toggle the new markup per-skin.
| * Update the built-in fallback skin to enable it. This affects the output in parser tests.
| docs/config-schema.yaml
| includes/config-schema.php
| includes/config-vars.php
| includes/MainConfigNames.php
| includes/MainConfigSchema.php
| * Add a new configuration setting, 'ParserEnableLegacyHeadingDOM', which can be used to toggle the new markup per-site.
| includes/OutputTransform/Stages/HandleSectionLinks.php
| * Output new heading HTML for skins that enabled the option.
| tests/*
| * Duplicate parser tests that cover heading generation to cover both new and old markup. Update other parser tests to use new markup.
| * Add some unit and integration tests for the behavior of the skin option and some parser tests for edge cases of the new markup.
| Bug: T13555
| Change-Id: I1180169a8e83af834c2984ba16089e6277f2a8dd
* phpunit: Fix tests relying on implicit wgScript/wgArticlePath    Timo Tijhof    2024-05-05    1    -1/+0
| A number of tests have hardcoded expectations that pass only in WMF CI where Quibble has LocalSettings.php with $wgScript and $wgArticlePath set a certain way.
| We could fix these by adding setMwGlobals() in their tests, as we often do, but these are so often forgotten that I'd rather we just add them to TestSetup.php so that it is simply impossible to write a test that passes locally for you (if you have the same config) but not for someone else.
| There is a larger project in there somewhere about expanding this slowly such that we basically only pluck DB-settings and extension enablement from LocalSettings and otherwise run the tests with the default settings in PHPUnit.
| Pretty much by definition, any (other) setting you have in LocalSettings is irrelevant because it either:
| 1. has no effect on the test (majority, harmless either way),
| 2. has a custom default via TestSetup.php (which has precedence over LocalSettings.php),
| 3. is relevant to the code being tested and the test case correctly calls setMwGlobals() to ensure a consistent value during test.
| 4. is relevant to the tested code but has no override, thus only passes if you happen to have the "right" value set for it (undesirable).
| Case 4 is already categorically impossible for the most common config settings that influence random code because we give them a value in TestSetup.php. This patch expands that to include $wgScript and $wgArticlePath.
| Perhaps in the future we can think about a way to do this automatically by either re-applying MainConfigSchema (sans db settings) or by only selectively applying LocalSettings.php in the first place.
| This patch follows-up I072ddf89562fe, which added a test case in WikitextContentHandlerIntegrationTest.php that assumed "/index.php" as the value of $wgScript. This passes in WMF CI since Quibble uses that value, but the tests failed in most local development installs since those tend to use "/w" instead.
| Rather than one-off fixing that one test with overrideConfigValues(), switch to a more general fixture, since the precise values don't matter for this test.
| Bug: T349087
| Bug: T277470
| Change-Id: If4304b7ca4a838bd892d4516a0b5c6dfbc30986e
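The precedence order in the numbered list can be pictured as layered dictionaries, where later layers win. This is only a sketch of the idea: `effective_config` and all the values passed to it are hypothetical, and the real bootstrap code is PHP and considerably more involved.

```python
def effective_config(defaults, local_settings, test_setup, per_test=None):
    """Merge config layers; each later layer overrides the earlier ones.

    Order: built-in defaults, then LocalSettings, then the TestSetup
    overrides, then whatever an individual test sets explicitly.
    """
    return {**defaults, **local_settings, **test_setup, **(per_test or {})}
```

A setting that appears in the `test_setup` layer can no longer depend on the developer's local value, which is what rules out case 4 for that setting.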
* namespace MWDebug    Amir Sarabadani    2024-05-03    1    -1/+1
| Bug: T353458
| Change-Id: I99d728bd111ff882220cd175ff09d4da20b81eae
* Merge "Add ParserOptions::setCollapsibleSections()"    jenkins-bot    2024-04-29    1    -0/+3
|\
| * Add ParserOptions::setCollapsibleSections()    C. Scott Ananian    2024-04-29    1    -0/+3
| | This is a non-default option that will add a <div> wrapper around section contents to allow client-side collapsing. This is intended for use by MobileFrontend, but could eventually be enabled for desktop read views as well.
| | Since this parser option is in the "cache-varying options" set, any caller who sets this option will fork the cache for that page, which is reasonable as the parser option sets a ParserOutput property.
| | In the future our caching strategy will get smarter and we'll add code which avoids the cache split and just transfers the appropriate values from ParserOptions to ParserOutput flags after the cached output is retrieved.
| | Bug: T359001
| | Change-Id: Ie93959a056ed15a728404eb293e4bb6eeaeb15c0
* | Replace all instances of "per default" with "by default"    Tim Starling    2024-04-29    1    -1/+1
| | According to the dictionary, "per" (or more conventionally "as per") means "according to". Refer OED "per" sense II.3.a. For example: "No value was passed, so return null, as per default". In this sentence, we are not specifying the default, we are referring to the default.
| | This correct usage of "per default" was used nowhere in MediaWiki core as far as I can see. Instead we have "per default" being used to mean "by default", that is, giving the value to use when no explicit value was specified.
| | In OED, the phrase "by default" is blessed with its own section just for computing usage: "P.1.e. Computing. As an option or setting adopted automatically by a computer program whenever an alternative is not specified by the user or programmer. Cf. sense I.7a."
| | There are highly similar pre-computing usages of the same phrase, whereas the phrase "per default" is not mentioned.
| | As a matter of style, I think "per default" should not be used even when it is strictly correct, since the common incorrect usage makes it ambiguous and misleading.
| | Change-Id: Ibcccc65ead864d082677b472b34ff32ff41c60ae
* | [ParserOutput] Remove deprecated ::getTOCHTML() and ::setTOCHTML() methods    C. Scott Ananian    2024-04-16    1    -5/+0
| | These were deprecated with warnings in 1.40.
| | Change-Id: I8027bc26c71ae94d3d5c7e5112545cd1b35749aa
* | ParserOutput: Rename ::setIndexedPageProperty() to ::setNumericPageProperty()    C. Scott Ananian    2024-04-15    1    -12/+12
| | Before this method name gets baked forever into the 1.42 release, rename the ParserOutput::setIndexedPageProperty() and ::setUnindexedPageProperty() methods to ::setNumericPageProperty() and ::setUnsortedPageProperty() to try to address some confusion about whether the *presence* of the page property is still indexed (it is!), in contrast to whether there's an additional "sort key" associated with the *value* assigned to the page property.
| | This naming is compatible with the feature request in T357783 to have the sort key and property value specified independently. The new method signature in that case would be:
| |   ...setSortedPageProperty( string $name, string $value, int|float $sortKey )
| | Although PHP 8.0 will throw a TypeError if a non-numeric type is coerced to numeric using `0 + ...`, use an explicit is_numeric check to obtain the same behavior in PHP 7.x.
| | Change-Id: Ia94c192c429d0482c58467bed787fd2e0aca052f
* | Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties    C. Scott Ananian    2024-04-05    1    -7/+43
|/
| Deprecate non-string values to ::setPageProperty(), which introduce easy traps for programmers to fall into. If page properties are intended to be indexed, use the new ::setIndexedPageProperty() instead.
| Also add ::setUnindexedPageProperty() for symmetry, with a tighter string type on the value.
| Bug: T305158
| Bug: T350224
| Change-Id: I8a39a7c90341dfee932aa819c9a0a637a8782f69
* HandleSectionLinks: Remove old debug logging for resolved bug    Bartosz Dziewoński    2024-03-07    1    -1/+1
| This is just a cleanup change. The exception should never happen, but if it does, this can be reverted.
| Change-Id: I26a7c4105d39d83015c09b779a2de3fd1ddacec1
* tests: Avoid php warnings about deprecation from data providers    Umherirrender    2024-02-25    1    -0/+5
| Call the suppression function in the data provider as that function is called there. Also escape \ for the P to avoid \P
| PHP Deprecated: CacheTime::setCacheTime called with -1 as an argument [Called from MediaWiki\Tests\Parser\ParserOutputTest::provideMergeInternalMetaDataFrom in /workspace/src/tests/phpunit/includes/parser/ParserOutputTest.php at line 1028] in /workspace/src/includes/debug/MWDebug.php on line 379
| PHP Warning: preg_match(): Compilation failed: unknown property after \P or \p at offset 19 in /workspace/src/includes/debug/MWDebug.php on line 370
| PHP Deprecated: Use of MediaWiki\Parser\ParserOutput::setTOCHTML was deprecated in MediaWiki 1.40. [Called from MediaWiki\Tests\Parser\ParserCacheSerializationTestCases::getParserOutputTestCases in /workspace/src/tests/phpunit/includes/parser/ParserCacheSerializationTestCases.php at line 236] in /workspace/src/includes/debug/MWDebug.php on line 379
| Bug: T355952
| Follow-Up: Ifed584aac7947c47fd494a66564a3bd367f83c49
| Change-Id: I7d6075b2c2b3d53db47e37040394bd013cece8a6
* Merge "[ParserOutput] Rename $mText to $mRawText and ::setText() to ::setRawText()"    jenkins-bot    2024-02-21    1    -3/+3
|\
| * [ParserOutput] Rename $mText to $mRawText and ::setText() to ::setRawText()    C. Scott Ananian    2024-02-20    1    -3/+3
| | ParserOutput::getText() is not a simple getter, but does transformations on the "text" of the ParserOutput; the simple getter is named ::getRawText().
| | To maintain consistency, rename ParserOutput::setText() to ::setRawText() and the property name ParserOutput::$mText to ::$mRawText so future readers are not confused.
| | The JSON property name as it appears in the serialized ParserCache is left as 'Text' so that we don't have any forward- or backward- rollback issues.
| | Change-Id: I3ef34814ab9473cc70d0a6806e8c5a4a02b73491
* | tests: Namespace more parser classes    Reedy    2024-02-17    1    -1/+5
| | Change-Id: I35d6e3181ed885b8731ff1c4b5703459fb4223e4
* | tests: Fix @covers and @coversDefaultClass to have leading \    Reedy    2024-02-16    1    -1/+1
|/
| Change-Id: I5629f91387f2ac453ee4341bfe4bba310bd52f03
* [ParserOutput] Make 'enableSectionEditLinks' a ParserOption    C. Scott Ananian    2024-02-09    1    -0/+5
| This will allow the Translate extension to set this parser option in the ArticleParserOptions hook, instead of mutating $options passed to ParserOutput::getText() in the ParserOutputPostCacheTransform hook.
| It ought to also help to handle the many places which call:
|   ... = $parserOutput->getText( [ 'enableSectionEditLinks' => false, ] );
| by allowing them to set the appropriate ParserOption instead of passing arguments to ::getText().
| Bug: T350626
| Change-Id: I719c115194059060f7f888608417a194ac80cc92
* Merge "Namespace includes/context"    jenkins-bot    2024-02-08    1    -0/+1
|\
| * Namespace includes/context    James D. Forrester    2024-02-08    1    -0/+1
| | Bug: T353458
| | Change-Id: I4dbef138fd0110c14c70214282519189d70c94fb
* | Introduce ParserOutput::setFromParserOptions() and use for preview flag    C. Scott Ananian    2024-02-07    1    -0/+18
|/
| Bug: T341010
| Co-Authored-by: cananian <cananian@wikimedia.org>
| Co-Authored-by: ihurbain <ihurbainpalatin@wikimedia.org>
| Change-Id: I03125fdaa7dd71ba57d593e85ecb98be6806f3f6
* Rename ParserOutput::{get,set}Timestamp() to ::{get,set}RevisionTimestamp()    C. Scott Ananian    2024-02-07    1    -6/+6
| This avoids confusion with the "render timestamp" held by the cache, and is consistent with ::get*RevisionId() etc.
| The old ::getTimestamp() and ::setTimestamp() methods have been deprecated.
| Change-Id: Idb5e687709c98086c5d3075d31885c58a0723197
* Add ParserOutput::{get,set}RenderId() and set render id in ContentRenderer    C. Scott Ananian    2024-02-07    1    -0/+81
| Set the render ID for each parse stored into cache so that we are able to identify a specific parse when there are dependencies (for example in an edit based on that parse).
| This is recorded as a property added to the ParserOutput, not the parent CacheTime interface. Even though the render ID is /related/ to the CacheTime interface, CacheTime is also used directly as a parser cache key, and the UUID should not be part of the lookup key.
| In general we are trying to move the location where these cache properties are set as early as possible, so we check at each location to ensure we don't overwrite a previously-set value. Eventually we can convert most of these checks into assertions that the cache properties have already been set (T350538).
| The primary location for setting cache properties is the ContentRenderer. Moved setting the revision timestamp into ContentRenderer as well, as it was set along the same code paths. An extra parameter was added to ContentRenderer::getParserOutput() to support this.
| Added merge code to ParserOutput::mergeInternalMetaDataFrom() which should ensure that cache time, revision, timestamp, and render id are all set properly when multiple slots are combined together in MCR.
| In order to ensure the render ID is set on all codepaths we needed to plumb the GlobalIdGenerator service into ContentRenderer, ParserCache, ParserCacheFactory, and RevisionOutputCache. Eventually (T350538) it should only be necessary in the ContentRenderer.
| Bug: T350538
| Bug: T349868
| Followup-To: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
| Change-Id: I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78
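The "set it early, never overwrite" rule for the render ID can be sketched as follows. This is an illustration under assumptions: the output is modelled as a plain dictionary, `ensure_render_id` is a hypothetical helper, and Python's `uuid4` stands in for the GlobalIdGenerator service mentioned in the message.

```python
import uuid


def ensure_render_id(output: dict) -> str:
    """Assign a render ID only if an earlier code path has not set one."""
    if output.get("render_id") is None:
        # Stand-in for the ID generator service; any unique ID would do here.
        output["render_id"] = str(uuid.uuid4())
    return output["render_id"]
```

Because every call site uses the same guard, whichever location runs first decides the ID, and later locations become no-ops that could eventually be turned into assertions.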
* tests: Use namespaced class names in @covers annotations    Umherirrender    2024-01-27    1    -43/+43
| Assist from 8c9cb701e56226cac43fee2fa24b0d0e586f1733
| Change-Id: I47897c499028d9e24c00ad0bc6ba7fd8002d9bc1
* Revert "Use Remex for DeduplicateStyles transform"    Isabelle Hurbain-Palatin    2023-12-22    1    -4/+4
| This reverts commit 82da9cf14be08e9458f58fa96be51966a2fe7cb1.
| Passing through Remex seems to have unexpected consequences to be investigated but, for the sake of unbreaking the UBN, let's revert this first.
| Bug: T353920
| Change-Id: Iaac7942aa77aee5ab525852ac5b41dd516ff13c9
* Use Remex for DeduplicateStyles transform    C. Scott Ananian    2023-12-15    1    -4/+4
| The previous implementation was using an ad-hoc regular expression which was matching inside the data-mw attribute of Parsoid output, eg:
|   <sup about="#mwt42" [...] typeof="mw:Extension/ref mw:Error" data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;attrs&quot;:{&quot;name&quot;:&quot;infobox_stats_ref_rail&quot;},&quot;body&quot;:{&quot;html&quot;:&quot;<style data-mw-deduplicate=\&quot;TemplateStyles:r1133582631\&quot; typeof=\&quot;...">
| After substitution, the <link> element inserted contained " instead of &quot; and so broke out of the attribute.
| Instead use a proper HTML tokenizer (via wikimedia/remex-html) so that we don't allow bogus matches inside attribute values.
| To fix up tests:
| * Don't deduplicate styles when parsing UX messages (also helps performance)
| * Don't deduplicate styles in ContentHandler integration tests
| * Don't deduplicate styles by default in parser tests (unless explicit option is set)
| Depends-On: Id9801a9ff540bd818a32bc6fa35c48a9cff12d3a
| Depends-On: I5111f1fdb7140948b82113adbc774af286174ab3
| Followup-To: Ic0b17e361bf6eb0e71c498abc17f5f67f82318f8
| Change-Id: I32d3d1772243c3819e1e1486351d16871b6e21c4
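The failure mode described above, a pattern matching markup that is really the inside of an attribute value, can be reproduced in a few lines. The sketch below is not the Remex-based transform: it uses Python's standard `html.parser` as a stand-in tokenizer, a deliberately naive regular expression that is an assumption about the general shape of the old approach, and a shortened sample string.

```python
import re
from html.parser import HTMLParser

# Shortened sample: an escaped <style> tag inside data-mw, then a real one.
SAMPLE = (
    '<sup data-mw="{&quot;html&quot;:&quot;'
    '<style data-mw-deduplicate=x>&quot;}">1</sup>'
    '<style data-mw-deduplicate="y">.a{}</style>'
)

# Naive pattern: has no notion of being inside a quoted attribute value.
NAIVE = re.compile(r"<style[^>]*data-mw-deduplicate")


class StyleTagCollector(HTMLParser):
    """Collect deduplication keys from real <style> start tags only."""

    def __init__(self):
        super().__init__()
        self.keys = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            key = dict(attrs).get("data-mw-deduplicate")
            if key is not None:
                self.keys.append(key)


def naive_match_count(html: str) -> int:
    return len(NAIVE.findall(html))


def tokenized_keys(html: str) -> list:
    collector = StyleTagCollector()
    collector.feed(html)
    collector.close()
    return collector.keys
```

On the sample, the regular expression also hits the escaped markup inside `data-mw`, whereas the tokenizer reports only the real `<style>` element, because the quoted attribute value is consumed as part of the `<sup>` start tag.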
* Namespace ParserOutput    James D. Forrester    2023-12-14    1    -2/+3
| Most used non-namespaced class!
| Bug: T353458
| Change-Id: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
* ParserOutput: Allow passing LinkTarget to title-related methods    C. Scott Ananian    2023-12-08    1    -6/+6
| Broadened the argument type to allow passing LinkTarget to:
| * ParserOutput::addCategory()
| * ParserOutput::addLanguageLink()
| * ParserOutput::addLink()
| * ParserOutput::addImage()
| * ParserOutput::addTemplate()
| This allows for a tighter interface with Parsoid's ContentMetadataCollector class and avoids errors caused by passing the wrong form of string title ("text" with spaces versus "dbkey" with underscores).
| There are a few performance problems remaining after this patch, which only apply to use by Parsoid (not the legacy parser):
| 1. ::addLink() does inefficient db requests to fetch the page id for each link if the optional $id parameter is not passed. These lookups should be deferred and a LinkBatch used. (The legacy parser always passes $id.)
| 2. ::addTemplate() similarly requires $page_id (and $rev_id) to be passed, so is not currently usable by Parsoid.
| 3. ::addLanguageLink() uses Title::getFullText() which is not present in LinkTarget and is currently implemented as a full Title lookup. This is not an issue for the legacy parser, because it already has a Title object so the lookup is a no-op, but could be improved for Parsoid's use.
| Bug: T296023
| Change-Id: If21ec8563c8a619bdde7c0cb6534bb9009480a21
* Only cache expensive renderings    daniel    2023-11-30    1    -0/+54
| Pages that are fast to render can be omitted from the parser cache to preserve disk space and cache write operations.
| The threshold is configurable per namespace, so the tradeoff can be evaluated based on different access patterns. For example, pages that are accessed rarely, like file description pages on Commons, may have a high threshold configured, while pages that are read frequently, like Wikipedia articles, may be configured to be always cached, using a 0 threshold.
| Filtering is based on a time profile recorded in the ParserOutput. A generic mechanism for capturing the timing profile is implemented in the ContentHandler base class. Subclasses may implement a more rigorous capture mechanism.
| Bug: T346765
| Change-Id: I38a6f3ef064f98f3ad6a7c60856b0248a94fe9ac
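A minimal sketch of the per-namespace threshold check is shown below. The function name `should_cache`, the unit (seconds), and the threshold values in the usage note are assumptions made for illustration; the actual configuration setting is not named in this log.

```python
def should_cache(render_seconds, namespace, thresholds, default_threshold=0.0):
    """Cache a rendering only if it took at least the namespace's threshold.

    A threshold of 0 means "always cache"; namespaces without an entry
    fall back to default_threshold.
    """
    return render_seconds >= thresholds.get(namespace, default_threshold)
```

For example, with `thresholds = {6: 1.0}` (a hypothetical high bar for one rarely read namespace), a rendering that took 0.2 seconds is skipped in namespace 6 but still cached in a namespace with no entry.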
* parser: Move lang/dir and mw-content-ltr to ParserOutput::getText    Timo Tijhof    2023-11-03    1    -8/+8
| == Skin::wrapHTML ==
| Skin::wrapHTML no longer has to perform any guessing of the ParserOutput language. Nor does it have to special-case wiki pages vs special pages in this regard. Yay, code removal.
| == ImagePage ==
| On URLs like /wiki/File:Example.jpg, the main output handler is ImagePage::view. This calls the parent Article::view to handle most of its output. Article::view obtains the ParserOptions, and then fetches ParserOutput, and then adds `<div class=mw-parser-output>` and its metadata to OutputPage.
| Before this change, ImagePage::view was creating a wrapper based on "predicting" what language the ParserOutput will contain. It couldn't call the new OutputPage::getContentLanguage or some equivalent as Article::view wouldn't have populated that yet.
| This leaky abstraction is fixed by this change as now the `<div>` from ParserOutput no longer comes with a "please wrap it properly" contract that Article subclasses couldn't possibly implement correctly (it couldn't wrap it after the fact because Article::view writes to OutputPage directly).
| RECENT (T310445): A special case was recently added for file pages about translated SVGs. For those, we decide which language to use for the "fullMedia" thumb atop the page. This was recently changed as part of T310445 from a hardcoded $wgLanguageCode (site content lang) to the new, problematic Title::getPageViewLanguage, which tries to guesstimate the page language of the rendered ParserOutput and then gets the preferred variant for the current user. The motivation for this was to support language variants but used Title::getPageViewLanguage as a kitchen sink to achieve that minor side-effect. The only part of this now-deprecated method that we actually need is LanguageConverter::getPreferredVariant().
| Test plan: Covered by ImagePageTest.
| == Skin mainpage-title ==
| RECENT (T331095, T298715): A special case was added to Skin::getTemplateData that powers the mainpage-title interface message feature. This is empty by default, but when created via MediaWiki:mainpage-title allows interface admins to replace the H1 with a custom and localised page heading.
| A few months ago, in Ifc9f0a7174, Title::getPageViewLanguage was applied here to support language variants. Replace with the same fix as for ImagePage. Revert back to Message::inContentLanguage() but refactor to inLanguage() via MediaWikiServices::getContentLanguage so that LanguageConverter::getPreferredVariant can be applied.
| == EditPage ==
| This was doing similar "predicting" of the ParserOutput language to create an empty preview placeholder for use by preview.js. Now that ApiParse (via ParserOutput::getText) returns a usable element without any secret "you magically know the right class, lang, and dir" contract, this placeholder is no longer needed.
| Test Plan:
| * EditPage: Default preview
|   1. index.php?title=Main_Page&action=edit
|   2. Show preview
|   3. Assert <div class="mw-content-ltr mw-parser-output" lang=en dir=ltr>
| * EditPage: JS preview
|   1. Preferences > Editing > Show preview without reload
|   2. index.php?title=Main_Page&action=edit
|   3. Show preview
|   4. Assert <div class="mw-content-ltr mw-parser-output" lang=en dir=ltr>
|   5. Type something and 'Show preview' again
|   6. Assert old element gone, new text is shown, and new element attributes are the same as the above.
| == McrUndoAction ==
| Same as EditPage basically, but without the JS preview use case.
| == DifferenceEngine ==
| Test:
|   1. Open /w/index.php?title=Main_Page&diff=0 (this shows the latest diff, can do manually by viewing /wiki/Main_Page, click "View history", click "Compare selected revisions")
|   2. Assert <div class="mw-content-ltr mw-parser-output" lang=en dir=ltr>
|   3. Open /w/index.php?title=Main_Page&diff=0&action=render
|   4. Assert <div class="mw-content-ltr mw-parser-output" lang=en dir=ltr>
| == Special:ExpandTemplates ==
| Test:
|   1. /wiki/Special:ExpandTemplates
|   2. Write "Hello".
|   3. "OK"
|   4. Assert <div class="mw-content-ltr mw-parser-output" lang=en dir=ltr>
| Bug: T341244
| Depends-On: Icd9c079f5896ee83d86b9c2699636dc81d25a14c
| Depends-On: I4e7484b3b94f1cb6062e7cef9f20626b650bb4b1
| Depends-On: I90b88f3b3a3bbeba4f48d118f92f54864997e105
| Change-Id: Ib130a055e46764544af0f1a46d2bc2b3a7ee85b7
* tests: Use fallback skin for ParserOutput/DefaultOutputTransform tests    Bartosz Dziewoński    2023-10-24    1    -6/+10
| This matches the behavior of parserTests.txt again (in which the fallback skin is used by ParserTestRunner::runLegacyTest).
| The extra <span> wrappers were added by the Vector skin (and could be affected by future changes to the Vector skin).
| Follow-up to Ief6a6ee03ada8207fc5c60ea438412fa2d529022.
| Change-Id: I33729b5026fcfbdbacc0e3fdfef91c9e6b461e6c
* Skin: Separate generation of edit section data from HTML    Jon Robson    2023-10-23    1    -6/+6
| The SkinMustache class now accepts a skin option that allows callers to specify a template that can be used to render the edit section link.
| Additional change:
| * Parser tests updated as now edit link label is wrapped as a span when rendered in Vector 2022 consistent with other links.
| Bug: T346944
| Change-Id: Ief6a6ee03ada8207fc5c60ea438412fa2d529022
* Refactor ParserOutput::getText into DefaultOutputTransform service    Isabelle Hurbain-Palatin    2023-10-16    1    -0/+6
| This also introduces the ephemeral field "$mTransformedText" to store the result of transformation in ParserOutput. This is a first step before the transformation uses HtmlHolder as input and output.
| Bug: T348253
| Change-Id: I312f3748ebfb0373ee3542ba0abdeefe7db1d488
* Remove implicit setter for ParserOutput::mTOCHTML    C. Scott Ananian    2023-10-04    1    -8/+1
| The ::setTOCHTML() and ::getTOCHTML() methods have been deprecated since 1.40; there's no reason we should be updating ::$mTOCHTML behind their backs.
| Bug: T348134
| Change-Id: I9396bc0a2caeb974a06c5b47075b3e2bb9f4278a
* Hard-deprecate ParserOutput::getCategories(), deprecated in 1.40    C. Scott Ananian    2023-09-29    1    -1/+1
| It is difficult to distinguish this method from OutputPage::addJsConfigVars() in code search:
| https://codesearch.wmcloud.org/deployed/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3EgetCategories%5C%28&files=&excludeFiles=&repos=
| We generally try to replace $output with $parserOutput or $pOutput as we touch code to improve the ability of codesearch to dig up deprecated ParserOutput methods.
| Bug: T305161
| Depends-On: I02dd4f61c43c225b0ef6dc51c3e4f9d967a0a272
| Depends-On: I61d2d77591579d825ad9d37f902e40366be55dd6
| Depends-On: I91155106b7a9e10d3334f95ba4936d02851bfb11
| Depends-On: Iaca745c79d9587571af03b23b21d76a6cba0ebf1
| Depends-On: Id10a171c44411b1233ee4d6cf8fbd3dc57744eef
| Depends-On: I47a25c011d9bd4b1a15dda4e673e32c25eb64f2b
| Depends-On: I683fc768aba50b801f46467fcfa1668fa8731ea6
| Change-Id: I5a2ac1c99b8b199102e12f0d32dd6ec5cdc24054
* Merge "Namespace TitleValue under \MediaWiki\Title"    jenkins-bot    2023-09-18    1    -0/+1
|\
| * Namespace TitleValue under \MediaWiki\Title    James D. Forrester    2023-09-18    1    -0/+1
| | One of the big ones, so doing this alone.
| | Bug: T166010
| | Change-Id: I4c901d5c32696d8334ec30cede7d9b6f3d8d645e
* | Remove ParserOutput::addOutputHook() and related code    C. Scott Ananian    2023-09-18    1    -22/+0
| | ParserOutput::addOutputHook() has been deprecated since 1.38, and without any calls to ::addOutputHook() the associated ::getOutputHooks() and $wgParserOutputHooks configuration do nothing.
| | Bug: T292321
| | Bug: T305161
| | Change-Id: Ib770c680d5e0697980e7e36a323ec56ba1d806b8
* | Remove ParserOutput::addTrackingCategory(), deprecated since 1.38    C. Scott Ananian    2023-09-18    1    -26/+0
|/
| Instead use either Parser::addTrackingCategory() or the TrackingCategories service.
| Bug: T305161
| Change-Id: I19e0f67e377e6c68f54f6d5bb4f079110d1e61fc