mediawiki/core.git - The collaborative editing software that runs Wikipedia.

	Commit message	Author	Age	Files	Lines
*	Sync up core repo with Parsoid	C. Scott Ananian	2025-04-04	1	-18/+70
\| \| \| \| \| \|	This now aligns with Parsoid commit 0965c908f046d659aab16b4023cc8de9ded1fce7 Change-Id: Ic007c7b4a893329de8499a88bb0edcb4b04d0905
*	Fixes to "Parsoid Fragment Support v2"	Subramanya Sastry	2025-03-13	1	-1/+1
\|	* In 387061415a38ea2d28e76ac9d7d599f6f02deec3, we added support for StripState::split. In 8465c722, we added support for 'exttag' strip marker which introduced the possibility of recursive strip markers. This patch fixes the oversight and adds recursive processing for nested strip markers. The code matches the logic of unstripType. Verified on local wiki that this fixes the issues highlighted in T387608.
\|
\|	* Ensure that processNowiki is true when fragment mode v2 is being used (ie, when stripExtTags is false). This makes unnecessary the StripState::replaceNoWikis() function added to support mw.text.unstripNoWiki in T272507 (and broken in T387655). The workaround can be cleaned up once v2 fragment mode is enabled everywhere. This fixes a regression in Scribunto's mw.text.unstripNoWiki function when v2 fragment mode is used.
\|
\|	* Ensure that the T299103 workaround for {{#tag:<nowiki>...</nowiki>}} continues to work by calling unstripNowiki() after PROCESS_NOWIKI puts the <nowiki> contents into the strip state. This fixes a regression in {{#tag:syntaxhighlight\|<nowiki>....</nowiki>}} when using v2 fragment mode.
\|
\|	* Added 'marker' to StripState::split() output, so that unhandled strip state components can be left as strip markers.
\|
\|	* Added some StripState::split() phpunit tests.
\|
\|	* Changed ParserTestRunner to enable v2 fragment mode by default, which helped identify the Scribunto and SyntaxHighlight regressions above, covered by their parser test suites.
\|
\|	Bug: T387608
\|	Bug: T387655
\|	Bug: T272507
\|	Co-Authored-By: C. Scott Ananian <cananian@wikimedia.org>
\|	Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org>
\|	Depends-On: I5e2533b7992b8e8a03fe2ea622b6fe5b008d20be
\|	Change-Id: I43134281e4da1c8767520e418031935447ea93af
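Illustrative sketch of the recursive processing described in the first bullet above. This is not the StripState code: the marker format, the `unstrip` helper, and the depth limit are simplified assumptions made only for illustration. It shows a stored fragment that itself contains a marker being expanded recursively, while markers with no entry in the state are left in place.

```python
import re

# Hypothetical, simplified marker format (the real strip markers differ).
MARKER = re.compile(r"\x7fUNIQ-(\d+)-QINU\x7f")


def make_marker(n: int) -> str:
    """Build a marker string for slot n (illustrative helper)."""
    return f"\x7fUNIQ-{n}-QINU\x7f"


def unstrip(text: str, state: dict, depth: int = 0, max_depth: int = 20) -> str:
    """Replace markers in text with stored content, recursing into that content."""
    if depth > max_depth:
        raise RecursionError("strip markers nested too deeply")

    def replace(match: re.Match) -> str:
        marker = match.group(0)
        if marker not in state:
            # Unhandled marker: leave it as a marker.
            return marker
        # Stored content may contain further markers, so process it too.
        return unstrip(state[marker], state, depth + 1, max_depth)

    return MARKER.sub(replace, text)


outer, inner, unknown = make_marker(0), make_marker(1), make_marker(2)
state = {outer: f"<pre>{inner}</pre>", inner: "nested text"}
result = unstrip(f"a {outer} b {unknown}", state)
```

The depth limit is a defensive choice in this sketch so that a self-referencing marker cannot recurse forever.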
*	Sync up core repo with Parsoid	Subramanya Sastry	2025-03-10	1	-9/+21
\| \| \| \| \| \|	This now aligns with Parsoid commit 4d44dbd77363ad4e7c428ecd30029e906018fa6c Change-Id: If883c36d599464aa6ed49edf71fd44dc880b3efd
*	DataAccess::preprocessWikitext(): fix logic around WikitextPFragment merging	C. Scott Ananian	2025-02-12	1	-0/+26
\|	Prevent unnecessary <nowiki/> tags from being inserted between extension tags resulting from template or parser function expansion.
\|
\|	Bug: T386233
\|	Change-Id: I1da9539837532e6690765e0717eee2f38378809c
*	Remove temporary $wgParsoidNewTemplateExpansionMode configuration	C. Scott Ananian	2025-02-03	1	-20/+0
\| \| \| \| \| \| \| \| \|	This was used to test an experimental parsoid feature before deployment, but the testing was successful. Bug: T382464 Follows-Up: I194a9550500bf7ece215791c51d6feb78a80b1a8 Change-Id: Ib91a17868352722dc3570b07856423733f1b2368
*	Parsoid fragment support: fix handling of 'nowiki' and 'general' strip markers	C. Scott Ananian	2025-02-03	1	-3/+46
\|	I originally misunderstood the difference between 'nowiki' and 'general' strip markers, thinking that 'nowiki' was "literal text" which needed to be HTML escaped, and 'general' was "HTML".
\|
\|	This is not correct! Both strip marker types are included as raw HTML; the 'general' strip marker however is technically "half-parsed HTML" and (in the legacy parser) is subject to doBlockLevels, Language Conversion, and other processing (T381709).
\|
\|	Parsoid does not currently have any support for processing "half-parsed HTML" and to date all code paths involving "half-parsed HTML" have been deprecated. For the moment, we'll treat the "half-parsed HTML" as "fully-parsed HTML" in Parsoid.
\|
\|	This patch doesn't make any changes to legacy parser behavior.
\|
\|	Change-Id: I07bacdb4bbe90728d2faa207c19fb92ad0e4a257
*	Add 'isRawHTML' output mode for parser functions and extensions	C. Scott Ananian	2025-02-02	1	-0/+69
\| \| \| \| \| \| \| \| \| \|	Ensure that when a parser function or extension returns raw HTML (using the new 'isRawHTML' flag) it is protected from doBlockLevels, language conversion, etc by using a 'nowiki' strip marker. Bug: T381617 Depends-On: I8f43f6ae9ca9a0c8d88c92b65c81fdc5cfa09dc3 Change-Id: Icb8eae9c1f3146e19c6bd811ab1fc86eebaa991f
*	Sync up core repo with Parsoid	Arlo Breault	2025-01-29	1	-112/+22
\| \| \| \| \| \|	This now aligns with Parsoid commit b9166ba69b1148e5b8d62dd200fa25fc79116b96 Change-Id: I5ca957b030639815786138b76c65720d706c13a6
*	Merge "Use Remex/HtmlHelper to implement Parser::replaceTableOfContents"	jenkins-bot	2025-01-29	1	-1/+1
\|\
\| *	Use Remex/HtmlHelper to implement Parser::replaceTableOfContents	C. Scott Ananian	2025-01-09	1	-1/+1
\| \|	This is more robust and secure than the regular expression previously used to extract the <meta> tag. We also improve HtmlHelper slightly by adding the ability to replace an element with an 'outerHTML' string.
\| \|
\| \|	Because our output is being run through Remex, there is a slightly larger degree of HTML normalization in the output than previously, which is visible in some small tweaks to test case outputs.
\| \|
\| \|	Bug: T381617
\| \|	Depends-On: I2712e0fa9272106e8cd686980f847ee7f6385b6f
\| \|	Change-Id: I4cb2f29cf890af90f295624c586d9e1eb1939b95
* \|	Sync up core repo with Parsoid	C. Scott Ananian	2025-01-23	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \|	This now aligns with Parsoid commit 3c3e96d168b5b5a5fe90520ea23f938a7a59181d Change-Id: Iadbe23dcc4b9ee68ad10220623ad9edae0b41b40
* \|	Support nested special page transclusion in Parsoid	Arlo Breault	2025-01-17	1	-1/+21
\|/ \| \| \| \|	Bug: T356718 Change-Id: Ie50308bde7212cc19d6fe6273ae36e79ae5f94c3
*	Sync up core repo with Parsoid	Subramanya Sastry	2024-12-18	1	-10/+79
\| \| \| \| \| \|	This now aligns with Parsoid commit 17e81f0d1890bef61f3d12be69a02f8a1fdd3edf Change-Id: I03929213653349b625eb75d9b0444cdd98466c89
*	Sync up core repo with Parsoid	C. Scott Ananian	2024-11-15	1	-40/+75
\| \| \| \| \| \|	This now aligns with Parsoid commit 1402de4384d49f46b4c72e71797714486e8cec9b Change-Id: I2e6b7b5cb4dd533f83a7eb69ce49e57d5346f291
*	Fix typo is comments	Fomafix	2024-11-14	1	-1/+1
\| \| \| \|	Change-Id: I1b7a9d85fbea10406def755da553ef7ba47e1858
*	Drop empty ids	Arlo Breault	2024-11-01	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \|	Empty ids aren't valid identifiers. The spec says they must contain at least one character, https://html.spec.whatwg.org/multipage/dom.html#global-attributes:the-id-attribute-2 Test was introduced in If95fd9410f8d2e1ed403ea063e09670a7f71dcce Depends-On: Iec3c919ed1ea51acef9efabe979bd8d0feaf651a Change-Id: I3c547f5524530e976eb7aa960751265c8383f7b4
*	Use a better bidi aware markup in CommentParser	Ebrahim Byagowi	2024-10-04	1	-6/+6
\|	As noted in the comments, this needed markup that works better in bidi scenarios; as part of replacing bidi control codes with HTML markup, I was able to test different bidi scenarios using <bdi> HTML tags.
\|
\|	Bug: T375975
\|	Change-Id: If2af751fc9f78869acf7b7e93199fa927de2cc19
*	Sync up core repo with Parsoid	C. Scott Ananian	2024-10-02	1	-1/+19
\| \| \| \| \| \|	This now aligns with Parsoid commit b19f73d7beadedcb6991640aac7eb7d6e7aec8f5 Change-Id: Ief91b25769f777169af65c9720faa767850f6239
*	Deduplicate language links in ParserOutput and OutputPage	C. Scott Ananian	2024-09-26	1	-5/+0
\|	Move deduplication of language links out of Parser.php and into the ParserOutput in order to be compatible with alternate Parsers (Parsoid).
\|
\|	Clean up various inconsistencies: ensure deduplication also happens in OutputPage when multiple ParserOutputs are merged into the final output, and ensure that the deduplication in LinksUpdate is done in the same order (first link prevails) as in Parser/ParserOutput/OutputPage.
\|
\|	Deprecate OutputPage::setLanguageLinks() (the matching ParserOutput::setLanguageLinks() was deprecated in 1.42).
\|
\|	As a breaking change, return an array, not an array reference, from ParserOutput::getLanguageLinks(). This allows us to safely modify the internal representation of language links. As far as I can tell, no one used the returned reference to sneakily modify the list of language links, and there is not a good way to have deprecated this before making the breaking change.
\|
\|	While we're at it, we've added tests to ensure that language link fragments are preserved.
\|
\|	Bug: T26502
\|	Bug: T358950
\|	Bug: T375005
\|	Change-Id: I82a05a51d94782ebb9fa87ff889ca0f633b3e15c
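Illustrative sketch of "first link prevails" deduplication as described above. It is not the ParserOutput implementation: the `prefix:Title#fragment` string form, the choice to key on the language prefix, and the function name are assumptions made only for illustration.

```python
def dedupe_language_links(links: list[str]) -> list[str]:
    """Keep the first link seen for each language prefix, preserving order."""
    first_by_prefix: dict[str, str] = {}
    for link in links:
        # Assumed form: "<prefix>:<title>[#fragment]".
        prefix, _, _rest = link.partition(":")
        # setdefault only stores the link if the prefix is new,
        # so the earliest link for a prefix wins.
        first_by_prefix.setdefault(prefix, link)
    # Python dicts preserve insertion order, so output order is stable.
    return list(first_by_prefix.values())


merged = dedupe_language_links(
    ["en:Foo#History", "fr:Bar", "en:Baz", "de:Qux", "fr:Autre"]
)
```

Because the whole link string is stored unchanged, any fragment on the surviving link (here `#History`) is preserved.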
*	Sync up core repo with Parsoid	C. Scott Ananian	2024-09-26	1	-293/+4
\| \| \| \| \| \|	This now aligns with Parsoid commit fc9ab0949952d5e784acb012096860f5c8663fc7 Change-Id: I5d72f551c75de80b0834ea98d8a1d3cb5852e866
*	Sync up core repo with Parsoid	C. Scott Ananian	2024-09-24	1	-35/+61
\| \| \| \| \| \|	This now aligns with Parsoid commit dea42dd799d9c40fb7fedb42122ec264d6ef6ded Change-Id: I4b2614ce3a83bfea0af53927464e7fbde6a92df9
*	Parser tests: add additional options to test ParserOutput metadata	C. Scott Ananian	2024-09-13	1	-3/+49
\| \| \| \| \| \| \| \| \| \| \| \| \|	New options added: `iwl`, `links`, `special`, `extlinks`, and `templates`, and handling of existing `ill` option tweaked to be consistent. Added some tests to exercise these options, focusing on the handling of title fragments. Attempted to make the output formatting consistent among options; a future unification (I32df68714ffdf2f0745b974f47bc3ccceef1f41c) should help DRY these out further. Bug: T310512 Change-Id: Ic9c766ae4362969de124ad9d66eb47cfa68395c6
*	Sync up core repo with Parsoid	Yiannis Giannelos	2024-09-12	1	-3/+91
\| \| \| \| \| \|	This now aligns with Parsoid commit 80bc41a395b19221e7f26b36dfbe0ab15a025819 Change-Id: Iec571f78e7a55991aea69ede2519803b84c05936
*	parserTests.txt: Update documentation about cat/ill options	C. Scott Ananian	2024-09-10	1	-2/+0
\| \| \| \| \| \|	Parsoid does support these options now. Change-Id: I9caedd10b8f7229602ad4f963275b62777aca104
*	parser: Add a new {{USERLANGUAGE}} magic word for use in wikitext	dvorapa	2024-09-07	1	-1/+1
\| \| \| \| \| \| \| \|	Depending on configuration, this returns either the interface language code of the current user or the current page language. Bug: T4085 Change-Id: Iab7fda272ec81af88c74612727ff6bed014d4a81
*	Merge "Make {{#language}} consistent with {{#dir}} and {{#bcp47}}"	jenkins-bot	2024-07-31	1	-2/+61
\|\
\| *	Make {{#language}} consistent with {{#dir}} and {{#bcp47}}	C. Scott Ananian	2024-07-30	1	-2/+61
\| \|	Add the same no-arg options for language code that {{#dir}} and {{#bcp47}} have, for consistency:
\| \|	* `{{#language}}` will return the name of the target language (for articles, the content language; for messages, the user language)
\| \|
\| \|	The default value for the "in language" argument should be the autonym. This was working previously but only via a baroque code flow path for invalid language codes. Make this a bit clearer and add tests.
\| \|
\| \|	Since non-autonym language code translations are added via the [[Extension:CLDR]] in production, hook LanguageGetTranslatedLanguageNames in the ParserTestRunner to ensure that we can test this.
\| \|
\| \|	Followup-To: Ice1c671c5b3cc077d2bb80ea5dc25c5eabbfeb36
\| \|	Followup-To: I19c3e91a924e080f37dc95a0d4e61493583b533e
\| \|	Change-Id: Ibf6e7f194cc056eadb48a5ad8e6d01a761d9351c
* \|	Merge "Add {{#bcp47}} parser function"	jenkins-bot	2024-07-31	1	-0/+121
\|\\|
\| *	Add {{#bcp47}} parser function	C. Scott Ananian	2024-07-30	1	-0/+121
\| \|	Template:Bcp47 is one of the most used templates in Wikimedia Commons. Providing its functionality as a parser function, tied to MediaWiki's language-handling code, reduces code duplication and will allow us to reduce template usage on Commons.
\| \|
\| \|	As with the {{#dir}} parser function, support one special case:
\| \|	* `{{#bcp47}}` will return the BCP-47 code of the target language (for articles, the content language; for messages, the user language)
\| \|
\| \|	Note the following slight differences from [[Template:BCP47]] on Commons, documented in an added parser test:
\| \|	* 'simple' maps to 'en-simple' (not just 'en')
\| \|	* 'roa-tara' maps to 'nap-x-tara' (not 'it-x-tara')
\| \|
\| \|	Bug: T366623
\| \|	Change-Id: Ice1c671c5b3cc077d2bb80ea5dc25c5eabbfeb36
* \|	Merge "Add {{#dir}} parser function"	jenkins-bot	2024-07-30	1	-0/+83
\|\\|
\| *	Add {{#dir}} parser function	Ebrahim Byagowi	2024-07-19	1	-0/+83
\| \|	Template:Dir is one of the most used templates in Wikimedia Commons; this tries to provide parts of its functionality in the hope that we can perhaps simplify or get rid of the template eventually, for clarity and performance reasons.
\| \|
\| \|	As a convenience, `{{#dir}}` and `{{#dir:}}` are synonyms for `{{#dir:{{PAGELANGUAGE}}}}`: they return the direction of the target language. For articles, the target language is the content language; for messages, the target language is the user language.
\| \|
\| \|	In addition, to avoid confusion between BCP-47 language codes and MediaWiki-internal language codes, an optional second parameter can be supplied. If the second parameter is the (localizable) string 'bcp47', the language code given in the first parameter will be treated as a BCP-47 code. For example: `{{#dir:sr-Cyrl\|bcp47}}`.
\| \|
\| \|	(See LanguageCode::bcp47ToInternal() for a description of the differences and overlaps between MediaWiki internal and BCP-47 codes. These overlaps so far don't result in any case where encouraging editors to be precise about which set of enumerated string values they are using for consistency with other language-related functions, and because MediaWiki internally differentiates between BCP-47 codes and internal codes.)
\| \|
\| \|	Bug: T359761
\| \|	Change-Id: I19c3e91a924e080f37dc95a0d4e61493583b533e
* \|	ParserTestRunner: add timezone and user language options	Tim Starling	2024-07-12	1	-1/+2
\|/ \| \| \| \| \| \| \| \|	* Add wgLocaltimezone to the list of global variables which may be set in parser test options. * Add userLanguage option, which is passed through to ParserOptions. Bug: T223772 Change-Id: I8498527c276288feae854868a8f4b1f3205a49e8
*	Sync up core repo with Parsoid	C. Scott Ananian	2024-06-10	1	-0/+6
\| \| \| \| \| \|	This now aligns with Parsoid commit 2508e24a2aeb54b55eb54f7f65bedc4d477fc9cf Change-Id: Ibb9f1c6287c6ec3e982f0fa3ddf908b01484973a
*	Move section edit links outside headings (new heading HTML)	Bartosz Dziewoński	2024-05-06	1	-449/+39
\|	Legacy parser can now output headings using a more accessible markup, which is also identical to the markup used by the Parsoid parser. Changes to client-side JS and CSS necessary to support the new markup have already been merged in earlier commits.
\|
\|	includes/skins/Skin.php
\|	includes/ServiceWiring.php
\|	* Define a new skin option, 'supportsMwHeading', which can be used to toggle the new markup per-skin.
\|	* Update the built-in fallback skin to enable it. This affects the output in parser tests.
\|
\|	docs/config-schema.yaml
\|	includes/config-schema.php
\|	includes/config-vars.php
\|	includes/MainConfigNames.php
\|	includes/MainConfigSchema.php
\|	* Add a new configuration setting, 'ParserEnableLegacyHeadingDOM', which can be used to toggle the new markup per-site.
\|
\|	includes/OutputTransform/Stages/HandleSectionLinks.php
\|	* Output new heading HTML for skins that enabled the option.
\|
\|	tests/*
\|	* Duplicate parser tests that cover heading generation to cover both new and old markup. Update other parser tests to use new markup.
\|	* Add some unit and integration tests for the behavior of the skin option and some parser tests for edge cases of the new markup.
\|
\|	Bug: T13555
\|	Change-Id: I1180169a8e83af834c2984ba16089e6277f2a8dd
*	Sync up core repo with Parsoid	Subramanya Sastry	2024-04-26	1	-2/+0
\| \| \| \| \| \|	This now aligns with Parsoid commit 902eb345ed701b635b98f03557276aa48b564cc2 Change-Id: I91c663a4f2ca00157fbd9337d1d0c72a98452591
*	Sync up core repo with Parsoid	Arlo Breault	2024-04-11	1	-4/+4
\| \| \| \| \| \|	This now aligns with Parsoid commit c296dca4af9a1d47200a3699e12d9884acc43150 Change-Id: I5a0e246171e9b58d77b2be945b802f381c1f40b2
*	Merge "Substitute category default sort key when filling links table, not at parse time"	jenkins-bot	2024-04-11	1	-0/+16
\|\
\| *	Substitute category default sort key when filling links table, not at parse time	C. Scott Ananian	2024-03-29	1	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This ensures uniform treatment of all places that call `addCategory` without duplicating the `defaultsort` code; it also ensures that the effect of the {{DEFAULTSORT}} parser function is independent of page position. Bug: T40435 Bug: T353530 Change-Id: I4480a6d59e766fa4eddc9ec9117c58b66771bb47
* \|	Merge "Don't strip non-newline whitespace from left side of language links"	jenkins-bot	2024-04-04	1	-0/+51
\|\ \
\| * \|	Don't strip non-newline whitespace from left side of language links	C. Scott Ananian	2024-03-29	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This follows up on I5e87b33a956e296cdaf671fa99c9555944b73479 and makes (invisible) language links consistent with how we handle (invisible) category links. Bug: T359886 Followup-To: I5e87b33a956e296cdaf671fa99c9555944b73479 Change-Id: I3e5567a91b47e0b04da928450644f3f475aaf51b
* \| \|	Merge "Sync up core repo with Parsoid"	jenkins-bot	2024-04-01	1	-11/+54
\|\ \ \
\| \|/ /
\|/\| \|
\| * \|	Sync up core repo with Parsoid	Subramanya Sastry	2024-04-01	1	-11/+54
\| \|/ \| \| \| \| \| \| \| \| \| \|	This now aligns with Parsoid commit 16e27722c6c50618c78230952c1ad27948fc3a0b Change-Id: I21067c1b22a494422184abf7c4bb50424b4fad56
* /	Don't strip non-newline whitespace from left side of [[Category]] links	C. Scott Ananian	2024-03-29	1	-2/+56
\|/
\|	This follows up on a long series of tweaks to whitespace handling around [[Category]] links (T2087, T87753, T174639) which aimed to simplify and make intelligible the whitespace handling around category links without allowing categories to break lists or paragraphs in which they are found.
\|
\|	Removing newlines but not other whitespace on the left-hand side of category links should preserve the valuable features of T2087 et al while still ensuring that the following all render equivalently:
\|
\|	ABC [[Category:Foo]]DEF
\|	ABC[[Category:Foo]] DEF
\|	ABC [[Category:Foo]] DEF
\|
\|	Added parser test to document the new behavior; it's worth noting that although there were plenty of tests documenting the expected interaction of category links and newlines, there were previously no tests covering the interaction of non-newline whitespace and category links; the one test which needed to be altered added non-semantic whitespace (ie, extra whitespace to the test output which did not affect the way the HTML would display).
\|
\|	This patch brings the legacy parser into parity with Parsoid parsing of category links.
\|
\|	Bug: T359886
\|	Change-Id: I5e87b33a956e296cdaf671fa99c9555944b73479
*	parser: Fix formatdate parser function for ISO year 0 = 1 BC	thiemowmde	2024-02-27	1	-0/+15
\|	I'm not sure how this ever happened, but I'm sure it's a mistake. The following test scenario should make it very obvious:
\|	* {{#formatdate:-0002-12-31\|mdy}}
\|	* {{#formatdate:-0001-12-31\|mdy}}
\|	* {{#formatdate:0000-12-31\|mdy}}
\|	* {{#formatdate:0001-12-31\|mdy}}
\|	* {{#formatdate:0002-12-31\|mdy}}
\|
\|	Expected output: 3 BC, 2 BC, 1 BC, 1, 2, …
\|	Current output: 3 BC, 2 BC, 0 (?), 1, 2, …
\|
\|	Note how "1 BC" is skipped and shown as "0" instead. Everything else is correct, e.g. the ISO year -1 is already displayed as "2 BC". It's really only this single outlier.
\|
\|	In case you don't know: There is no year 0 when the BC specifier is used. There is either year 1 after or year 1 before Christ. This is different in ISO, mostly to make calculations easier. That's why the DateFormatter already does an extra `- 1` and `+ 1` in the two makeIsoYear and makeNormalYear methods.
\|
\|	The problematic line of code was originally written in 2003, see https://phabricator.wikimedia.org/rMW98fc03e6
\|
\|	The core parser function exists since 2009, see https://phabricator.wikimedia.org/rMWb9ffb5a7
\|
\|	Change-Id: Iaeb7a954579a409fefd87dab4e2a15778ab39fb4
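Illustrative sketch of the ISO-year-to-BC mapping that the commit message above describes. It is not the PHP code from the patch; the function name and the plain-string output format are assumptions made only for illustration. It only demonstrates the off-by-one rule: ISO year 0 is 1 BC, ISO year -1 is 2 BC, and positive years are unchanged.

```python
def iso_year_to_label(iso_year: int) -> str:
    """Map an ISO 8601 year number to a BC-style label.

    ISO numbering includes a year 0; BC numbering does not, so every
    ISO year at or below 0 is shifted by one.
    """
    if iso_year <= 0:
        return f"{1 - iso_year} BC"
    return str(iso_year)


iso_years = [-2, -1, 0, 1, 2]
labels = [iso_year_to_label(year) for year in iso_years]
print(", ".join(labels))  # prints "3 BC, 2 BC, 1 BC, 1, 2"
```

The printed sequence matches the "Expected output" line of the commit message for its five test dates.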
*	Sync up core repo with Parsoid	C. Scott Ananian	2024-02-21	1	-203/+203
\|	This now aligns with Parsoid commit 51baccc8741108a9e3f763f2c19c6ce6eda55ac4
\|
\|	Three tests needed to be disabled because they had dependencies on features not included in core's CI:
\|	* {{#if}} used in tests added by I71c38b42ac9bfb7137f2e34df70bdfa139abced7 but only provided by the ParserFunctions extension
\|	* <poem> used in tests added by I5a6356a82251881a5f841b36a7f26879fc611138 but only provided by the Poem extension
\|
\|	In addition, the "multiline" part of the "Expansion of multi-line..." parser tests seems to have been lost at some point. My best guess is that the definition of `Template:1x` initially included an extra newline which was lost, maybe during an unrelated stripping of leading/trailing whitespace in `!! article` clauses. In any case, these tests are no longer testing the thing they say they are. These will be fixed in a follow up.
\|
\|	Change-Id: Ia9144634625f176fbea11f3d2ef4b21a5492e99b
*	Fix more incorrect casing of MediaWiki	Reedy	2024-02-19	1	-2/+2
\| \| \| \|	Change-Id: I331e5636823a0beae8d804148f648cfaffd6a1f8
*	Revert "Use Remex for DeduplicateStyles transform"	Isabelle Hurbain-Palatin	2023-12-22	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \|	This reverts commit 82da9cf14be08e9458f58fa96be51966a2fe7cb1. Passing through Remex seems to have unexpected consequences to be investigated but, for the sake of unbreaking the UBN, let's revert this first. Bug: T353920 Change-Id: Iaac7942aa77aee5ab525852ac5b41dd516ff13c9
*	Merge "Make two messages not raw HTML"	jenkins-bot	2023-12-18	1	-2/+2
\|\
\| *	Make two messages not raw HTML	Jon Harald Søby	2023-12-15	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two messages were added to wgRawHtmlMessages instead of just fixing the way they were parsed so they can't contain raw HTML. This fixes that. In order to avoid breakage on-wiki for old customized messages that took advantage of them being parsed as raw HTML, rename the messages too. Also rename a few other messages from the same set to stay consistent. Note: These messages are suppressed in favour of Echo's messages when Echo is enabled, and Echo is enabled on all Wikimedia wikis, so the existing customized messages on Wikimedia wikis are basically no-ops. Bug: T353316 Change-Id: Ib0d1c79247fe091f2806b7c23ffb2fe22cc4df4a
* \|	Use Remex for DeduplicateStyles transform	C. Scott Ananian	2023-12-15	1	-0/+1
\|/
\|	The previous implementation was using an ad-hoc regular expression which was matching inside the data-mw attribute of Parsoid output, eg:
\|
\|	<sup about="#mwt42" [...] typeof="mw:Extension/ref mw:Error" data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;attrs&quot;:{&quot;name&quot;:&quot;infobox_stats_ref_rail&quot;},&quot;body&quot;:{&quot;html&quot;:&quot;<style data-mw-deduplicate=\&quot;TemplateStyles:r1133582631\&quot; typeof=\&quot;...">
\|
\|	After substitution, the <link> element inserted contained " instead of &quot; and so broke out of the attribute.
\|
\|	Instead use a proper HTML tokenizer (via wikimedia/remex-html) so that we don't allow bogus matches inside attribute values.
\|
\|	To fix up tests:
\|	* Don't deduplicate styles when parsing UX messages (also helps performance)
\|	* Don't deduplicate styles in ContentHandler integration tests
\|	* Don't deduplicate styles by default in parser tests (unless explicit option is set)
\|
\|	Depends-On: Id9801a9ff540bd818a32bc6fa35c48a9cff12d3a
\|	Depends-On: I5111f1fdb7140948b82113adbc774af286174ab3
\|	Followup-To: Ic0b17e361bf6eb0e71c498abc17f5f67f82318f8
\|	Change-Id: I32d3d1772243c3819e1e1486351d16871b6e21c4
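Illustrative sketch of the failure mode described above. It does not use Remex or the actual DeduplicateStyles code; Python's standard `html.parser` stands in for a real tokenizer, and the regular expression, sample markup, and class name are assumptions made only for illustration. It contrasts a naive pattern, which also matches `<style ...>` text embedded in an attribute value, with a tokenizer, which reports only genuine start tags.

```python
import re
from html.parser import HTMLParser

# Sample markup: the first "<style ...>" is only text inside the data-mw
# attribute value; the second is a real element.
html_doc = (
    r'<sup data-mw="{&quot;html&quot;:&quot;<style data-mw-deduplicate='
    r'\&quot;k1\&quot;>.a{}</style>&quot;}">ref</sup>'
    r'<style data-mw-deduplicate="k2">.b{}</style>'
)

# Naive pattern (assumed for this sketch): it has no notion of attribute
# boundaries, so it matches inside the attribute value as well.
NAIVE = re.compile(r"<style\s+[^>]*data-mw-deduplicate\s*=[^>]*>")


class StyleTagFinder(HTMLParser):
    """Collect data-mw-deduplicate keys from real <style> start tags only."""

    def __init__(self) -> None:
        super().__init__()
        self.keys: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            attr_map = dict(attrs)
            if "data-mw-deduplicate" in attr_map:
                self.keys.append(attr_map["data-mw-deduplicate"])


naive_hits = NAIVE.findall(html_doc)

finder = StyleTagFinder()
finder.feed(html_doc)
finder.close()
tokenizer_keys = finder.keys
```

Here the regular expression reports two matches, one of them bogus, while the tokenizer sees a single `<style>` element with key `k2`.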