| Commit message (Collapse) | Author | Age | Files | Lines |
| |
This now aligns with Parsoid commit 0965c908f046d659aab16b4023cc8de9ded1fce7
Change-Id: Ic007c7b4a893329de8499a88bb0edcb4b04d0905
| |
* In 387061415a38ea2d28e76ac9d7d599f6f02deec3, we added support for
StripState::split. In 8465c722, we added support for 'exttag' strip
marker which introduced the possibility of recursive strip
markers. This patch fixes the oversight and adds recursive
processing for nested strip markers. The code matches the logic of
unstripType. Verified on local wiki that this fixes the issues
highlighted in T387608.
* Ensure that processNowiki is true when fragment mode v2 is being
used (ie, when stripExtTags is false). This makes unnecessary the
StripState::replaceNoWikis() function added to support
mw.text.unstripNoWiki in T272507 (and broken in T387655). The
workaround can be cleaned up once v2 fragment mode is enabled
everywhere. This fixes a regression in Scribunto's
mw.text.unstripNoWiki function when v2 fragment mode is used.
* Ensure that the T299103 workaround for {{#tag:<nowiki>...</nowiki>}}
continues to work by calling unstripNowiki() after PROCESS_NOWIKI
puts the <nowiki> contents into the strip state. This fixes a regression
in {{#tag:syntaxhighlight|<nowiki>....</nowiki>}} when using v2
fragment mode.
* Added 'marker' to StripState::split() output, so that unhandled
strip state components can be left as strip markers.
* Added some StripState::split() phpunit tests.
* Changed ParserTestRunner to enable v2 fragment mode by default,
which helped identify the Scribunto and SyntaxHighlight regressions
above, covered by their parser test suites.
Bug: T387608
Bug: T387655
Bug: T272507
Co-Authored-By: C. Scott Ananian <cananian@wikimedia.org>
Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org>
Depends-On: I5e2533b7992b8e8a03fe2ea622b6fe5b008d20be
Change-Id: I43134281e4da1c8767520e418031935447ea93af
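The recursive processing of nested strip markers described above can be illustrated with a minimal sketch. This is not the StripState code: the marker syntax, the `unstrip` function, the dictionary-based strip state, and the `max_depth` limit are all assumptions made for illustration.

```python
import re

# Hypothetical marker syntax for illustration only; the real strip
# marker format used by the parser is not reproduced here.
MARKER_RE = re.compile(r"\x7fUNIQ-(\d+)-QINU\x7f")


def unstrip(text, strip_state, depth=0, max_depth=20):
    """Replace markers with stored content, recursing into nested markers."""
    if depth > max_depth:
        # Assumed safety limit so a self-referencing marker cannot recurse forever.
        return text

    def expand(match):
        key = match.group(1)
        if key not in strip_state:
            # Unknown markers are left in place as markers.
            return match.group(0)
        # Stored content may itself contain markers, so expand it recursively.
        return unstrip(strip_state[key], strip_state, depth + 1, max_depth)

    return MARKER_RE.sub(expand, text)


# Marker 0 holds content that itself contains marker 1.
state = {
    "0": "<pre>outer \x7fUNIQ-1-QINU\x7f</pre>",
    "1": "inner",
}
result = unstrip("a \x7fUNIQ-0-QINU\x7f b", state)
```

A single non-recursive pass would leave marker 1 behind inside the expanded content of marker 0, which is the kind of oversight this patch addresses.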
| |
This now aligns with Parsoid commit 4d44dbd77363ad4e7c428ecd30029e906018fa6c
Change-Id: If883c36d599464aa6ed49edf71fd44dc880b3efd
| |
Prevent unnecessary <nowiki/> tags from being inserted between
extension tags resulting from template or parser function expansion.
Bug: T386233
Change-Id: I1da9539837532e6690765e0717eee2f38378809c
| |
This was used to test an experimental parsoid feature before deployment,
but the testing was successful.
Bug: T382464
Follows-Up: I194a9550500bf7ece215791c51d6feb78a80b1a8
Change-Id: Ib91a17868352722dc3570b07856423733f1b2368
| |
I originally misunderstood the difference between 'nowiki' and
'general' strip markers, thinking that 'nowiki' was "literal text"
which needed to be HTML escaped, and 'general' was "HTML". This is
not correct! Both strip marker types are included as raw HTML; the
'general' strip marker however is technically "half-parsed HTML" and
(in the legacy parser) is subject to doBlockLevels, Language
Conversion, and other processing (T381709). Parsoid does not
currently have any support for processing "half-parsed HTML" and to
date all code paths involving "half-parsed HTML" have been deprecated.
For the moment, we'll treat the "half-parsed HTML" as "fully-parsed
HTML" in Parsoid.
This patch doesn't make any changes to legacy parser behavior.
Change-Id: I07bacdb4bbe90728d2faa207c19fb92ad0e4a257
| |
Ensure that when a parser function or extension returns raw HTML
(using the new 'isRawHTML' flag) it is protected from doBlockLevels,
language conversion, etc by using a 'nowiki' strip marker.
Bug: T381617
Depends-On: I8f43f6ae9ca9a0c8d88c92b65c81fdc5cfa09dc3
Change-Id: Icb8eae9c1f3146e19c6bd811ab1fc86eebaa991f
| |
This now aligns with Parsoid commit b9166ba69b1148e5b8d62dd200fa25fc79116b96
Change-Id: I5ca957b030639815786138b76c65720d706c13a6
| | |
This is more robust and secure than the regular expression previously
used to extract the <meta> tag.
We also improve HtmlHelper slightly by adding the ability to replace
an element with an 'outerHTML' string.
Because our output is being run through Remex, there is a slightly
larger degree of HTML normalization in the output than previously,
which is visible in some small tweaks to test case outputs.
Bug: T381617
Depends-On: I2712e0fa9272106e8cd686980f847ee7f6385b6f
Change-Id: I4cb2f29cf890af90f295624c586d9e1eb1939b95
| | |
This now aligns with Parsoid commit 3c3e96d168b5b5a5fe90520ea23f938a7a59181d
Change-Id: Iadbe23dcc4b9ee68ad10220623ad9edae0b41b40
| |
Bug: T356718
Change-Id: Ie50308bde7212cc19d6fe6273ae36e79ae5f94c3
| |
This now aligns with Parsoid commit 17e81f0d1890bef61f3d12be69a02f8a1fdd3edf
Change-Id: I03929213653349b625eb75d9b0444cdd98466c89
| |
This now aligns with Parsoid commit 1402de4384d49f46b4c72e71797714486e8cec9b
Change-Id: I2e6b7b5cb4dd533f83a7eb69ce49e57d5346f291
| |
Change-Id: I1b7a9d85fbea10406def755da553ef7ba47e1858
| |
Empty ids aren't valid identifiers. The spec says they must contain at
least one character,
https://html.spec.whatwg.org/multipage/dom.html#global-attributes:the-id-attribute-2
Test was introduced in If95fd9410f8d2e1ed403ea063e09670a7f71dcce
Depends-On: Iec3c919ed1ea51acef9efabe979bd8d0feaf651a
Change-Id: I3c547f5524530e976eb7aa960751265c8383f7b4
| |
As noted in the comments, this needed markup that works better
in bidi scenarios. As part of replacing bidi control codes
with HTML markup, I was able to test different bidi scenarios
using <bdi> HTML tags.
Bug: T375975
Change-Id: If2af751fc9f78869acf7b7e93199fa927de2cc19
| |
This now aligns with Parsoid commit b19f73d7beadedcb6991640aac7eb7d6e7aec8f5
Change-Id: Ief91b25769f777169af65c9720faa767850f6239
| |
Move deduplication of language links out of Parser.php and into the
ParserOutput in order to be compatible with alternate Parsers (Parsoid).
Clean up various inconsistencies: ensure deduplication also happens in
OutputPage when multiple ParserOutputs are merged into the final output,
and ensure that the deduplication in LinksUpdate is done in the same
order (first link prevails) as in Parser/ParserOutput/OutputPage.
Deprecate OutputPage::setLanguageLinks() (the matching
ParserOutput::setLanguageLinks() was deprecated in 1.42).
As a breaking change, return an array, not an array *reference*, from
ParserOutput::getLanguageLinks(). This allows us to safely modify the
internal representation of language links. As far as I can tell, no one
used the returned reference to sneakily modify the list of language
links, and there is not a good way to have deprecated this before making
the breaking change.
While we're at it, we've added tests to ensure that language link
fragments are preserved.
Bug: T26502
Bug: T358950
Bug: T375005
Change-Id: I82a05a51d94782ebb9fa87ff889ca0f633b3e15c
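The "first link prevails" rule can be sketched as follows. This is an illustration only, assuming language links are plain `prefix:title` strings; the function name `dedupe_language_links` is hypothetical and does not correspond to the ParserOutput or OutputPage API.

```python
def dedupe_language_links(links):
    """Keep only the first link seen for each language prefix."""
    seen = {}
    for link in links:
        # Split "fr:Paris#History" into the prefix "fr" and the rest.
        prefix, _, _title = link.partition(":")
        # setdefault keeps the existing entry, so the first link wins;
        # dicts preserve insertion order, so output order is stable.
        seen.setdefault(prefix, link)
    return list(seen.values())


# The same rule applies when links from several outputs are merged.
merged = dedupe_language_links(
    ["fr:Paris", "de:Paris#Geschichte", "fr:Lutèce"]
)
```

Here the second `fr` link is dropped, and the fragment on the `de` link is kept intact.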
| |
This now aligns with Parsoid commit fc9ab0949952d5e784acb012096860f5c8663fc7
Change-Id: I5d72f551c75de80b0834ea98d8a1d3cb5852e866
| |
This now aligns with Parsoid commit dea42dd799d9c40fb7fedb42122ec264d6ef6ded
Change-Id: I4b2614ce3a83bfea0af53927464e7fbde6a92df9
| |
New options added: `iwl`, `links`, `special`, `extlinks`, and `templates`,
and handling of existing `ill` option tweaked to be consistent.
Added some tests to exercise these options, focusing on the handling
of title fragments. Attempted to make the output formatting consistent
among options; a future unification (I32df68714ffdf2f0745b974f47bc3ccceef1f41c)
should help DRY these out further.
Bug: T310512
Change-Id: Ic9c766ae4362969de124ad9d66eb47cfa68395c6
| |
This now aligns with Parsoid commit 80bc41a395b19221e7f26b36dfbe0ab15a025819
Change-Id: Iec571f78e7a55991aea69ede2519803b84c05936
| |
Parsoid does support these options now.
Change-Id: I9caedd10b8f7229602ad4f963275b62777aca104
| |
Depending on configuration, this returns either the interface language
code of the current user or the current page language.
Bug: T4085
Change-Id: Iab7fda272ec81af88c74612727ff6bed014d4a81
| | |
Add the same no-arg options for language code that
{{#dir}} and {{#bcp47}} have, for consistency:
* `{{#language}}` will return the name of the *target language*
(for articles, the content language; for messages, the user language)
The default value for the "in language" argument should be the autonym.
This was working previously but only via a baroque code flow path for
invalid language codes. Make this a bit clearer and add tests.
Since non-autonym language code translations are added via the
[[Extension:CLDR]] in production, hook LanguageGetTranslatedLanguageNames
in the ParserTestRunner to ensure that we can test this.
Followup-To: Ice1c671c5b3cc077d2bb80ea5dc25c5eabbfeb36
Followup-To: I19c3e91a924e080f37dc95a0d4e61493583b533e
Change-Id: Ibf6e7f194cc056eadb48a5ad8e6d01a761d9351c
| | |
Template:Bcp47 is one of the most used templates in Wikimedia Commons.
Providing its functionality as a parser function, tied to MediaWiki's
language-handling code, reduces code duplication and will allow us to
reduce template usage on Commons.
As with the {{#dir}} parser function, support one special case:
* `{{#bcp47}}` will return the BCP-47 code of the *target language*
(for articles, the content language; for messages, the user language)
Note the following slight differences from [[Template:BCP47]] on Commons,
documented in an added parser test:
* 'simple' maps to 'en-simple' (not just 'en')
* 'roa-tara' maps to 'nap-x-tara' (not 'it-x-tara')
Bug: T366623
Change-Id: Ice1c671c5b3cc077d2bb80ea5dc25c5eabbfeb36
| | |
Template:Dir is one of the most used templates in Wikimedia Commons.
This patch provides parts of its functionality as a parser function, in
the hope that we can eventually simplify or retire the template for
clarity and performance reasons.
As a convenience, `{{#dir}}` and `{{#dir:}}` are synonyms for
`{{#dir:{{PAGELANGUAGE}}}}`: they return the direction of the target
language. For articles, the target language is the content language;
for messages, the target language is the user language.
In addition, to avoid confusion between BCP-47 language codes and
MediaWiki-internal language codes, an optional second parameter can be
supplied. If the second parameter is the (localizable) string
'bcp47', the language code given in the first parameter will be
treated as a BCP-47 code. For example: `{{#dir:sr-Cyrl|bcp47}}`.
(See LanguageCode::bcp47ToInternal() for a description of the
differences and overlaps between MediaWiki internal and BCP-47
codes. These overlaps *so far* don't result in any case where
encouraging editors to be precise about which set of enumerated
string values they are using for consistency with other
language-related functions, and because MediaWiki internally
differentiates between BCP-47 codes and internal codes.)
Bug: T359761
Change-Id: I19c3e91a924e080f37dc95a0d4e61493583b533e
| |
* Add wgLocaltimezone to the list of global variables which may be set
in parser test options.
* Add userLanguage option, which is passed through to ParserOptions.
Bug: T223772
Change-Id: I8498527c276288feae854868a8f4b1f3205a49e8
| |
This now aligns with Parsoid commit 2508e24a2aeb54b55eb54f7f65bedc4d477fc9cf
Change-Id: Ibb9f1c6287c6ec3e982f0fa3ddf908b01484973a
| |
Legacy parser can now output headings using a more accessible markup,
which is also identical to the markup used by the Parsoid parser.
Changes to client-side JS and CSS necessary to support the new markup
have already been merged in earlier commits.
includes/skins/Skin.php
includes/ServiceWiring.php
* Define a new skin option, 'supportsMwHeading', which can be used
to toggle the new markup per-skin.
* Update the built-in fallback skin to enable it. This affects the
output in parser tests.
docs/config-schema.yaml
includes/config-schema.php
includes/config-vars.php
includes/MainConfigNames.php
includes/MainConfigSchema.php
* Add a new configuration setting, 'ParserEnableLegacyHeadingDOM',
which can be used to toggle the new markup per-site.
includes/OutputTransform/Stages/HandleSectionLinks.php
* Output new heading HTML for skins that enabled the option.
tests/*
* Duplicate parser tests that cover heading generation to cover both
new and old markup. Update other parser tests to use new markup.
* Add some unit and integration tests for the behavior of the skin
option and some parser tests for edge cases of the new markup.
Bug: T13555
Change-Id: I1180169a8e83af834c2984ba16089e6277f2a8dd
| |
This now aligns with Parsoid commit 902eb345ed701b635b98f03557276aa48b564cc2
Change-Id: I91c663a4f2ca00157fbd9337d1d0c72a98452591
| |
This now aligns with Parsoid commit c296dca4af9a1d47200a3699e12d9884acc43150
Change-Id: I5a0e246171e9b58d77b2be945b802f381c1f40b2
| | |
parse time"
| | |
This ensures uniform treatment of all places that call `addCategory`
without duplicating the `defaultsort` code; it also ensures that the
effect of the {{DEFAULTSORT}} parser function is independent of page
position.
Bug: T40435
Bug: T353530
Change-Id: I4480a6d59e766fa4eddc9ec9117c58b66771bb47
| | | |
This follows up on I5e87b33a956e296cdaf671fa99c9555944b73479 and makes
(invisible) language links consistent with how we handle (invisible)
category links.
Bug: T359886
Followup-To: I5e87b33a956e296cdaf671fa99c9555944b73479
Change-Id: I3e5567a91b47e0b04da928450644f3f475aaf51b
| | |
This now aligns with Parsoid commit 16e27722c6c50618c78230952c1ad27948fc3a0b
Change-Id: I21067c1b22a494422184abf7c4bb50424b4fad56
| |
This follows up on a long series of tweaks to whitespace handling around
[[Category]] links (T2087, T87753, T174639) which aimed to simplify and
make intelligible the whitespace handling around category links without
allowing categories to break lists or paragraphs in which they are found.
Removing newlines but not other whitespace on the left-hand side of
category links should preserve the valuable features of T2087 et al
while still ensuring that the following all render equivalently:
ABC [[Category:Foo]]DEF
ABC[[Category:Foo]] DEF
ABC [[Category:Foo]] DEF
Added parser test to document the new behavior; it's worth noting
that although there were plenty of tests documenting the expected
interaction of category links and newlines, there were previously
no tests covering the interaction of non-newline whitespace and
category links; the one test which needed to be altered added
non-semantic whitespace (ie, extra whitespace to the test output
which did not affect the way the HTML would display).
This patch brings the legacy parser into parity with Parsoid's parsing
of category links.
Bug: T359886
Change-Id: I5e87b33a956e296cdaf671fa99c9555944b73479
| |
I'm not sure how this ever happened, but I'm sure it's a mistake.
The following test scenario should make it very obvious:
* {{#formatdate:-0002-12-31|mdy}}
* {{#formatdate:-0001-12-31|mdy}}
* {{#formatdate:0000-12-31|mdy}}
* {{#formatdate:0001-12-31|mdy}}
* {{#formatdate:0002-12-31|mdy}}
Expected output: 3 BC, 2 BC, 1 BC, 1, 2, …
Current output: 3 BC, 2 BC, 0 (?), 1, 2, …
Note how "1 BC" is skipped and shown as "0" instead. Everything else
is correct, e.g. the ISO year -1 is already displayed as "2 BC".
It's really only this single outlier.
In case you don't know: There is no year 0 when the BC specifier is
used. There is either year 1 after or year 1 before Christ. This is
different in ISO, mostly to make calculations easier. That's why the
DateFormatter already does an extra `- 1` and `+ 1` in the two
makeIsoYear and makeNormalYear methods.
The problematic line of code was originally written in 2003, see
https://phabricator.wikimedia.org/rMW98fc03e6
The core parser function exists since 2009, see
https://phabricator.wikimedia.org/rMWb9ffb5a7
Change-Id: Iaeb7a954579a409fefd87dab4e2a15778ab39fb4
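The off-by-one between ISO years and BC years can be sketched as below. This is only an illustration of the arithmetic, not the DateFormatter code; the function name `display_year` is hypothetical.

```python
def display_year(iso_year):
    """Convert an ISO 8601 year number into a BC-style display year."""
    if iso_year <= 0:
        # ISO year 0 is 1 BC, ISO year -1 is 2 BC, and so on,
        # because there is no year 0 when the BC specifier is used.
        return f"{1 - iso_year} BC"
    return str(iso_year)


# The five years from the test scenario above.
labels = [display_year(year) for year in (-2, -1, 0, 1, 2)]
```

With this mapping the sequence reads 3 BC, 2 BC, 1 BC, 1, 2, matching the expected output listed in the test scenario.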
| |
This now aligns with Parsoid commit 51baccc8741108a9e3f763f2c19c6ce6eda55ac4
Three tests needed to be disabled because they had dependencies on features
not included in core's CI:
* {{#if}} used in tests added by I71c38b42ac9bfb7137f2e34df70bdfa139abced7
but only provided by the ParserFunctions extension
* <poem> used in tests added by I5a6356a82251881a5f841b36a7f26879fc611138
but only provided by the Poem extension
In addition, the "multiline" part of the "Expansion of multi-line..."
parser tests seems to have been lost at some point. My best guess is
that the definition of `Template:1x` initially included an extra
newline which was lost, maybe during an unrelated stripping of
leading/trailing whitespace in `!! article` clauses. In any case,
these tests are no longer testing the thing they say they are.
These will be fixed in a follow-up.
Change-Id: Ia9144634625f176fbea11f3d2ef4b21a5492e99b
| |
Change-Id: I331e5636823a0beae8d804148f648cfaffd6a1f8
| |
This reverts commit 82da9cf14be08e9458f58fa96be51966a2fe7cb1.
Passing through Remex seems to have unexpected consequences that still
need to be investigated, but for the sake of unbreaking the UBN, let's
revert this first.
Bug: T353920
Change-Id: Iaac7942aa77aee5ab525852ac5b41dd516ff13c9
| | |
Two messages were added to wgRawHtmlMessages instead of just
fixing the way they were parsed so they can't contain raw
HTML. This fixes that.
In order to avoid breakage on-wiki for old customized messages
that took advantage of them being parsed as raw HTML, rename
the messages too. Also rename a few other messages from the
same set to stay consistent.
Note: These messages are suppressed in favour of Echo's messages
when Echo is enabled, and Echo is enabled on all Wikimedia wikis,
so the existing customized messages on Wikimedia wikis are basically
no-ops.
Bug: T353316
Change-Id: Ib0d1c79247fe091f2806b7c23ffb2fe22cc4df4a
| |
The previous implementation was using an ad-hoc regular expression which
was matching inside the data-mw attribute of Parsoid output, eg:
<sup about="#mwt42" [...] typeof="mw:Extension/ref mw:Error" data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;attrs&quot;:{&quot;name&quot;:&quot;infobox_stats_ref_rail&quot;},&quot;body&quot;:{&quot;html&quot;:&quot;<style data-mw-deduplicate=\&quot;TemplateStyles:r1133582631\&quot; typeof=\&quot;...">
After substitution, the <link> element inserted contained " instead of
&quot; and so broke out of the attribute.
Instead use a proper HTML tokenizer (via wikimedia/remex-html) so that
we don't allow bogus matches inside attribute values.
To fix up tests:
* Don't deduplicate styles when parsing UX messages (also helps performance)
* Don't deduplicate styles in ContentHandler integration tests
* Don't deduplicate styles by default in parser tests
(unless explicit option is set)
Depends-On: Id9801a9ff540bd818a32bc6fa35c48a9cff12d3a
Depends-On: I5111f1fdb7140948b82113adbc774af286174ab3
Followup-To: Ic0b17e361bf6eb0e71c498abc17f5f67f82318f8
Change-Id: I32d3d1772243c3819e1e1486351d16871b6e21c4
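The difference between the regular-expression approach and a tokenizer can be illustrated with a short sketch. It uses Python's standard `html.parser` as a stand-in for wikimedia/remex-html; the sample markup, the `StyleCollector` class, and the naive pattern are assumptions made for illustration, not the code changed by this patch.

```python
import re
from html.parser import HTMLParser

# Sample input: one <style> embedded (entity-escaped) inside a data-mw
# attribute value, followed by one real <style> element.
SAMPLE = (
    '<sup data-mw="{&quot;html&quot;:&quot;'
    '<style data-mw-deduplicate=\\&quot;TemplateStyles:r1\\&quot;>&quot;}">x</sup>'
    '<style data-mw-deduplicate="TemplateStyles:r1">.a{}</style>'
)

# A naive pattern also matches the text inside the attribute value.
naive_hits = len(re.findall(r"<style\s+data-mw-deduplicate=", SAMPLE))


class StyleCollector(HTMLParser):
    """Collect deduplication keys from real <style> start tags only."""

    def __init__(self):
        super().__init__()
        self.keys = []

    def handle_starttag(self, tag, attrs):
        if tag != "style":
            return
        for name, value in attrs:
            if name == "data-mw-deduplicate":
                self.keys.append(value)


collector = StyleCollector()
collector.feed(SAMPLE)
collector.close()
```

The pattern reports two matches, while the tokenizer sees a single `<style>` element, because the quoted attribute value is consumed as part of the `<sup>` start tag.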
|