| Commit message (Collapse) | Author | Age | Files | Lines |
|
The main motivation is to further reduce the complexity of the class:
* There is no code that ever writes to $this->mSubstIDs. It's
effectively a constant.
* According to CodeSearch the getSubstIDs() method is not used
anywhere. It's @internal to the parser.
* I find it weird that the parser needs to call 2 factory methods to
do 1 thing.
* I still find it a good idea to keep the knowledge encapsulated in
the factory and not have the [ 'subst', 'safesubst' ] array in the
parser. That's why I propose the new method.
Change-Id: I5c147c75200c3c34a410d93a0328b56ea00a050f
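A minimal sketch of the idea, written in Python rather than MediaWiki's
PHP. The class layout and the method name get_subst_array() are
hypothetical stand-ins, not the actual MediaWiki API; the point is only
that the constant list stays inside the factory and the caller makes a
single call.

```python
class MagicWordArray:
    """Toy stand-in for an array of magic word IDs."""

    def __init__(self, names):
        self.names = list(names)


class MagicWordFactory:
    # The subst IDs are never written to, so they can be a constant
    # that is private to the factory.
    _SUBST_IDS = ("subst", "safesubst")

    def new_array(self, names):
        return MagicWordArray(names)

    def get_subst_array(self):
        # Hypothetical combined method: one call for the caller instead of
        # fetching the IDs and then building the array from them.
        return self.new_array(self._SUBST_IDS)


factory = MagicWordFactory()
subst = factory.get_subst_array()
```

With this shape the parser never sees the literal list of IDs.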
|
Garbage in, garbage out. When the wikitext is broken, it's still
helpful if the user can see the broken wikitext. Even if it's not
fully parsed. It's not the job of this class to fix broken UTF-8.
The worst thing that can happen is that the wikitext contains some
unparsed magic words. However, this is really only relevant for
very old revisions (20 years old, see T321234). It's very normal
that old revisions can't be 100% parsed any more, most notably
because of deleted templates. This here is not much different.
Bug: T321234
Change-Id: I0ce40f6575668847ef309599ee32de52190ab212
|
The extra code that scans for duplicates and throws an exception was
added via I95dea67 in 2017. I'm not entirely sure why. This should
be impossible in all relevant real-world scenarios. Maybe it happened
in a local dev scenario?
Even if it did, duplicates are harmless. Let me explain:
The only way a duplicate can end here is when the same magic word is
added twice to the $this->names array. The only thing that happens
then is that the resulting regex contains one of the sub-patterns
twice. It doesn't matter which one matches. We know these subpatterns
are identical. Unfortunately the PCRE compiler doesn't know and
assumes duplicate names are a problem. We have two options to fix
this: Strip duplicates in $this->names with array_unique() or tell
the PCRE compiler that duplicates are ok with the /J modifier.
I would like to avoid the extra, potentially expensive array_unique()
because, as said, duplicates never happen in real-world scenarios.
The /J modifier is supported since PHP 7.2.
Change-Id: I5f113abdbb44354fcc01be7f36fbc7d07f75876c
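The duplicate-name problem can be reproduced outside PHP. The sketch
below uses Python's standard re module, which also rejects a pattern that
defines the same group name twice. As far as I know the standard module
has no counterpart to the /J modifier, so only the failure and the
array_unique()-style alternative are shown. The helper names build_regex()
and compiles() are made up for this illustration.

```python
import re


def build_regex(names, dedupe=False):
    """Build one alternation with a named group per magic word."""
    if dedupe:
        # Order-preserving de-duplication, analogous to array_unique().
        names = list(dict.fromkeys(names))
    alternatives = [f"(?P<{name}>{re.escape(name)})" for name in names]
    return re.compile("|".join(alternatives))


def compiles(names, dedupe=False):
    """Return True if the combined pattern compiles, False otherwise."""
    try:
        build_regex(names, dedupe)
        return True
    except re.error:
        # Raised when the same group name is defined twice.
        return False
```

Here compiles(["subst", "subst"]) fails to compile even though both
sub-patterns are identical, which is the situation the commit describes.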
|
* MagicWord::getId was added in r24808 (164bb322f2) but never used.
At the time, access modifiers like 'private' were not yet in use.
Deprecate the method with warnings, for removal in a future release.
* Fix zero coverage for MagicWord. Because the constructor is
internal, instances are only meant to be created via the array and
factory classes. Let their tests cover this class.
* Remove redundant file-level description and ensure the class desc
and ingroup tag are on the class block instead.
Ref https://gerrit.wikimedia.org/r/q/owner:Krinkle+message:ingroup
* Mark constructor `@internal` (was already implied by
stable interface policy), and explain where to get the object
instead.
* Mark load() `@internal`. Method was introduced in 1.1 when the
class (and PHP) did not yet use visibility modifiers for private
methods. The only way to get an instance of MagicWord
(MagicWordFactory::get) already calls load(), the method is not
a no-op if called a second time, and (fortunately) there exist no
callers to this outside this class that I could find.
* MagicWordArray::getBaseRegex was marked as internal
in change I17f1b7207db8d2203c904508f3ab8a64b68736a8.
Change-Id: I4084f858bb356029c142fbdb699f91cf0d6ec56f
|
The tests we added before create only MagicWordArray objects with a
single magic word. Here we are testing actual arrays of magic words.
Change-Id: I5880cca2a1e1ecf7018edd22c11229da5d5baffd
|
I think this code is effectively covered by the parser tests that use
magic words. Still it worried me more and more to make changes to
this code without dedicated unit tests.
Change-Id: Id72e1d7ef4736e4d0672798d720465648d91b3ba
|
This nominally takes a string-valued language code conforming to the
BCP-47 standard, but this is often generated from a Bcp47Code object.
Since the MediaWiki Language class implements Bcp47Code, we may have
the case where we have a Language object in hand (but typed as a
Bcp47Code not Language) and call Language::toBcp47Code() only to pass
it to LanguageCode::bcp47ToInternal to convert it back to a
mediawiki-internal code.
We can save steps and be more efficient if we allow the parameter to be a
Bcp47Code object, and write a fast path for the special case where
that Bcp47Code happens to be a Language object and we can simply call
Language::getCode() to obtain the internal code.
Change-Id: I24932449b8c40e3a5072748d87667184f4befa67
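A rough Python sketch of the fast path described above. The classes are
simplified stand-ins for the PHP ones, and the one-entry mapping table is
a toy used only for illustration; the real conversion tables live in
MediaWiki.

```python
# Toy mapping between a BCP-47 code and an internal code (illustrative only).
BCP47_TO_INTERNAL = {"lzh": "zh-classical"}
INTERNAL_TO_BCP47 = {v: k for k, v in BCP47_TO_INTERNAL.items()}


class Bcp47Code:
    def to_bcp47_code(self) -> str:
        raise NotImplementedError


class Language(Bcp47Code):
    def __init__(self, internal_code: str):
        self._code = internal_code

    def get_code(self) -> str:
        return self._code

    def to_bcp47_code(self) -> str:
        return INTERNAL_TO_BCP47.get(self._code, self._code)


def bcp47_to_internal(code) -> str:
    """Accept a string or a Bcp47Code and return the internal code."""
    if isinstance(code, Language):
        # Fast path: a Language already knows its internal code, so the
        # round trip through a BCP-47 string is skipped.
        return code.get_code()
    if isinstance(code, Bcp47Code):
        code = code.to_bcp47_code()
    lowered = code.lower()
    return BCP47_TO_INTERNAL.get(lowered, lowered)
```

Callers holding a Language typed as Bcp47Code can pass it straight in
without converting it to a string first.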
|
Bug: T166010
Change-Id: I4066885a7ea071d22497abcdb3f95e73e154d08c
|
Bug: T166010
Change-Id: Id13dcbf7a0372017495958dbc4f601f40c122508
|
Remove parser creation from service creation
In ParsoidSiteConfig, inject the ParserFactory and call getMainInstance
later; ParsoidSiteConfig is often created without calls to the parser.
For ParsoidDataAccess store the factory and call it when needed.
Bug: T343070
Change-Id: Ib3acadaf190383e4a8b3d266a9fd75c9b20c6649
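The pattern can be sketched as follows, in Python rather than PHP. All
names are simplified stand-ins, and the created counter exists only to
make the deferred construction visible.

```python
class ParserFactory:
    """Toy factory that counts how many parsers it has built."""

    def __init__(self):
        self.created = 0
        self._main = None

    def get_main_instance(self):
        if self._main is None:
            self.created += 1
            self._main = object()  # placeholder for an expensive parser
        return self._main


class SiteConfig:
    def __init__(self, parser_factory):
        # Only the factory is stored; no parser is built at service creation.
        self._parser_factory = parser_factory

    def uses_parser(self):
        # The parser is requested on first real use.
        return self._parser_factory.get_main_instance() is not None


factory = ParserFactory()
config = SiteConfig(factory)
created_before_use = factory.created
config.uses_parser()
created_after_use = factory.created
```

Creating the config alone leaves the counter at zero, which is the saving
the commit is after.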
|
One of the big ones, so doing this alone.
Bug: T166010
Change-Id: Ic2d59eb6764b1a273ed7162ecabf641f638b8f66
|
One of the big ones, so doing this alone.
Bug: T166010
Change-Id: Ibe103cd362535d3cb94cb8931e95fc74099d1497
|
* Was used during the Parsoid JS -> PHP port and is no longer used.
* This also eliminated the need to inject ParsoidSettings into some
classes.
* Once this merges and lands in core, I'll remove this from the Parsoid
repo as well.
Change-Id: I008d30ea81f5a3db26e512c87762b90e3ca3c4ff
|
Unit tests should not access the ExtensionRegistry singleton. This is
similar to how MediaWikiServices is disabled, but needs to be done
separately because ExtensionRegistry is not a service.
Make ExtensionRegistryTest use a mocked SettingsBuilder to avoid
triggering the exception when SettingsBuilder tries to access the global
instance of ExtensionRegistry.
Inject data from ExtensionRegistry into Parsoid's SiteConfig to keep
SiteConfigTest a working unit test.
Change-Id: I0a04c82250582fed7a66c1e10868d9b4f3823a28
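A minimal Python sketch of the two halves of this change: a singleton
that refuses access once disabled, and a consumer that receives its data
by injection instead. The method names disable_for_test() and
get_attribute(), and the attribute name used below, are hypothetical.

```python
class ExtensionRegistry:
    _instance = None
    _disabled = False

    @classmethod
    def get_instance(cls):
        if cls._disabled:
            raise RuntimeError("ExtensionRegistry is disabled in unit tests")
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @classmethod
    def disable_for_test(cls):
        cls._disabled = True

    def get_attribute(self, name):
        return []


class SiteConfig:
    def __init__(self, extension_data):
        # Data is injected by the caller; the singleton is never touched here.
        self._extension_data = dict(extension_data)

    def get_attribute(self, name):
        return self._extension_data.get(name, [])


ExtensionRegistry.disable_for_test()
config = SiteConfig({"ExampleModules": ["ExampleExtension"]})
```

Because SiteConfig holds plain data, its test keeps working after the
global accessor starts throwing.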
|
Change-Id: I22090062274dceec96d43e23eb227a7e3b1e36fa
|
wfParseUrl falls back to the global service locator as of I706ef8a5.
This will soon be disallowed in unit tests (see I5117eab9), and all the
classes updated in this patch are covered by a unit test that would then
fail.
SiteConfig already has a UrlUtils object available, so just use that.
In the other classes, there is no need to inject a UrlUtils service and
we can instead adopt parse_url, because these didn't depend on our
site-configurable or custom parsing logic. For precedent see also
change I6492f5142861513e4a7, I1e76d2f5aef, and lots of other examples
in Codesearch for parse_url().
The warnings about parse_url() in UrlUtils.php have been obsolete
since about PHP 5.4, when it started to support protocol-relative
URLs, non-slash protocols like "mailto", and deal with spaces/newlines
correctly (https://3v4l.org/YWUkl).
This patch was partly copied from PS 20 of I5117eab9.
Co-Authored-by: Timo Tijhof <krinkle@fastmail.com>
Change-Id: I98ea4670e842d11598664f058d8c90a900477be4
|
LanguageVariantConverterUnitTest: don't mock a method in the Parsoid
class that no longer exists.
ParsoidParser: pass a Bcp47Code (in the form of a Language object),
not a string, when selecting the preferred variant for the output
Followup-To: Ib8554f98b1c653df3864110e0e66796b8da67b5f
Change-Id: I32fd64a9495b8aed729b0b5b00535180006e0223
|
* SiteConfig::variants() was replaced by ::variantsFor()
* SiteConfig::langConverterEnabled() was replaced by ::langConverterEnabledBcp47()
Change-Id: I2dc510fcf0f03304f01c14cff92d5dd50736f062
|
Some details:
* Just use a real MagicWord object. It doesn't do anything that
needs mocking.
* Add missing methods to mocks.
* Remove unneeded details from mocks.
* Remove duplicate test that does the same.
* Remove pointless assertions that are impossible to ever fail.
Change-Id: I177242429a528d2c7109ca757840b538b772711c
|
* ParsoidParser hadn't registered a watcher on ParserOptions so far.
Because of this, you can see that the current parser cache key
(in deployed production code) doesn't have 'useParsoid=1' in it.
Ex: View source on enwiki:Hospet shows that the parser cache key
there is "enwiki:parsoid-pcache:idhash:2360619-0!canonical".
The only reason this doesn't conflict with legacy parser output
is because we use "parsoid-pcache", a different cache instance than
"pcache" used for legacy parser output. But if/when we decide to use
the same parser cache instance, this could cause cache corruptions.
With FlaggedRevisions, where a single "stable-pcache" parser cache
instance is used, in local testing, this was causing Parsoid HTML to be
saved without "useParsoid=1", and so Parsoid HTML was being returned
for legacy parser cache requests.
* In addition, fix the code in PageBundleParserOutputConverter to copy
over internal metadata (which includes used options). This ensures
that any tracked parser options aren't lost and the right parser cache
key is constructed later on.
* Added / updated a number of tests that verify that usedOptions
is tracked correctly in the useParsoid code paths. The tests fail
without the code changes in this patch.
Bug: T340703
Bug: T335157
Needed-By: I0e954949768044eea6ec275a36d0d6d7ed457e8e
Change-Id: I076d5d362bdfd9d4b2ca8886bf6b30c1a746aee7
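Why an untracked option leads to key collisions can be shown with a toy
key builder. This Python sketch uses a deliberately simplified key format
and a made-up function name, parser_cache_key(); it is not how MediaWiki
builds its keys, only an illustration of the failure mode.

```python
def parser_cache_key(page_id, options, used_options):
    """Build a simplified key from the options recorded as used."""
    parts = [f"idhash:{page_id}", "canonical"]
    # Only options that were recorded as used contribute to the key.
    parts += [f"{name}={options[name]}" for name in sorted(used_options)]
    return "!".join(parts)


legacy_opts = {"useParsoid": 0}
parsoid_opts = {"useParsoid": 1}

# Without a watcher nothing records that useParsoid was consulted,
# so both renderings of the same page share one key.
untracked_legacy = parser_cache_key(2360619, legacy_opts, set())
untracked_parsoid = parser_cache_key(2360619, parsoid_opts, set())

# With tracking, the option value becomes part of the key.
tracked_legacy = parser_cache_key(2360619, legacy_opts, {"useParsoid"})
tracked_parsoid = parser_cache_key(2360619, parsoid_opts, {"useParsoid"})
```

In a shared cache instance the two untracked keys would point at the same
entry, which matches the FlaggedRevisions symptom described above.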
|
This is an issue introduced by I8711a51fc1bcac48, which
caused duplicate variant conversion to be applied in some cases.
The reason is that the $parserOutput and $processedParserOutput fields
in HtmlOutputRendererHelper ended up being the same object.
Change-Id: Ic1fbc8815ef74beba6dae927563a9945b6dab1a1
|
Change-Id: I3bee4452b182a982b99017beed4ff929e96a10c6
|
check source"
|
This is a slightly stricter test than we'd previously used to check
the validity of the provided source language parameter.
Change-Id: I22e9c5cf6c30ce737884162970a1eb349549c86d
|
There is no way to express that Title::castFromPageIdentity(),
Title::castFromPageReference() and Title::castFromLinkTarget()
can only return null when the parameter is null. We need to add
Phan suppressions or explicit types almost everywhere that these
methods are used with parameters that are known to not be null.
Instead, introduce new methods Title::newFromPageIdentity() and
Title::newFromPageReference() (Title::newFromLinkTarget() already
exists), without the null-coalescing behavior, and use them when
the parameter is not null. This lets static analysis tools, and
humans, easily understand where nulls can't appear.
Do the same with the corresponding TitleFactory methods.
Change the obvious uses of castFrom*() to newFrom*() (if there is
a Phan suppression, a type check, or a method call on the result).
Change-Id: Ida4da75953cf3bca372a40dc88022443109ca0cb
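The difference between the two method families can be sketched with
Python type hints. The classes below are simplified stand-ins for the PHP
ones; only the signatures matter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageReference:
    namespace: int
    dbkey: str


@dataclass(frozen=True)
class Title:
    namespace: int
    dbkey: str


def new_from_page_reference(ref: PageReference) -> Title:
    """Non-nullable in, non-nullable out: no null check needed by callers."""
    return Title(ref.namespace, ref.dbkey)


def cast_from_page_reference(ref: Optional[PageReference]) -> Optional[Title]:
    """Nullable in, nullable out: a checker cannot tell when None is impossible."""
    if ref is None:
        return None
    return new_from_page_reference(ref)


title = new_from_page_reference(PageReference(0, "Main_Page"))
```

A static checker accepts title.dbkey on the result of the new_from
variant without a suppression, while the cast_from variant needs a guard
even when the argument is known to be non-null.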
|
I could not find any use outside of core, or even outside of this
class.
The class is instantiated a single time in core:
https://codesearch.wmcloud.org/search/?q=new%5CW%2BLinkHolderArray&files=%5C.php%24
This instance is not used anywhere else:
https://codesearch.wmcloud.org/search/?q=mLinkHolders&files=%5C.php%24
I would argue this doesn't really qualify as a breaking change. This
was always meant to be private.
Change-Id: I4c614dae1fe1d61c9cf8b7a03c37eb93fae33873
|
This is now enabled in production (Ic5a4a9950d51f63b17f4c5e70516bec87b981aa5)
and not something we want to remain configurable.
It is removed from Parsoid in I52ddfd21ff2e72a34cb5eb68742e3dfb85c6ccf6
Change-Id: I6a4d7d33fb42270fc5da3a922aa0a959180fb33f
|
Just methods where adding "static" to the declaration was enough; I
didn't do anything with providers that used $this.
Initially by search and replace. There were many mistakes which I
found mostly by running the PHPStorm inspection which searches for
$this usage in a static method. Later I used the PHPStorm "make static"
action which avoids the more obvious mistakes.
Bug: T332865
Change-Id: I47ed6692945607dfa5c139d42edbd934fa4f3a36
|
Bug: T268777
Depends-On: Ie6bc2c1cef2aca3166a8af6921cad29ebb8ef3a2
Change-Id: I0c01f62a4f290862d91436eca1baa0f5ee1af5fc
|
It is very easy for developers and maintainers to mix up "internal
MediaWiki language codes" and "BCP-47 language codes"; the latter are
standards-compliant and used in web protocols like HTTP, HTML, and
SVG; but much of WMF production is very dependent on historical codes
used by MediaWiki which in some cases predate the IANA standardized
name for the language in question.
Phan and other static checking tools aren't much help distinguishing
BCP-47 from internal codes when both are represented with the PHP
string type, so the wikimedia/bcp-47-code package introduced a very
lightweight wrapper type in order to uniquely identify BCP-47 codes.
Language implements Bcp47Code, and LanguageFactory::getLanguage() is
an easy way to convert (or downcast) between Bcp47Code and Language
objects.
This patch updates the Parsoid integration code and the associated
REST handlers to use Bcp47Code in APIs so that the standalone Parsoid
library does not need to know anything about MediaWiki-internal codes.
The principle has been, first, to try to convert a string to a
Bcp47Code as soon as possible and as close to the original input as
possible, so it is easy to see *why* a given string is a BCP-47 code
(usually, because it is coming from HTTP/HTML/etc) and we're not stuck
deep inside some method trying to figure out where a string we're
given is coming from and therefore what sort of string code it might
be. Second, we've added explicit compatibility code to accept
MediaWiki internal codes and convert them to Bcp47Code for backward
compatibility with existing clients, using the @internal
LanguageCode::normalizeNonstandardCodeAndWarn() method. The intention
is to gradually remove these backward compatibility thunks and replace
them with HTTP 400 errors or wfDeprecated messages in order to
identify and repair callers who are incorrectly using
non-standard-compliant language codes in web standards
(HTTP/HTML/SVG/etc).
Finally, maintaining a code as a Bcp47Code and not immediately
converting to Language helps us delay or even avoid full loading of a
Language object in some cases, which is another reason to occasionally
push Bcp47Code (instead of Language) down the call stack.
Bug: T327379
Depends-On: I830867d58f8962d6a57be16ce3735e8384f9ac1c
Change-Id: I982e0df706a633b05dcc02b5220b737c19adc401
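The two principles above, converting at the boundary and keeping an
explicit compatibility thunk, might look roughly like this in Python.
The wrapper class, the function name code_from_request() and the
one-entry legacy table are assumptions made for this sketch.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class Bcp47Code:
    """Lightweight wrapper so BCP-47 codes are not confused with plain strings."""

    value: str


# Toy table of legacy internal codes (illustrative only).
LEGACY_INTERNAL_TO_BCP47 = {"zh-classical": "lzh"}


def code_from_request(raw: str) -> Bcp47Code:
    """Convert request input to a Bcp47Code as close to the input as possible."""
    legacy = LEGACY_INTERNAL_TO_BCP47.get(raw.lower())
    if legacy is not None:
        # Compatibility thunk: accept the internal code for now but flag it,
        # so callers can be found and fixed before this becomes an error.
        warnings.warn(
            f"non-standard language code {raw!r}; use {legacy!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return Bcp47Code(legacy)
    return Bcp47Code(raw)
```

Everything below this boundary can then take a Bcp47Code parameter and
never has to guess which kind of string it was handed.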
|
ETag."""
|
This reverts commit c4f40bd107d52e76a9121988106e5f8771ddbbcb.
Change-Id: Iff0f9859a83506059f100ddd60b74cfdd1279071
|
The Parsoid entrypoints should always have a "real" ParserOutput
passed as the ContentMetadataCollector object, so that recursive
invocations of extensions, etc, can set appropriate metadata
properties in the ParserOutput.
This is part of a belt-and-suspenders fix for T331084, where a
StubMetadataCollector is being used in production -- production should
never use a stub, it should always use a real ParserOutput object.
The other fix for T331084 is
I30ea2bb24e6c9b0950a8f46dc8e5b9bf5ee3378b, which ensures that if you
*were* to use a StubMetadataCollector in production, it wouldn't throw
an error when a numeric category string was encountered.
Bug: T331084
Change-Id: I8711a51fc1bcac48eae92ab1ba15a33fe05937ed
|
This reverts commit ee8dd055c8ec28fc79cc334e20dc3920ac430726.
Reason for revert: breaks officewiki
Bug: T331629
Depends-On: I46f16eae9c137d43aad22bfd4be460cfb635614b
Change-Id: Ieb0dedfb5ae3168749a9ab6d930be527337348e8
|
Allow clients to use an If-Match header with the
transform/html/to/wikitext endpoint.
Bug: T310464
Needed-By: Ifb1c40a0044f04fb339b00630fbca9190a1bce51
Change-Id: Ida81a314f015e205f2081c68a82d486145097c92
|
This is moderately messy.
Process was principally:
* xargs rg --files-with-matches '^use Title;' | grep 'php$' | \
xargs -P 1 -n 1 sed -i -z 's/use Title;/use MediaWiki\\Title\\Title;/1'
* rg --files-without-match 'MediaWiki\\Title\\Title;' . | grep 'php$' | \
xargs rg --files-with-matches 'Title\b' | \
xargs -P 1 -n 1 sed -i -z 's/\nuse /\nuse MediaWiki\\Title\\Title;\nuse /1'
* composer fix
Then manual fix-ups for a few files that don't have any use statements.
Bug: T166010
Follows-Up: Ia5d8cb759dc3bc9e9bbe217d0fb109e2f8c4101a
Change-Id: If8fc9d0d95fc1a114021e282a706fc3e7da3524b
|
These classes:
- MergeHistory
- MovePage
- ProtectionForm
- BadFileLookup (to MediaWiki\Page\File)
- FileDeleteForm (to MediaWiki\Page\File)
Bug: T321882
Change-Id: Ibeb488ba322c62a34042a0307bbb5562773bcad1
|
Bug: T321882
Change-Id: I0b86acfdeaa3a2a0a14b7763fd088122820bafdc
|
Fix documentation related to ExtraInterlanguageLinkPrefixes
configuration: it should be a list, not a map, and its usage is now
described better.
In ApiQuerySiteInfo, third-party clients (like Parsoid) need to know
whether a given language link code corresponds to a deprecated
language code or a "real" one; the API was also missing information
regarding which language code an "extra language link" prefix
corresponds to (given by InterlanguageLinkCodeMap in the
configuration).
Finally, add the corresponding bcp47 codes for these interlanguage
links, so third-party clients don't need to know details of mediawiki
internal and deprecated language codes.
Change-Id: I82465261bc66f0b0cd30d361c299f08066494762
|
The rest of the Parsoid-related unit tests are in
tests/phpunit/unit/includes/parser/Parsoid
and that's where these should be, as well.
Change-Id: Iaf6daf3366184337a5e9c28ddbaf6ada8c290848
|
This is approved as part of T166010 RFC.
Bug: T321882
Change-Id: Ia4498c0a20e38a6a288dc14065ea8242c84fbc49
|