path: root/tests/phpunit/unit/includes/parser
Commit message · Author · Age · Files · Lines
...
* | Fix remaining link numeric ids in LinkHolderArrayTest · thiemowmde · 2022-12-09 · 1 · -8/+8
| | These specific placeholders are meant to stay untouched in the test. Because of this it technically doesn't matter if the namespace numbers and link ids (from Parser::nextLinkID) are numeric. Still I find it less confusing when the test setup reflects how the code actually behaves. An IWLINK is identified by a single numeric id, and a LINK by a namespace:numeric id pair. This is split from Ie994059 to make it easier to review.
| | Change-Id: I1141de06eb3235d199a3dc02ef0a24d0d4fe1416
* | Fix bogus test setup in LinkHolderArrayTest · thiemowmde · 2022-12-09 · 1 · -41/+41
| | These array keys are all meant to come from Parser::nextLinkID(). That returns a positive integer. Let's please make the tests reflect what the code actually does to reduce confusion. I tried my best to come up with a mapping that still (possibly even better) makes it obvious what's happening.
| | Change-Id: Ib5d5f6f48ddc8c066961ea0f36a922ae9f0c6f6c
* | Fix bogus non-numeric namespaces in LinkHolderArrayTest · thiemowmde · 2022-12-09 · 1 · -15/+15
|/ The first level in LinkHolderArray::$internals is indexed by numeric namespace id. This must be a number. Let's please make the test reflect what the code actually does. Note the same mistake was already fixed before in I7e71ffc.
| Change-Id: I14f717510a71adf8ea3f75e61ba844d93838f945
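The three LinkHolderArrayTest fixes above all hinge on the shape of the link-holder tables. A minimal Python sketch of that shape, assuming (for illustration only) a counter standing in for Parser::nextLinkID() and plain dicts standing in for LinkHolderArray::$internals and the interwiki table. The class and method names are hypothetical, and the returned `ns:id` / `id` strings model only the identifiers, not MediaWiki's actual placeholder markup.

```python
from collections import defaultdict


class LinkHolderSketch:
    """Hypothetical model of LinkHolderArray's bookkeeping, not MediaWiki code."""

    def __init__(self):
        self.internals = defaultdict(dict)  # {namespace_id: {link_id: title}}
        self.interwikis = {}                # {link_id: title}
        self._last_id = 0

    def next_link_id(self):
        # Stand-in for Parser::nextLinkID(): always a fresh positive integer.
        self._last_id += 1
        return self._last_id

    def add_internal(self, namespace, title):
        if not isinstance(namespace, int):
            raise TypeError("namespace ids must be numeric")
        link_id = self.next_link_id()
        self.internals[namespace][link_id] = title
        return f"{namespace}:{link_id}"  # a LINK: namespace:id pair

    def add_interwiki(self, title):
        link_id = self.next_link_id()
        self.interwikis[link_id] = title
        return str(link_id)  # an IWLINK: a single id
```

For example, on a fresh instance `add_internal(0, "Main Page")` returns `"0:1"`, and a following `add_interwiki("en:Foo")` returns `"2"`.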
* Reorg: Move Title-related classes to title/ · Amir Sarabadani · 2022-11-26 · 1 · -0/+1
| These three classes:
| - TitleArray
| - TitleArrayFromResult
| - TitleFactory
| We need to move these and the rest of the files under title/ to Title/ (and namespace them), but the patch would become way too big given that the Title class is also one of them.
| Bug: T321882
| Change-Id: Iac1688172ee457348a08a470c86e047571feb8e0
* Protect against long match length in CHAR_REFS_REGEX · C. Scott Ananian · 2022-11-17 · 1 · -2/+16
| Some malformed pages contain "character references" that were so long that they caused PHP's `hexdec` to return a `float` instead of an `int`. This caused Parsoid to crash on a type hint on the argument to Sanitizer::validateCodepoint(). MediaWiki core has the same issue, but doesn't have the type hint (yet), so soft-fails instead of crashes.
| Add sanity checks around each call to `hexdec` to protect against arbitrarily-long entity strings (while allowing arbitrary zero-padding), and add a note to `intval` to explain why it is not similarly affected. New test cases added to SanitizerUnitTest as well.
| Corresponding patch on the Parsoid side: Ic33196961bb2b86290148fbc3ce33bcd8b28ab56 (And see T247804 re: eventually removing this duplicate code.)
| Bug: T322892
| Change-Id: I5085c4edbb86e282b92536d05b01ed5f9d5c615e
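The guard described above can be sketched in Python, with one caveat: Python integers never overflow to float, so the length check here only mirrors the intent of the PHP fix (reject arbitrarily long digit strings while still accepting any amount of zero-padding). The function name and the 6-digit limit (enough for U+10FFFF) are assumptions for illustration, not the code that landed in Sanitizer.

```python
from typing import Optional

MAX_CODEPOINT = 0x10FFFF
MAX_HEX_DIGITS = 6  # "10FFFF" is the longest valid value


def decode_hex_char_ref(digits: str) -> Optional[int]:
    """Decode the hex digits of a '&#x...;' reference, or return None.

    Assumes the caller's regex already restricted `digits` to [0-9A-Fa-f]+.
    """
    # Leading zeros never change the value, so any amount of padding is fine.
    significant = digits.lstrip("0") or "0"
    if len(significant) > MAX_HEX_DIGITS:
        # Too long to be a code point: reject before converting, which is
        # the step where PHP's hexdec() would have overflowed to float.
        return None
    value = int(significant, 16)
    return value if value <= MAX_CODEPOINT else None
```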
* parsoid: inject UrlUtils to avoid phpunit failures in SiteConfigTest · Aaron Schulz · 2022-08-26 · 1 · -0/+8
| Previously, prior changes to the global UrlUtils instance could interfere with testInterWikiMap() and trigger a BadMethodCallException in UrlUtils::expand().
| Bug: T297078
| Change-Id: If75a91e33c5881244de388d203d0fc4879000f46
* Sanitizer: Don't consider inline var CSS insecure · Michał Turek · 2022-08-24 · 1 · -1/+0
| T208881 ("CSS using var() to create exponential sized calc() on wiki page will crash visitor's browser") was fixed by disabling var() in inline CSS. The underlying browser crashes appear to have since been fixed in Firefox, Chrome, modern Edge, and Opera, so this change reverts the T208881 fix.
| Bug: T288201
| Change-Id: I387a0e9fdd02faa69616890c613462c83b91b789
* unit tests: Use MainConfigNames constant to refer to configs · Umherirrender · 2022-08-17 · 2 · -48/+50
| When creating ServiceOptions objects or fake HashConfigs, use the constant to refer to the config name.
| Change-Id: I59a29f25b76e896c07e82156c6cc4494f98e64cc
* Merge "Replace trivial use of mock builder with createMock() shortcut" · jenkins-bot · 2022-07-19 · 2 · -6/+2
|\
| * Replace trivial use of mock builder with createMock() shortcut · Thiemo Kreuz · 2022-07-15 · 2 · -6/+2
| | createMock() does the same, but is much easier to read. A small difference is that some of the replacements made in this patch didn't use disableOriginalConstructor() before. In case this was relevant, we should see the respective test fail. If not, we can save some CPU cycles and skip these constructors.
| | Change-Id: Ib98fb06e0fe753b7a53cb087a47e1159515a8ad5
* | Tests: Use createNoOpMock() shortcut in a few more places · Thiemo Kreuz · 2022-07-18 · 1 · -2/+1
|/ … instead of doing the same manually with anythingBut() and such.
| Change-Id: Idb66040d1560a82df9a5bfa2a6c7e20a0649e49c
* Ensure core compatibility with Parsoid external link attributes support · Isabelle Hurbain-Palatin · 2022-06-24 · 1 · -1/+22
| * Export nofollow and target settings in siteinfo API so that Parsoid's developer mode of ApiSiteConfig works.
| * Implement SiteConfig::getNoFollowConfig and SiteConfig::getExternalLinkTarget, which are defined as abstract in the parent class in Parsoid.
| Bug: T186241
| Change-Id: I6a1f12335be19509d4c5a17e2cae96ecdb677103
* Merge "ParserCache: always use JSON" · jenkins-bot · 2022-06-07 · 1 · -1/+0
|\
| * ParserCache: always use JSON · daniel · 2022-06-07 · 1 · -1/+0
| | When JSON support was introduced into ParserCache in 1.36, it was controlled by a feature flag, $wgParserCacheUseJson. The feature flag was "born deprecated" in 1.36. It can now be removed. This means that ParserCache will always store entries as JSON.
| | Support for reading old non-JSON entries remains intact. This is needed when updating wikis from a version older than 1.36 to the current version.
| | Change-Id: Id04e42bfb458d98414bac50e0d6c505e8878e5c0
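The read/write split described above (always write JSON, still read old entries) might look like this in a minimal Python sketch. `legacy_decode` is a hypothetical callable standing in for whatever decoded the old PHP-serialized format; telling the formats apart by attempting a JSON parse is an assumption of this sketch, not necessarily how ParserCache does it.

```python
import json
from typing import Any, Callable


def encode_entry(value: Any) -> str:
    # New writes always use JSON; there is no longer a flag to turn it off.
    return json.dumps(value)


def decode_entry(blob: str, legacy_decode: Callable[[str], Any]) -> Any:
    try:
        return json.loads(blob)
    except json.JSONDecodeError:
        # Entry written before JSON became mandatory: use the old reader.
        return legacy_decode(blob)
```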
* | Tests: Cleanup some unnecessary nested function calls · Reedy · 2022-06-06 · 1 · -2/+2
|/ Replace ->will( ->return with ->willReturn(
| Change-Id: Ia2dfafa03cac8169d86d6fa5a30b73bfad1fe9fa
* Rest: Migrate parsoid stashing logic from RESTbase · Derick Alangi · 2022-05-23 · 1 · -0/+48
| Add stash option to /page/html & /revision/html endpoints. When this option is set, the PageBundle returned by Parsoid is stashed and an etag is returned that can later be used to make use of the stashed PageBundle. The stash is for now backed by the BagOStuff returned by ObjectCache::getLocalClusterInstance().
| This patch adds additional data to the ParserOutput stored in ParserCache. Old entries lacking that data will be ignored.
| Bug: T267990
| Co-Authored-by: Nikki <nnikkhoui@wikimedia.org>
| Change-Id: Id35f1423a69e3ff63e4f9883b3f7e3f9521d81d5
* ParserObserver: Only report duplicate parse if the content is the same · Bartosz Dziewoński · 2022-05-14 · 1 · -2/+4
| Bug: T303596
| Change-Id: Ib3b00a8cfabeb12723ac6a441495d72fd0c0ca92
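The refinement in this commit (report a duplicate parse only when the content matches) can be modelled roughly as follows. This is a hypothetical Python sketch: the class name, the (page id, revision id) key, and the SHA-256 content hash are assumptions, not ParserObserver's actual bookkeeping.

```python
import hashlib
from typing import Dict, List, Tuple


class DuplicateParseObserver:
    """Hypothetical sketch, not MediaWiki's ParserObserver."""

    def __init__(self) -> None:
        self._seen: Dict[Tuple[int, int], str] = {}
        self.reported: List[Tuple[int, int]] = []

    def notify_parse(self, page_id: int, revision_id: int, content: str) -> None:
        key = (page_id, revision_id)
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        # Only a repeat parse of identical content counts as a duplicate;
        # a second parse of different content is not reported.
        if self._seen.get(key) == digest:
            self.reported.append(key)
        self._seen[key] = digest
```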
* Use UrlUtils in Parser · Aryeh Gregor · 2022-04-28 · 1 · -1/+6
| Change-Id: I65f851ea29efe482ee225565a200d623fa85bc20
* TempUser EditPage and permissions · Tim Starling · 2022-04-26 · 1 · -1/+3
| * Allow EditPage to create a user on page save. This has to be enabled in config and then activated by the UI/API caller.
| * Add an autocreate source for temporary users.
| * Allow editing by anonymous users via automatic account creation when $wgGroupPermissions['*']['edit'] = false. On an edit GET request, use an unsaved placeholder user to stand in for post-create permissions.
| * On preview or aborted save, the username to be created is stashed in a session and restored on subsequent requests.
| * On a (likely) successful page save, create the account.
| * Put regular non-temporary users in a "named" group so that they can be given additional permissions.
| * Use a different "~~~" signature for temporary users.
| * Show account creation warnings on edit and preview.
| Change-Id: I67b23abf73cc371280bfb2b6c43b3ce0e077bfe5
* Fix SignatureValidatorFactory circular dependency · Tim Starling · 2022-04-13 · 1 · -1/+3
| Parser is using the service container to get a SignatureValidator because, as noted in Gerrit comments on the relevant commit, there is a circular dependency Parser -> SignatureValidatorFactory -> Parser. So, have SignatureValidatorFactory::__construct() take a closure which returns a Parser, instead of an actual Parser or ParserFactory.
| Change-Id: I7bf4660f84ec8c8fb1d5b3b8581fe5d82bc3156e
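The closure trick described above can be shown with a small Python sketch. The class shapes and the `build_services()` wiring function are hypothetical stand-ins for the MediaWiki service container; the point is only that the factory stores a callable and resolves the Parser lazily, so neither constructor needs the other object to exist first.

```python
from typing import Callable, Dict


class Parser:
    def __init__(self, validator_factory: "SignatureValidatorFactory") -> None:
        self.validator_factory = validator_factory


class SignatureValidator:
    def __init__(self, parser: Parser) -> None:
        self.parser = parser


class SignatureValidatorFactory:
    def __init__(self, parser_provider: Callable[[], Parser]) -> None:
        # Store the closure, not a Parser: nothing is resolved at construction.
        self._parser_provider = parser_provider

    def new_validator(self) -> SignatureValidator:
        # The Parser is looked up only when a validator is actually requested.
        return SignatureValidator(self._parser_provider())


def build_services() -> Dict[str, object]:
    services: Dict[str, object] = {}
    # The closure reads the container lazily, so the factory can be built
    # before the Parser exists even though the Parser depends on the factory.
    services["validator_factory"] = SignatureValidatorFactory(
        lambda: services["parser"]
    )
    services["parser"] = Parser(services["validator_factory"])
    return services
```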
* Revert "Add temporary ParsoidSiteConfigInit hook" · Isabelle Hurbain-Palatin · 2022-04-07 · 1 · -7/+1
| This reverts commit 5a9f0300e4573ce2a5f1b7869c9bdff2dc5d4708 on Parsoid - that code has been moved to Core in the meantime.
| Bug: T303029
| Depends-On: 9b9ab2cdd6fd2dbb00e38f652b2856ca5860bffb
| Change-Id: I34243250542785804ddf6c8210d22715ba9e91fc
* Copy over Parsoid's Config and ServiceWiring classes · C. Scott Ananian · 2022-03-28 · 1 · -0/+798
| * This is the first step of migrating Parsoid integration code into core and transitioning Parsoid from an extension to a pure library.
| * Parsoid already has conditional code to skip loading Parsoid's copy of its classes, but it relies on the existence of ParsoidServices. Technically ParsoidServices isn't needed once Parsoid is migrated to core -- users can just use MediaWikiServices instead -- but we need to temporarily add ParsoidServices as a marker class during the transition.
| This version of Parsoid's ServiceWiring comes from Parsoid commit 898c813fd832b3f2d7b5a37f60bd65e8368ce18f.
| Bug: T302118
| Change-Id: I0b388d93143a782c2c3b72e46407572e5c586e4a
* Add Sanitizer::removeSomeTags() which uses Remex to tokenize · C. Scott Ananian · 2022-03-04 · 1 · -1/+1
| The existing Sanitizer::removeHTMLtags() method, in addition to having dodgy capitalization, uses regular expressions to parse the HTML. That produces corner cases like T298401 and T67747 and is not guaranteed to yield balanced or well-formed HTML. Instead, introduce and use a new Sanitizer::removeSomeTags() method which is guaranteed to always return balanced and well-formed HTML.
| Note that Sanitizer::removeHTMLtags()/::removeSomeTags() take a callback argument which (as far as I can tell) is never used outside core. Mark that argument as @internal, and clean up the version used by ::removeSomeTags().
| Use the new ::removeSomeTags() method in the two places where DISPLAYTITLE is handled (following up on T67747). The use by the legacy parser is more difficult to replace (and would have a performance cost), so leave the old ::removeHTMLtags() method in place for that call site for now: when the legacy parser is replaced by Parsoid, the need for the old ::removeHTMLtags() will go away. In a follow-up patch we'll rename ::removeHTMLtags() and mark it @internal so that we can deprecate ::removeHTMLtags() for external use.
| Some benchmarking code added. On my machine, with PHP 7.4, the new method tidies short 30-character title strings at a rate of about 6764/s while the tidy-based method being replaced here managed 6384/s. Sanitizer::removeHTMLtags blazes through short strings 20x faster (120,915/s); some of this difference is due to the setup cost of creating the tag whitelist and the Remex pipeline, so further optimizations could doubtless be done if Sanitizer::removeSomeTags() is more widely used.
| Bug: T299722
| Bug: T67747
| Change-Id: Ic864c01471c292f11799c4fbdac4d7d30b8bc50f
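To illustrate why a real tokenizer can guarantee balanced output where regular expressions cannot, here is a rough Python sketch. It uses the standard library's `html.parser` as a stand-in for Remex, a hypothetical three-tag allowlist, and drops all attributes; its handling of mis-nested tags (closing everything opened after the matching tag) is far simpler than the HTML5 tree-building rules Remex follows.

```python
from html import escape
from html.parser import HTMLParser


class _AllowlistSanitizer(HTMLParser):
    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []
        self.stack = []  # allowed tags currently open, innermost last

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.out.append(f"<{tag}>")  # attributes dropped for brevity
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Close anything opened after this tag so nesting stays balanced.
            while self.stack:
                top = self.stack.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def remove_some_tags(html, allowed=frozenset({"b", "i", "span"})):
    parser = _AllowlistSanitizer(allowed)
    parser.feed(html)
    parser.close()
    while parser.stack:  # close whatever the input left open
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

Because every emitted open tag is pushed onto a stack and anything still open is closed at the end, input like `<b>bold <i>both</b>` comes back as `<b>bold <i>both</i></b>`.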
* remove access to config globals from includes/parser · daniel · 2022-02-01 · 1 · -8/+7
| Passes ServiceOptions through to CoreParserFunctions and CoreTagHooks to avoid access to the main config from static methods.
| Bug: T294739
| Change-Id: Ia6c97f2d0952964c2ad6189f8053ad127589b37c
* Merge "Make Sanitizer::stripAllTags() strip css and js tag contents" · jenkins-bot · 2021-12-23 · 1 · -0/+3
|\
| * Make Sanitizer::stripAllTags() strip css and js tag contents · Derk-Jan Hartman · 2021-12-22 · 1 · -0/+3
| | We use Sanitizer::stripAllTags primarily to remove formatting from HTML so that we can use it in places like notifications, emails, search result blurbs, etc. It is very unlikely we want the raw contents of css and/or js tags anywhere in those places, so let's suppress that content to make the output more readable, as template styles are showing up in more and more places.
| | Bug: T228856
| | Change-Id: I7930361068ddcf3a6c2fdebd0177d142f025b64f
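A minimal Python sketch of the new behaviour, using the standard library's `html.parser` rather than MediaWiki's Sanitizer; the class and function names are hypothetical. Text inside `<style>` and `<script>` is dropped, while other markup is removed and its text kept.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    # Elements whose raw contents are code, not readable text.
    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_all_tags(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```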
* | Remove Parser dependency on config LanguageCode/DisableLangConversion · Umherirrender · 2021-12-18 · 1 · -1/+5
|/ Use the value from corresponding services, for consistency if services are injected from outside of service wiring.
| Change-Id: Ib0f6af20df8dbc0deae71023e5493524d43ce211
* Fix line indent in ParserFactoryTest · Umherirrender · 2021-12-18 · 1 · -5/+5
| Change-Id: I9338f6804b741f03b3119b57e86901fdda5317d4
* Move ::addTrackingCategory() implementation to TrackingCategories · C. Scott Ananian · 2021-10-15 · 1 · -1/+2
| This moves the implementation of ParserOutput::addTrackingCategory() to the TrackingCategories class as a non-static method. This makes invocation from ParserOutput awkward, but when invoking as Parser::addTrackingCategory() all the necessary services are available. As a result, we've also soft-deprecated ParserOutput::addTrackingCategory(); new users should use the TrackingCategories::addTrackingCategory() method, or else Parser::addTrackingCategory() if the parser object is available.
| The Parser class is already kind of bloated as it is (alas), but there aren't too many callsites which invoke ParserOutput::addTrackingCategory() and don't have the corresponding Parser object handy; see: https://codesearch.wmcloud.org/search/?q=%5BOo%5Dutput%28%5C%28%5C%29%29%3F-%3EaddTrackingCategory%5C%28&i=nope&files=&excludeFiles=&repos=
| Change-Id: I697ce188a912e445a6a748121575548e79aabac6
* Detect and monitor against multiple Parser invocation during edit requests · Cindy Cicalese · 2021-09-23 · 1 · -0/+53
| Bug: T288707
| Change-Id: I0cca8f9bcf1d6e964b8b06c0c4490e83f4fb1de5
* Use PHP \u{xxxx} syntax · Fomafix · 2021-08-27 · 1 · -4/+4
| Let PHP do the UTF-8 encoding of Unicode characters in PHP strings. Also use faster str_replace instead of preg_replace.
| Change-Id: I4e99de694a607e2b5df52c6efcd3d863bb42f76e
* parser: Replace deprecated MWHttpRequest::factory · Umherirrender · 2021-08-04 · 1 · -1/+3
| Change-Id: Id5fe298209cfbc09037799a2cdc117c9b7119172
* Parser: remove Title from method signatures · daniel · 2021-04-29 · 1 · -1/+2
| Bug: T281068
| Change-Id: I3280e38dd82d71845c343eeb911e71dd33bb380b
* Reduce mocking LoggerInterface · DannyS712 · 2021-04-23 · 1 · -2/+1
| Use TestLogger when we want to ensure specific logged messages, or NullLogger when we don't care.
| Change-Id: Ifebc770933d4f5313d5b8b43a52437dbe1e24432
* phpunit: Mass-replace setMethods with onlyMethods and adjust · Daimona Eaytoy · 2021-04-16 · 1 · -3/+3
| Ended up using
|   grep -Prl '\->setMethods\(' . | xargs sed -r -i 's/setMethods\(/onlyMethods\(/g'
| special-casing setMethods( null ) -> onlyMethods( [] ) and then manual fix of failing test (from PS2 onwards).
| Bug: T278010
| Change-Id: I012dca7ae774bb430c1c44d50991ba0b633353f1
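The sed command above performs a plain rename, and the `setMethods( null )` case needs separate handling. A Python sketch of the same two-step rewrite follows; the regexes and function name are illustrative, and, like sed, it does not understand PHP syntax, which is why failing tests still needed manual fixes.

```python
import re

# setMethods( null ) meant "mock no methods"; onlyMethods() expresses that as [].
_NULL_CASE = re.compile(r"->setMethods\(\s*null\s*\)")
_GENERAL_CASE = re.compile(r"->setMethods\(")


def migrate_set_methods(php_source: str) -> str:
    # Handle the special case first; otherwise the plain rename below would
    # produce onlyMethods( null ), which onlyMethods() does not accept.
    php_source = _NULL_CASE.sub("->onlyMethods( [] )", php_source)
    return _GENERAL_CASE.sub("->onlyMethods(", php_source)
```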
* Convert ParserCache to PageRecord · Petr Pchelko · 2021-04-02 · 1 · -1/+4
| ParserOptions were not updated because they depend on the Title::getLanguage implementation. Tests were converted to not require a DB anymore. They can't be proper unit tests yet due to globals in ParserOptions and fake time hacks, but exec time does go down from 70 seconds to 9 seconds.
| Page content model is still emitted in the metrics since it was considered useful. It should be removed when we get something like a page type concept.
| Change-Id: Ib16fd0b5b87ffc3cb4d21f4aa43d1203cb7206d2
* Make Parser use UserIdentity instead of User · Petr Pchelko · 2021-03-17 · 1 · -1/+5
| Change-Id: Idf8578e88af1fd4824f49417a200b16befdbca51
* Parser: initialize preprocessor in constructor · C. Scott Ananian · 2021-03-16 · 1 · -1/+2
| Initializing the preprocessor in the constructor allows better dependency injection, and removes code complexity caused by lazy initialization. Any use of the parser is going to end up creating the preprocessor in any case, so deferring the initialization doesn't save any performance. (Best performance is given by not creating the Parser in the first place if it is not needed, which is what DI allows.)
| Old code tried to unbreak cyclic dependencies by setting the preprocessor to null. This is somewhat of a lost cause, since there are a number of other cyclic dependencies involving the parser, including StripState, LinkHolders, etc. The code complexity is not worth it, given how ineffective it is in any case.
| This is part of T275160 in so far as it allows Parser::getPreprocessor() to be a simple getter, and thus (once this patch is merged) we can safely replace any direct access to Parser::$mPreprocessor with a call to Parser::getPreprocessor().
| Bug: T275160
| Change-Id: I38c6fe7d5a97badffdbf34d8b9d725756ed86514
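A minimal Python sketch of the constructor-initialization pattern described above; the `Parser` and `Preprocessor` classes here are hypothetical miniatures, not MediaWiki's. Creating the preprocessor eagerly is what lets the getter stay trivial.

```python
class Preprocessor:
    def __init__(self, parser: "Parser") -> None:
        # The preprocessor keeps a back-reference: one of the parser's cycles.
        self.parser = parser


class Parser:
    def __init__(self) -> None:
        # Built eagerly: every parse needs it, so laziness saved nothing.
        self._preprocessor = Preprocessor(self)

    def get_preprocessor(self) -> Preprocessor:
        # A plain getter: no "create on first use" branch to keep in sync.
        return self._preprocessor
```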
* Introduce Tidy service · C. Scott Ananian · 2021-03-15 · 2 · -59/+3
| Refactor the old MWTidy singleton as a DI service.
| Change-Id: I95605ea5fd22f53a7f90fe07a6a73fa6c959597a
* Parser: Move Sanitizer::normalizeCharReferences into RemexCompatFormatter · C. Scott Ananian · 2021-03-15 · 1 · -0/+1
| Choosing a particular encoding of HTML entities is logically a task of the Remex formatter (which serializes HTML). Move it out of the Parser so that it is part of the serialization specification.
| This is a follow up to Ic8965e81882d7cf024bdced437f684064a30ac86.
| Change-Id: If45907baf24d60987b39cd1f7709c5f7caf19f37
* More misc test cleanup · DannyS712 · 2020-12-24 · 1 · -4/+0
| * parent::setUp() should be first, and ::tearDown() should be last
| * Move tests that directly extend PHPUnit\Framework\TestCase to /unit
| Change-Id: I1172855c58f4f52a8f624e6d596ec43beb8c93ff
* Introduce RevisionOutputCache · daniel · 2020-12-14 · 1 · -14/+44
| Bug: T267981
| Change-Id: Ib1dc641ed10d786918362b25bd655780d5844ba1
* Make ParserCache use CachedBagOStuff · Petr Pchelko · 2020-12-07 · 1 · -0/+4
| Bug: T269593
| Change-Id: I21e6e39eccad22b781252b142c1e5b079c1ee0b4
* Clean up ParserCache construction and inject logger · Petr Pchelko · 2020-09-28 · 1 · -7/+2
| Bug: T263583
| Depends-On: Iceaa0e872c53aa79b7012711813895221fa62fa6
| Change-Id: I6f131a078e9d6eb5da3533b0ac3730e24bd3f56f
* Create ParserCacheFactory. · Petr Pchelko · 2020-09-25 · 1 · -0/+32
| * Makes ParserCache take the root of the key as a constructor argument
| * Introduces a ParserCacheFactory
| Next steps:
| - convert FlaggedRevs to using this.
| - cleanup
| This assumes that we wouldn't want to differentiate the parser cache settings per use-case, as it is now for default vs flaggedrevs caches. There are only two settings:
| $wgParserCacheType - name of the BagOStuff to use
| $wgParserCacheExpireTime - the expiration time.
| I think if we wanted to have different settings for different caches, we could add that as a next step.
| Bug: T263583
| Change-Id: I188772da541a95c95a5ecece7c7dd748395506c2
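A rough Python sketch of the factory shape described above, assuming a dict as a stand-in for the BagOStuff selected by $wgParserCacheType and an integer for $wgParserCacheExpireTime. The class names mirror the commit, but the method names, the key layout, and the example cache names (`pcache`, `stable-pcache`) are illustrative only.

```python
from typing import Dict


class ParserCache:
    def __init__(self, name: str, backend: Dict[str, str], expire_seconds: int) -> None:
        self.name = name                  # root of every key this cache writes
        self.backend = backend            # stand-in for the BagOStuff
        self.expire_seconds = expire_seconds

    def make_key(self, page_id: int) -> str:
        # Hypothetical key layout; only the per-instance root matters here.
        return f"{self.name}:{page_id}"


class ParserCacheFactory:
    def __init__(self, backend: Dict[str, str], expire_seconds: int) -> None:
        # The two shared settings: which store to use and how long entries live.
        self._backend = backend
        self._expire_seconds = expire_seconds
        self._instances: Dict[str, ParserCache] = {}

    def get_parser_cache(self, name: str) -> ParserCache:
        # One instance per key root, e.g. a default cache vs. a FlaggedRevs one.
        if name not in self._instances:
            self._instances[name] = ParserCache(name, self._backend, self._expire_seconds)
        return self._instances[name]
```

Both caches share one store and one expiry, matching the commit's assumption that settings need not differ per use case; only the key root separates their entries.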
* Drop Sanitizer::escapeId(), deprecated in MediaWiki 1.30 · James D. Forrester · 2020-07-29 · 1 · -36/+0
| Hard deprecation was in b79c1e2, which shipped in MediaWiki 1.35.
| Change-Id: I7186462c95d346f362ba0cf84b136c083d66a7d3
* Whitespace cleanup: Use tabs for indentation, avoid double spaces · DannyS712 · 2020-06-27 · 1 · -1/+1
| Change-Id: I346073b59d283029bd6666356c62c81e687ea5e6
* Update LinkHolderArray tests for new HookContainer parameter · Tim Starling · 2020-06-23 · 1 · -5/+10
| Change-Id: I63fc731ca1dbaef6f215279ee0b1788e735783df
* parser: Add Title type hint to LinkHolderArray::makeHolder · Thiemo Kreuz · 2020-06-22 · 1 · -32/+0
| We *know* this can never be anything but a Title object: https://codesearch.wmflabs.org/search/?q=makeHolder
| Change-Id: Id6de0df627f2aeda79c6483f12a6d500ccd7853f
* New unit and integration tests for class LinkHolderArray · ArtBaltai · 2020-06-04 · 1 · -0/+307
| Bug: T243747
| Change-Id: I2c12cc76a9bf01eb527db3ea038e4adc59446cac