path: root/includes/parser/Sanitizer.php
Commit message (Author, Age, Files, Lines)
...
* French spacing: don't require non-space before French spacing (Arlo Breault, 2021-03-15, 1 file, -3/+2)
|   This was only necessary when French spacing was added before doBlockLevels.
|
|   Follow up to I654a09b0f98937379b9fad3f325134ead7f2d8a6
|   Change-Id: I9bff6b7599d97c39334a0bd0f731f29875da17bb
* Don't worry about something before when armoring french spaces (Arlo Breault, 2021-03-01, 1 file, -1/+1)
|   We lost some insight in c44a395 because we're no longer analysing the entire DOM as a serialized string, but instead running our regexp on individual text nodes. This patch as written here just allows for the space to be at the start of the text node.
|
|   However, some git spelunking shows that in 9dc65ef, the condition for there being a non-whitespace character previous to the space was only because armoring French spacing happened before doBlockLevels and wanted to protect indent pre's. That's certainly not the case anymore, so we can probably get away with dropping the condition altogether now.
|
|   Bug: T275918
|   Change-Id: I654a09b0f98937379b9fad3f325134ead7f2d8a6
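The change described in the two commits above can be illustrated with a minimal Python sketch. It is not MediaWiki's PHP code: the function name `armor_french_spaces` and the exact punctuation sets are assumptions made for illustration. The point it shows is that the pattern no longer requires a non-space character before the space, so a space at the very start of a text node is armored too.

```python
import re

NBSP = "\u00a0"  # non-breaking space used as the "armored" space


def armor_french_spaces(text: str, space: str = NBSP) -> str:
    """Replace ordinary spaces around French punctuation with `space`.

    Hypothetical sketch: the punctuation sets below are assumptions,
    not a copy of the patterns used in Sanitizer.php.
    """
    # Space before closing punctuation. There is no look-behind for a
    # preceding non-space character, so " ?" at the start of a text
    # node is handled as well.
    text = re.sub(r" (?=[?:;!%»›])", space, text)
    # Space after an opening guillemet.
    text = re.sub(r"([«‹]) ", lambda m: m.group(1) + space, text)
    return text
```

For instance, `armor_french_spaces("« oui »")` keeps the guillemets attached to the word by replacing both inner spaces.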
* Don't apply French spacing in raw text elements (Arlo Breault, 2021-02-16, 1 file, -3/+0)
|   This also means we don't need to take special care for French spacing in attributes, since it's no longer applied there.
|
|   Adds a test that captures this change. Note that the test "Nowiki and french spacing" wonders whether this escaping should be applied to nowiki content.
|
|   Bug: T255007
|   Change-Id: Ic8965e81882d7cf024bdced437f684064a30ac86
* Use static closures where safe to use (Umherirrender, 2021-02-11, 1 file, -3/+3)
|   This is a micro-optimization of closure code to avoid binding the closure to $this where it is not needed.
|
|   Created by I25a17fb22b6b669e817317a0f45051ae9c608208
|   Change-Id: I0ffc6200f6c6693d78a3151cb8cea7dce7c21653
* Fix some unit tests accessing MediaWikiServices (Daimona Eaytoy, 2020-11-12, 1 file, -0/+1)
|   These are mostly easy fixes. Tests were fixed when that didn't require any change to the tested code, and moved to /integration otherwise.
|
|   MediaWikiUnitTestCase::setTemporaryHook was removed: the caller should provide a HookContainer, at which point it would just become a useless wrapper around HookContainer::register. (We don't really need it to be temporary, if proper DI is used). The method was only used in the tests touched by this commit.
|
|   Change-Id: I2aba02560c41b77eea9dd4bff0e4d1c4bb0da9a2
* Remove figure-inline from the set of allowed tags in the Sanitizer (Arlo Breault, 2020-09-11, 1 file, -1/+0)
|   This was added in f6038b0 to keep Parsoid and the legacy parser in sync. However, in T251641, we're moving away from using it in both.
|
|   Bug: T251641
|   Change-Id: I148bcf09e64ae443104723f94e6bbdb4ad23a8ef
* Merge "Hard-deprecate Sanitizer::escapeIdReferenceList()" (jenkins-bot, 2020-08-21, 1 file, -1/+13)
|\
| * Hard-deprecate Sanitizer::escapeIdReferenceList() (C. Scott Ananian, 2020-08-20, 1 file, -1/+13)
| |   Code search: https://codesearch.wmcloud.org/search/?q=escapeIdReferenceList&i=nope&files=&repos=
| |
| |   Followup-To: Ifce057b0c436eabec310f812394e86ee7123e7c8
| |   Change-Id: I18f2c47ad6b4f6256d1727f24314cc3c5e13f466
* | Sanitizer: use RemexHtml entity table, instead of its own (C. Scott Ananian, 2020-08-21, 1 file, -287/+46)
|/
|   Reduce code duplication by using the authoritative HTML entity list from Remex, instead of duplicating the table inside MediaWiki. This also extends the set of entities accepted in wikitext to nearly match HTML5. (HTML5 allows some entities which are not semicolon-terminated; wikitext insists on the semicolon.)
|
|   This patch brings the core parser closer to Parsoid output, as in most cases Parsoid already accepted the full HTML5 entity list. (I873a6120e4bd1c69fee9da76d266e24e97a22add is a corresponding patch to Parsoid to unify its copy of Sanitizer.)
|
|   Also deprecate Sanitizer::hackDocType() while we're updating it, since this method should not be public.
|
|   Bug: T94603
|   Change-Id: Ia08bc261c3644f83109f13df04b692101b4e8ef2
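The semicolon rule mentioned in the commit above can be sketched in Python, using the standard library's HTML5 entity table as a stand-in for the Remex table. This is an illustrative assumption, not the Sanitizer's PHP code; `decode_named_entities` is a hypothetical name.

```python
import re
from html.entities import html5  # maps names such as "amp;" to characters


def decode_named_entities(text: str) -> str:
    """Decode named entities, but only when they end with a semicolon.

    Hypothetical sketch: HTML5 itself accepts a few legacy names without
    the semicolon (for example "&amp"); this function deliberately does
    not, mirroring the rule described for wikitext.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1) + ";"
        # Unknown names are left exactly as written.
        return html5.get(name, match.group(0))

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)
```

Input such as `&copy 2020` is left untouched because the semicolon is missing, even though an HTML5 parser would decode it.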
* Sanitizer: Truncate IDs to a reasonable length; deprecate escapeIdReferenceList (C. Scott Ananian, 2020-08-13, 1 file, -0/+5)
|   Overly-long anchors can cause OOMs later on during TOC processing, and are needless.
|
|   The method Sanitizer::escapeIdReferenceList() is also deprecated in this patch, since it is a way to get around the ID length limit and appears to be unused outside the Sanitizer class. Since the use within Sanitizer (for ARIA attributes) appears safe, we'll just make this private in a future release and avoid the potential that someone will misuse this.
|
|   Bug: T251506
|   Change-Id: Ifce057b0c436eabec310f812394e86ee7123e7c8
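A rough Python sketch of byte-limited ID truncation follows. The limit value and the name `truncate_id` are assumptions for illustration; the commit message only says "a reasonable length" and does not state the number or the helper used.

```python
ID_MAX_BYTES = 1024  # assumed limit, not taken from the commit message


def truncate_id(anchor: str, max_bytes: int = ID_MAX_BYTES) -> str:
    """Cut `anchor` to at most `max_bytes` bytes of UTF-8.

    Hypothetical sketch: truncation is done on bytes, and a multi-byte
    character that would be split at the boundary is dropped entirely
    so the result stays valid UTF-8.
    """
    encoded = anchor.encode("utf-8")
    if len(encoded) <= max_bytes:
        return anchor
    # errors="ignore" discards an incomplete trailing byte sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

Truncating on bytes rather than characters keeps the memory bound predictable regardless of script.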
* Drop Sanitizer::escapeId(), deprecated in MediaWiki 1.30 (James D. Forrester, 2020-07-29, 1 file, -53/+5)
|   Hard deprecation was in b79c1e2, which shipped in MediaWiki 1.35.
|
|   Change-Id: I7186462c95d346f362ba0cf84b136c083d66a7d3
* Use @internal instead of @private per policy (daniel, 2020-06-26, 1 file, -2/+2)
|   https://www.mediawiki.org/wiki/Stable_interface_policy mandates the use of @internal. The semantics of @private was never properly defined.
|
|   Bug: T247862
|   Change-Id: I4c7c6e7b5a80e86456965521f88d1dfa7d698f84
* Introduce wfDeprecatedMsg() (Tim Starling, 2020-06-22, 1 file, -1/+2)
|   Deprecating something means to say something nasty about it, or to draw its character into question. For example, "this function is lazy and good for nothing". Deprecatory remarks by a developer are generally taken as a warning that violence will soon be done against the function in question. Other developers are thus warned to avoid associating with the deprecated function.
|
|   However, since wfDeprecated() was introduced, it has become obvious that the targets of deprecation are not limited to functions. Developers can deprecate literally anything: a parameter, a return value, a file format, Mondays, the concept of being, etc. wfDeprecated() requires every deprecatory statement to begin with "use of", leading to some awkward sentences. For example, one might say: "Use of your mouth to cough without it being covered by your arm is deprecated since 2020."
|
|   So, introduce wfDeprecatedMsg(), which allows deprecation messages to be specified in plain text, with the caller description being optionally appended. Migrate incorrect or grammatically awkward uses of wfDeprecated() to wfDeprecatedMsg().
|
|   Change-Id: Ib3dd2fe37677d98425d0f3692db5c9e988943ae8
* Merge "Hard-deprecate sequential array as parameter to Sanitizer::validateAttributes" (jenkins-bot, 2020-06-15, 1 file, -4/+5)
|\
| * Hard-deprecate sequential array as parameter to Sanitizer::validateAttributes (C. Scott Ananian, 2020-06-15, 1 file, -4/+5)
| |   Code search: https://codesearch.wmflabs.org/search/?q=validateAttributes&i=nope&files=&repos=
| |
| |   Bug: T255049
| |   Depends-On: I68f122d5a3fa06b0434863cff73851a39dd10514
| |   Depends-On: Ia6315da837f1b27794bac8bc2e96008c60ca28ae
| |   Change-Id: Ie942e7e24dbf3256db15fe83bb5592f7a7c2fbc1
* | Merge "Use 'list of allowed attributes' in Sanitizer, instead of 'whitelist'" (jenkins-bot, 2020-06-12, 1 file, -22/+22)
|\|
| * Use 'list of allowed attributes' in Sanitizer, instead of 'whitelist' (C. Scott Ananian, 2020-06-10, 1 file, -22/+22)
| |   Bug: T254646
| |   Change-Id: I48d1a5b318c3511fae94291d84f65e5c9cd05a27
* | Hard deprecate $wgAllowImageTag configuration (C. Scott Ananian, 2020-06-10, 1 file, -0/+1)
|/
|   The future Parsoid parser will not support this, and it appears to be unused. It could be reimplemented as an extension tag once it is removed from core.
|
|   Code search: https://codesearch.wmflabs.org/search/?q=allowimagetag&i=fosho&files=&repos=
|
|   Bug: T254802
|   Change-Id: I1b532a7a8794766f8df6fdf375a6ffd78fee94e5
* Remove unnecessary use of black/whitelist in Sanitizer comments (C. Scott Ananian, 2020-06-10, 1 file, -6/+7)
|   Bug: T254646
|   Change-Id: Ie1d4ce761f02304db4a990495e687e75e6783411
* Hooks::run() call site migration (Tim Starling, 2020-05-30, 1 file, -1/+1)
|   Migrate all callers of Hooks::run() to use the new HookContainer/HookRunner system.
|
|   General principles:
|   * Use DI if it is already used. We're not changing the way state is managed in this patch.
|   * HookContainer is always injected, not HookRunner. HookContainer is a service, it's a more generic interface, it is the only thing that provides isRegistered() which is needed in some cases, and a HookRunner can be efficiently constructed from it (confirmed by benchmark). Because HookContainer is needed for object construction, it is also needed by all factories.
|   * "Ask your friendly local base class". Big hierarchies like SpecialPage and ApiBase have getHookContainer() and getHookRunner() methods in the base class, and classes that extend that base class are not expected to know or care where the base class gets its HookContainer from.
|   * ProtectedHookAccessorTrait provides protected getHookContainer() and getHookRunner() methods, getting them from the global service container. The point of this is to ease migration to DI by ensuring that call sites ask their local friendly base class rather than getting a HookRunner from the service container directly.
|   * Private $this->hookRunner. In some smaller classes where accessor methods did not seem warranted, there is a private HookRunner property which is accessed directly. Very rarely (two cases), there is a protected property, for consistency with code that conventionally assumes protected=private, but in cases where the class might actually be overridden, a protected accessor is preferred over a protected property.
|   * The last resort: Hooks::runner(). Mostly for static, file-scope and global code. In a few cases it was used for objects with broken construction schemes, out of horror or laziness.
|
|   Constructors with new required arguments:
|   * AuthManager
|   * BadFileLookup
|   * BlockManager
|   * ClassicInterwikiLookup
|   * ContentHandlerFactory
|   * ContentSecurityPolicy
|   * DefaultOptionsManager
|   * DerivedPageDataUpdater
|   * FullSearchResultWidget
|   * HtmlCacheUpdater
|   * LanguageFactory
|   * LanguageNameUtils
|   * LinkRenderer
|   * LinkRendererFactory
|   * LocalisationCache
|   * MagicWordFactory
|   * MessageCache
|   * NamespaceInfo
|   * PageEditStash
|   * PageHandlerFactory
|   * PageUpdater
|   * ParserFactory
|   * PermissionManager
|   * RevisionStore
|   * RevisionStoreFactory
|   * SearchEngineConfig
|   * SearchEngineFactory
|   * SearchFormWidget
|   * SearchNearMatcher
|   * SessionBackend
|   * SpecialPageFactory
|   * UserNameUtils
|   * UserOptionsManager
|   * WatchedItemQueryService
|   * WatchedItemStore
|
|   Constructors with new optional arguments:
|   * DefaultPreferencesFactory
|   * Language
|   * LinkHolderArray
|   * MovePage
|   * Parser
|   * ParserCache
|   * PasswordReset
|   * Router
|
|   setHookContainer() now required after construction:
|   * AuthenticationProvider
|   * ResourceLoaderModule
|   * SearchEngine
|
|   Change-Id: Id442b0dbe43aba84bd5cf801d86dedc768b082c7
* Use HTML5 semantics for self-closed HTML tags in wikitext (C. Scott Ananian, 2020-05-27, 1 file, -13/+4)
|   This behavior has been deprecated, with a tracking category, since 1.28. Time to remove the temporary parameter added to Sanitizer::removeHTMLtags() and (finally) tweak the behavior to match HTML5.
|
|   Bug: T134423
|   Change-Id: I5c725175d05854139c95a2b3d8d35ff63cb6707b
* Fix more Squiz.Scope.MethodScope.Missing (Reedy, 2020-05-18, 1 file, -27/+27)
|   Change-Id: I44cd7ba39a898a27f0f66cf34238ab95370d2279
* Fix even more PSR12.Properties.ConstantVisibility.NotFound (Reedy, 2020-05-16, 1 file, -6/+6)
|   Change-Id: I6d98efcfac1f1c0ab6a442e0af6d5daa6ef7801a
* Fix SingleSpaceBeforeSingleLineComment (Reedy, 2020-05-11, 1 file, -1/+1)
|   Change-Id: I285af438ce484af40741489797f20455726ec110
* Remove codepaths which ran parser in 'untidy' mode (C. Scott Ananian, 2020-04-13, 1 file, -167/+29)
|   Disabling tidy has been deprecated since 1.33. This cleans up the code paths which still used untidy output.
|
|   Bug: T198214
|   Change-Id: I821ef3b8f59b272d983583d407b2f0794fe1e791
* Allow users to set tabindex="0" on elements (Brian Wolff, 2020-03-18, 1 file, -0/+6)
|   Important for keyboard focusability of elements, so that, for example, users with motor impairments can reach those elements.
|
|   This patch does not allow setting tabindex="-1" or tabindex > 0. tabindex > 0 seems like a terrible idea to allow users to do. I don't see any valid reason for tabindex="-1" in wikitext, so let's not allow that for now either.
|
|   Bug: T247910
|   Change-Id: I5065b2deeb14bdb3682dd176b87f254ac6f2cf88
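The rule stated in the commit above (only `tabindex="0"` survives) can be sketched in Python. The function name `filter_tabindex` and the dictionary representation of attributes are assumptions for illustration and are not the Sanitizer's actual interface.

```python
def filter_tabindex(attributes: dict[str, str]) -> dict[str, str]:
    """Drop a tabindex attribute unless its value is exactly "0".

    Hypothetical sketch: other attributes pass through unchanged here;
    a real sanitizer would apply its own checks to them as well.
    """
    return {
        name: value
        for name, value in attributes.items()
        if name != "tabindex" or value == "0"
    }
```

Negative and positive values are both removed, so user-supplied markup can make an element focusable but cannot reorder or suppress focus.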
* Make id attributes not include ascii whitespace per spec (Brian Wolff, 2020-02-25, 1 file, -1/+5)
|   HTML5 says id attributes should not have whitespace, where whitespace is defined as LF, CR, FF, TAB or SPACE (oddly enough VT does not count). Firefox in my testing actually was fine with these except CR. Nonetheless we should follow the spec, so this converts these whitespace characters to _.
|
|   I don't think this will cause any back-compat issues, since it's very hard to make these characters in wikitext (other than space, which was already being converted) and basically requires either Lua or HTML entities to make these (with FF seeming to be impossible).
|
|   Bug: T238385
|   Depends-On: Ie6fa40798f06a358f6082110b4d8cc0028c80321
|   Change-Id: Ie2b7c9429691e2c491c3359d5b400d8f078aa789
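The whitespace conversion described above is small enough to sketch directly. This Python version is an illustration, not the PHP change itself, and `normalize_id_whitespace` is a hypothetical name; the character set (LF, CR, FF, TAB, SPACE) is the one listed in the commit message.

```python
# The five characters the commit message lists as HTML5 ASCII whitespace.
_ID_WHITESPACE = str.maketrans({ch: "_" for ch in "\n\r\f\t "})


def normalize_id_whitespace(raw_id: str) -> str:
    """Replace LF, CR, FF, TAB and SPACE with underscores.

    Vertical tab (VT) is intentionally left alone, since the commit
    message notes that the spec does not count it as whitespace.
    """
    return raw_id.translate(_ID_WHITESPACE)
```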
* Escape % sign if it forms a valid percent-encoding in fragment identifiers (Brian Wolff, 2020-02-15, 1 file, -2/+19)
|   Currently, if you combine a valid percent encoding and a non-escaped character that is reserved in URLs in a headline, the TOC link does not work.
|
|   E.g. ==`%41== needs #`%2541 but we currently generate #`%41, which matches ==`A== instead.
|
|   Tested in Firefox and Chrome.
|
|   Bug: T238385
|   Change-Id: Ice2bbf79bed612d488ed6feb7510035e9dfb33af
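The example in the commit above (`%41` must become `%2541`) can be reproduced with a short Python sketch. It is an illustration under the assumption that only a `%` followed by two hex digits needs escaping; `escape_valid_percent` is a hypothetical name, and the real change may handle more cases.

```python
import re


def escape_valid_percent(fragment: str) -> str:
    """Escape '%' as '%25' when it starts a valid percent-encoding.

    A browser would decode "%41" to "A", so the literal text "%41"
    has to be written as "%2541" to survive as-is.
    """
    return re.sub(r"%([0-9A-Fa-f]{2})", r"%25\1", fragment)
```

A `%` that is not followed by two hex digits, as in `100% sure`, cannot be mistaken for an encoding and is left unchanged.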
* Whitelist `aria-hidden` attribute in Sanitizer (C. Scott Ananian, 2020-01-28, 1 file, -0/+1)
|   Bug: T204618
|   Change-Id: I34b9b729eccd7658d5165b6661e5fd45a733b36c
* Merge "Hard-deprecate Sanitizer::escapeId()" (jenkins-bot, 2020-01-26, 1 file, -0/+1)
|\
| * Hard-deprecate Sanitizer::escapeId() (C. Scott Ananian, 2020-01-26, 1 file, -0/+1)
| |   Deprecated in MW 1.30; time to clean up any remaining uses.
| |
| |   Code search: https://codesearch.wmflabs.org/deployed/?q=escapeId%5C%28&i=nope&files=&repos=
| |
| |   Depends-On: Ic03a5da2e1d6b8f5656555420dd573a1d698b9cc
| |   Depends-On: I311f44a5035f73c0fb2289f727eb39b73007429b
| |   Depends-On: I76c5b539bae5572c4ac65f28fec9c0c36381348c
| |   Depends-On: Id4cbfc3b113b1b04f949d485187e89ffe0b487f5
| |   Depends-On: I7d5ba4930688ed7f011a4babed5986b8e40910a0
| |   Depends-On: I964f83ce88fb9c66a7c59037c6066f4567bcf4c9
| |   Change-Id: I89504cfdf8e02831d54a26900bfdc63a33b4eade
* | Remove Sanitizer::attributeWhitelist()/setupAttributeWhitelist() (C. Scott Ananian, 2020-01-25, 1 file, -29/+0)
|/
|   These methods were deprecated in 1.34 and should never have been public in the first place. New private methods have replaced them.
|
|   Code search: https://codesearch.wmflabs.org/deployed/?q=attributeWhitelist%5C%28&i=nope&files=&repos=
|
|   Change-Id: I363530b7edaced77f2c5b06721b1930d85e2e9dc
* Coding style: Auto-fix MediaWiki.Usage.IsNull.IsNull (James D. Forrester, 2020-01-10, 1 file, -1/+1)
|   Change-Id: I90cfe8366c0245c9c67e598d17800684897a4e27
* Remove IE 6 security features from server-side code (Tim Starling, 2019-11-28, 1 file, -38/+0)
|   * Deprecate WebRequest::checkUrlExtension() and have it always return true. This reverts the security fixes made for T30235.
|   * Remove IEUrlExtension. This is a helper for checkUrlExtension() which is not used in any extensions.
|   * Remove CSS sanitization code which is specific to IE6. This reverts the changes made to fix T57332, and related followups. I confirmed that the relevant test cases do not result in XSS on IE8.
|   * Remove related tests.
|
|   Bug: T232563
|   Change-Id: I7318ea4a63210252ebc64968691d4f62d79a63e9
* Improve efficiency of french-spacing regexp (C. Scott Ananian, 2019-10-25, 1 file, -1/+1)
|   Improvement pointed out by Od1n (thanks!).
|
|   Bug: T197902
|   Change-Id: I4c560539873b2c50f8658df89263e927efc9ce10
* Convert some private static arrays to constants (Max Semenik, 2019-10-16, 1 file, -11/+11)
|   Remove @since for some private ones as we don't guarantee anything about private class members.
|
|   Change-Id: Ifb898353c02082e9ef69d67f69339345c6cd154d
* Set @return-taint of Sanitizer::stripAllTags to tainted (sbassett, 2019-08-13, 1 file, -0/+1)
|   phan-taint-check (aka SecurityCheckPlugin) doesn't recognize Sanitizer::stripAllTags' output as tainted in certain situations. Adding a @return-taint of tainted to ensure that it does, which may result in the reporting of more issues.
|
|   Bug: T230234
|   Change-Id: I357c168417a26882c7c460df20f36ec2be401096
* Deprecate Sanitizer::setupAttributeWhitelist/attributeWhitelist (C. Scott Ananian, 2019-06-20, 1 file, -40/+85)
|   These methods should be made private in the next release, but hard-deprecate them for 1.34.
|
|   Tweak the return value of the attribute whitelist to be an associative rather than a sequential array, which makes the lookup of allowed attributes more efficient and avoids an array_flip for every html element sanitized.
|
|   Bug: T221677
|   Change-Id: I17d734937accec6c2679dbe17328cf9554bd556a
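The data-structure change described above (a sequential list replaced by a keyed map) has a direct analogue in Python, sketched here for illustration. The attribute names and the helper `filter_attributes` are hypothetical; the PHP code uses arrays rather than Python dictionaries.

```python
# Sequential form: membership tests scan the list, or the caller has to
# build a lookup table on every call (the equivalent of array_flip).
ALLOWED_SEQUENTIAL = ["class", "id", "style", "title"]

# Associative form: built once, then each lookup is a hash access.
ALLOWED_ASSOCIATIVE = {name: True for name in ALLOWED_SEQUENTIAL}


def filter_attributes(attributes: dict[str, str],
                      allowed: dict[str, bool]) -> dict[str, str]:
    """Keep only attributes whose name is a key in `allowed`."""
    return {name: value for name, value in attributes.items()
            if name in allowed}
```

Building the map once and reusing it across elements is what removes the per-element conversion cost the commit message mentions.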
* Use [...] instead of array(...) in PHP comments and documentation (Fomafix, 2019-06-17, 1 file, -2/+2)
|   Change-Id: I0c83783051bf35fe785bc01644eeb2946902b6b2
* SECURITY: blacklist CSS var() (Max Semenik, 2019-06-06, 1 file, -0/+1)
|   Bug: T208881
|   Change-Id: I9a4ced2bc47eb5f96cf35e693bf5261c48acb126
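As a rough illustration of what rejecting CSS `var()` could look like, here is a Python sketch. The commit message gives no implementation detail, so the pattern, the placeholder string, and the name `check_css_value` are all assumptions; the real check sits alongside other CSS rules that are not reproduced here.

```python
import re

# Placeholder returned for rejected values (assumed, for illustration).
INSECURE_PLACEHOLDER = "/* insecure input */"


def check_css_value(value: str) -> str:
    """Return the value unchanged, or a placeholder if it uses var().

    Hypothetical sketch covering only the var() rule; matching is
    case-insensitive and tolerates whitespace before the parenthesis.
    """
    if re.search(r"var\s*\(", value, flags=re.IGNORECASE):
        return INSECURE_PLACEHOLDER
    return value
```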
* Allow <figure-inline> attributes through Sanitizer (C. Scott Ananian, 2019-04-22, 1 file, -0/+1)
|   Parsoid uses <figure-inline> for inline figures. The intention is to transition core to use <figure> and <figure-inline> as well in the future (T118517). As a first step (and to keep Parsoid and the legacy parser in sync) allow <figure-inline> attributes in the Sanitizer.
|
|   Note that this does not allow <figure-inline> in wikitext, since neither <figure> nor <figure-inline> is on the getRecognizedTagData() list.
|
|   Bug: T51097
|   Bug: T118517
|   Bug: T118520
|   Change-Id: I5248717739bef0f7106c2bcf0b4a15acbc3c9a68
* Synchronize allowed attributes for <audio> with Parsoid/TimedMediaHandler (C. Scott Ananian, 2019-04-22, 1 file, -1/+2)
|   We synchronized the allowed attributes for <video> in 4e7483ffd31dd05c11b16bf37552c25ed648bd0a but then decided to use the <audio> tag for audio media in Parsoid commit 5f3dbdc8794f2605101609f28e679df29a0387bc and updated its Sanitizer, but never updated core to match.
|
|   Bug: T163583
|   Bug: T133673
|   Change-Id: Iefcbead2f335949eb45e2880861fd9473b810367
* Collapse some nested if statements (Reedy, 2019-04-04, 1 file, -4/+2)
|   Change-Id: I9a97325d738d09370d29d35d5254bc0dadc57ff4
* Sanitizer: remove deprecated parameter to escapeIdReferenceList() (Max Semenik, 2019-02-21, 1 file, -8/+2)
|   Change-Id: Iacd5796718c1d64e7290cfd9669c99d8f9e85dc5
* Merge "Quoted attributes don't need to be followed by a space" (jenkins-bot, 2018-11-27, 1 file, -10/+34)
|\
| * Quoted attributes don't need to be followed by a space (Arlo Breault, 2018-11-09, 1 file, -10/+34)
| |   Further, this splits up attribute parsing from filtering.
| |
| |   Change-Id: Ib4e0a808a6ca2ba032873e885837233e2f2feefe
* | Hard deprecate codepaths where tidy is disabled (C. Scott Ananian, 2018-11-05, 1 file, -0/+1)
|/
|   Future parsers will not support the output generated with tidy disabled. Parser tests using untidied output will also be deprecated (and rewritten) in a follow-up patch.
|
|   No new release notes necessary since user-visible tidy configuration was deprecated previously (in 1.32), and individual methods which had disabled tidy during execution were individually release-noted as they were updated.
|
|   Bug: T198214
|   Depends-On: I0f417f75a49dfea873e9a2f44d81796a48b9f428
|   Depends-On: If5c619cdd3e7f786687cfc2ca166074d9197ca11
|   Change-Id: I592e0e0dfef7d929f05c60ffe4d60e09725b39cc
* Preserve whitespace in search index text content (Erik Bernhardson, 2018-09-14, 1 file, -3/+3)
|   Certain HTML tags imply a word break, but our HTML stripping doesn't understand that at all. Adjust the HTML stripping to inject whitespace for all block-level tags (per MDN) along with the <br> element.
|
|   Bug: T195389
|   Change-Id: I9fbfac765ea88628e4f9b2794fb54e1cd0060203
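The idea in the commit above, that removing a block-level tag should leave a word break behind, can be sketched in Python. This is an illustration only: the tag set is a small assumed subset rather than the full MDN block-level list, the regular expression is not a real HTML parser, and `strip_tags_for_index` is a hypothetical name.

```python
import re

# Assumed subset of block-level tags, plus <br>; not the complete list.
BREAKING_TAGS = {"p", "div", "li", "ul", "ol", "h1", "h2", "h3",
                 "table", "tr", "td", "th", "blockquote", "pre", "br"}

_TAG_PATTERN = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def strip_tags_for_index(html: str) -> str:
    """Remove tags, inserting a space where a tag implies a word break."""
    def replace(match: re.Match) -> str:
        # Inline tags vanish; breaking tags become a single space.
        return " " if match.group(1).lower() in BREAKING_TAGS else ""

    text = _TAG_PATTERN.sub(replace, html)
    # Collapse the runs of whitespace that the substitution can create.
    return re.sub(r"\s+", " ", text).strip()
```

Without the injected space, `<p>one</p><p>two</p>` would be indexed as the single token `onetwo`.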
* Mass conversion of $wgContLang to service (Aryeh Gregor, 2018-08-11, 1 file, -2/+3)
|   Brought to you by vim macros.
|
|   Bug: T200246
|   Change-Id: I79e919f4553e3bd3eb714073fed7a43051b4fb2a
* Merge "Don't armor french spaces before punctuation followed by word characters" (jenkins-bot, 2018-07-13, 1 file, -2/+3)
|\