path: root/includes/ExternalLinks
Commit message (author, date, files changed, lines -removed/+added)
* Use type declaration on undocumented private functions (Umherirrender, 2025-04-02, 1 file, -1/+1)
|   Change-Id: I0d8d2237500ed6f18439410c902d47c42e4119bc
* Re-apply "Drop all 49 remaining class_aliases from MediaWiki 1.40" (Daimona Eaytoy, 2025-03-05, 1 file, -3/+0)
|   This reverts commit 1695950bccb1ca7eba98952753708ae7c4b76d8d and re-applies commit I8f3c2ea021d0f6e.
|   Reason for revert: the remaining usages have been updated in Ida665f486eff384.
|   Bug: T166010
|   Change-Id: I43f06e6872b264e43aef7fa7c2ac47159926a694
* Revert "Drop all 49 remaining class_aliases from MediaWiki 1.40" (Ahmon Dancy, 2025-03-04, 1 file, -0/+3)
|   This reverts commit db47e7f7154a2121bce6d3d9e93a74486bf765f3.
|   Reason for revert: Broke scap sync-world in beta, and possibly caused T387938
|   Bug: T166010
|   Change-Id: If608c3e27081bb36b284ad16a5b912dd51b3557e
* Drop all 49 remaining class_aliases from MediaWiki 1.40 (James D. Forrester, 2025-03-04, 1 file, -3/+0)
|   Bug: T166010
|   Depends-On: Iba93dd9749656e641c427e01790d7a14cd1a2dc2
|   Depends-On: I97ccc2c49ce09ca96192bf6ffdc833c1765c3faa
|   Change-Id: I8f3c2ea021d0f6e574dde901f0bfd4a0408f5455
* ExternalLinks: fix mailto: links reversal (Ammarpad, 2025-02-27, 1 file, -2/+2)
|   If $mailparts does not contain two elements (which is the case when the separator `@` is not present in the string), then $mailparts[1] cannot be accessed. In this case, the entire path is treated as the host as-is.
|   Bug: T380880
|   Change-Id: I10187c93e67ce9294ff0b3866939d2c7d7292a9a
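A minimal Python sketch of the fallback described above, assuming (as the commit message implies) that the mailto: path is split on `@` and the second element becomes the host. The helper name `mailto_host` is hypothetical; this is not LinkFilter's PHP code.

```python
def mailto_host(path: str) -> str:
    """Return the part of a mailto: path that is treated as the host.

    Hypothetical helper: split on '@' and use the second element only when
    exactly two elements result; otherwise fall back to the whole path
    instead of reading an element that does not exist.
    """
    mailparts = path.split("@")
    if len(mailparts) == 2:
        return mailparts[1]
    # No '@' (or more than one): treat the entire path as the host, as-is.
    return path


print(mailto_host("fred@example.com"))  # example.com
print(mailto_host("example.com"))       # example.com
```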
* Move remaining four classes in includes/content into Content namespace (James D. Forrester, 2024-08-10, 1 file, -1/+1)
|   Bug: T353458
|   Change-Id: Ia0f3e22078550be410c4b87faf6aa4eabe6e270d
* rdbms: Create IReadableDatabase::andExpr() / ::orExpr() (Umherirrender, 2024-07-11, 1 file, -2/+1)
|   Avoid calling the internal constructors of AndExpressionGroup and OrExpressionGroup by creating factory functions, similar to the IReadableDatabase::expr function for Expression objects.
|   This also replaces calls to ISQLPlatform::makeList with the LIST_AND or LIST_OR argument, reducing how often SQL is passed as a string to the query builders.
|   Two functions were created so that the return type can be set for each expression group, allowing further calls of ->and() or ->or() on the returned object. Depending on the length of the array argument to makeList(), it is sometimes hard to see whether the list gets converted to AND or OR; having the operator in the function name makes it easier to read, so two functions are helpful in this case as well.
|   Bug: T358961
|   Change-Id: Ica29689cbd0b111b099bb09b20845f85ae4c3376
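To illustrate why factories named after their operator read better than a makeList()-style flag, here is a toy Python model. It is not the rdbms API: `Expr`, `AndGroup`, `OrGroup`, `and_expr` and `or_expr` are hypothetical stand-ins, and values are rendered with `repr` rather than the quoting a real database layer performs. It also reflects the point made in the LinkFilter::getQueryConditions change further down: a group with a single element needs no parentheses.

```python
from __future__ import annotations

from typing import Union


class Expr:
    """One comparison such as el_id > 5 (values rendered with repr, not SQL-escaped)."""

    def __init__(self, field: str, op: str, value: object) -> None:
        self.field, self.op, self.value = field, op, value

    def to_sql(self) -> str:
        return f"{self.field} {self.op} {self.value!r}"


class _Group:
    joiner = ""

    def __init__(self, children: list[Node]) -> None:
        self.children = list(children)

    def to_sql(self) -> str:
        parts = [child.to_sql() for child in self.children]
        if len(parts) == 1:
            # A group with one element adds nothing, so no parentheses are emitted.
            return parts[0]
        return "(" + f" {self.joiner} ".join(parts) + ")"


class AndGroup(_Group):
    joiner = "AND"

    def and_(self, child: Node) -> AndGroup:
        # Returning AndGroup keeps chaining typed; "and" is a Python keyword, hence and_.
        self.children.append(child)
        return self


class OrGroup(_Group):
    joiner = "OR"

    def or_(self, child: Node) -> OrGroup:
        self.children.append(child)
        return self


Node = Union[Expr, AndGroup, OrGroup]


def and_expr(children: list[Node]) -> AndGroup:
    """The operator is in the name, unlike a makeList()-style flag argument."""
    return AndGroup(children)


def or_expr(children: list[Node]) -> OrGroup:
    return OrGroup(children)


print(and_expr([Expr("el_id", ">", 5)]).to_sql())  # el_id > 5
print(or_expr([Expr("el_id", "<", 100)]).or_(Expr("el_id", ">", 200)).to_sql())
# (el_id < 100 OR el_id > 200)
```

Having two concrete return types is what lets `and_()` appear only on AND groups and `or_()` only on OR groups.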
* Add namespace and deprecation alias to TextContent (Ebrahim Byagowi, 2024-05-19, 1 file, -1/+1)
|   This patch adds the MediaWiki\Content namespace declaration to TextContent and establishes a class alias marked as deprecated since version 1.43.
|   Bug: T353458
|   Change-Id: Ic251b1ddfcf6db9c85cb54cddf912aa827d2bc3a
* LinkFilter::makeLikeArray: Fix another 'path' access (Lucas Werkmeister, 2024-05-14, 1 file, -1/+4)
|   If a news: or mailto: URL is specified with two slashes, it will have a 'host' rather than a 'path' after all, so this workaround is unnecessary and should be skipped in that case; compare also change Idc6b389da9 (commit ec1b572362) for makeIndexes().
|   I’m not very sure that the test case makes much sense, but it’s at least enough to trigger the error and verify the fix.
|   Bug: T364743
|   Change-Id: I09be813e661b80968da00d8a898b2add8c95fec7
* Fix some line indent (Umherirrender, 2024-04-20, 1 file, -2/+2)
|   Change-Id: I8f82724197d20f9289d80e138d80310f1eab29f2
* Standardise all our class alias deprecation comments for ease of grepping (James D. Forrester, 2024-03-19, 1 file, -3/+1)
|   Change-Id: I7f85d931d3b79da23e87b4e5692b2e14be8fcaa0
* Combine the expressions in LinkFilter::getQueryConditions (Umherirrender, 2024-03-02, 1 file, -7/+4)
|   Avoid the use of an array. This removes some extra parentheses from the query around single expressions, which are no longer wrapped in an AndExpressionGroup, since a group with one element is not needed.
|   Bug: T358961
|   Change-Id: I9daad5e3703bd4a94f56d384c922cb415b5c2fb4
* Change uses of getDBLoadBalancerFactory() to getConnectionProvider() (Bartosz Dziewoński, 2024-01-22, 1 file, -2/+2)
|   Update cases where one of the IConnectionProvider methods is called immediately. This doesn't really change anything, but I hope it helps promote getConnectionProvider() as the common way to do this.
|   Follow-up to 8604c384f624273f46b653ec252ffaed30e6ff89.
|   Change-Id: Id0e7d02bab0c570343c2b1f03c70b44ee39db112
* Replace deprecated wfParseUrl with UrlUtils::parse (Dogu, 2024-01-08, 1 file, -4/+5)
|   The wfParseUrl function is deprecated as of MediaWiki 1.39 and has been replaced with the UrlUtils::parse method provided by the UrlUtils class.
|   Change-Id: I5df192af99b38774c458bd4e0836fdce581683dd
* rdbms: Add support for LIKE in expression builder (Amir Sarabadani, 2023-11-03, 1 file, -9/+13)
|   Bug: T210206
|   Change-Id: Iec33a64bb1ec1485ce91b8b05e660f8c1723182b
* Mass migrate simple cases to use expression builder (Amir Sarabadani, 2023-10-26, 1 file, -1/+1)
|   Done via
|     '([A-Za-z_\.]+) ?(=|!=|<|<=|>|>=) ?' . (\$db(?:r|w|))->addQuotes\( (.+?) \)
|   to:
|     $3->expr\( '$1', '$2', $4 \)
|   And
|     '([A-Za-z_\.]+) IS NULL OR ([A-Za-z_\.]+) ?(=|!=|<|<=|>|>=) ?' . (\$db(?:r|w|))->addQuotes\( (.+?) \)
|   to:
|     $4->expr( '$1', '=', null )->or\( '$2', '$3', $5 \)
|   Bug: T210206
|   Change-Id: I109bf2a712bdefa9e074f775b1bee41ac5b9d665
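The first search-and-replace above can be tried out with Python's `re` module. This is only an illustration, not the tooling the author used: Python writes backreferences as `\1` instead of `$1`, the `(` in the replacement needs no escaping, and the sample PHP line is made up. The unescaped `.` in the original pattern matches any character, which is harmless here because it lines up with the PHP concatenation operator.

```python
import re

# First pattern from the commit message, unchanged apart from raw-string syntax.
PATTERN = re.compile(
    r"'([A-Za-z_\.]+) ?(=|!=|<|<=|>|>=) ?' . (\$db(?:r|w|))->addQuotes\( (.+?) \)"
)
# "$3->expr( '$1', '$2', $4 )" written with Python backreferences.
REPLACEMENT = r"\3->expr( '\1', '\2', \4 )"


def migrate(line: str) -> str:
    """Rewrite string-concatenated comparisons into ->expr() calls."""
    return PATTERN.sub(REPLACEMENT, line)


sample = "$conds[] = 'el_id > ' . $dbr->addQuotes( $start );"
print(migrate(sample))  # $conds[] = $dbr->expr( 'el_id', '>', $start );
```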
* Replace single-value $db->buildComparison() with $db->expr() (Bartosz Dziewoński, 2023-10-22, 1 file, -2/+2)
|   Find:
|     ->buildComparison\( ('..?'), \[(\s*)([^\],]+) => ([^\],]+)(\s*)\] \)
|   Replace with:
|     ->expr($2$3, $1, $4$5)
|   Change-Id: I2cfc3070c2a08fc3888ad48a995f7d79198cc336
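The same can be done for this find/replace pair. Again a Python illustration with a made-up input line, not the author's editor session: the captured whitespace groups (`$2`, `$5`) carry the original spacing over, so the output keeps the `( ... )` padding of the input.

```python
import re

# Find pattern from the commit message.
FIND = re.compile(
    r"->buildComparison\( ('..?'), \[(\s*)([^\],]+) => ([^\],]+)(\s*)\] \)"
)
# "->expr($2$3, $1, $4$5)" with Python backreferences.
REPLACE = r"->expr(\2\3, \1, \4\5)"


def migrate(code: str) -> str:
    """Turn a single-value buildComparison() call into an expr() call."""
    return FIND.sub(REPLACE, code)


print(migrate("$dbr->buildComparison( '>', [ 'el_id' => $id ] )"))
# $dbr->expr( 'el_id', '>', $id )
```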
* Merge "Improve performance of trivial encoding/decoding regexes" (jenkins-bot, 2023-10-17, 1 file, -4/+2)
|\
| * Improve performance of trivial encoding/decoding regexes (thiemowmde, 2023-10-04, 1 file, -4/+2)
| |   Instead of replacing 1 character at a time the functions used here can replace sequences of any length. This can dramatically reduce the function call overhead.
| |   Also make use of the `fn ()` syntax because we can.
| |   Change-Id: I2dbc2271aa7847d9b687703f837cb0d850596ef0
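The idea of replacing whole runs rather than single characters can be sketched in Python with `re.sub` and a callback. The commit changed PHP code; the space character class and the percent-encoding callback below are assumptions chosen only for illustration. Both patterns give the same result, but the `+` quantifier means the callback runs once per run instead of once per character.

```python
from __future__ import annotations

import re
from urllib.parse import quote


def encode(text: str, pattern: str) -> tuple[str, int]:
    """Percent-encode every match of `pattern`; also report how many callbacks ran."""
    calls = 0

    def encode_match(match: re.Match) -> str:
        nonlocal calls
        calls += 1
        return quote(match.group(0), safe="")

    result = re.sub(pattern, encode_match, text)
    return result, calls


text = "a   b   c"
print(encode(text, r" "))   # ('a%20%20%20b%20%20%20c', 6): one call per character
print(encode(text, r" +"))  # ('a%20%20%20b%20%20%20c', 2): one call per run
```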
* | LinkFilter::makeIndexes: Don't explode if the 'host' key is missing for news:// (James D. Forrester, 2023-10-10, 1 file, -1/+4)
|/    Bug: T347574
|     Change-Id: Idc6b389da974a70bdee9b1d49e4b5c45ccdd0d73
* Merge "Follow RFC 3986 on what is path in mailto URLs" (jenkins-bot, 2023-09-20, 1 file, -1/+17)
|\
| * Follow RFC 3986 on what is path in mailto URLs (Petr Pchelko, 2023-09-04, 1 file, -1/+17)
| |   This hack was originally added to wfParseUrl as a fix for T10324, specifically for LinkFilter; however, according to RFC 3986 this is wrong.
| |   The RFC defines that in URLs the authority component must start with //, so in URLs without //, e.g. news: or mailto:, there is no authority component and thus no host component. Everything after : is actually a path, so PHP's default parse_url is correct. The RFC even has an example:
| |   > For example, the URI <mailto:fred@example.com> has a path of "fred@example.com".
| |   It's fairly ugly to just copy-paste the hack into LinkFilter, but I didn't find an easy and elegant way to rewrite it without making any changes to the link index values stored in the DB.
| |   See https://datatracker.ietf.org/doc/html/rfc3986
| |   Co-Authored-by: 沈澄心 <dringsim@qq.com>
| |   Change-Id: I3dd04495db9c7a66f62c3914c0eff06754b7d560
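Python's `urllib.parse.urlsplit` follows the same RFC 3986 reading, so it can show the difference (this is Python's parser, not PHP's `parse_url`). Without `//` the address is the path; with `//` it is parsed as an authority and a host appears, which is the case the later makeIndexes() and makeLikeArray() fixes in this log had to handle.

```python
from urllib.parse import urlsplit

# Without "//" there is no authority component: everything after "mailto:" is the path.
plain = urlsplit("mailto:fred@example.com")
print(repr(plain.netloc), repr(plain.path))  # '' 'fred@example.com'

# With "//" the rest is parsed as an authority (userinfo@host), so a host appears.
slashed = urlsplit("mailto://fred@example.com")
print(slashed.username, slashed.hostname, repr(slashed.path))  # fred example.com ''
```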
* | Add $wgExternalLinksDomainGaps config setting (Lucas Werkmeister, 2023-09-06, 1 file, -2/+13)
| |   This setting can be used to optimize externallinks queries for certain domains that have many entries in the externallinks table, but also big “gaps” where the table contains no entries for that domain. By putting those gaps (whose el_id values would usually have been obtained on the analytics databases) into the configuration, we can have MediaWiki tell the database to skip those ranges of the table instead of scanning through them.
| |   (This is only relevant for domains that have enough entries that the database chooses to scan the table in primary key order rather than using the el_to_domain_index_to_path index and filesorting.)
| |   Bug: T341000
| |   Change-Id: Iec4fe01aaa595fbaf3b427b7baa68a9d7209b117
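A rough Python sketch of the idea: given known el_id gaps for a domain, emit range conditions that tell the database those ranges cannot match. The gap format (a list of inclusive `(start, end)` pairs) and the function name `gap_conditions` are assumptions for illustration; the actual shape of $wgExternalLinksDomainGaps and the SQL MediaWiki generates are not shown in this log, and whether the ranges are really skipped depends on the database's query planner.

```python
from __future__ import annotations


def gap_conditions(gaps: list[tuple[int, int]]) -> list[str]:
    """Return one SQL fragment per gap that excludes its el_id range.

    Each gap is assumed to be an inclusive (start, end) pair of el_id values
    for which the domain has no rows. AND-ing the fragments with the domain
    condition tells the database those ranges cannot contain matches.
    """
    conditions = []
    for start, end in sorted(gaps):
        if start > end:
            raise ValueError(f"invalid gap: {start} > {end}")
        conditions.append(f"(el_id < {start} OR el_id > {end})")
    return conditions


print(" AND ".join(gap_conditions([(500, 900), (100, 200)])))
# (el_id < 100 OR el_id > 200) AND (el_id < 500 OR el_id > 900)
```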
* | ExternalLinks: Drop migration code (Amir Sarabadani, 2023-09-05, 2 files, -95/+10)
| |   Anything that writes or reads from now-dropped columns
| |   Bug: T312666
| |   Change-Id: Ic1c69de717bfa03bba94e97dabad9e717ba13fd6
* | Merge "Schema: Drop old externallinks columns and indexes" (jenkins-bot, 2023-09-05, 1 file, -10/+0)
|\ \
| * | Schema: Drop old externallinks columns and indexes (Amir Sarabadani, 2023-09-05, 1 file, -10/+0)
| |/    Already dropped from production
| |     Also dropping FixExtLinksProtocolRelative as it's not useful anymore and it has been run in previous releases so it's not worth fixing.
| |     Bug: T312666
| |     Change-Id: I1dd6e704b34e685ada6e316da11243d10827d769
* / Migrate calls to wfGetDB() in static methods (Amir Sarabadani, 2023-09-05, 1 file, -3/+3)
|/    wfGetDB() has been deprecated since 1.39 (or more?), and it's better to inject the LBF and call ::getReplicaDatabase() or ::getPrimaryDatabase(), which is not straightforward in classes. For static functions, however, there is no way to inject the service, so we can simply call MediaWikiServices::getInstance()->getDBLoadBalancerFactory().
|     While I was here, I migrated one call to SelectQueryBuilder.
|     Bug: T330641
|     Change-Id: Idd2278cef647035dce05a2d461a620e145fe1167
* Follow-up 22cec53: Add in-code comment on alias for when it was added (James D. Forrester, 2023-08-25, 1 file, -0/+3)
|   Change-Id: I98a0cc509c3436a2f77996781db393d555d3504a
* Externallinks: Keep domain wildcard if path is not specified (Amir Sarabadani, 2023-07-11, 1 file, -1/+1)
|   Currently, if the query is *.wikipedia.org, it still makes an exact match to only wikipedia.org and not any of the subdomains.
|   Bug: T326251
|   Change-Id: Ib372c35220a89ad9cd4d9879f4436ed153a830c7
* ExternalLinks: Make oneWildcard avoid adding wildcard to domain (Amir Sarabadani, 2023-07-10, 1 file, -1/+6)
|   Adding the wildcard does not provide much value, and avoiding it makes it possible to use the el_to_domain_index_to_path index by turning a LIKE into an exact match.
|   Bug: T326251
|   Change-Id: Icace8725ab8b19e78072ed45f306ccf4ef90e2eb
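The wildcard behaviour in the two changes above can be modelled in Python. This assumes the reversed-host form used for the domain index (for example `https://org.wikipedia.en.` for `https://en.wikipedia.org`, with a trailing dot); the helper names are hypothetical, LIKE escaping is ignored, and the real code also handles IPs, ports and mailto: addresses, which are omitted here. A plain domain becomes an exact match, while a leading `*.` keeps a prefix match so that subdomains are included.

```python
from __future__ import annotations


def reverse_host(host: str) -> str:
    """'en.wikipedia.org' -> 'org.wikipedia.en.' (labels reversed, trailing dot)."""
    return ".".join(reversed(host.lower().split("."))) + "."


def domain_condition(query_host: str, scheme: str = "https://") -> tuple[str, str]:
    """Return (operator, value) for the domain index column.

    A leading '*.' keeps a prefix (LIKE) match so subdomains are included;
    otherwise an exact match is used, which an index can serve directly.
    """
    if query_host.startswith("*."):
        return "LIKE", scheme + reverse_host(query_host[2:]) + "%"
    return "=", scheme + reverse_host(query_host)


def matches(indexed: str, cond: tuple[str, str]) -> bool:
    op, value = cond
    if op == "LIKE":
        return indexed.startswith(value[:-1])  # '%' only ever appears at the end here
    return indexed == value
```

The trailing dot is what keeps `*.wiki.org` (prefix `org.wiki.`) from also matching `wikipedia.org` (`org.wikipedia.`).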
* ExternalLinks: Clean up LinkFilter file header and code comments (Timo Tijhof, 2023-06-21, 1 file, -23/+21)
|   Clean up doc blocks. Remove redundant file-level description and ensure the description and any ingroup are on the class block.
|   Ref https://gerrit.wikimedia.org/r/q/owner:Krinkle+message:ingroup
|   Remove mention of outdated `el_index` field and instead describe the purpose more generally. The internal column names should mostly not matter to the callers anyway.
|   Follows-up I123662f40f6efb, mostly pre-existing issues except for the duplicate `'protocol'` default being specified in two places which this patch improves upon.
|   Change-Id: Ief9b733377ce4611881b15b7faeedc5ee13916ae
* LinkSearch: Change default protocol to http:// and https:// in READ_NEW (Amir Sarabadani, 2023-06-16, 1 file, -14/+30)
|   Now that el_to_domain is much smaller and indexed, this shouldn't be taxing on the database anymore. It's not perfect but it works beautifully.
|   Bug: T14810
|   Change-Id: I123662f40f6efbfd24f280984cd824ced6892840
* Externallinks: Make port part of the index (Amir Sarabadani, 2023-06-08, 1 file, -1/+5)
|   This is important in rebuilding the URL and causes bugs such as T337149#8910620
|   Bug: T337149
|   Change-Id: I9cd5a17da6da9fdd85574de06e6f5d0310dd48f3
* ExternalLinks: Make IP links work with read new (Amir Sarabadani, 2023-05-31, 1 file, -12/+38)
|   Fixed tests and such. It's not the prettiest patch I have ever written but I'm planning to refactor the whole class once we are done with the data migration.
|   Bug: T337149
|   Change-Id: I3303a063455cf444b78f4d5832d6bf243b290556
* ExternalLinks: Fix mailto: handling in read new (Amir Sarabadani, 2023-05-31, 1 file, -3/+7)
|   Added regression tests too.
|   Bug: T337149
|   Change-Id: Ia5edf60cd4180bc92e87a5cebf34cf23aed3c574
* ExternalLinks: Add support for non-reversed indexed URLs (Amir Sarabadani, 2023-05-26, 1 file, -3/+36)
|   This can be useful to get https://foo.com out of a proto-relative URL, to add a trailing / to the end of URLs without one, etc., so that the contents of externallinks can be compared with the URLs provided.
|   Bug: T337149
|   Change-Id: I921728974cde0a095fb3034fc80f7f4bb046f380
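Going the other way, a non-reversed URL can be rebuilt from a reversed domain value and a separately stored path. A minimal Python sketch, assuming a `scheme://` prefix followed by reversed host labels with a trailing dot; `unreverse_url` is a hypothetical name, and ports, IPs and mailto: addresses are not handled.

```python
def unreverse_url(domain_index: str, path: str) -> str:
    """'https://org.wikipedia.en.' + '/wiki/X' -> 'https://en.wikipedia.org/wiki/X'."""
    scheme, sep, reversed_host = domain_index.partition("://")
    if not sep:
        raise ValueError(f"expected scheme://host, got {domain_index!r}")
    labels = reversed_host.rstrip(".").split(".")
    host = ".".join(reversed(labels))
    # Use '/' when no path is stored, so URLs without one gain a trailing slash.
    return f"{scheme}://{host}{path or '/'}"


print(unreverse_url("https://org.wikipedia.en.", "/wiki/X"))  # https://en.wikipedia.org/wiki/X
print(unreverse_url("http://com.example.", ""))               # http://example.com/
```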
* Stop storing more than one row for proto-relative external links (Amir Sarabadani, 2023-05-25, 1 file, -3/+2)
|   Bug: T335819
|   Change-Id: I21e467bdd57768bee0ca0a6018fec4e20009911e
* ExternalLinks: Add function for looking up extlinks of a page (Amir Sarabadani, 2023-04-24, 1 file, -0/+65)
|   This logic has been repeated in three different extensions; let's DRY them up.
|   Bug: T326251
|   Change-Id: I8ae9ef388957b0c04efa281f3bc3b5796bec17fe
* Add support for externallinks read new (Amir Sarabadani, 2023-04-20, 1 file, -16/+89)
|   In API and Special:LinkSearch
|   That's basically where it's needed in core.
|   Bug: T326251
|   Change-Id: I7f95a2bb983987319c1b0ca0ff231064b0c07278
* Reorg: Move LinkFilter to ExternalLinks (Amir Sarabadani, 2023-03-01, 1 file, -0/+423)
    It's a one-class namespace, and I know it's not great, but:
    - I hope to add more classes with the redesign of the externallinks table
    - It's not named very well either; it's a collection of URL-related functionalities
    - It makes clear that LinkFilter is about external links, not internal, interwiki, templatelinks, etc.
    Bug: T321882
    Change-Id: I0dd530237f45e4fec786178ec03ee941c6bcd982