path: root/includes/Import.php
Commit message · Author · Age · Files · Lines
* Split classes in Import.php into separate files · Krzysztof Zbudniewek · 2015-12-29 · 1 · -2054/+0
|   Bug: T122532
|   Change-Id: Ic4463ab8d3a7b2779f43efb92cb790dbc1d88064
* PostgreSQL: Add quotes to timestamp · Jeff Janes · 2015-12-27 · 1 · -1/+1
|   The fix for bug T114806 doesn't quote timestamps it sends directly
|   to the database (i.e. not in bind variables). Timestamps in
|   PostgreSQL require quotes. Add addQuotes call.
|
|   Bug: T121743
|   Change-Id: If8da1a0171f55d59c63f5501c854aa8fa48d5992
* Import: Importing no longer accepts too big revisions · This, that and the other · 2015-12-23 · 1 · -0/+24
|   Make sure the size of imported revisions does not exceed
|   $wgMaxArticleSize.
|
|   Bug: T73230
|   Change-Id: I6addace7d0eae565196bba564fbd1e329b681064
* Import: Properly handle deleted usernames in XML dumps · georggi · 2015-12-21 · 1 · -2/+6
|   Fixed username not being shown at all when contributor is deleted
|   Fixed text not being shown when contributor is deleted
|
|   Bug: T121338
|   Change-Id: I981c326f61735ace1d1fba35428bfc25d127b544
* Handle missing titles and usernames when importing log items · georggi · 2015-12-19 · 1 · -7/+32
|   Bug: T121338
|   Change-Id: Idf95263e4f22225509da4ee07fcb14383028894b
* Replace wfBaseConvert with Wikimedia\base_convert · Reedy · 2015-11-24 · 1 · -1/+1
|   Change-Id: Iadab3d018c3559daf79be90edb23d131729bdb68
* Merge "Use interwiki cache directly to resolve transwiki import sources" · jenkins-bot · 2015-11-05 · 1 · -16/+31
|\
| * Use interwiki cache directly to resolve transwiki import sources · This, that and the other · 2015-11-05 · 1 · -16/+31
| |   In order to resolve T17583, we will need to have a WMF-wide map of
| |   import sources. This will most likely consist of a project
| |   dropdown, e.g. "wikipedia", "wiktionary", "meta"... and a
| |   subproject dropdown, listing each language edition of the selected
| |   project.
| |
| |   This would mean, if we wanted to import a page from French
| |   Wikipedia to English Wikipedia, we would need to select project
| |   "wikipedia" and language "fr". Currently, this causes the error
| |   "Bad interwiki link", because the prefix "wikipedia:" maps to the
| |   local project namespace on enwiki, instead of the relevant
| |   interwiki prefix.
| |
| |   To avoid this error we need to bypass Title and directly query the
| |   interwiki map.
| |
| |   Change-Id: I68989203e367e7ea515e4ae2222c330b264a7cb1
* | Let Import also read CDATA as content · Matthias Mullie · 2015-11-02 · 1 · -0/+1
| |   Change-Id: I55275e20bb2fd589247fca5c44fd54d1ae9ff686
* | Set correct parentid on import · umherirrender · 2015-10-09 · 1 · -0/+15
| |   When importing over an existing page the parentid is set to the
| |   latest rev id of the page, which makes the size diff in history
| |   unusable.
| |
| |   The import constructs the Revision object without a parentid, and
| |   then Revision::getPreviousRevisionId uses the page_latest field to
| |   fill in the missing parentid. Avoid this bad propagation by
| |   selecting the previous revision id by timestamp before
| |   constructing the Revision object.
| |
| |   Bug: T114806
| |   Change-Id: Iee44d5a74de459f733ea62373cdbe9911e77083f
* | Fix issues identified by SpaceBeforeSingleLineComment sniff · Vivek Ghaisas · 2015-09-26 · 1 · -4/+4
| |   Change-Id: I048ccb1fa260e4b7152ca5f09b053defdd72d8f9
* | Fix exception in Import, when import of a revision fails · aude · 2015-09-22 · 1 · -5/+13
|/
|   A 'notice' is thrown when an import fails for some reason (for
|   example, the user does not have permission), and the reason is
|   reported to the user.
|
|   In this case, $title is false and not a Title object, as needed by
|   the beforeImportPage callback (which calls WikiPage::factory).
|
|   In addition, $pageInfo['_title'] is undefined in pageOutCallback,
|   which also calls WikiPage::factory via finishImportPage.
|
|   Bug: T108544
|   Change-Id: I55042fdf305cd1198d3a4b28a0ebb5ce31b76a1f
* Fixed spacing · umherirrender · 2015-06-17 · 1 · -3/+3
|   - Removed space after casts
|   - Removed spaces in array index
|   - Added spaces around string concat
|   - Added space after words: switch, foreach
|   - else if -> elseif
|   - Removed parentheses around require_once, because it is not a
|     function
|   - Added newline at end of file
|   - Removed double spaces
|   - Added spaces around operations
|   - Removed repeated newlines
|
|   Bug: T102609
|   Change-Id: Ib860222b24f8ad8e9062cd4dc42ec88dc63fb49e
* Use mediawiki/at-ease library for suppressing warnings · Kunal Mehta · 2015-06-11 · 1 · -2/+2
|   wfSuppressWarnings() and wfRestoreWarnings() were split out into a
|   separate library. All usages in core were replaced with the new
|   functions, and the wf* global functions are marked as deprecated.
|
|   Additionally, some uses of @ were replaced due to composer's
|   autoloader being loaded even earlier.
|
|   Ie1234f8c12693408de9b94bf6f84480a90bd4f8e adds the library to
|   mediawiki/vendor.
|
|   Bug: T100923
|   Change-Id: I5c35079a0a656180852be0ae6b1262d40f6534c4
* Reset Title cache when importing titles. · daniel · 2015-05-24 · 1 · -2/+1
|   WikiImporter now uses NaiveImportTitleFactory, which in turn uses
|   Title::makeTitleSafe, bypassing the internal title cache. To avoid
|   (potentially cached) Title objects obtained via Title::newFromText
|   getting out of sync, WikiImporter now clears the title cache in
|   addition to clearing the LinkCache.
|
|   NOTE: a test for this is provided by I2be12fa7d439b.
|
|   Bug: T89307
|   Change-Id: Ib50c48d4797fc21c62090c0be69e87f7e7d07428
* Make import destination UI more intuitive and clearer · This, that and the other · 2015-04-22 · 1 · -7/+2
|   Previously there were two fields: Destination namespace, and
|   Destination root page. They were both optional, and the "root
|   page" one in particular was a bit mysterious until you tried it
|   out. In addition, there was a strange interaction when you set
|   both fields (I still don't quite understand what used to happen in
|   this case).
|
|   Now, there is a set of three clearly described radio buttons,
|   allowing the user to select whether to import pages into their
|   automatically chosen locations, into a single namespace, or as
|   subpages of a given page. These correspond to the three
|   ImportTitleFactory classes available in MediaWiki.
|
|   See https://phabricator.wikimedia.org/M28 for a screenshot.
|
|   The logic of WikiImporter#setTargetNamespace is tweaked slightly
|   to remove the interaction between target namespace and target root
|   page, since only one of these options can now be set. Similarly,
|   the API's import module is modified in the same way.
|
|   Bug: T17908
|   Change-Id: I11521260a88a7f4a95fbdb71ac50bcf7b4fe5cd1
* Enable entity loader and handle errors nicely in WikiImporter constructor · This, that and the other · 2015-04-11 · 1 · -2/+18
|   Two issues being addressed here:
|   * Slightly friendlier message (instead of fatal) if libxml is not
|     present
|   * Need to make sure the entity loader is enabled when opening XML
|     documents
|
|   Also provide an error message when XMLReader::open fails, as
|   otherwise, the user sees cryptic errors from code that tries to
|   use the (unopened) XMLReader.
|
|   Bug: T45868
|   Bug: T86036
|   Change-Id: Ibcccce9f09f87b17c3093fd0c3c3ff74d7dc6cb7
* Merge "Use XML localName when importing" · jenkins-bot · 2015-04-09 · 1 · -15/+15
|\
| * Use XML localName when importing · This, that and the other · 2015-03-27 · 1 · -15/+15
| |   XMLReader#name gives the qualified name, which was not a good
| |   thing to use.
| |
| |   Bug: T6520
| |   Change-Id: I8174fe64791f0e8d0c6677169595201446eab583
* | Add null check in WikiImporter · This, that and the other · 2015-03-30 · 1 · -8/+13
|/
|   This is my code, and it caused fatals in production whenever
|   anyone tried to import anything :(
|
|   This should get rid of the fatals, but obviously this won't fix
|   the underlying issue of WikiPage::getContent() sometimes returning
|   null. See the task for more info on that issue.
|
|   Bug: T94325
|   Change-Id: I68ce2288d7d209733bceffe42e1876c7afcd73d3
* Made wfFindFile/wfLocalFile callers use explicit "latest" flags · Aaron Schulz · 2015-03-06 · 1 · -0/+1
|   * Callers that should not use caches won't
|   * Aliased the old "bypassCache" param to "latest"
|
|   bug: T89184
|   Change-Id: I9f79e5942ced4ae13ba4de0b4c62908cc746e777
* Profile all external HTTP requests from MW · Chad Horohoe · 2015-03-03 · 1 · -2/+2
|   Change-Id: Ie980b080da2ef21ec7d9fc32f1accc55710de140
* Avoid access to array key that does not exist · physikerwelt · 2015-02-28 · 1 · -5/+6
|   Accessing an array element that is not set causes a PHP notice.
|   This change first checks whether the array key is present.
|
|   Bug: T91127
|   Change-Id: I468a95851e6acdb8186a06b0a2ac73499cc4611f
* Merge "Cache countable statistics to prevent multiple counting on import" · Legoktm · 2015-02-13 · 1 · -2/+38
|\
| * Cache countable statistics to prevent multiple counting on import · This, that and the other · 2015-02-04 · 1 · -2/+38
| |   At the moment, when $wgArticleCountMethod = 'link' (as it is on
| |   the WMF cluster), we are querying the Slave database before each
| |   individual revision is imported, in order to find out whether the
| |   page is countable at that time. This is not sensible, as (1) the
| |   slave lags behind the master, but (2) even the master may not be
| |   up to date, since page link updates take place through the job
| |   queue.
| |
| |   This change sets up a cache to hold countable values for pages
| |   where import activity has already occurred. That way, we aren't
| |   hitting the DB on every revision, only to get an incorrect
| |   response back.
| |
| |   Bug: T42009
| |   Change-Id: I99189c82672d7790cda5036b6aa9883ce6e566b0
* | Clean up state of libxml on failed import. · daniel · 2015-02-11 · 1 · -25/+37
| |   Make sure we always call XMLReader::close() to clean up libxml's
| |   internal state, even if import fails with an exception. Otherwise,
| |   any subsequent attempt at importing (or otherwise using an
| |   XMLReader) will fail with:
| |
| |       XMLReader::open(): Unable to open source data
| |
| |   This is particularly annoying for unit tests which should be
| |   allowed to fail without dragging down subsequent tests. Even more
| |   importantly, they may explicitly be testing a failure case, which
| |   should not cause subsequent tests to fail.
| |
| |   NOTE: Wikibase patch Id035ecebebb67 is blocked on this, please
| |   re-check once this is merged.
| |
| |   Change-Id: I31c014df39aa11c11ded70050ef12a8e2c5fefc5
* | Common interface for ImportStreamSource and ImportStringSource. · daniel · 2015-02-10 · 1 · -6/+30
| |   ImportStringSource is handy for testing, but was unusable due to
| |   type hints against ImportStreamSource. Introducing a common
| |   interface implemented by both fixes this.
| |
| |   Change-Id: I820ffd8312789c26f55c18b6c46be191a550870a
* | Merge "Import: Fix error reporting" · jenkins-bot · 2015-01-28 · 1 · -1/+1
|\ \
| |/
|/|
| * Import: Fix error reporting · Jeff Janes · 2014-09-12 · 1 · -1/+1
| |   FileRepoStatus does not have a getXml method. Make the import
| |   routine invoke getHTML instead.
| |
| |   Change-Id: I571cfe7165b92397f205c8710d260feeec5cc2ca
* | Merge "Proper namespace handling for WikiImporter" · jenkins-bot · 2015-01-05 · 1 · -50/+107
|\ \
| * | Proper namespace handling for WikiImporter · This, that and the other · 2014-12-10 · 1 · -50/+107
| | |   Up until now, the import backend has tried to resolve titles in
| | |   the XML data using the regular Title class. This is a disastrous
| | |   idea, as local namespace names often do not match foreign
| | |   namespace titles.
| | |
| | |   There is enough metadata present in XML dumps generated by modern
| | |   MW versions for the target namespace ID and name to be reliably
| | |   determined. This metadata is contained in the <siteinfo> and <ns>
| | |   tags, which (unbelievably enough) was totally ignored by
| | |   WikiImporter until now. Fallbacks are provided for older XML dump
| | |   versions which may be missing some or all of this metadata.
| | |
| | |   The ForeignTitle class is introduced. This is intended
| | |   specifically for the resolution of titles on foreign wikis. In the
| | |   future, an InterwikiTitle class could be added, which would
| | |   inherit ForeignTitle and add members for the interwiki prefix and
| | |   fragment.
| | |
| | |   Factory classes to generate ForeignTitle objects from string data,
| | |   and Title objects from ForeignTitle objects, are also added.
| | |
| | |   The 'AfterImportPage' hook has been modified so the second
| | |   argument is a ForeignTitle object instead of a Title (the
| | |   documentation was wrong, it was never a string). LiquidThreads,
| | |   SMW and FacetedSearch all use this hook but none of them use the
| | |   $origTitle parameter.
| | |
| | |   Bug: T32723
| | |   Bug: T42192
| | |   Change-Id: Iaa58e1b9fd7287cdf999cef6a6f3bb63cd2a4778
* | | Documented the Classes ImportStringSource and ImportStreamSource · Evan McIntire · 2014-12-30 · 1 · -2/+5
| | |   Added short descriptions for each class
| | |
| | |   Change-Id: I28d3dea76ab70326a1e16b7c41b1f3758f8648b8
* | | Change case of class names to match declarations · Kevin Israel · 2014-12-19 · 1 · -9/+9
| | |   Found by running tests under a version of PHP patched to report
| | |   case mismatches as E_STRICT errors.
| | |
| | |   User classes:
| | |   * MIMEsearchPage
| | |   * MostlinkedTemplatesPage
| | |   * SpecialBookSources
| | |   * UnwatchedpagesPage
| | |
| | |   Internal classes:
| | |   * DOMXPath
| | |   * stdClass
| | |   * XMLReader
| | |
| | |   Did not change:
| | |   * testautoLoadedcamlCLASS
| | |   * testautoloadedserializedclass
| | |
| | |   Change-Id: Idc8caa82cd6adb7bab44b142af2b02e15f0a89ee
* | | Replace wfRunHooks calls with direct Hooks::run calls · Aaron Schulz · 2014-12-10 · 1 · -6/+6
|/ /
| |   * This avoids the overhead of an extra function call
| |
| |   Change-Id: I8ee996f237fd111873ab51965bded3d91e61e4dd
* / Import.php: Use Config instead of globals · Kunal Mehta · 2014-10-22 · 1 · -10/+23
|/
|   Change-Id: I4d1a8c443cfa360c5d388364c580d48fa7124099
* Fix for Ia9baaf0b: Make previously public variables public again · Stephan Gambke · 2014-08-29 · 1 · -15/+15
|   Change Ia9baaf0b changed the visibility of member variables (many
|   of which are not otherwise exposed, e.g. by a method) and by that
|   introduced a major API change breaking extensions.
|
|   This patch explicitly marks affected variables as public again,
|   keeping the intent of the original patch of making phpcs-strict
|   pass on includes/ directory.
|
|   Bug: 67522
|   Bug: 67984
|   Change-Id: I498512b2a1e615365bb477c1fd210aaa3241ca03
* Remove wrong @return from doc blocks · umherirrender · 2014-08-25 · 1 · -3/+0
|   These functions do not actually return anything, so the @return is
|   wrong here. '@return void' is ignored.
|
|   Change-Id: I11495ee05b943c16c1c4715d617c8b50de22276c
* Drop "left in" debugging var_dump from WikiImporter · Thiemo Mättig · 2014-08-22 · 1 · -41/+0
|   I found this by accident when searching for a var_dump I forgot
|   somewhere in my own code. We are maintaining production code here,
|   right? Debugging and testing should be somewhere else.
|
|   Also note the stray print before the var_dump.
|
|   Change-Id: I98725b277039f55db9ff95399e9559a477b43c26
* Merge "Remove unused XMLReader2 class" · jenkins-bot · 2014-07-27 · 1 · -24/+0
|\
| * Remove unused XMLReader2 class · This, that and the other · 2014-07-26 · 1 · -24/+0
| |   Undocumented and unused within core. Was previously used in
| |   WikiImporter, but that use was removed in r81437.
| |
| |   Change-Id: I45f4ff3fae19a7d9c1a0dacb2e02d53ee4bdaefb
* | Use master DB to check for page existence during import · This, that and the other · 2014-07-26 · 1 · -0/+1
|/
|   By default, slaves are used for the existence check. However, in
|   the case of importing many revisions of the one page, the chances
|   are that they won't have caught up to the fact that that page has
|   just been created, causing site statistics to be incorrectly
|   updated. We need to use the master DB for this check.
|
|   Bug: 40009
|   Change-Id: I301353fb976a982f58635b87d9960d81fc541d14
* Cleanup some docs (includes/*.php) · umherirrender · 2014-07-24 · 1 · -2/+2
|   - Swap "$variable type" to "type $variable"
|   - Added missing types
|   - Fixed spacing inside docs
|   - Makes beginning of @param/@return/@var/@throws in capital
|   - Changed some types to match the more common spelling
|
|   Change-Id: I783e4dbfe5f6f98b32b9a03ccf6439e13e132bcc
* Fixed spacing · umherirrender · 2014-07-24 · 1 · -1/+1
|   - Removed spaces after not operator (!)
|   - Removed spaces inside array index
|   - use tab as indent instead of spaces
|   - Add newline at end of file
|   - Removed spaces after casts
|
|   Change-Id: I9ba17c4385fcb43d38998d45f89cf42952bc791b
* Fixed spacing · umherirrender · 2014-07-19 · 1 · -1/+1
|   - Added/removed spaces around parenthesis
|   - Added space after switch/if/foreach
|   - changed else if to elseif
|
|   Change-Id: I99cda543e0e077320091addd75c188cb6e3a42c2
* Correct doc of WikiImporter::__construct parameter · Adrian Lang · 2014-06-02 · 1 · -4/+4
|   Change-Id: I0c61bb4f8d1e51f3b58ff99a9c632561dfd5134d
* Merge "Correctly parse 'redirect' XML tag during Special:Import." · jenkins-bot · 2014-05-28 · 1 · -9/+29
|\
| * Correctly parse 'redirect' XML tag during Special:Import. · Sebastian Brückner · 2014-05-28 · 1 · -9/+29
| |   Bug: 65481
| |   Change-Id: Id9b3b7878b2e7b6fc7a06b163e5bac60e700490e
* | Merge "Introduce ContentHandler::importTransform." · jenkins-bot · 2014-05-27 · 1 · -13/+27
|\ \
| |/
|/|
| * Introduce ContentHandler::importTransform. · daniel · 2014-05-20 · 1 · -13/+27
| |   ContentHandler::importTransform allows ContentHandler
| |   implementations to apply transformations on page content upon
| |   import. Such transformations may be useful to update from legacy
| |   formats, apply ID rewriting, etc.
| |
| |   Note that the transformation is done on the serialized content.
| |   This allows for a "raw" import implementation that writes imported
| |   blobs directly into a blob store without unserializing them into
| |   an intermediary representation. Implementations may choose to
| |   unserialize, transform, and then re-serialize.
| |
| |   Bug: 65256
| |   Change-Id: I290fdf5589af43def8b3eddb68b5e1c23f6124e8
* | Inserted test whether the resource 'uploadsource' is already registered. · Alexander Lehmann · 2014-05-20 · 1 · -1/+3
|/
|   Bug: 65530
|   Change-Id: I1b82d6dc6a37792d4e7b7d01316802ea4d38a88b