mediawikicore.git - The collaborative editing software that runs Wikipedia.

	Commit message	Author	Age	Files	Lines
*	Move Language subclasses to includes/	Timo Tijhof	2021-08-04	1	-138/+0
	Depending on which namespace we want these classes to have after T166010, they could either stay in includes/languages/ (plural) in their own MediaWiki\Languages\-namespace dedicated to Language subclasses, or they could go into a subdirectory like `includes/language/languages/` if we want to keep them in the same top-level namespace as other Language classes and services, but in a more nested namespace. For now, I've made the smaller change and kept the Language subclasses in their own directory directly under includes/, not nested further.
	Bug: T225756
	Change-Id: I01015424707b442853879fd50c97f00215e5c2fa
*	languages: Introduce LanguageConverterFactory	Peter Ovchyn	2020-02-03	1	-1/+0
	Done:
	* Replace LanguageConverter::newConverter by LanguageConverterFactory::getLanguageConverter
	* Remove LanguageConverter::newConverter from all subclasses
	* Add LanguageConverterFactory integration tests which cover all languages by their code.
	* Caching of LanguageConverters in factory
	* Make all tests run (hopefully that will be enough)
	* Uncomment the deprecated functions.
	* Rename FakeConverter to TrivialLanguageConverter
	* Create ILanguageConverter to have a shared ancestor
	* Make the LanguageConverter class abstract.
	* Create table with mapping between lang code and converter instead of using name convention
	* Mark ILanguageConverter as @internal
	* Clean up code
	Change-Id: I0e4d77de0f44e18c19956a1ffd69d30e63cf51bf
	Bug: T226833, T243332
*	(y)etsin fixes, test refactoring, and misc fixes	tjones	2018-05-29	1	-147/+97
	* Fix etsin/етсин/этсин as noted in If933fc67845ac994d9ddfdf8349aff445ec9b13a
	** only convert tsin to тсин and let the other rules sort out the e
	* Refactor most tests to be word-specific, which uncovered a couple of bugs in corner cases
	** rol/üst prefix matches should match whole words (original [^ü] regex assumed the word could not be at the end of the string)
	* Fixed incidental bugs I noticed while looking into the items above
	** куркчи => kürkçi was in the wrong section
	** cönk => джонк was in the right section, but reversed
	* Added additional test cases for all of the above.
	Change-Id: Ia96be488a7b41c3ddba623b5c9262703b1c82687
*	Crimean Tatar/crh transliteration odds and ends	tjones	2018-05-22	1	-0/+9
	* refactor '\b' into WB const to make it easy to update in the future
	* add new ц-related exceptions
	Bug: T193764
	Change-Id: Ib707136f8f2598d1f8ec995bf129b436dfb53cd9
*	CRH Transliteration Pattern Matching Fixes	tjones	2018-04-27	1	-12/+100
	Refactor to match exceptions as patterns, not words
	- break exception list to C2L and L2C pattern sets
	- change main loop to break only on Roman numerals and transliterate everything else, rather than tokenizing on single-script words (this fixes the km² problem, too)
	- update word anchors from ^ and $ to \b
	- only process Roman numerals for L2C translit
	- add exception for single "Roman" character followed by a period, which looks like an initial
	- consolidate multi-step transliteration into regsConverter()
	- remove regex support from main exception list to support strtr()
	- re-organize some prefix/suffix/whole word patterns to the right place
	- add tests for recently fixed use cases
	- add support for many-to-one mappings in both directions
	- update character classes, exception lists, and regexes based on speaker feedback and example texts
	Misc other fixes:
	- fix some character class errors
	- remove unneeded character classes
	- add tests for Roman numerals and quotes
	- add tests for affixes and regexes
	Bug: T188321
	Bug: T189512
	Change-Id: I056d36ff2b8f63b3998a5d3a442d8d539c15488d
*	Fix table loading bug for CRH transliteration	tjones	2018-02-26	1	-1/+17
	In production, the regex and exception tables were not being loaded, resulting in very poor transliteration. The loading has been moved to the constructor, similar to the implementation of the Kazakh transliteration. Also, a bug in the mappings for Ö/ö -> Ё/ё and Ü/ü -> Ю/ю has been fixed. Test cases for specific additional examples have been added. (Though it is worth noting that the regex and exception tables did load properly during unit testing, so the problem wasn't caught there.)
	Bug: T186727
	Change-Id: I6bacee7d9de6f4a870a8a9ef1f04b819ad489c02
*	Add @covers tags to languages tests	Kunal Mehta	2017-12-28	1	-0/+4
	I removed comments that merely repeated the location of the class being tested. There are other tests in this directory that don't have a corresponding class and need further investigation.
	Change-Id: Ic16f0887b5030ac53fab4382cfaedfb5426cdb08
*	Crimean Tatar Transliteration	tjones	2017-11-20	1	-0/+72
	This is a first pass at Latin/Cyrillic transliteration for Crimean Tatar (crh). Includes transliteration tables, prefix/suffix mappings, regex mappings, and exceptions lists for words and abbreviations. Regularize CRH language name in messages/* files. Fix "varient" typos in qqq.json. Add unit tests for CRH transliteration.
	Bug: T23582
	Change-Id: I424703f99adf837f6217872b882d1ea26bfdd068
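
Several of the crh commits above describe the same approach: split the text only on Roman numerals, apply pattern-based exceptions anchored with a shared word-boundary constant, and then map the remaining letters through a plain table. The following is a minimal Python sketch of that idea, not the converter's actual PHP code. The letter table, the rule list, and the function name `to_cyrillic` are hypothetical and far smaller than the real tables; only the `tsin` to `тсин` exception and the `WB` constant name are taken from the commit messages.

```python
import re

# Word-boundary anchor kept in one constant so it is easy to change later.
WB = r"\b"

# Roman numerals are left untouched. The capture group makes re.split()
# return them as separate items at odd indexes.
ROMAN = re.compile("(" + WB + "[IVXLCDM]+" + WB + ")")

# Hypothetical toy letter table (Latin to Cyrillic), not the real crh table.
LETTERS = str.maketrans("aeikmnost", "аеикмност")

# Pattern rules run before the letter table. The first is the exception
# mentioned in the log; the second is an illustrative default rule.
RULES = [
    (re.compile("tsin" + WB), "тсин"),
    (re.compile("ts"), "ц"),
]


def to_cyrillic(text):
    """Transliterate toy Latin text, skipping Roman numerals."""
    out = []
    for index, part in enumerate(ROMAN.split(text)):
        if index % 2 == 1:
            # Odd indexes hold the captured Roman numerals.
            out.append(part)
            continue
        for pattern, replacement in RULES:
            part = pattern.sub(replacement, part)
        out.append(part.translate(LETTERS))
    return "".join(out)
```

Because the exception rule runs first and replaces `tsin` with Cyrillic text, the general `ts` rule no longer matches that span, which is the ordering effect the "(y)etsin fixes" commit relies on.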