Difference between revisions of "ImportTextFiles.php"
Revision as of 02:12, 27 January 2022
Please refer to the original MediaWiki manual page for this script (mw:Manual:importTextFiles.php).
A Typical Importation Process
- Extract text content from a source, ideally Wikipedia or another reputable data source with an existing data model.
  - In the case of Wikipedia, every page can be treated as a single text file, and each page has a unique ID provided in either the XML or the MediaWiki data model.
  - Dump all the text content into a directory. Store each page in its own file, and record the page names in a comma-delimited (CSV) file in which each line associates a unique ID with a page name.
- Move the files to a location that MediaWiki's maintenance scripts can easily access, such as /var/www/html/images. In Docker deployments, this directory is likely to be mapped onto the host machine's hard drive, so it is easy to place the text files there.
- Run the importTextFiles.php script as follows:
$ php ./maintenance/importTextFiles.php -s "Loading Textual Content from external sources" --overwrite --use-timestamp ./INPUTDATA_DIR/*
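The preparation steps above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function names `dump_pages` and `build_import_command`, the `index.csv` file name, the `.txt` extension, and naming each file after its page ID are all hypothetical choices, not something the MediaWiki manual prescribes. The command it builds uses only the options shown in the invocation above.

```python
import csv
from pathlib import Path


def dump_pages(pages, out_dir):
    """Write each page to its own file and record an ID -> page name index.

    `pages` is an iterable of (page_id, page_name, text) tuples.
    The layout (one "<page_id>.txt" per page plus "index.csv") is an
    assumption made for this sketch.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index_path = out / "index.csv"
    with index_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["page_id", "page_name"])
        for page_id, page_name, text in pages:
            # One file per page, named after its unique ID.
            (out / f"{page_id}.txt").write_text(text, encoding="utf-8")
            # One CSV line per page, associating the ID with the page name.
            writer.writerow([page_id, page_name])
    return index_path


def build_import_command(input_dir, summary):
    """Return the argument list for the import command shown above.

    Only the .txt files are passed, so the CSV index is not imported
    as if it were a page. The command is built, not executed.
    """
    files = sorted(str(p) for p in Path(input_dir).glob("*.txt"))
    return [
        "php",
        "./maintenance/importTextFiles.php",
        "-s",
        summary,
        "--overwrite",
        "--use-timestamp",
        *files,
    ]
```

Note that the shell glob `./INPUTDATA_DIR/*` in the command above would also match a CSV index stored in the same directory, which is why this sketch selects only the `.txt` files. The returned list could be passed to `subprocess.run` from the MediaWiki installation directory once the files are in place.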