Difference between revisions of "ImportTextFiles.php"

From PKC





Revision as of 02:13, 27 January 2022

Please refer to the original MediaWiki document[1]

A Typical Importation Process

  1. Extract text content from a source, ideally Wikipedia or another reputable data source that has an existing data model.
    1. In the case of Wikipedia, every page can be treated as a single text file, and each page has a unique ID provided in either the XML or the MediaWiki data model.
    2. Dump all the textual content into a directory. Store each page in its own file, and record the page names in a comma-delimited (CSV) file in which each line associates a unique ID with a page name.
  2. Move the files to a place that MediaWiki's maintenance script can easily access, such as /var/www/html/images. In Docker deployments, this directory is likely to be mapped onto the host machine's hard drive, so it is easy to put the text files there. For example, use INPUTDATA_DIR under /var/www/html/images.
  3. Run the importTextFiles.php script as follows:
$ php ./maintenance/importTextFiles.php -s "Loading Textual Content from external sources" --overwrite --use-timestamp ./images/INPUTDATA_DIR/*
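
The dump in step 1 can be sketched in shell as follows. This is only a minimal illustration, not part of the original procedure: the helper dump_page, the sample pages, the .txt extension, and the file name index.csv are all hypothetical. It assumes page names contain no commas or slashes. The index is written next to INPUTDATA_DIR rather than inside it, on the assumption that every file matched by the import glob would otherwise be imported as a page.

```shell
#!/bin/sh
# Hypothetical sketch of the "dump" step: one file per page plus a CSV index.
# All names here (dump_page, index.csv, the sample pages) are illustrative.
set -eu

# Scratch location for the sketch; set WORK_DIR to use a real path instead.
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
OUT_DIR="$WORK_DIR/INPUTDATA_DIR"   # directory that is later moved under images/
INDEX_FILE="$WORK_DIR/index.csv"    # kept outside OUT_DIR so the glob does not match it

mkdir -p "$OUT_DIR"
: > "$INDEX_FILE"                   # start with an empty index

# dump_page ID NAME TEXT
# Writes TEXT to its own file named after the page and records "ID,NAME" in the index.
dump_page() {
    printf '%s\n' "$3" > "$OUT_DIR/$2.txt"
    printf '%s,%s\n' "$1" "$2" >> "$INDEX_FILE"
}

dump_page 1 "Alpha" "Text of the first page."
dump_page 2 "Beta" "Text of the second page."

echo "Pages written to $OUT_DIR; index written to $INDEX_FILE"
```

After a run like this, INPUTDATA_DIR would be moved under /var/www/html/images as described in step 2, and the import command in step 3 would be run from the MediaWiki installation directory.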



References

Related Pages