Backup and Restore
Introduction
To ensure this MediaWiki's content will not be lost, we created a set of scripts and placed them in the extensions/BackupAndRestore directory under $wgResourceBase.
The main challenge is to ensure both textual data and binary files are backed up and restored. The most reliable ways are the following two actions:
- Official Database Backup Tools (see the Database Backup section below)
- Backup and Restore using maintenance scripts
Alternatively, the data set could be exported to a SQL file. Since the database runs in a separate Docker service, that process is discussed on a different page.
Data backup practice should follow something like the 3-2-1 backup principle:
- 3 Copies of Data
- 2 Types of Storage
- 1 Offsite storage location
Other practices can be found here[1].
Importance
A convenient backup and restore procedure is the first layer of proper data security.
This MediaWiki instance stores all of its data under mountPoint/images and mountPoint/mariadb. These two directories hold everything relevant to MediaWiki: the uploaded media files and the textual data stored in the MariaDB database.
Implementation
The default up.sh shell script is the way users kick off the whole process (a sketch of this flow is shown after the list). It performs the following tasks:
- Before shutting down the existing Docker services, it first asks the running instance to back up its data.
- Then, it uses docker-compose down --volumes to shut down all the Docker services.
- If the mountPoint/ directory does not exist yet, it decompresses InitialDataPackage.tar.gz into the mountPoint/ directory.
- Once the mountPoint/ directory has the initial data content, it starts the Docker services using docker-compose up -d.
- The docker-compose program reads the docker-compose.yml file to configure the Docker services.
- Once the Docker services are running, it looks at the existing data in the directory: if the images/UploadedFiles/ directory or XLPLATEST.xml is available, these data elements are imported into the database.
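The following is a minimal sketch of such an up.sh flow, written against the description above. The container name pkc-mediawiki-1, the backup helper script, and the in-container paths are assumptions; only InitialDataPackage.tar.gz, XLPLATEST.xml, and the docker-compose commands come from this page.
#!/bin/bash
# 1. Ask the running instance to back up its data before shutting it down
#    (backup.sh is a hypothetical helper inside the BackupAndRestore extension).
docker exec pkc-mediawiki-1 /var/www/html/extensions/BackupAndRestore/backup.sh || true
# 2. Shut down all the Docker services.
docker-compose down --volumes
# 3. Seed mountPoint/ from the initial data package if it does not exist yet
#    (assuming the archive unpacks into mountPoint/).
if [ ! -d mountPoint ]; then
  tar -xzf InitialDataPackage.tar.gz
fi
# 4. Start the services again; docker-compose.yml configures them.
docker-compose up -d
# 5. Re-import uploaded files and the latest XML dump, if present
#    (assuming mountPoint/ is mounted at /var/www/html inside the container).
if [ -d mountPoint/images/UploadedFiles ]; then
  docker exec pkc-mediawiki-1 php /var/www/html/maintenance/importImages.php /var/www/html/images/UploadedFiles/
fi
if [ -f mountPoint/XLPLATEST.xml ]; then
  docker exec pkc-mediawiki-1 php /var/www/html/maintenance/importDump.php /var/www/html/XLPLATEST.xml
fi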
Instructions
There are two main parts to backing up MediaWiki data: the MySQL (MariaDB) database and the uploaded media files.
Based on operational experience, we strongly advise administrators to first restore the image files and then load the database content, so that all images are referenced correctly.
Media File Backup
The essential commands are the following.
First, create a temporary working directory:
mkdir /tmp/workingBackupMediaFiles
It is common for some files not to be dumped out, due to errors caused by escape characters in file names. This will be resolved in the future.
Next, change to the proper directory; in the standard PKC configuration, you must launch the following command from /var/www/html:
php maintenance/dumpUploads.php \
| sed 's~mwstore://local-backend/local-public~./images~' \
| xargs cp -t /tmp/workingBackupMediaFiles
Then, compress the copied files into a zip archive:
zip -r ~/Mediafiles.zip /tmp/workingBackupMediaFiles
Remember to remove the temporary files and their directory:
rm -r /tmp/workingBackupMediaFiles
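In the standard PKC Docker setup, the same steps can be driven from the host without opening an interactive shell. This is a minimal sketch: the container name pkc-mediawiki-1 is taken from the examples on this page, and it assumes the zip utility is available inside the container.
docker exec pkc-mediawiki-1 bash -c 'mkdir -p /tmp/workingBackupMediaFiles && cd /var/www/html && php maintenance/dumpUploads.php | sed "s~mwstore://local-backend/local-public~./images~" | xargs cp -t /tmp/workingBackupMediaFiles && zip -r /root/Mediafiles.zip /tmp/workingBackupMediaFiles && rm -r /tmp/workingBackupMediaFiles'
docker cp pkc-mediawiki-1:/root/Mediafiles.zip ./Mediafiles.zip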
For more information, see the following resources:
- Exporting and importing images in MediaWiki (Stack Overflow)
- MediaWiki data backup
- MediaWiki Backup and Restore Bash Scripts
For more details, see MediaWiki's Installation, Backup, Restore, and Recovery.
Database Backup
For textual data backup, the fastest way is to use mysqldump. More detailed instructions can be found in the following link: [2]
To back up all the uploaded files, such as images, PDF files, and other binary files, you can reference the following Stack Overflow answer[3].
In the PKC docker-compose configuration, the backup file should be dumped to /var/lib/mysql for convenient file transfer on the host machine of the Docker runtime.
Example of the command to run on the Linux/UNIX shell:
mysqldump -h hostname -u userid -p --default-character-set=whatever dbname > backup.sql
To run this command in PKC's Docker implementation, one needs to get into the Docker instance using something like:
docker exec -it pkc-mediawiki-1 /bin/bash
(pkc-mediawiki-1 may be replaced by xlp_mediawiki)
When running this command on the actual database host machine, hostname can be omitted, and the rest of the parameters are explained below:
mysqldump -uwikiuser -pPASSWORD_FOR_YOUR_DATABASE my_wiki > backup.sql
(note that you should NOT leave a space between -p and the password)
Substitute hostname, userid, whatever, and dbname as appropriate. All four may be found in your LocalSettings.php (LSP) file. hostname may be found under $wgDBserver; by default it is localhost. userid may be found under $wgDBuser, and whatever may be found under $wgDBTableOptions, where it is listed after DEFAULT CHARSET=. If whatever is not specified, mysqldump will likely use the default of utf8, or, if using an older version of MySQL, latin1. dbname may be found under $wgDBname. After running this line from the command line, mysqldump will prompt for the server password (which may be found under Manual:$wgDBpassword in LSP).
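To locate these values quickly, you can grep LocalSettings.php; the path below assumes the standard PKC layout inside the MediaWiki container.
grep -E 'wgDB(server|user|password|name)|wgDBTableOptions' /var/www/html/LocalSettings.php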
For your convenience, the following instruction compresses the file as it is being dumped out:
mysqldump -h hostname -u userid -p dbname | gzip > backup.sql.gz
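In the PKC Docker setup, the dump can also be driven from the host in a single line. This is a sketch only: the container name pkc-mariadb-1 and the credentials are assumptions (check docker ps for the actual name of the database service).
docker exec pkc-mariadb-1 sh -c 'mysqldump -u wikiuser -pPASSWORD_FOR_YOUR_DATABASE my_wiki | gzip' > backup.sql.gz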
Periodic Backup
It is necessary to create backup copies regularly and automatically. The ideal way of doing this is to use a cron job.
The following instructions are designed to configure the crontab for the Ubuntu container that runs MediaWiki.
A documented solution can be found at Backup_MW by Flominator.
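As a minimal sketch, a crontab entry like the following runs a nightly backup at 02:00. The script path is an assumption; substitute whatever backup script you use, and install the entry with crontab -e inside the container.
0 2 * * * /var/www/html/extensions/BackupAndRestore/backup.sh >> /var/log/pkc-backup.log 2>&1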
Restore
The process of restoring data after backing up is described below.
Restoring XML Data
To manage large XML data sets, see this video[4]. For programmable tools to process XML data, see here[5].
The following instruction should be launched inside the container that hosts the mediawiki service (reached through docker exec or kubectl exec -it). The wikifolder shown in the instruction should be replaced by the location where your MediaWiki is installed, usually /var/www/html.
php wikifolder/maintenance/importDump.php --dbpass wikidb_userpassword --quiet --wiki wikidb path-to-dumpfile/dumpfile.xml
php wikifolder/maintenance/rebuildrecentchanges.php
In our case, if you are importing data in a terminal whose working directory contains the XML file to be loaded, say dumpedData.xml, you may just type:
php wikifolder/maintenance/importDump.php ./dumpedData.xml
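From the host of the standard PKC setup, the same import can be run without opening a shell in the container. The container name and the dump file location inside the container are assumptions carried over from earlier examples.
docker exec pkc-mediawiki-1 php /var/www/html/maintenance/importDump.php /var/www/html/dumpedData.xml
docker exec pkc-mediawiki-1 php /var/www/html/maintenance/rebuildrecentchanges.php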
Restoring SQL Data
The following instruction should be launched inside the container that hosts the mariadb/mysql service (reached through docker exec or kubectl exec -it).
mysql -u $DATABASE_USER -p $DATABASE_NAME < BACKUP_DATA.sql
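A minimal sketch of running the restore through Docker from the host; the container name pkc-mariadb-1 and the credentials are assumptions, and docker exec -i is needed so the dump can be piped over stdin.
docker exec -i pkc-mariadb-1 mysql -u wikiuser -pPASSWORD_FOR_YOUR_DATABASE my_wiki < backup.sql
# for a gzip-compressed dump:
gunzip -c backup.sql.gz | docker exec -i pkc-mariadb-1 mysql -u wikiuser -pPASSWORD_FOR_YOUR_DATABASE my_wiki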
Caveat
In PKC, we suggest that images and other independent files be stored in a separate file service, because we want PKC to be used for managing hyperlinks rather than other kinds of data. However, for MediaWiki-related operations, the following instructions will work.
When accessing mysql from the command line, you might face various problems. One way to allow access is to override the entrypoint for the MariaDB Docker container by adding the following entrypoint statement to the docker-compose.yml file:
entrypoint: mysqld_safe --skip-grant-tables --user=mysql
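A sketch of where this fits in docker-compose.yml; the service name and image are assumptions, and only the entrypoint line comes from this page. Remember to remove the override afterwards, since --skip-grant-tables disables authentication.
services:
  database:
    image: mariadb
    entrypoint: mysqld_safe --skip-grant-tables --user=mysql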
More elaborate content can be found here: Change root password in MariaDB Docker container running with docker-compose.
Restoring Binary Files
The process of restoring binary files, such as images, PDFs, and other binary-format data, should refer to Restore.sh.
Instructions for restoring Binary Files to MediaWiki
To load binary files into MediaWiki, one must use a maintenance script in the /maintenance directory. The command line is shown below; it needs to be launched in the container that runs the MediaWiki instance.
Load images from the UploadedFiles location. In most cases, the variable $ResourceBasePath can be replaced by /var/www/html.
cd $ResourceBasePath
php $ResourceBasePath/maintenance/importImages.php $ResourceBasePath/images/UploadedFiles/
After all files are uploaded, one should run a maintenance script on the server that serves the MediaWiki service:
php $ResourceBasePath/maintenance/rebuildImages.php
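From the host, both steps can be run through Docker in one command; the container name is an assumption carried over from earlier examples.
docker exec pkc-mediawiki-1 bash -c 'cd /var/www/html && php maintenance/importImages.php images/UploadedFiles/ && php maintenance/rebuildImages.php'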
For more information, please refer to MediaWiki's documentation on Manual:rebuildImages.php.
Backup and Restore on Kubernetes
Please refer to Cloud Native DevOps with Kubernetes[6], particularly the section on Velero.
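As a minimal sketch with the Velero CLI, assuming Velero is already installed in the cluster and that the PKC services run in a namespace called pkc (both assumptions):
velero backup create pkc-backup --include-namespaces pkc
velero restore create --from-backup pkc-backup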
References
- ↑ Mark Cambell, Why 3-2-1 Backup Sucks, https://www.unitrends.com/blog/3-2-1-backup-sucks, last accessed: August 1st 2021
- ↑ MediaWiki:Manual:Backing up a wiki
- ↑ Stack Overflow: Exporting and Importing Images in MediaWiki
- ↑ Heaton, Jeff (Sep 19, 2019). Processing Large XML Wikipedia Dumps that won't fit in RAM in Python without Spark.
- ↑ There are many tools designed to perform data extraction for XML-based MediaWiki Content. The following one is an example: https://github.com/attardi/wikiextractor
- ↑ Arundel, John; Domingus, Justin (2019). "Backing Up Cluster State". Cloud Native DevOps with Kubernetes. O'Reilly Media. p. 207. ISBN 978-1-492-04076-7.