=Introduction=
For [[PKC]]-related data backup/restore, see [[Backup_and_Restore_Loop]].

To ensure this MediaWiki's content will not be lost<ref name="Woziak on Apple">{{:Video/Steve Jobs never understood the computer part: Wozniak - ET Exclusive}}</ref><ref>{{:The Standard Question in Data Science}}</ref>, we created a set of scripts and put them in $wgResourceBase's extensions/BackupAndRestore directory.
The main challenge is to ensure that both textual data and binary files are backed up and restored. There are four distinct steps:
#[[Backup_and_Restore#Database_Backup|Official Database Backup Tools]]
#[[Backup_and_Restore#Media_File_Backup|Official Media File Backup Tools]]
#[[Backup_and_Restore#Restoring Binary Files|Restoring Binary Files]]
#[[Backup_and_Restore#Restoring SQL Data|Restoring SQL Data]]


It is necessary to study the notion of a File Backend<ref>[[mw:FileBackend_design_considerations|MediaWiki's File backend design considerations]]</ref> for MediaWiki, such as [[mw:Manual:CopyFileBackend.php|CopyFileBackend.php]].

For specific data loading scripts, please see [[ImportTextFiles.php]].


==Database Backup==
For textual data backup, the fastest way is to use <code>mysqldump</code>. More detailed instructions can be found in the MediaWiki manual.<ref>[https://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki MediaWiki:Manual:Backing up a wiki]</ref>


To back up all the uploaded files, such as images, PDF files, and other binary files, you can reference the following Stack Overflow answer.<ref>[https://stackoverflow.com/questions/1002258/exporting-and-importing-images-in-mediawiki Stack Overflow: Exporting and Importing Images in MediaWiki]</ref>


In the [[PKC]] docker-compose configuration, the backup file should be dumped to <code>/var/lib/mysql</code> for convenient file transfer on the host machine of the Docker runtime.
Example of the command to run in a Linux/UNIX shell:


 mysqldump -h hostname -u userid -p --default-character-set=whatever dbname > backup.sql


To run this command in PKC's Docker implementation, one needs to get into the Docker instance using something like:
 docker exec -it pkc-mediawiki-1 /bin/bash
(<code>pkc-mediawiki-1</code> may be replaced by <code>xlp_mediawiki</code>)
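Alternatively, instead of working interactively inside the container, the dump can be written to the mounted <code>/var/lib/mysql</code> directory and copied out with <code>docker cp</code>. A minimal sketch; the database container name <code>pkc-database-1</code> is an assumption, so check <code>docker ps</code> for the actual name:
<syntaxhighlight lang=bash>
# inside the database container: write the dump to the mounted volume
mysqldump -u wikiuser -pPASSWORD_FOR_YOUR_DATABASE my_wiki > /var/lib/mysql/backup.sql
# on the Docker host: copy the file out of the container
docker cp pkc-database-1:/var/lib/mysql/backup.sql ./backup.sql
</syntaxhighlight>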


When running this command on the actual database host machine, <code>hostname</code> can be omitted; the rest of the parameters are explained below:
 mysqldump -u wikiuser -pPASSWORD_FOR_YOUR_DATABASE my_wiki > backup.sql
(note that you should '''NOT''' leave a space between -p and the password)
Substitute hostname, userid, whatever, and dbname as appropriate. All four may be found in your LocalSettings.php (LSP) file: hostname may be found under <code>$wgDBserver</code> (by default it is localhost); userid may be found under <code>$wgDBuser</code>; whatever may be found under <code>$wgDBTableOptions</code>, where it is listed after <code>DEFAULT CHARSET=</code> (if whatever is not specified, mysqldump will likely use the default of utf8, or, on an older version of MySQL, latin1); and dbname may be found under <code>$wgDBname</code>. After running this line from the command line, mysqldump will prompt for the server password (which may be found under <code>$wgDBpassword</code> in LSP).
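This lookup can also be scripted. A minimal sketch, assuming a standard install where <code>LocalSettings.php</code> lives at <code>/var/www/html</code> and the settings are written with double quotes:
<syntaxhighlight lang=bash>
# pull the connection settings out of LocalSettings.php
LSP=/var/www/html/LocalSettings.php
DBSERVER=$(grep '^\$wgDBserver' "$LSP" | cut -d'"' -f2)
DBNAME=$(grep '^\$wgDBname' "$LSP" | cut -d'"' -f2)
DBUSER=$(grep '^\$wgDBuser' "$LSP" | cut -d'"' -f2)
# mysqldump prompts for the password ($wgDBpassword)
mysqldump -h "$DBSERVER" -u "$DBUSER" -p "$DBNAME" > backup.sql
</syntaxhighlight>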


For your convenience, the following instruction compresses the file as it is being dumped:
 mysqldump -h hostname -u userid -p dbname | gzip > backup.sql.gz


===Dump XML file===
One may use [[mw:Manual:dumpBackup.php|dumpBackup.php]] to back up textual content into a single XML file. Make sure that the following command is run in the <code>/var/www/html</code> directory.
<syntaxhighlight lang=bash>
php maintenance/dumpBackup.php --full --quiet > ./images/yourFileName.xml
</syntaxhighlight>
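If you prefer to drive the dump from the Docker host, the same command can be wrapped in <code>docker exec</code> with a timestamped file name. A sketch; the container name <code>pkc-mediawiki-1</code> is taken from the example above:
<syntaxhighlight lang=bash>
# run dumpBackup.php from the host and write a timestamped XML dump
docker exec pkc-mediawiki-1 sh -c \
  'cd /var/www/html && php maintenance/dumpBackup.php --full --quiet' \
  > backup_$(date +%Y%m%d_%H%M%S).xml
</syntaxhighlight>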


==Media File Backup==
Before running the [[PHP]] maintenance script [[mw:Manual:dumpUploads.php|dumpUploads.php]], you must first create a temporary working directory:
 mkdir /tmp/MediaFiles
It is common for some files not to be dumped out, due to errors caused by escape characters in file names; this will be resolved in the future.
In the standard [[PKC]] configuration, make sure you launch the following command from <code>/var/www/html</code>:


 php maintenance/dumpUploads.php | sed -e '/\.\.\//d' -e "/'/d" | xargs --verbose cp -t /tmp/MediaFiles


Note that the second filtering expression eliminates files with a <code>'</code> character in their names. After dumping all the files to the <code>MediaFiles</code> directory, make sure that you check whether there are [[files missing]].
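A rough consistency check is to compare the number of rows in the wiki's <code>image</code> table with the number of files actually dumped. A sketch, assuming the default table prefix and the example database credentials used above:
<syntaxhighlight lang=bash>
# number of files MediaWiki believes it has
mysql -u wikiuser -p my_wiki -N -e 'SELECT COUNT(*) FROM image;'
# number of files actually copied out
ls /tmp/MediaFiles | wc -l
</syntaxhighlight>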


Then, compress the files into a zip archive:
 zip -r ~/UploadedFiles_date_time.zip /tmp/MediaFiles


Remember to remove the temporary files and their directory:
 rm -r /tmp/MediaFiles


It is very useful to learn more about the [[xargs]] and [[sed]] Unix commands.
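The whole media-file sequence above can be collected into one script. A minimal end-to-end sketch of the steps described in this section, run from <code>/var/www/html</code> inside the MediaWiki container:
<syntaxhighlight lang=bash>
#!/bin/sh
# dump the upload list, filter problematic names, copy, zip, clean up
mkdir -p /tmp/MediaFiles
php maintenance/dumpUploads.php | sed -e '/\.\.\//d' -e "/'/d" \
  | xargs --verbose cp -t /tmp/MediaFiles
zip -r ~/UploadedFiles_$(date +%Y%m%d_%H%M%S).zip /tmp/MediaFiles
rm -r /tmp/MediaFiles
</syntaxhighlight>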


==Restoring Binary Files==
To load binary files into MediaWiki, one must use a maintenance script in the /maintenance directory. It needs to be launched in the container that runs the MediaWiki instance.


Load images from the UploadedFiles location. In most cases, the variable <code>$ResourceBasePath</code> string can be replaced by <code>/var/www/html</code>.
 cd $ResourceBasePath
 php $ResourceBasePath/maintenance/importImages.php $ResourceBasePath/images/UploadedFiles/
After all files are uploaded, one should run a maintenance script on the server that serves the MediaWiki service:
 php $ResourceBasePath/maintenance/rebuildImages.php
For more information, please refer to MediaWiki's documentation on [[MW:Manual:rebuildImages.php|MW:Manual:rebuildImages.php]].


==Restoring SQL Data==
The following instruction should be launched in the host (through <code>docker exec</code> or <code>kubectl exec -it</code>) of the container that hosts the mariadb/mysql service.
 mysql -u $DATABASE_USER -p $DATABASE_NAME < BACKUP_DATA.sql
If the file is very large, it might have been compressed into <code>gz</code> or <code>tar.gz</code> form. In that case, use a piped command to uncompress it and send it directly to the <code>mysql</code> program for data loading:
 gunzip -c BACKUP_DATA.sql.gz | mysql -u $DATABASE_USER -p $DATABASE_NAME
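The same restore can be driven from the Docker host without opening a shell in the container. A sketch; the container name <code>pkc-database-1</code> is an assumption, and the credentials reuse the placeholders from the Database Backup section (the password is passed inline because no TTY is available for a prompt):
<syntaxhighlight lang=bash>
# feed the SQL dump into mysql inside the database container
docker exec -i pkc-database-1 mysql -u wikiuser -pPASSWORD_FOR_YOUR_DATABASE my_wiki < BACKUP_DATA.sql
# a compressed dump can be piped through gunzip first
gunzip -c BACKUP_DATA.sql.gz | docker exec -i pkc-database-1 mysql -u wikiuser -pPASSWORD_FOR_YOUR_DATABASE my_wiki
</syntaxhighlight>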
 
===Load XML file===
One may use [[mw:Manual:importDump.php|importDump.php]] to restore textual content from a single XML file. Make sure that the following command is run in the <code>/var/www/html</code> directory.
<syntaxhighlight lang=bash>
php maintenance/importDump.php < yourFileName.xml
</syntaxhighlight>
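After a large import, it is good practice to rebuild the recent-changes table (a step that earlier revisions of this page also recommended):
<syntaxhighlight lang=bash>
php maintenance/rebuildrecentchanges.php
</syntaxhighlight>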
 
=Industry Solutions=
It would be useful to see how other technology providers solve the Backup and Restore problem. For instance, [[Kasten]] can be a relevant solution.
 
==Backup and Restore Loop==

The PKC automatic backup and restore process is a series of commands executed on a backup-source server and a backup-target server. The goal of the process loop is to ensure that there is always a backup that can easily be restored onto the target server. Below is a schematic diagram of how the process is executed.

''(Schematic diagram: Backup-restore)''

===Introduction===

The configuration process is done by an ansible script that is executed from the ansible host. Below is the outline of the deployment process in general:

# Get the source code: clone from GitHub to your local machine. The source code contains all the ansible scripts necessary to execute the deployment process.
# Adjust the configuration: adjust the configuration on your local machine to define the backup-and-restore loop.
# Execute the backup process: execute the backup installation.
# Execute the restore process: execute the restore installation.

===Steps to deploy===

Below are the steps to perform the Backup and Restore Loop from the ansible agent machine. You need to install ansible on your local machine before you can execute the ansible script; please find the comprehensive method of installing ansible in the [https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html Ansible Installation guide].

====Get the source code====

Download the source code from the GitHub link below:

 git clone https://github.com/xlp0/PKC

====Adjust the configuration====

The PKC source code consists of the directories shown below. There are two files that need to be adjusted before we can start performing the Backup and Restore Loop:
# host-restore in ./resource/config
# cs-restore-remote.yml in ./resource/ansible-yml
 .
 └── PKC/
     └── resource/
         ├── ansible-yml
         ├── config
         └── script

In the file <code>host-restore</code>, replace the values with those for your specific case:

 [your-source-server] ansible_connection=ssh ansible_ssh_private_key_file=[.pem] ansible_user=[user] domain=[src-domain]
 [your-destination-server] ansible_connection=ssh ansible_ssh_private_key_file=[.pem] ansible_user=[user] domain=[dest-domain]

In the file <code>cs-restore-remote.yml</code>, adjust the header part:

<syntaxhighlight lang=yaml>
- name: Backup and Restore Loop
  hosts: all
  gather_facts: yes
  become: yes
  become_user: root
  vars:
    - pkc_install_root_dir: "/home/ubuntu/cs/"
    - src_server: [put-your-source-server-here]
    - dst_server: [put-your-destination-server-here]

  tasks:
  # ... redacted
</syntaxhighlight>

====Execute the Backup and Restore Loop process====

To execute it, make sure you have changed the current folder to PKC's project root folder, then paste the command below:

 ansible-playbook -i ./resources/config/hosts-restore ./resources/ansible-yml/cs-restore-remote.yml
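Before a real run, ansible's built-in syntax check can catch configuration mistakes; a sketch using the same paths as above:
 ansible-playbook --syntax-check -i ./resources/config/hosts-restore ./resources/ansible-yml/cs-restore-remote.yml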

The output of this process is the latest backup of the MySQL database and the MediaWiki image files in:
 PKC/resource/ansible-yml/backup/

Below is a table of the output files:
{| class="wikitable"
! File Name !! Folder !! Remarks
|-
| *.tar.gz || ./resource/ansible-yml/backup || Latest backup files, with a timestamp in each filename. Each time the process is executed, it generates two backup files: a database backup file and an image backup file.
|-
| restore_report.log || ./resource/ansible-yml/backup || Log of the backup-restore-loop output.
|}
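To confirm that a run produced fresh archives, list the newest files and tail the log (paths as in the table above):
 ls -lt ./resource/ansible-yml/backup/ | head
 tail -n 20 ./resource/ansible-yml/backup/restore_report.log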


=References=
<references />

=Related Pages=

[[Category:MediaWiki Maintenance]]
[[Category:Backup and Restore]]
[[Category:Site Reliability Engineering]]