{"id":103306,"date":"2024-09-07T20:00:00","date_gmt":"2024-09-07T20:00:00","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=103306"},"modified":"2024-10-31T17:42:36","modified_gmt":"2024-10-31T17:42:36","slug":"postgresql-backups-to-support-point-in-time-recovery-learning-postgresql-with-grant","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/postgresql\/postgresql-backups-to-support-point-in-time-recovery-learning-postgresql-with-grant\/","title":{"rendered":"PostgreSQL Backups to Support Point In Time Recovery: Learning PostgreSQL with Grant"},"content":{"rendered":"<p><em><strong>This is part of an ongoing series of post by Grant as he takes his vast knowledge of SQL Server and applies it to adding PostgreSQL and shares it with you so you can skip learn from his triumphs and mistakes. For more you can go to the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/collections\/series-learning-postgresql-with-grant\/\">Learning PostgreSQL with Grant<\/a> series home page<\/strong><\/em><\/p>\n\n<p>The focus of your backup strategy should not be backups. Instead, you should be focused on restoring databases, because, after all, that\u2019s what matters. Simply having a backup is fine. However, if you can\u2019t restore your data from that backup, it doesn\u2019t matter. On top of this is the ability to ensure you can restore your databases to a point in time, not simply to the last full backup. That ability requires a different approach to how you perform your backups.<\/p>\n<p>PostgreSQL has the capabilities to support backups as I already described in my first article on the topic. PostgreSQL also has the capability to restore to a point in time. However, that does require you to change the way you\u2019re performing your backups. This article advances our understanding of how to better protect your PostgreSQL databases by expanding on the database backups and restores into a more full-blown disaster recovery process through point in time restores.<\/p>\n<p>While the important part is the restore, in a classic chicken or egg conundrum, we can\u2019t talk about restoring until we first have a backup, so I\u2019ll start with how you need to backup your databases in preparation for a point in time restore.<\/p>\n<h3>Continuous Archiving<\/h3>\n<p>Unlike SQL Server, the backups needed for point in time recovery are a little different in PostgreSQL from the full backups. Like SQL Server though, the mechanism that makes this work is a log. In the case of PostgreSQL, it\u2019s the WAL, or Write Ahead Log. This plays an almost identical role to the database logs within SQL Server, recording all actions that modify the data or structures. This means our core process in getting the right kind of protection in place for our PostgreSQL database is the same, conceptually, as it is for SQL Server:<\/p>\n<ol>\n<li>Get a database backup<\/li>\n<li>Get WAL backups<\/li>\n<li>Restore the database, but don\u2019t finish the recovery process<\/li>\n<li>Replay the WAL files up to a point in time<\/li>\n<li>Recover our database and be back online.<\/li>\n<\/ol>\n<p>The backup process I showed in the first article on backups used pg_dump and pg_dumpall to basically export data structures and the data inside them. Those backups could then be used to recover the database, in full. However, these dump files will not work with the WAL for a point in time recovery. Instead, we have to use a completely different approach. We\u2019re going to backup the file system where the data is kept as well as backup the WAL files.<\/p>\n<p>Backing up the file system has implications. We could backup the files that make up a SQL Server backup, rather than run a <code>BACKUP<\/code> command but then we\u2019re looking at the distinct possibility of corrupt files since transactions may be in flight. Same issue here. While the PostgreSQL storage mechanisms are actually a little different, simply backing up the file system doesn\u2019t guarantee restore. However, the incomplete data files that will be backed up, can be fixed by replaying the WAL. Yeah, I found it a bit confusing too.<\/p>\n<p>Well, prepare for a little more confusion.<\/p>\n<p>In order to set this up, we don\u2019t first create our file system backups. First, we have to set up our logs so that they get archived. Let\u2019s talk about it.<\/p>\n<h2>WAL Archiving<\/h2>\n<p>The Write Ahead Log or WAL, is in some ways similar to the logs in SQL Server. However, they are a little different. First, by default, the WAL is kept and maintained in a rolling set of files, each 16mb by default. Similar to how SQL Server maintains the error logs, rolling over a fixed number, with new names, the WAL is maintained by PostgreSQL. The WAL files can be used to, more or less, replay the log to get the database back to a moment in time. However, in order to accomplish this, you first need to set up a mechanism to save the WAL files after they\u2019re full to a second location. That way, you can recover from a backup that predates the latest of the WAL files.<\/p>\n<p>To make this work, you have to have a location where you can write the WAL files. This can be anything from attached storage to cloud-based file share or any other place where you can copy the files after they are filled. Next, you need to be able to create a batch command that can move the file to the appropriate storage location.<\/p>\n<p>First, we have to make some modifications to the default behaviors in how PostgreSQL manages the WAL. Similar to how we set the recovery model in a SQL Server database, we have to change the <code>wal_level<\/code>. It has three &#8211; minimal, replica or logical &#8211; each one adding information to the WAL. Also similar to SQL Server\u2019s \u2018SIMPLE\u2019 recovery model, minimal maintains as little WAL information as possible, purely to support recovery in the event of an unplanned outage, not a full recovery. To get a full, point in time recovery, you have to have the <code>wal_level<\/code> set to replica or logical. The default is replica, so unless you\u2019ve changed something on your PostgreSQL instance, you should be ready for WAL archiving there.<\/p>\n<p>Next, we have to change the configuration setting <code>archive_mode<\/code>. The setting has three values, off (default), on and always. On is self explanatory. Always is a setting for working with replication and goes far beyond our introduction to the concepts here. Editing the <code>postgresql.conf<\/code> file can be done through nano, vim, or whatever your OS allows. You should see the <code>archive_mode<\/code> setting in the file. If not, you can add it. Set it so that it looks like this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">archive_mode = on<\/pre>\n<p>Then, you have to add, or change, the configuration setting for the <code>archive_command<\/code>. This is what will move the finished WAL files to your archive location. I\u2019ve created a local directory, just for testing purposes, and I\u2019m copying the files to there through this command:<\/p>\n<pre class=\"lang:none theme:none\">archive_command = 'test ! -f \/walarchive\/%f $ cp %p \/walarchive\/%f'<\/pre>\n<p>This is just a simple copy command. You can absolutely make this more sophisticated as necessary. It verifies that the file doesn\u2019t exist through the test command, then it copies the file over. The <code>%f<\/code> is a placeholder for the file name. <code>%p<\/code> is a placeholder for the finished WAL file that is getting archived.<\/p>\n<p>Now I need to restart the system:<\/p>\n<pre class=\"lang:none theme:none\">Docker restart PostgreHome<\/pre>\n<p>That\u2019s it for configuring the WAL archiving. Next up, backing up all the database files.<\/p>\n<h2>Base Backup<\/h2>\n<p>The most important thing to remember when setting up for a point in time recovery is that you are recovering all the databases, not just one. To do this then, you need to backup all the databases. You can do this two ways. First, the easiest and the one I\u2019ll cover here, is to use the <code>pg_basebackup<\/code> tool. The other way is through a series of commands called a low-level API backup. It allows you to place your server into a backup mode and use file system commands to backup the database. This method allows for a lot more granular control over exactly what is getting backed up.<\/p>\n<p>The use of <code>pg_basebackup<\/code> is very simple. Here is the smallest possible command you can run to get your backups:<\/p>\n<pre class=\"lang:none theme:none\">pg_basebackup -D \\dbarchive<\/pre>\n<p>In this case, I\u2019m running it from the server, so there\u2019s no need to provide a connection. I have to give it an output directory, but that\u2019s it. You can see the output from my execution of the command here:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1697\" height=\"130\" class=\"wp-image-103307\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/07\/word-image-103306-1.png\" \/><\/p>\n<p><code>pg_basebackup<\/code> puts the server into backup mode, copies all the databases into the location specified, and puts the server back into normal operation, all automatically. Additionally, and this is important, it writes out what it did to the <code>backup_label<\/code> file. This gives you the WAL file start and stop points across the backup so that you know which WAL file you\u2019ll need to start with when running a restore. The output of the <code>backup_label<\/code> file is here:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-103308\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/07\/text-description-automatically-generated-1.png\" alt=\"Text\n\nDescription automatically generated\" width=\"471\" height=\"125\" \/><\/p>\n<p>From that core backup, you can add a whole bunch of different functionality. For example, you can choose to have your backups compressed into tar files. You can also create a label for the backup. It\u2019s also possible to watch the progress as it goes. The command to do all this looks like this:<\/p>\n<pre class=\"lang:none theme:none\">pg_basebackup -D \\dbarchive -P -l \\ \n\"Learning PostgreSQL With Grant Backup\" -F tar -z<\/pre>\n<p>There are a very large number of other options to let you control your backups, however, this is enough for a recovery. All that\u2019s left is to schedule running this command on a regular basis.<\/p>\n<h2>Recovery To A Point In Time<\/h2>\n<p>The worst thing possible has happened. Your database has been corrupted, but, you already set up continuous archiving of the WAL and you\u2019re regularly running <code>pg_basebackup<\/code>, so you\u2019re ready to run a restore. This process isn\u2019t simple. I would strongly recommend you practice this a bunch. Also, prep the restore scripts ahead of time so you\u2019re not scrambling to piece it all together in an emergency.<\/p>\n<ol>\n<li>Assuming you can access the server, run <code>pg_switch_wal<\/code>. This will close out the current WAL file allowing it to get archived, making it available for your restore later.<\/li>\n<li>Stop the server (if it\u2019s online).<\/li>\n<li>Remove all files and subdirectories in your data directory (same with tablespaces, but I haven\u2019t learned those yet, so I\u2019m going to ignore them for now).<\/li>\n<li>Restore the Base Backup files from wherever you had them, do this using the appropriate user, in my case, the default <code>postgres<\/code> user.<\/li>\n<li>Remove all files from <code>pg_wal\/<\/code>, they\u2019re old since we have archiving.<\/li>\n<li>Change settings in <code>postgresql.conf<\/code> to start the recovery process.<\/li>\n<li>Create a file called <code>recovery.signal<\/code> in the cluster data directory.<\/li>\n<li>Start the service.<\/li>\n<\/ol>\n<p>As I said, you\u2019re really going to want to practice this a few times. Let\u2019s address a few of these steps in more detail.<\/p>\n<p>Running <code>pg_switch_wal<\/code> requires a connection so you can run a query: <code>SELECT pg_switch_wal()<\/code>. That will cause the WAL to close the current file and because you have archiving set up, it\u2019ll get added to the archive location. If you can\u2019t connect, then you\u2019re losing what in SQL Server land we refer to as the tail log, the last bit since the last backup. You can still try to get this by copying the file out of the <code>pg_wal<\/code> directory and the restoring it there right before you restart the service.<\/p>\n<p>You can stop the server use <code>pg_ctl<\/code> <code>stop<\/code>. Same thing in reverse to start it, <code>pg_ctl start<\/code>.<\/p>\n<p>Since the base backup copies everything, you just need to get rid of everything. You want to be able to copy all the information you have stored in whatever location you used for it. The exception is the pg_wal directory. If you want to restore just the backups from pg_basebackup, that\u2019s fine. Leave it in place. However, since we\u2019re going to a point in time based on the archive of the WAL files, it\u2019s best to clean out this one folder after you restore the rest. How you remove the directory depends on the OS. Same thing with the copy from backup. If you do drop and recreate the directory, make sure you set the postgresql user as the owner with appropriate security:<\/p>\n<pre class=\"lang:none theme:none\">sudo chown postgres:postgres \/var\/lib\/postgresql\/14\/data\nsudo chmod 700 \/var\/lib\/postgresql\/14\/data<\/pre>\n<p>Changing the postgresl.conf file is the thing I find the most confusing. However, that\u2019s how recovery is done. There should be a variable called restore_command within the configuration file. It\u2019s likely commented out. Remove the comment and edit the path. The default will be:<\/p>\n<pre class=\"lang:none theme:none\">\/path\/to\/database_archive\/<\/pre>\n<p>In my case, I\u2019m going to change it to:<\/p>\n<pre class=\"lang:none theme:none\"><em>\/walarchive<\/em><\/pre>\n<p>The full command would then be:<\/p>\n<pre class=\"lang:none theme:none\">restore_command = \u2018cp \/walarchive\/$f %p\u2019<\/pre>\n<p>Finally, because I want to restore to a point in time, I have specify that. It can be time, a log sequence number, a transaction id, or a restore point if I\u2019ve created them (beyond the scope of this article). Let\u2019s assume time. I\u2019ll add this below the restore_command:<\/p>\n<pre class=\"lang:none theme:none\">Recovery_target_time = \u20182022-05-02 15:24:00 EDT\u2019<\/pre>\n<p>That\u2019s it.<\/p>\n<p>As to the <code>recovery.signal<\/code> file, you just have to create it:<\/p>\n<pre class=\"lang:none theme:none\">vi recovery.signal<\/pre>\n<p>When we restart the service, it will restore all the logs up to the point in time we specified. It\u2019ll also rename r<code>ecovery.signal<\/code> to <code>recovery.done<\/code> so that if you restart the service, it doesn\u2019t restart the recovery again. No need to reedit the postgres.conf file since the existence of recovery.signal is what starts the recovery process.<\/p>\n<p>That should do it. A point in time restore accomplished.<\/p>\n<h2>Conclusion<\/h2>\n<p>To say this is complicated barely begins to cover it. However, it\u2019s really just a question of taking it one step at a time and validating that step. Ensure you have WAL archiving validating that files are being placed into your archive. Same thing with <code>pb_basebackup<\/code>. For the restore process, just script out all the steps and make darned sure you practice them before a real emergency hits.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The focus of your backup strategy should not be backups. Instead, you should be focused on restoring databases, because, after all, that\u2019s what matters. Simply having a backup is fine. However, if you can\u2019t restore your data from that backup, it doesn\u2019t matter. On top of this is the ability to ensure you can restore&#8230;&hellip;<\/p>\n","protected":false},"author":221792,"featured_media":103313,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143534],"tags":[158977,158976,158978],"coauthors":[6785],"class_list":["post-103306","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-featured","category-postgresql","tag-learningpostgresqlwithgrant","tag-planetpostgresqlgrantfritchey","tag-postgresql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/103306","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221792"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=103306"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/103306\/revisions"}],"predecessor-version":[{"id":104385,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/103306\/revisions\/104385"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/103313"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=103306"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=103306"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=103306"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=103306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}