Introductory PostgreSQL Administration Tasks
The purpose of this article is to discuss basic administration tasks that need to be performed by the budding PostgreSQL (pgsql) administrator. Administering any server-based application or service is normally a black art--and pgsql is no different. You should know how to do a number of things after the installation takes place, and this article is a good place to start.
Topics covered include making backups, the purpose of the template database, monitoring server resources, dump and restore operations, and security. Finally, the article will wrap up with a blurb on statistics gathering and pointers to further information.
Unless otherwise specified, all references to OS in this article should be treated as references to RedHat Linux 7.0.
A Blurb About BackupsAs an administrator, you will undoubtedly be responsible for ensuring the safety of your data. Should this responsibility fall upon you, you should be aware of a number of items before you do a blanket save of your entire system.
The nature and frequency of backups are based on the type of databases you are housing on your server. For instance, if you are hosting a typical database application that is modified predominately during business hours, then you will most likely perform backups during downtime. A 24/7 operation such as a Web site is more problematic. Depending on site traffic, you may not have time to do a dedicated backup of the databases on the server.
Assuming you are running in a traditional business environment, the need to back up your data goes without saying: The real questions are how, and how often. The how is answered simply by analyzing the volatility of your data. Is the database a line-of-business application in which many hundreds or thousands of records are modified each day? Then that database is a good candidate for daily backups. Is it a database that is only updated once in a great while? Then a weekly or monthly backup schedule is appropriate.
At this point, you should load psql and create a database complete with a few tables and indexes. Then su to the pgsql superuser you created when you installed pgsql.
On the machine I'm using to write this article, my pgsql data directory is housed in /var/lib/pgsql/data. There was nothing special about this directory; it was just the default directory used by the RedHat 7.0 PostgreSQL RPM install.
Now that you are su'ed to the pgsql superuser and resting in the data directory, do an ls and look at what is there. The item you should look for at this point is a directory named base. This directory is the root data directory. Go ahead and cd into the base directory and take a peek. On my workstation config, I have two directories: template1 and damond. Damond is a database for my account on the machine. You should see a directory matching the name of the database you created earlier. Change into that directory and do an ls.
The files starting with pg_ are system files used by pgsql. We're interested in the files that don't start with pg. You'll notice that for each table or index you created, a matching file exists in the directory. Unlike other database systems, which use one large file for everything (I'm afraid to mention any particulars here, but I'm thinking of a few big league systems), pgsql stores each element in its own files. This allows you, as an administrator, to not only position individual files anywhere on your file system, but to selectively back up and restore individual tables in each database.
Now, before we get all giddy over this ability, I need to discuss reasons why you wouldn't want to reach directly into the data directory to do backup/restore processes.
Databases, by their very nature, are atomic: Operations either succeed or fail. Pgsql is no different in this regard. Developers can tell pgsql that they are starting a transaction, perform a unit of work, and then either commit or roll back their changes. Anything can be done during these transactions: records deleted, changed, added, etc. It's quite possible that before a change takes place in Table A, 15 other changes have occurred in Tables B and C. Because of this, it becomes a hairy business to do a full backup/restore on individual tables. But, that said, it is still possible to do so. As an administrator, you will have to work carefully with the application developers in order to plan and implement such a backup strategy.
In the end, it is much easier to do a full backup once a week on each database you care about and to do incremental backups on the remaining days. Many administrators do a full system save every night (which includes data and binaries--basically, whatever is on the disk). Backups such as these take a while, but allow for easy recovery of a system if it should fail completely.
Before leaving the subject of backing up your system, one last topic I'll discuss is testing the backups for validity. I ran into one customer who had been performing backups religiously for years. Turns out the company had been formatting backup tapes for years...it hadn't backed up a single byte of data from the beginning. Verify your tapes. Verify that the data is being saved. And, finally, verify that you can pull the data off the tape in case you need to. Disaster recovery is no joke. It will be made even less funny when you find that your backup tapes contain zero data.