The Postgres Incident
Having failed to learn from past mistakes, I found out last week just how important it is to run the vacuum command in Postgres. If you don't vacuum your database, your pg_log file can grow past a gigabyte and wrap around, corrupting data. In this case, and perhaps in others, that meant every database in the Postgres installation.
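The cure is mundane: vacuum regularly. Here is a minimal sketch of how you might automate it, assuming you use the postgres user's crontab and the vacuumdb wrapper script that ships with Postgres (the 3 a.m. schedule and log path are arbitrary placeholders):

    # Nightly vacuum of every database in the cluster.
    # Install with: crontab -e, while logged in as the postgres user.
    # The schedule and log location are arbitrary choices.
    0 3 * * * vacuumdb --all >> /tmp/pg_vacuum.log 2>&1

If your version's vacuumdb doesn't understand --all, running psql -c "VACUUM" against each database by hand accomplishes the same thing.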
First, many thanks go to Tom Lane, whose expertise in this matter helped save my bacon.
It started with the following message:
Database 'xxxx', OID yyyyy, has disappeared from pg_database
I was using Postgres 7.0.2, which came with Red Hat 6.2. I should have upgraded a long time ago (more on that later), but this particular problem was caused by not running the vacuum command, which cleans up databases and performs other background maintenance needed to keep them fast and stable. Oddly, my pg_log file wasn't actually over 1GB, the traditional sign of this failure; a quick way to check yours is sketched below.
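For the record, here is roughly what that check looks like. This is only a sketch: the /var/lib/pgsql/data path is the Red Hat RPM default and an assumption on my part, so point PGDATA at your actual data directory.

    # pg_log sits at the top of the data directory in 7.0-era releases.
    # The PGDATA value is an assumption; adjust it for your installation.
    PGDATA=/var/lib/pgsql/data
    ls -lh $PGDATA/pg_log

So, enter step 1.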
Probably the next thing is to look at pg_database and try to understand why all the rows are showing as dead. There are a couple of tools around that can dump out the contents of tuple headers. I'd suggest pg_filedump (http://sources.redhat.com/rhdb/tools.html), but it is intended for 7.1/7.2 and you'd need to do a little surgery to make it work on 7.0 files. (I think ripping out the code that expects an LSN field to be present in the page header would be enough.) If you check the PG list archives you will find references to one or two older programs in the same vein; there might still be something available that will work on 7.0 files without modification.

regards, tom lane
This took quite a while, as I initially tried to compile pg_filedump against a 7.2 source tree when I should have compiled it against a 7.0 tree. That required a lot of inspection and alteration of the pg_filedump source. Eventually it compiled, and I sent a rather verbose dump to the list. Tom looked at it and confirmed that somehow the log had wrapped, been damaged, or both. So the solution was as follows.