Introduction
Understanding how PostgreSQL works under the hood can be a fascinating and rewarding journey for database enthusiasts and professionals alike. One powerful tool for exploring PostgreSQL internals is pg_filedump
. This article walks you through what pg_filedump
is, its use cases, and how to set it up and use it to analyze PostgreSQL page files. We’ll also demonstrate how to run PostgreSQL using Docker to create a portable and clean testing environment.
What is pg_filedump
?
pg_filedump
is a diagnostic and debugging tool for PostgreSQL that allows you to inspect the structure and contents of PostgreSQL database files at a low level. It provides detailed information about data pages, tuples, and more, helping you analyze and understand PostgreSQL storage mechanisms.
Setting Up PostgreSQL with Docker
1. Pull the PostgreSQL Image:
docker pull postgres:15
2. Run the PostgreSQL Container:
docker run --name postgres-container \
-e POSTGRES_USER=admin \
-e POSTGRES_PASSWORD=admin \
-e POSTGRES_DB=testdb \
-p 5432:5432 \
-d postgres
This command creates a PostgreSQL container named postgres-container
with the username admin
, password admin
, and a database named testdb
3. Access the PostgreSQL Container:
docker exec -it postgres-container bash
Installing and Compiling pg_filedump
To use pg_filedump
, follow these steps:
1. Clone the PostgreSQL Git Repository:
git clone https://git.postgresql.org/git/pg_filedump.git cd postgres/contrib/pg_filedump
1. Install Dependencies: Inside the container or host machine, install the required PostgreSQL development libraries and tools:
apt-get update && apt-get install -y \
git \
build-essential \
libpq-dev \
postgresql-server-dev-15 \
liblz4-dev \
clang-11
2. Clone the PostgreSQL Git Repository and switch to the directory:
git clone https://git.postgresql.org/git/pg_filedump.git
cd postgres/contrib/pg_filedump
3. Compile and install pg_filedump
:
make
make install
The output will be similar to below.
root@02aa4942c6be:/pg_filedump# make
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -moutline-atomics -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -I. -I./ -I/usr/include/postgresql/15/server -I/usr/include/postgresql/internal -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -c -o pg_filedump.o pg_filedump.c
...
root@02aa4942c6be:/pg_filedump# make install
/bin/mkdir -p '/usr/lib/postgresql/15/bin'
/usr/bin/install -c pg_filedump '/usr/lib/postgresql/15/bin'
From the output, you can see the pg_filedump binary is placed under /usr/lib/postgresql/15/bin
If you encounter errors, ensure all dependencies are correctly installed and that the paths to PostgreSQL headers and libraries are correctly configured.
Using pg_filedump
Once compiled, you can use pg_filedump
to analyze page files:
1. Locate PostgreSQL Data Directory:
docker exec -it postgres-container psql -U admin postgres -c 'SHOW data_directory;'
2. Locate the data file associated with the table or index you want to inspect: You can use the relfilenode
of the table or index. Here’s how to find it:
SELECT relname, relfilenode FROM pg_class WHERE relname = 'your_table_name';
This will return the relfilenode
for the table, which corresponds to the data file in the base/<database_oid>/
directory.
3. Navigate to the Data Directory:
cd /var/lib/postgresql/data/base/<database_oid>
3. Run
pg_filedump -i /var/lib/postgresql/data/base/5/16389
Replace
with the desired file name to inspect its details.16389
Below is a sample output,
pg_filedump -i /var/lib/postgresql/data/base/5/16389
*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility
*
* File: /var/lib/postgresql/data/base/5/16389
* Options used: -i
*******************************************************************
Block 0 ********************************************************
<Header> -----
Block Offset: 0x00000000 Offsets: Lower 40 (0x0028)
Block: Size 8192 Version 4 Upper 8032 (0x1f60)
LSN: logid 0 recoff 0x01560358 Special 8192 (0x2000)
Items: 4 Free Space: 7992
Checksum: 0x0000 Prune XID: 0x00000000 Flags: 0x0000 ()
Length (including item array): 40
<Data> -----
Item 1 -- Length: 40 Offset: 8152 (0x1fd8) Flags: NORMAL
XMIN: 737 XMAX: 0 CID|XVAC: 0
Block Id: 0 linp Index: 1 Attributes: 3 Size: 24
infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)
Item 2 -- Length: 36 Offset: 8112 (0x1fb0) Flags: NORMAL
XMIN: 737 XMAX: 0 CID|XVAC: 0
Block Id: 0 linp Index: 2 Attributes: 3 Size: 24
infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)
Item 3 -- Length: 40 Offset: 8072 (0x1f88) Flags: NORMAL
XMIN: 737 XMAX: 0 CID|XVAC: 0
Block Id: 0 linp Index: 3 Attributes: 3 Size: 24
infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)
Item 4 -- Length: 40 Offset: 8032 (0x1f60) Flags: NORMAL
XMIN: 737 XMAX: 0 CID|XVAC: 0
Block Id: 0 linp Index: 4 Attributes: 3 Size: 24
infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)
Analyze the Output
The output from pg_filedump
will provide details about the PostgreSQL storage pages:
- Page Headers: Information about the page such as page ID, LSN (Log Sequence Number), flags, and free space.
- Tuples: A list of the rows (tuples) stored on each page.
- Free Space: If you requested free space info, you’ll see the available space on each page.
Common Issues and Troubleshooting
- Missing
postgres.h
Error: Ensure you have installed the correct PostgreSQL server development packages (postgresql-server-dev-XX
). - Missing
lz4.h
Error: Install theliblz4-dev
package to resolve this. - Path Issues: Ensure environment variables like
PG_CONFIG
point to the correct PostgreSQL installation.
Explore Our Ultimate Database Fundamentals Series