PostgreSQL Internals: A Practical Guide to pg_filedump

PostgreSQL Internals: A Practical Guide to pg_filedump

Introduction

Understanding how PostgreSQL works under the hood can be a fascinating and rewarding journey for database enthusiasts and professionals alike. One powerful tool for exploring PostgreSQL internals is pg_filedump. This article walks you through what pg_filedump is, its use cases, and how to set it up and use it to analyze PostgreSQL page files. We’ll also demonstrate how to run PostgreSQL using Docker to create a portable and clean testing environment.


What is pg_filedump?

pg_filedump is a diagnostic and debugging tool for PostgreSQL that allows you to inspect the structure and contents of PostgreSQL database files at a low level. It provides detailed information about data pages, tuples, and more, helping you analyze and understand PostgreSQL storage mechanisms.

Setting Up PostgreSQL with Docker

1. Pull the PostgreSQL Image:

docker pull postgres:15

2. Run the PostgreSQL Container:

docker run --name postgres-container \
   -e POSTGRES_USER=admin \
   -e POSTGRES_PASSWORD=admin \
   -e POSTGRES_DB=testdb \
   -p 5432:5432 \
   -d postgres

This command creates a PostgreSQL container named postgres-container with the username admin, password admin, and a database named testdb

3. Access the PostgreSQL Container:

docker exec -it postgres-container bash

Installing and Compiling pg_filedump

To use pg_filedump, follow these steps:

1. Clone the PostgreSQL Git Repository:

git clone https://git.postgresql.org/git/pg_filedump.git cd postgres/contrib/pg_filedump

1. Install Dependencies: Inside the container or host machine, install the required PostgreSQL development libraries and tools:

apt-get update && apt-get install -y \
   git \
   build-essential \
   libpq-dev \
   postgresql-server-dev-15 \
   liblz4-dev \
   clang-11


2. Clone the PostgreSQL Git Repository and switch to the directory:

git clone https://git.postgresql.org/git/pg_filedump.git 
cd postgres/contrib/pg_filedump


3. Compile and install pg_filedump:

make
make install

The output will be similar to below.

root@02aa4942c6be:/pg_filedump# make
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -moutline-atomics -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -I. -I./ -I/usr/include/postgresql/15/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o pg_filedump.o pg_filedump.c
...

root@02aa4942c6be:/pg_filedump# make install
/bin/mkdir -p '/usr/lib/postgresql/15/bin'
/usr/bin/install -c  pg_filedump '/usr/lib/postgresql/15/bin'


From the output, you can see the pg_filedump binary is placed under /usr/lib/postgresql/15/bin

If you encounter errors, ensure all dependencies are correctly installed and that the paths to PostgreSQL headers and libraries are correctly configured.

Using pg_filedump

Once compiled, you can use pg_filedump to analyze page files:

1. Locate PostgreSQL Data Directory:

 docker exec -it postgres-container psql -U admin postgres -c 'SHOW data_directory;'

2. Locate the data file associated with the table or index you want to inspect: You can use the relfilenode of the table or index. Here’s how to find it:

SELECT relname, relfilenode FROM pg_class WHERE relname = 'your_table_name';

This will return the relfilenode for the table, which corresponds to the data file in the base/<database_oid>/ directory.


3. Navigate to the Data Directory:

cd /var/lib/postgresql/data/base/<database_oid>

3. Run

pg_filedump -i /var/lib/postgresql/data/base/5/16389

Replace 16389 with the desired file name to inspect its details.

Below is a sample output,

pg_filedump -i /var/lib/postgresql/data/base/5/16389

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility
*
* File: /var/lib/postgresql/data/base/5/16389
* Options used: -i
*******************************************************************

Block    0 ********************************************************
<Header> -----
 Block Offset: 0x00000000         Offsets: Lower      40 (0x0028)
 Block: Size 8192  Version    4            Upper    8032 (0x1f60)
 LSN:  logid      0 recoff 0x01560358      Special  8192 (0x2000)
 Items:    4                      Free Space: 7992
 Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
 Length (including item array): 40

<Data> -----
 Item   1 -- Length:   40  Offset: 8152 (0x1fd8)  Flags: NORMAL
  XMIN: 737  XMAX: 0  CID|XVAC: 0
  Block Id: 0  linp Index: 1   Attributes: 3   Size: 24
  infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)

 Item   2 -- Length:   36  Offset: 8112 (0x1fb0)  Flags: NORMAL
  XMIN: 737  XMAX: 0  CID|XVAC: 0
  Block Id: 0  linp Index: 2   Attributes: 3   Size: 24
  infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)

 Item   3 -- Length:   40  Offset: 8072 (0x1f88)  Flags: NORMAL
  XMIN: 737  XMAX: 0  CID|XVAC: 0
  Block Id: 0  linp Index: 3   Attributes: 3   Size: 24
  infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)

 Item   4 -- Length:   40  Offset: 8032 (0x1f60)  Flags: NORMAL
  XMIN: 737  XMAX: 0  CID|XVAC: 0
  Block Id: 0  linp Index: 4   Attributes: 3   Size: 24
  infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)

Analyze the Output

The output from pg_filedump will provide details about the PostgreSQL storage pages:

  • Page Headers: Information about the page such as page ID, LSN (Log Sequence Number), flags, and free space.
  • Tuples: A list of the rows (tuples) stored on each page.
  • Free Space: If you requested free space info, you’ll see the available space on each page.

Common Issues and Troubleshooting

  1. Missing postgres.h Error: Ensure you have installed the correct PostgreSQL server development packages (postgresql-server-dev-XX).
  2. Missing lz4.h Error: Install the liblz4-dev package to resolve this.
  3. Path Issues: Ensure environment variables like PG_CONFIG point to the correct PostgreSQL installation.

Explore Our Ultimate Database Fundamentals Series