Ryan Lambert: Postgres 15 improves UNIQUE and NULL

July 10, 2022, 4:45 pm

≫ Next: Stefan Fercot: Patroni on pure Raft

≪ Previous: Denish Patel: Connection Scaling

Postgres 15 beta 2 was released recently! I enjoy Beta season... reviewing and testing new features is a fun diversion from daily tasks. This post takes a look at an improvement to UNIQUE constraints on columns with NULL values. While the nuances of unique constraints are not as flashy as making sorts faster (that's exciting!), improving the database developer's control over data quality is always a good benefit.

This email chain has the history behind this change. The Postgres 15 release notes summarize this improvement:

"Allow unique constraints and indexes to treat NULL values as not distinct (Peter Eisentraut)
Previously NULL values were always indexed as distinct values, but this can now be changed by creating constraints and indexes using UNIQUE NULLS NOT DISTINCT."

Two styles of `UNIQUE`

To take a look at what this change does, we create two tables. The null_old_style table has a 2-column UNIQUE constraint on (val1, val2). The val2 allows NULL values.

CREATE TABLE null_old_style
(
    id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    val1 TEXT NOT NULL,
    val2 TEXT NULL,
    CONSTRAINT uq_val1_val2
        UNIQUE (val1, val2)
);

↧

Stefan Fercot: Patroni on pure Raft

July 11, 2022, 12:00 am

≫ Next: Andreas 'ads' Scherbaum: Charly Batista

≪ Previous: Ryan Lambert: Postgres 15 improves UNIQUE and NULL

Since September 2020 and its 2.0 release, Patroni is able to rely on the pysyncobj module in order to use python Raft implementation as DCS.

In this post, we will setup a demo cluster to illustrate that feature.

Installation

For this demo, we will install 3 PostgreSQL nodes in Streaming Replication, running on Rocky Linux 8.

If you’re familiar with Vagrant, here’s a simple Vagrantfile to initiate 3 virtual machines:

# VagrantfileVagrant.configure(2)do|config|config.vm.box='rockylinux/8'config.vm.provider'libvirt'do|lv|lv.cpus=1lv.memory=1024endconfig.vm.synced_folder".","/vagrant",disabled: truenodes='srv1','srv2','srv3'nodes.eachdo|node|config.vm.definenodedo|conf|conf.vm.hostname=nodeendendconfig.vm.provision"shell",inline: <<-SHELL
        #-----------------------------
        sudo dnf install -y bind-utils
        #-----------------------------
    SHELLend

PostgreSQL

First of all, let’s install PostgreSQL on all the nodes:

$ sudo dnf install-y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
$ sudo dnf -qy module disable postgresql
$ sudo dnf install-y postgresql14-server postgresql14-contrib
$ sudo systemctl disable postgresql-14

Patroni will bootstrap (create) the initial PostgreSQL cluster and be in charge of starting the service, so be sure systemctl is disabled for the PostgreSQL service.

Watchdog

Watchdog devices are software or hardware mechanisms that will reset the whole system when they do not get a keepalive heartbeat within a specified timeframe. This adds an additional layer of fail safe in case usual Patroni split-brain protection mechanisms fail.

Patroni will be the component interacting with the watchdog device. Set the permissions of the software watchdog:

$ cat<<EOF | sudo tee /etc/udev/rules.d/99-watchdog.rules
KERNEL=="watchdog", OWNER="postgres", GROUP="postgres"
EOF
$ sudo sh -c'echo "softdog" >> /etc/modules-load.d/softdog.conf'$ sudo modprobe softdog
$ sudo chown postgres: /dev/watchdog

Patroni

Install Patroni and its Raft dependencies:

$ sudo dnf install-y python39
$ sudo-iu postgres pip3 install--user--upgrade pip
$ sudo-iu postgres pip3 install--user setuptools_rust
$ sudo-iu postgres pip3 install--user psycopg[binary]>=3.0.0
$ sudo-iu postgres pip3 install--user patroni[raft]

Remark: since December 2021 and its version 2.1.2, Patroni supports psycopg3.

Since we installed Patroni for the postgres user, let’s add its location to the user PATH:

$ sudo-u postgres sh -c'echo "export PATH=\"/var/lib/pgsql/.local/bin:\$PATH\"" >> ~/.bash_profile'$ sudo-iu postgres patroni --version$ sudo-iu postgres syncobj_admin --help

Create the data directory for Raft:

$ sudo mkdir /var/lib/raft
$ sudo chown postgres: /var/lib/raft

Patroni configuration

We will need to define the list of Patroni nodes participating in the Raft consensus cluster. To fetch it dynamically, you can use this simple shell script (where srv1 srv2 srv3 are the 3 Patroni hosts):

# Fetch the IP addresses of all Patroni hostsMY_IP=$(hostname-I | awk' {print $1}')patroni_nodes=( srv1 srv2 srv3 )i=0
for node in"${patroni_nodes[@]}"do
  i=$i+1
  target_ip=$(dig +short $node)if[["$target_ip"="$MY_IP"]];then
    continue
  fi
  target_array[$i]="'$target_ip:5010'"done
RAFT_PARTNER_ADDRS=$(printf",%s""${target_array[@]}")export RAFT_PARTNER_ADDRS="[${RAFT_PARTNER_ADDRS:1}]"echo"partner_addrs: $RAFT_PARTNER_ADDRS"

Let us now define the Patroni configuration in /etc/patroni.yml:

$ CLUSTER_NAME="demo-cluster-1"$ MY_NAME=$(hostname--short)$ MY_IP=$(hostname-I | awk' {print $1}')$ cat<<EOF | sudo tee /etc/patroni.yml
scope: $CLUSTER_NAME
namespace: /db/
name: $MY_NAME

restapi:
  listen: "0.0.0.0:8008"
  connect_address: "$MY_IP:8008"
  authentication:
    username: patroni
    password: mySupeSecretPassword

raft:
  data_dir: /var/lib/raft
  self_addr: "$MY_IP:5010"
  partner_addrs: $RAFT_PARTNER_ADDRS

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: false
      use_slots: true
      parameters:
        archive_mode: "on"
        archive_command: "/bin/true"

  initdb:
  - encoding: UTF8
  - data-checksums
  - auth-local: peer
  - auth-host: scram-sha-256

  pg_hba:
  - host replication replicator 0.0.0.0/0 scram-sha-256
  - host all all 0.0.0.0/0 scram-sha-256

  # Some additional users which needs to be created after initializing new cluster
  users:
    admin:
      password: admin%
      options:
        - createrole
        - createdb

postgresql:
  listen: "0.0.0.0:5432"
  connect_address: "$MY_IP:5432"
  data_dir: /var/lib/pgsql/14/data
  bin_dir: /usr/pgsql-14/bin
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: confidential
    superuser:
      username: postgres
      password: my-super-password
    rewind:
      username: rewind_user
      password: rewind_password
  parameters:
    unix_socket_directories: '/var/run/postgresql,/tmp'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF

Except for $MY_IP, $MY_NAME and $RAFT_PARTNER_ADDRS which are related to the local host, the patroni.yml configuration should be the same on all Patroni nodes.

Depending on the Patroni installation source, create the systemd file if not done during the installation and start the Patroni service:

$ cat<<EOF | sudo tee /etc/systemd/system/patroni.service
[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=python3 /var/lib/pgsql/.local/bin/patroni /etc/patroni.yml
ExecReload=/bin/kill -s HUP \$MAINPID
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target
EOF

$ sudo systemctl daemon-reload
$ sudo systemctl enable patroni
$ sudo systemctl start patroni

To check the Raft cluster status, use the syncobj_admin command:

$ sudo-iu postgres syncobj_admin -conn localhost:5010 -status

To list the members of the cluster, use the patronictl command:

$ sudo-iu postgres patronictl -c /etc/patroni.yml topology
+--------+-----------------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+ Cluster: demo-cluster-1 (7117556723320621508)----+-----------+
| srv2   | 192.168.121.12  | Leader  | running |  1 |           |
| + srv1 | 192.168.121.126 | Replica | running |  1 |         0 |
| + srv3 | 192.168.121.194 | Replica | running |  1 |         0 |
+--------+-----------------+---------+---------+----+-----------+

Database connection

Instead of connecting directly to the database server, it is possible to setup HAProxy so the application will be connecting to the proxy instead, which will then forward the request to PostgreSQL. When HAproxy is used for this, it is also possible to route read-only requests to one or more replicas, for load balancing. HAproxy can be installed as an independent server but it can also be installed on the application server or the database server itself.

Another possibility is to use PostgreSQL client libraries like libpq and jdbc which support client connection fail-over. The connection string contains multiple servers (eg: host=srv1,srv2,srv3) and the client library loops over the available hosts to find a connection that is available and capable of read-write or read-only operations. This capability allows clients to follow the primary cluster during a switchover.

Example:

$ psql "host=srv1,srv2,srv3 dbname=postgres user=admin target_session_attrs=read-write"-c"SELECT pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 f
(1 row)$ psql "host=srv1,srv2,srv3 dbname=postgres user=admin target_session_attrs=read-only"-c"SELECT pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)$ psql "host=srv1,srv2,srv3 dbname=postgres user=admin target_session_attrs=read-only"-c"\conninfo"
You are connected to database "postgres" as user "admin" on host "srv2"(address "192.168.121.36") at port "5432".

Automatic failover test

By default Patroni will set up the watchdog to expire 5 seconds before TTL expires. With the default setup of loop_wait=10 and ttl=30 this gives HA loop at least 15 seconds (ttl - safety_margin - loop_wait) to complete before the system gets forcefully reset. By default accessing DCS is configured to time out after 10 seconds. This means that when DCS is unavailable, for example due to network issues, Patroni and PostgreSQL will have at least 5 seconds (ttl - safety_margin - loop_wait - retry_timeout) to come to a state where all client connections are terminated.

Simply run pgbench on the leader node and disconnect the VM network interface for a few seconds to notice that a failover may happen very (too?) quickly!

Example:

22:45,326 INFO: no action. I am (srv1), a secondary, and following a leader (srv2)
22:55,333 INFO: no action. I am (srv1), a secondary, and following a leader (srv2)
23:05,355 INFO: Got response from srv3 http://192.168.121.194:8008/patroni: {"state": "running", ...}
23:07,268 WARNING: Request failed to srv2: GET http://192.168.121.12:8008/patroni (...)
23:07,280 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
23:07,319 INFO: promoted self to leader by acquiring session lock
23:07 srv1 python3[27101]: server promoting
23:07,340 INFO: cleared rewind state after becoming the leader
23:08,760 INFO: no action. I am (srv1), the leader with the lock

When the network interface comes back up on srv2, if it received additional data, the replication might be broken:

FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/D9000000 on timeline 1 is not in this server's history
  DETAIL:  This server's history forked from timeline 1 at 0/CDBDA4A8.
LOG:  new timeline 2 forked off current database system timeline 1 before current recovery point 0/D9683DC8

Since we didn’t configure Patroni to use pg_rewind, the replication lag might grow very quickly:

$ sudo-iu postgres patronictl -c /etc/patroni.yml list
+--------+-----------------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+ Cluster: demo-cluster-1 (7117556723320621508)----+-----------+
| srv1   | 192.168.121.126 | Leader  | running |  2 |           |
| srv2   | 192.168.121.12  | Replica | running |  1 |       169 |
| srv3   | 192.168.121.194 | Replica | running |  2 |         0 |
+--------+-----------------+---------+---------+----+-----------+

Hopefully, we defined a maximum_lag_on_failover to prevent the failover on the failing standby:

$ sudo-iu postgres patronictl -c /etc/patroni.yml switchover --candidate srv2 --force
Current cluster topology
+--------+-----------------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+ Cluster: demo-cluster-1 (7117556723320621508)----+-----------+
| srv1   | 192.168.121.126 | Leader  | running |  2 |           |
| srv2   | 192.168.121.12  | Replica | running |  1 |      1046 |
| srv3   | 192.168.121.194 | Replica | running |  2 |         0 |
+--------+-----------------+---------+---------+----+-----------+
Switchover failed, details: 503, Switchover failed

From Patroni logs:

INFO: Member srv2 exceeds maximum replication lag
WARNING: manual failover: no healthy members found, failover is not possible

We have to reinitialize the failing standby:

$ sudo-iu postgres patronictl -c /etc/patroni.yml reinit demo-cluster-1 srv2
+--------+-----------------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+ Cluster: demo-cluster-1 (7117556723320621508)----+-----------+
| srv1   | 192.168.121.126 | Leader  | running |  2 |           |
| srv2   | 192.168.121.12  | Replica | running |  1 |      1575 |
| srv3   | 192.168.121.194 | Replica | running |  2 |         0 |
+--------+-----------------+---------+---------+----+-----------+
Are you sure you want to reinitialize members srv2? [y/N]: y
Success: reinitialize for member srv2

From Patroni logs:

INFO: Removing data directory: /var/lib/pgsql/14/data
INFO: Lock owner: srv1; I am srv2
INFO: reinitialize in progress
...

The reinit step is by default performed using pg_basebackup without the fast checkpoint mode. So, depending on the checkpoint configuration and database size, it may take a lot of time.

Conclusion

It is very important to understand the parameters affecting the automatic failover kick-off and the consequences of a switchover/failover, or even the impact of not using pg_rewind.

As usual, testing its own configuration is important and once in production what’s even more important is to have a good monitoring and alerting system!

↧

Andreas 'ads' Scherbaum: Charly Batista

July 11, 2022, 7:00 am

≫ Next: Stefan Fercot: Patroni and pgBackRest combined

≪ Previous: Stefan Fercot: Patroni on pure Raft

PostgreSQL Person of the Week Interview with Charly Batista: I’m born in a small town in the Brazilian Amazon area. My parents moved to Brasilia (the capital city of Brazil) when I was around 12yo. I finished high school and went to college in Brasilia, but never really finished it. During the years I’ve moved through cities and companies, and ended up moving to China to work with Postgres around 7 years ago.

↧

Stefan Fercot: Patroni and pgBackRest combined

July 12, 2022, 12:00 am

≫ Next: Hans-Juergen Schoenig: Column order in PostgreSQL does matter

≪ Previous: Andreas 'ads' Scherbaum: Charly Batista

I see more and more questions about pgBackRest in a Patroni cluster on community channels. So, following yesterday’s post about Patroni on pure Raft, we’ll see in this post an example about how to setup pgBackRest in such cases.

To prepare this post, I followed most of the instructions given by Federico Campoli at PGDAY RUSSIA 2021 about Protecting your data with Patroni and pgbackrest. The video recording might even be found here.

pgBackRest repository location

The first decision you’ll have to take is to know where to store your pgBackRest repository. It can be any supported repo-type with or without a dedicated backup server (aka. repo-host). The most important thing to remember is that the repository should be equally reachable from all your PostgreSQL/Patroni nodes.

Based on that decision, you’ll have to think about how to schedule the backup command. With a repo-host that’s easy, because pgBackRest will be able to determine which PostgreSQL node is the primary (or a standby if you want to take the backup from a standby node). If you’re using a directly attached storage (i.e. NFS mount point or S3 bucket), the easiest solution might be to test the return of pg_is_in_recovery() in the cron job.

For the purpose of this demo setup, I’ll use a MinIO docker container with self-signed certificates and a bucket named pgbackrest already created.

Installation

pgBackRest installation and configuration

Install pgBackRest on all the PostgreSQL nodes:

$ sudo dnf install-y epel-release
$ sudo dnf install-y pgbackrest
$ pgbackrest version
pgBackRest 2.39

Then, configure /etc/pgbackrest.conf:

$ cat<<EOF | sudo tee /etc/pgbackrest.conf
[global]
repo1-type=s3
repo1-storage-verify-tls=n
repo1-s3-endpoint=192.168.121.1
repo1-s3-uri-style=path
repo1-s3-bucket=pgbackrest
repo1-s3-key=minioadmin
repo1-s3-key-secret=minioadmin
repo1-s3-region=eu-west-3

repo1-path=/repo1
repo1-retention-full=2
start-fast=y
log-level-console=info
log-level-file=debug
delta=y
process-max=2

[demo-cluster-1]
pg1-path=/var/lib/pgsql/14/data
pg1-port=5432
pg1-user=postgres
EOF

In the previous post, we defined the cluster name to use in the scope configuration of Patroni. We’ll reuse the same name for the stanza name.

Let’s initialize the stanza:

$ sudo-iu postgres pgbackrest --stanza=demo-cluster-1 stanza-create
INFO: stanza-create command begin 2.39: ...
INFO: stanza-create for stanza 'demo-cluster-1' on repo1
INFO: stanza-create command end: completed successfully (684ms)

Configure Patroni to use pgBackRest

Let’s adjust the archive_command in Patroni configuration:

$ sudo-iu postgres patronictl -c /etc/patroni.yml edit-config
## adjust the following lines
postgresql:
  parameters:
    archive_command: pgbackrest --stanza=demo-cluster-1 archive-push "%p"
    archive_mode: "on"$ sudo-iu postgres patronictl -c /etc/patroni.yml reload demo-cluster-1

Check that the archiving system is working:

$ sudo-iu postgres pgbackrest --stanza=demo-cluster-1 check
INFO: check command begin 2.39: ...
INFO: check repo1 configuration (primary)
INFO: check repo1 archive for WAL (primary)
INFO: WAL segment 000000010000000000000004 successfully archived to '...' on repo1
INFO: check command end: completed successfully (1083ms)

Take a first backup

Let’s not take our first full backup:

$ sudo-iu postgres pgbackrest --stanza=demo-cluster-1 backup --type=full
P00   INFO: backup command begin 2.39: ...
P00   INFO: execute non-exclusive pg_start_backup():
  backup begins after the requested immediate checkpoint completes
P00   INFO: backup start archive = 000000010000000000000006, lsn = 0/6000028
P00   INFO: check archive for prior segment 000000010000000000000005
P00   INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive
P00   INFO: backup stop archive = 000000010000000000000006, lsn = 0/6000100
P00   INFO: check archive for segment(s) 000000010000000000000006:000000010000000000000006
P00   INFO: new backup label = 20220711-075256F
P00   INFO: full backup size = 25.2MB, file total = 957
P00   INFO: backup command end: completed successfully

All Patroni nodes should now be able to see the content of the pgBackRest repository:

$ pgbackrest info
stanza: demo-cluster-1
    status: ok
    cipher: none

    db (current)
        wal archive min/max (14): 000000010000000000000006/000000010000000000000006

        full backup: 20220711-075256F
            timestamp start/stop: 2022-07-11 07:52:56 / 2022-07-11 07:53:02
            wal start/stop: 000000010000000000000006 / 000000010000000000000006
            database size: 25.2MB, database backup size: 25.2MB
            repo1: backup set size: 3.2MB, backup size: 3.2MB

The “restore” part

Now that the archiving system is working, we can configure the restore_command. Possibly, we could disable replication slots for Patroni too since we now have the archives as safety net for the replication.

Let’s edit the bootstrap configuration part:

$ sudo-iu postgres patronictl -c /etc/patroni.yml edit-config
## adjust the following lines
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    archive_command: pgbackrest --stanza=demo-cluster-1 archive-push "%p"
    archive_mode: 'on'
  recovery_conf:
    recovery_target_timeline: latest
    restore_command: pgbackrest --stanza=demo-cluster-1 archive-get %f "%p"
  use_pg_rewind: false
  use_slots: true
retry_timeout: 10
ttl: 30

$ sudo-iu postgres patronictl -c /etc/patroni.yml reload demo-cluster-1

To use pgBackRest for creating (or re-initializing) replicas, we need to adjust the Patroni configuration file.

On all your nodes, in /etc/patroni.yml, find the following part:

postgresql:listen:"0.0.0.0:5432"connect_address:"$MY_IP:5432"data_dir:/var/lib/pgsql/14/databin_dir:/usr/pgsql-14/binpgpass:/tmp/pgpass0authentication:replication:username:replicatorpassword:confidentialsuperuser:username:postgrespassword:my-super-passwordrewind:username:rewind_userpassword:rewind_passwordparameters:unix_socket_directories:'/var/run/postgresql,/tmp'

and add:

create_replica_methods:-pgbackrest-basebackuppgbackrest:command:pgbackrest --stanza=demo-cluster-1 restore --type=nonekeep_data:Trueno_params:Truebasebackup:checkpoint:'fast'

Don’t forget to reload the configuration:

$ sudo systemctl reload patroni

Create a replica using pgBackRest

Here is our current situation:

$ sudo-iu postgres patronictl -c /etc/patroni.yml list
+--------+-----------------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+ Cluster: demo-cluster-1 (7119014497759128647)----+-----------+
| srv1   | 192.168.121.2   | Replica | running |  1 |         0 |
| srv2   | 192.168.121.254 | Leader  | running |  1 |           |
| srv3   | 192.168.121.89  | Replica | running |  1 |         0 |
+--------+-----------------+---------+---------+----+-----------+

We already have 2 running replicas. So we’ll need to stop Patroni on one node and remove its data directory to trigger a new replica creation:

$ sudo systemctl stop patroni
$ sudo rm-rf /var/lib/pgsql/14/data
$ sudo systemctl start patroni
$ sudo journalctl -u patroni.service -f
...
INFO: trying to bootstrap from leader 'srv2'
...
INFO: replica has been created using pgbackrest
INFO: bootstrapped from leader 'srv2'
...
INFO: no action. I am (srv3), a secondary, and following a leader (srv2)

As we can see from the logs above, the replica has successfully been created using pgBackRest:

$ sudo-iu postgres patronictl -c /etc/patroni.yml list
+--------+-----------------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+ Cluster: demo-cluster-1 (7119014497759128647)----+-----------+
| srv1   | 192.168.121.2   | Replica | running |  1 |         0 |
| srv2   | 192.168.121.254 | Leader  | running |  1 |           |
| srv3   | 192.168.121.89  | Replica | running |  1 |         0 |
+--------+-----------------+---------+---------+----+-----------+

Now, let’s insert some data on srv2 and simulate an incident by suspending the VM network interface for a few seconds.

A failover will happen and the old primary will be completely out-of-sync and the replication will be lagging when adding new data on the primary:

$ sudo-iu postgres patronictl -c /etc/patroni.yml list
+--------+-----------------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+ Cluster: demo-cluster-1 (7119014497759128647)----+-----------+
| srv1   | 192.168.121.2   | Leader  | running |  2 |           |
| srv2   | 192.168.121.254 | Replica | running |  1 |       188 |
| srv3   | 192.168.121.89  | Replica | running |  2 |         0 |
+--------+-----------------+---------+---------+----+-----------+

Since we didn’t configure pg_rewind, we’ll need to re-initialize the failing node manually:

$ sudo-iu postgres patronictl -c /etc/patroni.yml reinit demo-cluster-1 srv2
+--------+-----------------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+ Cluster: demo-cluster-1 (7119014497759128647)----+-----------+
| srv1   | 192.168.121.2   | Leader  | running |  2 |           |
| srv2   | 192.168.121.254 | Replica | running |  1 |       188 |
| srv3   | 192.168.121.89  | Replica | running |  2 |         0 |
+--------+-----------------+---------+---------+----+-----------+
Are you sure you want to reinitialize members srv2? [y/N]: y
Success: reinitialize for member srv2

$ sudo-iu postgres patronictl -c /etc/patroni.yml list
+--------+-----------------+---------+---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+ Cluster: demo-cluster-1 (7119014497759128647)----+-----------+
| srv1   | 192.168.121.2   | Leader  | running |  2 |           |
| srv2   | 192.168.121.254 | Replica | running |  2 |         0 |
| srv3   | 192.168.121.89  | Replica | running |  2 |         0 |
+--------+-----------------+---------+---------+----+-----------+

The following trace shows that pgBackRest has successfully been used to re-initialize the old primary:

INFO: replica has been created using pgbackrest
INFO: bootstrapped from leader 'srv1'
...
INFO: Lock owner: srv1; I am srv2
INFO: establishing a new patroni connection to the postgres cluster
INFO: no action. I am (srv2), a secondary, and following a leader (srv1)

Conclusion

It is very easy and convenient to configure Patroni to use pgBackRest. Obviously, having backups is a good thing but being able to use those backups as source for the replica creation or re-initialization is even better.

As usual, the hardest part is to clearly define where to store the pgBackRest repository.

↧

Hans-Juergen Schoenig: Column order in PostgreSQL does matter

July 12, 2022, 12:00 am

≫ Next: Agustin Gallego: Enabling and Enforcing SSL/TLS for PostgreSQL Connections

≪ Previous: Stefan Fercot: Patroni and pgBackRest combined

I’ve recently seen some really broad tables (hundreds of columns) in a somewhat inefficiently structured database. Our PostgreSQL support customer complained about strange runtime behavior which could not be easily explained. To help other PostgreSQL users in this same situation, I decided to reveal the secrets of a fairly common performance problem many people don’t understand: Column order and column access.

Creating a large table

The first question is: How can we create a table containing many columns? The easiest way is to simply generate the CREATE TABLE statement using generate_series:

test=# SELECT 'CREATE TABLE t_broad ('

|| string_agg('t_' || x

|| ' varchar(10) DEFAULT ''a'' ', ', ')

|| ' )'

FROM generate_series(1, 4) AS x;

?column?       ----------------------------------------------------------

CREATE TABLE t_broad (

t_1 varchar(10) DEFAULT 'a' ,

t_2 varchar(10) DEFAULT 'a' ,

t_3 varchar(10) DEFAULT 'a' , t_4 v

archar(10) DEFAULT 'a'  )

(1 row)

test=# \gexec

CREATE TABLE

For the sake of simplicity I have only used 4 columns here. Once the command has been generated we can use \gexec to execute the string we have just compiled. \gexec is a really powerful thing: It treats the previous result as SQL input which is exactly what we want here. It leaves us with a table containing 4 columns.

However, let’s drop the table and create a really large one.


test=# DROP TABLE t_broad ;

DROP TABLE

Create an extremely wide table

The following statement creates a table containing 1500 columns. Mind that the upper limit is 1600 columns:

test=# SELECT 'CREATE TABLE t_broad ('

|| string_agg('t_' || x

|| ' varchar(10) DEFAULT ''a'' ', ', ') || ' )'

FROM generate_series(1, 1500) AS x;

In real life such a table is far from efficient and should usually not be used to store data. It will simply create too much overhead and in most cases it is not good modelling in the first place.

Let’s populate the table and add 1 million rows:


test=# \timing

Timing is on.

test=# INSERT INTO t_broad

SELECT 'a' FROM generate_series(1, 1000000);

INSERT 0 1000000

Time: 67457,107 ms (01:07,457)

test=# VACUUM ANALYZE ;

VACUUM

Time: 155935,761 ms (02:35,936)

Note that the table has default values so we can be sure that those columns actually contain something. Finally I have executed VACUUM to make sure that all hint bits and alike are set.

The table we have just created is roughly 4 GB in size which can easily be determined using the following line:


test=# SELECT pg_size_pretty(pg_total_relation_size('t_broad'));

pg_size_pretty

----------------

3907 MB

(1 row)

Accessing various columns

PostgreSQL stores data in rows. As you might know data can be stored column- or row-oriented. Depending on your use case one or the other option might be beneficial. In the case of OLTP a row-based approach is usually far more efficient.

Let’s do a count(*) and see how long it takes:


test=# SELECT count(*) FROM t_broad;

count

---------

1000000

(1 row)



Time: 416,732 ms

We can run the query in around 400 ms which is quite ok. As expected, the optimizer will go for a parallel sequential scan:


test=# explain SELECT count(*) FROM t_broad;

QUERY PLAN

--------------------------------------------------------------------

Finalize Aggregate (cost=506208.55..506208.56 rows=1 width=8)

-> Gather (cost=506208.33..506208.54 rows=2 width=8)

Workers Planned: 2

-> Partial Aggregate

(cost=505208.33..505208.34 rows=1 width=8)

-> Parallel Seq Scan on t_broad

(cost=0.00..504166.67 rows=416667 width=0)

JIT:

Functions: 4

Options: Inlining true, Optimization true, Expressions true,

Deforming true

(8 rows)

Let’s compare this to a count on the first column. You’ll see a small difference in performance. The reason is that count(*) has to check for the existence of the row while count(column) has to check if a NULL value is fed to the aggregate or not. In case of NULL the value has to be ignored:

test=# SELECT count(t_1) FROM t_broad;

count

---------

1000000

(1 row)



Time: 432,803 ms

But, let’s see what happens if we access column number 100? The time to do that will differ significantly:

test=# SELECT count(t_100) FROM t_broad;

count

---------

1000000

(1 row)



Time: 857,897 ms

The execution time has basically doubled. The performance is even worse if we do a count on column number 1000:


test=# SELECT count(t_1000) FROM t_broad;

count

---------

1000000

(1 row)



Time: 8570,238 ms (00:08,570)

Wow, we are already 20 times slower than before. This is not a small difference but a major problem which has to be understood.

Debunking PostgreSQL performance issues: column order

To understand why the problem happens in the first place we need to take a look at how PostgreSQL stores data: After the tuple header which is present in every row we got a couple of varchar columns. We just used varchar here to prove the point. The same issues will happen with other data types – the problem is simply more apparent with varchar as it is more complicated internally than, say, integer.

How does PostgreSQL access a column? It will fetch the row and then dissect this tuple to calculate the position of the desired column inside the row. So if we want to access column #1000 it means that we have to figure out how long those first 999 columns before our chosen one really are. This can be quite complex. For integer we simply have to add 4, but in case of varchar, the operation turns into something really expensive. Let’s inspect how PostgreSQL stores varchar (just to see why it is so expensive):

1 bit indicating short (127 bytes) vs. long string (> 127 bit)
7 bit or 31 bit length (depending on first bit)
“data” + \0 (to terminate the string )
alignment (to make sure the next column starts at a multiple of CPU-word length)

Now imagine what that means if we need to loop over 1000 columns? It does create some non-trivial overhead.

Finally …

The key insight here is that using extremely large tables is often not beneficial from a performance standpoint. It makes sense to use sensible table layouts to have a good compromise between performance and convenience.

If you are interested in other ways to improve performance, read my blog on CLUSTER.

The post Column order in PostgreSQL does matter appeared first on CYBERTEC.

↧

Agustin Gallego: Enabling and Enforcing SSL/TLS for PostgreSQL Connections

July 5, 2022, 4:57 am

≫ Next: Egor Rogov: PostgreSQL 14 Internals

≪ Previous: Hans-Juergen Schoenig: Column order in PostgreSQL does matter

Enabling SSL in PostgreSQL is very straightforward. In just three steps we can make sure the connections to it are more secure, using in-transit encryption via SSL/TLS:

Make sure we have the server certificate and key files available
Enable the SSL configuration (ssl = on)
Make sure the pg_hba.conf file rules are updated accordingly

In this blog post, we are going to go through these steps, and we’ll also see how we can check and validate the connections are indeed using the safer SSL protocol.

What is SSL/TLS?

SSL (Secure Sockets Layer) is an encryption protocol designed to make network communications between two nodes secure. Without some form of network encryption, any third party that can examine network packets will have access to the data sent between the client and server (in this case, the PostgreSQL data, which means users, passwords, and even SQL statements). TLS (Transport Layer Security) is the more modern definition of it, and even if SSL is deprecated, it is still common to use it for naming purposes. To all intents and purposes, we are using them as aliases in this blog.

The PostgreSQL documentation pages offer us some more insight in this respect. If needed, consult the Secure TCP/IP Connections with SSL and SSL Support entries for more information.

Trying to enable SSL without Cert/Key Files

Let’s now see what happens when we try to enable SSL without having the needed certificate and key files in place:

postgres=# alter system set ssl=on;
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

We don’t see any errors, but are we really using SSL? If we check the error log, we’ll indeed see the errors:

2022-06-23 20:43:54.713 UTC [5284] LOG:  received SIGHUP, reloading configuration files
2022-06-23 20:43:54.714 UTC [5284] LOG:  parameter "ssl" changed to "on"
2022-06-23 20:43:54.715 UTC [5284] LOG:  could not load server certificate file "server.crt": No such file or directory
2022-06-23 20:43:54.715 UTC [5284] LOG:  SSL configuration was not reloaded

Creating certificates

So, we first need to create the aforementioned files. If you don’t already have valid certificate and key files, a quick one-liner for this is the following openssl command (it’s not the focus here to delve too much into this part of the process):

[root@node0 ~]# cd /var/lib/pgsql/14/data
[root@node0 data]# openssl req -nodes -new -x509 -keyout server.key -out server.crt -subj '/C=US/L=NYC/O=Percona/CN=postgres'
Generating a 2048 bit RSA private key
....+++
.........................+++
writing new private key to 'server.key'
-----

We have changed the current working directory to the PostgreSQL data directory since we were in a RHEL-based system. If you are on a Debian-based one, you should store the files in /etc/ssl/certs/ and /etc/ssl/private/ or define/check ssl_cert_file and ssl_key_file PostgreSQL configuration variables, respectively. Also, make sure the postgres user owns them, and they are only readable to it:

[root@node0 data]# chmod 400 server.{crt,key}
[root@node0 data]# chown postgres:postgres server.{crt,key}
[root@node0 data]# ll server.{crt,key}
-r--------. 1 postgres postgres 1212 Jun 23 20:49 server.crt
-r--------. 1 postgres postgres 1704 Jun 23 20:49 server.key

Enabling SSL/TLS

Now we can enable SSL and reload the configuration again; this time with no errors shown:

postgres=# alter system set ssl=on;
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

2022-06-23 20:52:05.823 UTC [5284] LOG:  received SIGHUP, reloading configuration files
2022-06-23 20:52:05.823 UTC [5284] LOG:  parameter "ssl" changed to "on"

So far, we have enabled SSL, but unless we modify the pg_hba.conf file these settings won’t apply to any users (at least not in a forceful manner). This is the first step that can give us a false sense of security, so let’s go ahead and see how to fix it.

Enforcing SSL/TLS

As mentioned, the pg_hba.conf file is where we can tune which connections are going to be required to use SSL. We can instruct PostgreSQL to enforce this by using the “hostssl” keyword instead of the plain “host” one. Note that you can see some connections starting to use SSL at this point because the plain “host” keyword will allow for connections that want to use SSL to use it. However, this is not enforcing SSL to be used (i.e.: if the client doesn’t want to use SSL, PostgreSQL will not deny the connection).

Let’s imagine this is the pg_hba.conf file we have been using so far:

# TYPE  DATABASE        USER   ADDRESS            METHOD
local   all             all                       peer
host    all             all    127.0.0.1/32       scram-sha-256
host    all             all    ::1/128            scram-sha-256
host    all             all    0.0.0.0/0          md5
host    replication     all    10.124.33.113/24   md5

And we want to enforce SSL connections from all remote users (and also include remote replication connections):

# TYPE   DATABASE       USER    ADDRESS            METHOD
local    all            all                        peer
host     all            all     127.0.0.1/32       scram-sha-256
host     all            all     ::1/128            scram-sha-256
hostssl  all            all     0.0.0.0/0          md5
hostssl  replication    all     10.124.33.113/24   md5

Again, this is not enough if we are adamant about really enforcing connections to use SSL. We have to call pg_reload_conf() once more to make sure they are loaded into PostgreSQL itself:

postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

At this point, new remote non-SSL connections will be denied:

[root@node1 ~]# psql "host=10.124.33.132 sslmode=disable"
psql: error: connection to server at "10.124.33.132", port 5432 failed: FATAL:  no pg_hba.conf entry for host "10.124.33.113", user "postgres", database "postgres", no encryption

So, can we finally say we are fully secure now? No, not yet! Connections that were already established are not forced to use SSL until they reconnect.

Checking for connections using SSL/TLS

We can check for connections using SSL with the following query:

postgres=# select pg_ssl.pid, pg_ssl.ssl, pg_ssl.version,
           pg_sa.backend_type, pg_sa.usename, pg_sa.client_addr
           from pg_stat_ssl pg_ssl
           join pg_stat_activity pg_sa
             on pg_ssl.pid = pg_sa.pid;
 pid  | ssl | version |  backend_type  | usename  |  client_addr
------+-----+---------+----------------+----------+---------------
 5547 | f   |         | walsender      | postgres | 10.124.33.113
 5549 | f   |         | client backend | postgres | 10.124.33.132
 5556 | f   |         | client backend | postgres | 10.124.33.113
(3 rows)

In this case, the replication connection (walsender) is not yet using SSL and neither are the two other clients connected, so we need to force a restart if we want them to reconnect. As always, we recommend that you try all these steps in a testing environment first and that when it’s time to do it in production you do them in a properly established maintenance window (no matter how trivial the steps seem to be).

To force the replication connections to use SSL, one can either restart the service in the replica or use pg_terminate_backend (which will send the SIGTERM signal to the process and is safe to use in this context). In this case, we are using pg_terminate_backend in the primary itself, but it can also be used in the replica, provided we are using the correct PID number.

postgres=# select pg_terminate_backend(5547);
 pg_terminate_backend
----------------------
 t
(1 row)

After that, we should see the new replica connection correctly using the SSL/TLS protocol:

postgres=# select pg_ssl.pid, pg_ssl.ssl, pg_ssl.version,
           pg_sa.backend_type, pg_sa.usename, pg_sa.client_addr
           from pg_stat_ssl pg_ssl
           join pg_stat_activity pg_sa
             on pg_ssl.pid = pg_sa.pid;
 pid  | ssl | version |  backend_type  | usename  |  client_addr
------+-----+---------+----------------+----------+---------------
 5557 | t   | TLSv1.2 | walsender      | postgres | 10.124.33.113
 5549 | f   |         | client backend | postgres | 10.124.33.132
 5556 | f   |         | client backend | postgres | 10.124.33.113
(3 rows)

PID 5549 is our own connection, so that’s an easy fix:

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           5549
(1 row)

Connection from 5556 would be the remaining one for us to check if we need to enforce SSL on all. On the client-side, we can use \conninfo to check information on our current connection:

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "10.124.33.132" at port "5432".
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)

Disabling SSL/TLS

If you want to disable SSL instead, be sure to not lose the client connection after you set ssl=off and make changes to the pg_hba.conf file, otherwise you may be locked out if you don’t have any accounts using “host” only access method, and your only way out is to restart the service. To be safe, first, edit and reload pg_hba.conf file to include entries with “host”, and only then fully disable SSL (ssl=off).

Conclusion

Enabling SSL/TLS for in-transit connection encryption is easy, but there are some pitfalls to be aware of when it comes to enforcing its usage. Simply enabling the configuration for it is not enough for it to be enforced, even if by default some connections may prefer using SSL when it’s available. If you need to ensure that all connections use SSL, edit the pg_hba.conf file accordingly and make sure it’s loaded. Remember that “hostssl” entries are the ones that force this behavior.

We can use tcpdump and wireshark to check if connections are indeed being encrypted. But, that’s a topic for another blog…

↧

Egor Rogov: PostgreSQL 14 Internals

July 11, 2022, 5:00 pm

≫ Next: Hubert 'depesz' Lubaczewski: Better scrolling of long explain plans on explain.depesz.com

≪ Previous: Agustin Gallego: Enabling and Enforcing SSL/TLS for PostgreSQL Connections

This is a short note to inform that we are translating my book “PostgreSQL 14 Internals,” recently published in Russian. Liudmila Mantrova, who helped me a lot with the editing of the original, is working on the translation, too. My gratitude to her is beyond words.

The book is largely based on the articles I’ve published here and training courses my colleagues and I are developing. It dives deep into the problems of data consistency and isolation, explaining implementation details of multiversion concurrency control and snapshot isolation, buffer cache and write-ahead log, and then moves on to the twists and turns of the locking system. The book also covers the questions of planning and executing SQL queries, including the discussion of data access and join methods, statistics, and various index types.

The book is freely available in PDF. The translation is in progress, and for now only Part I of the book is ready. Other parts will follow soon, so stay tuned!

↧

Hubert 'depesz' Lubaczewski: Better scrolling of long explain plans on explain.depesz.com

July 12, 2022, 4:52 am

≫ Next: Luca Ferrari: PostgreSQL 15: changes in the low level backup functions

≪ Previous: Egor Rogov: PostgreSQL 14 Internals

Thanks to some discussion on Slack, and work by Alexandre Felipe viewing of large explains will be now a bit easier. You can see it, for example, in this explain. First of all, when you scroll down, the column headers stay in place. Plus – you can always see the horizontal scrollbar to see the … Continue reading

↧

Luca Ferrari: PostgreSQL 15: changes in the low level backup functions

July 12, 2022, 5:00 pm

≫ Next: Robert Bernier: Introducing PostgreSQL 15: Working with DISTINCT

≪ Previous: Hubert 'depesz' Lubaczewski: Better scrolling of long explain plans on explain.depesz.com

The upcoming new release of PostgreSQL does some changes to low level backup functions.

PostgreSQL 15: changes in the low level backup functions

The upcoming PostgreSQL 15 release implements a few changes into the /low level/ backup functions.
Nowdays I suspect nobody, except backup solution developers, know or use such functions, but I clearly remember when we developed our own scripts to do a continuos backup using functions like pg_start_backup() and pg_stop_backup().
You should use other backup solutions today, like the great pgBackRest.
In any case, what are the changes?
As you can read from [the release notes](https://www.postgresql.org/docs/15/release-15.html{:target=”_blank”} there two mainly:

functions have been renamed to a more consistent naming scheme.
a few functions and modes have been removed.

The functions are now named as pg_backup_, so pg_start_backup() becomes pg_backup_start(), and similarly, pg_stop_backup() becomes pg_backup_stop(). Quite frankly I like this decision, it makes the naming simpler to search for and to remember.
Moreover, there is no more the presence of deprecated (since version 9.6, if I remember correctly), the exclusive backup mode. This was the only way to perform a low level backup back in the days, but since a lot it has been deprecated. One of the problems with exclusive backups is that the system will create a label file that prevents the primary to restart after a crash, and in turn this led people to delete the label file also on standby servers. Now this is no more a problem, and the pg_backup_start() and pg_backup_stop() functions do not handle anymore the exclusive backup parameter.
As a consequence of this choice, the functions pg_is_in_backup() and pg_backup_start_time() have been removed because they were focused only on exclusive backups, that do not exist anymore.

↧

Robert Bernier: Introducing PostgreSQL 15: Working with DISTINCT

July 13, 2022, 5:20 am

≫ Next: Ryan Booz: State of PostgreSQL: How to Contribute to PostgreSQL and the Community

≪ Previous: Luca Ferrari: PostgreSQL 15: changes in the low level backup functions

Introducing PostgreSQL 15: Working with DISTINCT

Well, it’s that time of the year when once again we have a look at the newest version of PostgreSQL.

As tradition dictates, here at Percona, the team is given a list of features to write about. Mine happened to be about a very basic and, I might add, important function i.e. SELECT DISTINCT.

Before getting into the details I’d like to mention a couple of caveats regarding how the results were derived for this blog:

The tables are pretty small and of a simple architecture.
Because this demonstration was performed upon a relatively low-powered system, the real metrics have the potential of being significantly greater than what is demonstrated.

For those new to postgres, and the ANSI SQL standard for that matter, the SELECT DISTINCT statement eliminates duplicate rows from the result by matching specified expressions.

For example, given the following table:

table t_ex;
 c1 | c2
----+----
  2 | B
  4 | C
  6 | A
  2 | C
  4 | B
  6 | B
  2 | A
  4 | B
  6 | C
  2 | C

This SQL statement returns those records filtering out the UNIQUE values found in column “c1” in SORTED order:

select distinct on(c1) * from t_ex;

Notice, as indicated by column “c2”, that c1 uniqueness returns the first value found in the table:

c1 | c2
----+----
  2 | B
  4 | B
  6 | B

This SQL statement returns those records filtering out UNIQUE values found in column “c2”

select distinct on(c2) * from t_ex;

c1 | c2
----+----
  6 | A
  2 | B
  4 | C

And finally, of course, returning uniqueness for the entire row:

select distinct * from t_ex;

c1 | c2
----+----
  2 | A
  6 | B
  4 | C
  2 | B
  6 | A
  2 | C
  4 | B
  6 | C

So what’s this special new enhancement of DISTINCT you ask? The answer is that it’s been parallelized!

In the past, only a single CPU/process was used to count the number of distinct records. However, in postgres version 15 one can now break up the task of counting by running multiple numbers of workers in parallel each assigned to a separate CPU process. There are a number of runtime parameters controlling this behavior but the one we’ll focus on is max_parallel_workers_per_gather.

Let’s generate some metrics!

In order to demonstrate this improved performance three tables were created, without indexes, and populated with approximately 5,000,000 records. Notice the number of columns for each table i.e. one, five, and 10 respectively:

Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 c1     | integer |           |          |

Table "public.t5"
 Column |         Type          | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
 c1     | integer               |           |          |
 c2     | integer               |           |          |
 c3     | integer               |           |          |
 c4     | integer               |           |          |
 c5     | character varying(40) |           |          |

Table "public.t10"
 Column |         Type          | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
 c1     | integer               |           |          |
 c2     | integer               |           |          |
 c3     | integer               |           |          |
 c4     | integer               |           |          |
 c5     | character varying(40) |           |          |
 c6     | integer               |           |          |
 c7     | integer               |           |          |
 c8     | integer               |           |          |
 c9     | integer               |           |          |
 c10    | integer               |           |          |

insert into t1 select generate_series(1,500);

insert into t5
select   generate_series(1,500)
        ,generate_series(500,1000)
        ,generate_series(1000,1500)
        ,(random()*100)::int
        ,'aofjaofjwaoeev$#^&ETHE#@#Fasrhk!!@%Q@';

insert into t10
select   generate_series(1,500)
        ,generate_series(500,1000)
        ,generate_series(1000,1500)
        ,(random()*100)::int
        ,'aofjaofjwaoeev$#^&ETHE#@#Fasrhk!!@%Q@'
        ,generate_series(1500,2000)
        ,generate_series(2500,3000)
        ,generate_series(3000,3500)
        ,generate_series(3500,4000)
        ,generate_series(4000,4500);

List of relations
 Schema | Name | Type  |  Owner   | Persistence | Access method |  Size  |
--------+------+-------+----------+-------------+---------------+--------+
 public | t1   | table | postgres | permanent   | heap          | 173 MB |
 public | t10  | table | postgres | permanent   | heap          | 522 MB |
 public | t5   | table | postgres | permanent   | heap          | 404 MB |

The next step is to copy the aforementioned data dumps into the following versions of postgres:

PG VERSION
    pg96
    pg10
    pg11
    pg12
    pg13
    pg14
    pg15

The postgres binaries were compiled from the source and data clusters were created on the same low-powered hardware using the default, and untuned, runtime configuration values.

Once populated, the following bash script was executed to generate the results:

#!/bin/bash
for v in 96 10 11 12 13 14 15
do
    # run the explain analzye 5X in order to derive consistent numbers
    for u in $(seq 1 5)
    do
        echo "--- explain analyze: pg${v}, ${u}X ---"
        psql -p 100$v db01 -c "explain analyze select distinct on (c1) * from t1" > t1.pg$v.explain.txt
        psql -p 100$v db01 -c "explain analyze select distinct * from t5" > t5.pg$v.explain.txt
        psql -p 100$v db01 -c "explain analyze select distinct * from t10" > t10.pg$v.explain.txt
    done
done

And here are the results: One can see that the larger the tables become the greater the performance gains that can be achieved.

PG VERSION	1 column (t1), ms	5 column (t5), ms	10 column (t10), ms
pg96	3,382	9,743	20,026
pg10	2,004	5,746	13,241
pg11	1,932	6,062	14,295
pg12	1,876	5,832	13,214
pg13	1,973	2,358	3,135
pg14	1,948	2,316	2,909
pg15	1,439	1,025	1,245

QUERY PLAN

One of the more interesting aspects of the investigation was reviewing the query plans between the different versions of postgres. For example, the query plan for a single column DISTINCT was actually quite similar, ignoring the superior execution time of course, between the postgres 9.6 and 15 plans respectively.

PG96 QUERY PLAN, TABLE T1
-------------------------------------------------------------------------------
 Unique  (cost=765185.42..790185.42 rows=500 width=4) (actual time=2456.805..3381.230 rows=500 loops=1)
   ->  Sort  (cost=765185.42..777685.42 rows=5000000 width=4) (actual time=2456.804..3163.600 rows=5000000 loops=1)
         Sort Key: c1
         Sort Method: external merge  Disk: 68432kB
         ->  Seq Scan on t1  (cost=0.00..72124.00 rows=5000000 width=4) (actual time=0.055..291.523 rows=5000000 loops=1)
 Planning time: 0.161 ms
 Execution time: 3381.662 ms

PG15 QUERY PLAN, TABLE T1
---------------------------------------------------------------------------
 Unique  (cost=557992.61..582992.61 rows=500 width=4) (actual time=946.556..1411.421 rows=500 loops=1)
   ->  Sort  (cost=557992.61..570492.61 rows=5000000 width=4) (actual time=946.554..1223.289 rows=5000000 loops=1)
         Sort Key: c1
         Sort Method: external merge  Disk: 58720kB
         ->  Seq Scan on t1  (cost=0.00..72124.00 rows=5000000 width=4) (actual time=0.038..259.329 rows=5000000 loops=1)
 Planning Time: 0.229 ms
 JIT:
   Functions: 1
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.150 ms, Inlining 31.332 ms, Optimization 6.746 ms, Emission 6.847 ms, Total 45.074 ms
 Execution Time: 1438.683 ms

The real difference showed up when the number of DISTINCT columns were increased, as demonstrated by querying table t10. One can see parallelization in action!

PG96 QUERY PLAN, TABLE T10
-------------------------------------------------------------------------------------------
 Unique  (cost=1119650.30..1257425.30 rows=501000 width=73) (actual time=14257.801..20024.271 rows=50601 loops=1)
   ->  Sort  (cost=1119650.30..1132175.30 rows=5010000 width=73) (actual time=14257.800..19118.145 rows=5010000 loops=1)
         Sort Key: c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
         Sort Method: external merge  Disk: 421232kB
         ->  Seq Scan on t10  (cost=0.00..116900.00 rows=5010000 width=73) (actual time=0.073..419.701 rows=5010000 loops=1)
 Planning time: 0.352 ms
 Execution time: 20025.956 ms

PG15 QUERY PLAN, TABLE T10
------------------------------------------------------------------------------------------- HashAggregate  (cost=699692.77..730144.18 rows=501000 width=73) (actual time=1212.779..1232.667 rows=50601 loops=1)
   Group Key: c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
   Planned Partitions: 16  Batches: 17  Memory Usage: 8373kB  Disk Usage: 2976kB
   ->  Gather  (cost=394624.22..552837.15 rows=1002000 width=73) (actual time=1071.280..1141.814 rows=151803 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  HashAggregate  (cost=393624.22..451637.15 rows=501000 width=73) (actual time=1064.261..1122.628 rows=50601 loops=3)
               Group Key: c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
               Planned Partitions: 16  Batches: 17  Memory Usage: 8373kB  Disk Usage: 15176kB
               Worker 0:  Batches: 17  Memory Usage: 8373kB  Disk Usage: 18464kB
               Worker 1:  Batches: 17  Memory Usage: 8373kB  Disk Usage: 19464kB
               ->  Parallel Seq Scan on t10  (cost=0.00..87675.00 rows=2087500 width=73) (actual time=0.072..159.083 rows=1670000 loops=3)
 Planning Time: 0.286 ms
 JIT:
   Functions: 31
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 3.510 ms, Inlining 123.698 ms, Optimization 200.805 ms, Emission 149.608 ms, Total 477.621 ms
 Execution Time: 1244.556 ms

INCREASING THE PERFORMANCE: Performance enhancements were made by updating the postgres runtime parameter max_parallel_workers_per_gather. The default value in a newly initialized cluster is 2. As the table below indicates, it quickly became an issue of diminishing returns due to the restricted capabilities of the testing hardware itself.

POSTGRES VERSION 15

max_parallel_workers_per_gather	1 column (t1)	5 column (t5)	10 column (t10)
2	1,439	1,025	1,245
3	1,464	875	1,013
4	1,391	858	977
6	1,401	846	1,045
8	1,428	856	993

PostgreSQL Distinct

ABOUT INDEXES: Performance improvements were not realized when indexes were applied as demonstrated in this query plan.

PG15, TABLE T10(10 DISTINCT columns), and max_parallel_workers_per_gather=4:

QUERY PLAN                                                                                 
-----------------------------------------------------------------------------------
 Unique  (cost=0.43..251344.40 rows=501000 width=73) (actual time=0.060..1240.729 rows=50601 loops=1)
   ->  Index Only Scan using t10_c1_c2_c3_c4_c5_c6_c7_c8_c9_c10_idx on t10  (cost=0.43..126094.40 rows=5010000 width=73) (actual time=0.058..710.780 rows=5010000 loops=1)
         Heap Fetches: 582675
 Planning Time: 0.596 ms
 JIT:
   Functions: 1
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.262 ms, Inlining 0.000 ms, Optimization 0.122 ms, Emission 2.295 ms, Total 2.679 ms
 Execution Time: 1249.391 ms

CONCLUDING THOUGHTS: Running DISTINCT across multiple CPUs is a big advance in performance capabilities. But keep in mind the risk of diminishing performance as you increase the number of max_parallel_workers_per_gather and you approach your hardware’s limitations. And as the investigation showed, under normal circumstances, the query planner might decide to use indexes instead of running parallel workers. One way to get around this is to consider disabling runtime parameters such as enable_indexonlyscan and enable_indexscan. Finally, don’t forget to run EXPLAIN ANALYZE in order to understand what’s going on.

↧

Ryan Booz: State of PostgreSQL: How to Contribute to PostgreSQL and the Community

July 13, 2022, 6:55 am

≫ Next: Paul Ramsey: Postgres Indexing: When Does BRIN Win?

≪ Previous: Robert Bernier: Introducing PostgreSQL 15: Working with DISTINCT

The State of PostgreSQL survey was recently closed with overwhelming community response and excitement. We continue to dig into the results and prepare the data to make it easy to share with everyone in the community at the end of July.

One observation that stands out in the results, particularly with the solid growth of PostgreSQL adoption, is how few users contribute to the project and community—only 12 % of users with 15 or fewer years of experience have contributed at least once. Add to that the multiple comments each year about how difficult and (seemingly) underappreciated some contributions are, and it feels like this topic should be at the top of our discussion list.

To be clear, it's evident that the community is engaged, excited, and supportive. But at both ends of the "how long have you been using PostgreSQL" spectrum, there seems to be agreement that more involvement would be good and that making it easier for users to get involved should be a priority.

Let's take a closer look.

Have You Ever Contributed to PostgreSQL?

This year's survey had nearly 1,000 respondents, more than twice as many as last year. While there was an increase in users that have contributed to PostgreSQL in some way (17 % vs. 15 % in 2021), it strikes me that this nearly mirrors the 80/20 rule, otherwise known as the Pareto Principle. Taken at face value, it appears that approximately 17 % of the community is responsible for creating most of the value that the community receives.

But that's only part of the story.

As we said above, it's evident that code contribution and figuring out ways to increase community involvement in the project is something many people think about. What surprised me even more is how years of PostgreSQL experience and level of project contribution influenced user comments in this area.

When a user has between 0-5 years of experience using PostgreSQL, the focus around community contribution is primarily on better tooling and examples for helping to build the community, things that are often non-code in nature:

Once users have 6-15 years of experience, the focus begins to change. It appears that this group of users, having invested a significant amount of time using PostgreSQL, wants a better view of the project roadmap with a common theme around making contributions more straightforward and accessible.

And finally, once a user has invested at least 15 years into learning and using PostgreSQL, it becomes more evident that the project can only be sustained through more people getting involved. There's a genuine sense that growing the base of contributors is better for everyone.

The arc of this journey isn't surprising. The newer someone is to the project, the more they're looking for non-technical improvements that would help them succeed. Users with solid experience begin to look for ways to contribute but still find it challenging to do that. By the time someone has been involved with PostgreSQL for 15 years, they know that the project's success is dependent on bringing in new contributors and that there are still some significant barriers for most users.

Do any of these comments resonate with you? When you look at the comments from your peer group (based on your years of experience with PostgreSQL), do you have similar thoughts about supporting the project and community? Does it feel like now might be the right time to start finding ways that you could contribute to the project or community?

Me too! So let's talk about some ways we could all take the next step.

The First Steps Towards Contributing to PostgreSQL

As it turns out, we had a similar discussion last year based on the survey feedback. Aleksander Alekseev (Timescaler and PostgreSQL contributor) wrote a 💯 article on how to contribute to PostgreSQL (or any open-source project). I'd encourage you to read the article in full, but Aleksander made a few key points last year, which I think are worth repeating in light of the 2022 survey responses.

Identify your motivation

Regardless of the total PostgreSQL experience someone has, it seems clear that community members want to contribute more and see the overall community experience continue to improve. By taking the next step in one of the areas mentioned in the survey, you need to know your end goal because it will influence both the experience you need and the benefit you and others will gain.

Aleksander identified a few principal reasons people often contribute to a project like PostgreSQL:

To gain a unique experience
To learn about the technical internals of a project
To work with great people
To make users happier
To boost your resume

When I look at the list of comments that users left, I see many areas that almost anyone could contribute to, both coding and non-coding. Knowing how you can effectively contribute and why you're motivated to do so will go a long way in helping you identify just the right place to start.

Consider contributing beyond code

What I noticed most in the comments users provided about improving the community is that most of the need seems to be in the non-code area. While some comments are directly related to code contributions, many more deal with educational content, improving the documentation with better examples, and providing more options for getting community help.

Stated plainly, PostgreSQL and every other open-source project succeed when there are technical contributions and a host of supporters.

Improving the documentation is mentioned often. Providing better examples around mid-level features shows up a few times. Advanced topics like replication and partitioning are common areas for people to seek help.

These helpful resources could happen within the official documentation (especially improved query examples!) or through personal blogs, conference talks (check out the upcoming events for call for papers and submit your talk proposal), or videos.

Speaking of conferences and meetups, we can all acknowledge that the last two years have transformed our expectations around events and community. There are so many more opportunities for everyone that wants to contribute, whether locally or across the globe. For instance, although PGConf NYC is in New York City and coordinated by the North American PostgreSQL group, there is some help and collaboration from organizers and volunteers in Europe.

There are always user groups that you can attend or offer to present at, most of which still have a virtual option available! You could get involved by offering to translate documentation or the Code of Conduct. And finally, you can join the PostgreSQL Slack, Discord, or novice email list to offer your support and expertise to other users within the community. There are so many great opportunities to help build PostgreSQL at many levels!

To modify the official documentation, you'll have to check out the main repository and learn about writing in DocBook (XML) format. The one caveat for contributing to the official documentation is that it is part of the main Git repository and subject to the same process (discussed below) as regular source contributions.

While this is a little more effort than a typical pull request workflow, submitting documentation contributions can be an excellent way to start with small patches so you can become familiar with the workflow.

Speaking of workflow and patches in the main repository (both code and documentation), let's briefly discuss how you can get started.

Contributing with code

Maybe you look at the comments above, particularly if you've had 6-10 years of PostgreSQL experience, and you're beginning to feel like now is the time to start contributing to the core code project.

Despite clear attempts to clarify the process by the team of users contributing to core PostgreSQL development, I see from these comments that many users still don't understand how code contribution works and what a Commitfest entails. (Hint, hint: I'm guessing that a clear, organized blog, conference talk, or YouTube video on the process would be an instant hit!)

A Commitfest is a month-long iteration during which new patches are accepted. When a Commitfest is in progress, contributors won’t accept new patches, instead adding them to the following event. There are five Commitfests every year: January, March, July, September, and November.

At a high level, to contribute a patch to the PostgreSQL project, you'll need to join the pgsql-hackers@ mailing list and get familiar with the Commitfest tracking website. Additionally, the Commitfest Wiki page provides a checklist of how the process works from beginning to end and what the Commitfest manager is responsible for doing.

Once you're engaged on the mailing list and understand when the next Commitfest will start, you can work through the pgsql-hackers list to submit the patch for inclusion in the next Commitfest. After the patch is accepted and added to a Commitfest, the real work begins—and all feedback happens within the mailing list.

At the start of a Commitfest, proposed patches typically have the following lifecycle:

Knowing the potential flow of a patch through the Commitfest can help you understand a particular focus for a pending update or upcoming major release. Over the last few major releases, there might be many smaller changes (some of which have a huge impact), but there tend to be larger themes of functionality that define a specific release of PostgreSQL. I point this out because even helping to clarify or blog about upcoming topics in PostgreSQL releases can be tremendously helpful to the overall community.

In short, whether you can contribute patches or not, learning to follow the Commitfest process and proposed patches is a great way to get more connected to the project.

How Will You Contribute to the Community Next?

The State of PostgreSQL 2022 survey results once again provided valuable insights into how the community uses PostgreSQL, community strengths, and some areas that many think could improve. One of those areas continues to be around community support and contributions.

Hopefully, just seeing some of the comments that community members provided regarding community and growth will encourage more of us to jump in with contributions. PostgreSQL will get better, and the community will get stronger.

Read the Report

Now that we’ve given you a taste of our survey results, are you curious to learn more about the PostgreSQL community? If you’d like to know more insights about the State of PostgreSQL 2022, including why respondents chose PostgreSQL, their opinion on industry events, and what information sources they would recommend to friends and colleagues, don’t miss our complete report. Click here to get notified and learn firsthand what the State of PostgreSQL is in 2022.

↧

Paul Ramsey: Postgres Indexing: When Does BRIN Win?

July 13, 2022, 8:00 am

≫ Next: Regina Obe: PostGIS 3.3.0beta2

≪ Previous: Ryan Booz: State of PostgreSQL: How to Contribute to PostgreSQL and the Community

<p>The PostgreSQL BRIN index is a specialized index for (as the documentation says) "handling very large tables in which certain columns have some natural correlation with their physical location within the table".</p> <p>For data of that sort, BRIN indexes offer extremely low insert costs (good for high velocity data) and extremely small index sizes (good for high volume data).</p> <p>But what data has this "natural correlation"?</p> <p>Most frequently, data with a timestamp that is continuously adding new rows.</p> <ul> <li>A log table</li> <li>A table of GPS track points</li> <li>A table of IoT sensor measurements</li> </ul> <p>In these examples the timestamp will be either the insertion time or the measurement time, and the stream of new data will be appended to the table more-or-less in the same order as the timestamp column values.</p> <p>So that's a pretty narrow subset of the kinds of data you might manage. However, if you <em>do</em> have that kind of data, BRIN indexes might be helpful.</p> <h2>Under the Covers</h2> <p>Because the BRIN index is so simple, it's possible to describe the internals with barely any simplifying.</p> <p>Data in PostgreSQL tables are <strong>arranged on disk in equal-sized "pages"</strong> of 8kb each. So a table will physically reside on disk as a collection of pages. Within each page, rows are packed in from the front, with gaps appearing as data is deleted/updated and usually some spare space at the end for future updates.</p> <p><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/30d69d2a-0a61-412c-0dce-20dcd6257f00/public" alt="Table Page" loading="lazy"></p> <p>A table with narrow rows (few columns, small values) will fit a lot of rows into a page. A table with wide rows (more columns, long strings) will fit only a few.</p> <p>Because each page holds multiple rows, we can state that a given column in that page has a minimum and maximum value in that page. When searching for a particular value, the whole page can be skipped, if the value is not within the min/max of the page. This is the core <strong>magic of BRIN</strong>.</p> <p>So, for BRIN to be effective, you need a table where the <strong>physical layout</strong> and the <strong>ordering of the column of interest</strong> are <strong>strongly correlated</strong>. In situation of perfect correlation (which we test below) each page will in fact contain a completely unique set of of values.</p> <p><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/0e7066aa-f5e3-45e9-1eba-c0457d8dbb00/public" alt="Correlated Table" loading="lazy"></p> <p>The BRIN index is a small table that associates a range of values with a range of pages in the table order. Building the index just requires a single scan of the table, so compared to building a structure like a BTree, it is very fast.</p> <p><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/138f0faa-91a9-4a9c-ea57-c332a602ad00/public" alt="Index Structure" loading="lazy"></p> <p>Because the BRIN has one entry for each range of pages, it's also very small. The number of pages in a range is configurable, but the default is 128. As we will see, tuning this number can make a <strong>big difference</strong> in query performance.</p> <h2>Measuring the Differences</h2> <p>For testing, we generate a table with three columns: one key completely uncorrelated to the storage on disk ("random"), one key perfectly correlated ("sequential"), and a "value" column to retrieve.</p> <pre><code class="language-sql">CREATE TABLE test AS SELECT 1000000.0*random() AS random, a::float8 AS sequential, 1.0 AS value FROM generate_series(1, 1000000) a; ANALYZE test; </code></pre> <p>Both keys are in the range of zero to one million, so range queries on them will have similar numbers of return values. Now we measure some baseline timings with <strong>larger and larger</strong> result sets.</p> <pre><code class="language-sql">EXPLAIN ANALYZE SELECT Sum(value) FROM test WHERE random between 0.0 and 100.0; EXPLAIN ANALYZE SELECT Sum(value) FROM test WHERE sequential between 0.0 and 100.0; </code></pre> <p>This is the summary query that all the timings below use. It sums the <code>value</code> column based on a filter of either the <code>random</code> or <code>sequential</code> keys.</p> <p>(It is worth noting as a side bar that in the case of summarizing the indexed column, the btree has the advantage of being able to use an index-only scan. To do an apples-to-apples comparison, we've avoided that here by summarizing a separate "value" column.)</p> <p>For the first test, since there are no indexes yet, the system has to scan the whole table every time, so the only change is it takes <strong>slightly longer</strong> to sum all the values as the result set gets larger.</p> <table> <thead> <tr> <th>Rows</th> <th>Filter Rand</th> <th>Filter Seq</th> </tr> </thead> <tbody> <tr> <td>100</td> <td>220 ms</td> <td>218 ms</td> </tr> <tr> <td>1000</td> <td>230 ms</td> <td>224 ms</td> </tr> <tr> <td>10000</td> <td>250 ms</td> <td>249 ms</td> </tr> <tr> <td>100000</td> <td>262 ms</td> <td>264 ms</td> </tr> </tbody> </table> <p>Let's build indexes now.</p> <pre><code class="language-sql">CREATE INDEX btree_random_x ON test (random); CREATE INDEX btree_sequential_x ON test (sequential); CREATE INDEX brin_random_x ON test USING BRIN (random); CREATE INDEX brin_sequential_x ON test USING BRIN (sequential); </code></pre> <p>Note the huge size differences among the indexes!</p> <pre><code class="language-sql">SELECT pg_size_pretty(pg_relation_size('test')) AS table_size, pg_size_pretty(pg_relation_size('btree_random_x')) AS btree_random_size, pg_size_pretty(pg_relation_size('brin_random_x')) AS brin_random_size, pg_size_pretty(pg_relation_size('btree_sequential_x')) AS btree_sequential_size, pg_size_pretty(pg_relation_size('brin_sequential_x')) AS brin_sequential_size; </code></pre> <p>The BTree indexes end up very close to the table size. The BRIN indexes are <strong>1000 times smaller</strong>. This is with the default <code>pages_per_range</code> of <strong>128</strong> -- smaller values of <code>pages_per_range</code> will result in slightly larger (but still very small!) indexes.</p> <table> <thead> <tr> <th>Relation</th> <th>Size</th> </tr> </thead> <tbody> <tr> <td>table_size</td> <td>42 MB</td> </tr> <tr> <td>btree_random_size</td> <td>21 MB</td> </tr> <tr> <td>brin_random_size</td> <td>24 kB</td> </tr> <tr> <td>btree_sequential_size</td> <td>21 MB</td> </tr> <tr> <td>brin_sequential_size</td> <td>24 kB</td> </tr> </tbody> </table> <p>Now, we drop all the indexes, and then re-create them one at a time, testing each for larger and larger result sets.</p> <table> <thead> <tr> <th>Rows</th> <th>BTree Rand</th> <th>BTree Seq</th> <th>BRIN Rand</th> <th>BRIN Seq</th> </tr> </thead> <tbody> <tr> <td>100</td> <td>0.6 ms</td> <td>0.5 ms</td> <td>211 ms</td> <td>11 ms</td> </tr> <tr> <td>1000</td> <td>5 ms</td> <td>2 ms</td> <td>207 ms</td> <td>10 ms</td> </tr> <tr> <td>10000</td> <td>22 ms</td> <td>13 ms</td> <td>221 ms</td> <td>15 ms</td> </tr> <tr> <td>100000</td> <td>98 ms</td> <td>85 ms</td> <td>250 ms</td> <td>67 ms</td> </tr> </tbody> </table> <p>First, note that as expected the BRIN index is completely useless when filtering the random key. The order of the data on disk is uncorrelated with the order of the key, so the BRIN index is no better than a sequence scan.</p> <p>Second, note that for small result sets, the BTree outperforms BRIN, but as the result sets get larger, the BRIN index actually pulls ahead. For even larger result sets (1M records of a 10M or 100M record table) the BRIN advantage only grows.</p> <p>Even for small result sets, remember that the BRIN index is only taking up 0.1% the memory space of the BTree and the index update cost for new rows is also much lower.</p> <h2>Tuning Parameters</h2> <p>This test was with the default <code>pages_per_range</code> of <strong>128</strong>. Tuning this parameter to the general query filter width can result in quite different performance results.</p> <pre><code class="language-sql">DROP INDEX brin_sequential_x; CREATE INDEX brin_sequential_x ON test USING BRIN (sequential) WITH (pages_per_range=64); </code></pre> <table> <thead> <tr> <th>Rows</th> <th>PPR=128</th> <th>PPR=64</th> <th>PPR=32</th> <th>PPR=16</th> <th>PPR=8</th> <th>PPR=4</th> </tr> </thead> <tbody> <tr> <td>100</td> <td>11 ms</td> <td>7 ms</td> <td>4 ms</td> <td>3 ms</td> <td>2.5 ms</td> <td>3.5 ms</td> </tr> <tr> <td>1000</td> <td>12 ms</td> <td>8 ms</td> <td>5 ms</td> <td>4 ms</td> <td>3.5 ms</td> <td>4.5 ms</td> </tr> <tr> <td>10000</td> <td>13 ms</td> <td>12 ms</td> <td>11 ms</td> <td>11 ms</td> <td>12 ms</td> <td>13 ms</td> </tr> <tr> <td>100000</td> <td>67 ms</td> <td>68 ms</td> <td>67 ms</td> <td>68 ms</td> <td>69 ms</td> <td>69 ms</td> </tr> </tbody> </table> <p>The variable performances are a interplay of how many rows fit in each page of the table and how many pages the query filter needs to read to fulfill the query.</p> <p>Using the <a href="https://www.postgresql.org/docs/current/pgstattuple.html">stattuple</a> extension, we can get the number of tuples that fit in each page.</p> <pre><code class="language-sql">SELECT 1000000 / pg_relpages('test'); -- 156 </code></pre> <p>For the narrow, 100-row queries the larger <code>pages_per_range</code> values meant that a lot of pages didn't have relevant data, so reading and filtering it was pure overhead, like a sequence scan. For the smaller <code>pages_per_range</code> values, most of the values in any pulled page would be relevant, hence the faster queries, particularly for narrow filters.</p> <h2>Conclusion</h2> <ul> <li>The BRIN index can be a useful alternative to the BTree, for specific cases: <ul> <li>For tables with an "insert only" data pattern, and a correlated column (like a timestamp)</li> <li>For use cases with very large tables (too large for btree) or very high velocity (expensive to update) or both</li> <li>For queries that pull quite large (100K+) result sets from tables</li> </ul> </li> <li>The low cost to build, and low memory foot print make a BRIN index an option worth looking at, particularly for larger tabes with correlated column data</li> </ul>

↧

Regina Obe: PostGIS 3.3.0beta2

July 12, 2022, 5:00 pm

≫ Next: Nikolay Sivko: Pg-agent – a Postgres exporter for Prometheus focusing on query performance statistics

≪ Previous: Paul Ramsey: Postgres Indexing: When Does BRIN Win?

The PostGIS Team is pleased to release PostGIS 3.3.0beta2! Best Served with PostgreSQL 15 Beta2 and GEOS 3.11.0 and SFCGAL 1.4.1

Lower versions of the aforementioned dependencies will not have all new features.

This release supports PostgreSQL 11-15.

3.3.0beta2

This release is a beta of a major release, it includes bug fixes since PostGIS 3.2.1 and new features.

↧

Nikolay Sivko: Pg-agent – a Postgres exporter for Prometheus focusing on query performance statistics

July 14, 2022, 4:00 am

≫ Next: Frits Hoogland: Postgres and the curious case of the cursory cursors

≪ Previous: Regina Obe: PostGIS 3.3.0beta2

Why we built another Postgres exporter for Prometheus

↧

Frits Hoogland: Postgres and the curious case of the cursory cursors

July 14, 2022, 9:31 am

≫ Next: Frits Hoogland: Postgres and the curious case of the cursory cursors

≪ Previous: Nikolay Sivko: Pg-agent – a Postgres exporter for Prometheus focusing on query performance statistics

Recently I was asked to look into a case where an identical SQL executed on the same database in two different ways resulted in two quite different latencies: 133ms versus 750ms.

One was acceptable for the client, and the other one wasn't.

How can the same SQL yield two such different latencies? First let me explain the setting: the first latency figure, the fast one, was a plain SQL. The second one was the exact same SQL, however it was inside a PLpgSQL procedure, and using a refcursor. Both returned around 60.000 rows. The SQL statement in both cases did not have any variables, they were completely identical.

For the plain SQL, it's easy to diagnose the execution plan and the time it spent: just put explain analyze in front of the SQL, execute it, and look at the statistics. However, for the SQL executed in PLpgSQL procedure, it is not possible to get the execution plan from inside the PLpgSQL procedure. The general way to deal with this is to use the plain SQL approach, by copying the SQL from the procedure and explain it. And that was the problem: once executed as plain SQL it exposed an acceptable latency, only when executed as PLpgSQL it exposed a different, not acceptable latency.

So, what can you do about it? It took me a few stabs at it before I found the issue.

We had an issue created for this, and a colleague found that if the order by from the statement was removed, it would function correct again.

I took an approach quite native to me, and ran linux perf to record execution stacks with call-graphs (perf record -g -p <PID>) to see what it was doing.

Many of the principal parts of an execution plan can be seen back in the C functions. One way of doing that is to look in the perf output (using perf report) for the function ExecScan, and look at what that function calls. This is the execution plan:

Sort  (cost=14603.24..14764.65 rows=64562 width=20)
  Sort Key: tile_row, tile_col, block_row, block_col
  ->  Bitmap Heap Scan on block_tbl b  (cost=2139.33..9445.26 rows=64562 width=20)
        Recheck Cond: ((world_key = 19301) AND (perimeter_num = 8007) AND (updated_ts = '1655419238118'::bigint))
        Filter: ((updated_user_key = 0) AND (frozen_flag = 0))
        ->  Bitmap Index Scan on block_perimeter  (cost=0.00..2123.18 rows=65108 width=0)
              Index Cond: ((world_key = 19301) AND (perimeter_num = 8007) AND (updated_ts = '1655419238118'::bigint))

And if I allocate ExecScan, and look at it's surrounding calls, it looks like this:

- 27.44% ExecSort
   - 12.93% tuplesort_gettupleslot
      - 9.15% tuplesort_gettuple_common
         - 4.42% mergereadnext
            - 2.84% readtup_heap
                 1.58% __mcount_internal
            - 0.95% getlen
                 0.63% __mcount_internal
              0.63% __mcount_internal
         - 3.47% tuplesort_heap_replace_top
            - 2.84% comparetup_heap
                 1.58% __mcount_internal
           1.26% __mcount_internal
        2.52% __mcount_internal
        0.95% _mcount
   - 6.31% ExecScan
      - 5.36% BitmapHeapNext
         - 3.15% MultiExecBitmapIndexScan
              index_getbitmap
            - btgetbitmap
               - 2.21% _bt_next
                    _bt_steppage
                    _bt_readnextpage
                  - _bt_readpage
                     - _bt_checkkeys
                          0.63% FunctionCall2Coll
               - 0.95% tbm_add_tuples
                    0.63% tbm_get_pageentry
           1.26% tbm_iterate
           0.63% heap_hot_search_buffer
      - 0.63% ExecInterpExpr
           slot_getsomeattrs
           slot_deform_tuple

There we got the sort first (ExecSort) and the scan second (ExecScan), which performs a function called MultiExecBitmapIndexScan. I can see the similarity of the functions and the executionplan.

Okay, now how about the perf report from the procedure? That looks like this:

- 13.55% ExecScan
   - 10.57% IndexNext
      - 9.76% index_getnext
         - 9.49% index_fetch_heap
            - 3.25% ReadBufferExtended
               - ReadBuffer_common
                  - 1.08% BufTableLookup
                       0.54% __mcount_internal
                       0.54% hash_search_with_hash_value
                    0.81% PinBuffer
                    0.54% LWLockAcquire
              1.90% heap_hot_search_buffer
              1.63% heap_page_prune_opt
              0.81% __mcount_internal
            - 0.81% ReleaseAndReadBuffer
                 __mcount_internal
              0.54% _mcount
              0.54% LWLockAcquire
      - 0.81% ExecStoreTuple
           0.54% __mcount_internal
   - 1.63% ExecInterpExpr
        0.54% slot_getsomeattrs

This looks different. It's performing IndexNext instead of BitmapHeapNext, so it's definitely using a different execution plan. Also, I was not able to find the ExecSort function. So the SQL execution is not performing sorting.

At this point I could prove there was another execution plan by looking at the low level execution. My suspicion at this point was that the planner might optimise for PLpgSQL, and leave some work (such as maybe the sorting) to PLpgSQL, which might be why it's gone, and another plan was chosen. But PLpgSQL is not a core component of the postgres code, it's a loadable procedural language, which you can validate by looking at the extensions.

What I tried next was creating the SQL as a function, and then use the function as a table inside the procedure. Functions are known for being able to be a 'barrier' for the planner to optimise SQL, and are treated as a 'black box', meaning it's left untouched. And that actually worked! By creating a function that returns a table, I get near the SQL performance.

However: that would mean a lot of work to revert all the SQLs in the procedures to individual functions, and swap the SQL with that function. And of course it doesn't explain why the procedure did influence the SQL optimization.

Then the client came back to me and said that if he used the refcursor for obtaining one or a small number of rows, it was actually pretty quick...but that that wasn't useful to him, because in the way it's used, it needs all the (60.000) rows.

That got me thinking: wait a minute...what if it isn't the procedure that is influencing this, but the cursor that is created? And a cursor can be explained! Explaining a cursor works in the following way:

postgres=# begin;
BEGIN
postgres=# explain declare tt cursor for select b.tile_row, b.tile_col, b.block_row, b.block_col, b.block_value
postgres-#     from block_tbl b
postgres-#    where b.world_key        = 19301
postgres-#     and b.updated_ts       = 1655419238118
postgres-#     and b.updated_user_key = 0
postgres-#     and b.frozen_flag      = 0
postgres-#     and b.perimeter_num    = 8007
postgres-# order by b.tile_row, b.tile_col, b.block_row, b.block_col;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Index Scan using block_tile on block_tbl b  (cost=0.42..40080.63 rows=64562 width=20)
   Index Cond: ((world_key = 19301) AND (frozen_flag = 0))
   Filter: ((updated_ts = '1655419238118'::bigint) AND (updated_user_key = 0) AND (perimeter_num = 8007))

Bingo! This shows a different execution plan than earlier, which used a sort and bitmap scans.

Okay, this proves another plan is captured based on the SQL used for a cursor. But why is this happening?

It turns out SQL based on cursors is automagically optimised for a percentage of first rows based on the parameter cursor_tuple_fraction, which defaults to 0.1, alias 10%, whilst regular SQL execution is optimised for fetching 100%.

This is visible in the source code in parsenodes.h and planner.c. With cursors, or with the SPI_prepare_cursor function. The CURSOR_OPT_FAST_PLAN flag can be set, which then performs the first row optimization. Sadly, the different optimization goal is not shown, so it can't be seen from the execution plan.

Summary

The postgres planner optimises for 100% of execution. This is different for cursors, which are optimised for getting the first 10% of rows, which can lead to different execution plans and different latency if you do not want 10% of the rows as fast as possible, but 100% of the rows.

↧

Frits Hoogland: Postgres and the curious case of the cursory cursors

July 14, 2022, 9:31 am

≫ Next: Jeremy Schneider: PostgreSQL Performance Puzzle

≪ Previous: Frits Hoogland: Postgres and the curious case of the cursory cursors

Recently I was asked to look into a case where an identical SQL executed on the same database in two different ways resulted in two quite different latencies: 133ms versus 750ms.

One was acceptable for the client, and the other one wasn't.

So, what can you do about it? It took me a few stabs at it before I found the issue.

We had an issue created for this, and a colleague found that if the order by from the statement was removed, it would function correct again.

I took an approach quite native to me, and ran linux perf to record execution stacks with call-graphs (perf record -g -p <PID>) to see what it was doing.

Sort  (cost=14603.24..14764.65 rows=64562 width=20)
  Sort Key: tile_row, tile_col, block_row, block_col
  ->  Bitmap Heap Scan on block_tbl b  (cost=2139.33..9445.26 rows=64562 width=20)
        Recheck Cond: ((world_key = 19301) AND (perimeter_num = 8007) AND (updated_ts = '1655419238118'::bigint))
        Filter: ((updated_user_key = 0) AND (frozen_flag = 0))
        ->  Bitmap Index Scan on block_perimeter  (cost=0.00..2123.18 rows=65108 width=0)
              Index Cond: ((world_key = 19301) AND (perimeter_num = 8007) AND (updated_ts = '1655419238118'::bigint))

And if I allocate ExecScan, and look at it's surrounding calls, it looks like this:

- 27.44% ExecSort
   - 12.93% tuplesort_gettupleslot
      - 9.15% tuplesort_gettuple_common
         - 4.42% mergereadnext
            - 2.84% readtup_heap
                 1.58% __mcount_internal
            - 0.95% getlen
                 0.63% __mcount_internal
              0.63% __mcount_internal
         - 3.47% tuplesort_heap_replace_top
            - 2.84% comparetup_heap
                 1.58% __mcount_internal
           1.26% __mcount_internal
        2.52% __mcount_internal
        0.95% _mcount
   - 6.31% ExecScan
      - 5.36% BitmapHeapNext
         - 3.15% MultiExecBitmapIndexScan
              index_getbitmap
            - btgetbitmap
               - 2.21% _bt_next
                    _bt_steppage
                    _bt_readnextpage
                  - _bt_readpage
                     - _bt_checkkeys
                          0.63% FunctionCall2Coll
               - 0.95% tbm_add_tuples
                    0.63% tbm_get_pageentry
           1.26% tbm_iterate
           0.63% heap_hot_search_buffer
      - 0.63% ExecInterpExpr
           slot_getsomeattrs
           slot_deform_tuple

There we got the sort first (ExecSort) and the scan second (ExecScan), which performs a function called MultiExecBitmapIndexScan. I can see the similarity of the functions and the executionplan.

Okay, now how about the perf report from the procedure? That looks like this:

- 13.55% ExecScan
   - 10.57% IndexNext
      - 9.76% index_getnext
         - 9.49% index_fetch_heap
            - 3.25% ReadBufferExtended
               - ReadBuffer_common
                  - 1.08% BufTableLookup
                       0.54% __mcount_internal
                       0.54% hash_search_with_hash_value
                    0.81% PinBuffer
                    0.54% LWLockAcquire
              1.90% heap_hot_search_buffer
              1.63% heap_page_prune_opt
              0.81% __mcount_internal
            - 0.81% ReleaseAndReadBuffer
                 __mcount_internal
              0.54% _mcount
              0.54% LWLockAcquire
      - 0.81% ExecStoreTuple
           0.54% __mcount_internal
   - 1.63% ExecInterpExpr
        0.54% slot_getsomeattrs

postgres=# begin;
BEGIN
postgres=# explain declare tt cursor for select b.tile_row, b.tile_col, b.block_row, b.block_col, b.block_value
postgres-#     from block_tbl b
postgres-#    where b.world_key        = 19301
postgres-#     and b.updated_ts       = 1655419238118
postgres-#     and b.updated_user_key = 0
postgres-#     and b.frozen_flag      = 0
postgres-#     and b.perimeter_num    = 8007
postgres-# order by b.tile_row, b.tile_col, b.block_row, b.block_col;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Index Scan using block_tile on block_tbl b  (cost=0.42..40080.63 rows=64562 width=20)
   Index Cond: ((world_key = 19301) AND (frozen_flag = 0))
   Filter: ((updated_ts = '1655419238118'::bigint) AND (updated_user_key = 0) AND (perimeter_num = 8007))

Bingo! This shows a different execution plan than earlier, which used a sort and bitmap scans.

Okay, this proves another plan is captured based on the SQL used for a cursor. But why is this happening?

Summary

The postgres planner optimises for getting 100% of the rows for execution. This is different for cursors, which are optimised for getting the first 10% of rows, which can lead to different execution plans and different latency if you do not want 10% of the rows as fast as possible, but 100% of the rows as fast as possible.

↧

Jeremy Schneider: PostgreSQL Performance Puzzle

July 16, 2022, 7:42 pm

≫ Next: Luca Ferrari: PostgreSQL 15: changes in the public schema permissions

≪ Previous: Frits Hoogland: Postgres and the curious case of the cursory cursors

This week I stumbled across a really big surprise in Open Source PostgreSQL. I thought it would make a fun puzzle to challenge the skills of all you on the internet who consider yourselves as PostgreSQL experts.

It’s very simple. I spun up a fresh copy of Centos 7 on an EC2 instance and loaded the latest Open Source PostgreSQL production version (14.4) from the official PG yum repositories. Then I created a simple table with a text column and two not-null bigint columns. I populated the table with 100 million rows of dummy data using generate_series() and then I indexed both bigint columns in the same way. I did a vacuum analyze to get stats and make sure the table is ready for a simple test. And finally, I did a quick bounce of PG and cleared the OS page cache so we could start the test cold.

Now I ran a very simple query:

select count(mydata) from test where mynumber1 < 500000;

Now here’s what shocked me. Starting with a cold cache every time, that query right there consistently completes in less than 500 milliseconds. But when I run the same query on the the other number column, it consistently takes over 100 seconds! Yes, seconds – not milliseconds. With exactly the same execution plan, processing the same number of rows! (500 thousand rows or 0.5% of the table.)

Now I’m not surprised to see specific data in a table impacting performance – that’s 101 for anyone who’s been around RDBMS performance. But the magnitude of this difference was way higher than I expected. I can repeat the two queries over and over with the same difference in executions times.

Again: it’s a simple table with two not-null bigint columns. The columns are defined identically – same data type, same index type, same execution plan, same number of rows processed. The only difference is the data in the table. There are no null values involved. And the difference in execution time is astronomical – as much as 600 X slower for some of my iterations!!

Can you guess what the data in the table is?

Included below is a complete and detailed transcript of the reproduction – I’ve just excluded the expressions that I used to populate the test table. I’m curious if anyone will be able to guess what those expressions were!!

If anyone tells me a guess on blog comments or twitter, I’ll test out their expression using the transcript below. Then I’ll credit anyone who sends a guess that exhibits the same shocking difference in performance between the two column. Fame and glory, my friends. In a week or so (maybe sooner if someone guesses it really fast), I’ll publish the two expressions that I myself redacted from my session transcript below.

Major, big-time bonus fame and glory if anyone can explain the reason.

PS. Mohamed, if you’re reading this, you’re not allowed to guess! (He found it first but we worked together to figure out what was going on.)

instance_type: r5.large
volume_type: gp2
volume_size: 100

# CentOS Linux 7 x86_64 HVM EBS ENA 2002_01
us-west-2: ami-0bc06212a56393ee1

[root@ip-172-31-36-129 ~]# rpm -q postgresql14-server
postgresql14-server-14.4-1PGDG.rhel7.x86_64
[root@ip-172-31-36-129 ~]#

pg-14.4 rw root@db1=# create table test ( mydata text, mynumber1 bigint not null, mynumber2 bigint not null);
CREATE TABLE
Time: 6.617 ms

pg-14.4 rw root@db1=# insert into test
[more] - >   select 'data-XXXXXXXXXXXX'
[more] - >     , <GUESS THIS FIRST EXPRESSION>[more] - >     , <GUESS THIS SECOND EXPRESSION>[more] - >   from generate_series(1,100000000) as n
[more] - > ;
INSERT 0 100000000
Time: 224101.677 ms (03:44.102)

pg-14.4 rw root@db1=# create index test_mynumber1 on test(mynumber1);
CREATE INDEX
Time: 229785.314 ms (03:49.785)

pg-14.4 rw root@db1=# create index test_mynumber2 on test(mynumber2);
CREATE INDEX
Time: 108900.146 ms (01:48.900)

pg-14.4 rw root@db1=# vacuum verbose analyze test;
INFO:  00000: vacuuming "public.test"
LOCATION:  lazy_scan_heap, vacuumlazy.c:940
INFO:  00000: launched 1 parallel vacuum worker for index cleanup (planned: 1)
LOCATION:  do_parallel_vacuum_or_cleanup, vacuumlazy.c:2766
INFO:  00000: table "test": found 0 removable, 79913800 nonremovable row versions in 665949 out of 833334 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 757
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 8.44 s, system: 3.97 s, elapsed: 96.48 s.
LOCATION:  lazy_scan_heap, vacuumlazy.c:1674
INFO:  00000: vacuuming "pg_toast.pg_toast_16454"
LOCATION:  lazy_scan_heap, vacuumlazy.c:940
INFO:  00000: table "pg_toast_16454": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 758
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
LOCATION:  lazy_scan_heap, vacuumlazy.c:1674
INFO:  00000: analyzing "public.test"
LOCATION:  do_analyze_rel, analyze.c:331
INFO:  00000: "test": scanned 30000 of 833334 pages, containing 3600000 live rows and 0 dead rows; 30000 rows in sample, 100000080 estimated total rows
LOCATION:  acquire_sample_rows, analyze.c:1357
VACUUM
Time: 100010.196 ms (01:40.010)



[root@ip-172-31-36-129 ~]# service postgresql-14 restart;
Redirecting to /bin/systemctl restart postgresql-14.service
[root@ip-172-31-36-129 ~]# sync; echo 1 > /proc/sys/vm/drop_caches
[root@ip-172-31-36-129 ~]#



pg-14.4 rw root@db1=# select count(mydata) from test where mynumber1<500000;
 count
--------
 499999
(1 row)

Time: 155.314 ms



[root@ip-172-31-36-129 ~]# service postgresql-14 restart;
Redirecting to /bin/systemctl restart postgresql-14.service
[root@ip-172-31-36-129 ~]# sync; echo 1 > /proc/sys/vm/drop_caches
[root@ip-172-31-36-129 ~]#



pg-14.4 rw root@db1=# select count(mydata) from test where mynumber2<500000;
 count
--------
 499180
(1 row)

Time: 105202.086 ms (01:45.202)



[root@ip-172-31-36-129 ~]# service postgresql-14 restart;
Redirecting to /bin/systemctl restart postgresql-14.service
[root@ip-172-31-36-129 ~]# sync; echo 1 > /proc/sys/vm/drop_caches
[root@ip-172-31-36-129 ~]#


pg-14.4 rw root@db1=# select count(mydata) from test where mynumber1<500000;
 count
--------
 499999
(1 row)

Time: 494.063 ms



[root@ip-172-31-36-129 ~]# service postgresql-14 restart;
Redirecting to /bin/systemctl restart postgresql-14.service
[root@ip-172-31-36-129 ~]# sync; echo 1 > /proc/sys/vm/drop_caches
[root@ip-172-31-36-129 ~]#



pg-14.4 rw root@db1=# select count(mydata) from test where mynumber2<500000;
 count
--------
 499180
(1 row)

Time: 104084.311 ms (01:44.084)

↧

Luca Ferrari: PostgreSQL 15: changes in the public schema permissions

July 14, 2022, 5:00 pm

≫ Next: Andreas 'ads' Scherbaum: Laura Ricci

≪ Previous: Jeremy Schneider: PostgreSQL Performance Puzzle

The upcoming new release of PostgreSQL does some changes on the public schema permissions.

PostgreSQL 15: changes in the `public` schema permissions

In PostgreSQL 15 the default public schema that every database has will have a different set of permissions. In fact, before PostgreSQL 15, every user could manipulate the public schema of a database he is not owner. Since the upcoming new version, only the database owner will be granted full access to the public schema, while other users will need to get an explicit GRANT:

Imagine the user luca is owner of the database testdb: it means he can do whatever he wants on the database.

testdb=>SHOWserver_version;server_version----------------15beta2(1row)testdb=>SELECTcurrent_role,current_user;current_role|current_user--------------+--------------luca|luca(1row)testdb=>CREATETABLEmytable(ttext);CREATETABLE

On the other hand, another user, let’s say pgbench, cannot:

testdb=>SELECTcurrent_role,current_user;current_role|current_user--------------+--------------pgbench|pgbench(1row)testdb=>CREATETABLEmytable2(ttext);ERROR:permissiondeniedforschemapublicLINE1:CREATETABLEmytable2(ttext);testdb=>select*frommytable;ERROR:permissiondeniedfortablemytable

That means that public is not managed as a user defined schema, and therefore in order to allow other users to do operations, an explicit GRANT must be executed.
What has changed is that there is no more the CREATE permission on public schema, while YUSAGE is as before. Therefore, in order to allow not-owners to create objects, an explicit GRANT CREATE ON SCHEMA public TO pgbench statement myust be executed.
This affects newly created databases, not those restored from previous backups.
But there is a trick that could help in setting back the previous behavior: if you set the permissions on the template1 (or in a template database) you could have them for free on new databases:

template1=#GRANTCREATEONSCHEMApublicTOPUBLIC;GRANTtemplate1=#CREATEDATABASEnewdbWITHOWNERluca;CREATEDATABASE

And now, collecting as not-owning user:

%psql-Upgbench-hlocalhostnewdbnewdb=>createtablefoo(iint);CREATETABLE

the permissions are as in previous PostgreSQL versions.
It is not clear if the above trick will remain in place once the PostgreSQL version exists the beta status, in any case I discourage you to adopt it. The choice of revoking by default privileges on the `public` schmea could be annoying, but is a good choice in term of security and forces you to decide how to deal with permissions.

↧

Andreas 'ads' Scherbaum: Laura Ricci

July 18, 2022, 7:00 am

≫ Next: Michael Christofides: Introducing a new Postgres podcast

≪ Previous: Luca Ferrari: PostgreSQL 15: changes in the public schema permissions

PostgreSQL Person of the Week Interview with Laura Ricci: I was born into an Italian family who had emigrated to France. I studied literature and foreign languages before I started working several different jobs. Then I went abroad in order to work for two non-profit organizations as a volunteer.

↧

Michael Christofides: Introducing a new Postgres podcast

July 19, 2022, 1:05 am

≫ Next: Julian Markwort: gexec in psql: PostgreSQL poweruser practice

≪ Previous: Andreas 'ads' Scherbaum: Laura Ricci

Over the past few years, like many of us, I’ve gotten pretty into podcasts. To my surprise, I even like technical podcasts.

A few weeks ago, Nikolay Samokhvalov messaged me to see if I’d be up for a weekly 30-minute chat on the excellent Postgres TV YouTube channel he runs with Ilya Kosmodemiansky. I liked the idea, and suggested we also make it a podcast. Naturally, we called it Postgres FM.

What’s the format?

We’re planning to release a roughly 30-minute episode each week, likely on a single topic. We have a list of our own ideas, and a growing list of listener suggestions too!

Who’s it for?

Nikolay brings a wealth of experience from the world of Postgres administration at scale, as well as building tools and services for it. I have more experience managing database tools for developers, and my experience definitely skews more towards working with start-ups.

As such, I’d expect it to be most interesting for people who use Postgres regularly for something substantial, but I do hope some episodes will also be helpful or interesting for more casual users too.

Have you started already?

We have! Perhaps predictably, for two people who care a lot about performance, our first episode was about slow queries and slow transactions. We discussed what counts as slow, how to monitor for issues, and some things we’ve seen in the wild.

Last week we also released an episode on hosting Postgres, specifically managed services vs. DIY. We covered the main reasons we see folks choosing either a managed service, or to manage it in-house, as well as discussing some of the vendors.

How can I check it out?

We’re posting the full recordings to Postgres.TV and an audio-only version to Postgres.FM. You can subscribe for future episodes in the usual ways!

Your suggestions and feedback would be very welcome — please feel free to add a comment below, or message us on Twitter: @samokhvalov and @michristofides.

Who drew the awesome elephant?

That would be the lovely, and very talented, Jessie Draws.

↧

Two styles of UNIQUE

Installation

PostgreSQL

Watchdog

Patroni

Patroni configuration

Database connection

Automatic failover test

Conclusion

pgBackRest repository location

Installation

pgBackRest installation and configuration

Configure Patroni to use pgBackRest

Take a first backup

The “restore” part

Create a replica using pgBackRest

Conclusion

Creating a large table

Create an extremely wide table

Accessing various columns

Debunking PostgreSQL performance issues: column order

Finally …

What is SSL/TLS?

Trying to enable SSL without Cert/Key Files

Creating certificates

Enabling SSL/TLS

Enforcing SSL/TLS

Checking for connections using SSL/TLS

Disabling SSL/TLS

Conclusion

PostgreSQL 15: changes in the low level backup functions

Have You Ever Contributed to PostgreSQL?

The First Steps Towards Contributing to PostgreSQL

Identify your motivation

Consider contributing beyond code

Contributing with code

How Will You Contribute to the Community Next?

Read the Report

Summary

Summary

PostgreSQL 15: changes in the public schema permissions

What’s the format?

Who’s it for?

Have you started already?

How can I check it out?

Who drew the awesome elephant?

Two styles of `UNIQUE`

PostgreSQL 15: changes in the `public` schema permissions