Channel: Planet PostgreSQL

Pavel Stehule: pspg 5.5.0


Today I released pspg 5.5. There are a few interesting new features:

  • stream mode based on continuous reading from a file (using kqueue) is now supported on BSD Unix,
  • there are two new visual effects: the possibility to hide the border line, and the possibility to highlight odd rows,
  • pspg can be used in Oracle's SQLcl client too; it should work with the default format and with the ANSICONSOLE format,
  • one-character vertical scrolling is supported too (a small usage sketch follows).
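
A minimal usage sketch (assuming the -f and --stream options of current pspg releases and the standard PSQL_PAGER variable; check pspg --help on your build):

# use pspg as the psql pager
export PSQL_PAGER=pspg
psql -d mydb

# stream mode: keep reading a file that another process appends to
pspg --stream -f /tmp/query_output.txt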




 

Franck Pachot: Open-source🍃MongoDB API to 🚀YugabyteDB with 🥭MangoDB proxy


There is no doubt that one great thing about MongoDB is the API. Many developers love it. I'm a big fan of SQL, but we need to listen to all users, and they have use cases for it. Another thing we expect from NoSQL is scalability.

In this example we have both, open-source, ACID and resilient, with

  • MangoDB proxy between MongoDB and PostgreSQL protocols
  • YugabyteDB with its PostgreSQL compatible API on top of the fully consistent distributed storage

This MangoDB project is new, and when looking for it you will see Google still trying to tell you that you made a typo. So this post is a first quick test to validate how it works. Things will probably change with contributions to https://github.com/MangoDB-io/MangoDB

MangoDB example

I'll take the demo application from:

git clone https://github.com/MangoDB-io/example
cd example

and install it on my laptop, because I recently became a big fan of Docker on Windows.

The docker-compose.yml starts a PostgreSQL database and I'll replace that with a YugabyteDB one. This is easy.

postgres service

First, I remove the postgres service and add the yb-master and yb-tserver ones from the YugabyteDB docker-compose.yml.

I didn't change any parameters, and I keep the defaults:

  • host name is yb-tserver (in a distributed database you can connect to any server)
  • port is 5433 (this is our default, rather than the 5432 default for PostgreSQL)
  • user is yugabyte (and password is the same)
  • database is yugabyte (you can create a dedicated one of course)

setup service

I change this in the setup service command, which starts just to create the schema for the application (which is a "todo" list in this example):

psql -h yb-tserver -p 5433 -U yugabyte -d yugabyte -c 'CREATE SCHEMA IF NOT EXISTS todo'

and the docker-compose dependency is set to yb-tserver instead of postgres:
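
Put together, the modified setup service looks like this (extracted from the full docker-compose.yml shown below):

  setup:
    image: postgres:14.0
    hostname: 'setup'
    container_name: 'setup'
    restart: 'on-failure'
    depends_on:
      - 'yb-tserver'
    entrypoint: ["sh", "-c", "psql -h yb-tserver -p 5433 -U yugabyte -d yugabyte -c 'CREATE SCHEMA IF NOT EXISTS todo'"]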

mangodb service

The application also needs the connection string. We use the PostgreSQL driver as it is the same protocol, and change the host, port and database name only:

    command: [
      '-listen-addr=:27017',
      '-postgresql-url=postgres://yugabyte@yb-tserver:5433/yugabyte',
    ]

Start the application

Here is my final docker-compose.yml:

version: "3"

volumes:
  yb-master-data-1:
  yb-tserver-data-1:

services:
  client:
    build: ./app/client
    hostname: 'todo_client'
    container_name: 'todo_client'
    stdin_open: true
  api:
    build: ./app/api
    hostname: 'todo_api'
    container_name: 'todo_api'
  nginx:
    image: nginx
    hostname: 'nginx'
    container_name: 'nginx'
    ports:
      - 8888:8888
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
  yb-master:
    image: yugabytedb/yugabyte:latest
    container_name: yb-master-n1
    volumes:
      - yb-master-data-1:/mnt/master
    command: ["/home/yugabyte/bin/yb-master",
              "--fs_data_dirs=/mnt/master",
              "--master_addresses=yb-master-n1:7100",
              "--rpc_bind_addresses=yb-master-n1:7100",
              "--replication_factor=1"]
    ports:
      - "7000:7000"
    environment:
      SERVICE_7000_NAME: yb-master
  yb-tserver:
    image: yugabytedb/yugabyte:latest
    container_name: yb-tserver-n1
    volumes:
      - yb-tserver-data-1:/mnt/tserver
    command: ["/home/yugabyte/bin/yb-tserver",
              "--fs_data_dirs=/mnt/tserver",
              "--start_pgsql_proxy",
              "--rpc_bind_addresses=yb-tserver-n1:9100",
              "--tserver_master_addrs=yb-master-n1:7100"]
    ports:
      - "9042:9042"
      - "5433:5433"
      - "9000:9000"
    environment:
      SERVICE_5433_NAME: ysql
      SERVICE_9042_NAME: ycql
      SERVICE_6379_NAME: yedis
      SERVICE_9000_NAME: yb-tserver
    depends_on:
      - yb-master
  mangodb:
    image: ghcr.io/mangodb-io/mangodb:latest
    hostname: 'mangodb'
    container_name: 'mangodb'
    command: [
      '-listen-addr=:27017',
      '-postgresql-url=postgres://yugabyte@yb-tserver:5433/yugabyte',
    ]
    ports:
      - 27017:27017
  setup:
    image: postgres:14.0
    hostname: 'setup'
    container_name: 'setup'
    restart: 'on-failure'
    depends_on:
      - 'yb-tserver'
    entrypoint: ["sh", "-c", "psql -h yb-tserver -p 5433 -U yugabyte -d yugabyte -c 'CREATE SCHEMA IF NOT EXISTS todo'"]

I pull the images, create the containers and run the services:

docker-compose up

Here is the start from the command line (screenshot), and the services visible in Docker Desktop (screenshot).

The YugabyteDB console is available on: http://localhost:7000

The example application:

The application is accessible on http://localhost:8888/ and we can add items to the To-Do list:
This calls the db.collection.insertOne() MongoDB function:


Check the database

This MongoDB call is translated to SQL by the MangoDB proxy. The collection is a table:

$ psql postgres://yugabyte:yugabyte@localhost:5433/yugabyte
psql (12.7, server 11.2-YB-2.9.0.0-b0)
Type "help" for help.

yugabyte=# \dn

  List of schemas
  Name  |  Owner
--------+----------
 public | postgres
 todo   | yugabyte
(2 rows)

yugabyte=# set schema 'todo';
SET

yugabyte=# \d
         List of relations
 Schema | Name  | Type  |  Owner
--------+-------+-------+----------
 todo   | tasks | table | yugabyte
(1 row)

yugabyte=# \d+ todo.tasks
                                   Table "todo.tasks"
 Column | Type  | Collation | Nullable | Default | Storage  | Stats target | Description
-------------+-------+-----------+----------+---------+----------+--------------+-------------
 _jsonb | jsonb |           |          |         | extended |              |

yugabyte=# select * from todo.tasks;
                                                                    _jsonb
----------------------------------------------------------------------------------------------------------------------------------------------------
 {"$k": ["description", "completed", "_id"], "_id": {"$o": "6182627a17462641b80439d4"}, "completed": false, "description": "Play 😎"}
 {"$k": ["description", "completed", "_id"], "_id": {"$o": "6182627017462641b80439d3"}, "completed": false, "description": "Start MangoDB"}
 {"$k": ["description", "completed", "_id"], "_id": {"$o": "6182626817462641b80439d2"}, "completed": false, "description": "Start YugabyteDB"}
(3 rows)

yugabyte=#

The storage is very simple: one table with one JSONB column.
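
For reference, the whole schema behind this collection amounts to what the setup service and the proxy create, i.e. a schema and a single-column table (the CREATE TABLE statement is the one reported by \d+ later in this post):

CREATE SCHEMA IF NOT EXISTS todo;
CREATE TABLE "todo"."tasks" (_jsonb jsonb);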

SQL Statements

I'll track the statements used with the pg_stat_statements extension which is enabled by default in YugabyteDB. Just resetting in my lab:

yugabyte=# select pg_stat_statements_reset();
 pg_stat_statements_reset
-------------------------------

(1 row)

In the application, I refresh, mark "Play" as completed, insert a new task, delete it, and refresh multiple times.

yugabyte=# select calls,query from pg_stat_statements;
 calls |                                                        query
------------+----------------------------------------------------------------------------------------------------------------------
     1 | select query from pg_stat_statements
     1 | INSERT INTO "todo"."tasks" (_jsonb) VALUES ($1)
     1 | DELETE FROM "todo"."tasks" WHERE _jsonb->$1 = $2
     1 | SELECT _jsonb FROM "todo"."tasks" WHERE _jsonb->$1 = $2
     1 | UPDATE "todo"."tasks" SET _jsonb = $1 WHERE _jsonb->'_id' = $2
     7 | SELECT _jsonb FROM "todo"."tasks"
     1 | select pg_stat_statements_reset()
    11 | SELECT COUNT(*) > 0 FROM information_schema.columns WHERE column_name = $1 AND table_schema = $2 AND table_name = $3
    11 | SELECT COUNT(*) > 0 FROM information_schema.tables WHERE table_schema = $1 AND table_name = $2
     1 | select * from pg_stat_statements
(10 rows)

There are many things to optimize here. Reading the information_schema many times is not the most efficient. We need an index on the ID (which is in the JSON document). And updates re-write the whole document. I'll think about this and probably contribute to this open-source project. The ID should probably be in its own column, which we can properly index and shard, rather than scanning the whole table or adding an additional index.
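
As a sketch of that idea (my reading of the suggestion, not the MangoDB project's actual schema), a hypothetical layout could look like this:

-- hypothetical: keep the document identifier in its own column so it can be
-- indexed and used as the sharding key, and keep the rest of the document as JSONB
CREATE TABLE todo.tasks_v2 (
    _id    uuid PRIMARY KEY,
    _jsonb jsonb NOT NULL
);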

JSONB indexing

As YugabyteDB plugs the distributed storage into a full PostgreSQL query layer, we can even index this. Here is the table with just one JSONB column (it was created with CREATE TABLE "todo"."tasks" (_jsonb jsonb);):

yugabyte=# select * from todo.tasks;
                                                              _jsonb
---------------------------------------------------------------------------------------------------------------------------------------
 {"$k": ["description", "completed", "_id"], "_id": {"$o": "618282aea9a2a141efa3c401"}, "completed": false, "description": "Franck"}
(1 row)

If I select one key, it has to scan the whole table:

yugabyte=# explain analyze select * from todo.tasks where _jsonb->>'_id' = '{"$o": "618282aea9a2a141efa3c401"}';
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on tasks  (cost=0.00..105.00 rows=1000 width=32) (actual time=0.815..0.817 rows=1 loops=1)
   Filter: ((_jsonb ->> '_id'::text) = '{"$o": "618282aea9a2a141efa3c401"}'::text)
 Planning Time: 0.048 ms
 Execution Time: 0.874 ms
(4 rows)

But I can create a unique index on it:

yugabyte=# create unique index task_pk ON todo.tasks
           ((_jsonb ->> '_id'::text) hash);
CREATE INDEX

Do not forget the double parentheses (this is the PostgreSQL syntax):

  • one for the list of columns to index,
  • and one because it is not directly a column but a value derived from the JSON document.

The HASH modifier is optional here because hash sharding is the default on the first column. And this is what we want on this generated identifier. But if you have range scans, you could change it to ASC or DESC.
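
For example, a range-sharded variant on the same expression could be declared like this (a hypothetical task_pk_range index, only useful if you actually query ranges of this key):

yugabyte=# create index task_pk_range ON todo.tasks
           ((_jsonb ->> '_id'::text) asc);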

Now I have a fast access to the document:

yugabyte=# explain analyze select * from todo.tasks where _jsonb->>'_id' = '{"$o": "618282aea9a2a141efa3c401"}';

                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using task_pk on tasks  (cost=0.00..4.12 rows=1 width=32) (actual time=13.617..13.622 rows=1 loops=1)
   Index Cond: ((_jsonb ->> '_id'::text) = '{"$o": "618282aea9a2a141efa3c401"}'::text)
 Planning Time: 11.593 ms
 Execution Time: 13.706 ms
(4 rows)

This means that a query with the key will go to the right tablet (the tables and indexes are automatically sharded in YugabyteDB) and to the right row. We are ready to scale out and keep the low latency.

Index Only Indexes

The previous execution plan may require two RPCs on a scale-out database: one to the index and one to the table. This is because, for better agility, all indexes are global in YugabyteDB, with no compromise on strong consistency of course. But an Index Only Scan would be better, and it is easy to achieve (I explained it in How a Distributed SQL Database Boosts Secondary Index Queries with Index Only Scan):

yugabyte=# drop index todo.task_pk;
DROP INDEX
yugabyte=# create unique index task_pk ON todo.tasks
           ((_jsonb ->> '_id'::text) hash) include (_jsonb);
CREATE INDEX

And here is the fastest access you can have to a document on a SQL distributed database, still with the full agility of a JSON document:

yugabyte=# explain analyze select * from todo.tasks where _jsonb->>'_id' = '{"$o": "618282aea9a2a141efa3c401"}';
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using task_pk on tasks  (cost=0.00..4.12 rows=1 width=32) (actual time=2.559..2.561 rows=1 loops=1)
   Index Cond: (((_jsonb ->> '_id'::text)) = '{"$o": "618282aea9a2a141efa3c401"}'::text)
   Heap Fetches: 0
 Planning Time: 5.229 ms
 Execution Time: 2.607 ms
(5 rows)

Now you have everything in the index and don't need the table at all. In PostgreSQL you have no choice, as you still need to maintain the heap table. But YugabyteDB stores tables in LSM trees where rows are organized for fast access on the primary key. When storing documents into a SQL table, it is better to have the identifier in its own column, an integer or a uuid, to really have a (key uuid, value jsonb) schema. I'll suggest that to the MangoDB project, as well as some other optimizations for PostgreSQL or YugabyteDB. But the essence is there: a simple MongoDB API on top of a distributed SQL database.

Caitlin Strong: Patroni & etcd in High Availability Environments


Crunchy Data products often include High Availability. Patroni and etcd are two of our go-to tools for managing those environments. Today I wanted to explore how these work together. Patroni relies on proper operation of the etcd cluster to decide what to do with PostgreSQL. When communication between these two pieces breaks down, it creates instability in the environment, resulting in failover, cluster restarts, and even the loss of a primary database. To fully understand the importance of this relationship, we need to understand a few core concepts of how these pieces work. First, we'll start with a brief overview of the components involved in HA systems and their role in the environment.

Nikolay Samokhvalov: Three Cases Against IF NOT EXISTS / IF EXISTS in Postgres DDL

EXISTS OR NOT EXISTS (meme)

What is this about?

Many DDL statements in PostgreSQL support the modifiers IF EXISTS / IF NOT EXISTS. For example:

test=# create table if not exists mytable();
CREATE TABLE
test=# drop table if exists mytable;
DROP TABLE

I recommend using `IF EXISTS` / `IF NOT EXISTS` in DDL only when necessary. Here are three examples that demonstrate how the overuse of these words may lead to negative consequences.
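
As one illustration of the kind of pitfall involved (a hedged example of my own, not necessarily one of the three cases from the full article), `CREATE TABLE IF NOT EXISTS` reports success even when the existing table has a completely different definition, so a wrong or outdated schema can go unnoticed:

test=# create table mytable (id int);
CREATE TABLE
test=# create table if not exists mytable (id bigint, payload text);
NOTICE:  relation "mytable" already exists, skipping
CREATE TABLE
-- the second statement "succeeded", but mytable still has only the old int column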

Frits Hoogland: Cloud: IO limits gone full circle


In the old days we used rotating disks, which had mechanical arms moving over the surface to read data, which meant there was a certain latency before data could be obtained, and a limited amount of bandwidth. The solution for getting more bandwidth was to use more disks (RAID). Still, the overall usage was essentially bound by IOPS because of the mechanical arms/actuators. (There were other storage media before that, but they are outside the scope of this article.)

Then came solid state disks (SSD). Because these do not use mechanical, rotating platters, latency dropped dramatically. In fact, this was such an improvement that the existing access protocols were found to be limiting SSDs, and new protocols were needed to take advantage of the parallelism and bandwidth made possible by SSDs (such as multipath IO and NVMe). Of course, new storage technology comes with its own problems, but that is beyond the scope of this article.

Fast forward further in time and we enter the cloud era. Now we can rent a (virtual) machine, elastically scale up and down, and let all the properties we obsessed about in the past, such as the number of disks, disk failure rates, bandwidth, etc., be the problem of the cloud vendor. We can just use the infrastructure...

Or can we? If you look carefully at the specifications of the virtual machines of all major cloud providers, you will notice that a cloud machine shape has obvious limits such as the number of vCPUs and memory, but it also has limits on disks, both at the level of the virtual machine and at the level of the disk.

The disk limits being less obvious also gives me the impression that they are presented in a way that makes them easy to miss.

But that is not what I wanted to discuss: if you look up and work out the IO limits of a cloud machine shape together with one or more disk devices, you will notice that the IO limits of the smaller machine shapes in particular are quite low.

In fact, if you take the IO limits of such machines, it leaves me with the impression that we are essentially back at the disk limits of the era of rotating disks.

But it's not all nostalgia; there is another side to this. It means that disk-IO-sensitive applications that have to use these machines must be tuned for limited IOPS again, and cannot assume close to unlimited IOPS and bandwidth: for example, using large IOs to reach the bandwidth limit, because parallel usage of small IOs will run into the IOPS limit.
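
To make that concrete, a quick way to see where a given machine shape and disk combination tops out is to compare small random IO (IOPS-bound) with large sequential IO (bandwidth-bound), for example with fio. This is a hedged sketch of my own, not taken from the article; adjust the file path, sizes and runtimes to your environment:

fio --name=randread --filename=/data/fio.test --size=1G --direct=1 \
    --ioengine=libaio --rw=randread --bs=8k --iodepth=32 --runtime=60 --time_based

fio --name=seqread --filename=/data/fio.test --size=1G --direct=1 \
    --ioengine=libaio --rw=read --bs=1m --iodepth=8 --runtime=60 --time_based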


Jonathan Katz: Can't Resize your Postgres Kubernetes Volume? No Problem!


You've built an application and are using Postgres to run it. You move it into production. Things are going great. So great that you've accumulated so much data that you need to resize your disk.

Before the cloud, this often involved either expanding your disk partitioning or getting a new disk, both of which are costly operations. Cloud has made this much easier: disk resizes can occur online or transparently to the application, and can be done as simply as clicking a button (such as in Crunchy Bridge).

If you're running your database on Kubernetes, you can also get fairly cheap disk resizes using persistent volumes. While the operation is simple, it does require you to reattach the PVC to a Pod for the expansion to take effect. If uptime is important, you do want to use something like PGO, the open source Postgres Operator from Crunchy Data. PGO uses a rolling update strategy to minimize or eliminate downtime.

There is a catch to the above: not every Kubernetes storage system supports storage resize operations. In that case, to expand the storage available to your Postgres cluster, you have to create a new cluster and copy the data to a larger persistent volume.

Though this is a bit inconvenient, there is still a way to resize your Postgres data volumes while minimizing downtime with PGO. Let's take a look at how we can do that!

"Instance Sets": Creating Postgres Clusters That Are Similar But Different

Following the PGO quickstart, let's create a Postgres cluster that looks like this and add an additional replica:
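
A hedged sketch of what such a manifest can look like (field names follow the public PGO v5 examples; the cluster name, Postgres version and storage sizes are placeholders to adjust):

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 14
  instances:
    - name: instance1
      replicas: 2                      # primary plus one replica
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 1Gi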

Luca Ferrari: pg_upgrade and OpenBSD


OpenBSD ships pg_upgrade as a separate package.

pg_upgrade and OpenBSD

I had never noticed that, on OpenBSD, the pg_upgrade command is not shipped with the default PostgreSQL server installation. I usually install PostgreSQL from sources, so I never dug into the OpenBSD packages. The choice of OpenBSD is to keep pg_upgrade separate from the rest of the PostgreSQL binaries and executables.
Allow me to explain, starting from the installed binaries on an OpenBSD 7.0 machine:



% ls -1 /usr/local/bin/pg*
/usr/local/bin/pg_archivecleanup
/usr/local/bin/pg_basebackup
/usr/local/bin/pg_checksums
/usr/local/bin/pg_config
/usr/local/bin/pg_controldata
/usr/local/bin/pg_ctl
/usr/local/bin/pg_dump
/usr/local/bin/pg_dumpall
/usr/local/bin/pg_isready
/usr/local/bin/pg_receivewal
/usr/local/bin/pg_recvlogical
/usr/local/bin/pg_resetwal
/usr/local/bin/pg_restore
/usr/local/bin/pg_rewind
/usr/local/bin/pg_standby
/usr/local/bin/pg_test_fsync
/usr/local/bin/pg_test_timing
/usr/local/bin/pg_verifybackup
/usr/local/bin/pg_waldump
/usr/local/bin/pgbench



The server is a PostgreSQL 13.4, installed via pkg_add. The PostgreSQL contrib module is installed, but as you can see, there is no pg_upgrade binary in the above listing.
Let’s inspect the packages:



% pkg_info -Q postgresql

postgresql-client-13.4p0 (installed)
postgresql-contrib-13.4p0 (installed)
postgresql-docs-13.4p0 (installed)
postgresql-odbc-10.02.0000p0
postgresql-pg_upgrade-13.4p0
postgresql-pllua-2.0.7
postgresql-plpython-13.4p0
postgresql-plr-8.4.1
postgresql-server-13.4p0 (installed)



Please note postgresql-pg_upgrade-13.4p0: that is the package that contains the pg_upgrade command:



% pkg_info  postgresql-pg_upgrade-13.4p0 
Information for https://cdn.openbsd.org/pub/OpenBSD/7.0/packages/amd64/postgresql-pg_upgrade-13.4p0.tgz

Comment:
Support for upgrading PostgreSQL data from previous version

Description:
Contains pg_upgrade, used for upgrading PostgreSQL database
directories to newer major versions without requiring a dump and
reload.

Maintainer: Pierre-Emmanuel Andre <pea@openbsd.org>

WWW: https://www.postgresql.org/



This choice of packaging is somewhat strange.
Let’s install pg_upgrade:



% doas pkg_add postgresql-pg_upgrade
quirks-4.53 signed on 2021-10-30T11:32:24Z
postgresql-pg_upgrade-13.4p0:postgresql-previous-12.8: ok
postgresql-pg_upgrade-13.4p0: ok

% ls -lh $(which pg_upgrade)
-rwxr-xr-x  1 root  bin   185K Sep 26 21:25 /usr/local/bin/pg_upgrade



So the binary itself is very tiny, weighing in at 185 kB; placing it in its own package therefore makes no sense in terms of disk space. However, please note that installing pg_upgrade also triggered the installation of postgresql-previous-12.8, which means the system has also installed PostgreSQL 12.8.
This is clearly shown by a query on that package:



% pkg_info postgresql-previous-12.8   
Information for inst:postgresql-previous-12.8

Comment:
PostgreSQL RDBMS (previous version, for pg_upgrade)

Required by:
postgresql-pg_upgrade-13.4p0

Description:
PostgreSQL RDBMS server, the previous version

This is the previous version of PostgreSQL, necessary to allow for
pg_upgrade to work in the currently supported PostgreSQL version.



And in fact, the package installs the whole previous version of the cluster, including libraries and executables:



% pkg_info -L postgresql-previous-12.8 | grep bin
/usr/local/bin/postgresql-12/clusterdb
/usr/local/bin/postgresql-12/createdb
/usr/local/bin/postgresql-12/createuser
/usr/local/bin/postgresql-12/dropdb
/usr/local/bin/postgresql-12/dropuser
/usr/local/bin/postgresql-12/ecpg
/usr/local/bin/postgresql-12/initdb
/usr/local/bin/postgresql-12/oid2name
/usr/local/bin/postgresql-12/pg_archivecleanup
/usr/local/bin/postgresql-12/pg_basebackup
/usr/local/bin/postgresql-12/pg_checksums
/usr/local/bin/postgresql-12/pg_config
...



Therefore, installing pg_upgrade will also install the **whole previous major version of PostgreSQL**.
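
For completeness, a run then points pg_upgrade at both sets of binaries, roughly like this (a hedged sketch: the flags are standard pg_upgrade options and the old bindir comes from the listing above, but the data directory paths are assumptions to adapt to your setup):

% doas -u _postgresql pg_upgrade \
      -b /usr/local/bin/postgresql-12 \
      -B /usr/local/bin \
      -d /var/postgresql/data-12 \
      -D /var/postgresql/data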

It has been a separate package for a while…

Inspecting the CVS history of the ports tree, it is possible to see that the pg_upgrade command has been separated into a subpackage since 2016:



This moves pg_upgrade to a subpackage, and has that
subpackage depend on postgresql-previous.



In fact, this is the commit that made pg_upgrade a distinct package in the build system.
The rationale can be found in the b2k16 hackathon article, where Jeremy Evans explains that, in order to get pg_upgrade to work, the previous PostgreSQL binaries need to be available. Therefore, the application has been moved to a separate package, so that it can also install the previous binaries on the system.

Conclusions

Keeping pg_upgrade as a separate package is a deliberate choice. I don't think it is right or wrong; it simply ensures that if you install pg_upgrade for a newer PostgreSQL, you also get a previous version to upgrade from.
Quite frankly, I don't entirely see the reason, because I could have a different database version on the system that I want to upgrade from, even one that was not installed from ports.
Moreover, pg_upgrade can upgrade PostgreSQL even across non-consecutive versions, although I personally don't recommend this, especially if the gap between versions is big. This means that installing the immediately previous version of PostgreSQL may not be the right choice in every scenario. Again, this is neither a good nor a bad choice; it is just a choice. It must be noted that, unlike other operating systems, OpenBSD does not offer old versions of PostgreSQL as packages (if we exclude the -previous package), which makes this choice coherent with the philosophy of the operating system.


Jonathan Katz: Fun with SQL in Postgres: Finding Revenue Accrued Per Day

Ryan Lambert: Find missing OpenStreetMap data with PostGIS


The #30DayMapChallenge is going on again this November. Each day of the month has a different theme for that day's map challenge. These challenges do not have a requirement for technology, so naturally I am using OpenStreetMap data stored in PostGIS with QGIS for the visualization component.

The challenge for Day 5 was an OpenStreetMap data challenge. I decided to find and visualize missing crossing tags. Crossing tags are added to the node (point) where a pedestrian highway (e.g. highway=footway) intersects a motorized highway (e.g. highway=tertiary). This post explains how I used PostGIS and OpenStreetMap data to find intersections missing a dedicated crossing tag.

Without further ado, here was my submission for Day 5.

Map of the Denver, Colorado metro area with a shaded hex grid overlay. Title reads "% of Footway Intersections missing Crossing".  Subtitles read "Denver Metro area, November 2021" and "#30DayMapChallenge - 2021 Day 5: OpenStreetMap". The hex grid is shaded from light red to dark red (5 gradients), with only 4 of the lightest shaded areas around Denver proper.  Throughout the rest of the inner-metro area are shades 2-4 (35% through 94% missing) with most of the outer regions in the 100% or "no data" area.

Franck Pachot: 🚀 Think about Primary Key & Indexes before anything else


A recent tweet by Nikolay Samokhvalov draws attention to the importance of understanding database structures:


Andy Pavlo also raised the point that people are suggesting many radical changes, without understanding proper indexing:

In this post, I'll take this example to explain that thinking about indexes should not be too difficult. And, anyway, this work must be done before scaling to a distributed database. The problem was encountered in PostgreSQL. I'll run my demo on YugabyteDB to show that the concepts are the same on a distributed database. The SQL is the same.

James Long's approach was good: ask the community and provide all required information, the execution plan and index definition: https://gist.github.com/jlongster/4b31299dcb622aa7e29b59d889db2b2c#file-gistfile1-txt

With such information, the problem is easy to reproduce:

yugabyte=# \c yugabyte yugabyte
psql (15devel, server 11.2-YB-2.9.1.0-b0)
You are now connected to database "yugabyte" as user "yugabyte".
yugabyte=# create table messages_binary (
             "timestamp" text,
             "group_id" uuid,
             "other_column" int,
             primary key ("timestamp", "group_id")
           );
CREATE TABLE
yugabyte=# EXPLAIN SELECT * FROM messages_binary
           WHERE group_id = 'e7e46753-2e99-4ee4-b77f-17136b01790e'
           AND timestamp > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e';
                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on messages_binary  (cost=0.00..105.00 rows=1000 width=52)
   Filter: (("timestamp" > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e'::text) AND (group_id = '983d5259-97ff-49e3-8829-101ab8dead92'::uuid))
(2 rows)

This is a full table scan, and this is not what we want, because it reads all rows when we need only one "group_id". What we need is a range scan.

Let's insert a few rows (3 timestamps in 3 groups):

yugabyte=# create extension pgcrypto;
CREATE EXTENSION
yugabyte=# insert into messages_binary
           with groups as (
             select gen_random_uuid() group_id from generate_series(1,3)
           )
           select
             to_char(now() + (generate_series(1,3) * interval '1 second'),
                     'yyyy-mm-ddThh24:mi:ss.000Z-')
               || substr(gen_random_uuid()::text,25) "timestamp",
             group_id,
             42 as "value"
           from groups;
INSERT 0 9
yugabyte=# select * from messages_binary;
              timestamp                |               group_id               | other_column
---------------------------------------+--------------------------------------+--------------
 2021-11-07T20:00:23.000Z-c533a5e5623e | e7e46753-2e99-4ee4-b77f-17136b01790e |           42
 2021-11-07T20:00:24.000Z-b879daca6cb7 | f27ac68f-2a10-46f0-a8fe-77b99c0c5a66 |           42
 2021-11-07T20:00:23.000Z-ca98dd4de397 | f27ac68f-2a10-46f0-a8fe-77b99c0c5a66 |           42
 2021-11-07T20:00:22.000Z-c440295c4500 | 9c3d61e1-6d3f-4b95-9e08-46f485d10b75 |           42
 2021-11-07T20:00:24.000Z-631b45e66aba | e7e46753-2e99-4ee4-b77f-17136b01790e |           42
 2021-11-07T20:00:22.000Z-ad01842bb691 | e7e46753-2e99-4ee4-b77f-17136b01790e |           42
 2021-11-07T20:00:24.000Z-90342717a0c8 | 9c3d61e1-6d3f-4b95-9e08-46f485d10b75 |           42
 2021-11-07T20:00:22.000Z-933f552d0159 | f27ac68f-2a10-46f0-a8fe-77b99c0c5a66 |           42
 2021-11-07T20:00:23.000Z-1dcde16fc472 | 9c3d61e1-6d3f-4b95-9e08-46f485d10b75 |           42
(9 rows)

In YugabyteDB, rows are sharded and stored in the primary key index itself. In PostgreSQL, they are appended to a heap table, with an additional index on the primary key. In both cases, range scan access depends on the primary key, which is defined here as ("timestamp","group_id"). And we can see that the rows I need for group_id = 'e7e46753-2e99-4ee4-b77f-17136b01790e' are scattered in this Seq Scan result.

Let's have an idea of the order of the primary key, with a SELECT ... ORDER BY on the same columns:

yugabyte=# select * from messages_binary order by "timestamp", "group_id";
              timestamp                |               group_id               | other_column
---------------------------------------+--------------------------------------+--------------
 2021-11-07T20:00:22.000Z-933f552d0159 | f27ac68f-2a10-46f0-a8fe-77b99c0c5a66 |           42
 2021-11-07T20:00:22.000Z-ad01842bb691 | e7e46753-2e99-4ee4-b77f-17136b01790e |           42
 2021-11-07T20:00:22.000Z-c440295c4500 | 9c3d61e1-6d3f-4b95-9e08-46f485d10b75 |           42
 2021-11-07T20:00:23.000Z-1dcde16fc472 | 9c3d61e1-6d3f-4b95-9e08-46f485d10b75 |           42
 2021-11-07T20:00:23.000Z-c533a5e5623e | e7e46753-2e99-4ee4-b77f-17136b01790e |           42
 2021-11-07T20:00:23.000Z-ca98dd4de397 | f27ac68f-2a10-46f0-a8fe-77b99c0c5a66 |           42
 2021-11-07T20:00:24.000Z-631b45e66aba | e7e46753-2e99-4ee4-b77f-17136b01790e |           42
 2021-11-07T20:00:24.000Z-90342717a0c8 | 9c3d61e1-6d3f-4b95-9e08-46f485d10b75 |           42
 2021-11-07T20:00:24.000Z-b879daca6cb7 | f27ac68f-2a10-46f0-a8fe-77b99c0c5a66 |           42
(9 rows)

Now you can understand how inefficient the query is with the WHERE group_id = 'e7e46753-2e99-4ee4-b77f-17136b01790e' AND timestamp > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e' predicate. We start at the first row, because it verifies timestamp > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e', and from there we have to scan all rows and filter them afterwards. There is no data structure where the interesting rows can be found in a small range that can be read alone. This explains the Seq Scan.

We need a structure like this one, ordered on "group_id" first:

yugabyte=# select * from messages_binary order by "group_id", "timestamp";
              timestamp                |               group_id               | other_column
---------------------------------------+--------------------------------------+--------------
 2021-11-07T20:00:22.000Z-c440295c4500 | 9c3d61e1-6d3f-4b95-9e08-46f485d10b75 |           42
 2021-11-07T20:00:23.000Z-1dcde16fc472 | 9c3d61e1-6d3f-4b95-9e08-46f485d10b75 |           42
 2021-11-07T20:00:24.000Z-90342717a0c8 | 9c3d61e1-6d3f-4b95-9e08-46f485d10b75 |           42
 2021-11-07T20:00:22.000Z-ad01842bb691 | e7e46753-2e99-4ee4-b77f-17136b01790e |           42
 2021-11-07T20:00:23.000Z-c533a5e5623e | e7e46753-2e99-4ee4-b77f-17136b01790e |           42
 2021-11-07T20:00:24.000Z-631b45e66aba | e7e46753-2e99-4ee4-b77f-17136b01790e |           42
 2021-11-07T20:00:22.000Z-933f552d0159 | f27ac68f-2a10-46f0-a8fe-77b99c0c5a66 |           42
 2021-11-07T20:00:23.000Z-ca98dd4de397 | f27ac68f-2a10-46f0-a8fe-77b99c0c5a66 |           42
 2021-11-07T20:00:24.000Z-b879daca6cb7 | f27ac68f-2a10-46f0-a8fe-77b99c0c5a66 |           42
(9 rows)

On this structure (look at the row order, I didn't change the column order), the database engine can:

  • seek to the first group_id='e7e46753-2e99-4ee4-b77f-17136b01790e',
  • additionally seek to the first timestamp > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e',
  • read the following rows in sequence,
  • and stop at the last one with group_id='e7e46753-2e99-4ee4-b77f-17136b01790e'.

How to get this structure? Easy to define with another index:

yugabyte=# create index messages_binary_key2 on messages_binary ("group_id", "timestamp");
CREATE INDEX

Here is the execution plan:

yugabyte=# EXPLAIN SELECT * FROM messages_binary
           WHERE group_id = 'e7e46753-2e99-4ee4-b77f-17136b01790e'
           AND timestamp > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e';
                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using messages_binary_key2 on messages_binary  (cost=0.00..5.25 rows=10 width=52)
   Index Cond: ((group_id = 'e7e46753-2e99-4ee4-b77f-17136b01790e'::uuid) AND ("timestamp" > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e'::text))
(2 rows)

This is efficient. It can be even better if we add all the columns selected into the index, with the INCLUDE clause, like:

yugabyte=# create index messages_binary_key2 on messages_binary ("group_id", "timestamp") include ("other_column");
CREATE INDEX
yugabyte=# EXPLAIN SELECT * FROM messages_binary
           WHERE group_id = 'e7e46753-2e99-4ee4-b77f-17136b01790e'
           AND timestamp > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e';
                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using messages_binary_key2 on messages_binary  (cost=0.00..5.15 rows=10 width=52)
   Index Cond: ((group_id = 'e7e46753-2e99-4ee4-b77f-17136b01790e'::uuid) AND ("timestamp" > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e'::text))
(2 rows)

I've detailed this Index Only Scan technique in:
https://blog.yugabyte.com/how-a-distributed-sql-database-boosts-secondary-index-queries-with-index-only-scan/

With a little more analysis, there's a possibility that the index on ("timestamp","group_id") is not useful at all because there's a low chance that we have a query on the timestamp only without the group_id.
Then it would be better to define the table as:

yugabyte=# create table messages_binary (
             "timestamp" text,
             "group_id" uuid,
             "other_column" int,
             primary key ("group_id", "timestamp")
           );
CREATE TABLE

I insert more rows and look at the execution plan:

yugabyte=# insert into messages_binary
           with groups as (
             select gen_random_uuid() group_id from generate_series(1,1e3)
           )
           select
             to_char(now() + (generate_series(1,1e4) * interval '1 second'),
                     'yyyy-mm-ddThh24:mi:ss.000Z-')
               || substr(gen_random_uuid()::text,25) "timestamp",
             group_id,
             42 as "value"
           from groups;
yugabyte=# analyze messages_binary;
ANALYZE
yugabyte=# insert into messages_binary
           with groups as (
             select gen_random_uuid() group_id from generate_series(1,1e3)
           )
           select
             to_char(now() + (generate_series(1,1e4) * interval '1 second'),
                     'yyyy-mm-ddThh24:mi:ss.000Z-')
               || substr(gen_random_uuid()::text,25) "timestamp",
             group_id,
             42 as "value"
           from groups;
yugabyte=# EXPLAIN (analyze) SELECT * FROM messages_binary
           WHERE group_id = 'e7e46753-2e99-4ee4-b77f-17136b01790e'
           AND timestamp > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e';
                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using messages_binary_pkey on messages_binary  (cost=0.00..1214.95 rows=10530 width=52) (actual time=10.588..100.838 rows=10000 loops=1)
   Index Cond: ((group_id = 'e7e46753-2e99-4ee4-b77f-17136b01790e'::uuid) AND ("timestamp" > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e'::text))
 Planning Time: 0.067 ms
 Execution Time: 101.711 ms
(4 rows)

The 10,000 rows are retrieved in 100 milliseconds, in my small 8 vCPU lab, from a 10-million-row table, but I can tell you that it does not depend on the size of the table. This is another benefit of understanding access patterns: you know how it scales. This Index Scan is O(1) on the table size and O(n) on the result size.

This is the best you can do for this query, without additional indexes, just the right order in the primary key. In PostgreSQL this will still do random reads to the table, but at least all is filtered from the range scan on the primary key index. In YugabyteDB, all rows will be retrieved with sequential reads:

  • sharding is done on the primary key to read from one tablet only
  • SST files have bloom filters, so only the required files are read
  • and their block indexes allow reading only the required blocks

An additional remark from Alvaro Hernández questions the number of rows retrieved by the query:

And one more thing: James Long's execution plan shows (group_id = '983d5259-97ff-49e3-8829-101ab8dead92'::text) in the Index Conditions. Storing a UUID as text is not efficient; I've used the uuid datatype here.

In summary, before declaring that your "database doesn't scale", there is no shortcut to understanding your access pattern: the size of the expected result and the structure needed to access it efficiently. And do as James Long did: read the execution plan and ask the community 👍

Haki Benita: Lesser Known PostgreSQL Features


In 2006 Microsoft conducted a customer survey to find what new features users want in new versions of Microsoft Office. To their surprise, more than 90% of what users asked for already existed, they just didn't know about it. To address the "discoverability" issue, they came up with the "Ribbon UI" that we know from Microsoft Office products today.

Office is not unique in this sense. Most of us are not aware of all the features of the tools we use on a daily basis, especially when they are as big and extensive as PostgreSQL. With PostgreSQL 14 released just a few weeks ago, what better opportunity to shed light on some lesser-known features that already exist in PostgreSQL but that you may not know about.

In this article I present lesser known features of PostgreSQL.

Illustration by Eleanor Wright



Get the Number of Updated and Inserted Rows in an Upsert

INSERT ON CONFLICT, also known as "merge" (in Oracle) or "upsert" (a mashup of UPDATE and INSERT), is a very useful command, especially in ETL processes. Using the ON CONFLICT clause of an INSERT statement, you can tell the database what to do when a collision is detected in one or more key columns.

For example, here is a query to sync data in an employees table:

db=# WITH new_employees AS (
    SELECT * FROM (VALUES
        ('George', 'Sales', 'Manager', 1000),
        ('Jane', 'R&D', 'Developer', 1200)
    ) AS t(name, department, role, salary)
)
INSERT INTO employees (name, department, role, salary)
SELECT name, department, role, salary
FROM new_employees
ON CONFLICT (name) DO UPDATE SET
    department = EXCLUDED.department,
    role = EXCLUDED.role,
    salary = EXCLUDED.salary
RETURNING *;

  name  │ department │   role    │ salary
────────┼────────────┼───────────┼────────
 George │ Sales      │ Manager   │   1000
 Jane   │ R&D        │ Developer │   1200
INSERT 0 2

The query inserts new employee data to the table. If there is an attempt to add an employee with a name that already exists, the query will update that row instead.

You can see from the output of the command above, INSERT 0 2, that two employees were affected. But how many were inserted, and how many were updated? The output is not giving us any clue!

While I was looking for a way to improve the logging of some ETL process that used such query, I stumbled upon this Stack Overflow answer that suggested a pretty clever solution to this exact problem:

db=# WITH new_employees AS (
    SELECT * FROM (VALUES
        ('George', 'Sales', 'Manager', 1000),
        ('Jane', 'R&D', 'Developer', 1200)
    ) AS t(name, department, role, salary)
)
INSERT INTO employees (name, department, role, salary)
SELECT name, department, role, salary
FROM new_employees
ON CONFLICT (name) DO UPDATE SET
    department = EXCLUDED.department,
    role = EXCLUDED.role,
    salary = EXCLUDED.salary
RETURNING *, (xmax = 0) AS inserted;

  name  │ department │   role    │ salary │ inserted
────────┼────────────┼───────────┼────────┼──────────
 Jane   │ R&D        │ Developer │   1200 │ t
 George │ Sales      │ Manager   │   1000 │ f
INSERT 0 2

Notice the difference in the RETURNING clause. It includes the calculated field inserted, which uses the special column xmax to determine how many rows were inserted. From the data returned by the command, you can spot that a new row was inserted for "Jane", but "George" was already in the table, so the row was updated.

The xmax column is a special system column:

The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version.

In PostgreSQL, when a row is updated, the previous version is deleted, and xmax holds the ID of the deleting transaction. When the row is inserted, no previous row is deleted, so xmax is zero. This "trick" is cleverly using this behavior to distinguish between updated and inserted rows.
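
If you only need the totals rather than the per-row detail, the same xmax trick can be aggregated in a data-modifying CTE. Here is a small sketch built on the example above (FILTER is standard PostgreSQL aggregate syntax):

db=# WITH new_employees AS (
    SELECT * FROM (VALUES
        ('George', 'Sales', 'Manager', 1000),
        ('Jane', 'R&D', 'Developer', 1200)
    ) AS t(name, department, role, salary)
), upserted AS (
    INSERT INTO employees (name, department, role, salary)
    SELECT name, department, role, salary
    FROM new_employees
    ON CONFLICT (name) DO UPDATE SET
        department = EXCLUDED.department,
        role = EXCLUDED.role,
        salary = EXCLUDED.salary
    RETURNING (xmax = 0) AS inserted
)
SELECT count(*) FILTER (WHERE inserted) AS rows_inserted,
       count(*) FILTER (WHERE NOT inserted) AS rows_updated
FROM upserted;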


Grant Permissions on Specific Columns

Say you have a users table that contains sensitive information such as credentials, passwords or PII:

db=# CREATE TABLE users (
    id INT,
    username VARCHAR(20),
    personal_id VARCHAR(10),
    password_hash VARCHAR(256)
);
CREATE TABLE

db=# INSERT INTO users VALUES (1, 'haki', '12222227', 'super-secret-hash');
INSERT 0 1

The table is used by different people in your organization, such as analysts, to access data and produce ad-hoc reports. To allow access to analysts, you add a special user in the database:

db=# CREATE USER analyst;
CREATE USER
db=# GRANT SELECT ON users TO analyst;
GRANT

The user analyst can now access the users table:

db=# \connect db analyst
You are now connected to database "db" as user "analyst".
db=> SELECT * FROM users;
 id │ username │ personal_id │   password_hash
────┼──────────┼─────────────┼───────────────────
  1 │ haki     │ 12222227    │ super-secret-hash

As mentioned previously, analysts access users data to produce reports and conduct analysis, but they should not have access to sensitive information or PII.

To provide granular control over which data a user can access in a table, PostgreSQL allows you to grant permissions only on specific columns of a table:

db=# \connect db postgres
You are now connected to database "db" as user "postgres".
db=# REVOKE SELECT ON users FROM analyst;
REVOKE
db=# GRANT SELECT (id, username) ON users TO analyst;
GRANT

After revoking the existing select permission on the table, you granted analyst select permission only on the id and username columns. Now, analyst can no longer access the other columns:

db=# \connect db analyst
You are now connected to database "db" as user "analyst".
db=> SELECT * FROM users;
ERROR:  permission denied for table users
db=> SELECT id, username, personal_id FROM users;
ERROR:  permission denied for table users
db=> SELECT id, username FROM users;
 id │ username
────┼──────────
  1 │ haki

Notice that when the user analyst attempts to access any of the restricted columns, either explicitly or implicitly using *, they get a "permission denied" error.


Match Against Multiple Patterns

It's not uncommon to use pattern matching in SQL. For example, here is a query to find users with a "gmail.com" email account:

SELECT *
FROM users
WHERE email LIKE '%@gmail.com';

This query uses the wildcard '%' to find users with emails that end with "@gmail.com". What if, for example, in the same query you also want to find users with a "yahoo.com" email account?

SELECT *
FROM users
WHERE email LIKE '%@gmail.com' OR email LIKE '%@yahoo.com'

To match against either one of these patterns, you can construct an OR condition. In PostgreSQL however, there is another way to match against multiple patterns:

SELECT *
FROM users
WHERE email SIMILAR TO '%@gmail.com|%@yahoo.com'

Using SIMILAR TO you can match against multiple patterns and keep the query simple.

Another way to match against multiple patterns is using regexp:

SELECT *
FROM users
WHERE email ~ '@gmail\.com$|@yahoo\.com$'

When using regexp you need to be a bit more cautious. A period "." will match any character, so to match the literal period in gmail.com or yahoo.com, you need to add the escape character: "\.".

When I posted this on twitter I got some interesting responses. One comment from the official account of psycopg, a PostgreSQL driver for Python, suggested another way:

SELECT *
FROM users
WHERE email ~ ANY('{@gmail\.com$|@yahoo\.com$}')

This query uses the ANY operator to match against an array of patterns. If an email matches any of the patterns, the condition will be true. This approach is easier to work with from a host language such as Python:

with connection.cursor() as cursor:
    cursor.execute('''
        SELECT *
        FROM users
        WHERE email ~ ANY(ARRAY%(patterns)s)
    ''' % {
        'patterns': [
            '@gmail\.com$',
            '@yahoo\.com$',
        ],
    })

Unlike the previous approach that used SIMILAR TO, using ANY you can bind a list of patterns to the variable.


Find the Current Value of a Sequence Without Advancing It

If you ever needed to find the current value of a sequence, your first attempt was most likely using currval:

db=# SELECT currval('sale_id_seq');
ERROR:  currval of sequence "sale_id_seq" is not yet defined in this session

Just like me, you probably found that currval only works if the sequence was defined or used in the current session. Advancing a sequence for no good reason is usually not something you want to do, so this is not an acceptable solution.

In PostgreSQL 10 the pg_sequences view was added to provide easy access to information about sequences:

db=# SELECT * FROM pg_sequences WHERE sequencename = 'sale_id_seq';
─[ RECORD 1 ]─┬────────────
schemaname    │ public
sequencename  │ sale_id_seq
sequenceowner │ db
data_type     │ integer
start_value   │ 1
min_value     │ 1
max_value     │ 2147483647
increment_by  │ 1
cycle         │ f
cache_size    │ 1
last_value    │ 155

This view can answer your question, but it's not really a "lesser known feature", it's just another system view.

Another way to get the current value of a sequence is using the undocumented function pg_sequence_last_value:

db=# SELECT pg_sequence_last_value('sale_id_seq');
 pg_sequence_last_value
────────────────────────
                    155

It's not clear why this function is not documented, but I couldn't find any mention of it in the official documentation. Take that under consideration if you decide to use it.

Another interesting thing I found while I was researching this, is that you can query a sequence, just like you would a table:

db=# SELECT * FROM sale_id_seq;
 last_value │ log_cnt │ is_called
────────────┼─────────┼───────────
        155 │      10 │ t

This really makes you wonder what other types of objects you can query in PostgreSQL, and what you'll get in return.

It's important to note that this feature should not be used for anything except getting a cursory look at a sequence. You should not try to update IDs based on values from this output; for that, you should use nextval.


Use \copy With Multi-line SQL

If you work with psql a lot you probably use \COPY very often to export data from the database. I know I do. One of the most annoying things about \COPY is that it does not allow multi-line queries:

db=# \COPY (
\copy: parse error at end of line

When you try to add a new line to a \copy command you get this error message.

To overcome this restriction, my first idea was to use a view:

db=# CREATE VIEW v_department_dbas AS
    SELECT department, count(*) AS employees
    FROM emp
    WHERE role = 'dba'
    GROUP BY department
    ORDER BY employees;
CREATE VIEW
db=# \COPY (SELECT * FROM v_department_dbas) TO department_dbas.csv WITH CSV HEADER;
COPY 5
db=# DROP VIEW v_department_dbas;
DROP VIEW

This works, but if something fails in the middle it can leave views laying around. I like to keep my schema tidy, so I looked for a way to automatically cleanup after me. A quick search brought up temporary views:

db=# CREATE TEMPORARY VIEW v_department_dbas AS
    -- ... same query as before
CREATE VIEW
db=# \COPY (SELECT * FROM v_department_dbas) TO department_dbas.csv WITH CSV HEADER;
COPY 5

Using temporary views I no longer had to cleanup after myself, because temporary views are automatically dropped when the session terminates.

I used temporary views for a while, until I struck this little gem in the psql documentation:

db=# COPY (
    SELECT department, count(*) AS employees
    FROM emp
    WHERE role = 'dba'
    GROUP BY department
    ORDER BY employees
) TO STDOUT WITH CSV HEADER \g department_dbas.csv
COPY 5

Nice, right? Let's break it down:

  • Use COPY instead of \COPY: the COPY command is a server command executed in the server, and \COPY is a psql command with the same interface. So while \COPY does not support multi-line queries, COPY does!

  • Write results to STDOUT: Using COPY we can write results to a directory on the server, or write results to the standard output, using TO STDOUT.

  • Use \g to write STDOUT to local file: Finally, psql provides a command to write the output from standard output to a file.

Combining these three features did exactly what I wanted.

Copy expert

If you move a lot of data around, don't miss the fastest way to load data into PostgreSQL using Python.


Prevent Setting the Value of an Auto Generated Key

If you are using auto generated primary keys in PostgreSQL, it's possible you are still using the SERIAL datatype:

CREATE TABLE sale (
    id SERIAL PRIMARY KEY,
    sold_at TIMESTAMPTZ,
    amount INT
);

Behind the scenes, PostgreSQL creates a sequence to use when rows are added:

db=# INSERT INTO sale (sold_at, amount) VALUES (now(), 1000);
INSERT 0 1
db=# SELECT * FROM sale;
 id │           sold_at             │ amount
────┼───────────────────────────────┼────────
  1 │ 2021-09-25 10:06:56.646298+03 │   1000

The SERIAL data type is unique to PostgreSQL and has some known problems, so starting at version 10, the SERIAL datatype was softly deprecated in favor of identity columns:

CREATE TABLE sale (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    sold_at TIMESTAMPTZ,
    amount INT
);

Identity columns work very similarly to the SERIAL datatype:

db=# INSERT INTO sale (sold_at, amount) VALUES (now(), 1000);
INSERT 0 1
db=# SELECT * FROM sale;
 id │           sold_at             │ amount
────┼───────────────────────────────┼────────
  1 │ 2021-09-25 10:11:57.771121+03 │   1000

But, consider this scenario:

db=# INSERT INTO sale (id, sold_at, amount) VALUES (2, now(), 1000);
INSERT 0 1
db=# INSERT INTO sale (sold_at, amount) VALUES (now(), 1000);
ERROR:  duplicate key value violates unique constraint "sale_pkey"
DETAIL:  Key (id)=(2) already exists.

Why did it fail?

  • The first INSERT command explicitly provides the value 2 of the id column, so the sequence was not used.
  • The second INSERT command does not provide a value for id, so the sequence is used. The next value of the sequence happened to be 2, so the command failed with a unique constraint violation.

Auto-incrementing IDs rarely need to be set manually, and doing so can cause a mess. So how can you prevent users from setting them?

CREATE TABLE sale (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sold_at TIMESTAMPTZ,
    amount INT
);

Instead of using GENERATED BY DEFAULT, use GENERATED ALWAYS. To understand the difference, try the same scenario again:

db=# INSERT INTO sale (sold_at, amount) VALUES (now(), 1000);
INSERT 0 1
db=# INSERT INTO sale (id, sold_at, amount) VALUES (2, now(), 1000);
ERROR:  cannot insert into column "id"
DETAIL:  Column "id" is an identity column defined as GENERATED ALWAYS.
HINT:  Use OVERRIDING SYSTEM VALUE to override.

What changed?

  • The first INSERT does not provide a value for id and completes successfully.
  • The second INSERT command however, attempts to set the value 2 for id and fails!

In the error message, PostgreSQL is kind enough to offer a solution for when you actually do want to set the value for an identity column explicitly:

db=# INSERT INTO sale (id, sold_at, amount) OVERRIDING SYSTEM VALUE VALUES (2, now(), 1000);
INSERT 0 1

By adding the OVERRIDING SYSTEM VALUE to the INSERT command you explicitly instruct PostgreSQL to allow you to set the value of an identity column. You still have to handle a possible unique constraint violation, but you can no longer blame PostgreSQL for it!


Two More Ways to Produce a Pivot Table

In one of my previous articles I demonstrated how to produce pivot tables using conditional aggregates. After writing the article, I found two more ways to generate pivot tables in PostgreSQL.

Say you want to get the number of employees, at each role, in each department:

db=# WITH employees AS (
    SELECT * FROM (VALUES
        ('Haki', 'R&D', 'Manager'),
        ('Dan', 'R&D', 'Developer'),
        ('Jax', 'R&D', 'Developer'),
        ('George', 'Sales', 'Manager'),
        ('Bill', 'Sales', 'Developer'),
        ('David', 'Sales', 'Developer')
    ) AS t(name, department, role)
)
SELECT role, department, count(*)
FROM employees
GROUP BY role, department;

   role    │ department │ count
───────────┼────────────┼───────
 Developer │ Sales      │     2
 Manager   │ Sales      │     1
 Manager   │ R&D        │     1
 Developer │ R&D        │     2

A better way of viewing this would be as a pivot table. In psql you can use the \crosstabview command to transform the results of the last query to a pivot table:

db=# \crosstabview
   role    │ Sales │ R&D
───────────┼───────┼─────
 Developer │     2 │   2
 Manager   │     1 │   1

Magic!

By default, the command will produce the pivot table from the first two columns, but you can control that with arguments:

db=# \crosstabview department role
 department │ Developer │ Manager
────────────┼───────────┼─────────
 Sales      │         2 │       1
 R&D        │         2 │       1

Another, slightly less magical way to produce a pivot table is using the built-in tablefunc extension:

db=# CREATE EXTENSION tablefunc;
CREATE EXTENSION
db=# SELECT * FROM crosstab('
    SELECT role, department, count(*) AS employees
    FROM employees
    GROUP BY 1, 2
    ORDER BY role
', '
    SELECT DISTINCT department
    FROM employees
    ORDER BY 1
') AS t(role text, sales int, rnd int);

   role    │ sales │ rnd
───────────┼───────┼─────
 Developer │     2 │   2
 Manager   │     1 │   1

Using the function crosstab you can produce a pivot table. The downside of this method is that you need to define the output columns in advance. The advantage however, is that the crosstab function produces a table, which you can use as a sub-query for further processing.


Dollar Quoting

If you store text fields in your database, especially entire paragraphs, you are probably familiar with escape characters. For example, to include a single quote ' in a text literal you need to escape it using another single quote '':

db=# SELECT 'John''s Pizza';
   ?column?
──────────────
 John's Pizza

When text starts to get bigger, and include characters like backslashes and new lines, it can get pretty annoying to add escape characters. To address this, PostgreSQL provides another way to write string constants:

db=# SELECT $$a long
string with new lines
and 'single quotes'
and "double quotes
PostgreSQL doesn't mind ;)$$ AS text;
           text
───────────────────────────
 a long                   ↵
 string with new lines    ↵
 and 'single quotes'      ↵
 and "double quotes       ↵
 PostgreSQL doesn't mind ;)

Notice the dollar signs $$ at the beginning and end of the string. Anything in between $$ is treated as a string. PostgreSQL calls this "Dollar Quoting".

But there is more: if you happen to need the sign $$ inside the text, you can add a tag between the dollar signs, which makes this even more useful. For example:

db=# SELECT $JSON${
    "name": "John's Pizza",
    "tagline": "Best value for your $$"
}$JSON$ AS json;

                  json
─────────────────────────────────────────
 {                                      ↵
     "name": "John's Pizza",            ↵
     "tagline": "Best value for your $$"↵
 }

Notice that we chose to tag this block with $JSON$, so the sign "$$" is included as-is in the output.

You can also use this to quickly generate jsonb objects that include special characters:

db=# SELECT $JSON${
    "name": "John's Pizza",
    "tagline": "Best value for your $$"
}$JSON$::jsonb AS json;

                             json
───────────────────────────────────────────────────────────────
 {"name": "John's Pizza", "tagline": "Best value for your $$"}

The value is now a jsonb object which you can manipulate as you wish!


Comment on Database Objects

PostgreSQL has this nice little feature where you can add a comment on just about every database object. For example, adding a comment on a table:

db=# COMMENT ON TABLE sale IS 'Sales made in the system';
COMMENT

You can now view this comment in psql (and probably other IDEs):

db=# \dt+ sale
                                  List of relations
 Schema │ Name │ Type  │ Owner │ Persistence │    Size    │       Description
────────┼──────┼───────┼───────┼─────────────┼────────────┼──────────────────────────
 public │ sale │ table │ haki  │ permanent   │ 8192 bytes │ Sales made in the system

You can also add comments on table columns, and view them when using extended describe:

db=# COMMENT ON COLUMN sale.sold_at IS 'When was the sale finalized';
COMMENT

db=# \d+ sale
  Column  │           Type           │         Description
──────────┼──────────────────────────┼─────────────────────────────
 id       │ integer                  │
 sold_at  │ timestamp with time zone │ When was the sale finalized
 amount   │ integer                  │

You can also combine the COMMENT command with dollar quoting to include longer and more meaningful descriptions of, for example, functions:

COMMENT ON FUNCTION generate_random_string IS $docstring$
Generate a random string at a given length from a list of possible characters.

Parameters:

    - length (int): length of the output string
    - characters (text): possible characters to choose from

Example:

    db=# SELECT generate_random_string(10);
     generate_random_string
    ────────────────────────
     o0QsrMYRvp

    db=# SELECT generate_random_string(3, 'AB');
     generate_random_string
    ────────────────────────
     ABB
$docstring$;

This is a function I used in the past to demonstrate the performance impact of medium-sized texts. Now I no longer have to go back to the article to remember how to use the function, I have the docstring right there in the comments:

db=# \df+ generate_random_string
List of functions
────────────┬────────────────────────────────────────────────────────────────────────────────
Schema      │ public
Name        │ generate_random_string
/* ... */
Description │ Generate a random string at a given length from a list of possible characters.↵
            │                                                                               ↵
            │ Parameters:                                                                   ↵
            │                                                                               ↵
            │     - length (int): length of the output string                               ↵
            │     - characters (text): possible characters to choose from                   ↵
            │                                                                               ↵
            │ Example:                                                                      ↵
            │                                                                               ↵
            │     db=# SELECT generate_random_string(10);                                   ↵
            │      generate_random_string                                                   ↵
            │     ────────────────────────                                                  ↵
            │      o0QsrMYRvp                                                               ↵
            │                                                                               ↵
            │     db=# SELECT generate_random_string(3, 'AB');                              ↵
            │      generate_random_string                                                   ↵
            │     ────────────────────────                                                  ↵
            │      ABB                                                                      ↵

Keep a Separate History File Per Database

If you are working with CLI tools you probably use the ability to search past commands very often. In bash and psql, a reverse search is usually available by hitting CTRL + R.

If in addition to working with the terminal, you also work with multiple databases, you might find it useful to keep a separate history file per database:

db=# \set HISTFILE ~/.psql_history-:DBNAME

This way, you are more likely to find a relevant match for the database you are currently connected to. You can drop this in your ~/.psqlrc file to make it persistent.
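For example, a minimal sketch of such a ~/.psqlrc could look like this (just the one setting, adjust the path to taste):

-- contents of ~/.psqlrc
\set HISTFILE ~/.psql_history-:DBNAME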


Autocomplete Reserved Words in Uppercase

There is always a lot of debate (and jokes!) on whether keywords in SQL should be in lower or upper case. I think my opinion on this subject is pretty clear.

If like me, you like using uppercase keywords in SQL, there is an option in psql to autocomplete keywords in uppercase:

db=# selec<tab>
db=# select

db=# \set COMP_KEYWORD_UPPER upper

db=# selec<tab>
db=# SELECT

After setting COMP_KEYWORD_UPPER to upper, when you hit TAB for autocomplete, keywords will be autocompleted in uppercase.


Sleep for Interval

Delaying the execution of a program can be pretty useful for things like testing or throttling. To delay the execution of a program in PostgreSQL, the go-to function is usually pg_sleep:

db=# \timing
Timing is on.

db=# SELECT pg_sleep(3);
 pg_sleep
──────────

(1 row)

Time: 3014.913 ms (00:03.015)

The function sleeps for the given number of seconds. However, when you need to sleep for longer than just a few seconds, calculating the number of seconds can be annoying, for example:

db=# SELECT pg_sleep(14400);

How long will this function sleep for? Don't take out the calculator, the function will sleep for 4 hours.

To make it more convenient to sleep for longer periods of time, PostgreSQL offers another function:

db=# SELECT pg_sleep_for('4 hours');

Unlike its sibling pg_sleep, the function pg_sleep_for accepts an interval, which is much more natural to read and understand than the number of seconds.


Get the First or Last Row in a Group Without Sub-Queries

When I initially compiled this list I did not think about this feature as a lesser known one, mostly because I use it all the time. But to my surprise, I keep running into weird solutions to this problem, that can be easily solved with what I'm about to show you, so I figured it deserves a place on the list!

Say you have this table of students:

db=# SELECT * FROM students;

  name  │ class │ height
────────┼───────┼────────
 Haki   │ A     │    186
 Dan    │ A     │    175
 Jax    │ A     │    182
 George │ B     │    178
 Bill   │ B     │    167
 David  │ B     │    178

⚙ Table data

You can use the following CTE to reproduce queries in this section

WITH students AS (
    SELECT * FROM (VALUES
        ('Haki',   'A', 186),
        ('Dan',    'A', 175),
        ('Jax',    'A', 182),
        ('George', 'B', 178),
        ('Bill',   'B', 167),
        ('David',  'B', 178)
    ) AS t(name, class, height)
)
SELECT * FROM students;

How would you get the entire row of the tallest student in each class?

On first thought you might try something like this:

SELECT class, max(height) AS tallest
FROM students
GROUP BY class;

 class │ tallest
───────┼─────────
 A     │     186
 B     │     178

This gets you the height, but it doesn't get you the name of the student. As a second attempt, you might try to find the tallest students based on their height, using a sub-query:

SELECT *
FROM students
WHERE (class, height) IN (
    SELECT class, max(height) AS tallest
    FROM students
    GROUP BY class
);

  name  │ class │ height
────────┼───────┼────────
 Haki   │ A     │    186
 George │ B     │    178
 David  │ B     │    178

Now you have all the information about the tallest students in each class, but there is another problem.

side note

The ability to match a set of records like in the previous query ((class, height) IN (...)) is another lesser known, but very powerful, feature of PostgreSQL.

In class "B", there are two students with the same height, which also happen to be the tallest. Using the aggregate function MAX you only get the height, so you may encounter this type of situation.

The challenge with using MAX is that you pick rows based only on the height, which makes perfect sense in this case, but you still need a way to pick just one student. A different approach that lets you "rank" rows based on more than one column is using a window function:

SELECT
    students.*,
    ROW_NUMBER() OVER (
        PARTITION BY class
        ORDER BY height DESC, name
    ) AS rn
FROM students;

  name  │ class │ height │ rn
────────┼───────┼────────┼────
 Haki   │ A     │    186 │  1
 Jax    │ A     │    182 │  2
 Dan    │ A     │    175 │  3
 David  │ B     │    178 │  1
 George │ B     │    178 │  2
 Bill   │ B     │    167 │  3

To "rank" students bases on their height you can attach a row number for each row. The row number is determined for each class (PARTITION BY class) and ranked first by height in descending order, and then by the students' name (ORDER BY height DESC, name). Adding the student name in addition to the height makes the results deterministic (assuming the name is unique).

To get the rows of only the tallest student in each class you can use a sub-query:

SELECT name, class, height
FROM (
    SELECT
        students.*,
        ROW_NUMBER() OVER (
            PARTITION BY class
            ORDER BY height DESC, name
        ) AS rn
    FROM students
) AS ranked
WHERE rn = 1;

 name  │ class │ height
───────┼───────┼────────
 Haki  │ A     │    186
 David │ B     │    178

You made it! This is the entire row for the tallest student in each class.

Using DISTINCT ON

Now that you went through all of this trouble, let me show you an easier way:

SELECT DISTINCT ON (class) *
FROM students
ORDER BY class, height DESC, name;

 name  │ class │ height
───────┼───────┼────────
 Haki  │ A     │    186
 David │ B     │    178

Pretty nice, right? I was blown away when I first discovered DISTINCT ON. Coming from Oracle, there was nothing like it, and as far as I know, no database other than PostgreSQL offers it.

Intuitively understand DISTINCT ON

To understand how DISTINCT ON works, let's go over what it does step by step. This is the raw data in the table:

SELECT * FROM students;

  name  │ class │ height
────────┼───────┼────────
 Haki   │ A     │    186
 Dan    │ A     │    175
 Jax    │ A     │    182
 George │ B     │    178
 Bill   │ B     │    167
 David  │ B     │    178

Next, sort the data:

SELECT *
FROM students
ORDER BY class, height DESC, name;

  name  │ class │ height
────────┼───────┼────────
 Haki   │ A     │    186
 Jax    │ A     │    182
 Dan    │ A     │    175
 David  │ B     │    178
 George │ B     │    178
 Bill   │ B     │    167

Then, add the DISTINCT ON clause:

SELECT DISTINCT ON (class) *
FROM students
ORDER BY class, height DESC, name;

To understand what DISTINCT ON does at this point, we need to take two steps.

First, split the data to groups based on the columns in the DISTINCT ON clause, in this case by class:

  name  │ class │ height
─────────────────────────
 Haki   │ A     │    186  ┓
 Jax    │ A     │    182  ┣━━ class=A
 Dan    │ A     │    175  ┛

 David  │ B     │    178  ┓
 George │ B     │    178  ┣━━ class=B
 Bill   │ B     │    167  ┛

Next, keep only the first row in each group:

  name  │ class │ height
─────────────────────────
 Haki   │ A     │    186  ┣━━ class=A
 David  │ B     │    178  ┣━━ class=B

And there you have it! The tallest student in each class.

The only requirement DISTINCT ON has is that the leading columns in the ORDER BY clause match the columns in the DISTINCT ON clause. The remaining columns in the ORDER BY clause are used to determine which row is selected for each group.

To illustrate how the ORDER BY affects the results, consider this query to find the shortest student in each class:

SELECT DISTINCT ON (class) *
FROM students
ORDER BY class, height, name;

 name │ class │ height
──────┼───────┼────────
 Dan  │ A     │    175
 Bill │ B     │    167

To pick the shortest student in each class, you only have to change the sort order, so that the first row of each group is the shortest student.


Generate UUID Without Extensions

To generate UUIDs in PostgreSQL prior to version 13 you probably used the uuid-ossp extension:

db=# CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION

db=# SELECT uuid_generate_v4() AS uuid;
                 uuid
──────────────────────────────────────
 8e55146d-0ce5-40ab-a346-5dbd466ff5f2

Starting at version 13 there is a built-in function to generate random (version 4) UUIDs:

db=# SELECT gen_random_uuid() AS uuid;
                 uuid
──────────────────────────────────────
 ba1ac0f5-5d4d-4d80-974d-521dbdcca2b2

The uuid-ossp extension is still needed if you want to generate UUIDs other than version 4.
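For example, time-based version 1 UUIDs still come from the extension; a quick sketch:

db=# SELECT uuid_generate_v1() AS uuid;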


Generate Reproducible Random Data

Generating random data is very useful for many things, such as demonstrations or testing. In both cases, it's also useful to be able to reproduce the "random" data.

Using PostgreSQL's random function you can produce different types of random data. For example:

db=# SELECT
    random() AS random_float,
    ceil(random() * 10) AS random_int_0_10,
    '2022-01-01'::date + interval '1 days' * ceil(random() * 365) AS random_day_in_2022;

─[ RECORD 1 ]──────┬────────────────────
random_float       │ 0.6031888056092001
random_int_0_10    │ 3
random_day_in_2022 │ 2022-11-10 00:00:00

If you execute this query again, you will get different results:

db=# SELECT
    random() AS random_float,
    ceil(random() * 10) AS random_int_0_10,
    '2022-01-01'::date + interval '1 days' * ceil(random() * 365) AS random_day_in_2022;

─[ RECORD 1 ]──────┬────────────────────
random_float       │ 0.7363406030115378
random_int_0_10    │ 2
random_day_in_2022 │ 2022-02-23 00:00:00

To generate reproducible random data, you can use setseed:

db=# SELECT setseed(0.4050);
 setseed
─────────

(1 row)

db=# SELECT
    random() AS random_float,
    ceil(random() * 10) AS random_int_0_10,
    '2022-01-01'::date + interval '1 days' * ceil(random() * 365) AS random_day_in_2022
FROM generate_series(1, 2);

    random_float    │ random_int_0_10 │ random_day_in_2022
────────────────────┼─────────────────┼─────────────────────
 0.1924247516794324 │               9 │ 2022-12-17 00:00:00
 0.9720620908236377 │               5 │ 2022-06-13 00:00:00

If you execute the same block again in a new session, even in a different database, it will produce the exact same results:

otherdb=# SELECT setseed(0.4050);
 setseed
─────────

(1 row)

otherdb=# SELECT
    random() AS random_float,
    ceil(random() * 10) AS random_int_0_10,
    '2022-01-01'::date + interval '1 days' * ceil(random() * 365) AS random_day_in_2022
FROM generate_series(1, 2);

    random_float    │ random_int_0_10 │ random_day_in_2022
────────────────────┼─────────────────┼─────────────────────
 0.1924247516794324 │               9 │ 2022-12-17 00:00:00
 0.9720620908236377 │               5 │ 2022-06-13 00:00:00

Notice how the results are random, but still exactly the same. The next time you do a demonstration or share a script, make sure to include setseed so your results can be easily reproduced.


Add Constraints Without Validating Immediately

Constraints are an integral part of any RDBMS. They keep data clean and reliable, and should be used whenever possible. In living, breathing systems, you often need to add new constraints, and adding certain types of constraints may require very restrictive locks that interfere with the operation of the live system.

To illustrate, add a simple check constraint on a large table:

db=# ALTER TABLE orders ADD CONSTRAINT check_price_gt_zero CHECK (price >= 0);
ALTER TABLE
Time: 10745.662 ms (00:10.746)

This statement adds a check constraint on the price of an order, to make sure it's greater than or equal to zero. In the process of adding the constraint, the database scanned the entire table to make sure the constraint is valid for all the existing rows. The process took ~10s, and during that time, the table was locked.

In PostgreSQL, you can split the process of adding a constraint into two steps.

First, add the constraint and only validate new data, but don't check that existing data is valid:

db=# ALTER TABLE orders ADD CONSTRAINT check_price_gt_zero CHECK (price >= 0) NOT VALID;
ALTER TABLE
Time: 13.590 ms

The NOT VALID at the end tells PostgreSQL not to validate the new constraint for existing rows. This means the database does not have to scan the entire table. Notice how this statement took significantly less time compared to the previous one; it was almost instantaneous.

Next, validate the constraint for the existing data with a much more permissive lock that allows other operations on the table:

db=# ALTER TABLE orders VALIDATE CONSTRAINT check_price_gt_zero;
ALTER TABLE
Time: 11231.189 ms (00:11.231)

Notice how validating the constraint took roughly the same time as the first example, which added and validated the constraint. This reaffirms that when adding a constraint to an existing table, most time is spent validating existing rows. Splitting the process into two steps allows you to reduce the time the table is locked.

The documentation also mentions another use case for NOT VALID - enforcing a constraint only on future updates, even if there are some existing bad values. That is, you would add NOT VALID and never do the VALIDATE.

Check out this great article from the engineering team at Paypal about making schema changes without downtime, and my own tip to disable constraints and indexes during bulk loads.


Synonyms in PostgreSQL

Synonyms are a way to reference objects by another name, similar to symlinks in Linux. If you're coming from Oracle you are probably familiar with synonyms, but otherwise you may have never heard of them. PostgreSQL does not have a feature called "synonyms", but that doesn't mean it's not possible.

To have a name reference a different database object, you first need to understand how PostgreSQL resolves unqualified names. For example, if you are connected to the database with the user haki, and you reference a table foo, PostgreSQL will search for the following objects, in this order:

  1. haki.foo
  2. public.foo

This order is determined by the search_path parameter:

db=# SHOW search_path;
   search_path
─────────────────
 "$user", public

The first value, "$user" is a special value that resolves to the name of the currently connected user. The second value, public, is the name of the default schema.

To demonstrate some of the things you can do with search path, create a table foo in database db:

db=# CREATE TABLE foo (value TEXT);
CREATE TABLE

db=# INSERT INTO foo VALUES ('A');
INSERT 0 1

db=# SELECT * FROM foo;
 value
───────
 A
(1 row)

If for some reason you want the user haki to view a different object when they reference the name foo, you have two options:

1. Create an object named foo in a schema called haki:

db=# CREATE SCHEMA haki;
CREATE SCHEMA

db=# CREATE TABLE haki.foo (value text);
CREATE TABLE

db=# INSERT INTO haki.foo VALUES ('B');
INSERT 0 1

db=# \conninfo
You are connected to database "db" as user "haki"

db=# SELECT * FROM foo;
 value
───────
 B

Notice how when the user haki referenced the name foo, PostgreSQL resolved the name to haki.foo and not public.foo. This is because the schema haki comes before public in the search path.

2. Update the search path:

db=# CREATE SCHEMA synonyms;
CREATE SCHEMA

db=# CREATE TABLE synonyms.foo (value text);
CREATE TABLE

db=# INSERT INTO synonyms.foo VALUES ('C');
INSERT 0 1

db=# SHOW search_path;
   search_path
─────────────────
 "$user", public

db=# SELECT * FROM foo;
 value
───────
 A

db=# SET search_path TO synonyms, "$user", public;
SET

db=# SELECT * FROM foo;
 value
───────
 C

Notice how after changing the search path to include the schema synonyms, PostgreSQL resolved the name foo to synonyms.foo.

When are synonyms useful?

I used to think that synonyms are a code smell that should be avoided, but over time I found a few valid use cases for when they are useful. One of those use cases is zero downtime migrations.

When you are making changes to a table on a live system, you often need to support both the new and the old version of the application at the same time. This poses a challenge, because each version of the application expects the table to have a different structure.

Take for example a migration to remove a column from a table. While the migration is running, the old version of the application is active, and it expects the column to exist in the table, so you can't simply remove it. One way to deal with this is to release the new version in two stages - the first ignores the field, and the second removes it.

If however, you need to make the change in a single release, you can provide the old version with a view of the table that includes the column, and only then remove it. For that, you can use a "synonym":

db=# \conninfo
You are now connected to database "db" as user "app".

db=# SELECT * FROM users;
 username │ active
──────────┼────────
 haki     │ t

The application is connected to database db with the user app. You want to remove the column active, but the application is using this column. To safely apply the migration you need to "fool" the user app into thinking the column is still there while the old version is active:

db=# \conninfo
You are now connected to database "db" as user "admin".

db=# CREATE SCHEMA app;
CREATE SCHEMA

db=# GRANT USAGE ON SCHEMA app TO app;
GRANT

db=# CREATE VIEW app.users AS SELECT username, true AS active FROM public.users;
CREATE VIEW

db=# GRANT SELECT ON app.users TO app;
GRANT

To "fool" the user app, you created a schema by the name of the user, and a view with a calculated field active. Now, when the application is connected with user app, it will see the view and not the table, so it's safe to remove the column:

db=# \conninfo
You are now connected to database "db" as user "admin".

db=# ALTER TABLE users DROP COLUMN active;
ALTER TABLE

db=# \connect db app
You are now connected to database "db" as user "app".

db=# SELECT * FROM users;
 username │ active
──────────┼────────
 haki     │ t

You dropped the column and the application sees the calculated field instead! All that's left is some cleanup and you are done.
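That cleanup might look roughly like this sketch, run once no old-version clients remain:

db=# DROP VIEW app.users;
DROP VIEW

db=# DROP SCHEMA app;
DROP SCHEMA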


Find Overlapping Ranges

Say you have a table of meetings:

db=# SELECT * FROM meetings;

      starts_at      │        ends_at
─────────────────────┼─────────────────────
 2021-10-01 10:00:00 │ 2021-10-01 10:30:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00
 2021-10-01 12:30:00 │ 2021-10-01 12:45:00

⚙ Table data

You can use the following CTE to reproduce the queries in this section:

WITH meetings AS (
    SELECT
        starts_at::timestamptz AS starts_at,
        ends_at::timestamptz AS ends_at
    FROM (VALUES
        ('2021-10-01 10:00 UTC', '2021-10-01 10:30 UTC'),
        ('2021-10-01 11:15 UTC', '2021-10-01 12:00 UTC'),
        ('2021-10-01 12:30 UTC', '2021-10-01 12:45 UTC')
    ) AS t(starts_at, ends_at)
)
SELECT * FROM meetings;

You want to schedule a new meeting, but before you do that, you want to make sure it does not overlap with another meeting. There are several scenarios you need to consider:

  • [A] New meeting ends after an existing meeting starts
|-------NEW MEETING--------|
    |*******EXISTING MEETING*******|
  • [B] New meeting starts before an existing meetings ends
        |-------NEW MEETING--------|
|*******EXISTING MEETING*******|
  • [C] New meeting takes place during an existing meeting
    |----NEW MEETING----|
|*******EXISTING MEETING*******|
  • [D] Existing meeting takes place while the new meeting is scheduled
|--------NEW MEETING--------|
    |**EXISTING MEETING**|
  • [E] New meeting is scheduled at exactly the same time as an existing meeting
|--------NEW MEETING--------|
|*****EXISTING MEETING******|

To test a query that checks for overlaps, you can prepare a table with all the scenarios above, and try a simple condition:

WITH new_meetings AS (
    SELECT
        id,
        starts_at::timestamptz AS starts_at,
        ends_at::timestamptz AS ends_at
    FROM (VALUES
        ('A', '2021-10-01 11:10 UTC', '2021-10-01 11:55 UTC'),
        ('B', '2021-10-01 11:20 UTC', '2021-10-01 12:05 UTC'),
        ('C', '2021-10-01 11:20 UTC', '2021-10-01 11:55 UTC'),
        ('D', '2021-10-01 11:10 UTC', '2021-10-01 12:05 UTC'),
        ('E', '2021-10-01 11:15 UTC', '2021-10-01 12:00 UTC')
    ) AS t(id, starts_at, ends_at)
)
SELECT *
FROM meetings, new_meetings
WHERE
    new_meetings.starts_at BETWEEN meetings.starts_at AND meetings.ends_at
    OR new_meetings.ends_at BETWEEN meetings.starts_at AND meetings.ends_at;

      starts_at      │        ends_at      │ id │       starts_at     │        ends_at
─────────────────────┼─────────────────────┼────┼─────────────────────┼─────────────────────
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00

The first attempt found an overlap with 4 out of 5 scenarios. It did not detect the overlap for scenario D, where the new meeting starts before and ends after an existing meeting. To handle this scenario as well, you need to make the condition a bit longer:

WITH new_meetings AS (/* ... */)
SELECT *
FROM meetings, new_meetings
WHERE
    new_meetings.starts_at BETWEEN meetings.starts_at AND meetings.ends_at
    OR new_meetings.ends_at BETWEEN meetings.starts_at AND meetings.ends_at
    OR meetings.starts_at BETWEEN new_meetings.starts_at AND new_meetings.ends_at
    OR meetings.ends_at BETWEEN new_meetings.starts_at AND new_meetings.ends_at;

      starts_at      │        ends_at      │ id │       starts_at     │        ends_at
─────────────────────┼─────────────────────┼────┼─────────────────────┼─────────────────────
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00

The query now detects an overlap in all 5 scenarios, but, consider these additional scenarios:

  • [F] New meeting is scheduled immediately after an existing meetings
                            |--------NEW MEETING--------|
|*****EXISTING MEETING******|
  • [G] New meeting is scheduled to end immediately when an existing meeting starts
|--------NEW MEETING--------|
                            |*****EXISTING MEETING******|

Back-to-back meetings are very common, and they should not be detected as an overlap. Add the two scenarios to the test and try the query again:

WITH new_meetings AS (
    SELECT
        id,
        starts_at::timestamptz AS starts_at,
        ends_at::timestamptz AS ends_at
    FROM (VALUES
        ('A', '2021-10-01 11:10 UTC', '2021-10-01 11:55 UTC'),
        ('B', '2021-10-01 11:20 UTC', '2021-10-01 12:05 UTC'),
        ('C', '2021-10-01 11:20 UTC', '2021-10-01 11:55 UTC'),
        ('D', '2021-10-01 11:10 UTC', '2021-10-01 12:05 UTC'),
        ('E', '2021-10-01 11:15 UTC', '2021-10-01 12:00 UTC'),
        ('F', '2021-10-01 12:00 UTC', '2021-10-01 12:10 UTC'),
        ('G', '2021-10-01 11:00 UTC', '2021-10-01 11:15 UTC')
    ) AS t(id, starts_at, ends_at)
)
SELECT *
FROM meetings, new_meetings
WHERE
    new_meetings.starts_at BETWEEN meetings.starts_at AND meetings.ends_at
    OR new_meetings.ends_at BETWEEN meetings.starts_at AND meetings.ends_at
    OR meetings.starts_at BETWEEN new_meetings.starts_at AND new_meetings.ends_at
    OR meetings.ends_at BETWEEN new_meetings.starts_at AND new_meetings.ends_at;

      starts_at      │        ends_at      │ id │       starts_at     │        ends_at
─────────────────────┼─────────────────────┼────┼─────────────────────┼─────────────────────
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ F  │ 2021-10-01 12:00:00 │ 2021-10-01 12:10:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ G  │ 2021-10-01 11:00:00 │ 2021-10-01 11:15:00

The two back-to-back meetings, scenarios F and G, are incorrectly classified as overlaps. This is because the operator BETWEEN is inclusive. To implement this condition without using BETWEEN you would have to do something like this:

WITH new_meetings AS (/* ... */)
SELECT *
FROM meetings, new_meetings
WHERE
    (new_meetings.starts_at > meetings.starts_at AND new_meetings.starts_at < meetings.ends_at)
    OR (new_meetings.ends_at > meetings.starts_at AND new_meetings.ends_at < meetings.ends_at)
    OR (meetings.starts_at > new_meetings.starts_at AND meetings.starts_at < new_meetings.ends_at)
    OR (meetings.ends_at > new_meetings.starts_at AND meetings.ends_at < new_meetings.ends_at)
    OR (meetings.starts_at = new_meetings.starts_at AND meetings.ends_at = new_meetings.ends_at);

      starts_at      │        ends_at      │ id │       starts_at     │        ends_at
─────────────────────┼─────────────────────┼────┼─────────────────────┼─────────────────────
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00

The query correctly identifies scenarios A - E as overlaps, and does not identify the back-to-back scenarios F and G as overlaps. This is what you wanted. However, this condition is pretty crazy! It can easily get out of control.

This is where the following operator in PostgreSQL proves itself as extremely valuable:

WITH new_meetings AS (
    SELECT
        id,
        starts_at::timestamptz AS starts_at,
        ends_at::timestamptz AS ends_at
    FROM (VALUES
        ('A', '2021-10-01 11:10 UTC', '2021-10-01 11:55 UTC'),
        ('B', '2021-10-01 11:20 UTC', '2021-10-01 12:05 UTC'),
        ('C', '2021-10-01 11:20 UTC', '2021-10-01 11:55 UTC'),
        ('D', '2021-10-01 11:10 UTC', '2021-10-01 12:05 UTC'),
        ('E', '2021-10-01 11:15 UTC', '2021-10-01 12:00 UTC'),
        ('F', '2021-10-01 12:00 UTC', '2021-10-01 12:10 UTC'),
        ('G', '2021-10-01 11:00 UTC', '2021-10-01 11:15 UTC')
    ) AS t(id, starts_at, ends_at)
)
SELECT *
FROM meetings, new_meetings
WHERE (new_meetings.starts_at, new_meetings.ends_at) OVERLAPS (meetings.starts_at, meetings.ends_at);

      starts_at      │        ends_at      │ id │       starts_at     │        ends_at
─────────────────────┼─────────────────────┼────┼─────────────────────┼─────────────────────
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00

This is it! Using the OVERLAPS operator you can replace those 5 complicated conditions, and keep the query short and simple to read and understand.

Andreas 'ads' Scherbaum: John Naylor

PostgreSQL Person of the Week Interview with John Naylor: I was born in Oklahoma, USA, but have been nomadic since 2017. During that time I’ve mostly lived in Southeast Asia, but in 2020-21 I lived in Barbados and now the Dominican Republic. The “thing that need not be named” slowed me down but didn’t stop me. I haven’t commuted to an office since 2011, and I’ve always had a fascination for foreign cultures, so it was natural that I’d end up doing this.

Luca Ferrari: PostgreSQL USB Sticks in the Attic!


USB sticks I found in the attic…

PostgreSQL USB Sticks in the Attic!

TLDR: this is not a technical post!

Cleaning the attic, I found a couple of old PostgreSQL USB Sticks.
It happened that, back at the Italian PostgreSQL Day (PGDay.IT) 2012, we (at the time I was a happy member of ITPUG) created PostgreSQL-branded USB sticks to give away as gadgets to participants.
The USB stick was cool, with a soft rubber shell, a clear white and blue elephant logo on its sides, a size of 4 GB (which was quite common back then) and a necklace.
However, it had something that I didn’t like.
So, when I was the ITPUG president back in 2013, I decided to change the design of the USB stick (as well as doubling its size).
Let’s inspect the differences, and please apologize if the sticks printing is not clear anymore, but well, some years have gone by:





The upper stick is the 2012 edition, the lower one is the 2013 edition.
Do you spot the difference?
Yes, the 2013 edition USB stick did have the PostgreSQL logo on one side and the ITPUG logo on the other side, while the 2012 edition did not have any reference to the organizing and local user group ITPUG!

When I decided to give a new spark to ITPUG, I also decided to improve its visibility via such gadgets, which had been too generic and, for that reason, reusable at other events as plain PostgreSQL gadgets.

Therefore, the new gadget presented both PostgreSQL and the Italian users' group, no shame at all!

Luca Ferrari: My Perl Weekly Challenge Solutions in PostgreSQL


Pushing PostgreSQL solutions to my own repositories.

My Perl Weekly Challenge Solutions in PostgreSQL

Starting back at Perl Weekly Challenge 136, I decided to try to implement, whenever possible (to me), the challenges not only in Raku (i.e., Perl 6), but also in PostgreSQL (either pure SQL or plpgsql).

Recently, I modified my sync script that drags solutions from the official Perl Weekly Challenge repository to my own repositories, and of course, I added a way to synchronize PostgreSQL solutions.

The solutions are now available on GitHub under the PWC directory of my PostgreSQL examples repository.


Jonathan Katz: Multifactor SSO Authentication for Postgres on Kubernetes


Did you know that PostgreSQL 12 introduced a way for you to provide multifactor (aka "two-factor") authentication to your database?

This comes from the ability to set clientcert=verify-full as part of your pg_hba.conf file, which manages how clients can authenticate to PostgreSQL. When you specify clientcert=verify-full, PostgreSQL requires a connecting client to provide a certificate that is valid against its certificate authority (CA) and the certificate's common name (CN) matches the username the client is authenticating as. This is similar to using the cert method of authentication.

Where does the second factor come in? You can add clientcert=verify-full to another authentication method, such as the password-based scram-sha-256. When you do this, your client has to provide both a valid certificate AND password. Cool!
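For example, a pg_hba.conf entry along these lines requires both factors; this is an illustrative sketch, so adjust the database, user and address range to your environment:

# TYPE     DATABASE  USER  ADDRESS      METHOD          OPTIONS
hostssl    all       all   0.0.0.0/0    scram-sha-256   clientcert=verify-full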

If you have a public key infrastructure (PKI) set up, you effectively have a single-sign on system for your PostgreSQL databases. You can then treat the password for the user in a local database as a "second factor" for logging in. Again, cool!

Let's put this all together, and see how we can deploy a multifactor single sign-on (SSO) authentication system for Postgres on Kubernetes using cert-manager and PGO, the open source Postgres Operator from Crunchy Data!

Frits Hoogland: Postgres pgagroal connectionpool


This blogpost is about a connection pool that is lesser known than pgbouncer: pgagroal. Both are so-called 'external connection pools', which means they can serve application/user connections but are not part of an application.

They also serve the same function, which is to act as a proxy between clients and applications on one side and a postgres instance on the other side. In that position, the first obvious advantage is that it can perform as an edge service, concentrating connections from one network and proxying the requests to the database in a non-exposed network.

Another advantage is that the client/application side connections are decoupled from the database side connections, and it can therefore serve badly behaving applications (which create and destroy connections to a database repeatedly) by linking a database connection request to an already established database connection, instead of initializing and destroying a connection each time.

CentOS/RHEL/Alma/Rocky/etc. 8 only

Pgagroal is EL version 8 only, because its build scripts check minimal required versions. When you try to build pgagroal on CentOS 7, it will error with the message:

CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 3.14.0 or higher is required.  You are running version 2.8.12.2

'EL' is a general naming for all Linux distributions that take RedHat's Enterprise distribution as a basis.

Installation

However, when you are on EL version 8, you can use the postgres yum repository to install pgagroal in a very simple way. There is no need to download the source and compile it yourself.

  1. Add the EL 8 postgres yum repositories:

    sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
    
  2. Install pgagroal:

    sudo dnf -y install pgagroal
    
  3. Add the pgagroal user:

    sudo useradd pgagroal
    

Now you're set, and can use the pgagroal by starting pgagroal via systemd: sudo systemctl start pgagroal.

Configuration: autostart

By default, pgagroal adds a systemd unit file, but does not enable it (so it is not started at boot), and does not start it. If you want pgagroal to be started by systemd automatically, you must enable the systemd unit: sudo systemctl enable pgagroal.

Configuration: listen configuration

In order to be defensive and careful, pgagroal by default listens on localhost, at port 2345, which is the reverse of the postgres default port 5432. If you want to use pgagroal as a connection pool in front of a postgres instance, you should probably change the host and maybe the port settings in the [pgagroal] section of the /etc/pgagroal/pgagroal.conf file, so that clients can reach and communicate with pgagroal at the chosen host:port combination.

Configuration: server section

To specify where pgagroal needs to connect to, there is a section called [primary] in the /etc/pgagroal/pgagroal.conf file, which allows you to set host and port.

This is where pgagroal is fundamentally different from pgbouncer: pgbouncer allows you to specify multiple databases on multiple machines (see 'section [databases]'), while pgagroal allows you to specify a single primary server.
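As an illustration, a minimal /etc/pgagroal/pgagroal.conf could look roughly like the sketch below; this is based on my reading of the pgagroal documentation, so verify the exact keys against the docs for your version:

[pgagroal]
host = *
port = 2345
pipeline = auto

[primary]
host = localhost
port = 5432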

Configuration: pgagroal_hba.conf

Just like postgres, pgagroal can perform host based authentication using its own hba.conf file in /etc/pgagroal, called /etc/pgagroal/pgagroal_hba.conf, which has the same fields as a normal postgres hba.conf file (type, database, user, address, method). By default it performs no authentication.
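An entry could look like the following sketch, assuming pgagroal accepts PostgreSQL-style method names (check the pgagroal documentation for the exact set it supports):

# TYPE  DATABASE  USER  ADDRESS       METHOD
host    all       all   10.0.0.0/24   md5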

Configuration: pgagroal_databases.conf

For the configured postgres instance at the host and port number set in the /etc/pgagroal/pgagroal.conf file in the [primary] section, pgagroal can be configured to apply limits for a database, a user or both. The limit is the number of connections for a database, user or database and user combination.

A number of initial connections (connections created before an application or user requests them) can also be set, but for that a user definition must be created, so pgagroal can use that username and password to authenticate and build a pool of connections.

Configuration: pipeline

Another really important configuration setting in /etc/pgagroal/pgagroal.conf is pipeline. The setting of pipeline defines how a connection is managed by pgagroal. The default value is 'auto', which makes pgagroal choose the pipeline setting based on the configuration.

The most minimal and therefore fastest implementation is 'performance'. This setting does not support transport layer security, and binds a client connection to a database connection for the duration of the session.

The next pipeline configuration option is 'session', which, quite obviously, also binds a client connection to a database connection for the duration of the session, but supports all configuration options.

The last pipeline configuration option is 'transaction'. This is a special configuration, because it binds a client to a database connection for the duration of a transaction.

This has the wonderful property that you can have a so-called 'asynchronous connection count', which means that you can have (many) more client connections than database connections. In other words: this is a potential solution for the often excessively oversized application connection pools.

But there is a huge caveat: because of the dynamic transactional binding of clients to database connections, a client cannot use any construct that sets and depends on server-side session state. Concretely, this means things like SET, LISTEN, WITH HOLD CURSOR, PREPARE and DEALLOCATE cannot be used.

Configuration: metrics

A feature that is not present in pgbouncer is the ability to expose runtime statistics in prometheus format, which means statistics can be scraped by a prometheus server.
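If I read the pgagroal documentation correctly, this is enabled by pointing a metrics setting in the [pgagroal] section at a port; the sketch below is an assumption to verify against the docs:

[pgagroal]
metrics = 2346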

Conclusion

I cannot come to a definite verdict between pgbouncer and pgagroal. There are reports of issues with pgbouncer in the past, of which I don't know the current state. Pgbouncer configuration feels much less straightforward than pgagroal configuration.

But pgbouncer can serve as a proxy to multiple machines, while pgagroal is limited to one. Pgagroal advertises an explicit design for performance, which I have not tested, while pgbouncer seems to be more general-purpose.

Dimitri Fontaine: An introduction to the pg_auto_failover project


We just released pg_auto_failover version 1.6.3 on GitHub, and the binary packages should be already available at the usual PGDG and CitusData places, both for debian based distributions and RPM based distributions too.

This article is an introduction to the pg_auto_failover project: we answer the Five W questions, starting with why does the project exist in the first place?

TL;DR pg_auto_failover is an awesome project. It fills the gap between “Postgres is awesome, makes developing my application so much easier, it solves so many problems for me!” and the next step “so, how do I run Postgres in Production?”. If you’re not sure how to bridge that gap yourself, how to deploy your first production system with automated failover, then pg_auto_failover is for you. It is simple to use, user friendly, and well documented. Star it on the pg_auto_failover GitHub repository and get started today. Consider contributing to the project, it is fully Open Source, and you are welcome to join us.

Buckle up, our guide tour is starting now!

Ryan Booz: Generating more realistic sample time-series data with PostgreSQL generate_series()


In this three-part series on generating sample time-series data, we demonstrate how to use the built-in PostgreSQL function, generate_series(), to more easily create large sets of data to help test various workloads, database features, or just to create fun samples.

In part 1 of the series, we reviewed how generate_series() works, including the ability to join multiple series into a larger table of time-series data - through a feature known as a CROSS (or Cartesian) JOIN. We ended the first post by showing you how to quickly calculate the number of rows a query will produce and modify the parameters for generate_series() to fine-tune the size and shape of the data.

However, there was one problem with the data we could produce at the end of the first post. The data that we were able to generate was very basic and not very realistic. Without more effort, using functions like random() to generate values doesn't provide much control over precisely what numbers are produced, so the data still feels more fake than we might want.

This second post will demonstrate a few ways to create more realistic-looking data beyond a column or two of random decimal values. Read on for more.

In the coming weeks, part 3 of this blog series will add one final tool to the mix - combining the data formatting techniques below with additional equations and relational data to shape your sample time-series output into something that more closely resembles real-life applications.

By the end of this series, you'll be ready to test almost any feature that TimescaleDB offers and create quick datasets for your testing and demos!

A brief review of generate_series()

In the first post, we demonstrated how generate_series() (a Set Returning Function) could quickly create a data set based on a range of numeric values or dates. The generated data is essentially an in-memory table that can quickly create large sets of sample data.

-- create a series of values, 1 through 5, incrementing by 1
SELECT * FROM generate_series(1,5);

generate_series|
---------------|
              1|
              2|
              3|
              4|
              5|


-- generate a series of timestamps, incrementing by 1 hour
SELECT * from generate_series('2021-01-01','2021-01-02', INTERVAL '1 hour');

    generate_series     
------------------------
 2021-01-01 00:00:00+00
 2021-01-01 01:00:00+00
 2021-01-01 02:00:00+00
 2021-01-01 03:00:00+00
 2021-01-01 04:00:00+00
...

We then discussed how the data quickly becomes more complex as we join the various sets together (along with some value returning functions) to create a multiple of both sets together.

This example from the first post joined a timestamp set, a numeric set, and the random() function to create fake CPU data for four fake devices over time.

-- there is an implicit CROSS JOIN between the two generate_series() sets
SELECT time, device_id, random()*100 as cpu_usage 
FROM generate_series('2021-01-01 00:00:00','2021-01-01 04:00:00',INTERVAL '1 hour') as time, 
generate_series(1,4) device_id;


time               |device_id|cpu_usage          |
-------------------+---------+-------------------+
2021-01-01 00:00:00|        1|0.35415126479989567|
2021-01-01 01:00:00|        1| 14.013393572770028|
2021-01-01 02:00:00|        1|   88.5015939122006|
2021-01-01 03:00:00|        1|  97.49037810105996|
2021-01-01 04:00:00|        1|  50.22781125586846|
2021-01-01 00:00:00|        2|  46.41196423062297|
2021-01-01 01:00:00|        2|  74.39903569177027|
2021-01-01 02:00:00|        2|  85.44087332221935|
2021-01-01 03:00:00|        2|  4.329394730750735|
2021-01-01 04:00:00|        2| 54.645873866589056|
2021-01-01 00:00:00|        3|  63.01888063314749|
2021-01-01 01:00:00|        3|  21.70606884856987|
2021-01-01 02:00:00|        3|  32.47610779097485|
2021-01-01 03:00:00|        3| 47.565982341726354|
2021-01-01 04:00:00|        3|  64.34867263419619|
2021-01-01 00:00:00|        4|   78.1768041898232|
2021-01-01 01:00:00|        4|  84.51505102850199|
2021-01-01 02:00:00|        4| 24.029611792753514|
2021-01-01 03:00:00|        4|  17.08996115345549|
2021-01-01 04:00:00|        4| 29.642690955760997|

And finally, we talked about how to calculate the total number of rows your query would generate based on the time range, the interval between timestamps, and the number of "things" for which you are creating fake data.

 Range of readings │ Length of interval │ Number of "devices" │ Total rows
───────────────────┼────────────────────┼─────────────────────┼────────────
 1 year            │ 1 hour             │ 4                   │ 35,040
 1 year            │ 10 minutes         │ 100                 │ 5,256,000
 6 months          │ 5 minutes          │ 1,000               │ 52,560,000
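For example, the first row works out to 365 days × 24 readings per day × 4 devices = 35,040 rows.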

Still, the main problem remains. Even if we can generate 50 million rows of data with a few lines of SQL, the data we generate isn't very realistic. It's all random numbers, with lots of decimals and minimal variation.

As we saw in the query above (generating fake CPU data), any columns of data that we add to the SELECT query are added to each row of the resulting set. If we add static text (like 'Hello, Timescale!'), that text is repeated for every row. Likewise, adding a function as a column value will be called one time for each row of the final set.

That's what happened with the random() function in the CPU data example. Every row has a different value because the function is called separately for each row of generated data. We can use this to our advantage to begin making the data look more realistic.

With a little more thought and custom PostgreSQL functions, we can start to bring our sample data "to life."

What is realistic data?

This feels like a good time to make sure we're on the same page. What do I mean by "realistic" data?

Using the basic techniques we've already discussed allows you to create a lot of data quickly. In most cases, however, you often know what the data you're trying to explore looks like. It's probably not a bunch of decimal or integer values. Even if the data you're trying to mimic are just numeric values, they likely have valid ranges and maybe a predictable frequency.

Take our simple example of CPU and temperature data from above. With just two fields, we have a few choices to make if we want the generated data to feel more realistic.

  • Is CPU a percentage? Out of 100% or are we representing multi-core CPUs that can present as 200%, 400%, or 800%?
  • Is temperature measured in Fahrenheit or Celsius? What are reasonable values for CPU temperature in each unit? Do we store temperature with decimals or as an integer in the schema?
  • What if we added a "note" field to the schema for messages that our monitoring software might add to the readings from time to time? Would every reading have a note or just when a threshold was reached? Is there a special diagnostic message at the top of each hour that we need to replicate in some way?

Using random() and static text by themselves allows us to generate lots of data with many columns, but it's not going to be very interesting or as useful in testing features in the database.

That's the goal of the second and third posts in this series, helping you to produce sample data that looks more like the real thing without much extra work. Yes, it will still be random, but it will be random within constraints that help you feel more connected to the data as you explore various aspects of time-series data.

And, by using functions, all of the work is easily reusable from table to table.

Walk before you run

In each of the examples below, we'll approach our solutions much as we learned in elementary math class: show your work! It's often difficult to create a function or procedure in PostgreSQL without playing with a plain SQL statement first. This abstracts away the need to think about function inputs and outputs at the outset so that we can focus on how the SQL works to produce the value we want.

Therefore, the examples below show you how to get a value (random numbers, text, JSON, etc.) in a SELECT statement first before converting the SQL into a function that can be reused. This kind of iterative process is a great way to learn features of PostgreSQL, particularly when it's combined with generate_series().

So, take one foot and put it in front of the other, and let's start creating better sample data.

Creating more realistic numbers

In time-series data, numeric values are often the most common data type. Using a function like random() without any other formatting creates very… well... random (and precise) numbers with lots of decimal points. While it works, the values aren't realistic. Most users and devices aren't tracking CPU usage to 12+ decimals. We need a way to manipulate and constrain the final value that's returned in the query.

For numeric values, PostgreSQL provides many built-in functions to modify the output. In many cases, using round() and floor() with basic arithmetic can quickly start shaping the data in a way that better fits your schema and use case.

Let's modify the example query for getting device metrics, returning values for CPU and temperature. We want to update the query to ensure that the data values are "customized" for each column, returning values within a specific range and precision. Therefore, we need to apply a standard formula to each numeric value in our SELECT query.

Final value = random() * (max allowed value - min allowed value) + min allowed value

This equation will always generate a decimal value between (and inclusive of) the min and max value. If random() returns a value of 1, the final output will equal the maximum value. If random() returns a value of 0, then the result will equal the minimum value. Any other number that random() returns will produce some output between the min and max values.
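For example, with a minimum of 3 and a maximum of 100, a random() value of 0.5 yields 0.5 * (100 - 3) + 3 = 51.5.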

Depending on whether we want a decimal or integer value, we can further format the "final value" of our formula with round() and floor().

This example produces a reading every minute for one hour for 10 devices. The cpu value will always fall between 3 and 100 (with four decimals of precision), and the temperature will always be an integer between 28 and 83.

SELECT
  time,
  device_id,
  round((random()* (100-3) + 3)::NUMERIC, 4) AS cpu,
  floor(random()* (83-28) + 28)::INTEGER AS tempc
FROM 
	generate_series(now() - interval '1 hour', now(), interval '1 minute') AS time, 
	generate_series(1,10,1) AS device_id;


time                         |device_id|cpu    |tempc        |
-----------------------------+---------+-------+-------------+
2021-11-03 12:47:01.181 -0400|        1|53.7301|           61|
2021-11-03 12:48:01.181 -0400|        1|34.7655|           46|
2021-11-03 12:49:01.181 -0400|        1|78.6849|           44|
2021-11-03 12:50:01.181 -0400|        1|95.5484|           64|
2021-11-03 12:51:01.181 -0400|        1|86.3073|           82|
…|...|...|...

By using our simple formula and formatting the result correctly, the query produced the "curated" output (random as it is) we wanted.

The power of functions

But there's also a bit of a letdown here, isn't there? Typing that formula repeatedly for each value - trying to remember the order of parameters and when I need to cast a value - will become tedious quickly. After all, you only have so many keystrokes left.

The solution is to create and use PostgreSQL functions that can take the inputs we need, do the correct calculations, and return the formatted value that we want. There are many ways we could accomplish a calculation like this in a function. Use this example as a starting place for your learning and exploration.

Note:In this example, I chose to return the value from this function as a numeric data type because it can return values that look like integers (no decimals) or floats (decimals). As long as the return values are inserted into a table with the intended schema, this is a "trick" to visually see what we expect - an integer or a float. In general, the numeric data type will often perform worse in queries and features like compression because of how numeric values are represented internally. We recommend avoiding numeric types in schema design whenever possible, preferring the float or integer types instead.

/*
 * Function to create a random numeric value between two numbers
 * 
 * NOTICE: We are using the type of 'numeric' in this function in order
 * to visually return values that look like integers (no decimals) and 
 * floats (with decimals). However, if inserted into a table, the assumption
 * is that the appropriate column type is used. The `numeric` type is often
 * not the correct or most efficient type for storing numbers in a table.
 */
CREATE OR REPLACE FUNCTION random_between(min_val numeric, max_val numeric, round_to int=0) 
   RETURNS numeric AS
$$
 DECLARE
 	value NUMERIC = random()* (min_val - max_val) + max_val;
BEGIN
   IF round_to = 0 THEN 
	 RETURN floor(value);
   ELSE 
   	 RETURN round(value,round_to);
   END IF;
END
$$ language 'plpgsql';

This example function uses the minimum and maximum values provided, applies the "range" formula we discussed earlier, and finally returns a numeric value that either has decimals (to the specified number of digits) or not. Using this function in our query, we can simplify creating formatted values for sample data, and it cleans up the SQL, making it easier to read and use.

SELECT
  time,
  device_id,
  random_between(3,100, 4) AS cpu,
  random_between(28,83) AS temperature_c
FROM 
	generate_series(now() - interval '1 hour', now(), interval '1 minute') AS time, 
	generate_series(1,10,1) AS device_id;

This query provides the same formatted output, but now it's much easier to repeat the process.

Creating more realistic text

What about text? So far, in both articles, we've only discussed how to generate numeric data. We all know, however, that time-series data often contain more than just numeric values. Let's turn to another common data type: text.

Time-series data often contains text values. When your schema contains log messages, item names, or other identifying information stored as text, we want to generate sample text that feels more realistic, even if it's random.

Let's consider the query used earlier that creates CPU and temperature data for a set of devices. If the devices were real, the data they create might contain an intermittent status message of varying length.

To figure out how to generate this random text, we will follow the same process as before, working directly in a stand-alone SQL query before moving our solution into a reusable function. After some initial attempts (and ample Googling), I came up with this example for producing random text of variable length using a defined character set. As with the random_between() function above, this can be modified to suit your needs. For instance, it would be fairly easy to get unique, random hexadecimal values by limiting the set of characters and lengths.

Let your creativity guide you.

WITH symbols(characters) as (VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789 {}')),
w1 AS (
	SELECT string_agg(substr(characters, (random() * length(characters) + 1) :: INTEGER, 1), '') r_text, 'g1' AS idx
	FROM symbols,
generate_series(1,10) as word(chr_idx) -- word length
	GROUP BY idx)
SELECT
  time,
  device_id,
  random_between(3,100, 4) AS cpu,
  random_between(28,83) AS temperature_c,
  w1.r_text AS note
FROM w1, generate_series(now() - interval '1 hour', now(), interval '1 minute') AS time, 
	generate_series(1,10,1) AS device_id
ORDER BY 1,2;

time                         |device_id|cpu     |temperature_c|note      |
-----------------------------+---------+--------+-------------+----------+
2021-11-03 16:49:24.218 -0400|        1| 88.3525|           50|I}3U}FIsX9|
2021-11-03 16:49:24.218 -0400|        2| 29.5313|           53|I}3U}FIsX9|
2021-11-03 16:49:24.218 -0400|        3| 97.6065|           70|I}3U}FIsX9|
2021-11-03 16:49:24.218 -0400|        4| 96.2170|           40|I}3U}FIsX9|
2021-11-03 16:49:24.218 -0400|        5| 53.2318|           82|I}3U}FIsX9|
2021-11-03 16:49:24.218 -0400|        6| 73.7244|           56|I}3U}FIsX9|

In this case, it was easier to generate a random value inside of a CTE that we could reference later in the query. However, this approach has one problem that's pretty easy to spot in the first few rows of returned data.

While the CTE does create random text of 10 characters (go ahead and run it a few times to verify), the value of the CTE is generated once per query execution and then reused, repeating the same result over and over for every row. Once we transfer the query into a function, we expect to see a different value for each row.
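
Here is a minimal sketch of that difference (the aliases are arbitrary; it assumes the behavior of current PostgreSQL versions, where a CTE containing a volatile function such as random() is evaluated only once per query):

-- the CTE computes random() once, so every row in the result shows the same value
WITH one_value AS (
	SELECT random() AS r
)
SELECT gs AS n, r FROM one_value, generate_series(1,3) AS gs;

-- calling the volatile function in the select list evaluates it for every row
SELECT gs AS n, random() AS r FROM generate_series(1,3) AS gs;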

For this second example function to generate "words" of random lengths (or no text at all in some cases), the user will need to provide an integer for the minimum and maximum length of the generated text. After some testing, we also added a simple randomizing feature.

Notice the IF...THEN condition that we added. Any time the generated number is divided by five and has a remainder of zero or one, the function will not return a text value. There is nothing special about this approach to providing randomness to the frequency of the output, so feel free to adjust this part of the function to suit your needs.

/*
 * Function to create random text, of varying length
 */
CREATE OR REPLACE FUNCTION random_text(min_val INT=0, max_val INT=50) 
   RETURNS text AS
$$
DECLARE 
	word_length NUMERIC  = floor(random() * (max_val-min_val) + min_val)::INTEGER;
	random_word TEXT = '';
BEGIN
	-- only produce a word when word_length % 5 is greater than 1. This gives
	-- some randomness to when words are produced or not. Adjust for your tastes.
	IF(word_length % 5) > 1 THEN
	SELECT * INTO random_word FROM (
		WITH symbols(characters) AS (VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789 '))
		SELECT string_agg(substr(characters, (random() * length(characters) + 1) :: INTEGER, 1), ''), 'g1' AS idx
		FROM symbols
		JOIN generate_series(1,word_length) AS word(chr_idx) on 1 = 1 -- word length
		group by idx) a;
	END IF;
	RETURN random_word;
END
$$ LANGUAGE plpgsql;

When we use this function to add random text to our sample time-series query, notice that the text is random in length (between 2 and 10 characters) and frequency.

SELECT
  time,
  device_id,
  random_between(3,100, 4) AS cpu,
  random_between(28,83) AS temperature_c,
  random_text(2,10) AS note
FROM generate_series(now() - interval '1 hour', now(), interval '1 minute') AS time, 
	generate_series(1,10,1) AS device_id
ORDER BY 1,2;

time                         |device_id|cpu     |temperature_c|note     |
-----------------------------+---------+--------+-------------+---------+
2021-11-04 14:17:03.410 -0400|        1| 86.5780|           67|         |
2021-11-04 14:17:03.410 -0400|        2|  3.5370|           76|pCVBp AZ |
2021-11-04 14:17:03.410 -0400|        3| 59.7085|           28|kMrr     |
2021-11-04 14:17:03.410 -0400|        4| 69.6153|           46|3UdA     |
2021-11-04 14:17:03.410 -0400|        5| 33.0906|           56|d0sSUilx |
2021-11-04 14:17:03.410 -0400|        6| 44.2837|           74|         |
2021-11-04 14:17:03.410 -0400|        7| 14.2550|           81|TOgbHOU  |
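
If you want to measure how often the note comes back empty, a probe like the following can help (a sketch; it assumes, as in the function above, that an empty string means no text was produced):

SELECT round(avg((note = '')::int), 2) AS empty_ratio
FROM (
	SELECT random_text(2,10) AS note
	FROM generate_series(1, 10000)
) AS t;
-- word lengths 2 through 9 are possible here, and lengths 5 and 6 fail the
-- modulo-5 test, so roughly a quarter of the notes should come back empty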

Hopefully, you're starting to see a pattern. Using generate_series() and some custom functions can help you create time-series data of many shapes and sizes.

We've demonstrated ways to create more realistic numbers and text data because they are the primary data types used in time-series data. Are there any other data types included with time-series data that you might need to generate with your sample data?

What about JSON values?

Creating sample JSON

Note: The sample queries below create JSON strings as output, with the intention that they would be inserted into a table for further testing and learning. In PostgreSQL, JSON string data can be stored in a JSON or JSONB column, each providing different features for querying and displaying the JSON data. In most circumstances, JSONB is the preferred column type because it provides more efficient storage and the ability to create indexes over the contents. The main downside is that the actual formatting of the JSON string, including the order of the keys and values, is not retained and may be difficult to reproduce exactly. To better understand when to store JSON data in one column type rather than the other, please refer to the PostgreSQL documentation.

PostgreSQL has supported JSON and JSONB data types for many years. With each major release, the feature set for working with JSON and overall query performance improves. In a growing number of data models, particularly when REST or Graph APIs are involved, storing extra meta information as a JSON document can be beneficial. The data is available if needed while facilitating efficient queries on serialized data stored in regular columns.

We used a design pattern similar to this in our NFT Starter Kit. The OpenSea JSON API used as the data source for the starter kit includes many properties and values for each asset and collection. A lot of the values weren't helpful for the specific analysis in that tutorial. However, we knew that some of the values in the JSON properties could be useful in future analysis, tutorials, or demonstrations. Therefore, we stored additional metadata about assets and collections in a JSONB field to query it if needed. Still, it didn't complicate the schema design for otherwise common data like name and asset_id.

Storing data in a JSON field is also a common practice in areas like IIoT device data. Engineers usually have an agreed-upon schema to store and query metrics produced by the device, followed by a "free form" JSON column that allows engineers to send error or diagnostic data that changes over time as hardware is modified or updated.

There are several approaches to add JSON data to our sample query. One added challenge is that JSON data includes both a key and a value, along with the possibility of numerous levels of child object nesting. The approach you take will depend on how complex you want the PostgreSQL function to be and the end goal of the sample data. In this example, we'll create a function that takes an array of keys for the JSON and generates random numerical values for each key without nesting. Generating the JSON string in SQL from our values is straightforward, thanks to built-in PostgreSQL functions for reading and writing JSON strings. 🎉
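
If these JSON builders are new to you, here is a minimal stand-alone sketch: json_object_agg() is the aggregate used in the query below, and json_build_object() is another built-in shown only for comparison. The keys and values are arbitrary, and the output comments show the approximate shape of the result rather than its exact whitespace.

SELECT json_build_object('building', 4, 'rack', 12);
-- {"building" : 4, "rack" : 12}

SELECT json_object_agg(key, 0) FROM unnest(ARRAY['a','b','c']) AS u(key);
-- { "a" : 0, "b" : 0, "c" : 0 }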

As with the other examples in this post, we'll start by using a CTE to generate a random JSON document in a stand-alone SELECT query to verify that the result is what we want. Remember, we'll observe the same issue we had earlier when generating random text in the stand-alone query because we are using a CTE. The JSON is random every time the query runs, but the string is reused for all rows in the result set. The CTE is evaluated once per query, whereas a function is called again for every row. Because of this, we won't observe random values in each row until we move the SQL into a function to reuse later.

WITH random_json AS (
SELECT json_object_agg(key, random_between(1,10)) as json_data
    FROM unnest(array['a', 'b']) as u(key))
  SELECT json_data, generate_series(1,5) FROM random_json;

json_data       |generate_series|
----------------+---------------+
{"a": 6, "b": 2}|              1|
{"a": 6, "b": 2}|              2|
{"a": 6, "b": 2}|              3|
{"a": 6, "b": 2}|              4|
{"a": 6, "b": 2}|              5|

We can see that the JSON data is created using our keys (['a','b']) with numbers between 1 and 10. Now we just need a function that creates random JSON data each time it is called. For demonstration purposes, this function returns a JSON document with integer values for each key we provide (or, because it reuses the same randomness gate as random_text(), no document at all for some calls). Feel free to enhance this function to return more complex documents with various data types if that's a requirement for you.

CREATE OR REPLACE FUNCTION random_json(keys TEXT[]='{"a","b","c"}',min_val NUMERIC = 0, max_val NUMERIC = 10) 
   RETURNS JSON AS
$$
DECLARE 
	random_val NUMERIC  = floor(random() * (max_val-min_val) + min_val)::INTEGER;
	random_json JSON = NULL;
BEGIN
	-- again, this adds some randomness into the results. Remove or modify if this
	-- isn't useful for your situation
	if(random_val % 5) > 1 then
		SELECT * INTO random_json FROM (
			SELECT json_object_agg(key, random_between(min_val,max_val)) as json_data
	    		FROM unnest(keys) as u(key)
		) json_val;
	END IF;
	RETURN random_json;
END
$$ LANGUAGE plpgsql;

With the random_json() function in place, we can test it in a few ways. First, we'll simply call the function directly without any parameters, which will return a JSON document with the default keys provided in the function definition ("a", "b", "c") and values from 0 to 10 (the default minimum and maximum value).

SELECT random_json();

random_json             |
------------------------+
{"a": 7, "b": 3, "c": 8}|

Next, we'll join this to a small numeric set from generate_series().

SELECT device_id, random_json() FROM generate_series(1,5) device_id;

device_id|random_json              |
---------+-------------------------+
        1|{"a": 2, "b": 2, "c": 2} |
        2|                         |
        3|{"a": 10, "b": 7, "c": 1}|
        4|                         |
        5|{"a": 7, "b": 1, "c": 0} |

Notice two things with this example.

First, the data is different for each row, showing that the function gets called for each row and produces different numeric values each time. Second, because we kept the same random output mechanism from the random_text() example, not every row includes JSON.

Finally, let's add this into the sample query for generating device data that we've used throughout this article to see how to provide an array of keys ("building" and "rack") for the generated JSON data.

SELECT
  time,
  device_id,
  random_between(3,100, 4) AS cpu,
  random_between(28,83) AS temperature_c,
  random_text(2,10) AS note,
  random_json(ARRAY['building','rack'],1,20) device_location
FROM generate_series(now() - interval '1 hour', now(), interval '1 minute') AS time, 
	generate_series(1,10,1) AS device_id
ORDER BY 1,2;


time                         |device_id|cpu     |temperature_c|note     |device_location             |
-----------------------------+---------+--------+-------------+---------+----------------------------+
2021-11-04 16:19:22.991 -0400|        1| 14.7614|           70|CTcX8 2s4|                            |
2021-11-04 16:19:22.991 -0400|        2| 62.2618|           81|x1V      |{"rack": 4, "building": 5}  |
2021-11-04 16:19:22.991 -0400|        3| 10.1214|           50|1PNb     |                            |
2021-11-04 16:19:22.991 -0400|        4| 96.3742|           29|aZpikXGe |{"rack": 12, "building": 4} |
2021-11-04 16:19:22.991 -0400|        5| 22.5327|           30|lM       |{"rack": 2, "building": 3}  |
2021-11-04 16:19:22.991 -0400|        6| 57.9773|           44|         |{"rack": 16, "building": 5} |
...

There are just so many possibilities for creating sample data with generate_series(), PostgreSQL functions, and some custom logic.

Putting it all together

Let's put what we've learned into practice, using these three functions to create and insert ~1.3 million rows of data and then query it with the hyperfunctions time_bucket(), time_bucket_ng(), approx_percentile(), and time_weight(). To do this, we'll create two tables: one will be a list of computer hosts, and the second will be a hypertable that stores fake time-series data about the computers.

Step 1: Create the schema and hypertable

CREATE TABLE host (
	id int PRIMARY KEY,
	host_name TEXT,
	location JSONB
);

CREATE TABLE host_data (
	date timestamptz NOT NULL,
	host_id int NOT NULL,
	cpu DOUBLE PRECISION,
	tempc int,
	status TEXT	
);

SELECT create_hypertable('host_data','date');

Step 2: Generate and insert data

-- Insert data to create fake hosts
INSERT INTO host
SELECT id, 'host_' || id::TEXT AS name, 
	random_json(ARRAY['building','rack'],1,20) AS location
FROM generate_series(1,100) AS id;


-- insert ~1.3 million records for the last 3 months
INSERT INTO host_data
SELECT date, host_id,
	random_between(5,100,3) AS cpu,
	random_between(28,90) AS tempc,
	random_text(20,75) AS status
FROM generate_series(now() - INTERVAL '3 months',now(), INTERVAL '10 minutes') AS date,
generate_series(1,100) AS host_id;
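
Before moving on, a quick count confirms roughly how much data we just generated (a sketch; the exact number depends on when the INSERT runs):

SELECT count(*) FROM host_data;
-- ~3 months of 10-minute readings for 100 hosts comes to roughly 1.3 million rows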

Step 3: Query data using time_bucket() and time_bucket_ng()

-- Using time_bucket(), query the average CPU and max tempc
SELECT time_bucket('7 days', date) AS bucket, host_name,
	avg(cpu),
	max(tempc)
FROM host_data
JOIN host ON host_data.host_id = host.id
WHERE date > now() - INTERVAL '1 month'
GROUP BY 1,2
ORDER BY 1 DESC, 2;


-- try the experimental time_bucket_ng() to query data in month buckets
SELECT timescaledb_experimental.time_bucket_ng('1 month', date) AS bucket, host_name,
	avg(cpu) avg_cpu,
	max(tempc) max_temp
FROM host_data
JOIN host ON host_data.host_id = host.id
WHERE date > now() - INTERVAL '3 month'
GROUP BY 1,2
ORDER BY 1 DESC, 2;

Step 4: Query data using toolkit hyperfunctions

-- query all host in building 10 for 7 day buckets
-- also try the new percentile approximation function to 
-- get the p75 of data for each 7 day period
SELECT time_bucket('7 days', date) AS bucket, host_name,
	avg(cpu),
	approx_percentile(0.75,percentile_agg(cpu)) p75,
	max(tempc)
FROM host_data
JOIN host ON host_data.host_id = host.id
WHERE date > now() - INTERVAL '1 month'
	AND location -> 'building' = '10'
GROUP BY 1, 2
ORDER BY 1 DESC, 2;



-- To test time-weighted averages, we need to simulate missing
-- some data points in our host_data table. To do this, we'll
-- randomly select ~10% of the rows, and then delete them from the
-- host_data table.
WITH random_delete AS (SELECT date, host_id FROM host_data
	 JOIN host ON host_id = id WHERE 
	date > now() - INTERVAL '2 weeks'
	ORDER BY random() LIMIT 20000
)
DELETE FROM host_data hd
USING random_delete rd
WHERE hd.date = rd.date
AND hd.host_id = rd.host_id;


-- Select the daily time-weighted average and regular average
-- of each host for building 10 for the last two weeks.
-- Notice the variation in the two numbers because of the missing data.
SELECT time_bucket('1 day',date) AS bucket,
	host_name,
	average(time_weight('LOCF',date,cpu)) weighted_avg,
	avg(cpu) 
FROM host_data
	JOIN host ON host_data.host_id = host.id
WHERE location -> 'building' = '10'
AND date > now() - INTERVAL '2 weeks'
GROUP BY 1,2
ORDER BY 1 DESC, 2;

In a few lines of SQL, we created 1.3 million rows of data and were able to test four different functions in TimescaleDB, all without relying on any external source. 💪

Still, you may notice one last issue with the values in our host_data table (even though they now look more realistic). By using random() as the basis for our queries, the calculated numeric values all tend to have an equal distribution within the specified range, which causes the average of the values to always be near the median. This makes sense statistically, but it highlights one other area of improvement to the data we generate. In the third post of this series, we'll demonstrate a few ways to influence the generated values to provide shape to the data (and even some outliers if we need them).
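
A quick way to see the effect described above for yourself (a sketch against the host_data table created earlier) is to compare the mean and the median of the generated cpu values:

SELECT avg(cpu) AS mean_cpu,
	percentile_cont(0.5) WITHIN GROUP (ORDER BY cpu) AS median_cpu
FROM host_data;
-- with values distributed evenly between 5 and 100, both numbers land near 52.5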

Reviewing our progress

When using a database like TimescaleDB or testing features in PostgreSQL, generating a representative dataset is a beneficial tool to have in your SQL toolbelt.

In the first post, we learned how to generate lots of data by combining the result sets of multiple generate_series() functions. Using the implicit CROSS JOIN, the total number of rows in the final output is a product of each set together. When one of the data sets contains timestamps, the output can be used to create time-series data for testing and querying.

The problem with our initial examples was that the actual values we generated were random and lacked control over their precision - and all of the data was numeric. So in this second post, we demonstrated how to format the numeric data for a given column and generate random data of other types, like text and JSON documents. We also added an example in the text and JSON functions that created randomness in how often the values were emitted for each of those columns.

Again, all of these are building block examples for you to use, creating functions that generate the kind of data you need to test.

To see some of these examples in action, watch my video on creating realistic sample data.

In part 3 of this series, we will demonstrate how to add shape and trends into your sample time-series data (e.g., increasing web traffic over time and quarterly sales cycles) using the formatting functions in this post in conjunction with relational lookup tables and additional mathematical functions. Knowing how to manipulate the pattern of generated data is particularly useful for visualizing time-series data and learning analytical PostgreSQL or TimescaleDB functions.

If you have questions about using generate_series() or have any questions about TimescaleDB, please join our community Slack channel, where you'll find an active community and a handful of the Timescale team most days.

If you want to try creating larger sets of sample time-series data using generate_series() and see how the exciting features of TimescaleDB work, sign up for a free 30-day trial or install and manage it on your instances. (You can also learn more by following one of our many tutorials.)

Frits Hoogland: What is free memory in Linux?

$
0
0

This blogpost is about linux memory management, and specifically about the question that has been asked about probably any operating system throughout history: how much free memory do I need to consider it to be healthy?

To start off, a reference to a Star Wars quote: 'This is not the free memory you're looking for'.

What this quote means to say is that whilst the free memory statistic obviously shows free memory, what you are actually looking for is the amount of memory that can be used for memory allocations on the system. On linux, this is not the same as free memory.

Of course there is free memory: actually free pages, which can be directly used by a process that needs one. A free page is produced by the kernel page daemon (commonly named 'swapper'), or by a process that explicitly frees memory.

The linux operating system keeps only a small amount of memory free, because it tries to use memory as optimally as it can. One of the many optimizations in linux is to keep memory in use for a purpose, such as storing a page that was earlier read from disk. It doesn't make sense to free all used memory right after usage; in fact, radically cleaning all used pages after use would eliminate the (disk) page cache in linux.

In linux, there are no settings for dedicating a memory area as disk cache; instead, it essentially takes all unused memory and keeps the data in it available for as long as there isn't another, better purpose for the page.

However, there must be some sort of indicator that tells you how much memory the operating system can use. I just said that free memory is not that indicator. So what is that indicator? Since this commit there is the notion of 'available memory' (unsurprisingly, 'MemAvailable' in /proc/meminfo). This is the amount of memory that could be used if memory is needed by any process. So, if you want to know how much memory can be used on a linux system, available memory is the statistic to look for, and not free memory.

An obvious question at this point is: if available memory is what should be looked at when trying to assess memory health, which is traditionally done by looking at free memory, why is there free memory at all, and not just available memory?

Free memory is really just a minimal amount of pages made free upfront to quickly provide the kernel or a process with some free pages, up to the amount set by the kernel parameter vm.min_free_kbytes. The main reason for doing that is performance: if a process requires a page and has to find an available one itself, it must stop processing and scan memory for pages that can be made available, which takes time and CPU. If such a page is found, it still needs to be freed (removed from the lists through which it could be found and reused for its original contents). A free page has all that work done upfront and can simply be taken from the free list.

The next obvious question is: okay, so I need to look at available memory; should I monitor for a certain amount, such as a percentage of total memory? Despite being an obvious question, it cannot be answered without understanding the system and especially the application processing on it. The amount of memory that needs to be available depends on how much allocating and freeing the running processes together perform, which is really specific to each machine and the application it is serving.

The way free memory works can be seen on a linux system by monitoring the free memory statistic (MemFree in /proc/meminfo): after a system has started up, a certain (probably large) amount of memory is shown as free, and that amount declines depending on how eagerly the processes running on it allocate memory. Once it gets down close to vm.min_free_kbytes, it fluctuates a bit but remains there. If free memory remains consistently higher than that, and not because of ongoing allocation and freeing, it means that memory is simply never used. In general, it mostly stays around vm.min_free_kbytes, which also means that monitoring free memory doesn't really tell you anything.

This will be different if processes free large amounts of memory on the system: the freed pages are added to free memory, pushing the free memory statistic well above vm.min_free_kbytes. This is application dependent, and not at all rare: process-based databases such as PostgreSQL or Oracle can allocate (huge) private heaps per individual process for data processing, and can free them when they are no longer needed.

Closely monitoring the free memory and available memory statistics for PostgreSQL (and therefore Yugabyte YSQL) shows that memory allocations tend to be bursty in nature. That means that most of the time there is no gradual increase of any memory statistic for which a warning and alarm threshold can be set to warn about a potential low-on-memory scenario. Instead, the general pattern that I witness is that memory usage moves up and down, but at certain times peaks very rapidly, causing the swapper to free memory, and sometimes peaks so much that processes are forced to stall and perform direct memory gathering (direct reclaim).

If the peak is short enough, this passes rather unnoticed from a userland perspective: the direct memory gathering, together with the swapper freeing memory, quickly returns the system to a normal memory state and normal performance. Depending on the specific situation, when lots of memory is in active use this is actually not a rare event.

If the memory gathering peak lasts longer and gets a system close to exhaustion, there are two scenarios that I see, which do not exclude each other, in no particular order:

  1. If any process cannot find enough pages to satisfy its need after scanning all memory, it will invoke the OOM (out of memory) killer. The OOM killer performs a calculation roughly based on memory usage and OOM priority, and terminates one process with the intent to restore memory availability and thus normal functioning and performance.

  2. The system gets into a state that is commonly called 'thrashing', for which the Wikipedia article provides a very good description. The essence is that, because of memory over-allocation, memory management for all processes performing memory tasks eats up all the time, bringing the system close to a standstill, yet no process actually fails to find available pages, so the OOM killer is not invoked.

Both obviously have a profound impact and should be prevented, because they affect global performance. The general recommendation is to make sure memory allocations stay significantly lower than total memory, so there is memory left for the kernel, the page cache, and any peaks in memory usage by one or more processes.

The conclusion is that monitoring free memory with the intent to understand how much memory is available is wrong. There is another statistic that provides this information: available memory. To make a linux system perform optimally, it should have enough available memory so that processing is impacted as little as possible.
