Hans-Juergen Schoenig: PostgreSQL: Improving sort performance

Sorting is a very important aspect of PostgreSQL performance tuning. However, tuning sorts is often misunderstood or simply overlooked. So I decided to write a PostgreSQL blog post showing how sorts can be tuned in PostgreSQL.

Creating sample data

To show how sorting works, I created a couple of million rows first:

test=# CREATE TABLE t_test (x numeric);
CREATE TABLE
test=# INSERT INTO t_test SELECT random()
       FROM generate_series(1, 5000000);
INSERT 0 5000000
test=# ANALYZE ;
ANALYZE

The code creates a table and loads 5 million random values. As you will notice, the data can be loaded within seconds.

Sorting data in PostgreSQL

Let us try to sort the data. For the sake of simplicity I am using the simplest statement possible. As you can see, PostgreSQL has to sort on disk because the data we want to sort does not fit into memory. In this case a bit more than 100 MB of data is moved to disk:

test=# explain analyze SELECT * FROM t_test ORDER BY x;
                                QUERY PLAN
--------------------------------------------------------------------------
Sort (cost=804270.42..816770.42 rows=5000000 width=11)
     (actual time=4503.484..6475.222 rows=5000000 loops=1)
     Sort Key: x
     Sort Method: external merge Disk: 102896kB
     -> Seq Scan on t_test (cost=0.00..77028.00 rows=5000000 width=11)
        (actual time=0.035..282.427 rows=5000000 loops=1)
Planning time: 0.139 ms
Execution time: 6606.637 ms
(6 rows)

Why does PostgreSQL not simply sort stuff in memory? The reason is the work_mem parameter, which is by default set to 4 MB:

test=# SHOW work_mem;
 work_mem
---------- 
      4MB
(1 row)

work_mem tells the server that up to 4 MB can be used per operation (per sort, grouping operation, etc.). If you sort too much data, PostgreSQL has to move the excess data to disk, which is of course slow.

Fortunately changing work_mem is simple and can even be done at the session level.

Speeding up sorts in PostgreSQL – using more work_mem

Let us change work_mem for our current session and see what happens to our example shown before.

test=# SET work_mem TO '1 GB';
SET

The easiest way to change work_mem on the fly is to use SET. In this case I have set the parameter to 1 GB. Now PostgreSQL has enough RAM to do stuff in memory:

test=# explain analyze SELECT * FROM t_test ORDER BY x;
                             QUERY PLAN
---------------------------------------------------------------------------
 Sort (cost=633365.42..645865.42 rows=5000000 width=11)
      (actual time=1794.953..2529.398 rows=5000000 loops=1)
      Sort Key: x
      Sort Method: quicksort Memory: 430984kB
      -> Seq Scan on t_test (cost=0.00..77028.00 rows=5000000 width=11)
         (actual time=0.075..296.167 rows=5000000 loops=1)
Planning time: 0.067 ms
Execution time: 2686.635 ms
(6 rows)

The performance impact is incredible. The speed has improved from 6.6 seconds to around 2.7 seconds, which is around 60% less. As you can see, PostgreSQL uses “quicksort” instead of “external merge Disk”. If you want to speed up and tune sorting in PostgreSQL, there is no way of doing that without changing work_mem. The work_mem parameter is THE most important knob you have. The cool thing is that work_mem is not only used to speed up sorts – it will also have a positive impact on aggregations and so on.
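
To illustrate the effect on grouping, here is a hedged sketch on the same table (the exact plans and timings depend on your PostgreSQL version and hardware, so the output is omitted here):

test=# SET work_mem TO '4 MB';
SET
test=# explain analyze SELECT x, count(*) FROM t_test GROUP BY x;
-- with a small work_mem the planner typically sorts to disk and uses GroupAggregate
test=# SET work_mem TO '1 GB';
SET
test=# explain analyze SELECT x, count(*) FROM t_test GROUP BY x;
-- with enough work_mem a HashAggregate can run entirely in memory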

Taking care of partial sorts

As of PostgreSQL 10, there are three types of sort algorithms:

  • external sort Disk
  • quicksort
  • top-N heapsort

“top-N heapsort” is used if you only want a couple of sorted rows. For example: The highest 10 values, the lowest 10 values and so on. “top-N heapsort” is pretty efficient and returns the desired data in almost no time:

test=# explain analyze SELECT * FROM t_test ORDER BY x LIMIT 10;
                               QUERY PLAN
----------------------------------------------------------------------------------
 Limit (cost=185076.20..185076.23 rows=10 width=11)
       (actual time=896.739..896.740 rows=10 loops=1)
        -> Sort (cost=185076.20..197576.20 rows=5000000 width=11)
                (actual time=896.737..896.738 rows=10 loops=1)
           Sort Key: x
           Sort Method: top-N heapsort Memory: 25kB
           -> Seq Scan on t_test (cost=0.00..77028.00 rows=5000000 width=11) 
                                 (actual time=1.154..282.408 rows=5000000 loops=1)
Planning time: 0.087 ms
Execution time: 896.768 ms
(7 rows)

Wow, the query returns in less than one second.

Improving sorting: Consider indexing …

work_mem is ideal to speed up sorts. However, in many cases it can make sense to avoid sorting in the first place. Indexes are a good way to provide the database engine with “sorted input”. In fact: A btree is somewhat similar to a sorted list.

Building indexes (btrees) also requires some sorting. Many years ago PostgreSQL used work_mem to tell the CREATE INDEX command how much memory to use for index creation. This is not the case anymore: in modern versions of PostgreSQL, the maintenance_work_mem parameter tells DDLs how much memory to use.

Here is an example:

test=# \timing
Timing is on.
test=# CREATE INDEX idx_x ON t_test (x);
CREATE INDEX
Time: 4648.530 ms (00:04.649)

The default setting for maintenance_work_mem is 64 MB, but this can of course be changed:

test=# SET maintenance_work_mem TO '1 GB';
SET
Time: 0.469 ms

The index creation will be considerably faster with more memory:

test=# CREATE INDEX idx_x2 ON t_test (x);
CREATE INDEX
Time: 3083.661 ms (00:03.084)

In this case CREATE INDEX can use up to 1 GB of RAM to sort the data, which is of course a lot faster than going to disk. This is especially useful if you want to create large indexes.

The query will be a lot faster if you have proper indexes in place. Here is an example:

test=# explain analyze SELECT * FROM t_test ORDER BY x LIMIT 10;
                                  QUERY PLAN
--------------------------------------------------------------------------------
Limit (cost=0.43..0.95 rows=10 width=11)
      (actual time=0.068..0.087 rows=10 loops=1)
      -> Index Only Scan using idx_x2 on t_test
               (cost=0.43..260132.21 rows=5000000 width=11)
               (actual time=0.066..0.084 rows=10 loops=1)
               Heap Fetches: 10
Planning time: 0.130 ms
Execution time: 0.119 ms
(5 rows)

In my example the query needs way less than a millisecond. If your database happens to sort a lot of data all the time, consider using better indexes to speed things up rather than pumping work_mem ever higher.

Sorting in PostgreSQL and tablespaces

Many people out there are using tablespaces to scale I/O. By default PostgreSQL only uses a single tablespace, which can easily turn into a bottleneck. Tablespaces are a good way to provide PostgreSQL with more hardware.

Let us assume you have to sort a lot of data repeatedly: temp_tablespaces is a parameter that allows administrators to control the location of temporary files written to disk. Using a separate tablespace for temporary files can also help to speed up sorting.
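
A minimal sketch of that setup is shown below; the directory and tablespace name are made up for this example, and the path must already exist and be owned by the postgres operating system user:

test=# CREATE TABLESPACE temp_space LOCATION '/ssd/pg_temp';
CREATE TABLESPACE
test=# SET temp_tablespaces TO 'temp_space';
SET
test=# explain analyze SELECT * FROM t_test ORDER BY x;
-- with a small work_mem, the on-disk sort now writes its temporary files to temp_space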

If you are not sure how to configure work_mem, consider checking out http://pgconfigurator.cybertec.at - an easy tool that helps people configure PostgreSQL.



Venkata Nagothi: Using Barman to Backup PostgreSQL - An Overview

Database backups play an imperative role in designing an effective disaster recovery strategy for production databases. Database administrators and architects must continuously work towards designing an optimal and effective backup strategy for real-time, mission-critical databases, and further ensure that disaster recovery SLAs are satisfied. In my experience, this is not easy and it can take from days to weeks to arrive at an impeccable backup strategy. It is not just about writing a good script to back up databases and making sure it works. There are several factors to consider; let us take a look at them:

  • Database size: Database size plays an important role when designing backup strategies. In fact, this is one of the core factors that defines:
    • Time taken by the backup
    • The load on the infrastructure components like Disk, Network, CPU etc.
    • Amount of backup storage required and the costs involved
    • If the databases are hosted in the cloud, the backup storage costs depend on the amount of storage required
    • Also, database size impacts the RTO
  • Infrastructure: The backup strategy heavily relies on the infrastructure hosting the databases. The backup procedure for databases hosted on a physical server in an on-prem data centre differs from that for databases hosted in the cloud.
  • Backup Location: Where are the backups going? Generally, the backups will be placed at a remote location, for instance on tape or cloud specific storage like AWS S3.
  • Backup Tool: Identify an optimal tool to perform online database backups, one that ensures a consistent backup is taken.

A good database backup strategy must ensure that RTO (Recovery Time Objective) and RPO (Recovery Point Objective) are met, which in turn helps achieve the disaster recovery objective. File-system level backups can be performed on PostgreSQL databases in several ways. In this blog, my focus will be on a tool called Barman, which is popularly used to perform PostgreSQL database backups.

Barman (Backup and Recovery Manager) is a Python-based open-source tool developed by the team at 2ndQuadrant. It was developed to provide an enterprise-grade database backup strategy for mission-critical PostgreSQL production databases. Its features and characteristics resemble those of Oracle's RMAN. In my opinion, barman is one of the best options for PostgreSQL databases and can deliver several benefits from an operations perspective to DBAs and infrastructure engineers.

Let us look at some capabilities of Barman:

I will start with a configuration overview and then list what kinds of backups can be performed.

Technically, barman is a Python-based tool and has two different configuration files to deal with. One file, the actual configuration for the database to be backed up, resides in “/etc/barman.d” and is named <db-identifier-for-barman>.conf; the other file, which holds the barman-wide parameters (like the barman backups location, barman server, log files etc.), resides in “/etc” (/etc/barman.conf). The barman configuration files use an INI-style format, similar to MySQL configuration files.

Example contents of the /etc/barman.conf file are shown below:

[barman]
barman_user = barman            ---------> barman user who performs backup/recovery of database
configuration_files_directory = /etc/barman.d    -----> location for DB configuration files
barman_home = /dbbackups/barman    ---> barman home directory
log_file = /dbbackups/barman/logs/barman.log ---> barman log file location
log_level = INFO  -----> level of logging for barman operations
compression = gzip  ----->  backups must be compressed

Installation of Barman

Let us take a look at the installation procedure of barman -

Installing from the source

Download the barman from the https://www.pgbarman.org/

Untar / unzip the installer and execute the following command as root user -

[root@barman-server barman-2.4]# ./setup.py install
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'setup_requires'
  warnings.warn(msg)
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'
  warnings.warn(msg)
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'tests_require'
  warnings.warn(msg)
running install
running build
running build_py
creating build
creating build/lib
creating build/lib/barman
copying barman/utils.py -> build/lib/barman
copying barman/fs.py -> build/lib/barman
copying barman/retention_policies.py -> build/lib/barman
copying barman/diagnose.py -> build/lib/barman
copying barman/backup.py -> build/lib/barman
copying barman/recovery_executor.py -> build/lib/barman
copying barman/backup_executor.py -> build/lib/barman
copying barman/config.py -> build/lib/barman
copying barman/process.py -> build/lib/barman
copying barman/output.py -> build/lib/barman
copying barman/__init__.py -> build/lib/barman
copying barman/remote_status.py -> build/lib/barman
copying barman/xlog.py -> build/lib/barman
copying barman/lockfile.py -> build/lib/barman
copying barman/postgres.py -> build/lib/barman
copying barman/server.py -> build/lib/barman
copying barman/cli.py -> build/lib/barman
copying barman/version.py -> build/lib/barman
copying barman/compression.py -> build/lib/barman
copying barman/wal_archiver.py -> build/lib/barman
copying barman/infofile.py -> build/lib/barman
copying barman/exceptions.py -> build/lib/barman
copying barman/hooks.py -> build/lib/barman
copying barman/copy_controller.py -> build/lib/barman
copying barman/command_wrappers.py -> build/lib/barman
running build_scripts
creating build/scripts-2.7
copying and adjusting bin/barman -> build/scripts-2.7
changing mode of build/scripts-2.7/barman from 644 to 755
running install_lib
creating /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/utils.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/fs.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/retention_policies.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/diagnose.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/backup.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/recovery_executor.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/backup_executor.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/config.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/process.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/output.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/__init__.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/remote_status.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/xlog.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/lockfile.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/postgres.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/server.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/cli.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/version.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/compression.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/wal_archiver.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/infofile.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/exceptions.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/hooks.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/copy_controller.py -> /usr/lib/python2.7/site-packages/barman
copying build/lib/barman/command_wrappers.py -> /usr/lib/python2.7/site-packages/barman
byte-compiling /usr/lib/python2.7/site-packages/barman/utils.py to utils.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/fs.py to fs.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/retention_policies.py to retention_policies.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/diagnose.py to diagnose.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/backup.py to backup.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/recovery_executor.py to recovery_executor.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/backup_executor.py to backup_executor.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/config.py to config.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/process.py to process.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/output.py to output.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/__init__.py to __init__.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/remote_status.py to remote_status.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/xlog.py to xlog.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/lockfile.py to lockfile.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/postgres.py to postgres.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/server.py to server.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/cli.py to cli.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/version.py to version.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/compression.py to compression.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/wal_archiver.py to wal_archiver.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/infofile.py to infofile.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/exceptions.py to exceptions.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/hooks.py to hooks.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/copy_controller.py to copy_controller.pyc
byte-compiling /usr/lib/python2.7/site-packages/barman/command_wrappers.py to command_wrappers.pyc
running install_scripts
copying build/scripts-2.7/barman -> /usr/bin
changing mode of /usr/bin/barman to 755
running install_data
copying doc/barman.1 -> /usr/share/man/man1
copying doc/barman.5 -> /usr/share/man/man5
running install_egg_info
Writing /usr/lib/python2.7/site-packages/barman-2.4-py2.7.egg-info

Installing from the repo

Installation can also be done via yum as follows

[barman@barman-server~]$ yum install barman

Let us take a look at different types of backups barman supports

Physical Hot Backups

Barman supports physical hot backups, i.e. online backups of the physical data files and transaction log files of the database, taken using an rsync-based method and optionally compressed.

Let us take a look at the steps and commands to perform an rsync backup using barman.

#1 PostgreSQL database configuration file for barman

[pgdb]
description="Main PostgreSQL server"
conninfo=host=pgserver user=postgres dbname=postgres
ssh_command=ssh barman@pgserver
archiver=on
backup_method = rsync

“pgdb” is the identifier of the Postgres database for barman, and the configuration file name should be <identifier>.conf, located in /etc/barman.d/. When the barman backup command is executed, barman looks for the [pgdb] section in the pgdb.conf file.

The parameter backup_method defines the type of backup to be taken. In this case backup_method is rsync.

Note: For the barman backup command to be successful, password-less ssh authentication must be configured between barman and postgres servers.

#2 postgresql.conf file parameters

wal_level=replica
archive_mode=on
archive_command='rsync to <ARCHIVE LOCATION>'
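
Once these parameters are in place (changing archive_mode requires a restart), archiving can be verified from within PostgreSQL itself. A minimal sketch, run in psql on the database server:

postgres=# SELECT name, setting FROM pg_settings
           WHERE name IN ('wal_level', 'archive_mode', 'archive_command');
postgres=# SELECT archived_count, last_archived_wal, failed_count FROM pg_stat_archiver;
-- archived_count should keep growing and failed_count should stay at 0 once archiving works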

Barman’s backup command

#3 Check if barman is ready to perform backups

[barman@pgserver pgdb]$ barman check pgdb
Server pgdb:
        PostgreSQL: OK
        is_superuser: OK
        wal_level: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 4 backups, expected at least 0)
        ssh: OK (PostgreSQL server)
        not in recovery: OK
        archive_mode: OK
        archive_command: OK
        continuous archiving: OK
        archiver errors: OK

The above output says all is “OK”, which means you are good to take a backup.

For example, the output below says a backup cannot be taken because, according to barman, SSH is not working:

[barman@pgserver  ~]$ barman check pgdb
Server pgdb:
        PostgreSQL: OK
        is_superuser: OK
        wal_level: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 0 backups, expected at least 0)
        ssh: FAILED (Connection failed using 'barman@pgserver -o BatchMode=yes -o StrictHostKeyChecking=no' return code 127)
        not in recovery: OK
        archive_mode: OK
        archive_command: OK
        continuous archiving: OK
        archiver errors: OK

#4 Perform Database backup

[barman@barman-server ~]$ barman backup pgdb
Starting backup using rsync-exclusive method for server pgdb in /dbbackup/barman_backups/pgdb/base/20180816T153846
Backup start at LSN: 0/1C000028 (00000001000000000000001C, 00000028)
This is the first backup for server pgdb
WAL segments preceding the current backup have been found:
        00000001000000000000000B from server pgdb has been removed
        00000001000000000000000C from server pgdb has been removed
        00000001000000000000000D from server pgdb has been removed
        00000001000000000000000E from server pgdb has been removed
        00000001000000000000000F from server pgdb has been removed
        000000010000000000000010 from server pgdb has been removed
        000000010000000000000011 from server pgdb has been removed
        000000010000000000000012 from server pgdb has been removed
        000000010000000000000013 from server pgdb has been removed
        000000010000000000000014 from server pgdb has been removed
        000000010000000000000015 from server pgdb has been removed
        000000010000000000000016 from server pgdb has been removed
Starting backup copy via rsync/SSH for 20180816T153846
Copy done (time: 1 second)
This is the first backup for server pgdb
Asking PostgreSQL server to finalize the backup.
Backup size: 21.8 MiB
Backup end at LSN: 0/1C0000F8 (00000001000000000000001C, 000000F8)
Backup completed (start time: 2018-08-16 15:38:46.668492, elapsed time: 1 second)
Processing xlog segments from file archival for pgdb
        000000010000000000000016
        000000010000000000000017
        000000010000000000000018
        000000010000000000000019
        00000001000000000000001A
        00000001000000000000001B
        00000001000000000000001C
        00000001000000000000001C.00000028.backup

To find out in advance whether the barman backup command will succeed, the barman check command shown above is the one to use.

Incremental Backups

Another great capability of Barman is the ability to take incremental backups. This means that only the blocks changed since the last full database backup are backed up. For databases that undergo few data changes, backing them up incrementally can greatly reduce resource usage.

Incremental backups rely heavily on rsync and hard links. Below are the benefits of incremental backups:

  • Significantly reduces the daily backup time
  • Reduces the volume of data being backed up, since only the changed data blocks are copied, which in turn reduces the usage of infrastructure resources like network bandwidth, disk space, I/O, etc.
  • If you are after achieving a very good RTO, this is the feature you would be looking for

The commands for an incremental backup are pretty much the same. Any subsequent backup after the first backup taken with the option backup_method=rsync will be an incremental backup, and barman pulls the WALs using the pg_receivexlog utility.

Remote Database Backups and Recovery

This capability of Barman is highly beneficial for DBAs, in my opinion. The first thing DBAs look for is to avoid stressing production database server resources as much as possible during backups, and performing the backups remotely is the best option for that. Barman leverages pg_basebackup, which makes scripting and automating it a lot easier.

In general, the traditionally available options for automated backups are:

  1. pg_basebackup
  2. tar copy

The above two options involve a lot of development and testing to ensure that an effective backup strategy is in place to meet the demands of the SLAs, and they can pose challenges for large databases with multiple tablespaces.

With Barman, it is pretty simple. Another exceptional capability of barman is continuous WAL streaming. Let us take a look at that in a bit more detail.


Streaming Backup with continuous WAL streaming

This makes barman stand out in comparison with other tools on the market. Live WAL files can be streamed continuously to a remote backup location using Barman. This is THE feature DBAs will be excited to learn about; I certainly was. It is extremely difficult, or next to impossible, to achieve this with manually built scripts or with a combination of tools like pg_basebackup and pg_receivewal. With continuous WAL streaming, a better RPO can be achieved. If the backup strategy is designed meticulously, it would not be an exaggeration to say that an almost zero RPO can be achieved.

Let us look at the steps and commands to perform a streaming barman backup.

#1 postgresql.conf parameter changes

The following configuration needs to be done in postgresql.conf:

wal_level=replica
max_wal_senders = 2
max_replication_slots = 2
synchronous_standby_names = 'barman_receive_wal'
archive_mode=on
archive_command = 'rsync -a %p barman@pgserver:INCOMING_WAL_DIRECTORY/%f'
archive_timeout=3600 (should not be 0 or disabled)
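
Before moving on, it is worth confirming on the PostgreSQL side that the streaming-related settings are actually active (several of them require a restart). A small sketch:

postgres=# SELECT name, setting FROM pg_settings
           WHERE name IN ('wal_level', 'max_wal_senders', 'max_replication_slots',
                          'archive_mode', 'archive_timeout');
-- the values returned here should match what was configured in postgresql.conf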

#2 Create Replication Slot using barman

A replication slot is important for streaming backups. In case continuous streaming of WALs fails for any reason, all the un-streamed WALs are retained on the postgres server without being removed.

[barman@pgserver ~]$ barman receive-wal --create-slot pgdb
Creating physical replication slot 'barman' on server 'pgdb'
Replication slot 'barman' created
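
The slot can also be verified from the PostgreSQL side; the exact columns vary slightly between versions, but a minimal check looks like this:

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
-- the 'barman' slot should be listed here; it becomes active once barman starts receive-wal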

#3 Configure the database server configuration file for barman

The database identifier for barman is “pgdb”. A configuration file called pgdb.conf must be created in the /etc/barman.d/ location with the following contents:

[pgdb]
description="Main PostgreSQL server"
conninfo=host=pgserver user=postgres dbname=postgres
streaming_conninfo=host=pgserver user=barman
backup_method=postgres
archiver=on
incoming_wals_directory=/dbbackups/barman_backups/pgdb/incoming
streaming_archiver=on
slot_name=barman

  • streaming_conninfo is the parameter to configure for barman to perform streaming backups
  • backup_method must be configured to “postgres” when a streaming backup is to be taken
  • streaming_archiver must be configured to “on”
  • slot_name = barman: this parameter must be configured when you need barman to use replication slots. In this case the replication slot name is barman

Once the configuration is done, do a barman check to ensure streaming backups will run successfully.

#4 Check if barman receive-wal is running ok

In general, the first barman receive-wal does not work immediately after the configuration changes; it might error out, and the barman check command might show the following:

[barman@pgserver  archive_status]$ barman check pgdb
Server pgdb:
        PostgreSQL: OK
        is_superuser: OK
        PostgreSQL streaming: OK
        wal_level: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 0 backups, expected at least 0)
        pg_basebackup: OK
        pg_basebackup compatible: OK
        pg_basebackup supports tablespaces mapping: OK
        archive_mode: OK
        archive_command: OK
        continuous archiving: OK
        pg_receivexlog: OK
        pg_receivexlog compatible: OK
        receive-wal running: FAILED (See the Barman log file for more details)
        archiver errors: OK

When you run barman receive-wal, it might hang. To make receive-wal work properly for the first time, the command below must be executed:

[barman@pgserver  arch_logs]$ barman cron
Starting WAL archiving for server pgdb
Starting streaming archiver for server pgdb

Now, do a barman check again; it should be good now.

[barman@pgserver  arch_logs]$ barman check pgdb
Server pgdb:
        PostgreSQL: OK
        is_superuser: OK
        PostgreSQL streaming: OK
        wal_level: OK
        replication slot: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 2 backups, expected at least 0)
        pg_basebackup: OK
        pg_basebackup compatible: OK
        pg_basebackup supports tablespaces mapping: OK
        archive_mode: OK
        archive_command: OK
        continuous archiving: OK
        pg_receivexlog: OK
        pg_receivexlog compatible: OK
        receive-wal running: OK
        archiver errors: OK

As you can see, the receive-wal status now shows OK. This is one of the issues I faced.

#5 Check if the barman is ready to perform backups

[barman@pgserver  ~]$ barman check pgdb
Server pgdb:
        PostgreSQL: OK
        is_superuser: OK
        PostgreSQL streaming: OK
        wal_level: OK
        replication slot: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 4 backups, expected at least 0)
        pg_basebackup: OK
        pg_basebackup compatible: OK
        pg_basebackup supports tablespaces mapping: OK
        archive_mode: OK
        archive_command: OK
        continuous archiving: OK
        pg_receivexlog: OK
        pg_receivexlog compatible: OK
        receive-wal running: OK
        archiver errors: OK

#6 Check the streaming status using barman

[barman@pgserver pgdb]$ barman replication-status pgdb
Status of streaming clients for server 'pgdb':
  Current LSN on master: 0/250008A8
  Number of streaming clients: 1

  #1 Sync WAL streamer
     Application name: barman_receive_wal
     Sync stage      : 3/3 Remote write
     Communication   : TCP/IP
     IP Address      : 192.168.1.10 / Port: 52602 / Host: -
     User name       : barman
     Current state   : streaming (sync)
     Replication slot: barman
     WAL sender PID  : 26592
     Started at      : 2018-08-16 16:03:21.422430+10:00
     Sent LSN   : 0/250008A8 (diff: 0 B)
     Write LSN  : 0/250008A8 (diff: 0 B)
     Flush LSN  : 0/250008A8 (diff: 0 B)
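
The same streaming session can be cross-checked from the PostgreSQL side as well; column names differ slightly between versions (sent_location before PostgreSQL 10, sent_lsn from 10 onwards), so treat this as a sketch:

postgres=# SELECT application_name, state, sync_state, sent_lsn FROM pg_stat_replication;
-- barman_receive_wal should show up here as a streaming client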

The replication-status output above means barman is ready to perform a streaming backup. Perform the backup as shown below:

[barman@pgserver arch_logs]$ barman backup pgdb
Starting backup using postgres method for server pgdb in /dbbackup/barman_backups/pgdb/base/20180816T160710
Backup start at LSN: 0/1F000528 (00000001000000000000001F, 00000528)
Starting backup copy via pg_basebackup for 20180816T160710
Copy done (time: 1 second)
Finalising the backup.
Backup size: 21.9 MiB
Backup end at LSN: 0/21000000 (000000010000000000000020, 00000000)
Backup completed (start time: 2018-08-16 16:07:10.401526, elapsed time: 1 second)
Processing xlog segments from file archival for pgdb
        00000001000000000000001F
        000000010000000000000020
        000000010000000000000020.00000028.backup
        000000010000000000000021
Processing xlog segments from streaming for pgdb
        00000001000000000000001F
        000000010000000000000020

Centralized and Catalogued Backups

This is highly beneficial for environments running multiple databases on multiple servers in a networked environment, and it is one of the exceptional features of Barman. I have worked in real-time environments where I had to manage and administer hundreds of databases, and I always felt the need for centralized database backups; this is why Oracle RMAN became popular for Oracle database backup strategies, and now Barman is filling that space for PostgreSQL. With Barman, DBAs and DevOps engineers can work towards building a centralized backup server on which database backups for all the databases are maintained and validated.

Catalogued backups means that barman maintains a centralized repository in which the status of all backups is kept. You can check the backups available for a particular database as shown below:

[barman@pgserver ~]$  barman list-backup pgdb
pgdb 20180816T160924 - Thu Aug 16 16:09:25 2018 - Size: 22.0 MiB - WAL Size: 135.7 KiB
pgdb 20180816T160710 - Thu Aug 16 16:07:11 2018 - Size: 21.9 MiB - WAL Size: 105.8 KiB
pgdb 20180816T153913 - Thu Aug 16 15:39:15 2018 - Size: 21.9 MiB - WAL Size: 54.2 KiB
pgdb 20180816T153846 - Thu Aug 16 15:38:48 2018 - Size: 21.9 MiB - WAL Size: 53.0 KiB

Backup Retention Policy

Retention policies can be defined for database backups. Backups are rendered obsolete after a certain period, and obsolete backups can be deleted from time to time.

There are options in the configuration file to make sure backups are retained and marked obsolete once the retention period is exceeded.

The first parameter to configure is minimum_redundancy. Always set minimum_redundancy to a value greater than 0 to ensure backups are not deleted accidentally.

Example: minimum_redundancy = 1

  • The retention_policy parameter determines how long the base backups must be retained to ensure the disaster recovery SLAs are met.
  • The wal_retention_policy parameter determines how long the WAL backups must be retained. This ensures the expected RPO is met.

The existing retention and redundancy policies for a database server can be checked using the barman check command as follows:

[barman@pgserver ~]$ barman check pgdb
Server pgdb:
        PostgreSQL: OK
        is_superuser: OK
        PostgreSQL streaming: OK
        wal_level: OK
        replication slot: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 4 backups, expected at least 0)
        pg_basebackup: OK
        pg_basebackup compatible: OK
        pg_basebackup supports tablespaces mapping: OK
        archive_mode: OK
        archive_command: OK
        continuous archiving: OK
        pg_receivexlog: OK
        pg_receivexlog compatible: OK
        receive-wal running: OK
        archiver errors: OK

Parallel backups and recoveries can be performed by utilizing multiple CPUs, which makes backups and recoveries complete faster. This feature is beneficial for very large databases, up to terabytes in size.

To execute backups in parallel, add the following option to the database server configuration file (/etc/barman.d/pgdb.conf) and set it to the desired number of parallel workers:

parallel_jobs = 1

I can conclude by saying that barman is an enterprise-grade tool which can potentially help DBAs design an effective disaster recovery strategy.

Dimitri Fontaine: Geolocation with PostgreSQL

We loaded Open Street Map points of interest in the article The Most Popular Pub Names, which compares PostgreSQL with MongoDB for simple geographical queries and is part of our PostgreSQL Extensions article series. In today's article, we look at how to geolocate an IP address and locate the nearest pub, all within a single SQL query!

For that, we are going to use the awesome ip4r extension from RhodiumToad.
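
The full walk-through is in the linked article. As a rough sketch of the idea only (the geoip table, its columns and the sample address below are made up, not the author's actual schema), ip4r lets you store and index IP ranges and find the range containing a given address with its “contains or equals” operator >>=:

postgres=# CREATE EXTENSION ip4r;
postgres=# CREATE TABLE geoip (network ip4r, city text, lon float8, lat float8);
postgres=# CREATE INDEX ON geoip USING gist (network);  -- ip4r ships a gist opclass for fast containment lookups
postgres=# SELECT city, lon, lat FROM geoip WHERE network >>= '212.58.251.195'::ip4;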

Devrim GÜNDÜZ: My picks for Postgres Open SV 2018 !

Jobin Augustine: PostgreSQL Accessing MySQL as a Data Source Using mysql_fdw

There are many organizations where front/web-facing applications use MySQL and back end processing uses PostgreSQL®. Any system integration between these applications generally involves the replication, or duplication, of data from system to system. We recently blogged about pg_chameleon, which can be used to replicate data from MySQL® to PostgreSQL. mysql_fdw can play a key role in eliminating the problem of replicating/duplicating data. In order to avoid maintaining the same data physically in both postgres and MySQL, we can use mysql_fdw. It allows PostgreSQL to access MySQL tables and to use them as if they were local tables in PostgreSQL. mysql_fdw can be used, too, with Percona Server for MySQL, our drop-in replacement for MySQL.

This post showcases how easy it is to set that up and get them working together. We will address a few points that we skipped while discussing FDWs in general in our previous post.

Preparing MySQL for fdw connectivity

On the MySQL server side, we need to set up a user to allow for access to MySQL from the PostgreSQL server side. We recommend Percona Server for MySQL if you are setting it up for the first time.

mysql> create user 'fdw_user'@'%' identified by 'Secret!123';

This user needs to have privileges on the tables which are to be presented as foreign tables in PostgreSQL.

mysql> grant select,insert,update,delete on EMP to fdw_user@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> grant select,insert,update,delete on DEPT to fdw_user@'%';
Query OK, 0 rows affected (0.00 sec)

Installing mysql_fdw on PostgreSQL server

Under the hood, the MySQL FDW (mysql_fdw) lets the PostgreSQL server act as a client for the MySQL server, which means it can fetch data from the MySQL database like any other client. Obviously, mysql_fdw uses the MySQL client libraries. Nowadays, many Linux distributions are packaged with MariaDB® libraries, which work well enough for mysql_fdw to function. If we install mysql_fdw from the PGDG repo, then the mariadb-devel.x86_64 packages will be installed alongside other development packages. To switch to Percona packages as client libraries, you need to have the Percona development packages too.

sudo yum install Percona-Server-devel-57-5.7.22-22.1.el7.x86_64.rpm

Now we should be able to install the mysql_fdw from PGDG repository:

sudo yum install mysql_fdw_10.x86_64

Connect to the PostgreSQL server where we are going to create the foreign table, and using the command line tool, create mysql_fdw extension:

postgres=# create extension mysql_fdw;
CREATE EXTENSION

Create a server definition to point to the MySQL server running on a host machine by specifying the hostname and port:

postgres=# CREATE SERVER mysql_svr  FOREIGN DATA WRAPPER mysql_fdw OPTIONS (host 'hr',port '3306');
CREATE SERVER

Now we can create a user mapping. This maps the database user in PostgreSQL to the user on the remote server (MySQL). While creating the user mapping, we need to specify the user credentials for the MySQL server as shown below. For this demonstration, we are using PUBLIC user in PostgreSQL. However, we could use a specific user as an alternative.

postgres=# CREATE USER MAPPING FOR PUBLIC SERVER mysql_svr OPTIONS (username 'fdw_user',password 'Secret!123');
CREATE USER MAPPING

Import schema objects

Once we complete the user mapping, we can import the foreign schema.

postgres=# IMPORT FOREIGN SCHEMA hrdb FROM SERVER mysql_svr INTO public;

Or we have the option to import only selected tables from the foreign schema.

postgres=# IMPORT FOREIGN SCHEMA hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO public;

This statement says that the tables “EMP” and “DEPT” from the foreign schema named “hrdb” on the server mysql_svr need to be imported into the public schema of the PostgreSQL database.

FDWs in PostgreSQL allow us to import the tables to any schema in postgres.

Let’s create a schema in postgres:

postgres=# create schema hrdb;
postgres=# IMPORT FOREIGN SCHEMA hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO hrdb;

Suppose we need the foreign table to be part of multiple schemas of PostgreSQL. Yes, it is possible.

postgres=# create schema payroll;
CREATE SCHEMA
postgres=# create schema finance;
CREATE SCHEMA
postgres=# create schema sales;
CREATE SCHEMA
postgres=# IMPORT FOREIGN SCHEMA  hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO payroll;
IMPORT FOREIGN SCHEMA
postgres=# IMPORT FOREIGN SCHEMA  hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO finance;
IMPORT FOREIGN SCHEMA
postgres=# IMPORT FOREIGN SCHEMA  hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO sales;
IMPORT FOREIGN SCHEMA

You might be wondering if there’s a benefit to doing this. Yes: in a multi-tenant environment, it allows us to centralize many of the master/lookup tables. These can even sit on a remote server, and that server can be MySQL as well.
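
As a hedged illustration (the column names below are assumed, they are not taken from the post), a local query can simply join against such a centrally maintained foreign lookup table:

postgres=# SELECT e."ENAME", d."DNAME"
           FROM payroll."EMP" e
           JOIN payroll."DEPT" d ON d."DEPTNO" = e."DEPTNO";
-- depending on the mysql_fdw version, the join is either pushed down to MySQL
-- or evaluated locally over two foreign table scans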

IMPORTANT: PostgreSQL extensions are database specific. So if you have more than one database inside a PostgreSQL instance/cluster, you have to create a separate fdw extension, foreign server definition and user mapping.

Foreign tables with a subset of columns

Another important property of foreign tables is that you can define them with only a subset of columns if you are not planning to issue DML against the remote table. For example, MySQL’s famous sample database Sakila contains a table “film” with the following definition:

CREATE TABLE `film` (
`film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`description` text,
`release_year` year(4) DEFAULT NULL,
`language_id` tinyint(3) unsigned NOT NULL,
`original_language_id` tinyint(3) unsigned DEFAULT NULL,
`rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3',
`rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99',
`length` smallint(5) unsigned DEFAULT NULL,
`replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99',
`rating` enum('G','PG','PG-13','R','NC-17') DEFAULT 'G',
`special_features` set('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') DEFAULT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`film_id`),
KEY `idx_title` (`title`),
KEY `idx_fk_language_id` (`language_id`),
KEY `idx_fk_original_language_id` (`original_language_id`),
CONSTRAINT `fk_film_language` FOREIGN KEY (`language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE,
CONSTRAINT `fk_film_language_original` FOREIGN KEY (`original_language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8

Imagine that we don’t need all of these fields to be available to the PostgreSQL database and its application. In such cases, we can create a foreign table with only the necessary columns on the PostgreSQL side. For example:

CREATE FOREIGN TABLE film (
film_id smallint NOT NULL,
title varchar(255) NOT NULL
) SERVER mysql_svr OPTIONS (dbname 'sakila', table_name 'film');

The challenges of incompatible syntax and datatypes

There are many syntactical differences between MySQL and PostgreSQL. Consequently, you may need to intervene manually when creating foreign tables. For example, MySQL tables accept enumeration definitions inline, whereas PostgreSQL expects enumeration types to be defined before creating the table, like this:

CREATE TYPE rating_t AS enum('G','PG','PG-13','R','NC-17');

Many such things are not handled perfectly. So it is better to specify them as a text datatype. The same applies to the set datatype.

CREATE FOREIGN TABLE film (
film_id smallint NOT NULL,
title varchar(255) NOT NULL,
rating text,
special_features text
) SERVER mysql_svr OPTIONS (dbname 'sakila', table_name 'film');

I’m used to receiving scepticism from people about treating enum and set as text. Well, please don’t forget that we are not storing them in PostgreSQL; the text datatype is just a method for handling input and output from the table. The data is pulled from and pushed to the foreign server, which is MySQL, and MySQL converts these text values into the corresponding enumeration before storing them.

IMPORTANT : mysql_fdw has the capability to do data type conversion (casting) automatically behind the scenes when a user fires DML against foreign tables.

Generally, DML against a remote MySQL database from the PostgreSQL side can be quite challenging because of the architecture differences. These impose restrictions, such as the first column of the foreign table must be unique. We will cover these in more depth in a future post.
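
For completeness, here is a simple DML sketch against the trimmed-down film foreign table defined above; whether it succeeds depends on the mysql_fdw version and on the constraints enforced on the MySQL side:

postgres=# UPDATE film SET title = upper(title) WHERE film_id = 1;
-- the statement is translated into an UPDATE against sakila.film on the MySQL server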

Handling views on the MySQL side

Foreign tables are not limited to tables on the MySQL side: a view can also be mapped as a foreign table. Let’s create a view in the MySQL database.

mysql> create view v_film as select film_id,title,description,release_year from film;

PostgreSQL can treat this view as a foreign table:

postgres=# CREATE FOREIGN TABLE v_film (
film_id smallint,
title varchar(255) NOT NULL,
description text,
release_year smallint ) SERVER mysql_svr OPTIONS (dbname 'sakila', table_name 'v_film');
CREATE FOREIGN TABLE

Views on top of foreign tables in PostgreSQL

PostgreSQL allows us to create views on top of foreign tables. Such a foreign table might even be pointing to a view on the remote MySQL server. Let’s try creating a view using the newly created foreign table v_film.

postgres=# create view v2_film as select film_id,title from v_film;
postgres=# explain verbose select * from v2_film;
QUERY PLAN
--------------------------------------------------------------------------
Foreign Scan on public.v_film  (cost=10.00..1010.00 rows=1000 width=518)
Output: v_film.film_id, v_film.title
Local server startup cost: 10
Remote query: SELECT `film_id`, `title` FROM `sakila`.`v_film`
(4 rows)

Materializing the foreign tables (Materialized Views)

One of the key features mysql_fdw implements is support for persistent connections. After query execution, the connection to the remote MySQL database is not dropped; instead it is retained for the next query from the same session. Nevertheless, in some situations, there will be concerns about continuously streaming data from the source database (MySQL) to the destination (PostgreSQL). If you have a frequent need to access data from foreign tables, you could consider the option of materializing the data locally. It is possible to create a materialized view on top of the foreign table.

postgres=# CREATE MATERIALIZED VIEW mv_film as select * from film;
SELECT 1000

Whenever required, we can just refresh the materialized view.

postgres=# REFRESH MATERIALIZED VIEW mv_film;
REFRESH MATERIALIZED VIEW

Automated Cleanup

One of the features I love about the FDW framework is its ability to clean up foreign tables in a single shot. This is very useful when we set up foreign tables for a temporary purpose, like a data migration. At the very top level, we can drop the extension, and PostgreSQL will walk through the dependencies and drop those too.

postgres=# drop extension mysql_fdw cascade;
NOTICE:  drop cascades to 12 other objects
DETAIL:  drop cascades to server mysql_svr
drop cascades to user mapping for public on server mysql_svr
drop cascades to foreign table "DEPT"
drop cascades to foreign table "EMP"
drop cascades to foreign table hrdb."DEPT"
drop cascades to foreign table hrdb."EMP"
drop cascades to foreign table payroll."DEPT"
drop cascades to foreign table payroll."EMP"
drop cascades to foreign table finance."DEPT"
drop cascades to foreign table finance."EMP"
drop cascades to foreign table sales."DEPT"
drop cascades to foreign table sales."EMP"
DROP EXTENSION
postgres=#

Conclusion

I should concede that the features offered by mysql_fdw are far fewer compared to postgres_fdw. Many of the features are not yet implemented, including column renaming. But the good news is that the key developer and maintainer of mysql_fdw is here with Percona! Hopefully, we will be able to put more effort into implementing some of the missing features. Even so, we can see here that the features implemented so far are powerful enough to support system integration. We can really make the two sing together!

Percona’s support for PostgreSQL

As part of our commitment to being unbiased champions of the open source database eco-system, Percona offers support for PostgreSQL – you can read more about that here.


Pavel Stehule: New release of pspg pager

I redesigned some of the mouse support - the native ncurses implementation is simple, but slow by design.

The default layout of pspg is based on the old Norton Commander layout. It is good for beginners, because almost all controls are visible. But when you have worked with pspg for a longer time, you will probably prefer more visible content over the auxiliary lines. These lines (bars) can now be disabled - you can run pspg with the option --no-bars. pspg is available from GitHub: https://github.com/okbob/pspg

Vladimir Svedov: Understanding System Columns in PostgreSQL

So you sit with your hands over the keyboard and think “what fun can I have to make my lifetime even curiouser?..” Well - create a table, of course!

vao=# create table nocol();
CREATE TABLE
vao=# select * from nocol;
--
(0 rows)

What fun is there in a table with no data?.. Absolutely none! But I can easily fix that:

vao=# insert into nocol default values;
INSERT 0 1

It looks weird and quite stupid to have a table with no columns and one row. Not to mention it is not clear what “default values” were inserted… Well - reading a few lines of the docs reveals that “All columns will be filled with their default values.” Yet I have no columns! Well - I surely have some:

vao=# select attname, attnum, atttypid::regtype, attisdropped::text from pg_attribute where attrelid = 'nocol'::regclass;
 attname  | attnum | atttypid | attisdropped 
----------+--------+----------+--------------
 tableoid |     -7 | oid      | false
 cmax     |     -6 | cid      | false
 xmax     |     -5 | xid      | false
 cmin     |     -4 | cid      | false
 xmin     |     -3 | xid      | false
 ctid     |     -1 | tid      | false
(6 rows)

So these six are definitely not ALTER TABLE DROP COLUMN zombies, because attisdropped is false. Also, I see that the type names of those columns end with “id”. Reading the bottom section of Object Identifier Types will give you the idea. Another funny observation: the -2 is missing! I wonder where I could have lost it - I just created the table after all! Hm, what object identifier is missing in my table? By definition, I mean. I have tuple, command and xact ids. Well, unless it is some “global over the whole db” identifier, like oid?.. Checking is easy - I will create a table WITH OIDS:

vao=# create table nocol_withoid() with oids;
CREATE TABLE
vao=# select attname, attnum, atttypid::regtype, attisdropped::text from pg_attribute where attrelid = 'nocol_withoid'::regclass;
 attname  | attnum | atttypid | attisdropped 
----------+--------+----------+--------------
 tableoid |     -7 | oid      | false
 cmax     |     -6 | cid      | false
 xmax     |     -5 | xid      | false
 cmin     |     -4 | cid      | false
 xmin     |     -3 | xid      | false
 oid      |     -2 | oid      | false
 ctid     |     -1 | tid      | false
(7 rows)

Voila! So the missing -2 is the oid, and it is missing by default - which we like. Spending oids on user data rows would be a bad idea, so I’ll keep playing with the table without OIDS.

What do I have? I have 6 attributes after creating a “no column table” with (oids=false). Should I use system columns? If so, why are they kind of hidden? Well - I would assume they are not so broadly advertised because their usage is not intuitive and the behaviour can change in the future. For instance, after seeing the tuple id (ctid), some might think “ah - this is sort of an internal PK” (and it kind of is):

vao=# select ctid from nocol;
 ctid  
-------
 (0,1)
(1 row)

The first digit (zero) stands for the page number and the second (one) for the tuple number. They are sequential:

vao=# insert into nocol default values;
INSERT 0 1
vao=# select ctid from nocol;
 ctid  
-------
 (0,1)
 (0,2)
(2 rows)

But this sequence won’t even help you determine which row arrived after which:

vao=# alter table nocol add column i int;
ALTER TABLE
vao=# update nocol set i = substring(ctid::text from 4 for 1)::int;
UPDATE 2
vao=# select i, ctid from nocol;
 i | ctid  
---+-------
 1 | (0,3)
 2 | (0,4)
(2 rows)

Here I added a column (to identify my rows) and filled it with the initial tuple number (note that both rows were physically moved):

vao=# delete from nocol where ctid = '(0,3)';
DELETE 1
vao=# vacuum nocol;
VACUUM
vao=# insert into nocol default values;
INSERT 0 1
vao=# select i, ctid from nocol;
 i | ctid  
---+-------
   | (0,1)
 2 | (0,4)
(2 rows)

Aha! (said with rising intonation) - here I deleted one of my rows, let the vacuum loose on the poor table and inserted a new row. The result: the newly added row is in the first page, first tuple, because Postgres wisely decided to reuse the freed-up space.

So the idea of using ctid to get the order in which rows were introduced looks bad. Up to some level: if you work within one transaction, the sequence holds - newly affected rows on the same table will have a “larger” ctid. Of course, after vacuum (autovacuum), or if you are lucky enough to have HOT updates, earlier or just-released gaps will be reused, breaking the sequential order. But fear not - there were six hidden attributes, not one!

vao=# select i, ctid, xmin from nocol;
 i | ctid  | xmin  
---+-------+-------
   | (0,1) | 26211
 2 | (0,4) | 26209
(2 rows)

If I check xmin, I will see that the transaction id that introduced the last inserted row is 2 higher (+1 was the deleted row). So for a sequential row identifier I might use a totally different attribute! Of course it’s not this simple, otherwise such usage would be encouraged. Before 9.4, the xmin column was actually overwritten during freezing to protect against xid wraparound. Why so complicated? The MVCC in Postgres is very smart and the methods around it get better over time. Of course it brings complexity. Alas. Some people even want to avoid system columns. Double alas. Because system columns are cool and well documented.
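
To see the relationship between xmin and the current transaction for yourself, you can compare it with txid_current(); a small sketch, wrapped in a transaction so it leaves no trace (txid_current() returns an epoch-extended value, so on long-lived clusters it can differ from xmin by a multiple of 2^32):

vao=# begin;
BEGIN
vao=# insert into nocol default values;
INSERT 0 1
vao=# select i, xmin, txid_current() from nocol;
-- the freshly inserted row carries an xmin matching the id of this very transaction
vao=# rollback;
ROLLBACK

The very top attribute (remember I skip oids) is tableoid: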

vao=# select i, tableoid from nocol;
 i | tableoid 
---+----------
   |   253952
 2 |   253952
(2 rows)

It looks useless having the SAME value in every row - doesn’t it? And yet a while ago it was a very popular attribute - back when we were all building partitioning using rules and inherited tables. How would you debug which table a row is coming from, if not with tableoid? So when you use rules, views (same rules) or UNION, the tableoid attribute helps you identify the source:

vao=# insert into nocol_withoid default values;
INSERT 253967 1
vao=# select ctid, tableoid from nocol union select ctid, tableoid from nocol_withoid ;
 ctid  | tableoid 
-------+----------
 (0,1) |   253952
 (0,1) |   253961
 (0,4) |   253952
(3 rows)

Wow, what was that? I have gotten so used to seeing INSERT 0 1 that my psql output looked weird! Ah - true - I created a table with oids and just desperately, pointlessly used up one (253967) identifier! Well - not completely pointlessly (though desperately) - the select returns two rows with the same ctid (0,1). Not surprising: I’m selecting from two tables and then appending one result to the other, so the chance of hitting the same ctid is not that low. The last thing to mention is that I can again use object identifier types to show it pretty:

vao=# select ctid, tableoid::regclass from nocol union select ctid, tableoid from nocol_withoid ;
 ctid  |   tableoid    
-------+---------------
 (0,1) | nocol
 (0,1) | nocol_withoid
 (0,4) | nocol
(3 rows)

Aha! (said with rising intonation) - So that’s the way to clearly pin the data source here!

Finally, another very popular and interesting usage - determining which rows were inserted and which were upserted:

vao=# update nocol set i = 0 where i is null;
UPDATE 1
vao=# alter table nocol alter COLUMN i set not null;
ALTER TABLE
vao=# alter table nocol add constraint pk primary key (i);
ALTER TABLE

Now that we have a PK, I can use ON CONFLICT directive:

vao=# insert into nocol values(0),(-1) on conflict(i) do update set i = extract(epoch from now()) returning i, xmax;
     i      |   xmax    
------------+-----------
 1534433974 |     26281
         -1 |         0
(2 rows)

Why so happy? Because I can tell (with some confidence) that a row with xmax not equal to zero was updated. And don’t think it’s obvious - it only looks so here because I used unixtime for the PK, so it stands out from the one-digit values. Imagine you do such an ON CONFLICT twist on a big set: there is no logical way to tell which values had a conflict and which did not. xmax has helped tonnes of DBAs in hard times. And for the best description of how it works I would recommend the discussion here - just as I would recommend reading all three discussion participants (Abelisto, Erwin and Laurenz) on other postgres-tagged questions and answers on SO.
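
A generic version of this trick (a sketch, reusing the nocol table from above) is to return a boolean derived from xmax, so you do not have to eyeball the numbers:

vao=# insert into nocol values(42) on conflict(i) do update set i = excluded.i
      returning i, (xmax::text <> '0') as was_updated;
-- was_updated is false for freshly inserted rows and true for rows touched by DO UPDATE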

That’s it.

tableoid, xmax, xmin and ctid are good friends of any DBA. Not to insult cmax, cmin and oid - they are just as good friends too! But this is enough for a small review and I want to get my hands off the keyboard now.

Pavel Trukhanov: Real world SSD wearout


A year ago we added SMART metrics collection to our monitoring agent, so it now collects disk drive attributes on our clients’ servers.

So here are a couple of interesting cases from the real world.

Because we needed it to work without installing any additional software, such as smartmontools, we implemented collection of only the basic, non-vendor-specific attributes rather than all of them — to be able to provide a consistent experience. That way we also skipped the burdensome task of maintaining a knowledge base of vendor-specific details — and I like that a lot :)

This time we’ll discuss only SMART attribute named “media wearout indicator”. Normalized, it shows a percentage of “write resource” left in the device. Under the hood the device keeps track of the number of cycles the NAND media has undergone, and the percentage is calculated against the maximum number of cycles for that device. The normalized value declines linearly from 100 to 1 as the average erase cycle count increases from 0.

Are there any actually dead SSDs?

Though SSDs are pretty common nowadays, just a couple of years ago you could hear a lot of fearful talk about SSD wearout, so we wanted to see whether any of it was true. We searched for the maximum wearout across all the devices of all of our clients.

It was just 1%

The docs say it just won’t go below 1%, so this device is effectively worn out.

We notified this client. It turned out to be a dedicated server at Hetzner, and their support replaced the device.

Do SSDs die fast?

Since we introduced SMART monitoring for some of the clients quite a while ago, we have accumulated some history, and now we can see it on a timeline.

Unfortunately, the server with the highest wearout rate across our clients’ servers was added to okmeter.io monitoring only two months ago:

This chart indicates that during these two months alone it burned through 8% of its “write resource”.

That is about 4% per month, so 100% of this SSD’s lifetime under that load will be gone in roughly 100 / 4 = 25 months, i.e. about two years.

Is that a lot or a little? I don’t know. But let’s check what kind of load it’s serving.

As you can see, it’s ceph doing all the disk writes, but it’s not doing these writes for itself — it’s a storage system for some application. This particular environment was running under Kubernetes, so let’s sneak a peek at what’s running inside:

It’s Redis! You might have noticed a divergence in values compared with the previous chart — the values here are two times lower (probably due to ceph’s data replication). The load profile is the same though, so we conclude it’s Redis after all.

Let’s see what redis is doing:

So it’s on average less than 100 write commands per second. As you might know, there are two ways Redis makes actual writes to disk:

  • RDB — which periodically snapshots all the dataset to the disk, and
  • AOF — which writes a log of all the changes.

It’s obvious that what we saw here is RDB with one-minute dumps:

Case: SSD + RAID

We see that there are three common patterns of server storage system setup with SSDs:

  • Two SSDs in a RAID-1 that holds everything there is.
  • Some HDDs + SSDs in a RAID-10 — we see that setup a lot on traditional RDBMS servers: OS, WAL and some “cold” data go on the HDDs, while the SSD array holds the hottest data.
  • Just a bunch of SSDs (JBOD) for some NoSQL like Apache Cassandra.

So in the first case, with RAID-1, writes go to both disks symmetrically and wearout proceeds at the same rate:

Looking for some anomalies we found one server where it was completely different:

Checking mount options to understand this didn’t produce much insight — all the partitions were RAID-1 mdraid devices:

But looking at per-device IO metrics we see, again, a difference between the two disks. /dev/sda gets more bytes written:

Turns out there’s swap configured on one of the /dev/sda partitions. And pretty decent swap IO on this server:

SSD wearout and PostgreSQL

This journey began with me looking to check SSD wearout under different Postgres write load profiles. But no luck — all of our clients’ Postgres databases with at least a somewhat high write load are configured pretty carefully: writes go mostly to HDDs.

But I found one pretty interesting case nevertheless:

We see these two SSDs in a RAID-1 wore out by 4% over 3 months. But the hypothesis that it is a high amount of WAL writes turned out to be wrong — it’s less than 100 KB/s:

I figured that Postgres probably generates writes in some other way, and indeed it does: constant temp file writes, all the time:

Thanks to Postgres’s elaborate internal statistics and okmeter.io’s rich support for them, we easily spotted the root cause:

It was a SELECT query generating all that load and wearout! SELECTs in Postgres can sometimes generate not only temp file writes but even real writes to data files. Read about it here.

Summary

  • Redis+RDB generates a ton of disk writes, and the amount depends not on the number of changes in the Redis db but on the DB size and dump frequency. RDB seems to produce the highest write amplification of all the storage systems known to me.
  • Actively used swap on an SSD is probably a bad idea. Unless you want to add some jitter to your RAID-1 SSDs’ wearout.
  • In DBMSes like PostgreSQL it might not be only WAL and data files that dominate disk writes. Bad database design or access patterns can produce a lot of temp file writes. Read how to monitor Postgres queries.

That’s all for today. Be aware of your SSDs’ wearout!

Follow us on our blog or twitter to read more cases.

We at okmeter.io believe that for an engineer to dig up the root cause of a problem, they need decent tooling and a lot of metrics from every layer and part of the infrastructure. That’s where we’re trying to help.


Real world SSD wearout was originally published in okmeter.io blog on Medium, where people are continuing the conversation by highlighting and responding to this story.


Daniel Vérité: Beware of your next glibc upgrade


GNU libc 2.28, released on August 1, 2018, has among its new features a major update of its Unicode locale data with new collation information.

From the announcement:

The localization data for ISO 14651 is updated to match the 2016 Edition 4 release of the standard, this matches data provided by Unicode 9.0.0. This update introduces significant improvements to the collation of Unicode characters. […] With the update many locales have been updated to take advantage of the new collation information. The new collation information has increased the size of the compiled locale archive or binary locales.

For Postgres databases using language and region-sensitive collations, which tend to be the default nowadays, it means that certain strings might sort differently after this upgrade. A critical consequence is that indexes that depend on such collations must be rebuilt immediately after the upgrade. Servers in WAL-based/streaming replication setups should also be upgraded together since a standby must run the same libc/locales as its primary.

The risk otherwise is index corruption issues, as mentioned for instance in these two threads from pgsql-general: “Issues with german locale on CentOS 5,6,7”, and “The dangers of streaming across versions of glibc: A cautionary tale”

So while this issue is not new, what’s special about glibc-2.28 is the scale of the update in locales, which is unprecedented in recent times. Previously and since year 2000, according to bug#14095, the locale data in glibc were modified on a case-by-case basis. This time, there’s a big merge to close the gap with the standard.

To get a feel for the extent of these changes, I’ve installed ArchLinux which already has glibc-2.28, along with PostgreSQL 10.5, and compared some query results against the same Postgres on Debian 9 (“stretch”) with glibc-2.24.

I expected changes, but not so broad. Simple tests on plain ASCII strings reveal obvious differences immediately. For instance, with the en_US.UTF-8 locale:

Debian stretch (glibc 2.24)

=# select version();
                                                             version                                                              
----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.5 (Debian 10.5-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
(1 row)

=# show lc_collate ;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

=# SELECT * FROM (values ('a'), ('$a'), ('a$'), ('b'), ('$b'), ('b$'), ('A'), ('B'))
   AS l(x) ORDER BY x ;
 x  
----
 a
 $a
 a$
 A
 b
 $b
 b$
 B
(8 rows)

ArchLinux (glibc 2.28):

=# select version();
                                   version                                   
-----------------------------------------------------------------------------
 PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.2.0, 64-bit
(1 row)

=# show lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

=# SELECT * FROM (values ('a'), ('$a'), ('a$'), ('b'), ('$b'), ('b$'), ('A'), ('B'))
   AS l(x) ORDER BY x ;
 x  
----
 $a
 $b
 a
 A
 a$
 b
 B
 b$
(8 rows)

The changes are not limited to UTF-8 locales. The above differences also occur with LATIN9 encoding and lc_collate = 'fr_FR.iso885915@euro', for instance. Here’s an even simpler query showing other non-alphabetic characters giving different comparison results across versions:

Debian stretch (glibc 2.24)

=# SELECT * FROM (values ('"0102"'), ('0102')) AS x(x)
   ORDER BY x;
   x    
--------
 0102
 "0102"
(2 rows)

ArchLinux (glibc 2.28):

=# SELECT * FROM (values ('"0102"'), ('0102')) AS x(x)
   ORDER BY x;
   x    
--------
 "0102"
 0102
(2 rows)

The above query is one I liked to use to illustrate the difference between FreeBSD and Linux/glibc. The en_US collation in FreeBSD 11 used to give the opposite result from glibc on this query, but now it turns out that the new glibc gives the same results…

Of course people generally don’t upgrade libc on their own initiative; it happens as part of a system upgrade. Whether that system upgrade includes a Postgres upgrade, and whether it includes a dump/reload or database-wide REINDEXes, remains to be checked by administrators. pg_upgrade does not reindex automatically, nor does it mention the need to reindex in that particular situation.

As of this writing, only “bleeding edge” distros like ArchLinux already ship glibc 2.28. For Fedora it’s scheduled for October 30, 2018; Debian has 2.27-5 in testing, and Ubuntu “cosmic” (18.10) has 2.27-3.

In any case, Linux users, you certainly want to check if your databases are concerned by these collation updates, and if yes, watch out for when glibc 2.28 is landing on your systems and prepare an upgrade scenario to avoid any risk of data corruption!

To know which collations each database uses by default:

 SELECT datname, datcollate FROM pg_database;

To know which collations are in use in indexes (to run in each database):

SELECT distinct collname FROM pg_collation JOIN
  (SELECT regexp_split_to_table(n::text,' ')::oid  AS o
    FROM (SELECT distinct indcollation AS n FROM pg_index) AS a) AS b on o=oid
 -- WHERE collprovider <> 'i'
;

Uncomment the last line with Postgres 10 or newer to filter out ICU collations if needed (ICU collations are not concerned with glibc upgrades).
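Once the new glibc is installed, the remediation is to rebuild the affected indexes. A minimal sketch (the database name is a placeholder; REINDEX takes strong locks, so schedule it carefully):

-- run while connected to each affected database
REINDEX DATABASE mydb;
-- or, more selectively, rebuild only the indexes found with the query above:
-- REINDEX INDEX some_index_name;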

Bruce Momjian: Certificate Revocation Lists


If you are setting up Postgres server or client TLS/SSL certificates, be sure to also configure support for a certificate revocation list (CRL). This list, distributed by the certificate authority, lists certificates that should no longer be trusted.

While the CRL will initially likely be empty, a time will come when a private key used by a certificate or device is exposed in an unauthorized manner, or an employee who had access to private keys leaves your organization. When that happens, you will need the ability to invalidate certificates — having that ability pre-configured will help, especially during a crisis.
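As a minimal sketch of that pre-configuration (the file names are placeholders): on the server side the CRL is pointed to by the ssl_crl_file parameter, and libpq clients can pass their own CRL via the sslcrl connection parameter.

-- server side: the file is looked for in the data directory unless an absolute path is given
ALTER SYSTEM SET ssl_crl_file = 'root.crl';
SELECT pg_reload_conf();   -- PostgreSQL 10+ can reload SSL files on SIGHUP; older releases need a restart

A client connection string such as sslmode=verify-full sslrootcert=root.crt sslcrl=root.crl then makes libpq reject server certificates that appear in the CRL.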

Federico Campoli: A mechanical elephant


In the previous post we modified the apt role to control the setup in a declarative way. Then we added an ssh role for configuring the three devuan servers. The role is used to configure the server’s postgres process owner for passwordless ssh connections.

In this tutorial we’ll complete the setup for the postgres user and then we’ll configure the database clusters with a new role.

Hans-Juergen Schoenig: What hot_standby_feedback in PostgreSQL really does


Many of you out there using PostgreSQL streaming replication might wonder what this hot_standby_feedback parameter in postgresql.conf really does. Support customers keep asking this question so it might be useful to share this knowledge with a broader audience of PostgreSQL users out there.

What VACUUM does in PostgreSQL

VACUUM is an essential command in PostgreSQL: its goal is to clean out dead rows, which are not needed by anyone anymore. The idea is to reuse the space inside a table later as new data comes in. The important thing is: the purpose of VACUUM is to reuse space inside a table – this does not necessarily imply that a relation will shrink. Also keep in mind that VACUUM can only clean out dead rows if they are not needed anymore by some other transaction running on your PostgreSQL server.

Consider the following image:

How hot_standby_feedback and VACUUM work together in PostgreSQL

As you can see we have two connections here. The first connection on the left side is running a lengthy SELECT statement. Now keep in mind: An SQL statement will basically “freeze” its view of the data. Within an SQL statement the world does not “change” – the query will always see the same set of data regardless of changes made concurrently. That is really really important to understand.

Let us take a look at the second transaction. It will delete some data and commit. The question that naturally arises is: when can PostgreSQL really delete this row from disk? DELETE itself cannot really clean the row from disk because there might still be a ROLLBACK instead of a COMMIT. In other words, a row must not be removed on DELETE – PostgreSQL can only mark it as dead for the current transaction. As you can see, other transactions might still be able to see those deleted rows.
However, even COMMIT does not have the right to really clean out the row. Remember: The transaction on the left side can still see the dead row because the SELECT statement does not change its snapshot while it is running. COMMIT is therefore too early to clean out the row.

This is when VACUUM enters the scenario. VACUUM is here to clean rows, which cannot be seen by any other transaction anymore. In my image there are two VACUUM operations going on. The first one cannot clean the dead row yet because it is still seen by the left transaction.
However, the second VACUUM can clean this row because it is not used by the reading transaction anymore.

On a single server the situation is therefore pretty clear. VACUUM can clean out rows, which are not seen anymore.

Replication conflicts in PostgreSQL

What happens in a master / slave scenario? The situation is slightly more complicated, because how can the master know that some strange transaction is going on on one of the slaves?

Here is an image showing a typical scenario:

Prevent table bloat with VACUUM in PostgreSQL

In this case a SELECT statement on the replica is running for a couple of minutes. In the meantime a change is made on the master (UPDATE, DELETE, etc.). This is still no problem. Remember: DELETE does not really remove the row – it simply marks it as dead, but it is still visible to other transactions that are allowed to see the “dead” row. The situation becomes critical if a VACUUM on the master is allowed to really delete the row from disk. VACUUM is allowed to do that because it has no idea that somebody on a slave is still going to need the row. The result is a replication conflict. By default a replication conflict is resolved after 30 seconds:

ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed

If you have ever seen a message like that – this is exactly the kind of problem we are talking about here.

hot_standby_feedback can prevent replication conflicts

To solve this kind of problem, we can teach the slave to periodically inform the master about the oldest transaction running on the slave. If the master knows about old transactions on the slave, it can make VACUUM keep rows until the slaves are done.
This is exactly what hot_standby_feedback does. It prevents rows from being deleted too early from a slave’s point of view. The idea is to inform the master about the oldest transaction ID on the slave so that VACUUM can delay its cleanup action for certain rows.
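Enabling it is a matter of a single parameter on the standby. A minimal sketch (set it in postgresql.conf or via ALTER SYSTEM, then reload):

-- on the standby
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();

-- on the master: the transaction horizon reported by each standby
SELECT application_name, backend_xmin FROM pg_stat_replication;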

The benefit is obvious: hot_standby_feedback will dramatically reduce the number of replication conflicts. However, there are also downsides: Remember, VACUUM will delay its cleanup operations. If the slave never terminates a query, it can lead to table bloat on the master, which can be dangerous in the long run.

The post What hot_standby_feedback in PostgreSQL really does appeared first on Cybertec.

Achilleas Mantzios: Multitenancy Options for PostgreSQL


Multi-tenancy in a software system is the separation of data according to a set of criteria in order to satisfy a set of objectives. The magnitude/extent, the nature and the final implementation of this separation depend on those criteria and objectives. Multi-tenancy is basically a case of data partitioning, but we’ll try to avoid that term for the obvious reason: in PostgreSQL the term has a very specific, reserved meaning, since declarative table partitioning was introduced in PostgreSQL 10.

The criteria might be:

  1. according to the id of an important master table, which symbolizes the tenant id which might represent:
    1. a company/organization within a larger holding group
    2. a department within a company/organization
    3. a regional office/branch of the same company/organization
  2. according to a user’s location/IP
  3. according to a user’s position inside the company/organization

The objectives might be:

  1. separation of physical or virtual resources
  2. separation of system resources
  3. security
  4. accuracy and convenience of management/users at the various levels of the company/organization

Note that by fulfilling an objective we also fulfill all the objectives beneath it, i.e. by fulfilling A we also fulfill B, C and D, by fulfilling B we also fulfill C and D, and so forth.

If we want to fulfill objective A we may choose to deploy each tenant as a separate database cluster within its own physical/virtual server. This gives maximum separation of resources and security but gives poor results when we need to see the whole data as one, i.e. the consolidated view of the whole system.

If we only want to achieve objective B, we might deploy each tenant as a separate PostgreSQL instance on the same server. This would give us control over how much space is assigned to each instance, and also some control (depending on the OS) over CPU/memory utilization. This case is not essentially different from A. In the modern cloud computing era the gap between A and B tends to get smaller and smaller, so A will most probably be preferred over B.

If we want to achieve objective C, i.e. security, then it is enough to have one database instance and deploy each tenant as a separate database.

And finally, if we care only for “soft” separation of data, or in other words different views of the same system, we can achieve this with just one database instance and one database, using a plethora of techniques discussed below as the final (and major) topic of this blog.

Talking about multi-tenancy from the DBA’s perspective, cases A, B and C bear a lot of similarities. This is because in all cases we have different databases, and in order to bridge those databases, special tools and technologies must be used. However, if the need to do so comes from the analytics or Business Intelligence departments, then no bridging may be needed at all, since the data could very well be replicated to some central server dedicated to those tasks, making bridging unnecessary. If such bridging is indeed needed, then we must use tools like dblink or foreign tables. Foreign tables via Foreign Data Wrappers are nowadays the preferred way.

If we use option D, however, then consolidation is already given by default, so now the hard part is the opposite: separation. So we may generally categorize the various options into two main categories:

  • Soft separation
  • Hard separation

Hard Separation via Different Databases in Same Cluster

Let’s suppose that we have to design a system for an imaginary business offering car and boat rentals, but because those two are governed by different legislations, different controls, audits, each company must maintain separate accounting departments and thus we would like to keep their systems separated. In this case we choose to have a different database for each company: rentaldb_cars and rentaldb_boats, which will have identical schemas:

# \d customers
                                  Table "public.customers"
   Column    |     Type      | Collation | Nullable |                Default                
-------------+---------------+-----------+----------+---------------------------------------
 id          | integer       |           | not null | nextval('customers_id_seq'::regclass)
 cust_name   | text          |           | not null |
 birth_date  | date          |           |          |
 sex         | character(10) |           |          |
 nationality | text          |           |          |
Indexes:
    "customers_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "rental" CONSTRAINT "rental_customerid_fkey" FOREIGN KEY (customerid) REFERENCES customers(id)
# \d rental
                              Table "public.rental"
   Column   |  Type   | Collation | Nullable |              Default               
------------+---------+-----------+----------+---------------------------------
 id         | integer |           | not null | nextval('rental_id_seq'::regclass)
 customerid | integer |           | not null |
 vehicleno  | text    |           |          |
 datestart  | date    |           | not null |
 dateend    | date    |           |          |
Indexes:
    "rental_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "rental_customerid_fkey" FOREIGN KEY (customerid) REFERENCES customers(id)

Let’s suppose we have the following rentals. In rentaldb_cars:

rentaldb_cars=# select cust.cust_name,rent.vehicleno,rent.datestart FROM rental rent JOIN customers cust on (rent.customerid=cust.id);
    cust_name    | vehicleno | datestart  
-----------------+-----------+------------
 Valentino Rossi | INI 8888  | 2018-08-10
(1 row)

and in rentaldb_boats:

rentaldb_boats=# select cust.cust_name,rent.vehicleno,rent.datestart FROM rental rent JOIN customers cust on (rent.customerid=cust.id);
   cust_name    | vehicleno | datestart  
----------------+-----------+------------
 Petter Solberg | INI 9999  | 2018-08-10
(1 row)

Now the management would like to have a consolidated view of the system, e.g. a unified way to view the rentals. We may solve this via the application, but if we don’t want to update the application or don’t have access to the source code, then we might solve this by creating a central database rentaldb and by making use of foreign tables, as follows:

CREATE EXTENSION IF NOT EXISTS postgres_fdw WITH SCHEMA public;
CREATE SERVER rentaldb_boats_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS (
    dbname 'rentaldb_boats'
);
CREATE USER MAPPING FOR postgres SERVER rentaldb_boats_srv;
CREATE SERVER rentaldb_cars_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS (
    dbname 'rentaldb_cars'
);
CREATE USER MAPPING FOR postgres SERVER rentaldb_cars_srv;
CREATE FOREIGN TABLE public.customers_boats (
    id integer NOT NULL,
    cust_name text NOT NULL
)
SERVER rentaldb_boats_srv
OPTIONS (
    table_name 'customers'
);
CREATE FOREIGN TABLE public.customers_cars (
    id integer NOT NULL,
    cust_name text NOT NULL
)
SERVER rentaldb_cars_srv
OPTIONS (
    table_name 'customers'
);
CREATE VIEW public.customers AS
 SELECT 'cars'::character varying(50) AS tenant_db,
    customers_cars.id,
    customers_cars.cust_name
   FROM public.customers_cars
UNION
 SELECT 'boats'::character varying AS tenant_db,
    customers_boats.id,
    customers_boats.cust_name
   FROM public.customers_boats;
CREATE FOREIGN TABLE public.rental_boats (
    id integer NOT NULL,
    customerid integer NOT NULL,
    vehicleno text NOT NULL,
    datestart date NOT NULL
)
SERVER rentaldb_boats_srv
OPTIONS (
    table_name 'rental'
);
CREATE FOREIGN TABLE public.rental_cars (
    id integer NOT NULL,
    customerid integer NOT NULL,
    vehicleno text NOT NULL,
    datestart date NOT NULL
)
SERVER rentaldb_cars_srv
OPTIONS (
    table_name 'rental'
);
CREATE VIEW public.rental AS
 SELECT 'cars'::character varying(50) AS tenant_db,
    rental_cars.id,
    rental_cars.customerid,
    rental_cars.vehicleno,
    rental_cars.datestart
   FROM public.rental_cars
UNION
 SELECT 'boats'::character varying AS tenant_db,
    rental_boats.id,
    rental_boats.customerid,
    rental_boats.vehicleno,
    rental_boats.datestart
   FROM public.rental_boats;

In order to view all the rentals and the customers in the whole organization we simply do:

rentaldb=# select cust.cust_name, rent.* FROM rental rent JOIN customers cust ON (rent.tenant_db=cust.tenant_db AND rent.customerid=cust.id);
    cust_name    | tenant_db | id | customerid | vehicleno | datestart  
-----------------+-----------+----+------------+-----------+------------
 Petter Solberg  | boats     |  1 |          1 | INI 9999  | 2018-08-10
 Valentino Rossi | cars      |  1 |          2 | INI 8888  | 2018-08-10
(2 rows)

This looks good, isolation and security are guaranteed, consolidation is achieved, but still there are problems:

  • customers must be separately maintained, meaning that the same customer might end up with two accounts
  • The application must respect the notion of a special column (such as tenant_db) and append this to every query, making it prone to errors
  • The resulting views are not automatically updatable (since they contain UNION)

Soft Separation in the Same Database

When this approach is chosen then consolidation is given out of the box and now the hard part is separation. PostgreSQL offers a plethora of solutions to us in order to implement separation:

  • Views
  • Role Level Security
  • Schemas

With views, the application must set a queryable setting such as application_name; we hide the main table behind a view, and then every query on any child (FK-dependent) table of this main table, if any, joins with this view. We will see this in the following example, in a database we call rentaldb_one. We embed the tenant company identification into the main table:

rentaldb_one=# \d rental_one
                                   Table "public.rental_one"
   Column   |         Type          | Collation | Nullable |              Default               
------------+-----------------------+-----------+----------+------------------------------------
 company    | character varying(50) |           | not null |
 id         | integer               |           | not null | nextval('rental_id_seq'::regclass)
 customerid | integer               |           | not null |
 vehicleno  | text                  |           |          |
 datestart  | date                  |           | not null |
 dateend    | date                  |           |          |
Indexes:
    "rental_pkey" PRIMARY KEY, btree (id)
Check constraints:
    "rental_company_check" CHECK (company::text = ANY (ARRAY['cars'::character varying, 'boats'::character varying]::text[]))
Foreign-key constraints:
    "rental_customerid_fkey" FOREIGN KEY (customerid) REFERENCES customers(id)

The customers table’s schema remains the same. Let’s see the current contents of the database:

rentaldb_one=# select * from customers;
 id |    cust_name    | birth_date | sex | nationality
----+-----------------+------------+-----+-------------
  2 | Valentino Rossi | 1979-02-16 |     |
  1 | Petter Solberg  | 1974-11-18 |     |
(2 rows)
rentaldb_one=# select * from rental_one ;
 company | id | customerid | vehicleno | datestart  | dateend
---------+----+------------+-----------+------------+---------
 cars    |  1 |          2 | INI 8888  | 2018-08-10 |
 boats   |  2 |          1 | INI 9999  | 2018-08-10 |
(2 rows)

We use the new name rental_one in order to hide the table behind the new view, which will have the same name as the table the application expects: rental. The application will need to set the application name to denote the tenant. So in this example we will have three instances of the application: one for cars, one for boats and one for the top management. The application name is set like:

rentaldb_one=# set application_name to 'cars';

We now create the view:

create or replace view rental as
    select company as "tenant_db", id, customerid, vehicleno, datestart, dateend
    from rental_one
    where (company = current_setting('application_name')
           OR current_setting('application_name') = 'all');

Note: We keep the columns and the table/view names as close to the originals as possible; the key point in multi-tenant solutions is to keep things the same on the application side and to keep changes minimal and manageable.

Let’s do some selects:

rentaldb_one=# set application_name to 'cars';
SET
rentaldb_one=# select * from rental;
 tenant_db | id | customerid | vehicleno | datestart  | dateend
-----------+----+------------+-----------+------------+---------
 cars      |  1 |          2 | INI 8888  | 2018-08-10 |
(1 row)
rentaldb_one=# set application_name to 'boats';
SET
rentaldb_one=# select * from rental;
 tenant_db | id | customerid | vehicleno | datestart  | dateend
-----------+----+------------+-----------+------------+---------
 boats     |  2 |          1 | INI 9999  | 2018-08-10 |
(1 row)
rentaldb_one=# set application_name to 'all';
SET
rentaldb_one=# select * from rental;
 tenant_db | id | customerid | vehicleno | datestart  | dateend
-----------+----+------------+-----------+------------+---------
 cars      |  1 |          2 | INI 8888  | 2018-08-10 |
 boats     |  2 |          1 | INI 9999  | 2018-08-10 |
(2 rows)

The third instance of the application, which must set application_name to “all”, is intended for use by the top management with a view over the whole database.

A more robust solution, security-wise, may be based on RLS (row level security). First we restore the name of the table - remember, we don’t want to disturb the application:

rentaldb_one=# alter view rental rename to rental_view;
rentaldb_one=# alter table rental_one rename TO rental;

Next we create the two roles, one per company (boats, cars), whose members must see only their own subset of the data:

rentaldb_one=# create role cars_employees;
rentaldb_one=# create role boats_employees;

We now enable row level security on the rental table (policies have no effect until RLS is enabled) and create a policy for each group:

rentaldb_one=# alter table rental enable row level security;
rentaldb_one=# create policy boats_plcy ON rental to boats_employees USING(company='boats');
rentaldb_one=# create policy cars_plcy ON rental to cars_employees USING(company='cars');

After giving the required grants to the two roles:

rentaldb_one=# grant ALL on SCHEMA public to boats_employees ;
rentaldb_one=# grant ALL on SCHEMA public to cars_employees ;
rentaldb_one=# grant ALL on ALL tables in schema public TO cars_employees ;
rentaldb_one=# grant ALL on ALL tables in schema public TO boats_employees ;

We create one user in each role:

rentaldb_one=# create user boats_user password 'boats_user' IN ROLE boats_employees;
rentaldb_one=# create user cars_user password 'cars_user' IN ROLE cars_employees;

And test:

postgres@smadev:~> psql -U cars_user rentaldb_one
Password for user cars_user:
psql (10.5)
Type "help" for help.

rentaldb_one=> select * from rental;
 company | id | customerid | vehicleno | datestart  | dateend
---------+----+------------+-----------+------------+---------
 cars    |  1 |          2 | INI 8888  | 2018-08-10 |
(1 row)

rentaldb_one=> \q
postgres@smadev:~> psql -U boats_user rentaldb_one
Password for user boats_user:
psql (10.5)
Type "help" for help.

rentaldb_one=> select * from rental;
 company | id | customerid | vehicleno | datestart  | dateend
---------+----+------------+-----------+------------+---------
 boats   |  2 |          1 | INI 9999  | 2018-08-10 |
(1 row)

rentaldb_one=>

The nice thing with this approach is that we don’t need many instances of the application. All the isolation is done at the database level based on the user’s roles. Therefore in order to create a user in the top management all we need to do is grant this user both roles:

rentaldb_one=# create user all_user password 'all_user' IN ROLE boats_employees, cars_employees;
postgres@smadev:~> psql -U all_user rentaldb_one
Password for user all_user:
psql (10.5)
Type "help" for help.

rentaldb_one=> select * from rental;
 company | id | customerid | vehicleno | datestart  | dateend
---------+----+------------+-----------+------------+---------
 cars    |  1 |          2 | INI 8888  | 2018-08-10 |
 boats   |  2 |          1 | INI 9999  | 2018-08-10 |
(2 rows)

Looking at those two solutions, we see that the view solution requires changing the name of the base table, which may be pretty intrusive if we need to run exactly the same schema in a non-multitenant installation, or with an app that is not aware of application_name; the second solution binds people to specific tenants. What if the same person works, e.g., on the boats tenant in the morning and on the cars tenant in the afternoon?

We will see a third solution based on schemas, which in my opinion is the most versatile and does not suffer from any of the caveats of the two solutions described above. It allows the application to run in a tenant-agnostic manner, and the system engineers to add tenants on the go as needs arise. We will keep the same design as before, with the same test data (we keep working on the rentaldb_one example db). The idea here is to add a layer in front of the main table, in the form of a database object in a separate schema, which comes early enough in the search_path for that specific tenant. The search_path can be set (ideally via a special function, which gives more options) in the connection configuration of the data source at the application server layer, and therefore outside of the application code. First we create the two schemas:

rentaldb_one=# create schema cars;
rentaldb_one=# create schema boats;

Then we create the database objects (views) in each schema:

CREATE OR REPLACE VIEW boats.rental AS
 SELECT rental.company,
    rental.id,
    rental.customerid,
    rental.vehicleno,
    rental.datestart,
    rental.dateend
   FROM public.rental
  WHERE rental.company::text = 'boats';
CREATE OR REPLACE VIEW cars.rental AS
 SELECT rental.company,
    rental.id,
    rental.customerid,
    rental.vehicleno,
    rental.datestart,
    rental.dateend
   FROM public.rental
  WHERE rental.company::text = 'cars';

Next step is to set the search path in each tenant as follows:

  • For the boats tenant:

    set search_path TO 'boats, "$user", public';
  • For the cars tenant:

    set search_path TO 'cars, "$user", public';
  • For the top mgmt tenant leave it at default

Lets test:

rentaldb_one=# select * from rental;
 company | id | customerid | vehicleno | datestart  | dateend
---------+----+------------+-----------+------------+---------
 cars    |  1 |          2 | INI 8888  | 2018-08-10 |
 boats   |  2 |          1 | INI 9999  | 2018-08-10 |
(2 rows)

rentaldb_one=# set search_path TO 'boats, "$user", public';
SET
rentaldb_one=# select * from rental;
 company | id | customerid | vehicleno | datestart  | dateend
---------+----+------------+-----------+------------+---------
 boats   |  2 |          1 | INI 9999  | 2018-08-10 |
(1 row)

rentaldb_one=# set search_path TO 'cars, "$user", public';
SET
rentaldb_one=# select * from rental;
 company | id | customerid | vehicleno | datestart  | dateend
---------+----+------------+-----------+------------+---------
 cars    |  1 |          2 | INI 8888  | 2018-08-10 |
(1 row)

Instead of set search_path we may write a more complex function to handle more complex logic and call this in the connection configuration of our application or connection pooler.
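As a hypothetical sketch (this function is not part of the schema above), such a helper could validate the tenant name and set the path in one call, e.g. from the pooler’s connection-init SQL:

CREATE OR REPLACE FUNCTION public.set_tenant(tenant text) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    IF tenant IN ('cars', 'boats') THEN
        -- put the tenant schema first, so cars.rental / boats.rental shadow public.rental
        EXECUTE format('SET search_path TO %I, "$user", public', tenant);
    ELSE
        -- top management or unknown tenant: default path, i.e. public.rental
        SET search_path TO "$user", public;
    END IF;
END;
$$;

-- e.g. as the connection initialization statement: SELECT public.set_tenant('boats');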

In the example above we used the same central table residing in the public schema (public.rental) plus one view per tenant, taking advantage of the fortunate fact that those views are simple and therefore writeable. Instead of views we may use inheritance, by creating one child table per tenant inheriting from the public table - a fine match for table inheritance, a distinctive feature of PostgreSQL. The top table would be configured with rules to disallow inserts. In the inheritance solution a conversion is needed to populate the child tables and to prevent insert access to the parent table, so this is not as simple as the view-based case, which works with minimal impact on the design (a rough sketch is shown below). We might write a special blog on how to do that.
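A rough, hypothetical sketch of the inheritance variant (with fresh table names so it does not collide with the views already created in the cars and boats schemas) could look like this:

CREATE TABLE public.rental_all (
    company    character varying(50) NOT NULL,
    id         serial PRIMARY KEY,
    customerid integer NOT NULL,
    vehicleno  text,
    datestart  date NOT NULL,
    dateend    date
);

CREATE TABLE cars.rental_all  (CHECK (company::text = 'cars'))  INHERITS (public.rental_all);
CREATE TABLE boats.rental_all (CHECK (company::text = 'boats')) INHERITS (public.rental_all);

-- keep the parent empty: rows inserted directly into it are discarded
CREATE RULE rental_all_no_insert AS
    ON INSERT TO public.rental_all DO INSTEAD NOTHING;

With the per-tenant search_path set as above, each tenant reads and writes its own child table, while selecting from public.rental_all still gives the consolidated view.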

The above three approaches may be combined to give even more options.

Christophe Pettus: Don’t LOCK tables. Just don’t.


It’s not uncommon that an application needs to serialize access to one or more resources. The temptation is very high to use the LOCK TABLE SQL statement to do this.

Resist the temptation.

There are many issues with using LOCK:

  • It blocks autovacuum, which can cause bloat or even transaction ID wraparound in extreme cases.
  • An ACCESS EXCLUSIVE lock (the default mode) is passed down to secondaries, which can block queries there, or even cause deadlock-type situations.
  • It’s easy to cause deadlocks with bad LOCKing order.

If the goal is to serialize access, consider using advisory locks instead. They have all of the benefits of a LOCK on a table, while not actually blocking access to autovacuum, or access on secondaries.
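A minimal sketch of that approach (the key 42 is an arbitrary, application-chosen number identifying the resource being serialized):

BEGIN;
-- blocks until any other session holding key 42 finishes its transaction
SELECT pg_advisory_xact_lock(42);
-- ... the work that must be serialized ...
COMMIT;   -- the transaction-scoped advisory lock is released automatically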

(Yes, some database tools may need to take explicit locks for a variety of reasons; that’s a different matter, of course.)

Brian Fehrle: A Guide to Partitioning Data In PostgreSQL


What is Data Partitioning?

For databases with extremely large tables, partitioning is a wonderful and crafty trick for database designers to improve database performance and make maintenance much easier. The maximum table size allowed in a PostgreSQL database is 32 TB; however, unless it’s running on a not-yet-invented computer from the future, performance issues may arise on a table with only a hundredth of that size.

Partitioning splits a table into multiple tables, and is generally done in a way that applications accessing the table don’t notice any difference other than faster access to the data they need. By splitting the table into multiple tables, the idea is to let queries scan much smaller tables and indexes to find the data needed. Regardless of how efficient an index strategy is, scanning an index on a 50 GB table will always be much faster than scanning one on a 500 GB table. This applies to table scans as well, because sometimes table scans are simply unavoidable.

When introducing a partitioned table to the query planner, there are a few things to know and understand about the query planner itself. Before any query is actually executed, the query planner will take the query and plan out the most efficient way it will access the data. By having the data split up across different tables, the planner can decide what tables to access, and what tables to completely ignore, based on what each table contains.

This is done by adding constraints to the split up tables that define what data is allowed in each table, and with a good design, we can have the query planner scan a small subset of data rather than the whole thing.

Should A Table Be Partitioned?

Partitioning can drastically improve performance on a table when done right, but if done wrong or when not needed, it can make performance worse, even unusable.

How big is the table?

There is no real hardline rule for how big a table must be before partitioning is an option, but based on database access trends, database users and administrators will start to see performance on a specific table start to degrade as it gets bigger. In general, partitioning should only be considered when someone says “I can’t do X because the table is too big.” For some hosts, 200 GB could be the right time to partition, for others, it may be time to partition when it hits 1TB.

If the table is determined to be “too big”, it’s time to look at the access patterns. Either by knowing the applications that access the database, or by monitoring logs and generating query reports with something like pgBadger, we can see how a table is accessed, and depending on how it’s accessed, we can have options for a good partitioning strategy.

To learn more about pgBadger and how to use it, please check out our previous article about pgBadger.

Is table bloat an issue?

Updated and deleted rows result in dead tuples that ultimately need to be cleaned up. Vacuuming tables, whether manually or automatically, goes over every row in the table and determines whether it is to be reclaimed or left alone. The larger the table, the longer this process takes and the more system resources it uses. Even if 90% of a table is unchanging data, it must be scanned each time a vacuum is run. Partitioning splits the table that needs vacuuming into smaller ones, reducing the amount of unchanging data that needs to be scanned, shortening vacuum times overall, and freeing up more system resources for user access rather than system maintenance.

How is data deleted, if at all?

If data is deleted on a schedule, say data older than 4 years gets deleted and archived, this could result in heavy-hitting DELETE statements that take time to run and, as mentioned before, create dead rows that need to be vacuumed. If a good partitioning strategy is implemented, a multi-hour DELETE statement with vacuuming maintenance afterward could be turned into a one-minute DROP TABLE statement on an old monthly table with zero vacuum maintenance.

How Should The Table Be Partitioned?

The keys for access patterns are in the WHERE clause and JOIN conditions. Any time a query specifies columns in the WHERE and JOIN clauses, it tells the database “this is the data I want”. Much like designing indexes that target these clauses, partitioning strategies rely on targeting these columns to separate data and have the query access as few partitions as possible.

Examples:

  1. A transaction table, with a date column that is always used in a where clause.
  2. A customer table with location columns, such as country of residence that is always used in where clauses.

The most common columns to focus on for partitioning are timestamps, since usually a huge chunk of the data is historical information and will likely have a rather predictable spread across different time groupings.

Determine the Data Spread

Once we identify which columns to partition on we should take a look at the spread of data, with the goal of creating partition sizes that spread the data as evenly as possible across the different child partitions.

severalnines=# SELECT DATE_TRUNC('year', view_date)::DATE, COUNT(*) FROM website_views GROUP BY 1 ORDER BY 1;
 date_trunc |  count
------------+----------
 2013-01-01 | 11625147
 2014-01-01 | 20819125
 2015-01-01 | 20277739
 2016-01-01 | 20584545
 2017-01-01 | 20777354
 2018-01-01 |   491002
(6 rows)

In this example, we truncate the timestamp column to a yearly table, resulting in about 20 million rows per year. If all of our queries specify a date(s), or date range(s), and those specified usually cover data within a single year, this may be a great starting strategy for partitioning, as it would result in a single table per year, with a manageable number of rows per table.


Creating a Partitioned Table

There are a couple of ways to create partitioned tables; however, we will focus mainly on the most feature-rich type available, trigger-based partitioning. This requires manual setup and a bit of coding in the plpgsql procedural language to get working.

It operates by having a parent table that will ultimately become empty (or remain empty if it’s a new table), and child tables that INHERIT the parent table. When the parent table is queried, the child tables are also searched for data due to the INHERIT applied to the child tables. However, since child tables only contain subsets of the parent’s data, we add a CONSTRAINT on the table that does a CHECK and verifies that the data matches what’s allowed in the table. This does two things: First it refuses data that doesn’t belong, and second it tells the query planner that only data matching this CHECK CONSTRAINT is allowed in this table, so if searching for data that doesn’t match the table, don’t even bother searching it.

Lastly, we apply a trigger to the parent table that executes a stored procedure that decides which child table to put the data in.

Create Table

Creating the parent table is like any other table creation.

severalnines=# CREATE TABLE data_log (data_log_sid SERIAL PRIMARY KEY,
  date TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW(),
  event_details VARCHAR);
CREATE TABLE

Create Child Tables

Creating the child tables is similar, but involves some additions. For organizational sake, we’ll have our child tables exist in a separate schema. Do this for each child table, changing the details accordingly.

NOTE: The name of the sequence used in the nextval() comes from the sequence that the parent created. This is crucial for all child tables to use the same sequence.

severalnines=# CREATE SCHEMA part;
CREATE SCHEMA

severalnines=# CREATE TABLE part.data_log_2018 (data_log_sid integer DEFAULT nextval('public.data_log_data_log_sid_seq'::regclass),
  date TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW(),
  event_details VARCHAR)
 INHERITS (public.data_log);
CREATE TABLE

severalnines=# ALTER TABLE ONLY part.data_log_2018
    ADD CONSTRAINT data_log_2018_pkey PRIMARY KEY (data_log_sid);
ALTER TABLE

severalnines=# ALTER TABLE part.data_log_2018 ADD CONSTRAINT data_log_2018_date CHECK (date >= '2018-01-01' AND date < '2019-01-01');
ALTER TABLE

Create Function and Trigger

Finally, we create our stored procedure, and add the trigger to our parent table.

severalnines=# CREATE OR REPLACE FUNCTION 
 public.insert_trigger_table()
  RETURNS trigger
  LANGUAGE plpgsql
 AS $function$
 BEGIN
     IF NEW.date >= '2018-01-01' AND NEW.date < '2019-01-01' THEN
         INSERT INTO part.data_log_2018 VALUES (NEW.*);
         RETURN NULL;
     ELSIF NEW.date >= '2019-01-01' AND NEW.date < '2020-01-01' THEN
         INSERT INTO part.data_log_2019 VALUES (NEW.*);
         RETURN NULL;
     END IF;
 END;
 $function$;
CREATE FUNCTION

severalnines=# CREATE TRIGGER insert_trigger BEFORE INSERT ON data_log FOR EACH ROW EXECUTE PROCEDURE insert_trigger_table();
CREATE TRIGGER

Test it Out

Now that it’s all created, let’s test it. In this test, I’ve added more yearly tables covering 2013 - 2020.

Note: The insert response below is ‘INSERT 0 0’, which would suggest it didn’t insert anything. This will be addressed later in this article.

severalnines=# INSERT INTO data_log (date, event_details) VALUES (now(), 'First insert');
INSERT 0 0

severalnines=# SELECT * FROM data_log WHERE date >= '2018-08-01' AND date < '2018-09-01';
 data_log_sid |            date            | event_details
--------------+----------------------------+---------------
            1 | 2018-08-17 23:01:38.324056 | First insert
(1 row)

It exists, but let’s look at the query planner to make sure the row came from the correct child table, and the parent table didn’t return any rows at all.

severalnines=# EXPLAIN ANALYZE SELECT * FROM data_log;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..130.12 rows=5813 width=44) (actual time=0.016..0.019 rows=1 loops=1)
   ->  Seq Scan on data_log  (cost=0.00..1.00 rows=1 width=44) (actual time=0.007..0.007 rows=0 loops=1)
   ->  Seq Scan on data_log_2015  (cost=0.00..21.30 rows=1130 width=44) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Seq Scan on data_log_2013  (cost=0.00..17.80 rows=780 width=44) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Seq Scan on data_log_2014  (cost=0.00..17.80 rows=780 width=44) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Seq Scan on data_log_2016  (cost=0.00..17.80 rows=780 width=44) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Seq Scan on data_log_2017  (cost=0.00..17.80 rows=780 width=44) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Seq Scan on data_log_2018  (cost=0.00..1.02 rows=2 width=44) (actual time=0.005..0.005 rows=1 loops=1)
   ->  Seq Scan on data_log_2019  (cost=0.00..17.80 rows=780 width=44) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Seq Scan on data_log_2020  (cost=0.00..17.80 rows=780 width=44) (actual time=0.001..0.001 rows=0 loops=1)
 Planning time: 0.373 ms
 Execution time: 0.069 ms
(12 rows)

Good news, the single row we inserted landed in the 2018 table, where it belongs. But as we can see, the query doesn’t specify a where clause using the date column, so in order to fetch everything, the query planner and execution did a sequential scan on every single table.

Next, let’s test using a where clause.

severalnines=# EXPLAIN ANALYZE SELECT * FROM data_log WHERE date >= '2018-08-01' AND date < '2018-09-01';
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..2.03 rows=2 width=44) (actual time=0.013..0.014 rows=1 loops=1)
   ->  Seq Scan on data_log  (cost=0.00..1.00 rows=1 width=44) (actual time=0.007..0.007 rows=0 loops=1)
         Filter: ((date >= '2018-08-01 00:00:00'::timestamp without time zone) AND (date < '2018-09-01 00:00:00'::timestamp without time zone))
   ->  Seq Scan on data_log_2018  (cost=0.00..1.03 rows=1 width=44) (actual time=0.006..0.006 rows=1 loops=1)
         Filter: ((date >= '2018-08-01 00:00:00'::timestamp without time zone) AND (date < '2018-09-01 00:00:00'::timestamp without time zone))
 Planning time: 0.591 ms
 Execution time: 0.041 ms
(7 rows)

Here we can see that the query planner and execution did a sequential scan on two tables, the parent and the child table for 2018. There are child tables for the years 2013 - 2020, but those other than 2018 were never accessed because the where clause has a range belonging only within 2018. The query planner ruled out all the other tables because the CHECK CONSTRAINT deems it impossible for the data to exist in those tables.

Working Partitions with Strict ORM Tools or Inserted Row Validation

As mentioned before, the example we built returns an ‘INSERT 0 0’ even though we inserted a row. If the applications inserting data into these partitioned tables rely on verifying that the number of rows inserted is correct, those checks will fail. There is a fix, but it adds another layer of complexity to the partitioned table, so it can be ignored if this scenario is not an issue for the applications using the partitioned table.

Using a View instead of the parent table.

The fix for this issue is to create a view that queries the parent table, and direct INSERT statements to the view. Inserting into a view may sound crazy, but that’s where the trigger on the view comes in.

severalnines=# CREATE VIEW data_log_view AS 
 SELECT data_log.data_log_sid,
     data_log.date,
     data_log.event_details
    FROM data_log;
CREATE VIEW

Querying this view will look just like querying the main table, and WHERE clauses as well as JOINS will operate as expected.

View Specific Function and Trigger

The function and trigger are slightly different from the ones we defined before: the function returns NEW instead of NULL, and the trigger is defined INSTEAD OF INSERT on the view.

CREATE OR REPLACE FUNCTION public.insert_trigger_view()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
    IF NEW.date >= '2018-01-01' AND NEW.date < '2019-01-01' THEN
        INSERT INTO part.data_log_2018 VALUES (NEW.*);
        RETURN NEW;

    ELSIF NEW.date >= '2019-01-01' AND NEW.date < '2020-01-01' THEN
        INSERT INTO part.data_log_2019 VALUES (NEW.*);
        RETURN NEW;

    END IF;
END;
$function$;

severalnines=# CREATE TRIGGER insert_trigger INSTEAD OF INSERT ON data_log_view FOR EACH ROW EXECUTE PROCEDURE insert_trigger_view();

The “INSTEAD OF” definition takes over the insert command on the view (which wouldn’t work anyway) and executes the function instead. The function we defined has a very specific requirement of doing a ‘RETURN NEW;’ after the insert into the child tables is complete. Without this (or with ‘RETURN NULL’ as we did before), the result will be ‘INSERT 0 0’ instead of ‘INSERT 0 1’ as we would expect.

Example:

severalnines=# INSERT INTO data_log_view (date, event_details) VALUES (now(), 'First insert on the view');
INSERT 0 1

severalnines=# EXPLAIN ANALYZE SELECT * FROM data_log_view WHERE date >= '2018-08-01' AND date < '2018-09-01';
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..2.03 rows=2 width=44) (actual time=0.015..0.017 rows=2 loops=1)
   ->  Seq Scan on data_log  (cost=0.00..1.00 rows=1 width=44) (actual time=0.009..0.009 rows=0 loops=1)
         Filter: ((date >= '2018-08-01 00:00:00'::timestamp without time zone) AND (date < '2018-09-01 00:00:00'::timestamp without time zone))
   ->  Seq Scan on data_log_2018  (cost=0.00..1.03 rows=1 width=44) (actual time=0.006..0.007 rows=2 loops=1)
         Filter: ((date >= '2018-08-01 00:00:00'::timestamp without time zone) AND (date < '2018-09-01 00:00:00'::timestamp without time zone))
 Planning time: 0.633 ms
 Execution time: 0.048 ms
(7 rows)

severalnines=# SELECT * FROM data_log_view WHERE date >= '2018-08-01' AND date < '2018-09-01';
 data_log_sid |            date            |      event_details
--------------+----------------------------+--------------------------
            1 | 2018-08-17 23:46:24.860634 | First insert
            2 | 2018-08-18 00:13:01.126795 | First insert on the view
(2 rows)

Applications testing that the inserted ‘rowcount’ is correct will find this fix works as expected. In this example we appended _view to our view and stored procedure, but if the table is to be partitioned without any users knowing and without application changes, then we would rename the parent table to data_log_parent and call the view by the old parent table’s name.

Updating a row and changing the partitioned column value

One thing to be aware of: performing an update on data in the partitioned table and changing the value of the partitioned column to something not allowed by the constraint will result in an error. If this type of update will never happen, it can be ignored, but if it is a possibility, a new trigger for UPDATEs should be written that effectively deletes the row from the old child partition and inserts a new row into the new target child partition, as sketched below.
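A rough sketch of such a trigger for the 2018 child table (a hypothetical addition, not something built earlier in this article); a similar trigger would be created on every child:

CREATE OR REPLACE FUNCTION part.data_log_2018_move_on_update()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
    IF NEW.date >= '2018-01-01' AND NEW.date < '2019-01-01' THEN
        RETURN NEW;   -- still belongs in this partition, proceed normally
    END IF;
    -- the row is leaving this partition: remove it here and re-route the new
    -- version through the parent table's insert trigger
    DELETE FROM part.data_log_2018 WHERE data_log_sid = OLD.data_log_sid;
    INSERT INTO public.data_log VALUES (NEW.*);
    RETURN NULL;      -- suppress the original UPDATE on this partition
END;
$function$;

CREATE TRIGGER move_on_update BEFORE UPDATE ON part.data_log_2018
    FOR EACH ROW EXECUTE PROCEDURE part.data_log_2018_move_on_update();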

Creating Future Partitions

Creating future partitions can be done in a few different ways, each with their pros and cons.

Future Partition Creator

An external program can be written to create future partitions some amount of time before they are needed. In a partitioning example partitioned on a date, the next needed partition to create (in our case 2019) could be set to be created sometime in December. This can be a manual script run by the Database Administrator, or set up so that cron runs it when needed. Yearly partitions would mean it runs once a year; however, daily partitions are common, and a daily cron job makes for a happier DBA. A sketch of the SQL such a script could run follows below.
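For example, the yearly scheme used in this article could be extended by a script that cron runs every December with SQL along these lines (2021 is a hypothetical future year; the insert trigger function would also need a matching branch added):

CREATE TABLE IF NOT EXISTS part.data_log_2021 (
    data_log_sid integer PRIMARY KEY DEFAULT nextval('public.data_log_data_log_sid_seq'::regclass),
    date TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW(),
    event_details VARCHAR,
    CONSTRAINT data_log_2021_date CHECK (date >= '2021-01-01' AND date < '2022-01-01')
) INHERITS (public.data_log);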

Automatic Partition Creator

With the power of plpgsql, we can catch the error raised when inserting into a child partition that doesn’t exist, create the needed partition on the fly, and then try the insert again. This option works well, except when many different clients insert similar data at the same time: that can cause a race condition where one client creates the table while another attempts to create the same table and gets an error that it already exists. Clever and advanced plpgsql programming can handle this, but whether it is worth the level of effort is up for debate. If the insert patterns make this race condition impossible, there is nothing to worry about. A sketch of this approach is shown below.
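A minimal, untested sketch of such a trigger function, assuming the data_log layout and the part schema used in this article; the race window mentioned above still exists between the failed insert and the CREATE TABLE.

CREATE OR REPLACE FUNCTION insert_trigger_auto()
RETURNS trigger AS $$
DECLARE
    child text := 'data_log_' || to_char(NEW.date, 'YYYY');
BEGIN
    BEGIN
        EXECUTE format('INSERT INTO part.%I SELECT ($1).*', child) USING NEW;
    EXCEPTION WHEN undefined_table THEN
        -- Child does not exist yet: create it, then retry the insert. Two
        -- sessions can still race here; IF NOT EXISTS softens but does not
        -- fully remove that window.
        EXECUTE format('CREATE TABLE IF NOT EXISTS part.%I (CHECK (date >= %L AND date < %L)) INHERITS (public.data_log)',
                       child,
                       date_trunc('year', NEW.date),
                       date_trunc('year', NEW.date) + interval '1 year');
        EXECUTE format('INSERT INTO part.%I SELECT ($1).*', child) USING NEW;
    END;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;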

Dropping Partitions

If data retention rules dictate that data is deleted after a certain amount of time, this becomes easier with partitioned tables if partitioned on a date column. If we are to delete data that’s 10 years old, it could be as simple as:

severalnines=# DROP TABLE part.data_log_2007;
DROP TABLE

This is much quicker, and more efficient than a ‘DELETE’ statement, as it doesn’t result in any dead tuples to clean up with a vacuum.

Note: If removing tables from the partition setup, code in the trigger functions should also be altered to not direct data to the dropped table.

Things to Know Before Partitioning

Partitioning tables can offer a drastic improvement to performance, but it could also make it worse. Before pushing to production servers, the partitioning strategy should be tested extensively for data consistency, performance, everything. Partitioning a table has a few moving parts; they should all be tested to make sure there are zero issues.

When it comes to deciding the number of partitions, it’s highly suggested to keep the number of child tables under 1000 tables, and even lower if possible. Once the child table count gets above ~1000, performance starts to take a dive as the query planner itself ends up taking much longer just to make the query plan. It’s not unheard of to have a query plan take many seconds, while the actual execution only takes a few milliseconds. If servicing thousands of queries a minute, several seconds could bring applications to a standstill.

The plpgsql trigger stored procedures can also get complicated, and if too complicated, also slow down performance. The stored procedure is executed once for every row inserted into the table. If it ends up doing too much processing for every row, inserts could become too slow. Performance testing will make sure it’s still in acceptable range.

Get Creative

Partitioning tables in PostgreSQL can be as advanced as needed. Instead of date columns, tables can be partitioned on a ‘country’ column, with a table for each country. Partitioning can be done on multiple columns, such as both a ‘date’ and a ‘country’ column. This will make the stored procedure handling the inserts more complex, but it’s 100% possible.
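As a purely illustrative sketch (the table and country column here are hypothetical, not from this article’s data_log example), a child partition constrained on both a date and a country could look like this:

CREATE TABLE orders (
    order_id serial,
    country  text NOT NULL,
    date     timestamp NOT NULL
);

CREATE TABLE orders_2018_usa (
    CHECK (date >= '2018-01-01' AND date < '2019-01-01'),
    CHECK (country = 'USA')
) INHERITS (orders);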

Remember, the goals with partitioning are to break extremely large tables down into smaller ones, and do it in a well thought out way to allow the query planner to access the data faster than it could have in the larger original table.

Declarative Partitioning

In PostgreSQL 10 and later, a new partitioning feature, ‘Declarative Partitioning’, was introduced. It’s an easier way to set up partitions, but it has some limitations. If those limitations are acceptable, it will likely perform faster than the manual partition setup, though copious amounts of testing should verify that.
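For reference, this is a hedged sketch of what the data_log example from this article would look like with declarative partitioning (PostgreSQL 10+); column names mirror the earlier setup, and no trigger functions are needed since rows are routed automatically.

CREATE TABLE data_log (
    data_log_sid  serial,
    date          timestamp NOT NULL,
    event_details varchar
) PARTITION BY RANGE (date);

CREATE TABLE data_log_2018 PARTITION OF data_log
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');

-- Routed to data_log_2018 automatically:
INSERT INTO data_log (date, event_details) VALUES ('2018-08-18', 'First insert');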

The official PostgreSQL documentation has information about Declarative Partitioning and how it works. It’s new in PostgreSQL 10, and with version 11 of PostgreSQL on the horizon at the time of this writing, some of the limitations have been fixed, but not all of them. As PostgreSQL evolves, Declarative Partitioning may become a full replacement for the more complex partitioning covered in this article. Until then, Declarative Partitioning may be an easier alternative if none of its limitations conflict with your partitioning needs.

Declarative Partitioning Limitations

The PostgreSQL documentation addresses all of the limitations with this type of partitioning in PostgreSQL 10, but a great overview can be found on The Official PostgreSQL Wiki which lists the limitations in an easier to read format, as well as noting which ones have been fixed in the upcoming PostgreSQL 11.

Ask the Community

Database Administrators all around the globe have been designing advanced and custom partitioning strategies for a long time, and many of us hang out in IRC and mailing lists. If help is needed deciding the best strategy, or just getting a bug in a stored procedure resolved, the community is here to help.

  • IRC
    Freenode has a very active channel called #postgres, where users help each other understand concepts, fix errors, or find other resources.
  • Mailing Lists
    PostgreSQL has a handful of mailing lists that can be joined. Longer form questions / issues can be sent here, and can reach many more people than IRC at any given time. The lists can be found on the PostgreSQL Website, and the lists pgsql-general or pgsql-admin are good resources.

Ibrar Ahmed: Tune Linux Kernel Parameters For PostgreSQL Optimization

For optimum performance, a PostgreSQL database depends on the operating system parameters being defined correctly. Poorly configured OS kernel parameters can cause degradation in database server performance. Therefore, it is imperative that these parameters are configured according to the database server and its workload. In this post, we will discuss some important kernel parameters that can affect database server performance and how these should be tuned.

SHMMAX / SHMALL

SHMMAX is a kernel parameter used to define the maximum size of a single shared memory segment a Linux process can allocate. Up to and including version 9.2, PostgreSQL used System V (SysV) shared memory, which requires the SHMMAX setting. Starting with 9.3, PostgreSQL switched to POSIX shared memory, so it now requires far fewer bytes of System V shared memory.

Prior to version 9.3, SHMMAX was the most important kernel parameter. The value of SHMMAX is in bytes.

Similarly, SHMALL is another kernel parameter used to define the system-wide total amount of shared memory, measured in pages. To view the current values for SHMMAX, SHMALL or SHMMIN, use the ipcs command.

$ ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1073741824
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

$ ipcs -M
IPC status from  as of Thu Aug 16 22:20:35 PKT 2018
shminfo:
	shmmax: 16777216	(max shared memory segment size)
	shmmin:       1	(min shared memory segment size)
	shmmni:      32	(max number of shared memory identifiers)
	shmseg:       8	(max shared memory segments per process)
	shmall:    1024	(max amount of shared memory in pages)

PostgreSQL uses System V IPC to allocate the shared memory. This parameter is one of the most important kernel parameters. Whenever you get the following error messages, it means that you have an older version of PostgreSQL and a very low SHMMAX value. Users are expected to adjust and increase the value according to the shared memory they are going to use.

Possible misconfiguration errors

If SHMMAX is misconfigured, you can get an error when trying to initialize a PostgreSQL cluster using the initdb command.

DETAIL: Failed system call was shmget(key=1, size=2072576, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.
You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 2072576 bytes),
reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.
child process exited with exit code 1

Similarly, you can get an error when starting the PostgreSQL server using the pg_ctl command.

DETAIL: Failed system call was shmget(key=5432001, size=14385152, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.
You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 14385152 bytes),
reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.

Be aware of differing definitions

The definition of the SHMMAX/SHMALL parameters is slightly different between Linux and MacOS X. These are the definitions:

  • Linux: kernel.shmmax, kernel.shmall
  • MacOS X: kern.sysv.shmmax, kern.sysv.shmall

The sysctl command can be used to change the value temporarily. To permanently set the value, add an entry into /etc/sysctl.conf. The details are given below.

# MacOS X (kern.sysv.*)
# Get the value of SHMMAX
sudo sysctl kern.sysv.shmmax
kern.sysv.shmmax: 4096
# Get the value of SHMALL
sudo sysctl kern.sysv.shmall
kern.sysv.shmall: 4096
# Set the value of SHMMAX
sudo sysctl -w kern.sysv.shmmax=16777216
kern.sysv.shmmax: 4096 -> 16777216
# Set the value of SHMALL
sudo sysctl -w kern.sysv.shmall=16777216
kern.sysv.shmall: 4096 -> 16777216

# Linux (kernel.*)
# Get the value of SHMMAX
sudo sysctl kernel.shmmax
kernel.shmmax: 4096
# Get the value of SHMALL
sudo sysctl kernel.shmall
kernel.shmall: 4096
# Set the value of SHMMAX
sudo sysctl -w kernel.shmmax=16777216
kernel.shmmax: 4096 -> 16777216
# Set the value of SHMALL
sudo sysctl -w kernel.shmall=16777216
kernel.shmall: 4096 -> 16777216

Remember: to make the change permanent, add these values in /etc/sysctl.conf.

Huge Pages

Linux by default uses 4K memory pages, BSD has Super Pages, whereas Windows has Large Pages. A page is a chunk of RAM that is allocated to a process. A process may own more than one page depending on its memory requirements. The more memory a process needs, the more pages are allocated to it. The OS maintains a table of page allocations to processes. The smaller the page size, the bigger the table and the more time required to look up a page in that page table. Therefore, huge pages make it possible to use large amounts of memory with reduced overhead: fewer page lookups, fewer page faults, and faster read/write operations through larger buffers. The result is improved performance.

PostgreSQL has support for bigger pages on Linux only. By default, Linux uses 4K memory pages, so in cases where there are too many memory operations, there is a need to set bigger pages. Performance gains have been observed by using huge pages with sizes from 2 MB up to 1 GB. The huge page size can be set at boot time. You can easily check the huge page settings and utilization on your Linux box using the cat /proc/meminfo | grep -i huge command.

Note: This is only for Linux; for other operating systems this operation is ignored.

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

In this example, although the huge page size is set at 2,048 kB (2 MB), the total number of huge pages has a value of 0, which signifies that huge pages are disabled.

Script to quantify Huge Pages

This is a simple script which returns the number of Huge Pages required. Execute the script on your Linux box while PostgreSQL is running. Ensure that the $PGDATA environment variable is set to PostgreSQL’s data directory.

#!/bin/bash
# Read the postmaster PID from the data directory
pid=`head -1 $PGDATA/postmaster.pid`
echo "Pid:            $pid"
# Peak virtual memory used by the postmaster, in kB
peak=`grep ^VmPeak /proc/$pid/status | awk '{ print $2 }'`
echo "VmPeak:            $peak kB"
# Configured huge page size, in kB
hps=`grep ^Hugepagesize /proc/meminfo | awk '{ print $2 }'`
echo "Hugepagesize:   $hps kB"
# Number of huge pages needed to cover the peak usage
hp=$((peak/hps))
echo Set Huge Pages:     $hp

The output of the script looks like this:

Pid:            12737
VmPeak:         180932 kB
Hugepagesize:   2048 kB
Set Huge Pages: 88

The recommended number of huge pages is 88, so set the value accordingly:

sysctl -w vm.nr_hugepages=88

Check the huge pages now; you will see that no huge page is in use yet (HugePages_Free = HugePages_Total).

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       88
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now set the parameter huge_pages = on in $PGDATA/postgresql.conf and restart the server.
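As an alternative to editing the file by hand (not part of the original walkthrough), the same setting can be written from a psql session with ALTER SYSTEM; a restart is still required.

ALTER SYSTEM SET huge_pages = 'on';
-- restart the server, then verify:
SHOW huge_pages;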

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       81
HugePages_Rsvd:       64
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now you can see that a few of the huge pages are in use. Let’s now try to add some data into the database.

postgres=# CREATE TABLE foo(a INTEGER);
CREATE TABLE
postgres=# INSERT INTO foo VALUES(generate_Series(1,10000000));
INSERT 0 10000000

Let’s see if we are now using more huge pages than before.

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       18
HugePages_Rsvd:        1
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now you can see that most of the huge pages are in use.

Note: The sample value for HugePages used here is very low, which is not a normal value for a big production machine. Please assess the required number of pages for your system and set those accordingly, depending on your system’s workload and resources.

vm.swappiness

vm.swappiness is another kernel parameter that can affect the performance of the database. This parameter controls how aggressively the kernel swaps memory pages between RAM and swap space on a Linux system. The value ranges from 0 to 100 and controls how much memory will be swapped or paged out. Zero means swapping is effectively disabled and 100 means aggressive swapping.

You may get good performance by setting lower values.

Setting a value of 0 in newer kernels may cause the OOM Killer (the out-of-memory killer process in Linux) to kill the process. Therefore, you can be on the safe side and set the value to 1 if you want to minimize swapping. The default value on a Linux system is 60. A higher value causes the kernel to use swap space more aggressively, whereas a lower value keeps more data/code in RAM.

A smaller value is a good bet to improve performance in PostgreSQL.

vm.overcommit_memory / vm.overcommit_ratio

Applications acquire memory and free it when it is no longer needed. But in some cases an application acquires too much memory and does not release it. This can invoke the OOM killer. Here are the possible values for the vm.overcommit_memory parameter, with a description for each:

    0. Heuristic overcommit (the default): the kernel decides intelligently, based on heuristics.
    1. Always allow overcommit.
    2. Don’t overcommit beyond the overcommit ratio.

Reference: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

vm.overcommit_ratio is the percentage of RAM that counts toward the commit limit when vm.overcommit_memory is set to 2; the limit is swap space plus that percentage of RAM. For example, a value of 50 on a system with 2 GB of RAM and 2 GB of swap allows up to 3 GB to be committed.

A value of 2 for vm.overcommit_memory yields better performance for PostgreSQL. This value maximises RAM utilization by the server process without any significant risk of it being killed by the OOM killer. An application will still be able to overcommit, but only within the overcommit ratio, which reduces the risk of the OOM killer terminating the process. Hence a value of 2 gives better performance than the default value of 0, and improves reliability by ensuring that memory beyond the allowable range is not overcommitted.

On systems without swap, you may experience problems when vm.overcommit_memory is set to 2; see the PostgreSQL documentation on Linux memory overcommit:

https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

vm.dirty_background_ratio / vm.dirty_background_bytes

The vm.dirty_background_ratio is the percentage of memory filled with dirty pages that need to be flushed to disk. Flushing is done in the background. The value of this parameter ranges from 0 to 100; however a value lower than 5 may not be effective and some kernels do not internally support it. The default value is 10 on most Linux systems. You can gain performance for write intensive operations with a lower ratio, which means that Linux flushes dirty pages in the background.

You need to set a value of vm.dirty_background_bytes depending on your disk speed.

There are no “good” values for these two parameters since both depend on the hardware. However, as a starting point, setting vm.dirty_background_ratio to 5, or vm.dirty_background_bytes to about 25% of your disk’s write speed, improves performance by up to ~25% in most cases (only one of the two settings can be in effect at a time).

vm.dirty_ratio / vm.dirty_bytes

This is the same as vm.dirty_background_ratio / vm.dirty_background_bytes, except that the flushing is done in the foreground, blocking the application. So vm.dirty_ratio should be higher than vm.dirty_background_ratio. This ensures that the background flushers kick in before the foreground ones, avoiding blocking the application as much as possible. You can tune the difference between the two ratios depending on your disk I/O load.

Summing up

You can tune other parameters for performance, but the improvement gains are likely to be minimal. We must keep in mind that not all parameters are relevant for all application types. Some applications perform better after tuning certain parameters and some don’t. You need to find a good balance between these parameter configurations for the expected application workload and type, and OS behaviour must also be kept in mind when making adjustments. Tuning kernel parameters is not as easy as tuning database parameters: it’s harder to be prescriptive.

In my next post, I’ll take a look at tuning PostgreSQL’s database parameters. You might also enjoy this post:

Tuning PostgreSQL for sysbench-tpcc

 

The post Tune Linux Kernel Parameters For PostgreSQL Optimization appeared first on Percona Database Performance Blog.

Bruce Momjian: Foreign Data Wrappers and Passwords


Foreign data wrappers (FDW) allow data to be read and written to foreign data sources, like NoSQL stores or other Postgres servers. Unfortunately the authentication supported by FDWs is typically limited to passwords defined using CREATE USER MAPPING. For example, postgres_fdw only supports password-based authentication, e.g. SCRAM. Though only the database administrator can see the password, this can still be a security issue.
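As a hedged illustration of the limitation being described (server name, host, and credentials are placeholders, not from the original post), a typical postgres_fdw setup stores the remote password directly in the user mapping:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER remote_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'appdb', port '5432');

-- The password lives in the mapping's options; this is the limitation discussed here.
CREATE USER MAPPING FOR app_user
    SERVER remote_pg
    OPTIONS (user 'app_user', password 'secret');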

Ideally, at least some of the Postgres FDWs should support more sophisticated authentication methods, particularly SSL certificates. Another option would be to allow user authentication to be sent through FDWs, so the user has the same permissions on the FDW source and target. There is no technical reason FDW authentication is limited to passwords. This problem has been discussed, and it looks like someone has a plan for solving it, so hopefully it will be improved soon.

Craig Kerstiens: Postgres data types you should consider using


Postgres is a rich and powerful database. And the existence of PostgreSQL extension APIs has enabled Postgres to expand its capabilities beyond the boundaries of what you would expect in a traditional relational database. Popular Postgres extensions today range from HyperLogLog, which gives you approximate distinct counts with a small footprint, to rich geospatial support via PostGIS, to Citus, which helps you scale out your Postgres database across multiple nodes to improve performance for multi-tenant SaaS applications and real-time analytics dashboards, alongside the built-in full text search capabilities of PostgreSQL. With all the bells and whistles you can layer into Postgres, sometimes the most basic built-ins get overlooked.

PostgreSQL has nearly 100 different data types, and these data types can come with their own tuned indexing or their own specialized functions. You probably already use the basics such as integers and text, and today we’re going to take a survey of less-used but incredibly powerful PostgreSQL data types.

JSONB tops the list of Postgres data types

Postgres first received JSON support in Postgres 9.2. But the initial JSON support in 9.2 was mostly about JSON validation, hence it was less than ideal for many use cases that needed JSON as well as fast query performance.

A couple of years later we got the successor to the JSON datatype: JSONB. JSONB is a binary version of JSON stored on disk. JSONB compresses, so you lose whitespace, but it comes with some powerful index types to allow you to work much more flexibly with your JSON data.

JSONB is great for unstructured data, and with Postgres you can easily join JSONB data to your other relational models. We use JSONB ourselves heavily at Citus for things such as feature flags, event observation data, and recording logs. You can index JSONB data with GIN indexes, which index the keys and values of the document automatically for you and speed up your lookups.
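A minimal sketch (table and column names are illustrative, not from the original post) of a JSONB column with a GIN index that the containment operator @> can use:

CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

-- Default jsonb_ops GIN index over all keys and values in the document
CREATE INDEX events_payload_idx ON events USING GIN (payload);

-- Containment lookups like this can use the GIN index:
SELECT * FROM events WHERE payload @> '{"type": "signup"}';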

Range types are a calendar app’s best friend

Let’s face it, dealing with time in any fashion is hard. When dealing with time ranges, the challenges can be even worse: how do you ensure your conference schedule doesn’t have two talks scheduled at the same time in a given room? How do you ensure you only have a single invoice for each month? With range types, the value has a from and a to value, or a range. You can have ranges of numbers such as 1-20, or ranges of timestamps. The next time you have two columns in your database for a from-to, or a start-stop value, consider using a timestamp range.

Once you have your timestamp range in place, make sure to set up your constraints to enforce the data integrity you’re looking for, as in the sketch below.
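Here is a hedged sketch (table and column names are illustrative, not from the original post) using a tsrange column plus an exclusion constraint so that two talks in the same room can never overlap:

-- btree_gist lets = and && be combined in one exclusion constraint
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE talks (
    id       serial PRIMARY KEY,
    room     text NOT NULL,
    duration tsrange NOT NULL,
    EXCLUDE USING gist (room WITH =, duration WITH &&)
);

INSERT INTO talks (room, duration)
    VALUES ('Ballroom A', tsrange('2018-09-01 10:00', '2018-09-01 11:00'));
-- Fails with a conflict, because the ranges overlap in the same room:
INSERT INTO talks (room, duration)
    VALUES ('Ballroom A', tsrange('2018-09-01 10:30', '2018-09-01 11:30'));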

Define your own acceptable values with enums

Knowing that the values inserted into your database are valid can be just as important as having flexibility. Enumerated data types (enums) are a great candidate for certain values that seldom change. With an enum, you first define the type and then use that type when creating your table. A great example is states for invoices. First you can create your enumerated type, called invoice_state in this example:

CREATE TYPE invoice_state AS ENUM ('pending', 'failed', 'charged');

Then on your invoices table you can use the newly-created enumerated type as the column type:

CREATE TABLE invoices (
    id          serial,
    customer_id int,
    amount      int,
    state       invoice_state
);
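As a quick illustration (not from the original post), any value outside the enum is rejected at insert time:

INSERT INTO invoices (customer_id, amount, state) VALUES (1, 100, 'charged');
-- ERROR: invalid input value for enum invoice_state: "refunded"
INSERT INTO invoices (customer_id, amount, state) VALUES (1, 100, 'refunded');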

Internally for the operation of our Citus Cloud database as a service, we use enums for things like invoice states, and also for AWS regions we support as those seldom change and it can be overkill to add another table to join against.

Match your data types to your needs

Whether it’s an IP address, a timestamp, a UUID, or something else: if your application works with a particular kind of data, consider using the equivalent data type in Postgres. By using Postgres data types you’re able to get the maximum leverage and flexibility out of your database, and with Postgres’ track record of improving functions and features around data types, your world will only get better.
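For instance, a hedged sketch (the table and the pgcrypto dependency for gen_random_uuid() are assumptions, not from the original post) using native uuid, inet, and timestamptz columns instead of plain text:

-- pgcrypto provides gen_random_uuid() on versions prior to PostgreSQL 13
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE sessions (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    client_ip  inet NOT NULL,
    started_at timestamptz NOT NULL DEFAULT now()
);

-- inet gets network-aware operators, e.g. "is contained within":
SELECT * FROM sessions WHERE client_ip << '10.0.0.0/8';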

Dimitri Fontaine: Preventing SQL Injections


An SQL Injection is a security breach, one made famous by the “Exploits of a Mom” xkcd comic episode in which we read about little Bobby Tables.

PostgreSQL implements a protocol level facility to send the static SQL query text separately from its dynamic arguments. An SQL injection happens when the database server is mistakenly led to consider a dynamic argument of a query as part of the query text. Sending those parts as separate entities over the protocol means that SQL injection is no longer possible.
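A hedged, SQL-level illustration of that separation (the students table is hypothetical, and client drivers do the equivalent at the protocol level with parse/bind rather than with PREPARE/EXECUTE):

CREATE TABLE students (id serial, name text);

PREPARE find_student (text) AS
    SELECT * FROM students WHERE name = $1;

-- The whole string is bound as a single name value; it never becomes part
-- of the query text, so no table gets dropped.
EXECUTE find_student('Robert''); DROP TABLE students;--');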

Luca Ferrari: Managing Multiple PostgreSQL Installations with pgenv


pgenv is a shell script that allows you to quickly manage multiple PostgreSQL installations within the same host. It is somewhat reminiscent of perlbrew (for Perl 5) and similar systems. In this post I briefly show how to use pgenv, as well as explain the changes I made to it.

Managing Multiple PostgreSQL Installations with pgenv

pgenv is another pearl from theory. It is a single bash script that allows you to download, build, start and stop (as well as nuke) several PostgreSQL installations within the same host.
It is worth noting that pgenv is not, at least for now, an enterprise-level PostgreSQL management tool, but rather an easy way to keep test instances clean and organized. It can be very useful for keeping several clusters around on which to do experiments, testing, and so on.

I first discovered pgenv reading this blog post by David, and I thought it was cool to have a single script to help me manage several environments. I must be honest, this is not the first tool like this I have seen for PostgreSQL, but somehow it caught my attention. I then cloned the repository and started using it. And since I’m curious, I read the source code. Well, ehm, bash? Ok, it is not my favourite shell anymore, but surely it can speed up development while shortening the code with respect to more portable shells.

pgenv works with a command-oriented interface: as in git or other developer-oriented tools, you specify a command (e.g., build) and optionally a specific PostgreSQL version to apply the command to. pgenv works on a single cluster at a time, by linking and unlinking the specific instance...
