Channel: Planet PostgreSQL

Dave Conlin: Configuring work_mem in Postgres

One of the worst performance hits a Postgres query can take is having to perform a sort or hash operation on disk. When these memory-intensive operations need more memory than they are allowed, Postgres uses disk space instead. Disk space is much slower to read and write than RAM, so this generally takes significantly longer.

The best solution to this problem is to avoid having to perform the operation entirely, for example by adding a judicious index.

The second best solution is to reduce the amount of space required for the operation, by working on fewer rows or less data per row. If you can reduce the amount of memory needed, the operation can take place in RAM rather than requiring slow disk access.

However, these options are not always available. Assuming that the server has enough memory, it often makes sense to allow Postgres to use more RAM for these operations before it falls back to disk. This is done by adjusting the work_mem configuration parameter.

The default value for work_mem is 4MB. This is generally acknowledged to be too small for most modern systems. For example, Christophe Pettus suggests that 16MB is a good starting point for most people. So it’s pretty normal to at least consider increasing it. You can check what the current value is with the query:

SHOW work_mem;

There are dangers in changing the work_mem value. It refers to the memory available to a single operation (one individual hash, bitmap scan, or sort), so even a single query can use several times the defined value. When you consider that servers are often serving many queries simultaneously, you'll see why setting the value too high can lead to the server running out of memory. This is, to put it mildly, something you probably want to avoid.
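As a back-of-envelope illustration of that multiplication effect, here is a rough Python sketch (simplified, assumed numbers; real memory use depends on the query plans actually running, so treat it only as a worst-case estimate):

```python
def worst_case_work_mem_bytes(work_mem_mb: int,
                              max_connections: int,
                              ops_per_query: int) -> int:
    """Rough upper bound on RAM that work_mem-governed operations could
    claim if every connection ran a query performing `ops_per_query`
    sorts/hashes at once. Illustrative only; this is not how Postgres
    accounts for memory internally."""
    return work_mem_mb * 1024 * 1024 * max_connections * ops_per_query

# 64MB work_mem, 200 connections, 3 sorts/hashes per query:
print(worst_case_work_mem_bytes(64, 200, 3) / 1024 ** 3, "GiB")  # prints: 37.5 GiB
```

Even a modest-looking per-operation setting can add up to tens of gigabytes in the worst case.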

A value that is too small, on the other hand, will result in too many operations taking place on disk, which is much less efficient than using RAM.

Therefore it's usually a good idea to experiment with values before changing them for the whole server. You can change work_mem just for your current session by running a query, e.g.:

SET work_mem TO '16MB';
  • Possible units available are “kB”, “MB”, “GB” and “TB”

  • If you use an integer (with no units), Postgres will interpret this as a value in kilobytes
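To make those unit rules concrete, here is a small Python helper that converts a work_mem-style value to bytes the way the bullets above describe (a simplified sketch; Postgres's real configuration parser accepts more forms than this):

```python
UNIT_FACTORS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}

def work_mem_bytes(value: str) -> int:
    """Convert a work_mem-style setting to bytes.
    A bare integer is interpreted as kilobytes, matching the rule above."""
    value = value.strip()
    for unit, factor in UNIT_FACTORS.items():
        if value.endswith(unit):
            return int(value[:-len(unit)].strip()) * factor
    return int(value) * 1024  # no unit: kilobytes

print(work_mem_bytes("16MB"))  # prints: 16777216
print(work_mem_bytes("4096"))  # prints: 4194304
```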

Although you can still hurt the server's performance by setting work_mem for your session this way, it is much harder to do. Values that are too low will only hurt the queries you run yourself, and a value has to be much too high before a single query can hog all the available memory.

Once you’re happy that you want to change the work_mem value for your whole server, you can set the value permanently by adding/modifying the work_mem line in your data/postgresql.conf file:

work_mem = 16MB

Then reload the config using a query:

SELECT pg_reload_conf();

Or run the reload from the command line:

pg_ctl reload -D data

Hopefully this will help you tweak your work_mem parameter to its optimal value!


Image credit: Sergio Ibanez




Jobin Augustine: Compression of PostgreSQL WAL Archives Becoming More Important

As hardware and software evolve, the bottlenecks in a database system also shift. Many old problems disappear, and new types of problems pop up.

Old Limitations

There were days when CPU and memory were the limiting factors. More than a decade back, servers with 4 cores were "high end", and as a DBA, my biggest worry was managing the available resources. For an old DBA like me, Oracle's attempt to pool CPU and memory from multiple host machines for a single database using the RAC architecture was a great attempt to solve that problem.

Then came the days of storage speed limitations, triggered by multi-core, multi-threaded processors becoming common while memory sizes and bus speeds increased. Enterprises tried to solve this with sophisticated SAN drives, specialized storage with caches, etc. But the bottleneck has remained for many years, even now as enterprises increasingly shift to NVMe drives.

Recently we started observing a new bottleneck that is becoming a pain point for many database users. As the capability of a single-host server increased, it started processing a huge number of transactions. There are systems that produce thousands of WAL files in a couple of minutes, and there have been a few reported cases where WAL archiving to a cheaper, slower disk system was not able to catch up with WAL generation. To add more complexity, many organizations prefer to store WAL archives over a low-bandwidth network. (There is an inherent problem in Postgres archiving: if it lags behind, it tends to lag even more, because the archiver process needs to search among the .ready files. That won't be discussed here.)

In this blog post, I would like to bring to your attention the fact that compressing WALs can be easily achieved if you are not already doing it, and to share a query to monitor the archiving gap.

Compressing PostgreSQL WALs

The demands and requirements for compressing WALs before archiving are increasing day by day. Luckily, most of the PostgreSQL backup tools, like pgBackRest and WAL-G, already take care of it. The archive_command invokes these tools, which compress the archives transparently for the user.

For example, with pgBackRest we can specify an archive_command that applies gzip compression behind the scenes:

ALTER SYSTEM SET archive_command = 'pgbackrest --stanza=mystanza archive-push %p';

Or in WAL-G, we can specify:

ALTER SYSTEM SET archive_command = 'WALG_FILE_PREFIX=/path/to/archive /usr/local/bin/wal-g wal-push  %p';

This applies lz4 compression (WAL-G's default) to the WAL files.

But what if we are not using any specific backup tool for WAL archiving? We can still compress the WALs using standard Linux tools like gzip or bzip2. Gzip is available by default in most Linux installations, so configuring it is an easy task:

ALTER SYSTEM SET archive_command = '/usr/bin/gzip -c %p > /home/postgres/archived/%f.gz';

However, 7za is the most interesting among all the compression options for WAL: it gives the highest compression as fast as possible, which is the major criterion in a system with high WAL generation. You may have to explicitly install 7za, which is part of the p7zip package, from an extra repository.

On CentOS 7 it is:

sudo yum install epel-release
sudo yum install p7zip

On Ubuntu it is:

sudo apt install p7zip-full

Now we should be able to specify the archive_command like this:

postgres=# alter system set archive_command = '7za a -bd -mx2 -bsp0 -bso0 /home/postgres/archived/%f.7z %p';
ALTER SYSTEM

In my test system, I could see archived WAL files of less than 200KB each. Size can vary according to the content of the WALs, which depends on the type of transactions on the database.

-rw-------. 1 postgres postgres 197K Feb  6 12:13 0000000100000000000000AA.7z
-rw-------. 1 postgres postgres 197K Feb  6 12:13 0000000100000000000000AB.7z
-rw-------. 1 postgres postgres 198K Feb  6 12:13 0000000100000000000000AC.7z
-rw-------. 1 postgres postgres 196K Feb  6 12:13 0000000100000000000000AD.7z
-rw-------. 1 postgres postgres 197K Feb  6 12:13 0000000100000000000000AE.7z

Compressing 16MB files down to the kilobyte range is definitely going to save network bandwidth and storage, while also addressing the problem of archiving falling behind.
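If you want a feel for the savings before touching archive_command, you can compress a sample file offline. A rough sketch, using Python's gzip module in place of the gzip CLI and a synthetic 16MB "segment" (real WAL contents will compress differently, so the ratio here is only illustrative):

```python
import gzip
import os

# Synthetic 16MB "segment": 1MB of incompressible random bytes
# padded with 15MB of zeros (which compress extremely well).
data = os.urandom(1024 * 1024) + bytes(15 * 1024 * 1024)
compressed = gzip.compress(data)
print(f"{len(data)} -> {len(compressed)} bytes")
```

Mostly-zero data shrinks dramatically; a WAL segment full of busy transactions will land somewhere in between the two extremes baked into this sample.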

Restoring the WALs

Archiving with the highest compression is just one part; we should also be able to restore the WALs when required. The backup tools provide their own restore command options. For example, pgBackRest can use archive-get:

restore_command = 'pgbackrest --stanza=demo archive-get %f "%p"'

WAL-G provides wal-fetch for the same purpose.

If you are opting for manual archive compression using gzip, you can use the gunzip utility in restore_command as follows:

gunzip -c /home/postgres/archived/%f.gz > %p

If you have already started using PostgreSQL 12, this parameter can be set using ALTER SYSTEM:

postgres=# alter system set restore_command = 'gunzip -c /home/postgres/archived/%f.gz > %p';
ALTER SYSTEM

For 7za, as shown above, you may use the following:

postgres=# alter system set restore_command = '7za x -so /home/postgres/archived/%f.7z > %p';
ALTER SYSTEM

However, unlike archive_command changes, restore_command changes require you to restart the standby database.

Monitoring Archive Progress

The current WAL archiving status is available from pg_stat_archiver, but finding out the gap using the WAL file names is a bit tricky. Here is a sample query I used to find out how far WAL archiving is lagging:

select pg_walfile_name(pg_current_wal_lsn()),last_archived_wal,last_failed_wal, 
  ('x'||substring(pg_walfile_name(pg_current_wal_lsn()),9,8))::bit(32)::int*256 + 
  ('x'||substring(pg_walfile_name(pg_current_wal_lsn()),17))::bit(32)::int  -
  ('x'||substring(last_archived_wal,9,8))::bit(32)::int*256 -
  ('x'||substring(last_archived_wal,17))::bit(32)::int
  as diff from pg_stat_archiver;

The caveat here is that both the current WAL and the last archived WAL must be on the same timeline for this query to work, which is the common case; only rarely will we encounter anything else in production. So this query can be of good help when monitoring the WAL archiving of a PostgreSQL server.
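The same filename arithmetic can be done outside SQL, for example in a monitoring script. A Python sketch of the calculation the query above performs (assuming the default 16MB segment size, i.e. 256 segments per "log id", and matching timelines):

```python
def wal_segment_number(walfile: str, segs_per_log_id: int = 256) -> int:
    """A 24-character WAL file name is timeline + log id + segment id,
    each 8 hex digits. With the default 16MB segment size there are
    256 segments per log id."""
    log_id = int(walfile[8:16], 16)
    seg_id = int(walfile[16:24], 16)
    return log_id * segs_per_log_id + seg_id

def archive_gap(current_wal: str, last_archived_wal: str) -> int:
    """Number of segments the archiver is behind (same-timeline case only)."""
    return wal_segment_number(current_wal) - wal_segment_number(last_archived_wal)

print(archive_gap("0000000100000000000000AE", "0000000100000000000000AA"))  # prints: 4
```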


Robert Haas: Useless Vacuuming

In previous blog posts that I've written about VACUUM (and I seem to be accumulating an uncomfortable number of those), I've talked about various things that can go wrong with vacuum. One thing I haven't really covered is when autovacuum seems to be running totally normally but you still have a VACUUM problem. In this blog post, I'd like to talk about how to recognize that situation, how to figure out what has caused it, how to avoid it via good monitoring, and how to recover if it happens.

Ernst-Georg Schmid: Excel and ODF support for cloudfs_fdw

cloudfs_fdw now supports .xls (Excel 97-2003), .xlsx, and .ods (Open Document Format) spreadsheets via pandas, xlrd, and odfpy. This requires pandas >= 1.0.1, so Multicorn must be compiled against Python 3.

Since pandas provides sorting and filtering capabilities, cloudfs_fdw tries to push down SQL qualifiers and sort keys when they can be translated into pandas notation.
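As a purely illustrative sketch of what qualifier pushdown means (this is not cloudfs_fdw's actual code, which translates quals into pandas operations), filtering rows from (column, operator, value) qualifiers could look like:

```python
# Map SQL-style operators to Python predicates.
OPS = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}

def apply_quals(rows, quals):
    """Filter dict-rows using (column, operator, value) qualifiers,
    mimicking the pushdown idea described above."""
    for col, op, val in quals:
        rows = [r for r in rows if OPS[op](r[col], val)]
    return rows

rows = [{"x": 1}, {"x": 2}, {"x": 3}]
print(apply_quals(rows, [("x", ">", 1), ("x", "<", 3)]))  # prints: [{'x': 2}]
```

The benefit of pushing quals down to the data source is that fewer rows ever reach the Postgres executor.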

Take a look and have fun.

Pavel Stehule: plpgsql_check 1.9 calculates coverage metrics

Daniel Vérité: Isolation Repeatable Read in PostgreSQL versus MySQL


To avoid having concurrent transactions interfere with each other, SQL engines implement isolation as a feature. This property corresponds to the I in the well-known ACID acronym, the other properties being Atomicity, Consistency, and Durability.

Isolation happens to be configurable, with different levels that correspond to different behaviors when executing concurrent transactions.

The SQL-1992 standard defines four levels of isolation, from the weakest to the strongest:

  • Read Uncommitted: a transaction sees the changes of other transactions before they are committed. PostgreSQL does not implement this mode.

  • Read Committed: a transaction sees changes from others as soon as they have committed.

  • Repeatable Read: when a transaction reads back a row that has been already read by a previous query, it must read the same values, even if the row has been changed by another transaction that has committed in the meantime.

  • Serializable: a transaction cannot see or produce results that could not have occurred if other transactions were not concurrently changing the data. PostgreSQL has implemented true serializability since version 9.1.

If you're porting code from MySQL, or writing code targeting both PostgreSQL and MySQL/MariaDB, you might want to care about the default level being Read Committed in PostgreSQL and Repeatable Read in MySQL or MariaDB (with the InnoDB engine). By the way, the SQL standard says that Serializable should be used by default, so neither engine follows this recommendation (just like Oracle, MS SQL Server, DB2, Sybase ASE… which ignore it as well).

Anyway, let's first look at a very simple example to illustrate the difference between MySQL and PostgreSQL in their default isolation levels:

Example 1

Say we have a single-column table with 4 values:

CREATE TABLE a(x int);
INSERT INTO a VALUES (1),(2),(3),(4);

A transaction Tx1 computes the sum and average of the values in two distinct queries, while a transaction Tx2 inserts a new value, with an execution schedule such that the insertion is committed between the queries in Tx1.

With PostgreSQL in its default isolation Read Committed:

-- Tx1                              -- Tx2
=# BEGIN;                           =# BEGIN;
BEGIN                               BEGIN

=# SELECT SUM(x) FROM a;
 sum
-----
  10
(1 row)
                                    =# INSERT INTO a VALUES(50);
                                    INSERT 0 1

                                    =# COMMIT;
                                    COMMIT
=# SELECT AVG(x) FROM a;
        avg
---------------------
 11.6666666666666667
   (1 row)

=# COMMIT;
COMMIT

The value 50 has been ignored by the SUM but taken into account by the AVG query. If we look only at Tx1, its two results are mathematically inconsistent: of course you can't have the average greater than the sum. This is because in Read Committed, each new SQL statement starts with a state of the database that includes all the changes of the other transactions that have already committed. In this case, that's the insertion of 50 by Tx2.

With MySQL/MariaDB and its default isolation Repeatable Read, the result is different:


-- Tx1                                      -- Tx2
mysql> BEGIN;                               mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)        Query OK, 0 rows affected (0.00 sec)

> SELECT SUM(x) FROM a;
+--------+             
| SUM(x) |             
+--------+             
|     10 |             
+--------+             
1 row in set (0.00 sec)

                                          > INSERT INTO a VALUES(50);         
                                          Query OK, 1 row affected (0.00 sec) 
                                          
                                          > COMMIT;
                                          Query OK, 0 rows affected (0.02 sec)


> SELECT AVG(x) FROM a;                   
+--------+                                
| AVG(x) |
+--------+
| 2.5000 |
+--------+
1 row in set (0.00 sec)

> COMMIT;
Query OK, 0 rows affected (0.00 sec)

With MySQL/MariaDB, the row with 50 inserted by Tx2 is ignored by Tx1, due to the Repeatable Read isolation: the values previously read for the SUM operation must be reused for the AVG operation.

To get the same behavior with PostgreSQL in this example, we'd need to set the transaction to a higher isolation level (Repeatable Read or Serializable).

The point is that simple concurrent queries can produce different results across different engines in their default configurations.

Now we could think that in order to get the same results with MySQL and PostgreSQL, we should just use them at the same isolation levels. That sounds reasonable, but that’s not true in general. In fact, behaviors can differ quite a bit across SQL engines at the same isolation levels. Let’s see another example to illustrate this.

Example 2

Now let’s use Repeatable Read in both engines. To switch the session to this isolation level independently of the default value, we want to execute the following statements:

for MySQL/MariaDB:

> SET SESSION transaction isolation level Repeatable Read;

for PostgreSQL:

=# SET default_transaction_isolation TO 'Repeatable Read';

Let’s create two empty tables:

CREATE TABLE a(xa int);
CREATE TABLE b(xb int);

Now let’s execute two concurrent transactions that insert into each table the count(*) of the other table.

Here’s the PostgreSQL transcript:

-- Tx1                              -- Tx2
=# BEGIN;                           =# BEGIN;
BEGIN                               BEGIN

=# INSERT INTO a
   SELECT count(*) FROM b;
INSERT 0 1
                                    =# INSERT INTO b
                                       SELECT count(*) FROM a;
                                    INSERT 0 1
=# COMMIT;
COMMIT
                                    =# SELECT COUNT(*) FROM a;
                                     count 
                                    -------
                                         0
                                    (1 row)
                                    
                                    =# COMMIT;
                                    COMMIT

After Tx1 and Tx2 have finished, here are the results:

=# SELECT * FROM a;
 xa 
----
  0
(1 row)

=# SELECT * FROM b;
 xb 
----
  0
(1 row)

The value 0 ends up in the two tables because, for each of the two transactions, the table from which it counts the rows is empty according to its visibility.

But with MySQL/MariaDB, the behavior and the final result are different, as shown in the transcript below. Tx2 waits for Tx1 and incorporates its results before continuing, instead of being entirely isolated from Tx1.

-- Tx1                                      -- Tx2
mysql> BEGIN;                               mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)        Query OK, 0 rows affected (0.00 sec)

> INSERT INTO a SELECT count(*) FROM b;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0


                                            > INSERT INTO b SELECT count(*)
                                              FROM a;
                                            -- (Tx2 gets blocked here)

> COMMIT;
Query OK, 0 rows affected (0.01 sec)

                                            -- (Tx2 continues)
                                            
                                            Query OK, 1 row affected (5.03 sec)
                                            Records: 1  Duplicates: 0  Warnings: 0
                                            
                                            > -- this result differs from PG
                                            > SELECT COUNT(*) FROM a;
                                            +----------+
                                            | count(*) |
                                            +----------+
                                            |        1 |
                                            +----------+
                                            
                                            > COMMIT;
                                            Query OK, 0 rows affected (0.01 sec)

-- Tx1 and Tx2 are done
> SELECT * FROM a;
+------+
| xa   |
+------+
|    0 |
+------+
1 row in set (0.01 sec)

> -- this result differs from PG
> SELECT * FROM b;
+------+
| xb   |
+------+
|    1 |
+------+
1 row in set (0.00 sec)

Despite this sequence of instructions being very simple and the transactions being in Repeatable Read in the two engines, the result differs between Postgres and MySQL/MariaDB, in the sense that the b table has a row with 0 in PostgreSQL and 1 in MySQL.

Since Tx1 and Tx2 don’t write to the same row (in fact in this case they don’t even write to the same table), for PostgreSQL the INSERT of Tx1 does not interfere at all with Tx2. Tx2 does not need to wait for Tx1 to finish, it can count the rows in a through its snapshot.

Whereas with MySQL, Tx2 waits for Tx1 to end, and counts the rows in a after incorporating what Tx1 has done (one insertion). So Tx1 and Tx2 are less isolated. From the point of view of a PostgreSQL user accustomed to its Repeatable Read mode, this behavior is quite surprising (in fact, this part behaves like Read Committed in PostgreSQL).

And there are more differences. Let’s see another one with a variant of this example in which Tx2 queries the a table at the start of the transaction.

Example 3

So this example is a variant of the previous one, in which Tx2 reads and returns count(*) FROM a at the beginning. For PostgreSQL, it doesn’t change anything: during all of Tx2, count(*) FROM a is always 0, whether as a subquery of an INSERT statement or as the main query.

But for MySQL, there's a clear difference in behavior in order to avoid the "Non-Repeatable Read" phenomenon, which is the minimum expected from the Repeatable Read isolation level.

Let’s look at the following transcript with MySQL/MariaDB:

-- Tx1                                      -- Tx2
mysql> BEGIN;                               mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)        Query OK, 0 rows affected (0.00 sec)

> INSERT INTO a SELECT count(*) FROM b;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0
                                            > -- Initiate a repeatable read
                                            > SELECT count(*) FROM a;
                                            +----------+
                                            | count(*) |
                                            +----------+
                                            |        0 |
                                            +----------+
                                            1 row in set (0.00 sec)


                                            > INSERT INTO b SELECT count(*)
                                              FROM a;
                                            -- (Tx2 gets blocked)

> COMMIT;
Query OK, 0 rows affected (0.01 sec)

                                            -- (Tx2 gets unblocked)
                                            
                                            Query OK, 1 row affected (3.13 sec)
                                            Records: 1  Duplicates: 0  Warnings: 0
                                            

                                            > SELECT * FROM b;
                                            +------+
                                            | xb   |
                                            +------+
                                            |    1 |
                                            +------+
                                            1 row in set (0.00 sec)
                                            
                                            > -- repeat the read
                                            > SELECT COUNT(*) FROM a;
                                            +----------+
                                            | count(*) |
                                            +----------+
                                            |        0 |
                                            +----------+
                                            1 row in set (0.00 sec)
                                            
                                            > COMMIT;
                                            Query OK, 0 rows affected (0.01 sec)

The final results in a and b are the same as in the previous example, with table b containing 1. But this time, at the end of Tx2, SELECT COUNT(*) FROM a returns 0, while in the previous example it returned 1.

The difference is that a "Non-Repeatable Read" phenomenon is now forbidden, because a SELECT count(*) FROM a returned 0 at the start of the transaction. So any subsequent SELECT count(*) FROM a must produce 0. But that holds only when the query is run directly, not when it's executed as a subquery through INSERT INTO b SELECT count(*) FROM a.

That's why there is this weird inconsistency between the 1 found in b.xb and the 0 returned to the client, despite both being the result of the same expression in the same Repeatable Read transaction.

This difference in results also persists during the rest of Tx2. For instance, before the COMMIT we could have this sequence of statements:

mysql> SELECT * FROM b;
+------+
| xb   |
+------+
|    1 |
+------+
1 row in set (0.01 sec)

mysql> INSERT INTO b SELECT COUNT(*) FROM a;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM b;
+------+
| xb   |
+------+
|    1 |
|    1 |
+------+
2 rows in set (0.00 sec)

mysql> SELECT COUNT(*) FROM a;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Example 4

There's another very significant difference in how Repeatable Read in PostgreSQL and MySQL/MariaDB deals with write conflicts (the kind of conflict that can generate "Lost Updates"). PostgreSQL, with its Snapshot Isolation technique, will avoid a write conflict on the same row by aborting one of the transactions.

By contrast, MySQL/MariaDB at the same isolation level does not abort a transaction, but forbids the second write (a delete, in this case) without emitting any error.

Let's again have a table with a single column containing the 4 integers from 1 to 4. The two concurrent transactions are Tx1, which subtracts 1 from each value, and Tx2, which deletes the row holding the maximum value in the table. The idea is that Tx2 wants to remove a row that Tx1 changes concurrently.

CREATE TABLE list(x int);
INSERT INTO list VALUES (1),(2),(3),(4);

PostgreSQL transcript:

-- Tx1                                 -- Tx2
=# BEGIN;                              =# BEGIN;
BEGIN                                  BEGIN

=# UPDATE list SET x=x-1;              =# SELECT * FROM list; 
                                        x                     
                                       ---                    
                                        1                     
                                        2                     
                                        3                     
                                        4                     
                                       (4 rows)             
                                       
                                       =# DELETE FROM list WHERE x=4;
                                       -- (Tx2 gets blocked)

=# COMMIT;
COMMIT
                                        -- Tx2 ends in error
                                        ERROR:  could not serialize access
                                        due to concurrent update
                                        
                                        =# \echo :SQLSTATE
                                        40001
                                        
                                        =# ROLLBACK;
                                        ROLLBACK


At the Repeatable Read isolation level, the engine rejects the write by Tx2 (broadly speaking, a delete is a write) on a row that Tx1 has modified. That refusal consists of aborting the transaction with a specific error code (SQLSTATE 40001).
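A practical consequence for application code: at Repeatable Read, PostgreSQL clients should be prepared to retry transactions aborted with SQLSTATE 40001. A minimal, driver-agnostic sketch (the exception class here is a stand-in for whatever your database driver raises for that SQLSTATE):

```python
class SerializationFailure(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001."""

def run_with_retry(txn, attempts=5):
    """Run a transaction callable, retrying on serialization failures.
    Each call to `txn` is expected to be a fresh transaction."""
    for i in range(attempts):
        try:
            return txn()
        except SerializationFailure:
            if i == attempts - 1:
                raise  # give up after the last attempt
```

Since the aborted transaction never committed anything, simply re-running it is safe.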

With MySQL/MariaDB as below, there is no transaction abort. The DELETE does not error out but does not remove the row with x=4 that Tx1 changed, even though this row stays visible from Tx2 until its end.

-- Tx1                                      -- Tx2
mysql> BEGIN;                               mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)        Query OK, 0 rows affected (0.00 sec)

> UPDATE list SET x=x-1;                    > SELECT * FROM list;   
Query OK, 4 rows affected (0.00 sec)        +------+                
Rows matched: 4  Changed: 4  Warnings: 0    | x    |                
                                            +------+                
                                            |    1 |                
                                            |    2 |                
                                            |    3 |                
                                            |    4 |                
                                            +------+                
                                            4 rows in set (0.00 sec)
                                            
                                            > DELETE FROM list WHERE x=4;
                                            -- (Tx2 gets blocked here)
                                            Query OK, 0 rows affected (5.73 sec)
> COMMIT;
Query OK, 0 rows affected (0.01 sec)

                                            > SELECT * FROM list;
                                            +------+
                                            | x    |
                                            +------+
                                            |    1 |
                                            |    2 |
                                            |    3 |
                                            |    4 |
                                            +------+
                                            4 rows in set (0.01 sec)

                                            > DELETE FROM list WHERE x=4;
                                            Query OK, 0 rows affected (0.00 sec)

                                            > SELECT * FROM list;
                                            +------+
                                            | x    |
                                            +------+
                                            |    1 |
                                            |    2 |
                                            |    3 |
                                            |    4 |
                                            +------+
                                            4 rows in set (0.00 sec)
                                            
                                            > COMMIT;
                                            Query OK, 0 rows affected (0.00 sec)

When repeating the SELECT * FROM list and the DELETE, we can see that the row with x=4 is still visible, and that the DELETE doesn't remove it. The only indication that the engine refuses to delete it is the "0 rows affected" status returned along with the DELETE.

Compared with PostgreSQL, Tx2 not deleting the row looks like the behavior of its Read Committed level, but the fact that this row remains visible looks like the behavior of the Repeatable Read level. In that sense, the Repeatable Read level in MySQL feels like it sits somewhere between the Read Committed and Repeatable Read levels of Postgres.

Conclusion

When porting applications from MySQL to PostgreSQL or vice-versa, or designing services that need to work on both, we should expect different behaviors with concurrent transactions, even when configured at the same isolation level.

The SQL standard says that certain isolation levels must avoid certain phenomena, but each SQL implementation has its own interpretation of this, with visible results that are clearly different for the same SQL code.

PostgreSQL uses Read Committed by default, whereas MySQL has chosen Repeatable Read, which is better isolated; but when PostgreSQL transactions use the Repeatable Read level, they're more isolated than MySQL transactions at the same level.

Selvakumar Arumugam: A Tool to Compare PostgreSQL Database Schema Versions


Photo by @kelvyn on Unsplash

The End Point development team has completed a major application migration from one stack to another. Many years ago, the vendor maintaining the old stack abandoned support and development. This led to a stack evolution riddled with independent custom changes and new features in the following years.

The new application was developed by a consortium that created migration scripts to transfer data to a fresh stack, resulting in a completely restructured database schema. While we could not directly use those consortium migration scripts for our client's application, creating migration scripts from scratch would have been tedious, involving many labor-intensive and time-consuming tasks. So we looked to reuse and customize the existing scripts in order to ensure an exact match with the customized changes in the client's application.

Liquibase: A Schema Comparison Tool

After an arduous hunt for a suitable solution, we came across Liquibase, an open-source schema comparison tool that utilizes the diff command to assess missing, changed, and unexpected objects.

Installation and Usage

Let’s see how to use Liquibase and review the insights and results offered by the diff command.

Before beginning, download the latest version of Liquibase. As the default package doesn’t have its own driver, it would be wise to add the PostgreSQL driver to the Liquibase lib folder. (You’ll need to do this with any other database types and their necessary libraries and drivers.)

$ wget https://github.com/liquibase/liquibase/releases/download/v3.8.5/liquibase-3.8.5.tar.gz
$ tar xvzf liquibase-3.8.5.tar.gz
$ wget https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.5/postgresql-42.2.5.jar -P lib/

$ ./liquibase \
--classpath="/path/to/home/apps/liquidiff/lib" \
--outputFile=liquibase_output.txt \
--driver=org.postgresql.Driver \
--url=jdbc:postgresql://localhost:5432/schema_two \
--username=postgres \
--password=CHANGEME \
--defaultSchemaName=public \
Diff \
--referenceUrl=jdbc:postgresql://localhost:5432/schema_one \
--referenceUsername=postgres \
--referencePassword=CHANGEME \
--referenceDefaultSchemaName=public

Comparison Results

The following output shows the list of all sections with missing, changed, and newly added objects to each section.

$ cat liquibase_output.txt

Reference Database: postgres @ jdbc:postgresql://localhost:5432/schema_one (Default Schema: public)
Comparison Database: postgres @ jdbc:postgresql://localhost:5432/schema_two (Default Schema: public)
Compared Schemas: public
Product Name: EQUAL
Product Version: EQUAL
Missing Catalog(s): NONE
Unexpected Catalog(s): NONE
Changed Catalog(s): 
     schema_one
          name changed from 'schema_one' to 'schema_two'
Missing Column(s): 
     public.users.settings
     ...
Changed Column(s): 
     public.table_one.unique_no
          type changed from 'varchar(20 BYTE)' to 'varchar(255 BYTE)'     
     public.table_two.created_at
          defaultValue changed from 'null' to 'now()'
          nullable changed from 'false' to 'true'
          order changed from '4' to '22'
    ...
Missing Foreign Key(s): 
     one_two_id_fkey(one[two_id] -> two[id])
    ...
Unexpected Foreign Key(s): 
    ...
Changed Foreign Key(s): 
    ...
Missing Index(s): 
    ...
Unexpected Index(s): 
     table_pkey UNIQUE  ON public.table(id)
    ...
Changed Index(s): 
     index_events_on_record_number ON public.table(record_number)
          unique changed from 'false' to 'true'
    ...
Missing Primary Key(s): ...
Unexpected Primary Key(s): ...
Changed Primary Key(s): ...
Missing Schema(s): NONE
Unexpected Schema(s): NONE
Changed Schema(s): NONE
Missing Sequence(s): ...
Unexpected Sequence(s): NONE
Changed Sequence(s): NONE
Missing Stored Procedure(s): NONE
Unexpected Stored Procedure(s): NONE
Changed Stored Procedure(s): NONE
Missing Table(s): ...
Unexpected Table(s): ...
Changed Table(s): NONE
Missing Unique Constraint(s): NONE
Unexpected Unique Constraint(s): NONE
Changed Unique Constraint(s): NONE
Missing View(s): ...
Unexpected View(s): NONE
Changed View(s): NONE
Liquibase command 'Diff' was executed successfully.

Conclusion

Although comparing and contrasting the 100+ tables from the old application was beneficial, the data migration was still challenging due to the volume and variety of data. However, with the help of Liquibase, we became more familiar with differences in the schema, including tables, columns, references, indexes, views, etc.

This led to an increase in accuracy which was very helpful during the migration process. We hope that by sharing our findings, others will also benefit from this tool and all that it offers.

Luca Ferrari: Take advantage of pg_settings when dealing with your configuration


The right way to get the current PostgreSQL configuration is by means of pg_settings.

Take advantage of pg_settings when dealing with your configuration

I often see messages on PostgreSQL-related mailing lists where the configuration is assumed by a Unix-style approach. For example, imagine you have been asked to provide your autovacuum configuration in order to see if there's something wrong with it; one approach I often see is a copy and paste of the following:

% sudo -u postgres grep autovacuum /postgres/12/postgresql.conf
#autovacuum_work_mem = -1               # min 1MB, or -1 to use maintenance_work_mem
#autovacuum = on                        # Enable autovacuum subprocess?  'on'
#log_autovacuum_min_duration = -1       # -1 disables, 0 logs all actions and
autovacuum_max_workers = 7              # max number of autovacuum subprocesses
autovacuum_naptime = 2min               # time between autovacuum runs
autovacuum_vacuum_threshold = 500       # min number of row updates before
autovacuum_analyze_threshold = 700      # min number of row updates before
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
#autovacuum_multixact_freeze_max_age = 400000000  # maximum multixact age
#autovacuum_vacuum_cost_delay = 2ms     # default vacuum cost delay for
                                        # autovacuum, in milliseconds;
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # autovacuum, -1 means use

While this could be a correct approach and makes it simple to provide a full set of configuration values, it has a few drawbacks:

  • it produces verbose output (e.g., there are comments on the right of each line);
  • it may not be the whole story about the configuration, for example because something is set in postgresql.auto.conf;
  • it includes commented-out lines;
  • it might not be the configuration your cluster is actually running with.

Let’s examine all the drawbacks, one...
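For comparison, here is a sketch of the pg_settings-based alternative (the LIKE pattern and the column list are illustrative; the column names come from the pg_settings view):

```sql
-- Effective autovacuum configuration as the cluster is actually
-- running it, including where each value comes from
SELECT name, setting, unit, source, sourcefile
FROM pg_settings
WHERE name LIKE '%autovacuum%'
ORDER BY name;
```

Unlike grepping postgresql.conf, this reflects postgresql.auto.conf, per-session changes, and the values the running cluster really uses.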


Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 13 – Add leader_pid to pg_stat_activity

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 13 – Add %x to default PROMPT1 and PROMPT2 in psql

Hubert 'depesz' Lubaczewski: Which tables should be auto vacuumed or auto analyzed – UPDATE

Some time ago I wrote blogpost which showed how to list tables that should be autovacuumed or autoanalyzed. Query in there had one important problem – it didn't take into account per-table settings. Specifically – it only used system-wide values for: autovacuum_analyze_scale_factor autovacuum_analyze_threshold autovacuum_vacuum_scale_factor autovacuum_vacuum_threshold but these can be also set per table using syntax … Continue reading "Which tables should be auto vacuumed or auto analyzed – UPDATE"

Hubert 'depesz' Lubaczewski: Fix for displaying aggregates on explain.depesz.com

Couple of days ago RhodiumToad reported, on irc, a bug in explain.depesz.com. Specifically – if explain was done using JSON/XML/YAML formats, and node type was Aggregate, the site didn't extract full info. In text explains the node type is one of: Aggregate HashAggregate GroupAggregate But in non-text formats, type of Aggregate was ignored. As of … Continue reading "Fix for displaying aggregates on explain.depesz.com"

Thomas Spoelstra: My First Postgres Day experience at FosDEM 2020

My first experience of a Postgres day, where people talk PostgreSQL all day. My day started very early – I left home at 05:15 and arrived in Brussels at 09:10. Unfortunately, due to the train schedules (where trains seem to wake up later than I do), I arrived a bit late for the opening talk by Magnus Hagander. Magnus is one of the core members of the PostgreSQL infrastructure team, and a developer and committer in the Global Development Group. And then, straight into the sessions in a packed room; fortunately there was a seat available.

Hubert 'depesz' Lubaczewski: Why I’m not fan of uuid datatype

Recently, on irc, there were couple of cases where someone wanted to use uuid as datatype for their primary key. I opposed, and tried to explain, but IRC doesn't really allow for longer texts, so figured I'll write a blogpost. First problem – UUID values are completely opaque. That means – uuids generated for table … Continue reading "Why I’m not fan of uuid datatype"

Hans-Juergen Schoenig: shared_buffers: Looking into the PostgreSQL I/O cache


The PostgreSQL caching system has always been a bit of a mystery to many people, and many have asked me during consulting or training sessions: How can I figure out what the PostgreSQL I/O cache really contains? What is in shared buffers, and how can one find out? This post will answer this kind of question as we dive into the PostgreSQL cache.


Creating a simple sample database

Before we can inspect shared buffers we have to create a little database. Without data the stuff we are going to do is not too useful:

 hs@hansmacbook ~ % createdb test 

To keep it simple I have created a standard pgbench database containing 1 million rows as follows:

 hs@hansmacbook ~ % pgbench -i -s 10 test
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data...
100000 of 1000000 tuples (10%) done (elapsed 0.14 s, remaining 1.25 s)
200000 of 1000000 tuples (20%) done (elapsed 0.27 s, remaining 1.10 s)
300000 of 1000000 tuples (30%) done (elapsed 0.41 s, remaining 0.95 s)
400000 of 1000000 tuples (40%) done (elapsed 0.61 s, remaining 0.91 s)
500000 of 1000000 tuples (50%) done (elapsed 0.79 s, remaining 0.79 s)
600000 of 1000000 tuples (60%) done (elapsed 0.92 s, remaining 0.62 s)
700000 of 1000000 tuples (70%) done (elapsed 1.09 s, remaining 0.47 s)
800000 of 1000000 tuples (80%) done (elapsed 1.23 s, remaining 0.31 s)
900000 of 1000000 tuples (90%) done (elapsed 1.37 s, remaining 0.15 s)
1000000 of 1000000 tuples (100%) done (elapsed 1.49 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done. 

Deploying 1 million rows is pretty fast. In my case it took around 1.5 seconds (on my laptop).

Deploying pg_buffercache

Now that we have some data we can install the pg_buffercache extension which is ideal if you want to inspect the content of the PostgreSQL I/O cache:


test=# CREATE EXTENSION pg_buffercache;
CREATE EXTENSION
test=# \d pg_buffercache
           View "public.pg_buffercache"
      Column      |   Type   | Collation | Nullable | Default
------------------+----------+-----------+----------+---------
 bufferid         | integer  |           |          |
 relfilenode      | oid      |           |          |
 reltablespace    | oid      |           |          |
 reldatabase      | oid      |           |          |
 relforknumber    | smallint |           |          |
 relblocknumber   | bigint   |           |          |
 isdirty          | boolean  |           |          |
 usagecount       | smallint |           |          |
 pinning_backends | integer  |           |          |

pg_buffercache will return one row per 8k block in shared_buffers. However, to make sense of the data one has to understand the meaning of those OIDs in the view. To make it easier for you I have created a simple example.

Let us take a look at the sample data first:


test=# \d+
                      List of relations
 Schema |       Name       | Type  | Owner |  Size   | Description
--------+------------------+-------+-------+---------+-------------
 public | pg_buffercache   | view  | hs    | 0 bytes |
 public | pgbench_accounts | table | hs    | 128 MB  |
 public | pgbench_branches | table | hs    | 40 kB   |
 public | pgbench_history  | table | hs    | 0 bytes |
 public | pgbench_tellers  | table | hs    | 40 kB   |
(5 rows)



My demo database consists of 4 small tables.

Inspecting per-database caching

Often the question is how much data from which database is currently cached. While this sounds simple, you have to keep some details in mind:



SELECT CASE WHEN c.reldatabase IS NULL THEN ''
WHEN c.reldatabase = 0 THEN ''
ELSE d.datname
END AS database,
count(*) AS cached_blocks
FROM pg_buffercache AS c
LEFT JOIN pg_database AS d
ON c.reldatabase = d.oid
GROUP BY d.datname, c.reldatabase
ORDER BY d.datname, c.reldatabase;

database | cached_blocks
---------------+---------------
postgres | 67
template1 | 67
test | 526
| 25
| 15699
(5 rows)

The reldatabase column contains the object ID of the database a block belongs to. However, there is a “special” thing here: 0 does not represent a database but shared objects, which live in the pg_global tablespace. Some objects in PostgreSQL, such as the list of databases, the list of tablespaces or the list of users, are not stored in a single database – this information is global. Therefore “0” needs some special treatment here. Otherwise the query is pretty straightforward.

To figure out how much RAM is currently unused, we count those empty entries which have no counterpart in pg_database. In my example the cache is mostly empty rather than fully populated. On a real server with real data and real load, the cache is almost always 100% in use (unless your configuration is dubious).
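Since pg_buffercache exposes one row per block, the block counts above can be turned into sizes by multiplying with the block size. A hedged variant of the same query (current_setting('block_size') and pg_size_pretty are standard functions):

```sql
SELECT CASE WHEN c.reldatabase IS NULL THEN ''
            WHEN c.reldatabase = 0 THEN ''
            ELSE d.datname
       END AS database,
       count(*) AS cached_blocks,
       pg_size_pretty(count(*) * current_setting('block_size')::bigint) AS cached_size
FROM pg_buffercache AS c
LEFT JOIN pg_database AS d
       ON c.reldatabase = d.oid
GROUP BY 1
ORDER BY 2 DESC;
```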

Inspecting your current database

There is one more question many people are interested in: What does the cache know about my database? To answer that question I will access an index to make sure some blocks will be held in shared buffers:


test=# SELECT count(*) FROM pgbench_accounts WHERE aid = 4;
count
-------
1
(1 row)

The following SQL statement calculates how many blocks of each table (relkind = r) and index (relkind = i) are currently cached:

test=# SELECT c.relname, c.relkind, count(*)
FROM pg_database AS a, pg_buffercache AS b, pg_class AS c
WHERE c.relfilenode = b.relfilenode
AND b.reldatabase = a.oid
AND c.oid >= 16384
AND a.datname = 'test'
GROUP BY 1, 2
ORDER BY 3 DESC, 1;
relname | relkind | count
-----------------------+---------+-------
pgbench_accounts | r | 2152
pgbench_branches | r | 5
pgbench_tellers | r | 5
pgbench_accounts_pkey | i | 4
(4 rows)

We deliberately exclude all relations with object ID below 16384, because these low IDs are reserved for system objects. That way, the output only contains data for user tables.

As you can see the majority of blocks in memory originate from pgbench_accounts. This query is therefore a nice way to find out instantly what is in cache and what is not. Of course there is a lot more information to be extracted but for most use cases those two queries will answer the most pressing questions.
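A natural follow-up is how much of each relation is cached relative to its total size. A sketch along the same lines (pg_relation_size is a standard function; 8192 assumes the default block size):

```sql
SELECT c.relname,
       count(*) * 8192 AS cached_bytes,
       pg_relation_size(c.oid) AS total_bytes
FROM pg_buffercache AS b
JOIN pg_class AS c ON c.relfilenode = b.relfilenode
JOIN pg_database AS d ON b.reldatabase = d.oid
WHERE d.datname = current_database()
  AND c.oid >= 16384        -- user objects only
GROUP BY c.relname, c.oid
ORDER BY 2 DESC;
```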

Finally …

If you want to know more about PostgreSQL and performance in general I suggest checking out one of our other posts in PostgreSQL performance issues.

The post shared_buffers: Looking into the PostgreSQL I/O cache appeared first on Cybertec.


Amit Kapila: Parallelism, what next?

This blog post is about the journey of parallelism in PostgreSQL so far and what is in store for the future. Since PostgreSQL 9.6, where the first parallel query feature arrived, each release has improved it. Below is a brief overview of the parallel query features added in each release.

PG9.6 added parallel execution of sequential scans, joins, and aggregates.

PG10 added (a) support for parallel B-tree index scans, (b) support for parallel bitmap heap scans, (c) merge joins performed in parallel, (d) non-correlated subqueries run in parallel, (e) an improved ability of parallel workers to return pre-sorted data, and (f) increased parallel query usage in procedural language functions.

PG11 added (a) parallel building of btree indexes, (b) hash joins performed in parallel using a shared hash table, (c) parallelization of the commands CREATE TABLE ... AS, SELECT INTO, and CREATE MATERIALIZED VIEW, (d) UNION running each SELECT in parallel if the individual SELECTs cannot be parallelized, (e) partition scans that use parallel workers more efficiently, (f) LIMIT passed down to parallel workers, which allows workers to reduce returned results and use targeted index scans, and (g) parallelization of single-evaluation queries, e.g. WHERE-clause aggregate queries and functions in the target list.

PG12 added support for parallelized queries in SERIALIZABLE isolation mode.

Work on PG13 is in progress; with respect to parallelism, some of the important advancements so far are:
(a) Parallel vacuum - This feature allows vacuum to leverage multiple CPUs in order to process indexes, enabling index vacuuming and index cleanup to be performed with background workers. It adds a PARALLEL option to the VACUUM command, where the user can specify the number of workers used to perform the command, limited by the number of indexes on the table. Specifying zero workers disables parallelism. For more information, see the commit.
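The resulting syntax can be sketched as follows (the table name is just an example):

```sql
-- Process this table's indexes with up to 4 parallel workers (PG13)
VACUUM (PARALLEL 4) pgbench_accounts;

-- Explicitly disable parallel index vacuuming
VACUUM (PARALLEL 0) pgbench_accounts;
```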

(b) Improve EXPLAIN's handling of per-worker details. This displays the worker information in a much better way. A few visible side-effects, as mentioned in the commit:

* In text format, instead of something like

  Sort Method: external merge  Disk: 4920kB
  Worker 0:  Sort Method: external merge  Disk: 5880kB
  Worker 1:  Sort Method: external merge  Disk: 5920kB
  Buffers: shared hit=682 read=10188, temp read=1415 written=2101
  Worker 0:  actual time=130.058..130.324 rows=1324 loops=1
    Buffers: shared hit=337 read=3489, temp read=505 written=739
  Worker 1:  actual time=130.273..130.512 rows=1297 loops=1
    Buffers: shared hit=345 read=3507, temp read=505 written=744

you get

  Sort Method: external merge  Disk: 4920kB
  Buffers: shared hit=682 read=10188, temp read=1415 written=2101
  Worker 0:  actual time=130.058..130.324 rows=1324 loops=1
    Sort Method: external merge  Disk: 5880kB
    Buffers: shared hit=337 read=3489, temp read=505 written=739
  Worker 1:  actual time=130.273..130.512 rows=1297 loops=1
    Sort Method: external merge  Disk: 5920kB
    Buffers: shared hit=345 read=3507, temp read=505 written=744

(c) Avoid unnecessary shared-memory writes in Parallel Hash Join. This improves the performance of Parallel Hash Join by a significant amount on large many-core systems. Though this work has been back-patched to v11, where Parallel Hash Join was introduced, I mention it here as it was done during PG13 development. For more information, see the commit.

What is being discussed for the future:
(a) Parallel grouping sets - PostgreSQL already supports parallel aggregation by aggregating in two stages. First, each process participating in the parallel portion of the query performs an aggregation step, producing a partial result for each group of which that process is aware. Second, the partial results are transferred to the leader via the Gather node. Finally, the leader re-aggregates the results across all workers in order to produce the final result.

Next, there has been a discussion in the community to parallelize queries containing grouping sets in much the same way as we do parallel aggregation.
Basically, the aim is to parallelize queries like SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());
This feature has been proposed for PG13, but has not yet been committed.

(b) Parallel copy - We also want to parallelize the COPY command, in particular the "COPY <table_name> FROM ..." form. This will help improve bulk load operations in PostgreSQL. Currently, we do a lot of work during the COPY command. We read the file in 64KB chunks, then find the line endings and process that data line by line, where each line corresponds to one tuple. We first form the tuple (in the form of a value/null array) from that line, check whether it satisfies the WHERE condition, and if it does, perform constraint checks and a few other checks before finally storing it in a local tuple array. Once we reach 1000 tuples or have consumed 64KB (whichever occurs first), we insert them together and then, for each tuple, insert into the index(es) and execute after-row triggers. The aim of this work is to parallelize as much of the work done during the copy as possible. There is an ongoing discussion in the community on this topic.

There is a small caveat here: to achieve parallel copy, we need to work on the relation extension lock, with which parallel workers would block each other while extending the relation, which is not a problem currently. There is already a discussion on this topic in the community.

(c) Parallel file_fdw - The proposed work in this area allows file_fdw to divide its scan up for parallel workers, much like a parallel seq scan.

There are more areas where parallelism can be used, like parallel DML, but I would not like to go into detail on those, as we haven't seen any proposals for them yet. Similarly, we can improve a few things in our current parallel infrastructure: (a) as of now, the parallel workers are created and destroyed for each query; instead, we could have a pool of parallel query workers, which would avoid the overhead of starting them for each query; (b) as of now, each worker can use up to work_mem of memory, which might increase the overall memory usage of a query. We might want to improve this, but currently there is no proposal for it.

Magnus Hagander: Connecting to Azure PostgreSQL with libpq 12 in a Kerberos environment


If you are using Azure PostgreSQL and have upgraded your client side libpq to version 12 (which can happen automatically for example if you use the PostgreSQL apt repositories), you may see connection attempts fail with symptoms like:

$ psql -hZZZZZZ.postgres.database.azure.com -dpostgres -UXXXXX_dba@ZZZ-db01
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

With no log information whatsoever available. This can happen if your client is in a Kerberos environment and has valid Kerberos credentials (which can be verified with the klist command). In this case, PostgreSQL 12 will attempt to negotiate GSSAPI encryption with the server, and it appears the connection handler in Azure PostgreSQL is unable to handle this and just kills the connection.

When running the same thing against a local PostgreSQL server prior to version 12, a message like the following will show up in the log:

2020-02-20 10:48:08 CET [35666]: [2-1] client=1.2.3.4 FATAL:  unsupported frontend protocol 1234.5680: server supports 2.0 to 3.0

This is a clear indicator of what's going on, but unfortunately the information isn't always available when connecting to a managed cloud service, such as Azure PostgreSQL. The hard error from Azure also prevents libpq from retrying without GSSAPI encryption, which is what would happen when connecting to a regular PostgreSQL backend or for example through pgbouncer.

The fix/workaround? Disable GSSAPI encryption in the client:

$ export PGGSSENCMODE=disable
$ psql -hZZZZZZ.postgres.database.azure.com -dpostgres -UXXXXX_dba@ZZZ-db01
Password for user XXXXX_dba@ZZZ-db01:
psql (11.6 (Ubuntu 11.6-1.pgdg16.04+1), server 9.5.20)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=>

If you have this type of issue, it's probably worth putting this environment variable in your startup scripts. It can also be set using the gssencmode parameter as part of the connection string, in environments where this is more convenient.

Andreas 'ads' Scherbaum: PGConf.DE 2020 - Registration open


PostgreSQL Conference Germany 2020 in Stuttgart, Germany, on May 15th is now open for registrations.

The Call for Papers is already closed; we are working with the last speakers to confirm their talks, and a full schedule will be published soon.

There are still a few "EARLYBIRD" tickets available, until the end of February.

See you in Stuttgart!

movead li: Have An Eye On Locks Of PostgreSQL


The lock is an essential part of a database system. In PostgreSQL there are various locks, such as table locks, row locks, page locks, transaction locks, advisory locks, etc. Some of these locks are added automatically to support database functions while the database system is running, and some are added manually to elements in PostgreSQL through SQL commands. This blog explores the locks in PostgreSQL.

1. A Little Bit About pg_locks

pg_locks is the lock view in PostgreSQL. Except for row locks added by the SELECT … FOR commands, we can observe all other locks existing in the database in this view. The view has a granted attribute: if it is true, the process has acquired the lock; otherwise, the process is waiting for the lock.
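For example, the granted attribute makes it easy to list the sessions currently stuck waiting for a lock (a minimal sketch using only documented pg_locks columns):

```sql
-- Sessions currently waiting for a lock (granted = false)
SELECT pid, locktype, relation::regclass AS rel, mode
FROM pg_locks
WHERE NOT granted;
```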

2. Table-level Lock

ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE and ACCESS EXCLUSIVE are all the table-level locks; below is the conflict relationship among them.

 Requested lock mode    | Conflicts with (already-held lock modes)
------------------------+------------------------------------------------------------------
 ACCESS SHARE           | ACCESS EXCLUSIVE
 ROW SHARE              | EXCLUSIVE, ACCESS EXCLUSIVE
 ROW EXCLUSIVE          | SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
 SHARE UPDATE EXCLUSIVE | SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE,
                        | ACCESS EXCLUSIVE
 SHARE                  | ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE,
                        | EXCLUSIVE, ACCESS EXCLUSIVE
 SHARE ROW EXCLUSIVE    | ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE,
                        | EXCLUSIVE, ACCESS EXCLUSIVE
 EXCLUSIVE              | ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE,
                        | SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
 ACCESS EXCLUSIVE       | all table-level lock modes

These are the table-level locks that exist in PostgreSQL. Table-level locks are stored in memory, and we can view the table-level locks we added in the pg_locks view. The database also adds some table-level locks automatically while it runs, and we can manually add any table-level lock to a table through the LOCK command. For example, LOCK TABLE t1 IN ACCESS SHARE MODE manually adds an ACCESS SHARE lock to t1. Next, I will show every lock.
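Because a manual LOCK waits indefinitely by default, it is often combined with lock_timeout so a session gives up instead of queueing forever. A sketch (the 2-second value is arbitrary):

```sql
BEGIN;
SET LOCAL lock_timeout = '2s';  -- abort the LOCK if it cannot be granted in time
LOCK TABLE t1 IN ACCESS EXCLUSIVE MODE;
-- ... protected work ...
COMMIT;
```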

ACCESS SHARE

  1. When will this lock appear
    • Access share lock is acquired when a read-only query is performed on a table
    • Use the Lock command
  2. Locks that conflict with it: ACCESS EXCLUSIVE
  3. Lock instance (not by the LOCK command)

We can’t easily capture the moment when an ACCESS SHARE lock is added to a table, but we can use its conflicting lock to verify it. Here, I add an ACCESS EXCLUSIVE lock, which conflicts with ACCESS SHARE, on table t1.
postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          11751
(1 row)
postgres=# begin;
BEGIN
postgres=# lock table t1 in ACCESS EXCLUSIVE mode;
LOCK TABLE
postgres=#

Then we query the t1 table in another session, and we find that the SQL command hangs.

postgres=# select pg_backend_pid();
pg_backend_pid
----------------
         11804
(1 row)

postgres=# select * from t1 where i = 1;

OK, let’s observe the lock view: session 11751 has acquired an ACCESS EXCLUSIVE lock on t1, and session 11804 is waiting for an ACCESS SHARE lock on t1.

select locktype, relation::regclass as rel, pid, mode, granted
from pg_locks
where pid <> pg_backend_pid() and locktype = 'relation'
order by pid;
 locktype | rel |  pid  |        mode         | granted 
----------+-----+-------+---------------------+---------
 relation | t1  | 11751 | AccessExclusiveLock | t
 relation | t1  | 11804 | AccessShareLock     | f
(2 rows)

postgres=#

ROW SHARE

  1. When will this lock appear
    • When using the SELECT … FOR command to lock a row of data, a ROW SHARE lock is first added to the target table.
    • Use the Lock command
  2. Locks that conflict with it: EXCLUSIVE, ACCESS EXCLUSIVE
  3. Lock instance (not by the LOCK command)
postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          11751
(1 row)

postgres=# begin;
BEGIN
postgres=# select * from t1 where i = 2 for share;
 i | j 
---+---
 2 | 2
(1 row)

postgres=# select locktype, relation::regclass as rel, pid, mode, granted
from pg_locks where pid <> pg_backend_pid() and locktype = 'relation'  order by pid;
locktype | rel |  pid  |     mode     | granted 
----------+-----+-------+--------------+---------
 relation | t1  | 11751 | RowShareLock | t
(1 row)

postgres=#

Here we can see a ROW SHARE lock on table t1.

ROW EXCLUSIVE

  1. When will this lock appear
    • Performing insert / delete / update on a table will add this lock to the table
    • Use the Lock command
  2. Locks that conflict with it: SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
  3. Lock instance (not by the LOCK command)

When updating a table, a ROW EXCLUSIVE lock is first added to the table.
postgres=# begin;
BEGIN
postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          11751
(1 row)

postgres=# update t1 set j =j+1 where i = 2;
UPDATE 1

postgres=# select locktype, relation::regclass as rel, pid, mode, granted
from pg_locks where pid <> pg_backend_pid() and locktype = 'relation'  order by pid;
 locktype | rel |  pid  |       mode       | granted 
----------+-----+-------+------------------+---------
 relation | t1  | 11751 | RowExclusiveLock | t
(1 row)

postgres=#

After the above test, query the lock view and find that the ROW EXCLUSIVE lock has been added to the table t1.

SHARE UPDATE EXCLUSIVE

  1. When will this lock appear
    • ANALYZE, VACUUM, and CREATE INDEX CONCURRENTLY will acquire this lock
    • Use the LOCK command
  2. Locks that conflict with it: SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
  3. Lock instance (not by the LOCK command)

Here we also use lock conflicts to prove that the ANALYZE command adds a SHARE UPDATE EXCLUSIVE lock on the table. First, add a SHARE lock, which conflicts with SHARE UPDATE EXCLUSIVE, on table t1.
postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          11751
(1 row)

postgres=# begin;
BEGIN
postgres=# lock table t1 in share mode;
LOCK TABLE
postgres=#

In another session, run an ANALYZE command, and we can see it hang:

postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          11804
(1 row)
postgres=# analyse t1;

Now look at the lock view: session 11804 is waiting for a SHARE UPDATE EXCLUSIVE lock.

postgres=# select locktype, relation::regclass as rel, pid, mode, granted
from pg_locks where pid <> pg_backend_pid() and locktype = 'relation'  order by pid;
 locktype | rel |  pid  |           mode           | granted 
----------+-----+-------+--------------------------+---------
 relation | t1  | 11751 | ShareLock                | t
 relation | t1  | 11804 | ShareUpdateExclusiveLock | f
(2 rows)

SHARE

  1. When will this lock appear
    • CREATE INDEX will add a SHARE lock on the target table
    • Use the Lock command
  2. Locks that conflict with it: ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
  3. Lock instance (not by the LOCK command)

Here we also use lock conflicts to prove that the CREATE INDEX command adds a SHARE lock on the table. First, add a ROW EXCLUSIVE lock, which conflicts with SHARE, on table t1.
postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          11751
(1 row)

postgres=# begin;
BEGIN
postgres=# lock table t1 in row exclusive mode;
LOCK TABLE
postgres=#

In another session, run a CREATE INDEX command, and we can see it hang:

postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          11804
(1 row)
            
postgres=# create index on t1(i);

Now look at the lock view: session 11804 is waiting for a SHARE lock.

postgres=# select locktype, relation::regclass as rel, pid, mode, granted
from pg_locks where pid <> pg_backend_pid() and locktype = 'relation'  order by pid;
 locktype | rel |  pid  |       mode       | granted 
----------+-----+-------+------------------+---------
 relation | t1  | 11751 | RowExclusiveLock | t
 relation | t1  | 11804 | ShareLock        | f
(2 rows)

postgres=#

SHARE ROW EXCLUSIVE

  1. When will this lock appear
    • This lock is not acquired automatically in PostgreSQL; it is only added through the LOCK command
  2. Locks that conflict with it: ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
  3. Lock instance (not by the LOCK command): None

EXCLUSIVE

  1. When will this lock appear
    • This lock is not acquired automatically in PostgreSQL; it is only added through the LOCK command
  2. Locks that conflict with it: ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
  3. Lock instance (not by the LOCK command): None

ACCESS EXCLUSIVE

  1. When will this lock appear
    • ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER, VACUUM FULL commands
    • Use the Lock command
  2. Locks that conflict with it: all table-level lock modes
  3. Lock instance (not via the LOCK command): Here we again use a lock conflict to prove that the ALTER TABLE command adds an ACCESS EXCLUSIVE lock on the table. First, add an ACCESS SHARE lock, which conflicts with ACCESS EXCLUSIVE, on table t1:
postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          11751
(1 row)

postgres=# begin;
BEGIN
postgres=# lock table t1 in access share mode;
LOCK TABLE
postgres=#

On another session, run an ALTER TABLE command, and we can see that it hangs:

postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          11804
(1 row)
postgres=# alter table t1 add column k int;

Now look at the lock view; session 11804 is waiting for an ACCESS EXCLUSIVE lock:

postgres=# select locktype, relation::regclass as rel, pid, mode, granted
from pg_locks where pid <> pg_backend_pid() and locktype = 'relation'  order by pid;
 locktype | rel |  pid  |        mode         | granted 
----------+-----+-------+---------------------+---------
 relation | t1  | 11751 | AccessShareLock     | t
 relation | t1  | 11804 | AccessExclusiveLock | f
(2 rows)

postgres=#
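
The table-level blocking seen in the two experiments above follows the conflict matrix in the PostgreSQL documentation. As a quick sanity check, here is a small Python sketch (an illustration added for this write-up, not PostgreSQL code) that encodes that matrix:

```python
# Table-level lock conflict matrix, per the PostgreSQL documentation.
MODES = [
    "ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
    "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
]

# For each mode, the set of modes it conflicts with.
CONFLICTS = {
    "ACCESS SHARE":           {"ACCESS EXCLUSIVE"},
    "ROW SHARE":              {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE":          {"SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE",
                               "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "SHARE":                  {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
                               "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "SHARE ROW EXCLUSIVE":    {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
                               "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "EXCLUSIVE":              {"ROW SHARE", "ROW EXCLUSIVE",
                               "SHARE UPDATE EXCLUSIVE", "SHARE",
                               "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "ACCESS EXCLUSIVE":       set(MODES),
}

def conflicts(held: str, requested: str) -> bool:
    """True if a granted lock of mode `held` blocks a request for `requested`."""
    return requested in CONFLICTS[held]

# The matrix is symmetric: a blocks b exactly when b blocks a.
assert all(conflicts(a, b) == conflicts(b, a) for a in MODES for b in MODES)

# The CREATE INDEX example: ROW EXCLUSIVE (held) blocks SHARE (requested).
print(conflicts("ROW EXCLUSIVE", "SHARE"))   # True
```

With this, `conflicts("ROW EXCLUSIVE", "SHARE")` predicts the CREATE INDEX wait, and `conflicts("ACCESS SHARE", "ACCESS EXCLUSIVE")` predicts the ALTER TABLE hang.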

3. ROW-level Lock

FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE and FOR KEY SHARE are the explicit row-level locks; below is the conflict matrix among them (X marks a conflict).

                    | FOR KEY SHARE | FOR SHARE | FOR NO KEY UPDATE | FOR UPDATE
 FOR KEY SHARE      |               |           |                   |     X
 FOR SHARE          |               |           |         X         |     X
 FOR NO KEY UPDATE  |               |     X     |         X         |     X
 FOR UPDATE         |       X       |     X     |         X         |     X

An explicit row-level lock differs from a table-level lock. A single FOR UPDATE statement can lock many rows of a table at once, so row locks cannot be kept in shared memory the way table-level locks are, and their information is not displayed in the pg_locks view. Instead, a row lock marks the tuple as locked by modifying the t_infomask and t_infomask2 fields in the tuple header. Through the code in heapam.c we can see the values of t_infomask and t_infomask2 for the different row-lock modes.

   /* heapam.c */
   switch (mode)
   {
       case LockTupleKeyShare:          /* FOR KEY SHARE */
           new_xmax = add_to_xmax;
           new_infomask |= HEAP_XMAX_KEYSHR_LOCK;
           break;
       case LockTupleShare:             /* FOR SHARE */
           new_xmax = add_to_xmax;
           new_infomask |= HEAP_XMAX_SHR_LOCK;
           break;
       case LockTupleNoKeyExclusive:    /* FOR NO KEY UPDATE */
           new_xmax = add_to_xmax;
           new_infomask |= HEAP_XMAX_EXCL_LOCK;
           break;
       case LockTupleExclusive:         /* FOR UPDATE */
           new_xmax = add_to_xmax;
           new_infomask |= HEAP_XMAX_EXCL_LOCK;
           new_infomask2 |= HEAP_KEYS_UPDATED;
           break;
       default:
           new_xmax = InvalidTransactionId;    /* silence compiler */
           elog(ERROR, "invalid lock mode");
   }
  /*htup_details.h*/
   
   /*
    * information stored in t_infomask:
    */
   #define HEAP_XMAX_KEYSHR_LOCK	0x0010	/* xmax is a key-shared locker */
   #define HEAP_COMBOCID			0x0020	/* t_cid is a combo cid */
   #define HEAP_XMAX_EXCL_LOCK		0x0040	/* xmax is exclusive locker */
   #define HEAP_XMAX_LOCK_ONLY		0x0080	/* xmax, if valid, is only a locker */
   #define HEAP_XMIN_COMMITTED		0x0100	/* t_xmin committed */
   ...
   /*
    * information stored in t_infomask2:
    */
   #define HEAP_KEYS_UPDATED		0x2000	/* tuple was updated and key cols
   										 * modified, or tuple deleted */
   

FOR UPDATE

Use SELECT … FOR UPDATE to lock rows. A row holding a FOR UPDATE lock blocks FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE and FOR KEY SHARE operations on that row.

Add a FOR UPDATE lock on a row of table t1 in one session:

   postgres=# select pg_backend_pid();
    pg_backend_pid 
   ----------------
             25019
   (1 row)
   
   postgres=# select * from t1;
    i | j 
   ---+---
    1 | 2
   (1 row)
   
   postgres=# begin;
   BEGIN
   postgres=# select * from t1 where i = 1 for update;
    i | j 
   ---+---
    1 | 2
   (1 row)
   
   postgres=#

In another session, try to update this record of t1; the update cannot complete and hangs:

postgres=# select pg_backend_pid();
    pg_backend_pid 
   ----------------
             26596
   (1 row)
   
   postgres=# update t1 set j = j + 1 where i = 1;

Let’s check the pg_locks view:

 postgres=# select locktype, relation::regclass as rel, pid, mode, granted
   postgres-# from pg_locks where pid <> pg_backend_pid() order by pid;
      locktype    | rel |  pid  |       mode       | granted 
   ---------------+-----+-------+------------------+---------
    virtualxid    |     | 25019 | ExclusiveLock    | t
    relation      | t1  | 25019 | RowShareLock     | t
    transactionid |     | 25019 | ExclusiveLock    | t
    transactionid |     | 26596 | ShareLock        | f
    transactionid |     | 26596 | ExclusiveLock    | t
    tuple         | t1  | 26596 | ExclusiveLock    | t
    virtualxid    |     | 26596 | ExclusiveLock    | t
    relation      | t1  | 26596 | RowExclusiveLock | t
   (8 rows)

In the lock view there is no entry for session 25019's lock on the table record (tuple), but the waiting session 26596 has taken a tuple-level ExclusiveLock on that record. This confirms that row locks themselves do not appear in the pg_locks view. The explicit FOR UPDATE row lock does not block acquisition of the tuple-level ExclusiveLock; what session 26596 actually waits on is session 25019's transactionid lock.

Let’s take a look at the data of table t1 through pageinspect’s heap_page_items() function:

postgres=# select ctid,* from t1;
    ctid  | i | j 
   -------+---+---
    (0,1) | 1 | 2
   (1 row)
   
   postgres=# select lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask from heap_page_items(get_raw_page('t1', 0));
    lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask 
   ----+--------+--------+--------+-------------+------------
     1 |    631 |    633 | (0,1)  |        8194 |        448
   (1 row)
   
   postgres=#

t_infomask value: 448=0x1C0

t_infomask2 value: 8194=0x2002

So t_infomask & HEAP_XMAX_EXCL_LOCK is true and t_infomask2 & HEAP_KEYS_UPDATED is true

(Refer back to the flag definitions in htup_details.h above.)

So we can say the tuple(lp=1) has a FOR UPDATE lock now.

FOR NO KEY UPDATE

Use SELECT … FOR NO KEY UPDATE to lock rows. A row holding a FOR NO KEY UPDATE lock blocks FOR UPDATE, FOR NO KEY UPDATE and FOR SHARE operations on that row.

Continuing in session 25019 from above, we insert a new test row, start a new transaction, and add a FOR NO KEY UPDATE lock on the new row:

   postgres=# commit;
   COMMIT
   postgres=# insert into t1 values(2,2);
   INSERT 0 1
   postgres=# begin;
   BEGIN
   postgres=# select * from t1 where i = 2 for no key update;
    i | j 
   ---+---
    2 | 2
   (1 row)
   
   postgres=#

Similarly, we look at the data of the t1 table on disk.

(Some readers may wonder why there is one more record than before: the update statement issued in session 26596 completed once the transaction blocking it committed, which produced a new tuple version.)

postgres=# select ctid,* from t1; 
    ctid  | i | j 
   -------+---+---
    (0,2) | 1 | 3
    (0,3) | 2 | 2
   (2 rows)
   
   postgres=# select lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask from heap_page_items(get_raw_page('t1', 0));
    lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask 
   ----+--------+--------+--------+-------------+------------
     1 |    631 |    634 | (0,2)  |       16386 |       1280
     2 |    634 |    636 | (0,2)  |       32770 |      10496
     3 |    635 |    637 | (0,3)  |           2 |        448
   (3 rows)
   
   postgres=#

lp=3 is the tuple we want,

t_infomask value:448=0x1C0

t_infomask & HEAP_XMAX_EXCL_LOCK is true

t_infomask2 & HEAP_KEYS_UPDATED is false

So we can say the tuple(lp=3) has a FOR NO KEY UPDATE lock now.

FOR SHARE

Use SELECT … FOR SHARE to lock rows. A row holding a FOR SHARE lock blocks FOR UPDATE and FOR NO KEY UPDATE operations on that row.

Continuing in session 25019, we insert a new test row, start a new transaction, and add a FOR SHARE lock on the new row:

 postgres=# commit;
   COMMIT
   postgres=# insert into t1 values(3,3);
   INSERT 0 1
   postgres=# begin;
   BEGIN
   postgres=# select * from t1 where i = 3 for share;
    i | j 
   ---+---
    3 | 3
   (1 row)
   
   postgres=#

We look at the data of the t1 table on the hard disk.

postgres=# select ctid,* from t1; 
    ctid  | i | j 
   -------+---+---
    (0,2) | 1 | 3
    (0,3) | 2 | 2
    (0,4) | 3 | 3
   (3 rows)
   
   postgres=# select lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask from heap_page_items(get_raw_page('t1', 0));
    lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask 
   ----+--------+--------+--------+-------------+------------
     1 |    631 |    634 | (0,2)  |       16386 |       9984
     2 |    634 |    636 | (0,2)  |       32770 |       8640
     3 |    635 |    637 | (0,3)  |           2 |        448
     4 |    638 |    639 | (0,4)  |           2 |        464
   (4 rows)
   
   postgres=#

lp=4 is the tuple we want,

t_infomask value:464=0x1D0

t_infomask & HEAP_XMAX_SHR_LOCK is true

So we can say the tuple (lp=4) has a FOR SHARE lock now.

FOR KEY SHARE

Use SELECT … FOR KEY SHARE to lock rows. A row holding a FOR KEY SHARE lock blocks only FOR UPDATE operations on that row.

Continuing in session 25019, we insert a new test row, start a new transaction, and add a FOR KEY SHARE lock on the new row:

 postgres=# commit;
   COMMIT
   postgres=# insert into t1 values(4,4);
   INSERT 0 1
   postgres=# begin;
   BEGIN
   postgres=# select * from t1 where i = 4 for key share;
    i | j 
   ---+---
    4 | 4
   (1 row)

We look at the data of the t1 table on the hard disk.

postgres=# select lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask from  heap_page_items(get_raw_page('t1', 0));
    lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask 
   ----+--------+--------+--------+-------------+------------
     1 |    631 |    634 | (0,2)  |       16386 |       9984
     2 |    634 |    636 | (0,2)  |       32770 |       8640
     3 |    635 |    637 | (0,3)  |           2 |        448
     4 |    638 |    639 | (0,4)  |           2 |        464
     5 |    640 |    642 | (0,5)  |           2 |        400
   (5 rows)
   
   postgres=#

lp=5 is the tuple we want,

t_infomask value:400=0x190

t_infomask & HEAP_XMAX_KEYSHR_LOCK is true

So we can say the tuple(lp=5) has a FOR KEY SHARE lock now.
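
To tie the four experiments together, the flag tests above can be wrapped in a short decoder. This Python sketch (an illustration, not PostgreSQL code; the constants are copied from htup_details.h, where HEAP_XMAX_SHR_LOCK is defined as the combination of the EXCL and KEYSHR bits) recovers the row-lock mode from the values we read with pageinspect:

```python
# Flag bits from htup_details.h
HEAP_XMAX_KEYSHR_LOCK = 0x0010
HEAP_XMAX_EXCL_LOCK   = 0x0040
HEAP_XMAX_SHR_LOCK    = HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK  # 0x0050
HEAP_KEYS_UPDATED     = 0x2000

def row_lock_mode(infomask: int, infomask2: int) -> str:
    """Map a tuple's t_infomask/t_infomask2 to its explicit row-lock mode."""
    if infomask & HEAP_XMAX_SHR_LOCK == HEAP_XMAX_SHR_LOCK:
        return "FOR SHARE"                 # both EXCL and KEYSHR bits set
    if infomask & HEAP_XMAX_EXCL_LOCK:
        if infomask2 & HEAP_KEYS_UPDATED:
            return "FOR UPDATE"            # exclusive lock, key columns marked
        return "FOR NO KEY UPDATE"         # exclusive lock, keys untouched
    if infomask & HEAP_XMAX_KEYSHR_LOCK:
        return "FOR KEY SHARE"
    return "none"

# Values observed with pageinspect in the four experiments above:
print(row_lock_mode(448, 8194))   # FOR UPDATE        (448 = 0x1C0, 8194 = 0x2002)
print(row_lock_mode(448, 2))      # FOR NO KEY UPDATE
print(row_lock_mode(464, 2))      # FOR SHARE         (464 = 0x1D0)
print(row_lock_mode(400, 2))      # FOR KEY SHARE     (400 = 0x190)
```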

EXCLUSIVE

In addition to the explicit FOR … series of row-level locks, there is also a row-level lock kept in memory inside the database, and this one does show up in the pg_locks view. Let’s demonstrate it.

This tuple-level EXCLUSIVE lock blocks only other tuple-level EXCLUSIVE locks; it is independent of the FOR … series locks.

Start a transaction and update a row of data in a session

postgres=# select pg_backend_pid();
    pg_backend_pid 
   ----------------
             17501
   (1 row)
   
   postgres=# begin;
   BEGIN
   postgres=# update t1 set j = j+1 where i = 2;
   UPDATE 1
   postgres=#

In another session, update the same row of data; the update command hangs:

 postgres=# select pg_backend_pid();
    pg_backend_pid 
   ----------------
             17559
   (1 row)
   
   postgres=# update t1 set j = j+1 where i = 2;

We look at the pg_locks view

postgres=#  select locktype, relation::regclass as rel, pid, mode, granted
   from pg_locks where pid <> pg_backend_pid() and locktype = 'tuple'  order by pid;
    locktype | rel |  pid  |     mode      | granted 
   ----------+-----+-------+---------------+---------
    tuple    | t1  | 17559 | ExclusiveLock | t
   (1 row)
   
   postgres=#

At this point we can see that session 17559 has taken a tuple-level ExclusiveLock on a row of table t1 while it waits.

4. Transaction Lock

There is also a transaction lock, which cannot be acquired manually.

postgres=# select pg_backend_pid();
    pg_backend_pid 
   ----------------
             17501
   (1 row)
   
   postgres=# begin;
   BEGIN
   postgres=# update t1 set j = j+1 where i = 2;
   UPDATE 1
   postgres=# select locktype, transactionid, pid, mode, granted
   from pg_locks where pid <> pg_backend_pid() and locktype = 'transactionid'  order by pid;
      locktype    | transactionid |  pid  |     mode      | granted 
   ---------------+---------------+-------+---------------+---------
    transactionid |           674 | 17501 | ExclusiveLock | t
   (1 row)

In this example, we started a transaction and updated one row inside it. Querying the lock view shows that an exclusive lock was taken on transaction ID 674. Any concurrent process that needs to wait for this transaction, for example to determine the visibility of the old tuple version, will block on this lock.
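
This waiting pattern can be modeled with a toy sketch (hypothetical code, not PostgreSQL internals): every transaction holds an exclusive lock keyed by its own transaction ID, and a session that must wait for that transaction simply requests the same lock, so commit wakes it up:

```python
import threading
import time

xid_locks = {}  # toy registry: xid -> lock held by the running transaction

def begin(xid):
    lk = threading.Lock()
    lk.acquire()                   # "ExclusiveLock" on our own transactionid
    xid_locks[xid] = lk

def commit(xid):
    xid_locks.pop(xid).release()   # releasing the lock wakes every waiter

def wait_for_xid(xid):
    lk = xid_locks.get(xid)
    if lk is None:
        return                     # transaction already finished
    lk.acquire()                   # "ShareLock" request: blocks until commit
    lk.release()

begin(674)
waiter = threading.Thread(target=wait_for_xid, args=(674,))
waiter.start()
time.sleep(0.2)
print(waiter.is_alive())           # the waiter is still blocked on xid 674
commit(674)
waiter.join()
print("waiter released")
```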

5. Page Lock

Page locks likewise get only a brief mention in the official documentation, because they are used only for certain index types in PostgreSQL; they are essentially invisible to us and are not easy to test. As the documentation says: ‘Application developers normally need not be concerned with page-level locks, but they are mentioned here for completeness’.

6. Advisory Lock

PostgreSQL provides a way to create locks whose meaning is defined by the application, called advisory locks. PostgreSQL is only responsible for creating an advisory lock, or releasing it, when it receives the corresponding function call; PostgreSQL itself does not interpret these locks, they are used by the application that created them. Advisory locks are also visible in the pg_locks view. The following table lists the advisory-lock functions:

 Name                                                 | Return Type | Description
------------------------------------------------------+-------------+----------------------------------------------------------------------
 pg_advisory_lock(key bigint)                         | void        | Obtain exclusive session-level advisory lock
 pg_advisory_lock(key1 int, key2 int)                 | void        | Obtain exclusive session-level advisory lock
 pg_advisory_lock_shared(key bigint)                  | void        | Obtain shared session-level advisory lock
 pg_advisory_lock_shared(key1 int, key2 int)          | void        | Obtain shared session-level advisory lock
 pg_advisory_unlock(key bigint)                       | boolean     | Release an exclusive session-level advisory lock
 pg_advisory_unlock(key1 int, key2 int)               | boolean     | Release an exclusive session-level advisory lock
 pg_advisory_unlock_all()                             | void        | Release all session-level advisory locks held by the current session
 pg_advisory_unlock_shared(key bigint)                | boolean     | Release a shared session-level advisory lock
 pg_advisory_unlock_shared(key1 int, key2 int)        | boolean     | Release a shared session-level advisory lock
 pg_advisory_xact_lock(key bigint)                    | void        | Obtain exclusive transaction-level advisory lock
 pg_advisory_xact_lock(key1 int, key2 int)            | void        | Obtain exclusive transaction-level advisory lock
 pg_advisory_xact_lock_shared(key bigint)             | void        | Obtain shared transaction-level advisory lock
 pg_advisory_xact_lock_shared(key1 int, key2 int)     | void        | Obtain shared transaction-level advisory lock
 pg_try_advisory_lock(key bigint)                     | boolean     | Obtain exclusive session-level advisory lock if available
 pg_try_advisory_lock(key1 int, key2 int)             | boolean     | Obtain exclusive session-level advisory lock if available
 pg_try_advisory_lock_shared(key bigint)              | boolean     | Obtain shared session-level advisory lock if available
 pg_try_advisory_lock_shared(key1 int, key2 int)      | boolean     | Obtain shared session-level advisory lock if available
 pg_try_advisory_xact_lock(key bigint)                | boolean     | Obtain exclusive transaction-level advisory lock if available
 pg_try_advisory_xact_lock(key1 int, key2 int)        | boolean     | Obtain exclusive transaction-level advisory lock if available
 pg_try_advisory_xact_lock_shared(key bigint)         | boolean     | Obtain shared transaction-level advisory lock if available
 pg_try_advisory_xact_lock_shared(key1 int, key2 int) | boolean     | Obtain shared transaction-level advisory lock if available

Next we briefly test advisory locks.

Transaction-level advisory locks

Acquire a transaction-level exclusive advisory lock:

postgres=# select pg_backend_pid();
    pg_backend_pid 
   ----------------
             11751
   (1 row)
   
   postgres=# begin;
   BEGIN
   postgres=# select pg_advisory_xact_lock(100);
    pg_advisory_xact_lock 
   -----------------------
    
   (1 row)
   
   postgres=#

Check in the pg_locks view, there’s an advisory lock here.

  postgres=# select locktype, relation::regclass as rel, pid, objid, mode, granted
   from pg_locks where pid <> pg_backend_pid() and locktype='advisory';
    locktype | rel |  pid  | objid |     mode      | granted 
   ----------+-----+-------+-------+---------------+---------
    advisory |     | 11751 |   100 | ExclusiveLock | t
   (1 row)
   
   postgres=#
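
A side note on how the key shows up in pg_locks: according to the pg_locks documentation, for advisory locks the 64-bit key is displayed split across the classid (high-order 32 bits) and objid (low-order 32 bits) columns, which is why key 100 appears above as objid = 100. A quick Python sketch of that split:

```python
def advisory_key_fields(key: int):
    """Split a 64-bit advisory-lock key the way pg_locks displays it."""
    classid = (key >> 32) & 0xFFFFFFFF   # high-order half -> classid
    objid = key & 0xFFFFFFFF             # low-order half  -> objid
    return classid, objid

print(advisory_key_fields(100))             # (0, 100): matches the view above
print(advisory_key_fields((7 << 32) | 42))  # (7, 42)
```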

After committing the transaction, check the lock view again; the transaction-level advisory lock is gone.

   postgres=# select pg_backend_pid();
    pg_backend_pid 
   ----------------
             11751
   (1 row)
   
   postgres=# commit;
   COMMIT
   postgres=# select locktype, relation::regclass as rel, pid, objid, mode, granted
   from pg_locks where pid <> pg_backend_pid() and locktype='advisory';
    locktype | rel | pid | objid | mode | granted 
   ----------+-----+-----+-------+------+---------
   (0 rows)
   
   postgres=#

Session-level advisory locks

Acquire a session-level advisory lock; these locks are independent of transactions and do not require an open transaction block:

   postgres=# select pg_advisory_lock(1000);
    pg_advisory_lock 
   ------------------
    
   (1 row)
   
   postgres=# select locktype, relation::regclass as rel, pid, objid, mode, granted
   from pg_locks where pid <> pg_backend_pid() and locktype='advisory';
    locktype | rel |  pid  | objid |     mode      | granted 
   ----------+-----+-------+-------+---------------+---------
    advisory |     | 11751 |  1000 | ExclusiveLock | t
   (1 row)
   
   postgres=#

After the current session ends, the lock is released automatically. Of course, you can also use the pg_advisory_unlock() function to release the lock early.

7. In the end

This blog analyzed each PostgreSQL lock type and illustrated it with experimental use cases. It is an exploration of PostgreSQL locking: only after you fully understand the locks that exist in the database can you calmly cope with the various query problems that occur in it.

Movead Li is a kernel developer at Highgo Software. Since joining Highgo Software in 2016, Movead has spent most of his time researching the PostgreSQL code and specializes in Write-Ahead Logging and database backup and recovery. Based on this experience, Movead has released two open-source tools for PostgreSQL: Walminer, which can analyze historical WAL files into SQL, and pg_lightool, which can perform single-table or block-level recovery from a base backup plus WAL files, or from WAL files only.


Now he has joined the HighGo community team and hopes to make more contributions to the community in the future.

The post Have An Eye On Locks Of PostgreSQL appeared first on Highgo Software Inc.

Darafei Praliaskouski: PostGIS 3.0.1


The PostGIS Team is pleased to release PostGIS 3.0.1.

Best served with PostgreSQL 12.2, GEOS 3.8.0, SFCGAL 1.3.7, GDAL 3.0.4, PROJ 6.3.1, protobuf-c 1.3.3, json-c 0.13.1.
