Channel: Planet PostgreSQL

Regina Obe: PostGIS 3.0.0rc1


ahsan hadi: Newbie to PostgreSQL – where to start


While managing a small team of developers working on PostgreSQL, I sometimes get people on my team who have good development experience but are new to PostgreSQL. I have developed a short set of training instructions to get them started with PostgreSQL and familiarise them with Postgres and its internals. The purpose of this blog is to share these instructions so they can benefit others in a similar situation. The instructions involve going through a lot of documentation, white papers and online books; they also include a few development exercises that can be helpful in understanding the PostgreSQL codebase. I have found these helpful in the initial training of people who are new to PostgreSQL, so I am putting them in this blog in the hope that they help others as well.

Online Learning Resources

For someone who is new to PostgreSQL, the obvious starting point is understanding PostgreSQL functionality. I would recommend the following resources for reading about PostgreSQL functionality.

http://www.postgresqltutorial.com/

This is a really good place for Postgres tutorials; the tutorials available on this site range from a basic “Getting started with PostgreSQL” to complex features like common table expressions, partitioned tables, etc. It also contains tutorials for Postgres client tools, programming interfaces, etc.

The presentations and tutorials available at Bruce Momjian’s site are also very useful. The site contains presentations, online books and other material on all sorts of topics related to PostgreSQL. Whether it is query processing, Postgres internals, horizontal scalability with sharding, or security, this site contains a lot of useful training material related to PostgreSQL.

https://momjian.us/main/presentations/extended.html

There is, of course, the official community documentation for every release of PostgreSQL. The differences between each release’s documentation are the new features added in that release and changes to existing features. You can get the documentation for any release using the links below.

https://www.postgresql.org/docs/11/index.html

https://www.postgresql.org/docs/

The above resources should give you a really good insight into PostgreSQL features and functionality. The next step is understanding Postgres internals and how the PostgreSQL community goes about feature development.

I have found the online book below really good for understanding PostgreSQL internals:

http://www.interdb.jp/pg/

It is very useful for understanding the components involved in query processing, and for taking a deep dive into PostgreSQL internals for storage, WAL and memory management.

PostgreSQL has one of the best and most active communities, with some of the best techies; you can subscribe to the community mailing lists using the link below.

https://www.postgresql.org/list/

The mailing list archives are also available at the same link. The most interesting one for development is pgsql-hackers, where most of the development discussion takes place; the PostgreSQL development team lives on the pgsql-hackers mailing list. You can simply click on any of the mailing lists and search for a particular feature like “TDE” to get all the email threads related to TDE.

This is the best place to understand how community development takes place. For new features, you can see the whole process, starting from submitting a proposal along with POC patches and getting buy-in from other community members. There is a lot to learn from the community mailing lists about how the community goes about PostgreSQL development.

Basic development exercises

It is time to indulge in some development exercises to develop a better understanding of the codebase. It is out of the scope of this blog to explain how each one is done; I believe that is already well documented in other online resources.

1)

The first thing that I would recommend is getting an understanding of the PostgreSQL regression suite: how it is executed and how to add a new test case to it. PostgreSQL has a comprehensive set of test cases that live in “src/test/regress/sql”; last I looked at the master branch, the number of test case files was around 197. The test cases are generally divided by functionality (i.e. limit.sql, json.sql, etc.) or by data type (boolean.sql, etc.). Each test case file contains detailed test cases for the desired functionality.
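
As a sketch of what this involves, the shell snippet below wires a hypothetical test called “mytest” into the suite (the name and queries are made up for illustration; in a real source tree the paths already exist under the checkout root, and the expected file is normally produced by running the suite once and copying the verified output from results/):

```shell
# Hypothetical example: adding a new regression test "mytest".
REGRESS=src/test/regress
mkdir -p "$REGRESS/sql" "$REGRESS/expected"

# 1. The queries the test will run:
cat > "$REGRESS/sql/mytest.sql" <<'EOF'
CREATE TABLE regress_demo (id int);
INSERT INTO regress_demo VALUES (1), (2);
SELECT count(*) FROM regress_demo;
DROP TABLE regress_demo;
EOF

# 2. The expected output; in practice, run the suite once and copy
#    results/mytest.out here after verifying it by hand.
: > "$REGRESS/expected/mytest.out"

# 3. Add the test to the schedule so the suite picks it up:
echo "test: mytest" >> "$REGRESS/parallel_schedule"
```

With those files in place, `make check` (or `make installcheck` against a running server) runs the whole schedule and diffs the actual output against expected/.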

There are other test cases written in the TAP framework for testing tools and utilities that can’t be tested with just the SQL interface. This is also needed when you need to test multiple server configurations, replication/failover, backups, etc. TAP (Test Anything Protocol, see the link below) started as a Perl-based testing framework but was later extended to allow writing test cases in other languages.

http://testanything.org/

So someone new to PostgreSQL needs to understand how to run the regression test suite and how to add new test cases to it. We can add test cases for partitioning or any other functionality, make them part of the regression schedule and run the regression.

2)

The second exercise is to understand how PostgreSQL extensions work and how to add new extensions to PostgreSQL. Extensions are what make PostgreSQL extensible: the contrib/ directory shipped with the source code contains several extensions, which are described in the PostgreSQL documentation. Other extensions are developed independently, like PostGIS. Even PostgreSQL replication solutions can be developed externally; Slony is a popular replication solution that is developed outside of the core.

Users can add their own extensions to PostgreSQL according to their needs. The guide below is a really useful one that shows how to add a user-defined extension to PostgreSQL.

https://www.highgo.ca/2019/10/01/a-guide-to-create-user-defined-extension-modules-to-postgres/
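
To give a flavour of what the guide covers: at its simplest, an extension is a control file plus a versioned SQL script (plus a PGXS Makefile to install them). A minimal sketch, where the extension name myext and its function are made up for illustration:

```shell
# Hypothetical minimal extension "myext": just two files.
mkdir -p myext

# The control file tells CREATE EXTENSION what to load:
cat > myext/myext.control <<'EOF'
comment = 'toy example extension'
default_version = '1.0'
relocatable = true
EOF

# The versioned SQL script is executed by CREATE EXTENSION myext:
cat > myext/myext--1.0.sql <<'EOF'
CREATE FUNCTION myext_hello() RETURNS text
AS $$ SELECT 'hello from myext'::text $$ LANGUAGE sql;
EOF
```

A short PGXS Makefile (setting EXTENSION and DATA and including pgxs.mk) then installs these into the server’s extension directory; see the linked HighGo guide for the full walk-through.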

3)

The third exercise is slightly more challenging, but it is very useful for understanding the internals of PostgreSQL. The exercise is basically to follow a SELECT query through PostgreSQL’s internals and see how the query changes as it passes through the internal components of PostgreSQL. This is done by compiling the PostgreSQL source with --enable-debug and using a debugger, watching the state by placing appropriate breakpoints in the code.
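
As a sketch of the setup: build with ./configure --enable-debug (ideally with CFLAGS='-O0 -g'), find the backend’s PID from your psql session with SELECT pg_backend_pid(), and attach gdb with breakpoints on the main stages of query processing. The breakpoint targets below are real backend functions; the script file name is made up:

```shell
# Breakpoints covering parse -> rewrite -> plan -> execute
# for a simple query sent to a backend.
cat > trace-select.gdb <<'EOF'
break exec_simple_query
break pg_parse_query
break pg_rewrite_query
break pg_plan_queries
break PortalRun
EOF
# Then, with the PID obtained from SELECT pg_backend_pid():
#   gdb -x trace-select.gdb -p <pid>
# and step through while the psql session runs a SELECT.
```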

A related blog post written by Cary (HighGo) shows how this is done:

https://www.highgo.ca/2019/10/03/trace-query-processing-internals-with-debugger/

Ahsan Hadi

Ahsan Hadi is VP of Development with HighGo Software Inc. Prior to coming to HighGo Software, Ahsan worked at EnterpriseDB for 15 years, most recently as a Senior Director of Product Development. The flagship product of EnterpriseDB is Postgres Plus Advanced Server, which is based on open source PostgreSQL. Ahsan has vast experience with Postgres and led the development team at EnterpriseDB that built the Oracle-compatible layer of EDB’s Postgres Plus Advanced Server. Ahsan also spent a number of years working with the development teams adding horizontal scalability and sharding to Postgres. Initially he worked with postgres-xc, a multi-master sharded cluster, and later managed the development of adding horizontal scalability/sharding to Postgres. Ahsan has also worked a great deal with Postgres foreign data wrapper technology, developing and maintaining FDWs for several SQL and NoSQL databases like MongoDB, Hadoop and MySQL.

Prior to EnterpriseDB, Ahsan worked for Fusion Technologies, a US-based consultancy, as a Senior Project Manager. There he led the team that developed a Java-based job factory responsible for placing items on shelves at big stores like Walmart. Before Fusion Technologies, Ahsan worked at British Telecom as an Analyst/Programmer and developed web-based database applications for network fault monitoring.

Ahsan joined HighGo Software Inc (Canada) in April 2019 and is leading development teams based in multiple geographies; his primary responsibility is community-based Postgres development as well as developing the HighGo Postgres server.

Hans-Juergen Schoenig: What is autovacuum doing to my temporary tables?


Did you know that your temporary tables are not cleaned up by autovacuum? If you did not, consider reading this blog post about PostgreSQL and autovacuum. If you did – well, you can still continue to read this article.

Autovacuum cleans tables automatically

Since the days of PostgreSQL 8.0, the database has provided this miraculous autovacuum daemon which is in charge of cleaning tables and indexes. In many cases, the default configuration is absolutely ok and people don’t have to worry about VACUUM much. However, recently one of our support clients sent us an interesting request related to temporary tables and autovacuum.

What is the problem? The main issue is that autovacuum does not touch temporary tables. Yes, it’s true – you have to VACUUM temporary tables on your own. But why is this the case? Let’s take a look at how the autovacuum job works in general: Autovacuum sleeps for a minute, wakes up and checks if a table has seen a sufficiently large number of changes before it fires up a cleanup process. The important thing is that the cleanup process actually has to see the objects it will clean, and this is where the problem starts. An autovacuum process has no way of seeing a temporary table, because temporary tables can only be seen by the database connection which actually created them. Autovacuum therefore has to skip temporary tables. Unfortunately, most people are not aware of this issue. As long as you don’t use your temporary tables for extended periods, the missing cleanup job is not an issue. However, if your temp tables are repeatedly changed in long transactions, it can become a problem.
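
The “sufficiently large number of changes” check is based on a well-known formula: with the default settings autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2, a table is vacuumed once its dead tuples exceed threshold + scale_factor * reltuples. A quick back-of-envelope calculation for a table the size of the example below:

```shell
# Dead-tuple trigger point for a table, using the default autovacuum settings.
awk 'BEGIN {
  threshold = 50       # autovacuum_vacuum_threshold (default)
  scale     = 0.2      # autovacuum_vacuum_scale_factor (default)
  reltuples = 5000000  # estimated live rows in the table
  printf "vacuum fires above %d dead tuples\n", threshold + scale * reltuples
}'
# prints: vacuum fires above 1000050 dead tuples
```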


Proving my point

The main question now is: how can we verify what I have just said? To show you what I mean, I will load the pgstattuple extension and create two tables: a “real” one, and a temporary one:

test=# CREATE EXTENSION pgstattuple;
CREATE EXTENSION
test=# CREATE TABLE t_real AS
SELECT * FROM generate_series(1, 5000000) AS id;
SELECT 5000000
test=# CREATE TEMPORARY TABLE t_temp AS
SELECT * FROM generate_series(1, 5000000) AS id;
SELECT 5000000

Let us now kill half of the data in each of those two tables:

test=# DELETE FROM t_real WHERE id % 2 = 0;
DELETE 2500000
test=# DELETE FROM t_temp WHERE id % 2 = 0;
DELETE 2500000

The tables will now contain around 50% trash each. If we wait sufficiently long, we will see that autovacuum has cleaned up the real table while the temporary one is still in jeopardy:

test=# \x
Expanded display is on.
test=# SELECT * FROM pgstattuple('t_real');
-[ RECORD 1 ]--------------+----------
table_len                  | 181239808
tuple_count                | 2500000
tuple_len                  | 70000000
tuple_percent              | 38.62
dead_tuple_count           | 0
dead_tuple_len             | 0
dead_tuple_percent         | 0
free_space                 | 80620336
free_percent               | 44.48

test=# SELECT * FROM pgstattuple('t_temp');
-[ RECORD 1 ]-------+------------
table_len           | 181239808
tuple_count         | 2500000
tuple_len           | 70000000
tuple_percent       | 38.62
dead_tuple_count    | 2500000
dead_tuple_len      | 70000000
dead_tuple_percent  | 38.62
free_space          | 620336
free_percent        | 0.34

The “real” table has already been cleaned and a lot of free space is available, while the temporary table still contains a ton of dead rows. Only a manual VACUUM job will reclaim the free space in all that jumble.

Keep in mind that VACUUM is only relevant if you really want to keep the temporary table for a long time. If you close your connection, the entire space will be automatically reclaimed anyway, so there is no need to worry about dropping the table.

If you want to learn more about VACUUM in general, consider checking out one of our other blog posts. If you are interested in how VACUUM works, it is also definitely useful to read the official documentation, which can be found here

The post What is autovacuum doing to my temporary tables? appeared first on Cybertec.

Federico Campoli: Time and relative dimension in space


The transactional model has been in PostgreSQL since the early versions. PostgreSQL’s implementation follows the guidelines of the SQL standard, with some notable exceptions.

When designing an application it’s important to understand how the concurrent access to data happens in order to avoid unexpected results or errors.

Mark Wong: PDXPUG October Meetup: Terminal Tools & Database Statistics


2019 October 17th Meeting 6pm-8pm

Location:

PSU Business Accelerator
2828 SW Corbett Ave · Portland, OR
Parking is open after 5pm.

Speaker: Mark Wong

pg_top was born in 2007 from a fork of unixtop, a terminal program displaying the top processes on a system; pg_top focuses on the processes of the PostgreSQL database you are connected to. Recently, pg_systat was forked from systat to display additional database statistics.

These tools can also help you do more, such as exploring query execution plans and creating reports from system and database resources.

Come learn about the statistics PostgreSQL keeps and how to use these tools to view them.

Mark leads the 2ndQuadrant performance practice as a Performance Consultant for English-Speaking Territories, based out of Oregon in the USA. He is a long-time contributor to PostgreSQL, co-organizer of the Portland PostgreSQL User Group, and serves as a Director and Treasurer for the United States PostgreSQL Association.

Craig Ringer: A convenient way to launch psql against postgres while running pg_regress

I got tired of looking up regression_output/log/postmaster.log to find the PGSOCKET to use to connect to a running pg_regress‘s temp-install postgres, so I wrote a little shell function for it. My patch to print a connstr in pg_regress never got merged, so I needed a workaround. I’m sure I’m not the only one, so here’s […]

Avinash Kumar: How to Set Up Streaming Replication in PostgreSQL 12

Streaming Replication in PostgreSQL

PostgreSQL 12 can be considered revolutionary, considering the performance boost we observe with partitioning enhancements, planner improvements, several new SQL features, indexing improvements, etc. You may see some of these features discussed in future blog posts. But let me start this blog with something interesting. You might have already seen the news that there is no recovery.conf file in a standby anymore and that the replication setup (streaming replication) has slightly changed in PostgreSQL 12. We have earlier blogged about the steps involved in setting up simple streaming replication up to PostgreSQL 11, and also about using replication slots for the same. Let’s see how different it is to set up the same streaming replication in PostgreSQL 12.

Installing PostgreSQL 12 on Master and Standby

On CentOS/RedHat, you may use the rpms available in the PGDG repo (the following link may change depending on your OS release).

# as root:
yum install -y https://yum.postgresql.org/12/redhat/rhel-7.4-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y postgresql12-server

Steps to set up Streaming Replication in PostgreSQL 12

In the following steps, the Master server is: 192.168.0.108 and the Standby server is: 192.168.0.107

Step 1 :
Initialize and start PostgreSQL, if not done already on the Master.

## Preparing the environment
$ sudo su - postgres
$ echo "export PATH=/usr/pgsql-12/bin:$PATH PAGER=less" >> ~/.pgsql_profile
$ source ~/.pgsql_profile

## As root, initialize and start PostgreSQL 12 on the Master
$ /usr/pgsql-12/bin/postgresql-12-setup initdb
$ systemctl start postgresql-12

 

Step 2 :
Modify the parameter listen_addresses to allow a specific IP interface or all of them (using *). Modifying this parameter requires a restart of the PostgreSQL instance for the change to take effect.
# as postgres
$ psql -c "ALTER SYSTEM SET listen_addresses TO '*'";
ALTER SYSTEM

# as root, restart the service
$ systemctl restart postgresql-12

You may not have to set any other parameters on the Master for a simple replication setup, because the defaults are good enough.
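
For reference, the settings that make this possible already default to replication-friendly values on PostgreSQL 12 (a stock postgresql.conf fragment):

```
wal_level = replica          # the default since PostgreSQL 9.6
max_wal_senders = 10         # default raised to 10 in PostgreSQL 10
max_replication_slots = 10   # also defaults to 10 since PostgreSQL 10
```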

 

Step 3 :
Create a user for replication on the Master. It is discouraged to use the superuser postgres to set up replication, though it works.

postgres=# CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secret';
CREATE ROLE

 

Step 4 :
Allow replication connections from the Standby to the Master by appending a line like the following to the pg_hba.conf file of the Master. If you are enabling automatic failover using any external tool, you must also allow replication connections from the Master to the Standby: in the event of a failover, the Standby may be promoted to Master, and the old Master needs to replicate changes from the new Master (previously a standby). You may use any of the authentication methods supported by PostgreSQL today.
$ echo "host replication replicator 192.168.0.107/32 md5" >> $PGDATA/pg_hba.conf

## Get the changes into effect through a reload.

$ psql -c "select pg_reload_conf()"

 

Step 5 :
You may use pg_basebackup to back up the data directory of the Master from the Standby. While creating the backup, you may also tell pg_basebackup to create the replication-specific files and entries in the data directory using the "-R" option.
## This command must be executed on the standby server.
$ pg_basebackup -h 192.168.0.108 -U replicator -p 5432 -D $PGDATA -Fp -Xs -P -R
Password:
25314/25314 kB (100%), 1/1 tablespace

You may use multiple approaches, such as rsync or any other disk backup method, to copy the Master’s data directory to the Standby. But there is an important file (standby.signal) that must exist in a standby’s data directory to help postgres determine its state as a standby. It is created automatically when you use the "-R" option with pg_basebackup. If not, you may simply use touch to create this empty file.
$ touch $PGDATA/standby.signal

$ ls -l $PGDATA
total 60
-rw-------. 1 postgres postgres 224 Oct 8 16:41 backup_label
drwx------. 5 postgres postgres 41 Oct 8 16:41 base
-rw-------. 1 postgres postgres 30 Oct 8 16:41 current_logfiles
drwx------. 2 postgres postgres 4096 Oct 8 16:41 global
drwx------. 2 postgres postgres 32 Oct 8 16:41 log
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_commit_ts
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_dynshmem
-rw-------. 1 postgres postgres 4581 Oct 8 16:41 pg_hba.conf
-rw-------. 1 postgres postgres 1636 Oct 8 16:41 pg_ident.conf
drwx------. 4 postgres postgres 68 Oct 8 16:41 pg_logical
drwx------. 4 postgres postgres 36 Oct 8 16:41 pg_multixact
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_notify
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_replslot
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_serial
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_snapshots
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_stat
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_stat_tmp
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_subtrans
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_tblspc
drwx------. 2 postgres postgres 6 Oct 8 16:41 pg_twophase
-rw-------. 1 postgres postgres 3 Oct 8 16:41 PG_VERSION
drwx------. 3 postgres postgres 60 Oct 8 16:41 pg_wal
drwx------. 2 postgres postgres 18 Oct 8 16:41 pg_xact
-rw-------. 1 postgres postgres 288 Oct 8 16:41 postgresql.auto.conf
-rw-------. 1 postgres postgres 26638 Oct 8 16:41 postgresql.conf
-rw-------. 1 postgres postgres 0 Oct 8 16:41 standby.signal

One of the most important observations is the contents of the postgresql.auto.conf file on the standby server. As you see in the following log, an additional parameter, primary_conninfo, has been added to this file. This parameter tells the standby about its Master. If you haven’t used pg_basebackup with the -R option, you will not see this entry in the file on the standby server, which means you have to add it manually.
$ cat $PGDATA/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
listen_addresses = '*'
primary_conninfo = 'user=replicator password=secret host=192.168.0.108 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'

postgresql.auto.conf is the configuration file that is read last when you start Postgres. So, if a parameter has different values in the postgresql.conf and postgresql.auto.conf files, the value set in postgresql.auto.conf is the one PostgreSQL uses. Also, any parameter that has been modified using ALTER SYSTEM is automatically written to the postgresql.auto.conf file by postgres.

How was the replication configuration handled until PostgreSQL 11?

Until PostgreSQL 11, we had to create a file named recovery.conf containing the following minimal parameters. If standby_mode is on, the server is considered to be a standby.
$ cat $PGDATA/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.0.8 port=5432 user=replicator password=secret'

So the first difference between PostgreSQL 12 and earlier versions (up to PostgreSQL 11) is that the standby_mode parameter is gone in PostgreSQL 12, replaced by an empty file, standby.signal, in the standby’s data directory. The second difference is the parameter primary_conninfo: this can now be added to the postgresql.conf or postgresql.auto.conf file in the standby’s data directory.

 

Step 6 :
Start PostgreSQL using pg_ctl on the Standby.
$ pg_ctl -D $PGDATA start

 

Step 7 :
Verify the replication between the Master and the Standby by running the following command on the Master. The output shows the details of the standby and the lag between the Master and the Standby.

$ psql -x -c "select * from pg_stat_replication"
-[ RECORD 1 ]----+------------------------------
pid | 2522
usesysid | 16384
usename | replicator
application_name | walreceiver
client_addr | 192.168.0.107
client_hostname |
client_port | 36382
backend_start | 2019-10-08 17:15:19.658917-04
backend_xmin |
state | streaming
sent_lsn | 0/CB02A90
write_lsn | 0/CB02A90
flush_lsn | 0/CB02A90
replay_lsn | 0/CB02A90
write_lag | 00:00:00.095746
flush_lag | 00:00:00.096522
replay_lag | 00:00:00.096839
sync_priority | 0
sync_state | async
reply_time | 2019-10-08 17:18:04.783975-04

Enabling Archiving on the Master and Standby Recovery using Archives

Most of the time, the default or modified retention settings for WAL segments on the Master may not be enough to maintain healthy replication between it and its standby. So we need the WALs to be safely archived to another disk or a remote backup server. The standby can then replay these archived WAL segments once they are gone from the Master.

To enable archiving on the Master, we can still use the same approach of setting the following 2 parameters.

archive_mode = ON
archive_command = 'cp %p /archives/%f' ## Modify this with an appropriate shell command.
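
Note that the plain cp shown above will silently overwrite an already-archived file. The PostgreSQL documentation recommends an archive_command that refuses to overwrite, such as test ! -f /archives/%f && cp %p /archives/%f. The behaviour can be simulated locally with stand-in files for %p and %f:

```shell
# Simulate the recommended non-overwriting archive_command with local files.
mkdir -p archives
p=./000000010000000000000001   # stands in for %p (path to the WAL segment)
f=000000010000000000000001     # stands in for %f (file name only)
echo "wal segment contents" > "$p"

# First run: the segment is not archived yet, so the copy succeeds.
test ! -f "archives/$f" && cp "$p" "archives/$f"

# Second run: the test fails and cp never runs; the non-zero exit status
# is what tells postgres to keep the WAL segment and retry later.
test ! -f "archives/$f" && cp "$p" "archives/$f" || echo "already archived"
```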

But to enable recovery from archives on a standby, we used to add a parameter named restore_command to the recovery.conf file until PostgreSQL 11. Starting from PostgreSQL 12, we can add the same parameter to the postgresql.conf or postgresql.auto.conf file of the standby. Please note that a restart of PostgreSQL is required for changes made to the archive_mode and restore_command parameters to take effect.
echo "restore_command = 'cp /archives/%f %p'" >> $PGDATA/postgresql.auto.conf
pg_ctl -D $PGDATA restart -mf

In my next blog post, I shall talk about Point-in-time-recovery on PostgreSQL 12, where I will discuss a few more parameters related to recovery in detail. Meanwhile, have you tried Percona Distribution for PostgreSQL? It is a collection of finely-tested and implemented open source tools and extensions along with PostgreSQL 11, maintained by Percona. Please subscribe to our blog posts to learn more interesting features in PostgreSQL.

Discuss on HackerNews

Regina Obe: PostGIS 3.0.0rc2


The PostGIS development team is pleased to release PostGIS 3.0.0rc2. This will be the final RC before release.

This release works with PostgreSQL 9.5-12 and GEOS >= 3.6

Best served with PostgreSQL 12, GEOS 3.8.0 and pgRouting 3.0.0-alpha.

Continue Reading by clicking title hyperlink ..

Dimitri Fontaine: Compute database size

It is well known that database design should be as simple as possible, and follow the normalization process. Except in some cases, sometimes, for scalability purposes. Partitioning might be used to help deal with large amount of data for instance. But what is a large amount of data? Do you need to pay attention to those scalability trade-offs now, or can you wait until later?

Álvaro Herrera: PostgreSQL 12: Foreign Keys and Partitioned Tables

Now that PostgreSQL 12 is out, we consider foreign keys to be fully compatible with partitioned tables. You can have a partitioned table on either side of a foreign key constraint, and everything will work correctly. Why do I point this out? Two reasons: first, when partitioned tables were first introduced in PostgreSQL 10, they […]

Daniel Vérité: Nondeterministic collations


Since version 12, PostgreSQL collations are created with a parameter named deterministic, that can be true or false, so that collations are now either deterministic (which they are by default), or nondeterministic.

What does that mean? This term refers to what Unicode calls deterministic comparisons between strings:

This is a comparison where strings that do not have identical binary contents (optionally, after some process of normalization) will compare as unequal

So before version 12, comparisons for collatable types in Postgres were always deterministic according to the above definition. Specifically, when the underlying collation provider (libc or ICU) reports that two strings are equal, a tie-breaker bytewise comparison is performed, so that it’s only when the strings consist of identical binary contents that they are truly equal for Postgres.

Starting with version 12, the new “deterministic” property can be set to false at CREATE COLLATION time to request that string comparisons skip the tie-breaker, so that the memory representations being different is not an obstacle to recognize strings as equal when the underlying locale says they are. This does not only affect direct comparisons or lookups through WHERE clauses, but also the results of GROUP BY, ORDER BY, DISTINCT, PARTITION BY, unique constraints, and everything implying the equality operator.

So what can be achieved with nondeterministic collations?

The most obvious features are case-insensitive and accent-insensitive matching implemented with COLLATE clauses, as opposed to calling explicit functions to do case-mapping (upper, lower) and removal of accents (unaccent). Now that these are accessible through the collation service, the traditional recommendation to use the citext datatype for case-insensitive lookups may start to be reconsidered.

Beyond that, nondeterministic collations make it possible to match strings that are canonically equivalent (differing only in which Unicode normal form they use), or that differ only by compatible sequences, by punctuation, or by non-displayable characters.

Except for the canonical equivalence, these matching features are optional, and they’re activated by declaring collation attributes inside the locale parameter, especially the comparison levels.

Unicode Technical Report #35 provides a table of collation settings with BCP47 keys and values, but the examples in this post will use ICU “old-style” attributes: colStrength, colCaseLevel, colAlternate rather than “new-style” keys (respectively ks, kc, ka). This is because the former work with all versions of ICU, whereas the latter work only when PostgreSQL is built with ICU version 54 or later (released in 2014). It appears that pre-compiled binaries for Windows are currently built with ICU version 53, so it’s better to stick to the old-style syntax at least for them.

Now, let’s go through a list of fancy comparison features that are enabled by nondeterministic collations.

1. Equality between canonically equivalent sequences of code points

This is a requirement of Unicode that PostgreSQL was not able to fulfill until now. As explained in Unicode equivalence (wikipedia):

Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E (the Latin lowercase “n”) followed by U+0303 (the combining tilde “◌̃”) is defined by Unicode to be canonically equivalent to the single code point U+00F1 (the lowercase letter “ñ” of the Spanish alphabet). Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications such as alphabetizing names or searching, and may be substituted for each other.

Nondeterministic collations will recognize canonically equivalent sequences as equal without requiring any particular collation attribute in the locale argument.

The example below uses a language-agnostic locale: an empty string, that selects the root collation. und may also be used, as the 3-letter BCP-47 tag for “undefined”. Otherwise a language code may be used, optionally followed by a script code and a region code, such as 'fr-CA' for “french as spoken in Canada”.

Example of canonical equivalence between NFD and NFC forms:

CREATE COLLATION nd (
  provider = 'icu',
  locale = '', -- or 'und' (no language or region specified)
  deterministic = false
);

SELECT s1, s2, s1 = s2 COLLATE nd AS equal
  FROM (VALUES (E'El Nin\u0303o', E'El Ni\u00F1o')) AS s(s1, s2);
   s1    |   s2    | equal 
---------+---------+-------
 El Niño | El Niño | t

By contrast, with any deterministic collation, we would get f for false in the equal column, since these strings s1 and s2 are bytewise unequal.

2. Equality between compatible sequences of code points

Besides being equivalent, sequences of code points can be merely compatible, in which case they can optionally be considered as equal.

Quoting another part of the above-linked wikipedia entry:

Sequences that are defined as compatible are assumed to have possibly distinct appearances, but the same meaning in some contexts. Thus, for example, the code point U+FB00 (the typographic ligature “ff”) is defined to be compatible—but not canonically equivalent—to the sequence U+0066 U+0066 (two Latin “f” letters)

At tertiary strength (the default), these sequences are not equal. Let’s see this in SQL, reusing the "nd" collation previously defined:

SELECT s1, s2, s1 = s2 COLLATE nd AS equal
  FROM (VALUES ('shelffull', E'shel\ufb00ull')) AS s(s1, s2);
    s1     |    s2    | equal 
-----------+----------+-------
 shelffull | shelffull | f

But at secondary strength, these sequences compare as equal:

CREATE COLLATION nd2 (
  provider = 'icu',
  locale = '@colStrength=secondary', -- or 'und-u-ks-level2'
  deterministic = false
);

SELECT s1, s2, s1 = s2 COLLATE nd2 AS equal
  FROM (VALUES ('shelffull', E'shel\ufb00ull')) AS s(s1, s2);
    s1     |    s2    | equal 
-----------+----------+-------
 shelffull | shelffull | t

3. Ignoring case

The most typical use case for nondeterministic collations is probably the case-insensitive comparison. At secondary strength, strings that differ by case compare as equal:

SELECT s1, s2, s1 = s2 COLLATE nd2 AS equal
  FROM (VALUES ('Abc', 'ABC')) AS s(s1, s2);
 s1  | s2  | equal 
-----+-----+-------
 Abc | ABC | t
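For comparison, outside ICU the usual language-agnostic way to do this in application code is Unicode case folding (a sketch; not equivalent to a full collation-based comparison):

```python
# str.casefold() applies full Unicode case folding, suitable for
# caseless matching (stronger than lower() for some scripts).
print("Abc".casefold() == "ABC".casefold())        # True
print("Straße".casefold() == "STRASSE".casefold()) # True: ß folds to 'ss'
```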

4. Ignoring case and accents

Strings that differ by accents or case (or both) compare as equal at primary strength:

CREATE COLLATION nd1 (
  provider = 'icu',
  locale = '@colStrength=primary',  -- or 'und-u-ks-level1'
  deterministic = false
);

SELECT s1, s2,
       s1 = s2 COLLATE nd1 AS "equal-nd1",
       s1 = s2 COLLATE nd2 AS "equal-nd2"
  FROM (VALUES ('Été', 'ete')) AS s(s1, s2);
 s1  | s2  | equal-nd1 | equal-nd2 
-----+-----+-----------+-----------
 Été | ete | t         | f
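An accent- and case-insensitive comparison can be approximated without ICU by decomposing to NFD, dropping combining marks, and case folding (a rough sketch; a real collation handles many more cases than this):

```python
import unicodedata

def fold(s: str) -> str:
    # Decompose, drop combining marks (accents), then case-fold.
    nfd = unicodedata.normalize("NFD", s)
    return "".join(c for c in nfd if not unicodedata.combining(c)).casefold()

print(fold("Été") == fold("ete"))  # True
```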

5. Ignoring accents but not case

It’s possible to ignore accents but not case by staying at primary strength while setting a boolean attribute on the collation: colCaseLevel.

Example:

CREATE COLLATION nd2c (
  provider = 'icu',
  locale = 'und@colStrength=primary;colCaseLevel=yes',  -- or 'und-u-ks-level1-kc'
  deterministic = false
);

SELECT 'Ete' = 'Eté' COLLATE nd2c AS eq1,
       'Ete' = 'ete' COLLATE nd2c AS eq2;

 eq1 | eq2
-----+-----
 t   | f

6. Ignoring spaces and punctuation

The simplest option is to ignore punctuation completely, or “blank” it, as referred to in the “Ignore Punctuation” options in the ICU documentation.

This is done by activating “Alternate Handling” at strength levels 1 to 3. Since colStrength=tertiary by default, we can leave it unspecified:

CREATE COLLATION "nd3alt" (
  provider = 'icu',
  locale = 'und@colAlternate=shifted',
  deterministic = false
);

SELECT '{your-name?}' = 'your name' COLLATE "nd3alt" AS equal;

 equal
-------
 t

7. Matching compatible symbols

colAlternate set to shifted at the quaternary comparison level may also be used to recognize equality between punctuation or symbols that are linguistically equivalent but appear as distinct sequences of code points. For instance, HORIZONTAL ELLIPSIS (U+2026) is equivalent to three consecutive ASCII dots (FULL STOP, U+002E), and FULLWIDTH COMMERCIAL AT (U+FF20) is equivalent to COMMERCIAL AT (U+0040) as used in ASCII email addresses.

CREATE COLLATION "nd4alt" (
  provider = 'icu',
  locale = 'und@colStrength=quaternary;colAlternate=shifted',
  deterministic = false
);

SELECT 'Wow…!' = 'Wow...!' COLLATE "nd4alt" AS equal;

 equal
-------
 t

8. Ignoring code points assigned to invisible characters

At strength level 3 or below, code points in the ranges [\u0001-\u0008], [\u000E-\u001F] and [\u007F-\u009F] (control characters) are ignored in comparisons. This is also true of code points for invisible formatting characters such as (to list just a few plausible ones):

  • SOFT HYPHEN (U+00AD)
  • ZERO WIDTH SPACE (U+200B)
  • INVISIBLE SEPARATOR (U+2063)
  • LEFT-TO-RIGHT MARK (U+200E)
  • RIGHT-TO-LEFT MARK (U+200F)
  • WORD JOINER (U+2060)
  • …and many more…
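These code points fall into the Unicode general categories Cc (control) and Cf (format), which can be verified with Python's unicodedata (an aside, independent of ICU):

```python
import unicodedata

# Unicode general categories: Cc = control, Cf = format (invisible).
for cp, name in [("\u0001", "control character"),
                 ("\u00AD", "SOFT HYPHEN"),
                 ("\u200B", "ZERO WIDTH SPACE"),
                 ("\u200E", "LEFT-TO-RIGHT MARK")]:
    print(f"U+{ord(cp):04X} {name}: category {unicodedata.category(cp)}")
```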

Example:

SELECT s1, s2, s1 = s2 COLLATE nd AS equal
  FROM (VALUES ('ABC', E'\u200eA\u0001B\u00adC')) AS s(s1, s2);
 s1  |    s2     | equal 
-----+-----------+-------
 ABC | ‎A\x01B­C | t

To have these code points not ignored, the comparison strength should be set to the maximum level, that is colStrength=identical (or ks-identic with the tags syntax). At this level, the only difference from binary equality is the case of strings that differ only by canonically equivalent sequences.

CREATE COLLATION "nd-identic" (
  provider = 'icu',
  locale = 'und@colStrength=identical',  -- or und-u-ks-identic
  deterministic = false
);

SELECT 'abc' = E'a\u0001bc' COLLATE "nd-identic" AS equal;

 equal
-------
 f

Transforming (beyond collations)

German umlauts are sometimes converted into sequences of US-ASCII letters like this:

  • ü => ue, Ü => Ue
  • ö => oe, Ö => Oe
  • ä => ae, Ä => Ae

These equivalences are not recognized as equal sequences by ICU collations, even at primary strength and specifying German (de) as the language. On the other hand, ß (sharp s) and ss are equal at primary strength.
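A quick check outside ICU confirms the asymmetry: Unicode case folding maps ß to ss, but no normalization form turns ü into ue, because that mapping is a German transliteration convention rather than a Unicode equivalence:

```python
import unicodedata

print("ß".casefold())                      # 'ss': a Unicode case-folding rule
# NFKD decomposes ü to 'u' + COMBINING DIAERESIS, never to 'ue'.
print(unicodedata.normalize("NFKD", "ü") == "u\u0308")  # True
```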

Starting with version 60, ICU provides de-ASCII as a built-in transform rule. Transforms are provided by a different service than collations, which is not exposed by PostgreSQL core (see icu_transform() in icu_ext if you need that, or more generally transliterations between scripts).

Robert Haas: Braces Are Too Expensive

PostgreSQL has what's sometimes called a Volcano-style executor, after a system called Volcano, about which Goetz Graefe published several very interesting papers in the early to mid 1990s. PostgreSQL was in its infancy in those days, but many of the concepts in the Volcano papers have made their way into PostgreSQL over the years. It may also be that Volcano took inspiration from PostgreSQL or its predecessors; I'm not entirely sure of the history or who took inspiration from whom. In any case, the Volcano execution model has been thoroughly embedded in PostgreSQL for the entire history of the database system; the first chinks in the armor only started to appear in 2017.

Euler Taveira de Oliveira: Postgres Object ownership

Sometimes I have to fix the ownership of objects such as tables and views. Let's figure out whether there are such objects in your database:

--
-- list tables, views, foreign tables and sequences not owned by role postgres
--
SELECT n.nspname AS SCHEMA,
c.relname AS relation,
pg_get_userbyid(c.relowner) AS ROLE,
'ALTER TABLE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'r'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT n.nspname AS SCHEMA,
c.relname AS relation,
pg_get_userbyid(c.relowner) AS ROLE,
'ALTER VIEW ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'v'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT n.nspname AS SCHEMA,
c.relname AS relation,
pg_get_userbyid(c.relowner) AS ROLE,
'ALTER FOREIGN TABLE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'f'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT n.nspname AS SCHEMA,
c.relname AS relation,
pg_get_userbyid(c.relowner) AS ROLE,
'ALTER SEQUENCE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'S'
AND pg_get_userbyid(c.relowner) <> 'postgres';

This UNION ALL query lists tables, views, foreign tables and sequences whose owner is not the role postgres. They are candidates for a new owner (useful mainly when you are adjusting ownership and permissions on a testing and/or staging environment). Let's say you want to apply such changes after checking that some ownerships are wrong. The following query will output the SQL command(s) to change ownership.

--
-- change owner of tables, views, foreign tables and sequences not owned by role postgres
--
SELECT 'ALTER TABLE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'r'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT 'ALTER VIEW ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'v'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT 'ALTER FOREIGN TABLE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'f'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT 'ALTER SEQUENCE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'S'
AND pg_get_userbyid(c.relowner) <> 'postgres';

And if you are using psql (9.6 and later), you can replace the last character (the semicolon) with \gexec. In that case, instead of printing the SQL commands, psql will execute them. Voilà: all tables, views, foreign tables and sequences that were not owned by postgres will switch to the new owner, postgres.
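Since the four branches differ only in the keyword after ALTER, the same commands can also be produced with a single scan of pg_class using a CASE over relkind (a sketch along the same lines, shown as an alternative rather than a replacement):

```sql
SELECT 'ALTER ' ||
       CASE relkind WHEN 'r' THEN 'TABLE'
                    WHEN 'v' THEN 'VIEW'
                    WHEN 'f' THEN 'FOREIGN TABLE'
                    WHEN 'S' THEN 'SEQUENCE' END ||
       ' ' || quote_ident(nspname) || '.' || quote_ident(relname) ||
       ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind IN ('r', 'v', 'f', 'S')
AND pg_get_userbyid(c.relowner) <> 'postgres';
```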

Federico Campoli: Regenerated


With PostgreSQL 12, generated columns are now supported natively. Up to PostgreSQL 11, it was only possible to emulate generated columns using a trigger.

In this post we’ll see how to configure a generated column via a trigger and natively, and then we’ll compare the performance of both strategies.
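As a taste of what the comparison covers, the native PostgreSQL 12 syntax looks like this (a minimal sketch with a made-up table, not taken from the post itself):

```sql
-- Native generated column (PostgreSQL 12+): c is always computed from a and b.
CREATE TABLE t_gen (
    a numeric,
    b numeric,
    c numeric GENERATED ALWAYS AS (a + b) STORED
);

INSERT INTO t_gen (a, b) VALUES (1, 2);  -- c is set to 3 automatically
```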

Hans-Juergen Schoenig: Prewarming PostgreSQL I/O caches


PostgreSQL uses shared_buffers to cache blocks in memory. The idea is to reduce
disk I/O and to speed up the database in the most efficient way
possible. During normal operations your database cache will be pretty useful and
ensure good response times. However, what happens if your database instance is
restarted – for whatever reason? Your PostgreSQL database performance will suffer
until your I/O caches have filled up again. This takes some time and it can
be pretty damaging to your query response times.

pg_prewarm: Filling up your database cache

Fortunately, there are ways in PostgreSQL to fix the problem. pg_prewarm is a
module which allows you to automatically prewarm your caches after a database
failure or a simple restart. The pg_prewarm module is part of the PostgreSQL
contrib package and is usually available on your server by default.
There is no need to install additional third party software. PostgreSQL has all
you need by default.

pg_prewarm: warming caches manually or automatically.

Basically, pg_prewarm can be used in two ways:

  • Manual caching
  • Automatic caching on startup

Let us take a look at both options and see how the module works in detail. In general, automatic prewarming is, in my judgement, the better way to preload caches – but in some cases it can also make sense to warm caches manually (usually for testing purposes).

pg_prewarm: Putting data into shared_buffers manually

Prewarming the cache manually is pretty simple. The following section explains how the process works in general.

The first thing to do is to enable the pg_prewarm extension in your database:

test=# CREATE EXTENSION pg_prewarm;
CREATE EXTENSION

To show how a table can be preloaded, I will first create a table and put it into
the cache:

test=# CREATE TABLE t_test AS
SELECT * FROM generate_series(1, 1000000) AS id;
SELECT 1000000
test=# SELECT * FROM pg_prewarm('public.t_test');
pg_prewarm
------------
4425
(1 row)

All you have to do is to call the pg_prewarm function and pass the name of the
desired table to the function. In my example, 4425 pages have been read and put
into the cache.
4425 blocks translates to roughly 35 MB:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 35 MB
(1 row)

Calling pg_prewarm with one parameter is the easiest way to get started.
However, the module can do a lot more, as shown in the next listing:

test=# \x
Expanded display is on.
test=# \df *pg_prewarm*
List of functions
-[ RECORD 1 ]
---------------------+---------------------------------------------
 Schema              | public
 Name                | pg_prewarm
 Result data type    | bigint
 Argument data types | regclass, mode text DEFAULT 'buffer'::text,
                       fork text DEFAULT 'main'::text,
                       first_block bigint DEFAULT NULL::bigint,
                       last_block bigint DEFAULT NULL::bigint
 Type                | func

In addition to passing the name of the object you want to cache to the function,
you can also tell PostgreSQL which part of the table you want to cache. The
“relation fork” defines whether you want the real data file, the VM (Visibility
Map) or the FSM (Free Space Map). Usually, caching the main table is just fine.
You can also tell PostgreSQL to cache individual blocks. While this is flexible,
it is usually not what you want to do manually.
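For instance, the extra parameters can be combined to prefetch only the first hundred blocks of the main fork (a sketch; the 'prefetch' mode asks the operating system to read ahead and is not available on every platform):

```sql
-- Ask the OS to prefetch blocks 0..99 of t_test's main fork.
SELECT pg_prewarm('public.t_test', 'prefetch', 'main', 0, 99);
```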

Automatically populate your PostgreSQL I/O cache

In most cases, people will want pg_prewarm to take care of caching automatically
on startup. The way to achieve this is to add pg_prewarm to
shared_preload_libraries and to restart the database server. The following
example shows how to configure shared_preload_libraries in postgresql.conf:

shared_preload_libraries = 'pg_stat_statements, pg_prewarm'

After the server has restarted, you should be able to see the “autoprewarm
master” process, which is in charge of starting things up for you.

80259 ? Ss 0:00 /usr/pgsql-11/bin/postmaster -D /var/lib/pgsql/11/data/
80260 ? Ss 0:00 \_ postgres: logger
80262 ? Ss 0:00 \_ postgres: checkpointer
80263 ? Ss 0:00 \_ postgres: background writer
80264 ? Ss 0:00 \_ postgres: walwriter
80265 ? Ss 0:00 \_ postgres: autovacuum launcher
80266 ? Ss 0:00 \_ postgres: stats collector
80267 ? Ss 0:00 \_ postgres: autoprewarm master
80268 ? Ss 0:00 \_ postgres: logical replication launcher

By default, pg_prewarm will store a list of blocks which are currently in memory
on disk. After a crash or a restart, pg_prewarm will automatically restore the
cache as it was when the file was last exported.

When to use pg_prewarm

In general, pg_prewarm makes most sense if your database and your RAM are really
really large (XXX GB or more). In such cases, the difference between cached
and uncached data will be greatest and users will suffer the most if
performance is bad.

Finally …

If you want to learn more about performance and database tuning in general,
consider checking out my post about how to track down slow or time consuming
queries. You might also be interested in taking a look at
http://pgconfigurator.cybertec.at, which is a free website to help you with
database configuration.

The post Prewarming PostgreSQL I/O caches appeared first on Cybertec.


Vasilis Ventirozos: Tuning checkpoints

We recently had the chance to help a customer with some IO-related issues that turned out to be caused by unconfigured checkpoints. This may not always be obvious, but it is actually fairly common.

Let's start with how things roughly work.
Postgres' smallest IO unit is a disk block, which is 8kB (by default). Each time postgres needs a new block, it fetches it from the disks and loads it into an area in RAM called shared_buffers.
When postgres needs to write, it does it in the same manner:
  • It fetches the block(s) from the disks if they are not already in shared_buffers.
  • It changes the page in shared buffers.
  • It marks the page as changed (dirty) in shared buffers.
  • It writes the change to a "sequential ledger of changes" called the WAL, to ensure durability.

This basically means that the writes are not yet "on disk" in the data files. That part is taken care of by a postgres process called the checkpointer. Checkpoints are how postgres guarantees that data files and index files will be updated with all the changes that happened before that checkpoint. In case of a crash, postgres will go back to the latest checkpoint record and start a REDO operation from the WAL. Checkpoints are triggered every checkpoint_timeout (default: 5min) or when changes reach max_wal_size (default: 1GB). This is an IO-intensive operation, and postgres tries to spread the IO with checkpoint_completion_target (default: 0.5).

checkpoint_timeout*: the maximum time between checkpoints, in seconds.
min_wal_size: the minimum size of WAL, below which files will be recycled rather than removed.
max_wal_size**: the maximum size allowed for WAL between checkpoints.
checkpoint_completion_target: allows data changes to spread over a longer period of time, making the final fsync() much cheaper.

* Affects recovery time; change only after reviewing the documentation.
** This is a soft maximum; it can be exceeded in special cases.

Best way to start is to set checkpoint_timeout value to something reasonable and set max_wal_size high enough so you won't reach the timeout. To make sense of what is reasonable, you can do the following: schedule (cronjob will do) something like this to run in short periods of time, say every minute:

psql -XqtA -c "copy (select now()::timestamp(0), pg_current_wal_insert_lsn()) to stdout with csv;" monkey >> current_lcn

Leave it running for as long as you see fit. From the result you can extract the difference between two locations, in bytes, like this:

monkey=# SELECT pg_size_pretty(pg_wal_lsn_diff('0/B1277248','0/59CEA2F8'));
 pg_size_pretty
----------------
 1398 MB
(1 row)

(protip: file_fdw + window function)

This function calculates the difference (later location minus earlier location) in bytes, so having the location per minute can help you calculate (or graph) the rate of changes over time.
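The arithmetic pg_wal_lsn_diff() performs can be reproduced in a few lines of Python, which is handy when post-processing the sampled file outside the database (a sketch assuming the textual hi/lo LSN format shown above):

```python
# Reproduce pg_wal_lsn_diff() on sampled LSN strings like '0/B1277248'.
# An LSN is two 32-bit hex halves separated by '/'.
def lsn_to_bytes(lsn: str) -> int:
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

diff = lsn_to_bytes("0/B1277248") - lsn_to_bytes("0/59CEA2F8")
print(f"{diff / 1024 / 1024:.0f} MB")  # 1398 MB, matching pg_size_pretty above
```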
With that number, your storage capabilities and your recovery needs in mind you should be able to come up with a good starting point.
max_wal_size should be set high enough that it won't be reached before the timeout; the rate of changes we calculated earlier should be a good indication of where to start. min_wal_size has to follow common sense and leave a small portion of WAL files to be recycled for the next checkpoint.
checkpoint_completion_target is often simply set to 0.9, but if you want to be more "precise" it follows this rule:
(checkpoint_timeout - 2min) / checkpoint_timeout
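To make the rule concrete: with the default 5-minute timeout it gives 0.6, and with a 15-minute timeout roughly 0.87 (trivial arithmetic, spelled out here only as an illustration):

```python
def completion_target(timeout_min: float) -> float:
    # (checkpoint_timeout - 2min) / checkpoint_timeout
    return (timeout_min - 2) / timeout_min

print(round(completion_target(5), 2))   # 0.6
print(round(completion_target(15), 2))  # 0.87
```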

To watch how the checkpointer is working after the changes, you can query pg_stat_bgwriter. This is a very interesting view that probably deserves its own blog post because of the data you can extract from it, but today we are going to concentrate on two columns: checkpoints_timed and checkpoints_req.

checkpoints_timed counts checkpoints that were triggered by checkpoint_timeout, and checkpoints_req counts checkpoints that happened because max_wal_size was hit.
What you want to see there is that the majority of the checkpoints are timed.
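A quick way to watch that ratio directly (a sketch; these are the pg_stat_bgwriter column names as of the versions current when this was written, up to PostgreSQL 16, after which the counters moved to pg_stat_checkpointer):

```sql
SELECT checkpoints_timed,
       checkpoints_req,
       round(100.0 * checkpoints_timed
             / nullif(checkpoints_timed + checkpoints_req, 0), 1)
         AS pct_timed
FROM pg_stat_bgwriter;
```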

And because graphs are cool, you can see an example of a production system before and after checkpoint configuration. The graph shows write IOPS over time on an AWS RDS database.
On RDS specifically, I find it odd that the checkpoint defaults are so low, considering the high shared_buffers default it has.






Thanks for reading
-- Vasilis Ventirozos
-- credativ LLC



Regina Obe: PostGIS 3.0.0 coming soon - Try 3.0.0rc2 at a package repo near you

Alex Korban: Generating land-constrained geographical point grids with PostGIS

When I was in the market for an EV, one of the things I wondered about was how far I would be able to go outside the city before I had to charge it. Having a number for the range isn't enough to know offhand whether I'd be able to reach a particular destination. So I wanted to make the range more obvious in my EV guide by visualising vehicle range on a map. The trivial solution is to use range as a radius and show a circle: Distance circle But of course, that's going to be massively inaccur...

Álvaro Herrera: Managing another PostgreSQL Commitfest

I have written about managing a PostgreSQL commitfest before. During the PostgreSQL 13 development cycle, I did it again. This time I used a different strategy, mostly because I felt that there was excessive accumulation of very old patches that had received insufficient attention. So apart from bugfixes (which are always special cases), I focused […]

Magnus Hagander: Nordic PGDay 2020 - Call for Papers open


The call for papers for Nordic PGDay 2020 in Helsinki, Finland, is now open. Submit your proposals for interesting talks about all things PostgreSQL, and join us in March.

Just like two years ago, the conference is held in cooperation with pgDay.paris which is held two days later. So if you are interested in both, you can submit the same proposal to both conferences at once!
