The PostGIS development team is pleased to release PostGIS 3.0.0rc1.
This release works with PostgreSQL 9.5-12 and GEOS >= 3.6
Best served with PostgreSQL 12, GEOS 3.8.0rc2 and pgRouting 3.0.0-alpha.
While managing a small team of developers working on PostgreSQL development, I sometimes get team members who have good development experience but are new to PostgreSQL. I have developed a short set of training instructions in order to get these developers started with PostgreSQL and to familiarise them with Postgres and its internals. The purpose of this blog is to share these instructions so they can benefit others in a similar situation. The instructions involve going through a lot of documentation, white papers and online books; they also include a few development exercises that can be helpful in understanding the PostgreSQL codebase. I have found these helpful in the initial training of people who are new to PostgreSQL, and I am putting them in this blog so they can help others too.
Online Learning Resources
For someone who is new to PostgreSQL, the obvious starting point is understanding PostgreSQL functionality. I would recommend the following resources for reading about PostgreSQL functionality.
http://www.postgresqltutorial.com/
This is a really good place for Postgres tutorials; the tutorials available on this site vary from a basic “Getting started with PostgreSQL” to complex features like common table expressions, partitioned tables, etc. It also contains tutorials for Postgres client tools, programming interfaces, etc.
The presentations and tutorials available at Bruce Momjian's site (link below) are also very useful; the site contains presentations, online books and other material on all sorts of topics related to PostgreSQL. Whether it is query processing, Postgres internals, horizontal scalability with sharding, or security, this site has a lot of useful training material related to PostgreSQL.
https://momjian.us/main/presentations/extended.html
There is, of course, the official community documentation for every release of PostgreSQL. The difference between each release's documentation is the new features added in that release or changes to existing features. You can get the documentation for any release using the main docs link below.
https://www.postgresql.org/docs/11/index.html
https://www.postgresql.org/docs/
The above resources should give you a really good insight into PostgreSQL features and functionality. The next step is understanding Postgres internals and how the PostgreSQL community goes about feature development.
I have found this online book (link below) really good for understanding PostgreSQL internals:
It is very useful in understanding the components involved in query processing and taking a deep dive into PostgreSQL internals for storage, WAL and memory management.
PostgreSQL has one of the best and most active communities, with some of the best techies; you can subscribe to the community mailing lists using the link below.
https://www.postgresql.org/list/
The mailing list archives are also available at the same link; the most interesting one for development is pgsql-hackers, where most of the development discussion takes place. The PostgreSQL development team lives on the pgsql-hackers mailing list. You can simply click on any of the mailing lists and search for a particular feature like “TDE” and get all the email threads related to TDE.
This is the best place to understand how community development takes place. For new features, you can see the whole process, starting from submitting a proposal along with POC patches and getting buy-in from other community members. There is a lot to learn from the community mailing lists and from seeing how the community goes about doing PostgreSQL development.
Basic development exercises
It is time to indulge in some development exercises to develop a better understanding of the codebase. It is out of the scope of this blog to explain how each one is done; I believe that is already well documented in other online resources.
1)
The first thing that I would recommend is getting an understanding of the PostgreSQL regression suite: how it is executed and how to add a new test case to it. PostgreSQL has a comprehensive set of test cases that live in “src/test/regress/sql”; the last time I looked at the master branch, the number of test case files was around 197. The test cases are generally divided based on functionality (e.g. limit.sql, json.sql) and per data type (e.g. boolean.sql). Each test case file contains detailed test cases for testing the desired functionality.
There are other test cases written in the TAP framework for testing tools and utilities that can't be tested with just the SQL interface. This is also needed when you have to test with multiple server configurations, or to test replication/failover, backups, etc. Please see the link below; TAP (Test Anything Protocol) started as a Perl-based testing framework but was later extended to allow writing test cases in other languages.
So someone new to PostgreSQL needs to understand how to run the regression test suite and how to add new test cases to it. We can add test cases for partitioning or any other functionality, make them part of the regression schedule and run the regression.
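For illustration, here is a minimal sketch of what such a test could look like (the file name my_partition_test.sql is hypothetical): the statements go into src/test/regress/sql/, the expected output into a matching file under src/test/regress/expected/, and the test name is added to parallel_schedule (or serial_schedule) before running make check.

-- src/test/regress/sql/my_partition_test.sql (hypothetical file name)
-- Every statement's output is diffed against expected/my_partition_test.out
CREATE TABLE measurements (city_id int, logdate date, peaktemp int)
    PARTITION BY RANGE (logdate);
CREATE TABLE measurements_2019 PARTITION OF measurements
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');

INSERT INTO measurements VALUES (1, '2019-10-01', 22);

-- The row should land in the 2019 partition
SELECT tableoid::regclass, * FROM measurements;

DROP TABLE measurements;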
2)
The second exercise is to understand how PostgreSQL extensions work and how to add new extensions to PostgreSQL. Extensions are what make PostgreSQL extensible. The contrib/ directory shipped with the source code contains several extensions, which are described in the PostgreSQL documentation. Other extensions are developed independently, like PostGIS. Even PostgreSQL replication solutions can be developed externally; Slony is a popular replication solution developed outside of the core.
Users can add their own extensions to PostgreSQL according to their needs. The following is a really useful guide that shows how to add a user-defined extension to PostgreSQL.
https://www.highgo.ca/2019/10/01/a-guide-to-create-user-defined-extension-modules-to-postgres/
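To make this concrete, here is a minimal sketch of the SQL script of a hypothetical extension (the file names my_ext.control and my_ext--1.0.sql are illustrative, following the conventions described in the guide above):

-- my_ext--1.0.sql (hypothetical), installed alongside a my_ext.control file
-- Guard against loading the script directly instead of via CREATE EXTENSION
\echo Use "CREATE EXTENSION my_ext" to load this file. \quit

CREATE FUNCTION add_one(i integer) RETURNS integer
    AS $$ SELECT i + 1 $$
    LANGUAGE SQL IMMUTABLE;

-- Once the files are installed under SHAREDIR/extension, load it with:
--   CREATE EXTENSION my_ext;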
3)
The third exercise is slightly more challenging, but it is very useful for understanding the internals of PostgreSQL. The exercise is basically to follow a SELECT query through PostgreSQL internals and see how the query changes as it passes through the internal components of PostgreSQL. This is done by compiling the PostgreSQL source with enable-debug, attaching a debugger, and watching the state by placing appropriate breakpoints in the code.
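To attach a debugger you first need the PID of the backend serving your session; a small sketch (the gdb invocation and the exec_simple_query breakpoint are common choices, not the only way to do this):

-- Run in the psql session whose query processing you want to trace
SELECT pg_backend_pid();
-- Then, from another terminal (shell command shown as a comment):
--   gdb -p <pid returned above>
-- and set breakpoints, e.g. on exec_simple_query, before running your SELECT.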
Another useful blog, written by Cary (HighGo), shows how this is done:
https://www.highgo.ca/2019/10/03/trace-query-processing-internals-with-debugger/
Ahsan Hadi is a VP of Development with HighGo Software Inc. Prior to coming to HighGo Software, Ahsan worked at EnterpriseDB as a Senior Director of Product Development; he was with EnterpriseDB for 15 years. The flagship product of EnterpriseDB is Postgres Plus Advanced Server, which is based on open source PostgreSQL. Ahsan has vast experience with Postgres and led the development team at EnterpriseDB that built the core compatibility layer adding Oracle compatibility to EDB's Postgres Plus Advanced Server. Ahsan has also spent a number of years working with the development teams adding horizontal scalability and sharding to Postgres. Initially he worked with Postgres-XC, which is a multi-master sharded cluster, and later managed the development of adding horizontal scalability/sharding to Postgres. Ahsan has also worked a great deal with Postgres foreign data wrapper technology and has developed and maintained FDWs for several SQL and NoSQL databases like MongoDB, Hadoop and MySQL.
Prior to EnterpriseDB, Ahsan worked for Fusion Technologies as a Senior Project Manager. Fusion Tech was a US-based consultancy company; Ahsan led the team that developed a Java-based job factory responsible for placing items on shelves at big stores like Walmart. Prior to Fusion Technologies, Ahsan worked at British Telecom as an Analyst/Programmer and developed web-based database applications for network fault monitoring.
Ahsan joined HighGo Software Inc (Canada) in April 2019 and is leading development teams based in multiple geographies; the primary responsibilities are community-based Postgres development and developing the HighGo Postgres server.
Did you know that your temporary tables are not cleaned up by autovacuum? If you did not, consider reading this blog post about PostgreSQL and autovacuum. If you did – well, you can still continue to read this article.
Since the days of PostgreSQL 8.0, the database has provided this miraculous autovacuum daemon which is in charge of cleaning tables and indexes. In many cases, the default configuration is absolutely ok and people don’t have to worry about VACUUM much. However, recently one of our support clients sent us an interesting request related to temporary tables and autovacuum.
What is the problem? The main issue is that autovacuum does not touch temporary tables. Yes, it’s true – you have to VACUUM temporary tables on your own. But why is this the case? Let’s take a look at how the autovacuum job works in general: Autovacuum sleeps for a minute, wakes up and checks if a table has seen a sufficiently large number of changes before it fires up a cleanup process. The important thing is that the cleanup process actually has to see the objects it will clean, and this is where the problem starts. An autovacuum process has no way of seeing a temporary table, because temporary tables can only be seen by the database connection which actually created them. Autovacuum therefore has to skip temporary tables. Unfortunately, most people are not aware of this issue. As long as you don’t use your temporary tables for extended periods, the missing cleanup job is not an issue. However, if your temp tables are repeatedly changed in long transactions, it can become a problem.
The main question now is: How can we verify what I have just said? To show you what I mean, I will load the pgstattuple extension and create two tables: a “real” one, and a temporary one:
test=# CREATE EXTENSION pgstattuple;
CREATE EXTENSION
test=# CREATE TABLE t_real AS SELECT * FROM generate_series(1, 5000000) AS id;
SELECT 5000000
test=# CREATE TEMPORARY TABLE t_temp AS SELECT * FROM generate_series(1, 5000000) AS id;
SELECT 5000000
Let us now kill half of the data in each of those two tables:
test=# DELETE FROM t_real WHERE id % 2 = 0;
DELETE 2500000
test=# DELETE FROM t_temp WHERE id % 2 = 0;
DELETE 2500000
The tables will now contain around 50% trash each. If we wait sufficiently long, we will see that autovacuum has cleaned up the real table while the temporary one is still in jeopardy:
test=# \x
Expanded display is on.
test=# SELECT * FROM pgstattuple('t_real');
-[ RECORD 1 ]------+----------
table_len          | 181239808
tuple_count        | 2500000
tuple_len          | 70000000
tuple_percent      | 38.62
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 80620336
free_percent       | 44.48

test=# SELECT * FROM pgstattuple('t_temp');
-[ RECORD 1 ]------+----------
table_len          | 181239808
tuple_count        | 2500000
tuple_len          | 70000000
tuple_percent      | 38.62
dead_tuple_count   | 2500000
dead_tuple_len     | 70000000
dead_tuple_percent | 38.62
free_space         | 620336
free_percent       | 0.34
The “real table” has already been cleaned and a lot of free space is available, while the temporary table still contains a ton of dead rows. Only a manual job will find the free space in all that jumble.
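Since autovacuum cannot see the temporary table, the session that created it has to run the cleanup itself; a minimal sketch, continuing the example above:

-- Must be run in the session that created t_temp; other sessions
-- (and autovacuum) cannot see a temporary table.
VACUUM (VERBOSE) t_temp;

-- pgstattuple should now report the dead rows as reclaimed free space
SELECT dead_tuple_count, free_percent FROM pgstattuple('t_temp');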
Keep in mind that VACUUM is only relevant if you really want to keep the temporary table for a long time. If you close your connection, the entire space will be automatically reclaimed anyway, so there is no need to worry about dropping the table.
If you want to learn more about VACUUM in general, consider checking out one of our other blog posts. If you are interested in how VACUUM works, it is also definitely useful to read the official documentation, which can be found here
The post What is autovacuum doing to my temporary tables? appeared first on Cybertec.
The transactional model has been in PostgreSQL since the early versions. The PostgreSQL implementation follows the guidelines of the SQL standard, with some notable exceptions.
When designing an application it's important to understand how concurrent access to data happens, in order to avoid unexpected results or errors.
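As a small illustration of why this matters (the accounts table is hypothetical): PostgreSQL defaults to the READ COMMITTED isolation level, so a transaction that needs a stable snapshot across several statements has to request a stricter level explicitly.

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Both queries see the same snapshot, even if another session
-- commits changes to accounts in between.
SELECT sum(balance) FROM accounts;
SELECT count(*)     FROM accounts;
COMMIT;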
2019 October 17th Meeting 6pm-8pm
Location:
PSU Business Accelerator
2828 SW Corbett Ave · Portland, OR
Parking is open after 5pm.
Speaker: Mark Wong
pg_top was born in 2007 from a fork of unixtop, a terminal program displaying the top processes on the system; pg_top focuses on the processes of the PostgreSQL database you are connected to. Recently, pg_systat was forked from systat to display additional database statistics.
These tools can also help you do more, such as exploring query execution plans and creating reports from system and database resources.
Come learn about the statistics PostgreSQL keeps and how to use these tools to view them.
Mark leads the 2ndQuadrant performance practice as a Performance Consultant for English Speaking Territories, based out of Oregon in the USA. He is a long time Contributor to PostgreSQL, co-organizer of the Portland PostgreSQL User Group, and serves as a Director and Treasurer for the United States PostgreSQL Association.
PostgreSQL 12 can be considered revolutionary considering the performance boost we observe with partitioning enhancements, planner improvements, several SQL features, indexing improvements, etc. You may see some of these features discussed in future blog posts. But, let me start this blog with something interesting. You might have already seen some news that there is no recovery.conf file in a standby anymore and that the replication setup (streaming replication) has changed slightly in PostgreSQL 12. We have earlier blogged about the steps involved in setting up simple streaming replication until PostgreSQL 11 and also about using replication slots for the same. Let's see how different it is to set up the same streaming replication in PostgreSQL 12.
On CentOS/RedHat, you may use the rpms available in the PGDG repo (the following link may change depending on your OS release).
# as root:
yum install -y https://yum.postgresql.org/12/redhat/rhel-7.4-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y postgresql12-server
In the following steps, the Master server is: 192.168.0.108 and the Standby server is: 192.168.0.107
Step 1 :
Initialize and start PostgreSQL, if not done already on the Master.
## Preparing the environment
$ sudo su - postgres
$ echo "export PATH=/usr/pgsql-12/bin:$PATH PAGER=less" >> ~/.pgsql_profile
$ source ~/.pgsql_profile

## As root, initialize and start PostgreSQL 12 on the Master
$ /usr/pgsql-12/bin/postgresql-12-setup initdb
$ systemctl start postgresql-12
Step 2 :
Modify the parameter listen_addresses to allow a specific IP interface or all (using *). Modifying this parameter requires a restart of the PostgreSQL instance to get the change into effect.
# as postgres
$ psql -c "ALTER SYSTEM SET listen_addresses TO '*'";
ALTER SYSTEM

# as root, restart the service
$ systemctl restart postgresql-12
You may not have to set any other parameters on the Master for simple replication setup, because the defaults hold good.
Step 3 :
Create a User for replication in the Master. It is discouraged to use the superuser postgres in order to set up replication, though it works.
postgres=# CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secret';
CREATE ROLE
Step 4 :
Allow replication connections from Standby to Master by appending a line similar to the following to the pg_hba.conf file of the Master. If you are enabling automatic failover using any external tool, you must also allow replication connections from Master to the Standby. In the event of a failover, the Standby may be promoted to Master and the old Master needs to replicate changes from the new Master (previously a standby). You may use any of the authentication methods supported by PostgreSQL today.
$ echo "host replication replicator 192.168.0.107/32 md5" >> $PGDATA/pg_hba.conf ## Get the changes into effect through a reload. $ psql -c "select pg_reload_conf()"
Step 5 :
You may use pg_basebackup to back up the data directory of the Master from the Standby. While creating the backup, you may also tell pg_basebackup to create the replication-specific files and entries in the data directory using "-R".
## This command must be executed on the standby server.
$ pg_basebackup -h 192.168.0.108 -U replicator -p 5432 -D $PGDATA -Fp -Xs -P -R
Password:
25314/25314 kB (100%), 1/1 tablespace
You may use multiple approaches such as rsync or any other disk backup method to copy the master’s data directory to the standby. But there is an important file (standby.signal) that must exist in a standby data directory to help postgres determine its state as a standby. It is automatically created when you use the "-R" option while taking pg_basebackup. If not, you may simply use touch to create this empty file.
$ touch $PGDATA/standby.signal
$ ls -l $PGDATA
total 60
-rw-------. 1 postgres postgres   224 Oct 8 16:41 backup_label
drwx------. 5 postgres postgres    41 Oct 8 16:41 base
-rw-------. 1 postgres postgres    30 Oct 8 16:41 current_logfiles
drwx------. 2 postgres postgres  4096 Oct 8 16:41 global
drwx------. 2 postgres postgres    32 Oct 8 16:41 log
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_commit_ts
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_dynshmem
-rw-------. 1 postgres postgres  4581 Oct 8 16:41 pg_hba.conf
-rw-------. 1 postgres postgres  1636 Oct 8 16:41 pg_ident.conf
drwx------. 4 postgres postgres    68 Oct 8 16:41 pg_logical
drwx------. 4 postgres postgres    36 Oct 8 16:41 pg_multixact
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_notify
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_replslot
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_serial
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_snapshots
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_stat
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_stat_tmp
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_subtrans
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_tblspc
drwx------. 2 postgres postgres     6 Oct 8 16:41 pg_twophase
-rw-------. 1 postgres postgres     3 Oct 8 16:41 PG_VERSION
drwx------. 3 postgres postgres    60 Oct 8 16:41 pg_wal
drwx------. 2 postgres postgres    18 Oct 8 16:41 pg_xact
-rw-------. 1 postgres postgres   288 Oct 8 16:41 postgresql.auto.conf
-rw-------. 1 postgres postgres 26638 Oct 8 16:41 postgresql.conf
-rw-------. 1 postgres postgres     0 Oct 8 16:41 standby.signal
One of the most important observations should be the contents of the postgresql.auto.conf file in the standby server. As you see in the following log, an additional parameter primary_conninfo has been added to this file. This parameter tells the standby about its Master. If you haven’t used pg_basebackup with the -R option, you would not see this entry (of primary_conninfo) in this file on the standby server, which means that you have to add it manually.
$ cat $PGDATA/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
listen_addresses = '*'
primary_conninfo = 'user=replicator password=secret host=192.168.0.108 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
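If this entry is missing (for instance because the data directory was copied with rsync instead of pg_basebackup -R), one way to add it, sketched here with the same connection details as above, is ALTER SYSTEM on the standby once it is running, followed by a restart; editing postgresql.auto.conf directly before the first start works as well.

-- Run on the standby; ALTER SYSTEM writes the setting to postgresql.auto.conf.
ALTER SYSTEM SET primary_conninfo = 'user=replicator password=secret host=192.168.0.108 port=5432';
-- In PostgreSQL 12, a change to primary_conninfo only takes effect after a restart.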
postgresql.auto.conf is the configuration file that is read at the end when you start Postgres. So, if there is a parameter that has different values in the postgresql.conf and postgresql.auto.conf files, the value set in postgresql.auto.conf is considered by PostgreSQL. Also, any parameter that has been modified using ALTER SYSTEM would automatically be written to the postgresql.auto.conf file by postgres.
Until PostgreSQL 11, we had to create a file named recovery.conf containing the following minimalistic parameters. If standby_mode is ON, the instance is considered to be a standby.
$ cat $PGDATA/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.0.8 port=5432 user=replicator password=secret'
So the first difference between PostgreSQL 12 and earlier versions (until PostgreSQL 11) is that the standby_mode parameter is not present in PostgreSQL 12 and has been replaced by an empty file standby.signal in the standby’s data directory. And the second difference is the parameter primary_conninfo. This can now be added to the postgresql.conf or postgresql.auto.conf file of the standby’s data directory.
Step 6 :
Start PostgreSQL using pg_ctl on the Standby.
$ pg_ctl -D $PGDATA start
Step 7 :
Verify the replication between the Master and the Standby. In order to verify, run this command on the Master. In the following log, you see a lot of details of the standby and the lag between the Master and Standby.
$ psql -x -c "select * from pg_stat_replication"
-[ RECORD 1 ]----+------------------------------
pid              | 2522
usesysid         | 16384
usename          | replicator
application_name | walreceiver
client_addr      | 192.168.0.107
client_hostname  |
client_port      | 36382
backend_start    | 2019-10-08 17:15:19.658917-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/CB02A90
write_lsn        | 0/CB02A90
flush_lsn        | 0/CB02A90
replay_lsn       | 0/CB02A90
write_lag        | 00:00:00.095746
flush_lag        | 00:00:00.096522
replay_lag       | 00:00:00.096839
sync_priority    | 0
sync_state       | async
reply_time       | 2019-10-08 17:18:04.783975-04
Most of the time, the default or modified retention settings of WAL segments on the Master may not be enough to maintain a healthy replication between itself and its standby. So, we need the WALs to be safely archived to another disk or a remote backup server. These archived WAL segments can be used by the standby to replay them when the WALs are gone from the Master.
To enable archiving on the Master, we can still use the same approach of setting the following 2 parameters.
archive_mode = ON
archive_command = 'cp %p /archives/%f'   ## Modify this with an appropriate shell command.
But to enable recovery from archives on a standby, we used to add a parameter named restore_command to the recovery.conf file until PostgreSQL 11. Starting from PostgreSQL 12, we can add the same parameter to the postgresql.conf or postgresql.auto.conf file of the standby. Please note that it requires a restart of PostgreSQL to put the changes made to the archive_mode and restore_command parameters into effect.
echo "restore_command = 'cp /archives/%f %p'" >> $PGDATA/postgresql.auto.conf pg_ctl -D $PGDATA restart -mf
In my next blog post, I shall talk about Point-in-time-recovery on PostgreSQL 12, where I will discuss a few more parameters related to recovery in detail. Meanwhile, have you tried Percona Distribution for PostgreSQL? It is a collection of finely-tested and implemented open source tools and extensions along with PostgreSQL 11, maintained by Percona. Please subscribe to our blog posts to learn more interesting features in PostgreSQL.
The PostGIS development team is pleased to release PostGIS 3.0.0rc2. This will be the final RC before release.
This release works with PostgreSQL 9.5-12 and GEOS >= 3.6
Best served with PostgreSQL 12, GEOS 3.8.0 and pgRouting 3.0.0-alpha.
Since version 12, PostgreSQL collations are created with a parameter named deterministic, which can be true or false, so that collations are now either deterministic (which they are by default) or nondeterministic.
What does that mean? This term refers to what Unicode calls deterministic comparisons between strings:
This is a comparison where strings that do not have identical binary contents (optionally, after some process of normalization) will compare as unequal
So before version 12, comparisons for collatable types in Postgres are always deterministic according to the above definition. Specifically, when the underlying collation provider (libc or ICU) reports that two strings are equal, a tie-breaker bytewise comparison is performed, so that it’s only when the strings consist of identical binary contents that they are truly equal for Postgres.
Starting with version 12, the new “deterministic” property can be set
to false
at CREATE COLLATION
time to request that string comparisons
skip the tie-breaker, so that the memory representations being different
is not an obstacle to recognize strings as equal when the underlying locale
says they are.
This does not only affect direct comparisons or lookups through WHERE
clauses, but also the results of GROUP BY, ORDER BY, DISTINCT,
PARTITION BY, unique constraints, and everything implying the equality
operator.
So what can be achieved with nondeterministic collations?
The most obvious features are case-insensitive and accent-insensitive matching implemented with COLLATE clauses, as opposed to calling explicit functions to do case-mapping (upper, lower) and removal of accents (unaccent).
Now that these are accessible through the collation service, the traditional recommendation to use the citext datatype for case-insensitive lookups may start to be reconsidered.
Beyond that, nondeterministic collations allow to match strings that are canonically equivalent (differing only by which Unicode normal form they use), or differ only by compatible sequences, or by punctuation, or by non-displayable characters.
Except for the canonical equivalence, these matching features are
optional, and they’re activated by declaring collation attributes
inside the locale
parameter, especially the comparison levels.
Unicode Technical Report #35 provides a table of collation settings with BCP47 keys and values, but the examples in this post will use ICU “old-style” attributes: colStrength, colCaseLevel, colAlternate rather than “new-style” keys (respectively ks, kc, ka). This is because the former work with all versions of ICU, whereas the latter work only when PostgreSQL is built with ICU version 54 or later (released in 2014). It appears that pre-compiled binaries for Windows are currently built with ICU version 53, so it's better to stick to the old-style syntax, at least for them.
Now, let’s go through a list of fancy comparison features that are enabled by nondeterministic collations.
This is a requirement of Unicode that PostgreSQL was not able to fulfill until now. As explained in Unicode equivalence (wikipedia):
Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E (the Latin lowercase “n”) followed by U+0303 (the combining tilde “◌̃”) is defined by Unicode to be canonically equivalent to the single code point U+00F1 (the lowercase letter “ñ” of the Spanish alphabet). Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications such as alphabetizing names or searching, and may be substituted for each other.
Nondeterministic collations will recognize canonically equivalent
sequences as equal without requiring any particular collation attribute
in the locale
argument.
The example below uses a language-agnostic locale: an empty string, which selects the root collation. und may also be used, as the 3-letter BCP-47 tag for “undefined”. Otherwise a language code may be used, optionally followed by a script code and a region code, such as 'fr-CA' for “French as spoken in Canada”.
Example of canonical equivalence between NFD and NFC forms:
CREATE COLLATION nd (
  provider = 'icu',
  locale = '', -- or 'und' (no language or region specified)
  deterministic = false
);

SELECT s1, s2, s1 = s2 COLLATE nd AS equal
  FROM (VALUES (E'El Nin\u0303o', E'El Ni\u00F1o')) AS s(s1, s2);
s1 | s2 | equal
---------+---------+-------
El Niño | El Niño | t
By contrast, with any deterministic collation, we would get f for false in the equal column, since these strings s1 and s2 are bytewise unequal.
Besides being equivalent, sequences of code points can be merely compatible, in which case they can optionally be considered as equal.
Quoting another part of the above-linked wikipedia entry:
Sequences that are defined as compatible are assumed to have possibly distinct appearances, but the same meaning in some contexts. Thus, for example, the code point U+FB00 (the typographic ligature “ff”) is defined to be compatible—but not canonically equivalent—to the sequence U+0066 U+0066 (two Latin “f” letters)
At tertiary strength (the default), these sequences are not equal.
Let’s see this in SQL, reusing the "nd" collation previously defined:
SELECT s1, s2, s1 = s2 COLLATE nd AS equal
  FROM (VALUES ('shelffull', E'shel\ufb00ull')) AS s(s1, s2);
    s1     |    s2    | equal
-----------+----------+-------
 shelffull | shelﬀull | f
But at secondary strength, these sequences compare as equal:
CREATE COLLATION nd2 (
  provider = 'icu',
  locale = '@colStrength=secondary', -- or 'und-u-ks-level2'
  deterministic = false
);

SELECT s1, s2, s1 = s2 COLLATE nd2 AS equal
  FROM (VALUES ('shelffull', E'shel\ufb00ull')) AS s(s1, s2);
    s1     |    s2    | equal
-----------+----------+-------
 shelffull | shelﬀull | t
The most typical use case for nondeterministic collations is probably the case-insensitive comparison. At secondary strength, strings that differ by case compare as equal:
SELECT s1, s2, s1 = s2 COLLATE nd2 AS equal
  FROM (VALUES ('Abc', 'ABC')) AS s(s1, s2);
s1 | s2 | equal
-----+-----+-------
Abc | ABC | t
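Because the equality operator is affected, such a collation can also back a case-insensitive unique constraint; a small sketch reusing nd2 with a hypothetical users table:

CREATE TABLE users (email text COLLATE "nd2" UNIQUE);

INSERT INTO users VALUES ('Alice@example.com');
-- Fails with a unique violation: at secondary strength the two values
-- compare as equal even though they differ in case.
INSERT INTO users VALUES ('alice@example.com');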
Strings that differ by accents or case (or both) compare as equal at primary strength:
CREATE COLLATION nd1 (
  provider = 'icu',
  locale = '@colStrength=primary', -- or 'und-u-ks-level1'
  deterministic = false
);

SELECT s1, s2,
       s1 = s2 COLLATE nd1 AS "equal-nd1",
       s1 = s2 COLLATE nd2 AS "equal-nd2"
  FROM (VALUES ('Été', 'ete')) AS s(s1, s2);
s1 | s2 | equal-nd1 | equal-nd2
-----+-----+-----------+-----------
Été | ete | t | f
It’s possible to ignore accents but not case by staying at primary strength while setting an additional boolean attribute on the collation: colCaseLevel.
Example:
CREATE COLLATION nd2c (
  provider = 'icu',
  locale = 'und@colStrength=primary;colCaseLevel=yes', -- or 'und-u-ks-level1-kc'
  deterministic = false
);

SELECT 'Ete' = 'Eté' COLLATE nd2c AS eq1,
       'Ete' = 'ete' COLLATE nd2c AS eq2;

 eq1 | eq2
-----+-----
 t   | f
The simplest option is to ignore punctuation completely, or “blank” it, as referred to in the “Ignore Punctuation” options in the ICU documentation.
This is done by activating “Alternate Handling” at strength levels 1 to 3.
Since colStrength=tertiary by default, we can leave it unspecified:
CREATECOLLATION"nd3alt"(provider='icu',locale='und@colAlternate=shifted',deterministic=false);SELECT'{your-name?}'='your name'COLLATE"nd3alt"ASequal;equal-------t
colAlternate set to shifted at the quaternary comparison level may also be used to recognize equality between punctuation or symbols that are linguistically equivalent, but appear as distinct sequences of code points. For instance, HORIZONTAL ELLIPSIS (U+2026) is equivalent to three consecutive ASCII dots (FULL STOP, U+002E), and FULLWIDTH COMMERCIAL AT (U+FF20) is equivalent to COMMERCIAL AT (U+0040) as used in ASCII email addresses.
CREATECOLLATION"nd4alt"(provider='icu',locale='und@colStrength=quaternary;colAlternate=shifted',deterministic=false);SELECT'Wow…!'='Wow...!'COLLATE"nd4alt"ASequal;equal-------t
At strength level 3 or below, code points in the ranges [\u0001-\u0008], [\u000E-\u001F], [\u007f-\u009F] (control characters) are ignored in comparisons. This is also true of code points for spacing characters such as the following (to list just a few plausible ones):
Example:
SELECT s1, s2, s1 = s2 COLLATE nd AS equal
  FROM (VALUES ('ABC', E'\u200eA\u0001B\u00adC')) AS s(s1, s2);
s1 | s2 | equal
-----+-----------+-------
ABC | A\x01BC | t
To have these code points not ignored, the comparison strength should be set at the maximum level, that is colStrength=identical (or ks-identic with the tags syntax). At this level, the only difference with binary equality is the case of strings that differ only by canonically equivalent sequences.
CREATECOLLATION"nd-identic"(provider='icu',locale='und@colStrength=identical',-- or und-u-ks-identicdeterministic=false);SELECT'abc'=E'a\u0001bc'COLLATE"nd-identic"ASequal;equal-------f
German umlauts are sometimes converted into sequences of US-ASCII letters like this:
These equivalences are not recognized as equal sequences by ICU collations, even at primary strength and specifying German (de) as the language. On the other hand, ß (sharp s) and ss are equal at primary strength.
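To illustrate with the nd1 (primary strength) collation created earlier, a small sketch; per the behaviour just described, the sharp s should compare equal to ss while the umlaut-to-ASCII expansion should not:

SELECT 'Straße' = 'Strasse' COLLATE nd1 AS sharp_s_equal,
       'Müller' = 'Mueller' COLLATE nd1 AS umlaut_equal;

 sharp_s_equal | umlaut_equal
---------------+--------------
 t             | f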
Starting with version 60, ICU provides de-ASCII as a built-in transform rule. Transforms are provided by a different service than collations, which is not exposed by PostgreSQL core (see icu_transform() in icu_ext if you need that, or more generally transliterations between scripts).
--
-- list tables, views, foreign tables and sequences not owned by role postgres
--
SELECT n.nspname AS SCHEMA,
c.relname AS relation,
pg_get_userbyid(c.relowner) AS ROLE,
'ALTER TABLE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'r'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT n.nspname AS SCHEMA,
c.relname AS relation,
pg_get_userbyid(c.relowner) AS ROLE,
'ALTER VIEW ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'v'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT n.nspname AS SCHEMA,
c.relname AS relation,
pg_get_userbyid(c.relowner) AS ROLE,
'ALTER FOREIGN TABLE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'f'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT n.nspname AS SCHEMA,
c.relname AS relation,
pg_get_userbyid(c.relowner) AS ROLE,
'ALTER SEQUENCE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'S'
AND pg_get_userbyid(c.relowner) <> 'postgres';
--
-- change owner of tables, views, foreign tables and sequences not owned by role postgres
--
SELECT 'ALTER TABLE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'r'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT 'ALTER VIEW ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'v'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT 'ALTER FOREIGN TABLE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'f'
AND pg_get_userbyid(c.relowner) <> 'postgres'
UNION ALL
SELECT 'ALTER SEQUENCE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;' AS command
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
AND nspname <> 'information_schema'
AND relkind = 'S'
AND pg_get_userbyid(c.relowner) <> 'postgres';
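One convenient way to run the generated commands directly, assuming you are connected with a role that is allowed to change ownership, is psql's \gexec meta-command, which executes every value returned by the preceding query; a sketch for the table branch (the same works for the other UNION branches):

SELECT 'ALTER TABLE ' || quote_ident(nspname) || '.' || quote_ident(relname) || ' OWNER TO postgres;'
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE nspname !~ '^pg_'
  AND nspname <> 'information_schema'
  AND relkind = 'r'
  AND pg_get_userbyid(c.relowner) <> 'postgres'
\gexec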
With PostgreSQL 12, generated columns are now supported natively. Until PostgreSQL 11 it was possible to have generated columns only by using a trigger.
In this post we’ll see how to configure a generated column via a trigger and natively; then we’ll compare the performance of both strategies.
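As a quick preview of the native syntax (a minimal sketch with a hypothetical table; PostgreSQL 12 only supports STORED generated columns):

CREATE TABLE temperatures (
    celsius    numeric,
    fahrenheit numeric GENERATED ALWAYS AS (celsius * 9 / 5 + 32) STORED
);

INSERT INTO temperatures (celsius) VALUES (20);

-- fahrenheit is computed automatically (68) and cannot be written directly
SELECT * FROM temperatures;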
PostgreSQL uses shared_buffers to cache blocks in memory. The idea is to reduce
disk I/O and to speed up the database in the most efficient way
possible. During normal operations your database cache will be pretty useful and
ensure good response times. However, what happens if your database instance is
restarted – for whatever reason? Your PostgreSQL database performance will suffer
until your I/O caches have filled up again. This takes some time and it can
be pretty damaging to your query response times.
Fortunately, there are ways in PostgreSQL to fix the problem. pg_prewarm is a
module which allows you to automatically prewarm your caches after a database
failure or a simple restart. The pg_prewarm module is part of the PostgreSQL
contrib package and is usually available on your server by default.
There is no need to install additional third party software. PostgreSQL has all
you need by default.
Basically, pg_prewarm can be used in two ways:
- manually prewarming the caches for individual relations
- automatically prewarming the caches when the server starts
Let us take a look at both options and see how the module works in detail. In general automatic pre-warming is, in my judgement, the better way to preload caches – but in some cases, it can also make sense to just warm caches manually (usually for testing purposes).
Prewarming the cache manually is pretty simple. The following section explains how the process works in general.
The first thing to do is to enable the pg_prewarm extension in your database:
test=# CREATE EXTENSION pg_prewarm;
CREATE EXTENSION
To show how a table can be preloaded, I will first create a table and put it into
the cache:
test=# CREATE TABLE t_test AS SELECT * FROM generate_series(1, 1000000) AS id;
SELECT 1000000
test=# SELECT * FROM pg_prewarm('public.t_test');
 pg_prewarm
------------
       4425
(1 row)
All you have to do is to call the pg_prewarm function and pass the name of the
desired table to the function. In my example, 4425 pages have been read and put
into the cache.
4425 blocks translates to roughly 35 MB:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 35 MB
(1 row)
Calling pg_prewarm with one parameter is the easiest way to get started.
However, the module can do a lot more, as shown in the next listing:
test=# \x
Expanded display is on.
test=# \df *pg_prewarm*
List of functions
-[ RECORD 1 ]-------+---------------------------------------------
Schema              | public
Name                | pg_prewarm
Result data type    | bigint
Argument data types | regclass, mode text DEFAULT 'buffer'::text, fork text DEFAULT 'main'::text, first_block bigint DEFAULT NULL::bigint, last_block bigint DEFAULT NULL::bigint
Type                | func
In addition to passing the name of the object you want to cache to the function,
you can also tell PostgreSQL which part of the table you want to cache. The
“relation fork” defines whether you want the real data file, the VM (Visibility
Map) or the FSM (Free Space Map). Usually, caching the main table is just fine.
You can also tell PostgreSQL to cache individual blocks. While this is flexible,
it is usually not what you want to do manually.
In most cases, people might want pg_prewarm to take care of caching automatically
on startup. The way to achieve this is to add pg_prewarm to
shared_preload_libraries and to restart the database server. The following
example shows how to configure shared_preload_libraries in postgresql.conf:
shared_preload_libraries = 'pg_stat_statements, pg_prewarm'
After the server has restarted you should be able to see the “autoprewarm
master” process which is in charge of starting up things for you.
80259 ?  Ss  0:00 /usr/pgsql-11/bin/postmaster -D /var/lib/pgsql/11/data/
80260 ?  Ss  0:00  \_ postgres: logger
80262 ?  Ss  0:00  \_ postgres: checkpointer
80263 ?  Ss  0:00  \_ postgres: background writer
80264 ?  Ss  0:00  \_ postgres: walwriter
80265 ?  Ss  0:00  \_ postgres: autovacuum launcher
80266 ?  Ss  0:00  \_ postgres: stats collector
80267 ?  Ss  0:00  \_ postgres: autoprewarm master
80268 ?  Ss  0:00  \_ postgres: logical replication launcher
By default, pg_prewarm will store a list of blocks which are currently in memory
on disk. After a crash or a restart, pg_prewarm will automatically restore the
cache as it was when the file was last exported.
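pg_prewarm also exposes the dump interval as a configuration setting and lets you trigger a dump of the current block list by hand; a short sketch (the value shown is the default to the best of my knowledge):

-- How often the autoprewarm worker writes the list of cached blocks to disk
SHOW pg_prewarm.autoprewarm_interval;   -- 300s by default

-- Force an immediate dump of the current list of cached blocks
SELECT autoprewarm_dump_now();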
In general, pg_prewarm makes most sense if your database and your RAM are really
really large (XXX GB or more). In such cases, the difference between cached
and uncached data will be greatest and users will suffer the most if
performance is bad.
If you want to learn more about performance and database tuning in general,
consider checking out my post about how to track down slow or time consuming
queries. You might also be interested in taking a look at
http://pgconfigurator.cybertec.at, which is a free website to help you with
database configuration.
The post Prewarming PostgreSQL I/O caches appeared first on Cybertec.
PostGIS 3.0.0 is planned for release early next week. In the meantime you will find PostGIS 3.0.0rc1 or rc2 available via yum.postgresql.org, apt.postgresql.org, and EDB Windows 64-bit stackbuilder for PostgreSQL 12.
Continue reading "PostGIS 3.0.0 coming soon - Try 3.0.0rc2 at a package repo near you"The call for papers for Nordic PGDay 2020 in Helsinki, Finland, is now open. Submit your proposals for interesting talks about all things PostgreSQL, and join us in March.
Just like two years ago, the conference is held in cooperation with pgDay.paris which is held two days later. So if you are interested in both, you can submit the same proposal to both conferences at once!