Channel: Planet PostgreSQL

Regina Obe: PostGIS 3.0.0


The PostGIS development team is pleased to release PostGIS 3.0.0.

This release works with PostgreSQL 9.5-12 and GEOS >= 3.6.

If you are using the postgis_sfcgal extension, you need to compile against SFCGAL 1.3.1 or higher.

Best served with PostgreSQL 12, GEOS 3.8.0 and pgRouting 3.0.0-beta.

Continue reading by clicking the title hyperlink.

Pavel Stehule: precompiled libraries for orafce 3.8 for PostgreSQL 10, 11 and 12

I uploaded precompiled libraries to postgres.cz/files/orafce-3.8.0-x64.zip. These libraries are 64-bit only (there is no 32-bit build for Postgres 11 and 12). For 32-bit or older PostgreSQL releases, please use older orafce builds. The Windows build is partially reduced - there is no support for PLVlex for PostgreSQL 10 and 11 (due to compilation problems), and there is no support for utl_file (due to a crash in these functions - I am not able to fix it on the MSWIN platform).

Installation: a) install the Visual C++ Redistributable for Visual Studio 2015, b) copy *.sql and *.control to ../PostgreSQL/version/share and copy the *.dll (after renaming it to just orafce.dll) to ../PostgreSQL/version/lib.

Pavel Stehule: dll for plpgsql_check 1.7 are available for PostgreSQL 10, 11, and 12

I prepared DLL libraries. You can download them from plpgsql_check-1.7.6-x64-win.zip.

Installation

  1. Download, unzip and choose related dll file
  2. rename to plpgsql_check.dll and copy to PostgreSQL's lib directory (Program Files/PostgreSQL/10/lib)
  3. copy plpgsql_check-1.7.sql and plpgsql_check.control to PostgreSQL's share/extension directory (PostgreSQL/10/share/extension).
  4. with super user rights (user postgres) run command CREATE EXTENSION plpgsql_check;.
You may need to install the Microsoft Visual C++ 2015 SP1 Redistributable Package: https://www.microsoft.com/en-US/download/details.aspx?id=48145.
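
A quick way to verify the installation afterwards is to run the extension's main checking function against an existing PL/pgSQL function (f1() here is just a placeholder name):

SELECT * FROM plpgsql_check_function('f1()');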

Please, check it.

Dimitri Fontaine: Table of Content

Photo by Nicole Honeywill / Sincerely Media. Each part of The Art of PostgreSQL can be read on its own, or you can read this book from the first to the last page in the order of the parts and chapters therein. A great deal of thinking has been put into the ordering of the parts, so that reading “The Art of PostgreSQL” in a linear fashion should provide the best experience.

Laurenz Albe: Never lose a PostgreSQL transaction with pg_receivewal


pg_receivewal makes IBM envious
© Laurenz Albe 2019

 

“Durability”, the D of ACID, demands that a committed database transaction remains committed, no matter what. For normal outages like a power failure, this is guaranteed by the transaction log (WAL). However, if we want to guarantee durability even in the face of more catastrophic outages that destroy the WAL, we need more advanced methods.

This article discusses how to use pg_receivewal to maintain durability even under dire circumstances.

Archiving WAL with archive_command

The “traditional” method of archiving the transaction log is the archive_command in postgresql.conf. The DBA has to set this parameter to a command that archives a WAL segment after it is completed.

Popular methods include:

  • Use cp (or copy on Windows) to copy the file to network attached storage like NFS.
  • Call a command like scp or rsync to copy the file to a remote machine.
  • Call an executable from your favorite PostgreSQL backup software

The important thing to consider is that the archived WAL segment is stored somewhere else than the database.
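
As a minimal illustration (the target path is a placeholder; the test guard keeps the command from overwriting an existing archive), the relevant postgresql.conf settings could look like this:

archive_mode = on
archive_command = 'test ! -f /walarchive/%f && cp %p /walarchive/%f'

Remember that archive_command must return zero only if the segment was archived successfully; otherwise PostgreSQL keeps the WAL segment and retries.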

I have a redundant distributed storage system, do I still need to store WAL archives somewhere else?

Yes, because there is still a single point of failure: the file system.
If the file system becomes corrupted through a hardware or software problem, all the redundant distributed copies of your WAL archive can vanish or get corrupted.

If you believe that this is so unlikely that it borders on the paranoid: I have seen it happen.
A certain level of professional paranoia is a virtue in a DBA.

When archive_command isn’t good enough

If your database server gets destroyed so that its disks are no longer available, you will still lose some committed transactions: the transactions in the currently active WAL segment. Remember that PostgreSQL normally archives a WAL segment only once it is full. So up to 16MB worth of committed transactions can vanish with the active WAL segment.

To reduce the impact, you can set archive_timeout: that will set the maximum time between WAL archivals. But for some applications, that just isn’t good enough: If you cannot afford to lose a single transaction even in the event of a catastrophe, WAL archiving just won’t do the trick.

pg_receivewal comes to the rescue

PostgreSQL 9.2 introduced pg_receivexlog, which has been renamed to pg_receivewal in v10. This client program opens a replication connection to PostgreSQL and streams WAL, just like streaming replication does. But instead of applying the information to a standby server, it writes it to disk. This way, it creates a copy of the WAL segments in real time. The partial WAL segment that pg_receivewal is currently writing has the extension .partial to distinguish it from completed WAL archives. Once the segment is complete, pg_receivewal will rename it.

pg_receivewal is an alternative to WAL archiving that avoids the gap between the current and the archived WAL location. It is a bit more complicated to manage and monitor, because it is a separate process and should run on a different machine.
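
A minimal sketch of how this could be run (host, user and directory are placeholders); using a replication slot ensures the primary keeps WAL around while pg_receivewal is down:

pg_receivewal --slot=wal_archiver --create-slot -h primary.example.com -U replication_user
pg_receivewal --slot=wal_archiver -D /walarchive -h primary.example.com -U replication_user

The first command only creates the replication slot and exits; the second one streams WAL into /walarchive and would typically be supervised by systemd or a similar tool so that it restarts automatically.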

pg_receivewal and synchronous replication

By default, replication is asynchronous, so pg_receivewal can still lose a split second’s worth of committed transactions in the case of a crash. If you cannot even afford that, you can switch to synchronous replication (see the configuration sketch after the list below). That guarantees that not a single committed transaction can get lost, but it comes at a price:

  • Since every commit requires a round trip to pg_receivewal, it will take significantly longer. This has an impact on the number of writing transactions your system can support.
    Keep the network latency low!
  • If you have only a single synchronous standby server (pg_receivewal acts as a standby), the availability of your system is reduced. This is because PostgreSQL won’t commit any more transactions if your only standby is unavailable.
    To avoid that problem, you need at least two synchronous pg_receivewal processes.
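
A minimal configuration sketch for this (the application_name, host, user and paths are placeholders): on the primary, list the receiver in synchronous_standby_names; then start pg_receivewal with --synchronous so that it flushes and reports every WAL chunk immediately:

synchronous_standby_names = 'walreceiver1'

pg_receivewal -D /walarchive --slot=wal_archiver --synchronous -d "host=primary.example.com user=replication_user application_name=walreceiver1"

Note that synchronous_commit must not be set to remote_apply in this case, since pg_receivewal never applies WAL, so such commits would wait forever.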

Archive recovery and partial WAL segments

Now if the worst has happened and you need to recover, you’ll have to make sure to restore the partial WAL segments as well. In the simple case where you archive to an NFS mount, the restore_command could be as simple as this:

restore_command = 'cp /walarchive/%f %p || cp /walarchive/%f.partial %p'

Conclusion

With careful design and a little effort, you can set up a PostgreSQL system that can never lose a single committed transaction even under the most dire circumstances. Integrate this with a high availability setup for maximum data protection.

The post Never lose a PostgreSQL transaction with pg_receivewal appeared first on Cybertec.

Vik Fearing: pgDay Paris 2020 - Call for Papers Open

Vik Fearing: pgDay Paris 2020 - Registration Open


Registration for pgDay Paris 2020 is now open.

We have some new ticket types this year, including a very cheap BLIND ticket that is only sold until the schedule is published. If you know you are coming to the conference no matter what, hurry up and grab one of these!

We also have cheap tickets this year for students and the unemployed. Valid proof is required.

https://2020.pgday.paris/registration/

Avinash Kumar: Seamless Application Failover using libpq Features in PostgreSQL

When you build replication in PostgreSQL using streaming replication, you cannot perform writes to a standby node; only reads. This way, you could offload reads or reporting queries to standby servers and send writes to the master. Additionally, starting from PostgreSQL 10, libpq and psql clients can probe the connection for a master and automatically direct connections to a master for read-write, or to any node for read-only connections.

For example, consider three database nodes – Server_A, Server_B, and Server_C – in replication using streaming replication, with Server_A being the Master/Primary node. You could specify all three servers in a connection string and request the connection to be redirected to a read-write node only, which is Server_A in this scenario. If a failover or a switchover promotes Server_B, the read-write connections will be automatically redirected to Server_B. To understand this in detail, let us see a simple scenario in action.

I have set up a three-node replication cluster using streaming replication with the following roles.

192.168.70.10 is the master
192.168.70.20 is the first standby
192.168.70.30 is the second standby

$psql -h 192.168.70.10
Password for user postgres:
psql (11.5)
Type "help" for help.

postgres=# select inet_server_addr() as "connected_to";
connected_to
---------------
192.168.70.10
(1 row)

postgres=# select client_addr, write_lag,flush_lag,replay_lag from pg_stat_replication;
client_addr | write_lag | flush_lag | replay_lag
---------------+-----------------+-----------------+-----------------
192.168.70.20 | 00:00:00.058204 | 00:00:00.058617 | 00:00:00.058619
192.168.70.30 | 00:00:00.03639 | 00:00:00.036689 | 00:00:00.036693
(2 rows)

Now, let us use psql with all three IPs specified in the connection string. We would, however, use target_session_attrs this time to connect to a master node.

Connecting to Master Using Read-Write Mode

$ psql 'postgres://192.168.70.20:5432,192.168.70.10:5432,192.168.70.30:5432/postgres?target_session_attrs=read-write' -c "select inet_server_addr()"
Password for user postgres:
inet_server_addr
------------------
192.168.70.10
(1 row)

Connecting to any Server for Reads

Please note that the first server in the list is the one connected to automatically when you use target_session_attrs set to any.
$ psql 'postgres://192.168.70.20:5432,192.168.70.10:5432,192.168.70.30:5432/postgres?target_session_attrs=any' -c "select inet_server_addr()"
inet_server_addr
------------------
192.168.70.20
(1 row)

Or

$ psql 'postgres://192.168.70.10:5432,192.168.70.20:5432,192.168.70.30:5432/postgres?target_session_attrs=any' -c "select inet_server_addr()"
inet_server_addr
------------------
192.168.70.10
(1 row)

If the first server in the list is not reachable, the driver tries to connect to the next server in the list for reads. So, a read connection would never fail as long as you have multiple standbys and at least one of the database nodes is reachable while using target_session_attrs set to "any".

-- On Server : 192.168.70.10

$ pg_ctl -D $PGDATA stop -mf
waiting for server to shut down.... done
server stopped
[postgres@pg1]$ psql 'postgres://192.168.70.10:5432,192.168.70.20:5432,192.168.70.30:5432/postgres?target_session_attrs=any' -c "select inet_server_addr()"
inet_server_addr
------------------
192.168.70.20
(1 row)

An important point to note is that the driver might take additional time connecting to each node in the list to determine whether it is a master. Let’s say that the server 192.168.70.10 is no longer a master and 192.168.70.20 (second in the list of servers in the connection string) is the new master accepting writes. When you specify that the connections should go to a read-write node, the driver checks whether the first server in the list accepts writes and then connects to the second server. If the first server is not reachable, you may experience further delay. However, this is still a seamless failover, as you do not have to disturb the application during this switchover.

Let us say that you use Python or PHP to connect to PostgreSQL. As the application interfaces for Python, PHP, and several other programming languages use libpq as the underlying engine, you could use multiple IPs in the connection string and request that the connections be redirected to a read-write or any node.

Below is an example to achieve this with Python. I have written a simple Python script and specified target_session_attrs as "read-write" while passing multiple IPs to the host parameter. Now, when I execute the script, it confirms the IP connected to (192.168.70.10 is the master here) and shows that the server is not in recovery mode.
$ cat pg_conn.py
import psycopg2
conn = psycopg2.connect(database="postgres",host="192.168.70.10,192.168.70.20,192.168.70.30", user="postgres", password="secret", port="5432", target_session_attrs="read-write")
cur = conn.cursor()
cur.execute("select pg_is_in_recovery(), inet_server_addr()")
row = cur.fetchone()
print "recovery =",row[0]
print "server =",row[1]

$ python pg_conn.py
recovery = False
server = 192.168.70.10

I could similarly use PHP to connect to postgres and specify that the connections should only be directed to a master node as seen in the following example.

# cat pg_conn.php
<?php
$conn = pg_connect("host=192.168.70.10,192.168.70.20,192.168.70.30 port=5432 dbname=postgres user=postgres password=secret target_session_attrs=read-write") or die("Could not connect");
$status = pg_connection_status($conn);
if ($status === PGSQL_CONNECTION_OK) {
print "Connection status ok\n";
} else {
print "Connection status bad\n";
}
$sql = pg_query($conn, "select pg_is_in_recovery()");
while ($row = pg_fetch_row($sql)) {
echo "Recovery-status: $row[0]\n";
}
?>

$ php -f pg_conn.php
Connection status ok
Recovery-status: f
Server: 192.168.70.10

An important point to note is that the clients are able to achieve this because they are using a libpq that belongs to PG10 or later.
# yum info python2-psycopg2-2.8.3-2.rhel7.x86_64 | grep repo
From repo : pgdg11

# rpm -q --requires python2-psycopg2-2.8.3-2.rhel7.x86_64 | grep libpq
libpq.so.5()(64bit)

# rpm -q --requires php-pgsql-5.4.16-46.el7 | grep libpq
libpq.so.5()(64bit)

# locate libpq.so.5
/usr/pgsql-11/lib/libpq.so.5

We have discussed that you might expect some slowness due to multiple hops while connecting to an appropriate master server, but this approach still helps for a seamless application failover. And we have discussed the built-in mechanism available with Community PostgreSQL by default. In the next blog post, Jobin Augustine will be talking about using HAProxy (Open Source) for achieving a much more robust and reliable way to perform a seamless application failover with PostgreSQL.


Regina Obe: PostgreSQL 12 64-bit for Windows FDWs

Michael Banck: pg_checksums 1.0 released

Version 1.0 of pg_checksums has been released. pg_checksums verifies, activates or deactivates data checksums in PostgreSQL instances. It is based on the pg_checksums utility in PostgreSQL 12, with the following additions: 1. Online verification of checksums The pg_checksums utility in PostgreSQL...
Michael Banck

cary huang: Vancouver Postgres Group Meetup Event – Kubernetes Best Practices for Distributed SQL databases


Date: October 24, 2019

Guest Speaker: Andrew Nelson, Developer Advocate from YugaByte

About Vancouver Postgres User Meetup Group

Vancouver Postgres is a Postgres user meetup group based in Vancouver, Canada. It specializes in connecting Postgres users with the related ecosystem, including but not limited to technologies such as RDS Postgres, Aurora for Postgres, Google Postgres, PostgreSQL.Org Postgres, Greenplum, Timescale and ZomboDB.

User Group Home Page:https://www.meetup.com/Vancouver-Postgres/

Guest Speaker: Andrew Nelson

We are pleased to announce Andrew Nelson from YugaByte as guest speaker; he will share his extensive knowledge of distributed SQL databases and Kubernetes deployment with the local Vancouver meetup group.

Andrew giving his A game in the presentation

Andrew has over 20 years of technical experience in the fields of cloud computing, enterprise storage, virtualization, disaster recovery and big data, and has worked for several large companies such as Nutanix and VMware.

Andrew recently joined YugaByte as a Developer Advocate with a strong focus on the usability and extensibility of YugaByte DB as a data platform within the Kubernetes and public cloud ecosystem.

About the Presentation

Here in Vancouver, Andrew shared the four important stages of deploying distributed databases with Kubernetes, with great emphasis on the Design stage:

  • Design
  • Release Management
  • Operations
  • Monitoring

Andrew did a fantastic job of delivering this technical presentation in a fun and interesting way, using real-life references to illustrate the four important stages.

He used bricks and mortar to emphasize the importance of foundational work, the Monadnock Building in Chicago, built solely of bricks, to illustrate the importance of good management and operation, and finally the Empire State Building in New York, a skyscraper built of bricks supported by steel frames, to illustrate the importance of support and monitoring.

Overall, it was a very interesting meetup event.

A Productive Meetup Event

A multi-disciplined software developer specialised in C/C++ Software development, network security, embedded software, firewall, and IT infrastructure

Jonathan Katz: Monitoring PostgreSQL Clusters in Kubernetes


The open source PostgreSQL Operator provides many features that are required to run a production database-as-a-service on Kubernetes, including provisioning PostgreSQL clusters, performing backup and restore operations, and managing high-availability runtime environments.

This time, we are going to look at a very important part of managing PostgreSQL in production, namely, how to monitor and visualize the health of PostgreSQL databases. After a quick refresher on how to install the PostgreSQL Operator using Ansible, this article will show you how to configure and deploy a PostgreSQL cluster with full monitoring capabilities using pgMonitor in a Kubernetes environment!

 

 

 

Background on pgMonitor

pgMonitor is an open source project that combines a suite of tools to quickly stand up a monitoring environment that helps you to visualize what is occurring within your PostgreSQL clusters. It includes the open source time-series database Prometheus and the open source charting and dashboard visualization tool Grafana. Combined with several data exporters, pgMonitor facilitates the collection and visualization of important metrics (e.g. system resource utilization, database size, transaction wraparound, bloat, etc.) that you need to be aware of in your PostgreSQL environment.

Pavel Stehule: watch mode for pspg

I released version 2.5.0 of pspg.

The major feature of this release is the possibility to bypass psql and take data from Postgres directly (options: -q, --query). Now pspg can be used as a very simple Postgres client. The main benefit of this feature is the new watch mode: the result of the query can be refreshed every n seconds (option: -w). This allows pspg to be used for simple live data presentations and simple monitoring. The refreshing can be paused/restarted by pressing space.

Beena Emerson: Benchmark Partition Table - 1

With the addition of declarative partitioning in PostgreSQL 10, it only made sense to extend the existing pgbench benchmarking module to create partitioned tables. A recent commit of a patch by Fabien Coelho for PostgreSQL 13 has made this possible.
The pgbench_accounts table can now be partitioned with the --partitions and --partition-method options, which specify the number of partitions and the partitioning method respectively when we initialize the database.

pgbench -i --partitions <integer> [--partition-method <method>]

partitions : This must be a positive integer value
partition-method : Currently only range and hash are supported and the default is range.

pgbench will throw an error if the --partition-method is specified without a valid --partitions option.

For range partitions, the given range is equally split into the specified partitions. The lower bound of the first partition is MINVALUE and the upper bound of the last partition is MAXVALUE. For hash partitions, the number of partitions specified is used in the modulo operation.
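
For example, an initialization similar to the ones used for the tests below (scale 5000, here with 100 hash partitions; the database name is illustrative) would look like:

pgbench -i --scale=5000 --partitions=100 --partition-method=hash bench_db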

Test Partitions

I performed a few tests using the new partition options with the following settings:
  • pgbench scale = 5000 (~63GB data + 10GB indexes)
  • pgbench thread/client count = 32
  • shared_buffers = 32GB
  • min_wal_size = 15GB
  • max_wal_size = 20GB
  • checkpoint_timeout=900
  • maintenance_work_mem=1GB
  • checkpoint_completion_target=0.9
  • synchronous_commit=on

The hardware specification of the machine on which the benchmarking was performed is as follows:
  • IBM POWER8 Server
  • Red Hat Enterprise Linux Server release 7.1 (Maipo) (with kernel Linux 3.10.0-229.14.1.ael7b.ppc64le)
  • 491GB RAM
  • IBM,8286-42A CPUs (24 cores, 192 with HT)

Two different types of queries were tested:
  1. Read-only default query: It was run using the existing -S option of pgbench.
  2. Range query: The following custom query, which searches for a range that is 0.002% of the total rows, was used (a sketch of the corresponding pgbench invocation follows the script).
\set v1 random(1, 100000 * :scale)
\set v2 :v1 + 1000000
BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid BETWEEN :v1 AND :v2;
END;
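
Assuming the script above is saved as range_query.sql, a run with the thread/client count used in these tests could then be launched roughly like this (the duration is illustrative):

pgbench -n -c 32 -j 32 -T 300 -f range_query.sql bench_db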

Tests were run for both range and hash partition types. The following table shows the median of three tps readings taken and the tps increase in percentage when compared to the non-partitioned table. 


Non-partitioned baseline: 323331.60 tps (read-only default query) and 35.36 tps (range query).

partitions | read-only, range: tps (increase) | read-only, hash: tps (increase) | range query, range: tps (increase) | range query, hash: tps (increase)
100        | 201648.82 (-37.63 %)             | 208805.45 (-35.42 %)            | 36.92 (4.40 %)                     | 35.31 (-0.16 %)
200        | 189642.09 (-41.35 %)             | 199718.17 (-38.23 %)            | 37.63 (6.42 %)                     | 34.34 (-2.90 %)
300        | 191242.31 (-40.85 %)             | 203182.88 (-37.16 %)            | 38.33 (8.38 %)                     | 34.01 (-3.82 %)
400        | 186329.88 (-42.37 %)             | 189118.42 (-41.51 %)            | 49.43 (39.78 %)                    | 34.86 (-1.44 %)
500        | 189727.31 (-41.32 %)             | 195470.47 (-39.54 %)            | 48.39 (36.83 %)                    | 33.19 (-6.13 %)
600        | 185143.62 (-42.74 %)             | 191237.48 (-40.85 %)            | 45.42 (28.44 %)                    | 32.42 (-8.32 %)
700        | 179190.37 (-44.58 %)             | 178999.73 (-44.64 %)            | 42.18 (19.29 %)                    | 32.57 (-7.91 %)
800        | 170432.79 (-47.29 %)             | 173027.42 (-46.49 %)            | 45.82 (29.57 %)                    | 31.38 (-11.28 %)

Read-only Default Query
In this type of OLTP point query, we are selecting only one row. Internally, an index scan is performed on the pgbench_accounts_pkey for the value being queried. In the non-partitioned case, the index scan is performed on the only index present. However, for the partitioned case, the partition details are collected and then partition pruning is carried out before performing an index scan on the selected partition. 

As seen on the graph, the different types of partitions do not show much change in behavior because we would be targeting only one row in one particular partition. This drop in performance for the partitioned case can be attributed to the overhead of handling a large number of partitions. The performance is seen to slowly degrade as the number of partitions is increased.

Range Custom Query

In this type of query, one million rows which are about 0.002% of the total entries are targeted in sequence. In the non-partitioned case, the singular primary key is searched for all of the given range. As in the previous case, for the partitioned table, partitioning pruning is attempted before the index scan is performed on the smaller indexes of the selected partitions. 

Given the way the different partition types sort out the rows, the given range being queried will only be divided amongst at most two partitions in the range type but it would be scattered across all the partitions for hash type. As expected the range type fares much better in this scenario given the narrowed search being performed. The hash type performs worse as it is practically doing a full index search, like in the non-partitioned case, along with bearing the overhead of partition handling.
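
A quick way to see this difference yourself is to look at the plan for the range query on the partitioned table: with range partitioning only one or two partitions should show up, while with hash partitioning essentially all of them do (the bounds below are illustrative):

EXPLAIN (COSTS OFF)
SELECT abalance FROM pgbench_accounts WHERE aid BETWEEN 1 AND 1000000;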

We can discern that range partitioned tables are very beneficial when the majority of the queries are range queries. We have not seen any benefit for hash partitions in these tests but they are expected to fare better in certain scenarios involving sequential scans. We can conclude that the partition type and other partition parameters should be set only after thorough analysis as the incorrect implementation of partition can tremendously decrease the overall performance.  

I want to extend a huge thank you to all those who have contributed to this much-needed feature which makes it possible to benchmark partitioned tables - Fabien Coelho, Amit Kapila, Amit Langote, Dilip Kumar, Asif Rehman, and Alvaro Herrera.

  
---
This blog is also published on postgresrocks.

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 13 – pgbench: add --partitions and --partition-method options.


Craig Kerstiens: Interesting Upcoming pgDays


I’ve been to a lot of conferences over the years: PgConf EU, PostgresOpen, too many pgDays to count, and even more non-Postgres conferences (OSCON, Strangeloop, Railsconf, PyCon, LessConf, and many more). I’ve always found Postgres conferences one of the best places to get training and learn about what’s new with Postgres (in addition to Dimitri’s recent book, more on that below). They’re my regular stop to catch up on all the new features of a release before it comes out, and often there is a talk highlighting what is new with a simple, easy-to-understand summary once released.

I just got back from PGConf EU a little over a week ago and it was a great time. I’m sure we’ll see some rundowns of it start appearing on Planet PostgreSQL. But as far as I’m concerned, PGConf EU is in the past (unless you’re counting next year, which is in Berlin, in which case I’ll see you there). For me it’s time to look to the future, and there are a number of upcoming pgDays I’m looking forward to.

The first two I want to highlight are separate events, but you’ll notice they’re scheduled nicely for you to easily attend both. With a day in between for travel, you’ll find that many speakers and attendees depart one and head straight to the other. It makes for an easy opportunity to visit two cities and see two different communities, yet not have to spend too much time traveling. The first is Nordic pgDay in Helsinki, coming up on March 24. The second is pgDay Paris on March 26. Both of these are great single-track conferences. If you’re in Europe or fancy a trip to Europe, I recommend giving them a look, and even better, the CFPs are open, so consider submitting.

Another pgDay I have to mention is right in my backyard: pgDay SF. I’m particularly excited about it for a few reasons:

  • San Francisco is very much a central tech hub, which means a great chance of learning from folks at many many interesting tech companies in attendance
  • Just like Nordic and Paris this is a single track conference, which I’m a personal fan of because you can have continuity between talks and shared conversation with other attendees
  • The venue! If you’re not from the Bay Area you may not be aware, but the Swedish American Hall is a well-known music venue within SF. It has hosted many famous artists over the years, and now pgDay SF joins the ranks.

This isn’t an exhaustive list of course, just a few on my personal list that I hope to make it to. If you’re there and see me make sure to say hi!

If you’re looking for a deeper resource on Postgres I recommend the book The Art of PostgreSQL. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you’ll receive 15% off as well.

Avinash Kumar: Monitoring PostgreSQL Databases Using PMM

$
0
0
Monitoring PostgreSQL with Percona Monitoring Management

PostgreSQL is a widely-used open source database and has been ranked #1 for the past two years in the DB-Engines rankings. As such, there is always a need for reliable and robust monitoring solutions. While there are some commercial monitoring tools, there is an equally good number of open source tools available for monitoring PostgreSQL. Percona Monitoring and Management (PMM) is one of those open source solutions that is continuously improved and maintained by Percona. It is simple to set up and easy to use.

PMM can monitor not only PostgreSQL but also MySQL and MongoDB databases, so it is a simple monitoring solution for monitoring multiple types of databases. In this blog post, you will see all the steps involved in monitoring PostgreSQL databases using PMM.

This is what we will be discussing:

  1. Using the PMM docker image to create a PMM server.
  2. Installing PMM client on a Remote PostgreSQL server and connecting the PostgreSQL Client to PMM Server.
  3. Creating required users and permissions on the PostgreSQL server.
  4. Enabling PostgreSQL Monitoring with and without QAN (Query Analytics)

If you already know how to create a PMM Server, please skip the PMM server setup and proceed to the PostgreSQL client setup.

Using the PMM docker image to create a PMM server

PMM is a client-server architecture where clients are the PostgreSQL, MySQL, or MongoDB databases and the server is the PMM Server. We see a list of metrics on the Grafana dashboard by connecting to the PMM server on the UI. In order to demonstrate this setup, I have created 2 virtual machines where one of them is the PMM Server and the second server is the PostgreSQL database server.

192.168.80.10 is my PMM-Server
192.168.80.20 is my PG 11 Server

Step 1 : 

On the PMM Server, install and start docker.

# yum install docker -y
# systemctl start docker

Here are the installation instructions of PMM Server.

Step 2 :

Pull the pmm-server docker image. I am using the latest PMM2 docker image for this setup.

$ docker pull percona/pmm-server:2

You see a docker image of size 1.48 GB downloaded after the above step.

$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/percona/pmm-server 2 cd30e7343bb1 2 weeks ago 1.48 GB

Step 3 :

Create a container for persistent PMM data.

$ docker create \
-v /srv \
--name pmm-data \
percona/pmm-server:2 /bin/true

Step 4 :

Create and launch the PMM Server. In the following step, you can see that we are binding the port 80 of the container to the port 80 of the host machine. Likewise for port 443.

$ docker run -d \
-p 80:80 \
-p 443:443 \
--volumes-from pmm-data \
--name pmm-server \
--restart always \
percona/pmm-server:2

At this stage, you can modify certain settings such as the memory you wish to allocate to the container or the CPU share, etc. You can also see more such configurable options using docker run --help. The following is just an example of how you can modify the above step with some memory or CPU allocations.
$ docker run -d \
-p 80:80 \
-p 443:443 \
--volumes-from pmm-data \
--name pmm-server \
--cpu-shares 100 \
--memory 1024m \
--restart always \
percona/pmm-server:2

You can list the started containers for validation using docker ps.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bb6043082d3b percona/pmm-server:2 "/opt/entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp pmm-server

Step 5 : 

You can now see the PMM Server Dashboard in the browser using the host IP address. For my setup, the PMM Server's IP address is 192.168.80.10. As soon as you put the IP in the browser, you will be asked to enter the credentials, as seen in the image below. The default user and password are both: admin
create a PMM server

And then you will be asked to change the password or skip.

PMM Server setup is completed after this step.

Installing PMM client on a Remote PostgreSQL server

I have a PostgreSQL 11.5 server running on 192.168.80.20. The following steps demonstrate how we can install and configure the PMM client to enable monitoring from the PMM server (192.168.80.10).

Before you proceed further, you must ensure that ports 80 and 443 are both enabled on the PMM server for the PG 11 Server to connect. In order to test that, I have used telnet to validate whether ports 80 and 443 are open on the PMM Server for the pg11 server.

[root@pg11]$ hostname -I
192.168.80.20

[root@pg11]$ telnet 192.168.80.10 80
Trying 192.168.80.10...
Connected to 192.168.80.10.
Escape character is '^]'.

[root@pg11]$ telnet 192.168.80.10 443
Trying 192.168.80.10...
Connected to 192.168.80.10.
Escape character is '^]'.

Step 6 :

There are very few steps you need to perform on the PostgreSQL server to enable it as a client for the PMM server. The first step is to install the PMM client on the PostgreSQL database server as follows. Based on the current PMM release, I am installing pmm2-client today. But this may change once we have a new PMM release.
$ sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
$ sudo yum install pmm2-client -y

Step 7 :

The next step is to connect the client (PostgreSQL server) to the PMM server. We could use pmm-admin config in order to achieve that. The following is a simple syntax that you could use in general.
$ pmm-admin config [<flags>] [<node-address>] [<node-type>] [<node-name>]

The following are the flags and other options I could use with my setup.

flags        : --server-insecure-tls
               --server-url=https://admin:admin@192.168.80.10:443
               (--server-url should contain the PMM Server Host information)

node-address : 192.168.80.20
               (My PostgreSQL Server)

node-type    : generic
               (As I am running my PostgreSQL database on a Virtual Machine but not on a Container, it is generic.)

node-name    : pg-client
               (Can be any nodename you could use to uniquely identify this database server on your PMM Server Dashboard)

So the final syntax for my setup looks like the below. We can run this command as root or by using the sudo command.

Syntax : 7a

$ pmm-admin config --server-insecure-tls --server-url=https://admin:admin@192.168.80.10:443 192.168.80.20 generic pg-client

$ pmm-admin config --server-insecure-tls --server-url=https://admin:admin@192.168.80.10:443 192.168.80.20 generic pg-client
Checking local pmm-agent status...
pmm-agent is running.
Registering pmm-agent on PMM Server...
Registered.
Configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml updated.
Reloading pmm-agent configuration...
Configuration reloaded.
Checking local pmm-agent status...
pmm-agent is running.

Syntax : 7b

You could also use a simple syntax such as the following, without node-address, node-type, node-name:
$ pmm-admin config --server-insecure-tls --server-url=https://admin:admin@192.168.80.10:443

But when you use such a simple syntax as above, node-address, node-type, and node-name are defaulted to certain values. If the defaults are incorrect for your server configuration, it is better to pass these details explicitly, as I have done in syntax 7a. In order to validate whether the defaults are correct, you can simply use # pmm-admin config --help. In the following log, you see that node-address defaults to 10.0.2.15, which is incorrect for my setup. It should be 192.168.80.20.
# pmm-admin config --help
usage: pmm-admin config [<flags>] [<node-address>] [<node-type>] [<node-name>]

Configure local pmm-agent

Flags:
  -h, --help                   Show context-sensitive help (also try --help-long and --help-man)
      --version                Show application version
...
...
...
Args:
  [<node-address>]  Node address (autodetected default: 10.0.2.15)

Below is an example where the default settings were perfect because I had configured my database server the right way.

# pmm-admin config --help
usage: pmm-admin config [<flags>] [<node-address>] [<node-type>] [<node-name>]

Configure local pmm-agent

Flags:
  -h, --help                   Show context-sensitive help (also try --help-long and --help-man)
...
...
Args:
  [<node-address>]  Node address (autodetected default: 192.168.80.20)
  [<node-type>]     Node type, one of: generic, container (default: generic)
  [<node-name>]     Node name (autodetected default: pg-client)

Using steps 6 and 7a, I have finished installing the PMM client on the PostgreSQL server and also connected it to the PMM Server. If the above steps are successful, you should see the client listed under Nodes, as seen in the following image. Else, something went wrong.

Creating required users and permissions on the PostgreSQL server

In order to monitor your PostgreSQL server using PMM, you need to create a user through which the PMM agent can collect database stats. Starting from PostgreSQL 10, you do not need to grant SUPERUSER or use SECURITY DEFINER (to avoid granting SUPERUSER); you can simply grant the pg_monitor role to the monitoring user. In my next blog post, you will see how we could use SECURITY DEFINER to avoid granting SUPERUSER for monitoring PostgreSQL databases with 9.6 or older.

Assuming that your PostgreSQL Version is 10 or higher, you can use the following steps.

Step 1 : 

Create a postgres user that can be used for monitoring. You could choose any username; pmm_user in the following command is just an example.
$ psql -c "CREATE USER pmm_user WITH ENCRYPTED PASSWORD 'secret'"

Step 2 : 

Grant the pg_monitor role to the pmm_user.
$ psql -c "GRANT pg_monitor to pmm_user"

Step 3 : 

If you are not using localhost but the IP address of the PostgreSQL server while enabling monitoring in the next steps, you should make sure to add appropriate entries to the pg_hba.conf file to enable connections from that IP for the pmm_user.
$ echo "host    all             pmm_user        192.168.80.20/32        md5" >> $PGDATA/pg_hba.conf
$ psql -c "select pg_reload_conf()"

In the above step, replace 192.168.80.20 with the appropriate PostgreSQL server's IP address.

Step 4 : 

Validate whether you are able to connect as pmm_user to the postgres database from the postgres server itself.
# psql -h 192.168.80.20 -p 5432 -U pmm_user -d postgres
Password for user pmm_user: 
psql (11.5)
Type "help" for help.

postgres=>

Enabling PostgreSQL Monitoring with and without QAN (Query Analytics)

Using PMM, we can monitor several metrics in PostgreSQL such as database connections, locks, checkpoint stats, transactions, temp usage, etc. However, you could additionally enable Query Analytics to look at the query performance and understand the queries that need some tuning. Let us see how we can simply enable PostgreSQL monitoring with and without QAN.

Without QAN

Step 1 :

In order to start monitoring PostgreSQL, we could simply use pmm-admin add postgresql. It accepts additional arguments such as the service name and the PostgreSQL address and port. As we are talking about enabling monitoring without QAN, we could use the flag --query-source=none to disable QAN.
# pmm-admin add postgresql --query-source=none --username=pmm_user --password=secret postgres 192.168.80.20:5432
PostgreSQL Service added.
Service ID  : /service_id/b2ca71cf-a2a4-48e3-9c5b-6ecd1a596aea
Service name: postgres

Step 2 :

Once you have enabled monitoring, you can validate it using pmm-admin list.
# pmm-admin list
Service type  Service name         Address and port  Service ID
PostgreSQL    postgres             192.168.80.20:5432 /service_id/b2ca71cf-a2a4-48e3-9c5b-6ecd1a596aea

Agent type                  Status     Agent ID                                        Service ID
pmm-agent                   connected  /agent_id/13fd2e0a-a01a-4ac2-909a-cae533eba72e  
node_exporter               running    /agent_id/f6ba099c-b7ba-43dd-a3b3-f9d65394976d  
postgres_exporter           running    /agent_id/1d046311-dad7-467e-b024-d2c8cb7f33c2  /service_id/b2ca71cf-a2a4-48e3-9c5b-6ecd1a596aea

You can now access the PostgreSQL Dashboards and see several metrics being monitored.

With QAN

With PMM2, there is an additional step needed to enable QAN. You should create a database with the same name as the monitoring user (pmm_user here), and then create the pg_stat_statements extension in that database. This behavior is going to change in the next release so that you can avoid creating the database.

Step 1 : 

Create the database with the same name as the monitoring user, and create the pg_stat_statements extension in that database.
$ psql -c "CREATE DATABASE pmm_user"
$ psql -d pmm_user -c "CREATE EXTENSION pg_stat_statements"

Step 2 : 

If shared_preload_libraries has not been set to pg_stat_statements, we need to set it and restart PostgreSQL.
$ psql -c "ALTER SYSTEM SET shared_preload_libraries TO 'pg_stat_statements'"
$ pg_ctl -D $PGDATA restart -mf
waiting for server to shut down.... done
server stopped
...
...
 done
server started

Step 3 :

In the previous steps, we used the flag --query-source=none to disable QAN. In order to enable QAN, you can just run pmm-admin add postgresql without that flag.
# pmm-admin add postgresql --username=pmm_user --password=secret postgres 192.168.80.20:5432
PostgreSQL Service added.
Service ID  : /service_id/24efa8b2-02c2-4a39-8543-d5fd54314f73
Service name: postgres

Step 4 : 

Once the above step is completed, you can validate it again using pmm-admin list. But this time, you should see an additional service: qan-postgresql-pgstatements-agent.
# pmm-admin list
Service type  Service name         Address and port  Service ID
PostgreSQL    postgres             192.168.80.20:5432 /service_id/24efa8b2-02c2-4a39-8543-d5fd54314f73

Agent type                  Status     Agent ID                                        Service ID
pmm-agent                   connected  /agent_id/13fd2e0a-a01a-4ac2-909a-cae533eba72e  
node_exporter               running    /agent_id/f6ba099c-b7ba-43dd-a3b3-f9d65394976d  
postgres_exporter           running    /agent_id/7039f7c4-1431-4518-9cbd-880c679513fb  /service_id/24efa8b2-02c2-4a39-8543-d5fd54314f73
qan-postgresql-pgstatements-agent running    /agent_id/7f0c2a30-6710-4191-9373-fec179726422  /service_id/24efa8b2-02c2-4a39-8543-d5fd54314f73

After this step, you can see the queries and their statistics captured on the Query Analytics Dashboard.

Meanwhile, have you tried Percona Distribution for PostgreSQL? It is a collection of finely-tested and implemented open source tools and extensions along with PostgreSQL 11, maintained by Percona. PMM works for both Community PostgreSQL and also the Percona Distribution for PostgreSQL. Please subscribe to our blog posts to learn more interesting features in PostgreSQL.

Kaarel Moppel: Upgrading Postgres major versions using Logical Replication


Some weeks ago, in the light of the PostgreSQL v12 release, I wrote a general overview of various major version upgrade methods and the benefits of upgrading in general – so if upgrading is a new thing for you, I’d recommend reading that posting first. This time I’m concentrating on the newest (available since v10) and the most complex upgrade method – called “Logical Replication”, or LR for short. For demonstration purposes I’ll be migrating from v10 to the freshly released v12, as this is probably the most likely scenario. But it should work the same way from v11 to v12. Do read on for details.

Benefits of LR upgrades

First a bit of a recap from the previous post on why you would use LR for upgrading at all. Well, in short – because it’s the safest option with the shortest possible downtime! With that last point I’m already sold… but here again is the list of “pros” / “cons”:

PROS

  • Minimal downtime required

After the initial setup burden one just needs to wait (and verify) that the new instance has all the data from the old one… and then just shut down the old instance and point applications to the new instance. Couldn’t be easier!

Also, before the switchover one can make sure that statistics are up to date, to minimize the typical “degraded performance” period seen after “pg_upgrade” for more complex queries (on bigger databases). For high-load applications one could be even more careful here and pull the most popular relations into shared buffers by using the (relatively unknown) “pg_prewarm” Contrib extension or by just running common SELECT-s in a loop, to counter the “cold cache” effect.

  • Flexible

One can for example already make some changes on the target DB – add columns / indexes, change datatypes, leave out some old archive tables, etc. The general idea is that LR does not work on the binary, 1-to-1 level as “pg_upgrade” does, but rather JSON-like data objects are sent over to another master / primary instance, providing quite some freedom on the details.

  • Safe

Before the final switchover you can abort the process at any time and re-try if something seems fishy. The old instance’s data is not changed in any way, even after the final switchover! Meaning you can easily roll back (typically at the cost of some data loss, though) to the old version if some unforeseen issues arise. One should only watch out for the replication slot on the source / publisher DB if the target server is just taken down suddenly.

CONS

  • Quite a few steps to take and possibly one needs to modify the schema a bit.
  • Always per DB.
  • Could take a long time for big databases.
  • Large objects, if in use (should be a thing of the past really), need to be exported / imported manually.

Preparing for LR

As LR has some prerequisites on the configuration and schema, you’d first need to see if it’s possible to start with the migration process at all or some changes are needed on the old master node, also called the “publisher” in LR context.

Action points:

1) Enable LR on the old master aka publisher aka source DB, if not done already. This means setting “wal_level” to “logical” in postgresql.conf and making sure that “replication” connections are allowed in “pg_hba.conf” from the new host (also called the “subscriber” in LR context). FYI – changing “wal_level” needs a server restart! To enable any kind of streaming replication some other params are needed as well, but they are actually already set accordingly out of the box as of v10, so it shouldn’t be a problem.

2) Check that all tables have a Primary Key (which is good database design anyways) or alternatively have REPLICA IDENTITY set. Primary Keys don’t need much explaining, probably, but what is this REPLICA IDENTITY thing? A bit simplified – it basically allows you to say which columns constitute uniqueness within a table, and PK-s are automatically counted as such.

3) If there’s no PK for a particular table, you should create one, if possible. If you can’t do that, set unique constraints / indexes to serve as REPLICA IDENTITY, if at all possible. If even that isn’t possible, you can set the whole row as REPLICA IDENTITY, a.k.a. REPLICA IDENTITY FULL, meaning all columns serve as PK’s in an LR context – with the price of very slow updates / deletes on the subscriber (new DB) side, meaning the whole process could take days or not even catch up, ever! It’s OK not to define a PK for a table, as long as it’s a write-only logging table that only gets inserts.

Sample code:


psql -c "ALTER SYSTEM SET wal_level TO logical;"
sudo systemctl restart postgresql@10-main

# find problematic tables (assuming we want to migrate everything "as is")
SELECT
    quote_ident(nspname) || '.' || quote_ident(relname) AS tbl
FROM
    pg_class c
    JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE
    relkind = 'r'
    AND NOT nspname LIKE ANY (ARRAY[E'pg\\_%', 'information_schema'])
    AND NOT relhaspkey
    AND NOT EXISTS (SELECT * FROM pg_index WHERE indrelid = c.oid
            AND indisunique AND indisvalid AND indisready AND indislive)
ORDER BY
    1;

# set replica identities on tables highlighted by the previous query
ALTER TABLE some_bigger_table REPLICA IDENTITY USING INDEX unique_idx ;
ALTER TABLE some_table_with_no_updates_deletes REPLICA IDENTITY FULL ;

Fresh setup of the new “subscriber” DB

Second most important step is to set up a new totally independent instance with a newer Postgres version – or at least create a new database on an existing instance with the latest major version. And as a side note – same version LR migrations are also possible, but you’d be solving some other problem in that case.

This step is actually very simple – just a standard install of PostgreSQL; no special steps are needed! One important addition: to make sure everything works exactly the same way as before for applications, the same encoding and collation should be used!

-- on old
SELECT pg_catalog.pg_encoding_to_char(d.encoding) AS "Encoding", d.datcollate as "Collate" FROM pg_database d WHERE datname = current_database();
-- on new
CREATE DATABASE appdb TEMPLATE template0 ENCODING 'UTF8' LC_COLLATE 'en_US.UTF-8';

NB! Before the final switchover it’s important that no normal users have access to the new DB – as they might alter table data or structures and thereby inadvertently produce replication conflicts that mostly mean starting from scratch (or a costly investigation / fix) as “replay” is a sequential process.

Schema / roles synchronization

Next we need to synchronize the old schema onto the new DB as Postgres does not take care of that automatically as of yet. The simplest way is to use the official PostgreSQL backup tool called “pg_dump”, but if you have your schema initialization scripts in Git or such and they’re up to date then this is fine also. For syncing roles “pg_dumpall” can be used.

NB! After this point it’s not recommended to introduce any changes to the schema or be at least very careful when doing it, e.g. creating new tables / columns first on the subscriber and refreshing the subscriptions when introducing new tables – otherwise data synchronization will break! Tip – a good way to disable unwanted schema changes is to use DDL triggers! An approximate example on that is here. Adding new tables only on the new DB is no issue though but during an upgrade not a good idea anyways – my recommendation is to first upgrade and then to evolve the schema.
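
For illustration, a minimal sketch of such a DDL trigger – an event trigger created on the subscriber after the schema has been loaded, and dropped again before the switchover (the trigger name, function name and tag list are just examples):

CREATE OR REPLACE FUNCTION forbid_ddl() RETURNS event_trigger AS $$
BEGIN
    RAISE EXCEPTION 'schema changes are disabled during the logical replication upgrade';
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER forbid_ddl_during_upgrade
    ON ddl_command_start
    WHEN TAG IN ('CREATE TABLE', 'ALTER TABLE', 'DROP TABLE')
    EXECUTE PROCEDURE forbid_ddl();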

pg_dumpall -h $old_instance --globals-only | psql -h $new_instance
pg_dump -h $old_instance --schema-only appdb | psql -h $new_instance appdb

Create a “publication” on the old DB

If the preparations on the old DB have been finished (all tables having PK-s or replica identities), then this is a one-liner:

CREATE PUBLICATION upgrade FOR ALL TABLES;

Here we added all tables (current ones and those added in the future) to a publication (a replication set) named “upgrade”, but technically we could also leave out some tables or choose to only replicate some operations like UPDATE-s – for a pure version upgrade, though, you typically want all of them.
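
As a sketch of what such a restricted setup could look like (the table names and the operation list are just examples, not part of this upgrade):

CREATE PUBLICATION upgrade_partial
    FOR TABLE public.accounts, public.orders
    WITH (publish = 'insert, update');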

NB! As of this moment the replication identities become important – and you might run into trouble on the old master if the identities are not in place on all tables that get changes! In such case you might see errors like that:

UPDATE pgbench_history SET delta = delta WHERE aid = 1;
ERROR:  cannot update table "pgbench_history" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

Create a “subscription” on the target DB

Next step – create a “subscription” on the new DB. This is also a one-liner that creates a logical replication slot on the old instance, pulls initial table snapshots and then starts to stream and apply all table changes as they happen on the source, resulting eventually in a mirrored dataset! Note that currently superuser rights are needed for creating the subscription, and superuser rights actually make life a lot easier on the publisher side as well.

CREATE SUBSCRIPTION upgrade_sub CONNECTION 'port=5432 user=postgres' PUBLICATION upgrade;
NOTICE:  created replication slot "upgrade_sub" on publisher
CREATE SUBSCRIPTION

WARNING! As of this step the 2 DB-s are “coupled” via a replication slot, carrying some dangers if the process is aborted abruptly and the old DB is not “notified” of that. If this sounds new please see the details from documentation.

Check replication progress

Depending on the amount of data it will take X minutes / days until everything is moved over and “live” synchronization is working.

Things to inspect for making sure there are no issues (a couple of sample queries follow the list):

  • No errors in server logs on both sides
  • There’s an active “pg_replication_slots” entry on the master with the name that we used to create the “subscription” on the new DB
  • All tables are actively replicating on the subscriber side, i.e. “pg_subscription_rel.srsubstate” should be ‘r’ for all tables (ready – normal replication)
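
A minimal sketch of such checks (slot and subscription names follow the examples above):

-- on the old DB (publisher): the slot created by the subscription should be active
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;

-- on the new DB (subscriber): 'r' means the table is in normal, ready replication
SELECT srrelid::regclass AS table_name, srsubstate FROM pg_subscription_rel;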

Basic data verification / switchover preparation

Although not a mandatory step, when it comes to data consistency / correctness, it always makes sense to go the extra mile and run some queries that validate that things (source – target) have the same data. For a running DB it’s of course a bit difficult as there’s always some replication lag but for “office hours” applications it should make a lot of sense. My sample script for comparing rowcounts (in a non-threaded way) is for example here but using some slightly more “costly” aggregation / hashing functions that really look at all the data would be even better there.

Also important to note if you’re using sequences (which you most probably are) – sequence state is not synchronized by LR and needs some manual work / scripting! The easiest option I think is that you leave the old DB ticking in read-only mode during switchover so that you can quickly access the last sequence values without touching the indexes for maximum ID-s on the subscriber side.

Switchover time!

We’re almost there with our little undertaking…with the sweaty part remaining – the actual switchover to start using the new DB! Needed steps are simple though and somewhat similar to switching over to a standard, “streaming replication” replica.

1) Re-check the system catalog views on replication status.
2) Stop the old instance. Make sure it’s a nice shutdown. The last logline should state “database system is shut down”, meaning all recent changes were delivered to connected replication clients, including our new DB. Start of downtime! PS Another alternative to make sure absolutely all data is received is to actually configure the new instance in “synchronous replication” mode! This has the usual synchronous replication implications of course so I’d avoid it for bigger / busier DBs.
3) Start the old DB in read-only mode by creating a recovery.conf file (from v12 this is achieved by declaring a “standby.signal” file)
4) Optionally make some more quick “health checks” if time constraints allow it – verify table sizes, row counts, your last transactions, etc. For “live” comparisons it makes sense to restart the old DB under a new, random port so that no-one else connects to it.
5) Synchronize the sequences. Given we’ll leave the old DB in read-only mode the easiest way is something like that:

psql -h $old_instance -XAtqc "SELECT \$\$select setval('\$\$ || quote_ident(schemaname)||\$\$.\$\$|| quote_ident(sequencename) || \$\$', \$\$ || last_value || \$\$); \$\$ AS sql FROM pg_sequences" appdb \
| psql -h $new_instance appdb

6) Reconfigure your pg_hba.conf to allow access for all “mortal” users, then reconfigure your application, connection pooler, DNS or proxy to start using the new DB! If the two DB-s were on the same machine then it’s even easier – just change the ports and restart. End of downtime!
7) Basically we’re done here, but it would of course be nice to clean up and remove the (no longer needed) subscription so as not to accumulate errors in the server log.

DROP SUBSCRIPTION upgrade_sub;

Note that if you won’t keep the old “publisher” accessible in read-only or normal primary mode (dangerous!) though, some extra steps are needed here before dropping:

ALTER SUBSCRIPTION  upgrade_sub DISABLE ;
ALTER SUBSCRIPTION  upgrade_sub SET (slot_name = NONE);
DROP SUBSCRIPTION upgrade_sub;

8) Time for some bubbly drinks

Summary

Although there are quite some steps and nuances involved, LR is worth adding to the standard upgrade toolbox for time-critical applications as it’s basically the best way to do major version upgrades nowadays – minimal dangers, minimal downtime!

FYI – if you’re planning to migrate dozens of DB-s the LR upgrade process can be fully automated! Even starting from version 9.4 actually, with the help of the “pglogical” extension. So feel free to contact us if you might need something like that and don’t particularly enjoy the details. Thanks for reading!

The post Upgrading Postgres major versions using Logical Replication appeared first on Cybertec.

Luca Ferrari: pgenv: adjust your PATH!


A few days ago we added an option that suggests changes to your PATH to prevent version clashes.

pgenv: adjust your PATH!

In the following you can find another quick video that demonstrates how easy it is to get, almost automatically, a PostgreSQL 12 instance up and running on your local machine using pgenv.

asciicast

Please note also that, at time 5:35, you will see how pgenv suggests adjusting your PATH environment variable in order to use the just-installed binaries for the cluster. The idea behind this suggestion is to prevent you from using a system-wide binary, e.g., psql, that has a possible incompatibility with the in-use cluster.
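
For instance, assuming the default pgenv layout, where the pgsql symlink points at the currently active build, the suggested adjustment boils down to something like:

export PATH="$HOME/.pgenv/pgsql/bin:$PATH"

Putting that line into your shell profile makes psql, pg_ctl and friends from the active cluster win over any system-wide installation.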

Luca Ferrari: Installing PostgreSQL on FreeBSD via Ansible


My very simple attempt at keeping PostgreSQL up-to-date on FreeBSD machines.

Installing PostgreSQL on FreeBSD via Ansible

I’m slowly moving to Ansible to manage my machines, and one problem I’m trying to solve as well as I can is how to keep PostgreSQL up-to-date.
In the case of FreeBSD machines, pkgng is the module to use, and in the past I used this very simple playbook snippet:

- name: PostgreSQL 11
  become: yes
  with_items:
    - server
    - contrib
    - client
    - plperl
  pkgng:
    name: postgresql11-{{ item }}
    state: latest

However, there is a very scary warning message when running the above:

TASK [PostgreSQL 11] [DEPRECATION WARNING]: Invoking "pkgng" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple items and specifying `name: "postgresql11-"`, please use `name: ['server', 'contrib', 'client', 'plperl']` and remove the loop. This feature will be removed in version 2.11. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. 

That’s easy to fix, but also annoying (at least to me), because I have to change the above snippet to the following one:

- name: PostgreSQL 11
  become: yes
  pkgng:
    name:
      - postgresql11-server
      - postgresql11-contrib
      - postgresql11-client
      - postgresql11-plperl
    state: latest

So far, the best solution I’ve found that helps me keep readability is...
