
Abdul Yadi: Extension for String Translation


PostgreSQL provides a built-in function for character-wise string replacement:
select translate('abcdefghijkl', 'ace', '123');
translate
--------------
1b2d3fghijkl

pgstrtranslate extends it with multi-character replacement. It takes four arguments and returns text.
CREATE OR REPLACE FUNCTION public.pgstrtranslate(
    fullsearch boolean,
    t text,
    search text[],
    replacement text[])
  RETURNS text AS
'$libdir/pgstrtranslate', 'pgstrtranslate'
    LANGUAGE c IMMUTABLE STRICT;

How it works

Non-fullsearch replacement:

select pgstrtranslate(false, --non-fullsearch
    'abcdefghijkl', --original string
    array['ab', 'efg', '2cd']::text[], --array of search strings
    array['012', '3', '78']::text[]); --array of replacements
pgstrtranslate
--------------
012cd3hijkl

'abcdefghijkl' -> '012cd3hijkl'
Note that '2cd' does not match the original string.

Fullsearch replacement:

select pgstrtranslate(true, --fullsearch
    'abcdefghijkl', --original string
    array['ab', 'efg', '2cd']::text[], --array of search strings
    array['012', '3', '78']::text[]); --array of replacements
pgstrtranslate
--------------
01783hijkl

Replace 'ab' with '012': 'abcdefghijkl' -> '012cdefghijkl'
Replace 'efg' with '3': '012cdefghijkl' -> '012cd3hijkl'
Replace '2cd' with '78': '012cd3hijkl' -> '01783hijkl'

How to install

  1. Clone or download the source code from https://github.com/AbdulYadi/pgstrtranslate.git and extract it.
  2. If necessary, modify the PG_CONFIG path according to your specific PostgreSQL installation location.
  3. Build as usual:
     $ make
     $ make install
  4. On successful compilation, install the extension in your PostgreSQL database:
     postgres=# CREATE EXTENSION pgstrtranslate;


Darafei Praliaskouski: PostGIS 3.1.0alpha1


The PostGIS Team is pleased to release the first alpha of the upcoming PostGIS 3.1.0 release.

Best served with PostgreSQL 12.1, GEOS 3.8.0.

Continue Reading by clicking title hyperlink ..

Federico Campoli: The strange case of the EXCEPTION block


When an exception occurs inside a pl/pgsql function, the function stops its execution and returns an error. When this happens, all the changes made are rolled back.

It's always possible to manage the error at application level; however, there are some cases where managing the exception inside the function may be a sensible choice. And pl/pgsql has a nice way to do that: the EXCEPTION block.

However, handling the exception inside a function is not just a cosmetic thing. The way the exception is handled has implications that may cause issues.
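As a reminder of the construct under discussion, here is a minimal sketch of an EXCEPTION block (the t_numbers table and the trapped error are purely illustrative):

CREATE OR REPLACE FUNCTION safe_insert( val integer )
RETURNS text AS
$$
BEGIN
    INSERT INTO t_numbers VALUES ( 100 / val );   -- fails when val = 0
    RETURN 'inserted';
EXCEPTION
    WHEN division_by_zero THEN
        -- only the work done inside this block is rolled back;
        -- the function resumes and returns normally
        RETURN 'division by zero trapped';
END;
$$ LANGUAGE plpgsql;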

Hugo Dias: An Overview of Job Scheduling Tools for PostgreSQL


Unlike other database management systems that have their own built-in scheduler (like Oracle, MSSQL or MySQL), PostgreSQL still doesn’t have this kind of feature.

In order to provide scheduling functionality in PostgreSQL you will need to use an external tool like...

  • Linux crontab
  • Agent pgAgent
  • Extension pg_cron

In this blog we will explore these tools and highlight how to operate them and their main features.

Linux crontab

It's the oldest option, but still an efficient and useful way to execute scheduled tasks. It is based on a daemon (cron) that runs tasks automatically in the background and regularly checks the configuration files (called crontab files) in which the scripts/commands to execute and their schedules are defined.

Each user can have their own crontab file; on recent Ubuntu releases these are located in:

/var/spool/cron/crontabs (for other Linux distributions the location may differ):

root@severalnines:/var/spool/cron/crontabs# ls -ltr

total 12

-rw------- 1 dbmaster crontab 1128 Jan 12 12:18 dbmaster

-rw------- 1 slonik   crontab 1126 Jan 12 12:22 slonik

-rw------- 1 nines    crontab 1125 Jan 12 12:23 nines

The syntax of the configuration file is the following:

mm hh dom mon dow <<command or script to execute>>

mm:  Minute (0-59)
hh:  Hour (0-23)
dom: Day of the month (1-31)
mon: Month (1-12)
dow: Day of the week (0-7; 0 or 7 == Sunday)

A few operators can be used with this syntax to streamline the scheduling definition; these symbols allow multiple values to be specified in a field:

  • Asterisk (*) - matches all possible values for a field
  • Comma (,) - used to define a list of values
  • Dash (-) - used to define a range of values
  • Separator (/) - specifies a step value

The script all_db_backup.sh will be executed according to each scheduling expression below:

0 6 * * * /home/backup/all_db_backup.sh

At 6 am every day

20 22 * * Mon,Tue,Wed,Thu,Fri /home/backup/all_db_backup.sh

At 10:20 PM, every weekday

0 23 * * 1-5 /home/backup/all_db_backup.sh

At 11 pm during the week

*/5 14 * * * /home/backup/all_db_backup.sh

Every five minutes between 2:00 p.m. and 2:55 p.m., every day

Although it's not very difficult to write by hand, this syntax can also be generated automatically on several web pages.

If the crontab file doesn’t exist for a user it can be created by the following command:

slonik@severalnines:~$ crontab -e

or displayed using the -l parameter:

slonik@severalnines:~$ crontab -l

If you need to remove this file, the appropriate parameter is -r:

slonik@severalnines:~$ crontab -r

The cron daemon status can be checked with the service manager, for example with systemctl status cron.

Agent pgAgent

pgAgent is a job scheduling agent available for PostgreSQL that allows the execution of stored procedures, SQL statements, and shell scripts. Its configuration is stored in the postgres database of the cluster.

The idea is to have this agent running as a daemon on Linux systems; it periodically connects to the database to check whether there are any jobs to execute.

This scheduling is easily managed from pgAdmin 4, but pgAgent is not installed by default with pgAdmin, so it's necessary to download and install it on your own.

Hereafter are described all the necessary steps to have the pgAgent working properly:

Step One

Installation of pgAdmin 4

$ sudo apt install pgadmin4 pgadmin4-apache

Step Two

Creation of plpgsql procedural language if not defined

CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql'
     HANDLER plpgsql_call_handler
     VALIDATOR plpgsql_validator;

Step Three

Installation of  pgAgent

$ sudo apt-get install pgagent

Step Four

Creation of the pgagent extension

CREATE EXTENSION pgagent;

This extension creates all the tables and functions needed for pgAgent operation, which together form the data model used by this extension.

The pgAdmin interface now shows a "pgAgent Jobs" entry for managing pgAgent.

To define a new job, right-click on "pgAgent Jobs" and select "Create", then enter a name for the job and define the steps to execute.

In the "Schedules" tab, define the scheduling for this new job.

Finally, to have the agent running in the background, launch the following process manually:

/usr/bin/pgagent host=localhost dbname=postgres user=postgres port=5432 -l 1

Nevertheless, the best option for this agent is to create a daemon with the previous command.

Extension pg_cron

pg_cron is a cron-based job scheduler for PostgreSQL that runs inside the database as an extension (similar to DBMS_SCHEDULER in Oracle) and allows database tasks to be executed directly from the database, using a background worker.

The tasks to perform can be any of the following ones:

  • stored procedures
  • SQL statements
  • PostgreSQL commands (such as VACUUM or VACUUM ANALYZE)

pg_cron can run several jobs in parallel, but only one instance of a program can be running at a time. 

If a second run should be started before the first one finishes, then it is queued and will be started as soon as the first run completes.

This extension requires PostgreSQL 9.5 or higher.

Installation of pg_cron

The installation of this extension only requires the following command:

slonik@sveralnines:~$ sudo apt-get -y install postgresql-10-cron

Updating of Configuration Files

In order to start the pg_cron background worker when the PostgreSQL server starts, it's necessary to add pg_cron to the shared_preload_libraries parameter in postgresql.conf:

shared_preload_libraries = 'pg_cron'

It's also necessary to define in this file the database on which the pg_cron extension will be created, by adding the following parameter:

cron.database_name = 'postgres'

On the other hand, in the pg_hba.conf file that manages authentication, it's necessary to define the postgres login as trust for IPv4 connections, because pg_cron requires this user to be able to connect to the database without providing a password. So the following line needs to be added to this file:

host postgres postgres 192.168.100.53/32 trust

The trust authentication method allows anyone matching the line to connect to the database(s) specified in the pg_hba.conf file, in this case the postgres database. It's a method often used to allow connections over a Unix domain socket on a single-user machine, and it should only be used when there is adequate operating-system-level protection on connections to the server.

Both changes require a PostgreSQL service restart:

slonik@sveralnines:~$ sudo systemctl restart postgresql.service

It’s important to take into account that pg_cron does not run any jobs as long as the server is in hot standby mode, but it automatically starts when the server is promoted.

Creation of pg_cron extension

This extension will create the meta-data and the procedures to manage it, so the following command should be executed on psql:

postgres=#CREATE EXTENSION pg_cron;

CREATE EXTENSION

Now the objects needed to schedule jobs are defined in the cron schema.

This extension is very simple; the job table alone is enough to manage all this functionality:
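If you want to peek at those objects yourself, a couple of psql meta-commands are enough (output not shown here, and it varies slightly with the pg_cron version):

postgres=# \dt cron.*
postgres=# \df cron.*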

Definition of New Jobs

The scheduling syntax used to define jobs in pg_cron is the same one used by the cron tool, and defining new jobs is very simple: it's only necessary to call the function cron.schedule:

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(12356,''DAILY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(998934,''WEEKLY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(45678,''DAILY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(1010,''WEEKLY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(1001,''MONTHLY_DATA'');')

select cron.schedule('*/5 * * * *','select reporting.f_reset_client_data(0,''DATA'')')

select cron.schedule('*/5 * * * *','VACUUM')

select cron.schedule('*/5 * * * *', $$DELETE FROM reporting.rep_request WHERE create_dt < now() - interval '60 DAYS'$$)

The job setup is stored on the job table: 

Another way to define a job is by inserting the data directly on the cron.job table:

INSERT INTO cron.job (schedule, command, nodename, nodeport, database, username)

VALUES ('0 11 * * *','call loader.load_data();','postgresql-pgcron',5442,'staging', 'loader');

Custom values for nodename and nodeport can be used to connect to a different machine (as well as to other databases).

Deactivation of a Job

On the other hand, to deactivate a job it’s only necessary to execute the following function:

select cron.unschedule(8)

Jobs Logging

The logging of these jobs can be found on the PostgreSQL log file /var/log/postgresql/postgresql-12-main.log:

Martin Davis: Running commands in the JTS TestBuilder

The JTS TestBuilder is a great tool for creating geometry, processing it with JTS spatial functions, and visualizing the results.  It has powerful capabilities for inspecting the fine details of geometry (such as the Reveal Topology mode).  I've often thought it would be handy if there was a similar tool for PostGIS.  Of course QGIS excels at visualizing the results of PostGIS queries.  But it doesn't offer the same simplicity for creating geometry and passing it into PostGIS functions.

This is the motivation behind a recent enhancement to the TestBuilder to allow running external (system) commands that return geometry output.  The output can be in any text format that TestBuilder recognizes (currently WKT, WKB and GeoJSON).   It also provides the ability to encode the A and B TestBuilder input geometries as literal WKT or WKB values in the command.  The net result is the ability to run external geometry functions just as if they were functions built into the TestBuilder.

Examples

Running PostGIS spatial functions

Combined with the versatile Postgres command-line interface psql, this allows running a SQL statement and loading the output as geometry. Here's an example of running a PostGIS spatial function.  In this case a MultiPoint geometry has been digitized in the TestBuilder, and processed by the ST_VoronoiPolygons function.  The SQL output geometry is displayed as the TestBuilder result.

The command run is:

/Applications/Postgres.app/Contents/Versions/latest/bin/psql -qtA -c 
"SELECT ST_VoronoiPolygons('#a#'::geometry);"

Things to note:
  • the full path to psql is needed because the TestBuilder processes the command using a plain sh shell.  (It might be possible to improve this.)
  • The psql options -qtA  suppress messages, table headers, and column alignment, so that only the plain WKB of the result is output
  • The variable #a# has the WKT of the A input geometry substituted when the command is run.  This is converted into a PostGIS geometry via the ::geometry cast.  (#awkb# can be used to supply WKB, if full numeric precision is needed)

Loading data from PostGIS

This also makes it easy to load data from PostGIS to make use of TestBuilder geometry analysis and visualization capabilities.  The query can be any kind of SELECT statement, which makes it easy to control what data is loaded.  For large datasets it can be useful to draw an Area of Interest in the TestBuilder and use that as a spatial filter for the query.  The TestBuilder is able to load multiple textual geometries, so there is no need to collect the query result into a single geometry.
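For example, a command along these lines (the roads table and geom column are purely illustrative) uses the digitized A geometry as the spatial filter and loads every matching row as its own geometry:

/Applications/Postgres.app/Contents/Versions/latest/bin/psql -qtA -c 
"SELECT geom FROM roads WHERE ST_Intersects(geom, '#a#'::geometry);"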



Loading data from the Web

Another use for commands is to load data from the Web, by using curl to retrieve a dataset.  Many web spatial datasets are available in GeoJSON, which loads fine into the TestBuilder. Here's an example of loading a dataset provided by an OGC Features service (pygeoapi):
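In practice this is a single curl call against the service's OGC API Features items endpoint; the URL below is only an illustrative placeholder for whatever pygeoapi instance and collection you are querying:

curl -s "https://demo.pygeoapi.io/master/collections/lakes/items?f=json&limit=100"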



Command Panel User Interface


The Command panel provides a simple UI to make it easier to work with commands.  Command text  can be pasted and cleared.  A history of commands run is recorded for the TestBuilder session.  Recorded commands in the session can be recalled via the Previous and Next Command buttons.
Buttons are provided to insert substitution variable text.


To help debug incorrect command syntax, error output from commands is displayed.


It can happen that a command executes successfully, but returns output that cannot be parsed.  This is indicated by an error in the Result panel.  A common cause of this is that the command produces logging output as well as the actual geometry text, which interferes with parsing.  To aid in debugging this situation the command window shows the first few hundred characters of command output.  The good news is that many commands offer a "quiet mode" to provide data-only output.

Unparseable psql output due to the presence of column headers.  The psql -t option fixes this.



If you find an interesting use for the TestBuilder Command capability, post it in the comments!

Andrew L'Ecuyer: Guard Against Transaction Loss with PostgreSQL Synchronous Replication


Crunchy Data recently released its latest version of the open source PostgreSQL Operator for Kubernetes, version 4.2. Among the various enhancements included within this release is support for Synchronous Replication within deployed PostgreSQL clusters.

As discussed in our prior post, the PostgreSQL Operator 4.2 release introduces distributed consensus based high-availability. For workloads that are sensitive to transaction loss, the Crunchy PostgreSQL Operator supports PostgreSQL synchronous replication.

Streaming Replication in PostgreSQL

When a high-availability PostgreSQL cluster is created using the PostgreSQL Operator, streaming replication is enabled to keep various replica servers up-to-date with the latest write-ahead log (WAL) records as they are generated on the primary database server. PostgreSQL streaming replication is asynchronous by default, which means if the primary PostgreSQL server experiences a failure and/or crashes, there is a chance some transactions that were committed may have yet to be replicated to the standby server, potentially resulting in data loss. The specific amount of data loss in this scenario varies, and is directly proportional to the replication delay at the time the failover occurs.
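For reference, outside of any operator tooling, PostgreSQL itself turns on synchronous replication through a couple of settings on the primary; the values below are only an illustrative sketch, not what the Operator configures on your behalf:

synchronous_commit = on
synchronous_standby_names = 'ANY 1 (replica1, replica2)'

With such a configuration, a COMMIT does not return until at least one of the listed standbys has confirmed that the corresponding WAL has been flushed to disk, which closes the transaction-loss window described above.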

Semab Tariq: How to use Machine Learning with 2UDA – PostgreSQL and Orange

This article gives a step-by-step guide to utilizing Machine Learning capabilities with 2UDA. In the article, we’ll use an example of animals to predict whether they are mammals, birds, fish or insects. Software versions We’re going to use 2UDA version 11.6-1 to implement the Machine Learning model. 2UDA version 11.6-1 combines: PostgreSQL 11.6 […]

Luca Ferrari: PL/PgSQL Exception and XIDs


A few considerations on how exceptions are handled in PL/PgSQL.

PL/PgSQL Exception and XIDs

I read the blog post The strange case of the EXCEPTION block where the author was claiming that an EXCEPTION block in a PL/PgSQL function was incrementing the transaction id (xid).
Somehow, this was not very surprising to me.
Why? It immediately reminded me of my own question on the general mailing list, when I was observing a very similar behaviour within psql. In particular, this answer was illuminating:

 something is using subtransactions there. My first guess would be that there are triggers with EXCEPTION blocks 

My Guess About How Exceptions Are Handled

I think PL/PgSQL is using subtransactions (or savepoints) to handle exceptions.
Why?
Well, if you think about it: when you catch an exception you probably want to resume your execution, that is, you must have a way to roll back your unit of work and start over again.
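In plain SQL the same machinery looks like this (a minimal sketch, assuming a scratch table t exists); an EXCEPTION block sets up essentially the same kind of subtransaction behind the scenes:

BEGIN;
SAVEPOINT before_work;              -- roughly what PL/PgSQL prepares for an EXCEPTION block
INSERT INTO t VALUES ( 1 / 0 );     -- fails with division_by_zero
ROLLBACK TO SAVEPOINT before_work;  -- only this unit of work is undone
INSERT INTO t VALUES ( 42 );        -- execution resumes normally
COMMIT;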

See Transactions in Action!

It is possible to inspect the transactions in action with a simple function and a table to abuse.
There is no need to play around with VACUUM FREEZE and age() as the original author says.
Let’s see the function:

CREATE OR REPLACE FUNCTION f_loop( b int DEFAULT 0, e int DEFAULT 10 )
RETURNS VOID
AS $$
BEGIN
  RAISE DEBUG 'TXID of the function (here should not be assigned) function: % %',
              txid_current_if_assigned(), txid_status( txid_current_if_assigned

Mark Wong: Creating a PostgreSQL procedural language – Part 1 – Setup

PostgreSQL supports many procedural languages, which can be used to write user defined functions or stored procedures.  There are four that are readily available as part of the standard PostgreSQL distribution: PL/pgSQL, PL/Tcl, PL/Perl, PL/Python.  Yet procedural languages don’t have to be created as part of the core project.  There are a number more that […]

Luca Ferrari: Executing VACUUM by non-owner user


VACUUM needs to be run by the object owner!

Executing VACUUM by non-owner user

The documentation about VACUUM clearly states it:

 To vacuum a table, one must ordinarily be the table's owner or a superuser. However, database owners are allowed to vacuum all tables in their databases, except shared catalogs. [...] VACUUM cannot be executed inside a transaction block. 

There is no ACL flag for VACUUM, which means you cannot GRANT someone else the right to execute VACUUM.
Period.

Therefore there is no escape: in order to run VACUUM you must be either (i) the object owner, (ii) the database owner or, as you can imagine, (iii) one of the cluster superuser(s).

Why am I insisting on this? Because some friends of mine argued that it is always possible to escape restrictions with functions and the SECURITY DEFINER option. In this particular case, one could think of defining a function that executes VACUUM, applying the SECURITY DEFINER option so that the function runs as the object owner, and then providing (i.e., GRANT) execute permission to a normal user.
WRONG!
The fact that VACUUM cannot be executed within a transaction block means you cannot use such an approach, because a function is executed within a transaction block.
And if you are now asking yourself why VACUUM cannot be wrapped in a transaction block, just explain to me how to ROLLBACK a VACUUM execution: it would be an interesting, fantasyland explanation!
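A minimal sketch of the doomed attempt (table and function names are made up) shows what happens:

CREATE FUNCTION do_vacuum() RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
BEGIN
   EXECUTE 'VACUUM foo';   -- runs inside the function's transaction block...
END;
$$;

SELECT do_vacuum();
-- ...and therefore fails with something along the lines of:
-- ERROR:  VACUUM cannot be executed from a function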


So, what is going to happen if...

Stefan Fercot: Monitor pgBackRest backups with check_pgbackrest 1.7


check_pgbackrest is designed to monitor pgBackRest backups, relying on the status information given by the info command.

The main features are:

  • check WAL archives consistency;
  • check the retention policy;
  • check its own version;
  • multiple output formats: human, json and nagios.

Installation

The RPM file made its way to the PGDG Yum repository! Special thanks to Devrim for that!

It can also be found in the Dalibo Labs Yum repository.

  1. Install the Official PostgreSQL yum repo package or the Dalibo Labs yum repo package

  2. yum install nagios-plugins-pgbackrest

Remark: epel-release also needs to be installed first.


Monitor the backup retention

The service fails when

  • the number of full backups is less than --retention-full;
  • the newest backup is older than --retention-age;
  • the newest full backup is older than --retention-age-to-full.
$ check_pgbackrest --stanza=my_stanza \
  --service=retention --retention-full=1 --output=human \
  --retention-age=24h --retention-age-to-full=7d

Service        : BACKUPS_RETENTION
Returns        : 0 (OK)
Message        : backups policy checks ok
Long message   : full=1
Long message   : diff=1
Long message   : incr=1
Long message   : latest=incr,20200131-150158F_20200131-150410I
Long message   : latest_age=2m47s
Long message   : latest_full=20200131-150158F
Long message   : latest_full_age=5m

Monitor WAL segments archives

The pgbackrest info command

  • shows the oldest (min) archive and the most recent one (max);
  • doesn’t check if all the archives in between are really on the disk;
  • doesn’t give the age of the most recent archive (until 2.21);

The archives service checks if all archived WALs exist between the oldest and the latest WAL needed for the recovery.


Local storage

This service requires the --repo-path argument to specify where the archived WALs are stored.

Archives must be compressed (.gz). If needed, use “compress-level=0” instead of “compress=n”.

Use the --wal-segsize argument to set the WAL segment size if you don’t use the default one.

$ check_pgbackrest --stanza=my_stanza \
  --service=archives --repo-path=/var/lib/pgbackrest/archive --output=human
Service        : WAL_ARCHIVES
Returns        : 0 (OK)
Message        : 81 WAL archived
Message        : latest archived since 1m59s
Long message   : latest_archive_age=1m59s
Long message   : num_archives=81
Long message   : archives_dir=/var/lib/pgbackrest/archive/my_stanza/12-1
Long message   : min_wal=000000010000000000000003
Long message   : max_wal=000000010000000000000053
Long message   : oldest_archive=000000010000000000000003
Long message   : latest_archive=000000010000000000000053
Long message   : latest_bck_archive_start=000000010000000000000007
Long message   : latest_bck_type=incr

Remote storage

If the archives are pushed to another server, use the --repo-host and --repo-host-user arguments:

$ check_pgbackrest --stanza=my_stanza \
  --service=archives --repo-path=/var/lib/pgbackrest/archive \
  --repo-host="backup-srv" --repo-host-user=postgres

WAL_ARCHIVES OK - 4 WAL archived, latest archived since 25m30s | 
  latest_archive_age=25m30s num_archives=4

S3 storage

The service can use the Amazon S3 API by getting repo1-s3-key and repo1-s3-key-secret parameters directly from the pgBackRest configuration file.

Use the --repo-s3 argument to turn on that behavior:

$ check_pgbackrest --stanza=my_stanza \
  --service=archives --repo-path=/repo1/archive \
  --repo-s3 --repo-s3-over-http

WAL_ARCHIVES OK - 4 WAL archived, latest archived since 1m7s | 
  latest_archive_age=1m7s num_archives=4

Go further

Use the --ignore-archived-before argument to ignore the archived WALs generated before the provided interval. Used to only check the latest archives.

Use the --ignore-archived-after argument to ignore the archived WALs generated after the provided interval. Used under heavy archiving load.

The --latest-archive-age-alert argument defines the max age of the latest archived WAL as an interval before raising a critical alert.


Tests using Vagrant

Test scenarios have been prepared using CentOS 7 boxes and the libvirt VM provider:

[check_pgbackrest]$ ls test
Makefile  perf  provision  README.md  regress  ssh  Vagrantfile

1. pgBackRest configured to backup and archive on a CIFS mount

  • icinga-srv executes check_pgbackrest by ssh with Icinga 2;
  • pgsql-srv hosting a pgsql cluster with pgBackRest installed;
  • backup-srv hosting the CIFS share.

Backups and archiving are done locally on pgsql-srv on the CIFS mount point.

2. pgBackRest configured to backup and archive remotely

  • icinga-srv executes check_pgbackrest by ssh with Icinga 2;
  • pgsql-srv hosting a pgsql cluster with pgBackRest installed;
  • backup-srv hosting the pgBackRest backups and archives.

Backups of pgsql-srv are taken from backup-srv. Archives are pushed from pgsql-srv to backup-srv. Checks (retention and archives) are done both locally (on backup-srv) and remotely (on pgsql-srv). Checks are performed from icinga-srv by ssh. pgBackRest backups are used to build a streaming replication setup with backup-srv as standby server.

3. pgBackRest configured to backup and archive to a MinIO S3 bucket

  • icinga-srv executes check_pgbackrest by ssh with Icinga 2;
  • pgsql-srv hosting a pgsql cluster with pgBackRest installed;
  • backup-srv hosting the MinIO server.

Evolution

The first evolution I'd like to implement would be to use the pgbackrest ls command to get the archives list. Indeed, the mtime property for archives is available since pgBackRest 2.21. At the moment, we'd still need a cat command for the .history files.

$ pgbackrest help ls
pgBackRest 2.23 - 'ls' command help

List paths/files in the repository.

This is intended to be a general purpose list function, but for now it only
works on the repository.

Command Options:

  --filter                         filter output with a regular expression
  --output                         output format [default=text]
  --recurse                        include all subpaths in output [default=n]
  --sort                           sort output ascending, descending, or none
                                   [default=asc]

Another possible evolution would be the Debian support with specific test cases (using Vagrant) and a .deb package.

If you have any idea to improve the tool, please, share it! :-)


Conclusion

check_pgbackrest is an open project, licensed under the PostgreSQL license.

Any contribution to improve it is welcome.

Julien Rouhaud: New in pg13: New leader_pid column in pg_stat_activity


New leader_pid column in pg_stat_activity view

Surprisingly, since parallel query was introduced in PostgreSQL 9.6, it was impossible to know which backend a parallel worker was related to. So, as Guillaume pointed out, it was quite difficult to build simple tools that can sample the wait events related to all the processes involved in a query. A simple solution to that problem is to expose the lock group leader information, already available in the backend, at the SQL level:

commit b025f32e0b5d7668daec9bfa957edf3599f4baa8
Author: Michael Paquier <michael@paquier.xyz>
Date:   Thu Feb 6 09:18:06 2020 +0900

Add leader_pid to pg_stat_activity

This new field tracks the PID of the group leader used with parallel
query.  For parallel workers and the leader, the value is set to the
PID of the group leader.  So, for the group leader, the value is the
same as its own PID.  Note that this reflects what PGPROC stores in
shared memory, so as leader_pid is NULL if a backend has never been
involved in parallel query.  If the backend is using parallel query or
has used it at least once, the value is set until the backend exits.

Author: Julien Rouhaud
Reviewed-by: Sergei Kornilov, Guillaume Lelarge, Michael Paquier, Tomas
Vondra
Discussion: https://postgr.es/m/CAOBaU_Yy5bt0vTPZ2_LUM6cUcGeqmYNoJ8-Rgto+c2+w3defYA@mail.gmail.com

With this change, you can now easily find all processes involved in a parallel query. For instance:

=# SELECT query, leader_pid,
          array_agg(pid) FILTER (WHERE leader_pid != pid) AS members
   FROM pg_stat_activity
   WHERE leader_pid IS NOT NULL
   GROUP BY query, leader_pid;
       query       | leader_pid |    members
-------------------+------------+---------------
 select * from t1; |      31630 | {32269,32268}
(1 row)

Be careful: as mentioned in the commit message, if the leader_pid is the same as pid, it doesn't necessarily mean that the backend is currently performing a parallel query, as once set this field is never reset. Also, to avoid extra overhead, no additional lock is held while outputting the data, which means that each row is processed independently. So, while quite unlikely, you can in some circumstances get inconsistent data, such as a parallel worker pointing to a pid that has already disconnected.
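Going back to the wait-event sampling motivation mentioned above, a query along these lines (just a sketch) now groups leaders and their workers in a single pass:

=# SELECT leader_pid, pid, wait_event_type, wait_event
   FROM pg_stat_activity
   WHERE leader_pid IS NOT NULL
   ORDER BY leader_pid, pid;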

New in pg13: New leader_pid column in pg_stat_activity was originally published by Julien Rouhaud at rjuju's home on February 06, 2020.

Mark Wong: PDXPUG February 2020 Meetup: DBlink and SQL/MED and FDW, oh my! External Data Access tricks


2020 February 20 Meeting 6pm-8pm

Location:

PSU Business Accelerator
2828 SW Corbett Ave · Portland, OR
Parking is open after 5pm.

Speakers: Gabrielle Roth & Michelle Franz

In February, long-term PDXPUGers Michelle Franz & Gabrielle Roth return to discuss DBLINK and FDW, two methods of accessing data that’s outside your database. We’ll briefly mention SQL/MED on the way, and tell a couple of bad (Dad) jokes.

We’ll cover use (and mis-use) cases, configuration of each, and interesting quirks. Gabrielle will attempt a live demo of file_fdw and postgres_fdw. You’re welcome to follow along if you’d like, just show up with a couple of Pg databases: they can be on the same cluster (or container or RDS instance or what have you) or different ones, doesn’t really matter. One of them should have at least one table with data in it.

About the speakers:

From embedded systems to UI to relational database programming, Michelle’s running an interesting path through the software world. Working with databases is her current episode, with some programming, some DBA, some SysAdmin and a healthy dose of learning agile. She’s very quickly come to appreciate and enjoy working with Postgres and her work has allowed her to learn from some super talented database experts.

Gabrielle has been using Postgres since sometime in the version 7s, and thinks that the best part of using Open Source software is the culture of sharing knowledge. She’s a Senior Data Engineer at NS1 and co-founder and former co-lead of PDXPUG.

Keith Fiske: How To Migrate From Trigger-Based Partitioning To Native in PostgreSQL


PostgreSQL 10 introduced native partitioning and more recent versions have continued to improve upon this feature. However, many people set up partition sets before native partitioning was available and would greatly benefit from migrating to it. This article will cover how to migrate a partition set using the old method of triggers/inheritance/constraints to a partition set using the native features found in PostgreSQL 11+. Note these instructions do not cover migrating to PG10 since some key features that make this migration easier were not yet implemented. It is highly recommended to move to PG11 or higher if you want to migrate existing partition sets.

Pavel Stehule: Moscow PgConf.Russia 2020


Pavel Stehule: psql and gnuplot

psql from PostgreSQL 12 can produce CSV output. This format is easily read by gnuplot, so the two can be used together:

\pset format csv
select i, sin(i) from generate_series(0, 6.3, 0.5) g(i) \g |gnuplot -p -e "set datafile separator ','; set key autotitle columnhead; plot '-'with boxes"

[gnuplot ASCII output: sin(i) for i in 0..6.3 with step 0.5, drawn with boxes in the terminal]

postgres=# select i, sin(i) from generate_series(0, 6.3, 0.05) g(i) \g |gnuplot -p -e "set datafile separator ','; set key autotitle columnhead; set terminal dumb enhanced; plot '-'with boxes"



[gnuplot ASCII output: sin(i) for i in 0..6.3 with step 0.05, drawn with boxes on the dumb enhanced terminal]


postgres=# select i, sin(i) from generate_series(0, 6.3, 0.05) g(i) \g |gnuplot -p -e "set datafile separator ','; set key autotitle columnhead; set terminal dumb enhanced; plot '-'with lines ls 1"



[gnuplot ASCII output: sin(i) for i in 0..6.3 with step 0.05, drawn with lines on the dumb enhanced terminal]






Luca Ferrari: Why Dropping a Column does not Reclaim Disk Space? (or better, why is it so fast?)


You may have noticed how fast dropping a column is in PostgreSQL, haven't you?

Why Dropping a Column does not Reclaim Disk Space? (or better, why is it so fast?)

Simple answer: because PostgreSQL knows how to do its job well!

Let’s create a dummy table to test this behavior against:

testdb=> CREATE TABLE foo( i int );
CREATE TABLE
testdb=> INSERT INTO foo SELECT generate_series( 1, 10000000 );
INSERT 0 10000000
testdb=> SELECT pg_size_pretty( pg_relation_size( 'foo' ) );
 pg_size_pretty
----------------
 346 MB
(1 row)

Now, let's add a quite large column to the table and measure how much time it takes:

testdb=> \timing
Timing is on.
testdb=> ALTER TABLE foo ADD COLUMN t text DEFAULT md5( random()::text );
ALTER TABLE
Time: 30702,872 ms (00:30,703)

What happened? In nearly 31 seconds the table has grown, with random data on every row, to 651 MB (almost double the original size):

testdb=> SELECT pg_size_pretty( pg_relation_size( 'foo' ) );
 pg_size_pretty
----------------
 651 MB
(1 row)

What does PostgreSQL think about the...

Sadequl Hussain: How to Automate PostgreSQL 12 Replication and Failover with repmgr – Part 2

This is the second installment of a two-part series on 2ndQuadrant’s repmgr, an open-source high-availability tool for PostgreSQL. In the first part, we set up a three-node PostgreSQL 12 cluster along with a “witness” node. The cluster consisted of a primary node and two standby nodes. The cluster and the witness node were hosted in […]

Mark Wong: Creating a PostgreSQL procedural language – Part 2 – Embedding Julia

Julia provides an API so that Julia functions can be called from C.  PL/Julia will use this C API to execute Julia code from its user defined functions and stored procedures. Julia’s documentation provides an example C program that starts up the Julia environment, evaluates the expression sqrt(2.0), displays the resulting value to the standard […]

Hans-Juergen Schoenig: Migrating from MS SQL to PostgreSQL: Uppercase vs. Lowercase


When migrating from MS SQL to PostgreSQL, one of the first things people notice is that in MS SQL, object names such as tables and columns all appear in uppercase. While that is possible on the PostgreSQL side as well, it is not really that common. The question therefore is: how can we rename all those things to lowercase, easily and fast?

 


Finding tables to rename in PostgreSQL

The first question is: How can you find the tables which have to be renamed? In PostgreSQL, you can make use of a system view (pg_tables) which has exactly the information you need:

SELECT 	'ALTER TABLE public."' || tablename || '" RENAME TO ' || lower(tablename) 
FROM 	pg_tables 
WHERE 	schemaname = 'public'
	AND tablename <> lower(tablename);

This query not only returns a list of tables which have to be renamed, it also creates a list of SQL commands.

If you happen to use psql directly it is possible to call …

\gexec

… directly after running the SQL above. \gexec will take the result of the previous statement and consider it to be SQL which has to be executed. In short: PostgreSQL will already run the ALTER TABLE statements for you.

The commands created by the statement will display a list of instructions to rename tables:

                 ?column?                 
------------------------------------------
 ALTER TABLE public."AAAA" RENAME TO aaaa
(1 row)

Avoid SQL injection at all cost

However, the query I have just shown has a problem: It does not protect us against SQL injection. Consider the following table:

test=# CREATE TABLE "A B C" ("D E" int);
CREATE TABLE

In this case the name of the table contains blanks. However, it could also contain more evil characters, causing security issues. Therefore it makes sense to adapt the query a bit:

test=# SELECT 'ALTER TABLE public.' || quote_ident(tablename) || ' RENAME TO ' || lower(quote_ident(tablename))
       FROM    pg_tables
       WHERE   schemaname = 'public'
               AND   tablename <> lower(tablename);

The quote_ident function will properly escape the list of objects as shown in the listing below:

                   ?column?                   
----------------------------------------------
 ALTER TABLE public."AAAA" RENAME TO "aaaa"
 ALTER TABLE public."A B C" RENAME TO "a b c"
(2 rows)

\gexec can be used to execute this code directly.

Renaming columns in PostgreSQL to lowercase

After renaming the list of tables, you can turn your attention to fixing column names. In the previous example, I showed you how to get a list of tables from pg_tables. However, there is a second option to extract the name of an object: the regclass data type. Basically, regclass is a nice way to turn an OID into a readable string.
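For instance, reusing the table created earlier (the OID returned will of course differ on your system):

test=# SELECT oid, oid::regclass FROM pg_class WHERE relname = 'A B C';

This returns the table's OID alongside its human-readable (and properly quoted) name.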

The following query makes use of regclass to fetch the list of tables. In addition, you can fetch column information from pg_attribute:

test=# SELECT   'ALTER TABLE ' || a.oid::regclass || ' RENAME COLUMN ' || quote_ident(attname)
                    || ' TO ' || lower(quote_ident(attname))
       FROM    pg_attribute AS b, pg_class AS a, pg_namespace AS c 
       WHERE   relkind = 'r'
               AND     c.oid = a.relnamespace
               AND     a.oid = b.attrelid
               AND     b.attname NOT IN ('xmin', 'xmax', 'oid', 'cmin', 'cmax', 'tableoid', 'ctid')
               AND     a.oid > 16384
               AND     nspname = 'public'
               AND     lower(attname) != attname;
                     ?column?                     
--------------------------------------------------
 ALTER TABLE "AAAA" RENAME COLUMN "B" TO "b"
 ALTER TABLE "A B C" RENAME COLUMN "D E" TO "d e"
(2 rows)
\gexec

\gexec will again run the code we have just created, and fix column names.

Finally

As you can see, renaming tables and columns in PostgreSQL is easy. Moving from MS SQL to PostgreSQL is definitely possible, and tooling is more widely available nowadays than it used to be. If you want to read more about PostgreSQL, check out our blog about moving from Oracle to PostgreSQL.

The post Migrating from MS SQL to PostgreSQL: Uppercase vs. Lowercase appeared first on Cybertec.
