
Dimitri Fontaine: PostgreSQL Data Types: ENUM


Continuing our series of PostgreSQL Data Types, today we’re going to introduce the PostgreSQL ENUM type.

This data type has been added to PostgreSQL in order to make it easier to support migrations from MySQL. Proper relational design would use a reference table and a foreign key instead.
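A minimal sketch of both approaches (the type and table names here are illustrative, not from the article):

CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'delivered');

CREATE TABLE orders_enum (
  id     serial PRIMARY KEY,
  status order_status NOT NULL DEFAULT 'pending'
);

-- the classic relational alternative: a reference table and a foreign key
CREATE TABLE order_status_ref (status text PRIMARY KEY);

CREATE TABLE orders_ref (
  id     serial PRIMARY KEY,
  status text NOT NULL REFERENCES order_status_ref
);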


Thomas Munro: SERIALIZABLE in PostgreSQL 11... and beyond

Thanks to the tireless work of Google Summer of Code student Shubham Barai with the support of reviewers, a mentor and a committer, PostgreSQL 11 will ship with predicate lock support for hash indexes, gin indexes and gist indexes.  These will make SERIALIZABLE transaction isolation much more efficient with those indexes, filling in some of the feature combination gaps and quirks that exist in my favourite RDBMS.

It seems like a good time to write a bit about SERIALIZABLE and how it interacts with other selected PostgreSQL features, including indexes.

A bit of background

If you want to read something a little less dry than the usual papers about transaction isolation, I recommend ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications which, among other things, discusses a transaction isolation-based attack that bankrupted a bitcoin exchange.  It also makes some interesting observations about some of PostgreSQL's rivals.  Even excluding malicious attacks, I've been working with databases long enough to have heard plenty of transaction isolation screw-up stories that I can't repeat here, including trading systems and credit limit snafus, unexpected arrest warrants and even ... a double booked violin teacher.

True SERIALIZABLE using the SSI algorithm is one of my favourite PostgreSQL features and was developed by Dan Ports and Kevin Grittner for release 9.1.  From what I've seen and heard, SERIALIZABLE has a reputation among application developers as a complicated expert-level feature for solving obscure problems with terrible performance, but that isn't at all justified... at least on PostgreSQL.  As the ACIDRain paper conveys much better than I can, weaker isolation levels are in fact more complicated to use correctly with concurrency.  For non-trivial applications, error-prone ad-hoc serialisation schemes based on explicit locking are often required for correctness.  While the theory behind SSI may sound complicated, that's the database's problem!  The end user experience is the exact opposite: it's a switch you can turn on that lets you write applications that assume that each transaction runs in complete isolation, dramatically cutting down the number of scenarios you have to consider (or fail to consider).  In short, it seems that it's the weaker isolation levels that are for experts.

As an example, suppose you are writing a scheduling system for a busy school.  One transaction might consist of a series of queries to check if a teacher is available at a certain time, check if a room is available, check if any of the enrolled students has another class at the same time, check if the room's capacity would be exceeded by the currently enrolled students, and then finally schedule a class.  If you do all of this in a SERIALIZABLE transaction then you don't even have to think about concurrent modifications to any of those things.  If you use a weaker level, then you have to come up with an ad-hoc locking strategy to make sure that a concurrent transaction doesn't create a scheduling clash.

Most other databases use a pessimistic locking strategy for SERIALIZABLE, which amounts to literally serialising transactions whose read/write sets conflict.  In contrast, PostgreSQL uses a recently discovered optimistic strategy which allows more concurrency and avoids deadlocks, but in exchange for the increased concurrency it sometimes needs to abort transactions if it determines that they are incompatible with all serial orderings of the transactions.  When that happens, the application must handle a special error code by retrying the whole transaction again.  Many workloads perform better under SSI than under the traditional locking strategy, though some workloads (notably queue-like workloads where sessions compete to access a 'hot' row) may be unsuitable because they generate too many retries.  In other words, optimistic locking strategies pay off as long as the optimism is warranted.  Pessimistic strategies may still be better if every transaction truly does conflict with every other transaction.
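That special error is SQLSTATE 40001 (serialization_failure).  A minimal sketch of the pattern, using a hypothetical class_schedule table, with the retry loop left to the application:

CREATE TABLE class_schedule (room int, starts timestamptz);

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- check for a clash, then book the slot
SELECT count(*) FROM class_schedule
 WHERE room = 101 AND starts = '2018-06-01 09:00';
INSERT INTO class_schedule VALUES (101, '2018-06-01 09:00');
COMMIT;
-- if any statement fails with SQLSTATE 40001 ("could not serialize
-- access..."), ROLLBACK and rerun the whole transaction from the top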

The type of locks used by SSI are known as "SIREAD" locks or "predicate" locks, and they are distinct from the regular PostgreSQL "heavyweight locks" in that you never wait for them and they can't deadlock.  The SSI algorithm permits spurious serialisation failures, which could be due to lack of memory for SIREAD locks leading to lock escalation (from row to page to relation level), lack of support in index types leading to lock escalation (see below), or the fundamental algorithm itself which is based on a fast conservative approximation of a circular graph detector.  We want to minimise those.  A newer algorithm called Precise SSI might be interesting for that last problem, but much lower hanging fruit is the index support.

Interaction with indexes

Unlike the regular locking that happens when you update rows, SERIALIZABLE needs to lock not only rows but also "predicates", representing gaps or hypothetical rows that would match some query.  If you run a query to check if there is already a class booked in a given classroom at a given time and find none, and then a concurrent transaction creates a row that would have matched your query, we need a way to determine that these transactions have an overlapping read/write set.  If you don't use SERIALIZABLE, you'd probably need to serialise such pairs of transactions by making sure that there is an advisory lock or an explicit row lock on something else -- in the school example that might be the row representing the classroom (which is in some sense a kind of "advisory" lock by another name, since all transactions involved have to opt into this scheme).  Predicate locks handle the case automatically without the user having to do that analysis, and make sure that every transaction in the system gets it right.  In PostgreSQL, performing this magic efficiently requires special support from indexes.  PostgreSQL 11 adds more of that.
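For illustration, the ad-hoc alternative mentioned above might look like this under a weaker isolation level, using a transaction-scoped advisory lock keyed on a hypothetical classroom id:

BEGIN;
-- every transaction that books this classroom must take the same lock
SELECT pg_advisory_xact_lock(101);
-- ... check availability, insert the booking ...
COMMIT;  -- the advisory lock is released automatically here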

When SSI was first introduced, only btrees had support for predicate locks.  Conceptually, a predicate lock represents a logical predicate such as "X = 42" against which any concurrent writes must be compared.  Indexes that support predicate locks approximate that predicate by creating SIREAD locks for index pages that would have to be modified by any conflicting insert.  In PostgreSQL 11 that behaviour now extends to gin, gist and hash.

If your query happens to use an index that doesn't support predicate locks, then PostgreSQL falls back to predicate-locking the whole relation.  This means that the SSI algorithm will report a lot more false-positive serialisation failures.  In other words, in earlier releases if you were using SERIALIZABLE you had a good reason to avoid gin, gist and hash indexes, and vice versa, because concurrent transactions would produce a ton of serialisation anomalies and thus retries in your application.  In PostgreSQL 11 you can use all of those features together without undue retries!

One quite subtle change to the SERIALIZABLE/index interaction landed in PostgreSQL 9.6.   It made sure that unique constraint violations wouldn't hide serialisation failures.  This is important, because if serialisation failures are hidden by other errors then it prevents application programming frameworks from automatically retrying transactions for you on serialisation failure.  For example, if your Java application is using Spring Retry you might configure it to retry any incoming service request on ConcurrencyFailureException; for Ruby applications you might use transaction_retry; similar things exist for other programming environments that provide managed transactions.  That one line change to PostgreSQL was later determined to be a bug-fix and back-patched to all supported versions.  If future index types add support for unique constraints, they will also need to consider this case.

Here ends the part of this blog article that concerns solid contributions to PostgreSQL 11.  The next sections are about progressively more vaporous contributions aiming to fill in the gaps where SERIALIZABLE interacts poorly with other features.

Parallelism

The parallel query facilities in PostgreSQL 9.6, 10 and the upcoming 11 release are disabled by SERIALIZABLE.  That is, if you enable SERIALIZABLE, your queries won't be able to use more than one CPU core.  I worked on a patch to fix that problem.  Unfortunately I didn't quite manage to get that into the right shape in time to land it in PostgreSQL 11 so the target is now PostgreSQL 12.  It's good that parallel query was released when it was and not held back by lack of SERIALIZABLE support, but we need to make sure that we plug gaps like these: you shouldn't have to choose between SERIALIZABLE and parallel query.


Replication

PostgreSQL allows read-only queries to be run on streaming replica servers.  It doesn't allow SERIALIZABLE to be used in those sessions though, because even read-only transactions can create serialisation anomalies.  A solution to this problem was described by Kevin Grittner.  I have written some early prototype code to test the idea (or my interpretation of it), but I ran into a few problems that are going to require some more study.

Stepping back a bit, the general idea is to extend what SERIALIZABLE READ ONLY DEFERRABLE does on a single-node database server.  Before I explain that, I'll need to explain the concept of a "safe transaction".  One of the optimisations that Kevin and Dan made in their SSI implementation is to identify points in time when READ ONLY transactions become safe, meaning that there is no way that they can either suffer a serialisation failure or cause anyone else to suffer one.  When that point is reached, PostgreSQL effectively silently drops the current transaction from SERIALIZABLE to REPEATABLE READ, or in other words from SSI (serialisable snapshot isolation) to SI (snapshot isolation) because it has proven that the result will be the same.  That allows it to forget all about SIREAD locks and the transaction dependency graph, so that it can go faster.  SERIALIZABLE READ ONLY DEFERRABLE is a way to say that you would like to begin a READ ONLY transaction and then wait until it is safe before continuing.  In other words, it effectively runs in REPEATABLE READ isolation, but waits until a moment when that'll be indistinguishable from SERIALIZABLE.  It might return immediately if no writable SERIALIZABLE transactions are running, but otherwise it'll make you wait until all concurrent writable SERIALIZABLE transactions have ended.  As far as I know, PostgreSQL's safe read only transaction concept is an original contribution not described in the earlier papers.
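In SQL, that is spelled as part of the transaction start:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
-- the first query blocks until the snapshot is provably safe, then the
-- transaction runs with no SIREAD lock bookkeeping at all
SELECT count(*) FROM pg_class;
COMMIT;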

The leading idea for how to make SERIALIZABLE work on standby servers is to cause it to silently behave like SERIALIZABLE READ ONLY DEFERRABLE.  That's complicated though, because the standby server doesn't know anything about transactions running on the primary server.  The proposed solution is to put a small amount of extra information into the WAL that would allow standby servers to know that read only transactions (or technically snapshots) begun at certain points must be safe, or are of unknown status and must wait for a later WAL record that contains the determination.

I really hope that we can get that to work, because as with the other features listed above, it's a shame to have to choose between load balancing and SERIALIZABLE.


SKIP LOCKED

SKIP LOCKED is the first patch that I wrote for PostgreSQL, released in 9.5.  I wrote it to scratch an itch: we used another RDBMS at my job at the time, and this was one of the features that came up as missing from PostgreSQL and that might prevent us from migrating.

It's designed to support distributing explicit row locks to multiple sessions when you don't care which rows each session gets, but you want to maximise concurrency.  The main use case is consuming jobs from a job queue, but other use cases include reservation systems (booking free seats, rooms etc) and lazily maintained roll-up aggregation tables (finding 'dirty' rows that need to be recomputed etc).

This is called a kind of "exotic isolation" by Jim Gray in Transaction Processing: Concepts and Techniques (under the name "read-past").  As far as I can see, it's philosophically opposed to SERIALIZABLE because it implies that you are using explicit row locks in the first place.  That shouldn't be necessary under SERIALIZABLE, or we have failed.  Philosophy aside, there is a more practical problem: the rows that you skip are still predicate-locked, and so create conflicts among all the competing transactions.  You lock different rows, but only one concurrent transaction ever manages to complete, and you waste a lot of energy retrying.

This forces a choice between SERIALIZABLE and SKIP LOCKED, not because the features exclude each other but because the resulting performance is terrible.

The idea I have to deal with this is to do a kind of implicit SIREAD lock skipping under certain conditions.  First, let's look at a typical job queue processing query using SKIP LOCKED:

    SELECT id, foo, bar
      FROM job_queue
     WHERE state = 'NEW'
       FOR UPDATE SKIP LOCKED
     LIMIT 1;

The idea is that under SERIALIZABLE you should be able to remove the FOR UPDATE SKIP LOCKED clause and rely on SSI's normal protections.  Since you didn't specify an ORDER BY clause and you did specify a LIMIT N clause, you have told the database that you don't care which N rows you get back, as long as state = 'NEW'.  This means we can change the scan order.  Seeing the LIMIT and the isolation level, the executor could skip (but not forget) any tuples that are already SIREAD-locked, and then only go back to the ones it skipped if it doesn't manage to find enough non-SIREAD-locked tuples to satisfy the query.  Instead of an explicit SKIP LOCKED mode, it's a kind of implicit REORDER LOCKED (meaning SIREAD locks) that minimises conflicts.

If you add an ORDER BY clause it wouldn't work, because you thereby remove the leeway granted by nondeterminism in the ordering.  But without it, this approach should fix a well known worst case workload for SERIALIZABLE.  Just an idea; no patch yet.
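Under that (so far hypothetical) scheme, the queue consumer would shrink to:

SELECT id, foo, bar
  FROM job_queue
 WHERE state = 'NEW'
 LIMIT 1;
-- no FOR UPDATE SKIP LOCKED: the executor would visit already
-- SIREAD-locked tuples last, keeping conflicts and retries rare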

Hans-Juergen Schoenig: PostgreSQL: Sharing data across function calls


Recently I did some PostgreSQL consulting in the Berlin area (Germany) when I stumbled over an interesting request: How can data be shared across function calls in PostgreSQL? I recalled one of the older features of PostgreSQL (it has been around for 15+ years or so) to solve the issue. Here is how it works.

Stored procedures in PostgreSQL

As many of you might know, PostgreSQL allows you to write stored procedures in many different languages. Two of the more popular ones are Perl and Python, which have been around for quite some time. The cool thing is: both languages offer a way to share variables across function calls. In Perl you can make use of the %_SHARED hash, which is always there.

Here is an example:

CREATE OR REPLACE FUNCTION set_var(int)
RETURNS int AS $$
   $_SHARED{'some_name'} = $_[0];
   return $_[0];
$$ LANGUAGE plperl;

What the code does is assign a value to some_name and return the assigned value. Some other function can then make use of this data, which is stored inside your database connection. Here is an example:

CREATE OR REPLACE FUNCTION increment_var()
RETURNS int AS $$
   $_SHARED{'some_name'} += 1;
   return $_SHARED{'some_name'};
$$ LANGUAGE plperl;

This function will simply increment the value and return it. As you can see, the code is pretty simple and easy to write.
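PL/Python offers the same idea through its GD ("global data") dictionary, which also lives for the lifetime of the database connection. A minimal sketch, assuming the plpythonu language is installed:

CREATE OR REPLACE FUNCTION set_var_py(v int)
RETURNS int AS $$
   GD['some_name'] = v
   return v
$$ LANGUAGE plpythonu;

CREATE OR REPLACE FUNCTION increment_var_py()
RETURNS int AS $$
   GD['some_name'] += 1
   return GD['some_name']
$$ LANGUAGE plpythonu;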

Assigning shared variables

The following listing shows how the Perl functions can be used. The first call assigns a value to the shared variable, while the second one simply increments it:

test=# SELECT set_var(5);
 set_var
---------
 5
(1 row)

test=# SELECT increment_var(), increment_var();
 increment_var | increment_var
---------------+---------------
             6 | 7
(1 row)

It is especially noteworthy that the second call already sees the change made by the first one, which is exactly what we want here.

Shared variables and transactions

When working with shared variables in PL/Perl or PL/Python, you have to keep in mind that these changes are not transactional, unlike almost everything else in PostgreSQL. Even if you roll back a transaction, you can observe that the values stay incremented:

test=# BEGIN;
BEGIN
test=# SELECT increment_var(), increment_var();
 increment_var | increment_var
---------------+---------------
             8 | 9
(1 row)
test=# ROLLBACK;
ROLLBACK
test=# SELECT increment_var(), increment_var();
 increment_var | increment_var
---------------+---------------
            10 | 11
(1 row)

This behavior actually makes shared variables a nice tool if you want to preserve data across transactions.


Vladimir Svedov: A Guide to Pgpool for PostgreSQL - Part Two


This is the second part of the blog “A Guide to Pgpool for PostgreSQL”. The first part, covering load balancing, session pooling, in-memory cache and installation, can be found here.

Many users look to pgpool specifically for its High Availability features, and it has plenty to offer. There are quite a lot of instructions for pgpool HA on the web (e.g. a longer one and a shorter one), so it would not make any sense to repeat them. Neither do we want to provide yet another blind set of configuration values. Instead, I suggest playing against the rules and trying to do it the wrong way, so we’ll see some interesting behaviour. One of the most anticipated features (at least it’s at the top of the page) is the ability to recognise the usability of a “dead” ex-master and re-use it with pg_rewind. It could save hours of bringing back the new standby with big data (as we skip rsync or pg_basebackup, which effectively copy ALL files over from the new master). Strictly speaking, pg_rewind is meant for planned failover (during an upgrade or a migration to new hardware). But we’ve seen that it greatly helps with unplanned yet graceful shutdowns and automated failover - e.g., ClusterControl makes use of it when performing automatic failover of replication slaves. Let’s assume we have this case: we need (some) master to be accessible as much as possible. If for some reason (network failure, max connections exceeded or any other “failure” that forbids new sessions to start) we can no longer use the master for RW operations, and we have a failover cluster configured with slaves that can accept connections, we can promote one of the slaves and fail over to it.

First let’s assume we have three nodes:

  • 10.1.10.124:5400 with /pg/10/m (pgpool spins here as well)
  • 10.1.10.147:5401 with /pg/10/m2
  • 10.1.10.124:5402 with /pg/10/s2

Those are effectively the same nodes as in part one, but the failover node is moved to a different host and $PGDATA. I did this to make sure I did not mistype or forget some extra quote in the remote ssh command. Also, the debugging info will look simpler because the IP addresses are different. Finally, I was not sure I would be able to make this unsupported use case work, so I had to see it with my own eyes.

Failover

First we set failover_command and run pgpool reload and try to failover. Here and further, I will echo some info to /tmp/d on the pgpool server, so I can tail -f /tmp/d to see the flow.

postgres@u:~$ grep failover_command /etc/pgpool2/pgpool.conf
failover_command = 'bash /pg/10/fo.sh %D %H %R'

postgres@u:~$ cat /pg/10/fo.sh
rem_cmd="pg_ctl -D $3 promote"
cmd="ssh -T postgres@$2 $rem_cmd"
echo "$(date) $cmd" >>/tmp/d
$cmd &>>/tmp/d

NB: Do you have $PATH set in .bashrc on remote host?..

Let’s stop the master (I know that’s not how disaster strikes - you expect at least some huge monkey or red shining robot to smash the server with a huge hammer, or at least the boring hard disks to die - but I’m using this graceful variant to demo the possible use of pg_rewind, so here the failover will be the result of human error or a network failure lasting half a second longer than the health_check_period), so:

/usr/lib/postgresql/10/bin/pg_ctl -D /pg/10/m stop
2018-04-18 13:53:55.469 IST [27433]  LOG:  received fast shutdown request
waiting for server to shut down....2018-04-18 13:53:55.478 IST [27433]  LOG:  aborting any active transactions
2018-04-18 13:53:55.479 IST [28855] postgres t FATAL:  terminating connection due to administrator command
2018-04-18 13:53:55.483 IST [27433]  LOG:  worker process: logical replication launcher (PID 27440) exited with exit code 1
2018-04-18 13:53:55.484 IST [27435]  LOG:  shutting down
2018-04-18 13:53:55.521 IST [27433]  LOG:  database system is shut down
 done
server stopped

Now checking the failover command output:

postgres@u:~$ cat /tmp/d
Wed Apr 18 13:54:05 IST 2018 ssh -T postgres@localhost
pg_ctl -D /pg/10/f promote
waiting for server to promote.... done
server promoted

And checking after a while:

t=# select nid,port,st, role from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port |  st  |  role
-----+------+------+---------
   0 | 5400 | down | standby
   1 | 5401 | up   | primary
   2 | 5402 | up   | standby
(3 rows)

Also we see in ex-failover cluster logs:

2018-04-13 14:26:20.823 IST [20713]  LOG:  received promote request
2018-04-13 14:26:20.823 IST [20713]  LOG:  redo done at 0/951EC20
2018-04-13 14:26:20.823 IST [20713]  LOG:  last completed transaction was at log time 2018-04-13 10:41:54.355274+01
2018-04-13 14:26:20.872 IST [20713]  LOG:  selected new timeline ID: 2
2018-04-13 14:26:20.966 IST [20713]  LOG:  archive recovery complete
2018-04-13 14:26:20.998 IST [20712]  LOG:  database system is ready to accept connections

Checking replication:

postgres@u:~$ psql -p 5401 t -c "select now() into test"
SELECT 1
postgres@u:~$ psql -p 5402 t -c "select * from test"
              now
-------------------------------
 2018-04-13 14:33:19.569245+01
(1 row)

The slave /pg/10/s2:5402 switched to a new timeline thanks to recovery_target_timeline = latest in recovery.conf, so we are good. We don’t need to adjust recovery.conf to point to the new master, because it points to the pgpool ip and port and they stay the same no matter who is performing the primary master role.

Checking load balancing:

postgres@u:~$ (for i in $(seq 1 9); do psql -h localhost -p 5433 t -c "select current_setting('port') from ts limit 1" -XAt; done) | sort| uniq -c
      6 5401
      3 5402

Nice. Apps behind pgpool will notice a momentary outage and continue to work.

Reusing ex-master

Now we can turn the ex-master into a failover standby and bring it back (without adding a new node to pgpool, as it exists there already). If you don’t have wal_log_hints or data checksums enabled (a comprehensive comparison of these options is here), you have to recreate the cluster on the ex-master to follow the new timeline:

postgres@u:~$ rm -fr /pg/10/m
postgres@u:~$ pg_basebackup -h localhost -p 5401 -D /pg/10/m/

But don’t rush to run the statements above! If you took care of wal_log_hints (it requires a restart), you can try using pg_rewind for a much faster switch of the ex-master to a new slave.
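A quick sanity check before attempting it (a sketch; run on the server, and note that enabling wal_log_hints requires a restart):

SHOW wal_log_hints;
SHOW data_checksums;
ALTER SYSTEM SET wal_log_hints = on;  -- then restart the server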

So at the moment we have the ex-master offline and the new master started on the next timeline. If the ex-master was offline due to a temporary network failure and it comes back, we need to shut it down first. In the case above we know it’s down, so we can just try rewinding:

postgres@u:~$ pg_rewind -D /pg/10/m --source-server="port=5401 host=10.1.10.147"
servers diverged at WAL location 0/40605C0 on timeline 2
rewinding from last common checkpoint at 0/4060550 on timeline 2
Done!

And again:

postgres@u:~$ pg_ctl -D /pg/10/m start
server started
...blah blah 
postgres@u:~$ 2018-04-16 12:08:50.303 IST [24699]  LOG:  started streaming WAL from primary at 0/B000000 on timeline 2

t=# select nid,port,st,role from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port |  st  |  role
-----+------+------+---------
   0 | 5400 | down | standby
   1 | 5401 | up   | primary
   2 | 5402 | up   | standby
(3 rows)

Oops. Duh! Despite the fact that the cluster at port 5400 is online and follows the new timeline, we need to tell pgpool to recognize it:

postgres@u:~$ pcp_attach_node -w -h 127.0.0.1 -U vao -n 0
 pcp_attach_node  -- Command Successful

Now all three are up (and pgpool knows it) and in sync:

postgres@u:~$ sql="select ts.i::timestamp(0), current_setting('data_directory'),case when pg_is_in_recovery() then 'recovering' else 'mastering' end stream from ts order by ts desc"
postgres@u:~$ psql -h 10.1.10.147 -p 5401 t -c "$sql";
          i          | current_setting |  stream
---------------------+-----------------+-----------
 2018-04-30 14:34:36 | /pg/10/m2       | mastering
(1 row)

postgres@u:~$ psql -h 10.1.10.124 -p 5402 t -c "$sql";
          i          | current_setting |   stream
---------------------+-----------------+------------
 2018-04-30 14:34:36 | /pg/10/s2       | recovering
(1 row)

postgres@u:~$ psql -h 10.1.10.124 -p 5400 t -c "$sql";
          i          | current_setting |   stream
---------------------+-----------------+------------
 2018-04-30 14:34:36 | /pg/10/m        | recovering
(1 row)

Now I’ll try using recovery_1st_stage_command for reusing the ex-master:

root@u:~# grep 1st /etc/pgpool2/pgpool.conf
recovery_1st_stage_command = 'or_1st.sh'

But recovery_1st_stage_command does not receive the arguments needed for pg_rewind, which I can see if I add this line to the script:

echo "online recovery started on $(hostname) $(date --iso-8601) $0 $1 $2 $3 $4"; exit 1;

The output:

online recovery started on u2 2018-04-30 /pg/10/m2/or_1st.sh /pg/10/m2 10.1.10.124 /pg/10/m 5401

Well - using pg_rewind is still on the todo list - what did I expect?.. So I need to do some monkey hacking to get the master IP and port (remember, they will keep changing after failover).

A monkey hack

So I have something like this in recovery_1st_stage_command:

root@u:~# cat /pg/10/or_1st.sh
pgpool_host=10.1.10.124
pgpool_port=5433
echo "online recovery started on $(hostname) $(date --iso-8601) $0 $1 $2 $3 $4" | ssh -T $pgpool_host "cat >> /tmp/d"
master_port=$(psql -XAt -h $pgpool_host -p $pgpool_port t -c "select port from dblink('host=$pgpool_host port=$pgpool_port','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int) where role='primary'")
master_host=$(psql -XAt -h $pgpool_host -p $pgpool_port t -c "select hostname from dblink('host=$pgpool_host port=$pgpool_port','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int) where role='primary'")
failover_host=$(psql -XAt -h $pgpool_host -p $pgpool_port t -c "select hostname from dblink('host=$pgpool_host port=$pgpool_port','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int) where role!='primary' order by port limit 1")
src='"port=$master_port host=$master_host"'
rem_cmd="'pg_rewind -D $3 --source-server=\"port=$master_port host=$master_host\"'"
cmd="ssh -T $failover_host $rem_cmd"
echo $cmd | ssh -T $pgpool_host "cat >> /tmp/d"
$cmd

tmp=/tmp/rec_file_tmp
cat > $tmp <<EOF
standby_mode          = 'on'
primary_conninfo      = 'host=$master_host port=$master_port user=postgres'
trigger_file = '/tmp/tg_file'
recovery_target_timeline  = latest
EOF

scp $tmp $failover_host:$3/recovery.conf

rem_cmd="pg_ctl -D $3 start"
cmd="ssh -T $failover_host $rem_cmd"
echo $cmd | ssh -T $pgpool_host "cat >> /tmp/d"
$cmd
echo "OR finished $(date --iso-8601)" | ssh -T $pgpool_host "cat >> /tmp/d"
exit 0;

What a mess! Well, if you decide to use a feature that does not exist yet, be prepared: it will look bad, work worse, and you will permanently feel ashamed of what you did. So, step by step:

  • I need the pgpool IP and port to connect to it remotely, both to query “show pool_nodes” and to log steps and run commands.
  • I’m piping some debug info to /tmp/d over ssh, because the command will be executed on the master side, which will change after failing over.
  • I can use the result of “show pool_nodes” to get the running master’s connection info by simply filtering with a WHERE clause.
  • I need double quotes in the argument for pg_rewind, which has to run over ssh, so I just split the command for readability, then echo it and run it.
  • I prepare recovery.conf based on the output of “show pool_nodes” (writing it, I ask myself - why did I not just use the pgpool IP and port instead?..).
  • I start the new failover slave (I know I’m supposed to use the 2nd step - I just skipped it to avoid fetching all the IPs and ports again).

Now what’s left is to try using this mess in pcp:

root@u:~# pcp_recovery_node -h 127.0.0.1 -U vao -n 0 -w
pcp_recovery_node -- Command Successful
root@u:~# psql -h localhost -p 5433 t -c"select nid,port,st,role from dblink('host=10.1.10.124 port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int)"
 nid | port | st |  role
-----+------+----+---------
   0 | 5400 | up | standby
   1 | 5401 | up | primary
   2 | 5402 | up | standby
(3 rows)

Checking the /tmp/d on pgpool server:

root@u:~# cat /tmp/d
Tue May  1 11:37:59 IST 2018 ssh -T postgres@10.1.10.147 /usr/lib/postgresql/10/bin/pg_ctl -D /pg/10/m2 promote
waiting for server to promote.... done
server promoted
online recovery started on u2 2018-05-01 /pg/10/m2/or_1st.sh /pg/10/m2
ssh -T 10.1.10.124 'pg_rewind -D --source-server="port=5401 host=10.1.10.147"'
ssh -T 10.1.10.124 pg_ctl -D start
OR finished 2018-05-01

Now obviously we want to roll it over again to see if it works on any host:

postgres@u:~$ ssh -T 10.1.10.147 pg_ctl -D /pg/10/m2 stop
waiting for server to shut down.... done
server stopped
server stopped
postgres@u:~$ psql -h localhost -p 5433 t -c"select nid,port,st,role from dblink('host=10.1.10.124 port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int)"
 nid | port |  st  |  role
-----+------+------+---------
   0 | 5400 | up   | primary
   1 | 5401 | down | standby
   2 | 5402 | up   | standby
(3 rows)

root@u:~# pcp_recovery_node -h 127.0.0.1 -U vao -n 1 -w

postgres@u:~$ psql -h localhost -p 5433 t -c"select nid,port,st,role from dblink('host=10.1.10.124 port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int)"
 nid | port | st |  role
-----+------+----+---------
   0 | 5400 | up | primary
   1 | 5401 | up | standby
   2 | 5402 | up | standby
(3 rows)

The log looks similar - only the IPs and ports have changed:

 Tue May  1 11:44:01 IST 2018 ssh -T postgres@10.1.10.124 /usr/lib/postgresql/10/bin/pg_ctl -D /pg/10/m promote
waiting for server to promote.... done
server promoted
online recovery started on u 2018-05-01 /pg/10/m/or_1st.sh /pg/10/m 10.1.10.147 /pg/10/m2 5400
ssh -T 10.1.10.147 'pg_rewind -D /pg/10/m2 --source-server="port=5400 host=10.1.10.124"'
ssh -T 10.1.10.147 pg_ctl -D /pg/10/m2 start
online recovery started on u 2018-05-01 /pg/10/m/or_1st.sh /pg/10/m
ssh -T 10.1.10.147 'pg_rewind -D --source-server="port=5400 host=10.1.10.124"'
ssh -T 10.1.10.147 pg_ctl -D start
OR finished 2018-05-01

In this sandbox, the master moved to 5401 on failover and, after living there for a while, moved back to 5400. Using pg_rewind should make that as fast as possible. Previously, the scary part of automatic failover was that if you really messed up the config and did not foresee some force majeure, you could run into automatic failover to the next slave and the next and the next, until no free slave was left. And after that, you would just end up with several split-brained masters and no failover spare. It’s a poor consolation in such a scenario to have even more slaves to fail over to, but without pg_rewind you would not have even that. “Traditional” rsync or pg_basebackup copies ALL of $PGDATA over to create a standby, and can’t reuse a “not too different” ex-master.

In conclusion to this experiment I would like to emphasize once again: this is not a solution suitable for blind copy-pasting. The usage of pg_rewind with pgpool is not encouraged; it is not usable at all ATM. I wanted to add some fresh air to pgpool HA configuration, for newbies like me to observe a little more closely how it works, and for the coryphaei to smile at the naive approach and maybe see it through our newbie eyes.

Luca Ferrari: plperl: which version of Perl?


plperl is a great extension for PostgreSQL that allows the execution of Perl 5 code within the database.

plperl: which version of Perl?

When executing Perl 5 code within the database, PostgreSQL uses the embedded Perl 5 mechanism to create one (or more) instances of the interpreter. The version of the compiler and virtual machine that runs depends on how PostgreSQL has been compiled, or better, on how libperl.so has been created. It is possible to use a specific version of Perl without having to change the system-wide Perl 5, and in particular it is possible, with some effort, to use perlbrew to this aim.

Understanding which perl the database is executing

To know which perl executable the server will run, it is possible to use the Config module for a little introspection:

DO LANGUAGE plperlu $PERL$
   use Config;
   elog(INFO, 'Perl executable ' . $Config{perlpath});
   elog(INFO, 'Perl version '    . $Config{version});
   elog(INFO, 'Perl library '    . $Config{libperl});
$PERL$;

For example, the above piece of code produces the following output on my system:

INFO: Perl executable /usr/local/bin/perl
INFO: Perl version 5.24.3
INFO: Perl library libperl.so.5.24.3

That output clearly tells us that the perl executable is at version 5.24.3. The same could have been checked from the plperl.so library file, which is linked to the above version of the library.

pgCMH - Columbus, OH: See you at PyCon


Hey everyone! Just a quick note to let you all know that both CJ and myself are manning the PostgreSQL community booth at this year’s PyCon next week in Cleveland, OH.

The Python and PostgreSQL communities have a long history together, and we’re super excited to be able to represent PostgreSQL at the largest annual gathering of Pythonistas. If you can make the drive, come on up to Cleveland, say hi, and learn some Python!

https://us.pycon.org/2018/#!

Pavel Stehule: ncurses CUA menu demo

I finished a technology demo of my ncurses-based implementation of a CUA menu - a menubar and pull-down menus. Please check my github project https://github.com/okbob/ncurses-st-menu. I like it when an interface looks good, so I implemented a few styles (custom styles are possible).

Shaun M. Thomas: PG Phriday: BDR Around the Globe


With the addition of logical replication in Postgres 10, it’s natural to ask "what’s next"? Though not directly supported yet, would it be possible to subscribe two Postgres 10 nodes to each other? What kind of future would that be, and what kind of scenarios would be ideal for such an arrangement?

As it turns out, we already have a kind of answer thanks to the latency inherent to the speed of light: locality. If we can provide a local database node for every physical app location, we also reduce latency by multiple orders of magnitude.

Let’s explore the niche BDR was designed to fill.

What is Postgres-BDR?

Postgres-BDR is simply short for Postgres Bi-Directional Replication. Believe it or not, that’s all it needs to be. The implications of the name itself are numerous once fully explored, and we’ll be doing plenty of that.

So what does it do?

  • Logical replication
  • Multi-Master
  • Basic conflict resolution (last update wins)
  • Distributed locking
  • Global sequences
  • High latency replay (imagine a node leaves and comes back later)

How can two systems interact?

The key to having Multi-Master and associated functionality is logical replication. Once we can attach to the logical replication stream, we just need a piece of software to communicate between all participating nodes to consume those streams. This is necessary to prevent nodes from re-transmitting rows received from another system and resolve basic conflicts.

Why Geographical Distribution?

Specifically:

  • Local database instances
  • Inter-node communication is high latency
  • Eventual consistency is the only consistency (for now)
  • That’s a Good Thing!(tm)

We don’t want latency between our application and the database, so it’s better to operate locally. This moves the latency into the back-end between the database nodes, and that means the dreaded "eventual consistency" model. It’s a natural consequence, but we can use it to our advantage.

Consider This

We have an application stack that operates in four locations: Sydney, Dubai, Dallas, and Tokyo.

Applications in several countries

We tried to get a data center in Antarctica, but the penguins kept going on strike. Despite that, here’s a look at average latencies between those locations:

         Dallas   Dubai    Sydney   Tokyo
Dallas   xxx      230ms    205ms    145ms
Dubai    230ms    xxx      445ms    145ms
Sydney   200ms    445ms    xxx      195ms
Tokyo    145ms    145ms    195ms    xxx

Based on the lag, Tokyo seems to be something of a centralized location. Still, those round-trip-times are awful.

Eww, Math

Imagine our initial database is in Dubai.

450ms of lag

Say we start in Dubai and want to expand to the US at first. We may be tempted to simply spin up an AWS or Google Cloud instance or two running our application. But like all modern websites, we make liberal use of AJAX and each page contains multiple database-backed components. Even five simple queries could drastically bloat page load time and drive away potential customers.

That’s bad, m’kay? This is hardly a new problem, and there are several existing solutions:

  1. Front-end caching
    • Problems with extremely dynamic content.
    • Writes still need to go to Dubai.
  2. Back-end caching
    • Only works for small-medium size datasets
    • Writes still need to go to Dubai.
  3. Database replica
    • Writes still need to go to Dubai.

We could use something like Varnish to cache the pages themselves, and that works for a few cases where there isn’t a lot of dynamic or user-generated content. Or if there is, we better have a lot of cache storage. Hopefully we don’t have any pages that need to write data, because that still has to go back to Dubai.

Alternatively, we could use memcached, Cassandra, or some other intermediate layer the application can populate and interact with instead. This is better, but it can get stale depending on how we refresh these caches. Some of these cache layers can even capture writes locally and persist them by committing to the primary store… in Dubai. This makes for great interactivity, with potential implications regarding data consistency.

And then there are outright Postgres streaming replicas. This is much closer to ideal for highly dynamic content, barring any unexpected network interruptions. And as usual, writes must be made in Dubai.

Yet all three of these solutions suffer from the same downfall.

Broken Record

All those darn writes being rerouted to Dubai. Writes always need to end up in the primary database, and this is where BDR comes in.

If our primary database is in Dubai, all our writes must eventually make it there, one way or another. Then it must propagate through all of our caching layers and replicas before it’s considered committed by the end user.

Or we can remove the Middle Man with BDR. In order to do that, we want a situation where all writes are local to their region. Something like this:

Postgres installations all over the world

With BDR, we still have a "master" copy in every case, but it’s not integral to basic operation.

How do we Get There?

  1. Identify problem areas
    • Find all sequence usage
    • Isolate conflict scenarios
  2. Install BDR
  3. Create and subscribe

Now we get to the nitty-gritty. We need to start by making sure our sequences don’t clobber each other between the regions. Then we should consider all the ways interactions between regions might conflict. This may require a careful and time-consuming audit of the application stack, but it’s better than being surprised later.

Then we can work on deploying everything. Believe it or not, that’s probably the easiest part.

Seeking Sequences

We start our adventure by finding any serial columns that increment by using the Postgres nextval() function. BDR has a mechanism for ensuring sequence values never collide across the cluster, but we must first identify all sequences that are associated with a column that isn’t already a BIGINT. A query like this might work:

SELECT table_schema, table_name, column_name
  FROM information_schema.columns
 WHERE column_default LIKE 'nextval\(%'
   AND data_type != 'bigint';

BDR’s built-in global sequence functionality needs BIGINT columns because it uses a lot of bit-shift math and packs it into a 64-bit integer. The sequence is no longer sequential, but it’s unique for up to 8-million values per second, per node.

Conflict Avoidance

The best way to prevent conflicts is to avoid them outright. Aggregate columns like "total" or "sum" are potential hot-spots for conflicting updates. If such a column must be used, consider constructing it from a ledger or event log instead. This way it doesn’t matter if Sydney adds 100 to a balance, and Texas adds 50, because each event is atomic.
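A minimal sketch of that idea (the table and column names are illustrative): every change is an independent insert and the balance is computed on demand, so concurrent nodes never update the same row:

CREATE TABLE account_ledger (
  entry_id   BIGSERIAL PRIMARY KEY,
  account_id BIGINT NOT NULL,
  delta      NUMERIC NOT NULL,
  entry_time TIMESTAMPTZ NOT NULL DEFAULT now()
);

SELECT account_id, sum(delta) AS balance
  FROM account_ledger
 GROUP BY account_id;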

Once applications can, they should interact with data specific to their region whenever possible. The less cross-talk between different regions, the less potential there is for conflict. If we scale down slightly, we can even consider this within a single datacenter. The goal is to separate write targets by using sticky sessions, natural hashes, assigned application zones, and so on.

Similarly, even if a session is operating outside of its region, conflicts can be avoided if the application doesn’t switch back and forth. If an application can’t "see" multiple copies of the database at alternating times, it’s much less likely to generate a conflicting row due to latency mismatches.

Off to the Races

Why are stored aggregates bad? Race conditions. Consider this scenario on a simple account table with a balance column.

  1. Sydney adds 50 to 100 balance
  2. Tokyo adds 50 to 100 balance
  3. Result should be 200, but is 150 instead due to latency
  4. Ledger of credit / debit would always be correct
  5. Think of a cash register receipt
  6. Use materialized aggregates for reports or summaries

Going into more detail helps illustrate what kinds of conflicts we want to prevent. A ledger has no concept of sum, just plus or minus. It’s possible to refund specific line-items so the full history is maintained. We can also generate the summary for all rows in the grouping. If a historical row is temporarily missing on a remote node, transient summaries may temporarily vary, but the incorrect balance will never exist as actual data.

These kinds of modifications are the safest approach, but are not always necessary in isolated cases. This is why it’s important to evaluate first.

Shop S-Mart

Local database; local writes. This is important and comes in two flavors:

  1. Tokyo operates on Tokyo data; Texas on Texas
    • Update conflicts less likely
    • Fewer / no race conditions
  2. Dubai operates in Dubai; Sydney in Sydney
    • Prevents cross-data center conflicts
    • Local writes are faster anyway

Since every BDR node has a copy of the full database, we could theoretically operate outside of our region. Of course, it’s better if we don’t. If Tokyo operates on a row from Texas, there’s a good chance Texas will overwrite that change or cause a conflict. We can avoid a non-deterministic result by keeping regional operations together.

Alternatively, the application may be configured in Sydney and try to operate on Sydney rows in the Dubai data center. That’s fine until other Sydney nodes are simultaneously connected to the database in Sydney. Unless there’s a very good reason for this, it’s probably a bad idea. It’s slower, defeats the purpose of geographical database distribution in general, and is almost guaranteed to generate conflicts.

Of course, if one region must be offline for maintenance or other purposes, using another geographical database is encouraged. Though even then, since we’re talking about continental divides, it’s probably better to have multiple local alternatives instead.

Journey of 1000 Miles

Assuming we have an existing cluster, how do we distribute to new zones? Let’s start with a basic arbitrary table from a corporate ordering system.

CREATE TABLE system_order (
  order_id      SERIAL PRIMARY KEY,
  product_id    INT NOT NULL,
  customer_id   INT NOT NULL,
  unit_count    INT NOT NULL,
  status        VARCHAR NOT NULL DEFAULT 'pending',
  reading_date  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_system_order_product_id ON system_order (product_id);
CREATE INDEX idx_system_order_reading_date ON system_order (reading_date);

Take note of the SERIAL primary key, which corresponds to the underlying INT type. It’s an easy mistake if values aren’t expected to exceed 2 billion, but INT isn’t compatible with BDR global sequences. A script like this might work for converting the existing tables before adding BDR to the mix:

COPY (
  SELECT 'ALTER TABLE ' || table_schema || '.' || table_name ||
         ' ALTER COLUMN ' || column_name || ' TYPE BIGINT;'
    FROM information_schema.columns
   WHERE column_default LIKE 'nextval\(%'
     AND data_type != 'bigint'
) TO '/tmp/alter_columns.sql';

\i /tmp/alter_columns.sql

Be wary that changing an underlying column type in Postgres results in the entire table being rebuilt. This may require extended downtime, or a more protracted migration using logical replication to another cluster where the types have already been converted.

We should also convert any columns that reference these as foreign keys, since the types should match. The query that does this is rather daunting however, so if requested, we’ll share it in the comments.

Seeding the World

Let’s assume Dubai is the origin node. This is currently how we’d initialize a cluster using the upcoming BDR 3 syntax:

CREATE EXTENSION pglogical;
CREATE EXTENSION bdr;

SELECT bdr.create_node(
  node_name := 'dubai',
  local_dsn := 'dbname=storefront host=dubai.host'
);

SELECT bdr.create_node_group(
  node_group_name := 'megacorp'
);

SELECT bdr.wait_for_join_completion();

With BDR, we "seed" a cluster from a single instance. In this case, the company started in Dubai.

Non Sequential

Now let’s fix those sequences. BDR 3 has a function that makes a metadata change so it treats a sequence as global automatically, using the 64-bit packing technique discussed earlier. We can bulk-convert all of them this way:

SELECT bdr.alter_sequence_set_kind(c.oid, 'timeshard')
  FROM pg_class c
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE c.relkind = 'S'
   AND n.nspname NOT IN (
         'pg_catalog', 'information_schema', 'pg_toast',
         'pglogical', 'bdr'
       );

From this point on, values generated by nextval() for these sequences will be essentially arbitrary numeric results with up to 19 digits. For platforms that already use something like UUIDs, this kind of conversion isn’t necessary. And be wary of applications that may malfunction when interacting with such large numbers; numeric overflow is nobody’s friend.

Johnny Appleseed

Tokyo has the fastest RTT, so obviously it was the next step as the company grew. Let’s create a node there now on an empty shell database:

CREATE EXTENSION pglogical;
CREATE EXTENSION bdr;

SELECT bdr.create_node(
  node_name := 'tokyo',
  local_dsn := 'dbname=storefront host=tokyo.host'
);

SELECT bdr.join_node_group(
  join_target_dsn := 'dbname=storefront host=dubai.host'
);

SELECT bdr.wait_for_join_completion();

Looks similar to creating the seed node, right? The only difference is that we need to join an existing BDR cluster instead of starting a new one. And then we just keep repeating the process for Texas, Sydney, or wherever we want until we’re satisfied. Nodes everywhere, until latency is finally at an acceptable level.

Next Steps

Once the cluster exists:

  1. Point applications to local copies
  2. Create any necessary read replicas/caches
  3. Be smug (optional)

For every node we roll out, we can move the local application stack to use it instead of the initial copy in Dubai. That will make both the local users and the Dubai database much happier.

This also means each local BDR node can have any number of streaming replicas for read scaling or standby targets. This alone warrants another entire blog post, but we’ll save that for another time.

And of course if you’re so inclined, you can be happy for drastically improving the company’s application infrastructure. Don’t worry, we won’t judge.

Finishing UP

The remaining caveats aren’t too onerous.

Since BDR uses replication slots, if a node is unreachable, Postgres will begin to retain WAL files for consumption when it returns. To plan for this, we can measure how many WAL files are generated over a certain time period. Each one is 16MB, which tells us how much space to set aside. If a node goes over this limit, we remove it.
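For example, assuming a hypothetical node that generates 100 WAL segments per hour and a tolerance of one full day of downtime, that works out to 100 × 24 × 16MB, or roughly 37.5GB of space to set aside.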

Conflicts can be logged to a table, and we recommend enabling that functionality. Then a monitoring system should always watch the contents of that table on each individual node. Conflicts resolve locally, so if they happen, only the node that experienced the conflict will have rows there. Keep an eye on this to make sure BDR’s resolution was the right one.

A transaction on a local BDR node gets committed locally before it enters the Postgres logical replication stream. This asynchronous nature is a big part of why BDR works so well locally. It’s also the reason conflicts are possible. Keep this in mind and it will always be easier to troubleshoot conflicts.

In the end, we’re left with something that contributes to a global cumulative data resource, with the same responsiveness of local access. All thanks to harnessing some of the deeper capabilities of logical replication.

As long as we understand the inherent design limitations of multi-master architectures, we can safely deploy our application stack and dedicated databases anywhere in the world.

For more information on use cases, deployment strategies, and other concepts related to multi-master and BDR, we have a Postgres-BDR whitepaper that goes into more depth about these topics.


Laurenz Albe: Avoiding “OR” for better query performance

To be OR not to be...
© Laurenz Albe 2018

PostgreSQL query tuning is our daily bread at Cybertec, and once you have done some of that, you’ll start bristling whenever you see an OR in a query, because it is usually a cause of bad query performance.

Of course there is a reason why there is an OR in SQL, and if you cannot avoid it, you have to use it. But you should be aware of the performance implications.

In this article I’ll explore “good” and “bad” ORs and what you can do to avoid the latter.

A little sample schema

We’ll use this simple setup for demonstration:

CREATE TABLE a(id integer NOT NULL, a_val text NOT NULL);

INSERT INTO a
   SELECT i, md5(i::text)
   FROM generate_series(1, 100000) i;

CREATE TABLE b(id integer NOT NULL, b_val text NOT NULL);

INSERT INTO b
   SELECT i, md5(i::text)
   FROM generate_series(1, 100000) i;

ALTER TABLE a ADD PRIMARY KEY (id);
ALTER TABLE b ADD PRIMARY KEY (id);
ALTER TABLE b ADD FOREIGN KEY (id) REFERENCES a;

VACUUM (ANALYZE) a;
VACUUM (ANALYZE) b;

Suppose that we want to run queries with equality and LIKE conditions on the text columns, so we need some indexes:

CREATE INDEX a_val_idx ON a(a_val text_pattern_ops);
CREATE INDEX b_val_idx ON b(b_val text_pattern_ops);

Have a look at the documentation if you don’t understand text_pattern_ops.

The “good” OR

An OR is fine in most parts of an SQL query: if it is not used to filter out rows from your query result, it will have no negative effect on query performance.

So if your OR appears in a CASE expression in the SELECT list, don’t worry.
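For example, this OR only computes a value for each result row and never filters anything, so it is harmless:

SELECT id,
       CASE WHEN a_val LIKE 'something%' OR a_val LIKE 'other%'
            THEN 'interesting'
            ELSE 'boring'
       END AS category
FROM a;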

Unfortunately you usually find the OR where it hurts: in the WHERE clause.

The “bad” OR

Now for an example of an OR in a WHERE clause that is still pretty nice:

EXPLAIN (COSTS off)
SELECT id FROM a
WHERE id = 42
   OR a_val = 'value 42';

                        QUERY PLAN                         
-----------------------------------------------------------
 Bitmap Heap Scan on a
   Recheck Cond: ((id = 42) OR (a_val = 'value 42'::text))
   ->  BitmapOr
         ->  Bitmap Index Scan on a_pkey
               Index Cond: (id = 42)
         ->  Bitmap Index Scan on a_val_idx
               Index Cond: (a_val = 'value 42'::text)
(7 rows)

PostgreSQL can actually use an index scan for the query, because it can combine the bitmaps for both indexes with a “bitmap OR”.
Note, however, that a bitmap index scan is much more expensive than a normal index scan — it has to scan the complete index. Moreover, it uses much more RAM; each of these bitmaps can use up to work_mem memory.

A multi-column index on (id, a_val) won’t help at all with this query, so there is no really cheap way to execute this query.
If you need better performance, see the trick from the “ugly” section below.

IN is better than OR

Now for a more stupid variant of the above query:

EXPLAIN (COSTS off)
SELECT id FROM a
WHERE id = 42
   OR id = 4711;

                 QUERY PLAN                 
--------------------------------------------
 Bitmap Heap Scan on a
   Recheck Cond: ((id = 42) OR (id = 4711))
   ->  BitmapOr
         ->  Bitmap Index Scan on a_pkey
               Index Cond: (id = 42)
         ->  Bitmap Index Scan on a_pkey
               Index Cond: (id = 4711)
(7 rows)

Again, a bitmap index scan is used. But there is a simple method to rewrite that query without the pesky OR:

EXPLAIN (COSTS off)
SELECT id FROM a
WHERE id IN (42, 4711);

                    QUERY PLAN                     
---------------------------------------------------
 Index Only Scan using a_pkey on a
   Index Cond: (id = ANY ('{42,4711}'::integer[]))
(2 rows)

You see? As soon as you get rid of the OR, an efficient index scan can be used!

You might say that this is good for equality conditions, but what about the following query:

SELECT id FROM a
WHERE a_val LIKE 'something%'
   OR a_val LIKE 'other%';

To improve that query, observe that the PostgreSQL optimizer rewrote the IN in the previous query to = ANY.

This is a case of the standard SQL “quantified comparison predicate”: <comparison operator> ANY is true if the comparison is TRUE for any of the values on the right-hand side (the standard only defines this for subqueries on the right-hand side, but PostgreSQL extends the syntax to arrays).

Now LIKE is a comparison operator as well, so we can write:

EXPLAIN (COSTS off)
SELECT id FROM a
WHERE a_val LIKE ANY (ARRAY['something%', 'other%']);

                        QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on a
   Filter: (a_val ~~ ANY ('{something%,other%}'::text[]))
(2 rows)

Unfortunately, the index cannot be used here.

pg_trgm to the rescue

But we are not at the end of our wits yet! There is such a wealth of indexes in PostgreSQL; let’s try a different one. For this, we need the pg_trgm extension:

CREATE EXTENSION pg_trgm;

Then we can create a GIN trigram index on the column:

CREATE INDEX a_val_trgm_idx ON a USING gin (a_val gin_trgm_ops);

Now things are looking better:

EXPLAIN (COSTS off)
SELECT id FROM a
WHERE a_val LIKE ANY (ARRAY['something%', 'other%']);

                             QUERY PLAN                             
--------------------------------------------------------------------
 Bitmap Heap Scan on a
   Recheck Cond: (a_val ~~ ANY ('{something%,other%}'::text[]))
   ->  Bitmap Index Scan on a_val_trgm_idx
         Index Cond: (a_val ~~ ANY ('{something%,other%}'::text[]))
(4 rows)

Feel the power of trigram indexes!

Note 1: This index can also be used if the search pattern starts with %.

Note 2: The GIN index can become quite large. To avoid that, you can also use a GiST index, which is much smaller, but less efficient to search.
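
For comparison, a sketch of the GiST variant, using the gist_trgm_ops operator class from the same extension (the index name is illustrative):

CREATE INDEX a_val_trgm_gist_idx ON a USING gist (a_val gist_trgm_ops);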

The “ugly” OR

Things become really bad if OR combines conditions from different tables:

EXPLAIN (COSTS off)
SELECT id, a.a_val, b.b_val
FROM a JOIN b USING (id)
WHERE a.id = 42
   OR b.id = 42;

                 QUERY PLAN                  
---------------------------------------------
 Merge Join
   Merge Cond: (a.id = b.id)
   Join Filter: ((a.id = 42) OR (b.id = 42))
   ->  Index Scan using a_pkey on a
   ->  Index Scan using b_pkey on b
(5 rows)

Here we have to compute the complete join between the two tables and afterwards keep only the rows that match the condition. In our example, that means computing 100000 rows only to throw away the 99999 that do not match the condition.

Avoiding the ugly OR

Fortunately, there is an equivalent query that is longer to write, but much cheaper to execute:

EXPLAIN (COSTS off)
   SELECT id, a.a_val, b.b_val
   FROM a JOIN b USING (id)
   WHERE a.id = 42
UNION
   SELECT id, a.a_val, b.b_val
   FROM a JOIN b USING (id)
   WHERE b.id = 42;

                        QUERY PLAN                        
----------------------------------------------------------
 Unique
   ->  Sort
         Sort Key: a.id, a.a_val, b.b_val
         ->  Append
               ->  Nested Loop
                     ->  Index Scan using a_pkey on a
                           Index Cond: (id = 42)
                     ->  Index Scan using b_pkey on b
                           Index Cond: (id = 42)
               ->  Nested Loop
                     ->  Index Scan using a_pkey on a a_1
                           Index Cond: (id = 42)
                     ->  Index Scan using b_pkey on b b_1
                           Index Cond: (id = 42)
(14 rows)

Both parts of the query can make use of efficient index scans and return one row, and since the rows happen to be identical, UNION will reduce them to one row.

If you can be certain that both branches of the query will return distinct sets, it is better to use UNION ALL instead of UNION, because that doesn’t have to do the extra processing to remove duplicates.

When using this trick, you should be aware that rewriting a query in that fashion does not always result in an equivalent query: if the original query can return identical rows, these would be removed by the UNION. In our case, we don’t have to worry, because the primary keys were included in the query result. I find that this is hardly ever a problem in practice.
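
As a sketch, here is the UNION ALL variant of the query above; the added condition makes the two branches provably disjoint (in this toy example the second branch happens to be empty, but the pattern generalizes):

   SELECT id, a.a_val, b.b_val
   FROM a JOIN b USING (id)
   WHERE a.id = 42
UNION ALL
   SELECT id, a.a_val, b.b_val
   FROM a JOIN b USING (id)
   WHERE b.id = 42 AND a.id <> 42;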

The post Avoiding “OR” for better query performance appeared first on Cybertec.

Dimitri Fontaine: PostgreSQL Data Types: Point


Continuing our series of PostgreSQL Data Types today we’re going to introduce the PostgreSQL Point type.

In order to put the Point datatype in a context where it makes sense, we’re going to download a complete geolocation data set and normalize it, thus making good use of both the normalization good practice and those other PostgreSQL data types we’ve been learning about in the previous articles of this series.

Buckle-up, this is a long article with a lot of SQL inside.

Luca Ferrari: plperl: invoking other subroutines


plperl does not allow direct sub invocation, so the only way is to execute a query.

plperl: invoking other subroutines

The official plperl documentation shows a way to invoke code shared across different plperl functions via a subref stored in the special global hash %_SHARED. While this is a good approach, it only works for code attached to the hash, that is, a kind of closure (e.g., a dispatch table), and it requires initializing %_SHARED each time, since plperl interpreters share nothing across sessions.
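
For reference, here is a minimal sketch of that %_SHARED approach; the function names (setup_shared, greet) are illustrative, not from the original post, and setup_shared() must be called in each new session before greet() can work:

CREATE OR REPLACE FUNCTION setup_shared() RETURNS void AS $PERL$
    # attach a code reference (a closure) to the global %_SHARED hash
    $_SHARED{greet} = sub {
        my $name = shift;
        return "Hello, $name!";
    };
$PERL$ LANGUAGE plperl;

CREATE OR REPLACE FUNCTION greet(name text) RETURNS text AS $PERL$
    my ($name) = @_;
    # fails unless setup_shared() has already run in this session
    return $_SHARED{greet}->($name);
$PERL$ LANGUAGE plperl;

SELECT setup_shared();
SELECT greet('PostgreSQL');   -- Hello, PostgreSQL!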

The other approach, which always works, is to execute a query that performs the SELECT invoking the target function. As an example:

CREATE OR REPLACE FUNCTION plperl_trampoline(fun_name text)
RETURNS text
AS $PERL$
    my ($fun_name) = @_;
    return undef if (! $fun_name);

    elog(DEBUG, "Calling [$fun_name]");
    my $result_set = spi_exec_query("SELECT $fun_name() AS result;");
    return $result_set->{rows}[0]->{result};
$PERL$ LANGUAGE plperl;

so that you can simply do:

> select plperl_trampoline('now');
      plperl_trampoline
------------------------------
 2018-05-04 13:09:17.11772+02

The problem with this solution should be clear: it can work only for a...

Joshua Drake: PostgresConf US 2018: A Review of the largest PostgresConf ever and a recap of the last 12 months!

The PostgresConf team wanted to provide some information on the performance of PostgresConf US 2018 and events over the past year, as well as potentially answer some pending questions. Ultimately our goals are about people, which is why our motto is, "People, Postgres, Data." With each event we hold, each talk we accept, and how we train our volunteers, we make sure that people (the benefit for and to them), Postgres, and data are considered and included. If there is no benefit or consideration to the growth of people, it is not an option.

With that in mind, please read on to see how our focus on people, Postgres, and data had an impact on the community over the last year.

Since PostgresConf US 2017 we have had events in:
  • Philadelphia 
  • Ohio (in combination with Ohio Linux Fest) 
  • South Africa 
  • Seattle 
  • Austin 
  • Jersey City (PostgresConf US 2018) 
  • Nepal 
All of these events are non-profit and volunteer organized.





PostgresConf US 2018


Logistics


  • Days: 5, 2 for training, 3 for Breakout sessions and summits
  • Official Attendance #: 601
  • Content: Over 207 sessions submitted
  • Sessions: Over 108 sessions provided 


Partner Support (Sponsors): 28


We had a record level of support from partners this year, and due to this support we are going to be forced to find a new venue for next year: our Jersey City location no longer has the capacity to hold us. This will increase costs, but initial indications are that our partners understand this and are willing to support us financially to help continue the success of our efforts and keep costs reasonable for attendees.


Diversity


This year we were able to work with Women Who Code NYC. They provided many volunteers and we provided them with the ability to experience some of the best Postgres based content available, at no charge. We expect great things from this budding relationship in the future.


Professional Growth


We held a Career and Talent Fair. A dozen companies were present to connect with potential employees.

We also held a surprisingly well-attended speed mentoring session on resumes and interview practices for potential employees (especially helpful for many of the WWC volunteers).

Leadership


This year saw the continued elevation of our primary leadership: Viral Shah, Lloyd Albin, Amanda Nystrom, and Debra Cerda. They continued to increase their presence and responsibility within the conference and dedicated hundreds of hours voluntarily to the growth of people. Our international members have also increased their leadership roles with our on-the-ground teams in South Africa and China.


Summits



We had our standard Regulated Industry Summit but also a Greenplum Summit. As I am sure you are aware, Greenplum is an open-source, Postgres-based MPP database. Theirs was by far the most popular booth in the entire conference, and their summit was very well attended. The relationship with Pivotal and the success of the Greenplum Summit allowed us to learn new ways to bring together the entire Postgres Ecosystem. We expect to run a minimum of 3 more summits at PostgresConf US 2019.



Contribution


We were able to have several excellent (and long) meetings with leaders of Pivotal, Microsoft, Google, and Amazon on how they can begin contributing more back to Postgresql.org. All of them expressed a deep drive to contribute and a desire to learn more about the core community. Of particular note is Google, who would like to contribute the following back to the community:

https://github.com/google/pg_page_verification

We discussed with them the process and the various changes they would need to make (license, code style, etc.). We also educated them on PostgreSQL.Org's rigorous review process.

Microsoft is reviewing how they can contribute but they showed an interest in build farm nodes, professional technical writers to help with docs, and potentially code contribution to our Windows port.

International Collaboration

The Chinese Open Source Promotion Union launched the Chinese Postgres Association. We invited them to PostgresConf US and introduced them to the United States Community. We expect great things from the Chinese community in the future.

Future


As we continue to build up our on-the-ground teams, we will likely hold fewer events in the U.S. this year, focusing instead on a smaller number of U.S. events while adding events in China and Europe. We have had an amazing amount of support from the Chinese community, and the current goal is 1000 attendees for that conference.

Our current plan of events for the U.S. are


  • San Jose (October 2018) 
  • Philadelphia 
  • PostgresConf US (Manhattan) 

Future International Events


  • October 2018. 
  • Spring of 2019. 
  • Spring of 2019. 

This may change as we are actively recruiting on-the-ground teams to help us grow the community.

Collaboration


Our goal is collaboration and growth with other PostgreSQL community and ecosystem efforts. We want each potential community member to find a home: a place where they feel positive about contributing to the community as a whole. As we continue to grow as a community, it is vital to recognize that each member has their own needs, desires, and return-on-investment requirements (professional or personal).

Tidbits of note



On DB-Engines, PostgreSQL is the 4th most popular database, but the significant point is that, of the top four, we are the only one growing in popularity.

Craig Kerstiens: It's the future (for databases)


Hi. I work as a data architect in San Francisco and I’m auditing Dr. Jones’ class to stay up to date on the latest technologies, and she mentioned you might be able to help me before I get too deep into the design of a new system.

I would be happy to help. Can you give me an overview of where you’re at?

Well my default was just to use Postgres. I had a few questions on what schema designs might make most sense.

Well I’m working with more interesting data architectures. Really getting excited about what’s possible with neomodern data architectures, they make it so my app devs can build any feature their hearts desire.

I thought your expertise used to be relational databases?

It was, but neomodern data architectures are better. Neomodern data architectures allow it so app devs can build any feature they like without having to think about data models. Really, it’s the future of databases.

Hmmm, my app is a pretty straightforward web app that allows salespeople to send campaigns and track metrics on the campaigns. My app should be fine in a Postgres database right?

That might work for you, but really, if you have to define a data model up front, then you’re limiting yourself.

How do you mean limiting? Can’t I just add new tables as I need them? Or add columns to existing tables?

Well you could, but meta document stores can take care of that for you

Meta document stores?

Yeah, you have a custom document for each data model, but then have a document store that auto parses each document and can tell you the structure without having to read the document itself

Oh nice, so I can have an index of all my data.

Exactly, and because you have a bunch of smaller document stores you can distribute them around the world and read the data from the nearest one

So reading from the nearest copy of data would be faster?

Well it’s supposed to be, but to make sure the data is correct you have to read from at least 3 nodes to get consensus

Consensus?

Yeah, consensus. Since the underlying storage is all in blockchain, there has to be an agreement about which of the data is the source of truth

How long does consensus take?

Right now, most queries take around 400 milliseconds, but I’m planning to rewrite the consensus algorithm over the next two months

In the past my Postgres database would perform reads in usually a millisecond or less. 400 ms doesn’t sound any better?

Well the key is scale, at larger scale. We’re planning ahead for really large growth so we needed to have something distributed in place

And what about writes? Is write performance just as bad?

Writes take about 3 seconds, but we don’t write to the database directly

You don’t write to the database?! What do you write to then?

Well we tried to write directly to the database, but for some reason there was clock skew which gave inconsistencies. So now we’re planning to upgrade to TrueTime atomic clocks in July.

What the hell! Where do you get atomic clocks from?

We sent our infrastructure guy to China to get our atomic clocks custom designed

This is running on your own hardware? Not on a cloud vendor like AWS, Microsoft, or Google?

We budget our hardware on a 5 year cycle, and accounting wouldn’t let us move the classification. Over the next 3 years, we’ll save about 50% on hardware cost, and almost break even when you include the people hours

Ouch! That sounds like a lot of work. Wait, can we back up for a minute. You said you don’t write to the database. If you don’t write directly to the database, what do you write to then?

Right now we dual write to Redis and Memcached

You’re running two things that you write to?! Why would you even need to do that?

They’re not normally used for transactions, but since they’re so fast, we can write to both. And since we write to both, we have a better chance of transactions

This is sounding like more work than I was hoping for. Is that all I need to know?

Oh we haven’t even gotten to the big wins yet

Okay, color me curious. What’s the payoff to all this investment?

I’ve seen the biggest value in how we’ve been able to give each analyst their own data lake.

Data lake?

Yeah, it’s like a partial copy of the database that has just what that person wants

You lost me

Because it’s schemaless, we have to use the meta document to parse it into a structured format so people can read it

How do you parse it?

We have event streams that feed out from Kafka

Where do you stream it?

Into Apache Spark

What’s Spark then do?

It computes pre-pre-aggregates

And those then get queried?

Oh, no. We push the pre-pre-aggregates to Druid, which computes our pre-aggregates

Okay, then you query the pre-aggregates?

No, those go into our data warehouse. Our data warehouse is great for analysts. It doesn’t help for user-facing things, but our analysts can create some really powerful reports.

Oh, so you can make business decisions in real time?

Well because the pipeline data is usually about 12 hours behind, our business decisions are 12 hours behind. But we’re confident we can get it down to 8 hrs.

Okay, I think you lost me somewhere, but you’re saying the application developers are way more productive at least?

Oh the application developers haven’t gotten to try it yet, we’re just wrapping up our staging environment, then we have to start ordering the hardware for production. We plan to go live by 2022.

I think I may just stick with Postgres.

Federico Campoli: The maze of the elephant


In the previous post we introduced the PostgreSQL’s dependency system.

At first sight the implementation can look like a maze where the succession of relationships are not clear.

This post will try to give a practical example to show how pg_depend can act like an Ariadne’s thread in order to resolve the dependencies.

The scenario presented is very simple but can be used as a starting point for more complex requirements.
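
As a taste of what such dependency resolution looks like, here is a hypothetical query of the kind the post builds on (the table name some_table is illustrative); pg_depend records each dependency as a pair of object references plus a dependency type:

SELECT classid::regclass AS dependent_catalog,
       objid,
       deptype
FROM pg_depend
WHERE refobjid = 'some_table'::regclass;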

Berend Tober: An Overview of the Serial Pseudo-Datatype for PostgreSQL


Introduction

PostgreSQL natively supplies a rich diversity of data types supporting many practical use cases. This article introduces the special implementation of serial data types typically used for creation of synthetic primary keys.

Unique Keys

A foundational precept of database design theory is that each tuple (i.e., row) of a relation (i.e., table) must be uniquely identified from other tuples. The attributes, or columns, that together distinctly identify one tuple from all the others are called a "key". Some purists maintain that any modeled object or concept inherently possesses an attribute or set of attributes that can serve as a key and that it is important to identify this set of key attributes and utilize them for the unique selection of tuples.

But as a practical matter, identifying a sufficiently large set of attributes assuring uniqueness for a modeled object may be impractical, and so for real-world implementations, developers often turn to synthetic keys as a surrogate. That is, rather than relying on some combination of actual attributes, a value internal to the database, typically an incremented integer with otherwise no physical meaning, is defined as the key. In addition to the simplicity of a single-column key, the fact that there is no real-world dependency means that external factors can never force a need to change the value, as might be the case, for instance, if a person's name were used as a key ... and then the person married or entered a federal government witness protection program and changed their name. Even some values commonly thought by laypersons to be unique and immutable, such as the U.S. social security number, are neither: a person can obtain a new SSN, and SSNs sometimes are re-used.

Declaring a Serial Data Type

PostgreSQL provides a special datatype declaration to satisfy this need for synthetic keys. Declaring a database table column as type SERIAL satisfies the requirement for synthetic keys by supplying unique integers upon inserts of new tuples. This pseudo-datatype implements an integer data type column with an associated default value derived via a function call that supplies incremented integer values. Executing the following code to create a simple table with an id column of type serial:

CREATE TABLE person (id serial, full_name text);

actually executes the following DDL:

CREATE TABLE person (
    id integer NOT NULL,
    full_name text
);

CREATE SEQUENCE person_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;
ALTER SEQUENCE person_id_seq OWNED BY person.id;
ALTER TABLE ONLY person
    ALTER COLUMN id
    SET DEFAULT nextval('person_id_seq'::regclass);

That is, the keyword "serial" as a datatype specification implies execution of DDL statements that create an integer column with a NOT NULL constraint and a SEQUENCE, and then ALTER the column default to call a built-in function accessing that SEQUENCE.

The built-in function nextval performs an autoincrement service: each time nextval is called, it increments the specified sequence counter and returns the newly-incremented value.

You can see the result of this effect by examining the table definition:

postgres=# \d person
                            Table "public.person"
  Column   |  Type   |                      Modifiers
-----------+---------+------------------------------------------------------
 id        | integer | not null default nextval('person_id_seq'::regclass)
 full_name | text    |

Inserting Serial Values

To make use of the auto-increment functionality, we simply insert rows, relying on the default value for the serial column:

INSERT INTO person (full_name) VALUES ('Alice');
SELECT * FROM person;
 id | full_name
----+-----------
  1 | Alice
(1 row)

We see that a value for the id column corresponding to the new “Alice” row has been automatically generated. Alternatively, one can make use of the DEFAULT keyword if explicitly listing all column names is desired:

INSERT INTO person (id, full_name) VALUES (DEFAULT, 'Bob');
SELECT * FROM person;
 id | full_name
----+-----------
  1 | Alice
  2 | Bob
(2 rows)

where we see the auto-increment functionality more apparently, assigning the serially-next value to the new row for the second insert of “Bob”.

Inserting multiple rows even works:

INSERT INTO person (full_name) VALUES ('Cathy'), ('David');
SELECT * FROM person;
 id | full_name
----+-----------
  1 | Alice
  2 | Bob
  3 | Cathy
  4 | David
(4 rows)

Missing Serial Values

The built-in nextval function is optimized for non-blocking, high-concurrency applications and so does not respect rollback. Consequently, there may be missing values in the sequence. Below, we roll back an insert, and then see that a subsequent insert gets a new value that skips over the value associated with the aborted transaction:

BEGIN TRANSACTION;
INSERT INTO person (full_name) VALUES ('Eve');
SELECT * FROM person;
 id | full_name
----+-----------
  1 | Alice
  2 | Bob
  3 | Cathy
  4 | David
  5 | Eve
(5 rows)
ROLLBACK;
INSERT INTO person (full_name) VALUES ('Fred');
SELECT * FROM person;
 id | full_name
----+-----------
  1 | Alice
  2 | Bob
  3 | Cathy
  4 | David
  6 | Fred
(5 rows)

The advantage of not respecting rollbacks is that other sessions attempting concurrent inserts are not blocked by other inserting sessions.

Another way to end up with missing values is if rows are deleted:

DELETE FROM person WHERE full_name = 'Cathy';
SELECT * FROM person;
 id | full_name
----+-----------
  1 | Alice
  2 | Bob
  4 | David
  6 | Fred
(4 rows)

Note that the sequence counter does not revert even after deleting the most recently inserted row, the one with the largest auto-incremented id value: after the row for 'Fred' is deleted, the sequence still remembers the largest value it has handed out and increments from there for subsequent inserts:

DELETE FROM person WHERE full_name = 'Fred';
INSERT INTO person (full_name) VALUES ('Gina');
SELECT * FROM person;
 id | full_name
----+-----------
  1 | Alice
  2 | Bob
  4 | David
  7 | Gina
(4 rows)

Gaps or missing values as shown above are apparently viewed as a problem by some application developers: on the PostgreSQL General mailing list, the question of how to avoid sequence gaps when employing the serial pseudo-datatype comes up with slow-but-steady regularity. Sometimes there is no actual underlying business requirement; it's just a matter of personal aversion to missing values. But there are circumstances when preventing missing numbers is a real need, and that is the subject for a subsequent article.

NO YOU CAN'T - YES YOU CAN!

The NOT NULL constraint implied by the serial pseudo-datatype protects against the insertion of NULL for the id column by rejecting such insert attempts:

INSERT INTO person (id, full_name) VALUES (NULL, 'Henry');
ERROR:  null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, Henry).

Thus, we are assured of having a value for that attribute.

However, a problem some people encounter is that, as declared above, nothing prevents explicit insertion of values, bypassing the default autoincrement value derived via invocation of the nextval function:

INSERT INTO person (id, full_name) VALUES (9, 'Ingrid');
SELECT * FROM person;
 id | full_name
----+-----------
  1 | Alice
  2 | Bob
  4 | David
  7 | Gina
  9 | Ingrid
(5 rows)

But then, two inserts later, using the default produces a duplicate value for the id column, since nothing checks inserted column values against the sequence:

INSERT INTO person (full_name) VALUES ('James');
INSERT INTO person (full_name) VALUES ('Karen');
SELECT * FROM person;
 id | full_name
----+-----------
  1 | Alice
  2 | Bob
  4 | David
  7 | Gina
  9 | Ingrid
  8 | James
  9 | Karen
(7 rows)

If we were in fact using the serial id column as a key, we would have declared it as a PRIMARY KEY or at least created a UNIQUE INDEX. Had we done that, the 'Karen' insert above would have failed with a duplicate key error. The most recent release of PostgreSQL includes new identity column syntax ('generated by default as identity' and 'generated always as identity') which avoids this pitfall and some other legacy issues related to the serial pseudo-datatype.
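
Here is a sketch of that identity syntax, available since PostgreSQL 10 (the table name is illustrative); the GENERATED ALWAYS variant goes further and rejects explicit id values unless OVERRIDING SYSTEM VALUE is specified:

CREATE TABLE person_v2 (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    full_name text
);

-- rejected: GENERATED ALWAYS refuses an explicit value for id
INSERT INTO person_v2 (id, full_name) VALUES (9, 'Ingrid');

-- accepted: explicitly override the identity default
INSERT INTO person_v2 (id, full_name) OVERRIDING SYSTEM VALUE VALUES (9, 'Ingrid');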

Sequence Manipulation Functions

In addition to the nextval function already mentioned, which advances the sequence and returns the new value, there are a few other functions for querying and setting sequence values: the currval function returns the value most recently obtained with nextval for the specified sequence, the lastval function returns the value most recently obtained with nextval for any sequence, and the setval function sets a sequence's current value. These functions are called with simple queries, for example:

SELECT currval('person_id_seq');
 currval
---------
       9
(1 row)

And note that if a call is made to the nextval function independently of actually performing an insert, it does increment the sequence, and that will be reflected in subsequent inserts:

SELECT nextval('person_id_seq');
 nextval
---------
      10
(1 row)
INSERT INTO person (full_name) VALUES ('Larry');
SELECT * FROM person;
 id | full_name
----+-----------
  1 | Alice
  2 | Bob
  4 | David
  7 | Gina
  9 | Ingrid
  8 | James
  9 | Karen
 11 | Larry
(8 rows)
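
The setval function mentioned earlier deserves a quick sketch of its own; it is not run here, so as to leave the running example's sequence untouched:

-- make the next nextval('person_id_seq') return 100
SELECT setval('person_id_seq', 99);

-- equivalent: store 100 and mark it as not yet returned
SELECT setval('person_id_seq', 100, false);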

Conclusion

We have introduced a basic understanding of the PostgreSQL SERIAL pseudo-datatype for auto-incremented synthetic keys. For illustration in this article, we used the SERIAL type declaration, which creates a 4-byte integer column. PostgreSQL accommodates different range needs with the SMALLSERIAL and BIGSERIAL pseudo-datatypes for, respectively, 2-byte and 8-byte column sizes. Look for a future article on one means of addressing the need for sequences with no missing values.
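
For reference, the declarations differ only in the keyword (the table names are illustrative):

CREATE TABLE small_example (id smallserial);  -- smallint column, 2 bytes
CREATE TABLE big_example   (id bigserial);    -- bigint column, 8 bytes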


Michael Paquier: Postgres 11 highlight - Removing superuser dependency for pg_rewind


The following commit adds a new feature which is part of Postgres 11, and matters a lot for a couple of tools:

commit: e79350fef2917522571add750e3e21af293b50fe
author: Stephen Frost <sfrost@snowman.net>
date: Fri, 6 Apr 2018 14:47:10 -0400
Remove explicit superuser checks in favor of ACLs

This removes the explicit superuser checks in the various file-access
functions in the backend, specifically pg_ls_dir(), pg_read_file(),
pg_read_binary_file(), and pg_stat_file().  Instead, EXECUTE is REVOKE'd
from public for these, meaning that only a superuser is able to run them
by default, but access to them can be GRANT'd to other roles.

Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net

This is rather a simple thing: a set of in-core functions used a hardcoded superuser check to make sure that they could not run with unprivileged user rights. For the last couple of releases, an effort has been made to remove those hardcoded checks, so that one can GRANT execution access on individual functions; actions which would otherwise need a full superuser (a user who can theoretically do anything on the cluster and administers it) can then be delegated to extra users whose rights are limited to those actions.

This commit, while making lookups of the data directory easier, is actually very useful for pg_rewind, as it removes the need for a database superuser in order to perform the rewind operation when the source server is up and running.

In order to get to this state, one can create a dedicated user and then grant execution to a subset of functions, which can be done as follows:

CREATE USER rewind_user LOGIN;
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO rewind_user;

Once this is run, the new database user “rewind_user” will be able to run pg_rewind without superuser rights, which matters for a lot of deployments, as restricting superuser access to a cluster as much as possible is a common security policy. Note that pg_dump has been able to dump ACLs on system functions since 9.6, so once put in place those policies remain in logical backups.
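
With those grants in place, the rewind could be invoked roughly as follows; the data directory path and connection parameters below are assumptions for illustration:

pg_rewind --target-pgdata=/path/to/standby/pgdata \
    --source-server='host=primary.example.com port=5432 user=rewind_user dbname=postgres'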

Pavel Stehule: ncurses-st-menu library is available

I have finished all the work on the CUA menu (menubar and pull-down menu) ncurses library. The library supports skins, shadows, accelerators, and the mouse. Look at demo.c for usage information. Download from Github.

gabrielle roth: PDXPUG: May meeting in one week


When: 6-8pm Thursday May 17, 2018
Where: iovation
Who: Dylan Hornstein
What: Learning SQL

During this PDXPUG meetup, we will talk about one person’s journey to learn SQL. From joining iovation’s Reporting Team without any experience with SQL or relational databases, to using SQL every day for ad hoc data analysis and bulk data dumps, Dylan Hornstein will talk about his experience getting familiar with SQL and learning to navigate a relational database, as well as some challenges and tips he’s found along the way. The presentation is geared towards those starting out in data roles and we will likely expand into a wider conversation around using SQL and understanding data.

Dylan has been a Data Analyst at iovation for three years now. Having spent six months as a Client Manager before moving to his Data Analyst role, Dylan has experience working directly with iovation’s clients as well as working behind the scenes with the data. In his current role, he is responsible for providing reports, ad hoc analysis and bulk data dumps for clients and internal teams. While Dylan had prior work experience as a Data Analyst, his move to iovation’s Reporting Team came with a steep learning curve, as he was new to working with SQL and relational databases.


If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at iovation, on the 3rd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry! For access to the 3rd floor of the plaza, please either take the lobby stairs to the third floor or take the plaza elevator (near Subway and Rabbit’s Cafe) to the third floor. There will be signs directing you to the meeting room. All attendees must check in at the iovation front desk.

See you there!

Christophe Pettus: PostgreSQL Replication at PerconaLive 2018

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 11 – Add json(b)_to_tsvector function

On 7th of April 2018, Teodor Sigaev committed patch: Add json(b)_to_tsvector function Jsonb has a complex nature so there isn't best-for-everything way to convert it to tsvector for full text search. Current to_tsvector(json(b)) suggests to convert only string values, but it's possible to index keys, numerics and even booleans value. To solve that json(b)_to_tsvector has […]
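
For a quick taste, here is a small sketch of the new function; the third argument selects which JSON value types to index:

SELECT jsonb_to_tsvector('english',
                         '{"a": "the fat rats", "b": 123}'::jsonb,
                         '["string", "numeric"]');
-- one possible result: '123':5 'fat':2 'rat':3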