
Dimitri Fontaine: Why is pgloader so much faster?


pgloader loads data into PostgreSQL. The new version is stable enough nowadays that it's soon to be released, the last piece of the 3.1.0 puzzle being full debian packaging of the tool.

The pgloader logo is a loader truck, just because.

As you might have noticed if you've read my blog before, I decided that pgloader needed a full rewrite in order for it to be able to enter the current decade as a relevant tool. pgloader used to be written in the python programming language, which is used by lots of people and generally quite appreciated by its users.

Why change?

Still, python is not without problems, the main ones I had to deal with being poor performance and a lack of threading capabilities. Also, the pgloader setup design was pretty hard to maintain, and adding compatibility with other loader products from competitors was harder than it should have been.

As I said in my pgloader lightning talk at the 7th European Lisp Symposium last week, in searching for a modern programming language the best candidate I found was actually Common Lisp.

After some basic performance checks, as seen in my Common Lisp Sudoku Solver project where I got code up to ten times faster than python, it felt like the amazing set of features of the language could be put to good use here.

So, what about performance after the rewrite?

The main reason why I'm now writing this blog post is that I've been receiving emails from pgloader users with strange feelings about the speedup. Let's look at the numbers one user gave me, as a data point:

 select rows, v2, v3,
        round((  extract(epoch from v2)
               / extract(epoch from v3))::numeric, 2) as speedup
   from timing;
        
  rows   |        v2         |       v3        | speedup 
---------+-------------------+-----------------+---------
 4768765 | @ 37 mins 10.878  | @ 1 min 26.917  |   25.67
 3115880 | @ 36 mins 5.881   | @ 1 min 10.994  |   30.51
 3865750 | @ 33 mins 40.233  | @ 1 min 15.33   |   26.82
 3994483 | @ 29 mins 30.028  | @ 1 min 18.484  |   22.55
(4 rows)
The raw numbers have been loaded into a PostgreSQL table

So what we see in this quite typical CSV loading test case is a best case of 30 times faster import, which brings some questions to the table, of course.

Wait, you're still using COPY, right?

The PostgreSQL database system provides a really neat COPY command, which in turn exposes the COPY streaming protocol that pgloader uses.

So yes, pgloader is still using COPY. This time the protocol implementation is to be found in the Common Lisp Postmodern driver, which is really great. Before that, back when pgloader was python code, it was using the very good psycopg driver, which also exposes the COPY protocol.
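For illustration, here is roughly what driving COPY looks like from Python with psycopg2; this is a minimal sketch, not pgloader's actual code, and the connection string, table and file names are made up:

import psycopg2

# Minimal sketch: stream a CSV file into a table through the COPY protocol.
conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
with open("data.csv") as f:
    cur.copy_expert("COPY target_table FROM STDIN WITH (FORMAT csv)", f)
conn.commit()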

So, what did happen here?

Well it happens that pgloader is now built using Common Lisp technologies, and those are really great, powerful and fast!

Not only is Common Lisp code compiled to machine code when using most Common Lisp implementations such as SBCL or Clozure Common Lisp; it's also possible to actually benefit from parallel computing and threads in Common Lisp.

That's not how I did it!

In the pgloader case I've been using the lparallel utilities, in particular its queuing facility, to implement asynchronous IO: one thread reads the source data and preprocesses it, filling a buffer one batch at a time, and each batch is then pushed down to the writer thread, which handles the COPY protocol and operations.
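pgloader's implementation is Common Lisp on top of lparallel, but the shape of that pipeline is easy to sketch in Python; preprocess() and copy_batch() below are stand-ins for the real transformation and COPY steps, and the batch size is illustrative:

import queue, threading

# Reader/writer pipeline sketch (a Python analogue, not pgloader's code).
def preprocess(row):        # stand-in for pgloader's per-row transforms
    return row

def copy_batch(batch):      # stand-in for pushing one batch through COPY
    print("copied", len(batch), "rows")

batches = queue.Queue(maxsize=4)          # bounded buffer of prepared batches

def reader(rows):
    batch = []
    for row in rows:
        batch.append(preprocess(row))
        if len(batch) >= 25000:
            batches.put(batch)            # hand a full batch to the writer
            batch = []
    if batch:
        batches.put(batch)
    batches.put(None)                     # sentinel: no more data

def writer():
    while True:
        batch = batches.get()
        if batch is None:
            break
        copy_batch(batch)

t = threading.Thread(target=writer)
t.start()
reader(range(100000))
t.join()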

So my current analysis is that the new thread-based architecture, combined with a very powerful compiler for the Common Lisp high-level language, is allowing pgloader to enter a whole new field of data loading performance.

Conclusion

Not only is pgloader so much faster now, it's also full of new capabilities and supports several sources of data such as dBase files, SQLite database files or even MySQL live connections.

Rather than a configuration file, the way to use the new pgloader is a command language that has been designed to look as much like SQL as possible in the pgloader context, to make it easy for its users. Implementation-wise, it should now be trivial enough to implement compatibility with the other data loading software that some PostgreSQL competitor products have.

Also, the new code base and feature set seems to attract way more users than the previous implementation ever did, despite using a less popular programming language.

You can already download pgloader binary packages for Debian-based distributions and CentOS-based ones too, and you will even find a Mac OS X package file (.pkg) that will make /usr/local/bin/pgloader available for you on the command line. If you need a Windows binary, drop me an email.

The first stable release of the new pgloader utility is scheduled to be named 3.1.0 and to happen quite soon. We are hard at work on packaging the dependencies for debian, and you can have a look at the Quicklisp to debian project if you want to help us get there!


Shaun M. Thomas: Foreign Keys are Not Free


PostgreSQL is a pretty good database, and I enjoy working with it. However, there is an implementation detail that not everyone knows about, which can drastically affect table performance. What is this mysterious feature? I am, of course, referring to foreign keys.

Foreign keys are normally a part of good database design, and for good reason. They inform about entity relationships, and they verify, enforce, and maintain those relationships. Yet all of this comes at a cost that might surprise you. In PostgreSQL, every foreign key is maintained with an invisible system-level trigger added to the source table in the reference. At least one trigger must go here, as operations that modify the source data must be checked to ensure they do not violate the constraint.

This query is an easy way to see how many foreign keys are associated with every table in an entire PostgreSQL database:

SELECT t.oid::regclass::text AS table_name, count(1) AS total
  FROM pg_constraint c
  JOIN pg_class t ON (t.oid = c.confrelid)
 GROUP BY table_name
 ORDER BY total DESC;

With this in mind, consider how much overhead each trigger incurs on the referenced table. We can actually calculate this overhead. Consider this function:

CREATE OR REPLACE FUNCTION fnc_check_fk_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT;
BEGIN
  CREATE TABLE test_fk
  (
    id   BIGINT PRIMARY KEY,
    junk VARCHAR
  );

  INSERT INTO test_fk
  SELECT generate_series(1, 100000), repeat(' ', 20);

  CLUSTER test_fk_pkey ON test_fk;

  FOR i IN 1..key_count LOOP
    EXECUTE 'CREATE TABLE test_fk_ref_' || i || 
            ' (test_fk_id BIGINT REFERENCES test_fk (id))';
  END LOOP;

  FOR i IN 1..100000 LOOP
    UPDATE test_fk SET junk = '                    '
     WHERE id = i;
  END LOOP;

  DROP TABLE test_fk CASCADE;

  FOR i IN 1..key_count LOOP
    EXECUTE 'DROP TABLE test_fk_ref_' || i;
  END LOOP;

END;
$$ LANGUAGE plpgsql VOLATILE;

The function is designed to create a simple two-column table, fill it with 100,000 records, and test how long it takes to update every record. This is purely meant to simulate a high-transaction load caused by multiple clients. I know no sane developer would actually update so many records this way.

The only parameter this function accepts is the number of tables it should create that reference this source table. Every referring table is empty, and has only one column for the reference to be valid. After the foreign key tables are created, it performs those 100,000 updates, and we can measure the output with our favorite SQL tool. Here is a quick test with psql:

\timing
SELECT fnc_check_fk_overhead(0);
SELECT fnc_check_fk_overhead(5);
SELECT fnc_check_fk_overhead(10);
SELECT fnc_check_fk_overhead(15);
SELECT fnc_check_fk_overhead(20);

On our system, these timings were collected several times, and averaged 2961ms, 3805ms, 4606ms, 5089ms, and 5785ms after three runs each. As we can see, after merely five foreign keys, performance of our updates drops by 28.5%. By the time we have 20 foreign keys, the updates are 95% slower!

I don’t mention this to make you abandon foreign keys. However, if you are in charge of an extremely active OLTP system, you might consider removing any non-critical FK constraints. If the values are merely informative, or will not cause any integrity concerns, a foreign key is not required. Indeed, excessive foreign keys are actually detrimental to the database in a very tangible way.

I merely ask you keep this in mind when designing or revising schemas for yourself or developers you support.

Josh Berkus: 9.4 Beta, Postgres-XL, and pgCon Events

So, in case you somehow missed it, the PostgreSQL 9.4 Beta 1 is out.  Yaay!  Here's what I have to say about that:

libdata=# select title, 
    bookdata #> '{"publication_info", 0, "isbn"}' as isbn
from booksdata 
where bookdata @> '{ "publication_info" : [{"publisher": "Avon"} ] }'::jsonb 
order by bookdata #> '{"publication_info", 0, "price"}' DESC;

                 title                |    isbn    
--------------------------------------+-----------------
 The Bewitched Viking                 | "0-06-201900-7"
 When a Scot Loves a Lady             | "0-06-213120-6"
 Eternal Prey                         | "0-06-201895-7"
 My Irresistible Earl                 | "0-06-173396-2"
...


Download the beta now and test it out!  Break it!  Tell us how you broke it!  It's a beta, and it's up to you to make sure that the final release is as robust as possible.

Speaking of betas, there's a new open source big data option on the block: Postgres-XL.  This is a fork of Postgres-XC, which supposedly resolves the blockers which have kept Postgres-XC from being ready for production use.  I look forward to trying it out when I get a chance.

Finally, I wanted to remind everyone about the Clustering Summit, the Postgres-XC Pizza Demo, and the Unconference at pgCon next week.   In particular, I still need two assistants to help me with the unconference.  Email me at josh-at-postgresql.org if you're available to help with setup at the unconference.

Josh Berkus: Help us choose an advocacy theme for PostgreSQL 9.4

Every year, for each PostgreSQL release, I have a "theme" which decides our graphics and presentation themes for promoting that version of PostgreSQL.   In the past, the themes have generally been my personal ideas, but this year we're putting it out to our greater community.

Five potential theme ideas have been selected from about 100 which were suggested on the pgsql-advocacy mailing list.  Now we need you to rate them, in order to decide which one we go with ... and who wins a Chelnik from Mark Wong!

Please vote on the basis of selecting a good theme/slogan for PostgreSQL 9.4 specifically, rather than just what sounds like the coolest phrase.

So, vote!

Christoph Berg: PostgreSQL 9.4 on Debian


Yesterday saw the first beta release of the new PostgreSQL version 9.4. Along with the sources, we uploaded binary packages to Debian experimental and apt.postgresql.org, so there are now packages ready to be tested on Debian wheezy, squeeze, testing/unstable, and Ubuntu trusty, saucy, precise, and lucid.

If you are using one of the release distributions of Debian or Ubuntu, add this to your /etc/apt/sources.list.d/pgdg.list to have 9.4 available:

deb http://apt.postgresql.org/pub/repos/apt/ codename-pgdg main 9.4

On Debian jessie and sid, install the packages from experimental.

Happy testing!

Steve Singer: Keeping Developers in sync with alembic


I was recently working on a project where we had about half a dozen developers working on an established code base. All of the developers were new to the code base and I knew that we were going to be making a fair number of database schema and data-seeding changes in a short period of time. Each developer had their own development environment with a dedicated database (PostgreSQL). The developers on the project had their hands full learning about the code base and I didn’t want to distract them by having to take a lot of their time managing their development database instances.

I decided to try using Alembic to manage the database schema migrations.

I wanted something where someone just needed to grab the latest code from source control and run a single command to apply any database migrations that go along with the version of the source code they had. In addition to schema changes we had a number of planned data migrations, often things as simple as updating content or templates stored in a database table. The application had database access from PHP via Propel and Java via Hibernate. We thought about using either of those for schema and data migrations but ran into a few concerns:

  1. I didn’t see a good way to track or even specify ‘data’ changes with the built-in support of either of those tools (we were also on legacy versions of both ORMs). Changing the schema often isn’t enough; the data changes and associated seeding are just as important.
  2. The production DBAs were not comfortable running hbm2ddl or a propel migrate in production because they couldn’t see and review what was actually being changed. Solving the problem for development was important, but eventually the changes need to be deployed.

We decided to try Alembic because it had a good reputation in the PostgreSQL community and I’ve met the author of Alembic, Michael Bayer, at a number of conferences. Alembic also allowed us to write arbitrary queries for data migrations. I initially did some demonstration migrations using the Alembic/SQLAlchemy ORM syntax.
A typical migration file in this manner looked something like


"""add_car

Revision ID: 9de4d14f
Revises: 43254534
Create Date: 2014-04-30 17:10:51

"""

revision = '9de4d14f'
down_revision = '43254534'
from alembic import op
import sqlalchemy as sa

def upgrade():
  op.add_column('person',sa.Column('first_car',sa.String()))

def downgrade():
  op.drop_column('person','first_car')

Developers were able to use this type of migration to keep their environments up to date. The migration file would get checked into source control along with the code change associated with the migration. A developer would then run


alembic -c alembic.ini upgrade head

to upgrade their database. This worked well, but I found that lots of developers needed to add things to the database and they didn’t really want to learn the SQLAlchemy ORM syntax. Most of the developers hadn’t worked much with Python, let alone SQLAlchemy. Their SQL wasn’t great either, but they found it much easier to write the database migrations directly in SQL instead of in the ORM. Database migrations with Alembic done this way look something like:


"""add_car

Revision ID: 9de4d14f
Revises: 43254534
Create Date: 2014-04-30 17:10:51

"""

revision = '9de4d14f'
down_revision = '43254534'
from alembic import op
import sqlalchemy as sa

def upgrade():
  op.execute('alter table person add column first_car text')  

def downgrade():
  op.execute('alter table person drop column first_car')

Data-style migrations were similarly straightforward: they would just put an INSERT or UPDATE statement in the op.execute() call to perform the data migration.
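For illustration, a data migration written in this style might look like the following sketch; the revision ids, table and values here are made up:

"""seed_templates

Revision ID: 77fe01a2
Revises: 9de4d14f

"""

revision = '77fe01a2'
down_revision = '9de4d14f'
from alembic import op

def upgrade():
    # hypothetical data migration: correct content stored in a table
    op.execute("UPDATE templates SET body = 'Welcome!' WHERE name = 'greeting'")

def downgrade():
    op.execute("UPDATE templates SET body = 'Hello' WHERE name = 'greeting'")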

The team wrote approximately 40 database migrations in a little over 2 months; that works out to a database migration every few days. Once we got over the initial setup (puppet helped) we had very few problems. Occasionally two developers would add a migration around the same time, both revising the same base version and creating a split revision line. Alembic detects this when you try to run an upgrade and errors out.
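If you do hit a split revision line, newer Alembic releases let you inspect the divergent heads and generate a merge revision; the exact commands depend on your Alembic version, and the revision ids below are placeholders:

alembic heads                                      # list the divergent head revisions
alembic merge -m "merge heads" <revision_1> <revision_2>
alembic upgrade head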

When it came time to apply the upgrade to production, our DBA team was able to just run an ‘alembic upgrade head’ in production to update the database from the known starting state, using the same migrations that were applied in the development environments. This worked fine for us but might not be optimal. Sometimes a column would be added in one revision and then renamed, or its data type changed, a few days later in a different migration script. The production migration could have just created the column with the final name/type, but this would have required rewriting some of the migrations to merge them into a more consolidated version. In many circumstances this is probably worth doing, but we didn’t do that.


Paul Ramsey: Security releases 2.0.6 and 2.1.3


It has come to our attention that the PostGIS Raster support may give more privileges to users than an administrator is willing to grant. These include reading files from the filesystem and opening connections to network hosts.


Raghavendra Rao: Few areas of improvements in PostgreSQL 9.4

With the beta release of PostgreSQL 9.4, DBAs have been given some cool features like pg_prewarm, JSONB, ALTER SYSTEM, replication slots and many more. Besides the numerous architectural-level features presented in this version, there are also a few minor enhancements, which I have attempted to cover in this blog.
The pg_stat_activity view includes two new columns (backend_xid/backend_xmin) to track transaction id information. The pg_stat_activity.backend_xid column holds the id of the top-level transaction currently being executed, and the pg_stat_activity.backend_xmin column holds the minimal running XID. Check out the two query outputs below, executed in two different situations: the first one shows, in the backend_xmin column, the transaction id hierarchy of sessions trying to acquire a lock (table/row) on the same row, whereas the other one shows independent transactions that never touch the same row. This kind of information helps the user learn more about the transactions when waiting queries are found in the database.
postgres=# select pid,backend_xid,backend_xmin,query from pg_stat_activity where pid<>pg_backend_pid();
pid | backend_xid | backend_xmin | query
-------+-------------+--------------+---------------------------
22351 | 1905 | 1904 | insert into a values (1);
785 | 1904 | | insert into a values (1);
12796 | | 1904 | truncate a;
12905 | | 1904 | delete from a ;

postgres=# select pid,backend_xid,backend_xmin,query from pg_stat_activity where pid<>pg_backend_pid();
pid | backend_xid | backend_xmin | query
-------+-------------+--------------+-----------------------------
22351 | | | insert into foo values (1);
785 | 1900 | | insert into foo values (1);
(2 rows)
There are new clauses in CREATE TABLESPACE and ALTER TABLESPACE: the "with" and "move" options respectively. Similarly, the meta-command \db+ gives detailed information about the parameters set for a particular tablespace using the "with" option.
postgres=# \h create tablespace
Command: CREATE TABLESPACE
Description: define a new tablespace
Syntax:
CREATE TABLESPACE tablespace_name
[ OWNER user_name ]
LOCATION 'directory'
[ WITH ( tablespace_option = value [, ... ] ) ]

Example:

postgres=# create tablespace t1 location '/usr/local/pgpatch/pg/ts' with (seq_page_cost=1,random_page_cost=3);
CREATE TABLESPACE

postgres=# \db+
List of tablespaces
Name | Owner | Location | Access privileges | Options | Description
------------+----------+--------------------------+-------------------+--------------------------------------+-------------
pg_default | postgres | | | |
pg_global | postgres | | | |
t1 | postgres | /usr/local/pgpatch/pg/ts | | {seq_page_cost=1,random_page_cost=3} |
(3 rows)
There are new system functions to give information on the types regclass, regproc, regprocedure, regoper, regoperator and regtype. For all these types, the new functions are to_regclass(), to_regproc(), to_regprocedure(), to_regoper(), to_regoperator() and to_regtype().
Example:
select to_regclass('pg_catalog.pg_class'),to_regtype('pg_catalog.int4'),to_regprocedure('pg_catalog.abs(numeric)'),to_regproc('pg_catalog.now'),to_regoper('pg_catalog.||/');
to_regclass | to_regtype | to_regprocedure | to_regproc | to_regoper
-------------+------------+-----------------+------------+------------
pg_class | integer | abs(numeric) | now | ||/
(1 row)
New "-g" option in command line utility CREATEUSER to specify role membership.
-bash-4.1$ createuser -g rw -p 10407 r1 
-bash-4.1$ psql -p 10407
psql (9.4beta1) Type "help" for help.

postgres=# \dg
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------+-----------
postgres | Superuser, Create role, Create DB, Replication | {}
r1 | | {rw}
The pg_stat_all_tables view has a new column, "n_mod_since_analyze", which shows the number of rows that have been modified since the table was last analyzed. The outputs below illustrate how the "n_mod_since_analyze" column changes: a manual analyze is executed first, and after some time autovacuum is invoked on the table; over this period we can figure out how many rows were affected by the different statements.
postgres=# analyze a;
ANALYZE
postgres=# select relname,last_autoanalyze,last_analyze,n_mod_since_analyze from pg_stat_all_tables where relname='a';
relname | last_autoanalyze | last_analyze | n_mod_since_analyze
---------+------------------+-------------------------------+---------------------
a | | 2014-05-03 02:09:51.002006-07 | 0
(1 row)

postgres=# insert into a values(generate_series(1,100));
INSERT 0 100
postgres=# select relname,last_autoanalyze,last_analyze,n_mod_since_analyze from pg_stat_all_tables where relname='a';
relname | last_autoanalyze | last_analyze | n_mod_since_analyze
---------+------------------+-------------------------------+---------------------
a | | 2014-05-03 02:09:51.002006-07 | 100
(1 row)

postgres=# truncate a;
TRUNCATE TABLE
postgres=# select relname,last_autoanalyze,last_analyze,n_mod_since_analyze from pg_stat_all_tables where relname='a';
relname | last_autoanalyze | last_analyze | n_mod_since_analyze
---------+------------------+-------------------------------+---------------------
a | | 2014-05-03 02:09:51.002006-07 | 100
(1 row)

postgres=# select relname,last_autoanalyze,last_analyze,n_mod_since_analyze from pg_stat_all_tables where relname='a';
relname | last_autoanalyze | last_analyze | n_mod_since_analyze
---------+-------------------------------+-------------------------------+---------------------
a | 2014-05-03 02:14:21.806912-07 | 2014-05-03 02:09:51.002006-07 | 0
(1 row)
pg_stat_archiver is a new view introduced to track all generated WALs, and it also captures the count of failed WALs. If you are from Oracle then this one is like "ARCHIVE LOG LIST".
postgres=# select * from pg_stat_archiver ;
-[ RECORD 1 ]------+------------------------------
archived_count | 167
last_archived_wal | 00000001000000000000009B
last_archived_time | 2014-05-02 20:42:36.230998-07
failed_count | 75
last_failed_wal | 000000010000000000000012
last_failed_time | 2014-05-01 12:09:57.087644-07
stats_reset | 2014-04-30 19:02:01.288521-07
The pg_stat_statements extension module has a new column, queryid, to track the internal hash code computed from the statement's parse tree.
postgres=# select queryid,query from pg_stat_statements;
queryid | query
------------+------------------------------------
1144716789 | select * from pg_stat_statements ;
(1 row)

Thank you.

--Raghav

Leo Hsu and Regina Obe: PostgreSQL 9.4beta1 and PostGIS 2.2.0 dev on Windows


PostgreSQL 9.4beta1 was released last week and Windows binaries for both 32-bit and 64-bit are already available to try out from http://www.postgresql.org/download/windows. Since this is a beta release, there are no installers yet, just the zip binary archive. To make the pot a little sweeter, we've set up the PostGIS windows build bot (Winnie) to automatically build the PostGIS 2.2.0 development branch and the pgRouting 2 branches for 9.4 whenever there is a change in the code. We also have the pointcloud extension in the extras folder. If you are on 9.3, we've got 2.2 binaries for that as well. The PostGIS/pgRouting related stuff you can find at http://postgis.net/windows_downloads in the 9.4 folder.

For the rest of this article we'll discuss a couple of stumbling blocks you may run into.

Much of what we'll describe here is windows specific, but thanks to the beauty of extensions and GUCs, the extension install and GUC setting part for PostGIS is applicable to all operating systems.


Continue reading "PostgreSQL 9.4beta1 and PostGIS 2.2.0 dev on Windows"

Jim Mlodgenski: Trigger Overhead (Part 2)


I found a bit more time to dig into the timing of triggers and their overhead, so I wanted to see how much the choice of procedural language affects performance. I followed the same testing methodology from my original trigger test. For this test I created an empty trigger in each of the following languages:

PL/pgSQL

CREATE FUNCTION empty_trigger() RETURNS trigger AS $$
BEGIN
 RETURN NEW;
END;
$$ LANGUAGE plpgsql;

C

#include "postgres.h"
#include "commands/trigger.h"
PG_MODULE_MAGIC;
Datum empty_c_trigger(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(empty_c_trigger);
Datum
empty_c_trigger(PG_FUNCTION_ARGS)
{
 TriggerData *tg;
 HeapTuple ret;
tg = (TriggerData *) (fcinfo->context);
 if (TRIGGER_FIRED_BY_UPDATE(tg->tg_event))
 ret = tg->tg_newtuple;
 else
 ret = tg->tg_trigtuple;
return PointerGetDatum(ret);
}

PL/Pythonu

CREATE FUNCTION empty_python_trigger() RETURNS trigger AS $$
return
$$ LANGUAGE plpythonu;

PL/Perl

CREATE FUNCTION empty_perl_trigger() RETURNS trigger AS $$
 return; 
$$ LANGUAGE plperl;

PL/TCL

CREATE FUNCTION empty_tcl_trigger() RETURNS trigger AS $$
 return [array get NEW]
$$ LANGUAGE pltcl;

PL/Java

package org.postgresql.pljava;
import java.sql.SQLException;
import java.sql.ResultSet;
import org.postgresql.pljava.TriggerData;
import org.postgresql.pljava.TriggerException;
public class TriggerTest {
 static void test(TriggerData td) throws SQLException {
 ResultSet _new = td.getNew();
 }
}

PL/v8

CREATE FUNCTION empty_v8_trigger() RETURNS trigger AS $$
 return NEW;
$$
LANGUAGE plv8;

PL/R

CREATE FUNCTION empty_r_trigger() RETURNS trigger AS $$
 return(pg.tg.new)
$$ LANGUAGE plr;

All of the triggers essentially return NEW, so we’re basically measuring the overhead of starting up the trigger function. I then timed inserting 100,000 rows with the triggers in place and compared that to inserting into a table without a trigger. Some of the timings that I found were obvious, such as C being the fastest, but others were pretty surprising.

One of the bigger things that I noticed is that, of the 3 built-in higher-level languages, Python has much less overhead than Perl and TCL.

The other notable point was how little overhead PL/Java had compared to the other languages. PL/Java only had more overhead than C, PL/pgSQL and PL/Python.

The moral of the story is that when writing triggers, some choices matter a lot. If you’re writing a simple trigger that just ensures a column equals the current timestamp, don’t write it in PL/v8 just because it's cool. Use PL/pgSQL for the simple things and save the other languages for your more complex logic where the overhead of starting them up won’t be noticed.

Craig Kerstiens: Postgres and Connection Pooling


Connection pooling is quickly becoming one of the more frequent questions I hear. So here’s a primer on it. If there’s enough demand I’ll follow up a bit further with some detail on specific Postgres connection poolers and setting them up.

The basics

For those unfamiliar, a connection pool is a group of database connections sitting around waiting to be handed out and used. This means that when a request comes in, a connection is already there, whether in your framework or some other pooling process, and is then given to your application for that specific request or transaction. In contrast, without any connection pooling your application has to reach out to your database to establish a connection. While in the most basic sense you may think connecting to a database is quick, often there's some overhead here. An example is SSL negotiation that may have to occur, which means you're looking at not 1-2 ms but often closer to 30-50.
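As a minimal illustration of the idea (not a production setup), psycopg2 ships a simple pool; the DSN and pool sizes here are made up:

import psycopg2.pool

# A handful of already-established connections handed out per request
# and returned afterwards; the DSN and sizes are illustrative.
pool = psycopg2.pool.SimpleConnectionPool(2, 10, "dbname=app user=app")

conn = pool.getconn()                # borrow a connection: no new handshake
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
    conn.commit()
finally:
    pool.putconn(conn)               # hand it back for the next request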

The options

There are really three major options when it comes to connection pooling:

  • Framework pooling
  • Standalone pooler
  • Persistent connections

Framework pooling

Today many modern application frameworks have at least some basic level of connection pooling. This means as your application server starts up it will create a pool of connections to use. It’s worth noting that while most modern frameworks have pooling, not all do, and further it may not be enabled by default.

If you’re using the Sequel ORM for Ruby or SQLAlchemy for Python you’re well covered here. Further Rails is in pretty good shape also, though you may want to configure the pool size. For Django it’s a bit of a mixed story. For some time Django did not have pooling at all. As of Django 1.6 you now have persistent connections by default and the ability to enable a pool.
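For example, with SQLAlchemy the pool is configured on the engine; this is a rough sketch, and the connection URL and pool sizes are illustrative:

from sqlalchemy import create_engine, text

# Engine-level pool: up to 5 idle connections, 10 more allowed under burst load.
engine = create_engine("postgresql://app@localhost/app",
                       pool_size=5, max_overflow=10)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))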

Persistent connections

Persistent connections don’t offer all of the benefits of pooling, but can often work well enough. A persistent connection is simply one that is kept open to your database once it's established. In the case where you have 30-50 ms of overhead each time you connect, this can be quite helpful. At the same time, you're limited in the number of things that can interact with your database, since you're restricted to 1 connection per entry point to your webserver.
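In Django's case that is the CONN_MAX_AGE setting; here is a sketch of the relevant settings.py fragment, with made-up database details:

# settings.py fragment: keep connections open for up to 10 minutes (Django 1.6+).
# Database name and credentials are illustrative.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'app',
        'USER': 'app',
        'CONN_MAX_AGE': 600,
    }
}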

Standalone pooling

Postgres can be a bit of a sore spot when it comes to handling a ton of connections. For Postgres, each connection to your database carries some memory overhead. Casual observations have seen it be between 5 and 10 MB assuming some basic query workload. And even if you have the memory overhead to spare on your Postgres instance, there comes a point where management of connections becomes a limiting factor; we've seen this somewhere in the hundreds. While framework-level connection poolers can give some better performance and lengthen the time before you have to deal with something more complex, if you're successful that time may come.

A rule of thumb I’d use is that if you have over 100 connections, you want to look at something more robust.

In this case that something more robust is a standalone pooler specifically for Postgres. A standalone pooler can be much more configurable overall, letting you specify how it works for Postgres sessions, transactions, or statements. Further, these are very specifically designed to work with Postgres, handling a very large pool of connections without adding too much overhead. In contrast to the roughly 5 MB of a standard connection to Postgres, PG Bouncer uses about 2 kB per connection.

So once you’re at the point of needing one there’s really two options.

  1. PG Bouncer
  2. PG Pool

PG Bouncer

My short and sweet recommendation is towards PG Bouncer. Contrary to how it’s named, PG Pool is a multi-purpose tool that does a lot of things (pooling, load balancing, replication, more). PG Bouncer takes the philosophy of doing one thing and doing it extremely well. I tend to favor these types of tools, which is the same reason I lean towards WAL-E to help with Postgres replication.
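For reference, a minimal PgBouncer setup is just an ini file pointing at your database; the names, port, and sizes below are illustrative only, so check the PgBouncer documentation for your version:

; pgbouncer.ini sketch -- all values are illustrative
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20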

Need more?

Need more guidance with setting up and running PGBouncer? Give this guide a look or try the pgbouncer buildpack if running on Heroku. If you’re still interested in a deeper guide let me know @craigkerstiens and I’ll work on getting it into the queue.

Finally, make sure to sign-up below to get updates on Postgres content and first access to training.


Michael Paquier: Postgres 9.4 feature highlight: Logical replication protocol


When developing a logical change receiver with the new logical decoding facility of Postgres 9.4, there are a couple of new commands and a specific libpq protocol to be aware of before beginning any development of a logical replication receiver (on top of knowing some basics and what an output decoder plugin is). Here is an exhaustive list of the commands to know.

CREATE_REPLICATION_SLOT

This command can be used to create a logical replication slot (as well as a physical replication slot). Here is an example with an output plugin called decoder_raw, run over a replication connection:

$ psql "replication=database" \
    -c "CREATE_REPLICATION_SLOT custom_slot LOGICAL decoder_raw"
  slot_name  | consistent_point | snapshot_name | output_plugin 
-------------+------------------+---------------+---------------
 custom_slot | 0/16CC080        | 000003E9-1    | decoder_raw
(1 row)

When running this query, be sure that the result is made of 1 tuple with 4 fields (respectively PQntuples and PQnfields)! The new slot can then be found listed on the server:

$ psql -c "SELECT slot_name, plugin, restart_lsn FROM pg_replication_slots"
  slot_name  |   plugin    | restart_lsn 
-------------+-------------+-------------
 custom_slot | decoder_raw | 0/16CC048
(1 row)

This can be done as well with pg_create_logical_replication_slot with a non-replication connection.

DROP_REPLICATION_SLOT

This command is used to drop a replication slot, simply like this for example:

$ psql "replication=database" -c "DROP_REPLICATION_SLOT custom_slot"
SELECT
$ psql -c "SELECT plugin, restart_lsn FROM pg_replication_slots WHERE slot_name = 'custom_slot'"
 plugin | restart_lsn 
--------+-------------
(0 rows)

After running this command, the result obtained has no tuples and no fields. The drop operation can be done as well with pg_drop_replication_slot using a normal connection to the server.

START_REPLICATION

This command already exists in versions of PostgreSQL older than 9.4; it has been extended for the needs of logical replication. For example, to start logical replication from a certain LSN using the slot created above, a command like the following sent through a replication connection is enough:

START_REPLICATION SLOT custom_slot LOGICAL restart_lsn;

restart_lsn can be used to specify from which point logical replication begins. With a given decoding plugin, you can as well pass custom options. Here is an example with decoder_raw:

START_REPLICATION SLOT custom_slot LOGICAL restart_lsn ("include-transaction" 'off');

This command will send back a response of type PGRES_COPY_BOTH, containing data that can be retrieved with PQgetCopyData.

Before rushing into coding, have a look at pg_recvlogical. It can provide a good base for developing a custom receiver.
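If you prototype a receiver from Python rather than C, later psycopg2 releases expose this same flow; a rough sketch, with the slot name from the example above and a made-up connection string:

import psycopg2
import psycopg2.extras

# Rough sketch of consuming a logical slot from Python (the slot must
# already exist, as created earlier in this post).
conn = psycopg2.connect(
    "dbname=postgres",
    connection_factory=psycopg2.extras.LogicalReplicationConnection)
cur = conn.cursor()
cur.start_replication(slot_name='custom_slot', decode=True)

def consume(msg):
    print(msg.payload)                                   # one decoded change
    msg.cursor.send_feedback(flush_lsn=msg.data_start)   # acknowledge it

cur.consume_stream(consume)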

Robert Treat: Postgres 9.4 - A First Look


Today I gave a talk at pgcon about the upcoming features in 9.4. As the beta was just last week, I think it's a fairly accurate representation of what should ultimately end up in 9.4. Of course, in the course of a talk I couldn't cover everything, but I think it should give a good primer for anyone looking to upgrade.

I want to give a big thanks to Magnus Hagander and Dave Page, who did talks on earlier versions of 9.4, which were invaluable in helping me put together my own slide deck. Also thanks to Michael Paquier and Heikki Linnakangas, who provided supplemental materials. Also, no one could do these talks without the work of depesz; I would strongly encourage those looking for more information on 9.4 to check out his blog. Finally, I'd like to thank all of the Postgres developers who have worked on the 9.4 release, without whom we wouldn't have a release.

Ian Barwick: In-server sponsoring

Devrim GÜNDÜZ: PostgreSQL 9.4 beta1 out -- also the RPMs! Please test!

Last week, PostgreSQL 9.4 beta 1 was announced. As noted there, there are lots of cool features in 9.4 which need testing from you.


9.4 RPMs are also out, for sure!

If you are an RPM user, we would like you to test both the 9.4 features and the RPMs. The Fedora 20 RPMs have a slight change in the unit file: we removed the PGPORT variable, which was one of the major complaints over the last few years, including for the RHEL 5 and RHEL 6 packages. Since RHEL 7 will switch to systemd, testing of the Fedora 20 packages is crucial for us.

First, please download the repo package from here. After installing this package, please run

yum install postgresql94-server postgresql94-contrib


which will install the minimum 9.4 stuff for you. Please feel free to report bugs to pgsql-bugs@postgresql.org.

Thanks!

Andrew Dunstan: Pgcon

The jsquery stuff from Oleg and Teodor looks awesome. I will be exploring it very soon. Meanwhile, here are my conference slides: http://www.slideshare.net/amdunstan/94json where I cover mostly 9.4 Json features that aren't about indexing.

This has been one of the better pgcons. Well done Dan and the rest of the team.

Greg Sabino Mullane: DBD::Pg, array slices, and pg_placeholder_nocolons

$
0
0

New versions of DBD::Pg, the Perl driver for PostgreSQL, have been recently released. In addition to some bug fixes, the handling of colons inside SQL statements has been improved in version 3.2.1, and a new attribute named pg_placeholder_nocolons was added by Graham Ollis in version 3.2.0. Before seeing it in action, let's review the concept of placeholders in DBI and DBD::Pg.

Placeholders allow you to store a dummy representation of a value inside your SQL statement. This means you can prepare a SQL statement in advance without specific values, and fill in the values later when it is executed. The two main advantages of doing it this way are avoiding worrying about quoting, and re-using the same statement with different values. DBD::Pg allows for three styles of placeholders: question mark, dollar sign, and named parameters (aka colons). Here's an example of each:

$SQL = 'SELECT tbalance FROM pgbench_tellers WHERE tid = ? AND bid = ?';
$sth = $dbh->prepare($SQL);
$sth->execute(12,33);

$SQL = 'SELECT tbalance FROM pgbench_tellers WHERE tid = $1 AND bid = $2';
$sth = $dbh->prepare($SQL);
$sth->execute(12,33);

$SQL = 'SELECT tbalance FROM pgbench_tellers WHERE tid = :teller AND bid = :bank';
$sth = $dbh->prepare($SQL);
$sth->bind_param(':teller', 10);
$sth->bind_param(':bank', 33);
$sth->execute();

One of the problems with placeholders is that the symbols used are not exclusive for DBI only, but can be valid SQL characters as well, with their own special meaning. For example, question marks are used by geometric operators, dollar signs are used in Postgres for dollar quoting, and colons are used for both type casts and array slices. DBD::Pg has a few ways to solve these problems.

Question marks are the preferred style of placeholders for many users of DBI (as well as some other systems). They are easy to visualize and great for simple queries. However, question marks can be used as operators inside of Postgres. To get around this, you can use the handle attribute pg_placeholder_dollaronly, which will ignore any placeholders other than dollar signs:

## Fails:
$SQL="SELECT ?- lseg'((-1,0),(1,0))' FROM pg_class WHERE relname = \$1";
$sth = $dbh->prepare($SQL);
## Error is: Cannot mix placeholder styles "?" and "$1"

## Works:
$dbh->{pg_placeholder_dollaronly} = 1;
$sth = $dbh->prepare($SQL);
$sth->execute('foobar');
## For safety:
$dbh->{pg_placeholder_dollaronly} = 0;

Another good form of placeholder is the dollar sign. Postgres itself uses dollar signs for its prepared queries. DBD::Pg will actually transform the question mark and colon versions to dollar signs internally before sending the query off to Postgres to be prepared. A big advantage of using dollar sign placeholders is the re-use of parameters. Dollar signs have two problems: first, Perl uses them as a sigil; second, Postgres uses them for dollar quoting. However, DBD::Pg is smart enough to tell the difference between dollar quoting and dollar-sign placeholders, so dollar signs as placeholders should always simply work.

The final form of placeholder is 'named parameters' or simply 'colons'. In this format, an alphanumeric string comes right after a colon to "name" the parameter. The main advantage to this form of placeholder is the ability to bind variables by name in your code. The downside is that colons are used by Postgres for both type casting and array slices. The type casting (e.g. 123::int) is detected by DBD::Pg and is not a problem. The detection of array slices was improved in 3.2.1, such that a number-colon-number sequence is never interpreted as a placeholder. However, there are many other ways to write array slices. Therefore, the pg_placeholder_nocolons attribute was invented. When activated, it effectively turns off the use of named parameters:

## Works:
$SQL = q{SELECT relacl[1:2] FROM pg_class WHERE relname = ?};
$sth = $dbh->prepare($SQL);
$sth->execute('foobar');

## Fails:
$SQL = q{SELECT relacl[1 :2] FROM pg_class WHERE relname = ?};
$sth = $dbh->prepare($SQL);
## Error is: Cannot mix placeholder styles ":foo" and "?"

## Works:
$dbh->{pg_placeholder_nocolons} = 1;
$SQL = q{SELECT relacl[1 :2] FROM pg_class WHERE relname = ?};
$sth = $dbh->prepare($SQL);
$sth->execute('foobar');

Which placeholder style you use is up to you (or your framework / supporting module!), but there should be enough options now between pg_placeholder_dollaronly and pg_placeholder_nocolons to support your style peacefully.

Josh Berkus: 9.4 Theme Contest Analyzed by 9.4

So a couple weeks ago I ran a little contest to see who could come up with a slogan for PostgreSQL 9.4.  Surprisingly, we got over 300 votes on various slogans, which means I need to do some statistics to analyze them -- which means I'm going to show off some of PostgreSQL's new 9.4 features as part of that!

Version 9.4 includes a number of new aggregate, array and set operations which make it vastly easier and faster to do statistical summaries and analysis.  Most of these were contributed by Andrew Gierth, including the two I'm going to use below, FILTER and WITHIN GROUP.  I'm also going to use MATERIALIZED VIEWS, developed by Kevin Grittner.  First, though, I need to import the data.  So I downloaded the survey results as a CSV, and created a table for them in PostgreSQL and loaded it up:

CREATE TABLE raw_survey (
    ts       timestamptz,
    prf      integer,
    moreways integer,
    devops   integer,
    moresql  integer,
    yesql    integer
);

\copy raw_survey from 'slogans.csv' with csv header

Now, Google's column-per-question format isn't very friendly to analysis and comparison; I want a more vertical orientation.  So I create one as a MatView.  This means that if I reimport the data in the future, or weed out obvious ballot-box stuffing, I just need to refresh it:

CREATE MATERIALIZED VIEW slogans AS
SELECT 'Performance, Reliability, Flexibility' as slogan, prf as vote
FROM raw_survey
UNION ALL
SELECT 'More Ways to Database', moreways
FROM raw_survey
UNION ALL
SELECT 'A Database for Dev and Ops', devops
FROM raw_survey
UNION ALL
SELECT 'More Than SQL', moresql
FROM raw_survey
UNION ALL
SELECT 'NoSQL + YeSQL = PostgreSQL', yesql
FROM raw_survey;


Now, for some statistics.  A total or average is easy, but it's not statistically sound.  A median is a much better statistic.  I also want to know the balance of people who hated a slogan (1) vs. loved it and put it first (5).  So, some of the new aggregates.

In the past, I've retrieved medians by either using SciPy inside PL/Python, or by doing some elaborate calculations on windowing rank.  No more.  Now I can do a simple one-line median using WITHIN GROUP.  WITHIN GROUP is a lot like a windowing aggregate, except that it returns a single summary aggregate.  Shipping with version 9.4 are several such aggregates, including percentile_cont() which is one of three functions which allow you to get the value at the stated percent of a sorted group: in this case, 0.5 to get a median.  Like so:

SELECT slogan,
    percentile_cont(0.5) WITHIN GROUP (ORDER BY vote)
FROM slogans
GROUP BY slogan;


                slogan                 | percentile_cont
---------------------------------------+-----------------
 A Database for Dev and Ops            |               3
 More Than SQL                         |               3
 More Ways to Database                 |               3
 NoSQL + YeSQL = PostgreSQL            |               3
 Performance, Reliability, Flexibility |               4

"Performance, Reliability, Flexibility" is taking a clear lead here.  Incidentally, percentile_cont() can take an array of values in order to give you a full box (remember, every time you say "big data" without drawing a box plot, God kills a kitten):

SELECT slogan,
    percentile_cont(ARRAY[0.1,0.25,0.5,0.75,0.9]) WITHIN GROUP (ORDER BY vote)
FROM slogans
GROUP BY slogan;


                slogan                 | percentile_cont
---------------------------------------+-----------------
 A Database for Dev and Ops            | {1,2,3,3,4}
 More Than SQL                         | {1.4,2,3,4,5}
 More Ways to Database                 | {1,2,3,4,5}
 NoSQL + YeSQL = PostgreSQL            | {1,1,3,4,5}
 Performance, Reliability, Flexibility | {2,3,4,5,5}
Let's check our "loves" and "hates" to see if they tell us anything different.  Now, the old way to do this would be:

SELECT slogan,
    sum(CASE WHEN vote = 1 THEN 1 ELSE 0 END) as hates,
    sum(CASE WHEN vote = 5 THEN 1 ELSE 0 END) as loves
FROM slogans
GROUP BY slogan;


Awkward, neh?  Well, no more, thanks to the FILTER clause:

SELECT slogan,
    count(*) FILTER ( WHERE vote = 1 ) as hates,
    count(*) FILTER ( WHERE vote = 5 ) as loves
FROM slogans
GROUP BY slogan;


Isn't that way more intuitive and readable?  I think it is, anyway.  So, let's put it all together:

SELECT slogan,
    percentile_cont(0.5) WITHIN GROUP (ORDER BY vote) as median,
    count(*) FILTER ( WHERE vote = 1 ) as hates,
    count(*) FILTER ( WHERE vote = 5 ) as loves
FROM slogans
GROUP BY slogan;


And the results:


                slogan                 | median | hates | loves
---------------------------------------+--------+-------+-------
 A Database for Dev and Ops            |      3 |    47 |    21
 More Than SQL                         |      3 |    32 |    58
 More Ways to Database                 |      3 |    39 |    55
 NoSQL + YeSQL = PostgreSQL            |      3 |    81 |    58
 Performance, Reliability, Flexibility |      4 |    11 |   118

And there we have it: "Performance, Reliability, Flexibility" is the winning theme idea for 9.4.  It wins on median, and on hates vs. loves counts.

Congratulations Korry Douglas; I'll contact you about shipping your Chelnik.  Note that the theme will be workshopped a little bit to fit in the structure of the final 9.4 release announcement (i.e. we may change it slightly to match the sections of the actual press release), but we're going with that general idea now.

Shaun M. Thomas: PGCon 2014 Unconference: A Community


This May, I attended my first international conference: PGCon 2014. Though the schedule spanned from May 20th to May 23rd, I came primarily for the talks. Then there was the Unconference on the 24th. I’d never heard of such a thing, but it was billed as a good way to network and find out what community members want from PostgreSQL. After attending the Unconference, I must admit I’m exceptionally glad it exists; it’s something I believe every strong Open Source project needs.

Why do I say that, having only been to one of them? It’s actually fairly simple. Around 10AM Saturday, everyone piled into the large lecture hall and had a seat. There were significantly fewer attendees, but most of the core committers remained for the festivities. We were promised pizza and PostgreSQL, and that’s all anyone needed. Josh Berkus started the festivities by announcing the rules and polling for ideas. The final schedule was pretty interesting in itself, but I was more enamored by the process and the response it elicited.

I’m no stranger to the community, and the mailing lists are almost overwhelmingly active. But these conversations, nearly all of them, are focused on assistance and hacker background noise. The thing that stood out to me during the Unconference planning was its organic nature. It wasn’t just that we chose the event schedule democratically. It wasn’t the wide range of topics. It wasn’t even the fact core members were there to listen. It was the engagement.

These people were excited and enjoying talking about PostgreSQL in a way I’ve never witnessed, and I’ve spoken at Postgres Open twice so far. I’ve seen several talks, been on both sides of the podium, and no previous experience even comes close. We were all having fun brainstorming about PostgreSQL and its future. For one day, it wasn’t about pre-cooked presentations chosen via committee, but about what “the community” wanted to discuss.

When it came time for the talks themselves, this atmosphere persisted. We agreed and disagreed, we had long and concise arguments for and against ideas, clarified our positions, and generally converged toward a loose consensus. And it was glorious. I know we were recording the sessions, so if you have the time when the videos are available, I urge you to watch just one so you can see the beauty and flow of our conversations.

I feel so strongly about this that I believe PGCon needs to start a day earlier. One unfortunate element about the Unconference is that it happens on a Saturday, when everyone wants to leave and return to their families. Worse, there is a marathon on Sunday, meaning it is difficult or even impossible to secure a hotel room for the Saturday event. People tend to follow the path of least resistance, so if there is a problem getting lodging, they won’t go.

And that’s a shame. Having a core of interested and engaged community members not only improves the reputation of PostgreSQL, but its advocacy as well. If people feel they can contribute without having to code, they’ll be more likely to do so. If those contributions, no matter how small, are acknowledged, their progenitors will stick around. I believe this is the grass-roots effort that makes PostgreSQL the future of the database world, and whoever came up with the Unconference deserves every accolade I can exclaim.

We need more of this. PostgreSQL has one of the most open communities I’ve had the pleasure of participating in, and that kind of momentum can’t really be forced. I hope every future PostgreSQL conference in every country has one of these, so everyone with the motivation can take part in the process.

Finally, find your nearest PostgreSQL User Group, and join the community. You’ll be glad you did.

Keith Fiske: Table Partitioning and Foreign Keys


Table partitioning & foreign keys don’t get along very well in databases and PostgreSQL’s lack of having it built in shows it very clearly with the workarounds that are necessary to avoid the issues. The latest release of pg_partman deals with the lesser of two shortcomings that must be dealt with, that being where child tables in a partition set do not automatically inherit foreign keys created on the parent table. I’ll be using my other extension pg_jobmon as a reference for example here since it works well to illustrate both the issues and possible solutions. You can see here the job_detail table, which contains the individual steps of a logged job, references the the job_log table for the main job_id values.

keith=# \d jobmon.job_detail
                                           Table "jobmon.job_detail"
    Column    |           Type           |                              Modifiers                              
--------------+--------------------------+---------------------------------------------------------------------
 job_id       | bigint                   | not null
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass)
 action       | text                     | not null
 start_time   | timestamp with time zone | not null
 end_time     | timestamp with time zone | 
 elapsed_time | real                     | 
 status       | text                     | 
 message      | text                     | 
Indexes:
    "job_detail_step_id_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_job_id_idx" btree (job_id)
Foreign-key constraints:
    "job_detail_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id) ON DELETE CASCADE

With version <= 1.7.0 of pg_partman, turning this table into a partition set illustrates the issue.

keith=# select partman.create_parent('jobmon.job_detail', 'job_id', 'id-static', '10000', p_jobmon := false);
 create_parent 
---------------
 
(1 row)

keith=# \d+ jobmon.job_detail
                                                               Table "jobmon.job_detail"
    Column    |           Type           |                              Modifiers                              | Storage  | Stats target | Description 
--------------+--------------------------+---------------------------------------------------------------------+----------+--------------+-------------
 job_id       | bigint                   | not null                                                            | plain    |              | 
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass) | plain    |              | 
 action       | text                     | not null                                                            | extended |              | 
 start_time   | timestamp with time zone | not null                                                            | plain    |              | 
 end_time     | timestamp with time zone |                                                                     | plain    |              | 
 elapsed_time | real                     |                                                                     | plain    |              | 
 status       | text                     |                                                                     | extended |              | 
 message      | text                     |                                                                     | extended |              | 
Indexes:
    "job_detail_step_id_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_job_id_idx" btree (job_id)
Foreign-key constraints:
    "job_detail_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id) ON DELETE CASCADE
Triggers:
    job_detail_part_trig BEFORE INSERT ON jobmon.job_detail FOR EACH ROW EXECUTE PROCEDURE jobmon.job_detail_part_trig_func()
Child tables: jobmon.job_detail_p0,
              jobmon.job_detail_p10000,
              jobmon.job_detail_p20000,
              jobmon.job_detail_p30000,
              jobmon.job_detail_p40000
Has OIDs: no

keith=# \d+ jobmon.job_detail_p0
                                                             Table "jobmon.job_detail_p0"
    Column    |           Type           |                              Modifiers                              | Storage  | Stats target | Description 
--------------+--------------------------+---------------------------------------------------------------------+----------+--------------+-------------
 job_id       | bigint                   | not null                                                            | plain    |              | 
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass) | plain    |              | 
 action       | text                     | not null                                                            | extended |              | 
 start_time   | timestamp with time zone | not null                                                            | plain    |              | 
 end_time     | timestamp with time zone |                                                                     | plain    |              | 
 elapsed_time | real                     |                                                                     | plain    |              | 
 status       | text                     |                                                                     | extended |              | 
 message      | text                     |                                                                     | extended |              | 
Indexes:
    "job_detail_p0_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_p0_job_id_idx" btree (job_id)
Check constraints:
    "job_detail_p0_partition_check" CHECK (job_id >= 0::bigint AND job_id < 10000::bigint)
Inherits: jobmon.job_detail
Has OIDs: no

You can see it is now a partitioned table, but if you look at any of the children, none of them have the FK back to the main job_log table.

As a side note, notice I set the p_jobmon parameter to false in create_parent(). By default pg_partman uses pg_jobmon when it is installed to log everything it does and provide monitoring that your partitioning is working. Since this would mean pg_jobmon is trying to log the partitioning steps of its own table, it puts it into a permanent lockwait state since it’s trying to write to the table it is partitioning. Turning pg_jobmon off for the initial creation avoids this compatibility issue between these two extensions. It can be turned back on for monitoring of future child table creation by modifying the jobmon column in pg_partman’s part_config table. Creation of partitions ahead of the current one does not interfere since a lock on the parent table is no longer required.

Back to the foreign key issue… Let's undo the partitioning here, upgrade pg_partman, and try again.

keith=# select partman.undo_partition_id('jobmon.job_detail', 20, p_keep_table := false);
NOTICE:  Copied 0 row(s) to the parent. Removed 5 partitions.
 undo_partition_id 
-------------------
                 0
(1 row)

keith=# alter extension pg_partman update to '1.7.1';
ALTER EXTENSION

keith=# select partman.create_parent('jobmon.job_detail', 'job_id', 'id-static', '10000', p_jobmon := false);
 create_parent 
---------------
 
(1 row)

keith=# \d jobmon.job_detail_p0
                                         Table "jobmon.job_detail_p0"
    Column    |           Type           |                              Modifiers                              
--------------+--------------------------+---------------------------------------------------------------------
 job_id       | bigint                   | not null
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass)
 action       | text                     | not null
 start_time   | timestamp with time zone | not null
 end_time     | timestamp with time zone | 
 elapsed_time | real                     | 
 status       | text                     | 
 message      | text                     | 
Indexes:
    "job_detail_p0_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_p0_job_id_idx" btree (job_id)
Check constraints:
    "job_detail_p0_partition_check" CHECK (job_id >= 0::bigint AND job_id < 10000::bigint)
Foreign-key constraints:
    "job_detail_p0_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
Inherits: jobmon.job_detail

Now our child table has the parent foreign key! The apply_foreign_keys() PL/pgSQL function and the reapply_foreign_keys.py script that are part of version 1.7.1 can actually be used on any table inheritance set, not just the ones managed by pg_partman, so some may find them useful elsewhere as well.
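For example, re-applying the parent's foreign keys to a single child table might look something like the sketch below (the exact function signature is an assumption on my part; check the pg_partman documentation for your version):

-- Copy the parent table's foreign keys onto one child table
SELECT partman.apply_foreign_keys('jobmon.job_detail', 'jobmon.job_detail_p0');

So, what happens if we now partition the reference table, job_log, as well?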

keith=# select partman.create_parent('jobmon.job_log', 'job_id', 'id-static', '10000', p_jobmon := false);
 create_parent 
---------------
 
(1 row)

keith=# \d+ jobmon.job_log
                                                             Table "jobmon.job_log"
   Column   |           Type           |                            Modifiers                            | Storage  | Stats target | Description 
------------+--------------------------+-----------------------------------------------------------------+----------+--------------+-------------
 job_id     | bigint                   | not null default nextval('jobmon.job_log_job_id_seq'::regclass) | plain    |              | 
 owner      | text                     | not null                                                        | extended |              | 
 job_name   | text                     | not null                                                        | extended |              | 
 start_time | timestamp with time zone | not null                                                        | plain    |              | 
 end_time   | timestamp with time zone |                                                                 | plain    |              | 
 status     | text                     |                                                                 | extended |              | 
 pid        | integer                  | not null                                                        | plain    |              | 
Indexes:
    "job_log_job_id_pkey" PRIMARY KEY, btree (job_id)
    "job_log_job_name_idx" btree (job_name)
    "job_log_pid_idx" btree (pid)
    "job_log_start_time_idx" btree (start_time)
    "job_log_status_idx" btree (status)
Referenced by:
    TABLE "jobmon.job_detail" CONSTRAINT "job_detail_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id) ON DELETE CASCADE
    TABLE "jobmon.job_detail_p0" CONSTRAINT "job_detail_p0_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
    TABLE "jobmon.job_detail_p10000" CONSTRAINT "job_detail_p10000_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
    TABLE "jobmon.job_detail_p20000" CONSTRAINT "job_detail_p20000_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
    TABLE "jobmon.job_detail_p30000" CONSTRAINT "job_detail_p30000_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
    TABLE "jobmon.job_detail_p40000" CONSTRAINT "job_detail_p40000_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobmon.job_log(job_id)
Triggers:
    job_log_part_trig BEFORE INSERT ON jobmon.job_log FOR EACH ROW EXECUTE PROCEDURE jobmon.job_log_part_trig_func()
    trg_job_monitor AFTER UPDATE ON jobmon.job_log FOR EACH ROW EXECUTE PROCEDURE jobmon.job_monitor()
Child tables: jobmon.job_log_p0,
              jobmon.job_log_p10000,
              jobmon.job_log_p20000,
              jobmon.job_log_p30000,
              jobmon.job_log_p40000
Has OIDs: no


keith=# \d jobmon.job_log_p0
                                        Table "jobmon.job_log_p0"
   Column   |           Type           |                            Modifiers                            
------------+--------------------------+-----------------------------------------------------------------
 job_id     | bigint                   | not null default nextval('jobmon.job_log_job_id_seq'::regclass)
 owner      | text                     | not null
 job_name   | text                     | not null
 start_time | timestamp with time zone | not null
 end_time   | timestamp with time zone | 
 status     | text                     | 
 pid        | integer                  | not null
Indexes:
    "job_log_p0_pkey" PRIMARY KEY, btree (job_id)
    "job_log_p0_job_name_idx" btree (job_name)
    "job_log_p0_pid_idx" btree (pid)
    "job_log_p0_start_time_idx" btree (start_time)
    "job_log_p0_status_idx" btree (status)
Check constraints:
    "job_log_p0_partition_check" CHECK (job_id >= 0::bigint AND job_id < 10000::bigint)
Inherits: jobmon.job_log

It partitions the table without any errors, and you can see all of the job_detail child tables' foreign keys referencing the parent job_log table. But notice the job_log_p0 child table? It has no references from any of those children at all. And this is the bigger issue that pg_partman does not solve, and most likely never will…

Foreign key reference checks to the parent table in an inheritance set do not propagate to the children

Since the parent table in an inheritance set is typically either empty or only contains a fraction of the total data, a table referencing the partition set will either fail on every insert or fail as soon as it hits a value that only exists in a child table. The SQL statements below illustrate the issue:

keith=# INSERT INTO jobmon.job_log (owner, job_name, start_time, pid) values ('keith', 'FK FAILURE TEST', now(), pg_backend_pid());
INSERT 0 0

keith=# select * from jobmon.job_log;
 job_id | owner |                     job_name                     |          start_time           | end_time | status |  pid  
--------+-------+--------------------------------------------------+-------------------------------+----------+--------+-------
      2 | keith | FK FAILURE TEST                                  | 2014-05-26 23:14:35.830266-04 | «NULL»   | «NULL» | 25286

keith=# insert into jobmon.job_detail (job_id, action, start_time) values (2, 'FK FAILURE TEST STEP 1', now());
ERROR:  insert or update on table "job_detail_p0" violates foreign key constraint "job_detail_p0_job_id_fkey"
DETAIL:  Key (job_id)=(2) is not present in table "job_log".
CONTEXT:  SQL statement "INSERT INTO jobmon.job_detail_p0 VALUES (NEW.*)"
PL/pgSQL function jobmon.job_detail_part_trig_func() line 11 at SQL statement

You can clearly see the job_log table has the job_id value "2", but trying to insert that value into the table that uses it as a reference fails. This is because that value actually lives in job_log_p0, not job_log, and the FK reference check does not propagate to the child tables.

keith=# select * from only jobmon.job_log;
 job_id | owner | job_name | start_time | end_time | status | pid 
--------+-------+----------+------------+----------+--------+-----
(0 rows)

keith=# select * from only jobmon.job_log_p0;
 job_id | owner |    job_name     |          start_time           | end_time | status |  pid  
--------+-------+-----------------+-------------------------------+----------+--------+-------
      2 | keith | FK FAILURE TEST | 2014-05-26 23:14:35.830266-04 | «NULL»   | «NULL» | 25286
(1 row)

I'm not sure of all of the reasons why PostgreSQL doesn't allow FK checks to propagate down inheritance trees, but I do know one of the consequences of doing so could be some heavy performance hits for the referencing table if the inheritance set is very large. Every insert would have to scan down all tables in the inheritance tree. Even with indexes, this could be very expensive.

There is a way to write a trigger and “fake” the foreign key if this is needed. I looked into this because I do want to be able to partition the pg_jobmon tables and keep referential integrity. To see how this works, I’m starting with a clean installation of pg_jobmon (no partitions). First the original foreign key on job_detail has to be removed, then a trigger is created in its place.

keith=# alter table jobmon.job_detail drop constraint job_detail_job_id_fkey;
ALTER TABLE

keith=# CREATE OR REPLACE FUNCTION jobmon.job_detail_fk_trigger() RETURNS trigger
keith-#     LANGUAGE plpgsql
keith-#     AS $$
keith$# DECLARE
keith$# v_job_id    bigint;
keith$# BEGIN
keith$#     SELECT l.job_id INTO v_job_id
keith$#     FROM jobmon.job_log l
keith$#     WHERE l.job_id = NEW.job_id;
keith$# 
keith$#     IF v_job_id IS NULL THEN
keith$#         RAISE foreign_key_violation USING 
keith$#             MESSAGE='Insert or update on table "jobmon.job_detail" violates custom foreign key trigger "job_detail_fk_trigger" ',
keith$#             DETAIL='Key (job_id='||NEW.job_id||') is not present in jobmon.job_log';
keith$#     END IF;
keith$#     RETURN NEW;
keith$# END
keith$# $$;
CREATE FUNCTION

keith=# 
keith=# 
keith=# CREATE TRIGGER aa_job_detail_fk_trigger 
keith-# BEFORE INSERT OR UPDATE OF job_id
keith-# ON jobmon.job_detail
keith-# FOR EACH ROW
keith-# EXECUTE PROCEDURE jobmon.job_detail_fk_trigger();
CREATE TRIGGER

This MUST be a BEFORE trigger, and I gave the trigger name a prefix of "aa_" because PostgreSQL fires triggers of the same level in alphabetical order by name, and I want to ensure it fires first as best I can.
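If you want to double-check which user triggers exist on the parent and how their names sort (which is the order same-level triggers fire in), a quick catalog query like this works; it's just my own sketch, not part of the original session:

-- List non-internal triggers on the parent table in name order
SELECT tgname
FROM pg_trigger
WHERE tgrelid = 'jobmon.job_detail'::regclass
AND NOT tgisinternal
ORDER BY tgname;

Now we partition job_detail & job_log the same as before.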

keith=# select partman.create_parent('jobmon.job_detail', 'job_id', 'id-static', '10000', p_jobmon := false);
 create_parent 
---------------
 
(1 row)

Time: 168.831 ms
keith=# \d+ jobmon.job_detail
                                                               Table "jobmon.job_detail"
    Column    |           Type           |                              Modifiers                              | Storage  | Stats target | Description 
--------------+--------------------------+---------------------------------------------------------------------+----------+--------------+-------------
 job_id       | bigint                   | not null                                                            | plain    |              | 
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass) | plain    |              | 
 action       | text                     | not null                                                            | extended |              | 
 start_time   | timestamp with time zone | not null                                                            | plain    |              | 
 end_time     | timestamp with time zone |                                                                     | plain    |              | 
 elapsed_time | real                     |                                                                     | plain    |              | 
 status       | text                     |                                                                     | extended |              | 
 message      | text                     |                                                                     | extended |              | 
Indexes:
    "job_detail_step_id_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_job_id_idx" btree (job_id)
Triggers:
    aa_job_detail_fk_trigger BEFORE INSERT OR UPDATE OF job_id ON jobmon.job_detail FOR EACH ROW EXECUTE PROCEDURE jobmon.job_detail_fk_trigger()
    job_detail_part_trig BEFORE INSERT ON jobmon.job_detail FOR EACH ROW EXECUTE PROCEDURE jobmon.job_detail_part_trig_func()
Child tables: jobmon.job_detail_p0,
              jobmon.job_detail_p10000,
              jobmon.job_detail_p20000,
              jobmon.job_detail_p30000,
              jobmon.job_detail_p40000
Has OIDs: no

keith=# \d+ jobmon.job_detail_p0
                                                             Table "jobmon.job_detail_p0"
    Column    |           Type           |                              Modifiers                              | Storage  | Stats target | Description 
--------------+--------------------------+---------------------------------------------------------------------+----------+--------------+-------------
 job_id       | bigint                   | not null                                                            | plain    |              | 
 step_id      | bigint                   | not null default nextval('jobmon.job_detail_step_id_seq'::regclass) | plain    |              | 
 action       | text                     | not null                                                            | extended |              | 
 start_time   | timestamp with time zone | not null                                                            | plain    |              | 
 end_time     | timestamp with time zone |                                                                     | plain    |              | 
 elapsed_time | real                     |                                                                     | plain    |              | 
 status       | text                     |                                                                     | extended |              | 
 message      | text                     |                                                                     | extended |              | 
Indexes:
    "job_detail_p0_pkey" PRIMARY KEY, btree (step_id)
    "job_detail_p0_job_id_idx" btree (job_id)
Check constraints:
    "job_detail_p0_partition_check" CHECK (job_id >= 0::bigint AND job_id < 10000::bigint)
Inherits: jobmon.job_detail
Has OIDs: no

keith=# select partman.create_parent('jobmon.job_log', 'job_id', 'id-static', '10000', p_jobmon := false);
 create_parent 
---------------
 
(1 row)

Time: 197.390 ms
keith=# \d+ jobmon.job_log
                                                             Table "jobmon.job_log"
   Column   |           Type           |                            Modifiers                            | Storage  | Stats target | Description 
------------+--------------------------+-----------------------------------------------------------------+----------+--------------+-------------
 job_id     | bigint                   | not null default nextval('jobmon.job_log_job_id_seq'::regclass) | plain    |              | 
 owner      | text                     | not null                                                        | extended |              | 
 job_name   | text                     | not null                                                        | extended |              | 
 start_time | timestamp with time zone | not null                                                        | plain    |              | 
 end_time   | timestamp with time zone |                                                                 | plain    |              | 
 status     | text                     |                                                                 | extended |              | 
 pid        | integer                  | not null                                                        | plain    |              | 
Indexes:
    "job_log_job_id_pkey" PRIMARY KEY, btree (job_id)
    "job_log_job_name_idx" btree (job_name)
    "job_log_pid_idx" btree (pid)
    "job_log_start_time_idx" btree (start_time)
    "job_log_status_idx" btree (status)
Triggers:
    job_log_part_trig BEFORE INSERT ON jobmon.job_log FOR EACH ROW EXECUTE PROCEDURE jobmon.job_log_part_trig_func()
    trg_job_monitor AFTER UPDATE ON jobmon.job_log FOR EACH ROW EXECUTE PROCEDURE jobmon.job_monitor()
Child tables: jobmon.job_log_p0,
              jobmon.job_log_p10000,
              jobmon.job_log_p20000,
              jobmon.job_log_p30000,
              jobmon.job_log_p40000
Has OIDs: no

You can see that triggers are not inherited by child tables, so that is why it must be a BEFORE trigger on the job_detail parent. The insert does not actually happen on the job_detail parent table, so the event must be caught before any insert is actually done. Also, this isn't quite as flexible as a real foreign key since there are no CASCADE options to handle data being removed on the parent (a rough workaround is sketched below). This also causes much heavier locks than a real foreign key.
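If you need delete handling, you can emulate ON DELETE CASCADE with another trigger, but it has to be created on every child of job_log, since row-level triggers don't fire on the parent for rows that physically live in the children. This is only a rough sketch I'm adding for illustration; the function and trigger names are made up, and the CREATE TRIGGER would have to be repeated for each job_log child:

CREATE OR REPLACE FUNCTION jobmon.job_log_fk_cascade_trigger() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
    -- Deleting through the job_detail parent removes matching rows
    -- from all of its children as well
    DELETE FROM jobmon.job_detail WHERE job_id = OLD.job_id;
    RETURN OLD;
END
$$;

CREATE TRIGGER aa_job_log_p0_fk_cascade_trigger
AFTER DELETE ON jobmon.job_log_p0
FOR EACH ROW
EXECUTE PROCEDURE jobmon.job_log_fk_cascade_trigger();

Let's see what happens if we try the same inserts that failed above: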

keith=# INSERT INTO jobmon.job_log (owner, job_name, start_time, pid) values ('keith', 'FK FAILURE TEST', now(), pg_backend_pid());
INSERT 0 0

keith=# select * from jobmon.job_log;
 job_id | owner |    job_name     |          start_time          | end_time | status | pid  
--------+-------+-----------------+------------------------------+----------+--------+------
      1 | keith | FK FAILURE TEST | 2014-05-27 12:59:03.06901-04 | «NULL»   | «NULL» | 3591
(1 row)

keith=# insert into jobmon.job_detail (job_id, action, start_time) values (1, 'FK FAILURE TEST STEP 1', now());
INSERT 0 0

keith=# select * from jobmon.job_detail;
 job_id | step_id |         action         |          start_time          | end_time | elapsed_time | status | message 
--------+---------+------------------------+------------------------------+----------+--------------+--------+---------
      1 |       1 | FK FAILURE TEST STEP 1 | 2014-05-27 12:59:40.03766-04 | «NULL»   |       «NULL» | «NULL» | «NULL»
(1 row)

keith=# select * from only jobmon.job_detail;
 job_id | step_id | action | start_time | end_time | elapsed_time | status | message 
--------+---------+--------+------------+----------+--------------+--------+---------
(0 rows)

keith=# select * from only jobmon.job_detail_p0;
 job_id | step_id |         action         |          start_time          | end_time | elapsed_time | status | message 
--------+---------+------------------------+------------------------------+----------+--------------+--------+---------
      1 |       1 | FK FAILURE TEST STEP 1 | 2014-05-27 12:59:40.03766-04 | «NULL»   |       «NULL» | «NULL» | «NULL»
(1 row)

No errors! And what happens if we try to insert invalid data, a job_id value that doesn't exist anywhere in job_log?

keith=# insert into jobmon.job_detail (job_id, action, start_time) values (2, 'FK FAILURE TEST STEP 1', now());
ERROR:  Insert or update on table "jobmon.job_detail" violates custom foreign key trigger "job_detail_fk_trigger" 
DETAIL:  Key (job_id=2) is not present in jobmon.job_log

Since the trigger function is doing a normal select on the parent table of the job_log partition set, it sees data across all the child partitions. And, since job_id is the partition column of job_log, the trigger function can also take advantage of constraint exclusion and only touch the single partition that value could be in. So this works very well in this case, even if the partition set grows extremely large. Now, if you create an FK trigger like this on any other column that doesn't have such constraints, you will begin to notice performance issues as the reference table grows in size. If your tables contain static, unchanging data, pg_partman has some additional options that can help here as well (see my previous post about constraint exclusion).
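If you want to verify the constraint exclusion behavior yourself, an EXPLAIN with a constant value is the easiest way. With constraint_exclusion left at its default of partition, only the job_log parent and the single matching child should show up in the plan (this is just my own quick check, not part of the original session):

-- "partition" is the default setting
SET constraint_exclusion = partition;
EXPLAIN SELECT * FROM jobmon.job_log WHERE job_id = 2;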

The other issue with this is exclusive to pg_jobmon being an extension. The lack of a foreign key and the presence of a trigger differ from the default extension code, so there is the potential that a future extension update could either remove the trigger or reinstate the foreign key. As far as I'm aware, there's currently no way to give an extension installation options for different code branches and keep things consistent. In the case of pg_jobmon, the extension is mostly feature complete and I don't foresee any updates breaking the above fix. But it is something to be aware of if you have to change the default code in any extension.

This is a complicated issue and one that many people don’t realize when trying to plan out table partitioning for more complex schemas. Hopefully I’ve helped clarify things and shown why partitioning is such a tricky issue to get right.
