Channel: Planet PostgreSQL

Michael Paquier: Postgres 9.5 feature highlight: Allocation routine suppressing OOM error


A couple of days ago, the following commit popped up in the PostgreSQL tree for the upcoming 9.5, introducing a feature particularly interesting for developers of backend extensions and plugins:

commit: bd4e2fd97d3db84bd970d6051f775b7ff2af0e9d
author: Robert Haas <rhaas@postgresql.org>
date: Fri, 30 Jan 2015 12:56:48 -0500
Provide a way to supress the "out of memory" error when allocating.

Using the new interface MemoryContextAllocExtended, callers can
specify MCXT_ALLOC_NO_OOM if they are prepared to handle a NULL
return value.

Michael Paquier, reviewed and somewhat revised by me.

The memory allocation routines of PostgreSQL live in mcxt.c and are declared in palloc.h, the most famous of them being palloc(), palloc0() and repalloc(), which work on CurrentMemoryContext. There are also higher-level routines called MemoryContextAlloc* that perform allocations in a memory context specified by the caller, making it possible to allocate memory in a context other than the current one. All the existing allocation routines share a common property: when an allocation request cannot be completed because the system is out of memory, the process simply errors out, contrary to malloc(), which returns a NULL pointer to the caller with errno set to ENOMEM.

The commit above introduces into the backend code a routine that bypasses this out-of-memory error and returns a NULL pointer when the system runs out of memory. This is particularly useful for features that have a plan B when plan A, which needed a certain amount of allocated buffer, could not get the memory it wanted. Imagine for example a backend process compressing data for a custom data type: if the compression buffer cannot be allocated, the process can store the data as-is instead of failing, making this case more robust.
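As a rough illustration of that fallback pattern, here is a minimal sketch; store_raw(), store_compressed() and my_compress() are hypothetical helpers, not PostgreSQL functions:

/*
 * Sketch only: try to grab a compression buffer without erroring out,
 * and fall back to storing the raw data if the allocation fails.
 * store_raw(), store_compressed() and my_compress() are hypothetical.
 */
static void
store_datum(const char *data, Size len)
{
    char   *buf;

    buf = MemoryContextAllocExtended(CurrentMemoryContext, len,
                                     MCXT_ALLOC_NO_OOM);
    if (buf == NULL)
    {
        /* Plan B: no memory for the compression buffer, keep the data as-is */
        store_raw(data, len);
        return;
    }

    store_compressed(buf, my_compress(data, len, buf));
    pfree(buf);
}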

So, the new routine is called MemoryContextAllocExtended, and comes with three control flags:

  • MCXT_ALLOC_HUGE, to perform allocations larger than 1GB. Used alone, this is equivalent to MemoryContextAllocHuge.
  • MCXT_ALLOC_NO_OOM, to avoid the ERROR when an OOM shows up. This is the real meat of the feature.
  • MCXT_ALLOC_ZERO, to fill the allocated memory with zeros. Used alone, this is equivalent to MemoryContextAllocZero.

Worth noticing is that the combination of MCXT_ALLOC_HUGE and MCXT_ALLOC_ZERO is something that even the existing routines cannot do. Now let's do something admittedly useless with this new routine: allocate a custom amount of memory and free it immediately afterwards, using a custom function defined as follows:

CREATE FUNCTION mcxtalloc_extended(size int,
    is_huge bool,
    is_no_oom bool,
    is_zero bool)
RETURNS bool
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT; 

And this function is coded like this:

Datum
mcxtalloc_extended(PG_FUNCTION_ARGS)
{
    Size    alloc_size = PG_GETARG_UINT32(0);
    bool    is_huge = PG_GETARG_BOOL(1);
    bool    is_no_oom = PG_GETARG_BOOL(2);
    bool    is_zero = PG_GETARG_BOOL(3);
    int     flags = 0;
    char   *ptr;

    if (is_huge)
        flags |= MCXT_ALLOC_HUGE;
    if (is_no_oom)
        flags |= MCXT_ALLOC_NO_OOM;
    if (is_zero)
        flags |= MCXT_ALLOC_ZERO;
    ptr = MemoryContextAllocExtended(CurrentMemoryContext,
            alloc_size, flags);
    if (ptr != NULL)
    {
        pfree(ptr);
        PG_RETURN_BOOL(true);
    }
    PG_RETURN_BOOL(false);
}

In a low-memory environment, a large allocation fails as follows:

-- Kick an OOM
=# SELECT mcxtalloc_extended(1024 * 1024 * 1024 - 1, false, false, false);
ERROR:  out of memory
DETAIL:  Failed on request of size 1073741823.

But with the new flag MCXT_ALLOC_NO_OOM the error is avoided, giving more options to plugin developers as well as in-core developers:

=# SELECT mcxtalloc_extended(1024 * 1024 * 1024 - 1, false, true, false);
 mcxtalloc_extended
--------------------
 f
(1 row)

Just for people wondering: this code is available in pg_plugins here.


Pierre Ducroquet: Modern C++ stored procedure wrapper


In an application following an intelligent database design, calls to stored procedures happen very often and thus must be done with as little boilerplate as possible.
Usually, frameworks abstracting calls to the database are just ORMs that completely ignore stored procedures, making the database dumb and moving all the logic into the application.

A year ago, I read on Planet PostgreSQL (http://tech.zalando.com/posts/zalando-stored-procedure-wrapper-part-i.html) about a simple system built using Java annotations and reflection.
A stored procedure can be called with just a few lines of interface code:

@SProcService
interface BasicExample {
    @SProcCall
    long computeProduct(@SProcParam int a, @SProcParam int b);
}

Recently, I started planning the development, in my spare time, of a C++/Qt5 application using a PostgreSQL database, and I realized I had no way to easily call stored procedures. Doing a proper database design for the application would thus be a huge pain from the C++ point of view, with database calls scattered through the middle of the application… Since my C++ skills needed an update (C++11 and C++14 have been out in the wild for a few years and I never had an opportunity to use the new features they bring), I figured this was the best time to do it.

C++ does not have (yet… C++17, I have faith in you) the attributes and introspection used in the stored procedure wrapper of Zalando. Instead, C++ has a great compile-time processing system: templates. Templates are not just meant for implementing generics, they are a Turing-complete meta-programming language. You can really do a lot of things with them. A lot. For instance, a tuple type working just like a Python tuple, storing a few values of different types side by side. Or a compile-time mathematical function. C++11 and C++14 brought variadic templates, auto and a few other tools that seemed very powerful and could yield great solutions for my problem.
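As a tiny, purely illustrative example of that kind of compile-time computation (not taken from the wrapper), a constexpr function lets the compiler do the math for you:

#include <iostream>

// Evaluated entirely at compile time when called with a constant expression.
constexpr long factorial(int n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

static_assert(factorial(5) == 120, "computed by the compiler");

int main()
{
    std::cout << factorial(10) << std::endl;   // 3628800, runtime calls work too
    return 0;
}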

After a few hours of hacking, I had my first interesting results:

SqlBindingMapper<QDateTime> get_now("now");
qDebug() << get_now();

With a few lines to implement the database connection (using QtSql so far, because I plan to write Qt5 applications with it), these two lines are enough to call NOW() in the database and map the result to a QDateTime, the Qt Date-Time representation object.

Of course, returning a single value from an argument-less function is not that interesting. Let's sum two numbers.

SqlBindingMapper<int, int, int> summer("sum");
qDebug() << summer(1, 2);

And this will display 3.

So that's for one returned record with one field. What about calling generate_series?

SqlBindingMapper<QList<int>, int, int> generateSeries("generate_series");
for (auto i: generateSeries(1, 10))
    qDebug() << i;

Now, what about the following composite type:

CREATE TYPE card AS (value integer, suit text);
CREATE FUNCTION test_card (a card ) RETURNS integer LANGUAGE SQL AS 
$function$ SELECT $1.value; $function$;

Calling that function from C++ only requires you to use std::tuple:

SqlBindingMapper<int, std::tuple<int, QString>> testCard("test_card");
int value = testCard(std::make_tuple(1, "test"));
qDebug() << value;

Qt QObject introspection is also supported, and during FOSDEM I hacked in support for arrays (OK, Qt vectors, but STL vectors are just as easy to support):

SqlBindingMapper<int, QVector<int>, int> array_length("array_length");
QVector<int> data;
data << 1 << 2;
qDebug() << "Our dims are :" << array_length(data, 1);

 

How does all this work behind the scenes? SqlBindingMapper is a template class that takes a variadic number of parameters, the first one being the return type. It implements operator(), returning the specified return type and taking the specified parameters. A query is then built (at runtime so far, but this could evolve) with placeholders and the appropriate casts, still using templates. The placeholders are then filled and, once the result comes back from the database, a SqlQueryResultMapper<T> instance maps the rows to the required objects.
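To give a rough idea of the mechanics, here is a simplified sketch (not the actual StoredProq code) of how a variadic template can count its arguments at compile time and emit the matching placeholder list:

#include <iostream>
#include <sstream>
#include <string>

// Simplified sketch: build "SELECT func($1, $2, ...)" for N template arguments.
// The real wrapper also adds casts, binds the values and maps the result rows.
template <typename... Args>
std::string buildCall(const std::string &function)
{
    std::ostringstream query;
    query << "SELECT " << function << "(";
    for (std::size_t i = 1; i <= sizeof...(Args); ++i)
        query << (i > 1 ? ", $" : "$") << i;
    query << ")";
    return query.str();
}

int main()
{
    std::cout << buildCall<int, int>("sum") << std::endl;   // SELECT sum($1, $2)
    return 0;
}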

So far, the code is a crude hack, done for recreational purposes and for my own education. If there is any interest in a proper explanation of the code, or people want to use it in real, serious projects, I would of course be happy to help or write more documentation.

Right now, the code packaging sucks. It is only three headers you would have to copy from the example on GitHub: https://github.com/pinaraf/StoredProq/ (pg_types.h, queryresult.h and sqlmapper.h)

I hope you had fun reading this!

Bruce Momjian: Sharding Presentation


As a followup to my scaling talk, I have written a draft of another talk, The Future of Postgres Sharding. It starts by explaining the advantages of sharding as a scaling option. It then covers future enhancements to individual Postgres features that, while useful on their own, could be combined to provide a powerful built-in Postgres sharding capability. I am hopeful this talk will help guide the community discussion of implementing built-in sharding.

Markus Winand: Modern SQL in PostgreSQL [and other databases]


“SQL has gone out of fashion lately—partly due to the NoSQL movement, but mostly because SQL is often still used like 20 years ago. As a matter of fact, the SQL standard continued to evolve during the past decades resulting in the current release of 2011. In this session, we will go through the most important additions since the widely known SQL-92, explain how they work and how PostgreSQL supports and extends them. We will cover common table expressions and window functions in detail and have a very short look at the temporal features of SQL:2011 and the related features of PostgreSQL.”

This is the abstract for the talk I gave at FOSDEM in Brussels on Saturday. The PostgreSQL community was so kind as to host this talk in their (way too small) devroom—hence the references to PostgreSQL. However, the talk is built upon standard SQL and covers features that are commonly available in DB2, Oracle, SQL Server and SQLite. MySQL does not yet support any of those features except OFFSET, which is evil.

One last thing before going on to the slides: Use The Index, Luke has a shop. Stickers, coasters, books, mugs. Have a look.

Find the slides on SlideShare.

Ernst-Georg Schmid: Finding mass spectra with PostgreSQL: Spectral contrast angle in SQL

The spectral contrast angle of two spectra is, according to the literature, another one of the methods for comparing spectra by similarity.

The spectral contrast angle S is calculated by building an intensity vector in N-dimensional space for each spectrum, where N is the number of m/z peaks in the spectrum, and taking the cosine of the angle between the two vectors. If both spectra are the same, S is 1.0; if the two are orthogonal, S is 0.0.

The formula is S = sum(a.intensity*b.intensity) / sqrt(sum(a.intensity^2)*sum(b.intensity^2)) with a and b being the spectra to compare.

On a table like

CREATE TABLE <spectra_table>
(
  id integer NOT NULL,
  "m/z" numeric NOT NULL,
  intensity numeric NOT NULL

)

this can easily be done in PostgreSQL with a common table expression:

WITH SQ AS (select "m/z", intensity from <spectra_table> where id = 1),
     ST AS (select "m/z", intensity from <spectra_table> where id = 2),
     qc AS (select count(1) as qp from SQ),
     tc AS (select count(1) as tp from ST)
select sum(SQ.intensity * ST.intensity) / sqrt(sum(SQ.intensity ^ 2) * sum(ST.intensity ^ 2)) as spectral_contrast
from SQ, ST, qc, tc
where qc.qp = tc.tp and SQ."m/z" = ST."m/z"


One interesting property of S is that it still evaluates to 1.0 if the intensities of the two spectra are different, but only by a constant factor. If I understood it right, this is correct, because it means that you have measured the same composition at different concentrations.
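A quick self-contained check of that property (the values are made up; spectrum b is spectrum a with every intensity doubled):

WITH a(mz, intensity) AS (VALUES (100, 10.0), (200, 20.0), (300, 5.0)),
     b(mz, intensity) AS (VALUES (100, 20.0), (200, 40.0), (300, 10.0))
SELECT sum(a.intensity * b.intensity)
       / sqrt(sum(a.intensity ^ 2) * sum(b.intensity ^ 2)) AS spectral_contrast
FROM a JOIN b USING (mz);
-- returns 1.0, although b is "twice as concentrated" as a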

Put into a function:

CREATE OR REPLACE FUNCTION spectral_contrast(int, int) RETURNS numeric
    AS 'WITH SQ AS (select "m/z", intensity from <spectra_table> where id = $1),
             ST AS (select "m/z", intensity from <spectra_table> where id = $2),
             qc AS (select count(1) as qp from SQ),
             tc AS (select count(1) as tp from ST)
        select coalesce(sum(SQ.intensity * ST.intensity) / sqrt(sum(SQ.intensity ^ 2) * sum(ST.intensity ^ 2)), 0.0) as spectral_contrast
        from SQ, ST, qc, tc
        where qc.qp = tc.tp and SQ."m/z" = ST."m/z"'
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;


Please note the additional coalesce() to return 0.0 when the SQL evaluates to NULL because of the join.

In a real application, the calculation of S can of course be accelerated, e.g. by prefiltering spectra by their number of peaks or their m/z range.

That's it for today.

Pavel Stehule: template_fdw

Hi

I wrote a template foreign data wrapper. It is a very simple FDW that doesn't allow any DML or SELECT operation on its tables. It is based on Andrew Dunstan's blackhole FDW. What is the benefit of this strange data wrapper? I wrote it to make it possible to check plpgsql code that uses temporary tables: plpgsql_check cannot do static validation of functions that use temporary tables.

I have a function test:
CREATE OR REPLACE FUNCTION public.test()
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE r record;
BEGIN
  BEGIN
    DELETE FROM foo; -- temp table
  EXCEPTION WHEN OTHERS THEN
    CREATE TEMP TABLE foo(a int, b int);
  END;
  INSERT INTO foo VALUES(10,20);
  FOR r IN SELECT * FROM foo
  LOOP
    RAISE NOTICE '% %', r.a, r.b;
  END LOOP;
END;
$function$
I cannot verify this code with plpgsql_check due to the dependency on the temp table foo:
postgres=# select plpgsql_check_function('test()', fatal_errors := false);
plpgsql_check_function
----------------------------------------------------------------------------
error:42P01:5:SQL statement:relation "foo" does not exist
Query: DELETE FROM foo
-- ^
error:42P01:9:SQL statement:relation "foo" does not exist
Query: INSERT INTO foo VALUES(10,20)
-- ^
error:42P01:10:FOR over SELECT rows:relation "foo" does not exist
Query: SELECT * FROM foo
-- ^
error:55000:12:RAISE:record "r" is not assigned yet
Detail: The tuple structure of a not-yet-assigned record is indeterminate.
Context: SQL statement "SELECT r.a"
error:55000:12:RAISE:record "r" is not assigned yet
Detail: The tuple structure of a not-yet-assigned record is indeterminate.
Context: SQL statement "SELECT r.b"
(15 rows)

I could create a persistent table foo. But if I forget to drop this table, I can have lots of problems, and some of them can be invisible. So I created a "template storage" that disallows any DML or SELECT. This decreases the risk, and I can keep these fake tables persistent:
CREATE SERVER template FOREIGN DATA WRAPPER template_fdw;
CREATE FOREIGN TABLE foo(a int, b int) SERVER template;

postgres=# SELECT * FROM foo;
ERROR: cannot read from table "public.foo"
DETAIL: Table is template.
HINT: Create temp table by statement "CREATE TEMP TABLE foo(LIKE public.foo INCLUDING ALL);"

-- but EXPLAIN is valid
postgres=# EXPLAIN SELECT * FROM foo;
QUERY PLAN
-------------------------------------------------------
Foreign Scan on foo (cost=0.00..0.00 rows=0 width=8)
(1 row)

-- and now I can do static validation
postgres=# select plpgsql_check_function('test()', fatal_errors := false);
plpgsql_check_function
------------------------
(0 rows)

Hans-Juergen Schoenig: Geocoding: Entertaining results

Recently I was working on a project which had a need for geocoding. Normally we use PostGIS along with some free data for geocoding (http://postgis.net/docs/Geocode.html). However, to cross check data and to verify results I decided to write a little Python function to see, what Google would actually give me. To come up with the […]

Feng Tian: Julia, Postgres on Mac

I took Julia and PostgreSQL for a test drive on my MacBook. It worked like a charm.

First, the Juno IDE is quite nice. Download, drag, drop, open, and it runs! The connection to Postgres via libpq is not very usable at this moment, so I went the ODBC route. I used MacPorts to install psqlODBC and unixODBC, and configured the DSN in ~/.odbc.ini:

Fengs-MBP:pg ftian$ cat ~/.odbc.ini 
[ftian]
Driver          = /opt/local/lib/psqlodbcw.so 
ServerName      = localhost
Port            = 5432
Username        = ftian
Database        = ftian

Tested with isql, the DSN works. Nice. Next, I tried ODBC in Julia: it cannot find any DSN. Oops. It turns out the Julia ODBC package needs some help to locate libodbc. The file to edit is hidden at

/Users/ftian/.julia/v0.3/ODBC/src/ODBC_Types.jl

After that, all works -- time to play with some data.   




So I loaded a CSV file (TPC-H 1G, lineitem) in Julia; it took about 2 minutes. I am quite impressed -- compare this to R! TPC-H data is actually |-separated, not comma-separated, but Julia got the lineitem count right. My favorite query language is still SQL, so let's pipe the CSV file through PostgreSQL using the wonderful file FDW.


set search_path=csvfdw;
CREATE FOREIGN TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    INTEGER /*DECIMAL(15,2)*/ NOT NULL,
                             L_EXTENDEDPRICE  MONEY/*DECIMAL(15,2)*/ NOT NULL,
                             L_DISCOUNT    DOUBLE PRECISION /*DECIMAL(15,2)*/ NOT NULL,
                             L_TAX         DOUBLE PRECISION /*DECIMAL(15,2)*/ NOT NULL,
                             L_RETURNFLAG  VARCHAR(1) /*CHAR(1)*/ NOT NULL,
                             L_LINESTATUS  VARCHAR(1) /*CHAR(1)*/ NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT VARCHAR(25) /*CHAR(25)*/ NOT NULL,
                             L_SHIPMODE     VARCHAR(10) /*CHAR(10)*/ NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL)
        server filefdw
        options ( filename '_PWD_/data/lineitem.tbl', 
                  format 'csv',
                  delimiter '|');


Now, let's ask: on average, how many items are in an order? About 4. PostgreSQL answered this query in about 10 seconds. If we load the data into a table, we can answer it in about 5 seconds.

ftian=# select avg(lno) from ( select max(l_linenumber) as lno from csvfdw.lineitem group by l_orderkey) tmpt;
        avg         
--------------------
 4.0008100000000000
(1 row)

Time: 9912.140 ms
ftian=# select avg(lno) from ( select max(l_linenumber) as lno from tpch1.lineitem group by l_orderkey) tmpt;
        avg         
--------------------
 4.0008100000000000
(1 row)

Time: 5598.877 ms

Of course, I cannot help showing off our own code :-) We run the query from the CSV file faster than original PG runs it after loading the data!

ftian=# set vdb_jit = 1;
SET
Time: 0.140 ms
ftian=# select avg(lno) from ( select max(l_linenumber) as lno from csvfdw.lineitem group by l_orderkey) tmpt;
        avg         
--------------------
 4.0008100000000000
(1 row)

Time: 3755.514 ms

ftian=# select avg(lno) from ( select max(l_linenumber) as lno from tpch1.lineitem group by l_orderkey) tmpt;
        avg         
--------------------
 4.0008100000000000
(1 row)

Time: 1863.698 ms



Ernst-Georg Schmid: count(*) is faster than count(1)

Today I incidentally discovered that count(*) is faster than count(1) on PostgreSQL 9.3:

create table  hmr(id serial, value real);

insert into hmr (value) select random()*10000000 from generate_series(1,10000000);

select count(1) from hmr; --731 msec


select count(*) from hmr; --661 msec

Used repeatedly, e.g. in a function, this can really add up and make a difference.
In the spectral contrast angle CTE from the previous post, it reduces runtime by two seconds or 24 %.

I would have expected the constant to be faster than actually reading the row...

Hubert 'depesz' Lubaczewski: Fixed a bug in OmniPITR

Just thought I'd share a "fun" story. A friend reported a weird bug – OmniPITR reported that xlogs were sent to the archive, but they actually weren't. After some checking we found out that he was passing a custom rsync path (--rsync-path – path to the rsync program) – and the path was broken. In this case – OmniPITR was not […]

Andrew Dunstan: New release of PLV8

I have released a new version of PLV8, which now builds on PostgreSQL 9.4, as well as containing a number of bug fixes.

It can be downloaded at http://pgxn.org/dist/plv8

Enjoy

Christophe Pettus: Logical Decoding and JSON Talks at FOSDEM

Ernst-Georg Schmid: count(*) is faster than count(1) revisited

Whoa, I didn't expect that much of a response to the last post. :-)

"Can you post the `explain analyze` output of both queries? I'm pretty sure this is normal "distribution" caused by other stuff going on in the system"

OK. Since my 'test' was on a VirtualBox VM with a self-compiled PostgreSQL and this might not be a reliable test setup, I hijacked one of our real PostgreSQL servers:

uname -a
Linux  2.6.32-358.23.2.el6.x86_64 #1 SMP Sat Sep 14 05:32:37 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz ,16 GB RAM, storage on SAN, ext4.

select version();
"PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit"

create table hmr(id serial, value real);

insert into hmr (value) select random()*10000000 from generate_series(1,10000000);


CREATE INDEX idx_id
  ON hmr
  USING btree
  (id);


vacuum analyze hmr;

---------------------------------------------------------------------------

explain analyze select count(1) from hmr;

"Aggregate  (cost=169248.00..169248.01 rows=1 width=0) (actual time=1439.521..1439.521 rows=1 loops=1)"
"  ->  Seq Scan on hmr  (cost=0.00..144248.00 rows=10000000 width=0) (actual time=0.006..642.276 rows=10000000 loops=1)"
"Total runtime: 1439.548 ms"


select count(1) from hmr; --811 msec.

----------------------------------------------------------------------------

explain analyze select count(*) from hmr;


"Aggregate  (cost=169248.00..169248.01 rows=1 width=0) (actual time=1299.034..1299.034 rows=1 loops=1)"
"  ->  Seq Scan on hmr  (cost=0.00..144248.00 rows=10000000 width=0) (actual time=0.006..644.912 rows=10000000 loops=1)"
"Total runtime: 1299.061 ms"


select count(*) from hmr; --670 msec 

----------------------------------------------------------------------------

explain analyze select count(id) from hmr;


"Aggregate  (cost=169248.00..169248.01 rows=1 width=4) (actual time=1576.046..1576.046 rows=1 loops=1)"
"  ->  Seq Scan on hmr  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.004..636.076 rows=10000000 loops=1)"
"Total runtime: 1576.069 ms"


select count(id) from hmr; --920 msec

---------------------------------------------------------------------------- 

And the winner is: still count(*) with count(1) second and count(id) third.

This could be an explanation for count(1) being slower:

"So count(1) is explicitly passing the 1 in and checking it for being NULL before incrementing a counter while count(*) is simply incrementing the counter."

And count(id) might be slower, because it takes additional time to consider using the index?

I'm out of my league here.

Christoph Berg: apt.postgresql.org statistics


At this year's FOSDEM I gave a talk in the PostgreSQL devroom about Large Scale Quality Assurance in the PostgreSQL Ecosystem. The talk included a graph about the growth of the apt.postgresql.org repository that I want to share here as well:

The yellow line at the very bottom is the number of different source package names, currently 71. From these, a somewhat larger number of actual source packages is built (blue), carrying the "pgdgXX" version suffixes that target the various distributions we support. The number of different binary package names (green) is in about the same range. The dimensional explosion then happens with the actual number of binary packages (black, almost 8000), targeting all distributions and architectures.

The red line is the total size of the pool/ directory, currently a bit less than 6GB.

(The graphs sometimes decrease when packages in the -testing distributions are promoted to the live distributions and the old live packages get removed.)

Hubert 'depesz' Lubaczewski: Returning data in multiple columns

I was working today on some updates to client database. While doing it, I figured it would be simpler if I saw all “codenames" and ids of rows from dictionary table – not so big. But it was bigger than my screen – I have only 90 lines of text on screen, and there were […]

Pavel Stehule: Simple multicolumn ouput in psql

There is an interesting idea on Depesz's blog.

But Linux itself offers some possibilities. There is the simple pager column.

You can try it (the blogger engine breaks the formatting):
postgres=# \pset tuples_only
postgres=# \setenv PAGER column

postgres=# select typname from pg_type limit 100;
bool pg_type line _bool _varchar _inet _numeric
bytea pg_attribute _line _bytea _int8 _cidr timetz
char pg_proc float4 _char _point _cstring _timetz
name pg_class float8 _name _lseg bpchar bit
int8 json abstime _int2 _path varchar _bit
int2 xml reltime _int2vector _box date varbit
int2vector _xml tinterval _int4 _float4 time _varbit
int4 _json unknown _regproc _float8 timestamp numeric
regproc pg_node_tree circle _text _abstime _timestamp refcursor
text smgr _circle _oid _reltime _date _refcursor
oid point money _tid _tinterval _time
tid lseg _money _xid _polygon timestamptz
xid path macaddr _cid aclitem _timestamptz
cid box inet _oidvector _aclitem interval
oidvector polygon cidr _bpchar _macaddr _interval

postgres=#
It works together with less
postgres=# \setenv PAGER '(column | less)'
postgres=# select typname from pg_type;
...

Do you know some other nice pagers?

Marko Tiikkaja: allas: connection pooling for LISTEN / NOTIFY

Lately I've been working on a connection pooler which only supports LISTEN / NOTIFY.  The idea is to be able to keep the number of Postgres connections down without having to give up (or come up with a workaround for) notifications.  With allas you can e.g. use pgbouncer or your environment's native connection pool and open a separate connection for notifications only.  allas internally uses only

Erik Van Norstrand: Architecture Is Key

Developing good software requires three key components: A strong base set of projects that build a framework for further development. A database and database architecture that scales as you add data. A good user interface that is responsive to users. The base projects that I decided to use for SQLObjectifier is a combination of popular open source products. The backend consists of NodeJS with

Jehan-Guillaume (ioguix) de Rorthais: Partitioning and constraints part 1 - UNIQUE


Partitioning in PostgreSQL has been artisanal work for a long time now. And despite the discussion that has been running for a few months on PostgreSQL's hackers mailing list, it will probably stay this way for some time yet, simply because it requires a lot of brainstorming and work.

Nowadays, I believe the current state of partitioning under PostgreSQL is quite well documented and under control most of the time. You can find a lot of information about it online, starting with the PostgreSQL documentation itself, but also about tooling, extensions, etc.

However, there is still a dark side of partitioning under PostgreSQL, not well covered or understood: the constraints related to partitions. More specifically, unique constraints covering all partitions of a partitioned table, and how to refer to them from foreign keys. This series of articles analyses how to implement them by hand ourselves, thanks to some great PostgreSQL features, detailing how to avoid major traps. You will see that crafting these «constraint» wannabes requires some attention, but is definitely doable, in a clean way.

As this subject requires quite some details and explanations, I decided to split it into multiple articles. This first part is about creating a UNIQUE constraint across all partitions of a table. The next one covers how to reference a partitioned table from a foreign key. And maybe some more will follow, depending on motivation, inspiration and feedback.

Study case

I chose to illustrate this article with a table partitioned by date range. This is a fairly frequent practice and adapting this article to another partitioning scheme is quite easy anyway.

Range partitioning on the PK itself poses no challenge: each PK value can only live in exactly one child, and each child enforces the PK internally, so the uniqueness of the PK across the partitions is already enforced by the CHECK and UNIQUE constraints of each partition. That is why my study case partitions by range on a timestamptz column instead. As the CHECKs do not apply to the primary key values, each PK value can end up in any partition, which can lead to duplicate values residing in different partitions. Exactly what we want to avoid.

So here is the dummy schema, with a table "master" partitioned across 5 children using a date range key on the column ts.

BEGIN;

DROP TABLE IF EXISTS master, child0, child1, child2, child3, child4;

CREATE TABLE master (
  id      serial PRIMARY KEY,
  dummy   int DEFAULT (random()*31449600)::int,
  comment text,
  ts      timestamptz
);

CREATE TABLE child0 (PRIMARY KEY (id), CHECK (ts >= '2010-01-01 00:00:00' AND ts < '2011-01-01 00:00:00')) INHERITS (master);
CREATE TABLE child1 (PRIMARY KEY (id), CHECK (ts >= '2011-01-01 00:00:00' AND ts < '2012-01-01 00:00:00')) INHERITS (master);
CREATE TABLE child2 (PRIMARY KEY (id), CHECK (ts >= '2012-01-01 00:00:00' AND ts < '2013-01-01 00:00:00')) INHERITS (master);
CREATE TABLE child3 (PRIMARY KEY (id), CHECK (ts >= '2013-01-01 00:00:00' AND ts < '2014-01-01 00:00:00')) INHERITS (master);
CREATE TABLE child4 (PRIMARY KEY (id), CHECK (ts >= '2014-01-01 00:00:00' AND ts < '2015-01-01 00:00:00')) INHERITS (master);

CREATE INDEX ON child0 (ts);
CREATE INDEX ON child1 (ts);
CREATE INDEX ON child2 (ts);
CREATE INDEX ON child3 (ts);
CREATE INDEX ON child4 (ts);

ALTER TABLE child0 ALTER ts SET DEFAULT TIMESTAMPTZ '2010-01-01 00:00:00' + (random()*31449600)::int * INTERVAL '1s';
ALTER TABLE child1 ALTER ts SET DEFAULT TIMESTAMPTZ '2011-01-01 00:00:00' + (random()*31449600)::int * INTERVAL '1s';
ALTER TABLE child2 ALTER ts SET DEFAULT TIMESTAMPTZ '2012-01-01 00:00:00' + (random()*31449600)::int * INTERVAL '1s';
ALTER TABLE child3 ALTER ts SET DEFAULT TIMESTAMPTZ '2013-01-01 00:00:00' + (random()*31449600)::int * INTERVAL '1s';
ALTER TABLE child4 ALTER ts SET DEFAULT TIMESTAMPTZ '2014-01-01 00:00:00' + (random()*31449600)::int * INTERVAL '1s';

COMMIT;

The SET DEFAULTs are only there to keep the other commands simple to read. Note that I do not create the usual redirection trigger on INSERT and UPDATE on the master table: this is out of the scope of this article, is not needed here and adds no challenge to the subject.

The naive solution

Of course, the whole trick revolves around triggers. We have to check the uniqueness of a PK value across all partitions after any INSERT or UPDATE on any of them, for each row. Let's dive in and get wet with a first naive version of such a trigger:

CREATE OR REPLACE FUNCTION master_id_pkey()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
  IF count(1) > 1 FROM master WHERE id = NEW.id THEN
    RAISE EXCEPTION 'duplicate key value violates unique constraint "%" ON "%"', TG_NAME, TG_TABLE_NAME
      USING DETAIL = format('Key (id)=(%s) already exists.', NEW.id);
  END IF;
  RETURN NULL;
END
$$;

CREATE TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON master
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();
CREATE TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON child0
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();
CREATE TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON child1
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();
CREATE TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON child2
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();
CREATE TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON child3
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();
CREATE TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON child4
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();

Obviously, each partition needs the trigger to check that the PK value it is about to write does not already exist in one of its siblings. The trigger function itself is quite easy to understand: if we find more than one row with the same PK value, raise an exception. Tests sound promising:

=# INSERT INTO child0 (comment) VALUES ('test 1');
=# INSERT INTO child1 (comment) VALUES ('test 2');
=# INSERT INTO child2 (comment) VALUES ('test 3');
=# INSERT INTO child3 (comment) VALUES ('test 4');
=# INSERT INTO child4 (comment) VALUES ('test 5');
=# SELECT tableoid::regclass, * FROM master;
 tableoid | id |  dummy   | comment |           ts
----------+----+----------+---------+------------------------
 child0   |  1 | 22810434 | test 1  | 2010-07-29 02:49:24+02
 child1   |  2 | 18384970 | test 2  | 2011-01-28 02:57:00+01
 child2   |  3 | 10707988 | test 3  | 2012-05-17 04:58:36+02
 child3   |  4 | 15801904 | test 4  | 2013-08-21 10:31:04+02
 child4   |  5 | 14906458 | test 5  | 2014-10-16 00:09:58+02
(5 rows)

=# BEGIN;
=# INSERT INTO child0 (id, comment) VALUES (5, 'test 6');
ERROR:  duplicate key value violates unique constraint "children_id_pkey" ON "child0"
DETAIL:  Key (id)=(5) already exists.

OK, it works as expected. But there are two big issues with this situation. The first one is that a race condition involving two or more transactions can break our home-made unique constraint:

session 1=# BEGIN;
BEGIN
session 2=# BEGIN;
session 2=# INSERT INTO child0 (comment) VALUES ('test 6') RETURNING id;
 id
----
  6
session 1=# INSERT INTO child1 (id, comment) VALUES (6, 'test 7');
session 1=# COMMIT;
session 2=# COMMIT;
session 2=# SELECT tableoid::regclass, * FROM master WHERE id = 6;
 tableoid | id |  dummy   | comment |           ts
----------+----+----------+---------+------------------------
 child0   |  6 | 28510860 | test 6  | 2010-01-08 17:36:39+01
 child1   |  6 |  2188136 | test 7  | 2011-07-15 07:13:59+02

The second issue is that real constraints can be deferred, which means their checks can be postponed during a transaction and enforced on user request or, by default, at the end of the transaction. In other words, using deferred constraints allows you to violate them temporarily during a transaction, as long as everything is respected at the end. For more information about this mechanism, see the SET CONSTRAINTS, CREATE TABLE and… the CREATE TRIGGER pages.

Yes, the documentation says triggers can be deferred when defined as CONSTRAINT TRIGGERs. So we can solve this second issue by recreating our triggers:

DROP TRIGGER IF EXISTS children_id_pkey ON master;
DROP TRIGGER IF EXISTS children_id_pkey ON child0;
DROP TRIGGER IF EXISTS children_id_pkey ON child1;
DROP TRIGGER IF EXISTS children_id_pkey ON child2;
DROP TRIGGER IF EXISTS children_id_pkey ON child3;
DROP TRIGGER IF EXISTS children_id_pkey ON child4;

CREATE CONSTRAINT TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON master
  DEFERRABLE INITIALLY IMMEDIATE
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();
CREATE CONSTRAINT TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON child0
  DEFERRABLE INITIALLY IMMEDIATE
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();
CREATE CONSTRAINT TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON child1
  DEFERRABLE INITIALLY IMMEDIATE
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();
CREATE CONSTRAINT TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON child2
  DEFERRABLE INITIALLY IMMEDIATE
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();
CREATE CONSTRAINT TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON child3
  DEFERRABLE INITIALLY IMMEDIATE
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();
CREATE CONSTRAINT TRIGGER children_id_pkey AFTER INSERT OR UPDATE ON child4
  DEFERRABLE INITIALLY IMMEDIATE
  FOR EACH ROW EXECUTE PROCEDURE public.master_id_pkey();

INITIALLY IMMEDIATE means the constraint trigger will be executed right after the related statement. The opposite, DEFERRED, behavior fires the trigger at the very end of the transaction, unless the user decides to SET CONSTRAINTS { ALL | name [, ...] } IMMEDIATE somewhere during the transaction.
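For instance (purely illustrative, reusing the triggers defined above), a transaction can postpone the check until commit time:

BEGIN;
SET CONSTRAINTS children_id_pkey DEFERRED;   -- postpone the constraint triggers
INSERT INTO child0 (id, comment) VALUES (42, 'checked at COMMIT time');
-- more statements can run here, the uniqueness check has not fired yet
COMMIT;                                      -- the deferred triggers fire now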

Deferring the trigger to avoid the race condition?

Now, if you step back a second to look at what we have, you might wonder whether forcing our constraint triggers to be DEFERRABLE INITIALLY DEFERRED would solve the race condition. As constraints are checked at the very end of the transaction, maybe this would work by kind of serializing each transaction and their constraints? The short answer is: no.

For one, deferred constraints come with a performance cost we might not want to pay on every transaction. But most importantly, if you declare your trigger as deferrable, anyone can set it to IMMEDIATE, even if it is defined as INITIALLY DEFERRED. So this is definitely not a viable solution. But anyway, ignoring this for the purpose of the study, does it work?

Again, no. Even if it solves the "human timing" race condition, there is another very small window where a race condition is possible in the core of PostgreSQL, when multiple transactions do not conflict and get committed together at the exact same time. This idea itself sounds suspicious anyway, too fragile: if there are no good ol' locks floating around, there is a race condition close enough to break things. It is pretty easy to prove with the following bash loop hammering each partition with 100 INSERTs using colliding PK values. Note that the triggers have been altered to INITIALLY DEFERRED:

$ psql -c '\d child*' part | grep children_id_pkey
    children_id_pkey AFTER INSERT OR UPDATE ON child0 DEFERRABLE INITIALLY DEFERRED FOR EACH ROW EXECUTE PROCEDURE master_id_pkey()
    children_id_pkey AFTER INSERT OR UPDATE ON child1 DEFERRABLE INITIALLY DEFERRED FOR EACH ROW EXECUTE PROCEDURE master_id_pkey()
    children_id_pkey AFTER INSERT OR UPDATE ON child2 DEFERRABLE INITIALLY DEFERRED FOR EACH ROW EXECUTE PROCEDURE master_id_pkey()
    children_id_pkey AFTER INSERT OR UPDATE ON child3 DEFERRABLE INITIALLY DEFERRED FOR EACH ROW EXECUTE PROCEDURE master_id_pkey()
    children_id_pkey AFTER INSERT OR UPDATE ON child4 DEFERRABLE INITIALLY DEFERRED FOR EACH ROW EXECUTE PROCEDURE master_id_pkey()

$ psql -c 'truncate master cascade' part

$ for i in {1..100}; do
>   psql -c "INSERT INTO child0 (id, comment) SELECT count(*)+1, 'duplicated ?' FROM master" part &
>   psql -c "INSERT INTO child1 (id, comment) SELECT count(*)+1, 'duplicated ?' FROM master" part &
>   psql -c "INSERT INTO child2 (id, comment) SELECT count(*)+1, 'duplicated ?' FROM master" part &
>   psql -c "INSERT INTO child3 (id, comment) SELECT count(*)+1, 'duplicated ?' FROM master" part &
>   psql -c "INSERT INTO child4 (id, comment) SELECT count(*)+1, 'duplicated ?' FROM master" part &
> done &> /dev/null && wait

$ cat <<EOQ | psql part
> SELECT count(1), appears, total FROM (
>   SELECT id, count(1) AS appears, sum(count(*)) over () AS total
>   FROM master
>   GROUP BY id
> ) t
> GROUP BY 2,3 ORDER BY appears
> EOQ
 count | appears | total
-------+---------+-------
   149 |       1 |   209
    23 |       2 |   209
     3 |       3 |   209
     1 |       5 |   209

Well, that's pretty bad: we have a bunch of duplicated keys. 23 of them appear in two different partitions, three others in three different partitions and one even in all five of them! I could find duplicates like that every time I ran this scenario. Note that out of 500 INSERTs, only 209 rows survived in total. That makes 291 exceptions raised out of the 324 expected, counting the duplicated keys that were not caught.

Isolation level?

Well, last chance. If this many transactions were committed at the exact same time, maybe we can force them to serialize with the SERIALIZABLE isolation level?

ALTER DATABASE part SET default_transaction_isolation TO SERIALIZABLE;

After applying the preceding query, I re-ran the same scenario as in the previous test: only 76 rows survived out of the 500 INSERTs, all of them unique. At last! OK, this reflects what we had in mind previously, but we had to force PostgreSQL to really serialize transactions; any other isolation level will just fail. And by the way, this works with both IMMEDIATE and DEFERRED triggers, as transactions are virtually serialized or rolled back. The log file confirms a lot of serialization conflicts were raised: grep'ing it shows 415 serialization exceptions and only 9 from our trigger:

ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
HINT:  The transaction might succeed if retried.
STATEMENT:  INSERT INTO child3 (id, comment) SELECT count(*)+1, 'duplicated ?' FROM master

This solution works, but having to stay in SERIALIZABLE mode to achieve our goal is a heavy constraint to carry. Moreover, we have the same problem as with DEFERRED triggers: since a simple user can change their isolation level, any bug in the application or any uninformed user can lead to scenarios with silent duplication. Fortunately, another, simpler and safer solution exists.

Real solution: adding locks

The SERIALIZABLE solution works because, to emulate serial transaction execution, it takes predicate locks behind the scenes to detect serialization anomalies. What about taking care of this ourselves? We are used to locks; we know they work fine.

The best solution sounds like acquiring a lock before being allowed to write the value. This actually boils down to forcing conflicting transactions to serialize themselves on a lock, instead of having the engine do all the work for everyone. The question now is: «how can we hold a lock on something that doesn't exist yet?». The answer is: advisory locks. Advisory locks offer applications a locking mechanism and manager on arbitrary integer values. They do not apply to real objects, transactions or rows. As the documentation says: «It is up to the application to use them correctly».

The idea now is simply to acquire an advisory lock on the same value as NEW.id in the trigger function. It should do the trick, cleanly, safely:

CREATE OR REPLACE FUNCTION public.master_id_pkey()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
  PERFORM pg_advisory_xact_lock(NEW.id);

  IF count(1) > 1 FROM master WHERE id = NEW.id THEN
    RAISE EXCEPTION 'duplicate key value violates unique constraint "%" ON "%"', TG_NAME, TG_TABLE_NAME
      USING DETAIL = format('Key (id)=(%s) already exists.', NEW.id);
  END IF;

  RETURN NULL;
END
$function$;

And with this version of master_id_pkey(), in "read committed" isolation level, here is the result of the same scenario as in the previous chapter, executing 500 INSERTs concurrently with conflicting keys:

$ psql -f /tmp/count_duplicated_id.sql part
 count | appears | total
-------+---------+-------
    85 |       1 |    85
(1 row)

Sounds good. What about a small pgbench scenario?

$ cat /tmp/scenario_id.sql 
\setrandom part 0 4
DO $func$
BEGIN
  EXECUTE format('INSERT INTO child%s (id, comment) SELECT count(*)+1, $1 FROM master', :part) USING 'duplicated ?';
EXCEPTION WHEN OTHERS THEN
  RAISE LOG 'Duplicate exception caught!';
END $func$;

$ psql -c 'truncate master cascade' part

$ pgbench -n -f /tmp/scenario_id.sql -c 5 -T 300 part
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 5
number of threads: 1
duration: 300 s
number of transactions actually processed: 130908
latency average: 11.458 ms
tps = 436.338755 (including connections establishing)
tps = 436.354969 (excluding connections establishing)

$ psql -f /tmp/count_duplicated_id.sql part
 count | appears | total
-------+---------+-------
 48351 |       1 | 48351

$ grep -c "LOG:  Duplicate" $LOGFILE
82557

After this 5-minute run with 5 workers inserting highly conflicting data as fast as they can, we have 48,351 rows in the partitions, 82,557 conflicting rows were rejected, and there is not a single duplicate in the table.

I couldn't find any duplicated values after stressing this solution. Whatever the number of queries per parallel session, whatever the pgbench scenario, I had no unique violation across partitions, as expected. This works in any transaction isolation level, and a user cannot turn it off by mistake. This is safe…

…Well, as long as a superuser or the owner of the table does not disable the trigger on the table, obviously. But hey, they can drop the unique constraint on a normal table as well, right?

Wow, at last, finished. What? No? I can hear you thinking it only applies on integers. OK, bonus.

Supporting other types

Supporting a unique constraint on integers was straightforward using advisory locks. But how can this apply to other types, like text for instance? Easy: hash it1! For the purpose of this last chapter, let's add a unique constraint on comment:

CREATE OR REPLACE FUNCTION public.master_comment_unq()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
  PERFORM pg_advisory_xact_lock(hashtext(NEW.comment));

  IF count(1) > 1 FROM master WHERE comment = NEW.comment THEN
    RAISE EXCEPTION 'duplicate key value violates unique constraint "%" ON "%"', TG_NAME, TG_TABLE_NAME
      USING DETAIL = format('Key (comment)=(%L) already exists.', NEW.comment);
  END IF;

  RETURN NULL;
END
$function$;

CREATE CONSTRAINT TRIGGER children_comment_unq AFTER INSERT OR UPDATE ON master
  DEFERRABLE INITIALLY IMMEDIATE
  FOR EACH ROW EXECUTE PROCEDURE public.master_comment_unq();

-- [...]

CREATE CONSTRAINT TRIGGER children_comment_unq AFTER INSERT OR UPDATE ON child4
  DEFERRABLE INITIALLY IMMEDIATE
  FOR EACH ROW EXECUTE PROCEDURE public.master_comment_unq();

If you have followed this far, there is no need to play the "spot the difference" game to identify the most important change here: the lock is taken on the result of hashtext, the simple text-to-integer hash function already provided in PostgreSQL's core.

OK, I can hear the optimization freaks crying. Theoretically, two different strings can collide. But this hash function is supposed to compute uniform results among 4 billion possible values, and I can live with the probability of two concurrent writes involving two different colliding strings. The "1 case out of 4 billion" is already enough for me, and these colliding strings would have to show up at the exact same time (at least within the same few milliseconds). And even if you are unlucky enough to experience this, the two transactions will just be serialized, which is not a big deal.

And if you are really not comfortable with this, you understood the trick here anyway: find a way to hold a lock somewhere to avoid concurrency. Use some other hashing function, create an extension with its own lock machinery in memory, write in an unlogged table (erk), whatever you want.

Time to test now.

$ cat /tmp/scenario_comment.sql 
\setrandom part 0 4
DO $func$
BEGIN
  EXECUTE format('INSERT INTO child%s (comment) SELECT ''duplicated ''||count(1) FROM master', :part);
EXCEPTION WHEN OTHERS THEN
  RAISE LOG 'Duplicate exception caught!';
END $func$;

$ psql -c 'truncate master cascade' part

$ pgbench -n -f /tmp/scenario_comment.sql -c 5 -T 300 part
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 5
number of threads: 1
duration: 300 s
number of transactions actually processed: 93902
latency average: 15.974 ms
tps = 312.971273 (including connections establishing)
tps = 312.987557 (excluding connections establishing)

$ cat <<EOQ | psql part
> SELECT count(1), appears, total FROM (
>   SELECT comment, count(1) AS appears, sum(count(*)) over () AS total
>   FROM master
>   GROUP BY comment
> ) t
> GROUP BY 2,3 ORDER BY appears
> EOQ
 count | appears | total
-------+---------+-------
 29785 |       1 | 29785

Wow (again): only 29,785 rows out of 93,902 transactions survived that intensely colliding scenario, and we only find unique values across all partitions, as expected. Aaaaand, grep-ing through the log file, I can find 64,117 rejected rows…

Conclusion

Such a long way already. Thank you for reading this far. At first I thought I could write about unique and foreign keys in the same article, but look at what we have already covered… We talked about constraint triggers, race conditions, isolation levels, advisory locks and hashing… phew!

I do realize the solution provided here requires some skills and attention. This is not all magic and easy to play with. As long as this feature is not handled directly by the core, partitioning will require people to craft their tools themselves. In the meantime, it is a nice subject for learning more about these concepts and your favorite RDBMS, and for playing with them.

I think this way of guaranteeing uniqueness over several partitions is bulletproof. If you think you have found a loophole, please send me some feedback; I'll be pleased to learn about it.

And don’t forget, we are not done! Lot of fun with foreign keys in the next part! Stay tuned!


1 No data has been harmed during this test.

Ernst-Georg Schmid: count(*) is faster than count(1) - not in 9.4

In PostgreSQL 9.4, count(1) speed is equal to count(*). Another reason to upgrade to 9.4.

count(<column>) is still slower if no index is used...