
Vincenzo Romano: History table: my (very own) design pattern


Modern applications often require the ability to track data changes along with the data itself. Think about stock prices, exchange rates and the like. It’s also possible that the concept of “current data” doesn’t even have a meaning within a specific application realm.

For such cases a “history table” can be a good solution, though not the only one. The aim of a history table is to keep a precise track of how a certain set of values has changed over time.

Let’s stick with the stock prices example and start with an initial naive implementation.

create table stockprices (
  stock text primary key,
  price numeric
);

(Sorry, I cannot really bear uppercase SQL. My fault!)

This simple table can actually record the prices of stocks. But it’s way too simple as we need to know when that price has been updated. Let’s go back to the editor.

create table stockprices (
  stock text primary key,
  price numeric,
  enter timestamp
);

The idea is that every time you update a row with a new price or insert it, you also update the timestamp column to record when the change happened. The primary key is there because the “natural” way to query this table is to use the stock code column to get its current price.

select * from stockprices where stock='GOOGL';

Easy.
But there’s nothing historical here: you just keep the latest value. There are no old values, as after the first insertion you keep updating the very same row. So it’s time to go historical!
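To see why, this is what the non-historical scheme boils down to: every price change is an upsert that overwrites the one existing row (the values here are hypothetical):

insert into stockprices ( stock, price, enter )
  values ( 'GOOGL', 400.00, now() )
  on conflict ( stock )
  do update set price = excluded.price, enter = excluded.enter;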

First things first. If there needs to be a primary key, that won’t be the stock column. We need to get more rows for the same stock and different enter timestamps. Here we go!

create table stockprices (
  stock text,
  price numeric,
  enter timestamp
);

Of course we will still use the stock code column to query this table, so we’ll need an index on that same column. We also add an index on the timestamp column: the reason will become clear soon.

create index on stockprices( stock );
create index on stockprices( enter desc ); -- look here!

The previous query now becomes:

select distinct on( stock ) *
  from stockprices
  where stock='GOOGL'
  order by stock, enter desc;

Here we sort by stock code and, for equal stock codes, by reverse enter timestamp (that is, “latest first”). The distinct on clause will pick the first row of each group, thus the one with the latest price. The main advantage here is that the same table works for both the historical data and the current data. There’s actually no difference between older and newer data!

If you think we’re done, you’re wrong. We’re just halfway. Stay tuned then.

When dealing with history tables, my experience has taught me a few facts.

  1. The time a datum is entered is not always the time at which it is valid. It can be entered either earlier (then it’s a forecast or prediction) or later (then it’s a correction).
  2. Corrections can happen, even multiple times, and you need to keep track of all of them.
  3. A datum can cease to exist after a certain moment, as if it never existed.
  4. You can have two types of queries: one looking for current values and one looking for values as seen at a certain moment in the past (and sometimes also in the future).

Before going further I’d like to remind you of a few technical facts that are useful for our purpose.

When you delete rows from a table, no row is actually removed from storage: the rows are just marked as “recyclable”. The same happens to the related entries in the indexes.

Similarly, when you update rows in a table, no row is actually updated in place. The old rows are “deleted” (see above) and new rows with the updated values are inserted. The same happens to the related entries in the indexes.

While these facts don’t sound bad by themselves, if deletions and updates happen very often the table and its indexes get “bloated” with a lot of those “recyclable” entries. Bloated tables are not a big performance issue; bloated indexes are. In fact, we almost always access rows through indexes, so having “holes” punched into the table storage isn’t a big problem, as the index allows direct row access. The same isn’t true for indexes: the larger they are, the more RAM they need and the more I/O you need to perform to load them. Keeping the indexes at their bare minimum storage requirements is a good thing. This is why most RDBMS have a feature, called “vacuuming”, that compacts both the table storage and the indexes and sweeps those holes away. Vacuuming is also a good thing. It’s a pity you cannot really do the compacting kind of vacuum while the tables are being actively used: a table lock is needed to perform it (in PostgreSQL this is VACUUM FULL; plain VACUUM runs without blocking queries but does not shrink the files). And the larger the table to be vacuumed, the longer that lock will stay active. Full vacuuming is therefore a maintenance activity you will tend to perform during off-peak time, or even with no application allowed to access those tables.

History tables help to address this problem. History tables usually see no deletions or updates at all, just insertions. And queries, of course. For this reason history tables and their indexes only get new rows “appended”: no hole is punched into the storage and no vacuuming is needed thereafter. It’s a concept very similar to the one used with ledgers in bookkeeping: you always append new data; if you need to correct something, you add new data to cancel the old and yet more data for the correction. This is also why history tables are well suited for so-called “big data” applications. Time to get back to our history table implementation.

In a history table, besides the actual data we need (the stock code and its prices in our example), we need a few more columns for housekeeping. The purpose of some of them cannot be fully explained at this stage and will become clearer later.

We need to be able to identify every single row unequivocally. A bigserial will do, as its capacity almost reaches 10^18 (billions of billions). This isn’t always needed but, I admit it, I like the idea of having it everywhere.

We need to know when a certain row has been inserted, regardless of the data validity time. So we add a timestamp column for that.

We need to know since when a certain row is to be considered effective, so another timestamp is needed.

Finally, we need to be able to say that a certain stock code isn’t used any more, so its latest price isn’t valid any more. I used a bool for that.

Our table structure now does need some makeup.

create table stockprices (
  stock text not null,
  price numeric not null,
  enter timestamp not null default now(),
  valid timestamp not null default '-infinity'::timestamp,
  erase bool not null default false,
  id bigserial primary key
);

I have added a few default values, not really needed in the real life, but useful to remember the meaning of some columns.

Also our query needs some makeup, to become a little bit more complex.
We first select the current values for all the available rows.

select distinct on( stock ) *
  from stockprices
  where valid <= now()
  order by stock, valid desc, enter desc;

We will get a row for each single stock code with the latest price as of now(). We are ordering by reverse (desc) validity and by reverse insertion time. Why?
Let’s imagine we have the stock code ‘GOOGL’ at EUR 400.00 on 09-04-2018. We then discover that the value is wrong: it was actually 401.00 on that very same date. So we insert a new row whose enter timestamp is newer than that of the previous one. We can even apply more “corrections” to the very same logical “row”, all with newer and newer enter timestamps. We are keeping track of all the values we have entered there, while the query will pick up only the latest of the latest. No deletions and no updates.
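Expressed as plain inserts, that correction flow might look like this (timestamps and prices are hypothetical):

-- the original datum
insert into stockprices ( stock, price, valid, enter )
  values ( 'GOOGL', 400.00, '2018-04-09', '2018-04-09 18:00' );
-- the correction: same validity, newer enter timestamp, fixed price
insert into stockprices ( stock, price, valid, enter )
  values ( 'GOOGL', 401.00, '2018-04-09', '2018-04-10 09:00' );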

Once we are here, we can then expunge all those rows which have the erase flag set to true.

with x as (
  select distinct on( stock ) *
    from stockprices
    where valid < now()
    order by stock, valid desc, enter desc
)
select * from x where erase is false;

As the now() function returns newer and newer timestamps, the query will also pull into the game all those rows that have been inserted as “future” data, those whose valid column has been set in the future. For example, it can make a certain stock code “disappear” at a certain timestamp, with no maintenance intervention on the table. A row with the erase column set to true and the proper valid timestamp will enter the query row set at the right time and make that stock disappear!
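A sketch of such a “disappearing” row (the timestamp is hypothetical; the price value is irrelevant but the column is not null):

insert into stockprices ( stock, price, valid, erase )
  values ( 'GOOGL', 0, '2018-12-31 17:30', true );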

But we haven’t finished yet. Why?
The indexes haven’t got their makeup yet. Indexes are the reason why we don’t use plain files!
Everyone should follow a simple rule while designing indexes for DBs. This rule says: “No one knows the DB better than the RDBMS itself”.

Of course, we can start from a simple index design, usually based on reasonableness. We fill the table with some meaningful data and then ask PostgreSQL to show us how it’s doing with those data and the available indexes. It’s the wonderful world of the PostgreSQL explain command. This is not (yet) part of the SQL standard and is one of the most interesting extensions.

First of all we need to use real-life data. Sort of. You can write a small program to generate an SQL file to populate our table with a number of rows. My personal first test is based on 100k rows involving 500 different stock codes over 365 days, with prices ranging from 0 to 500. In order to simplify the generation of the data I’ve modified the table so that the stock column is an int instead of text. All data are random, so there can be stock codes that never get into the table: we also need some “not found” results, don’t we?
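As an alternative to an external program, a similar random data set can be generated directly in SQL; this is just a sketch mimicking the numbers above:

insert into stockprices ( stock, price, valid, enter )
select (random() * 500)::int,                                    -- about 500 stock codes
       round( (random() * 500)::numeric, 2 ),                    -- prices from 0 to 500
       timestamp '2018-01-01' + random() * interval '365 days',  -- validity over one year
       timestamp '2018-01-01' + random() * interval '365 days'   -- insertion time
  from generate_series( 1, 100000 );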

The table, at the moment, should only have the single index needed for the primary key. It should resemble this one:

tmp1=# \d stockprices
                                       Table "public.stockprices"
 Column |            Type             | Collation | Nullable |                 Default                  
--------+-----------------------------+-----------+----------+------------------------------------------
 stock  | integer                     |           | not null | 
 price  | numeric                     |           | not null | 
 valid  | timestamp without time zone |           | not null | '-infinity'::timestamp without time zone
 enter  | timestamp without time zone |           | not null | now()
 erase  | boolean                     |           | not null | false
 id     | bigint                      |           | not null | nextval('stockprices_id_seq'::regclass)
Indexes:
    "stockprices_pkey" PRIMARY KEY, btree (id)

Our query doesn’t look for anything special. It “just” collapses the whole table to the latest values as of now(). We see it uses all columns but price (which is the column we will be interested in) and id (which is a unique row identification number, useless for most of our needs).

We could then start by creating an index for each of those columns. Let’s do it.

create index on stockprices ( stock );
create index on stockprices ( valid );
create index on stockprices ( enter );
create index on stockprices ( erase );

After a while all indexes will be created. I’d like to point out that if we had created those indexes before loading the data, we would have got a uselessly slow “table population process”: creating them all at once after the load saves a lot of time. More on this later.
We can now ask our command line tool to show some timing information for query execution:

\timing

I actually have this command in my ~/.psqlrc file so it’s always on by default.

It’s time to fire our query and see how it will be performed by PostgreSQL.

explain analyze with x as (
select distinct on( stock ) *
from stockprices
where valid < now()
order by stock, valid desc, enter desc -- Sort on 3 columns
)
select * from x where erase is false;

This is output on my system:

tmp1-# select * from x where erase is false;
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on x  (cost=13976.82..13986.82 rows=250 width=61) (actual time=84.501..99.082 rows=500 loops=1)
   Filter: (erase IS FALSE)
   CTE x
     ->  Unique  (cost=13476.82..13976.82 rows=500 width=33) (actual time=84.497..99.015 rows=500 loops=1)
           ->  Sort  (cost=13476.82..13726.82 rows=100000 width=33) (actual time=84.496..94.757 rows=100000 loops=1)
                 Sort Key: stockprices.stock, stockprices.valid DESC, stockprices.enter DESC
                 Sort Method: external merge  Disk: 5680kB
                 ->  Seq Scan on stockprices  (cost=0.00..2435.00 rows=100000 width=33) (actual time=0.016..23.901 rows=100000 loops=1)
                       Filter: (valid < now())
 Planning time: 0.745 ms
 Execution time: 99.771 ms
(11 rows)

Time: 101,181 ms

In order to make life easier for DBAs, there’s a neat and great online tool to help better understand this output, written and hosted by Hubert ‘depesz‘ Lubaczewski. It’s called … uhm … explain. You simply paste your explain output there and you get back a more readable format.

From the bottom up, I see on line #5 a sequential table scan (which isn’t a nice thing at all!) to select only those rows that are not in the future. At line #4 the sort over three columns is run, and then on line #3 the rows are “squashed” to be made unique. At line #1 the test on the flag column is run to expunge all rows which have been erased.

As a first attempt at getting it better I try to create a single index for that sort and leave the one on the erase column.

drop index stockprices_enter_idx;
drop index stockprices_valid_idx;
drop index stockprices_stock_idx;
create index on stockprices( stock, valid desc, enter desc );

Then I reboot my system (to be sure all disk caches are gone) and try the query again:

                                                                                QUERY PLAN                                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on x  (cost=7736.16..7746.16 rows=250 width=61) (actual time=0.115..48.671 rows=500 loops=1)
   Filter: (erase IS FALSE)
   CTE x
     ->  Unique  (cost=0.42..7736.16 rows=500 width=33) (actual time=0.109..48.513 rows=500 loops=1)
           ->  Index Scan using stockprices_stock_valid_enter_idx on stockprices  (cost=0.42..7486.18 rows=99990 width=33) (actual time=0.107..42.855 rows=100000 loops=1)
                 Index Cond: (valid < now())
 Planning time: 0.486 ms
 Execution time: 48.750 ms
(8 rows)

Time: 49,875 ms

Bingo! I’ve cut the time in half.
The table sequential scan is gone, replaced by an index scan plus a condition. Scanning an index should be much better than scanning a table. Shouldn’t it?

It’d be nice to squeeze some more time out of this query, but I’ll leave that to the keen reader. I’ll save you some time though: it’s useless (as of v10.4) to add the filter condition “erase IS FALSE” to the unified index (or to create a new one). That condition is run over the CTE (common table expression, a temporary un-indexable result set) we’ve called x.

For sure we can ditch the index on the flag, as it’s not used at all and would just cause more work on the RDBMS during insertions.

Of course we haven’t yet tried a complete query, the one where you look for a specific stock code. The query could simply be:

explain with x as (
select distinct on( stock ) *
from stockprices
where valid < now()
order by stock, valid desc, enter desc
)
select * from x where erase is false and stock=42; -- Test here? No!

I won’t even try it. The condition on the stock code would be sequentially applied to the CTE, just like the one on the flag. It wouldn’t take advantage of any index!

The right way to do it is to do the selection within the CTE, like here:

explain analyze with x as (
select distinct on( stock ) *
from stockprices
where valid < now() and stock=142 -- The test is better here!
order by stock, valid desc, enter desc
)
select * from x where erase is false;

The results are different. Just a little bit:

                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on x  (cost=501.03..504.31 rows=82 width=61) (actual time=0.840..0.895 rows=1 loops=1)
   Filter: (erase IS FALSE)
   CTE x
     ->  Unique  (cost=500.54..501.03 rows=164 width=33) (actual time=0.834..0.888 rows=1 loops=1)
           ->  Sort  (cost=500.54..501.03 rows=198 width=33) (actual time=0.833..0.862 rows=215 loops=1)
                 Sort Key: stockprices.valid DESC, stockprices.enter DESC
                 Sort Method: quicksort  Memory: 41kB
                 ->  Bitmap Heap Scan on stockprices  (cost=6.45..492.98 rows=198 width=33) (actual time=0.210..0.670 rows=215 loops=1)
                        Recheck Cond: ((stock = 142) AND (valid < now()))
                        ->  Bitmap Index Scan on stockprices_stock_valid_enter_idx  (cost=0.00..6.40 rows=198 width=0) (actual time=0.143..0.143 rows=215 loops=1)
                             Index Cond: ((stock = 142) AND (valid < now()))
 Planning time: 0.314 ms
 Execution time: 0.966 ms
(14 rows)

Time: 2,070 ms

We are almost done. We have a query to create a view of “current values”, we have a query to select the current value for a single stock code.

What we’re missing is a query for a set of stock codes. Of course, I am not going to make use of predicates like ... where (stock=42 or stock=142 or stock=242). First, that would require you to compose a dynamic query (which isn’t a bad thing in general, but is still error prone). Second, it just multiplies the query (or a part of it) by the number of different stock codes you are looking for: if there are 100 of them, the query will likely be repeated 100 times. So what?

The answer is INNER JOIN.

Let’s imagine we have a generic tool table we use for selections. This table can be used by multiple users (and for multiple purposes) and still keep track of all the selections that have been made.

create table stockselection (
  selid bigserial not null,
  stock int not null,
  primary key( selid,stock )
);

This is how it works. You insert the first stock code to search for like this:

insert into stockselection ( stock ) values ( 42 ) returning selid;
 selid 
-------
   666
(1 row)

INSERT 0 1

So you get the newly created value for the selid column. That’s the “selection id” that allows multiple selection queries to be run by “just” using different selection ids.
Then you insert the remaining stock codes using that very same selection id with queries like this one:

insert into stockselection values ( 666, 142 );
insert into stockselection values ( 666, 242 );
...

The query with the stock selection feature will become:

explain with x as (
  select distinct on( stock ) *
    from stockprices
    natural inner join stockselection -- inner join
    where valid < now() and selid=666 -- selection id
    order by stock, valid desc, enter desc
)
select * from x where erase is false;

This is it. More or less. Of course, there’s plenty of room for further enhancements. This article is meant to show ideas and give hints, not boxed solutions. Some more work may be needed depending on your very own use case.

Enjoy your coding.


Vincenzo Romano: PostgreSQL and the temporary functions


In my previous article I’ve used the builtin function now() to select from a history table only those rows that are “current“.

A “brilliant next idea” that came to my mind was: “if I can redefine now() in my temporary schema (one for each session) I can browse that table as if I were either in the past or in the future, thus opening my solution to a number of other applications“.

So I’ve tried to go this path.

create or replace temporary function now( out ts timestamp )
volatile language plpgsql as $l0$
begin
  ts = '2017-01-01 00:00:00'::timestamp;
end;
$l0$;

It doesn’t work: there’s no such command as “create temporary function ...” in PostgreSQL!
So I’ve tried a different syntax.

create or replace function pg_temp.now( out ts timestamp )
volatile language plpgsql as $l0$
begin
  ts = '2017-01-01 00:00:00'::timestamp;
end;
$l0$;

This works indeed. Let’s check.

tmp1=# select * from now();
              now              
-------------------------------
 2018-05-30 10:48:19.093572+02
(1 row)

Time: 0,311 ms
tmp1=# select * from pg_temp.now();
         ts          
---------------------
 2017-01-01 00:00:00
(1 row)

Time: 0,790 ms

It works. In order to “mask” the builtin function with the temporary one I should change the search_path like this:

tmp1=# show search_path ;
   search_path   
-----------------
 "$user", public
(1 row)

Time: 0,354 ms
tmp1=# set search_path to "$user", pg_temp, public;
SET
Time: 0,382 ms

But …

tmp1=# select * from now();
              now              
-------------------------------
 2018-05-30 10:49:54.077509+02
(1 row)

Time: 0,201 ms
tmp1=# select * from pg_temp.now();
         ts          
---------------------
 2017-01-01 00:00:00
(1 row)

Time: 0,628 ms

It doesn’t work as expected! What? Is this a bug? Of course … it is not!

I’ve found the answer with some search.
There’s an explicit reference to such a case in the PostgreSQL general mailing list, by Tom Lane himself back in 2008. I quote (with slight edits):

You can [create a temporary function] today, as long as you don’t mind schema-qualifying uses of the function!
That’s intentional because of the risk of trojan horses.

I am not really sure that it really protects against evil behaviors: an evil temporary table can still mask a good one and change the behavior of an application. And if I wanted to avoid any masking trick involving the search_path, I would always schema-qualify all object references. Wouldn’t I?

Anyway, this means that my trick won’t ever work, not in the way I’ve devised it. I have a number of other ideas here involving either the do statement or temporary tables or maybe something else …
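For instance, a minimal sketch of the temporary-table idea (table and column names here are purely hypothetical): keep the “as of” timestamp in a per-session temporary table and read it from the query instead of calling now().

-- one temporary table per session, holding the session's "as of" moment
create temporary table asof ( ts timestamp not null );
insert into asof values ( '2017-01-01 00:00:00' );

-- the history query then reads the session-local timestamp
select distinct on( stock ) *
  from stockprices
  where valid < ( select ts from asof )
  order by stock, valid desc, enter desc;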

Robert Haas: Who Contributed to PostgreSQL Development in 2017?

Alexey Lesovsky: Autovacuum slides from PgCon 2018 Ottawa


Slides from our talks at PgCon 2018.


This week was busy for me and my colleague Ilya Kosmodemiansky as we made our way to Ottawa, Canada to attend one of the focal events in Postgres - PGCon. This PostgreSQL conference for users and developers runs annually and gathers PostgreSQL enthusiasts from across the globe. Aside from fascinating and inspiring talks, it’s a fantastic networking opportunity.

This year’s conference started on 29th May with tutorials, one of which was Ilya’s tutorial Linux IO internals for database administrators which was really well attended. The main conference started on 31st May and on the 1st June I gave my talk about autovacuum.

The aim of my talk was to address some of the key vacuum misconceptions and to make the audience realise that autovacuum is actually an essential tool for making the database faster, more efficient and problem-free, and that disabling it would actually cause more damage than good. Below are my slides and some photos from the conference.

I must admit that though my journey to Ottawa was quite long (20 hours!) it was definitely worth it and this kind of event makes you feel a part of the powerful global community. It’s a very exciting conference to attend and, at the same time, the atmosphere is friendly and relaxing.

Huge thanks to the organisers for doing such an amazing job!


David Fetter: Slides from "ASSERTIONs and how to use them"

Michael Paquier: Postgres 11 highlight - More Partition Pruning


While PostgreSQL 10 has introduced the basic infrastructure for in-core support of partitioned tables, many new features are introduced in 11 to make the use of partitioned tables way more intuitive. One of them is an executor improvement for partition pruning, which has been mainly introduced by commit 499be01:

commit: 499be013de65242235ebdde06adb08db887f0ea5
author: Alvaro Herrera <alvherre@alvh.no-ip.org>
date: Sat, 7 Apr 2018 17:54:39 -0300
Support partition pruning at execution time

Existing partition pruning is only able to work at plan time, for query
quals that appear in the parsed query.  This is good but limiting, as
there can be parameters that appear later that can be usefully used to
further prune partitions

[... long text ...]

Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com

This is a shortened extract (you can refer to the link above for the full commit log which would bloat this blog entry).

First note that PostgreSQL 10 already has support for partition pruning at planner time (the term “pruning” is new as of version 11 though), which is a way to eliminate scans of entire child partitions depending on the quals of a query (set of conditions in WHERE clause). Let’s take an example with the following, simple, table using a single column based on value ranges for the partition definition:

=# CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id);
CREATE TABLE
=# CREATE TABLE child_0_10 PARTITION OF parent_tab
     FOR VALUES FROM (0) TO (10);
CREATE TABLE
=# CREATE TABLE child_10_20 PARTITION OF parent_tab
     FOR VALUES FROM (10) TO (20);
CREATE TABLE
=# CREATE TABLE child_20_30 PARTITION OF parent_tab
     FOR VALUES FROM (20) TO (30);
CREATE TABLE
=# INSERT INTO parent_tab VALUES (generate_series(0,29));
INSERT 0 30

This applies also to various operations, like ranges of values, as well as additional OR conditions. For example here only two out of the three partitions are logically scanned:

=# EXPLAIN SELECT * FROM parent_tab WHERE id = 5 OR id = 25;
                            QUERY PLAN
-------------------------------------------------------------------
 Append  (cost=0.00..96.50 rows=50 width=4)
   ->  Seq Scan on child_0_10  (cost=0.00..48.25 rows=25 width=4)
         Filter: ((id = 5) OR (id = 25))
   ->  Seq Scan on child_20_30  (cost=0.00..48.25 rows=25 width=4)
         Filter: ((id = 5) OR (id = 25))
=# EXPLAIN SELECT * FROM parent_tab WHERE id >= 5 AND id <= 15;
                        QUERY PLAN
-------------------------------------------------------------------
 Append  (cost=0.00..96.50 rows=26 width=4)
   ->  Seq Scan on child_0_10  (cost=0.00..48.25 rows=13 width=4)
         Filter: ((id >= 5) AND (id <= 15))
   ->  Seq Scan on child_10_20  (cost=0.00..48.25 rows=13 width=4)
         Filter: ((id >= 5) AND (id <= 15))
(5 rows)

When using several levels of partitions this works as well. First, let’s add an extra layer, bringing the partition tree to this shape:

       ---------parent_tab------------
      /        |           |          \
     /         |           |           \
 child_0_10 child_10_20 child_20_30 child_30_40
                                     /       \
                                    /         \
                                   /           \
                             child_30_35  child_35_40

And this tree can be done with the following SQL queries:

=# CREATE TABLE child_30_40 PARTITION OF parent_tab
     FOR VALUES FROM (30) TO (40)
     PARTITION BY RANGE(id);
CREATE TABLE
=# CREATE TABLE child_30_35 PARTITION OF child_30_40
     FOR VALUES FROM (30) TO (35);
=# CREATE TABLE child_35_40 PARTITION OF child_30_40
     FOR VALUES FROM (35) TO (40);
CREATE TABLE
=# INSERT INTO parent_tab VALUES (generate_series(30,39));
INSERT 0 10

When the selected partitions involve multiple layers, the planner also does the right thing, for example here:

=# EXPLAIN SELECT * FROM parent_tab WHERE id = 10 OR id = 37;
                            QUERY PLAN
-------------------------------------------------------------------
 Append  (cost=0.00..96.50 rows=50 width=4)
   ->  Seq Scan on child_10_20  (cost=0.00..48.25 rows=25 width=4)
         Filter: ((id = 10) OR (id = 37))
   ->  Seq Scan on child_35_40  (cost=0.00..48.25 rows=25 width=4)
         Filter: ((id = 10) OR (id = 37))
(5 rows)

Note of course that this can happen only when the planner knows the values it needs to evaluate, so for example using a non-immutable function in a qual results in all the partitions being scanned. If the partitions are large, you also most likely want to create indexes on them to reduce the scan cost.
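For instance, a quick way to see this (just a sketch; the plan output is omitted here):

-- random() is volatile, so the planner cannot see the value at plan time
-- and the plan will list every partition of parent_tab:
EXPLAIN SELECT * FROM parent_tab WHERE id = (random() * 40)::int;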

In PostgreSQL 10, there is also a user-level parameter, constraint_exclusion, which allows you to control whether the pruning can happen or not; it also works with the trigger-based partitioning on inherited tables driven only by CHECK constraints.

Note that things have changed a bit in PostgreSQL 11 with the appearance of the parameter called enable_partition_pruning, which is in charge of controlling the discarding of partitions when planner-time clauses are selective enough to do the work, causing constraint_exclusion to have no effect with the previous examples. So be careful if you used PostgreSQL 10 with partitioning and the previous parameter. (Note as well that constraint_exclusion has the value “partition”, which makes constraint exclusion work only on relations using inheritance-based partitioning.)

Now that things are hopefully clearer, finally comes the new feature introduced in PostgreSQL 11, which is partition pruning at execution time, extending the somewhat limited feature described so far in this post. This is an advantage in a couple of cases, for example a PREPARE query, a value coming from a subquery, or a parameterized value in nested loop joins (in which case partition pruning can happen multiple times if the parameter value changes during execution).

Let’s take for example the case of a subquery, where even with pruning enabled no partitions are discarded at plan time. This plan is the same for PostgreSQL 10 and 11:

=# EXPLAIN SELECT * FROM parent_tab WHERE id = (SELECT 1);
                            QUERY PLAN
-------------------------------------------------------------------
 Append  (cost=0.01..209.38 rows=65 width=4)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=4)
   ->  Seq Scan on child_0_10  (cost=0.00..41.88 rows=13 width=4)
         Filter: (id = $0)
   ->  Seq Scan on child_10_20  (cost=0.00..41.88 rows=13 width=4)
         Filter: (id = $0)
   ->  Seq Scan on child_20_30  (cost=0.00..41.88 rows=13 width=4)
         Filter: (id = $0)
   ->  Seq Scan on child_30_35  (cost=0.00..41.88 rows=13 width=4)
         Filter: (id = $0)
   ->  Seq Scan on child_35_40  (cost=0.00..41.88 rows=13 width=4)
         Filter: (id = $0)
(13 rows)

Something that changes, though, is the output of EXPLAIN ANALYZE, which appends the term “(never executed)” to non-executed partitions. Hence, with the previous query and version 11, the following will be found (output reformatted a bit for this blog: the part about the non-execution of each partition is appended at the end of each sequential scan line):

=# EXPLAIN ANALYZE SELECT * FROM parent_tab WHERE id = (select 1);
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Append  (cost=0.01..209.71 rows=65 width=4)
    (actual time=0.064..0.072 rows=1 loops=1)
  InitPlan 1 (returns $0)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
         (actual time=0.003..0.004 rows=1 loops=1)
   ->  Seq Scan on child_0_10  (cost=0.00..41.88 rows=13 width=4)
         (actual time=0.045..0.052 rows=1 loops=1)
         Filter: (id = $0)
         Rows Removed by Filter: 9
   ->  Seq Scan on child_10_20  (cost=0.00..41.88 rows=13 width=4)
         (never executed)
         Filter: (id = $0)
   ->  Seq Scan on child_20_30  (cost=0.00..41.88 rows=13 width=4)
         (never executed)
         Filter: (id = $0)
   ->  Seq Scan on child_30_35  (cost=0.00..41.88 rows=13 width=4)
         (never executed)
         Filter: (id = $0)
   ->  Seq Scan on child_35_40  (cost=0.00..41.88 rows=13 width=4)
         (never executed)
         Filter: (id = $0)
 Planning Time: 0.614 ms
 Execution Time: 0.228 ms
(16 rows)

When using PREPARE queries, you can rely on EXPLAIN to get a similar experience with the following queries, so feel free to check by yourself:

PREPARE parent_tab_scan (int) AS (SELECT * FROM parent_tab WHERE id = $1);
EXPLAIN EXECUTE parent_tab_scan(1);

So be careful with any EXPLAIN output, and refer to what EXPLAIN ANALYZE has to say when it comes to pruning at execution time, for the cases where the values (or sets of values) used for the pruning are only known during execution. One part where “(never executed)” can apply, though, is when using for example subqueries within the PREPARE statement. Feel free to retry the previous queries for that.

Viorel Tabara: Multi Datacenter setups with PostgreSQL


The main goals of a multi-datacenter (or multi-DC) setup — regardless of whether the database ecosystem is SQL (PostgreSQL, MySQL), or NoSQL (MongoDB, Cassandra) to name just a few — are Low Latency for end users, High Availability, and Disaster Recovery. At the heart of such an environment lies the ability to replicate data, in ways that ensure its durability (as a side note Cassandra’s durability configuration parameters are similar to those used by PostgreSQL). The various replication requirements will be discussed below, however, the extreme cases will be left to the curious for further research.

Replication using asynchronous log shipping has been available in PostgreSQL for a long time, and synchronous replication introduced in version 9.1 opened a whole new set of options to developers of PostgreSQL management tools.

Things to Consider

One way to understand the complexity of a PostgreSQL multi-DC implementation is by learning from the solutions implemented for other database systems, while keeping in mind that PostgreSQL insists on being ACID compliant.

A multi-DC setup includes, in most cases, at least one datacenter in the cloud. While cloud providers take on the burden of managing the database replication on behalf of their clients, they do not usually match the features available in specialized management tools. For example, with many enterprises embracing hybrid cloud and/or multi-cloud solutions in addition to their existing on-premises infrastructure, a multi-DC tool should be able to handle such a mixed environment.

Further, in order to minimize downtime during a failover, the PostgreSQL management system should be able to request (via an API call) a DNS update, so the database requests are routed to the new master cluster.

Networks spanning large geographical areas are high latency connections and all solutions must compromise: forget about synchronous replication, and use one primary with many read replicas. See the AWS MongoDB and Severalnines/Galera Cluster studies for an in-depth analysis of network effects on replication. On a related note, a nifty tool for testing the latency between locations is Wonder Network Ping Statistics.

While the high latency nature of WAN cannot be changed, the user experience can be dramatically improved by ensuring that reads are served from a read-replica close to the user location, however with some caveats. By moving replicas away from the primary, writes are delayed and thus we must do away with synchronous replication. The solution must also be able to work around other issues such as read-after-write-consistency and stale secondary reads due to connection loss.

In order to minimize the RTO, data needs to be replicated to a durable storage that is also able to provide high read throughput, and according to Citus Data one option that meets those requirements is AWS S3.

The very notion of multiple data centers implies that the database management system must be able to present the DBA with a global view of all data centers and the various PostgreSQL clusters within them, manage multiple versions of PostgreSQL, and configure the replication between them.

When replicating writes to regional data centers, the propagation delay must be monitored. If the delay exceeds a threshold, an alarm should be triggered indicating that the replica contains stale data. The same principle applies to asynchronous multi-master replication.

In a synchronous setup, high latency, or network disruptions may lead to delays in serving client requests while waiting for the commit to complete, while in asynchronous configurations there are risks of split-brain, or degraded performance for an extended period of time. Split-brain and delays on synchronous commits are unavoidable even with well established replication solutions as explained in the article Geo-Distributed Database Clusters with Galera.

Another consideration is vendor support — as of this writing AWS does not support PostgreSQL cross-region replicas.

Intelligent management systems should monitor the network latency between data centers and recommend or adjust changes e.g. synchronous replication is perfectly fine between AWS Availability Zones where data centers are connected using fiber networks. That way a solution can achieve zero data loss and it can also implement master-master replication along with load balancing. Note that AWS Aurora PostgreSQL does not currently provide a master-master replication option.

Decide on the level of replication: cluster, database, table. The decision criteria should include bandwidth costs.

Implement cascaded replication in order to work around network disruptions that can prevent replicas from receiving updates from the master due to geographical distance.
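As a rough illustration of the idea (hostnames are hypothetical; PostgreSQL 10-style recovery.conf shown), a cascaded standby simply points its primary_conninfo at another standby instead of at the master:

# recovery.conf on the cascaded standby in the remote datacenter
standby_mode = 'on'
primary_conninfo = 'host=standby-dc1.example.com port=5432 user=replicator application_name=standby_dc2'
recovery_target_timeline = 'latest'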

Solutions

Taking into consideration all the requirements, identify the products that are best suited for the job. A note of caution though: each solution comes with its own caveats that must be dealt with by following the recommendations in the product documentation. See for example the BDR monitoring requirement.

The PostgreSQL official documentation contains a list of non-commercial open source applications, and an expanded list including commercial closed source solutions can be found at the Replication, Clustering, and Connection Pooling wiki page. A few of those tools have been reviewed in more detail in the Top PG Clustering HA Solutions for PostgreSQL article.

There is no turnkey solution, but some products can provide most of the features, especially when working with the vendor.

Here’s a non-exhaustive list:


Conclusion

As we’ve seen, when it comes to choosing a PostgreSQL multi-datacenter solution, there isn’t a one-size-fits-all answer. Often, compromising is a must. However, a good understanding of the requirements and implications can go a long way in making an informed decision.

Compared to static (read-only) data, a solution for databases needs to consider the replication of updates (writes). The literature describing both SQL and NoSQL replication solutions insists on using a single source of truth for writes with many replicas in order to avoid issues such as split-brain, and read-after-write consistency.

Lastly, interoperability is a key requirement considering that multi-DC setups may span data centers located on premise, and various cloud providers.


Aleksander Alekseev: PGCon 2018: Slides and Photos

Kaarel Moppel: More on Postgres trigger performance


In my last post I described what to expect from simple PL/pgSQL triggers in terms of performance degradation when doing some inspection/changing of the incoming row data. The conclusion for the most common “audit fields” type of use case was that we should not worry about it too much and just create those triggers. But in which use cases would it make sense to start worrying a bit?

So to get more insights I conjured up some more complex trigger use cases and again measured transaction latencies on them for an extended period of time. Do please read on for some extra info on the performed tests, or just jump to the concluding results table at the end of the article.

Default pgbench vs audit triggers for all updated tables

This was the initial test I ran for the original blog post – default pgbench transactions, with the schema slightly modified to include 2 auditing columns for all tables being updated, doing 3 updates, 1 select, 1 insert (see here for what the default transaction looks like), vs PL/pgSQL audit triggers on all 3 tables getting updates. The triggers just set the last modification timestamp to the current time and the username to the current user, if not already specified in the incoming row.
Results: 1.173ms vs 1.178ms i.e. <1% penalty for the version with triggers.
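For reference, such an audit trigger could look roughly like this (column and function names are assumed here, not necessarily the exact code used for the benchmark):

CREATE OR REPLACE FUNCTION trg_audit_fields() RETURNS trigger AS $$
BEGIN
  -- fill the audit columns only when the application did not provide them
  IF NEW.last_modified_on IS NULL THEN
    NEW.last_modified_on := now();
  END IF;
  IF NEW.last_modified_by IS NULL THEN
    NEW.last_modified_by := session_user;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER pgbench_accounts_audit
  BEFORE INSERT OR UPDATE ON pgbench_accounts
  FOR EACH ROW EXECUTE PROCEDURE trg_audit_fields();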

Single row update use case

With multi-statement transactions a lot of time is actually spent on communication over the network. To get rid of that, the next test consisted of just a single update on the pgbench_accounts table (again with the 2 audit columns added to the schema), and then the same again with a PL/pgSQL auditing trigger enabled that sets the modification timestamp and username if left empty.
Results: 0.390ms vs 0.405ms ~ 4% penalty for the trigger version. Already a bit visible, but still quite dismissible I believe.

/* script file used for pgbench */
\set aid random(1, 100000 * :scale)
\set delta random(-5000, 5000)
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

Single row update with a trigger written in C

But what if the above 4% performance degradation is not acceptable, and it adds up when we are actually touching a dozen tables (ca. 60% hit)? Can we somehow shave off some microseconds?
Well, one could try to write the triggers in Postgres’ native language, “C”! As with optimizing normal functions, it should help with triggers too. But ugh, “C” you think… sounds daunting? Well… sure, it’s not going to be all fun and games, but there are quite a lot of examples included in the Postgres source code to get going, see here for example.

So after some tinkering around (I’m more of a Python / Go guy) I arrived at these numbers: 0.405ms for the PL/pgSQL trigger vs 0.401ms for the “C” version, meaning only a ~1% speedup! So in short – absolutely not worth the time for such simple trigger functionality. But why so little speedup against an interpreted PL language, you might wonder? Yes, PL/pgSQL is kind of an interpreted language, but with the nice property that execution plans and the resulting prepared statements stay cached within one session. So if we’d use pgbench in “re-connect” mode I’m pretty sure we’d see some very different numbers.

...
	// audit field #1 - last_modified_on
	attnum = SPI_fnumber(tupdesc, "last_modified_on");

	if (attnum <= 0)
		ereport(ERROR,
				(errcode(ERRCODE_TRIGGERED_ACTION_EXCEPTION),
				 errmsg("relation \"%d\" has no attribute \"%s\"", rel->rd_id, "last_modified_on")));

	valbuf = (char*)SPI_getvalue(rettuple, tupdesc, attnum);
	if (valbuf == NULL) {
		newval = GetCurrentTimestamp();
		rettuple = heap_modify_tuple_by_cols(rettuple, tupdesc,
											 1, &attnum, &newval, &newnull);
	}
...

See here for my full “C” code.

Single row update with a trigger doing “logging insert”

Here things get a bit incomparable actually as we’re adding some new data, which is not there in the “un-triggered” version. So basically I was doing from the trigger the same as the insert portion (into pgbench_history) from the default pgbench transaction. Important to note though – although were seeing some slowdown…it’s most probably still faster that doing that insert from the user transaction as we can space couple of network bytes + the parsing (in our default pgbench case statements are always re-parsed from text vs pl/pgsql code that are parsed only once (think “prepared statements”). By the way, to test how pgbench works with prepared statements (used mostly to test max IO throughput) set the “protocol” parameter to “prepared“.
Results – 0.390ms vs 0.436ms ~ 12%. Not too bad at all given we double the amount of data!
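A sketch of what such a “logging insert” trigger on pgbench_accounts might look like (assumed to mirror the default pgbench insert into pgbench_history, not necessarily the exact benchmark code):

CREATE OR REPLACE FUNCTION trg_log_account_change() RETURNS trigger AS $$
BEGIN
  -- log the account change the same way the default pgbench transaction does
  INSERT INTO pgbench_history (bid, aid, delta, mtime)
    VALUES (NEW.bid, NEW.aid, NEW.abalance - OLD.abalance, now());
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER pgbench_accounts_log
  AFTER UPDATE ON pgbench_accounts
  FOR EACH ROW EXECUTE PROCEDURE trg_log_account_change();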

Default pgbench vs 3 “logging insert” triggers

Here we basically double the amount of data written – all updated tables get a logging entry (including pgbench_accounts, which already gets an insert as part of the normal transaction). Results – 1.173 vs 1.285 ~ 10%. Very tolerable penalties again – almost doubling the dataset here and only paying a fraction of the price! This again shows that the communication latency and transaction mechanics, together with the costly but essential fsync during commit, have more influence than a bit of extra data itself (given we don’t have tons of indexes on the data, of course). For reference – the full test script can be found here if you want to try it out yourself.

Summary table

Use Case | Latencies (ms) | Penalty per TX (%)
Pgbench default vs with audit triggers for all 3 updated tables | 1.173 vs 1.178 | 0.4%
Single table update (pgbench_accounts) vs with 1 audit trigger | 0.390 vs 0.405 | 3.9%
Single table update (pgbench_accounts) vs with 1 audit trigger written in “C” | 0.390 vs 0.401 | 2.8%
Single table update vs with 1 “insert logging” trigger | 0.390 vs 0.436 | 11.8%
Pgbench default vs with 3 “insert logging” triggers on updated tables | 1.173 vs 1.285 | 9.6%

Bonus track – trigger trivia!

* Did you know that in Postgres one can also write DDL triggers, so that you can capture/reject/log structural changes for all kinds of database objects? The most prominent use case might be checking for full table re-writes during business hours.
* Also there are “statement level” triggers that are executed only once per SQL statement. They’re actually the default if you don’t specify the level. And in Postgres 10 they were also extended with the “transition tables” feature, allowing you to inspect all rows changed by the statement to possibly do some summary aggregations or validation (see the sketch after this list).
* When you have many triggers on a table, the execution order is alphabetical by trigger name! Additionally, in the case of BEFORE and INSTEAD OF triggers, the possibly-modified row returned by each trigger becomes the input to the next trigger.
* Row-level BEFORE triggers are much “cheaper” than AFTER triggers when updating a lot of rows, as they fire immediately, whereas AFTER triggers fire at the end of the statement, in which case Postgres needs to temporarily store the row state information. The situation can usually be alleviated though with some sane WHEN conditions in the trigger declarations.
* And yes, it’s possible to create for example an insert trigger that inserts again into the same table that caused the trigger to fire:) Won’t there then be an infinite loop, eating up all your disk space? Yes there would… if max_stack_depth didn’t kick in 😉 But of course I’d advise you to keep triggers always as simple as possible.
* For writing triggers you’re not actually tied to the most popular trigger languages of PL/pgSQL and the abovementioned “C” – at least PL/Python and PL/Perl also support triggers, and there might be some more.
* Postgres 11 will include support for triggers on partitioned tables, allowing you to declare them only once! Currently one has to define them for all sub-partitions separately.
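Here is a small sketch of the transition-tables feature mentioned above (PostgreSQL 10+; the names are mine, purely illustrative):

CREATE OR REPLACE FUNCTION trg_count_updated_rows() RETURNS trigger AS $$
BEGIN
  -- the transition table "new_rows" holds every row changed by the statement
  RAISE NOTICE 'statement updated % row(s)', (SELECT count(*) FROM new_rows);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER pgbench_accounts_stmt_audit
  AFTER UPDATE ON pgbench_accounts
  REFERENCING NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE trg_count_updated_rows();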

The post More on Postgres trigger performance appeared first on Cybertec.

Liaqat Andrabi: PGInstaller – A GUI based, user-friendly installer for PostgreSQL


Installing the world’s top enterprise-class open source database – PostgreSQL – is now a whole lot easier with PGInstaller.


PGInstaller is a GUI based, user-friendly installer for PostgreSQL that is digitally signed and certified by 2ndQuadrant. The installer is currently available for PostgreSQL versions 9.5, 9.6, 10 and 11(beta) and has the ability to run in graphical, command line, or quiet installation modes.

In addition, PGInstaller provides the following features:

  • Built-in support for Python 3
  • Compression support using zlib
  • Integration with native service control managers such as systemd for Linux, Service Control Manager for Windows, and LaunchControl for OSX
  • Consistent interface across all supported platforms

How to get started with PGInstaller:

  1. Visit the PGInstaller download page
  2. Select desired PostgreSQL version and platform
  3. Open the downloaded file to run installation wizard

It’s that simple!

We would love to hear your feedback on the new installation experience. Please share your thoughts here.

Markus Winand: PGCon 2018: Standard SQL Gap Analysis


PostgreSQL Standard SQL Gap Analysis

Last week I presented my “PostgreSQL Standard SQL Gap Analysis” at PGCon.org in Ottawa. If this sounds familiar, you might be confusing it with the opposite talk, “Features Where PostgreSQL Beats its Competitors”, which I gave at FOSDEM and PgConf.de this year.

The abstract of the gap analysis:

PostgreSQL supports an impressive number of standard SQL features in an outstanding quality. Yet there remain some cases where other databases exceed PostgreSQL’s capabilities in regard to standard SQL conformance.

This session presents the gaps found during an in-depth comparison of selected standard SQL features among six popular SQL databases. The selected features include, among others, window functions and common table expressions—both of them were recently introduced to MySQL and MariaDB.

The comparison uses a set of conformance tests I use for my website modern-sql.com. These tests are based on the SQL:2016 standard and attempt to do a rather complete test of the requirements set out in the standard. This includes the correct declared type of expressions as well as the correct SQLSTATE in case of errors (teaser: nobody seems to care about SQLSTATE).

This presentation covers two aspects: (1) features not supported by PostgreSQL but by other databases; (2) features available in PostgreSQL that are less complete or conforming as in other databases.

You can download the slides here [PDF; 5MB].

Below I list the covered features with a short comment for your convenience. The slides have more information including the charts that show which databases support these features.

Features Less Complete or Conforming

Let’s start with features PostgreSQL supports, but in a less conforming or complete manner than other databases.

extract

In PostgreSQL, the extract expression returns a double value rather than an exact numeric value (e.g. numeric).

[respect|ignore] nulls for lead, lag, first_value, last_value and nth_value

PostgreSQL does not support the [respect|ignore] null modifier for these window functions.

Distinct aggregates as window functions

PostgreSQL doesn’t support distinct in aggregates when used as a window function (over): count(distinct …) over(…).

fetch [first|next]

Fetch first is the standard clause for limit. PostgreSQL does not support percentages or the with ties modifier of fetch first.

Functional dependencies

PostgreSQL recognizes only very few of the known functional dependencies described in the standard.

Features Missing in PostgreSQL

Next, I list features that are not supported by PostgreSQL, but by at least one other major database:

Row Pattern Recognition (match_recognize)

In my presentation, I’ve made it very clear that I think this is the SQL extension of the decade. If you think window functions have changed the face of SQL, here is the next leap forward.

To learn more about this, read this free technical report from ISO: Row Pattern Recognition in SQL [ZIP+PDF; 850kB]

If you want to see even more, have a look at my slides on row pattern recognition and the articles Stew Ashton wrote about it.

Temporal and Bi-temporal Tables

This covers system and application versioning and is sometimes referred to as “time travel” or “temporal validity”. The interesting fact is that—out of the seven analyzed databases—PostgreSQL belongs to the minority that doesn’t support any of this yet. This is just because MariaDB 10.3 was released the week before.

What is it? Just read the best free resource on it: Temporal features in SQL:2011 [PDF; 220kB]

Generated Columns

Again, PostgreSQL belongs to the minority of databases not supporting this. Arguably, it is not so important for PostgreSQL because PostgreSQL supports indexes on expressions natively so it doesn’t need the detour via generated columns as MySQL, MariaDB, and SQL Server do.

Combined Data Change and Retrieval

This was brought to my attention by Lukas Eder on the jOOQ blog recently. It is basically the standard variant of writable CTEs (insert, update, delete in with clauses). It is a little bit more powerful because it allows you to select either the old or the new data of an update.
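For illustration, the standard “data change delta table” syntax (as implemented by DB2, for instance) looks roughly like this; table and column names are hypothetical:

-- read data generated by an insert
SELECT id FROM FINAL TABLE ( INSERT INTO accounts ( balance ) VALUES ( 100 ) );

-- read the pre-update values of an update
SELECT balance FROM OLD TABLE ( UPDATE accounts SET balance = balance + 10 WHERE id = 1 );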

Partitioned Join

Watch out: this is not about table partitioning. Instead it is about filling gaps in time series. This can be easily done with an outer join if there is only one time series. If you have multiple time series in one data set and need to fill all gaps in each of these series, partitioned join is the answer.

SELECT * 
  FROM data PARTITION BY (key) 
 RIGHT JOIN generate_series(...) 
         ON ...

listagg

I’ve written a full article about listagg before. Sure, PostgreSQL supports string_agg and other means to get a similar result, but that’s not standard. Who cares? Well, SQL Server has a string_agg function too, but with a different syntax. This is exactly what standards aim to prevent.
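For comparison (emp and ename are hypothetical), the standard form and the PostgreSQL-specific one:

-- standard SQL listagg (not available in PostgreSQL as of this writing)
SELECT listagg(ename, ', ') WITHIN GROUP (ORDER BY ename) FROM emp;

-- the PostgreSQL way
SELECT string_agg(ename, ', ' ORDER BY ename) FROM emp;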

Distinct data types

This is about create type ... as <predefined type>. PostgreSQL supports structured types and domains, but not this particular way to introduce a new type name, including type-safety, based on a predefined type such as integer.

Work in Progress

Finally, I’ve also mentioned two topics that are currently under construction:

merge

The standard way for upsert (update or insert), featuring a more flexible syntax. This was already committed for PostgreSQL 11 but got reverted shortly after. However, chances are that there’ll be a new attempt for PostgreSQL 12.

I’ve tested the patch for syntactical completeness before it got reverted and found no major gap to the other available implementations.

JSON

PostgreSQL has great JSON support. However, in late 2016—years after PostgreSQL added it—the standard added JSON functions too. No surprise they don’t match the PostgreSQL functions. In the meantime other databases are getting standard JSON support, and so is PostgreSQL.

My preliminary test of the PostgreSQL SQL/JSON patches has shown some issues, but I did not yet check them in detail. I plan to do so in the next weeks and will report any gaps I might find.

The last slide is my offer to help the PostgreSQL community in interpreting the standard and testing patches if you ping me.

Please remember that this blog post is just a teaser. More background is available in the slides [PDF; 5MB].

“PostgreSQL Standard SQL Gap Analysis” by Markus Winand was originally published at modern SQL.

Pavel Stehule: plpgsql_check can identify variables with wrong type used in predicates and breaks index usage

Simple example:
create table bigtable(id bigint, ...)

...
declare _id numeric;
begin
_id := ...
FOR r IN SELECT * FROM bigtable WHERE id = _id
LOOP
...

In this case, PostgreSQL never uses the index, due to the different types of the query parameter (the parameter type is defined by the type of the PL/pgSQL variable) and the table attribute. These days this error is quite common due to migrations from Oracle: ids in tables are declared as bigint or int, but variables in functions are often declared as numeric.

plpgsql_check can identify a symptom of this issue - an implicit cast inside a predicate - and can throw a performance warning. See the commit.

Example:
create table bigtable(id bigint, v varchar);
create or replace function test()
returns void as $$
declare
r record;
_id numeric;
begin
select * into r from bigtable where id = _id;
for r in select * from bigtable where _id = id
loop
end loop;
if (exists(select * from bigtable where id = _id)) then
end if;
end;
$$ language plpgsql;

select * from plpgsql_check_function('test()', performance_warnings => true);
plpgsql_check_function
-------------------------------------------------------------------------------------------------------------------------------
performance:42804:6:SQL statement:implicit cast of attribute caused by different PLpgSQL variable type in WHERE clause
Query: select * from bigtable where id = _id
-- ^
Detail: An index of some attribute cannot be used, when variable, used in predicate, has not right type like a attribute
Hint: Check a variable type - int versus numeric
performance:42804:7:FOR over SELECT rows:implicit cast of attribute caused by different PLpgSQL variable type in WHERE clause
Query: select * from bigtable where _id = id
-- ^
Detail: An index of some attribute cannot be used, when variable, used in predicate, has not right type like a attribute
Hint: Check a variable type - int versus numeric
performance:42804:10:IF:implicit cast of attribute caused by different PLpgSQL variable type in WHERE clause
Query: SELECT (exists(select * from bigtable where id = _id))
-- ^
Detail: An index of some attribute cannot be used, when variable, used in predicate, has not right type like a attribute
Hint: Check a variable type - int versus numeric
warning:00000:3:DECLARE:never read variable "r"
(16 rows)

Don Seiler: Altering Default Privileges For Fun and Profit


One of the big changes I came upon as I transitioned from Oracle to PostgreSQL was how privileges are handled. In Oracle, an object’s schema also determines its owner, since a schema is essentially that user’s collection of objects. PostgreSQL disassociates schemas from users. Schemas do have owners, but a given user (or role) could own many schemas. Furthermore, objects such as tables in those schemas could be owned by completely different users or roles. Once I unlearned what I had learned and got my head around that, I next came upon the concept of “default privileges”.

Let’s say we have a database, privtest, with a schema app_schema owned by app_owner. We also have 3 users: bigsam, rhino and marco. All 3 users have USAGE granted on app_schema, while bigsam and rhino also have CREATE:

> grant usage on schema app_schema to bigsam;
GRANT
> grant usage on schema app_schema to rhino;
GRANT
> grant usage on schema app_schema to marco;
GRANT
> grant create on schema app_schema to bigsam;
GRANT
> grant create on schema app_schema to rhino;
GRANT

Setting Default Privileges

So we first connect as bigsam and define default privileges for tables he creates in the app_schema schema.

> \connect privtest bigsam
You are now connected to database "privtest" as user "bigsam".

> alter default privileges in schema app_schema
        grant select on tables to marco;
ALTER DEFAULT PRIVILEGES

Then let’s go ahead and create a new table:

> create table app_schema.sam_tables as select * from pg_tables;
SELECT 62

Default privileges allow you to say what privileges should be granted on the specified object types (tables, sequences, functions, etc.) that you create in the future. Notice that I emphasized “you”. At first I thought it applied to all objects in the schema, but it only applies to objects in the schema created by the user executing the ALTER DEFAULT PRIVILEGES command. So if I only grant default privileges as bigsam but then create the objects as app_owner, then marco still can’t see that table.
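As an aside, default privileges can also be defined on behalf of another role with the FOR ROLE clause. A sketch (assuming you are a member of app_owner or a superuser) that would cover the app_owner case:

> alter default privileges for role app_owner in schema app_schema
        grant select on tables to marco;
ALTER DEFAULT PRIVILEGES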

Also it’s important to re-state that this only applies to future objects. To change privileges on already-existing objects, use the standard GRANT syntax.

So we see in the above example that bigsam has said that all tables that he creates in the app_schema schema in the future should be readable by marco. So let’s take a look at the table privileges in the schema at this point with the \dp command:

> \dp app_schema.*
                                   Access privileges
   Schema   |    Name    | Type  |   Access privileges   | Column privileges | Policies
------------+------------+-------+-----------------------+-------------------+----------
 app_schema | sam_tables | table | bigsam=arwdDxt/bigsam+|                   |
            |            |       | marco=r/bigsam        |                   |
(1 row)

We see we have just the one table, owned by bigsam with full privileges. Then we also see that marco has the SELECT privilege (marco=r), and that it was granted by bigsam. The \dp access privilege codes can be found in the GRANT documentation.

So now as marco we can see the data in that table:

> \connect privtest marco
You are now connected to database "privtest" as user "marco".

> select count(*) from app_schema.sam_tables;
 count
-------
    62
(1 row)

Not Setting Default Privileges

Now let’s log in as rhino and create a table, but without setting any default privileges for marco or anybody:

> \connect privtest rhino
You are now connected to database "privtest" as user "rhino".

> create table app_schema.rhino_tables as select * from pg_tables;
SELECT 63

> \dp app_schema.*
                                    Access privileges
   Schema   |     Name     | Type  |   Access privileges   | Column privileges | Policies
------------+--------------+-------+-----------------------+-------------------+----------
 app_schema | rhino_tables | table |                       |                   |
 app_schema | sam_tables   | table | bigsam=arwdDxt/bigsam+|                   |
            |              |       | marco=r/bigsam        |                   |
(2 rows)

As expected, we have our new table with no special access privileges granted (the owner would still have full privileges). Let’s log back in as marco and see what we can see:

> \connect privtest marco
You are now connected to database "privtest" as user "marco".

> select count(*) from app_schema.rhino_tables;
psql:alter_default.sql:43: ERROR:  permission denied for relation rhino_tables

To fix this we can log back in as rhino (or a superuser) and grant select on this table:

> \connect privtest rhino
You are now connected to database "privtest" as user "rhino".

> grant select on app_schema.rhino_tables to marco;
GRANT

> \connect privtest marco
You are now connected to database "privtest" as user "marco".
> \dp app_schema.*
                                    Access privileges
   Schema   |     Name     | Type  |   Access privileges   | Column privileges | Policies
------------+--------------+-------+-----------------------+-------------------+----------
 app_schema | rhino_tables | table | rhino=arwdDxt/rhino  +|                   |
            |              |       | marco=r/rhino         |                   |
 app_schema | sam_tables   | table | bigsam=arwdDxt/bigsam+|                   |
            |              |       | marco=r/bigsam        |                   |
(2 rows)

> select count(*) from app_schema.rhino_tables;
 count
-------
    63
(1 row)

A superuser could also grant privileges for all tables in the schema, e.g.:

# GRANT SELECT ON ALL TABLES IN SCHEMA app_schema TO marco;
GRANT

In this current example, the results would be the same since marco already has SELECT privileges on sam_tables.

Stay tuned for part 2, in which our hero discovers another fun fact about default privileges!

pgCMH - Columbus, OH: PostgreSQL Data Types and You


The June meeting will be held at 18:00 EST on Tues, the 26th. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.

What

Our very own Douglas will be presenting this month. He’s going to tell us all about the most common datatypes you’ll see in PostgreSQL. This will be a two-part talk, concluding in July.

Where

CoverMyMeds has graciously agreed to validate your parking if you use their garage so please park there:

You can safely ignore any sign saying to not park in the garage as long as it’s after 17:30 when you arrive.

Park in any space that is not marked ‘24 hour reserved’.

Once parked, take the elevator/stairs to the 3rd floor to reach the Miranova lobby. Once in the lobby, the elevator bank is in the back (West side) of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. (If the elevator won’t let you pick the 11th floor, contact Doug or CJ (info below)). Once you exit the elevator, look to your left and right; one side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space:

Community space as seen from the stage

The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

If you have any issues or questions with parking or the elevators, feel free to text/call Doug at +1.614.316.5079 or CJ at +1.740.407.7043


Bruce Momjian: Will Postgres Live Forever?


I had the opportunity to present an unusual topic at this year's Postgres Vision conference: Will Postgres Live Forever? It is not something I talk about often but it brings out some interesting contrasts in how open source is different from proprietary software, and why innovation is fundamental to software usage longevity. For the answer to the question, you will have to read the slides.

Craig Kerstiens: Citus what is it good for? OLTP? OLAP? HTAP?


Earlier this week as I was waiting to begin a talk at a conference, I chatted with someone in the audience that had a few questions. They led off with this question: is Citus a good fit for X? The heart of what they were looking to figure out: is the Citus distributed database a better fit for analytical (data warehousing) workloads, or for more transactional workloads, to power applications? We hear this question quite a lot, so I thought I’d elaborate more on the use cases that make sense for Citus from a technical perspective.

Before I dig in, if you’re not familiar with Citus: we transform Postgres into a distributed database that allows you to scale your Postgres database horizontally. Under the covers, your data is sharded across multiple nodes; meanwhile everything still appears as a single node to your application. By still appearing as a single-node database, your application doesn’t need to know about the sharding. We do this as a pure extension to Postgres, which means you get all the power and flexibility that’s included within Postgres such as JSONB, PostGIS, rich indexing, and more.
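A minimal sketch of what that looks like in practice (the events table and its columns are made up for illustration):

CREATE EXTENSION citus;

CREATE TABLE events (
    tenant_id  bigint NOT NULL,
    event_id   bigint NOT NULL,
    payload    jsonb,
    created_at timestamptz DEFAULT now()
);

-- Shard the table across the worker nodes by tenant_id; the application
-- keeps talking to the coordinator as if it were a single Postgres node.
SELECT create_distributed_table('events', 'tenant_id');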

OLAP - Data warehousing as it was 5 years ago

Once upon a time (when Citus Data was first started ~7 years ago), we focused on building a fast database to power analytics. Analytical workloads often had different needs and requirements around them. Transactions weren’t necessarily needed. Data was often loaded in bulk as opposed to single row inserts and updates. Analytics workloads have evolved and moved from pure-OLAP to a mix of OLTP and OLAP, and so Citus has too.

Data warehousing for slower storage and massive exploration

Going down the data warehousing rabbit hole, you’ll find use cases for storing hundreds of terabytes of data. This can range from historical audit logs and analytics data to event systems. Much of the data is seldom accessed, but one day you may need it, so you want to retain it. This data is typically used internally by a data analyst who has some mix of defined reports (maybe they run them monthly) as well as ad-hoc exploration. These reporting needs can often be run in batch and aren’t interactive or user facing.

Data warehouses can be single node or multiple node, but almost always are a massively parallel processing (MPP) system. Oftentimes they have columnar storage (which we’ll get into in a future post), which works quite well for bulk ingestion and for minimizing storage cost. The design of most data warehouses, though, leads them to not be ideal for powering anything user facing, as they typically have fairly low concurrency.

OLTP - Your transactional system of record

On the other side of the coin is a traditional transactional database. A transactional database is what you typically use when you start a Rails application and connect your app to a database for storage. Postgres falls squarely into this category. Side note: at times, Postgres has been forked and modified to meet data warehousing use cases, due to its favorable licensing and solid code base.

The standard CRUD (create, read, update, delete) operations that users perform within your application are the textbook definition of OLTP (online transaction processing). There are certain expectations that come with OLTP such as transactional guarantees for your data and an ability to abort/rollback transactions.

It isn’t a requirement for an OLTP database that it speaks SQL, though most transactional databases do support at least some version of the SQL standard and thus tend to play well with most application frameworks.

The rise of HTAP workloads, both transactional and analytical

Ten years ago, the world was black and white: you either had an OLTP database or an OLAP one. With the rise of software, there is more data than ever before. With the increase in data, many teams see an increase in the desire to provide value and analytics directly on top of transactional data in real time. This new middle ground is often referred to as HTAP, a term coined by Gartner. Per their definition:

Hybrid transaction/analytical processing (HTAP) is an emerging application architecture that “breaks the wall” between transaction processing and analytics. It enables more informed and “in business real time” decision making.

Real-time analytics, or HTAP, can often be found powering customer-facing dashboards, monitoring network data for security alerting, and driving high-frequency trading environments.

Okay so where does the Citus database fit in? Is it OLAP, OLTP, or HTAP?

When we first started building Citus, we began on the OLAP side. As time has gone on Citus has evolved to have full transactional support, first when targeting a single shard, and now fully distributed across your Citus database cluster.

Today, we find most who use the Citus database do so for either:

  • (OLTP) Fully transactional database powering their system of record or system of engagement (often multi-tenant)
  • (HTAP) For providing real-time insights directly to internal or external users across large amounts of data.

Have questions about whether Citus can help with your use case? Just let us know and we’d be happy to talk and explore if your use case would be a good fit or not.

Jignesh Shah: Setting up PostgreSQL 11 Beta 1 in Amazon RDS Database Preview Environment


PostgreSQL 11 Beta 1 has been out for more than a couple of weeks. The best way to experience it is to try out the new version and test drive it yourself.

Rather than building it directly from source, I nowadays take the easy way out and deploy it in the cloud. Fortunately, it is already available in Amazon RDS Database Preview Environment.



For this post I am going to use the AWS CLI since it is easy to understand the command line and copy/paste it and also easier to script it for repetitive testing. To use the Database Preview environment, the endpoint has to be modified to use https://rds-preview.us-east-2.amazonaws.com/ instead of the default for the region.

Because there might be multiple PostgreSQL 11 beta releases, it is important to understand which build version is being deployed. I can always leave it to the default, which typically is the latest preferred version, but a lot of times I want to be sure about the version I am deploying. The command to get all the versions of PostgreSQL 11 is describe-db-engine-versions.

$ aws rds describe-db-engine-versions --engine postgres --db-parameter-group-family postgres11 --endpoint-url  https://rds-preview.us-east-2.amazonaws.com/ 
{
    "DBEngineVersions": [
        {
            "Engine": "postgres", 
            "DBParameterGroupFamily": "postgres11", 
            "SupportsLogExportsToCloudwatchLogs": false, 
            "SupportsReadReplica": true, 
            "DBEngineDescription": "PostgreSQL", 
            "EngineVersion": "11.20180419", 
            "DBEngineVersionDescription": "PostgreSQL 11.20180419 (68c23cba)", 
            "ValidUpgradeTarget": [
                {
                    "Engine": "postgres", 
                    "IsMajorVersionUpgrade": false, 
                    "AutoUpgrade": false, 
                    "EngineVersion": "11.20180524"
                }
            ]
        }, 
        {
            "Engine": "postgres", 
            "DBParameterGroupFamily": "postgres11", 
            "SupportsLogExportsToCloudwatchLogs": false, 
            "SupportsReadReplica": true, 
            "DBEngineDescription": "PostgreSQL", 
            "EngineVersion": "11.20180524", 
            "DBEngineVersionDescription": "PostgreSQL 11.20180524 (BETA1)", 
            "ValidUpgradeTarget": []
        }
    ]

}

From the above I see there are two versions 11.20180419 and 11.20180524. The versions are based on datestamp with the description showing the tag information of the version. Since I am interested in the BETA1 version I use the version 11.20180524.


$ aws rds create-db-instance --endpoint-url https://rds-preview.us-east-2.amazonaws.com --allocated-storage 100 --db-instance-class db.t2.small --db-name benchdb --master-username SECRET --master-user-password XXXXX --engine postgres --engine-version 11.20180524 --db-instance-identifier pg11beta1

Once deployed, I can always get the endpoint of the instance:

$ aws rds describe-db-instances --endpoint-url https://rds-preview.us-east-2.amazonaws.com --db-instance-identifier pg11beta1 | grep Address
                "Address": "pg11beta1.XXXXXX.us-east-2.rds-preview.amazonaws.com"


In my account I have already added my client’s IP address to my default security group, so I can connect to the instance:


$ psql -h pg11beta1.ctiolgghvpx0.us-east-2.rds-preview.amazonaws.com -U pgadmin -d benchdb -c 'SELECT VERSION()'
                                                  version                                         
 PostgreSQL 11beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit


It is hard to test a database without any data. Normally I would just run pgbench directly against it to load data. But this time I thought I would try a different way of loading data into this instance, similar to how people typically migrate from an existing production setup. For this purpose, I will need to set up a production database before I proceed.

Before I create a production database instance, I first create a custom parameter group so that I can enable the typical settings I use in a production database. In the preview environment I created a PostgreSQL 11 database family parameter group and edited it to change some of the parameters as follows:

rds.logical_replication = 1

and saved the group.
Next, I create my production instance using the newly created parameter group.


 $ aws rds create-db-instance --endpoint-url https://rds-preview.us-east-2.amazonaws.com --allocated-storage 100 --db-instance-class db.t2.small --engine postgres --engine-version 11.20180524 --db-name benchdb --master-username pgadmin --master-user-password SECRET --db-instance-identifier pg11prod


It is still empty, so I fill it up with my production data:

$ pgbench -i -s 100  -h pg11prod.XXX.us-east-2.rds-preview.amazonaws.com -U pgadmin benchdb


Now I have a typical setup with one production instance and another, empty test instance. I now have to figure out how to get the data into my test instance. I could always dump all data using pg_dump and restore it on the new instance, but this time I am going to try logical replication.

To set up logical replication between two instances, I first need to recreate the schema on the other instance. pg_dump provides the -s flag to dump just the schema with no data. I dump the schema from the production setup:


$ pg_dump -s  -h pg11prod.XXXX.us-east-2.rds-preview.amazonaws.com -U pgadmin benchdb > schema.txt

and then load the schema into my test setup


$ psql -h pg11beta1.XXXX.us-east-2.rds-preview.amazonaws.com -U pgadmin -d benchdb -f schema.txt



Now I want to actually set up logical replication between the two instances. For this I need a replication user. I could use the master user, but that is too risky. So I create a new user with read-only privileges on the tables in the database and give it the replication rights that work in Amazon RDS.

$ psql -h pg11prod.XXXXX.us-east-2.rds-preview.amazonaws.com -U pgadmin benchdb


benchdb=> CREATE USER repluser WITH PASSWORD 'SECRET';
CREATE ROLE
benchdb=> GRANT rds_replication TO repluser;
GRANT ROLE
benchdb=> GRANT SELECT ON ALL TABLES IN SCHEMA public TO repluser;
GRANT


Next, I have to set up a publication for all tables in the production database:

benchdb=> CREATE PUBLICATION pgprod11 FOR ALL TABLES;
CREATE PUBLICATION

One more thing to add here is to change the inbound rules of the security group of the production instance to allow the test instance to connect.

On my test instance I need to create a subscription to subscribe to all changes happening on my production setup.

$ psql -h pg11beta1.XXXXX.us-east-2.rds-preview.amazonaws.com -U pgadmin benchdb

benchdb=> CREATE SUBSCRIPTION pg11beta1 CONNECTION 'host=pg11prod.ctiolgghvpx0.us-east-2.rds-preview.amazonaws.com dbname=benchdb user=repluser password=SECRET' PUBLICATION pgprod11;
NOTICE:  created replication slot "pg11beta1" on publisher
CREATE SUBSCRIPTION

Note that if the command itself takes a long time to execute, it typically means that it cannot connect to the production instance. Check the security group to make sure the rule allowing your test instance to connect is set properly. If the connection is allowed, the command returns almost instantaneously. However, the actual data might still be loading behind the scenes.
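If you want to see what the subscription is doing while the initial copy runs, here are a couple of hedged monitoring queries (these views exist in PostgreSQL 10 and later):

-- On the test (subscriber) instance:
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
  FROM pg_stat_subscription;

-- On the production (publisher) instance, the slot created by the subscription:
SELECT slot_name, active, confirmed_flush_lsn
  FROM pg_replication_slots;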

After some time, I can see that my test instance has all the initial data from the production setup.

benchdb=> select count(*) from pgbench_branches;
 count
-------
   100
(1 row)

benchdb=> select count(*) from pgbench_history;
 count
-------
     0
(1 row)

(The table pgbench_history is typically empty after a fresh setup of pgbench)

Now let's run an application workload on our production database pg11prod:

$ pgbench -c 10  -T 300 -P 10  -h pg11prod.XXXXX.us-east-2.rds-preview.amazonaws.com -U pgadmin benchdb


As the load starts (after the initial vacuum), log into the test instance and check for changes. With pgbench default test, it is easy to verify changes by counting entries in pgbench_history.

$ psql -h pg11beta1.XXXXX.us-east-2.rds-preview.amazonaws.com -U pgadmin benchdb
psql (10.4 (Ubuntu 10.4-2.pgdg16.04+1), server 11beta1)
Type "help" for help.

benchdb=> select count(*) from pgbench_history;
 count
-------
  2211
(1 row)

benchdb=> select count(*) from pgbench_history;
 count
-------
 10484
(1 row)

This is a simple test to see changes are being propagated from the production instance to the test instance.

Cool, I finally have logical replication set up with a read-only user between two PostgreSQL 11 instances in the Amazon RDS Database Preview Environment.




Pretty cool!

Laurenz Albe: Adding an index can decrease SELECT performance

A bad query plan ...
© Laurenz Albe 2018

 

We all know that you have to pay a price for a new index you create — data modifying operations will become slower, and indexes use disk space. That’s why you try to have no more indexes than you actually need.

But most people think that SELECT performance will never suffer from a new index. The worst that can happen is that the new index is not used.

However, this is not always true, as I have seen more than once in the field. I’ll show you such a case and tell you what you can do about it.

An example

We will experiment with this table:

CREATE TABLE skewed (
   sort        integer NOT NULL,
   category    integer NOT NULL,
   interesting boolean NOT NULL
);

INSERT INTO skewed
   SELECT i, i%1000, i>50000
   FROM generate_series(1, 1000000) i;

CREATE INDEX skewed_category_idx ON skewed (category);

VACUUM (ANALYZE) skewed;

We want to find the first twenty interesting rows in category 42:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM skewed
WHERE interesting AND category = 42
ORDER BY sort
LIMIT 20;

This performs fine:

                             QUERY PLAN
--------------------------------------------------------------------
 Limit  (cost=2528.75..2528.80 rows=20 width=9)
        (actual time=4.548..4.558 rows=20 loops=1)
   Buffers: shared hit=1000 read=6
   ->  Sort  (cost=2528.75..2531.05 rows=919 width=9)
             (actual time=4.545..4.549 rows=20 loops=1)
         Sort Key: sort
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=1000 read=6
         ->  Bitmap Heap Scan on skewed
                        (cost=19.91..2504.30 rows=919 width=9)
                        (actual time=0.685..4.108 rows=950 loops=1)
               Recheck Cond: (category = 42)
               Filter: interesting
               Rows Removed by Filter: 50
               Heap Blocks: exact=1000
               Buffers: shared hit=1000 read=6
               ->  Bitmap Index Scan on skewed_category_idx
                        (cost=0.00..19.68 rows=967 width=0)
                        (actual time=0.368..0.368 rows=1000 loops=1)
                     Index Cond: (category = 42)
                     Buffers: shared read=6
 Planning time: 0.371 ms
 Execution time: 4.625 ms

PostgreSQL uses the index to find the 1000 rows with category 42, filters out the ones that are not interesting, sorts them and returns the top 20. 5 milliseconds is fine.

A new index makes things go sour

Now we add an index that can help us with sorting. That is definitely interesting if we often have to find the top 20 results:

CREATE INDEX skewed_sort_idx ON skewed (sort);

And suddenly, things are looking worse:

                          QUERY PLAN                                                               
-------------------------------------------------------------
 Limit  (cost=0.42..736.34 rows=20 width=9)
        (actual time=21.658..28.568 rows=20 loops=1)
   Buffers: shared hit=374 read=191
   ->  Index Scan using skewed_sort_idx on skewed
                (cost=0.42..33889.43 rows=921 width=9)
                (actual time=21.655..28.555 rows=20 loops=1)
         Filter: (interesting AND (category = 42))
         Rows Removed by Filter: 69022
         Buffers: shared hit=374 read=191
 Planning time: 0.507 ms
 Execution time: 28.632 ms

What happened?

PostgreSQL thinks that it will be faster if it examines the rows in sort order using the index until it has found 20 matches. But it doesn’t know how the matching rows are distributed with respect to the sort order, so it is not aware that it will have to scan 69042 rows until it has found its 20 matches (see Rows Removed by Filter: 69022 in the above execution plan).

What can we do to get the better plan?

PostgreSQL v10 has added extended statistics to track how the values in different columns are correlated, but that does not track the distribution of the values, so it will not help us here.
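Just for illustration, creating such extended statistics on our table would look like this (v10 syntax); as said, it does not change the plan here:

CREATE STATISTICS skewed_stats (dependencies) ON interesting, category FROM skewed;
ANALYZE skewed;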

There are two workarounds:

  1. Drop the index that misleads PostgreSQL. If that is possible, it is a simple solution. But usually one cannot do that, because the index is either used to enforce a unique constraint, or it is needed by other queries that benefit from it.
  2. Rewrite the query so that PostgreSQL cannot use the offending index. Of the many possible solutions for this, I want to present two:
    • A subquery with OFFSET 0:
      SELECT *
      FROM (SELECT * FROM skewed
            WHERE interesting AND category = 42
            OFFSET 0) q
      ORDER BY sort
      LIMIT 20;
      

      This makes use of the fact that OFFSET and LIMIT prevent a subquery from being “flattened”, even if they have no effect on the query result.

    • Using an expression as sort key:
      SELECT * FROM skewed
      WHERE interesting AND category = 42
      ORDER BY sort + 0
      LIMIT 20;
      

      This makes use of the fact that PostgreSQL cannot deduce that sort + 0 is the same as sort. Remember that PostgreSQL is extensible, and you can define your own + operator!

The post Adding an index can decrease SELECT performance appeared first on Cybertec.

Vincenzo Romano: Temporary functions (kind of) without schema qualifiers


In a previous article of mine I’ve been bitten by the “temporary function issue” (which isn’t an issue at all, of course).

I needed a way to use the now() function in different ways to do a sort of “time travel” over a history table.

I’ve found a few easy™ ways to accomplish the same task and have now distilled the one that seems to me to be the “best one”™. I will make use of a temporary table.

The trick is that a new function, called mynow() will access a table without an explicit schema qualifier but relying on a controlled search_path. This approach opens the door to a controlled table masking thanks to the temporary schema each session has. Let’s see this function first.

create or replace function mynow( out ts timestamp )
language plpgsql
as $l0$
begin
  select * into ts from mytime;
  if not found then
    ts := now();
  end if;
end;
$l0$;

If you put a timestamp in the table mytime, then that timestamp will be used as the current time. If there’s nothing in there, then the normal function output will be used.

First of all, you need to create an always-empty non-temporary table like this in the public schema:

create table mytime ( ts timestamp );

So the function will start working soon with the default time line. If I inserted a time stamp straight in there, I’d set the function behavior for all sessions at once. But this isn’t normally the objective.

As soon as a user needs a different reference time for its time travelling, she needs to do the following:

set search_path = pg_temp,"$user", public; -- 1st
create table mytime ( like public.mytime including all ); -- 2nd
insert into mytime values ( '2000-01-01' ); -- 3rd

That’s it. More or less.

The first statement alters the search_path setting so the pg_temp schema becomes the first one to be searched in and, thus, the “default” schema.

The second one creates a temporary table (re-read the previous sentence, please!) “just like the public one“. Please note the schema qualifying used with the like predicate.

The third one will insert into that temporary table (you re-read it, didn’t you?) a value to be used as the reference time in your time traveling.

Time to test, now. Without closing the current PostgreSQL connection try the following:

select * from mynow();
         ts
---------------------
2000-01-01 00:00:00
(1 row)

If you close the connection and re-open it, the pg_temp schema will vanish and the very same query will behave just like the plain old now() function.

The same behavior can be accomplished without closing the connection by either deleting from the temporary table or by resetting the search_path to its normal value, thus “unmasking” the persistent table and resetting the function behavior.

If you go back to my article, you can replace the function call now() with mynow() and enable full-blown table time travelling.

Of course, there are also alternative implementations and even those with more care in enforcing more controlled behaviors. Feel free to propose your very own.
