
Michael Paquier: Postgres 9.5 feature highlight: More flexible expressions in pgbench


A nice feature extending the usage of pgbench, the in-core Postgres tool for benchmarking, has landed in 9.5 with this commit:

commit: 878fdcb843e087cc1cdeadc987d6ef55202ddd04
author: Robert Haas <rhaas@postgresql.org>
date: Mon, 2 Mar 2015 14:21:41 -0500
pgbench: Add a real expression syntax to \set

Previously, you could do \set variable operand1 operator operand2, but
nothing more complicated.  Now, you can \set variable expression, which
makes it much simpler to do multi-step calculations here.  This also
adds support for the modulo operator (%), with the same semantics as in
C.

Robert Haas and Fabien Coelho, reviewed by Álvaro Herrera and
Stephen Frost

pgbench has long supported custom input files via -f, with custom variables that can be set with, for example, \set or \setrandom, and then used in a custom set of SQL queries:

\set id 10 * :scale
\setrandom id2 1 :id
SELECT name, email FROM users WHERE id = :id;
SELECT capital, country FROM world_cities WHERE id = :id2;

Up to 9.4, those custom variables can only be calculated with simple rules of the form "var operator var2" (the commit message above is explicit enough), resulting in many intermediate steps and variables when doing more complicated calculations (note as well that any additional operands and variables provided after the first three are simply ignored):

\setrandom ramp 1 200
\set scale_big :scale * 10
\set min_big_scale :scale_big + :ramp
SELECT :min_big_scale;

In 9.5 such cases become much easier, because pgbench now embeds a parser for more complicated expressions. The calculation written above can be done more simply, and far fancier things are possible as well:

\setrandom ramp 1 200
\set min_big_scale :scale * 10 + :ramp
SELECT :min_big_scale;

With pgbench run for a couple of transactions, here is what you could get:

$ pgbench -f test.sql -t 5
[...]
$ tail -n5 $PGDATA/pg_log/postgresql.log
LOG:  statement: SELECT 157;
LOG:  statement: SELECT 53;
LOG:  statement: SELECT 32;
LOG:  statement: SELECT 115;
LOG:  statement: SELECT 43;

Another important thing to mention is that this commit also adds support for the modulo operator "%". In any case, be careful not to overdo it with this feature: grouping expressions may be good for readability, but packing too much into a single expression makes it harder to understand later how a given script was designed.
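As a quick illustration of the modulo operator in the new expression syntax, here is a minimal sketch of a custom script that buckets randomly generated identifiers (the variable names are made up for the example):

\setrandom id 1 100000
\set bucket :id % 10 + 1
SELECT :id AS id, :bucket AS bucket;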


Josh Berkus: PostgreSQL data migration hacks from Tilt

Paul Ramsey: PostGIS 2.1.6 Released


The 2.1.6 release of PostGIS is now available.

The PostGIS development team is happy to release a patch for PostGIS 2.1: the 2.1.6 release. As befits a patch release, the focus is on bugs, breakages, and performance issues. Users with large tables of points will want to prioritize this patch, for substantial (~50%) disk space savings.

http://download.osgeo.org/postgis/source/postgis-2.1.6.tar.gz


Craig Ringer: Dynamic SQL-level configuration for BDR 0.9.0


The BDR team has recently introduced support for dynamically adding new nodes to a BDR group from SQL into the current development builds. Now no configuration file changes are required to add nodes and there’s no need to restart the existing or newly joining nodes.

This change does not appear in the current 0.8.0 stable release; it’ll land in 0.9.0 when that’s released, and can be found in the bdr-plugin/next branch in the mean time.

New nodes negotiate with the existing nodes for permission to join. Soon they'll be able to join the group without disrupting any DDL locking, global sequence voting, etc.

There’s also an easy node removal process so you don’t need to modify internal catalog tables and manually remove slots to drop a node anymore.

New node join process

With this change, the long-standing GUC-based configuration for BDR has been removed. bdr.connections no longer exists and you no longer configure connections with bdr.[conname]_dsn etc.

Instead, node addition is accomplished with the bdr.bdr_group_join(...) function. Because this is a function in the bdr extension, you must first CREATE EXTENSION bdr;. PostgreSQL doesn't install extension dependencies automatically, and the bdr extension requires the btree_gist extension, so you'll have to CREATE EXTENSION btree_gist first.

Creating the first node

Creation of the first node must now be done explicitly using bdr.bdr_group_create. This promotes a standalone PostgreSQL database to a single-node BDR group, allowing other nodes to then be joined to it.

You must pass a node name and a valid externally-reachable connection string for the dsn parameter, e.g.:

CREATE EXTENSION btree_gist;

CREATE EXTENSION bdr;

SELECT bdr.bdr_group_create(
  local_node_name := 'node1',
  node_external_dsn := 'host=node1 dbname=mydb'
);

Note that the dsn is not used by the root node itself. It's used by other nodes to connect to the root node, so you can't use a dsn like host=localhost dbname=mydb if you intend to have nodes on multiple machines.

Adding other nodes

You can now join other nodes to form a fully functional BDR group by calling bdr.bdr_group_join and specifying a connection string that points to an existing node for the join_using_dsn. e.g.:

CREATE EXTENSION btree_gist;

CREATE EXTENSION bdr;

SELECT bdr.bdr_group_join(
    local_node_name := 'node2',
    node_external_dsn := 'host=node2 dbname=mydb',
    join_using_dsn := 'host=node1 dbname=mydb'
);

Here, node_external_dsn is an externally reachable connection string that can be used to establish a connection to the new node, just like the one you supplied for the root node.

The join_using_dsn specifies the node that this new node should connect to when joining the group and establishing its membership. It won’t be used after joining.

Waiting until a node is ready

It’s now possible to tell when a new node has finished joining by calling bdr.node_join_wait(). This function blocks until the local node reports that it’s successfully joined a BDR group and is ready to execute commands.

Database name “bdr” now reserved

Additionally, the database name bdr is now reserved. It may not be used for BDR nodes, as BDR requires it for internal management. Hopefully this requirement can be removed later once a patch to the BGWorkers API has been applied to core.

Documentation moving into the source tree

The documentation on the PostgreSQL wiki sections for BDR is being converted into the same format as is used for PostgreSQL itself. It's being added to the BDR extension source tree and will be available as part of the 0.9.0 release.

Trying it out

If you’d like to test out bdr-plugin/next, which is due to become BDR 0.9.0, take a look at the source install instructions and the quick-start guide.

There are no packages for BDR 0.9.0 yet, so if you try to install from packages you’ll just get 0.8.0.

Comments? Questions?

Please feel free to leave comments and questions here, or post to pgsql-general with BDR-related questions.

We’re also now using GitHub to host a mirror of the BDR repository. We’re using the issue tracker there, so if you’ve found a bug and can supply a detailed report with the exact version and steps to reproduce, please file it there.

Shaun M. Thomas: PG Phriday: Date Based Partition Constraints


PostgreSQL has provided table partitions for a long time. In fact, one might say it has always had partitioning. The functionality and performance of table inheritance has increased over the years, and there are innumerable arguments for using it, especially for larger tables consisting of hundreds of millions of rows. So I want to discuss a quirk that often catches developers off guard. In fact, it can render partitioning almost useless or counter-productive.

PostgreSQL has a very good overview in its partitioning documentation. And the pg_partman extension at PGXN follows the standard partitioning model to automate many of the pesky tasks for maintaining several aspects of partitioning. With modules like this, there’s no need to manually manage new partitions, constraint maintenance, or even some aspects of data movement and archival.

However, plenty of existing partition sets are out there, and not everyone knows about extensions like this; many have developed in-house systems instead. Here's something I encountered recently:

CREATE TABLE sys_order
(
    order_id     SERIAL       PRIMARY KEY,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);

CREATE TABLE sys_order_part_201502 ()
       INHERITS (sys_order);

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::DATE AND
             order_dt < '2015-02-01'::DATE + INTERVAL '1 mon');

This looks innocuous enough, but PostgreSQL veterans are already shaking their heads. The documentation alludes to how this could be a problem:

Keep the partitioning constraints simple, else the planner may not be able to prove that partitions don’t need to be visited.

The issue in this case is that adding an interval of one month turns the right boundary of this range constraint into a dynamic value, and PostgreSQL will not use dynamic values when evaluating check constraints for partition exclusion. Here's a query plan from PostgreSQL 9.4.1, the most recent release as of this writing:

EXPLAIN
SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
---------------------------------------------
 Append  (cost=0.00..30.38 rows=9 width=20)
   ->  Seq Scan on sys_order  ...
   ->  Seq Scan on sys_order_part_201502  ...

Well, it looks like the PostgreSQL planner wants to check both tables, even though the constraint we added to the child does not apply. Now, this isn’t a bug per se, but it might present as somewhat counter-intuitive. Let’s replace the constraint with one that does not use a dynamic value and try again:

ALTER TABLE sys_order_part_201502
 DROP CONSTRAINT chk_order_part_201502;

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::DATE AND
             order_dt < '2015-03-01'::DATE);

EXPLAIN
SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
---------------------------------------------
 Append  (cost=0.00..30.38 rows=9 width=20)
   ->  Seq Scan on sys_order  ...
   ->  Seq Scan on sys_order_part_201502  ...

Wait a minute… what happened here? There are no dynamic values; the constraint is a simple pair of static dates. Yet PostgreSQL still wants to check both tables. Well, this was a trick question of sorts, because the real answer lies in the data types used in the constraint. The TIMESTAMP WITH TIME ZONE type, you see, is not interchangeable with TIMESTAMP. Since the time zone is preserved in this type, the actual time and date can vary depending on how it's cast.

Watch what happens when we change the constraint to match the column type used for order_dt:

ALTER TABLE sys_order_part_201502
 DROP CONSTRAINT chk_order_part_201502;

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::TIMESTAMPTZ AND
             order_dt < '2015-03-01'::TIMESTAMPTZ);

EXPLAIN
SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
---------------------------------------------
 Append  (cost=0.00..0.00 rows=1 width=20)
   ->  Seq Scan on sys_order  ...

Now all of the types are directly compatible, removing any possibility of time zones being cast to a different date than the constraint uses. This is an extremely subtle type mismatch, as many developers and DBAs alike consider these types interchangeable. It is further complicated by the fact that DATE seems like the best type to use for the constraint, since time isn't relevant to the desired boundaries.

It’s important to understand that even experienced developers and DBAs can get types wrong. This is especially true when including information like the time zone appears completely innocent. In fact, it’s the default PostgreSQL datetime type for a very good reason: time zones change. Without the time zone, data in the column is bound to the time zone wherever the server is running. That this applies to dates as well, can come as a bit of a surprise.

The lesson here is to always watch your types. PostgreSQL removed a lot of automatic casting in 8.3, and received no small amount of backlash for doing so. However, we can see how subtly incompatible types can cause major issues down the line. In the case of partitioning, a type mismatch can be the difference between reading 10-thousand rows, or 10-billion.

Paul Ramsey: Making Lines from Points


Somehow I've gotten through 10 years of SQL without ever learning this construction, which I found while proof-reading a colleague's blog post, and which looked so unlikely that I had to test it before I believed it actually worked. Just goes to show, there's always something new to learn.

Suppose you have a GPS location table:

  • gps_id: integer
  • geom: geometry
  • gps_time: timestamp
  • gps_track_id: integer

You can get a correct set of lines from this collection of points with just this SQL:


SELECT
  gps_track_id,
  ST_MakeLine(geom ORDER BY gps_time ASC) AS geom
FROM gps_points
GROUP BY gps_track_id;

Those of you who already knew about placing ORDER BY within an aggregate function are going "duh", and the rest of you are, like me, going "whaaaaaa?"

Prior to this, I would solve this problem by ordering all the groups in a CTE or sub-query first, and only then pass them to the aggregate make-line function. This, is, so, much, nicer.
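For the record, that older pattern looks something like this (a sketch against the same hypothetical gps_points table):

SELECT
  gps_track_id,
  ST_MakeLine(geom) AS geom
FROM (
  SELECT gps_track_id, geom
  FROM gps_points
  ORDER BY gps_track_id, gps_time ASC
) AS ordered_points
GROUP BY gps_track_id;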

Paul Ramsey: Magical PostGIS

Rajeev Rastogi: Overview of PostgreSQL Engine Internals

PostgreSQL is an open-source, full-featured relational database. This blog gives an overview of how the PostgreSQL engine processes queries received from the user.
A typical, simplified flow of the PostgreSQL engine is:

(Figure: typical flow of the SQL engine)

In this blog, I am going to cover the modules highlighted in the figure: the parser, analyzer, optimizer, and executor.

Parser:
The parser module is responsible for syntactic analysis of the query. It consists of two sub-modules:
1. Lexical scanner
2. Bison rules/actions

Lexical Scanner:
The lexical scanner reads each character of the given query and returns the appropriate token based on the matching rules.

The name given in angle brackets in a rule is a scanner state; for example, <xc> is the state used for comments, so once the scanner sees the comment-start characters, the comment body token is read in the <xc> state.

Bison:
Bison reads the tokens returned by the scanner, matches them against the grammar rule for a particular kind of statement (there is, for example, a dedicated rule for the SELECT statement), and performs the associated actions.

Each returned token is matched against the rule in left-to-right order; if at any point no match is found, the parser either tries the next candidate rule or throws a syntax error.

Analyzer:
The analyzer module performs semantic analysis of the given query. Each piece of raw information about the query received from the parser is transformed into the database's internal object form to obtain the corresponding object id; for example, the relation name "tbl" gets replaced with its object id.
The output of the analyzer is a query tree, whose structure can be seen in the "Query" struct of src/include/nodes/parsenodes.h.

Optimizer:
The optimizer module, often considered the brain of the SQL engine, is responsible for choosing the best path for executing the query. The best path for a query is selected based on cost: the path with the least cost wins.
Based on the winning path, a plan is created, which the executor uses to run the query.
Some of the important decisions are taken in terms of the methods below:

1. Scan Method
• Sequential scan: simply reads the heap file from start to end; since every record is read, it can be expensive on large tables.
• Index scan: uses a secondary data structure to quickly find the records that satisfy a given predicate, then fetches the rest of each record from the heap, which adds the extra cost of random page access.
2. Join Method
• Nested Loop Join: each record of the outer table is matched against each record of the inner table. A simple algorithm for a nested loop join between Outer and Inner on Outer.k = Inner.k (Outer being the left input, Inner the right):

    for each tuple r in Outer:
        for each tuple s in Inner with s.k = r.k:
            emit output tuple (r, s)

• Merge Join: this join requires the records of each participating table to be sorted on the join key, and it is used only for "=" join clauses. A simple algorithm, with r the current Outer tuple and s the current Inner tuple:

    while tuples remain in both Outer and Inner:
        if r.k = s.k:
            emit output tuple (r, s)
            advance Outer and Inner
        else if r.k < s.k:
            advance Outer
        else:
            advance Inner
• Hash Join: this join does not require sorted input, but it is also used only for "=" join clauses. For a hash join between Inner and Outer on Inner.k = Outer.k:

    -- build phase
    for each tuple r in Inner:
        insert r into hash table T with key r.k

    -- probe phase
    for each tuple s in Outer:
        for each tuple r in bucket T[s.k]:
            if s.k = r.k:
                emit output tuple (r, s)
3. Join Order: the mechanism that decides the order in which the tables are joined.

Typical output of a plan looks like this:

postgres=# explain select firstname from friend where age = 33 order by firstname;
                        QUERY PLAN
--------------------------------------------------------------
 Sort  (cost=1.06..1.06 rows=1 width=101)
   Sort Key: firstname
   ->  Seq Scan on friend  (cost=0.00..1.05 rows=1 width=101)
         Filter: (age = 33)
(4 rows)
Executor:
The executor takes the output of the planner as input and transforms each plan node into a state tree node. Each state tree node is then executed to perform the corresponding operation.
Execution of the state tree starts at the root; to get its input, each node keeps descending to its children until a leaf node is reached, and the leaf node is executed to feed tuples to the node above it. Of two leaf nodes, the outer (left) node is evaluated first.
At this point the executor uses the storage module interface to retrieve the actual data.

The execution process can typically be divided into:
• Executor Start: prepares the plan for execution. Each plan node is processed recursively to generate the corresponding state tree node; memory is also initialized to hold the projection list, qualification expressions, and the slot for the resultant tuple.
• Executor Run: recursively processes each state tree node; each resultant tuple is sent to the front end using the registered destination function.
• Executor Finish: frees all the allocated resources.

A typical flow of execution is as follows: execution starts at a Merge Join node, which needs input to process, so it first descends to its left child and fetches one tuple using an index scan, then requests an input tuple from its right child. The right child is a Sort node, so it requests tuples from its own child, which performs a sequential scan. Once all tuples have been received and sorted at the Sort node, it passes the first tuple up to its parent.
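As a rough illustration of that flow, a query of the following shape could produce such a plan (the tables here are hypothetical, and the actual plan always depends on the data and planner settings):

-- Assume "friend" has an index on city_id and "city" has no useful index.
-- If the planner chooses a Merge Join, its outer input can be an Index Scan
-- on friend (already ordered by city_id) and its inner input a Sort over a
-- Seq Scan on city, which matches the flow described above.
EXPLAIN
SELECT f.firstname, c.name
  FROM friend f
  JOIN city c ON c.id = f.city_id;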

          Reference:
          Older papers from PostgreSQL.  

          Jason Petersen: Announcing pg_shard 1.1


          Last winter, we open-sourced pg_shard, a transparent sharding extension for PostgreSQL. It brought straightforward sharding capabilities to PostgreSQL, allowing tables and queries to be distributed across any number of servers.

          Today we’re excited to announce the next release of pg_shard. The changes in this release include:

• Improved performance: INSERT commands run up to four times faster
• Shard repair: easily bring inactive placements back up to speed
• Copy script: quickly import data from CSV and other files from the command line
• CitusDB integration: expose pg_shard's metadata for CitusDB's use
• Resource improvements: execute larger queries than ever before

          For more information about recent changes, you can view all the issues closed during this release cycle on GitHub.

          Upgrading or installing is a breeze: see pg_shard’s GitHub page for detailed instructions.

          Whether you want a distributed document store alongside your normal PostgreSQL tables or need the extra computational power afforded by a sharded cluster, pg_shard can help. We continue to grow pg_shard’s capabilities and are open to feature requests.

          Got questions?

          If you have any questions about pg_shard, please contact us using the pg_shard-users mailing list.

          If you discover an issue when using pg_shard, please submit it to our issue tracker on GitHub.

          Further information is available on our website, where you are free to contact us with any general questions you may have.

          Marco Slot: PGConf.Russia talk on pg_shard


Last month we went to PGConf.Russia and gave a talk on pg_shard; the recording is now available for all to see.

          We got some very interesting questions during the talk that we wanted to highlight and clarify.

          • Does pg_shard/CitusDB run my queries in parallel? In pg_shard, a query will use one thread on a master node and one on a worker node. You can run many queries in parallel by making multiple connections to the master node(s), whereas the real work is being done by the worker nodes. UPDATE and DELETE queries on the same shard are serialized to ensure consistency between the replicas. To parallelize multi-shard SELECT queries across the worker nodes, you can upgrade to CitusDB.
          • Can I use stored procedures in my queries? Yes, and this is a powerful feature of pg_shard. Function calls in queries are executed on the worker nodes, which allows you to include arbitrary logic in your queries and scale it out to many worker nodes.
          • How do I ALTER a distributed table?
            pg_shard currently does not automatically propagate ALTER TABLE commands on a distributed table to the individual shards on the workers, but you can easily do this with a simple shell script. For pg_shard: alter-pgshard-table.sh and for CitusDB: alter-citusdb-table.sh.
• What kind of lock is used when copying shard placements? In the latest version of pg_shard, we've added a master_copy_shard_placement function which takes an exclusive lock on the shard. This will temporarily block changes to the shard, while selects can still go through.
• What's the difference between pg_shard and PL/Proxy? PL/Proxy allows you to scale out stored procedures on a master node across many worker nodes, whereas pg_shard allows you to (transparently) scale out a table and queries on that table across many worker nodes using replication and sharding.
          • Can I use cstore_fdw and pg_shard without CitusDB? You certainly can! cstore_fdw and pg_shard can be used both in a regular PostgreSQL database or in combination with CitusDB.

          We would like to thank the organizers for a great conference and providing the recording of the talk!

          Heikki Linnakangas: pg_rewind in PostgreSQL 9.5


          Before PostgreSQL got streaming replication, back in version 9.0, people kept asking when we’re going to get replication. That was a common conversation-starter when standing at a conference booth. I don’t hear that anymore, but this dialogue still happens every now and then:

          - I have streaming replication set up, with a master and standby. How do I perform failover?
          - That’s easy, just kill the old master node, and run “pg_ctl promote” on the standby.
          - Cool. And how do I fail back to the old master?
          - Umm, well, you have to take a new base backup from the new master, and re-build the node from scratch..
          - Huh, what?!?

          pg_rewind is a better answer to that. One way to think of it is that it’s like rsync on steroids. Like rsync, it copies files that differ between the source and target. The trick is in how it determines which files have changed. Rsync compares timestamps, file sizes and checksums, but pg_rewind understands the PostgreSQL file formats, and reads the WAL to get that information instead.

I started hacking on pg_rewind about a year ago, while working for VMware. I got it working, but it was a bit of a pain to maintain. Michael Paquier helped to keep it up-to-date whenever upstream changes in PostgreSQL broke it. A big pain was that it had to scan the WAL and understand all the different WAL record types – miss even one and you might end up with a corrupt database. I made big changes to the way WAL-logging works in 9.5 to make that easier. All WAL record types now contain enough information to know what block they apply to, in a common format. That slashed the amount of code required in pg_rewind, and made it a lot easier to maintain.

          I have just committed pg_rewind into the PostgreSQL git repository, and it will be included in the upcoming 9.5 version. I always intended pg_rewind to be included in PostgreSQL itself; I started it as a standalone project to be able to develop it faster, outside the PostgreSQL release cycle, so I’m glad it finally made it into the main distribution now. Please give it a lot of testing!
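If you want to try it on a failed-over pair, the basic workflow looks roughly like this (the paths and connection string are placeholders, and the old master must have been running with wal_log_hints = on or with data checksums enabled):

# On the old master, once the standby has been promoted:
pg_ctl -D /path/to/old_master_data stop -m fast

pg_rewind --target-pgdata=/path/to/old_master_data \
          --source-server='host=new-master port=5432 user=postgres dbname=postgres'

# Then point recovery.conf at the new master and start the old master as a standby.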

          PS. I gave a presentation on pg_rewind in Nordic PGDay 2015. It was a great conference, and I think people enjoyed the presentation. Have a look at the slides for an overview on how pg_rewind works. Also take a look at the page in the user manual.

          gabrielle roth: Upgrading an existing RDS database to Postgres 9.4

          Last Thursday, I had this short and one-sided conversation with myself: “Oh, cool, Pg 9.4 is out for RDS. I’ll upgrade my new database before I have to put it into production next week, because who knows when else I’ll get a chance. Even though I can’t use pg_dumpall, this will take me what, 20 […]

          Josh Berkus: pgDay SF recap



          On March 10th, we had our third ever pgDay for SFPUG, which was a runaway success. pgDaySF 2015 was held together with FOSS4G-NA and EclipseCon; we were especially keen to join FOSS4G because of the large number of PostGIS users attending the event. In all, around 130 DBAs, developers and geo geeks joined us for pgDay SF ... so many that the conference had to reconfigure the room to add more seating!

(Photo: standing room only)

          The day started out with Daniel Caldwell showing how to use PostGIS for offline mobile data, including a phone demo.

(Photo: Daniel Caldwell setting up)

Ozgun Erdogan presented pg_shard with a short demo.

(Photo: Ozgun presents pg_shard with PostGIS)

          Gianni Ciolli flew all the way from London to talk about using Postgres' new Logical Decoding feature for database auditing.

(Photo: Gianni Ciolli presenting)

          Peak excitement of the day was Paul Ramsey's "PostGIS Feature Frenzy" presentation.

(Photo: Paul Ramsey making quotes)

          We also had presentations by Mark Wong and Bruce Momjian, and lightning talks by several presenters. Slides for some sessions are available on the FOSS4G web site. According to FOSS4G, videos will be available sometime soon.

          Of course, we couldn't have done it without our sponsors: Google, EnterpriseDB, 2ndQuadrant, CitusDB and pgExperts. So a big thank you to our sponsors, our speakers, and the staff of FOSS4G-NA for creating a great day.

          Rajeev Rastogi: Index Scan Optimization for ">" condition

In PostgreSQL 9.5, we can see improved performance for index scans on ">" conditions.

          In order to explain this optimization, consider the below schema:
          create table tbl2(id1 int, id2 varchar(10), id3 int);
          create index idx2 on tbl2(id2, id3);

And a query such as:

select count(*) from tbl2 where id2>'a' and id3>990000;

As per the design prior to this patch, the above query used the following steps to retrieve index tuples:

• Find the scan start position by searching the BTree for the first position that satisfies the first key condition, i.e. id2 > 'a'.
• Fetch each tuple starting from the position found in step 1.
• For each tuple, check every scan key condition; in our example both scan key conditions are checked.
• If the conditions match, the tuple is returned; otherwise the scan stops.


Now the problem is that the first scan key condition has already been matched in order to find the scan start position (step 1), so it is obvious that any further tuple will also match the first scan key condition (as the records are sorted).

So comparing the first scan key condition again in step 3 is redundant.

So the BTree scan algorithm was changed to avoid the redundant check, i.e. to skip the first key comparison for each tuple, since it is guaranteed to always be true.

Performance result summary: Simon Riggs confirmed an improvement of around 5% with both short and long index keys, on the least beneficial data type, and considered it a very positive win overall.

I would like to thank Simon Riggs for verifying and committing this patch.

          David Fetter: Formatting!

          SQL is code.

          This may seem like a simple idea, but out in the wild, you will find an awful lot of SQL programs which consist of a single line, which makes them challenging to debug.

          Getting it into a format where debugging was reasonably easy used to be tedious and time-consuming, but no more!

          Josh Berkus: Save the Date: pgConf Silicon Valley


          On November 18th, 2015, we will have an independent, multi-track conference all about high performance PostgreSQL: pgConf SV. This conference is being organized by CitusData at the South San Francisco Convention Center. Stay tuned for call for presentations, sponsorships, and more details soon.

          Michael Paquier: Postgres 9.5 feature highlight: Scale-out with Foreign Tables now part of Inheritance Trees


This week the following commit has landed in the PostgreSQL code tree, introducing a new feature that will be released in 9.5:

          commit: cb1ca4d800621dcae67ca6c799006de99fa4f0a5
          author: Tom Lane <tgl@sss.pgh.pa.us>
          date: Sun, 22 Mar 2015 13:53:11 -0400
          Allow foreign tables to participate in inheritance.
          
          Foreign tables can now be inheritance children, or parents.  Much of the
          system was already ready for this, but we had to fix a few things of
          course, mostly in the area of planner and executor handling of row locks.
          
          [...]
          
          Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
          Horiguchi, some additional hacking by me
          

          As mentioned in the commit message, foreign tables can now be part of an inheritance tree, be it as a parent or as a child.

Well, seeing this commit, one thing comes immediately to mind: in-core sharding. This feature opens such possibilities, with for example a parent table locally managing a partition made of foreign child tables located on a set of foreign servers.

PostgreSQL already offers a way to do partitioning by using CHECK constraints (not an intuitive system, but there may be improvements in this area in the near future). Combined with the feature just committed, here is a small example of how to do sharding without the need for any external plugin or tool, only postgres_fdw being needed to define the foreign tables.

Now let's take the example of 3 Postgres servers, running on the same machine for simplicity, using ports 5432, 5433 and 5434. The server on 5432 will hold a parent table with two child tables, both of them foreign tables located on the servers listening on 5433 and 5434. The test case is simple: a log table partitioned by year.

          First on the foreign servers, let's create the child tables. Here it is for the table on server 5433:

          =# CREATE TABLE log_entry_y2014(log_time timestamp,
                 entry text,
                 check (date(log_time) >= '2014-01-01' AND
                        date(log_time) < '2015-01-01'));
          CREATE TABLE
          

          And the second one on 5434:

          =# CREATE TABLE log_entry_y2015(log_time timestamp,
                 entry text,
                 check (date(log_time) >= '2015-01-01' AND
                        date(log_time) < '2016-01-01'));
          CREATE TABLE
          

          Now it is time to do the rest of the work on server 5432, by creating a parent table, and foreign tables that act as children, themselves linking to the relations on servers 5433 and 5434 already created. First here is some preparatory work to define the foreign servers.

          =# CREATE EXTENSION postgres_fdw;
          CREATE EXTENSION
          =# CREATE SERVER server_5433 FOREIGN DATA WRAPPER postgres_fdw
             OPTIONS (host 'localhost', port '5433', dbname 'postgres');
          CREATE SERVER
          =# CREATE SERVER server_5434 FOREIGN DATA WRAPPER postgres_fdw
             OPTIONS (host 'localhost', port '5434', dbname 'postgres');
          CREATE SERVER
          =# CREATE USER MAPPING FOR PUBLIC SERVER server_5433 OPTIONS (password '');
          CREATE USER MAPPING
          =# CREATE USER MAPPING FOR PUBLIC SERVER server_5434 OPTIONS (password '');
          CREATE USER MAPPING
          

          And now here are the local tables:

          =# CREATE TABLE log_entries(log_time timestamp, entry text);
          CREATE TABLE
          =# CREATE FOREIGN TABLE log_entry_y2014_f (log_time timestamp,
                                                     entry text)
             INHERITS (log_entries) SERVER server_5433 OPTIONS (table_name 'log_entry_y2014');
          CREATE FOREIGN TABLE
          =# CREATE FOREIGN TABLE log_entry_y2015_f (log_time timestamp,
                                                     entry text)
             INHERITS (log_entries) SERVER server_5434 OPTIONS (table_name 'log_entry_y2015');
          CREATE FOREIGN TABLE
          

          The tuple insertion from the parent table to the children can be achieved using for example a plpgsql function like this one with a trigger on the parent relation log_entries.

          =# CREATE FUNCTION log_entry_insert_trigger()
             RETURNS TRIGGER AS $$
             BEGIN
               IF date(NEW.log_time) >= '2014-01-01' AND date(NEW.log_time) < '2015-01-01' THEN
                 INSERT INTO log_entry_y2014_f VALUES (NEW.*);
               ELSIF date(NEW.log_time) >= '2015-01-01' AND date(NEW.log_time) < '2016-01-01' THEN
                 INSERT INTO log_entry_y2015_f VALUES (NEW.*);
               ELSE
                 RAISE EXCEPTION 'Timestamp out-of-range';
               END IF;
               RETURN NULL;
             END;
             $$ LANGUAGE plpgsql;
           CREATE FUNCTION
           =# CREATE TRIGGER log_entry_insert BEFORE INSERT ON log_entries
              FOR EACH ROW EXECUTE PROCEDURE log_entry_insert_trigger();
           CREATE TRIGGER
          

Once the environment is set up and in place, log entries can be inserted on the parent table, and will be automatically sharded across the foreign servers.

          =# INSERT INTO log_entries VALUES (now(), 'Log entry of 2015');
          INSERT 0 0
          =# INSERT INTO log_entries VALUES (now() - interval '1 year', 'Log entry of 2014');
          INSERT 0 0
          =# INSERT INTO log_entries VALUES (now(), 'Log entry of 2015-2');
          INSERT 0 0
          =# INSERT INTO log_entries VALUES (now() - interval '1 year', 'Log entry of 2014-2');
          INSERT 0 0
          

          The entries inserted are of course localized on their dedicated foreign tables:

          =# SELECT * FROM log_entry_y2014_f;
                    log_time          |        entry
          ----------------------------+---------------------
           2014-03-27 22:34:04.952531 | Log entry of 2014
           2014-03-27 22:34:28.06422  | Log entry of 2014-2
          (2 rows)
          =# SELECT * FROM log_entry_y2015_f;
                    log_time          |        entry
          ----------------------------+---------------------
           2015-03-27 22:31:19.042066 | Log entry of 2015
           2015-03-27 22:34:18.425944 | Log entry of 2015-2
          (2 rows)
          

Something useful to note as well is that EXPLAIN is now verbose enough to identify all the tables targeted by a DML query. For example in this case (not limited to foreign tables):

          =# EXPLAIN UPDATE log_entries SET log_time = log_time + interval '1 day';
                                                QUERY PLAN
          -----------------------------------------------------------------------------------
           Update on log_entries  (cost=0.00..296.05 rows=2341 width=46)
             Update on log_entries
             Foreign Update on log_entry_y2014_f
             Foreign Update on log_entry_y2015_f
             ->  Seq Scan on log_entries  (cost=0.00..0.00 rows=1 width=46)
             ->  Foreign Scan on log_entry_y2014_f  (cost=100.00..148.03 rows=1170 width=46)
             ->  Foreign Scan on log_entry_y2015_f  (cost=100.00..148.03 rows=1170 width=46)
          (7 rows)
          

          And this makes a day.

          Shaun M. Thomas: PG Phriday: High Availability Through Delayed Replication


High availability of PostgreSQL databases is incredibly important to me. You might even say it's a special interest of mine. It's one reason I'm both excited and saddened by a feature introduced in 9.4. I'm excited because it's a feature I plan to make extensive use of, and saddened because it has flown under the radar thus far. It's not even listed in the What's new in PostgreSQL 9.4 Wiki page. If they'll let me, I may have to rectify that.

          What is this mysterious change that has me drooling all over my keyboard? The new recovery_min_apply_delay standby server setting. In name and intent, it forces a standby server to delay application of upstream changes. The implications, however, are much, much more important.

          Let me tell you a story; it’s not long, actually. A couple years ago, I had to help a client that was using a hilariously over-engineered stack to prevent data loss. I only say that because at first glance, the number of layers and duplicate servers would shock most people, and the expense would finish the job. This was one of my full recommended stacks, plus a few extra bits for the truly paranoid. DRBD-bonded servers, Pacemaker failover, off-site disaster recovery streaming clones, nightly backup, off-site backup and historical WAL storage, and long-term tape archival in a vault for up to seven years. You would need to firebomb several cities to get rid of this data.

          But data permanence and availability are not synonymous. All it took was a single misbehaving CPU to take out the entire constellation of database servers, and corrupt a bunch of recent WAL files for good measure. How this is possible, and how difficult it is to avoid, is a natural extension of using live streaming replicas for availability purposes. We always need to consider one important factor: immediacy applies to everything.

          Here’s what actually happened:

          1. A CPU on master-1 went bad.
          2. Data being written to database files was corrupt.
          3. DRBD copied the bad blocks, immediately corrupting master-2.
          4. Shared memory became corrupt.
          5. Streaming replication copied the corrupt data to dr-master-1.
          6. DRBD copied the bad blocks, immediately corrupting dr-master-2.
          7. In turn, PostgreSQL noticed the corruption and crashed on each server.
          8. Monitoring systems started screaming on all channels.

          Just like that, a bulletproof high-availability cluster imploded into itself. All we had left at that point were the pristine backups, and the off-site WAL archives. This is one of the major reasons I wrote walctl, actually. Keeping archived WAL files on a tertiary server isolates them from issues that affect the primary or disaster recovery clusters. Further, it means the files can be pulled by any number of new clones without overloading the masters, which are intended to be used for OLTP.

          In this case, we pulled a backup from the off-site backup vault, gathered the WAL files that were generated before the CPU went bad, and got the cluster running again in a couple of hours. But this could have easily been much worse, and without the previously-mentioned expensive paranoia and surplus of redundancy levels, it would have. And despite the fact we recovered everything, there’s still the several-hour outage to address.

          You see, we weren’t paranoid enough. For a truly high-available architecture, corruption of the data source should always be considered a possibility. Both DRBD and PostgreSQL strive to copy data as quickly as possible, just as they should. Synchronization delay is another huge, but unrelated problem applications often need to circumvent when communicating with replicas. One way to solve this is to keep a third standby server that uses traditional WAL consumption, and then implement a time delay.

          Effectively, this means preventing the extra server from processing WAL files for some period of time. This interval allows a DBA to interrupt replay before corruption reaches a fully online replica. It takes time for monitoring systems to report outages, and for the DBA to log into a server and diagnose the problem. As we’ve seen, it can already be too late; the data is already corrupt, and a backup is the only recourse. But a delayed server is already online, can easily be recovered to a point right before the corruption started, and can drastically reduce the duration of an outage.
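On the delayed standby itself, interrupting and resuming replay while investigating is just a function call away. A quick sketch using the standard 9.4 administration functions:

-- Stop applying WAL on the time-delayed standby while diagnosing the incident.
SELECT pg_xlog_replay_pause();

-- Check the current state.
SELECT pg_is_xlog_replay_paused();

-- Resume replay once the primary is confirmed healthy.
SELECT pg_xlog_replay_resume();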

          There are several ways of imposing this delay, and all of them require at least one more series of scripts or software to strictly regulate file availability. They’re also largely irrelevant since the introduction of PostgreSQL 9.4 and the recovery_min_apply_delay setting. Instead of a cron job, or using a complicated script as the restore_command in recovery.conf, or some other method, we just set this variable and we get the desired offset. Here’s a two-hour window:

          recovery_min_apply_delay = '2h'
          

This works with both streaming replication and more traditional WAL file recovery. There is, however, one caveat to using this setting. Since the replica cannot apply the changes as they're presented, they are held in the pg_xlog directory until the imposed purgatory expires. On highly transactional systems, this can result in unexpected space usage on replicas that activate the setting. The larger the safety margin, the more files will accumulate awaiting replay.

          Barring that, it’s a huge win for anyone who wants to run a highly available cluster. In fact, it can even be integrated into cluster automation, so a delayed server is stopped if the primary system is down. This keeps our desired window intact while we investigate the problem, without us having to stop troubleshooting and shut down the time-offset replica ourselves.

          In addition, a delayed server can be used for standard recovery purposes. If a user erroneously deletes data, or a rogue process drops a critical object, there’s a replica ready and waiting to let us recover the data and reintroduce it to the master server.

          Having a server sitting around with self-inflicted synchronization offset seems ridiculous at first glance. But from the perspective of a DBA, it can literally save the database if used properly. I highly recommend anyone who can afford to implement this technique, does so. Your system uptime will thank you.

          Josh Berkus: Crazy SQL Saturday: replacing SciPy with SQL

          I have a data analytics project which produces multiple statistical metrics for a large volume of sensor data.  This includes percentiles (like median and 90%) as well as average, min and max.  Originally this worked using PL/R, which was pretty good except that some of the R modules were crashy, which was not so great for uptime.

          This is why, two years ago, I ripped out all of the PL/R and replaced it with PL/Python and SciPy.  I love SciPy because it gives me everything I liked about R, without most of the things I didn't like.  But now, I've ripped out the SciPy as well.  What am I replacing it with?  Well, SQL.

          In version 9.4, Andrew Gierth added support for percentiles to PostgreSQL via WITHIN GROUP aggregates. As far as I'm concerned, this is second only to JSONB in reasons to use 9.4.
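In its simplest form, the new ordered-set aggregate syntax looks like this (a sketch against a hypothetical sensor_readings table):

SELECT device_id,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY reading) AS median,
       percentile_cont(0.9) WITHIN GROUP (ORDER BY reading) AS pct_90
FROM sensor_readings
GROUP BY device_id;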

          Now, one of the more complicated uses I make of aggregates is doing "circular" aggregates, that is producing percentiles for a set of circular directions in an effort to determine the most common facings for certain devices.  Here's the PL/Python function I wrote for this, which calculates circular aggregates using the "largest gap" method.  This algorithm assumes that the heading measurements are essentially unordered, so to find the endpoints of the arc we look for two measurements which are the furthest apart on the circle.  This means shifting the measurements to an imaginary coordinate system where the edge of this gap is the low measurement, calculating percentiles, and then shifting it back.  Note that this method produces garbage if the device turned around a complete circle during the aggregate period.

          Now, that SciPy function was pretty good and we used it for quite a while.  But we were unhappy with two things: first, SciPy is rather painful as a dependency because the packaging for it is terrible; second, having PostgreSQL call out to SciPy for each iteration isn't all that efficient.

So, since 9.4 has percentiles now, I started writing a function based on the built-in SQL percentiles. Initially I was thinking it would be a PL/pgSQL function, but I was pleasantly surprised to find that I could write it entirely as a SQL function! Truly, Postgres's SQL dialect is Turing-complete.

          So here's the new all-SQL function, with some helper functions.

Then I performance-tested it, and was pleasantly surprised again. The SciPy version took 2.6 seconds* to aggregate 100,000 sets of 20 measurements. The new SQL version takes 40 milliseconds, cutting response time by 98%. Wow!

          And I've eliminated a hard-to-install dependency.  So it's all win.  Of course, if anyone has ideas on making it even faster, let me know.

          Pushing the limits of SQL to the edge of insanity.

          (* note: I expect that most of the extra time for the SciPy version is in calling out to Python through PL/Python, rather than in SciPy itself.)

          Umair Shahid: NoSQL Support in PostgreSQL


           

Developers have been really excited about the addition of JSON support starting with PostgreSQL v9.2. They feel they now have the flexibility to work with a schema-less, unstructured dataset while staying within a relational DBMS. So what's the buzz all about? Let's explore below…

          Why is NoSQL so attractive?

Rapid turnaround time … it is as simple as that. With the push to decrease time-to-market, developers are under constant pressure to turn POCs around very quickly. It is actually not just POCs; marketable products are increasingly getting the same treatment. The attitude is, "If I don't get it out, someone else will."

          Any decent sized application will need to store data somewhere. Rather than going through the pains of designing schemas and the debates on whether to normalize or not, developers just want to get to the next step. That’s how databases like MongoDB gained such tremendous popularity. They allow for schema-less, unstructured data to be inserted in document form and the developers find it easy to convert class objects within their code into that document directly.

There is a trade-off, however. Document (and key/value store) databases are very unfriendly to relations. While retrieving data, you will have a very hard time cross-referencing between different tables, making analytics nearly impossible. And, nightmare of nightmares for mission-critical applications, these databases are not ACID compliant.

          In walks PostgreSQL with JSON and HSTORE support.

          NoSQL in PostgreSQL

While the HSTORE contrib module has been providing a key/value data type in standard PostgreSQL table columns since v8.2, the introduction of native JSON support in v9.2 paves the way for the true power of NoSQL within PostgreSQL.

Starting with v9.3, not only do you have the ability to declare JSON data types in standard tables, you now also have functions to encode data to JSON format and to extract data elements from a JSON column. What's more, you can interchange data between JSON and HSTORE using simple and intuitive functions.
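To make that concrete, here is a small sketch using the v9.3 JSON operators (the table and keys are illustrative):

CREATE TABLE events (id serial PRIMARY KEY, payload json);

INSERT INTO events (payload)
VALUES ('{"user": "alice", "action": "login", "meta": {"ip": "10.0.0.1"}}');

-- ->> extracts a field as text, -> keeps it as json for further traversal.
SELECT payload->>'user'       AS username,
       payload->'meta'->>'ip' AS ip
  FROM events
 WHERE payload->>'action' = 'login';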

          … and this is all ACID compliant!

          The Power

Talk about bringing the best of both worlds together: the power that NoSQL capabilities bring to a traditional relational database is amazing. Developers now have the ability to kick-start their application development with unstructured data, without any database bottlenecks. At the stage where analytics are required, the data can be gradually structured to accommodate enterprise requirements within the same PostgreSQL database, without the need for expensive migrations.

          Have questions? Contact us NOW!

