Robins Tharakan: Explain-Plan-Nodes Grid

While reading up on the various PostgreSQL EXPLAIN plan nodes from multiple sources, I realized there is a clear lack of a consolidated grid / cheat-sheet from which I could, in one view, cross-check how the plan nodes perform on a comparative basis. While at it, I also tried to attribute their (good / bad) characteristics. Hope this (Work-In-Progress) lookup table helps others on the same

Spencer Christensen: Where is pg_ctl on CentOS 6 for Postgres 9.3?


When installing PostgreSQL 9.3 onto a CentOS 6 system, you may notice that some postgres commands appear to be missing (like pg_ctl, initdb, and pg_config). However, they actually are on your system but just not in your path. You should be able to find them in /usr/pgsql-9.3/bin/. This can be frustrating if you don't know that they are there.

To solve the problem, you could just use full paths to these commands, like /usr/pgsql-9.3/bin/initdb, but that may get ugly quick depending on how you are calling these commands. Instead, we can add them to the path.

You could just copy them to /usr/bin/ or create symlinks for them, but both of these methods are hack-ish and could have unintended consequences. Another option is to add /usr/pgsql-9.3/bin/ to your path manually, or to the path for all users by adding it to /etc/bashrc. But again, that seems hack-ish and when you upgrade postgres to 9.4 down the road things will break again.
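For completeness, the manual PATH tweak described above would look something like the following; the file appended to here is an assumption, and many distributions would prefer a drop-in under /etc/profile.d/ instead:

# Hack-ish, as noted above: prepend the 9.3 binary directory to every user's PATH
# (and remember this breaks again when 9.4 replaces 9.3)
echo 'export PATH=/usr/pgsql-9.3/bin:$PATH' >> /etc/bashrc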

So instead, let's look at how Postgres' other commands get installed when you install the rpm. When you run "yum install postgresql93" the rpm contains not only all the files for that package but it also includes some scripts to run (in this case one at install time and another at uninstall time). To view everything that is getting installed and to see the scripts that are run use this command: "rpm -qilv --scripts postgresql93". There will be a lot of output, but in there you will see:

...
postinstall scriptlet (using /bin/sh):
/usr/sbin/update-alternatives --install /usr/bin/psql pgsql-psql /usr/pgsql-9.3/bin/psql 930
/usr/sbin/update-alternatives --install /usr/bin/clusterdb  pgsql-clusterdb  /usr/pgsql-9.3/bin/clusterdb 930
/usr/sbin/update-alternatives --install /usr/bin/createdb   pgsql-createdb   /usr/pgsql-9.3/bin/createdb 930
...

That line "postinstall scriptlet (using /bin/sh):" marks the beginning of the list of commands that are run at install time. Ah ha! It runs update-alternatives! If you'r note familiar with alternatives, the short description is that it keeps track of different versions of things installed on your system and automatically manages symlinks to the version you want to run.

Now, not all the commands we're interested in are installed by the package "postgresql93"- you can search through the output and see that pg_config gets installed but is not set up in alternatives. The commands initdb and pg_ctl are part of the package "postgresql93-server". If we run the same command to view its files and scripts we'll see something interesting- it doesn't set up any of its commands using alternatives! Grrr. :-(

In the postgresql93-server package the preinstall and postinstall scripts only set up the postgres user on the system, set up /var/log/pgsql, add postgres to the init scripts, and set up the postgres user's .bash_profile. That's it. But, now that we know what commands are run for getting psql, clusterdb, and createdb into the path, we can manually run the same commands for the postgres commands that we need. Like this:

/usr/sbin/update-alternatives --install /usr/bin/initdb pgsql-initdb /usr/pgsql-9.3/bin/initdb 930
/usr/sbin/update-alternatives --install /usr/bin/pg_ctl pgsql-pg_ctl /usr/pgsql-9.3/bin/pg_ctl 930
/usr/sbin/update-alternatives --install /usr/bin/pg_config pgsql-pg_config /usr/pgsql-9.3/bin/pg_config 930

These commands should be available in your path now and are set up the same as all your other postgres commands. Now, the question about why these commands are not added to alternatives like all the others is a good one. I don't know. If you have an idea, please leave it in the comments. But at least now you have a decent work-around.

US PostgreSQL Association: I Got Into the PostgreSQL Community By Speaking…And You Should Too!


Seriously, you should. You should submit a talk proposal to PGConf US 2015– the worst thing that will happen is the talk committee will say “no” and offer a bunch of reasons to help you get your talk approved next year! Believe it or not, speaking at a PostgreSQL conference is a great way to help the community at large, and I hope this personal story I am going to share will shed some light as to why.

read more

Sergey Konoplev: Benchmarking PostgreSQL with Different Linux Kernel Versions on Ubuntu LTS

Baji Shaik: Finally, its.... Slony !! (switchover and failover)

A great tool for replication, used by many organizations for replicating between different versions of Postgres, able to do upgrades with minimal downtime (almost zero sometimes?), and a popular tool that I was weak at ... yeah, it's SLONY. It was always on my TODO list. So finally, I learned it and got some hands-on experience.

And I found an answer to my own question.. "when was the last time you did something NEW for the first time".

I see a lot of posts on installing and configuring Slony, but when I had to do a switchover and failover of Slony for a customer, I had a hard time with Google. OK, so I'd better have it written down somewhere, and why not here!!

The customer wanted to upgrade their database from PostgreSQL 8.4 to 9.2. They had around 397 tables and wanted one set for each table, hence 397 sets. Just for convenience, I'm using 10 tables/sets to explain.

Create tables using the script below in the source (8.4) and target (9.2) databases:
source=# select 'create table slony_tab_'||generate_series(1,10)||'(t int primary key);';
Insert values using the script below in the source database:
source=# select 'insert into slony_tab_'||a||' values (generate_series(1,100));' from generate_series(1,10) a;
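Note that these two SELECTs only generate the statements. One way to actually run them is to pipe psql's tuple-only output back into psql; the connection options below match the conninfo strings used later and are otherwise assumptions (the tables must exist on both nodes before subscribing):

psql -At -p 5434 -d source -c "select 'create table slony_tab_'||generate_series(1,10)||'(t int primary key);'" | psql -p 5434 -d source
psql -At -p 5434 -d source -c "select 'create table slony_tab_'||generate_series(1,10)||'(t int primary key);'" | psql -p 5432 -d target
psql -At -p 5434 -d source -c "select 'insert into slony_tab_'||a||' values (generate_series(1,100));' from generate_series(1,10) a" | psql -p 5434 -d source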

Configure Slony using the scripts below:

1. Init cluster script
#################################################################################################

cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
init cluster (id = 1 , comment = 'Primary Node For the Slave postgres');
#Setting Store Nodes ...
store node (id = 2, event node = 1 , comment = 'Slave Node For The Primary postgres');
#Setting Store Paths ...
echo  'Stored all nodes in the slony catalogs';
store path(server = 1 , client = 2, conninfo = 'host=127.0.0.1 dbname=source user=postgres port=5434');
store path(server = 2, client = 1 , conninfo = 'host=127.0.0.1 dbname=target user=postgres port=5432');
echo  'Stored all Store Paths for Failover and Switchover into slony catalogs ..';

#################################################################################################
2. Create Set Script
#################################################################################################

cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
try { create set (id = 1 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set1 for upgrade!'; exit 1;}
try { create set (id = 2 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set2 for upgrade!'; exit 1;}
try { create set (id = 3 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set3 for upgrade!'; exit 1;}
try { create set (id = 4 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set4 for upgrade!'; exit 1;}
try { create set (id = 5 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set5 for upgrade!'; exit 1;}
try { create set (id = 6 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set6 for upgrade!'; exit 1;}
try { create set (id = 7 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set7 for upgrade!'; exit 1;}
try { create set (id = 8 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set8 for upgrade!'; exit 1;}
try { create set (id = 9 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set9 for upgrade!'; exit 1;}
try { create set (id = 10 ,origin = 1 , comment = 'Set for public'); } on error { echo  'Could not create Subscription set10 for upgrade!'; exit 1;}
set add table (set id = 1 ,origin = 1 , id = 1, full qualified name = 'public.slony_tab_1', comment = 'Table slony_tab_1 with primary key');
set add table (set id = 2 ,origin = 1 , id = 2, full qualified name = 'public.slony_tab_2', comment = 'Table slony_tab_2 with primary key');
set add table (set id = 3 ,origin = 1 , id = 3, full qualified name = 'public.slony_tab_3', comment = 'Table slony_tab_3 with primary key');
set add table (set id = 4 ,origin = 1 , id = 4, full qualified name = 'public.slony_tab_4', comment = 'Table slony_tab_4 with primary key');
set add table (set id = 5 ,origin = 1 , id = 5, full qualified name = 'public.slony_tab_5', comment = 'Table slony_tab_5 with primary key');
set add table (set id = 6 ,origin = 1 , id = 6, full qualified name = 'public.slony_tab_6', comment = 'Table slony_tab_6 with primary key');
set add table (set id = 7 ,origin = 1 , id = 7, full qualified name = 'public.slony_tab_7', comment = 'Table slony_tab_7 with primary key');
set add table (set id = 8 ,origin = 1 , id = 8, full qualified name = 'public.slony_tab_8', comment = 'Table slony_tab_8 with primary key');
set add table (set id = 9 ,origin = 1 , id = 9, full qualified name = 'public.slony_tab_9', comment = 'Table slony_tab_9 with primary key');
set add table (set id = 10 ,origin = 1 , id = 10, full qualified name = 'public.slony_tab_10', comment = 'Table slony_tab_10 with primary key');

#################################################################################################

3. Starting Slon Processes
#################################################################################################
/opt/PostgreSQL/9.2/bin/slon -s 1000 -d2 shadow 'host=127.0.0.1 dbname=source user=postgres port=5434' > /tmp/node1.log 2>&1 &
/opt/PostgreSQL/8.4/bin/slon -s 1000 -d2 shadow 'host=127.0.0.1 dbname=target user=postgres port=5432' > /tmp/node2.log 2>&1 &

#################################################################################################

4. Subscribing the sets.
#################################################################################################
cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
try { subscribe set (id = 1, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 1';
try { subscribe set (id = 2, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 2';
try { subscribe set (id = 3, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 3';
try { subscribe set (id = 4, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 4';
try { subscribe set (id = 5, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 5';
try { subscribe set (id = 6, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 6';
try { subscribe set (id = 7, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 7';
try { subscribe set (id = 8, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 8';
try { subscribe set (id = 9, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 9';
try { subscribe set (id = 10, provider = 1 , receiver = 2, forward = yes, omit copy = false); } on error { exit 1; } echo  'Subscribed nodes to set 10';
#################################################################################################
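Once the subscriptions are active, a rough sanity check is to compare row counts on both nodes (ports and database names as configured above); the counts should match once the initial COPY has finished:

psql -At -p 5434 -d source -c "select count(*) from slony_tab_1;"
psql -At -p 5432 -d target -c "select count(*) from slony_tab_1;"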

Up to here it is all routine; anyone can do it easily, and of course you can find many sources for it. But switchover.. failover !!

If you try to insert values into tables of the "target" database, it throws the error below:
target=# insert into slony_tab_1 values (11);
ERROR:  Slony-I: Table slony_tab_1 is replicated and cannot be modified on a subscriber node - role=0

Ah, it's time to SwitchOver and FailOver...

Let us start with the Switch-Over script --
cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
lock set (id = 1, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 1, old origin = 1, new origin = 2); echo 'Set 1 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 2, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 2, old origin = 1, new origin = 2); echo 'Set 2 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 3, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 3, old origin = 1, new origin = 2); echo 'Set 3 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 4, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 4, old origin = 1, new origin = 2); echo 'Set 4 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 5, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 5, old origin = 1, new origin = 2); echo 'Set 5 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 6, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 6, old origin = 1, new origin = 2); echo 'Set 6 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 7, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 7, old origin = 1, new origin = 2); echo 'Set 7 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 8, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 8, old origin = 1, new origin = 2); echo 'Set 8 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 9, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 9, old origin = 1, new origin = 2); echo 'Set 9 Has Been Moved From Origin Node 1 To 2 ';
lock set (id = 10, origin = 1); sync (id = 1); wait for event (origin = 1, confirmed = 2, wait on = 2); move set (id = 10, old origin = 1, new origin = 2); echo 'Set 10 Has Been Moved From Origin Node 1 To 2 ';

Executing this script gives you this:
-bash-4.1$ /opt/PostgreSQL/9.2/bin/slonik /tmp/Switchover_script.slonik
/tmp/Switchover_script.slonik:4: Set 1 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:5: Set 2 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:6: Set 3 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:7: Set 4 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:8: waiting for event (1,5000000211) to be confirmed on node 2
/tmp/Switchover_script.slonik:8: Set 5 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:9: Set 6 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:10: Set 7 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:11: Set 8 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:12: waiting for event (1,5000000224) to be confirmed on node 2
/tmp/Switchover_script.slonik:12: Set 9 Has Been Moved From Origin Node 1 To 2
/tmp/Switchover_script.slonik:13: Set 10 Has Been Moved From Origin Node 1 To 2

-- Now try to insert values into tables of the "target" database.. it should allow you now.. :-)
target=# insert into slony_tab_1 values(21);
INSERT 0 1

Wow, it's done.. isn't it easy!!
And now go ahead with the failover.. it's all yours :P
Below is the script for FailOver...
cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
try {failover (id = 1, backup node = 2 );} on error { echo 'Failure Of The Failover For The Set 1  to 2 ';exit 1; }echo 'Failover Has been performed from 1 to 2';

Check if replication is still happening.. it should not be! A quick way to verify which node now owns each set is sketched below.
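As a rough check (the cluster name "shadow" gives the _shadow schema; ports as configured above), the Slony catalog should now show node 2 as the origin of every set:

psql -p 5432 -d target -c "select set_id, set_origin from _shadow.sl_set order by set_id;"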
So we are done with Switchover and Failover.
Hope this helps someone. Any comments or suggestions would be appreciated!


Leo Hsu and Regina Obe: FOSS4GNA 2015 and PGDay San Francisco March 2015

Magnus Hagander: PGConf.EU 2014 - feedback is in


It's that time of the year again - we've wrapped PGConf.EU 2014, and I've just closed the feedback system, so it's time to take a look at what's been said.

We're keeping fairly consistent numbers with previous years, which is something we are definitely happy with. We did have a slight drop in "overall view", since this year we had 8% ranking us as 3, a worse score than we saw last year, and we had a couple fewer people voting 5. And a slight shift from 5 to 4 on the programme. The numbers are still good of course, but since we had a tiny drop last year as well, we need to step our game back up for next year!

http://photos.smugmug.com/photos/i-hd7xTCK/0/O/i-hd7xTCK.png
http://photos.smugmug.com/photos/i-XgSPD3S/0/O/i-XgSPD3S.png

This year we had a slightly bigger spread of how users identify themselves, seeing most categories chip away a little on DBAs and Developers, but they are still definitely the dominating categories. We also have a lot of returning developers - it's cool to see so many people who have been to every one of our events so far, combined with a full 25% being first-time attendees!

http://photos.smugmug.com/photos/i-RgCv9Gs/0/O/i-RgCv9Gs.png
http://photos.smugmug.com/photos/i-MVQcnbt/0/O/i-MVQcnbt.png


Continue reading "PGConf.EU 2014 - feedback is in"

Baji Shaik: Ah, Does it mean a bad hardware or a kernel...Uh, Just want to avoid it.

I have seen many customers coming up with the errors below and asking for the root cause. They wonder about the reasons behind them and say, "Ah, it's because of bad hardware or a kernel.. I hate it, I just want to know how to avoid these".

Let's start with this:
ERROR: could not read block 4285 in file "base/xxxxx/xxxx": read only 0 of 8192 bytes

... have rarely been known to be caused by bugs in specific Linux kernel versions.  Such errors are more often caused by bad hardware, anti-virus software, improper backup/restore procedures, etc.

One very common cause for such corruption lately seems to be incorrect backup and restore. (For example, failure to exclude or delete all files from the pg_xlog directory can cause problems like this, or using filesystem "snapshots" which aren't really atomic.) The history of the database, including any recoveries from backup or promotion of replicas to primary, could indicate whether this is a possible cause. Faulty hardware is another fairly common cause, including SANs. If fsync or full_page_writes were ever turned off for the cluster, that could also explain it.

It is good to establish the cause where possible, so that future corruption can be avoided, but to recover, the cluster should normally be dumped with pg_dumpall and/or pg_dump, and restored to a freshly created (via initdb) cluster on a machine which is not suspected of causing corruption. It may be possible to fix up or drop and recreate individual damaged objects, but when doing that it can be hard to be sure that the last of the corruption (or the cause of the initial corruption) has been eliminated.
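As a minimal sketch of that recovery path (ports, paths and the PostgreSQL version are placeholders here):

# Dump the suspect cluster while it is still readable
pg_dumpall -p 5432 > /backups/full_cluster.sql
# Initialize a brand new cluster on hardware you trust
/usr/pgsql-9.3/bin/initdb -D /var/lib/pgsql/9.3/data_new
# Start the new cluster (say on port 5433), then restore into it
psql -p 5433 -f /backups/full_cluster.sql postgres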

Here is a nice article from Robert Haas on why-is-my-database-corrupted.

Errors like this:
ERROR: unexpected data beyond EOF in block xxxx of relation pg_tblspc/xxxx
HINT: This has been seen to occur with buggy kernels; consider updating your system.

... are most often caused by Linux kernel bugs. If you are seeing both types of errors, it is likely that a hardware problem (like bad RAM) is the cause of both, although other causes cannot be ruled out. It is recommended that you schedule a maintenance window and run thorough hardware checks. The latter error message has never been known to be caused by a bug in PostgreSQL itself or by improper backup/restore; it can only be caused by an OS bug or something which is interfering with the OS-level actions -- like hardware problems or AV software. The kernel bug can affect anything adding pages to a table or its indexes. It is a race condition in the kernel, so it will probably be infrequent and it will be hard to reproduce or to predict when it will be encountered. It can be caused by a fallocate() bug which is fixed in the release below:

6.5, also termed Update 5, 21 November 2013 (kernel 2.6.32-431): https://rhn.redhat.com/errata/RHSA-2013-1645.html

Given all the distributions of Linux and the different timings with which each has incorporated different bug fixes, it is not feasible to give a list of Linux versions that are known to work well. A more practical approach is to find out the exact version of Linux being used, and then do a web search for known bugs in that version; most often the main source for that is the list of bugs fixed in later versions. The bug which could cause this error was fixed several years ago in all major distributions, so any bug-fix version of Linux released in the last two years is unlikely to contain it, and simply applying the available bug fixes for your distribution should rule out OS problems unless this is a new OS bug which has not yet been run into. If you continue to see this error while running with the latest OS bug fixes, the most likely cause is bad hardware.
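In practice that means finding out exactly what you are running and checking it against your vendor's errata; for example (the sample output and the release file path are RHEL/CentOS-specific assumptions):

uname -r                 # exact kernel build, e.g. 2.6.32-431.el6.x86_64
cat /etc/redhat-release  # distribution release; other distributions use /etc/os-release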

I have googled around for "suggestions to avoid corruption" and found this article from Craig Ringer. Here are some suggestions made by community/core team members:

** Maintain rolling backups with proper ageing. For example, keep one a day for the last 7 days, then one a week for the last 4 weeks, then one a month for the rest of the year, then one a year.
** Use warm standby with log shipping and/or replication to maintain a live copy of the DB.
** If you want point-in-time recovery, keep a few days or weeks worth of WAL archives and a basebackup around. That'll help you recover from those "oops I meant DROP TABLE unimportant; not DROP TABLE vital_financial_records;" issues.
** Keep up to date with the latest PostgreSQL patch releases. Don't be one of those people still running 9.0.0 when 9.0.10 is out.
** Plug-pull test your system when you're testing it before going live. Put it under load with something like pgbench, then literally pull the plug out (a sample harness is sketched after this list). If your database doesn't come back up fine you have hardware, OS or configuration problems.
** Don't `kill -9` the postmaster. It should be fine, but it's still not smart.
** ABSOLUTELY NEVER DELETE postmaster.pid
** Use good quality hardware with proper cooling and a good quality power supply. If possible, ECC RAM is a nice extra.
** Never, ever, ever use cheap SSDs. Use good quality hard drives or (after proper testing) high end SSDs. Read the SSD reviews periodically posted on this mailing list if considering using SSDs. Make sure the SSD has a supercapacitor or other reliable option for flushing its write cache on power loss. Always do repeated plug-pull testing when using SSDs.
** Use a solid, reliable file system. zfs-on-linux, btrfs, etc are not the right choices for a database you care about. Never, ever, ever use FAT32.
** If on Windows, do not run an anti-virus program on your database server. Nobody should be using it for other things or running programs on it anyway.
** Avoid RAID 5, mostly because the performance is terrible, but also because I've seen corruption issues with rebuilds from parity on failing disks.
** Use a good quality hardware RAID controller with a battery backup cache unit if you're using spinning disks in RAID. This is as much for performance as reliability; a BBU will make an immense difference to database performance.
** If you're going to have a UPS (you shouldn't need one as your system should be crash-safe), don't waste your money on a cheap one. Get a good online double-conversion unit that does proper power filtering. Cheap UPSs are just a battery with a fast switch, they provide no power filtering and what little surge protection they offer is done with a component that wears out after absorbing a few surges, becoming totally ineffective. Since your system should be crash-safe a cheap UPS will do nothing for corruption protection, it'll only help with uptime.
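A minimal sketch of the plug-pull test mentioned in the checklist above (database name, scale, client counts and duration are arbitrary examples):

pgbench -i -s 100 testdb            # build a test dataset
pgbench -c 8 -j 4 -T 600 testdb &   # sustained read/write load
# ...pull the power cable mid-run, power the machine back on, then:
pg_ctl -D /var/lib/pgsql/9.3/data start
psql -d testdb -c "select count(*) from pgbench_accounts;"   # should come back consistent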


US PostgreSQL Association: An evening with PostgreSQL and Sponsoring LFNW


JD wrote:

A couple of weeks ago I spoke at the Bellingham Linux User Group with a talk entitled: An Evening with PostgreSQL. It was an enlightening talk for me because it was the first time, in a long time, that I have spoken to a non-PostgreSQL community. Most of the people in attendance were Linux users of course, but there were also a few Mongo as well as MySQL users. I was asked questions such as, "Why would I use PostgreSQL over MySQL?". To be honest, I didn't even realize that was still a question, but it opens up a huge advocacy opportunity.

read more

Josh Berkus: SFPUG's New Home Page



SFPUG has a new home page, here. This page will be used to post our meetups, as well as local events including the upcoming pgDay SF. It runs on Ghost and PostgreSQL 9.3.

Stay tuned!

Tomas Vondra: PostgreSQL performance with gcc, clang and icc


On Linux, a "compiler" is usually a synonym to gcc, but clang is gaining more and more adoption. Over the years, phoronix published severalarticlescomparing of performance of various clang and gcc versions, suggesting that while clang improves over time, gcc still wins in most benchmarks - except maybe "compilation time" where clang is a clear winner. But none of the benchmarks is really a database-style application, so the question is how much difference can you get by switching a compiler (or a compiler version). So I did a bunch of tests, with gcc versions 4.1-4.9, clang 3.1-3.5, and just for fun with icc 2013 and 2015. And here are the results.

I did two usual types of tests - pgbench, representing a transactional workload (lots of small queries), and a subset of TPC-DS benchmark, representing analytical workloads (a few queries chewing large amounts of data).

I'll present results from a machine with i5-2500k CPU, 8GB RAM and an SSD drive, running Gentoo with kernel 3.12.20. I did rudimentary PostgreSQL tuning, mostly by tweaking postgresql.conf like this:

shared_buffers = 1GB
work_mem = 128MB
maintenance_work_mem = 256MB
checkpoint_segments = 64
effective_io_concurrency = 32

I do have results from another machine, but in general they confirm the results presented here. PostgreSQL was compiled like this:

./configure --prefix=...
make install

i.e. nothing special (no custom tweaks, etc.). The rest of the system is compiled with gcc 4.7.
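For reference, switching compilers only means pointing configure at a different CC, along these lines (the compiler binary name and the install prefix are placeholders):

CC=clang-3.5 ./configure --prefix=/usr/local/pgsql-clang35
make -j4 && make install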

pgbench

I did pgbench with three dataset sizes - small (~150MB), medium (~25% RAM) and large (~200% RAM). For each scale I ran pgbench with 4 clients (which is the number of cores on the CPU) for 15 minutes, repeated 3x, and averaged the results. And all this in read-write and read-only mode.

The first observation is that once you start hitting the drives, compiler makes absolutely no measurable difference. That makes results from all the read-write tests (for all scales) uninteresting, as well as the read-only test on large dataset - for all these tests the I/O is the main bottleneck (and that's something the compiler can't really influence).

So we're left with just the read-only benchmark on small and medium datasets, where the results look like this:

compiler      tps (small scale=10)   tps (medium scale=140)
gcc 4.1.2     52932                  49837
gcc 4.2.4     53071                  50219
gcc 4.3.6     52147                  49396
gcc 4.4.7     52597                  49834
gcc 4.5.4     53537                  50143
gcc 4.6.4     53238                  49959
gcc 4.7.4     54383                  51033
gcc 4.8.3     54494                  51627
gcc 4.9.2     55084                  52515
clang 3.1     55160                  51748
clang 3.2     55848                  52197
clang 3.3     54946                  51906
clang 3.4     55297                  52306
clang 3.5     55800                  52458
icc 2013      52249                  49197
icc 2015      52064                  49064

Let's use the gcc 4.1.2 results as a baseline, and express the other results as a percentage of the baseline. So 100 means "same as gcc 4.1.2", 90 means "10% slower than gcc 4.1.2" and so on. On a chart it then looks like this (the higher the number, the better):

pgbench-comparison.png

Not really a dramatic difference:

  • gcc 4.9 and clang 3.5 are winners, with ~4-5% improvement over gcc 4.1.2
  • gcc improves over time, with the exception of 4.3/4.4, where the performance dropped below 4.1
  • clang is very fast right from 3.1, peaking at 3.2 (which is slightly better than 3.5)
  • surprisingly, icc gives the worst results here

TPC-DS

Now, the data warehouse benchmark. I've used a small dataset (1GB), so that it fits into memory - otherwise we'd hit the I/O bottlenecks and the compilers would make no difference. First, let's load the data - the script performs these operations:

  • COPY data into all the tables
  • create indexes
  • VACUUM FULL (not really necessary)
  • VACUUM FREEZE
  • ANALYZE

The results (in seconds) look like this:

tpcds-load.png

compiler      copy   indexes   vacuum full   vacuum freeze   analyze   total
gcc 4.1.2     110    131       168           5               8         422
gcc 4.2.4     105    128       162           5               8         408
gcc 4.3.6     103    127       160           4               7         401
gcc 4.4.7     102    127       160           4               7         400
gcc 4.5.4     101    126       160           4               6         397
gcc 4.6.4     103    128       162           5               8         406
gcc 4.7.4     100    122       156           3               6         387
gcc 4.8.3     101    122       155           3               6         387
gcc 4.9.2     102    118       150           3               8         381
clang 3.1     108    129       162           4               8         411
clang 3.2     104    125       160           4               6         399
clang 3.3     105    125       160           3               6         399
clang 3.4     106    126       161           3               8         404
clang 3.5     105    127       162           4               8         406
icc 2013      106    129       163           4               8         410
icc 2015      105    125       160           4               6         400

According to the totals, the difference between the slowest (gcc 4.1.2) and fastest (gcc 4.9.2) is ~10%. Again, gcc continuously improves, which is nice. Clang actually slightly slows down since 3.2, which is not so nice, and clang 3.5 is ~6.5% slower than gcc 4.9.2. And icc is somewhere in between, with a nice speedup between 2013 and 2015 versions.

But that was just loading the data, what about the actual queries? TPC-DS specifies 99 query templates. Some of those use features not yet available in PostgreSQL, leaving us with 61 PostgreSQL-compatible templates. Sadly 2 of those did not complete within 30 minutes on the 1GB dataset (clearly, room for improvement), so the actual benchmark consists of 59 queries.

Chart of total duration of three runs per query, using gcc 4.1.2 as a baseline (just like the pgbench, but this time lower numbers are better) looks like this:

tpcds-queries.png

Clearly, the differences are much more significant than in the pgbench results. Again, gcc continuously improves over time, with 4.9.2 being the winner here - the difference between 4.1.2 and 4.9.2 is astonishing ~15%. That's pretty significant improvement - good work, GCC developers!

Clang results fluctuate a lot - 3.1, 3.3 and 3.5 are quite good (not as good as gcc 4.9.2, though).

And icc is again somewhere in the middle - faster than gcc 4.1.2 but nowhere near as fast as gcc 4.9.2 or the "good" clang versions. And this time 2015 actually slowed down (contrary to the previous results).

Summary

If your workload is transactional (pgbench-like), the compiler does not matter that much - either you're hitting disks (and the compiler does not matter at all), or the differences are within 5% of gcc 4.1.2. But if a gain this small is significant enough for you to warrant switching compilers, you should probably consider getting slightly more powerful hardware instead (a CPU with more cores, faster RAM, better storage, ...).

Analytical workloads are a different case - gcc is a clear winner, and if you're using an ancient version (say, 4.3 or older), you can get ~10% speedup by switching to 4.7, or ~15% to 4.9. In any case, the newer the version, the better.

Josh Berkus: Aggregate, Aggregate, Aggregate!

Josh Berkus: pgDay SF 2015: Call for Speakers and Sponsors


Use Postgres? In the San Francisco Bay Area? Maybe you're planning to attend FOSS4G-NA or EclipseCon? Well, join us as well! pgDay SF 2015 will be on March 10th in Burlingame, CA. We are currently looking for speakers and sponsors for the event.

pgDaySF will be a one-day, one track event held alongside FOSS4G North America and EclipseCon, allowing for cross-pollination among geo geeks, Java programmers, and PostgreSQL fans. We are looking for both user-oriented and advanced database talks, with a slant towards PostGIS. Interested? Submit a talk now. Submissions for full talks close on December 9th.

If your company uses or supports PostgreSQL, or markets products to PostgreSQL and PostGIS users, then you may want to sponsor the pgDay as well. Currently we're looking for one Sponsor and up to five Supporters. This is especially good for companies looking to hire PostgreSQL DBAs.

pgDay SF 2015 is sponsored by Google and PostgreSQL Experts Inc.


Christophe Pettus: When LIMIT attacks


One common source of query problems in PostgreSQL is an unexpectedly bad query plan when a LIMIT clause is included in a query. The typical symptom is that PostgreSQL picks an index-based plan that actually takes much, much longer than if a different index, or no index at all, had been used.

Here’s an example. First, we create a simple table and an index on it:

xof=# CREATE TABLE sample (
xof(#   i INTEGER,
xof(#   f FLOAT
xof(# );
CREATE TABLE
xof=# CREATE INDEX ON sample(f);
CREATE INDEX

And fill it with some data:

xof=# INSERT INTO sample SELECT 0, random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# ANALYZE;
ANALYZE

Then, for about 5% of the table, we set i to 1:

UPDATE sample SET i=1 WHERE f < 0.05;
ANALYZE;

Now, let’s find all of the entires where i is 1, in descending order of f.

xof=# EXPLAIN ANALYZE SELECT * FROM sample WHERE i=1 ORDER BY f DESC;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=399309.76..401406.04 rows=838509 width=12) (actual time=1415.166..1511.202 rows=499607 loops=1)
   Sort Key: f
   Sort Method: quicksort  Memory: 35708kB
   ->  Seq Scan on sample  (cost=0.00..316811.10 rows=838509 width=12) (actual time=1101.836..1173.262 rows=499607 loops=1)
         Filter: (i = 1)
         Rows Removed by Filter: 9500393
 Total runtime: 1542.529 ms
(7 rows)

So, 1.5 seconds to do a sequential scan on the whole table. So, just getting the first 10 entries from that should be much faster, right?

xof=# EXPLAIN ANALYZE SELECT * FROM sample WHERE i=1 ORDER BY f DESC LIMIT 10;
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..277.33 rows=10 width=12) (actual time=12710.612..12710.685 rows=10 loops=1)
   ->  Index Scan Backward using sample_f_idx on sample  (cost=0.43..23218083.52 rows=838509 width=12) (actual time=12710.610..12710.682 rows=10 loops=1)
         Filter: (i = 1)
         Rows Removed by Filter: 9500393
 Total runtime: 12710.714 ms
(5 rows)

Oh. 12.7 seconds. What happened?

PostgreSQL doesn’t keep correlated statistics about columns; each column’s statistics are kept independently. Thus, PostgreSQL made an assumption about the distribution of values of i in the table: they were scattered more or less evenly throughout. Thus, walking the index backwards meant that, to get 10 “hits,” it would have to scan about 100 index entries… and the index scan would be a big win.

It was wrong, however, because all of the i=1 values were clustered right at the beginning. If we reverse the order of the scan, we can see that was a much more efficient plan:

xof=# EXPLAIN ANALYZE SELECT * FROM sample WHERE i=1 ORDER BY f  LIMIT 10;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..277.33 rows=10 width=12) (actual time=0.029..0.046 rows=10 loops=1)
   ->  Index Scan using sample_f_idx on sample  (cost=0.43..23218083.52 rows=838509 width=12) (actual time=0.027..0.044 rows=10 loops=1)
         Filter: (i = 1)
 Total runtime: 0.071 ms
(4 rows)

So, what do we do? There’s no way of telling PostgreSQL directly to pick one plan over the other. We could just turn off index scans for the query, but that could well have bad side effects.
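For what it's worth, disabling index scans for a single session would look something like this; it is shown only to illustrate why it's a blunt instrument, since the setting affects every query in that session, not just this one:

psql -c "set enable_indexscan = off; explain analyze select * from sample where i=1 order by f desc limit 10;"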

In this particular case, where a predicate (like the WHERE i=1) picks up a relatively small number of entries, we can use a Common Table Expression, or CTE. Here’s the example rewritten using a CTE:

xof=# EXPLAIN ANALYZE
xof-# WITH inner_query AS (
xof(#    SELECT * FROM sample WHERE i=1 
xof(# )
xof-# SELECT * FROM inner_query ORDER BY f  LIMIT 10;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=351701.16..351701.18 rows=10 width=12) (actual time=1371.946..1371.949 rows=10 loops=1)
   CTE inner_query
     ->  Seq Scan on sample  (cost=0.00..316811.10 rows=838509 width=12) (actual time=1168.034..1244.785 rows=499607 loops=1)
           Filter: (i = 1)
           Rows Removed by Filter: 9500393
   ->  Sort  (cost=34890.06..36986.33 rows=838509 width=12) (actual time=1371.944..1371.944 rows=10 loops=1)
         Sort Key: inner_query.f
         Sort Method: top-N heapsort  Memory: 25kB
         ->  CTE Scan on inner_query  (cost=0.00..16770.18 rows=838509 width=12) (actual time=1168.040..1325.496 rows=499607 loops=1)
 Total runtime: 1381.472 ms
(10 rows)

A CTE is an “optimization fence”: The planner is prohibited from pushing the ORDER BY or LIMIT down into the CTE. In this case, that means that it is also prohibited from picking the index scan, and we’re back to the sequential scan.

So, when you see a query come completely apart, and it has a LIMIT clause, check to see if PostgreSQL is guessing wrong about the distribution of data. If the total number of hits before the LIMIT are relatively small, you can often use a CTE to isolate that part, and only apply the LIMIT thereafter. (Of course, you might be better off just doing the LIMIT operation in your application!)

Josh Berkus: Good kernel, bad kernel

A month ago I got into an argument on IRC with Sergey about telling people to avoid kernel 3.2.  This turned out to be a very productive argument, because Sergey then went and did a battery of performance tests against various Linux kernels on Ubuntu. Go read it now, I'll wait.

My takeaways from this:

  • Kernel 3.2 is in fact lethally bad.
  • Kernel 3.13 is the best out of kernel 3.X so far.  I hope that this can be credited to the PostgreSQL team's work with the LFS/MM group.
  • No 3.X kernel yet has quite the throughput of 2.6.32, at least at moderate memory sizes and core counts.
  • However, kernel 3.13 has substantially lower write volumes at almost the same throughput.  This means that if you are write-bound on IO, 3.13 will improve your performance considerably.
  • If your database is mostly-reads and highly concurrent, consider enabling kernel.sched_autogroup_enabled (a quick sysctl sketch follows this list).
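Checking and flipping that setting is a single sysctl call (persist it via /etc/sysctl.conf or a sysctl.d drop-in if it helps your workload; the value shown follows the advice above):

sysctl kernel.sched_autogroup_enabled        # show the current value
sysctl -w kernel.sched_autogroup_enabled=1   # enable it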
Thanks a lot to Sergey for doing this testing, and thanks even more to the LFS/MM group for improving IO performance so much in 3.13.

Raghavendra Rao: How to replicate only INSERTs not DELETEs/UPDATEs on Slony Slave Node ?

First of all, we need to know why such a requirement is needed. IMO, it's absolutely a business necessity to maintain some kind of historical data on the target database (slave node). In particular, out of multiple slave nodes, one slave node may need to retain the very first form of the data as it was initially written into the database.

To accomplish this requirement, we need to come up with some kind of filter, like TRIGGERs/RULEs on the slave node, so that it avoids replaying DELETE and UPDATE statements. Since we are dealing with Slony-I, there is no such built-in mechanism to filter DMLs while replaying them on the slave node, even though it has gathered all events from the master node. (AFAIK MySQL, Oracle and SQL Server do support filters.)

To get this straight: the traditional Slony-I way maintains uniqueness of rows across all the nodes through its core requirement that tables must have primary keys. In such an architecture it is hard to exclude DELETE/UPDATE statements. Take the example of the primary key column "orderid" of an "orders" table: a first INSERT with value 100 is replicated in its original form to the filtered slave node. Later a DELETE for "orderid=100" removes the row on the master (but not on the filtered slave); now if any INSERT or UPDATE statement attempts to use "orderid=100" again, the slave node hits a duplicate key violation and it simply breaks the replication.
ERROR:  duplicate key value violates unique constraint "reptest_pkey"
DETAIL: Key (id)=(2) already exists.
CONTEXT: SQL statement "INSERT INTO "public"."reptest" ("id", "name") VALUES ($1, $2);"
.....
or
....
CONTEXT: SQL statement "UPDATE ONLY "public"."reptest" SET "id" = $1 WHERE "id" = $2;"
2014-11-17 23:18:53 PST ERROR remoteWorkerThread_1: SYNC aborted
Thus, implementing the rule is not an issue, yet one should be extremely cautious when it is in place. In reality, applying these filters on a Slony-I slave node is very fragile; the application/developer should always keep in mind that any duplicate row entry by INSERT or UPDATE could break the replication.

As DML filtering is not possible with Slony-I alone, we can make use of PostgreSQL's CREATE RULE ... ON DELETE/ON UPDATE DO INSTEAD NOTHING and apply that RULE to the table with ALTER TABLE ... ENABLE REPLICA RULE to void DELETE/UPDATE statements. Using this option takes a lot of discipline, so make sure your application and staff members really follow these rules.

To continue with the steps, you should already have a Slony setup; in case you need to set one up, you can refer to my past post here.

Steps on Slave Node (Master DB: postgres, Slave DB: demo, Port: 5432):

1. Stop slon daemons
2. Create ON DELETE and ON UPDATE DO INSTEAD NOTHING rule
demo=# CREATE RULE void_delete AS ON DELETE TO reptest DO INSTEAD NOTHING;
CREATE RULE
demo=# CREATE RULE void_update AS ON UPDATE TO reptest DO INSTEAD NOTHING;
CREATE RULE
3. Apply RULE on table
demo=# ALTER TABLE reptest ENABLE REPLICA RULE void_delete;
ALTER TABLE
demo=# ALTER TABLE reptest ENABLE REPLICA RULE void_update ;
ALTER TABLE
4. Start Slon daemons

Now, you can notice below that UPDATE/DELETE has no impact on Slave Node:
postgres=# delete from reptest where id =2;
DELETE 1
postgres=# update reptest set id=2 where id=1;
UPDATE 1

--On Master
postgres=# select * from reptest ;
id | name
----+------------
2 | A
(1 row)

--On Slave
demo=# select * from reptest ;
id | name
----+------------
1 | A
2 | C
(2 rows)
If an INSERT statement is executed with value 1, it will break the replication. Be warned...!!

Remember, there are other ways to fulfill this request, like dblink, or triggers such as a BEFORE DELETE trigger whose function returns NULL, but I believe the most efficient way when you are working with Slony replication is to use RULE/ENABLE REPLICA RULE.
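For illustration, the trigger-based alternative mentioned above might look like this on the slave (the function and trigger names are made up); the ENABLE REPLICA TRIGGER step is what makes it fire while Slony is applying events:

psql -p 5432 -d demo <<'SQL'
CREATE FUNCTION skip_dml() RETURNS trigger AS $$
BEGIN
    RETURN NULL;   -- cancels the DELETE on the subscriber
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER reptest_skip_delete
    BEFORE DELETE ON reptest
    FOR EACH ROW EXECUTE PROCEDURE skip_dml();

ALTER TABLE reptest ENABLE REPLICA TRIGGER reptest_skip_delete;
SQL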

By now you might have read many blogs on Logical Decoding and Replication Slots, the new features in PostgreSQL 9.4; I hope that in the future they might include the concept of filtering DMLs on the slave.

Thank you for visiting.

--Raghav

Michael Paquier: Postgres 9.5 feature highlight: pg_dump and external snapshots


A couple of days ago the following feature related to pg_dump has been committed and will be in Postgres 9.5:

commit: be1cc8f46f57a04e69d9e4dd268d34da885fe6eb
author: Simon Riggs <simon@2ndQuadrant.com>
date: Mon, 17 Nov 2014 22:15:07 +0000
Add pg_dump --snapshot option

Allows pg_dump to use a snapshot previously defined by a concurrent
session that has either used pg_export_snapshot() or obtained a
snapshot when creating a logical slot. When this option is used with
parallel pg_dump, the snapshot defined by this option is used and no
new snapshot is taken.

Simon Riggs and Michael Paquier

First, let's talk briefly about exported snapshots, a feature that has been introduced in PostgreSQL 9.2. With it, it is possible to export a snapshot from a first session with pg_export_snapshot, and by reusing this snapshot in transactions of other sessions all the transactions can share exactly the same state image of the database. When using this feature something like that needs to be done for the first session exporting the snapshot:

=# BEGIN;
BEGIN
=# SELECT pg_export_snapshot();
 pg_export_snapshot
--------------------
 000003F1-1
(1 row)

Then other sessions in parallel can use SET TRANSACTION SNAPSHOT to import back the snapshot and share the same database view as all the other transactions using this snapshot (be it the transaction exporting the snapshot or the other sessions that already imported it).

=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
=# SET TRANSACTION SNAPSHOT '000003F1-1';
SET
=# -- Do stuff
[...]
=# COMMIT;
COMMIT

Note that the transaction that exported the snapshot needs to remain active as long as the other sessions have not consumed it with SET TRANSACTION. This snapshot export and import dance is actually used by pg_dump since 9.3 for parallel dumps to make consistent the dump acquisition across the threads, whose number is defined by --jobs, doing the work.

Now, this commit adding the option --snapshot is simply what a transaction importing a snapshot does: the caller can export a snapshot within the transaction of a session and then re-use it with pg_dump to take an image of a given database consistent with the previous session's transaction. Well, doing only that is not that useful in itself. The fun actually begins once you know that there is a different situation where a caller can get back a snapshot name; this situation has existed since 9.4, because it is the moment a logical slot is created through a replication connection.

$ psql "replication=database dbname=ioltas"
[...]
=# CREATE_REPLICATION_SLOT foo3 LOGICAL test_decoding;
 slot_name | consistent_point | snapshot_name | output_plugin
-----------+------------------+---------------+---------------
 foo       | 0/16ED738        | 000003E9-1    | test_decoding
(1 row)

See "000003E9-1" in the field snapshot_name? That is the target. Note a couple of things as well at this point:

  • The creation of a physical slot does not return back a snapshot.
  • The creation of a logical slot using pg_create_logical_replication_slot with a normal connection (let's say non-replication) does not give back a snapshot name.
  • The snapshot is alive as long as the replication connection is kept. That is different from pg_export_snapshot called in the context of a non-replication connection, where the snapshot remains alive as long as the transaction that called it is not committed (or aborted).

This is where this feature takes all its sense: it is possible to get an image of the database at the moment the slot has been created, or, putting it in other words, before any changes in the replication slot have been consumed, something aimed to be extremely useful for replication solutions or cases like online migration/upgrade of databases, because it means that the dump can be used as a base image on which changes can be replayed without data loss. Then, the dump can simply be done like that:

pg_dump --snapshot 000003E9-1

When doing a parallel dump with a snapshot name, the snapshot specified is used for all the jobs and is not taken by the first worker, as would be the case when a snapshot name is not specified, or as pg_dump works in 9.3 and 9.4. Note as well that it is possible to use a newer version of pg_dump on older servers, so it is fine to take a dump with an exported snapshot with 9.5's pg_dump from a 9.4 instance of Postgres, meaning that the door to a live upgrade solution for a single database is not closed (combined with the fact that a client application consuming changes from a logical replication slot can behave as a synchronous standby).
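Putting it together, a parallel dump anchored on the slot's snapshot might be invoked like this (the output directory and job count are arbitrary; the snapshot name is the one returned at slot creation above):

pg_dump --snapshot=000003E9-1 -j 4 -Fd -f /backups/ioltas_base ioltas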

Robins Tharakan: Sqsh / FreeTDS with SQL2012 using the instance argument

This is corner-case advice for anyone looking for a solution as to why the sqsh / tsql / FreeTDS combination works perfectly against one SQL2012 instance but is unable to log in to a newly configured SQL2012 instance, the details for which just came in. Symptoms: Sqsh / Tsql / FreeTDS is perfectly configured; the setup logs in to another SQL Server perfectly well; all this when you are able to login