Channel: Planet PostgreSQL

Craig Ringer: Testing new PostgreSQL versions without messing up your existing install


People are often hesitant to test out a new PostgreSQL release because they’re concerned it’ll break their current working installation.

This is a perfectly valid concern, but it’s easily resolved with a few simple protective measures:

  • Build PostgreSQL from source as an unprivileged user
  • Install your PostgreSQL build within that user’s home directory
  • Run PostgreSQL as that user, not postgres
  • Run on a non-default port by setting the PGPORT env var

If you take these steps the install itself cannot interfere with your existing setup at all. Starting and running the new PostgreSQL can only interfere by using too many resources (shared memory, file descriptors, RAM, CPU, etc), and you can simply stop the new version if it causes any issues. The shared memory improvements in 9.3 make the shared memory issues largely go away, too.

If performance impact is a concern, set a ulimit to stop the new version from using too many resources and run it nice'd and ionice'd. I'm not going to get into that in this article; it's a whole separate topic.
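If you do want to go that route, here is a minimal sketch of the idea. The limits are purely illustrative assumptions, and the data directory path matches the one used later in this article:

ulimit -v 2097152      # cap this shell's address space at roughly 2 GB
nice -n 19 ionice -c 3 \
    pg_ctl -D $HOME/postgresql-test-data -l $HOME/postgresql-test-data.log start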

Let’s say you want to test a patch, either for your own use or when you’re doing review work for a commitfest. You’re testing on Linux, OS X, or BSD; Windows isn’t covered here. You’re comfortable with the command line but you might not have much or any prior development/compiling experience. You’re familiar with common PostgreSQL tools like psql, pg_dump and pg_restore.

Here’s how to get started:

1. Install git, gcc, and other tools.

You’ll need a compiler and some libraries to compile PostgreSQL. This is the only step that affects the software running on the rest of your system, and it only installs a few extra tools. It’s possible that it might update existing libraries, but only to the same versions an aptitude upgrade or yum update would, i.e. security and bug-fix releases.

Debian/Ubuntu

On Debian/Ubuntu, uncomment any deb-src lines in your sources.list then:

sudo aptitude update
sudo aptitude install git build-essential gdb linux-tools patch
sudo aptitude build-dep postgresql

Fedora / Red Hat

On Fedora / Red Hat:

sudo yum install git yum-utils gdb perf patch
sudo yum groupinstall "Development Tools"
sudo yum-builddep postgresql

The last step might fail with an error about missing sources; it seems to be common for Red Hat based distros not to configure any source repositories in yum, not even as disabled entries. If that's the case for you, download the rpm spec file from the PGDG repository for your distro and version. The Pg version doesn't matter much, since you're only using it to get dependencies. For example, I used this spec file from http://svn.pgrpms.org/browser/rpm/redhat/9.2/postgresql/F-19/postgresql-9.2.spec even though I'll be building git master (9.4). Once it's downloaded, tell yum-builddep to install the requirements:

sudo yum-builddep /path/to/postgresql-9.2.spec

OS X

Mac OS X users will need to install Xcode. To install the required libraries for PostgreSQL you should use a ports-like tool such as MacPorts or Homebrew.

Comments from Mac users who can provide more specific instructions would be welcomed. Many of the guides and tutorials on the ‘net appear to be outdated or assume you want to use the PostgreSQL sources provided by MacPorts, Homebrew, etc, rather than your own.

The BSDs

Use the ports system to ensure you have gcc and the other dependencies for PostgreSQL. If in doubt look at the portfile for PostgreSQL to see what you need.

Comments from BSD users who can provide more specific instructions would be welcomed.

Windows

Compiling PostgreSQL on Windows is totally different. I wrote about that separately, but it's not for the faint of heart. This guide does not apply to Windows; it will only confuse you.

Get the PostgreSQL sources

From this point on you should be working with your normal user account or a separate account created for the purpose. Do not use sudo or run as root.

I recommend grabbing PostgreSQL from git rather than getting a source tarball. It’ll save you time and hassle down the track.

Get the main PostgreSQL git repo first:

git clone git://git.postgresql.org/git/postgresql.git

This will take a while, but you’ll only need to do it once. Future updates can be done with a simple “git pull”.

Now check out the PostgreSQL release branch you want to work with. These follow a strict naming scheme, where REL9_1_STABLE is the 9.1.x branch, REL9_2_STABLE is the 9.2.x branch, etc. Individual releases are tags like REL9_2_1. The current development tip is the default branch, called master. If you’re testing a patch the email the patch came with generally includes information about what revision it applies on top of.

Sometimes you’ll want to test git master or the tip of a stable branch, say when you’re testing out a bug fix. In this case you can skip the next bit and move straight on to compiling and installing.

Get the patch you want to test

Changes to PostgreSQL are typically distributed as patches on the mailing list. Sometimes you’ll find that there’s a git branch published for the patch, but I’ll assume that there isn’t, or that you don’t want to learn git for the purpose. (If you do, start with the git book).

You’ll generally find the patch as an attachment to a mailing list post, or included in-line. Let’s pretend you want to apply this trivial patch, which should apply cleanly to most releases.

That patch is an attachment so save it somewhere, say $HOME/Downloads/parse_bool_with_len.patch. (If a patch is in-line in an email you need to copy and paste it into a text file instead).

Apply the patch to the PostgreSQL sources

Now I want to apply the patch to REL9_2_STABLE. To do that I cd into the postgresql git working tree from the previous step and run:

git checkout REL9_2_STABLE
# Update to the latest content of the branch
git pull
# and apply the patch
patch -p1 < ~/Downloads/parse_bool_with_len.patch

The patch should apply cleanly:

$ patch -p1 < ~/Downloads/parse_bool_with_len.patch
patching file src/backend/utils/adt/bool.c
$

If it reports an error, you may need to add -l (ignore whitespace), or if the patch was created with git format-patch, try using git am -3 to apply it. I won't go into resolving conflicts further here; that's a whole separate topic, so we'll just presume the patch applies cleanly like it should.
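For reference, the two fallbacks mentioned above look like this, using the example patch path from earlier:

patch -p1 -l < ~/Downloads/parse_bool_with_len.patch    # retry, ignoring whitespace differences
git am -3 ~/Downloads/parse_bool_with_len.patch         # for patches produced by git format-patch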

If patch reports:

Reversed (or previously applied) patch detected!  Assume -R? [n]

then it’s quite likely the patch has been applied to this release already.

Compile PostgreSQL

Your sources are patched, so you’re ready to compile. I’m going to assume you’re going to install to $HOME/postgresql-test . (In practice I tend to use $HOME/pg/92-parse_bool_with_len or similar, i.e. name the install dir after the patch).

You’re already cd‘d to the postgresql source dir, so:

./configure --prefix=$HOME/postgresql-test
make clean
make

For information about other options to configure see configure --help and the PostgreSQL documentation.
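If you're testing patches, it can also help to build with assertions and debug symbols enabled so crashes and backtraces are more informative. This is a suggestion rather than part of the recipe above, and it does slow the build down:

./configure --prefix=$HOME/postgresql-test --enable-debug --enable-cassert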

If you want to run the regression tests you can do so at this point.

PGPORT=5444 make check

Now install the PostgreSQL build to your home dir. Do not use sudo, it is not required:

make install

Congratulations, you compiled and installed PostgreSQL.

Starting PostgreSQL

You’ll usually want to create a new blank database cluster and start PostgreSQL on it. This is quite trivial:

# Choose an unused port on your machine
export PGPORT=5444
export PATH=$HOME/postgresql-test/bin:$PATH
initdb -D $HOME/postgresql-test-data
pg_ctl -D $HOME/postgresql-test-data -l $HOME/postgresql-test-data.log start

(See the initdb and pg_ctl documentation for details on command line options, etc. The -l flag tells pg_ctl to save the PostgreSQL logs to $HOME/postgresql-test-data.log.)

You’re done.

You can connect to the new server with psql. For other tools you’ll have to specify the port 5444. You’ll notice something odd, though: connecting without specifying a username or database fails with:

$ psql
psql: FATAL:  database "myusername" does not exist

That’s because by default initdb sets the superuser name to the user you run initdb as, instead of postgres. You can either run initdb with the -U flag to specify a different superuser name, or just explicitly connect to the default postgres database by name:

$ psql postgres
psql (9.4devel, server 9.4devel)
postgres=#

I recommend using a different superuser name from the one on your live system. It'll help prevent those embarrassing "oops, wrong server" mistakes.
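For example (the user name here is just an illustration), you could create the test cluster with a dedicated superuser and always connect as it:

initdb -D $HOME/postgresql-test-data -U pgtest
psql -p 5444 -U pgtest postgres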

Copying an existing PostgreSQL cluster

Sometimes – like when you’re testing a bugfix or minor point release – you want to copy an existing PostgreSQL database.

The easiest way to do that is pg_dump and pg_restore. This is just like any other dump and restore except that you specify the port of the new server to pg_restore when restoring the database.
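A minimal sketch of that, assuming the production server listens on 5432, the test build on 5444, and a database named mydb (all illustrative):

$HOME/postgresql-test/bin/createdb -p 5444 mydb
pg_dump -p 5432 -Fc mydb | $HOME/postgresql-test/bin/pg_restore -p 5444 -d mydb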

Alternately, you can use pg_basebackup with --xlog-method=stream to copy the on-disk format of the existing cluster while it's running, then use pg_ctl to start the new binaries against the copy. This won't affect the original data at all. It only works if you're running the same major release, e.g. your main DB is on 9.2.1 and you're testing REL9_2_STABLE. It's useful to do this when testing bug fixes.
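Sketched out, with the example ports and paths used in this article and assuming pg_basebackup is allowed to make a replication connection, that might look like:

pg_basebackup -p 5432 -D $HOME/postgresql-test-copy --xlog-method=stream
pg_ctl -D $HOME/postgresql-test-copy -o "-p 5444" -l $HOME/postgresql-test-copy.log start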

Making sure you don’t connect to the wrong server

You’ve installed a custom built and possibly patched PostgreSQL in your home directory. From here, you can mess with it all you like, safe in the knowledge you can’t break anything important so long as you make sure you connect to the correct database server.

Be careful to use the correct port and check what you’ve connected to or – better – make sure you create a different user on the development copy that doesn’t exist on production so you can’t get the two muddled.

I like to create a separate user account on my laptop that I log into with sudo -iu pgdev. This account has a .bash_profile with PGPORT set and a different shell prompt to indicate I'm in dev mode, a .psqlrc that sets a different psql prompt, etc.
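A hypothetical ~/.bash_profile for such a pgdev account might contain:

export PGPORT=5444
export PATH=$HOME/postgresql-test/bin:$PATH
export PS1='[pgdev] \u@\h:\w\$ '     # visibly different shell prompt

and ~/.psqlrc could set a distinct psql prompt, e.g. \set PROMPT1 '(DEV) %n@%/%R%# '.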

Where to go from here

Now that you’ve compiled, patched and installed your own copy of PostgreSQL you can get onto the bug testing, benchmarking, patch review, or whatever else bought you here in the first place.

Once you’re done with that, consider taking a look at some of the following materials:

Have fun. It’s not as hard as it looks.


Gabriele Bartolini: OSS4B 2013, innovate through Open Source


The traditional Monash University Prato Centre, historical venue of the first European PGDay, will host the first edition of another conference: Open Source Software for Business, aka OSS4B.

OSS4B 2013 will take place in Prato, Tuscany, Italy on September 19 and 20. Through the experience gained organising several PostgreSQL-related events, we have tried to bring this kind of conference to a higher level, by embracing all the open source technologies and services (not just Postgres) which are at the core of ICT enterprises.

Having said this, PostgreSQL will play a fundamental role in the event. Postgres major developer and contributor Simon Riggs will be presenting “Success is 99% Persistence”. I am looking forward to attending his overview of our fantastic open source project.

Another Postgres community member, Harald Armin Massa, will be the master of ceremonies and, being our favourite Lightning Talk Man, he could not miss the lightning talk session!

Also, a booth is available for the Italian PostgreSQL Community, but members of PostgreSQL Europe are also encouraged to join and help with the promotion.

Finally, I owe you an overview of the conference. This year we chose to focus on the synergy between agile methodologies (such as DevOps and Kanban) and Open Source Software. We believe indeed that the culture of continuous improvement (in Japanese, kaizen) reaches its maximum expression through open source technologies and community involvement.

Keynote speakers of this edition are Gene Kim (co-author of the best seller “The Phoenix Project”) and Dragos Dumitriu, the first IT manager to apply the Kanban methodology during his experience at Microsoft.

There will be a track dedicated to technologies, which will host talks about successful software such as Postgres (of course), Linux, MongoDB, Percona MySQL, Chef, Puppet, CFEngine, LibreOffice, OpenERP, etc.

For further information on the conference, its goals and registrations, please visit the OSS4B website. Early bird registrations end tomorrow.

Chris Travers: Encryption: MySQL vs PostgreSQL

First a note, all my tests involved a relatively simple table with a schema like this (column names did vary):

CREATE TABLE enctest (
   id int,
   id_text text,
   id_enc bytea
);

In MySQL varbinary(64) was used instead of bytea.

The id was formed from a sequence from 1 to 100000.  I had more trouble loading this in MySQL than in PostgreSQL. id_text was a text cast of id, and id_enc was the value of id_text encrypted using 128-bit AES encryption.  This was intended to mimic sales data consisting of short strings that would be decrypted and converted to numeric data before aggregation.

The goal was to see how fast the different implementations would decrypt all records and aggregate them as numeric data types. For PostgreSQL, pgcrypto was used. The tests were conducted under ANSI mode on MySQL, and the tables were InnoDB.
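For context, here is a hypothetical reconstruction of the PostgreSQL side of the setup. The table and column names follow the EXPLAIN output shown later in the post; the exact commands are assumptions, not the author's script:

psql efftest <<'SQL'
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE sumtest AS
SELECT g                                                      AS testval,
       g::text                                                AS testvaltext,
       encrypt(convert_to(g::text, 'UTF8'), 'secret', 'aes')  AS testvalenc,
       pgp_sym_encrypt(g::text, 'mysecretpasswd')             AS testvalsym
FROM generate_series(1, 100000) g;
SQL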

What I found was remarkably disturbing. While MySQL was blazingly fast, this speed came at the cost of basic error checking: rather than raising an error, decrypting with the wrong key would sometimes give wrong data back, even in traditional modes. This is because, per the documentation, warnings are only turned into errors on insert, not on select. In other words, MySQL is just as permissive in read operations with STRICT mode turned on as with it turned off.

mysql> select sum(cast(aes_decrypt(id_enc, sha2('secret', 512)) as decimal)) FROM enctest;
+----------------------------------------------------------------+
| sum(cast(aes_decrypt(id_enc, sha2('secret', 512)) as decimal)) |
+----------------------------------------------------------------+
|                                                     5000050000 |
+----------------------------------------------------------------+
1 row in set (0.33 sec)


That is fast. Very fast. My similar query in PostgreSQL took about 200 seconds, so approximately 600x as long, and was entirely CPU-bound the whole time.

efftest=# explain (analyse, verbose, costs, buffers) select sum(pgp_sym_decrypt(testvalsym, 'mysecretpasswd')::numeric) from sumtest;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7556.16..7556.17 rows=1 width=62) (actual time=217381.965..217381.966 rows=1 loops=1)
   Output: sum((pgp_sym_decrypt(testvalsym, 'mysecretpasswd'::text))::numeric)
   Buffers: shared read=5556 written=4948
   ->  Seq Scan on public.sumtest  (cost=0.00..6556.08 rows=100008 width=62) (actual time=0.015..1504.897 rows=100000 loops=1)
         Output: testval, testvaltext, testvalenc, testvalsym
         Buffers: shared read=5556 written=4948
 Total runtime: 217382.010 ms
(7 rows)


My first thought was that for there to be a 3-orders-of-magnitude difference between the two implementations, something must be seriously wrong on the PostgreSQL side.   This is a huge difference.  But then something occurred to me.  What if I use the wrong password?

On PostgreSQL:

efftest=# explain (analyse, verbose, costs, buffers)
select sum(pgp_sym_decrypt(testvalsym, 'mysecretpasswd2')::numeric) from sumtest;
ERROR:  Wrong key or corrupt data



On MySQL, it is a very different story:

mysql> select sum(cast(aes_decrypt(id_enc, sha2('secret2', 512)) as decimal)) FROM enctest;
+-----------------------------------------------------------------+
| sum(cast(aes_decrypt(id_enc, sha2('secret2', 512)) as decimal)) |
+-----------------------------------------------------------------+
|                                                            1456 |
+-----------------------------------------------------------------+
1 row in set, 6335 warnings (0.34 sec)


Hmmm, out of 100000 rows, only 6000 (6%) gave a warning, and we got a meaningless answer back.  Thanks, MySQL.  So I tried some others:

mysql> select sum(cast(aes_decrypt(id_enc, sha2('s', 512)) as decimal)) FROM enctest;
+-----------------------------------------------------------+
| sum(cast(aes_decrypt(id_enc, sha2('s', 512)) as decimal)) |
+-----------------------------------------------------------+
|                                                      1284 |
+-----------------------------------------------------------+
1 row in set, 6230 warnings (0.35 sec)

Again 6% warnings, meaningless answer returned.  Wow this is fun.....

Try as I might, I couldn't get MySQL to throw any errors, and I always got meaningless results back with the wrong key. A closer look revealed that MySQL was throwing warnings only when certain rare criteria were met, and was performing no validation to ensure the decrypted data matched what was originally encrypted. Further review showed that the cryptograms were much shorter on MySQL than on PostgreSQL, suggesting that PostgreSQL pads short strings so that the cryptography better protects the data. More on this later.

This suggested that the difference in the performance might well be related to extra sanity checks in PostgreSQL that MySQL omitted for speed-related purposes.  Armed with this knowledge, I tried the following:

efftest=# update sumtest set testvalsym = pgp_sym_encrypt(testvaltext, 'mysecretpasswd', 's2k-mode=0, s2k-digest-algo=md5');
UPDATE 100000


The query returned pretty fast.  However these settings are not really recommended for production environments.

I went ahead and tried again my data test queries and my performance queries and the results were two orders of magnitude faster:

efftest=# explain (analyse, verbose, costs, buffers)
select sum(pgp_sym_decrypt(testvalsym, 'mysecretpasswd2')::numeric) from sumtest;
ERROR:  Wrong key or corrupt data
efftest=# update sumtest set testvalsym = pgp_sym_encrypt(testvaltext, 'mysecretpasswd', 's2k-mode=0, s2k-digest-algo=md5');
UPDATE 100000
efftest=# explain (analyse, verbose, costs, buffers) select sum(pgp_sym_decrypt(testvalsym, 'mysecretpasswd2')::numeric) from sumtest;
ERROR:  Wrong key or corrupt data
efftest=# explain (analyse, verbose, costs, buffers)
select sum(pgp_sym_decrypt(testvalsym, 'mysecretpasswd')::numeric) from sumtest;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13111.00..13111.01 rows=1 width=71) (actual time=1996.574..1996.575 rows=1 loops=1)
   Output: sum((pgp_sym_decrypt(testvalsym, 'mysecretpasswd'::text))::numeric)
   Buffers: shared hit=778 read=10333
   ->  Seq Scan on public.sumtest  (cost=0.00..12111.00 rows=100000 width=71) (actual time=0.020..128.722 rows=100000 loops=1)
         Output: testval, testvaltext, testvalenc, testvalsym
         Buffers: shared hit=778 read=10333
 Total runtime: 1996.617 ms
(7 rows)


Much, much faster.  Of course that comes at the cost of security features.

The primary security features changed here are what are called string to key functions.  PostgreSQL also offers some relatively complex containers for short data which include things like padding and session keys.  MySQL does not provide string to key management, and requires that you generate the hexadecimal key yourself.  PostgreSQL provides a number of options for string to key generation which allow for salted hashes to be used for the actual encryption.

One of the most obvious implications here is that with MySQL, you have to generate your salted hash yourself, while with PostgreSQL, it may generate a different salted hash for each line.   This is very important for encryption particularly with smaller strings because this helps thwart rainbow tables.  In essence with salted keys, there is no 1:1 relationship between the passphrase/data combination and the cryptogram, because there is no 1:1 relationship between the passphrase and the key.   Further testing suggests that this is not responsible for the performance difference but it does suggest there are more checks lurking beneath the surface which are omitted from MySQL.
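You can see the salting behaviour described above directly: encrypting the same value twice with the same passphrase yields different cryptograms, so a precomputed table of cryptograms is useless. A quick illustrative check, not from the original post:

psql efftest -c "SELECT pgp_sym_encrypt('42','mysecretpasswd') = pgp_sym_encrypt('42','mysecretpasswd') AS identical;"
# returns f: the two cryptograms differ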

So given that the issue is not string to key management, the issue must be padding. For very short strings, PostgreSQL manages padding and containers, while MySQL encrypts short strings with no more than minimal padding. With so little padding, the decryption routines are much faster, but this comes at the cost of any reasonable security. Additionally, PostgreSQL provides data checks that are not done in MySQL.

So what does this tell us? I think the primary lesson, which I have had driven home a few times, is that database-level encryption is tricky. This is particularly true when other considerations are involved, like the performance of aggregating data over significant sets. Add to this the woes of in-db key management and the like, and in-db encryption is definitely expert territory. In this regard, MySQL's approach seems to require a lot more complexity to maintain security than PostgreSQL's.

It is important to remember that short encrypted strings are relatively common in databases which use encryption. One of the most common uses is for things like credit card numbers. For the reasons mentioned here I would suggest that PostgreSQL is much more trustworthy in these cases.

Hubert 'depesz' Lubaczewski: Pick a task to work on

There are cases where a system stores a list of things to do, and then there are some worker processes that check the list, pick something to work on, do it, and remove it from the list. The proper solution is to use some kind of queuing system. There is even PgQ, which works within PostgreSQL, but some people […]

Vasilis Ventirozos: PostgreSQL proper installation

Recently at work, I got assigned to upgrade some (about 10)
Postgres 8.1.x installations to 9. I have always liked compiling, basically because I like the flexibility that compiling offers, and that's what I proposed to the guys in charge of the project. They gave me a test system (VM) to play with; in all fairness they were a bit skeptical about the idea of compiling the RDBMS, mostly for reliability reasons (don't ask me why). I explained that upgrading from source would be much easier later on, and that in the last year PostgreSQL developers have done so much work that it cannot be ignored (pg_basebackup for example). Since the latest CentOS package was 9.1.x they agreed and I started working.
PostgreSQL is so easy to compile: no strange dependencies, not many libraries needed. And about those reliability concerns? Because the developers are so careful, I quote from
http://www.postgresql.org/support/versioning

"While upgrades always have some risk, PostgreSQL minor releases fix only frequently-encountered security and data corruption bugs to reduce the risk of upgrading. The community considers not upgrading to be riskier than upgrading."
 


You should never find an unexpected change that breaks an application in a minor PostgreSQL upgrade. Bug, security, and corruption fixes are always done in a way that minimizes the odds of introducing an externally visible behavior change, and if that's not possible, the reason why and the suggested workarounds will be detailed in the release notes. What you will find is that some subtle problems, resulting from resolved bugs, can clear up even after a minor version update. It's not uncommon to discover a report of a problem to one of the PostgreSQL mailing lists is resolved in the latest minor version update compatible with that installation, and upgrading to that version is all that's needed to make the issue go away.

So don't be afraid to upgrade. DO IT.

If you want to compile Postgres you will need (apart from the basic development tools) the following 2 libraries:

zlib1g-dev (compression library needed if you want to compress directly from postgres, pg_dump -Z for example)

libreadline6-dev (used by the client to support history on psql)

You could call both of these libraries optional, and you actually can compile Postgres without them, but meh, don't do that...

Other than that it's pretty much straightforward:
untar
./configure
make
make install (check the documentation for extra flags, change of prefix, etc.)
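Spelled out, with an example version number and the default prefix (both just illustrative):

wget https://ftp.postgresql.org/pub/source/v9.2.4/postgresql-9.2.4.tar.gz
tar xzf postgresql-9.2.4.tar.gz
cd postgresql-9.2.4
./configure          # default prefix is /usr/local/pgsql
make
sudo make install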

Postgres will install the binaries by default in /usr/local/pgsql/.
Packages, on the other hand, will install into /usr/bin (MEGA SUPER LAME).

SO! Let's see the current situation. I inherited a PostgreSQL server with a package installation: the binaries located in /usr/bin and the postgres home in /var/lib.
If it had been compiled I could go to /usr/local/pgsql, rename it to /usr/local/pgsql8, remove it from the path and let it be. But I couldn't do that, so I fixed the path: I put /usr/local/pgsql before /usr/bin so the first Postgres binaries in path order were the 9 ones, and I was OK. But still! Packages tend to integrate deep into the OS, making any other option (like running 2 Postgres servers at the same time) more difficult than it should be.

I like adding some of the contrib packages into Postgres for logging, monitoring and tuning reasons; pg_buffercache, pgbench, pg_stat_statements and pg_freespacemap are some examples.
In a compiled environment there is nothing easier than that: nothing extra to download, not much to do, just compile the contrib module and add it to the database with CREATE EXTENSION, as sketched below.
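For instance, assuming the source tree and default prefix from the build above, and a database called mydb:

cd postgresql-9.2.4/contrib/pg_buffercache
make && sudo make install
psql -d mydb -c "CREATE EXTENSION pg_buffercache;"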
Now let's say that after a while a new release of Postgres comes out and I want to upgrade to it, just because I want to stay current, or because a new wow feature was added or a bug was resolved. All I have to do is compile the new version (assuming it's a minor release), replace the binaries and restart the server. A package-installed Postgres server would have to wait until the distribution released a package of the new version, usually a few versions behind current. Which means what, a month after the release date? Maybe more!?

Basically the only thing that I can imagine being easier with a package installation is the init scripts, and OK, I don't think that is a huge pro compared to a compiled version.

SO my suggestion is: compile the damn thing. COMPILE!


Thanks for reading
-Vasilis

Vasilis Ventirozos: PostgreSQL, Binary replication in practice

A couple of days ago I started writing a short howto about streaming replication in PostgreSQL 9.2. Most of these things are well documented, but in this howto I will also try to experiment with switchovers and switchbacks. It aims to show how easy it is to set up right out of the box.

Streaming replication PostgreSQL 9.2

For my example I will use 2 Debian VMs: 192.168.0.100 (pglab1) and 192.168.0.101 (pglab2).


- not mandatory -

Exchange ssh keys for passwordless ssh; this might be used if we need to scp scripts, rsync or whatever.

if you don't know how to do it, follow these steps :
http://www.debian-administration.org/articles/152

- ON MASTER (PGLAB1)  -
After you create a Postgres cluster using initdb,
edit the master's postgresql.conf and change the following:
listen_addresses = '*'
wal_level = hot_standby #(could be archive too)
max_wal_senders = 5
hot_standby = on


create a replication user :
create user repuser replication password 'passwd';

edit pg_hba.conf and add :


host all all 192.168.0.0/0 trust
host replication repuser 192.168.0.0/0 md5


- ON SLAVE -

Now ssh to slave and from $PGDATA run :

pg_basebackup -D /opt/data/ -v -h 192.168.0.100 -U repuser
Enter the password; this will transfer a full copy of your cluster from the master. Check the documentation of pg_basebackup for compression and other options available.

In $PGDATA create a file called recovery.conf containing:
standby_mode = on
 primary_conninfo = 'host=192.168.0.100 port=5432 user=repuser password=passwd'


With the master up, start the slave; it should say:
LOG: database system is ready to accept read only connections


At this point you will have 2 nodes running, with your master accepting read/write operations and your slave accepting only read-only operations (maybe reporting goes there?). Now, let's say the master crashes and you need to fail over and promote the slave as the new master.
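A couple of quick sanity checks at this point (a suggestion, not part of the original steps):

psql -h 192.168.0.100 -d postgres -c "select client_addr, state, sent_location, replay_location from pg_stat_replication;"   # on the master: the standby should show up here
psql -h 192.168.0.101 -d postgres -c "select pg_is_in_recovery();"                                                           # on the slave: should return t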

-FAILOVER-

shutdown master 
on slave, execute:
pg_ctl promote


That's it, your slave (pglab2) is now the master, accepting all kinds of connections.

Now let's say that the ex-master (pglab1) is fixed and is ready to come up again.

- Switch back to original Master -


on the current master (pglab2) :

echo "select pg_start_backup('clone',true);" |psql pgbench
rsync -av --exclude postmaster.pid /opt/data/* 192.168.0.100:/opt/data/
echo "select pg_stop_backup();"|psql pgbench

This will sync all data from my current master (pglab2) to my current, to-be slave (pglab1), which should currently be down.

Edit recovery.done and fix the IP so it points to the current master,
rename recovery.done to recovery.conf,
start the ex-master (pglab1), now as a slave, then promote it with the slave (pglab2) down,
recreate the slave (pglab2) with rsync, edit its recovery.conf and start it again;
the servers now have their original roles.


Note that you can have 2 master databases, but this might (and probably will) create a mess, so be sure that you bring down the correct server before promoting.

If you want to check whether a server is acting as master or slave, run:

select pg_is_in_recovery()
If it's true, you're on a slave.


Thanks for reading.

Vasilis Ventirozos: PostgreSQL Partitioning

Partitions are very useful when it comes to big tables; the documentation suggests applying table partitioning when a table is bigger than 10 GB. In Postgres there are 2 kinds of partitions:
  • Range
  • List
The implementation just needs the following steps:
  1. Enable constraint exclusion in config file
  2. Create a master table
  3. Create child tables WITHOUT overlapping table constraints
  4. Create indexes , pk's
  5. Create function and trigger to insert data to child tables
The first thing to notice here is that partitioning is basically built on table inheritance. But enough with how things work in theory, let's create one.

First, check postgresql.conf for this parameter:

constraint_exclusion = partition 

Now, let's create the master and child tables:

CREATE TABLE sales (
    sales_id serial NOT NULL,
    sales_date DATE NOT NULL DEFAULT CURRENT_DATE,
    description text
);

CREATE TABLE sales_2013_p1 (
CHECK ( sales_date >= DATE '2013-01-01' AND sales_date < DATE '2013-05-01' )
 ) INHERITS (sales);

 CREATE TABLE sales_2013_p2 (
CHECK ( sales_date >= DATE '2013-05-01' AND sales_date < DATE '2013-09-01' )
 ) INHERITS (sales);

 CREATE TABLE sales_2013_p3 (
CHECK ( sales_date >= DATE '2013-09-01' AND sales_date < DATE '2014-01-01' )
 ) INHERITS (sales);


Notice the keyword INHERITS here :)
Next, PKs and indexes on the child tables:

ALTER TABLE sales_2013_p1 ADD CONSTRAINT sales_2013_p1_pkey PRIMARY KEY (sales_id, sales_date);
ALTER TABLE sales_2013_p2 ADD CONSTRAINT sales_2013_p2_pkey PRIMARY KEY (sales_id, sales_date);
ALTER TABLE sales_2013_p3 ADD CONSTRAINT sales_2013_p3_pkey PRIMARY KEY (sales_id, sales_date);


CREATE INDEX idx_2013_p1 ON sales_2013_p1 (sales_date);
CREATE INDEX idx_2013_p2 ON sales_2013_p2 (sales_date);
CREATE INDEX idx_2013_p3 ON sales_2013_p3 (sales_date);


And finally a function that returns trigger, and the on-insert trigger itself:

CREATE OR REPLACE FUNCTION sales_trig_func()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.sales_date >= DATE '2013-01-01' AND NEW.sales_date < DATE '2013-05-01' ) THEN
        INSERT INTO sales_2013_p1 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-05-01' AND NEW.sales_date < DATE '2013-09-01' ) THEN
        INSERT INTO sales_2013_p2 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-09-01' AND NEW.sales_date < DATE '2014-01-01' ) THEN
        INSERT INTO sales_2013_p3 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range.!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER insert_on_sales
    BEFORE INSERT ON sales
    FOR EACH ROW EXECUTE PROCEDURE sales_trig_func();
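With the trigger in place, a quick sanity check (assuming a database named "partition", as in the \d+ output further down; this check is an illustration, not part of the original walkthrough):

psql partition -c "INSERT INTO sales (sales_date, description) VALUES ('2013-06-15', 'test row');"
psql partition -c "SELECT count(*) FROM ONLY sales;"                              # returns 0: rows live in the child tables
psql partition -c "EXPLAIN SELECT * FROM sales WHERE sales_date = '2013-06-15';"  # with constraint_exclusion, only sales_2013_p2 is scanned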


Now that we have a table with a basic partitioning schema, let's assume that we want to add more partitions for 2014. We need to create a new child table; for example's sake I will just create a single partition for the whole of 2014.


CREATE TABLE sales_2014 (
CHECK ( sales_date >= DATE '2014-01-01' AND sales_date < DATE '2015-01-01' )
 ) INHERITS (sales);


ALTER TABLE sales_2014 ADD CONSTRAINT sales_2014_pkey PRIMARY KEY (sales_id, sales_date);

CREATE INDEX idx_2014 ON sales_2014 (sales_date);

CREATE OR REPLACE FUNCTION sales_trig_func()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.sales_date >= DATE '2013-01-01' AND NEW.sales_date < DATE '2013-05-01' ) THEN
        INSERT INTO sales_2013_p1 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-05-01' AND NEW.sales_date < DATE '2013-09-01' ) THEN
        INSERT INTO sales_2013_p2 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-09-01' AND NEW.sales_date < DATE '2014-01-01' ) THEN
        INSERT INTO sales_2013_p3 VALUES (NEW.*);


    ELSIF ( NEW.sales_date >= DATE '2014-01-01' AND NEW.sales_date < DATE '2015-01-01' ) THEN
        INSERT INTO sales_2014 VALUES (NEW.*);

   ELSE
        RAISE EXCEPTION 'Date out of range.!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;


and we are done!



Now let's say that the 2013_p1 data is obsolete and we want to move it to a historical database: dump the partition, drop the table, then correct and replace the trigger function, and you are done (sketched below).
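A hedged sketch of that archive-and-drop step, using the names from the example above:

pg_dump -t sales_2013_p1 partition > sales_2013_p1.sql    # keep a copy for the historical database
psql partition -c "DROP TABLE sales_2013_p1;"             # removes the partition from the inheritance tree
# then recreate sales_trig_func() without the 2013_p1 branch, as shown earlier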
 This is how the master table would look after these operations :

partition=# \d+ sales
                                              Table "public.sales"
   Column    |  Type   |                         Modifiers                         | Storage  | Stats target | Description
-------------+---------+-----------------------------------------------------------+----------+--------------+-------------
 sales_id    | integer | not null default nextval('sales_sales_id_seq'::regclass) | plain    |              |
 sales_date  | date    | not null default ('now'::text)::date                      | plain    |              |
 description | text    |                                                           | extended |              |
Triggers:
    insert_on_sales BEFORE INSERT ON sales FOR EACH ROW EXECUTE PROCEDURE sales_trig_func()
Child tables: sales_2013_p2,
              sales_2013_p3,
              sales_2014
Has OIDs: no



The good thing about this approach is that partitions are generally easy to maintain and administer: child tables can have different indexes from each other, and you can of course delete large portions of data that are no longer needed just by dropping a partition. Performance is VERY good on insert and select, and maintenance work like reindexing is faster; a reindex in particular won't lock the whole master table for writes.



Thanks for reading
-Vasilis

Vasilis Ventirozos: Migrating Oracle to PostgreSQL

I have an Oracle dump from a database that I want to migrate to Postgres for a project. I set up a 100 GB VM, installed Oracle 11gR2, and compiled Postgres 9.2.3 on the same VM. I restored the dump into Oracle, and then had to come up with a migration strategy.
I had some ideas, like getting the schema from the dump with expdp (sqlfile=) and manually translating it to Postgres, but then I remembered a project I saw some time ago called ora2pg. This cool thing is basically a Perl program that needs, apart from Perl (duh), DBI and DBD for Oracle and Postgres. I set it up and started experimenting, first on the DDLs. The output was actually very good! The only things I had to change were some function-based indexes, and it was parseable by Postgres without problems. Next was the data. No problems there; it created a 20 GB output file that all I had to do was throw at Postgres. Just because I wasn't very proactive with my VM's disk space, I faced some problems... 20 GB from the export plus the Oracle data
(36664764 /opt/oradata/) + Oracle software + other things?
That was about 65% of my disk space, and I also had to maintain a second live copy of this data plus indexes in Postgres. So I gzipped the whole output file and used "zcat | psql". It worked like a charm; actually it's still running as I write this, because...
And here are some things to take into consideration before starting!
The dump file is 20 GB; what happens if something goes wrong?
You drop the database, fix the DDL script and reapply the data.
-- SUPER MEGA LAME --
Do it one object at a time, not like I did, or at least separate the small from the big tables in groups and do the big tables one at a time. Otherwise something like:
ERROR: invalid input syntax for integer: "26.19"
CONTEXT: COPY twitter_tmp, line 1, column network_score: "26.19"

might end up fucking you up!
ora2pg translated a NUMBER column as integer, something that came back to bite me in the ass 2 hours after the import started...
I changed the DDL script and reran it; if it fails again I have to start over. SO, don't be like me, act smart, do it one part at a time :)
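One way to recover from a type mismatch like the one above without regenerating the whole DDL (the database name is hypothetical): widen the column, then re-run only the failed COPY for that table.

psql mydb -c "ALTER TABLE twitter_tmp ALTER COLUMN network_score TYPE numeric;"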
In all fairness, I still have no idea what tables are in the database, and my job has been made easy mode by ora2pg. So I can't blame it; I can only blame myself for not MAKING A MIGRATION STRATEGY after all.
I will also show you the parameters that I changed in order to make the import faster; the difference is actually HUGE:

shared_buffers = 2GB
synchronous_commit = off
wal_buffers = 16MB
wal_writer_delay = 1000ms
checkpoint_segments = 64  




Anyway, so far I think I'm good with the table definitions (including constraints) and the data (hopefully), but there is a lot more that has to be done: views, code, etc...

thanks for reading
to be continued...
Vasilis



Vasilis Ventirozos: PostgreSQL backup and recovery

One of the main tasks of any kind of administrator is to make sure that the data he's responsible for will be available if anything bad happens (asteroids, flood, locusts, hail). In order to do that you need a high availability solution to obtain continuity, and a good backup plan to support your system in case of human error ("Hello Mr. admin, I just dropped the customer payment table"). Trust me, it happens... a lot...
I briefly covered high availability with the implementation of a hot standby in an earlier post, and now it's time to cover the backup options available in PostgreSQL.

There are 2 kinds of backups in PostgreSQL, physical and logical.

The good thing about logical backups is that they are simple to implement and maintain, and selective backup and restore (even across PG versions) is easy. Usually the backup output consists of one file that can, of course, be compressed. The major con of this method is that it lacks Point In Time Recovery (a database with the PITR feature can be restored or recovered to the state that it had at any time since PITR logging was started for that database). It also lacks incrementality: each backup is a new WHOLE backup of your database. This makes these kinds of backups not really usable in very large production installations. Before you decide to say "not good", let me tell you about cases where these kinds of backups are better than incremental backups:
test servers without much data, small installations where the output will be small and backups can be taken on a daily (or even more frequent) basis, large installations that don't see many data changes over the day, data warehouses and reporting servers.

Examples
Once again PGlab1 will be my lab rat. Don't worry, no animals or data were harmed.

PGDATA=/opt/PGDATA

export backupfile="backup_$(date +%d%m%Y).tgz"
pg_ctl stop && tar cvfz ~/$backupfile $PGDATA && pg_ctl start


Basically I just stopped the database, took a tgz of my $PGDATA, and started the DB again.
Simple and effective. The restore can be done to a different path than $PGDATA; just make sure you provide -D to pg_ctl, or set PGDATA to the correct path before you start.
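A restore sketch for that cold backup (paths are examples; GNU tar strips the leading "/" when creating the archive, so the content sits under opt/PGDATA/ inside the tarball):

mkdir /opt/PGDATA_RESTORE && chmod 700 /opt/PGDATA_RESTORE
cd /opt/PGDATA_RESTORE
tar xzf ~/$backupfile --strip-components=2     # drops the leading opt/PGDATA/ components
pg_ctl -D /opt/PGDATA_RESTORE -l restore.log start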

pg_dump and pg_dumpall

pg_dump and pg_dumpall export one or all databases to a (by default) human-readable SQL format.
The output can be compressed, and the tools support a lot of options like data only, schema only, etc.

I have a database, for this post's sake, called testbackup.

If I run pg_dump testbackup I will get on stdout, one by one, the SQL commands that I would need to recreate the database from scratch, so just by redirecting it to a file you have a simple backup. I won't get into details about the format of this file; I will just say that at first you will see connection details like the encoding and the extensions that exist, then the table creation script, then the data (using a Postgres command called COPY) and then the indexes and constraints.
NOTE that taking a pg_dump won't back up the users, and that's because users (roles) in Postgres are global to the cluster rather than part of any single database. To back up users you can use pg_dumpall -g (-g means globals).

Here's a script that I am using in order to take this kind of backup:

export PGDATA=/opt/db/data
export PGPASSFILE=/opt/db/data/pgpass.conf

logfile=/var/lib/pgsql/backup/pgbackup.log
backupdir=/var/lib/pgsql/backup
pgdump=/usr/bin/pg_dump
psql=/usr/bin/psql
pgdumpall=/usr/bin/pg_dumpall
psqluser=postgres
retention=7
#</Variables>
usersfilename="`date "+%d%m%Y"`.bck.users"
$pgdumpall -g -U $psqluser >  $backupdir/$usersfilename && find $backupdir/*bck.users* -ctime +$retention -exec rm {} \;

for db in `echo "select datname from pg_database
where datname not in ('template0','template1','postgres');
"|$psql -A -t postgres`
do
backupname=""$db"_`date "+%d%m%Y"`.bck"
logfilename=""$db"_`date "+%d%m%Y"`.bck.log"
usersfilename=""$db"_`date "+%d%m%Y"`.users"
$pgdump -Fc -v -f $backupdir/$backupname -U $psqluser $db 2> $backupdir/$logfilename && find $backupdir/$db*bck* -ctime +$retention -exec rm {} \;
done


Notice that I use the -Fc switch in pg_dump; that means custom format, and it can be used for selective restore using the pg_restore command. If I had one of these backups and I wanted to restore the table "customers" I would run:
pg_restore -Fc -t customers -U <username> -h <host name> -d <db name> <file name>
NOTE that there is a switch (-j) for parallelism.
More about pg_restore, pg_dump and pg_dumpall: pg_restore, pg_dump, pg_dumpall

Now that we are done with database dump backup basics , lets move to live , or online backup , PITR and timelines.
In order to get a backup that is incremental you will need a basebackup and all the changes that transactions do to the database, so you need the transaction logs or as we call them in postgres WAL segments. I wont say many things about how transaction mechanism works in postgres, this is a backup and restore post so i will leave WAL mechanism for another post.

Standalone hot physical database backup

I will use the following directories and variables for examples sake

export BACKUPNAME="backup_$(date +%d%m%Y).tgz"

postgres@pglab1:/opt$ ls -l
total 8
drwxr-xr-x  2 postgres postgres 4096 Mar 27 12:32 BACKUP
drwx------ 15 postgres postgres 4096 Mar 27 11:40 PGDATA

mkdir /opt/BACKUP/archive

Set an archive_command in postgresql.conf and restart the server:
wal_level = archive 
archive_mode = on
archive_command = 'test -f /opt/BACKUP/archiving/archiving_active && cp %p /opt/BACKUP/archive/%f'


mkdir /opt/BACKUP/archiving/
touch /opt/BACKUP/archiving/archiving_active

now run :
psql -c "select pg_start_backup('BACKUP')"
tar -cvz --exclude=$PGDATA/pg_xlog -f /opt/BACKUP/$BACKUPNAME $PGDATA
psql -c "select pg_stop_backup(), current_timestamp"




now, lets crash and restore

rm -rf /opt/PGDATA/*  (yoohoo !!!)
untar the backup (.tgz) into $PGDATA; the pg_xlog dir will be missing, so create it as the postgres user
then in $PGDATA create a file called recovery.conf and add:
restore_command = 'cp /opt/BACKUP/archive/%f %p'

start the database and watch the logfile, it should show something like :

2013-03-27 13:22:58 EET::@:[3047]: LOG:  archive recovery complete
2013-03-27 13:22:58 EET::@:[3045]: LOG:  database system is ready to accept connections
2013-03-27 13:22:58 EET::@:[3061]: LOG:  autovacuum launcher started


the recovery.conf will also be automatically renamed to recovery.done.

Hot physical backup & Continuous Archiving

Now this is what you would want for a mission critical production installation with a lot of GBs or Tbs of data and a lot of concurrent users hitting the DB 24/7.
For example's sake I will delete my whole cluster and go through the steps one at a time. The backup will be taken locally, which of course is not recommended, and at the end I will perform a PITR and also talk about timelines.

 edit postgresql.conf and enable archiving :

wal_level = archive
archive_mode = on
archive_command = 'cp %p /opt/BACKUP/archives/%f'
(NOTE that archive_command can be scp, a more advanced external script or anything that would transfer the archived WALs to the desired location)
restart the server
psql -c "select pg_start_backup('my backup')"

You can now tar, rsync or do whatever you want to another node; something like
"rsync -cva --inplace --exclude=*pg_xlog* ${PGDATA} $OTHERNODE:$BACKUPNAME/$PGDATA"
would work.

for my example, i will just use tar like the previous example:
tar -cvz --exclude=/opt/PGDATA/pg_xlog/ -f /opt/BACKUP/backup.tgz $PGDATA

psql -c "select pg_stop_backup(), current_timestamp"

At this moment i have a base backup , and the mechanism that archives all wal segments, lets add some data and force some checkpoints.

notice that the archives directory now has WALs
postgres@pglab1:/opt/PGDATA/pg_xlog$ ls -l /opt/BACKUP/archives/
total 49156
-rw------- 1 postgres postgres 16777216 Mar 27 13:57 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Mar 27 14:02 000000010000000000000002
-rw------- 1 postgres postgres      293 Mar 27 14:02 000000010000000000000002.00000020.backup
-rw------- 1 postgres postgres 16777216 Mar 27 14:04 000000010000000000000003

A WAL segment is archived on either a size or a time threshold: with the defaults a segment is 16 MB, and if you set archive_timeout a switch is forced after that much time even if the segment isn't full, whichever happens first. These parameters can, and should, be changed for performance's sake depending on your workload, so monitor checkpoint and WAL-switch frequency.
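The related settings, for reference (the values shown are examples only, not recommendations):

# in postgresql.conf
#   archive_timeout     = 300      # force a WAL switch/archive at least every 5 minutes
#   checkpoint_segments = 32       # checkpoint frequency by WAL volume (pre-9.5 setting)
#   checkpoint_timeout  = 5min     # checkpoint frequency by time
psql -c "select name, setting, unit from pg_settings where name in ('archive_timeout','checkpoint_segments','checkpoint_timeout');"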

Now let's say that something really bad happened, like a mistaken but committed update on the table. To make it easier for myself, I created that table with a datetime column with default value now(),
so we have:

          datetime          | count
----------------------------+-------
 2013-03-27 14:05:05.999257 |  1000
 2013-03-27 14:05:14.911462 |  1000
 2013-03-27 14:05:19.419173 |  1000
 2013-03-27 14:05:25.631254 |  1000
 2013-03-27 14:06:39.97177  |  1000
 2013-03-27 14:09:53.571976 |  1000




Lets also assume that we know that the update was recorded at 2013-03-27 14:05:25.631254 and we want the database back to that exact time.

Create a recovery.conf as we did before:
restore_command = 'cp /opt/BACKUP/archives/%f %p'
recovery_target_time = '2013-03-27 14:04:00'

and restart the db, check the logfile , you'll see something like :
LOG:  starting point-in-time recovery to 2013-03-27 14:04:00+02





Now let's recover from a crash.
Once again, rm -rf /opt/PGDATA/*

Untar the base backup, place recovery.conf (with or without the recovery_target_time)
and start the database.
In my example I also did a PITR to 2013-03-27 14:01:00,
and the table now has :
 1 | 2013-03-27 13:56:49.163269

Timelines

The PostgreSQL documentation describes timelines much better than I could, so here it is, right from the documentation:
The ability to restore the database to a previous point in time creates some complexities that are akin to science-fiction stories about time travel and parallel universes. For example, in the original history of the database, suppose you dropped a critical table at 5:15PM on Tuesday evening, but didn't realize your mistake until Wednesday noon. Unfazed, you get out your backup, restore to the point-in-time 5:14PM Tuesday evening, and are up and running. In this history of the database universe, you never dropped the table. But suppose you later realize this wasn't such a great idea, and would like to return to sometime Wednesday morning in the original history. You won't be able to if, while your database was up-and-running, it overwrote some of the WAL segment files that led up to the time you now wish you could get back to. Thus, to avoid this, you need to distinguish the series of WAL records generated after you've done a point-in-time recovery from those that were generated in the original database history.
To deal with this problem, PostgreSQL has a notion of timelines. Whenever an archive recovery completes, a new timeline is created to identify the series of WAL records generated after that recovery. The timeline ID number is part of WAL segment file names so a new timeline does not overwrite the WAL data generated by previous timelines. It is in fact possible to archive many different timelines. While that might seem like a useless feature, it's often a lifesaver. Consider the situation where you aren't quite sure what point-in-time to recover to, and so have to do several point-in-time recoveries by trial and error until you find the best place to branch off from the old history. Without timelines this process would soon generate an unmanageable mess. With timelines, you can recover to any prior state, including states in timeline branches that you abandoned earlier.
Every time a new timeline is created, PostgreSQL creates a "timeline history" file that shows which timeline it branched off from and when. These history files are necessary to allow the system to pick the right WAL segment files when recovering from an archive that contains multiple timelines. Therefore, they are archived into the WAL archive area just like WAL segment files. The history files are just small text files, so it's cheap and appropriate to keep them around indefinitely (unlike the segment files which are large). You can, if you like, add comments to a history file to record your own notes about how and why this particular timeline was created. Such comments will be especially valuable when you have a thicket of different timelines as a result of experimentation.
The default behavior of recovery is to recover along the same timeline that was current when the base backup was taken. If you wish to recover into some child timeline (that is, you want to return to some state that was itself generated after a recovery attempt), you need to specify the target timeline ID in recovery.conf. You cannot recover into timelines that branched off earlier than the base backup.
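For completeness, a minimal recovery.conf sketch for following a particular timeline; the timeline ID is just an example:

cat >> $PGDATA/recovery.conf <<'EOF'
restore_command = 'cp /opt/BACKUP/archives/%f %p'
recovery_target_timeline = 'latest'    # or an explicit timeline ID such as '2'
EOF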


I know that this probably needs a review, and the plan is to do it at some point.
Thanks for reading
Vasilis



Vasilis Ventirozos: Setting shared_buffers the hard way

One of the main performance parameters in PostgreSQL is shared_buffers, probably the most important one. There are guidelines and rules of thumb that say to just set it to 20-30% of your machine's total memory.
Don't get me wrong, these rules are generally good and a perfect starting point, but there are reasons why you should tune your shared buffers in more detail:

a. 30% might not be enough, and you will never know unless you know exactly how to evaluate this parameter;
b. 30% might be a lot, and you'd be spending resources in vain;
c. you want to be awesome and tune every bit of your DB the optimal way.

What is shared buffers though ?
Shared buffers define a block of memory that PostgreSQL uses to hold requests that are awaiting attention from the kernel buffer and CPU.
So basically PostgreSQL temporarily puts data blocks in this memory in order to process them; EVERYTHING goes through the shared buffers.

Why not set shared_buffers to 80% of RAM on a dedicated DB server?
The OS also has a cache, and if you set shared_buffers too high you will most likely have an overlap, which is called double buffering: having data blocks in both caches.

So, in order to set your shared_buffers you need to know what's happening inside shared memory. PostgreSQL has an implementation called the clock-sweep algorithm: every time you use a data block, a usage counter makes that block harder to evict. A block gets a popularity number from 1 to 5, with 5 being heavily used and most likely to stay in shared memory.
In theory you want the most popular data blocks in the shared buffers and the least popular ones out of it. To do that you have to be able to see what is inside the shared buffers, and that's exactly what the pg_buffercache package does. You will find pg_buffercache in contrib.
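On 9.1 and later, loading it into a database is a one-liner (the database name here is just an example):

psql -d mydb -c "CREATE EXTENSION pg_buffercache;"    # on older versions, run the contrib SQL script instead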

Let's create 2 tables, full join them, update them, and generally run operations on these 2 tables while monitoring the buffer cache.
I will give dramatic examples by setting shared_buffers too low, at the default, and too high, just to demonstrate what the pg_buffercache views will show, and then I will find a good value for this specific workload. I will run the same statements while analysing what is happening inside shared buffers.
The two most useful SQL statements over the pg_buffercache view are the following:

-- buffers per relation and size
SELECT
c.relname,
pg_size_pretty(count(*) * 8192) as buffered,
round(100.0 * count(*) /
(SELECT setting FROM pg_settings
WHERE name='shared_buffers')::integer,1)
AS buffers_percent,
round(100.0 * count(*) * 8192 /
pg_relation_size(c.oid),1)
AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b
ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d
ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.oid,c.relname
ORDER BY 3 DESC
LIMIT 10;

-- buffers per usage count
SELECT
c.relname, count(*) AS buffers,usagecount
FROM pg_class c
INNER JOIN pg_buffercache b
ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d
ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.relname,usagecount
ORDER BY usagecount,c.relname


RESULTS :

512kB

             relname              | buffered | buffers_percent | percent_of_relation
----------------------------------+----------+-----------------+---------------------
 parents_id_idx                   | 56 kB    |            10.9 |                 0.1
 pg_operator                      | 40 kB    |             7.8 |                35.7
 pg_amop                          | 32 kB    |             6.3 |               100.0
 pg_statistic                     | 24 kB    |             4.7 |                20.0
 pg_opclass                       | 16 kB    |             3.1 |               100.0
 pg_operator_oid_index            | 16 kB    |             3.1 |                50.0
 pg_statistic_relid_att_inh_index | 16 kB    |             3.1 |                50.0
 pg_amop_opr_fam_index            | 16 kB    |             3.1 |                50.0
 pg_amproc_fam_proc_index         | 16 kB    |             3.1 |                50.0
 pg_amop_fam_strat_index          | 16 kB    |             3.1 |                50.0

0.1% of one of my indexes is in the buffer cache; there is no room for more.

24Mb (default value)

             relname              | buffered | buffers_percent | percent_of_relation
----------------------------------+----------+-----------------+---------------------
 child_id_idx                     | 12 MB    |            48.3 |                18.0
 children                         | 11 MB    |            46.4 |                11.2
 parents                          | 312 kB   |             1.3 |                 0.3
 pg_opclass_am_name_nsp_index     | 16 kB    |             0.1 |               100.0
 pg_namespace_oid_index           | 16 kB    |             0.1 |               100.0
 pg_operator_oid_index            | 24 kB    |             0.1 |                75.0
 pg_statistic_relid_att_inh_index | 24 kB    |             0.1 |                75.0
 pg_index                         | 16 kB    |             0.1 |                66.7
 pg_opclass_oid_index             | 16 kB    |             0.1 |               100.0
 pg_amop_opr_fam_index            | 24 kB    |             0.1 |                75.0
(10 rows)

Part of the index and part of the children table are now in the buffers, but it is still not enough.


5GB
(don't forget to raise your kernel shared memory limits first)
Envy ~ # sysctl kernel.shmmax=5387018240
kernel.shmmax = 5387018240
Envy ~ # sysctl kernel.shmall=2052297
kernel.shmall = 2052297

             relname              |  buffered  | buffers_percent | percent_of_relation
----------------------------------+------------+-----------------+---------------------
 children                         | 149 MB     |             3.0 |               100.0
 parents                          | 117 MB     |             2.3 |               100.0
 child_id_idx                     | 64 MB      |             1.3 |               100.0
 parents_id_idx                   | 64 MB      |             1.3 |               100.0
 pg_statistic_relid_att_inh_index | 32 kB      |             0.0 |               100.0
 pg_opclass_oid_index             | 16 kB      |             0.0 |               100.0
 pg_namespace                     | 8192 bytes |             0.0 |               100.0
 pg_namespace_oid_index           | 16 kB      |             0.0 |               100.0
 pg_operator_oid_index            | 32 kB      |             0.0 |               100.0
 pg_rewrite_rel_rulename_index    | 16 kB      |             0.0 |               100.0

As you can see, we now have the full tables and their indexes inside the buffer cache, and that is what we want: the popular data blocks are there. But we have apparently allocated too much memory, because:
a. buffers_percent is very low for every relation, and
b. if we sum up all the buffered data we get close to 394 MB, so a setting around 500-600 MB would be the optimal value FOR THE OPERATIONS performed in this database.
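A quick way to do that sum, assuming the default 8 kB block size, is to count every buffer belonging to the current database:

-- total size of this database's pages currently sitting in shared_buffers
SELECT pg_size_pretty(count(*) * 8192) AS buffered_total
  FROM pg_buffercache b
  JOIN pg_database d ON b.reldatabase = d.oid
 WHERE d.datname = current_database();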

I haven't mentioned how much RAM this machine has: it is 6 GB. If I set shared_buffers to 30% of my RAM that would be 1.8 GB, yet I just showed that this database needs about 600 MB, a third of the recommended value. I also haven't talked about the performance gain from tuning this parameter, since that is really out of the scope of this post, but I had timing enabled and I can tell you that the best numbers came when shared_buffers was close to that optimal value.

In real-life scenarios you will have much more data than my dummy two-table database and tuning this parameter will be much, much harder; this is where the second SQL statement I showed you, the one grouping by usage count, comes in. Every database and every workload needs its own tuning, and that is why there is no magic percentage for this parameter: keep the popular pages (usage count 3+) inside the buffer cache and leave the rest to the OS.
20-30% of RAM is indeed a good starting point, but it won't hurt, after the database has been running its workload for a while, to check what is happening inside the shared buffers; you might need to increase or decrease the setting at the next restart.
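A variation on the usage-count query gives a rough lower bound for that: summing only the buffers with a usage count of 3 or more shows how much of the cache is actually popular (again assuming the default 8 kB block size):

-- rough size of the "popular" part of the buffer cache
SELECT pg_size_pretty(count(*) * 8192) AS popular_pages
  FROM pg_buffercache
 WHERE usagecount >= 3;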

Thanks for reading
- Vasilis

Vasilis Ventirozos: Backing up PostgreSQL in HDFS

There are a number of ways to back up a PostgreSQL database; some are standard and some just demonstrate the power of open source and what you can do if you put your creativity and imagination to use. At OmniTI, we use the OmniPITR tool to manage WAL files and to run backups on secondary databases instead of the primary, to reduce load during backup. In this post, I will combine OmniPITR and Hadoop to accomplish something very neat: storing your backups in HDFS (Hadoop Distributed File System).
You might be asking: why? HDFS is rock-solid reliable, it has an extremely low cost per byte, and it can sustain around 2 Gbit per node, scaling up to more than a TB per second across a cluster. It has been proven by internet giants running a wide variety of use cases.
Let's say that you have a 1 TB database running; an uncompressed backup will need 1 TB of reliable storage just to keep one copy. HDFS has the great advantage of using cheap hardware while being fault tolerant at the same time.
Imagine adding cheap SATA disks to the company workstations and keeping the database backups there. Now imagine explaining that to the next SAN salesman who comes to sell you storage. - FUN TIMES!

Let me give some details about tools used:

Hadoop:

It is open-source software for reliable, scalable, distributed computing.

The project includes these modules:
  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

OmniPITR :

OmniPITR, written and maintained by OmniTI, is a set of scripts to ease setting up WAL replication and making hot backups from both Master and Slave systems.
This set of scripts has been written to make the operation seamless, secure and as light on resources-usage as possible.

Now, you must be wondering: why am I talking about it now? Depesz has recently added a new feature called dst-pipes (pipe to program). This feature is limited only by the user's imagination; as an example of it, Depesz did GPG-encrypted backups in one operation! I am going to use the same feature to do something entirely different.

Setup :

Software versions: PostgreSQL 9.3 (beta2), Hadoop 1.2.1

For the setup I used a single VM for both Hadoop and PostgreSQL; the reason for using just one VM is that there really are no limitations in the topology. Hadoop and Postgres are two entirely different ecosystems, so you have no constraints: PostgreSQL might be running in a cascading multi-slave environment while Hadoop runs, as I said, on desktops. Other than Postgres and Hadoop, you have to git clone OmniPITR. Depesz has put effort into making it self-sufficient, so no extra Perl modules are needed. The only thing I added to a minimal (net install) Debian installation was rsync.

Configuration :

In postgresql.conf, enable archiving and set archive_command to something like:
archive_command = '/usr/local/omnipitr/bin/omnipitr-archive -dl /mnt/archives/xlog -s /usr/local/omnipitr/state -db /mnt/backup/xlog -l /usr/local/omnipitr/log/omnipitr-^Y-^m-^d.log -v "%p"'

Now, you may have a different archive_command; I really suggest using OmniPITR, it works well and it has been tested in very large installations. You can do cool stuff like sending the archives to a remote location, gzipping them, keeping combinations of local gzipped copies and remote ones, and so on. You can read about OmniPITR's features in the documentation on GitHub.

Let's make a script:

#!/usr/bin/env bash
# read the backup stream from stdin and store it in HDFS under the name passed as $1
hadoop dfs -put - /user/hduser/$1

This script reads from stdin and puts the data into HDFS under the name given as its first argument.

Bring up PostgreSQL with the above archive_command and with Hadoop running on the node.

Let's try to backup straight into Hadoop:

/usr/local/omnipitr/omnipitr-backup-master -D $PGDATA -l dl-backup.log  -dp ~/test/hadoop_bck.sh -x /mnt/backup/xlog -v

-dp is the key switch here; it uses the new OmniPITR feature 'dst-pipes': it runs the given program and pipes the backup output into it, in this case the script we just made.


To verify the backup is actually there :

 $hadoop dfs -ls /user/hduser

Found 2 items
-rw-r--r--   1 vasilis supergroup  115322880 2013-08-22 23:18 /user/hduser/dblab1-data-2013-08-22.tar
-rw-r--r--   1 vasilis supergroup   33566720 2013-08-22 23:18 /user/hduser/dblab1-xlog-2013-08-22.tar

You might want to refer to the Hadoop documentation for how to manipulate and organize these files, but the backup is there:

vasilis@dblab1:~/test$ hadoop dfs -copyToLocal /user/hduser/dblab1-data-2013-08-22.tar

vasilis@dblab1:~/test$ ls -l ~/dblab1-data-2013-08-22.tar
-rw-r--r-- 1 vasilis vasilis 115322880 Aug 23 18:06 /home/vasilis/dblab1-data-2013-08-22.tar

To restore the cluster, you would need something like :

hadoop dfs -cat /user/hduser/dblab1-data-2013-08-22.tar  |tar xv

For our example we had no need for compression; gzipping the output is supported by OmniPITR, and you may also do it in the script that pushes the tar to HDFS.

This is just the tip of the iceberg and only a proof of concept, but the tools are there and everything is stable.

Use your imagination, save your company some money and be the hero!!!

Thanks for reading !!

Fabien Coelho: PostgreSQL Archeology

In a previous post I have ported a Turing Machine (TM) in SQL down to PostgreSQL 7.3.

I report here my fruitless attempts at running previous versions of PostgreSQL on a Debian or Ubuntu Linux. They would not configure and/or compile, at least with the minimal effort I was ready to put in: writing portable-to-the-future, system-dependent C code does not work. Basically, the code from 15 years ago is lost unless you run it on a system from 15 years ago, which may require hardware from 15 years ago as well, or maybe a VM.

Trying to compile old PostgreSQL versions…

Here is a summary of failures encountered while trying to compile previous versions from available sources:

PostgreSQL 7.2 (2002)

configure thinks that my flex 2.5.35 is flex 2.5.3 and reports an issue. Then the compilation fails, possibly because of an include issue, on:

hba.c:885: error: storage size of ‘peercred’ isn’t known

Trying again with an older Debian Lenny, it fails when linking postgres:

/usr/bin/ld: errno: TLS definition in /lib/libc.so.6 section .tbss mismatches non-TLS reference in commands/SUBSYS.o
/lib/libc.so.6: could not read symbols: Bad value
collect2: ld returned 1 exit status

PostgreSQL 7.1 (2001)

configure fails: it does not recognize gcc 4.7.2. autoconf fails on m4 while processing configure.in. Same result with the old Debian Lenny.

...
/usr/bin/m4:configure.in:930: non-numeric argument to builtin `_m4_divert_raw'
autom4te: /usr/bin/m4 failed with exit status: 1

PostgreSQL 7.0 (2000) and 6.5 (1999)

configure fails, but autoconf works fine. String continuations in src/include/version.h have to be fixed manually. ld fails in the end with a segfault while linking postgres:

collect2: ld terminated with signal 11 [Segmentation fault]

PostgreSQL 6.0 (1997)

Create src/Makefile.custom, then make. Compilation fails, possibly because of include issues:

ipc.c:197: error: storage size of ‘semun’ isn’t known

Same result on the old Debian Lenny.

Postgres95 1.08 (1996)

Edit src/Makefile.global, then make. Compilation fails:

indexam.c:188:1: error: pasting "->" and "aminsert" does not give a valid preprocessing token

Conclusion

Although the sources of past versions of PostgreSQL are available, having them up and running on a modern system is another story.

Hans-Juergen Schoenig: Speeding up “min” and “max”

Indexes are a perfect tool for finding a certain value or some kind of range in a table. It is possible to speed up a query many times by avoiding a sequential scan on a large table. This kind of behavior is widely known and can be observed in any relational database system. What is [...]

Bruce Momjian: Five Events

I am presenting at five new events in the next two months; they are either new groups, new cities, or new venues. The events are in Maryland, Chicago, Moscow, Italy, and Dublin.

Michael Paquier: Playing with large objects in Postgres

PostgreSQL has had for ages a feature called large objects, allowing you to store in the database objects with a… well… large size. All those objects are stored in dedicated catalog tables: pg_largeobject_metadata for general information like ownership, and pg_largeobject for the data itself, divided into pages of 2 kB (the default size, defined as BLCKSZ/4). This [...]

Hubert 'depesz' Lubaczewski: What is the overhead of logging?

There obviously is one – after all, logging information has to be more expensive than not logging it. But how big is it? And more importantly, what is the difference between logging to stderr/file, csvlog and syslog? And what about syslog to remote machine? Let's see. For my test, I used 2 machines: DB server: […]

gabrielle roth: PDXPUG: September meeting in two weeks – JSON

When: 7-9pm Thu Sep 19, 2013
Where: Iovation
Who: Andrew Kreps
What: JSON

This month we return to our regular meeting location at Iovation. Andrew Kreps will be giving a JSON demo.

Topics:
- A brief definition of JSON for those who may not have a lot of experience with it
- Why the storage of JSON as JSON is important
- Working with the operators, selecting data, etc
- A practical example, probably using the Instagram API.

Andrew’s a Portland-based software engineer who digests APIs for breakfast. After stumbling through the worlds of Oracle and MySQL for many years, he’s found PostgreSQL does things the way they should have always been done.

Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Building security will close access to the floor at 7:30.


Dimitri Fontaine: Using trigrams against typos

In our ongoing Tour of Extensions we played with earth distance in How far is the nearest pub?, then with hstore in a series about triggers, first to generalize Trigger Parameters and then to enable Auditing Changes with Hstore. Today we are going to work with pg_trgm, the trigram PostgreSQL extension: its usage got seriously enhanced in recent PostgreSQL releases and it is now a poor man's Full Text Search engine.

Some people are quite serious about trigrams

Of course we also have the rich man's version, with Text Search Parsers, several kinds of dictionaries with support for stemming, thesaurus and synonyms, a full text query language, and tools for ranking search results. So if what you really need is Full Text Search, go check the docs.

The use of trigrams is often complementary to Full Text Search. With trigrams we can implement typing-correction suggestions, or index LIKE and POSIX regular expression searches.

Whatever the use case, it all begins as usual by enabling the extension within your database server. If you're running from PostgreSQL packages be sure to always install the contrib package, really. A time will come when you need it and you will then be happy to only have to type CREATE EXTENSION to get started.

# create extension pg_trgm;
CREATE EXTENSION

Setting up the use case

The use case I want to show today is suggesting corrections for words the user has obviously typoed, because your search form is not finding any result. Or maybe offering a suggest-as-you-type feature, doing a database search for approximately matching strings in some catalog for which you want to offer auto-completion.

One easy-to-use catalog here is the Dell DVD Store Database Test Suite, which you can also download as a ready-to-use PostgreSQL text dump at http://pgfoundry.org/frs/download.php/543/dellstore2-normal-1.0.tar.gz.

This small database offers ten thousand products and simplifies the schema so much that there is a single actor column in the products table. Let's pretend we just filled in a search box to find products by actor name, but we don't know the right spelling of the actor's name, or maybe the cat really wanted to help us on the keyboard that day.
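Once the dump is loaded, a quick sanity check on the catalog (the distinct-actor count will depend on the dellstore2 data itself) could look like this:

-- how many products and distinct actor names we have to search through
SELECT count(*) AS products, count(DISTINCT actor) AS actors
  FROM products;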

A cat! That picture should at least double the traffic to this article...

The trigram extension comes with two operators of interest for this situation: the similarity operator named % and the distance operator named <->. The similarity operator compares the list of trigrams extracted from the query terms with those extracted from each value in our table, and filters out the rows where the data is not considered similar enough.

# select show_trgm('tomy') as tomy,
         show_trgm('Tomy') as "Tomy",
         show_trgm('tom torn') as "tom torn",
         similarity('tomy', 'tom'),
         similarity('dim', 'tom');
-[ RECORD 1 ]-------------------------------------
tomy       | {"  t"," to","my ",omy,tom}
Tomy       | {"  t"," to","my ",omy,tom}
tom torn   | {"  t"," to","om ",orn,"rn ",tom,tor}
similarity | 0.5
similarity | 0

As you can read in the PostgreSQL trigram extension documentation, the default similarity threshold is 0.3, and you can tweak it using the function set_limit().
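For example, to check the current threshold and make the % operator a bit stricter for the current session:

-- show_limit() and set_limit() ship with the pg_trgm extension
SELECT show_limit();    -- 0.3 by default
SELECT set_limit(0.4);  -- require a closer match from now on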

Now let's find all those actors whose name looks like tomy, since that is clearly what the user entered in the search box and we found no exact match for it:

# select * from products where actor ~* 'tomy';
 prod_id | category | title | actor | price | special | common_prod_id 
---------+----------+-------+-------+-------+---------+----------------
(0 rows)

# select actor from products where actor % 'tomy';
  actor   
----------
 TOM TORN
 TOM DAY
(2 rows)

Time: 26.972 ms

Trigram indexing

That's a little too much time for that query when we consider there are only 10,000 entries in our table; let's try to do better than that:

# create index on products using gist(actor gist_trgm_ops);
CREATE INDEX

Now if we run the exact same query we get our result in less than 3 milliseconds, which is more like something we can push to production.

select actor from products where actor % 'tomy';
  actor   
----------
 TOM TORN
 TOM DAY
(2 rows)

Time: 2.695 ms

Oh, and by the way: did you know that the ~* operator we used above, to discover that there is no actor matching tomy in our products table, implements a case-insensitive POSIX regex search in PostgreSQL? Isn't that awesome? Now, on to the next surprise; have a look at that explain plan:

# explain (costs off)
  select * from products where actor ~* 'tomy';
                   QUERY PLAN                    
-------------------------------------------------
 Index Scan using products_actor_idx on products
   Index Cond: ((actor)::text ~* 'tomy'::text)
(2 rows)

In PostgreSQL 9.3 the trigram extension is able to answer regular expression searches from its indexes. The first production release of 9.3 should happen as soon as next week; I hope you're ready for it!

What about direct support for CRM114 then?

Auto Completion

What if you want to offer as-you-type completion to the names of the actors we know in our catalog? Then maybe you will find the following query useful:

#   select actor
      from products
     where actor % 'fran'
  order by actor <-> 'fran'
     limit 10;
    actor     
--------------
 FRANK HAWKE
 FRANK BERRY
 FRANK POSEY
 FRANK HAWKE
 FRANCES DEE
 FRANK LEIGH
 FRANCES DAY
 FRANK FOSTER
 FRANK HORNE
 FRANK TOMEI
(10 rows)

Time: 2.960 ms

Note that without the WHERE clause filtering on trigram similarity I get run times of 30 ms rather than 3 ms in my tests here, because the GiST index is not used then. As usual EXPLAIN is your friend, and remember that a query plan will change depending on the volume of your data set as known by the PostgreSQL planner statistics.
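A minimal sketch of that check, assuming the same products table and GiST index as above (the plans will of course differ with your own data and statistics):

-- compare the plans with and without the trigram similarity filter
EXPLAIN (COSTS OFF)
SELECT actor FROM products
 ORDER BY actor <-> 'fran' LIMIT 10;

EXPLAIN (COSTS OFF)
SELECT actor FROM products
 WHERE actor % 'fran'
 ORDER BY actor <-> 'fran' LIMIT 10;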

Conclusion

The trigram extension allows indexing LIKE searches and regular expression searches, and it also knows how to compute similarity and distance between texts, and how to index that too. It's another power tool included with PostgreSQL, and another reason why you won't believe how far behind the other database systems you know really are, if you ask me.

When all you have is a hammer... doesn't apply to PostgreSQL

Oh, and get ready for PostgreSQL 9.3. Another release packed with awesome.

Hans-Juergen Schoenig: From PostgreSQL directly to Vim

Some (obvious) ideas can strike you when you are just sitting around at the airport. This is exactly what happened to me yesterday in Berlin. In some cases it can be quite handy to dump a (reasonably) small database, edit it with vi and then replay it. As a passionate (and fundamentalist) user […]

Devrim GÜNDÜZ: PostgreSQL 9.3.0 is out!
