Planet PostgreSQL

Peter Eisentraut: Retrieving PgBouncer statistics via dblink


PgBouncer has a virtual database called pgbouncer. If you connect to it, you can run special SQL-like commands, for example:

$ psql -p 6432 pgbouncer
=# SHOW pools;
┌─[ RECORD 1 ]───────────┐
│ database   │ pgbouncer │
│ user       │ pgbouncer │
│ cl_active  │ 1         │
│ cl_waiting │ 0         │
│ sv_active  │ 0         │
│ sv_idle    │ 0         │
│ sv_used    │ 0         │
│ sv_tested  │ 0         │
│ sv_login   │ 0         │
│ maxwait    │ 0         │
└────────────┴───────────┘

This is quite nice, but unfortunately, you cannot run full SQL queries against that data. So you couldn’t do something like

SELECT * FROM pgbouncer.pools WHERE maxwait > 0;

Well, here is a way: From a regular PostgreSQL database, connect to PgBouncer using dblink. For each SHOW command provided by PgBouncer, create a view. Then that SQL query actually works.
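For illustration, here is a minimal sketch of what one such view could look like; the connection string, schema name, and column types below are my assumptions, not necessarily what the packaged views use:

-- Minimal sketch: expose SHOW pools as a view via dblink.
-- Connection string, schema, and column types are assumptions.
CREATE EXTENSION IF NOT EXISTS dblink;
CREATE SCHEMA IF NOT EXISTS pgbouncer;

CREATE VIEW pgbouncer.pools AS
  SELECT *
    FROM dblink('host=127.0.0.1 port=6432 dbname=pgbouncer user=pgbouncer',
                'SHOW pools')
         AS t(database text, "user" text,
              cl_active int, cl_waiting int,
              sv_active int, sv_idle int, sv_used int,
              sv_tested int, sv_login int,
              maxwait int);

-- Now ordinary SQL works against PgBouncer's statistics:
SELECT * FROM pgbouncer.pools WHERE maxwait > 0;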

But before you start doing that, I have already done that here:

Here is another useful example. If you’re tracing back connections from the database server through PgBouncer to the client, try this:

SELECT * FROM pgbouncer.servers LEFT JOIN pgbouncer.clients ON servers.link = clients.ptr;

Unfortunately, different versions of PgBouncer return a different number of columns for some commands, so you will need different view definitions. I haven't yet found an elegant way to handle that.


Christophe Pettus: PostgreSQL and JSON: 2015

Andrew Dunstan: Testing patches with a couple of commands using a buildfarm animal

I've blogged before about how the buildfarm client software can be useful for developers and reviewers. Yesterday was a perfect example. I was testing a set of patches for a bug fix for pg_upgrade running on Windows, and they go all the way back to the 9.0 release. The simplest way to test these was using a buildfarm animal. On jacana, I applied the relevant patch in each branch repo, and then simply did this to build and test them all:

for f in root/[RH]* ; do 
br=`basename $f`
perl ./run_build.pl --from-source=`pwd`/$f/pgsql --config=jacana.conf --verbose $br
done

After it was all done and everything worked, I cleaned up the git repositories so they were ready for more buildfarm runs:

for f in root/[RH]* ; do 
pushd $f/pgsql
git reset --hard
git clean -dfxq
popd
done

Pretty simple! The commands are shown here on multiple lines for clarity, but in fact I wrote each set on one line, so after applying the patches the whole thing took two lines. (Because jacana only builds back to release 9.2, I had to repeat this on frogmouth for 9.0 and 9.1.)

Robert Haas: PostgreSQL Shutdown

PostgreSQL has three shutdown modes: smart, fast, and immediate.  For many years, the default has been "smart", but Bruce Momjian has just committed a patch to change the default to "fast" for PostgreSQL 9.5.  In my opinion, this is a good thing; I have complained about the current default, and agreed with others complaining about it, many times, at least as far back as December of 2010.  Fortunately, we now seem to have achieved consensus on this change.


David Fetter: Monitoring pgbouncer with pgbouncer_wrapper

Josh Berkus: Primary Keyvil, reprised

Primary Keyvil was one of the most popular posts on my old blog.  Since the old blog has become somewhat inaccessible, and I recently did my Keyvil lightning talk again at pgConf NYC, I thought I'd reprint it here, updated and consolidated.

Two actual conversations I had on IRC ages ago, handles changed to protect the ignorant, and edited for brevity (irc.freenode.net, channel #postgresql):


    newbie1: schema design:
      http://www.rafb.net/paste/results/Bk90sz89.html

    agliodbs: hmmm ... why do you have an ID column
        in "states"?  You're not using it.

    newbie1: because I have to.

    agliodbs: you what?

    newbie1: the ID column is required for normalization.

    agliodbs chokes




    newbie2: how do I write a query to remove the duplicate rows?

    agliodbs: please post your table definition

    newbie2: http://www.rafb.net/paste/results/Hk90fz88.html

    agliodbs: What's the key for "sessions"?

    newbie2: it has an "id" column

    agliodbs: Yes, but what's the real key? 
        Which columns determine a unique row?

    newbie2: I told you, the "id" column. 
        It's a primary key and everything.

    agliodbs: That's not going to help you 

        identify duplicate sessions. You need another
        key ... a unique constraint on real data columns, 
        not just an "id" column.

    newbie2: no I don't

    agliodbs: Good luck with your problem then.



The surrogate numeric key has been a necessary evil for as long as we've had SQL. It was set into SQL89 because the new SQL databases had to interact with older applications which expected "row numbers," and it continues because of poor vendor support for features like CASCADE.

Inevitably, practices which are "necessary evils" tend to become "pervasive evils" in the hands of the untrained and the lazy. Not realizing that ID columns are a pragmatic compromise with application performance, many frameworks and development pragmas have enshrined numeric IDs as part of the logical and theoretical model of their applications. Worse yet, even RDBMS book authors have instructed their readers to "always include an ID column," suturing this misunderstanding into the body of industry knowledge like a badly wired cybernetic implant.

What Are Numeric Surrogate Primary Keys, Exactly?


Before people post a lot of irrelevant arguments, let me be definitional: I'm talking about auto-numbering "ID" columns, like PostgreSQL's or Oracle's SERIAL and MySQL's AUTO_INCREMENT. Such columns are known as "surrogate keys" because they provide a unique handle for the row which has nothing to do with the row's data content. It is the abuse of these "numeric surrogate keys" which I am attacking in this column, not any other type of key.

Further, "keys" are real: a "key" is any combination of columns which forms a "predicate", or a set which uniquely identifies a tuple or row, of which there should be at least one per table. The concept of "primary key," however, has no intrinsic meaning in relational theory -- all keys are equal and no one of them is "primary". Instead, the idea of a "primary key" is based on the idea that one and only one key determines the physical order of the tuples on disk, something which relational theory specifically says we should ignore in the logical model of our data. Therefore primary keys are a specific violation of relational theory, a legacy of the days when most SQL databases were index-ordered.  Mind you, some of them still are.

Theory-Schmeery. Why Should We Care?


Since there has been a relational theory for over thirty years and an ANSI SQL standard for longer than we've had PCs, it's easy to forget that E.F. Codd created the relational model in order to cure major, chronic data management problems on the mainframes at IBM. Careless abandonment of tenets of the relational model, then, risks repeating those same data management problems. These are not theoretical issues; these are real data issues that can cost your company a lot of money and you many weekends and late nights.

I'll give you an example from my own work history. We once developed a rather sophisticated web-based legal calendaring system for some multi-state, multi-firm litigation involving thousands of plaintiffs. Since there were multiple law firms involved, and some of them had a significant amount of partner turnover, the project suffered from horrible "spec drift," going 2 years late and $100,000 over budget. In the course of several hundred revisions to the database schema, the unique constraint to the central "events" table got removed. The spec committee didn't see this as a problem, because there was still the "event_id" column, which was the "primary key."

Then the duplicates started to appear.

It didn't take long (about 2 months) to discover that there was a serious problem with having "id" as the only unique column. We got multiple hearings scheduled on the calendar, in the same docket, on the same date or in the same place. Were these duplicates or two different hearings? We couldn't tell. The calendaring staff had to hire an extra person for six weeks just to call the legal staff on each case and weed out the duplicates. In the meantime, several attorneys drove hundreds of miles to show up for hearings which had been rescheduled or cancelled. The lead firm probably spent $40,000 getting the duplicates problem under control, not including wasted attorney time.

The essential problem is that an autonumber "id" column contains no information about the record to which it's connected, and tells you nothing about that record. It could be a duplicate, it could be unique, it could have ceased to exist if some idiot deleted the foreign key constraint.
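As a concrete illustration (with invented column names, not the actual schema from that project), the fix is simply to declare the real key alongside the surrogate one, so the database itself refuses duplicate hearings:

-- Sketch with invented columns: the surrogate id stays for joins,
-- but the unique constraint on real data columns is what actually
-- prevents duplicate hearings.
CREATE TABLE events (
    event_id   serial PRIMARY KEY,
    docket_no  text        NOT NULL,
    hearing_at timestamptz NOT NULL,
    location   text        NOT NULL,
    UNIQUE (docket_no, hearing_at, location)
);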

A second example occurred when I and a different partner were working on an accounting application. We spent, off and on, about 6 weeks trying to track down an elusive error that would throw disbursements out of balance. When we found it, it turned out the problem was assigning an ID from a transaction record to a variable meant to hold a sub-transaction record, causing part of the disbursement to be assigned to the wrong transaction. Since all of the IDs in question were 4-byte integers, who could tell? They all looked the same, even in debug mode.

I am not saying that you should avoid autonumber surrogate keys like an Uber driver with a claw hammer. The danger is not in their use but in their abuse. The "events_id" column in the "events" table didn't give us any trouble until we began to rely on it as the sole key for the table. The accounting application gave us problems because we were using the ID as the entire handle for the records. That crossed the line from use to misuse, and we suffered for it.

Unfortunately, I'm seeing design mistakes that I made in the past not only being repeated wholesale by younger developers, but the rationales for them being defended vigorously on the Internet and elsewhere.

Reasons to Use an Autonumber Surrogate Key


What follows are a number of reasons people have given me, on IRC and the PostgreSQL.org mailing lists, for using autonumber keys. Some of them are "good" reasons which demonstrate an understanding of the costs and benefits. Others are "bad" reasons based on sloppy thinking or lack of training.  Form your own opinions before scrolling down.

Many-Column Keys
    The real key of the table has 3 or more columns and makes writing queries painful.

Table Size
    Since integers are smaller than most other types, using them makes the table, and my database, smaller. And a smaller database is faster.

Frameworks
    My web development framework requires that all tables have integer primary keys to do code generation from my database.

No Good Key
    My table has no combination of columns that makes a good natural key or unique index. Therefore I need to just use an ID.

Consistency
    Our technical specification requires all tables except join and load tables to have an "id" and a "name" column, each of which is unique.

Join/Sort Performance
    Integers sort and join much faster than large column types. So using integer primary keys gives me better query performance.

Design Principles
    Using an ID column in each table is an important principle of good relational database design, and is required for "normalization".  I read a book/web page/magazine article that says so.

DB Abstraction/ORM
    The database abstraction library (like PDO or ActiveRecord) I use requires integer primary keys.

SQL Standard
    The SQL Standard requires an ID column in each table.

Programmer Demands
    The PHP/Python/JS/Java guys on the interface team refuse to deal with different data types and multi-column keys, and/or want an integer to use as an "object ID."

Mutability
    The natural keys in my table can change, and IDs aren't allowed to change.

Reasons to Use an Autonumber Surrogate Key, Evaluated


Here's my evaluation of the various reasons above. You'll have your own opinions, of course, but read through this list to make sure that your design decisions are well-grounded.

Many-Column Keys
    It Depends. As much as this shouldn't be a reason, the rather verbose SQL join syntax and multicolumn index performance makes it one. If SQL were more terse, and query executors better, this would evaporate as a reason.  Note that in some cases, though, it can still be better to use the multicolumn key, especially if you're partitioning on some of the inherited key values.

No Real Key
    Very, Very Bad. This is an example of exactly the kind of very bad database design that puts the application designers into several weekends of overtime down the line. Without any natural key ... even if you use a surrogate key for joins, etc. ... you have no way of telling which rows in your table are duplicates. Which means that you will get duplicates, many of them, and be unable to fix the data without significant and costly fieldwork to reexamine the sources of your data ("Mary? Do we have one or two John MacEnroes working for us?")
    Worse, these indicate that the developer does not really know his data and that the spec was never really hashed out. When I interrogate people claiming that there's "no real key" I generally find that it's not actually the case that there aren't any unique keys, it's that the developer doesn't know what they are. This is a weathervane for far more serious design problems.
    As Jeff Davis likes to say, "Conflicts will get worked out somewhere.  In general, it's far less expensive to work them out in the database than in the real world."
     Note that thanks to Exclusion Constraints, GIN indexes, and functional unique indexes, PostgreSQL is able to support complex criteria as keys of which other databases would not be capable.  So if you're using something else, there is the possibility of "I know the real key, but my database engine doesn't support it."
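    For instance, here is a minimal sketch, with hypothetical tables, of the kind of real key PostgreSQL can enforce where a plain UNIQUE constraint cannot:

CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Exclusion constraint as the key: a room cannot have two
-- bookings whose time ranges overlap.
CREATE TABLE room_booking (
    room_id int     NOT NULL,
    booked  tsrange NOT NULL,
    EXCLUDE USING gist (room_id WITH =, booked WITH &&)
);

-- Functional unique index as the key: names compared
-- case-insensitively.
CREATE TABLE staff (
    first_name text NOT NULL,
    last_name  text NOT NULL,
    birth_date date NOT NULL
);
CREATE UNIQUE INDEX staff_natural_key
    ON staff (lower(first_name), lower(last_name), birth_date);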

External Requirements
    It Depends. The ORM, DB Abstraction and Programmer Demands arguments all amount to external requirements to use integer keys. Certainly a degree of genericization is necessary for any multi-purpose tool or code set. This is the most common reason for me when I succumb to using autonumber IDs. However, this should be a compelling reason only after you've evaluated the ORM, DB abstraction library and/or the staff involved to make sure that integer keys are a real requirement and that the tool/person will actually push your project forwards instead of becoming an obstacle.

Consistency
    Usually Bad. A scrupulous adherence to consistent design standards is generally a good thing.  However, the ID/Name requirement suggests that the architects haven't looked very hard at the application's actual requirements or the real structure of the data.

Standard Practice
    Bad. Both the SQL Standard and the Design Principles arguments are based on ignorance. Generally the developer using these rationales heard from a friend of a colleague who read someone's blog who took a course at the university that ID columns were a good idea. That some of these ignorant designers are also book and article authors is really tragic. For the record, neither the SQL Standard nor relational theory compels the use of surrogate keys. In fact, the papers which established relational theory don't even mention surrogate keys.

Mutability
    It Depends. It's an unfortunate reality that many SQL DBMSes do not support ON UPDATE CASCADE for foreign keys, and even those which do tend to be inefficient in executing it (this may be a reason to switch to PostgreSQL). As a result, real keys which change very frequently in large databases are generally not usable as join keys. However, I've seen this argument used for values which change extremely infrequently in small databases (like for full names or SSNs in a small company personnel directory), which makes it just an excuse.
    Sometimes, however, this argument is based completely on the misinformation that keys are supposed to be invariable and immutable for the life of the record. Where this idea came from I'm not sure; certainly not from either the SQL standard or the writings of E.F. Codd. It's probably unthinking bleed-over from mis-applied OO design. If this is your reason for not using a real key, it's wrong.
    The other practical reason to require immutable keys is if you're using the keys as part of a cache invalidation or generic sharding system.   However, a smart design for such a system still doesn't use autonumber surrogate keys; instead, you have a synthetic key which carries information about the entity to which it is attached in compressed form, such as a application-specific hash or addressing system.
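    To illustrate the first point with hypothetical tables: with ON UPDATE CASCADE, a mutable natural key can remain the join key, because changes propagate to referencing rows automatically:

-- Hypothetical tables: the natural key can change, and the change
-- cascades to every referencing row.
CREATE TABLE department (
    dept_code text PRIMARY KEY
);

CREATE TABLE employee (
    employee_name text NOT NULL,
    dept_code     text NOT NULL
        REFERENCES department (dept_code) ON UPDATE CASCADE,
    UNIQUE (employee_name, dept_code)
);

-- Renaming a department propagates to employee rows automatically.
UPDATE department SET dept_code = 'ENG' WHERE dept_code = 'ENGINEERING';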

Performance
    Usually Bad. I've saved the Table Size and Join/Sort Performance for last because performance is the most complex issue. The reason I say "usually bad" is that 80% of the time the developer making these arguments has not actually tested his performance claims, on this or any other database. Premature optimization is the hobgoblin of database design.
    For data warehouses, the Table Size argument can be compelling, although it needs to be balanced against the need for more joins in real performance tests. For any other type of application ... or any database smaller than 10GB, period ... this argument is nonsense. Whether your web application database is 200mb or 230mb is not going to make an appreciable performance difference on any modern machine, but poor design choices can make an appreciable difference in downtime.
    Join/Sort performance is a bit more of a serious argument, depending on the size of the database, the number of columns and data types of the real key.  Note that using the natural key can, in many cases, allow you to avoid doing a join altogether, which can result in query speedups which outstrip any slowness due to column size.  I have refactored data warehouses to use natural keys precisely for this reason.
    If you think you need to use a surrogate key for database performance, test, then make a decision.

Wrap-Up


As I said earlier, it's not using autonumber surrogate keys (which are a necessary, pragmatic evil) but misusing them that causes pain, late nights and budget overruns. Just make sure that you're using integer keys in the right ways and for the right reasons, and always ask yourself if an autonumber surrogate key is really needed in each table where you put one.

Andreas Scherbaum: Reversing pg_rewind into the future

Francesco Canovai: Automating Barman with Puppet: it2ndq/barman (part two)


In the first part of this article we configured Vagrant to run two Ubuntu 14.04 Trusty Tahr virtual machines, called pg and backup respectively. In this second part we will look at how to use Puppet to set up and configure a PostgreSQL server on pg and back it up via Barman from the backup box.


Puppet: configuration

After defining the machines as per the previous article, we need to specify the required Puppet modules that librarian-puppet will manage for us.

Two modules are required:

  1. puppetlabs/postgresql (https://github.com/puppetlabs/puppetlabs-postgresql/) to install PostgreSQL on the pg VM
  2. it2ndq/barman (https://github.com/2ndquadrant-it/puppet-barman) to install Barman on backup

Both modules will be installed from Puppet Forge. For the puppetlabs/postgresql module, we currently have to use version 4.2.0 at most, as the latest version (4.3.0) breaks the postgres_password parameter we'll be using later (see this pull request). Let's create a file called Puppetfile containing this content in the project directory:

forge "https://forgeapi.puppetlabs.com"
mod "puppetlabs/postgresql", "<4.3.0"
mod "it2ndq/barman"

We can now install the Puppet modules and their dependencies by running:

$ librarian-puppet install --verbose

Although not essential, it's preferable to use the --verbose option every time librarian-puppet is run. Without it the command is very quiet, and it's useful to know in advance what it is doing. For example, without --verbose you may waste precious time waiting for a dependency conflict to be resolved, only to see an error many minutes later.

Upon successful completion of the command, a modules directory containing the barman and postgresql modules and their dependencies (apt, concat, stdlib) will be created in our working directory. In addition, librarian-puppet will create the Puppetfile.lock file to identify dependencies and versions of the installed modules, pinning them to prevent future updates. This way, subsequent librarian-puppet install runs will always install the same version of the modules instead of possible upgrades (in case an upgrade is required, librarian-puppet update will do the trick).

Now we can tell Vagrant we are using a Puppet manifest to provision the servers. We alter the Vagrantfile as follows:

Vagrant.configure("2") do |config|
  {
    :pg => {
      :ip  => '192.168.56.221',
      :box => 'ubuntu/trusty64'
    },
    :backup => {
      :ip  => '192.168.56.222',
      :box => 'ubuntu/trusty64'
    }
  }.each do |name, cfg|
    config.vm.define name do |local|
      local.vm.box = cfg[:box]
      local.vm.hostname = name.to_s + '.local.lan'
      local.vm.network :private_network, ip: cfg[:ip]
      family = 'ubuntu'
      bootstrap_url = 'https://raw.github.com/hashicorp/puppet-bootstrap/master/' + family + '.sh'
      # Run puppet-bootstrap only once
      local.vm.provision :shell, :inline => <<-eos
        if [ ! -e /tmp/.bash.provision.done ]; then
          curl -L #{bootstrap_url} | bash
          touch /tmp/.bash.provision.done
        fi
      eos

      # Provision with Puppet
      local.vm.provision :puppet do |puppet|
        puppet.manifests_path = "manifests"
        puppet.module_path = [".", "modules"]
        puppet.manifest_file = "site.pp"
        puppet.options = [
          '--verbose',
        ]
      end
    end
  end
end

With the lines we’ve just added, we’ve given Vagrant the instructions to provision the VMs using manifests/site.pp as the main manifest and the modules included in the modules directory. This is the final version of our Vagrantfile.

We now have to create the manifests directory:

$ mkdir manifests

and write in it a first version of site.pp. We’ll start with a very basic setup:

node backup {
  class { 'barman':
    manage_package_repo => true,
  }
}
node pg {}

We can now start the machines and see that on backup there is a Barman server with a default configuration (and no PostgreSQL on pg yet). Let’s log into backup:

$ vagrant ssh backup

and take a look at /etc/barman.conf:

# Main configuration file for Barman (Backup and Recovery Manager for PostgreSQL)
# Further information on the Barman project at www.pgbarman.org
# IMPORTANT: Please do not edit this file as it is managed by Puppet!
# Global options

[barman]
barman_home = /var/lib/barman
barman_user = barman
log_file = /var/log/barman/barman.log
compression = gzip
backup_options = exclusive_backup
minimum_redundancy = 0
retention_policy =
retention_policy_mode = auto
wal_retention_policy = main
configuration_files_directory = /etc/barman.conf.d

The next step is running a PostgreSQL instance on pg. We must be aware of the parameters required by Barman on the PostgreSQL server, so we need to set:

  • wal_level at least at archive level
  • archive_mode to on
  • archive_command so that the WALs can be copied on backup
  • a rule in pg_hba.conf for access from backup

All of these parameters can be easily set through the puppetlabs/postgresql module. In addition, on the Barman server, we need:

  • a PostgreSQL connection string
  • a .pgpass file for authentication
  • an SSH command
  • to perform the SSH key exchange

it2ndq/barman generates a private/public keypair in ~barman/.ssh. However, automatically exchanging the keys between the servers requires the presence of a Puppet Master which is beyond the objectives of this tutorial (it will be part of the next instalment, which will focus on the setup of a Puppet Master and the barman::autoconfigure class) – therefore this last step will be performed manually.

We edit the site.pp file as follows:

node backup {
  class { 'barman':
    manage_package_repo => true,
  }
  barman::server { 'test-server':
    conninfo    => 'user=postgres host=192.168.56.221',
    ssh_command => 'ssh postgres@192.168.56.221',
  }
  file { '/var/lib/barman/.pgpass':
    ensure  => 'present',
    owner   => 'barman',
    group   => 'barman',
    mode    => 0600,
    content => '192.168.56.221:5432:*:postgres:insecure_password',
  }
}

node pg {
  class { 'postgresql::server':
    listen_addresses     => '*',
    postgres_password    => 'insecure_password',
    pg_hba_conf_defaults => false,
  }
  postgresql::server::pg_hba_rule { 'Local access':
    type        => 'local',
    database    => 'all',
    user        => 'all',
    auth_method => 'peer',
  }
  postgresql::server::pg_hba_rule { 'Barman access':
    type        => 'host',
    database    => 'all',
    user        => 'postgres',
    address     => '192.168.56.222/32',
    auth_method => 'md5',
  }
  postgresql::server::config_entry {
    'wal_level':       value => 'archive';
    'archive_mode':    value => 'on';
    'archive_command': value => 'rsync -a %p barman@192.168.56.222:/var/lib/barman/test-server/incoming/%f';
  }
  class { 'postgresql::server::contrib':
    package_ensure => 'present',
  }
}

Having changed the manifest, the provisioning step has to be rerun:

$ vagrant provision

With the machines running, we can proceed with the key exchanges. We log into pg:

$ vagrant ssh pg

and we create the keypair for the postgres user, using ssh-keygen, leaving every field empty when prompted (so always pressing enter):

vagrant@pg:~$ sudo -iu postgres
postgres@pg:~$ ssh-keygen
postgres@pg:~$ cat .ssh/id_rsa.pub

The last command outputs a long alphanumeric string that has to be appended to the ~barman/.ssh/authorized_keys file on backup.

$ vagrant ssh backup
vagrant@backup:~$ sudo -iu barman
barman@backup:~$ echo "ssh-rsa ...">>.ssh/authorized_keys

Similarly, we copy the public key of the barman user into the authorized_keys file of the postgres user on pg:

barman@backup:~$ cat .ssh/id_rsa.pub
ssh-rsa ...
barman@backup:~$ logout
vagrant@backup:~$ logout
$ vagrant ssh pg
vagrant@pg:~$ sudo -iu postgres
postgres@pg:~$ echo "ssh-rsa ...">>.ssh/authorized_keys

At this point, we make a first connection in both directions between the two servers:

postgres@pg:$ ssh barman@192.168.56.222
barman@backup:$ ssh postgres@192.168.56.221

We can run barman check to verify that Barman is working correctly:

barman@backup:~$ barman check all
Server test-server:
        ssh: OK
        PostgreSQL: OK
        archive_mode: OK
        archive_command: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        compression settings: OK
        minimum redundancy requirements: OK (have 0 backups, expected at least 0)

Every line should read “OK”. Now, to perform a backup, simply run:

barman@backup:$ barman backup test-server

A realistic configuration

The Barman configuration used so far is very simple, but you can easily add a few parameters to site.pp and take advantage of all the features of Barman, such as the retention policies and the new incremental backup available in Barman 1.4.0.

We conclude this tutorial with a realistic use case, with the following requirements:

  • a backup every night at 1:00am
  • the possibility of performing a Point In Time Recovery to any moment of the last week
  • always having at least one backup available
  • reporting an error via barman check in case the newest backup is older than a week
  • enabling incremental backup to save disk space

We use the Puppet file resource to create a .pgpass file with the connection parameters and a cron resource to generate the job to run every night. Finally, we edit the barman::server resource to add the required Barman parameters.

The end result is:

node backup {
  class { 'barman':
    manage_package_repo => true,
  }
  barman::server { 'test-server':
    conninfo                => 'user=postgres host=192.168.56.221',
    ssh_command             => 'ssh postgres@192.168.56.221',
    retention_policy        => 'RECOVERY WINDOW OF 1 WEEK',
    minimum_redundancy      => 1,
    last_backup_maximum_age => '1 WEEK',
    reuse_backup            => 'link',
  }
  file { '/var/lib/barman/.pgpass':
    ensure  => 'present',
    owner   => 'barman',
    group   => 'barman',
    mode    => 0600,
    content => '192.168.56.221:5432:*:postgres:insecure_password',
  }
  cron { 'barman backup test-server':
    command => '/usr/bin/barman backup test-server',
    user    => 'barman',
    hour    => 1,
    minute  => 0,
  }
}
node pg {
  class { 'postgresql::server':
    listen_addresses     => '*',
    postgres_password    => 'insecure_password',
    pg_hba_conf_defaults => false,
  }
  postgresql::server::pg_hba_rule { 'Local access':
    type        => 'local',
    database    => 'all',
    user        => 'all',
    auth_method => 'peer',
  }
  postgresql::server::pg_hba_rule { 'Barman access':
    type        => 'host',
    database    => 'all',
    user        => 'postgres',
    address     => '192.168.56.222/32',
    auth_method => 'md5',
  }
  postgresql::server::config_entry {
    'wal_level':       value => 'archive';
    'archive_mode':    value => 'on';
    'archive_command': value => 'rsync -a %p barman@192.168.56.222:/var/lib/barman/test-server/incoming/%f';
  }
}

Conclusion

With 51 lines of Puppet manifest we managed to configure a pair of PostgreSQL/Barman servers with settings similar to those we might want on a production server. We have combined the advantages of having a Barman server to handle backups with those of having an infrastructure managed by Puppet, reusable and versionable.

In the next and final post in this series of articles we will look at how to use a Puppet Master to export resources between different machines, thus allowing the VMs to exchange the parameters required for correct functioning via the barman::autoconfigure class, making the whole setup process easier.


Hubert 'depesz' Lubaczewski: Waiting for 9.5 – Use 128-bit math to accelerate some aggregation functions.

On 20th of March, Andres Freund committed patch: Use 128-bit math to accelerate some aggregation functions.   On platforms where we support 128bit integers, use them to implement faster transition functions for sum(int8), avg(int8), var_*(int2/int4),stdev_*(int2/int4). Where not supported continue to use numeric as a transition type.   In some synthetic benchmarks this has been shown […]
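As a rough illustration (the table and row count are invented), these are the kinds of aggregates that get the faster 128-bit transition state on supported platforms:

-- Hypothetical micro-benchmark: sum/avg over bigint, var/stddev over int4,
-- exactly the aggregates this commit speeds up on supported platforms.
CREATE TABLE ints AS
  SELECT i::int8 AS v
    FROM generate_series(1, 5000000) AS i;

SELECT sum(v), avg(v) FROM ints;                            -- int8 aggregates
SELECT var_samp(v::int4), stddev_samp(v::int4) FROM ints;   -- int2/int4 aggregates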

Hubert 'depesz' Lubaczewski: Waiting for 9.5 – Allow foreign tables to participate in inheritance. – A.K.A. PostgreSQL got sharding.

On 22nd of March, Tom Lane committed patch: Allow foreign tables to participate in inheritance.   Foreign tables can now be inheritance children, or parents. Much of the system was already ready for this, but we had to fix a few things of course, mostly in the area of planner and executor handling of row […]
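As a hedged sketch of what this enables (server, host, and table names are all invented), a foreign table can now be an inheritance child of a local parent, which is the basic building block for sharding with postgres_fdw:

-- Invented names throughout; requires a reachable remote server.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER shard1
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'shard1.example.com', dbname 'app');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER shard1
  OPTIONS (user 'app', password 'secret');

CREATE TABLE measurement (
    logdate date NOT NULL,
    value   numeric
);

-- New in 9.5: a foreign table as an inheritance child of a local parent.
CREATE FOREIGN TABLE measurement_2015 ()
  INHERITS (measurement)
  SERVER shard1
  OPTIONS (table_name 'measurement_2015');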

Glyn Astill: Being cavalier with slony nodes and pg_dump/pg_restore

It’s generally a bad idea to do logical dump/restores of slony nodes, and for this reason slony provides the CLONE PREPARE/CLONE FINISH action commands to clone subscriber nodes. In this instance though, I’ve a test environment where I’d stopped the slons, dumped out and dropped a subscriber database and then gone on to do some […]

Hubert 'depesz' Lubaczewski: Waiting for 9.5 – Add pg_rewind, for re-synchronizing a master server after failback.

On 23rd of March, Heikki Linnakangas committed patch: Add pg_rewind, for re-synchronizing a master server after failback.   Earlier versions of this tool were available (and still are) on github.   Thanks to Michael Paquier, Alvaro Herrera, Peter Eisentraut, Amit Kapila, and Satoshi Nagayasu for review. So, we have a situation, where we have master […]

Michael Paquier: Postgres 9.5 feature highlight: Default shutdown mode of pg_ctl to fast


This week, I wanted to share something that may impact many users of Postgres: this commit changes a behavior whose default had remained the same for a long time:

commit: 0badb069bc9f590dbc1306ccbd51e99ed81f228c
author: Bruce Momjian <bruce@momjian.us>
date: Tue, 31 Mar 2015 11:46:27 -0400
pg_ctl:  change default shutdown mode from 'smart' to 'fast'

Retain the order of the options in the documentation.

pg_ctl has three shutdown modes:

  • smart, the polite one, waits patiently for all active client connections to disconnect before shutting down the server. This has been the default mode of Postgres for ages.
  • immediate, the brute-force one, aborts all the server processes without thinking, leading to crash recovery the next time the instance is restarted.
  • fast, the middle ground, takes an intermediate approach by rolling back all existing connections and then shutting down the server.

Simply put, the "smart" mode had been kept as the default because it is the least disruptive; in particular, it will wait for a backup to finish before shutting down the server. It has been (justly) argued that it was not aggressive enough, users sometimes being surprised that a requested shutdown ends with a timeout because, for example, a connection was left open; hence the default has been switched to "fast".

This is not complicated literature; however, be careful when switching to 9.5 if you have scripts that rely on the default behavior of pg_ctl, particularly for online backups, which will be immediately terminated at shutdown with the new default.

Peter Geoghegan: Abbreviated keys for numeric to accelerate numeric sorts

Andrew Gierth's numeric abbreviated keys patch was committed recently. This commit added abbreviation/sortsupport for the numeric type (the PostgreSQL type which allows practically arbitrary precision, typically recommended for representing monetary values).

The encoding scheme that Andrew came up with is rather clever - it has an excellent tendency to concentrate entropy from the original values into the generated abbreviated keys in real world cases. As far as accelerating sorts goes, numeric abbreviation is at least as effective as the original text abbreviation scheme. I easily saw improvements of 6x-7x with representative queries that did not spill to disk (i.e. that used quicksort). In essence, the patch makes sorting numeric values almost as cheap as sorting simple integers, since that is often all that is actually required during sorting proper (the abbreviated keys compare as integers, except that the comparison is inverted to comport with how abbreviation builds abbreviated values from numerics as tuples are copied into local memory ahead of sorting - see the patch for exact details).
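A rough, hypothetical way to observe the effect yourself is an in-memory (quicksort) sort of random numeric values; the table and sizes here are arbitrary:

-- Arbitrary test data; with enough work_mem this sorts in memory
-- (quicksort), where abbreviated keys help the most.
SET work_mem = '256MB';

CREATE TABLE prices AS
  SELECT (random() * 1000000)::numeric(12,2) AS amount
    FROM generate_series(1, 1000000);

EXPLAIN ANALYZE
  SELECT amount FROM prices ORDER BY amount;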

Separately, over lunch at pgConf.US in New York, Corey Huinker complained about a slow, routine data warehousing CREATE INDEX operation that took far too long. The indexes in question were built on a single text column. I suggested that Corey check out how PostgreSQL 9.5 performs, where this operation is accelerated by text abbreviation, often very effectively.

Corey chose an organic set of data that could be taken as a reasonable proxy for how PostgreSQL behaves when he performs these routine index builds. In all cases maintenance_work_mem was set to 64MB, meaning that an external tapesort is always required - those details were consistent. This was a table with 1.8 million rows. Apparently, on PostgreSQL 9.4, without abbreviation, the CREATE INDEX took 10 minutes and 19 seconds in total. On PostgreSQL 9.5, with identical settings, it took only 51.3 seconds - a 12x improvement! This was a low cardinality pre-sorted column, but if anything that is a less compelling case for abbreviation - I think that the improvements could sometimes be even greater when using external sorts on big servers with fast CPUs. Further organic benchmarks of abbreviated key sorts are very welcome. Of course, there is every reason to imagine that abbreviation would now improve things just as much if not more with large numeric sorts that spill to disk.

Future work

With numeric abbreviation committed, and support for the "datum" case likely to be committed soon, you might assume that abbreviation as a topic on the pgsql-hackers development mailing list had more or less played out (the "datum" sort case is used by things like "SELECT COUNT(DISTINCT FOO) ..." - this is Andrew Gierth's work again).  You might now reasonably surmise that it would be nice to have support for the default B-Tree opclasses of one or two other types, like character(n), but that's about it, since clearly abbreviation isn't much use for complex/composite types - we're almost out of interesting types to abbreviate. However, I think that work on abbreviated keys is far from over. Abbreviation as a project is only more or less complete as a technique to accelerate sorting, but that's likely to only be half the story (Sorry Robert!).

I intend to undertake research on using abbreviated keys within internal B-Tree pages in the next release cycle. Apart from amortizing the cost of comparisons that are required to service index scans, I suspect that they can greatly reduce the number of cache misses by storing abbreviated keys inline in the ItemId array of internal B-Tree pages. Watch this space!


Hubert 'depesz' Lubaczewski: Waiting for 9.5 – Add support for index-only scans in GiST.

On 26th of March, Heikki Linnakangas committed patch: Add support for index-only scans in GiST.   This adds a new GiST opclass method, 'fetch', which is used to reconstruct the original Datum from the value stored in the index. Also, the 'canreturn' index AM interface function gains a new 'attno' argument. That makes it possible […]

Paul Ramsey: PostGIS 2.0.7 & 2.1.7 Released

Hubert 'depesz' Lubaczewski: Waiting for 9.5 – Add stats for min, max, mean, stddev times to pg_stat_statements.

On 27th of March, Andrew Dunstan committed patch: Add stats for min, max, mean, stddev times to pg_stat_statements.   The new fields are min_time, max_time, mean_time and stddev_time.   Based on an original patch from Mitsumasa KONDO, modified by me. Reviewed by Petr Jelínek. While pg_stat_statements provides a lot of information about statements, it's timing […]
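As a small example of how the new columns might be used (the column names come from the commit message; the ordering choice is mine), assuming the extension is installed and loaded via shared_preload_libraries:

-- Requires: CREATE EXTENSION pg_stat_statements; and the module
-- listed in shared_preload_libraries.
SELECT query,
       calls,
       mean_time,
       stddev_time,
       min_time,
       max_time
  FROM pg_stat_statements
 ORDER BY mean_time DESC
 LIMIT 10;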

David Christensen: PgConf 2015 NYC Recap

I just got back from PGConf 2015 NYC.  It was an invigorating, fun experience, both attending and speaking at the conference.

What follows is a brief summary of some of the talks I saw, as well as some insights/thoughts:

On Thursday:

"Managing PostgreSQL with Puppet" by Chris Everest.  This talk covered experiences by CoverMyMeds.com staff in deploying PostgreSQL instances and integrating with custom Puppet recipes.

"A TARDIS for your ORM - application level timetravel in PostgreSQL" by Magnus Hagander. Demonstrated how to construct a mirror schema of an existing database and manage (via triggers) a view of how data existed at some specific point in time.  This system utilized range types with exclusion constraints, views, and session variables to generate a similar-structured schema to be consumed by an existing ORM application.

"Building a 'Database of Things' with Foreign Data Wrappers" by Rick Otten.  This was a live demonstration of building a custom foreign data wrapper to control such attributes as hue, brightness, and on/off state of Philips Hue bulbs.  Very interesting live demo, nice audience response to the control systems.  Used a python framework to stub out the interface with the foreign data wrapper and integrate fully.

"Advanced use of pg_stat_statements: Filtering, Regression Testing & More" by Lukas Fittl.  Covered how to use the pg_stat_statements extension to normalize queries and locate common performance statistics for the same query.  This talk also covered the pg_query tool/library, a Ruby tool to parse/analyze queries offline and generate a JSON object representing the query.  The talk also covered the example of using a test database and the pg_stat_statements views/data to perform query analysis to theorize about planning of specific queries without particular database indexes, etc.

On Friday:

"Webscale's dead! Long live Postgres!" by Joshua Drake.  This talk covered improvements that PostgreSQL has made over the years, specific technologies that they have incorporated such as JSON, and was a general cheerleading effort about just how awesome PostgreSQL is.  (Which of course we all knew already.)  The highlight of the talk for me was when JD handed out "prizes" at the end for knowing various factoids; I ended up winning a bottle of Macallan 15 for knowing the name of the recently departed member of One Direction.  (Hey, I have daughters, back off!)

"The Elephants In The Room: Limitations of the PostgreSQL Core Technology" by Robert Haas.  This was probably the most popular talk that I attended.  Robert is one of the core members of the PostgreSQL development team, and is heavily knowledgeable in the PostgreSQL internals, so his opinions of the existing weaknesses carry some weight.  This was an interesting look forward at possible future improvements and directions the PostgreSQL project may take.  In particular, Robert looked at the IO approach Postgres currently takes and posits a Direct IO idea to give Postgres more direct control over its own IO scheduling, etc.  He also mentioned the on-disk format being somewhat suboptimal, Logical Replication as an area needing improvement, infrastructure needed for Horizontal Scalability and Parallel Query, and integrating Connection Pooling into the core Postgres product.

"PostgreSQL Performance Presentation (9.5devel edition)" by Simon Riggs.  This talk covered some of the improvements in 9.5 HEAD, in particular the BRIN index type, an improvement in some cases over the standard btree index method.  Additional metrics were shown and tested as well, which demonstrated Postgres 9.5's additional performance improvements over the current version.

"Choosing a Logical Replication System" by David Christensen.  As the presenter of this talk, I was also naturally required to attend as well.  This talk covered some of the existing logical replication systems including Slony and Bucardo, and broke down situations where each has strengths.

"The future of PostgreSQL Multi-Master Replication" by Andres Freund.  This talk primarily covered the upcoming BDR system, as well as the specific infrastructure changes in PostgreSQL needed to support these features, such as logical log streaming.  It also looked at the performance characteristics of this system.  The talk also wins for the most quote-able line of the conference:  "BDR is spooning Postgres, not forking", referring to the BDR project's commitment to maintaining the code in conjunction with core Postgres and gradually incorporating this into core.

As part of the closing ceremony, there were lightning talks as well; quick-paced talks (maximum of 5 minutes) which covered a variety of interesting, fun and sometimes silly topics.  In particular some memorable ones were one about using Postgres/PostGIS to extract data about various ice cream-related check-ins on Foursquare, as well as one which proposed a generic (albeit impractical) way to search across all text fields in a database of unknown schema to find instances of key data.

As always, it was good to participate in the PostgreSQL community, and I look forward to seeing everyone again at future conferences.

Andrew Dunstan: Fun with Raspberry Pi 2 and the buildfarm

Here's a picture of my two Raspberry Pi 2 boxes, both running headless and wireless.


One is running Raspbian, installed via NOOBS, and the other Fidora, a remix of Fedora 21 for Raspberry Pi 2. It turned out that Pidora doesn't work on the Raspberry Pi 2, a fact that is extremely well hidden on the Raspberry Pi web site.

I have set up test buildfarm animals on both of these. But something odd is happening. They are both getting intermittent failures of the stats regression test. Sometimes it happens during "make check", sometimes during "make installcheck" and sometimes during testing of pg_upgrade (which in turn runs "make installcheck").

These machines are not anything like speed demons. Far from it. But we also run other slow machines without getting this happening all the time. So I'm a bit perplexed about what might be going on.

Incidentally, if you want to play with one of these, I do recommend getting a starter kit from Amazon or elsewhere. It's probably cheaper than buying everything separately, and gets you everything you need to get started. Well worth the $69.99.

Hubert 'depesz' Lubaczewski: Waiting for 9.5 – psql: add asciidoc output format

On 31st of March, Bruce Momjian committed patch: psql: add asciidoc output format   Patch by Szymon Guz, adjustments by me   Testing by Michael Paquier, Pavel Stehule To be honest, when Szymon posted first mail about asciidoc – it was the first time I heard about it. Immediately I thought: “why not markdown? or […]