Kirk Roybal: DFW PUG Meetup November 5, 2014

October 6, 2014, 8:53 am

Our topic for November is “If you know PostgreSQL, then you know Big Data”. The HUGEdata Tech Team and Principal Data Scientist will share an overview of our scale-out SQL database that leverages PG admin client access. We will provide a demonstration that includes a marketing example for customer segmentation. We’ll also talk about machine data, the internet of things, and other use cases for Big Data. And we’d love to interact with the group on the challenges they’ve faced scaling Postgres and their ideas on how to position our Analytics Platform to the community.

What: HugeData and PostgreSQL

Who: Beth Lahaie and the HugeData team

When: Wednesday, November 5, 2014

Where:
Improving Enterprises
16633 Dallas Parkway Suite 110 Addison, TX 75001

DFW PUG on Meetup

Andrew Dunstan: pg_repack pitfalls

October 6, 2014, 2:06 pm

pg_repack is a terrific tool for allowing you to reorganize a table without needing to hold long-running strong locks on the table. That means that your normal inserts, updates and deletes can continue to run against the table while the reorganization is proceeding.

I have had clients who have run into problems with it, however. In particular, it is possible to get it wedged so that the table is inaccessible and nothing can proceed unless you kill either the repack operation or whatever is blocking it. Here is a simple example of how to cause problems.

In session 1, do:

pg_reorg -e -t foo dbname

and in session 2 in psql do:

begin; select pg_sleep(10); lock table foo; rollback;

The sleep gets us past the time when pg_reorg is setting up, so the LOCK TABLE is issued while pg_reorg is doing its CREATE TABLE ... AS SELECT .... When that CREATE TABLE statement finishes, both sessions will be wedged. Session 2 will hang because it is unable to lock the table, since pg_reorg's other session holds a weak lock on the table. And since session 2's pending lock request is queued ahead of any later lock requests on the table, nothing, including pg_reorg, will be able to do anything with the table.

The solution is to make sure that nothing holds or even tries to obtain any strong long running locks on the table.

One useful thing is to use the check_postgres.pl monitor script to look for things like long running transactions and processes waiting for locks.

Or you can create a more customized test to look for this exact situation.
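As a rough sketch of such a customized test, the Python below checks rows shaped like those returned by `SELECT pid, relation::regclass, mode, granted FROM pg_locks WHERE locktype = 'relation'`. That query, and how you would feed its rows in from a live connection, are assumptions and are not shown. The sketch reports which waiting request conflicts with which granted lock, using the table-level lock conflict table from the PostgreSQL documentation. It only compares waiters against granted locks. The real lock manager also makes later requests queue behind earlier waiters, and this sketch does not model that. The `find_blockers` name and the dict row format are hypothetical.

```python
# Table-level lock modes as spelled in pg_locks.mode, mapped to the modes
# each one conflicts with (per the PostgreSQL table-level lock conflict table).
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "RowShareLock": {"ExclusiveLock", "AccessExclusiveLock"},
    "RowExclusiveLock": {"ShareLock", "ShareRowExclusiveLock",
                         "ExclusiveLock", "AccessExclusiveLock"},
    "ShareUpdateExclusiveLock": {"ShareUpdateExclusiveLock", "ShareLock",
                                 "ShareRowExclusiveLock", "ExclusiveLock",
                                 "AccessExclusiveLock"},
    "ShareLock": {"RowExclusiveLock", "ShareUpdateExclusiveLock",
                  "ShareRowExclusiveLock", "ExclusiveLock",
                  "AccessExclusiveLock"},
    "ShareRowExclusiveLock": {"RowExclusiveLock", "ShareUpdateExclusiveLock",
                              "ShareLock", "ShareRowExclusiveLock",
                              "ExclusiveLock", "AccessExclusiveLock"},
    "ExclusiveLock": {"RowShareLock", "RowExclusiveLock",
                      "ShareUpdateExclusiveLock", "ShareLock",
                      "ShareRowExclusiveLock", "ExclusiveLock",
                      "AccessExclusiveLock"},
    "AccessExclusiveLock": {"AccessShareLock", "RowShareLock",
                            "RowExclusiveLock", "ShareUpdateExclusiveLock",
                            "ShareLock", "ShareRowExclusiveLock",
                            "ExclusiveLock", "AccessExclusiveLock"},
}


def find_blockers(locks):
    """Return (waiting_pid, blocking_pid, relation) triples.

    `locks` is a list of dicts with keys pid, relation, mode and granted,
    mirroring the pg_locks columns of the same names (hypothetical format).
    """
    blockers = []
    for waiter in locks:
        if waiter["granted"]:
            continue  # only ungranted requests are waiting
        for holder in locks:
            if (holder["granted"]
                    and holder["relation"] == waiter["relation"]
                    and holder["pid"] != waiter["pid"]
                    and holder["mode"] in CONFLICTS[waiter["mode"]]):
                blockers.append((waiter["pid"], holder["pid"], waiter["relation"]))
    return blockers


if __name__ == "__main__":
    # Hypothetical snapshot resembling the example above: pid 101 (pg_reorg)
    # holds a weak lock (AccessShareLock is an assumed mode), and pid 202
    # (psql) waits for the AccessExclusiveLock that LOCK TABLE requests.
    snapshot = [
        {"pid": 101, "relation": "foo", "mode": "AccessShareLock", "granted": True},
        {"pid": 202, "relation": "foo", "mode": "AccessExclusiveLock", "granted": False},
    ]
    print(find_blockers(snapshot))  # [(202, 101, 'foo')]
```

A monitoring job could run this periodically and alert whenever a relation being repacked shows up in the result.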

Most importantly, you need to be aware that problems can occur, and to protect against them happening in the first place.

gabrielle roth: RDS: Three weeks in

October 6, 2014, 9:55 pm

I’ve spent the past few weeks learning my way around Amazon’s RDS offering (specifically Postgres, and a bit of elasticache). It’s a mixed bag so far; for every feature I think “Hey, this is neat!” I find at least one or two others that are not so thrilling. One of the things that may annoy […]

Joshua Drake: Don't kill yourself

October 7, 2014, 10:11 am

As a PostgreSQL consultant you end up working with a lot of different types of clients, and these clients tend to all have different requirements. One client may need high availability, another needs a DBA, and yet another is in desperate need of being hit with a clue stick. While it is true that there can be difficult clients, there is no bad client.

What!!! Surely you can't be serious?
Don't call me Shirley.
I am absolutely serious.

A bad client is only a reflection of a consultant's inability to manage that client. It is true that there are difficult clients. They set unrealistic expectations, try to lowball you with things like "We can get your expertise for 30.00/hr from India", or, my favorite, call you directly after hours to "chat".

How are these not bad clients? They are not bad clients because it is you who controls the relationship with the client. As the consultant, you have to set proper boundaries with the client to ensure that the relationship as a whole is positive and profitable. If you can't manage that relationship you have two choices:

Hire someone who can
Fire the client

Woah! Fire the client? Yes. Terminate the relationship with the client.

It always amazes me how many people can't fathom the idea of firing a client. It is treated as some sacred vow: a client can fire you, but you are left holding the bag, and somehow that bag is filled with the feces of some dog and you are expected to light it on fire and leave it on the porch of some unsuspecting high-school football coach.[1]

The counter argument to this is usually "I need the money". This is a valid argument but do you need the money so badly that you are willing to sacrifice your health or your relationships? It is astonishing how many consultants are willing to do exactly that. In the words of the legendary band Big Fun, "Suicide, don't do it"[2].

The better you manage a client, the better the relationship. Good luck!

[1] http://en.wikipedia.org/wiki/All_the_Right_Moves_(film)
[2] https://www.youtube.com/watch?v=i-w1GeH8KPU

Jim Mlodgenski: PostgreSQL Dollar Quoting

October 7, 2014, 10:25 am

I recently attended an excellent meetup about Redshift, and one of the presenter's comments was about the trouble of running the UNLOAD command. The problem was that UNLOAD takes an SQL statement as a parameter, but if that SQL statement contains string literals, you need to escape all of their quotes, which makes the statement fairly unreadable.

We can see an example of this in PostgreSQL using the dblink extension:

SELECT *
  FROM dblink('dbname=postgres', 'SELECT * FROM test WHERE b = ''2014-02-02''')
    AS t(a int, b date);

Since Redshift is a derivative of PostgreSQL, the dollar quoting syntax also works. Dollar quoting is a non-standard way of denoting string constants, but it makes things much simpler to read.

SELECT *
  FROM dblink('dbname=postgres', $$ SELECT * FROM test WHERE b = '2014-02-02' $$)
    AS t(a int, b date);
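To make the difference concrete, here is a small Python sketch that builds both forms of the string passed to dblink above. The helper names `quote_literal` and `dollar_quote` are hypothetical and are not a PostgreSQL or Redshift API. The sketch assumes standard_conforming_strings is on, so doubling single quotes is the only escaping the standard form needs. A dollar-quoted string ends at the first occurrence of its closing tag, so the sketch picks a tag that cannot appear earlier than intended.

```python
def quote_literal(body):
    """Standard SQL quoting: wrap in single quotes and double embedded ones.

    Assumes standard_conforming_strings = on (backslashes are not special).
    """
    return "'" + body.replace("'", "''") + "'"


def dollar_quote(body, base="q"):
    """Dollar-quote `body`, choosing a tag that cannot end the string early.

    The closing tag must first occur exactly where it is placed after the
    body; otherwise try $q1$, $q2$, ... until one fits.
    """
    tag = "$$"
    n = 0
    while (body + tag).find(tag) != len(body):
        n += 1
        tag = "$%s%d$" % (base, n)
    return tag + body + tag


if __name__ == "__main__":
    query = "SELECT * FROM test WHERE b = '2014-02-02'"
    print(quote_literal(query))    # 'SELECT * FROM test WHERE b = ''2014-02-02'''
    print(dollar_quote(query))     # $$SELECT * FROM test WHERE b = '2014-02-02'$$
    print(dollar_quote("a $$ b"))  # $q1$a $$ b$q1$
```

Only the first form needs any rewriting of the original statement. The dollar-quoted form embeds the query verbatim, which is what makes it easier to read.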

Keith Fiske: A Small Database Does Not Mean Small shared_buffers

October 8, 2014, 11:25 am

As a follow-up to my previous blog post, A Large Database Does Not Mean Large shared_buffers, I had some more interesting findings recently when applying the queries in that post to another client. I assume you have already read that one and won't repeat what I explained previously, so if you haven't read it and aren't familiar with the pg_buffercache extension, I highly recommend you go read it first.

Another mantra often heard in PostgreSQL circles is that you usually don’t want to set shared_buffers higher than 8GB. I will admit that for the majority of clients, that is great advice and a good starting point (and a whole lot more useful than the default 32MB). There are also issues around double-buffering and allowing the kernel to do what it can probably do better than PostgreSQL as far as managing page reads/writes (a topic way out of the scope of this blog post). But if you investigate further into how PostgreSQL is using its shared memory and what your high-demand data blocks actually are, you may find a benefit in setting it higher, especially when you can clearly see what PostgreSQL thinks it needs most often, or if you can just fit the whole thing into memory, as I stated before.

The client in these examples has shared_buffers set to 24GB, and the total database size is 145GB (111GB in the primary, followed by 28GB, 5GB, 270MB & 150MB). I say small in the title of this post, but large and small are relative terms, and for my typical work this is a small database. A setting that is 17% of the total size is also larger than normal, so along with being a catchy follow-up name, the results do fit the title.

So I ran the basic query at the end of my previous post to see what the “ideal” minimum is. I ran it several times over about a half-hour period and, unlike the databases in my previous post, the result did not deviate much.

database=# SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_buffercache b
WHERE usagecount >= 3;
 ideal_shared_buffers 
----------------------
 18 GB

Much higher than I previously encountered and with a much smaller database too. The value did deviate slightly, but it never changed from the rounded, pretty value of 18GB. So I investigated further. First the primary, 111GB database:

database=# SELECT c.relname
   , pg_size_pretty(count(*) * 8192) as buffered
   , round(100.0 * count(*) / ( SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer,1) AS buffers_percent
   , round(100.0 * count(*) * 8192 / pg_relation_size(c.oid),1) AS percent_of_relation
 FROM pg_class c
 INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
 INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
 WHERE pg_relation_size(c.oid) > 0
 GROUP BY c.oid, c.relname
 ORDER BY 3 DESC
 LIMIT 10;
          relname          | buffered | buffers_percent | percent_of_relation 
---------------------------+----------+-----------------+---------------------
 group_members             | 8697 MB  |            35.4 |                73.9
 order_items               | 1391 MB  |             5.7 |               100.0
 orders                    | 1258 MB  |             5.1 |               100.0
 users                     | 812 MB   |             3.3 |               100.0
 units                     | 801 MB   |             3.3 |               100.0
 images                    | 599 MB   |             2.4 |                71.5
 group_members_user_id_idx | 481 MB   |             2.0 |                10.9
 user_list_map             | 264 MB   |             1.1 |               100.0
 products                  | 202 MB   |             0.8 |               100.0

Many of the large tables had a significant portion of themselves in shared buffers. I looked at the top table here to see whether it might be having trouble keeping its high-demand blocks in memory:

database=# SELECT pg_size_pretty(count(*) * 8192)
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.oid::regclass = 'group_members'::regclass
AND usagecount >= 3;
 pg_size_pretty 
----------------
 6606 MB

Actually looks ok. It’s got about 2GB of space to be able to swap out lower priority blocks for higher ones if needed. How about those next two 100% tables?

database=# SELECT pg_size_pretty(count(*) * 8192)
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.oid::regclass = 'order_items'::regclass
AND usagecount >= 3;
 pg_size_pretty 
----------------
 1391 MB
(1 row)

database=# SELECT pg_size_pretty(count(*) * 8192)
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.oid::regclass = 'orders'::regclass
AND usagecount >= 3;
 pg_size_pretty 
----------------
 1258 MB
(1 row)

I actually raised the usagecount threshold for both of these tables all the way up to 5, and that only lowered the amount by 2-3MB. So these are some pretty heavily used tables. For a client that does online order processing, this would seem to make sense given what these tables hold. But it could also indicate a problem: there may be queries doing a whole lot of sequential scans on these tables that don't need to. If that’s not readily apparent in the code accessing the database, I would then suggest turning to something like pgbadger for more in-depth query analysis to see where the problems may be.
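The arithmetic behind these pg_buffercache queries can be sketched in Python as shown below. The sketch makes three assumptions. It uses the default 8kB block size. It treats the shared_buffers setting as a count of 8kB pages, which is how pg_settings reports it. It takes a hypothetical list of per-buffer usagecount values for one relation in place of real pg_buffercache rows. The `buffer_stats` name is illustrative only. Python's round() rounds exact halves to even, unlike PostgreSQL's numeric round(), so ties could differ in the last digit.

```python
BLOCK_SIZE = 8192  # default PostgreSQL block size in bytes (assumed)


def buffer_stats(usagecounts, shared_buffers_pages, relation_bytes, min_usage=0):
    """Mimic the buffered / buffers_percent / percent_of_relation columns.

    usagecounts: one usagecount value per shared buffer holding a page of
    the relation (a stand-in for rows from pg_buffercache).
    min_usage: only count buffers with usagecount >= min_usage, like the
    "AND usagecount >= 3" filter in the queries above.
    """
    n = sum(1 for u in usagecounts if u >= min_usage)
    buffered_bytes = n * BLOCK_SIZE
    buffers_percent = round(100.0 * n / shared_buffers_pages, 1)
    percent_of_relation = round(100.0 * buffered_bytes / relation_bytes, 1)
    return buffered_bytes, buffers_percent, percent_of_relation


if __name__ == "__main__":
    # shared_buffers = 24GB expressed in 8kB pages, as in this post.
    sb_pages = 24 * 1024**3 // BLOCK_SIZE  # 3145728
    # order_items: 1391 MB buffered, assumed here to be the whole relation.
    pages = 1391 * 1024**2 // BLOCK_SIZE
    print(buffer_stats([5] * pages, sb_pages, pages * BLOCK_SIZE))
    # (1458569216, 5.7, 100.0)

    # Toy relation of 5 pages in a 10-buffer cache, filtered at usagecount >= 3.
    print(buffer_stats([5, 5, 3, 1, 0], 10, 5 * BLOCK_SIZE, min_usage=3))
    # (24576, 30.0, 60.0)
```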

You may have noticed this doesn’t account for all the memory usage seen in the first query. Time to dive into the other databases, starting with the 28GB one.

database=# \c mailer 
mailer=# SELECT c.relname
  , pg_size_pretty(count(*) * 8192) as buffered
  , round(100.0 * count(*) / ( SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer,1) AS buffers_percent
  , round(100.0 * count(*) * 8192 / pg_relation_size(c.oid),1) AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE pg_relation_size(c.oid) > 0
GROUP BY c.oid, c.relname
ORDER BY 3 DESC
LIMIT 10;
            relname             | buffered | buffers_percent | percent_of_relation 
--------------------------------+----------+-----------------+---------------------
 messages_pkey                  | 1769 MB  |             7.2 |                88.7
 messages                       | 1200 MB  |             4.9 |               100.0
 subject_text                   | 261 MB   |             1.1 |                41.9
 messages_mailing_id_idx        | 259 MB   |             1.1 |                15.4
 subject_text_pkey              | 104 MB   |             0.4 |               100.0
 messages_created_at_idx        | 26 MB    |             0.1 |                 1.2
 messages_recipient_id_idx      | 30 MB    |             0.1 |                 1.7
 pg_attrdef_adrelid_adnum_index | 16 kB    |             0.0 |               100.0
 pg_index_indrelid_index        | 40 kB    |             0.0 |                35.7
 pg_namespace_oid_index         | 16 kB    |             0.0 |               100.0
(10 rows)

That primary key is taking up a lot of space and almost all of it seems to be in memory. But again, how much of it is really high usage?

mailer=# SELECT pg_size_pretty(count(*) * 8192)
 FROM pg_class c
 INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
 INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
 WHERE c.oid::regclass = 'messages_pkey'::regclass
 AND usagecount >= 3;
 pg_size_pretty 
----------------
 722 MB

Not nearly as much as is in shared_buffers. So no justification for an increase here. How about the messages table?

mailer=# SELECT pg_size_pretty(count(*) * 8192)
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE c.oid::regclass = 'messages'::regclass
AND usagecount >= 5;
 pg_size_pretty 
----------------
 1200 MB

The whole thing is in very high demand! And there’s plenty of space for it to be there. Most of the remaining space was taken up by a similar table in yet another of the databases in the cluster.

So this PostgreSQL cluster seems to have some pretty good justification for having a shared_buffers 3x higher than what is typically suggested. It’s not actually using all of what’s available (only 18 of 24GB) and there’s still a significant amount in shared_buffers that’s got a usagecount below 3. My guidance to the client was to leave shared_buffers where it was, but to keep an eye on the tables like orders, order_items & messages. If the high usage of those tables is justified and they start increasing in size significantly, then this evaluation should be done again to see if shared_buffers should possibly be increased to keep that high demand data readily available in memory.

The pg_buffercache extension has been a great help with fine-tuning one of the more important settings in PostgreSQL. Hopefully this helps clarify how to evaluate shared_buffers usage and figure out an ideal setting. And to be honest, I’m hoping that someone who reads this is in a position to better experiment with actually changing the shared_buffers value in situations like this, to see if it really can make a difference in performance. As someone commented on my previous post, shared_buffers is a pretty invasive setting to change, not only because it requires a restart, but because you don’t want to screw up your performance on an active production machine. But you need the kind of activity that will be on an active production machine to accurately evaluate such settings, and reproducing such activity outside of production is really challenging.

So, I'm looking for feedback, and for anyone else to either validate or find fault with my experiments so far.

Mark Wong: Loading Tables and Creating B-tree and Block Range Indexes

October 3, 2014, 2:14 pm

I have been looking at the new Block Range Indexes (BRIN) being developed for PostgreSQL 9.5. BRIN indexes are designed to provide similar benefits to partitioning, especially for large tables, just without the need to declare partitions. That sounds pretty good but let’s look in greater detail to see if it lives up to the hype.

How large? Here’s one data point. Using the dbgen tool provided with TPC Benchmark(TM) H, we created data for the lineitem table at the 10GB scale factor, which results in a 7.2GB text file.

We’re going to compare a couple of basic tasks. The first look will be at the impact of inserting data into a table using the COPY command. We will do a simple experiment of creating a table without any indexes or constraints on it and time how long it takes to load the lineitem data. Then repeat with a B-tree index on one column. And finally repeat again with a BRIN index instead of a B-tree index on the same column.

The above bar plot shows the average times over five measurements. Our baseline of loading the lineitem table without any indexes averaged 5.1 minutes. Once a B-tree index was added to the l_shipdate DATE column, the average load time increased to 9.4 minutes, or by 85%. When the B-tree index was replaced by a BRIN index, the load time only increased to 5.6 minutes, or by 11%.

The next experiment measures the average time it takes to create a B-tree index on a table that is already populated with data, and then repeats that with a BRIN index. This is done on the same l_shipdate DATE column and repeated for a total of five measurements each.

The B-tree index took 95 seconds to build, whereas the BRIN index took 18 seconds, an 80% improvement.

That’s very encouraging. A single BRIN index adds only 11% overhead when loading data into a table, and it reduced the total load time by 40% compared with having a B-tree index. And creating a new BRIN index takes only 20% of the time that a new B-tree index would take. We have more experiments lined up to see where else BRIN indexes may or may not benefit us.
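The percentages above follow from simple ratios of the reported averages. Below is a quick Python check that uses the rounded figures from this post: 5.1, 9.4 and 5.6 minutes, and 95 and 18 seconds. Because the inputs are rounded, the results land close to the quoted 85%, 11%, 40% and 80%, but not always exactly on them. For example, going from 5.1 to 5.6 minutes works out to about 9.8%. The helper names are illustrative.

```python
def pct_increase(base, new):
    """Percentage increase going from base to new."""
    return 100.0 * (new - base) / base


def pct_reduction(base, new):
    """Percentage reduction going from base to new."""
    return 100.0 * (base - new) / base


if __name__ == "__main__":
    no_index, btree, brin = 5.1, 9.4, 5.6  # average load times, minutes
    btree_build, brin_build = 95, 18       # average index build times, seconds
    print(round(pct_increase(no_index, btree), 1))           # 84.3
    print(round(pct_increase(no_index, brin), 1))            # 9.8
    print(round(pct_reduction(btree, brin), 1))              # 40.4
    print(round(100.0 * brin_build / btree_build, 1))        # 18.9
    print(round(pct_reduction(btree_build, brin_build), 1))  # 81.1
```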

The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement n°318633 – the AXLE project – http://www.axleproject.eu

Greg Sabino Mullane: Postgres copy schema with pg_dump

October 9, 2014, 8:44 am

Manny Calavera (animated by Lua!)
Image by Kitt Walker

Someone on the #postgresql IRC channel was asking how to make a copy of a schema; presented here are a few solutions and some wrinkles I found along the way. The goal is to create a new schema based on an existing one, in which everything is an exact copy. For all of the examples, 'alpha' is the existing, data-filled schema, and 'beta' is the newly created one. It should be noted that creating a copy of an entire database (with all of its schemas) is very easy: CREATE DATABASE betadb TEMPLATE alphadb;

The first approach for copying a schema is the "clone_schema" plpgsql function written by Emanuel Calvo. Go check it out, it's short. Basically, it gets a list of tables from the information_schema and then runs CREATE TABLE statements of the format CREATE TABLE beta.foo (LIKE alpha.foo INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING DEFAULTS). This is a pretty good approach, but it does leave out many types of objects, such as functions, domains, FDWs, etc. as well as having a minor sequence problem. It's also slow to copy the data, as it creates all of the indexes before populating the table via INSERT.

My preferred approach for things like this is to use the venerable pg_dump program, as it is in the PostgreSQL 'core' and its purpose in life is to smartly interrogate the system catalogs to produce DDL commands. Yes, parsing the output of pg_dump can get a little hairy, but that's always preferred to trying to create DDL yourself by parsing system catalogs. My quick solution follows.

pg_dump -n alpha | sed '1,/with_oids/ {s/ alpha/ beta/}' | psql

Sure, it's a bit of a hack in that it expects a specific string ("with_oids") to exist at the top of the dump file, but it is quick to write and fast to run; pg_dump creates the tables, copies the data over, and then adds in indexes, triggers, and constraints. (For an explanation of the sed portion, visit this post). So this solution works very well. Or does it? When playing with this, I found that there is one place in which this breaks down: assignment of ownership to certain database objects, especially functions. It turns out pg_dump will *always* schema-qualify the ownership commands for functions, even though the function definition right above it has no schema, but sensibly relies on the search_path. So you see this weirdness in pg_dump output:

--
-- Name: myfunc(); Type: FUNCTION; Schema: alpha; Owner: greg
--
CREATE FUNCTION myfunc() RETURNS text
    LANGUAGE plpgsql
    AS $$ begin return 'quick test'; end$$;

ALTER FUNCTION alpha.myfunc() OWNER TO greg;

Note the fully qualified "alpha.myfunc". This is a problem, and the sed trick above will not replace this "alpha" with "beta", nor is there a simple way to do so, without descending into a dangerous web of regular expressions and slippery assumptions about the file contents. Compare this with the ownership assignments for almost every other object, such as tables:

--
-- Name: mytab; Type: TABLE; Schema: alpha; Owner: greg
--
CREATE TABLE mytab (
    id integer
);

ALTER TABLE mytab OWNER TO greg;

No mention of the "alpha" schema at all, except inside the comment! Before going into why pg_dump is acting like that, I'll present my current favorite solution for making a copy of a schema: using pg_dump and some creative renaming:

$ pg_dump -n alpha -f alpha.schema
$ psql -c 'ALTER SCHEMA alpha RENAME TO alpha_old'
$ psql -f alpha.schema
$ psql -c 'ALTER SCHEMA alpha RENAME TO beta'
$ psql -c 'ALTER SCHEMA alpha_old RENAME TO alpha'

This works very well, with the obvious caveat that for a period of time, you don't have your schema available to your applications. Still, a small price to pay for what is most likely a relatively rare event. The sed trick above is also an excellent solution if you don't have to worry about setting ownerships.
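To see why the sed one-liner misses the function ownership line, here is a Python sketch that mimics `sed '1,/with_oids/ {s/ alpha/ beta/}'`. It replaces the first " alpha" on each line from line 1 through the first later line containing "with_oids", and leaves everything after that untouched. The dump text in the example is a hypothetical, heavily abbreviated stand-in for real pg_dump output, and `rename_in_header` is an illustrative name.

```python
def rename_in_header(dump, old="alpha", new="beta", marker="with_oids"):
    """Emulate sed '1,/marker/ {s/ old/ new/}' on a dump's text.

    As in sed, the end pattern is only checked from line 2 onward, and only
    the first occurrence on each line inside the range is replaced.
    """
    out = []
    in_range = True
    for i, line in enumerate(dump.splitlines(keepends=True)):
        if in_range:
            out.append(line.replace(" " + old, " " + new, 1))
            if i > 0 and marker in line:
                in_range = False  # the range ends after the first matching line
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical, abbreviated pg_dump output for schema alpha.
    dump = (
        "CREATE SCHEMA alpha;\n"
        "SET search_path = alpha, pg_catalog;\n"
        "SET default_with_oids = false;\n"
        "CREATE FUNCTION myfunc() RETURNS text\n"
        "    LANGUAGE plpgsql\n"
        "    AS $$ begin return 'quick test'; end$$;\n"
        "ALTER FUNCTION alpha.myfunc() OWNER TO greg;\n"
        "ALTER TABLE mytab OWNER TO greg;\n"
    )
    # The ALTER FUNCTION line still says alpha.myfunc() after the rename.
    print(rename_in_header(dump))
```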

Getting back to pg_dump, why is it schema-qualifying some ownerships, despite a search_path being used? The answer seems to lie in src/bin/pg_dump/pg_backup_archiver.c:

    /*
     * These object types require additional decoration.  Fortunately, the
     * information needed is exactly what's in the DROP command.
     */
    if (strcmp(type, "AGGREGATE") == 0 ||
        strcmp(type, "FUNCTION") == 0 ||
        strcmp(type, "OPERATOR") == 0 ||
        strcmp(type, "OPERATOR CLASS") == 0 ||
        strcmp(type, "OPERATOR FAMILY") == 0)
    {
        /* Chop "DROP " off the front and make a modifiable copy */
        char       *first = pg_strdup(te->dropStmt + 5);

Well, that's an ~~ugly~~ elegant hack and explains why the schema name keeps popping up for functions, aggregates, and operators: because their names can be tricky, pg_dump hacks apart the already existing DROP statement built for the object, which unfortunately is schema-qualified. Thus, we get the redundant (and sed-busting) schema qualification!

Even with all of that, it is still always recommended to use pg_dump when trying to create DDL. Someday Postgres will have a DDL API to allow such things, and/or commands like MySQL's SHOW CREATE TABLE, but until then, use pg_dump, even if it means a few other contortions.

Mark Wong: Index Overhead on a Growing Table

October 9, 2014, 5:35 pm

This is another simple test, continuing from last time. We will start with the same lineitem table as in the previous example and measure the time it takes to load the same 7.2GB text file repeatedly until the table size grows to about 1TB. We create a baseline with a table that has no indexes built on it, then repeat with a B-tree index on the l_shipdate DATE column, and again after replacing the B-tree index with a BRIN index.

Our baseline shows that as the table grows the time it takes to insert data also increases. The difference in the time that it takes to insert data when the table is near 1TB compared to when it is empty is about 12 seconds. With the B-tree index in place the difference increases to 84 seconds. Finally the change is only about 15 seconds with the BRIN index in place.

So over 1TB of growth, the overhead on inserting data into the lineitem table due to the size of the table alone increases by about 4.3%. A B-tree index increases that difference to 12.2%, while the BRIN index continues to look encouraging, increasing the overhead to only 4.2%.

Michael Paquier: Postgres 9.5 feature highlight: SKIP LOCKED for row-level locking

October 10, 2014, 12:10 am

SKIP LOCKED is a new feature associated with row-level locking that has been newly-introduced in PostgreSQL 9.5 by this commit:

commit: df630b0dd5ea2de52972d456f5978a012436115e
author: Alvaro Herrera <alvherre@alvh.no-ip.org>
date: Tue, 7 Oct 2014 17:23:34 -0300
Implement SKIP LOCKED for row-level locks

This clause changes the behavior of SELECT locking clauses in the
presence of locked rows: instead of causing a process to block waiting
for the locks held by other processes (or raise an error, with NOWAIT),
SKIP LOCKED makes the new reader skip over such rows.  While this is not
appropriate behavior for general purposes, there are some cases in which
it is useful, such as queue-like tables.

Catalog version bumped because this patch changes the representation of
stored rules.

Reviewed by Craig Ringer (based on a previous attempt at an
implementation by Simon Riggs, who also provided input on the syntax
used in the current patch), David Rowley, and Álvaro Herrera.

Author: Thomas Munro

Let's take for example the simple case of the following table that will be locked:

=# CREATE TABLE locked_table AS SELECT generate_series(1, 4) as id;
SELECT 4

Now a session takes a shared lock on the row of locked_table with id = 1. Taking the lock within a transaction block ensures that it is held for the duration of the tests.

=# BEGIN;
BEGIN
=# SELECT id FROM locked_table WHERE id = 1 FOR SHARE;
 id
----
  1
(1 row)

Now, the shared lock prevents any update, delete or even exclusive lock on that row from being taken in parallel. Hence the following query will wait until the transaction of the previous session finishes. In this case the query is cancelled by the user (note that the error message tells which row the query was waiting for):

=# SELECT * FROM locked_table WHERE id = 1 FOR UPDATE;
^CCancel request sent
ERROR:  57014: canceling statement due to user request
CONTEXT:  while locking tuple (0,1) in relation "locked_table"
LOCATION:  ProcessInterrupts, postgres.c:2966

There is already one way to bypass this wait phase: adding NOWAIT to the locking clause returns an error instead of waiting if there is a conflict:

=# SELECT * FROM locked_table WHERE id = 1 FOR UPDATE NOWAIT;
ERROR:  55P03: could not obtain lock on row in relation "locked_table"
LOCATION:  heap_lock_tuple, heapam.c:4542

And now comes SKIP LOCKED, which can be used to skip over locked rows when querying them:

=# SELECT * FROM locked_table ORDER BY id FOR UPDATE SKIP LOCKED;
 id
----
  2
  3
  4
(3 rows)

Note that this means the returned data is actually inconsistent, but the new clause is useful for reducing lock contention, for example on queue tables where the same rows are accessed by multiple clients simultaneously.
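To illustrate the three behaviors (waiting, NOWAIT and SKIP LOCKED) and why SKIP LOCKED suits queue tables, here is a toy Python model of row locks. It simplifies things in several ways. Every held row lock is treated as conflicting, so the model does not distinguish FOR SHARE from FOR UPDATE. Where a real backend would block, the model returns None. `RowLockTable`, `LockNotAvailable` and the `policy` argument are hypothetical names, not a PostgreSQL API. As with SKIP LOCKED, LIMIT is applied after locked rows have been skipped, so concurrent workers each get a disjoint batch.

```python
class LockNotAvailable(Exception):
    """Raised for policy='nowait', analogous to SQLSTATE 55P03."""


class RowLockTable:
    """Toy model of row locks on a table whose rows are identified by id."""

    def __init__(self, ids):
        self.ids = sorted(ids)
        self.holder = {}  # row id -> session currently holding a lock on it

    def select_for_update(self, session, limit=None, policy="wait"):
        """Lock rows in id order, like SELECT ... ORDER BY id FOR UPDATE.

        policy: "wait" returns None where a real backend would block,
        "nowait" raises LockNotAvailable, "skip_locked" skips such rows.
        """
        wanted = []
        for rid in self.ids:
            if limit is not None and len(wanted) >= limit:
                break
            owner = self.holder.get(rid)
            if owner is not None and owner != session:
                if policy == "skip_locked":
                    continue
                if policy == "nowait":
                    raise LockNotAvailable(rid)
                return None  # a real session would wait here
            wanted.append(rid)
        for rid in wanted:  # take the locks only once the statement succeeds
            self.holder[rid] = session
        return wanted

    def commit(self, session):
        """Release every lock held by `session`, as at transaction end."""
        self.holder = {r: s for r, s in self.holder.items() if s != session}


if __name__ == "__main__":
    table = RowLockTable([1, 2, 3, 4])
    print(table.select_for_update("s1", limit=1))                   # [1]
    print(table.select_for_update("s2", limit=2, policy="skip_locked"))  # [2, 3]
    print(table.select_for_update("s3", policy="skip_locked"))      # [4]
```

Each worker in the usage example claims a different batch without waiting on the others, which is the queue-table pattern the commit message mentions.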

Hubert 'depesz' Lubaczewski: Waiting for 9.5 – Implement SKIP LOCKED for row-level locks

October 10, 2014, 11:35 am

On 7th of October, Alvaro Herrera committed patch: Implement SKIP LOCKED for row-level locks This clause changes the behavior of SELECT locking clauses in the presence of locked rows: instead of causing a process to block waiting for the locks held by other processes (or raise an error, with NOWAIT), SKIP LOCKED makes the […]

Josh Berkus: New Table Bloat Query

October 10, 2014, 5:19 pm

To accompany the New Index Bloat Query, I've written a New Table Bloat Query. This also involves the launch of the pgx_scripts project on GitHub, which will include most of the "useful scripts" I talk about here, as well as some scripts from my co-workers.

The new table bloat query is different from the check_postgres.pl version in several ways:

Rewritten to use WITH statements for better maintainability and clarity
Conditional logic for old Postgres versions and 32-bit platforms taken out
Index bloat removed, since we have a separate query for that
Columns formatted to be more immediately comprehensible

In the course of building this, I found two fundamentally hard issues:

Some attributes (such as JSON and polygon fields) have no stats, so those tables can't be estimated.
There's no good way to estimate bloat for compressed (TOAST) attributes and rows.

Also, while I rewrote the query almost entirely, I am still relying on Greg's core math for estimating table size. Comparing this with the results of pgstattuple, I'm seeing an error of +/- 20%, which is pretty substantial. I'm not clear on where that error is coming from, so help improving the math is very welcome!

Results look like this:

 databasename | schemaname |    tablename    | pct_bloat | mb_bloat | table_mb
--------------+------------+-----------------+-----------+----------+----------
 members_2014 | public     | current_member  |        92 |    16.98 |   18.547
 members_2014 | public     | member_response |        87 |    17.46 |   20.000
 members_2014 | public     | archive_member  |        84 |    35.16 |   41.734
 members_2014 | public     | survey          |        57 |    28.59 |   50.188

pct_bloat is how much of the table (0 to 100) is estimated to be dead space. mb_bloat is how many megabytes of bloat are estimated to exist. table_mb is the actual size of the table in megabytes.

The suggested criteria are to list tables that either have more than 50% bloat and are bigger than 10MB, or have more than 25% bloat and are bigger than 1GB. However, you should calibrate these thresholds to your own database.
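
The two suggested thresholds are easy to apply mechanically once you have the query's output. The sample rows above are consistent with pct_bloat being roughly 100 × mb_bloat / table_mb, so this sketch recomputes it that way. That relationship, the 1GB = 1024MB conversion, and the TableBloat and needs_attention names are assumptions for illustration, not part of pgx_scripts.

```python
from dataclasses import dataclass


@dataclass
class TableBloat:
    """One row of bloat-query output (hypothetical container)."""
    tablename: str
    mb_bloat: float
    table_mb: float

    @property
    def pct_bloat(self) -> float:
        # Assumption: share of the table estimated to be dead space, 0 to 100.
        if self.table_mb <= 0:
            return 0.0
        return 100.0 * self.mb_bloat / self.table_mb


def needs_attention(t: TableBloat) -> bool:
    """Suggested criteria: >50% bloat and >10MB, or >25% bloat and >1GB."""
    small_rule = t.pct_bloat > 50 and t.table_mb > 10
    large_rule = t.pct_bloat > 25 and t.table_mb > 1024  # assumes 1GB = 1024MB
    return small_rule or large_rule


if __name__ == "__main__":
    rows = [
        TableBloat("current_member", 16.98, 18.547),
        TableBloat("member_response", 17.46, 20.000),
        TableBloat("archive_member", 35.16, 41.734),
        TableBloat("survey", 28.59, 50.188),
    ]
    for row in rows:
        print(f"{row.tablename:16} {row.pct_bloat:5.1f}% flagged={needs_attention(row)}")
```

With the sample rows, all four tables meet the first rule; the second rule only matters for tables over 1GB, where a lower percentage already means a lot of wasted space.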

Hubert 'depesz' Lubaczewski: What logging has least overhead?

October 14, 2014, 5:45 am

When working with PostgreSQL you generally want to get information about slow queries. The usual approach is to set log_min_duration_statement to some low(ish) value, run your app, and then analyze logs. But you can log to many places – flat file, flat file on another disk, local syslog, remote syslog. And – perhaps, instead of […]
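
Whichever destination you pick, the analysis step is mostly a matter of pulling the durations back out. A minimal sketch, assuming the message shape that log_min_duration_statement produces for simple-protocol queries ("duration: <ms> ms  statement: <sql>") after whatever log_line_prefix you configured. Statements run through the extended query protocol appear with an "execute" label instead and are not matched, multi-line statements are not handled, and the sample lines are made up.

```python
import re

# Matches the message logged for statements slower than
# log_min_duration_statement; any log_line_prefix before it is ignored.
DURATION_RE = re.compile(r"duration: (?P<ms>\d+(?:\.\d+)?) ms\s+statement: (?P<sql>.*)")


def slowest_statements(lines, limit=10):
    """Return (milliseconds, statement) pairs, slowest first."""
    found = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match:
            found.append((float(match.group("ms")), match.group("sql").strip()))
    found.sort(key=lambda pair: pair[0], reverse=True)
    return found[:limit]


if __name__ == "__main__":
    sample = [  # made-up log lines
        "2014-10-14 05:45:01 CEST LOG:  duration: 12.532 ms  statement: SELECT 1",
        "2014-10-14 05:45:02 CEST LOG:  duration: 1503.100 ms  statement: SELECT count(*) FROM big",
        "2014-10-14 05:45:03 CEST LOG:  checkpoint starting: time",
    ]
    for ms, sql in slowest_statements(sample):
        print(f"{ms:10.3f} ms  {sql}")
```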

Hans-Juergen Schoenig: Killing proper indexing: A neat idea

October 15, 2014, 5:35 am

After being on the road doing PostgreSQL consulting for Cybertec for over a decade, I noticed that there are a couple of ways to kill indexing entirely. One of the most favored ways is to apply functions or expressions to the column people want to filter on. It is a sure way to kill […]
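
The effect is easy to reproduce. This sketch uses SQLite from Python's standard library rather than PostgreSQL, purely so it runs without a server; the table, index and queries are made up. It compares the plan for a plain equality filter on an indexed column with the plan when the same column is wrapped in an expression. In PostgreSQL you would compare EXPLAIN output instead.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_t_x ON t (x)")
conn.executemany(
    "INSERT INTO t (x, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1000)],
)

# Bare column in the predicate: the planner can search idx_t_x.
indexed = plan_details(conn, "SELECT * FROM t WHERE x = ?", (5,))
# Same filter, but the column is wrapped in an expression: full table scan.
wrapped = plan_details(conn, "SELECT * FROM t WHERE x + 1 = ?", (6,))

print("plain:  ", indexed)
print("wrapped:", wrapped)
```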

Michael Paquier: Postgres 9.5 feature highlight: Replication slot control with pg_receivexlog

October 15, 2014, 11:20 pm

Introduced in PostgreSQL 9.4, pg_recvlogical can control the creation of the logical replication slots from which logical changes are streamed; a slot is mandatory when using logical decoding. In 9.4, pg_receivexlog has no control over the physical replication slots it may stream from (a slot ensures that the WAL segment files this utility needs are still retained on the server side). This feature has been added in 9.5 with the following commit:

commit: d9f38c7a555dd5a6b81100c6d1e4aa68342d8771
author: Andres Freund <andres@anarazel.de>
date: Mon, 6 Oct 2014 12:51:37 +0200
Add support for managing physical replication slots to pg_receivexlog.

pg_receivexlog already has the capability to use a replication slot to
reserve WAL on the upstream node. But the used slot currently has to
be created via SQL.

To allow using slots directly, without involving SQL, add
--create-slot and --drop-slot actions, analogous to the logical slot
manipulation support in pg_recvlogical.

Author: Michael Paquier

This commit simply introduces two new options, --create-slot and --drop-slot, which respectively create and drop a physical replication slot. The main difference from pg_recvlogical is that these additional actions are optional (no --start option was introduced, for backward compatibility). Be careful of a couple of things when using this feature, though. First, when a slot is created, streaming of the segment files begins immediately.

$ pg_receivexlog --create-slot --slot physical_slot -v -D ~/xlog_data/
pg_receivexlog: creating replication slot "physical_slot"
pg_receivexlog: starting log streaming at 0/1000000 (timeline 1)

The created slot can then be found in the system view pg_replication_slots:

=# select slot_name, plugin, restart_lsn from pg_replication_slots ;
   slot_name   | plugin | restart_lsn
---------------+--------+-------------
 physical_slot | null   | 0/1000000
(1 row)

Then, when dropping a slot, the process exits immediately since there is nothing to stream, and the slot is of course gone:

$ pg_receivexlog --drop-slot --slot physical_slot -v
pg_receivexlog: dropping replication slot "physical_slot"
$ psql -c 'SELECT slot_name FROM pg_replication_slots'
 slot_name
-----------
(0 rows)

Creation and deletion of the replication slot use the same replication connection as the one used for streaming, through the replication protocol commands CREATE_REPLICATION_SLOT and DROP_REPLICATION_SLOT, which results in a lightweight implementation. So do not hesitate to refer to this code when implementing your own client applications; src/bin/pg_basebackup/streamutil.c is particularly helpful.
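
At the protocol level, the two actions boil down to single commands sent on a connection opened in replication mode (for example with replication=true in the libpq connection string). This sketch only builds and validates the command text; the function names are hypothetical, and actually sending the commands is left to whatever client library you use. The name check mirrors the server's rule that slot names contain only lower-case letters, digits and underscores and fit within NAMEDATALEN - 1 (63 by default) characters.

```python
import re

# Lower-case letters, digits and underscores; at most 63 characters
# (NAMEDATALEN - 1 with the default NAMEDATALEN of 64).
_SLOT_NAME = re.compile(r"[a-z0-9_]{1,63}")


def _check_slot_name(name):
    if not _SLOT_NAME.fullmatch(name):
        raise ValueError(f"invalid replication slot name: {name!r}")
    return name


def create_physical_slot_cmd(name):
    """Replication-protocol command that creates a physical slot."""
    return f"CREATE_REPLICATION_SLOT {_check_slot_name(name)} PHYSICAL"


def drop_slot_cmd(name):
    """Replication-protocol command that drops a slot."""
    return f"DROP_REPLICATION_SLOT {_check_slot_name(name)}"


if __name__ == "__main__":
    print(create_physical_slot_cmd("physical_slot"))
    print(drop_slot_cmd("physical_slot"))
```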

Magnus Hagander: A few short notes about PostgreSQL and POODLE

October 17, 2014, 12:41 am

The POODLE attack on https (the attack is on https, while the vulnerability is in SSL, an important distinction) has received a lot of media attention lately, so I figured a (very) short writeup was necessary.

The TL;DR version is: you don't have to worry about POODLE for your PostgreSQL connections when using SSL.

The slightly longer version can be summarized by:

The PostgreSQL libpq client in all supported versions will only connect with TLSv1 and newer, which is not vulnerable.
The PostgreSQL server prior to the upcoming 9.4 version will however respond in SSLv3 (which is the vulnerable version) if the client insists on it (which a third party client can do).
To exploit POODLE, you need a client that explicitly does out-of-protocol downgrading. Something that web browsers do all the time, but very few other clients do. No known PostgreSQL client library does.
To exploit POODLE, the attacker needs to be able to modify the contents of the encrypted stream - it cannot be passively broken into. This can of course happen if the attacker can control parameters to a SQL query for example, but the control over the data tends to be low, and the attacker needs to already control the client. In the https attack, this is typically done through injecting javascript.
To exploit POODLE, there needs to be some persistent secret data at a fixed offset in each connection. This is extremely unlikely in PostgreSQL, as the protocol itself has no such data. There is a "cancel key" at the same location in each stream, but it is not reused; a new one is created for each connection. This is where the https attack typically uses the session cookie, which is both secret and at a fixed location in the request header.

For a really good writeup on the problem, see this post from PolarSSL, or this one from GnuTLS.

Pavel Stehule: styles for unicode borders are merged (PostgreSQL 9.5)

October 18, 2014, 10:25 pm

The following feature is not important for performance, but for some people it can be important for aesthetic reasons: you can now choose styles for Unicode table borders. Only two styles are available (single and double), but you can set the border, header, and column styles independently, which gives 2 × 2 × 2 = 8 combinations. On top of that there are three general border settings (\pset border 0, 1, and 2), so altogether there are 24 possible combinations of psql table output:

postgres=# \pset unicode_header_linestyle double 
Unicode border linestyle is "double".
postgres=# \pset linestyle unicode 
Line style is unicode.
postgres=# \l
                                  List of databases
   Name    │  Owner   │ Encoding │   Collate   │    Ctype    │   Access privileges   
═══════════╪══════════╪══════════╪═════════════╪═════════════╪═══════════════════════
 postgres  │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ 
 template0 │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/postgres          ↵
           │          │          │             │             │ postgres=CTc/postgres
 template1 │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/postgres          ↵
           │          │          │             │             │ postgres=CTc/postgres
(3 rows)

postgres=# \l
                            List of databases
  Name     Owner   Encoding   Collate      Ctype      Access privileges   
═════════ ════════ ════════ ═══════════ ═══════════ ═════════════════════
postgres  postgres UTF8     en_US.UTF-8 en_US.UTF-8 
template0 postgres UTF8     en_US.UTF-8 en_US.UTF-8 =c/postgres          ↵
                                                    postgres=CTc/postgres
template1 postgres UTF8     en_US.UTF-8 en_US.UTF-8 =c/postgres          ↵
                                                    postgres=CTc/postgres
(3 rows)


postgres=# \pset border 2
Border style is 2.
postgres=# \l
                                   List of databases
┌───────────┬──────────┬──────────┬─────────────┬─────────────┬───────────────────────┐
│   Name    │  Owner   │ Encoding │   Collate   │    Ctype    │   Access privileges   │
╞═══════════╪══════════╪══════════╪═════════════╪═════════════╪═══════════════════════╡
│ postgres  │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │                       │
│ template0 │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/postgres          ↵│
│           │          │          │             │             │ postgres=CTc/postgres │
│ template1 │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/postgres          ↵│
│           │          │          │             │             │ postgres=CTc/postgres │
└───────────┴──────────┴──────────┴─────────────┴─────────────┴───────────────────────┘
(3 rows)

postgres=# \pset unicode_border_linestyle double 
Unicode border linestyle is "double".
postgres=# \l
                                   List of databases
╔═══════════╤══════════╤══════════╤═════════════╤═════════════╤═══════════════════════╗
║   Name    │  Owner   │ Encoding │   Collate   │    Ctype    │   Access privileges   ║
╠═══════════╪══════════╪══════════╪═════════════╪═════════════╪═══════════════════════╣
║ postgres  │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │                       ║
║ template0 │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/postgres          ↵║
║           │          │          │             │             │ postgres=CTc/postgres ║
║ template1 │ postgres │ UTF8     │ en_US.UTF-8 │ en_US.UTF-8 │ =c/postgres          ↵║
║           │          │          │             │             │ postgres=CTc/postgres ║
╚═══════════╧══════════╧══════════╧═════════════╧═════════════╧═══════════════════════╝
(3 rows)

postgres=# \pset border 1
Border style is 1.
postgres=# \pset unicode_column_linestyle double
Unicode column linestyle is "double".
postgres=# \l
                                  List of databases
   Name    ║  Owner   ║ Encoding ║   Collate   ║    Ctype    ║   Access privileges   
═══════════╬══════════╬══════════╬═════════════╬═════════════╬═══════════════════════
 postgres  ║ postgres ║ UTF8     ║ en_US.UTF-8 ║ en_US.UTF-8 ║ 
 template0 ║ postgres ║ UTF8     ║ en_US.UTF-8 ║ en_US.UTF-8 ║ =c/postgres          ↵
           ║          ║          ║             ║             ║ postgres=CTc/postgres
 template1 ║ postgres ║ UTF8     ║ en_US.UTF-8 ║ en_US.UTF-8 ║ =c/postgres          ↵
           ║          ║          ║             ║             ║ postgres=CTc/postgres
(3 rows)
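
If one of these combinations becomes your favorite, the same \pset commands can go into ~/.psqlrc so every psql 9.5 session starts with it. A small hypothetical helper that emits those lines, restricted to the option names and values shown above:

```python
STYLES = ("single", "double")
BORDERS = (0, 1, 2)


def pset_lines(border=1, border_style="single", header_style="single", column_style="single"):
    """Return \\pset commands for a Unicode table style (hypothetical helper)."""
    if border not in BORDERS:
        raise ValueError(f"border must be one of {BORDERS}, got {border!r}")
    for style in (border_style, header_style, column_style):
        if style not in STYLES:
            raise ValueError(f"line style must be one of {STYLES}, got {style!r}")
    return [
        "\\pset linestyle unicode",
        f"\\pset border {border}",
        f"\\pset unicode_border_linestyle {border_style}",
        f"\\pset unicode_header_linestyle {header_style}",
        f"\\pset unicode_column_linestyle {column_style}",
    ]


if __name__ == "__main__":
    # Append this output to ~/.psqlrc to make the style the default.
    print("\n".join(pset_lines(2, border_style="double", header_style="double")))
```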

Hubert 'depesz' Lubaczewski: PostgreSQL + Perl + Unicode == confusion. Why?

October 19, 2014, 6:06 am

Yesterday I had an interesting discussion on IRC. A guy wanted to know why his Perl script was causing problems when dealing with Pg and Unicode characters. The discussion went sideways, I got (a bit) upset, and had to leave anyway, so I didn't finish it. But it did bother me, as for me the reasons […]

robert berry: Monitoring Postgresql with a Background Worker

October 20, 2014, 5:00 pm

Oct 21, 2014 – Portland

pgantenna and pgsampler comprise an experimental PostgreSQL monitoring framework. This post explores how they work and what problems they aim to solve.

Framework Overview

pgsampler is a background worker which collects data in a Postgresql cluster. It can log this data to CSV files or ship the metrics to a pgantenna instance over a tcp connection.

pgantenna is an application shipped as a Docker image which receives pgsampler data. It provides a web interface for live monitoring, checks for alerting conditions, and allows for psql access to a historical database of cluster metrics.

Motivation

There are a number of high quality monitoring and performance analysis tools for Postgresql. Many of these involve a remote service which connects to Postgresql as a regular client, or an application that parses log files.

The presented framework uses a background worker to ship statistics to a remote service. It aims to solve a grab bag of real or imagined problems discussed below. Of course, this approach presents its own problems and is thus best characterized as an experiment.

Live Monitoring

Data is sent from the cluster in a polling loop at one-second intervals. Individual metrics can be tuned to their desired sampling rates.
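
pgsampler itself is a C background worker; this is only a hypothetical Python model of the scheduling idea: one loop ticking once per second, with each metric sampled on its own interval. The metric names and intervals are invented for illustration.

```python
import time

# Hypothetical metrics and their sampling intervals, in seconds.
SAMPLING_INTERVALS = {"heartbeat": 1, "connections": 5, "table_sizes": 60}


def due_metrics(tick, intervals=SAMPLING_INTERVALS):
    """Names of the metrics to sample on this one-second tick."""
    return sorted(name for name, every in intervals.items() if tick % every == 0)


def polling_loop(ticks, collect, sleep=time.sleep, intervals=SAMPLING_INTERVALS):
    """Run the loop for a fixed number of ticks, calling collect(name, tick)."""
    for tick in range(ticks):
        for name in due_metrics(tick, intervals):
            collect(name, tick)
        sleep(1)  # one-second polling interval


if __name__ == "__main__":
    polling_loop(6, lambda name, tick: print(tick, name), sleep=lambda s: None)
```

A real loop would also subtract the time spent collecting from the sleep, so that drift does not accumulate over long runs.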

Using Postgres to Monitor Postgres

Dashboard plots and alert conditions are all written directly in SQL. For example, alert conditions are triggered whenever a cron-executed query returns a NULL in the first field in the first record. Plots are rendered with plotpg.
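
The alert rule is simple enough to state in a few lines. This sketch runs an alert query and fires when the first field of the first record is NULL. It uses SQLite from the standard library only so it runs anywhere; the heartbeat table, the query, and treating an empty result as "no alert" are assumptions, not pgantenna's actual schema.

```python
import sqlite3
import time

# Hypothetical alert query: 'ok' while the newest heartbeat is recent enough,
# NULL otherwise (MAX over an empty table is NULL, so that alerts too).
HEARTBEAT_ALERT = """
    SELECT CASE WHEN MAX(received_at) >= ? THEN 'ok' END
    FROM heartbeat
"""


def alert_fires(conn, query, params=()):
    """True when the first field of the first record is NULL."""
    row = conn.execute(query, params).fetchone()
    # Assumption: a query returning no rows at all does not raise an alert.
    return row is not None and row[0] is None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE heartbeat (received_at REAL)")
    cutoff = time.time() - 5  # heartbeats older than 5 seconds count as stale
    print("before any heartbeat:", alert_fires(conn, HEARTBEAT_ALERT, (cutoff,)))
    conn.execute("INSERT INTO heartbeat VALUES (?)", (time.time(),))
    print("after a fresh heartbeat:", alert_fires(conn, HEARTBEAT_ALERT, (cutoff,)))
```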

Historical Analysis with SQL

pgantenna provides a containerized remote cluster which stores historical data separate from transactional systems. The history is just a Postgresql database that can be queried with familiar tools.

Easy to Configure

The background worker uses direct access to identify and connect to databases automatically. Security concerns notwithstanding, this means very little configuration is needed to get started with comprehensive monitoring.

Close to the metal

A background worker lives and dies with the postmaster. One of the foundational alerting conditions is the receipt of a heartbeat from the background worker.

Extensible

Because the metrics collector is a background worker, it may be able to collect data that other monitoring approaches cannot reach. For example, while developing I considered several approaches to collecting a statements-per-second metric. I was thwarted by a reluctance to implement executor hooks or to divine this information from shared memory data structures, due to my limited experience with Postgres internals. But is it possible, or a good idea? Maybe.

Trying it Out

This is an experimental prototype, so it’s not appropriate for critical clusters.

To get started, first launch a pgantenna instance, which can be as simple as running a single command on a system with Docker installed:

docker run -p 24831:24831 -p 80:80 no0p/pgantenna

Next, install pgsampler and add an entry to postgresql.conf pointing pgsampler at the pgantenna instance:

pgsampler.output_network_host='localhost'

Craig Ringer: Ware Yosemite? Possible PostgreSQL upgrade issues in OS X 10.10

October 21, 2014, 3:00 am

I’m seeing reports of a number of issues with PostgreSQL after upgrades of OS X machines to Yosemite (OS X 10.10) that I’m concerned about, so I’m seeking more information about the experiences of PostgreSQL users who’ve done OS X 10.10 upgrades.

I can’t confirm anything yet, but back up all your databases before any upgrade to OS X 10.10. Just in case. (Of course, you do that before any upgrade, but just in case it slipped your mind this time…).

I don’t have access to a Mac because Apple’s policy prevents developers from running OS X for testing and development (or anything else) without buying physical Apple hardware and finding somewhere to run it. So I can’t test most of this myself, and I really need reports from users, or if possible, results of proactive testing by OS X users.

OS X built-in PostgreSQL deleted on update

Some OS X users appear to use the PostgreSQL version built into OS X for their own data, rather than installing a separate PostgreSQL. Some of them, in addition to using the binaries, also use a PostgreSQL cluster (database instance) that’s created by OS X for the use of Server.app, instead of initdbing their own.

On releases prior to Yosemite the PostgreSQL provided by Apple was on the default PATH, though not necessarily running by default. It seems that on Yosemite it’s been removed; there’s no longer any /usr/bin/psql, etc. As far as I can tell Server.app now bundles PostgreSQL within the application bundle instead.

Some user reports suggest that on upgrade, the Apple-controlled databases are migrated into the new cluster managed by Server.app in the new release, and the old cluster is then stopped or possibly deleted; a colleague checked the upgrade script and found rm -rf /var/pgsql in it.

The PostgreSQL data directory in prior releases was /private/var/pgsql (and /var is a symlink to /private/var) or /Library/Server/PostgreSQL/Data.

The main symptom you’ll see is:

Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/var/pgsql_socket/.s.PGSQL.5432"?

… but this issue is only one of many, many possible causes of that message.

OS X updater may be removing empty directories

I’m seeing a number of reports that suggest that the OS X updater may be removing empty directories. This causes problems with PostgreSQL, which expects to have an empty pg_twophase, pg_tblspc, and often pg_stat_tmp as a part of normal operation.

Yosemite (OSX 10.0) problems with Postgresql
Mac OS X Yosemite Upgrade & PostgreSQL
`pg_tblspc` missing after installation of OS X Yosemite
… and so, so many posts on Holdem Manager and Poker Tracker forums.

It looks like working around this is a simple matter of mkdiring the directories and setting appropriate permissions, but it’s a concern that it’s happening at all.
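
A minimal sketch of that workaround, assuming the server is stopped, the script runs as the OS user that owns the data directory, and only the three directories named above need restoring (other normally-empty directories in the data directory could be affected too). The PG_VERSION check is just a guard against pointing it at the wrong directory; the function name is hypothetical.

```python
import os
from pathlib import Path

# Directories mentioned in the reports; adjust for your PostgreSQL version.
EXPECTED_DIRS = ("pg_twophase", "pg_tblspc", "pg_stat_tmp")


def restore_missing_dirs(datadir, names=EXPECTED_DIRS):
    """Recreate missing subdirectories with owner-only permissions."""
    datadir = Path(datadir)
    if not (datadir / "PG_VERSION").is_file():
        raise FileNotFoundError(f"{datadir} does not look like a PostgreSQL data directory")
    created = []
    for name in names:
        path = datadir / name
        if not path.exists():
            path.mkdir()
            os.chmod(path, 0o700)  # set explicitly; mkdir's mode is filtered by the umask
            created.append(name)
    return created
```

Call it as restore_missing_dirs("/path/to/your/data/directory") and it returns the names it had to recreate, so an empty list means nothing was missing.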

`Server.app` 3.2.1 upgrade issues

I’m also seeing reports that the patch release 3.2.1 for Server.app upgrades its private bundled PostgreSQL from 9.2.8 to 9.3.4, which seems to be causing some users issues.

If you’ve used the PostgreSQL bundled in Server.app to initdb a new cluster, this will render it inaccessible until you find and install a compatible PostgreSQL 9.2.
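
Before hunting for binaries, it helps to know which major version a data directory was initialized with. Each data directory has a PG_VERSION file holding the major version (for example 9.2), and postgres --version prints a line such as "postgres (PostgreSQL) 9.3.4". A hypothetical helper comparing the two; it only parses text, so you feed it the data directory and the command output yourself.

```python
import re
from pathlib import Path


def data_dir_major(datadir):
    """Major version recorded in the data directory's PG_VERSION file."""
    return Path(datadir, "PG_VERSION").read_text().strip()


def binary_major(version_output):
    """Major version from output like 'postgres (PostgreSQL) 9.3.4'."""
    match = re.search(r"\(PostgreSQL\) (\d+)\.(\d+)", version_output)
    if not match:
        raise ValueError(f"unrecognised version string: {version_output!r}")
    first, second = int(match.group(1)), int(match.group(2))
    # Before 10 the major version has two parts (9.2); from 10 on, just one.
    return f"{first}.{second}" if first < 10 else str(first)


def compatible(datadir_major, version_output):
    """True if these binaries can run a cluster of that major version."""
    return datadir_major == binary_major(version_output)
```

For the Server.app case above, a cluster whose PG_VERSION says 9.2 is not compatible with the 9.3.4 binaries, which is why a 9.2 installation is needed to reach it (or to dump it for an upgrade).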

If you’ve used Server.app’s own install, then it may not preserve your databases when it upgrades. I can’t confirm the upgrade process it uses yet, and really need user test reports for this.

Test reports and more information needed

Given these concerns, I would value reports and test results from OS X users who’re still on pre-Yosemite versions and planning to update soon.

If you’re an Apple customer, please also contact Apple support and ask them to investigate this.

So far I don’t know for sure if there’s data loss involved, and I don’t have the access to investigate properly, but I’m quite concerned about the preliminary indications I’m able to find.
