Channel: Planet PostgreSQL

Josh Berkus: Introducing Flexible Freeze

One of the things I mentioned in my series on VACUUM FREEZE was that we really needed a Postgres utility which would opportunistically freeze tables during low traffic periods. Today I'm announcing the Flexible Freeze project, our first attempt at designing such a utility.

All that's there right now is a simple Python script.  However, that script is already a useful tool, installed at multiple production sites. Here's how the script works:
  1. identify your active databases and daily/weekly low traffic periods.
  2. create a cron job which calls flexible_freeze.py with a time limit to keep it inside your low traffic window.
  3. flexible_freeze.py will loop through your tables with the oldest XIDs, freezing them until it runs out of time or out of tables.
There is also a second mode, using the --vacuum switch, which does VACUUM ANALYZE on the tables with the most dead rows (according to pg_stat_user_tables).  This is to help users who have a strong high/low traffic cycle and want to make sure that regular vacuuming takes place during low traffic.  If you're running both modes, we advise doing the freeze first.
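
To give an idea of what step 3 does under the hood, the freeze candidates are essentially the user tables with the highest transaction ID age, processed one by one until the time limit is hit. A rough SQL sketch (not the script's actual code; "some_table" is a placeholder):

SELECT c.relname,
       age(c.relfrozenxid) AS xid_age
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY age(c.relfrozenxid) DESC;

-- then, for each table in turn while time remains:
VACUUM FREEZE ANALYZE some_table;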

Of course, I have a tanker-truck full of desired improvements/expansions to this.  So, pull requests welcome. 

If you're more into Ruby, Wanelo has rewritten flexible freeze for Rails and incorporated it into their Postmodern tool.

Hans-Juergen Schoenig: walbouncer: Filtering transaction log

Up to now it was only possible to replicate entire database instances from one node to the other. A slave always had to consume the entire transaction log created by the master turning the slave into a binary copy. However, this is not always desirable. In many cases it is necessary to split the database […]

Christophe Pettus: “Finding and Repairing Database Corruption” at PGConf.EU

Christophe Pettus: “Be Very Afraid: Backup and Disaster Planning” at PGConf.EU

Bruce Momjian: Two New Presentations


PostgreSQL Conference Europe has just finished and I delivered two new presentations at the conference (a first for me). Postgres Scaling Opportunities summarizes scaling options, and Flexible Indexing with Postgres summarizes indexing options. I have been submitting these presentations to conferences for many months but this is the first conference to have chosen them. The talks were well attended and generated positive feedback.

The talks are surveys rather than deep dives into specific technologies. It might be beneficial for future conferences to have more survey-oriented talks, as many attendees are not Postgres experts.

I am now heading to Russia for two weeks, presenting in St. Petersburg and Moscow.


Joel Jacobson: “How we use PostgreSQL at Trustly” at PGConf.EU

Michael Paquier: Make a logical receiver behave as a synchronous standby


Logical decoding is a superset of the existing standby protocol. After decoding changes from WAL, an output plugin can shape them in any way it wants; it would for example be possible to write a plugin that reverses the decoding a PostgreSQL server instance did, reproducing WAL-like records that could be replayed much as a standby does. Not sure if this would actually be useful, but it is possible...

One of the great things in this new 9.4 infrastructure is that a client receiving logical changes can make the PostgreSQL instance decoding them believe it is a standby, simply by speaking the same replication protocol that vanilla streaming standbys use. That protocol has been available since 9.0 for asynchronous nodes and since 9.1 for "synchronous" nodes (where the master waits for the commit confirmation from a standby), guaranteeing no loss of data after a commit. There are three things to be aware of on the receiver side when looking for such behavior with a logical receiver.

First, using the replication protocol is necessary for the master node to consider the connected client a kind of standby. Logical changes can also be extracted with the dedicated functions pg_logical_slot_peek_changes and pg_logical_slot_get_changes (and their binary equivalents), but do not count on those if you want the master to wait until the receiver has "committed" a change (an abuse of the term, as what that means depends on how the receiver consumes those changes).
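
For reference, the SQL-level interface mentioned above looks like this (assuming a slot named my_slot has already been created with some output plugin):

SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);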

Second, a master node ranks its standbys by priority using the parameter synchronous_standby_names, the synchronous standby being the one with the lowest priority strictly higher than zero. So be sure that the receiver connects to the master node with an application_name that gives it a proper identifier, resulting, with a minimal configuration, in a connection string similar to this:

dbname=my_logical_database replication=database application_name=my_receiver
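
On the master side, the matching step is to list that application_name in synchronous_standby_names. A minimal sketch, reusing the name from the connection string above (editing postgresql.conf directly works just as well):

ALTER SYSTEM SET synchronous_standby_names = 'my_receiver';
SELECT pg_reload_conf();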

Finally, be sure that the receiver sends feedback to the master node; this has already been mentioned in a previous post. For a receiver consuming logical changes, feedback is of course important to release information held by the replication slot in use, so that the pg_xlog partition does not bloat on the master. But it is also essential to let the master know that there is no delta in the changes being replayed, which is what really allows the node to perform synchronous replication (as with vanilla standbys, this also depends on the value of synchronous_commit).
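
As a quick sanity check that the receiver's feedback really lets the master release WAL, a query like the following (purely illustrative, not from the original post) shows what each slot still retains:

SELECT slot_name, plugin, restart_lsn
FROM pg_replication_slots;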

Using the output plugin decoder_raw coupled with the receiver receiver_raw (simply a background worker that fetches the decoded changes and applies the raw queries generated by the output plugin to the database it is connected to), both presented a couple of months back on this blog, it is actually possible to replicate DML queries from one master node to another, as long as the schema is stable and similar (someone also pointed out to me that receiver_raw could be used on the same master that generates the changes, targeting a different database, but it is more spectacular to do it with two different nodes). The following table is created on both master nodes, both running on localhost and listening on ports 5432 and 5433:

=# CREATE TABLE replicated_table (time timestamp default now(), quote text);
CREATE TABLE

To prove there is no cheating, note that neither node is in recovery:

$ psql -At -c 'SELECT pg_is_in_recovery()' -p 5432
f
$ psql -At -c 'SELECT pg_is_in_recovery()' -p 5433
f

The first node has a logical replication slot using decoder_raw, with a "synchronous" standby called receiver_raw:

$ psql -p 5432 -c 'SELECT application_name, sync_state FROM pg_stat_replication'
  application_name | sync_state
 ------------------+------------
  receiver_raw     | sync
 (1 row)
$ psql -At -p 5432 -c 'SHOW synchronous_standby_names'
receiver_raw

The second node also runs a background worker that fetches and applies the changes:

$ ps x | grep 42787
42787   ??  Ss     0:00.06 postgres: bgworker: receiver_raw

Note as well the connection string used by the background worker on the second node to connect to the first node:

$ psql -At -p 5433 -c 'SHOW receiver_raw.conn_string'
replication=database dbname=my_db application_name=receiver_raw

With all those things in place, changes get replicated, and the second node is considered in sync:

$ psql -At -p 5433 -c "SELECT pid FROM pg_stat_activity WHERE state = 'idle'"
42787
$ psql -p 5433 -c "SELECT * FROM replicated_table"
            time            |         quote
----------------------------+------------------------
 2014-10-25 18:12:19.923825 | Tuple data from node 1
(1 row)
$ psql -p 5432 -c "SELECT * FROM replicated_table"
            time            |         quote
----------------------------+------------------------
 2014-10-25 18:12:19.923825 | Tuple data from node 1
(1 row)

Note as well that making the receiver crash-safe, that is, reporting to the master the WAL positions that the client has really fsync'd or written (using respectively flush_position and write_position), is essential. Without it, the master node holding the logical slot information would release WAL that the receiver has not properly consumed, and those changes would be lost for the client if it failed for one reason or another.

Note that this has also been covered, though in less detail, in the presentation about logical decoding given in Madrid for Postgres Europe 2014 and in Chicago for PG Open 2014; the slides of the presentation are available here.

For people who attended either of those conferences, be sure to also have a look at the PostgreSQL wiki pages dedicated to Postgres Europe 2014 and Postgres Open 2014, where all the talk slides should be available. If you were a speaker, be sure to provide a URL pointing to your presentation slides.

Andrew Dunstan: One more time: Replication is no substitute for good backups.

I don't know how many times I have had to try to drum this into clients' heads. Having an up-to-date replica won't protect you against certain kinds of failures. If you really want to protect your data, you need to use a proper backup solution - preferably a continuous backup solution. The ones I prefer to use are barman and wal-e. Both have strengths and weaknesses, but both are incredibly useful, and fairly well documented and simple to set up. If you're not using one of them, or something similar, your data is at risk.

(In case you haven't guessed, today is another of those days when I'm called in to help someone where the master and the replica are corrupted and the last trusted pg_dump backup is four days old and rolling back to it would cost a world of pain. I like these jobs. They can stretch your ingenuity, and no two are exactly alike. But I'd still rather be paid for something more productive.)

Andreas Scherbaum: FOSDEM PGDay and Devroom 2015 - Announcement & Call for Papers

Author
Andreas 'ads' Scherbaum

FOSDEM PGDay is a one-day conference that will be held ahead of FOSDEM in Brussels, Belgium, on Jan 30th, 2015. This will be a focused PostgreSQL event with a single track of talks. This conference day will be for-charge and cost 50€, and will be held at the Brussels Marriott Hotel. Registration is required to attend, and registration is now open. Since we have a limited number of seats available for this event, we urge everybody to register as soon as possible.

PostgreSQL Europe will also have our regular devroom at FOSDEM on Saturday the 31st, which will be held at the main FOSDEM venue at ULB. This day will, of course, continue to be free of charge and open to all FOSDEM entrants. No registration is required to attend this day.

For full details about the conference, venue and hotel, see http://fosdem2015.pgconf.eu/.


The call for papers is now open for both these events. We are looking for talks to fill both these days with content for both insiders and new users. Please see http://fosdem2015.pgconf.eu/callforpapers/ for details and submission information.

The deadline for submissions is November 24th, 2014, but we may as usual pre-approve some talks, so get your submissions in soon!


We have also negotiated a rate with the Brussels Marriott Hotel. For details, see http://fosdem2015.pgconf.eu/venue/.

Hans-Juergen Schoenig: Analytics: Lagging entire rows

PostgreSQL has offered support for powerful analytics and windowing for a couple of years now already. Many people all around the globe use analytics to make their applications more powerful and even faster. However, there is a small little feature in the area of analytics which is not that widely known. The power to use composite […]
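
As a taste of the feature the post goes on to describe, here is a minimal sketch (hypothetical table t_oil with hypothetical columns) of lagging an entire row: passing the table's row reference to lag() returns the whole previous row as a composite value instead of a single column.

SELECT year,
       production,
       lag(t_oil) OVER (ORDER BY year) AS previous_row
FROM t_oil;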

Robert Hodges: An Ending and a Beginning: VMware Has Acquired Continuent

As of today, Continuent is part of VMware. We are absolutely over the moon about it.

You can read more about the news on the VMware vCloud blog by Ajay Patel, our new boss. There’s also an official post on our Continuent company blog. In a nutshell the Continuent team is joining the VMware Cloud Services Division. We will continue to improve, sell, and support our Tungsten products and work on innovative integration into VMware’s product line.

So why do I feel exhilarated about joining VMware? There are three reasons. 

1.     Continuent is joining a world-class company that is the leader in virtualization and cloud infrastructure solutions. Even better, VMware understands the value of data to businesses. They share our vision of managing an integrated fabric of standard DBMS platforms, both in public clouds as well as in local data centers. It is a great home to advance our work for many years to come.

2.     We can continue to support our existing users and make Tungsten even better. I know many of you have made big decisions to adopt Continuent technology that would affect your careers if they turned out badly. We now have more resources and a mandate to grow our product line. We will be able to uphold our commitments to you and your businesses.

3.     It’s a great outcome for our team, which has worked for many years to make Continuent Tungsten technology successful. This includes our investors at Aura in Helsinki, who have been dogged in their support throughout our journey.

Speaking of the Continuent team…I am so proud of what all of you have achieved. Today we are starting a new chapter in our work together. See you at VMware!

Simon Riggs: Where lies the truth?


Ben Bradlee, the former editor of the Washington Post, died recently. A famous speech of his from 1997 contains some words that mean something to me. It starts like this:

"Newspapers don't tell the truth under many different, and occasionally innocent, scenarios. Mostly when they don't know the truth. Or when they quote someone who does not know the truth.

And more and more, when they quote someone who is spinning the truth, shaping it to some preconceived version of a story that is supposed to be somehow better than the truth, omitting details that could be embarrassing.

And finally, when they quote someone who is flat out lying...."

and summarises with

"Where lies the truth? That's the question that pulled us into this business, as it propelled Diogenes through the streets of Athens looking for an honest man. The more aggressive our serach for the truth, the more people are offended by the press. The more complicated are the issues and the more sophisticated are the ways to disguise the truth, the more aggressive our search for the truth must be, and the more offensive we are sure to become to some. So be it."

before ending

"I take great strength from that now, knowing that in my experience the truth does emerge. It takes forever sometimes, but it does emerge. And that any relaxation by the press will be extremely costly to democracy."

Who would have thought that his words apply so well to PostgreSQL, and especially to the cost of data integrity? Yes, referential integrity does cost some performance to make it work right, but how else can we be sure that we are passing valid data around? Surely the purpose of a database is primarily to be a home for the truth, verified to be so by cross checks and constraints.

Feng Tian: Running TPCH on PostgreSQL (Part 1)

We have just released our 9.3.5.S for public beta test. Together with the product, we released a benchmark based on TPCH. The modification to data types is easy to understand -- money and double types are faster than Numeric (and no one on this planet has a bank account that overflows the money type, not any time soon). The modifications to queries are more interesting.

We modified the queries not because we want to show off how fast Vitesse DB is; without these modifications, some queries will never finish. We have seen similar queries, and similar modifications required, in the field. Overall, PostgreSQL is well capable of running the TPCH workload as long as developers pay attention to some "tricks".

Now let's look at them.

Q2: The skeleton of Q2 looks like:
select xxx from t1, t2, ... where foo = (select min(bar) from tx, ty, .. where tx.field = t1.field ...);

This is a correlated subquery (tx.field = t1.field) with an aggregate. You can pull the subquery out into a join:

select xxx from t1, t2, ...,
       (select tx.field, min(bar) as min_bar from tx, ty ... group by tx.field) tmpt
where t1.field = tmpt.field and foo = min_bar ...

The join (in this case a hash join, which is very fast) is two orders of magnitude faster than the subplan version of the query.

The same trick applies to Q17; a rough sketch of that rewrite follows.
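
For illustration, the correlated avg() per part becomes a grouped derived table joined back on l_partkey. Roughly (treat this as a sketch of the pattern, not the exact benchmark text):

select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem,
     part,
     (select l_partkey as agg_partkey,
             0.2 * avg(l_quantity) as avg_quantity
      from lineitem
      group by l_partkey) part_agg
where p_partkey = l_partkey
  and agg_partkey = l_partkey
  and p_brand = 'Brand#23'
  and p_container = 'MED BOX'
  and l_quantity < avg_quantity;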

In the next post, I will examine Q20, which uses a CTE (WITH clause).

Mischa Spiegelmock: MySQL vs PostgreSQL - Why You Care


Like many others before and since, my introduction to the world of databases was via MySQL. It was a good enough database that did what I wanted, so I was content to use it without realizing what I was missing out on.

Comparing MySQL to PostgreSQL is a bit like comparing notepad.exe to emacs. While they both let you type in some text and save it to a file, that doesn’t quite tell the whole story. They are both RDBMSes that allow for the execution of queries via a client connection, but the philosophies, capabilities and sophistication of each are hardly even on the same plane of existence.

The way I think about it is that MySQL is a simple database program. Technically it does have pluggable engines; you can choose between a terrifically useless engine and one that at least knows what a transaction is. I consider Pg more of a “database framework”: an incredibly sophisticated and flexible program that provides a standard frontend interface. Much as emacs ties everything together through “modes” and elisp, Pg ties everything together through its C client library and SQL.
To give you a more concrete example of what I mean, in Pg you can create “foreign data wrappers”, where the actual implementation of a query can be defined by a plugin. One example of a foreign data wrapper is “MySQL” - you connect to Pg like normal through your client library and run a SQL query, but it’s actually executed on a remote MySQL server and presented as a table view in Pg. Or perhaps your query is auto-parallelized and distributed among many mongoDB nodes instead. No big deal.
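
As a hedged sketch of how that looks in practice (assuming the mysql_fdw extension is installed; option names vary between FDW implementations):

CREATE EXTENSION mysql_fdw;

CREATE SERVER mysql_server
    FOREIGN DATA WRAPPER mysql_fdw
    OPTIONS (host '127.0.0.1', port '3306');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER mysql_server
    OPTIONS (username 'app', password 'secret');

-- Queries against this table are executed on the remote MySQL server.
CREATE FOREIGN TABLE legacy_orders (
    id    integer,
    total numeric
) SERVER mysql_server
  OPTIONS (dbname 'shop', table_name 'orders');

SELECT count(*) FROM legacy_orders;
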
Pg also has an astonishingly powerful set of extensions. My absolute favorite is PostGIS, a geospatial extension that adds geographic and geometric types, operators, indexed storage and much much more. Seeing a demo of it at the SFPUG meetups blew my mind, and I’ve been an avid user of it ever since.
Did I mention it also has a full-text search capability every bit as useful as Solr or ElasticSearch, with stemming and your choice of GIN or GiST indexes? My life is so much better now that I can simply use a trigger to keep my tsearch columns updated instead of application-level logic to keep Solr indexes in sync.
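
For example, a trigger-maintained tsvector column looks roughly like this (hypothetical posts table; tsvector_update_trigger is a built-in helper):

-- Add the search column and a GIN index over it.
ALTER TABLE posts ADD COLUMN tsv tsvector;
CREATE INDEX posts_tsv_idx ON posts USING gin (tsv);

-- Keep tsv in sync with the title and body columns automatically.
CREATE TRIGGER posts_tsv_update
    BEFORE INSERT OR UPDATE ON posts
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', title, body);

-- Query it:
SELECT id, title
FROM posts
WHERE tsv @@ plainto_tsquery('english', 'full text search');
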
Pg is chock-full of awesomesauce that I’ve used to replace many moving parts of my application. I chucked out Solr and replaced it with PostGIS and tsearch. Also I ditched ZeroMQ (nothing against ZMQ - it’s a great project) and just use Pg’s built-in async message queue instead. Oh, you didn’t know it had a message queueing system? As a matter of fact it does, and I gave a talk on it for SFPUG at UC Berkeley, coincidentally a few feet from where the precursor to Postgres was first written. In my talk I showed how to construct a location-based storage engine using PostGIS and a single trigger, which would fire off an async notif of JSON-encoded lat/lng updates whenever a row was inserted or updated. Add in a WebSocket<->Pg client/server (such as the esteemed WSNotify) and you have a real-time event push system that notifies a JavaScript browser client whenever a location field on a row is changed, all without a single line of application code (other than the SQL trigger). Let’s see MySQL do that. (Slides and example code are here: https://github.com/revmischa/pgnotify-demos)
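
A minimal sketch of that kind of trigger (hypothetical locations table with a PostGIS geom column; json_build_object assumes 9.4):

CREATE OR REPLACE FUNCTION notify_location_change() RETURNS trigger AS $$
BEGIN
    -- Push a JSON lat/lng payload on the 'location_updates' channel.
    PERFORM pg_notify(
        'location_updates',
        json_build_object(
            'id',  NEW.id,
            'lat', ST_Y(NEW.geom),
            'lng', ST_X(NEW.geom)
        )::text
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER locations_notify
    AFTER INSERT OR UPDATE ON locations
    FOR EACH ROW EXECUTE PROCEDURE notify_location_change();

Any connected client that has run LISTEN location_updates then receives the payload as soon as a row changes.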

We all love fancy features and reducing the number of moving parts in our infrastructure, but I actually think the most compelling argument for Pg is that it is a community-driven project and is not owned by Oracle. MySQL is not an open project, and it is owned by Oracle. One does not need to look far for bone-chilling examples of Oracle’s management of open-source database projects. Let’s talk about BDB.
BDB, the Berkeley DataBase, was a nice little embedded database engine used by many pieces of software dating all the way back to 1994. Any of the maintainers of software using BDB might be in for a bit of a nasty shock should they decide to upgrade to the latest version, now that Oracle’s acquired the software and changed the terms of the license. Now they have two options which are helpfully explained on Oracle’s website.
You can choose option A - the “open source” version - or you can choose option B - the “pay Oracle money” version. That webpage does seem to leave out one minor little detail though: the “open source” version is actually the Affero GPL. The AGPL sounds a lot like the GPL but with an important difference – not only does it require that any programs you distribute come with source code, it covers server software as well (N.B. I am totally not a lawyer and probably don’t know what I’m talking about).
Many projects sort of skirt around the requirements of sharing their code even though they use GPL software because they don’t actually distribute binaries; instead they just run a server and let it communicate with clients. The AGPL was designed to close that rather sizable loophole. What this means in practice is that the thousands of existing commercial products that use BDB, or use code that uses BDB, are all going to be prevented from upgrading to the latest BDB unless they make their product open source… or choose option B (cough up the dough).
You gotta give Larry props for that one. I respect him and am very pleased with the fact that at least someone in this world is willing to go full James Bond villain, complete with tropical island fortress. However I’ll stick with the community-run project for my database system.
Last point to make here: there is clearly a hard limit on how good MySQL is going to be able to get. If it was truly awesome and powerful, why would anyone need to buy Oracle DB?

Well at this point you might be saying “gee, that PostgreSQL option does sound pretty nifty, but how on Earth am I going to switch my existing application to Postgres?” Worry not, friend. There is a very handy DB dump conversion tool (mysql2pgsql) which does the job for you. I used it myself and had a bit of trouble with converting BLOBs and some ordering of foreign keys, but I was able to patch those problems up quite easily and get my changes upstream, so no big deal. I switched in 2012 or so and haven’t looked back since.

Finally, the PostgreSQL community is wonderful. I have always had a great time at the SF postgres user group and seen some amazing stuff people do with it that you would never imagine could be done with your database server if you’re stuck in the tiny MySQL world. Go check it out.

I watch this video every day:

torsten foertsch: My first postgres extension

After attending this year's pgconf.eu I decided to write my first postgres extension. I did this with 2 goals in mind. First of all, I wanted the extension. Secondly, I wanted to document how long it takes and how complicated it is.
A few words about my background: I am mainly a Perl programmer, with a few years of experience programming in C before I switched to Perl. Knowing C, I am not afraid to write Perl extensions in XS. Also, I am a contributor to the Apache mod_perl project, which is also written in C. I have known Postgres since version 6.4, but I hadn't really delved into it until last year. Since then, I have learned a lot and even found a bug in postgres. The conference in Madrid was my second event of that type; before that, I attended the PG conference in Dublin last year.

The idea

At work, we replicate databases over transcontinental links. Monitoring the replication lag is essential for us. With a stock PG 9.3, one can use the pg_stat_replication view on the master to get a picture of how far the replicas lag behind. Usually, I use a command like this:

SELECT client_addr,
       application_name,
       state,
       flush_location,
       pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),
                                            flush_location)) as lag
FROM pg_stat_replication
ORDER BY 2, 1;

There are two points to watch out for. First, the query must run on the master, which may or may not be a problem. Second, the lag is reported in bytes. You may have a rough idea of the average bandwidth between master and replica, so you might be able to derive an educated guess of how long it will take for the lag to dissolve.
Anyway, at the conference I listened to a talk by Petr Jelinek about Uni Directional Replication. He mentioned having measured the replication lag in units of time. Later I asked him how. It turned out he simply updated a special table with a timestamp. That way, records containing the time on the master are written to the WAL stream and replicated to the slave. Given that master and slave are time-synchronized by means of NTP or similar, one can simply compare the current clock time on the slave with the time in that table, and get an impression of when the WAL currently being replayed on the slave was generated on the master and, thus, of how far the slave lags behind in units of time. After a bit of pondering I decided that a custom background worker, a feature that comes with PG 9.3, was the right way to implement the idea.
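
To illustrate the heartbeat idea (a sketch with a hypothetical table name, not the extension's actual code): the background worker on the master periodically runs an UPDATE like the one below, and on the replica the lag is simply the difference between the local clock and the replicated timestamp.

-- on the master, run by the background worker at each tick
UPDATE streaming_lag_heartbeat SET master_time = clock_timestamp();

-- on the replica
SELECT now() - master_time AS replication_lag
FROM streaming_lag_heartbeat;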

Ready, steady, ...

It is the Sunday after the conference. At 12:45 I started my little project. First, I tried to find a template. So, I asked aunt Google for "postgres extension". One of the first links that came up was http://www.postgresql.org/docs/9.3/static/extend-extensions.html. Here I read for a while about what the constituents of an extension are. The next link also looked promising: http://pgxn.org/. First, I browsed a few extensions. Then I spotted the custom background worker link in the tag cloud on the home page. It led me to a page with two projects, config_log and worker_spark. The latter has a repository on Github with exactly one commit, which contained only an almost empty README.md.
So, I went for config_log. After reading the README I made sure I had pg_config and that it was found in $PATH.
Then:

git clone git://github.com/ibarwick/config_log.git
cd config_log
make &&
sudo make install

That went smoothly!

I created the extension in my database and played a bit with it. Seemed to work as expected. By now, it's 13:45.

Next, I studied the code and decided to first create a streaming_lag.control file:

# streaming_lag extension
comment = 'streaming lag in seconds instead of bytes'
default_version = '0.0.1'
module_pathname = '$libdir/streaming_lag'
relocatable = true
superuser = true

and a Makefile for my extension:

MODULES = streaming_lag
OBJS = streaming_lag.o

EXTENSION = streaming_lag
EXVERSION = $(shell sed -n \
"/^default_version[[:space:]]*=/s/.*'\(.*\)'.*/\1/p" \
streaming_lag.control)

DATA = streaming_lag--$(EXVERSION).sql

PG_CONFIG = pg_config

# verify version is 9.3 or later

PG93 = $(shell $(PG_CONFIG) --version | \
(IFS="$${IFS}." read x v m s; expr $$v \* 100 + $$m \>= 903))

ifneq ($(PG93),1)
$(error Requires PostgreSQL 9.3 or later)
endif

PGXS := $(shell $(PG_CONFIG) --pgxs)

include $(PGXS)

The clock shows 14:30 and I start to work on streaming_lag.c where the actual code resides. With a 2 hour break for a late lunch I worked until midnight. Then I had a completely functional version of my extension on Github including documentation. Also by midnight, I had figured out how to get an account on pgxn.org and requested one.

Remarks

Besides the config_log code I studied several postgres header files and C code. The documentation there is amazing. A great source of information was also the worker_spi extension that I later found in the postgres source in contrib/.
In my extension I need a constantly ticking clock. At first I tried to use the timer mechanism provided on Linux by timer_create(), because according to POSIX this is the way to go in new programs. But I wasn't able to link my shared library with -lrt in a way that it would then load correctly into the backend process. So, I switched back to the older setitimer() technique. I also searched the postgres code a bit to see if it provides an interval timer. I found a place where it uses setitimer(), but the it_interval part that makes the timer recurring is not usable through the provided interface. As this is my first encounter with the postgres source code, maybe I have overlooked something.
At first, I implemented the extension similarly to worker_spi, which creates all the necessary tables when the background worker starts up. Then I read about relocatable extensions and decided to go that way. Now I even had something to put into the 3rd constituent of an extension, the SQL file.

The next day

First, I found a reply in my inbox confirming my new account on pgxn.org. Now, the extension can also be found there.
So far I was quite convinced that my extension worked, but a good piece of software requires automated tests. However, I couldn't find a simple and reliable way to do that. And since the main goal of this effort was to get a glimpse of how to create postgres extensions, I decided to manually set up a test scenario, with a postgres instance on my notebook as replica and a database on a nearby server, connected via WIFI, as master.
The extension was installed with the default configuration except for the precision GUC, which I configured to write a timestamp every 200 msec. Then I created a simple table with one column and 10 million rows. To generate WAL I simply updated all rows; I figured that, over WIFI, this should generate a measurable lag.
And it did. Using psql's \watch utility and redirecting the output into a file, I was able to produce the following diagram.

[Diagram: measured replication lag over time during the bulk update]
I spent another 3 hours doing that. Then I sat down to write this blog. BTW, this is also the first time I use blogger.com.

Lessons learned

  • writing postgres extensions, even in C, is easy
  • it took me about 2.5 hours from scratch to find all the resources required to implement the project
  • the whole effort including the real test on the next day took about 12 hours. I expected much more!

Josh Berkus: Upcoming Seattle Visit

I will be in Seattle soon on business.  This will include two opportunities to do PostgreSQL community stuff:
  • A meetup with SEAPUG on the 12th, where I will talk about the upcoming 9.4 features (not on the calendar yet, hopefully it'll be fixed soon)
  • A BOF at Usenix LISA where I talk about the same, only to a different crowd.
If you're in the Seattle area, come chat!  We'll go out to Dilletante and have chocolate!

Michael Paquier: The Blackhole Extension


Behind this eye-catching title is an extension called blackhole that I implemented yesterday, tired of always needing to set up the structure of a fresh extension when I need one (well, copying one from Postgres contrib/ would work as well). Similarly to blackhole_fdw, which is meant as a base for a foreign-data wrapper, blackhole is an extension kept as minimalistic as possible, so that it can be used as a base template to develop a Postgres extension in C.

When using it for your own extension, simply copy its code, create a new git branch or whatever, and then replace the keyword blackhole with whatever you want in the code. Note as well that the following files need to be renamed:

blackhole--1.0.sql
blackhole.c
blackhole.control

Once installed in a vanilla state, this extension does not really do much, as it only contains a C function called blackhole, able to do the following non-fancy thing:

=# \dx+ blackhole
Objects in extension "blackhole"
  Object Description
----------------------
 function blackhole()
(1 row)
=# SELECT blackhole();
 blackhole
-----------
 null
 (1 row)

Yes, it simply returns a NULL string.
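
For reference, the SQL script of such a template typically just declares that one function against the shared library; something along these lines (the shape is standard, though the repository's exact file may differ):

CREATE FUNCTION blackhole()
RETURNS text
AS 'MODULE_PATHNAME', 'blackhole'
LANGUAGE C;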

The code of this template is available here, in blackhole/, alongside the rest of the PostgreSQL plugins managed in the pg_plugins repository. Hope that's useful (or not). If you have ideas to improve it, feel free to send a pull request, but let's keep it as small as possible.

And Happy Halloween!

Josh Berkus: Finding Foreign Keys with No Indexes

Unlike some other SQL databases, PostgreSQL does not automatically create indexes on the "child" (in formal language "referencing") side of a foreign key.  There are some good reasons for this (see below), but it does give users another opportunity to forget something, which is indexing the foreign keys (FKs) that need it.  But which ones need it?  Well, fortunately, you can interrogate the system catalogs and find out.

I have added a query for finding foreign keys without indexes to pgx_scripts.  Columns on the "parent", or "referenced", side of the FK are automatically indexed (they have to be, because they need to be unique), so we won't talk about them further.

Now, in order to understand how to interpret this, you have to understand why you would or would not have an index on an FK, and what sort of indexes are valid.  There are two times that indexes on the child side of FKs are used:
  • when doing JOIN and lookup queries using the FK column
  • when updating or deleting a row from the "parent" table
The second occasion is news to some DBAs.  The way it works is this: before letting you delete or change a row in the "parent" table, Postgres has to verify that there are no rows in the "child" table referencing the FK value that might be going away.  If there are, it needs to perform the action you have defined (such as CASCADE, SET NULL or RESTRICT).  If the "child" table is large, this can be substantially sped up by having an index on the FK column.

This means that it's important to have an index on the child side of the FK if any of the following are true:
  • The child table is large and the parent table gets updates/deletes
  • The parent table is large and the FK is used for JOINs
  • The child table is large and the FK is used to filter (WHERE clause) records on the child table
This means most FKs, but not all of them.  If both tables are small, or if the parent table is small and the FK is used only to prevent bad data entry, then there's no reason to index it.  Also, if the FK is very low cardinality (like, say, only four possible values) then it's probably also a waste of resources to index it.
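
When a child-side FK does need one, the fix is simply a btree index whose leading column(s) match the FK columns. For example (hypothetical table and column names):

CREATE INDEX CONCURRENTLY players_team_id_idx
    ON players (team_id);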

Now you're ready to run the query on your own database and look at the results. The query tries to filter for the best indexing candidates, but it is just a query and you need to use your judgement on what you know about the tables.  The query also filters for either the parent or child table being larger than 10MB.

Now, you might say "but I have an index on column Y" and wonder why it's appearing on the report.   That's probably because the FK does not match the first columns of the index.  For example, an index on ( name, team_id ) cannot be used for an FK on team_id.

You may notice a 2nd section of the report called "questionable indexes".  These are FKs which have an index available, but that index may not be usable for JOINs and constraint enforcement, or may be very inefficient.  This includes:
  • Non-BTree indexes.  Currently other types of indexes can't be used for FK enforcement, although that is likely to change in future Postgres versions.  But they can sometimes be used for joins.
  • Indexes with more than one column in addition to the FK columns.  These indexes can be used for FKs, but they may be very inefficient at it due to the bigger size and extra index levels.
  • Partial indexes (i.e. INDEX ... WHERE).  In general, these cannot be used for FK enforcement, but they can sometimes be used for joins.
It's not quite as clear when you want to add to, or replace those indexes.  My advice is to start with the missing indexes and then move on to the more nuanced cases.

Andrew Dunstan: Assignment beats SELECT INTO

While working on some customer code, I noticed that they have a lot of code that reads like this:
SELECT a,b,c
INTO foo.x, foo.y, foo.z;
I wondered why they were doing it that way, and if it might be easier to read if it was just:
foo := (a,b,c);
Now, these aren't quite the same, especially if foo has more than three fields. But even that could be got around.

But before I tried this out I decided to see how they performed. Here's what happened:
andrew=# do $x$
declare
r abc;
begin
for i in 1 .. 10000000
loop
select 'a','b',i into r.x,r.y,r.z;
end loop;
end;
$x$;
DO
Time: 63731.434 ms
andrew=# do $x$
declare
r abc;
begin
for i in 1 .. 10000000
loop
r := ('a','b',i);
end loop;
end;
$x$;
DO
Time: 18744.151 ms
That's a very big difference! Direct assignment takes less than 30% of the time that SELECT INTO takes.

I'm going to dig into why this happens, but meanwhile, I have quite a lot of low hanging performance fruit to pick as a result of this.

gabrielle roth: PgConf.EU recap

I’m safely home from PgConf.EU. Madrid at this time of year was glorious, particularly to this Portlander. (I came home to a steady 12°C and rainy for the next week or … so ;)) We had over 300 attendees, making this the biggest Postgres conference to date, I hear. Of course, I couldn’t get to […]