create or replace function hstore_to_json(h hstore) returns text language sql as $f$
select '{' || array_to_string(array_agg(
           '"' || regexp_replace(key, E'[\\"]', E'\\\&', 'g') || '":' ||
           case
               when value is null then 'null'
               when value ~ '^true|false|(-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?)$' then value
               else '"' || regexp_replace(value, E'[\\"]', E'\\\&', 'g') || '"'
           end
       ), ',') || '}'
from each($1)
$f$;
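A quick usage sketch, assuming the hstore extension is installed and the function above has been created (the key/value pairs here are made up for illustration):

```sql
-- requires: CREATE EXTENSION hstore;
-- numeric, quoted-string, NULL, and boolean values each take a different branch
SELECT hstore_to_json('a=>1, b=>"hello world", c=>NULL, d=>true'::hstore);
```

Note that each() returns pairs in no guaranteed order, so the keys in the resulting JSON text may come out in any order.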
Andrew Dunstan: Another transformation
Andrew Dunstan: Tree climbing
John DeSoi: pgEdit on GitHub
The source for the pgEdit TextMate bundle is now available on GitHub at https://github.com/desoi/pgedit-textmate.
Keith: PG Extractor - Got Git
I've finally gotten Git support added into pg_extractor. This works pretty much the same as the SVN option did already. One important difference is that there are two options for committing
--git
This just does a local commit to a locally maintained repository
--gitpush
This does a local commit as well as push to an already configured remote repository
You use either one option or the other, not both. The Git options also expect a proper .gitconfig file to be set up for the user running pg_extractor. There is no option for passing the git username like SVN has (and I don't see a need for one). Remote repositories will also have to be configured before using the push option.
An important thing to note about using the svn or git options with pg_extractor is that it does not do any initial VCS setup on the folders it creates and outputs to. It's best to run it first without any VCS options to get an initial dump and perform a manual commit (and/or push with git). Then, for any future runs of pg_extractor, use the VCS options to track changes.
As always, please report any bugs or issues!
Bruce Momjian: TOAST Queries
As a followup to my previous blog entry, I want to show queries that allow users to analyze TOAST tables. First, we find the TOAST details about the test heap table:
SELECT oid, relname, reltoastrelid, reltoastidxid
FROM pg_class WHERE relname = 'test';

  oid  | relname | reltoastrelid | reltoastidxid
-------+---------+---------------+---------------
 17172 | test    |         17175 |             0
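Building on that, here is a sketch of a follow-up query that resolves the TOAST table's name and on-disk size by joining pg_class to itself on reltoastrelid (the OID values will of course differ per installation):

```sql
-- Find the TOAST table backing 'test' and report its size
SELECT t.relname AS toast_table,
       pg_size_pretty(pg_relation_size(t.oid)) AS toast_size
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'test';
```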
Bruce Momjian: New Server
A few weeks ago, I finally replaced my eight-year-old home server. The age of my server and its operating system (BSD/OS, last officially updated in 2002) were a frequent source of amusement among Postgres community members. The new server is:
Super Micro 7046A-T 4U Tower Workstation
2 x Intel Xeon E5620 2.4GHz Quad-Core Processors
Crucial 24GB Dual-Rank PC3-10600 DDR3 SDRAM
Intel 160GB 320 Series SSD Drive
4 x Western Digital Caviar Green 2TB Hard Drives
Mark Wong: January Meeting Recap
11 people showed up for our first meeting of the new year. Thanks to Iovation for providing a comfortable space with pizza.
We will be having another PRP soon, as well as a YAMS hackathon. We may combine the two into one event. Watch this space for details.
Here are Tim’s slides from his Database Trending talk last night. I can’t wait to try this at home!
I converted his .odp slides to .ppt so I could a) upload them to WP (no .odp allowed!) and b) include his notes.

Marc Balmer: Get Database Security Right
From a user's perspective, this is nice: the user does not even need to know there is a database under the hood. From a security perspective, this is a nightmare, mostly for two reasons.
Continue reading "Get Database Security Right"
Hubert 'depesz' Lubaczewski: Waiting for 9.2 – NULLS from pg_*_size() functions
Selena Deckelmann: I’m keynoting today at SCALE10x
Slides (as of this moment) are here: Mistakes were made. I changed quite a bit of the beginning and end, given how big the audience is. In previous talks, we've usually ended with a fun "omg, here's the craziest story I know" session. I imagine we'll get a little bit of that today.
Postgres folks will note a relevant picture on slide 13.
This is my first keynote! Thanks so much to SCALE for inviting me. There were at least 1500 registered attendees as of Friday, so looking forward to a big crowd.
Valentine Gogichashvili: Schema based versioning and deployment for PostgreSQL
SECURITY DEFINER feature of PostgreSQL). This approach has some disadvantages, of course. One of the biggest technical problems, which can easily become an organizational problem if you have a relatively big team of developers, is how to rapidly roll out new features without touching old, functioning stored procedures, so that old versions of your upper-level applications can still access the previous versions of the stored procedures, while newly rolled-out nodes with a new software stack access new stored procedures that do something more, or less, or return different data sets compared to their previous versions. And hundreds of stored procedures for accessing and manipulating data are enough to make any attempt to keep all their new versions backwards compatible a nightmare.
The classical way to do this would be to keep all changes backwards compatible and, when that is not possible, create a new version of a stored procedure with a version suffix like _v2, mark the previous version as deprecated, and, once the whole software stack is rolled out to use the new function, drop the previous version. But if you are rolling out a new version of the whole stack once or twice a week, keeping track of what is used and what is not becomes quite a challenge... and the discipline of all the developers has to be really good as well. Stored procedures are not the only objects that change: their input and return types change along with them. Changing a return type that is used by more than two stored procedures in a backwards-compatible fashion is pure horror unless you create a new version of that type and new versions of all the stored procedures that use it. Dependency control becomes another problem.
My solution to that problem was to introduce schema-based versioning of PostgreSQL stored procedures. It builds on PostgreSQL schemas and the per-session search_path. All the stored procedures that are exposed to the client software stack are grouped in one API schema that contains only those stored procedures and the types they need.
The schema name contains a version, like proj_api_vX_Y_Z, where X_Y_Z is the version that a software stack is targeted at. The software stack does SET search_path TO proj_api_vX_Y_Z, public; immediately after it gets a connection from the pool, and all calls to the API stored procedures are made without explicitly specifying a schema name, so PostgreSQL finds the needed stored procedure in the specified schema. When a branch is stable and its version is fixed, that version is used to set the default search_path for the software being deployed from that branch; for example, in Java using the BoneCP JDBC pool, by setting the initSQL property of all the pools used to access the proj database.
We store the sources of all the stored procedures (and other database objects) in a special directory structure that is checked into a usual SCM system. All the files are sorted into the corresponding folders and prefixed with a two-digit numeric prefix to ensure sort order (good old BASIC times :) ). Like:
50_proj_api
    00_create_schema.sql
    20_types
        20_simple_object_input_type.sql
    30_stored_procedures
        20_get_object.sql
        20_set_object.sql
Here the 00_create_schema.sql file contains a CREATE SCHEMA proj_api; statement, statements to set default security options for newly created stored procedures, and a SET search_path TO proj_api, public; statement, which ensures that all the objects coming after that file are injected into the correct API schema. An example 00_create_schema.sql file can look like:
RESET role;
CREATE SCHEMA proj_api AUTHORIZATION proj_api_owner;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres IN SCHEMA proj_api REVOKE EXECUTE ON FUNCTIONS FROM public;
GRANT USAGE ON SCHEMA proj_api TO proj_api_usage;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres IN SCHEMA proj_api GRANT EXECUTE ON FUNCTIONS TO proj_api_executor;
ALTER DEFAULT PRIVILEGES FOR ROLE proj_api_owner IN SCHEMA proj_api GRANT EXECUTE ON FUNCTIONS TO proj_api_executor;
ALTER DEFAULT PRIVILEGES IN SCHEMA proj_api GRANT EXECUTE ON FUNCTIONS TO proj_api_executor;
SET search_path to proj_api, public;
DO $SQL$
BEGIN
IF CURRENT_DATABASE() ~ '^(prod|staging|integration)_proj_db$' THEN
-- change default search_path for production, staging and integration databases
EXECUTE 'ALTER DATABASE ' || CURRENT_DATABASE() || ' SET search_path to proj_api, public;';
END IF;
END
$SQL$;
SET role TO proj_api_owner;
This kind of layout makes it possible to easily bootstrap the API schema objects into a given database and, just as important, to keep track of all the database logic changes in an SCM system that lets you review and compare the changes between releases.
Bootstrapping into a development database can be done by a very simple script like:
(
echo 'DROP SCHEMA proj_api CASCADE;'
find 50_proj -type f -name '*.sql' \
    | sort \
    | xargs cat
) | psql dev_proj_db -1 -f -
(In the case of a development database, we actually bootstrap all the objects, including tables, into a freshly prepared database instance, so that integration tests can run and modify data as they want.)
Injecting into a production or staging database can be automated and implemented with different kinds of additional checks, but in the end it is something like:
(
cat 50_proj/00_create_schema | sed s/proj_api/proj_api_vX_Y_Z/g
find 50_proj -type f -name '*.sql' ! -name '00_create_schema.sql' \
    | sort \
    | xargs cat
) | psql prod_proj_db -1 -f -
So after that, we have a fresh copy of the whole shiny API schema, with all its dependencies, rolled out to the production database. These schema objects are accessed only by the software that is supposed to do so, which was tested to run against this very combination of versions of the stored procedures and dependent types. And if we see any problems with the rollout, we can just roll back the software stack so it accesses the old stored procedures, located in the schema with the previous version of our API.
This method does not solve the problem of versioning the tables in our data schema (we keep all the tables, related objects, and low-level transformation stored procedures in a proj_data schema), but for that there is a very simple and very nice solution, http://www.depesz.com/index.php/2010/08/22/versioning/, suggested and implemented by Depesz. Of course, changes in table structure should still be kept backwards compatible, and nicely written database diff rollout and rollback files should be written for every such change.
I am not going into details about how to prepare the Spring configuration of the JDBC pools for the Java clients, or how to configure the bootstrapping for integration testing in your Maven project configuration, as this information would not add any real value to a blog post that already became much longer than I expected at the beginning.
NOTE: Because of a bug in the PostgreSQL JDBC driver, the types that are used as input parameters for stored procedures cannot be located in different schemas (type OIDs are looked up by name only, without consideration of schema and search_path). Patching the driver is very easy, and we did so in my company to be able to use schema-based versioning in our Java projects. I have reported the bug twice already (http://archives.postgresql.org/pgsql-jdbc/2011-03/msg00007.php, http://archives.postgresql.org/pgsql-jdbc/2011-12/msg00083.php), but unfortunately got no response from anybody. I will probably have to submit a patch myself sometime.
Hubert 'depesz' Lubaczewski: Waiting for 9.2 – split of current_query in pg_stat_activity
Hubert 'depesz' Lubaczewski: Some new tools for PostgreSQL or around PostgreSQL
Andrew Dunstan: Using PLV8 to index JSON
In reality we'd want something a bit more sophisticated than this, but you can get the idea from it:
CREATE OR REPLACE FUNCTION jmember (j json, key text)
RETURNS text LANGUAGE plv8 IMMUTABLE AS $function$
    var ej = JSON.parse(j);
    if (typeof ej != 'object')
        return NULL;
    return JSON.stringify(ej[key]);
$function$;
Armed with this function we could now create our index, using the functional index feature of PostgreSQL:
CREATE INDEX x_in_json ON mytable (jmember(jsonfield,'x'));
Now, when we issue a query like
SELECT * FROM mytable WHERE jmember(jsonfield,'x') = 'foo';
it should be able to use the index. This is reasonably analogous to a very simple use of MongoDB's ensureIndex() function.
We could make this somewhat nicer by providing some operators, and maybe building in a function like this, but the fundamental idea should work pretty much the same.
Bruce Momjian: More Lessons From My Server Migration
The new server is 2-10 times faster than my old 2003 server, but that 10x speedup is only possible for applications that:
- Do lots of random I/O, thanks to the SSDs. Postgres already supports tablespace-specific random_page_cost settings, but it would be interesting to see if there are cases that can be optimized for low random pages costs. This is probably not an immediate requirement because the in-memory algorithms already assume a low random page cost.
- Can be highly parallelized. See my previous blog entry regarding parallelism. The 16 virtual cores in this server certainly offer more parallelism opportunities than my old two-core system.
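As a sketch of the tablespace-specific random_page_cost tuning mentioned above (the tablespace name and path here are hypothetical):

```sql
-- Hypothetical tablespace on the SSD; lower random_page_cost so the planner
-- treats random reads there as nearly as cheap as sequential ones
CREATE TABLESPACE ssd_space LOCATION '/ssd/pgdata';
ALTER TABLESPACE ssd_space SET (random_page_cost = 1.1);
```

Tables and indexes placed in that tablespace will then be planned with the lower cost, while relations on the magnetic drives keep the default.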
Other observations:
- It takes serious money to do the job right, roughly USD $4k — hopefully increased productivity and reliability will pay back this investment.
- I actually started the upgrade two years ago by adjusting my scripts to be more portable; this made the migration go much more smoothly. The same method can be used for migrations to Postgres by rewriting SQL queries to be more portable before the migration. Reliable hardware is often the best way to ensure Postgres reliability.
- My hot-swappable SATA-2 drive bays allow for a flexible hard-drive-based backup solution (no more magnetic tapes). File system snapshots allow similar backups for Postgres tablespaces, but it would be good if this were more flexible. It would also be cool if you could move a drive containing Postgres tablespaces from one server to another (perhaps after freezing the rows).
Andrew Dunstan: Setting up PLV8 on Fedora 16
cd inst.json
sudo yum install v8 v8-devel
hg clone https://code.google.com/p/plv8js/
cd plv8js
PATH=../bin:$PATH make USE_PGXS=1
PATH=../bin:$PATH make USE_PGXS=1 install
cd ..
bin/createdb testplv8
bin/psql -c 'create extension plv8; create language plv8;' testplv8
Pretty simple, very quick.
Bruce Momjian: The Most Important Postgres CPU Instruction
Postgres consists of roughly 1.1 million lines of C code, which is compiled into an executable with millions of CPU instructions. Of the many CPU machine-language instructions in the Postgres server executable, which one is the most important? That might seem like an odd question, and one that is hard to answer, but I think I know the answer.
You might wonder, "If Postgres is written in C, how would we find the most important machine-language instruction?" Well, there is a trick to that. Postgres is not completely written in C: there is a very small file (1000 lines) of C code that adds specific assembly-language CPU instructions to the executable. This file is called s_lock.h, an include file referenced in various parts of the server code that provides very fast locking operations. The C language doesn't supply fast-locking infrastructure, so Postgres is required to supply its own locking instructions for all twelve supported CPU architectures. (Operating system kernels do supply locking instructions, but they are much too slow to be used for Postgres.)
Christophe Pettus: PostgreSQL Performance When It’s Not Your Job
My presentation from SCALE 10x, “PostgreSQL Performance When It’s Not Your Job” is now available for download.
Pavel Golub: Joomla! 2.5 with PostgreSQL support officially released
Joomla, one of the world’s most popular open source content management systems (CMS) used for everything from websites to blogs to Intranets, today announces the immediate availability of Joomla 2.5. Along with new features such as advanced search and automatic notification of Joomla core and extension updates, the Joomla CMS for the first time includes multi-database support with the addition of Microsoft SQL Server. Previous versions of Joomla were compatible exclusively with MySQL databases.
Way to go, Joomla! But why don't you guys mention the PostgreSQL database in the main release story? Do you really think MSSQL is a more common choice for the database layer of a CMS? Seriously?

Bruce Momjian: Increasing Database Reliability
While database software can be the cause of outages, for Postgres, it is often not the software but the hardware that causes failures — and storage is often the failing component. Magnetic disk is one of the few moving parts on a computer, and hence prone to breakage, and solid-state drives (SSDs) have a finite write limit.
While waiting for storage to start making loud noises or fail is an option, a better option is to use some type of monitoring that warns of storage failure before it occurs: enter SMART. SMART is a system developed by storage vendors that allows the operating system to query diagnostics on the drive and warn of unusual storage behavior before the drive fails. While read/write failures are reported by the kernel, SMART parameters often warn of danger before failure occurs. Below is the SMART output from a Western Digital (WDC) WD20EARX magnetic disk drive: