Channel: Planet PostgreSQL

Dimitri Fontaine: Trigger Parameters


Sometimes you want to compute values automatically at INSERT time, like for example a duration column out of a start and an end column, both timestamptz. It's easy enough to do with a BEFORE TRIGGER on your table. What's more complex is to come up with a parametrized spelling of the trigger, where you can attach the same stored procedure to any table even when the column names are different from one another.

I found a kind of trigger that I can use!

The exact problem to solve here is how to code a dynamic trigger where the trigger's function code doesn't have to hard-code the field names it will process. Basically, PL/pgSQL is a static language that wants to know all about the data types in use before it compiles a function, so there's no easy way to do that directly.

That said, we now have hstore and it's empowering us a lot here.

The example

Let's start simple, with a table having a d_start and a d_end column where to store, as you might have already guessed, a start timestamp (with time zone) and an end timestamp. The goal will be to have a parametrized trigger able to maintain a duration for us automatically, something we should be able to reuse on other tables.

create table foo (
  id serial primary key,
  d_start timestamptz default now(),
  d_end timestamptz,
  duration interval
);

insert into foo(d_start, d_end)
     select now() - 10 * random() * interval '1 min',
            now() + 10 * random() * interval '1 min'
       from generate_series(1, 10);

So now I have a table with 10 rows containing random timestamps, but of course none of them has the duration field set. Let's see about that now.

Playing with hstore

The hstore extension is full of goodies, we will only have to discover a handful of them now.

First thing to do is make hstore available in our test database:

# create extension hstore;
CREATE EXTENSION

And now play with hstore in our table.

# select hstore(foo) from foo limit 1;

 "id"=>"1",
 "d_end"=>"2013-08-23 11:34:53.129109+01",
 "d_start"=>"2013-08-23 11:16:04.869424+01",
 "duration"=>NULL
(1 row)

I edited the result for it to be easier to read, splitting it on more than one line, so if you try that at home you will have a different result.

What's happening in that first example is that we are transforming a row type into a value of type hstore. A row type is the result of select foo from foo;. Each PostgreSQL relation defines a type of the same name, and you can use it as a composite type if you want to.
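As a quick illustration of using that composite type (the describe_foo helper is made up for this example, not part of the original article), a function can take the table's row type as a parameter and be fed whole rows:

-- hypothetical helper taking the table's row type as its argument
create function describe_foo(f foo) returns text language sql as $$
  select 'row ' || f.id || ' starts at ' || f.d_start;
$$;

select describe_foo(foo) from foo limit 1;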

Now, hstore also provides the #= operator which will replace a given field in a row, look at that:

# select (foo #= hstore('duration', '10 mins')).* from foo limit 1;
 id |            d_start            |             d_end             | duration 
----+-------------------------------+-------------------------------+----------
  1 | 2013-08-23 11:16:04.869424+01 | 2013-08-23 11:34:53.129109+01 | 00:10:00
(1 row)

We just replaced the duration field with the value 10 mins, and to get a better grasp of what just happened, we then use the (...).* notation to expand the row type into its full definition.

We should be ready for the next step now...

The generic trigger, using hstore

Now let's code the trigger:

create or replace function tg_duration()
 -- (
 --  start_name    text,
 --  end_name      text,
 --  duration      interval
 -- )
 returns trigger
 language plpgsql
as $$
declare
   hash hstore := hstore(NEW);
   duration interval;
begin
   duration :=  (hash -> TG_ARGV[1])::timestamptz
              - (hash -> TG_ARGV[0])::timestamptz;

   NEW := NEW #= hstore(TG_ARGV[2], duration::text);

   RETURN NEW;
end;
$$;

And here's how to attach the trigger to our table. Don't forget the FOR EACH ROW part or you will have a hard time understanding why you can't access the details of the OLD and NEW records in your trigger: triggers default to being FOR EACH STATEMENT triggers.

The other important point is how we pass down the column names as argument to the stored procedure above.

create trigger compute_duration
     before insert on foo
          for each row
 execute procedure tg_duration('d_start', 'd_end', 'duration');

Equipped with the trigger properly attached to our table, we can truncate it and insert some rows again:

# truncate foo;
# insert into foo(d_start, d_end)
       select now() - 10 * random() * interval '1 min',
              now() + 10 * random() * interval '1 min'
         from generate_series(1, 10);

# select d_start, d_end, duration from foo;
            d_start            |             d_end             |    duration     
-------------------------------+-------------------------------+-----------------
 2013-08-23 11:56:20.185563+02 | 2013-08-23 12:00:08.188698+02 | 00:03:48.003135
 2013-08-23 11:51:10.933982+02 | 2013-08-23 12:02:08.661389+02 | 00:10:57.727407
 2013-08-23 11:59:44.214844+02 | 2013-08-23 12:00:57.852027+02 | 00:01:13.637183
 2013-08-23 11:50:18.931533+02 | 2013-08-23 12:00:52.752111+02 | 00:10:33.820578
 2013-08-23 11:53:18.811819+02 | 2013-08-23 12:06:30.419106+02 | 00:13:11.607287
 2013-08-23 11:56:33.933842+02 | 2013-08-23 12:01:15.158055+02 | 00:04:41.224213
 2013-08-23 11:57:26.881887+02 | 2013-08-23 12:05:53.724116+02 | 00:08:26.842229
 2013-08-23 11:54:10.897691+02 | 2013-08-23 12:06:27.528534+02 | 00:12:16.630843
 2013-08-23 11:52:17.22929+02  | 2013-08-23 12:02:08.647837+02 | 00:09:51.418547
 2013-08-23 11:58:18.20224+02  | 2013-08-23 12:07:11.170435+02 | 00:08:52.968195
(10 rows)

Conclusion

Thanks to the hstore extension we've been able to come up with a dynamic solution where you can give the names of the columns you want to work with at CREATE TRIGGER time, rather than hard-coding them in a series of stored procedures that end up nearly identical and a pain to maintain.


Hubert 'depesz' Lubaczewski: OmniPITR v1.2.0 released

It's been a while since the last release, but the new one has finally arrived, and it has some pretty cool goodies. For starters, you can now skip creation of xlog backups – which is nice if you already have a ready walarchive with all the xlogs – there is no point in wasting time on creation […]

Jim Smith: COMMIT / ROLLBACK in Oracle and PostgreSQL


 

COMMIT / ROLLBACK in Oracle and PostgreSQL

David Edwards and Lucas Wagner

 

Introduction

 

The use of transactions in relational databases allows a database architect to logically group SQL statements into chunks of code that execute using an “all or nothing” strategy. In case of disaster, such as power loss or a network outage, grouping a selection of SQL statements into a transaction means that all of the statements execute – or none of them do.

 

Without transactions, a simple network outage could mean that only half of the statements are executed, possibly corrupting the data inside the database. Some pieces of customer data could be inserted into the database whereas other pieces could be lost forever.

 

In contrast, while using transactions, if all statements are able to be successfully executed, a COMMIT is performed and those changes become permanent. If a failure should occur during the execution of these statements, all changes since the last commit can be undone through the use of a ROLLBACK.

 

When translating Oracle PL/SQL to PostgreSQL PL/pgSQL, there are some subtle, yet important, scoping differences in how each has implemented COMMIT and ROLLBACK that all database architects should be aware of.

 

Example Scenario

 

Consider the following possible scenario where an application program interacts with a database function. The architect would like to do the following things in the form of a transaction:

 

  - INSERT a new row into a table
  - If disaster strikes and the insertion does not complete, ROLLBACK
  - Otherwise, COMMIT

 

Oracle’s Way: Back to the Beginning

Oracle’s COMMIT and ROLLBACK statements scope back to the beginning of the transaction no matter where they are located. When an exception occurs, everything between the BEGIN up to and including the failing INSERT will be undone by an implicit ROLLBACK. Even if a COMMIT then occurs on return from a function, no row will be made permanent. The Oracle ROLLBACK undoes everything since the transaction began – even work done inside functions.

 

In the example code below, calling a ROLLBACK from a procedure nested within another procedure will roll back the rows containing both ‘1’ and ‘2’. It rolls back to the very first BEGIN.

 

CREATE OR REPLACE PROCEDURE testProcedure AS
BEGIN
            INSERT INTO testTable(testColumn) VALUES ('1');
            nestedTestProcedure;
END testProcedure;
CREATE OR REPLACE PROCEDURE nestedTestProcedure AS
BEGIN
            INSERT INTO testTable(testColumn) VALUES ('2');
            COMMIT;
EXCEPTION
            WHEN others THEN
                 ROLLBACK;
END nestedTestProcedure;

 

The PostgreSQL Difference

 

While Oracle permits transactional statements inside a PL/SQL procedure or function, PostgreSQL does not. If we try to compile the PostgreSQL equivalent of the Oracle code above, it will not compile. It will throw an error:

 

            ERROR:  cannot begin/end transactions in PL/pgSQL

            HINT:  Use a BEGIN block with an EXCEPTION clause instead.

 

In short, a ROLLBACK cannot span functions and will only undo work done inside the current function. With the autonomous-transaction workaround described below, everything prior to its invocation is isolated and hence not impacted (i.e., it is neither committed nor rolled back).
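For reference, the approach that hint points to – a BEGIN block with an EXCEPTION clause – behaves like a subtransaction scoped to the current block. A minimal sketch (the function name is illustrative, not from the original article):

CREATE OR REPLACE FUNCTION safeInsert() RETURNS void
  AS $$
BEGIN
    -- work that we may need to undo
    INSERT INTO testTable(testColumn) VALUES ('2');
EXCEPTION
    WHEN others THEN
        -- only the changes made inside this block are rolled back;
        -- the caller's prior work is left untouched
        RAISE NOTICE 'insert failed; changes in this block were rolled back';
END;
$$ LANGUAGE plpgsql;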

 

In contrast with the Oracle sample code, instead of rolling back the rows containing ‘1’ and ‘2’, PostgreSQL would only roll back ‘2’.

 

CREATE OR REPLACE FUNCTION testFunction() RETURNS void
  AS $$
            BEGIN
              INSERT INTO testTable(testColumn) VALUES ('1');
              PERFORM nestedTestFunction();
            END;
  $$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION nestedTestFunction() RETURNS void
  AS $$
            BEGIN
                 INSERT INTO testTable(testColumn) VALUES ('2');
                 COMMIT;
                 RETURN;
            EXCEPTION
                 WHEN others THEN
                    ROLLBACK;
                    RETURN;
            END;
$$ LANGUAGE plpgsql;

 

However, when translating Oracle code to PostgreSQL where this behavior is expected, we can implement a workaround that can behave like an autonomous transaction. An architect would choose this solution as a workaround (versus re-writing the function to become more "PostgreSQL-like") while working with an existing or mature application in order to minimize impact to the application that would interact with the database.

 

The workaround involves rewriting the parent (testFunction) to become a wrapper function. We then create an autonomous child function (nestedTestFunction) which is invoked over a dblink() connection and can be committed or rolled back on its own. In essence, we are opening another connection to the same database and running nestedTestFunction there:

 

CREATE OR REPLACE FUNCTION testFunction() RETURNS integer
  AS $$
    DECLARE
        randomNum  text := (random() * 9 + 1);
        cnxName    text;
        cnxString  text := 'hostaddr=127.0.0.1 port=5440 dbname=x user=y password=z';
        success    integer;
    BEGIN
        SELECT concat('cnx', randomNum) INTO cnxName;
        PERFORM dblink_connect(cnxName, cnxString);
        PERFORM dblink_exec(cnxName, 'BEGIN');
        SELECT * INTO success FROM dblink(cnxName, 'SELECT nestedTestFunction()')
           AS (status integer);
        IF (success >= 1) THEN
            PERFORM dblink_exec(cnxName, 'COMMIT');
            PERFORM dblink_disconnect(cnxName);
            RETURN 1;     -- committed
        ELSE
            PERFORM dblink_exec(cnxName, 'ROLLBACK');
            PERFORM dblink_disconnect(cnxName);
            RETURN 0;     -- rolled back
        END IF;
    END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION nestedTestFunction() RETURNS integer
   AS $$
    BEGIN
        INSERT INTO testTable(testcolumn) VALUES ('1');
        INSERT INTO testTable(testcolumn) VALUES ('2');
        RETURN 1;    -- good status
    EXCEPTION
        WHEN others THEN
           RETURN 0;    -- error status
    END;
$$ LANGUAGE plpgsql;

 

Inside this code, the INSERTs contained in the nested function will either be committed or rolled back based on the result of that function, but neither operation will impact testFunction().

 

Conclusion

 

When translating PL/SQL to PL/pgSQL which makes use of COMMITs and ROLLBACKs, there are subtle issues of scope that are greater than what is visible on the surface. It is imperative that the database architect give consideration as to whether any adaptations between the two are required to ensure data integrity.

Josh Berkus: PostgreSQL plus Vertica on Tuesday: SFPUG Live Video

This upcoming Tuesday, the 27th, SFPUG will have live streaming video of Chris Bohn from Etsy talking about how he uses PostgreSQL and Vertica together to do data analysis of Etsy's hundreds of gigabytes of customer traffic – barring technical difficulties with the video or internet, of course.

The video will be on the usual SFPUG Video Channel.  It is likely to start around 7:15PM PDT.  Questions from the internet will be taken on the attached chat channel.

For those in San Francisco, this event will be held at Etsy's new downtown SF offices, and Etsy is sponsoring a Tacolicious taco bar.  Of course, the event is already full up, but you can always join the waiting list.

In other, related events, sfPython will be talking about PostgreSQL performance, and DjangoSF will be talking about multicolumn joins, both on Wednesday the 28th.  I'll be at DjangoSF, doing my "5 ways to Crash Postgres" lightning talk.

Raghavendra Rao: How to change all objects ownership in a particular schema in PostgreSQL ?

A few suggestions here (thanks!) inspired me to compose a bash script for changing the ownership of all objects (TABLES / SEQUENCES / VIEWS / FUNCTIONS / AGGREGATES / TYPES) in a particular schema in one go. There is no special code in the script; I basically picked the technique suggested and simplified the implementation via a script. Actually, the REASSIGN OWNED BY command does most of the work smoothly, however it changes the ownership of objects database-wide regardless of schema. There are two situations where you may not be able to use REASSIGN OWNED BY:

1. If the user by mistake created all his objects with the superuser (postgres), and later intends to change them to another user, then REASSIGN OWNED BY will not work and will merely error out as:
postgres=# reassign owned by postgres to user1;
ERROR: cannot reassign ownership of objects owned by role postgres because they are required by the database system
2. If the user wishes to change the ownership of objects in only one schema.

In either case – changing objects from the "postgres" user to another user, or changing the objects of just one schema – we need to loop through each object, collecting object details from the pg_catalog tables and information_schema and calling ALTER TABLE / FUNCTION / AGGREGATE / TYPE etc.

I liked the technique of tweaking the pg_dump output with OS commands (sed/egrep), because by nature pg_dump writes an ALTER ... OWNER TO statement for every object (TABLES / SEQUENCES / VIEWS / FUNCTIONS / AGGREGATES / TYPES) in its output. Grep'ing those statements from the pg_dump stdout, replacing the new USER/SCHEMA NAME with sed, and then passing those statements back to the psql client will fix things even if the objects are owned by the postgres user. I used the same approach in the script and allowed the user to pass a NEW USER NAME and SCHEMA NAME to substitute into the ALTER ... OWNER TO ... statements.

Script usage and output:
sh change_owner.sh  -n new_rolename -S schema_name

-bash-4.1$ sh change_owner.sh -n user1 -S public

Summary:
Tables/Sequences/Views : 16
Functions : 43
Aggregates : 1
Type : 2

You can download the script from here, and there's also README to help you on the usage.

--Raghav

Valentine Gogichashvili: Real-time console based monitoring of PostgreSQL databases (pg_view)

In many cases, it is important to be able to keep your hand on the pulse of your database in real-time. For example when you are running a big migration task that can introduce some unexpected locks, or when you are trying to understand how the current long running query is influencing your IO subsystem.

For a long time I was using a very simple bash alias, injected from the .bashrc script, that combined calls to system utilities like watch, iostat, uptime and df, some additional statistics from /proc/meminfo, and psql extracting information about currently running queries and whether those queries are waiting for a lock. But this approach had several disadvantages. In many cases I was interested in the disk read/write information for query processes or PostgreSQL system processes, like the WAL and archive writers. I also wanted a really easy way to notice the queries that are waiting for locks, and probably highlight them by color.
Several weeks ago we finally open-sourced our new tool, which makes our lives much easier. That tool combines all the feature requests that I had been dreaming of for a long time. Here it is: pg_view.

I already have some more feature requests, actually, and hope that Alexey will find some time to add them to the tool in the near future. So if somebody wants to contribute or has more ideas, please comment and open feature requests on the github page :)

Michael Paquier: Postgres module highlight: customize passwordcheck to secure your database

passwordcheck is a contrib module shipped with PostgreSQL core that uses a server-side hook, invoked when creating or modifying a role with CREATE/ALTER ROLE/USER, to check the password. This hook lives in src/backend/commands/user.c and is called check_password_hook if you want to have a look. This module basically checks the password format and returns [...]

Fabien Coelho: Turing Machine in SQL (4)


In previous posts [1, 2, 3], I have presented different ways of implementing a Turing Machine (TM) in SQL with PostgreSQL. All three techniques rely on WITH RECURSIVE to iterate until the TM stops, so as to provide some kind of while construct.

In this post, I get rid of this construct, so that the solution does not require PostgreSQL 8.4 or later. Obviously there is a trick: I will use a recursive SQL function with side effects on a TABLE to execute the TM.

Turing Machine with a recursive SQL function

In this post the TM is built from the following SQL features: one recursive SQL function to iterate the TM, INNER JOIN to get transition and state information, a CASE expression to stop or recurse, a separate TABLE to store the tape contents, and INSERT and UPDATE commands to update the tape. A SEQUENCE is also used implicitly, but could be avoided.

An ARRAY with ORDER BY and a sub-SELECT are also used to record the tape state, but they are not strictly necessary; they are just there for displaying the TM execution summary at the end.

Turing Machine tape setup

The tape is stored in a standard TABLE which holds the current symbol at each position, and possibly, temporarily, the previous symbol at that position.

-- create initial tape contents
CREATE TABLE RunningTape(
  tid SERIAL PRIMARY KEY,                    -- implicit sequence there
  symbol INTEGER NOT NULL REFERENCES Symbol,
  -- previous symbol needed temporarily between an update & its recursion
  psymbol INTEGER REFERENCES Symbol DEFAULT NULL
);

INSERT INTO RunningTape(symbol)
  SELECT symbol FROM Tape ORDER BY tid;

This tape will be used and modified by the next query while the TM is executed.

Turing Machine execution

Let us now execute a run with a recursive SQL function:

-- update tape as a recursive SQL function side effect
CREATE OR REPLACE FUNCTION recRun(ite INTEGER, sta INTEGER, pos INTEGER)
RETURNS INTEGER VOLATILE STRICT AS $$
  -- keep a trace for later display
  INSERT INTO Run(rid, sid, pos, tape)
    VALUES (ite, sta, pos,
            ARRAY(SELECT symbol FROM RunningTape ORDER BY tid));
  -- ensure that the tape is long enough, symbol 0 is blank
  INSERT INTO RunningTape(symbol) VALUES (0);
  -- update tape contents
  UPDATE RunningTape AS tp
     SET symbol = tr.new_symbol,  -- update the tape symbol
         psymbol = tr.symbol      -- but keep a copy as we need it again for the recursion
    FROM Transition AS tr
   WHERE tr.sid = sta AND tr.symbol = tp.symbol AND tp.tid = pos;
  -- now the recursion
  SELECT CASE
           WHEN st.isFinal THEN st.sid                       -- stop recursion on a final state
           ELSE recRun(ite + 1, tr.new_state, pos + tr.move) -- or do *recurse*
         END
    FROM Transition AS tr
    JOIN RunningTape AS tp ON (tp.psymbol = tr.symbol)
    JOIN State AS st USING (sid)
   WHERE st.sid = sta AND tp.tid = pos;
$$ LANGUAGE SQL;

The first INSERT records the TM execution and could be removed without affecting the end result of the Turing machine. The second INSERT extends the tape with a blank symbol, so that the TM cannot run off the tape. The UPDATE modifies the tape contents based on the transition and state, but keeps track of the changed symbol, which is needed for the next statement. Finally, the SELECT either stops or recurses, depending on whether the state is final.

Then the recursive SQL function can be simply invoked by providing the initial state and tape position:

SELECT recRun(0, 0, 1); -- start Turing Machine

You can try this self-contained SQL script which implements a Turing Machine for accepting the AnBnCn language using the above method.

As this version does not require WITH RECURSIVE or WINDOW functions, it should work with versions of PostgreSQL before 8.4, which initially provided these features.

Final note

For any practical system, all Turing completeness proofs really deal with memory-bounded Turing completeness, as the amount of data is necessarily finite, so we are really only talking about a (possibly) big automaton. For instance, our implementations rely on the INTEGER type for tape addresses, which implicitly implies that the tape, hence the memory, is finite. It would be a TM if we could use a mathematical integer instead, but that in itself would require unbounded memory.

The WITH RECURSIVE feature comes with SQL:1999, but WINDOW functions come with SQL:2003.

See the Cyclic Tag System (CTS) implementation in SQL by Andrew Gierth, which seems to be Turing complete although the proof of that is quite complex.

There is a post by Jens Schauder which builds a TM with Oracle SQL, however the iteration loop is finite, so it seems to me that this is not really Turing completeness.

Since the SQL:1999 standard includes an actual programming language (SQL/PSM), one could consider that SQL is Turing complete because of that, but this is cheating!

This interesting page by Andreas Zwinkau lists various accidentally Turing complete systems.



Hans-Juergen Schoenig: Table bloat revisited: Making tables shrink

Many people are wondering why deleting data from a table in a PostgreSQL database does not shrink files on disk. You would expect storage consumption to go down when data is deleted. This is not always the case. To show this really works I have compiled some small examples. Let us get started with a [...]

Chris Travers: When to use SELECT * in PostgreSQL

In LedgerSMB we use a lot of queries which involve SELECT *.  Many people consider SELECT * harmful but there are several cases where it is useful.  Keep in mind we encapsulate the database behind an API, so SELECT * has different implications than it does from applications selecting directly from tables.

The Fundamental Design Questions


It all comes down to software interface contracts and types.  Poorly thought-out contracts, loosely applied, lead to unmaintainable code.  Clear contracts, carefully applied, lead to maintainable code because the expectations are easily enforced.

PostgreSQL comes with a complex type system where every table, view, or composite type is an object class.  In the right contexts, SELECT * provides you a result of a guaranteed type.  This is important when doing object relational work because it means you get a series of objects back in a defined class.  This allows you to then pass those on to other functions to get derived data.

Select * therefore helps you when working with objects, because you can ensure that the result types are in fact valid objects of a specified class defined in the relation clause of the query. 

Where SELECT * can't be helpful


SELECT * is never helpful (and can have significant problems) in specific areas, such as views and anywhere you have a join.  There are specific reasons for these problems.

Views pose some real problems for SELECT * because this can cause the contract involving selecting from the view to change on dump/reload.

Consider the following:

chris=# create table typetest (test text);
CREATE TABLE
chris=# insert into typetest values ('test1'), ('test2');
INSERT 0 2
chris=# CREATE VIEW typetestview AS select * from typetest;
CREATE VIEW
chris=# select * from typetestview;
 test 
-------
 test1
 test2
(2 rows)


chris=# alter table typetest add newfield bool default false;
ALTER TABLE
chris=# select * from typetestview;
 test 
-------
 test1
 test2
(2 rows)


All seems OK until you dump and restore.  If you dump and restore, typetestview will suddenly have another column.  You probably don't want your contracts with software changing on a dump and reload, so you should probably avoid SELECT * in views, or if you must use it, rebuild your views frequently (i.e. run a SQL script with DROP VIEW... CREATE VIEW...) so that your contract changes with the underlying table and you aren't caught flat-footed in a disaster recovery situation.
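A minimal sketch of such a rebuild script for the example above (wrapped in a transaction so the view never disappears for concurrent readers):

BEGIN;
DROP VIEW IF EXISTS typetestview;
-- recreated against the current definition of typetest,
-- so the view now picks up the newly added column
CREATE VIEW typetestview AS SELECT * FROM typetest;
COMMIT;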

The key problem here is that views are their own types, and consequently you cannot guarantee that the view type will be the same as the underlying table type.  This is a problem which can be managed, but unless it is required for a specific view (for example, one implementing some form of row-level security), the management effort is probably not worth it.

Once joins are used in a query, however, SELECT * loses any potential benefit.  Joins do not return a defined type, and so SELECT * should never be used in queries utilizing joins (aside possibly from ad hoc queries run by the dba to explore the data).

SELECT * and Stored Procedures


Consider for example the following CRUD stored procedure:

CREATE OR REPLACE FUNCTION accounts__list_all()
RETURNS SETOF accounts
LANGUAGE SQL AS
$$
    SELECT * FROM accounts ORDER BY account_no;
$$;

This query is relatively simple, but the stored procedure returns a type that is defined by the underlying table.  We all run into cases where application data can't be much further normalized and we may want to have stored procedures delivering that data to the application.  In this case, we are likely to use a function like this, and that enables us to do other object-relational things outside it.

Now, if we need to change the underlying accounts table, we can always make a decision as to whether to make accounts a view with a stable representation, a complex type with a hand-coded query returning it, or just propagate the changes upwards.  Because the application is not directly selecting from the underlying storage, we have options to ensure that the contract can be maintained.  In essence this injects a dependency that allows us to maintain contracts more easily through schema changes.

Consequently although it leads to the same execution plan in this example, there is a tremendous difference, software engineering-wise, between an application calling:

SELECT * FROM accounts ORDER BY account_no;

and

SELECT * FROM accounts__list_all();

In the first case, you have only one contract, between the high level application code and the low-level storage.  In the second case, you have two contracts, one between the storage and the procedure (which can be centrally adjusted), and a more important one between the application code and the stored procedure.

Conclusions

In PostgreSQL, the choice of whether to use SELECT * in a query is a relatively clear one.  If you want to return objects of a type of an underlying construct, and the return type is closely tied over time to the output type, then SELECT * is fine.  On the other hand, if these things aren't true then either you should find ways to make them true, or avoid using SELECT * altogether.

This makes a lot more sense when you realize that things like table methods can be passed up when select * is used (or methods applied to views, or the like).
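To illustrate what is meant by table methods (the function names here are made up, reusing the typetest table from earlier): a function whose argument is the table's row type acts as a method, and because a SELECT * stored procedure returns rows of that exact type, the method can still be applied to its results, even with attribute notation:

-- a "method" on the typetest row type
CREATE FUNCTION shouty(t typetest) RETURNS text
LANGUAGE sql AS $$ SELECT upper(t.test) $$;

-- returns real typetest rows, so methods still apply to its results
CREATE FUNCTION typetest__list_all() RETURNS SETOF typetest
LANGUAGE sql AS $$ SELECT * FROM typetest $$;

-- attribute notation invokes shouty() on each returned row
SELECT t.test, t.shouty FROM typetest__list_all() t;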

In general you are going to find two areas where select * is most helpful.  The first is in object-relational design.   The second case is where you want PostgreSQL to define an object model for you.  In reality the first case is a special case of the second.

This way of doing things is very different than the way most applications work today.  The database is encapsulated behind an object model and the application consumes that object model.  In those cases, select * is very helpful.

Ian Barwick: Living on the edge - 9.3 RC1


PostgreSQL 9.3 RC1 was released the other day, and despite the dire warnings I couldn't resist putting it on this server to try out some of the new functionality in live operation. Admittedly it's not a real "production server" as having the database crash or mangle the data beyond repair would be merely an annoyance to myself and is nothing that can't be recovered from backups, so it's a good way of testing the new release. I've had good experiences with release candidates in the past, and probably the worst that could happen is the discovery of some issue requiring a bump of the catalog version number before final release, which means I'd have to upgrade from backups (which would probably mean whole minutes of downtime).

Disclaimer, esp. for any of my colleagues reading this and wondering if I'm insane: there's no way I would ever do this with a genuine production installation.

Anyway, a full day's worth of log files shows no errors or other issues associated with the new release. This is particularly gratifying as there's now a (modest) custom background worker running, which is giving me a warm tingly feeling and has got me thinking about ideas for new ones. I also feel a materialized view coming on, and I'm sure there's some way I could contrive a justification for using a writable foreign data wrapper.

Thanks to everyone who has put so much effort into this release - I'm looking forward to the day when it can run in production for real.


Selena Deckelmann: Fancy SQL Monday: format() instead of quote_*()


In the comments, Isaac pointed out that using format() dramatically increases the readability of SQL. I liked the look of his query, so I dug a little deeper.

As of version 9.1 (released in 2011), a new function is listed in Postgres’ built-in string function documentation:

format(formatstr text [, str "any" [, ...] ]): Format a string. This function is similar to the C function sprintf; but only the following conversion specifications are recognized: %s interpolates the corresponding argument as a string; %I escapes its argument as an SQL identifier; %L escapes its argument as an SQL literal; %% outputs a literal %. A conversion can reference an explicit parameter position by preceding the conversion specifier with n$, where n is the argument position.

We also have examples linked in the definition for various quoting strategies for dynamic SQL.

This is an example where the Postgres documentation probably should have reversed the order of what is mentioned.

It turns out that format() makes it much easier to avoid using the quote_*() functions. The code looks a lot more like a python """ string, with flexible options for usage. The only feature missing is named parameters.
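As a minimal sketch (the count_rows function and its arguments are made up for illustration), here is the same dynamic query written both ways inside a PL/pgSQL function:

CREATE OR REPLACE FUNCTION count_rows(tbl text, col text, val text)
RETURNS bigint LANGUAGE plpgsql AS $$
DECLARE
  result bigint;
BEGIN
  -- quote_*() style: the concatenation obscures the shape of the query
  -- EXECUTE 'SELECT count(*) FROM ' || quote_ident(tbl)
  --      || ' WHERE ' || quote_ident(col) || ' = ' || quote_literal(val)
  --   INTO result;

  -- format() style: %I escapes identifiers, %L escapes literals
  EXECUTE format('SELECT count(*) FROM %I WHERE %I = %L', tbl, col, val)
    INTO result;
  RETURN result;
END;
$$;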

My application requires Postgres 9.2 at this point (for JSON datatype), so my plan is to refactor a few functions using format() instead of quote_ident() in particular.

Are there situations where you’d prefer to use quote_*() other than for backward compatibility? It seems as though format() is far safer, particularly for the quoting problems mentioned on the Quote Literal Example documentation.

Craig Ringer: Choosing a PostgreSQL text search method


(This article is written with reference to PostgreSQL 9.3. If you’re using a newer version please check to make sure any limitations described remain in place.)

PostgreSQL offers several tools for searching and pattern matching text. The challenge is choosing which to use for a job. There's:

  • LIKE and ILIKE pattern matching;
  • POSIX regular expression matching with ~ and ~*;
  • full-text search with tsvector and tsquery.

There’s also SIMILAR TO, but we don’t speak of that in polite company, and PostgreSQL turns it into a regular expression anyway.

Each of the built-in searching options comes with multiple choices of index:

  • b-tree indexes, with the default operator class or with text_pattern_ops;
  • GiST or GIN indexes using the pg_trgm operator classes;
  • GiST or GIN indexes over tsvector for full-text search.

It’s not surprising that people get confused.

Rather than starting with “what method should I use to make my search fastest”, I suggest you narrow the field by determining what the semantics of your search requirements are.

In the following descriptions any examples will use the words table created with:

create table words ( word text not null );
\copy words from /usr/share/dict/linux.words

(Your system might have different dictionary files in /usr/share/dict, but the effect will be much the same.)

Simple prefix searches (“Starts with…”)

Are you doing only simple prefix searches that match terms exactly, including punctuation and whitespace, either case sensitive or insensitive? If so, a simple column LIKE 'pattern%' search on a text_pattern_ops index may be suitable:

regress=> CREATE INDEX words_btree_tpo ON words(word text_pattern_ops);

regress=> # LIKE prefix search is fast:
regress=> EXPLAIN (ANALYZE ON, COSTS OFF, TIMING OFF) SELECT word FROM words WHERE word LIKE 'freck%';
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Only Scan using words_btree_tpo on words (actual rows=18 loops=1)
   Index Cond: ((word ~>=~ 'freck'::text) AND (word ~<~ 'frecl'::text))

regress=> # as is left-anchored regexp matching:
regress=> EXPLAIN (ANALYZE ON, COSTS OFF, TIMING OFF) SELECT word FROM words WHERE word ~ '^freck.*';
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Only Scan using words_btree_tpo on words (actual rows=18 loops=1)
   Index Cond: ((word ~>=~ 'freck'::text) AND (word ~<~ 'frecl'::text))

regress=> # ILIKE doesn't use the index though:
regress=> EXPLAIN (ANALYZE ON, COSTS OFF, TIMING OFF) SELECT word FROM words WHERE word ILIKE 'freck%';
                 QUERY PLAN
--------------------------------------------
 Seq Scan on words (actual rows=18 loops=1)
   Filter: (word ~~* 'freck%'::text)
   Rows Removed by Filter: 479810
 Total runtime: 339.787 ms
(4 rows)

regress=> # and neither does a suffix search:
regress=> EXPLAIN (ANALYZE ON, COSTS OFF, TIMING OFF) SELECT word FROM words WHERE word LIKE '%freck';
                QUERY PLAN
-------------------------------------------
 Seq Scan on words (actual rows=1 loops=1)
   Filter: (word ~~ '%freck'::text)
   Rows Removed by Filter: 479827
 Total runtime: 91.125 ms
(4 rows)

In newer PostgreSQL versions this works even if you concatenate the wildcard onto a parameter:

regress=> PREPARE ps1(text) AS SELECT word FROM words WHERE word LIKE $1 || '%';
regress=> EXPLAIN (ANALYZE ON, COSTS OFF, TIMING OFF)  EXECUTE ps1('freck');
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Only Scan using words_btree_tpo on words (actual rows=18 loops=1)
   Index Cond: ((word ~>=~ 'freck'::text) AND (word ~<~ 'frecl'::text))
   Filter: (word ~~ 'freck%'::text)
   Heap Fetches: 18
 Total runtime: 0.060 ms
(5 rows)

Variants on prefix search

For case insensitive prefix searches, you can use lower(column) LIKE lower(pattern):

regress=> CREATE INDEX words_lower_btree_tpo ON words(lower(word) text_pattern_ops);
regress=> EXPLAIN (ANALYZE ON, COSTS OFF, TIMING OFF) SELECT word FROM words WHERE lower(word) LIKE lower('freck%');
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Bitmap Heap Scan on words (actual rows=18 loops=1)
   Filter: (lower(word) ~~ 'freck%'::text)
   ->  Bitmap Index Scan on words_lower_btree_tpo (actual rows=18 loops=1)
         Index Cond: ((lower(word) ~>=~ 'freck'::text) AND (lower(word) ~<~ 'frecl'::text))
 Total runtime: 0.073 ms
(5 rows)

citext won’t help you; ILIKE won’t use an index even with the citext data type:

regress=> CREATE TABLE wordsci ( word citext not null );
regress=> \copy wordsci from '/usr/share/dict/linux.words'
regress=> create index wordsci_btree_tpo ON wordsci (word text_pattern_ops);
regress=> explain SELECT word FROM wordsci WHERE word LIKE 'AIL%';
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on wordsci  (cost=0.00..8463.85 rows=2399 width=10)
   Filter: (word ~~ 'AIL%'::citext)
(2 rows)

regress=> explain SELECT word FROM wordsci WHERE word ILIKE 'AIL%';
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on wordsci  (cost=0.00..8463.85 rows=2399 width=10)
   Filter: (word ~~* 'AIL%'::citext)
(2 rows)

If you’re doing only “ends with” searches you can actually index reverse(my_column) with text_pattern_ops and then search for reverse(my_column) LIKE reverse(my_pattern), so in some cases you can use the same approach as for prefix searches. Again, this only works when you’re matching punctuation and spacing exactly.

regress=> CREATE INDEX words_rev_btree_tpo ON words(reverse(word) text_pattern_ops);
regress=> # Find words that end with "ent":
regress=> EXPLAIN (ANALYZE ON, COSTS OFF, TIMING OFF) SELECT word FROM words WHERE reverse(word) LIKE reverse('%ent');
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Bitmap Heap Scan on words (actual rows=4069 loops=1)
   Filter: (reverse(word) ~~ 'tne%'::text)
   ->  Bitmap Index Scan on words_rev_btree_tpo (actual rows=4069 loops=1)
         Index Cond: ((reverse(word) ~>=~ 'tne'::text) AND (reverse(word) ~<~ 'tnf'::text))
 Total runtime: 5.680 ms
(5 rows)

You can do simple punctuation and spacing normalisation with a user-defined function that transforms the input string using replace or regexp_replace, so you search for my_normalize_func(col) LIKE my_normalize_func('pattern')… but it quickly gets inefficient and clumsy to work like this. It can be a good option if you have unusual or strict search requirements, though.
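As a sketch of that approach (the function name and normalisation rules are made up), using the words table from above:

CREATE FUNCTION my_normalize_func(t text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
  -- lower-case, then strip everything that is not a letter, digit or space
  SELECT regexp_replace(lower(t), '[^a-z0-9 ]', '', 'g');
$$;

-- IMMUTABLE lets the function back an expression index:
CREATE INDEX words_norm_tpo ON words (my_normalize_func(word) text_pattern_ops);

SELECT word FROM words
 WHERE my_normalize_func(word) LIKE my_normalize_func('freck') || '%';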

infix and suffix patterns

If you’re still reading, you probably need to match within the string, not just at the start, or you need to ignore punctuation and formatting differences. If you’ve been writing LIKE 'my%pattern' or LIKE '%word%' then this is you.

An infix wildcard like my%pattern will use a text_pattern_ops btree index, doing a search for my% then re-checking the matches. This is often good enough:

regress=> EXPLAIN (ANALYZE ON, COSTS OFF, TIMING OFF) SELECT word FROM words WHERE word LIKE 'ta%le';
                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on words (actual rows=59 loops=1)
   Filter: (word ~~ 'ta%le'::text)
   Rows Removed by Filter: 2661
   ->  Bitmap Index Scan on words_btree_tpo (actual rows=2720 loops=1)
         Index Cond: ((word ~>=~ 'ta'::text) AND (word ~<~ 'tb'::text))
 Total runtime: 1.386 ms
(6 rows)

… but one with no left-anchored text at all, like %word%, cannot use a regular b-tree. In that case, the next question is whether you need to match partial words or not. If you’re given the search term ruff and need to find truffle then you’ve really only got one option for this kind of mid-word “contains” search: pg_trgm indexes on pattern matching LIKE or ~ searches.

As superuser:

regress=# CREATE EXTENSION pg_trgm;

then:

regress=> CREATE INDEX words_trgm_gin ON words USING GIN(word gin_trgm_ops);
regress=> EXPLAIN (ANALYZE ON, COSTS OFF, TIMING OFF) SELECT word FROM words WHERE word LIKE '%ruff%';
                             QUERY PLAN
---------------------------------------------------------------------
 Bitmap Heap Scan on words (actual rows=99 loops=1)
   Recheck Cond: (word ~~ '%ruff%'::text)
   Rows Removed by Index Recheck: 8
   ->  Bitmap Index Scan on words_trgm_gin (actual rows=107 loops=1)
         Index Cond: (word ~~ '%ruff%'::text)
 Total runtime: 0.511 ms
(6 rows)

Full-text search

If you’re looking for whole words or word prefixes you can consider full-text search. For your application:

  • Is it OK to ignore the order of search terms? Should dog jumped match jumped dog?
  • Do you want to offer a boolean / advanced search with and/or/not queries like jumped & !shark?
  • Do you need to stem words to their roots, so that a search for “cats” finds “cat”, etc?
  • Are you OK with inexact matches that ignore punctuation and case?

If all of that is “yes”, you probably want to use full-text search. A large variety of dictionaries are offered to customise behaviour with synonym matching, stemming, canonicalization of related terms, etc.

Full-text search supports word-by-word prefix matching, so you can match supercalifragilisticexpialidocious with super:*. It also offers boolean search with and, or, negation, and grouping, leading to such queries as:

SELECT to_tsvector('supercalafragalisticexpialidocious') @@ to_tsquery('(super:*) & !(superawful:* | supper)');

If you don’t want stemming and synonyms you can still use full text search with one of the thesaurus dictionaries or with the ‘simple’ dictionary. Prefix searches still work without stemming.

I won’t demonstrate all the options for full-text search here as they’re already well covered elsewhere.

Mixed/multiple language search

Multi-language search is supported by full text search if you have a separate column that identifies the language the text is in. You can then index to_tsvector(language_column, text_column).
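A sketch of that layout (the docs table and its columns are assumed for illustration): store the language as a regconfig column, so that to_tsvector(lang, body) is immutable and can be indexed:

CREATE TABLE docs (
  lang regconfig NOT NULL,  -- e.g. 'english', 'french'
  body text NOT NULL
);

CREATE INDEX docs_fts ON docs USING GIN (to_tsvector(lang, body));

-- the query must use the same expression for the index to be considered
SELECT body FROM docs
 WHERE to_tsvector(lang, body) @@ to_tsquery('simple', 'dog & jumped');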

Mixed-language text or unknown-language text search is supported by full-text search, but only if you use the simple dictionary, in which case you don’t get stemming.

In conclusion

… it’s complicated. You need to know what you want before you can decide what to use.

You will notice that I barely even touched on performance in quantitative terms throughout this entire discussion. I looked into qualitative factors like which queries can use which index types, and into semantics, but not the details of timing and performance measurement. I didn’t get into GIN vs GiST choice for index types that offer both. That’s a whole separate topic, but one that’s already been discussed elsewhere. In any case the only really acceptable performance guidance is benchmarks you run on your hardware with a simulation of your data set.

At no point did I try to determine whether LIKE or full-text search is faster for a given query. That’s because it usually doesn’t matter; they have different semantics. Which goes faster, a car or a boat? In most cases it doesn’t matter because speed isn’t your main selection criteria, it’s “goes on water” or “goes on land”.

Dimitri Fontaine: Auditing Changes with Hstore


In a previous article about Trigger Parameters we have been using the extension hstore in order to compute some extra field in our records, where the fields used both for the computation and for storing the results were passed in as dynamic parameters. Today we're going to see another trigger use case for hstore: we are going to record changes made to our tuples.

Comparing hstores

One of the operators that hstore provides is the hstore - hstore operator, whose documentation says that it will delete matching pairs from the left operand.

# select 'f1 => a, f2 => x'::hstore - 'f1 => b, f2 => x'::hstore as diff;
   diff    
-----------
 "f1"=>"a"
(1 row)

That's what we're going to use in our change-auditing trigger now, because it's a pretty useful format for understanding what did change.

Auditing changes with a trigger

First we need some setup, a couple of tables to use in our worked out example:

create table example
 (
   id   serial,
   f1   text,
   f2   text
 );

create table audit
 (
  change_date timestamptz default now(),
  before hstore,
  after  hstore
 );

The idea is to add a row to the audit table each time a row of the audited table is updated, with the hstore representation of the data in flight before and after the change. So as to avoid the problem of not being able to easily rebuild the current value of a row at any point in its history, we're going to store a couple of full hstore representations here.

create function audit()
  returns trigger
  language plpgsql
as $$
begin
  INSERT INTO audit(before, after)
       SELECT hstore(old), hstore(new);
  return new;
end;
$$;

Now, we need to attach the trigger to the table which is the source of our events. Note that we could attach the same trigger to any table, in fact, as the details of the audit table have nothing specific to the example table. If you want to do that, though, you will certainly want to add the name of the source table of the event you're processing, available from within your trigger as TG_TABLE_NAME. Oh, and maybe add TG_TABLE_SCHEMA while at it!
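Here's a minimal sketch of that multi-table variant (the audit_multi names are made up for this illustration, not part of the original article):

create table audit_multi
 (
  change_date  timestamptz default now(),
  source_table text,
  before hstore,
  after  hstore
 );

create or replace function audit_multi()
  returns trigger
  language plpgsql
as $$
begin
  -- TG_TABLE_SCHEMA and TG_TABLE_NAME identify the table that fired the event
  INSERT INTO audit_multi(source_table, before, after)
       SELECT TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, hstore(old), hstore(new);
  return new;
end;
$$;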

Be sure to check the PL/pgSQL Trigger Procedures documentation.

create trigger audit
      after update on example
          for each row
 execute procedure audit();

Testing it

With that in place, let's try it out:

insert into example(id, f1, f2) values(1, 'a', 'a');
update example set f1 = 'b' where id = 1;
update example set f2 = 'c' where id = 1;

And here's what we can see:

# select change_date, after - before as diff from audit;
          change_date          |   diff    
-------------------------------+-----------
 2013-08-27 17:59:19.808217+02 | "f1"=>"b"
 2013-08-27 17:59:19.808217+02 | "f2"=>"c"
(2 rows)
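A side note, not from the original post: because we stored full representations rather than just the differences, any audited version can be turned back into a row of the example type with hstore's populate_record:

# select change_date, (populate_record(null::example, after)).* from audit order by change_date;

This expands each after image into regular example columns, which makes ad hoc inspection of historical states straightforward.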

The hstore extension is really useful and versatile, and we just saw another use case for it!

Yann Larrivee: ConFoo: The conference for web developers is looking for speakers.


ConFoo is currently looking for web professionals with a deep understanding of PHP, Java, Ruby, DotNet, HTML5, Databases, Cloud Computing, Security and Mobile development to share their skills and experience at the next ConFoo. Submit your proposals between August 26th and September 22nd.

ConFoo is a conference for developers that has built a reputation as a prime destination for exploring new technologies, diving deeper into familiar topics, and experiencing the best of community and culture.

  • ConFoo 2014 will be hosted on February 26-28 in Montreal, at the Hilton Bonaventure Hotel.
  • We take good care of our speakers by covering most expenses including travel, accommodation, lunch, full conference ticket, etc.
  • Presentations are 35min + 10min for questions, and may be delivered in English or French.
  • ConFoo is an open environment where everyone is welcome to submit. We are simply looking for quality proposals by skilled and friendly people.

If you would simply prefer to attend the conference, we have a $230 discount until October 16th.


Peter Eisentraut: Automating patch review


I think there are two kinds of software development organizations (commercial or open source):

  1. Those who don’t do code review.

  2. Those who are struggling to keep up with code review.

PostgreSQL is firmly in the second category. We never finish commit fests on time, and lack of reviewer resources is frequently mentioned as one of the main reasons.

One way to address this problem is to recruit more reviewer resources. That has been tried; it’s difficult. The other way is to reduce the required reviewer resources. We can do this by automating things a little bit.

So I came up with a bag of tools that does the following:

  1. Extract the patches from the commit fest into Git.

  2. Run those patches through an automated test suite.

The first part is done by my script commitfest_branches. It extracts the email message ID for the latest patch version of each commit fest submission (either from the database or the RSS feed). From the message ID, it downloads the raw email message and extracts the actual patch file. Then that patch is applied to the Git repository in a separate branch. This might fail, in which case I report that back. At the end, I have a Git repository with one branch per commit fest patch submission. A copy of that Git repository is made available here: https://github.com/petere/postgresql-commitfest.

The second part is done by my Jenkins instance, which I have written about before. It runs the same job as it runs with the normal Git master branch, but over all the branches created for the commit fest. At the end, you get a build report for each commit fest submission. See the results here: http://pgci.eisentraut.org/jenkins/view/PostgreSQL/job/postgresql_commitfest_world/. You’ll see that a number of patches had issues. Most were compiler warnings, a few had documentation build issues, a couple had genuine build failures. Several (older) patches failed to apply. Those don’t show up in Jenkins at all.

This is not tied to Jenkins, however. You can run any other build automation against that Git repository, too, of course.

There are still some manual steps required. In particular, commitfest_branches needs to be run and the build reports need to be reported back manually. Fiddling with all those branches is error-prone. But overall, this is much less work than manually downloading and building all the patch submissions.

My goal is that by the time a reviewer gets to a patch, it is ensured that the patch applies, builds, and passes the tests. Then the reviewer can concentrate on validating the purpose of the patch and checking the architectural decisions.

What needs to happen next:

  • I’d like an easier way to post feedback. Given a message ID for the original patch submission, I need to fire off a reply email that properly attaches to the original thread. I don’t have an easy way to do that.

  • Those reply emails would then need to be registered in the commit fest application. Too much work.

  • There is another component to this work flow that I have not finalized: checking regularly whether the patches still apply against the master branch.

  • More automated tests need to be added. This is well understood and a much bigger problem.

In the meantime, I hope this is going to be useful. Let me know if you have suggestions, or send me pull requests on GitHub.

Peter Eisentraut: Testing PostgreSQL extensions on Travis CI revisited

My previous attempt to setup up multiple-PostgreSQL-version testing on Travis CI worked OK, but didn't actually make good use of the features of Travis CI. So I stole, er, adapted an idea from clkao/plv8js, which uses an environment variable matrix to control which version to use. This makes things much easier to manage and actually fires off parallel builds, so it's also faster. I've added this to all my repositories for PostgreSQL extensions now. (See some examples: pglibuuid, plxslt, pgvihash, pgpcre, plsh)

Fabien Coelho: Turing Machine in SQL (5)


In a previous post I have shown how to build a Turing Machine (TM) in SQL with PostgreSQL using a recursive SQL function. I claimed that this would work with versions of PostgreSQL older than 8.4. In this post, I actually investigate this claim, porting the script down to version 7.3 on a Debian Linux. Testing the relatively simple SQL script with older versions reminds us of all the goodies that have been added over the years.

Moving back in PostgreSQL time…

The initial SQL script works with current PostgreSQL 9.2.

PostgreSQL 8.4 (2009)

Easy installation from the PostgreSQL APT repository.

However we lose function parameter names in SQL functions, although they work with PL/pgSQL, thus I must revert to the $n parameter style, that is to switch:

CREATE FUNCTION recRun(ite INTEGER, sta INTEGER, pos INTEGER)
...
... WHERE tr.sid = sta AND tp.tid = pos ...

to the less readable:

CREATE FUNCTION recRun(INTEGER, INTEGER, INTEGER)
...
... WHERE tr.sid = $2 AND tp.tid = $3 ...

PostgreSQL 8.2 (2006)

Easy installation of this unsupported version thanks to the APT repository. The PostgreSQL 8.4 script with SQL function parameters referenced by their number works fine.

PostgreSQL 8.1 and 8.0 (2005)

Manual installation with configure, make, initdb, pg_ctl works fine.

However, we lose the UPDATE ... AS ... aliasing, which for our query generates unsolvable attribute name ambiguities, so I had to rename the tape attributes so that they do not interfere with the Transition attributes. If the query had required both UPDATE and FROM on the same table, I think it would not have been possible to write it, or maybe only with an artificial VIEW to hide the renamings… We also lose VALUES lists, which must be converted to repeated INSERTs or to COPY syntax, and IF EXISTS in DROP SCHEMA.

PostgreSQL 7.4 (2003)

Manual installation as above works fine.

However we lose $$ dollar quoting, so the function definition must be put inside a string, which is much less readable under emacs or vi. Compare the nice syntax coloring by pygments:

CREATE FUNCTION recRun(...)
... $$
  INSERT INTO Run(rid, ...) VALUES (...);
$$ LANGUAGE SQL;

To the older style:

CREATE FUNCTION recRun(...)
... '
  INSERT INTO Run(rid, ...) VALUES (...);
' LANGUAGE SQL;

PostgreSQL 7.3 (2002)

Manual installation works fine.

However we lose ARRAY integration, so I had to remove the display of the tape along the run and a related CHECK. The final tape state can still be shown.

PostgreSQL 7.2.8 (2005)

Manual installation worked.

Easily fixed syntactic issues: no SCHEMA support, COPY does not accept an attribute list.

Nevertheless, I could not make the recursive SQL function work with 7.2.8, as the server complains that the function does not exist yet while creating it.

Conclusion

This final PostgreSQL 7.3 compatible SQL script implements a Turing machine with a recursive SQL function. If you accept this trick, then it seems that PostgreSQL SQL is Turing complete since version 7.3, released on November 27, 2002. The key feature is listed in Section 3.10 of the release notes: Allow recursive SQL function (Peter). It was added on May 22, 2002 by Peter Eisentraut:

commit d60f10b0e74173653d17c09750a791afe6f56404
Author: Peter Eisentraut <*>
Date:   Wed May 22 17:21:02 2002 +0000

Add optional "validator" function to languages that can validate the
function body (and other properties) as a function in the language
is created.  This generalizes ad hoc code that already existed for
the built-in languages.

The validation now happens after the pg_proc tuple of the new function
is created, so it is possible to define recursive SQL functions.

Add some regression test cases that cover bogus function definition
attempts.


Hans-Juergen Schoenig: Reporting: Creating correct output

Creating reports is a core task of every PostgreSQL database engineer or developer. However, many people think that it is enough to hack up some SQL aggregating some data and execute it. This is not quite true. We have repeatedly seen reports being just plain wrong without people even taking note of it. How can [...]

Jim Mlodgenski: Latency: That pesky little thing


Recently, I was helping a client improve the performance of their ETL processes. They were loading several million rows and updating several million more, and the whole process was taking better than 7 hours. Jumping in, it was apparent they didn’t spare any expense on the hardware. With 256GB of RAM and a high-end SAN, their server had plenty of horsepower, but they couldn’t push more than 1000 rows/second. I then took a close look at their config file; there were some changes that should be made, but nothing that was unreasonable. I went ahead and made some changes to the config that should affect the workload they are pushing on their server, like shared_buffers and checkpoint_segments, but it had no effect at all. Watching the server while the load was running, I was surprised to see that the server was mostly idle. There was a little disk activity and a bit of CPU usage, but nothing that would indicate a problem. I went back and looked at the server loading the data and saw similar results: a little activity, but nothing that would pinpoint the issue. That’s when I figured I’d ping the PostgreSQL server from the ETL server, and there it was: 1ms latency. That was something I would expect between availability zones on Amazon, but not in an enterprise data center. It turned out that they had a firewall protecting their database servers, and the latency was adding 4-5 hours to their load times. A mere 1ms really adds up when you need to do millions of network round trips.
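To put the scale in perspective: 10 million single-row round trips at 1 ms of network latency each spend roughly 10,000 seconds – close to three hours – just waiting on the wire, before the database does any work. The usual mitigation is to batch more work into each round trip; a rough sketch (staging_orders is a made-up table for illustration):

-- one round trip per row: pays ~1 ms of latency for every row
INSERT INTO staging_orders(id, amount) VALUES (1, 10.00);
INSERT INTO staging_orders(id, amount) VALUES (2, 12.50);

-- one round trip for many rows: multi-row VALUES
INSERT INTO staging_orders(id, amount) VALUES
  (1, 10.00),
  (2, 12.50);

-- or better still, a single COPY stream for the whole batch
-- (data rows follow on stdin, terminated by \.)
COPY staging_orders(id, amount) FROM STDIN WITH (FORMAT csv);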

 
