Channel: Planet PostgreSQL

Chris Travers: Intro to PostgreSQL as Object-Relational Database Management System

This is a very brief intro to PostgreSQL as an object-relational database management system.   In future blog posts, we will look at more hands-on examples of these features in action.  Keep in mind these are advanced features typically used by advanced applications.

This is a very brief guide to the concepts we will be looking at more deeply in future posts, tying them together in recipes and examples.  While PostgreSQL was initially designed to explore object-relational modelling possibilities, the toolkit today is somewhat different from what was initially intended, so the focus of this series will be how to use PostgreSQL in an object-relational manner rather than tracking the history of the various components.

How is PostgreSQL "Object-Relational?"

The term Object-Relational has been applied to databases which attempt to bridge the relational and object-oriented worlds with varying degrees of success.  Bridging this gap is typically seen as desirable because object-oriented and relational models are very different paradigms and programmers often do not want to switch between them.  There are, however, fundamental differences that make this a very hard thing to do well.  The best way to think of PostgreSQL in this way is as a relational database management system with some object-oriented features.

By blending object-primitive and relational models, it is often possible to provide much more sophisticated data models than one can build using the relatively limited standard types in SQL.  This can be done both at the interface between an application and the database, and as intra-query logic.  In future posts I will offer specific examples of each concept and explore how PostgreSQL differs from Oracle, DB2, and Informix in this area.

PostgreSQL is a development platform in a box.  It supports stored procedures written in entirely procedural languages like PL/PGSQL or Perl without loaded modules, and in more object-oriented languages like Python or Java, often through third-party modules.  To be sure, you can't write a graphical interface inside PostgreSQL, and it would not be a good idea to write additional network servers, such as web servers, directly inside the database.  However, the environment allows you to create sophisticated interfaces for managing and transforming your data.  Because it is a platform in a box, the various components need to be understood as distinct and yet interoperable.  In fact the primary concerns of object-oriented programming are all supported by PostgreSQL, but this is done in a way that is almost, but not quite, entirely unlike traditional object-oriented programming.  For this reason the "object-relational" label tends to be a frequent source of confusion.

Data storage in PostgreSQL is entirely relational, although this can be degraded using types which are not atomic, such as arrays, XML, JSON, and hstore.  Before delving into object-oriented approaches, it is important to master the relational model of databases.  For the novice, this section is therefore entirely informational.  For the advanced developer, however, it is hoped that it will prove inspirational.

In object-oriented terms, every relation is a class, but not every class is a relation.  Operations are performed on sets of objects (an object being a row), and new row structures can be created ad-hoc.  PostgreSQL is, however, a strictly typed environment and so in many cases, polymorphism requires some work.

Data Abstraction and Encapsulation in PostgreSQL


The relational model itself provides some tools for data abstraction and encapsulation, and these features are taken to quite some length in PostgreSQL.  Taken together these are very powerful tools and allow for things like calculated fields to be simulated in relations and even indexed for high performance.

Views are the primary tool here.  With views, you can create an API for your data which is abstracted from the physical storage.  Using the rules system, you can redirect inserts, updates, and deletes from the view into underlying relations, preferably using user-defined functions.  Being relations, views are also classes and can have methods.  Views cannot simply be inherited, however, and the workarounds introduce many hidden gotchas.

A second important tool here is the ability to define what appear to be calculated fields using stored procedures.  Suppose I create a table called "employee" with, among others, three fields (first_name, middle_name, last_name), and a function called "name" which accepts a single employee argument and concatenates these together as "last_name, first_name middle_name".  Then if I submit a query which says:

select e.name from employee e;

it will transform this into:

select name(e) from employee e;

This gives you a way to do calculated fields in PostgreSQL without resorting to views. Note that these can be done on views as well because views are relations.  These are not real fields though.  Without the relation reference, it will not do the transformation (so SELECT name from employee will not have the same effect).
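For reference, a minimal sketch of what such a "name" method might look like (the employee table and the exact formatting are the hypothetical ones from the paragraph above):

CREATE TABLE employee (
    id int primary key,
    first_name text not null,
    middle_name text,
    last_name text not null
);

-- callable as e.name because its only argument is of type employee
CREATE FUNCTION name(employee) RETURNS text AS
$$ SELECT $1.last_name || ', ' || $1.first_name
          || coalesce(' ' || $1.middle_name, '');
$$ LANGUAGE SQL IMMUTABLE;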

Messaging and Class APIs in PostgreSQL


A relation is a class.  The class is accessed using SQL, which defines a new data structure in its output.  Unless that data structure is defined elsewhere as a relation or a complex type, it cannot have methods attached to it and therefore cannot be used with the class.method syntax described above.  There are exceptions to this rule, of course, but they are beyond the scope of this introduction.  In general it is safest to assume that the output of one query, particularly one with named output fields, cannot safely be used as the input to another.

A second messaging apparatus in PostgreSQL is the LISTEN/NOTIFY framework, which can be used along with triggers to issue notifications to other processes when a transaction commits.  This approach allows you to create queue tables, use triggers to move data into them (creating 'objects' in the process), and then issue a notification to another process when the data commits and becomes visible.  This allows very complex and interactive environments to be built from modular pieces.
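A rough sketch of that pattern (the table, channel, and function names here are invented for illustration):

CREATE TABLE job_queue (
    id serial primary key,
    payload text not null
);

CREATE FUNCTION job_queue_notify() RETURNS trigger LANGUAGE plpgsql AS
$$
BEGIN
    NOTIFY job_queue_channel;  -- delivered to listeners only when the transaction commits
    RETURN NEW;
END;
$$;

CREATE TRIGGER job_queue_notify_trigger
AFTER INSERT ON job_queue
FOR EACH ROW EXECUTE PROCEDURE job_queue_notify();

-- a worker process simply runs LISTEN job_queue_channel; and reads the queue table when woken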

Polymorphism in PostgreSQL


PostgreSQL is extensible in nearly every aspect of the database.  Not only can new types be created and defined; operators can also be defined or overloaded.

A more important polymorphism feature is the ability to cast one data type as another.  Casts can be implicit or explicit.  Implicit casts, which have largely been removed from many areas of PostgreSQL, allow PostgreSQL to cast data types when necessary to find applicable functions or operators.  Implicit casting can be dangerous because minor errors can lead to unexpected results.  '2012-05-31' is not 2012-05-31: the latter is an integer expression that reduces to 1976.  If you create an implicit cast that turns an integer into a date on the first of the year, the lack of quoting will insert incorrect dates into your database without raising an error ('1976-01-01' instead of the intended '2012-05-31').  Implicit casts can still have some uses.
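You can see the arithmetic directly:

SELECT 2012-05-31;            -- integer arithmetic: returns 1976
SELECT DATE '2012-05-31';     -- a proper date literal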

Inheritance in PostgreSQL


In PostgreSQL tables can inherit from other tables.  Their methods are inherited, but implicit casts are not chained, nor are their indexes inherited.  This allows you to develop object inheritance hierarchies in PostgreSQL.  Multiple inheritance is possible, unlike in any other ORDBMS on the market that I have looked at (Oracle, DB2, and Informix all support only single inheritance).
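A minimal sketch of the syntax (the tables here are invented for illustration; note that the child gets the parent's columns but not its primary key or indexes):

CREATE TABLE place (
    name text primary key,
    population int
);

CREATE TABLE city (
    state text not null
) INHERITS (place);   -- city has name, population, and state; its rows also appear in SELECT * FROM place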

Table inheritance is an advanced concept and has many gotchas.  Please refer to the proper sections of the manual for more on this topic.  On the whole it is probably best to work with table inheritance first in areas where it is more typically used, such as table partitioning, and later look at it in terms of object-relational capabilities.

Overall the best way to look at PostgreSQL as an object-relational database is as a database which provides very good relational capabilities plus some advanced features that allow one to create object-relational systems on top of it.  These systems can then move freely between object-oriented and relational worldviews but are still more relational than object-oriented.  At any rate they bear little resemblance to today's object-oriented programming environments.  With PostgreSQL this is very much a toolkit approach to object-relational databases, building on a solid relational foundation.  This means that these are advanced functions which are powerful in the hands of experienced architects, but may be skipped over at first.

Next week:   An intro to object-relational "classes."  We will build a simple class that represents items in inventory.

Forthcoming Posts:
  • An intro to object-relational "classes" looking at tables with extended functionality
  • Table inheritance in PostgreSQL (including how to solve key problems)
  • Composite and row types in tables, including references to other tables
  • Nested Data Structures (including things to avoid)
  • General Design Patterns and Anti-Patterns

Gabriele Bartolini: Management of the WAL archive in Barman


Barman, backup and recovery manager for PostgreSQL, is designed to manage the archive of WAL files separately from periodical backups (in Postgres terms, base backups).

You can see this archive as a “continuous” stream of files from the first available backup to the last shipped file (backup available history for a server).

In this article you will see how Barman manages WAL compression and archival, as well as how a particular WAL file is associated to a base backup in the history.

A quick recap:

  • Barman currently supports only asynchronous shipping of WAL information (this means that in version 1.0, recovery point objective is always greater than zero)
  • Backup is performed on a per server/instance basis (which means that single database backup is not supported)
  •  Backup servers in Barman are defined in the configuration file (each server has a section in the INI file)

The WAL archiver of the Postgres server sends each newly generated WAL file, containing binary transaction information, to the backup server. WAL files must be deposited in the incoming_wals_directory for that server, as shown by the “barman show-server” command. By default, the convention is to place these files in the SERVER_BACKUP_DIR/incoming directory.

Usually the best approach (especially in a local network) is to let Barman compress these 16MB files on the backup server end (you can use the “compression” option for this purpose). This operation is performed jointly with archival by the “barman cron” command, which goes through every file in the incoming directory, performs compression (where applicable), and permanently moves the WAL file into the WAL archive for that server. Bear in mind that this process is asynchronous and does not cause any delay for the normal archiving procedures of the server.
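For orientation, a server section in the Barman INI configuration might look roughly like the following sketch (the server name, hosts, and values are invented; check the Barman documentation for the authoritative option list):

[main-db]
description = "Example PostgreSQL server"
ssh_command = ssh postgres@pg.example.com
conninfo = host=pg.example.com user=postgres
compression = gzip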

According to Barman's design, every WAL file is automatically associated with the closest previous base backup available in the server’s backup history.

This definition implies that, in case no base backup has yet been taken, the cron command discards the incoming WAL files from the server’s archive.

The same definition however allows Barman to give DBAs more accurate information about the overall size of a periodical backup, including the size of the base backup (with tablespace data) and the number and size of the compressed WAL files.

The following screenshot of the “barman show-server” command gives you an idea (the test example does not take advantage of WAL compression – note the 48MB occupied by the three WAL files).

[Screenshot: example output of the “barman show-server” command]

Another command that has an impact on the WAL archive is the “backup delete” command.

In case the first available backup is deleted, all the associated WAL files are removed from the archive as well.

Things get more interesting if you delete an intermediate backup. Consider this scenario: your server’s backup history shows three backups, taken at three subsequent times (t1, t2 and t3 with t3 > t2 > t1). If you get rid of the “t2” backup, the WAL files that were originally associated with it are not removed; rather, they are immediately reassigned to the previous available backup (“t1” in this example).

Among other things, I will be covering this topic as well at PostgreSQL Sessions in Paris on October 4th 2012. Hopefully I will be speaking about this at the next PostgreSQL European Conference in Prague, where my proposed talk on Disaster recovery with Barman is in the reserve list (if not, I will be there anyway, ready to share ideas about Barman and its future with you).

You can download Barman 1.0 from SourceForge.net, or you can get more information by visiting the website or joining our new #barman IRC channel on freenode. Ciao!

Tomas Vondra: Sysbench, memory bandwidth and gcc optimizations


If you're testing basic hardware performance, it's quite probable you're using sysbench, a widely used benchmarking toolkit for testing CPU, I/O and RAM. I've been using it quite intensively recently, and it seems that when it's compiled with gcc optimizations (which is the default in most distributions), some of the tests report meaningless values, because gcc optimizes out important parts of the code.

Well, I wouldn't object to hardware with several TB/s of memory bandwidth, but I wouldn't expect such numbers when testing current x86 hardware ...

This tool is often recommended (even by pg people like Greg Smith in his talks), so I wonder how many people have used these crazy results as a basis for important decisions in the past. Sadly, I'm one of them. Let this be a proof that writing good benchmarking tools is quite tricky, and of how important it is not to take the numbers for granted but to verify them.

Denish Patel: 2 Elephants in the Room!!

Andrew Dunstan: psql binary output

A while ago I looked at the problem of getting binary output from psql. In a discussion on the -hackers mailing list, Tom Lane came up with the good idea of doing this via a variant of the \g command. I've just got a working version of this. It provides two new backslash commands: \gb, which gets the results in binary and outputs the bytes, and \gbn, which does the same but also suppresses the use of the field separator and record separator. Here's an illustration:
[andrew@emma inst.psql-binout.5705]$ echo "select bytea '\\x00010203', bytea '\\x040506' \\gbn" | bin/psql | od -c
0000000  \0 001 002 003 004 005 006
0000007
[andrew@emma inst.psql-binout.5705]$ echo "select bytea '\\x00010203', bytea '\\x040506' \\gb" | bin/psql | od -c
0000000  \0 001 002 003   | 004 005 006  \n
0000011

Of course, just like \g you can also supply a filename or command to pipe these to:
[andrew@emma inst.psql-binout.5705]$ echo "select bytea '\\x00010203', bytea '\\x040506' \\gb |od -c" | bin/psql 
0000000  \0 001 002 003   | 004 005 006  \n
0000011
[andrew@emma inst.psql-binout.5705]$ echo "select bytea '\\x00010203', bytea '\\x040506' \\gbn |od -c" | bin/psql 
0000000  \0 001 002 003 004 005 006
0000007

There is still some work to do, but people who feel like playing along can watch the psql-binout branch on my bitbucket development repo.

Bruce Momjian: Reload Is Powerful


I previously explained the ability to set Postgres configuration variables at different levels. In this blog entry, I would like to explain how changes at the top level, postgresql.conf, propagate to running sessions.

The postgresql.conf file is usually stored at the top of the PGDATA directory, though it can be relocated. The simplest way to modify the file is to open it with a text editor. (Tools like pgAdmin allow file modifications via a GUI.)

Once the file has been modified, you must signal that the configuration file should be reloaded and your modifications applied. There are three methods to signal this:

  • send a SIGHUP signal to the postmaster process, or SIGHUP only individual backends
  • run pg_ctl reload from the command-line
  • call the SQL function pg_reload_conf() (see the example below)
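For example, the third method is just a query run from any superuser session:

-- re-read postgresql.conf and apply reloadable settings; returns true if the signal was sent
SELECT pg_reload_conf();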

Continue Reading »

Andrew Dunstan: Secure authentication for Postgres

Recently there was some discussion about switching the hashing algorithm Postgres uses for password authentication. I think the consensus is to wait until the NIST hashing competition results are announced and then reassess the position. But really, that obscured the point that for real security you need to get out of the game of password authentication altogether. Switching hashing algorithms is like changing the deck chairs on the Titanic from a security point of view. If you don't believe me, go and look at this article on ars technica. Using passwords is a losing game. No matter how complex you make them the crackers will climb over whatever wall you think you have constructed. I only allow any sort of password authentication in an environment where any possible connection is at least semi-trusted. For secure authentication, I usually insist on an SSL connection authenticated by a client side certificate. GSSAPI authentication is probably also acceptable, although it's something I am less familiar with.
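For reference, that kind of setup is expressed in pg_hba.conf roughly like the sketch below (the address range is invented; the server also needs SSL enabled and a trusted CA certificate configured):

# TYPE    DATABASE  USER  ADDRESS          METHOD
hostssl   all       all   192.168.10.0/24  cert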

Jeff Frost: PostgreSQL, NUMA and zone reclaim mode on linux

Lately we've been seeing issues with zone reclaim mode on large memory multi processor NUMA linux systems.

What's NUMA?  It's just an acronym for Non-Uniform Memory Access.  This means that some memory in your system is more expensive for a particular CPU to access than its "local" memory.

You can see how much more distant the kernel considers the different zones by using the numactl command like so:

numactl --hardware
If you've got a modern multiprocessor system, you'll probably see something like this:


available: 2 nodes (0-1)
node 0 size: 48417 MB
node 0 free: 219 MB
node 1 size: 48480 MB
node 1 free: 135 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10 

Here we see a distance of 10 for node 0 to access node 0 memory and 21 for node 0 to access node 1 memory.   What does distance really mean?  It's a cost parameter based on the number of "hops" or buses that separate the node from the distant memory.


Now, what's zone reclaim mode?

From the linux kernel docs:


zone_reclaim_mode:

Zone_reclaim_mode allows someone to set more or less aggressive approaches to
reclaim memory when a zone runs out of memory. If it is set to zero then no
zone reclaim occurs. Allocations will be satisfied from other zones / nodes
in the system.

This is value ORed together of

1       = Zone reclaim on
2       = Zone reclaim writes dirty pages out
4       = Zone reclaim swaps pages

zone_reclaim_mode is set during bootup to 1 if it is determined that pages
from remote zones will cause a measurable performance reduction. The
page allocator will then reclaim easily reusable pages (those page
cache pages that are currently not used) before allocating off node pages.

It may be beneficial to switch off zone reclaim if the system is
used for a file server and all of memory should be used for caching files
from disk. In that case the caching effect is more important than
data locality.

Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
reclaim will write out dirty pages if a zone fills up and so effectively
throttle the process. This may decrease the performance of a single process
since it cannot use all of system memory to buffer the outgoing writes
anymore but it preserve the memory on other nodes so that the performance
of other processes running on other nodes will not be affected.

Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations.

The file-server paragraph above is the important one: PostgreSQL depends heavily on the filesystem cache, so disabling zone reclaim mode is desirable in this situation.

There's been a bit of discussion about this on the pgsql-performance mailing list here: http://archives.postgresql.org/pgsql-performance/2012-07/msg00215.php

If you've got a modern multi-socket system, odds are good that zone reclaim mode is enabled automatically on boot.  You can check this by looking at /proc/sys/vm/zone_reclaim_mode.

The biggest issue we've seen with zone reclaim mode enabled on customer multi-socket systems is the filesystem cache never filling up even when the database is much larger than RAM.  That's because the system is trying to keep some "local" memory available. After disabling zone_reclaim_mode, the filesystem cache fills up and performance improves.

So, how to disable zone_reclaim_mode?  The best way to do this is via sysctl.  Just add:

vm.zone_reclaim_mode = 0 

to /etc/sysctl.conf, save it and execute sysctl -p to load the new settings into the kernel.

Other interesting non PostgreSQL pages on NUMA/zone_reclaim_mode:



Leo Hsu and Regina Obe: Creating GeoJSON Feature Collections with JSON and PostGIS functions


If you build a lot of web-based GIS applications, a common desire is to allow a user to draw out an area on the map, search against that area, and return a FeatureCollection where each feature is composed of a geometry and attributes about that feature. In the past the format was GML or KML, but the world seems to be moving to prefer JSON/GeoJSON. Normally you'd throw in a mapping server that talks Web Feature Service, do more or less with web-scripting glue, or use a web service such as CartoDb that lets you pass along raw SQL.

In this article we'll demonstrate how to build GeoJSON feature collections that can be consumed by web mapping apps, using the built-in JSON functions in PostgreSQL 9.2 and some PostGIS hugging. Even if you don't use PostGIS, we hope you'll come away with some techniques for working with PostgreSQL extended types and also how to morph relational data into JSON buckets.
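As a rough preview of the technique (the parcels table, its columns, and the bounding box are invented for illustration; PostgreSQL 9.2 and PostGIS 2.x assumed):

SELECT row_to_json(fc) AS feature_collection
  FROM (SELECT 'FeatureCollection' AS type,
               array_to_json(array_agg(f)) AS features
          FROM (SELECT 'Feature' AS type,
                       ST_AsGeoJSON(p.geom)::json AS geometry,
                       row_to_json((SELECT a FROM (SELECT p.id, p.owner) a)) AS properties
                  FROM parcels p
                 WHERE ST_Intersects(p.geom,
                                     ST_MakeEnvelope(-122.70, 45.50, -122.60, 45.60, 4326))
               ) f
       ) fc;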


Continue reading "Creating GeoJSON Feature Collections with JSON and PostGIS functions"

Chris Travers: PostgreSQL OR Modelling Part 2: Intro to Object Relational Classes in PostgreSQL

In the last post we went briefly over the considerations and concerns of object-relational programming in PostgreSQL.  Now we will put this into practice.  While LedgerSMB has not fully adopted this approach I think it is likely to be the long-term direction for the project.

In object-relational thinking classes have properties and methods, and sets of objects are retrieved and even manipulated using a relational interface.  While relational systems look at sets of tuples, object-relational systems look at relational manipulation of sets of objects.  A table (and to a lesser extent, a view or a complex type) is hence a class and can have various forms of behavior associated with it, while its rows act as objects.  This means a number of considerations change.  Some differences include:

  • SELECT * is often useful to ensure one receives a proper data type
  • Derived values can be imitated by methods, as can dereferences of keys
  • Common filter conditions can be centralized, as can common queries
Basic Table Structure

Our current example set will use a very simplified schema for storing inventory.  Consider the (greatly simplified) chart of accounts table:

 CREATE TABLE account (
     id int not null unique,
     control_code text primary key, -- account number
     description text not null
);

Populated by:

INSERT INTO account (id, control_code, description)
VALUES (1, '1500', 'Inventory'),
       (2, '4500', 'Sales'),
       (3, '5500', 'Purchase'); 

Of course in a real system the chart of accounts (and inventory) tables would be more complex.

CREATE TABLE inventory_item (
    id serial primary key,
    cogs_account_id int references account(id),
    inv_account_id int references account(id),
    income_account_id int references account(id),
    sku text not null,
    description text,
    last_cost numeric, -- null if never purchased
    sell_price numeric not null,
    active bool not null default true
);

Now, we'd also want to make sure only one active part can be associated with a sku at any given time, so we'd:

CREATE UNIQUE INDEX inventory_item_sku_idx_u
ON inventory_item (sku) WHERE active IS TRUE;

The create table statement also creates a complex type with the same structure, and thus it defines a data structure.  The idea of relations as data structures in themselves will come up again and again in this series.  It is the fact that tables are data structures which allows us to do interesting things here.

Method 1:  Derived Value called "markup"

The first method we may want to add to this table is a markup method, for calculating our markup based on current sell price and last cost.  Since this value will always be based on two other stored values, there is no sense in storing it (other than possibly in a pre-calculated index).  To do this, we:

CREATE FUNCTION markup(inventory_item) RETURNS numeric AS
$$ SELECT CASE WHEN $1.last_cost = 0 THEN NULL
               ELSE ($1.sell_price - $1.last_cost)
                    / $1.last_cost
          END;
$$ LANGUAGE SQL IMMUTABLE;

Looking through the syntax here, this is a function named "markup" which receives a single input of the table type.  It then calculates and returns a value based solely on its input.  The fact that the function always returns the same value for the same input is reflected in the IMMUTABLE designation.  The planner uses this to plan queries, and PostgreSQL will not index function outputs unless they are marked immutable.

Once this is done, we can:

SELECT sku, description, i.markup from inventory_item i;

Note that you must include the table designation in calling the method.  PostgreSQL converts the i.markup into markup(i).

Of course, our table being empty, no values will be returned.  However if you try to omit the i. before markup, you will get an error.

Not only can we include this in the output we can search on the output:

SELECT sku, description from inventory_item i 
 where i.markup  < 1.5;

If we find ourselves doing a lot of such queries and need to index the values we can create an index:

CREATE INDEX inventory_item_markup_idx 
ON inventory_item (markup(inventory_item));

Note this statement does not support object.method notation here.  You must do method(class) instead.

Instead of adding columns and using triggers to maintain them, we can simply use functions to calculate values on the fly.  Any value you can derive directly from values already stored in the table can thus be calculated on output, perhaps even with the values indexed.  This is one of the key optimizations that ORDBMSs allow.

Method 2:  Dereferencing  an Account

The above table is defined in a way that makes it easy to do standard, relational joins.  More specialized Object-Relational-friendly references will be covered in a future posting.   However suppose we want to create a link to the accounts.  We might create a method like:

CREATE OR REPLACE FUNCTION cogs_account(inventory_item) 
RETURNS account
LANGUAGE SQL AS
$$ SELECT * FROM account where id = $1.cogs_account_id $$;

We can now:

or_examples=# select (i.cogs_account).* FROM inventory_item i;

We will get an empty row with the chart of accounts structure.  This gives us the beginnings of a path system for LedgerSMB (a real reference/path system will be discussed in a future posting).

We can even:

select * from inventory_item i 
 where (i.cogs_account).control_code = '5500';

Note that in many of these cases, parentheses are necessary to ensure that the system can tell that we are talking about an object instead of a schema or other system.  This is true generally wherever complex data types are used in PostgreSQL (otherwise the i might be taken to be a schema name).  Unlike Oracle, PostgreSQL does not make assumptions of this sort, and unlike DB2 does not have a separate dereferencing operator.

Warning: Dereferencing objects in this way essentially forces a nested loop join.  It is not recommended when working with large return sets.  For example the previous has a plan (given that I already have one row in this table) as:

or_examples=# explain analyze
or_examples-# select * from inventory_item i where (i.cogs_account).control_code = '5500';
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on inventory_item i  (cost=0.00..1.26 rows=1 width=145) (actual time=0.174..0.175 rows=1 loops=1)
   Filter: ((cogs_account(i.*)).control_code = '5500'::text)
 Total runtime: 0.199 ms
(3 rows)

Note that the filter is not expanded to a join.  This means that for every row filtered upon, it is executing a query to pull the resulting record from the account table.  Depending on the size of that table and the number of pages containing rows referenced, this may perform badly.  For a few rows, however, it will perform well enough.

Method 3:  Computer-discoverable save method

One application for object-relational modeling is to provide a top-level, machine-discoverable API which software programs can use, creating more autonomy between the application and the database.  We will create here a machine-discoverable save function which only updates values that may be updated, and returns the values as saved back to the application.  We can consider this sort of loose coupling a "dialog" rather than a "remote command" interface because both sides are assumed to be autonomous and to provide meaningful responses to each other where appropriate.

Our function looks like this (not using writable CTEs, for backward compatibility):

CREATE FUNCTION save(in_item inventory_item)
RETURNS inventory_item
LANGUAGE PLPGSQL AS
$$
  DECLARE out_item inventory_item;

 BEGIN
 -- we don't want to allow accounts to change on existing items
 UPDATE inventory_item
    SET sku = in_item.sku,
        description = in_item.description,
        last_cost = in_item.last_cost,
        sell_price = in_item.sell_price,
        active = in_item.active
  WHERE id = in_item.id;

 IF FOUND THEN
     SELECT * INTO out_item FROM inventory_item
      WHERE id = in_item.id;
     RETURN out_item;
 ELSE
 INSERT INTO inventory_item
                 (cogs_account_id,
                  inv_account_id,
                  income_account_id,
                  sku,
                  description,
                  last_cost,
                  sell_price,
                  active)

          VALUES (in_item.cogs_account_id,
                  in_item.inv_account_id,
                  in_item.income_account_id,
                  in_item.sku,
                  in_item.description,
                  in_item.last_cost,
                  in_item.sell_price,
                  in_item.active);
 SELECT * INTO out_item
      FROM inventory_item
     WHERE id = currval('inventory_item_id_seq');
    RETURN out_item;

  END IF;

 END;
$$;

This function is idempotent and returns the values as saved to the application, so the application can determine whether to commit or roll back the transaction.  The structure of the tuple is also discoverable using the system catalogs, so an application can actually look up how to construct a query to save an inventory item.  However it certainly cannot be inlined and it will be slow on large sets.  If you have thousands of inventory parts, doing:

SELECT i.save FROM inventory_item i;

will be painful and do very little other than generate dead tuples.  Don't do it.

On the other hand a software program (either at code generation or run-time) can look up the structure of the type and generate a call like:

 SELECT (i.save).* 
   FROM (SELECT (row(null, 3, 1, 2, 'TEST123', 'Inventory testing item', 1, 2, true)::inventory_item).save) i;

Obviously this is a sub-optimal interface for humans but it has the advantage of discoverability for a computer.  Note the "::inventory_item" may be unnecessary in some cases, however it is required for all practical purposes on all non-trivial databases because it avoids ambiguity issues.  We really want to make sure that it is an inventory_item we are saving especially as data types may change.  This then allows us to control application entry points to the data.

Note that the application has no knowledge of what is actually happening under the hood of the save function.  We could be saving the data in unrelated relations (and this may be a good way to deal with updateable views in an O-R paradigm where single-row updates are the primary use case).

Caveats:  In general I think it is a little dangerous to mix imperative code with declarative SQL in this way.   Object-relational modelling is very different from object-oriented programming because with object-relational modelling we are modelling information, while with object-oriented programming we are encapsulating behavior.  This big difference results in endless confusion.

The most obvious way around this is to treat all SQL queries as questions and treat the transactional boundaries as imperative frames to the declarative conversation.  A human-oriented translation of the following exchange might be:

BEGIN;  -- Hello.  I have some questions for you.

SELECT (i.save).*
  FROM (SELECT (row(null, 3, 1, 2, 'TEST124', 
                'Inventory testing item 2', 1, 2, 
                true)::inventory_item).save) i;

-- Suppose I ask you to save an inventory item with the following info for me.
-- What will be saved?


 id | cogs_account_id | inv_account_id | income_account_id |   sku   |        description        | last_cost | sell_price | active
----+-----------------+----------------+-------------------+---------+---------------------------+-----------+------------+--------
  4 |               3 |              1 |                 2 | TEST124 | Inventory testing item 2  |         1 |          2 | t


-- The above information will be saved if you ask me to.

COMMIT; -- Do it.

This way of thinking about the overall framework helps prevent a lot of problems down the road.  In particular this helps establish a separation of concerns between the application and the database.  The application is responsible for imperative logic (i.e. what must be done) while the database answers declarative queries and only affirmatively stores data on commit.  Imperative changes to data only occur when the application issues the commit command.

In this regard object behavior (outside of storage which is an odd fit for the model) really doesn't belong in the database.  The database is there to provide answers to questions and update stored information when told to commit changes.  All other behavior should be handled by other applications.  In a future post we will look at some ways to broaden the ways applications can receive data from PostgreSQL.  Object-relational modelling then moves beyond the question of "what information do I have and how can I organize it to get answers" to "what derivative information can be useful and how can I add that to my otherwise properly relational database?"

Alternate Constructor inventory_item(int)

Now in many cases we may not want to have to provide the whole object definition in order to instantiate it.  In fact we may want to be able to ask the database to instantiate it for us.  This is where alternate constructors come in.  Alternate constructors furthermore can be for single objects or for sets of objects.  We will look at the single objects first, and later look at the set-based constructors.

This constructor looks like:

CREATE OR REPLACE FUNCTION inventory_item(int)
RETURNS inventory_item
LANGUAGE SQL
AS $$

SELECT * FROM inventory_item WHERE id = $1

$$;

If I have an item in my db with an id of 2 (saved with the previous method call) I can:

or_examples=# select * from inventory_item(2);
 id | cogs_account_id | inv_account_id | income_account_id |   sku   |      description       | last_cost | sell_price | active
----+-----------------+----------------+-------------------+---------+------------------------+-----------+------------+--------
  2 |               3 |              1 |                 2 | TEST123 | Inventory testing item |         1 |          2 | t
(1 row)

I can also chain this together with other methods.  If all I want is the markup for item 2, I can:

or_examples=# select i.markup from inventory_item(2) i;
         markup        
------------------------
 1.00000000000000000000
(1 row)

I can even:

or_examples=# select (inventory_item(2)).markup;
         markup        
------------------------
 1.00000000000000000000
(1 row)

An application can then use something like this to traverse in-application links and retrieve new objects.

Alternate Constructor inventory_item(text)

Similarly we can have a constructor which constructs this from a  text field, looking up by active SKU:

CREATE OR REPLACE FUNCTION inventory_item(text)
RETURNS inventory_item
LANGUAGE sql AS $$

SELECT * FROM inventory_item WHERE sku = $1 AND active is true;

$$;

We can then:

SELECT (inventory_item('TEST123')).markup;

and get the same result as before.

Warning:  Once you start dealing with text and int constructors, you have the possibility of ambiguity in queries.  For example, SELECT inventory_item('2') will run the text constructor instead of the integer constructor, giving you no results.  For this reason it is a very good idea to explicitly cast your inputs to the constructor.
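For example, with the data created earlier:

SELECT (inventory_item(2::int)).sku;              -- forces the integer constructor
SELECT (inventory_item('TEST123'::text)).markup;  -- forces the text constructor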

Set Constructor inventory_item(tsquery)

Not only can this be used to create a constructor for a single item.  Whole sets can be constructed this way.  For example we could create a text search constructor (and this allows the default search criteria to change over time in a centrally managed way):

CREATE OR REPLACE FUNCTION inventory_item(tsquery)
RETURNS SETOF inventory_item
LANGUAGE SQL AS $$

SELECT * FROM inventory_item WHERE description @@ $1;

$$;

This allows some relatively powerful searches to be done, without the application having to worry about exactly what is searched on.  For example, we can:

or_examples=# select * from inventory_item(plainto_tsquery('test'));
 id | cogs_account_id | inv_account_id | income_account_id |   sku   |      description       | last_cost | sell_price | active
----+-----------------+----------------+-------------------+---------+------------------------+-----------+------------+--------
  2 |               3 |              1 |                 2 | TEST123 | Inventory testing item |         1 |          2 | t
(1 row)

Here test has been found to match testing because testing is a form of the word test.

These examples, I hope, provide some ideas for how one can take object-relational concepts and apply them towards building more sophisticated, robust, and high-performance databases, as well as better interfaces for object-oriented programs.

Next Week:  Table Inheritance in PostgreSQL

Josh Berkus: Wrong defaults for zone_reclaim_mode on Linux

My coworker Jeff Frost just published a writeup on "zone reclaim mode" in Linux, and how it can be a problem.  Since his post is rather detailed, I wanted to give a "do this" summary:

  1. zone_reclaim_mode defaults to the wrong value for database servers, 1, on some Linux distributions, including Red Hat.
  2. This default will both cause Linux to fail to use all available RAM for caching, and throttle writes.
  3. If you're running PostgreSQL, make sure that zone_reclaim_mode is set to 0.
Frankly, given the documentation on how zone_reclaim_mode works, I'm baffled as to what kind of applications it would actually benefit.  Could this be another Linux misstep, like the OOM killer?

Selena Deckelmann: FrOSCon: Mistakes were Made: Education Edition talk slides and notes


I just finished giving my keynote at FrOSCon, and am pasting the notes I spoke from below. This was meant to be read aloud, of course. Where it says [slide] in the text is where the slides advance.

FrOSCon – Mistakes Were Made: Education Edition

[slide]

Thank you so much for inviting me here to FrOSCon. This is my first time visiting Bonn, and my first time enjoying Kölsch. I enjoyed quite a lot last night at the social event.

Especially, I would like to thank Scotty and Holgar who picked me up at the train station, Inga who talked with me at length on Thursday night. All the volunteers who have done a terrific job making this conference happen. Thank you all so much for a wonderful experience, and for cooking all the food last night!

And I promised to show off the laser etching on my laptop I had done here by the local hackerspace. I come from the PostgreSQL community, so I got an elephant etched into the laptop. It only costs 10 euro and looks awesome.

[slide]

I’ve also made a page of resources for this talk. I’ll be quoting some facts and figures and this pirate pad has links to all the documents I quoted.

For those of you from countries other than Ireland, Great Britain, United States, German and Turkey – if you know where to get a copy of computer science curriculum standards for your country, please add a link. Right at the top of this pirate pad is a link to another pirate pad where we’re collecting links to curriculum standards.

[slide]

And finally, this talk is really a speech, without a lot of bullet points. So, the slides will hopefully be helpful and interesting, but occasionally I will be showing nothing on a slide as I speak. This is a feature, not a bug.

[slide]

For the past few years, I’ve been giving talks about mistakes, starting with problems I had keeping chickens alive in my backyard. Here’s a map of my failures. Scotty is familiar with the video that is online that tells the whole story of how all these chickens died.

Next, I talked about system administration failures – like what happens when a new sysadmin runs UNIX find commands to clean up — and deletes all the zero-length files, including devices, on a system. Or how to take down a data center with four network cables and spanning tree turned off. Here’s a tip: it really only takes the first cable.

And most recently, I talked about hiring – how difficult it is to find the right people for tech industry jobs, how once you hire them, they might find another job way too quickly, and how the tech industry’s demand for skilled developers – and especially for developers with open source skills – is growing faster than we’re able to train people.

Computer science enrollment at universities has decreased by about 3% since 2005 in the United States (from 11% of students down to 7% overall).

[slide]

At the same time the projected demand for CS and computer-related jobs will increase more than 50% by 2018, creating about 1.5 million new jobs in the US alone. Researchers say that even in places where enrollment in CS programs is up, companies report that they can’t trust that graduates have any of the fundamental skills that are necessary for new jobs.

And these companies aren’t just in Silicon Valley – in Oregon (where I’m from), the Netherlands (where I landed before I got to FrOSCon) and from what I’ve heard these last few days, Germany, are all experiencing shortages in skilled developers.

But I’m not here to talk about those things either.

Today, I’m going to share some observations about computer science education. I believe that our skill shortages start at the earliest stages in our schools, and if the system is left as it is, open source will suffer the most.

[slide]

In a survey of 2700 FOSS developers, 70% had at least a Bachelors degree, and most discovered FOSS sometime between the ages of 18-22. This age, and this time in college is the perfect time to connect with and attract people into the free software lifestyle. And think about this, how much easier would recruitment be if every student at university was already exposed to computer science ideas when they were in primary and secondary school?

[slide]

You may not know this, but my husband, Scott, is a high school teacher. That’s where I got my German last name. He specializes in global studies, journalism and psychology.

Recently, he joined forces with a friend of mine named Michelle Rowley to help teach women how to program with Python. Naturally, I volunteered to mentor in the classes that were offered.

[slide]

This is a picture from one of the classes. Before these workshops, I had never tried to teach anyone how to program.

For the workshops, I mentored groups of 6 or 8 women over two days. We walked around the tables, answering questions and just observing as some students learned about variables, conditionals and functions for the very first time. I enjoyed getting to know a group of women who were really excited and looking forward to applying the skills they were about to learn.

Mentoring made me feel great, but it was also a little shocking.

[slide]

Our first lessons explained file system navigation, the command-line and how to set up a GUI text editor. Some people quickly became lost and confused. The connection between a graphical filesystem browser and the command-line was very difficult.

Most students had never opened up a terminal, let alone typed a command into one. But that’s not all that surprising. What did surprise me was that some had never looked at files through the graphical file browser, instead using menus to find recently used files, or saving everything into just one folder, or just using a web-based file management tool like Google Docs. For those women, I found myself at a loss. I sat thinking during a break about how exactly I could explain a filesystem to someone who had never been exposed to the idea before. I thought hard about real-world examples that would help me explain.

My hope is that you’re all thinking now about metaphors you’d use, pictures you’d draw and what you’d say to a person who didn’t understand filesystems. Or maybe, now that I’ve said that, you’re thinking about it now. Maybe you’re thinking about a person in your life who you might teach this exact lesson to. A parent, a brother or sister, a niece, your daughter or son.

I hope you are thinking, because I want to ask each of you to do something after this talk is done. I want you to sit down with an important person in your life who doesn’t understand a computer science concept like filesystems and teach them. My guess is, with the right lesson, you can teach this to someone in an hour. And if we don’t have the right lesson now, if enough of us try this out, we’ll end up with the best lesson in the world for teaching a person what filesystems are, using real-world examples and the feedback from all our loved ones about what worked and what didn’t.

There’s an important reason why I want you to do this.

[slide]

I want us to demonstrate that sharing lessons works. UNESCO recently made the Paris Declaration. In it they said that they wanted to encourage the open licensing of educational materials produced with public funds. Recently, I contacted an organization to ask if I could transcribe a couple lessons that they’d shared in PDFs into text form to make them easier to use and share them in a git repo. My idea was: share the lessons and let people submit changes and observations as diffs.

The organization that published the lessons told me that they couldn’t allow me to use their lessons in this way, because the research was government funded.

I believe that we can demonstrate to teachers and the organizations creating curriculum how useful it can be to share, so that no one gives me that excuse ever again.

I want to show teachers how interesting and engaging it is to let people take a lesson, try it out and report back. These, after all, are the same skills we need to work on open source software. Except we’ll apply this skill to teaching a lesson.

So, get ready. I really am going to ask you all to do that.

[slide]

I started understanding what programming was my second year of college. I’d spent almost a year doing tech support at my university, getting the job after some friends taught me how to install linux from floppies and enough UNIX commands to be dangerous. One day, a friend sat me down and tried to teach me PASCAL from a book. The experience left me frustrated, and even angry. I remember thinking that very little of it made sense, and I felt very stupid. I decided at that moment that I never wanted to learn programming.

Later, a different friend from college, Istvan Marko, sat me down in front of a command line prompt and showed me a shell script. He told me about his work automating configurations and showed me how to set up linux systems way more quickly than I could by entering commands one at a time. The automation blew my mind.

What he modeled for me in shell scripting immediately made my work life better. The tools he showed me applied to what I already knew about computers and installing new linux systems, and I saw immediately how I could use it all.

A whole world opened up as I thought through problem after problem, wrote little scripts to recompile kernels, and copied tricks from other friends like timing commands or redirecting output from STDERR to STDOUT. In the beginning I was just copying and studying because I was a little afraid of making mistakes — automation was so powerful! But soon I was remixing and writing my own stuff from scratch. I was hooked.

The next year, I switched my degree program from Chemistry to Computer Science.

So, I don’t think every person exposed to shell scripting will want to become a developer. But there were two things that happened for me in that lesson: what Istvan managed to get right was teaching me in my “zone of proximal development” or ZPD. It’s an education term that basically means — it was just challenging enough to be interesting, but not so hard that I got completely frustrated. This zone is where people learn things really well.

[slide]

The other important thing that happened was that the skill my friend taught me was something I could immediately apply elsewhere. But first, he worked with me, what we call guided practice, to rewrite a simple shell script with my username as a variable. Then I went off on my own, writing my own scripts to start and stop network interfaces and automatically connect to servers and run commands. This is what we call independent practice. And later, when I started writing Perl, I wrote my Perl exactly like I was writing bash scripts. I had just generalized my skills to another language! Maybe in the worst way possible!

But what all those things were – the modeling, the guided practice, the independent practice and the generalization – was how I really learned a new skill. I learned how to think about tasks with automation in mind, with parameters and variables in mind. And I really, really learned it well because my friend took the time to make sure that I learned it.

My experience of having a real-world application for a new skill matches up with research about keeping women and minorities, and many men, engaged in computer science. The process of customizing curriculum for the life experience of students is called contextualization. And of course, each person’s context is different. Part of the challenge for educators is designing courses that can be relevant to students from a variety of backgrounds, perhaps very different than the teacher. Like, teaching a bubble sort of student names in the physical world by having kids get up and move around, instead of teaching sorting only with numbers on a screen. Or using election data from local elections that affect students lives to teach about database schema and report design.

Or, when you’re thinking about this lesson you’re going to teach about filesystems, find a way to tie it to the life of the person you’re teaching. Have they ever “lost” a file that you later helped them find with the filesystem “search”? Have they ever lost a hard drive, or part of a directory, or lost something “in the cloud”. Have they created files on their computer? Do they know where those files are? Or what “where” means on a computer? Could you maybe draw some kind of structure to help them think about how the files are organized? I’m sure you’ll come up with something great to fit your student’s experience.

[slide]

Some people believe that the reason why we don’t have enough people with the right kinds of developer skills is because university CS programs just aren’t teaching the right things. And, honestly, a lot of programmers never went to college for computer science!

For all of us at FrOSCon, who are often trying to hire people with open source specific skills, it’s certainly true that very few universities are training students for that. But I think there’s a much bigger problem than the university programs out there.

[slide]

If you look at CS curriculum versus math, science, history or literature, you’ll find that there’s almost no computer science taught in primary and secondary schools. In the US, over the past 10 years we have lost 35% of the comp sci classes taught in high school, which is 9-12 grades. In addition, we have very few computer science teachers, and inconsistent standards for testing and qualifying CS teachers — leading to a teacher shortage in the places where CS is actually wanted by a school.

[slide]

I talked with Inga Herber, one of the core organizing volunteers here at FrOSCon, on Thursday night. She is preparing to teach secondary school computer science here in Germany. Her observation was that there’s a strong movement in the schools here to get more computer science classes, yet there are still not many qualified teachers.

But worse than the lack of classes and teachers, if you look at what is being taught in the few places where something like CS is available, we see classes like basic keyboarding — drills to help you type faster — given the “computer science” label. Also, there are classes on how to use Excel and Word, searching the internet, or how to program in obscure or outdated languages, which for students often means just copying and pasting functions out of books. We’re actually teaching the “copy pasta” form of programming in our schools!

The most promising classes in high school would seem to be those that teach students how to take apart and put back together computers. Knowing the parts of a computer is certainly useful. But learning computer science by taking apart and putting computers back together is like learning to read by tearing books apart and putting them back together. (thanks to Mike Lee for that analogy) In the same way that we don’t think of bookbinding as essential for literacy, taking apart and putting together computers, while fun and educational, will not teach computer science literacy.

[slide]

What we really need to teach students has nothing to do with keyboards, the office suite or motherboards. In the words of the “Exploring computer science” curriculum, we need to teach “computational thinking practices of algorithm development, problem solving and programming within the context of problems that are relevant to student’s lives.”

This idea of “computational thinking” comes via Jeanette Wing, who wrote about this idea for the ACM in 2006. “Computational thinking is a fundamental skill for everyone, not just for computer scientists. To reading, writing, and arithmetic, we should add computational thinking to every child’s analytical ability. Just as the printing press facilitated the spread of the three Rs, what is appropriately incestuous about this vision is that computing and computers facilitate the spread of computational thinking.”

[slide]

And she provides a much longer definition later, that includes this, my favorite part:

[it's] A way that humans, not computers, think. Computational thinking is a way humans solve problems; it is not trying to get humans to think like computers. Computers are dull and boring; humans are clever and imaginative. We humans make computers exciting. Equipped with computing devices, we use our cleverness to tackle problems we would not dare take on before the age of computing and build systems with functionality limited only by our imaginations.
Jeanette Wing’s description makes me think about a world where computer science would be inspiring to everyone. And not just inspiring, but creative and fun.

[slide]

It makes me think of the great Ada Lovelace comics I’ve seen, like this one by Sydney Padua, where Charles Babbage and Ada Lovelace, creators of the first computing machine, are crimefighters. The heroes are quirky, smart, and solve devilishly tricky problems.

I also love the new Sherlock, the BBC TV show, for how wonderfully geeky he is in his problem solving, and how he often uses silly pranks with technology to show off. The first episode has him sending group texts as a sarcastic counterpoint to a police chief’s press conference.

In the same way that Einstein and Feynman are crucial parts of the storytelling around physics, we need to talk more about the heroes of computer science, about what made them human, and interesting and not like computers at all.

And armed with these fascinating stories, we can share them as part of our teaching. Because this is all so fun — this conference is full of people with great stories, working on an event that spans seven years. There have been great times, and near disasters, and triumphs. Those can be our examples and starting points for explaining the computer science that we want our friends and family to understand.

[slide]

As I’ve done my research, it’s become painfully clear how separated open source developers are from teachers. There are a lot of reasons why this might be. I married a teacher, but I don’t think advocating for marriage between teachers and open source people is a scalable solution.

So, other than marriage, how can we invite more teachers into open source?

One barrier to communicating with teachers is being able to speak the language of education. That’s not just knowing the terms teachers use for their work; it’s also having the experience of teaching, and being able to relate to it.

Teaching is incredibly difficult. It’s both mentally and physically challenging. When I finished mentoring students for one day and teaching a single hour-long lesson, I was ready for a beer and sleep. I can’t imagine doing that every day.

[slide]

But teachers – they do this for 8 hours a day, every day. A valuable experience for every developer is to teach something new, even just for a few minutes, in person and without a computer. You don’t need to get in front of a classroom to experience this.

What you can do is schedule an hour with a friend, a colleague or a family member and try to teach. See if you can get them to really understand, and then demonstrate the new skill back to you. Take filesystems, for example – after you explain, see if they can do something specific, like find a special file (easter egg planting!), or explain back to you what it is that you taught them, or even better: watch as they try to explain filesystems to someone else.

Once you’ve had the experience of helping someone master a brand new skill, you’ve started down the path that teachers walk every day. This is a shared experience, a point of empathy you can draw on if you ever have the chance to talk directly to a teacher.

[slide]

For too long, free software advocates have focused on getting open source software into classrooms without understanding exactly what that means to teachers. When something goes wrong with my servers or my laptop, it’s my job to figure out what is wrong and to fix it. I have time in my day for mistakes, and for bugs.

Teachers, on the other hand, have a certain number of hours in a year with students. They count them! That time is carefully scripted, because teaching is actually very difficult. Teachers can’t improvise excellent teaching when the computers they are using crash, or the software doesn’t work the way they expected, or the user interface changes suddenly after an upgrade. All the things that I think of as features are, for teachers, just more things that take away time they would otherwise spend creating lessons and teaching students. This is why I think free software is not more widely used in schools.

[slide]

I do not mean to diminish the efforts of the many awesome projects like Skolelinux, a school-specific Linux distribution based on Debian. But if we look at the software that runs grading and attendance, the software that kids use to play games, and the operating systems on teacher computers — this software is largely still proprietary.

I hope that I can plant a seed of empathy in you all for what teachers are up against. Think about how much time you spend considering the filesystem lesson you’re going to teach, for example. My husband was given one hour per day to plan for 7 hours of teaching. I spent nearly 100 hours preparing for this keynote. The ratio of preparation time to instruction time for professional teachers is terrifyingly small.

[slide]

If open source contributors all experienced what it is like to teach the non-technical people in our lives in person, learning to use modeling, guided practice, independent practice and generalization in our own lessons about open source technology, we would develop a common vocabulary to talk with teachers, in the same way that in free software we share a vocabulary that starts with freedom, source code and sharing.

And once we can talk with teachers, and we do so on a regular basis, we can ask them what it is that they really need, and how we as open source experts can help them make schools and teaching even better. Because, really, teachers and the free software movement are natural allies in our efforts to share information.

[slide]

We have a tremendous problem ahead of us. There aren’t enough people who understand the fundamentals of computer science. And a lot is at stake.

We’re in an era where privacy, financial security and our elections are managed by software. If we all get this right, then software we create will also be used to fight corruption, solve important problems and make us all more free.

Before I leave, I want to share a story from 2009. This isn’t a free software story, not yet, but it’s about the power of computational thinking when applied to the democratic process.

[slide]

So in 2009, I was invited to come teach a class about PostgreSQL. I travelled to Ondo State, Nigeria, specifically to Akure.

[slide]

Here’s a picture of my students. They had degrees in computer science or had taken programming classes, and several were professional developers.

[slide]

It was from them that I learned how the Governor of Ondo State, Olusegun Mimiko, won his election. He was running against former Governor Agagu, the candidate of the People’s Democratic Party, which is also the majority party across Nigeria.

[slide]

You may not have heard about this, but back in 2007 when the elections were held, there was country-wide unrest. United Nations observers reported violence, and accusations of voter fraud were raised.

[slide]

So, once the ballots were counted, Mimiko had lost.

[slide]

But, his campaign had been so sure they were going to win because of poll results.

[slide]

So, they filed a lawsuit and got ahold of the ballot boxes for a recount. And it was at this point where they did something different.

[slide]

The way that you vote in Nigeria is with a thumb-print next to the candidate you select on a paper ballot. So, if there was fraud, the Mimiko team reasoned, you would have lots of ballots with the same thumb print. A local group of techies put together a plan. They would electronically scan in all the ballots and then have someone validate fingerprints and find duplicates.

[slide]

They searched the world for a fingerprint expert, and found Adrian Forty in Great Britain. Adrian Forty and his team analyzed all the ballots, and they found a few duplicates.

[slide]

In fact, they found 84,814 duplicate fingerprints. In one case a single fingerprint was used 300 times.

[slide]

After a two-year court battle, they finally won. :) But the work was just beginning.

[slide]

One of the places my colleagues took me was Idanre Hill, which is on the tentative World Heritage Site list. This is a picture of a handrail that was cut by the outgoing government. My colleagues described it with a Yoruba phrase that means “left like thieves.” Mimiko’s team won the election, but got no help from the outgoing government in the transition to power.

[slide]

Of course, the method for detecting voter fraud went viral. The expertise in analyzing fingerprints has been shared with neighboring states, and similar fraud was uncovered and stopped in Osun State as well.

[slide]

The new government in Ondo State has been very focused on IT initiatives, in particular on what cell phones can do to connect citizens with their government. One initiative gave all new mothers cell phones to stay in touch with their doctors. The cell phone program reduced the number of mother and child deaths to just 1 last year, a 35% drop in mother and infant mortality. Their goal is a 75% reduction in infant mortality by 2015.

[slide]

This last picture was taken as two friends and I hiked up Idanre Hill.

Which brings me to what I want you all to do.

We need to teach people how to ask the right questions, and how to be suspicious of or satisfied by the answers they get. We need to teach people how to break problems apart into understandable chunks instead of assuming that they will never understand a complicated process.

And we need to teach them the value of sharing source code. What it means to have software freedom, and how much it matters to us that everyone has the opportunity to learn from and build upon the work of others.

I believe that we can demonstrate again, to the world, how useful it can be to share, how interesting and engaging it is to let people take a lesson, try it out and report back.

Think about filesystems. Think about your friends and family. Who could you spend an hour with, teaching them an important skill that will help them understand our world of computers?

Thank you very much for your time today.

To encourage you all to do this, I created a little website where you can publicly say that you’re going to try to teach a lesson to someone. The authentication system only supports twitter right now – very sorry. But I have some code and was planning on hacking in email login this afternoon. I also have published the code on Github and linked to it from the site. I hope that you’ll have a look, and certainly if you find bugs, let me know.

Theo Schlossnagle: The myopic focus on IT and engineering has to stop.


Business is king. Customers rule. Service is everything. Yet every organization I go into has an engineering group that can't see outside their bubble. Perhaps they can, but they certainly choose not to.

I'm an engineer, I write code. I've written approaching 100k lines of C code in my lifetime, I've administered tens of thousands of systems in my career and I've helped plan some of the largest customer-facing infrastructure ever built. I've learned a tremendous amount about technology and the hubristic nature of engineering teams. The most important takeaway from all of this? The technology doesn't mean anything unless it enables the business by providing better service to customers.

Now, I realize that when I rant about this to technology folk, they emphatically agree. But I'm tired of the lip service. People in architecture, engineering and operations say again and again that their focus is on enabling a better customer experience. It's a nice sentiment, but every time I dive into someone's instrumentation and monitoring, I see an absolute vacuum when it comes to non-IT data.

The obvious things like financial and customer service metrics are missing, but so are all the more subtle things. Hiring is hard; finding and retaining talent is challenging; providing good benefits that add value and increase job appeal is a competitive task. All of these things are critically important to the organization as a whole (and specifically engineering and IT) and yet they are completely absent from the "monitoring" within the organization.

The truth is that there is absolutely critical telemetry coming from every facet of your organization. All of this telemetry is either directly related to providing better service to customers or directly related to providing better service to your organization itself which, in turn, stabilizes the platform on which you deliver products and services. Of this, I shouldn't have to convince you and I find that no convincing of the general population is required. Yet, here we are with almost every organization I see standing blind to this vital information.

Don't get me wrong: I'm not saying technology isn't a first-class component of today's (and tomorrow's) organizations. In fact, I think the technology group has been applying radically advanced techniques to telemetry data for years. It's high time that these techniques and tools were applied to the organization unabridged.

There is a profound shift in data transparency and accountability coming to the organization of tomorrow. If you don't buy in, you'll simply fail to achieve the agility and efficiencies of your competition. I'm here, with Circonus, to make that happen.

Business is king, not engineering. The difficult (but exceptionally simple) shift of engineering's focus from serving itself to serving the business as a whole will remake IT as the engine of the organization. As soon as you embrace this shift, technology will be the most powerful tool your organization has at its disposal.

Christoph Berg: PostgreSQL in Debian Hackathon


Almost a year has passed since my talk at pgconf.eu 2011 in Amsterdam on Connecting the Debian and PostgreSQL worlds, and unfortunately little has happened on that front, mostly due to my limited spare time between family and job. pgapt.debian.net is up and running, but has received few updates and is lagging behind on PostgreSQL releases.

Luckily, we got the project moving again. Dimitri Fontaine and Magnus Hagander suggested a face-to-face meeting, so we got together at my house for two days last week and discussed ideas, repository layouts, build scripts, and whatnot to get all of us aligned for pushing the project ahead. My employer sponsored my time off work for that. We almost finished moving the repository to postgresql.org infrastructure, barring some questions of how to hook the repository into the existing mirror infrastructure; this should get resolved this week.

The build server running Jenkins is still located on my laptop, but moving this to a proper host will also happen really soon now. We are using Mika Prokop's jenkins-debian-glue scripts for driving the package build from Jenkins. The big plus point about Jenkins is that it makes executing jobs on different distributions and architectures in parallel much easier than a bunch of homemade shell scripts could get us with reasonable effort.

Here's a list of random points we discussed:

  • We decided to go for "pgdg" in version numbers and distribution names, i.e. packages will have version numbers like 9.1.5-1.pgdg+1, with distributions wheezy-pgdg, squeeze-pgdg, and so on.
  • There will be Debian-testing-style distributions called like wheezy-pgdg-testing that packages go into for some time before they get promoted to the "live" distributions.
  • PostgreSQL versions out of support (8.2 and below) will not be removed from the repository, but will be moved to distributions called like wheezy-pgdg-deprecated. People will still be able to use them, but the naming should make it clear that they should really be upgrading.
  • We have a slightly modified (compared to Debian unstable) postgresql-common package that sets the "supported-versions" to all versions supported by the PostgreSQL project. That will make the postgresql-server-dev-all package pull in build-dependencies for all server versions, and make extension module packages compile for all of them automatically. (Provided they are using pg_buildext.)
  • There's no Ubuntu support in there yet, but that's mostly only a matter of adding more cowbuilder chroots to the build jobs. TBD soon.

We really aim to use unmodified packages from Debian as much as possible; this project is not meant to replace Debian's PostgreSQL packaging work, but to extend it to cover more server versions (and more Debian and Ubuntu releases) than Debian itself supports. The people behind the Debian and Ubuntu packages and this repository are mostly the same, so we will claim that "our" packages are of the same quality as the "original" ones. Big thanks go to Martin Pitt for maintaining the postgresql-common test suite that really covers every aspect of running PostgreSQL servers on Debian/Ubuntu systems.

Stay tuned for updates! :)

Chris Travers: PostgreSQL O/R Modelling Part 3: Table Inheritance and O/R Modelling in PostgreSQL

Note:  PostgreSQL 9.2 will allow constraints which are not inherited.  This will significantly  impact the use of inheritance, and allow for real set/subset modelling with records present in both parent and child tables.  This is a major step forward in terms of table inheritance.

PostgreSQL allows tables, but not views or complex types, to inherit table structures.  This allows for a number of additional options when doing certain types of modelling.  In this case, we will build on our inventory_item type by adding attached notes.  However we may have notes attached to other tables too, and we want to make sure that each note can be attached to only one table.

This is exactly how we use table inheritance in LedgerSMB.

Before we begin, however it is worth looking at exactly what is inherited by a child table and what is not.

The following are inherited:
  • Basic column definitions (including whether a column is nullable)
  • Column default values
  • CHECK constraints (cannot be overridden in child tables)
  • Descendants can be implicitly cast to ancestors, and
  • Table methods are inherited 
The following are not inherited:
  • Indexes
  • Unique constraints
  • Primary Keys
  • Foreign keys
  • Rules and Triggers
The Foreign Key Problems and How to Solve Them

There are two basic foreign key problems when using table inheritance in PostgreSQL.  The first is that foreign keys themselves are not inherited, and the second is that a foreign key may only target a specific relation, so rows in child tables are not valid foreign key targets.  Both problems share the same solution: where the child tables are not meant to be distinct for foreign key purposes, factor the tables so that the shared key information is broken out into its own relation; failing that, a trigger-maintained materialized view goes a long way towards addressing the key management issues.

For example, for files, we might place the id and class information in a separate table and then reference it from the child relations.  This gives a valid foreign key target for the set of all inherited tuples.  Foreign keys can also be moved to this second table, allowing them to be centrally managed as well, though it also means that JOIN operations must include the second table.  A second option might be to omit the class and use the child table's tableoid.
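A minimal sketch of that factoring, using hypothetical file tables (the names and columns are illustrative only, not taken from LedgerSMB):

CREATE TABLE file_base (
    id int not null,
    file_class int not null,
    primary key (id, file_class)
);

CREATE TABLE file_attachment (
    id int not null,
    file_class int not null check (file_class = 1),
    file_name text not null,
    content bytea not null,
    primary key (id, file_class),
    foreign key (id, file_class) references file_base (id, file_class)
);

-- Other tables can reference file_base (id, file_class) and be certain the
-- target exists, no matter which child table actually holds the payload.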

The Primary Key Problem and How to Solve It

Similarly, because indexes are not inherited, we cannot ensure that a given combination of column values is unique across an inheritance hierarchy.

The most obvious solution is to treat the combination of the natural primary key and the tableoid as the primary key for modelling purposes.  However, this is only really useful when the primary key is a composite whose components carry mutually exclusive constraints on the child tables.  In other words, partitioning is necessary for sane set/subset modelling using table inheritance.

Where Not to Use Table Inheritance


Most of the table inheritance documentation suggests that the primary use of table inheritance is set/subset modelling, where a subset of a set has its own extended properties.  In practical use, however, a purely relational solution is usually cleaner for this specific sort of case and results in fewer key management issues (and therefore less overall complexity).

For set/subset modelling, a much cleaner solution is to use composite foreign keys.

For example, revising the example from the manual for table inheritance, a cleaner way to address the issue where you have cities and capitals is to do something like:

CREATE TABLE cities (
    name text primary key,
    altitude float, 
    population int,
    is_capital bool not null,
    unique(is_capital, name)
);

The unique constraint adds no real restriction to the actual table: if "name" is guaranteed to be unique, then ("is_capital", "name") is guaranteed to be unique.  What it does, however, is designate that pair as a secondary key.

You can then create a table of:

CREATE TABLE capitals (
    name text primary key,
    is_capital bool not null,
    state text not null,
    foreign key (name, is_capital) 
        references cities (name, is_capital)
        DEFERRABLE INITIALLY DEFERRED
);


This approach, where subset membership is part of a secondary key, allows for greater control over set/subset modelling.  You can then create a trigger on cities which checks whether the city exists in capitals (enforcing an insert flow of "insert into capitals, then insert into cities, then commit"), as sketched below.
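A minimal sketch of such a trigger, assuming the cities and capitals tables above (the function and trigger names are illustrative):

CREATE OR REPLACE FUNCTION check_capital_exists() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Only capitals need a matching row in capitals
    IF NEW.is_capital AND NOT EXISTS
       (SELECT 1 FROM capitals WHERE name = NEW.name)
    THEN
        RAISE EXCEPTION 'capital city % has no row in capitals', NEW.name;
    END IF;
    RETURN NEW;
END;
$$;

CREATE CONSTRAINT TRIGGER cities_capital_check
AFTER INSERT OR UPDATE ON cities
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE PROCEDURE check_capital_exists();

Because the trigger is deferred, the check runs at commit, after both inserts have happened.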

No primary or foreign key problems occur with this solution to set/subset models.  With multiple subsets you will likely wish to generalize and use a "city_class" field of type int referencing a city_class table so you can control subsets more broadly.  However, in general, table inheritance is a poor match for this problem because of foreign key constraint issues.  Primary key management, on the other hand, is less of an issue (and is becoming significantly easier to solve).


The one use case for set/subset modelling using inheritance is table partitioning.  It has all the problems above, and enforcing foreign keys between two partitioned tables can become very complex, very fast unless materialized views are used to proxy the enforcement.


Table Structure:  Purely Relational Approach

CREATE SCHEMA rel_examples;
CREATE TABLE rel_examples.note (
   note text not null,
   subject text not null,
   created_by int not null,
   note_class int not null,
   created_at timestamp not null,
   id int unique,
   primary key (id, note_class)
);

CREATE TABLE rel_examples.note_to_inventory_item (
   note_id int not null,
   note_class int not null check (note_class = 1),
   item_id int references inventory_item (id),
   foreign key (note_id, note_class) 
      references rel_examples.note (id, note_class)
);

CREATE TABLE rel_examples.note_to_......;

These would then be queried using something like:

SELECT n.* FROM rel_examples.note n
  JOIN rel_examples.note_to_inventory_item n2i 
       ON n2i.note_id = n.id AND n2i.note_class = n.note_class
  JOIN inventory_item i ON i.id = n2i.item_id
 WHERE i.sku = 'TEST123';

This doesn't preclude other Object-Relational approaches being used here.  I could of course add a tsvector method to make full text searches easier and apply it to note, and it would work just as well; a sketch follows.  However, this doesn't really fit the data model very well, and since we are only ever pulling notes by attachment, some performance can be gained by breaking the table up into more workable chunks which could be queried independently.
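For example, a minimal sketch of such a method on rel_examples.note (an illustration, not part of the schema above; it uses to_tsvector for proper normalization):

CREATE OR REPLACE FUNCTION tsvector(rel_examples.note) RETURNS tsvector
LANGUAGE SQL AS $$
   SELECT to_tsvector('english', $1.subject || ' ' || $1.note);
$$;

-- Callable with attribute notation, for example:
-- SELECT n.* FROM rel_examples.note n
--  WHERE n.tsvector @@ plainto_tsquery('english', 'test');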

Additionally the global note table is a maintenance  nightmare.  Enforcing business rules in that case is very difficult.

Table Structure:  Object-Relational Approach

An Object-Relational approach starts off looking somewhat different:

CREATE TABLE note (
    id serial not null unique,
    note text not null,
    subject text not null,
    note_class int not null,
    created_by int not null,
    ref_key int not null,
    created_at timestamp not null
);

We'd then probably add a trigger to prevent this table from receiving rows.  Note that we cannot use a check constraint for this purpose because it will be inherited by every child table.  Typically we'd use a trigger instead of a rule because that way we can easily raise an exception.
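A minimal sketch of such a trigger (the function and trigger names are illustrative):

CREATE OR REPLACE FUNCTION note_no_direct_rows() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION 'insert into a child table of note, not note itself';
END;
$$;

CREATE TRIGGER note_block_direct_rows
BEFORE INSERT ON note
FOR EACH ROW EXECUTE PROCEDURE note_no_direct_rows();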

Then we'd create the child table:

CREATE TABLE inventory_note (
    CHECK (note_class = 1),
    PRIMARY KEY (id, note_class),
    FOREIGN KEY (ref_key) references inventory_item(id)
) INHERITS (note);

Suppose we also create a tsvector method as such:

CREATE OR REPLACE FUNCTION tsvector(note) RETURNS tsvector
LANGUAGE SQL AS $$
SELECT tsvector($1.subject || ' ' || $1.note);
$$;

Now in my example database, I have an inventory_item with an id of 2, so I might:

INSERT INTO inventory_note
     (subject, note, note_class, created_by, created_at, ref_key)
VALUES
     ('Testing Notes', 'This is a test of the note system', 1, 1, now(), 2);

We know this worked:

or_examples=# select * from note;
 id |               note                |    subject    | note_class | created_by | ref_key |         created_at         
----+-----------------------------------+---------------+------------+------------+---------+----------------------------
  1 | This is a test of the note system | Testing Notes |          1 |          1 |       2 | 2012-08-22 01:17:27.807437
(1 row)

Note above that I selected from the parent table and child tables were pulled in automatically.  If I want to exclude these I can use SELECT * FROM ONLY note and none of these will be pulled in.

We can also note that the method is inherited:

or_examples=# select n.tsvector from inventory_note n;
                              tsvector                              
---------------------------------------------------------------------
 'Notes' 'Testing' 'This' 'a' 'is' 'note' 'of' 'system' 'test' 'the'
(1 row)

The tables could then be queried with one of two equivalent approaches:

SELECT n.* FROM note n
  JOIN inventory_item i ON n.ref_key = i.id AND n.note_class = 1
 WHERE i.sku  = 'TEST123';

or alternatively:

SELECT n.* FROM inventory_note n
  JOIN inventory_item i ON n.ref_key = i.id
 WHERE i.sku = 'TEST123';

Both queries return the same results.

Advantages of the Object-Relational Approach


The object-relational model allows us to be sure that every note is attached to something, which we cannot do gracefully in the relational model, and it allows us to ensure that two notes attached to different items are in fact different.  It also allows for simpler SQL.

On the performance side, it is worth noting that in both the query cases above, PostgreSQL will only query child tables where the constraints could be met by the parameters of the query, so if we have a dozen child tables, each of which may be very large, we only query the table we are interested in.  In other words, inherited tables are essentially naturally partitioned (and in fact people use inheritance to do table partitioning in PostgreSQL).

Multiple Inheritance

Multiple inheritance is supported, but if it is used, care must be taken that identically named fields are actually used in both parent classes in compatible ways.  If they are not, problems occur.  Multiple inheritance can, however, be used safely both for interface development (when emulating inheritance in composite types, see below) and where only one of the inherited relations contains actual columns (the other might contain constraints).

Thus we could still safely do something like:

CREATE TABLE note (
    note text,
    subject text,
    ....
);

CREATE TABLE joins_inventory_item (
   inventory_id int
);

CREATE TABLE inventory_note (
    FOREIGN KEY (inventory_id) REFERENCES inventory_item(id),
    ... -- partial primary key constraints, and primary key def
) INHERITS (note, joins_inventory_item);

This could then be used to enforce consistency in interface design, ensuring more readable queries and the like, but it doesn't stop there.  We can use joins_inventory_item as the vehicle for following a reference.  So:

CREATE OR REPLACE FUNCTION inventory_item
(joins_inventory_item)
RETURNS inventory_item
LANGUAGE SQL AS
$$ SELECT * FROM inventory_item WHERE id = $1.inventory_id $$; 

In this case, you can then use inventory_item as a virtual pointer to the inventory item being joined, as such:

CREATE TABLE inventory_barcode (
    barcode text,
    FOREIGN KEY (inventory_id) REFERENCES inventory_item(id)
) INHERITS (joins_inventory_item);
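A query can then follow the reference with attribute notation, just as in the country example later in this post (a hypothetical usage sketch):

SELECT b.barcode, (b.inventory_item).sku
  FROM inventory_barcode b;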

The uses for multiple inheritance however don't end there.  Multiple inheritance gives you the ability to define your database in re-usable chunks with logic following that chunk.    Alternative presentation of sets of columns then becomes manageable in a way it is not with complex types.

A More Complex Example:  Landmarks, Countries, and Notes

Another example may be where we are storing landmarks, which are attached to countries.  Our basic table schema may be:

CREATE TABLE country (
    id serial not null unique,
    name text primary key,
    short_name varchar(2) not null unique
);

CREATE TABLE country_ref (
    country_id int
);

Our traversal function:

CREATE FUNCTION country(country_ref) RETURNS country
LANGUAGE SQL STABLE AS
$BODY$ SELECT * FROM country WHERE id = $1.country_id $BODY$;

Next to notes:

CREATE TABLE note_fields (
    note_subject text,
    note_content text
);

Like country_ref, we will probably never query note_fields itself.  There is very little value in querying only content like this.  However, we can add a derived tsvector value:

CREATE FUNCTION note_tsvector(note_fields)
RETURNS tsvector IMMUTABLE
LANGUAGE SQL AS $BODY$

SELECT to_tsvector('english', coalesce($1.note_subject, '') || ' ' ||
coalesce($1.note_content, ''));

$BODY$;

Now, the function name is prefixed with note_ in order to avoid conflicts with other similar functions.  This would allow multiple searchable fields to be combined in a table and then mixed by inheriting classes.

CREATE TABLE landmark (
    id serial not null unique,
    name text primary key,
    nearest_city text not null,
    foreign key (country_id) references country(id),
    CHECK (country_id IS NOT NULL),
    CHECK (note_content IS NOT NULL)
) INHERITS (note_fields, country_ref);

The actual table data will not display too well here, but anyway here are a few slices of it:

or_examples=# select id, name, nearest_city, country_id from landmark;
 id |        name        | nearest_city  | country_id
----+--------------------+---------------+------------
  1 | Eiffel Tower       | Paris         |          1
  2 | Borobudur          | Jogjakarta    |          2
  3 | CN Tower           | Toronto       |          3
  4 | Golden Gate Bridge | San Francisco |          4
(4 rows)

and the rest of the table:

or_examples=# select id, name, note_content from landmark;
 id |        name        |              note_content             
----+--------------------+----------------------------------------
  1 | Eiffel Tower       | Designed by a great bridge builder
  2 | Borobudur          | Largest Buddhist monument in the world
  3 | CN Tower           | Major Toronto Landmark
  4 | Golden Gate Bridge | Iconic suspension bridge
(4 rows)

The note_subject field is null on all records.

SELECT name, (l.country).name as country_name FROM landmark l;

or_examples=# SELECT name, (l.country).name as country_name FROM landmark l;
        name        | country_name
--------------------+---------------
 Eiffel Tower       | France
 Borobudur          | Indonesia
 CN Tower           | Canada
 Golden Gate Bridge | United States
(4 rows)

Demonstrating the note_tsvector interface.  Note that this could be improved upon by creating different tsvectors for different languages.

or_examples=# select name, note_content from landmark l where
plainto_tsquery('english', 'bridge') @@ l.note_tsvector;
        name        |            note_content
--------------------+------------------------------------
 Eiffel Tower       | Designed by a great bridge builder
 Golden Gate Bridge | Iconic suspension bridge
(2 rows)

So the above example shows how the interfaces offered by two different parent tables can be invoked by a child table.  This sort of approach avoids a lot of the problems that come with storing composite types in columns (see upcoming posts) because the complex types are stored inline inside the table.


Base Tables as Catalogs

The base tables in these designs are not generally useful for data queries.  However, they can simplify certain other operations, both manual and data-integrity related.  The uses for data integrity will be the subject of a future post.  Here we will focus on manual tasks where this sort of inheritance can help.


Suppose we use the above note structure in a program which manages customer contacts, and one individual discovers inappropriate language in some of the notes entered by another worker.  In this case it may be helpful to try to determine the scope of the problem.  Suppose the note includes language like "I told the customer to stfu."  We might want to:


SELECT *, tableoid::regclass::text as table_name
  FROM note_fields nf 
 WHERE nf.note_tsvector @@ to_tsquery([pattern]);


In this case we may be trying to determine whether to fire the offending employee, or we may have fired the employee and be trying to figure out what sort of damage control is necessary.


A test query with the current data set above shows what will be shown:


or_examples=# SELECT *, tableoid::regclass::text as table_name
  FROM note_fields nf
 WHERE nf.note_tsvector @@ to_tsquery('bridge');
 note_subject |            note_content            | table_name
--------------+------------------------------------+------------
              | Designed by a great bridge builder | landmark
              | Iconic suspension bridge           | landmark
(2 rows)


This sort of interface has obvious uses when trying to do set/subset modelling where the interface is inherited.  Careful design is required to make this perform adequately, however.  In the case above we have to do a somewhat deep inspection of all child tables, but we don't necessarily require immediate answers.


In general, this sort of approach makes it somewhat difficult to store records in both parent and child tables without reducing the semantic clarity of those tables.  In this regard, the parent acts like a catalog of all children.  For this reason I remain sceptical that, even with NO INHERIT check constraints, it will be a good idea to insert records into both parent and child tables.  NO INHERIT check constraints do, however, finally provide a useful tool for enforcing that rule.


Contrasts with DB2, Informix, and Oracle

DB2 and Oracle have largely adopted Informix's approach to table inheritance, which supports single inheritance only, and therefore requires that complex types be stored in columns.  With multiple inheritance, if we are careful about column naming, we can actually in-line multiple complex types in the relation.  This provides at once both a more relational interface and one which admits of better object design.

Emulating Inheritance in Composite Types

Composite types do not allow inheritance in PostgreSQL, and neither do views.  In general, I would consider mixing inheritance and views to be dangerous, and so would urge folks not to try to get around this limitation (which may be possible using RULEs and inherited tables).

One basic way to address this is to create a type schema and inherit tables from a table called something like is_abstract as follows:

CREATE TABLE is_abstract (check (false ));

No table that inherits is_abstract will ever be allowed to store rows.  Check constraints are only enforced when data is stored, so a check constraint that by definition always fails denies use of the table for storing data, while still allowing the relation to be used as a class which can be instantiated from data stored elsewhere.

Then we can do something like

CREATE TABLE types.my_table (
    id int,
    content text
) INHERITS (is_abstract);

Once this is done, types.my_table will never be able to store rows.  You can then inherit from it, and use it to build a logical data model.  If you want to change this later and actually store data, this can be done using alter table statements: first break the inheritance, and second drop the is_abstract_check constraint (sketched below).  Once these two are done, the inheritance tree can be used to store information.
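A minimal sketch of that promotion, assuming the check constraint got the default name is_abstract_check:

ALTER TABLE types.my_table NO INHERIT is_abstract;
ALTER TABLE types.my_table DROP CONSTRAINT is_abstract_check;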

Next:  Inheriting interfaces in set/subset modelling
Next Week:  Complex Types

Gabriele Bartolini: Submit your talk for PGDay.IT


The sixth edition of the Italian PGDay will be held in Prato on November 23, in the historic setting of the Monash University Centre.

The call for papers has officially been opened today. International speakers are most welcome.

Although the event is primarily intended for an Italian-language audience (Italy and Switzerland), talks in English from members of the international Postgres community are always a success.

The deadline for the submission of abstracts is September 30. Detailed instructions for international speakers are on the “International Call for Papers” page of the website. The president of the technical committee is Luca Ferrari, PhD, vice president of ITPUG.

Finally, Prato is located in Tuscany, near Florence (less than 20 km away). It could be a great chance for you to spend some time with one of the warmest Postgres communities in the world, drink good wine and eat delicious food. Take advantage of the weekend and visit art galleries and museums in the area, with masterpieces by Leonardo, Michelangelo, Botticelli, Donatello, and others.

Martin Pitt: PostgreSQL 9.2 RC1 available for testing

Michael Paquier: Postgres: TRIGGER for beginners

This post aims to provide the basics to help you understand how triggers work in PostgreSQL. A trigger makes it possible to associate an automatic operation with a table whenever a write event happens on that table. Here is the synopsis of the command: CREATE [ CONSTRAINT ] TRIGGER name { BEFORE [...]

Tatsuo Ishii: Larger large objects

Large objects (BLOBs) have been there since PostgreSQL was born. The size limit of a large object has been 2GB (assuming the default block size). Now I have decided to expand the limit for PostgreSQL 9.3: 4TB is the target. Actually, the PostgreSQL backend can already hold large objects of up to 4TB; the limit has just been in the API. For example, lo_lseek() and lo_tell() cannot return an offset over 2GB, because those functions' return type is "int". So why can't they return a value over 2GB? Well, the secret is the frontend/backend protocol for large objects.

The underlying protocol is called the "fast-path interface". It's similar to RPC (Remote Procedure Call): the client sends a "Function call" packet along with the target function's OID (Object Id). The named function is executed within the backend and the result is returned through a "Function call response".

The functions called in large object interface are:
  • lo_open
  • lo_close
  • lo_creat
  • lo_create
  • lo_unlink
  • lo_lseek
  • lo_tell
  • lo_truncate
  • loread
  • lowrite
Those functions' OIDs are retrieved from the backend the first time large objects are accessed, and they are cached in the connection handle (I'm talking about libpq here; other interfaces such as JDBC may implement this differently).

The problem is lo_lseek and lo_tell, as I said earlier. First, their offset parameter is defined as 4 bytes long. Second, their result is defined as 4 bytes long. So we can handle offsets only up to 2^31-1 = 2GB. What shall we do? Well, we will add new functions in the backend, namely lo_lseek64 and lo_tell64. Libpq will check whether those 64-bit functions exist: if yes, it uses them; otherwise (which means the backend is likely a pre-9.3 version) it uses the plain old 32-bit-limited lo_lseek and lo_tell. This way, we do not break backward compatibility. Of course, you need to use the 9.3 libpq to enjoy "larger large objects".
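These functions are also exposed as ordinary server-side SQL functions, so the 32-bit limitation is easy to see from psql. A minimal sketch (131072 and 262144 are the INV_WRITE and INV_READ mode flags; the OID and descriptor values will differ on your system):

BEGIN;
DO $$
DECLARE
    loid oid;
    fd   integer;
BEGIN
    loid := lo_creat(-1);                    -- new, empty large object
    fd   := lo_open(loid, 131072 | 262144);  -- INV_WRITE | INV_READ
    PERFORM lowrite(fd, 'hello, large object');
    -- lo_lseek takes and returns int4, hence the 2GB offset limit
    RAISE NOTICE 'seek to start: %', lo_lseek(fd, 0, 0);   -- 0 = SEEK_SET
    RAISE NOTICE 'first 5 bytes: %', loread(fd, 5);
    PERFORM lo_close(fd);
    PERFORM lo_unlink(loid);                 -- drop the test object
END;
$$;
COMMIT;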

I hope to post the first cut of the patch in September.


Hubert 'depesz' Lubaczewski: Filling the gaps with window functions

A couple of days ago I had a problem that I couldn’t solve after ~2 hours, and decided to ask on IRC. Almost immediately after asking, I figured out the solution, but David asked me to write about it, even though it’s now (for me) completely obvious. The problem was like this: I had [...]