Channel: Planet PostgreSQL

Umair Shahid: HOWTO use JSON functionality in PostgreSQL


In a previous post, I talked about the excitement that surrounds NoSQL support in PostgreSQL. Today, I will dive a little deeper into the technical details of native JSON support that makes NoSQL possible.

Below, I explain how to use some basic functions and operators.

Creating a table with JSON column

This is fairly simple. You can declare a JSON column just like a column of any other data type. Below, I create a table ‘sales’ with 2 columns, ‘id’ and ‘sale’, the latter being of type JSON.

json_sample=# CREATE TABLE sales (id INT, sale JSON);
CREATE TABLE

Inserting JSON data

Insertion of data is pretty close to that of any other data type as well, except that you have to make sure the data is in a valid JSON format. Below, I am inserting 4 records into the table, each with a JSON containing nodes for ‘customer_name’ and a nested JSON for ‘items’ containing a ‘description’ and purchased ‘quantity’.
John bought 4 cartons of milk:

json_sample=# INSERT INTO sales 
                   VALUES (1,'{ "customer_name": "John", "items": { "description": "milk", "quantity": 4 } }');
INSERT 0 1

Susan bought 2 loaves of bread:

json_sample=# INSERT INTO sales 
                   VALUES (2,'{ "customer_name": "Susan", "items": { "description": "bread", "quantity": 2 } }');
INSERT 0 1

Mark bought a dozen bananas:

json_sample=# INSERT INTO sales 
                   VALUES (3,'{ "customer_name": "Mark", "items": { "description": "bananas", "quantity": 12 } }');
INSERT 0 1

Jane bought a pack of cereal:

json_sample=# INSERT INTO sales 
                   VALUES (4,'{ "customer_name": "Jane", "items": { "description": "cereal", "quantity": 1 } }');
INSERT 0 1

Retrieving JSON data

Like insertion, retrieving the JSON formatted data is fairly straightforward as well. Below, I am retrieving the data I inserted in the previous section.

json_sample=# SELECT * FROM sales;
id |                                      sale
----+------------------------------------------------------------------------------------
1 | { "customer_name": "John", "items": { "description": "milk", "quantity": 4 } }
2 | { "customer_name": "Susan", "items": { "description": "bread", "quantity": 2 } }
3 | { "customer_name": "Mark", "items": { "description": "bananas", "quantity": 12 } }
4 | { "customer_name": "Jane", "items": { "description": "cereal", "quantity": 1 }

Retrieving JSONs – The ‘->’ and ‘->>’ operators

Now comes the real fun part! PostgreSQL provides native operators to retrieve individual nodes of the JSON object … very powerful indeed. In this section, I discuss the ‘->’ operator, which returns a JSON object, and the ‘->>’ operator, which returns TEXT.

Retrieving as a JSON:

json_sample=# SELECT sale->'customer_name' AS name FROM sales;
name
---------
"John"
"Susan"
"Mark"
"Jane"
(4 rows)

Retrieving as TEXT:

json_sample=# SELECT sale->>'customer_name' AS name FROM sales;
name
-------
John
Susan
Mark
Jane
(4 rows)

Chaining the ‘->’ and ‘->>’ operators

Since ‘->’ returns a JSON object, you can use it to return a nested object within the data and chain it with the operator ‘->>’ to retrieve a specific node.

json_sample=# SELECT id, sale->'items'->>'quantity' AS quantity FROM sales;
id | quantity
----+----------
1 | 4
2 | 2
3 | 12
4 | 1
(4 rows)

Using JSONs in extract criteria for queries

The operators discussed in the previous section can be used in the WHERE clause of a query to specify extract criteria. A few examples, using the same data set, are below.

Searching for a specific description of an item within a sale:

json_sample=# SELECT * FROM sales WHERE sale->'items'->>'description' = 'milk';
id |                                     sale
----+--------------------------------------------------------------------------------
1 | { "customer_name": "John", "items": { "description": "milk", "quantity": 4 } }
(1 row)

Searching for a specific quantity as TEXT:

json_sample=# SELECT * FROM sales WHERE sale->'items'->>'quantity' = 12::TEXT;
id |                                       sale
----+------------------------------------------------------------------------------------
3 | { "customer_name": "Mark", "items": { "description": "bananas", "quantity": 12 } }
(1 row)

Searching for a specific quantity as INTEGER:

json_sample=# SELECT * FROM sales WHERE CAST(sale->'items'->>'quantity' AS integer)  = 2;
 id |                                       sale                                       
----+----------------------------------------------------------------------------------
  2 | { "customer_name": "Susan", "items": { "description": "bread", "quantity": 2 } }
(1 row)

Using JSON nodes in aggregate functions

Once you understand how to retrieve individual nodes of a JSON object, you can easily use the retrieved values in aggregate functions as well.

json_sample=# SELECT SUM(CAST(sale->'items'->>'quantity' AS integer)) AS total_quantity_sold FROM sales;
total_quantity_sold
---------------------
19
(1 row)

JSON functions in PostgreSQL

Let’s go through some functions that PostgreSQL provides for manipulating JSON objects.

json_each

This function expands the outermost JSON object into a set of key/value pairs. Notice that the nested JSONs are not expanded.

json_sample=# SELECT json_each(sale) FROM sales;
json_each
--------------------------------------------------------------
(customer_name,"""John""")
(items,"{ ""description"": ""milk"", ""quantity"": 4 }")
(customer_name,"""Susan""")
(items,"{ ""description"": ""bread"", ""quantity"": 2 }")
(customer_name,"""Mark""")
(items,"{ ""description"": ""bananas"", ""quantity"": 12 }")
(customer_name,"""Jane""")
(items,"{ ""description"": ""cereal"", ""quantity"": 1 }")
(8 rows)

json_object_keys

Returns the set of keys in the outermost JSON object. Again, notice that the nested keys are not displayed.

json_sample=# SELECT json_object_keys(sale) FROM sales;
json_object_keys
------------------
customer_name
items
customer_name
items
customer_name
items
customer_name
items
(8 rows)

json_typeof

Returns the type of the outermost JSON value as a text string. Possible types are ‘object’, ‘array’, ‘string’, ‘number’, ‘boolean’, and ‘null’.

json_sample=# SELECT json_typeof(sale->'items'), json_typeof(sale->'items'->'quantity') FROM sales;
json_typeof | json_typeof
-------------+-------------
object     | number
object     | number
object     | number
object     | number
(4 rows)

json_object

Builds a JSON object out of a text array. The function can be used in one of two ways:

(1) Array with exactly one dimension with an even number of members. In this case the elements are taken as alternating key/value pairs.

json_sample=# SELECT json_object('{key1, 6.4, key2, 9, key3, "value"}');
json_object
--------------------------------------------------
{"key1" : "6.4", "key2" : "9", "key3" : "value"}
(1 row)

(2) Array with two dimensions such that each inner array has exactly two elements. In this case, the inner array elements are taken as a key/value pair.

json_sample=# SELECT * FROM json_object('{{key1, 6.4}, {key2, 9}, {key3, "value"}}');
json_object
--------------------------------------------------
{"key1" : "6.4", "key2" : "9", "key3" : "value"}
(1 row)

to_json and row_to_json

to_json returns its argument as a JSON value; the related row_to_json converts an entire row to a JSON object, as shown below.

json_sample=# CREATE TABLE json_test (id INT, name TEXT);
CREATE TABLE
json_sample=# INSERT INTO json_test VALUES (1, 'Jack');
INSERT 0 1
json_sample=# INSERT INTO json_test VALUES (2, 'Tom');
INSERT 0 1
json_sample=# SELECT row_to_json(row(id, name)) FROM json_test;
row_to_json
----------------------
{"f1":1,"f2":"Jack"}
{"f1":2,"f2":"Tom"}
(2 rows)
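
For comparison, a plain to_json call on a single column of the same sample table should return scalar values as quoted JSON strings:

json_sample=# SELECT to_json(name) FROM json_test;
to_json
---------
"Jack"
"Tom"
(2 rows)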

 

The official PostgreSQL documentation has a more comprehensive listing of the available JSON functions.



Andrew Dunstan: Dynamically disabling triggers without locks

Recently Simon Riggs committed a patch by himself and Andreas Karlsson to reduce the lock strength required by certain ALTER TABLE commands, including those to enable or disable triggers. Now the lock level required is SHARE ROW EXCLUSIVE instead of ACCESS EXCLUSIVE. That means it doesn't block SELECT commands any more, and isn't blocked by them, although it will still block and be blocked by INSERT, UPDATE and DELETE operations. Very nice.

However, without formally disabling a trigger you can tell it dynamically not to do anything in the current session without taking any locks at all. Here's a little bit of PLpgsql code I wrote recently for this sort of operation in an INSERT trigger:
    begin
        disabled := current_setting('mypackage.foo_trigger_disabled');
    exception
        when others then disabled := 'false';
    end;
    if disabled = 'true' then
        return NEW;
    end if;
Note that this will only block the trigger from doing anything in sessions where this variable is set. But that's often exactly what you want. In the case this was written for, the trigger is redundant (and expensive) for certain bulk operations, but required for normal operations.  So in a session where we are performing the bulk operation, we can simply set this and avoid taking out a heavy lock on the table, and do this instead, before running our bulk operation:
    set mypackage.foo_trigger_disabled = 'true';
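
For context, here is a minimal sketch of how that check might sit inside a complete trigger function; the table, function and trigger names below are hypothetical:

    create or replace function foo_insert_trigger() returns trigger
    language plpgsql as $$
    declare
        disabled text;
    begin
        -- reading an unset custom setting raises an error, hence the handler
        begin
            disabled := current_setting('mypackage.foo_trigger_disabled');
        exception
            when others then disabled := 'false';
        end;
        if disabled = 'true' then
            return NEW;    -- skip the expensive work in this session
        end if;
        -- ... the normal (expensive) trigger logic would go here ...
        return NEW;
    end;
    $$;

    create trigger foo_insert_trg
        before insert on foo
        for each row execute procedure foo_insert_trigger();
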
The code above is a bit ugly because of the requirement for the exception handler. There's a cure for that coming, too. David Christensen has submitted a patch to provide a form of current_setting() which will return NULL for unset variables instead of raising an exception.

Note, too, that you could use a value in a one-row one-column table if you wanted something that could apply in all sessions, not just the current session. It would be a bit less efficient, though. This mechanism is pretty light-weight.

gabrielle roth: PDXPUG: April meeting next week


When: 6-8pm Thursday April 16, 2015
Where: Iovation
Who: Eric Hanson
What: Aquameta release!

Eric Hanson will give a tutorial for how to build applications with Aquameta, an open source web development platform built entirely in PostgreSQL. Aquameta is about to be launched as open source, so we’ll do a quick launch recap, and then dive into the tutorial.


Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Elevators open at 5:45 and building security closes access to the floor at 6:30.


See you there!


Shaun M. Thomas: PG Phriday: Functions and Addressing JSON Data


Fairly recently, a friend of mine presented a problem he wanted to solve with some JSON he had in a table. After he presented the end result he was trying to reach, I made the assumption that this would be pretty easy to do. But then I looked at the JSON Functions to try and find that quick fix. Though I read extensively and used rather liberal interpretations of the functions, there’s no way to directly manipulate JSON object contents with PostgreSQL.

Wait! Before you start yelling at me for being an idiot, I know what you’re going to say. I thought the same thing… at first. Go ahead and look, though. As of PostgreSQL 9.4, there is no built-in functionality to add or remove JSON elements without one or more intermediate transformation steps through PostgreSQL arrays or records. But that isn’t necessarily unexpected. Why?

Because PostgreSQL is a database. Its primary purpose is to store data and subsequently extract and view it. From this perspective, there’s no reason for PostgreSQL to have an entire library of JSON modification functions or operators. Regardless of this however, actions such as data merges and bulk updates still need to be possible. Yet all other fields allow a single update statement to append information, or otherwise perform a native calculation to replace the value in-line. There must be a way to do this with JSON too, without jumping through too many burning hoops.

Luckily there is, but it does require some preliminary work. Let’s start with a simple JSON document, as seen by PostgreSQL:

SELECT '{"Hairy": true, "Smelly": false}'::JSON;

               json
----------------------------------
 {"Hairy": true, "Smelly": false}

Ok. Now, how would I add an attribute named “Wobbly”? Well, I could pull the data into an external application, add it, and store the result. But suppose this was in a table of millions of records? That’s probably the least efficient way to modify them. This could be parallelized to a certain extent, but that requires a lot of scaffolding code and is way too much work for something so simple.

Instead, let’s create a function to do it for us. We’ve already established that PostgreSQL JSON manipulation is extremely limited, so what other option do we have? Here’s a python function:

CREATE or REPLACE FUNCTION json_update(data JSON, key TEXT, value JSON)
RETURNS JSON
AS $$

    from json import loads, dumps

    # Nothing to change if no key was supplied.
    if not key:
        return data

    # Parse the document, set (or replace) the key, and re-serialize.
    js = loads(data)
    js[key] = loads(value)
    return dumps(js)

$$ LANGUAGE plpythonu;

Now we could add the field with ease:

SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON);

                    json_update                    
---------------------------------------------------
 {"Hairy": true, "Smelly": false, "Wobbly": false}

And if we want to get really fancy, there’s always PLV8:

CREATE or REPLACE FUNCTION json_update(data JSON, key TEXT, value JSON)
RETURNS JSON
AS $$
    // PLV8 exposes the JSON arguments as native JavaScript objects.
    if (key)
        data[key] = value;
    return data;

$$ LANGUAGE plv8;

SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON);

                 json_update                  
----------------------------------------------
 {"Hairy":true,"Smelly":false,"Wobbly":false}

Though with PLV8, there are a couple of relatively minor caveats.

  1. PLV8 doesn’t work with JSONB yet, which is why all of these examples are in JSON instead.
  2. You might notice that it stripped all the extraneous whitespace, which may not be desirable.

Either way, both of these variants do something that PostgreSQL can’t do on its own. This is one of the reasons PostgreSQL is my favorite database; it’s so easy to extend and enhance.

Just as a thought experiment, which of these functional variants is faster? I didn’t use the IMMUTABLE or STRICT decorators, so it would be easy to run a loop of a few thousand iterations and see what the final run-time is. Here’s a modification of the test query:

EXPLAIN ANALYZE
SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON)
  FROM generate_series(1, 100000);

On my test VM, the python function took around four seconds, while the PLV8 version only needed a second and a half. Clearly PLV8’s native handling of its own datatype helps here, and python having to repeatedly import the json library hurts its own execution. By adding IMMUTABLE, both fly through all 100-thousand iterations in less than 200ms.
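
If you want to reproduce that last measurement, marking a function IMMUTABLE is just a matter of appending the keyword to the declaration; here is a sketch for the PLV8 variant:

CREATE or REPLACE FUNCTION json_update(data JSON, key TEXT, value JSON)
RETURNS JSON
AS $$
    if (key)
        data[key] = value;
    return data;

$$ LANGUAGE plv8 IMMUTABLE;

With constant arguments, an IMMUTABLE function can be folded to a single evaluation at plan time, which is what lets the loop finish so quickly.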

Don’t be afraid to stray from SQL when using PostgreSQL. In fact, this might be a good case for thinking about PostgreSQL in an entirely different light. I might start calling it PGDB from now on, simply to honor its roots and its primary functionality. SQL is no longer the Alpha and Omega when it comes to its capabilities these days. So I feel it’s only right to adapt along with it.

Here’s to the future of PGDB!

David E. Wheeler: PGXN Release Badges


Looks like it’s been close to two years since my last post on the PGXN blog. Apologies for that. I’ve thought for a while maybe I should organize an “extension of the week” series or something. Would there be interest in such a thing?

Meanwhile, I’m finally getting back to posting to report on a fun thing you can now do with your PGXN distributions. Thanks to the Version Badge service from the nice folks at Gemfury, you can badge your distributions! Badges look like this:

PGXN version

You’ve no doubt seen similar badges for Ruby, Perl, and Python modules. Now the fun comes to PGXN. Want in? Assuming you have a distribution named pgfoo, just put code like this into the README file:

[![PGXN version](https://badge.fury.io/pg/pgfoo.svg)](https://badge.fury.io/pg/pgfoo)

This is Markdown format; use the syntax appropriate to your preferred README format to get the badge to show up on GitHub and PGXN.

That’s it! The badge will show the current release version on PGXN, and the button will link through to PGXN.

Use Travis CI? You can badge your build status, too, as I’ve done for pgTAP, like this:

Build Status

[![Build Status](https://travis-ci.org/theory/pgtap.png)](https://travis-ci.org/theory/pgtap)

Coveralls provides badges, too. I’ve used them for Sqitch, though I’ve not yet taken the time to figure out how to do coverage testing with PostgreSQL extensions. If you have, you can badge your current coverage like so:

Coverage Status

[![Coverage Status](https://coveralls.io/repos/theory/sqitch/badge.svg)](https://coveralls.io/r/theory/sqitch)

So get badging, and show off your PGXN distributions on GitHub and elsewhere!

Umair Shahid: HOWTO create reports in Tableau with PostgreSQL database


For 2015, once again, Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms ranks Tableau pretty much at the top. How powerful would it be to combine Tableau with the world’s most advanced open source database, PostgreSQL, to create reports and analytics for businesses? This HOWTO walks you through the process of connecting Tableau with PostgreSQL and creating a simple report.

This HOWTO is based on the following configuration:

  • Windows 7
  • Tableau 9 Desktop
  • PostgreSQL 9.4.1

You can download the database used in this post from here: Database SQL.

Let’s create our first report!

Connecting to PostgreSQL

1) Launch Tableau. Click ‘More Servers …’ and select PostgreSQL from the menu as illustrated in the following screenshot:

 

2) Fill in the PostgreSQL connection properties and hit OK. If you don’t already have the required connection libraries installed, you will get an error as seen in the following screenshot.

 

3) The (rather helpful) error dialog provides a link to download the libraries; click on it (requires a working internet connection). This should take you to the drivers section on Tableau’s official website. Locate PostgreSQL and download the corresponding setup file. See the following screenshot for reference:

 

4) Run the downloaded file (may require administrator privileges). This will set up the ODBC drivers and all system configurations required for PostgreSQL connectivity. Once setup is complete, run Tableau Desktop again and connect to the PostgreSQL database you downloaded before the 1st step.

Creating a simple report

1) Once connected, you’ll find yourself on the Data Source tab. It lists the server, database and the tables as shown in the screen below.

 

2) You can drag and drop the tables you want to use. Alternatively, you can write custom SQL for the required dataset, as sketched below. For the purpose of this blog, I have dragged and dropped the ‘sales’ table. You can take a peek at a preview of the dataset in the result section of the window. Please note that you may have to click ‘Update’ in the result pane to preview the result.
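
If you do go the custom SQL route, a query along these lines would produce the same dataset (the column names here are assumptions based on the dimensions and measure used later in this post):

SELECT country, date, sale
  FROM sales;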

 

3) Click the ‘Sheet’ tab (sheet2 in our example) to open the report sheet. The pane on the left shows two sections: Dimensions and Measures. The former lists the available dimensions, whereas the latter lists all measurable values. The right pane of the window shows a pivot-table-like structure.

 

4) Now, let’s create a simple report that lists out sales and dates by country. To do this, simply drag the ‘Sale’ measure and drop it on the data area in the pivot table. Then drag ‘Date’ and ‘Country’ from the ‘Dimensions’ section and drop them in the ‘Rows’ area. And that’s it!

Refer to the following screenshot for reference.

 

Adding extract criteria

1) Next, let’s try to filter the results by country. Start by dragging ‘Country’ from ‘Dimensions’ and dropping it in the ‘Filters’ area. In the dialog box that opens up, under the ‘General’ tab, click ‘Select from list’. Next, click ‘All’ to select all the countries, and press OK.

 

2) Right-click ‘Country’ under ‘Dimensions’ and click Create -> Parameter.

 

3) The next dialog box specifies various properties for the parameter we are creating. Enter ‘Country Parameter‘ as the Name, String as the data type, and ‘List’ as ‘Allowable Values’. This last selection forces the user to select a country name from the provided list. Click OK to confirm the parameter properties.

 

4) ‘Country Parameter’ now appears under the ‘Parameters’ section in the left pane.

 

5) Now open the ‘Country’ filter properties and go to the ‘Conditions’ tab. Select ‘By Formula’ and key in ‘[Country]=[Country Parameter]’ as shown in the following screenshot. Click OK.

 

6) In order to present this option to the user, right-click ‘Country Parameter’ under the ‘Parameters’ sub tab and select ‘Show Parameter Control’.

 

Running the report

Click ‘Presentation Mode’ or press F7 to preview. Select a country from the ‘Country Parameter’ list to show its results.

 

Your first report with an extract criterion is complete!

Take a look at these freely available training videos from Tableau to learn more about what you can do using Tableau.


Josh Berkus: South Bay PUG: Vitesse and Heikki at Adobe


The South Bay PUG is having a meetup on April 28th. Speakers will include CK Tan of PostgreSQL enhancement company Vitesse Data, and Heikki Linnakangas, PostgreSQL Committer. We do not expect to have live video for this meetup, sorry!

RSVP on the Meetup Page.

Rajeev Rastogi: Indian PGDay, Bangalore (11th April 2015)

I recently got back from the Indian PGDay 2015 conference. It was interesting and motivating, with a lot of knowledge sharing, both as an attendee and as a speaker.

I spoke about the various kinds of "Optimizer Hint" provided by many database engines, and also a new idea for a "Hint" that could be provided to the optimizer. Some of the speakers shared their work with PostgreSQL as users.
It was also interesting to learn that many companies are evaluating a migration, or are in the process of migrating, from other databases to PostgreSQL. This is really encouraging for all PostgreSQL experts.



Some of the details from the presentation are below (for the complete presentation, please visit Full Optimizer Hint).

Statistics Hint:

A Statistics Hint is used to provide statistics related to a query that the optimizer can use to produce an even better plan than it otherwise would.
Most databases store statistics for a particular column or relation, but do not store statistics for combinations (joins) of columns or relations. Instead, they simply multiply the statistics of the individual columns/relations to estimate the combined selectivity, which is not always correct.

Example:
Let's say there is a query such as:
SELECT * FROM EMPLOYEE WHERE GRADE > 5 AND SALARY > 10000;

If we calculate independent statistics for GRADE and SALARY,
suppose sel(GRADE) = .2 and sel(SALARY) = .2;

then sel(GRADE and SALARY) =
sel(GRADE) * sel(SALARY) = .04.
 
In practice, these two predicates are often highly correlated, i.e. if the first column qualifies, the second column will also qualify. In that case sel(GRADE and SALARY) should be .2, not .04, so the current optimizer will be incorrect here and may produce a bad plan.
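
To see the multiplication at work, here is a small, entirely hypothetical experiment; the table and data are made up purely for illustration:

CREATE TABLE employee (grade int, salary int);

-- salary moves in lockstep with grade, so the two predicates below are fully correlated
INSERT INTO employee
SELECT g, g * 2000
  FROM generate_series(1, 10) AS g, generate_series(1, 1000);

ANALYZE employee;

-- The planner estimates roughly sel(grade) * sel(salary) * total rows (about 25% here),
-- while the query actually returns 50% of the table, because the columns are dependent.
EXPLAIN SELECT * FROM employee WHERE grade > 5 AND salary > 10000;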

Data Hint:

This kind of hint provides information about the relationships/dependencies among relations or columns in order to influence the plan, instead of directly hinting at a desired plan or a selectivity value. The optimizer can use the dependency information to derive the actual selectivity.

Example:
Let's say there are queries such as:
SELECT * FROM TBL  WHERE ID1 = 5 AND ID2=NULL;
SELECT * FROM TBL  WHERE ID1 = 5 AND ID2!=NULL;

Now, if we specify the dependency as
“If TBL.ID1 = 5 then TBL.ID2 is NULL”,
then the optimizer will always consider this dependency pattern, and combined statistics for these two columns can be chosen accordingly.

Note: This feature is not yet available in PG.

Conclusion:
Unlike other databases, we can provide actual statistics information to the optimizer so that it comes up with the most optimal plan, instead of directly telling the planner to choose one specific plan.

Michael Paquier: Postgres 9.5 feature highlight: log_autovacuum_min_duration at relation level


log_autovacuum_min_duration is a system-wide parameter controlling the threshold above which autovacuum activity is logged in the system logs. Anyone who has looked at a system where a given set of tables is bloated has surely been annoyed by the fact that even a high value of log_autovacuum_min_duration offers no guarantee of reducing log spam from not-much-bloated tables: any table whose autovacuum run takes longer than the threshold still gets logged (and it is after working on such a system that the author of this feature wrote a patch for it). Postgres 9.5 comes with a new feature allowing this logging threshold to be controlled at relation level, introduced by this commit:

commit: 4ff695b17d32a9c330952192dbc789d31a5e2f5e
author: Alvaro Herrera <alvherre@alvh.no-ip.org>
date: Fri, 3 Apr 2015 11:55:50 -0300
Add log_min_autovacuum_duration per-table option

This is useful to control autovacuum log volume, for situations where
monitoring only a set of tables is necessary.

Author: Michael Paquier
Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa,
Taiki Kondo, Huong Dangminh), Fujii Masao.

This parameter can be set via CREATE TABLE or ALTER TABLE, the default value being the one defined by the equivalent parameter at server level, for example like this:

=# CREATE TABLE vac_table (a int) WITH (log_autovacuum_min_duration = 100);
CREATE TABLE
=# ALTER TABLE vac_table SET (log_autovacuum_min_duration = 200);
ALTER TABLE

Note that this parameter does not accept a unit suffix, and its value is always interpreted as milliseconds. So after the CREATE TABLE above, the autovacuum activity of relation vac_table is logged if a run takes more than 100ms, and more than 200ms after the ALTER TABLE.

Thinking more widely, there are basically two cases where this parameter is useful, an inclusive one and an exclusive one:

  • when the system-wide log_autovacuum_min_duration is -1, meaning that autovacuum activity is not logged for any relation, set this parameter to some value for a set of tables, and the autovacuum activity of that set of tables will be logged. This is useful to monitor how autovacuum is working on an inclusive set of tables, be it a single entry or more (see the sketch after this list).
  • when you want to exclude the autovacuum runs of a set of tables while log_autovacuum_min_duration is positive, simply set the value for each relation in that set to something very high, a duration a single autovacuum run is sure never to reach, and the autovacuum activity of those tables will disappear from the system logs.
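
For the inclusive case, a minimal sketch could look like the following; it reuses the vac_table example from above, and ALTER SYSTEM only needs a configuration reload to take effect:

=# ALTER SYSTEM SET log_autovacuum_min_duration = -1;
ALTER SYSTEM
=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
=# ALTER TABLE vac_table SET (log_autovacuum_min_duration = 0);
ALTER TABLE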

In short, this parameter is going to make life easier for anyone debugging an application that bloats tables, and just that is cool.

Andrew Dunstan: Hot Fix for buildfarm client, currently broken by pg_upgrade move

Yesterday the pg_upgrade program was moved from contrib to bin in the source tree. Unfortunately this broke most of those buildfarm members which check pg_upgrade. There is a hot fix for the TestUpgrade buildfarm module that can be downloaded from github. I will work on cutting a new buildfarm release in the next few days, but this file can just be dropped in place on existing installations.

Andreas Scherbaum: German-speaking PostgreSQL Conference 2015

Author: Andreas 'ads' Scherbaum

PGConf.de 2015 is the sequel to the highly successful German-speaking PostgreSQL Conferences of 2011 and 2013. Due to space limitations at the old location, we are moving to Hamburg. The conference takes place on Friday, November 27th. We are also adding a day of trainings on the 26th.

http://2015.pgconf.de/

Registration for the conference will open well in advance. Tickets must be purchased online. For sponsors, we have put together a package that includes, among other things, a number of discounted tickets. More on that in the Call for Sponsors in a separate announcement.

Josh Berkus: Expressions VS advanced aggregates

So ... you're using some of 9.4's new advanced aggregates, including FILTER and WITHIN GROUP.  You want to take some statistical samples of your data, including median, mode, and a count of validated rows.  However, your incoming data is floats and you want to store the samples as INTs, since the source data is actually whole numbers.  Also, COUNT(*) returns BIGINT by default, and you want to round it to INT as well.  So you do this:

    SELECT
        device_id,
        count(*)::INT as present,
        count(*)::INT FILTER (WHERE valid) as valid_count,
        mode()::INT WITHIN GROUP (order by val) as mode,
        percentile_disc(0.5)::INT WITHIN GROUP (order by val)
          as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


And you get this unhelpful error message:

    ERROR:  syntax error at or near "FILTER"
    LINE 4:         count(*)::INT FILTER (WHERE valid)
            as valid_count,


And your first thought is that you're not using 9.4, or you got the filter clause wrong.  But that's not the problem.  The problem is that "aggregate() FILTER (where clause)" is a syntactical unit, and cannot be broken up by other expressions.  Hence the syntax error.  The correct expression is this one, with parens around the whole expression and then a cast to INT:

    SELECT
        device_id,
        count(*)::INT as present,
        (count(*) FILTER (WHERE valid))::INT as valid_count,
        (mode() WITHIN GROUP (order by val))::INT as mode,
        (percentile_disc(0.5) WITHIN GROUP (order by val))::INT
           as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


If you don't understand this, and you use calculated expressions, you can get a worse result: one which does not produce an error but is nevertheless wrong.  For example, imagine that we were, for some dumb reason, calculating our own average over validated rows.  We might do this:

    SELECT
        device_id,
        sum(val)/count(*) FILTER (WHERE valid) as avg
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


... which would execute successfully, but would give us the total of all rows divided by the count of validated rows. That's because the FILTER clause applies only to the COUNT, and not to the SUM.  If we actually wanted to calculate our own average, we'd have to do this:

    SELECT
        device_id,
        sum(val) FILTER (WHERE valid)
            / count(*) FILTER (WHERE valid) as avg
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


Hopefully that helps everyone who is using the new aggregates to use them correctly and not get mysterious errors.  In the meantime, we can see about making the error messages more helpful.

Peter Eisentraut: Storing URIs in PostgreSQL


About two months ago, this happened:

And a few hours later:

It took a few more hours and days after this to refine some details, but I have now tagged the first release of this extension. Give it a try and let me know what you think. Bug reports and feature requests are welcome.

(I chose to name the data type uri instead of url, as originally suggested, because that is more correct and matches what the parsing library calls it. One could create a domain if one prefers the other name or if one wants to restrict the values to certain kinds of URIs or URLs.)
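
As a quick illustration of that last point, the following is roughly what usage could look like, assuming the extension installs a data type named uri as described above (the domain and table names are just examples):

CREATE EXTENSION uri;

CREATE DOMAIN url AS uri;

CREATE TABLE bookmarks (
    id   serial PRIMARY KEY,
    link uri NOT NULL
);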

(If you are interested in storing email addresses, here is an idea.)

Andrew Dunstan: New PostgreSQL Buildfarm Client Release

I have just released version 4.15 of the PostgreSQL Buildfarm Client. Here's what's changed:
  • support the new location for pg_upgrade
  • support running tests of client programs
  • support building, installing and running testmodules
  • use a default ccache directory
  • improve logging when running pg_upgrade tests
  • handle odd location of Python3 regression tests
  • add timestamp to default log_line_prefix
  • make qw() errors in the config file fatal (helps detect errors)
  • minor bug fixes for web script settings.
  • allow for using linked git directories in non-master branches
The last item might need a little explanation.  Essentially this can reduce quite dramatically the amount of space required if you are building on more than one branch. Instead of keeping, say, 6 checked out repos for the current six tested branches, we keep one and link all the others to it. This works almost exactly the way git-new-workdir does (I stole the logic from there). This doesn't work in a couple of situations: if you are using Windows or if you are using git-reference. In these cases the new setting is simply ignored.

To enable this new setting in an existing installation, do the following after installing the new release:
  • in your config file, add this setting:
    git_use_workdirs => 1,
  • remove the pgsql directory in each branch directory other than HEAD
Another good thing to do in existing installations would be to add "%m" to the beginning of the log_line_prefix setting in the extra_config stanza.

Enjoy!

Shaun M. Thomas: PG Phriday: Anonymous Blocks and Object Manipulation


PGDB has had anonymous blocks since the release of 9.0 in late 2010. But it must either be one of those features that got lost in the shuffle, or is otherwise considered too advanced, because I rarely see it used in the wild. If that’s the case, it’s a great shame considering the raw power it conveys. Without committing to a function, we can essentially execute any code in the database, with or without SQL input.

Why is that good? One potential element of overhead when communicating with a database is network transfer. If processing millions of rows, forcing PGDB to allocate and push those results over the network will be much slower than manipulating them locally within the database itself. However, the looming specter of ad-hoc scripts is always a threat as well.

It was the latter scenario that prompted this particular discussion. A few weeks ago, I addressed date-based constraints and how they’re easy to get wrong. Knowing this, there’s a good chance we have objects in our database that need revision in order to operate properly. In one particular instance, I needed to correct over 800 existing check constraints that an automated system had built over the last year.

I hope you can imagine that’s not something I would want to do by hand. So it was a great opportunity to invoke an anonymous block, because there’s very little chance I’d need to do this regularly enough to justify a fully-fledged function. In the end, I came up with something like this:

DO $$
DECLARE
  chk TEXT;
  col TEXT;
  edate DATE;
  sdate DATE;
  tab TEXT;
  ym TEXT;
BEGIN
  FOR tab, chk, col IN 
      SELECT i.inhrelid::REGCLASS::TEXT AS tab,
             co.conname AS cname,
             substring(co.consrc FROM '\w+') AS col
        FROM pg_inherits i
        JOIN pg_constraint co ON (co.conrelid = i.inhrelid)
       WHERE co.contype = 'c'
  LOOP
    ym := substring(tab FROM '......$');
    sdate := to_date(ym, 'YYYYMM01');
    edate := sdate + INTERVAL '1 mon';

    EXECUTE 'ALTER TABLE ' || tab || ' DROP CONSTRAINT ' ||
        quote_ident(chk);

    EXECUTE 'ALTER TABLE ' || tab || ' ADD CONSTRAINT ' ||
        quote_ident(chk) || ' CHECK (' ||
        quote_ident(col) || ' >= ' || quote_literal(sdate) ||
          ' AND ' ||
        quote_ident(col) || ' < ' || quote_literal(edate) || ')';
  END LOOP;
END;
$$ LANGUAGE plpgsql;

I didn’t just use a bunch of unnecessary variables for giggles. The original version of this block used a single RECORD and a subquery to collect all of the necessary substitutions in their calculated forms. However, I felt this discussion needed a simpler step-by-step logic. Now let’s discuss this rather large block of SQL, because it is a very interesting lesson in several different aspects of the PL/pgSQL procedural language.

If you didn’t already know, you can loop through SQL results in a FOR loop, and pull SELECT results into variables while doing so. This is fairly common knowledge, so I won’t dwell on it. We should however, examine the query itself:

SELECT i.inhrelid::REGCLASS::TEXT AS tab,
       co.conname AS cname,
       substring(co.consrc FROM '\w+') AS col
  FROM pg_inherits i
  JOIN pg_constraint co ON (co.conrelid = i.inhrelid)
 WHERE co.contype = 'c'

Here, we’re making use of system catalog tables that help PGDB manage table metadata. First comes pg_inherits for information on child partitions, since they’re extremely likely to inherit from some base table as suggested by the partition documentation. Next, we incorporate information from pg_constraint so we know the name of each check constraint (contype of ‘c’) to modify.

Regarding the SELECT block itself, there is admittedly some magic going on here. The REGCLASS type serves a dual purpose in PGDB. For one, it is compatible with the OID object identifier type used extensively in the catalog tables. And second, when cast to TEXT, it outputs the schema and object name it represents, based on the current namespace. This means that, no matter where we are in the database, we will get a full substitution of the object—wherever it lives.

In that same block, we also abuse the consrc field to obtain the name of the column used in the constraint. There’s probably a more official way to get this, but as it turns out, the \w wildcard will match any word character. By globbing with +, we essentially grab the first series of word characters in the check. It might not work with other check constraints, but date-based partition rules generally only have an upper and lower bound. For these, the first match gives us the column name, and we don’t care about the rest.

Within the loop itself, things are a bit more straight-forward. After a bit of variable juggling, we start by dropping the old check. It was malformed, so good riddance. Then we build the new constraint based on our desired start and end dates. Note the use of quote_literal here. By using this function, the date variables are converted to text and quoted as static values. The end result is a query like this:

ALTER TABLE some_child_201504
  ADD CONSTRAINT ck_insert_date_201504
CHECK (insert_date >= '2015-04-01' AND
       insert_date < '2015-05-01')

Because these static text values do not match the column type, PGDB will automatically cast them in the physical constraint it actually creates. This prevents the check type mismatches we wrestled with in the last article.

So ends this example of fixing some broken DDL with an ad-hoc anonymous block. In the past, it was fairly common for DBAs to write queries using concatenation to write the DDL commands in the SELECT section of the query. Then we would direct that output to a script, and execute it separately. In this particular case, we would need two scripts: one to drop the constraints, and another to re-create them. That approach is certainly an option for those still uncomfortable working with anonymous blocks or EXECUTE statements.

In the end, I always encourage exploring capabilities to their full extent. Dig into Server Programming documentation if you really want to learn more.


Amit Kapila: Write Scalability in PostgreSQL


I have run some benchmark tests to see the write performance/scalability in PostgreSQL 9.5 and thought it would be good to share the results with others, hence this blog post.

I ran pgbench tests (TPC-B (sort of) load) to compare the performance difference between different modes and scale factors on HEAD (e5f455f5), on an IBM POWER-8 machine with 24 cores, 192 hardware threads and 492GB of RAM. The performance results are discussed below.

Some of the default settings used in all the tests are:
min_wal_size = 15GB
max_wal_size = 20GB
checkpoint_timeout = 35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
autovacuum = off

I kept autovacuum off to reduce fluctuations caused by it, and I drop and re-create the database after each run.  I kept high values of min_wal_size and max_wal_size to reduce the effect of checkpoints; probably somewhat lower values would have served the purpose for this workload, but I haven't tried that.
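
For reference, one way to apply these settings to a test cluster is ALTER SYSTEM followed by a configuration reload; this is just a sketch, and editing postgresql.conf directly works equally well:

ALTER SYSTEM SET min_wal_size = '15GB';
ALTER SYSTEM SET max_wal_size = '20GB';
ALTER SYSTEM SET checkpoint_timeout = '35min';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();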

The data was collected for two modes (synchronous_commit = on | off) and at two different scale factors, to cover the case when all the data fits in shared buffers (scale_factor = 300) and the case when the data doesn't fit in shared buffers but does fit in RAM (scale_factor = 3000).

First, let's talk about the synchronous_commit = off case. When all the data fits in shared_buffers (scale_factor = 300), we see scalability up to 64 clients, with TPS approximately 75 percent higher at 64 clients than at 8 clients, which doesn't look bad. When the data doesn't fit in shared buffers but fits in RAM (scale_factor = 3000), we see scalability up to 32 clients, with TPS 64 percent higher than at 8 clients, and it falls off from there.

One major difference for writes when the data doesn't fit in shared_buffers is that backends performing transactions need to write out dirty buffers themselves when they can't find a clean buffer to read a page into, and this can hamper TPS.

Now let's talk about the synchronous_commit = on case. When all the data fits in shared_buffers (scale_factor = 300), we see scalability up to 64 clients, with TPS approximately 189 percent higher at 64 clients than at 8 clients, which sounds good. When the data doesn't fit in shared buffers but fits in RAM (scale_factor = 3000), we see a pretty flat graph with some scalability up to 16 clients, TPS being approximately 22 percent higher than at 8 clients, after which it stays where it is.

One point to note here is that when the data fits in shared_buffers (scale_factor = 300), TPS at the higher client count (64) in synchronous_commit = on mode becomes equivalent to TPS in synchronous_commit = off mode, which suggests that there is no major contention due to WAL writing in such loads.

In the synchronous_commit = on case, when the data doesn't fit in shared_buffers (scale_factor = 3000), TPS is quite low, and one reason is that backends might be performing writes themselves. But I am not sure the performance is so low due to that reason alone, as I tried different values of the bgwriter-related parameters (bgwriter_delay, bgwriter_lru_maxpages, bgwriter_lru_multiplier) and there is not much difference.

As per my knowledge, the locks that can lead to contention for this workload
are:
a. ProcArrayLock (used for taking snapshots and at transaction commit)
b. WALWriteLock (used for performing WAL writes)
c. CLOGControlLock (used to read and write transaction status)
d. WALInsertLocks (used for writing data to the WAL buffer)

I think among these, ProcArrayLock and WALWriteLock are the candidates most
likely to be the source of the contention, but I haven't done any deep analysis
to confirm that.

Now, the bottleneck could be due to multiple locks, as was the case for read operations (which I explained in my previous Read Scalability blog post), or it could be due to one of these locks alone. I think all this needs further analysis and work. That's all I want to say for now.

Baji Shaik: Woohoo !! Packt Publishing has published a book on troubleshooting PostgreSQL database.

(Baji is trying to impress 'X')
==========
Baji: Packt Publishing has published a book on troubleshooting PostgreSQL database.
 _X_: Uh, so what(!?). It published 4 other PostgreSQL books this year !
Baji: yeah, I know !
 _X_: then why do you care about thisssss.
Baji: I should care about it as I was part of technical reviewing team.. :(
 _X_: Oh really !, thats fantastic.. Congratulations !
==========

Note: Finally, Baji impressed _X_ :-)

Ok, in reality, I am glad to announce that "My first book as a Technical Reviewer has been published by Packt Publishing" ;-)

https://www.packtpub.com/big-data-and-business-intelligence/troubleshooting-postgresql
http://my.safaribooksonline.com/book/databases/postgresql/9781783555314/troubleshooting-postgresql/pr02_html

The author of this book is Hans-Jürgen Schönig, who has a couple of other PostgreSQL books as well.

This book provides a series of valuable troubleshooting solutions for database administrators responsible for maintaining a PostgreSQL database. It is aimed at PostgreSQL administrators who have developed an application with PostgreSQL and need solutions to common administration problems they encounter when managing a database instance. So give it a try ;-)

I would like to thank my loving parents for everything they did for me. Personal time always belongs to family, and I did this in my personal time.

I want to thank Packt Publishing for giving me this opportunity, and thanks to Sanchita Mandal and Paushali Desai for choosing me and working with me on this project.

Last but not least, I would like to thank Dinesh Kumar, who taught me PostgreSQL and inspired me to do this. :)

Baji Shaik: Aha, you can count the rows for \copy command.

We all know that the \copy command does not return anything when you load data. The idea here is to capture how many records got loaded into the table through the \copy command.
Here's a shell script that should work:
echo number of rows in input: $(wc -l data.in)
( echo "\copy test from stdin delimiter '|';" ; cat data.in ) | psql -v ON_ERROR_STOP=1
echo psql exit code $?

If the exit code printed is 0, everything went well, and the value printed by the first echo indicates how many rows were inserted. If the printed exit code is non-zero, no rows were inserted, of course. If the exit code printed is 3, then the data being copied had some error.

From the docs: if the exit code printed is 1 or 2, then something went wrong in psql (like it ran out of memory) or the server connection was broken, respectively. The following facts play a role in the above script:

.) COPY (and hence \copy) expects the input records to be terminated by a newline, so counting the number of newlines in the input is a reliable way of counting the records inserted.
.) psql will exit with code 3 iff there is an error in the script and ON_ERROR_STOP is set.
Note: This does not seem to apply to the `psql -c "sql command"` construct.

# Example clean input

$ psql -c "create table test(a text,b int);"
CREATE TABLE
$ cat data.in
column1|2
column1|2
column1|2
column1|2
column1|2
column1|2

$ echo number of rows in input: $(wc -l data.in); ( echo "\copy test from stdin delimiter '|';" ; cat data.in  ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $?
number of rows in input: 6 data.in
psql exit code 0

# Example malformed input
$ cat data.in
column1|2
column1|2
column1|2c
column1|2
column1|2
column1|2

$ echo number of rows in input: $(wc -l data.in); ( echo "\copy test from stdin delimiter '|';" ; cat data.in  ) | psql -v ON_ERROR_STOP=1 ; echo psql exit code $?
number of rows in input: 6 data.in
ERROR:  invalid input syntax for integer: "2c"
CONTEXT:  COPY test, line 3, column b: "2c"
psql exit code 3
 
I hope this helps someone.

Andrew Dunstan: Buildfarm bug fix for MSVC builds

Unfortunately there was a small bug in yesterday's buildfarm client release. The bug only affects MSVC builds, which would fail silently on the HEAD (master) branch.

There is a bug fix release available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_15_1.tgz or you can just pick up the fixed version of run_build.pl (the only thing changed) at https://raw.githubusercontent.com/PGBuildFarm/client-code/b80efc68c35ef8a1ced37b57b3d19a98b8ae5dd2/run_build.pl

Sorry for the inconvenience.

Ronan Dunklau: Import foreign schema support in Multicorn


Some of you may have noticed that support for the IMPORT FOREIGN SCHEMA statement landed in the PostgreSQL source tree last July. This new command allows users to automatically map foreign tables to local ones.

Use-Case

Previously, if you wanted to use the postgres_fdw Foreign Data Wrapper to access data stored in a remote database you had to:

  • Create the extension
  • Create a server
  • Create a user mapping
  • For each remote table:
    • Create a FOREIGN TABLE whose structure matches the remote one

This last step is tedious and error-prone: you have to match the column names, in the right order, with the right types.

The IMPORT FOREIGN SCHEMA statement allows you to automatically create a foreign table object for each object available remotely.
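
With postgres_fdw, for instance, the whole mapping step boils down to a single statement along these lines (the server and schema names here are placeholders):

IMPORT FOREIGN SCHEMA public
    FROM SERVER remote_server
    INTO local_schema;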

Multicorn implementation

The API has been implemented in Multicorn for a few months, lingering in its own branch.

I just merged it back into the master branch, and this feature will land in an upcoming 1.2.0 release. In the meantime, test it!

The API

It's simple, as always with Multicorn. An FDW just has to override the import_schema method:

@classmethod
def import_schema(self, schema, srv_options, options, restriction_type, restricts)

This method just has to build a list of TableDefinition objects:

return [TableDefinition(table_name,
                        schema=None,
                        columns=[ColumnDefinition(name=column_name,
                                                  type_name='integer')])]

And that's it!

As of now, the only FDW shipped with Multicorn that implements this API is the sqlalchemyfdw.

SQLAlchemyFDW test run

So, with this API in mind, I conducted a small test: trying to import an Oracle schema as well as an MS SQL Server schema:

CREATE EXTENSION multicorn;

CREATE SERVER mssql_server FOREIGN DATA WRAPPER multicorn OPTIONS (
    wrapper 'multicorn.sqlalchemyfdw.SqlAlchemyFdw',
    drivername 'mssql+pymssql',
    host 'myhost',
    port '1433',
    database 'testmulticorn'
);

CREATE USER MAPPING FOR ronan SERVER mssql_server OPTIONS (
    username 'user',
    password 'password'
);

CREATE SCHEMA mssql;

IMPORT FOREIGN SCHEMA dbo FROM SERVER mssql_server INTO mssql;

CREATE SERVER sqlite_server FOREIGN DATA WRAPPER multicorn OPTIONS (
    wrapper 'multicorn.sqlalchemyfdw.SqlAlchemyFdw',
    drivername 'sqlite',
    database '/home/ronan/mydb.sqlite3'
);

CREATE SCHEMA sqlite;

IMPORT FOREIGN SCHEMA main FROM SERVER sqlite_server INTO sqlite;

DELETE FROM mssql.t1;
DELETE FROM sqlite.t1;

INSERT INTO sqlite.t1 (id, label) VALUES (1, DEFAULT);
SELECT * FROM sqlite.t1;

CREATE SERVER oracle_server FOREIGN DATA WRAPPER multicorn OPTIONS (
    wrapper 'multicorn.sqlalchemyfdw.SqlAlchemyFdw',
    drivername 'oracle',
    host 'another_host',
    database 'testmulticorn'
);

CREATE USER MAPPING FOR ronan SERVER oracle_server OPTIONS (
    username 'user',
    password 'password'
);

CREATE SCHEMA oracle;

IMPORT FOREIGN SCHEMA ronan FROM SERVER oracle_server INTO oracle;

And that's it! It's sufficient to query tables from SQLite, Oracle and MS SQL Server from a single connection.

Once again, feel free to test it and to report any bugs you may find along the way!
