Channel: Planet PostgreSQL

Avinash Kumar: PostgreSQL Upgrade Using pg_dumpall


There are several approaches to choose from when you need to upgrade PostgreSQL. In this blog post, we look at upgrading a Postgres database using pg_dumpall. As this tool can also be used to back up PostgreSQL clusters, it is a valid option for upgrading a cluster too. We consider the advantages and disadvantages of this approach, and show you the steps needed to achieve the upgrade.

This is the first of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series, where we’ll be exploring different paths to accomplish a Postgres upgrade or migration. The series will culminate with a practical webinar to be aired April 17th (you can register here).

We begin this journey by providing you with the most straightforward way to carry out a PostgreSQL upgrade or migration: rebuilding the entire database from a logical backup.

Defining the scope

Let’s define what we mean by upgrading or migrating PostgreSQL using pg_dumpall.

If you need to perform a PostgreSQL upgrade within the same database server, we’d call that an in-place upgrade or just an upgrade. A procedure that involves migrating your PostgreSQL cluster from one server to another, combined with an upgrade from an older version (let’s say 9.3) to a newer version (say PG 11.2), can be considered a migration.

There are two ways to achieve this requirement using logical backups:

  1. Using pg_dumpall
  2. Using pg_dumpall + pg_dump + pg_restore

We’ll be discussing the first option (pg_dumpall) here, and will leave the discussion of the second option for our next post.

pg_dumpall

pg_dumpall can be used to obtain a text-format dump of the whole database cluster, including all databases in the cluster. It is also the only method that can be used to back up globals such as users and roles in PostgreSQL.

There are, of course, advantages and disadvantages in employing this approach to upgrading PostgreSQL by rebuilding the database cluster using pg_dumpall.

Advantages of using pg_dumpall for upgrading a PostgreSQL server:

  1. Works well for a tiny database cluster.
  2. Upgrade can be completed using just a few commands.
  3. Removes bloat from all the tables and shrinks the tables to their absolute sizes.

Disadvantages of using pg_dumpall for upgrading a PostgreSQL server:

  1. Not the best option for databases that are huge in size (several GBs or TBs), as it might involve more downtime.
  2. Cannot use parallel mode. Backup/restore can use just one process.
  3. Requires double the space on disk as it involves temporarily creating a copy of the database cluster for an in-place upgrade.

Let’s look at the steps involved in performing an upgrade using pg_dumpall:

  1. Install new PostgreSQL binaries in the target server (which could be the same one as the source database server if it is an in-place upgrade).

    -- For a RedHat family OS
    # yum install postgresql11*
    Or
    -- In an Ubuntu/Debian OS
    # apt install postgresql11
  2. Shut down all writes to the database server to avoid data loss or mismatch between the old and new version after the upgrade.
  3. If you are doing an upgrade within the same server, create a cluster using the new binaries on a new data directory and start it using a port other than the source. For example, if the older version PostgreSQL is running on port 5432, start the new cluster on port 5433. If you are upgrading and migrating the database to a different server, create a new cluster using new binaries on the target server – the cluster may not need to run on a different port other than the default, unless that’s your preference.

    $ /usr/pgsql-11/bin/initdb -D new_data_directory
    $ cd new_data_directory
    $ echo "port = 5433" >> postgresql.auto.conf
    $ /usr/pgsql-11/bin/pg_ctl -D new_data_directory start
  4. You might have a few extensions installed in the old version PostgreSQL cluster. Get the list of all the extensions created in the source database server and install them in the new version. You can exclude those you get with the contrib module by default. To see the list of extensions created and installed in your database server, you can run the following command.

    $ psql -d dbname -c "\dx"

    Please make sure to check all the databases in the cluster as the extensions you see in one database may not match the list of those created in another database.
  5. Prepare a postgresql.conf file for the new cluster. Carefully prepare this by looking at the existing configuration file of the older version postgres server.
  6. Use pg_dumpall to take a cluster backup and restore it to the new cluster.

    -- Command to dump the whole cluster to a file.
    $ /usr/pgsql-11/bin/pg_dumpall > /tmp/dumpall.sql
    -- Command to restore the dump file to the new cluster (assuming it is running on port 5433 of the same server).
    $ /usr/pgsql-11/bin/psql -p 5433 -f /tmp/dumpall.sql

    Note that I have used the pg_dumpall from the new binaries to take the backup.
    Another, easier way is to use a pipe, which avoids the time spent creating a dump file. Just add a hostname if you are performing an upgrade and migration.

    $ pg_dumpall -p 5432 | psql -p 5433
    Or
    $ pg_dumpall -p 5432 -h source_server | psql -p 5433 -h target_server
  7. Run ANALYZE to update the statistics of each database on the new server (see the sketch after this list).
  8. Restart the database server using the same port as the source.
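
For steps 7 and 8, a minimal sketch could look like the following, assuming an in-place upgrade where the new cluster is still listening on port 5433 (vacuumdb ships with PostgreSQL and can analyze every database in one go):

-- Update the optimizer statistics of every database in the new cluster
$ /usr/pgsql-11/bin/vacuumdb -p 5433 --all --analyze-only
-- Stop the new cluster, switch it back to the original port and start it again
$ /usr/pgsql-11/bin/pg_ctl -D new_data_directory stop
$ sed -i 's/port = 5433/port = 5432/' new_data_directory/postgresql.auto.conf
$ /usr/pgsql-11/bin/pg_ctl -D new_data_directory start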

Our next post in this series provides a similar way of upgrading your PostgreSQL server, while at the same time offering some flexibility to carry out changes like the ones described above. Stay tuned!


Image based on photo by Sergio Ortega on Unsplash


Tim Colles: PostgreSQL Roles are Cluster-Wide


A role in PostgreSQL is common to all databases in the cluster. This seems to be the result of a design decision made when the former user and group handling was unified under roles.

Roles, or rather those roles that are not just representing a specific user, ought instead to be an intrinsic part of the database model. Roles are defined by the kind of access they provide (read, write, etc.) and by what relations (table, view, function, etc.) they provide that access to. Access control is ideally managed within a database using roles rather than separately within each individual application that uses that database. So it makes sense that the access control rules (the roles and their associated permissions) would be defined alongside the definitions of the relations for which they control access; any changes are then self-contained. The access control model should be represented as part and parcel of the rest of the database model. Individual users (which are also represented as roles in PostgreSQL) are assigned one or more of the roles defined within each particular database model (based on the local enterprise definition of what needs they have of any particular database).

There is no sense in representing this kind of role at the cluster level, as the definition of the role is associated specifically with the database where it actually controls access. In PostgreSQL, encapsulating the full functionality of a database requires using not only the system catalog tables specific to that database but also the roles relevant to that database held in the cluster-wide system catalog tables. With the exception of some special cases, like roles for cluster-wide database management, this is an artificial split. Roles that are managed outside of the database model, across all databases in the cluster, make some sense either when there is only one database in the cluster, or when all the databases in the cluster are not independent and act together as part of one managed system. Roles (again, those that are not just being used to represent a specific user) should otherwise be defined and managed at the level of the individual database, or arguably even at the level of each schema within a database.

Below are some ideas for enhancing PostgreSQL to support database specific roles. These ideas build up from limited and simplistic solutions to more flexible and complex ones.

Approach One

This is what we currently do. We prefix every role name with the database name. So roles called customer and salesperson used in a database called orders would actually be called orders_customer and orders_salesperson. This approach relies on all owners of databases in the cluster playing ball and we need some custom handling around role DDL statements when changes are made to the database model.
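
As a concrete sketch of the convention (role, table and user names here are only illustrative):

-- connected to the "orders" database: roles are namespaced by hand
CREATE ROLE orders_customer;
CREATE ROLE orders_salesperson;

-- grants then reference the prefixed names
GRANT SELECT ON customer_orders TO orders_customer;
GRANT SELECT, INSERT, UPDATE ON customer_orders TO orders_salesperson;

-- individual login roles are made members of the database-prefixed roles
GRANT orders_salesperson TO alice;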

Approach Two

This just puts some syntactic sugar around the first approach.

Syntax and behaviour of the proposed changes:

CREATE [DBONLY] ROLE abc
    Creates the role abc after first adding a prefix which is the name of the current database, returning an error if the name already exists. Without the DBONLY option it behaves as now. Most other options would be incompatible with the DBONLY option, for example LOGIN, CREATEDB, etc.

DROP [DBONLY] ROLE abc
    Drops the role abc with a prefix matching the name of the current database, returning an error if the name is not found (except when IF EXISTS is used). Without the DBONLY option it behaves as now.

GRANT … ON … TO [DBONLY] abc
    Grants the privileges to the role abc with a prefix matching the name of the current database, returning an error if the name is not found. Without the DBONLY option it behaves as now.

REVOKE … ON … FROM [DBONLY] abc
    Removes the privileges from the role abc with a prefix matching the name of the current database, returning an error if the name is not found. Without the DBONLY option it behaves as now.

DROPDB
    Drops a database and all the roles in the cluster that have a prefix matching the given database name.
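
A hypothetical usage sketch of the proposed syntax, connected to the orders database from the first approach (none of this exists in PostgreSQL today):

-- would be stored internally as the role orders_customer
CREATE DBONLY ROLE customer;

-- resolves "customer" against the current database's prefix
GRANT SELECT ON customer_orders TO DBONLY customer;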

The cluster wide system catalog table pg_roles is still used. A role created manually with a database prefix (without using the DBONLY option) would be treated the same as if the role had been created database-specific (when using the DBONLY option). Role names still have to be unique across the cluster.

With this approach no meta data is retained that identifies a role as cluster-wide or database-specific. A pg_dump will simply dump the role name as now (including the prefix) and a pg_load would re-create it in the same way as now. The dropdb command would drop all roles that have a prefix matching the database name irrespective of how they were actually created.

The advantage of this approach is that no change is required to the mechanism that looks up roles and checks that the level of access is appropriate to the user that has those roles when relations are actually used.

However we can definitely do better while staying simple.

Approach Three

This works much as above except that instead of adding a prefix to the role name we add a column to the cluster wide pg_roles table, catalog_name, where we add the name of the database associated with the role when using the DBONLY option (left null otherwise). The advantage is that we now have the meta data preserved as to how the role was originally defined – either database-specific or cluster-wide.

This fixes various concerns with the second approach. DROPDB can be constrained to search for roles matching the database name in the catalog_name column, so cluster roles that just happen to have a matching database name prefix will be ignored. The pg_dump and pg_restore commands can use the DBONLY option in the output as necessary by checking whether the catalog_name column has a value or not. The role name lookups for GRANT and REVOKE remain specifically controlled using the DBONLY option (so as to avoid handling any complex role hierarchy semantics). Still no change is required to the mechanism that looks up roles and checks that the level of access is appropriate to the user that has those roles when relations are actually used (on the assumption that when GRANT is used the corresponding pg_roles row OID is assigned, so the actual name is irrelevant). However, two roles could now exist with the same name, one with catalog_name empty and one (or more) with a database name in catalog_name. This has consequences for displaying the content of system catalog tables in psql using \du or \dp, for example, so the system catalog views would need to be amended in some way.

A downside is that all the roles are still held in one system catalog table for the cluster, but in practice this might not actually matter. The semantics for pg_dump need to be thought about: when dumping a specific database we would want an option to also dump its own specific roles; when dumping all databases, all roles should be dumped – database-specific and cluster-wide.

This approach needs few code changes and is backwards compatible.

Approach Four

This would add a per-database pg_roles system catalog table. This holds the roles specific to that database, and the role names can be the same as role names in other databases or cluster-wide. It would probably need rules to handle prioritisation in name clashes with roles held in the cluster-wide pg_roles table. For example, CREATE ROLE would need to check both the database-specific pg_roles and the cluster-wide pg_roles. Similar complexity would need to be handled in GRANT and REVOKE. Access control checks would now need to check both the database-specific pg_roles and the cluster-wide pg_roles, and there may be conflicts to resolve depending on how a role hierarchy is implemented. While having a database-specific pg_roles table makes semantic sense, is necessary for managing who can change which roles, and a thorough implementation of role priority/conflict handling might result in useful behaviour, there is a lot of additional complexity in the implementation.

Schema Handling

A further enhancement should allow roles to be separately defined per-schema within a database. Support could be added in some fashion by allowing schema qualification of the role name in CREATE ROLE, using public (or as defined in the schema path) by default. The implementation of this in the third approach above would be to add a schema_name column into pg_roles.

elein mustain: Having Group By Clauses (elein’s GeneralBits)


Some people go to great lengths to avoid GROUP BY and HAVING clauses in their queries. The error messages are fussy but they are usually right. GROUP BY and HAVING key words are essential for good SQL reporting.

The primary reason for GROUP BY is to reduce the number of rows, usually by aggregation. It produces only one row for each matching grouping from the input. This allows you to make sophisticated calculations via ordinary SQL.

Fruit Example:

We have some fruit:

 item    | source  | amt | fresh_until
---------+---------+-----+------------
 bananas | Chile   | 50  | 2019-05-01
 bananas | Bolivia | 25  | 2019-04-15
 bananas | Chile   | 150 | 2019-07-10
 apples  | USA-WA  | 75  | 2019-07-01
 apples  | USA-CA  | 75  | 2019-08-15
 apples  | Canada  | 80  | 2019-08-01
 grapes  | USA-CA  | 120 | 2019-07-15
(7 rows)

This next case allows us to look forward. Mid-year, what fruits will be available? We answer this with an aggregate query grouped by item; after the grouping is done, we check the value of min(fresh_until) using a HAVING clause. HAVING is how you qualify an aggregate.

select item, count(source) as srcs, sum(amt) as tot_amt,
 min(fresh_until) as min_fresh_until
 from fruit
 group by item
 having min(fresh_until) > '30-jun-2019';

   item  | srcs | tot_amt | min_fresh_until
 --------+------+---------+----------------
  grapes | 1    | 120     | 2019-07-15
  apples | 3    | 230     | 2019-07-01
 (2 rows)

All of the apples and grapes will be available mid-year.

A target list may contain non-aggregates and aggregates. The non-aggregate columns in the target list should be in the GROUP BY clause; the error message says so. The order of the columns in the GROUP BY clause matters: it determines how the aggregates are grouped. The order is often hierarchical, and what that means for your columns depends on your focus. It could be fruit, or sources, and/or the fresh_until date.
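
For example (a quick sketch against the fruit table above), leaving a non-aggregate column out of the GROUP BY clause raises the familiar error:

select item, source, sum(amt)
 from fruit
 group by item;
ERROR:  column "fruit.source" must appear in the GROUP BY clause or be used in an aggregate function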

Playing Cards Examples

Let’s look at another set of examples that illustrate extracting information on playing cards. You can learn about cards on Wikipedia Standard Cards.

Suppose you deal out programmatically six 5-card hands, like six people playing poker.  A total of 30 cards are used in this deal. They are in a hand table like the following where the names of cards and suits are joined in by lookup tables. We store ranks so we can sort properly. We use names for display. The names and ranks have a one to one relationship for each of cards and suits.

create or replace view hands_v as
 select handid, su_name, ca_name
 from hands
 join suits using (su_rank)
 join cards using (ca_rank);

  handid | su_name | ca_name
 --------+---------+---------
       1 | Diamond | 2
       1 | Club    | 8
       1 | Spade   | J
       1 | Diamond | 5
       1 | Heart   | Q
       2 | Heart   | 9
       2 | Spade   | 4
       2 | Heart   | 7
       ...

What is the suit count for each hand? We really only care about any hands that have 3 or more cards of the same suit.  That will tell us who has better chances for a poker flush.  Note that although GROUP BY would seem to imply ORDER BY, it does not.  ORDER BY must be explicit.

select handid, su_name, count(ca_name)
 from hands_v
 group by handid, su_name
 having count(ca_name) >= 3
 order by handid, su_name;

  handid | su_name | count
 --------+---------+-------
       3 | Diamond | 3
       4 | Spade   | 3
       6 | Spade   | 3
 (3 rows)

So what if you mis-grouped your query? If this hand table is not grouped by handid, then you will get 30 records of 6 hands of 5-cards. If you had aggregates, they would be grouped by row. Not very helpful.

If you aggregate the card name but do not also include the card name solo on the target list, and then try to order by card name, you will receive an error message saying that it should not be in the ORDER BY clause. The ORDER BY clause should contain elements of the GROUP BY clause. However, if the card name is explicitly in the target list,

 select handid, ca_name, count(ca_name)...

then the card name must be in the GROUP BY clause and is therefore allowable in the ORDER BY clause.

If the query is by suit, there will be a minimum of 1 and a maximum of 4 records per suit for each of the six hands. Notice that we are sorting by suit rank, which also must be in the GROUP BY clause. su_name and su_rank have a one to one relationship.

select handid, su_name, count(su_name)
from hands
join suits using (su_rank)
group by su_rank, su_name, handid
order by handid, su_rank;

 handid | su_name | count
--------+---------+-------
      1 | Spade   | 1
      1 | Heart   | 1
      1 | Diamond | 2
      1 | Club    | 1
      2 | Spade   | 1
      2 | Heart   | 2
      2 | Club    | 2
      3 | Spade   | 1
      3 | Diamond | 3
      3 | Club    | 1
      4 | Spade   | 3
      4 | Heart   | 1
      4 | Diamond | 1
      5 | Heart   | 2
      5 | Diamond | 1
      5 | Club    | 2
      6 | Spade   | 3
      6 | Diamond | 2
(18 rows)

To see the distribution of cards into hands, we must group by the card rank column. Of course there are 4 suits of each card, so you won’t see a card in more than four hands.

select ca_name, count(handid) as num_hands from hands
join suits using (su_rank)
join cards using (ca_rank)
group by ca_rank, ca_name order by ca_rank;

 ca_name | num_hands
---------+-----------
       2 | 2
       3 | 2
       4 | 4
       5 | 3
       6 | 2
       7 | 3   
       8 | 2
       9 | 3
       J | 1
       Q | 2
       K | 4
       A | 2
(12 rows)

To peek and see who is holding aces, we can use the following short query.  Note that there is a WHERE clause which is executed while collecting the rows.  HAVING is executed after the rows are collected.

select handid, count(ca_name)
from hands_v
where ca_name = 'A'
group by handid;
 handid | count
--------+-------
      5 | 1
      6 | 1
(2 rows)

Summary

These examples are simple ways to evaluate known entities. Experiment and use these simple rules.

  • If a column is on the target list and not an aggregate, it must be in a GROUP BY clause.
  • WHERE clauses occur during the selection process.
  • HAVING clauses occur after the aggregates are completed.
  • Non-aggregate columns in the ORDER BY clause must also appear in the GROUP BY clause.
  • The order of the GROUP BY clause matters.

Hans-Juergen Schoenig: Foreign data wrapper for PostgreSQL: Performance Tuning


Foreign data wrappers have been around for quite a while and are one of the most widely used features in PostgreSQL. People simply like foreign data wrappers and we can expect that the community will add even more functionality as we speak. As far as postgres_fdw is concerned, there are some hidden tuning options which are not widely known by users. So let us see how we can speed up the PostgreSQL foreign data wrapper.

Foreign data wrappers: Creating a “database link”

To show how things can be improved we first have to create some sample data in “adb”, which can then be integrated into some other database:

adb=# CREATE TABLE t_local (id int);
CREATE TABLE
adb=# INSERT INTO t_local 
		SELECT * FROM generate_series(1, 100000);
INSERT 0 100000

In this case I have simply loaded 100,000 rows into a very simple table. Let us now create the foreign data wrapper (or “database link” as Oracle people would call it). The first thing to do is to enable the postgres_fdw extension in “bdb”.

bdb=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION

In the next step we have to create the “SERVER”, which points to the database containing our sample table. CREATE SERVER works like this:

bdb=# CREATE SERVER some_server 
		FOREIGN DATA WRAPPER postgres_fdw 
		OPTIONS (host 'localhost', dbname 'adb');
CREATE SERVER

Once the foreign server is created the users we need can be mapped:

bdb=# CREATE USER MAPPING FOR current_user 
		SERVER some_server 
		OPTIONS (user 'hs');
CREATE USER MAPPING

In this example the user mapping is really easy. We simply want the current user to connect to the remote database as “hs” (which happens to be my superuser).

Finally we can link the tables. The easiest way to do that is to use “IMPORT FOREIGN SCHEMA”, which simply fetches the remote data structure and turns everything into a foreign table.

bdb=# \h IMPORT
Command:     IMPORT FOREIGN SCHEMA
Description: import table definitions from a foreign server
Syntax:
IMPORT FOREIGN SCHEMA remote_schema
    [ { LIMIT TO | EXCEPT } ( table_name [, ...] ) ]
    FROM SERVER server_name
    INTO local_schema
    [ OPTIONS ( option 'value' [, ... ] ) ]

The command is really easy and shown in the next listing:

bdb=# IMPORT FOREIGN SCHEMA public 
		FROM SERVER some_server 
		INTO public;
IMPORT FOREIGN SCHEMA

As you can see PostgreSQL has nicely created the schema for us and we are basically ready to go.

bdb=# \d
            List of relations
 Schema |  Name   |     Type      | Owner 
--------+---------+---------------+-------
 public | t_local | foreign table | hs
(1 row)

Testing postgres_fdw performance

When we query our 100,000-row table we can see that the operation can be done in roughly 7.5 milliseconds:

adb=# explain analyze SELECT * FROM t_local ;
                                    QUERY PLAN                                                  
----------------------------------------------------------------------------------
 Seq Scan on t_local  (cost=0.00..1443.00 rows=100000 width=4) 
	(actual time=0.010..7.565 rows=100000 loops=1)
 Planning Time: 0.024 ms
 Execution Time: 12.774 ms
(3 rows)

Let us connect to “bdb” now and see how long the other database needs to read the data:

adb=# \c bdb
bdb=# explain analyze SELECT * FROM t_local ;
                                      QUERY PLAN                                                    
--------------------------------------------------------------------------------------
 Foreign Scan on t_local  (cost=100.00..197.75 rows=2925 width=4) 
	(actual time=0.322..90.743 rows=100000 loops=1)
 Planning Time: 0.043 ms
 Execution Time: 96.425 ms
(3 rows)

In this example you can see that 90 milliseconds are burned to do the same thing. So why is that? Behind the scenes the foreign data wrapper creates a cursor and fetches data in really small chunks. By default only 50 rows are fetched at a time. This translates to thousands of network requests. If our two database servers were further apart, things would take even longer – A LOT longer. Network latency plays a crucial role here and performance can really suffer.

One way to tackle the problem is to fetch larger chunks of data at once to reduce the impact of the network itself. ALTER SERVER will allow us to set the “fetch_size” to a large enough value to reduce network issues without increasing memory consumption too much. Here is how it works:

bdb=# ALTER SERVER some_server 
	OPTIONS (fetch_size '50000');
ALTER SERVER

Let us run the test and see what will happen:

bdb=# explain analyze SELECT * FROM t_local;
                                      QUERY PLAN                                                     
---------------------------------------------------------------------------------------
 Foreign Scan on t_local  (cost=100.00..197.75 rows=2925 width=4) 
	(actual time=17.367..40.419 rows=100000 loops=1)
 Planning Time: 0.036 ms
 Execution Time: 45.910 ms
(3 rows)
PostgreSQL Foreign Data Wrapper performance

Wow, we have managed to more than double the speed of the query. Of course, the foreign data wrapper is still slower than a simple local query. However, the speedup is considerable and it definitely makes sense to toy around with the parameters to tune it.
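
If you only want the larger fetch size for a single foreign table rather than for the whole server, postgres_fdw also accepts fetch_size as a per-table option (shown here for the example table from above):

bdb=# ALTER FOREIGN TABLE t_local 
		OPTIONS (ADD fetch_size '10000');
ALTER FOREIGN TABLE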

If you want to learn more about Foreign Data Wrappers, performance and monitoring, check out one of our other postings.

The post Foreign data wrapper for PostgreSQL: Performance Tuning appeared first on Cybertec.

Hubert 'depesz' Lubaczewski: Waiting for PostgreSQL 12 – Partial implementation of SQL/JSON path language

Hubert 'depesz' Lubaczewski: Migrating simple table to partitioned. How?

Recently someone asked, on irc, how to make a table partitioned. The thing is that it was supposed to be done with the new partitioning, not the old way. The problem is that while we can create a table that will be seen as partitioned – we can't alter an existing table to become partitioned. So. Is it possible? … Continue reading "Migrating simple table to partitioned. How?"

Christophe Pettus: “Look It Up: Practical PostgreSQL Indexing” at Nordic PGDay 2019


The slides from my presentation at Nordic PGDay 2019 are now available.

Craig Kerstiens: How to evaluate your database


Choosing a database isn’t something you do every day. You generally choose it once for a project, then don’t look back. If you enjoy years of success with your application you may one day have to migrate to a new database, but that occurs years down the line. In choosing a database there are a few key things to consider. Here is your checklist, and spoiler alert: Postgres checks out strongly in each of these categories.

Does your database solve your problem?

There are a lot of new databases that rise up every year, each of them looking to solve hard problems within the data space. But you should start by looking at whether they solve a problem that you personally have. Most applications at the end of the day have some relational data model, and more and more are also working with some level of unstructured data. Relational databases of course solve the relational piece, but they increasingly support the unstructured piece as well; Postgres in particular handles both, with a solid relational core plus JSONB for unstructured data.

Do you need strong guarantees for your data? ACID is still at the core of how durable and safe your data is, and knowing how a database stacks up here is a good evaluation criterion. Then there is also the CAP theorem, which you see applied especially to distributed or clustered databases. Each of the previous links is worth a read to get a better understanding of the theory around databases. If you’re interested in how various databases perform under CAP then check out the Jepsen series of tests. But for the average person like myself it can be boiled down a bit more: do you need full guarantees around your transactions, or can you trade some of that for performance?

While it doesn’t fully cover all the possible options you can have with databases, Postgres comes with some pretty good flexibility out of the box. It allows both synchronous (guaranteed it makes it) and asynchronous (queued up, occurring soon after) replication to standbys. Those standbys could be read replicas for reporting or for high availability. What’s nice about Postgres is that it actually allows you to swap between synchronous and asynchronous on a per-transaction basis.
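
For example (a minimal sketch, assuming synchronous replication is already configured via synchronous_standby_names), a single transaction can opt out of waiting for the standby:

BEGIN;
-- wait only for the local WAL flush, not for the synchronous standby
SET LOCAL synchronous_commit TO local;
INSERT INTO audit_log (message) VALUES ('low-value event');  -- hypothetical table
COMMIT;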

Then there is the richness of features. Postgres has rich data types, powerful indexes, and a range of features such as geospatial support and full text search. By default yes, Postgres usually does solve my problem. But that is only one of my criteria.

How locked in am I to my database?

Once I’ve established that a database solves my problem, I want to know a bit more about what I’m getting myself into. Whether the database is open source is a factor. That doesn’t mean I require the database to be open source, but it simplifies my evaluation. A closed source database means I’m committing to whatever the steward of that database decides. If the company is well established and is a good steward of the product, a closed source database can absolutely satisfy what I need.

On the flip side, open source doesn’t immediately mean it is perfect. Is it open source but with an extremely restrictive license? Is there a community around it? Has it been getting new releases? All of these play into my level of comfort in trusting it with my data.

Can I hire for my database?

This one gets missed so often by early stage companies! It is the number one reason I like using open technologies and frameworks: I can hire someone already familiar with my tech stack. In contrast, with a home-grown in-house framework or database, the ability to test knowledge is harder and the ramp-up time is considerably longer for a new hire. Postgres shines as bright as any database here. A look at Hacker News “Who is hiring?” trends, which I view as a leading indicator, from a couple of years ago shows Postgres leading the pack of desired database skills. The number of people that know Postgres continues to increase each day. It is not a fading skill.

Who’s hiring from HN

What does the future look like?

Finally, I’m looking at what my future needs will be, combined with the future of the database. Does the database have momentum to keep improving and advancing? Does it not only have the features I need today, but also features that can benefit me in the future without complicating my stack? I often favor a database that can solve multiple problems, not just one. Combining 10 very specialized tools leads to a much more complex stack; with Postgres, if I need to layer in a little full text search, I already have something that can be my starting point.

Does it scale is my final big one. If my business is expected to remain small this is not a concern, but I want to know what my limits are. Replacing my database is a large-effort task; how far can I scale my database without rearchitecting?

Personally Postgres having a good answer to the scale question is what attracted me to join Citus over 3 years ago. It takes Postgres and makes it even more powerful. It removes the scaling question for you, so when I need to scale I have my answer.

These aren’t the only criteria

I’m sure this is not an exhaustive list, but it is a framework I’ve used for many years. In most of those years I’m led back to the simple answer: Just use Postgres.

What other criteria do you use when choosing your database? Let us know @citusdata


Yogesh Sharma: PostgreSQL Zero to Hero: Getting Started with RPMs -Part 1


One of the most important things for using PostgreSQL successfully in your development and production environments is simply getting started! One of the most popular ways to install PostgreSQL is by using RPM packages. The PostgreSQL RPM packages work across many Linux distributions, including RedHat Enterprise Linux (RHEL), CentOS, Fedora, Scientific Linux, and more, and the PostgreSQL community provides installers for these distributions.

This guide will help you get started with installing and configuring PostgreSQL for a CentOS / RHEL 7 based system, which will also work for Fedora 29. We will be installing PostgreSQL 11, which is the latest major release of PostgreSQL as of this writing.

Installation

Installing yum / dnf repository setup rpm

Peter Bengtsson: Best way to count distinct indexed things in PostgreSQL

`SELECT COUNT(*) FROM (SELECT DISTINCT my_not_unique_indexed_column FROM my_table) t`
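
Spelled out (a quick sketch on the same hypothetical table), the idea is that the subquery form above usually beats the more obvious aggregate, because the inner SELECT DISTINCT can often be satisfied from the index while count(DISTINCT ...) typically falls back to sorting all the values:

-- the obvious, usually slower form
SELECT COUNT(DISTINCT my_not_unique_indexed_column) FROM my_table;

-- the form recommended above
SELECT COUNT(*) FROM (
  SELECT DISTINCT my_not_unique_indexed_column FROM my_table
) t;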

Pavel Stehule: How to split string to array by individual characters?

Postgres has so many features that it is sometimes good to be reminded of some of them.

The function string_to_array is well known. It has two or three parameters. If the second parameter (the delimiter) is null, then the input string is split into an array of individual characters.

postgres=# select string_to_array('Pavel Stěhule',null);
┌───────────────────────────────┐
│        string_to_array        │
╞═══════════════════════════════╡
│ {P,a,v,e,l," ",S,t,ě,h,u,l,e} │
└───────────────────────────────┘
(1 row)
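
For contrast, passing a non-null delimiter gives the usual splitting behaviour (a quick sketch):

postgres=# select string_to_array('Pavel Stěhule', ' ');
┌─────────────────┐
│ string_to_array │
╞═════════════════╡
│ {Pavel,Stěhule} │
└─────────────────┘
(1 row)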

Viorel Tabara: Benchmarking Managed PostgreSQL Cloud Solutions - Part Two: Amazon RDS


This is the second part of the multi-series Benchmarking Managed PostgreSQL Cloud Solutions. In Part 1 I presented an overview of the available tools, I discussed the reason for using the AWS Benchmark Procedure for Aurora, as well as PostgreSQL versions to be used, and I reviewed Amazon Aurora PostgreSQL 10.6.

In this part, pgbench and sysbench will be running against Amazon RDS for PostgreSQL 11.1. At the time of this writing the latest PostgreSQL version is 11.2 released about a month ago.

It’s worth pausing for a second to quickly review the PostgreSQL versions currently available in the cloud:

Amazon is again a winner, with its RDS offering, by providing the most recent version of PostgreSQL. As announced in the RDS forum AWS made PostgreSQL 11.1 available on March 13th, which is four months after the community release.

Setting Up the Environment

A few notes about the constraints related to setting up the environment and running the benchmark, points that were discussed in more detail during Part 1 of this series:

  • No changes to the cloud provider default GUC settings.
  • The connections are limited to a maximum of 1,000 as the AWS patch for pgbench did not apply cleanly. On a related note, I had to download the AWS timing patch from this pgsql-hackers submission since it was no longer available at the link mentioned in the guide.
  • The Enhanced Networking must be enabled for the client instance.
  • The database does not include a replica.
  • The database storage is not encrypted.
  • Both the client and the target instances are in the same availability zone.

First, setup the client and the database instances:

  • The client is an on demand r4.8xlarge EC2 instance:
    • vCPU: 32 (16 Cores x 2 Threads/Core)
    • RAM: 244 GiB
    • Storage: EBS Optimized
    • Network: 10 Gigabit
    Client Instance Configuration
  • The DB Cluster is an on demand db.r4.2xlarge:
    • vCPU: 8
    • RAM: 61GiB
    • Storage: EBS Optimized
    • Network: 1,750 Mbps Max Bandwidth on an up to 10 Gbps connection
    Database Instance Configuration

Next, install and configure the benchmark tools, pgbench and sysbench, by following the instructions in the Amazon guide.

The last step in getting the environment ready is configuring the PostgreSQL connection parameters. One way of doing it is by initializing the environment variables in .bashrc. Also, we need to set the paths to PostgreSQL binaries and libraries:

export PGHOST=benchmark.ctfirtyhadgr.us-east-1.rds.amazonaws.com
export PGUSER=postgres
export PGPASSWORD=postgres
export PGDATABASE=postgres
export PATH=$PATH:/usr/local/pgsql/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/pgsql/lib
Verify that everything is in place:
[root@ip-172-31-84-185 ~]# psql --version
psql (PostgreSQL) 11.1
[root@ip-172-31-84-185 ~]# pgbench --version
pgbench (PostgreSQL) 11.1
[root@ip-172-31-84-185 ~]# sysbench --version
sysbench 0.5

Running the Benchmarks

pgbench

First, initialize the pgbench database.

[root@ip-172-31-84-185 ~]# pgbench -i --fillfactor=90 --scale=10000

The initialization process takes some time, and while running it generated the following output:

dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
creating tables...
generating data...
100000 of 1000000000 tuples (0%) done (elapsed 0.06 s, remaining 599.79 s)
200000 of 1000000000 tuples (0%) done (elapsed 0.15 s, remaining 739.16 s)
300000 of 1000000000 tuples (0%) done (elapsed 0.22 s, remaining 742.21 s)
400000 of 1000000000 tuples (0%) done (elapsed 0.33 s, remaining 814.64 s)
500000 of 1000000000 tuples (0%) done (elapsed 0.41 s, remaining 825.82 s)
600000 of 1000000000 tuples (0%) done (elapsed 0.51 s, remaining 854.13 s)
700000 of 1000000000 tuples (0%) done (elapsed 0.66 s, remaining 937.01 s)
800000 of 1000000000 tuples (0%) done (elapsed 1.52 s, remaining 1897.42 s)
900000 of 1000000000 tuples (0%) done (elapsed 1.66 s, remaining 1840.08 s)

...

500600000 of 1000000000 tuples (50%) done (elapsed 814.78 s, remaining 812.83 s)
500700000 of 1000000000 tuples (50%) done (elapsed 814.81 s, remaining 812.53 s)
500800000 of 1000000000 tuples (50%) done (elapsed 814.83 s, remaining 812.23 s)
500900000 of 1000000000 tuples (50%) done (elapsed 815.11 s, remaining 812.19 s)
501000000 of 1000000000 tuples (50%) done (elapsed 815.20 s, remaining 811.94 s)

...

999200000 of 1000000000 tuples (99%) done (elapsed 1645.02 s, remaining 1.32 s)
999300000 of 1000000000 tuples (99%) done (elapsed 1645.17 s, remaining 1.15 s)
999400000 of 1000000000 tuples (99%) done (elapsed 1645.20 s, remaining 0.99 s)
999500000 of 1000000000 tuples (99%) done (elapsed 1645.23 s, remaining 0.82 s)
999600000 of 1000000000 tuples (99%) done (elapsed 1645.26 s, remaining 0.66 s)
999700000 of 1000000000 tuples (99%) done (elapsed 1645.28 s, remaining 0.49 s)
999800000 of 1000000000 tuples (99%) done (elapsed 1645.51 s, remaining 0.33 s)
999900000 of 1000000000 tuples (99%) done (elapsed 1645.77 s, remaining 0.16 s)
1000000000 of 1000000000 tuples (100%) done (elapsed 1646.03 s, remaining 0.00 s)
vacuuming...
creating primary keys...
total time: 5538.86 s (drop 0.00 s, tables 0.01 s, insert 1647.08 s, commit 0.03 s, primary 1251.60 s, foreign 0.00 s, vacuum 2640.14 s)
done.

Once that part is complete, verify that the PostgreSQL database has been populated. The following simplified version of the disk usage query can be used to return the PostgreSQL database size:

SELECT
   d.datname AS Name,
   pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
   pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname)) AS SIZE
FROM pg_catalog.pg_database d
WHERE d.datname = 'postgres';

…and the output:

   name   |  owner   |  size
----------+----------+--------
 postgres | postgres | 160 GB
(1 row)

With all the preparations completed we can start the read/write pgbench test:

[root@ip-172-31-84-185 ~]# pgbench --protocol=prepared -P 60 --time=600 --client=1000 --jobs=2048

After 10 minutes we get the results:

starting vacuum...end.
progress: 60.0 s, 878.3 tps, lat 1101.258 ms stddev 339.491
progress: 120.0 s, 885.2 tps, lat 1132.301 ms stddev 292.551
progress: 180.0 s, 656.3 tps, lat 1522.102 ms stddev 666.017
progress: 240.0 s, 436.8 tps, lat 2277.140 ms stddev 524.603
progress: 300.0 s, 742.2 tps, lat 1363.558 ms stddev 578.541
progress: 360.0 s, 866.4 tps, lat 1146.972 ms stddev 301.861
progress: 420.0 s, 878.2 tps, lat 1143.939 ms stddev 304.396
progress: 480.0 s, 872.7 tps, lat 1139.892 ms stddev 304.421
progress: 540.0 s, 881.0 tps, lat 1132.373 ms stddev 311.890
progress: 600.0 s, 729.3 tps, lat 1366.517 ms stddev 867.784
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: prepared
number of clients: 1000
number of threads: 1000
duration: 600 s
number of transactions actually processed: 470582
latency average = 1274.340 ms
latency stddev = 544.179 ms
tps = 782.084354 (including connections establishing)
tps = 783.610726 (excluding connections establishing)

sysbench

The first step is adding some data:

sysbench --test=/usr/local/share/sysbench/oltp.lua \
      --pgsql-host=aurora.cluster-ctfirtyhadgr.us-east-1.rds.amazonaws.com \
      --pgsql-db=postgres \
      --pgsql-user=postgres \
      --pgsql-password=postgres \
      --pgsql-port=5432 \
      --oltp-tables-count=250\
      --oltp-table-size=450000 \
      prepare

The command creates 250 tables, each table having 2 indexes:

sysbench 0.5:  multi-threaded system evaluation benchmark

Creating table 'sbtest1'...
Inserting 450000 records into 'sbtest1'
Creating secondary indexes on 'sbtest1'...
Creating table 'sbtest2'...
...
Creating table 'sbtest250'...
Inserting 450000 records into 'sbtest250'
Creating secondary indexes on 'sbtest250'...

Let’s look at indexes:

postgres=> \di
                        List of relations
Schema |         Name          | Type  |  Owner   |      Table
--------+-----------------------+-------+----------+------------------
public | k_1                   | index | postgres | sbtest1
public | k_10                  | index | postgres | sbtest10
public | k_100                 | index | postgres | sbtest100
public | k_101                 | index | postgres | sbtest101
public | k_102                 | index | postgres | sbtest102
public | k_103                 | index | postgres | sbtest103

...

public | k_97                  | index | postgres | sbtest97
public | k_98                  | index | postgres | sbtest98
public | k_99                  | index | postgres | sbtest99
public | pgbench_accounts_pkey | index | postgres | pgbench_accounts
public | pgbench_branches_pkey | index | postgres | pgbench_branches
public | pgbench_tellers_pkey  | index | postgres | pgbench_tellers
public | sbtest100_pkey        | index | postgres | sbtest100
public | sbtest101_pkey        | index | postgres | sbtest101
public | sbtest102_pkey        | index | postgres | sbtest102
public | sbtest103_pkey        | index | postgres | sbtest103
public | sbtest104_pkey        | index | postgres | sbtest104
public | sbtest105_pkey        | index | postgres | sbtest105

...

public | sbtest97_pkey         | index | postgres | sbtest97
public | sbtest98_pkey         | index | postgres | sbtest98
public | sbtest99_pkey         | index | postgres | sbtest99
public | sbtest9_pkey          | index | postgres | sbtest9
(503 rows)

Looking good... To start the test, just run:

sysbench --test=/usr/local/share/sysbench/oltp.lua \
      --pgsql-host=aurora.cluster-ctfirtyhadgr.us-east-1.rds.amazonaws.com \
      --pgsql-db=postgres \
      --pgsql-user=postgres \
      --pgsql-password=postgres \
      --pgsql-port=5432 \
      --oltp-tables-count=250 \
      --oltp-table-size=450000 \
      --max-requests=0 \
      --forced-shutdown \
      --report-interval=60 \
      --oltp_simple_ranges=0 \
      --oltp-distinct-ranges=0 \
      --oltp-sum-ranges=0 \
      --oltp-order-ranges=0 \
      --oltp-point-selects=0 \
      --rand-type=uniform \
      --max-time=600 \
      --num-threads=1000 \
      run

A note of caution:

RDS storage is not “elastic”, meaning that the storage space allocated when creating the instance must be large enough to fit the amount of data generated during the benchmark, or else RDS will fail with:

FATAL: PQexec() failed: 7 PANIC:  could not write to file "pg_wal/xlogtemp.29144": No space left on device
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

FATAL: failed query: COMMIT
FATAL: failed to execute function `event': 3
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process

The storage size can be increased without stopping the database; however, it took me about 30 minutes to grow it from 200 GiB to 500 GiB:

Increasing storage space on RDS
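
A resize like the one above can also be requested from the AWS CLI, roughly as follows (the instance identifier and target size are placeholders):

aws rds modify-db-instance \
    --db-instance-identifier benchmark \
    --allocated-storage 500 \
    --apply-immediately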

And here are the sysbench test results:

sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1000
Report intermediate results every 60 second(s)
Random number generator seed is 0 and will be ignored

Forcing shutdown in 630 seconds

Initializing worker threads...

Threads started!

[  60s] threads: 1000, tps: 1070.40, reads: 0.00, writes: 4309.35, response time: 1808.81ms (95%), errors: 0.02, reconnects:  0.00
[ 120s] threads: 1000, tps: 889.68, reads: 0.00, writes: 3575.35, response time: 1951.12ms (95%), errors: 0.02, reconnects:  0.00
[ 180s] threads: 1000, tps: 574.57, reads: 0.00, writes: 2320.62, response time: 3936.73ms (95%), errors: 0.00, reconnects:  0.00
[ 240s] threads: 1000, tps: 232.10, reads: 0.00, writes: 928.43, response time: 10994.37ms (95%), errors: 0.00, reconnects:  0.00
[ 300s] threads: 1000, tps: 242.40, reads: 0.00, writes: 969.60, response time: 9412.39ms (95%), errors: 0.00, reconnects:  0.00
[ 360s] threads: 1000, tps: 257.73, reads: 0.00, writes: 1030.98, response time: 8833.64ms (95%), errors: 0.02, reconnects:  0.00
[ 420s] threads: 1000, tps: 264.65, reads: 0.00, writes: 1036.60, response time: 9192.42ms (95%), errors: 0.00, reconnects:  0.00
[ 480s] threads: 1000, tps: 278.07, reads: 0.00, writes: 1134.27, response time: 7133.76ms (95%), errors: 0.00, reconnects:  0.00
[ 540s] threads: 1000, tps: 250.40, reads: 0.00, writes: 1001.53, response time: 9628.97ms (95%), errors: 0.00, reconnects:  0.00
[ 600s] threads: 1000, tps: 249.97, reads: 0.00, writes: 996.92, response time: 10724.58ms (95%), errors: 0.00, reconnects:  0.00
OLTP test statistics:
   queries performed:
      read:                            0
      write:                           1038401
      other:                           519199
      total:                           1557600
   transactions:                        259598 (428.59 per sec.)
   read/write requests:                 1038401 (1714.36 per sec.)
   other operations:                    519199 (857.18 per sec.)
   ignored errors:                      3      (0.00 per sec.)
   reconnects:                          0      (0.00 per sec.)

General statistics:
   total time:                          605.7086s
   total number of events:              259598
   total time taken by event execution: 602999.7582s
   response time:
         min:                                 55.02ms
         avg:                               2322.82ms
         max:                              13133.36ms
         approx.  95 percentile:            8400.39ms

Threads fairness:
   events (avg/stddev):           259.5980/3.20
   execution time (avg/stddev):   602.9998/2.77

Benchmark Metrics

The metrics can be captured using the AWS monitoring tools CloudWatch and Performance Insights. Here are a few samples for the curious:

DB Instance CloudWatch Metrics
RDS Performance Insights - Counter Metrics
RDS Performance Insights - Database Load

Results

pgbench initialization results
pgbench run results
sysbench results

Conclusion

Despite running PostgreSQL version 10.6, Amazon Aurora clearly outperforms RDS, which is at version 11.1, and that comes as no surprise. According to the Aurora FAQs, Amazon went to great lengths to improve overall database performance, building on top of a redesigned storage engine.

Next in Series

The next part will be about Google Cloud SQL for PostgreSQL.

Peter Geoghegan: Visualizing Postgres page images within GDB

It's straightforward to set up GDB to quickly invoke pg_hexedit on a page image, without going through the filesystem. The page image can even come from a local temp buffer.
A user-defined GDB command can be created which shows an arbitrary page image in pg_hexedit from an interactive GDB session.

This is a good way to understand what's really going on when debugging access method code. It also works well with core dumps. I found this valuable during a recent project to improve the Postgres B-Tree code.

An example of how to make this work is available from a newly added section of the pg_hexedit README file:

https://github.com/petergeoghegan/pg_hexedit/#using-pg_hexedit-while-debugging-postgres-with-gdb

Tatsuo Ishii: Shared Relation Cache


System catalogs?

Pgpool-II needs to access PostgreSQL's system catalogs whenever it recognizes tables in a user's query. For example, Pgpool-II has to know whether the table in question is a temporary table or not. If it is a temporary table, then a query using it must be routed to the primary PostgreSQL server, rather than to one of the standby servers, because PostgreSQL does not allow creating temporary tables on standby servers. Another use case is converting a table name to an OID (Object Identifier). OIDs are unique keys for objects managed in PostgreSQL's system catalogs.

The same can be said of functions. Details of functions, for instance whether they are "immutable" or not, are important information, since they affect the decision on whether the result of a query using the function should be cached when the query cache feature is enabled.

Local query cache for system catalogs

Sometimes Pgpool-II needs to issue as many as 10 queries to the system catalogs when it sees a table or function for the first time. Fortunately Pgpool-II does not waste the query results. They are stored in a local cache (we call it the "Relation Cache" or "Relcache"), and the next time it sees the object in the same or a different query, it extracts the info from the local cache. So far so good.

The problem is that the local cache is stored in the private memory of each Pgpool-II child process. For each new session from a Pgpool-II client, a different child process is assigned to the session. So even if a single table is used in queries, Pgpool-II continues to access the system catalogs until the table's info gets filled in all the local caches.

Shared relation cache

How can we mitigate the problem? One of the solutions is sharing the relation cache info among Pgpool-II processes. This way, once one of the processes accesses the system catalogs and obtains the info, the other processes do not need to access the system catalogs any more. The cache shared by the processes is called the "shared relation cache".

How to implement it?

But how to implement it? Fortunately Pgpool-II already has a shared query cache. Why not store the shared relation cache on top of it? Here's the idea:
  • If the table/function info is not in the local relation cache, check the shared relation cache.
  • If it is not in the shared relation cache, access the system catalogs and store the info in the local cache. Also copy the info to the shared relation cache.
  • If the table/function info is already in the shared relation cache, just copy the info to the local cache.
You might wonder why there are two kinds of cache: a local one and a shared one. The reason is locking. Since the local cache is never accessed by multiple processes, it does not need any locking, while the shared relation cache can be accessed by multiple processes and so must be guarded by locking, which could become a serious bottleneck if there are many processes.

Cache invalidation

Any cache needs to be invalidated someday.  In the current implementation the cache invalidation is based on timeout. The timeout value can be specified using "relcache_expire" parameter, which controls the local cache timeout as well.

Is it faster?

Is the shared relation cache faster? Well, it depends on the use case. If there is only a very small number of tables or functions, the overhead of the new shared relation cache will not bring an advantage. However, if there are many tables/functions, it definitely wins. This is the reason why Pgpool-II has a switch (enabled_shared_relcache) to enable or disable the feature.
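
As a rough sketch, turning it on in pgpool.conf would look something like this (check the Pgpool-II 4.1 documentation for the final parameter spelling and defaults):

enable_shared_relcache = on     # the on/off switch described above
relcache_expire = 3600          # invalidation timeout in seconds, shared with the local cache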

To demonstrate a case where the feature wins, I created 1, 2, 4, 8, 16, 32, 64, 128 and 256 tables (all tables are empty) and accessed them using pgbench. pgbench's options were as follows:

pgbench  -C -T 30 -c 30 -n -f script_file

pgbench ran 3 times for each session and I used the average of the numbers.
The script file includes 1-128 SELECTs to access each table.

The blue line (Speed = 1) is the baseline, i.e. when the feature is disabled. The red line is when the feature is enabled. As you can see, as the number of tables increases, the performance benefit increases as well, up to 32 tables. Beyond that the benefit gets smaller, but the performance with the shared relcache on remains superior to having it off.

The result may differ according to the workload: if the SELECTs are heavy, then the effect may be weakened, because the longer SELECT time hides the effect of the shared relcache.

When it will be available?

The feature is already committed into Pgpool-II version 4.1, which is supposed to be released around September 2019. So stay tuned!

Beena Emerson: PostgreSQL : Test Coverage


Install lcov

Install Dependencies:
yum install perl-devel
yum install perl-Digest-MD5
yum install perl-GD

Download and install lcov
rpm -U lcov-1.13-1.el7.noarch.rpm

Run Test

Configure and make
Use the --enable-coverage configure flag
./configure --enable-coverage
make -j 4

Run make check
cd src/
make check -i

Check Coverage

HTML output

make coverage-html

A .gcov output file is created for each file in the test, along with a folder named 'coverage' containing an index.html file to display the coverage information. The HTML page shows a summary of the coverage for each folder, recursively for each file, and then for each line.
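
If you are only interested in part of the tree, the same coverage targets can be run from a subdirectory (a small sketch; the directory is just an example):

cd src/backend/access/heap
make coverage-html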

Text output

make coverage

A .gcov and .gcov.out file is created for each file in the test.

Reset

make coverage-clean

This resets the execution count.

Output files

<file>.gcov.out

This lists out the details of each function in the file. An example output for a function is shown below:

Function 'heap_sync'
Lines executed:100.00% of 10
Branches executed:100.00% of 4
Taken at least once:75.00% of 4
Calls executed:100.00% of 6

<file>.gcov

This displays the original file in its entirety, along with the original line numbers and a count of the number of times each line was executed. Lines which were not executed are marked with hashes '######', and '-' indicates that the line is not executable.

        -: 9258:    /* main heap */
       50: 9259:    FlushRelationBuffers(rel);
call    0 returned 100%

.
. <more code>
.

#####: 9283:    Page        page = (Page) pagedata;
        -: 9284:    OffsetNumber off;
        -: 9285:
    #####: 9286:    mask_page_lsn_and_checksum(page);
call    0 never executed

index.html

The home page:
This lists out all the subdirectories along with their coverage data.


Per directory info:
On clicking a particular directory, we get the coverage info of each file in the selected directory.

Select a file:
This gives the per-line hit count of the selected file. The lines highlighted in blue are hit and those in red are never executed during the test run.

Luca Ferrari: psql.it Mailing List is Back!


The historical mailing list of the Italian psql.it group has been successfully migrated!

psql.it Mailing List is Back!

Thanks to the great work of the people behind the psql.it Italian group, the first (and, for many years, the only) Italian-language mailing list has been migrated to a new platform and is now online again!

On this mailing list you can find a few very talented people willing to help with your PostgreSQL-related problems or curiosities, to discuss the current status and the future of the development, and anything else you would expect from a very technical mailing list. Of course, the language is Italian!

The link to the new mailing list management panel is https://www.freelists.org/list/postgresql-it.
Enjoy!

Regina Obe: PGConf US 2019 Data Loading Slides up


I gave a talk at PGConf US 2019 on some of the many ways you can load data into PostgreSQL using open source tools. This is similar to the talk I gave last year, but with the addition of the pgloader command-line tool and the http PostgreSQL extension.

HTML slides | PDF slides

Even though the talk was not much about PostGIS, but rather about tricks for loading data, I managed to get a mouthful of PostGIS in there.

Stefan Fercot: pgBackRest archiving tricks


pgBackRest is a well-known powerful backup and restore tool.

While the documentation describes all the parameters, it’s not always that simple to imagine what you can really do with it.

In this post, I will introduce asynchronous archiving and the possibility of preventing PostgreSQL from going down in case of archiving problems.

With its “info” command, for performance reasons, pgBackRest doesn’t check that all the needed WAL segments are still present. check_pgbackrest is clearly built for that. The two tricks mentioned above can produce gaps in the archived WAL segments. The new 1.5 release of check_pgbackrest provides ways to handle that; we’ll also see how.


Installation

First of all, install PostgreSQL and pgBackRest packages directly from the PGDG yum repositories:

$ sudo yum install -y https://download.postgresql.org/pub/repos/yum/11/redhat/\
rhel-7-x86_64/pgdg-centos11-11-2.noarch.rpm
$ sudo yum install -y postgresql11-server postgresql11-contrib
$ sudo yum install -y pgbackrest

Check that pgBackRest is correctly installed:

$ pgbackrest
pgBackRest 2.11 - General help

Usage:
    pgbackrest [options] [command]

Commands:
    archive-get     Get a WAL segment from the archive.
    archive-push    Push a WAL segment to the archive.
    backup          Backup a database cluster.
    check           Check the configuration.
    expire          Expire backups that exceed retention.
    help            Get help.
    info            Retrieve information about backups.
    restore         Restore a database cluster.
    stanza-create   Create the required stanza data.
    stanza-delete   Delete a stanza.
    stanza-upgrade  Upgrade a stanza.
    start           Allow pgBackRest processes to run.
    stop            Stop pgBackRest processes from running.
    version         Get version.

Use 'pgbackrest help [command]' for more information.

Create a basic PostgreSQL cluster:

$ sudo /usr/pgsql-11/bin/postgresql-11-setup initdb

Configure pgBackRest to backup the local cluster

By default, the configuration file is /etc/pgbackrest.conf. Let’s make a copy:

$ sudo cp /etc/pgbackrest.conf /etc/pgbackrest.conf.bck

Update the configuration:

[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=1
process-max=2
log-level-console=info
log-level-file=debug
archive-async=y
archive-push-queue-max=100MB
spool-path=/var/spool/pgbackrest

[some_cool_stanza_name]
pg1-path=/var/lib/pgsql/11/data

Make sure that the postgres user can write in /var/lib/pgbackrest and in /var/spool/pgbackrest.
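
The packages normally create /var/lib/pgbackrest; as a sketch (adjust ownership and modes to your environment), you can make sure both directories exist and belong to postgres like this:

$ sudo mkdir -p /var/lib/pgbackrest /var/spool/pgbackrest
$ sudo chown postgres:postgres /var/lib/pgbackrest /var/spool/pgbackrest
$ sudo chmod 750 /var/lib/pgbackrest /var/spool/pgbackrest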

Configure archiving in the postgresql.conf file:

archive_mode = on
archive_command = 'pgbackrest --stanza=some_cool_stanza_name archive-push %p'

Start the PostgreSQL cluster:

$ sudo systemctl start postgresql-11

Create the stanza and check the configuration:

$ sudo -iu postgres pgbackrest --stanza=some_cool_stanza_name stanza-create
P00   INFO: stanza-create command end: completed successfully

$ sudo -iu postgres pgbackrest --stanza=some_cool_stanza_name check
P00   INFO: WAL segment 000000010000000000000001 successfully stored in the 
    archive at '/var/lib/pgbackrest/archive/some_cool_stanza_name/
    11-1/0000000100000000/
    000000010000000000000001-03a91d4d64251a54cf9d48ed59382d3cce3c7652.gz'
P00   INFO: check command end: completed successfully 

Let’s finally take our first backup:

$ sudo -iu postgres pgbackrest --stanza=some_cool_stanza_name --type=full backup
...
P00   INFO: new backup label = 20190325-142918F
P00   INFO: backup command end: completed successfully
...

pgBackRest configuration explanations

What the documentation says:


archive-async

Push/get WAL segments asynchronously.

Enables asynchronous operation for the archive-push and archive-get commands.

Asynchronous operation is more efficient because it can reuse connections and take advantage of parallelism.


archive-push-queue-max

Maximum size of the PostgreSQL archive queue.

After the limit is reached, the following will happen:

  • pgBackRest will notify PostgreSQL that the WAL was successfully archived, then DROP IT.
  • A warning will be output to the PostgreSQL log.

If this occurs, then, the archive log stream will be interrupted and PITR will not be possible past that point. A new backup will be required to regain full restore capability.

In asynchronous mode the entire queue will be dropped to prevent spurts of WAL getting through before the queue limit is exceeded again.

The purpose of this feature is to prevent the log volume from filling up at which point PostgreSQL will stop completely.

Better to lose the backup than have PostgreSQL go down.

Don’t use this feature if you want to rely entirely on your backups!


spool-path

This path is used to store data for the asynchronous archive-push and archive-get command.

The asynchronous archive-push command writes acknowledgments into the spool path when it has successfully stored WAL in the archive (and errors on failure) so the foreground process can quickly notify PostgreSQL.


Test the archiving process

Right after the backup, let’s see what’s in the spool directory:

$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
000000010000000000000003.00000028.backup.ok
000000010000000000000003.ok

Generate a small database change and switch WAL:

$ sudo -iu postgres psql -c "DROP TABLE IF EXISTS my_table; CREATE TABLE my_table(id int); SELECT pg_switch_wal();"
NOTICE:  table "my_table" does not exist, skipping
 pg_switch_wal 
---------------
 0/4016850
(1 row)

Check the spool directory again:

$ ls -l /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
000000010000000000000004.ok

What’s in the archives directory?

$ ls /var/lib/pgbackrest/archive/some_cool_stanza_name/11-1/0000000100000000/
000000010000000000000003.00000028.backup
000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz

Break it!

$ sudo chmod -R 500 /var/lib/pgbackrest/archive/some_cool_stanza_name/11-1/

Generate a small database change and switch WAL:

$ sudo -iu postgres psql -c "DROP TABLE IF EXISTS my_table; CREATE TABLE my_table(id int); SELECT pg_switch_wal();"
 pg_switch_wal 
---------------
 0/501BF88
(1 row)

Check the archiving process:

$ ls /var/lib/pgsql/11/data/pg_wal/archive_status/
000000010000000000000003.00000028.backup.done
000000010000000000000005.ready

$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
000000010000000000000005.error

By default, a WAL segment is 16MB. We configured archive-push-queue-max to 100MB, so approximately 6 archived WAL segments.

What happens after the seventh failure?

Generate a small database change and switch WAL 5 more times with the same command as above.
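
Those 5 repetitions can be scripted; a quick sketch (the loop is mine, the inner command is the same as above):

$ for i in $(seq 1 5); do
    sudo -iu postgres psql -c "DROP TABLE IF EXISTS my_table; CREATE TABLE my_table(id int); SELECT pg_switch_wal();"
  done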

$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
000000010000000000000005.error
000000010000000000000006.error
000000010000000000000007.error
000000010000000000000008.error
000000010000000000000009.error
00000001000000000000000A.error

$ ps -ef |grep postgres |grep archiver
00:00:00 postgres: archiver   failed on 000000010000000000000005

The archiver process is still blocked on the first fail.

Generate the seventh failure:

$ sudo -iu postgres psql -c "DROP TABLE IF EXISTS my_table; CREATE TABLE my_table(id int); SELECT pg_switch_wal();"
 pg_switch_wal 
---------------
 0/B0025B8
(1 row)

$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
000000010000000000000005.ok
000000010000000000000006.ok
000000010000000000000007.ok
000000010000000000000008.ok
000000010000000000000009.ok
00000001000000000000000A.ok
00000001000000000000000B.ok

$ ps -ef |grep postgres |grep archiver
00:00:00 postgres: archiver   last was 00000001000000000000000B

The archiver isn’t failing anymore BUT there’s no WAL archived either:

$ sudo -iu postgres pgbackrest info --stanza=some_cool_stanza_name
stanza: some_cool_stanza_name
    status: ok
    cipher: none

    db (current)
        wal archive min/max (11-1): 000000010000000000000003/000000010000000000000004

        full backup: 20190325-142918F
            timestamp start/stop: 2019-03-25 14:29:18 / 2019-03-25 14:29:28
            wal start/stop: 000000010000000000000003 / 000000010000000000000003
            database size: 23.5MB, backup size: 23.5MB
            repository size: 2.8MB, repository backup size: 2.8MB

Repair it

$ sudo chmod -R 750 /var/lib/pgbackrest/archive/some_cool_stanza_name/11-1/

Generate a small database change, switch WAL and check the archiving process:

$ sudo -iu postgres psql -c "DROP TABLE IF EXISTS my_table; CREATE TABLE my_table(id int); SELECT pg_switch_wal();"
 pg_switch_wal 
---------------
 0/C0194A8
(1 row)

$ ls /var/spool/pgbackrest/archive/some_cool_stanza_name/out/
00000001000000000000000C.ok

$ sudo -iu postgres pgbackrest info --stanza=some_cool_stanza_name
stanza: some_cool_stanza_name
    status: ok
    cipher: none

    db (current)
        wal archive min/max (11-1): 000000010000000000000003/00000001000000000000000C

        full backup: 20190325-142918F
            timestamp start/stop: 2019-03-25 14:29:18 / 2019-03-25 14:29:28
            wal start/stop: 000000010000000000000003 / 000000010000000000000003
            database size: 23.5MB, backup size: 23.5MB
            repository size: 2.8MB, repository backup size: 2.8MB

$ ls /var/lib/pgbackrest/archive/some_cool_stanza_name/11-1/0000000100000000/
000000010000000000000003.00000028.backup
000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz
00000001000000000000000C-c90e3f9fbac504f51f44e1446c653d8a124dbd86.gz

Archiving is working again, but there are missing archives and pgBackRest doesn't see it.

The gap here is caused by archive-push-queue-max, but you could also get a gap simply from asynchronous archiving with process-max greater than 1.


check_pgbackrest 1.5

The new 1.5 release offers some interesting changes:

  • Add --debug option to print some debug messages.
  • Add --ignore-archived-since argument to ignore the WALs archived since the provided interval.
  • Add --latest-archive-age-alert to define the max age of the latest archived WAL before raising a critical alert.

Download check_pgbackrest:

$ sudo yum install -y perl-JSON
$ sudo -iu postgres
$ wget https://raw.githubusercontent.com/dalibo/check_pgbackrest/REL1_5/check_pgbackrest
$ chmod +x check_pgbackrest

This installation procedure is just a simple example.

Now, check the archives chain to know if there’s something missing:

$ ./check_pgbackrest --stanza=some_cool_stanza_name --service=archives \
                     --repo-path=/var/lib/pgbackrest/archive --format=human
Service        : WAL_ARCHIVES
Returns        : 2 (CRITICAL)
Message        : wrong sequence or missing file @ '000000010000000000000005'
Long message   : latest_archive_age=9m54s
Long message   : num_archives=3
Long message   : archives_dir=/var/lib/pgbackrest/archive/some_cool_stanza_name/11-1
Long message   : min_wal=000000010000000000000003
Long message   : max_wal=00000001000000000000000C
Long message   : oldest_archive=000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
Long message   : latest_archive=00000001000000000000000C-c90e3f9fbac504f51f44e1446c653d8a124dbd86.gz

Let’s ignore the latest archive producing the gap:

$ ./check_pgbackrest --stanza=some_cool_stanza_name --service=archives \
                     --repo-path=/var/lib/pgbackrest/archive --format=human \
                     --debug --ignore-archived-since=15m
DEBUG: file 000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz as interval since epoch : 36m52s
DEBUG: file 000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz as interval since epoch : 33m58s
DEBUG: file 00000001000000000000000C-c90e3f9fbac504f51f44e1446c653d8a124dbd86.gz as interval since epoch : 11m45s
DEBUG: max_wal changed to 000000010000000000000004
DEBUG: checking WAL 000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
DEBUG: checking WAL 000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz
Service        : WAL_ARCHIVES
Returns        : 0 (OK)
Message        : 2 WAL archived, latest archived since 33m58s
Long message   : latest_archive_age=33m58s
Long message   : num_archives=2
Long message   : archives_dir=/var/lib/pgbackrest/archive/some_cool_stanza_name/11-1
Long message   : min_wal=000000010000000000000003
Long message   : max_wal=000000010000000000000004
Long message   : oldest_archive=000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
Long message   : latest_archive=000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz

You also might want to receive an alert if the latest archive is too old:

$ ./check_pgbackrest --stanza=some_cool_stanza_name --service=archives \
                     --repo-path=/var/lib/pgbackrest/archive --format=human \
                     --ignore-archived-since=20m --latest-archive-age-alert=10m
Service        : WAL_ARCHIVES
Returns        : 2 (CRITICAL)
Message        : latest_archive_age (39m16s) exceeded
Long message   : latest_archive_age=39m16s
Long message   : num_archives=2
Long message   : archives_dir=/var/lib/pgbackrest/archive/some_cool_stanza_name/11-1
Long message   : min_wal=000000010000000000000003
Long message   : max_wal=000000010000000000000004
Long message   : oldest_archive=000000010000000000000003-5050f0829090a98c5f92ff112417a2bf6c115ffa.gz
Long message   : latest_archive=000000010000000000000004-3f9de64182e110ddcfe34d1191ad71c90f4fef3e.gz

The two options are combined here to avoid the alert on the missing archived WAL segments.


Conclusion

pgBackRest offers a lot of possibilities but, mainly for performance reasons, doesn't check the consistency of the archives.

Combine it with, for example, a good monitoring system and the check_pgbackrest plugin for more safety.
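
As a minimal sketch of such an integration (the schedule, script location and thresholds below are my assumptions, not values from this post), a cron entry could run the check periodically and log the result:

# /etc/cron.d/check_pgbackrest -- sketch, adapt path, schedule and thresholds
*/5 * * * * postgres /var/lib/pgsql/check_pgbackrest --stanza=some_cool_stanza_name --service=archives --repo-path=/var/lib/pgbackrest/archive --latest-archive-age-alert=1h >> /var/log/check_pgbackrest.log 2>&1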

Hans-Juergen Schoenig: Speeding up GROUP BY in PostgreSQL


In SQL, the GROUP BY clause groups records into summary rows and turns large amounts of data into a smaller set. GROUP BY returns one record for each group. While most people know how to use GROUP BY, not many actually know how to squeeze the last couple of percentage points out of the query. There is a small optimization which can help you to speed things up by a couple of percent quite reliably. If you want to speed up GROUP BY clauses, this post is for you.

Creating a test data set in PostgreSQL

To prepare ourselves for the aggregation we first have to generate some data:

test=# CREATE TABLE t_agg (x int, y int, z numeric);
CREATE TABLE
test=# INSERT INTO t_agg SELECT id % 2, id % 10000, random() 
	FROM 	generate_series(1, 10000000) AS id;
INSERT 0 10000000

The interesting part is that the first column only has 2 distinct values, while the second column will contain 10,000 different values. That is going to be of great importance for our optimization efforts.

Let us VACUUM the table to set hint bits and to build optimizer statistics. To make those execution plans more readable I also decided to turn off parallel queries:

test=# VACUUM ANALYZE ;
VACUUM
test=# SET max_parallel_workers_per_gather TO 0;
SET

Running a simple aggregation

Now that the data is in place, the first tests can be started:

test=# explain analyze SELECT x, y, avg(z) FROM t_agg GROUP BY 1, 2;
                                               QUERY PLAN                                                         
--------------------------------------------------------------------------------------------
 HashAggregate  (cost=238697.01..238946.71 rows=19976 width=40) 
		(actual time=3334.320..3339.929 rows=10000 loops=1)
   Group Key: x, y
   ->  Seq Scan on t_agg  (cost=0.00..163696.15 rows=10000115 width=19) 
		(actual time=0.058..636.763 rows=10000000 loops=1)
 Planning Time: 0.399 ms
 Execution Time: 3340.483 ms
(5 rows)

PostgreSQL will read the entire table sequentially and perform a hash aggregate. As you can see, most of the time is burned by the hash aggregate (3.3 seconds minus 636 milliseconds). The result set contains 10,000 rows, as the plan shows. However, we can do better. Keep in mind that the first column does not contain as many different values as the second column. That will have some implications as far as the hash aggregate is concerned. Let us try to play around with the GROUP BY clause.

Changing aggregation order can improve performance

Let us run the same query again, but this time we won't use "GROUP BY x, y" but instead "GROUP BY y, x". The result of the statement will be exactly the same as before (= 10,000 groups). However, the slightly modified query will be faster:

test=# explain analyze SELECT x, y, avg(z) FROM t_agg GROUP BY 2, 1;
                                                        QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=238697.01..238946.71 rows=19976 width=40) 
		(actual time=2911.989..2917.276 rows=10000 loops=1)
   Group Key: y, x
   ->  Seq Scan on t_agg  (cost=0.00..163696.15 rows=10000115 width=19) 
		(actual time=0.052..580.747 rows=10000000 loops=1)
 Planning Time: 0.144 ms
 Execution Time: 2917.706 ms
(5 rows)

Wow, the query has improved considerably. We saved around 400ms, which is a really big deal. The beauty is that we did not have to rearrange the data, change the table structure, adjust memory parameters or make any other changes to the server. All I did was to change the order in which PostgreSQL aggregated the data.

Which conclusions can developers draw from this example? If you are grouping by many different columns: put the columns with more distinct values first and the columns with fewer distinct values later. It will make the hash aggregate run more efficiently in many cases. Also try to make sure that work_mem is high enough to make PostgreSQL trigger a hash aggregate in the first place; using a hash is usually faster than letting PostgreSQL use the “group aggregate”.
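
A quick way to check which ordering to prefer on the test table from above is to count the distinct values per grouping column (a sketch; for real tables the n_distinct estimates in pg_stats serve the same purpose):

-- how many distinct values does each grouping column have?
SELECT count(DISTINCT x) AS distinct_x,
       count(DISTINCT y) AS distinct_y
FROM   t_agg;
-- put the column with the larger count first in the GROUP BY clause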

It is very likely that future versions of PostgreSQL (maybe starting with PostgreSQL 12?) will make this kind of change automatically. A patch has already been proposed by Teodor Sigaev and I am quite confident that this kind of optimization will make it into PostgreSQL 12. However, in the meantime it is easy to make the change by hand and enjoy a nice, basically free speedup.

If you want to learn more about GROUP BY, aggregations and work_mem in general, consider checking out my blog post about this topic. On behalf of the entire team I wish everybody “happy performance tuning”. If you want to learn more about aggregation and check out Teodor Sigaev’s patch, check out the PostgreSQL mailing list.

If you want to learn more about performance tuning, advanced SQL and so on, consider checking out one of our posts about window functions and analytics.

The post Speeding up GROUP BY in PostgreSQL appeared first on Cybertec.

Paul Ramsey: Notes for FDW in PostgreSQL 12


TL;DR: There are some changes in PostgreSQL 12 that FDW authors might be surprised by! Super technical, not suitable for ordinary humans.

OK, so I decided to update my two favourite extension projects (pgsql-http and pgsql-ogr-fdw) yesterday to support PostgreSQL 12 (which is the version currently under development likely to be released in the fall).

Fixing up pgsql-http was pretty easy, involving just one internal function signature change.

Fixing up pgsql-ogr-fdw involved some time in the debugger wondering what had changed.

Your Slot is Empty

When processing an FDW insert/update/delete, your code is expected to take a TupleTableSlot as input and use the data in that slot to apply the insert/update/delete operation to your backend data store, whatever that may be (OGR in my case). The data lived in the tts_values array, and the null flags in tts_isnull.

In PostgreSQL 12, the slot arrives at your ExecInsert/ExecUpdate/ExecDelete callback function empty! The tts_values array is populated with Datum values of 0, yet the tts_isnull array is full of true values. There’s no data to pass back to the FDW source.

What gives?!?

Andres Freund has been slowly laying the groundwork for pluggable storage in PostgreSQL, and one of the things that work has affected is TupleTableSlot. Now when you get a slot, it might not have been fully populated yet, and that is what is happening in the FDW code.

The short-term fix is just to force the slot to populate by calling slot_getallattrs, and then go on with your usual work. That’s what I did. A more future-proof way would be to use slot_getattr and only retrieve the attributes you need (assuming you don’t just need them all).
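
A minimal sketch of that short-term fix, in the shape of a helper called from an ExecForeignInsert-style callback (the function and variable names are illustrative, not the actual pgsql-ogr-fdw code):

#include "postgres.h"
#include "executor/tuptable.h"

/* sketch: deform the slot before reading tts_values / tts_isnull */
static void
copy_slot_to_backend(TupleTableSlot *slot)
{
    /* PostgreSQL 12: the slot may arrive "empty"; force all attributes to be populated */
    slot_getallattrs(slot);

    for (int i = 0; i < slot->tts_tupleDescriptor->natts; i++)
    {
        if (slot->tts_isnull[i])
            continue;                       /* NULL column, nothing to copy */

        Datum value = slot->tts_values[i];  /* now actually populated */
        /* ... hand value off to the foreign data store (OGR, etc.) ... */
        (void) value;
    }
}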

Your VarLena might have a Short Header

Varlena types are the variable size types, like text, bytea, and varchar. Varlena types store their length and some extra information in a header. The header is potentially either 4 bytes or 1 byte long. Practically it is almost always a 4 byte header. If you call the standard VARSIZE and VARDATA macros on a varlena, the macros assume a 4 byte header.

The assumption has always held (for me), but not any more!

I found that as of PostgreSQL 12, I’m getting back varchar data with a 1-byte header! Surprise!

The fix is to stop assuming a 4-byte header. If you want the size of the varlena data area, less the header, use VARSIZE_ANY_EXHDR instead of VARSIZE() - VARHDRSZ. If you want a pointer into the data area, use VARDATA_ANY instead of VARDATA. The “any” macros first test the header type, and then apply the appropriate macro.
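
In code, the header-agnostic pattern looks roughly like the sketch below (variable and function names are mine; this is essentially what text_to_cstring does for you):

#include "postgres.h"
#include "fmgr.h"

/* sketch: copy a text/varchar Datum without assuming a 4-byte varlena header */
static char *
varlena_to_cstring(Datum value)
{
    text   *t    = (text *) PG_DETOAST_DATUM(value);  /* also handles toasted input */
    Size    len  = VARSIZE_ANY_EXHDR(t);               /* data length, 1- or 4-byte header */
    char   *data = VARDATA_ANY(t);                     /* pointer into the data area */
    char   *out  = palloc(len + 1);

    memcpy(out, data, len);
    out[len] = '\0';
    return out;
}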

I have no idea what commit caused short varlena headers to make a comeback, but it was fun figuring out what the heck was going on.
