Channel: Planet PostgreSQL

Michael Paquier: Postgres 12 highlight - Table Access Methods and blackholes


Postgres is very nice when it comes to extending it with custom plugins, as many facilities are available for that purpose.

After a heavy refactoring of the code, Postgres 12 ships with a basic infrastructure for table access methods, which allows customizing how table data is stored and accessed. By default, all tables in PostgreSQL use the historical heap, which works with 8kB pages stored in segment files of 1GB (default sizes), with full tuple versions stored. This means, in simple words, that even updating one attribute of a tuple requires storing a full new version, which also makes the work related to vacuum and autovacuum more costly. Well, the goal of this post is not to discuss that, and there is documentation on the matter, so please feel free to refer to it.

Table access methods are really cool, because they basically allow plugging directly into Postgres an equivalent of MySQL storage engines, making it possible to implement things like columnar storage, which is an area where heap is weak. What can be done falls roughly into two categories:

  • Access methods going through the storage manager of Postgres, which make use of the existing shared buffer layer, with the existing paging format. This has two advantages: backups and checksums are normally, and mostly, automatically supported.
  • Access methods not going through Postgres, which have the advantage of not relying on Postgres shared buffers (the page format can be a problem as well), making it possible to rely fully on the OS cache. Note that it is then up to you to add support for checksumming, backups, and such.

Table access methods invite a comparison with foreign data wrappers, but the reliability is quite different: one big point is that they are fully transactional with the backend they work with, which is usually a big deal for applications, and they have transparent DDL and command support (if implemented in the AM).

Last week at PGCon in Ottawa, there were two talks on the matter.

The presentation slides are attached directly on those links, and they will give you more details about the feature. Note that there have been recent discussions about new AMs, like zheap or zstore (names beginning with ‘z’ because that’s a cool letter to use in a name). It is also limiting to not have pluggable WAL (generic WAL can be used, but it is limited and not great performance-wise), but this problem is rather hard to tackle because, contrary to table AMs, WAL requires registering callbacks outside of system catalogs, and resource manager IDs (understand: a category of WAL records) need to have hard-coded values. Note that TIDs may also become a problem depending on the AM.

There is a large set of callbacks defining what a table AM is (42 as of this writing), and the interface may change in the future, but this version provides a very nice first cut.

On the flight back from Ottawa, I took a couple of hours to look at this set of callbacks and implemented a template for table access methods called blackhole_am. This AM is mainly here as a base for creating a new plugin, and it has the property of sending to the void any data stored in a table making use of it. Note that creating a table access method requires CREATE ACCESS METHOD, which is embedded directly in an extension here:

=# CREATE EXTENSION blackhole_am;
CREATE EXTENSION
=# \dx+ blackhole_am
   Objects in extension "blackhole_am"
           Object description
-----------------------------------------
 access method blackhole_am
 function blackhole_am_handler(internal)
(2 rows)

Then a table can be defined to use it, throwing away any data:

=# CREATE TABLE blackhole_tab (id int) USING blackhole_am;
CREATE TABLE
=# INSERT INTO blackhole_tab VALUES (generate_series(1,100));
INSERT 0 100
=# SELECT * FROM blackhole_tab;
 id
----
(0 rows)

Note that there is a parameter controlling the default table access method, called default_table_access_method, whose value is used when a CREATE TABLE has no USING clause. “heap” is the default. This feature opens a lot of doors and possibilities, so have fun with it.
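A quick, hedged illustration of that parameter, reusing the blackhole_am extension created above (blackhole_tab_2 is just a throwaway table name):

=# SET default_table_access_method = 'blackhole_am';
SET
=# -- with no USING clause, the new table picks up blackhole_am
=# CREATE TABLE blackhole_tab_2 (id int);
CREATE TABLE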


Venkata Nagothi: How to Optimize PostgreSQL Logical Replication


Logical Replication and pglogical are table-level, WAL-based replication mechanisms which replicate the data of specific tables between two PostgreSQL instances. There seems to be some confusion between “pglogical” and “Logical Replication”. Both of them provide the same kind of replication mechanism with some differences in features and capabilities. Logical Replication was introduced in PostgreSQL 10 as an in-built feature, unlike pglogical, which is an extension. pglogical, which continues to be actively developed, remains the only option for implementing Logical Replication in environments using PostgreSQL versions prior to 10. Eventually, all the features of pglogical will be part of Logical Replication. In other words, pglogical (the extension) became Logical Replication (the in-built feature). The basic advantage of Logical Replication is that it does not need any extensions to be installed or created, which in turn is beneficial to environments where installing extensions is restricted.

This blog will focus on optimizing Logical Replication. That means the optimization tips and techniques highlighted here apply to both pglogical and Logical Replication.

Logical Replication is a WAL-based replication mechanism, the first of its kind built into PostgreSQL. As a DBA, I find it a much more reliable and performant replication mechanism compared to trigger-based replication solutions. The changes made to the tables that are part of pglogical replication are replicated in real time via WAL records, which makes it highly efficient and uncomplicated. Most other logical replication mechanisms on the market are trigger based, which can pose performance and maintenance challenges. With Logical Replication coming in, dependency on trigger-based replication is almost gone.

There are other blogs which explain how to configure Logical Replication in quite some detail.

In this blog, the focus will be on how to optimize Logical Replication.

Optimizing Logical Replication

To begin with, the behaviour of “Logical Replication” is quite similar to “Streaming Replication”; the main difference is that streaming replication replicates the complete database instance whereas Logical Replication replicates only individual tables. When choosing specific individual tables to replicate, there are factors and challenges to foresee.

Let us take a look at factors influencing Logical replication.

Factors Influencing Logical Replication Performance

Optimizing Logical Replication is important to ensure data is replicated seamlessly without any interruptions. There are factors to foresee before setting it up. Let us take a look at them:

  • The type of data stored in the Tables to be replicated
  • How transactionally active are the tables (part of replication)
  • Infrastructure capacity must be foreseen
  • Parameter configuration must be optimally done

All of the above factors influence Logical Replication to a greater extent. Let us take a look at them in detail.

PostgreSQL Logical Replication Data Types

Understanding the type of data stored in the table is important. If a table that is part of replication stores large text or binary objects and encounters a high number of transactions, then replication might slow down due to high usage of infrastructure resources. The capacity of the infrastructure must be adequate to handle such complex and large data replication.

How Transactionally Active Are the Tables Being Replicated

When replicating highly transactionally active tables, replication might lag behind because of I/O performance issues, deadlocks, etc., which needs to be taken into consideration. This can make production database environments look unhealthy. If the number of tables being replicated is high and the data is replicated to multiple sites, then there might be high CPU usage, and more CPUs (or CPU cores) will be required.

Infrastructure Capacity

Before considering Logical Replication as a solution, it is important to ensure the infrastructure capacity of the database servers is adequate. If there is a high number of tables being replicated, then there must be enough CPUs available to do the replication job.

When replicating a high number of tables, consider splitting them into groups and replicating the groups in parallel. Again, this will need multiple CPUs to be available for replication. If the data changes to the tables being replicated are frequent and large, this might impact the replication performance as well.
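As a rough, hedged sketch of such a split (the table, publication and connection names here are made up for illustration), each group of tables gets its own publication, and each subscription created from it gets its own apply worker on the subscriber:

-- On the publisher: one publication per group of tables
CREATE PUBLICATION pub_sales FOR TABLE orders, order_items;
CREATE PUBLICATION pub_audit FOR TABLE audit_log;

-- On the subscriber: one subscription per publication, so the groups
-- are applied by separate worker processes in parallel
CREATE SUBSCRIPTION sub_sales
    CONNECTION 'host=provider dbname=appdb user=replicator'
    PUBLICATION pub_sales;
CREATE SUBSCRIPTION sub_audit
    CONNECTION 'host=provider dbname=appdb user=replicator'
    PUBLICATION pub_audit;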


Optimizing Parameters for Logical Replication

Parameters configured for Logical Replication functioning must be tuned optimally to ensure replication does not break.

Let us first take a look at parameters needed to configure it:

wal_level = 'logical'
max_wal_senders = 10                  # greater than the number of subscribers (or replicas)
max_replication_slots = 10            # greater than the number of subscribers (or replicas)
max_worker_processes = 10             # greater than the number of subscribers (or replicas)
max_logical_replication_workers       # greater than the number of subscribers (or replicas)
max_sync_workers_per_subscription     # depends on the number of tables being replicated

Tuning max_wal_senders

max_wal_senders must always be greater than the number of replicas. If the data is replicated to multiple sites, then multiple WAL sender processes come into play, so it is important to ensure this parameter is set to an optimal number.

Tuning max_replication_slots

In general, all the data changes occurring on the tables are written to WAL files in pg_xlog / pg_wal; these are termed WAL records. A WAL sender process picks up those WAL records (belonging to the tables being replicated) and sends them across to the replicas, and the wal receiver process on the replica site applies those changes at the subscriber node.

The WAL files are removed from the pg_xlog / pg_wal location whenever a checkpoint occurs. If the WAL files are removed even before the changes are applied to the subscriber node, then replication breaks and lags behind. In case the subscriber node lags behind, a replication slot ensures that all the WAL files needed for the subscriber to get in sync with the provider are retained. It is recommended to configure one replication slot for each subscriber node.
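To keep an eye on how much WAL each slot is retaining on the provider, a query along these lines can be used (a minimal sketch based on the standard pg_replication_slots view):

-- Amount of WAL retained for each replication slot on the provider
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;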

Tuning max_worker_processes

It is important to have an optimal number of worker processes configured. This depends on the maximum number of processes a server can have, which really only helps in multi-CPU environments. max_worker_processes ensures multiple processes are spawned to get the job done faster by utilizing multiple CPU cores. When replicating data using Logical Replication, this parameter can help generate multiple worker processes to replicate the data faster. There is a specific parameter called max_logical_replication_workers which ensures multiple processes are used to copy the data.

Tuning max_logical_replication_workers

This parameter specifies the maximum number of logical replication worker processes used to perform table data replication and synchronization. These workers are taken from the pool defined by max_worker_processes, which must be higher than this parameter's value. This parameter is very beneficial when replicating data to multiple sites in multi-CPU environments. The default is 4. The maximum value depends on how many worker processes the system supports.

Tuning max_sync_workers_per_subscription

This parameter specifies the maximum number of synchronization processes required per subscription. Synchronization processes run during the initial data sync, and this parameter can be used to make that happen faster. Currently, only one synchronization process can be configured per table, which means multiple tables can be synced initially in parallel. The default value is 2. These workers are taken from the pool defined by max_logical_replication_workers.

Those are the parameters which must be tuned to ensure Logical Replication is efficient and fast. The other parameters which also affect Logical Replication are as follows:

wal_receiver_timeout, wal_receiver_status_interval and wal_retrieve_retry_interval.

These parameters do not have any effect on the provider node.

Conclusion

Replicating specific tables is a common requirement which arises in large and complex database systems, whether for business reporting or data warehousing purposes. As a DBA, I do believe Logical Replication greatly caters to such purposes due to its easy implementation and low complexity. Configuring and tuning Logical Replication requires a good amount of planning, architecting and testing. The amount of data being replicated in real time must be evaluated to ensure an efficient and seamless replication system is in place. To conclude, for databases running on PostgreSQL 10 and above, Logical Replication is the way to go, while for databases running PostgreSQL versions prior to 10, pglogical is the option.

Hans-Juergen Schoenig: Tech preview: Improving COPY and bulkloading in PostgreSQL 12


If you are relying heavily on the PostgreSQL COPY command to load data into PostgreSQL quickly, PostgreSQL 12 might offer a feature, which is most likely very beneficial to you. Bulkloading is an important operation and every improvement in this area is certainly going to help many people out there, who want to import data into PostgreSQL as fast as possible.

COPY: Loading and unloading data as fast as possible

When taking a closer look at the syntax of the COPY command in PostgreSQL 12 you will quickly see two things:

• \h will now point to the correct page in the documentation
• COPY now supports a WHERE condition

Here is the complete syntax overview:

db12=# \h COPY
Command:     COPY
Description: copy data between a file and a table
Syntax:
COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]

COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | PROGRAM 'command' | STDOUT }
    [ [ WITH ] ( option [, ...] ) ]

where option can be one of:

    FORMAT format_name
    FREEZE [ boolean ]
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    HEADER [ boolean ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    FORCE_NOT_NULL ( column_name [, ...] )
    FORCE_NULL ( column_name [, ...] )
    ENCODING 'encoding_name'

URL: https://www.postgresql.org/docs/12/sql-copy.html

While having a link to the documentation around is certainly beneficial, the WHERE condition added in PostgreSQL 12 might be even more important. What is the purpose of this new feature? So far it was only possible to import a file completely. However, in some cases this has been a problem: more often than not, people only wanted to load a subset of the data and had to write a ton of code to filter it before the import, or after the data had already been written into the database.

COPY … WHERE: Applying filters while importing data

In PostgreSQL 12, data can easily be filtered while importing. The COPY command is pretty flexible and allows a lot of trickery. To show you how the new WHERE clause works, I have compiled a simple example:

db12=# CREATE TABLE t_demo AS 
		SELECT * FROM generate_series(1, 1000) AS id;
SELECT 1000

First of all, 1000 rows are generated to make sure that we have some data to play with. Then we export the content of this table to a file:

db12=# COPY t_demo TO '/tmp/file.txt';
COPY 1000

Finally we can try to import this data again:

db12=# CREATE TABLE t_import (x int);
CREATE TABLE
db12=# COPY t_import FROM '/tmp/file.txt' WHERE x < 5;
COPY 4
db12=# SELECT * FROM t_import;
 x 
---
 1
 2
 3
 4
(4 rows)

As you can see, filtering data is pretty simple and very straightforward. One important thing to note here: I exported an “id” column and imported it as “x”. Keep in mind that the text file does not know the data structure of our target table – you have to make sure that you filter on the column names of the table you want to import into.

Old gems revisited …

If you are new to PostgreSQL in general, I also want to present one of the older features, which I personally like a lot. COPY can send data to a UNIX pipe or read data from a pipe. Here is how it works:

db12=# COPY t_demo TO PROGRAM 'gzip -c > /tmp/file.txt.gz';
COPY 1000
db12=# COPY t_import FROM PROGRAM 'gunzip -c /tmp/file.txt.gz' 
	WHERE x BETWEEN 100 AND 103;
COPY 4
db12=# SELECT * FROM t_import WHERE x >= 100;
  x  
-----
 100
 101
 102
 103
(4 rows)

In some cases you might want to do more than just export data. In this case I decided to compress the data while exporting. Before the data is imported again, it is uncompressed and filtered once more. As you can see, it is pretty simple to combine those features in a flexible way.

If you want to learn more about PostgreSQL and loading data in general, check out our post about rules and triggers. If you want to learn more about COPY, check out the PostgreSQL documentation.

The post Tech preview: Improving COPY and bulkloading in PostgreSQL 12 appeared first on Cybertec.

Julien Rouhaud: PoWA 4: changes in powa-archivist!


This article is part of the PoWA 4 beta series, and describes the changes done in powa-archivist.

For more information about this v4, you can consult the general introduction article.

Quick overview

First of all, you have to know that there is no upgrade possible from v3 to v4, so a DROP EXTENSION powa is required if you were already using PoWA on any of your servers. This is because this v4 involved a lot of changes in the SQL part of the extension, making it the most significant change in the PoWA suite for this new version. Looking at the amount of changes at the time I'm writing this article, I get:

 CHANGELOG.md       |   14 +
 powa--4.0.0dev.sql | 2075 +++++++++++++++++++++-------
 powa.c             |   44 +-
 3 files changed, 1629 insertions(+), 504 deletions(-)

The lack of an upgrade path shouldn't be a problem in practice though. PoWA is a performance tool, so it's intended to store data with high precision but with a very limited history. If you're looking for a general monitoring solution keeping months of counters, PoWA is definitely not the tool you need.

Configuring the list of remote servers

Concerning the features themselves, the first small change is that powa-archivist does not require the background worker to be active anymore, as it won't be used for a remote setup. That means that a PostgreSQL restart is not needed anymore to install PoWA. Obviously, a restart is still required if you want to use the local setup, using the background worker, or if you want to install additional extensions that themselves require a restart.

Then, as PoWA needs some configuration (frequency of snapshots, data retention and so on), some new tables are added to be able to configure all of that. The new powa_servers table stores the configuration for all the remote instances whose data should be stored on this instance. This local PoWA instance is called a repository server (which typically should be dedicated to storing PoWA data), as opposed to the remote instances, which are the instances you want to monitor. The content of this table is pretty straightforward:

\d powa_servers
                                 Table "public.powa_servers"
    Column     |   Type   | Collation | Nullable |                 Default
---------------+----------+-----------+----------+------------------------------------------
 id            | integer  |           | not null | nextval('powa_servers_id_seq'::regclass)
 hostname      | text     |           | not null |
 alias         | text     |           |          |
 port          | integer  |           | not null |
 username      | text     |           | not null |
 password      | text     |           |          |
 dbname        | text     |           | not null |
 frequency     | integer  |           | not null | 300
 powa_coalesce | integer  |           | not null | 100
 retention     | interval |           | not null | '1 day'::interval

If you already used PoWA, you should recognize most of the configuration options, that are now stored here. The new options are used to describe how to connect to the remote servers, and can provide an alias to be displayed in the UI.

You also probably noticed a password column here. Storing a password in plain text in this table is a heresy as far as security is concerned. So, as mentioned in the PoWA security section of the documentation, you can store a NULL password and instead use any of the authentication methods that libpq supports (.pgpass file, certificate…). That's strongly recommended for any non-toy setup.

Another table, the powa_snapshot_metas table, is also added to store some metadata regarding each remote server's snapshots:

Table"public.powa_snapshot_metas"Column|Type|Collation|Nullable|Default--------------+--------------------------+-----------+----------+---------------------------------------srvid|integer||notnull|coalesce_seq|bigint||notnull|1snapts|timestampwithtimezone||notnull|'-infinity'::timestampwithtimezoneaggts|timestampwithtimezone||notnull|'-infinity'::timestampwithtimezonepurgets|timestampwithtimezone||notnull|'-infinity'::timestampwithtimezoneerrors|text[]

That's basically a counter to track the number of snapshots done, the timestamps of each kind of event that happened (snapshot, aggregate and purge), and a text array to store any errors happening during snapshots, so that the UI can display them.

SQL API to configure the remote servers

While those tables are simple, a basic SQL API is available to register new servers and configure them. Basically, 6 functions are available:

  • powa_register_server(), to declare a new remote server, and the list of extensions available on it
  • powa_configure_server() to update any setting for the specified remote server (using a JSON where the key is the name of the parameter to change, and the value is the new value to use)
  • powa_deactivate_server() to disable snapshots on the specified remote server (which actually is setting up the frequency to -1)
  • powa_delete_and_purge_server() to remove the specified remote server from the list of servers and remove all associated snapshot data
  • powa_activate_extension(), to declare that a new extension is available on the specified remote server
  • powa_deactivate_extension(), to specify that an extension is not available anymore on the specified remote server

Any action more complicated than this should be performed using plain SQL queries. Hopefully there shouldn't be many other needs, and the tables are straightforward, so this shouldn't be a problem. Feel free to ask for more functions if you feel the need though. Please also note that the UI doesn't allow you to call those functions, as the UI is, for now, entirely read-only.

Performing remote snapshots

As metrics are now stored on a different PostgreSQL instance, we had to extensively change the way snapshots (retrieving the data from a stat extension and storing it in the PoWA catalogs in a space-efficient way) are performed.

The list of all stat extensions, or data sources, that are available on a server (either remote or local) and for which we should perform a snapshot is configured in a table called powa_functions:

Table"public.powa_functions"Column|Type|Collation|Nullable|Default----------------+---------+-----------+----------+---------srvid|integer||notnull|module|text||notnull|operation|text||notnull|function_name|text||notnull|query_source|text|||added_manually|boolean||notnull|trueenabled|boolean||notnull|truepriority|numeric||notnull|10

A new query_source field is added, which provides the name of a source function, required to support remote snapshots of any stat extension. This function is used to export the counters provided by this extension on a different server, into a dedicated transient table. The snapshot function then performs the snapshot using that exported data instead of the data provided locally by the stat extension when the remote mode is used. Note that the counters export and the remote snapshot are done automatically with the new powa-collector daemon, which I'll cover in another article.

Here's an example of how PoWA performs a remote snapshot of the list of databases. As you'll see, this is very simplistic, meaning that it's very easy to add support for a new stat extension.

The transient table:

Unloggedtable"public.powa_databases_src_tmp"Column|Type|Collation|Nullable|Default---------+---------+-----------+----------+---------srvid|integer||notnull|oid|oid||notnull|datname|name||notnull|

For better performance, all the transient tables are unlogged, as their content is only needed during a snapshot and is trashed afterwards. In this example the transient table only stores the identifier of the server the data belongs to, and the oid and name of each database present on the remote server.

And the source function:

CREATE OR REPLACE FUNCTION public.powa_databases_src(_srvid integer,
    OUT oid oid, OUT datname name)
 RETURNS SETOF record
 LANGUAGE plpgsql
AS $function$
BEGIN
    IF (_srvid = 0) THEN
        RETURN QUERY SELECT d.oid, d.datname
        FROM pg_database d;
    ELSE
        RETURN QUERY SELECT d.oid, d.datname
        FROM powa_databases_src_tmp d
        WHERE srvid = _srvid;
    END IF;
END;
$function$

This function simply returns the content of pg_database if local data are asked (server id 0 is always the local server), or the content of the transient table for the given remote server otherwise.

The snapshot function can then easily do any required work with the data for the wanted remote server. In the case of the powa_databases_snapshot() function, that work is just synchronizing the list of databases, and storing the timestamp of removal if a previously existing database is not found anymore.

For more details, you can consult the PoWA datasource integration documentation, which was updated for the version 4 specificities.

PoWA 4: changes in powa-archivist! was originally published by Julien Rouhaud at rjuju's home on June 05, 2019.

Mark Wong: PDXPUG June Meetup: Accessing Postgres with Java


When: 6-8pm Thursday June 20, 2019
Where: PSU Business Accelerator (Parking is open after 5pm.)
Who: Will McLean

Following the presentations on accessing Postgres from Python and Scala, I will lead a discussion on accessing Postgres with Java. I'll start with a JDBC tutorial and finish by adding data access to a Spring Boot webapp.

I have twenty years of experience in e-commerce applications, the last eight here in Portland, mostly at Nike. For the last few years everything has been moving to Amazon RDS Postgres; that's a trend pdxpug can get behind! I am currently working for Navis on CRM applications for the hospitality industry.

Robert Haas: The Contributors Team

Recently, the PostgreSQL project spun up a "contributors" team, whose mission is to ensure that the PostgreSQL contributors list is up-to-date and fair. The contributors page has a note which says "To suggest additions to the list, please email contributors@postgresql.org."  The current members of the team are Dave Page, Stephen Frost, Vik Fearing, and me.


Álvaro Hernández: PostgreSQL Ibiza: 2 weeks to go

Five-day PostgreSQL networking experience, embedding a 2-day conference. Just 2 weeks to go for PostgreSQL Ibiza, the new, innovative PostgreSQL conference that happens 50m away from a beach. The conference for thinkers, for networking, for partnering. The conference to be at. But a conference is nothing without great content. And after receiving more than 71 talk submissions, and the hard work that the Committee has done to select the talks, PostgreSQL Ibiza will have top-notch talks from top-notch international speakers.

Robert Treat: The Lost Art of plpgsql


One of the big features talked about when PostgreSQL 11 was released was the new stored procedure implementation. This gave Postgres a more standard procedure interface compared to the previous use of functions. It is particularly useful for folks doing database migrations, where they may have been using the standard CALL syntax vs Postgres' traditional SELECT function(); syntax. So it struck me as odd earlier this year when I noticed that, despite the hoopla, a year later there was almost nothing in the way of presentations and blog posts on either the new stored procedure functionality or the use of plpgsql in general.
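For readers who have not seen the two calling styles side by side, here is a minimal sketch (the procedure, function and event_log table are made-up names for illustration, not from the talk):

-- A table used by the examples below (hypothetical)
CREATE TABLE event_log (message text, logged_at timestamptz);

-- PostgreSQL 11+: a true procedure, invoked with the standard CALL syntax
CREATE PROCEDURE log_event(msg text)
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO event_log (message, logged_at) VALUES (msg, now());
END;
$$;

CALL log_event('nightly batch finished');

-- The traditional approach: a function invoked via SELECT
CREATE FUNCTION log_event_fn(msg text) RETURNS void
LANGUAGE sql
AS $$ INSERT INTO event_log (message, logged_at) VALUES (msg, now()); $$;

SELECT log_event_fn('nightly batch finished');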

And so I got the idea that maybe I would write such a talk and present it at PGCon; a nod to the past and the many years I've spent working with plpgsql in a variety of roles. The committee liked the idea (disclosure: I am on the PGCon committee, but didn't advocate for myself) and so this talk was born. For a first-time talk I think it turned out well, though it could definitely use some polish; but I'm happy that it did help spark some conversation and has actually given me a few items worth following up on, hopefully in future blog posts.

Video should be available in a few weeks, but for now, I've gone ahead and uploaded the slides on slideshare.


Bruce Momjian: Exploring Postgres Tips and Tricks


I did a webinar two weeks ago titled, "Exploring Postgres Tips and Tricks." The slides are now online, as well as a video recording. I wasn't happy with the transition I used from the PDF to the blog entries, but now know how to improve that next time.

I think I might do more of these by expanding on some of the topics I covered, like psql and monitoring. Also, a new video is available of the sharding presentation I mentioned previously.

elein mustain: Beautiful things, strings.

This blog today is going to talk about strings: how they are stored, how they are input, and lots of examples of how to use string operators and functions in order to manipulate them. Strings, strings, strings. What we are not going to cover is regular expressions, although we will use them. The Fine Manual […]

Paul Ramsey: Parallel PostGIS and PgSQL 12 (2)


In my last post I demonstrated that PostgreSQL 12 with PostGIS 3 will provide, for the first time, automagical parallelization of many common spatial queries.

This is huge news, as it opens up the possibility of extracting more performance from modern server hardware. Commenters on the post immediately began conjuring images of 32-core machines reducing their query times to milliseconds.

So, the next question is: how much more performance can we expect?

To investigate, I acquired a 16 core machine on AWS (m5d.4xlarge), and installed the current development snapshots of PostgreSQL and PostGIS, the code that will become versions 12 and 3 respectively, when released in the fall.

How Many Workers?

The number of workers assigned to a query is determined by PostgreSQL: the system looks at a given query, and the size of the relations to be processed, and assigns workers proportional to the log of the relation size.

For parallel plans, the “explain” output of PostgreSQL will include a count of the number of workers planned and assigned. That count is exclusive of the leader process, and the leader process actually does work outside of its duties in coordinating the query, so the number of CPUs actually working is more than the num_workers, but slightly less than num_workers+1. For these graphs, we’ll assume the leader fully participates in the work, and that the number of CPUs in play is num_workers+1.

Forcing Workers

PostgreSQL’s automatic calculation of the number of workers could be a blocker to performing analysis of parallel performance, but fortunately there is a workaround.

Tables support a “storage parameter” called parallel_workers. When a relation with parallel_workers set participates in a parallel plan, the value of parallel_workers overrides the automatically calculated number of workers.

ALTER TABLE pd SET (parallel_workers = 8);

In order to generate my data, I re-ran my queries, upping the number of parallel_workers on my tables for each run.

Setup

Before running the tests, I set all the global limits on workers high enough to use all the CPUs on my test server.

SET max_worker_processes = 16;
SET max_parallel_workers = 16;
SET max_parallel_workers_per_gather = 16;

I also loaded my data and created indexes as usual. The tables I used for these tests were:

  • pd a table of 69,534 polygons
  • pts_10 a table of 695,340 points

Scan Performance

I tested two kinds of queries: a straight scan query, with only one table in play; and, a spatial join with two tables. I used the usual queries from my annual parallel tests.

EXPLAIN ANALYZE SELECT Sum(ST_Area(geom)) FROM pd;

Scan performance improved well at first, but started to flatten out noticeably after 8 cores.

Workers     1     2     4     8     16
Time (ms)   318   167   105   62    47

The default number of CPUs the system wanted to use was 4 (1 leader + 3 workers), which is probably not a bad choice, as the expected gains from additional workers shallow out as the core count grows.

Join Performance

The join query computes the join of 69K polygons against 695K points. The points are actually generated from the polygons, so there are precisely 10 points in each polygon, so the resulting relation would be 690K records long.

EXPLAIN ANALYZE SELECT * FROM pd JOIN pts_10 pts ON ST_Intersects(pd.geom, pts.geom);

For unknown reasons, it was impossible to force out a join plan with only 1 worker (aka 2 CPUs) so that part of our chart/table is empty.

Workers     1       2   4      8      16
Time (ms)   26789   -   9371   5169   4043

The default number of workers is again 4 (1 leader + 3 workers) which, again, isn’t bad. The join performance shallows out faster than the scan performance, and above 10 CPUs is basically flat.

Conclusions

  • There is a limit to how much advantage adding workers to a plan will gain you
  • The limit feels intuitively lower than I expected given the CPU-intensity of the workloads
  • The planner does a pretty good, slightly conservative, job of picking a realistic number of workers

Luca Ferrari: Checking the sequences status on a single pass


It is quite simple to wrap a couple of queries in a function to have a glance at all the sequences and their cycling status.

Checking the sequences status on a single pass

The catalog pg_sequence keeps track of the definition of every sequence, including the increment value and the boundaries. Combined with pg_class and a few other functions, it makes it possible to create a very simple administrative function to keep track of the overall sequence status.

I’ve created a seq_check() function that provides an output as follows:

testdb=# select * from seq_check() ORDER BY remaining;
        seq_name        | current_value |    lim     | remaining
------------------------+---------------+------------+------------
 public.persona_pk_seq  |       5000000 | 2147483647 |     214248
 public.root_pk_seq     |         50000 | 2147483647 | 2147433647
 public.students_pk_seq |             7 | 2147483647 | 2147483640
(3 rows)

As you can see, the function provides the current value of the sequence, the maximum value (lim) and how many values the sequence can still provide before it overflows or cycles. For example, persona_pk_seq only has 214248 values left to provide. Combined with its current value, 5000000, this hints that the sequence probably has a too-large increment interval.

The code of the function is as follows:

CREATE OR REPLACE FUNCTION seq_check()
RETURNS TABLE (seq_name text, current_value bigint, lim ...
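The full seq_check() source is in the linked repository. As a rough alternative (not the author's function), similar numbers can be pulled straight from the pg_sequences view on PostgreSQL 10 and above, at least for ascending sequences:

-- Approximation of the seq_check() output using pg_sequences
SELECT schemaname || '.' || sequencename AS seq_name,
       last_value                        AS current_value,
       max_value                         AS lim,
       (max_value - COALESCE(last_value, 0)) / increment_by AS remaining
FROM pg_sequences
ORDER BY remaining;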

Luca Ferrari: FizzBuzz (in both plpgsql and SQL)


While listening to a great talk by Benno Rice, I was pointed to the FizzBuzz algorithm. How hard could it be to implement it using PostgreSQL?

FizzBuzz (in both plpgsql and SQL)

FizzBuzz is often used as a quick screening question during job interviews: the idea is that if you cannot get the algorithm right, you are not a programmer at all!
The algorithm can be described as:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

Now, how hard could it be? You can find my implementation here. Well, implementing it in plpgsql is as simple as:

CREATE OR REPLACE FUNCTION fizzbuzz(start_number int DEFAULT 1,
                                    end_number   int DEFAULT 100)
RETURNS VOID
AS $CODE$
DECLARE
    current_number int;
    current_value  text;
BEGIN
    -- check arguments
    IF start_number >= end_number THEN
        RAISE EXCEPTION 'The start number must be lower then the end one! From % to %',
                        start_number, end_number;
    END IF;

    FOR current_number IN start_number .. end_number LOOP
        current_value := NULL;
        IF current_number % 3 = 0 THEN
            current_value := 'Fizz';
        END IF;
        IF current_number %
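The plpgsql version is cut off in the feed above; the post's title also promises a pure-SQL variant. A minimal SQL-only sketch (not necessarily the author's exact query) could be:

-- FizzBuzz in a single SQL query
SELECT CASE
           WHEN n % 15 = 0 THEN 'FizzBuzz'
           WHEN n % 3  = 0 THEN 'Fizz'
           WHEN n % 5  = 0 THEN 'Buzz'
           ELSE n::text
       END AS fizzbuzz
FROM generate_series(1, 100) AS n;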

Hans-Juergen Schoenig: Tech preview: How PostgreSQL 12 handles prepared plans


PostgreSQL 12 is just around the corner, and therefore we already want to present some of the new features we like. One important new feature gives users and devops the chance to control the behavior of the PostgreSQL optimizer. Prepared plans are always a major concern (people moving from Oracle seem to be especially concerned about them), and therefore it makes sense to discuss the way plans are handled in PostgreSQL 12.

Firing up a PostgreSQL test database

To start I will create a simple table consisting of just two fields:

db12=# CREATE TABLE t_sample (id serial, name text);
CREATE TABLE

Then some data is loaded:

db12=# INSERT INTO t_sample (name)
       SELECT 'hans' FROM generate_series(1, 1000000);
INSERT 0 1000000

db12=# INSERT INTO t_sample (name)
       SELECT 'paul' FROM generate_series(1, 2);
INSERT 0 2

Note that 1 million names are identical (“hans”) and just two people are called “paul”. The distribution of data is therefore quite special, which has a major impact as you will see later in this post.

To show how plans can change depending on the setting, an index on “name” is defined as shown in the next listing:

db12=# CREATE INDEX idx_name ON t_sample (name);
CREATE INDEX

The PostgreSQL query optimizer at work

Let us run a simple query and see what happens:

db12=# explain SELECT count(*) FROM t_sample WHERE name = 'hans';
                        QUERY PLAN
------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
  -> Gather (cost=12656.01..12656.22 rows=2 width=8)
     Workers Planned: 2
     -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)  
     -> Parallel Seq Scan on t_sample 
          (cost=0.00..10614.34 rows=416668 width=0)
        Filter: (name = 'hans'::text)
(6 rows)

In this case PostgreSQL decided to ignore the index and go for a sequential scan. It has even seen that the table is already quite large and opted for a parallel query. Still, what we see is a sequential scan: all data in the table has to be processed. Why is that? Remember: most people in the table have the same name. It is faster to read the entire table and filter out the few non-matching rows than to read almost the entire index first and then the table anyway. The planner figures (correctly) that running a sequential scan will be faster.

What you can take away from this example is that an index is not used because it exists – PostgreSQL uses indexes when they happen to make sense. If we search for a less frequent value, PostgreSQL will decide on using the index and offer us the optimal plan shown in the next listing:

db12=# explain SELECT count(*) FROM t_sample WHERE name = 'paul';
                 QUERY PLAN
------------------------------------------------
Aggregate (cost=4.45..4.46 rows=1 width=8)
  -> Index Only Scan using idx_name on t_sample 
       (cost=0.42..4.44 rows=1 width=0)
     Index Cond: (name = 'paul'::text)
(3 rows)

Optimizer statistics: Fueling good performance

If you are looking for good performance, keeping an eye on optimizer statistics is definitely a good idea. The main question now is: Which data does the optimizer keep? pg_stats contains information about each column:

db12=# \x
Expanded display is on.
db12=# SELECT *
       FROM   pg_stats
       WHERE  tablename = 't_sample'
             AND attname = 'name';
-[ RECORD 1 ] 
--------------------------+------------
schemaname                | public
tablename                 | t_sample
attname                   | name
inherited                 | f
null_frac                 | 0
avg_width                 | 5
n_distinct                | 1
most_common_vals          | {hans}
most_common_freqs         | {1}
histogram_bounds          |
correlation               | 1
most_common_elems         |
most_common_elem_freqs    |
elem_count_histogram      |

PostgreSQL keeps track of the percentage of NULL entries in a column (null_frac), the average width of the column, and the estimated number of distinct values (n_distinct: are all values different, or all the same?). Then PostgreSQL keeps a list of the most frequent entries as well as their frequencies. The histogram_bounds column contains the statistical distribution of the data. In our example you will only find entries in this field if you look at the "id" column: there are only two names, so keeping a histogram for "name" is basically pointless. The correlation column tells us about the physical order of rows on disk. This field can be pretty important, because it helps the optimizer estimate the amount of I/O.
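To see histogram_bounds actually populated, the same pg_stats query can simply be pointed at the "id" column (output not shown here, as it depends on the sampled data):

db12=# SELECT histogram_bounds, correlation
       FROM   pg_stats
       WHERE  tablename = 't_sample'
             AND attname = 'id';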

Preparing plans manually

If you send a query to PostgreSQL it is usually planned when the query is sent. However, if you explicitly want to prepare a query, you can make use of the PREPARE / EXECUTE commands. Here is how it works:

db12=# PREPARE myplan(text) AS
       SELECT  count(*)
       FROM    t_sample
       WHERE   name = $1;
PREPARE
db12=# EXECUTE myplan('paul');
count
-------
 2
(1 row)

As you can see the following query will give us an indexscan:

db12=# explain EXECUTE myplan('paul');
                         QUERY PLAN
-------------------------------------------------
Aggregate (cost=4.45..4.46 rows=1 width=8)
  -> Index Only Scan using idx_name on t_sample 
     (cost=0.42..4.44 rows=1 width=0)
     Index Cond: (name = 'paul'::text)
(3 rows)

If we fall back to the more common value we will again get a parallel sequential scan:

db12=# explain EXECUTE myplan('hans');
                              QUERY PLAN
-------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
  -> Gather (cost=12656.01..12656.22 rows=2 width=8)
     Workers Planned: 2
     -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)   
     -> Parallel Seq Scan on t_sample 
          (cost=0.00..10614.34 rows=416668 width=0)
        Filter: (name = 'hans'::text)
(6 rows)

Why is that the case? In PostgreSQL, life is not so straightforward. Even if we prepare explicitly, we will still get a “fresh plan” a few times before a generic plan is created. What is a generic plan? A generic plan is one made without looking at the concrete parameter values. The idea is to keep the plan and execute it multiple times, in the hope that overall performance goes up due to lower planning overhead. Up to PostgreSQL 11 this process has been a bit “obscure” to most people.

Here is how the “obscure” thing works in detail. There are two ways PostgreSQL can choose to execute a prepared statement:

  • It could create a new plan for every execution that considers the current parameter value. That will lead to the best possible plan, but having to plan the query for every execution can remove most of the benefit for an OLTP application, which is to avoid having to plan the same statement over and over again.
  •  It could create a “generic plan” that does not take the parameter values into account. That will avoid re-planning the statement every time, but it can lead to problems during execution if the best plan depends heavily on the parameter values.

By default, PostgreSQL chooses a “middle road”: it will generate a “custom plan” during the first 5 executions of the prepared statement that takes the parameter values into account. From the sixth execution on, it will check if the generic plan would have performed as well (by comparing the estimated execution costs of the custom and the generic plan). If it thinks that the generic plan would have done just as well, the prepared statement will always use the generic plan from that point on.

PostgreSQL 12 introduces a new variable, which allows users to control the behavior more explicitly. Let us try the same thing again and enforce a generic plan:

db12=# SET plan_cache_mode = 'force_generic_plan';
SET
db12=# PREPARE newplan(text) AS
       SELECT count(*)
       FROM   t_sample
       WHERE  name = $1;
PREPARE
db12=# explain EXECUTE newplan('hans');
                       QUERY PLAN
-----------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
  -> Gather (cost=12656.01..12656.22 rows=2 width=8)
     Workers Planned: 2
     -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
     -> Parallel Seq Scan on t_sample 
          (cost=0.00..10614.34 rows=416668 width=0)
        Filter: (name = $1)
(6 rows)

db12=# explain EXECUTE newplan('paul');
                         QUERY PLAN
-------------------------------------------------------------------
Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
  -> Gather (cost=12656.01..12656.22 rows=2 width=8)
     Workers Planned: 2
     -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
     -> Parallel Seq Scan on t_sample 
          (cost=0.00..10614.34 rows=416668 width=0)
        Filter: (name = $1)
(6 rows)

What you see here is that the plan is constant and PostgreSQL does not attempt replanning. Planning time will be cut, BUT that does not necessarily mean that you always win. You might save some CPU cycles on planning the query, but it also means that the plan you are using is not necessarily optimal for your parameters.

plan_cache_mode: Valid parameters

If you want to play around with plan_cache_mode you can try the following values:

db12=# SET plan_cache_mode = 'force_custom_plan';
SET

db12=# SET plan_cache_mode = 'force_generic_plan';
SET

db12=# SET plan_cache_mode = 'auto';
SET

“auto”, which is the default value, resembles the traditional behavior of letting PostgreSQL choose whether to use a generic plan or not. You might ask what good it could be to use a prepared statement with “force_custom_plan”. The main reason is that using prepared statements is the best way to prevent SQL injection attacks, so it may be worth using them even if you don't save on planning time.

If you want to learn more about PostgreSQL 12, consider checking out our blog post about optimizer support functions.

The post Tech preview: How PostgreSQL 12 handles prepared plans appeared first on Cybertec.

Jeff McCormick: What's New in Crunchy PostgreSQL Operator 4.0


Luca Ferrari: A recursive CTE to get information about partitions


I was wondering about writing a function that provides a quick status about partitioning. But wait, PostgreSQL has recursive CTEs!

A recursive CTE to get information about partitions

I'm used to partitioning; it allows me to quickly and precisely split data across different tables. PostgreSQL 10 introduced native partitioning, and since then I've been using native partitioning over inheritance whenever possible.
But how do you get a quick overview of the partitioning status? I mean, knowing which partition is growing the most?
In the beginning I was thinking of writing a function to do that task, quickly finding myself iterating recursively over pg_inherits, the catalog that links partitions to their parents. But the keyword here is recursively: PostgreSQL provides recursive Common Table Expressions, and a quick search revealed I was right: it is possible to do it with a single CTE. Taking inspiration from this mailing list message, here is a simple CTE to get the partition status (you can find it in my GitHub repository):

WITH RECURSIVE inheritance_tree AS (
    SELECT c.oid            AS table_oid,
           c.relname        AS table_name,
           NULL::text       AS table_parent_name,
           c.relispartition AS is_partition
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'p'
      AND c.relispartition = false
    UNION ALL
    SELECT inh.inhrelid AS table_oid,
           c.relname    AS table_name,
           ...
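The CTE is truncated in the feed above; the full version is in the linked repository. As a rough alternative covering just a single level of partitioning, partition sizes can also be pulled directly from pg_inherits (this is a hedged sketch, not the CTE from the post):

-- Which partitions are growing the most (single level of partitioning assumed)
SELECT parent.relname AS partitioned_table,
       child.relname  AS partition_name,
       pg_size_pretty(pg_relation_size(child.oid)) AS partition_size
FROM pg_inherits i
JOIN pg_class parent ON parent.oid = i.inhparent
JOIN pg_class child  ON child.oid  = i.inhrelid
WHERE parent.relkind = 'p'
ORDER BY pg_relation_size(child.oid) DESC;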

Avinash Kumar: Bloom Indexes in PostgreSQL


There is a wide variety of indexes available in PostgreSQL. While most are common in almost all databases, there are some types of indexes that are more specific to PostgreSQL. For example, GIN indexes are helpful to speed up the search for element values within documents. GIN and GiST indexes could both be used for making full-text searches faster, whereas BRIN indexes are more useful when dealing with large tables, as they only store the summary information of a page. We will look at these indexes in more detail in future blog posts. For now, I would like to talk about another of the special indexes that can speed up searches on a table with a huge number of columns and which is massive in size. And that is called a bloom index.

In order to understand the bloom index better, let's first understand the bloom filter data structure. I will try to keep the description as short as I can so that we can discuss more about how to create this index and when it will be useful.

Most readers will know that an array in computer science is a data structure that consists of a collection of elements, while a bit, or binary digit, is the smallest unit of data, represented with either 0 or 1. A bloom filter is a bit array of m bits that are all initially set to 0.

A bit array is an array that stores a certain number of bits (0 and 1). It is one of the most space-efficient data structures to test whether an element is in a set or not.

Why use bloom filters?

Let's consider some alternatives, such as a list data structure and hash tables. In the case of a list data structure, we need to iterate through each element in the list to search for a specific element. We could also maintain a hash table where each element in the list is hashed, and then see if the hash of the element we are searching for matches a hash in the table. But checking through all the hashes may be more expensive than expected, and if there is a hash collision, linear probing may be time-consuming. When we put hash tables on disk, they require additional IO and storage. For a more efficient solution, we can look to bloom filters, which are similar to hash tables.

Type I and Type II errors

While using bloom filters, we may see a result that falls into a

type I error
but never a
type II error
. A nice example of a type I error is a result that a person with last name: “vallarapu” exists in the relation: foo.bar whereas it does not exist in reality (a
false positive
conclusion). An example for a type II error is a result that a person with the last name as “vallarapu” does not exist in the relation: foo.bar, but in reality, it does exist (a
false negative
conclusion). A bloom filter is 100% accurate when it says the element is not present. But when it says the element is present, it may be 90% accurate or less. So it is usually called a
probabilistic data structure
.

The bloom filter algorithm

Let's now understand the algorithm behind bloom filters better. As discussed earlier, it is a bit array of m bits, where m is a certain number, and we need k hash functions. In order to tell whether an element exists and to give away the item pointer of the element, the element (data in columns) is passed to the hash functions. Let's say that there are only two hash functions to store the presence of the first element “avi” in the bit array. When the word “avi” is passed to the first hash function, it may generate the output 4, and the second may give the output 5. So now the bit array could look like the following:

All the bits are initially set to 0. Once we store the existence of the element “avi” in the bloom filter, it sets the 4th and 5th bits to 1. Let's now store the existence of the word “percona”. This word is again passed to both hash functions; assume that the first hash function generates the value 5 and the second hash function generates the value 6. So the bit array now looks like the following – since the 5th bit was already set to 1 earlier, nothing changes there:

Now, consider that our query is searching for a predicate with the name “avi”. The input “avi” will now be passed to the hash functions. The first hash function returns the value 4 and the second returns the value 5, as these are the same hash functions that were used earlier. Now when we look at positions 4 and 5 of the bloom filter (bit array), we can see that the values are set to 1. This means that the element is present.

Collision with bloom filters

Consider a query that is fetching the records of a table with the name “don”. When the word “don” is passed to both hash functions, the first hash function returns the value 6 (let's say) and the second hash function returns the value 4. As the bits at positions 6 and 4 are set to 1, the membership is confirmed and we see from the result that a record with the name “don” is present. In reality, it is not. This is how a collision can occur. However, this is not a serious problem.

A point to remember is: “the fewer the hash functions, the greater the chances of collisions; the more the hash functions, the lesser the chances of collisions. But if we have k hash functions, the time it takes to validate membership is on the order of k”.

Bloom Indexes in PostgreSQL

As you'll now have understood bloom filters, you'll know that a bloom index uses bloom filters. When you have a table with many columns, and queries use many different combinations of those columns as predicates, you could need many indexes. Maintaining so many indexes is not only costly for the database but is also a performance killer when dealing with larger data sets.

So, if you create a bloom index on all these columns, a hash is calculated for each of the columns and merged into a single index entry of the specified length for each row/record. When you specify the list of columns on which you need a bloom filter, you can also choose how many bits are set per column. The following is an example of the syntax, with the length of each index entry and the number of bits for specific columns:

CREATE INDEX bloom_idx_bar ON foo.bar USING bloom (id,dept_id,zipcode)
WITH (length=80, col1=4, col2=2, col3=4);

length is rounded to the nearest multiple of 16. The default is 80 and the maximum is 4096. The default number of bits per column is 2; we can specify a maximum of 4095 bits.

Bits per each column

Here is what it means in theory when we have specified length = 80 and col1=2, col2=2, col3=4. A bit array of length 80 bits is created per row or record. The data in col1 (column1) is passed to two hash functions because col1 was set to 2 bits. Let's say these two hash functions generate the values 20 and 40. The bits at the 20th and 40th positions are then set to 1 within the 80 bits (m), since the length is specified as 80 bits. The data in col3 is passed to four hash functions; let's say the values generated are 2, 4, 9 and 10. So four bits – 2, 4, 9 and 10 – are set to 1 within the 80 bits.

There may be many empty bits, but this allows for more randomness across the bit arrays of the individual rows. Using a signature function, a signature is stored in the index data page for each record, along with the row pointer that points to the actual row in the table. Now, when a query uses an equality operator on a column that has been indexed using bloom, the number of hash functions already set for that column is used to generate the appropriate number of hash values – let's say four for col3, giving 2, 4, 9, 10. The index data is searched row by row to see which index entries have those bits (the bit positions generated by the hash functions) set to 1.

And finally, it finds that a certain number of rows have all of these bits set to 1. The greater the length and the bits per column, the more the randomness and the fewer the false positives. But the greater the length, the greater the size of the index.

Bloom Extension

The bloom index is shipped in the contrib module as an extension, so you must create the bloom extension in order to take advantage of this index, using the following command:

CREATE EXTENSION bloom;

Example

Let’s start with an example. I am going to create a table with multiple columns and insert 100 million records.

percona=# CREATE TABLE foo.bar (id int, dept int, id2 int, id3 int, id4 int, id5 int,id6 int,id7 int,details text, zipcode int);
CREATE TABLE
percona=# INSERT INTO foo.bar SELECT (random() * 1000000)::int, (random() * 1000000)::int,
(random() * 1000000)::int,(random() * 1000000)::int,(random() * 1000000)::int,(random() * 1000000)::int,
(random() * 1000000)::int,(random() * 1000000)::int,md5(g::text), floor(random()* (20000-9999 + 1) + 9999)
from generate_series(1,100*1e6) g;
INSERT 0 100000000

The size of the table is now 9647 MB as you can see below.

percona=# \dt+ foo.bar
                   List of relations
 Schema | Name | Type  |  Owner   |  Size   | Description
--------+------+-------+----------+---------+-------------
 foo    | bar  | table | postgres | 9647 MB |
(1 row)

Let's say that the columns id, dept, id2, id3, id4, id5, id6 and zipcode of the table foo.bar are all used, in random combinations, by several queries serving different reporting purposes. If we create an individual index on each column, every one of those indexes is going to take almost 2 GB of disk space.
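
If you want to verify that estimate, one quick (and purely illustrative) way is to create a single-column btree index, check its size, and drop it again; the index name here is just an example:

percona=# CREATE INDEX idx_bar_id ON foo.bar (id);
percona=# SELECT pg_size_pretty(pg_relation_size('foo.idx_bar_id'));
percona=# DROP INDEX foo.idx_bar_id;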

Testing with btree indexes

We’ll try creating a single btree index on all the columns that are most used by the queries hitting this table. As you can see in the following log, it took 91115.397 ms to create this index and the size of the index is 4743 MB.

postgres=# CREATE INDEX idx_btree_bar ON foo.bar (id, dept, id2,id3,id4,id5,id6,zipcode);
CREATE INDEX
Time: 91115.397 ms (01:31.115)
postgres=# \di+ foo.idx_btree_bar
                             List of relations
 Schema |     Name      | Type  |  Owner   | Table |  Size   | Description
--------+---------------+-------+----------+-------+---------+-------------
 foo    | idx_btree_bar | index | postgres | bar   | 4743 MB |
(1 row)

Now, let's try some of the queries with a random selection of columns. You can see that the execution times of these queries are 2440.374 ms and 2406.498 ms for query 1 and query 2 respectively. To avoid skewing the results with disk IO, I made sure that the execution plans were captured after the index had been cached in memory.
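
If you want to warm the cache explicitly yourself, one option is the pg_prewarm contrib extension (this is just one way to do it; running each query twice and timing the second run also works):

postgres=# CREATE EXTENSION pg_prewarm;
postgres=# SELECT pg_prewarm('foo.idx_btree_bar');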

Query 1
-------
postgres=# EXPLAIN ANALYZE select * from foo.bar where id4 = 295294 and zipcode = 13266;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Index Scan using idx_btree_bar on bar  (cost=0.57..1607120.58 rows=1 width=69) (actual time=1832.389..2440.334 rows=1 loops=1)
   Index Cond: ((id4 = 295294) AND (zipcode = 13266))
 Planning Time: 0.079 ms
 Execution Time: 2440.374 ms
(4 rows)
Query 2
-------
postgres=# EXPLAIN ANALYZE select * from foo.bar where id5 = 281326 and id6 = 894198;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_btree_bar on bar  (cost=0.57..1607120.58 rows=1 width=69) (actual time=1806.237..2406.475 rows=1 loops=1)
   Index Cond: ((id5 = 281326) AND (id6 = 894198))
 Planning Time: 0.096 ms
 Execution Time: 2406.498 ms
(4 rows)

Testing with Bloom Indexes

Let's now create a bloom index on the same columns. As you can see from the following log, there is a huge size difference between the bloom index (1342 MB) and the btree index (4743 MB). This is the first win. It took almost the same time to create both indexes.

postgres=# CREATE INDEX idx_bloom_bar ON foo.bar USING bloom(id, dept, id2, id3, id4, id5, id6, zipcode)
WITH (length=64, col1=4, col2=4, col3=4, col4=4, col5=4, col6=4, col7=4, col8=4);
CREATE INDEX
Time: 94833.801 ms (01:34.834)
postgres=# \di+ foo.idx_bloom_bar
                             List of relations
 Schema |     Name      | Type  |  Owner   | Table |  Size   | Description
--------+---------------+-------+----------+-------+---------+-------------
 foo    | idx_bloom_bar | index | postgres | bar   | 1342 MB |
(1 row)

Let’s run the same queries, check the execution time, and observe the difference.

Query 1
-------
postgres=# EXPLAIN ANALYZE select * from foo.bar where id4 = 295294 and zipcode = 13266;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on bar  (cost=1171823.08..1171824.10 rows=1 width=69) (actual time=1265.269..1265.550 rows=1 loops=1)
   Recheck Cond: ((id4 = 295294) AND (zipcode = 13266))
   Rows Removed by Index Recheck: 2984788
   Heap Blocks: exact=59099 lossy=36090
   ->  Bitmap Index Scan on idx_bloom_bar  (cost=0.00..1171823.08 rows=1 width=0) (actual time=653.865..653.865 rows=99046 loops=1)
         Index Cond: ((id4 = 295294) AND (zipcode = 13266))
 Planning Time: 0.073 ms
 Execution Time: 1265.576 ms
(8 rows)
Query 2
-------
postgres=# EXPLAIN ANALYZE select * from foo.bar where id5 = 281326 and id6 = 894198;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on bar  (cost=1171823.08..1171824.10 rows=1 width=69) (actual time=950.561..950.799 rows=1 loops=1)
   Recheck Cond: ((id5 = 281326) AND (id6 = 894198))
   Rows Removed by Index Recheck: 2983893
   Heap Blocks: exact=58739 lossy=36084
   ->  Bitmap Index Scan on idx_bloom_bar  (cost=0.00..1171823.08 rows=1 width=0) (actual time=401.588..401.588 rows=98631 loops=1)
         Index Cond: ((id5 = 281326) AND (id6 = 894198))
 Planning Time: 0.072 ms
 Execution Time: 950.827 ms
(8 rows)

From the above tests, it is evident that the bloom index performed better: query 1 took 1265.576 ms with the bloom index versus 2440.374 ms with the btree index, and query 2 took 950.827 ms with bloom versus 2406.498 ms with btree. However, the same test would favour a btree index if you had created it on just those two columns (instead of on many columns).

Reducing False Positives

If you look at the execution plans generated with the bloom index (consider query 2), the bitmap index scan returned 98631 candidate rows, yet the query returned only one row. The remaining 98630 rows are false positives. A btree index would not return any false positives.

To reduce the false positives, you may have to increase both the signature length and the bits per column, guided by some of the formulas mentioned in this interesting blog post and by experimentation and testing. As you increase the length and the bits, you will likely see the bloom index grow in size, but it may return fewer false positives. If the time spent rechecking false positives dominates the query, increase the length; if increasing the length does not make much difference to performance, leave it as it is.
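
Purely as an illustration of the syntax (these values are not tuned for this data set), the index could be rebuilt with a longer signature and more bits per column, and its size and query times compared again:

postgres=# DROP INDEX foo.idx_bloom_bar;
postgres=# CREATE INDEX idx_bloom_bar ON foo.bar USING bloom(id, dept, id2, id3, id4, id5, id6, zipcode)
WITH (length=128, col1=8, col2=8, col3=8, col4=8, col5=8, col6=8, col7=8, col8=8);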

Points to be carefully noted

  1. In the above tests, we have seen how a bloom index performed better than a btree index. But, in reality, if we had created a btree index on just the two columns being used as predicates, the query would have performed much faster with the btree index than with the bloom index. This index does not replace a btree index unless we wish to replace a whole chunk of indexes with a single bloom index; see the sketch after this list.
  2. Just like hash indexes, a bloom index is applicable for equality operators only.
  3. Some formulas on how to calculate the appropriate length of a bloom filter and the bits per column can be read on Wikipedia or in this blog post.
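
To illustrate point 1 (a sketch only, with a made-up index name; actual timings will depend on your data), a btree index on just the two predicate columns of query 2 would look like this:

postgres=# CREATE INDEX idx_btree_id5_id6 ON foo.bar (id5, id6);
postgres=# EXPLAIN ANALYZE select * from foo.bar where id5 = 281326 and id6 = 894198;

Such an index lets the planner do a direct index scan on exactly the two key columns, with no false positives and no recheck step, which is why a dedicated btree wins when only a couple of column combinations actually matter.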

Conclusion

Bloom indexes are very helpful when we have a table that stores huge amounts of data across a lot of columns and we find it difficult to create a large number of indexes, especially in OLAP environments where data is loaded from several sources and maintained for reporting. You could consider testing a single bloom index to see whether you can avoid implementing a huge number of individual or composite indexes that would take additional disk space without much performance gain.

pgCMH - Columbus, OH: What’s new in pgBouncer


The June meeting will be held at 18:00 EST on Tues, the 25th. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.

What

CoverMyMeds' very own CJ will be presenting this month, telling us what's new and improved in pgBouncer as well as how to get it up and running. The discussion will include real-life examples from its use at CMM. pgBouncer is the lightweight connection pooler for PostgreSQL.

Where

CoverMyMeds has graciously agreed to validate your parking if you use their garage so please park there:

You can safely ignore any signs saying not to park in the garage, as long as you arrive after 17:30.

Park in any space that is not marked ‘24 hour reserved’.

Once parked, take the elevator/stairs to the 3rd floor to reach the Miranova lobby. Once in the lobby, the elevator bank is in the back (West side) of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. (If the elevator won’t let you pick the 11th floor, contact Doug or CJ (info below)). Once you exit the elevator, look to your left and right; one side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space:

Community space as seen from the stage

The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

If you have any issues or questions with parking or the elevators, feel free to text/call Doug at +1.614.316.5079 or CJ at +1.740.407.7043

Hubert 'depesz' Lubaczewski: Changes on explain.depesz.com

Recently I got two bug reports: plans with "COSTS OFF" do not parse and error out (bug report by Marc Dean Jr), and WorkTable Scan is not properly parsed (bug report by Ivan Vergiliev). Additionally, I was kinda upset because plans that include trigger calls did not display properly. All of this has been fixed today: First, I fixed … Continue reading "Changes on explain.depesz.com"

Jonathan Katz: Explaining CVE-2019-10164 + PostgreSQL Security Best Practices


The PostgreSQL Global Development Group released an out-of-cycle update for all supported versions to provide a fix for the CVE-2019-10164 vulnerability. This vulnerability only affects people running PostgreSQL 10, 11, or the 12 beta, and it is effectively remediated by simply upgrading all of your PostgreSQL installations to the latest versions.

What follows is some more insight into what this vulnerability is, the impact it can have on your environment, and how to ensure you have patched all of your systems, along with some PostgreSQL security best practices that could help mitigate the impact of this kind of vulnerability.
