Postgres has been lacking something for quite a while, and more than a few people have attempted to fill the gap. I'm speaking, of course, about parallel queries. The need arises for several reasons, chief among them the distribution and sharding requirements of large data sets. When tables reach hundreds of millions or even billions of rows, even high-cardinality indexes produce results very slowly.
I recently ran across an extension called pmpp, short for Poor Man's Parallel Processing, and decided to give it a try. It uses Postgres' dblink system to invoke queries asynchronously and then collates the results locally, which allows further queries on the combined result set as if it were a temporary table.
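To get a feel for what that means, here's a rough sketch of the asynchronous dblink pattern pmpp automates. This is not pmpp's actual code, just an illustration assuming the dblink extension and a local 'postgres' database:

CREATE EXTENSION IF NOT EXISTS dblink;

SELECT dblink_connect('conn_1', 'dbname=postgres');
SELECT dblink_connect('conn_2', 'dbname=postgres');

-- Both queries start immediately; neither call blocks.
SELECT dblink_send_query('conn_1', 'SELECT count(*) FROM pg_class');
SELECT dblink_send_query('conn_2', 'SELECT count(*) FROM pg_attribute');

-- Harvest each result; each call waits only on its own connection.
SELECT * FROM dblink_get_result('conn_1') AS r(total BIGINT);
SELECT * FROM dblink_get_result('conn_2') AS r(total BIGINT);

SELECT dblink_disconnect('conn_1');
SELECT dblink_disconnect('conn_2');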
Theoretically this should be ideal for a distributed cluster of shards, so let's see what happens if we try it with our sensor_log table in that configuration:
CREATE TABLE sensor_log (
  sensor_log_id  SERIAL PRIMARY KEY,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  reading_date   TIMESTAMP NOT NULL
);

CREATE INDEX idx_sensor_log_location ON sensor_log (location);
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
CREATE INDEX idx_sensor_log_time ON sensor_log ((reading_date::TIME));

CREATE SCHEMA shard_1;
SET search_path TO shard_1;
CREATE TABLE sensor_log (LIKE public.sensor_log INCLUDING ALL)
       INHERITS (public.sensor_log);

CREATE SCHEMA shard_2;
SET search_path TO shard_2;
CREATE TABLE sensor_log (LIKE public.sensor_log INCLUDING ALL)
       INHERITS (public.sensor_log);

CREATE SCHEMA shard_3;
SET search_path TO shard_3;
CREATE TABLE sensor_log (LIKE public.sensor_log INCLUDING ALL)
       INHERITS (public.sensor_log);

CREATE SCHEMA shard_4;
SET search_path TO shard_4;
CREATE TABLE sensor_log (LIKE public.sensor_log INCLUDING ALL)
       INHERITS (public.sensor_log);
The top sensor_log table in the public schema exists merely so we can query all the table sets as a whole without a bunch of UNION statements. This should allow us to simulate how such a query would run without the benefit of parallel execution on each shard.
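For anyone who wants to verify the wiring, the pg_inherits catalog lists the children attached to the parent:

SELECT inhrelid::regclass AS child
  FROM pg_inherits
 WHERE inhparent = 'public.sensor_log'::regclass;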
Now we need to fill the shards with data. Fortunately the generate_series function can increment by arbitrary amounts, so simulating a hash function for distribution is pretty easy. Here's what that looks like:
INSERT INTO shard_1.sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL
  FROM generate_series(1, 4000000, 4) s(id);

INSERT INTO shard_2.sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL
  FROM generate_series(2, 4000000, 4) s(id);

INSERT INTO shard_3.sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL
  FROM generate_series(3, 4000000, 4) s(id);

INSERT INTO shard_4.sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL
  FROM generate_series(4, 4000000, 4) s(id);
Clearly a real sharding scenario would involve a lot more work in distributing the data. But this is poor man's parallelism, so it's only appropriate to have a bunch of lazy shards to go with it.
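As a quick sanity check, the tableoid system column shows how the rows landed across the inheritance children; with the modulus trick above, each shard should report a million rows:

SELECT tableoid::regclass AS shard, count(*)
  FROM public.sensor_log
 GROUP BY 1
 ORDER BY 1;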
In any case, we're ready to query these tables. The way we generated the data, each table contains a million rows representing about six weeks of entries. A not infrequent use case for this structure is checking various time periods distributed across multiple days. That's why we created the index on the TIME portion of our reading_date column.
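One detail worth noting: an expression index is only considered when the query uses the same expression it was built on, so the predicate must be written as reading_date::TIME. A quick EXPLAIN against one shard should confirm idx_sensor_log_time gets chosen:

EXPLAIN
SELECT count(*)
  FROM shard_1.sensor_log
 WHERE reading_date::TIME >= '14:00'
   AND reading_date::TIME < '15:00';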
If, for example, we wanted to examine how 2 PM looked across all of our data, we would do something like this:
\timing on

SELECT count(*) FROM public.sensor_log
 WHERE reading_date::TIME >= '14:00'
   AND reading_date::TIME < '15:00';

Time: 1215.589 ms

SELECT count(*) FROM shard_1.sensor_log
 WHERE reading_date::TIME >= '14:00'
   AND reading_date::TIME < '15:00';

Time: 265.620 ms
The second run against a single partition is included to give some insight into how fast the query could be if all four partitions were checked at once. Here's where the pmpp extension comes into play: it lets us send as many queries as we want in parallel and pulls the results as they complete. Each query can even be sent over a different connection.
For the sake of simplicity, we’ll just simulate the remote connections with a local loopback to the database where we created all of the shards. In a more advanced scenario, we would be using at least two Postgres instances on potentially separate servers.
Prepare to be amazed!
CREATE EXTENSION postgres_fdw;
CREATE EXTENSION pmpp;

CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', dbname 'postgres', port '5433');

CREATE USER MAPPING FOR postgres SERVER loopback
OPTIONS (user 'postgres', password 'test');

\timing on

CREATE TEMP TABLE tmp_foo (total INT);

SELECT sum(total)
  FROM pmpp.distribute(NULL::tmp_foo, 'loopback', array[
         'SELECT count(*) FROM shard_1.sensor_log
           WHERE reading_date::TIME >= ''14:00''
             AND reading_date::TIME < ''15:00''',
         'SELECT count(*) FROM shard_2.sensor_log
           WHERE reading_date::TIME >= ''14:00''
             AND reading_date::TIME < ''15:00''',
         'SELECT count(*) FROM shard_3.sensor_log
           WHERE reading_date::TIME >= ''14:00''
             AND reading_date::TIME < ''15:00''',
         'SELECT count(*) FROM shard_4.sensor_log
           WHERE reading_date::TIME >= ''14:00''
             AND reading_date::TIME < ''15:00''']);

Time: 349.503 ms
Nice, eh? With a bit more “wrapping” to hide the ugliness of broadcasting a query to multiple servers, this has some major potential! Since we separated the shards by schema, we could even bootstrap the connections so the same query could be sent to each without any further modifications. Of course, the Postgres foreign data wrapper doesn’t let us set the schema for created servers, so we’d need another workaround like pg_service.conf, but the components are there.
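As a sketch of what that wrapping might look like, we could generate the query array from a single template. Note that broadcast_shards() is a hypothetical helper built on format(), not part of pmpp:

-- Hypothetical helper: expand one query template across all four
-- shard schemas, yielding the text[] that pmpp.distribute() expects.
CREATE OR REPLACE FUNCTION broadcast_shards(query_tpl TEXT)
RETURNS TEXT[] AS $$
  SELECT array_agg(format(query_tpl, 'shard_' || s))
    FROM generate_series(1, 4) s;
$$ LANGUAGE sql;

-- Usage: %I marks where the schema name belongs.
SELECT sum(total)
  FROM pmpp.distribute(NULL::tmp_foo, 'loopback',
       broadcast_shards(
         'SELECT count(*) FROM %I.sensor_log
           WHERE reading_date::TIME >= ''14:00''
             AND reading_date::TIME < ''15:00'''));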
The primary caveat is that this approach works best for queries that drastically reduce the result set using aggregates. Postgres needs to run the query on each system, fetch the results into a temporary structure, and then return them again from the distribute() function. That means speed is inversely proportional to the number of rows being transferred; there's a lot of overhead involved.
Take a look at what happens when we try to run the aggregate locally:
\timing on

SELECT count(*)
  FROM pmpp.distribute(NULL::sensor_log, 'loopback', array[
         'SELECT * FROM shard_1.sensor_log
           WHERE reading_date::TIME >= ''14:00''
             AND reading_date::TIME < ''15:00''',
         'SELECT * FROM shard_2.sensor_log
           WHERE reading_date::TIME >= ''14:00''
             AND reading_date::TIME < ''15:00''',
         'SELECT * FROM shard_3.sensor_log
           WHERE reading_date::TIME >= ''14:00''
             AND reading_date::TIME < ''15:00''',
         'SELECT * FROM shard_4.sensor_log
           WHERE reading_date::TIME >= ''14:00''
             AND reading_date::TIME < ''15:00''']);

Time: 4059.308 ms
Basically, do anything possible to reduce the result set before it leaves the remote side. Perform aggregation remotely whenever possible, and don't be surprised if pulling back tens of thousands of rows takes a bit longer than expected.
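For instance, following the same distribute() pattern shown above, a per-location count can be aggregated on each shard and merely re-combined locally, so at most a thousand rows per shard cross the wire instead of the raw data. The tmp_loc scratch table here is just a new row-shape stand-in like tmp_foo was:

CREATE TEMP TABLE tmp_loc (location VARCHAR, total INT);

SELECT location, sum(total) AS total
  FROM pmpp.distribute(NULL::tmp_loc, 'loopback', array[
         'SELECT location, count(*) FROM shard_1.sensor_log GROUP BY location',
         'SELECT location, count(*) FROM shard_2.sensor_log GROUP BY location',
         'SELECT location, count(*) FROM shard_3.sensor_log GROUP BY location',
         'SELECT location, count(*) FROM shard_4.sensor_log GROUP BY location'])
 GROUP BY location;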
Given that caveat, this is a very powerful extension. It’s still very early in its development cycle, and could use some more functionality, but the core is definitely there. Now please excuse me while I contemplate ways to glue it to my shard_manager extension and methods of wrapping it to simplify invocation.