Channel: Planet PostgreSQL

Henrietta Dombrovskaya:


At PG Day Chicago, I presented an extended version of the talk I gave last year at Citus.con, “Temporal Tables and Standard.” Between the time my talk was accepted and the time I delivered it, I learned that PG 17 would include the first-ever support for an important temporal feature: uni-temporal primary keys and unique constraints.

It has been a while since I last presented anything temporal-related, which meant that many people in the audience had never heard of the bitemporal model. There was no way I could cover everything in 40 minutes, and many questions, asked both during the Q&A and later in the hallways, remained unanswered.

In this blog post, I will address some of these questions and describe what I would like to see in the upcoming Postgres implementation of temporal tables.

  1. Bitemporal framework and GiST. A key feature of the bitemporal model is its reliance on existing PG features, specifically GiST indexes and GiST-based exclusion constraints. In fact, GiST does all the work needed to support (bi)temporal primary/unique keys: an exclusion constraint ensures that no two rows with the same key value have overlapping time periods. In the bitemporal model, we check the same thing for both time dimensions. For those who have never needed GiST indexes, here is the relevant documentation. I learned about GiST when I first started implementing bitemporality, and I could not believe all my needs were already met!
  2. Disk space requirements. For some reason, people believe that keeping all versions of each tuple requires “too much disk space.” I won’t deny that storing row versions takes more space than not storing them; however, how much more is often overestimated. In my talk at PGConf.EU 2022, I compared storing a changelog with storing data in a bitemporal model and demonstrated that the bitemporal model actually takes less space while allowing queries to execute much faster.
  3. Excessive IO. One of the questions I was asked was whether using the bitemporal model increases system IO. The answer is: surprisingly little. Let’s look at the database operations. A temporal INSERT is the same as a regular INSERT. A non-temporal UPDATE is equivalent to one INSERT and one DELETE. A uni-temporal UPDATE results in one INSERT and one UPDATE; in other words, two inserts and one delete. A bitemporal UPDATE is equivalent to two inserts and one update; in other words, three inserts and one delete. That means the number of costly operations remains the same as with regular updates. Also, note one remarkable fact: the only field that changes in the updated record is the time interval. That means that 1) the record size does not change, and 2) since a GiST index on ranges is organized as an R-tree, where the order of intervals is defined by inclusion, end-dating a record only makes its interval smaller, so its node in the index never moves. As a result, GiST indexes in this case (almost) never experience bloat. As for regular B-tree indexes, all of the updates in the temporal models are HOT updates.
  4. Why is the first temporal feature in PG 17 so significant? Having temporal primary/unique keys in PG 17 might seem insignificant; after all, that’s what GiST with exclusion constraints does anyway. However, one of my biggest asks for many years was the ability to see temporal keys in a table description. I’ve invented a lot of tricks (mainly adding an “empty” check constraint) so that I could identify temporal tables using the psql \d command. Now, we can do it.
  5. System time or application time? Now I am going to switch to my questions and concerns about “what’s next” in temporal table support. When I first heard about the temporal key in PG 17 at SCaLE, I immediately asked the presenter when the second dimension would be added, and he replied: “Very soon. We are actively working on it, and we are going to implement everything in the standard.” That means, among other things, that Postgres should distinguish between SYSTEM_TIME (as per the standard) and application time, and I do not see this distinction in the ongoing discussions.
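The uni-temporal part of items 1) and 3) can be modeled in a few lines. Below is a minimal pure-Python sketch, not PostgreSQL code: the hypothetical `UniTemporalTable` rejects two rows with the same key and overlapping half-open periods, which is what an exclusion constraint such as `EXCLUDE USING gist (id WITH =, valid_at WITH &&)` enforces in the database (for a scalar `id`, that constraint needs the btree_gist extension). Its `update` method end-dates the current version and inserts a new one, counting a physical UPDATE as one insert plus one delete, as in 3). The class and method names, the `date.max` stand-in for an unbounded period, and the counting scheme are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

UNBOUNDED = date.max  # hypothetical stand-in for an open (NULL) upper bound


@dataclass(frozen=True)
class Period:
    """Half-open [lo, hi) interval, like PostgreSQL's default daterange bounds."""
    lo: date
    hi: date

    def overlaps(self, other: "Period") -> bool:
        # Same test as the && operator on two half-open ranges.
        return self.lo < other.hi and other.lo < self.hi


@dataclass
class UniTemporalTable:
    rows: list = field(default_factory=list)  # (key, Period, payload) tuples
    inserts: int = 0  # physical tuple versions written
    deletes: int = 0  # physical tuple versions marked dead

    def _check_no_overlap(self, key, period):
        # Models EXCLUDE USING gist (key WITH =, period WITH &&).
        for k, p, _ in self.rows:
            if k == key and p.overlaps(period):
                raise ValueError(f"key {key!r} overlaps an existing period")

    def insert(self, key, period, payload):
        self._check_no_overlap(key, period)
        self.rows.append((key, period, payload))
        self.inserts += 1

    def update(self, key, as_of, payload):
        """Uni-temporal UPDATE: end-date the current version, insert the new one."""
        for i, (k, p, old) in enumerate(self.rows):
            if k == key and p.hi == UNBOUNDED:
                break
        else:
            raise KeyError(key)
        if not p.lo < as_of:
            raise ValueError("new version must start after the current one")
        # End-dating is a plain UPDATE: one new tuple version plus one dead tuple.
        self.rows[i] = (k, Period(p.lo, as_of), old)
        self.inserts += 1
        self.deletes += 1
        # Only the period shrank, so the new version cannot overlap the old one.
        self.insert(key, Period(as_of, UNBOUNDED), payload)
```

A bitemporal version would carry a second period per row and apply the same overlap check to both dimensions; the sketch stops at one dimension to keep the operation count easy to follow.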

Why is this important? 

The SQL standard adds semantics to DML: a “FOR PERIOD” clause on all commands (with CURRENT assumed when it is omitted, so that existing code keeps working). However, for SYSTEM_TIME, “FOR” is irrelevant, because system time is defined as transaction time: a system-time period can only start at the “present” moment, so for any tuple in a temporal table it lies either in the past or extends to now. Application time, on the other hand, is not bound to a transaction, and “FOR” can specify any past, present, or future time period. In both cases, an “update” is not a regular update but a sequence of inserts and updates, as I described in 3). When we define temporal referential integrity, we need to take these semantics into account, and I have yet to see that happen. From my perspective, this test is not correct:

INSERT INTO temporal_rng (id, valid_at) VALUES
   ('[1,2)', daterange('2018-01-02', '2018-02-03')),
   ('[1,2)', daterange('2018-03-03', '2018-04-04')),
   ('[2,3)', daterange('2018-01-01', '2018-01-05')),
   ('[3,4)', daterange('2018-01-01', NULL));
ALTER TABLE temporal_fk_rng2rng
    DROP CONSTRAINT temporal_fk_rng2rng_fk;
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', daterange('2018-01-02', '2018-02-01'), '[1,2)');
ALTER TABLE temporal_fk_rng2rng
    ADD CONSTRAINT temporal_fk_rng2rng_fk
    FOREIGN KEY (parent_id, PERIOD valid_at)
    REFERENCES temporal_rng;
ALTER TABLE temporal_fk_rng2rng
    DROP CONSTRAINT temporal_fk_rng2rng_fk;
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[2,3)', daterange('2018-01-02', '2018-04-01'), '[1,2)');
-- should fail:
ALTER TABLE temporal_fk_rng2rng
    ADD CONSTRAINT temporal_fk_rng2rng_fk
    FOREIGN KEY (parent_id, PERIOD valid_at)
    REFERENCES temporal_rng;
ERROR:  insert or update on table "temporal_fk_rng2rng" violates foreign key constraint "temporal_fk_rng2rng_fk"
DETAIL:  Key (parent_id, valid_at)=([1,2), [2018-01-02,2018-04-01)) is not present in table "temporal_rng".

(It is quite possible that the link that was sent to me does not reflect the current status, so I am holding off judgment until I double-check; however, it is a good illustration of the importance of operation semantics.)
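Two pieces of the operation semantics discussed above can be sketched in plain Python, assuming half-open [lo, hi) date periods. The hypothetical `update_for_portion` models an application-time update of part of a row’s period in the way SQL:2011 describes `UPDATE ... FOR PORTION OF`: the overlapping piece receives the new values, and up to two leftover pieces keep the old ones, which is why such an update turns into a sequence of inserts and updates. The hypothetical `covered` models the check exercised by the test: a referencing period must be covered by the union of the referenced periods for the same key. Parent `[1,2)` has a gap from 2018-02-03 to 2018-03-03, so the second `temporal_fk_rng2rng` row fails. Neither function is PostgreSQL’s implementation.

```python
from datetime import date


def update_for_portion(row, portion):
    """Application-time update of `portion` of the period `row`.

    Returns the updated piece plus the untouched leftovers that would be
    re-inserted with the old values (zero, one, or two of them).
    """
    lo, hi = row
    start, end = max(lo, portion[0]), min(hi, portion[1])
    if start >= end:
        raise ValueError("portion does not overlap the row")
    leftovers = []
    if lo < start:
        leftovers.append((lo, start))  # part before the portion keeps old values
    if end < hi:
        leftovers.append((end, hi))  # part after the portion keeps old values
    return (start, end), leftovers


def covered(child, parents):
    """True if the child period is covered by the union of the parent periods."""
    lo, hi = child
    reached = lo  # how far the parents cover contiguously, starting at lo
    for p_lo, p_hi in sorted(parents):
        if p_lo > reached:
            break  # a gap: later parents start even later, so none can fill it
        reached = max(reached, p_hi)
    return reached >= hi


# Parent key [1,2) from the test above: two periods with a gap in February.
parents = [(date(2018, 1, 2), date(2018, 2, 3)), (date(2018, 3, 3), date(2018, 4, 4))]
print(covered((date(2018, 1, 2), date(2018, 2, 1)), parents))  # first child row
print(covered((date(2018, 1, 2), date(2018, 4, 1)), parents))  # second child row
```

For SYSTEM_TIME there is no portion to choose: a new version always starts at the transaction time, so only the current version can be split, and only at “now.”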

