Quantcast
Channel: Planet PostgreSQL
Viewing all articles
Browse latest Browse all 9650

Rikard Pavelic: Fast Postgres from .NET

$
0
0

It's often said that abstractions slow down your program, since they add layers which makes your application slower.
While this is generally correct, it's not always true.
Performance can be improved somewhat by removing layers, but the best way to improve performance is to change algorithms.

So let's see how we can beat performance of writing SQL and doing object materialization by hand, as it is (wrongly) common knowledge that this is the fastest way to talk to the database.

First use case, simple single table access

CREATETABLE Post
(
  id UUIDPRIMARYKEY,
  title VARCHARNOTNULL,
  created DATENOTNULL)

So – the standard pattern to access such a table would be:

SELECT*FROM Post

ignoring (for now) that it would probably be a better style to explicitly name columns. Alternatively, in Postgres we can also do:

SELECT p FROM Post p

which would return a tuple for each row in the table.

For the first query, without going too deep into the actual Postgres protocol, we would get three "columns" with length and content. Parsing such a response would look something like this:

IDataReader dr =...returnnew Post { 
    id = dr.GetGuid(0), 
    title = dr.GetString(1),
    created = dr.GetDateTime(2)};

The second query, on the other hand, has only one "column" with length and content. Parsing such a response requires knowledge of Postgres rules for tuple assembly and is similar to parsing JSON. The code would look like this:

IDataReader dr =...return PostgresDriverLibrary.Parse<Post>(dr.GetValue(0));

In the TEXT protocol, the response from Postgres would look like this:

(f4d84c89-c179-4ae4-991a-e2e6bc12d879,"some text",2015-03-12)

So, now we can raise a couple of questions:

  • is it faster or slower for Postgres to return the second version?
  • can we parse the second response faster than the first response on the client side?

To make things more interesting, let's investigate how would it compare talking to Postgres using BINARY protocol in first case and using TEXT protocol for second case. Common knowledge tells us that binary protocols are much faster then textual ones, but this also isn’t always true:

chart

(DSL Platform – serialization benchmark)

Verdict: for such a simple table, performance of both approaches is similar


(DSL Platform DAL benchmark – single table)

Second use case, master-detail table access

Common pattern in DB access is reading two tables to reconstruct an object on the client side. While we could use several approaches, let's use the "standard one" which first reads from one table and then from a second one. This can sometimes lead to reading inconsistent data, unless we change the isolation level.

For this example, let's use an Invoice and Item tables:

CREATETABLE Invoice
(
  number VARCHAR(20)PRIMARYKEY,
  dueDate DATENOTNULL,
  total NUMERICNOTNULL,
  paid TIMESTAMPTZ,
  canceled BOOLNOTNULL,
  version BIGINTNOTNULL,
  tax NUMERIC(15,2)NOTNULL,
  reference VARCHAR(15),
  createdAt TIMESTAMPTZNOTNULL,
  modifiedAt TIMESTAMPTZNOTNULL);
 
CREATETABLE Item
(
  invoiceNumber VARCHAR(20)REFERENCES Invoice,
  _index INT,PRIMARYKEY(invoiceNumber, _index),
  product VARCHAR(100)NOTNULL,
  cost NUMERICNOTNULL,
  quantity INTNOTNULL,
  taxGroup NUMERIC(4,1)NOTNULL,
  discount NUMERIC(6,2)NOTNULL);

To make things more interesting we'll also investigate how performance would compare if we used a type instead of table for the items property. In that case we don't need a join or two queries to reconstruct the whole object.

So let's say that we want to read several invoices and their details. We would usually write something along the lines of:

SELECT*FROM Invoice
WHERE number IN('invoice-1','invoice-2',...) 
SELECT*FROM Item 
WHERE invoiceNumber IN('invoice-1','invoice-2',...)

and if we wanted to simplify materialization we could add ordering:

SELECT*FROM Invoice
WHERE number IN('invoice-1','invoice-2',...)ORDERBY number
 
SELECT*FROM Item
WHERE invoiceNumber IN('invoice-1','"invoice-2',...)ORDERBY invoiceNumber, _index

While this is slightly more taxing on the database, if we did a more complicated search, it would be much easier to process stuff in order via the second version.

On the other hand, by combining records into one big object directly on the database, we can load it in a single query:

SELECT inv,ARRAY_AGG(SELECT it FROM Item it
  WHERE it.invoiceNumber = inv.number
  ORDERBY it._index) as items  
FROM Invoice inv
WHERE inv.number IN('invoice-1','invoice-2',...)

The above query actually returns two columns, but it could be changed to return only one column.

Materialization of such objects on the client for the first version would look like this:

IDataReader master =...IDataReader detail =...var memory =new Dictionary<string, Invoice>();while(master.Read()){var head =new Invoice { 
    number = master.GetString(0), 
    dueDate = master.GetDateTime(1), ...}...}while(detail.Read()){var invoice = memory[detail.GetString(0)];var detail =new Item { 
    product = detail.GetString(2),
    cost = detail.GetDecimal(3)...}
  invoice.Items.Add(detail);}

Postgres native format would be materialized as in first example along the lines of:

IDataReader dr =...return PostgresDriverLibrary.Parse<Invoice>(dr.GetValue(0));

Postgres response in TEXT protocol would start to suffer from nesting and escaping, and would look something like:

(invoice-1,2015-03-16,"{""(invoice-1,1,""""product name"""",...)...}",...)

With each nesting layer more and more space would be spent on escaping. By developing optimized parsers for this specific Postgres TEXT response we can parse such a response very quickly.

Verdict: manual coding of SQL and materialization has become non-trivial. Joins introduce noticeable performance difference. Manual approach is losing ground.


(DSL Platform DAL benchmark – parent/child)

Third use case, master-child-detail table access

Sometimes we have nesting two levels deep. Since Postgres has rich type support this is something which we can leverage. So, how would our object-oriented modeling approach look like if we had to store bank account data into a database?

CREATETYPE Currency AS ENUM ('EUR','USD','Other');
 
CREATETYPE Transaction AS(
  date DATE,
  description VARCHAR(200),
  currency Currency,
  amount NUMERIC(15,2));
 
CREATETYPE Account AS(
  balance NUMERIC(15,2),
  number VARCHAR(40),
  name VARCHAR(100),
  notes VARCHAR(800),
  transactions Transaction[]);
 
CREATETABLE BankScrape
(
  id INTPRIMARYKEY,
  website VARCHAR(1024)NOTNULL,
  at TIMESTAMPTZNOTNULL,
  info HSTORENOTNULL,
  externalId VARCHAR(50),
  ranking INTNOTNULL,
  tags VARCHAR(10)[]NOTNULL,
  createdAt TIMESTAMPTZNOTNULL,
  accounts Account[]NOTNULL);

Our SQL queries and materialization code will look similar to before (although complexity will have increased drastically). Escaping issue is even worse than before and while reading transactions we are mostly skipping escaped chars. Not to mention that due to LOH issues we can’t just process a string, it must be done using TextReader.

Verdict: manual coding of SQL and materialization is really complex. Joins introduce a noticeable performance difference. Manual approach is not comparable on any test:


(DSL Platform DAL benchmark – parent/child/child)

Conclusions

  • Although we have looked into simple reading scenarios here, insert/update performance is maybe even more interesting.
  • Approach took by the Revenj and backing compiler is not something which can realistically be reproduced by manual coding.
  • Postgres is suffering from parsing complex tuples – but with smart optimizations that can yield net win. There are also few "interesting" behaviors of Postgres which required various workarounds.
  • It would be interesting to compare BINARY and TEXT protocol on deep nested aggregates.
  • JSON might have similar performance to Postgres native format, but it's probably more taxing on Postgres.

Viewing all articles
Browse latest Browse all 9650

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>