Shaun M. Thomas: PG Phriday: 10 Ways to Ruin Performance: Cast Away

Unlike many loosely typed programming languages, databases are extremely serious about data types. So serious, in fact, that many patently refuse to perform automatic type conversions in case the data being compared is not exactly equivalent afterwards. This is the case with PGDB (PostgreSQL), and surprisingly often, queries suffer as a result. Fortunately, this is something we can address!
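
To see just how strict this can be, try comparing an integer to a text value; where a loosely typed language would quietly coerce one side, PGDB simply refuses:

SELECT 5 = '5'::TEXT;
-- ERROR:  operator does not exist: integer = text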

There are easier ways to demonstrate this, but in the interests of making this more of an adventure, let’s use a join between two tables:

CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
);

INSERT INTO sys_order (product_id, item_count, order_dt, valid_dt)
SELECT (a.id % 100000) + 1, (a.id % 100) + 1,
       now() - (id % 1000 || 'd')::INTERVAL,
       CASE WHEN a.id % 499 = 0 THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
       END
  FROM generate_series(1, 1000000) a(id);

ALTER TABLE sys_order ADD CONSTRAINT pk_order_id
      PRIMARY KEY (order_id);

CREATE TABLE sys_exchange_map
(
    exchange_map_id  SERIAL       NOT NULL,
    vendor_id        INT          NOT NULL,
    ext_order_code   TEXT         NOT NULL,
    map_message      TEXT         NOT NULL,
    xmit_dt          TIMESTAMPTZ  NOT NULL DEFAULT now()
);

INSERT INTO sys_exchange_map (
         exchange_map_id, vendor_id, ext_order_code,
         map_message, xmit_dt
)
SELECT a.id, (a.id % 10) + 1,
       ((a.id % 10000) + 1) || ':' || SUBSTRING(random()::TEXT, 3),
       'This is a bunch of junk from an external vendor queue!',
       now() - (id % 30 || 'd')::INTERVAL
  FROM generate_series(1, 100000) a(id);

ALTER TABLE sys_exchange_map ADD CONSTRAINT pk_exchange_map_id
      PRIMARY KEY (exchange_map_id);

CREATE INDEX idx_exchange_map_xmit_dt
    ON sys_exchange_map (xmit_dt);

ANALYZE sys_order;
ANALYZE sys_exchange_map;

This structure simulates something I’ve seen fairly often in real systems. In this case, our order table is normal, but there’s also an exchange mapping table that maps our orders to external vendor orders. Imagine this table is part of the queue engine running the application, so the various code and message columns describe the interactions within the platform.

Given this structure, it’s not uncommon to encounter a query that manually parses one or more of these codes and tries to find matching orders. This is what that might look like:

SELECT o.*
  FROM sys_exchange_map m
  JOIN sys_order o ON (
         o.order_id = SUBSTRING(m.ext_order_code, '(.*):')::NUMERIC
       )
 WHERE m.xmit_dt > CURRENT_DATE;

Unfortunately, on my test VM, this query takes almost four seconds! Experienced developers and DBAs probably already know what the problem is, but let’s take a gander at two segments of the EXPLAIN ANALYZE output and see if there are any obvious clues.

 Merge Join  (cost=142800.24..568726.04 rows=16935000 width=28)
             (actual time=4844.280..4875.651 rows=3333 loops=1)
 [ SNIP ]
 ->  Seq Scan on sys_order o
       (cost=0.00..17353.00 rows=1000000 width=28)
       (actual time=0.036..884.354 rows=1000000 loops=1)
 Planning time: 0.240 ms
 Execution time: 3626.783 ms

We should be able to immediately see two potential problems. Thinking back on our discussions about estimates matching actuals, notice that the database thinks it will be returning almost 17M rows, when there are really only around 3300. Further, it’s scanning the entire sys_order table instead of using the primary key. What happened here?

This may or may not come as a surprise, but NUMERIC is not compatible with standard integer types in PGDB. For those who have worked with other databases such as Oracle, it may seem natural to use NUMERIC as a catch-all for any number. This is especially true for developers who work with different data types in a myriad of languages, some of which feature multiple compatible numeric types.
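
Worth noting: SUBSTRING() returns TEXT, so it’s the explicit ::NUMERIC cast in our query that drags the whole comparison into NUMERIC territory. A quick check with pg_typeof() confirms the types involved; the sample string here is simply made up to match the shape of our ext_order_code values:

SELECT pg_typeof(SUBSTRING('1234:0.5', '(.*):'));
-- text

SELECT pg_typeof(SUBSTRING('1234:0.5', '(.*):')::NUMERIC);
-- numeric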

With PGDB, casting is controlled through explicit declaration of type compatibility. We can look at NUMERIC and see which types it will convert to with this query:

SELECT s.typname AS source_type,
       t.typname AS target_type,
       CASE c.castcontext
         WHEN 'a' THEN 'Assignment'
         WHEN 'i' THEN 'Implicit'
         WHEN 'e' THEN 'Explicit'
       END AS context
  FROM pg_cast c
  JOIN pg_type s ON (s.oid = c.castsource)
  JOIN pg_type t ON (t.oid = c.casttarget)
 WHERE s.typname = 'numeric';

 source_type | target_type |  context
-------------+-------------+------------
 numeric     | int8        | Assignment
 numeric     | int2        | Assignment
 numeric     | int4        | Assignment
 numeric     | float4      | Implicit
 numeric     | float8      | Implicit
 numeric     | money       | Assignment
 numeric     | numeric     | Implicit

From this chart, it would appear that INT and NUMERIC are compatible, but only through assignment: we can save an integer to a numeric column, or the reverse, and the value will be converted. That does not mean they’ll be treated as compatible for equality tests. Since NUMERIC will not implicitly become INT, the comparison is carried out as NUMERIC instead, and PGDB cannot use the INT index on order_id for it.
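
We can watch both behaviors in isolation with a quick sketch; the cast_demo table is purely illustrative, and the plan fragments in the comments show the expected shape rather than exact output:

-- Assignment casts work fine when storing a value:
CREATE TEMP TABLE cast_demo (val INT);
INSERT INTO cast_demo VALUES (42::NUMERIC);

-- In a comparison, however, the INT column is promoted to NUMERIC,
-- which prevents use of the primary key index:
EXPLAIN SELECT * FROM sys_order WHERE order_id = 42::NUMERIC;
-- Seq Scan on sys_order  Filter: ((order_id)::numeric = '42'::numeric)

EXPLAIN SELECT * FROM sys_order WHERE order_id = 42;
-- Index Scan using pk_order_id on sys_order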

Unable to use the proper index, the planner does its best to come up with a working plan. Unfortunately, the type incompatibility wreaks havoc during that phase as well. What we end up with is something of a mess: completely incorrect row estimates, an unnecessary sequential scan, and an expensive in-memory merge. And all it takes to fix this is one tiny change to our query:

SELECT o.*
  FROM sys_exchange_map m
  JOIN sys_order o ON (
         o.order_id = SUBSTRING(m.ext_order_code, '(.*):')::INT
       )
 WHERE m.xmit_dt > CURRENT_DATE;

                             QUERY PLAN
--------------------------------------------------------------------
 Nested Loop  (cost=66.98..22648.71 rows=3387 width=28)
              (actual time=1.058..42.084 rows=3333 loops=1)
 [ SNIP ]
 ->  Index Scan using pk_order_id on sys_order o
       (cost=0.43..6.19 rows=1 width=28)
       (actual time=0.003..0.003 rows=1 loops=3333)
 Planning time: 1.176 ms
 Execution time: 43.635 ms

That’s a pretty massive difference! In this particular case, it’s 80 times faster, but I’ve run into situations where a single miscast caused the query to be almost 80,000 times slower. It all depends on the scale of the data involved, and how badly everything compounds at each step of the execution process.

The message here is to pay particular attention to data types and casting. If a query is performing badly when it should be much faster, make sure types are compatible across the board. Some languages are particularly strict with regard to data type usage; consider PGDB among them.
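
As an aside, if a miscast query is buried in code you can’t change, an expression index over the promoted value is one possible stopgap; the index name here is hypothetical, and the real fix remains correcting the cast as shown above:

-- Stopgap only: index the NUMERIC-promoted expression so the
-- miscast comparison can at least use an index scan.
CREATE INDEX idx_order_id_numeric
    ON sys_order ((order_id::NUMERIC));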

