Franck Pachot: COPY progression in YugabyteDB

When you want to evaluate the migration from PostgreSQL to YugabyteDB you may:

load metadata (DDL) from a pg_dump
load data with pg_restore
test the application

Here are some quick scripts that I've used for that. They also give an idea of how to import JSON data from an HTTP endpoint into a PostgreSQL table.

DDL

The DDL generated by pg_dump creates the table and then adds the primary key. This is fine in PostgreSQL, where tables are heap tables and the primary key index is a secondary index. However, YugabyteDB tables are stored in LSM trees organized according to the primary key (sharded by hash or range, and sorted). This is similar to tables in MySQL, clustered indexes in SQL Server, or Index Organized Tables in Oracle, but with horizontal partitioning to scale out. So, if you run the CREATE TABLE and then the ALTER TABLE to add the primary key, the tables are first created with a generated UUID as their key and then reorganized. On empty tables this is not a big deal, but it is better to do it all in the CREATE TABLE statement. One way to do that is to export from PostgreSQL with ysql_dump, the Yugabyte fork of pg_dump. But if you already have the .sql file, here is a quick awk script to modify it:

awk '
# gather primary key definitions in pk[] and cancel the lines that add them later
/^ALTER TABLE ONLY/{last_alter_table=$NF}
/^ *ADD CONSTRAINT .* PRIMARY KEY /{sub(/ADD /,"");sub(/;$/,"");pk[last_alter_table]=$0",";$0=$0"\\r"}
# second pass (i.e when NR>FNR): add primary key definition to create table
NR > FNR && /^CREATE TABLE/{ print $0,pk[$3] > "schema_with_pk.sql" ; next}
# disable backfill for faster create index on empty tables
/^CREATE INDEX/ || /^CREATE UNIQUE INDEX/ { sub("INDEX","INDEX NONCONCURRENTLY") }
NR > FNR { print > "schema_with_pk.sql" }
' schema.sql schema.sql

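This is a quick-and-dirty two-pass over the DDL: the first pass collects the primary key definitions, and the second puts them in the CREATE TABLE statements. The original ADD CONSTRAINT line gets a trailing \r, the psql meta-command that resets the query buffer, so the now-redundant ALTER TABLE is discarded instead of being executed.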
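To see what the script produces, here is a minimal sketch that runs the same awk program on a made-up three-statement dump. The table public.demo, its columns, and the scratch directory are assumptions for illustration. A real pg_dump file contains many more statements, but these three have the shape that the patterns rely on.

```shell
# Work in a scratch directory so that no real file is overwritten
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical, minimal pg_dump-style DDL (public.demo is made up)
cat > schema.sql <<'SQL'
CREATE TABLE public.demo (
    id integer NOT NULL,
    name text
);
ALTER TABLE ONLY public.demo
    ADD CONSTRAINT demo_pkey PRIMARY KEY (id);
CREATE INDEX demo_name ON public.demo USING btree (name);
SQL

# Same awk program as above, reading the file twice (two passes)
awk '
/^ALTER TABLE ONLY/{last_alter_table=$NF}
/^ *ADD CONSTRAINT .* PRIMARY KEY /{sub(/ADD /,"");sub(/;$/,"");pk[last_alter_table]=$0",";$0=$0"\\r"}
NR > FNR && /^CREATE TABLE/{ print $0,pk[$3] > "schema_with_pk.sql" ; next}
/^CREATE INDEX/ || /^CREATE UNIQUE INDEX/ { sub("INDEX","INDEX NONCONCURRENTLY") }
NR > FNR { print > "schema_with_pk.sql" }
' schema.sql schema.sql

# Show the rewritten DDL
cat schema_with_pk.sql
```

In the result, the CREATE TABLE line carries the CONSTRAINT demo_pkey PRIMARY KEY (id) clause, the ADD CONSTRAINT line has lost its ADD keyword and ends with \r, and the index is created with NONCONCURRENTLY.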

Note that I also change the CREATE INDEX to disable backfill. More info about this in a previous blog post: https://dev.to/yugabyte/create-index-in-yugabytedb-online-or-fast-2dl3

You will probably create the secondary indexes after the load, but I like to run the whole DDL once, before loading data, to see if there are some PostgreSQL features that we do not support yet.

COPY progress

Then you may import terabytes of data. This takes time (there is work in progress to optimize this) and you probably want to see the progress. As we commit every 1000 rows by default (this can be changed with rows_per_transaction), you can run select count(*) during the load, but that takes time on huge tables and may require increasing the default timeout.
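As a minimal sketch of changing the commit batch size, the snippet below builds a COPY statement with a larger rows_per_transaction. The table name, the server-side file path, and the value 10000 are placeholders, and the commented-out ysqlsh call assumes that client is on your PATH and can reach the cluster.

```shell
# Placeholders for illustration only: adjust to your own table and file
table="public.orders"
file="/tmp/orders.csv"
rows_per_transaction=10000

# Build the statement; rows_per_transaction sets how many rows go into each commit
copy_sql="COPY ${table} FROM '${file}' WITH (FORMAT csv, ROWS_PER_TRANSACTION ${rows_per_transaction});"
printf '%s\n' "$copy_sql"

# To run it against a cluster (not executed here):
# ysqlsh -c "$copy_sql"
```

Larger batches mean fewer commits, but also that more rows become visible at once, so the progress seen from another session advances in bigger steps.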

We have many statistics from the tablet server, exposed as JSON through the tserver endpoint, like http://yb1.pachot.net:9000/metrics. But for a PoC you may not have set up any statistics collection and just want a quick look at them.

Here is a script to gather all tablet metrics into a temporary table:

-- create a temporary table to store the metrics:

drop table if exists my_tablet_metrics;
create temporary table if not exists my_tablet_metrics (
 table_id text,tablet_id text
 ,schema_name text, table_name text
 ,metric text, value numeric
 ,tserver text
 ,timestamp timestamp default now()
);

-- gather the metrics from the tserver:

do $do$ 
 declare
  tserver record;
 begin
  -- gather the list of nodes and loop on them
  for tserver in (select * from yb_servers())
   loop
    -- use COPY from wget/jq to load the metrics
    execute format($copy$
     copy my_tablet_metrics(table_id,tablet_id
     ,schema_name,table_name
     ,metric,value,tserver)
     from program $program$
      wget -O- http://%s:9000/metrics | 
      jq --arg node "%s" -r '.[] 
       |select(.type=="tablet") 
       |.attributes.table_id+"\t"+.id
       +"\t"+.attributes.namespace_name
       +"\t"+.attributes.table_name
       +"\t"+(.metrics[]|select(.value>0)
       |(.name+"\t"+(.value|tostring)))
       +"\t"+$node
       '
     $program$
    with (rows_per_transaction 0)
   $copy$,tserver.host,tserver.host);
   end loop;
 end; 
$do$;

-- aggregate the per-tablet row insertion metrics:

select sum(value),format('%I.%I',schema_name,table_name) 
from my_tablet_metrics where metric='rows_inserted'
group by 2 order by 1
;

I use a server-side COPY FROM PROGRAM here, so wget (to read from the HTTP endpoint) and jq (to transform the JSON structure into tab-separated text) must be installed on the node you are connected to.
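To check the jq filter without a cluster, the sketch below feeds it a hand-written sample in the shape that the filter expects from the /metrics output. All sample values (tid1, tablet1, server1, node1, some_other_metric) are made up, and the real endpoint returns many more entries and metrics.

```shell
# Hand-written sample in the shape the filter expects (all values are made up)
sample='[
 {"type":"tablet","id":"tablet1",
  "attributes":{"table_id":"tid1","namespace_name":"yugabyte","table_name":"orders"},
  "metrics":[{"name":"rows_inserted","value":1000},
             {"name":"some_other_metric","value":0}]},
 {"type":"server","id":"server1","attributes":{},
  "metrics":[{"name":"some_other_metric","value":5}]}
]'

# Same filter as in the COPY ... FROM PROGRAM, with the node name passed as $node
tsv=$(printf '%s' "$sample" | jq --arg node "node1" -r '.[]
 |select(.type=="tablet")
 |.attributes.table_id+"\t"+.id
 +"\t"+.attributes.namespace_name
 +"\t"+.attributes.table_name
 +"\t"+(.metrics[]|select(.value>0)
 |(.name+"\t"+(.value|tostring)))
 +"\t"+$node
 ')

# One tab-separated line per tablet metric with a value above zero
printf '%s\n' "$tsv"
```

Only the tablet entry survives the select, and only its non-zero metric is emitted, so this sample yields a single line with seven tab-separated fields matching the column list of the COPY.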

Here is an example after loading the Northwind demo schema:

If you find any issue when loading your PostgreSQL database into YugabyteDB, please reach out to us (https://www.yugabyte.com/community). There are many things on the roadmap.
