On several occasions, I’ve been talking with groups of PostgreSQL users when the question comes up, “If I use PostgreSQL, why would I want to use Hadoop?” There are many answers, and the question usually comes from people who don’t really know the details of Hadoop, but let’s just focus on a single use case: backups.
For most larger databases, online backups are used with point-in-time recovery. This lets administrators back up, or more importantly restore, their databases quickly. The trade-off is that you’re making a physical copy of the database files, so if your database is a terabyte, your backup will be a terabyte before you compress it. If you’re keeping weekly backups and you have a company policy to retain backups for months, it’ll require a lot of storage. That’s where Hadoop comes in.
At the core of Hadoop is the Hadoop Distributed File System (HDFS), which isn’t a POSIX-compliant file system, but it does have some pretty great properties. It’s designed to run on inexpensive hardware while still being fault tolerant. This means that you can go out and buy some inexpensive drives, put them in some older desktops you have lying around the office, and you’ll have a highly redundant storage cluster. No need to buy an expensive SAN or NAS device or ship your data to a cloud service like Amazon S3.
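You can see that redundancy for yourself with the Hadoop client. As a sketch (the URL, path, and replication factor here are assumptions, not from any particular cluster):

```shell
# Show cluster capacity and the state of each datanode
hadoop dfsadmin -report

# Raise the replication factor to 3 for a backups directory;
# -w waits until replication actually completes
hadoop dfs -setrep -w 3 hdfs://192.168.122.91:9000/user/postgres/backups
```

A higher replication factor costs more disk but means more of those old desktops can die before your backups are at risk.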
Leveraging HDFS for your PostgreSQL backups is pretty straightforward. Assuming you have a Hadoop cluster already set up, you’ll just need to put the Hadoop client on your server. From your PostgreSQL server, first test that you can connect to the cluster and do a simple directory listing.
jim@jim-XPS:~$ hadoop dfs -ls hdfs://192.168.122.91:9000/user
Found 3 items
drwxr-xr-x   - bigsql supergroup          0 2013-06-26 12:15 /user/bigsql
drwxr-xr-x   - bigsql supergroup          0 2013-06-26 12:00 /user/hive
drwxr-xr-x   - bigsql supergroup          0 2013-07-08 12:17 /user/postgres
If you run into any errors, you’ll most likely need to change the fs.default.name property in core-site.xml on the client to use the correct URL instead of localhost.
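On the client, that property might look something like this (the host and port match the examples in this post; adjust them for your own cluster):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.122.91:9000</value>
  </property>
</configuration>
```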
Once you have your connectivity configured correctly, you can leverage the cluster as a place to keep your backups. Create your base backup with the tool of your choice and once it’s done, just copy it to HDFS.
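For example, with pg_basebackup you could take a compressed, tar-format base backup in one step (a sketch; the connection details and output directory are assumptions):

```shell
# -Ft writes tar format, -z gzips it, -D is the output directory
pg_basebackup -h localhost -U postgres -Ft -z -D /tmp/basebackup_20130711
```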
hadoop dfs -copyFromLocal basebackup_20130711.tar.gz hdfs://192.168.122.91:9000/user/postgres/backups/base
You can even set your archive_command to write your WAL files directly to HDFS. Just be careful with this one if you’re switching log files frequently: the command takes a bit longer than a simple rsync.
archive_command = 'hadoop dfs -copyFromLocal %p hdfs://192.168.122.91:9000/user/postgres/backups/wal/%f'
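Restoring works the same way in reverse: your restore_command can pull WAL segments back out of HDFS with copyToLocal. A sketch, assuming the same paths as above:

```shell
restore_command = 'hadoop dfs -copyToLocal hdfs://192.168.122.91:9000/user/postgres/backups/wal/%f %p'
```

The same caveat about speed applies here; pulling each segment from the cluster will be slower than reading it from local disk, so expect recovery to take longer.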