Channel: Planet PostgreSQL

Keith Fiske: Table Partitioning & Long Names


While working on a way to reapply indexes to existing partitions in my partition management extension, it dawned on me that PostgreSQL’s default object name length limit of 63 characters could cause some interesting issues for child table names when setting up partitioning. Since most partitioning relies on a suffix on the end of the child table names for uniqueness and to easily identify their contents, the automatic truncation of a long child name could really cause some big problems. So I set about seeing if I could handle this with pg_partman.

After several weeks of rewriting a lot of the core functionality of object naming, I think I’ve handled this as best as I can. This applied not only to the child table names, but also to the trigger function and trigger name. Below is a simplified version of the function that’s included in the extension for general use in any situation.

CREATE FUNCTION check_name_length(p_object_name text, p_object_schema text DEFAULT NULL, p_suffix text DEFAULT NULL) RETURNS text
    LANGUAGE plpgsql
    AS $$
DECLARE
    v_new_length    int;
    v_new_name      text;
BEGIN

IF char_length(p_object_name) + char_length(COALESCE(p_suffix, '')) >= 63 THEN
    v_new_length := 63 - char_length(COALESCE(p_suffix, ''));
    v_new_name := COALESCE(p_object_schema || '.', '') || substring(p_object_name from 1 for v_new_length) || COALESCE(p_suffix, '');
ELSE
    v_new_name := COALESCE(p_object_schema || '.', '') || p_object_name || COALESCE(p_suffix, '');
END IF;

RETURN v_new_name;

END
$$;

This takes whatever object name is given, checks its length with or without the provided suffix, and returns an object name with a valid length. If the object name would be too long with the suffix, the original name is truncated and the suffix is added. You can also just provide an object name and no suffix and it will check that as well. If no truncation is necessary, you get your original name back, with the schema and/or suffix applied if given. The schema name of an object does not count against this length limit, but a parameter is provided so you can get a schema-qualified name returned.
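For illustration, the same length check can be sketched outside the database. This Python version mirrors the plpgsql logic above (it is a sketch only; the real extension does this in plpgsql, and PostgreSQL's limit is actually 63 bytes, which only equals 63 characters for ASCII names):

```python
# Sketch of check_name_length's truncation logic (illustrative only).
MAX_LEN = 63

def check_name_length(object_name, object_schema=None, suffix=None):
    suffix = suffix or ''
    if len(object_name) + len(suffix) >= MAX_LEN:
        # Truncate the base name so the suffix always survives intact.
        object_name = object_name[:MAX_LEN - len(suffix)]
    qualified = (object_schema + '.') if object_schema else ''
    return qualified + object_name + suffix

print(check_name_length('tablename', 'schema', '_suffix'))
# schema.tablename_suffix
```

The key design point is the same as in the SQL function: the suffix is never truncated, only the base name, because the suffix is what keeps siblings in a partition set distinct.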

Here is an example when no truncation is needed:

keith=# SELECT check_name_length('tablename', 'schema', '_suffix');
    check_name_length    
-------------------------
 schema.tablename_suffix

Here’s an example where truncation is needed:

keith=# select check_name_length('tablename_thisobjectsnameistoolongforpostgresandwillbetruncated', 'schema', '_suffix');
                           check_name_length                            
------------------------------------------------------------------------
 schema.tablename_thisobjectsnameistoolongforpostgresandwillbetr_suffix

The only issue I’ve seen with this method is that if you have a lot of really long, similarly named tables, you can still run into naming conflicts. Especially with serial based partitioning where the original table name is slowly getting truncated more and more over time. But I think handling it like this is preferred to having the suffix unexpectedly truncated, which could cause conflicts within the same partition set.
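To see that remaining conflict risk concretely, here is a small Python illustration (the table names are made up for the example) of two distinct long names that truncate to the same child name:

```python
# Two distinct parent names that collide once truncated to make room
# for the same partition suffix (illustrative names only).
MAX_LEN = 63

def truncated_child(parent, suffix):
    if len(parent) + len(suffix) >= MAX_LEN:
        parent = parent[:MAX_LEN - len(suffix)]
    return parent + suffix

a = truncated_child('measurements_from_sensor_array_alpha_building_one_floor_three_east', '_p10000')
b = truncated_child('measurements_from_sensor_array_alpha_building_one_floor_three_west', '_p10000')
print(a == b)  # True: both truncate to the same 63-character name
```

The two parents differ only in their final characters, which are exactly the characters the truncation removes, so both children end up with the same name.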

It’s edge cases like this that drove me to try and make something to handle partitioning more easily in PostgreSQL until it’s built in. I’ll be giving a talk at PG Open next week about the development I’ve been doing and how the new extensions system has made it much easier to contribute to PostgreSQL without knowing a single line of C!

 


Josh Berkus: More about my favorite 9.3 Features (video and more)

If you somehow missed it, PostgreSQL 9.3.0 is now available, just in time for your back-to-school shopping.  9.3 includes lots of great stuff, some of which I've already been using in development, and I'll tell you more about my favorites below.  There's also a survey and we'll have live video of SFPUG doing 9.3 on Thursday.

No More SHMMAX


We didn't emphasize this in the release announcement -- mainly because it's like removing a wart, you won't want to talk about it -- but this is the one 9.3 change liable to make life easier for more developers than any other.  We've stopped using SysV memory for anything other than the postmaster startup lock, which means that you can now adjust shared_buffers to your heart's content without needing to mess with sysctl.conf.  Let alone the satanic incantations you have to go through on the Mac.

This also clears one of the main barriers to writing simple autotuning scripts.  Which means I'm out of excuses for not having written one.

Custom Background Workers


Need a daemon to do background work alongside Postgres, such as scheduling, queueing, maintenance, or replication?  Maybe you want to intercept MongoDB-formatted queries and rewrite them for Postgres?  Custom background workers allow you to create your own "autovacuum daemon" which does whatever you want it to.

Michael Paquier will be presenting Background Workers for SFPUG on Thursday the 12th (7:30PM PDT).  Details on our Meetup Page, including a link to live video for those of you not in the Bay Area.

Streaming-Only Cascading


This has been my biggest desire since 9.2 came out; we were so close to not needing to worry about archiving, ever, for small databases.  And now we're there.  You can make chains of replicas, fail over to one of them, remaster, make a replica at a new data center the master, change the direction of replication, and lots more configurations without needing to worry about WAL archiving and all its overhead.

If you combine this with Heikki's work on pg_rewind, things get even more flexible since you don't have to resnapshot for failback anymore.

I'll be presenting a live demo of this feature at the SFPUG meeting, including connecting replicas in a ring (all replicas -- no master!). 

So, what's your favorite 9.3 feature?  Vote here!

Leo Hsu and Regina Obe: PostGIS 2.1 windows bundle


PostgreSQL 9.3 came out today and we are excited to start using the new features. PostGIS 2.1.0 came out about 2 weeks ago and pgRouting just cut the RC 3 release. For windows PostGIS users who are impatient to try the new suite, binaries can be found on the Unreleased versions of PostGIS.net windows page.

We are planning an official release on StackBuilder sometime next week, probably. We are waiting for the release of pgRouting 2.0 before we do, which should also be out next week. This new 2.1 release will be dubbed the PostGIS 2.1 Bundle since it will have more than just PostGIS. It will include the postgis extension (which includes geometry, raster, and geography), postgis_topology, postgis_tiger_geocoder, the address_standardizer extension (a companion to the tiger geocoder), and pgRouting 2.0.

For those people running PostGIS 2.0 on PostgreSQL 9.0+, especially raster and geography users, I highly recommend you jump to PostGIS 2.1. PostGIS 2.1 is a soft upgrade from 2.0. For raster there are enormous speed improvements and new functions. The ones we are most excited about in raster are the much, much faster ST_Clip and ST_Union functions (which now handle multi-band in addition to being faster). These two functions are highly important since they are generally the first step in many raster workflows. Geography has speed improvements for point-in-polygon tests and an ST_Segmentize function done on the spheroid (important for long ranges). Geometry has a couple of new functions. The enhanced 3D functionality provided by SFCGAL is brand new and probably won't be distributed by many package maintainers until PostGIS 2.2, where it will garner a few more features and stability improvements.


Continue reading "PostGIS 2.1 windows bundle"

Denish Patel: A week(s) of Conferences!


This week, I will be attending Surge (Scalability Conference) in Washington D.C. Next week, I will be speaking at and attending PgOpen (Postgres conference) in Chicago. If you are planning to attend either of these conferences, it will be a nice opportunity to meet in person and catch up on technologies & stories, specifically about databases!

See you soon!!

Szymon Guz: PostgreSQL 9.3 Released


Yesterday PostgreSQL 9.3 was released. It contains many great new features; below is a short description of those I think are most important. There are many more than this list covers, and all of them can be found on the PostgreSQL 9.3 Release Notes page.

One of the most important features of the new release is the long list of bug fixes and improvements making 9.3 faster. I think that is the main reason for upgrading. There are also many new features which your current application possibly will not use, but a faster database is always better.

The new mechanism of background workers opens up entirely new possibilities for running custom processes in the background. I've got a couple of ideas for implementing such background tasks, like a custom message queue, a Postgres log analyzer, or a tool for accessing PostgreSQL over HTTP (and JSON, just to have an API like the NoSQL databases have).

Another nice feature, which I haven't checked yet, is data checksums: something really useful for checking data consistency at the data file level. It should make data updates slower, but I haven't checked how much slower; there will be another blog post about that.

There is also parallel pg_dump which will lead to faster backups.

The new Postgres version also has switched from SysV to Posix shared memory model. In short: you won't need setting SHMMAX and SHMALL any more.

There are also many new JSON functions; I used some of them in one of my previous posts.

Another really great feature is the possibility of creating event triggers. So far you could create triggers on data changes. Since PostgreSQL 9.3 you can create a trigger on dropping or creating a table, or even on dropping another trigger.

Views also changed a lot. Simple views are updatable now, and there are materialized views as well.

The Foreign Data Wrapper mechanism has been enhanced. The mechanism allows you to map an external data source to a local view. There is also the great postgres_fdw shipped with Postgres 9.3. This library lets you easily map a table from another PostgreSQL database, so you can access many different Postgres databases through one local database. And with materialized views you can even cache the data.

Another feature worth mentioning is faster failover of a replicated database: when your master database fails, the failover switch to a slave replica is much faster. If you use Postgres for your website, this simply means a shorter time your website is offline when your master database server fails.

You can find more information in the release announcement.

Joshua Drake: Just back from NYCPug August, on to more talks

In August I spoke at NYCPUG on Dumb Simple PostgreSQL Performance. The talk was well received and there were about 60 people in attendance. I have always enjoyed my trips to NYC but this is the first time I have taken a leisurely look at the city. I found myself enjoying a waterfront walk from 42nd, through the Highline, to Battery Park, all the way to the Brooklyn Bridge and over to Brooklyn to a great pub for dinner. What I enjoyed most about the walk, outside of the 10 miles, was the community that was present. I think it is easy to get jaded by "midtown" and all that is touristy in that area: the hustle and bustle, the pollution and dirt of a city. The waterfront walk however reminded me very much of Seattle; there was green and water everywhere, very little litter, lots of families and bike riders. All around a wonderful day, and it made me look forward to coming back to NYC again in March for the NYC PostgreSQL Conference.

Alas my travel schedule is just beginning. I will be speaking at the Seattle PostgreSQL User Group on Oct 1st. I will be giving the same talk as I did in NYC as it has been updated and always seems to be well received. If you are in town you should come join us:

1100 Eastlake Ave E, Seattle, WA 98102

I always enjoy Seattle and will also be making a side trip to Bellingham as it looks like I will be moving there soon. It will be sad to say good bye to the Columbia River Gorge but I am looking forward to the Mount Baker area as well as hopefully starting Vancouver, B.C. and Whatcom county PostgreSQL user groups.

Just three weeks after Seattle, I will be crossing the pond to wonderful Ireland, where it is bound to be cold, dark and wet. That's alright though, as I plan on staying from Oct 26th - Nov 3rd, allowing for a full trip of sightseeing. Of course I will be dropping by on Friday the 1st to speak on PostgreSQL Backups, which reminds me that I need to update the talk for the new pg_dump features found in 9.3.

Craig Kerstiens: Diving into Postgres JSON operators and functions


Just as PostgreSQL 9.3 was coming out I had a need to take advantage of the JSON datatype and some of the operators and functions within it. The use case was pretty simple, run a query across a variety of databases, then take the results and store them. We explored doing something more elaborate with the columns/values, but in the end just opted to save the entire result set as JSON then I could use the operators to explore it as desired.

Here's the general idea in code (using sequel):

result = r.connection { |c| c.fetch(self.query).all }
mymodel.results = result.to_json

As the entire dataset was stored as some compressed JSON I needed to do a bit of manipulation to get it back into a form that was workable. Fortunately all the steps were fairly straightforward.

First you want to unnest each result from the json array, in my case this looked like:

SELECT json_array_elements(result)

The above will unnest all of the array elements so I have an individual result as JSON. A real world example would look something like this:

SELECT json_array_elements(result) 
FROM query_results 
LIMIT 2;
          json_array_elements
-----------------------------------------
 {"column_name":"data_in_here"}
 {"column_name_2":"other_data_in_here"}
(2 rows)

From here based on the query I would want to get some specific value. In this case I'm going to search for the text key column_name_2:

SELECT json_array_elements(result)->'column_name_2' 
FROM query_results 
LIMIT 1;

  json_array_elements  
-----------------------
 "other_data_in_here"
 (1 row)

One gotcha I encountered was when I wanted to search for some value or exclude some value... Expecting I could just compare the result of the above in a WHERE clause, I was sadly mistaken, because the equals operator isn't defined for the json type. My first attempt at fixing this was to cast in this form:

SELECT json_array_elements(result)->'column_name_2'::text

The sad part is that, because the :: cast binds more tightly than the -> operator, the cast applies to the key name rather than to the extracted value, so it doesn't behave as I'd expect. Instead you'll want to do:

SELECT (json_array_elements(result)->'column_name_2')::text

Of course there's plenty more you can do with the JSON operators in the new Postgres 9.3. If you've already got JSON in your application, give them a look today. And while slightly worse, if you've got JSON stored in a text field, simply cast it with ::json to begin using the operators.
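For readers who want to sanity-check the shape of this unnest-then-extract pipeline outside the database, here is a rough Python analogue using only the standard library (the column names are the same hypothetical ones from the examples above; this is an illustration of the semantics, not how Postgres implements them):

```python
import json

# Rough Python analogue of the SQL above: json_array_elements unnests
# the stored array, and ->'key' pulls one value out of each element.
stored = '[{"column_name": "data_in_here"}, {"column_name_2": "other_data_in_here"}]'

elements = json.loads(stored)                         # like json_array_elements(result)
values = [e.get('column_name_2') for e in elements]   # like ->'column_name_2'
matches = [v for v in values if v == 'other_data_in_here']  # the WHERE-style filter

print(matches)  # ['other_data_in_here']
```

Note that `->` in SQL returns json, which is why the comparison in a WHERE clause needs the parenthesized cast to text first; in the Python sketch the extracted value is already a plain string, so the comparison just works.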

Hubert 'depesz' Lubaczewski: How to make backups of PostgreSQL?

Recently someone was looking for help with script from Pg wiki. I never really checked it, but when talking with the guy needing help, I looked at it. And didn't really like it. For starters – the scripts are huge – over 9kB. Second – they use techniques that are not that great (I'm talking […]

Hans-Juergen Schoenig: PostgreSQL Vim integration: Finally …

Last time I had come up with the idea of writing some Vim script for PostgreSQL so that people can edit data fast and easily directly in Vim. The goal is to export a table directly to Vim, modify it and load it back in. This can come in handy when you want to edit […]

Hans-Juergen Schoenig: PostgreSQL 9.3: new functionality

PostgreSQL 9.3 has just been released and we have already received a lot of positive feedback for the new release. Many people are impressed by what has been achieved recently and are already eager to enjoy those new features. As always, the new release brings a great deal of new functionality and many improvements. Everybody […]

Michael Paquier: Postgres 9.3 feature highlight: event triggers

Event triggers is a new kind of statement-based trigger added in PostgreSQL 9.3. Compared to normal triggers fired when DML queries run on a given table, event triggers are fired for DDL queries and are global to a database. The use cases of event triggers are various. Restrict the execution of DDL or log/record information [...]

Greg Smith: Tuning Disk Timeouts on Virtual Machines


Dedicated servers are important for some databases, and I write a lot more about those difficult cases than the easy ones. But virtual machines have big management advantages. Recently I started moving a lot of my personal dedicated servers onto one larger shared box. It went terribly–my personal tech infrastructure lives out Murphy’s Law every day–and this blog has been down a whole lot of the last month. But I’m stubborn and always learn something from these battles, and this time the lesson was all about disk timeouts on virtual machines and similar cloud deployments. If you’d like to peek at the solution, I resolved the problems with advice from a blog entry on VM disk timeouts. The way things fell apart has its own story.

My project seemed simple enough: start with a Windows box (originally dual core/2GB RAM) and three Linux boxes (small hosting jobs, database/git/web/files). Turn them all into VMs, move them onto a bigger and more reliable server (8 cores, 16GB of RAM, RAID1), and run them all there. Disk throughput wasn’t going to be great, but all the real work these systems do fit in RAM. How bad could it be?

Well, really bad is the answer. Every system crashed intermittently, the exact form unique to their respective operating system. But intermittent problems get much easier to find when they happen more frequently. When one of the Linux systems started crashing constantly I dropped everything else to look at it. First there were some read and write errors:

Sep 11 13:09:48 wish kernel: ata3.00: failed command: WRITE FPDMA QUEUED
Sep 11 13:09:48 wish kernel: ata3.00: cmd 61/08:00:48:12:65/00:00:02:00:00/40 tag 0 ncq 4096 out
Sep 11 13:09:48 wish kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 11 13:09:48 wish kernel: ata3.00: status: { DRDY }
Sep 11 13:09:48 wish kernel: ata3.00: failed command: READ FPDMA QUEUED
Sep 11 13:09:48 wish kernel: ata3.00: cmd 60/40:28:40:9f:68/00:00:02:00:00/40 tag 5 ncq 32768 in
Sep 11 13:09:48 wish kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 11 13:09:48 wish kernel: ata3.00: status: { DRDY }
Sep 11 13:09:48 wish kernel: ata3: hard resetting link
Sep 11 13:09:48 wish kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 11 13:09:48 wish kernel: ata3.00: qc timeout (cmd 0xec)
Sep 11 13:09:48 wish kernel: ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 11 13:09:48 wish kernel: ata3.00: revalidation failed (errno=-5)
Sep 11 13:09:48 wish kernel: ata3: limiting SATA link speed to 1.5 Gbps
Sep 11 13:09:48 wish kernel: ata3: hard resetting link
Sep 11 13:09:48 wish kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep 11 13:09:48 wish kernel: ata3.00: configured for UDMA/133
Sep 11 13:09:48 wish kernel: ata3.00: device reported invalid CHS sector 0
Sep 11 13:09:48 wish kernel: ata3: EH complete

Most of the time when a SATA device gives an error, the operating system will reset the whole SATA bus it’s on to try and regain normal operation. In this example that happens, and Linux drops the link speed (from 3.0Gbps to 1.5Gbps) too. That of course makes the disk I/O problem worse, because now transfers are less efficient. Awesome.

To follow this all the way to crash, more errors start popping up. Next Linux disables NCQ, further pruning the feature set it’s relying on in hopes the disk works better that way. PC hardware has a long history of device bugs when using advanced features, so this falling back to small feature sets approach happens often when things start failing:

Sep 11 13:11:36 wish kernel: ata3: hard resetting link
Sep 11 13:11:36 wish kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep 11 13:11:36 wish kernel: ata3.00: configured for UDMA/133
Sep 11 13:11:36 wish kernel: ata3: EH complete
Sep 11 13:11:36 wish kernel: ata3: illegal qc_active transition (00000003->000003f8)
Sep 11 13:11:36 wish kernel: ata3.00: NCQ disabled due to excessive errors
Sep 11 13:11:36 wish kernel: ata3.00: exception Emask 0x2 SAct 0x3 SErr 0x0 action 0x6 frozen
Sep 11 13:11:36 wish kernel: ata3.00: failed command: READ FPDMA QUEUED
Sep 11 13:11:36 wish kernel: ata3.00: cmd 60/28:00:e8:e4:95/00:00:00:00:00/40 tag 0 ncq 20480 in
Sep 11 13:11:36 wish kernel: res 00/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
Sep 11 13:11:36 wish kernel: ata3.00: failed command: READ FPDMA QUEUED
Sep 11 13:11:36 wish kernel: ata3.00: cmd 60/08:08:58:e2:90/00:00:01:00:00/40 tag 1 ncq 4096 in
Sep 11 13:11:36 wish kernel: res 00/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)

Finally, the system tries to write something to swap. When that fails too it hits a kernel panic, and then the server crashes. I can’t show you text because the actual crash didn’t ever hit the server log file–that write failed too and then the server was done. Here’s a console snapshot showing the end:

VM Disk Timeout Crash

Now, when this happens once or twice, you might write it off as a fluke. But this started happening every few days to the one VM. And the Windows VM was going through its own mysterious crashes. Running along fine, the screen goes black, and Windows was just dead. No blue screen or anything, no error in the logs either. I could still reach the VM with VirtualBox’s RDP server, but the screen was black and it didn’t respond to input. Whatever mysterious issue was going on here, it was impacting all of my VM guests.

On this day where the one server was crashing all of the time, I looked at what was different. I noticed that the VM host itself was running a heavy cron job at the time. Many Linux systems run a nightly updatedb task, the program that maintains the database used by the locate command. When the VM host is busy, that makes I/O to all of the guest VMs slow too. And I had moved a 2TB drive full of data into that server the previous day. Reindexing that whole thing in updatedb was taking a long time. That was the thing that changed–the updatedb job was doing many disk reads.

What was happening here was an I/O timeout. Operating systems give disks a relatively small amount of time to answer requests. When those timeouts, typically 30 seconds long, expire, the OS does all of this crazy bus reset and renegotiation stuff. With the dueling VMs I had, one of the two systems sharing a disk could easily take longer than 30 seconds to respond when the other was pounding data. Since I/O that slow is rare on dedicated hardware, an expired timeout is a condition that can easily hit poorly tested error-handling code in the kernel and turn into a crash. Thus my Linux kernel panics and mysterious Windows black screens.

Once I realized the problem, the fix was easy. There’s a fantastic article on VM disk timeouts that covers how to do this sort of tuning for every operating system I was worried about. I followed his udev based approach for my recent Linux systems, changing them to 180 seconds. (I even added a comment at the end suggesting a different way to test that it is working.) Then I hit regedit on Windows to increase its timeouts from the 60 seconds they were set to:

Windows Disk Timeout Registry Entry
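On the Linux side, the udev approach mentioned above amounts to raising the kernel’s per-device SCSI timeout when a disk appears. A rule along these lines is a sketch only (the match keys and exact sysfs path vary by distribution and hypervisor, so check what your VM guest actually exposes before copying it):

```
# /etc/udev/rules.d/99-disk-timeout.rules (sketch; match keys vary)
# Raise the kernel's per-device I/O timeout from the 30s default to 180s.
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{type}=="0", RUN+="/bin/sh -c 'echo 180 > /sys$DEVPATH/device/timeout'"
```

You can check the value currently in effect for a disk by reading something like /sys/block/sda/device/timeout.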

You’ll find similar advice from VM vendors too, like in VMWare’s Linux timeout guide. Some VM programs will tweak the timeouts for you when you install the guest tools for the VM. But the value I settled on that made all my problems go away was 180 seconds, far beyond what even VM setup software normally does by default.

You can also find advice about this from NAS manufacturers like NetApp, although I wasn’t able to find one of their recommendations I could link to here (they all needed a support account to access). NAS hardware can be very overloaded sometimes. And even when you have an expensive and redundant model, there can be a long delay when a component dies and there’s a failover to its replacement. When disks die, even inside of a NAS, it can go through some amount of this renegotiation work. Tuning individual disk timeouts in a RAID volume is its own long topic.

The other lesson I learned here is that swap is a lot more expensive on a VM than I was estimating, because both the odds and the impact of shared disk contention are very high. For example, even though the working memory of the server hosting this blog usually fit into the 512MB I initially gave it, the minute something heavy hit the server, performance tanked. And many of the performance events were correlated between servers. The Linux guest VM was running updatedb at the exact same time each night as the Linux host system! Both were using the same default crontab schedule. No wonder they were clashing and hitting timeouts during that period on many nights. I improved that whole area by just giving the one Linux VM 2GB of RAM, so swapping was much less likely even during things like updatedb. That change made the server feel much more responsive during concurrent busy periods on other hosts. You can’t just tune servers for average performance; you have to think about their worst case periods too.

The post Tuning Disk Timeouts on Virtual Machines appeared first on High Performance PostgreSQL.

Magnus Hagander: PostgreSQL conference registration updates


Right now we're hard at work settling the last details for PostgreSQL Conference Europe 2013 in Dublin, Ireland. But for those of you who wish to attend, you have an even closer deadline to consider: to qualify for the discounted Early Bird rate, you must complete your registration before September 16th, only a few days away! This is your best chance to learn about a large number of PostgreSQL topics, from case studies to deep technical sessions about backend engineering. So take your chance and go register now!

In other conference related news, next week is Postgres Open in Chicago. I'll be there along with many other PostgreSQL contributors to deliver a set of presentations almost as good as the ones in Dublin. There are still some tickets left - why not go to both conferences?

Jan Mussler: Old school database access using stored procedures


Sproc Wrapper

A few weeks ago we introduced PGObserver, hinting at the broad use of stored procedures for accessing our data. Today, we will go into a bit more detail about how and why we chose a different route back then and continue to use stored procedures. For our new readers: for the Zalando E-commerce Operating System (ZEOS) platform we chose PostgreSQL to store our important data, spanning from customer orders to article metadata. We did so because back then we trusted, and still trust, this great open source database for its performance, reliability, and the flexibility provided by a large feature set including stored procedures in multiple languages.

The decision to use stored procedures (SProcs) was motivated by several aspects, including performance benefits and explicit control over queries. Using stored procedures reduces the number of queries issued by our Java application and lessens the amount of data transferred between our database and application. Routing all access to the underlying data through our API also provides us with the necessary abstraction to change the data structure and layout between releases. Additionally, the API layer combined with limited privileges provides extra safety and control over changes in the database. Stored procedures also enable us to make some last minute changes to the live environment thanks to their easy and fast deployment. One last advantage I want to mention is that stored procedures give you all queries before you deploy, which is great for review and performance analysis. There are situations where stored procedures require a lot of additional work and yield fewer benefits, e.g., CRUD heavy applications with lots of fields, where we have opened things up to the JPA / EclipseLink combo, with some extensions for PostgreSQL specifics such as enums and array fields.

Looking back, using stored procedures meant writing a lot of boilerplate code: creating a single class for every procedure, writing type mappers from database results to Java objects, writing annotations for input parameters, and so on. But all this changed when two of our colleagues created the so-called “typemapper” that took care of reading PostgreSQL type information, reading Java annotations, and combining the two to map stored procedure results to Java objects. This was a big improvement: a lot of code was removed, there was no longer any mapper code to write, and development became less prone to manual mapping errors.

Setting the goal higher, we wanted to write even less code and make using stored procedures in our sharded environment comfortable, so we implemented the “SProc Wrapper” for executing stored procedures. Basically, you define a Java method in an interface and use the proper annotations; from there the SProc Wrapper takes over, deducing type information and so on to correctly execute the right database procedure, fetch the results, and map them back to the return type of the function. This brings the amount of code to write for a single stored procedure down to just a few lines (one in the interface and, to be honest, three more in the implementation). Further, the SProc Wrapper gives you features to run a procedure on a set of shards, select the shard automatically from “shard key” fields, and “aggregate” in the sense of concatenating distinct results into one result set. All of this proved really useful due to our extensive use of sharding.

And now one very basic example. First, the PostgreSQL function, assuming you are using 9.2 or higher:

CREATE FUNCTION compute_product(a int, b int) RETURNS bigint AS
$$
  SELECT a * b;
$$ LANGUAGE "sql";

And finally the Java code:

@SProcService
interface BasicExample {
  @SProcCall
  long computeProduct(@SProcParam int a, @SProcParam int b);
}

There is a bit more work involved setting up a data source and so on, but this example gives you a good impression of how little code is necessary in Java for any particular function.

And now the interesting part: you can find this on github.com/zalando/java-sproc-wrapper, try it and tell us what you think! Or wait until we go into more details in our follow up… :)

The post Old school database access using stored procedures appeared first on Zalando TechBlog.

Chris Travers: PostgreSQL, Community Development, and Support

With the impressive release of PostgreSQL 9.3 I have noticed that a number of journalists seem to only mention a single provider of support.    I decided to write a different sort of article here discussing the PostgreSQL commercial support offerings and how these fit into development.  Please note that the same basic approach applies to LedgerSMB as well, as we have done our best to emulate them.

This is exactly how not to run an open source project if you want it to be a cash cow for a single business, but it is how to run an open source project if you want it to reach as large a user base as possible and provide as many economic opportunities as possible.

PostgreSQL is a community developed, multi-vendor project.  Vendors come and go, but the community endures.  Many vendors who used to contribute to the project no longer do so but there are a number of mainstays.  This article was written in September of 2013, and if you are reading it years later, please be aware there may be additional sources of support available.

Because PostgreSQL is developed by multiple vendors working together, in theory any vendor which employs competent programmers can fix bugs, offer hot fixes, and more for clients, and can, reasonably, if the patches are of good quality, get them accepted upstream.  This is an extremely important selling point for the database management system.

There are several long-standing companies in the community which offer support accounts on the database itself.  This is on top of vendors like Red Hat who offer high quality support with their OS service level agreements.

This list provided here is largely for journalists and others who wish to discuss PostgreSQL support.  It is by no means exhaustive nor is it intended to be.  Support is available in various markets through other companies as well and one of our tasks as a community is to create a larger amount of support and consulting services, serving a larger variety of markets.  This is a strength of the community development model (as opposed to the vendor development model).

In the interest of full disclosure, I am a principal consultant for 2ndQuadrant, and I have worked with folks from Command Prompt, PGExperts, and some other companies on various projects.  Some aspects of what I say here come from something of an insider's perspective.

  1. 2ndQuadrant offers high quality 24x7 support delivered by support engineers which include actual contributors to the software.  Some of their support offerings offer guarantees not found by the vendors of proprietary databases.  I say this as a former employee of Microsoft's Product Support Services division.
  2. Command Prompt. Inc offers service level agreements which ensure quite a bit of proactive assistance.  The firm is one of the long-standing mainstays of the PostgreSQL scene.
  3. PGExperts offers a number of services aimed at ensuring support for critical production environments.
  4. EnterpriseDB offers support for the official version of PostgreSQL, as well as their own proprietary spinoff, "Postgres Plus Advanced Server."  Their proprietary version has a number of features aimed at smoother migration from Oracle, although it is sometimes mistaken for an "enterprise edition" of PostgreSQL.

In the end, this model of support is a selling point of the software.  Unlike with Oracle, the companies which provide support have to serve the customer's needs because otherwise the customer can go elsewhere.

PostgreSQL is used in a large number of critical production capabilities where the ability to call someone for support, and get a very competent second set of eyes when things go wrong is absolutely necessary, and the companies above provide that.  But the companies listed go further, and are able to support the software as if they were the vendor (or likely even better).

David Fetter: Libreadline on OSX/Homebrew

Dimitri Fontaine: PostgreSQL data recovery


The following story is only interesting to read if you like it when bad things happen, or if you don't have a trustworthy backup policy in place. By trustworthy I mean that each backup you take must be tested with a test recovery job. Only tested backups will prove useful when you need them. So go read our Backup and Restore documentation chapter then learn how to setup Barman for handling physical backups and Point In Time Recovery. Get back when you have proper backups, including recovery testing in place. We are waiting for you. Back? Ok, let's see how bad you can end up without backups, and how to still recover. With luck.

Set up a trustworthy backup solution, and review your policy

Did I mention that a trustworthy backup solution includes automated testing of your ability to recover from any and all backup you've been taking? That might be more important than you think.

This article is going to be quite long. Prepare yourself a cup of your favorite beverage. TL;DR: PostgreSQL's resilience, flexibility, and tolerance of bad situations are quite remarkable, and allowed us to get some data back from the middle of a destroyed cluster.

The Setup

Most of the customers I visit have already laid out a backup strategy and implemented it, usually with custom in-house scripts. They hire highly skilled engineers who have been doing system administration for more than a decade, and who are more than able to throw a shell script at the problem.

Shell scripting must, in hindsight, be one of the most difficult things to do right, given how often it ends up doing something else entirely than what its author thought it would. If you want another of my rather blunt pieces of advice, stop doing any shell scripting today: a shell is a nice interactive tool, but if you are doing non-interactive scripting, that's actually called system programming and you deserve a better tool than that.

My take: shell script makes it harder to write production quality code.

In our very case, the customer did realize that a production setup had been approved and was running live before any backup solution was in place. Think about it for a minute. If you don't have tested backups in place, it's not production ready.

Well, the incident was taken seriously, and the usual backup scripts deployed as soon as possible. Of course, the shell scripts depended in non-obvious ways on some parameters (environment variables, database connections, database setup with special configuration tables and rows). And being a shell script, not much verification that the setup was complete had been implemented, you see.

The Horror Story

And guess what that backup script does first? Of course: it makes sure enough space is available on the file system to handle the next backup. That's usually done by applying a retention policy and first removing backups that are too old under said policy. This script did exactly that, too.

The problem is that, as some of you already guessed (yes, I see those smiles trying to hide brains thinking as fast as possible to decide if the same thing could happen to you too), well, the script configuration had not been done before entering production. So the script ran without setup, and without much checking, began making bytes available. By removing any file more than 5 days old. Right. In. $PGDATA.

But recently modified files are still there, right?

Exactly, not all the files of the database system had been removed. Surely something can be done to recover data from a very small number of important tables? Let's now switch to the present tense and see about it.

Can you spell data loss?

Remember, there are no backups. The archive_command is set, though, so that's a first track to consider. After that, what we can do is try to start PostgreSQL on a copy of the remaining $PGDATA and massage it until it allows us to COPY the data out.

The desperate PITR

The WAL archive starts at the file 000000010000000000000009, which makes it unusable without a corresponding base backup, which we don't have. Well, unless maybe we tweak the system. We first need to edit the system identifier, then reset the system to begin replaying at the first file we do have. With some luck...

A broken clock is still right twice a day, a broken backup never is...

Time to try our luck here:

$ export PGDATA=/some/place
$ initdb                                        # fresh cluster to work from
$ hexedit $PGDATA/global/pg_control             # patch in the old system identifier
$ pg_controldata                                # check the result
$ xlogdump /archives/000000010000000000000009   # find the next xid/oid to use
$ pg_resetxlog -f -l 1,9,19 -x 2126 -o 16667 $PGDATA
$ cat > $PGDATA/recovery.conf <<EOF
restore_command = 'gunzip -c /archives/%f.gz > "%p"'
EOF
$ pg_ctl start

Using the transaction data we get from reading our first archived WAL file with xlogdump, then using pg_resetxlog (and thus accepting that we may lose some more data), we still can't start the system in archive recovery mode: the system identifier in the WAL files does not match the one in the system's pg_controldata output.

So we tweak our fresh cluster to match, by changing the first 8 bytes of the control file, paying attention to the byte order. As I already had a Common Lisp REPL open on my laptop, the easiest way for me to convert the decimal representation of the database system identifier was:

(format nil "~,,' ,2:x" 5923145491842547187)
"52 33 3D 71 52 3B 3D F3"

Paying attention to the byte order means that you need to edit the control file's first 8 bytes in reverse: F3 3D 3B 52 71 3D 33 52. But in our case, no amount of massaging allowed PostgreSQL to read from the archives we had.
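The byte reversal is easy to get wrong by hand. A quick way to double-check it (here in Python rather than the Lisp REPL above) is to pack the decimal system identifier as a little-endian 64-bit integer, which directly yields the byte sequence to write at the start of the control file, assuming pg_control was written on a little-endian machine such as x86:

```python
import struct

sysid = 5923145491842547187  # decimal system identifier, as shown by pg_controldata
raw = struct.pack("<Q", sysid)  # little-endian 64-bit, as on x86
print(" ".join(f"{b:02X}" for b in raw))  # F3 3D 3B 52 71 3D 33 52
```

On a big-endian machine the bytes would appear in the "natural" order instead, which is why checking the actual file with hexedit matters.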

On to massaging what is remaining in the old cluster then.

The Promotion

I don't usually do promotion in such a prominent way, but I clearly solved this situation thanks to my colleagues on the 24/7 support team at 2ndQuadrant, with a special mention to Andres Freund for inspiration and tricks:

We also know how to recover your data, but we first insist in proper backups

Oh, did I mention proper backups, and how you need to have tested them successfully before you can call a service "in production" or have any hope in your recovery abilities? I wasn't sure I did...

Playing fast and loose with PostgreSQL

The damaged cluster won't start, for lack of some important metadata files. The first file missing is pg_filenode.map in the global directory. Using xlogdump, it should be possible to recover just this file if it has been changed in the WAL archives we have, but that's not the case.

Trying to salvage a damaged cluster

pg_filenode.map

As this file is only used for shared relations and in some bootstrapping situations (you can't read the current pg_class file node from pg_class itself, as the file mapping is exactly the information you need in order to know which file to read), and knowing that the version on disk was older than 5 days on a cluster recently put into production, we can allow ourselves to try something: copy the pg_filenode.map from another fresh cluster.

My understanding is that this file only changes when doing heavy maintenance on system tables, like CLUSTER or VACUUM FULL, which apparently didn't get done here.

By the way, here's one of those tricks I learnt in this exercise. You can read the second and fourth columns as filenames in the same directory:

od -j 8 -N $((512-8-8)) -td4 < $PGDATA/global/pg_filenode.map
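The od invocation above can be mirrored in a few lines of code, which also makes the assumed layout explicit: an 8-byte header (magic number plus mapping count), then (relation OID, filenode) pairs of 32-bit integers, with all-zero pairs marking unused slots. This is a sketch based on that reading of the file, with little-endian integers assumed as on x86:

```python
import struct

def parse_filenode_map(data):
    """Return (oid, filenode) pairs from pg_filenode.map file contents.

    Assumes the layout the od command above relies on: an 8-byte header,
    then little-endian 32-bit (oid, filenode) pairs; trailing CRC and
    padding in the 512-byte file are ignored, as are all-zero slots.
    """
    pairs = []
    body = data[8:504]  # same window as: od -j 8 -N $((512-8-8))
    for offset in range(0, len(body) - 7, 8):
        oid, filenode = struct.unpack_from("<II", body, offset)
        if oid == 0 and filenode == 0:
            break
        pairs.append((oid, filenode))
    return pairs
```

Reading the pairs programmatically rather than eyeballing od output helps when you need to compare the damaged cluster's map against a fresh one.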

So copying a default pg_filenode.map allowed us to get past that error and on to the next.

pg_clog

Next is the lack of some pg_clog files. That's a little tricky, because those binary files contain the commit log information, used to quickly decide whether recent transactions are still in flight, already committed, or rolled back. We can easily trick the system and declare that all transactions older than 5 days (remember, that's what the bug in the cleanup script removed) have in fact been committed. A commit in the CLOG is the two-bit value 01, so a single byte can hold the status of as many as 4 transactions.

Here's how to create those files from scratch, once you've noticed that 01010101 is in fact the ASCII code for the letter U.

(code-char #b01010101)
#\U

So to create a series of CLOG files in which all transactions appear committed, so that we can see the data, we can use the following command line:

for c in 0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C
do
    # fill each 256kB CLOG segment with 'U' (01010101): four commits per byte
    dd if=/dev/zero bs=256k count=1 | tr '\0' 'U' > $c
done
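The trick works because the CLOG stores two status bits per transaction, and 01 means committed; packing four committed statuses into one byte gives 0b01010101, which happens to be the ASCII letter U. A quick sanity check of that bit arithmetic (the status value is taken from the description above):

```python
XACT_COMMITTED = 0b01  # two status bits per transaction; 01 = committed

byte = 0
for slot in range(4):  # four transaction statuses fit in one byte
    byte |= XACT_COMMITTED << (2 * slot)

assert byte == 0b01010101 == 0x55
print(chr(byte))  # U
```

At four statuses per byte, each 256kB segment written by the loop above covers roughly a million transactions, which is why a handful of segments was enough here.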

pg_database

The next problem we are confronted with is that PostgreSQL has lost the backing files for the pg_database relation, and has no idea what those directories in $PGDATA/base are supposed to be. We only have the numbers!

That said, the customer still had a history of the commands used to install the database server, so we knew in which order the databases were created. That gave us an OID-to-name mapping. How to apply it?

Well, pg_database is a shared catalog and the underlying file apparently isn't that easy to hack around, so the easiest solution is to actually hack the CREATE DATABASE command to accept a WITH OIDS option (OIDS is already a PostgreSQL keyword, OID is not, and we're not going to introduce new keywords just for that particular patch).

Equipped with that hacked version of PostgreSQL, it's then possible to use the new command and create the databases we need with the OIDs we know.

Those OIDs are then to be found on disk, in the file where pg_database is internally stored, and we can ask the system where that file is:

select oid, relname, pg_relation_filenode(oid)
  from pg_class
 where relname = 'pg_database';
 oid  |   relname   | pg_relation_filenode 
------+-------------+----------------------
 1262 | pg_database |                12319
(1 row)

Then without surprise we can see:

$ strings $PGDATA/global/12319
postgres
template0
template1

Once that file is copied over to the (running, as it happened) damaged cluster, it's then possible to actually open a connection to a database. And that's pretty impressive. But suddenly it didn't work anymore...

System Indexes

This problem was fun to diagnose. The first psql call would be fine, but the second one would always fail with an error you might never have seen in the field. I sure hadn't before:

FATAL:  database "dbname" does not exist
DETAIL:  Database OID 17838 now seems to belong to "otherdbname"

Part of PostgreSQL's startup builds up some caches, and for that it uses indexes. Either we made a mistake somewhere, or an index is corrupted; apparently there's a mismatch.

But your now all-time favourite development team knew this would happen to you, and is very careful that any feature included in the software is able to bootstrap itself without using any indexes, or that in bad situations the system knows how to withstand the lack of those indexes by turning the feature off. That's the case for Event Triggers, for example, as you can see in the commit cd3413ec3683918c9cb9cfb39ae5b2c32f231e8b.

Another kind of indexing system

So yes, it is indeed possible to start PostgreSQL and have that marvellous production-ready system avoid all system indexes, for cases where you have reason to think they are corrupted... or plain missing.

$ pg_ctl start -o "-P"                  # -P: ignore system indexes at startup
$ cat >> $PGDATA/postgresql.conf <<EOF  # append, don't truncate the config
enable_indexscan = off
enable_bitmapscan = off
enable_indexonlyscan = off
EOF
$ pg_ctl reload

While at it, we edit postgresql.conf and disable the index-usage-related settings, as you can see, because this problem will certainly not be limited to the system indexes.

If you're using another database system alongside (or instead of) PostgreSQL, now is the time to check that you can actually start it when its internal indexes are corrupted or missing, by the way. I think that says a lot about a system's readiness for production usage, and about the developers' attitude towards what happens in the field.

Also note that with PostgreSQL it's then possible to rebuild those system indexes using the REINDEX command.

So we now have a running PostgreSQL service, serving the data that is still available. Well, not quite: we have a PostgreSQL service that accepts to start and allows connections to a specific database.

pg_proc, pg_operator, pg_cast, pg_aggregate, pg_amop and others

The first query I tried on the new database was against pg_class, to get details about the available tables. The psql command line tool runs a large number of queries in order to serve the \d output; the \dt one is usable in our case.

To see what queries psql commands send to the server, use the \set ECHO_HIDDEN toggle.

Just about any query now complains that the target database is missing files. To understand which relation a file belongs to, I used the following query in a fresh cluster. The example is for an error message where base/16384/12062 is missing:

select oid, relname, pg_relation_filenode(oid)
  from pg_class
 where pg_relation_filenode(oid) = 12062;
 oid  | relname | pg_relation_filenode 
------+---------+----------------------
 1255 | pg_proc |                12062
(1 row)

In our specific case, no extensions were in use. Check that before taking action here, or at least make sure that the tables you want to recover data from don't use extensions; that would make things much more complex.

Here we can just use default contents for most of the system catalogs: we are using the same set of functions, operators, casts, aggregates, etc. as any other 9.2 system, so we can directly copy files created by initdb to wherever the error message leads.

pg_namespace

Some error messages are about things we should definitely not ignore. The content of the pg_namespace relation was lost in about all of our databases, and the applications here were using non-default schemas.

To recover from that situation, we need to better understand how this relation is actually stored:

# select oid, * from pg_namespace;
  oid  |      nspname       | nspowner |        nspacl        
-------+--------------------+----------+----------------------
    99 | pg_toast           |       10 | 
 11222 | pg_temp_1          |       10 | 
 11223 | pg_toast_temp_1    |       10 | 
    11 | pg_catalog         |       10 | {dim=UC/dim,=U/dim}
  2200 | public             |       10 | {dim=UC/dim,=UC/dim}
 11755 | information_schema |       10 | {dim=UC/dim,=U/dim}
(6 rows)

# copy pg_namespace to stdout with oids;
99	pg_toast	10	\N
11222	pg_temp_1	10	\N
11223	pg_toast_temp_1	10	\N
11	pg_catalog	10	{dim=UC/dim,=U/dim}
2200	public	10	{dim=UC/dim,=UC/dim}
11755	information_schema	10	{dim=UC/dim,=U/dim}

So it's actually pretty easy here, once you make the right connections: let's import a default pg_namespace file, then append to it thanks to COPY IN, being careful to use tabs (well, unless you use the delimiter option, of course):

# copy pg_namespace from stdin with oids;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 16443	my_namespace	10	\N
>> \.

And now there's a new schema in there with the OID we want. Wait, how do we figure out which OID we need?

# select c.oid, relname, relnamespace, nspname
    from pg_class c left join pg_namespace n on n.oid = c.relnamespace
   where relname = 'bar';
  oid  | relname | relnamespace | nspname 
-------+---------+--------------+---------
 16446 | bar     |        16443 | 
(1 row)

So in the result of that query we have no nspname, but we happen to know that the table bar is supposed to be in the schema my_namespace.

And believe it or not, that method actually allows you to create a schema in a database in a running cluster. We are directly editing the catalog files, even setting the OID of the rows we are injecting.

The reason we couldn't do that for pg_database, if you're wondering, is that pg_database is a shared catalog and part of the bootstrapping, so it was impossible to start PostgreSQL until we fixed it, and the only implementation of COPY we have requires a running PostgreSQL instance.

pg_attribute and pg_attrdef

Now that we are able to refer to the right relation in a SQL command, we should be able to dump its content, right? Well, it so happens that in some cases it's OK and in some cases it's not.

We are very lucky in this exercise in that pg_attribute is not missing. We might have been able to rebuild it thanks to a pg_upgrade implementation detail, by forcing the OID of the next table to be created and then issuing the right command, as given by pg_dump. By the way, did I mention backups? And automated recovery tests?

We need the data attributes

In some cases, though, we are missing the pg_attrdef relation wholesale. That relation stores the default expressions attached to columns, as we can see in the following example, taken on a working database server:

# \d a
                         Table "public.a"
 Column |  Type   |                   Modifiers                    
--------+---------+------------------------------------------------
 id     | integer | not null default nextval('a_id_seq'::regclass)
 f1     | text    | 
Indexes:
    "a_pkey" PRIMARY KEY, btree (id)

#  select adrelid, adnum, adsrc
     from pg_attrdef
    where adrelid = 'public.a'::regclass;
 adrelid | adnum |             adsrc             
---------+-------+-------------------------------
   16411 |     1 | nextval('a_id_seq'::regclass)
(1 row)

# select attnum, atthasdef
    from pg_attribute
   where     attrelid = 'public.a'::regclass
         and atthasdef;
 attnum | atthasdef 
--------+-----------
      1 | t
(1 row)

We need to remember that the goal here is to salvage some data out of an installation where a lot is missing; it's not at all about ever being able to use that system again. Given that, what we can do here is simply ignore the default expressions of the columns, by directly updating the catalogs:

# update pg_attribute
     set atthasdef = false
   where attrelid = 'my_namespace.bar'::regclass;

COPY the data out! now!

At this point we are able to actually run the COPY command and store the interesting data in a plain file, usable on another system for analysis.

Not every relation from the get-go, mind you; sometimes some default catalogs were still missing. But in this instance of the data recovery, we were able to replace all the missing pieces of the puzzle by copying over the underlying files, as we did in the previous sections.

Conclusion

Really, PostgreSQL once again surprised me with its flexibility and resilience. After someone had tried quite hard to kill it dead, it was still possible to rebuild the cluster into shape piecemeal and get the interesting data back.

I should mention, maybe, that with a proper production setup, including a continuous archiving and point-in-time recovery solution such as pgbarman, walmgr, OmniPITR or PITRtools, the recovery would have been really simple.

Using a ready-made solution is often better because it includes not just backup support, but also recovery support. You don't want to figure out recovery at the time you need it, and you don't want to discover whether your backup is really good enough to recover from at the time you need it, either. You should be testing your backups, and the only test that counts is a recovery.

It's even one of those rare cases where PostgreSQL replication would have been a solution: the removal of the files happened without PostgreSQL being involved, so it didn't know it was happening and wouldn't have replicated it to the standby.

Hans-Juergen Schoenig: Monitoring: Keeping an eye on old transactions

To handle transactions PostgreSQL uses a mechanism called MVCC (Multi Version Concurrency Control). The core idea of this machinery is to allow the storage engine to keep more than just one version of the row. What does it mean and why is that so? Let us consider a simple example: BEGIN; UPDATE foo SET bar […]

Jignesh Shah: PostgreSQL replication? I need temp tables for my reports

One of the things I frequently hear is that many PostgreSQL users avoid running PostgreSQL replication because they want to offload reports that use temporary tables to the slaves. Since PostgreSQL replicas are purely read-only, they cannot support temporary tables.

There is one way to overcome this with postgres_fdw, the PostgreSQL Foreign Data Wrapper, which was improved in the newly released PostgreSQL 9.3.
  • Create a master-slave setup with PostgreSQL 9.3 (with synchronous or asynchronous replication, as per your needs).
  • On the slave, set up another PostgreSQL 9.3 instance (on a different port) with postgres_fdw, and map all tables from the slave as foreign tables with the same names as their remote counterparts.
  • Run the reports that require temporary tables on this new instance.
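Since postgres_fdw in 9.3 has no bulk import command, the mapping in the second step means one CREATE FOREIGN TABLE per table. A small generator sketch shows the shape of the DDL involved; the server name, credentials, table names and column lists here are all hypothetical placeholders for your own schema:

```python
def fdw_setup_sql(tables, host="localhost", port=5432, dbname="appdb"):
    """Generate postgres_fdw DDL mapping remote tables to local foreign tables.

    tables: {table_name: column_definition_sql}. All names used here
    (slave_server, reporting user, etc.) are placeholders.
    """
    stmts = [
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        f"CREATE SERVER slave_server FOREIGN DATA WRAPPER postgres_fdw "
        f"OPTIONS (host '{host}', port '{port}', dbname '{dbname}');",
        "CREATE USER MAPPING FOR CURRENT_USER SERVER slave_server "
        "OPTIONS (user 'reporting', password 'secret');",
    ]
    for name, columns in tables.items():
        stmts.append(
            f"CREATE FOREIGN TABLE {name} ({columns}) "
            f"SERVER slave_server OPTIONS (table_name '{name}');"
        )
    return "\n".join(stmts)

print(fdw_setup_sql({"orders": "id int, total numeric"}))
```

With the foreign tables in place, a report can join them freely with local temporary tables on the second instance.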
Of course, there are a few caveats to this setup:
  • Query plans: currently they are still inefficient, but as postgres_fdw improves, this will likely go away. In fact, more usage of this scenario will push it to be improved.
  • Lots of data moving: most DW reports do read a lot of rows. However, by setting this up on the same server, most of the traffic is loopback and doesn't go out on the wire.
  • More resources: this does require more memory/CPU on the server, but it is still cheaper, since managing such a server is simpler than other, more complex designs that achieve the same goal.
I would like to hear about your experiences with this too, so feel free to send me comments.

Jim Mlodgenski: HadoopFDW


With the release of PostgreSQL 9.3, writable foreign tables let us do some really cool things. BigSQL just released a Hadoop Foreign Data Wrapper that can write to HDFS files and HBase tables. The HBase integration allows full SELECT, INSERT, UPDATE and DELETE syntax through PostgreSQL, and the HDFS integration allows SELECT and INSERT.

The HadoopFDW is released under the PostgreSQL license and can be found here.
