Channel: Planet PostgreSQL

Josh Berkus: Red Hat Kernel cache clearing issue

Recently, mega-real-estate sales site Tigerlead called us with a very strange problem.  One of their dedicated PostgreSQL servers refused to use most of its available RAM, forcing the system to read from disk.  Given that the database was 60GB in size and the server had 96GB of RAM, this was a painful performance degradation.

Output of free -m:

             total       used       free     shared    buffers     cached
Mem:         96741      50318      46422          0         21      44160
-/+ buffers/cache:       6136      90605
Swap:        90111          3      90107
 
As you can see here, the system is using only about half of its RAM for cache and leaving the other half free.  This would be normal behavior if only half the cache were needed, but iostat also showed numerous and frequent reads from disk, resulting in I/O waits for user queries.  Still, there could be other explanations for that.
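To put numbers on "half", here is a minimal Python sketch that parses the old-style `free -m` output shown above and reports what share of RAM is cache and what share is free. The function names are my own, and the parser assumes the pre-2014 `free` layout with `buffers` and `cached` columns; newer versions of `free` print a different header.

```python
# Sketch: parse old-style `free -m` output and compute cache/free shares.
# Function names are illustrative, not part of any tool.

def parse_free_mem_line(free_output: str) -> dict:
    """Return the 'Mem:' row of `free` output as {column_name: int}."""
    lines = free_output.strip().splitlines()
    header = lines[0].split()  # total, used, free, shared, buffers, cached
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return dict(zip(header, (int(v) for v in fields[1:])))
    raise ValueError("no 'Mem:' row found in free output")


def cache_summary(mem: dict) -> dict:
    """Express the cached and free columns as percentages of total RAM."""
    total = mem["total"]
    return {
        "cached_pct": 100.0 * mem["cached"] / total,
        "free_pct": 100.0 * mem["free"] / total,
    }


# The output from the affected server, as quoted in this post.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:         96741      50318      46422          0         21      44160
-/+ buffers/cache:       6136      90605
Swap:        90111          3      90107
"""

mem = parse_free_mem_line(SAMPLE)
summary = cache_summary(mem)
print(f"cached: {summary['cached_pct']:.1f}% of RAM, "
      f"free: {summary['free_pct']:.1f}% of RAM")
```

On a healthy database server whose data set exceeds the cache, you would expect the free share to be small; a free share near 50% alongside heavy disk reads is the symptom described here.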

So, I tried forcing a cache fill by running pg_dump.  This caused the cache to fill most of the free memory -- but then Linux aggressively cleared the cache, bringing it back down to around 40GB within a few minutes.  This seemed to be the case no matter what we did, including tinkering with the vm parameters, increasing the size of the swap file, and changing shared_buffers.  This was highly peculiar; it was as if Linux was convinced that we had half as much RAM as we did.
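If you want to watch for the same fill-then-clear pattern, one option is to sample /proc/meminfo at a fixed interval while something like pg_dump is running. This is only a sketch: the function names, the sample count, and the 30-second interval are my own choices, not what we used on the Tigerlead server.

```python
# Sketch: sample Cached and MemFree from /proc/meminfo over time (Linux only).
# Names and defaults are illustrative assumptions.
import time


def parse_meminfo(text: str) -> dict:
    """Return /proc/meminfo fields as {name: int}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Read and parse the live /proc/meminfo file."""
    with open(path) as f:
        return parse_meminfo(f.read())


def watch_cache(samples: int = 10, interval_s: float = 30.0,
                reader=read_meminfo) -> list:
    """Print cache and free memory at each sample; return cached kB readings."""
    readings = []
    for i in range(samples):
        info = reader()
        cached_kb = info["Cached"]
        free_kb = info["MemFree"]
        # /proc/meminfo reports kB (1024 bytes), so kB / 2**20 gives GiB.
        print(f"{time.strftime('%H:%M:%S')} "
              f"cached={cached_kb / 2**20:.1f} GiB "
              f"free={free_kb / 2**20:.1f} GiB")
        readings.append(cached_kb)
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

Calling `watch_cache()` in one terminal while the dump runs in another should show the cached figure climbing and, on an affected kernel, falling back within a few minutes.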

What fixed the problem was changing the kernel version.  It turns out that kernel 2.6.32-71.29.1.el6.x86_64, released by Red Hat during a routine update, has some kind of cache management issue which can't be fixed in user space.  Fortunately, they now have a later kernel version out as an update.
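A quick way to check a fleet for this is to compare each machine's running kernel against the two releases mentioned in this post. The sketch below is hedged accordingly: it only knows that 2.6.32-71.29.1.el6 showed the problem and that 2.6.32-220.4.2.el6 did not on our server. Treating anything at or after the latter as fine is my assumption, and releases in between are reported as unknown. The function names and labels are made up for illustration.

```python
# Sketch: classify the running kernel against the two releases in this post.
# The classification rules are assumptions, not Red Hat guidance.
import platform
import re

KNOWN_BAD = "2.6.32-71.29.1.el6.x86_64"    # showed the cache-clearing issue
KNOWN_GOOD = "2.6.32-220.4.2.el6.x86_64"   # behaved normally on the same server


def release_key(release: str) -> tuple:
    """Turn '2.6.32-71.29.1.el6.x86_64' into (2, 6, 32, 71, 29, 1)."""
    m = re.match(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(p) for p in f"{m.group(1)}.{m.group(2)}".split("."))


def classify(release: str) -> str:
    """Label a kernel release string using only the facts from this post."""
    if ".el6" not in release:
        return "not-el6"
    if release == KNOWN_BAD:
        return "reported-bad"
    if release_key(release) >= release_key(KNOWN_GOOD):
        return "at-or-after-reported-good"
    return "unknown"


running = platform.release()  # same string as `uname -r`
print(running, "->", classify(running))
```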

Before:

[root ~]# free -g
             total       used       free     shared    buffers     cached
Mem:            94         24         70          0          0         19

[root ~]# uname -a
Linux server1.company.com 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux

After:

[root ~]# free -g
             total       used       free     shared    buffers     cached
Mem:            94         87          6          0          0         83

[root ~]# uname -a
Linux server1.company.com 2.6.32-220.4.2.el6.x86_64 #1 SMP Tue Feb 14 04:00:16 GMT 2012 x86_64 x86_64 x86_64 GNU/Linux

That's more like it!   Thanks to Andrew Kerr of Tigerlead for helping figure this issue out.

I don't know whether other Linux distributors released the same kernel in a routine update.  I haven't seen this behavior (yet) with Ubuntu, Debian, or SuSE.  If you see it, please report it in the comments or, better yet, to the appropriate mailing list.

