Having given many talks about Postgres administration over the years, I have recently felt convicted that I have not given sufficient coverage to external monitoring tools. I have always discussed how to get information out of Postgres, but not how to efficiently process that information. I have updated my administration presentation, but I would like to go into more detail here to atone for my previous lack of coverage.
- Alerting: For alerting, it is tough to beat check_postgres and tail_n_mail. Check_postgres runs checks on many aspects of Postgres and reports the results via text output, Nagios, or MRTG. Tail_n_mail scans the server logs for important messages and emails them.
- Analysis: There are many analysis tools available. Nagios is great for distributing alerts, while Munin and Zabbix are good for graphing; in fact, some people use Nagios for alerting and Munin for graphing. Cacti and MRTG are also popular. This blog entry from 2009 compares several popular analysis tools.
- Queries: pgFouine ("fouine" is French for weasel) is an often-overlooked tool that analyzes SQL query traffic to find the slowest queries, the most frequent queries, and the queries that consumed the most cumulative execution time (sample report). It gathers this information by reading the server logs.
- Commercial: Commercially developed tools include Circonus (open source version, Reconnoiter), Postgres Enterprise Manager (strong Postgres integration), and Hyperic (both open and closed source versions).
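The core idea behind log-based alerting of the kind tail_n_mail does can be sketched in a few lines of Python. This is not tail_n_mail's implementation; it is a minimal illustration that assumes the standard Postgres severity labels (ERROR, FATAL, PANIC) appear in each log line, and the names `scan_log` and `format_report` are hypothetical. A real tool would also remember its position in the log between runs and actually send the email; both steps are omitted here.

```python
import re
from typing import Iterable, List, Tuple

# Severities treated as "important" in this sketch (an assumption,
# not tail_n_mail's actual configuration).
IMPORTANT = ("PANIC", "FATAL", "ERROR")

# Finds a severity tag such as "ERROR:" and captures the message after it.
SEVERITY_RE = re.compile(r"\b(" + "|".join(IMPORTANT) + r"):\s+(.*)")


def scan_log(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (severity, message) for important lines."""
    hits = []
    for line in lines:
        m = SEVERITY_RE.search(line)
        if m:
            hits.append((m.group(1), m.group(2).strip()))
    return hits


def format_report(hits: List[Tuple[str, str]]) -> str:
    """Hypothetical helper: build a plain-text body suitable for an email."""
    if not hits:
        return "No important messages found."
    return "\n".join(f"[{sev}] {msg}" for sev, msg in hits)


if __name__ == "__main__":
    sample = [
        "2011-01-01 12:00:00 EST LOG:  checkpoint starting: time",
        '2011-01-01 12:00:05 EST ERROR:  relation "foo" does not exist',
    ]
    print(format_report(scan_log(sample)))
```

Because the pattern is searched rather than anchored, it works regardless of the `log_line_prefix` setting, at the cost of also matching a severity word that happens to appear inside a message.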
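The kind of report pgFouine produces (slowest, most frequent, and most cumulatively expensive queries) can be approximated by grouping logged statements and totaling their durations. The sketch below is not pgFouine's code. It assumes `log_min_duration_statement = 0`, so every statement is logged with a `duration: ... ms  statement: ...` message, and that each statement fits on one log line. It uses a crude normalization that replaces quoted strings and numbers with `?`. The names `normalize`, `aggregate`, `top`, and `QueryStats` are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Matches the message logged when log_min_duration_statement is enabled,
# e.g. "LOG:  duration: 12.345 ms  statement: SELECT 1"
DURATION_RE = re.compile(r"duration:\s+([\d.]+)\s+ms\s+statement:\s+(.*)")

# Crude normalization (an assumption): strip literals so queries that
# differ only in constants are grouped together.
STRING_RE = re.compile(r"'(?:[^']|'')*'")
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")


def normalize(sql: str) -> str:
    sql = STRING_RE.sub("?", sql)
    sql = NUMBER_RE.sub("?", sql)
    return " ".join(sql.split())


@dataclass
class QueryStats:
    count: int = 0
    total_ms: float = 0.0
    max_ms: float = 0.0


def aggregate(lines: Iterable[str]) -> Dict[str, QueryStats]:
    """Group logged statements by normalized text and accumulate timings."""
    stats: Dict[str, QueryStats] = defaultdict(QueryStats)
    for line in lines:
        m = DURATION_RE.search(line)
        if not m:
            continue  # not a duration/statement line
        ms = float(m.group(1))
        s = stats[normalize(m.group(2))]
        s.count += 1
        s.total_ms += ms
        s.max_ms = max(s.max_ms, ms)
    return dict(stats)


def top(stats: Dict[str, QueryStats],
        key: Callable[[QueryStats], float],
        n: int = 3) -> List[Tuple[str, QueryStats]]:
    """Return the n query groups ranked highest by the given key."""
    return sorted(stats.items(), key=lambda kv: key(kv[1]), reverse=True)[:n]
```

Ranking the same statistics three ways gives the three views: `top(stats, lambda s: s.max_ms)` for the slowest queries, `top(stats, lambda s: s.count)` for the most frequent, and `top(stats, lambda s: s.total_ms)` for the most cumulative time. A frequent fast query can therefore outrank a rare slow one in the cumulative view, which is often the most useful for tuning.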
Postgres is good at generating voluminous output suitable for monitoring. External monitoring tools help to make that information useful for administrators.