The Tipping Point
In PostgreSQL 9.1 and earlier, benchmarks that I and others did all showed that the optimal number of active database connections was usually somewhere around ((2 * core_count) + effective_spindle_count). Above this number, both throughput and latency got worse. In every version since then the tipping point has moved higher, but the effect is still present. In graphs you often see this with Transactions Per Second on the y axis and Concurrency (i.e., the number of active connections) on the x axis: a steep climb, followed by a "knee", and then a performance drop-off. The good news is that every major release for a while has moved the knee to the right and decreased the slope past the knee -- but the knee is still there.
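To make the arithmetic concrete, here is a back-of-the-envelope sketch in Python of that 9.1-era starting point. It is just an illustration of the formula above, not a tuning recommendation; core_count and effective_spindle_count are values you would estimate for your own hardware (with a fully cached data set, the effective spindle count is essentially zero):

    # Back-of-the-envelope sketch of the 9.1-era formula discussed above.
    def starting_pool_size(core_count, effective_spindle_count):
        # Above roughly this many active connections, both throughput
        # and latency tended to get worse in those benchmarks.
        return (2 * core_count) + effective_spindle_count

    # Hypothetical box: 8 cores, storage that keeps about 4 drives busy.
    print(starting_pool_size(8, 4))  # -> 20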
Users and Database Connections are Different Things
Sometimes people will say "I want to support 2000 users, with fast response time." It is pretty much guaranteed that if you try to do that with 2000 actual database connections, performance will be horrible. If you have a machine with a lot of cores and the active data set is fully cached, you will see much better performance for those 2000 users by funnelling the requests through a small number of database connections -- depending on the PostgreSQL version, that may be anywhere from 2 to maybe 10 times the number of cores.
To understand why that is true, this thought experiment should help. Consider a hypothetical database server machine with only one resource to share -- a single core. This core will time-slice equally among all concurrent requests with no overhead. Let's say 100 requests all arrive at the same moment, each needing one second of CPU time. The core works on all of them, time-slicing among them until they all finish 100 seconds later. Now consider what happens if you put a connection pool in front which accepts 100 client connections but makes only one request at a time to the database server, queuing any requests that arrive while the connection is busy. Now when 100 requests arrive at the same time, one client gets a response in 1 second, another in 2 seconds, and so on, with the last client getting its response in 100 seconds. Nobody had to wait longer to get a response, throughput is the same, but the average latency is 50.5 seconds rather than 100 seconds.
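The arithmetic is easy to verify. A few lines of Python reproduce the thought experiment directly (this models the idealized schedulers described above, not a real server):

    n = 100  # one-second requests, all arriving at the same moment

    # Perfect time-slicing on one core: every request finishes together.
    timeslice = [n] * n                  # each completes at the 100-second mark

    # Pool with a single connection: request i finishes at second i.
    queued = list(range(1, n + 1))

    print(sum(timeslice) / n)  # 100.0 -- average latency with time-slicing
    print(sum(queued) / n)     # 50.5  -- average latency behind the pool
    print(max(queued))         # 100   -- the worst case is no worse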
A real database server has more resources that can be used in parallel, but the same principle holds: once they are saturated, you only hurt things by adding more concurrent database requests. It is actually worse than the above thought experiment, because with more tasks you have more task switches, increased contention for locks and cache, L2 and L3 cache line contention, and many other issues which cut into both throughput and latency. On top of that, while a high work_mem setting can help a query in a number of ways, that setting is a limit per plan node for each connection, so with a large number of connections you need to keep it very small to avoid flushing cache or causing swapping; a smaller work_mem setting, in turn, leads to the choice of slower plans, or slower run times for the same plans, from such things as hash tables spilling to disk.

Some database products effectively build a connection pool or some form of request queuing into the server, but the PostgreSQL community has taken the position that, since the best connection pooling is done closer to the client software, it will leave this to the users to manage. Most poolers have some way to limit the database connections to a hard number while allowing more concurrent client requests than that, queuing requests as necessary. This is what you want, and it should be done on a transactional basis, not per statement or per connection. Care must be taken to handle session properties correctly, and in some cases this poses a barrier to using the ideal type of connection pooling; in such cases it is still a good idea to find ways to keep the number of database connections as small as practical, using whatever techniques are available.
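To make the client-side version of that concrete, here is a minimal sketch in Python using psycopg2's ThreadedConnectionPool. The connection limit, the DSN, and the semaphore (added so that surplus requests wait in line rather than erroring when the pool is exhausted) are all assumptions for the example, not recommendations:

    import threading
    from psycopg2.pool import ThreadedConnectionPool

    MAX_DB_CONNECTIONS = 10  # hard cap on real connections; illustrative value
    pool = ThreadedConnectionPool(1, MAX_DB_CONNECTIONS,
                                  dsn="dbname=app user=app")  # hypothetical DSN
    # ThreadedConnectionPool raises an error when exhausted, so gate checkouts
    # with a semaphore to make extra callers queue instead of failing.
    slots = threading.Semaphore(MAX_DB_CONNECTIONS)

    def run_transaction(work):
        """Borrow a connection for exactly one transaction, then return it."""
        with slots:                      # queue here if all connections are busy
            conn = pool.getconn()
            try:
                with conn:               # commits on success, rolls back on error
                    with conn.cursor() as cur:
                        work(cur)
            finally:
                pool.putconn(conn)       # release to the next waiting request

The important part is the shape, not the library: many client requests, a small fixed number of database connections, and a queue in between, scoped per transaction so session state never leaks from one client to another.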
Where's the Beef?
One other analogy may help show how a connection pooler can help -- consider a butcher shop with a counter, behind which are four butchers. If any butcher is idle when a customer walks in, that butcher will immediately offer to help the customer -- no problem. Now, if it is rush hour and 20 customers walk in, you can either have them take numbers once all the butchers are busy, or you can have a mad free-for-all at the counter. In the free-for-all, a butcher is slicing a quantity of meat for one customer when another comes up and demands some attention, so the butcher sets aside the first person's order and starts working on the second person's order. Then a third person comes up and the butcher starts on a third order. As the customers vie for attention, each butcher switches from one job to another to keep all of them from feeling their respective orders are being neglected. Some customers might, by bad luck, be neglected long enough to see new customers enter the shop, get served, and leave -- without yet seeing the completion of their own, smaller orders. Of course, there would be overhead in keeping track of the various orders and switching among them repeatedly, but even without that, customers would be waiting longer, on average, than if the shop put in a "take a number" system.
Wrap-Up
At some point PostgreSQL may add a built-in connection pool or an admission control mechanism which can queue a request to start a database transaction when the number of active transactions is at some configurable limit. If that ever happens, things could be simpler on the application side. Until then, it is often possible to handle more users with better performance by using a client-side connection pool (e.g., Apache Commons DBCP) or an external connection pool (e.g., PgBouncer).
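For the external route, a minimal PgBouncer configuration along these lines expresses the same idea; the addresses, file paths, and sizes here are placeholders to adapt, not recommendations:

    [databases]
    ; hypothetical database entry; adjust host/port/dbname for your setup
    app = host=127.0.0.1 port=5432 dbname=app

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; pool on a transactional basis, as discussed above
    pool_mode = transaction
    ; accept many client connections...
    max_client_conn = 2000
    ; ...but hold only a small number of real server connections
    default_pool_size = 20

Note pool_mode = transaction: that is the per-transaction pooling described above, and it is also where the session-property caveat bites, since session-level features such as temporary tables or session-scoped prepared statements do not mix well with it.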