Storing sessions with Drupal

When developing a web app with PHP, especially with a framework or a CMS, most of the time you simply don’t care about sessions. Occasionally you store and read some information in a session through a language construct: a “singleton” object (e.g. Zend_Session in Zend Framework), some special wrapper functions, or just the $_SESSION superglobal. Even then, you rarely bother checking how and where sessions are stored. But sometimes you should.

This post looks at what it takes, on a high-load website, to let users log in and access personalized information. When you have hundreds of thousands of hits per day, you should expect tens of thousands of users to log in to their accounts. That’s when you need to start caring about session storage.

Default storage in PHP

By default, PHP stores each user’s session in its own file in a directory. The default value of session.save_path is “” (empty), which makes PHP use your system’s temporary files folder. When you have tons of simultaneous sessions, this directory ends up holding loads of small session files, plus whatever else was put there.

The direct consequence is that whenever a user browses your website (and gets past your page cache), the PHP process has to locate that user’s session file in an ever-growing directory. PHP’s session garbage collector also periodically has to scan the whole directory for expired files. The more sessions you store (a function of the number of users, the session lifetime, and the garbage collection timeout), the longer this takes, especially on magnetic disks (an SSD volume will scale somewhat further, of course). Sooner or later this leads to a growing number of “hanging” PHP processes waiting on disk reads. A disaster, performance-wise.
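To make this cost concrete, here is a simplified model of a file-per-session backend. It is written in Python rather than PHP so that it runs standalone, and it is not PHP’s actual implementation. `FileSessionStore` and its methods are hypothetical names. Only two details mirror what PHP’s default files handler does: the `sess_<id>` file naming and the modification-time-based expiry. Reading one session is a direct lookup by file name, but garbage collection has to stat every file in the directory, so its cost grows with the number of stored sessions.

```python
import os
import tempfile
import time


class FileSessionStore:
    """Hypothetical, simplified model of a file-per-session backend."""

    def __init__(self, save_path, max_lifetime=1440):
        self.save_path = save_path
        # Seconds of inactivity before a session counts as garbage,
        # playing the role of session.gc_maxlifetime (default 1440).
        self.max_lifetime = max_lifetime

    def _path(self, session_id):
        # One file per session, named sess_<id> like PHP's files handler.
        return os.path.join(self.save_path, "sess_" + session_id)

    def write(self, session_id, data):
        with open(self._path(session_id), "w", encoding="utf-8") as f:
            f.write(data)

    def read(self, session_id):
        # Opened directly by name, but the filesystem still has to resolve
        # that name in a directory that may hold a huge number of entries.
        try:
            with open(self._path(session_id), encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            return ""

    def gc(self, now=None):
        # Garbage collection stats every file in the directory, so its
        # cost grows linearly with the number of stored sessions.
        now = time.time() if now is None else now
        with os.scandir(self.save_path) as entries:
            expired = [
                e.path
                for e in entries
                if e.name.startswith("sess_")
                and e.stat().st_mtime < now - self.max_lifetime
            ]
        for path in expired:
            os.remove(path)
        return len(expired)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = FileSessionStore(tmp, max_lifetime=60)
        store.write("abc", "uid|i:42;")
        print(store.read("abc"))                 # uid|i:42;
        print(store.gc(now=time.time() + 3600))  # 1 (the session expired)
        print(repr(store.read("abc")))           # ''
```

Keeping reads cheap does not help much here: the per-request lookup and the periodic full-directory scan both hit the same oversized directory.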

Another problem shows up when you scale out to a cluster. If you want all of your webheads to access session information, you need to point session.save_path to a network-shared location (an NFS mount, for example), which is rarely an efficient solution.

3rd party storage

Paid third-party solutions certainly exist. Services like Gigya offer fast, scalable authentication and session storage, along with lots of other services and by-products. The basic idea is to use JavaScript libraries that personalize pages by combining your site’s output with the user’s information, which is stored on a third-party website.

The downsides are the cost of the service and the fact that your users’ information is stored somewhere else. The latter might not be a problem, unless it raises legal issues or you need to use the information actively in your website’s own routines. In that case, accessing the information via the service’s API might be both a challenge and a performance bottleneck.

Memcached storage

Yeah, memcaching is so much faster! And enabling memcache-stored sessions takes only a couple of lines. For instance, if you use the Memcache Storage module for Drupal (which I highly recommend over its alternatives for speed and usability), it is a matter of a single line in your settings.php file:

Note: You can also make Memcached the session backend through PHP ini settings (session.save_handler and session.save_path), but I strongly discourage it. First, depending on which PECL extension you use (Memcache or Memcached), you might get a crappy session locking mechanism that loses users’ information. Second, a contrib backend include guarantees that sessions are stored separately from the cache, so you can purge the cache without touching sessions.
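Whatever the backend, PHP hands session storage over to it through `session_set_save_handler()`. That function takes open, close, read, write, destroy, and gc callbacks. Roughly speaking, a contrib session include plugs a different store in behind those callbacks. Below is a Python model of that contract, not Drupal’s or any module’s actual code. `SessionHandler` and `DictSessionHandler` are hypothetical names, and the in-memory dict stands in for a real store such as Memcached. The point it illustrates is that sessions live in their own store, so clearing the cache leaves them intact.

```python
import time
from abc import ABC, abstractmethod


class SessionHandler(ABC):
    """Mirrors the callbacks PHP's session_set_save_handler() expects."""

    @abstractmethod
    def open(self, save_path: str, session_name: str) -> bool: ...

    @abstractmethod
    def close(self) -> bool: ...

    @abstractmethod
    def read(self, session_id: str) -> str: ...

    @abstractmethod
    def write(self, session_id: str, data: str) -> bool: ...

    @abstractmethod
    def destroy(self, session_id: str) -> bool: ...

    @abstractmethod
    def gc(self, max_lifetime: int) -> int: ...


class DictSessionHandler(SessionHandler):
    """Hypothetical backend: sessions live in their own store, not the cache."""

    def __init__(self):
        self._sessions = {}  # session_id -> (data, last_write_timestamp)

    def open(self, save_path, session_name):
        return True

    def close(self):
        return True

    def read(self, session_id):
        # An unknown session reads as an empty string.
        entry = self._sessions.get(session_id)
        return entry[0] if entry else ""

    def write(self, session_id, data):
        self._sessions[session_id] = (data, time.time())
        return True

    def destroy(self, session_id):
        self._sessions.pop(session_id, None)
        return True

    def gc(self, max_lifetime):
        # Drop sessions not written within max_lifetime seconds;
        # return how many were removed.
        cutoff = time.time() - max_lifetime
        expired = [sid for sid, (_, ts) in self._sessions.items() if ts < cutoff]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)


if __name__ == "__main__":
    cache = {"page:/": "<html>...</html>"}  # stand-in for the page/data cache
    handler = DictSessionHandler()
    handler.write("abc", "uid|i:42;")
    cache.clear()               # purging the cache...
    print(handler.read("abc"))  # ...leaves the session intact: uid|i:42;
```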

Memcached is a good solution, but not a perfect one. Its main limitations are the total amount of session data you can store and the size of a single item. Memcached runs with a fixed memory limit, so once it is full, new writes evict the least recently used items. You cannot guarantee the lifetime of your users’ sessions. Nor can you ever guarantee the integrity of data stored in Memcached, since it is a volatile in-memory cache and a restart wipes everything, with all the consequences.
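The eviction behaviour can be modelled with a small LRU cache. This Python sketch simplifies on purpose: it assumes a single memory pool sized in bytes. Real memcached keeps a separate LRU per slab class and also counts per-item overhead. `LruSessionCache` is a hypothetical name. Once the pool is full, each new write pushes out the least recently used session, and that user is effectively logged out.

```python
from collections import OrderedDict


class LruSessionCache:
    """Simplified model of memcached-style LRU eviction in one memory pool.

    Real memcached keeps a separate LRU per slab class; this sketch ignores that.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._items = OrderedDict()  # session_id -> payload, least recent first

    def set(self, session_id, data: bytes):
        if len(data) > self.capacity_bytes:
            raise ValueError("item larger than the whole cache")
        if session_id in self._items:
            self.used_bytes -= len(self._items.pop(session_id))
        # Evict least recently used sessions until the new one fits:
        # those users are silently logged out.
        while self.used_bytes + len(data) > self.capacity_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._items[session_id] = data
        self.used_bytes += len(data)

    def get(self, session_id):
        data = self._items.get(session_id)
        if data is not None:
            self._items.move_to_end(session_id)  # mark as recently used
        return data


if __name__ == "__main__":
    cache = LruSessionCache(capacity_bytes=100)
    for sid in ("alice", "bob", "carol"):
        cache.set(sid, b"x" * 40)
    print(cache.get("alice"))  # None: evicted to make room for carol
```

Nothing in this model tells the application that a session was dropped. The next read simply misses, which is why session lifetime cannot be guaranteed.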

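Item size is also an issue. Memcached refuses items larger than its maximum item size (1 MB by default). Its slab allocator also groups items into size classes, so the classes that hold large items may get only a few memory pages and evict them quickly. So if you happen to store large tables in your users’ sessions, you simply cannot use memcached.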
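One way to spot this problem early is to check a session’s serialized size before handing it to memcached. The sketch below assumes memcached’s default 1 MB item limit and ignores key and header overhead. It uses JSON as a stand-in for PHP’s serialization format. `session_payload`, `fits_in_memcached`, and the sample data are hypothetical.

```python
import json

# memcached rejects items above its maximum item size, 1 MB by default
# (tunable with its -I option). Key and header overhead are ignored here.
MEMCACHED_MAX_ITEM_BYTES = 1024 * 1024


def session_payload(session: dict) -> bytes:
    # PHP would use its own serialization format; JSON is only a stand-in.
    return json.dumps(session).encode("utf-8")


def fits_in_memcached(session: dict, limit: int = MEMCACHED_MAX_ITEM_BYTES) -> bool:
    return len(session_payload(session)) <= limit


if __name__ == "__main__":
    small = {"uid": 42, "cart": [17, 23]}
    # A hypothetical "large table" kept in the session: 50,000 rows.
    big = {"report": [[i, "x" * 20] for i in range(50_000)]}
    print(fits_in_memcached(small))  # True
    print(fits_in_memcached(big))    # False: well over 1 MB serialized
```

A session that fails a check like this is a hint that its data belongs in a different store rather than in memcached.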

MongoDB storage

As one of the most popular and most beautifully built document databases, MongoDB is a big step up for session storage. Drupal now has a reasonably good contrib integration, the MongoDB module, which covers cache, field, session, block, log, and queue storage. Since it is still at RC2, I would not risk using it for everything, but its session storage extension has proved quite reliable and is easy to enable:

The first things you gain here are practically no size limitations and data durability. A single MongoDB document can hold up to 16 MB, compared with memcached’s default 1 MB item limit. MongoDB’s journal guarantees data integrity if you need that, with a negligible disk-write footprint. The downside is that MongoDB is a bit slower than Memcached. You will also need to plan and configure your MongoDB server quite carefully to avoid write-lock contention on your high-load website.

Redis storage

Redis is a powerful key-value store, with almost magical speed and data persistence abilities. However, Drupal integration for this backend is not as mature. The Session Proxy abstraction layer (RC1) makes it possible to add Redis as a session backend (as described here), but I haven’t yet had a chance to test it in “battle conditions.” That will be the subject of another blog post!

(P.S. I intentionally left out storing sessions in a relational database: it is resource-expensive and not suitable for the high-load scenario this article is about.)