Query optimization versus caching

Today I wanted to look at the relative merits of the different optimization paths that can be taken on a greenfield project. That is to say, one that has no constraints imposed on it by previous decisions and has received little to no optimization work to date.
Specifically, the two optimizations I wanted to compare are optimizing MySQL and adding a cache. I should point out in advance that these optimizations are really orthogonal; the only thing forcing a choice between them is that both consume the same scarce resource: developer time.

Optimizing MySQL

This optimization typically starts by looking at the queries being sent to MySQL and running EXPLAIN on them. With a little investigation it is frequently possible to add an index or make a small tweak to the schema.
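To make that workflow concrete, here is a minimal sketch of it in Python. The `orders` table, its `customer_id` column, and the connection details are all hypothetical, and PyMySQL is used purely for illustration.

```python
# Minimal sketch of the EXPLAIN-then-index workflow, assuming a hypothetical
# `orders` table with a `customer_id` column.
import pymysql

conn = pymysql.connect(host="localhost", user="app", password="secret",
                       database="shop")
with conn.cursor() as cur:
    # Step 1: ask MySQL how it plans to execute the slow query.
    cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = %s", (42,))
    for row in cur.fetchall():
        print(row)  # look for type=ALL / large "rows" estimates (a full table scan)

    # Step 2: if the plan shows a full scan, add an index on the filter column.
    cur.execute("ALTER TABLE orders ADD INDEX idx_customer_id (customer_id)")
conn.commit()
```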
Advantages:
1. An optimized query is usually fast for all users accessing the application. Since indexes cut through the data with a logarithmic search (divide and conquer, the way you would search a phone book), they also sustain performance reasonably well as the data grows (see the short sketch after this list). A cache masking an unindexed query can sometimes perform even worse as the data grows; users who miss the cache may see such poor performance that the application is unusable.
2. Optimizing MySQL does not require you to worry about invalidating a cache or serving stale data from one.
3. Optimizing MySQL can keep the technology stack simpler, which makes it slightly easier to replicate and work with in development environments.
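To make the phone-book analogy in advantage #1 concrete, here is a small, self-contained Python sketch (not MySQL itself) comparing a linear scan with a binary search over sorted data; a B-tree index gives MySQL roughly the same logarithmic behaviour.

```python
# Toy illustration of why an index scales: an unindexed scan touches every row,
# while a binary search over sorted data (roughly what a B-tree index does)
# touches only ~log2(n) entries.
import bisect
import math

n = 1_000_000
rows = list(range(n))        # stand-in for a sorted, indexed column
target = n - 1               # worst case for the linear scan

linear_steps = rows.index(target) + 1      # unindexed: comparisons grow linearly
pos = bisect.bisect_left(rows, target)     # indexed: comparisons grow logarithmically
assert rows[pos] == target
log_steps = math.ceil(math.log2(n))

print(f"linear scan:   ~{linear_steps:,} comparisons")
print(f"binary search: ~{log_steps} comparisons")
```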

Disadvantages:
1. Some queries cannot be improved by indexes alone and may require schema changes, which can be difficult to retrofit into some applications.
2. Some schema changes may be for de-normalization (duplication of data). While this is a common technique for DBAs, it needs ownership to make sure the application updates every copy, or triggers installed to guarantee it (a hypothetical example follows this list).
3. Some optimizations may be MySQL-specific. That is to say, if the underlying software is shipped to work on multiple databases, it is difficult to justify the more complex optimizations that go beyond adding indexes.
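As a hypothetical illustration of disadvantage #2, the sketch below duplicates an order count onto a `customers` table and installs a trigger so the copy is maintained by the database rather than by every code path that inserts an order. The table and column names are assumptions, not a recommendation for any particular schema.

```python
# Hypothetical denormalization: keep a duplicated `order_count` column on
# `customers` in sync with a trigger, so reads avoid a COUNT(*) while the
# database (not the application) carries the maintenance burden.
import pymysql

conn = pymysql.connect(host="localhost", user="app", password="secret",
                       database="shop")
with conn.cursor() as cur:
    cur.execute("ALTER TABLE customers ADD COLUMN order_count INT NOT NULL DEFAULT 0")
    cur.execute("""
        CREATE TRIGGER orders_after_insert
        AFTER INSERT ON orders
        FOR EACH ROW
          UPDATE customers
             SET order_count = order_count + 1
           WHERE id = NEW.customer_id
    """)
conn.commit()
```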

Adding a cache

This optimization requires you to profile the application and move expensive processing away from MySQL and into a third-party cache such as memcached or Redis.
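Here is a minimal cache-aside sketch of that idea, assuming a hypothetical "top products" report that is expensive for MySQL to compute; redis-py and PyMySQL, along with the key name and TTL, are illustrative choices rather than requirements.

```python
# Cache-aside: serve from Redis when possible, fall back to MySQL on a miss,
# then store the result with a TTL so later requests skip the expensive query.
import json
import pymysql
import redis

cache = redis.Redis(host="localhost", port=6379)
db = pymysql.connect(host="localhost", user="app", password="secret",
                     database="shop")

def top_products():
    cached = cache.get("report:top_products")
    if cached is not None:
        return json.loads(cached)          # cache hit: MySQL is never touched

    with db.cursor() as cur:               # cache miss: run the expensive query
        cur.execute("""
            SELECT product_id, COUNT(*) AS sales
              FROM orders
             GROUP BY product_id
             ORDER BY sales DESC
             LIMIT 10
        """)
        result = cur.fetchall()

    cache.set("report:top_products", json.dumps(result), ex=300)  # 5 minute TTL
    return result
```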
Advantages:
1. This works really well when the application has expensive queries that are difficult to optimize with MySQL. For example: large aggregate/GROUP BY queries.
2. Caching can be a good retrofit for increasing the throughput of the system, i.e. when slowness is the result of many people accessing the application at once.
3. Caching may sit more easily on top of another application. For example: your application may just be a front end to another software package that stores its data in MySQL, and it might be very difficult to make any database changes to that other application.
Disadvantages:
1. If the data is served through many access patterns (i.e. it appears on many different pages in different formats), then invalidating the cache on update may be difficult and/or require serving stale data. An alternative is to store the data in a more fine-grained cache, which has its own disadvantages, such as the added latency of many cache fetches.
2. Caching an object that is expensive to generate may create lurking performance cliffs for users who miss the cache (see Optimizing MySQL, advantage #1). Good performance practice suggests you should try to tighten the variance between users, not just improve the average (which is all caching tends to do).
3. Naive caching implementations suffer from subtle bugs, such as cache stampedes. Just last week I helped someone whose database server was brought down by multiple users trying to regenerate the same cache contents at once. The correct solution is to introduce some level of locking to serialize the cache regeneration.
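Here is one hedged sketch of that locking, building on the cache-aside example above: only the request that wins a short-lived Redis lock regenerates the value, while everyone else briefly waits and re-reads the cache. The key names, timeouts, and retry policy are illustrative assumptions.

```python
# Stampede protection: serialize cache regeneration behind a Redis lock so
# only one request runs the expensive computation at a time.
import json
import time
import redis

cache = redis.Redis(host="localhost", port=6379)

def get_report(compute):
    for _ in range(50):                        # bounded wait, roughly 5 seconds
        cached = cache.get("report:top_products")
        if cached is not None:
            return json.loads(cached)

        # NX = only set if absent, EX = auto-expire so a crashed worker
        # cannot hold the lock forever.
        if cache.set("report:top_products:lock", "1", nx=True, ex=30):
            try:
                result = compute()             # single regeneration hits MySQL
                cache.set("report:top_products", json.dumps(result), ex=300)
                return result
            finally:
                cache.delete("report:top_products:lock")

        time.sleep(0.1)                        # lost the race: wait for the winner
    return compute()                           # fallback if the lock holder stalls
```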

Conclusion

I typically recommend that users look at optimizing MySQL first, as I see it as the most elegant solution initially. But long term, most applications do have a use case for implementing some level of both approaches.