Vijay Ramachandran ([info]vijayr) wrote,
@ 2009-05-05 14:11:00
Scaling software
I recently went through the very interesting description of Facebook's photo hosting stack. It reminded me, yet again, how scaling web software is both simple (in concept) and difficult (in practice) at the same time. Here's a simple conceptual framework which, by my estimate, will solve 80% of web scaling problems:

  1. Identify your bottleneck. Is it CPU, disk, memory, or network bandwidth?
  2. Fix it!
  3. Repeat

:)

Some tips while fixing bottlenecks -

  • Memory access is orders of magnitude faster than disk access.
  • Your memory should be large enough to hold your working data set. Access typically follows an extreme power law (say, 99-1), so the working set is usually far smaller than the actual data set, and it's important to determine your working data set size.
  • Design your system to be able to scale various layers independently.
  • Watch out if you've configured your web server (Apache is the one that I use) to allow too many server processes. Context switching and swapping will kill you under high load.
  • Do high-load tasks as infrequently as you can. For instance, load a lookup table once at startup time. Or open a MySQL database connection once per HTTP request, and reuse it for all database access during that request.
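
To make the last tip concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for MySQL so the example runs anywhere, and the names COUNTRY_NAMES, load_country_names, RequestContext and handle_request are made up for this example.

```python
import sqlite3


def load_country_names():
    # Hypothetical stand-in for an expensive load (file parse, remote fetch).
    return {"in": "India", "us": "United States"}


# Loaded once when the module is imported (startup), not on every request.
COUNTRY_NAMES = load_country_names()


class RequestContext:
    """Holds at most one database connection for the lifetime of a request."""

    def __init__(self, connect):
        self._connect = connect  # factory that opens a new connection
        self._conn = None
        self.opens = 0           # how many times we really connected

    @property
    def db(self):
        # Open lazily on first use, then reuse for every later query.
        if self._conn is None:
            self._conn = self._connect()
            self.opens += 1
        return self._conn

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None


def handle_request(ctx, country_code):
    # Three statements, one shared connection held by ctx.
    ctx.db.execute("CREATE TABLE IF NOT EXISTS hits (code TEXT)")
    ctx.db.execute("INSERT INTO hits VALUES (?)", (country_code,))
    (count,) = ctx.db.execute("SELECT COUNT(*) FROM hits").fetchone()
    return COUNTRY_NAMES.get(country_code, "unknown"), count
```

A request handler would create one RequestContext at the start of the request, pass it to everything that needs the database, and call close() at the end. The opens counter is only there to make it easy to check that the connection was opened once.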


Of course, there are probably lots of unique cases which can't be solved so easily conceptually.

More resources - Cal Henderson (ex-Flickr, Yahoo) has a great presentation on web scalability. There are lots of others available as well.

Comments welcome!

(1 comment)

Scale v/s performance
[info]anomalizer
2009-05-06 05:57 pm UTC
One naive but well-intentioned mistake I see people make (myself included, multiple times over) is getting obsessed with increasing the performance of things as opposed to increasing scalability. Compare a solution 'a' that runs 1000x faster than solution 'b'. Also assume that 'a' is built on the fundamental assumption that it will run on one machine, but 'b' can be made to run on a farm of machines and there exists a way to distribute the workload. Getting 'a' to take 10 times more load will be a nightmare, but getting 'b' to take 10 times more load will be much easier, since you can add nodes to increase the overall system capacity.

This is analogous to the asymptotic complexity of algorithms, where constants are ignored. An algorithm with a running time of 10000n^2 is still better than one with 0.23n^3, since for large enough n the quadratic one will always be faster.
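
As a quick check of that comparison, here is a small Python sketch that finds where the two cost functions cross. The function names are made up for illustration. The crossover follows from setting 10000 * n^2 equal to 0.23 * n^3, which gives n = 10000 / 0.23, roughly 43,478.

```python
def quadratic_cost(n):
    # Large constant, lower order of growth.
    return 10000 * n ** 2


def cubic_cost(n):
    # Small constant, higher order of growth.
    return 0.23 * n ** 3


# 10000 * n^2 == 0.23 * n^3  =>  n == 10000 / 0.23
crossover = 10000 / 0.23


def cheaper(n):
    """Return which of the two cost functions is smaller at input size n."""
    return "quadratic" if quadratic_cost(n) < cubic_cost(n) else "cubic"
```

Below the crossover the cubic algorithm is cheaper despite its worse growth rate, which matches the startup caveat in the next paragraph: the "worse" solution can be the right one until you reach that point.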

The difference, of course, is that when you run a startup, solution 'a' might be the right thing to do until you actually start saturating it, since you won't be able to afford the cost of 'b' from day one. I found myself at this inflection point when I got started with my current gig, and I find myself at another such inflection point today. What is obvious is that we could not have afforded (in every sense of the word) to do a year ago what we want to do going forward.
