You might not think you’re interested in a long technical blog post about scalability at YouTube… and you might be right… but don’t be so sure. You never know where you’re going to find useful ideas: hard-won ways of thinking that can apply in domains well beyond the one where they were born. I think this post has several ideas like that, and the best is jitter:
- If your system doesn’t jitter then you get thundering herds. Distributed applications are really weather systems. Debugging them is as deterministic as predicting the weather. Jitter introduces more randomness because, surprisingly, things tend to stack up.
- For example, cache expirations. For a popular video they cache things as best they can. The most popular video they might cache for 24 hours. If everything expires at one time then every machine will recalculate the expired data at the same time. This creates a thundering herd.
- By jittering you are saying: randomly expire somewhere between 18 and 30 hours. That prevents things from stacking up. They use this all over the place. Systems have a tendency to self-synchronize as operations line up and try to destroy themselves. Fascinating to watch. You get a slow disk system on one machine and everybody is waiting on a request, so all of a sudden all these other requests on all these other machines are completely synchronized. This happens when you have many machines and you have many events. Each one actually removes entropy from the system so you have to add some back in.
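
To make the cache example concrete, here is a minimal sketch of a jittered expiration in Python. It is not YouTube's code. The function name, the constants, and the choice of a uniform distribution are all my own assumptions; the only things taken from the notes are the 24-hour base and the 18-to-30-hour window, which works out to 24 hours plus or minus 25%.

```python
import random

# Assumed values for illustration: a 24-hour base lifetime with +/-25% jitter,
# which matches the 18-to-30-hour window described in the notes.
BASE_TTL_HOURS = 24.0
JITTER_FRACTION = 0.25


def jittered_ttl_seconds(base_hours=BASE_TTL_HOURS,
                         fraction=JITTER_FRACTION,
                         rng=random):
    """Return a cache lifetime in seconds, drawn uniformly from
    base_hours * (1 - fraction) to base_hours * (1 + fraction)."""
    low = base_hours * (1 - fraction)
    high = base_hours * (1 + fraction)
    return rng.uniform(low, high) * 3600


if __name__ == "__main__":
    # Each machine (or each cache entry) picks its own lifetime, so
    # expirations spread across a 12-hour window instead of lining up.
    for _ in range(5):
        print(round(jittered_ttl_seconds() / 3600, 2), "hours")
```

You would pass the result wherever your cache takes an expiration time when storing an entry. The point is just that no two entries are likely to share the same deadline, so the recomputation work arrives as a trickle rather than a stampede.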
What would it mean to add jitter in other domains? Maybe it would mean publishing things at odd hours. Maybe it would mean shaking up your own habits. (I think there’s evidence that exercise and diet both benefit from jitter.) What else?