One day (recently) at work, high traffic called for five new web servers in our existing ASP.NET web farm. Once they were set up, turned on and polished, IT ops discreetly told the load balancer about the fresh fish.
But as soon as some of the traffic hit the five new boxes, we saw an explosion of lost session exceptions in the event log. And it wasn’t even limited to the five noob servers - the entire farm was on fire!
Flabbergasted by the event, IT ops banished the deadly five from the farm and things calmed down immediately. After hours of investigating, fantasizing about race conditions and .NET, IIS and Windows bugs, and debugging in a test farm, we finally found the cause.
Turns out this very old article described our problem exactly:
The five new web servers had a slightly different IIS application path (/LM/W3SVC/12/ROOT) configured than the rest of the bunch (/LM/W3SVC/11/ROOT). The State Server combines the ASP.NET session ID with the IIS application path to create a unique key, so a session issued by one of the five new servers could not be found when the next request landed on one of the old servers, and vice versa. That is obviously extremely unfortunate in a weighted round-robin load-balanced web farm, and it explains why the whole farm was affected rather than just the newcomers. Never mind the old article’s Applies To section: our setup was IIS 6 and .NET 3.5 (32-bit).
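To make the failure mode concrete, here is a minimal toy model in Python. It is not the real ASP.NET State Server: the class `ToyStateServer`, the key layout (application path joined with session ID) and the session ID `abc123` are all made up for illustration. It only shows why the same session cookie misses when two servers present different application paths.

```python
# Toy model of an out-of-process session store.
# NOT the real ASP.NET State Server; names and key layout are assumptions.


class ToyStateServer:
    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(app_path, session_id):
        # Hypothetical key layout: application path plus session ID.
        # The real key format is internal to ASP.NET.
        return f"{app_path}:{session_id}"

    def save(self, app_path, session_id, data):
        self._store[self._key(app_path, session_id)] = data

    def load(self, app_path, session_id):
        # Returns None when nothing is stored under this exact key.
        return self._store.get(self._key(app_path, session_id))


OLD_PATH = "/LM/W3SVC/11/ROOT"  # the existing servers
NEW_PATH = "/LM/W3SVC/12/ROOT"  # the five new servers

server = ToyStateServer()
session_id = "abc123"  # made-up session ID

# The first request lands on a new server, which creates the session.
server.save(NEW_PATH, session_id, {"cart": ["book"]})

# The next request is balanced to an old server: same cookie, different path.
lost = server.load(OLD_PATH, session_id)

# A request that happens to hit a new server again still finds the session.
found = server.load(NEW_PATH, session_id)

print("old server sees:", lost)
print("new server sees:", found)
```

In this sketch the lookup through the old path comes back empty while the lookup through the new path succeeds, which mirrors the symptom we saw: whether a session survived depended on which box the load balancer picked.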
Although the above incident temporarily smashed our dreams of scalability, it was awesome to come up with and discard hypotheses (our penultimate idea was a race condition) until we eventually acquired this microscopic piece of ASP.NET/IIS knowledge that none of us will ever forget :-)