Spolsky: Today is the day that we did not launch, although we planned to. But then... We'll wait for another week.Atwood:
Yeah, well, the good news on that is that we did actually figure out what that problem was.
Spolsky: Oh, oh, I want to hear, I want to hear, I want to hear.Atwood:
Eh. So, it was a third party library. Indirectly, I mean. It's the third party library, and our particular use of it. It was Log4Net.
We were logging in such a way that the log.... during the log call was triggering another log call. Which is normally okay, but with the load that we have, eventually they would happen so close together that there's also a lock. So, there's two locks going on there. There's a lock of like disposing of the database stuff that's going on. Then there's lock of like actually writing to a file...
Spolsky: Hm! Atwood:
And... Huh... They happen in the opposite order, so it's like a classic deadlock, right. So, you release the lock on the database, then you release the lock on the file. And than the other call was doing in the other order. And they were happenning so fast that... it was deadlocking eventually. And it was one of those things that would happen.. like... it was very [intermittent [??]
Spolsky: Right. Atwood:
So we had to dust-off Win Debug.
Spolsky: How on Earth do you find things like that? Atwood:
Well. You bust that Win Debug. One nice feature in Windows 2008, and I think this is in Vista as well. In Task Manager, you can right-click a task and take a dump of it.
Spolsky: Yeah! Atwood:
Like right there.
Spolsky: Aha. Atwood:
So we took a dump of the W3Service process, and...
Spolsky: [Then [??]] take a dump. Atwood:
Yeah, I know, any time you do this it's like.. It's like the teritorry for jokes. It's just...
Spolsky: [giggles] Atwood:
[laughs]. And then we loaded up a... Win... Debug
Spolsky: Windbg! Yeah Atwood:
...and then some .Net managed extensions you can, sort of load. You need like a chi-chi to figure out what the commands are. And then you load the dump, and you load the manage tools. And then you can sort of just investigate all of the threads. You can take the "Show me all the manage threads". And then say "Show me what's the call stack was for that thread". And what we saw was like tons and tons of threads that were all going "Hey, I would like to log something...". And it was like "Hmmmm... [laughs]... Interesting!". Right, you have like 80 threads that all try to write something to the log. So... Right then we kind of knew where the problem was.
And then somebody on Tweeter actually volutneered to help us diagnose the dump. So I put it up on our server, and he a.... he nailed... he had a great description of it, like line by line, blow by blow of exactly what was happening. I mean, I'm... I'm compenent enought to sort of figure out roughly what was going on, but he really knew this stuff and really helped us out, and I do appreciate that.
Spolsky: Today is the day we did not launch although we planned to. But then we wait another week. Atwood:
Atwood: Yeah, well, the good news on that is that we did actually figured out what that problem was.
Spolsky: Oh I wanna hear, I wanna hear, I wanna hear.
Atwood: Eh.. so it was a third party library, indirectly, ?? library and our particular use of it. It was Log4Net
Spolsky:: We were logging in such a way that the log during the log call was triggering another log call which is normally ok. Atwood:
But with the load we have eventually they would happen so close together that there's also a lock, so there's two locks going on there. So there's a lock of like disposing of the database stuff that's going on. Then there's a lock of like actually writing to the file, ...and they happen in opposite order, so it's like a classic dead lock, right? So you release the lock on the database, then you release lock on the file and then the other call was doing it in the other order. And they were happening so fast that it was dead-locking eventually.