Podcast 020
Ads, intro

[1:00] Spolsky: Today is the day that we did not launch, although we planned to. But then... we'll wait for another week.

Atwood: Yeah, well, the good news on that is that we did actually figure out what that problem was.

Spolsky: Oh, oh, I want to hear, I want to hear, I want to hear.

Atwood: Eh. So, it was a third-party library. Indirectly, I mean. It's the third-party library, and our particular use of it. It was Log4Net.

Spolsky: Oh!

Atwood: We were logging in such a way that the log call was triggering another log call. Which is normally okay, but with the load that we have, eventually they would happen so close together that... there's also a lock. So, there's two locks going on there. There's a lock for disposing of the database stuff that's going on. Then there's a lock for actually writing to a file...

Spolsky: Hm!

Atwood: And... huh... they happen in the opposite order, so it's like a classic deadlock, right? So, you release the lock on the database, then you release the lock on the file. And then the other call was doing it in the other order. And they were happening so fast that... it was deadlocking eventually. And it was one of those things that would happen... like... it was very intermittent, right?

Spolsky: Right.

Atwood: So we had to dust off Win Debug.

Spolsky: How on Earth do you find things like that?

Atwood: Well, you bust out Win Debug. One nice feature in Windows 2008, and I think this is in Vista as well: in Task Manager, you can right-click a task and take a dump of it.

Spolsky: Yeah!

Atwood: Like, right there.

Spolsky: Aha.

Atwood: So we took a dump of the W3Service process, and...

Spolsky: Ha ha, take a dump.

Atwood: Yeah, I know, any time you do this it's like... it's like the territory for jokes. It's just...

Spolsky: [giggles]

Atwood: [laughs] And then we loaded up Win... Debug.

Spolsky: WinDbg! Yeah.

Atwood: ...and then some .NET managed extensions you can sort of load.
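What Atwood describes is the textbook lock-ordering deadlock: two code paths take the same two locks in opposite orders, and under enough load each ends up holding one lock while waiting for the other. A common general remedy is to give every lock a fixed rank and always acquire in rank order. The Python sketch below illustrates that idea only; the lock names, the ranking table, and the helper functions are hypothetical, and nothing in the episode says this is how the Stack Overflow team or log4net actually resolved the problem.

```python
import threading

# Hypothetical stand-ins for the two locks in the story: one guarding database
# cleanup and one guarding the log file. These are not log4net internals.
db_lock = threading.Lock()
file_lock = threading.Lock()

# One global rank per lock. Every code path that needs both locks takes them
# in rank order, so the cycle "A holds db and waits for file while B holds
# file and waits for db" cannot form.
LOCK_RANK = {id(db_lock): 0, id(file_lock): 1}


def acquire_in_order(*locks):
    """Acquire the given locks in global rank order; return them in that order."""
    ordered = sorted(locks, key=lambda lk: LOCK_RANK[id(lk)])
    for lk in ordered:
        lk.acquire()
    return ordered


def release_all(held):
    """Release locks in the reverse of the order they were acquired."""
    for lk in reversed(held):
        lk.release()


def worker(first, second, iterations, finished):
    # The two threads name the locks in opposite orders, like the two
    # conflicting code paths, but acquire_in_order normalizes the order.
    for _ in range(iterations):
        held = acquire_in_order(first, second)
        try:
            pass  # dispose of the connection / append the log line here
        finally:
            release_all(held)
    finished.append(threading.current_thread().name)


def run_demo(iterations=10_000):
    """Run two threads that request the locks in opposite orders."""
    finished = []
    threads = [
        threading.Thread(target=worker, args=(db_lock, file_lock, iterations, finished), name="a"),
        threading.Thread(target=worker, args=(file_lock, db_lock, iterations, finished), name="b"),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)  # with consistent ordering, both threads finish
    return sorted(finished)
```

Calling `run_demo()` returns `['a', 'b']`: both threads complete even though they ask for the locks in opposite orders. Without `acquire_in_order` (each thread acquiring `first` then `second` directly), the same demo can hang intermittently, which matches the "very intermittent" symptom in the conversation.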
You need, like, a cheat sheet to figure out what the commands are. And then you load the dump, and you load the managed tools. And then you can sort of just investigate all of the threads. You can say, "Show me all the managed threads," and then say, "Show me what the call stack was for that thread." And what we saw was tons and tons of threads that were all going, "Hey, I would like to log something..." And it was like, "Hmmmm... [laughs]... Interesting!" Right, you have like 80 threads that are all trying to write something to the log. So... right then we kind of knew where the problem was.

And then somebody on Twitter actually volunteered to help us diagnose the dump. So I put it up on our server, and he... he nailed... he had a great description of it, like line by line, blow by blow, of exactly what was happening. I mean, I'm competent enough to sort of figure out roughly what was going on, but he really knew this stuff and really helped us out, and I do appreciate that.

[3:33] Spolsky: That's really awesome. I'd never... no... I never do... I nev... But you know, I don't think I've ever worked on code that is sort of operational in the same way.

Atwood: Uh-huh.

Spolsky: Eh... because we definitely, eh, put a lot more... oh, you know what, I did. At Juno we used to have all kinds of logging. The trouble is that my philosophy has always been... you have a tendency to want to log everything. But then you just get logs that are, you know, a hundred megabytes per user, and you get thirty of them a minute, and they can't possibly be analyzed or stored in any reasonable way. So the next thing you have to do is start culling your logs, or just have different levels of debugging, where in high debug mode everything is logged and in low debug mode nothing is logged. And... it's kind of hard to figure out what you really want in a log. You, you know, you know...
a lot of logs... like, I think of the logging that we did at Juno, where people would call with a complaint and you'd try to figure out where this program is crashing. And obviously a log of the crash, that's easy. Ehm, but then there's some line above the crash which hopefully gives you a lot of information about where it happened. And there's some line you don't see that should have come after that, but it never got there 'cause it crashed somewhere before there. And essentially what you're doing as you're adding logging is a binary search, right? You're sticking in, like, "Well, gosh, I got to here and then I got to there. But there's an awful lot of code between point A and point B. So let's add a log point of some sort, you know, halfway from A to B." Then you put that in, and you eliminate 50 percent of the possible places to look for your crash. Um, but I've never really been able to...

Atwood: I mean, ironically, to troubleshoot this hang, which turned out to be because of logging, we were adding more logging.

Spolsky: [laughs]

Atwood: The joke just writes itself! The joke just writes itself, right...

Spolsky: It does... How many... How many third-party tools do you have... uhh... How many third-party tools are a part of the StackOverflow code base?

Atwood: Well, okay, so... [chuckles] Uh, Dare [pronounces it as the English word "dare"] Obasanjo [pronounces it "oh-bih-san-ho"]... I don't know if I'm pronouncing it correctly.

Spolsky: Okay, "Dare" [pronounces it "daray"]... Obasanjo [pronounces it "oh-bih-san-ja"]... It's "Dare."

Atwood: Is it "Dare"?

Spolsky: Yep.

Atwood: Really... Okay, I didn't know that. Well, I've learned something. But he had a whole blog entry about how, you know, I had chosen to write my own sanitizer, and that was a very deliberate choice for me...

Spolsky: Mm hmm.

Atwood: ...for a number of reasons that I won't get into.
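Spolsky's "add a log point halfway between A and B" is bisection over program locations. A minimal sketch, assuming the candidate log points are numbered in execution order and that a hypothetical `reached(i)` probe stands for "add a log line at checkpoint i, rerun, and check whether it shows up in the log". The search is only valid if `reached` is monotone: once a run fails to log checkpoint i, it never logs a later one.

```python
from typing import Callable


def last_reached_checkpoint(n: int, reached: Callable[[int], bool]) -> int:
    """Return the highest checkpoint index i in [0, n) with reached(i) True.

    Assumes checkpoint 0 ("point A") is always logged and that reached() is
    monotone. Each call to reached() represents one rebuild-and-rerun, which
    is the expensive step, so the loop halves the candidate range every time.
    """
    lo, hi = 0, n - 1  # invariant: reached(lo) is True; answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if reached(mid):
            lo = mid  # the crash happens somewhere after mid
        else:
            hi = mid - 1  # the crash happens before mid
    return lo
```

With 1,000 candidate checkpoints, this needs at most 10 probes, versus up to 999 if you added log lines one at a time from point A forward. The crash then lies between the returned checkpoint and the next one.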
But he was very critical of this, because, of course, there were bugs in the sanitizer...

Spolsky: Mm hmm.

Atwood: ...which there were going to be. And to me, it's about, like, your velocity; it's not about where you are, it's about where you're going, and we're going to fix that stuff, right? And I'm making the sanitizer public as well, so other people can have a sanitizer that's not ten thousand lines of code and ridiculous. So there's a philosophy there of building something that's reusable for everyone. Um, but I thought it was ironic, because he was talking about how developers should just pick a third-party library and go with it, and I think obvio... it's a balancing act, because we picked this logging library, right, which kind of caused a problem for us. I mean, partially it was the way we were using it, but the way it was locking the files was a design issue in terms of the way Log4Net works.

Spolsky: Right.

Atwood: So I think it's a trade-off. I don't think it's always as clear-cut as "you should always pick a library" or "you should never pick a library," right? I think there's always some in-between there. So, for us, I'm definitely a minimalist. I don't like third-party libraries; I feel like we have a giant third-party library called "Windows," called ".NET"... huh... ASP.NET MVC is technically a third-party library. Um, but these are, you know, major vendor stacks. And I do feel like, as much as we talk about open source and stuff, there's a certain level of quality you associate with these major first-party stacks, right, whether it's from Apple or Microsoft or Sun or whoever. That may or may not be true, but hopefully it usually is true: that these things are really heavily tested.

Spolsky: There is definitely, yeah, there is definitely... I mean, there's something I've learned over the years, and, you know, I started out working on the Excel team, um...
The developers on that team had a motto, which was "Find the dependencies and eliminate them." You know, they had their own compiler; they would not even use untested libraries from other groups at Microsoft...

Atwood: I love that they had their own compiler. That is so hardcore. I can't even, like, I could not even hang out with those guys... right... that hardcore.

Spolsky: Hey, well, we have our own compiler, man.

Atwood: Yeah...

Spolsky: Let me tell you why they had their own compiler. They had their own compiler because Excel was getting huge, and plain compiled 8086 code was just too large to fit on floppy disks and to fit in memory. You know, we were really trying to cram things in there. And so they developed a p-code compiler, which basically... you know, it's like bytecode. They called it p-code. This is a very old technique, and it compiled Excel for an imaginary machine, a virtual machine, which was a lot more expressive than an 8086 and had all kinds of additional features, and so the compiled code was about one-third the size, and in a lot of situations this made the performance a lot faster. So, for example, in those days almost everybody was running programs off of floppy disks... or no, not floppy, but the 3.5-inch, not-so-floppy disks. But the read time on those things is really, really slow, so if your app was smaller at the time that you read it from disk, it didn't matter if it ran a little bit slower. The overall experience would be a lot faster. And if you could fit in memory without swapping, then obviously the whole thing would run faster. So it was worth doing this p-code thing for a long time, and about the time of Excel 5.0, the bit flipped on that and it suddenly became...
...suddenly everybody had hard drives, and nobody really cared about the size of the executable, and it was okay to have, I think, a four-megabyte executable instead of a one-megabyte executable, and so they got rid of that p-code back end. But even then, I think they had their own compiler for a while, because in order to write really, really efficient code, they wanted to be able to control... oh, this is a long story.

[9:39] Spolsky: ...but a pointer on an 80386... for a while the 80386 was the target. On an 80386 (or even on the 8086 in general), a pointer consisted of two parts, the segment and the offset. So it's like "where do you want to start your pointer?" and then "what's your offset inside there?" And you couldn't just indirectly say "here's my pointer, just do something with this." You could, finally, in 32-bit clean mode, but we didn't have that. What you had to do was... there was this thing called the segment register, and you loaded the segment register, saying, "From now on, my pointers, which are 16 bits, are going to be offsets from this particular point." And the very loading of that segment register would cause all kinds of operating system traps to get executed and all kinds of interesting things to happen, and it was a very, very slow operation. So if you were doing any kind of pointer manipulation, which you were, 'cause it was C and everything was pointer manipulation, you wanted to load that segment register as infrequently as possible, because that was a very, very expensive operation. And chances are, you're doing a whole bunch of pointer operations that are all in the same segment, for a while at least. You really want to be able to just load it once and then maybe do your next 20 operations with that as the base register. So all the Excel code had this assumption that they could do that; their compiler gave them the ability to do that. And that made it just screamingly fast compared to the competition.
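To make the cost argument concrete, here is a toy model in Python. It is not x86 code and not how Excel's compiler or Microsoft C's based pointers were implemented; it only counts how many times the expensive segment-register load would happen for a sequence of (segment, offset) accesses, comparing "reload on every dereference" with "reload only when the segment changes". The names `loads_naive` and `loads_based` are made up for illustration.

```python
from typing import Iterable, Optional, Tuple

# A far pointer modeled as (segment, offset). This is a cost model only: it
# counts segment-register loads and ignores what each load actually costs.
FarPointer = Tuple[int, int]


def loads_naive(accesses: Iterable[FarPointer]) -> int:
    """Reload the segment register on every dereference (the slow pattern)."""
    return sum(1 for _segment, _offset in accesses)


def loads_based(accesses: Iterable[FarPointer]) -> int:
    """Reload only when the segment changes; same-segment runs reuse the base."""
    loads = 0
    current: Optional[int] = None
    for segment, _offset in accesses:
        if segment != current:
            loads += 1
            current = segment
    return loads
```

For 20 accesses in one segment followed by 20 in another, `loads_naive` reports 40 loads and `loads_based` reports 2. The gap is what Spolsky credits for Excel's speed; the worst case for the cached approach is code that alternates segments on every access, where the two counts become equal.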
I mean, I remember when Borland came out with Paradox for Windows, and they did not take this into account. It was C++ code, so they really had no choice but to use pointers for all their methods, because it was all virtual tables and C++ objects. The net result was that they just used these pointers naively, pretending that the top 16 bits... every time they wanted to use a pointer, they reloaded the segment register, and that just made this app really, really slow. I mean, it took 90 seconds to start. You know, Excel could launch in 10 or 15 seconds.

Atwood: Wow.

Spolsky: So this was a feature that they eventually got added to the regular Microsoft C compiler, called based pointers, and I think then they stopped using their own compiler. But their philosophy was really not to trust anybody and to have control over everything, so that there's some hope they can get it to work without having an external dependency. You know, I've sort of taken this with me a long way, and every time I've failed to do that, I've tended to regret it. Every time we've put outside technology into FogBugz, we've regretted it. There are a lot of these excellent components, and they are really great components, made by vendors, like .NET components, and widgets like the cool calendar dropdown that you put into your web page and all that kind of stuff. And inevitably what I've found is that they are good enough for enterprise code, like internal apps that you're using at the insurance company, and they're just never good enough for the kind of app you want to ship that has to be perfect. Somehow there is something that's not commercial quality about them. You know, it's fine if there are 20 people using it and they're all using it the same way; it allows you to put a calendar dropdown into something in fifteen seconds. But then you'll get to some customer who says, "You know, we don't start our weeks on Sunday in my country."
And you'll say, "Oh," and you'll find out this library doesn't have that feature. As a hypothetical example.

[12:57] Atwood: Right!

Spolsky: Which I...

Atwood: I believe that's one advantage of some of the web stuff: everything is just public-facing by default. You don't have this internal development ghetto effect. Like... 'cuz we have... I mean, to be fair, on any talk of dependencies, we have tons of dependencies, right? It's just a question of which dependencies you want to take. I mean, jQuery is a dependency, right?

Spolsky: Hmm.

Atwood: We're using the WMD control; that's a dependency.

Spolsky: Hmm.

Atwood: There are these little add-ons for jQuery that...

Spolsky: But you know what, if you found a bug in jQuery, you would just go edit the source, and you would be shipping your own private version of jQuery, and problem solved. It wouldn't be ideal, but at least you wouldn't be screwed.

[13:32] Atwood: Right... We have actually done that, and let me give an example. So the WMD editor has a bug with international keyboards. There's no way we would have found this, because we don't use international keyboards, but obviously some of the people who use StackOverflow do, and...

Spolsky: Sure.

Atwood: They were, I think, understandably very annoyed, because they couldn't enter, like, a right bracket, which is an important key, particularly in Markdown. That's one of the delimiters you use sometimes.

Spolsky: Yeah, the...

Atwood: Um. And they actually, huh, the prob... I'm still trying to get the source from the authors, so we don't actually have the source. What we have is, well, ob... not... I guess obfuscated is not the right word, but minified JavaScript, where they compress it down so all the variables are "a", "b", "c" and things like that. So it's not exactly fun code to look at anymore [laughs].
Spolsky: Yeah.

[14:12] Atwood: But somebody actually went through and found a little workaround, and I feel bad, because when they posted this I didn't realize that they'd actually found a workaround, so I was like "Oh, that's interesting," and I just didn't come back to it. But then somebody was complaining that this bug had stayed open for like three weeks, and so I finally went and put in that fix. So you're right, having the source is great, because you can fix little problems that you run into, and when you pick up components from the web (if you're talking about JavaScript), um, you get the source by definition. I think that's actually one of the grea... I had a blog entry about this. That's one of the great strengths of the web. Everything is essentially open source by default. I mean, if you're curious about what Google is doing... you remember when, uh, you know, Maps came out and everyone was like "oooh"? You know, Maps, it's all this innovative zoom-in, zoom-out technology.

Spolsky: Yeah.

[14:57] Atwood: You could just view source, and if you were, you know, motivated enough, you could figure it out, right? There wasn't an executable that you had to decompile or anything like that. So I feel...

Spolsky: [garbled, interrupting] Flash.

Atwood: Go ahead.

Spolsky: Unless it's all Flash. I mean, like, Yahoo Maps is all Flash and you can't figure out what they're doing.

Atwood: Oh, right. Right. Right. Right. Right. Well, that gets back into the whole... what we call the rectangle problem: the browser, where you have this alien rectangle [laughs] that lives in another universe, and it pokes a hole into your dimension, and then, like, this crazy stuff comes through, and yeah. So... on a related note, let's close out the topic. So struggling with the deadlock put us back, I would say, at least four or five days.
So in order to have a smooth landing... there are also a couple of features that I really desperately want us to get in before we open to the public, like, say, a CAPTCHA [laughs]. I think that's kind of important when we go live. Um. So adding a week to the schedule really helps us have a smooth landing. I mean, we could launch on the third. We honestly could, but it would be a little desperate. We would be really flailing, fixing things at the last minute.

[15:55] Spolsky: Yeah, no. We're in no rush; we can take another week. I thought that the end-of-August plan was a little bit ambitious. I think we're both in the same position of really being on the fence as to whether... I don't want to say on the fence, but it's sort of a close call between whether we want to do the Hollywood launch (going back to last week, talking about Aaron Swartz's thing), where everyone hits us at once and the world comes to an end, versus the Gmail-style launch, where we just start taking a thousand people a day, or give out invites or something, to at least have some kind of control over the rate at which people come in.

Atwood: You know what I like now, what my philosophy on this has gone towards? It's almost like dating, where you don't really want to seem needy. If it comes up in the conversation, "Hey, we have this website, stackoverflow," if it's contextual, then talk about it. But maybe we won't have a whole post saying "Hey, we're launching a new site called stackoverflow." Maybe not even do that. It sounds very counterintuitive, but just bring it up in the context of things you're discussing. Because already on Twitter and in email I'll want to reference things on stackoverflow, because I have a problem or I found something interesting. It's just a natural side effect of conversations that I have with someone. And to me it's completely organic; it's the way it's supposed to be.
And that would maybe solve the problem of how we launch: maybe people would find out about it organically as we have these conversations, without us going "Hey, look at this new thing, poke poke, go over here and look at this new thing." It's just a thought, but I'm totally open to that.

Spolsky: Yeah. We have quite a finite number of people who listen to our podcast and read our blogs anyway, so they're going to find out.

Atwood: Right. But the site is very sticky and very social, too. Along those same lines, I just emailed Joel today. We know we've already succeeded in beta, and do you know how I know? Because we have a whole blog dedicated to hating stackoverflow on the Internet. So you know you're successful when that happens; it's like a stamp of approval.

Spolsky: It's not even public!

Atwood: I know. It's a huge success! If there are people who hate you and it's not even public, then you're tremendously successful.

Spolsky: These people are going to the backlash stage before we even got to the hype stage. Come on, you guys! Backlash comes after the hype; that's why it's called backlash.

Atwood: And a funny thing happened on the blog, too. The way we secure stackoverflow is somewhat intentionally naive.

Spolsky: For the beta.

Atwood: Right, for the beta. The site is not supposed to be secure at all; it's supposed to be totally public, even in the sense that you can just walk up and type stuff in, literally. That's what the site is like. So securing it is completely counter to everything the site does, and we even struggled to secure the site initially. How do you secure a site that's not designed to be secure? Do you want to write tons and tons of code around authentication? So the minimal solution we have is basically a very simple cookie-based solution. And I love that on this particular blog he found that out (I presume it's a he, it's always a guy), and he's like, "Look how lame their security is, they totally don't understand how cookies work.
They don't understand security at all. You're going to trust them to build a website?" It just made me laugh, because it very much missed the point of that whole thing.

Spolsky: Don't even respond. Why are we even talking about them?

Atwood: I know, I know. Some of the criticism is actually grounded. If there's something useful that comes out of it, I will use it and I will respond to it. And it's not vitriol yet. He says it's a blog about flaming, but it's actually somewhat reasonable. As long as it stays reasonable, I have no problem responding to it. I'm not going to point it out or list the URL or anything like that, but we absolutely are listening.

Spolsky: blogging-harmful.blogspot.com. Complete waste of time, but, you know, if we get people to care about us, whether it's positive or negative, that means people care about us.

Atwood: Exactly. That's my point. If nobody cares, that's the real loss.

Spolsky: That's the real failure, exactly: if you can't get anyone to care one way or another about what you've done. For example, this website blogging-harmful.blogspot.com is going to disappear without a trace. Even though I promoted it on the podcast, it's going to make it all the more painful when nobody...

[20:26] Atwood: But people who do something like that, they don't want attention or anything, they're not feeding on it at all, they're not interested in things like attention at all. The work is its own reward; it doesn't matter if anyone is looking. Have you seen that thing on the... I meant to blog about this, but the whole concept of just not looking at things, to basically discourage them. Or, conversely, that looking at things encourages them, like the whole Paris Hilton thing: just talking about these things over and over incessantly actually reinforces the whole trend. There was a series of children's books, I don't know if you've heard of them, called "The Great Brain." It's set in Utah...

Spolsky: Yeah.

Atwood: ...at the turn of the century.
I got these books as a kid and I was totally obsessed with them, because the Great Brain is all about a family. I don't remember the family's name, but there is one central character, J.D., who is the Great Brain. Essentially he's always thinking up ways to... it's essentially social engineering, before we had that word in computer circles. Basically getting people to do what you want them to do, completely of their own volition. The Great Brain is basically this genius of a kid who is using all these social engineering exploits to get away with all this crazy stuff. In that family, if they found out the Great Brain was doing this stuff, then of course he would get punished, but the ultimate penalty was what they called the silent treatment. The silent treatment meant that nobody would talk to you or acknowledge you for a certain period of time. They would give you food and stuff, but they wouldn't talk to you. It was just stunning in the book (you don't really think about this stuff as a kid; I was like 10 or so) how desperate it is for a person, as a social being, when nobody will acknowledge you. How profoundly affecting that is, right? Even the Great Brain, as a smart kid, hated the silent treatment and would do anything he could to avoid getting it, because it was just such a brutal penalty. I remember Jason Kottke talking about an episode of The Simpsons where these animated statues came to life, and the way they got rid of them was they started chanting "Just don't look!"

Spolsky: (laughs)

Atwood: This is led by Lisa Simpson, who says, "Just don't look at them and they'll go away!"

Spolsky: (laughs)

Atwood: It's amazing how powerful that philosophy is. If there are things happening you don't like, just don't talk about them or don't give them any attention, and look at the things that you actually care about and actually want (to have happen).

Spolsky: That's right. That was like the Clinton administration's policy on Rwanda.
Atwood: (uncertain chuckle) Well, that's the issue of social injustices, which I think is a little bit different.

Spolsky: There are definitely people who are attention-seeking, but you know what? That's the thing about trolls, "don't feed the trolls" or whatever. You know what, trolls are doing a great service to the Internet; they're making it entertaining and interesting.

Atwood: Absolutely. There's definitely been, and I think I referred to it in the previous podcast, "youthful experimentation" on stackoverflow. We're in fact still seeing that.

Spolsky: (chuckles)

Atwood: We're still seeing people just try stuff to push buttons and see what they can get away with.

Spolsky: One of the things that will happen, and it happened with IRC all the time, is when people get addicted to a social technology or something. They're addicted to the site, they love it, they come and they answer all the questions, but at some point there is just not enough of the main content that everybody is enjoying for them to entertain themselves. They have to find ways to entertain themselves for the next three hours after they've spent the 8 hours doing the regular thing they were supposed to be doing. So on IRC they would start these little flame-bot wars, and they would write little bots to protect themselves against the wars, and they would try to cause splits so that they could take over somebody's nickname, and that kind of stuff. They weren't doing the kind of stuff that IRC was for, which was chatting; they'd taken it to the next level because they ran out of people to chat with and that became boring. So they started kind of attacking the system itself.

Atwood: Right, and we're definitely looking at that. I mentioned that CAPTCHA thing, and that's the next piece of the puzzle that has to go in. We have rate-limiting mechanisms. One of the early things that happened to us was...

Spolsky: Flooding.
Atwood: ...somebody wrote a bot that would just revise posts every minute to keep them at the top of the stack. Actually, there are a certain number of people still doing that, which I'm trying to discourage. The way I like to discourage things is, where possible, by creating rules in the system that make that behavior not desirable. Not negative, necessarily, but things happen that make it not worth much to you. So let me give you a specific example in that scenario. You have the user who's just editing their own posts every three hours so that they're always at the top of the stack. We have this concept now (it's actually implemented, though I talked about it in previous podcasts) of the community-owned post. Because one of the great divides in stackoverflow is that we have this ownership system where you get voted up and down, your content gets voted up and down, and that affects your reputation. You own stuff, so when you post something you own it. Then you contrast that with the Wikipedia model, where nobody seems to own it, and we're trying to do both of those things. At the transition point we came up with a couple of rules. The initial rule I had was that edits by four different people will cause a post to switch from being owned by Joel, for example, to being owned by the community user. At that point you don't lose any reputation that you got up to that point, but any future upvotes on that content don't go to anybody; they go to the content. I think this is the way it should be. Ultimately you're voting on the content more than the person anyway, so hopefully people are OK with this. Seeing that people kept editing stuff over and over, I bent the rules a little bit and said, OK, if you edit your own thing more than N times, then it also becomes a community-owned post.
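The two ownership rules Atwood describes can be written as a small state machine. This is a minimal sketch under stated assumptions, not the actual Stack Overflow code: it assumes the owner does not count toward the "four different people", uses 5 for the owner-edit limit (the "N" left open here, which Atwood later refers to as 5 edits), and uses a hypothetical flat reputation value per upvote.

```python
from dataclasses import dataclass, field
from typing import Set

# Thresholds from the discussion. Whether the owner counts toward the
# "four different people" is not stated; this sketch assumes it does not.
DISTINCT_EDITORS_THRESHOLD = 4
OWNER_EDITS_THRESHOLD = 5  # the "N" left open here; Atwood later cites 5
REP_PER_UPVOTE = 10  # hypothetical value, for illustration only


@dataclass
class Post:
    owner: str
    other_editors: Set[str] = field(default_factory=set)
    owner_edits: int = 0
    community_owned: bool = False
    owner_rep: int = 0

    def record_edit(self, user: str) -> None:
        if user == self.owner:
            self.owner_edits += 1
        else:
            self.other_editors.add(user)
        if (len(self.other_editors) >= DISTINCT_EDITORS_THRESHOLD
                or self.owner_edits >= OWNER_EDITS_THRESHOLD):
            self.community_owned = True  # one-way switch; never reverts

    def upvote(self) -> None:
        # Rep earned before the switch is kept; later votes credit the
        # content rather than any user.
        if not self.community_owned:
            self.owner_rep += REP_PER_UPVOTE
```

The design point is the one Atwood makes: nothing punishes the owner, and reputation already earned is kept, but once the owner's own edits cross the threshold, further bumping earns nothing, so the incentive to game the "top of the stack" disappears.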
There is no real value to the user, in terms of getting additional reputation, in bumping stuff up to the top of the stack anymore, because if you edit your own thing enough, you won't get any reputation from it. It behooves you to only edit it once, or however many times you need to edit it, but hopefully no more than once, and just let it sit there and have people find it organically and naturally, the way it is supposed to happen.

Spolsky: That will happen more when we're open to Google for searches. I think right now one of the problems with stackoverflow that some people have been experiencing is that they ask a question, and it's a little too esoteric to get a response right away, and then it disappears from view for a while. Once the site has a much larger critical mass of people and is searchable by Google, those questions will naturally have people come to them, so they won't need to try these little tricks to get them in front of people again to get an answer.

Atwood: I also try to copy a lot of things I've seen online that have been successful, like conventions. Let me give you a specific example: phpBB (and I'm sure there are other web discussion boards that do this too, but phpBB is the one I know) has this editing convention. When you post, you can edit your own posts, and I noticed that when people are using phpBB, right after you post something you'll always notice some goofy mistake that you made, like, immediately. This happens to me nine times out of ten: I'll post and think "Oh, I should have talked about this" or "I missed that word," so you immediately go in and edit. Within a certain time threshold, these are not treated as real edits; they're treated as just going back in time to pretend that it is the post you originally made. It doesn't kick off the whole audit trail of you having edited it 50 times. One of the first things we did in stackoverflow is actually implement that.
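The "edits within the threshold don't count" convention can be sketched as follows. This is a hypothetical model, not phpBB's or Stack Overflow's implementation: it assumes a 5-minute window (the value Atwood gives for the current threshold), applies the grace only to the owner, and, as an added assumption, only while nobody else has edited the post yet, so the owner can never silently overwrite someone else's revision.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

# Assumed tunable; Atwood says the window is currently 5 minutes.
GRACE_PERIOD = timedelta(minutes=5)


@dataclass
class Revision:
    author: str
    body: str
    at: datetime


@dataclass
class Post:
    owner: str
    created_at: datetime
    revisions: List[Revision] = field(default_factory=list)

    @classmethod
    def create(cls, owner: str, body: str, at: datetime) -> "Post":
        return cls(owner, at, [Revision(owner, body, at)])

    def edit(self, user: str, body: str, at: datetime) -> None:
        in_grace = user == self.owner and at - self.created_at <= GRACE_PERIOD
        untouched = len(self.revisions) == 1  # nobody else has edited yet
        if in_grace and untouched:
            # Pretend this was the original post: overwrite revision 1 in place,
            # keeping its timestamp, so the audit trail gains no entry.
            self.revisions[0] = Revision(self.owner, self.revisions[0].body if False else body, self.created_at)
        else:
            self.revisions.append(Revision(user, body, at))
```

The trade-off Spolsky raises right after this is visible in the model: if the owner changes "smart" to "stupid" one second after posting, the post still has exactly one revision, so nothing in the history shows that the wording ever changed.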
I remember talking to Geoff Dalgas about that, and he's like, "Why do we have to have this?" I said, "You don't understand, this feature has to be in there on day 1, otherwise we're going to have so many revisions that are just in the first minute or two after posting and are just silly little things that are being corrected."

[28:20] Spolsky: You're not actually recording the revision? You're not doing the diff thing?

Atwood: [Not] within the threshold. Right now the threshold is actually 5 minutes. So up to 5 minutes after you post, if you edit your own stuff. Now, if I go edit it, it's a real revision.

Spolsky: Now... that's OK, except that you sort of run the risk that the historical record is... if somebody posts something and it's seen as being...

Atwood: Sure, if the threshold is like 2 hours, right?

Spolsky: ...5 minutes. Let's say I ask the question "Is Jeff Atwood smart?" (laughs) And then you reply saying "Yes! Absolutely. Definitely." And then I go back and change "smart" to "stupid" one second later. "Ah-hah... you said you were stupid!" And there's no track record. Not only that, but everybody can say, "There's supposed to be an audit trail here. Let's look at the audit trail... oh, look! It wasn't edited!"

Atwood: Right. No, that's true. That does leave you vulnerable. And I think that's where you come to playing with the rules and seeing what happens. Another example of setting the rules: you can't always anticipate all the side effects of the rules you're going to set. And you don't always get the behavior that you wanted out of the rules that you set, either. You have to be very, very careful. So when I instituted the rule of 5 edits by the owner causing a post to go to community mode, the next day somebody complained about that. Actually, this was today. Somebody posted a thread on "What are your favorite developer magazines?" And it's not, in my opinion, a great question, because it really can't be answered and it's very subjective.
It's not a great question is the bottom line. So it got edited a lot, immediately. People started editing the tags. Somebody changed the title. The net effect was that within 45 seconds of this guy posting it, it was in community mode. And he was kind of complaining about that, [saying] "Look, I've had no chance to get any rep from this post at all." I kind of agreed, although again, not a great question. Spolsky: You shouldn't get any credit for a not-great question. Atwood: You could argue that this is totally correct behavior. I can see where he's coming from, and one thing I thought of was having a certain time window where... it's four edits by four different people, but it has to be more than one hour. Or whatever time interval. Spolsky: Ugh. Too much tweaking. You know what I would say? Just tell people "Listen, if you want to ask one of these poll questions like 'What's your favorite developer magazine?', that would be a good question if you phrased it in the following way: 'What is your single most favorite developer magazine?'" And instead of just replying, see if someone else has already given the answer, and vote it up. And then what you get is a ranked listing of everybody's favorite developer magazines, in order from most favorite to least favorite. But if you say "What are your favorite magazines?" and people are answering with three, and they're discussing some magazine that they subscribed to a long time ago and it's just chatter going on in there, the question becomes a mess and it just doesn't fit the Stack Overflow data model. Atwood: It doesn't. That's right. And I think that's the challenge: getting people to understand what the Stack Overflow model is. Spolsky: We can do polls, and we should do polls, for things like "Should I learn C? Yes/No" (laughs) There you go... everybody vote. Atwood: One advantage, and I haven't put this in yet, is that you will be able to opt in to community mode so that you don't get any rep.
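Atwood's alternative rule (several distinct editors, but only once the post has been around for a while), together with the owner opt-in he mentions, could look roughly like this. It is a hypothetical sketch: the `Post` fields, the `is_community` function, and the defaults of four distinct editors and one hour are assumptions taken loosely from the conversation, not the site's real logic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Post:
    """Hypothetical post record for deciding community (no-reputation) mode."""
    owner: str
    created: datetime
    editors: list = field(default_factory=list)  # one user id per edit
    opted_in: bool = False  # owner chose community mode when posting


def is_community(post: Post, now: datetime,
                 min_editors: int = 4,
                 min_age: timedelta = timedelta(hours=1)) -> bool:
    """Return True if the post should be in community mode at time `now`."""
    if post.opted_in:
        return True
    distinct_editors = set(post.editors)
    # A burst of edits right after posting does not count until the post is old enough.
    return len(distinct_editors) >= min_editors and now - post.created >= min_age
```

Under this rule, a question that gets retagged and retitled by four people in its first 45 seconds stays out of community mode until it is an hour old.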
So then you could post things that are effectively voting questions. Let's say we had "What is your favorite developer magazine?" and everybody that posted in it went into community mode immediately; all those votes would go towards the magazine, not you. I said "BYTE. BYTE is my favorite programmer magazine. It's a classic." So everybody that voted for that would not be voting for me, they'd be voting for BYTE. I wouldn't get any reputation from that at all. Spolsky: I could not have programmed my Atari 800 without the great articles I got from BYTE magazine. [Actually I had an Atari 800, back in 1981 or 1982, and while BYTE remains my all-time favorite programming magazine, especially in the early days before it went downhill, the magazine that helped me master the Atari 800 was Compute.] Atwood: (laughs) There you go. I think I've said this before, but I think it bears repeating. I really enjoy working on Stack Overflow; it's a really fun social experiment. Because we're getting useful stuff out of it for the most part, to me that makes it worthwhile. You call that tweaking, but to me, if it's a tweak that results in hundreds of much better questions six months from now, then it's worth doing. [32:52] [57:56] Spolsky: [We will actually have] I'm in the process of making another movie. Did we talk about this on the podcast yet? Atwood: Oh really? You sent me a copy of Aardvark'd. I have seen that. Spolsky: That was our first movie. That was like a slightly serious reality TV show about the interns building Copilot. It was probably better than what's on MTV. But the truth is the audience, the Joel on Software audience, really wanted something more technical, and a lot of people wrote to me and said "You know, I really wanted to learn how software is developed at Fog Creek," and this movie was just a little bit too light on that content. This is going to be much more hardcore. So it's really going to be about how software is developed at Fog Creek. Atwood: Ooh wow...
Has this started? Spolsky: It's about a one-year project. We've got the same filmmaker as we had last time, except I'm basically interviewing everybody at Fog Creek, you know, three or four times, to get footage for this thing. And then we'll splice it all together, and the current plan is to have several different formats of this movie. There'll be a lot of short-form little pieces, like four minutes, that we can put up on YouTube, if you want to hear about source code control, or you want to learn about hiring programmers or interviewing programmers or the phone interview. There'll be these little pieces that you can just download freely on the internet or watch. And then there's a [sort of a] more substantial ninety-minute version, which is probably too much to distribute over the internet, so we'll probably distribute it in DVD form. Maybe by that time there'll be a good way to distribute a [you know] ninety-minute, DVD-length thing over the internet. At kind of the high end, and what will hopefully pay for this whole operation, will be a corporate training video: maybe a five- to six-hour thing that a team might buy inside a corporation to learn how to do software development a little bit better, and it'll just go into much more depth. There'll be like a whole hour of training on how to conduct interviews for programmers and a whole hour on how to set up your tools and that kind of stuff. Atwood: Wow. So you're like Howard Stern. You're kind of like the media now. You have your movie. You've got your podcasts with me. You've got, you know, your blog, your not-a-blog. Spolsky: But I mean, you know, that was the original idea behind this podcast. Remember how I said, like, "There are some people that you can reach in certain channels, and they need to hear you"? I don't listen to Eminem's music, but I saw the movie 8 Mile, or 8 Miles... whatever it was. Different people you will reach in different channels.
Part of the goal is to do that, and the other part of the goal is: I have an inherent inability to ever give up. And instead of just saying, "You know, we tried to make a movie, and it wasn't as good as it should have been, and so maybe we're not so good at making movies," we just said "You know, let's learn from the experience. Let's try and make a two point oh that's just much better." Atwood: I didn't really hear a lot of negative stuff about it. Spolsky: You know who liked it? Programmers' girlfriends, wives, and families. Because there's nothing about it that's inaccessible to somebody who knows nothing about programming. And it really does convey to the normal person a lot about what a programmer's life is like: just the sort of feel of being a programmer, the fact that programmers get into these conversations about things. You know, one of the highlights of the movie, I thought, was when we told the interns that their job was to figure out if they could jump across to the ledge of the neighboring building if there was a fire. Could they jump out the window and make it onto the ledge of the neighboring building? And programmers love to have conversations like that, and they immediately leapt up to the whiteboard and started drawing equations from physics and stuff like that, and they tried to work it out. And that's really what our conversations are like, what our lives are like. If you're a geek you probably don't think anything is strange about this. But to a normal human being... this is very bizarre: that we talk that way, and that we think about these kinds of things, and that we then need to solve the problem mathematically, to decide if you can make it to the neighboring ledge before you die. Atwood: Right... Well, that's cool. I'm looking forward to that coming out. Looks like it'll be a while. I definitely recommend that people who haven't seen it check out Aardvark'd. I enjoyed it.
I didn't think it lacked anything. Spolsky: The way to find that is ProjectAardvark.com, and there's a link to the movie. ProjectAardvark.com is a blog that the interns kept that summer. It was a couple of years ago. [1:02:14] [1:02:50] Atwood: Two things. We have a wiki, for people who can't listen to this, where people can contribute transcriptions of our incredibly boring podcasts, and we thank you very much for that. Although I do have one request for the transcriptionists, and the ironic thing is, you're going to transcribe this, which I think is hilarious. When you transcribe, don't write down every time I say 'uh' or pause or 'yeah'. Make me sound awesome; that's my one request for the transcriptionists. Spolsky: It doesn't have to be word for word. It doesn't necessarily read as well when it's word for word; you can leave [out] 'uhms' and 'uhs'. Atwood: In fact, leave out whole... If you think it reads better a certain way, just make me say whatever makes the transcriptionists sound the most awesome. Spolsky: And it's a wiki, go ahead and edit it. Atwood: People edit anyway, you're right, it's hilarious. I've been reading the [revisions], it's very funny. Atwood: The other thing is, if you do contribute to the wiki: since our beta has been pushed back a week, this will get you into the StackOverflow beta the same day. If you want to be in, just email me after you've done a little bit of transcription, one minute or whatever you're comfortable with. If you want to get your question answered on the air, send a recording of less than 90 seconds to podcast@stackoverflow.com; we will put it in the queue and hopefully answer it on the next podcast. Spolsky: Alright, that's it. Thank you very much, see you next week! Atwood: See you next week. [1:04:04] [Outro] | Ads, intro [1:00] Spolsky: Today is the day that we did not launch, although we planned to. But then... We'll wait for another week.
Atwood: Yeah, well, the good news on that is that we did actually figure out what that problem was. Spolsky: Oh, oh, I want to hear, I want to hear, I want to hear. Atwood: Eh. So, it was a third-party library. Indirectly, I mean. It's the third-party library, and our particular use of it. It was Log4Net. Spolsky: Oh! Atwood: We were logging in such a way that the log... during the log call, it was triggering another log call. Which is normally okay, but with the load that we have, eventually they would happen so close together that... there's also a lock. So, there's two locks going on there. There's a lock for, like, disposing of the database stuff that's going on. Then there's a lock for, like, actually writing to a file... Spolsky: Hm! Atwood: And... huh... they happen in the opposite order, so it's like a classic deadlock, right. So, you release the lock on the database, then you release the lock on the file. And then the other call was doing it in the other order. And they were happening so fast that... it was deadlocking eventually. And it was one of those things that would happen... like... it was very intermittent, right. Spolsky: Right. Atwood: So we had to dust off Win Debug. Spolsky: How on Earth do you find things like that? Atwood: Well. You bust out Win Debug. One nice feature in Windows Server 2008, and I think this is in Vista as well: in Task Manager, you can right-click a task and take a dump of it. Spolsky: Yeah! Atwood: Like, right there. Spolsky: Aha. Atwood: So we took a dump of the W3Service process, and... Spolsky: Ha ha, take a dump. Atwood: Yeah, I know, any time you do this it's like... it's like the territory for jokes. It's just... Spolsky: [giggles] Atwood: [laughs]. And then we loaded up a... Win... Debug. Spolsky: WinDbg! Yeah. Atwood: ...and then some .NET managed extensions you can, sort of, load. You need like a chi-chi to figure out what the commands are. And then you load the dump, and you load the managed tools.
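The failure Atwood describes is the classic lock-ordering deadlock: two code paths each hold one lock while waiting for the other, because they acquire the same pair of locks in opposite orders. A common cure is to always acquire locks in one global order. The sketch below is a generic illustration, not Log4Net's internals or the fix Stack Overflow shipped; `db_lock`, `file_lock`, and the `OrderedLocks` helper are hypothetical names, and ordering by `id()` is just one convenient choice of global order.

```python
import threading


class OrderedLocks:
    """Context manager that acquires several locks in one global order (by id())."""

    def __init__(self, *locks):
        self._locks = sorted(locks, key=id)

    def __enter__(self):
        for lock in self._locks:
            lock.acquire()
        return self

    def __exit__(self, *exc_info):
        for lock in reversed(self._locks):
            lock.release()
        return False  # do not swallow exceptions


db_lock = threading.Lock()    # hypothetical: guards database cleanup
file_lock = threading.Lock()  # hypothetical: guards the log file


def dispose_then_log(n, done):
    for _ in range(n):
        with OrderedLocks(db_lock, file_lock):
            pass  # work that needs both resources
    done.append("dispose_then_log")


def log_then_dispose(n, done):
    for _ in range(n):
        # Requests the locks in the opposite order, but OrderedLocks still
        # acquires them in the same global order, so the threads cannot deadlock.
        with OrderedLocks(file_lock, db_lock):
            pass
    done.append("log_then_dispose")
```

If each function instead acquired the two locks directly in the order it names them, the same pair of loops could hang intermittently under load, which is the rare, timing-dependent behavior that makes this kind of bug so hard to reproduce.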
And then you can sort of just investigate all of the threads. You can say "Show me all the managed threads." And then say "Show me what the call stack was for that thread." And what we saw was, like, tons and tons of threads that were all going "Hey, I would like to log something..." And it was like "Hmmmm... [laughs]... Interesting!" Right, you have like 80 threads that all try to write something to the log. So... right then we kind of knew where the problem was. And then somebody on Twitter actually volunteered to help us diagnose the dump. So I put it up on our server, and he nailed it... he had a great description of it, like line by line, blow by blow, of exactly what was happening. I mean, I'm competent enough to sort of figure out roughly what was going on, but he really knew this stuff and really helped us out, and I do appreciate that. [3:33] Spolsky: That's really awesome. I never... But you know, I don't think I've ever worked on code that is sort of operational in the same way. Atwood: Ah hm. Spolsky: Because we definitely put a lot more... oh, you know what, I did. At Juno we used to have all kinds of logging. The trouble is that my philosophy has always been that you have a tendency to want to log everything. But then you just get logs that are, you know, a hundred megabytes per user, and you get thirty of them a minute, and they can't possibly be analyzed or stored in any reasonable way. So the next thing you have to do is start culling your logs, or just have different levels of debugging, where in high debug mode everything is logged and in low debug mode nothing is logged. And... it's kind of hard to figure out what you really want in a log. You know, a lot of logs... like, I think of the logging that we did at Juno, where people would call with a complaint and you'd try to figure out where the program was crashing.
And obviously a log of the crash, that's easy. Ehm, but then there's some line above the crash which hopefully gives you a lot of information about where it happened. And there's some line you don't see that should have been after that, after the crash, but it never got there 'cause it crashed sometime before there. And essentially what you're doing as you're adding logging is a binary search, right, where you're sticking in, like, "Well gosh, I got to here and then got to there. But there's an awful lot of code between point A and point B. So let's put a log point of some sort halfway from A to B." Then you put that in, and then you eliminate 50 percent of the possible places to look for your crash. Um, but I've never really been able to... Atwood: I mean, ironically, to troubleshoot this hang, which turned out to be because of logging, we were adding more logging. Spolsky: [laughs] Atwood: The joke just writes itself! The joke just writes itself, right... Spolsky: It does... How many third-party tools are a part of the StackOverflow code base? Atwood: Well, okay, so... [chuckles] Uh, Dare [pronounces it as the English word "dare"] Obasanjo [pronounces it "oh-bih-san-ho"]... I don't know if I'm pronouncing it correctly. Spolsky: Okay, "Dare" [pronounces it "daray"]... Obasanjo [pronounces it "oh-bih-san-ja"]... It's "Dare." Atwood: Is it "Dare"? Spolsky: Yep. Atwood: Really... Okay, I didn't know that. Well, I've learned something. But he had a whole blog entry about how, you know, I had chosen to write my own sanitizer, and that was a very deliberate choice for me... Spolsky: Mm hmm. Atwood: ...for a number of reasons that I won't get into. But he was very critical of this, because of course there were bugs in the sanitizer... Spolsky: Mm hmm.
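Spolsky's "add a log point halfway between A and B" technique is a binary search over checkpoints. In the sketch below, which is a hypothetical model rather than anything from Juno's code, each call to `reached(i)` stands for one more run of the program with a log point at checkpoint `i`; it assumes the crash is deterministic, so the checkpoints that get logged form a prefix of the execution.

```python
from typing import Callable


def last_reached(n: int, reached: Callable[[int], bool]) -> int:
    """Return the index of the last of n ordered checkpoints that logs before the crash.

    Assumes reached(0) is True and reached is monotone: True for every checkpoint
    up to the crash site, False for every checkpoint after it.
    """
    lo, hi = 0, n  # invariant: reached(lo) is True; no index >= hi is reached
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if reached(mid):  # one more run with a log point at mid
            lo = mid
        else:
            hi = mid
    return lo  # the crash lies between checkpoint lo and checkpoint lo + 1
```

Each run halves the remaining span, so narrowing 1,000 candidate checkpoints down to one takes at most ten runs (ceil(log2(1000)) = 10).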
Atwood: ...which there were going to be, and to me, it's about your velocity; it's not about where you are, it's about where you're going, and we're gonna fix that stuff, right? And I'm making the sanitizer public as well, so other people can have a sanitizer that's not ten thousand lines of code and ridiculous, so there's a philosophy there of building something that's reusable for everyone. Um, but I thought it was ironic, because he was talking about how developers should just pick a third-party library and go with it, and I think it's a balancing act, because we picked this logging library, right, which kind of caused a problem for us. I mean, partially it was the way we were using it, but the way it was locking the files was a design issue in terms of the way Log4Net works. Spolsky: Right. Atwood: So I think it's a trade-off. I don't think it's always as clear-cut as "you should always pick a library" or "you should never pick a library," right? I think there's always some in-between there. So, for us, I'm definitely a minimalist: I don't like third-party libraries. I feel like we have a giant third-party library called "Windows," called ".NET"... huh... ASP.NET MVC is technically a third-party library. Um, but these are, you know, major vendor stacks. And I do feel like, as much as we talk about open source and stuff, there's a certain level of quality you associate with these major first-party stacks, right, whether it's from Apple or Microsoft or Sun or whoever. That may or may not be true, but hopefully it usually is true: that these things are really heavily tested. Spolsky: There is definitely, yeah, there is definitely... I mean, there's something I've learned over the years, and, you know, I started out working on the Excel team, um... The developers on that team had a motto, which was "Find the dependencies and eliminate them."
You know, they had their own compiler; they would not use untested libraries from other groups at Microsoft, even... Atwood: I love that they had their own compiler. That is so hardcore. I can't even, like, I could not even hang out with those guys... right... that hardcore. Spolsky: Hey, well, we have our own compiler, man. Atwood: Yeah... Spolsky: Let me tell you why they had their own compiler: they had their own compiler because Excel was getting huge, and compiled 8086 code was just too large to fit on floppy disks and to fit in memory. You know, we were really trying to cram things in there. And so they developed a pcode compiler, which basically... you know, it's like bytecode. They called it pcode. This is a very old technique, and it compiled Excel into an imaginary machine, a virtual machine, which was a lot more expressive than an 8086 and had all kinds of additional features, and so the compiled code was about one-third the size, and in a lot of situations this made the performance a lot faster. So, for example, in those days almost everybody was running programs off of floppy disks... or no, not floppy, but the 3.5-inch, not-so-floppy disks. But the read time on those things is really, really slow, so if your app was smaller at the time that you read it from disk, it didn't matter if it ran a little bit slower; the overall experience would be a lot faster. And if you could fit in memory without swapping, then obviously the whole thing would run faster, so it was worth doing this pcode thing for a long time. And about the time of Excel 5.0, the bit flipped on that: suddenly everybody had hard drives, and nobody really cared about the size of the executable, and it was okay to have, I think, a four-megabyte executable instead of a one-megabyte executable, and so they got rid of that pcode back-end.
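The idea behind pcode is to compile to compact instructions for an imaginary machine and interpret them at run time, trading some speed for size. The toy stack machine below only illustrates that general technique; its opcodes, encoding, and the `run` function are invented for this example and have nothing to do with Excel's actual pcode format.

```python
# One-byte opcodes for an imaginary stack machine (hypothetical encoding).
PUSH, LOAD, ADD, MUL, RET = range(5)


def run(code: bytes, env: list) -> int:
    """Interpret bytecode; env holds the values of numbered variables."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:    # operand: next byte is a small constant
            stack.append(code[pc])
            pc += 1
        elif op == LOAD:  # operand: next byte is a variable slot
            stack.append(env[code[pc]])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == RET:
            return stack.pop()
        else:
            raise ValueError(f"bad opcode {op} at offset {pc - 1}")


# (x + 3) * y encoded in 9 bytes: x is slot 0, y is slot 1.
program = bytes([LOAD, 0, PUSH, 3, ADD, LOAD, 1, MUL, RET])
```

`run(program, [4, 5])` evaluates (4 + 3) * 5 and returns 35. Each operation costs one or two bytes, and the interpreter loop is the "runs a little bit slower" side of the trade-off.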
But even then I think they had their own compiler for a while, because in order to write really, really efficient code, they wanted to be able to control... oh, this is a long story. [9:39] Spolsky: But a pointer on an 80386... for a while the 80386 was the target. On an 80386 (or even on the 8086 in general), a pointer consisted of two parts, the segment and the offset. So it's like "where do you want to start your pointer?" and then "what's your offset inside there?" And you couldn't just say "here's my pointer, just do something with this." You could, finally, in 32-bit clean mode, but we didn't have that. What you had to do was, there was this thing called the segment register, and you loaded the segment register saying, "From now on my pointers, which are 16 bits, are going to be offsets from this particular point." And the very loading of that segment register would cause all kinds of operating system traps to get executed and all kinds of interesting things to happen, and it was a very, very slow operation. So if you were doing any kind of pointer manipulation, which you were, because it was C and everything was pointer manipulation, you wanted to load that segment register as infrequently as possible, because that was a very, very expensive operation. And chances are, you're doing a whole bunch of pointer operations that are all in the same segment, for a while at least. You really want to be able to just load it once and then maybe do your next 20 operations with that as the base register. So all the Excel code had this assumption that they could do that; it gave them the ability to do that. And that made it just screamingly fast compared to the competition. I mean, I remember when Borland came out with Paradox for Windows, and they did not take this into account and they just used... it was C++ code, so they really had no choice but to use pointers for all their methods, because it was all virtual tables and C++ objects.
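The cost described here comes down to how often the segment register gets reloaded. The sketch below is only a counting model in Python, not real x86 code: far pointers are represented as hypothetical `(segment, offset)` tuples, and `segment_loads` counts how many loads a naive strategy (reload on every dereference) needs compared with one that keeps the current segment as a base and reloads only when the segment changes. That second strategy is the idea behind based pointers, which Microsoft's C/C++ compilers expose through the `__based` keyword.

```python
def segment_loads(far_pointers, keep_base: bool) -> int:
    """Count segment-register loads needed to dereference far pointers in order.

    far_pointers: iterable of (segment, offset) pairs.
    keep_base=False models reloading the segment on every dereference;
    keep_base=True reloads only when the segment actually changes.
    """
    loads, current = 0, None
    for segment, _offset in far_pointers:
        if not keep_base or segment != current:
            loads += 1
            current = segment
    return loads


# 20 accesses within one segment, then 5 in another.
accesses = [(0x1000, i * 2) for i in range(20)] + [(0x2000, i * 2) for i in range(5)]
```

For `accesses`, the naive strategy performs 25 loads and the based strategy performs 2; when each load is expensive, that ratio dominates the running time.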
The net result was that they just used these pointers naively: every time they wanted to use a pointer, they reloaded the segment register, and that just made this app really, really slow. I mean, it took 90 seconds to start. You know, Excel could launch in 10 or 15 seconds. Atwood: Wow. Spolsky: So this was a feature that they eventually got added to the regular Microsoft C compiler, called based pointers, and I think then they stopped using their own compiler. But their philosophy was really not to trust anybody and to have control over everything, so that there's some hope they can get it to work without having an external dependency. You know, I've sort of taken this with me a long way, and every time I've failed to do that I've tended to regret it. Every time we've put outside technology into FogBugz we've regretted it. There are a lot of these excellent components, and they are really great components, made by vendors, like .NET components, and widgets like the cool calendar dropdown widget that you put into your web page, and all that kind of stuff. And inevitably what I've found is that they are good enough for enterprise code, like internal apps that you're using at the insurance company, and they're just never good enough for the kind of app you want to ship that has to be perfect. Somehow there is something that's not commercial quality about them. You know, it's fine if there are 20 people using it and they're all using it the same way; it allows you to put a calendar dropdown into something in fifteen seconds. But then you'll get to some customer who says, "You know, we don't start our weeks on Sunday in my country." And you'll say, "Oh," and you'll find out this library doesn't have that feature. As a hypothetical example. [12:57] Atwood: Right! Spolsky: Which I... Atwood: I believe that's one advantage of some of the web stuff: everything is just public-facing by default.
You don't have sort of this internal development ghetto effect. I mean, to be fair, talking of dependencies, we have tons of dependencies, right? It's just a question of what dependencies you want to take. I mean, jQuery is a dependency, right? Spolsky: Hhm. Atwood: We're using the WMD control; that's a dependency. Spolsky: Hhm. Atwood: There are these little add-ons for jQuery that... Spolsky: But you know what, if you found a bug in jQuery, you would just go edit the source, and you would be shipping your own private version of jQuery, and problem solved. And it wouldn't be ideal, but at least you wouldn't be screwed. [13:32] Atwood: Right... We have actually done that, and let me give an example. So the WMD editor has a bug with international keyboards. There's no way we would have found this, because we don't use international keyboards, but obviously some of the people that use StackOverflow do, and... Spolsky: Sure. Atwood: They were, I think, understandably very annoyed, because they couldn't enter, like, a right bracket, which is an important key, particularly in Markdown. That's one of the delimiters you use sometimes. Spolsky: Yeah, the... Atwood: Um. And they actually... the prob... I'm still trying to get the source from the authors, so we don't actually have the source. What we have is, well... I guess obfuscated is not the right word, but minified JavaScript, where they compress it down so all the variables are "a", "b", "c" and things like that. So it's not exactly fun code to look at anymore [laughs]. Spolsky: Yeah. [14:12] Atwood: But somebody actually went through and found a little work-around, and I feel bad, because when they posted this I didn't realize that they had actually found a work-around, so I was like "Oh, that's interesting" and I just didn't come back to it.
But then somebody was complaining that this bug had stayed open for like three weeks, and so I finally went and put in that fix. So you're right, having the source is great, because you can fix little problems that you run into, and when you pick up components from the web (if you're talking about JavaScript) you get the source by definition. I think that's actually one of the grea... I had a blog entry about this. That's one of the great strengths of the web. Everything is essentially open source by default. I mean, if you're curious about what Google is doing... you remember when, you know, Maps came out and everyone was like "oooh"? You know, Maps, it's all this innovative zoom-in, zoom-out technology. Spolsky: Yeah. [14:57] Atwood: You could just view source, and if you were, you know, motivated enough, you could figure it out, right? There wasn't an executable that you had to decompile or anything like that. So I feel... Spolsky: [garbled - interrupting] Flash... Atwood: Go ahead. Spolsky: Unless it's all Flash. I mean, like, Yahoo Maps is all Flash and you can't figure out what they're doing. Atwood: Oh, right. Right. Right. Right. Right. Well, that gets back into what we call the whole rectangle problem: the browser where you have this alien rectangle [laughs] that lives in another universe and it pokes a hole into your dimension, and then, like, this crazy stuff comes through, and yeah. So... on a related note, let's close out the topic. So struggling with the deadlock put us back, I would say, at least four or five days. So in order to have a smooth landing... there are also a couple of features that I really desperately want us to get in before we open to the public, like, say, a captcha for [laughs]... I think that's kinda important when we go live. Um. So adding a week to the schedule really helps us to have a smooth landing. I mean, we could launch on the third.
I mean, we honestly could, but it would be a little desperate. We would be really flailing, fixing things at the last minute. [15:55] Spolsky: Yeah, no. We're in no rush, we can take another week. I thought that the end-of-August plan was a little bit ambitious. I think we're both in the same position of really being on the fence... I don't want to say on the fence, but it's sort of a close call, going back to last week talking about Aaron Swartz's thing, between whether we want to do the Hollywood launch, where everyone hits us at once and the world comes to an end, versus the Gmail-style launch, where we just start taking a thousand people a day or give out invites or something, to at least have some kind of control over the rate at which people come in. Atwood: You know what I like now, what my philosophy on this has gone towards? It's almost like dating, where you don't really want to seem needy. If it comes up in the conversation, "Hey, we have this website, stackoverflow," if it's contextual, then talk about it. But maybe we won't have a whole post saying "Hey, we're launching a new site called stackoverflow." Maybe not even do that. It sounds very counter-intuitive, but just bring it up in the context of things you're discussing. Because already, on Twitter and in email, I'll want to reference things in stackoverflow, because I have a problem or I found something interesting. It's just a natural side-effect of conversations that I have with someone. And to me it's completely organic; it's the way it's supposed to be. And that would maybe solve the problem of how we launch: maybe people would find out about it organically as we have these conversations, without us going "hey, look at this new thing, poke poke, go over here and look at this new thing." It's just a thought, but I'm totally open to that. Spolsky: Yeah.
We have quite a finite number of people who listen to our podcast and read our blogs anyway, so they're going to find out. Atwood: Right. But the site is very sticky and very social, too. Along those same lines, I just emailed Joel today. We know we've succeeded already in beta, and do you know how I know? Because we have a whole blog dedicated to hating stackoverflow on the Internet. So you know you're successful when that happens; it's like a stamp of approval. Spolsky: It's not even public! Atwood: I know. It's a huge success! If there are people who hate you and it's not even public, then you're tremendously successful. Spolsky: These people are going to the backlash stage before we even got to the hype stage. Come on, you guys! Backlash comes after the hype; that's why it's called backlash. Atwood: And a funny thing happened on the blog, too. The way we secure stackoverflow is somewhat intentionally naive. Spolsky: For the beta. Atwood: Right, for the beta. The site is not supposed to be secure at all; it's supposed to be totally public, even in the sense that you can just walk up and type stuff in, literally; that's what the site is like. So securing it is just completely counter to everything the site does, and we even struggled to secure the site initially. How do you secure a site that's not designed to be secure? Do you want to write tons and tons of code around authentication? So the minimal solution we have is basically a very simple cookie-based solution. And I love that on this particular blog he found that out (I presume it's a he, it's always a guy), and he's like "look how lame their security is, they totally don't understand how cookies work. They don't understand security at all. You're going to trust them to build a website?" It just made me laugh, because it very much missed the point of that whole thing. Spolsky: Don't even respond. Why are we even talking about them? Atwood: I know, I know. Some of the criticism is actually grounded.
If there's something useful that comes out of it, I will use it and I will respond to it. And it's not vitriol yet. He says it's a blog about flaming, but it's actually somewhat reasonable. As long as it stays reasonable I have no problem responding to it. I'm not going to point it out or list the URL or anything like that, but we absolutely are listening. Spolsky: blogging-harmful.blogspot.com. Complete waste of time, but you know, if we get people to care about us, whether it's positive or negative, that means people care about us. Atwood: Exactly. That's my point. If nobody cares, that's the real loss. Spolsky: That's the real failure, exactly: if you can't get anyone to care one way or another about what you've done. For example, this website blogging-harmful.blogspot.com is going to disappear without a trace. Even though I promoted it on the podcast, it's going to make it all the more painful when nobody... [20:26] Atwood: But people who do something like that, they don't want attention or anything, they're not feeding on it at all, they're not interested in things like attention at all. The work is its own reward; it doesn't matter if anyone is looking. Have you seen that thing on the... I meant to blog about this, but the whole concept of just not looking at things, to basically discourage them. Or, conversely, that looking at things encourages them, like the whole Paris Hilton thing, where talking about these things over and over incessantly actually reinforces the whole trend. There was a series of children's books, I don't know if you've heard of them, called "The Great Brain." It's set in Utah... Spolsky: Yeah. Atwood: ...at the turn of the century. I got these books as a kid and I was totally obsessed with them, because the Great Brain is all about a family. I don't remember the family's name, but there is one central character, J.D., who is the Great Brain.
He's always thinking up ways to do what is essentially social engineering, before we had that term in computer circles: getting people to do what you want them to do, completely of their own volition. The Great Brain is this genius of a kid who uses all these social engineering exploits to get away with all kinds of crazy stuff. In that family, if they found out the Great Brain was doing this stuff, then of course he would get punished, but the ultimate penalty was what they called the silent treatment. The silent treatment meant that nobody would talk to you or acknowledge you for a certain period of time. They would give you food and stuff, but they wouldn't talk to you. It was just stunning in the book (you don't really think about this stuff as a kid; I was like 10 or so) how desperate it is for a person, as a social being, when nobody will acknowledge you. How profoundly affecting that is, right? Even the Great Brain, smart kid that he was, hated the silent treatment and would do anything he could to avoid it, because it was such a brutal penalty. I remember Jason Kottke talking about an episode of The Simpsons where these animated statues came to life, and the way they got rid of them was they started chanting "Just don't look!"
Spolsky: (laughs)
Atwood: This is led by Lisa Simpson, who says, "Just don't look at them and they'll go away!"
Spolsky: (laughs)
Atwood: It's amazing how powerful that philosophy is. If there are things happening that you don't like, just don't talk about them or give them any attention, and look at the things you actually care about and actually want to have happen.
Spolsky: That's right. That was like the Clinton administration's policy on Rwanda.
Atwood: (uncertain chuckle) Well, that's an issue of social injustice, which I think is a little bit different.
Spolsky: There are definitely people who are attention seeking, but you know what?
That's the thing about trolls, "don't feed the trolls" or whatever. You know what, trolls are doing a great service to the Internet; they're making it entertaining and interesting.
Atwood: Absolutely. There's definitely been, and I think I referred to it in the previous podcast, "youthful experimentation" at stackoverflow. In fact, we're still seeing that.
Spolsky: (chuckles)
Atwood: We're still seeing people just try stuff to push buttons and see what they can get away with.
Spolsky: One of the things that will happen, and it happened with IRC all the time, is that people get addicted to a social technology. They're addicted to the site, they love it, they come and answer all the questions, but at some point there just isn't enough of the main content that everybody is enjoying to keep them entertained. They have to find ways to entertain themselves for the next three hours, after they've spent eight hours doing the regular thing they were supposed to be doing. So on IRC they would start these little flame-bot wars, and they would write little bots to protect themselves in the wars, and they would try to cause splits so that they could take over somebody's nickname, that kind of stuff. They weren't doing what IRC was for, which was chatting; they'd taken it to the next level because they ran out of people to chat with and that became boring. So they started kind of attacking the system itself.
Atwood: Right, and we're definitely looking at that. I mentioned that CAPTCHA thing, and that's the next piece of the puzzle that has to go in. We have rate-limiting mechanisms. One of the early things that happened to us was -
Spolsky: Flooding.
Atwood: - somebody wrote a bot that would just revise posts every minute to keep them at the top of the stack. Actually, there are a certain number of people still doing that, which I'm trying to discourage.
The way I like to discourage things is, where possible, to create rules in the system that make that behavior undesirable. Not negative, necessarily, but things happen that make it not worth much to you. So let me give you a specific example from that scenario. You have the user who keeps editing their own posts every three hours so that they're always at the top of the stack. We now have this concept of the community-owned post; I talked about it in previous podcasts, but it's actually implemented now. One of the great divides in stackoverflow is that we have an ownership system where you get voted up and down: your content gets voted up and down, and that affects your reputation. You own stuff; when you post something, you own it. Then you contrast that with the Wikipedia model, where nobody seems to own anything, and we're trying to do both of those things. For the transition point between the two, we came up with a couple of rules. The initial rule I had was that edits by four different people will cause a post to switch from being owned by, for example, Joel, to being owned by the community user. At that point you don't lose any reputation you got up to then, but future upvotes on that content don't go to anybody; they go to the content. I think this is the way it should be. Ultimately you're voting on the content more than the person anyway, so hopefully people are OK with this. Seeing that people kept editing stuff over and over, I bent the rules a little and said, OK, if you edit your own post more than N times, it also becomes a community-owned post. So there is no longer any real value to the user, in terms of gaining reputation, in bumping stuff to the top of the stack, because if you edit your own post enough you won't get any more reputation from it.
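The ownership rule Atwood describes can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Stack Overflow's actual code: the `Post` class and its fields are hypothetical, only non-owner editors count toward the "four different people", the owner threshold N is taken as 5 (Atwood later refers to "5 edits by the owner"), and the reputation per upvote is an arbitrary placeholder.

```python
from dataclasses import dataclass, field

# "Four different people" is from the podcast. Treating the owner threshold N
# as 5 is an assumption based on Atwood's later mention of "5 edits by the owner".
DISTINCT_EDITOR_LIMIT = 4
OWNER_EDIT_LIMIT = 5
REP_PER_UPVOTE = 10  # hypothetical value, for illustration only


@dataclass
class Post:
    owner: str
    other_editors: set = field(default_factory=set)  # distinct non-owner editors
    owner_edits: int = 0
    community_owned: bool = False
    owner_rep: int = 0  # reputation the owner has earned from this post

    def record_edit(self, editor: str) -> None:
        """Count an edit and switch to community ownership once a limit is hit."""
        if editor == self.owner:
            self.owner_edits += 1
        else:
            self.other_editors.add(editor)  # repeat edits by one person count once
        if (len(self.other_editors) >= DISTINCT_EDITOR_LIMIT
                or self.owner_edits >= OWNER_EDIT_LIMIT):
            self.community_owned = True  # one-way switch; never reverts

    def upvote(self) -> None:
        """Upvotes earn the owner reputation only before the switch."""
        if not self.community_owned:
            self.owner_rep += REP_PER_UPVOTE
```

In this sketch the switch is one-way: reputation earned before it is kept, and later upvotes add nothing to the owner's total, which matches the "votes go to the content" behavior.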
It behooves you to edit it only once, or however many times you need to, but hopefully no more than once, and then just let it sit there and have people find it organically and naturally, the way it's supposed to happen.
Spolsky: That will happen more once we're open to Google searches. I think right now one of the problems some people have been experiencing with stackoverflow is that they ask a question, it's a little too esoteric to get a response right away, and then it disappears from view for a while. Once the site has a much larger critical mass of people and is searchable by Google, people will naturally come to those questions, so askers won't need these little tricks to get their questions in front of people again to get an answer.
Atwood: I also try to copy a lot of conventions I've seen online that have been successful. Let me give you a specific example: phpBB (and I'm sure other web discussion boards do this too, but phpBB is the one I know) has this editing convention. You can edit your own posts, and I noticed when using phpBB that right after you post something, you'll always notice some goofy mistake you made, like immediately. This happens to me nine times out of ten: I'll post and think "Oh, I should have talked about this" or "I missed that word," so I immediately go in and edit. Within a certain time threshold, these are not treated as real edits; they're treated as going back in time and pretending this is the post you originally made. It doesn't kick off a whole audit trail of you having edited it 50 times. One of the first things we did in stackoverflow was implement that. I remember talking to Geoff Dalgas about it, and he's like, "Why do we have to have this?"
I said, "You don't understand, this feature has to be in there on day 1, otherwise we're going to have so many revisions in the first minute or two after posting that are just silly little corrections." [28:20]
Spolsky: You're not actually recording the revision? You're not doing the diff thing?
Atwood: [Not] within the threshold. Right now the threshold is 5 minutes: up to 5 minutes after you post, if you edit your own stuff. Now if I go edit it, that's a real revision.
Spolsky: Now... that's OK, except that you sort of run the risk that the historical record is... if somebody posts something and it's seen as being...
Atwood: Sure, if the threshold is like 2 hours, right?
Spolsky: ...5 minutes. Let's say I ask the question "Is Jeff Atwood smart?" (laughs) And then you reply saying "Yes! Absolutely. Definitely." And then one second later I go back and change "smart" to "stupid". "Ah-hah... you said you were stupid!" And there's no track record. Not only that, but everybody can say, "There's supposed to be an audit trail here. Let's look at the audit trail... oh, look! It wasn't edited!"
Atwood: Right. No, that's true. That does leave you vulnerable. And I think that's where you get into playing with the rules and seeing what happens. Here's another example of setting the rules: you can't always anticipate all the side effects of the rules you set, and you don't always get the behavior you wanted out of them either. You have to be very, very careful. When I instituted the rule of 5 edits by the owner causing a post to go to community mode, the next day somebody complained about it. Actually, this was today. Somebody posted a thread on "What are your favorite developer magazines?" In my opinion that's not a great question, because it really can't be answered and it's very subjective. It's not a great question, is the bottom line. So it got edited a lot, immediately.
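The 5-minute grace window Atwood describes can be modeled as a small sketch. This is a hypothetical Python model, not the site's implementation: the class and method names are invented, the window is measured from the time of posting, and the condition that a grace edit only applies while nobody else has revised the post is an assumption the podcast does not state.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(minutes=5)  # "Right now the threshold is 5 minutes"


@dataclass
class Revision:
    editor: str
    body: str
    at: datetime


@dataclass
class Post:
    owner: str
    created_at: datetime
    history: list = field(default_factory=list)  # history[0] is the original post

    @classmethod
    def create(cls, owner: str, body: str, at: datetime) -> "Post":
        post = cls(owner, at)
        post.history.append(Revision(owner, body, at))
        return post

    @property
    def body(self) -> str:
        return self.history[-1].body

    def edit(self, editor: str, new_body: str, at: datetime) -> bool:
        """Apply an edit; return True if it was recorded as a new revision."""
        within_grace = (
            editor == self.owner
            and at - self.created_at <= GRACE_PERIOD
            and len(self.history) == 1  # assumption: nobody else has edited yet
        )
        if within_grace:
            # Rewrite the original in place, as if it had been posted this way.
            self.history[0].body = new_body
            return False
        self.history.append(Revision(editor, new_body, at))
        return True
```

Applied to Spolsky's example, an owner edit one second after posting ("smart" changed to "stupid") overwrites the original and leaves a single history entry, which is exactly the missing audit trail he objects to. An edit by anyone else, or by the owner after five minutes, is appended as a real revision.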
People started editing the tags. Somebody changed the title. The net effect was that within 45 seconds of this guy posting it, it was in community mode. And he was kind of complaining about that, saying, "Look, I've had no chance to get any rep from this post at all." I kind of agreed, although, again, not a great question.
Spolsky: You shouldn't get any credit for a not-great question.
Atwood: You could argue that this is totally correct behavior. I can see where he's coming from, and one thing I thought of was having a certain time window where... it's four edits by four different people, but it has to be over more than one hour. Or whatever time interval.
Spolsky: Ugh. Too much tweaking. You know what I would say? Just tell people, "Listen, if you want to ask one of these poll questions like 'What's your favorite developer magazine?', it would be a good question if you phrased it this way: 'What is your single most favorite developer magazine?'" And instead of just replying, see if someone else has already given that answer, and vote it up. Then what you get is a ranked listing of everybody's favorite developer magazines, from most favorite to least favorite. But if you ask "What are your favorite magazines?" and people answer with three each, and they're discussing some magazine they subscribed to a long time ago, and it's just chatter going on in there, the question becomes a mess and doesn't fit the Stack Overflow data model.
Atwood: It doesn't. That's right. And I think that's the challenge: getting people to understand what the Stack Overflow model is.
Spolsky: We can do polls, and we should do polls, for things like "Should I learn C? Yes/No" (laughs) There you go... everybody vote.
Atwood: One advantage, and I haven't put this in yet, is that you will be able to opt in to community mode so that you don't get any rep. Then you could post things that are effectively voting questions.
Let's say we had "What is your favorite developer magazine?" and every answer posted to it went into community mode immediately; all those votes would go toward the magazine, not you. Say I answered, "BYTE. BYTE is my favorite programmer magazine. It's a classic." Everybody who voted for that would not be voting for me; they'd be voting for BYTE. I wouldn't get any reputation from that at all.
Spolsky: I could not have programmed my Atari 800 without the great articles I got from BYTE magazine. [Actually, I did have an Atari 800, back in 1981 or 1982, and while BYTE remains my all-time favorite programming magazine (especially in the early days, before it went downhill), the magazine that helped me master the Atari 800 was Compute.]
Atwood: (laughs) There you go. I think I've said this before, but it bears repeating: I really enjoy working on Stack Overflow. It's a really fun social experiment. Because we're getting useful stuff out of it for the most part, to me that makes it worthwhile. You call that tweaking, but to me, if it's a tweak that results in hundreds of much better questions six months from now, then it's worth doing. [32:52]
Spolsky: My feeling is that if the tweak is a little bit subtle and a little bit weird, people don't quite know what it's doing. Like that thing you mentioned earlier, where if you make your own edit within about 5 minutes it doesn't go into the history. Right there, you're doing something people wouldn't expect. I mean, they might learn that that's the way it works, but it isn't what they would expect. They would expect that they either see the history or they don't. It would never occur to them that you'd do something more subtle than that, so they'd always assume the simple model, and so they may have usability problems because they don't understand what the app wants from them. You know what I mean?
Basically, usability problems always occur where the user doesn't understand how the program's model works. The program has a model of how it works, and the user has some understanding of how the program works, and when those two are different, that's when you have a usability problem. It may be small and it may be subtle, but that's where you have a usability problem. The best case, when you're setting things up, is that you say "Hey, I'm going to give you points if you do X," and everyone does X, and X is something you want, and you told them that's what you'll give them points for, and it's obvious. Then that's great. But if you're doing something non-obvious, or a little bit tricky, or you're creating a bit of a conflict between how they think it's going to work and how it really works, then in all those cases the best you can hope for is that they accidentally stumble into the behavior you want: their misunderstanding causes them to trip into the particular dark hallway you want them to go down. That's the best you can do. I think you're always better off striving to make sure people understand what's going on. A lot of the time that may mean you can't have behavior that isn't clearly visible. Wherever there's some kind of hidden behavior, like how you earn a badge, or when wiki edits don't show, there has to be extreme visibility in the app. It has to explain itself a little bit, so that people understand what it's doing.
Atwood: Right. Normally I'd totally agree with you. I think this is a bit of an exception, just because, again, it came out of phpBB and these other long-established messaging systems. We're harvesting these ideas from Wikipedia, from message boards, and from wherever I've been online and had a community that I thought really worked.
I try to steal those ideas and fold them into StackOverflow. So I think it's a proven idea that works. It's just a peculiarity of human behavior that you're always going to make mistakes immediately after doing something, so you have that little cushion. It's kind of a special case based on human behavior. And then two, I think we have a community -
Spolsky: What does it hurt to at least show the transaction history?
Atwood: Well, because it becomes noise. It's really tiny, simple edits.
Spolsky: I don't know.
Atwood: Well, again, we're not exactly doing... we're a hybrid, right? We're in between. So I think we harvest those ideas from different places. I don't think -
Spolsky: Hey, I've got a question. Heh. I've got some questions. Want to listen to some questions?
Atwood: Yes.
Spolsky: Sorry, you'll see why I'm laughing in a minute. [36:00]
[57:56]
Spolsky: I'm in the process of making another movie. Did we talk about this on the podcast yet?
Atwood: Oh, really? You sent me a copy of Aardvark'd. I have seen that.
Spolsky: That was our first movie. It was like a slightly serious reality TV show about the interns building Copilot. It was probably better than what's on MTV. But the truth is that the audience, the Joel on Software audience, really wanted something more technical. A lot of people wrote to me and said, "You know, I really wanted to learn how software is developed at Fog Creek," and that movie was just a little too light on that content. This one is going to be much more hardcore. It's really going to be about how software is developed at Fog Creek.
Atwood: Ooh, wow... Has this started?
Spolsky: It's about a one-year project. We've got the same filmmaker as last time, except I'm basically interviewing everybody at Fog Creek, you know, three or four times, to get footage. Then we'll splice it all together, and the current plan is to have several different formats of this movie.
There'll be a lot of short-form pieces, like four minutes each, that we can put up on YouTube. If you want to hear about source code control, or learn about hiring programmers, interviewing programmers, or the phone interview, there'll be these little pieces you can download freely from the Internet or just watch. Then there's a more substantial ninety-minute version, which is probably too much to distribute over the Internet, so it will probably be distributed on DVD. Maybe by that time there'll be a good way to distribute a ninety-minute, DVD-length thing over the Internet. At the high end, and what will hopefully pay for this whole operation, there'll be a corporate training video: maybe a five- to six-hour thing that a team inside a corporation might buy to learn how to do software development a little bit better, and it'll go into much more depth. There'll be a whole hour of training on how to conduct interviews for programmers, a whole hour on how to set up your tools, that kind of stuff.
Atwood: Wow. So you're like Howard Stern. You're kind of the media now. You have your movie. You've got your podcasts with me. You've got, you know, your blog, your not-a-blog.
Spolsky: But I mean, you know, that was the original idea behind this podcast. Remember how I said, "There are some people you can reach in certain channels, and they need to hear you"? I don't listen to Eminem's music, but I saw the movie 8 Mile, or Miles... whatever it was. You reach different people in different channels. Part of the goal is to do that, and the other part of the goal is that I have an inherent inability to ever give up. Instead of just saying, "You know, we tried to make a movie, and it wasn't as good as it should have been, so maybe we're not so good at making movies," we said, "You know, let's learn from the experience. Let's try to make a two point oh that's just much better."
Atwood: I didn't really hear a lot of negative stuff about it.
Spolsky: You know who liked it? Programmers' girlfriends, wives, and families. There's nothing in it that's inaccessible to somebody who knows nothing about programming, and it really does convey to the normal person a lot about what a programmer's life is like: just the feel of being a programmer, the fact that programmers get into these conversations about things. I thought one of the highlights of the movie was when we told the interns that their job was to figure out whether they could jump across to the ledge of the neighboring building if there was a fire. Could they jump out the window and make it onto the ledge of the neighboring building? Programmers love to have conversations like that, and they immediately leapt up to the whiteboard, started drawing physics equations and stuff like that, and tried to work it out. And that's really what our conversations are like, what our lives are like. If you're a geek, you probably don't think there's anything strange about this. But to a normal human being, this is very bizarre: that we talk that way, that we think about these kinds of things, and that we then need to solve the problem mathematically to decide whether you could make it to the neighboring ledge before you died.
Atwood: Right... Well, that's cool. I'm looking forward to that coming out, although it looks like it'll be a while. I definitely recommend that people who haven't seen Aardvark'd check it out. I enjoyed it. I didn't think it lacked anything.
Spolsky: The way to find it is ProjectAardvark.com, where there's a link to the movie. ProjectAardvark.com is a blog the interns kept that summer, a couple of years ago. [1:02:14]
[1:02:50]
Atwood: Two things. First, we have a wiki for people who can't listen to this.
It's where people can contribute transcriptions of our incredibly boring podcasts, and we thank you very much for that. Although I do have one request for the transcriptionists, and the ironic thing is that you're going to transcribe this, which I think is hilarious. When you transcribe, don't write down every time I say "uh" or pause or say "yeah". Make me sound awesome; that's my one request for the transcriptionists.
Spolsky: It doesn't have to be word for word. It doesn't necessarily read as well when it's word for word; you can leave out the "ums" and "uhs".
Atwood: In fact, leave out whole... If you think it reads better a certain way, just make me say whatever makes the transcriptionists sound the most awesome.
Spolsky: And it's a wiki; go ahead and edit it.
Atwood: People edit it anyway, you're right; it's hilarious. I've been reading the [revisions], and it's very funny.
Atwood: The other thing is, if you do contribute to the wiki: since our beta has been pushed back a week, that will get you into the StackOverflow beta the same day. If you want in, just email me after you've done a little bit of transcription, one minute or whatever you're comfortable with. If you want to get your question answered on the air, send a recording of less than 90 seconds to podcast@stackoverflow.com; we will put it in the queue and hopefully answer it on the next podcast.
Spolsky: All right, that's it. Thank you very much, see you next week!
Atwood: See you next week. [1:04:04]
[Outro]