00:00 – 01:12 Opening credits.
Atwood: We had a request to do sort of a retrospective, I guess, post-mortem, as you would call it.
Spolsky: Great, alright, let's do it, let's do a retrospective episode of the StackOverflow podcast.
Atwood: Well I don't want to make the whole episode about that but I think it's worth discussing …
Spolsky: Let's make one of those episodes like they have on TV sitcoms, where it's the Christmas episode and they haven't actually filmed anything? So all the main characters are sitting around the living room, or around the Christmas tree, drinking eggnog and saying “Do you remember that time when...”, and then the screen gets all...
-Joel makes a noise here that can only be described as “wibbly wobbly flibbly”-
And then they cut away and show you a full 5 minutes from that other episode?
Atwood: Oh wow!
Spolsky: And then they cut back to the people sitting around and you realise, “Wait a minute, they didn't film an episode this week, did they?” -laughs- You know what I'm talking about?
Atwood: I do, I totally do, but I don't think many shows do that; I think it's more of an '80s, early '80s sitcom kind of experience.
Spolsky: They used to do that all the time, yeah. Blossom! A very special Blossom! Tonight on ABC.
Atwood: So Jon Skeet said he sent in a recording, but we can't find it. We looked and, I don't know, it must have gotten eaten by the e-mail monsters, for which I totally blame FogBugz, by the way, since all this comes through FogBugz. So that's great! I get to blame FogBugz for this failure!
Spolsky: -in a mock-offended tone- THAT'S IT, YOU'RE CUT OFF!
Atwood: Aah who knows, some kind of e-mail...
Spolsky: BOOP. BOOP. -trying to bleep Jeff- Okay, ladies and gentlemen, I'd like to introduce the new lead developer for StackOverflow, Mr Michael Pryor!
Atwood: -laughs- Michael Pryor... Uh, so, I'm going to just read what the question was.
Spolsky: The question never came into FogBugz, it didn't!
Atwood: Yes, we never saw it for some reason. So it was recorded, we'd like to play it, but we cannot. So in lieu of that, I will read it – and the question is:
“Lessons learned in a year?
Has anything happened exactly as expected?
Do users behave better, worse, weirder, than expected?
Any technical lessons learned?
What would you have done differently?
And what do you expect for the next year?”
So these are sort of standard retrospective-type questions, I think.
Spolsky: This is, this is almost so standard I don't know how to answer it.
I'm almost at a loss!
Atwood: Yeah, and you know me, I'm not a big “What's the road going to be like in a year?” kind of guy. I'm more like, “What is next month gonna be like?”, and I have a hazy picture of what the month after that should look like. I'm not really thinking about a year from now, so I am not the best person to ask these questions. I would say the one take-away... because people do ask me. I had a meeting the other day with... You know, Joel, and I was explaining to him all the stuff, and at some point you feel like you're explaining things over and over. You have this story that you're telling, and it's the same story, and you tell it the same way to everybody you meet, so you kind of wonder if you're being boring on some level.
Spolsky: Or if you just forgot to write that blog post.
Atwood: Yeah, and that's a good point too I mean you could certainly – not that Jon Skeet hasn't, I'm sure he's seen all our blog entries – but if somebody is really interested in the history of StackOverflow we do try to document all of the significant stuff that happens in the podcast, in the blog, and to a lesser extent on the site itself, although now we have Meta, that can become more formalised.
Spolsky: Somebody who is thinking of making a Stack Exchange site emailed me to say “Do you guys have any lessons learned that you want to share about building these community sites?”.
Atwood: It's that people never read, that's what I've learnt: people don't read, like...
Spolsky: Tell you what! Sit down and listen to about 52 hours of podcasts that we've done, no, 64 hours of podcasts that we've done.
Atwood: Well you know, maybe that does bring up a good issue, that is, how do you distil the FAQ? It's questions that keep getting asked over and over, so, I guess one of the development models of StackOverflow is that I try to optimise the system in such a way that I don't get tons of support e-mail, I don't get tons of questions about “Why does it work in this way?”, “How does it work this way?”, if you get tons of questions like that it means to me that you're failing, you've built something that people basically don't understand, don't get.
Spolsky: Right, but they care enough at least to ask?
Atwood: Yeah they care to ask!
Spolsky: You could also not be getting questions because they don't even care!
Atwood: Yes, nobody cares is always a possible outcome. So yeah, believe me I'm glad to get the questions.
Spolsky: -laughing- You get those little websites that nobody has ever been to, it's like a blog, and it's got 3 entries with an FAQ 45 pages long -laughs- “How often do you post here?” “Well, I haven't decided yet, I think maybe I'll post....” anyway, you know what I mean?
Atwood: Yes, that's another reason I try to avoid introspection, because introspection doesn't really matter until you've done something that's worth...
Spolsky: Alright I'm cutting off this question then.
Atwood: What? No no no no! Wait, I think it's good to have a summary, I mean it has been a year, it's good to have a little moment and have our little thought process about what we did right and what we did wrong.
Spolsky: Let's get drunk on the show.
Atwood: I'm so wasted right now!
-Joel makes some unbelievably high pitched noise of some kind, from which the only comprehensible word is “StackOverflow”-
Spolsky: Oh my god I just snorted...
Atwood: Ah... Yeah. So, retrospectives can be tough, but one thing I will say, in terms of telling the StackOverflow story to people I meet, which is where I was going: I do identify 2 things that we kind of got wrong early on. The first one, if I had to summarise it, is about capping things that happen. In other words, making sure that anything that happens in your system has a cap on the number of times it can occur, because when it comes to the web, anything that is unbounded, somebody is going to exploit. You cannot have unbounded behaviours in your system.
Spolsky: That's sort of the difference between a small site and a large site, small sites can get away with it because nobody cares, well, I don't want to say nobody cares but I think a site has to get at least StackOverflow size, you know it has to be getting like 1 million page views per something before you start to notice that being a problem.
Atwood: Yeah. Well I think even if you have 5 users one of your users is going to be a joker.
Spolsky: Nah, 1 out of 5 users is not going to be a joker it's much less than that.
Atwood: Well, let me give you a specific example. This happens all the time today, to the point where we're actually considering writing code to fix this, or at least to automatically penalise people who do this, because it's so annoying to us. We have view counters on our questions and on our user pages, and, you know, programmers being programmers, they see there's a way to increment this view?
Atwood: And they look at the source and think, “Oh, I see how you're incrementing this view, I'm going to increment your view one billion times to show you how awesome I am!”, and they don't really consider that they're not the first person to have this genius idea of incrementing the view, which during the Beta did happen; we hadn't had the chance to write the code to put a bound on that. It's very simple: every time you refresh the page you just increment it, blam, it's like 2 lines of code. So that was just ready to be exploited, but that was like a year ago.
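The bounded version of that view counter can be sketched in Python as "count at most one view per visitor per question within a time window". This is a minimal illustration, not StackOverflow's actual code: the `ViewCounter` class, the in-memory dictionaries, keying on IP address, and the 15-minute window are all assumptions.

```python
from __future__ import annotations

import time


class ViewCounter:
    """Counts at most one view per (question, visitor IP) per time window.

    Hypothetical sketch: in-memory dicts stand in for a real data store,
    and the default 15-minute window is an assumed value.
    """

    def __init__(self, window_seconds: float = 15 * 60) -> None:
        self.window_seconds = window_seconds
        self.views: dict[int, int] = {}  # question_id -> counted views
        # (question_id, ip) -> time of the last view that was counted
        self.last_counted: dict[tuple[int, str], float] = {}

    def record_view(self, question_id: int, ip: str, now: float | None = None) -> bool:
        """Increment the view count unless this visitor was counted recently.

        Returns True if the view was counted, False if it was ignored.
        """
        now = time.time() if now is None else now
        key = (question_id, ip)
        last = self.last_counted.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # a refresh inside the window is a no-op
        self.last_counted[key] = now
        self.views[question_id] = self.views.get(question_id, 0) + 1
        return True


counter = ViewCounter()
for _ in range(1_000):  # a scaled-down "increment it a billion times" attempt
    counter.record_view(42, "203.0.113.7", now=0.0)
print(counter.views[42])  # 1
```

Note that `last_counted` itself grows without limit; by the episode's own rule, a real version would also need to expire old entries, for example by periodically dropping keys older than the window.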
Atwood: I mean, first of all, how stupid do you think we are? I mean really. Do you think you're that clever...
Spolsky: Well that's what they're asking, they're like, “I'm wondering if...”
Atwood: Argh, but then they just go and do it. I have a daily report that shows me all the sort of weird access patterns to the site, and they stick out like a sore thumb on the report, like, “Wow, somebody retrieved this one URL 20,000 times!” -laughs- And it doesn't do anything, because we've long since made retrieving it a no-op, but it's still annoying that people a) try it, b) suck up some meagre portion of our bandwidth and resources by doing this stupid thing, and c) it's kind of offensive that they think this would work, that we're actually dumb enough to have not figured this out.
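A report like the one Jeff describes boils down to counting requests per visitor and URL and flagging the outliers. A minimal Python sketch, assuming a hypothetical log format of one `<ip> <url>` pair per line and an arbitrary threshold of 1,000 hits; the real report's format and cut-offs aren't described in the episode.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def suspicious_hits(log_lines: Iterable[str], threshold: int = 1000) -> list[tuple[str, str, int]]:
    """Return (ip, url, count) for every pair requested at least `threshold` times.

    Assumes each log line starts with "<ip> <url>"; shorter lines are skipped.
    Results are ordered from most to fewest hits.
    """
    counts: Counter[tuple[str, str]] = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed lines rather than crash the report
        counts[(parts[0], parts[1])] += 1
    return [(ip, url, n) for (ip, url), n in counts.most_common() if n >= threshold]


sample = ["198.51.100.9 /questions/123"] * 20_000 + ["192.0.2.1 /questions/123"] * 3
print(suspicious_hits(sample))  # [('198.51.100.9', '/questions/123', 20000)]
```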
Spolsky: Yeah, that's how you feel whenever anybody tries to rip you off, you're like “Ah! That's just offensive that you're trying to rip me off in this obvious and banal way”.
Atwood: Right, well, because there are lots of clever hacks, and I have talked about many exploits that people have identified, brilliant exploits, or just stupidity on our part, and I will own up and say “Yeah, that was dumb, we shouldn't have done that”, but this isn't in that category. This is not clever at all.
Atwood: So yeah, stuff like this does come up, and this is that unbounded behaviour I was talking about. You have to bound all the, sort of, scores and numbers in your system; all the things users can do have to be bounded. How many questions can be asked, how many answers can be voted on...
Spolsky: Yeah, to go philosophical, I don't think I would go so far as to say that was a mistake. I think it was something that we didn't know we would have to do, but I still wouldn't have done that in advance of building the site.
Atwood: Well, I look at it this way: if somebody is going to design a system like StackOverflow and asks “How do I design this?”, I would say look, you've got to bound everything. From day one. Just put in the bounds, because we didn't, and, you know, I kind of realised we'd have to do some of it, but I didn't realise how pervasive those bounds would be. In every aspect of what we do, there are bounds in the system that you have to have.
Spolsky: So the system is counting basically.
Atwood: Yeah, you're just making sure that nothing happens too much. Because if anything happens too much, it's just bad. It leads to really, really bad things happening, from the reputation system to the scoring perspective to the hardware perspective; it's pervasive throughout the system.
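"Bound everything" can be made concrete as a per-user, per-action quota that resets daily and refuses any action that has no cap defined. A minimal Python sketch; the `DailyCaps` class, the action names, and the limits in the example are hypothetical, not StackOverflow's real values.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


class DailyCaps:
    """Enforce a hard per-user, per-action limit that resets each calendar day."""

    def __init__(self, limits: dict[str, int]) -> None:
        self.limits = limits
        # (user_id, action, day) -> number of times the action was allowed
        self.used: defaultdict[tuple[int, str, date], int] = defaultdict(int)

    def try_act(self, user_id: int, action: str, today: date) -> bool:
        """Record the action and return True if the user is still under the cap."""
        limit = self.limits.get(action)
        if limit is None:
            # Fail closed: an action with no configured cap is refused, so
            # nothing in the system can be unbounded by accident.
            return False
        key = (user_id, action, today)
        if self.used[key] >= limit:
            return False
        self.used[key] += 1
        return True


caps = DailyCaps({"ask_question": 2, "vote": 3})  # hypothetical limits
day = date(2009, 9, 1)
print([caps.try_act(7, "vote", day) for _ in range(5)])  # [True, True, True, False, False]
```

Keying the counters by day means old entries accumulate as well, so a production version would purge past days. The fail-closed default is a design choice that mirrors the "bound everything from day one" advice.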
Spolsky: Have we had to shut off any countries yet?
Atwood: No, only IPs.
Spolsky: We haven't just blocked an entire nation?
Atwood: No, hopefully it won't come to that.
Spolsky: A small nation state?
Atwood: It won't come to that. So that's the one piece of advice, just really think about bounding your system in as many ways and as many places as possible, even really really early on. Then the second one is kind of obvious too, there's the whole desire for this meta discussion that I just wanted to whoosh away and pretend like, wave my hand, you don't need to talk about this stuff!
Atwood: But there's a real human need for people to really want to get into the system and really enjoy it. There's a very, very strong human urge to talk about this stuff, and repressing it is unnatural and will lead to strange things happening in the system that you don't want, and that are arguably worse than having a meta-discussion site. So I agree with the criticism that I should have had this meta outlet from...
Spolsky: Well, we had UserVoice for all that time, but it was...
Atwood: Yeah, but that was kind of like sweeping it under the rug to some degree, because UserVoice was not a good discussion-y system.
Atwood: I mean, it was good at certain things, but discussion was not one of them, and I think that is a big part of what people wanted out of it. They want to be able to say, okay, “You want to be able to do this, here's a feature request, here's a bug”, and UserVoice worked serviceably at that, I think, but when it comes down to “Let's talk about this”, our system, which was totally not designed for discussion, is still way better for discussion than UserVoice, oddly enough.
Atwood: And plus, I think people like our system; that's the whole reason they're there on the site. They like it, and having the meta site be one of our sites is totally logical, much more logical than UserVoice, etc. etc.
Atwood: So really, I guess those are the two high level things that I would point to, like the big big decisions about strategic things that I wish we had started on earlier.
Atwood: And so that's how you kind of herd people through the system without having a lot of rules and FAQs and stuff like that: just make the good stuff fun and entertaining and rewarding! And then the bad stuff just kind of falls by the wayside, except for that small percentage of users who are …
Spolsky: I think the way we've described this since the beginning is there's always gonna be somebody who's gonna get sick of playing the game you've set out for them, they've gotten bored of the chess game and they've invented the new “Throw the pieces on the floor” game, because they're bored with the chess game. -laughs-
Spolsky: And we always knew that that was what always happens in any kind of online community, because there's only a finite amount of time you can spend doing the officially sanctioned things, either before you run out of them, when there are just no questions left to answer that you know the answer to, or because you just get bored doing what the site wants you to do.
Atwood: That's right, and then there's also the teachers' lounge aspect of the meta site. You're giving it to people who say “Oh, I love being at school, I love teaching people”, because eventually, I think, if you use StackOverflow enough you become kind of a teacher; you're teaching people things. Which I think is very rewarding. I mean, on some level when I'm writing the blog it's sort of... I mean, it's a 2-way street. I'm not saying “I'm the teacher, you're the student”, but we're both teachers and students at the same time.
Atwood: And it's a great, powerful aspect of the system, but at the same time it's like “Hey, I love this so much, I'm just never going to leave school”, right?
Spolsky: Yeah, speaking of your blog, yeah, and these people need to get a life! No wait sorry! -makes a record playing backwards noise-
Atwood: No! That's not... I don't think that's true I think these people really like …
Spolsky: These people need to congratulate themselves!
Atwood: -laughs- No, no, no, teachers will learn from other teachers. On some level the teachers are learning how to teach more effectively and the students are learning how to learn more effectively. It's very much analogous to having after-school activities at school, or a teachers' lounge; I found that's a great way to explain it. And another positive side to having Meta is you're giving people a teachers' lounge.
Spolsky: I bet you get absolutely no traffic whatsoever, nobody goes to meta.
Atwood: -laughs- It does okay. Initially I thought it was going to do really well traffic-wise, but it does 1/10 of what ServerFault does. Somebody was actually asking me that: it's 1/10 of what ServerFault does.
Spolsky: Yeah, a little bit less, or, to put it another way, it does 0.5% of what StackOverflow does -laughs-
Atwood: Yes, no, StackOverflow does 100x what Meta does and 10x what ServerFault does.
Spolsky: 200 times.
Atwood: Yeah, that's a lot.
Spolsky: Yeah. That's about right, 0.5% of your traffic is meta.
Spolsky: Speaking of your blog, did you find any COBOL programmers? I also don't believe any of these people that say COBOL is like, everywhere.
Atwood: Well, yeah, you've probably met...
Spolsky: I think they're probably just reading some old article.
Atwood: I literally have never met a COBOL programmer. Have you?
Spolsky: No. I don't think I have.
Atwood: That's just, that was the shocking thing to me. Okay, not that I've been all around the world, far from it, but...
Spolsky: It's a myth. COBOL myth.
Atwood: Yeah! Because they say there's tons of code out there, there has to be..
Spolsky: How many legit COBOL questions do we have on StackOverflow?
Atwood: I actually looked at that, and there are actually some good COBOL questions on StackOverflow.
Spolsky: There's a couple.
Atwood: There's like 80.
Atwood: 62, there's 62.
Spolsky: Yeah and half of them are like, “Does anybody actually use COBOL?”
Spolsky: So those, half of those don't really count. “Should I learn COBOL?” “How can we make COBOL programmers good programmers?”
Atwood: Oh that's great, we have to feature that one day, the question of the week, “Should I learn COBOL?”
That's just shocking, why would you want to learn COBOL? It's crazy.
Spolsky: Yeah, unnecessary. Now let's take, just hypothetically, other things that people never use: what other tags have 62? I'm gonna have to go to page 202 or something here on the tags page.
Atwood: Yeah you're gonna have to get way deep in the pages.
Spolsky: Let's see what else is around there, um, oh, it's not so deep. 50... I'm doing a quick binary search here... page 25, page 20, OK, here we are, it looks like it's on page 22.
Spolsky: Aaaand what else has the same..
Atwood: Smalltalk! -laughs- Smalltalk has 69!
Spolsky: -laughs- That's the only thing that you can even recognise. Oh, Delphi 2007, huh! I guess that's because a lot of people use Delphi 2009.
Atwood: Yeah. Hey, FogBugz has 66, what is that?
Spolsky: There you go! FogBugz is at least as popular as COBOL. And those FogBugz questions are legit. So I just don't believe it. You know what I think is happening? In 1967 somebody wrote an article in Scientific American saying the most popular business programming language is COBOL! And journalists have been copying and quoting that information ever since, but I just don't believe it. I don't buy that there's a billion lines of code, and, you know what? It takes 4,000 lines just to add 2 numbers together in COBOL, so I wouldn't be surprised -laughs-
Atwood: Oh, it was shocking. When I wrote that blog entry I thought I should probably put some COBOL code in there so people know what I'm talking about, and you can do COBOL .NET. I'd known that from the earliest days; I remember they listed all the languages you could use in .NET, and of course nobody does, because everybody just uses C#.
Atwood: But umm, they had Fortran, and I remember seeing COBOL. Wow, COBOL! COBOL running on the Common Language Runtime, that's hilarious. And then if you trace through it, this 3-line thing in C# becomes, what, 12 lines in COBOL of really dense, upper-case text. It was just appalling, and you can kind of see why no sane programmer would seek out COBOL.
Spolsky: There's a question, there's a totally legit question here about COBOL, in which somebody wants to do something that's like, you know, a word in SQL syntax -laughs-. It's not even a statement in SQL it's like, he just wanted to check for duplicate records. It's such a trivial thing to do in SQL and in COBOL you're like, “Oh yes, this is all organised to be able to do that, first you create a file and then you create another file and then you sort through the sorts of files and then you go through them one at a time and you”... ugh.
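For comparison, the duplicate check Joel mentions is a single declarative statement in SQL. A minimal sketch using Python's built-in `sqlite3` module; the `customers` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    [("Ann", "ann@example.com"), ("Bob", "bob@example.com"), ("Ann B.", "ann@example.com")],
)

# Group on the column that should be unique and keep only groups that occur
# more than once; no temporary files, sorting passes, or record loops needed.
duplicates = conn.execute(
    "SELECT email, COUNT(*) AS n FROM customers GROUP BY email HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)  # [('ann@example.com', 2)]
```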
Atwood: Well, did you notice there are ways to do math that are literal? Like, “Add years to H”?
Spolsky: Yes, that was the original thing about COBOL. At first, I think, there was this design decision that it should be an English-language type thing, and this would allow the business analysts to write the code somehow, because it would be kind of like English. That turned out to be harder than they thought in 1956, and it's still hard, and the last programming language to repeat that mistake is AppleScript.
Spolsky: But anyway they then came up with this story that, if at least the syntax was English-like, and the programmers had to write it using this obscure subset of English, at least the managers could understand what it was they were doing. So COBOL supposedly had the benefit that a manager could understand “Add 2 to increment accumulator”, or whatever.
Atwood: Right, and that's been debunked so thoroughly now.
Spolsky: Well the managers will never understand what's going on, but there are all these things like the classic COBOL statements like, “Multiply price by sales tax giving you the total, and put this away somewhere” -laughs-.
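For readers who have never seen it, the English-like arithmetic being described looks roughly like the COBOL statements in the comments below, each followed by a Python equivalent. The data names are made up, the COBOL lines are isolated fragments rather than a complete program, and the sketch ignores COBOL's `PIC` field sizes and truncation rules. `Decimal` is used because COBOL business data is typically fixed-point decimal rather than binary floating point.

```python
from decimal import Decimal

# Hypothetical data items; in COBOL these would be declared in the DATA DIVISION.
price = Decimal("19.99")
tax_rate = Decimal("0.08")
accumulator = Decimal("0")

# ADD 2 TO ACCUMULATOR.
accumulator += 2

# MULTIPLY PRICE BY TAX-RATE GIVING TAX-AMOUNT.
tax_amount = price * tax_rate

# COMPUTE TOTAL = PRICE + TAX-AMOUNT.
total = price + tax_amount

print(accumulator, tax_amount, total)  # 2 1.5992 21.5892
```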
Atwood: Well the COBOL tag is good for hilarity, I mean every other question is just hilarious -laughs-, because COBOL is hilarious, you can't really take it seriously it's just, aah -laughs more-, so yeah, if you're ever bored and want to spend 30 minutes amusing yourself, then definitely browse the COBOL tag on StackOverflow, and marvel at the wonder that is COBOL. And where are those 220 billion lines of code or whatever it is the analysts keep quoting over and over...
Spolsky: The thing that I don't really get is the... oh, you know what happened: the company that made COBOL bought Borland. Micro Focus was the company that made the COBOL compiler.
Spolsky: And they finally bought Borland. That's a shame. So Turbo Pascal has been purchased.
Atwood: By the company that makes COBOL -laughs- that's a sad end.
Spolsky: That's a very, very sad ending for the Borland legacy.
Atwood: It was probably one of the saddest possible endings.
Spolsky: Well, you know what it was: Micro Focus was this company that grew from being very much a legacy provider, like, “Okay, it's not the coolest thing, but you're stuck in COBOL land, so we'll take care of you.” They had the only COBOL compiler for PCs, which was practically impossible to get to run, I believe, and they've just been dragging it kicking and screaming along, adding Windows programming and .NET programming and all that kind of stuff when it doesn't make any sense.
Spolsky: What was I going to say..
Atwood: Well, it's a product, it must have a market?
Spolsky: Yeah yeah yeah! It must have a big market.
Atwood: I mean there must be money in this somewhere.
Spolsky: Yeah, you've got a system that was built in 1964, and you don't wanna redo it, because god, was that hard in 1964, and you've tried to build it several times over the centuries. A good example would be the flight control systems run by the FAA in the USA to track aircraft in the sky. It's running on mainframes and written in COBOL or, you know, IBM 360 assembler or something like that, and it's just tons and tons and tons of code, and it's just a big hairy mess. You don't want to mess with it, and you rely on it. If you look closely you'll discover that it was built in 1964. A big computer might have had 256k of main memory. So how much code could there be in 256k, right? How long could it possibly take to rewrite that using a modern language with all the advances we've made since the '60s? If you could start fresh and figure out exactly what it's solving, it could not be that complicated, because it's gotta fit into 256k. With the data.
Atwood: And even then the biggest hard drive was, what, like a gig? That was enormous... that would have been like millions and millions...
Spolsky: No I think we were talking about 5 MEG, like those big gigantic hard-drives they had in one room would have 5 megs of storage in them.
Atwood: Oh right.
Spolsky: They had to be like the size of a dishwasher.
Atwood: You saw those at the Computer History Museum, right? They were really cool.
Atwood: Like the giant hard-drives they were like, Look! One MB!
Spolsky: Yeah, I've used them. There was one that controlled Datamane(?)(24:30), a very famous hard drive, and in order to keep it absolutely spotlessly clean (the dust would get in there and cause the heads to crash) they built in a vacuum cleaner, so there was a little vacuum suction nozzle right next to the platters of the hard drive. And everybody said it was the only thing they ever made in their life that didn't suck.
Atwood: -laughs- Nice!
Spolsky: The vacuum cleaner? Get it? Didn't suck!
Atwood: I totally got it. That was a good one.
Spolsky: Not a true story!
Atwood: So yeah, COBOL. Interesting topic. I mean, there are a lot of old languages that are still... I mean, Lisp, right? It's still revered. And, like, ALGOL: I remember going to the Computer History Museum and looking at the ALGOL display and thinking, “Wow, that looks like something I could have written today!”
Atwood: So there are languages that have stood the test of time, and then there's COBOL -laughs- right?
Spolsky: I wouldn't say, it's not fair to say that Lisp has stood the test of time.
Atwood: I mean, there's a lot of people that still really respect the Lisp syntax and the power of Lisp and..
Atwood: I mean certainly it's stood the test of time more than COBOL, right?
Atwood: You have to agree with that?
Spolsky: No, I think COBOL is running at a lot more companies than Lisp is.
Atwood: It's a good point.
Spolsky: I think Lisp is pretty much just running ITA Software and that's it.
Atwood: Well, that's a good point. I mean, there's a big disconnect between what's readable and what's clean, and what actually got used.
Atwood: And that's why, if you ask me what I think the current COBOL is, the COBOL of today? What will be COBOL tomorrow? I used to think it was Java, but I don't think that any more. I think Java is like the new C, basically; it's just standard and pervasive and kind of everywhere. It's not bad, it just is what it is. But the new COBOL, honestly? PHP. PHP is the new COBOL.
Atwood: There are going to be billions of lines of code.
Spolsky: I buy that yeah.
Atwood: Produced in PHP. It's got kind of a weird syntax that not everybody likes, including me, and that some people think has got to be insane on some level, but then it doesn't matter, because so many people are using it, so many people are creating stuff with it, and you can't really argue with success, right? I mean, if all these companies are using PHP and being successful and building cool stuff, who am I to really judge?
Spolsky: There's a certain class of languages, though, and COBOL does not have this feeling, that tend to attract the kind of people who, even if there's nothing wrong with the language, are just looking to get something done, and those tend not to be the best programmers. I mean, they might be good at other stuff, but they're forced to do some programming? That was the problem with Visual Basic all these years.
Atwood: Yep, that was the classic Visual Basic problem.
Spolsky: And Java and PHP are all in that class, where they've attracted programmers who are not professional programmers and don't really care about how clean their code is, they just want to get something done, and that's why they've chosen this language, because everybody says it's a really easy way to get something done. Perl basically killed itself through this approach.
Spolsky: And so the code that exists in those languages is of a much much lower quality than you would expect.
Spolsky: I don't think that's true of COBOL. In the case of COBOL nobody starts a program... you know, think about what it was to be a programmer in the '60s. You probably went on some very special 3-month training course or something at the IBM offices in your city, at the big old IBM tower that every city had, and you took some extensive course with all kinds of training and handbooks and things like that, and were basically put through the basics, and you learned how to do things in very specific ways, and you did them, and they worked. Maybe the reason we never hear from any of these COBOL people is that, first of all, they're probably all older, and secondly, they're probably all at the point in their career where they're not going to change how they've done stuff, and so, for example, they're not used to having the internet available as a tool for learning things. They're not of the generation who logs on to the net and reads things that are written on the internet. I'm going to get all kinds of nasty e-mail now.
Atwood: -laughs- No, I think it's a valid point, and one worth considering. I think one of the weaknesses of the current programming community is that there's all these young programmers – and I used to be one too so I empathise, and I think I've talked about this on previous podcasts – where you feel like, all that stuff the old guys did is irrelevant, it's all about what I'm doing now. You know, I am the vibrant new life of this industry. And you know it's true...
Spolsky: Yeah, it's fair because you know what, if you had to do COBOL? You would just quit. You just wouldn't do it.
Atwood: Yeah, and there's a lot of... we've talked about this before on the podcast, so we won't belabour it, but there are a lot of presuppositions that older generations have made that are no longer true, so there's a lot of truth to the young, up-and-coming, young-gun programmer. They really do drive a lot of the industry. But the downside of that is that they tend to forget there are actually lessons in this old stuff that transcend time. The people stuff, essentially: why things work, why things don't work. It often boils down to, you know, the human-factor stuff, like why did they design it a certain way? Does it match the way people work? Those lessons are timeless, and I think they're throwing the baby out with the bathwater by not looking at this old-time stuff and trying to suss out the why, you know, the history of it, and analyse why it works. It's like learning ancient Roman history: I mean, why is that really valuable to anyone?
Atwood: It's because of people! People haven't changed that much in 3,000 or however many years it's been, we're still doing the same basic stupid human things just, with atomic powered devices now, so it's worth considering that these aspects of computer history are still valid, even if COBOL as a language is kind of crazy, there's still lessons there to be learnt from it.
Spolsky: Yeah, not really.
Atwood: No? -laughs-
Spolsky: Well, you're the one that doesn't wanna learn C!
Spolsky: And that's even of our generation, so to speak.
Atwood: Yeah, I think the whole pointer thing, the whole memory allocation...
Spolsky: That's what COBOL was full of! It was full of unnecessary words.
Spolsky: That were there to... Oh my god, I mean, without knowing much about COBOL: COBOL was built to do business applications, which means a lot of databases, actually, which means accessing databases, running a payroll, that kind of stuff, and it's the stuff you might do much of with SQL today, which is the next generation. Except you had to do everything manually. You couldn't just say “Get me all the full-time employees and their salaries so I can pay them”; you had to say “Open this file, the record looks like this, now read the record, now evaluate if the salary is that”, and it was just ridiculously verbose, for a reason that there is just no reason for any more, because that particular problem was solved in the '60s, you know, as opposed to the '50s.
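The contrast Joel draws between record-at-a-time processing and simply asking the database for a result can be sketched side by side. A minimal Python illustration using the standard `sqlite3` module; the `employees` table, the `full-time` status value, and the salaries are hypothetical, and even the manual loop here is far shorter than its COBOL counterpart would be.

```python
import sqlite3

# Hypothetical employee records: (name, status, monthly salary).
rows = [
    ("Alice", "full-time", 5000),
    ("Bob", "part-time", 2000),
    ("Carol", "full-time", 6000),
]

# Record-at-a-time, in the spirit of the manual style described above:
# read each record, test it, and collect the ones you want yourself.
manual = []
for name, status, salary in rows:
    if status == "full-time":
        manual.append((name, salary))

# Declarative: state what you want and let the database work out how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, status TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
declarative = conn.execute(
    "SELECT name, salary FROM employees WHERE status = 'full-time' ORDER BY name"
).fetchall()

print(manual == declarative)  # True
```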
Spolsky: You know, there's an enormous number of... I feel like not only are people not gonna know, I mean, you're right, not only are programmers going to be able to go through their entire careers not knowing what a pointer is, but there's stuff we have to do kind of manually now that people are not going to believe, because there's going to be the next generation of even faster stuff. So right now, one thing people obsess about in the .NET world, and they have to because even Microsoft hasn't figured this out, is “How do you get data out of your database?” The two current contenders, I believe, are the new Entity Framework, which is supposed to be the new hotness but doesn't do all kinds of things, and then there's the old LINQ to SQL hotness, which never really got finished. Neither of them is complete or can do basic stuff, so you're always forced to decide between these two different ways of doing things without good information. Now all the .NET people are getting angry and picking up their pens to write me a letter. Well, am I wrong here? That there's...
Atwood: Well, the thing you picked is kind of like the Vietnam, though, because the whole object-relational mapping problem is just so hard. I don't think there is really an answer to that.
Spolsky: Well, there might be. What's happening right now is that there's massive obsession, because when you use one of these tools it sometimes generates bad SQL. By which I mean, if you are not writing your own SQL statements in your source code, there is a very major risk that you generate a SQL statement that pulls back columns you don't need, or pulls back rows you don't need and therefore wastes time, or does something inefficiently, without using an index you've carefully created, or...
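One way to check whether a query actually uses the index you created is to ask the database for its plan. This is a sketch using Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN; the table, the index name, and the “generated” query are hypothetical stand-ins for what an ORM might emit. Wrapping the indexed column in a function, here lower(), is one common way a query ends up scanning the whole table instead of searching the index.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the readable description is the last column


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.execute("CREATE INDEX idx_employees_dept ON employees (dept)")

# Hand-tuned: only the needed columns, filtering directly on the indexed column.
tuned = plan(conn, "SELECT name, salary FROM employees WHERE dept = ?", ("sales",))

# Hypothetical generated SQL: every column, and the indexed column wrapped in lower().
generated = plan(conn, "SELECT * FROM employees WHERE lower(dept) = ?", ("sales",))

print(tuned)
print(generated)
```

On recent SQLite versions the first plan reports a SEARCH using idx_employees_dept and the second a full SCAN of the table; the exact wording of the detail strings varies between SQLite versions.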
Atwood: The way I like to think about this is, it's the one time where assembly language, and by assembly language I mean hand-tuning SQL statements, matters, because these are huge performance issues you're talking about. This is not “I shortened a loop and saved 1 ms over 100k iterations”, which, by the way, it's amazing how easy it is to... I do it all the time, I fall into that trap of “I'm going to optimise this!”
Spolsky: Right right right right.
Atwood: And it's some code path, and I profile it, and literally I've saved like 10 ms in the entire day.
Spolsky: So it's obvious that 10 years from now... I hope we're not still doing this podcast...
Spolsky: Hopefully I will have gotten mini-Joel to take over by then -laughing-, or maybe one of these interns here can take over for me. But 10 years from now, the kids coming up are not gonna know SQL, and they're not gonna optimise anything, and there's gonna be all kinds of stuff under the covers, and they're gonna say “You old people used to have to worry about whether you... Now we just bring the whole table in, don't care how big it is; there is no table on earth big enough that you cannot fit it in the L1 cache of a modern CPU.”
Atwood: That's a great point, because I actually wrote on Twitter, oops, I used the “T” word, sorry, that we were upgrading memory, and I was just marvelling at how cheap memory is: 48 GB, 64 GB. It'll come to a point where you stop worrying, like “You know what? All that disk stuff? It's a waste of time!” Just have everything in memory 24/7!
Spolsky: Yeah, sure.
Atwood: And have huuuuge amounts of stuff in memory, and sure, what you're describing could actually happen, although I...
Spolsky: This is what I don't understand about these kids: when they're concatenating 3 strings together, they're scanning each string 3 times! Oh my god!!!
Spolsky: “Look at how you did that!” and they're like “Shut up old man!!! It doesn't matter!”.
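A rough illustration of why naive concatenation adds up, assuming C-style NUL-terminated strings, where every append in the style of strcat first walks the existing buffer to find its end. This hypothetical Python sketch only counts characters touched under that assumption, compared with remembering where the end of the buffer is.

```python
def strcat_cost(pieces):
    """Characters touched when every append rescans the buffer to find its end."""
    length = 0   # current length of the built-up string
    touched = 0  # running count of characters read or written
    for piece in pieces:
        touched += length      # walk the existing buffer to find the terminator
        touched += len(piece)  # copy the new piece onto the end
        length += len(piece)
    return touched


def tracked_end_cost(pieces):
    """Characters touched when the end position is remembered between appends."""
    return sum(len(piece) for piece in pieces)


if __name__ == "__main__":
    pieces = ["0123456789"] * 1000   # hypothetical input: 1,000 ten-character pieces
    print(strcat_cost(pieces))       # 5005000: grows with the square of the piece count
    print(tracked_end_cost(pieces))  # 10000: grows linearly
```

In Python itself, "".join(pieces) is the idiomatic way to build a string from many parts in a single pass.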
Spolsky: And it's gonna be the same with all that SQL stuff, the exact same. There's gonna be some horrible inefficiencies going on, and there's gonna be some instruction in the CPU that Intel built for us that does a SELECT clause in some way that is monumentally faster than anything you've ever seen before.
Atwood: No, that's a good point. We've sort of danced around this on previous podcasts, but that's why the young programmers coming up not knowing any of this is actually a benefit: they don't learn the obsolete stuff, they don't learn these prejudices we have that no longer matter!
Spolsky: They don't waste a lot of time threading.
Atwood: They just start with an open mind, the beginner's mind!
Atwood: And it works for them. If you're smart and you start with a beginner's mind, you're going to learn just the stuff that matters and throw away all the stuff that doesn't. And we don't have that luxury; we've learned all this stuff that no longer matters.
Spolsky: Yeah, but sometimes we old folk are able to pull something out of our...
Atwood: Our crusty old ears and our head?
Spolsky: ...toolbox that actually blows away the young ones, and they can't believe we've accomplished something, and they're like “That doesn't seem possible!”
Atwood: Yeah, when you've done all that experience stuff, that's where the people stuff comes in. That's what I think I've learned over time: it's more the people stuff than the computer skills that'll help you 20 years out, more so than individual technology stuff will.
Atwood: But this is a good segue into something else I wanted to talk about, which is profiling, which you touched on with optimising the concatenation of strings.
Spolsky: Profiling? Is that where, if a little old lady walks up to an aeroplane, you don't really have to search her handbag because, come on, she's a little old lady!
Spolsky: That kind of profiling?
Atwood: Not that kind of profiling.
Atwood: Performance profiling.
Atwood: We periodically like to go through our code and re-visit performance assumptions, and figure out where we're spending our time and that sort of thing, and one thing I've learnt – this is the crusty old toolbox – wherever you think your code is slow? You're wrong, not only are you wrong, you're probably totally and completely wrong, you're probably looking at the fastest part of your code! -laughs-
Atwood: So, never, ever assume. I mean, I've been wrong so many times on this that I've just given up. I've given up on guessing where our code sucks and is inefficient, and we use a profiler. Now, one of our buddies, Red Gate, has the ANTS Profiler, which is actually very, very good, and they have an evaluation version that you can download. It works really well, so what we do is take some popular page, say, the questions page, start the profiler, load it up, and basically just refresh the page 50 times, so that we get a bunch of data...
Atwood: ...in the system. Once we do that, the profiler report shows us just the really hot stuff in the code and leaves out all the noise: it shows you the top functions being called that take the most time in this code path on the page. And as usual it's never what you think it is, and often it's things in little nooks and crannies of the code that you've forgotten about.
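ANTS is a .NET profiler; as a rough analogue of the “refresh the page 50 times, then look at the hot list” routine, here is a sketch with Python's built-in cProfile and pstats modules. The page functions are hypothetical, with a deliberately slow helper standing in for the forgotten nook or cranny; the ranking uses time spent in each function's own code.

```python
import cProfile
import pstats


def load_question(question_id):
    """Hypothetical cheap lookup standing in for a database call."""
    return {"id": question_id, "tags": [f"tag{i % 200}" for i in range(400)]}


def dedupe_tags(tags):
    """Hypothetical forgotten helper: O(n^2) de-duplication with nested loops."""
    unique = []
    for tag in tags:
        seen = False
        for existing in unique:
            if existing == tag:
                seen = True
                break
        if not seen:
            unique.append(tag)
    return unique


def render_page(question_id):
    """Hypothetical page handler that looks innocent at a glance."""
    question = load_question(question_id)
    return f"<h1>Question {question['id']}</h1>" + ", ".join(dedupe_tags(question["tags"]))


def hottest_functions(runs=50, top=3):
    """Render the page `runs` times under the profiler; return the top function names."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(runs):
        render_page(1)
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (primitive calls, calls, own time, cumulative, callers)
    ranked = sorted(stats.stats.items(), key=lambda item: item[1][2], reverse=True)
    return [func_name for (_, _, func_name), _ in ranked[:top]]


if __name__ == "__main__":
    print(hottest_functions())
```

For the familiar tabular report, pstats.Stats(profiler).sort_stats("tottime").print_stats(10) prints the ten most expensive entries.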
Spolsky: Yeah, some of the stuff you have to think about... you're probably just looking at it server side, like, time being spent on the server? But if you actually... or are you looking at it from the perspective of the browser?
Atwood: No, this would be all server side.
Atwood: Well, how are you actually measuring this, though? Are you measuring it from the client...
Spolsky: From the browser, yeah.
Atwood: But how?
Spolsky: Well, there's this thing Yahoo! has called YSlow, which is pretty good, and then there's...
Atwood: Well that's not to measure client that's all server performance …
Atwood: Well it shows you I guess the known render time?
Spolsky: Yeah, exactly.
Spolsky: Right, yes.
Atwood: Well I agree with what you're saying that a lot of the control that you have over the speed is ultimately decided on the server...
Spolsky: No, I'm saying the opposite! -laughs- One of the things that surprised us when we looked is that a lot of it comes down to the browsers being so damn slow.
Atwood: Oh yeah, that's absolutely true. No question about that.
Spolsky: And, um, so where I thought it was slow and just grinding away on the server, that was not actually that big an issue. And where it does matter, and here's the difference between StackOverflow and FogBugz, is that with FogBugz, except for FogBugz On Demand, you've probably put FogBugz on a server that's not very busy; if it's only serving FogBugz, it's got plenty of CPU time. Whereas in the case of StackOverflow, making the server side faster, even if it doesn't affect the end user's wall-clock time, can dramatically increase the number of responses you can produce on a given piece of hardware in a given amount of time.
Spolsky: So you kind of have to know whether you care about server side speed.
Atwood: I'm having a hard time parsing what you're saying here. I mean, of course you're going to care about server speed.
Spolsky: No, no, no, you may not. In the case of FogBugz, for example, somebody will get FogBugz, install it on a server, and it will be serving the 12 programmers on their team, who each go there and get a page maybe 12 times a day.
Atwood: Oh, I get it, the load is so low...
Spolsky: The load on the server is low enough that for all intents and purposes the CPU is not being used and therefore, once you get the wait time to the point where you don't notice it, there's no point optimising it more. But take that same code base, and put it on a server where you have 50,000 users banging on it, and all of a sudden you're using 90% of your CPU, and if you could reduce the CPU load a little bit you might be able to reduce it to 50% and thus have dramatically lower wait times for everybody.
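The jump from 90% to 50% CPU matters more than it sounds, and basic queueing theory shows why. Under the textbook M/M/1 model (random arrivals, a single server, random service times), which is a simplification of a real web server, the mean time a request spends in the system is the service time divided by (1 - utilization). A short Python sketch; the 10 ms service time is a hypothetical figure.

```python
def mean_response_time(service_time, utilization):
    """Mean time in system for an M/M/1 queue: S / (1 - rho), valid for 0 <= rho < 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


if __name__ == "__main__":
    service = 0.010  # hypothetical: 10 ms of CPU work per request
    for rho in (0.5, 0.9):
        ms = mean_response_time(service, rho) * 1000
        print(f"{rho:.0%} busy -> {ms:.0f} ms average response")
```

In this model, dropping utilization from 90% to 50% cuts the average response time from about 100 ms to about 20 ms, even though each request needs the same CPU work.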
Atwood: Oh, I understand what you're saying now. That is true. In one case you're optimising for the low-load state, where you just want to spin up as fast as possible, and that's kinda the opposite of what we do on StackOverflow, because there you're juggling balls all the time: you're a juggler juggling thousands and thousands of balls, and they're always in flight, right?
Atwood: And the set-up time to start this juggling is nothing, because it's always happening. But if the juggler has to go pick up his balls from the container and get them ready every 5 minutes when you say “Hey, go juggle some balls for me” -laugh-
Spolsky: Yeah, yeah.
Atwood: That's almost exactly what's happening, right? There just aren't enough balls in flight.
Spolsky: Right. So you have a server – if you can serve it from a different domain name then your browser is going to open more connections and get it in parallel.
Atwood: That's right, and I think newer browsers have gotten more generous with the number of connections they will make, like simultaneous connections, out of the box?
Spolsky: Right, traditionally 2 per domain.
Atwood: Yeah, IE6 era it was like 2, really low.
Atwood: And I don't know where we are with IE7 and IE8, but I assume IE8 and any browser of that vintage, anything new, is doing quite a few connections simultaneously. I think the old limit was there to avoid saturating connections and stuff like that.
Atwood: So that's another benefit to using another domain name. We bought this domain name, sstatic.net, for this purpose, and it just sat around for a while, and this weekend we finally got past our inertia and decided “Let's just get this thing done”, and went ahead and rolled it out, so now all the static content is served through sstatic.net. One of the surprising benefits is that this is like a poor man's server farm. We only have one StackOverflow server; we have not paid the technical price of having 2 servers yet. We're getting there, but we may have postponed that for another month or more, because, shockingly, moving those static content requests cut our requests to the server in half.
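A back-of-the-envelope model of why a second hostname helps. If a browser opens at most a fixed number of connections per hostname (2 in the IE6 era, as mentioned above, in line with HTTP/1.1's RFC 2616, which recommended that a client keep no more than 2 connections to any server) and every asset takes about the same time to fetch, a page needs roughly ceil(assets / (connections × hostnames)) rounds of requests. This Python sketch ignores bandwidth, keep-alive, and differing asset sizes, and assumes assets are split evenly across hostnames; the asset count is hypothetical.

```python
import math


def fetch_rounds(assets, connections_per_host, hostnames=1):
    """Rounds of requests needed if each hostname allows a fixed number of parallel connections."""
    parallel = connections_per_host * hostnames
    return math.ceil(assets / parallel)


if __name__ == "__main__":
    assets = 20  # hypothetical: scripts, stylesheets, and images on one page
    print(fetch_rounds(assets, 2))               # 10 rounds from a single hostname
    print(fetch_rounds(assets, 2, hostnames=2))  # 5 rounds with a separate static-content hostname
```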
Spolsky: Yeah, serving static content, isn't that, like, in kernel mode or something now? I think there's, like, HTTP kernel mode...
Atwood: There are ways to put it in kernel mode, but there's this whole complex set of rules about how much dynamic... There are certain things you can't do once you're in kernel mode, but yeah.
Spolsky: I thought serving static files was?
Atwood: It can be. It depends what's being served and how you've configured it. I don't know if we have it configured that way, and it doesn't really matter; it's irrelevant.
Spolsky: That's cool, it does take a bunch of the load off. Yeah.
Atwood: Yeah, it does, because you have fewer concurrent connections, so the server has less to juggle. And we saw a direct benefit to CPU, which surprised us: even though CPU didn't go up on the server that's actually serving these requests, which has shockingly low CPU usage, we definitely saw a decline in CPU usage on the StackOverflow server, and we think it's because there are just fewer connections in flight at any given time. So it has more time to focus on the connections that take longer.
Spolsky: People used to recommend serving jQuery from Google?
Atwood: Well we do that too now.
Spolsky: Does that even make any difference? Is that a myth?
Atwood: Well, the reason we did that... okay, so I was against that for a while, not because I don't like Google, I love Google, but because we were sort of mooshing a bunch of files together to serve as one big lump instead of having to open 10 connections...
Spolsky: Oh, I see...
Atwood: This was a problem, because the major technological hurdle to doing that is that if jQuery doesn't come down for any reason, it is really hard to recover from that.
Spolsky: Wait, you're telling me there's somebody that allows StackOverflow but not Google?
Atwood: Google APIs. This is like Google APIs.
Spolsky: Oh, so they block that...
Atwood: It's not Google, it's not root Google, it's like google.somethingsomething.
Spolsky: Oh you know what if people want to break the internet the internet is going to be broken.
Spolsky: I'm so unsympathetic about working around people's crazy internal policies.
Atwood: Well I suppose it depends how many people have this problem.
Spolsky: I just want them to fail. People who are working at a company that doesn't let them get to Google, or whatever the case may be, or get files off the internet: I don't want to work around them, I just want them to fail, because the company has moronic policies, and it's necessary for the evolution of good, healthy, strong companies that that particular company fail.
Atwood: Well, usually they just let the powers-that-be know they need the URL and it gets fixed. But I agree with you: if they let the powers-that-be know “Hey, I need this thing” and they're jerks about it, then sure, they deserve to fail. But I think there needs to be that chain of communication first before you can conclude that they need to fail.
Spolsky: You have to tell them why you're failing them. We should do it, we should block IE6.
Atwood: IE6 is 7% of all our...
Spolsky: I just -laughs- found a way to save 7% of your CPU time!!!
Atwood: Well, using IE6 is becoming its own penalty, because we have rendering problems in IE6 that we're just not going to fix any more. I mean, the site will work, but it's going to look a little weird...
Spolsky: There are companies out there that think it is better to put their employees on IE6 because it is more “tested, stable, reliable” than IE7 or 8, when 7 has been out for a couple of years. And I'm not saying go crazy here and get Firefox, I'm just saying, eh... And I'm not even saying that IE6 isn't stable, or that you haven't tested it with all your crazy in-house applications and know that it works with them. All I'm saying is that IE6 is a worse browser than 7; it is less stable. The very thing you claim as the reason for requiring people to use IE6 is the thing you are not getting by using IE6.
Atwood: Yeah, I don't really understand that. On some level I empathise with what you're saying, which is that these companies are making decisions so bad that maybe they're dinosaurs, but hey! There's a lot of dinosaurs out there. I don't know. But my position at the moment on IE6 is that we want the site to work; we make no guarantees about how it looks, and it may look kind of bad.
Atwood: The alignment is going to be off, and there are so many crazy little CSS things wrong with IE6 now that we're just not going to fix, but we do sort of semi-guarantee that you'll be able to use the site at a basic level. But yeah, I would love, and I'm sure the whole world would love, for IE6 to just poof, disappear overnight. But I'm not sure how realistic that is. So let me get back to... let's finish up what we were talking about, which is that the benefit of serving static content from a different domain, once you get to a certain volume of site, is substantial, and I definitely recommend it. I'm sort of shocked: the site appears much more responsive, because you're parallelising those requests, and they come from a dedicated server, and I think it's easier for the browser to cache it?
Spolsky: Well, the theory was that if you get jQuery from Google, then all those other sites that get jQuery from Google are increasing the chances of a cache hit.
Atwood: The other reason it helps with caching, I didn't get to finish my little explanation, is that when you serve the request with a cookie, some proxies will see the cookie and view the response as unsafe to cache. But if you're serving it as an unadorned file with minimal headers, then it's much safer for the proxy to cache, because there's nothing user-unique about this file.
Spolsky: Ooh! Right right.
Atwood: It's just HTML or whatever, so I strongly recommend it. It's been a nice little performance bump for us on a number of different levels. Again, it's that poor man's server farm, right?
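A hypothetical sketch of both halves of the cookie argument. The first function is a simplified form of the domain-matching rule browsers use to decide whether to attach a cookie (RFC 6265 domain matching, ignoring paths, host-only cookies, IP addresses, and the public-suffix list), which is why a cookie scoped to stackoverflow.com is never sent to sstatic.net. The second is an illustrative, deliberately conservative caching rule of the kind Atwood describes some proxies applying; it is not any particular proxy's actual logic.

```python
def cookie_sent(cookie_domain, request_host):
    """Simplified RFC 6265 domain match: the same host, or a subdomain of the cookie's domain."""
    cookie_domain = cookie_domain.lower().lstrip(".")
    request_host = request_host.lower()
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


def conservative_proxy_may_cache(request_headers, response_headers):
    """Hypothetical cautious proxy: skip caching anything that looks user-specific."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}
    if "cookie" in req or "set-cookie" in resp:
        return False
    cache_control = resp.get("cache-control", "").lower()
    return "private" not in cache_control and "no-store" not in cache_control


if __name__ == "__main__":
    print(cookie_sent("stackoverflow.com", "stackoverflow.com"))  # True
    print(cookie_sent("stackoverflow.com", "sstatic.net"))        # False
    static = {"Cache-Control": "public, max-age=604800"}
    print(conservative_proxy_may_cache({"Cookie": "user=abc"}, static))  # False
    print(conservative_proxy_may_cache({}, static))                      # True
```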
Spolsky: Yeah. One of those things 10 years from now we're not gonna know about or how to do.
Atwood: Well, you were talking about serving all the Google indexing requests from a different server: when the Google spider comes to visit, send it to a different server. That would be another poor man's optimisation, or server farm. This is analogous to that. I also reduced our cookie size. I took a hard look at all the cookies we were storing.
Atwood: What, is that a problem?
Spolsky: No, I'm just impressed that you've got time to do all this.
Atwood: Well, we like performance. When I say performance is a feature, I want the site to be as fast as we can make it. I guess I'm kinda disappointed that we're never... I guess never is a long time, but it's unlikely we're going to get to a content distribution network? It kinda bothers me a little bit that people in Europe have to go all the way to Oregon to get to our data.
Spolsky: Well, you can put the static content on a content distribution network.
Atwood: I could put the static stuff on a CDN. I guess that would be the next little step: having our own little mini-CDN.
Spolsky: Yeah, you know what, at some point it's... it's a waste of time.
Atwood: Well yeah, we haven't gotten there yet.
Atwood: But it bugs me a little bit. I feel bad for the people coming from Europe, Asia; we have a huge international community, right? The US is #1 for sure in terms of traffic, but I actually posted on the blog that, I think, the UK, Australia, Europe, these are big contingents of our audience, and they have to come a long, long way to get our content. There's nothing I can do about the speed of light, right? I can't make that faster, or the speed of the telecommunications network.
Atwood: All I can really do is optimise our servers to serve it up as fast as possible, but I guess I would like it if we had some other server hub somewhere. Eventually. Maybe when we get to Wikipedia scale, which we might eventually get to, it will make sense then. But aside from that, I think we're at the end of the road in terms of low-hanging fruit from the YSlow recommendations; the static serving was kinda the last major one we hadn't gotten to.
Atwood: Not that we can't get faster!
Atwood: Can always buy faster hardware! You know, people always criticise me about that, that blog post I wrote, you know, just throw hardware at the problem?
Atwood: But I wasn't saying JUST throw hardware at the problem, I was saying START with it. I don't know why people do this: they read like 10% of the post and then complain.
Atwood: Start with fast hardware. Then do the other stuff. Because fast hardware is so cheap, it's crazy from a financial standpoint not to get the fast hardware.
Atwood: And then do the optimisations. I'm saying do both; that's really what that blog post was saying, do both of these things, but start with the hardware. It's just a no-brainer, because optimisation takes brainpower: we have to think about what we're doing, measure it, get out the profiler, right?
Spolsky: Yeah, we have finite brains. Which I think all our viewers will agree with -laughs-
Atwood: Yes! Writing a cheque to Dell takes like no brainpower; writing a cheque to somebody is the ultimate no-brainer. Unless the cheque is for a billion trillion dollars or whatever, but hardware is cheap and getting cheaper all the time. So that was my point: we continue to do both, we optimise the software and the hardware.
Atwood: Yeah, so hopefully the site is nice and fast for people now, or at least faster.
Spolsky: I didn't notice, it's always been very fast for me. -laughs-
Atwood: Well, that's good. You're coming from New York, it's got to go cross-country, so that's good.
Spolsky: Nah, cross-country is not such a big problem. Europe isn't that bad either. When people usually have a problem, it's... well, Australia always has a problem, because they don't have enough bandwidth for the whole country, the whole continent down there.
Spolsky: And err, yeah the pipes, and everybody is paying through the nose for bandwidth down there.
Atwood: So hey, before we go, I know it's getting close to the end of the podcast, but I want to cover this question because it's such an interesting question, and we haven't done a StackOverflow question in a while.
Atwood: So this is question number 1133581
Atwood: 3581. And the title is, Is 23...
Spolsky: Oh now StackOverflow is really slow. And I mean really slow.
Atwood: It shouldn't be!
Spolsky: Jeff! I don't know, you're going to have to read it. Nope, nope, it's not coming up.
Atwood: That's weird. Yeah it does...ah interesting.
Spolsky: Is it not coming up for you? It came up for you.
Atwood: Well it was a little sluggish there to load the homepage, but now it's fine.
Spolsky: Maybe I'm just off the internet or something.
Atwood: Maybe some temporary thing. Anyway. So this question: Is -some giant number- a magic number or sheer chance? I'm not going to read the number because it's enormous. This post has 156 upvotes and 72 favourites, a comment with 82 upvotes, and the top answer has 563 upvotes...
Spolsky: Is that that thing, the person got charged on their phone bill...?
Atwood: Yes!! It's about some guy who got charged a giant amount of money to buy a pack of cigarettes.
Spolsky: Yeah, that was on Reddit and Digg and stuff; that's why it got all those votes.
Atwood: Yes but it is interesting because the guy who got 563 upvotes was able to figure out – and I suck at this stuff, these puzzly kind of things – he was able to reverse engineer where this number came from.
Atwood: And I don't want to spoil the surprise, I want you to go to the question and read it because I really think that Guffa, guy...
Spolsky: It was pretty awesome. He's trying to figure out why a random line on your phone bill, was it phone bill?
Atwood: Credit card bill.
Spolsky: Credit card bill, would show -some huge specific number-.
Spolsky: Like how do you get this particular number, this crazy number showing up on a credit card bill?
Atwood: Yes. Exactly.
Spolsky: As opposed to any other crazy number.
Atwood: And somebody gives this totally plausible explanation that seems accurate.
Atwood: Just based on this completely random looking number, and I think that's what people are reacting to, the brilliant piece of detective work.
Spolsky: Yeah, it was a cool piece of detective work. It's similar to some other bugs of a similar form. Do you remember when somebody found 6 numbers in Excel that, if you multiplied them, gave you something obviously wrong? Very, very specific floating-point numbers in Excel, and I think you needed Excel 2007, I think that was the version. They just found a couple of obscure floating-point numbers that, when you multiply them, give you an obviously wrong answer. Do you remember this?
Spolsky: Vaguely, it was just one of those little floating-point bugs that you hardly ever notice, and out of every single number you can possibly represent in Excel, this particular bug hit 6 of them.
Atwood: Oh yeah I remember that.
Spolsky: Or 8. I think I might have blogged about it.
Atwood: Yes, that was a great entry I remember that.
Spolsky: And a lot of people tried to reverse engineer it, and nobody ever did; nobody was ever really able to figure it out. Even after Microsoft told people, until they sort of listed the assembly, nobody really understood how that bug came about.
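For a sense of how narrow those cases are: the most widely reported trigger for the Excel 2007 bug was =850*77.1, which should display 65535 but showed 100000. The arithmetic itself follows IEEE 754: the double-precision product lands one unit in the last place below 65535, and Microsoft attributed the wrong result to the code that formats such values for display. Python floats are IEEE 754 doubles, so this short sketch shows the stored value; it reproduces only the arithmetic, not Excel's formatting bug.

```python
import math
from decimal import Decimal

# The cell formula =850*77.1, evaluated in IEEE 754 double precision.
product = 850 * 77.1

print(product == 65535)                      # False: the stored double sits just below 65535
print(Decimal(product))                      # the exact binary value, written out in decimal
print(65535 - product == math.ulp(product))  # True: the gap is one unit in the last place
```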
Atwood: Yeah. These number bugs are kind of fun. As we've previously established, I'm not by any means a math wizard, and I also dislike puzzles, so... but I have tremendous respect for people who can...
Spolsky: Who can figure out these things.
Atwood: Yeah, absolutely. It's fun, it's a very geeky fun thing, so... this has come up on StackOverflow too. There's this disconnect: some people don't want some of the fun questions, and this would be kind of a fun question, I guess: where does this magic number come from, and why does it exist? Is this really about programming? Is this like, how do I write this C# code to do this particular thing? Not really...
Spolsky: Pretty much any time one of our questions gets on the homepage of Reddit you have to be a little suspicious; I think you might have to take away like 100 points for that.
Atwood: But I think some of these fun questions, so long as they're programming related, which this one clearly is, are okay. I think you need to be a programmer to understand the nuances of hexadecimal and overwriting things with spaces, so it's strongly related. I don't really have a problem with it, and I think some of the fun stuff should be allowed.
Atwood: It's interesting right?
Spolsky: I'm all for it. I got a credit card from a bank once that had an expiration date, that said 49.
Spolsky: And it just didn't work anywhere.
Spolsky: It was a little credit union, and I think they had decided that... I guess they had gone back and read the spec and found they could set the expiration date for their card as far forward as they wanted, up to... and there was some algorithm that decided, this was in the '90s, there was some algorithm that decided whether it was this century or next century...
Atwood: Wait, you had a credit card with an expiration of 2049?!
Spolsky: Yeah. Well, it just had the 2 digits, 49.
Atwood: -laughs- Well that's insane though! Who does that?
Spolsky: I know! And that's the thing, I guess this little credit union decided that your credit card, or debit card or whatever it was that they issued, they just didn't really want it to expire, right? That was their decision, and it was technically legit, I think, but it didn't work anywhere.
Atwood: Okay, so they've decided that in order to make a non-expiring card, they've made a card that doesn't work anywhere.
Atwood: That's great. That's not at all what they had in mind, is it?
Spolsky: -laughing- No it's really not. But I think it's kind of funny because they said, “What's the latest we can make the expiration date?”, and they looked at the spec, and noticed numbers below 50 will be deemed to be in the 21st century, you know that's what some spec said so this should have been legit.
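A hypothetical Python sketch of the windowing rule Joel describes (two-digit years below 50 are read as 20xx, the rest as 19xx), next to a different, equally hypothetical sliding-window rule. It only shows that two pieces of software can read the same “49” a century apart; the transcript doesn't say what the card terminals actually did.

```python
def fixed_pivot_year(yy, pivot=50):
    """Read a two-digit year with a fixed pivot: below `pivot` means 20xx, otherwise 19xx."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy < pivot else 1900 + yy


def sliding_window_year(yy, current_year, years_ahead=20):
    """Hypothetical alternative: choose the year in a 100-year window ending `years_ahead` out."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    year = (current_year // 100) * 100 + yy
    if year > current_year + years_ahead:
        year -= 100  # too far in the future: assume the previous century
    elif year <= current_year + years_ahead - 100:
        year += 100  # too far in the past: assume the next century
    return year


if __name__ == "__main__":
    print(fixed_pivot_year(49))           # 2049: the card looks valid for decades
    print(sliding_window_year(49, 1997))  # 1949: the same card looks long expired
```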
Atwood: I don't know who authorises this craziness...
Spolsky: It's a little crazy, but these little credit unions you know they're like, 4 and a half people, it's not like making a bank you know, credit unions are very easy to start up. Relative to the full fledged banks.
Atwood: I guess. So did you have anything else you wanted to discuss before we...
Spolsky: No. I'm not sure how much bandwidth I'll have next week, because I'm going to be away in le France, and not just any France, like, rural France.
Atwood: Wow. Do you want to just not do the podcast?
Spolsky: And they claim there's high-speed internet, but let's just not count on it.
Atwood: Okay, we'll see.
Spolsky: So we may be skipping the podcast next week, and the week after that I'm going to be in Barcelona. Well, I'll definitely be on e-mail, so I'll let you know, but Augusts are difficult. Hey, I just did all my travel reservations for the StackOverflow DevDays. Oh my god, it's insanity, I'm away from home for 3 weeks! Completely crazy.
Atwood: I'll be alright.
Spolsky: But we should still be able to do a podcast during that, or maybe what we'll do is record some of the sessions, throw them up, and call that a very special StackOverflow podcast.
Atwood: That'd be cool if we could do that.
1:02:14 – Outro Credits, music, trail-out.