Podcast 44

Revision #15, 3/12/2009 6:27 PM
User: "Completed until 34:50"
Tags: (None)


Revision #16, 3/12/2009 6:32 PM
User: "Fixed a spelling mistake"
Tags: (None)


[incomplete]

Intro, advertising

[01:11]

Atwood: Are we on President number 44 now?

Spolsky: Maybe? There's a question as to whether you count that guy that was President twice.

Atwood: <laughs>

Spolsky: { Not sure, please check } And wasn't there somebody who had been president for five minutes?

Atwood: Oh gosh, I don't know. My American history isn't good enough to cover that.

Spolsky: Do we have a podcast guest?

Atwood: No, no, I had to { Fill this in }

Spolsky: Oh, next week maybe.

Atwood: Do you wanna play that song? There was a song.

Spolsky: Nobody likes us.

Atwood: <laughs> That's not true.

Spolsky: We had a guest song. I don't know, let me look it up on YouTube and see whether I can find it.

Atwood: Yeah.

Spolsky: What was it?

Atwood: It was a parody of "Let It Be", is that what it's called?

Spolsky: Right, it was about the C programming language.

Atwood: Yeah, I actually contacted that guy. I had a link here, I'll send it to you.

Spolsky: Here it is.

Atwood: I contacted that guy on YouTube, but every time I've done that...

[Spolsky plays the "Write in C" song in the background until the beginning of the chorus]

Spolsky: Write in C. All right, enough of that.

Atwood: That was really funny, though. I enjoyed that.

Spolsky: It was a good one. It's nice and hardcore... So you actually contacted this kid?

Atwood: Well, I have. In the past, when I've wanted to contact people on YouTube, I've used the contact form, because I have a YouTube account. So I contacted him, explained { Fill this in }, who you were, who I was, and of course there was no response at all. <laughs> Which is pretty typical; I mean, YouTube is just not a good messaging... uh, mechanism.

Spolsky: The first thing I'd do if I posted a video to YouTube would be to install some kind of, like, email electrification zapper and nukifier to prevent anybody from contacting me.

Atwood: Yeah... so we are { Fill this in }

Spolsky: Yeah, it's not paying the proper royalties to the Beatles anyway. <laughs> We'll link to that from the show notes. Awesome song, "Write in C".

Atwood: That's right, Joel's favourite song. Write everything in C, because Joel does in fact write everything in C, don't you, Joel?

Spolsky: I've started using a little bit of C99, the latest version of C, which lets you declare variables after you've written some statements.

Atwood: Isn't there, like a... well, no, there's another version of C++ coming out, like, 0x something?

Spolsky: Yeah, C++0x. { Fill this in } They just haven't decided what year it's going to ship yet.

Atwood: I see.

Spolsky: I guess it's either C++09 or nothing; they're kind of running out of 0s.

Atwood: Mmm, yeah. I'm not a C programmer, so I don't really keep up with that stuff, but occasionally...

Spolsky: Did you know that I had lunch with Brian Kernighan?

Atwood: Oh, right! That's awesome. How did that go?

Spolsky: You know, he told me what he thought was the one mistake in the C programming language. Now, he wrote the book The C Programming Language, but he did not invent C. He worked with the folks who did, and he co-invented the language called Awk... umm, among other things, probably. But it's an awesome book. And he said that probably the only mistake in C was the precedence of the bitwise logical operators compared to the equality operators. He thought the bitwise logical operators should have higher priority than the equality operators. Other than that, he thought there wasn't really a mistake in the language, and I tend to agree with that. I'd say that's kind of true.
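Kernighan's point is easy to see in a concrete expression. In C, `==` binds more tightly than `&`, so a bit test written as `flags & MASK == 0` is parsed as `flags & (MASK == 0)`. The short Python sketch below spells out both readings with explicit parentheses. `MASK` and the flag values are made up for illustration. Python's own precedence puts `&` above `==`, which is the ordering Kernighan said he would have preferred.

```python
MASK = 0x4  # hypothetical flag bit


def c_reading(flags):
    """How C parses `flags & MASK == 0`: the comparison happens first."""
    # MASK == 0 is false (0 in C), so the result is always 0:
    # the "bit is clear" test can never succeed.
    return flags & int(MASK == 0)


def intended_reading(flags):
    """What the programmer almost always meant: `(flags & MASK) == 0`."""
    return int((flags & MASK) == 0)


for flags in (0x0, 0x4, 0x5):
    print(hex(flags), c_reading(flags), intended_reading(flags))
```

In C itself, the usual fix is simply to parenthesize: `(flags & MASK) == 0`.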

Atwood: Certainly it's been wildly popular. I mean, C has been the backbone of a lot of programming [languages], so, by that measure, it's wildly successful.

Spolsky: Yeah.

Atwood: I don't think the criticism that it's a low-level or medium-level language is fair. That was by design; that was the intent of C.

Spolsky: And don't forget the context: it was '78 or something, right?

Atwood: That was a long time ago.

Spolsky: So you have to put things in context. There's stuff in C that I consider accidental complexity, stuff you have to manage yourself. Memory management, for example: calling malloc() for yourself is something you don't have to do anymore, because we've figured out ways to let the compiler and the runtime do it for you with garbage collection... or even with reference counting, like they had in Visual Basic. So some of what's in C just doesn't have to be in a programming language anymore, but that doesn't mean it wasn't a good programming language for its time.
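Here is a sketch of the reference counting Spolsky mentions. The `RefCounted` class below is hypothetical and written in Python purely for illustration. Its `add_ref`/`release` names loosely echo COM's AddRef/Release. The resource is freed exactly when the last owner lets go. A Visual Basic-style runtime does this bookkeeping for you, while C leaves it to you.

```python
class RefCounted:
    """Toy reference-counted resource (hypothetical, for illustration)."""

    def __init__(self, name, on_free):
        self.name = name
        self.count = 1            # the creator holds the first reference
        self._on_free = on_free   # called once, when the count hits zero

    def add_ref(self):
        self.count += 1
        return self

    def release(self):
        self.count -= 1
        if self.count == 0:
            # Last reference gone: free the resource exactly once.
            self._on_free(self.name)


freed = []
buf = RefCounted("buffer", freed.append)
alias = buf.add_ref()  # a second owner
buf.release()          # first owner is done; the buffer is still alive
alias.release()        # last owner is done; the buffer is freed here
print(freed)           # ['buffer']
```

CPython itself manages memory mostly this way. It adds a separate collector for reference cycles, which plain counting cannot reclaim.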

Atwood: Absolutely. And it was hugely influential: C#, JavaScript... all the other languages that look like C, which... I was kind of bitter about, actually. I was never a big fan of sort of the way select...

Spolsky: That wasn't C, that's Algol, right? C looked like Algol; I mean, those are all structured programming languages that were meant to look like Algol 68.

Atwood: Right, but I blame C, because it was just so much more popular. I don't know how popular Algol was, but certainly, as long as I've been a programmer, C was, like, the touchstone/cornerstone language. And it seemed like a lot of language decisions in Java and C# were made so that people wouldn't look at the code and freak out: "I don't recognize this! This doesn't look like code I could understand." So they made it look similar to C to reduce the learning pain. I'm a little bitter about that, because I've always really disliked the look of C.

Spolsky: Really?

Atwood: Yeah.

Spolsky: So clean and... mechanical?

Atwood: For one thing, the curlies... when you're closing curly braces, you never know what you're really ending. It's just another curly brace. I guess it's kind of like the Lisp problem: you have parentheses, and you never really know what the parens are closing. This is just a preference, but I'd like a little more verbosity at the end of blocks, so that I know exactly which block was ended, versus just...

Spolsky: That was a weird thing... You know, there were programming languages where the end matches the opening of the block, like Basic: If... End If. But then again, what if you have nested Ifs? You still don't know exactly which one it goes with.

Atwood: Yeah, that's true. But it's really just a tradeoff, it's just a preference. It's not written in stone obviously.

Spolsky: I've come to like the fact that in Python you just don't write an end: you just unindent, and so the indenting actually reflects the structure.
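Here is a small Python example of indentation carrying the structure. It covers the nested-if case Spolsky raises: the `else` belongs to whichever `if` it lines up with, so there is no dangling-else ambiguity. The function is a made-up example.

```python
def classify(n):
    if n > 0:
        if n % 2 == 0:
            return "positive even"
    else:
        # This else pairs with `if n > 0` because of its indentation,
        # not with the nearer `if n % 2 == 0`.
        return "zero or negative"
    return "positive odd"


print(classify(4), classify(3), classify(-2), sep=" | ")
```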

Atwood: That's actually a cool aspect of Python. I've never really worked in Python at all, but I always respected the choice. It was a brave choice to make the whitespace carry the meaning and actually enforce it, and I thought that was very cool. And one thing I do agree with, coming from a Basic background, which is the language I knew for a long time: I eventually, grudgingly, agreed that having the carriage return be your line terminator...

Spolsky: That's Unix, that's not really...

Atwood: No, no, what I mean is, in C# or C, until you put a semicolon in, it's one line.

Spolsky: Oh, I'm sorry. I thought you meant the line separator was the carriage return and linefeed. You're talking about the semicolon versus just the line end.

Atwood: Right. At first I was bitter about that, because I was like, "I have to type the stupid semicolon at the end of every line." But then I realized that it gives you so much flexibility. There are a lot of situations in Basic where, if you want to continue the line, you have to use some crazy, like, line continuation marker, like underscore.

Spolsky: It's not crazy, it's awkward.

Atwood: It's just awkward, because the semicolon way, I think, is much better. I think any language should have an explicit line terminator; I don't think it's a good idea to use carriage returns as a line terminator.
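Python is a useful counterexample here. It does treat the end of a line as the statement terminator, but it softens the continuation problem by joining lines implicitly inside brackets. Its backslash plays the role of Basic's trailing underscore. A minimal illustration:

```python
# Inside (), [] or {}, a statement continues across lines with no marker.
total = (1 +
         2 +
         3)

# Outside brackets, a trailing backslash continues the line,
# much like Basic's " _" continuation marker.
also_total = 1 + \
    2 + \
    3

# Semicolons exist too, but only as optional separators between statements.
a = 1; b = 2

print(total, also_total, a + b)  # 6 6 3
```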

Spolsky: Hey, do you know what Kernighan's second favourite language is?

Atwood: What's that?

Spolsky: Basic.

Atwood: <laughs> Really? Did he tell you that today?

Spolsky: Yeah.

Atwood: That's crazy.

Spolsky: No, it's true, he said it's sort of his secondary language. He asked me, "Are they really coming out with another version of VB6? 'Cos I really don't like VB.NET."

Atwood: Wow.

Spolsky: Yeah, he's done a bunch of stuff with it. For example, he had some kind of big, complicated library that did all kinds of interesting optimizations, and it didn't really have a very good UI, like an input/output thing. So they did the input/output UI through Excel. You'd just type things into a spreadsheet, and then he had this VBA code that basically called into these COM objects, which were written in C.

Atwood: Cool.

Spolsky: And in his class, I think, he teaches a little bit of Visual Basic, a little COM programming, a little bit of... he teaches a lot of other things: a little bit of Awk, a little bit of command line... this, that, and the other thing. I think it's important for programmers, especially at the college level, to learn a lot of little languages and to use the right tool for the job. If you give me a problem that most people would solve in Perl or Awk, sometimes I might solve it with Excel, as kind of a one-time thing. Say you've got a big old file and you need to separate the first name from the last name, put the phone numbers in, capitalize, multiply this by that... For that kind of column-wise problem, I'd often use Excel, because I know how to do that really fast and really easily.
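For comparison, here is the kind of one-off, column-wise cleanup Spolsky describes, written as a short Python script with the standard `csv` module. It splits a name column, capitalizes it, and keeps the phone number. The sample data and column names are invented, and the split assumes each name is exactly a first and a last name.

```python
import csv
import io

# Hypothetical input file contents.
raw = "name,phone\nada lovelace,555-0100\ngrace hopper,555-0199\n"

rows = []
for record in csv.DictReader(io.StringIO(raw)):
    # Split "first last" on the first space (assumes a two-part name).
    first, _, last = record["name"].partition(" ")
    rows.append({
        "first": first.capitalize(),
        "last": last.capitalize(),
        "phone": record["phone"],
    })

for row in rows:
    print(row["last"], row["first"], row["phone"])
```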

Atwood: I agree with that. I actually talked about that in a blog post before: I see the future of languages as a lot of small languages that are good at specific things. You'd switch between them in a fluid way. When you're like, "Oh, this is a set-based problem," or "Oh, this is a database problem," or "Oh, this is a text manipulation problem," you sort of drop into a language that's good at that thing. I'm a big fan of that. I never really liked the idea that... and I think a lot of developers... let me give a specific example: they hate SQL for some reason. They really don't like using SQL as a language or manipulating it, so they come up with this huge layer of abstraction just to get rid of SQL.

Spolsky: Yeah, it's stuff that they should just probably be doing...

Atwood: Yeah! And to me it's contorted!

Spolsky: { Check this line; probably wrong } As they learn more about SELECT statements, they'd realize that there's a better way to do it.

Atwood: Yeah, I mean, SQL has its faults, to be sure, but it's really good at basic set-based data manipulation, I think. I actually like SQL a lot, and I realize that it's not perfect.

Spolsky: Yeah. The biggest weakness, especially for SQL Server, is any kind of interesting string manipulation you try to do...

Atwood: It's painful.

Spolsky: ... it just falls all over the place. Say you try to do something like splitting up a word... I don't know, just anything that's inside the columns. There aren't enough functions there. They don't even have the proper, like, Left, Mid, InStr... the most basic string functions.

Atwood: We actually created a user-defined function to bring .NET regular expressions into SQL Server. You can actually use managed code there. It's not super-super fast, obviously, because it has to call out to .NET.
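The workaround Atwood describes is specific to SQL Server's CLR integration. As a rough analogue (not the SQL Server mechanism), Python's built-in `sqlite3` module lets you register a Python function with the database. SQLite parses the `REGEXP` operator but leaves its implementation to a user-defined `regexp(pattern, value)` function. The table and data below are hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite rewrites `value REGEXP pattern` as a call to regexp(pattern, value),
# so registering a two-argument "regexp" function enables the operator.
conn.create_function(
    "regexp", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)

conn.execute("CREATE TABLE users (email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("joel@example.com",), ("jeff@example.org",), ("not-an-email",)],
)

matches = [row[0] for row in conn.execute(
    "SELECT email FROM users WHERE email REGEXP ? ORDER BY email",
    (r"@example\.(com|org)$",),
)]
print(matches)
```

As with the SQL Server version, every row passes through interpreted code, so this is convenient rather than fast.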

Spolsky: How many years has SQL Server been out that they can't put regular expressions into T-SQL?

Atwood: Yeah, I was really shocked, because...

Spolsky: { Fill me in } ... this is such an obvious thing, just go friggin' put regular expressions in there! I know they're not fast, but everybody has to write COM objects now if they want to use a regular expression in a SQL SELECT statement? What year is it?

Atwood: Well, it's not a COM object, it's a .NET managed code object. So it's not quite as bad as COM, but there's definitely a huge speed penalty, because you're transitioning between those two worlds. Right?

Spolsky: If you do it in COM, then I'd say it's fast. But yeah, then your SQL queries can crash the whole machine. <laughs>

Atwood: Right.

Spolsky: ... caused by memory issues or god knows what. Sometimes it just surprises me: what on earth could they have added in SQL Server 2008, if it wasn't that?

Atwood: Well, one thing: remember we talked about Oslo, the modelling language, the DSL thing?

Spolsky: Yeah, I know, and I'm { Fill me in } about it and taped three episodes in which we got all kinds of Oslo modelling archives.

Atwood: Well, maybe this is partly what it's about. Sometimes, as a programmer, you realize that you're trying to solve a problem and the language itself is getting in your way at some fundamental level. It's not good at X, where X is... in the case of SQL, string manipulation. So the language is getting in your way. So what do you do? You could roll up your sleeves and say, "What if we could change the language itself?" And that's going down the rabbit hole, to Lisp and Ruby and those languages where you can redefine the language and write constructs that perform the same as the built-in ones.

Spolsky: Be careful as somebody's getting out there got an email from... { Fill me in } <laughs>

Atwood: You don't agree with that?

Spolsky: Well, OK. Let's carry on that thought then.

Atwood: OK, that's as far as I'm going to go with this. I think sometimes you want to fix the problems, like, in the language. You don't want to shell out to another executable or, you know, come up with some other Rube Goldberg-type solution. You wanna...

Spolsky: That's the trouble. The trouble is, with the exception of Lisp, languages that are powerful enough to fix the problem in the language don't have these problems. They don't really need to. The exception is Lisp: you start with a language that has nothing, plus a really good macro capability. That lets you extend the language to make it into whatever kind of language you want, so long as you like parentheses, and that's all very cool. But, I mean, the truth is, most of the languages I use either don't have the problems, or they just aren't extensible enough to fix them. To give you a great example, C# is not quite extensible enough. Let's say my problem is that I don't like to declare the types of variables and arguments and stuff like that, and I want a dynamic version of the language... maybe that's my problem. Well, C# can't fix that. They're getting closer and closer; they've got that var thing now, but it's not really a dynamic language. There are all kinds of places where you can't... the language is not powerful enough to fix that part of the language.

Atwood: Well, let me give you a more basic example. In C#, one thing that I find to this day very, very annoying is String.Format. You're doing substitution within a string, and there are all these cool parameters you can use to figure out, like, what type of date you're going to substitute, what it's going to look like, and things like that. But the actual substitutions themselves are numbered, so you know, {0}, {1}, {2}... based on the order of the parameters. Why can't I do named parameters? Why can't I give this a name, like... you know... LastName? That'd read so much better than, you know, {0}: it'd just say {LastName}. And you really can't do that. You can come up with extensions and some sort of workaround-y kind of stuff. But you can't fundamentally go in and, you know, add your own native function that just mysteriously appears in the runtime. Whereas in a language like Ruby, I think that's pretty common when you see something...
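For contrast, Python's `str.format` accepts both styles, so the named version Atwood wants is available out of the box. C# itself later added interpolated strings such as `$"{lastName}"`, in C# 6. The names below are just sample values.

```python
first, last = "Joel", "Spolsky"

# Positional placeholders, as in C#'s String.Format("{1}, {0}", first, last).
positional = "{1}, {0}".format(first, last)

# Named placeholders: the template documents itself.
named = "{LastName}, {FirstName}".format(FirstName=first, LastName=last)

# Format specifications still work with names, e.g. zero-padding.
ticket = "Ticket {Id:05d} for {LastName}".format(Id=42, LastName=last)

print(positional)  # Spolsky, Joel
print(named)       # Spolsky, Joel
print(ticket)      # Ticket 00042 for Spolsky
```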

Spolsky: <interrupts> You can do that. If you wanted to write your own little templating string formatting thing, you could do that in C [sic], couldn't you?

Atwood: In C#? You can; you can write an extension method. But it's not really the same as making it part of the language. It sort of just exists only in your code.

Spolsky: So it won't work for other people's libraries that are gonna call the built-in format and stuff?

Atwood: No, I mean, it does work. You can write your own function and you can call it. But at some level you just want to make a global fix to some of this stuff. You realize that there are some essential wrongs in the world that you want to right... hmm... or at least, for some version of the world, you want to right those wrongs. But it's not really possible in these [languages]... C# mostly being a static language. One cool thing they're doing, actually: they're adding named parameters to functions now. I don't know if you've seen all of that in C# 4.0?

Spolsky: Uh-oh.

Atwood: You don't like named parameters? So say you have seven parameters: you can pass them in any order, as long as you specify the name of each parameter you're passing.

Spolsky: That's a little bit useful because it makes the code that's calling the function have the ability at least to be a little more self-documenting.

Atwood: That's right, so if you have seven parameters to a call [and] you are using only the last one, you don't have to put in "null, null, null, null, null, null, 3".
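In C# 4.0 the call site looks like `Method(retries: 3)`, given optional parameters for the rest. The same idea in Python uses keyword arguments. The seven-parameter `send_report` function below is hypothetical, and its body is only a stub.

```python
def send_report(to=None, cc=None, subject=None, body=None,
                attach=None, priority=None, retries=0):
    """Hypothetical seven-parameter function; the body is a stub that
    just echoes a few of the arguments back."""
    return {"to": to, "retries": retries, "priority": priority}


# Positional call: only the last argument matters, but six placeholders are needed.
by_position = send_report(None, None, None, None, None, None, 3)

# Named call: say what you mean; the other parameters keep their defaults.
by_name = send_report(retries=3)

print(by_position == by_name)  # True
```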

Spolsky: Why would you want a function that takes seven parameters when you only need to use one of them? Like, why not make it a class that has seven properties, so you only have to set the ones you want to set?

Atwood: Well, this is about flexibility. This is about naming things versus ordering them; this is the same as the string problem, isn't it? In some situations you just want to give things names and call them...

Spolsky: <interrupts> You know what, I like named parameters, because I personally got them put into VBA. We had a lot of functions in Excel that took a lot of arguments, and sometimes you really only wanted to use the third and the seventh, depending on what the function did. And usually, when you looked at the documentation for those functions, it was because the functions were defined in a bizarre way. The documentation would say: "If you provide the first argument and the third argument, this prints a piece of paper. If you provide the second argument and the seventh argument, this emails a letter to the email address in the seventh argument, using the template in the second argument. Unless you also provide a seventh argument, in which case it rolls the dice and decides whether to turn off your screen or blow up your Macintosh."

Atwood: <laughs> I agree. I mean a parameter...

Spolsky: <interrupts> At some level you're like "Boy...! If these arguments were names, it'd be so much easier." <laughs>

Atwood: On some level I agree. If you have a function with seven parameters, the question you should be asking is not "How can I pass the seventh parameter very easily?" but "Why the hell do we have a function with seven parameters?"...

Spolsky: <interrupts> Right. Shouldn't there be another function that just takes the seventh parameter? Maybe that's just a different... yeah...

Atwood: But to me that's a different discussion... but I do agree with it. When you see stuff like that, you should question it, of course.

Spolsky: <interrupts> Right. So I didn't... hmm... mea culpa, I did not question this. I just knew that we had a lot of long functions that took a lot of arguments, and I insisted on named parameters being added to VBA, which they did. They had to use the colon-equals syntax, because the equals syntax would be a Boolean expression and would, you know, pass either true or false. So in order to name the parameters you had to use, I think, colon-equals, because colon did something else and that was the only way they could do it in VBA. So they did that. And then they converted all the old Excel macro functions into the more object-oriented model for Visual Basic for Applications. Instead of a function called "MoveWindow" that takes like 18 arguments that all do different things, we now had a Window class with a bunch of parameters [sic]. If you wanted to move it, you changed the Left and the Top... and then it moved. Hmm... and lo and behold, when we were done making this object-oriented thing, we didn't have any functions left that took a lot of arguments, with a couple of tiny exceptions. Hmm... and so the named arguments feature turned out not to be as important anymore. Compare the previous versions, like Word Basic 1.0, where every dialog box had a corresponding function. Take the Format Paragraph dialog box. It has 83 little edit fields that do all kinds of things, like the line spacing and the indent and those little things. So there was a FormatParagraph function that had 83 arguments, and the only way you could specify which ones you actually wanted to change was by using named arguments.
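The object-model alternative Spolsky describes sets `Left` and `Top` instead of calling an 18-argument `MoveWindow`. It can be sketched in Python with properties. The `Window` class here is hypothetical and only records where it "moved" instead of touching a real screen.

```python
class Window:
    """Hypothetical window object: set only the properties you care about."""

    def __init__(self, left=0, top=0):
        self._left = left
        self._top = top
        self.moves = []  # log of positions after each move, for illustration

    @property
    def left(self):
        return self._left

    @left.setter
    def left(self, value):
        self._left = value
        self._moved()

    @property
    def top(self):
        return self._top

    @top.setter
    def top(self, value):
        self._top = value
        self._moved()

    def _moved(self):
        # A real UI would reposition the window on screen here.
        self.moves.append((self._left, self._top))


w = Window()
w.left = 100  # change only what you care about; everything else stays put
w.top = 50
print(w.moves)  # [(100, 0), (100, 50)]
```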

Atwood: Right.

Spolsky: Wow. We went into that a little too deep... I feel like we've got everybody turning us off now.

Atwood: <laughs> Well, let's switch gears. I have something else...

[Spolsky interrupts Atwood with some more "Write in C"]

Spolsky: Oh, sorry. What?

Atwood: <laughs> That's a great song. So we did have one milestone on Stack Overflow that I was sorta remiss in not talking about the day it happened. We actually had our 100,000th question posted... and no, I don't know what the actual question was <laughs>... that keeps coming up... uh... it's hard to tell, because deletions affect the order.

Spolsky: Well, what's the one that has 100000 as its ID in the database?

Atwood: That happened a long time ago, because we share IDs for a bunch of stuff... like, questions and answers share IDs... so...

Spolsky: Oh.

Atwood: Yeah. Those are both records in the same table. We don't really distinguish a question and an answer in that way.

Spolsky: Is that a good thing? Are you gonna, like... revamp it?

Atwood: No. No, that part's not really changing... hmm... because the difference between a question and an answer is pretty minimal. I mean, questions have a title and questions have tags, but other than that, they're pretty darn similar... so we just have those two extra fields. Well, there's actually more than that, there's a bunch of fields there and all... but it's just easier to have 'em in one table... we think. So we don't know which was exactly the 100,000th, but it happened sometime on Wednesday, February 25th. That was an important milestone, so congratulations to all the Stack Overflow users who got us there; that was nice. The other thing I talked about in the blog post was that Geoff Dalgas set up this thing called Cacti for us... have you heard of this? Cacti is a graphing aggregation tool. It's an open source tool that will take a bunch of inputs, typically from networking gear and servers, and just graph them for you automatically. You can just give it all these inputs and it'll do daily, weekly, monthly... all this really cool graphing stuff.
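This is why ID 100000 is not the 100,000th question. If questions and answers live in one table, they draw IDs from one sequence. Below is a hypothetical single-table layout in SQLite via Python's `sqlite3`. The column names are illustrative and are not Stack Overflow's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical design: questions and answers are both "posts",
# so they share a single ID sequence.
conn.execute("""
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        post_type TEXT NOT NULL CHECK (post_type IN ('question', 'answer')),
        parent_id INTEGER REFERENCES posts(id),  -- set only for answers
        title     TEXT,                          -- questions only
        tags      TEXT,                          -- questions only
        body      TEXT NOT NULL
    )
""")

ask = "INSERT INTO posts (post_type, title, tags, body) VALUES ('question', ?, ?, ?)"
reply = "INSERT INTO posts (post_type, parent_id, body) VALUES ('answer', ?, ?)"

q1 = conn.execute(ask, ("Why C?", "c", "...")).lastrowid
a1 = conn.execute(reply, (q1, "Because.")).lastrowid
q2 = conn.execute(ask, ("Why not?", "c", "...")).lastrowid
print(q1, a1, q2)  # 1 2 3: the second question gets ID 3, not 2

question_count = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE post_type = 'question'"
).fetchone()[0]
```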

Spolsky: Just like to make pretty pictures to put up on the wall in your bedroom?

Atwood: Uh... no... we wanna keep track of... what we have it set up to do now is look at our bandwidth, because we're a little concerned about bandwidth. We changed hosts, and we negotiated a certain amount of bandwidth. We were concerned that we were exceeding that, and it turns out we are actually using substantially more than we thought. That's 'cause we changed from the old billing method we had, which was purely based on how much bandwidth gets used in a given time period. I think we had 1250 gigabytes per month, which we didn't even really get close to. But the new billing model is what's called 95th [percentile]... burstable... it's burstable billing, at the 95th percentile.

Spolsky: That's what almost everybody does.

Atwood: Yeah, and it's a little bit harder to calculate, but it's based on your highest burst period over a certain amount of time. The 95th percentile ends up being like 30 minutes. So if you spend 30 minutes...

Spolsky: <interrupts> You sample your bandwidth usage, like, every minute over the course of a month and throw away the highest five percent of the samples... you know, the highest ones.
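Here is a sketch of 95th-percentile ("burstable") billing as Spolsky describes it: sort the month's samples, discard the top 5%, and bill at the highest remaining one. The 5-minute sampling interval is an assumption (a common choice, but providers differ), and so are the traffic numbers. Under that assumption, 30 days of samples is 8,640 samples. The discarded top 5% is then 432 samples, about 36 hours of peak traffic that never shows up on the bill.

```python
import math


def billable_rate(samples, percentile=95):
    """Discard the top (100 - percentile)% of samples and return the
    highest remaining one, i.e. the percentile-th sample in sorted order."""
    ordered = sorted(samples)
    keep = math.ceil(len(ordered) * percentile / 100)
    return ordered[keep - 1]


# Hypothetical month: 8,640 five-minute samples in Mbit/s.
# 400 burst samples (about 33 hours) at 100 Mbit/s, otherwise 5 Mbit/s.
month = [5] * 8240 + [100] * 400
print(billable_rate(month))  # 5: every burst falls inside the discarded 5%
```

With 500 burst samples instead of 400, the bursts no longer fit in the discarded 432, and the bill jumps to the burst rate.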

Atwood: I see. Right. Yeah, it's kinda weird, but Cacti does this by default, so...

Spolsky: <interrupts> Doesn't your ISP give you, like... our ISP gives us things where we can look at these reports on their side, on their port; they can tell us how much they're throwing at us? { Check this part }

Atwood: They might, but I think Geoff was familiar with this Cacti tool, and it's easy to set up. It took him maybe one day. And now we can see for ourselves. So the answer is that at the 95th percentile level we're using about 6 Mbit/s, which ends up being 750 KB/s. That was actually quite high.

Spolsky: 750 KB/s?

Atwood: KB/s, at our peak.

Spolsky: So that's like a massively saturated DSL line.

Atwood: Err... no, it'd be more than that. That's almost 1 MB/s, Joel. I mean, think about it: we're close to 1 MB/s. Anyway, for what it's worth.

Spolsky: So, 10Mbits. They usually do these in bits for some reason.

Atwood: Yeah, I don't like bits. Like to me I...

Spolsky: <interrupts> And { Fill me in } too, it's not fair.

Atwood: Yeah, bits versus bytes is annoying. I sort of convert to bytes, because I think bits are a ridiculous unit of measurement. I mean, unless they do...

Spolsky: <interrupts> That's what they do. It's standard.

Atwood: I know, but that doesn't mean I have to like it. So I use both notations.
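The conversion behind these numbers is just a factor of 8, using the decimal prefixes that network rates normally use and ignoring protocol overhead. So 6 Mbit/s is 750 KB/s, and 1 MB/s is 8 Mbit/s. A tiny Python helper:

```python
def mbit_to_kbyte_per_s(mbit_per_s):
    """Convert megabits per second to kilobytes per second.
    Assumes decimal prefixes (1 Mbit = 1,000,000 bits) and 8 bits per byte."""
    return mbit_per_s * 1000 / 8


print(mbit_to_kbyte_per_s(6))  # 750.0
print(mbit_to_kbyte_per_s(8))  # 1000.0, i.e. about 1 MB/s
```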

Spolsky: Is that just, like, because of the stop bit and the start bit and an extra bit for the...?

Atwood: I researched this a while ago, and I don't remember the rationale for kilobits versus kilobytes. So we actually went through, and since we're using more bandwidth than we thought, we realized text over... text overflow <laughs>...

Spolsky: What's our site called, anyway? <laughs>

Atwood: It's Stack Overflow. But what I'm trying to get at is that most of our site is text, so in a way it is a text overflow. If you look at our site, what images are we serving up? The logo, the vote buttons... I mean, these are tiny, tiny images, really. We don't do any image hosting locally, so almost by definition, any image you see on the site comes from a third-party site. So we're going through almost 750 KB/s at peak of pure text, and compressed text at that. For a long time we've had the gzip religion: everything you serve should be compressed, because compression is an utter no-brainer. You have a ridiculous amount of CPU time and, like, a tiny trickle of bandwidth. Plus it's just a better experience, right? You get the page faster; it's just better any way you slice it. The only time not to do it is if, for some reason, every CPU you have is pegged at 100%. Then you don't want to be doing compression, but that's such a rare case. So we revisited it: there were a few edge cases we weren't compressing. I found out that IIS, I think 6 and 7, will not compress anything that's sent to a proxy, like if it sees "PROXY"...
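Here is a quick Python check, with the standard `gzip` module, of why compressing text is such a no-brainer. The page body is a made-up, highly repetitive snippet, so the ratio it prints is more flattering than real pages would give. The point is only that markup-heavy text shrinks a lot and round-trips exactly.

```python
import gzip

# Hypothetical page body: repetitive markup, typical of listing pages.
line = "<div class='question'><a href='/questions/1'>Why write in C?</a></div>\n"
html = (line * 200).encode("utf-8")

compressed = gzip.compress(html)
ratio = len(compressed) / len(html)
print(len(html), len(compressed), f"{ratio:.1%}")

# The client reverses it after seeing "Content-Encoding: gzip".
assert gzip.decompress(compressed) == html
```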

Spolsky: <interrupts> Oh, IIS6 won't even compress anything that comes from a program.

Atwood: Yeah, 6 has some issues. 7 doesn't, finally, acceptably. But there are some caveats: if it sees a proxy, or if it's HTTP 1.0, it won't compress. Now, the weird thing is that a lot of proxies, for some reason, will say, "Hey, I'm an HTTP 1.0 proxy."

Spolsky: How is it HTTP 1.0?

Atwood: I know, this is the thing: HTTP is, like, almost the oldest standard in the world. It's 10 years old! It's been like 7 years. It's like, can we move on to 1.1? What's the problem here?

Spolsky: Longer than that, I think.

Atwood: Yeah, we were really surprised. There were (a) a lot of proxies, and (b) a lot of proxies that report themselves as "Hey! I'm HTTP 1.0."

Spolsky: With HTTP 1.0... maybe I'm getting this wrong here, but I'm pretty sure HTTP 1.0 doesn't have, like, a GET where you give it the full URL.

Atwood: I looked it up when I wrote this blog post, and I couldn't really tell what the difference between 1.0 and 1.1 was. I know it was somewhat significant, but I can't remember the details. I couldn't find a good article, probably because the world moved on to 1.1. Nobody gives a crap about 1.0.

Spolsky: No, I don't think you could even host two sites with two different top-level domains on one machine with the same IP address. I don't think you could do that in 1.0; I think you need 1.1. In 1.1 they finally realized you should specify the entire URL in the GET line instead of the... Originally there was an assumption that once you got to a web server, there was only one web server living there, for one top-level domain in DNS. You see what I'm saying? So if you had A.com and B.com running on the same IP address, with HTTP 1.0 you would just connect and say "GET /", and it wouldn't know what to send you, so you couldn't do that. And so HTTP 1.1 corrected that and said, "Now you have to say 'GET http://A.com/'."
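For reference, in HTTP/1.1 as standardized, name-based virtual hosting relies mainly on the `Host` request header, which HTTP/1.1 clients must send. Absolute URLs in the request line are used chiefly when talking to proxies. The Python sketch below builds such a request and picks a site by its `Host` header. The site names and the `route` helper are hypothetical, and real header parsing has more edge cases.

```python
# Hypothetical sites sharing one IP address.
SITES = {"a.com": "<h1>Site A</h1>", "b.com": "<h1>Site B</h1>"}


def build_request(host, path="/"):
    # HTTP/1.1 requires a Host header; header lines end with CRLF.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"


def route(raw_request):
    """Pick a site by the Host header; a bare 'GET /' without one can't be routed."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    _request_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    host = headers.get("host", "").split(":")[0].lower()  # drop any :port
    return SITES.get(host, "<h1>Which site did you mean?</h1>")


print(route(build_request("b.com")))
print(route("GET / HTTP/1.0\r\n\r\n"))
```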

Atwood: I know what you mean.

Spolsky: The idea of multi-hosting goes back to, like, 1995, pretty much. With HTTP 1.0 it literally did not work. I'm probably getting this all wrong.

Atwood: <laughs> Well, the point is that HTTP 1.0 is ancient, and yet there are a lot of proxies out there that are just happily telling the world, "Hey! I'm HTTP 1.0! Look at me!" <laughs> And IIS sees that and is like, "Guess what, you aren't getting compression," and we see quite a bit of traffic from proxies.

Spolsky: And can you... what if you send them the compressed version, what would they do? The proxies...?

Atwood: That was the other thing. Well, the ironic thing is that while these proxies report themselves as 1.0, in the very same request they say, "Give me compressed content." <chuckles> I'm not entirely sure compressed content was invalid in 1.0; I couldn't quite tell. So, anyway, we resolved the proxy issue.

We also had an issue where feeds, RSS feeds, weren't being compressed, due to some of the vagaries of ASP.NET MVC; the version we're running still isn't the latest version, unfortunately. We resolved that, so every bit of text you can see or retrieve from Stack Overflow should be compressed.

Now, the question you're sort of implying there is: can you force compression on clients that didn't even ask for it? That's sort of the nuclear option. You're technically only supposed to serve compressed content if the client says, "Hey! I want compressed content if you can give it to me." So it's a little sketchy to take a client that didn't ask for it and sort of force it down their throat.
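The polite rule Atwood describes boils down to checking the client's `Accept-Encoding` header before sending gzip. Below is a simplified, hypothetical Python helper. It honours q-values (so `gzip;q=0` is a refusal) and lets an explicit `gzip` entry override the `*` wildcard. It ignores other details of the HTTP spec, such as the legacy `x-gzip` token.

```python
def accepts_gzip(accept_encoding):
    """Simplified check: may we send gzip to a client with this header value?"""
    if not accept_encoding:
        return False  # the client didn't ask; don't force compression on it
    qvalues = {}
    for item in accept_encoding.split(","):
        coding, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default quality when no q= parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as a refusal
        qvalues[coding.lower()] = q
    # An explicit gzip entry wins over the "*" wildcard.
    return qvalues.get("gzip", qvalues.get("*", 0.0)) > 0


print(accepts_gzip("gzip, deflate"), accepts_gzip("gzip;q=0, *"))  # True False
```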

Spolsky: Maybe some guy's command line web browser thing that they wrote. Not that they deserve it.

Atwood: It is irritating, 'cos I would going to this sniffer and watch the traffic. And you do sort of wonder why are this... and you know, the worst thing is crawlers that aren't smart enough, like Alexis crawler is so dumb it won't even request compressed content. Now Googlebot does, and I think other well-written bots do but there's no reason in this day and age to ever send anything except for compressed content... HTTP content over the wire.

Spolsky: Is there anybody alive anymore in Alexa? You get that feeling that they wrote that in 1997 and then they fire all the people but they forget to turn off their servers. <laughs>

Atwood: <laughs> They are still out there. I read their little...

Spolsky: <interrupts> Yeah, I know... huh?

Atwood: They have a little blog, and I read their little blog and... yeah.

Spolsky: Is it still an IE-only toolbar?

Atwood: Well, yeah. So... I guess let's give some background for listeners who aren't familiar with Alexa. Alexa's claim to fame was that they could tell you, for any website, how much traffic it was getting, theoretically.

Spolsky: From IE.

Atwood: From... well, that's the question: how could they do this? And the answer is, they had a browser toolbar, the Alexa toolbar, that was installed in lots and lots of copies of IE. I don't think it's true anymore, but in the bad old days, it was in lots and lots of copies of IE.

Spolsky: No, I mean, you had to install it. People would download it and install it, allegedly. I don't think it came with IE.

Atwood: Really? I thought it was bundled into it in some scenarios.

Spolsky: Maybe on some. Like, I bought a laptop from, you know, Walmart and it had 48,000 things in the list. But mostly they just told people to download it, a small percentage of people did, and they decided that was a good enough [sample].

Atwood: Right. They relied essentially on sampling: the sample is the people on the Internet who happen to have Alexa installed. If they went to, say, Walmart.com, that information would be transmitted to Alexa, and Alexa would make an assumption: "Well, if one user went, that's representative of N-thousand real users."
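The method described here is plain extrapolation from a panel: every toolbar user stands in for (total users / panel size) real users. A minimal Python sketch with made-up numbers (the population, panel size, and install rates are all hypothetical, and this is not Alexa's actual model) shows how that can under-report a site whose visitors rarely install the toolbar: the scale-up factor stays the same, but fewer of the site's visitors are ever seen.

```python
def extrapolate(panel_visitors: float, panel_size: int, population: int) -> float:
    """Scale visitors seen in a toolbar panel up to the whole population.

    Each panel member is treated as standing in for population / panel_size
    real users; this is only sound if the panel is a random sample.
    """
    return panel_visitors * (population / panel_size)


def expected_panel_visitors(true_visitors: int, install_rate: float) -> float:
    """Visitors we'd expect to see in the panel if this site's audience
    has the toolbar installed at `install_rate`."""
    return true_visitors * install_rate


if __name__ == "__main__":
    population, panel_size = 1_000_000, 10_000  # hypothetical: 1% run the toolbar
    true_visitors = 50_000                      # hypothetical real traffic
    for audience, rate in (("typical", 0.01), ("programmers", 0.002)):
        seen = expected_panel_visitors(true_visitors, rate)
        estimate = extrapolate(seen, panel_size, population)
        print(f"{audience}: true={true_visitors}, estimated={estimate:.0f}")
```

With these invented figures, the site whose audience installs the toolbar at a fifth of the average rate is estimated at a fifth of its real traffic; nothing in the panel data itself reveals that bias.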

Spolsky: The data is absolutely meaningless. No reasonable person has the Alexa toolbar installed, so they always under-report if it's an interesting website. I mean, how many programmers need one of these crappy toolbars that lets you... It's like, do you remember when there were websites that let you change the cursor into a little Dilbert? Comet Cursor, remember that? You'd get those websites that were like "Would you like to install the Comet Cursor ActiveX control?" You'd be able... if you went to the Dilbert website, your cursor would turn into Dilbert and his tie would be the pointer.

Atwood: <laughs> You know a lot about this. You know a lot about this, I'm going to say.

Spolsky: A lot of people installed that damn thing. It was just a 10 KB download or whatever.

Atwood: Yeah, I guess. I always associated Comet Cursor with, like... spyware and malware and stuff like that.

Spolsky: Right, I don't think it was... necessarily malware. Or maybe it was.

Atwood: Maybe early on it wasn't. But I think it quickly became one of those things where the business model devolves so rapidly.

Spolsky: Imagine, they're like, "We could take over half of your screen and show you Viagra ads, ... ads { << Check + Fill me in } as is, from now until the end of time and prevent you from ever uninstalling it; then at least we could make a buck."

Atwood: Yeah. That's a whole unfortunate part of computing history.

[34:50]

....

[47:03]

Atwood: You've illustrated an important principle here, which is that "architecture" implies divorcing the people who are doing the work from the people who are making the decisions. This is always, in my experience, super, super dangerous. So to the extent that the architecture group or the architect is not really with you in the trenches, helping you do the work, they're not going to make the right decisions.

Spolsky: They just don't have any of the information.

Atwood: They don't have any of the context, any of the information. I think that is the root problem I was trying to get at.

[47:31]

.....

[68:10 ends]

Outro, advertising

[69:20]

[incomplete]

Intro, advertising

[01:11]

Atwood: Are we on President number 44 now ?

Spolsky: Maybe? There's a question as to whether you count that guy that was President twice.

Atwood: <laughs>

Spolsky: { Not sure, please check } And wasn't there somebody who had been president for five minutes?

Atwood: Oh gosh I don't know, my American history isn't good enough to cover that.

Spolsky: Do we have a podcast guest?

Atwood: No, no, I had to { Fill this in }

Spolsky: Oh, next week maybe.

Atwood: Do you wanna play that song, there was a song.

Spolsky: Nobody like us.

Atwood: <laughs> That's not true.

Spolsky: We had a guest song, I don't know, let me look up onto YouTube and see whether I can find it.

Atwood: Yeah.

Spolsky: What was it?

Atwood: It was a parody of Let It Be, is it what it is called?

Spolsky: Right, it was about the C programming language.

Atwood: Yeah, I actually contacted that guy. I had a link here, I'll send it to you.

Spolsky: Here it is.

Atwood: I contacted that guy on YouTube but everytime I done that...

[Spolsky plays Write in C song at the background until the beginning of the chorus]

Spolsky: Write in C. Alright, enough of that word.

Atwood: That was really funny, though, I enjoyed that.

Spolsky: It was a good one, it's a nice, hardcore... so you actually contacted this kid?

Atwood: Well, I have. In the past I wanted to contact people on YouTube [and] you can use a contact form, because I have a YouTube account. And I contact him, explain { Fill this in }, who you were, who I was and of course there was no response at all. <laughs> Which is pretty typical, I mean YouTube is just not a good messaging... uh, mechanism.

Spolsky: The first thing I'd do if I post a video to YouTube would be to install some kind of, like, email electrification, zapper and nukifier to prevent everybody contacted by anyone.

Atwood: Yeah... so we are { Fill this in }

Spolsky: Yeah, it's not paying the proper royalties to the Beatles anyway. <laughs> We'll link to that from the shownotes. Awesome song, Write in C.

Atwood: That's right, Joel's favourite song. Write everything in C, because Joel does in fact write everything in C, don't you, Joel?

Spolsky: I started using a little bit of C99, the latest version of C, which let you declare variables after you written some statements.

Atwood: Isn't there like a... well no, there's is another version of C++ coming out, like, 0x something?

Spolsky: Yea, C++0x. { Fill this in } they just haven't decided what year it's going to ship yet.

Atwood: I see.

Spolsky: I guess it's either C++09 or nothing, they are kind of running out of 0's.

Atwood: Mmm, yea, as I'm not a C programmer, I don't really keep up with that stuff, but occasionally...

Spolsky: Do you know that I had lunch with Brain Kernighan?

Atwood: Oh right! That's awesome. How did that go?

Spolsky: You know, he told me what he thought was the one mistake of C programming language. Now he wrote the book The C Programming Language but he did not invent C. He work with the folks that did, he invented the language is called Awk... umm, among other things, probably. But it's an awesome book, and he said that probably the only mistake in C was the operator precedence of the bitwise logical operators as compared to the equality operator. He thought that the bitwise logical operators should be higher priority than the equality operator. And other than that he thought there wasn't really a mistake in the language, [and] I tend to agree with that. I'd say that's kind of true.

Atwood: Certainly it's been wildly popular. I mean, C has been the backbone of a lot of programming [languages], so, by that measure, it's wildly successful.

Spolsky: Yeah.

Atwood: I don't think that the criticism that it is a low-level or medium-level language, that was by design, that was the intent of C.

Spolsky: And don't forget the context, it was 78 or something, right?

Atwood: That was a long time ago.

Spolsky: So, you have to put things in context. The stuff in the C that I consider as accidental complexity like stuff you have to manage yourself, like memory management, things like malloc() things for yourself you don't have to do anymore because we have figured out ways to let the compiler and the runtime do it for you with garbage collection... or even with the reference counting like they had in Visual Basic. So the stuff in C that just doesn't have to be in a programming language anymore, but that doesn't mean that it wasn't a good programming language for its time.

Atwood: Absolutely. Now it was hugely influential -- C#, Javascript... and the other languages that looks like C, which... I was kind of bitter about, actually. I was never a big fan of sort of the way select...

Spolsky: That wasn't C, that's Algol, right? C was looking like Algol, I mean, those are all the strutural programming languages that was meant to look like Algol68.

Atwood: Right, but I blame C. Because it was just so much popular. I don't know how popular Algol was, but certainly, as long I've been a programmer, C was like, the touchstone/cornerstone language and it seemed a lot of language decisions like in Java and C# were made so that people wouldn't look at the code and freak out "I don't recognize this! This doesn't look like code I could understand." So they made it look similar to C to reduce the learning pain. I'm a little bitter about that because I always felt that I really dislike the look of C.

Spolsky: Really?

Atwood: Yeah.

Spolsky: So clean and... mechanical?

Atwood: For one thing, the curlies... like when you are ending curly braces, you never know what you are really ending. Just another curly brace -- I guess it's kind of like the Lisp problem, you have paren(thesis), you never really know the parens are closing. This is just a preference, but I'd like a little more verbosity in the ending of blocks, that I know exactly the block that was ended versus just....

Spolsky: That was a weird thing... You know there were programming languages where the end matchs the opening of the block, so like Basic: If... End If. But then again, what if you have nested if's? You still don't exactly know which one it goes to.

Atwood: Yeah, that's true. But it's really just a tradeoff, it's just a preference. It's not written in stone obviously.

Spolsky: I come to like the fact that Python that you just don't write an end: you just unindent, and so the indenting actually reflect the structure.

Atwood: That was actually a cool aspect of Python. I never really work in Python actually at all, but I always respected the choice that it was a brave choice to make the whitespace to carry the meaning and actually enforce the whitespace, and thought that was very cool. And one thing I do agree with -- and coming from a Basic background, which is the language I knew for a long time, I eventually grudging agreed that having carraige return be your line terminator.

Spolsky: That's Unix, that's not really...

Atwood: No, no, what I mean like, so in C# or C, until you put a semicolon in, it's one line.

Spolsky: Oh, I'm sorry. I thought you meant the line separator was the carraige return and  linefeed. You are talking about the semicolon versus just the line end.

Atwood: Right. At first I was bitter about that, because I was like "I have to type the stupid semicolon at the end of every line", but then I realize that it gives you so much flexibility because there's a lot of situations that you get into in Basic where you want to continue the line you have to use some crazy, like, line continuation marker like underscore.

Spolsky: It's not crazy, it's awkward.

Atwood: It's just awkward because the semicolon way, I think, is much better. And I think any language you have to have an explicit line terminator, I don't think it's a good idea to use carraige returns as a line terminator.

Spolsky: Hey do you know what Kernighan's second favourite language is?

Atwood: What's that?

Spolsky: Basic.

Atwood: <laughs> Really? Did he tell you that, today?

Spolsky: Yeah.

Atwood: That's crazy.

Spolsky: No, it's true, he said that's sort of a secondary language. He told me that "Are they really coming up with another version of VB6? 'Cos I really don't like VB.NET."

Atwood: Wow.

Spolsky: Yeah, he has done a bunch of stuff like, for example, he had some kind of a big complicated library that did all kinds of interesting optimizations and it didn't really have a very good UI like input/output thing. So they did the input/output UI through Excel, or you'd just type things into a spreadsheet and then he had this VBA code that basically linked these COM objects that were written in C to interface with them.

Atwood: Cool.

Spolsky: But in his class, I think, he teaches little bit of Visual Basic, a little of COM programming, a little bit of.. . he teaches a lot of other things, a little bit of Awk, a little bit of command line... this, that and the other things. I think it's important for programmers, especially the college levels, to learn a lot of little languages and to use the right tools for the job. There's some stuff that I think like, if you give me a problem that most people would solve it in Perl or Awk, I might solve it with Excel, like in kind of a one time way. Like's it's a problem as you get a big old file and you need to separate the first name from the last name, put the phone numbers, capitals, multiply this by that... Some kind of like column-wise kind of problem, I'd often do it in Excel I know how to do that really fast really easily.

Atwood: I agree with that. I actually talked about that in a blog post before, I see the future of languages as a lot of small languages that are good in specific things. And you'd switch between them in a fluid way, to when you are like "Oh, this is a set-based problem" or "Oh, this is a database problem" or "Oh, this is a text manipulation problem" and you sort of drop in a language that is good in that thing. I'm a big fan of that, I never really liked the idea that, and I think a lot of developers... let me give a specific example, they hate SQL for some reason, like they really don't like using SQL as a language or manipulating it, so they come up with this huge layer of abstraction just to get rid of SQL.

Spolsky: Yeah, it's stuff that they should just probably be doing...

Atwood: Yeah! And to me it's contorted!

Spolsky: { Check this line; probably wrong } As they learn more about SELECT statements, they'd realize that there's a better way to do it.

Atwood: Yeah, I mean SQL has its faults, to be sure, but it's really good: basic set-based data manipulation, I think. I actually like SQL a lot, and I realize that it's not perfect.

Spolsky: Yeah. The biggest weakness, especially for SQL Server, is any kind of interesting string manipulation you tried to do...

Atwood: It's painful.

Spolsky: ... it just falls all over the place. So if you just try and do something where its like splitting of word, ... I don't know, just anything that's inside the columns, there's not enough functions there, like they don't even have the proper, like Left, Mid, IntStr... like the most basic string functions.

Atwood: We actually created a user-defined function to bring .NET regular expressions into SQL Server. You can actually use managed code, it's not super-super fast, obviously, because it has to call to .NET.

Spolsky: How many years has SQL Server been out that they can't put regular expressions into T-SQL?

Atwood: Yeah, I was really shocked, because...

Spolsky: { Fill me in } ... this is such an obvious thing, just go friggin' put regular expression in this there! I know they are not fast, but everybody have to write COM objects now if you want use a regular expression in a SQL SELECT statement? What year it is?

Atwood: Well, it's not a COM object, it's a .NET managed code object. So it's not quite as bad as COM, but there's definitely a huge speed penalty because you are transitioning between those two worlds. Right?

Spolsky: If you do it in COM, then I'd say fast. But yeah, and your SQL queries can crash the whole machine. <laughs>

Atwood: Right.

Spolsky: ... caused by memory or god knows what. Sometimes it's just surprises me like what on earth they could have added in SQL:2008 if it wasn't that?

Atwood: Well, one thing: remember we talked about the Oslow, the modelling language, the DSL... thing we are talking about?

Spolsky: Yeah, I know, and I'm { Fill me in } about it and taped three episodes in which we get all kinds of Oslow modelling archives.

Atwood: Well, maybe what it's partly about, like sometimes, as a programmer you realize that you are trying to solve a problem and sometimes the language itself is sort of getting in your way at some fundamental level. It's not good at X, where X is... like in the case of SQL, it's not good at string manipulation. So the language is getting in your way. So what do you do? You could sort of roll up your sleeve and say "What if we could change the language itself?" And this is sort of going down the rabbit hole and to like Lisp and like Ruby and those languages that you can redefine the language and write constructs that perform the same as the language.

Spolsky: Be careful as somebody's getting out there got an email from... { Fill me in } <laughs>

Atwood: You don't agree with that?

Spolsky: Well, OK. Let's carry on that thought then.

Atwood: OK, that's as far as I'm going to go with this. I think sometimes you want to fix the problems, like, in the language, like you don't want to shell out to another executable or, you know, come up with some other of Rube Goldberg type solution. You wanna...

Spolsky: That's the trouble. The trouble is with the exception of Lisp, languages that are powerful enough to fix the problem in the language don't have these problem. They don't really have to. The exception of Lisp, where you start with language that had nothing and a really good macro capability so that you can extend the language to make it into whatever kind of language you want, so long you like parenthesis, and that's all very cool. But, I mean, the truth is, most of the language I use, either don't have the problems, or that they just aren't extensible enough to fix the problems. To give you a great example, C# is not quite extensible enough... let's say that the problem that I have is I don't like to declare types of variable, the arguments and stuff like that, and I want a dynamic version of the language... maybe that's my problem. Well, C# can't fix that. They are getting closer and closer, they got that bar thing now but it's not really a dynamic language. There's all kinds of places that where you can't... but the language is not powerful enough to fix that part of the language.

Atwood: Well, let me give you a more basic example. In C#, one thing that I find to this day very, very annoying, is when you do String::Format, so you are doing substitution within a string, there's all these cool parameters you can use to figure out, like, what type of date you are going to substitute, what it's going to look like and things like that. All the actual subtitutions themselves are numbered, so you know, {0}, {1}, {2}... which is based on the order of the paramters. Why can't I do named parameters? Why can't I give this a name, like ... you know ... LastName? That'd read so much better than, you know, {0}: it just say {LastName}. And you really can't do that. You can come up with extensions .. and .. and  some sort of workaround-dy kind of stuff, but you can't fundamentally go in and .. and .. you know ... add your own native function that does it that just mysteriously appears ... uh ... in the runtime. Whereas, in a language like Ruby, I think that's pretty common when you see something...

Spolsky: <interrupts> You can do that.  If you wanted to write your own little templating string formatting thing, you could do that in C [sic], couldn't you?

Atwood: In C#? You can, you can write an extension method. But it's not really the same as making it part of the language, it's sort of just exists only in your code.

Spolsky: So it won't work for other people's libraries that are gonna call the built-in format and stuff?

Atwood: No, I mean it does work. You can write your own function and you can call it. But at some level you just want to make a global fix to some of the stuff, because you realize that there are some essential wrongs in the world that you want to right, ... humm ... or at least for some version of it you want to right those wrongs. But it's not really possible in these [languages]... C# mostly being a static language. One cool thing that they are doing actually, they are doing named parameters to functions now. I don't know if you see all of those in C# 4.0?

Spolsky: Uh-oh.

Atwood: You don't like named parameters? So you have to say seven parameters, you can call them in any order as long you specify the name of the parameter that you want.

Spolsky: That's a little bit useful because it makes the code that's calling the function have the ability at least to be a little more self-documenting.

Atwood: That's right, so if you have seven parameters to a call [and] you are using only the last one, you don't have to put in "null, null, null, null, null, null, 3".

Spolsky: Why do you want a function that take seven parameters and you only need to use one of them? Like, why not make it a class that has seven properties and you only have to set the ones that you want to set?

Atwood: Well, this is about flexibility. This is about naming things versus order, this is the same as the string problem, isn't it? For some situations you just want to give things names and call them...

Spolsky: <interrupts> You know what, I liked the named paramters because I actually personally made them put into in VBA because we had a lot of functions in Excel that took a lot of arguments and sometimes you really only wanted to use the third and the seventh, depending what the function did. And usually when you look at the documentation for those functions, it's because the functions were defined in a bizzare way, like the documention would say: "If you provide the first argument and the third argument, this prints a piece of paper. If you provide the second argument and the seventh argument, this emails a letter to the email address in the seventh argument using the formatting of the template of the second argument. Unless you also provide a seventh argument, in which case, it rolls the dice and decides whether to turn off your screen or blow up your Macintosh.

Atwood: <laughs> I agree. I mean a parameter...

Spolsky: <interrupts> At some level you're like "Boy...! If these arguments were names, it'd be so much easier." <laughs>

Atwood: On some level I agree, you'd be asking like if you have a function with seven parameters, the question you should be asking is not "How can I call the seventh parameter very easily?" but like "Why the hell do we have a function with seven parameters?"...

Spolsky: <interrupts> Right.  Shouldn't there be another function that just takes the seventh parameter? Maybe that's just a different ... yeah ...

Atwood: But to me it's a different discussion ... but I do agree with it.  Like when you see stuff like that you should question it of course.

Spolsky: <interrupts> Right. So I didn't ... humm... mea culpa I did not question this and I just knew that we had a lot of long functions that took alot of arguments ... and I insisted on these named parameters being added to VBA, which they did.  And they had to use the colon equals syntax because the equals syntax would do a boolean expression and ... you know ... either pass true or false.  So in order to name the parameters you had to use ... I think ... colon-equals because colon did something else and that's the only way they could do it in VBA.  So ... they did that and lo-and-behold when the converted all the old Excel macro functions into the more object-oriented model for Visual Basic for Applications where instead of having a function called "MoveWindow" which takes like 18 arguments and they all do different things, we now had a Window class and it had a bunch of parameters [sic] and if you wanted to move it, you changed the Left and the Top ... and then it moved.  Humm... and lo-and-behold when we were done making this object-oriented thing, we didn't have any functions left that took alot of arguments, with a couple of tiny exceptions. Humm... and so ... the named arguments feature turned out not to be as important anymore, whereas if you look at the previous versions, like if you look at Word Basic 1.0, every dialog box had a function corresponding to it.  So if it was like the Format Paragraph dialog box and Format Paragraph has 83 little edits in their that do all kinds of things like the line-spacing and the indent .. .and those little things, there would be a FormatParagraph function that would have 83 arguments and the only way that you could specify which ones you actually wanted to change is using the named arguments.

Atwood: Right.

Spolsky: Wow.  We went into that a little too deep ... I feel like we've got everybody now turning us off.

Atwood. <laughs>. Well let's switch gears.  I have something else...

[Spolsky interrupts Atwood with some more write in C]

Spolsky: Oh sorry.  What?

Atwood: <laughs>.  That's a great song.  So we did have one milestone on StackOverflow that I am sorta remiss in talking about the day it happened.  We actually ... had our 100-thousandth question that was posted ... and no, I don't know what the actual question was <laughs> ... keeps coming up ... uh ... it's hard to tell because deletions affect the order.

Spolsky: Well what's the one that has 100000 as it's ID in the database?

Atwood: That happened a long time ago because we share ID's for a bunch of stuff ... for like ... questions and answers share ID's ... so ....

Spolsky: Oh.

Atwood: Yeah.  Those are both records in the same table.  We don't really distinguish a question and answer in that way.

Spolsky: Is that a good thing?  Are you gonna like ... revamp?

Atwood: No.  No, that parts not really changing ... humm ... because the difference between a question and answer is pretty minimal.  I mean questions have a title and questions have tags but other than that, they're pretty darn similar ... so we just have those two fields.  Well, there's actually more than that ... there's a bunch of fields there and all ... but ... it's just easier to have 'em in one table ... we think.  So we don't know which was the exact 100000'th, but that happened sometime on Wednesday February 25th.  That was an important milestone, so congratulations to all the StackOverflow users who got us there, that was nice.  The other thing I talked about in the blog post was that Jeff Dalgas set up this thing called Cacti for us ... have you heard of this?  Cacti is a graphing aggregation tool.  It's an open source tool that will take a bunch of inputs from typically networking and servers and just graph it for you .. automatically.  You can just give it all these inputs and it'll do weekly, monthly, daily ... all this really cool graphing stuff.

Spolsky: Just like to make pretty pictures to put up on the wall in your bedroom?

Atwood. Uh... no ... we wanna keep track of ... what we have it set up to do now is look at our bandwidth.  Because we're a little concerned about bandwidth.  Because we changed hosts and we negotiated a certain amount of bandwidth.  We were concerned that we were exceeding that and it turns out we are actually substantially exceeding what we thought we were using ... 'cause we changed from the old billing method that we had which was purely based on how much bandwidth gets used in a given time period.  I think we had 1250 gigabytes per month which we didn't even really get close to.  But the new billing model's what's called the 95th [percentile] ... burstable ... it's burstable billing.  At the 95th percentile.

Spolsky: That's what almost everybody does.

Atwood: Yeah, and it's a little bit harder to calculate, but it's based on your highest burst period for a certain amount of time.  95th percentile ends up being like 30 minutes.  So if you spend 30 minutes...

Spolsky: <interrupts> If you sample your bandwidth usage ... like every minute over the course of a month and throw away the five percent of the highest ... you know, the highest.

Atwood: I see.  Right.  Yeah, it's kinda weird but Cacti does this like by default, so....

Spolsky: <interrupts> Doesn't your ISP give you like... our ISP gives us things where we can look at these reports on their side on their port, they can tell us all this how much they are throwing at us? { Check this part }

Atwood: They might, but I think Jeff was familiar with this Cacti tool, it's easy to setup. It didn't take him... [it] take him maybe one day for that. And now we can see ourselves. So the answer is we are using the 95 percentile level, about 6Mbits, which ends up being 750KB/s? That was actually quite high.

Spolsky: 750KB/s?

Atwood: KB/s, at our peak.

Spolsky: So that's like a massively saturated DSL line.

Atwood: Err... no, it'd be more than that. That's almost 1MB/s, Joel. I mean think about we're close to get to 1MB/s. Anyway, for what it's worth.

Spolsky: So, 10Mbits. They usually do these in bits for some reason.

Atwood: Yeah, I don't like bits. Like to me I...

Spolsky: <interrupts> And { Fill me in } too, it's not fair.

Atwood: Yeah, bits versus bytes is annoying. I sort of convert to bytes because I think bits are ridiculous bit of measurement. I mean unless they do...

Spolsky: <interrupts> That's what they do. It's standard.

Atwood: I know, that doesn't mean I have to like it. So I put both notations.

Spolsky: Is that just like you the stop bit and the start bit and just a extra bit for the...?

Atwood: I research this a while ago and I don't remember the rationale. I don't remember, you know, the kilobits versus kilobytes. So we actually went through and since we are using more bandwidth than we thought we realize text over... text overflow <laughs>...

Spolsky: What does our site call anyway? <laughs>

Atwood: It's Stack Overflow. But what I'm trying to get to is that most of our site is text, so in a way it is a text overflow. If you look at our site, what images are we serving up? The logo, the vote buttons... I mean these are tiny, tiny images, really. We hardly... we don't do any image hosting locally. So any images you see on the site is from a third party site almost by definition. So the fact that we're going through almost 750KB/s peak of pure text, and compressed text at that, because for a long time we had the gzip religion, which is that everything that you serve should be compressed because it's just an utter no-brainer to compress 'cos you just have ridulous amount of CPU time and, like, tiny trickle of bandwidth. Plus it's just a better experience, right? You get the page faster, it's just better anyway you slice it. The only time not to do it is for some reason every CPU you have is pegged at 100%, then you don't want to be doing compression, but that's such a rare case. So we revisited it: there were a few edge cases we weren't compressing. I found out that in IIS, I think 6 and 7, it will not compress anything that's sent to a proxy, like if it sees "PROXY"...

Spolsky: <interrupts> Oh, IIS6 won't even compress anything that comes from a program.

Atwood: Yeah, 6 has some issues. 7 doesn't - finally, acceptably. But there are some caveats which is that if it sees it's a proxy or if it's HTTP 1.0, it won't compress. Now the weird thing is that a lot of proxies, for some reason, will say "Hey, I'm a HTTP 1.0 proxy."

Spolsky: How it was HTTP 1.0?

Atwood: I know, this is the thing: HTTP is like the oldest standard in the world, almost. It's 10 years old! It's been like 7 years. It's like, can we move on to 1.1? What's the problem here?

Spolsky: Longer than that, I think.

Atwood: Yea, we were really surprised. There was a lot of (a) proxies; and (b) proxies that report themselves "Ay! I'm HTTP 1.0".

Spolsky: With HTTP 1.0, maybe I'm getting this wrong here. I'm pretty sure HTTP 1.0 doesn't have like GET where you give it the URL including the full URL.

Atwood: I looked it up when I wrote this blog post and I couldn't really tell what the difference is between 1.0 and 1.1. I know it was somewhat significant, but I can't remember the details. I couldn't find a good article, probably because the world moved on to 1.1. Nobody gives a crap about 1.0.

Spolsky: No, I don't think you could even host two sites with two different top-level domain names on one machine with the same IP address. I don't think you could do that in 1.0; I think you need 1.1. Because in 1.1 they finally realized you should specify the entire URL in the GET line instead of the... because originally there was an assumption that once you get to a web server, there's only one web server living there, for one top-level domain in DNS. You see what I'm saying? So if you have A.com and B.com running on the same IP address, with HTTP 1.0 you would just connect and say "GET /", and it wouldn't know what to send you, so you couldn't do that. And so HTTP 1.1 corrected that and said, "Now you have to say 'GET http://A.com/'."
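For reference, the mechanism HTTP/1.1 actually standardized for name-based virtual hosting is the Host request header, which RFC 2616 makes mandatory in every HTTP/1.1 request; the absolute-URL form Spolsky mentions ("GET http://A.com/") is what clients send to proxies, and HTTP/1.1 servers must accept it too. The sketch below shows a server choosing a site from those two sources. The site table, the default site, and the route function are all hypothetical.

```python
# Hypothetical table of two sites sharing one IP address.
SITES = {"a.com": "<h1>Site A</h1>", "b.com": "<h1>Site B</h1>"}
DEFAULT_SITE = "a.com"  # hypothetical fallback when the request names no site

def route(raw_request: str) -> str:
    """Pick a virtual host for a raw request and return its page or an error status."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    _method, target, version = request_line.split(" ")
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if target.lower().startswith("http://"):
        # Absolute-form target (what clients send to proxies): the host is in the URL.
        host = target[len("http://"):].split("/", 1)[0]
    else:
        host = headers.get("host", "")
    host = host.split(":", 1)[0].lower()  # drop any ":port" suffix

    if not host:
        if version == "HTTP/1.1":
            return "400 Bad Request"  # RFC 2616 makes Host mandatory in HTTP/1.1
        # A bare HTTP/1.0 "GET /" names no site, so the server can only guess.
        return SITES[DEFAULT_SITE]
    return SITES.get(host, "404 Not Found")

print(route("GET / HTTP/1.1\r\nHost: b.com\r\n\r\n"))
print(route("GET / HTTP/1.0\r\n\r\n"))  # B.com is unreachable this way
```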

Atwood: I know what you mean.

Spolsky: The idea of multi-hosting goes back to, like, 1995, pretty much. HTTP 1.0 literally did not work for that. I'm probably getting this all wrong.

Atwood: <laughs> Well, the point is that HTTP 1.0 is ancient, and yet there are a lot of proxies out there that are just happily telling the world, "Hey! I'm HTTP 1.0! Look at me!" <laughs> And IIS sees that and it's like, "Guess what, you aren't getting compression," and we see quite a bit of traffic from proxies.

Spolsky: And can you... what if you send them the compressed version? Then what would they do, the proxies?

Atwood: That was the other thing. Well, the other ironic thing is that while these proxies report themselves as 1.0, at the same time, in the very same request, they say "give me compressed content." <chuckles> I'm not entirely sure compressed content was invalid in 1.0; I couldn't quite tell. So, anyway, we resolved the proxy issue.

We also had an issue where feeds, RSS feeds, weren't being compressed due to some of the vagaries of ASP.NET MVC, the version we're running, which is still not the latest version, unfortunately. We resolved that, so every bit of text you see or can retrieve from Stack Overflow should now be compressed.

Now, the question you're sort of implying there is whether you can force compression on clients that didn't even ask for it. That's sort of the nuclear option, because you are technically only supposed to serve compressed content if the client says, "Hey! I want compressed content if you can give it to me." So it's a little sketchy to take a client that didn't ask for it and sort of force it down their throat.
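The "asking" Atwood refers to is the Accept-Encoding request header, where a client lists the content codings it accepts, optionally weighted with q-values (q=0 means "not acceptable", and * stands for any coding not listed). Below is a minimal sketch of that negotiation under those rules; the function names and the force flag (the "nuclear option") are hypothetical, and details such as the legacy x-gzip alias are left out.

```python
from typing import Dict, Optional

def accepted_codings(header: Optional[str]) -> Dict[str, float]:
    """Parse an Accept-Encoding value such as "gzip;q=0.8, deflate" into {coding: q}."""
    result: Dict[str, float] = {}
    if not header:
        return result
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0  # a coding listed without a q-value is fully acceptable
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # conservative choice: a malformed q-value means "no"
        result[coding] = q
    return result

def choose_gzip(header: Optional[str], force: bool = False) -> bool:
    """Decide whether to gzip a response for a request with this Accept-Encoding."""
    if force:
        return True  # hypothetical "nuclear option": compress even if never asked
    codings = accepted_codings(header)
    if "gzip" in codings:
        return codings["gzip"] > 0
    return codings.get("*", 0.0) > 0  # a wildcard covers codings not named explicitly

print(choose_gzip("gzip, deflate"))  # a typical browser request
print(choose_gzip(None))             # a crawler that never asks for compression
```

Without the header, a well-behaved server sends the uncompressed (identity) form, which is exactly the case Atwood finds irritating with crawlers that never ask.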

Spolsky: Maybe some guy's command-line web browser thing that they wrote. Not that they deserve it.

Atwood: It is irritating, 'cos I would go into the sniffer and watch the traffic, and you do sort of wonder why these are... And you know, the worst thing is crawlers that aren't smart enough. Like, Alexa's crawler is so dumb it won't even request compressed content. Now, Googlebot does, and I think other well-written bots do, but there's no reason in this day and age to ever send anything except compressed content... HTTP content over the wire.

Spolsky: Is there anybody alive anymore at Alexa? You get the feeling that they wrote that in 1997 and then fired all the people but forgot to turn off their servers. <laughs>

Atwood: <laughs> They are still out there. I read their little...

Spolsky: <interrupts> Yeah, I know... huh?

Atwood: They have a little blog, and I read their little blog and... yeah.

Spolsky: Is it still an IE-only toolbar?

Atwood: Well, yeah. So... I guess let's give some background for listeners who aren't familiar with Alexa. Alexa's claim to fame was that they could tell, for any website, how much traffic it was getting, theoretically.

Spolsky: From IE.

Atwood: From... well, that's the question: how could they do this? And the answer is, they had a browser toolbar, the Alexa toolbar, that was installed in lots and lots of versions of IE. I don't think that's true anymore, but in the battle days, it was in lots and lots of versions of IE.

Spolsky: No, I mean, you had to install it. People would download it and install it, allegedly. I don't think it came with IE.

Atwood: Really? I thought it was bundled into it in some scenarios.

Spolsky: Maybe on some, like, if I bought a laptop from, you know, Walmart and it had 48,000 things in the list. But mostly they just told people to download it, a small percentage of people did, and they decided that was a good enough [sample].

Atwood: Right. They were reliant on essentially sampling, where the sample is people on the Internet who happen to have Alexa installed. They would go to, say, Walmart.com, that information would be transmitted to Alexa, and Alexa would make an assumption: "Well, if one user went, that's representative of N-thousand real users."
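The extrapolation Atwood describes is a division by the assumed share of users in the panel, and it only works if the panel looks like every site's audience. The sketch below illustrates that; the 1% overall rate and the 0.1% rate among programmers are made-up numbers, not Alexa's, chosen to show how a site whose audience rarely installs the toolbar gets under-reported.

```python
def estimate_visits(panel_visits: int, panel_rate: float) -> float:
    """Scale visits recorded from toolbar users up to an estimate for all users.

    panel_rate is the assumed fraction of all users who run the toolbar.
    """
    if not 0 < panel_rate <= 1:
        raise ValueError("panel_rate must be in (0, 1]")
    return panel_visits / panel_rate

ASSUMED_RATE = 0.01      # hypothetical: toolbar installed by 1% of all users
PROGRAMMER_RATE = 0.001  # hypothetical: programmers install it ten times less often

real_visits = 1_000_000
seen_by_panel = round(real_visits * PROGRAMMER_RATE)  # about 1,000 visits recorded
estimate = estimate_visits(seen_by_panel, ASSUMED_RATE)
print(f"real: {real_visits:,}  estimated: {estimate:,.0f}")
```

With these numbers the estimate comes out at about 100,000, a tenfold under-count, because the division assumes the panel rate is the same for every site.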

Spolsky: The data is absolutely meaningless. No reasonable people have the Alexa toolbar installed, so they always under-report if it's an interesting website. I mean, how many programmers need one of these crappy toolbars that lets you... It's like, do you remember when there were websites that let you change the cursor into a little Dilbert? Comet Cursor, remember that? You'd get those websites that'd say, "Would you like to install the Comet Cursor ActiveX control?" If you went to the Dilbert website, your cursor would turn into, like, Dilbert, and his tie would be the pointer.

Atwood: <laughs> You know a lot about this. You know a lot about this, I was going to say.

Spolsky: A lot of people installed that damn thing. It's like it's just a 10KB download or whatever.

Atwood: Yeah, I guess. I always had the association of Comet Cursor with like... spyware and malware and stuff like that.

Spolsky: Right, I don't think it was... necessarily malware. Or maybe it was.

Atwood: Maybe early on it wasn't. But I think it quickly became one of those things where the business model devolves so rapidly.

Spolsky: Imagine, they're like, "We could take over half of your screen and show you Viagra ads, ... ads { << Check + Fill me in } as is, from now until the end of time, and prevent you from ever uninstalling it; then at least we could make a buck."

Atwood: Yeah. That's a whole unfortunate part of computing history.

[34:50]

....

[47:03]

Atwood: You've illustrated an important principle here, which is that "architecture" implies divorcing the people who are doing the work from the people who are making the decisions. This is always, in my experience, super, super dangerous. So to the extent that the architecture group or the architect is not really with you in the trenches, helping you do the work, they're not going to make the right decisions.

Spolsky: They just don't have any of the information.

Atwood: They don't have any of the context, any of the information. I think that is the root problem I was trying to get at.

[47:31]

....

[68:10 ends]

Outro, advertising

[69:20]