Podcast 079

Revision #8, 1/11/2010 5:37 AM, 75.22.62.84: "15:00 - 20:00 completed"
Revision #9, 1/11/2010 11:33 AM, 78.105.3.37: "transcribed discussion following question about take-home coding assignments"
Spolsky: I was just in a meeting last second with this guy, he made a StackExchange called ClimateDeal. Have you heard about that?

Atwood: I haven't heard of it.

Spolsky: climatedeal.org. It's a climate change StackExchange and I guess there's lots of money in the NGO business.

Atwood: Really? I didn't know anything about that. And another thing I don't know anything about that somebody mailed me about and I want to mention since we're on the topic of StackExchanges is an astronomy StackExchange. Not just astronomy but...

Spolsky: Like I'm a libra and you're a leo therefore...

Atwood: Actually Joel, I'm a capricorn though, and capricorns are very stubborn as you know.

Spolsky: And I'm a leo. I know that you're stubborn, I don't know about capricorns in general.

Atwood: That's the crazy thing about astrological science. Everybody who is born in a specific month has the same personality traits. That doesn't even make any sense at all. It doesn't even pass the sniff test of like: Is this sensible? No. This is like turning lead into gold ridiculous.

Spolsky: What's real is biorhythms.

Atwood: This is a real thing, it's actually astro-tech, I guess it's technical astronomy: interfacing with telescopes and astronomical instrumentation. It's answers.ascom-standards.org. I guess they had an existing site.

Spolsky: Terrible URL...

Atwood: So if you're into astro-tech...

Spolsky: Whoa, look at all these people, look at all these questions. Is ASCOM a thing? I think it might kind of be a thing because there are people asking about ASCOM on here.

Atwood: Yeah, I think it is a thing. This is, when we created StackExchange we were looking at niches and I'm a big fan of these little niches on the internet, I think it's wonderful.

Spolsky: We got to tell the listeners. ASCOM is a many-to-many and language-independent architecture supported by most astronomy devices that connect to computers. So it sounds like it's like the MIDI of telescopes.

Atwood: Yes, that makes sense.
Spolsky: That's what it sounds like.

Atwood: But only if it can play music Joel.

Spolsky: Except that it doesn't play music, it plays telescopes. And some of the names here I recognize from StackOverflow so I think that there's some overlap. Maybe not, maybe it's just because there are people named Chris and Bob and stuff. It is sort of interesting how these StackExchanges, like the first level and the second level of spreading. I asked Jose, who's here from ClimateDeal, how he heard about StackExchange and why he decided to start using it. He said they're building a whole organization around StackExchange kind of and they're going to promote to other climate change type of organizations and other "green" organizations that they know. And I asked him how he found out about it and he's like, "We were working with these programmers and they suggested that we look at this and they use StackOverflow." Like all the programmers on StackOverflow have an obligation to tell other industries and get them excited about the StackOverflow vision for the future.

Atwood: Anything to keep people off of PHP. It's like keeping people off drugs, it's the right thing to do.

Spolsky: Right...

Atwood: And when I say PHP I mean PHPBB. I'm specifically talking about PHPBB.

Spolsky: You know that 99.999% of PHP is not PHPBB.

Atwood: I know. There's a ton.

[00:05:00]

Spolsky: Let's keep people off that too.

Atwood: Well, not necessarily, I've kind of resigned myself to a world of PHPBB at this point. So, one thing I want to talk about is over the holidays I was able to contact the person who created the Markdown... there's 2 Markdowns: Markdown is the markup language that we use on StackOverflow. There's 2 implementations that we have: one is for the client-side preview, which is the wmd control which we had to reverse-engineer... the whole story is in a previous podcast. And then there's the server-side implementation.
So one of the difficulties we ran into was these are subtly different.

Spolsky: So the previews are not matching when it shows up on the site?

Atwood: Right. Over the holiday I did improve the preview, like the main areas that were just kind of oversights really. Like we changed some of the rules about bold and italic and for the most part they match now except for really weird edge-cases. I got rid of all the obvious mismatches.

Spolsky: Yeah.

Atwood: And actually I got help from Jacob, I'm going to mispronounce his name I'm just going to call him Jacob, who runs the MathOverflow StackExchange. He was very helpful. He was helpful in sort of troubleshooting that. They used a lot of weird syntax on MathOverflow.

Spolsky: They do amazing things with math notation basically. They use LaTeX.

Atwood: Yes, but, we've talked about that before, but in addition to that they just have ASCII notation as well, and the ASCII notation can be problematic because you're putting characters in sequences that are just really, really uncommon in any kind of normal text at all. So they were running into a lot of edge-conditions as well so he was very helpful so I do want to give a shout out to Jacob in that regard.

Spolsky: Have you looked at MathOverflow lately? It's absolutely insane. Look at all these tags, they've got tags with little dots in them. Why is that? Oh, I think that the dot is like the...

Atwood: Don't you understand Joel? I'm kind of like allergic to math so it's not really good for me to be around math.

Spolsky: Look at the site and look at their tag cloud over there. What they're doing is, they've got like 2-letter abbreviations for tags. So it's like a 2-letter abbreviation, a dot, and then the name. So it's fa.functional-analysis. Or ra.rings-and-algebras.

Atwood: Look what you've done Joel, you've made me go to a math site.

Spolsky: Are you listening to a word I'm saying?

Atwood: I am listening! I'm just trying to tell you...
Spolsky: Look at the tags, this is a general idea that they seem to have invented here. So now if you want to look at probability stuff you don't have to type 'probability' you just type 'pr.' and then it only has one match. You see? Get it?

Atwood: The Hawaiian Earring?

Spolsky: Look at the tags!

Atwood: I am looking at the tags, I see what you're talking about. I've processed that.

Spolsky: You see what they've done? They have this cool feature, that you can just type like 3 letters and it'll only have a unique match.

Atwood: Very rapidly yeah. Although we do match anywhere.

Spolsky: I know but you'll have multiple matches because those 2 letters... because they put that little dot in there, this means that if you just know the 2-letter code for something, you're just going to hit the 2-letter code and you're done.

Atwood: Yep. Cool. MathOverflow's great, it's been hugely popular. There's definitely been demand for it from way back.

Spolsky: I don't understand anything. Like nothing. Hawaiian Earrings, I know what those are.

Atwood: I'm glad there's people smart enough to do this advanced math because I really, really suck at it.

Spolsky: I'm voting that up.

Atwood: Wow, you can vote on MathOverflow?

Spolsky: No, it didn't let me. I need to talk to Aaron, I want to be able to vote on MathOverflow.

Atwood: So anyway, MathOverflow is fantastic and Jacob's the guy who's been helping us out on that. The server-side implementation was where I wanted to do some additional work and it wasn't actually an open-source project. I don't think it was intentional, but the original author did not present it under an open-source license, which means, as you know: if it's not open-source it's copyrighted by default. So I contacted him and he was totally cool about it and he granted the copyright to me. So I was then able to turn around and open-source that and put it up as Markdown Sharp on Google Code and I'll link that in the show notes.
I was able to make quite a bit of progress. You know, we're a little bit down on unit-testing, but this is like a textbook example of where you want unit-tests. One of the first things I did was put in unit-tests. Unit-tests for Markdown are pretty simple, they're basically just input and output. You have an input file which contains Markdown and you pass it through the processor and the output should match the reference.

Spolsky: This is an awesome example of where it's straight text transformation, it's so easy to do automated tests, unit-tests, TDD and all that kind of stuff.

[00:10:00]

Atwood: It's brilliant, because I found just an unbelievable number of bugs... oh my gosh I found a lot of bugs. Bugs, like, in our port, just accidental bugs. Literally just like an extra space in the regex in the wrong place.

Spolsky: Right.

Atwood: And it was causing it, it wasn't causing it to break, but it was causing like failure-to-match and that was causing the output to be subtly wrong. Not in a way that really broke anything per se but it was wrong. I fixed that, and there's a lot of bugs from the actual implementation, the Perl implementation. The original implementation of Markdown is Perl.

Atwood: Yes, so I sent you a link. You should click on that link now and look at that.

Atwood: Yes, there's somewhat of a tradition, unfortunately, of writing Markdown parsers in regexes. That definitely starts to have a downside. I'm a huge fan of regular expressions, but there's a point where it becomes extremely complicated code. I haven't been able to get anyone to really help me. Now, to be fair, this gets into issues of like running an open-source project. Now I am "running an open-source project." It's a very small one.
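The input-and-output style of test Atwood describes can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the Markdown Sharp test suite: `render` is a hypothetical stand-in for the processor under test, `render_with_stray_space` is a hypothetical variant carrying the kind of "extra space in the regex" bug mentioned above, and the in-memory `CASES` table stands in for pairs of input and reference-output files.

```python
import re

# Hypothetical stand-in for the Markdown processor under test. It only
# handles **bold** and *italic* so the example stays self-contained.
def render(markdown: str) -> str:
    html = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", markdown)
    html = re.sub(r"\*(.+?)\*", r"<em>\1</em>", html)
    return f"<p>{html}</p>"

# Hypothetical buggy port: one stray space inside the bold pattern.
# Nothing crashes; the pattern simply fails to match ordinary input.
def render_with_stray_space(markdown: str) -> str:
    html = re.sub(r"\*\* (.+?)\*\*", r"<strong>\1</strong>", markdown)
    html = re.sub(r"\*(.+?)\*", r"<em>\1</em>", html)
    return f"<p>{html}</p>"

# Each case pairs an input document with its reference output. In a real
# suite these would typically be pairs of files on disk.
CASES = {
    "plain": ("hello", "<p>hello</p>"),
    "bold": ("a **b** c", "<p>a <strong>b</strong> c</p>"),
    "italic": ("a *b* c", "<p>a <em>b</em> c</p>"),
}

def run_cases(cases, processor):
    """Return the names of cases whose output differs from the reference."""
    failures = []
    for name, (source, expected) in sorted(cases.items()):
        if processor(source) != expected:
            failures.append(name)
    return failures

if __name__ == "__main__":
    print("reference port:", run_cases(CASES, render))
    print("buggy port:", run_cases(CASES, render_with_stray_space))
```

The buggy variant still produces output for every case, which is why a comparison against reference output catches it where a simple "did it throw?" check would not.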
And I solicited help and a lot of people have contributed patches and stuff and I really appreciate that, but one thing I've noticed is there's a lot of "painting the bike shed" that goes on versus the core problem of when you have this dense mass of code that's just a bunch of really complicated regular expressions; although, some of them aren't too complicated, but the flow of the program is very regex based. People are not really able to help you very much. That's what I've seen.

Atwood: They can't or they don't want to, but the really hard part of the code I'm not getting a ton of help with.

Atwood: Let me clarify, you're looking at the PHP Markdown. Now one of the problems- let me give you a little background: when I mentioned we have a reference Markdown standard, that's kind of the problem with Markdown. It is kind of a standard, John Gruber laid out the specification, but there's a lot of edge-conditions he didn't cover.

Atwood: There's just a lot of bugs. I mean a ton of bugs.

Atwood: I don't know. I don't think you need to be a computer scientist to write code.

Atwood: It's not really a parser.

Atwood: No. And you definitely- as I said, this is the PHP implementation. What I found is that the PHP implementation is actually much better than the Perl implementation. Even the- there's some secret unreleased versions of the Perl implementation.

Atwood: The thing about the Perl implementation is it's really close. But it had edge-conditions that are super-super-hard to get rid of without writing a lot of complicated code. I think it's the classic example of Perl code in that it worked for the 95% case, but once you start looking at the unit tests that fail, to fix them is this rabbit hole of like-

[00:15:00]

Spolsky: I'm sorry, I didn't mean to criticize anyone in particular, it's just that the choice of- you know a lot of people see a problem with Markdown, and they say "Ah, I need to search for certain things and replace them with other things."
And I think that that's kind of- that the real way to look at that is- I mean you can do- you can go down that path, of regexps and I am searching for things and replacing them with other things, but when you do that, you're not really keeping track of what state you're in as you go through the tree and you make mistakes and there are edge-conditions and there are things that people can insert that will cause you to output things that are very, very invalid. And I think that somebody who has taken a compilers course would say "Oh, I have text that I have to translate into a different form, I need to lex it and parse it, and then I need to create an abstract syntax tree, and then I need to write out that other form." This is not a lot of code and you wouldn't get a lot of code if you did it that way actually.

Atwood: There was a funny post on Reddit, a reaction to the blog post that I put up and he said "It became a tradition to have crappy implementations of Markdown." Because the reference implementation was a certain way so it kicked off a lot of clones, because people all just copy this. It really does work for the 95% case. The edge-conditions are not terribly bad.

Atwood: But fixing them is just unbelievably difficult and that's where you get into "If you want to do this the right way," then it is difficult to do with regular expressions.

Atwood: It's possible, it's just that the code becomes very, very difficult to work with in my opinion. I'm certainly seeing that with the PHP implementation where they fixed a lot of the problems with the Perl 1.01 and the 1.02, the unreleased version. He had a different parser there, and it's really complicated.

Atwood: I think there are actually, but the problem is I just did a cursory look. My goal was really simple, I sort of fell down the rabbit hole as I got- okay I'm just porting code, I'm not trying to write new code, that's not really my goal here. I just want to make sure I match the reference implementation.
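The lex, parse, build-a-tree, write-it-out pipeline Spolsky outlines can be illustrated with a deliberately tiny sketch. It assumes a made-up subset of inline syntax (only `*emphasis*` and `**strong**`) with simplified rules that are not Gruber's Markdown rules, and every name in it (`lex`, `parse`, `render`, `to_html`, `Node`) is hypothetical. The point is only that an explicit stack of open elements is the "state" he says search-and-replace loses, so mis-nested or unclosed markers fall back to literal text instead of producing invalid output.

```python
import html
import re
from dataclasses import dataclass, field

# Lexer: split the input into "**", "*", or a run of ordinary text.
TOKEN = re.compile(r"\*\*|\*|[^*]+")
MARKERS = {"*": "em", "**": "strong"}
MARKER_FOR = {"em": "*", "strong": "**"}

def lex(source):
    return TOKEN.findall(source)

@dataclass
class Node:
    kind: str                        # "doc", "em", "strong", or "text"
    text: str = ""
    children: list = field(default_factory=list)

def parse(tokens):
    """Build a tree, tracking the currently open elements on a stack."""
    root = Node("doc")
    stack = [root]
    for tok in tokens:
        kind = MARKERS.get(tok)
        if kind is None:
            stack[-1].children.append(Node("text", tok))
        elif stack[-1].kind == kind:
            stack.pop()              # closes the innermost open element
        elif any(node.kind == kind for node in stack):
            # Would close an outer element across an open inner one:
            # keep the marker as literal text rather than mis-nest tags.
            stack[-1].children.append(Node("text", tok))
        else:
            node = Node(kind)
            stack[-1].children.append(node)
            stack.append(node)
    # Anything still open never got a closing marker: demote it to text.
    while len(stack) > 1:
        node = stack.pop()
        literal = Node("text", MARKER_FOR[node.kind])
        stack[-1].children[-1:] = [literal] + node.children
    return root

def render(node):
    """Write the tree out as HTML, escaping text on the way."""
    if node.kind == "text":
        return html.escape(node.text)
    inner = "".join(render(child) for child in node.children)
    if node.kind == "doc":
        return inner
    return f"<{node.kind}>{inner}</{node.kind}>"

def to_html(source):
    return render(parse(lex(source)))

if __name__ == "__main__":
    for sample in ["a *b* c", "**a *b* c**", "*a **b* c**", "2 * 3 < 7"]:
        print(repr(sample), "->", repr(to_html(sample)))
```

Because tags are only ever written by `render` from a finished tree, every opening tag in the output has a matching close, whatever the input looks like.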
You have 2 problems: one is the reference implementation kind of sucks, it's not really right.

Atwood: It's not "referency" at all. So then you look at the alternative implementation which is PHP Markdown, honestly the most mature one, the one that's maintained the best, the most accessible, the one that I could find, and it follows the lead of the original implementation.

Atwood: For PHP it's quite good.

Atwood: Well, I sent Joel a link and I'll put this link in the show notes but that's the link to the HTML detection regex which is like, I would say on an average large programmer's monitor, it's a regular expression that's probably 2 to 3 pages long. And it's used with whitespace, I mean it's broken up, it's probably the most complicated regular expression I've ever seen that's actually a real thing and not a joke.

Atwood: I know, but I consider that one kind of a joke. Nobody hopefully really uses that. But this was written by a human being and it's commented and uses whitespace and all the right things. Just to show you how complicated it is, if you specify compiled on that regex it does not help, it actually hurts in this case because the regex is so complicated. .NET freaks out on my machine for about 5 seconds, like trying to compile this thing.

Atwood: It works, it does compile it, but it takes like- it literally just freezes; your CPU usage goes way up, and it kind of drives the regex compiler a little bit crazy I think. So it's quite a sight to see. It really highlights to me one of the big weaknesses of regular expressions which is matching pairs.

Atwood: Yeah, that's really a pain in the butt. And that's what a lot of the hairiest code is, balanced matching.

[00:20:00] [...] [00:58:19]

Listener: Hi Joel and Jeff, I have a question about code samples. Part of the Joel Test is writing code during the interview. How do you feel about the companies that give candidates take-home coding assignments...

Spolsky: ...waste of time.
Listener: ...also if you are a candidate, how much time should you spend producing a code sample?

Spolsky: ...all of it.

Listener: ...As a job seeker I'd rather be able to point to some code I've written for some open-source project than to spend Saturday writing a sample program for just one company. From the company's perspective I can see the value in having all candidates answer the same problem. Do you think that take-home coding assignments are an effective means of finding smart candidates who get things done? Thanks.

Spolsky: OK, so I answer...

Atwood: Wait, first who does take-home assignments? It's for a job?

Spolsky: Very common.

Atwood: Really? I have never...

Spolsky: Yes.

Atwood: This just sounds ridiculous. Frankly. Ridiculous. Tell me why this even would make sense to any pointy-haired manager?

Spolsky: Because they get two hundred resumes for the job, and they are like "Oh, shit, 98% of these people cannot program". And so they are looking for some kind of a screening mechanism to quickly eliminate that 98%. And so they give everybody a take-home programming problem. And the problem is that the people that cannot [solve it], cheat. No offense. But they do. They cheat.

Atwood: Right...

Spolsky: Maybe not by getting someone to help them, but they spend more time on it than they say they spent on it, and they are getting answers and help on StackOverflow, and they cut-and-paste from here and there and other things, and they just... You're not gonna come up with a question that is so clever that it isn't already in forty-seven places on the internet...

Atwood: Wait, wait, I have a thought about that. So it would drive away exactly the kind of people you would want. The people that would realize, like me, that this is ridiculous.

Spolsky: Well, the truth is that there are plenty of completely legitimate employers that do this because they think this is gonna, you know...
I mean maybe you are trying to get a great job at whatever the company is, and this is just a crazy thing that they got in their head to do, but it's a good company to work for, mostly, and it is in your area, they pay well, and you'd like to work there. Atwood: So you think sometimes you have to make through the bad kind of candidates and good kind of candidates. So this is really not gonna accomplish your goal at all. Spolsky: Exactly. It will accomplish nothing in terms of distinguishing... Yes, that's what I am saying. Atwood: OK, fine. I totally get that. Spolsky: Asking them to code in front of you is a way for you to figure out if they're smart and to see that they really know how to program... Atwood: Wait, wait, wait, I got an idea. Could you do this... I mean do they physically have to come and code in front of you? Spolsky: No, you can do it... Atwood: It doesn't scale! Spolsky: Well, scale... It doesn't have to scale. How many people are you hiring? Probably some fraction of the number of people that are working for you. Atwood: This is an additional pass, they are trying to do some filtering. Spolsky: Yeah. So we do that with Copilot or with Etherpad, where we basically use Copilot to get onto their computer and ask them to open Notepad and we do it over the phone. Atwood: Ah, that's right. Spolsky: We do want to see them doing it. Because the truth is we want to hear them think, we want to see them think, we want to see how quickly they do things, we want proof that they can do it. Because anybody can generate code for a million hours. To get a job they will be able to somehow find a way. Do you think they are not asking their friends for help when you give them a little programming assignment? Especially the people that can't get [inaudible] ... Atwood: So it's the observation part that is really missing here. That's the key. Spolsky: Right. 
And that's where you get the insight into whether this person is smart or not and how they really think, what they really understand. I recently asked an intern candidate a very, very simple programming problem. I don't think I am burning it, because I don't plan to use this problem very much anymore. Given an array of numbers and given a pointer into the middle of this array (it doesn't have to be a pointer, [just] an element in the middle of the array, like the 37th element in the array)... Atwood: Wait, wait, Joel, can I use regular expressions? Spolsky: No! Atwood: [laughter] Spolsky: You need to write a function that determines if the value of all the elements in the array summed up to the left of that pointer is equal to the value of all the numbers to the right of that pointer. If you get [...] 100 elements in the array and if I give you the number 37, I need you to tell me if the elements 1 through 36 summed up are the same value as 38 through 100. That's really all there is to it. And there are more and less efficient ways to do this. This is a very easy problem, it is just meant as a very initial are-you-a-complete-retardo, can-you-think-about-programming... You can use any language you want. What was interesting was that we very rapidly got into a conversation with this intern candidate about the performance aspect of it, will that be efficient. He had done it in Python, and I said "so how does Python actually implement that, is it going to be checking the length each time in a loop or...", is there a more efficient way to do this, or less efficient way to do this. The first code that he was writing was making a ridiculous number of copies of the array, so that was never going to be efficient. But he figured that out right away and just in having a conversation I realized that he was certainly smart and knew what was going on behind the scenes. So if I just asked him to write the code and send it in, first of all he would have had plenty of time to debug it, test it... 
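[A minimal Python sketch of the array problem as Spolsky states it, not the candidate's actual code. The function name `is_balanced_at` is hypothetical, and it uses 0-based indexing, so Spolsky's "37th element" would be index 36. It walks the array by index rather than slicing, which is one way to avoid the extra array copies mentioned in the discussion.]

```python
def is_balanced_at(values, index):
    """Return True if the elements before `index` sum to the same
    value as the elements after it. The element at `index` itself
    is not counted on either side. `index` is 0-based (an assumption;
    the spoken example counts from 1)."""
    if not 0 <= index < len(values):
        raise IndexError("index out of range")

    # Sum everything to the left of the chosen element.
    left = 0
    for i in range(index):
        left += values[i]

    # Sum everything to the right of the chosen element.
    right = 0
    for i in range(index + 1, len(values)):
        right += values[i]

    return left == right


if __name__ == "__main__":
    print(is_balanced_at([1, 2, 3, 9, 6], 3))
    print(is_balanced_at([1, 2, 3], 0))
```

[Writing `sum(values[:index]) == sum(values[index + 1:])` would be shorter, but each slice copies part of the list, which is the kind of hidden cost the interview conversation turned to.]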
This way, I saw him do it in two minutes. [...] This intern was hired, BTW. But here is a different story: we had a [...] candidate who came to apply for a job at Fog Creek and I gave him an even simpler problem [...]: given a point, determine if it is inside a rectangle. Atwood: Uhm. Spolsky: And he did it, but it took him like 45 minutes. It was absurd how long it took him. Really, very, very, very bizarre. So we didn't hire him for that (and for another reason)... Atwood: His hair? You didn't like his hair? Spolsky: No..., he did not,... hm, actually, if you don't like the hair that's a good reason not to hire! Atwood: [laughter] Spolsky: Later I saw in his blog, that he applied for a job at a very large company that does software development and normally has a reputation for being good at interviews. And they couldn't be bothered to fly him up for an interview or whatever, so they just asked him to e-mail them the answer to a problem which, lo and behold, turned out to be almost exactly the same as the one I had given this person! Atwood: [laughter] Spolsky: By coincidence! And he e-mailed it in and he got a job! [...] I did not hire him pretty much on the basis that it took him 45 minutes to do something that he should have been able to do as fast as he could write. And it's not that he could not get the answer. It's that it was such a freaking struggle for him to do something that easy, that he was not going to be able to do something complicated. That was my feeling at the time. In this particular case the guy was kind of inexperienced and he later became a great programmer. But not in [inaudible]. Atwood: I think the lesson here about the observation is the important one. You have to give people a task and observe them doing the task. Spolsky: That's what it's all about. Atwood: Otherwise you are just looking at the output. And there are so many variables there, right? Spolsky: Yes, you might [inaudible], but it's really a question like: was it time limited? 
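[For reference, a hedged Python sketch of the point-in-rectangle problem. The podcast does not say how the rectangle was represented, so everything here is an assumption: an axis-aligned rectangle given by its `left`, `top`, `right` and `bottom` edges, with `left <= right` and `top <= bottom`, and points on an edge counted as inside. The name `point_in_rect` is hypothetical.]

```python
def point_in_rect(px, py, left, top, right, bottom):
    """Return True if the point (px, py) lies inside the axis-aligned
    rectangle with the given edges. Points exactly on an edge count
    as inside (an assumption, the problem statement does not say)."""
    # The point must fall within the horizontal span and the vertical span.
    return left <= px <= right and top <= py <= bottom


if __name__ == "__main__":
    print(point_in_rect(5, 5, 0, 0, 10, 10))
    print(point_in_rect(11, 5, 0, 0, 10, 10))
```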
Did you give them 30 minutes to do it? Did you give them 15 minutes to do it? Or did you give them a weekend to do it? That's a big difference, because that's the difference between good programmers and bad programmers a lot of times. Atwood: I guess also observing the thought process tells you more than just looking at the output. Spolsky: Yes. Atwood: Although it is scary, though. If you think about the way the media works, blogging and stuff... What if your process is completely broken? All people see is the output and, you know, "he must be a brilliant writer"... Spolsky: Right... Atwood: ...but the process is completely broken. Would that even matter? Spolsky: Hm... Atwood: I don't know, it's kind of weird if you think about it. Spolsky: Here is an example where it matters: when you cheat, if your process involves plagiarism, then it does matter. If it takes you forever to write anything, when all you can do is crib someone else's notes and write... you probably had people take one of your blog posts and just paraphrase it, as if it was their own blog post. Atwood: That's what I do, Joel! On my blog! Spolsky: Yeah, that's true. [laughter] Atwood: [laughter] Spolsky: But you are quoting, you are just quoting... Atwood: [laughter] Perfect! But I totally agree, it's all about the process. And also you have to work with this person. That's also what you are observing - what it is going to be like to work with that person. If this person is taking 45 minutes to do something that is trivial, you are going to be incredibly frustrated, if you ever have to work with this person. Spolsky: Yes, or if you need them to get stuff done in a reasonable amount of time. Atwood: Yes. Spolsky: So this is a crazy way to do interviews, but it is pretty common as an attempt to do first filtering, but I just don't believe that it works (not having actually done it). Atwood: It sounds like it's pretty bad. 
Spolsky: There is a company that has a little website that conducts programming tests for you on the internet... let me see if I can find them... Atwood: If you find it, mail it to me, I'll put it in the show notes. Spolsky: Oh, here it is, Codility. Let's go there: codility.com. You go there and it makes a little programming test, and you can tell it what language you speak, so it has English, Chinese and, I don't know, Hindi? I am not really sure, sorry... Polish, I think. [click] "Take a free test". Ah yes, English, Polish and Chinese. [...] You are given a programming problem, you can do it in Java, C++, C#, C, Pascal, Python and PHP, which is pretty cool, and you have 30 minutes. And it gives you an editor in a webpage. And you've got to just start typing your code. And it's going to time you, basically you have to do it in a certain amount of time. And it actually runs your code and determines the performance characteristics of your code. Atwood: Wait, wait, wait, do you think it works? To me it lacks the whole human element that you said was so important before... Spolsky: Right. Atwood: Are you for this or against this?... Spolsky: Hm, uhm, hm, khm... It has a time limit, that's a good thing. It can be cheated on, so you pretty much have to bring people on to do this, and they say they developed it--it is sort of interesting that the test is available in English, Polish and Chinese, because that says a little bit something about why they developed it. They developed it because in certain markets when you try to hire a developer you get enormous numbers of applicants, all of whom are great on paper, [...] many of whom are unbelievably unqualified. And so that first screening is very important. When we do the first screening based on resumes, we really can eliminate a lot of bad programmers, just by looking at a resume, believe it or not. [...] 
Apparently in China or India you put on an ad for a programmer and you get three thousand applications, all of whom got absolute top marks at the very best universities in India, who will come in and most of whom, it will turn out, don't actually know how to operate a computer. And so they developed this test for the purpose of being the first screening and I think it would probably work if you did it as a first screening in person, so that they have to come in and sit down in front of a computer and do the test at the computer so that there is no cheating. And you just use it simply as a way to get to the next stage and it is very similar to what we are doing in the first Copilot interview except nobody is really watching you [...]. You still have to do that stuff later, but this is just the way to reduce that pile of 3000 applicants down to 30 in a moderately cost-effective kind of way. Atwood: Uhm... Spolsky: So the question was also about: "people ask you to submit a code sample". And that's kind of reasonable, although not really. I think everybody can come up with a code sample that looks pretty good. Most people will come up with code samples of things that they have done before. Just saying: "look I contributed this to an open-source project and here is some code that I wrote for them"... I don't really know how much you learn from reading that. Atwood: I will say, that I have done it in the past--I have done some interviewing, but not nearly as much as you--but I did find that helpful... Spolsky: To look at code samples? Atwood: ... because a lot of programmers will just refuse to give you any code, "I can't give you any code", and that is a flag. They can't give you any code that they are proud of? The way I phrased it was "give me any code that you are proud of, some of your best, most interesting work." And if they come up with nothing, which has happened to me, I was like "hm, OK, well, that's done..." :-) How can you not have anything? 
Spolsky: Well, sometimes they think that it's all owned by their employer and they are not allowed to give it to you. But then again, why aren't they doing anything outside of work? I don't know... you should have some projects... Atwood: I guess [so]... I think no person owns you, you are not a slave... [inaudible] Spolsky: No, but this is completely legitimate for them to say "I can't give you code, because it is owned by my employer". It is. Atwood: I think it is a flag if they are not willing to overrule their own employer and... Spolsky: No. Oh, this is a good one. It is a flag if they are willing to break their employment contract, in which they specifically agreed to keep the stuff private. Atwood: They aren't gonna share like trade secrets! I am asking that they show me something interesting. And there is nothing that you can show me that is interesting which is not like a massive trade secret? It's like copyrighted?... Spolsky: I don't know. They have a point. I would think that it is slightly a flag if somebody is working for an employer and they just [inaudible] code directly from employer without that employer's permission. Atwood: Oh, take like a code fragment, not the entire compiling application! Spolsky: Yes. Atwood: 10 lines of code, really! Spolsky: Yeah. Then just write something for this purpose. Atwood: Yes. That's the thing. Generally the code you have written for your employer is code you could have written for yourself anyway. Isn't it? Spolsky: Yes, because I am always writing code for a nuclear power plant for my own personal use. Atwood: Oh, you did loops and increments... Good Lord!... Spolsky: The trouble is that's not so interesting, showing a bunch of loops and increments. You want to get something algorithmic, and meaty, and cool. Atwood: I guess. Spolsky: I don't like the whole idea, but on the other hand, having been on the other side with graphic designers, just being able to look at their portfolios--that's awesome. 
I could not live without that. But then again, that's just what they do, but with programming they don't really have their portfolios, so to speak, because almost nobody can do anything by themselves. Maybe with accomplished programmers you have to have some open-source projects, some things that you have built, you have to be able to tell me "I have worked on that product". But most of what we do as programmers is working on big teams, it is hard to tell what you did and what other people did. The number of people walking around that wrote Microsoft Word or Microsoft Excel is astonishing. Atwood: Right. We should probably wrap it up, it's a little long here. Spolsky: It's enough, it's enough... Nobody wants to listen to this crap anyway... Atwood: [laughter] Atwood: Spolsky: [...]
|