 

Podcast 080

Revision #11, 1/24/2010 10:32 PM
User: "add listener question about doc"
Tags: (None)


Revision #12, 1/29/2010 2:53 AM
User: "fixed typo"
Tags: (None)


Jeff:     I was just sort of ranting a little bit on Twitter about, eh.. my experience with GitHub has been pretty negative so far. Part of it is just me but..

Joel:    Yeah, they probably hate you there.

Jeff:     No, no, no. I think what they do is cool. It's just I find it.. We're getting a lot of noise off of GitHub. So what's hosted there is the WMD editor that we had to reverse engineer that Dana Robinson did pretty much all the work on. Dana has been tied up with some other stuff. In the meantime, a bunch of people have forked off the project, which is fine, I don't really have a problem with that, but it makes it really hard to figure out like what is going on with the project, because you have Dana working, then stopping for an extended period of time. Then you have a bunch of people that sort of picked up and started doing semi-random stuff.

Joel:    Wait, you just let them check it in? Or what, what, where, what?

Jeff:    What do you mean? No, no, no. They have their forks, their own private forks. Well, not private, that's actually the problem, I think. So anyway, in talking to people on Twitter about this is like, everyone that approaches WMD wants to help, wants to figure out, they have this immediate hurdle of like, well, what's the current version, you know, and if you go to derobins's branch you'll see here's derobins and he did some work, and then you will see a bunch of other people's work and you're like, which one of those is the good one?

Joel:    That's the nature of any open-source project. There's supposed to be some kind of, like, shall we call it a parent, to use a metaphor.

Jeff:    Yes

Joel:    Who exhibits some kind of parenthood.

Jeff:    Yes, now that's definitely true, and I agree with that. But I also think that there's a symptom here, in that, part of GitHub's model is they're not free, they're pay, which, again, I have no problem with. I have no problem with paying for stuff, I have no problem with it. The artificial distinction that they use is you cannot make stuff private unless you pay. So I think this is a little bit annoying, because what it means is people who are playing around will come in and get, they'll check out WMD. Well, not checkout, but whatever the correct term is, pull I guess, and they will show up in the timeline for WMD, even if they have no intention of..

Jeff and Joel: ....

Joel: ..as pulling

Jeff: Apparently, I mean, that's what I'm hearing on Twitter. That's what I'm kinda objecting to, that there is always some sort of random stuff in the timeline. <Joel talking in the meantime>. You know, I don't care.. Okay, here's my thinking. And people correct me if I'm wrong about this, because I'm somewhat new to distributed version control, so I could be completely wrong, but what I'm thinking is, you know, I only want to know about your timeline if you have intent, your fork if you will, your, your pull, if you have intent to fold back in.

Joel: It's not up to the person, you see, I think that in an open source project <unclear: it doesn't> work that way.

Jeff: I think this is all very side-effecty. What I'm saying is these people come with free accounts, they can't mark their stuff private, therefore it always shows up in the timeline, if they even touch WMD at all. What makes the timeline...<Joel interrupting>

Joel: What's your WMD, what's your account on GitHub?

Jeff: I actually just deleted it, because I wasn't using it...

Joel: Where's WMD on Github?

Jeff: Just do a Google search for WMD Github.

Joel: There's a whole bunch, there's derobins's WMD at master.

Jeff: There's a whole bunch, this is the whole problem. <laughing> This is what I'm trying to tell you. So derobins's is theoretically the correct one, in that that's the one we use, but there's newer ones.

Joel: Okay, this is fine. Who cares if there are newer ones? This is a branch though. "Functionality removed that we not needed"

Jeff: There's a lot of changes going on with other people, and it's confusing. I mean, this is my problem. <laughing> I think it's really confusing.

Joel: This is <Jeff talking: part of that is..> C# WMD. This is the WMD..

Jeff: No, this is the JavaScript, the client.

Joel: This is the official, this is the original, not really official fork ten million different ways.

Jeff: Well, that's what I'm getting at. The whole situation is a little weird. Part of it is bad parenting. And granted, Dana hasn't been around, I haven't been around. So I have actually moved ..<interrupted>

Joel: Wait a minute, where have you been? 

Jeff: Well, I just haven't had any time to work on the JavaScript stuff at all.


<around the 19 minute mark>

 Jeff: And one thing I'll tell you right off the bat, the state machine was waaaaay more code. Way more code. You're talking, like, 25 lines of code! <laughing, astonished>

Joel: Yeah, but that could be.. <interrupted>

Jeff: Could have been done in the Regex in like, three lines. I'm just sayin'!

Joel: Yeah, <interrupted>

Jeff: So that's the downside, it's a *ton* of code!

Joel: Yeah, but that code is all legible. <laughs a lot>

Jeff: Hummmmmm. I don't know. I mean it's debatable. <talks over Joel> It gets somewhat debatable. It's definitely faster. There's no question that it's faster because you're doing three Regexes in this case to do the <interrupted>

Joel: It's much easier, it's easier to debug too.

Jeff: Naaaah, I dunno. The way I was seeing this code <laugh> there's actually a bug, sadly, in the normalize routine he contributed, I'm gonna have to roll it back, umm, because it's not actually removing the newlines at the end of the code like it's supposed to <laugh>. Uh, but I don't know, I was a little taken aback.. <interrupted>

Joel: So why don't you just fix it?

Jeff: Well, because, I, th, th, the thing is it's a bunch, like 25 lines of code, I have to look at and understand versus like three. <laughs> It's actually quite a bit more complicated. <talks over Joel>

Joel: Yeah, Fine. <whatever>

Jeff: I mean I can give you the code if you want to look at it. I mean it's not *super* complicated code but it's 25 lines of code.

Joel: Right. <exasperated>

Jeff: I mean, lines of code, the more lines of code the more bugs, man.
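
The trade-off argued over in this segment, a few lines of regex versus a longer hand-written scanner, can be sketched in Python. This is only an illustration: the function names are hypothetical, and neither function is the normalize routine discussed on the show. Both are assumed to do the same job, which is to unify line endings and strip the newlines at the end of the text.

```python
import re


def normalize_regex(text: str) -> str:
    """Regex version: two substitutions."""
    text = re.sub(r"\r\n?", "\n", text)   # CRLF or a lone CR becomes LF
    return re.sub(r"\n+\Z", "", text)     # drop newlines at the very end


def normalize_loop(text: str) -> str:
    """Hand-written scanner: same result, one explicit pass over the input."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\r":
            out.append("\n")
            # Treat CRLF as a single line ending by skipping the LF.
            if i + 1 < n and text[i + 1] == "\n":
                i += 1
        else:
            out.append(ch)
        i += 1
    # Remove trailing newlines, the step the routine in the discussion missed.
    while out and out[-1] == "\n":
        out.pop()
    return "".join(out)
```

The loop is several times longer, which is Jeff's complaint, but each step can be stepped through in a debugger, which is Joel's point.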

<end classic segment>

(25:50)

 

Hey Joel and Jeff, this is Dave from the tri-state area.  I'm calling with a question for you guys.  I work at a large software company with a humongous code base.  Well in surplus of 50 million lines, and we have every language in there from FORTRAN to C++ to Javascript to God knows what else.  And there's a heavy kind of pervasive philosophy all around the company whenever I ask for documentation.  The usual answer I get is, "documentation? What documentation? The source code is the documentation!" I find this kind of irksome as a new guy over there, still trying to get my way around the environment. But what are your thoughts on this kind of approach?  Do you think that it helps you to learn more about the environment you're working in? Or impede development? Or what? Anyway, I'd love to hear your thoughts on this, and thanks for a great podcast and a great site.

Joel: Yeah.

Jeff: Documentation. It's exciting.

Joel: Did he say "dongumentation?"  Okay, this just makes me want to quit my job as a programmer even more.

Jeff: What, having to answer the question about documentation?

Joel: Just even thinking about this, yeah.

Jeff: Well, I think, I don't know, this is the theme I come back to a lot, but I think having worked in a large company, discoverability was the number one problem, and I just feel like the more you can rely on an external code base versus an internal code base, or have some sort of open-source thing where you're contributing as a company to some open-source thing that's sort of larger than your company... I just think there's... it's too difficult to attack this stuff internally, basically.  The reason that people would resist, well now I gotta write documentation for this internal thing, y'know, when I could be building this internal thing. And y'know if you create documentation, it's only going to be visible internally.  So what kind of benefit is even a large, large...

Joel: That's not even, okay come on, that doesn't even... I'm going to have to disagree with you here, because it sounds like what you're about to say is well of course this problem doesn't have a... you're starting to become one of these open-source weenies, or that this problem doesn't happen on the Macintosh.

Jeff: No, no, no, no.

Joel: We don't have this problem on Linux.  On Linux, it's awesome to write documentation, because everybody reads it.  The whole world can read your awesome documentation, whereas who wants to write documentation for 50,000 Microsoft programmers to read?  That's boring and lame.

Jeff: Well let me backtrack a little bit. That wasn't exactly what I was saying.  What I was saying is that most large companies, the audience for anything at a typical large company is what, like 50 people?

Joel: No, because it's two and a half people who are going to be using your code, who are going to have to be working on your code, and you don't know when they're coming and when they're not. But you're right, it's just a tiny number of people.

Jeff: That's why it's hard to get excited about it.  When I was documenting stuff internally, I said okay, I'll document it for the three people who are ever going to look at this, ever.

Joel: Exactly. And by the time, and you go to some major effort to write some documentation, and you're like I finally documented that.  Three years later, the code is changing every possible which way.  In the meantime, nobody's read your documentation, because they don't like to read, they'd rather watch television, they can always just ask you. And you're like well did you read the documentation? And they're like, oh, yeah, I didn't think that was going to be right. So you're right, documentation is an impossible... or specifically documenting code...

Jeff: Wait, wait, wait, I got an idea, I got a perfect idea. I think what would have much better value in a company, a large company, is unit testing. Unit testing can be a form of documentation. (Joel sighs.) No? No? I mean, if you're going to sit down and write documentation, I think you'll get much more value out of writing actual unit tests.  And particularly because he's talking about large code bases, do you remember when I referred to unit tests as scaffolding around some grand old building?

Joel: But, but, but, he's talking about, the problem he's talking about I think, he said he's kinda new at the company, and he's trying to get his mind around 50 billion lines of code they have. The problem there, what he really is asking for is like, isn't there a textbook I can read, where somebody will get me all started on all this complicated code and figure out where things are and all that kinda stuff.  And I don't think there is a code base in the world that has that documentation that is gonna make the job easy for you, of learning your way around a large code base.  It's just hard.  It's like when you're a doctor, or when you're training to become a doctor, you spend a couple of years learning, like, human anatomy, and that's documentation for a big complicated system, in a sense.  And you read it, and there's lots of textbooks, and they teach you things, and after a couple years of studying the documentation and studying the human body in various other ways, you finally have kind of a good grasp on how that complicated system works.  And it's the same thing with a code base, except you don't even have, y'know, an anatomy textbook, so it's even harder, I guess.

Jeff:  You just need a native guide.  I guess there's inventory that would be more useful in that scenario.  Although I'm still going to go with my unit testing.  I still think it would be helpful to just read through some of the unit tests.  Plus, if he's going to be changing some of this code...?

Joel:  That's documentation of a very small... of a unit.  It doesn't really tell you that this subdirectory contains a whole bunch of files which are input to this function, in that place, which generates automatically a parser that can handle the watchamacallit thing that you use for this gigantic module that you don't have to know about because we haven't used that code in 15 years.  It's still there because we have a particular customer that's apparently using it, we're not sure, we're afraid to ask, there's no reason to delete it.  So if you have a large body of code, if you've ever tried to take over a large body of code, or to just start working on a large body of code, it's like impossible. You can't, y'know, it's really hard.

Jeff: Yeah. Well, your recommendation was to pick some tiny bug...

Joel: Yes, that was my recommendation, I was just going to mention that again, so I guess this'll probably come up on every single podcast.  It's that the best way to start on a large body of code is to just be assigned a whole bunch of random little bugs all over the place, and just somehow figure them out.  It's gonna be hard, but eventually you'll just get better and better at understanding what's in the code and where it should be.  That doesn't... now, it sounds like I'm making excuses for not documenting your work, and there are different levels of documentation.  So you got documentation in the small.  Unit tests are awesome if you want to document a tiny little piece of... like, if you had very, very detailed unit tests for MarkDown, your MarkDown parser, or your WMD parser, you could look at them to resolve ambiguities.  You'd be like, hmm, there is an ambiguity here as to whether a line of dashes is a horizontal rule or it means heading-one.  And then you would go look for dashes in the unit tests and you'd see if anybody had made any unit tests ... you'd discover that they hadn't, and you'd say, well, I guess that's not documented.
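
Joel's example, looking in the unit tests to settle whether a line of dashes is a horizontal rule or a heading underline, might look like the sketch below. Both `classify_dash_line` and the rule it encodes are assumptions made up for illustration; they are not the behavior of WMD, MarkdownSharp, or any other real parser. What matters is that each test name states a decision a reader can look up.

```python
import re


def classify_dash_line(prev_line: str, line: str) -> str:
    """Hypothetical rule: dashes under text underline a heading, else a rule."""
    if not re.fullmatch(r"-{3,}\s*", line):
        return "text"
    if prev_line.strip():
        return "heading"
    return "horizontal_rule"


# Each test doubles as documentation of one decision about the ambiguity.
def test_dashes_directly_under_text_are_a_heading_underline():
    assert classify_dash_line("Release notes", "-----") == "heading"


def test_dashes_after_a_blank_line_are_a_horizontal_rule():
    assert classify_dash_line("", "-----") == "horizontal_rule"


def test_fewer_than_three_dashes_is_plain_text():
    assert classify_dash_line("", "--") == "text"
```

If no such tests existed, a search for "dashes" would come up empty, which is the "I guess that's not documented" outcome Joel describes.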

Jeff: Well, somebody pointed out that part of my cognitive friction with testing on MarkDown# was that the way I was testing was too large.  Actually doing the input/output testing was considered too large to be a true unit test. Unit tests are supposedly smaller. It's like another kind of testing.

Joel: But if you have a well-developed body of unit tests, for a well-developed body of code, and the unit testers are doing a great job, and they've been studying it for 200 years, and they've spent a year in a cave with Robert, with Uncle Bob Martin, actually side-by-side, what's the word, pair programming, and they've got awesome unit tests for their body of code, then in that circumstance, wouldn't you expect that the unit tests would be about the same size as the code base, if not larger?

Jeff: They'd be pretty large.

Joel: They might be larger than the code base.

Jeff: Yeah.

Joel: Okay, how does that help?  Go read that other thing, which is just an alternate expression of the same code base, in a different language, or a different format...

Jeff: Well, maybe I was thinking ahead in terms of not technically reading it, but just making a change and then being able to...

Joel: Oh, yeah, that for sure is a very useful, that's useful and that's valuable...

Jeff: I would say that any time spent on documentation, to me, is really hard to justify.  At least, with unit testing it's still somewhat hard to justify, but I can see the benefit of, like, a new programmer who doesn't know anything about your code, can come in and make a change, and have some reassurance that, okay, I didn't break everything.

Joel: Yeah.

Jeff: I may have broken something small, but at least it wasn't something covered by one of the major unit tests that we have, or even the...  whatever the kind of testing I'm doing on MarkDown#. I still don't fully understand the distinction, but uh, y'know, input/output type testing.

Joel: Yeah.  There is definitely a feeling among programmers that there's never enough documentation of the code they've been told to go work on, ever. And there's also a pretty clear reluctance to ever write any documentation, because documentation in and of itself almost never gets written.  In fact, if you follow a team of programmers just kind of working naturally, they might document something they're about to code, as a way of understanding what they're about to do, and then they'll check that in as the documentation, and what they do is maybe 25% different.  And that code is going to change 14 different times, and that piece of documentation is still going to be checked in - that wrong piece of documentation.

Jeff: Yes, it doesn't stay in lock-step with the code. At least, the unit tests, if you break something, you kind of have to...

Joel: You kind of have to, yeah.  But the unit tests don't tell you enough.  They tell you about things in the small that you could figure out by looking at the code, or by reading the comment in front of the function that explains what the function does.  I dunno, sometimes they may clarify something, and they may be useful.  There's just a whole bunch of clues that you're gonna have to get.  There's some other stuff that's sometimes kind of weird like, I found that if you have a database, and you don't carefully document every column, that after a year or two you start to have a really, really brittle world.  So somebody'll make, um... whatever your application is, some table that has the most columns and is the most central to your application.  Y'know it's like the StackOverflow questions, or the user table, or whatever. It's got 48 different columns, and it's really, really kind of crucial, and 15 of those columns are a little bit mysterious.  Somebody put them in because they wanted to basically hang their data onto a user, or a question or whatever...  And if you don't ever document those, you just sort of throw them in there, then what you'll find is that people will write code, and it'll create new users without setting appropriate defaults, and other people won't keep those columns up to date, and just stuff will break, because those columns are not well-understood.
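
One way to keep a central table's columns from going "mysterious" is to store a description for every column next to the schema and fail a check when a column has none. The sketch below uses Python's built-in `sqlite3` module. The `users` table, its columns, and the `COLUMN_DOCS` dictionary are all hypothetical, and a real project might keep the descriptions in the database's own comment facility instead.

```python
import sqlite3

# Hypothetical documentation for a hypothetical users table.
COLUMN_DOCS = {
    "id": "Primary key.",
    "display_name": "Name shown on the profile page.",
    "reputation": "Running score; new users must start at the default of 1.",
}


def undocumented_columns(conn, table, docs):
    """Return the columns of `table` that have no entry in `docs`."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is interpolated, so only pass trusted names.
    rows = conn.execute(f"PRAGMA table_info({table})")
    return [row[1] for row in rows if row[1] not in docs]


def make_demo_db():
    """Build an in-memory table with one column nobody explained."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " display_name TEXT NOT NULL,"
        " reputation INTEGER NOT NULL DEFAULT 1,"
        " flags INTEGER)"
    )
    return conn
```

Calling `undocumented_columns(make_demo_db(), "users", COLUMN_DOCS)` reports `flags`, the kind of column someone added to hang data on a user without saying what it means.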

Jeff:  Right.  So maybe what you're trying to say is just document the core, the center.

Joel: Uh, document the data structures, at least, is the most crucial ...  data structures, and your tables and columns and stuff like that.  The most important thing is just to have very, very tight documentation of...

Jeff:  But start at the center, I think that's a good observation.  Like, find the center and document the crap out of that.  And, the center in terms of data structures, specifically.  That's a good idea.

Joel: And you can even, at some point you wanna have the new developer's guide, which you should maintain up to date.  There's something we've done, I don't know if we're still doing it, but a policy I used to have is there'd be a new developer's guide that'd say, here's how you get a checkout, here's how you set up the tools that you're going to be needing, just to compile, and here's how to get you to the point at which you can edit any file in our source code and cause there to be a compiled version and test it and debug it under your debugger.  And maybe even deploy it.  So like, the minimum, like, how the hell do I work with this code.  Not even, what does it do, or where is the code, or whatever, just how do I work with this body of code in this situation: how do I check things out, what passwords do I have, what compilers do I need, what tools do I need in my PATH, what environment variables do I have to set, all that kind of stuff.  And just like everything else, that stuff gets out of date pretty quickly, and nobody maintains it.  So you have a rule that the new guy has to use that documentation to get started, and every time they find a mistake, the new guy is responsible to fix it.
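
Part of a new developer's guide like this can be made executable, so that a stale entry fails visibly instead of quietly misleading the next hire. The sketch below checks that required tools are on the PATH and that required environment variables are set. The tool and variable names in the usage note are placeholders, not anything mentioned on the show.

```python
import os
import shutil


def check_environment(required_tools, required_env, env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"tool not found on PATH: {tool}")
    for var in required_env:
        if not env.get(var):
            problems.append(f"environment variable not set: {var}")
    return problems
```

A new hire would run something like `check_environment(["git", "make"], ["BUILD_ROOT"])` on day one. Under Joel's rule, whoever finds the list wrong is the one who fixes it.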

Jeff: That's a good idea.

Joel: So, at least every time a new guy joins, it gets refreshed, to be up to date.

Jeff: Yes. I like that.  Well, I think we have some good tips answering that question...

Joel: But just the idea of documentation makes me want to cry, because it really is impossible to... And when I think about writing verbosely, like the way you and I write our blog posts, where you actually try to explain everything in a way that somebody who's not patient and reads will understand them, and then you see the way people have gotten to be reading on the internet, where they're just skipping forward, they're ignoring paragraphs, they're just jumping from pretty bullet list to the next pretty bullet list, they're in Twitter mentality, they don't sit patiently and read your documentation, even if it is going to save them, they will not read it.  They will just skip to, y'know, interesting little pictures and blobs and blurbs and stuff like that on the page.

Jeff: You know Joel, I didn't even listen to any of that, 'cause I was browsing the internet.


 

 

Jeff:     I was just sort of ranting a little bit on twitter about eh.. my experience with Github has been pretty negative so far. Part of it is just me but..

Joel:    Yeah, they probably hate you there.

Jeff:     No, no no. I think what they do is cool. It's just I find it.. We're getting a lot of noice of the Github. So what's hosted there is the WMD editor that we had to reverse engineer that Dana Robinson did pretty much all the work on. Dana has been tied up with some other stuff. In the mean time, a bunch of people have forked off the project, which is fine, I don't really have a problem with that, but it makes it really hard to figure out like what is going on with the project, because you have Dana working, then stopping for an extended period of time. Then you have a bunch of people that sort of picked up and started doing semi-random stuff.

Joel:    Wait, you just let them check it in? Or what, what, where, what?

Jeff:    What do you mean? No, no, no. They have their forks, their own private forks. Well, not private, that's actual the problem, I think. So anyway, in talking to people on Twitter about this is like, everyone that approaches WMD wants to help, wants to figure out, they have this immediate hurdle of like, well, what's the current version, you know, and if you go to derobins branch you'll see here's derobins and he did some work, and then you will see some bunch of people work and you're like, which one of those is the good one.

Joel:    That's the nature of any open-source project. There's supposed to be some kind of, like, shall we call it a parent, to use a metaphor.

Jeff:    Yes

Joel:    Who exhibits some kind of parenthood.

Jeff:    Yes, now that's definitely true, and I agree with that. But I also think that there's a symptom here, in that, part of Github's model is their not free, they're pay, which, again, I have no problem with. I have no problem with paying for stuff, I have no problem with it. The artificial distinction that they use, is you cannot make stuff private, unless you pay. So I think this is a little bit annoying, because with it means is people who are playing around will come in and get, they'll check out WMD. Well, not checkout, but whatever the correct term is, pull I guess, and they will show up in the timeline for WMD, even if they have no intention of..

Jeff and Joel: ....

Joel: ..as pulling

Jeff: Apparently, I mean, that's what I'm hearing on Twitter. That's what I'm kinda objecting to, that's there is always some sort of random stuff in the timeline. <Joel talking in the mean time>. You know, I don't care.. Okay, here's my thinking. And people correct me if i'm wrong about this, because i'm somewhat new to distributed version control, so I could be completely wrong, but what I'm thinking is, you know, I only want to know about your timeline if you have intent, you fork if you will, your your pull, if you have intent to fold back in.

Joel: It's not up to the person, you see, I think that in an open source project <unclear: it doesn't> work that way.

Jeff: I think this is all very side-effecty. What I'm saying is these people come with free accounts, they can't mark their stuff private, therefore, it always shows up in the timeline, if they even touch WMD at all. What makes the timeline...<Joel interupting>

Joel: What's your WMD, what's your account on Github?

Jeff: I actually just deleted it, because I wasn't using it...

Joel: Where's WMD on Github?

Jeff: Just do a Google search for WMD Github.

Joel: There's a whole bunch, there'se derobins's WMD at master.

Jeff: There's a whole bunch, this is the whole problem. <laughing> This is what I'm trying to tell you. So derobins's is theoretical the correct one, in that that's the one we use, but there's newer ones.

Joel: Okay, this is fine. Who care's there are newer ones. This is a branch though. "Functionality removed that we not needed"

Jeff: There's a lot of changes going on with other people, and it's confusing. I mean, this is my problem. <laughing> I think it's really confusing.

Joel: This is <Jeff talking: part of that is..> C# WMD. This is the WMD..

Jeff: No, this is the Javascript, the client.

Joel: This is the official, this is the original, not really official fork ten million different ways.

Jeff: Well, that's where I'm getting at. The whole situations is a little weird. Part of it is bad parenting. And granted, Dana hasn't been around, I haven't been around. So I have actually moved ..<interrupted>

Joel: Wait a minute, where have you been? 

Jeff: Well, I just haven't any time to work on the Javascript stuff at all.


<around the 19 minute mark>

 Jeff: And one thing I'll tell you right off the bat, the state machine was waaaaay more code. Way more code. You're talking, like, 25 lines of code! <laughing, astonished>

Joel: Yeah, but that could be.. <interrupted>

Jeff: Could have been done in the Regex in like, three lines. I'm just sayin'!

Joel: Yeah, <interrupted>

Jeff: So that's the downside, it's a *ton* of code!

Joel: Yeah, but that code is all legible. <laughs a lot>

Jeff: Hummmmmm. I don't know. I mean it's debatable. <talks over Joel> It gets somewhat debatable. It's definitely faster. There's no question that it's faster because you're doing three Regexes in this case to do the <interrupted>

Joel: It's much easier, it's easier to debug too.

Jeff: Naaaah, I dunno. The way I was seeing this code <laugh> there's actually a bug, sadly, in the normalize routine he contributed, I'm gonna have to roll it back, umm because it's not actually removing the newlines at the end of the code like it's supposed to <laugh>. Uh, but I don't know, I was a little taken back.. <interrupted>

Joel: So why don't you just fix it?

Jeff: Well, because, I, th, th, the thing is it's a bunch, like 25 lines of code, I have to look at and understand verses like three. <laughs> It's actually quite a bit more complicated. <talks over Joel>

Joel: Yeah, Fine. <whatever>

Jeff: I mean I can give you the code if you want to look at it. I mean It's not *super* complicated code but it's 25 lines of code.

Joel: Right. <exasperated>

Jeff: I'm mean lines of code, the more lines of code the more bugs, man.

<end classic segment>

(25:50)

 

Hey Joel and Jeff, this is Dave from the tri-state area.  I'm calling with a question for you guys.  I work at a large software company with a humongous code base.  Well in surplus of 50 million lines, and we have every language in there from FORTRAN to C++ to Javascript to God knows what else.  And there's a heavy kind of pervasive philosophy all around the company whenever I ask for documentation.  The usual answer I get is, "documentation? What documentation? The source code is the documentation!" I find this kind of irksome as a new guy over there, still trying to get my way around the environment. But what are your thoughts on this kind of approach?  Do you think that it helps you to learn more about the environment you're working in? Or impede development? Or what? Anyway, I'd love to hear your thoughts on this, and thanks for a great podcast and a great site.

Joel: Yeah.

Jeff: Documentation. It's exciting.

Joel: Did he say "dongumentation?"  Okay, this just makes me want to quit my job as a programmer even more.

Jeff: What, having to answer the question about documentation?

Joel: Just even thinking about this, yeah.

Jeff: Well, I think, I don't know, this is the theme I come back to a lot, but I think having worked in a large company, discoverability was the number one problem, and I just feel like the more you can rely on an external code base versus an internal code base, or have some sort of open-source thing where you're contributing as a company to some open-source thing that's sort of larger than your company... I just think there's... it's too difficult to attack this stuff internally, basically.  The reason that people would resist, well now I gotta write documentation for this internal thing, y'know, when I could be building this internal thing. And y'know if you create documentation, it's only going to be visible internally.  So what kind of benefit is even a large, large...

Joel: That's not even, okay come one, that doesn't even... I'm going to have to disagree with you here, because it sounds like what you're about to say is well of course this problem doesn't have a... you're starting to become of these open-source weenies, or that this problem doesn't happen on the Macintosh.

Jeff: No, no, no, no.

Joel: We don't have this problem on Linux.  On Linux, it's awesome to write documentation, because everybody reads it.  The whole world can read your awesome documentation, whereas who wants to write documentation for 50,000 Microsoft programmers to read?  That's boring and lame.

Jeff: Well let me backtrack a little bit. That wasn't exactly what I was saying.  What I was saying is that most large companies, the audience for anything at a typical large company is what, like 50 people?

Joel: No, because it's two and half people who are going to be using your code, who are going to have to be working on your code, and you don't know when they're coming and when they're not. But you're right, it's just a tiny number of people.

Jeff: That's why it's hard to get excited about it.  When I was documenting stuff internally, I said okay, I'll document it for the three people who are ever going to look at this, ever.

Joel: Exactly. And by the time, and you go to some major effort to write some documentation, and you're like I finally documented that.  Three years later, the code is changing every possible which way.  In the meantime, nobody's read your documentation, because they don't like to read, they're rather watch television, they can always just ask you. And you're like well did you read the documentation? And they're like, oh, yeah, I didn't think that was going to be right. So you're right, documentation is an impossible... or specifically documenting code...

Jeff: Wait, wait, wait, I got an idea, I got a perfect idea. I think what would have much better value in a company, a large company, is unit testing. Unit testing can be a form of documentation. (Joel sighs.) No? No? I mean, if you're going to sit down and write documentation, I think you'll get much more value out of writing actual unit tests.  And particularly because he's talking about large code bases, do you remember when I referred to unit tests as scaffolding around some grand old building?

Joel: But, but but, he's talking about, the problem he's talking about I think, he said he's kinda new at the company, and he's trying to get his mind around 50 billion lines of code they have. The problem there, what he really is asking for is like, isn't there a textbook I can read, where somebody will get me all started on all this complicated code and figure out where things are and all that kinda stuff.  And I don't think there is a code base in the world that has that documentation that is gonna make job easy for you, of learning your way around a large code base.  It's just hard.  It's like when you're a doctor, or when you're training to become a doctor, you spend a couple of years learning, like, human anatomy, and that's documentation for a big complicated system, in a sense.  And you read it, and there's lots of textbooks, and they teach you things, and after a couple years of studying the documentation and studying the human body in various other ways, you finally have kind of a good grasp on how that complicated system works.  And it's the same thing with a code base, except you don't even have, y'know, an anatomy textbook, so it's even harder, I guess.

Jeff:  You just need a native guide.  I guess there's inventory that would be more useful in that scenario.  Although I'm still going to go with my unit testing.  I still think it would be helpful to just read through some of the unit tests.  Plus, if he's going to be changing some of this code...?

Joel:  That's documentation of a very small... of a unit.  It doesn't really tell you that this subdirectory contains a whole bunch of files which are input to this function, in that place, which generates automatically a parser that can handle the whatchamacallit thing that you use for this gigantic module that you don't have to know about because we haven't used that code in 15 years.  It's still there because we have a particular customer that's apparently using it, we're not sure, we're afraid to ask, there's no reason to delete it.  So if you have a large body of code, if you've ever tried to take over a large body of code, or to just start working on a large body of code, it's like impossible. You can't, y'know, it's really hard.

Jeff: Yeah. Well, your recommendation was to pick some tiny bug...

Joel: Yes, that was my recommendation, I was just going to mention that again, so I guess this'll probably come up on every single podcast.  It's that the best way to start on a large body of code is to just be assigned a whole bunch of random little bugs all over the place, and just somehow figure them out.  It's gonna be hard, but eventually you'll just get better and better at understanding what's in the code and where it should be.  That doesn't... now, it sounds like I'm making excuses for not documenting your work, and there are different levels of documentation.  So you got documentation in the small.  Unit tests are awesome if you want to document a tiny little piece of... like, if you had very, very detailed unit tests for MarkDown, your MarkDown parser, or your WMD parser, you could look at them to resolve ambiguities.  You'd be like, hmm, there is an ambiguity here as to whether a line of dashes is a horizontal rule or it means heading-one.  And then you would go look for dashes in the unit tests and you'd see if anybody had made any unit tests ... you'd discover that they hadn't, and you'd say, well, I guess that's not documented.
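
The dashes example can be made concrete with a small sketch of unit tests acting as documentation. Everything here is an assumption for illustration: `classify_dash_line` is a hypothetical helper, not code from MarkDown# or WMD, and it uses a simplified rule (three or more dashes). The rule it encodes follows the original Markdown syntax, where a row of dashes directly under a line of text marks a heading (second-level; equals signs mark first-level), while a row of dashes on its own is a horizontal rule.

```python
import re
import unittest

# Hypothetical helper for illustration only; not part of any real Markdown library.
# Simplification: only rows of three or more dashes are considered.
_DASHES = re.compile(r"^-{3,}\s*$")


def classify_dash_line(previous_line, line):
    """Return 'heading', 'rule', or None for a candidate line of dashes."""
    if not _DASHES.match(line):
        return None
    # Dashes directly under non-blank text underline that text as a heading.
    if previous_line.strip():
        return "heading"
    # Dashes after a blank line stand alone as a horizontal rule.
    return "rule"


class DashLineTests(unittest.TestCase):
    """Each test name states one resolved ambiguity in plain words."""

    def test_dashes_under_text_are_a_heading(self):
        self.assertEqual(classify_dash_line("Title", "-----"), "heading")

    def test_dashes_after_blank_line_are_a_rule(self):
        self.assertEqual(classify_dash_line("", "-----"), "rule")

    def test_ordinary_text_is_neither(self):
        self.assertIsNone(classify_dash_line("Title", "not dashes"))
```

Saved to a file, the tests run with `python -m unittest`. Someone who searches the tests for "dashes" finds the answer in the test names, which is the lookup Joel describes.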

Jeff: Well, somebody pointed out that part of my cognitive friction with testing on MarkDown# was that the way I was testing was too large.  Actually doing the input/output testing was considered too large to be a true unit test. Unit tests are supposedly smaller. It's like another kind of testing.

Joel: But if you have a well-developed body of unit tests, for a well-developed body of code, and the unit testers are doing a great job, and they've been studying it for 200 years, and they've spent a year in a cave with Robert, with Uncle Bob Martin, actually side-by-side, what's the word, pair programming, and they've got awesome unit tests for their body of code, then in that circumstance, wouldn't you expect that the unit tests would be about the same size as the code base, if not larger?

Jeff: They'd be pretty large.

Joel: They might be larger than the code base.

Jeff: Yeah.

Joel: Okay, how does that help?  Go read that other thing, which is just an alternate expression of the same code base, in a different language, or a different format...

Jeff: Well, maybe I was thinking ahead in terms of not technically reading it, but just making a change and then being able to...

Joel: Oh, yeah, that for sure is a very useful, that's useful and that's valuable...

Jeff: I would say that any time spent on documentation, to me, is really hard to justify.  At least, with unit testing it's still somewhat hard to justify, but I can see the benefit of, like, a new programmer who doesn't know anything about your code, can come in and make a change, and have some reassurance that, okay, I didn't break everything.

Joel: Yeah.

Jeff: I may have broken something small, but at least it wasn't something covered by one of the major unit tests that we have, or even the...  whatever the kind of testing I'm doing on MarkDown#. I still don't fully understand the distinction, but uh, y'know, input/output type testing.
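
The input/output style of testing Jeff mentions can be sketched as a table of input and expected-output pairs run against a converter. This is only an illustration under stated assumptions: `emphasize` is a hypothetical stand-in for a real converter, and the cases are made up, not taken from the MarkDown# test suite.

```python
import re


def emphasize(text):
    """Hypothetical stand-in converter: turns *word* into <em>word</em>."""
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


# Each case is (input, expected output). The table is the safety net:
# a newcomer changes the converter, reruns the table, and sees what broke.
CASES = [
    ("plain", "plain"),
    ("*hi*", "<em>hi</em>"),
    ("a *b* c *d*", "a <em>b</em> c <em>d</em>"),
]


def failing_cases(convert, cases):
    """Return (input, expected, actual) for every case that does not match."""
    failures = []
    for source, expected in cases:
        actual = convert(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures
```

An empty result from `failing_cases(emphasize, CASES)` is the "I didn't break everything" reassurance. It says nothing about behavior the table does not cover.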

Joel: Yeah.  There is definitely a feeling among programmers that there's never enough documentation of the code they've been told to go work on, ever. And there's also a pretty clear reluctance to ever write any documentation, because documentation in and of itself almost never gets written.  In fact, if you follow a team of programmers just kind of working naturally, they might document something they're about to code, as a way of understanding what they're about to do, and then they'll check that in as the documentation, and what they do is maybe 25% different.  And that code is going to change 14 different times, and that piece of documentation is still going to be checked in - that wrong piece of documentation.

Jeff: Yes, it doesn't stay in lock-step with the code. At least, the unit tests, if you break something, you kind of have to...

Joel: You kind of have to, yeah.  But the unit tests don't tell you enough.  They tell you about things in the small that you could figure out by looking at the code, or by reading the comment in front of the function that explains what the function does.  I dunno, sometimes they may clarify something, and they may be useful.  There's just a whole bunch of clues that you're gonna have to get.  There's some other stuff that's sometimes kind of weird like, I found that if you have a database, and you don't carefully document every column, that after a year or two you start to have a really, really brittle world.  So somebody'll make, um... whatever your application is, some table that has the most columns and is the most central to your application.  Y'know it's like the StackOverflow questions, or the user table, or whatever. It's got 48 different columns, and it's really, really kind of crucial, and 15 of those columns are a little bit mysterious.  Somebody put them in because they wanted to basically hang their data onto a user, or a question or whatever...  And if you don't ever document those, you just sort of throw them in there, then what you'll find is that people will write code, and it'll create new users without setting appropriate defaults, and other people won't keep those columns up to date, and just stuff will break, because those columns are not well-understood.

Jeff:  Right.  So maybe what you're trying to say is just document the core, the center.

Joel: Uh, document the data structures, at least, is the most crucial ...  data structures, and your tables and columns and stuff like that.  The most important thing is just to have very, very tight documentation of...

Jeff:  But start at the center, I think that's a good observation.  Like, find the center and document the crap out of that.  And, the center in terms of data structures, specifically.  That's a good idea.
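
One way to keep column documentation next to the data structure is sketched below. It is a minimal illustration, not how Stack Overflow or FogBugz actually do it: the `UserRow` type, its column names, and the `column` helper are all hypothetical. Each column carries a one-line description and an explicit default, so code that creates a new user cannot silently skip a column.

```python
from dataclasses import dataclass, field, fields


def column(doc, default):
    """Declare a column with a one-line description and an explicit default."""
    return field(default=default, metadata={"doc": doc})


@dataclass
class UserRow:
    # All column names below are hypothetical examples.
    display_name: str = column("Name shown next to the user's posts.", "")
    reputation: int = column("Score earned from votes; new users start at 1.", 1)
    is_suspended: bool = column("True while the account is locked by a moderator.", False)


def undocumented_columns(row_type):
    """List columns that were added without a description."""
    return [f.name for f in fields(row_type) if not f.metadata.get("doc")]


def describe(row_type):
    """Render 'name: description' lines for a quick reference."""
    return [f"{f.name}: {f.metadata.get('doc', '(undocumented)')}" for f in fields(row_type)]
```

`UserRow()` then yields a row with every default filled in, and a check that `undocumented_columns(UserRow)` is empty can run alongside the other tests to flag a mysterious new column when it is added.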

Joel: And you can even, at some point you wanna have the new developer's guide, which you should maintain up to date.  There's something we've done, I don't know if we're still doing it, but a policy I used to have is there'd be a new developer's guide that'd say, here's how you get a checkout, here's how you set up the tools that you're going to be needing, just to compile, and here's how to get you to the point at which you can edit any file in our source code and cause there to be a compiled version and test it and debug it under your debugger.  And maybe even deploy it.  So like, the minimum, like, how the hell do I work with this code.  Not even, what does it do, or where is the code, or whatever, just how do I work with this body of code in this situation: how do I check things out, what passwords do I have, what compilers do I need, what tools do I need in my PATH, what environment variables do I have to set, all that kind of stuff.  And just like everything else, that stuff gets out of date pretty quickly, and nobody maintains it.  So you have a rule that the new guy has to use that documentation to get started, and every time they find a mistake, the new guy is responsible to fix it.

Jeff: That's a good idea.

Joel: So, at least every time a new guy joins, it gets refreshed, to be up to date.
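
Part of a new developer's guide can be made checkable, which helps it stay current. The sketch below assumes a guide that lists required tools and environment variables; the function name `check_setup` and the example names in the usage note are hypothetical. It relies only on the standard library's `shutil.which` and `os.environ`.

```python
import os
import shutil


def check_setup(required_tools, required_env_vars, environ=None):
    """Return a list of human-readable problems; an empty list means ready."""
    environ = os.environ if environ is None else environ
    problems = []
    for tool in required_tools:
        # shutil.which returns None when the tool is not on the PATH.
        if shutil.which(tool) is None:
            problems.append(f"tool not found on PATH: {tool}")
    for name in required_env_vars:
        # Treat unset and empty variables the same way.
        if not environ.get(name):
            problems.append(f"environment variable not set: {name}")
    return problems
```

A new hire might run something like `check_setup(["git"], ["EXAMPLE_SOURCE_ROOT"])` on day one. When the guide turns out to be wrong, fixing the list passed to the script is the same act as fixing the documentation.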

Jeff: Yes. I like that.  Well, I think we have some good tips answering that question...

Joel: But just the idea of documentation makes me want to cry, because it really is impossible to... And when I think about writing verbosely, like the way you and I write our blog posts, where you actually try to explain everything in a way that somebody who's not patient and reads will understand them, and then you see the way people have gotten to be reading on the internet, where they're just skipping forward, they're ignoring paragraphs, they're just jumping from pretty bullet list to the next pretty bullet list, they're in Twitter mentality, they don't sit patiently and read your documentation, even if it is going to save them, they will not read it.  They will just skip to, y'know, interesting little pictures and blobs and blurbs and stuff like that on the page.

Jeff: You know Joel, I didn't even listen to any of that, 'cause I was browsing the internet.