A: So one thing that happened last week that I was dying to talk about was the whole CNProg thing that came up, the Chinese copy of StackOverflow.
S: Oh yeah, well, I don't want to, I mean, I hate to be airing our dirty laundry in public and stuff, but...
A: Why is this dirty laundry?
S: Well, I'm about to air dirty laundry in public.
A: Ok, go ahead.
S: You and I talked, and we had sort of agreed that the content users type into StackOverflow, except for their personal data, should be "open source content", whatever that means: Creative Commons. Basically, we had agreed philosophically that if you go and type a question or an answer into StackOverflow, or you do some editing, you're working on a body of work that is in the public good.
S: Right, and the only exception was going to be... well, we weren't necessarily going to open source our code, because that wasn't our goal. Our goal was to make sure people knew that they weren't putting stuff... there's a certain website that I hate to mention by name. Actually, there are a lot of websites, going back even to IMDB, or the CD database that has all the CD track listings, that started out as public goods, and then suddenly some company owned them. All the work people had been contributing on the Internet to make this public good was suddenly the copyright of some large organization that was very protective of it. We didn't want that to happen. We wanted people to know a priori, when they put content into SO, that they're contributing to the public good and there's no chance that some company's going to take the content, say "We own it," and start charging for it.
The dirty laundry is that all we really did was slap a Creative Commons sticker on the bottom of every page of the website, which doesn't really explain what the policy is. And there are two bugs in what we did. Number one, we never really said, "Look, this doesn't cover the CSS. It doesn't cover the HTML. You can't just copy the look and feel." And number two, we never provided a good mechanism for anybody who wants to exercise that right to get the data contributed to StackOverflow in some way other than screen scraping. And if we catch people screen scraping, we have to ban them, because we don't want to support that on our servers.
A: Right. Well certainly we should have clarified the position. To me it was obvious...
S: OK [laughing]
A: It could always be more obvious.
S: It doesn't state that anywhere. It just says, "Hey, it's Creative Commons. Have fun..."
A: The ironic thing, he didn't actually copy...
S: We should actually update... there are some listeners who don't know what this is. It was a website that somebody did, I don't know who, that was a complete look-and-feel clone of StackOverflow, down to the point of literally... there are sort of two intellectual property infringements, so to speak. One is, I think he actually copied the CSS file, right?
A: Right, it's really a rip. It's not just a copy. It's a rip. It's like in the classical pirate sense.
A: Like a complete, byte-for-byte copy of the exact look and feel of the site. And the reason this is a problem...
S: So that's a copyright violation. Except that it's not clear that it is because you do have that Creative Commons thing on there, so I'm not really...
A: I don't think it would have mattered if we had the... I don't think it would have stopped this particular...
S: Ok, alright, so that's... that's neither here nor there. I'm just talking about what the legal situation is. Now the second problem, which I'm pretty sure Creative Commons does not address, is that by making a site that looks exactly like our site (which they did, except of course everything is in Chinese), they're violating trademark law, which is completely different from copyright law, in that they're attempting to look like us. There's something in trademark law called trade dress, and that would be a violation no matter how open source we made our site. So they do have to fix that!
A: Yes, I just don't want there to be any confusion. I actually did write the guys, and they did respond. And I think I CC'd you on that, or BCC'd you...
A: I tried to be clear... first of all, it's nice to have something that people want to copy. And I think we might have briefly touched on this in a previous podcast: it is a compliment that people want to copy you. I mean, there's so much stuff out there that nobody gives a damn about, right? So the fact that people care enough to copy you is an incredible compliment, and we do treat it like a compliment. But where we do draw the line is... Ok, you can be inspired by SO, and that is totally cool... and honestly we can't stop you anyway. We're not in the business of stopping people from doing whatever it is they want to do, as long as it's not hurting anybody. But the point at which there could be confusion about which site is which (except for the language thing, which would be a big tip-off), that's bad. Because then it looks like we created CNProg...
S: Yeah, yeah, that's the trademark.
A: We had nothing to do...
S: That's the trade dress
A: Yeah, we had nothing to do with CNProg, so if CNProg sucks, if it deletes your hard drive or whatever, people could theoretically think we were responsible for it. The site looks exactly like ours! Did you see they actually copied the blog as well, completely?
S: No, I didn't even see that.
A: If you click on the blog on CNProg, it looks exactly like blog.stackoverflow.com. Same categories, same...
S: Are they just translating the things you post?
A: No, no. They are posting... the content is unique. That's the other thing. They didn't take any of the content that was on SO...
S: Which is the only thing they could have had
A: They could have totally copied all the content. That is completely legal.
S: Well, we have to... you can't... It's not enough for us to just think, "This is our policy." We have to have a page that says, "This is our policy..." And I think just having a random Creative Commons link on the bottom of every page is, if anything, more confusing than not having it. So what we need to do first of all is, instead of just having a Creative Commons link, have a page that explains the license under which things are contributed, saying, "Listen, everything except for the user profile information, which is personal, is open... is owned by the community, is licensed under the Creative Commons license," whatever that exact license is. And it also needs to say, "Hey, if you type something into SO, you are agreeing that the words you type into SO are going to be under this license." And the next thing we need to do, to prevent the screen scrapers and also just to be legit here, if we really are claiming this is open and we don't want screen scrapers pounding our servers, is provide some mechanism for people to actually get that data if they want it. And it could be anything. We could say, "Listen, send us a $100 bill with the president facing the front of the envelope to such and such an address, and we'll make you a tape, a DAT tape... an LTO-2 tape, with the database backup, and send it to you via camel jockey or something." I mean, there has to be some mechanism whereby someone who actually wants the raw data can get it. It doesn't have to be a download link. But there has to be some way, I think.
A: This is a pretty highly voted item on UserVoice. Not recently, but prior to this, we were satisfying a lot of the UserVoice requests as we got them, basically in vote order, and that's a very highly voted item. And we do plan to get to it. The main barrier at the moment is that we have to remove all the personally identifiable information from the database...
A: And I'm a little nervous because of the whole AOL. Remember when AOL released all that anonymous data?
A: That was, like, not in fact anonymous at all, and people were able to track down pretty much everybody in there. That kind of freaks me out a little.
A: And believe me, I'm totally down. I really believe in the CCWiki thing, and I want to follow through, but I don't want to get in serious trouble, you know, like AOL. That's basically all I'm trying to say. I think what I'll actually do is blog about this and get some feedback, because I definitely want to avoid the AOL problem.
S: Right, right, right
A: And it's on the list. It's just not a super high priority, but it's definitely on the list. So, Google's Chinese translation tools are actually surprisingly good. I mean, we were able to go in and translate a lot of what was going on at CNProg, just to figure out what was happening, and get reasonably comprehensible English out of it. Which shocked me. But one of the funny things we saw was that somebody asked a question on CNProg, essentially asking, "Isn't this exactly like StackOverflow?" And the translation was very funny. The translation Google gave was "Why this child like a two site printout of this mold?"
A & S: [chuckles]
A: Which I thought was very funny, cuz it is kind of what it is.
S: Yeah, it's like a moldy version...
A: [laughing] No, it was really perfect.
A: Now part of me is sympathetic to what they are trying to do. I think they're trying to have a local hacker/programmer culture site
A: in their own language, and that's probably something we're not going to get to...
S: Why not?
A: ...in a reasonable time frame...
S: I think that's a mistake
A: Well, I don't think it is, for our audience, and I kind of did a blog post about that. I think English is the de facto standard language for programming.
S: There are five languages for which that is not true, and you can tell because those are the five languages Visual Studio is localized into. I notice this because I have a little Visual Studio plugin for FogBugz that I wrote myself, and I get the bugs for those five... I think there are five important languages that VS is translated into. And tons and tons of the MSDN content is translated, not into 39 languages, but into five languages. These are the languages where, for whatever reason, there's a very, very large body of working programmers who are just not as happy in English. I mean, they may know English, and they are willing to use English if they have to, but it's just slower for them. They'd rather use the language that's faster and easier for them if that's available. Those are, in my experience, German, Spanish, French, Japanese and Chinese.
A: [jokingly] And Latin
S: No, uh Swahili.
A: [fake laugh] I don't know. I mean, part of it is... and I acknowledged this in the blog post where I talked about it. It's a little uncomfortable to say, because you do feel like the Ugly American, or the Ugly English-Speaking Person, but I really think for our audience... we have a small team. There's a limited number of things we can do in any reasonable amount of time that's not 6-8 weeks. And localization is still pretty hard? [laugh] I think serving the primary audience, which is English-speaking programmers, is by far the most important thing, and if we get that right, really right, and concentrate on keeping it right over a year or two years, however long it takes to get to this, I think that's more important than killing ourselves trying to localize really early and maintain those communities...
S: Well, this is a decision that almost every startup makes. I would go so far as to say every startup: localization for different markets is always second priority, and it's not a wrong decision. It's a decision that everybody makes. And what happens, just so you know... look at, for example, Google or eBay. These companies launched in the US in English and they tried to localize aggressively, but sometimes there was a fast local copycat company that just got there faster and took the market. A good example: last time I went to New Zealand, I noticed there was this website, TradeMe. And TradeMe is just an eBay clone. I'm almost certain it came out after eBay: the NZ'ers saw eBay and said, "Hey, let's try this," and it just didn't work because of the expense of shipping to and from NZ, so they said, "Hey, let's make one that's NZ-based." And they built a clone of eBay called TradeMe. I didn't research the story, but the way I understand it, eBay tried to buy them. eBay tried to move into NZ but just couldn't, because TradeMe already had critical mass. That's where the stuff was. That's where the people were. There's this very strong network effect. You don't want to auction stuff on a site where there are no buyers, and you don't want to buy things on a site where there are no sellers. So there's just no way to move in as a second auction website. So eBay basically had no choice, if they ever wanted to be in the NZ market (admittedly a small market), other than to buy TradeMe. But the price was just too high, and I think they gave up. And it's not just eBay. Google has the same story with... what was it, Baidu? There was this company they ended up buying 3% of, because they just got into China too late...
A: Well I don't think those are the same scenario.
S: I think they are all the same scenario because...
A: Well, you had a long diatribe there and I want to interject... You're talking about selling physical goods..
S: What? Google!
A: eBay! Your first example was eBay
S: Ok there's language... don't concentrate on the selling physical goods. Concentrate on the fact that...
A: Ok, searching! Searching is culture dependent. Programming is more like mathematics in the sense... it's numbers! You're not going to localize Pi.
S: Stop! So you're saying there's no such thing as people that want to talk about programming in the Chinese language?
A: They're going to do the programming with the English keywords, because they have to...
S: That's actually not true!
A: And they're going to have to learn some of it regardless, and I agree it's not the entire market. But I think the focus is... to the extent that programming is like mathematics, you have a common language, which is the keywords, the comments, stuff like that... primarily in English. So there's a limit to how good you can be as a programmer without learning English, in my opinion. Now, that's not to say you can't have a local culture that's complementary.
S: Listen, all of this is true, but this is not realistically the way it works.
A: I don't know... I've read hundreds if not thousands of comments on this topic and all of them basically said the same thing, which was that... you can't really be a good programmer without learning English.
S: That doesn't matter! That doesn't mean there isn't a huge community of people whose English is either not strong enough or just prefer to use their local language. Programmers in China will just go to the Chinese language site if that exists first. You can sit there and you can "Tut! Tut!" them and say "You must learn English!" "It is the lingua franca of programming!"
A: But these are the second-tier programmers. This is not our audience...
S: [mockingly] OHhhhh, they're not good enough for us because they don't speak English well enough.
A: What I'm saying is we can't do all these things at once.
S: I agree with that.
A: That's what I'm saying. I'm not saying they suck. I'm saying we have to serve one audience really well.
A: Rather than doing a bunch of things crappily, let's do one thing really well. Right? And you gotta pick your battles. I don't think that's a battle...
S: Right, that's what always happens. The end game, just so you know in advance, is that six years from now, when we're really big and we've got all this money and we're just trying to expand, we'll be like, "How do we expand?" And the only opportunity is this big, gigantic, untapped German-speaking market of really good programmers who just happen to prefer to read and write their questions in German. They could speak English if they wanted to; they just don't choose to. And they're being served by this beautiful StackOverflow clone that has critical mass in Germany. And we have to either buy it or just give up on ever having that market. So you've limited your growth in certain markets. I've presented the story with regard to language markets, like the Chinese... I think they are Chinese, German, Japanese and Spanish. Those are the four I probably care the most about, just because I know those are audiences where there's a huge amount... I mean, if you go into a Japanese bookstore, you will find more programming books in there written in Japanese than you will find in an English bookstore in America. There are... I don't know if this is numerically true. I just know that when I walked into bookstores in Tokyo, they had way more books on programming topics written in Japanese than you can get in English. And there's a lot of English text there, but believe me, they don't have... In the Tokyo bookstores it is really hard to find an English-language programming book. They just buy them in Japanese. So those are the markets I'd care about, but that's just language markets... Think also about things like IT... I mean, we wanna do IT... we wanna do a website for system builders. We might wanna do a gamer site or something like that. There are other categories of sites we want to go into, and I think it's pretty important to get there before somebody else clones SO and does another website that's exactly like this in those spaces.
A: But I think it's an illusion that that's going to happen anyway. The copies being made are not really great copies, in my opinion. First of all, we continue to evolve the site. There are tweaks we make ALL the time that are really significant in terms of how the site works and the social rules of how... like bounty, for example. I don't know that CNProg has any concept of bounty. I also noticed that on CNProg you have to log in to do anything, which misses a huge point that came up with our site: we're all about reducing barriers. We don't make you log in to do stuff. So already they're getting it wrong. It's kind of like the crappy iPod copies that are all over the market, but there's one iPod.
S: You're just looking at the first couple of sites that are sprouting up. Somebody's going to do it right and then we're gonna not be able to get into that market, whatever it is.
A: Well I just think that's fear driven development. I don't think that that's in fact true. I mean, I think there's just a broad generic...
S: Somebody's gonna make... look... I'll bet you. I don't know if it's this Chinese site, but somebody's gonna make a site. What about Japan? I'd love for all Japanese programmers to speak English really well, but they don't, and there are a LOT of programmers in Japan. It's probably the second or third largest market for programmers, and, for example, we don't have a localized version of FogBugz and we sell almost nothing in Japan because we don't localize. Nothing! And it is a market that's just as big and just as important. Not quite as big, but almost as big and almost as important as the English-speaking market, and let's say localization adds another five or ten percent to the cost of developing your software... 20%?
A: Well this is social software! How would I even know that the Japanese content was correct?
S: You don't do this, you hire people who know Japanese! Jeff! [laughing]
A: My problem is I can't control it. The content is built by those people.
A: The whole social software is a mirror of the audience thing...
A: I don't know. I just .. it's not the same thing. You're treating it like a copy of Word. "Well you just localize Word and then you sell it." What if in that culture the model doesn't work?
S: What if it does?
A: I mean, it's social software. That's the other risk there. You have to think about the audience.
A: And since I'm not Japanese, I don't know the audience at all...
S: I would never have you do the Japanese version. The way to do the Japanese version is to find somebody who really knows the Japanese developer market, knows the Japanese developer mentality, knows the American developer mentality, knows how to translate these things, knows what features you might have to add and what features you might have to remove, and does a really good localized version. If you look at the Japanese version of MS Word, it's not just a translation of the strings. There are all kinds of features for making little greeting cards in there that are... and they have these long snippets, because there are these long formal phrases you always include in Japanese letters, so Word has all these features to handle that. So you do add a small number of features that are necessary to make the product truly native.
S: Yeah, you would have to hire professionals or get local partners, but all I'm really saying here is that it is extremely common... First of all, I think your logical argument, that everybody should just learn English or should know English, or first-class vs. second-class developers, or English as the lingua franca... that's totally true, but not really relevant, because I guarantee you, just walk into that bookstore... Where was I? Oh yeah, Germany, I was in Germany. There's a bookstore right by the museum in Munich, the modern art museum, like two blocks away from it, that is just a programming bookstore. And it's like a whole...
A: I can't believe we're... these are dead tree books?
A: You're telling me the future is dead tree books?
S: All I'm telling you is if you walk into that bookstore, every book is in German, and there's no ... there might be a bookstore with programming books in Silicon Valley, but this is an entire bookstore of dead tree books that are all in German. And that means that the local audience of programmers, and they speak *beautiful* English in Germany, and they STILL prefer to read their books about programming in German.