A: So one thing that happened last week that I was dying to talk about was the whole CNProg thing, that came up with the Chinese copy of StackOverflow.
S: Oh yeah, well, I don't want to, I mean, I hate to be airing our dirty laundry in public and stuff, but...
A: Why is this dirty laundry?
S: Well, I'm about to air dirty laundry in public.
A: Ok, go ahead.
S: We sort of just like, you and I talk and we sort of knew that the content that users type into StackOverflow, except for their personal data, we had sort of agreed that stuff should be "open source content", whatever that means -- Creative Commons. Basically, when you put stuff -- we had agreed philosophically, that if you go and you type a question or an answer into StackOverflow, or you do some editing, you're working on a body of work that is in the public good.
S: Right, and the only exception was going to be, and we weren't going to necessarily open source our code, because that wasn't our goal. Our goal was to make sure people knew that they weren't putting stuff... there's a certain website that I hate to mention by name. Actually there are a lot of websites, even going back to IMDB, that started out as public... or the CD database that has all the CD track listings, that started out as public goods and then suddenly some company owned them. And all the work that people had been contributing on the Internet to making this public good, it was suddenly the copyright of some large organization that was very protective of it. So we didn't want that to happen. We wanted people to know a priori when they put content into SO that you're contributing to the public good and there's no chance that some company's going to take the content and say "We own it" and start charging for it.
The dirty laundry is all we sort of did is slapped some Creative Commons sticker on the bottom of every page in the website, which doesn't really explain what that policy is. And there's two bugs in what we did. One is we never really said "Look, this doesn't cover the CSS. It doesn't cover the HTML. You can't just copy the look and feel." And number two, we also didn't really provide a good mechanism for anybody who wants to exercise that right, to get the data that was contributed in StackOverflow in some way other than screen scraping. And if we catch people doing that we have to ban them because we don't want to support that on our servers.
A: Right. Well certainly we should have clarified the position. To me it was obvious...
S: OK [laughing]
A: It could always be more obvious.
S: It doesn't state that anywhere. It just says "Hey, It's Creative Commons. Have fun..."
A: The ironic thing, he didn't actually copy...
S: We should actually update... there are some listeners that don't know what this... It was a website that somebody did, I don't know who... that was a complete look and feel clone of StackOverflow, down to the point of literally... there's sort of two intellectual property infringements, so to speak. One is I think he actually copied the CSS file, right?
A: Right, it's really a rip. It's not just a copy. It's a rip. It's like in the classical pirate sense.
A: Like complete, like byte for byte copy of the exact look and feel of the site. And the reason this is a problem...
S: So that's a copyright violation. Except that it's not clear that it is because you do have that Creative Commons thing on there, so I'm not really...
A: I don't think it would have mattered if we had the... I don't think it would have stopped this particular..
S: Ok, alright, so that's ... that's neither here nor there. I'm just talking about what the legal situation is. Now the second problem with Creative Commons, I'm pretty sure does not address... is that by making a site that looks exactly like our site (which they did except of course everything is in Chinese), they're violating trademark law, which is completely different than copyright law, in that they're attempting to look like us. And it's something in trademark called trade dress. And that would be a violation no matter how open source we made our site. So they do have to fix that!
A: Yes, I just don't want there to be any confusion. I actually did write the guys and they did respond. And I think I CC'd you on that or BCC'd you...
A: I tried to be clear... first of all, it's nice to have something that people want to copy. And I think we might have briefly touched on this in a previous podcast, it is a compliment that people want to copy you. I mean there's so much stuff out there that nobody gives a damn about, right? So that people care enough to copy you is an incredible compliment. And we do treat it like a compliment. But where we do draw the line is... Ok, you can be inspired by SO and that is totally cool... and honestly we can't stop you anyway. And we're not in the business of stopping people from doing whatever it is that they want to do, as long as it's not hurting anybody. But the point at which there could be confusion about which site is which, except for the language thing which would be a big tip off -- that's bad. Because then it looks like we created CNProg...
S: Yeah, yeah, that's the trademark.
A: We had nothing to do...
S: That's the trade dress
A: Yeah, we had nothing to do with CNProg, so CNProg sucks. It deletes all your... it deletes your hard drive or whatever. People could theoretically think we were responsible for this. The site looks exactly like ours! Did you see they actually copied the blog as well, completely?
S: No I didn't even see that
A: If you click on the blog on CNProg it looks exactly like blog.stackoverflow.com. Same categories, same..
S: Are they just translating the things you post?
A: No, no. They are posting... the content is unique. That's the other thing. They didn't take any of the content that was on SO..
S: Which is the only thing they could have had
A: They could have totally copied all the content. That is completely legal.
S: Well, we have to... you can't... It's not enough for us to just think "This is our policy". We have to have a page that says "This is our policy..." And I think just having a random Creative Commons link on the bottom of every page is if anything more confusing than not... And I think what we need to do first of all is instead of just having Creative Commons link, you know, having a page that explains the license under which things are contributed and saying "Listen, anything, except for the user profile information which is personal, is open.. is owned by the community, etc" It's not whatever the Creative Commons... is licensed under the Creative Commons license. Basically... And it also needs to say, "hey if you type something into SO you are agreeing that these words into SO are going to be under this license". And the next thing we need to do to prevent the screen scrapers and also just to be legit here, if we really are claiming this is open and we don't want screen scrapers pounding our servers... we have to provide some mechanism for people to actually get that data if they want it. And it could be anything. We could say "Listen, send us a $100 bill with the president facing the front of the envelope, such and such an address, and we'll make you a tape, a DAT tape, with ... a LTO-2 tape, with the database backup and send it to you via camel jockey or something." I mean there has to be some mechanism whereby someone that actually wants to get the raw data can get it. It doesn't have to be a download link. But there has to be some way I think.
A: This is a pretty highly voted item in UserVoice, largely because we've been satisfying, not recently but, prior to this we were satisfying a lot of the UserVoice requests as we got them in basically vote order. And that's a very highly voted item. And we do plan to get to it. The actual main barrier to that at the moment is that we have to remove all the personally identifiable information from the database...
A: And I'm a little nervous because of the whole AOL. Remember when AOL released all that anonymous data?
A: That was like not in fact anonymous at all. And everybody was able to track everybody in there. That kind of freaks me out a little.
A: And believe me I'm totally down. I really believe in the CCWiki thing, and I want to follow through, but I don't want to get in serious trouble, you know, like AOL. Basically... is all I'm trying to say. I think what I'll do actually is blog about this and get some feedback on that, because I definitely want to avoid the AOL problem.
S: Right, right, right
A: And it's on the list. It's just not a super high priority, but it's definitely on the list. So, Google's Chinese translation tools are actually surprisingly good. I mean we were able to go in and translate a lot of the things that were going in on CNProg just to figure out what was going on and get reasonably comprehensible English out of it. Which shocked me, but one of the funny things we saw was that somebody asked a question on CNProg.. Essentially asking "Isn't this exactly like StackOverflow?" And the translation was very funny. The translation is, that Google gave, was "Why this child like a two site printout of this mold?"
A & S: [chuckles]
A: Which I thought was very funny, cuz it is kind of what it is.
S: Yeah, it's like a moldy version...
A: [laughing] No, it was really perfect.
A: Now part of me is sympathetic to what they are trying to do. I think they're trying to have a local hacker/programmer culture site
A: in their own language and that's probably something we're not going to get to...
S: Why not?
A: ..in a reasonable time frame..
S: I think that's a mistake
A: Well, I don't think it is for our audience and I kind of had the blog post about that. I think English is the de facto standard language for programming.
S: There are five languages for which that is not true, and you can tell because those are the five languages that Visual Studio is localized into. And I notice this because I have a little Visual Studio plugin for FogBugz that I wrote myself and I get the bugs for those five... I think there's five languages that VS .. there are five important languages... that VS is translated into. And tons and tons of the MSDN content is translated, but not into 39 languages, but into five languages. And these are the languages for whatever reason there is a very very large body of working programmers that are just not as happy in English. I mean, they may know English and they are willing to use English if they have to. But it's just slower for them. They'd just rather use the language that's faster and easier for them if that's available. Those are in my experience German, Spanish, French, Japanese and Chinese.
A: [jokingly] And Latin
S: No, uh Swahili.
A: [fake laugh] I don't know. I mean part of it is... and I acknowledged in my blog post where I talked about this. It's a little uncomfortable to say because you do feel like the Ugly American or the Ugly English Speaking person, but I really think for our audience... we have a small team. We have a limited number of things that we can do in any reasonable amount of time that's not 6-8 weeks. And localization is still pretty hard? [laugh] I think serving the primary audience which is English speaking programmers I think is by far the most important thing and I think if we get that right, really right and concentrate on keeping that right over a year or two years... however long it takes to get to this. I think that's more important than killing ourselves trying to localize really early. And maintain those communities..
S: Well this is a decision that almost every startup makes. I would go so far as to say every startup. Which is either localization for providing different markets is always second priority and it's not a wrong decision. It's a decision that everybody makes. And what happens, just so you know... Look at for example Google or Ebay, these companies launched in the US in English and they tried to localize aggressively but sometimes there was a local