Wikipedia and why it matters

The following talk, titled "What Wikipedia is and why it matters," was delivered to the Stanford University Computer Systems Laboratory EE380 Colloquium on January 16, 2002. The colloquium can be seen at [1]; the recording has pretty much the same text that I have below, plus about a half hour of Q & A. --Larry_Sanger

First, I'd like to thank Dennis Allison for very kindly inviting me to speak here today. I'd also like to thank the Stanford Computer Forum for providing some support.

My topic is an online encyclopedia project that I co-founded and that I have been employed full time in helping to manage. The project's name is Wikipedia, and can be found on the web at Wikipedia.com.

On a friend's advice, I'm going to begin by explaining a bit about why you should care about Wikipedia at all.

Wikipedia is an encyclopedia project. It's free--not just without cost, but free in the GNU sense. I'll explain more about that later. We began work just one year ago (in fact, one year and one day--yesterday was our first anniversary). Since then, participants have added well over 20,000 articles. In the past four months, we have doubled our number of articles. Also in that time, the project has been the subject of focused news coverage by The New York Times, The New York Times Magazine, MIT's Technology Review, and a variety of other sources in the international press. The project has also been favorably mentioned on National Public Radio and by the Associated Press. We've also been Slashdotted a few times.

What's remarkable is that Wikipedia has received all of this attention, and has made all of this progress, in spite of--or perhaps because of--the fact that it is completely open to any contributor. Wikipedia is a WikiWikiWeb, which means that anyone can go to any page, click on a link that reads "edit text of this page," and proceed to edit the article. After doing some copyediting, or adding a few new paragraphs of information, one simply presses the "save" button, and the changes to the article have been made. The editing is automatically logged on a "Recent Changes" page that participants monitor closely. If someone sees something inane, false, biased, or otherwise defective, he or she can quickly and easily change the text.

I think Wikipedia is, therefore, an excellent example of a new kind of website: a radically collaborative, truly open website, that actually produces content that the general public might want to read.

All right. Now, everything I've said up to this point has been introductory. The rest of this talk is divided into two parts, the first part being a discussion of some leading characteristics of Wikipedia, and the second part being some further discussion about why, on various different levels, you should care about Wikipedia. The first part is much longer than the second part. You should be aware that the talk as a whole is not intended to be particularly technical or academic; my main interest here is to introduce you to Wikipedia. I think that just describing it and making a few bold claims about why it's important should be thought-provoking enough.

What Wikipedia is.

So, what exactly is Wikipedia?

There are three essential characteristics of the project that together define its niche on the World Wide Web: it is a WikiWiki, it is an encyclopedia, and it is open content (or free). I'm going to discuss these points in turn.

First, Wikipedia is a WikiWiki. What exactly does that mean? I am told that the word "WikiWiki," or just "wiki" for short, comes from a Hawaiian word for "quick information." Now, there's some disagreement about what constitutes a wiki, but basically it's a software application that allows people to collaborate on the writing of an entire website, without any central editorial mechanism necessary. Participants just go in and edit any page they want, whenever they want, working at their own pace.

Let me give an example of how you might edit a page. As I said, wiki software essentially allows anyone to go to the wiki website and edit any page. So, for example, suppose I go to a certain page about philosophy, and I see two problems with the page: there's a spelling error, and the page doesn't discuss some pet topic of mine. I feel inspired to write an article about that topic. So here's what I do. I press the "edit text of this page" link. Up pops a text box; I scroll down to the spelling mistake and correct it. Then I find a place to add a link to a new article. Say there's some discussion of political philosophy and freedom as a political concept, and I want to talk about so-called negative freedom. So I add a sentence or two that explains the distinction between positive and negative freedom, and when I write the words "negative freedom," I enclose them in double brackets--first two left brackets, then the words "negative freedom," then two right brackets. Then I save changes to the article. Then I'm presented with the article again, but now with the changes I just made. Since there is not yet an article called "negative freedom," there is a little blue question mark after the words "negative freedom" that I typed. So I click on that little blue question mark and I arrive at another text box, which invites me to write something. So I do. I expound and I excogitate and then I press the "save" button. So now I've edited one article and I have created another article. Both of these changes are recorded in a publicly-viewable "Recent Changes" log. Other people can observe the changes I made and, if they want to make some further changes, they can. That's basically how it works.
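As a rough illustration of the double-bracket convention just described, here is a minimal Python sketch. It is not the software Wikipedia actually runs: the function name `render_links`, the `existing_articles` set, and the output markers (`[title]` for an existing article, `title[?]` for one that has not been written yet) are all assumptions made up for this example.

```python
import re

# Hypothetical store of article titles that already exist on the wiki.
existing_articles = {"philosophy", "political philosophy"}

# Matches text enclosed in double brackets, e.g. [[negative freedom]].
LINK_PATTERN = re.compile(r"\[\[([^\[\]]+)\]\]")


def render_links(text, existing):
    """Render [[title]] as a link, or as title[?] if no such article exists yet."""

    def replace(match):
        title = match.group(1).strip()
        if title.lower() in existing:
            return f"[{title}]"
        # No article yet: show the words followed by a question-mark marker,
        # which stands in for the "create this article" link.
        return f"{title}[?]"

    return LINK_PATTERN.sub(replace, text)


sample = "Compare positive freedom with [[negative freedom]] in [[political philosophy]]."
rendered = render_links(sample, existing_articles)
print(rendered)
# Compare positive freedom with negative freedom[?] in [political philosophy].
```

Once someone follows the question mark and saves a new article, adding "negative freedom" to `existing_articles` makes the same text render with an ordinary link.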

I really don't have enough time to explain the history and culture surrounding WikiWikis, but I would certainly be remiss if I did not mention that the man who is responsible for the original wiki is Ward Cunningham, who has co-written a book called "The Wiki Way." You can see the very first wiki, which is still one of the largest and most active, if you follow a link from c2.com. I'd also be remiss if I did not make it very plain that the online culture surrounding Wikipedia is quite different from the online culture surrounding most other wikis. For more details, I would recommend a visit to c2.com.

This brings me to the second point. The Wikipedia project is self-consciously an encyclopedia--rather than a dictionary, discussion forum, web portal, etc.--or even just a typical wiki. Quite a few other wikis do not have any very explicit purpose. They are, their hosts say, whatever the participants want them to be. As a result, as you might expect, those wikis often end up being nothing in particular, and generally just another way to conduct online discussions. Now, I don't mean this as a criticism of other wikis--indeed, wikis have a very interesting, refreshing openness and cordiality that Wikipedia shares in. I'm just trying to distinguish Wikipedia, which has a very specific purpose--to build an encyclopedia--from other wikis, which in many cases do not have any such specific purpose.

Actually, there is a bit of a tension between the facts that Wikipedia is an encyclopedia project and that it is a wiki. When people arrive at the website and see that they can edit any page, and that there is little central oversight, it is immediately evident that Wikipedia is an encyclopedia only because we decided to make it one. The website, when we first set it up, was just a blank slate. It required myself and some other people to declare, "We're making an encyclopedia here on this wiki"--we had to make that declaration repeatedly in order for people to know that indeed, we wanted to make an encyclopedia. We could have instead made a poetry forum, a dictionary, or a chat area. But we didn't want to. We wanted it to become an encyclopedia, and that is what it has become.

You might be skeptical, however, that a wiki could produce a reference work that would deserve to be called an encyclopedia. In that case, you would think that "wiki-based encyclopedia project" would be something of an oxymoron. I'll address that objection shortly. Before I do that, though, I want to present the third defining feature of the Wikipedia project. So far, I've said that Wikipedia is a wiki and that it's an encyclopedia.

It's also essential to note that the content is all open content, or free in the GNU sense. Wikipedia contents are released to the public under the GNU Free Documentation License. This license works, roughly speaking, as follows. Text and media are licensed by the copyright holder to the general public, permitting anyone to redistribute and alter the text free of charge, and guaranteeing that no one will be able to restrict access to amended versions of the content. An open content, or free content, license is intended to guarantee that all of the content stays free forever. If we were simply to declare that the contents are in the public domain, then anyone could come along, make some slight changes to the text, and then copyright that, preventing others from using their slightly-changed version. We want to prevent that from happening. As you can see, the concepts are much the same that you've encountered in the open source or free software movement.

You might not guess this immediately, but it is very important that Wikipedia is open content. It's important for two reasons. First of all, Wikipedia's contributors--called Wikipedians--understand that their efforts will be freely distributable forever. That is, I think, one of the main incentives they have to participate. If the organisers of the project were instead to claim the content for themselves, and not release it freely, the participants would have the sense that they were merely working for someone else's gain. But since the content is free, contributors instead have the sense that they are working for the benefit of a worldwide audience. And they are!

The second reason it's important that Wikipedia is open content is that this helps to make the project an institution that can take on a life of its own. If, for whatever reason, the organisers can no longer support the project--we definitely don't expect that to happen, by the way--then others can pick up where we left off, if necessary. There is no reason that Wikipedia's content ever needs to, as it were, sit on someone's shelf, gathering dust, as traditional encyclopedias have done. It's because it's open content that the world is free to continue to develop it. I think that's a great thing. It portends great things for Wikipedia and for its sister project, Nupedia, in the future. Though Wikipedia has 20,000 articles now, the quality of the articles is admittedly uneven--while we have a lot of rather good articles, we also have a lot of articles that need work. (Wikipedians are very honest with themselves about this, by the way.) But what about in ten or twenty years? To take that question seriously--"What about in ten or twenty years?"--requires that we assume that the project is going to exist in ten or twenty years. Few things in this life are certain, but we can rest a lot more assured that Wikipedia will still exist in ten or twenty years precisely because it is open content. A lot of people care about this project, and the license guarantees that it will stay free to continue to develop.

So, indeed, what will happen in ten or twenty years? It's hard to say for sure, but if we continue on as we have been, it seems pretty likely that we will have hundreds of thousands of articles, and that the older articles, at least, will be very long and very well-developed, having been mercilessly edited by many different people. Some of us in the project have suggested that Britannica and Encarta have something to worry about--which, of course, sounds ludicrous right now.

Next, let me talk about the growth of the project, because I think a certain bit about that might be of some interest to you. As I said, we began a year ago, on January 15, 2001. The project was originally going to be a side-project for Nupedia, which is also a free encyclopedia project I manage; Nupedia, unlike Wikipedia, is very carefully peer-reviewed, and managed mainly by academics with Ph.D.'s. Nupedia's advisory board did not want Nupedia to be associated with a wiki. So Wikipedia was born as a separate project but still had the distinct benefit of the participation of a number of well-educated Nupedia contributors. So, after a few months, we had created over a thousand articles. As the months passed, the pace of production increased. We announced having over 6,000 articles in July, which would be about a thousand articles created in a month; in September we announced having 10,000 articles, which indicates that between July and September we were creating 2,000 articles per month. Then, between September and the present, we have, we estimate, created another 10,000 articles, which indicates that in that time, despite the holiday slowdown, we were creating something like 2,500 articles per month.
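The monthly rates quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes each announcement falls on a whole-month boundary counted from January 2001, and it treats "over 6,000" and "over 20,000" as exactly 6,000 and 20,000.

```python
# (label, months since January 2001, approximate article count)
# Month offsets and rounded counts are simplifying assumptions.
milestones = [
    ("January 2001", 0, 0),
    ("July 2001", 6, 6_000),
    ("September 2001", 8, 10_000),
    ("January 2002", 12, 20_000),
]


def monthly_rates(points):
    """Average articles created per month between consecutive milestones."""
    rates = []
    for (_, m0, c0), (label, m1, c1) in zip(points, points[1:]):
        rates.append((label, (c1 - c0) / (m1 - m0)))
    return rates


rates = monthly_rates(milestones)
for label, rate in rates:
    print(f"up to {label}: about {rate:.0f} articles per month")
# up to July 2001: about 1000 articles per month
# up to September 2001: about 2000 articles per month
# up to January 2002: about 2500 articles per month
```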

I and the other project organisers have a theory as to why Wikipedia traffic has increased. After reviewing our referrer logs last summer (I think it was), we observed that the Google search engine had been sending us a growing amount of traffic, and since then, Google has continued to send us increasing amounts of traffic. We believe that we are, happily, in a positive feedback loop with Google, as follows. We write a thousand articles; Google spiders them and sends some traffic to those pages. Some small percentage of that traffic becomes Wikipedia contributors, increasing our contributor base. The enlarged contributor base then writes another two thousand articles, which Google dutifully spiders, and then we receive an even larger influx of traffic. All the while, no doubt in part due to links to our articles from Google, an increasing number of other websites link to Wikipedia, increasing the standing of Wikipedia pages in Google results. Needless to say, we're delighted to be in this situation.

All right, so far, I've explained three distinguishing characteristics of the Wikipedia project--it's a wiki, it's an encyclopedia, and it's open content--and I have said a few things about Wikipedia's growth. Next, I would like to switch gears a bit, and answer some obvious objections to the very idea of Wikipedia. There is, in fact, a very long page on Wikipedia that you can read in which I and some other people have posed and answered a whole raft of criticisms of Wikipedia.

In academia, we learn, one way or another, that the reputation of an author is very important for determining credibility, and reputation is determined to a great extent by such things as faculty appointments and by publications. Given that, how can we respect the credibility of a project in which any anonymous user can drop any string of characters onto a page? There are a lot of other, closely related objections. If anyone can contribute, how do we know that what we're reading hasn't been written by a crank, a mediocre student, or someone with an ax to grind? The project has no peer review process, or at least no traditional review process--so how can we trust the content generally?

The short answer to most such objections is that we are constantly editing each other's work via the same process that makes easy article creation possible in the first place--and this turns out to be a reasonably powerful, though far from perfect, review process in itself. Moreover, as I'll explain in a moment, Wikipedia remains loosely associated with Nupedia, which has a more traditional academic review process.

Now to expand this short answer. The general objection can be formulated as an argument (in the form of a hypothetical syllogism), as follows:

Premise 1. If Wikipedia is open to anyone to edit, then it permits contributions by cranks, vandals, dilettantes, and other unreliable sources.

Premise 2. If Wikipedia permits contributions by the aforementioned unreliable sources, then it is itself an unreliable reference.

Conclusion. Therefore, if Wikipedia is open to anyone to edit, then it is an unreliable reference.
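For readers who prefer symbols, the argument above has the standard form of a hypothetical syllogism. The letters are labels chosen for this sketch, not part of the talk: O stands for "Wikipedia is open to anyone to edit," C for "it permits contributions by unreliable sources," and U for "it is an unreliable reference."

```latex
\[
\begin{array}{ll}
\text{Premise 1:}  & O \rightarrow C \\
\text{Premise 2:}  & C \rightarrow U \\
\hline
\text{Conclusion:} & O \rightarrow U
\end{array}
\]
```

The form is valid, so the conclusion can be resisted only by rejecting one of the two premises.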

Now, the first premise here is true (in fact, the logicians will be happy to point out that it's logically true). Indeed, if Wikipedia is open to anyone to edit, then it does permit contributions by all sorts of riff-raff. It's the second premise that those familiar with the project will deny. Just because we do permit cranks, vandals, etc., to contribute, it hardly follows just from that that Wikipedia is unreliable. What if, as is the case, we have a lot of reliable regulars who are constantly monitoring the "Recent Changes" page, and cleaning up after the riff-raff, and constantly editing everyone else's work as well, for that matter? If a crank contributes some crankish article on a pet theory only to find it nearly instantly deleted, or edited beyond recognition, then the crank is going to go away in a huff. We've seen this happen a number of times already. As for vandals, if some kid with too much time on his hands types in obscenities on a page, it's easy enough for someone to restore the last unvandalized version of the page; and if necessary, I personally ban the vandal's IP address from working on the project.
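Restoring the last unvandalized version is cheap because every saved version of a page is kept. The Python sketch below illustrates that idea only; the `Page` class and its method names are hypothetical and are not taken from the actual wiki software.

```python
class Page:
    """A page that keeps every saved revision, oldest first (illustrative only)."""

    def __init__(self, text):
        self.revisions = [text]

    @property
    def current(self):
        """The text a visitor sees: the most recently saved revision."""
        return self.revisions[-1]

    def save(self, text):
        """Record a new revision; nothing earlier is overwritten."""
        self.revisions.append(text)

    def revert_to(self, index):
        """Restore an earlier revision by saving a copy of it as the newest one."""
        self.save(self.revisions[index])


page = Page("Philosophy is the study of fundamental questions.")
page.save("SOME VANDALISM")   # a vandal replaces the text
page.revert_to(0)             # a regular restores the earlier version
print(page.current)
# Philosophy is the study of fundamental questions.
```

Because the revert is itself just another saved revision, the vandalized version stays in the history and the fix shows up in the log like any other edit.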

My point here is that it is hardly as though Wikipedia were simply--no more than--a repository of nonsense and drivel, because it certainly is not that, and the reason for that is that Wikipedians keep a watchful eye over new contributions. I am not trying to persuade you that Wikipedia produces work that can, at present anyway, match the reliability of a traditional encyclopedia. But over time, articles improve considerably, and a number of our articles are nearly as good as anything you'll find in any encyclopedia. That's because the core group of our contributors includes quite a few Ph.D.'s, professors, professionals, and top-flight students. So I would say that a better way to estimate the reliability of Wikipedia, than simply to say that it permits cranks and vandals to contribute, is to observe that the articles that are now part of Wikipedia have been created by a general public under the general guidance of some very smart, articulate people. In other words, article quality is determined by standards enforced by the best and brightest of a particular community--not by the worst, most transient elements of that community.

Still, you might ask yourself whether there is any sort of guarantee that the information in any given article is reliable, at least according to someone whom we ought to be able to trust--you know, an expert. Well, that's a good question, and the answer is: not yet. But we have discussed various ways of putting someone's stamp of approval on our best articles. We want to design an approval system that does not interfere with the creation of new content--we don't want to kill the goose that lays the golden eggs. Probably the most promising scheme is to combine efforts with Nupedia, letting Wikipedia act solely as a content-generation system and letting Nupedia act as a content-approval system. One thing that makes me a bit optimistic about this is that we are in the process of designing a greatly-simplified, but still traditional academic, review system for Nupedia articles. In essence, the process of getting an article accepted to Nupedia will be very similar to the process of getting an article accepted to an academic journal: an expert reviewer will simply read the article and say "yea," "nay," or "conditional pass." This should be a great improvement over Nupedia's present system, which, frankly, has ground to a halt under the weight of its baroque and extremely rigorous review system, after having produced twenty-five articles. (Twenty-five excellent articles, mind you.) We think the new Nupedia system will, for a variety of reasons, be much more productive, and some of us think it might prove to be a suitable review mechanism for Wikipedia articles.

There's one more very important aspect of Wikipedia that I think needs to be introduced, and that is that Wikipedia projects in other languages are being developed concurrently with the English language project. Currently, some of the largest and most well-trafficked projects, aside from the English language project, are those in Polish, Spanish, and German, but you might be interested to know that the Esperanto project has been taking off as well; and content has been added to other Wikipedias too.

Unfortunately, it's difficult for me to manage these other projects because, while I know a little German and even less French and Russian, I don't know enough of any of these languages to be taken seriously as a sort of project leader. So, while Wikipedia.com hosts these websites, for the most part they have been developing entirely on their own. We have set up a mailing list for cross-language Wikipedia discussion, though, and that provides the community a very weak oversight function.

There's an issue about the international Wikipedia situation that you might find interesting--I certainly do, anyway. The issue is how, and to what extent, work done by these different wikis should be coordinated. For example, should we, the organisers of the project as a whole, try to dictate to people who are working on the French language wiki project that articles in French should, wherever possible, be translations of any existing English language articles? Well, we think that such an attempt would be clearly anti-wiki--which, in case you didn't know, is a bad thing. Basically, the reason that wikis work as well as they do is that people feel free to add content to them. Now, in organising the English language Wikipedia project, I think we've discovered that a wiki can be guided to satisfy certain basic policy constraints. So it's not just nonsensical to suggest that we might ask people who are writing in other languages to translate a longer, English language article on a subject, if such an article exists in English. We could ask people working on the French language wiki, for example, to do that; the problem is that, probably, no one would want to work on the French language wiki in that case.

So, instead, with the newest version of the software that runs Wikipedia, we're going to make it very easy for people to set up links between articles in different languages. This, I think, you might find to be an interesting technical puzzle. If we write an article titled, say, "philosophy" for the English language Wikipedia, and the Germans have written a different article, titled "philosophie," for the German language Wikipedia, we need a nice easy way to link those two articles. Then, if someone writes an article about philosophy in Polish, the software should allow us simply to link the article to either the English or the German language article, and all of the articles about philosophy will be interlinked automatically.
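One way to get the "link once, interlinked everywhere" behavior described above is to treat each link as merging two groups of articles, which is what a union-find (disjoint-set) structure does. The sketch below is an assumption about how the puzzle could be approached, not a description of the Wikipedia software; the class `InterlanguageLinks`, its methods, and the Polish title "filozofia" used in the example are illustrative.

```python
class InterlanguageLinks:
    """Groups articles on the same topic across languages (union-find sketch)."""

    def __init__(self):
        self.parent = {}  # article -> representative-chain parent

    def _find(self, article):
        """Return the representative of the group that contains this article."""
        self.parent.setdefault(article, article)
        while self.parent[article] != article:
            # Path halving: point at the grandparent to keep chains short.
            self.parent[article] = self.parent[self.parent[article]]
            article = self.parent[article]
        return article

    def link(self, a, b):
        """Declare that two articles cover the same topic."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def linked_articles(self, article):
        """All other articles in the same group, in sorted order."""
        root = self._find(article)
        return sorted(
            other
            for other in list(self.parent)
            if other != article and self._find(other) == root
        )


links = InterlanguageLinks()
english = ("en", "philosophy")
german = ("de", "philosophie")
polish = ("pl", "filozofia")

links.link(english, german)  # English and German articles are linked first
links.link(polish, german)   # the Polish article is linked to just one of them

print(links.linked_articles(polish))
# [('de', 'philosophie'), ('en', 'philosophy')]
```

The Polish article was linked only to the German one, yet it ends up connected to the English article as well, which is the automatic interlinking the paragraph asks for.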

In this way, while we won't have any official requirement that people translate articles from one language into another, we can adjust the articles in different languages to each other. Hopefully in the long run, all of the articles will have a maximal amount of useful information about all of their topics, in all languages spoken on the Internet. The dream is that we can get to that point not by establishing a single lingua franca like English, but by allowing articles in all different languages to converge naturally (as it were) through interlinking.

Why Wikipedia matters.

So far in this talk I've simply tried to introduce you to the Wikipedia project. In the second, briefer part of the talk, I'm going to explain why, at least in my opinion, Wikipedia matters--or how it could matter after some more development, if it is properly guided. I'm going to try to persuade you that it is, or could become, an important project on multiple levels.

There's a lot that I could say under this heading, but I'm going to have to restrain myself. The fact is that, right now, like most websites and like most open source and open content projects, Wikipedia just isn't very important, not in the grand scheme of things anyway, and it would be rather silly of me to try to convince you otherwise. So I'm going to try to limit my remarks to the more demonstrable and plausible of the points that I could make.

Importance as an encyclopedia

Wikipedia is of growing importance as an encyclopedia, primarily because it is a free encyclopedia. If we were where we are now and it were 1999, when Britannica and other encyclopedias online were free of charge, this would be harder to maintain. But as things are, as surprising as this might sound, Wikipedia is rapidly becoming one of the better free reference works online.

Moreover, Wikipedia's freedom will grow only more important as our content grows in breadth, depth, and quality. If there comes a time when it is beyond any doubt that, in terms of breadth, depth, and quality, Wikipedia is comparable to some given proprietary encyclopedia, that encyclopedia might have a difficult time staying in business. And if, as is possible but by no means certain, Wikipedia and Nupedia together develop to become a really great encyclopedia, one has to wonder what will become of the proprietary encyclopedias like Britannica and Encarta. Of course, those proprietary encyclopedias have nothing to worry about right now. For all we know, maybe no open content encyclopedia will ever develop into anything that can compete with Britannica, for reasons that we can see now only murkily. But if they have anything to worry about, it is the fact, as I explained earlier, that, since it is open content, Wikipedia is quite naturally an institution--meaning that we have the advantage of time.

One might very well consider Wikipedia and Nupedia together to be important as the first example of a usable encyclopedia that is built collaboratively by the public. Some maintain that the Web itself is, for all intents and purposes, a giant encyclopedia built, of course, by the public (who else?). So if you consider the Web as an encyclopedia, then of course Wikipedia is, as just a very small part of the Web, not at all significant as an example of a publicly-built reference source. Even conceding that, though--which on semantic grounds one might not want to--Wikipedia would be our first example of a usable encyclopedia built by direct collaboration among members of the public, an international public.

Importance as an online, collaborative project

Another area in which one might well think that Wikipedia is of importance is as an online, collaborative project. I say this because it tests and demonstrates the viability of completely wide-open, collaborative writing, made possible only by the Internet, and implemented perhaps for the first time by wiki software. The Web is of course an amazing medium, permitting the extremely rapid propagation of and interlinking between documents, and thereby providing a lot of new ways to learn, discuss, and collaborate. In my opinion--though this thought is of course not original to me--the wiki concept represents a leap forward in how the web can work. It allows total strangers from around the world to collaborate on the creation of content in ways that, prior to wikis, were virtually unknown, even on the Internet. The closest thing to it was the collaboration people often did on Usenet FAQs. But that's a poor analogy, because there was always an FAQ maintainer. On a wiki, there is no "wiki maintainer." This is one of the main reasons that I am not the editor-in-chief of Wikipedia. The whole idea of an editor-in-chief, of the ultimate responsibility for a text resting with some one person or small group of people, is contrary to the way that wikis can and should work.

Now, to be clear, what makes Wikipedia important in this context is that it provides one of the very best examples, in my opinion, of the success of the wiki concept. Of course, it's a new wiki--it's far from being the first. Moreover, there have been other useful wikis, of course. But those older wikis have been mainly of interest and of use to computer programmers; being an encyclopedia, Wikipedia has a much broader appeal. I think Wikipedia has proven and will continue to prove to the general online public, in a way that the other wikis could not, that online collaborative writing in a radically open context is a viable option. I think that as Wikipedia becomes better known and as it grows, it will be pointed to as an example of how and why open, collaborative writing can produce useful content.

To put it simply, Wikipedia demonstrates particularly clearly how people can collaborate and talk to one another in a new way online.

Importance as an open content/free resource

My final point can be made briefly.

I think Wikipedia is important in that it can be held up as one of the best examples of an open content project, or a free content project ("free" being used here in the GNU sense). Indeed, Wikipedia and Nupedia are frequently cited in online discussions as among the most important examples of open content projects. Of course, there are a number of books and articles that have been released under an open content license, which you can read, for example, at Andamooka.org; but very many of those books are, again, aimed at programmers. Like the Internet as a whole, both wiki and the open content concept have started out being of interest mainly to programmers; but, if it succeeds rather better than it has done so far, I think Wikipedia will help to make the wiki concept and the open content concept popular among a much broader audience. I think that's a good thing.