Why Wikipedia ran slow in late 2003

This is an archived discussion of performance issues which affected the Wikipedia site in 2003 and the first month of 2004. They were solved by a fund-raising effort, a new hardware purchase, and the use of Squid to cache about 70% of hits. See why Wikipedia runs slow to see if there are any current performance issues.

Wikipedia has displayed a pattern of usability and access difficulties. The background, history, current examination, and potential remediations of these problems will be compiled on this page.

The current discussion thread, Wikipedia Really Slow, is at the bottom of the page. Please append or insert comments in the thread.
Comments that just say "yes, Wikipedia is slow! I find this frustrating" are not helpful; everyone knows this already. Concrete suggestions for improvement from people with experience running web sites under heavy load are helpful. Concrete offers to provide new servers to split the load are helpful. Actual runnable more efficient code is helpful.
Quick Answer.   Wikipedia has grown very rapidly (broke top 1,000 websites, 9/2003), and is still very young (began 1/2001). The platform is a combination web server, MySQL database, and the Wikipedia software. Several physical servers have been outgrown, and future loads will demand leading-edge performance. MySQL, while a meritorious database product, was designed with an eye on moderate-sized bases, and moderate complexity of application: both conditions are receding memories at the 'pedia. The Wikipedia software is quite new, still evolving, and ongoing development has for some time taken place while simultaneously delivering the encyclopedia to the public and supporting heavy editing activities. Not unlike performing major automotive overhauls while cruising down the freeway.

This page will contain (insert suggestions):

  • Updated current status report, log
    • OpenFacts for Wikipedia posts the status of Wikipedia and related domains, as Up or Down. Also offers a log of server failures back to July 2003, with brief live-action commentary, and a troubleshooting conversation between Brion VIBBER, Angela, and Tim Starling. "OpenFacts uses the Wikipedia software."
  • Descriptions of the general causes of such difficulties.
  • Brief summaries of, and links to previous references to the issues, located elsewhere on Wikipedia.
    • Cache strategy starts out with discussions of client-side versus server-side caching strategies, but then branches to several more performance and accessibility related conversations. Much info, and numerous further links. (thanks, Martin)
    • One-pass parser is a compact description of multiple pass parsing methods, with the use of Regular Expressions (notoriously slow). Includes an informative exchange between Magnus Manske and Brion VIBBER.
    • Wikitech-l is the Wikipedia Mailing Lists center: follow Sign-up links to subscribe. The "Technical Issues" list deals with Wikipedia performance concerns, among other tech matters. Contains links to the archive of each list.
    • Main causes of lag speculatively submits "Too many users?", "Poor programming?", and "Too many searches?" to account for lag.
  • What does not cause the problems.
    • Bandwidth
      • "Bandwidth is not a problem." --Brion VIBBER 18:27, 13 Sep 2003 (UTC). Contributed to current Wikipedia Really Slow discussion.
    • rambot, and other means of automated entry.
      • rambot. see: Is your bot slowing down Wikipedia? "...A Wikipedia administrator checked the logs and verified that it is not causing the server lag." by Ram-Man.
    • Pages containing large numbers of links and dependencies.
  • Stakeholders in Wikipedia: implications of impaired usability.
    • Page-visitors, largely from Google.
      • Google referrals typically come to good articles, due to engine indexing technique. Surfers suffer the gamut of Internet reality as a matter of course, including several forms of outright abuse. Delayed server response is a lesser evil, and is rather common among ethically motivated information delivery websites. They will hardly remark on significant delays.
    • Encyclopedia users: finding information, wandering.
    • Anonymous editors. A very big component.
    • Registered editors.
    • System Administrators. sysops
    • Developers.
    • Jimbo Wales
    • WikiMedia
  • Prospects, means and forms of probable solutions, or other outcomes.
    • Pushing_To_1.0, by Jimbo Wales is a large effort to refine definitions and terms for the formal release of Wikipedia v1.0. Many conversations on topics and issues.
    • Software_Phase_IV explores ideas for the next version of the Wikipedia software. Still largely in meta form, but info-dense. Reflects on some Phase III topics, too.

Advanced Wikipedians can make most timely contributions, by posting right below links or hints to the presence of existing relevant material of which they are aware, 'out there' in the vast Wiki-domain. Fresh discussion or insight from experts would be greatly appreciated (Experts do not need Degrees. Right, Tim? ;):

Thank you! --Ted Clayton 02:12, 12 Sep 2003 (UTC)

The servers are about to be upgraded, which will hopefully alleviate the situation for the near future. Longer term, maybe clustering the servers, mirroring the database and connecting the servers with optical fiber may help? There is a fair amount of activity in the Unix/Linux clustering area at SourceForge (search software/group for cluster). Maybe one of the projects there would be of use here. CyberMaus 12:25, 13 Sep 2003 (UTC)

Yes, CyberMaus, word of a server upgrade is good news. The caveats you attach, though, nicely paint the picture to know. This page, "Why Wikipedia runs slow" appears set for a long run. With any luck. The accessibility problems here at the 'pedia are basically to die for, being the price of dramatic success. The steep part of the growth curve is on several counts still ahead. The software and hardware folks are going to be very hard pressed. Another couple upgrade cycles and the needed performance marks are going to be rarified stuff. The cycles have been coming quick, and are likely to compress. Modest solutions won't do: like the ideas you mention, they must before long range near the limits of the art. --Ted Clayton 01:59, 14 Sep 2003 (UTC)
Has anybody looked at Lingerd to see if using it may improve the situation?
From the Lingerd website:
Lingerd is a daemon (service) designed to take over the job of properly closing network connections from an http server like Apache.
Because of some technical complications in the way TCP/IP and HTTP work, each Apache process currently wastes a lot of time "lingering" on client connections, after the page has been generated and sent. Lingerd takes over this job, leaving the Apache process immediately free to handle a new connection. As a result, Lingerd makes it possible to serve the same load using considerably fewer Apache processes. This translates into a reduced load on the server.
Lingerd is particularily useful in Apache webservers that generate dynamic pages (e.g in conjunction with mod_perl, mod_php or Java/Jakarta/Tomcat).
This could prove to be a simple way to increase the efficiency of the current setup. -- Ap 11:14, 17 Sep 2003 (UTC)

From an experienced administrator of busy websites..
I was a developer a while back for the (purportedly) 17th busiest website in the world and though the sysadmins were more directly involved in improving response speed, I ended up doing a lot of stuff myself.

The way that Wikipedia is slow is very similar to problems we had. The delays are almost all in the initial request for new pages. Once the connection is made, content usually comes across rapidly. This usually points to some sort of full queue in software, or a full queue due to excessive connections on a single machine causing a hardware wait state.

New URL requests are made to stand in line, sometimes because settings for maximum simultaneous connections are too low, or the settings are high enough but all RAM is consumed servicing current requests, etc. This may seem obvious, but it lets us de-emphasize other potential problems such as a bloated, overworked DB, bogged disk fetches, etc. So, based on all this, I would say the greatest single improvement would be to set up some sort of simple DNS round robin (true load balancing could come later). I'm not sure what your current server setup is, but if you could have at least two Apache servers running on two machines with one of them running the Round Robin algorithm, I think the majority of your response problems would disappear. Don't listen to those who say Round Robin is a naive approach. It's true that allocation of new connections is done in a "dumb" way (in a two server setup it will just throw every other connection to the secondary webserver) -- but that's all you really need, I think. Suddenly each machine is servicing half the client connections and everything is fast... Of course, maybe the reasons for your slowness are more complex, but based on what I can see from the client side, my suspicion is that a simple Round Robin would clear it all up and that simply adding new Apache processes on new servers as you grow would make you at least 10 times faster during peak times than at present. JDG 14:21, 19 Sep 2003 (UTC)
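
The "dumb" allocation described above can be sketched in a few lines. This is only an illustration of the rotation itself, not of how DNS round robin is configured (there, the name server rotates the order of the address records it returns). The backend names below are hypothetical, since the actual server list is not given in this discussion.

```python
from collections import Counter
from itertools import cycle

# Hypothetical backend names; the real machines are not named in this thread.
BACKENDS = ["apache-1", "apache-2"]


def assign(requests, backends):
    """Hand each request to the next backend in a fixed rotation.

    No attention is paid to how busy a backend is; that is the
    'dumb' part of round robin.
    """
    rotation = cycle(backends)
    return [(request, next(rotation)) for request in requests]


# Ten incoming connections, numbered 0..9 for illustration.
assignments = assign(range(10), BACKENDS)

# Count how many connections each backend received.
load = Counter(backend for _, backend in assignments)
for backend in BACKENDS:
    print(backend, load[backend])
```

With two backends every other connection goes to the second machine, so each ends up with half the connections regardless of how expensive the individual requests are. That blindness to load is the usual objection to the approach, and the reason true load balancing is mentioned as a later step.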

I took the liberty of editing your message for formatting purposes as I found it extremely useful. --Maio 14:26, 13 Jan 2004 (UTC)

Previous discussion

I know there are a lot of wikis on this server, so most of them will have this problem. Could all the Wikipedias on this server please disable their maintenance pages, or use these pages only at times with less traffic, such as at night? I think it will be faster then.

You mean the maintenance pages like Special:Shortpages and Special:Lonelypages? They're slow on en: because the database there is so big. Are they slow on the other languages too? How long do they take to load, on average? -- Tim Starling 08:07, 9 Sep 2003 (UTC)
Many of these pages have already been disabled on the bigger ones, such as de and fr. Speed was horrible on international wikis about 8 hours ago; now it is on en again. Sigh.
Sorry for my bad English, but I wanted to start an all-Wiki discussion on how to improve the speed. Sometimes it's very fast, and sometimes it's very slow and you are unable to connect to the server. And the growth is just beginning, when I look at the stats. Someone on the Dutch Wiki, but not a system admin. And sometimes on the English one for [[nl:]], but it is also sometimes very slow.

Where to discuss Wikipedia really slow

<Hmmm... the following section, down to Wikipedia Really Slow, is the lower portion of Wikipedia Really Slow, cut from its logical location and pasted in front of the beginning of Wikipedia Really Slow. Otherwise no changes, except for the addition of a heading. Should go back.> --Ted Clayton 06:19, 14 Sep 2003 (UTC)

I did this as a deliberate move to allow editing of the two portions of discussion - issues and where to discuss - without having to risk an edit size too large for some browsers. It also makes a convenient chunk to remove once the question is resolved. Do you think that this move actually changes the character of the discussion in some way? If you do, please do go ahead and restore things, but it appears not to harm meaning in any way. JamesDay 06:52, 14 Sep 2003 (UTC)

Well ... I've sure noticed how pages can grow! ;) Something has to be done, and in fact I've thought on how/whom to approach for advice on dividing this page as it grows (... ;). But, the transposition you made didn't change the size of the page, just split the thread in half and put the whole second half in front of the first half. I'm glad you think the Where to Discuss topic merits deliberate attention. Most of the Wikipedia Really Slow thread that came from Village Pump is disjointed - a set of fits and starts - so there's no sustained narrative to disrupt. Having it sitting in original chronologic order seemed a reasonable 'organization'. ... Ohhhh - you mean you did this to force one of those little <edit> links in the right margin? Allowing us to edit smaller portions of the page? Yes, I see that a) we lost the original edit links in the cut 'n paste to Meta, and b) we want moderate chunks. I'll figure out <edit> and install them at handy intervals. Thanks! --Ted Clayton 08:27, 14 Sep 2003 (UTC)
Yes, that was the intent. Not a smaller page, but smaller edit chunks, which also reduce the chance of an edit collision. If we've reached mutual understanding with no outstanding reservations, how about a one line summary of this aside, or outright deletion, once you've read this reply? :) JamesDay 00:06, 15 Sep 2003 (UTC)

Alternative activities and projects are constructive - good suggestions!, but a direct request for a discussion of important technical matters remains unmet. Is it improper to raise - or respond to - the accessibility issues? Is the difficulty wrapped in a difficulty? -- Ted Clayton 04:16, 11 Sep 2003 (UTC)

You can discuss technical issues at wikitech-l. There are also some pages on meta: m:Cache strategy, m:Main causes of lag, m:One-pass parser, etc. Martin 09:56, 11 Sep 2003 (UTC)

The Village Pump introduces itself as the place to "...raise and try to answer Wikipedia-related questions and concerns regarding technical issues, policies, and operation in our community." (my emphasis) I have raised prominent and systemic technical issues that affect visitors, lay editors and advanced wikipedians alike. There are clear policy and operational considerations connected to these issues. All of these matters - the questions, the concerns, the policies, and the operations are all explicitly identified as the proper, and sole, purpose & content of the Village Pump.


The additional pages that you reference contain important information that helps me a lot. Searching had not uncovered these: thanks! There are links in those pages going to others, and others yet.. I'll explore, dig into the archives, sign up for WikiTech newsletters, and study.

The Village Pump is where these matters should be taken up, firstly. Specialized pages and mailing lists are vital, too, but the broad aspects of the Wikipedia accessibility issues belong in front of the general community of visitors, users, and editors. Perhaps the results of this discussion should be gathered into/under a page (with links to tech pages?) that remains readily & easily available, that rapidly brings newcomers and interested visitors up to speed or directs them as their interests lead.


Hmm .. I see this thread is now slated to be removed from the Village Pump to a place called Lag, wikipedia:lag. That takes it out of public view, before any actual discussion has even occurred. If the increasingly voluminous preliminaries to discussion need to be pared, let's do that and leave the thread here, with a link to Lag. -- Ted Clayton 15:02, 11 Sep 2003 (UTC)

Ted -- stuff gets archived into FAQ pages sometimes -- Tarquin 17:01, 11 Sep 2003 (UTC)

Thanks, Tarquin! I'll scan those. --Ted Clayton 21:05, 11 Sep 2003 (UTC)

  • re: the results of this discussion should be gathered into/under a page: Good idea. Are you volunteering? Angela

I would be the chief suspect. ;) An actual discussion of the matters should be taking place (it hasn't, yet), in a readily accessible, familiar location, and an appropriate place for the link selected, so that it remains in evidence to all comers. But yes, I'll do the work.

A link to such a place would be retired/archived when the issues become history.

  • re: slated to be removed: How else can you stop this page getting too big? Meta is probably a better place to discuss these issues as it affects all Wikipedias, not just en:. Angela

I am only now becoming familiar with the existence of Meta, after a month. Village Pump is the prominently advertised & self-identified location for public matters. This appears to me to be the forum attended by the affected audience.

I do see that this discussion about a possible discussion is getting long, and we need to keep Village Pump usable. But Lag is mostly the record of a problem from a year ago, and it's functionally invisible.

If we move this stuff somewhere, but keep a header, intro and link here, that would be fine: can that work? And make new entries as old ones get large and should be moved? I do in principle like the generality/internationality implied by Meta. Do you think a compendium page such as we mentioned above should be in Meta? -- Ted Clayton 16:55, 11 Sep 2003 (UTC)

A permanent link from here to the lag discussion (which I do think should be at Meta) would be useful as it is a recurring question. Thank you for taking on this task. Angela 19:29, Sep 11, 2003 (UTC)

My pleasure! Do I recall that you must perform the move, Angela? A link in the list beneath the Village Pump introductory paragraph?:

It isn't possible to move a page from here to Meta in the proper way. You have to use the cut and paste method. There is already a m:Why Wikipedia runs slow page over there with not much in it so far, so you may want to add this text to that page rather than create a new one. Angela 23:09, Sep 11, 2003 (UTC)

Wikipedia really slow

Moved from en:Wikipedia:Village pump on Saturday, September 13th, 02003.

  • I'll second Ted's comments. The Wikipedia is close to useless for writing/editing before about 4pm ET. I can get up and read, but I find that I like writing and editing most of all. This is true on a number of sites where I access this system. Dwmyers 21:11, 18 Sep 2003 (UTC)
  • Wikipedia is slow and things are deteriorating. Yesterday, the Wikipedia crashed and no editing could be done. There was a brief announcement, that a problem had occurred with an upgrade. However, the problem appears broader than that. There is, for example, the case of numerous dynamic pages that are now served from a cache: formerly, they were valued tools, but now slow the encyclopedia excessively. Ted Clayton
  • Perhaps Wikipedia needs beefed-up hardware. I'd be happy to see text ads on Wikipedia. 217.155.199.100
  • See Wikipedia:Donations. Angela
  • I think Jimbo has just bought some new hardware. CGS
  • Slow is not the word. I get in so far and get a DNS error. Marshman
  • How do the established Wikipedians get things done, manage the place? Ted Clayton
  • High levels of patience. Add interlanguage links to the less slow Wikipedias. Angela
  • Go to Wikibooks. Marshman
  • Is it improper to raise these issues? Ted Clayton
  • See wikitech-l, m:Cache strategy, m:Main causes of lag, m:One-pass parser. Martin
  • Isn't the village pump the place to discuss it? The results of this discussion should be gathered into/under a page. Ted Clayton
  • Stuff from here gets archived into FAQ pages. Tarquin
  • An actual discussion of the matters should be taking place. Ted Clayton
  • It needs to be moved to keep this page from getting too big. It should go to Meta. Angela.
  • Do you think a compendium page should be in Meta? Ted Clayton
  • A permanent link from here to the lag discussion would be useful as it is a recurring question. Angela
  • Do I recall that you must perform the move, Angela? Ted Clayton
  • It isn't possible to move a page from here to Meta in the proper way. You have to use the cut and paste method, which I have now done. Angela.

When I began studying the Wikipedia almost a month ago, I noticed immediately that pages were usually slow to respond, and not infrequently failed to respond. Using the 'pedia has continued to be difficult:   I often give up. Creating and editing articles proved to be a challenge, too:   though I can - sort of - prepare materials outside the Wikipedia environment, there is usually considerable cross-work that can only be done within the database ... which again poses an access/usability problem.

Over the past month, conditions seem to be gradually deteriorating. Yesterday, the Wikipedia crashed and no editing could be done. There was a brief announcement, that a problem had occurred with an upgrade. However, the problem appears broader than that. There is, for example, the case of numerous dynamic pages that are now served from a cache:   formerly, they were valued tools, but now slow the encyclopedia excessively.

The issue is such that I have spent several lengthy sessions (slowly) searching for discussions on Wikipedia that might shed some light on the matter, without success. Is there such a discussion, and/or might we have a detailed description of the status of the Wikipedia infrastructure, what the source(s) of difficulty are, and what is anticipated to be the situation/solution going forward? --Ted Clayton

I don't know what's causing it, but I agree it's a serious nuisance. Perhaps Wikipedia needs beefed-up hardware, like a server farm? If money is a problem, I'd be happy to see text ads on Wikipedia (like on Google) if that would help to provide the necessary resources. -- 217.155.199.100 21:02, 9 Sep 2003 (EDT)
See Wikipedia:Donations. Angela 21:12, 9 Sep 2003 (EDT)
I think Jimbo has just bought some new hardware. CGS 08:28, 10 Sep 2003 (UTC).
It comes and goes, but for the past week I've had to give up attempting to even log in. Slow is not the word. I get in so far and get a DNS error (essentially, the page could not be retrieved before timing out). - Marshman

Yes, the situation is so bad it's surprising editing and other work continues at all:   I have actually checked Recent Changes just to see if other people are still able to do anything. How do the established Wikipedians get things done, manage the place? -- Ted Clayton 20:50, 10 Sep 2003 (UTC)

High levels of patience. :) The other Wikipedias aren't so slow at the moment. Have a wander round those and see if you can add any interlanguage links. Angela 20:57, Sep 10, 2003 (UTC)

So, it's an acquired skill-set? I have noticed an increasing tolerance for punishment.  ;) Interlanguage - hmm. -- Ted Clayton 21:20, 10 Sep 2003 (UTC)

I started a "project" on Wikibooks. When it gets impossible to edit here, I work on that one. It is usually very quiet over there, but lots of work to be done. - Marshman 00:40, 11 Sep 2003 (UTC)

Is there a system for automatically gathering stats on what slows Wikipedia down? Cumulative timers? I think this would be very beneficial feedback. Much better than speculating.

Of course there are issues that this can't address directly, like client-side caching and bandwidth. Bandwidth is $$ (and it seems bandwidth is one of the more prominent problems), and client-side caching is a hairy subject because of the wiki-nature. I think client-side caching is also a very prominent part of the problem. But, like I said, it's merely speculative. Cumulative statistics would be very helpful. Kevin Baas
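
As a rough illustration of what "cumulative timers" could look like, here is a minimal sketch that accumulates wall-clock time per named stage and reports the stages from slowest to fastest. It is not taken from the Wikipedia software; the stage names `db_query` and `parse_wikitext` and the work done inside them are placeholders invented for the example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # stage name -> accumulated seconds
counts = defaultdict(int)    # stage name -> number of timed calls


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the block to the stage's total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[stage] += time.perf_counter() - start
        counts[stage] += 1


def report():
    """Return (stage, total_seconds) pairs, slowest stage first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical request handling: three requests, two timed stages each.
for _ in range(3):
    with timed("db_query"):
        time.sleep(0.002)                 # stand-in for a database round trip
    with timed("parse_wikitext"):
        sum(i * i for i in range(1000))   # stand-in for CPU-bound parsing

for stage, seconds in report():
    print(f"{stage}: {seconds:.4f}s over {counts[stage]} calls")
```

Totals of this kind, collected over many real requests, would show whether the time goes to the database, the parser, or somewhere else. They would not cover the client-side and bandwidth effects that the comment above notes fall outside such timers.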

Bandwidth is not a problem. --Brion VIBBER 18:27, 13 Sep 2003 (UTC)
IFF that's true, then it is either a software or system engineering issue. I know you don't have the luxury of more servers right now, but have you thought of having several computers running the same parsing software? Something like, if parser1 is busy, send it to parser2. How are you currently doing this? IANAL, but it looks to me that your main problem is either the parsing of the wiki articles and/or having the web and DB server running under the same CPU. I'm, for example, having a lot of DNS problems: I will sometimes submit an edit, the server will update it, but the page will never load; so I have to open another window and access the recently edited article there. The thing is that the newly loaded page is updated, while the other one is still trying to load. Do you have a page listing Wikipedia's hardware? The servers' page just says whether they are up or not. It could help a lot if you created a page explaining how you are doing all these processes, as I'm sure there are a lot of CS/CpE/SE Wikipedians. --Maio 14:13, 13 Jan 2004 (UTC)
Found it, Wikipedia:Technical FAQ. :) --Maio 14:30, 13 Jan 2004 (UTC)

Reverse proxy servers (Squid?)

Any signs of connection load slowing things down? Putting proxy servers between the server and the net to offload connection load from the main servers is one of the techniques Livejournal uses. One big advantage: it's easy to do without interfering with other things. I see that there may be an old server (after the upgrade) around for a test sometime over the next few weeks. JamesDay 05:26, 14 Sep 2003 (UTC)

That wouldn't work well I think; every connection has to check in with the server to see if the page has been updated, and different users see different results of the same URL (logged-in users have their names in the corner and different options; anons may have a "you've got messages" link if someone dropped a message in their IP talk page). The proxies would have to pass through every request and not cache anything themselves.
As these solutions go, it would be simplest to just drop in an arbitrarily large number of web servers, sharing the upload space and session files by NFS and getting hit round-robin. (And, if we get memcached support finished, they would share the distributed cache.) --Brion VIBBER 06:04, 14 Sep 2003 (UTC)
What LJ found was that their web servers were being tied up dribbling data to people with slow connections. Adding the proxy server let the web server send out the completed page to the proxy at full LAN speed instead of waiting. Then the proxy handled the dribbling while the web server got on with real work. Net result: lower number of connections to the web server, less process overhead and significant performance gain. So LJ has as many proxy servers as they need for dribbling, as many web servers as they need for page building, then the database bits behind them. JamesDay 07:00, 14 Sep 2003 (UTC)
Ah, that makes a certain amount of sense. Right now though we're CPU bound on the web server, so that's the priority. --Brion VIBBER 07:26, 14 Sep 2003 (UTC)
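
The "dribbling" argument above can be put in rough numbers. The sketch below estimates how long one web server process is occupied per request when it sends the page straight to a slow client, and when it hands the finished page to a proxy over the LAN. Every figure (page size, build time, link speeds) is an assumption chosen for illustration, not a measurement from Wikipedia or LiveJournal.

```python
def transfer_seconds(page_bytes, bytes_per_second):
    """Time to push a page down a link of the given speed."""
    return page_bytes / bytes_per_second


# All figures below are illustrative assumptions.
PAGE_BYTES = 30_000            # size of one rendered page
BUILD_SECONDS = 0.2            # CPU time to build the page
MODEM_BPS = 56_000 / 8         # 56 kbit/s dial-up client, in bytes per second
LAN_BPS = 100_000_000 / 8      # 100 Mbit/s LAN link to the proxy

# Without a proxy the web server process waits on the slow client.
direct = BUILD_SECONDS + transfer_seconds(PAGE_BYTES, MODEM_BPS)

# With a proxy it hands the page over at LAN speed and is free again.
proxied = BUILD_SECONDS + transfer_seconds(PAGE_BYTES, LAN_BPS)

print(f"process busy without proxy: {direct:.3f} s")
print(f"process busy with proxy:    {proxied:.3f} s")
```

Under these assumptions the time a process spends tied to a connection drops sharply, but the build time is the same in both cases. That matches the last reply: a proxy frees processes from slow clients, while a CPU-bound web server still has to spend the same effort building each page.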