Talk:List of Wikipedias/Archive 5
Please delete or redefine "depth"
The current definition for "depth" is in my opinion not usable. Actually, it doesn't make any sense. Using the total of "edits" is something I can partly understand, but the number of non-articles is not relevant at all. Different communities will use different ways of discussing subjects and archiving their talk pages, and the number of discussions is not naturally an indicator of quality in all communities. So why not use variables like article length or something? Until something better exists, it's better to remove this "rank"! --Jeroenvrp 21:10, 20 August 2007 (UTC)
–Agreed! 85.80.150.50 19:38, 25 September 2007 (UTC)
Sure, let's delete it. I never got it in the first place; before, I just had "stub ratio", but then the "depth" idea came up here from another user. 213.61.163.194 13:40, 26 September 2007 (UTC)
We removed "depth" from the script. RobiH 17:09, 26 September 2007 (UTC)
- Why? It was the most interesting element! Benoni 20:18, 28 September 2007 (UTC)
- Replacing something with nothing? Not an improvement! -MarsRover 06:33, 29 September 2007 (UTC)
- I defined "Depth" better, as some people requested. And note that both "Non-Articles / Articles" and "Edits / Articles" are important. Here's why:
- 1) The first formula emphasizes the fact that the article count of a Wikipedia is just the tip of the iceberg, User Pages and Discussion Pages being a crucial indicator of "Wikipedianness" (at least more so than the article count itself)
- 2) The second formula emphasizes the fact that some Wikipedias might only include some copied & pasted articles (and many of them may even have copyright issues) or articles written by only one person (which doesn't necessarily mean they are biased or non-academic, but surely means they are not "Wikipedian"). It also emphasizes that many articles in many Wikipedias are automatically generated (for more or less "honorable" reasons), which this second formula reflects quite accurately, enforcing more transparency (whether some admins like it or not).
- Addressing some objections. "Depth" is all about the ratio between the authoring work and the end result. Yes, I am aware that the current formula for depth is only the best one available. I still hope somebody can find a way to collect even more relevant data as a basis for a more accurate depth formula. The current one is just a quick fix. On the other hand, most objections to the current formula are easily refuted by the next two arguments:
- 1) All the other parameters are exposed to similar distortions that make them less relevant than we would like them to be -- actually this "Depth" parameter is based just on them! Here they are: "Edits" is affected by the edit wars, the "Articles" count is affected by automatically created articles, and "Non-Articles" is affected in many Wikipedias because of outdated pages. Shouldn't we then eliminate them all from the table?
- 2) A correct understanding of its definition makes "Depth" much less objectionable. As stated above, "Depth" is not about academic "static perfection", but about "ecosystemic vitality". For instance, while it is true that editing wars can affect "Depth" (just as/because they affect "Edits"), editing wars are a normal part of a wiki-based Encyclopedia's life. They still show how much editing attention a certain Wikipedia gets. The more editing wars, the more Wikipedianness. When we understand the FREE part in "Wikipedia, the FREE Encyclopedia", we certainly start to appreciate Wikipedia's "ecosystemic" (lack of) accuracy and (lack of) perfection -- and we are happy with such more fuzzy-logical parameters as.. "Depth". :)
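The two ratios defended above can be made concrete with a small sketch. It assumes, as later posts in this archive suggest, that Depth is simply the product of the two ratios; the function name and the sample figures are invented for illustration and are not real statistics.

```python
def depth_v1(edits: int, articles: int, non_articles: int) -> float:
    """Edits per article multiplied by non-articles per article.

    Assumption: the two ratios are combined by plain multiplication.
    """
    if articles == 0:
        raise ValueError("depth is undefined for a wiki with no articles")
    edits_per_article = edits / articles
    non_articles_per_article = non_articles / articles
    return edits_per_article * non_articles_per_article


# Hypothetical wikis, not real statistics.
# Many edits and many talk/user pages per article:
human_written = depth_v1(edits=50_000, articles=5_000, non_articles=10_000)
# Many articles created in bulk, with few edits and few other pages:
bot_generated = depth_v1(edits=60_000, articles=50_000, non_articles=5_000)

print(human_written, bot_generated)
```

On these made-up numbers the wiki with ten times fewer articles ends up with a far higher depth, which is the effect the comments above describe.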
There are images on the Spanish wiki
This page says '0' for the number of images on the Spanish Wikipedia. Is this because all the images are from Commons, or has nobody bothered to check? - FinalWish
- The Spanish Wikipedia has no images since 11 June 2007. All images used come from Commons. This Wikipedia is ranked fourth in number of images used. Please notice that in few days, the Portuguese Wikipedia will delete its last image. Best regards, Alpertron 16:34, 25 August 2007 (UTC)
- Would it be possible to insert some sort of explanatory note? It is somewhat disconcerting to see a "zero" for images - it makes it seem like the Spanish Wikipedia is image-free, rather than being image-rich and collaboration minded. (Or even better - could the images number be supplemented with one that includes those images transcluded from the commons? This gives a better indication of the "image density" of the wikipedia - whether an image is hosted on the commons or the local wikipedia is somewhat arbitrary, anyway.) -- 16:20, 27 December 2007 (UTC)
That's one of the absurd policies of the awful Spanish Wikipedia; as you can see, it is one of the poorest Wikipedias even though it has 500 million potential users. Please thank the Spanish administrators.
Quenya or Sindarin
I think a lot of people would love it if Elvish was included. I personally am not proficient enough to write articles, but I know at least a hundred who would be.
- There is a problem: Klingon is also a fictional language and no articles can be created or edited for that language's Wikipedia. Quenya and Sindarin would have the same fate if they were created. Johnny Au 13:41, 1 September 2007 (UTC)
- The Klingon Wikipedia was closed (resp. moved to http://klingon.wikia.com) mainly because it was dead, not because it is a constructed language. There is a number of Wikis in constructed languages, including Lojban and Esperanto. So constructed languages are as such eligible for new projects. However, before a Wiki can go live, it has to have an active editing community and a successful test project. Looking at the Quenya test project at the incubator, you will see that it is dead. Somebody started it, but apparently lost interest. --Johannes Rohr 09:08, 29 September 2007 (UTC)
New Metric
I think that the formula (edits x users) would be a better metric than just articles. 200.248.254.100 15:56, 4 September 2007 (UTC)
- # of Users is easy to inflate. Advise against it.
199.239.101.125 17:58, 5 September 2007 (UTC)
Hiri Motu
http://ho.wikipedia.org/wiki/ has been closed, but it's still listed. I read the comment in the namespace page there saying not to bother editing as it'll be overwritten soon, but it has been closed for months and is still listed... just thought I'd let you know. (en:Nach0king) 17:25, 5 September 2007 (UTC)
Depth
Sorry, but where has the depth disappeared to?
- I was thinking the same thing
- Yes, it's probably the most important element to make a difference between true encyclopediae and "fake" or "artificial" ones (i.e. Volapük, Lombard and Cebuano, which have a depth of 0 since most of their pages are the creation of bots)... Benoni 20:16, 28 September 2007 (UTC)
but where's the depth disappeared?
please see Talk:List_of_Wikipedias#Please_delete_or_redefine_.22depth.22 Mutante 22:13, 5 October 2007 (UTC)
Hey, please bring the "depth" column back! If you think it's not relevant enough, then mention that in the note, but don't remove it completely, especially considering that there are other columns even less relevant than Depth. Sometimes even the Article Count is less relevant than Depth!
- Regarding relevancy of the Depth column, it may not show the absolute quality of a Wikipedia, but it surely does indicate its "wikiness" (i.e. how collaboratively it was written, and how much attention the articles got).
- And anyway don't remove it just because it was so requested by some frustrated admins of some artificially-inflated Wikipedias (which got low Depth rank)! Aren't more admins supposed to vote for this first?
- 195.71.90.10 04:51, 29 September 2007 (UTC)
- As for me, the "depth" was quite a funny thing; for example, in the Ukrainian community this word ("глибина") came to be used like an idiom for some doubtfully useful edits. I'd prefer the "Depth" to be preserved here --A1 09:06, 29 September 2007 (UTC)
I think depth has been restored now. It is really required. Moreover, we need a list of Wikipedias sorted by depth. --Vssun 21:30, 29 September 2007 (UTC)
- The problem is that one or more users have begun an editing conflict against "depth". Benoni 18:12, 30 September 2007 (UTC)
- Probably those users have something to hide. Some Wikipedias are full of automatically created articles, you know, and their Article count is very nice looking. It's just this stupid "Depth" parameter that stands in their way for world domination... :D Khenriksen 20:07, 30 September 2007 (UTC)
- Many of us are just copying and pasting from the link to update the tables, but it is not my fault that it did not have depth. Others did the same thing as me. Please do not call us vandals if we are in good faith. Please look at my earlier contributions: they are mostly updating the tables. I actually have a neutral stance on depth. Thank you. Johnny Au 03:29, 1 October 2007 (UTC)
- If you keep calling us vandals, then someone should change the link such that it shows depth, which would prove that we are in good faith. I added a note near the link to update the tables. Thank you. Johnny Au 03:46, 1 October 2007 (UTC)
- I understand you are in good faith, but where has depth disappeared to in the table you use? Do you know where this table is coming from? There used to be depth until last week, and then it suddenly disappeared... Benoni 12:09, 1 October 2007 (UTC)
- Probably Johnny Au updated this page while the script was still not changed back to normal. Khenriksen 14:43, 1 October 2007 (UTC)
- I was only trying to be funny :), I am sorry that this happened to offend you, Johnny Au. I didn't even know it could refer to you, as I didn't watch the editing history. I am saying this because I know you from our previous "interaction" (I even mentioned you in my comments above) and I know all you do is update the tables from the script. Anyway, I never used the word "vandal(ize)" :). I guess some people thought (some of) you kept on updating the tables without the "Depth" column even after "Depth" was restored in the script. Khenriksen 14:43, 1 October 2007 (UTC)
Hi all! I can understand that you want to keep the Depth in the list. But is it not counterproductive to always revert the updates to the table and replace them with a version that is totally out of date (at least one week old)? This makes this page less useful than if it were up to date and just the depth were missing. The final solution should be to take the depth into the updating script. But as long as this not happens, please don't revert all updates. Thank you. Sökaren --193.251.160.225 20:56, 2 October 2007 (UTC)
- "But as long as this not happens, please don't revert all updates." Who says it didn't happen? While I must agree your comment is simply idiotic, I must add it's not what you're saying that is stupid, but your timing. This "reverting vandalism" problem has been solved long ago.
- You may think it's just a matter of a few days. Yeah, right, but it seems these few days matter to you too. After all, you're the one that was terrorized by "totally out of date" versions of this page (which were actually several days older). Probably seeing one hundred articles more in each Wikipedia mattered to you more than depth.
- In other words, how on earth could this page be "less useful" with slightly outdated (and irrelevant) numbers but with depth? What are you, a.. counter, not a human? Isn't this page more useful with depth? I mean, honestly, who visits this page without watching depth?
- Thanks for this great anonymous answer :-) I accept your point of view, but are all these affronts necessary? Calling someone stupid, idiotic, not human etc. because he has another opinion??? Really fantastic! One reason why I frequently watch this page is to see how the Upper Sorbian language is growing; I think this is a proper "human" usage of this page. That's why I wrote the argument above, and I would ask you to accept this and other opinions in general, even if you don't agree. Thanks! --193.251.160.225 17:06, 3 October 2007 (UTC) (Sökaren)
I have included the depth column with the new formula given below in "Depth 2.0" discussion. Decide yourself if you post from [1], and please continue the discussion there. Mutante 12:03, 7 October 2007 (UTC)
Name of the Irish language
The official name of the Irish language in English is 'Irish', not 'Irish Gaelic' as it states in the list. Can it be changed?
Official EU web site: Official languages of the EU
86.43.213.95 09:08, 1 October 2007 (UTC)
- I've changed it to 'Irish'. 86.43.213.95 00:28, 2 October 2007 (UTC)
- I've also changed it to 'Irish' in my script database, also for wiktionaries, wikiquotes, wikisources. Mutante 17:47, 17 October 2007 (UTC)
Where'd aa go?
Why did the Afar Wikipedia disappear from the table? The domain still exists. You can even still edit the wiki. - dcljr 02:58, 2 October 2007 (UTC)
Uhm, i have no idea how that slipped through, but now i (re-)added it. Mutante 19:26, 18 October 2007 (UTC)
Kazakh is already 1000
Somebody please edit the tables. Ash
Depth 2.0
Thanks for the return of the depth metric. But I have a question regarding it (and please don't use this as an argument to delete it). Does it include bot edits in the formula, such as the ever-increasing number of interwiki links that are being generated? Also, an even smarter enhancement would be to ignore both a user's "undo" and the initial edit it reverts, so as not to include vandalism in the edit counts. --MarsRover 19:48, 3 October 2007 (UTC)
- I agree, since this may be the reason why English and many languages with less than 10,000 articles have a high depth count, despite having little content. However, a better suggestion is to use the average (mean) article size as depth, i.e. total article size divided by total number of articles. Johnny Au 21:41, 3 October 2007 (UTC)
- Large articles are not necessarily more "in depth". Infoboxes, long lists and categories would inflate the depth. Also, bots could inflate that metric with verbose text. If you are interested in "bytes per article", that information is already available. The current depth formula does a good job of measuring how much work was done per article (copy editing, tweaks, updates, photos, graphics, etc.) and we can assume that work improved the article's completeness. But a lot of the edits nowadays are the result of "interwiki links" and "vandalism". I am wondering if that is the reason English Wikipedia has such a high depth. What would the depth be if we cleaned the 4 inputs to the depth formula? --MarsRover 23:28, 3 October 2007 (UTC)
- There is a problem; the bytes per article for English had not been updated for over half a year. If there are too many lists, which make up the largest articles in terms of size, then this is not a good indicator of depth. My new suggestion for depth 2.0 is to use the old depth formula, but ignoring bot edits and vandalism/reverts. Johnny Au 02:07, 4 October 2007 (UTC)
Where did the depth go? I don't see it anywhere. Afrikaans had such a high depth :/ — Adriaan (T★C) 10:58, 6 October 2007 (UTC)
- Some frustrated people from fake wikipedias keep suppressing depth :-(
I have some suggestions for a new depth formula. On S23.org, I see a column called "stub ratio". I think it can be used as one of the factors of the depth. The new formula can be (Edits/Articles × Non-Articles/Articles)/(1 + stub ratio). If the stub ratio is high, the depth will be lower. -- Kevinhksouth 17:43, 6 October 2007 (UTC)
I have included the depth column again with the new formula given above. Mutante 11:59, 7 October 2007 (UTC)
- Ok, but how is the stub ratio calculated in the first place, because this new formula gives it a very important weight? Otherwise, I think it would be better to keep to the closest round number. And there are obvious problems: see Ripuarian for example... Benoni 12:45, 7 October 2007 (UTC)
- Thank you for accepting my suggestions. However, I think that at most 2 decimal places (or even just round number) is already okay for the table on List of Wikipedias. -- Kevinhksouth 15:01, 7 October 2007 (UTC)
- Yes, round to a whole number. It's a rough estimate, so giving that level of precision is misleading. --MarsRover 03:23, 8 October 2007 (UTC)
- Please, round up. --Meldor 07:05, 9 October 2007 (UTC)
I think an important attribute of "Depth 2.0" is to be bot-proof. Having a bot add "redirects" shouldn't increase the depth. MarsRover 03:23, 8 October 2007 (UTC)
I have rounded to a whole number now, and the stub ratio is calculated as good/total. Additionally, there was a bug in my script that resulted in the ratio being 0. Now there should be accurate numbers. Mutante 16:31, 8 October 2007 (UTC)
- And what is a "good" article when calculating the stub ratio? Benoni 22:00, 8 October 2007 (UTC)
- Technically speaking, "good" articles are pages that are none of the following:
- redirects
- discussion pages
- image description pages
- user profile pages
- templates
- help pages
- portals
- articles without links to other articles
- pages for Wikipedia administration
RobiH 20:49, 17 October 2007 (UTC)
- So, this implies a "stub" is a good article, since it doesn't fall into any of these categories -- technically. --83.85.142.49 10:08, 17 November 2007 (UTC)
I agree with MarsRover that, if the point of the depth parameter is measuring human activity (is it? should it be?), not simply total activity, then bot changes (adding redirects and interwiki links, correcting spelling mistakes, changing images to newer versions, etc.) shouldn't be counted. NB: how is the stub ratio calculated? I note that most of the articles on the Volapük Wikipedia are small summaries with only the most important information (surface, population, geographic coordinates) and little text; yet the Volapük stub ratio looks surprisingly low at only 0.75 (see the s23.org list). I'd have expected something around 0.94. Smeira 15:23, 10 October 2007.
Mutante, I still believe the Depth column would fit best right after the article count. Also, you forgot to dismiss irrelevant depths. By the way, I suggest changing the relevancy rule to:
if ($depth > 300 && $row['good'] < 100000) { $depth="--"; }
It is a more realistic cleanup. Note that it's 300 instead of 200 and 100 000 instead of 10 000.
Reasons for the suggested change:
- it takes into account the current depth of the English Wikipedia (which is the best candidate as an inter-Wikipedia "frame of reference"); the best alternative would be a condition like "$depth >= $EnglishWikipediaDepth" instead of "$depth > 300"
- the old limit was already loose enough, so, considering the steady growth of the article count of most Wikipedias, we can now afford to set the limit to 100 000
- all Wikipedias between 100 000 and 10 000 are almost never expected to exceed a depth of 100, let alone 300, so they won't be affected by the new formula anyway
Khenriksen 05:21, 9 October 2007 (UTC)
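The suggested relevancy rule can be sketched outside the script as follows. This is only an illustration in Python (the script quoted in this thread is PHP); the function and parameter names are invented, and the sample depths and article counts are made up.

```python
from typing import Union


def apply_relevancy_rule(
    depth: float,
    good_articles: int,
    max_depth: float = 300.0,
    min_articles: int = 100_000,
) -> Union[float, str]:
    """Replace an implausibly high depth from a small wiki with "--"."""
    if depth > max_depth and good_articles < min_articles:
        return "--"
    return depth


# Hypothetical inputs, chosen only to exercise each branch.
small_wiki_high_depth = apply_relevancy_rule(2570, good_articles=9_000)
small_wiki_low_depth = apply_relevancy_rule(144, good_articles=2_000)
large_wiki_high_depth = apply_relevancy_rule(2800, good_articles=2_000_000)

print(small_wiki_high_depth, small_wiki_low_depth, large_wiki_high_depth)
```

Passing the current English depth as `max_depth` would approximate the alternative condition mentioned above, apart from the difference between `>` and `>=`.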
- 300 is fine. Zh-classical: stands at 144. Hillgentleman
"yet the Volapük stub ratio looks surprisingly low at only 0.75" Well, they surely forgot to put stub templates at the end of their articles :) Anyway, an article can be a stub if it is missing crucial information, even if it is long (e.g. unfinished biographies). So the stub ratio also depends on the sincerity of the editors. 78.92.37.163 21:30, 16 October 2007 (UTC)
- That is certainly true -- though I suppose there could be some pressure from other wikipedias for people to add stub templates to articles that don't satisfy certain criteria (which?). Anyway, I work on the Volapük wikipedia; we do have a stub template there ({{sid}}), but I stopped adding it to stubby articles because it had absolutely no effect on the automatic article count of the statistics page. Nothing I did ever prevented all new stubs from being counted as "good articles"; at one point I thought there was a page somewhere (on Meta?) where stub templates could be reported, and that this would affect the statistics pages on all wikipedias, but I never found such a page. Also, I frequently see "stubs" being defined more or less mechanically as "articles with less than X words" or "articles without links" -- I'm not sure that's a good idea. Mutante, do you take stub templates into account when doing calculations? Would there be a way to add this feature to your programs, or are you limited to system variables like good and articles? Smeira 02:25, 24 Oct 2007
Just implemented the "if ($depth > 300 && $row['good'] < 100000) { $depth="--"; }" line. Mutante 17:08, 17 October 2007 (UTC)
- Cool! As I explained above, probably it would be best to use something like "$EnglishWikipediaDepth" instead of 300, if easy to implement, but it's OK as it is, too. Thank you for your cooperation. ;) Khenriksen 04:29, 22 October 2007 (UTC)
A problem with Depth 2.0?
Mutante and Kevinhksouth, I think there is a problem with the new formula: (Edits/Articles x Non-Articles/Articles)/(1+stub ratio). Since Mutante says stub ratio = good / total, then it really does not reflect the fraction of stubs, but the fraction of good articles in a Wikipedia. So a low stub ratio means many stubs and proportionally few good articles, and a high stub ratio means many good articles and proportionally few stubs. If this is so, then it seems the formula yields higher depth for Wikipedias with more stubs: many stubs => stub ratio (good/total) is small, i.e. close to 0 => 1 + good/total is close to 1 => the weight 1/(1+stub ratio) is close to 1, and depth 2.0 = depth 1.0; and few stubs => stub ratio (good/total) is large, i.e. close to 1 => 1 + stub ratio is close to 2 => the weight 1/(1+stub ratio) is close to 1/2, so depth 2.0 = 1/2 of depth 1.0. I.e., depth 2.0 tends towards 1/2 of depth 1.0 if there are few stubs, and it tends towards depth 1.0 if there are many stubs. So, as the number of stubs increases, depth 2.0 increases from 1/2 of depth 1.0 towards depth 1.0. Should depth 2.0 increase when the number of stubs increases? My intuitive expectation would be exactly the opposite. (This would be remedied by taking as the weight not 1/(1+stub ratio), but simply (1-stub ratio): i.e., make the formula be: (Edits/Articles x Non-Articles/Articles x (1-stub ratio)).) Smeira 15:21, 16 October 2007.
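The behaviour of the two candidate weights can be checked numerically. A minimal sketch, assuming (as stated in this thread) that the ratio is good/total and therefore lies between 0 and 1; the function names are invented for illustration.

```python
def weight_one_plus(ratio: float) -> float:
    """Weight used by the first Depth 2.0 formula: 1 / (1 + ratio)."""
    return 1 / (1 + ratio)


def weight_one_minus(ratio: float) -> float:
    """Weight proposed as a replacement: 1 - ratio."""
    return 1 - ratio


# Evaluate both weights at the extremes and the midpoint of good/total.
rows = [(r, weight_one_plus(r), weight_one_minus(r)) for r in (0.0, 0.5, 1.0)]
for ratio, plus, minus in rows:
    print(f"ratio={ratio:.1f}  1/(1+ratio)={plus:.3f}  1-ratio={minus:.3f}")
```

With the ratio defined as good/total, both weights shrink as the ratio grows: 1/(1+ratio) only falls from 1 to 1/2, while (1-ratio) falls from 1 all the way to 0.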
I changed the depth formula as suggested above, so 1-ratio at the end. Mutante 17:09, 17 October 2007 (UTC)
- The formula and depth numbers on the table are out of whack. mr depth according to (Edits/Articles x Non-Articles/Articles x (1-stub ratio)) is ~16. The table says 26.
- 199.239.101.125 16:36, 18 October 2007 (UTC)
Uhm, well yeah it was $depth=($edits/$good*$nonarticles/$good)/(1-$ratio); :p, but i could swear i was told to enter it this way last time. Anyways, i changed that to $depth=$edits/$good*$nonarticles/$good*(1-$ratio); but now mr. depth is about 9 ?! [2] Mutante 19:47, 18 October 2007 (UTC)
- Sounds right from the calcs. My previous calc used total/good instead of non-articles/good :)
- 199.239.101.125 22:49, 18 October 2007 (UTC)
Hm, with this definition the depth values end up looking a bit too high. Wouldn't it be a good idea to divide them by 10? It would just rescale the results, not change the order. Smeira 12:12, 19 October 2007.
- I agree. It would be more readable. Benoni 22:59, 20 October 2007 (UTC)
- My opinion is that we shouldn't divide depth by 10. The exaggerated depths you are talking about are irrelevant anyway so eliminating them (which has already been done) is more common sense than making them even more relevant than relevant by allowing them to influence all the other depths (the valid ones). Khenriksen 06:06, 22 October 2007 (UTC)
- I was actually talking about the upper part of the table: English with 2800, most others in the upper hundreds etc. It may be just aesthetical prejudice, but it seemed to me that keeping the values below 1000 by rescaling them -- without compromising their order, comparability, or information content -- would make them more readable. If most other people don't agree, that's OK, keep the old values, no big problem there. Smeira 02:30, 25 Oct 2007.
- What the hell are you "actually" talking about? Are you sure you know what page you're talking about? When did you see depths above 1000 in the upper part of the table?
- The problem had since been fixed. [3]. Try using the history page next time. Cheers. MarsRover 19:17, 27 October 2007 (UTC)
- I saw depths above 1000 in the upper part on 20 October, as MarsRover shows. But since then the depths have been divided by 10, so there is no longer a problem. No need to be so aggressive, nameless stranger. --Smeira 05:49, 29 October 2007 (UTC)
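The claim that dividing by 10 rescales the values without changing their order is easy to check. A small sketch with invented depth values (the wiki codes other than "en" are placeholders, and none of the figures are real statistics):

```python
# Hypothetical depths before rescaling.
raw = {"en": 2800, "wiki_a": 870, "wiki_b": 640, "wiki_c": 95}

# Divide every depth by the same positive constant.
scaled = {code: value / 10 for code, value in raw.items()}

# Rank the wikis by depth, highest first, before and after rescaling.
order_raw = sorted(raw, key=raw.get, reverse=True)
order_scaled = sorted(scaled, key=scaled.get, reverse=True)

print(order_raw == order_scaled)
print(scaled)
```

Any fixed cutoff applied to the depth, such as the 300 used in the relevancy rule, would have to be divided by the same factor to keep hiding the same rows.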
Not-existent Wikipedias
Why are the non-existent Flemish and Pontic Wikipedias listed?!--89.218.165.172 05:24, 5 October 2007 (UTC)
West Flemish and Abkhaz deliver valid statistics. RobiH 21:08, 17 October 2007 (UTC)
mazandarani's articles
Please count Mazandarani's articles.
- mzn is already included. RobiH 21:08, 17 October 2007 (UTC)
- ...that is an amazingly cool trick. Is there somewhere that these sorts of things are documented? EVula // talk // 21:47, 17 October 2007 (UTC)
Percentage increase?
Hi, I read the stats with great interest, but I would be happy to have the percentage increase in articles added. To make room for it, one could remove the total number of pages, as they are not that interesting; having articles is what is really needed. It is quite interesting to see how wikis grow by average number of articles, and this should be possible to integrate into the stats. Ulflarsen 12:34, 10 October 2007 (UTC)
To calculate increases we would have to start collecting historic data, e.g. at least save the last update to compare to. Currently my script doesn't do that, but it's planned. There are also graphical stats by Martin Kozák, who saves data from my stats to create graphs. Mutante 17:16, 17 October 2007 (UTC)
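Comparing against a saved snapshot, as described above, could look roughly like this. It is a sketch under stated assumptions, not the script's actual design: the snapshot layout (a mapping from wiki code to article count), the function name and all figures are hypothetical.

```python
from typing import Dict, Optional


def percent_increase(previous: int, current: int) -> Optional[float]:
    """Percentage growth since the previous snapshot, or None if undefined."""
    if previous == 0:
        return None  # no baseline to compare against
    return (current - previous) * 100 / previous


# Hypothetical article counts from the last saved update and from today.
previous_snapshot: Dict[str, int] = {"hsb": 4000, "kk": 950}
current_snapshot: Dict[str, int] = {"hsb": 4100, "kk": 1000, "new": 10}

growth = {
    code: percent_increase(previous_snapshot.get(code, 0), count)
    for code, count in current_snapshot.items()
}
print(growth)
```

A wiki that is missing from the previous snapshot gets `None` rather than a misleading percentage.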
Page depth of Ripuarian Wikipedia
What does the page depth of the Ripuarian Wikipedia (http://ksh.wikipedia.org) (= 2570) indicate? Is this because of the huge number of edits that wiki has?
But I wonder how a wiki with only 271 users has more than 4 lakh edits. The number of images (42) is also low. I wonder whether there is any human activity happening in that wiki.
Is this because of the bot activity? In that case we should not include the edits by bots while calculating the page depth. If we do include them, there is no meaning in calculating the page depth.--Shijualex 04:49, 11 October 2007 (UTC)
- Good question. Why don't you go and have a look? Hillgentleman 06:16, 11 October 2007 (UTC)
- Yes, Ripuarian too is a reason to add the relevancy rule back to the depth formula and to update it (see my comments above). And it looks like it will soon pass the 10 000 limit.
- Khenriksen 07:19, 11 October 2007 (UTC)
Yes it is the bot activity that is increasing the number of edits. See the recent changes page http://ksh.wikipedia.org/wiki/Spezial:Letzte_%C3%84nderungen (See the page with and without edits by the bots).
Today only 4 manual edits are there till this time. But the number of edits by bots are more than 1200!!
Also a huge number of stub articles are there. --Shijualex 08:51, 11 October 2007 (UTC)
- There are not many more stub articles than other "young" Wikipedias have. The far bigger problem (bigger by count, that is) stems from an unusually high percentage of redirects, which to a great extent are bot-created. They are needed because Ripuarian is a collection of more than 100 languages with individual and varying writing systems and writing styles, which are also spoken quite differently. (E.g. year = Joch, Jor, Johr, Joohr, Joh, Joah, Joar, and Joa)
- So again, let me suggest to not count redirects and edits to make redirects in the depth, so as to make it a better tool for comparison of real content.
- Both redirects and bot edits do add value and usability to a Wikipedia, of course, but their average value likely differs from that of a human edit. Weighing them correctly may require some study, though. My a priori assumption is that the younger and less developed a Wikipedia is, the more likely a bot edit is to add comparatively high value to it, while in the more mature Wikipedias the relative gain decreases more and more. --Purodha Blissenbach 16:24, 9 December 2007 (UTC)
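The suggestion to leave bot edits and redirects out of the depth can be pictured with a short sketch. It assumes the simple product Edits/Articles × Non-Articles/Articles as the depth, that redirects are counted among the non-articles, and that bot edits can be counted separately. All figures are invented; only the order of magnitude of the total edit count echoes the discussion above.

```python
def depth(edits: int, articles: int, non_articles: int) -> float:
    """Simple depth: edits per article times non-articles per article."""
    return (edits / articles) * (non_articles / articles)


def human_depth(
    edits: int, bot_edits: int, articles: int, non_articles: int, redirects: int
) -> float:
    """Same product, but ignoring bot edits and redirect pages."""
    return depth(edits - bot_edits, articles, non_articles - redirects)


# Hypothetical wiki dominated by bot-created redirects.
raw_value = depth(edits=400_000, articles=8_000, non_articles=60_000)
filtered_value = human_depth(
    edits=400_000,
    bot_edits=390_000,
    articles=8_000,
    non_articles=60_000,
    redirects=50_000,
)
print(raw_value, filtered_value)
```

A weighted variant, where a bot edit counts for some fraction of a human edit, would fit the last comment above better, but the right weight is an open question in this thread.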
Page depth calculation is too funny
The current page depth calculation is too funny. Now the wikis with a high number of articles (most of them stub articles) have got a very high page depth!!! (See the page depth of all wikis where the number of articles is less than 10,000.) The smaller wikis with fewer articles but a higher number of edits and good encyclopaedic articles got a lower page depth.
I feel the earlier page depth calculation was far better than this calculation which depends only on the number of articles.
If this is the way the page depth calculation is going, then the number-of-articles field is more than enough to show the quality of a wiki. --203.199.150.10 05:52, 20 October 2007 (UTC)
Depth formula
What is the current depth formula? Borgx 14:19, 26 October 2007 (UTC)
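if ($good == 0) {
    // No articles: depth is undefined.
    $depth = "-";
} else {
    // (edits / articles) * (non-articles / articles) * (1 - ratio)
    $depth = $edits / $good * $nonarticles / $good * (1 - $ratio);
    // Suppress values above 300 on wikis with fewer than 100,000 articles.
    if ($depth > 300 && $good < 100000) { $depth = "--"; }
}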
Mutante 11:21, 27 October 2007 (UTC)
- Thanks, but how do you calculate $ratio? I mean, there is no {{RATIO}} keyword, is there? Borgx 00:07, 28 October 2007 (UTC)
- As I recall, Mutante said $ratio is good/total. I found it a bit misleading to call it "stub ratio"; it looks more like the "good article ratio". --Smeira 05:52, 29 October 2007 (UTC)
- As Mutante says the stub-ratio is good/total. It is not the stub pages ratio, so it makes no sense in the depth formula:
(Edits/Articles) × (Non-Articles/Articles) × (1 − Stub-ratio) =
(Edits/good) × ((total-good)/good) × ((total-good)/total)
- The Non-Articles are overvalued as they are included in two factors. --Vriullop 16:54, 29 October 2007 (UTC)
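As a rough illustration of the point above, the posted formula can be re-expressed in Python. This is only a sketch: it assumes, following the reading in this thread, that non-articles = total - good and that the "stub ratio" is good/total. The function name `depth` and the sample counts are made up for the example, and the "--" suppression from the script is left out.

```python
def depth(edits, good, total):
    """Depth as in the posted script, assuming nonarticles = total - good
    and ratio = good / total (the reading used in this thread)."""
    if good == 0:
        return None  # the script shows "-" in this case
    nonarticles = total - good
    ratio = good / total
    return (edits / good) * (nonarticles / good) * (1 - ratio)


# Made-up counts, for illustration only.
edits, good, total = 50_000, 1_000, 4_000

# Expanded form: (edits/good) * ((total-good)/good) * ((total-good)/total).
expanded = (edits / good) * ((total - good) / good) * ((total - good) / total)

print(depth(edits, good, total), expanded)
```

Both expressions give the same value, which makes it visible that total - good enters the product twice.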
- But where does the number of good articles come from? Benoni 23:41, 30 October 2007 (UTC)
- From the raw statistics query (e.g. for de.wikipedia). Indeed total-good is overvalued. Depending on how much importance one wants to give to stubs, this may be good (depth will decrease inversely to the square of non-good/total = approx. fraction of stubs). I wonder then if it wouldn't be better to make the third factor the same as the second, i.e. either:
(edits/good) × ((total-good)/good)²
or
(edits/good) × ((total-good)/total)²
Alternatively, I'm now wondering if it wouldn't be better to keep the two terms as independent measurements -- e.g. by putting edits/good and (total-good)/total in two different columns in the table. --Smeira 01:26, 31 October 2007 (UTC)
- Now a different question: why is the first term edits/good rather than edits/total? Since the number of edits refers to all articles, not just good ones, edits/good is actually higher than the actual average number of edits per good article -- it is making them look like they were edited more often than they really were. --Smeira 01:39, 31 October 2007 (UTC)
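The alternatives raised in the two comments above can be compared side by side. The following is a minimal sketch, not the script's actual code: the function name `variants`, the dictionary keys and the sample counts are all hypothetical.

```python
def variants(edits, good, total):
    """Return the alternative measures discussed above for one wiki."""
    non_good = total - good
    return {
        "edits/good": edits / good,
        "edits/total": edits / total,
        "(total-good)/total": non_good / total,
        "squared, over good": (edits / good) * (non_good / good) ** 2,
        "squared, over total": (edits / good) * (non_good / total) ** 2,
    }


# Made-up counts, for illustration only.
result = variants(edits=50_000, good=1_000, total=4_000)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these counts, edits/good is four times edits/total, which is the inflation the second comment describes.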
Common table
What about moving the table section to a common template used by all translations of this page? I think it's a bit difficult to update all languages every time, and most of the time this results in an out-of-date version of the page.--Iradigalesc 21:33, 28 October 2007 (UTC)
- Feel free to do it. RobiH 08:11, 29 October 2007 (UTC)
How to rename the Tajik Wiki
Hi, can anybody help us? Problem: we want to rename "tg.wikipedia.org" to "tj.wikipedia.org" because the short name of Tajikistan is "TJ". Thanks.
Tajik Wiki users 217.11.177.6 06:33, 1 November 2007 (UTC).
- Hello, the ISO 639-1 for w:Tajik_language is tg not tj, best regards, --birdy geimfyglið (:> )=| 07:34, 1 November 2007 (UTC)
- The codes for languages and the codes for countries differ: thus Tajik and Tajikistan are tg/tj, Ukrainian and Ukraine uk/ua, Estonian and Estonia et/ee. Since this is a standard of the International Organization for Standardization (ISO), the language code cannot be changed (except through ISO itself). Slavik IVANOV
The above is a new ranking of Wikipedias which I'm working on, based on the List of articles every Wikipedia should have. Would it be OK if I added a link to it to the content page here (under 'See also')? (Note that only the largest Wikipedias are listed thus far; the others are still being evaluated. Ultimately all of them should be there. My intention was to find an alternative to article count as a way of ranking and evaluating Wikipedias. Of course, there are also problems in the measure I propose there. I'd be thankful for any comments and insights.) --Smeira 16:44, 7 November 2007 (UTC)
- Since nobody has expressed any objections, I'll add a link to the content page. --Smeira 19:06, 9 November 2007 (UTC)
Closed wikipedias
A number of closed Wikipedias (mostly with one article) are indexed at the bottom. I suggest removing them. 147.197.199.106 13:52, 12 November 2007 (UTC)
Growth rate
Is there a way to add growth rate to this list? 81.159.138.224 17:18, 12 November 2007 (UTC)
Sub-groups of Language Wikis at 500,000+
Yesterday, German and French were in a separate category of Wikis of languages with more than 500,000 articles. I anticipated Japanese, Italian, Spanish and Portuguese would soon join that list, and possibly Polish and Dutch too. (Russian and Chinese would have to wait until Wiki enthusiasm takes off among speakers of those languages.) But today German and French are back in a list of "over 100,000." Why is that?
I agree. Please see my recent comment here: http://en.wikipedia.org/wiki/Talk:Main_Page#Wikipedia_languages -- now six languages over 500,000, so seems logical to add this additional grouping at 500,000+. Jimthing 22:48, 5 February 2009 (UTC)
Volapük
I request that Volapük be removed from this list. As it consists almost entirely of bot-created stubs, its listing as the "15th largest" Wikipedia is misleading. It is a special case and should be treated as such. It could be listed at the bottom with an explanation of why it is there. It has one contributor.--Jimbo Wales 23:09, 14 November 2007 (UTC)
- Indeed - per vo:Gebanibespik:Smeira#Answers.3F, the bot run was specifically to publicise Volapük by getting it a high ranking in lists such as this and on the front page of www.wikipedia.org. Volapük is a real language with a long history, and there's no consensus to remove the Wikipedia itself - but rewarding such gaming seems ill-advised. Anyone want to do the looong painful edit of renumbering the whole list? - David Gerard 23:19, 14 November 2007 (UTC)
- Hmm. From David's links and Jimmy's arguments, I think that this seems to be an inappropriate listing for us to have. If we continue to have consensus, I will correct the list as suggested in a day or so.
- James F. (talk) 16:52, 15 November 2007 (UTC)
- Let me note that this "gaming" has actually increased the community of contributors to the Volapük Wikipedia (cf. arguments below) -- a goal that everybody seems to agree is good (cf. the Three-year plan here at Meta). --Smeira 15:17, 16 November 2007 (UTC)
- This list is regularly updated using "S23 WikiStats" (a tool that provides up-to-date statistics for various wikis), so simply correcting it would be pointless. We should use another tool instead, or a customized version of WikiStats.
- Also, please note that several languages are inappropriately listed too, starting with Lombard (next to Volapük in the list); some bot-created Wikipedias were removed from en:Template:Wikipedialang ([4], [5]). Korg 03:23, 16 November 2007 (UTC)
- OK. Note also that the existence of stubs aplenty even in the major Wikipedias probably implies that their ordering is not correct (Polish comes to mind; compare the relative positions of Polish and Spanish here and in the List of Wikipedias by sample of articles). --Smeira 15:34, 16 November 2007 (UTC)
- Maybe the versions whose depth is lower than 10 could be hidden from the table? Is that a good solution? -- Kevinhksouth 14:09, 16 November 2007 (UTC)
- That seems more adequate and less prejudiced against one project. Note, however, the problems with depth itself. So, for instance, I am now adding redirects to the Volapük Wikipedia (like the ones on en.wiki for US cities), which has already caused its depth to grow from 1 to 3. If my estimations are correct, it should reach 10 by the middle of next year. Shouldn't there be some discussion on better parameters? --Smeira 15:17, 16 November 2007 (UTC)
- Depth can be easily bloated too. Maybe the number of registered users is a little more solid. Currently, seven wikipedias present more than 50 articles per user:
- Volapük: 383 articles by user; Lombard: 287; Newar / Nepal Bhasa: 224; Bishnupriya Manipuri: 169; Tarantino: 84; Cebuano: 83; Piedmontese: 53
- For comparison: English: 0.35 articles per user; German: 1.39; Finnish: 2.02; Esperanto: 25; Ukrainian: 9.6; Basque: 12.8; Simple English: 1.60.
- Besides the assumption about the quality of mass-created articles, there is also the fact that so many articles cannot be adequately overseen and maintained by so few users. The main risk with allowing bloated Wikipedias to remain in the list is of course to generate an epidemic of competing, racing Wikipedias. Grasyop ✉ 16:10, 16 November 2007 (UTC)
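The articles-per-user check described above is a simple threshold filter. A minimal sketch, using only the figures quoted in the comment (the variable names and the choice to sort the result are assumptions of the example):

```python
# Articles-per-user figures as quoted in the comment above.
articles_per_user = {
    "Volapük": 383, "Lombard": 287, "Newar / Nepal Bhasa": 224,
    "Bishnupriya Manipuri": 169, "Tarantino": 84, "Cebuano": 83,
    "Piedmontese": 53, "English": 0.35, "German": 1.39, "Finnish": 2.02,
    "Esperanto": 25, "Ukrainian": 9.6, "Basque": 12.8, "Simple English": 1.60,
}

THRESHOLD = 50  # the cut-off used in the comment

# Keep only the wikis above the threshold, highest ratio first.
flagged = sorted(
    (name for name, ratio in articles_per_user.items() if ratio > THRESHOLD),
    key=articles_per_user.get,
    reverse=True,
)
print(flagged)
```

Like depth, this ratio depends on how "user" is counted, so the threshold is a judgment call rather than a fixed rule.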
- – The assumption about mass-created articles is also debatable. Some of them (cf. e.g. nl:Aignes, or nl:Adams (Tennessee)) seem to me as good as non-mass-created stubs. In fact, isn't the problem stubs themselves, not how they were created?
– I've heard before that the upkeep of the articles would be difficult -- but why? Undesired changes show up in the "recent changes" and can be dealt with; any improvements to the articles can be done either individually -- if someone is interested in a particular city -- or automatically, if it is a standard change (like the links from geographic coordinates to map sites currently being added to the Volapük Wikipedia). Which precise difficulty is meant here?
– If all Wikipedias explode, this would simply be the final proof that article count is not a good parameter to consider. People should pay attention to other criteria instead. This is a problem only if for some reason we want to keep article count as an important indicator -- and there are lots of reasons not to do that, among which is the 'exploding wikipedias' argument. --Smeira 18:18, 16 November 2007 (UTC)
The "Volapük Incident" somehow reflects that most people put too much emphasis on the number of articles as the major factor in the success of a Wikipedia. For example, the main page of Wikipedia shows the top 10 Wikipedias by number of articles, instead of the top 10 by number of first-language speakers. I think we should do something to de-emphasize the importance of quantity, so that people are more aware of other factors, such as quality. -- Kevinhksouth 02:36, 16 November 2007 (UTC)
I am the "only contributor" in the Volapük Wikipedia. Here a few arguments against taking Volapük off the table:
- The emphasis on article count is misguided, as Kevinhksouth said above; cf. en:Misuse of statistics. Consider the List of Wikipedias by sample of articles for a better X-ray of how well Wikipedias are doing.
- As can be seen by having a look at the recent changes page, I am certainly the most important, but by no means the only, contributor. Look up e.g. the contributions by LadyInGrey, Malafaya, Chabi, HannesM, Robert, as well as a number of anonymous contributions. I believe that one can objectively say, by looking at these pages, that Volapük is doing better in terms of activity than most of the, say, 100 smallest Wikipedias on this table.
- The question of bot-created stubs needs further discussion. With all due respect to Jimbo Wales, to consider a high number of stubs as creating a "misleading 15th position" for Volapük implies some assumptions about stubs and statistics that are disputable. Stub articles aren't bad; in the case of the Volapük ones, they contain relevant and accurate information, also found on all major Wikipedias. Stubs are OK, and they're a large percentage of all Wikipedias. Since stubs are also counted for all other Wikipedias, the 15th position is not "misleading": it is entirely correct. What is misleading here is that "article count" is the only parameter used to define this 15th position, and people tend to consider it a direct measure of the worth of a Wikipedia (compare it to considering population size as the main criterion for how important in history a country is). Removing Volapük from the table in this case would be like saying: "OK, our parameter is bad; but let's keep it and just change the table so that it 'looks OK'". The answer a statistician would give is, of course, don't change the table, choose a different parameter. (Again I suggest the List of Wikipedias by sample of articles as an alternative.)
- I note other similar cases: Lumbaart, Cebuano, Piemontese... Any decision to remove Volapük from the table should probably (a) list criteria and (b) also be applied to them. (Wouldn't it in fact be better to redo the statistics so as to exclude stubs -- in case they are really to be seen as so bad?)
--Smeira 15:01, 16 November 2007 (UTC)
- Measuring wikipedia projects with NUMBEROFARTICLES is perhaps a necessary evil (evil as in WM:PIE, i.e. condensing a complex object to one dimension), for a common man would ask:
- 1) How accurate or reliable is wikipedia?
- 2) How comprehensive is wikipedia?
And NUMBEROFARTICLES gives an answer to number 2. And then the wikimedia marketing department uses this number often.
But then we can also have a set of more specific and accurate measurements to answer more specific questions. Hillgentleman 16:47, 16 November 2007 (UTC)
- A necessary evil perhaps, but consider this: if it is possible to, e.g., add one stub for every entry of the NASA/IPAC Extragalactic Database, then a Wikipedia project could, in principle, grow beyond 10,000,000 in article count -- and yet all articles would be about extragalactic objects. So, of course, the number of articles is not really a good indicator even of how comprehensive a Wikipedia is. I suppose it would be better to define a measure based on the various major areas/fields of knowledge (the top categories of the major Wikipedias are a good example) and then check, say: (a) how many articles each has; (b) how big the articles are on average; (c) what the quality indicators are (how many featured articles? how many good articles? how many stubs?) for the category. Has a good census per category ever been done for any Wikipedia, by the way? Is it possible that the English Wikipedia has a bias towards, say, sitcom characters or movie stars (when compared with other categories) that has thus far gone undetected? --Smeira 18:18, 16 November 2007 (UTC)
- I would also like to see a census. I noticed that the mathematical and physical corpora in the English Wikipedia are growing at a healthy rate. It may be interesting to see the rate of wanted articles -> stubs -> articles. Or even a subfield (such as condensed matter physics, string theory, number theory). Hillgentleman 20:36, 18 November 2007 (UTC)
- Numbers are just that: numbers. They have no meaning aside from what people assign to them. vo.wp does have enough articles to rank it #15 when sorted by number of articles. Perhaps we should add a comment that the number of articles doesn't necessarily translate into the quality of the project. EVula // talk // ☯ // 20:04, 16 November 2007 (UTC)
Any talk about Volapuk "gaming the system" should also be talking about the Lombard wikipedia, whose abbreviation "lmo" seems like it should be missing an "a". LMO:WP is mostly a collection of robot-generated "information" about prime numbers, albums, songs ("X" is a song on album "Y".), astronomical and geographical stubs. Most of it seemed to be chock-full of anglicisms and broken templates copped from the English wikipedia. At least Volapuk's stubs seemed to have a little meat on their bones! -- Yekrats 21:34, 16 November 2007 (UTC)
- I agree. You might also add Cebuano, Piedmontese and a few others to this list. --83.85.142.49 10:01, 17 November 2007 (UTC)
I think that the formula (edits x users) would be a more accurate metric than number of articles. Edits count the overall efforts made in building some wikipedia, and users are a good approximation of real speakers. I made a table with this criterion and the result was interesting. 200.248.254.100 20:42, 21 November 2007 (UTC)
- What were the results? I'm curious. (By the way, by simply using edits x users, you'll get very high numbers -- en.wp will go above billions. And you're also not differentiating bot activity from human activity, which is important to some people.) --Smeira 21:42, 21 November 2007 (UTC)
- Indeed, the numbers are monstrous! Here is a table with the top 20 Wikipedias by this metric:
   Language       Edits     Users      Edits × Users
01 English    180790311   5866062   1060527173325282
02 German      40854803    479104     19573699536512
03 French      23842038    318916      7603607390808
04 Spanish     13214222    571008      7545426475776
05 Italian     13066244    221897      2899360344868
06 Japanese    16690476    172351      2876620229076
07 Portuguese   8387444    310080      2600778635520
08 Chinese      5592182    373726      2089943810132
09 Polish      10850954    154799      1679716828246
10 Dutch       10494629    141875      1488925489375
11 Russian      6482540     85949       557167830460
12 Turkish      2597139    119489       310329541971
13 Swedish      5641347     50016       282157611552
14 Finnish      3944382     69593       274501376526
15 Hebrew       4417015     47546       210011395190
16 Norwegian    3146962     61655       194025942110
17 Arabic       1772701     97316       172512170516
18 Hungarian    2475127     32374        80129761498
19 Romanian     1634594     44585        72878373490
20 Persian       919454     71126        65397085204
- Volapük is in 61st place. 200.248.254.100 17:18, 22 November 2007 (UTC)
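For anyone who wants to reproduce the ranking, the metric is just a product followed by a sort. A minimal sketch using the first four rows of the table above (the variable names are arbitrary, and no distinction is made between bot and human edits):

```python
# (edits, users) for the first four rows of the table above.
stats = {
    "English": (180790311, 5866062),
    "German": (40854803, 479104),
    "French": (23842038, 318916),
    "Spanish": (13214222, 571008),
}

# Python integers have arbitrary precision, so the large products are exact.
score = {lang: edits * users for lang, (edits, users) in stats.items()}
ranking = sorted(score, key=score.get, reverse=True)

for rank, lang in enumerate(ranking, start=1):
    print(f"{rank:02d} {lang:<10} {score[lang]:>22,}")
```

The products computed this way match the last column of the table for these four rows.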
- Interesting, but it doesn't prove a whole lot. The number of users tends to reflect the population base more than other statistics, which is why, for example, Chinese, Spanish, Arabic, Persian and Portuguese rank more highly than on the main list. (There are some exceptions - Russian has fewer users than I would've expected, so is #11 on both lists, and Hebrew moves up to #15 despite its relatively small number of speakers.) Not only Volapük and Lombard, but also Esperanto and Catalan drop out of the top 20 because of their low number of users / low population base. And they have a lot of good quality content - better than some of the wikis listed here. 222.106.128.121 10:40, 23 November 2007 (UTC)
- It's all right! My point is only that this metric (edits x users) is better on average than counting solely the number of articles. But due to the complexity of the parameters involved, I believe that a really satisfactory metric will never be reached - unfortunately. 200.248.254.100 20:00, 23 November 2007 (UTC)
- My original suggestion was using a sample of articles and comparing their level of quality by simply comparing article length (cf. the List of Wikipedias by sample of articles). One disadvantage is that some languages have longer or shorter words -- one would need some better measurement of the informational content instead. Another problem is the sample to choose -- I used the List of articles that all Wikipedias should have simply because I thought this list would have been widely used in other Wikipedias (so that I was giving them a better chance at getting a good classification). But the list is clearly Western-biased, which puts non-Western Wikipedias at a disadvantage. It is also true that some Wikipedias concentrate their efforts in some areas (more interesting to their users), and it's unfair to them to measure them by this list. I'd certainly find it interesting if someone could point me to a better chosen sample of articles... --Smeira 21:14, 23 November 2007 (UTC)
- Let me add that counting the number of bytes (you're counting bytes, not characters) is also misleading. In the UTF-8 encoding you have languages such as Georgian where words are usually longer than in other languages but also, as they use their own alphabet (each symbol corresponds to a letter, not a syllable or word, like in Chinese or Japanese) and it's 3-byte encoded in UTF-8, you'd get a very biased count. For example, the word საქართველო (meaning Georgia) contains 10 Georgian letters which would result in 30 bytes on a page just for that single word, over 4 times longer than its English or Spanish counterpart. Malafaya 14:31, 24 November 2007 (UTC)
- I'm not sure I'm counting bytes rather than characters. I'm using the python function len() (on the text of each article: i.e., text = page.get(), followed by length = len(text)). At the talk page of the List of Wikipedias by sample of articles, User:Hillgentleman claims this function counts each Chinese character as a unit, not as a sequence of two or three bytes. At least for Greek and Russian letters, len() also counts each character as one unit. Do you happen to know what len() does with Georgian characters? --Smeira 16:44, 27 November 2007 (UTC)
- I had forgotten about this thread. Sorry for that. I just made a test with the len function in Python. A single Georgian character returns a len() of 3. Something similar should be happening with Russian, as its alphabet is also multi-byte encoded (2 bytes per letter in UTF-8; try print(len('я'))). Malafaya 15:57, 5 December 2007 (UTC)
- OK. I must admit I'm partially wrong. It makes a difference whether or not you use the Unicode specifier u before the string. With it, yes, you get 1. Without u, I guess Python assumes a stream of single-byte characters, and the result becomes 3. As I suppose you're using the Unicode codec, you should get the correct value. Malafaya 17:40, 5 December 2007 (UTC)
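The character-versus-byte distinction discussed above can be checked directly. This sketch assumes Python 3, where every `str` is a Unicode string (the thread itself concerns Python 2 and its `u` prefix); it is an illustration, not the script used for the list.

```python
word = "საქართველო"  # "Georgia" in Georgian, from the example above

chars = len(word)                        # code points in a Python 3 str
utf8_bytes = len(word.encode("utf-8"))   # bytes after UTF-8 encoding

print(chars, utf8_bytes)
print(len("я"), len("я".encode("utf-8")))
print(len("Georgia"), len("Georgia".encode("utf-8")))
```

Counting code points gives 10 for the Georgian word while counting UTF-8 bytes gives 30, so a length-based comparison has to say which of the two it measures.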
- As I did elsewhere already, let me again suggest a count based on words rather than characters or bytes. This count should not include grammar-stuff like articles, prepositions, and the like, for languages that have them. Identifying written words is not trivial for languages such as Thai and Chinese, but possible with free software. Whether or not it is desirable to split compound words, and count them individually for those languages that use them excessively, such as German and some Indic ones, remains an open question to me. I know, I am suggesting quite some programming, and quite some server load, if articles need to be examined for their word counts. Yet, we can either:
- incorporate a word count algorithm in the Mediawiki "article save" function, and keep word counts in the article data base, or
- find "bytes per word" averages for each language by one or another sampling method, and use them to approximate the wanted figures with a simple straightforward calculation.
- --Purodha Blissenbach 17:11, 9 December 2007 (UTC)
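The second option above (a per-language "bytes per word" average) can be sketched as follows. This is only an illustration under loud assumptions: words are taken to be whitespace-separated, which does not hold for Thai or Chinese, function words are not excluded, and the function names and the tiny sample are made up.

```python
def bytes_per_word(sample_text):
    """Average UTF-8 bytes per whitespace-separated word in a sample."""
    words = sample_text.split()
    return len(sample_text.encode("utf-8")) / len(words)


def estimate_words(article_bytes, avg_bytes_per_word):
    """Approximate an article's word count from its size in bytes."""
    return round(article_bytes / avg_bytes_per_word)


sample = "year Joch Jor Johr"  # tiny made-up sample: 4 words, 18 bytes
avg = bytes_per_word(sample)
print(avg, estimate_words(9000, avg))
```

The average would be sampled once per language and then applied to the byte sizes the database already stores, which avoids re-parsing every article.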
- I think your suggestion is too complicated. Figuring out word boundary logic for 100+ languages is not trivial and would end up being a huge research project. I believe calculating character count and weighting it by a language-specific value would give you a similar result to what you're after. For example, German usually requires more characters than English for similar information, while Chinese requires fewer characters than English. Also, IMHO a metric needs to be a simple calculation, otherwise people will have more to complain about. (Why is my wiki rated so low? What's the word boundary logic used? What defines a compound word? Who decides what a preposition is? etc.) --MarsRover 20:41, 9 December 2007 (UTC)
- In my opinion, bot edits, or bot created stub articles are not more or less valuable than those typed by a human. It is their uniqueness, and their correctness, that matters much more.
- As a reader, I am happy if I get any information at all, even if bot created. I for one can read English, so I have access to the largest resource there is, and likely to one of the sources that many bot-created articles in other language Wikipedias were derived from in one way or another. Yet assuming I were unable to read English, unless a human had already created an article about some subject matter, I certainly would prefer some bot-created stub with rudimentary data over no information at all.
- I love good and thorough articles, and often I read them just for fun even if I should not use so much of my time on them. But, very often, I really do need no more than an introduction: knowing which domain a word or subject belongs to can already be enough to fulfill my immediate information needs. These aspects should be considered when discussing bot activities and stub creation, and their value.
- As a writer, I add to existing articles about 90% more often than creating new ones, even if I stumble over imho missing ones. I don't know why it seems to feel easier and safer to me to extend something which is already there. I believe that this behavioral pattern applies to others as well. So stubs are inviting writers! Plus, in many fields, like geography, biographies, abbreviations, there are data bases available that make creation of stubs by bots really easy. I am always in favour of unburdening humans from tasks more effectively performed by machines. There remains enough to be done by men, which machines cannot do anyways, enough for dozens of years of collective work in the Wikipedias alone! (just m 2 ct.) --Purodha Blissenbach 17:11, 9 December 2007 (UTC)
- So, it looks like there was consensus, but nothing ever got fixed. Can we just remove Volapuk from this table, as it is generally agreed that it is confusing and wrong to have it here?— The preceding unsigned comment was added by Jimbo Wales (talk)
- The one thing that is generally agreed is that numbers are just numbers (in EVula's words). It is unfair to say "nothing ever got fixed" when many Wikimedians are discussing and studying different statistics of Wikipedias and the www.wikipedia.org portal design. As for this list, the outstanding fix, whether the table is just a table of raw numbers or not, was not to do anything creative with the order, but to add a column "Notes" at the end. However, the column was wiped out every time somebody got an update from s23.org, and we have given up adding it back. Hillgentleman 04:35, 9 July 2008 (UTC)
- Add the comments outside the section of the automatic include, so they won't be wiped out with every update. From a technical point of view, vo: is part of these statistics. RobiH 12:12, 22 February 2009 (UTC)
added column
We have added a column "Comments" at the right end of the table, after discussions on meta:babel. Please kindly adjust your robots to accommodate this little change. Any comments? Questions? Objections? Hillgentleman 19:20, 24 November 2007 (UTC)
- You added the new column unilaterally. Please discuss the utility of the column before adding it. --202.79.62.21 02:24, 26 November 2007 (UTC)
- "Comments" is not a utility. It was proposed on babel. Yes we should discuss it. Thanks. Hillgentleman 19:13, 26 November 2007 (UTC)
I guess those wikipedias labeled 'bot articles' will have a problem with it. Everyone else should be ok. Of course, my own opinion. 199.239.101.125 21:08, 26 November 2007 (UTC)
- I suppose so -- but at least the column describes a fact (which can be disproved by those who think it was unfairly added by simply showing that the number of human-created articles in their Wikipedias is higher than that of bot-created articles -- and then the comment is removed). Of course this should not be assumed to necessarily imply prejudice against bot-created articles: despite received wisdom, bot-created articles can be good (at least IMHO). But since so many people have an issue with bot-created articles, the suggestion of adding such a column (in the discussion on meta:babel) to warn them was seen as a lesser evil. Let everyone think what s/he wants about this fact.
Note also that, in principle, this column can be used to add other kinds of comments. --Smeira 16:38, 27 November 2007 (UTC)
- An alternative is "bot-intensive". The issue is that "intensive" is vague; Wikipedia in English can also be called "bot-intensive". Else just use the three letters "BOT", or even a single letter "B", which would urge the reader to read the footnotes. Hillgentleman 03:30, 30 November 2007 (UTC)
- Yes. It is for all kinds of comments, and for reminding people that statistics can lie. Hillgentleman 03:32, 30 November 2007 (UTC)
- I like the idea of a 'B' marker. To prevent the English Wikipedia from qualifying, I think it should be reserved for Wikipedias in which more than a threshold percentage (50%, 75%, 85%...) of the articles was bot-created. Details could be explained in the notes. --Smeira 19:03, 1 December 2007 (UTC)
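The threshold rule proposed above can be sketched in a few lines of Python. This is only an illustration: the function name `bot_marker`, its parameters, and the default threshold of 50% are assumptions, and the thread does not say how the number of bot-created articles would actually be obtained.

```python
def bot_marker(bot_created: int, total_articles: int, threshold: float = 0.5) -> str:
    """Return "B" when the share of bot-created articles exceeds the threshold.

    All inputs are hypothetical; the discussion only proposes candidate
    thresholds (50%, 75%, 85%...) without settling on one.
    """
    if total_articles <= 0:
        # No articles at all: nothing to mark.
        return ""
    share = bot_created / total_articles
    return "B" if share > threshold else ""


# A wiki with 90 bot-created articles out of 100 gets the marker at 50%;
# one with 60 out of 100 does not get it at a stricter 75% threshold.
print(repr(bot_marker(90, 100)))
print(repr(bot_marker(60, 100, threshold=0.75)))
```

With a rule like this, the footnotes would only need to state the chosen threshold and where the bot counts come from.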
Country of origin
It might be useful/interesting to list the country or countries in which each language is used. There may be problems with languages that are used worldwide (in many countries), but they could be solved somehow. Maybe:
- Language: Loluish
- Spoken in: Loluland, botto, mamaty, beekar, worldwide
~ZyMOS
- Hello, nice idea, though IMHO that is what the link to Wikipedia is for: the information there is much better, and we avoid duplicating work. Thanks, best regards, --birdy geimfyglið (:> )=| 01:15, 30 November 2007 (UTC)
Stub ratio - what is a stub?
The depth calculation also depends on the stub ratio. My question is: what are the criteria used to recognize a stub? Malafaya 15:27, 5 December 2007 (UTC)
(Take a look at a typical 10Kb page at Georgian Wikipedia and a less "stubby" page in English Wikipedia (bigger character count), although not related to the same subject. The English page is less than 7Kb. Wikis using scripts/alphabets other than Latin have a clear advantage under byte-size criteria. Malafaya 15:47, 5 December 2007 (UTC))
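The byte-size point can be illustrated with a small Python sketch. It assumes page size is measured in UTF-8 bytes; the helper name `utf8_stats` and the two sample words are chosen here purely for illustration and do not come from WikiStats.

```python
def utf8_stats(text: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# The same word in Latin and Georgian script (illustrative samples only).
samples = {
    "Latin": "Wikipedia",
    "Georgian": "ვიკიპედია",
}

for script, word in samples.items():
    chars, size = utf8_stats(word)
    print(f"{script}: {chars} characters, {size} bytes")
```

Both words have nine characters, but each Georgian letter takes three bytes in UTF-8 while each basic Latin letter takes one, so a byte-size threshold treats the Georgian text as three times "longer".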
- From what I understand, "stub-ratio" is a misnomer. It is simply "good articles / total of articles", i.e. it is actually the rate of good articles, not of stubs. And the criteria are embedded in the wiki software itself: if you do a statistics query, it returns a value for GOOD and TOTAL; I forget based on which criteria. --Smeira 17:41, 5 December 2007 (UTC)
- Good are articles (main namespace) and Total are total pages in all namespaces. As for the stub, what is the size that is considered a threshold between a stub and non-stub for WikiStats? Malafaya 10:23, 6 December 2007 (UTC)
- If I understand correctly, "good" also excludes redirects. --83.85.142.49 23:58, 6 December 2007 (UTC)
- No. "Good" is equal to the pages as defined in Wikipedia:What is an article?. Also, I think "stub" (good pages / total pages) in Wikistat is misleading, and it should be renamed in order to avoid confusion with "stub" (short & incomplete articles) in Wikipedia. -- Kevinhksouth 04:15, 7 December 2007 (UTC)
- Alright. I see no sense in the current formula. So, depth's 3rd factor apparently is proportional to the ratio of non-articles compared to articles. That means if I have 10 good articles and 2000 talk pages and the like, I will have a 3rd factor with a value of (1 - 10/2010) ~= 0.995, with the highest achievable value being 1. Well, if depth is supposed to evaluate the amount of discussion pages you get per real article then I suppose it's correct... (grin) Malafaya 15:25, 7 December 2007 (UTC)
- It's actually all non-article pages: templates, categories, etc. The more of these 'support' pages, the better the structure. So it's OK to have that as a proportional factor. However, I believe the second and third factors cancel each other out (for the most part), leaving articles/edits as the dominant factor. Need to fix that.
- 199.239.101.125 19:37, 7 December 2007 (UTC)
- You can assume article talk and other discussion (on average) to increase article quality, OK. I disagree, however, with the idea of calling category pages "support pages" like templates, images, or talk pages, since good category pages give some information as well; hence existing non-empty category pages should be counted somehow more in the way article pages are. --Purodha Blissenbach 17:20, 9 December 2007 (UTC)
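Malafaya's arithmetic for the third factor can be checked with a short Python sketch. It covers only that one factor as described in the comment above (1 minus good pages over total pages); the function name `non_article_factor` is an assumption, and the full depth formula is not reproduced here.

```python
def non_article_factor(good: int, total: int) -> float:
    """Third depth factor as described in the thread: 1 - good / total.

    `good` is the number of article pages and `total` the number of pages
    in all namespaces; both names are illustrative.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 1 - good / total


# Malafaya's example: 10 good articles plus 2000 talk pages and the like.
print(round(non_article_factor(10, 10 + 2000), 3))

# The factor approaches 1 as non-article pages pile up,
# and is 0 when every page is an article.
print(non_article_factor(10, 10))
```

This shows why the factor rewards a wiki for having many non-article pages regardless of what those pages contain, which is the objection raised in the thread.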
Making the table user-sortable
Note to future editors: I modified the header for the table to say class="sortable". This makes the table sortable by any of its columns, so you can easily look up a particular language, or see the list in order of number of edits or number of users if you want, instead of number of articles. I also edited a few of the longer language names to use <br> to split the names onto two lines to make the table fit more easily in a typical window. Specifically, I edited the Language column for numbers 79, 129, 140, and 161 (Belarusian, Franco-Provençal, Pennsylvania German, and Assyrian Neo-Aramaic), and I edited the Language (local) column for numbers 38, 60, 79, 114, and 163 (Cebuano, Serbo-Croatian, Belarusian, Norman, and Chavacano). I hope these changes will be easy to preserve in future edits. Plenty 21:41, 9 December 2007 (UTC)
Non existing Wikipedias
Hoi, In this list, at the bottom, there are groups of Wikipedias that do not exist any more. You will for instance not find the Herero or the Ndonga Wikipedia. They have been closed. They will have to go through the Incubator to be recreated. I think that this list should only include active Wikipedias. The alternative is that we might as well include Klingon (not a good idea). GerardM 08:16, 31 December 2007 (UTC)
- Ndonga is not closed. And the list already includes Klingon. -- Prince Kassad 11:33, 31 December 2007 (UTC)