Concept: Provide an algorithm-based check at article commit time in MediaWiki, similar to the current edit conflict screen (a context-sensitive tutorial, especially for new users), to increase good edits (e.g. new articles with more content from the first version on) and thus reduce unwanted edits (vandalism, flames on discussion pages, edit wars, stubs) by the same person. The proposed ideas are based strictly on certain formal aspects of editing. This is not a suggestion to censor articles by content, nor a suggestion to create an elitist user base. The algorithms rely on formal, common-sense criteria such as the length and structure of articles.

Ideas

These individual ideas are primarily meant to illustrate what this proposal makes possible. Whether a community uses them, uses other ideas instead, and how it configures them is entirely its own decision; see Edit hints#Implementation.

Reduction of stubs

Problem

In Wikipedia there is consensus that stubs aren't that good, but there is no consensus (only a heated discussion) on whether those short articles should be radically deleted or simply kept until someone wants to enhance them.

Many stubs are written by new users (or, worse, by bots) who either want to impress others with a huge number of new articles or simply don't know any better. Stubs are a frequent source of anger among Wikipedians, especially between those who want to keep Wikipedia "clean" and therefore list stubs on votes for deletion, and those who are angry that "their" articles get deleted. The social solution, writing a personal note to the authors of stubs, costs a lot of personal working time (which could be spent much better on articles) and causes frustration when people don't want to understand that a stub is not good. And of course the stubs already written are still there even if the person stops writing such short articles.

So a purely social solution (deleting or teaching) doesn't work that well; it would work better if one could concentrate on the really problematic cases and had some soft way of preventing the rest.

Solution

If a (newly created) article has less than a certain amount of human-readable text (counting only sentences, not meta information such as tables), e.g. fewer than three sentences (or alternatively fewer than 500 characters in sentences), give the user a smart and polite warning after they press "save page" (as with an edit conflict), such as: "This article is rather short. Don't you know a little bit more about this topic? If you want to know how to write an article in Wikipedia simply look at <link>. If you really want to submit this article press "save page" and please consider enhancing it later." Note that the values given here are only for illustration and must not be hardcoded in the software; the same applies to all values in the other individual ideas.

This has the advantage that a new user immediately notices that stubs aren't that nice, yet is not excluded from Wikipedia and is given help and advice. Since the number of new stubs will (hopefully) be reduced and the number of articles with more content (by the same person) will increase, admins have more time for other things, and of course one frequent flame war topic on the mailing lists will hopefully be over. ;-)
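A minimal sketch of such a check, written in Python purely for illustration (MediaWiki itself is not written in Python). The function names, the regular expressions and the shortened hint text are assumptions of this sketch, not part of the proposal or of MediaWiki. The sentence splitting is deliberately crude and would miscount abbreviations such as "e.g.".

```python
import re

# Illustrative default only; the proposal says thresholds must stay configurable.
MIN_SENTENCES = 3

STUB_HINT = (
    "This article is rather short. Don't you know a little bit more about "
    "this topic? If you really want to submit this article press \"save page\" "
    "and please consider enhancing it later."
)


def strip_meta(wikitext: str) -> str:
    """Roughly drop tables, non-nested templates and headings (not a real parser)."""
    text = re.sub(r"\{\|.*?\|\}", " ", wikitext, flags=re.DOTALL)
    text = re.sub(r"\{\{.*?\}\}", " ", text, flags=re.DOTALL)
    text = re.sub(r"^=+.*?=+[ \t]*$", " ", text, flags=re.MULTILINE)
    return text


def count_sentences(wikitext: str) -> int:
    """Count chunks of prose that end in '.', '!' or '?'."""
    prose = " ".join(strip_meta(wikitext).split())
    chunks = re.split(r"(?<=[.!?])\s+", prose)
    return sum(1 for chunk in chunks if re.search(r"\w", chunk) and chunk[-1] in ".!?")


def stub_hint(wikitext: str, min_sentences: int = MIN_SENTENCES):
    """Return the hint text if the article is below the threshold, else None."""
    if count_sentences(wikitext) < min_sentences:
        return STUB_HINT
    return None
```

Because the function only returns a hint and never rejects the text, the user can still press "save page" a second time, which matches the soft behaviour described above.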

Drawbacks

No one knows how well this works, but if we assume good faith (which has proven workable in most cases) it should work. Some tweaking is needed (so the stub threshold needs to be configurable), especially with respect to disambiguation pages.

There are times when a very short article is of no small value. For example, it may be useful to identify a moderately famous author in terms of birth and death dates, nationality, genre, and title of a major work. We don't want to discourage that. Of course one hopes the stub will grow, but that doesn't mean it is a liability.

In particular, when working on a longer article, it is perfectly common to discover that a term is red-linked and that a brief explanation would be useful. Writing it inline adds information only to the one article; a stub provides the information for any other articles that may link the same term. Again, we don't want to discourage that.

Further, a lot of newbies and a lot of editors with slow connections tend to write their articles by saving a sentence or two at a time. That should not be discouraged, either.

Reduction of meta information

Problem

Recently articles have been getting more and more meta information in the form of boxes, tables or whatever. This is also a constant point of controversy, and it is especially annoying in stub articles with lots of meta information but no sentences.

Solution

As noted above, a friendly software solution would help quite a lot here. Another threshold could be defined, e.g. the ratio of meta data to readable full sentences, with a polite warning similar to the one above. As templates are heavily used for adding meta data to articles, the source code of the templates an article uses needs to be expanded and counted towards the article source code as pure meta content, regardless of what the templates contain. A parser can easily separate meta information from plain sentences, so it can easily be counted.
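The ratio could be computed roughly as follows. This is only a sketch under stated assumptions: `template_sources` is a hypothetical mapping from template name to template source code (a real implementation would ask the wiki for it), the regular expressions handle neither nested templates nor templates inside tables, and the 0.8 threshold is an arbitrary illustrative value.

```python
import re

# Matches a non-nested template call such as {{Infobox town|name=Foo}}.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}")
# Matches a wiki table from "{|" to "|}".
TABLE_RE = re.compile(r"\{\|.*?\|\}", re.DOTALL)


def meta_share(wikitext: str, template_sources: dict) -> float:
    """Share of meta characters among all counted characters (0.0 to 1.0)."""
    meta_chars = 0

    def count_table(match):
        nonlocal meta_chars
        meta_chars += len(match.group(0))
        return " "

    def count_template(match):
        nonlocal meta_chars
        # Count the expanded template source; fall back to the call itself.
        source = template_sources.get(match.group(1), match.group(0))
        meta_chars += len(source)
        return " "

    rest = TABLE_RE.sub(count_table, wikitext)
    rest = TEMPLATE_RE.sub(count_template, rest)
    prose_chars = len(" ".join(rest.split()))
    total = meta_chars + prose_chars
    return meta_chars / total if total else 0.0


def meta_hint(wikitext: str, template_sources: dict, max_share: float = 0.8):
    """Return a hint if meta content dominates the article, else None."""
    if meta_share(wikitext, template_sources) > max_share:
        return "This article consists mostly of boxes and tables. Could you add some sentences?"
    return None
```

Counting the expanded source instead of the short template call is what makes a one-line infobox call weigh as much as the box it produces.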

Drawbacks

An infobox about a topic that is not otherwise covered may be better than nothing, and structured information like this is often easier to port from one language to another.

Reduction of web links

Problem

Recently Wikipedia articles have been getting more and more (external) web links, in many cases irrelevant ones, for which editors have several motivations. This problem has already led to the drastic solution in the English Wikipedia of adding a server-side "nofollow" tag for search engine robots to every web link, so that those web links don't get a high Google page rank.

The most annoying case is so-called link spam, when a person adds the same URL to many unrelated articles again and again (in the worst case with a bot). The motivation of those editors is clear: they want to appear at the top of the results of search engines that rank a web site by how often it is linked. Another group are people who simply want to add their favourite web site to one or more articles without checking whether the link really enhances the content of the article. In addition, some people don't want to write content and simply put a web link on a page in place of the missing content. The result of all these motivations is that Wikipedia would move towards being a web link directory, which is clearly not its aim.

Of course there are guidelines saying that you should not put much more than five web links on a page (this number applies to the German-language Wikipedia; other languages surely have similar values), but this policy is in most cases not known and in some cases even ignored. That people don't really know the link policy, even though it is linked prominently in the Wikipedia handbooks and tutorials, can be seen very nicely in an effort at de:Benutzer:SirJective/Wartungslisten/Artikel mit vielen Weblinks, where a user ran an SQL query on a database dump and found more than 500 articles with more than 20 web links each in the article namespace of the German-language Wikipedia alone. After the web links were reduced (there were cases with more than 300 web links on one page), users in many cases responded that they simply didn't know there was a policy against adding much more than five web links to an article. Unfortunately there are also frequent reverts by people who want to "defend" their web links. This leads to never-ending work that could be spent much better on enhancing articles with text and images if there were a better solution.

Solution

The link spam problem is already addressed on the server side with the Spam blacklist: an edit containing a URL on that list is blocked by the MediaWiki software. Of course such a blunt instrument has to be used very carefully and can't be applied to all the other link problems mentioned above. So here again a context-sensitive hint would help very much. If an article contains more than a certain number of (external) web links, let's say 10, give the user a smart and polite warning after they press "save page" (as with an edit conflict), such as: "This article contains many web links. Are you sure that all of them are really interesting to the article? If you want to know how to add web links the right way simply look at <link>. If you really want to submit this article press "save page" and please consider reducing the web links later."

This would point people directly, and in context, to the policy. People would know from the beginning that there is a web link policy and wouldn't need to be taught afterwards. A clean-up like the one done in the German-language Wikipedia would then be much less work and cause much less trouble.

A solution such as a "nofollow" tag or a "spam blacklist" is always a drastic method, as good and interesting web links also lose their Google page rank. This hint method would be much more polite and would help people much better to understand why there is a certain policy.
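The link count check might look like the following sketch. The URL pattern, the function name and the `whitelist` parameter (a set of page titles exempt from the hint, as discussed under Drawbacks) are assumptions made for illustration; the limit of 10 is the example value from the text.

```python
import re

# Rough pattern for external links; a real wiki parser would be more precise.
URL_RE = re.compile(r"https?://[^\s\[\]<>\"]+")

WEBLINK_HINT = (
    "This article contains many web links. Are you sure that all of them "
    "are really interesting to the article?"
)


def weblink_hint(title: str, wikitext: str, max_links: int = 10,
                 whitelist: frozenset = frozenset()):
    """Return the hint text if the page exceeds the link limit, else None."""
    if title in whitelist:
        return None  # e.g. community portal pages that legitimately need many links
    if len(URL_RE.findall(wikitext)) > max_links:
        return WEBLINK_HINT
    return None
```

Unlike the Spam blacklist, this check never blocks an edit; it only decides whether the hint is shown before the second "save page".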

Drawbacks

Sometimes it can be necessary to have quite a few web links on a page, e.g. on a community portal page, and of course people often link to a certain version of an article with a web link instead of a wiki link. This could perhaps be addressed with a whitelist, configurable by community admins, of pages that can contain many web links without triggering the hint. Of course, as this is only a hint, nobody's edit really gets blocked.

Slashdot effect (or for Germans: Heise-DDOS)

Problem

Popular news sites often embed Wikipedia links in their articles (which is a good thing, since it shows how good/popular we are) for several reasons (mostly to explain background topics or some special word). The negative side effect is that many people following such a link are looking at Wikipedia for the first time, can't believe that they can change everything, and anonymously write some virtual graffiti on it, e.g. "Your page sucks." or "Haha you idiot you have a security hole, I can change everything." On the other hand, anonymous edits are very valuable in most cases (e.g. correction of typos) and help quite a lot to win new Wikipedia authors, since the entry barrier is very low.

At the moment this problem gets "solved" by temporarily protecting the article when the (anonymous) vandalism gets too strong, which is not that good, since the "normal" authors can no longer edit the article either.

Solution

If an article has a very fast edit rate (let's say more than five edits per hour), the software automatically protects it against anonymous edits until the edits per hour fall below a certain threshold. An article edited this fast is evidently being worked on by many regular users, so every mistake will be found fast enough. This way the "Slashdot storm" doesn't exclude so many people, since the article hopefully gets less vandalism and thus doesn't need to be fully protected. Of course this affects only a small percentage of all Wikipedia articles (which nonetheless are quite time-consuming because of this vandalism), so anonymous edits are still very welcome. And of course an admin does not need to protect the page manually and can concentrate on other, more important things.

Perhaps it would also be sufficient if the anonymous user got a friendly warning similar to those in the other ideas; on the other hand, the users who would be influenced by such a warning are presumably the ones less likely to be vandals.
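The automatic protection amounts to a sliding-window rate check over the article's recent edit timestamps. The following sketch assumes the timestamps are already available as a list; the function names are hypothetical and the limit of five edits per hour is the example value from the text.

```python
from datetime import datetime, timedelta


def edits_in_window(timestamps, now: datetime,
                    window: timedelta = timedelta(hours=1)) -> int:
    """Number of edits that fall within the window ending at `now`."""
    return sum(1 for stamp in timestamps if now - window < stamp <= now)


def anonymous_edit_allowed(timestamps, now: datetime,
                           max_edits_per_hour: int = 5) -> bool:
    """Anonymous edits stay open unless the article exceeds the edit rate."""
    return edits_in_window(timestamps, now) <= max_edits_per_hour
```

Because the window slides with the current time, the protection lifts by itself once the burst of edits has passed, without any admin action.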

Drawbacks

Not sure if this should be applied to all namespaces (e.g. also talk pages). Having it on talk pages too would be good in cases where people put anonymous rubbish into the discussion (a problem especially with protected pages); on the other hand, a discussion often changes frequently and should be very open. Perhaps it should only be applied to the article namespace, or better, there should be a configuration switch that lets admins choose which namespaces this idea (and all other ideas proposed here) applies to.

Legitimate hot topics (e.g. an ongoing disaster) often accumulate vast information from previously uninvolved users. We do not want to discourage that.

Also, at least in the English language Wikipedia, slashdotted articles often improve considerably over the next 24 hours. The process may not all be pretty, but the results are good. Reverting is easy. Winning back people who may be discouraged by their inability to edit may not be.

Too many discussions

Problem

Some Wikipedians frequently use Wikipedia/Wikinews for discussions only (and in some cases rants and flames...) instead of working on articles themselves. Another group of Wikipedians makes many suggestions or wishes on discussion pages but hesitates to change articles themselves because they think they aren't good at writing.

The social solution is to tell those Wikipedians politely that Wikipedia is not a discussion forum but a collaboration area concentrating on articles, and to encourage them to write their ideas into the articles themselves. Unfortunately, especially in heated discussions, there is sometimes no chance of convincing those people that they can't demand from others something they don't want to do themselves (and of course it is not always easy for a human being to stay friendly and polite when this is the 10th person you have tried to explain it to that day).

Solution

If the share of a user's edits in the talk namespace (discussions), relative to all of that user's counted edits in the talk and article namespaces (the real work), exceeds a certain percentage (let's say more than 75% talk edits), the user again gets a polite automatic warning when submitting the latest edit to a talk page, e.g.: "You seem to have lots of ideas, but unfortunately you aren't adding them to the articles yourself. Please also consider that Wikipedia is not a discussion board. Don't be afraid to add your ideas and wishes to the articles yourself. If you want to know how to write an article in Wikipedia simply look at <link>. If you really want to submit this text press "save page"." Of course this notice doesn't make sense if a person has, let's say, fewer than 10 edits overall, so a second condition has to be met: the user must have more than a specific number of edits before he can get this notice.

For the practical realisation of this idea it is probably essential that only the user's last 50 (or some other number in this range) overall edits (to the affected namespaces) are counted. First, this reacts more directly to a change in edit behavior (if the user has a rather long edit list). Second, it doesn't consume so much computing time (generating the edit list of a user takes fairly long), which is an important issue on our database servers. So a global configuration switch for the maximum number of counted edits is necessary (so that e.g. a language community can't cause too heavy a load with its community settings). With a window of 50 edits and a 75% threshold this means that, in the worst case, after 13 consecutive article edits you no longer get the message when you write to a talk page again.

That way people get a friendly notice, and others don't need to waste so much time teaching them such simple things and can concentrate more on the real fun, such as working together on a subject with the same person. And of course people unwilling to add anything to articles get some time to think and after a while either change their mind or get frustrated and stop flooding talk pages. (Of course they get frustrated faster than people in the same discussion who also work a lot on articles.)
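The rule and the "13 edits" worst case can be checked with a small sketch. It assumes the user's edit history is available as a chronological list of namespace names, which is a simplification of this example; the function names and the labels "talk" and "article" are hypothetical, while the defaults (75%, 10 edits, window of 50) are the example values from the text.

```python
COUNTED_NAMESPACES = ("talk", "article")


def talk_share(history, window: int = 50) -> float:
    """Share of talk edits among the last `window` counted edits."""
    counted = [ns for ns in history if ns in COUNTED_NAMESPACES][-window:]
    if not counted:
        return 0.0
    return counted.count("talk") / len(counted)


def talk_hint_due(history, threshold: float = 0.75,
                  min_edits: int = 10, window: int = 50) -> bool:
    """True if the user should see the discussion hint on the next talk edit."""
    counted = [ns for ns in history if ns in COUNTED_NAMESPACES]
    if len(counted) < min_edits:
        return False  # too few edits to judge the user's behaviour
    return talk_share(history, window) > threshold


# Worst case: 50 talk edits, then a run of article edits.
only_talk = ["talk"] * 50
still_hinted = talk_hint_due(only_talk + ["article"] * 12)   # 38/50 = 76%
no_longer_hinted = talk_hint_due(only_talk + ["article"] * 13)  # 37/50 = 74%
```

With 12 article edits the share is still 76% and the hint appears; with 13 it drops to 74%, which is where the figure in the text comes from.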

Drawbacks

Not sure if this should also be applied to other meta namespaces such as "wikipedia:" and "user talk:", which are also used for discussions. One problem arises if people help a lot with meta work (e.g. votes for deletion and review of articles), so perhaps counting namespaces other than article and user talk pages as discussion wouldn't be that good.

The last thing we want to do on controversial topics is to encourage people to solve problems by edit wars rather than by discussion. This problem could perhaps be solved by giving the edit war hint a higher priority on an affected article.

This hint in particular implies a heavy load on the database servers and thus needs to be tuned very carefully.

Edit wars

Problem

If a topic is in dispute and users disagree about certain views and/or about what should be written in the article, they often revert each other's versions. After some hours the version history has reached new horizons where no article has gone before, and the authors are full of adrenaline and perhaps no longer willing to discuss a solution with each other. The current social solution is for an admin to write-protect the article so that the authors can calm down and perhaps discuss their individual points and settle on a solution. If this doesn't help, an admin tries to bring them together, and in the worst cases IPs and users are blocked if nothing else helps.

Solution

An edit war can (theoretically) be detected fairly easily by software, since in most cases it consists only of delete/replace operations. These can be detected by diffing and by compressing subsequent diffs, which is nothing more than comparing those diffs and is already used in MediaWiki 1.4 for a slightly different purpose (compressing subsequent versions of an article to save disk space). So if an article has more than a certain number of such operations per hour (let's say 5), a user gets a polite warning when submitting the article (again similar to the edit conflict), e.g.: "There seems to be disagreement about the content of the article. Please first try to discuss this problem with the other authors on the talk page <link to talk page> and try to find a solution in a friendly discussion. If you don't want to do so, try to take a short break and edit this article again later. If you have questions about how to write a good article in Wikipedia simply look at <link>. If you think that you have tried to discuss this topic enough with the other authors and/or really want to submit this text press "save page"." This could possibly be combined with the user's quota of talk page edits for that article, so that a user who tried to get in contact with the other person doesn't get this notice as quickly.

That way users get the chance to calm down and are politely advised to get in contact with each other. The speed of an edit war is reduced by the warning, and users who don't want to discuss are at a disadvantage and get frustrated earlier, since their opponents who tried to discuss don't get the warning as quickly. Edit wars constantly cause a lot of work, especially for admins and for people who are under attack by vandals, although the vast majority of articles have never had an edit war. And of course edit wars are no fun, and there are limits to resolving a full-blown edit war by social interaction over the internet (personal contact can solve much more, but we are an internet project with few personal meetings). So it is essential that an edit war is detected early (by the time an admin shows up it is already in full swing) and that the people in an edit war constantly get friendly but firm notices, which you can't always demand from a human being. And of course fewer edit wars mean less frustration for everyone.
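As a simplified illustration of the detection step, the sketch below does not compare diffs as described above. Instead it swaps in a cruder technique: it treats a revision as a revert when its full text is identical to an earlier revision of the same article, which it checks by hashing. The input format (a chronological list of timestamp and text pairs), the function names and the shortened hint text are assumptions of this example; the limit of 5 per hour is the example value from the text.

```python
import hashlib
from datetime import datetime, timedelta


def count_reverts(revisions, now: datetime,
                  window: timedelta = timedelta(hours=1)) -> int:
    """Count recent revisions whose text restores an earlier version."""
    seen = set()
    previous = None
    reverts = 0
    for stamp, text in revisions:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        restores_old_text = digest in seen and digest != previous
        if restores_old_text and now - window < stamp <= now:
            reverts += 1
        seen.add(digest)
        previous = digest
    return reverts


def edit_war_hint(revisions, now: datetime, max_reverts_per_hour: int = 5):
    """Return the hint text if the article looks like an edit war, else None."""
    if count_reverts(revisions, now) > max_reverts_per_hour:
        return ("There seems to be disagreement about the content of the "
                "article. Please first try to discuss this on the talk page.")
    return None
```

Exact-match hashing misses partial reverts, which the diff comparison described in the text would catch, so it should be read as a lower bound on what the proposal has in mind.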

Drawbacks

In many cases a user tries to get in contact directly via a user's individual talk page (since the article talk page is already full of flame wars). This should perhaps also be taken into account, so that people who tried to get in contact with the others don't get so frustrated, but it is rather complicated for software to do (it would have to find the edit to the talk page of an involved user). Perhaps counting the talk edits is too much overhead anyway, since in many cases an edit war is accompanied by a heated discussion on the talk page. Not sure how to apply this edit war hint to namespaces dedicated to discussion such as "wikipedia:" (where talk pages don't make much sense).

An edit war between two equally unprincipled users cannot be readily distinguished from reverting a persistent vandal.

Advantages

  • Fully automatic:
    • Less meta work for admins and all other authors, so they can concentrate on other, more fun/important things. And of course not everything can be solved via direct social interaction, even by angels (some people expect admins to be like angels, are very upset when one makes a mistake, and then think they are entitled to make lots of mistakes themselves)
    • Computers have no personality and no emotions, so if someone gets such a notice he can't argue with the software (okay, complaining about the software to others will always happen...)
  • No evaluation by content, so no one can complain about censorship. This complaint often arises when someone's edits aren't accepted by other people for whatever reason (okay, there will always be people who think that THEY are behind everything, whoever THEY are, but those poor people need professional help, not Wikipedia).
  • Although there is no evaluation by content, there will be a soft effect towards more cooperation and objectivity and in the end better overall quality of content. So the already remarkable internal climate of cooperation and objectivity among the majority will be reinforced, and a small minority of vandals won't consume so much of others' valuable time.
  • We don't lose our openness to the outside with such a solution, and we increase wanted edits and reduce unwanted behavior at the same time, from the same person.
  • Problems on which no consensus can be found (e.g. deleting or not deleting stubs) can't be solved otherwise.
  • Problems get solved or indicated before or right when they arise, not afterwards as is currently the case with the social solutions. Prevention is always a good thing.
  • Context sensitive hints are more wiki-like than arbitrary policies of a community.

Disadvantages

  • If the algorithm isn't precise enough there will be negative side effects, so it needs to be designed very carefully.
  • Different communities may behave differently (e.g. the Japanese first seek a compromise on the talk page and then submit it to the article, while the Americans submit first and then talk, according to statistical research by Jimbo Wales).
    Actually this is something I've been told anecdotally, and I'm planning research to confirm if it is true or not. Still, the general point is perfectly valid. However, if these things are not hardcoded, but are somehow templated so that users can change them, then local cultures can be easily accommodated. --Jimbo Wales 23:31, 8 Jan 2005 (UTC)
    Yes, it is also a very important point of my proposal that those hints aren't hardcoded but highly configurable by the communities, so that they can easily be adapted to different cultures (I personally fear that some of the "contras" are due to a language barrier, in that some people didn't notice this important point in the implementation section). Arnomane 20:44, 10 Jan 2005 (UTC)
  • Most if not all of these suggestions are related to highly controversial editorial matters; building them directly into the software (however "softly") implies that there is an accepted "correct" stance on each debate, and allows the Developers control over the community. When a human warns you, you can reply, and discuss the policy; this is central to the "internal climate of cooperation" fostered by all Wikimedia projects.
  • Several of the proposed changes would require quite heavy database access - for instance, calculating the ratio of different namespaces edited by a user would involve the equivalent load of viewing their edit history every time anybody edits anything. With the Wikimedia servers barely coping as it is, database access needs to be kept to an absolute minimum - hence why internal search and many Special: pages are often disabled - so it is questionable whether such extra queries would be manageable.

Implementation

IMHO no change to the database structure is needed (only some caching, so that the relevant values do not need to be recalculated over and over again). All necessary changes affect only the MediaWiki software itself.

Additionally, for maximum flexibility there could also be a level system (with individual successive thresholds):

  1. Flag solution: Show a flag specific to the subject in the user interface. For example, an edit war flag or a stub flag visible in the recent changes or the individual watchlist. Other hints that affect the user himself (such as the discussion hint) could be implemented with a flag in the user's interface visible only to that user (e.g. a small red flag with a link to an explanatory text at the top next to the username).
  2. Warning as on edit conflict: The solution suggested most often in the individual ideas.
  3. Blocking: Blocking of the user's commit (as e.g. in the slashdotted article idea).

The individual ideas and escalation levels need to be plugin-like, with configuration switches so that the language community admins can switch them on/off and tweak them individually to their liking (which ideas they want, which namespaces they apply to, and which numerical values they use). The escalation levels also need to be switchable individually for each idea. This is a very important point: it gives the communities the power to decide. There also need to be some global maximum values, e.g. for the "Too many discussions" idea, to prevent too heavy a load on the database servers.

So by default this system will be switched off. Every community has to decide by internal vote which individual ideas it wants and which settings to use.
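One possible shape for such a per-idea configuration is sketched below. All names (`Level`, `HintConfig`, `GLOBAL_MAX_COUNTED_EDITS`, `clamp_counted_edits`) are hypothetical and do not correspond to existing MediaWiki settings; the sketch only illustrates the requirements named above: everything off by default, per-idea namespaces, per-level thresholds, and a global cap that community settings cannot exceed.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    FLAG = 1      # mark the page or user, e.g. in recent changes
    WARNING = 2   # show a hint as on an edit conflict
    BLOCK = 3     # refuse the commit

# Hypothetical server-wide cap that no community setting may exceed.
GLOBAL_MAX_COUNTED_EDITS = 50


@dataclass
class HintConfig:
    enabled: bool = False                       # every idea is off by default
    namespaces: frozenset = frozenset({"article"})
    thresholds: dict = field(default_factory=dict)  # Level -> numeric limit

    def level_for(self, value: float):
        """Highest configured level whose limit `value` exceeds, or None."""
        if not self.enabled:
            return None
        reached = [level for level, limit in self.thresholds.items() if value > limit]
        return max(reached, key=lambda level: level.value) if reached else None


def clamp_counted_edits(requested: int) -> int:
    """Apply the global maximum to a community's requested window size."""
    return min(requested, GLOBAL_MAX_COUNTED_EDITS)


# Example: a community enables the edit war idea with two escalation levels.
edit_war = HintConfig(enabled=True, thresholds={Level.FLAG: 3, Level.WARNING: 5})
```

A level that a community does not want is simply left out of `thresholds`, which is how the example switches off blocking for the edit war idea.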

Poll

If you simply want to express your personal view but have no further comment simply vote here. If you have suggestions or comments add them to the article or the talk page.

  • pro: It was my idea, which I improved quite a lot after many fruitful discussions at the Wikipedia meeting at the 21C3 conference in Berlin. -- Arnomane 22:40, 3 Jan 2005 (UTC)
  • pro good idea! -- MichaelDiederich 14:07, 4 Jan 2005 (UTC)
  • pro yes indeed! --Reinhard 16:00, 4 Jan 2005 (UTC)
  • pro --Langec 23:27, 4 Jan 2005 (UTC)
  • pro --Maha 00:08, 5 Jan 2005 (UTC)
  • neutral do not name it edit rules but edit hints -- Nichtich 19:57, 5 Jan 2005 (UTC)
    I have now changed the title of the article from edit rules to edit hints. Arnomane 14:39, 12 Jan 2005 (UTC)
  • pro worth a try -- Schewek 17:27, 6 Jan 2005 (UTC)
  • pro The ideas mentioned above are really useful and would enhance Wikipedia a lot --LuTo 23:28, 6 Jan 2005 (UTC)
  • pro -- tsor 08:12, 7 Jan 2005 (UTC) (unfortunately not logged in here)
  • very much pro Very good, common sense-based proposals; it seems clear to me that there must be a common effort to keep working going on instead of spending too much time on discussing problems. --Lullus 15:00, 7 Jan 2005 (UTC)
  • very much contra; I've explained my issues in the various "drawbacks" sections. -- Jmabel 19:56, 7 Jan 2005 (UTC)
  • contra: as far as I can see, this would involve a computer making decisions on extremely controversial editorial matters (such as "minimum acceptable length"); like Jmabel, I would be happy for these edits to be flagged for attention, but I do not think they should require this kind of extra confirmation. - IMSoP 21:06, 7 Jan 2005 (UTC)
  • contra. I agree with Jmabel's arguments, and I have added a long rant (friendly rant :-) to the talk page. The short version: if you want to make this system as an enhancement to classifying articles, I'm all for it. If you want to let it talk to people, I'm all against it. JRM 21:19, 7 Jan 2005 (UTC)
    Update: I've read the response to my response. It is crystal clear to me that we are still very much in the discussion phase. This vote cannot be binding, it can only be a straw poll. You can use the results to gauge what has support and what doesn't. You cannot use them to derive that some form of consensus has been reached — you cannot vote on something if the proposal keeps changing. As the proposal currently stands, my vote is and will remain contra. There may be a new proposal that will give us more opportunity for discussion first — I'll be happy to participate. As a vote it's inappropriate; we should be discussing on what should be done before we simply say "yes" and "no". We haven't gotten to that stage yet; not by a long shot. JRM 17:38, 8 Jan 2005 (UTC)
    Yes, for me this was from the beginning not a real vote but simply an easy method for expressing your feeling if you have no further comment, aka a poll. On the talk page I have written down several times why I was doing it, and now I regret that I added it to the article, but I couldn't foresee everything. It never occurred to me that people could regard this as a binding vote. So I have changed the title of this section to poll. Arnomane 19:03, 8 Jan 2005 (UTC)
    Voting in general is a tricky concept. It was unclear whether this was already a worked-out proposal or not, I'm glad to hear that it wasn't. :-) Since it's clear that a lot of people like the idea, and others just don't like the way it's implemented, there's plenty of room for discussion left. After enough time has passed for people to make comments here, you could start a new page (under a more neutral name than "Editing rules", because we haven't decided yet whether they should be "rules") that is explicitly a discussion page, and not about voting of any kind. Votes make me nervous, because they force me to make quick judgements. :-) JRM 03:31, 9 Jan 2005 (UTC)
    I have answered your suggestions and made some more general remarks (of course not related to you) in Talk:Edit rules#The quadration of the circle Arnomane 12:43, 9 Jan 2005 (UTC)
  • contra (strongly oppose). Openness makes Wikipedia work; further software enhancements should not change the current method of editing but work transparently on top of it. I agree with Jmabel's and JRM's comments. -- w:user:Rbellin
  • contra I don't like feature bloat. Noisy 01:54, 8 Jan 2005 (UTC)
  • contra: please don't make MediaWiki nagware. --Ben Brockert < 02:33, 8 Jan 2005 (UTC)
    I have written an answer on the talk page regarding the fear of bloat and nagware. Arnomane 13:24, 8 Jan 2005 (UTC)
  • contra Wouldn't deter vandals but would annoy users. What I'd like more is a tutorial mode for newbies that would tell them things like this in a very polite and informative way. en:user:zocky
    Well I think this is a tutorial mode for (new) users. Especially the first two rules are more or less an interactive tutorial mode. And even in the warning texts I suggested a link to a page where people get some hints how to write a good article. Arnomane 19:03, 8 Jan 2005 (UTC)
  • contra --PatrickD 21:06, 8 Jan 2005 (UTC)
  • neutral - depending on the final form of the proposal. A system of edit hints might be useful. A couple of extra tools (eg detecting likely duplicate or related entries to an article about to be created) too. Detecting probable vandalism (eg short anon edits to high-risk-categorised pages) could be flagged in a Special page; but why not just ban anon editing of high-risk pages. Rd232 14:58, 9 Jan 2005 (UTC)
  • re "Stubs": pro if the message is amended a bit (it should mention that it is computer-generated); re "Too much meta-data": neutral; re the other proposals: contra. -- en:User:Jitse Niesen
  • Pro on edit war and stub proposal. Having these messages automated helps editors to get back to other work. Contra against other proposals. [[User:MacGyverMagic|MacGyverMagic|(talk)]] 09:57, Jan 18, 2005 (UTC)
  • con because most of the so-called problems on this page are not problems imho. Maybe the solution is to look at the positive side of some of the things mentioned, rather than only seeing them as problems. Tabby 03:44, 27 March 2007 (UTC)