User:Invadibot/scope/meta-2
Task in meta.wikimedia.org
| # | Description | Authorization | Server |
|---|---|---|---|
| 2 | Fixing links to Wikimedia projects and applying protocol-relative URLs | 27 Apr 2013 | |
Goals
The objectives of this task are:
- to apply protocol-relative URLs to links in external link format that explicitly use the HTTP protocol, so that users can navigate without switching away from the protocol they are using;
- additionally, to fix links in external link format and convert them to interwiki format, provided that this change does not alter the displayed text.
Fixing all mixed-content warnings will be a long effort by both the MediaWiki core and extension developers and the project communities. A number of templates, CSS, and JavaScript on projects are improperly referencing resources, and as such, they are being loaded incorrectly. All resources should now be referenced using protocol-relative URLs (//<resource-url> vs http://<resource-url>).
[...]
All of the links in our content have changed from being protocol-specific to protocol-relative. This content is cached in our squid layer, and in our parser cache. We don’t wish to clear our entire cache immediately to fix this, as it would cause severe performance issues. Instead we will either clear the cache slowly over time, or we’ll let it clear naturally.
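To illustrate why a protocol-relative link keeps the reader on the protocol in use, the following Python 3 sketch resolves a link against the page that contains it using the standard library's urllib.parse.urljoin, which, like a browser, gives a scheme-less //host/path reference the scheme of the containing page. The resolve helper and the example URLs are only illustrations.

```python
from urllib.parse import urljoin


def resolve(page_url: str, link: str) -> str:
    """Resolve a link relative to the page it appears on."""
    # A protocol-relative link ("//host/path") has no scheme of its own,
    # so it inherits the scheme (http or https) of the containing page.
    return urljoin(page_url, link)


LINK = "//en.wikipedia.org/wiki/Protocol-relative_URL"
for page in ("http://meta.wikimedia.org/wiki/Main_Page",
             "https://meta.wikimedia.org/wiki/Main_Page"):
    print(resolve(page, LINK))

# An absolute http:// link, by contrast, always forces HTTP.
print(resolve("https://meta.wikimedia.org/wiki/Main_Page",
              "http://en.wikipedia.org/wiki/Protocol-relative_URL"))
```

The first two calls print the http:// and https:// forms of the same link; the last prints an http:// URL even though the reader is on HTTPS, which is the situation the task removes.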
Procedure: conditions for the change
Application of protocol-relative URLs
A protocol-relative URL is applied if the link found meets all of the following conditions:
- has an external link format, which means:
  - it is enclosed in single square brackets, and
  - it starts with a URL;
- explicitly uses the HTTP protocol (not HTTPS);
- points to a domain under wikipedia.org, wikinews.org, wikisource.org, wikibooks.org, wikiquote.org, wikiversity.org, wiktionary.org, wikivoyage.org, wikidata.org, wikimedia.org, wikimediafoundation.org or mediawiki.org (matched case-sensitively);
- is not inside any of these tags: categorytree, comment, charinsert, dynamicpagelist, gallery, hiero, imagemap, inputbox, invoke, math, nowiki, pagelist, pagequality, pages, poem, pre, property, score, section, source, syntaxhighlight, templatedata, timeline;
- is not in the exceptions list, and neither is the page containing it.
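The conditions above can be sketched in Python 3 as follows. This is a simplified, hypothetical illustration, not the task's own code: make_protocol_relative, HTTP_LINK and EXCLUDED are invented names, only a few of the excluded tags are handled (plus HTML comments, assumed here to be what "comment" refers to), and the exceptions list is not modelled.

```python
import re

# Project domains named in the conditions; matching is case-sensitive.
DOMAINS = ("wikipedia|wikinews|wikisource|wikibooks|wikiquote|wikiversity|"
           "wiktionary|wikivoyage|wikidata|wikimedia|wikimediafoundation|mediawiki")

# External link format: one "[" (not "[[") directly followed by an http:// URL
# whose host is an optional subdomain part plus <project>.org.
HTTP_LINK = re.compile(
    r"(?<!\[)\[http://((?:[^@:/ \]]+\.)?(?:%s)\.org/)" % DOMAINS)

# Hypothetical subset of the excluded tags, plus HTML comments.
EXCLUDED_TAGS = ("nowiki", "pre", "source", "syntaxhighlight", "math", "gallery")
EXCLUDED = re.compile(
    r"<!--.*?-->|"
    + "|".join(r"<%s\b[^>]*>.*?</%s\s*>" % (tag, tag) for tag in EXCLUDED_TAGS),
    re.DOTALL | re.IGNORECASE)


def make_protocol_relative(text: str) -> str:
    """Rewrite [http://...] links to the listed domains as [//...],
    copying every excluded region through unchanged."""
    out, pos = [], 0
    for region in EXCLUDED.finditer(text):
        # Convert the text before the excluded region, then copy the region as-is.
        out.append(HTTP_LINK.sub(r"[//\1", text[pos:region.start()]))
        out.append(region.group(0))
        pos = region.end()
    out.append(HTTP_LINK.sub(r"[//\1", text[pos:]))
    return "".join(out)


sample = ("[http://en.wikipedia.org/wiki/HTTPS HTTPS], "
          "<nowiki>[http://meta.wikimedia.org/ kept]</nowiki>, "
          "[https://www.mediawiki.org/ kept] and [http://example.org/ kept]")
print(make_protocol_relative(sample))
```

Only the first link changes (to [//en.wikipedia.org/wiki/HTTPS HTTPS]); the link inside nowiki, the HTTPS link and the link to a non-Wikimedia domain are left alone. Splitting the text around excluded regions, rather than writing one large regular expression, keeps the tag list easy to extend.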
Application of interwiki format
Interwiki format is also applied if the link found meets all of the following conditions:
- has an external link format, which means:
  - it is enclosed in single square brackets, and
  - it starts with a URL;
- does not explicitly use the HTTPS protocol;
- points to a domain under wikipedia.org, wikinews.org, wikisource.org, wikibooks.org, wikiquote.org, wikiversity.org, wiktionary.org, wikivoyage.org, wikidata.org, wikimedia.org, wikimediafoundation.org or mediawiki.org (matched case-sensitively);
- is not inside any of these tags: categorytree, comment, charinsert, dynamicpagelist, gallery, hiero, imagemap, inputbox, invoke, math, nowiki, pagelist, pagequality, pages, poem, pre, property, score, section, source, syntaxhighlight, templatedata, timeline;
- has display text, which means that some text follows the URL, separated from it by a space;
- does not have a canonical URL format;
- points to a specific page after the /wiki/ path;
- is not in the exceptions list, and neither is the page containing it.
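A Python 3 sketch of the interwiki conversion for language-coded projects is shown below. It is an assumption-laden illustration rather than the task's code: PREFIXES, PROJECT_LINK and to_interwiki are hypothetical names, only the w:, n:, s:, b:, q:, v: and wikt: prefixes from the task's replacements are covered, and language codes are assumed to consist of lowercase letters and hyphens (so mobile hosts such as en.m.wikipedia.org are not matched).

```python
import re

# Hypothetical subset of project-to-prefix pairs, mirroring the replacements
# in the task's code (w:, n:, s:, b:, q:, v:, wikt:).
PREFIXES = {
    "wikipedia": "w", "wikinews": "n", "wikisource": "s", "wikibooks": "b",
    "wikiquote": "q", "wikiversity": "v", "wiktionary": "wikt",
}

# [//<lang>.<project>.org/wiki/<page> <text>]: the page name stops at
# whitespace, "]", "?" or "|", and display text must follow a space.
# (?!www\.) keeps "www" from being read as a language code.
PROJECT_LINK = re.compile(
    r"\[//(?!www\.)([a-z][a-z-]*)\.(%s)\.org/wiki/([^\s\]?|]+) ([^\]]+)\]"
    % "|".join(PREFIXES))


def to_interwiki(text: str) -> str:
    """Turn qualifying external links into [[:prefix:lang:Page|text]]."""
    def repl(m: re.Match) -> str:
        lang, project, page, label = m.groups()
        return "[[:%s:%s:%s|%s]]" % (PREFIXES[project], lang, page, label)
    return PROJECT_LINK.sub(repl, text)


print(to_interwiki("[//es.wikipedia.org/wiki/Wikipedia Wikipedia en español]"))
print(to_interwiki("[//en.wikipedia.org/wiki/Main_Page?action=history history]"))
```

The first call yields [[:w:es:Wikipedia|Wikipedia en español]]; the second link is left unchanged because its URL carries a query string, so the displayed text and target could not be preserved as a plain interwiki link.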
Scope
The changes carried out by this task can be made on any editable page of a wiki. They are needed so that users can browse over either HTTP or HTTPS and keep the protocol they are using as they navigate.
Code
The regular expressions used in this task, ready to run with Pywikipediabot from a user-fixes.py file, are available here:
# -*- coding: utf-8 -*-
# <nowiki>
fixes['wmp-prurls'] = {
# ----
# From <https://meta.wikimedia.org/wiki/User:Invadibot/scope/meta-2/user-fixes.py>.
# By David Abián and Roan Kattouw.
# ----
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details,
# <http://www.gnu.org/licenses/>.
# ----
# To debug this script, please go to
# <https://meta.wikimedia.org/wiki/User:Invadibot/scope/meta-2/user-fixes.py>.
# The goals and procedures are explained in
# <https://meta.wikimedia.org/wiki/User:Invadibot/scope/meta-2>.
# ----
# Thanks for your help!
# ----
'nocase': False,
'recursive': True,
'regex': True,
'msg': {
# Please add an edit summary for your project,
# if not defined, and update the script in
# <https://meta.wikimedia.org/wiki/User:Invadibot/scope/meta-2/user-fixes.py>.
'an':u'[[:m:User:Invadibot/scope/meta-2|Bot]]: Apanyando vinclos enta prochectos Wikipedia y aplicando adrezas URL de protocolo relativo',
'en':u'[[:m:User:Invadibot/scope/meta-2|Bot]]: Fixing links to Wikimedia projects and applying protocol-relative URLs',
'es':u'[[:m:User:Invadibot/scope/meta-2|Bot]]: Arreglando enlaces a proyectos Wikimedia y aplicando direcciones URL de protocolo relativo',
'fa':u'[[:m:User:Invadibot/scope/meta-2|ربات]]: تصحیح پیوند به پروژههای خواهر و تبدیل کردن پیوندها به خنثی در برابر پروتکل',
'foundation':u'[[:m:User:Invadibot/scope/meta-2|Bot]]: Fixing links to Wikimedia projects and applying protocol-relative URLs',
'gl':u'[[:m:User:Invadibot/scope/meta-2|Bot]]: Arranxando ligazóns a proxectos Wikimedia e aplicando enderezos URL de protocolo relativo',
'meta':u'[[:m:User:Invadibot/scope/meta-2|Bot]]: Fixing links to Wikimedia projects and applying protocol-relative URLs',
'test':u'[[:m:User:Invadibot/scope/meta-2|Bot]]: Testing links to Wikimedia projects',
},
'replacements': [
(ur'\[http://([^@:/ ]+\.)wik(ipedia|inews|isource|ibooks|iquote|iversity|tionary|idata|ivoyage|imedia)\.org/', ur'[//\1wik\2.org/'),
(ur'\[http://wik(ipedia|inews|isource|ibooks|iquote|iversity|tionary|idata|ivoyage|imedia)\.org/', ur'[//wik\1.org/'),
(ur'\[http://(www\.)?mediawiki\.org/', ur'[//\1mediawiki.org/'),
(ur'\[http://(www\.)?wikimediafoundation\.org/', ur'[//\1wikimediafoundation.org/'),
(ur'\[//(www\.)?mail\.wikipedia\.org/', ur'[//lists.wikimedia.org/'),
# (?!www\.) keeps "www" from being taken as a language code without rejecting codes that contain a "w" (e.g. sw, war).
(ur'\[//(www\.)?(?!www\.)([^@:/ ]+)\.wikipedia\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:w:\2:\3|\4]]'),
(ur'\[//(www\.)?(?!www\.)([^@:/ ]+)\.wikinews\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:n:\2:\3|\4]]'),
(ur'\[//(www\.)?(?!www\.)([^@:/ ]+)\.wikisource\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:s:\2:\3|\4]]'),
(ur'\[//(www\.)?(?!www\.)([^@:/ ]+)\.wikibooks\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:b:\2:\3|\4]]'),
(ur'\[//(www\.)?(?!www\.)([^@:/ ]+)\.wikiquote\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:q:\2:\3|\4]]'),
(ur'\[//(www\.)?(?!www\.)([^@:/ ]+)\.wikiversity\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:v:\2:\3|\4]]'),
(ur'\[//(www\.)?(?!www\.)([^@:/ ]+)\.wiktionary\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:wikt:\2:\3|\4]]'),
(ur'\[//(www\.)?(?!www\.)([^@:/ ]+)\.wikivoyage\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:wikivoyage:\2:\3|\4]]'),
(ur'\[//(www\.)?wikidata\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:d:\2|\3]]'),
(ur'\[//(www\.)?mediawiki\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:mw:\2|\3]]'),
(ur'\[//(www\.)?wikimediafoundation\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:wmf:\2|\3]]'),
(ur'\[//(www\.)?meta\.wikimedia\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:m:\2|\3]]'),
(ur'\[//(www\.)?outreach\.wikimedia\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:outreach:\2|\3]]'),
(ur'\[//(www\.)?wikitech\.wikimedia\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:wikitech:\2|\3]]'),
(ur'\[//(www\.)?commons\.wikimedia\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]', ur'[[:commons:\2|\3]]'),
(ur'\[http://toolserver\.org/', ur'[//toolserver.org/'),
#
# One of the next lines can be uncommented and adjusted depending
# on the project in which this script is going to run.
#
#(ur'\[\[:?m:([^\]]+)\]\]', ur'[[:\1]]'), # Meta-Wiki
#(ur'\[\[:?d:([^\]]+)\]\]', ur'[[:\1]]'), # Wikidata
#(ur'\[\[:?mw:([^\]]+)\]\]', ur'[[:\1]]'), # MediaWiki
#(ur'\[\[:?outreach:([^\]]+)\]\]', ur'[[:\1]]'), # Outreach
#(ur'\[\[:?commons:([^\]]+)\]\]', ur'[[:\1]]'), # Commons
#(ur'\[\[:?wikitech:([^\]]+)\]\]', ur'[[:\1]]'), # Wikitech
#(ur'\[\[:?w:en:([^\]]+)\]\]', ur'[[:\1]]'), # Wikipedia (replace "en" by the language code)
#(ur'\[\[:?n:en:([^\]]+)\]\]', ur'[[:\1]]'), # Wikinews (replace "en" by the language code)
#(ur'\[\[:?s:en:([^\]]+)\]\]', ur'[[:\1]]'), # Wikisource (replace "en" by the language code)
#(ur'\[\[:?b:en:([^\]]+)\]\]', ur'[[:\1]]'), # Wikibooks (replace "en" by the language code)
#(ur'\[\[:?q:en:([^\]]+)\]\]', ur'[[:\1]]'), # Wikiquote (replace "en" by the language code)
#(ur'\[\[:?v:en:([^\]]+)\]\]', ur'[[:\1]]'), # Wikiversity (replace "en" by the language code)
#(ur'\[\[:?wikt:en:([^\]]+)\]\]', ur'[[:\1]]'), # Wiktionary (replace "en" by the language code)
#(ur'\[\[:?wikivoyage:en:([^\]]+)\]\]', ur'[[:\1]]'), # Wikivoyage (replace "en" by the language code)
#(ur'\[\[:?(foundation|wikimedia|wmf):([^\]]+)\]\]', ur'[[:\2]]'), # Foundation Wiki
#
],
'exceptions': {
'title': [
'\.(css|js|php|py|sh)',
'([Bb]lack|[Gg]r[ae]y|[Ww]hite)[ _]?[Ll]ist',
'([Ss]abliera|[Ss]and[ _]?([Bb]ox|[Pp]ut|[Cc]haschte|[Kk]assen?|[Kk]assinn|[Ll][aå]dan)|([Zz]ona|[Pp][aáà](g|ch)ina)[ _]?de[ _]?([Pp]r(ue[bv]as?|o[bv][ae]s|e[bv]atinas?)|[Tt]estes?))', # You can occasionally comment this line for testing purposes.
u'(صفحه[ _]تمرین|گودال)', #for Persian, no need to make it very general
],
'inside': [
(ur'\[//(www\.)?(?!www\.)([^@:/ ]+)\.[a-z]+\.org/wiki/[^\s\]\?\|]+ (.*?\[\[.*?\]\].*?)+\]'),
(ur'\[//.{500}.*?\]'),
(ur'\[http://(www\.)?(apt|bayes|bayle|brewster|commonsprototype\.tesla\.usability|commons\.prototype|cs|cz|dataset2|de\.prototype|download|dumps|ekrem|emery|en\.prototype|ersch|etherpad|fenari|flaggedrevssandbox|flgrevsandbox|gallium|ganglia|ganglia3|harmon|hume|ipv4\.labs|ipv6and4\.labs|jobs|mlqt\.tesla\.usability|mobile\.tesla\.usability|m|nagios|noboard\.chapters|noc|observium|oldusability|project2|prototype|results\.labs|search|sitemap|snapshot3|stafford|stats|status|svn|test\.prototype|torrus|ubuntu|wiki-mail|yongle)\.wikimedia\.org'),
(ur'\[http://(www\.)?(arbcom\.[a-z]+|download|m|static|wg\.[a-z]+)\.wikipedia\.org'),
(ur'\[http://(www\.)?[^@:/]+\.m\.wikipedia\.org'),
(ur'\[//(www\.)?(ten|test|test2)\.wikipedia\.org'), # To prevent: test.wikipedia -> [[w:test:]]
],
'inside-tags': [
# You can occasionally comment some of these exception tags,
# under your own risk.
'categorytree',
'comment',
'charinsert',
'dynamicpagelist',
'gallery',
'hiero',
'imagemap',
'inputbox',
'invoke',
'math',
'nowiki',
'pagelist',
'pagequality',
'pages',
'poem',
'pre',
'property',
'score',
'section',
'source',
'syntaxhighlight',
'templatedata',
'timeline',
]
}
}
# </nowiki>
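To show how the ordered replacement pairs interact, here is a small Python 3 harness that applies a few of them to sample wikitext. It is a hypothetical stand-in, not Pywikipediabot's own code: apply_fix is an invented name, the patterns are rewritten as r'' strings because the ur'' prefix is not valid Python 3 syntax, and 'recursive': True is assumed to mean "repeat all replacements until the text stops changing".

```python
import re

# A few (pattern, replacement) pairs from the fix above, as Python 3 raw strings.
REPLACEMENTS = [
    (r"\[http://(www\.)?mediawiki\.org/", r"[//\1mediawiki.org/"),
    (r"\[//(www\.)?mediawiki\.org/wiki/([^\s\]\?\|]+) ([^\]]+)\]", r"[[:mw:\2|\3]]"),
    (r"\[http://toolserver\.org/", r"[//toolserver.org/"),
]


def apply_fix(text, replacements, recursive=True):
    """Apply each pair in order; with recursive=True, repeat the whole
    pass until the text no longer changes (hypothetical harness)."""
    while True:
        new = text
        for pattern, repl in replacements:
            # Python 3.5+ substitutes an empty string for an unmatched group such as \1.
            new = re.sub(pattern, repl, new)
        if new == text or not recursive:
            return new
        text = new


print(apply_fix("[http://mediawiki.org/wiki/Manual:Pywikibot Pywikibot manual]",
                REPLACEMENTS))
```

Because the HTTP-to-protocol-relative rule comes before the interwiki rule, a single pass already turns the sample into [[:mw:Manual:Pywikibot|Pywikibot manual]]. If the pairs were listed in the opposite order, one pass would only produce the protocol-relative link, and the repeated passes enabled by the recursive setting would be needed to finish the conversion.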