List of Wikipedias by sample of articles/Source code (original)
This is the source code I'm currently using for the statistics at the List of Wikipedias by sample of articles. A few suggestions for improvement have already been made on the talk page; I'll work on a new version when I have time. Of course, other interested people are welcome to work on it too.
The script is simple and straightforward. (NB: the script does not contain the updated page names from the List of articles every Wikipedia should have at Meta; the titles it uses are read from a separate, outdated file called 'yegedalised.txt', the list of articles. The current contents of this file, which are now used for the updates, are listed after the source code.) Note that the variable names and messages are in Volapük. If this makes the code hard to follow, please feel free to contact me. If you notice any obvious problems or bugs I have missed, please let me know. --Smeira 16:37, 29 November 2007 (UTC)
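For each title, the script loads the English article, searches it for an interlanguage link of the form [[xx:Title]], and, if the linked page is a redirect, follows it once. The two lookups can be sketched with regular expressions instead of find() and slicing. This is a Python 3 illustration, not part of the original script: the function names are hypothetical, piped links are not handled, and the caller has to supply any localised redirect keywords, because the script itself only checks for #REDIRECT.

```python
import re


def interlanguage_title(wikitext, lang):
    """Return the title in the first [[lang:Title]] link of wikitext, or None."""
    # Piped links such as [[de:Title|text]] are deliberately not matched.
    match = re.search(r'\[\[' + re.escape(lang) + r':([^\]|]+)\]\]', wikitext)
    return match.group(1).strip() if match else None


def redirect_target(wikitext, keywords=('#REDIRECT',)):
    """Return the target title if wikitext is a redirect page, else None.

    Redirect keywords are localised (German Wikipedia also accepts
    #WEITERLEITUNG), so callers pass the keywords valid for their wiki.
    Matching is case-insensitive and any #section part is dropped.
    """
    alternatives = '|'.join(re.escape(k) for k in keywords)
    # re.match anchors at the start, so the keyword must open the page.
    pattern = r'\s*(?:' + alternatives + r')\s*:?\s*\[\[([^\]|#]+)'
    match = re.match(pattern, wikitext, re.IGNORECASE)
    return match.group(1).strip() if match else None
```

Anchoring the redirect check at the start of the page also avoids treating an ordinary article that merely mentions "#REDIRECT" as a redirect.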
Note: pukalised contains only five Wikipedias; each time I run the script, I change them to the ones I want to look at.
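Instead of editing pukalised before every run, the language codes could be taken from the command line. A minimal Python 3 sketch, assuming a hypothetical helper language_codes() and a hypothetical script name stats.py; with no arguments it falls back to the five codes listed in the script.

```python
import sys

# Hypothetical default: the five codes currently hard-coded in pukalised.
DEFAULT_LANGUAGES = ['en', 'de', 'fr', 'it', 'ja']


def language_codes(argv):
    """Return the language codes given after the script name, or the defaults."""
    codes = [arg.strip().lower() for arg in argv[1:] if arg.strip()]
    return codes or list(DEFAULT_LANGUAGES)


if __name__ == '__main__':
    # e.g. "python stats.py eo vo io" prints ['eo', 'vo', 'io']
    print(language_codes(sys.argv))
```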
# -*- coding: utf_8 -*-
# Python 2 script for the old pywikipedia framework.
import sys
sys.path.append('c:\\Sergio\\Python2.5\\pywikipedia')
import wikipedia
import pagegenerators   # not used in this script
import catlib           # not used in this script
lingl = wikipedia.Site('en', 'wikipedia')   # English Wikipedia: the titles are looked up here
pukalised = ['en', 'de', 'fr', 'it', 'ja']  # language codes to examine (changed before each run)
npuks = len(pukalised)
# pukataib[0]: article titles; pukataib[n]: article sizes for language n.
pukataib = [[] for i in range(npuks + 1)]
# sekataib[n]: [absent, < 10k, 10-30k, > 30k, average size] for language n.
sekataib = [[] for i in range(npuks + 1)]
pukanum = 1
for puk in pukalised:
    pukavuk = wikipedia.Site(puk, 'wikipedia')
    if puk == 'en': pukavuk = lingl
    ragiv = open("C:\\Sergio\\Python2.5\\pywikipedia\\xxxfiles\\yegedalised.txt")
    yegedanum = 0
    zaned = 0   # running total of article sizes
    sekataib[pukanum] = [0, 0, 0, 0, 0]
    # Header line naming the current language.
    print u'\n======================\nPÜKAMALAT: ',puk,'\n======================\n'
    for lien in ragiv:
        # Strip only the line ending (lien[:-1] cut a character off an unterminated last line).
        yeged = lien.rstrip('\r\n').decode('cp1252')
        if not yeged: continue   # skip blank lines
        if pukanum == 1: pukataib[0].append(yeged)   # store each title once
        linglapad = wikipedia.Page(lingl, yeged)
        linglavodem = linglapad.get()
        pukavodem = linglavodem
        if puk != 'en':
            # Find the interlanguage link [[puk:Title]] in the English article.
            plad1 = linglavodem.find(u'[[' + puk + u':')
            if plad1 > -1:
                plad1 = linglavodem.find(u':', plad1)
                pladf = linglavodem.find(u']]', plad1)
                pukayeged = linglavodem[plad1+1:pladf]
                pukapad = wikipedia.Page(pukavuk, pukayeged)
                pukavodem = pukapad.get(get_redirect=True)
                # Follow one redirect. Only the English keyword is recognised
                # (case-insensitively); localised redirect keywords are not.
                if pukavodem.lstrip().upper().startswith(u'#REDIRECT'):
                    plad1 = pukavodem.find('[[')
                    pladf = pukavodem.find(']]', plad1)
                    pukayeged = pukavodem[plad1+2:pladf]
                    pukapad = wikipedia.Page(pukavuk, pukayeged)
                    pukavodem = pukapad.get()
            else:
                pukavodem = ''   # no interlanguage link: the article counts as absent
        gretot = len(pukavodem)   # article size in characters
        pukataib[pukanum].append(gretot)
        # Size classes; sizes of exactly 10000 or 30000 used to be left uncounted.
        if gretot == 0:
            sekataib[pukanum][0] = sekataib[pukanum][0] + 1
        elif gretot < 10000:
            sekataib[pukanum][1] = sekataib[pukanum][1] + 1
        elif gretot < 30000:
            sekataib[pukanum][2] = sekataib[pukanum][2] + 1
        else:
            sekataib[pukanum][3] = sekataib[pukanum][3] + 1
        print pukataib[0][yegedanum], pukataib[pukanum][yegedanum]
        zaned = zaned + gretot
        yegedanum = yegedanum + 1
    ragiv.close()
    # Average size: total divided by the number of articles (was yegedanum-1).
    sekataib[pukanum][4] = zaned // max(yegedanum, 1)
    print sekataib[pukanum]
    pukanum = pukanum + 1
# Console summary: one row per language with the class counts, score and average size.
print '\n\n'
print 'SEKATAIB PEKALKULON.\n------- ------------\n\n'
print u'Pük:',' N/Db ',' <10k ','10-30k',' >30k '
for puk in range(npuks):
    grad = 0
    print pukalised[puk].ljust(4),
    for num in range(4):
        volad = sekataib[puk+1][num]
        print str(volad).rjust(6),
        grad = grad + volad*((num)**2)   # class weights 0, 1, 4, 9
    print ' .... ', grad, ' (yegedagretot zanedik: ', sekataib[puk+1][4],' b).'
print
print 'KALKULAM EFINIKON...'
print
# Wiki-markup table for the results page.
vtaib = '{|border="1" cellpadding="2" cellspacing="0" style="width:100%; background: #f9f9f9; border: 1px solid #aaaaaa; border-collapse: collapse; white-space: nowrap; text-align: center"'
vtaib = vtaib + '\n|-\n'
vtaib = vtaib + u'!width = 15 | № !! width = 25 | Lang. !! width = 150 | Average Article Size<br>(chars) !! width = 70 | Absent<br>(0k) !! width=70| Stubs<br>(< 10k)!! width = 70 | Articles<br>(10-30k) !! width = 70 | Long Art.<br>(> 30k) !! Score'
vtaib = vtaib + '\n|-\n'
for puk in range(npuks):
    grad = 0
    vtaib = vtaib + '|' + str(puk+1) + '\n'
    vtaib = vtaib + '| [[:' + pukalised[puk] + ':|' + pukalised[puk] + ']]\n'
    vtaib = vtaib + '| ' + str(sekataib[puk+1][4]) + '\n'
    for num in range(4):
        volad = sekataib[puk+1][num]
        vtaib = vtaib + '| ' + str(volad) + '\n'
        grad = grad + volad*((num)**2)
    vtaib = vtaib + '| ' + str(grad) + '\n|-\n'
# Replace the final "-\n" of the last row separator with "}" to close the table.
vtaib = vtaib[:-2] + '}'
print vtaib
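Both the console summary and the wiki table score a language by weighting each size class by the square of its index: absent articles count 0, stubs (< 10k) 1, articles (10-30k) 4 and long articles (> 30k) 9. The calculation can be isolated as a few plain Python 3 functions. This is a sketch, not the script itself: the function names are hypothetical, sizes are character counts as returned by len() on the page text, and putting exactly 10,000 into the 10-30k class and exactly 30,000 into the > 30k class is an assumption.

```python
# Size classes, in order: absent, stub, article, long article.
LIMITS = (10000, 30000)  # class boundaries in characters


def size_class(size):
    """Return 0 for a missing article, 1 for < 10k, 2 for 10k-30k, 3 for >= 30k."""
    if size == 0:
        return 0
    if size < LIMITS[0]:
        return 1
    if size < LIMITS[1]:
        return 2
    return 3


def summarize(sizes):
    """Return (counts per class, score, integer average size) for one language."""
    counts = [0, 0, 0, 0]
    for size in sizes:
        counts[size_class(size)] += 1
    # Each class is weighted by the square of its index: 0, 1, 4, 9.
    score = sum(count * index ** 2 for index, count in enumerate(counts))
    average = sum(sizes) // len(sizes) if sizes else 0
    return counts, score, average
```

For example, summarize([0, 5000, 10000, 29999, 30000, 45000]) gives the counts [1, 1, 2, 2], the score 27 (1×1 + 2×4 + 2×9) and the average 19999.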
List of articles
Here is the content of the yegedalised.txt (list of articles) file.
Brigitte Bardot
Sarah Bernhardt
Marlon Brando
Charlie Chaplin
Marlene Dietrich
Marx Brothers
Marilyn Monroe
Sandro Botticelli
Pieter Bruegel the Elder
Le Corbusier
Leonardo da Vinci
Salvador Dalí
Donatello
Albrecht Dürer
Vincent van Gogh
Francisco Goya
Frida Kahlo
Henri Matisse
Michelangelo
Pablo Picasso
Jackson Pollock
Raphael
Rembrandt
Diego Velázquez
Andy Warhol
Frank Lloyd Wright
Peter Paul Rubens
Abu Nuwas
Arnaut Daniel
Matsuo_Bash%C5%8D
Samuel Beckett
Jorge Luis Borges
George Gordon Byron, 6th Baron Byron
Luís de Camões
Miguel de Cervantes
Geoffrey Chaucer
Anton Chekhov
Dante Alighieri
Rubén Darío
Charles Dickens
Fyodor Dostoevsky
Ferdowsi
Fuzûlî
Gabriel García Márquez
Johann Wolfgang von Goethe
Homer
Horace
Victor Hugo
Henrik Ibsen
James Joyce
Franz Kafka
K%C4%81lid%C4%81sa
Omar Khayyám
Li Bai
Naguib Mahfouz
John Milton
Molière
Vladimir Nabokov
Ovid
Edgar Allan Poe
Munshi Premchand
Marcel Proust
Alexander Pushkin
Arthur Rimbaud
Shota Rustaveli
José Saramago
Sappho
William Shakespeare
Sophocles
Snorri Sturluson
J. R. R. Tolkien
Leo Tolstoy
Mark Twain
Virgil
Oscar Wilde
Wu Cheng'en
William Butler Yeats
Johann Sebastian Bach
The Beatles
Ludwig van Beethoven
Hector Berlioz
Anton Bruckner
Johannes Brahms
Pyotr Ilyich Tchaikovsky
Frédéric Chopin
Anton%C3%ADn Dvo%C5%99%C3%A1k
George Frideric Handel
Jimi Hendrix
Michael Jackson
Madonna (entertainer)
Gustav Mahler
Wolfgang Amadeus Mozart
Giacomo Puccini
Elvis Presley
The Rolling Stones
Franz Schubert
Bed%C5%99ich Smetana
Robert Schumann
Jean Sibelius
Igor Stravinsky
Giuseppe Verdi
Antonio Vivaldi
Richard Wagner
Roald Amundsen
Neil Armstrong
Jacques Cartier
Christopher Columbus
James Cook
Hernán Cortés
Yuri Gagarin
Vasco da Gama
Ferdinand Magellan
Marco Polo
Zheng He
Alexander von Humboldt
Ingmar Bergman
Walt Disney
Federico Fellini
Alfred Hitchcock
Stanley Kubrick
Akira Kurosawa
George Lucas
Steven Spielberg
Archimedes
Alexander Graham Bell
Tim Berners-Lee
Tycho Brahe
Nicolaus Copernicus
Marie Curie
Charles Darwin
Thomas Edison
Albert Einstein
Euclid
Leonhard Euler
Michael Faraday
Enrico Fermi
Fibonacci
Henry Ford
Joseph Fourier
Galileo Galilei
Carl Friedrich Gauss
Johannes Gutenberg
Ernst Haeckel
James Prescott Joule
Johannes Kepler
John Maynard Keynes
Muhammad ibn M%C5%ABs%C4%81 al-Khw%C4%81rizm%C4%AB
Gottfried Leibniz
Carl Linnaeus
James Clerk Maxwell
Dmitri Mendeleev
Antonio Meucci
Isaac Newton
Blaise Pascal
Louis Pasteur
Max Planck
Ernest Rutherford
Erwin Schrödinger
Richard Stallman
Nikola Tesla
Alan Turing
James Watt
Wright brothers
Thomas Aquinas
Aristotle
Augustine of Hippo
Avicenna
Giordano Bruno
Simone de Beauvoir
Noam Chomsky
René Descartes
Émile Durkheim
Francis of Assisi
Sigmund Freud
Georg Wilhelm Friedrich Hegel
Herodotus
Hippocrates
Immanuel Kant
John Locke
Martin Luther
Rosa Luxemburg
Niccolò Machiavelli
Karl Marx
Friedrich Nietzsche
Paul the Apostle
Plato
Pythagoras
Jean-Jacques Rousseau
Jean-Paul Sartre
Adam Smith
Socrates
Sun Tzu
Voltaire
Max Weber
Ludwig Wittgenstein
Akbar the Great
Alexander the Great
Mustafa Kemal Atatürk
Augustus
David Ben-Gurion
Otto von Bismarck
Simón Bolívar
Napoleon I of France
George W. Bush
Julius Caesar
Charlemagne
Winston Churchill
Empress Dowager Cixi
Cleopatra VII
Constantine I
Charles de Gaulle
Indira Gandhi
Elizabeth I of England
Genghis Khan
Haile Selassie I of Ethiopia
Hirohito
Adolf Hitler
Vladimir Lenin
Louis XIV of France
Nelson Mandela
Mao Zedong
Benito Mussolini
Kwame Nkrumah
Peter I of Russia
Qin Shi Huang
Saladin
Joseph Stalin
Margaret Thatcher
Harry S. Truman
Victoria of the United Kingdom
George Washington
Abraham
Moses
Jesus
Muhammad
Gautama Buddha
Osama bin Laden
Mohandas Karamchand Gandhi
Emma Goldman
Joan of Arc
Helen Keller
Martin Luther King, Jr.
Mother Teresa
Florence Nightingale
Rosa Parks
Che Guevara
History
Prehistory
Stone Age
Bronze Age
Iron Age
Mesopotamia
Ancient Egypt
Ancient Greece
Roman Empire
Age of Enlightenment
Aztec
Byzantine Empire
Crusades
Holy Roman Empire
Hundred Years' War
Middle Ages
Mongol Empire
Ming Dynasty
Ottoman Empire
Protestant Reformation
Renaissance
Thirty Years' War
Viking
American Civil War
History of South Africa in the Apartheid era
British Empire
Cold War
French Revolution
Great Depression
Gulf War
The Holocaust
Industrial Revolution
Korean War
Nazi Germany
Russian Revolution (1917)
Qing Dynasty
Spanish Civil War
Treaty of Versailles
Vietnam War
World War I
World War II
Geography
Capital City
Continent
Country
Desert
Earth science
Map
North Pole
Ocean
Rainforest
River
Sea
South Pole
Africa
Antarctica
Asia
Europe
Latin America
Middle East
North America
Oceania
South America
Afghanistan
Algeria
Argentina
Australia
Austria
Bangladesh
Belgium
Brazil
Canada
China
People's Republic of China
Democratic Republic of the Congo
Egypt
Ethiopia
France
Germany
Greece
India
Indonesia
Iran
Iraq
Republic of Ireland
Israel
Italy
Japan
Mexico
Netherlands
Pakistan
Poland
Russia
Saudi Arabia
Singapore
South Africa
South Korea
Spain
Sudan
Switzerland
Tanzania
Thailand
Turkey
Ukraine
United Kingdom
United States
Vietnam
Portugal
Amsterdam
Athens
Baghdad
Bangkok
Beijing
Beirut
Berlin
Brisbane
Brussels
Buenos Aires
Cairo
Canberra
Cape Town
Chicago
Damascus
Dar es Salaam
Dublin
Edinburgh
Florence
Hong Kong
Istanbul
Jakarta
Jerusalem
Karachi
Kyoto
Los Angeles, California
London
Mecca
Melbourne
Mexico City
Milan
Moscow
Mumbai
Nairobi
Naples
New Delhi
New York City
Paris
Rio de Janeiro
Rome
Seoul
Shanghai
Singapore
Sydney
Tehran
Tel Aviv
Tokyo
Venice
Vienna
Washington, D.C.
Amazon River
Aral Sea
Arctic Ocean
Atlantic Ocean
Baltic Sea
Black Sea
Caribbean Sea
Caspian Sea
Congo River
Danube
Dead Sea
Euphrates
Ganges
Great Barrier Reef
Great Lakes
Indian Ocean
Indus River
Lake Baikal
Lake Tanganyika
Lake Titicaca
Lake Victoria
Mediterranean Sea
Mississippi River
Niagara Falls
Niger River
Nile
North Sea
Pacific Ocean
Panama Canal
Rhine
Suez Canal
Southern Ocean
Tigris
Volga River
Yangtze River
Alps
Andes
Himalayas
Mount Kilimanjaro
Mount Everest
Rocky Mountains
Sahara
Society
Civilization
Education
Family
Child
Man
Marriage
Woman
Behavior
Emotion
Love
Thought
Politics
Anarchism
Colonialism
Communism
Conservatism
Democracy
Dictatorship
Diplomacy
Fascism
Globalization
Government
Ideology
Imperialism
Liberalism
Marxism
Monarchy
Nationalism
Nazism
Republic
Socialism
State
Political party
Propaganda
Economics
Macroeconomics
Microeconomics
Agriculture
Capital (economics)
Capitalism
Currency
Euro
Japanese yen
United States dollar
Industry
Money
Tax
Law
Constitution
African Union
Arab League
Association of Southeast Asian Nations
Commonwealth of Independent States
Commonwealth of Nations
European Union
International Red Cross and Red Crescent Movement
NATO
Nobel Prize
OPEC
United Nations
International Atomic Energy Agency
International Court of Justice
International Monetary Fund
UNESCO
Universal Declaration of Human Rights
World Health Organization
World Bank Group
World Trade Organization
Civil war
Military
Peace
War
Abortion
Capital punishment
Human rights
Racism
Slavery
Culture
Art
Comics
Painting
Photography
Sculpture
Pottery
Dance
Fashion
Theatre
Cannes Film Festival
Language
Alphabet
Chinese character
Cyrillic alphabet
Greek alphabet
Latin alphabet
Letter (alphabet)
Grammar
Noun
Syntax
Verb
Linguistics
Literacy
Literature
Prose
Fiction
Novel
One Thousand and One Nights
Poetry
Epic of Gilgamesh
Iliad
Mah%C4%81bh%C4%81rata
Ramayana
Pronunciation
Arabic language
Bengali language
Chinese language
English language
Esperanto
French language
German language
Greek language
Hebrew language
Hindi
Interlingua
Italian language
Japanese language
Latin
Persian language
Russian language
Sanskrit
Spanish language
Tamil language
Turkish language
Word
Writing
Architecture
Arch
Bridge
Canal
Dam
Dome
House
Aswan Dam
Colosseum
Great Wall of China
Eiffel Tower
Empire State Building
Hagia Sophia
Parthenon
Giza pyramid complex
St. Peter's Basilica
Taj Mahal
Pyramid
Tower
Film
Animation
Radio
Television
Music
Song
Blues
Classical music
Opera
Symphony
Electronic music
Folk music
Jazz
Pop music
Reggae
Rhythm and blues
Rock and roll
Hard rock
New Age music
Drum
Flute
Guitar
Piano
Trumpet
Violin
Game
Backgammon
Chess
Go (board game)
Playing card
Gambling
Martial arts
Judo
Karate
Olympic Games
Sport
American football
Auto racing
Badminton
Baseball
Basketball
Cricket
Fencing
Football (soccer)
Golf
Horse racing
Ice hockey
Tennis
Rugby union
Wrestling
Athletics (track and field)
Toy
Deity
God
Mythology
Atheism
Fundamentalism
Materialism
Monotheism
Polytheism
Soul
Religion
Bahá'í Faith
Buddhism
Christianity
Roman Catholic Church
Confucianism
Hinduism
Islam
Jainism
Judaism
Shinto
Sikhism
Taoism
Haitian Vodou
Zoroastrianism
Spirituality
Philosophy
Beauty
Dialectic
Ethics (philosophy)
Epistemology
Feminism
Free will
Knowledge
Logic
Mind
Morality
Reality
Truth
Science
Astronomy
Asteroid
Big Bang
Black hole
Comet
Galaxy
Milky Way
Light-year
Moon
Planet
Earth
Jupiter
Mars
Mercury (planet)
Neptune
Saturn
Uranus
Venus
Solar System
Star
Sun
Universe
Biology
DNA
Enzyme
Protein
Botany
Death
Suicide
Ecology
Endangered species
Domestication
Life
Scientific classification
Species
Metabolism
Digestion
Photosynthesis
Respiration (physiology)
Evolution
Reproduction
Asexual reproduction
Sexual reproduction
Heterosexuality
Homosexuality
Pregnancy
Sex
Female
Male
Sexual intercourse
Anatomy
Cell
Circulatory system
Blood
Heart
Endocrine system
Gastrointestinal tract
Colon (anatomy)
Small intestine
Liver
Integumentary system
Breast
Skin
Muscle
Nervous system
Brain
Sensory system
Auditory system
Ear
Gustatory system
Olfactory system
Somatosensory system
Visual system
Eye
Reproductive system
Penis
Vagina
Respiratory system
Lung
Skeleton
Medicine
Addiction
Alcoholism
Drug addiction
Alzheimer's disease
Cancer
Cholera
Acute Viral Nasopharyngitis (Common Cold)
Dentistry
Disability
Blindness
Hearing impairment
Mental disorder
Disease
Medication
Ethanol
Nicotine
Tobacco
Drug
Health
Headache
Myocardial infarction
Heart disease
Malaria
Malnutrition
Obesity
Pandemic
Penicillin
Pneumonia
Poliomyelitis
Sexually transmitted disease
AIDS
Stroke
Tuberculosis
Diabetes mellitus
Virus
Influenza
Smallpox
Organism
Animal
Arthropod
Insect
Ant
Bee
Butterfly
Arachnid
Chordate
Amphibian
Frog
Bird
Columbidae
Fish
Shark
Mammal
Ape
Human
Camel
Cat
Cattle
Dog
Dolphin
Elephant
Horse
Domestic sheep
Lion
Pig
Whale
Reptile
Dinosaur
Snake
Archaea
Bacteria
Fungus
Plant
Flower
Tree
Protist
Chemistry
Biochemistry
Chemical compound
Acid
Base (chemistry)
Salt
Chemical element
List of elements by name
Periodic table
Aluminium
Carbon
Copper
Gold
Helium
Hydrogen
Iron
Neon
Nitrogen
Oxygen
Silver
Tin
Zinc
Metal
Alloy
Steel
Organic chemistry
Alcohol
Carbohydrate
Hormone
Lipid
Phase (matter)
Gas
Liquid
Plasma (physics)
Solid
Avalanche
Climate
El Niño-Southern Oscillation
Global warming
Earthquake
Geology
Mineral
Diamond
Plate tectonics
Rock (geology)
Natural disaster
Volcano
Weather
Cloud
Flood
Tsunami
Rain
Acid rain
Snow
Tornado
Tropical cyclone
Physics
Acceleration
Atom
Energy
Conservation of energy
Electromagnetic radiation
Infrared
Visible spectrum
Color
Ultraviolet
Gamma ray
Force
Electromagnetism
Gravitation
Nuclear force
Light
Magnet
Magnetic field
Mass
Molecule
Quantum mechanics
Sound
Speed
Speed of light
Speed of sound
Theory of relativity
Time
Velocity
Weight
Length
Anno Domini
Calendar
Gregorian calendar
Century
Day
Minute
Millennium
Month
Time zone
Daylight saving time
Week
Year
Technology
Biotechnology
Clothing
Cotton
Engineering
Lever
Pulley
Screw
Wedge (mechanical device)
Wheel
Irrigation
Plough
Metallurgy
Nanotechnology
Communication
Book
Information
Encyclopedia
Journalism
Newspaper
Mass media
Printing
Rail transport
Telephone
Mobile phone
Electronics
Electric current
Frequency
Capacitor
Inductor
Transistor
Diode
Resistor
Transformer
Computer
Hard disk drive
Processor
Random access memory
Artificial intelligence
Information technology
Algorithm
Internet
E-mail
World Wide Web
Web browser
Operating system
Programming language
Computer software
User interface
Keyboard (computing)
Computer display
Mouse (computing)
Energy (society)
Renewable energy
Electricity
Nuclear power
Fossil fuel
Internal combustion engine
Steam engine
Fire
Glass
Paper
Plastic
Wood
Transport
Aircraft
Automobile
Bicycle
Boat
Ship
Train
Weapon
Axe
Explosive material
Gunpowder
Firearm
Machine gun
Nuclear weapon
Sword
Tank
Food
Bread
Cereal
Barley
Maize
Oat
Rice
Rye
Sorghum
Wheat
Cheese
Chocolate
Honey
Fruit
Apple
Banana
Grape
Legume
Soybean
Lemon
Nut (fruit)
Meat
Sugar
Vegetable
Potato
Beer
Wine
Coffee
Milk
Tea
Water
Juice
Mathematics
Algebra
Arithmetic
Axiom
Calculus
Geometry
Circle
Pi
Square
Triangle
Group theory
Mathematical proof
Number
Complex number
Integer
Natural number
Prime number
Rational number
Infinity
Set theory
Statistics
Trigonometry
Measurement
Joule
Kilogram
Litre
Metre
Newton
International System of Units
Volt
Watt
Second
Kelvin
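Several entries in the file, such as Matsuo_Bash%C5%8D and Anton%C3%ADn Dvo%C5%99%C3%A1k, are percent-encoded UTF-8, presumably because the script decodes the file as cp1252, which has no ō or ř. A minimal Python 3 sketch (not the script's own Python 2 code) that reads such a file one title per line and decodes the escapes explicitly. parse_titles and read_titles are hypothetical helper names, and the sketch assumes no real title contains a literal '%'.

```python
from urllib.parse import unquote


def parse_titles(lines):
    """Turn raw lines of the list file into clean article titles."""
    titles = []
    for line in lines:
        title = line.rstrip('\r\n')
        if not title:
            continue  # ignore blank lines, e.g. a trailing empty line
        # Decode %XX escapes as UTF-8 and use spaces instead of underscores.
        titles.append(unquote(title).replace('_', ' '))
    return titles


def read_titles(path, encoding='cp1252'):
    """Read the list file; cp1252 matches the decoding used by the original script."""
    with open(path, encoding=encoding) as handle:
        return parse_titles(handle)
```

Keeping the parsing separate from the file access makes it easy to check the decoding on a few sample lines without touching the disk.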