Research:Wiki Ed student editor contributions to sciences on Wikipedia
2016 is the Year of Science for the Wiki Education Foundation. With this push to create new science content, how do we determine the real impact on Wikipedia? To find the portion of science content generated by Wiki Ed, we analyse the work performed on science articles by the Spring 2016 Wiki Ed cohort. We conclude that while Wiki Ed science output varies substantially with the point in the academic year, at its peak students sustain roughly 6% of all science-related content added.
Research Question
RQ: What portion of science content is generated by Wiki Ed students?
Methods
Defining science articles
Our first challenge is to identify which articles count as science articles. We do this by first finding science WikiProjects and then finding the articles tagged by these projects.
We start with the WikiProject directory, selecting all projects listed under Science. This gives us a diverse list of 295 projects. We then review this list, removing non-top-level projects (e.g. task forces like the Climate change task force) as well as projects that fall well outside a reasonable definition of science (e.g. Internet culture). Our final list of selected projects is given below.
Selected projects
- Abortion
- AIDS
- Aircraft
- Airlines
- Airports
- Algae
- Alternative_fuels
- Alternative_medicine
- Amiga
- Amphibians_and_Reptiles
- Anatomy
- Animal_anatomy
- Animals
- Apple_Inc.
- Aquarium_Fishes
- Aquatic_Invertebrates
- Archaeology
- Arthropods
- Astronomical_objects
- Astronomy
- Audiovisual_telecommunications
- Australian_biota
- Automobile_construction
- Aviation
- Banksia
- Beekeeping
- Beetles
- Bell_System
- Biology
- Biophysics
- Biota_of_Great_Britain_and_Ireland
- Birds
- Bivalves
- Blades
- C++
- Cannabis
- Carnivorous_plants
- Cats
- Cell_Signaling
- Cellular_devices
- Cephalopods
- Cetaceans
- Chemical_and_Bio_Engineering
- Chemicals
- Chemistry
- Civil_engineering
- Climate
- Cognitive_science
- Color
- Computational_Biology
- Computer_graphics
- Computer_music
- Computer_science
- Computer_Security
- Computer_Vision
- Computing
- Cosmology
- Cryptography
- Cryptozoology
- Dams
- Databases
- Dentistry
- Dinosaurs
- Dogs
- Dyslexia
- Earthquakes
- Earth_science
- Eclipses
- Ecology
- Economics
- Ecoregions
- Electrical_engineering
- Electronics
- Elements
- Energy
- Engineering
- Environment
- Equine
- Evolutionary_biology
- Explosives
- Extinction
- Firearms
- First_aid
- Fishes
- Forestry
- Formation_Evaluation
- Free_Software
- Fungi
- Futures_studies
- Game_theory
- Gastropods
- Gemology_and_Jewelry
- Gender_Studies
- Genetics
- Geologic_timescale
- Geology
- Gliding
- Golden_ratio
- Health_and_fitness
- History_of_Biology
- History_of_Nuclear_Enterprise
- History_of_Science
- Horticulture_and_Gardening
- Hospitals
- Human_Genetic_History
- Insects
- IRC
- Java
- KDE
- Lepidoptera
- LGBT_studies
- Linux
- Malware
- Mammals
- Mantodea
- Marine_life
- Mars
- Mass_spectrometry
- Mathematical_and_Computational_Biology
- Mathematics
- Mathematics_Competitions
- Measurement
- Medicine
- Metalworking
- Meteorology
- Method_engineering
- Microbiology
- Microsoft
- Microsoft_Windows
- Mind-Body
- Mining
- Molecular_and_Cellular_Biology
- Monotremes_and_Marsupials
- Moon
- National_Health_Service
- Nature
- .NET
- Neuroscience
- NIH
- NLP_concepts_and_methods
- Non-tropical_storms
- Numbers
- Nursing
- Optics
- Perl
- Pharmacology
- Phasmatodea
- Physical_Chemistry
- Physics
- Physiology
- Plan_9
- Plants
- Pollution
- Polyhedra
- Polymers
- Primates
- Probability
- Programming_languages
- Prokaryotes_and_protists
- Pseudoscience
- Psychedelics,_Dissociatives_and_Deliriants
- Psychology
- Pterosaurs
- Radio_Stations
- RISC_OS
- RNA
- Robotics
- Rocketry
- Rocks_and_minerals
- Rodents
- Sanitation
- Science
- Sea_Monsters
- Seamounts
- Severe_weather
- Sexology_and_sexuality
- Sharks
- Signal_Processing
- Software
- Soil
- Solar_System
- Spaceflight
- Spectroscopy
- Spiders
- Statistics
- Superfunds
- Systems
- Systems_Engineering_Initiative
- Technology
- Telecommunications
- Time
- Trains
- Trains_in_Japan
- Transportation
- Tree_of_Life
- Tropical_cyclones
- Uniform_Polytopes
- Veterinary_medicine
- Viruses
- Volcanoes
- Water
- Women_scientists
- Years_in_science
- Zoo
Collecting science revisions
After selecting science WikiProjects, we still need to identify the pages belonging to the selected projects and the revisions belonging to those pages.
Selecting science pages
We use category labels to identify science pages. Using the replica databases provided at wmflabs, we select all category links indicating a quality rating or importance, whose category names frequently contain the project name. The output of this query is then processed to extract the project name and the associated page id. From this list of project/page-id pairs we select all page ids belonging to the selected projects.
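The extraction step can be sketched roughly as follows. This is a minimal illustration, not the query or script actually used: the category-title pattern (e.g. `B-Class_Physics_articles`, `High-importance_Physics_articles`) is an assumption, real assessment categories vary in naming, and the function names `project_from_category` and `select_page_ids` are hypothetical.

```python
import re

# Assumed title pattern: "<label>-Class_<project>_articles" or
# "<label>-importance_<project>_articles". Real categories may differ.
CATEGORY_RE = re.compile(
    r"^(?P<label>[A-Za-z]+)-(?:Class|importance)_(?P<project>.+)_articles$"
)


def project_from_category(title):
    """Return the project name embedded in a category title, or None."""
    match = CATEGORY_RE.match(title)
    return match.group("project") if match else None


def select_page_ids(category_links, selected_projects):
    """Collect page ids whose category maps to a selected project.

    category_links: iterable of (page_id, category_title) pairs, as might
    be returned by a query against the category links table.
    """
    page_ids = set()
    for page_id, title in category_links:
        if project_from_category(title) in selected_projects:
            page_ids.add(page_id)
    return page_ids
```

A set is used for the result because a page is typically tagged by several rating and importance categories and should be counted once.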
Selecting science revisions
With our science pages identified, we iterate through all enwiki-20160601-stub-meta-history*.xml.gz dumps, selecting every revision belonging to the identified pages and computing the difference in bytes between consecutive revisions.
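The byte-difference computation might look like the sketch below. It assumes the dump has already been parsed into `(page_id, rev_id, length_in_bytes)` tuples, ordered oldest first within each page. The parsing itself is omitted, and the function name `byte_deltas` is hypothetical.

```python
def byte_deltas(revisions, science_page_ids):
    """Yield (page_id, rev_id, delta) for revisions of selected pages.

    revisions: iterable of (page_id, rev_id, length_in_bytes) tuples,
    assumed to be ordered oldest first within each page.
    """
    previous_length = {}
    for page_id, rev_id, length in revisions:
        if page_id not in science_page_ids:
            continue
        # A page's first revision is compared against an empty page.
        delta = length - previous_length.get(page_id, 0)
        previous_length[page_id] = length
        yield page_id, rev_id, delta
```

Treating a page's first revision as a change from zero bytes is a design choice of this sketch. The text does not say how page creations were counted.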
Tallying daily contributions
Using the selected revisions, we tally the number of positive bytes added by Wiki Ed students and by contributors in general. We iterate through our data set of selected revisions, searching back 10 revisions for a matching sha1, which indicates a likely revert. If we find a match, we omit all edits between the original and the reverting edit; the reverting edit is also omitted. For each edit that passes this test and contributes a positive number of bytes, we identify its date and contributor. If the contributor is part of the Wiki Ed cohort, the edit's bytes are added to that date's sum of contributions for Wiki Ed.
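A minimal sketch of the revert filter and the daily tally follows. It assumes each revision of a page is a dict with `sha1`, `delta`, `date` and `user` keys (hypothetical field names), listed oldest first, and that the general total includes student edits. None of these names or choices come from the original analysis code.

```python
from collections import defaultdict

REVERT_RADIUS = 10  # number of earlier revisions searched, per the text


def non_reverted(revisions, radius=REVERT_RADIUS):
    """Return one page's revisions that are not part of a likely revert.

    revisions: list of dicts with a "sha1" key, oldest first.
    """
    dropped = set()
    for i, rev in enumerate(revisions):
        # Search back from the nearest earlier revision for a matching sha1.
        for j in range(i - 1, max(i - radius, 0) - 1, -1):
            if revisions[j]["sha1"] == rev["sha1"]:
                # Drop the edits after the restored revision, up to and
                # including the reverting edit itself.
                dropped.update(range(j + 1, i + 1))
                break
    return [rev for i, rev in enumerate(revisions) if i not in dropped]


def tally_daily(revisions, student_names):
    """Sum positive byte deltas per day for students and for everyone."""
    student_bytes = defaultdict(int)
    all_bytes = defaultdict(int)
    for rev in revisions:
        if rev["delta"] <= 0:
            continue  # only bytes added are counted
        all_bytes[rev["date"]] += rev["delta"]
        if rev["user"] in student_names:
            student_bytes[rev["date"]] += rev["delta"]
    return dict(student_bytes), dict(all_bytes)
```

For a page whose revisions have sha1 values a, b, a, the second and third revisions are dropped: the third restores the first, so the edit in between was reverted and the reverting edit itself adds no new content.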
Results
Examining contribution trends by general users throughout the semester shows that they are relatively consistent, with the exception of some intermittent spikes. The largest of these in science contribution, and in fact in all contribution in the history of Wikipedia, occurred on April 11 and correlates with User:Rfassbind merging lists of 100 minor planets into lists of 1000 minor planets. For the sake of further analysis, contributions made on April 11 were removed from consideration, since on that day the metric was dominated by the shifting around of content rather than its creation, which is the concept we are attempting to measure.
The median level of contribution over Wikipedia's entire lifespan is 1,129,336 bytes/day, while over the past two years alone it is a slightly lower 1,079,130 bytes/day.
Looking at student bytes added to science articles, we see a clear relationship with the academic schedule. Students are more active in the last two months of the typical semester, when term papers are typically due. There are also smaller bursts of activity in mid-February and early March, believed to be correlated with quarter-system classes.
During the spring 2016 semester, general editors produced 192,878,894 bytes (167,649,187 bytes when April 11 is removed). Wiki Ed students produced 4,685,882 bytes of content (4,611,449 bytes omitting April 11) during the same semester. With April 11 omitted, this amounts to 2.8% of all science content added. If we narrow our focus to the most active time in the school year, between April 1 and May 15 (April 11 omitted), Wiki Ed students contribute 5.9% of all science content added.
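The semester-wide share can be reproduced from the totals above. The short check below uses only the byte counts reported in this section, and the helper name `share` is hypothetical. The 5.9% peak-period figure cannot be recomputed here because the April 1 to May 15 totals are not listed.

```python
def share(student_bytes, total_bytes):
    """Fraction of all added bytes that came from Wiki Ed students."""
    return student_bytes / total_bytes


# Totals for the spring 2016 semester, as reported in the text.
with_april_11 = share(4_685_882, 192_878_894)
without_april_11 = share(4_611_449, 167_649_187)

print(f"April 11 included: {with_april_11:.1%}")    # 2.4%
print(f"April 11 omitted:  {without_april_11:.1%}")  # 2.8%
```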