StrepHit: ?
HTML dumps
Enterprise maintains HTML dumps for 6 Wikipedias (as of 2/25)
WD integration
Hoping to merge into Enterprise
- What about DBpedia?
Coordinated project: Structured Wikipedia
- Currently used by Ecosia + Pleias
Structure pages for external reuse. Do parsing that reusers already do or need
- HuggingFace (talk to Poli) -- detailed drop templates incl numerical conversions
- Other embeddings: often use bespoke parsing (wikitext, not HTML)
- Note the high template/infobox count on some wikis
abstract / entity / sections / infobox / image / ORES scores / revert risk / redirects
- Todo: talk page activity, references, what links here, tables
- Investigation into annotation as upgrade to talk page sections
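A minimal sketch of how one structured page record could be modelled, assuming one field per item in the list above. The class name `StructuredPage`, the field names and types, and the `missing_fields` helper are all hypothetical illustrations, not the actual Enterprise schema.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class StructuredPage:
    """Hypothetical record: fields mirror the list above, not a real schema."""
    title: str
    abstract: str = ""
    entity: Optional[str] = None          # assumed: an identifier for the linked entity
    sections: list[str] = field(default_factory=list)
    infobox: dict[str, str] = field(default_factory=dict)
    image: Optional[str] = None           # assumed: a single lead image reference
    ores_scores: dict[str, float] = field(default_factory=dict)
    revert_risk: Optional[float] = None
    redirects: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty or unset."""
        empty = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == "" or value == [] or value == {}:
                empty.append(f.name)
        return empty


page = StructuredPage(title="Example", abstract="An example.", sections=["History"])
missing = page.missing_fields()
```

A helper like `missing_fields` is one way to see, per wiki, which parts of the structure are populated, which may matter given the uneven template/infobox counts noted above.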
Commons metadata
Releasing among free snapshots.
Other
Classification pages:
Images: extraction and listing
Infoboxes:
References:
Sections:
Pageviews (HalT): by geo, privacy-preserving