User:Sj/!/struct

StrepHit: ?

HTML dumps

Enterprise maintains HTML dumps for 6 Wikipedias (as of 2/25)

WD integration

Hoping to merge into Enterprise

What about DBpedia?

Coordinated project: Structured Wikipedia

Repository

Currently used by Ecosia and Pleias

Structure pages for external reuse: do the parsing that reusers already do or need

  • HuggingFace (talk to Poli) -- detailed drop templates incl numerical conversions
  • Other embeddings: often use bespoke parsing (wikitext, not HTML)
  • Note the high template/infobox count on some wikis

abstract / entity / sections / infobox / image / ORES scores / revert risk / redirects
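A rough sketch of what one structured record with the fields above might look like, serialized as one JSON line per page. The class names, field names, types, and sample values here are all hypothetical illustrations, not the actual Enterprise schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Section:
    # One parsed section of a page (hypothetical shape).
    title: str
    text: str


@dataclass
class StructuredPage:
    # Field names mirror the list above; types are assumptions.
    title: str
    abstract: str = ""
    entity: Optional[str] = None          # e.g. a linked Wikidata item ID
    sections: List[Section] = field(default_factory=list)
    infobox: Dict[str, str] = field(default_factory=dict)
    image: Optional[str] = None           # e.g. a lead image file name
    ores_scores: Dict[str, float] = field(default_factory=dict)
    revert_risk: Optional[float] = None
    redirects: List[str] = field(default_factory=list)


def to_json_line(page: StructuredPage) -> str:
    # Serialize one page as a single JSON line (nested dataclasses become dicts).
    return json.dumps(asdict(page), ensure_ascii=False, sort_keys=True)


# Made-up sample record, for illustration only.
sample = StructuredPage(
    title="Example",
    abstract="An example page.",
    sections=[Section(title="History", text="Placeholder text.")],
    infobox={"type": "example"},
    redirects=["Sample"],
)
line = to_json_line(sample)
```

Keeping every field optional except the title lets a reuser consume partial records while the Todo items are still missing.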

Todo: talk page activity, references, what links here, tables
Investigation into annotation as upgrade to talk page sections

Commons metadata

Releasing among free snapshots.

Other

Classification pages:

Images: extraction and listing

Infoboxes:

References:

Sections:

Pageviews (HalT): by geo, privacy-preserving