Research talk:Automated classification of article importance/Work log/2017-03-27
Monday, March 27, 2017
Today I will continue my conversation with WPMED, and continue training and evaluating a classifier using the clickstream data.
WPMED categories of Low-importance
Certain types of articles in WPMED are automatically Low-importance. Per their importance scale, these include at least: very rare diseases, lesser-known medical signs, equipment, hospitals, individuals, historical information, publications, laws, investigational drugs, detailed genetic and physiological information, and obscure anatomical features. Or, per our conversation on their talk page: "people, books, laws, journals, organizations". Can we easily identify some of these categories? Let's look at some examples:
Title | Potential categories | Wikidata |
---|---|---|
Patient Protection and Affordable Care Act | Anything matching "legislation"? | instance of "legislation" |
Alexander Fleming | "People from…", "People educated at…", and several others | instance of "human" |
Health Insurance Portability and Accountability Act | Matches against "legislation"? | Nothing |
Medicare (United States) | Not sure | instance of "government program", "publicly funded health care", and "health insurance in the United States" |
Benjamin Rush | Births, deaths, "People from…", etc… | instance of "human" |
JAMA (journal) | "…medical journals", "Publications established in…" | instance of "scientific journal" |
Merck & Co. | "Companies listed on…", "Companies based in…", "Pharmaceutical companies…" | instance of "business enterprise" |
Identifying individuals should be feasible; I'm not sure about the others. I'll dig more into the Low-importance mispredictions to see what I can find.
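To make the idea concrete, here is a minimal sketch of how the two signals in the table (Wikidata "instance of" labels and category-name patterns) could be combined into a rule that flags likely Low-importance articles. Everything in it is an assumption for illustration: the function name `low_importance_evidence`, the label set, and the regular expressions are hypothetical and drawn only from the examples above, and the category names in the usage example are made up. It does not query Wikidata or the category graph; it assumes those values have already been fetched.

```python
import re

# Hypothetical rule sets, based only on the examples in the table above.
# "government program" is deliberately left out: it is unclear whether
# articles like Medicare should be treated as automatically Low-importance.
LOW_IMPORTANCE_INSTANCE_OF = {
    "human",
    "legislation",
    "scientific journal",
    "business enterprise",
}

LOW_IMPORTANCE_CATEGORY_PATTERNS = [
    re.compile(r"^People (from|educated at)\b"),
    re.compile(r"\b(births|deaths)$"),
    re.compile(r"legislation", re.IGNORECASE),
    re.compile(r"medical journals$"),
    re.compile(r"^Publications established in\b"),
    re.compile(r"^(Pharmaceutical companies|Companies (listed on|based in))\b"),
]


def low_importance_evidence(instance_of, categories):
    """Return the reasons an article looks like a Low-importance candidate.

    instance_of: set of Wikidata "instance of" labels (already fetched).
    categories:  list of Wikipedia category names (already fetched).
    An empty list means neither signal fired.
    """
    reasons = [
        f"wikidata:{label}"
        for label in sorted(instance_of)
        if label in LOW_IMPORTANCE_INSTANCE_OF
    ]
    for category in categories:
        if any(p.search(category) for p in LOW_IMPORTANCE_CATEGORY_PATTERNS):
            reasons.append(f"category:{category}")
    return reasons


if __name__ == "__main__":
    # Illustrative inputs only; the category names are invented.
    print(low_importance_evidence({"human"}, ["People from Exampletown"]))
    print(low_importance_evidence(set(), ["Example health care legislation"]))
    print(low_importance_evidence({"government program"}, []))
```

Returning the list of matched reasons rather than a boolean keeps the two signals separate, which should help when inspecting mispredictions: a case like the Health Insurance Portability and Accountability Act, where Wikidata has nothing, would show only a category match.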