Research talk:Wikipedia Inconsistency Detection

Ping

Hi! We are pinging you as we noticed you have identified cross-article contradictions on Wikipedia before.

We are developing a research prototype to help editors identify and resolve factual inconsistencies across articles, and we would like to invite you to help evaluate and improve it. We would appreciate your participation and/or feedback.

For more information, please visit Research:Wikipedia Inconsistency Detection - Meta. If interested in participating, please fill out this form: Expression of Interest in Research Participation: Wikipedia Inconsistency Detection Project - Stanford OVAL. As a token of our appreciation, we will provide Amazon gift cards to participants.

@Stifle: @Ruttgc49: @Holly Cheng: @Super Goku V: @Muzilon: @SandyGeorgia: @Bwrs: @Chanakal: @Koavf: @Error: @Charwood12: @Florian Blaschke: @SMcCandlish: @A Shortfall Of Gravitas: @Layzner: @Nlu: @Frete unicolore: @Panamitsu: @Beland: @MtPenguinMonster: @Hairy Dude: Sjsem (talk) 09:05, 13 January 2025 (UTC)

I'm happy to test your system, but I couldn't fill out the form because it requires a Google account.
As an AI engineer myself, in the interest of maximizing the chances of success, I'll also offer some friendly unsolicited advice: I expect the use of an LLM will create accuracy problems. There's no particular reason to expect that sentences produced by attempted information extraction will make the same claims as the articles they are extracted from, because the LLM's training corpus contaminates its output probabilities with a few possibly contradictory and a great many completely unrelated claims. It's possible a good ranking system could keep bad summaries from clogging up the work queue (though an obvious one does not come to mind), or that they will be infrequent enough for humans to simply skip over. But I think a different approach is more promising for quickly homing in on direct contradictions, and it would also produce outputs useful to other applications. (In past projects, I have found that combining a rule-based approach with completely different - often more statistical - approaches can lead to better performance than any single approach alone.)
I would approach this problem by grammatically parsing sentences using an off-the-shelf library like NLTK or spaCy, then transforming the parse into a semantic representation in the form of a small tree (like "Boston" -> is_a -> "city", "Boston" -> part_of -> "Massachusetts"). Named entity recognition techniques and logic of the kind you describe on your project page (like correlating birth and death dates, parents, and names across humans) can be used to resolve nouns as either non-specific (like "table") or specific (like "Abraham Lincoln", or ideally even specific common nouns like "quantum mechanics"). There are many existing ontology languages which can be used to avoid reinventing the wheel and to improve compatibility with other systems. Assertions about specific nouns can then be compared across articles, and also against existing off-the-shelf semantic databases. (And obviously, a lot of processing time and storage space can be saved by comparing the article "Boston" only to other articles that include that string.)
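To make that concrete, here is a rough, untested sketch of the extraction step using spaCy. It assumes the en_core_web_sm model is installed, and the relation names is_a and part_of are just illustrative, not drawn from any established ontology:

```python
# Rough sketch: extract (subject, relation, object) triples from
# copular sentences with spaCy. Assumes en_core_web_sm is installed
# (python -m spacy download en_core_web_sm). Relation names are
# illustrative, not taken from an established ontology.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_triples(sentence):
    """Yield (subject, relation, object) triples from one sentence."""
    doc = nlp(sentence)
    for token in doc:
        # Copular pattern: "<subject> is a <attribute>" -> is_a
        if token.lemma_ == "be" and token.pos_ in ("AUX", "VERB"):
            subjects = [c for c in token.children if c.dep_ == "nsubj"]
            attrs = [c for c in token.children if c.dep_ == "attr"]
            for subj in subjects:
                for attr in attrs:
                    yield (subj.text, "is_a", attr.lemma_)
                    # "... in <place>" attached to the attribute -> part_of
                    for prep in attr.children:
                        if prep.dep_ == "prep" and prep.lemma_ == "in":
                            for pobj in prep.children:
                                if pobj.dep_ == "pobj":
                                    yield (subj.text, "part_of", pobj.text)

print(list(extract_triples("Boston is a city in Massachusetts.")))
# Expected (exact parses vary by model version):
# [('Boston', 'is_a', 'city'), ('Boston', 'part_of', 'Massachusetts')]
```

A real system would of course need many more dependency patterns (appositives, relative clauses, passives), but the output shape - small trees over resolved entities - is the same.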
Because this technique embodies specific knowledge and reasoning, it requires more up-front investment, but this can be kept reasonable by scoping a prototype to specific types of things. Geographical entities might be a good choice: there are so many that human editors have trouble keeping up, they are highly international and those in non-English-speaking countries tend to get neglected, they are easy to reason about, it's easy to create stubs for missing articles, and I've noticed inconsistencies between articles as statuses slowly change (for example, in counts of the number of provinces in a country). People would also be a good choice, because errors in the biographies of living people are highly damaging and a top fact-checking priority for the project. It also vastly simplifies construction to pick an area where things are simply objectively true or false, rather than X being true according to one school of thought and Y according to another.
The logic for detecting inconsistencies would need to distinguish mutually compatible vs. incompatible claims on the same parameters. For example, a state can have many cities, but generally only one capital; a city can generally be part of only one state; a person can have only one birth date. Ideally the system would be able to cope with sentences like "City X is the capital of both Y and Z states" (which happens sometimes) and "sources disagree on whether X was born on date Y or Z"; maybe other pages would be required to note the duality every time one or the other was mentioned, or maybe it would be enough for other pages to be consistent with one of the choices. Some errors should be obvious: asserting "Boston" -> part_of -> "Connecticut" should imply the existence of the article "Boston, Connecticut" and an entry for "Boston" on the list of cities in Connecticut; the absence of either is an error that needs to be checked. Tree-walking can be used for more useful checking, for example to make sure none of the articles about the neighborhoods of Boston claim to be outside any of its ancestors - Suffolk County, Massachusetts, North America. Some care needs to be taken with ambiguous terms; for example, there is a "Boston, Lincolnshire, England, UK" as well as a "Boston, Massachusetts, US", with different properties. Presumably the system would be able to notice when articles confuse the two; linking to the wrong article is a major source of error. Time is also critical; it would be very interesting to search Wikipedia for mentions of someone being involved in an event either before their birth or after their death, or holding an office (e.g. governor) outside the dates asserted by their biography. It could also be useful to detect anachronistic names, where the name of a person or place has changed.
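A toy version of the cardinality check might look like this; the relation names and sample claims are made up for illustration:

```python
# Toy consistency checker over extracted triples. Claims arrive as
# (subject, relation, object, source_article) tuples; the relation
# names and sample data are illustrative only.
from collections import defaultdict

# Relations that may hold for at most one object per subject.
FUNCTIONAL_RELATIONS = {"birth_date", "has_capital", "part_of"}

claims = [
    ("Boston", "part_of", "Massachusetts", "Boston"),
    ("Boston", "part_of", "Connecticut", "List of cities in Connecticut"),
    ("Abraham Lincoln", "birth_date", "1809-02-12", "Abraham Lincoln"),
]

def find_conflicts(claims):
    """Flag functional relations asserted with more than one value."""
    by_key = defaultdict(list)
    for subj, rel, obj, src in claims:
        by_key[(subj, rel)].append((obj, src))
    for (subj, rel), evidence in sorted(by_key.items()):
        values = {obj for obj, _ in evidence}
        if rel in FUNCTIONAL_RELATIONS and len(values) > 1:
            yield subj, rel, evidence

for subj, rel, evidence in find_conflicts(claims):
    print(f"Conflict on ({subj}, {rel}): {evidence}")
# Conflict on (Boston, part_of): [('Massachusetts', 'Boston'),
#   ('Connecticut', 'List of cities in Connecticut')]
```

The ancestor and date-range checks would work in the same spirit, walking the part_of tree or comparing asserted time spans instead of counting values.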
There are many existing knowledge graphs, knowledge bases, ontologies, semantic networks, or whatever you want to call them, which could either inspire the semantic frames and inference logic this system would use, or actually be integrated into the system to avoid starting from scratch. Some of them are already integrated with the machine-readable data on Wikipedia. Interesting systems include BabelNet, UBY, and YAGO (which integrate several smaller networks like WordNet and Wikidata); and Open Mind Common Sense and Cyc/OpenCyc (which target "common sense" assertions typically not explicitly written).
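For example, a single extracted claim could be validated against Wikidata's public SPARQL endpoint. In the sketch below, Q771 and P36 are the real Wikidata identifiers for Massachusetts and "capital"; the script itself, including its User-Agent string, is a made-up demo with error handling and rate limiting omitted:

```python
# Check one claim ("the capital of Massachusetts is Boston") against
# Wikidata. Q771 = Massachusetts, P36 = capital. Error handling and
# rate limiting are omitted from this demo.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?capitalLabel WHERE {
  wd:Q771 wdt:P36 ?capital .   # Massachusetts -> capital
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "inconsistency-checker-demo/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["capitalLabel"]["value"])  # expected: Boston
```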
Regardless of what approach you take, I wish you good luck and good skill with your project!
-- Beland (talk) 21:45, 13 January 2025 (UTC)
  • I am morally opposed to AI and large language models due to their chronic inaccuracy and environmental impact, and am not prepared to support, or assist with, any work involving them. Stifle (talk) 09:26, 14 January 2025 (UTC)
    While I think AI/LLMs are a great thing, I oppose using them in their current state as a tool to write or edit Wikipedia because of the amount of information they hallucinate. Personally, I'd wait a few years before participating in such a thing. But don't get me wrong -- I agree that this could be a really useful tool; I'm just not ready for it yet. Panamitsu (talk) 22:22, 18 January 2025 (UTC)
  • I don't even remember having done this, so I'm not sure how much help I would be. holly {chat} 19:29, 14 January 2025 (UTC)
  • Will try it out. As with some commenters above, I do not at all trust "AI" (LLMs) to write content. They both get things wrong and sometimes fake their sourcing. I've even gotten ChatGPT to literally lose its @#$*ing mind and become utterly incapable of producing correct output, even when it knew it was wrong and why it was wrong (something I'll do a write-up about soon). But I've used a couple of LLMs enough to be familiar with their ability to analyze content semantically. A tool that leverages an LLM to figure out where two written approaches to the same subject are in conflict might be interesting, as long as we do not trust the LLM to determine what the correct version is and write it, nor trust its ability to evaluate the reliability, or the subtle, contextualized meaning, of the sources behind the competing versions.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  17:03, 25 January 2025 (UTC)
This sounds really interesting and could be quite constructive. It's similar to what I had in mind when creating c:Category:Identification of internal inconsistencies and contradictions in Wikimedia projects – could you also make this detect inconsistencies between categories? For example, an article or Commons category that has two different birth years, or has both 'Videos in English' and 'Videos in Spanish' set, or is a subcategory of both 'Indoor climbing' and 'Outdoor sports', would be marked as having contradictory categorization. Inconsistencies between categories across languages could also be marked. Prototyperspective (talk) 18:49, 4 February 2025 (UTC)
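A minimal sketch of the category check suggested above, assuming each page's category set has already been fetched; the mutually exclusive pairs are the illustrative examples from the comment, not an established list:

```python
# Toy check for contradictory categorization. Assumes each page's
# categories were already fetched; the exclusive pairs below are
# illustrative examples only.
MUTUALLY_EXCLUSIVE = [
    ("Videos in English", "Videos in Spanish"),
    ("Indoor climbing", "Outdoor sports"),
]

def contradictory_pairs(categories):
    """Return the exclusive pairs that a page's category set violates."""
    return [
        (a, b) for (a, b) in MUTUALLY_EXCLUSIVE
        if a in categories and b in categories
    ]

page_categories = {"Videos in English", "Videos in Spanish", "2023 videos"}
print(contradictory_pairs(page_categories))
# [('Videos in English', 'Videos in Spanish')]
```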