Community Wishlist Survey 2022/Larger suggestions/Semantic search

  • Problem: The only tool for searching Wikipedia is currently a simple keyword search that looks for literal matches to the query. To find any piece of information, one must guess specific keywords or phrases that match the wording of the text exactly. Wikipedia holds a vast amount of information, yet much of it is hard to find because of this. Even if one knows which article contains the information, the only way to locate it within that article is keyword matching, which makes specific facts hard to find, especially in long articles.
  • Proposed solution: The proposed solution would be a Wikipedia semantic search which can help users find information using natural language queries. This means that one could enter a question like "What is the deepest point of the ocean?", and they would be directed to the section of the Wikipedia article about the Mariana Trench which explains this fact. This is not only a possibility, but is already used by many search engines like Google (which has many more pages to index than just Wikipedia). This would have a tremendous impact on the future of free knowledge as it would make finding information significantly easier.
  • Who would benefit: The group that would benefit most would be those who are looking for specific information or to have a question of theirs answered.
  • More comments: I have briefly worked on an independent project that allowed for semantic search within a single Wikipedia article (e.g. if the user wanted to know when Tiger Woods began golfing, they would choose the Tiger Woods Wikipedia page and search something like, "When did Tiger Woods start golfing?"). While this is on a smaller scale, it can be extended to search all of Wikipedia. Further, besides readers benefitting from this tool, editors who want to contribute to articles on topics they are familiar with can use it to find those articles (and the sections within them) to which they can contribute.
  • Phabricator tickets:
  • Proposer: Ajshul (talk) 00:34, 11 January 2022 (UTC)
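
To make the idea of being "directed to the section" concrete, here is a minimal sketch of ranking an article's sections against a question. It is not the proposer's project and not how Wikipedia's search works. It uses TF-IDF cosine similarity, which is a lexical stand-in and not true semantic matching; a real system would more likely use learned text embeddings. The names `rank_sections`, `SECTIONS` and `QUESTION`, and the sample section text, are assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sections(question, sections):
    """Rank sections (title -> text) by TF-IDF cosine similarity to the question.

    Returns a list of (score, title) pairs, best match first.
    """
    counts = {title: Counter(tokenize(body)) for title, body in sections.items()}
    n_docs = len(counts)

    # Document frequency: in how many sections does each term occur?
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    def weigh(term_counts):
        # Smoothed inverse document frequency, so unseen terms are still defined.
        return {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        }

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = weigh(Counter(tokenize(question)))
    scored = [(cosine(query, weigh(c)), title) for title, c in counts.items()]
    return sorted(scored, reverse=True)


# Hypothetical, hand-written section text (not copied from any article).
SECTIONS = {
    "Geology": "The trench formed where one tectonic plate subducts beneath another.",
    "Depth": "The Challenger Deep is the deepest known point of the ocean floor.",
    "Exploration": "The first crewed descent took place in 1960.",
}
QUESTION = "What is the deepest point of the ocean?"

if __name__ == "__main__":
    for score, title in rank_sections(QUESTION, SECTIONS):
        print(f"{score:.2f}  {title}")
```

In this sample the literal phrase "deepest point of the ocean" does not occur in any section, so an exact phrase search finds nothing, while the ranking still puts the "Depth" section first. A question phrased with different vocabulary (say, "lowest place in the sea") would defeat this lexical sketch, which is the gap a genuinely semantic model is meant to close.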

Discussion

  • I like the idea of semantic search. To improve search in general, a gear toggle could be added to the search bar to expose additional settings, and other search models could also be offered through that toggle. Mohammad ebz (talk) 13:58, 11 January 2022 (UTC)
    I went to talk to the search team and they pointed out that, sadly, the term "semantic search" is a bit confusing in general, since we are really already doing quite a bit of semantic search, in terms of trying to understand the searcher's intent and interpreting it in different ways.
    Would a question answering service describe what you are thinking of?
    The search team also mentioned that they have discussed this, because it's an interesting idea, but that for the next 2 quarters they will have to focus on migrating to Elasticsearch 7 first. KSiebert (WMF) (talk) 17:58, 12 January 2022 (UTC)
    Something along those lines; however, it might also be useful to not only provide the answer to the question, but also bring the user to the section of the page that contains the answer, so they can see the context surrounding the answer. Ajshul (talk) 15:34, 13 January 2022 (UTC)
    I also think it's a good idea!
    LG,
    Dwain 09:21, 20 January 2022 (UTC)
  • I also think it's a good idea to have search that answers questions algorithmically, the way Google or Wolfram Alpha do. Thingofme (talk) 11:44, 20 January 2022 (UTC)
  • This is out of scope for our team. As such I'm moving this to our Larger suggestions category so the Search team can receive your valuable input (this is not to promise they will be able to deliver on it, but you showing them what you want and how much you want it is still valuable :) I think this is a huge project even for the Search experts! Thanks for participating in the survey, MusikAnimal (WMF) (talk) 02:44, 28 January 2022 (UTC)
  • Semantic search requires semantically structured data to search through, which is what Wikidata has, and I think Wikidata does need to be searchable by something approaching natural language queries. OTOH, trying to make semantic search work for unstructured natural-language text like in Wikipedia is an open research problem AFAIK. Silver hr (talk) 17:07, 3 February 2022 (UTC)
  • Someone has said: The Internet is like a library, where all the books have been thrown in a large pile on the floor.
I agree that searching for information is a difficult task and I appreciate any serious efforts to address it, but I'm skeptical about asking a single natural language question and having the correct answer automatically delivered to you. Instead of doing a full-text search yourself for various keywords, it would be like walking past that unstructured pile of books up to the information desk and asking library staff for the answer. Now they have to perform that full-text search for you, and you have effectively turned your question into Somebody Else's Problem; that is, you are no closer to an answer, since library staff have no magic wand with which only they can summon that answer.
Instead, I believe the search process must involve a dialogue between you and the (automated) information retriever (the "robot librarian"), where that natural language question may be a good introduction, but depending on the exact circumstances and nature of your question, you may have to provide additional details for clarification of what you really want to know, and the robot must be able to request those clarifications, say by presenting a few simultaneous answers to make sure the question was properly understood and any ambiguities are resolved (such as when there are two golf players by the same name etc).
While Wikidata is structured, that structure isn't necessarily perfect or universally understood; your mind may well be structured in a slightly different way, sometimes making it difficult to have Wikidata "understand" what you are really asking for. Also, any information repository, be it Wikidata, a conventional library, or your own brain, is by necessity incomplete, and lack of any information, such as an entry for that hypothetical second golf player named Tiger Woods, is in no way proof that he doesn't exist somewhere. Or the information may well be there, but just not in the context where you expect to find it, and therefore it risks being overlooked.
I will still give my support to this proposal, not because I believe in (or even understand) the exact proposed solution, but because the problem it tries to address is both deserving and well described. If it merely leads to some serious experiments with systematic search dialogues, we have come a long way. --SM5POR (talk) 21:26, 3 February 2022 (UTC)
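
The "robot librarian" dialogue described in the comment above can be sketched as a simple decision rule: answer directly when one candidate clearly wins, and otherwise present the closest few candidates and ask which one was meant. This is only an illustration under assumptions; the function name `answer_or_clarify`, the `margin` threshold, and the scores and labels in the example are all hypothetical and do not come from any existing search system.

```python
from typing import List, Tuple


def answer_or_clarify(
    candidates: List[Tuple[float, str]],
    margin: float = 0.1,
    max_options: int = 3,
) -> Tuple[str, List[str]]:
    """Decide whether to answer directly or ask the user to disambiguate.

    candidates: (score, label) pairs from some upstream ranking step.
    Returns ("answer", [label]) if one candidate clearly wins,
    ("clarify", [labels]) if several are within `margin` of the best score,
    or ("none", []) if there are no candidates at all.
    """
    if not candidates:
        return ("none", [])

    ranked = sorted(candidates, reverse=True)
    best_score = ranked[0][0]

    # Keep every candidate whose score is close to the best one.
    close = [label for score, label in ranked if best_score - score <= margin]
    close = close[:max_options]

    if len(close) == 1:
        return ("answer", close)
    return ("clarify", close)


if __name__ == "__main__":
    # Made-up scores: two golfers with the same name are nearly tied.
    ambiguous = [
        (0.82, "Tiger Woods (golfer, born 1975)"),
        (0.79, "Tiger Woods (hypothetical namesake)"),
        (0.20, "Woods (forest)"),
    ]
    print(answer_or_clarify(ambiguous))
    print(answer_or_clarify([(0.90, "Mariana Trench"), (0.30, "Ocean")]))
```

The choice of `margin` is the interesting design question: set too low, the system answers confidently when it should ask; set too high, every query turns into a dialogue.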

Voting