Community Wishlist Survey 2021/Categories/Find similar images

Find similar images

  • Problem: Many of the new images on Wikimedia Commons have insufficient metadata, but are similar to ones we already have.
  • Who would benefit: Users of Wikimedia Commons
  • Proposed solution: A "find similar" routine that would enable categorisers on Wikimedia Commons to link a new image that lacks a good description to other images of the same object.
  • More comments:
  • Phabricator tickets:
  • Proposer: WereSpielChequers (talk) 22:12, 29 November 2020 (UTC)

Discussion

How would this work? By comparing metadata or by comparing the files? The former is exactly what is lacking ("insufficient metadata"), and the latter seems difficult given the number of files, and because files similar enough to be detected would likely get removed anyway for being too similar. Opalzukor (talk) 19:36, 8 December 2020 (UTC)
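One common way to compare the files themselves, rather than their metadata, is perceptual hashing. The sketch below is only an illustration of that technique, not anything Commons is known to run: it computes a 64-bit "average hash" from a grayscale image that is assumed to be already decoded and downscaled to an 8×8 grid of values from 0 to 255 (real files would first need resizing with an imaging library). The threshold of 5 differing bits is a hypothetical choice, and matches are meant as candidates for a human to review.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0 to 255


def average_hash(pixels: Grid) -> int:
    """Return a 64-bit hash: one bit per pixel, set if brighter than the grid's mean."""
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid (downscale the image first)")
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def looks_similar(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    # Hypothetical threshold: hashes differing in at most `max_distance`
    # of 64 bits are flagged as candidate matches for human review.
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# A left-to-right gradient, a slightly brightened copy, and an inverted copy.
img = [[x * 32 for x in range(8)] for _ in range(8)]
brighter = [[min(255, p + 10) for p in row] for row in img]
flipped = [[255 - p for p in row] for row in img]
print(looks_similar(img, brighter))  # True
print(looks_similar(img, flipped))   # False
```

Because the hash only records whether each region is brighter or darker than average, small changes in exposure or compression leave it unchanged, while a genuinely different picture flips many bits.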

Often new images lacking categories and/or detailed description are used in a specific Wikipedia article by the owner of the image. This kind of indirect metadata could be used automatically by adding the commonscat category of the article (if it exists) to the image. --HeinrichStuerzl (talk) 20:43, 8 December 2020 (UTC)

Perhaps we could use the same technique as Google Images (clicking the camera icon to search by image). Would that be possible? JopkeB (talk) 04:37, 9 December 2020 (UTC)

A better search would allow partial names or words rather than relying on complex AI, e.g. 'suffrag' finding Suffrage, Suffragette and Suffragist. Apologies if this already exists in Commons. Kaybeesquared (talk) 13:42, 9 December 2020 (UTC)
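The partial-word matching suggested above is plain prefix matching and needs no AI. A minimal, case-insensitive sketch, run over a hypothetical list of file titles invented for illustration (not real Commons files):

```python
import re
from typing import Iterable, List


def prefix_search(query: str, titles: Iterable[str]) -> List[str]:
    """Return the titles containing a word that starts with `query` (case-insensitive)."""
    q = query.lower()
    matches = []
    for title in titles:
        words = re.findall(r"\w+", title.lower())  # split the title into words
        if any(w.startswith(q) for w in words):
            matches.append(title)
    return matches


# Hypothetical file titles, for illustration only.
titles = [
    "Suffrage parade, 1913.jpg",
    "Suffragette poster.png",
    "Portrait of a suffragist.jpg",
    "Harbour at dusk.jpg",
]
print(prefix_search("suffrag", titles))
# ['Suffrage parade, 1913.jpg', 'Suffragette poster.png', 'Portrait of a suffragist.jpg']
```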

Voting

While AI might not be good enough to fully automate this, I think that with a little help from AI, a human Wikipedia editor could be 10x or 100x more productive at finding similar images while maintaining high precision through human review. So although it might not be feasible in other use cases, I think it is feasible here with the help of passionate Wikipedians. Xinbenlv (talk) 05:07, 9 December 2020 (UTC)