Grants talk:Project/AICAT

Interesting idea


Curious to see how it goes. --Jura1 (talk) 09:25, 3 February 2018 (UTC)

Very interesting, we’ll certainly be taking a look at this! One thing, though: I can’t find the section you’re linking to on that page. Would you be able to provide some more indication of how to find it? Linasv (talk) 21:43, 18 February 2018 (UTC)
It has been archived in the meantime: d:Wikidata:Project_chat/Archive/2018/02#Interesting_initiative. --Jura1 (talk) 11:59, 23 February 2018 (UTC)

Comments of Glrx


I would decline this proposal. There is no indication that this project can provide reasonable results in the short term. What data would be used to train the classifier? What does the top five error rate of 3% actually mean? The classifier will provide its top 5 guesses, and 97% of the time the correct class will be in that list of 5? Will the image be autoclassified to each of the 5 classes (and therefore have an 80% error rate because 4 of the classes are wrong)? How difficult is the real-world task on Commons? Is distinguishing a cat from a dog as easy as distinguishing impressionist and post-impressionist paintings? Glrx (talk) 19:09, 5 February 2018 (UTC)
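
The arithmetic behind the question above can be made concrete with a small sketch. It assumes each image has exactly one correct class and that the classifier returns a score per class; the class names, scores, and function names below are made up for illustration and are not taken from the proposal.

```python
from typing import Dict, List

def top_k(scores: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k highest-scoring class names, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def top_k_error(predictions: List[Dict[str, float]], truths: List[str], k: int = 5) -> float:
    """Fraction of images whose true class is NOT among the top-k guesses."""
    misses = sum(
        1 for scores, truth in zip(predictions, truths)
        if truth not in top_k(scores, k)
    )
    return misses / len(truths)

def label_error_if_all_assigned(predictions: List[Dict[str, float]], truths: List[str], k: int = 5) -> float:
    """Fraction of assigned labels that are wrong when every top-k guess is kept."""
    assigned = 0
    wrong = 0
    for scores, truth in zip(predictions, truths):
        for label in top_k(scores, k):
            assigned += 1
            if label != truth:
                wrong += 1
    return wrong / assigned

# Toy data (hypothetical): six classes with fixed, distinct scores.
CLASSES = ["c0", "c1", "c2", "c3", "c4", "c5"]
SCORES = {name: 0.6 - 0.1 * i for i, name in enumerate(CLASSES)}

# 100 images: for 97 the true class ("c2") is in the top 5,
# for 3 the true class ("c5") is ranked sixth and therefore missed.
predictions = [SCORES] * 100
truths = ["c2"] * 97 + ["c5"] * 3

print(f"top-5 error: {top_k_error(predictions, truths):.1%}")
print(f"wrong labels if all 5 are kept: {label_error_if_all_assigned(predictions, truths):.1%}")
```

Under these assumptions the two numbers measure different things: a 3% top-5 error is compatible with roughly 80% of the individual labels being wrong if all five guesses are written to the file page.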

Thanks for your feedback. We would like to address some of those concerns. First of all, we are planning to use models pre-trained on ImageNet. Secondly, the existing labelled images in the Wikimedia Commons repository can serve as a dataset for the training, test and validation sets. We appreciate that this is likely going to require some not insignificant semi-manual data wrangling to pull that dataset together - but part of the aim of this project is to investigate and understand the current “ML-readiness” of the Wikimedia Commons image repository. For the second question, we think providing a set of labels that can be manually removed or modified by the users later will already be of great value to the Structured Data project. Thus, keeping all five top labels in the automated step is probably best. For the third question, we suspect that some image types (e.g. those showing specific object categories) will be far more amenable to this approach than impressionist paintings. We are not hoping to solve the entire image classification pipeline with this project, just to make a first reasonable foray into applying ML-based tools to this task. Linasv (talk) 21:43, 18 February 2018 (UTC)
I would still decline. I don't get the sense of a clear project or a well-defined task. ImageNet has a restrictive noun vocabulary and averages 500 training images per noun. I don't see Commons having the same value for training. Commons has 52 c:Category:Portraits of Winston Churchill, and those vary in age from child to young adult to elder statesman; there are versions in military uniform and civilian attire. If an editor were making 80% bad edits on en.WP, then that editor would get warned; if the editor persisted, the editor would be blocked or banned for incompetence. I need to see detail beyond "image categorization is an important problem, Google has an image classifier, and Google's classifier should be used on Commons". I'm also leery of the participants' qualifications. I went over to cooljugator (a project by both participants), and the first conjugation page I landed on was the horribly confused https://cooljugator.com/en/broke ("Oh, it broked off 'cause they were unglued."; "The worst was when he threw himself over everybody and if I do not hold him, he brokes his head on the pavement."; "Sorrys. Things beens, like, a bit hecticals since my bands brokes up..."; "I mean, we ams supposed to be brokes up by now."). OK, I purposely selected an irregular verb, but I didn't expect to see confusion among wikt:break, wikt:break up, and wikt:broker. Glrx (talk) 20:38, 28 February 2018 (UTC)
Thanks for your opinion. You're right that Commons does not have the same value for training, but that is precisely what we are trying to do: bridge the gap between ImageNet categories and Commons categories to the maximum possible extent. And yes, ImageNet has been trained on Winston Churchill too. Thanks for taking a look at Cooljugator. Please note that 'broke' on that page is a verb [with the meaning to act as a broker]. You are correct to point out that the examples on that page are incorrect, since our examples picker wrongly identified the past participle of the verb break as a form of the verb. However, this is something we cannot fully avoid, especially as we are trying to make Cooljugator freely available in close to fifty languages. We outline and discuss how we are addressing those limitations on [our about page] and further under [our terms of service], and it is indicated at the top of the page you linked to that the information on that page is still "Review pending". With regard to the verbs you have mentioned, you might have intended to see one of these pages: [break], [break up] or [broker]; we have conjugations for all of them. Linasv (talk) 15:37, 1 March 2018 (UTC)
In addition, we thought we would let you know we have slightly reformulated our goals to make them more specific (also in alignment with feedback from Structured Data) and expanded our team with one additional senior member in order to address some concerns about our ability to build ML models. Linasv (talk) 12:29, 9 March 2018 (UTC)

Comments of Pintoch

  • There is no clear evidence that the proposers can achieve what they set out for (if you have done relevant work in the past, you should link to it more clearly).
  • Even if the goals are attained, no integration with Commons is planned: this project is just about building a research prototype. Going from a research prototype to a system that will actually help editors takes a lot of effort (sometimes more than what it took to build the prototype). So there will be no direct benefit to the Wikimedia community.

Pintoch (talk) 11:37, 6 February 2018 (UTC)

Thanks for your feedback! To address the integration with Commons, we have contacted some members of the team behind the Commons: Structured Data project. The product manager indicated they would 'absolutely want to integrate machine learning into the Structured Data project', and, if we complete the task, they would be ready to work on plugging it in at the appropriate stage. They even indicated they might be able to give us a design/spec framework to work within. We intend to cooperate closely with them, so we would do our best to ensure that this does not remain at the level of research and that tangible benefits accrue to the community. Linasv (talk) 21:43, 18 February 2018 (UTC)
As a non-developer, though I'm enthusiastic about the general idea, the outline of what is meant to be done doesn't really speak to me either. Maybe I underestimate the complexity of the issue, but wouldn't there be two parts:
  1. propose categories (or values for "depicts"-property statements) based on the existing algorithm/ImageNet mappings to Wikidata items
  2. train the algorithm further based on images with categories or depicts statements.
--Jura1 (talk) 11:59, 23 February 2018 (UTC)
Thanks for your further input! Our development process is likely to involve both of these things. For example, we are certainly planning to propose categories in the way you have described (we have tried to say so right at the end of the solution section). We are not yet sure whether training the algorithm further with Wikipedia's categories, as opposed to using other classified data, is likely to be viable as a strategy, but it is something we are likely to attempt (it is likely to be part of the first two of our 'Project goals', and its efficiency is likely to be reflected in our final recommendation report, mentioned in goal No. 3). Linasv (talk) 23:21, 27 February 2018 (UTC)
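
The first part of the two-part outline above could look roughly like the following sketch. Everything here is an assumption made for illustration: the mapping table, the confidence threshold, and the function name are hypothetical, and the QIDs shown should be checked against Wikidata before any real use.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-made mapping from classifier labels to Wikidata items.
# Several fine-grained labels may map to the same, coarser item.
LABEL_TO_QID: Dict[str, str] = {
    "tabby cat": "Q146",         # house cat
    "Egyptian cat": "Q146",      # house cat
    "golden retriever": "Q144",  # dog (coarser than the label)
}

def propose_depicts(
    predictions: List[Tuple[str, float]],
    min_confidence: float = 0.1,
) -> List[Tuple[str, float]]:
    """Turn (label, score) guesses into candidate "depicts" values.

    Labels without a mapping or below the threshold are dropped; when several
    labels map to the same item, the highest score is kept.
    """
    best: Dict[str, float] = {}
    for label, score in predictions:
        qid = LABEL_TO_QID.get(label)
        if qid is None or score < min_confidence:
            continue
        best[qid] = max(score, best.get(qid, 0.0))
    # Highest-confidence suggestions first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

guesses = [
    ("tabby cat", 0.5),
    ("Egyptian cat", 0.3),
    ("studio couch", 0.15),      # no mapping in the table, so it is dropped
    ("golden retriever", 0.05),  # below the threshold, so it is dropped
]
suggestions = propose_depicts(guesses)
print(suggestions)
```

Suggestions that editors confirm or reject could then serve as labelled examples for the second part (further training), which is not sketched here.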

The Commons app would be interested in this, if it works well


Unlike the Upload Wizard, the Commons mobile app has smart suggestions, currently based on:

  • Categories used in the past by that user
  • Search based on the title/description
  • EXIF location is used to find the categories of existing Commons pictures around that picture

We would be interested in adding AI-based suggestions in addition to the suggestions given by the three methods above. Actually we identified that need two years ago.

Ideally the tool would provide us with Commons categories, but with Structured Commons coming, QIDs would be OK too.

Good luck! Syced (talk) 12:29, 6 February 2018 (UTC)
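
As a rough illustration of how AI-based suggestions might sit alongside the three existing sources listed above, here is a minimal sketch that interleaves several ranked lists and removes duplicates. It is not how the Commons app actually works; the function name and the category names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, List

def merge_suggestions(*sources: Iterable[str], limit: int = 10) -> List[str]:
    """Interleave ranked suggestion lists round-robin, dropping duplicates.

    Taking one item from each source in turn lets every source contribute
    near the top, instead of one source filling the whole list.
    """
    merged: List[str] = []
    seen = set()
    for tier in zip_longest(*sources):
        for category in tier:
            # zip_longest pads exhausted sources with None.
            if category is not None and category not in seen:
                seen.add(category)
                merged.append(category)
    return merged[:limit]

# Hypothetical category names, one list per suggestion source.
recently_used = ["Cats", "Vilnius"]
title_search = ["Cats in gardens", "Cats"]
nearby_by_exif = ["Vilnius Old Town"]
ai_based = ["Tabby cats", "Cats in gardens"]

merged = merge_suggestions(recently_used, title_search, nearby_by_exif, ai_based)
print(merged)
```

A weighted ranking would be another option if some sources turn out to be more reliable than others; round-robin is used here only because it needs no tuning.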

We would be very interested in discussing in more detail how we can make the API of our prototype suitable for your application! What do you think the best way for us to go about this is? Linasv (talk) 21:43, 18 February 2018 (UTC)

Comments of Ruslik0


It is a really interesting proposal. But I see some problems:

  1. Your goals are not really goals. They look like a project plan. "An assessment of the current best practices and model implementations ..." cannot really be a goal!
  2. Furthermore, the measures of success are very vague, possibly because you do not have well-defined goals (see above).
  3. It seems to me that your familiarity with the Wikimedia projects is very limited. I advise you to find a good adviser with significant experience with the Wikimedia projects.

Ruslik (talk) 18:10, 14 February 2018 (UTC)

Thanks for your comments! With regard to our goals, we consider our second goal (the ML implementation) to be the main one, but we thought that providing a survey of the space and delivering some clear written recommendations for fit within Wikimedia could help people working on similar tasks in the future, and thus that those two things are important enough to be outlined as separate goals. We do welcome suggestions on this: do you think there is a way to modify our goals (perhaps removing the first and third ones) so as to address your concerns? With regard to our familiarity with Wikimedia, we hope to cooperate closely with Commons: Structured Data (see above for our conversation with them so far). Linasv (talk) 12:27, 9 March 2018 (UTC)

Comments of Incnis Mrsi


The proposed strategy (to deploy expensively trained robots to the front where humans perform well) is wrong. If Wikimedia Commons needs an expensive categorization helper at all, then such a helper should compete with humans in realms where humans have shortcomings. Here are some counter-proposals: machine-detectable conditions of a fresh upload for which an automatic action would be really useful.

  • Exif contains a machine-readable date in the past, whereas date= the current date → correct the argument to date=.
  • source={{own}}, not a very old date, the image is presumably a photograph, but missing Exif → alarm, intervention of the community needed.
  • A small © symbol found in a raster image → alarm, intervention of the community needed.
  • The image likely underwent resampling → tag for cleanup.
  • The image looks like a logo, and the uploader registered only a few days ago → alarm.
  • The image is encoded in JPEG, but is heavily posterised (uses few colors) → {{BadJPEG}}.
  • A raster image likely contains a lot of text → attempt to do OCR and further action depending on results.
  • An image looking like a GUI screenshot but not tagged as such → alarm, intervention of the community needed.

Incnis Mrsi (talk) 16:55, 9 March 2018 (UTC)
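
Three of the conditions in the list above depend only on upload metadata and could be expressed as simple rules. The sketch below is a hedged illustration: the `Upload` record, its field names, and the thresholds (`recent_years`, `new_account_days`) are assumptions, and the `looks_like_photo` / `looks_like_logo` flags stand in for image analysis that is not implemented here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

@dataclass
class Upload:
    """Hypothetical summary of a fresh upload; not a real Commons data model."""
    declared_date: date          # value of date= in the file description
    upload_date: date
    exif_date: Optional[date]    # None when the file carries no Exif date
    source_is_own: bool          # source={{own}}
    looks_like_photo: bool       # assumed output of some image analysis
    looks_like_logo: bool        # assumed output of some image analysis
    uploader_registered: date

def check_upload(u: Upload, recent_years: int = 20, new_account_days: int = 7) -> List[str]:
    """Return the suggested actions for one upload, in rule order."""
    actions: List[str] = []
    # Exif date lies in the past, but date= was set to the upload date.
    if u.exif_date is not None and u.exif_date < u.upload_date and u.declared_date == u.upload_date:
        actions.append(f"correct date= to {u.exif_date.isoformat()}")
    # Own-work photograph with a fairly recent date but no Exif.
    is_recent = (u.upload_date.year - u.declared_date.year) < recent_years
    if u.source_is_own and u.looks_like_photo and is_recent and u.exif_date is None:
        actions.append("alarm: own-work photograph without Exif")
    # Logo uploaded by a very new account.
    if u.looks_like_logo and (u.upload_date - u.uploader_registered) <= timedelta(days=new_account_days):
        actions.append("alarm: logo uploaded by a new account")
    return actions

wrong_date = Upload(
    declared_date=date(2018, 3, 9), upload_date=date(2018, 3, 9),
    exif_date=date(2017, 6, 1), source_is_own=True,
    looks_like_photo=True, looks_like_logo=False,
    uploader_registered=date(2015, 1, 1),
)
print(check_upload(wrong_date))
```

The remaining items in the list (watermark detection, resampling, posterisation, OCR, screenshot detection) need image analysis rather than metadata rules and are outside this sketch.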

Thanks for your input. These are all excellent suggestions and may form part of a separate grant proposal in their own right. However, this is very distant in purpose from what we are proposing. Linasv (talk) 17:06, 9 March 2018 (UTC)

Eligibility confirmed, round 1 2018

 
This Project Grants proposal is under review!

We've confirmed your proposal is eligible for round 1 2018 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through March 12, 2018.

The committee's formal review for round 1 2018 will occur March 13-March 26, 2018. New grants will be announced April 27, 2018. See the schedule for more details.

Questions? Contact us.

--Marti (WMF) (talk) 02:20, 17 February 2018 (UTC)

Feedback from WMF Research Team


Dear Linasv, Alekszej, and KMisiunas,

Staff from the Research Team at the Wikimedia Foundation (WMF) have reviewed your proposal and we wanted to provide some feedback. We do appreciate your interest in AI-based image classification for Wikimedia Commons, a very relevant topic for the Wikimedia movement at this time.

We have general reservations about grant-funded projects piloting technology in the form of APIs or services that are meant to be considered for productization/platform integration at a later stage. The barriers to reuse and integrate tech in our stack are significant. We’ve found through our own experience that knowledge transfer from research to service prototyping to productization is challenging even within WMF. We feel it would be even more challenging for grantees to manage this complex process with much more limited resources, even if the grantees have extensive experience in this field.

While image classification (beyond a first pilot we just completed) is not among the current annual goals for the Technology group at WMF, we have tentative plans for a multi-team effort in 2019 or 2020, involving full-time staff with dedicated experience in image classification. We believe that a robust staff-led effort is likely to be more successful for such an initiative since it will require more resourcing than is possible through grant funding alone.

Though we value the questions this proposal seeks to address, we believe it would be best not to proceed with this project at this time, given the timeline and resourcing aspects mentioned above.

Kind regards,

Posted on behalf of the Research Team by Marti (WMF) (talk) 01:57, 17 March 2018 (UTC)

Thanks for letting us know, Marti (WMF). Do you think there is a way we could redirect this proposal to make it more useful in terms of the planned efforts, or do you think this is just an insurmountable impasse at this point? Thanks. Linasv (talk) 09:31, 17 March 2018 (UTC)
Linasv, as I understand this feedback, I do see it as an insurmountable impasse, unfortunately. As indicated in the feedback above, this grant program is generally not resourced to adequately support projects that pilot technology meant for productization/platform integration at a later stage. Therefore, funding would depend on tight, well-planned integration with the planned efforts of the Research Team. Since the Research Team has indicated that knowledge transfer is difficult even within WMF, knowledge transfer from outside the organization is likely to be even more challenging. Such difficulties would need to be addressed in the planning stages, and even if the Research Team were available to consult, this is not the kind of question that can be answered in the short time frame permitted by our grantmaking process. In short, unfortunately, I don't see a feasible way to make this work in light of the feedback provided above. --Marti (WMF) (talk) 21:06, 19 March 2018 (UTC)

Aggregated feedback from the committee for AICAT

Scoring rubric, with the score for each criterion:
(A) Impact potential. Score: 5.5
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Community engagement. Score: 5.0
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
(C) Ability to execute. Score: 5.5
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
(D) Measures of success. Score: 4.5
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • There are many ways to improve image classification for newly uploaded images, which is great, but there is no way to continue the project after the grant ends. The project is developed outside WMF cloud and the grantees ask for funds to be used on AWS (external services).
  • The project fits with Wikimedia's strategic priorities. However, it is unclear how it can be sustained, scaled or adapted to Wikimedia Commons. More likely it will die after the grant period ends.
  • There are a lot of risks involved in the execution of the project, and it has a lot of external dependencies: AWS server, algorithms, Structured Data on Commons.
  • The approach is certainly innovative. The potential impact is significant but so are risks. The measures of success are not well defined.
  • The budget seems reasonable, but high if the applicants ask for funds to use AWS in the project.
  • The proposal is rather vague in what they are really going to achieve. So, it is not clear if the project can be accomplished within the requested 8 months.
  • There are a few supporters; that seems low compared with what I would expect for this kind of project.
  • The community engagement is low although it should be an important part of the project.
  • The project offers a lot of ways to improve the user experience in Wikimedia Commons, but the grantees could investigate a few options, such as how Wikimedia Cloud VPS could help them host a prototype. If the interested people return with a prototype, the community (or developers) could check whether the idea seems reasonable.
  • I think that the main problem is that the proposal is too general and too vague. Its path to routine use on Commons is unclear. The applicants should at first submit a more narrowly focused (or pilot) proposal for some helper tool. For instance, they can create a gadget for Commons. And generally they should familiarize themselves with how the Wikimedia Community really works.

Linasv, Alekszej, KMisiunas,

Aggregated committee comments from committee review of your proposal are posted above. Note that these comments may vary, or even contradict each other, since they reflect the conclusions of multiple individual committee members who independently reviewed this proposal.

Based on their review, a majority of committee reviewers have not recommended your proposal for funding. We routinely invite applicants to respond to committee comments before a final funding decision is announced on May 18. You are welcome to respond if you like. For the sake of respecting your time, however, I will restate that I do expect the Research Team's feedback to be a blocker to a grant award in this case.

Please let me know if you have any questions.

Kind regards, --Marti (WMF) (talk) 00:28, 19 April 2018 (UTC)

Round 1 2018 decision

 

This project has not been selected for a Project Grant at this time.

We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding. This was a very competitive round with many good ideas, not all of which could be funded in spite of many merits. We appreciate your participation, and we hope you'll continue to stay engaged in the Wikimedia context.


Next steps: Applicants whose proposals are declined are welcome to consider resubmitting their application in the future. You are welcome to request a consultation with staff to review any concerns with your proposal that contributed to a decline decision, and to help you determine whether resubmission makes sense for your proposal.

Over the last year, the Wikimedia Foundation has been undergoing a community consultation process to launch a new grants strategy. Our proposed programs are posted on Meta here: Grants Strategy Relaunch 2020-2021. If you have suggestions about how we can improve our programs in the future, you can find information about how to give feedback here: Get involved. We are also currently seeking candidates to serve on regional grants committees and we'd appreciate it if you could help us spread the word to strong candidates--you can find out more here. We will launch our new programs in July 2021. If you are interested in submitting future proposals for funding, stay tuned to learn more about our future programs.