Grants talk:Project/Ilya/ScalaWiki data processing toolbox

Moved to draft by the author --Ilya (talk) 15:35, 24 August 2016 (UTC)

Reminder: Project Grant Application Deadline is Aug. 2nd

@Ilya: please note that the deadline for the first round of Project Grants is today, August 2nd. If you'd like to be considered in this round, please update the status on your grant from "draft" to "proposed". Alex Wang (WMF) (talk) 03:15, 3 August 2016 (UTC)

@AWang (WMF): Thank you, updated. I would also like to edit the proposal before community discussion starts. --Ilya (talk) 03:27, 3 August 2016 (UTC)
@Ilya: No problem! Alex Wang (WMF) (talk) 03:29, 3 August 2016 (UTC)

Some concerns

  • The project goals do not appear to be well defined.
  • It is not clear whether the toolbox would extract the data from the MediaWiki API, SQL, dumps, or all of these sources.
  • It is not clear what data or types of data the proposed toolbox will make available for tool developers and in what form.
  • It is not clear that the proposed toolbox is sustainable in the long run. Who will upgrade it if the API changes?
  • The project seems to have a very broad scope, and it is unclear whether it is realistic given that only one developer is going to work on it.

Ruslik (talk) 12:08, 12 August 2016 (UTC)

Concerns and suggestions for refining proposal

The project needs more definition. It muddles technologies with solutions. It seems that the aim is to make data more easily accessible and queryable, but the proposal does not explain how to do that; instead it mentions a number of big data technologies. Since the goal is to make data available, I suggest focusing on Jupyter notebooks that can tap into existing data sources. The Analytics team uses Scala, Spark and Oozie in our cluster, but none of those technologies help less technical users tap into data, and even less so in the Labs environment, which has heavy performance constraints.

Jupyter notebooks are a better fit.

Pageview data

Pageview data is public and already available through the Pageview API; it can also be made available by querying other datasets we hold in Hive/Druid.

Edit data

The Analytics team is already working on making edit data easily available to the community as part of our efforts to replace stats.wikimedia.org (Wikimetrics runs on Labs and is subject to many constraints, so it will never work for wide data access and complex calculations). See the backlog regarding editing data here: https://phabricator.wikimedia.org/T130256

Concrete suggestions for improving the proposal

  • Be specific as to what the deliverables are. For example: "being able to run any kind of metric on edit data for Ukrainian Wikipedia".
  • Be specific as to the project steps, even if only at a high level.
  • Be specific as to hardware and human resources needed.

NRuiz (WMF) (talk) 16:15, 26 August 2016 (UTC)

  • The current complexity of the proposal prevents me from completing a fair review. I estimate that it would take me several days of reading and writing to express my opinion about the approach.
  • The risks you listed seem too high in my opinion, and there is too much overlap with other ongoing efforts, like Wikistats 2.0 and the Jupyter notebook project. For this proposal to go forward, it could eliminate some risks and complexity by scoping down and implementing one small, easy-to-measure feature.
  • I suggest meeting with people working on other projects to discuss how this solution fits in.
  • I can discuss further on IRC, my nick is milimetric and I am always in #wikimedia-analytics on freenode.

Milimetric (talk) 21:58, 30 August 2016 (UTC)

@Milimetric: thank you for acknowledging that you need time for a fair review and for inviting discussion! --Ilya (talk) 05:01, 31 August 2016 (UTC)

Neat ideas, but lacking on concrete deliverables and milestones

The solution overview reminds me a bit of Yahoo! Pipes, which was a pretty cool and useful tool.

Problems:

  • No clear end user facing deliverables
  • No clear project milestones
  • Technical choices of Kite and Flink are largely unsubstantiated: there is no clear head-to-head comparison with competing technologies that are already in use in the Wikimedia production environment (Oozie, Spark).
  • Presumption of Tool Labs hosting is not accompanied by any estimate of CPU, RAM, and disk usage needs for the completed project.
  • No clear path for promotion from Labs experiment to hardened production services.
  • No mention of OSI-approved software license that would be used by the project.
  • Choice of Scala as implementation language excludes a large portion of existing Tool Labs developers, based on the 2015 survey results, which reported only 6.6% of users preferring to develop in Java.

— The preceding unsigned comment was added by BDavis (WMF) (talk) August 26 2016 17:13 (UTC)

October 11 Proposal Deadline: Reminder to change status to 'proposed'

The deadline for Project Grant submissions this round is October 11th, 2016. To submit your proposal, you must (1) complete the proposal entirely, filling in all empty fields, and (2) change the status from "draft" to "proposed." As soon as you’re ready, you should begin to invite any communities affected by your project to provide feedback on your proposal talkpage. If you have any questions about finishing up or would like to brainstorm with us about your proposal, there are still two proposal help sessions before the deadline in Google Hangouts:

Warm regards,
Alex Wang (WMF) (talk) 03:16, 6 October 2016 (UTC)

Project Grant proposal submissions due today!

Thanks for drafting your proposal for a Project Grant. Proposals are due today! In order for this submission to be reviewed, it must be formally proposed. When you have completed filling out the infobox and have fully responded to the questions on your draft, please change status=DRAFT to status=PROPOSED to formally submit your grant proposal. This can be found in the Probox template found on your grant proposal page. If you have already done this, thanks for your submission, and you should be receiving feedback from the Project Grants committee in the coming weeks. Thanks, I JethroBT (WMF) (talk) 18:16, 14 March 2017 (UTC)

Changing status to 'test'

Since this proposal has not been edited for a year, I am changing the status from "draft" to "test" (the default status when a proposal is first created) so that it will not continue to show up in the draft proposals queue. If you would like to continue to work on your proposal and make it visible for additional feedback, you can change the status back to draft at any time.

As posted on the Project Grants startpage, the deadline for submissions this round is September 26, 2017. To submit your proposal, you must (1) complete the proposal entirely, filling in all empty fields, and (2) change the status from "draft" to "proposed."

Warm regards,
--Marti (WMF) (talk) 05:13, 26 September 2017 (UTC)
