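Wikimedia Foundation Annual Plan/2024-2025/Product & Technology OKRs
This document is the first part of the Wikimedia Foundation Product & Technology department's annual planning process for 2024-2025. It describes the department's "Objectives and Key Results" (OKRs), and it continues the structure of work portfolios (nominally called "buckets") that began last year.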
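In November, I shared with you what I believe is the most pressing question facing the Wikimedia movement: how do we ensure that Wikipedia and all Wikimedia projects can last for multiple generations? I want to thank everyone who took the time to think seriously about this question and reply to me directly. I have now had some time to reflect on your answers, and I will share what I learned.
First, volunteers do not contribute for a single reason. To cultivate multiple generations of volunteers, we need to better understand the many reasons people give their time to our projects. Next, we need to focus on what sets us apart: as disinformation and misinformation compete for a new generation's attention across the internet and its platforms, we are able to offer trustworthy content. This includes fulfilling our mission of gathering all human knowledge and offering it to the world, by expanding coverage of information that is missing because of inequity, discrimination, or bias. Our content also needs to function and stay vital in an internet driven by artificial intelligence and rich experiences. Finally, we need to find ways to fund our movement sustainably by developing a shared strategy for our products and revenue, so that we can support this work over the long term.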
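These ideas will be reflected in the Wikimedia Foundation's 2024–2025 annual plan. Today I am sharing its first part, in the form of draft objectives for our Product and Technology work. As last year, the entire annual plan will be centered on our audiences and the technical needs of our platform, and we want your feedback so that we know whether our work is focused on the right problems. Over the past few months, we heard from community members about product and technology strategy for the coming year through Talking: 2024, mailing lists, talk pages, and community events. These objectives were built on that input. You can see the full list of draft objectives below.
"Objectives" are high-level directions that will shape the product and technology projects we take on in the next fiscal year. We have deliberately kept them broad so that they represent our strategic direction and, more importantly, the challenges we propose to prioritize among the many possible focus areas for the coming year. We are sharing them now so that community members can help shape our early thinking before the year's budget and measurable targets are set.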
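Feedback
One area where we especially want feedback is the work we are doing under the name "Wiki Experiences". Wiki Experiences is about how we efficiently deliver, improve, and innovate on the ways people use the wikis directly, whether as contributors, consumers, or donors. This involves supporting our core technology and capabilities. It also means making sure we can improve the experience of volunteer editors, especially those with extended rights, through better features and tools, translation services, and platform upgrades.
Here are some reflections from our recent planning discussions, along with some questions to help us refine our thinking: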
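- Volunteers should find it meaningful to take part in the Wikimedia projects. We also believe the experience of collaborating online should be a major reason volunteers come back. What do volunteers need in order to find editing worthwhile and to work together better on building trustworthy content?
- The trustworthiness of our content is part of Wikimedia's unique contribution to the world, and it is why people visit our platforms and use our content. What can we build to help trustworthy content grow faster, while staying within the quality guardrails each community sets for its project?
- To stay relevant and compete with other large online platforms, Wikimedia needs a new generation of consumers to feel connected to our content. How can we make it easier for readers and donors to discover and engage with our content?
- In an era of widespread online abuse, we need to make sure our communities, platform, and serving systems are protected. We also face evolving compliance obligations as policymakers around the world seek to shape online privacy, identity, and information sharing. Which improvements to how we fight abuse would help us meet these challenges?
- MediaWiki, the software platform and interface that Wikipedia runs on, needs to remain supported over the next decade so that it can enable the creation, moderation, storage, discovery, and consumption of open, multilingual content at scale. What decisions and platform improvements can we make this year to keep MediaWiki sustainable?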
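Objectives
The top level of planning, the "objectives", is published here.
The next level, the "key results" (KRs) for each final objective, is listed below.
The "hypotheses" underlying each key result are published below. They will be updated throughout the year on the relevant project/team wiki pages as lessons are learned.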
Wiki Experiences (WE) objectives
Objective shortname | Objective area | Objective text | Objective context | Owner |
---|---|---|---|---|
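WE1 | Contributor experience | Both experienced and new contributors come together online to build a trustworthy encyclopedia, more easily. | To keep Wikipedia vibrant for years to come, we must cultivate new generations of volunteers and make contributing something people want to do. Different generations of volunteers need different investments. More experienced contributors need their powerful workflows streamlined and fixed, while new contributors need new ways of editing that are meaningful to them. Across these generations, all contributors need to be able to connect and collaborate with one another to do the most impactful work. To achieve this, we will improve key workflows for experienced contributors, lower the barriers for newcomers to make constructive contributions, and invest in ways for volunteers to find and communicate with each other around shared interests. | Marshall Miller |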
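WE2 | Encyclopedic content | Support communities to effectively close knowledge gaps through tools and support systems that are easier to access, adapt, and improve, ensuring the growth of trustworthy encyclopedic content. | Encyclopedic content, primarily on Wikipedia, can be increased and improved through continuous engagement and innovation. The tools and resources, technical and non-technical, that contributors use to meet their needs can be made more discoverable and reliable. By delivering feature improvements in short cycles, the Wikimedia Foundation should better support these tools. Given recent trends in AI-assisted content generation and changing user behavior, we will also explore foundational work for significant changes (e.g. Wikifunctions) that help content creation and reuse grow at scale. Mechanisms for identifying content gaps should be easier to discover and plan around. Resources that support the growth of encyclopedic content, including content from sister projects, the Wikipedia Library, and programs and campaigns, can be better integrated into contribution workflows. At the same time, the methods used for growth should include safeguards against growing threats. This keeps the process trustworthy while staying true to the fundamental principles of encyclopedic content recognized by the Wikimedia projects.
Audiences: editors, translators |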
Runa Bhattacharjee |
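WE3 | Consumer experience (Reading & Media) | A new generation of consumers arrives at Wikipedia and finds a preferred destination for discovering, engaging with, and building a lasting connection to encyclopedic content. | Goals:
Retain existing and new generations of consumers and donors. Increase relevance to existing and new generations of consumers by making our content easier to discover and interact with. Work across platforms to adapt our experiences and existing content, so that new generations of consumers and donors can explore and curate encyclopedic content. |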
Olga Vasileva |
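WE4 | Trust & Safety | Improve our infrastructure, tools, and processes so that we are well equipped to protect the communities, the platform, and our serving systems from different kinds of scaled and targeted abuse, while staying compliant with an evolving regulatory landscape. | Some aspects of our ability to fight abuse need upgrading. IP-based abuse mitigation is becoming less effective, and some administrative tools need to be made more efficient. We need a unified strategy that helps us fight large-scale abuse by using a variety of signals and mitigation mechanisms (CAPTCHAs, blocks, and so on). This year, we will start making progress on the biggest problems in this area. In addition, investment in abuse protection must be balanced with investment in understanding and improving community health, several aspects of which are covered by various regulatory requirements. | Suman Cherukuwada |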
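WE5 | Knowledge Platform 1 (Platform evolution) | Evolve the MediaWiki platform and its interfaces to better meet Wikipedia's core needs. | MediaWiki is meant to support the creation, moderation, storage, discovery, and consumption of open, multilingual content at scale. In the second year of Knowledge Platform work, we will chart the system's direction and begin platform improvements that effectively support the core needs of the Wikimedia projects over the next decade, starting with Wikipedia. This includes continuing to define our knowledge production platform, strengthening the platform's sustainability, focusing on the extension/hook system to clarify and simplify feature development, and continuing to invest in knowledge sharing and in enabling people to contribute to MediaWiki. | Birgit Müller |
WE6 | Knowledge Platform 2 (Developer services) | Technical staff and volunteer developers have the tools they need to effectively support the Wikimedia projects. | We will continue to improve (and expand) development, testing, and deployment workflows for Wikimedia production, and broaden the definition to include services for tool developers. We also aim to improve our ability to answer common questions about developer/engineering workflows and audiences, and to provide relevant data for informed decisions. Part of this work is investigating practices (or the lack of them) that currently pose challenges to our ecosystem. | Birgit Müller |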
Signals & Data Services (SDS) objectives
Objective shortname | Objective area | Objective text | Objective context | Owner |
---|---|---|---|---|
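SDS1 | Shared insights | Our decisions about how to support the Wikimedia mission and movement are informed by high-level metrics and insights. | To build technology effectively and efficiently, support volunteers, and advocate for policies that protect and advance access to knowledge, we need to understand the Wikimedia ecosystem and adjust course based on what succeeds. This means tracking a set of common metrics that are reliable, easy to understand, and available in a timely way. It also means providing research and insights that help us understand the why and how behind the measurements. | Kate Zimmerman |
SDS2 | Experimentation platform | Product managers can quickly, easily, and confidently evaluate the impact of product features. | To enable and accelerate data-informed decisions about product feature development, product managers need an experimentation platform in which they can define features, select audiences, and see measures of impact. Shortening the time from release to analysis is critical, because it shortens the time to learning, speeds up experimentation, and ultimately speeds up innovation. Manual tasks and bespoke measurement methods have been identified as obstacles to speed. Ideally, product managers could go from launching an experiment to findings without manual intervention from engineers and analysts. | Tajh Taylor |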
Future Audiences (FA) objectives
Objective shortname | Objective area | Objective text | Objective context | Owner |
---|---|---|---|---|
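FA1 | Test hypotheses | Provide the Wikimedia Foundation with strategic investment recommendations based on insights from experiments that deepen our understanding of how knowledge is shared and consumed online, helping our movement serve new audiences in a changing internet. | Ongoing changes in technology and online user behavior (e.g. more people preferring to get information through social apps, the popularity of short-form video "edutainment", the rise of generative AI) make it harder for the Wikimedia movement to attract and retain readers and contributors. These changes also create opportunities to serve new audiences by creating and delivering information in new ways. However, as a movement we lack clear data about the benefits and trade-offs of the strategies we could pursue to overcome these challenges or seize new opportunities. For example, should we:
To make sure Wikimedia is a multi-generational project, we will test different hypotheses to better understand and recommend promising strategies, for both the Wikimedia Foundation and the Wikimedia movement, for attracting and retaining future audiences. |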
Maryana Pinchuk |
Product & Engineering Support (PES) objectives
Objective shortname | Objective area | Objective text | Objective context | Owner |
---|---|---|---|---|
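PES1 | Operational efficiency | Make the Wikimedia Foundation's work faster, more cost-effective, and more impactful. | In their day-to-day work, staff handle a great deal of effort aimed at making our operations faster, more cost-effective, and more impactful. This objective highlights specific initiatives that will deliver tangible gains in speed, cost-effectiveness, or impact, and it coordinates work to change the Wikimedia Foundation's formal and informal practices. In essence, the key results in this objective are the hardest and best improvements we can make this year to the operational efficiency of work involving our products and technology. | Amanda Bittaker |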
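Key results
Below are the "key results" (KRs) for each final objective, corresponding to the objectives above.
The underlying "hypotheses" for each key result are published further down this page. They will be updated throughout the year on the relevant project or team wiki pages as lessons are learned.
Wiki Experiences (WE) key results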
Key result shortname | Key result text | Key result context | Owner |
---|---|---|---|
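WE1.1 | Develop or improve a workflow that helps contributors with shared interests connect with each other and contribute together. | We believe that community spaces and interactions on the wikis make people happier and more effective as contributors. Community spaces can also help newcomers start contributing and find mentorship, shape best practices for contributing, and help address knowledge gaps. However, the existing resources, tools, and spaces on Wikipedia that support human connection are weak and do not meet the challenges and needs of most editors today. Meanwhile, the Campaigns team's work has shown that many organizers are eager to adopt and try new tools with structured workflows to help with their community work. For these reasons, we want to focus on encouraging and fostering a sense of belonging among wiki contributors. | Ilana Fried |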
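WE1.2 | Constructive activation: Widespread deployment of interventions, as measured by controlled experiments, shows a combined 10% relative increase (year over year) on mobile web and a 25% relative increase (year over year) on iOS in newcomers who publish ≥1 constructive edit in the main namespace on a mobile device.
Note: This key result will be measured separately for each platform. |
For many newcomers, the current full-page editing experience requires too much context, patience, and trial and error to contribute constructively. To support a new generation of volunteers, we will increase the number and availability of smaller, more structured, and more task-specific editing workflows (e.g. Edit Check and Structured Tasks).
Note: The baseline will only be established at the end of Q4 of this fiscal year, after which we will set the target percentages for this key result. |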
Peter Pelberg |
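WE1.3 | Increase user satisfaction with 4 moderation products by 5 percentage points. | Editors with extended rights use a range of existing features, extensions, tools, and scripts to carry out moderation tasks on the Wikimedia projects. This year we want to focus on improving this tooling rather than starting projects to build new features in this area. We aim to touch multiple products over the year and hope to make impactful improvements to each. Through this work, we hope to improve the overall experience of content moderation.
We will define baselines for common moderator tools in order to measure the improvement in satisfaction with each. The Community Wishlist will contribute significantly to deciding the priorities for this key result. |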
Sam Walton |
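WE2.1 | By the end of Q2, support organizers, contributors, and institutions in increasing high-quality content in key topic areas [TBD] by X articles, through experiments. | This key result aims to improve topic coverage in order to close existing knowledge gaps. We have identified that communities can benefit from effective tools and from campaigns aimed at improving the quality of content on the projects. This year we want to focus on improving existing tools and on trying new approaches to prioritizing key topic areas in order to address knowledge gaps.
We will set the target of X articles based on existing baselines for high-quality content creation. In addition, we will identify topic areas of shared interest with communities and institutions in the next quarter. |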
Purity Waigi & Fiona Romeo |
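WE2.2 | By the end of Q2, implement and test two social and technical recommendations that support language onboarding for small language communities, and evaluate them by analyzing community feedback. | Wikipedia exists in about 300 languages, but many more languages are spoken daily by millions of people, and these languages have neither a Wikipedia nor any other wiki project. This is an obstacle to realizing our vision of a world in which every single human being can freely share in the sum of all knowledge. The Wikimedia Incubator is a place hosted by the Wikimedia Foundation where potential new language editions of Wikimedia project wikis can be set up, written, and tested, and can prove themselves worthy. The Incubator launched in 2006 on the assumption that its users already knew how to edit wikis. Although editing on the Wikimedia projects has improved greatly since then, the Incubator has not been updated because of technical limitations. Currently it takes weeks for a wiki project to graduate from the Incubator, and only about 12 wiki projects are created each year, which points to a clear bottleneck.
Existing research and materials reveal technical challenges at every stage of language onboarding: adding a new language to the Incubator, the complexity of developing and reviewing content, and the slow process of creating a wiki once a language graduates from the Incubator. Each stage is slow, manual, and complex, which shows that improvement is needed. Solving this would make it faster and easier to create wikis in new languages and would allow more people to share knowledge together. The proposed social and technical recommendations have been highlighted by various stakeholders, existing research, and other resources. This key result proposes to test two recommendations covering the social and technical aspects and to evaluate community feedback. |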
Satdeep Gill & Mary Munyoki |
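WE2.3 | By the end of Q2, two new features guide contributors to add source material that follows project guidelines, and 3 to 5 partners have provided source material that addresses language and geographic gaps. | To increase access to the quality source material needed to close strategic content gaps, we will: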
|
Fiona Romeo & Alexandra Ugolnikova |
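WE2.4 | By the end of Q2, enable Wikifunctions calls on at least one smaller-language Wikipedia, providing a more scalable way to seed new content. | To effectively close our knowledge gaps, we need better workflows that support scalable growth of high-quality content, especially in smaller language communities. | Amy Tsay |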
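WE3.1 | Release two curated, accessible, and community-driven browsing and learning experiences to representative wikis, with the goal of increasing retention of logged-out readers who use these experiences by 5%. | This key result focuses on improving retention of a new generation of readers on our sites. We will explore opportunities to make it easier for readers to discover and learn about content they are interested in, so that a new generation builds a lasting connection with Wikipedia. This will include exploring and developing new curated, personalized, and community-driven browsing and learning experiences (e.g. feeds of related content, topical content recommendations and suggestions, community-curated opportunities for content exploration).
We plan to run a series of browsing-experience experiments at the start of the fiscal year to decide which experiences we want to scale to production use, and on which platform (web, apps, or both). We will then focus on scaling those experiments and testing how well they improve retention in production. Our goal is to launch at least two experiences on representative wikis by the end of the year and to accurately measure a 5% increase in retention among readers who engage with them. To achieve this key result as effectively as possible, we need the ability to run A/B tests on logged-out users, and instrumentation that can measure reader retention. We may also need new APIs or services to serve recommendations and other curation mechanisms. |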
Olga Vasileva |
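WE3.2 | Increase by 50%, per platform, the number of donations made through touchpoints other than the annual banner and email appeals. | Our goal is to diversify revenue streams while recognizing our existing donors. Based on feedback and data, we are focusing on growing donations beyond the methods the Wikimedia Foundation has relied on in the past, particularly the annual banner appeal. We hope that by investing in a more holistic donor experience, we can offer alternatives to donors and potential donors who do not respond to banner appeals, sustaining our work and expanding our impact. The 50% target is an initial estimate based on two factors: the reduced visibility of the donate button on the web caused by Vector 2022, and the increase in donations (50.1%) from a pilot in the Wikipedia apps during fiscal year 2023-2024 aimed at enhancing the donor experience. Evaluating this metric per platform will help us understand platform trends and decide whether to deploy different strategies in the future based on differences in the behavior of each platform's audience. | Jazmin Tanner |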
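WE3.3 | By the end of Q2 2024-25, volunteers will start converting legacy graphs in Wikipedia articles to the new graph extension. | The Graph extension has been disabled for security reasons since April 2023. As a result, readers cannot see the many graphs that community members invested time and effort in creating over the past 10 years.
Data visualization plays an important role in creating engaging encyclopedic content. In fiscal year 2024-25 we will therefore build a new, secure service to replace the Graph extension and handle most simple data visualization use cases on Wikipedia article pages. The new service will be built to be extensible, so that it can support more complex use cases if WMF or community developers choose to support them in the future. We will know we have succeeded when community members use the new service to successfully convert old graphs and publish new ones. In the initial phase of the project, we will decide which underlying data visualization library to use and which chart types to support. |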
Christopher Ciufo |
WE3.4 | Develop the capability model to improve website performance through smaller-scale cache site deployments that take one month to implement, while maintaining technical capabilities, security, and privacy. |
The Traffic team is responsible for maintaining the Content Delivery Network (CDN). This layer caches frequently accessed content, pages, etc., in memory and on disk, which reduces the time it takes to process user requests. The second part is storing content physically closer to the user, which reduces the time it takes for data to reach the user (latency). Last year, we enabled one site in Brazil meant to reduce latency in the South American region. Setting up new data centers would be great, but it is expensive, time-consuming, and requires a lot of work: for example, last year's work spanned the full year. We would love to have centers in Africa and Southeast Asia, and all around the world. Our hypothesis is that we can spin up smaller sites in other places around the world where traffic is lower. These would require fewer servers, no more than four or five each, which reduces our cost. They would still help us reduce latency for users in these regions, while requiring less time and effort to maintain. |
Kwaku Ofori |
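WE4.1 | By the end of Q3, propose 3 recommendations for countermeasures against harassment and harmful content, based on data and the evolving regulatory landscape. | Ensuring user safety and well-being is a fundamental responsibility of online platforms. Many jurisdictions have laws and regulations that require online platforms to act against harassment, cyberbullying, and other harmful content. Platforms that fail to address these issues may face legal liability and regulatory sanctions.
At present we do not know how large these problems are or what causes them. We rely heavily on anecdotal evidence and manual processes. This exposes us to legal risk and to other far-reaching consequences: underestimating the problem, escalating harm, reputational damage, and erosion of user trust. We need to build a strong culture of measuring the prevalence of harassment and harmful content and of actively taking countermeasures. |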
Madalina Ana |
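WE4.2 | By the end of Q3, develop at least two signals for anti-abuse workflows that improve the precision of actions taken against bad actors. | The wikis rely heavily on IP blocks to stop vandalism, spam, and abuse. However, IP addresses are becoming less useful as stable identifiers of a single actor, and blocking an IP address can have unintended negative effects on good-faith users who happen to share an IP address with a bad actor. The declining stability of IP addresses, combined with our heavy reliance on IP blocks, has reduced the precision and effectiveness of actions against bad actors while causing growing collateral damage to good-faith users. We want to see the opposite: less collateral damage and more precise mitigations against bad actors.
To better support the people doing anti-abuse work, and to provide building blocks that can be reused in existing tools (e.g. CheckUser, Special:Block) and in new ones, this key result proposes exploring how to reliably associate individual actors with their actions (sockpuppet mitigation). We would combine this with existing signals (e.g. IP address, account history, request attributes) so that we can act against bad actors more precisely. |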
Kosta Harlan |
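WE4.3 | Reduce the effectiveness of large-scale distributed attacks by 50%, as measured by the time it takes us to adjust our measures and the traffic we can sustain in simulated scenarios. | The evolving internet landscape, including the rise of large botnets and more frequent attacks, has made our traditional methods of limiting large-scale abuse obsolete. Such attacks can flood our infrastructure with requests, making our sites unavailable, or can weaken our communities' ability to respond to large-scale vandalism. They also put unreasonable pressure on our highly privileged editors and our technical community.
We urgently need to improve our ability to automatically detect, withstand, mitigate, or block such attacks. To measure our progress, we cannot rely only on the frequency or intensity of real attacks. That would make us dependent on external actions, and it would be hard to get a clear, quantitative picture of our progress. Instead, we will set up multiple simulated attacks of different nature, complexity, and duration that can be run safely against our infrastructure once per quarter. This will let us test our new countermeasures without being under attack and report our improvements objectively. |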
Giuseppe Lavagetto |
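WE4.4 | Launch temporary accounts on 100% of wikis. | Temporary accounts are a solution for complying with various regulatory requirements around the exposure of IP addresses across the various surfaces of our platform. This work involves updating many products, data pipelines, functionary tools, and volunteer workflows so that they can handle an additional type of account. | Madalina Ana |
WE5.1 | By the end of Q3, complete at least 5 interventions aimed at improving platform sustainability. | The sustainability of the MediaWiki platform is evergreen work. It matters for our ability to scale, to improve developer satisfaction (or avoid its decline), and to grow our technical community. Sustainability depends on both technical and social factors, and we have tacit knowledge of specific improvement areas that are strategically significant for it. The planned interventions may help improve the platform's sustainability and maintainability, or prevent it from degrading. We plan to evaluate the impact of this work in Q4 and make recommendations for future sustainability goals. Examples of sustainability interventions include: simplifying complex code domains in MediaWiki core that only a few people understand; increasing the use of code analysis tools to understand the quality of our codebase; and simplifying processes such as packaging and releasing. | Mateus Santos |
WE5.2 | By the end of Q2, identify, and by the end of Q4 complete, one or more interventions that develop the programming interfaces of the MediaWiki ecosystem to enable decoupled, simpler, and more sustainable feature development. | The main goal of KR 5.2 is to improve and clarify the interactions between the MediaWiki core platform and its extensions, skins, and other components. We aim to make functional improvements to MediaWiki's architecture that achieve practical modularity and maintainability, make extensions easier to develop, and meet the requirements of the broader MediaWiki product vision. This work also aims to inform what should (or should not) live in core, in extensions, or in the interfaces between them. The year will be split into two phases: a 5-month research and experimentation phase, which will inform a second phase of implementing concrete interventions. | Jonathan Tweed |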
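WE5.3 | By the end of Q2, complete one data-collection plan and one performance-improvement experiment to inform subsequent product and platform interventions that use the capabilities unlocked by MediaWiki modeling pages as compositions of structured fragments. | The main goal here is to enable developers and product managers to use new MediaWiki platform capabilities to meet current and future needs for encyclopedic content. This means delivering new products that are currently hard to implement, and improving the platform's performance and resilience.
Specifically, we want to shift MediaWiki's processing model from treating a page as a monolithic unit to treating it as a composition of structured content units. Parsoid-based read views, Wikidata integration, and the integration of Wikifunctions into the wikis are all implicit steps toward this goal. As part of this key result, we want to experiment and collect data more deliberately. This will inform future interventions built on these new capabilities and help ensure we achieve the intended infrastructure and product impact. |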
Subramanya Sastry |
WE5.4 | By the end of Q2, execute the 1.43 LTS release with a new MW release process that synchronizes with PHP upgrades. | The MediaWiki software platform relies on regular upgrades to the next PHP version to remain secure and sustainable. These upgrades are a pain point in our process and are important for modernizing our infrastructure. At the same time, we regularly release new versions of the MediaWiki software, which other sites depend on. One example is translatewiki.net, the platform used to translate software messages for the Wikimedia projects and many other open source projects. There is an opportunity to improve the MediaWiki release process before the next release, which will be a long-term support (LTS) version. This includes technical documentation and synchronization with PHP upgrades, in alignment with the MediaWiki product strategy. This work is part of our strategic investment in the sustainability of the MediaWiki platform (see also: 5.1) and aims to improve developer experience and infrastructure management. |
Mateus Santos |
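WE6.1 | By the end of Q4, resolve 5 questions to improve developer and engineering workflows, service efficiency, and informed decision-making, and provide the relevant data. | "It's complicated!" is a common answer to questions such as "Which repositories are deployed to Wikimedia production?" In this key result, we will explore some "evergreen questions" in the area of engineering productivity and experience. These include recurring questions that seem simple but are hard to answer. Others are questions we can answer, but only with inaccessible data and custom queries from subject-matter experts. Still others are hard to get answered because of process gaps or other reasons. We will define what "resolved" means for each question. For some, it may simply mean making existing, accurate data easy to access, while others will require more research and engineering time. The overall goal of this work is to reduce the time, workarounds, and effort needed to gain insight into key aspects of the developer experience, and to enable us to improve engineering and developer workflows and services. | [TBD] |
WE6.2 | By the end of Q4, enhance an existing project and run at least two experiments aimed at providing maintainable, targeted environments that move us toward safe, semi-continuous delivery. | Developers and users rely on the Beta Cluster to catch bugs before they affect contributing users. Over time, the uses of the Beta Cluster have multiplied and come into conflict: they are too varied to be served by a single environment. We will improve one existing alternative environment and run experiments aimed at moving a single high-priority testing need, currently served by the Beta Cluster, to a maintainable alternative environment, so that each use case's needs are better met. | Tyler Cipriani |
WE6.3 | Develop a Toolforge sustainability scoring framework by Q3. Apply it to improve at least one critical platform aspect by Q4 and inform longer-term strategy. | Toolforge is a key platform for Wikimedia volunteers who build tools, and it plays a vital role in everything from editing to anti-vandalism. We aim to improve Toolforge's usability, lower barriers to contribution, improve community practices, and promote adherence to established policies. To that end, we will introduce a scoring system by the end of Q2 to evaluate the sustainability of the Toolforge platform, covering both technical and social aspects. Guided by this system, we aim to improve one of the key technical factors by 50%. | Slavina Stefanova |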
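Several key results above are stated as relative changes against a baseline. WE1.2 targets a 10% (mobile web) and 25% (iOS) year-over-year relative increase, evaluated per platform, and WE3.2 targets a 50% per-platform increase in non-banner donations. A minimal sketch of that arithmetic follows, assuming per-platform baseline and current counts are already available (for WE1.2 the real baseline is only set at the end of Q4). The function names and every number in the example are hypothetical.

```python
def relative_change(baseline: float, current: float) -> float:
    """Relative change of `current` versus `baseline` (0.10 means +10%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative change")
    return (current - baseline) / baseline


def meets_target(baseline: float, current: float, target: float) -> bool:
    """True if the relative increase reaches the target fraction."""
    return relative_change(baseline, current) >= target


# Hypothetical per-platform targets, mirroring the WE1.2 structure:
# each platform is evaluated separately against its own target.
targets = {"mobile web": 0.10, "iOS": 0.25}

# Hypothetical (baseline, current) counts, for illustration only.
counts = {"mobile web": (2000, 2250), "iOS": (400, 480)}

for platform, (baseline, current) in counts.items():
    change = relative_change(baseline, current)
    met = meets_target(baseline, current, targets[platform])
    print(f"{platform}: {change:+.1%} (target met: {met})")
```

Evaluating each platform separately, rather than pooling the counts, matches the note on WE1.2 that the key result is measured per platform.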
Signals & Data Services (SDS) key results
Key result shortname | Key result text | Key result context | Owner |
---|---|---|---|
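SDS1.1 | By the end of Q3, two projects or key-result-driven initiatives have evaluated the direct impact of their work on one or more core metrics. | Our core organizational metrics are a key tool for evaluating the Wikimedia Foundation's progress toward its goals. As we allocate resources to projects and design key-result-driven workstreams, these high-level metrics should guide how we connect those investments to the Foundation's overall goals as defined in the annual plan.
The work in this key result acknowledges that the Wikimedia Foundation as a whole is at an early stage in quantitatively linking the impact of all planned interventions to high-level or core metrics. To move toward that goal, this key result aims to develop a process through which we can share the logical and theoretical links between our programs and high-level measures. In practice, this means working with initiative owners across the Foundation to understand how the outcomes of their project-level work connect to, and affect, the Foundation-level core metrics. We will use impact-mapping frameworks and exercises, such as theory-of-change mapping and causal diagram construction, to ensure consistency and rigor in documenting the potential impact of the work. To achieve this key result, we also need to develop supporting materials that help program owners understand the organizational metrics and learn how to build a theory of change relevant to their work. |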
Omari Sefu |
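SDS1.2 | Answer two strategic open research questions by December 2024, in order to make recommendations or inform annual planning for fiscal year 2026. | There are many open research questions in the Wikimedia ecosystem, and answering some of them is strategically important for the Wikimedia Foundation or the wider movement. The answers can inform future product or technology development, or support decision-making and advocacy in the policy space. Some of these questions can be answered with pure research or research-engineering expertise. However, because the Wikimedia projects are socio-technical, reliable insights often require cross-team collaboration on data collection, context building, user engagement, careful experiment design, and so on. With this key result, we aim to dedicate some of our resources to answering one or more such questions.
The work in this key result includes prioritizing a list of strategic open questions and carrying out experimental work to answer X of them (currently estimated at two). The ideal questions for this key result are ones whose answers have an unlocking effect, enabling multiple other teams or groups to carry out product, technology, or policy work with more information. We hope the work in this key result will complement the following key results: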
|
Leila Zia |
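SDS1.3 | Reduce by at least 50% the average time it takes data stakeholders to understand and trace the data flows for 3 core and essential metrics. | This is required by data governance standards.
Tracing the transformations and provenance of datasets is difficult and requires knowledge of different repositories and systems. We should make it easier for people to understand how data flows through our systems, so that data stakeholders can work in a more self-service way. This work will support workflows in which data is transformed and used for analysis, features, APIs, and data-quality operations. A follow-on key result will cover documenting metrics. |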
Luke Bowmaker |
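SDS2.1 | By the end of Q2, we can support one product team in evaluating a feature or product through basic A/B testing, reducing the time it takes them to obtain user interaction data by 50%. | We believe that using shared tools will increase product teams' confidence in data-driven decisions, improve efficiency and productivity, and strengthen product strategy and innovation. We will use the time the adopting team currently needs on its own to obtain user interaction data as the baseline, and improve on it by 50%. We will also study how these gains fit into the broader context of all product teams. We want to learn how to improve the experience based on feedback from the adopting team and the results of SDS2.2, and to identify and prioritize capability enhancements. |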
Virginia Poundstone |
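SDS2.2 | By the end of Q2, we will have three essential metrics for analyzing experiments (A/B tests), to support testing product/feature hypotheses related to FY2024-25 key results. | When a product manager (or designer) hypothesizes that a product/feature will address a user or organizational problem/need, an experiment is how they test that hypothesis and learn about their idea's potential impact on metrics. Experiment results inform the product manager and help them decide what to do next: abandon the idea and try a different hypothesis, continue development (if the experiment ran early in the development lifecycle), or release the product/feature to more users. Product managers must be able to make such decisions confidently, backed by evidence they trust and understand.
A major obstacle is that product teams currently build hypotheses around bespoke, project-specific metrics that need dedicated analyst support to define, measure, analyze, and report. This could be addressed by formulating all testable product/feature hypothesis statements with a common set of essential metrics instead:
We believe that a widely understood and consistently used set of essential metrics, informed by industry-standard metrics, will raise the organization's data literacy and foster a culture of review, experimentation, and learning. We are focusing on essential metrics that (1) are needed to best measure and evaluate the success/impact of products/features related to 2 Wiki Experiences key results (WE3.1 and WE1.2), and (2) reflect or map to standard metrics used in industry web analytics. |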
Mikhail Popov |
SDS2.3 | Deploy a unique agent tracking mechanism to our CDN that enables A/B testing of product features with anonymous readers. | Without such a tracking mechanism, it is not feasible to run A/B tests of product features with anonymous readers.
This is essentially a milestone-based result: it creates a new technical capability on which others can build measurable work. The priority use case will be A/B testing of features with anonymous readers. This work also enables other important future work, which may lead to follow-on hypotheses later in WE4.x (request risk ratings and mitigation of large-scale attacks) and in metrics/research on unique device counts, as resourcing and priorities allow. |
Brandon Black |
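SDS2.3 treats a per-agent identifier at the CDN as the prerequisite for A/B testing with anonymous readers. Once such an identifier exists, a common way to assign agents to experiment arms is to hash the identifier together with the experiment name. The same agent then always lands in the same arm, and no assignment table has to be stored. The sketch below assumes an opaque identifier string. The function name, experiment name, and two-arm split are hypothetical and are not part of any Wikimedia API.

```python
import hashlib


def assign_arm(agent_id: str, experiment: str,
               arms: tuple[str, ...] = ("control", "treatment")) -> str:
    """Deterministically map an anonymous agent to an experiment arm.

    Hashing "experiment:agent_id" gives a stable, roughly uniform
    assignment per experiment. Splits are roughly independent across
    experiments because the experiment name is part of the hash input.
    """
    key = f"{experiment}:{agent_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Read the first 8 bytes as an unsigned integer, then take it modulo
    # the number of arms to pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[bucket]


# The same agent always gets the same arm for a given experiment.
first = assign_arm("agent-123", "search-recommendations")
assert first == assign_arm("agent-123", "search-recommendations")
print(first)
```

Including the experiment name in the hash input avoids the same readers always ending up in the treatment group across every experiment.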
Future Audiences (FA) key results
Key result shortname | Key result text | Key result context | Owner |
---|---|---|---|
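FA1.1 | By the end of Q3, at least one objective or key result owned by a team other than Future Audiences appears in the draft of next year's annual plan, based on experimental insights and recommendations from Future Audiences. | Since 2020, the Wikimedia Foundation has been tracking external trends that may affect our ability to serve future generations of knowledge consumers and contributors, and to keep the free knowledge movement thriving for generations to come. Future Audiences is a small R&D team that will:
During our regular annual planning period, make recommendations, based on insights gained from experiments, about new non-experimental investments the Wikimedia Foundation should make (i.e. new products or programs that need to be owned by one or more teams). This key result will be achieved if the draft annual plan for the next fiscal year includes at least one objective or key result that is owned by a team other than Future Audiences and driven by a Future Audiences recommendation. |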
Maryana Pinchuk |
Product & Engineering Support (PES) key results
Key result shortname | Key result text | Key result context | Owner |
---|---|---|---|
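PES1.1 | Culture of review: Incrementally raise the scores that Product & Technology staff give our delivery, alignment, direction, and team health in quarterly surveys. | A culture of review is a product development culture based on shorter cycles of iteration, learning, and adaptation. It means that our organization may set annual goals, but the work we do to reach them will change and adapt during the year as we learn. Building a culture of review has two components, process and behavior, and this key result focuses on the latter. Behavior change can develop and strengthen our culture of review. As we move toward more iterative product development, this involves changes in individual habits and routines. This key result will be based on self-reported changes in individual behavior and will measure any resulting change in staff sentiment. | Amy Tsay |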
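PES1.2 | By the end of Q2, the new Wishlist better connects the movement's ideas and requests with the activities of the Wikimedia Foundation's Product & Technology department: wishlist backlog items are addressed through FY2024-2025 key results, the Wikimedia Foundation has completed 10 smaller wishes, and the Wikimedia Foundation has identified 3 or more opportunity areas with volunteers for FY2025-26. | The Community Wishlist represents a small part of the Wikimedia movement: about 1,000 people take part, most of them contributors or administrators. People often bypass the Wishlist by filing feature requests and bug reports in Phabricator, where it is hard to tell whether a request comes from the Wikimedia Foundation or the community. For participants, the Wishlist is an expensive investment of time with minimal return. They still take part because they see it as the only tool for drawing attention to impactful bugs and feature improvements, or for signaling the need for broader strategic opportunities. Wishes are often written as solutions rather than problems. The solutions may look reasonable on paper but do not necessarily account for technical complexity or implications for the Wikimedia movement strategy.
The scope and breadth of wishes sometimes exceed the scope and capacity of Community Tech or any single team, which keeps frustration alive and has led to requests for comment and calls to cancel the Wishlist. Community members prefer to use the Wishlist to express project ideas, but Wikimedia Foundation teams look at the Wishlist alongside other intake processes to set priorities. This is partly because some wishes arrive at the wrong time for annual planning and are hard to fit into roadmaps/OKRs. The future Community Wishlist should become a bridge between the community and the Wikimedia Foundation: the community provides input in a structured way that we can act on, which in turn keeps volunteers happy. We are creating a new intake process through which any logged-in volunteer can submit wishes 365 days a year. Wishes can report or highlight bugs, request improvements, or propose new features. Anyone can comment on, discuss, or support wishes, which influences prioritization. The Wikimedia Foundation will not classify wishes as "too big" or "too small". Some wishes related to larger problem areas can inform annual planning and team roadmaps, providing strategic direction and opportunities. The movement will see all wishes in a dashboard tool that categorizes them by project, product/problem area, and wish type. The Wikimedia Foundation will respond to wishes promptly and work with the community to triage and prioritize them. We will work with Wikimedians to identify and prioritize three improvement areas and include them in the Foundation's 2025-2026 annual plan, to raise the adoption and completion rate of impactful wishes. We will tag well-scoped wishes for the volunteer developer community and for Wikimedia Foundation teams, increasing team and developer engagement and fulfilling more wishes to raise community satisfaction. Fulfilling more wishes can improve contributor happiness, efficiency, and retention, leading to more high-quality edits, higher-quality content, and more readers. |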
Jack Wheeler |
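PES1.3 | In Q1 and Q2, run and draw conclusions from two experiments based on existing exploratory products/features, giving us data/insights into how to evolve Wikipedia into a knowledge destination for current consumer and volunteer audiences. By the end of Q3, complete and share lessons learned and recommendations that future OKR work in the Wiki Experiences bucket could adopt. | This work parallels the objectives of the Future Audiences team, but focuses on finding opportunities to increase and deepen the engagement of existing audiences (Wikipedia consumers and contributors) by testing more on-platform product ideas more nimbly.
Its role within PES1 is to motivate and multiply: to channel the time that individuals and teams already put into hack/experimental side projects toward more promising features. This key result offers a path for some of these ideas to move, through validated experiments, into the apps at a larger scale, making more effective use of staff time and unlocking their creativity and productivity. By running more of these smaller, shorter projects, we can also spread our "bets", so that we learn from and try more ideas that might change Wikipedia in line with the evolving needs and expectations of current audiences. This will make our work more impactful and faster, because it helps the Wikimedia Foundation reach the right goals in less time. |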
Rita Ho |
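PES1.4 | Learn how to set and monitor SLOs and make decisions with them. Choose at least one new thing we are releasing for which to define an SLO. Collaborate with the relevant teams (typically Product, Development, and SRE) to define the SLO. Reflect on and document guidelines for which future releases should have SLOs and how to set them. | (Future) key result:
Set up processes and basic tooling to set and monitor SLOs for new releases. Report quarterly, decide when (or when not) to prioritize work to fix problems, and share the reports with the community. Why: We do not know when we need to prioritize fixing certain problems. We have a large amount of code, and as it keeps growing we will more often face choices between fixing problems and focusing on innovation, with growing uncertainty about when that choice has to be made. In addition, staff and the community are unclear about our level of support for, and commitment to, the reliability and performance of all the different features and functions they interact with. If we define expected service levels, we will know when to allocate resources. |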
Mark Bergsma |
PES1.5 | Define ownership and commitments (including SLOs) for services, and learn how to track, report, and make decisions with them as a standard and scalable practice by trialing this in 3 teams under different senior leaders in the department. | After collaboratively defining an SLO for the EditCheck feature as part of PES1.4, we will now trial the SLO in practice and learn from using it to help prioritize reliability work. We will also document roles and responsibilities for ownership of code/services, allowing us to make clear, shared commitments on the level of ongoing support. We will try out these practices in 3 teams across the department. | Mark Bergsma |
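PES1.4 and PES1.5 are about setting SLOs and using them to decide when reliability work should take priority. One common way to turn an SLO into that kind of decision signal is an error budget: the share of allowed failures not yet used up in the measurement window. The sketch below assumes a request-based availability SLO. The 99.9% target, the event counts, and the "budget exhausted" rule are hypothetical and are not SLOs defined in this plan.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    target: float       # e.g. 0.999 means 99.9% of events must be good
    good_events: int    # successful requests observed in the window
    total_events: int   # all requests observed in the window

    def __post_init__(self) -> None:
        if not 0 < self.target < 1:
            raise ValueError("target must be strictly between 0 and 1")
        if self.total_events <= 0:
            raise ValueError("total_events must be positive")

    def availability(self) -> float:
        """Share of good events in the window."""
        return self.good_events / self.total_events

    def error_budget_remaining(self) -> float:
        """Share of the error budget still unspent (negative if overspent)."""
        allowed_bad = (1 - self.target) * self.total_events
        actual_bad = self.total_events - self.good_events
        return 1 - actual_bad / allowed_bad


window = SloWindow(target=0.999, good_events=999_400, total_events=1_000_000)
print(f"availability: {window.availability():.4%}")
print(f"error budget remaining: {window.error_budget_remaining():.0%}")

# Hypothetical policy: once the budget is spent, reliability work takes priority.
if window.error_budget_remaining() <= 0:
    print("prioritize reliability fixes over new features")
```

An error budget turns "is it reliable enough?" into a number that can be reported quarterly, which matches the reporting cadence described in PES1.4.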
Hypotheses
The hypotheses below are the specific things we are doing each quarter to address the associated key results above.
Each hypothesis is an experiment, or a stage in an experiment, that we believe will help achieve the key result. Teams make a hypothesis, test it, and then either iterate on their findings or develop an entirely new hypothesis. You can think of the hypotheses as bets of the teams' time: a team might make a small bet of a few weeks or a big bet of several months, but the risk-adjusted reward should be commensurate with the time the team puts in. Our hypotheses are meant to be agile and to adapt quickly. We may retire, adjust, or start a hypothesis at any point in the quarter.
To see the most up-to-date status of a hypothesis and/or to discuss a hypothesis with the team please click the link to its project page below.
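Several Q1 hypotheses below define success as a classifier clearing precision and recall thresholds. For example, WE1.2.4 and WE2.1.1 both use >70% precision and >50% recall. The sketch below shows how such a check can be computed from a labeled evaluation set. The labels, predictions, and function names are hypothetical; in practice the evaluation data would come from each team's own annotated sample.

```python
def precision_recall(y_true: list[bool], y_pred: list[bool]) -> tuple[float, float]:
    """Precision and recall for a binary classifier (True = positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def passes(y_true: list[bool], y_pred: list[bool],
           min_precision: float = 0.70, min_recall: float = 0.50) -> bool:
    """Strict thresholds, matching the '>70% precision and >50% recall' wording."""
    p, r = precision_recall(y_true, y_pred)
    return p > min_precision and r > min_recall


# Hypothetical labeled sample (True = e.g. "peacock" wording present).
y_true = [True, True, True, True, False, False, False, False, True, False]
y_pred = [True, True, True, False, False, True, False, False, False, False]
print(precision_recall(y_true, y_pred))  # (0.75, 0.6)
print(passes(y_true, y_pred))            # True
```

Strict comparisons (`>`) are used because the hypotheses state the thresholds as ">70%" and ">50%".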
Q1
The first quarter (Q1) of the WMF annual plan covers July-September.
Wiki Experiences (WE) Hypotheses
Hypothesis shortname | Q1 text | Details & Discussion |
---|---|---|
WE1.1.1 | If we expand the Event List to become a Community List that includes WikiProjects, then we will be able to gather some early learnings in how to engage with WikiProjects for product development. | |
WE1.1.2 | If we identify at least 15 WikiProjects in 3 separate Wikipedias to be featured in the Community List, then we will be able to advise Campaigns Product on the key characteristics needed to build an MVP of the Community List that includes WikiProjects. | |
WE1.1.3 | If we consult 20 event organizers and 20 WikiProject organizers on the best use of topics available via LiftWing, then we can prioritize revisions to the topic model that will improve topical connections between events and WikiProjects. | |
WE1.2.1 | If we build a first version of the Edit Check API and use it to introduce a new Check, we can evaluate the speed and ease with which other teams and volunteers could use the API to create new Checks and Suggested Edits. | |
WE1.2.2 | If we build a library of UI components and visual artefacts, Edit Check’s user experience can extend to accommodate Structured Tasks patterns. | |
WE1.2.3 | If we conduct user tests on two or more design prototypes introducing structured tasks to newcomers within/proximate to the Visual Editor, then we can quickly learn which designs will work best for new editors, while also enabling engineers to assess technical feasibility and estimate effort for each approach. | mw:Growth/Constructive activation experimentation |
WE1.2.4 | If we train an LLM on detecting "peacock" behavior, then we can learn whether it can detect this policy violation with >70% precision and >50% recall and, ultimately, decide whether the LLM is effective enough to power a new Edit Check and/or Suggested Edit. | |
WE1.2.5 | If we conduct an A/B/C test with the alt-text suggested edits prototype in the production version of the iOS app we can learn if adding alt-text to images is a task newcomers are successful with and ultimately, decide if it's impactful enough to implement as a suggested edit on the Web and/or in the Apps. | mw:Wikimedia Apps/iOS Suggested edits project/Alt Text Experiment |
WE1.3.1 | If we enable additional customisation of Automoderator's behaviour and make changes based on pilot project feedback in Q1, more moderators will be satisfied with its feature set and reliability, and will opt to use it on their Wikimedia project, thereby increasing adoption of the product. | mw:Automoderator |
WE1.3.2 | If we are able to interpret subsets of wishes as moderator-related focus areas and share these focus areas for community input in Q1-Q2, then we will have a high degree of confidence that our selected focus area will improve moderator satisfaction when it is released in Q3. | |
WE2.1.1 | If we build a country-level inference model for Wikipedia articles, we will be able to filter lists of articles to those about a specific region with >70% precision and >50% recall. | m:Research:Language-Agnostic Topic Classification/Countries |
WE2.1.2 | If we build a proof-of-concept providing translation suggestions that are based on user-selected topic areas, we will be set up to successfully test whether translators will find more opportunities to translate in their areas of interest and contribute more compared to the generic suggestions currently available. | mw: Translation suggestions: Topic-based & Community-defined lists |
WE2.1.3 | If we offer list-making as a service, we’ll enable at least 5 communities to make more targeted contributions in their topic areas as measured by (1) change in standard quality coverage of relevant topics on the relevant wiki and (2) a brief survey of organizer satisfaction with topic area coverage on-wiki. | |
WE2.1.4 | If we develop a proof of concept that adds translation tasks sourced from WikiProjects and other list-building initiatives, and present them as suggestions within the CX mobile workflow, then more editors will discover and translate articles focused on topical gaps. By introducing an option that lets editors select translation suggestions based on topical lists, we will test whether this approach increases content coverage in our projects. | mw: Translation suggestions: Topic-based & Community-defined lists |
WE2.2.1 | If we expand Wikimedia's State of Languages data by securing data sharing agreements with UNESCO and Ethnologue, at least one partner will decide to represent Wikimedia’s language inclusion progress in their own data products and communications. On top of being useful to our partner institutions, our expanded dataset will provide important contextual information for decision-making and provide communities with information needed to identify areas for intervention. | |
WE2.2.2 | If we map the language documentation activities that Wikimedians have conducted in the last 2 years, we will develop a data-informed baseline for community experiences in onboarding new languages. | |
WE2.2.3 | If we provide production wiki access to 5 new languages, with or without Incubator, we will learn whether access to a full-fledged wiki with modern features such as those available on English Wikipedia (including ContentTranslation and Wikidata support, advanced editing and search results) aids in faster editing. Ultimately, this will inform us if this approach can be a viable direction for language onboarding for new or existing languages, justifying further investigation. | mw:Future of Language Incubation |
WE2.3.1 | If we make two further improvements to the media upload flow on Commons and share them with the community, the feedback will be positive and uploaders will make fewer bad uploads (with a focus on copyright), as measured by the ratio of deletion requests within 30 days of upload. This will include defining designs for further UX improvements to the release rights step in the Upload Wizard on Commons and rolling out an MVP of logo detection in the upload flow. | |
WE2.4.1 | If we build a prototype of Wikifunctions calls embedded within MediaWiki content, we will be ready to use MediaWiki’s async content processing pipeline and test its performance feasibility in Q2. | phab:T261472 |
WE2.4.2 | If we create a design prototype of an initial Wikifunctions use case in a Wikipedia wiki, we will be ready to build and test our integration once performance feasibility is validated in Q2 (see hypothesis WE2.4.1). | phab:T363391 |
WE2.4.3 | If we make it possible for Wikifunctions users to access Wikidata lexicographical data, they will begin to create natural language functions that generate sentence phrases, including functions that can handle irregular forms. If, after the feature becomes available, we see an average of 31 such functions created per month, we will know that our experiment is successful. | phab:T282926 |
WE3.1.1 | Designing and qualitatively evaluating three proofs of concept focused on building curated, personalized, and community-driven browsing and learning experiences will allow us to estimate the potential for increased reader retention (experiment 1: providing recommended content in search and article contexts; experiment 2: summarizing and simplifying article content; experiment 3: making multitasking easier on wikis). | |
WE3.1.3 | If we develop models for remixing content such as a content simplification or summarization that can be hosted and served via our infrastructure (e.g. LiftWing), we will establish the technical direction for work focused on increasing reader retention through new content discovery features. | |
WE3.1.4 | If we analyze the projected performance impact of hypotheses WE3.1.1 and WE3.1.2 on the Search API, we can scope and address performance and scalability issues before they negatively affect our users. | |
WE3.1.5 | If we enhance the search field in the Android app to recommend personalized content based on a user's interest and display better results, we will learn if this improves user engagement by observing whether it increases the impression and click-through rate (CTR) of search results by 5% in the experimental group compared to the control group over a 30-day A/B test. This improvement could potentially lead to a 1% increase in the retention of logged out users. | |
WE3.2.1 | If we create a clickable design prototype that demonstrates the concept of a badge representing donors championing article(s) of interest, we can learn if there would be community acceptance for a production version of this method for fundraising in the Apps. | Fundraising Experiment in the iOS App |
WE3.2.2 | Increasing the prominence of entry points to donations in the logged-out mobile web and desktop experiences will increase the clickthrough rate of the donate link by 30% year over year. | phab:T368765 |
WE3.2.3 | If we make the “Donate” button in the iOS App more prominent by making it one click or less away from the main navigation screen, we will learn if discoverability was a barrier to non banner donations. | |
WE3.3.1 | If we select a data visualization library and get an initial version of a new server-rendered graph service available by the end of July, we can learn from volunteers at Wikimania whether we’re working towards a solution that they would use to replace legacy graphs. | |
WE4.1.1 | If we implement a way in which users can report potential instances of harassment and harmful content present in discussions through an incident reporting system, we will be able to gather data around the number and type of incidents being reported and therefore have a better understanding of the landscape and the actions we need to take. | |
WE4.2.1 | If we explore and define Wikimedia-specific methods for a unique device identification model, we will be able to define the collection and storage mechanisms that we can later implement in our anti-abuse workflows to enable more targeted blocking of bad actors. | phab:T368388 |
WE4.2.9 | If we provide contextual information about the reputation associated with an IP that is about to be blocked, we will see fewer IP and IP-range blocks that cause collateral damage, because administrators will have more insight into a block's potential collateral effects. We can measure this by instrumenting Special:Block and comparing administrator behavior when the additional information is present versus when it is not. | WE4.2.9 Talk page |
WE4.2.2 | If we define an algorithm for calculating a user account reputation score for use in anti-abuse workflows, we will lay the groundwork for engineering efforts that use this score as an additional signal for administrators targeting bad actors on our platform. We will know the hypothesis is successful if the calculated scores map with X% precision to categories of existing accounts; e.g., a "low" score should apply to X% of permanently blocked accounts. | WE4.2.2 Talk page |
WE4.2.3 | If we build an evaluation framework using publicly available technologies similar to the ones used in previous attacks, we will learn more about how effective our current CAPTCHA is at blocking attacks and will be able to recommend a CAPTCHA replacement that brings a measurable improvement in the attack rate achievable for a given time and financial cost. | |
WE4.3.1 | If we apply machine learning and data analysis tools to webrequest logs during known attacks, we'll be able to identify, with at least 80% precision, abusive IP addresses sending largely malicious traffic, which we can then rate-limit at the edge, improving reliability for our users. | phab:T368389 |
WE4.3.2 | If we limit the load that known IP addresses of persistent attackers can place on our infrastructure, we'll reduce the number of impactful cachebusting attacks by 20%, improving reliability for our users. | |
WE4.3.3 | If we deploy a proof of concept of the 'Liberica' load balancer, we will measure a 33% improvement in our capacity to handle TCP SYN floods. | |
WE4.3.4 | If we make usability improvements and also perform some training exercises on our 'requestctl' tool, then SREs will report higher confidence in using the tool. | phab:T369480 |
WE4.4.1 | If we run at least 2 deployment cycles of Temp Accounts, we will be able to verify that the feature works as intended. | |
WE5.1.1 | If we successfully roll out Parsoid Read Views to all Wikivoyages by Q1, this will boost our confidence in extending Parsoid Read Views to all Wikipedias. We will measure the success of this rollout through detailed evaluations using the Confidence Framework reports, with a particular focus on Visual Diff reports and the metrics related to performance and usability. Additionally, we will assess the reduction in the list of potential blockers, ensuring that critical issues are addressed prior to wider deployment. | |
WE5.1.2 | If we disable unused Graphite metrics, target migrating metrics using the db-prefixed data factory, and increase our outreach to other teams and the community in Q1, then we will be on track to achieve our goal of making Graphite read-only by Q3 FY24/25, as measured by a 30% increase in migration progress. | |
WE5.1.3 | If we implement a canonical URL structure with versioning for our REST API, then we can enable service migration and testing for Parsoid endpoints and similar services by Q1. | phab:T344944 |
WE5.1.4 | If we complete the remaining work to mitigate the impact of browsers' anti-tracking measures on CentralAuth autologin and move to a more resilient authentication infrastructure (SUL3), we will be ready to roll out to production wikis in Q2. | |
WE5.1.5 | If we increase the coverage of SonarCloud to include key MediaWiki Core repos, we will be able to improve the maintainability of the MediaWiki codebase. This hypothesis will be measured by splitting the selected repos into test and control groups, which will then be compared over the course of a quarter to measure the impact of commit-level feedback to developers. | |
WE5.2.1 | If we classify the types of hooks and extension registry properties used to influence the behavior of MediaWiki core, we will be able to focus further research and interventions on the most impactful ones. | [1] |
WE5.2.2 | If we explore a new architecture for notifications in MW core and Echo, we will discover new ways to provide modularity and new ways for extensions to interact with core. | [2] |
WE5.3.1 | If we instrument parser and cache code to collect template structure and fine-grained timing data, we can quantify the expected performance improvement which could be realized by future evolution of the wikitext parsing platform. | T371713 |
WE5.3.2 | On template edits, if we can implement an algorithm in Parsoid that reuses the HTML of a page depending on the edited template, without processing the page from scratch, and demonstrate a processing speedup of 1.5x or higher, we will have a potential incremental parsing solution for efficiently updating pages after template edits. | T363421 |
WE5.4.1 | If the MediaWiki engineering group succeeds in establishing accountability for the release process and enhances its communication process by the end of Q2, in alignment with the product strategy, we will eliminate the current process that relies on unplanned or volunteer work and improve community satisfaction with the release process. This will be measured by community feedback on the 1.43 LTS release, coupled with a significant reduction in the unplanned staff and volunteer hours needed for release processes. | |
WE5.4.2 | If we research and build a process to upgrade PHP more regularly in conjunction with our MediaWiki release process, we will increase speed and security while reducing the complexity and runtime of our CI systems, as demonstrated by a successful PHP 8.1 upgrade before the 1.43 release. | |
WE6.1.1 | If we design and complete the initial implementation of an authorization framework, we’ll establish a system to effectively manage the approval of all LDAP access requests. | |
WE6.1.2 | If we research available documentation metrics, we can establish metrics that measure the health of Wikimedia technical documentation, using MediaWiki Core documentation as a test case. | mw:Wikimedia Technical Documentation Team/Doc metrics |
WE6.1.3 | If we collect insights on how different teams make technical decisions, we will be able to gather good practices that can help enable and scale similar practices across the organization. | |
WE6.2.1 | If we publish a versioned build of MediaWiki, extensions, skins, and Wikimedia configuration at least once per day, we will uncover new constraints and establish a baseline for the wall-clock time needed to perform a build. | mw:Wikimedia Release Engineering Team/Group -1 |
WE6.2.2 | If we replace the backend infrastructure of our existing shared MediaWiki development and testing environments (moving from Apache virtual servers to Kubernetes), we will be able to extend their use to MediaWiki services, in addition to the existing ability to develop MediaWiki core, extensions, and skins in an isolated environment. We will develop one environment that includes MediaWiki, one or more extensions, and one or more services. | wikitech:Catalyst |
WE6.2.3 | If we create a new deployment UI that provides more information to the deployer and reduces the privileges needed to deploy, it will make deployment easier and open deployments to more users, as measured by the number of unique deployers and the number of patches backported as a percentage of our overall deployments. | Wikimedia Release Engineering Team/SpiderPig |
WE6.2.4 | If we migrate votewiki, wikitech and Commons to MediaWiki on Kubernetes, we will reap the benefits of consistency and no longer need to maintain 2 different infrastructure platforms in parallel, allowing us to reduce the amount of custom-written tooling and making deployments easier and less toilsome for deployers. This will be measured by a decrease in total deployment times and a reduction in deployment blockers. | phab:T292707 |
WE6.2.5 | If we move MultiVersion routing out of MediaWiki, we'll be able to ship single-version MediaWiki containers, substantially reducing container size and allowing faster deployments, as measured by the deployment tool. | SingleVersion MW: Routing options |
WE6.3.1 | By consulting Toolforge maintainers about the least sustainable aspects of the platform, we will be able to gather a list of potential categories to measure. | |
WE6.3.2 | By creating a "standard" tool to measure the number of steps in a deployment, we will be able to assess the maximal improvement in the deployment process. | |
WE6.3.3 | If we conduct usability tests, user interviews, and competitive analysis to explore the existing workflows and use cases of Toolforge, we can identify key areas for improvement. This research will enable us to prioritize enhancements that have the most significant impact on user satisfaction and efficiency, laying the groundwork for a future design of the user interface. | |
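WE4.3.1 and WE4.2.2 both state their success criteria as precision targets. As a minimal sketch, assuming a hypothetical ground-truth set of abusive IPs labelled after a known attack (the plan does not say how labels are obtained), precision is the share of flagged addresses that turn out to be abusive:

```python
from typing import Iterable


def precision(flagged: Iterable[str], known_abusive: Iterable[str]) -> float:
    """Fraction of flagged items that are truly abusive: TP / (TP + FP).

    Both inputs are hypothetical: the IPs a model flagged, and a
    ground-truth set labelled after a known attack.
    """
    flagged_set = set(flagged)
    if not flagged_set:
        # No predictions: precision is undefined; this sketch reports 0.0.
        return 0.0
    true_positives = len(flagged_set & set(known_abusive))
    return true_positives / len(flagged_set)


def meets_target(flagged, known_abusive, target: float = 0.80) -> bool:
    """Check a WE4.3.1-style threshold (at least 80% precision)."""
    return precision(flagged, known_abusive) >= target


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737), not real traffic data.
    flagged = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "198.51.100.7", "203.0.113.9"]
    abusive = {"192.0.2.1", "192.0.2.2", "192.0.2.3", "198.51.100.7"}
    print(precision(flagged, abusive))     # 0.8
    print(meets_target(flagged, abusive))  # True
```

Precision ignores abusive addresses the model misses (recall). That fits the plan's wording, which only sets a precision bar, presumably so that edge rate limits rarely hit legitimate users. The function names are illustrative, not an existing Wikimedia tool.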
Signals & Data Services (SDS) Hypotheses
[ SDS Key Results ] | ||
---|---|---|
Hypothesis shortname | Q1 text | Details & Discussion |
SDS1.1.1 | If we partner with an initiative owner and evaluate the impact of their work on Core Foundation metrics, we can identify and socialize a repeatable mechanism by which teams at the Foundation can reliably impact Core Foundation metrics. | |
SDS1.2.2 | If we study the recruitment, retention, and attrition patterns among long-tenure community members in official moderation and administration roles, and understand the factors affecting these phenomena (the ‘why’ behind the trends), we will better understand the extent, nature, and variability of the phenomenon across projects. This will in turn enable us to identify opportunities for better interventions and support aimed at producing a robust multi-generational framework for editors. | phab:T368791 |
SDS1.2.1 | If we gather use cases from product and feature engineering managers around the use of AI in Wikimedia services for readers and contributors, we can determine if we should test and evaluate existing AI models for integration into product features, and if yes, generate a list of candidate models to test. | phab:T369281 |
SDS1.3.1 | If we define the process to transfer all data sets and pipeline configurations from the Data Platform to DataHub, we can build tooling to generate lineage documentation automatically. | |
SDS1.3.2 | If we implement a well-documented and well-understood process to produce an intermediary table representing MediaWiki Wikitext History, populated using the event platform, and monitor the reliability and quality of the data, we will learn what additional parts of the process are needed to make this table production-ready and widely supported by the Data Platform Engineering team. | |
SDS2.1.2 | If we investigate Data Products' current software development life cycle (SDLC), we will be able to determine inflection points where QTE knowledge can be applied to have a positive impact on Product Delivery. | |
SDS2.1.3 | If the Growth team learns about the Metrics Platform by instrumenting a Homepage Module on the Metrics Platform, then we will be prepared to outline a measurement plan in Q1 and complete an A/B test on the new Metrics platform by the end of Q2. | |
SDS2.1.4 | If we conduct usability testing on our prototype among pilot users of our experimentation process, we can identify and prioritize the primary pain points faced by product managers and other stakeholders in setting up and analyzing experiments independently. This understanding will lead to the refinement of our tools, enhancing their efficiency and impact. | |
SDS2.1.5 | If we design a documentation system that guides the experience of users building instrumentation using the Metrics Platform, we will enable those users to independently create instrumentation without direct support from Data Products teams, except in edge cases. | phab:T329506 |
SDS2.2.1 | If we define a metric for logged-out mobile app reader retention that is applicable for analyzing experiments (A/B tests), we can provide guidance for planning instrumentation to measure the retention rate of logged-out readers in the mobile apps and enable the engineering team to develop an experiment strategy targeting logged-out readers. | |
SDS2.2.2 | If we define a standard approach for measuring and analyzing conversion rates, it will help us establish a collection of well-defined metrics to be used for experimentation and baselines, and start enabling comparisons between experiments/projects to increase learning from these. | |
SDS2.2.3 | If we define a standard way of measuring and analyzing clickthrough rate (CTR) in our products/features, it will help us design experiments that target CTR for improvement, standardize click-tracking instrumentation, and enable us to make CTR available as a target metric to users of the experimentation platform. | |
SDS2.3.1 | If we conduct a legal review of proposed unique cookies for logged-out users, we can determine whether there are any privacy policy or other legal issues that should inform the community consultation and/or affect the technical implementation itself. | |
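SDS2.2.2 and SDS2.2.3 call for standard definitions of conversion rate and clickthrough rate. The plan does not give formulas; the sketch below assumes the common definitions (clicks per impression, converting users per exposed user) and a hypothetical event log of `(user_id, event_type)` pairs, only to show how one shared definition could serve both experiments and baselines.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# Hypothetical event log entry: (user_id, event_type). The event names
# "impression", "click" and "donate" are illustrative assumptions.
Event = Tuple[str, str]


def clickthrough_rate(events: Iterable[Event]) -> float:
    """Event-level CTR: total clicks divided by total impressions."""
    counts = defaultdict(int)
    for _user, kind in events:
        counts[kind] += 1
    impressions = counts["impression"]
    return counts["click"] / impressions if impressions else 0.0


def conversion_rate(events: Iterable[Event], goal: str = "donate") -> float:
    """Share of exposed users (at least one impression) who also emitted `goal`.

    Event order is not checked; a production definition would need to settle that.
    """
    exposed, converted = set(), set()
    for user, kind in events:
        if kind == "impression":
            exposed.add(user)
        elif kind == goal:
            converted.add(user)
    return len(exposed & converted) / len(exposed) if exposed else 0.0


if __name__ == "__main__":
    events = [
        ("u1", "impression"), ("u1", "click"),
        ("u2", "impression"), ("u2", "impression"),
        ("u3", "impression"), ("u3", "click"), ("u3", "donate"),
        ("u4", "donate"),  # converted without an impression: not counted
    ]
    print(clickthrough_rate(events))           # 0.5
    print(round(conversion_rate(events), 3))   # 0.333
```

Event-level CTR counts repeated impressions by the same user separately (u2 above); a user-level variant would deduplicate first. Choosing between such variants is the kind of decision SDS2.2.3 proposes to standardize.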
Future Audiences (FA) Hypotheses
[ FA Key Results ] | ||
---|---|---|
Hypothesis shortname | Q1 text | Details & Discussion |
FA1.1.1 | If we make off-site contribution very low effort with an AI-powered “Add a Fact” experiment, we can learn whether off-platform users could help grow/sustain the knowledge store in a possible future where Wikipedia content is mainly consumed off-platform. | m:Future Audiences/Experiment:Add a Fact |
Product and Engineering Support (PES) Hypotheses
[ PES Key Results ] | ||
---|---|---|
Hypothesis shortname | Q1 text | Details & Discussion |
PES1.1.1 | If the P&T leadership team syncs regularly on how they’re guiding their teams towards a more iterative software development culture, and we collect baseline measurements of current development practices and staff sentiment on how we work together to ship products, we will discover opportunity areas for change management. The themes that emerge will enable us to build targeted guidance or programs for our teams in coming quarters. | |
PES1.2.2 | If the Moderator Tools team researches the Community Wishlist and develops 2+ focus areas in Q1, then we can solicit feedback from the Community and identify a problem that the Community and WMF are excited about tackling. | |
PES1.2.3 | If we bundle 3-5 wishes that relate to selecting and inserting templates and ship an improved feature in Q1, then CommTech can use the learnings to develop a case study for the Foundation to incorporate more "focus areas" in the 2025-26 annual plan. | |
PES1.3.1 | If we provide insights to audiences about their community and their use of Wikipedia over a year, it will stimulate greater connection with Wikipedia – encouraging greater engagement in the form of social sharing, time spent interacting on Wikipedia, or donation. Success will be measured by completing an experimental project that provides at least one recommendation about “Wikipedia insights” as an opportunity to increase onwiki engagement. | mw: New Engagement Experiments#PES1.3.1_Wikipedia_user_insights |
PES1.3.2 | If we create a Wikipedia-based game for daily use that highlights the connections across vast areas of knowledge, it will encourage consumers to visit Wikipedia regularly and facilitate active learning, leading to increased interaction with content on Wikipedia. Success will be measured by completing an experimental project that provides at least one recommendation about gamification of learning as an opportunity to increase onwiki engagement. | mw: New Engagement Experiments#PES_1.3.2:_Wikipedia_games |
PES1.3.3 | If we develop a new process/track at a Wikimedia hack event to incubate future experiments, it will increase the impact and value of such events by making them a pipeline for future annual plan projects, while fostering closer connections between volunteers and engineering/design staff and encouraging volunteers to become more involved with strategic initiatives. Success will be measured by at least one PES1.3 project being initiated and/or advanced to an OKR from a foundation-supported event. | mw: New Engagement Experiments#PES_1.3.3:_Incubator_space |
PES1.4.1 | If we draft an SLO with the Editing team releasing Edit Check functionality, we will begin to learn and understand how to define and track user-facing SLOs together, and iterate on the process in the future. | |
PES1.4.2 | If we define and publish SLAs for putting OOUI into “maintenance mode”, growth of new code using OOUI across Wikimedia projects will stay within X% in Q1. | |
PES1.4.3 | If we map ownership of known owned services using the proposed service catalog in Q1, we will be able to identify significant gaps in the service catalog, which will help us build an SLO culture by the end of the year. | |
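PES1.4.1 is about learning to define and track user-facing SLOs together. One common way to track a request-based SLO is an error budget. The sketch below is illustrative only: the 99.5% target, the event counts, and the `SloWindow` name are assumptions, not figures the Editing team has set.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Counts for one SLO window. Field names are illustrative assumptions."""
    target: float       # e.g. 0.995 means 99.5% of events must be "good"
    total_events: int
    bad_events: int

    def sli(self) -> float:
        """Observed service level indicator: fraction of good events."""
        if self.total_events == 0:
            return 1.0
        return 1 - self.bad_events / self.total_events

    def error_budget(self) -> float:
        """Number of bad events the target allows in this window."""
        return (1 - self.target) * self.total_events

    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        budget = self.error_budget()
        if budget == 0:
            # A 100% target (or no traffic) leaves no budget to spend.
            return 0.0 if self.bad_events else 1.0
        return 1 - self.bad_events / budget

    def met(self) -> bool:
        """True if the observed SLI meets the target."""
        return self.sli() >= self.target


if __name__ == "__main__":
    window = SloWindow(target=0.995, total_events=200_000, bad_events=250)
    print(round(window.error_budget()))         # 1000
    print(round(window.budget_remaining(), 2))  # 0.75
    print(window.met())                         # True
```

Framing the SLO as a budget gives a team a concrete number to iterate on: here a quarter of the allowed failures has been used, so the release cadence need not change.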
Q2
The second quarter (Q2) of the WMF annual plan covers October-December.
Wiki Experiences (WE) Hypotheses
[ WE Key Results ] | ||
---|---|---|
Hypothesis shortname | Q2 text | Details & Discussion |
WE1.1.1 | If we expand the Event list to become a Community List that includes WikiProjects, then we will be able to gather some early learnings in how to engage with WikiProjects for product development. | Campaigns/Foundation Product Team/Event list |
WE1.1.2 | If we launch at least 1 consultation focused on on-wiki collaborations, and if we collect feedback from at least 20 people involved in such collaborations, then we will be able to advise Campaigns Product on the key characteristics needed to develop a new or improved way of connecting. | Campaigns/WikiProjects |
WE1.1.3 | If we consult 20 event organizers and 20 WikiProject organizers on the best use of topics available via LiftWing, then we can prioritize revisions to the topic model that will improve topical connections between events and WikiProjects. | |
WE1.1.4 | If we integrate CampaignEvents into Community Configuration in Q2, then we will set the stage for at least 5 more wikis opting to enable extension features in Q3, thereby increasing tool usage. | |
WE1.2.2 | If we build a library of UI components and visual artifacts, Edit Check’s user experience can extend to accommodate Structured Tasks patterns. | |
WE1.2.5 | If we conduct an A/B/C test with the alt-text suggested edits prototype in the production version of the iOS app we can learn if adding alt-text to images is a task newcomers are successful with and ultimately, decide if it's impactful enough to implement as a suggested edit on the Web and/or in the Apps. | |
WE1.2.6 | If we introduce new account holders to the “Add a Link” Structured Task in Wikipedia articles, we expect to increase the percentage of new account holders who constructively activate on mobile by 10% compared to the baseline. | |
WE1.3.1 | If we enable additional customisation of Automoderator's behaviour and make changes based on pilot project feedback in Q1, more moderators will be satisfied with its feature set and reliability, and will opt to use it on their Wikimedia project, thereby increasing adoption of the product. | mw:Moderator_Tools/Automoderator |
WE1.3.3 | If we improve the user experience and features of the Nuke extension during Q2, we will increase administrator satisfaction with the product by 5 percentage points by the end of the quarter. | mw:Extension:Nuke/2024_Moderator_Tools_project |
WE2.1.3 | If we offer list-making as a service, we’ll enable at least 5 communities to make more targeted contributions in their topic areas as measured by (1) change in standard quality coverage of relevant topics on the relevant wiki and (2) a brief survey of organizer satisfaction with topic area coverage on-wiki. | |
WE2.1.4 | If we develop a proof of concept that adds translation tasks sourced from WikiProjects and other list-building initiatives, and present them as suggestions within the CX mobile workflow, then more editors will discover and translate articles focused on topical gaps. By introducing an option that allows editors to select translation suggestions based on topical lists, we will test whether this approach increases the content coverage in our projects. | |
WE2.1.5 | If we expose topic-based translation suggestions more broadly and analyze their initial impact, we will learn which aspects of the translation funnel to act on in order to obtain more high-quality translations. | |
WE2.2.4 | If we provide production wiki access to 5 new languages, with or without Incubator, we will learn whether access to a full-fledged wiki with modern features such as those available on English Wikipedia (including ContentTranslation and Wikidata support, advanced editing and search results) leads to faster editing. Ultimately, this will tell us whether this approach can be a viable direction for onboarding new or existing languages, justifying further investigation. | |
WE2.2.5 | If we move addwiki.php to core and customize it for Wikimedia, we will improve the code quality of our wiki creation system, making it testable and robust, and we will make the process easier for creators of new wikis, taking significant steps towards simplifying wiki creation. | phab:T352113 |
WE2.3.2 | If we make two further improvements to the media upload flow on Commons and share them with the community, the feedback will be positive and uploaders will make fewer bad uploads (with a focus on copyright), as measured by the ratio of deletion requests within 30 days of upload. This will include the release of further UX improvements to the release rights step in the Upload Wizard on Commons and automated detection of external sources. | |
WE2.3.3 | If the BHL-Wikimedia Working Group creates Commons categories and descriptive guidelines for the South American and/or African species depicted in publications, they will make 3,000 images more accessible to biodiversity communities. (BHL = Biodiversity Heritage Library) | |
WE2.4.1 | If we build a prototype of Wikifunctions calls embedded within MediaWiki content and test it locally for stability, we will be ready to use MediaWiki’s async content processing pipeline and test its performance feasibility in Q2. | phab:T261472 |
WE2.4.2 | If we create a design prototype of an initial Wikifunctions use case in a Wikipedia wiki, we will be ready to build and test our integration when performance feasibility is validated in Q2, as stated in Hypothesis 1. | phab:T363391 |
WE2.4.3 | If we make it possible for Wikifunctions users to access Wikidata lexicographical data, they will begin to create natural language functions that generate sentence phrases, including those that can handle irregular forms. If we see an average monthly creation rate of 31 for these functions after the feature becomes available, we will know that our experiment is successful. | phab:T282926 |
WE3.1.3 | If we develop models for remixing content such as a content simplification or summarization that can be hosted and served via our infrastructure (e.g. LiftWing), we will establish the technical direction for work focused on increasing reader retention through new content discovery features. | Research |
WE3.1.6 | If we introduce a personalized rabbit hole feature in the Android app and recommend condensed versions of articles based on the types of topics and sections a user is interested in, we will learn if the feature is sticky enough to result in multi-day usage by 10% of users exposed to the experiment over a 30-day period, and a higher pageview rate than users not exposed to the feature. | |
WE3.1.7 | If we run a qualitative experiment focused on presenting article summaries to web readers, we will determine whether or not article summaries have the potential to increase reader retention, as proxied by clickthrough rate and usage patterns. | |
WE3.1.8 | If we build one feature which provides additional article-level recommendations, we will see an increase in clickthrough rate of 10% over existing recommendation options and a significant increase in external referrals for users who actively interact with the new feature. | |
WE3.2.2 | Increasing the prominence of entry points to donations on the logged-out experiences of the Vector web mobile and desktop experience will increase the clickthrough rate of the donate link by 30% YoY. | mw:Readers/2024_Reader_and_Donor_Experiences |
WE3.2.3 | If we make the “Donate” button in the iOS app more prominent by placing it at most one click away from the main navigation screen, we will learn whether discoverability was a barrier to non-banner donations. | Navigation Refresh |
WE3.2.4 | If we update the contributions page for logged-in users in the app to include an active badge for someone that is an app donor and display an inactive state with a prompt to donate for someone that decided not to donate in app, we will learn if this recognition is of value to current donors and encourages behavior of donating for prospective donors, informing if it is worth expanding on the concept of donor badges or abandoning it. | Private Donor Recognition Experiment |
WE3.2.5 | If we create a Wikipedia in Review experiment in the Wikipedia app that allows users to see and share personalized data about their reading, editing, and donation habits, we will see 2% of viewers donate on iOS as a result of this feature, 5% click share, and 65% of users rate the feature neutral or satisfactory. | Personalized Wikipedia Year in Review |
WE3.2.7 | Increasing the prominence of entry points to donations on the logged-out experiences of the Minerva web mobile and desktop experience will increase the clickthrough rate of the donate link by 30% YoY. | |
WE3.3.2 | If we develop the Charts MVP and get it working end-to-end in production test wikis, at least two Wikipedias and Commons will agree to pilot it before the code freeze in December. | |
WE3.4.1 | If we explore feasibility through an experiment that sets up smaller PoPs with cloud providers like Amazon, we can expand our data center map and reach more users around the world, at reduced cost and with faster turnaround. | |
WE4.1.2 | If we deploy at least one iteration of the Incident Reporting System MVP on pilot wikis, we will be able to gather valuable data around the frequency and type of incidents being reported. | https://meta.wikimedia.org/wiki/Incident_Reporting_System# |
WE4.2.1 | If we explore and define Wikimedia-specific methods for a unique device identification model, we will be able to define the collection and storage mechanisms that we can later implement in our anti-abuse workflows to enable more targeted blocking of bad actors. | |
WE4.2.9 | If we provide contextual information about the reputation associated with an IP that is about to be blocked, we will see fewer IP and IP-range blocks that cause collateral damage, because administrators will have more insight into a block's potential collateral effects. We can measure this by instrumenting Special:Block and comparing administrator behavior when the additional information is present versus when it is not. | |
WE4.2.2 | If we define an algorithm for calculating a user account reputation score for use in anti-abuse workflows, we will lay the groundwork for engineering efforts that use this score as an additional signal for administrators targeting bad actors on our platform. We will know the hypothesis is successful if the calculated scores map with X% precision to categories of existing accounts; e.g., a "low" score should apply to X% of permanently blocked accounts. | |
WE4.2.3 | If we build an evaluation framework using publicly available technologies similar to the ones used in previous attacks, we will learn more about how effective our current CAPTCHA is at blocking attacks and will be able to recommend a CAPTCHA replacement that brings a measurable improvement in the attack rate achievable for a given time and financial cost. | |
WE4.3.1 | If we apply machine learning and data analysis tools to webrequest logs during known attacks, we'll be able to identify, with at least 80% precision, abusive IP addresses sending largely malicious traffic, which we can then rate-limit at the edge, improving reliability for our users. | |
WE4.3.3 | If we deploy a proof of concept of the 'Liberica' load balancer, we will measure a 33% improvement in our capacity to handle TCP SYN floods. | |
WE4.3.5 | By creating a system that spawns and controls thousands of virtual workers in a cloud environment, we will be able to simulate Distributed Denial of Service (DDoS) attacks and effectively measure the system's ability to withstand, mitigate, and respond to such attacks. | |
WE4.3.6 | If we integrate the output of the models we built in WE4.3.1 with the dynamic thresholds of the per-IP concurrency limits we've built for our TLS terminators in WE4.3.2, we should be able to automatically neutralize attacks with 20% more volume, as measured with the simulation framework we're building. | |
WE4.3.7 | If we roll out a user-friendly web application that enables assisted editing and creation of requestctl rules, SREs will be able to mitigate cachebusting attacks in 50% less time than our established baseline. | |
WE4.4.2 | If we deploy Temporary Accounts to a set of small-to-medium sized projects, we will be able to verify that the functionality works as intended and gather data to inform necessary future work. | mw:/wiki/Trust_and_Safety_Product/Temporary_Accounts |
WE5.1.1 | If we successfully roll out Parsoid Read Views to all Wikivoyages by Q1, this will boost our confidence in extending Parsoid Read Views to all Wikipedias. We will measure the success of this rollout through detailed evaluations using the Confidence Framework reports, with a particular focus on Visual Diff reports and the metrics related to performance and usability. Additionally, we will assess the reduction in the list of potential blockers, ensuring that critical issues are addressed prior to wider deployment. | |
WE5.1.3 | If we reroute the endpoints currently exposed under rest_v1/page/html and rest_v1/page/title paths to comparable MW content endpoints, then we can unblock RESTbase sunsetting without disrupting clients in Q1. | |
WE5.1.4 | If we complete the remaining work to mitigate the impact of browsers' anti-tracking measures on CentralAuth autologin and move to a more resilient authentication infrastructure (SUL3), we will be ready to roll out to production wikis in Q2. | |
WE5.1.5 | If we increase the number of relevant SonarCloud rules enabled for key MediaWiki Core repositories and refine the quality of feedback provided to developers, we will optimize the developer experience and enable them to improve the maintainability of the MediaWiki codebase in the future. This will be measured by tracking developer satisfaction levels and whether test group developers feel the tool is becoming more useful and effective in their workflow. Feedback will be gathered through surveys and direct input from developers to evaluate the perceived impact on their confidence in the tool and the overall development experience. | |
WE5.1.7 | If we represent all content module endpoint responses (10 in total) in our MediaWiki REST API OpenAPI spec definitions, we will be able to implement programmatic validation to guarantee that our generated documentation matches the actual responses returned in code. | |
WE5.1.8 | If we introduce support for translating endpoint descriptions (i.e., not actual object definitions or payloads) into our generated MediaWiki REST API OpenAPI specs, we can lay the foundation for supporting Wikimedia’s expected internationalization standards. | |
WE5.2.3 | If we conduct an experiment to reimplement at least [1-3] existing Core and Extension features using a new Domain Event and Listener platform component pattern as an alternative to traditional hooks, we will be able to confirm our assumption that this intervention enables simpler implementation with more consistent feature behavior. | |
WE5.3.3 | If we instrument both parsers to collect availability of prior parses and timing of template expansions, and to classify updates and dependencies, we can prioritize work on selective updates (Hypothesis 5.3.2) informed by the quantification of the expected performance benefits. | |
WE5.3.4 | If we can increase the capability of our prototype selective update implementation in Parsoid using the learnings from the 5.3.1 hypothesis, we can leverage more opportunities to increase the performance benefit from selective update. | |
WE5.4.1 | If the MediaWiki engineering group succeeds in establishing accountability for the release process and enhances its communication process by the end of Q2, in alignment with the product strategy, we will eliminate the current process that relies on unplanned or volunteer work and improve community satisfaction with the release process. This will be measured by community feedback on the 1.43 LTS release, coupled with a significant reduction in the unplanned staff and volunteer hours needed for release processes. | |
WE5.4.2 | If we research and build a process to upgrade PHP more regularly in conjunction with our MediaWiki release process, we will increase speed and security while reducing the complexity and runtime of our CI systems, as demonstrated by a successful PHP 8.1 upgrade before the 1.43 release. | |
WE6.1.3 | If we collect insights on how different teams make technical decisions, we will be able to gather good practices that can help enable and scale similar practices across the organization. | |
WE6.1.4 | If we research solutions for indexing the code of all projects hosted in WMF’s code repositories, we will be able to pick a solution that allows our users to quickly discover where the code is located whenever dealing with incident response or troubleshooting. | |
WE6.1.5 | If we test a subset of draft metrics on an experimental group of technical documentation collections, we will be able to make an informed decision about which metrics to implement for MediaWiki documentation. | Wikimedia_Technical_Documentation_Team/Doc_metrics |
WE6.2.1 | If we publish a versioned build of MediaWiki, extensions, skins, and Wikimedia configuration at least once per day, we will uncover new constraints and establish a baseline for the wall-clock time needed to perform a build. | mw:Wikimedia Release Engineering Team/Group -1 |
WE6.2.2 | If we replace the backend infrastructure of our existing shared MediaWiki development and testing environments (moving from Apache virtual servers to Kubernetes), we will be able to extend their uses to include MediaWiki services, in addition to the existing ability to develop MediaWiki core, extensions, and skins in an isolated environment. We will develop one environment that includes MediaWiki, one or more extensions, and one or more services. | wikitech:Catalyst |
WE6.2.3 | If we create a new deployment UI that provides more information to the deployer and reduces the level of privilege needed to deploy, it will make deployment easier and open it to more users, as measured by the number of unique deployers and the number of backported patches as a percentage of our overall deployments. | mw:SpiderPig |
WE6.2.5 | If we move MultiVersion routing out of MediaWiki, we'll be able to ship single-version MediaWiki containers, substantially reducing container size and allowing faster deployments, as measured by the deployment tool. | https://docs.google.com/document/d/1_AChNfiRFL3VdNzf6QFSCL9pM2gZbgLoMyAys9KKmKc/edit |
WE6.2.6 | If we gather feedback from QTE, SRE, and individuals with domain-specific knowledge and use it to write a design document for deploying and using the wmf/next OCI container, we will reduce friction when we start deploying that container. | T379683 |
WE6.3.4 | If we enable the automatic deployment of a minimal tool, we will be able to evaluate the end-to-end flow and lay the groundwork for supporting more complex tools and deployment flows. | phab:T375199 |
WE6.3.5 | By assessing the relative importance of each sustainability category and its associated metrics, we can create a normalized scoring system. This system, when implemented and recorded, will provide a baseline for measuring and comparing Toolforge’s sustainability progress over time. | phab:T376896 |
WE6.3.6 | If we conduct discovery, such as target user interviews and competitive analysis, to identify existing Toolforge pain points and improvement opportunities, we will be able to recommend a prioritized list of features for the future Toolforge UI. | Phab:T375914 |
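WE6.3.5 proposes combining weighted sustainability categories into a normalized score. The Python sketch below shows one common way to do this. Each metric is min-max normalized onto [0, 1], each category score is the mean of its metrics, and the overall score is a weighted average across categories. The category names, weights, and metric ranges are hypothetical placeholders, not Toolforge’s actual sustainability categories. Min-max scaling is also only one of several possible normalization choices.

```python
"""Hypothetical sketch of a normalized, weighted sustainability score (cf. WE6.3.5).

Category names, weights, and metric ranges are invented for illustration;
they are not Toolforge's actual sustainability categories.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    value: float
    worst: float  # value that maps to a normalized score of 0.0
    best: float   # value that maps to a normalized score of 1.0


def normalize(metric: Metric) -> float:
    """Min-max normalize a metric onto [0, 1], clamping out-of-range values.

    Works whether higher is better (best > worst) or lower is better (best < worst).
    """
    span = metric.best - metric.worst
    if span == 0:
        raise ValueError("best and worst must differ")
    score = (metric.value - metric.worst) / span
    return min(1.0, max(0.0, score))


def sustainability_score(
    categories: dict[str, list[Metric]], weights: dict[str, float]
) -> float:
    """Weighted average of category scores; each category score is the mean of its normalized metrics."""
    total_weight = sum(weights[name] for name in categories)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = 0.0
    for name, metrics in categories.items():
        if not metrics:
            raise ValueError(f"category {name!r} has no metrics")
        category_score = sum(normalize(m) for m in metrics) / len(metrics)
        weighted += weights[name] * category_score
    return weighted / total_weight


if __name__ == "__main__":
    # Hypothetical example data for a single tool.
    tool = {
        "maintainers": [Metric(value=2, worst=0, best=3)],
        "documentation": [Metric(value=1, worst=0, best=1)],
        "open_bugs": [Metric(value=40, worst=100, best=0)],  # lower is better
    }
    weights = {"maintainers": 0.5, "documentation": 0.2, "open_bugs": 0.3}
    print(round(sustainability_score(tool, weights), 3))
```

Clamping prevents a single outlier metric from pushing a category score outside [0, 1]. Allowing `best < worst` handles lower-is-better metrics, such as open bug counts, without a separate code path.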
Signals & Data Services (SDS) Hypotheses
[ SDS Key Results ] | ||
---|---|---|
Hypothesis shortname | Q2 text | Details & Discussion |
SDS1.1.1 | If we partner with an initiative owner and evaluate the impact of their work on Core Foundation metrics, we can identify and socialize a repeatable mechanism by which teams at the Foundation can reliably impact Core Foundation metrics. | |
SDS1.2.1.B | If we test the accuracy and infrastructure constraints of 4 existing AI language models for 2 or more high-priority product use-cases, we will be able to write a report recommending at least one AI model that we can use for further tuning towards strategic product investments. | Phab:T377159 |
SDS1.2.2 | If we study the recruitment, retention, and attrition patterns among long-tenure community members in official moderation and administration roles, and understand the factors affecting these phenomena (the ‘why’ behind the trends), we will better understand the extent, nature, and variability of the phenomenon across projects. This will in turn enable us to identify opportunities for better interventions and support aimed at producing a robust multi-generational framework for editors. | Learn more. |
SDS1.2.3 | If we combine existing knowledge about moderators with quantitative methods for detecting moderation activity, we can systematically define and identify Wikipedia moderators. | T376684 |
SDS1.3.1.B | If we integrate the Spark / DataHub connector for all production Spark jobs, we will get column-level lineage for all Spark-based data platform jobs in DataHub. | |
SDS1.3.2.B | If we implement a frequently run, Spark-based job that queries MW history data from MariaDB, reconciles missing events, and enriches them, we will provide a daily-updated MW history wikitext content table in the data lake. | |
SDS2.1.1 | If we create an integration test environment for the proposed 3rd party experimentation solution, we can collaborate practically with Data SRE, SRE, QTE, and Product Analytics to evaluate the solution’s viability within WMF infrastructure in order to make a confident build/install/buy recommendation. | mw:Data_Platform_Engineering/Data_Products/work_focus |
SDS2.1.3 | If the Growth team learns about the Metrics Platform by instrumenting a Homepage Module on the Metrics Platform, then we will be prepared to outline a measurement plan in Q1 and complete an A/B test on the new Metrics platform by the end of Q2. | |
SDS2.1.4 | If we conduct usability testing on our prototype among pilot users of our experimentation process, we can identify and prioritize the primary pain points faced by product managers and other stakeholders in setting up and analyzing experiments independently. This understanding will lead to the refinement of our tools, enhancing their efficiency and impact. | |
SDS2.1.5 | If we design a documentation system that guides the experience of users building instrumentation using the Metrics Platform, we will enable those users to independently create instrumentation without direct support from Data Products teams, except in edge cases. | Task T329506 |
SDS2.1.7 | If we provide a function for user enrollment and a mechanism to capture and store CTR events in a monotable within a pre-declared event stream, we can ship MPIC Alpha in order to launch a basic split A/B test on logged-in users. | |
SDS2.2.2 | If we define a standard approach for measuring and analyzing conversion rates, it will help us establish a collection of well-defined metrics to be used for experimentation and baselines, and start enabling comparisons between experiments/projects to increase learning from these. | |
SDS2.3.1 | If we conduct a legal review of the proposed unique cookies for logged-out users, we can determine whether there are any privacy policy or other legal issues that should inform the community consultation and/or affect the technical implementation itself. | |
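SDS2.1.7 and SDS2.2.2 describe a user-enrollment function for split A/B tests and a standard approach to measuring conversion rates. The minimal sketch below illustrates both ideas. It makes two assumptions: enrollment works by hashing a user ID together with an experiment name, and conversion rate means converted users divided by exposed users in each arm. The function names and event shape are hypothetical. This is not the Metrics Platform or MPIC API.

```python
"""Hypothetical sketch: deterministic A/B enrollment plus per-arm conversion rates.

This is NOT the Metrics Platform / MPIC API; function names, the hashing
scheme, and the event shape are assumptions for illustration only.
"""
import hashlib
from collections import Counter


def assign_arm(
    user_id: int, experiment: str, arms: tuple[str, ...] = ("control", "treatment")
) -> str:
    """Map a (user, experiment) pair to an arm deterministically.

    Hashing the experiment name together with the user ID keeps a user's
    assignment stable across sessions, while assignments in different
    experiments do not line up with each other.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[bucket]


def conversion_rates(events: list[tuple[int, bool]], experiment: str) -> dict[str, float]:
    """Compute converted users / exposed users for each arm.

    `events` holds one (user_id, converted) pair per exposed user.
    """
    exposed = Counter()
    converted = Counter()
    for user_id, did_convert in events:
        arm = assign_arm(user_id, experiment)
        exposed[arm] += 1
        if did_convert:
            converted[arm] += 1
    return {arm: converted[arm] / exposed[arm] for arm in exposed}


if __name__ == "__main__":
    # Hypothetical exposure log: every third user converts.
    log = [(user_id, user_id % 3 == 0) for user_id in range(10_000)]
    for arm, rate in sorted(conversion_rates(log, "homepage-module-test").items()):
        print(arm, round(rate, 3))
```

Deriving the arm from a hash, rather than from a stored random draw, lets any service recompute a user’s assignment without a lookup. Including the experiment name in the hash input keeps assignments in different experiments effectively independent.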
Future Audiences (FA) Hypotheses
[ FA Key Results ] | ||
---|---|---|
Hypothesis shortname | Q2 text | Details & Discussion |
FA1.1.1 | If we make off-site contribution very low effort with an AI-powered “Add a Fact” experiment, we can learn whether off-platform users could help grow/sustain the knowledge store in a possible future where Wikipedia content is mainly consumed off-platform. | Experiment:Add a Fact |
Product and Engineering Support (PES) Hypotheses
[ PES Key Results ] | ||
---|---|---|
Hypothesis shortname | Q2 text | Details & Discussion |
PES1.2.4 | If we research the Task Prioritization focus area in the Community Wishlist in early Q2, we will be able to identify and prioritize work that will improve moderator satisfaction, which we can begin implementing in Q3. | |
PES1.2.5 | If we are able to publish and receive community feedback on 6+ focus areas in Q2, then we will have confidence in presenting at least 3 focus areas for incorporation in the 2025-26 annual plan. | |
PES1.2.6 | If we introduce the ability to favourite templates, we will increase the number of templates added via the template dialog by 10%. | |
PES1.3.4 | If we create an experience that provides insights to Wikipedia Audiences about their community over the year, it will stimulate greater connection with Wikipedia – encouraging engagement in the form of social sharing, time spent interacting on Wikipedia, or donation. | |
PES1.4.1 | If we draft an SLO with the Editing team as it releases Edit Check functionality, we will begin to learn how to define and track user-facing SLOs together, and iterate on the process in the future. | |
PES1.4.2 | If we define and publish SLAs for putting OOUI into “maintenance mode”, growth of new code using OOUI across Wikimedia projects will stay within X% in Q1. | |
PES1.4.3 | If we map ownership of known owned services using the proposed service catalog in Q1, we will be able to identify significant gaps in the service catalog, which will help us build an SLO culture by the end of the year. | |
PES1.5.1 | If we finalize and publish the Edit Check SLO draft, practice incorporating it in regular workflows and decisions, and draft a Citoid SLO, we’ll continue learning how to define and track user-facing and cross-team SLOs together. | |
PES1.5.2 | If we write a document that clearly defines the roles and responsibilities of stakeholders throughout the service lifecycle, teams will be able to make informed commitments in the Service Catalog, including SLOs. | |
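PES1.4.1 and PES1.5.1 concern defining and tracking user-facing SLOs. To illustrate what tracking could involve, the sketch below evaluates a request-based availability SLO over one measurement window. It then reports how much of the error budget remains. The 99.5% target and the request counts are hypothetical. They do not reflect the actual Edit Check or Citoid SLO drafts.

```python
"""Hypothetical sketch of tracking a request-based availability SLO.

The 99.5% target and the request counts below are invented for
illustration; they are not the Edit Check or Citoid SLO values.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    sli: float                     # observed fraction of good requests in the window
    target: float                  # SLO target, e.g. 0.995
    error_budget_remaining: float  # share of tolerated failures not yet used (negative if overspent)

    @property
    def met(self) -> bool:
        return self.sli >= self.target


def evaluate_slo(good: int, total: int, target: float) -> SloReport:
    """Compare the observed good-request ratio against the SLO target."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    sli = good / total
    allowed_bad = (1 - target) * total  # failures the SLO tolerates in this window
    actual_bad = total - good
    remaining = 1 - actual_bad / allowed_bad
    return SloReport(sli=sli, target=target, error_budget_remaining=remaining)


if __name__ == "__main__":
    report = evaluate_slo(good=99_700, total=100_000, target=0.995)
    print(report.met, round(report.error_budget_remaining, 2))
```

A negative remaining budget means the window saw more failures than the SLO allows. A common practice is to use the remaining budget, rather than the raw SLI, to decide when reliability work should take priority over new features.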
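Explanation of buckets
Wiki Experiences
The purpose of this bucket is to efficiently deliver, improve, and innovate on wiki experiences that enable the distribution of free knowledge around the world. This bucket aligns with Movement Strategy recommendations #2 (Improve User Experience) and #3 (Provide for Safety and Inclusion). Our audiences include all collaborators on our sites, as well as readers and other consumers of free knowledge. We support one of the world’s top ten websites and many other important free culture resources, and these systems must meet performance and uptime requirements comparable to those of the world’s largest technology companies. We provide the user interfaces for wikis, translation, developer apps (and more!), along with the supporting applications and infrastructure. Together these form a robust platform on which volunteers collaborate to share free knowledge globally. Our goals are to improve our core technology and capabilities. We also aim to keep improving the experience of volunteer editors and administrators on our projects, and of all technical contributors who improve or extend wiki experiences. Finally, we want to ensure a good experience for readers and consumers of free knowledge worldwide. We will do this through product and technology work, as well as research and marketing. We expect this bucket to have up to five objectives.
Knowledge is made by people! So our annual plan will focus on content, on the people who contribute to it, and on the people who access and read it.
Our objectives set out an operating plan based on our existing strategy, chiefly our hypotheses about the contributor, consumer, and content “flywheel”. The main shift in these objectives is an emphasis on the content part of the flywheel. We will also explore what our administrators and staff may need from us now, with the aim of identifying future community health metrics.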
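Signals and Data Services
The Movement Strategy recommendations include Ensure Equity in Decision-making (Recommendation #4), Improve User Experience (Recommendation #2), and Evaluate, Iterate, and Adapt (Recommendation #10). To meet them, decision-makers across the Wikimedia Movement need reliable, relevant, and timely data, models, insights, and tools. These help them assess the impact (realized and potential) of their own work and of their communities’ work, so that they can make better strategic decisions.
In the Signals and Data Services bucket, we have identified four primary audiences whose data and insight needs we will prioritize and serve: Wikimedia Foundation staff, Wikimedia affiliates and user groups, developers who reuse our content, and Wikimedia researchers. Our work will span a range of activities: identifying gaps, defining metrics, building pipelines to compute those metrics, and developing data and signal exploration experiences and pathways. These help decision-makers interact with data and insights more effectively and enjoyably.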
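Future Audiences
The purpose of this bucket is to explore strategies for reaching beyond our existing audiences of consumers and contributors, in an effort to truly reach everyone in the world as the essential infrastructure of the free knowledge ecosystem. This bucket aligns with Movement Strategy recommendation #9 (Innovate in Free Knowledge). People increasingly consume information in experiences and formats that differ from what our traditional article-based websites offer. They use voice assistants, spend time watching videos, interact with AI, and more. In this bucket, we will propose and test hypotheses about the potential long-term future of the free knowledge ecosystem and how we will serve as its essential infrastructure. We will do this through product and technology work, as well as research, partnerships, and marketing. As we identify promising future states, what we learn here will inform the work of buckets #1 and #2 in subsequent annual plans and be scaled through them. This will move our product and technology offerings toward what future knowledge-seekers will need. The objectives of this bucket should push us to experiment and explore, keeping us focused on a vision for the future of free knowledge.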
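Sub-buckets
We have two additional “sub-buckets” containing critical functions that must exist at the Wikimedia Foundation to support our basic operations. Some of these functions are common to any software organization. These sub-buckets will not have top-level objectives of their own, but will contribute to and support the top-level objectives of the other buckets. They are:
- Infrastructure Foundations. This bucket covers the teams that maintain and evolve our data centers, compute and storage platforms, and their operational services. It also covers the tools and processes that support the operation of our public-facing sites and services.
- Product and Engineering Support. This category includes teams that operate “at scale”, providing services to other teams that improve those teams’ productivity and operations.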