weeklyOSM 612

09:51, Sunday, 17 April 2022 UTC

05/04/2022-11/04/2022

lead picture

progress-visualizer shows progress of Import/Catalogue/Road import (Norway)/Update [1] © by Mathias Haugsbø | map data © OpenStreetMap contributors (ODbL)

Breaking news

  • OSM Ukraine is urging everyone to refrain (uk) > en from any mapping in Ukraine while the conflict is still ongoing, as it fuels the information war.

Mapping

  • jmapb has published part two of their ‘Using NYC Dept of Buildings Building Information Search’ series.
  • Anne-Karoline Distel’s video this week covers streetlight mapping.
  • The proposal on content=track_ballast is waiting for your comments. The proposed tag describes the content of a container feature such as man_made=storage_tank as track ballast, a term which describes solid material used to fill a track bed on railways.
  • Voting is currently open for:
    • A modified version of the artwork_subject=sheela-na-gig proposal until Wednesday 20 April. This version drops the suggestion of also adding a subject:wikidata tag duplicating the same information.
    • isced:2011:programme=*, to update isced:level tagging to the 2011 version of the International Standard Classification of Education, until Sunday 24 April.

Events

  • René Chalon (fr) > en (user renecha) presented (fr) the OpenStreetMap project during Journée du Libre Éducatif 2022. Hosted in Lyon on 1 April, the event showcased (fr) > en 12 different open-source projects with educational potential to more than 400 visitors.

Maps

  • As well as a tool for mapping place name elements (we reported last week), SeeSchloss also offers a map to search street names in France. Unfortunately it’s currently falling foul of OSM France’s ‘attribution is not optional’ campaign, as can be seen by looking at the map.

switch2OSM

  • The Ukrainian war crime evidence collection platform uses (uk) > en OSM.

Licences

  • AngocA wrote (es) > en a long blog post about ‘Clarifying permission to use CC BY in OSM’.

Software

  • [1] Mathias Haugsbø, from Norway, has created progress-visualizer, a tool that takes OpenStreetMap wiki tables and visualises each project’s mapping progress on a map, making it very simple to create status maps for mapping projects. To date, the tool only covers Norway.
  • The Mapillary team is conducting a two-minute user survey (via Facebook), with an option to answer extended questions. The goal is to gather community opinions on satisfaction with the Mapillary apps and platform, and to plan how Mapillary can better fit user needs. Anyone who has used Mapillary is encouraged to share feedback.
  • Pieter Vander Vennet has released a new version of MapComplete, which includes support for quickly translating the application. His diary post explained how this can be done and invited everyone to contribute translations.
  • AngocA compared (es) > en the features offered by different services/applications/websites that use OSM map notes.

Programming

  • Tomasz Taraś (tomczk) presented a few options for importing OSM data into a PostgreSQL database, including a worked example using imposm3; a minimal sketch of such an import follows below.
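
For a flavour of what such an import can look like, here is a minimal sketch (not taken from the talk) that drives imposm3 from Python. It assumes imposm3 is installed; the extract, mapping file, and database connection string are all placeholders.

```python
# Minimal sketch (not from the talk): driving an imposm3 import from
# Python. Paths, credentials, and the mapping file are placeholders.
import subprocess

PBF = "extract.osm.pbf"                  # an OSM extract, e.g. from Geofabrik
MAPPING = "mapping.yml"                  # imposm3 mapping of OSM tags to tables
DB = "postgis://osm:osm@localhost/osm"   # target PostgreSQL/PostGIS database

# Read the extract and write the mapped tables in one go.
subprocess.run(
    ["imposm", "import",
     "-mapping", MAPPING,
     "-read", PBF,
     "-write",
     "-connection", DB],
    check=True,  # raise if imposm3 exits with an error
)
```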

Did you know …

  • … liveuamap, which links news from different countries of the world (we reported earlier) to an OSM map? There is also a thematic map on epidemics.
  • … the advantages of micromapping traffic signs? The OSM community in Helsinki did just that and discovered all sorts of faults with official signage.
  • … the fastest way to contact the Data Working Group?
  • … web services that offer map tiles are usually free up to a certain limit, after which they charge by usage? At Protomaps, Brandon Liu gave some thoughts on this subject.

OSM in the media

  • On 5 April there was a curious celebration: ‘Read a Road Map Day’. To mark the day, Saarländischer Rundfunk filmed a short feature at the elementary school in Lebach. Thanks to OSM veteran Wolfgang Barth and OSM-connected editor Herbert Mangold, OpenStreetMap is of course also covered (de).

Other “geo” things

  • Christopher Beddow wrote about his vision for the next generation of maps.
  • Open311 is a ‘collaborative model and open standard for civic issue tracking’. It’s used by the City and County of San Francisco, which has a public services provision problem. Because the data is open, someone has been able to create a map of some of the issues that this causes; a sketch of querying such an endpoint follows below.
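
Open311’s read API (the GeoReport v2 specification) makes pulling these issues fairly simple. Below is a minimal sketch, assuming a GeoReport v2 endpoint; the base URL is illustrative, not San Francisco’s actual endpoint.

```python
# Minimal sketch: fetching open civic issues from an Open311
# GeoReport v2 endpoint. The base URL is illustrative only.
import requests

BASE = "https://example.gov/open311/v2"  # hypothetical endpoint

resp = requests.get(
    f"{BASE}/requests.json",
    params={"status": "open"},  # standard GeoReport v2 filter
    timeout=30,
)
resp.raise_for_status()

# Each service request carries coordinates, so plotting the issues on
# an OSM-based map is a simple follow-up step.
for req in resp.json():
    print(req.get("service_name"), req.get("lat"), req.get("long"))
```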

Upcoming Events

Where | What | When
– | OSM World Discord Note Mapathon | 2022-04-10 – 2022-04-17
– | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-18
Lyon | Rencontre mensuelle Lyon | 2022-04-19
– | 150. Treffen des OSM-Stammtisches Bonn | 2022-04-19
City of Nottingham | OSM East Midlands/Nottingham meetup (online) | 2022-04-19
Lüneburg | Lüneburger Mappertreffen (online) | 2022-04-19
– | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-20
– | Focused Workshop – Editing OSM Data with Street-Level Imagery | 2022-04-21
Dublin | Irish Virtual Map and Chat | 2022-04-21
New York | New York City Meetup | 2022-04-23
Bogotá Distrito Capital – Municipio | Introducción a la edición del Wiki de OpenStreetMap | 2022-04-23
京都市 | 京都!街歩き!マッピングパーティ:第29回 Re:鹿王院 | 2022-04-24
– | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-25
Bremen | Bremer Mappertreffen (Online) | 2022-04-25
– | OSMF Engineering Working Group meeting | 2022-04-25
San Jose | South Bay Map Night | 2022-04-27
– | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-27
Roma Capitale | Incontro dei mappatori romani e laziali | 2022-04-27
Online | OpenStreetMap Foundation board of Directors – public videomeeting | 2022-04-28
Gent | Open Belgium 2022 | 2022-04-29
Rapperswil-Jona | Mapathon/Hackathon at the OST Campus Rapperswil and virtually | 2022-04-29
IJmuiden | OSM Nederland bijeenkomst (online) | 2022-04-30
London | Missing Maps London Mapathon | 2022-05-03
Berlin | OSM-Verkehrswende #35 (Online) | 2022-05-03
Boa Viagem | BOA VIAGEM (CE), BRASIL – EDIFÍCIOS, ESTRADAS, PONTOS DE INTERESSE E ÁREA VERDE | 2022-05-07 – 2022-05-08

Note:
If you would like to see your event here, please add it to the OSM calendar. Only events entered there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, SK53, SomeoneElse, Strubbl, TheSwavu, derFred.

In the fourth week of the conflict between Russia and Ukraine, the Russian federal body Roskomnadzor threatened to censor access to Wikipedia in the country. This prompted many people to download the entire content of the encyclopedia through Kiwix, one of whose functionalities is packaging a site for offline viewing. This month the platform registered 105,889 downloads of the Russian Wikipedia, a growth of more than 4,000% compared to the first half of January. What’s more, downloads coming from Russia now account for 42% of all the platform’s server traffic; in 2021 they accounted for only 2%.

What is Kiwix and how is it used?

Created in 2007, Kiwix is an application that allows you to download content from web pages and then view it without the need for an Internet connection. In addition to Wikipedia, you can also access projects like Project Gutenberg, Stack Exchange, TED Talks, and Khan Academy, among others.

Kiwix is not only a great tool for preserving the content of Wikipedia against possible censorship; the ability to access the site without an Internet connection also opens up a range of possibilities for communities that have very little – or no – access to the Internet.

Given this scenario, Wikimedia Chile created a tutorial in Spanish that is now available for download on Commons. The document aims to introduce Kiwix, as well as provide guides and examples on how to use it for educational purposes. In doing so, we hope that Spanish-speaking people will understand how Kiwix works and know where to find the files to download Wikipedia or other projects.

The importance of Kiwix for society

We’re aware that many people can’t access the Internet, whether due to a lack of infrastructure or the cost of the service in their territories. Kiwix makes it possible to overcome these barriers by making, for example, Wikipedia content accessible on devices without Internet access.

In addition, Kiwix lets you choose how to download the content: for example, if you don’t have enough storage space, you can download only the first paragraphs of the articles, or download the content without images. Another thing to note is that the Kiwix team uploads thematic files, so if you need to teach a specific topic, such as the history of Chile, you can download (or create) your own Wikipedia containing only related articles.

Wikimedia Chile values the educational role of Kiwix, and we want more communities in our region to be able to use it, understanding that in our country and across Latin America there are still limitations on Internet access.


Download the tutorial on Wikimedia Commons

The first episode of WIKIMOVE has now dropped!

20:29, Thursday, 14 April 2022 UTC

We are excited to be dropping the first episode of WIKIMOVE, the new podcast on everything Wikimedia Movement Strategy.

What’s in this episode? We introduce the concept of the podcast and then chat about the latest news in our movement. Then, as the main topic of this episode, we discuss one of the two pillars of movement strategy: Knowledge as a Service. During our guest interview we learn more about the origin story of the strategic direction and discuss its relevance for the future of our movement and for the way we provide our services.

Our guests are…

Tochi Precious, Founder of the Igbo Wikimedia User Group
Tochi has been working on developing the creativity of young African editors and increasing the number of African language Wikipedias.

Guillaume Paumier, Principal Program Manager, Advancement, Wikimedia Foundation
Guillaume was instrumental in the earlier stage of movement strategy development and helped write the strategic direction.

You can find the audio podcast on Commons and a shortened video version on YouTube. Please visit our Meta-wiki page to react to the episode and subscribe to get notified of each new release. 

The topic and guests for our next episode will be announced soon, stay tuned!

We are very happy to share that Abstract Wikipedia and Wikifunctions will be supported by a Google.org Fellowship. Let’s first introduce Google.org and the Fellowship program in their own words:


About Google.org

Google.org, Google’s philanthropic arm, supports nonprofits that address humanitarian issues and apply scalable, data-driven innovation to solving the world’s biggest challenges. They accelerate progress by connecting nonprofits with a unique blend of support that includes funding, products, and technical expertise from Google volunteers. They engage with these believers-turned-doers who make a significant impact on the communities they represent, and whose work has the potential to produce meaningful change. Google.org wants a world that works for everyone—and they believe technology and innovation can move the needle.

Google.org Fellowship

The Google.org Fellowship enables Googlers to complete up to 6 months of full-time pro bono technical work to accelerate the social impact of nonprofits and civic entities. Fellows, including engineers, product managers, UX researchers, and designers, roll up their sleeves alongside the organization’s staff to help build open-source solutions and equip the organization to maintain and implement these solutions long after the Fellowship ends. Each year, 50+ Googlers provide 50,000+ hours of pro bono services assisting organizations to build solutions for some of the world’s toughest challenges.

For nine months, up to 10 Google.org Fellows will be supporting the Abstract Wikipedia and Wikifunctions team. Most of their focus will be on the backend of Wikifunctions: in other words, they will work towards making the evaluation of functions far more efficient. This will enable Wikifunctions to grow without immediately overwhelming the computing resources we have available at the Wikimedia Foundation.

The Fellows will also work on a few other projects, about which we will share updates in the weeks and months to come. We will also publish a detailed report at the end about the impact and the results of the work. Several Fellows will start their work this April, and we’ll bring on others to begin work in July.

As with all of our work, the work of the Fellows will be open source and developed in the open, using our usual workflows and tools. The Fellows will work under the direction of the Wikimedia Foundation, and will, during the time of the Fellowship, interact with the community and others just like every other employee or contractor.

The contributions of the Fellows are expected to allow us to launch sooner and scale Wikifunctions much faster than we would otherwise be able to. This will also allow us to focus on Abstract Wikipedia sooner than we had expected over the last year, and thus ultimately speed up delivering on our mission of allowing more people to share knowledge in more languages.

In the following, we let the Fellows introduce themselves in their own words.

Ali Assaf, Software Engineer Fellow [Apr.-Sep. 2022]
Location: Zurich, Switzerland
Ali (he/him) has worked for more than 6 years as a software engineer at Google Zurich, developing backend infrastructure for YouTube. His work enables creators and partners to deliver content at scale. The cat videos must flow! Before that, he was doing graduate studies in theoretical computer science in Paris, on the subject of logic and computer-assisted theorem proving. Although he completed his PhD, his quest for the ultimate proof language is still incomplete. Ali grew up in Beirut, Lebanon, at the intersection of cultures, languages, and beliefs. He is passionate about education and access to information, and considers Wikipedia to be one of the top 3 most important websites in the world.

Ariel Gutman, Software Engineer Fellow [Apr.-Sep. 2022]
Location: Zurich, Switzerland
Ariel (he/him) defended his PhD thesis at the University of Konstanz in 2016, where he was researching Neo-Aramaic dialects as an associate fellow of the Zukunftskolleg interdisciplinary institute. His curriculum includes a master’s degree in Linguistics awarded by the Université Sorbonne Nouvelle and a master’s degree in Computer Science awarded by the École Normale Supérieure, following a B.Sc. from the Hebrew University of Jerusalem. He has conducted fieldwork on Neo-Aramaic in France and Israel, as well as fieldwork on an Austronesian language in West Papua, Indonesia. He has published numerous articles about Neo-Aramaic and language acquisition. His first book (co-authored with Wido van Peursen), The Two Syriac Versions of the Prayer of Manasseh, was published by Gorgias Press in 2011. His second book, Attributive Constructions in North-Eastern Neo-Aramaic, was published by Language Science Press in 2018. He currently works as a software engineer specializing in computational linguistics at Google Zurich.

Eunice Moon, Program Manager Fellow [Apr.-Sep. 2022]
Location:  San Francisco, CA
Eunice (she/her) is a Business Partner in Google’s Operational Effectiveness, responsible for contingent workforce strategy and planning for Alphabet. Previously, Eunice was a Senior Manager at Accenture, where she advised Fortune 500 companies on building data-driven organizational change programs. Through Accenture Development Partnerships, Eunice worked at the United Nations to develop a strategy to enable access to family planning for 3 million women in the Philippines. As a consultant at TechnoServe, she developed a strategy that improved the livelihoods of smallholder farmers in Mozambique. Eunice holds an MBA from London Business School and a BA in Economics from UC Berkeley.

Mary Yang, Software Engineer Fellow [Apr.-Sep. 2022]
Location:  Seattle, WA
Mary (she/her) has been a Software Engineer at Google Seattle for 2.5 years. On her home team in Ads, she primarily works on data pipelines. She graduated from college with a B.S. in Computer Science and a B.A. in Physics. She enjoys learning and researching, and she dreams of going back to school one day. With Chinese as her first language, Mary is passionate about the mission of this project: making information available around the globe. In her spare time, Mary likes baking, traveling, playing with ML models, and her two cats.

Olivia Zhang, Product Manager Fellow [Apr.-Sep. 2022]
Location:  St. Louis, MO
Olivia (she/her) is a Google Cloud customer engineer, focused on enabling & coaching enterprises in their infrastructure and application modernization journey. Before Google, Olivia held roles in SaaS solution architecture and technology consulting, driving large scale technology transformations for customers across industry verticals. Olivia has a Bachelor of Arts in Psychology from Washington University in St. Louis, and an MBA from University of Missouri – Columbia.

Ori Livneh, Software Engineer (TLM) Fellow [Apr.-Sep. 2022]
Location: New York City
Ori (he/him) is a Software Engineer at Google, where he works on optimizing software to run efficiently on modern hardware. He lives in New York City with his partner and their son. In his spare time, he enjoys running and hacking on open-source software. He’s been a Wikipedian since 2005. Ori also worked for the Wikimedia Foundation as a software engineer from 2012 to 2016.

Sandy Woodruff, UX Designer Fellow [Jul.-Dec. 2022]
Location:  San Francisco
Sandy (she/her) is a Bay Area-based interaction designer with 9+ years of experience who strives to create experiences that are both ethical and impactful. She has spent 4.5 years at Google on the Cloud AI & Industry Solutions UX team, designing products that make AI accessible to users with limited machine learning experience. Outside of Google, Sandy loves helping others break into the field of UX through career coaching. Before that, she designed at Etsy, Rent the Runway, and Fab; advised Cornell Tech’s incubator; and launched a startup that helped New York City residents develop better recycling habits.

If you have further questions or comments, feel free to ask on wiki or on the Abstract Wikipedia mailing list. We are very thankful to Google.org for offering us this co-operative program.

This relationship is not exclusive to them; if other companies, universities, or other groups would be interested in making similar donations in kind towards the Wikimedia movement, we would be delighted to work with them, too. You can reach out to us at [email protected].

Our work with the Fellows is part of an ongoing relationship between the Foundation, Google, and Google.org. We’re happy to continue our work with both entities to support a healthy web ecosystem, making knowledge more accessible and representative of the diversity of the web’s users.  

Originally published by Denny Vrandečić on April 12, 2022 on Meta-Wiki

Are you planning to attend the Wikimedia Hackathon next month, May 20–22? We hope so! The main event will be held online.

Host a session

The call for sessions is now open on the schedule page. If you’d like to host a session, pick an open slot in the category that best fits your topic, and add yourself to the schedule. Want to run a session but not sure of a topic? Check out the interests that people have listed on the Participants list!

The Developer Advocacy team has put together some suggestions for how to create a fun session.

Propose a project for hacking

Hackathons are all about hacking! To propose a project or seek help on an existing project, add a task to the Phabricator board. As the board grows, check back to find projects you might want to hack on during the event.

Other ways to participate

In addition to attending (virtually or at a local meetup), you can also participate by volunteering to welcome newcomers, contributing translations, or helping with other types of tasks. Check the wiki page for ideas.

More info

You can find more information on the Hackathon MediaWiki.org page, which will continue to grow over the next few weeks.

About this post

Featured image credit: File:A_cat_reads_the_Gerrit_commit_message_guidelines_on_MediaWiki.org.jpg, TBurmeister(WMF), Creative Commons Attribution-Share Alike 4.0 license

Eric Luth at the opening ceremony of Wikimania 2019 in Stockholm. Image by Vysotsky; CC BY-SA 4.0.

The Wikimedia Foundation’s Global Advocacy and Public Policy team is excited to announce our sponsorship of four events, three of which we have never sponsored before. By means of this support, we are giving back to the larger community of digital rights activists and civil society organizations who advocate for policy frameworks and regulations that support human rights on the internet. Through active participation in these events, our movement will help develop policies that advance free knowledge, while also expanding access to knowledge and promoting knowledge equity in historically underserved regions, as identified in our 2030 Movement Strategy.

Read on to learn more about the events that we are supporting, what support we are providing to each, and how you can participate.


Digital Rights and Inclusion Forum (April)

The Digital Rights and Inclusion Forum (DRIF), hosted by the Paradigm Initiative, will bring together digital rights activists and scholars from across Africa in online and in-person events to work towards a digitally inclusive and rights-respecting future. The Wikimedia Foundation will provide financial assistance to cover internet access costs for some registered attendees participating in online events. This contribution will reduce the financial barriers that could prevent greater access to and participation in the Forum among African civil society groups.

  • Dates: 12 April to 20 May 2022
  • Format: Virtual and in-person (see agenda for distributed sessions across Africa)
  • View: Follow Paradigm Initiative’s YouTube channel here

C20 Global Civil Society Forum (April)

Our global reach will further expand with the Wikimedia Foundation’s support of the C20, the global civil society forum that runs parallel to the Group of 20 (G20) economic forum. The 2022 process is hosted by Indonesia and will convene civil society organizations from around the world to discuss issues related to digital transformation and inequities, among other topics. The Foundation will provide financial assistance for logistics and participate in the Working Group on Digitalization, Education, and Global Citizenship. This partnership will empower civil society, including our movement, to help develop policy recommendations presented to the 20 largest economies in the world on issues relating to free knowledge and internet regulation. 

The Foundation’s Rachel Arinii Judhistari, public policy specialist for Asia, will represent us during these engagements from April to May 2022, and will advance our vision for a world in which every single human being can freely share in the sum of all knowledge. Watch the C20 kick-off meeting here, and stay up to date on C20 activities by checking out C20 news here.

  • Dates: 4 April to 20 May 2022
  • Format: Virtual and in-person

More Transparency in Content Moderation: How Do We Achieve It? (May)

We also look forward to collaborating with the Observatorio Latinoamericano de Regulación, Medios y Convergencia (OBSERVACOM) to hold an event marking World Press Freedom Day in Punta del Este, Uruguay, on 1 May 2022. The event will explore transparency and accountability around online platforms’ content moderation practices and the ethical use of online user data. The aim of this event is to enhance debate around these issues in Latin America, so that civil society groups and their allies can better anticipate and respond to undemocratic regulations in the region. Raising awareness and building coalitions around these issues is increasingly important as governments in the region seek to regulate online platforms. A proposed regulation in Chile is an example of one such initiative, which caused concern for the Wikimedia Foundation and free knowledge movement at large. Our team looks forward to leading conversations around smart legislation that protects the rights of Wikimedians to enforce their own community standards.

  • Date: 1 May 2022
  • Format: Virtual and in-person
  • Registration: To be announced

RightsCon 2022 (June)

Finally, the Foundation will continue to support and participate in RightsCon, the world’s leading summit on human rights in the digital age, hosted by Access Now. Every year, RightsCon convenes civil society, governments, tech companies, and human rights activists from around the world to debate and identify potential solutions to the emerging challenges facing a free and open internet. These include privacy and surveillance, internet access, inclusion, and internet shutdowns and disruptions. The organizers of RightsCon accepted five proposals for sessions submitted by the Wikimedia Foundation, two of which were co-created with Wikimedia volunteers. Our support this year is unique in that the Foundation has contributed to the conference’s Connectivity and Accessibility funds. This support will help to make the conference accessible in more languages and across more devices, while also supporting internet connectivity for grassroots actors to participate in the virtual events.

  • Dates: 6–11 June 2022
  • Format: Virtual
  • Registration: open now, details here

If you are interested in learning more about these and similar events, follow the Global Advocacy and Public Policy team on Twitter to stay up-to-date on public policy that affects free knowledge.

This Month in GLAM: March 2022

08:48, Wednesday, 13 April 2022 UTC

DSA: Trilogues Update

08:03, Wednesday, 13 April 2022 UTC

European Union (EU) lawmakers are under a lot of self-imposed pressure to reach an agreement on content moderation rules that will apply to all platforms. Several cornerstones have been placed either at the highest political levels (e.g., banning targeted ads directed at minors) or agreed upon on a technical level (e.g., notice-and-action procedures). But there is still no breakthrough on a few other articles, like the newly floated “crisis response mechanism.”  

The European Commission published its legislative proposal back in December 2020. The European Council adopted its position in December 2021, while the European Parliament agreed on its version in January 2022. We have previously compared the three stances from a free knowledge point of view. Since January, the three institutions have been in semi-formal negotiation procedures called “trilogues”, where they are trying to reach a final compromise. It is time for us to give you an update on the negotiations.

Whose Content Moderation And Rules Are We Talking About?

Online platforms allowing users to post content often have functions that allow these users to set up their own rules and actively moderate certain spaces. This is true for the now classical, but still very popular, online discussion forums, including Reddit groups, fan pages or club bulletin boards. It is especially true for Wikimedia projects, including Wikipedia, where volunteer editors make up the rules and moderate the space. 

With the Digital Services Act (DSA) imposing obligations for content moderation, it would be undesirable to put volunteer citizens who care about a space under the same legal pressures as professionals working full time for a corporation. Hence, we need to make sure that the definitions of “content moderation” and “terms and conditions” reflect that. Currently they both do. 

As of this week, both the Parliament and the Council agree to back the Commission proposal and define “content moderation” within this regulation as “activities, automated or not, undertaken by providers of intermediary services.”

When it comes to “terms and conditions” the two bodies have a slight disagreement. The Parliament position is to add a “by the service provider” clarification to the definition. The Council, however, believes that is already a given in the text, which reads:

(q) ‘terms and conditions’ means all terms and conditions or clauses, irrespective of their name or form, which govern the contractual relationship between the provider of intermediary services and the recipients of the services.

We welcome the fact that legislators and officials are having a conversation about this, with projects such as ours and online forums in mind. 

“Actual knowledge” of Illegal Content

A cornerstone of the DSA is to set up clear and straightforward rules for the interactions between users and providers with regards to content moderation. A notice-and-action mechanism is the first step. Then there are ways for users to contest the decisions—or indecision—of the service providers: internal complaints, out-of-court dispute settlements and, of course, court challenges. 

It was of the utmost importance for Wikimedia to highlight that not every notice the Wikimedia Foundation receives is about illegal content. This is crucial, as “actual knowledge” of illegal content forces action, usually a deletion. The agreed upon new text now includes language explaining that notices imply actual knowledge of illegal content only if a “diligent provider of hosting services can establish the illegality of the relevant activity or information without a detailed legal examination.”

A new addition in the negotiations is that the internal complaint handling mechanism would allow users to complain when platforms decide not to act on breaches of their terms and conditions.

Who Will Regulate and Oversee Wikipedia and its Sister Projects? 

According to the original Commission proposal, each Member State would designate a regulator responsible for enforcing the new rules. A platform would be regulated either where it is established or, if the service provider is not located within the EU, where it chooses to have a legal representative. During the trilogues, the Council suggested and the Parliament accepted that rules specific to Very Large Online Platforms (VLOPs) should be enforced by the Commission. We generally welcome this move, even if it does introduce some inconsistency: of the Wikimedia projects, only Wikipedia is likely to be a VLOP, which means that it alone will be overseen by the Commission, while our other projects will be overseen by national authorities.

What Will This Cost Wikimedia?

As the idea of having the Commission play the role of a regulator gained traction, another line of thought was also suddenly accepted: establishing a fee for VLOPs to pay for the additional Commission staff needed. The idea is for the DSA to give the Commission powers to impose fees based on a delegated act.

It took some back and forth, but the final proposal by the Commission is to waive the fee for not-for-profit service providers. 

Crisis Response Mechanism

Sparked by the invasion of Ukraine, the last weeks saw political pressure build up to include a “crisis response mechanism” into the regulation. It would empower the Commission to require that providers of VLOPs apply “specific effective and proportionate measures” during a crisis. A crisis is defined to take place “where extraordinary circumstances lead to a serious threat to public security or public health in the Union.” 

While we understand the need for such a mechanism in principle, we are uncomfortable with its wording. Several key points must be addressed:               

  • Decisions that affect freedom of expression and access to information, in particular in times of crisis, cannot be legitimately taken through executive power alone.
  • The definition of crisis is unclear and broad, giving enormous leeway to the European Commission. 
  • A crisis response must be temporary by nature. The text must include a solid time limit.

Targeted Advertising

It looks like the Parliament and the Council will agree to ban targeted advertising to minors as well as using sensitive data (e.g., political and religious beliefs) for targeting. Wikimedia generally supports everyone’s right to edit and share information without being tracked and recorded. 

Waiver for Nonprofits, Maybe?

It is still an open question whether the Council will accept the Parliament’s proposal to include a waiver allowing not-for-profits to be excluded from certain obligations, such as out-of-court dispute settlement mechanisms. We would welcome this, as it would avoid us having to set up new mechanisms that could disrupt a largely efficient community content moderation system. But if the definitions of the DSA make it clear that these obligations apply only to service provider decisions, we will not worry too much about it.

General Monitoring Prohibition

Negotiators are still discussing a compromise under which there would be no general obligation to monitor, “through automated or non-automated means,” information transmitted or stored by intermediary services. The Parliament wants to go further and clarify that this also covers “de facto” general monitoring obligations—i.e., rules that add up to general monitoring in practice. The thinking behind this is that several smaller obligations can sometimes put providers in a position where they need to monitor all content. The Council is still pushing back on this.

We do believe that a ban on general monitoring is crucial to ensure intellectual freedom and support the Parliament’s position on this.

Next Steps

The next technical meeting of advisers and experts is on 19 April 2022. The next political round of negotiations is scheduled for 22 April 2022. Europe needs a set of general content moderation rules, and the DSA is on track to deliver exactly this. We hope that all parts of the regulation will be properly deliberated and proper safeguards will be enshrined. Wikimedia will continue to provide constructive input to lawmakers as well as participate in the public debate. 

Content Translation and the people who use the tool

07:00, Wednesday, 13 April 2022 UTC

In October 2021 the Content Translation tool, also known as CX, which helps Wikimedia contributors translate articles on Wikipedia, reached the one million article milestone, thanks to the hard work of the 70,000 contributors who use it. Every contribution by these volunteers is valuable, motivating the Wikimedia Foundation Language team to improve the tool. We would therefore like to highlight the contributions of some of our distinctive volunteers who use the tool to translate content across different language Wikipedias. This is not to say that the users we have selected are the most important, but these eight users are unique in different ways, and we would like to share their stories, and more like them, in the future.

Top contributors by language Wikipedia

The Arabic, Ukrainian and Tamil Wikipedias have some of the contributors with the highest number of articles translated using the Content Translation tool. In terms of the size of a Wikipedia, which is determined by the amount of content and the number of active contributors and users, Arabic, Ukrainian and Tamil rank 26th, 45th and 47th respectively. In other words, the Arabic, Ukrainian and Tamil Wikipedias are categorised as medium-sized. Notwithstanding this position, their contributors are among those translating the most content from other languages using the Content Translation tool. We would like to share the user stories of some of the prolific users in these languages:

Adel Nehaoua, whose username is Nehaoua in the Wiki projects, lives in Algeria and translates content for the Arabic Wikipedia. Adel has been a Wikimedian for 12 years and a prolific user of Content Translation for three years. He discovered the feature while checking the Arabic Wikipedia menu bar. He translates content from English and French into Arabic. He remains one of the most consistent translators, with more than 1,500 translated articles in the Arabic Wikipedia community. He does not have specific translation topics, but uses the tool for any article that interests him. To Nehaoua, the essential feature of the tool is its ability to translate internal links and automatically transfer the page format. However, it still needs improvement in converting and translating templates and tables from the original article to the translated one.

Letizia-Olivia is another of the many prolific users of Content Translation. She is based in Ukraine and translates content into Ukrainian, spoken by more than 40 million people. She noticed the tool was enabled as a beta feature in Ukrainian Wikipedia and decided to try it. Letizia-Olivia said, “I use the tool because it helps you translate one language to another in an easy and straightforward manner. I also introduce bilingual newcomers to this tool since it is a simple way to contribute to Wikipedia”. She usually translates biographies and articles about buildings and phenomena. She also believes that improving the tool further will result in more quality articles. She has been a staunch user of this translation tool since she discovered it a year ago.

Another prolific user is P. Mariappan, whose username is சத்திரத்தான் (“Cattirattāṉ”), and who translates into Tamil, a language spoken by more than 75 million people, mostly in the south of India. He was introduced to the tool by the Tamil Wikipedia community. சத்திரத்தான் translates all kinds of articles that interest him at a given moment, for example biographies of politicians and articles about parks, sanctuaries and historic places. Being an active member of his Wikipedia community, he always recommends the tool to editors. He trains people during Wikipedia workshops on how to use Content Translation. He tried translating articles manually before the existence of the Content Translation tool, so “the tool makes my job faster”, says சத்திரத்தான். One improvement he would like to see in the Content Translation tool is better support for the translation of long tables.

Pioneer and newbie translators

Testing Content Translation in Wikimedia Labs in 2014 was the beginning of the tool’s journey. The first tests were done with translation from Spanish to Catalan, a language spoken by about nine million people. Many of the pioneers still use the tool for translation today. One of our distinguished pioneer translators is:

Àlex Hinojo, whose username is Kippelboy: his translation, Castillo de la Geltrú, is the oldest article translated with Content Translation after its deployment. Àlex translated the Castillo de la Geltrú article on 17 January 2015 at 07:42:12 UTC. Since the tool’s inception, this user has been an ardent translator. He uses it to translate articles from other languages into Catalan. For him, Content Translation is beneficial and helps import references, formatting, and Wikidata links. Àlex’s translations are mostly biographies and articles about organisations. “These types of articles have the same structure, and they are easy to adapt to our local context”, says Àlex. He is not just an individual contributor: Àlex has facilitated Wiki workshops and organised translation campaigns like the popular Catalan culture challenge in the Catalan Wikipedia community. He has introduced the tool to many people in his community as an excellent way to add quality content. However, Àlex thinks that the tool is “hidden” in the user menu; the Language team needs to improve this to attract more newbies.

For the newbies’ category, we classify those who have used the tool for less than eighteen months as newcomers, and one of the new users we would like to highlight is:

Rimma Vasilyeva, User:Римма Васильева, from the Republic of Tatarstan, has been in the Wikimedia movement for just a year. She did her first translation in April 2021. Since then, she has translated more than one thousand articles from Bashkir to Tatar Wikipedia with the help of the Content Translation tool and has helped newer translators use the tool. Rimma has no particular focus; she translates any article that interests her.

COVID-19 and gender-based translators

As the pandemic shook our world, Wikipedians were challenged to keep up with the massive influx of news about COVID-19, sifting out misinformation and documenting events on Wikipedia. While some Wikipedians chronicled COVID-19 and the pandemic, others translated these articles into different languages using the Content Translation tool to close the knowledge gap during those trying times.

One of the many volunteers who took up translating COVID-19-related articles with the tool is a Nigerian Wikipedian:

Aliyu Yusuf, known as Aliyu shaba in the Wikispace, learned about the tool from a dedicated contributor in the Hausa Wikipedia community, who taught him to use the feature. During the pandemic, the tool came in handy for Aliyu. It helped him translate major COVID-19 and health-related articles from English into Hausa, which is spoken by more than 60 million people in Nigeria, Niger, and other countries in West Africa. Aliyu can’t imagine the experience of translating these kinds of health-related articles without the Content Translation tool. He literally can’t, because he has never tried it.

While some contributors translate health-related articles, others devote more of their time to biographies and gender-related articles about notable indigenous people. Among them is Nkem Osuigwe, from Anambra in southeastern Nigeria, known as the AfricanLibrarian. She translates into the Igbo language, which is spoken by 30 million people in Nigeria.

The AfricanLibrarian attended a virtual translation training organised by the Igbo Wikimedia community, where she learned how to use the tool. Nkem said the tool’s automatic saving feature is beneficial, especially when working with an unstable internet connection. “There is a whole world of difference translating with Content Translation. I can say this because I have experience translating books from other platforms. The machine-generated translations are also a unique feature of the tool. However, I always find it challenging to translate hyperlinks and tables most of the time,” Nkem Osuigwe explained. The AfricanLibrarian has translated more biographies of notable Nigerian women than men. Apart from translations, Nkem also facilitates training sessions to teach newcomers in the Igbo community how to translate using the Content Translation tool.

The most viewed translated articles

The main page of the English Wikipedia is known as the most frequently visited page. We noticed an interesting trend among the articles added to Wikipedia with the Content Translation tool: entertainment-related topics and current affairs are the most popular articles. Some of them are:

Volunteers translate these articles, and the last but not least user story we would like to share is that of -Alabama-, the translator of the Loki TV series article into Spanish, who is based in Alabama, USA.

Alabama’s discovery of the Content Translation tool was accidental. An edit tag caught his attention while he was viewing a fellow Wikipedian’s contribution; he clicked on the label, and curiosity won him over to using the tool. His contributions are mainly film and television articles. His translation is among the most visited, with 676,035 page views from March 2021 to February 2022. For Alabama, using the tool to translate is more straightforward than starting an article from scratch.

The above are the user stories of just eight people out of the thousands who use the CX tool. The Wikimedia Foundation Language team is constantly inspired by the contributions of the 70,000 users of the Content Translation tool. We are motivated by these volunteers’ work to keep improving and innovating better ways of translating articles for every user. In the future we anticipate more milestones, not just with Content Translation but also with Section Translation, our recent initiative that expands CX’s capabilities and prioritises the use of mobile devices to translate articles bit by bit with ease.

How we deploy code

01:13, Wednesday, 13 April 2022 UTC

Last week I spoke to a few of my Wikimedia Foundation (WMF) colleagues about how we deploy code—I completely botched it. I got too complex too fast. It only hit me later—to explain deployments, I need to start with a lie.

M. Jagadesh Kumar explains:

Every day, I am faced with the dilemma of explaining some complex phenomena [...] To realize my goal, I tell "lies to students."

This idea comes from Terry Pratchett's "lies-to-children" — a false statement that leads to a more accurate explanation. Asymptotically approaching truth via approximation.

Every section of this post is a subtle lie, but approximately correct.

Release Train

The first lie I need to tell is that we deploy code once a week.

Every Thursday, Release-Engineering-Team deploys a MediaWiki release to all 978 wikis. The "release branch" is 198 different branches—one branch each for mediawiki/core, mediawiki/vendor, 188 MediaWiki extensions, and eight skins—that get bundled up via git submodule.

Progressive rollout

The next lie gets a bit closer to the truth: we don't deploy on Thursday; we deploy Tuesday through Thursday.

The cleverly named TrainBranchBot creates a weekly train branch at 2 am UTC every Tuesday.

Progressive rollouts give users time to spot bugs. We have an experienced user-base—as Risker attested on the Wikitech-l mailing list:

It's not always possible for even the best developer and the best testing systems to catch an issue that will be spotted by a hands-on user, several of whom are much more familiar with the purpose, expected outcomes and change impact on extensions than the people who have written them or QA'd them.

Bugs

Now I'm nearing the complete truth: we deploy every day except for Fridays.

Brace yourself: we don't write perfect software. When we find serious bugs, they block the release train — we will not progress from Group1 to Group2 (for example) until we fix the blocking issue. We fix the blocking issue by backporting a patch to the release branch. If there's a bug in this release, we patch that bug in our mainline branch, then git cherry-pick that patch onto our release branch and deploy that code.
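
In git terms, the backport flow looks roughly like this (a simplified sketch; the commit hash and branch name are placeholders, and the real process goes through code review and deployment tooling rather than a direct push):

```python
# Simplified sketch of the backport flow described above. The commit
# hash and branch name are placeholders; the real process goes through
# code review and deployment tooling rather than a direct push.
import subprocess

def git(*args: str) -> None:
    """Run a git command, raising if it fails."""
    subprocess.run(["git", *args], check=True)

FIX = "0123abc"               # mainline commit that fixes the blocking bug
RELEASE = "wmf/1.39.0-wmf.x"  # illustrative release-branch name

git("checkout", RELEASE)        # switch to this week's release branch
git("cherry-pick", "-x", FIX)   # copy the fix; -x records the source commit
git("push", "origin", RELEASE)  # the backport then rides a deploy window
```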

We deploy backports three times a day during backport deployment windows.  In addition to backports, developers may opt to deploy new configuration or enable/disable features in the backport deployment windows.

Release engineers train others to deploy backports twice a week.

Emergencies

We deploy on Fridays when there are major issues. Examples of major issues are:

  • Security issues
  • Data loss or corruption
  • Availability of service
  • Preventing abuse
  • Major loss of functionality/visible breakage

We avoid deploying on Fridays because we have a small team of people to respond to incidents. We want those people to be away from computers on the weekends (if they want to be), not responding to emergencies.

Non-MediaWiki code

There are 42 microservices on Kubernetes deployed via helm. And there are 64 microservices running on bare metal. The service owners deploy those microservices outside of the train process.

We coordinate deployments on our deployment calendar wiki page.

The whole truth

We progressively deploy a large bundle of MediaWiki patches (between 150 and 950) every week. There are 12 backport windows a week where developers can add new features, fix bugs, or deploy new configurations. There are microservices deployed by developers at their own pace.



Thanks to @brennen, @greg, @KSiebert, @Risker, and @VPuffetMichel for reading early drafts of this post. The feedback was very helpful. Stay tuned for "How we deploy code: Part II."

First Women’s Day editathon in UAE

19:00, Tuesday, 12 April 2022 UTC

On International Women’s Day, the Wikimedians of UAE User Group organized an edit-a-thon dedicated to creating and editing articles on notable Emirati and Arab women who broke the bias in their fields of work. It was the user group’s first edit-a-thon ever. We booked a room in the House of Wisdom library in Sharjah, one of the iconic libraries in the country.

The event was hybrid (in person and virtual): 11 people came to the event and 5 joined us virtually. We started the edit-a-thon with a short video from Maryana Iskander, the Wikimedia Foundation CEO, who gave a warm welcome to our new user group and its Women’s Day initiative. Then our group leader, Ahlam Bolooki, welcomed everyone and talked about our plans and initiatives. Right after that, the edit-a-thon started: we shared the names of around 40 notable Emirati and Arab women and asked the participants to start researching and creating the articles. The atmosphere was very friendly and collaborative, and we went around the room offering participants help and support.

To UAE Wikimedians from Maryana Iskander for the Gender Gap 2022 in Sharjah

Statement by the jury of WLM 2021

18:52, Tuesday, 12 April 2022 UTC

We, the Jurors of Wiki Loves Monuments 2021, have recently completed the jury process. The international team is in the process of finalizing the communications, and we as the 2021 jury feel it pertinent to point out a few elements, for clarity and transparency.

Peace monument in Gulu, Northern Uganda. WLM 2019, by Malaika Overcomer – CC BY-SA 4.0

Wiki Loves Monuments is an international photographic competition, born and developed inside the Wikimedia community, to promote historic and cultural sites around the world; it has seen participation from more than 50 countries over the years. For over a decade the competition has worked towards creating a unique visual record of places that hold cultural heritage value, and we hope that these records can be used to help keep their stories alive, or repair any damage caused, in the years ahead.

Our final selections include pictures of monuments located in countries currently at war with other countries. In all the countries involved, the Wikimedia movement is present with Wikimedians and cultural activities aimed at making human knowledge available to every person in the world. Cultural places, and by extension monuments, belong to each and every single human being on this planet. Every country, regardless of where the monuments are located, is responsible for preserving them; they are not just “the countries’ monuments”, they are milestones of the whole human journey. We are all merely the current custodians of these cultural sites and must work to ensure that they are passed down to the next generations.

The awarded 2021 pictures were selected before the events of 24th February. We confirm our choices now, with a war raging and destroying both people and heritage. We have Wikimedians in each of the countries involved, and we love them all. We are worried for their safety and wish them all the peace that an honest mind can imagine. We are waiting for them to start working together again as soon as possible, in peace. We can only ask that peace and prosperity return to people’s lives.

This being said, we find ourselves unable to present our results as we usually do. There is no joy, this year, in celebrating the results of the annual WLM competition. Our hearts are bleeding and our emotions are tinged with sadness. But our commitment to record and share cultures remains stronger and brighter than all that.

The Jurors of the 2021 edition of Wiki Loves Monuments

This blog post is Part II of a three part blog series exploring how campaign organizing helps the Wikimedia movement grow participation and respond to movement strategy. You can find Part I here and learn more about our current campaign, #WikiForHumanRights here.

In the last post, I discussed how the New Editors Research and the subsequent Growth Team software allow us to make real progress on more welcoming experiences for editors who click the edit button. However, some potential audiences need a human invitation before they click edit and become invested in our movement. These Social Changers and Joiner Inners are inspired by the potential for knowledge to change the world and want to join the community we have.

If you have been around the movement for any amount of time, you have probably encountered one of our campaigns, an editathon, a partnership with a cultural heritage program, or an activity organized by an affiliate. In fact, the Wikimedia grantmaking programs, alongside external fundraising by our affiliates, support these programs with more than 10 million USD a year.

Community-led programs are important, because cultural institutions, educational communities, activist networks and other non-profits don’t always “get” how our communities work — they need an introduction, training and support. Importantly too, certain types of new editors need help fulfilling their purpose on the wikis. 

Who provides this support? Movement organizers. Organizers are an important layer of the Wikimedia ecosystem that provides human and accessible relationships to our movement. In 2019, we published the Movement Organizers Research to better understand organizers: Who are the facilitators that introduce new audiences to contributing? Where do they come from? How do we make sure that our movement makes it as easy to contribute as an organizer as it is to edit? 

The Spanish language version of the Movement Organizers research. By studying organizers in Ghana and Argentina, and talking to organizers in other parts of the world, we were able to define what the work of reaching out to other audiences actually involves.

The research found that organizers face unique challenges as they mediate between our largely on-wiki contribution communities and the needs of newcomers from external audiences. These challenges can exhaust enthusiastic organizers. But they are very concrete challenges to address: from technical needs in order to run editing events in our community (partially being worked on by the Campaigns Product Team), to a whole range of non-technical support and skills, ranging from how to run successful activities to how to deal with on-wiki conflicts (partially being addressed by the Universal Code of Conduct). Importantly for growing and sustaining the Wikimedia movement, organizers are key to building common purpose with outside movements.

Where do organizers come from?

If we want to invite knowledgeable people outside the movement to edit, how do we find the organizers that can provide that invitation? The Movement Organizers research found that Wikimedia organizers come from two places: the editing communities and a broader collection of outside movements and professional communities. 

When we think about how we recruit organizers to the movement, they often come from two places: the existing communities or an invitation to join. Invited organizers have the added benefit of understanding the external communities they come from.

The first path to organizing is through editing and transitioning from editing to facilitating the community. Most of our affiliates have strong roots in that tradition — stories like the one I described in the first post of going from enthusiastic editor to organizer abound. But this group has real limits in terms of recruiting: most people thrive in our editing communities because they are more interested in the editing itself, not the social structures around the movement.

However, the second and more promising route for recruiting organizers is from other movements and professional communities. These people are slightly different from the social changer persona developed for New Editor Experiences: they tend to have complex organizing lives outside of the Wikimedia movement, and they see Wikimedia as a platform for sharing the knowledge that advances their larger mission. 

In Argentina and Ghana, for example, the research saw social activists and open-source community leaders eagerly joining the movement. However, even with deep organizing experience, coming to Wikimedia from outside movements is not easy: there are technical, social and learning-curve barriers, in addition to the complexity introduced by working in 300+ languages and adapting tactics to nearly every country in the world.

However, if we are serious about accessing the missing potential contributors to Wikimedia, in order to achieve our longer term goal of “anyone who shares our vision will be able to join us”, we need to make all of that complexity simpler. 

How do Organizers invite social changers? 

At movement events like Wikimania or regional conferences, the first response to a question like “Who will edit?” tends to focus on audiences that replicate the self-selection bias of the open source communities I mentioned in the last post: highly educated people with digital skills, from locations where we are already active.

These instinctive biases close us off to the real promise of recruiting the social changers and the invited organizers that we need: the people who sit at the edge of our movement and who can be taught how we work, if we work to understand where they come from, how they operate and what our movement can learn from them. Some of the most successful organized groups in the movement have become those bridges into other movements. Movement Strategy is clear: we need tens of thousands more such connections if we are to fulfill our mission: the sum of all human knowledge.

When preparing the Movement Organizers research, one of the organizers that I first interviewed said something that has stuck with me since, and driven my approach to movement building. She said something like: 

“If you are going to successfully invite a group of people you don’t know well to dinner: the first thing you need to do is visit their house, see how they serve dinner, figure out the conversations they are having, and create a welcoming invitation that meets their expectations.”  

If we are serious about finding and growing contributors of all types to Wikimedia, we need to look for and cultivate the organizers who are excellent at speaking the language of these other communities of practice, and design practical, hands-on ways for these communities to join us. We need to get beyond the one-off, random-invitation-to-dinner editathons where we rarely retain participants, and move towards the welcoming, recurring weekend picnics that help these communities feel at home.

That also means, as organizers, we need to be careful not to over-promise or communicate too early, when we are not ready for a target audience, or when we are not ready to speak their language and provide support that nurtures their participation in the movement. I occasionally still find myself doing this in meetings with partners or new organizers who want to work with the Foundation, when that infectious vision takes hold. This is a well-meaning habit that hurts us as a collective of organizers: we leave bad impressions, and wear ourselves out chasing audiences or activities that we aren’t prepared to nurture and care for.

So this is great in theory; what about in practice?

In my first post, I described how my early years of outreach as a volunteer felt largely unproductive. Most organizers who come from the first path to organizing enthusiastically run editathons and other outreach activities. For certain parts of the movement, there was some success with these tactics — particularly in GLAM institutions and among photographers — where the work of professionals fit closely with the work of our editing communities. 

Carol Mwaura, one of the librarians trained by the library association AfLIA who has become an active organizer and editor in the community in Kenya. Targeted organizer recruitment matters, and we can get better at it.

Looking at the last few years, however, the work of movement organizers has felt increasingly impactful because they have innovated in tactics. As a community, we have gotten better at listening to audiences and effectively targeting them as both the contributors and organizers who will join our work:

  • By focusing on librarians (after listening carefully to movement leaders from the sector) with #1Lib1Ref, we have been able to grow a network of library partners in different corners of the world, such as Romania, or across Africa with AfLIA.
  • Black Lunch Table expanded their already successful black artist archiving program to include Wikimedia tactics in a way that meaningfully grew their reach — they connected their larger goals to our way of working in the movement.
  • Art+Feminism has become excellent at retaining organizers from outside the movement to run events year after year. Organizers from Art+Feminism regularly show up in other campaigns and editing events, such as #1Lib1Ref.
  • WikiGap has effectively leveraged the style of organizing found in Art+Feminism and paired it with the global presence of the Swedish embassies to retain partners and editors from gender activism movements all over the world.
  • The Education team at the Wikimedia Foundation has made a fairly significant pivot from student-first programs (box-checker audiences who produce good content but are rarely retained as editors or organizers, and where retention focuses on instructors as organizers) to Reading Wikipedia in the Classroom, which facilitates secondary school teachers (one of those social changer audiences) as both new organizers and entry points to editors in the movement.

For each of these programs, retention and invitation of both editors and organizers starts from a place of: Why does a target audience want to contribute? If we teach them, will the people actually stick? How do we make them stick? Success at invitation in each of these activities comes from a very targeted message to a very specific community motivated to change the world through knowledge or communication. Each program says “We know you (specialized audience) will edit or organize if you understand why editing/organizing helps you achieve your goals in the world. Here is how.” 

These programs have succeeded because they have divorced themselves from “anyone can edit” (see Part I of this series), and aligned themselves with “who wants to contribute, why, and how?”.

What next?

So where does this hard-learned insight leave us for sustaining the movement? 

  • We need to look in the places where we have already had some success and reinvest in gathering places for organizers: librarians, educators, cultural professionals, gender activists and photographers, for example, where we know we can invite them into our communities and they will stick. These programs are often well known and celebrated across the movement, but they need more investment in tools, infrastructure, capacity, and advocacy to reach their full potential.
  • We need to look at the growing edges of our movement where action is already happening, and make sure we are prepared with organizers ready to greet new audiences. From where I sit, I am seeing the most energy in communities growing to welcome people interested in: Sustainability, LGBTQ+ communities, African diaspora communities, communities around indigenous languages and cultures, and Human Rights. I am sure I am missing a few. 
  • And we need to imagine how we find new recruitable audiences with social causes and start listening to their movements. We need to identify the knowledge gaps and topics for impact described in the Movement Strategy implementation process, so that we might match new audiences with new actions that help create the change they want.

If we are serious about a healthy and vibrant movement, we need to get better at communicating with each other why and where we are focusing — and make sure that we aren’t unintentionally losing organizers and the many publics that could be contributors along the way. Our enthusiasm for “anyone can edit” shouldn’t be an excuse for engaging these audiences without collective preparedness. 

In Part III of the series, I am going to describe why the implementation of the Movement Strategy initiative of “Align with Environmental Sustainability Initiatives” may actually be pointing at one of the best opportunities for us to reach new movements. The world needs us to focus on Sustainability and the Climate Crisis now: the Climate movement cares about factual communication, and it gives us the opportunity to address some of our biggest, most impactful knowledge and community gaps.

If you got to the end of this post, and want to get involved in the Sustainability work, the currently running #WikiForHumanRights campaign is designed to help local organizers test outreach tactics to local communities focused on sustainability. Or if you are an experienced editor, join the writing challenge.

Wikidata query service updater evolution

14:41, Tuesday, 12 2022 April UTC

The Wikidata Query Service (WDQS) sits in front of Wikidata and provides access to query its data via a SPARQL API. The query service itself is built on top of Blazegraph, but in many regards is very similar to any other triple store that provides a SPARQL API.
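For anyone who has not used it, a query is just an HTTP request to the SPARQL endpoint. Here is a minimal sketch in PHP against the public query.wikidata.org endpoint (the query itself, listing a few house cats, is only an example):

<?php
// Query the public WDQS SPARQL endpoint for five instances of house cat (Q146).
$sparql = <<<'SPARQL'
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} LIMIT 5
SPARQL;

$url = 'https://query.wikidata.org/sparql?format=json&query=' . urlencode($sparql);
// WDQS asks clients to identify themselves with a User-Agent header.
$context = stream_context_create(['http' => ['header' => "User-Agent: wdqs-example/0.1\r\n"]]);
$results = json_decode(file_get_contents($url, false, $context), true);

foreach ($results['results']['bindings'] as $row) {
    echo $row['item']['value'] . ' ' . $row['itemLabel']['value'] . PHP_EOL;
}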

In the early days of the query service (circa 2015), the service was only run for Wikidata, hence the name. However, as interest in and usage of Wikibase continued to grow, more people started running a query service of their own for the data in their own Wikibase. But you’ll notice most people still refer to it as WDQS today.

Whereas most core Wikibase functionality is developed by Wikimedia Deutschland, the query service is developed by the search platform team at the Wikimedia Foundation, with a focus on wikidata.org, but also a goal of keeping it useable outside of Wikimedia infrastructure.

The query service itself currently works as a whole application rather than just a database. Under the surface, this can roughly be split into two key parts:

  • Backend Blazegraph database that stores and indexes data
  • Updater process that takes data from a Wikibase and puts it in the database

This actually means that you can run your own query service without running a Wikibase at all. For example, you can load the whole of Wikidata into a query service that you operate, and have it stay up to date with current events. In practice, though, this is quite some work, plus the expense of storage and indexing, so I expect not many folks do this.

Over time the updater element of the query service has iterated through some changes. The updater now packaged with Wikibase, as used by most folks outside of the Wikimedia infrastructure, is two steps behind the updater used for Wikidata itself.

The updater generations look something like this:

  • HTTP API Recent Changes polling updater (used by most Wikibases)
  • Kafka based Recent Changes polling updater
  • Streaming updater (used on Wikidata)

Let’s take a look at a high-level overview of these updaters, what has changed and why. I’ll also be applying some pretty arbitrary / gut feeling scores to 4 categories for each updater.

Fundamentally they all work in the same way: somehow find out that entities in a Wikibase have changed, get the new data from the Wikibase, and update the data in one or more Blazegraph backends, using a couple of differing methods.

Diagram from high-level query service architecture overview showing the general elements of a query service backend

HTTP API polling updater

Simplicity Score: 9/10
Legacy Score: 6/10
Scalability Score: 3/10
Reliability Score: 4/10

The HTTP API polling updater was the original updater, likely dating back to 2014/2015. It makes use of the MediaWiki recent changes data and API, normally polling every 10 seconds to look for new changes in the namespaces in which Wikibase entities are expected to be found. If changes are detected, it retrieves the new data, removing the old data from the database and storing the new data.
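The real updater is a Java service with batching and error handling, but the core polling idea can be sketched in a few lines of PHP using the standard MediaWiki recentchanges API (on Wikidata, namespaces 0 and 120 hold Items and Properties):

<?php
// Sketch only: poll the recent changes API and report changed entities.
$api = 'https://www.wikidata.org/w/api.php';
$lastCheck = gmdate('Y-m-d\TH:i:s\Z', time() - 10);

while (true) {
    $params = http_build_query([
        'action' => 'query',
        'list' => 'recentchanges',
        'rcnamespace' => '0|120',   // Item and Property namespaces
        'rcend' => $lastCheck,      // only changes since the last poll
        'rcprop' => 'title|timestamp',
        'rclimit' => 'max',
        'format' => 'json',
    ]);
    $changes = json_decode(file_get_contents("$api?$params"), true);
    foreach ($changes['query']['recentchanges'] ?? [] as $change) {
        // For each changed entity the updater fetches fresh RDF, removes the
        // entity's old triples from Blazegraph, and writes the new ones.
        echo 'Changed: ' . $change['title'] . PHP_EOL;
    }
    $lastCheck = gmdate('Y-m-d\TH:i:s\Z');
    sleep(10); // poll roughly every 10 seconds
}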

As the updater makes use of a MediaWiki core feature, no additional extensions, services or functionality need to be deployed alongside a Wikibase. It’s nice and easy to set up, requires minimal resources and is quite easy to reason about. So, a high simplicity score.

Diagram from high-level query service architecture overview showing what happens at runtime of the RCPoller updater

A middle-ground score is given for legacy: this is the oldest updater, yet it is still widely used by Wikibases around the world.

When judging scalability, we have to look at Wikidata. This updater would no longer work very well for the number of changes on Wikidata (let’s say 600 edits per minute) and the number of backend query service databases that those changes need to make their way to (12+ currently). This updater was designed to point at a single Blazegraph backend.

By default the recent changes table only stores 30 days’ worth of data, so if your updater breaks for 30 days and you don’t notice, you’ll need to reload the data from scratch. For small Wikibases, this is one of the most noticeable and annoying things that can happen.

Kafka based polling updater

Simplicity Score: 5/10
Legacy Score: 9/10
Scalability Score: 5/10
Reliability Score: 7/10

I’ll gloss over the Kafka polling updater, as it was never generally used in the Wikibase space, only ever in Wikimedia production for Wikidata. It was used roughly between 2018 (created in T185951) and 2021.

At a high level, this updater simply replaced the MediaWiki recent changes HTTP API with a stream of recent changes written to Kafka by various event-related extensions for MediaWiki. These events contain information similar to what the recent changes API would provide, such as page title and namespace, from which the updater can determine which entities have changed.
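For illustration, the events consumed from Kafka look roughly like this (trimmed to the fields that matter here; treat the exact shape as an approximation of the public mediawiki.recentchange schema):

<?php
// A trimmed, illustrative recent-change event as it might arrive from Kafka.
$event = json_decode('{
    "wiki": "wikidatawiki",
    "title": "Q42",
    "namespace": 0,
    "type": "edit",
    "timestamp": 1649700000
}', true);

// As with the HTTP polling updater, a title in an entity namespace tells the
// updater which entity needs to be re-fetched and rewritten in the backend.
if (in_array($event['namespace'], [0, 120], true)) {
    echo 'Entity to update: ' . $event['title'] . PHP_EOL;
}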

This loses points for simplicity, as it requires both running Kafka and extra extensions in MediaWiki to emit events. Top scores for legacy, as no one uses this solution anymore. Some scalability issues were solved, such as the elimination of repeated hits to the MediaWiki API, but as with the HTTP updater the whole process is still duplicated as the number of backends scales up, and writes to the backends are not very efficient. But as this didn’t rely on the public HTTP API, instead using an internal Kafka service, it gets some extra points for reliability.

Streaming updater

Simplicity Score: 3.5/10
Legacy Score: 1/10
Scalability Score: 9/10
Reliability Score: 9/10

The streaming updater was fully rolled out to Wikidata at the end of 2021 (see T244590) and came with some more significant changes to the update process.

Simplicity decreases due to more components making up the process, as well as more complicated RDF diffing on updates. A low legacy score, as it’s currently in use and actively maintained by the Wikimedia search platform team. It solves a variety of scaling issues for Wikidata, with a dramatic increase in updater performance, and on the whole, due to this and more, it is more reliable.

Similar to the Kafka based polling updater, the information about when entities change comes from Kafka. A single “Producer” listens to this stream of entity changes, producing another stream of the RDF changes that need to be applied. This stream of RDF changes is then consumed by a “Consumer” on each backend, which runs write queries against that backend to update the stored data. Note the “Single Host” box in the diagram below.

Diagram from high-level query service architecture overview showing how the streaming updater is split between general services and per backend host services

Some major wins when it comes to this new implementation are:

  • Streaming rather than polling, so no waiting in between polls
  • Entity changes and the RDF from Wikibase are only retrieved once by the Producer in situations where multiple backends run
  • Only RDF changes are written into the database, rather than removing all triples associated with an entity and rewriting new ones. This reduces Blazegraph write load and increases update speed (see the sketch after this list).
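To make the third point concrete, here is a sketch of the kind of targeted SPARQL Update a diff-based consumer can issue, instead of wiping and re-inserting every triple for an entity. The entity, property and values are invented for the example; the endpoint is the local Blazegraph namespace endpoint a query service backend exposes:

<?php
// Apply a small RDF diff: replace one value rather than rewriting all of the
// entity's triples. Entity, property and values are made up for illustration.
$update = <<<'SPARQL'
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
DELETE DATA { wd:Q42 wdt:P1082 "1000" . } ;
INSERT DATA { wd:Q42 wdt:P1082 "1200" . }
SPARQL;

// Blazegraph accepts SPARQL Update via POST to its namespace endpoint.
$context = stream_context_create(['http' => [
    'method' => 'POST',
    'header' => "Content-Type: application/sparql-update\r\n",
    'content' => $update,
]]);
file_get_contents('http://localhost:9999/bigdata/namespace/wdq/sparql', false, $context);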

These effects can be seen clearly on Wikidata. The number of requests to the API to retrieve RDF data for Wikibase entities has dropped (less load on MediaWiki). And in cases where a backend falls behind due to some issue and is then fixed, the backend will very quickly catch back up with the current state of Wikidata rather than taking hours.

The post Wikidata query service updater evolution appeared first on addshore.

Movement Strategy and Governance Newsletter: Issue 6

14:00, Tuesday, 12 2022 April UTC

Welcome to the sixth issue of Movement Strategy and Governance News! The newsletter distributes relevant news and events about the Movement Charter, Universal Code of Conduct, Movement Strategy Implementation grants, Board of Trustees elections and other relevant topics. The purpose of this newsletter is to help Wikimedians stay involved with the different projects and activities going on within the broad Movement Strategy and Governance team of the Wikimedia Foundation.

The MSG Newsletter is scheduled for quarterly delivery, while the more frequent Movement Strategy Weekly is designed to cater to Wikimedians who want to closely follow our processes. You can leave feedback or ideas for future issues on the Newsletter talk page. You can also help us by translating the newsletter issues into your languages and sharing the newsletter on your community portals and platforms. Also, please remember to subscribe to this Newsletter here.

Thank you for reading and participating!

Leadership Development: A Working Group is Forming!

In February, the Community Development (CD) team published a Call for Feedback about a Leadership Development Working Group, which was shared in 42 languages. The Movement Strategy & Governance team’s 16 language facilitators collected feedback from their language and regional communities through multiple channels: Meta-wiki, Telegram, 1:1 meetings, community calls, on-wiki discussion boards and others.

The Call for Feedback was a crucial step to gather community input about leadership development and the working group, as community members were able to share feedback about the meaning of “leader,” the composition of the working group, and the need for continued community feedback. You can view a summary of the feedback on Meta-wiki.

Furthermore, the application to join the Leadership Development Working Group closed on April 10th, 2022. Up to 12 community members, including volunteers and affiliate staff, will be selected to participate in the working group, with a term commitment of one year starting in May 2022. A detailed description of the working group structure can be found on the Meta-wiki page. If you have further questions or concerns, please email the Community Development team: comdevteam(_AT_)wikimedia.org.

Universal Code of Conduct Ratification Results are out!

On 24 January 2022, the Movement Strategy and Governance (MSG) facilitation team supported the translation and publishing of the updated enforcement guidelines for the Universal Code of Conduct, covering over a dozen languages. The document was produced by a volunteer-staff drafting committee that worked throughout 2021 to produce the recommendations.

The team then engaged and encouraged community members to review and comment on the document through a series of community conversations leading to a global vote. The global decision process via SecurePoll was held from 7 to 21 March. Over 2,300 eligible voters from at least 128 different home projects submitted their opinion and comments. The results of the vote have now been published here. You can read more about the UCoC project here.

Movement Discussions on Hubs

The Global Conversation event on Regional and Thematic Hubs was held on Saturday, March 12, 2022. Participation in the event was open, with a request for pre-registration. As a result, anyone interested in the hubs conversation could participate, irrespective of their actual intent to start or continue working on a regional or thematic hub. The event was attended by 84 Wikimedians with diverse backgrounds and experiences from across the movement. The goals of the event were to:

  • Share the findings from the Hubs Dialogue qualitative research interview series.
  • Validate the key findings of the research with the participants.
  • Clarify whether the identified shared needs can only be fulfilled by a Hub structure.
  • Gather more inputs for drafting a preliminary definition of Hubs.

There is still no specific, clear definition of hubs. One of the goals of the conversation was to propose a definition based on an analysis of the generally shared needs that emerged from the Hubs Dialogue, and to try to understand what really belongs to the essence of hubs.

This phase of the process is about piloting and learning from practice, as it will take time to reach a solid formal consensus across the movement about what qualifies as a hub and what makes hubs distinct from affiliates. There is a need to specify minimum viable piloting criteria, understand what criteria hubs need to meet to form, and have pilot parties do some actual work around regional and thematic coordination. Conversations will continue, and will need to be well connected and aligned with the charter conversations, which will tackle the governance questions that exist all around the movement.

The summary report is published on Meta-wiki.

Movement Strategy Grants Remain Open!

Since the start of the year, six proposals with a total value of about $80,000 USD have been approved. There are also seven fully submitted proposals currently being reviewed and awaiting decisions, and twice as many proposals still in the idea stage being supported by MSG facilitators to help transform the ideas into full proposals. A big congratulations to those who are now in the process of implementing their projects. Here’s a full list of projects approved so far:

Movement Strategy Grants reopened in October last year. MSG facilitators have been engaging with various individuals, affiliates and user groups to increase knowledge about Movement Strategy and support the transformation of ideas into full projects and proposals. Through this effort, we are learning a lot about the challenges that many face in developing strong proposals and getting the support needed to not only apply for funds, but also to engage fully in Movement Strategy discussions.

Do you have a movement strategy project idea? Are you unsure how your ideas fit into Movement Strategy and if those ideas are fundable through MS grants? Please feel free to reach out to a facilitator to request support. You can also reach out to us at strategy2030(_AT_)wikimedia.org, if you have any questions related to Movement Strategy Implementation Grants or ideas you want support for.

The Movement Charter Drafting Committee is All Set!

In October 2021, about 1,000 Wikimedians participated in an election and selection process to form the Movement Charter Drafting Committee (MCDC). The Committee consists of fifteen members, and aims to create a document that will define the future governance of the Wikimedia movement. After months of hard work, the Committee has agreed on the essential values and methods for its work, and has started to create the outline of the Movement Charter draft.

During its first five months together, the Drafting Committee invested significant amounts of time in establishing its work systems. For example, the Committee created documents to define its Principles and how it makes internal decisions, put together a communications plan, and updated the timeline of the drafting process. The Committee also replaced one of its members due to health reasons, agreed on a way to work with the Board of Trustees and refurbished its information page on Meta-wiki.

Currently, the Drafting Committee is discussing the first version of the draft Movement Charter’s outline. To hear about the outline, once it is published, and more about the Committee’s work, we invite you to follow the MCDC Updates (regularly published on the 10th of each month).

Introducing Movement Strategy Weekly – Contribute and Subscribe!

If you have ever found it difficult to find your way between the many different Movement Strategy pages, or struggled to know what is going on and where you can participate, we invite you to subscribe to the newly-launched Movement Strategy Weekly! On this easily accessible portal on Meta-wiki, you will find up-to-date news about the various ongoing projects, upcoming events and participation opportunities in Movement Strategy.

The portal is connected to the various Movement Strategy pages on Meta-wiki (for example, the Hubs page), which are automatically updated through the portal, without duplicating translations or content. If you have a project you are working on, you are also welcome to submit it. The Movement Strategy and Governance Team welcomes submissions for updates from everyone. Please don’t forget to subscribe to the updates and watch the Meta page!

Blogs

Here are some publications on Diff about the Movement Strategy, movement governance, and related topics which you may find interesting:

Check out the new Podcast: WIKIMOVE

13:43, Tuesday, 12 2022 April UTC

Wikimedia Deutschland is delighted to announce the launch of our new podcast series on all things Movement Strategy: WIKIMOVE. The podcast will be a forum for open and frank conversations about topics related to movement strategy. 

The first episode will be available next week on our website and our Meta page.

Make sure to subscribe on Meta to get notified for each episode. 

In 2017 our movement produced a strategic direction, and in 2020 recommendations were published that shed light on the changes that the free knowledge movement will need to make to stay relevant and grow in size and diversity. Change is never easy. We hope that the conversations in the podcast will inspire people, and open up opportunities for thinking and working together.

What will WIKIMOVE be about?
The topics we discuss will dance around the strategic direction, the recommendations, the principles and the initiatives. But we will also look up and examine larger issues and concepts, from the knowledge ecosystem or beyond, that are relevant to the transformation of the Wikimedia movement. By creating this space we hope to let the audience know about the latest happenings and new ideas, and to present opportunities to participate, contribute and provide feedback. We hope that new ideas are born from these conversations and that collaborations are kick-started.

What can you expect of WIKIMOVE ?
The show will be a space for respectful exchange and mutual support. Looking into the future optimistically, rather than complaining about the past or present. Paying respect to the ‘old’ movement while critically questioning colonial and inequitable systems, structures, policies, narratives and habits. Shining a light on those who try new things and develop innovations, whether they succeed or fail while doing so. Iteration, ambiguity, and uncertainty are welcome. We especially welcome people with questions, and don’t expect ready-made solutions.

Who are the guests?
People who are working on 2030 initiatives, or are participating in the governance reform, people who come from underrepresented communities, people from other movements who have experiences and inspiration to share. We aim to strengthen mutuality and solidarity and to show that there are people inside and outside of our movement that have already developed solutions for our challenges and questions. 

Who are the hosts? 
Nicole Ebber and Nikki Zeuner from Wikimedia Deutschland’s Movement Strategy and Global Relations team will be hosting the show.

Looking forward to our episodes?
The first episode will be released next week, both as an audio podcast and a shorter video version. It features Tochi Precious and Guillaume Paumier, discussing knowledge as a service.

Tech News: 2022-15

21:03, Monday, 11 2022 April UTC

Other languages: Bahasa Indonesia, Deutsch, English, dagbanli, español, français, italiano, magyar, polski, português, português do Brasil, suomi, svenska, čeština, русский, українська, עברית, العربية, বাংলা, 中文, 日本語, ꯃꯤꯇꯩ ꯂꯣꯟ, 한국어

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

  • There is a new public status page at www.wikimediastatus.net. This site shows five automated high-level metrics where you can see the overall health and performance of our wikis’ technical environment. It also contains manually-written updates for widespread incidents, which are written as quickly as the engineers are able to do so while also fixing the actual problem. The site is separated from our production infrastructure and hosted by an external service, so that it can be accessed even if the wikis are briefly unavailable. You can read more about this project.
  • On Wiktionary wikis, the software to play videos and audio files on pages has now changed. The old player has been removed. Some audio players will become wider after this change. The new player has been a beta feature for over four years. [1][2]

Changes later this week

  • The new version of MediaWiki will be on test wikis and MediaWiki.org from 12 April. It will be on non-Wikipedia wikis and some Wikipedias from 13 April. It will be on all wikis from 14 April (calendar).

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

Why good information on the environment matters

16:13, Monday, 11 2022 April UTC

Human-dominated landscapes tend to be homogenized in a way that’s often invisible to us. Tourists visiting anywhere in the tropics expect to see a lot of the same things: coconut trees, mangos, pineapples, bananas. Despite the fact that the tropics are some of the most biologically diverse regions of the planet, we see this artificial aggregation of a small number of common species. And alongside these intentional introductions are a whole lot of species that we have unintentionally spread around the world. These are the tramp species. Originally applied to ant species that had managed to find their way around the world like tramps or stowaways, the term has come to describe a group of species that are usually associated with human activity. While some tramp species become invasive species, most do not.

Most people are familiar with invasive species, but might have a hard time separating that concept from the related idea of introduced species. Familiar ideas like these were added to Wikipedia first (the invasive species article was created in 2002, the introduced species article in 2003). The article on tramp species, on the other hand, wasn’t created until November 2021, when a student in Sarah Turner’s Advanced Seminar in Environmental Science class created it. It’s a concept that fills an important gap in our understanding of this topic, but as long as it had no Wikipedia article, it was likely to be invisible to many people learning about the topic. Since undergraduates rely heavily on Wikipedia as a freely available alternative to textbooks, the topics that are missing from Wikipedia are more likely to slip through the cracks for students learning ecology.

Disease, as we have learned during the Covid-19 pandemic, is more than just the interaction between a pathogen and its host. A whole world of environmental factors means that there’s much more to disease transmission than simply infection rates. These sorts of things are part of the science of disease ecology, but more than a year into the pandemic, Wikipedia’s article on the topic was just a short overview. A student editor in the class was able to transform the article into something much more useful and informative to readers.

Climate change affects not only global temperatures, but also rainfall patterns and sea level rise. By expanding the ice sheet model and flood risk management articles, student editors were able to improve the information that’s out there for people trying to understand these important tools for forecasting changes in the world we live in. Other new articles created by students in the class include CLUE model, a spatially explicit land-use change model, Cooper Reef, an artificial reef in Australia, Indigenous rainforest blockades in Borneo, Impacts of tourism in Kodagu district in Karnataka, India, and Soapstone mining in Tabaka, Kenya. Existing articles that they made major improvements to include Alopecia in animals, Blond capuchin and Stream power.

Wikipedia’s coverage of environmental science is uneven. Many topics are covered well, but there are large gaps, and other articles suffer because they’re incomplete, badly organized, or out of date. This leaves a lot of room for student editors to make important contributions.

Image credit: Forest & Kim Starr, CC BY 3.0 US, via Wikimedia Commons

Tech News issue #15, 2022 (April 11, 2022)

00:00, Monday, 11 2022 April UTC

Tech News: 2022-15

weeklyOSM 611

09:42, Sunday, 10 2022 April UTC

29/03/2022-04/04/2022

lead picture

Patterns in placenames [1] © see | map data © OpenStreetMap contributors

Mapping

  • Anne-Karoline Distel reported on a survey of Callan, Ireland, where the address attributes (house numbers and street names) are somewhat curious.
  • Dino Michelini wrote (it) > en, in his blog, a well-researched piece on the ancient Etruscan-Roman road Via Clodia. He also outlined what still needs to be done to improve the mapping of this road in OSM.
  • LySioS, an OSM France contributor, proposed (fr) > en that mappers in the field use an OSM business card to facilitate contacts with local residents.
  • LySioS also published (fr) > en a diary post for beginners about the ten commandments for OSM mapping (we reported earlier).
  • The OpenStreetMap tool set Neis-one.org now recognises MapComplete as a distinct data editor rather than just one of the ‘unknown’, as reported by MapComplete’s main developer.
  • The following proposals are waiting for your comments:

Community

  • UN Mappers is now also choosing a Mapper of the Month. The UN Mapper of the Month for April 2022 is SSEKITOLEKO.
  • Amanda McCann’s activity report for March 2022 is online.
  • Christoph Hormann shared his analysis of OSM-related group communication channels and platforms.
  • Minh Nguyen tackled the lack of a negative feedback option on the wiki and provided a JavaScript snippet to add to a user script page, so that one could chide any chosen contribution (an April Fool’s Day joke).
  • raspbeguy shared (fr) a small script, similar to git-blame, that indicates the last person who modified or deleted tags on an OSM element.
  • Seth Deegan has proposed adding the Translate extension to the OSM Wiki, something that would improve the process of translating articles on the Wiki. The proposal is open for comments.
  • The Ukrainian OSM community has published an appeal to the OSM community urging everyone to refrain from any mapping of the territory of Ukraine while the Russian–Ukrainian war is unfolding.

Events

  • OSMUS has honoured Ian Dees with the Hall of Fame Award.
  • Bryan Housel presented the 2.0 alpha of the new RapiD at SotMUS. The test instance performed well during testing.

Education

  • The group ‘Geospatial Analysis Community of Practice’ at the University of Queensland, Australia, has published an extensive tutorial on ‘spatial networks’ with R.

OSM research

  • Marco Minghini and his colleagues published a paper reviewing the initiatives from the Italian OpenStreetMap community during the early COVID-19 pandemic, discussing it from a data ecosystem perspective at both national and European scales.

Maps

  • [1] SeeSchloss created a map tool that uses OpenStreetMap data to visualise patterns in place names in various northern hemisphere territories.
  • MapTiler presented a short tutorial on ‘Customised maps made easy’.
  • Christopher Beddow wrote an article examining the bundle of geospatial components that make up Google Maps, and listed alternatives to each. He further suggests that bundling the alternatives is a strategy to compete with Google Maps as a widely used mobile app.

Did you know …

  • … the possibilities 1, 2, 3 of printing beautiful map-based gifts?
  • flat_steps? The tag for steps where individual steps are separated by about 1 metre or more. Such steps may be accessible to some people who would otherwise avoid highway=steps.

Other “geo” things

  • CAMALIOT, an Android App, is a project run by a consortium led by ETH Zurich (ETHZ) in collaboration with the International Institute for Applied Systems Analysis (IIASA) and the European Space Agency. The app is gathering data for machine learning analysis of meteorology and space weather patterns.
  • Cartographers from Le Monde described (fr) > en the steps taken in making their maps, using the Ukraine situation as an example.
  • @MatsushitaSakura left a photo (zhcn) on an internet detective hobby club and asked for help to find out where it was. Another user (@猫爪子) found the possible location six months later with the help of overpass turbo and some detailed Danish mapping and showed the Overpass QL code (01:45) (zhcn) he used.

Upcoming Events

  • 2022-04-08: Skillshare Session: OSM Community Forum
  • 2022-04-08: 166. Berlin-Brandenburg OpenStreetMap Stammtisch (Berlin)
  • 2022-04-09: OSM Africa April Mapathon: Map Kenya
  • 2022-04-11: Open Mapping Hub Asia Pacific OSM Help Desk
  • 2022-04-11: OpenStreetMap x Wikidata Taipei #39 (臺北市)
  • 2022-04-11: Incontro dei mappatori romani e laziali (Roma Capitale)
  • 2022-04-13: MappingDC Mappy Hour (Washington)
  • 2022-04-13: South Bay Map Night (San Jose)
  • 2022-04-12: Hamburger Mappertreffen (20095 Hamburg)
  • 2022-04-13: Open Mapping Hub Asia Pacific OSM Help Desk
  • 2022-04-14: Michigan Meetup (Michigan)
  • 2022-04-14: OSM Utah Monthly Meetup
  • 2022-04-18: Open Mapping Hub Asia Pacific OSM Help Desk
  • 2022-04-18: OSMF Engineering Working Group meeting
  • 2022-04-19: 150. Treffen des OSM-Stammtisches Bonn
  • 2022-04-19: OSM East Midlands/Nottingham meetup (City of Nottingham, online)
  • 2022-04-19: Lüneburger Mappertreffen (Lüneburg, online)
  • 2022-04-20: Open Mapping Hub Asia Pacific OSM Help Desk
  • 2022-04-21: Irish Virtual Map and Chat (Dublin)
  • 2022-04-23: New York City Meetup (New York)
  • 2022-04-24: 京都!街歩き!マッピングパーティ:第29回 Re:鹿王院 (京都市)
  • 2022-04-25: Open Mapping Hub Asia Pacific OSM Help Desk
  • 2022-04-25: Bremer Mappertreffen (Bremen, online)
  • 2022-04-27: South Bay Map Night (San Jose)
  • 2022-04-27: Open Mapping Hub Asia Pacific OSM Help Desk
  • 2022-04-28: OpenStreetMap Foundation board of Directors – public videomeeting (online)
  • 2022-04-29: Open Belgium 2022 (Gent)
  • 2022-04-29: Mapathon/Hackathon at the OST Campus Rapperswil and virtually (Rapperswil-Jona)

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, Nordpfeil, PierZen, SK53, Strubbl, TheSwavu, derFred.

A Trainsperiments Week Reflection

13:59, Friday, 08 2022 April UTC

Over here in the Release-Engineering-Team, Train Deployment is usually a rotating duty. We've written about it before, so I won't go into the exact process, but I want to tell you something new about it.

It's awful, incredibly stressful, and a bit lonely.

And last week we ran an experiment where we endeavored to perform the full train cycle four times in a single week... What is wrong with us? (Okay. I need to own this. It was technically my idea.) So what is wrong with me? Why did I wish this on my team? Why did everyone agree to it?

First I think it's important to portray (and perhaps with a little more color) how terrible running the train can be.

How it usually feels to run a Train Deployment and why

Here's a little chugga-choo with a captain and a crew. Would the llama like a ride? Llama Llama tries to hide.

―Llama Llama, Llama Llama Misses Mama

At the outset of many a week I have wondered why, when the kids are safely in childcare and I'm finally in a quiet house well fed and preparing a nice hot shower to not frantically use but actually enjoy, my shoulder is cramping and there's a strange buzzing ballooning in my abdomen.

Am I getting sick? Did I forget something? This should be nice. Why can't I have nice things? Why... Oh. Yes. Right. I'm on train this week.

Train begins in the body before it terrorizes the mind, and I'm not the only one who feels that way.

A week of periodic drudgery which at any moment threatens to tip into the realm of waking nightmare.

―Stoic yet Hapless Conductor

Aptly put. The nightmare is anything from a tiny visual regression to taking some of the largest sites on the Internet down completely.

Giving a presentation but you have no idea what the slides are.

―Bravely Befuddled Conductor

Yes. There's no visibility into what we are deploying. It's a week's worth of changes, other teams' changes, changes from teams with different workflows and development cycles, all touching hundreds of different codebases. The changes have gone through review, they've been hammered by automated tests, and yet we are still too far removed from them to understand what might happen when they're exposed to real world conditions.

It's like throwing a penny into a well, a well of snakes, bureaucratic snakes that hate pennies, and they start shouting at you to fill out oddly specific sounding forms of which you have none.

―Lost Soul been 'round these parts

Kafkaesque.

When under the stress and threat of the aforementioned nightmare, it's difficult to think straight. But we have to. We have to parse and investigate intricate stack traces, run git blame on the deployment server, navigate our bug reporting forms, and try to recall which teams are responsible for which parts of the aggregate MediaWiki codebase we've put together, a codebase which is highly specific to WMF's production installation and which only takes its final shape long after changes merge to the main branches of the constituent codebases.

We have to exercise clear judgement and make decisive calls of whether to rollback partially (previous group) or completely (all groups to previous version). We may have to halt everything and start hollering in IRC, Slack channels, mailing lists, to get the signal to the right folks (wonderful and gracious folks) that no more code changes will be deployed until what we're seeing is dealt with. We have to play the bad guys and gals to get the train back on track.

Trainsperiments Week and what was different about it

Study after study shows that having a good support network constitutes the single most powerful protection against becoming traumatized. Safety and terror are incompatible. When we are terrified, nothing calms us down like a reassuring voice or the firm embrace of someone we trust.

―Bessel Van Der Kolk, M.D., The Body Keeps the Score

Four trains in a single week and everyone in Release Engineering is onboard. What could possibly be better about that?

Well there is a safety in numbers as they say, and not in some Darwinistic way where most of us will be picked off by the train demons and the others will somehow take solace in their incidental fitness, but in a way where we are mutually trusting, supportive, and feeling collectively resourced enough to do the needful with aplomb.

So we set up video meetings for all scheduled deployment windows, and had synchronous hand-offs between our European colleagues and our North American ones. We welcomed folks from other teams into our deployments to show them the good, the bad, and the ugly of how their code gets its final send-off 'round the bend and into the setting hot fusion reaction that is production. We found and fixed longstanding and mysterious bugs in our tooling. We deployed four full trains in a single week.

And it felt markedly different.

One of those barn raising projects you read about where everybody pushes the walls up en masse.

―Our Stoic Now Softened but Still Sardonic Conductor

Yes! Lonely and unwitnessed work is de facto drudgery. Toiling safely together we have a greater chance at staving off the stress and really feeling the accomplishment.

Giving a presentation with your friends and everyone contributes one slide.

―Our No Longer Befuddled but Simply Brave Conductor

Many hands make light work!

It was like throwing a handful of pennies into a well, a well of snakes, still bureaucratic and shouty, oh hey but my friends are here and they remind me these are just stack traces, words on a screen, and my friends happen to be great at filling out forms.

―Our Once Lost Now Found Conductor

When no one person is overwhelmed or unsafe, we all think and act more clearly.

The hidden takeaways of Trainsperiment Week

So how should what we've learned during our Trainsperiment Week inform our future deployment strategies and process. How should train deployments change?

The known hypothesis we wanted to test by performing this experiment was in essence:

  1. More frequent deployments will result in fewer changes being deployed each time.
  2. Fewer changes on average means the deployment is less likely to fail. The deployment is safer.
  3. A safer deployment can be performed more frequently. (Positive feedback loop to #1.)
  4. Overall we will: move faster; break less.

I don't know if we've proved that yet but we got an inkling that yes, the smaller subsequent deployments of the week did seem to go more smoothly. One week, however, even a week of four deployment cycles is not a large enough sample to say definitively whether doing train more frequently will for sure result in safer, more frequent deployments with fewer failures.

What was not apparent until we did our retrospective, however, is that it simply felt easier to do deployments together. It was still a kind of drudgery, but it was not abjectly terrible.

My personal takeaway is that a conductor who feels resourced and safe is the basis for all other improvements to the deployment process, and I want conductors to not only have tooling that works reliably with actionable logging at their disposal, but to feel a sense of community there with them when they're pushing the buttons. I want them to feel that the hard calls of whether or not to halt everything and rollback are not just their calls but shared in the moment among numerous people with intimate knowledge of the overall MediaWiki software ecosystem.

Tooling, particularly around error reporting and escalation, is a barrier to entry for sure. Once we've made sufficient improvements there, we need to get that tooling into other people's hands and show them that this process does not have to be so terrifying. And I think we're on the right track here with increased frequency and smaller sets of changes, but we can't lose sight of the human/social element and the foundational basis of safety.

More than anything else, I want wider participation in the train deployment process by engineers in the entire organization along with volunteers.


Thanks to @thcipriani for reading my drafts and unblocking me from myself a number of times. Thanks to @jeena and @brennen for the inspirational analogies.

More Wikidata metrics on the Dashboard

14:17, Thursday, 07 2022 April UTC

We’re excited to announce some new updates to Dashboard statistics regarding Wikidata. As of April 2022, the Programs and Events Dashboard shares Wikidata details about merges, aliases, labels, claims, and more!

In early March, we rolled out the final batch of improvements from Outreachy intern Ivana Novaković-Leković. Ivana’s internship focused on improving the Dashboard’s support for Wikidata. After an overhaul of the system’s interface messages to add Wikidata-specific terminology — “Item” instead of “Article” and so on, for events that focus on Wikidata — Ivana worked on integrating Wikidata edit analysis into the Dashboard’s data update system. We deployed under-the-hood changes in February to begin collecting the data we would need — edit summaries from all tracked Wikidata edits. The final step was to add a visualization of that data, which you can see in action here.

The new Wikidata stats are based on analyzing the edit summary of each edit. The edit summaries for Items on Wikidata are more structured than the free-form summaries from Wikipedia and other wikis, making it possible to reliably classify most common types of contributions. For example, adding a label to a Wikidata item will result in an edit summary that includes the code `wbsetlabel-add`.
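As a rough illustration of that approach (the Dashboard itself isn't written in PHP, and the mapping from summary codes to stat names below is only an example):

<?php
// Classify a Wikidata edit by the machine-readable code in its edit summary,
// e.g. "/* wbsetlabel-add:1|en */ Douglas Adams".
function classifyWikidataEdit(string $summary): ?string {
    $codes = [
        'wbsetlabel-add' => 'Labels added',
        'wbsetdescription-add' => 'Descriptions added',
        'wbsetaliases-add' => 'Aliases added',
        'wbcreateclaim-create' => 'Claims created',
        'wbmergeitems' => 'Merges', // matches wbmergeitems-from / wbmergeitems-to
    ];
    foreach ($codes as $code => $stat) {
        if (str_contains($summary, $code)) {
            return $stat;
        }
    }
    return null; // free-form or unrecognised summary
}

echo classifyWikidataEdit('/* wbsetlabel-add:1|en */ Douglas Adams'); // Labels added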

There are some limitations to this strategy, however. Multi-part revisions — for example, adding a new property that also includes qualifiers and references — will only be partially represented in the stats. That example gets counted towards ‘Claims created’, but not towards ‘References added’ or ‘Qualifiers added’. The Wikidata API provides no direct method to count these details, but it’s possible to calculate them by comparing the ‘before’ and ‘after’ state of an Item via its complete JSON entity data. We may explore that in the future, but it would require some significant changes in the Dashboard’s storage architecture before that would be possible.
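A hypothetical sketch of that before/after comparison for one statistic, using the public Special:EntityData JSON (the revision ID here is made up):

<?php
// Count the references across all statements in an entity's JSON.
function countReferences(array $entity): int {
    $count = 0;
    foreach ($entity['claims'] ?? [] as $statements) {
        foreach ($statements as $statement) {
            $count += count($statement['references'] ?? []);
        }
    }
    return $count;
}

$base = 'https://www.wikidata.org/wiki/Special:EntityData/Q42.json';
// "Before" state at an older revision (revision ID invented for the example).
$before = json_decode(file_get_contents($base . '?revision=123456789'), true)['entities']['Q42'];
$after  = json_decode(file_get_contents($base), true)['entities']['Q42'];

echo 'References added: ' . (countReferences($after) - countReferences($before));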

Over the last several weeks, we’ve been backfilling the Wikidata stats for almost all the Programs & Events Dashboard events that edited Wikidata, and Campaign pages also show aggregate Wikidata stats.

Thanks, Ivana, for your great work!

So what does this mean for Dashboard users?

Anne Chen, a Wikidata Institute alumna, has been using Wikidata more in the archaeology course she teaches at Yale University. As you can see from this screenshot of a recent edit-a-thon, there are many more granular Wikidata statistics you can follow. Prior to this update, the Dashboard provided users with statistics limited to the number of participants, items created, items edited, total edits, references added, and page views. Although these are useful statistics to have access to, the nature of Wikidata editing demands other sets of metrics.

Screenshot from Dashboard
Wikidata Dashboard detailed statistics example

Merging, for instance, is an important feature of Wikidata editing. Data comes to Wikidata from different corners of the world all at once, so duplication is a natural occurrence, but it still needs to be addressed. Now this specific metric is easy to track on the Dashboard. Similarly, label, alias, and description work is essential for translation, disambiguation, and providing context to users about items. These statistics used to be more difficult to discover; now they show up in the statistics box on the Dashboard.

Screenshot of the Download Stats button
Download stats button on the Dashboard

For experienced Dashboard users: you may be used to obtaining these statistics from the “Download stats” button on the home tab of the Dashboard. This button still exists, so if it’s more convenient to have these stats as a CSV file, you can still get them that way! For those curious users wondering what “other updates” means: those are edits made outside of the Item space on Wikidata, including user pages, talk pages, and WikiProject pages.

We’re excited to make these new statistics more accessible, since different projects aim for different kinds of outcomes. The more statistics we can track, the better we can tell the stories of our impact and work on Wikidata. We hope you enjoy these new features.

If you’re interested in learning more about Wikidata, editing Wikidata, and Wikidata statistics, keep an eye on our calendar for future courses.

Sowt and the Wikimedia Foundation have partnered to produce a new season of Sowt’s podcast Manbet. Hosted and written by veteran content creator Bisher Najjar, the educational podcast explores various topics from the fields of humanities and society.

This partnership is a result of Sowt and the Wikimedia Foundation’s mutual vision to share knowledge with the world. The Wikimedia Foundation is the global nonprofit that operates Wikipedia and the other Wikimedia free knowledge projects and aims at ensuring every single human can freely share in the sum of all knowledge. Sowt produces and distributes high-quality audio shows in Arabic to create a dialogue around the most important topics to Arab listeners across the world.

“This partnership brings together the power of audio storytelling and the importance of open knowledge and access to information. As a leader in Arabic podcast production, Sowt and the Wikimedia Foundation are aligned on expanding access to high quality audio content for Arabic speaking audiences,” said Jack Rabah, the Wikimedia Foundation Lead Regional Partnerships Manager (Middle East and Africa). “Together, we can build greater awareness and understanding of Wikipedia through a series of informative narrated podcasts for all listeners across the MENA region.” 

The first episode of Manbet, published in October 2020, was followed by three seasons, with topics ranging from the history of passports and the birth of Arab feminism to the fashion revolution, among many others. Tune in to the new season of Manbet to uncover topics such as Sufism, the life of Native Americans, the history of Yemen, the story of Nollywood, and humans’ ancient dream of flying.

“I believe that the partnership between Sowt and the Wikimedia Foundation enriches the production of content in the Arab Region and Manbet is the best program to reflect this kind of collaboration. This season of Manbet attempts to take our audience through a journey of history, cinema, music and thriller. This span of information and stories will give us an insight of how knowledge has been and will always be power,” said Ahmed Eman Zakaria, Manbet producer working with Sowt.

The new season of Manbet comes out in April 2022, and you can listen to new episodes wherever you get your podcasts.

Links:

Outreachy report #30: March 2022

00:00, Tuesday, 05 2022 April UTC

March was a tough month (my partner and I had dengue fever as we reviewed and processed initial applications), but we made it through. ✨ Team highlights: Sage developed new code to help us review and process school time commitments. Sage and I have been trying for years to develop strategies for reviewing academic calendars quickly. We’ve gone from external notes, to trying to gather data on specific schools, to requesting that initial application reviewers assign students to us.

WikiCrowd at 50k answers

19:13, Monday, 04 2022 April UTC

In January 2022 I published a new Wikimedia tool called WikiCrowd.

This tool allows people to answer simple questions to contribute edits to Wikimedia projects such as Wikimedia Commons and Wikidata.

It’s designed to handle a wide variety of questions, but due to time constraints the current questions only cover aliases for Wikidata and depicts statements for Wikimedia Commons.

The tool has just surpassed 55k questions, 50k answers, 32k edits and 75 users.

Thanks to @pmgpmgpmgpmg (Twitter, Github) and @waldyrious (Twitter, Github) for their sustained contributions to the project, filing issues as well as contributing code and question definitions.

User Leaderboard

Though I haven’t implemented a leaderboard as part of the tool, the number of questions answered and the resulting edits are tracked in the backend.

Thus, of the 50k answers, we can take a look at who contributed to the crowd!

  1. PMG: 35,581 answers resulting in 21,084 edits at a 59% edit rate
  2. I dream of horses: 4543 answers resulting in 3184 edits at a 70% edit rate
  3. Tiefenschaerfe: 3749 answers resulting in 3207 edits at an 85% edit rate
  4. Addshore: 3049 answers resulting in 2133 edits at a 69% edit rate
  5. OutdoorAcorn: 708 answers resulting in 526 edits at a 74% edit rate
  6. Waldyrious: 443 answers resulting in 310 edits at a 69% edit rate
  7. Fences and windows: 409 answers resulting in 242 edits at a 59% edit rate
  8. Amazomagisto: 328 answers resulting in 211 edits at a 64% edit rate

Thanks to all of the 75 users that have given the tool a go in the past months.

Answer overview

  • Yes is the favourite answer with 32,192 occurrences
  • No comes second with 13,473 occurrences
  • And a total of 3,818 questions were skipped altogether

In the future, skipped questions will likely be presented to a user a second time.

Question overview

Depicts questions have by far been the most popular, and also the easiest to generate more interesting groups of questions for.

  • 48,236 Depicts questions
  • 776 Alias questions
  • 471 Depicts refinement questions

The question mega groups were split into subgroups.

  • Depicts has had 45 different things that could be depicted
  • Aliases can be added from 3 different language Wikipedias
  • Depicts refinement has been used on 19 of the 45 depicted things

Question success rate

Some questions are harder than others, and some questions have better filtering in terms of candidate answers than others.

For this reason, I suspect that some questions will have a much higher success rate than others, and some will have more skips.

At a high level, the groups of questions have quite different yes rates.

  • Depicts: 65% yes, 27% no, 8% skip
  • Alias: 54% yes, 23% no, 21% skip
  • Depicts refinement: 95% yes, 2% no, 2% skip

If we take a deeper dive into the depicts questions, we can probably see some depictions that are hard to spot, or Commons categories that possibly include a wider variety of media around a core subject.

An example of this would be categories for US presidents that also include whole categories for election campaigns, or demonstrations, neither of which would normally feature the president.

Depicts | Yes | No | Skip
firework | 95.99% | 0% | 4.01%
jet aircraft | 95.19% | 3.48% | 1.33%
helicopter | 89.50% | 1.41% | 9.09%
dog | 87.70% | 8.55% | 3.76%
steam locomotive | 85.24% | 7.48% | 7.28%
duck | 83.35% | 10.14% | 6.51%
train | 82.75% | 10.66% | 6.59%
hamburger | 82.58% | 5.63% | 11.80%
candle | 77.07% | 16.67% | 6.27%
house cat | 74.26% | 16.31% | 9.43%
laptop | 63.32% | 27.36% | 9.32%
bridge | 61.36% | 23.93% | 14.71%
parachute | 61.04% | 20.22% | 18.74%
camera | 57.85% | 39.86% | 2.29%
electric toothbrush | 48.79% | 34.76% | 16.45%
Barack Obama | 28.29% | 70.23% | 1.49%
pie chart | 21.13% | 61.76% | 17.11%
covered bridge | 3.51% | 79.61% | 16.88%
Summary of depicts questions (where over ~1,000 questions exist), ordered by yes %

The rate of yes answers could be used to gauge the difficulty of questions, allowing some users to pick harder categories, or steering new users towards easy questions first.

As question generation is tweaked, particularly for depicts questions where categories can be excluded, we should also see the yes % change over time. Slowly tuning question generation to get into an 80% yes range could be fun!

Of course, none of this is implemented yet ;)…

Queries behind this data

Just in case this needs to be generated again, here are the queries used.

For the user leaderboards…


DB::table('answers')
    ->select('username', DB::raw('count(*) as answers'))
    ->groupBy('username')
    ->orderBy('answers', 'desc')
    ->join('users', 'answers.user_id', '=', 'users.id')
    ->limit(10)
    ->get();

DB::table('edits')
    ->select('username', DB::raw('count(*) as edits'))
    ->groupBy('username')
    ->orderBy('edits', 'desc')
    ->join('users', 'edits.user_id', '=', 'users.id')
    ->limit(10)
    ->get();

And the question yes rate data came from the following query and a pivot table…


DB::table('questions')
    ->select('question_groups.name', 'answer', DB::raw('count(*) as counted'))
    ->join('answers', 'answers.question_id', '=', 'questions.id', 'left outer')
    ->join('edits', 'edits.question_id', '=', 'questions.id', 'left outer')
    ->join('question_groups', 'questions.question_group_id', '=', 'question_groups.id')
    ->groupBy('question_groups.name', 'answer')
    ->orderBy('question_groups.name', 'desc')
    ->get();

Looking forward

Come and contribute code, issues, or ideas on the GitHub repo.

Next blog post at 100k? Or maybe, now that there are cron jobs for question generation (people don’t have to wait for me), 250k is a more sensible next step.

The post WikiCrowd at 50k answers appeared first on addshore.

Tech News issue #14, 2022 (April 4, 2022)

00:00, Monday, 04 April 2022 UTC
2022, week 14 (Monday 04 April 2022)

weeklyOSM 610

10:02, Sunday, 03 April 2022 UTC

22/03/2022-28/03/2022

lead picture

JOSM on a Steam Deck [1] © by Riiga licensed under CC BY-SA 4.0 | map data © OpenStreetMap contributors (ODbL) | JOSM: GPLv2 or later

Mapping campaigns

  • OSM Ireland’s building mapping campaign reached a significant milestone as reported by Amanda McCann.

Mapping

  • FasterTracker ponders (pt) > de the lack of a clear and immediate definition for the use of the network key in the context of public transport, taking the example of the AML (pt) > en, the Lisbon metropolitan area.
  • Minh Nguyen blogged about oddities of township boundaries in Ohio (and as both Minh and commenters point out, it is not just Ohio).
  • muchichka pointed out (uk) > en that providing information about the movement and deployment of military forces and relevant international aid is forbidden according to a recent amendment of the Ukrainian Criminal Code. The diary post title indicates that in muchichka’s interpretation this extends to any mapping of military facilities.
  • The following proposals are waiting for your comments:
    • Standardising the tagging of manufacturer:*=* and model:*=* of artificial elements.
    • Introducing quiet_hours=* to facilitate people looking for autism-friendly opening hours.
    • Clarifying the difference between surveillance:type=guard and office=security.
    • Adding loading dock details like dock:height=*, door:height=* or door:width=*.

Community

  • [1] @riiga#7118, on the OSM World Discord, showed JOSM running on their Steam Deck game console: ‘No matter the device, no matter the place: mapping first, and with JOSM of course’. Original post (Discord login required).
  • Based on the OSM Community Index, Who Maps Where (WMW) allows one to search for a mapper with local knowledge anywhere in the world. If you’re okay with your area of local knowledge being shown on the map, the project’s README on GitHub describes how that works.
  • qeef shared his view that communication within the HOT Tasking Manager is wrong because it duplicates OSM functionality.

OpenStreetMap Foundation

  • Guillaume Rischard noted, on Twitter, a blog post from Fastly about how the OpenStreetMap Operations team is using the Fastly CDN to ‘provide updates in near real-time’.

Education

  • unen’s latest diary entry continued his reflections on the discussions at his weekly help desk sessions for the HOT Open Mapping Hub Asia-Pacific. He invited people to provide contact details to be informed of future discussion agendas. Issues from recent weeks included accessing older versions of OSM data, and participants’ problems with JOSM’s remote control feature.

Maps

  • Marcos Dione was dissatisfied with the appearance of hill shading in areas of Northern Europe. In his blog he explained how this is a result of the way hill shading is calculated using OSGeo tools.


switch2OSM

  • PlayzinhoAgro wrote, in his blog, about adding public service points to address gender-based violence in Brazil. Volunteer lawyers and psychologists providing assistance are shown (pt) > en on a map.

Open Data

  • ITDP has published a recording of the webinar ‘Why Open Data Matters for Cycling’, available on the Trufi Association website.

Software

  • A new version of Organic Maps has been released for iOS and Android. Map data was updated and Wikipedia articles were added. As usual, the release also includes small bugfixes for routing, styles, and translations.
  • Anton Khorev has released ‘osm-note-viewer’, an alternative to https://www.openstreetmap.org/user/username/notes, where one can have an overview of notes related to a user both as a list and on a map.

Releases

  • GNU/Linux.ch reported (de) > en on the new version of StreetComplete. The intuitive usability, even for OSM newbies, is highlighted.

Other “geo” things

  • @Pixel_Dailies (a Twitter account) challenges pixel artists with a new theme every day. On Monday the theme was bird’s eye view, and most of the participants’ entries, which can be found through #BirdsEyeView, feature some kind of aerial map.
  • Valentin Socha tweeted screen captures from 1993 French weather reports, where weekly forecasts were shown on a cut-out map with a letter for each day.

Upcoming Events

Where | What | When
Tucson | State of the Map US | 2022-04-01 – 2022-04-03
Burgos | Evento OpenStreetMap Burgos (Spain) 2022 | 2022-04-01 – 2022-04-03
Região Geográfica Imediata de Teófilo Otoni | Mapathona na Cidade Nanuque – MG – Brasil – Edifícios, Estradas, Pontos de Interesses e Área Verde | 2022-04-02 – 2022-04-03
Bogotá Distrito Capital – Municipio | Notathon en OpenStreetMap – resolvamos notas de Latinoamérica | 2022-04-02
Ciudad de Guatemala | Segundo mapatón YouthMappers en Guatemala (remoto) | 2022-04-02 – 2022-04-03
– | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-04
– | OSMF Engineering Working Group meeting | 2022-04-04
Bologna | Open Data Pax | 2022-04-04
Stuttgart | Stuttgarter Stammtisch | 2022-04-05
Greater London | Missing Maps London Mapathon | 2022-04-05
Berlin | OSM-Verkehrswende #34 (Online) | 2022-04-05
– | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-06
– | Tasking Manager Collective Meet Up – Option 1 | 2022-04-06
– | Tasking Manager Collective Meet Up – Option 2 | 2022-04-06
Heidelberg | Heidelberg Int’l. Weeks Against Racism: Humanitarian Cartography and OpenStreetMap | 2022-04-06
Berlin | 166. Berlin-Brandenburg OpenStreetMap Stammtisch | 2022-04-08
– | OSM Africa April Mapathon: Map Kenya | 2022-04-09
– | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-11
臺北市 | OpenStreetMap x Wikidata Taipei #39 | 2022-04-11
Washington | MappingDC Mappy Hour | 2022-04-13
San Jose | South Bay Map Night | 2022-04-13
20095 | Hamburger Mappertreffen | 2022-04-12
– | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-13
Michigan | Michigan Meetup | 2022-04-14
– | OSM Utah Monthly Meetup | 2022-04-14
– | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-18
– | 150. Treffen des OSM-Stammtisches Bonn | 2022-04-19
City of Nottingham | OSM East Midlands/Nottingham meetup (online) | 2022-04-19
Lüneburg | Lüneburger Mappertreffen (online) | 2022-04-19
– | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-20
Dublin | Irish Virtual Map and Chat | 2022-04-21

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, Nordpfeil, PierZen, SK53, Sammyhawkrad, Strubbl, TheSwavu, UNGSC_Alessia13, alesarrett, derFred.

Profiling a Wikibase item creation on test.wikidata.org

21:54, Saturday, 02 April 2022 UTC

Today I was in a Wikibase Stakeholder group call, and one of the discussions was around Wikibase importing speed, data loading, and the APIs. My previous blog post covering what happens when you make a new Wikibase item was raised, and we also got onto the topic of profiling.

So here comes another post looking at some of the internals of Wikibase, through the lens of profiling on test.wikidata.org.

The tools used to write this blog post are both open source and publicly available on Wikimedia infrastructure. You can do similar profiling on your own Wikibase, or for requests that you suspect are slow on Wikimedia sites such as Wikidata.

Wikimedia Profiling

Profiling of Wikimedia sites is managed and maintained by the Wikimedia performance team. They have a blog, and one of the most recent posts was actually covering profiling PHP at scale in production, so if you want to know the details of how this is achieved give it a read.

Throughout this post I will be looking at data collected from a production Wikimedia request, by setting the X-Wikimedia-Debug header in my request. This header has a few options, and you can find the docs on wikitech.wikimedia.org. There are also browser extensions available to easily set this header on your requests.
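As a minimal sketch, setting the header with curl instead of a browser extension could look like the following. The backend hostname and attribute syntax here are assumptions from the wikitech.wikimedia.org docs rather than values from this post, so check there for what is currently valid.

# Sketch only: ask a debug backend to profile this request and send the
# result to XHGui. The backend hostname is illustrative; see
# wikitech.wikimedia.org/wiki/X-Wikimedia-Debug for current options.
curl -s 'https://test.wikidata.org/w/api.php?action=query&meta=siteinfo&format=json' \
  -H 'X-Wikimedia-Debug: backend=mwdebug1001.eqiad.wmnet; profile'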

I will be using the Wikimedia hosted XHGui to visualize the profile data. Wikimedia specific documentation for this interface also exists on wikitech.wikimedia.org. This interface contains a random set of profiled requests, as well as any requests that were specifically requested to be profiled.

Profiling PHP & MediaWiki

If you want to profile your own MediaWiki or Wikibase install, or PHP in general, then you should take a look at the mediawiki.org documentation page for this. You’ll likely want to use either Tideways or XDebug, but probably want to avoid having to set up any extra UI to visualize the data.

This profiling only covered the main PHP application (MediaWiki & Wikibase extension). Other services such as the query service would require separate profiling.

Making a profiled request

On test.wikidata I chose a not-so-random item (Q64), which happens to be a small version of the item for Berlin on Wikidata. It has a bunch of labels and a couple of statements.

I made a few modifications, including removing the ID and changing all labels to avoid conflicts with the item that I had just copied, and came up with some JSON ready to feed back into the API.

I navigated to the API sandbox for test.wikidata.org and set up a request using wbeditentity, which would allow me to create a fresh item. The options look something like this:

  • new = item
  • token = <Auto-fill the token using the UI button>
  • data = <json data that I am using to create an item>

With the XHGui option selected in the WikimediaDebug browser extension, I can hit the “Make request” button and should see my item created. The next page will also output the full runtime of the request from the client perspective, in this case roughly 3.6 seconds.
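Outside the sandbox, the same request could be made directly against the API. This is a hedged sketch rather than the exact request I made: the JSON is trimmed to a single label, the token is a placeholder, and an authenticated session is assumed.

# Sketch of an equivalent wbeditentity call. <csrf-token> is a placeholder;
# obtain one via action=query&meta=tokens with a logged-in session
# (cookies.txt here assumes a prior login).
curl -s 'https://test.wikidata.org/w/api.php' \
  -b cookies.txt \
  -H 'X-Wikimedia-Debug: backend=mwdebug1001.eqiad.wmnet; profile' \
  --data-urlencode 'action=wbeditentity' \
  --data-urlencode 'new=item' \
  --data-urlencode 'format=json' \
  --data-urlencode 'token=<csrf-token>' \
  --data-urlencode 'data={"labels":{"en":{"language":"en","value":"Profiling example"}}}'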

Finding the request in XHGui

Opening up XHGui I should find the POST request that I just made to test.wikidata somewhere near the top of the list of profiled requests.

Clicking on the Time column, the details page of the profiled request will load. You can find my request, id 61fc06c1fe879940dbdf4a38 (archive URL just in case).

Profiling overview

There are lots of gotchas when it comes to reading a profile such as this:

  • The fact that profiling is happening will generally make everything run slower
  • Profiling tends to overestimate the cost of calling functions, so small functions called many times will appear to be worse than they actually are
  • When IO is involved, such as caching (if the cache is cold), database writes, relying on the internet, or external services, any number of things can cause individual functions to become inflated
  • It’s hard to know what any of it means, without knowing what the classes and methods are doing

Next let’s look at some terms that it makes sense to understand:

  • Wall time: also called real-world time, is the actual time that a thing has taken to run. This includes things such as waiting for IO, or your CPU switching to low power mode.
  • CPU time: also called process time, is the amount of time the CPU actually spent processing instructions, excluding things such as time spent waiting for IO.
  • Self: also called exclusive, covers the resources spent in the function itself, excluding time spent in children.
  • Inclusive: covers the resources spent in the function, inclusive of all of its children.

You can read some more about different types of time and inclusivity in profiling on the Time docs for blackfire.io.

Reading the profile

The full wall time of the request is 5,266,796 µs, or 5.2 seconds. This is significantly more than we saw from the perspective of the client making the API request. This is primarily because of the extra processing that MediaWiki and Wikibase does after sending a response back to the user.

The full CPU time of the request is 3,543,361 µs, or 3.5 seconds. We can infer from this that the request included roughly 1.7 seconds of time not doing computations. This could be waiting for databases, or other IO.

We can find likely candidates for this 1.7 seconds of time spent not computing by looking at the top of the function breakdown for wall time, and comparing CPU time.

Method | Calls | Self wall time | Self CPU time | Difference
Wikimedia\Rdbms\DatabaseMysqli::doQuery | 809 | 1,003,729 µs | 107,371 µs | ~0.9 s
GuzzleHttp\Handler\CurlHandler::__invoke | 1 | 371,120 µs | 2,140 µs | ~0.3 s
MultiHttpClient::runMultiCurl | 15 | 280,697 µs | 16,066 µs | ~0.25 s
Wikimedia\Rdbms\DatabaseMysqli::mysqlConnect | 45 | 68,183 µs | 15,229 µs | ~0.05 s

The four methods above have a combined difference between wall and CPU time of 1.5 s, which accounts for most of the 1.7 s we were looking for. The most expensive method call here is actually the single call to GuzzleHttp\Handler\CurlHandler::__invoke, which spends 0.3 s waiting; all of the other methods are called many more times. On average, Wikimedia\Rdbms\DatabaseMysqli::doQuery only spends 0.001 s per call in this request.

GuzzleHttp\Handler\CurlHandler::__invoke

Let’s have a closer look at this GuzzleHttp\Handler\CurlHandler::__invoke call. We have a few options to see what is actually happening in this method call.

  1. Click on the method to see the details of the call, navigate up through the parents to find something that starts to make some sense
  2. Use the callgraph view (only shows methods that represent more than 1% of execution time)

I’ll choose number 2, and have included a screenshot of the very tall call graph for this method to the right.

At the top of this call we see MediaWiki\SyntaxHighlight\Pygmentize::highlight, which I was not expecting in such an API call.

Another level up we see WANObjectCache::fetchOrRegenerate which means that this was involved in a cache miss, and this data was regenerated.

Even further up the same tree I see SyntaxHighlight::onApiFormatHighlight.

This method is part of the SyntaxHighlight extension, and spends some time making the output of the API pretty for users in a web browser.

So what have I learnt here? Don’t profile with jsonfm. However using the API sandbox you don’t get this option, and thus bug T300909 was born.

Callgraph overview

Having the callgraph open we can see some of the most “expensive” methods in terms of inclusive wall time. You can also find these in the table view by sorting using the headings.

main() represents the bulk of the MediaWiki request (5.2s). This is split into ApiMain::execute taking ~3.4 seconds, and MediaWiki::doPostOutputShutdown taking ~1.7 seconds.

ApiMain::execute

This is where the “magic happens” so to speak. ~3.4 seconds of execution time.

The first bit of Wikibase code you will see in this call graph path is Wikibase\Repo\Api\ModifyEntity::execute. This is the main execute method in the base class that is used by the API that we are calling. Moving to this Wikibase code we also lose another ~0.4 seconds due to my syntax highlighting issue, which we can ignore.

Taking a look at the next level of methods in the order they run (roughly) we see most of the execution time.

Method | Inclusive wall time | Description
Wikibase\Repo\Api\ModifyEntity::loadEntityFromSavingHelper | ~0.2 seconds | Load the entity (if it exists) that is being edited
Wikibase\Repo\Api\EditEntity::getChangeOp | ~0.6 seconds | Takes your API input and turns it into ChangeOp objects (previous post)
Wikibase\Repo\Api\ModifyEntity::checkPermissions | ~0.3 seconds | Checks the user permissions to perform the action
Wikibase\Repo\Api\EditEntity::modifyEntity | ~1.8 seconds | Takes the ChangeOp objects and applies them to an Entity (previous post)
Wikibase\Repo\Api\EntitySavingHelper::attemptSaveEntity | ~0.4 seconds | Takes the Entity and persists it in the SQL database

In the context of the Wikibase stakeholder group call I was in today, which covered initial import speeds and general editing speeds, what could I say about this?

  • Why spend 0.3 seconds of an API call checking permissions? Perhaps you are doing your initial import in a rather “safe” environment. Perhaps you don’t care about all of the permissions that are checked?
  • Permissions are currently checked in 3 places for this call: 1) up front, 2) if we need to create a new item, 3) just before saving. In total this makes up ~0.6 seconds according to the profiling.
  • Putting the formed PHP Item object into the database actually only takes ~0.15 seconds.
  • Checking the uniqueness of labels and descriptions takes up ~1.2 seconds of the validation of ChangeOps. Perhaps you don’t want that?

MediaWiki::doPostOutputShutdown

This is some of the last code to run as part of a request.

The name implies it, but to be clear this PostOutputShutdown method runs after the user has been served with a request. Taking a look back at the user-perceived time of 3.6 seconds, we can see that the wall time of the whole request (5.2s) minus this post output shutdown (1.7s) is roughly 3.5 seconds.

In relation to my previous post from the point of view of Wikibase, this is when most secondary data updates will happen. Some POST SEND derived data updates also happen in this step.

Closing

As I stated in the call, Wikibase was created primarily with the use case of Wikidata in mind. There was never a “mass data load” stage for Wikidata requiring extremely high edit rates in order to import thousands or millions of items. Thus the interfaces and internals do not cater to this use case, and optimizations or configurations that could be made have not been made.

I hope that this post will trigger some questions around expensive parts of the editing flow (in terms of time), and also springboard more folks into looking at profiling, whether of Wikidata and test.wikidata or of their own Wikibase installs.

For your specific use case you may see some easy wins with what is outlined above. But remember that this post and specific profiling is only the tip of the iceberg, and there are many other areas to look at.

The post Profiling a Wikibase item creation on test.wikidata.org appeared first on addshore.

Altering a Gerrit change (git workflow)

21:54, Saturday, 02 April 2022 UTC

I don’t use git-review for Gerrit interactions. This is primarily because back in 2012/2013 I couldn’t get git-review installed, and someone presented me with an alternative that worked. Years later I realized that this was actually the documented way of pushing changes to Gerrit.

As a little introduction to what this workflow looks like, and as a comparison with git-review, I have created two overview posts on altering a Gerrit change on the Wikimedia Gerrit install. I’m not trying to convince you that either way is better, merely to show the similarities and differences and what is happening behind the scenes.

Be sure to take a look at the other post, “Altering a Gerrit change (git-review workflow)”.

I’ll be taking a change from the middle of last year, rebasing it, making a change, and pushing it back for review. Fundamentally the two approaches do the same thing; it’s just that one (git-review) requires an external tool.

1) Rebase

Firstly I’ll rebase the change by clicking the “Rebase” button in the top right of the UI. (This step is entirely optional.)

This will create a second patchset on the change, automatically rebased on the master branch if possible (otherwise Gerrit would tell you to rebase locally).

2) Checkout

In order to checkout the change I’ll use the “Download” button on the right of the change near the changed files.

A dialogue will appear with a bunch of commands that I can copy depending on what I want to do.

As I want to alter the change in place, I’ll use the “Checkout” link.

This will fetch the ref/commit, and then check it out.
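As a rough sketch, the command pair behind that “Checkout” link looks something like the following. The project path, change number, and patchset are placeholders here, since the dialogue fills in the real ones for your change.

# Placeholder values: substitute the project and change ref shown in the
# Download dialogue. Gerrit change refs follow refs/changes/<NN>/<change>/<patchset>.
git fetch https://gerrit.wikimedia.org/r/<project> refs/changes/<NN>/<change-number>/<patchset>
git checkout FETCH_HEAD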

3) Change

I can now go ahead and make my change to the commit in my IDE.

The change is quite small and can be seen in the diff below.

Now I need to amend the commit that we fetched from gerrit.

If I want to change the commit message in some way I can do git commit --all --amend.

If there is no need to change the commit message you can also pass the --no-edit option.

You’ll notice that we are still in a detached state, but that doesn’t matter too much, as the next step is pushing to gerrit, and once that has happened we don’t need to worry about this commit locally.

4) Push

In order to submit the altered commit back to Gerrit, you can just run the following command:


git push origin HEAD:refs/for/master

The response of the push will let you know what has happened, and you can find the URL back to the change here.

A third patchset now exists on the change on Gerrit.

Overview

The whole process looks something like this.

Visualization created with https://git-school.github.io/
  1. A commit already exists on Gerrit that is currently up for review
  2. Clicking the rebase button will rebase this commit on top of the HEAD of the branch
  3. Fetching the commit will bring that commit on to your local machine, where you can now check it out
  4. Making a change and amending the commit will create a new commit locally
  5. You can then push this altered commit back to gerrit for review

If you want to know more about what Gerrit is doing, you can read the docs on the “gritty details”.

Git aliases

You can use a couple of git aliases to avoid some of these slightly long commands:


alias.amm=commit -a --amend
alias.amn=commit -a --amend --no-edit
alias.p=!f() { git push origin HEAD:refs/for/master; }; f

And you can level these up to provide a little more flexibility:


alias.amm=commit -a --amend
alias.amn=commit -a --amend --no-edit
alias.main=!git symbolic-ref refs/remotes/origin/HEAD | sed 's@^refs/remotes/origin/@@'
alias.p=!f() { git push origin HEAD:refs/for/$(git main)%ready; }; f
alias.pd=!f() { git push origin HEAD:refs/for/$(git main)%wip; }; f
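The aliases above are shown in git config file syntax. As a usage sketch, the equivalent one-off commands to register the simpler ones would be something like:

# Register the aliases globally (this writes them to the [alias]
# section of ~/.gitconfig).
git config --global alias.amm 'commit -a --amend'
git config --global alias.amn 'commit -a --amend --no-edit'
git config --global alias.p '!f() { git push origin HEAD:refs/for/master; }; f'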

You can read more about my git aliases in a previous post.

The post Altering a Gerrit change (git workflow) appeared first on addshore.

Let’s talk about relationships — nothing gossip-y — but, rather, how does one thing relate to something else? On Wikidata we talk about relationships using something called properties. Part of the semantic triple (subject, predicate, object — or in Wikidata parlance, item, property, value), properties define how one thing relates to another on Wikidata. Is it a date? A name? A location? An image? An identifier? Here’s an example: for those in the northern hemisphere, we may be thankful that this post is being published as spring (Q1312) follows (P155) winter (Q1311). In that sentence ‘follows’ is the property that explains a relationship between ‘winter’ and ‘spring.’ The Wikidata community uses properties to define any kind of relationship between things. How many properties are there? I’m glad you asked.

As of March 2022, there are around 10,000 properties on Wikidata. Roughly 7,000 of these are external identifier properties (external identifier properties correspond to external collections — museums and libraries — whose collection includes a person, place or concept that also exists in Wikidata). That leaves around 3,000 properties the community uses to describe everything. You can read the discussion page of any property to orient yourself to that property, but there are other ways to understand how properties work too. Knowing where to start with those can be a little overwhelming. This post will profile properties about properties. If that sounds confusing, I get it! I’ll provide plenty of examples to contextualize everything and help you better understand how properties work.

Let’s learn through examples. As you discover properties, wouldn’t it be wonderful if there were a way to see the property in action to know if you were using it correctly? I have good news for you: there IS a property that does this. It’s called Wikidata Property Example (P1855 for super-fans). Click that link, and read all about property examples, including links to queries where you can see thousands of properties — with examples — in the wild on Wikidata. To review: there is a property on Wikidata that exists to give you examples of properties and how they work. Can you focus the query on a specific property? Yes. Can you get multiple examples for one query? Yes. Does the example I shared list all properties with examples? Yes! Is this one of the best ways you can use properties like a pro? Absolutely.
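If you’d rather explore those examples programmatically, here is a minimal sketch of a query against the Wikidata Query Service. The SPARQL is my assumption of how P1855 is commonly queried (via its direct wdt: form), not a query taken from this post.

# Ask the Wikidata Query Service for ten properties and one example each,
# via the P1855 (Wikidata property example) statements on the properties.
curl -sG 'https://query.wikidata.org/sparql' \
  -H 'Accept: text/csv' \
  --data-urlencode 'query=SELECT ?property ?example WHERE { ?property wdt:P1855 ?example } LIMIT 10'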

Now that you’re familiar with one way to learn how a property works, consider this: maybe the dataset you are working with requires you to describe an inverse relationship, or something that is the opposite of something else. If only there were a property that could express an inverse relationship! Well, today is your lucky day, because there is a property called inverse property (P1696) that does exactly that. Please note, and this is very important, that this property is used on other properties on Wikidata whose relationships are inverse to each other. For example, the follows (P155) property and the followed by (P156) property are linked to each other by the inverse property. Another example would be family relationships, such as a parent property (mother/father) and the child property.

If you’re not talking about relationships (properties), but rather items (concepts, people, places), there is a completely different property called opposite of (P461) that the community uses to describe conceptual opposites. What’s a conceptual opposite? Think of it this way: the opposite of the color white is the color black; the opposite of summer is winter. It’s okay if it’s a little confusing. Examples will help distinguish these two. To review: the inverse property is used exclusively with relationships (child/parent, capital/capital of, officeholder/position held, owner of/owned by), while “opposite of” is used exclusively to describe opposing concepts. Both of these properties are great for distinguishing related things on Wikidata. Let’s move on to another distinguished property.

You are nearly a property pro. You’re feeling confident, and you understand how these descriptive connections relate to each other. The world is your oyster and you want to describe more things with more properties, more accuracy, and more precision. I love the enthusiasm. There’s a property that can help you do this: it suggests related properties on Wikidata. It’s called (you guessed it) related property (P1659). You can use this property to see other properties related to the one you are wondering about. You can think of it as a “see also” recommendation for properties. There are MANY location-type properties on Wikidata. Suppose you want to know all of the properties related to P131, which describes where things are geographically located. You could use “related properties” in a query to get a list: just like this! This property reveals how properties are related to similar properties, and using it will help make you a super-describer on Wikidata. There’s nothing you can’t describe now!
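As with the property-example query above, this can also be done programmatically. A hedged sketch, again assuming the common wdt: direct-claim form, that lists the properties related to P131:

# List properties linked to P131 via P1659, with their English labels.
curl -sG 'https://query.wikidata.org/sparql' \
  -H 'Accept: text/csv' \
  --data-urlencode 'query=SELECT ?related ?relatedLabel WHERE { wd:P131 wdt:P1659 ?related . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }'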

These three properties (well, four) should reveal more about how to describe anything on Wikidata. Learning how to use properties on Wikidata is essential for maintaining the quality and usefulness of the data. It is also one of the most effective ways to learn how to query and to write better queries. The more familiar you are with properties, the more you will get out of Wikidata (and likely any other dataset you’re working with, whether it’s part of Wikidata or not). Now that you know more about properties on Wikidata, consider these two things:

  1. Wikidata will always require new properties. If one is missing, you can propose it here. Properties also change over time. If an existing property isn’t working for you (or has never worked for you), you can propose changes on the property’s discussion page. The only way Wikidata will ever be an equitable resource is if property usage and definitions work for all kinds of data and relationships in the world.
  2. The properties I’ve shared with you in this post are themselves incomplete. The community could always use more examples, better definitions, and other ways of describing things. Adding statements to items and properties is a very important way you can help improve these resources.

Stay tuned for more Wikidata property exploration posts here. And if you want to learn more, take the Wikidata Institute course I teach!