This Recent research column originally appeared in the March 2022 issue of the Signpost. It is republished from on-wiki, and by extension is dual-licensed under CC BY-SA 4.0 and GFDL 1.3. The authors of this post are Bri, Gerald Waldo Luis and Tilman Bayer.

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

The first scholarly references on Wikipedia articles, and the editors who placed them there

Reviewed by Bri

The authors of the study “Dataset of first appearances of the scholarly bibliographic references on Wikipedia articles”[1] developed “a methodology to detect [when] the oldest scholarly reference [was] added” to 180,795 unique pages on English Wikipedia. The authors concluded the dataset can help investigate “how the scholarly references on Wikipedia grew and which editors added them”. The paper includes a list of the top English Wikipedia editors in a number of scientific research fields.

English Wikipedia lacking in open access references

Reviewed by Gerald Waldo Luis

A four-author study was published by the journal Insights on February 2, 2022, titled “Exploring open access coverage of Wikipedia-cited research across the White Rose Universities”.[2] As implied, it analyzes English Wikipedia references to research published by the universities of the White Rose University Consortium (Leeds, Sheffield, and York) and examines why open access (OA) is an important feature for Wikipedians to use. It concludes that the English Wikipedia is still lacking in OA references, at least among those from the consortium.

The study opens by stating that despite the open nature of Wikipedia editing, there is no requirement to link to OA sources where possible. It then criticizes this lack of scrutiny, reasoning that it is contrary to Wikipedia’s goal of being an accessible portal to knowledge. Several subsequent sections explain the importance of Wikipedia among the research community, which makes OA crucial; this has been recognized by the World Health Organization, which announced it would make its COVID-19 content free for Wikipedia to use. Wikipedia has also proven to be a factor in increasing the readership of papers.

Overall, 300 references were sampled for this study. The authors note: “Of the 293 sample citations where an affiliation could be validated, 291 (99.3%) had been correctly attributed.” “In total,” the study summarizes, “there were 6,454 citations of the [consortium’s] research on the English Wikipedia in the period 1922 to April 2019.” It then presents tables breaking these references down into specific categories: Sheffield was cited the most (2,523), while York was cited the least (1,525). Biology-related articles cited the consortium the most (1,707), while art and writing articles cited it the least (7). As the authors expected, journal articles (specifically from Sheffield) were cited the most (1,565). There is also a table breaking the references down by OA license. York had the highest proportion of OA sources cited on the English Wikipedia (56%). Fewer sources carry non-commercial and non-derivative Creative Commons licenses. The study, however, notes that it is not a review of all English Wikipedia references.

In the penultimate “discussion” section, the study says that while there are many OA references, there is still “some way to go before all Wikipedia citations are fully available [in OA]”, with nearly half of the sampled references paywalled, thus stressing the need for more OA scholarly works. However, the study expresses optimism about this goal, pointing to Plan S, a recent OA-endorsing initiative. It also proposes holding more edit-a-thons, which usually involve librarians and researchers who can help with this OA effort; the study notes that Leeds once held such an edit-a-thon. Its “conclusion” section states that “This [effort] can be achieved through greater awareness regarding Wikipedia’s function as an influential and popular platform for communicating science, [a] greater understanding […] as to the importance of citing OA works over [paywalled works].”

Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome. Compiled by Tilman Bayer

“Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia’s Verifiability”

From the abstract:[3]

“In this paper, we aim to provide an empirical characterization of the reasons why and how Wikipedia cites external sources to comply with its own verifiability guidelines. First, we construct a taxonomy of reasons why inline citations are required, by collecting labeled data from editors of multiple Wikipedia language editions. We then crowdsource a large-scale dataset of Wikipedia sentences annotated with categories derived from this taxonomy. Finally, we design algorithmic models to determine if a statement requires a citation, and to predict the citation reason.”

“Psychology and Wikipedia: Measuring Psychology Journals’ Impact by Wikipedia Citations”

From the abstract:[4]

“We are presenting a rank of academic journals classified as pertaining to psychology, most cited on Wikipedia, as well as a rank of general-themed academic journals that were most frequently referenced in Wikipedia entries related to psychology. We then compare the list to journals that are considered most prestigious according to the SciMago journal rank score. Additionally, we describe the time trajectories of the knowledge transfer from the moment of the publication of an article to its citation in Wikipedia. We propose that the citation rate on Wikipedia, next to the traditional citation index, may be a good indicator of the work’s impact in the field of psychology.”

“Measuring University Impact: Wikipedia Approach”

From the abstract:[5]

“we discuss the new methodological technique that evaluates the impact of university based on popularity (number of page-views) of their alumni’s pages on Wikipedia. […] Preliminary analysis shows that the number of page-views is higher for the contemporary persons that prove the perspectives of this approach [sic]. Then, universities were ranked based on the methodology and compared to the famous international university rankings ARWU and QS based only on alumni scales: for the top 10 universities, there is an intersection of two universities (Columbia University, Stanford University).”

“Creating Biographical Networks from Chinese and English Wikipedia”

From the abstract and paper:[6]

“The ENP-China project employs Natural Language Processing methods to tap into sources of unprecedented scale with the goal to study the transformation of elites in Modern China (1830-1949). One of the subprojects is extracting various kinds of data from biographies and, for that, we created a large corpus of biographies automatically collected from the Chinese and English Wikipedia. The dataset contains 228,144 biographical articles from the offline Chinese Wikipedia copy and is supplemented with 110,713 English biographies that are linked to a Chinese page. We also enriched this bilingual corpus with metadata that records every mentioned person, organization, geopolitical entity and location per Wikipedia biography and links the names to their counterpart in the other language.” “By inspecting the [Chinese Wikipedia dump] XML files, we concluded that there was no metadata that identifies the biographies and, therefore, we had to rely on the unstructured textual data of the pages. […] we decided to rely on deep learning for text classification. […] The task is to assign a document to one or more predefined categories, in our case, “biography” or “non-biography.” […] For our extraction, we used one of the most widely used contextualized word representations to date, BERT, combined with the neural network’s architecture, BiLSTM. BiLSTM is state of the art for many NLP tasks, including text classification. In our case, we trained a model with examples of Chinese biographies and non-biographies so that it relies on specific semantic features of each type of entry in order to predict its category.”

See also an accompanying blog post.

Apparently the authors were unaware of Wikipedia categories such as zh:Category:人物 (or its English Wikipedia equivalent Category:People), which might have provided a useful additional feature for the machine learning task of distinguishing biographies and non-biographies. On the other hand, they made use of Wikidata to generate a training dataset of biographies and non-biographies.

“Learning to Predict the Departure Dynamics of Wikidata Editors”

From the abstract:[7]

“…we investigate the synergistic effect of two different types of features: statistical and pattern-based ones with DeepFM as our classification model which has not been explored in a similar context and problem for predicting whether a Wikidata editor will stay or leave the platform. Our experimental results show that using the two sets of features with DeepFM provides the best performance regarding AUROC (0.9561) and F1 score (0.8843), and achieves substantial improvement compared to using either of the sets of features and over a wide range of baselines”

“When Expertise Gone Missing: Uncovering the Loss of Prolific Contributors in Wikipedia”

From the abstract and paper (preprint version):[8]

“we have studied the ongoing crisis in which experienced and prolific editors withdraw. We performed extensive analysis of the editor activities and their language usage to identify features that can forecast prolific Wikipedians, who are at risk of ceasing voluntary services. To the best of our knowledge, this is the first work which proposes a scalable prediction pipeline, towards detecting the prolific Wikipedians, who might be at a risk of retiring from the platform and, thereby, can potentially enable moderators to launch appropriate incentive mechanisms to retain such `would-be missing’ valued Wikipedians.”

“We make the following novel contributions in this paper. – We curate a first ever dataset of missing editors, a comparable dataset of active editors along with all the associated metadata that can appropriately characterise the editors from each dataset.[…]

– First we put forward a number of features describing the editors (activity and behaviour) which portray significant differences between the active and the missing editors.[…]

– Next we use SOTA machine learning approaches to predict the currently prolific editors who are at the risk of leaving the platform in near future. Our best models achieve an overall accuracy of 82% in the prediction task. […]

An intriguing finding is that some very simple factors like how often an editor’s edits are reverted or how often an editor is assigned administrative tasks could be monitored by the moderators to determine whether an editor is about to leave the platform”

References

Due to software limitations, references are displayed here in image format. For the original list of references with clickable links, see the Signpost column.

A Trainsperiments Week Reflection

04:56, Wednesday, 6 April 2022 UTC

Over here in the Release-Engineering-Team, Train Deployment is usually a rotating duty. We've written about it before, so I won't go into the exact process, but I want to tell you something new about it.

It's awful, incredibly stressful, and a bit lonely.

And last week we ran an experiment where we endeavored to perform the full train cycle four times in a single week... What is wrong with us? (Okay. I need to own this. It was technically my idea.) So what is wrong with me? Why did I wish this on my team? Why did everyone agree to it?

First I think it's important to portray (and perhaps with a little more color) how terrible running the train can be.

How it usually feels to run a Train Deployment and why

Here's a little chugga-choo with a captain and a crew. Would the llama like a ride? Llama Llama tries to hide.

―Llama Llama, Llama Llama Misses Mama

At the outset of many a week I have wondered why, when the kids are safely in childcare and I'm finally in a quiet house well fed and preparing a nice hot shower to not frantically use but actually enjoy, my shoulder is cramping and there's a strange buzzing ballooning in my abdomen.

Am I getting sick? Did I forget something? This should be nice. Why can't I have nice things? Why... Oh. Yes. Right. I'm on train this week.

Train begins in the body before it terrorizes the mind, and I'm not the only one who feels that way.

A week of periodic drudgery which at any moment threatens to tip into the realm of waking nightmare.

―Stoic yet Hapless Conductor

Aptly put. The nightmare is anything from a tiny visual regression to taking some of the largest sites on the Internet down completely.

Giving a presentation but you have no idea what the slides are.

―Bravely Befuddled Conductor

Yes. There's no visibility into what we are deploying. It's a week's worth of changes, other teams' changes, changes from teams with different workflows and development cycles, all touching hundreds of different codebases. The changes have gone through review, they've been hammered by automated tests, and yet we are still too far removed from them to understand what might happen when they're exposed to real world conditions.

It's like throwing a penny into a well, a well of snakes, bureaucratic snakes that hate pennies, and they start shouting at you to fill out oddly specific sounding forms of which you have none.

―Lost Soul been 'round these parts

Kafkaesque.

When under the stress and threat of the aforementioned nightmare, it's difficult to think straight. But we have to. We have to parse and investigate intricate stack traces, run git blames on the deployment server, navigate our bug reporting forms and try to recall which teams are responsible for which parts of the aggregate MediaWiki codebase we've put together which itself is highly specific to WMF's production installation and really only becomes that long after changes merge to main branches of the constituent codebases.

We have to exercise clear judgement and make decisive calls of whether to rollback partially (previous group) or completely (all groups to previous version). We may have to halt everything and start hollering in IRC, Slack channels, mailing lists, to get the signal to the right folks (wonderful and gracious folks) that no more code changes will be deployed until what we're seeing is dealt with. We have to play the bad guys and gals to get the train back on track.

Trainsperiments Week and what was different about it

Study after study shows that having a good support network constitutes the single most powerful protection against becoming traumatized. Safety and terror are incompatible. When we are terrified, nothing calms us down like a reassuring voice or the firm embrace of someone we trust.

―Bessel Van Der Kolk, M.D., The Body Keeps the Score

Four trains in a single week and everyone in Release Engineering is onboard. What could possibly be better about that?

Well, there is safety in numbers, as they say, and not in some Darwinistic way where most of us will be picked off by the train demons and the others will somehow take solace in their incidental fitness, but in a way where we are mutually trusting, supportive, and feeling collectively resourced enough to do the needful with aplomb.

So we set up video meetings for all scheduled deployment windows, and had synchronous handoffs between our European colleagues and our North American ones. We welcomed folks from other teams into our deployments to show them the good, the bad, and the ugly of how their code gets its final send-off 'round the bend and into the setting hot fusion reaction that is production. We found and fixed longstanding and mysterious bugs in our tooling. We deployed four full trains in a single week.

And it felt markedly different.

One of those barn raising projects you read about where everybody pushes the walls up en masse.

―Our Stoic Now Softened but Still Sardonic Conductor

Yes! Lonely and unwitnessed work is de facto drudgery. Toiling safely together we have a greater chance at staving off the stress and really feeling the accomplishment.

Giving a presentation with your friends and everyone contributes one slide.

―Our No Longer Befuddled but Simply Brave Conductor

Many hands make light work!

It was like throwing a handful of pennies into a well, a well of snakes, still bureaucratic and shouty, oh hey but my friends are here and they remind me these are just stack traces, words on a screen, and my friends happen to be great at filling out forms.

―Our Once Lost Now Found Conductor

When no one person is overwhelmed or unsafe, we all think and act more clearly.

The hidden takeaways of Trainsperiment Week

So how should what we've learned during our Trainsperiment Week inform our future deployment strategies and process? How should train deployments change?

The hypothesis we wanted to test by performing this experiment was, in essence:

  1. More frequent deployments will result in fewer changes being deployed each time.
  2. Fewer changes on average means the deployment is less likely to fail. The deployment is safer.
  3. A safer deployment can be performed more frequently. (Positive feedback loop to #1.)
  4. Overall we will: move faster; break less.

I don't know if we've proved that yet, but we got an inkling that yes, the smaller subsequent deployments of the week did seem to go more smoothly. However, one week, even a week of four deployment cycles, is not a large enough sample to say definitively whether running the train more frequently will result in safer, more frequent deployments with fewer failures.

What was not apparent until we did our retrospective, however, is that it simply felt easier to do deployments together. It was still a kind of drudgery, but it was not abjectly terrible.

My personal takeaway is that a conductor who feels resourced and safe is the basis for all other improvements to the deployment process, and I want conductors to not only have tooling that works reliably with actionable logging at their disposal, but to feel a sense of community there with them when they're pushing the buttons. I want them to feel that the hard calls of whether or not to halt everything and rollback are not just their calls but shared in the moment among numerous people with intimate knowledge of the overall MediaWiki software ecosystem.

Tooling, particularly around error reporting and escalation, is a barrier to entry for sure. Once we've made sufficient improvements there we need to get that tooling into other people's hands and show them that this process does not have to be so terrifying. And I think we're on the right track here with increased frequency and smaller sets of changes, but we can't lose sight of the human/social element and foundational basis of safety.

More than anything else, I want wider participation in the train deployment process by engineers in the entire organization along with volunteers.


Thanks to @thcipriani for reading my drafts and unblocking me from myself a number of times. Thanks to @jeena and @brennen for the inspirational analogies.

Brand guidelines navigation

Over the past few months, the Brand Studio team within the Wikimedia Foundation communications department has been working on much-needed updates to the Movement Brand Guidelines portal on Meta-Wiki. The update included expanding the portal with more information, a fresh look, as well as a pilot to provide new do-it-yourself tools using the cloud-based design platform Figma.

It all started last October when the Wikimedia Foundation Board of Trustees approved a resolution to advance key areas of global branding while extending the pause of renaming work. One of the new projects the Board of Trustees directed the Brand Studio team to work on was supporting the movement with updated brand guidelines.

The brand portal on Meta-Wiki  is used by affiliates, foundation staff, the press and individuals. Unfortunately over time, the needs of brand users began to outpace the portal’s resources. In February we shared a proposal with the community to replace it with an updated version and continue to apply updates on the portal at least biannually based on community feedback. 

Updated content and navigation

The new portal provides the user with seven main sections to navigate: Overview, Logo, Typography, Colours, Imagery, Campaigns and Events, and Create. The logo section, for example, provides information about all the Wikimedia marks, not just the Wikimedia Foundation’s logo. 

Our colours have been expanded to include all the Wikimedia colour palettes: the Core palette (black, white and greyscale), the Legacy palette (our legendary tricolour), and the Creative palette (inspired by the Wikipedia 20 birthday identity). We have also defined our colours with specific values so you can easily implement them in your designs, and shared colour versions that have been optimised for accessibility.

Updated templates and new do-it-yourself tools

The Create section of the portal provides the user with ready-made templates on Figma that can be used to create a new logo for an affiliate or a community activity in a very short time, without any need for design experience. In addition, the section provides a new presentation template to replace the one we have all been using in events and meetings since 2016.

Last January, we shared the proposed portal, design templates and the new presentation template for feedback. We would like to thank everyone who took the time to review the materials and share their feedback with us through the survey, office hours, email or Meta-Wiki. We used much of the feedback provided to apply rapid changes to the tools before publishing them, and we are planning to use more of this feedback during the next round of updates.

Training

Please join us on Tuesday 19 April 2022 at 15:00 UTC (using this link) or Sunday 24 April 2022 at 08:00 UTC (using this link) for a 60-minute live workshop where we will walk you through the new tools and how to use them, create designs together, and answer any questions you may have about the updates. Recorded summaries of the training will be available to watch later for those who cannot attend the live workshop.

We look forward to hearing more feedback on the updates from across the movement and to seeing the updates in use by affiliates and individuals.

This blog post is part of a three part blog series exploring how organizing helps the Wikimedia movement grow participation and respond to movement strategy. Parts two and three will be published during the next several weeks, and links will be updated here.

When I first joined the Wikipedia community, I became known among many in the United States community as a young, enthusiastic advocate — as someone who could share excitedly the promise of GLAM-Wiki, the Education Program and editing Wikipedia to change the world. I ran nearly a dozen events powered by enthusiasm — advocating Wikipedia to anyone I could, and making sure that they edited at least once. 

How many of those early recruits are still in the movement? Nearly none. 

12+ years of organizing later, I look back and see that only a handful of the people I contacted in those first few years are still contributing to the movement. My enthusiasm for the mission, the opportunities that I saw on Wikipedia, and the energy I invested in recruiting in those first few years felt a bit of a waste. Fun, but poorly spent energy. Through sheer force of will I made those activities work, so much so that volunteer Alex convinced some early Foundation staff that there was a working model of “just training enthusiastic organizers like Alex”.

Nearly every day, I encounter this same will and enthusiasm while organizing campaigns like #WikiForHumanRights, joining the outings of my local affiliates, and communicating with, coaching and mentoring organizers throughout the movement — as part of the Campaigns Programs team at the foundation. And just as frequently, I find that this sometimes misdirected energy can leave organizers either dispirited or exhausted. 

A flier for our upcoming #WikiForHumanRights campaign (learn more on diff): a campaign at the intersection of environmental issues and human rights. Learning how to run campaigns has really changed how I think about who we can recruit to our community.

As Wikimedians, our love for the process of documenting the world also creates an infectious enthusiasm and care for showing others how to do the same. But the fact is: most people don’t share our enthusiasm for public knowledge, or documenting the world, or learning how we do it (on a Wiki). It takes more than a beautiful vision to inspire; it requires alignment to our purpose and actions, and invitation and care in bringing newcomers along with us.

“Anyone can edit” is a software setting, not how you sustain a community

At the heart of my early enthusiasm was a complete belief in “Anyone can edit.” I looked at my University’s library, saw how much knowledge was not on Wikipedia, and thought “if only they knew what power filling these knowledge gaps could have, everyone would want to edit”.  Obviously that wasn’t the case.

It’s important to remember how we got to “anyone can edit”. The tagline comes from our roots in the early internet and the open source software community. Anyone can edit is a theory (i.e. there could be editors, because the software lets them edit), but in practice this is rarely the case. Editing requires both a number of interactions with our platforms to go right, and the motivation, knowledge and access to time to do so.

The instructions prepared by my colleague Felix Nartey for the #1lib1ref campaign. Even though editing a page is turned on, there are a lot of steps between clicking edit and successful contribution.

There’s an old saying in the movement that “Wikipedians are born, not made”. These born Wikimedians are a rather rare bunch — they are often those who self-select because they have the skills to use our software, research an encyclopedia and belong in the social environment that arises from this process. This, in many ways, echoes certain parts of the open source communities which created meritocracy cultures by focusing on the communities they already had; they often became tragically misaligned cultures of exclusion (for example).

The New Editors Experiences research completed by the Wikimedia Foundation in 2017−18 provided a fairly robust examination of how we made it hard for motivated participants. And the subsequent work of the Growth team at the Foundation has created some impressive, incremental steps in making it easy for “anyone who clicks edit” or “anyone who creates an account” to be successful. Now, instead of the vacant stare of a blank wikitext page welcoming you after you create an account, you are actually welcomed, given a choose-your-own-adventure path to participating, and soon you will be told how your work matters.

But that doesn’t mean that everyone will see the opportunity to edit and create an account. Only a tiny portion of the global population sees the edit button as an invitation and is willing to invest time and energy in the centuries-long project we have invested in: the sum of all human knowledge. The growth features are useful to most new editors if they click edit, but were designed to most deeply facilitate participation for two of the six fictional design research personas developed during the New Editor Experiences research: the Reactive Corrector and the Knowledge Sharer. For them, the edit button is a sufficient invitation for their personal missions, and their purpose can be facilitated by a wonderful set of tools. But the other types of new editors don’t always imagine themselves just clicking edit to begin with. 

The six fictional personas identified by the New Editors Research in 2017. Two of the personas are commonly identified when discussing “Born not made” Wikipedians, the reactive corrector and knowledge sharer. But how do we put invitations out to other potential contributors?

If you found your way into the movement organically through editing, there is a good chance that part of your personality and motivation is similar to these two fictional personas prioritized by the Growth team.  For example, teach someone who is an instinctive Knowledge Sharer how to edit Wikidata, and they will be editing for years to come; or invite a Reactive Corrector to WPWP, and you might get 1000 new images on Wikipedia articles. And, though we continue to grow in many parts of the world through natural growth from these kinds of newcomers, the editing communities are starting to slow their replacement rate. 

Most people don’t share the same compulsion to edit as our existing communities — they could be inspired by the work of the reactive corrector and knowledge sharer, but the sheer persistence to document the world is not their thing. Each of the four other personas from the research needs something else beyond better tools to motivate their participation in our projects: the Audience Builder, a financial or self-promotion goal; the Box Checker, an outside requirement such as a school assignment; the Social Changer, a vision for how knowledge changes the world; and the Joiner Inner, a community to join.

How do we shift focus to “anyone who shares our vision will be able to join us”?

The Wikimedia movement’s vision imagines “a world in which every single human being can freely share in the sum of all knowledge.” This can feel as inspiring as it is broad. It mentions nothing of wikis, or of particular Wikimedian knowledge production processes, or even the internet. But in practice, our work as a movement happens on our platforms and in our communities, and is driven by very concrete socio-technical tools and norms. This is where Wikimedia’s 2030 Strategic Direction Movement Strategy gives us a clearer and more actionable mandate: “anyone who shares our vision will be able to join us”. Through this lens, we can more clearly see the limitations of “anyone can edit” as a call to action. In order for people who share our vision — of universal access to knowledge — to join us, they first must be able to see their own public knowledge missions within the work of Wikimedia. 

I would argue that, from a platform contribution perspective, this means we need to get better at inviting the two of the four remaining personas to whom we can provide a more deliberate invitation: the Social Changers and the Joiner Inners. I exclude the Box Checkers and the Audience Builders, because Wiki Education and other education programs have figured out great ways to engage the Box Checkers, and the general sentiment of the movement about the Audience Builders is mixed: if making benign edits, they can become fine parts of the community; but as a group they have a tendency to start by pushing their point of view in such a way that it may be more trouble than it is worth (and in a world of disinformation, this can have toxic results).

One of the problems for Social Changer and Joiner Inner personas though: they need a connection to our movement. These personas need people and/or spaces that help them achieve their personal mission. Fortunately we have a growing network of capable affiliates and organizers in the movement who can provide that space. These affiliates and organizers create partnerships, outreach activities, events, and learning opportunities for the parts of the public that may be familiar with Wikimedia, but haven’t yet thought to participate.

Organizers align some of our hardest to reach newcomers to our purpose and actions. By focusing on these newcomers with invitation and care, the newcomers feel supported and join us in our mission.

In 2019, we published the Movement Organizers Research to better understand organizers: Who are the facilitators that introduce new audiences to contributing? Where do they come from? How do we make sure that our movement makes it as easy for Organizers to contribute as it is to edit?

In Part II, I will focus on what we learned from the Movement Organizers research, and how organizers in the last few years have helped us reimagine what a welcoming invitation is for targeted participants, like Social Changers. 

Tech/News/2022/14

15:48, Tuesday, 5 April 2022 UTC

Other languages: Bahasa Indonesia, Deutsch, English, dagbanli, français, italiano, magyar, polski, português do Brasil, suomi, svenska, čeština, русский, українська, עברית, العربية, فارسی, ગુજરાતી, 中文, 日本語, ꯃꯤꯇꯩ ꯂꯣꯟ, 한국어

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Problems

  • For a few days last week, edits that were suggested to newcomers were not tagged in the Special:RecentChanges feed. This bug has been fixed. [1]

Changes later this week

  • The new version of MediaWiki will be on test wikis and MediaWiki.org from 5 April. It will be on non-Wikipedia wikis and some Wikipedias from 6 April. It will be on all wikis from 7 April (calendar).
  • Some wikis will be in read-only mode for a few minutes because of a switch of their main database. This will be performed on 7 April at 7:00 UTC (targeted wikis).

Future changes

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

Don’t Blink: Public Policy Snapshot for March 2022

14:38, Tuesday, 5 April 2022 UTC

In case you blinked, we’re happy to catch you up with legislative and regulatory developments around the world that shape people’s ability to participate in the free knowledge movement. 

Here are the most important developments that have preoccupied the Wikimedia Foundation’s Global Advocacy team.


US Legislative Developments

  • Sec. 230: As part of our work advocating for protections for online intermediaries like Wikipedia, our team participated in a symposium hosted by William and Mary Law School. The panel discussion about ‘Business Law’s Response to Emerging Cultural Issues’ covered how crucial Section 230 of the Communications Decency Act has been to the development of free expression online, as well as how the various proposals to amend it may negatively impact the Internet. Our policy specialist Kate Ruane highlighted how these proposed changes would impact nonprofit projects like Wikipedia, and other online services, that are distinct from the large social media companies often at the center of the debate surrounding proposed changes to Section 230. 
  • Journalism Competition & Preservation Act (JCPA): Wikimedia Small Projects hosted our team for an episode of the SuenaWiki podcast to discuss the challenges posed by the JCPA currently under consideration in the US Congress.

Latin America and the Caribbean

  • Argentina: The Foundation’s Global Advocacy and Legal teams collaborated to file an amicus brief in the Supreme Court of Argentina in a right to be forgotten case (Denegri v. Google Inc). The applicant, a public figure, is asking Google to delist their name from content related to their media past, which they wish to be forgotten. In our brief, we argue that the right to be forgotten should not apply in this case as doing so would be an obstacle to freedom of expression and the right to information.
  • Chile: Both international and Chilean groups, including Wikimedia Chile, have expressed concern over legislation under consideration in the Chilean Congress to regulate digital platforms. Our blog post highlights the shortcomings of the Bill, including ambiguous language, impractical content moderation requirements, and a lack of consideration of community-led platforms. The Bill has the potential to become a misguided influence on similar regulations throughout the region, if approved.

Asia

  • Bangladesh: Authorities are currently reviewing the recommendations for proposed regulations, which could potentially impact Wikimedia’s volunteer-driven model and impose a short timeline for content removal and excessive penalties. A coalition letter signed by the Wikimedia Foundation and sent to the Bangladesh Telecommunication Regulatory Commission on 7 March outlines the concerns of major international human rights and internet freedom groups with the proposed “Regulation for Digital, Social Media, and OTT Platforms.” The letter has received significant attention from Bangladesh’s major print and online media.

European Union

  • Digital Markets Act (DMA): On March 24, the three main EU bodies concluded negotiations over a common version of the DMA, an EU regulation that is intended to ensure a higher degree of competition among services on the internet by preventing large companies from abusing their power. The Free Knowledge Advocacy Group EU has been monitoring these developments and advocating for provisions that will enable free knowledge projects to thrive. An analysis of the practical consequences of the negotiation outcomes regarding interoperability of services can be found in their blog post.

Additional Developments

  • United Kingdom Online Safety Bill: The United Kingdom (UK) Government formally introduced its long-awaited Online Safety Bill on Thursday, March 17. Our Global Advocacy team published an initial assessment of what this Bill means for community-governed platforms like Wikipedia. The Bill attempts to hold internet platforms accountable for harmful content that is spread via their services, but the approach promoted in the UK Bill is misguided both in terms of the users it claims to protect and the platforms it supposedly holds accountable. Stay tuned for a deep-dive analysis of the Bill. 
  • The European Court of Human Rights has dismissed the Wikimedia Foundation’s 2019 petition to lift the block of Wikipedia in Turkey. The Court explained its decision on the grounds that the Turkish government had already restored access to Wikipedia in January 2020, and because the block was already determined to be a human rights violation in the Turkish Constitutional Court’s December 2019 ruling. The European Court of Human Rights’ decision comes at a time when access to knowledge continues to be under threat around the world. The Wikimedia Foundation will continue to defend the right of everyone to freely access and participate in knowledge. Learn more about the case and the current status of Turkish Wikipedia in the Foundation’s official statement.
  • World Intellectual Property Organization (WIPO): The Global Advocacy team has supported a group of Wikimedia chapters in applying for ad hoc observer status at the WIPO Standing Committee on Copyright and Related Rights. Observer status in this body will allow the Wikimedia Movement to have a voice in future discussions shaping copyright issues globally. The team has also been supporting interested affiliates to apply for permanent observer status at WIPO, which will allow them to participate in discussions on other intellectual property issues (e.g., traditional knowledge, climate change) that impact access to knowledge. Chapters we have helped apply for permanent and ad hoc observer status include those of Argentina, France, Germany, Italy, Mexico, South Africa, Sweden, and Switzerland.

To learn more about our team and the work we do, follow us on Twitter (@WikimediaPolicy) or sign up for the Wikimedia public policy mailing list. The team’s Meta page is under construction.

Outreachy report #30: March 2022

00:00, Tuesday, 5 April 2022 UTC

March was a tough month – my partner and I had dengue fever as we reviewed and processed initial applications – but we made it through. ✨ Team highlights: Sage developed new code to help us review and process school time commitments. Sage and I have been trying to develop strategies to review academic calendars quickly for years. We’ve gone from external notes to trying to gather data on specific schools and requesting initial application reviewers to assign students to us.

WikiCrowd at 50k answers

19:13, Monday, 4 April 2022 UTC

In January 2022 I published a new Wikimedia tool called WikiCrowd.

This tool allows people to answer simple questions to contribute edits to Wikimedia projects such as Wikimedia Commons and Wikidata.

It’s designed to be able to deal with a wide variety of questions, but due to time constraints the current questions cover aliases for Wikidata and depicts statements for Wikimedia Commons.

The tool has just surpassed 55k questions, 50k answers, 32k edits and 75 users.

Thanks to @pmgpmgpmgpmg (Twitter, Github) and @waldyrious (Twitter, Github) for their sustained contributions to the project, filing issues as well as contributing code and question definitions.

User Leaderboard

Though I haven’t implemented a leaderboard as part of the tool, the number of questions answered and the resulting edits are tracked in the backend.

Thus, of the 50k answers, we can take a look at who contributed to the crowd!

  1. PMG: 35,581 answers resulting in 21,084 edits at a 59% edit rate
  2. I dream of horses: 4543 answers resulting in 3184 edits at a 70% edit rate
  3. Tiefenschaerfe: 3749 answers resulting in 3207 edits at an 85% edit rate
  4. Addshore: 3049 answers resulting in 2133 edits at a 69% edit rate
  5. OutdoorAcorn: 708 answers resulting in 526 edits at a 74% edit rate
  6. Waldyrious: 443 answers resulting in 310 edits at a 69% edit rate
  7. Fences and windows: 409 answers resulting in 242 edits at a 59% edit rate
  8. Amazomagisto: 328 answers resulting in 211 edits at a 64% edit rate

Thanks to all of the 75 users that have given the tool a go in the past months.

Answer overview

  • Yes is the favourite answer with 32,192 occurrences
  • No comes second with 13,473 occurrences
  • And a total of 3,818 questions were skipped altogether

In the future skipped questions will likely be presented to a user a second time.

Question overview

Depicts questions have by far been the most popular, and also the easiest to generate more interesting groups of questions for.

  • 48,236 Depicts questions
  • 776 Alias questions
  • 471 Depicts refinement questions

The question mega groups were split into subgroups.

  • Depicts has had 45 different things that could be depicted
  • Aliases can be added from 3 different language Wikipedias
  • Depicts refinement has been used on 19 of the 45 depicted things

Question success rate

Some questions are harder than others, and some questions have better filtering in terms of candidate answers than others.

For this reason, I suspect that some questions will have a much higher success rate than others, and some will have more skips.

At a high level, the groups of questions have quite different yes rates.

  • Depicts: 65% yes, 27% no, 8% skip
  • Alias: 54% yes, 23% no, 21% skip
  • Depicts refinement: 95% yes, 2% no, 2% skip

If we take a deeper dive into the depict questions, we can probably see some depictions that are hard to spot or commons categories that possibly include a wider variety of media around a core subject.

An example of this would be categories for US presidents that also include whole categories for election campaigns, or demonstrations, neither of which would normally feature the president.

Depicts yes no skip
firework 95.99% 0% 4.01%
jet aircraft 95.19% 3.48% 1.33%
helicopter 89.50% 1.41% 9.09%
dog 87.70% 8.55% 3.76%
steam locomotive 85.24% 7.48% 7.28%
duck 83.35% 10.14% 6.51%
train 82.75% 10.66% 6.59%
hamburger 82.58% 5.63% 11.80%
candle 77.07% 16.67% 6.27%
house cat 74.26% 16.31% 9.43%
laptop 63.32% 27.36% 9.32%
bridge 61.36% 23.93% 14.71%
parachute 61.04% 20.22% 18.74%
camera 57.85% 39.86% 2.29%
electric toothbrush 48.79% 34.76% 16.45%
Barack Obama 28.29% 70.23% 1.49%
pie chart 21.13% 61.76% 17.11%
covered bridge 3.51% 79.61% 16.88%
Summary of depict questions (where over ~1000 questions exist) ordered by yes %

The % rate of yes answers could be used to decide the ease of questions, allowing some users to pick harder categories, or forcing new users to try easy questions first.
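As a purely hypothetical sketch (nothing like this exists in the tool yet), such a difficulty rating could be derived from the answer data already stored in the backend, in the same Laravel query-builder style as the queries shown later in this post. It assumes the answer column stores the strings 'yes', 'no' and 'skip', and the bucket thresholds are arbitrary:

// Hypothetical: bucket question groups by their yes rate; thresholds are arbitrary.
$groups = DB::table('questions')
    ->select(
        'question_groups.name',
        DB::raw("sum(answer = 'yes') as yes_answers"),
        DB::raw('count(*) as total_answers')
    )
    ->join('answers', 'answers.question_id', '=', 'questions.id')
    ->join('question_groups', 'questions.question_group_id', '=', 'question_groups.id')
    ->groupBy('question_groups.name')
    ->get();

foreach ($groups as $group) {
    $yesRate = $group->total_answers > 0 ? $group->yes_answers / $group->total_answers : 0;
    $difficulty = $yesRate >= 0.8 ? 'easy' : ($yesRate >= 0.5 ? 'medium' : 'hard');
    echo "{$group->name}: " . round($yesRate * 100) . "% yes ({$difficulty})\n";
}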

As question generation is tweaked, particularly for depicts questions where categories can be excluded, we should also see the yes % change over time. Slowly tuning question generation to get to an 80% yes range could be fun!

Of course, none of this is implemented yet ;)…

Queries behind this data

Just in case this needs to be generated again, here are the queries used.

For the user leaderboards…


DB::table('answers')
    ->select('username', DB::raw('count(*) as answers'))
    ->groupBy('username')
    ->orderBy('answers', 'desc')
    ->join('users','answers.user_id','=','users.id')
    ->limit(10)
    ->get();

DB::table('edits')
    ->select('username', DB::raw('count(*) as edits'))
    ->groupBy('username')
    ->orderBy('edits', 'desc')
    ->join('users','edits.user_id','=','users.id')
    ->limit(10)
    ->get();

And the question yes rate data came from the following query and a pivot table…


DB::table('questions')
    ->select('question_groups.name','answer',DB::raw('count(*) as counted'))
    ->join('answers','answers.question_id','=','questions.id', 'left outer')
    ->join('edits','edits.question_id','=','questions.id', 'left outer')
    ->join('question_groups','questions.question_group_id','=','question_groups.id')
    ->groupBy('question_groups.name','answer')
    ->orderBy('question_groups.name','desc')
    ->get();

Looking forward

Come and contribute code, issues, or ideas on the GitHub repo.

Next blog post at 100k? Or maybe now that there are cron jobs for question generation (people don’t have to wait for me) 250k is a more sensible next step.


Tech News issue #14, 2022 (April 4, 2022)

00:00, Monday, 4 April 2022 UTC
2022, week 14 (Monday 4 April 2022)

weeklyOSM 610

10:02, Sunday, 3 April 2022 UTC

22/03/2022-28/03/2022

lead picture

JOSM on a Steam Deck [1] © by Riiga licensed under CC BY-SA 4.0 | map data © OpenStreetMap contributors (ODbL) | JOSM: GPLv2 or later

Mapping campaigns

  • OSM Ireland’s building mapping campaign reached a significant milestone as reported by Amanda McCann.

Mapping

  • FasterTracker ponders (pt) > de the lack of a clear and immediate definition of the use of the network key in the context of public transport, taking the example of the AML (pt) > en, the Lisbon metropolitan area.
  • Minh Nguyen blogged about oddities of township boundaries in Ohio (and as both Minh and commenters point out, it is not just Ohio).
  • muchichka pointed out (uk) > en that providing information about the movement and deployment of military forces and relevant international aid is forbidden according to a recent amendment of the Ukrainian Criminal Code. The diary post title indicates that in muchichka’s interpretation this extends to any mapping of military facilities.
  • The following proposals are waiting for your comments:
    • Standardising the tagging of manufacturer:*=* and model:*=* of artificial elements.
    • Introducing quiet_hours=* to facilitate people looking for autism-friendly opening hours.
    • Clarifying the difference between surveillance:type=guard and office=security.
    • Adding loading dock details like dock:height=*, door:height=* or door:width=*.

Community

  • [1] @riiga#7118, on the OSM World Discord, showed JOSM running on their Steam Deck Game Console: ‘No matter the device, no matter the place: mapping first, and with JOSM of course’. Original post (Discord login required.)
  • Based on the OSM Community Index, Who Maps Where (WMW) allows one to search for a mapper with local knowledge anywhere in the world. If you’re okay with your area of local knowledge being shown on the map, the project’s README on GitHub describes how that works.
  • qeef shared his view that communication within the HOT Tasking Manager is wrong because it duplicates OSM functionality.

OpenStreetMap Foundation

  • Guillaume Rischard noted, on Twitter, a blog post from Fastly about how the OpenStreetMap Operations team is using the Fastly CDN to ‘provide updates in near real-time’.

Education

  • unen’s latest diary entry continued his reflections on the discussions at his weekly help desk sessions for the HOT Open Mapping Hub Asia-Pacific. He invited people to provide contact details to be informed of future discussion agendas. Issues from recent weeks included accessing older versions of OSM data, and participants’ problems with JOSM’s remote control feature.

Maps

  • Marcos Dione was dissatisfied with the appearance of hill shading in areas of Northern Europe. In his blog he explained how this is a result of the way hill shading is calculated using OSGeo tools.

updated

switch2OSM

  • PlayzinhoAgro wrote, in his blog, about adding public service points to address gender-based violence in Brazil. Volunteer lawyers and psychologists providing assistance are shown (pt) > en on a map.

Open Data

  • ITDP has published a recording of the webinar ‘Why Open Data Matters for Cycling’, available on the Trufi Association website.

Software

  • A new version of Organic Maps has been released for iOS and Android. Map data was updated and Wikipedia articles were added. As usual, the release also includes small bugfixes for routing, styles, and translations.
  • Anthon Khorev has released ‘osm-note-viewer’, an alternative to https://www.openstreetmap.org/user/username/notes, where one can have an overview of notes related to a user both as a list and on a map.

Releases

  • GNU/Linux.ch reported (de) > en on the new version of StreetComplete. The intuitive usability, even for OSM newbies, is highlighted.

Did you know …

Other “geo” things

  • @Pixel_Dailies (a Twitter account) challenges pixel artists with a new theme every day. On Monday the theme was bird’s eye view and most of the participants’ entries, which can be found through #BirdsEyeView, feature some kind of aerial map.
  • Valentin Socha tweeted screen captures from 1993 France weather reports, where weekly forecasts were shown on a cut-out map with a letter for each day.

Upcoming Events

Where | What | When
Tucson | State of the Map US | 2022-04-01 – 2022-04-03
Burgos | Evento OpenStreetMap Burgos (Spain) 2022 | 2022-04-01 – 2022-04-03
Região Geográfica Imediata de Teófilo Otoni | Mapathona na Cidade Nanuque – MG -Brasil – Edifícios, Estradas, Pontos de Interesses e Área Verde | 2022-04-02 – 2022-04-03
Bogotá Distrito Capital – Municipio | Notathon en OpenStreetMap – resolvamos notas de Latinoamérica | 2022-04-02
Ciudad de Guatemala | Segundo mapatón YouthMappers en Guatemala (remoto) | 2022-04-02 – 2022-04-03
Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-04
OSMF Engineering Working Group meeting | 2022-04-04
Bologna | Open Data Pax | 2022-04-04
Stuttgart | Stuttgarter Stammtisch | 2022-04-05
Greater London | Missing Maps London Mapathon | 2022-04-05
Berlin | OSM-Verkehrswende #34 (Online) | 2022-04-05
Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-06
Tasking Manager Collective Meet Up – Option 1 | 2022-04-06
Tasking Manager Collective Meet Up – Option 2 | 2022-04-06
Heidelberg | Heidelberg Int’l. Weeks Against Racism: Humanitarian Cartography and OpenStreetMap | 2022-04-06
Berlin | 166. Berlin-Brandenburg OpenStreetMap Stammtisch | 2022-04-08
OSM Africa April Mapathon: Map Kenya | 2022-04-09
Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-11
臺北市 | OpenStreetMap x Wikidata Taipei #39 | 2022-04-11
Washington | MappingDC Mappy Hour | 2022-04-13
San Jose | South Bay Map Night | 2022-04-13
20095 | Hamburger Mappertreffen | 2022-04-12
Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-13
Michigan | Michigan Meetup | 2022-04-14
OSM Utah Monthly Meetup | 2022-04-14
Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-18
150. Treffen des OSM-Stammtisches Bonn | 2022-04-19
City of Nottingham | OSM East Midlands/Nottingham meetup (online) | 2022-04-19
Lüneburg | Lüneburger Mappertreffen (online) | 2022-04-19
Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-20
Dublin | Irish Virtual Map and Chat | 2022-04-21

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, Nordpfeil, PierZen, SK53, Sammyhawkrad, Strubbl, TheSwavu, UNGSC_Alessia13, alesarrett, derFred.

Profiling a Wikibase item creation on test.wikidata.org

21:54, Saturday, 2 April 2022 UTC

Today I was in a Wikibase Stakeholder group call, and one of the discussions was around Wikibase importing speed, data loading, and the APIs. My previous blog post covering what happens when you make a new Wikibase item was raised, and we also got onto the topic of profiling.

So here comes another post looking at some of the internals of Wikibase, through the lens of profiling on test.wikidata.org.

The tools used to write this blog post are open source, and for Wikimedia infrastructure they are also publicly accessible. You can do similar profiling on your own Wikibase, or for requests that you suspect are slow on Wikimedia sites such as Wikidata.

Wikimedia Profiling

Profiling of Wikimedia sites is managed and maintained by the Wikimedia performance team. They have a blog, and one of the most recent posts covers profiling PHP at scale in production, so if you want to know the details of how this is achieved, give it a read.

Throughout this post I will be looking at data collected from a production Wikimedia request, by setting the X-Wikimedia-Debug header in my request. This header has a few options, and you can find the docs on wikitech.wikimedia.org. There are also browser extensions available to easily set this header on your requests.

I will be using the Wikimedia hosted XHGui to visualize the profile data. Wikimedia specific documentation for this interface also exists on wikitech.wikimedia.org. This interface contains a random set of profiled requests, as well as any requests that were specifically requested to be profiled.

Profiling PHP & MediaWiki

If you want to profile your own MediaWiki or Wikibase install, or PHP in general, then you should take a look at the mediawiki.org documentation page for this. You’ll likely want to use either Tideways or XDebug, but probably want to avoid having to setup any extra UI to visualize the data.

This profiling only covered the main PHP application (MediaWiki & Wikibase extension). Other services such as the query service would require separate profiling.

Making a profiled request

On test.wikidata I chose a not-so-random item (Q64), which happens to be a small version of the item for Berlin on Wikidata. It has a bunch of labels and a couple of statements.

I made a few modifications including removing the ID and changing all labels to avoid conflicts with the item that I had just copied and came up with some JSON ready to feed back into the API.

I navigated to the API sandbox for test.wikidata.org, and setup a request using wbeditentity which would allow me to create a fresh item. The options look something like this:

  • new = item
  • token = <Auto-fill the token using the UI button>
  • data = <json data that I am using to create an item>

With the XHGui option selected in the WikimediaDebug browser extension, I can hit the “Make request” button and should see my item created. The next page will also output the full runtime of the request from the client perspective, in this case roughly 3.6 seconds.
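For readers who want to reproduce this outside the API sandbox, here is a rough, hypothetical sketch of the same wbeditentity call using PHP's curl extension. It is not the exact request I made: the label in the data JSON is a placeholder, it assumes you already have an authenticated session stored in a cookie jar, and the commented-out line only marks where an X-Wikimedia-Debug header could go (see the wikitech documentation mentioned above for its exact format).

// Hypothetical sketch: create an item on test.wikidata.org via wbeditentity.
// Assumes an already authenticated session in $cookieJar; the item JSON is a placeholder.
$endpoint = 'https://test.wikidata.org/w/api.php';
$cookieJar = '/tmp/cookies.txt';

function apiPost(string $endpoint, array $params, string $cookieJar): array {
    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($params),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEJAR      => $cookieJar,
        CURLOPT_COOKIEFILE     => $cookieJar,
        // CURLOPT_HTTPHEADER  => ['X-Wikimedia-Debug: ...'], // to request profiling; see the wikitech docs for the value format
    ]);
    $raw = curl_exec($ch);
    curl_close($ch);
    return json_decode((string)$raw, true) ?? [];
}

// Fetch a CSRF token for the edit.
$token = apiPost($endpoint, [
    'action' => 'query', 'meta' => 'tokens', 'type' => 'csrf', 'format' => 'json',
], $cookieJar)['query']['tokens']['csrftoken'];

// Create the new item, mirroring the sandbox options above (new=item, token, data).
$result = apiPost($endpoint, [
    'action' => 'wbeditentity',
    'new'    => 'item',
    'token'  => $token,
    'data'   => '{"labels":{"en":{"language":"en","value":"Placeholder label for a profiling test"}}}',
    'format' => 'json',
], $cookieJar);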

Finding the request in XHGui

Opening up XHGui I should find the POST request that I just made to test.wikidata somewhere near the top of the list of profiled requests.

Clicking on the Time column, the details page of the profiled request will load. You can find my request, id 61fc06c1fe879940dbdf4a38 (archive URL just in case).

Profiling overview

There are lots of gotchas when it comes to reading a profile such as this:

  • The fact that profiling is happening will generally make everything run slower
  • Profiling tends to overestimate the cost of calling functions, so small functions called many times will appear to look worse than they actually are
  • When IO is involved, such as caching (if the cache is cold), database writes, relying on the internet, or external services, any number of things can cause individual functions to become inflated
  • It’s hard to know what any of it means, without knowing what the classes and methods are doing

Next, let’s look at some terms that are worth understanding:

  • Wall time: also called real-world time, is the actual time that a thing has taken to run. This includes things such as waiting for IO, or your CPU switching to low power mode.
  • CPU time: also called process time, is the amount of time the CPU actually spent processing instructions, excluding things such as time spent waiting for IO.
  • Self: also called exclusive, covers the resources spent in the function itself, excluding time spent in children.
  • Inclusive: covers the resources spent in the function, including all of its children.

You can read some more about different types of time and inclusivity in profiling on the Time docs for blackfire.io.

Reading the profile

The full wall time of the request is 5,266,796 µs, or 5.2 seconds. This is significantly more than we saw from the perspective of the client making the API request. This is primarily because of the extra processing that MediaWiki and Wikibase do after sending a response back to the user.

The full CPU time of the request is 3,543,361 µs, or 3.5 seconds. We can infer from this that the request included roughly 1.7 seconds of time not doing computations. This could be waiting for databases, or other IO.

We can find likely candidates for this 1.7 seconds of time spent not computing by looking at the top of the function breakdown for wall time, and comparing CPU time.

Method | Calls | Self Wall Time | Self CPU Time | Difference
Wikimedia\Rdbms\DatabaseMysqli::doQuery | 809 | 1,003,729 µs | 107,371 µs | ~0.9 s
GuzzleHttp\Handler\CurlHandler::__invoke | 1 | 371,120 µs | 2,140 µs | ~0.3 s
MultiHttpClient::runMultiCurl | 15 | 280,697 µs | 16,066 µs | ~0.25 s
Wikimedia\Rdbms\DatabaseMysqli::mysqlConnect | 45 | 68,183 µs | 15,229 µs | ~0.05 s

The 4 methods above have a combined difference between wall and CPU time of 1.5s, which accounts for most of the 1.7s we were looking for. The most expensive individual call here is actually the single call to GuzzleHttp\Handler\CurlHandler::__invoke, which spends 0.3s waiting, whereas the other methods are called many times each. On average, Wikimedia\Rdbms\DatabaseMysqli::doQuery only spends 0.001s per method call in this request.

GuzzleHttp\Handler\CurlHandler::__invoke

Let’s have a closer look at this GuzzleHttp\Handler\CurlHandler::__invoke call. We have a few options to see what is actually happening in this method call.

  1. Click on the method to see the details of the call, navigate up through the parents to find something that starts to make some sense
  2. Use the callgraph view (only shows methods that represent more than 1% of execution time)

I’ll choose number 2, and have included a screenshot of the very tall call graph for this method to the right.

At the top of this call we see MediaWiki\SyntaxHighlight\Pygmentize::highlight, which I was not expecting in such an API call.

Another level up we see WANObjectCache::fetchOrRegenerate which means that this was involved in a cache miss, and this data was regenerated.

Even further up the same tree I see SyntaxHighlight::onApiFormatHighlight.

This method is part of the SyntaxHighlight extension, and spends some time making the output of the API pretty for users in a web browser.

So what have I learnt here? Don’t profile with jsonfm. However using the API sandbox you don’t get this option, and thus bug T300909 was born.

Callgraph overview

Having the callgraph open we can see some of the most “expensive” methods in terms of inclusive wall time. You can also find these in the table view by sorting using the headings.

main() represents the bulk of the MediaWiki request (5.2s). This is split into ApiMain::execute taking ~3.4 seconds, and MediaWiki::doPostOutputShutdown taking ~1.7 seconds.

ApiMain::execute

This is where the “magic happens” so to speak. ~3.4 seconds of execution time.

The first bit of Wikibase code you will see in this call graph path is Wikibase\Repo\Api\ModifyEntity::execute. This is the main execute method in the base class that is used by the API that we are calling. Moving to this Wikibase code we also lose another ~0.4 seconds due to my syntax highlighting issue that we can ignore.

Taking a look at the next level of methods in the order they run (roughly) we see most of the execution time.

Method | Inclusive Wall Time | Description
Wikibase\Repo\Api\ModifyEntity::loadEntityFromSavingHelper | ~0.2 seconds | Loads the entity (if it exists) that is being edited
Wikibase\Repo\Api\EditEntity::getChangeOp | ~0.6 seconds | Takes your API input and turns it into ChangeOp objects (previous post)
Wikibase\Repo\Api\ModifyEntity::checkPermissions | ~0.3 seconds | Checks the user's permissions to perform the action
Wikibase\Repo\Api\EditEntity::modifyEntity | ~1.8 seconds | Takes the ChangeOp objects and applies them to an Entity (previous post)
Wikibase\Repo\Api\EntitySavingHelper::attemptSaveEntity | ~0.4 seconds | Takes the Entity and persists it in the SQL database

In the context of the Wikibase stakeholder group call I was in today, which was about initial import speeds and general editing speeds, what could I say about this?

  • Why spend 0.3 seconds of an API call checking permissions? Perhaps you are doing your initial import in a rather “safe” environment. Perhaps you don’t care about all of the permissions that are checked?
  • Permissions are currently checked in 3 places for this call: 1) upfront, 2) if we need to create a new item, and 3) just before saving. In total this makes up ~0.6 seconds according to the profiling.
  • Putting the formed PHP Item object into the database actually only takes ~0.15 seconds.
  • Checking the uniqueness of labels and descriptions takes up ~1.2 seconds of the ChangeOp validation. Perhaps you don’t want that?

MediaWiki::doPostOutputShutdown

This is some of the last code to run as part of a request.

The name implies it, but to be clear, this PostOutputShutdown method runs after the response has been sent to the user. Taking a look back at the user-perceived time of 3.6 seconds, we can see that the wall time of the whole request (5.2s) minus this post output shutdown (1.7s) is roughly 3.5 seconds.

In relation to my previous post, from the point of view of Wikibase this is when most secondary data updates will happen. Some POSTSEND derived data updates also happen in this step.

Closing

As I stated in the call, Wikibase was created primarily with the use case of Wikidata in mind. There was never a “mass data load” stage for Wikidata requiring extremely high edit rates in order to import thousands or millions of items. Thus the interfaces and internals are not tailored to this use case, and optimizations or configurations that could be made have not been made.

I hope that this post will trigger some questions around expensive parts of the editing flow (in terms of time) and also springboard more folks into looking at profiling of either Wikidata and test.wikidata, or their own Wikibase installs.

For your specific use case you may see some easy wins with what is outlined above. But remember that this post and specific profiling is only the tip of the iceberg, and there are many other areas to look at.

The post Profiling a Wikibase item creation on test.wikidata.org appeared first on addshore.

Altering a Gerrit change (git workflow)

21:54, Saturday, 02 2022 April UTC

I don’t use git-review for Gerrit interactions. This is primarily because back in 2012/2013 I couldn’t get git-review installed, and someone presented me with an alternative that worked. Years later I realized that this was actually the documented way of pushing changes to Gerrit.

As a little introduction to what this workflow looks like, and as a comparison with git-review, I have created two overview posts on altering a Gerrit change on the Wikimedia Gerrit install. I’m not trying to convince you that either way is better, merely to show the similarities and differences and what is happening behind the scenes.

Be sure to take a look at the other post, “Altering a Gerrit change (git-review workflow)”.

I’ll be taking a change from the middle of last year, rebasing it, making a change, and pushing it back for review. Fundamentally the two approaches do the same thing; one (git-review) just requires an external tool.

1) Rebase

Firstly I’ll rebase the change by clicking the “Rebase” button in the top right of the UI. (This step is entirely optional.)

This will create a second patchset on the change, automatically rebased on the master branch if possible (otherwise it will tell you to rebase locally).

2) Checkout

In order to checkout the change I’ll use the “Download” button on the right of the change near the changed files.

A dialogue will appear with a bunch of commands that I can copy depending on what I want to do.

As I want to alter the change in place, I’ll use the “Checkout” link.

This will fetch the ref/commit, and then check it out.
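The copied command pair looks roughly like this; the project, change number, and patchset below are placeholders rather than the real values from this change:

git fetch https://gerrit.wikimedia.org/r/<project> refs/changes/<NN>/<change-number>/<patchset>
git checkout FETCH_HEAD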

3) Change

I can now go ahead and make my change to the commit in my IDE.

The change is quite small and can be seen in the diff below.

Now I need to amend the commit that we fetched from gerrit.

If I want to change the commit message in some way I can run git commit --all --amend

If there is no need to change the commit message you can also pass the --no-edit option.

You’ll notice that we are still in a detached state, but that doesn’t matter too much, as the next step is pushing to gerrit, and once that has happened we don’t need to worry about this commit locally.
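Put together, the amend step looks roughly like this:

# Amend the checked-out commit in place, keeping the commit message as-is.
git commit --all --amend --no-edit
# The Change-Id footer in the commit message is what lets Gerrit attach the new
# patchset to the existing change, so make sure it survives any message edits.
git log -1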

4) Push

In order to submit the altered commit back to gerrit, you can just run the following command


git push origin HEAD:refs/for/master

The response of the push will let you know what has happened, and includes a link back to the change.

A third patchset now exists on the change on Gerrit.

Overview

The whole process looks something like this.

Visualization created with https://git-school.github.io/
  1. A commit already exists on Gerrit that is currently up for review
  2. Clicking the rebase button will rebase this commit on top of the HEAD of the branch
  3. Fetching the commit will bring that commit on to your local machine, where you can now check it out
  4. Making a change and amending the commit will create a new commit locally
  5. You can then push this altered commit back to gerrit for review

If you want to know more about what Gerrit is doing, you can read the docs on the “gritty details”

Git aliases

You can use a couple of git aliases to avoid some of these slightly long commands


alias.amm=commit -a --amend
alias.amn=commit -a --amend --no-edit
alias.p=!f() { git push origin HEAD:refs/for/master; }; f

And you can level these up to provide you with a little more flexibility


alias.amm=commit -a --amend
alias.amn=commit -a --amend --no-edit
alias.main=!git symbolic-ref refs/remotes/origin/HEAD | sed 's@^refs/remotes/origin/@@'
alias.p=!f() { git push origin HEAD:refs/for/$(git main)%ready; }; f
alias.pd=!f() { git push origin HEAD:refs/for/$(git main)%wip; }; f
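If you would rather not edit your gitconfig by hand, the simpler set can also be defined from the command line, for example:

git config --global alias.amm 'commit -a --amend'
git config --global alias.amn 'commit -a --amend --no-edit'
git config --global alias.p '!f() { git push origin HEAD:refs/for/master; }; f'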

You can read more about my git aliases in a previous post.

The post Altering a Gerrit change (git workflow) appeared first on addshore.

Tech/News/2022/13

09:22, Friday, 01 2022 April UTC

Other languages: Bahasa Indonesia, Deutsch, English, français, italiano, polski, suomi, čeština, русский, українська, עברית, العربية, فارسی, ไทย, 中文, 日本語, ꯃꯤꯇꯩ ꯂꯣꯟ

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

  • There is a simple new Wikimedia Commons upload tool available for macOS users, Sunflower.

Changes later this week

  • The new version of MediaWiki will be on test wikis and MediaWiki.org from 29 March. It will be on non-Wikipedia wikis and some Wikipedias from 30 March. It will be on all wikis from 31 March (calendar).
  • Some wikis will be in read-only for a few minutes because of regular database maintenance. It will be performed on 29 March at 7:00 UTC (targeted wikis) and on 31 March at 7:00 UTC (targeted wikis). [1][2]

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

Announcing www.wikimediastatus.net

08:40, Thursday, 31 2022 March UTC

The Site Reliability Engineering (SRE) Team is pleased to announce that we’ve launched a new status page that we’ll update during major outages to Wikipedia and other Wikimedia projects.  You can find it at www.wikimediastatus.net.

By “major outages” we mean problems so severe that the general public or the media might notice – issues like wikis being very slow or unreachable for many users.  The status page will definitely be useful for the editor community and others directly involved in the projects, but it won’t be replacing forums for in-depth discussion like Technical Village Pumps or Phabricator – rather, it will supplement them, particularly as a place to check when the wikis are unreachable for you.

Of course, timeliness of information is really important during any large disruption to our services.  A key feature of the new status page is a set of five high-level metrics that give a look into the overall health and performance of the wiki environment.  We wanted a set of indicators that would show widespread issues as obvious deviations from normal, so that non-technical people could look at the graphs and say “ah, yes, something is wrong”.  Automatically publishing these metrics means that users can have some idea that something is wrong for everyone, not just themselves, even before SRE has had a chance to post an update.

The rate of errors served by Wikimedia, during and then just after an outage.

Wikimedia previously offered a status page, but it was difficult to read and sometimes inaccurate.  The SRE team officially sunset it in 2018.  We’re pleased to re-launch a status page that we think is easy to interpret by both technical and non-technical folks, and that we’re committing to keep accurate and up-to-date.

Since we didn’t want to use any of our existing hosting infrastructure for the status page – as the entire point is that it must remain accessible when our servers or network connections are broken – we’re using an externally-hosted commercial product.  Do note that the Atlassian privacy policy applies when visiting the status page.

If you’re seeking more background on the project, or curious about technical decisions and implementation details, the project page on Wikitech is a good place to start. There’s also the Phabricator task for the project, which is not only a good place to learn more but also to offer any feedback. We’ll also be checking in on our team’s Talk page from time to time, and of course we’re reachable at our usual team hangout on IRC, #wikimedia-sre on Libera.chat.

Get involved with #WikiForHumanRights 2022 by joining a live launch webinar on 14 April; participating in a month-long Wikipedia writing challenge; attending global events; and more.

Image: Adapted illustration by Jasmina El Bouamraoui and Karabo Poppy Moletsane for Wikipedia 20; photo by Neil Palmer (CIAT). Khat leaves. (CC BY-SA 2.0).

We are excited for the launch of the 2022 #WikiForHumanRights campaign, which calls on volunteers to write and improve content on Wikipedia about human rights, environmental health, and the range of communities impacted by environmental issues around the world.  The campaign will run from 14 April – 30 June.

For the first time last year, the UN officially recognized that having a clean, healthy, and sustainable environment is a human right. This key decision highlights how vital it is that people have access to information, including that found on Wikipedia, that helps them better understand their rights, and how to guard them. 

It also comes at a time when our planet is facing what the UN calls a “triple planetary crisis” of climate change, biodiversity loss, and pollution. Access to neutral, fact-based, and current information about climate change topics plays a critical role in our ability to not only understand these interconnected crises, but to mitigate their causes and adapt society to ensure a healthy future for all.  In turn, the role of Wikipedia has never mattered more. 

You can help respond to this global crisis by joining the #WikiForHumanRights campaign. We invite you to the following activities to learn more about the connection between human rights and the environmental crises, and how to share this information on Wikipedia.

Join the Launch Webinar!

 

On the 14th of April 14:00 UTC, we invite you to join the live event which will mark the launch of this year’s #WikiForHumanRights campaign.  

For the third time, the Wikimedia Foundation, in conjunction with the  United Nations Environment Programme (UNEP) and the United Nations Human Rights Office (UNOHCHR), will be hosting a special live conversation on how Wikipedia and other public knowledge play a role in understanding the human rights impacts of the triple planetary crisis, of biodiversity loss, climate change and pollution. 

We will have a guest panel of experts working with the UN system to advance the Right to a Healthy Environment. This will include the Special Rapporteur on Toxics and Human Rights, Dr. Marcos A. Orellana; UN Assistant Secretary General and Executive Secretary, Secretariat of the Convention on Biological Diversity Elizabeth Mrema; Women4Biodiversity director Mrinalini Rai; and youth activist Alejandro Daly. To join the event, register to participate here.

During the event, we will discuss the role of open knowledge platforms like Wikipedia in addressing the environmental crises. Panelists will share thoughts on how to make the connection between human rights and the environment more clear and accessible to the general public. 

The event will include live translations in Spanish, Portuguese, French, Chinese, and Arabic.  

Join the Global Writing Challenge

From 15 April to 15 May 2022, we are calling on Wikimedians everywhere to join the one-month global writing challenge. The challenge is aimed at bridging content gaps on human rights, environmental health, and the diverse people affected by the convergent environmental crises of climate change, pollution, and biodiversity loss. 

Join Community Events

As part of the campaign, there will be a myriad of synchronous activities and events happening across different regions. Events like webinars, edit-a-thons, workshops, and local writing contests are taking place around the world, and we encourage you to find an event to join.

Earth Day

In commemoration of Earth Day, we will be hosting an editing workshop on 22 April at 15:00 UTC which will focus deeply on environment and human rights topic areas as we train participants on how to contribute to these topics on Wikipedia.

The impact of the environmental crises cannot be overstated. We all have a responsibility to ensure that everyone has access to neutral, fact-based, and current information about our shared Right to a Healthy Environment. Be part of this event by joining the Zoom call.

Become part of the Human Rights Interest Group by signing up here!

The Human Rights Interest Group (HRIG) is a new group consisting of Wikimedians, individuals, and affiliated organizations interested in human rights as they pertain to Wikimedia projects. It is a movement-wide initiative aiming to create a safe space wherein the human rights concerns of our diverse communities can be heard, discussed, and addressed. 

The HRIG also aims to support the sharing of notable, reliable knowledge regarding human rights ideas, challenges, movements, and actors so our readers are informed of the state of human rights around the globe and in their own part of the world. The mission of the HRIG is to create an equitable and global approach to human rights in Wikimedia projects. Become a member by signing up here. You can read more about the group via meta

Share the story and learn more!  

Follow us on Twitter @Wikipedia, @Wikimedia, and @WikiSusDev or join the WikiForHumanRights Telegram Channel for event details and updates as the campaign continues through the 30th of June 2022, and check back for updates on the event page. Use the campaign hashtag #WikiForHumanRights to spread the word. You can also write to [email protected] if you have questions.

Let’s talk about relationships — nothing gossip-y — but, rather, how does one thing relate to something else? On Wikidata we talk about relationships using something called properties. Part of the semantic triple (subject, predicate, object — or in Wikidata parlance, item, property, value), properties define how one thing relates to another on Wikidata. Is it a date? A name? A location? An image? An identifier. Here’s an example: for those in the northern hemisphere, we may be thankful that this post is being published as spring (Q1312) follows (P155) winter (Q1311). In that sentence ‘follows’ is the property that explains a relationship between ‘winter’ and ‘spring.’ The Wikidata community uses properties to define any kind of relationship between things. How many properties are there? I’m glad you asked.

As of March 2022, there are around 10,000 properties on Wikidata. Roughly 7,000 of these are external identifier properties (external identifier properties correspond to external collections — museums and libraries — whose collection includes a person, place or concept that also exists in Wikidata). That leaves around 3,000 properties the community uses to describe everything. You can read the discussion page of any property to orient yourself to that property, but there are other ways to understand how properties work too. Knowing where to start with those can be a little overwhelming. This post will profile properties about properties. If that sounds confusing, I get it! I’ll provide plenty of examples to contextualize everything and help you better understand how properties work.

Let’s learn through examples. As you discover properties, wouldn’t it be wonderful if there were a way to see the property in action to know if you were using it correctly? I have good news for you: there IS a property that does this. It’s called Wikidata Property Example (P1855 for super-fans). Click that link, and read all about property examples, including links to queries where you can see thousands of properties — with examples — in the wild on Wikidata. To review: there is a property on Wikidata that exists to give you examples of properties and how they work. Can you focus the query on a specific property? Yes. Can you get multiple examples for one query? Yes. Does the example I shared list all properties with examples? Yes! Is this one of the best ways you can use properties like a pro? Absolutely.

Now that you’re familiar with one way to learn how a property works, consider this: maybe the dataset you are working with requires you to describe an inverse relationship — or something that is the opposite of something else. If only there were a property that could express an inverse relationship! Well, today is your lucky day, because there is a property called inverse property (P1696) that does exactly that. Please note, and this is very important, that this property describes other properties on Wikidata whose relationship is inverse to each other. For example, follows (P155) and its counterpart followed by are linked to each other through inverse property. Another example would be family relationships: the parent properties (mother/father) and the child property are inverses of each other.

If you’re not talking about relationships (properties), but rather items — concepts, people, places — there is a completely different property called opposite of (P461) that the community uses to describe conceptual opposites. What’s a conceptual opposite? You can think of it this way: the opposite of the color white is the color black. The opposite of summer is winter. It’s okay if it’s a little confusing. Examples will help distinguish these two. To review: an inverse property is used exclusively with relationships — child/parent, capital/capital of, officeholder/position held, owner of/owned by. Another property, “opposite of”, is used exclusively to describe opposing concepts. Both of these properties are great for distinguishing related things on Wikidata. Let’s move on to another distinguished property.

You are nearly a property pro. You’re feeling confident, you understand how these descriptive connections relate to each other. The world is your oyster and you want to describe more things with more properties, more accuracy, and more precision. I love the enthusiasm. There’s a property that can help you do this: it suggests related properties on Wikidata. It’s called — you guessed it — related property (P1659). You can use this property to see other properties related to the one you are wondering about. You can think of it as a “see also” recommendation for properties. There are MANY location-type properties on Wikidata. Suppose you want to know all of the properties related to P131, which describes where things are geographically located. You could use “related properties” in a query to get a list: just like this! You can think of this property as a way to reveal how properties are related to similar properties. Using this property will help make you a super-describer on Wikidata. There’s nothing you can’t describe now!

These three properties (well, four) should reveal more about how to describe anything on Wikidata. Learning how to use properties on Wikidata is essential for maintaining data quality and the usefulness of the data. It is also one of the most effective ways to learn how to query and write better queries. The more familiar you are with properties, the more you will get out of Wikidata (and likely any other dataset you’re working with, whether it’s part of Wikidata or not). Now that you know more about properties on Wikidata, consider these two things:

  1. Wikidata will always require new properties. If one is missing, you can propose it here. Properties also change over time. If an existing property isn’t working for you (or has never worked for you), you can propose changes on the property’s discussion page. The only way Wikidata will ever be an equitable resource is if property usage and definitions work for all kinds of data and relationships in the world.
  2. The properties I’ve shared with you in this post themselves are incomplete. The community could always use more examples, better definitions, and other ways of describing things. Adding statements to items and properties is a very important way you can help improve these resources.

Stay tuned for more Wikidata property exploration posts here. And if you want to learn more, take the Wikidata Institute course I teach!

Benchmarking MediaWiki with PHPBench

12:14, Wednesday, 30 2022 March UTC

This post gives a quick introduction to a benchmarking tool, phpbench, ready for you to experiment with in core and skins/extensions.[1]

What is phpbench?

From their documentation:

PHPBench is a benchmark runner for PHP analogous to PHPUnit but for performance rather than correctness.

In other words, while a PHPUnit test will tell you if your code behaves a certain way given a certain set of inputs, a PHPBench benchmark only cares how long that same piece of code takes to execute.

The tooling and boilerplate will be familiar to you if you've used PHPUnit. There's a command-line runner at vendor/bin/phpbench, benchmarks are discoverable by default in tests/Benchmark, a configuration file (benchmark.json) allows for setting defaults across all benchmarks, and the benchmark classes and tests look pretty similar to PHPUnit tests.

Here's an example test for the Html::openElement() function:

namespace MediaWiki\Tests\Benchmark;

class HtmlBench {

        /**
        * @Assert("mode(variant.time.avg) < 85 microseconds +/- 10%")
        */
        public function benchHtmlOpenElement() {
                \Html::openElement( 'a', [ 'class' => 'foo' ] );
        }
}

So, taking it line by line:

  • class HtmlBench (placed in tests/Benchmark/includes/HtmlBench.php) – the class where you can define the benchmarks for methods in a class. It would make sense to create a single benchmark class for a single class under test, just like with PHPUnit.
  • public function benchHtmlOpenElement() {} – method names that begin with bench will be executed by phpbench; other methods can be used for set-up / teardown work. The contents of the method are benchmarked, so any set-up / teardown work should be done elsewhere.
  • @Assert("mode(variant.time.avg) < 85 microseconds +/- 10%") – we define a phpbench assertion that the average execution time will be less than 85 microseconds, with a tolerance of +/- 10%.

If we run the test with composer phpbench, we will see that the test passes. One thing to be careful with, though, is adding assertions that are too strict – you would not want a patch to fail CI because the assertion for execution was not flexible enough (more on this later on).
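As the footnote output further down suggests, composer phpbench is a shortcut for the phpbench runner, so you can also invoke it directly and point it at a single benchmark file:

vendor/bin/phpbench run --config=tests/Benchmark/phpbench.json --report=aggregate tests/Benchmark/includes/HtmlBench.php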

Measuring performance while developing

One neat feature in PHPBench is the ability to tag current results and compare them with another run. Looking at the HtmlBench benchmark from above, for example, we can compare the work done in rMW5deb6a2a4546: Html::openElement() micro-optimisations to get before and after comparisons of the performance changes.

Here's a benchmark of e82c5e52d50a9afd67045f984dc3fb84e2daef44, the commit before the performance improvements added to Html::openElement() in rMW5deb6a2a4546: Html::openElement() micro-optimisations

❯ git checkout -b html-before-optimizations e82c5e52d50a9afd67045f984dc3fb84e2daef44 # get the old HTML::openElement code before optimizations
❯ git review -x 727429 # get the core patch which introduces phpbench support
❯ composer phpbench -- tests/Benchmark/includes/HtmlBench.php --tag=original

And the output [2]:

Note that we've used --tag=original to store the results. Now we can check out the newer code, and use --ref=original to compare with the baseline:

❯ git checkout -b html-after-optimizations 5deb6a2a4546318d1fa94ad8c3fa54e9eb8fc67c # get the new HTML::openElement code with optimizations
❯ git review -x 727429 # get the core patch which introduces phpbench support
❯ composer phpbench -- tests/Benchmark/includes/HtmlBench.php --ref=original --report=aggregate

And the output [3]:

We can see that the execution time roughly halved, from 18 microseconds to 8 microseconds. (For understanding the other columns in the report, it's best to read through the Quick Start guide for phpbench.) PHPBench can also provide an error exit code if the performance decreased. One way that PHPBench might fit into our testing stack would be to have a job similar to Fresnel, where a non-voting comment on a patch alerts developers whether the PHPBench performance decreased in the patch.

Testing with extensions

A slightly more complex example is available in GrowthExperiments (patch). That patch makes use of setUp/tearDown methods to prepopulate the database entries needed for the code being benchmarked:

/**
 * @BeforeMethods("setUpLinkRecommendation")
 * @AfterMethods("tearDownLinkRecommendation")
 * @Assert("mode(variant.time.avg) < 20000 microseconds +/- 10%")
 */
public function benchFilter() {
        $this->linkRecommendationFilter->filter( $this->tasks );
}

The setUpLinkRecommendation and tearDownLinkRecommendation methods have access to MediaWikiServices, and generally you can do similar things you'd do in an integration test to setup and teardown the environment. This test is towards the opposite end of the spectrum from the core test discussed above which looks at Html::openElement(); here, the goal is to look at a higher level function that involves database queries and interacting with MediaWiki services.

What's next

You can experiment with the tooling and see if it is useful to you. Some open questions:

  • do we want to use phpbench? or are the scripts in maintenance/benchmarks already sufficient for our benchmarking needs?
  • we already have benchmarking tools in maintenance/benchmarks that extend a Benchmarker class; would it make sense to convert these to use phpbench?
  • what are sensible defaults for "revs" and "iterations" as well as retry thresholds?
  • do we want to run phpbench assertions in CI?
    • if yes, do we want assertions using absolute times (e.g. "this function should take less than 20 ms") or relative assertions ("patch code is within +/- 10% of old code")?
    • if yes, do we want to aggregate reports over time, so we can see trends for the code we benchmark?
    • should we disable phpbench as part of the standard set of tests run by Quibble, and only have it run as a non-voting job like Fresnel?

Looking forward to your feedback! [4]


[1] thank you, @hashar, for working with me to include this in Quibble and roll out to CI to help with evaluation!

[2]

> phpbench run --config=tests/Benchmark/phpbench.json --report=aggregate 'tests/Benchmark/includes/HtmlBench.php' '--tag=original'
PHPBench (1.1.2) running benchmarks...
with configuration file: /Users/kostajh/src/mediawiki/w/tests/Benchmark/phpbench.json
with PHP version 7.4.24, xdebug ✔, opcache ❌

\MediaWiki\Tests\Benchmark\HtmlBench

    benchHtmlOpenElement....................R1 I1 ✔ Mo18.514μs (±1.94%)

Subjects: 1, Assertions: 1, Failures: 0, Errors: 0
Storing results ... OK
Run: 1346543289c75373e513cc3b11fbf5215d8fb6d0
+-----------+----------------------+-----+------+-----+----------+----------+--------+
| benchmark | subject              | set | revs | its | mem_peak | mode     | rstdev |
+-----------+----------------------+-----+------+-----+----------+----------+--------+
| HtmlBench | benchHtmlOpenElement |     | 50   | 5   | 2.782mb  | 18.514μs | ±1.94% |
+-----------+----------------------+-----+------+-----+----------+----------+--------+

[3]

> phpbench run --config=tests/Benchmark/phpbench.json --report=aggregate 'tests/Benchmark/includes/HtmlBench.php' '--ref=original' '--report=aggregate'
PHPBench (1.1.2) running benchmarks...
with configuration file: /Users/kostajh/src/mediawiki/w/tests/Benchmark/phpbench.json
with PHP version 7.4.24, xdebug ✔, opcache ❌
comparing [actual vs. original]

\MediaWiki\Tests\Benchmark\HtmlBench

    benchHtmlOpenElement....................R5 I4 ✔ [Mo8.194μs vs. Mo18.514μs] -55.74% (±0.50%)

Subjects: 1, Assertions: 1, Failures: 0, Errors: 0
+-----------+----------------------+-----+------+-----+---------------+-----------------+----------------+
| benchmark | subject              | set | revs | its | mem_peak      | mode            | rstdev         |
+-----------+----------------------+-----+------+-----+---------------+-----------------+----------------+
| HtmlBench | benchHtmlOpenElement |     | 50   | 5   | 2.782mb 0.00% | 8.194μs -55.74% | ±0.50% -74.03% |
+-----------+----------------------+-----+------+-----+---------------+-----------------+----------------+

[4] Thanks to @zeljkofilipin for reviewing a draft of this post.

Tech/News/2022/12

08:38, Wednesday, 30 2022 March UTC

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available: Bahasa Indonesia, Deutsch, français, italiano, magyar, polski, português, português do Brasil, svenska, čeština, русский, українська, עברית, العربية, 中文, 日本語, 한국어

New code release schedule for this week

  • There will be four MediaWiki releases this week, instead of just one. This is an experiment which should lead to fewer problems and to faster feature updates. The releases will be on all wikis, at different times, on Monday, Tuesday, and Wednesday. You can read more about this project.

Recent changes

Future changes

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe on wiki.

On Thursday March 24th the trilogue negotiators concluded discussions, dramatic at times, over the Digital Markets Act. The compromise includes some gains on interoperability, a potential game changer in online intermediation. What to expect? Where not to hold your breath? We parse out the practical consequences of the trilogue outcome on interoperability.

Winding road to the final compromise

Interoperability has been a point of contention since the European Commission published their first draft in December 2020. The EC drafted it narrowly, obligating gatekeepers to offer interoperability to the so-called ancillary services, like payment or identification services, that wish to operate within closed ecosystems. IMCO Rapporteur MEP Andreas Schwab followed this approach in his draft report. 

That didn’t go down well with many MEPs, who were disappointed that an opportunity to open up the walled gardens of online intermediation had not been exploited. Many amendments and heated debates later, the final EP report provided that interconnection should also be possible between messaging apps and services (the so-called number-independent interpersonal communication services) as well as social networks.

Since the Council’s approach was focused on refining the business-to-business side of interoperability, the trilogues didn’t show much promise in securing the extension of the EC’s scope. Somehow, under pressure of time the delegation of MEPs managed to negotiate some gains that keep the spirit if not the letter of the EP mandate.

Basic rules of becoming interoperable under DMA

As originally devised, the final DMA compromise envisions that only services designated as gatekeepers will be obliged to create conditions for interoperability with other services. This possibility will be, however, accessible on request – meaning that there won’t be any obligation to make a given service permanently and publicly accessible. A gatekeeper will have 3 months to “render requested basic functionalities operational”. 

Contrary to the original proposal by the European Commission, the compromise includes a definition of the functionality that enables the opening up of digital ecosystems that have so far been closed:

‘Interoperability’ means the ability to exchange information and mutually use the information which has been exchanged through interfaces or other solutions, so that all elements of hardware or software work with other hardware and software and with users in all the ways in which they are intended to function.

The definition is pretty straightforward and covers potential applications of frictionless communication exchange broadly, relating to both hardware and software. It refers to both the provisions already outlined by the European Commission in the original draft and to those worked out during the trilogues. The latter, as explained below in more detail, is an improvement as it encompasses some services that are then accessible to individual users and groups of individuals (the so-called end users).

End users will be able to freely decide whether they want to make use of the interconnected services or rather stay with the provider they had originally chosen. A service that wants to connect with a gatekeeper will need to do so within the same level of security. This means that if a gatekeeper offers end-to-end encryption, a connecting service will also need to provide it.

Messaging

End-to-end text messaging between two end users will be one of the basic functionalities that will become interoperable on request. Within two years after designation, the gatekeeper will also need to make available text messaging within groups of individual users. 

Similarly, sharing of images, voice messages and video attachments between two individuals will be the key available function that, after two years from becoming a gatekeeper, will need to be extended to groups.  

Calling

Voice and video calls will not be immediately available after gatekeepers are designated. They will have 4 years to create technical and operational conditions to make end-to-end video or voice calls available between two individuals and groups. 

Social networking? Maybe…

Social networking should also be one of the functionalities that gatekeepers should make interoperable, but the negotiators were not keen on agreeing to proposals made by the European Parliament team. The obligation for gatekeepers who offer social networking services did not make it into the final text. 

Fortunately, the DMA has a revision clause that binds the European Commission to evaluate the regulation and report to the European Parliament and the Council of the EU. The negotiators agreed to include in the revision clause an assessment of whether social networking services should be included in the scope of the interoperability provisions. So there is no promise, but at least the EC has to look into the issue again and produce some evidence for – or against – extending the scope.

The art of war compromise

The negotiations over interoperability were indeed dramatic. Apparently the French Presidency was unsure of its mandate from the Council to negotiate extended interoperability provisions and hesitated to negotiate beyond what the Council had included in their draft. Even worse, the European Commission authored a non-paper full of simplified claims pointing at how interoperability is not a feasible solution either for messaging or social networking.


Fortunately for the DMA, the negotiations over DSA were dragging. It became apparent that despite bold promises to deliver the outcome on the two regulations, the French Presidency won’t be able to assign two successes to its account. With the French Presidential elections looming, the incentive to wrap up what is possible to wrap up became greater. This was the chance for the Parliamentary negotiators to defend the mandate bestowed on them by the EP. 

Hence a result that goes along the demarcation line between what the EP wanted and what the Council agreed to give. Yes, end users will enjoy more interconnectivity, but only if service providers request it from the gatekeepers. Yes, private one-on-one messaging will be available first, via text and the sharing of images, audio and video attachments, but groups will need to wait two years to benefit from that. Yes, calling and video calling others will be possible, but within 4 years. Yes, social networking could become interoperable, but only if the European Commission sees it as necessary to ensure contestability – and that at the earliest 3 years after the regulation enters into force.

No doubt, the EP delegation fought hard and used the available opportunities to secure what they could regarding interoperability. Ideally it would go further and extend to social networking, but considering the pressure from the Council and the lobbying of Big Tech on the issue, we couldn’t realistically count on more.

Stay tuned for the analysis of other provisions of the Digital Markets Act as adopted by the trilogues negotiators!

My Home Assistant Music Cube

00:02, Monday, 28 2022 March UTC

Last year, I spent $17 on an Aqara cube, and it’s been one of my best purchases for enjoyment per dollar spent.

I control my multi-room audio using a gyroscopic gesture-recognition cube -- yes, this basically makes me Iron Man.

The Aqara cube is a three-inch square plastic cube that sends gestures over Zigbee to a cheap off-the-shelf dongle.

By pairing this cube with Home Assistant, I have a three-dimensional button with 45 unique interactions to control whatever I want.

And over the last six months, I’ve used it to control a small fleet of antiquated streaming devices to help me discover new music.

🎭 The Tragedy of the Logitech Squeezebox

The Logitech Squeezebox is a bygone streaming device that was too beautiful for this world. Logitech snuffed the Squeezebox in 2012.

But because others share my enthusiasm for Squeezeboxes, there’s still hope. The second-hand market persists. And there are wonderful nerds cobbling together Squeezeboxes from Raspberry Pis.

Logitech Squeezebox fans

I built a DIY Squeezebox from a Pi Zero Pimoroni PirateRadio kit and Squeezelite software.

I blanket my humble abode in music by combining a DIY PirateRadio, a Squeezebox Boom, and a Squeezebox Touch.

My Dockerized Logitech Media Server perfectly synchronizes these three devices. Music from Spotify or WQXR is seamless when you walk from bedroom to kitchen to dining room.
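For reference, the server side of that setup can be run as a container with something like the following; the image name, ports, and volume paths here are assumptions based on the community-maintained image, so check its documentation before relying on them:

# Sketch: run Logitech Media Server in Docker (image name, ports, and paths are assumptions).
docker run -d --name lms \
  -v "$PWD/lms/config:/config" \
  -v "$PWD/music:/music:ro" \
  -p 9000:9000 -p 9090:9090 -p 3483:3483 -p 3483:3483/udp \
  lmscommunity/logitechmediaserver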

🏴‍☠️ Pimoroni PirateRadio

Home Assistant is ✨magic✨

Home Assistant is open-source home automation software, and it’s the only IoT software I don’t find myself screaming at regularly.

And, of course, there’s a Logitech Squeezebox integration for Home Assistant. The integration lets you use Logitech Media Server’s (somewhat esoteric) API to control your devices from Home Assistant.

Home Assistant Squeezebox Lovelace Card

I also use a community-made Home Assistant Blueprint that automates each of the cube’s 45 unique gestures.

Mi Magic Cube in Home Assistant

Currently, since my mental stack is tiny, I only use four gestures:

  1. Shake: Turn on all players, and start playing a random album from Spotify (that’s right, album – I’m old enough to yearn for the halcyon days of Rdio).
  2. Double-tap: Turn off all players.
  3. Flip: Next track.
  4. Twist: Twist right for volume up; twist left for volume down – like a volume knob.

🧐 Why would anyone do this?

In a 2011 article, “A Brief Rant on the Future of Interaction Design,” Bret Victor describes touchscreens as “pictures under glass.” I loathe pictures under glass.

It’s impossible to use a device with a touchscreen without looking at it. And touchscreen interaction is slow – traversing a menu system is all point-and-click, there are no shortcuts.

Another alternative is control via smart speakers – devices literally straight out of a dystopian novel.

While the smart speaker is the closest thing to a ubiquitous command-line interface in everyday use, I’m too weirded-out to have a smart speaker in my house.

I’ve opted for a better way: shake a cube and music appears.

The cube is a pleasant tactile experience – shake it, tap it, spin it – it’s a weighty and fun fidget toy. Its design affords instant access to all its features – there is no menu system to dig through.

The cube is frictionless calm technology and it’s behaved beautifully in the background of my day-to-day for months.

Tech News issue #13, 2022 (March 28, 2022)

00:00, Monday, 28 2022 March UTC
2022, week 13 (Monday 28 March 2022)

weeklyOSM 609

10:06, Sunday, 27 2022 March UTC

15/03/2022-21/03/2022

Mapping campaigns

  • UN Mappers are going to map building footprints, supporting UNSMIL to ensure peace in Libya, on Wednesday 30 March from 06:00 UTC until 16:00 UTC. The tasks will be distributed via the tasking manager.
  • Andrés Gómez Casanova announced that the note-a-thon (a group activity solving local notes in Latin American countries) will be held on the first Saturday of each month from now on. The note-a-thon is registered as an organised activity. See the details (es) > en of this activity on the wiki. Events are coordinated (es) > en on Meetup.

Mapping

  • bgo_eiu reported on his unexpected insights gained while trying to improve the mapping of Baltimore’s public transit in OSM.
  • Counter-mapping the City was a two day virtual conference about using mapping as a means of promoting the struggles of marginalised people in the Philippines.
  • dcapillae has started (es) > en an initiative with authorised mappers to improve the positions of recently imported recycling containers in Malaga (Spain). To support this he has created a special style usable in JOSM showing the different types of goods suitable for recycling.
  • User kempelen pointed out (hu) > en, once again, the importance of separating landuse=* and area=* from highway=*.
  • Voting is open until Tuesday 5 April for artwork_subject=sheela-na-gig, for mapping Sheela-na-gigs, stone carvings depicting nude women exposing their genitals, found on churches, castles, other walls and in museums.
  • Tjuro has finished their micro-mapping of the rebuilt station square north in Zwolle.

Community

  • Edoardo Neerhut is looking for work for Aleks (@Approksimator), who has lost his income due to the war in Ukraine.
  • OSMF Japan and Microsoft Corporation will hold (ja) a workshop (ja) > en on Soundscape and OSM on Wednesday 30 March. Soundscape is a 3D voice guidance application based on OSM data.
  • Amanda McCann’s February diary is online.
  • The Communication Working Group of OSMF has officially announced the new discussion forum for OSM (we reported earlier) in a blog post.
  • For International Women’s Day GeoladiesPH had a 3 hour mapathon, focusing on women, called #BreakTheBiasedMap.
  • Chinese mapper 喵耳引力波 wrote (zhcn) > en a diary entry, in which they list all of the Chinese mappers who gathered to map and refine the mountainous terrain and roads after the MU5735 air crash, and guesses this may have been due to modern media allowing mappers to follow breaking news.

Events

  • The 8th State of the Map France (SotM-Fr) will take place at the University of Nantes from 10 to 12 June. The call for papers is open (fr) until Thursday 31 March.
  • The State of the Map Working Group revealed the logo for SotM 2022 in Firenze (Italy). They also published a number of key dates as follows:
    • Monday 25 April 2022: 23:59:59 UTC: Deadline for talk and workshop submissions
    • June 2022: Talk video production (test video and final video)
    • August 2022: Lightning talk video production
    • 19 to 21 August 2022: State of the Map.

Education

  • Corinna John described (de) > en in her blog how to create a ‘photo map’ using QGIS.
  • Youthmappers published their project results about how to connect volunteered geographic information and crowdsourced spatial data with government cartographers and geographers to better serve the public across the Americas.

Maps

  • [1] Sven Geggus has improved his OpenCampingMap. Now sanitary_dump_station, water_point and drinking_water are also displayed at zoom level 19.
  • Reporters from franceinfo used (fr) > en OpenStreetMap’s data for an article about the cost of commuting after the recent rise in oil prices.
  • Alex Wellerstein presented his OSM based ‘nukemap’, which allows you to visualise the impact of a simulated nuclear detonation. Start with a tactical bomb (10 kt) and try the advanced options!

Software

  • lwn.net reported that there is an OpenStreetMap viewer for Emacs.
  • Organic Maps is participating in the Google Summer of Code 2022. Six ideas for projects are already available.
  • Kevin is the new maintainer of the Awesome OpenStreetMap list. He invites you to help make this list more awesome.

Releases

  • Last week we reported on release 17.0.4 of Vespucci. In it there was a problem with the default templates that can be solved by manually updating the templates.

Upcoming Events

Where | What | When
Perth | Social Mapping Online | 2022-03-27
- | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-03-28
Bremen | Bremer Mappertreffen (Online) | 2022-03-28
San Jose | South Bay Map Night | 2022-03-30
Ville de Bruxelles – Stad Brussel | Virtual OpenStreetMap Belgium meeting | 2022-03-29
- | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-03-30
Tucson | State of the Map US | 2022-04-01 – 2022-04-03
Hlavní město Praha | Missing Maps GeoNight MSF CZ online mapathon 2022 #1 | 2022-04-01
Burgos | Evento OpenStreetMap Burgos (Spain) 2022 | 2022-04-01 – 2022-04-03
Região Geográfica Imediata de Teófilo Otoni | Mapathona na Cidade Nanuque – MG -Brasil – Edifícios, Estradas, Pontos de Interesses e Área Verde | 2022-04-02 – 2022-04-03
Bogotá Distrito Capital – Municipio | Notathon en OpenStreetMap – resolvamos notas de Latinoamérica | 2022-04-02
- | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-04
- | OSMF Engineering Working Group meeting | 2022-04-04
Stuttgart | Stuttgarter Stammtisch | 2022-04-05
Greater London | Missing Maps London Mapathon | 2022-04-05
Berlin | OSM-Verkehrswende #34 (Online) | 2022-04-05
- | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-06
- | Tasking Manager Collective Meet Up – Option 1 | 2022-04-06
- | Tasking Manager Collective Meet Up – Option 2 | 2022-04-06
Berlin | 166. Berlin-Brandenburg OpenStreetMap Stammtisch | 2022-04-08
- | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-11
臺北市 | OpenStreetMap x Wikidata Taipei #39 | 2022-04-11
Washington | MappingDC Mappy Hour | 2022-04-13
Hamburg | Hamburger Mappertreffen | 2022-04-12
San Jose | South Bay Map Night | 2022-04-13
- | Open Mapping Hub Asia Pacific OSM Help Desk | 2022-04-13
- | OSM Utah Monthly Meetup | 2022-04-14
Michigan | Michigan Meetup | 2022-04-14

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, Nordpfeil, PierZen, SK53, Strubbl, TheSwavu, derFred, Can.

Semantic MediaWiki 4.0.1 released

20:23, Thursday, 24 2022 March UTC

March 24, 2022

Semantic MediaWiki 4.0.1 (SMW 4.0.1) has been released today as a new version of Semantic MediaWiki.

It is a maintenance release providing bug fixes and translation updates. Please refer to the help pages on installing or upgrading Semantic MediaWiki to get detailed instructions on how to do this.

Wikipedia article, or essay?

16:14, Thursday, 24 2022 March UTC

In Wiki Education’s Wikipedia Student Program, college and university instructors assign their students to edit Wikipedia as part of the coursework. For most courses, this replaces a research paper or the literature review section of a longer analytical paper, related to the course topic. But for some courses, Wikipedia also becomes an object of study for students.

That’s the case for New York University Clinical Associate Professor David Cregar, who teaches in the Expository Writing Program. His course, “We are not in a post-fact world”: Wikipedia & the Construction of Knowledge, focuses on both having students contribute to Wikipedia and contextualizing Wikipedia in the broader knowledge landscape. Student Audrey Yang fulfilled the second part of the assignment in a creative way — by creating what looks like a Wikipedia article, called “Wikipedia & the Structure of Knowledge” (but, it notes, “From Fake Wikipedia, not a real encyclopedia (but still free)”). Audrey’s reflection of how Wikipedia compares to the Library of Babel contains all the hallmarks of a Wikipedia article: a table of contents, edit buttons on section headers, citations, wikilinks, and even a talk page, complete with notes from her thought process as she wrote the essay!

This example of an end-of-term reflective essay is particularly fun and creative. Download this PDF to see Audrey’s work. And many thanks to Audrey and Professor Cregar for sharing it with us!

Today, the European Court of Human Rights announced that it is dismissing the Wikimedia Foundation’s 2019 petition to lift the block of Wikipedia in Turkey. The case was dismissed because access to Wikipedia was restored by the Turkish government in January 2020 and the block was already determined to be a human rights violation in the Turkish Constitutional Court’s December 2019 ruling. The Court believes that the Turkish Constitutional Court can for now effectively address future problems related to violations of free expression online. 

“We respect the Court’s decision given that our primary goal of restoring access to Wikipedia in Turkey has already been achieved,” said Stephen LaPorte, Associate General Counsel of the Wikimedia Foundation. “We thank the Court for their attention on this issue and would like to once again celebrate that Turkey’s more than 80 million people have unrestricted access to Wikipedia.” 

In its decision, the Court emphasized that governments must acknowledge (explicitly or in substance) violations of the European Convention on Human Rights and then address the issue accordingly. The Court found that the Turkish government had effectively done so in this case by restoring access to Wikipedia after the Turkish Constitutional Court’s 2019 ruling and, in combination with its other past cases, had outlined clearer criteria for addressing website blocking in the future. Additionally, the Court provided guidance that the more than two years taken by the Turkish Constitutional Court to address the violation may in the future be seen as an excessive delay for government action in cases of website blocking.

The ruling comes following a petition filed with the Court in April 2019 by the Wikimedia Foundation, the nonprofit that operates Wikipedia. Established in 1959, the Court is the international human rights court which enforces the European Convention on Human Rights. Turkey is a long-standing party to the Convention.

In April 2017, the Turkish government blocked all language editions of Wikipedia in Turkey. The move denied more than 80 million people in the country access to free knowledge and prevented them from sharing their history, culture, and experiences with the world. After exhausting domestic efforts to restore access, the Wikimedia Foundation turned to the European Court of Human Rights in April 2019. The Foundation contended that the blanket ban of Wikipedia violated fundamental freedoms, including the right to freedom of expression, as guaranteed by Article 10 of the European Convention. The case was granted priority by the Court.

In January 2020, access to Wikipedia was restored in Turkey, following a ruling by the Constitutional Court of Turkey acknowledging in substance that the block of Wikipedia violated both the Turkish Constitution and the European Convention on Human Rights. The European Court of Human Rights was asked to evaluate the Turkish law that was used as a basis to block access to Wikipedia and examine whether that law violated free expression.

The European Court of Human Rights’ decision comes at a time when access to knowledge continues to be under threat around the world, including in Russia where authorities recently demanded the removal of Wikipedia content related to the Russian government’s invasion of Ukraine. The Wikimedia Foundation will continue to defend the right of everyone to freely access and participate in knowledge.

Wikipedia is built on the idea that knowledge belongs to everyone—that information should be freely accessible, without restriction, and that everyone should be able to freely participate in the creation of our historical record. 

Today, Wikipedia is read more than 6,000 times every second by people around the globe. Its more than 55 million articles across over 300 languages provide knowledge across a wide array of topics, from major current events to the history of ancient civilizations. More than 280,000 volunteers around the world work together to write, edit, and update articles in real time, using reliable sources to verify the facts. Through this open editorial process, articles become more neutral and reliable over time, ensuring topics are covered from all sides and established in fact-based sources.

Since the block was lifted in Turkey, Turkish Wikipedia is thriving again. It has grown to include more than 474,000 articles, edited by a community of 8,000 Turkish-speaking volunteers every month. Wikipedia and Wikimedia projects are viewed more than 150 million times a month in Turkey. Thanks to efforts led by a local group of volunteers, students across several Turkish universities are contributing Wikipedia articles as part of their course requirements, and museum professionals in Turkey are learning how to add their knowledge to Wikipedia.

The Wikimedia Foundation is represented by Can Yeginsu of 3 Verulam Buildings in London, and Gönenç Gürkaynak at ELIG Gurkaynak Attorneys-at-Law in Istanbul. We wish to express our enthusiastic appreciation for their counsel over the last five years.


Brandon Katzir

Digital Services Librarian Brandon Katzir wanted to learn how to edit Wikidata and build queries using SPARQL — so he signed up for Wiki Education’s Wikidata Institute class.

“The class gave me a foundation in things like Wikidata data structures, and it also taught me how to query and use different kinds of tools in Wikidata,” he says. “Most importantly, the class points you to a lot of great resources and projects to get involved with or learn from even after the class is over.”

Brandon, who works at Oklahoma State University, added some information on Oklahoma government to Wikidata. His research interests include Jewish studies in literary studies, so he also edited some items related to Yiddish authors and their books. Even though the course has wrapped up, Brandon has continued to edit in these topic areas.

“Editing Wikidata is really a pretty enjoyable process, and there’s a great community of editors on the site if you get stuck,” he says.

His favorite part of Wikidata is (unsurprisingly for a librarian) adding solid references to things he cares about. He also enjoys using Recoin, which was introduced in the Wikidata Institute curriculum, to fill in missing information. Recently, he’s been doing more SPARQL queries to find answers to interesting questions.
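
As a concrete illustration of the kind of query mentioned above, here is a minimal sketch (my illustration, not from Brandon or the Wikidata Institute curriculum) that asks the Wikidata Query Service for writers who worked in Yiddish, using Python with the requests library. The identifiers used (P106 "occupation", Q36180 "writer", P1412 "languages spoken, written or signed", Q8641 "Yiddish") are believed to be correct but are worth double-checking on Wikidata.

    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"

    # Writers (P106 = occupation, Q36180 = writer) who spoke or wrote Yiddish
    # (P1412 = languages spoken/written, Q8641 = Yiddish); verify IDs before reuse.
    QUERY = """
    SELECT ?author ?authorLabel WHERE {
      ?author wdt:P106 wd:Q36180 ;
              wdt:P1412 wd:Q8641 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 20
    """

    response = requests.get(
        ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "wikidata-institute-example/0.1 (demo)"},  # placeholder
    )
    response.raise_for_status()

    for row in response.json()["results"]["bindings"]:
        print(row["authorLabel"]["value"], "->", row["author"]["value"])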

Brandon encourages other library professionals to learn more about Wikidata, and says the Wikidata Institute was a great first step for him.

“Wikidata challenges me to think about connecting library collections to Wikidata and the (ideally) reciprocal relationships between libraries and Wikidata,” he says. “Other professionals should engage because Wikidata is a massive dataset, but it’s still only as good as the contributions made. The more contributions, the better, more representative, and more precise Wikidata can be.”

The class taught him what he wanted to know — and also sparked an interest in Wikidata’s lexemes. While much of Wikidata is focused on documenting concepts, Wikidata also contains a structured database of words, their parts, and phrases — lexicographical data. (Want to see one in action? Click on the Random Lexeme link in the left column of Wikidata.)
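
For readers curious what lexicographical data looks like under the hood, the sketch below (again my illustration, not part of the original post) fetches a single lexeme's JSON through Wikidata's linked-data interface. The lexeme ID L1 is only an example, and the field names reflect the Wikibase lexeme JSON format as I understand it, so check them against a live response.

    import requests

    lexeme_id = "L1"  # example ID; any L-prefixed lexeme ID works the same way
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{lexeme_id}.json"

    data = requests.get(url, headers={"User-Agent": "lexeme-example/0.1 (demo)"}).json()
    entity = data["entities"][lexeme_id]

    # Lemmas are keyed by language code; language and lexicalCategory are item IDs.
    print("Lemmas:", {lang: v["value"] for lang, v in entity["lemmas"].items()})
    print("Language item:", entity["language"])
    print("Lexical category item:", entity["lexicalCategory"])
    print("Forms:", len(entity.get("forms", [])), "Senses:", len(entity.get("senses", [])))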

“I have a longstanding interest in dictionaries and lexicography,” Brandon says. “I think there’s a very human impulse to try and catalog and describe the entirety of a language. A century ago, people were doing this with massive, multi-volume books. It makes sense that today we’d try and do this with datasets that can, ideally, aid with translation, AI, natural language processing, etc. It’s a huge task, obviously, but I’m very interested in seeing how it develops and the applications that arise from lexemes.”

Overall, the course exceeded Brandon’s expectations, and left him encouraging more people to take it.

“The class was great, I highly recommend it, you’ll learn a lot about Wikidata and get valuable resources to continue your knowledge after the class,” he says.

To sign up for the Wikidata Institute, visit wikiedu.org/wikidata.

Image credit: Brandon Katzir, CC BY-SA 4.0, via Wikimedia Commons

Tech News issue #12, 2022 (March 21, 2022)

00:00, Monday, 21 March 2022 UTC
2022, week 12 (Monday 21 March 2022)

weeklyOSM 608

15:35, Sunday, 20 March 2022 UTC

08/03/2022-14/03/2022

lead picture

Barometer of cycle-friendly cities [1] © FUB | map data © OpenStreetMap contributors

Mapping campaigns

  • OSM Ghana tweeted statistics of their #28DaysMappingChallenge. In February OSM Ghana started a campaign to motivate people to make contributions to OSM on each day of the month. They garnered 598,386 edits from 69 contributors.
  • Becky Candy wrote about live validation during a mapathon.
  • Flanders Tourism launched a website to gather data on tourism POIs, such as charging points for electric bikes, playgrounds, picnic tables, and benches. This project, built upon MapComplete, entails a public campaign with a video presentation (de) and a data import. The developer gave a quick overview in his diary.
  • raphaelmirc presents (pt) > en MapSãoFrancisco, an initiative of Canoa de Tolda and InfoSãoFrancisco that aims to develop collaborative mapping projects throughout the São Francisco river basin (Brazil).
  • The new UMBRAOSM event will be held (pt) > en on Saturday 2 April, from 00:00 to 23:59 (UTC-3). It will be a mapathon focusing on buildings, roads, POIs, and green areas.

Mapping

  • Christoph Hormann presented his ‘TagDoc Project’, which is intended to offer an alternative to the OSM Wiki, with a focus on the de facto use of tags.
  • The following proposals await your comments:

Community

  • The Korea OSM community has decided (ko) > en to change Ukrainian place names in Korean on OSM from Russian-based notation (e.g. 키예프, ‘Kiev’; 하리코프, ‘Kharkov’) to Ukrainian-based notation (e.g. 키이우, ‘Kyiv’; 하르키우, ‘Kharkiv’) and is currently making the edits. The revised notation is based on the National Institute of Korean Language’s Ukrainian Geographical Names and Personal Information Collection (2022-03-11) (ko).
  • The OpenStreetMap Ops Team tweeted that the new Discourse based Community Forum has launched. The community is invited to participate in the next steps.
  • The OSM community of Burgos, Spain, is planning several activities, including a mapathon and talks on 1 and 2 April. More information on the event is available on the wiki (es) page.
  • OSM Fukushima members have released videos introducing weeklyOSM 603 and 604 (ja). After introducing weeklyOSM, they interviewed Hokkosha, one of the most active mappers in Japan.

OpenStreetMap Foundation

  • Paul Norman explained the usage policies for Nominatim and OSM Map Tiles, automatic blocking, and how to fix overly frequent requests.
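
As a small illustration of what policy-friendly use of Nominatim can look like in practice, here is a minimal sketch (not taken from Paul Norman's post): it identifies the application with a User-Agent header and keeps to at most one request per second. The User-Agent string and the example query are placeholders.

    import time
    import requests

    NOMINATIM = "https://nominatim.openstreetmap.org/search"
    HEADERS = {"User-Agent": "example-geocoder/0.1 (contact@example.org)"}  # placeholder

    def geocode(query):
        """Return a few Nominatim search results for a free-text query."""
        response = requests.get(
            NOMINATIM,
            params={"q": query, "format": "jsonv2", "limit": 3},
            headers=HEADERS,
        )
        response.raise_for_status()
        time.sleep(1)  # stay within the one-request-per-second guideline
        return response.json()

    for place in geocode("Hamburg, Germany"):
        print(place["display_name"], place["lat"], place["lon"])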

Events

  • On Saturday 19 March, Maptime Bogota brought together (es) > en mappers virtually to update the representation of bike paths in OSM for the Colombian capital.
  • Thanks to C3VOC, video recordings from FOSSGIS 2022 are now available (de).
  • Save the date for Open Belgium 2022, which will be held on Friday 29 April at De Krook, in Ghent, as a real-life event. If you have an idea for a talk or workshop, submit it yourself or ask for advice on how to do so (the deadline is Monday 21 March).

Education

OSM research

  • Datenschatz has written a detailed tutorial on how to systematically identify and visualise missing data points in OSM with the help of pandas and a bit of geo-data science; the data he uses for comparison are the public health departments in Germany. A minimal sketch of the general approach is shown after this list.
  • Transport researchers are studying how open data (such as OSM) stimulates digital civic engagement. One of them has asked mappers to participate in a five-minute survey.
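
The following sketch illustrates the general approach described in the first item above; it is not the tutorial's code. It pulls objects with a given tag from the Overpass API, loads an official list with pandas, and reports official entries that have no name match in OSM. The CSV file name, its columns, and the OSM tag in the query are placeholders I chose for illustration.

    import pandas as pd
    import requests

    OVERPASS = "https://overpass-api.de/api/interpreter"

    # Placeholder Overpass query: objects in Germany carrying an assumed tag.
    query = """
    [out:json][timeout:120];
    area["ISO3166-1"="DE"][admin_level=2]->.de;
    nwr["healthcare"="public_health_department"](area.de);
    out center;
    """

    response = requests.post(OVERPASS, data={"data": query})
    response.raise_for_status()
    elements = response.json()["elements"]
    osm_names = {e.get("tags", {}).get("name", "").strip().lower() for e in elements}

    # Placeholder official dataset: one facility per row, with a "name" column.
    official = pd.read_csv("gesundheitsaemter.csv")
    official["name_norm"] = official["name"].str.strip().str.lower()

    missing = official[~official["name_norm"].isin(osm_names)]
    print(f"{len(missing)} of {len(official)} official entries have no name match in OSM")
    print(missing[["name"]].head(10))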

Maps

  • K. Sakanoshita has developed (ja) an online map to show historic objects in Ukraine.
  • [1] The Fédération française des Usagers de la Bicyclette (French National Cyclists Association) has published (fr) > en the results of the 2021 edition, the third, of the ‘Baromètre Parlons Vélo’, a nationwide survey about bicycle usage. A general map shows (fr) > en the ranking of the participating cities, while a second one shows (fr) > en locations perceived as problematic, improved, or in need of a parking solution.
  • Liveuamap has made an OSM based interactive map showing the latest news in Ukraine.

Open Data

Software

  • yuiseki has developed (ja) an online tool (ja) to assist in editing building address data. Currently, only Japanese addresses are supported, but he plans to support the entire world in the future. The source code is available on GitHub.
  • The Technical University of Darmstadt presented (de) the ‘Intermodal Travel Information’ Motis-Project on media.ccc.de. A demonstration application (de) and the sources are available.
  • RouteXL is a web application that uses OpenStreetMap data for road itinerary planning, with optimisation for multiple stops to save travel time and cost. It is free to use for routes of up to 20 stops. Various geocoders can be selected, including Nominatim.
  • Strasbourg launched its mobile application StrasApp (fr) > en, which provides a map view to users through OpenMapTiles. Unfortunately, as has been noted (fr) > en, it does so with a complete lack of attribution of either OpenMapTiles or OpenStreetMap, as requested by the OpenMapTiler group.

Programming

  • Geofabrik presented ‘Shortbread’, a basic, lean, general-purpose vector tile schema for OpenStreetMap data, licensed CC0/FTWPL.
  • Sarah Hoffmann pointed out (de) > en that OpenStreetMap is participating in the Google Summer of Code as an organisation, and the first project ideas have been put forward.

Releases

  • Vespucci’s March 2022 maintenance release 17.04 is now available in the Play Store.

Did you know …

  • … BBBike Map Compare for comparing maps? You can use it to find out whether Waze uses OSM without attribution, for example.
  • … the new Twitter account OSMChangesets?

OSM in the media

  • Robert Goedl described (de) > en how to use Maperitive to edit OpenStreetMap maps on Linux.

Other “geo” things

  • In the era of phones and online navigation maps, the police emergency response to a shooting in rural Nova Scotia (Canada) showed that local police units did not have a rapid and reliable backup system for road navigation and aerial imagery. They consulted an online map whose routing results seemed unreliable to people who knew the area.
  • contrapunctus asked on OSMIndia: ‘Does anyone know of any opportunities to be a full-time, professional OSM mapper?’. They received a number of accurate answers.

Upcoming Events

Where What Online When Country
臺北市 OpenStreetMap街景踏查團工作坊1 osmcalpic 2022-03-19 flag
Ciudad de Guatemala Primer mapatón YouthMappers en Guatemala (remoto) osmcalpic 2022-03-19 – 2022-03-20 gt
City of Subiaco Social Mapping Sunday: Shenton Park osmcalpic 2022-03-20 flag
Habay-la-Neuve Réunion des contributeurs OpenStreetMap, Habay-la-Neuve osmcalpic 2022-03-21 flag
OSMF Engineering Working Group meeting osmcalpic 2022-03-21
Why Open Data Matters for Cycling: Visualizing a Cycling City osmcalpic 2022-03-22
Barcelona Geomob Barcelona osmcalpic 2022-03-22 flag
City of Nottingham OSM East Midlands/Nottingham meetup (online) osmcalpic 2022-03-22 flag
Decatur County OSM US Mappy Hour osmcalpic 2022-03-24 flag
[Online] OpenStreetMap Foundation board of Directors – public videomeeting osmcalpic 2022-03-24
IJmuiden OSM Nederland bijeenkomst (online) osmcalpic 2022-03-26 flag
06060 Reunión Bimestral OSM-LatAm osmcalpic 2022-03-26 flag
Perth Social Mapping Online osmcalpic 2022-03-27 flag
Bremen Bremer Mappertreffen (Online) osmcalpic 2022-03-28 flag
San Jose South Bay Map Night osmcalpic 2022-03-30 flag
Ville de Bruxelles – Stad Brussel Virtual OpenStreetMap Belgium meeting osmcalpic 2022-03-29 flag
Tucson State of the Map US osmcalpic 2022-04-01 – 2022-04-03 flag
Burgos Evento OpenStreetMap Burgos (Spain) 2022 osmcalpic 2022-04-01 – 2022-04-03 flag
Stuttgart Stuttgarter Stammtisch osmcalpic 2022-04-05 flag
London Missing Maps London Mapathon osmcalpic 2022-04-05 flag
Berlin OSM-Verkehrswende #34 (Online) osmcalpic 2022-04-05 flag

Note:
If you would like to see your event here, please add it to the OSM calendar. Only events entered there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, Nordpfeil, PierZen, SK53, Sammyhawkrad, SomeoneElse, Strubbl, Ted Johnson, TheSwavu, YoViajo, derFred, jcr83, muramototomoya.

Production Excellence #41: February 2022

12:49, Sunday, 20 March 2022 UTC

How’d we do in our strive for operational excellence last month? Read on to find out!

Incidents

3 documented incidents last month.

2022-02-01 ulsfo network
Impact: For 3 minutes, clients served by the ulsfo POP were not able to contribute or display un-cached pages.

2022-02-22 wdqs updater codfw
Impact: For 2 hours, WDQS updates failed to be processed. Most bots and tools were unable to edit Wikidata during this time.

2022-02-22 vrts
Impact: For 12 hours, incoming emails to a specific recently created VRTS queue were not processed, with senders receiving a bounce with an SMTP 550 error.

Figure from Incident graphs.


Incident follow-up

Remember to review and schedule Incident Follow-up work in Phabricator; these are preventive measures and tech debt mitigations written down after an incident is concluded. Read about past incidents at Incident status on Wikitech.

Recently conducted incident follow-up:

Create a dashboard for Prometheus metrics about health of Prometheus itself.
Pitched by CDanis after an April 2019 incident, carried out by Filippo (@fgiunchedi).

Improve wording around AbuseFilter messages about throttling functionality.
Originally filed in 2018. This came up last month during an incident where the wording may have led to a misunderstanding. Now resolved by @Daimona.

Exclude restart procedure from automated Elasticsearch provisioning.
There can be too much automation! Filed after an incident last September. Fixed by @RKemper.


Outstanding errors

Take a look at the workboard for tasks that could use your help.

View Workboard

I skip breakdowns most months as each breakdown has its flaws. However, I hear people find them useful, so I'll try to do them from time to time, with my caveats noted. The last breakdown was in the December edition, which focussed on throughput during a typical month. It is important to recognise that neither high nor low throughput is per se good or bad. It's good when issues are detected, reported, and triaged correctly. It's also good if a team's components are stable and don't produce any errors. A report may be found to be invalid or a duplicate, which is sometimes only determined a few weeks later.

The below "after six months" breakdown takes more of that into consideration by looking at what's on the table after six months (tasks upto Sept 2021). This may be considered "fairer" in some sense, although has the drawback of suffering from hindsight bias, and possibly not highlighting current or most urgent areas.

WMF Product:

  • Anti Harassment Tools (3): 1 MW Blocks, 2 SecurePoll.
  • Community Tech (0).
  • Design Systems (1): 1 WVUI.
  • Editing Team (15): 14 VisualEditor, 1 OOUI.
  • Growth Team (13): 11 Flow, 1 GrowthExperiments, 1 MW Recent changes.
  • Language Team (6): 4 ContentTranslation, 1 CX-server, 1 Translate extension.
  • Parsoid Team (9): 8 Parsoid, 1 ParserFunctions extension.
  • Product Infrastructure (4): 2 JsonConfig, 1 Kartographer, 1 WikimediaEvents.
  • Reading Web (0).
  • Structured Data (4): 2 MW Uploading, 1 WikibaseMediaInfo, 1 3D extension.

WMF Tech:

  • Data Engineering: 1 EventLogging.
  • Fundraising Tech: 1 CentralNotice.
  • Performance: 1 Rdbms.
  • Platform MediaWiki Team (19): 4 MW-Page-data, 1 MW-REST-API, 1 MW-Action-API, 1 MW-Snapshots, 1 MW-ContentHandler, 1 MW-JobQueue, 1 MW-libs-RequestTimeout, 9 Other.
  • Search Platform: 1 MW-Search.
  • SRE Service Operations: 1 Other.

WMDE:

  • WMDE-Wikidata (7): 5 Wikibase, 2 Lexeme.
  • WMDE-TechWish: 1 FileImporter.

Other:

  • Missing steward (7): 2 Graph, 2 LiquidThreads, 2 TimedMediaHandler, 1 MW Special-Contributions-page.
  • Individually maintained (2): 1 WikimediaIncubator, 1 Score extension.

Trends

In February, we reported 25 new production errors. Of those, 13 have since been resolved, and 12 remain open as of today (two weeks into the following month). We also resolved 22 errors that remained open from previous months. The overall workboard has grown slightly to a total of 301 outstanding error reports.

For the month-over-month numbers, refer to the spreadsheet data.


Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof