Episode 116: Adam Baso and Julia Kieserman

17:54, Tuesday, 05 July 2022 UTC

🕑 1 hour 14 minutes

Adam Baso and Julia Kieserman both work in the Abstract Wikipedia group at the Wikimedia Foundation; Adam is the director of engineering, while Julia is a senior software engineer.

Links for some of the topics discussed:

Can you reuse images of Italian cultural heritage in the public domain published on Wikimedia Commons for commercial purposes? According to the new Italian National Plan for the Digitization of Cultural Heritage (Piano Nazionale di Digitalizzazione, PND), images can be published on the Wikimedia projects, but to reuse them for commercial purposes you need to ask for permission and pay a fee. This is a restriction on the public domain and a misuse of our Wikimedia projects, which are collaborative repositories meant to provide content freely, including for commercial purposes.

The new PND – under review until June 30th, 2022 – for the first time explicitly refers to Wikimedia Commons in its Guidelines for the acquisition, circulation and reuse of cultural heritage reproductions in the digital environment (page 28) and it states:

“The download of cultural heritage reproductions published on third-party websites is not under the control of the public entity that holds the assets (e.g., images of cultural heritage assets downloadable from Wikimedia Commons, made “freely” by contributors by their own means for purposes of free expression of thought and creative activity, and thus in the full legitimacy of the Cultural Heritage Code). It remains the responsibility of the cultural institution to charge fees for subsequent commercial uses of reproductions published by third parties.”

Despite its clear support for open access, FAIR data, collaboration, co-creation and reuse, the guidelines of the PND would turn all public-domain images of Italian cultural heritage available on Wikimedia Commons into non-commercial (NC) images under the new label MIC BY NC (MIC stands for the Italian Ministry of Culture). According to an Italian administrative norm (Codice dei beni culturali e del paesaggio), Italian monuments and collections in the public domain can be photographed for non-commercial purposes, while commercial uses are allowed only with prior authorization and the payment of a fee to the institution managing the site or collection.

The system is unworkable and unsustainable

The application of this kind of fee by the Ministry of Culture and cultural institutions to commercial reuses of Italian cultural heritage images on Wikimedia Commons is unrealistic (especially if the re-users are based outside Italy), and complex and expensive to manage (handling permissions and payments). Furthermore, this system follows an outdated business model that aims to make money from heritage digitization instead of opening it up for reuse, as European policies recommend (cf. open government, open data, open science).

Wikimedia projects are exploited and sabotaged

The fee charged on the reuse of Wikimedia content exploits our free infrastructure and the work of volunteers and donors, and goes against our principles of free knowledge, openness and reuse. Furthermore, it contradicts the thousands of authorizations collected by Wikimedia Italia over ten years of Wiki Loves Monuments and the commitment of Italian GLAMs to providing their public-domain heritage through open tools, accessible for all purposes without fees.

What is being done and what can be done

Wikimedia Italia sent an open letter to representatives of the Italian government, calling on them not to add restrictions on images of public-domain cultural heritage released under an open license on Wikimedia projects. We will keep pressing this point, to push our country to align with international standards on openness and civil-society participation in the conservation of its own heritage.

Help us raise our voice: let us know if you have similar issues in your country and how you have been dealing with them. If you are a volunteer on Wikimedia Commons, let us know if and how the community could help support our requests.

Learn more about the current situation here.

Tech/News/2022/27

15:49, Tuesday, 05 July 2022 UTC

Other languages: Bahasa Indonesia, Deutsch, English, italiano, polski, português, português do Brasil, svenska, čeština, русский, українська, עברית, العربية, فارسی, বাংলা, 日本語

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Changes later this week

  • The new version of MediaWiki will be on test wikis and MediaWiki.org from 5 July. It will be on non-Wikipedia wikis and some Wikipedias from 6 July. It will be on all wikis from 7 July (calendar).
  • Some wikis will be in read-only mode for a few minutes because their main database will be switched. This will be performed on 5 July at 07:00 UTC (targeted wikis) and on 7 July at 07:00 UTC (targeted wikis).
  • The Beta Feature for DiscussionTools will be updated throughout July. Discussions will look different. You can see some of the proposed changes.
  • This change only affects pages in the main namespace on Wikisource. The JavaScript config variable proofreadpage_source_href will be removed from mw.config and replaced with the variable prpSourceIndexPage. [1]

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

Outreachy report #33: June 2022

00:00, Tuesday, 05 July 2022 UTC

June was one of the hardest months since I joined the Outreachy team almost 4 years ago. It was the first time I'd experienced the loss of a colleague—and Marina wasn't just a colleague. I keep feeling that words aren't enough to express how much I owe Marina and Outreachy for the life I now have. Before becoming an Outreachy intern, I struggled to find meaning in life; I had so many dreams, but it was hard to see myself achieving any of them.

Tech News issue #27, 2022 (July 4, 2022)

00:00, Monday, 04 July 2022 UTC

Tech News: 2022-27

weeklyOSM 623

10:32, Sunday, 03 July 2022 UTC

21/06/2022-27/06/2022

lead picture

Qwant-Map – OSM data connected to Wikimedia and Tripadvisor [1] © Qwant | map data © OpenStreetMap contributors

About us

  • DeepL Pro now offers translations into Bahasa Indonesia. We have taken the liberty of automatically translating and publishing issue #623. We would like to do this better in the future. However, to do this, we need two proofreaders to make the necessary corrections starting Friday each week. Get in touch via info at weeklyosm dot eu.

Mapping

  • Enock Seth Nyamador raised some concerns about the quality of YouthMappers’ edits in Ghana.
  • SK53 noted that in a number of countries Bing imagery in the iD editor is very out-of-date (5 years old or more). This appears to be the result of a recent change in iD’s code, and is discussed on GitHub.
  • Tobias Zwick wrote about another small project he will be working on over the next few months. It is made possible by an NLNet NGI Zero Discovery grant. Topic: how to improve and complete maxspeed=* data in OSM, including inferring default speed limits.
  • Requests have been made for comments on the following proposals:

  • Voting on amenity=library_dropoff, for mapping a place where library patrons can return or drop off books, other than the library itself, is open until Friday 8 July.
  • The proposal for the improved tagging of neighbourhood places (place=*) in Japan was approved with 14 votes for, 1 vote against, and 1 abstention.

OpenStreetMap Foundation

  • The OSM Tech Twitter account conducted a poll on whether OpenStreetMap should consider publishing a quarterly electronic newsletter. Although, at present, the poll is favourable, the thread highlights that there are a number of obstacles in producing such a newsletter.

Local chapter news

  • Take a look at the June OpenStreetMap US Newsletter.

Events

  • The deadline for submitting a poster for this year’s State of the Map conference is Sunday 31 July.

Software

  • Lilly Tinzmann reported that there have been features added to the Ohsome Quality Analyst, a data quality analysis tool for OSM accessible via a web interface. The new features include the ability to retrieve HTML snippets with a visual representation of the indicator results and expanded data input options.
  • Sarah Hoffmann blogged about the current status of postcodes in OpenStreetMap, explaining their usefulness and offering a QA layer for incorrectly formatted postcodes.
  • Visit Sights offers suggestions for self-guided sightseeing tours by foot around the world – based on OpenStreetMap and Wikipedia. For each city there is also an overview with individual sights including a map.

Did you know …

  • [1] … Qwant-Maps? We last reported on Qwant-Maps in July 2019. The map has been developed further since then. It draws on Tripadvisor for hotels and restaurants, and also links to Wikipedia, thus providing significant added value.
  • … Martijn van Exel has a bash script to create a vintage OpenStreetMap tile server?
  • … that Jason Davies, one of the contributors to the D3 graphics package, created a webpage demonstrating several dozen map projections of the Earth with smooth transitions between each?

Other “geo” things

  • Ariel Kadouri saw that a road in Google Maps had been renamed incorrectly. Incorrect edits are often said to be a problem of open systems like OSM rather than of closed systems like Google Maps, which is evidently not true. Noel Hidalgo said a friend of his filed a ticket with Google, which resolved the issue after about 36 hours.
  • User F-5 System made (ru) a pilgrimage to a chapel at the reputed source of the Lena River in Northern Russia. It turns out that, as with many large rivers, the source is a contentious issue.

Upcoming Events

Where | What | When
Washington | A Synesthete’s Atlas (Washington, DC) | 2022-07-01
Essen | 17. OSM-FOSSGIS-Communitytreffen | 2022-07-01 – 2022-07-03
– | OSM Africa July Mapathon: Map Liberia | 2022-07-01
– | OSMF Engineering Working Group meeting | 2022-07-04
臺北市 | OpenStreetMap x Wikidata Taipei #42 | 2022-07-04
London | Missing Maps London Mapathon | 2022-07-05
Berlin | OSM-Verkehrswende #37 (Online) | 2022-07-05
San Jose | South Bay Map Night | 2022-07-06
Roma | Incontro dei mappatori romani e laziali | 2022-07-06
Salt Lake City | OSM Utah Monthly Meetup | 2022-07-07
Fremantle | Social Mapping Sunday: Fremantle | 2022-07-10
München | Münchner OSM-Treffen | 2022-07-12
Berlin | Missing Maps – GRC Online Mapathon | 2022-07-12
20095 | Hamburger Mappertreffen | 2022-07-12
London | London pub meet-up | 2022-07-12
Landau an der Isar | Virtuelles Niederbayern-Treffen | 2022-07-12
Salt Lake City | OSM Utah Monthly Meetup | 2022-07-14
– | 153. Treffen des OSM-Stammtisches Bonn | 2022-07-19
City of Nottingham | OSM East Midlands/Nottingham meetup (online) | 2022-07-19
Lüneburg | Lüneburger Mappertreffen (online) | 2022-07-19

Note:
If you would like to see your event here, please add it to the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by JAAS, Lejun, LuxuryCoop, Nordpfeil, PierZen, SK53, Strubbl, TheSwavu, derFred.

A belated writeup of CVE-2022-28201 in MediaWiki

06:03, Sunday, 03 July 2022 UTC

In December 2021, I discovered CVE-2022-28201: it's possible to make MediaWiki's Title::newMainPage() go into infinite recursion. More specifically, if the local interwikis feature is configured (not used by default, but enabled on Wikimedia wikis), any on-wiki administrator could fully brick the wiki by editing the [[MediaWiki:Mainpage]] wiki page in a malicious manner. Recovery would require someone with sysadmin access, either adjusting site configuration or manually editing the database.

In this post I'll explain the vulnerability in more detail, how Rust helped me discover it, and a better way to fix it long-term.

The vulnerability

At the heart of this vulnerability is Title::newMainPage(). The function, before my patch, is as follows (link):

public static function newMainPage( MessageLocalizer $localizer = null ) {
    if ( $localizer ) {
        $msg = $localizer->msg( 'mainpage' );
    } else {
        $msg = wfMessage( 'mainpage' );
    }
    $title = self::newFromText( $msg->inContentLanguage()->text() );
    // Every page renders at least one link to the Main Page (e.g. sidebar).
    // If the localised value is invalid, don't produce fatal errors that
    // would make the wiki inaccessible (and hard to fix the invalid message).
    // Gracefully fallback...
    if ( !$title ) {
        $title = self::newFromText( 'Main Page' );
    }
    return $title;
}

It gets the contents of the "mainpage" message (editable on-wiki at MediaWiki:Mainpage), parses the contents as a page title and returns it. As the comment indicates, it is called on every page view and as a result has a built-in fallback if the configured main page value is invalid for whatever reason.

Now, let's look at how interwiki links work. Normal interwiki links are pretty simple: they take the form [[prefix:Title]], where the prefix is the interwiki name of a foreign site. In the default interwiki map, "wikipedia" points to https://en.wikipedia.org/wiki/$1. There's no requirement that the interwiki target even be a wiki; for example, [[google:search term]] is a supported prefix and link.
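In toy Python form (this is not MediaWiki's actual parser, and the interwiki map below is a made-up two-entry subset), prefix resolution looks roughly like this:

```python
# Toy sketch of interwiki-prefix resolution. The map is a hypothetical
# subset of the default interwiki map described above.
INTERWIKI_MAP = {
    "wikipedia": "https://en.wikipedia.org/wiki/$1",
    "google": "https://www.google.com/search?q=$1",
}

def resolve_link(link):
    """Turn '[[prefix:Title]]' into a URL, or None if it isn't interwiki."""
    inner = link.strip("[]")
    prefix, sep, title = inner.partition(":")
    if sep and prefix.lower() in INTERWIKI_MAP:
        # Substitute the title (possibly empty) into the URL template.
        return INTERWIKI_MAP[prefix.lower()].replace("$1", title)
    return None
```

Note that a bare title like `[[wikipedia:]]` still resolves, producing the site's root article path, which is exactly the case discussed next.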

And if you type in [[wikipedia:]], you'll get a link to https://en.wikipedia.org/wiki/, which redirects to the Main Page. Nice!

Local interwiki links are a bonus feature on top of this to make sharing of content across multiple wikis easier. A local interwiki is one that maps to the wiki we're currently on. For example, you could type [[wikipedia:Foo]] on the English Wikipedia and it would be the same as just typing in [[Foo]].

So now what if you're on English Wikipedia and type in [[wikipedia:]]? Naively that would be the same as typing [[]], which is not a valid link.

So in c815f959d6b27 (first included in MediaWiki 1.24), links like [[wikipedia:]] (where the prefix is a local interwiki) were made to resolve to the main page explicitly. This seems like entirely logical behavior and achieves the goal of local interwiki links: to make a link work the same, regardless of which wiki it's on.

Except it now means that when trying to parse a title, the answer might end up being "whatever the main page is". And if we're trying to parse the "mainpage" message to discover where the main page is? Boom, infinite recursion.

All you have to do is edit "MediaWiki:Mainpage" on your wiki to be something like localinterwiki: and your wiki is mostly hosed, requiring someone to either de-configure that local interwiki or manually edit that message via the database to recover it.

The patch I implemented was pretty simple, just add a recursion guard with a hardcoded fallback:

    public static function newMainPage( MessageLocalizer $localizer = null ) {
+       static $recursionGuard = false;
+       if ( $recursionGuard ) {
+           // Somehow parsing the message contents has fallen back to the
+           // main page (bare local interwiki), so use the hardcoded
+           // fallback (T297571).
+           return self::newFromText( 'Main Page' );
+       }
        if ( $localizer ) {
            $msg = $localizer->msg( 'mainpage' );
        } else {
            $msg = wfMessage( 'mainpage' );
        }

+       $recursionGuard = true;
        $title = self::newFromText( $msg->inContentLanguage()->text() );
+       $recursionGuard = false;

        // Every page renders at least one link to the Main Page (e.g. sidebar).
        // If the localised value is invalid, don't produce fatal errors that

Discovery

I was mostly exaggerating when I said Rust helped me discover this bug. I previously blogged about writing a MediaWiki title parser in Rust, and it was while working on that I read the title parsing code in MediaWiki enough times to discover this flaw.

A better fix

I do think that long-term, we have better options to fix this.

There's a new, somewhat experimental, configuration option called $wgMainPageIsDomainRoot. The idea is that rather than serve the main page from /wiki/Main_Page, it would just be served from /. Conveniently, this would mean that it doesn't actually matter what the name of the main page is, since we'd just have to link to the domain root.

There is an open request for comment to enable such functionality on Wikimedia sites. It would be a small performance win, give everyone cleaner URLs, and possibly break everything that expects https://en.wikipedia.org/ to return a HTTP 301 redirect, like it has for the past 20+ years. Should be fun!

Timeline

Acknowledgements

Thank you to Scott Bassett of the Wikimedia Security team for reviewing and deploying my patch, and Reedy for backporting and performing the security release.

Sock🧦 nerdery🤓

01:15, Friday, 01 July 2022 UTC

Being a nerd is not about what you love; it’s about how you love it.

Wil Wheaton

My running last week

I’m a runner and a sock nerd, and in four days, I’m running a half-marathon (eek!).

Here are some reflections on socks because if there’s one thing every runner knows it’s: socks. matter.

Join the Darn Tough sock cult.

Darn Tough makes merino wool socks prized by hikers, runners, and buy-it-for-lifers because they’re guaranteed for life.

Darn Tough’s lifetime warranty

According to my Amazon order history, I ordered five pairs of “Darn Tough Merino Wool Double Cross, No Show Tab, Light Cushion Sock Molten Large” socks in 2016. Today, six years later, I’m wearing a pair of the socks I ordered in 2016, and they’re great.

And in all this time I’ve never used their warranty program, but I decided to try it out on a particularly worn pair—we’ll see how it goes!

About compression socks

Why? Because squeezy is good.

Peter Sagal, Host of NPR’s “Wait Wait… Don’t Tell Me!”

Compression socks supply support and structure, and that makes them a joy to wear—even when you’re not running.

Initially, compression socks emerged to support circulation in the legs of diabetics. But now savvy runners sport them to capitalize on numerous studies claiming they aid performance and recovery (although who knows what the control is in those studies).

I own two colors of CEP Progressive+ Run 2.0—basic black and caution-tape yellow.

These socks are made of nylon (mostly) which massages my calves, keeping my blood flowing on my recovery days. I’ve owned these socks for years and wear them weekly.

But it’s not all cozy, compressed joy:

  • 💸Compression socks are too expensive—mine cost $65 a pair!
  • 🧐 The socks come with instructions about how to put them on
  • 🛂 You need instructions to put them on

Avoid cotton socks

90% of everything is crap

Sturgeon’s Law

Most socks are crap for running because most socks are cotton.

But cotton is the wrong material for socks for the same reason it’s the right material for towels. Cotton is absorbent—it holds water and doesn’t release it. The sweat trapped between your foot and your cotton sock can cause blisters while running or hiking.

In contrast, technical socks tend to be made of less absorbent material that dries quickly. So when you sweat, your sweat moves to the surface of the sock and evaporates before it gives you blisters.

I believed blisters were unavoidable—I tossed a roll of Leukotape in my first-aid kit and accepted that I’d use it often. But then I realized the real problem was my cotton socks.

You think about socks every day.

“I don’t want to make decisions about what I’m eating or wearing. Because I have too many other decisions to make.”

Barack Obama

Mental energy is precious. You should avoid misspending your limited mental energy on your socks.

You could argue writing a blog post about socks is the definition of misspent mental energy. But I believe it’s when you’re spending your mental energy that matters.

If you find yourself bleary-eyed, rooting around for the one good pair of socks in the drawer, then you’re thinking about socks at the wrong time.

Spend your effort up-front.

Declare sock bankruptcy and find a brand of comfortable socks that you can wear in every situation, and then stock up.

My GLAMorous introduction into the Wikiverse

16:54, Thursday, 30 June 2022 UTC

In January 2021, I had no experience on any of the Wikimedia platforms. By the end of 2021, I had added over 200,000 words across Wikipedia and Wikidata and assisted in two Smithsonian edit-a-thons.

The Beginning

After I completed a digital archival research project on anti-rape protests at The Ohio State University, a friend encouraged me to apply to the 2021 Virtual Because of Her Story (BOHS) Internship Project with the American Women’s History Initiative (AWHI) at the Smithsonian Institution. The internship was eight weeks long, 40 hours a week, and paid. Without the financial assistance BOHS provided, I would not have been able to take this opportunity.

My BOHS project “Wikimedia, Gender Equity, and the Digital Museum” aimed to “advance gender equity on Wikipedia by making our collections about women accessible on the Wikimedia platforms.” As a Women’s, Gender, and Sexuality Studies major, the project appealed to me because disseminating knowledge in accessible ways has been key to many feminists’ organizing efforts.

Summer 2021

My mentor Kelly Doyle taught me the basics of Wiki-etiquette, including conflict of interest, determining reliable sources, establishing notability, and adding categories. My first edit was adding the category “South Korean adoptees” to Mia Mingus’ page. Kelly encouraged utilizing the Wikipedia: Task Center to find pages to categorize or copyedit. I then started editing people’s Wikipedia pages. The visual editor was incredibly helpful for me. Being able to make these changes and see the results immediately gave me a lot of motivation to keep editing.

Andrew Lih introduced me and other BOHS interns to Wikidata, Wikipedia, and Wikimedia Commons. I began editing Wikidata and realized Wikidata was more intuitive for me than Wikipedia. I felt comfortable creating Wikidata properties in real time, like when Zaila Avant-garde won the 2021 Scripps National Spelling Bee. I utilized my Wikipedia and Wikidata skills for the Black Women in Food Smithsonian Edit-A-Thon. I created my first ever article for LaDeva Davis and created Wikidata properties for women featured in our Edit-A-Thon.

Before my internship ended, I wanted to complete a passion project so I created the Wikipedia page for the “Asian Americans (documentary series).” Creating this Wikipedia page meant a lot to me because I wanted to highlight Asian-American contributions on Wikipedia. I wanted the page to act in a similar way as the documentary and connect Asian-American Wikipedia pages together in a cohesive and contextually relevant way. Being especially thorough when it comes to Asian-American Wikipedia presence is important to me because omitting details felt like erasing Asian-American contributions all over again. 

By the end of my summer BOHS internship with the Smithsonian, I had created 6 Wikipedia pages and 24 Wikidata properties, added 294 references, and written ~36,000 words across Wikimedia platforms.

Autumn 2021

Mia Cariello

I had the privilege to continue my internship into the Fall. During the Fall, I presented at WikidataCon 2021, collaborated with the National Air and Space Museum on their Wikipedia Edit-a-Thon, and participated in Wikipedia:Asian Month (finishing at #18 out of 46 participants). During my fall internship I created 12 new Wikipedia articles and 45 Wikidata properties, added 1,170 references, and wrote ~188,000 words across Wikimedia. I did this all while completing my first semester of graduate school and working only 30 hours a week for AWHI.

Off-Wiki Outcomes

I took my knowledge of the Smithsonian, Wikipedia, and women’s (under)representation to the classroom. I taught undergraduate students how they can find information on women in Wikipedia and Museum databases. We discussed how the internet can replicate biases and how including marginalized groups onto Wikipedia and Wikidata could help combat this. My WikiWork found its way into other people’s classes as well. Professors thanked me for creating the Asian Americans (documentary series) page because they planned on using it in their own courses.

The opportunity AWHI’s BOHS Internship program provided me is invaluable. After completing my degree, I hope to pursue more work with Wikipedia and GLAM institutions. I hope that the Smithsonian and other GLAM institutions continue to create or expand their Wiki-programs. Future interns could potentially add millions of words across Wikimedia platforms and have the tools to create their own passion projects well after their internships have ended. 


Mentor Observations

Mia’s contributions to Wikimedia are incredible and far exceeded my expectations. She went from a complete newbie with zero edits to a superstar editor who is now considering Wikimedia and/or open access as a career. Her internship teaches us several things: that it’s possible to pilot Wikimedia-focused internships as a model for future engagement at GLAMs; that mentorship and focused Wikimedia guidance produce dedicated editors who care about our movement; and that interns have high editor retention after their official role has ended.

Mia participated in community campaigns that intersected with the focus of her internship like Wikipedia Asia Month, and quickly began to navigate between editing Wikipedia and Wikidata. She continues to find connections between Wikimedia and her graduate level coursework. Mia even incorporated Wikipedia into her Spring 2022 Women’s and Gender Studies course at Ohio State University. I’m hopeful that interns focusing on Wikimedia can become an integral part of future GLAM-Wiki engagement. 

This summer, I’m co-mentoring two more interns with the Smithsonian Asian Pacific American Center, focused on increasing the representation of Asian Pacific American women on Wikipedia and picking up on Mia’s successes in 2021. 


Learn more

Mia Cariello is currently pursuing a Master’s Degree in Women’s, Gender, and Sexuality Studies at The Ohio State University.

Kelly Doyle is the Open Knowledge Coordinator for the Smithsonian American Women’s History Initiative.

By Jesse Amamgbu and Isaac Johnson

Introduction

Every month, editors make somewhere between 10 and 14 million edits to Wikipedia content. While that is clearly a large amount of change, quantifying what each of those edits did is surprisingly difficult. This data could support new research into edit dynamics on Wikipedia, more detailed dashboards of impact for campaigns or edit-a-thons, and new tools for patrollers or inexperienced editors. Editors themselves have long relied on diffs of the content to visually inspect and identify what an edit changed.

Example wikitext diff of Lady Cockburn and Her Three Eldest Sons showing that User:BrownHairedGirl made an edit that inserted a new template, changed an existing template, and changed an existing category. Additional lines in the diff are shown by the tool for context to help in determining what this change did.
Figure 1: Example wikitext diff [Source]

For example, Figure 1 above shows a diff from the English Wikipedia article “Lady Cockburn and Her Three Eldest Sons” in which the editor inserted a new template, changed an existing template, and changed an existing category. Someone with a knowledge of wikitext syntax can easily determine that from viewing the diff, but the diff itself just shows where changes occurred, not what they did. The VisualEditor’s diffs (example) go a step further and add some annotations, such as whether any template parameters were changed, but these structured descriptions are limited to a few types of changes. Other indicators of change – the minor edit flag, the edit summary, the size of change in bytes – are often overly simplistic and at times misleading.

Our goal with this project was to generate diffs that provided a structured summary of the what of an edit – in effect seeking to replicate what many editors naturally do when viewing diffs on Wikipedia or the auto-generated edit summaries on Wikidata (example). For the edit to Lady Cockburn, that might look like: 1 template insert, 1 template change, 1 category change, and 1 new line across two sections (see Figure 2). Our hope is that this new functionality could have wide-ranging benefits:

  • Research: support analyses similar to Antin et al. about predictors of retention for editors on Wikipedia or more nuanced understandings of effective collaboration patterns such as Kittur and Kraut.
  • Dashboards: the Programs and Events Dashboard already shows measures of references and words added for campaigns and edit-a-thons, but could be expanded to include other types of changes such as images or new sections.
  • Vandalism Detection: the existing ORES edit quality models already use various structured features from the diff, such as references changed or words added, but could be enhanced with a broader set of features.
  • Newcomer Support: many newcomers are not aware of basic norms such as adding references for new facts or how to add templates. Tooling could potentially guide editors to the relevant policies as they edit or help identify what sorts of wikitext syntax they have not edited yet and introduce them to these concepts (more ideas).
  • Tooling for Patrollers: in the same way that editors can filter their watchlists to filter out minor edits, you could also imagine them setting more fine-grained filters, such as not showing edits that only change categories or whitespace.
A more visual wikitext diff on the left is summarized by the tool on the right to be 1 category change, 1 template change, 1 template insertion, and 1 whitespace insertion across 2 changed sections.
 Figure 2. Same wikitext diff of Lady Cockburn and Her Three Eldest Sons but with edit types output added to the left to show how the library describes the edit [Source]

Background

Automatically describing edits is not a new idea. While our requirements led us to build our own end-to-end system, we were able to build heavily on past efforts. Past approaches largely fall into two categories: human-intelligible and machine-intelligible. The diff in Figure 1 from Wikidiff2 is an example of a human-intelligible diff, generally only useful if someone who understands wikitext is interpreting it (a very valid assumption for patrollers on Wikipedia). This sort of diff has existed since the early 2000s (then called just Wikidiff).

Past research has also attempted to generate machine-intelligible diffs, primarily for machine-learning models to do higher-order tasks such as detecting vandalism or inferring editor intentions. These diffs are useful for models in that they are highly structured and quick to generate, but can be so decontextualized as to be non-useful for a person trying to understand what the edit did. An excellent example of this is the revscoring Python library, which provides a variety of tools for extracting basic features from edits, such as the number of characters changed between two revisions. Most notably, this library supported work by Yang et al. to classify edits into a taxonomy of intentions – e.g., copy-editing, wikification, fact-update. These higher-order intentions require labeled data, however, which is expensive to gather from many different language communities.
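To make the "basic features" idea concrete, a feature like the number of characters changed between two revisions can be computed with Python's standard-library difflib. This is a simplified stand-in for what feature-extraction libraries like revscoring provide, not revscoring's actual API:

```python
import difflib

def chars_changed(old, new):
    """Count characters deleted from `old` plus characters inserted in `new`."""
    matcher = difflib.SequenceMatcher(None, old, new)
    changed = 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            # (i2 - i1) chars removed from old, (j2 - j1) chars added in new.
            changed += (i2 - i1) + (j2 - j1)
    return changed
```

Features like this are cheap and structured, but, as noted above, a number on its own tells a human reader very little about what the edit actually did.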

We instead focus on identifying changes at the level of the different components of wikitext that comprise an article – e.g., categories, templates, words, links, formatting, images [1]. The closest analog to our goals and a major source of inspiration was the visual diffs technology, which was built in 2017 in support of VisualEditor. While its primary goal is to be human-intelligible, it does take that additional step of generating structured descriptions of what was changed for objects such as templates.

Implementation

The design of our library is based heavily on the approach taken by the Visual Diffs team [2] with four main differences:

  1. Visual diffs is written in JavaScript, while we work in Python to allow for large-scale analyses and to complement a suite of other Python libraries intended to support Wikimedia researchers.
  2. Visual diffs works with the parsed HTML content of the page, not the raw wikitext markup. Because the parsed content of pages is not easily retrievable in bulk or historically, we work with the wikitext and parse the content to convert it into something akin to an HTML DOM.
  3. We do not need to visually display the changes, so we relax some of the constraints of Visual diffs, especially around determining exactly which words were changed and how.
  4. We need broader coverage of the structured descriptions – i.e., not just specifics for templates and links, but also how many words, lists, references, etc. were edited.

There are four stages between the input of two raw strings of wikitext (usually a revision of a page and its parent revision) and the output of what edit actions were taken:

  1. Parse each version of wikitext and format it as a tree of nodes – e.g., a section with text, templates, etc. nested within it. For the parsing, we depend heavily on the amazingly powerful mwparserfromhell library.
  2. Prune the trees down to just the sections that were changed – a major challenge with diffs is balancing accuracy with computational complexity. The preprocessing and post-processing steps are quite important to this.
  3. Compute a tree diff – i.e. identify the most efficient way (inserts, removals, changes, moves) to get from one tree to the other. This is the most complex and costly stage in the process.
  4. Compute a node diff – i.e. identify what has changed about each individual element. In particular, we do a fair bit of additional processing to summarize what changed about the text of an article (sentences, words, punctuation, whitespace). It is at this stage that we could also compute additional details such as exactly how many parameters of a template were changed etc.
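The shape of the final output can be caricatured in a few lines of standard-library Python. This toy version uses naive regexes in place of mwparserfromhell, skips the tree diff entirely, and handles none of wikitext's edge cases; it only illustrates what a structured edit summary looks like:

```python
import re
from collections import Counter

# Naive patterns standing in for a real wikitext parser: they break on
# nesting, galleries, and many other edge cases the post describes.
PATTERNS = {
    "template": re.compile(r"\{\{([^{}|]+)"),
    "category": re.compile(r"\[\[Category:([^\]|]+)", re.IGNORECASE),
    "wikilink": re.compile(r"\[\[(?!Category:)([^\]|]+)", re.IGNORECASE),
}

def node_counts(wikitext: str) -> Counter:
    """Count nodes of each type in a string of wikitext."""
    counts = Counter()
    for node_type, pattern in PATTERNS.items():
        counts[node_type] += len(pattern.findall(wikitext))
    return counts

def simple_edit_types(old: str, new: str) -> dict:
    """Summarize an edit as a net change per node type (toy version)."""
    old_c, new_c = node_counts(old), node_counts(new)
    return {t: new_c[t] - old_c[t]
            for t in set(old_c) | set(new_c)
            if new_c[t] != old_c[t]}

old = "Hello [[world]]. {{stub}}"
new = "Hello [[world]] and [[moon]]. {{stub}} [[Category:Greetings]]"
print(simple_edit_types(old, new))  # one wikilink and one category added
```

The real pipeline differs in every stage: it parses with mwparserfromhell, prunes to changed sections, and computes a proper tree diff so that moves and in-place changes are distinguished from inserts and removals.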

Learnings

Testing was crucial to our development. Wikitext is complicated and has lots of edge cases – images appear in brackets…except for when they are in templates or galleries. Diffs are complicated to compute and have no one right answer – editors often rearrange content while editing, which can raise questions about whether content was moved with small tweaks or larger blocks of text were removed and inserted elsewhere. Interpreting the diff in a structured way forces many choices about what counts as a change to a node – is a change in text formatting just when the type of formatting changes, or also when the content within it changes? Does the content in reference tags contribute to word counts? Tests forced us to record our expectations and hold ourselves accountable to them, something the Wikidiff2 team also discovered when they made improvements in 2018. No amount of testing could truly cover the richness of Wikipedia, though, so we also built an interface for testing the library on random edits, which let us slowly identify edge cases that we hadn’t imagined.

Parsing wikitext is not easy, and though we could thankfully rely on the mwparserfromhell library for much of this, we also made a few tweaks. First, mwparserfromhell treats all wikilinks equally, regardless of their namespace. This is because identifying the namespace of a link is non-trivial: the namespace prefixes vary by language edition, and there are many edge cases. We decided to differentiate between standard article links, media links, and category links as the three most salient types of links in Wikipedia articles. To assist with this, we extracted a list of valid prefixes for each language from the Siteinfo API – a simple solution, but one that will occasionally need to be updated to the most recent list of languages and aliases. Second, mwparserfromhell has a rudimentary function for removing the syntax from content and leaving just plaintext, but it was far from perfect for our purposes. For instance, because mwparserfromhell does not distinguish between link namespaces, parameters for image size or category names are treated as text. Content from references is included (if not wrapped in a template) even though these notes do not appear in-line and are often just bibliographic. We wrote our own wrapper for deciding what counts as text, so that the outputs more closely adhered to what we considered to be the textual content of the page.
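The link-classification step can be sketched as follows. The prefix sets below are a hardcoded English-language sample for illustration; the library itself builds per-language lists from the Siteinfo API (action=query&meta=siteinfo&siprop=namespaces|namespacealiases):

```python
# Illustrative English prefixes only; real lists vary by language edition
# and include aliases (e.g. "Image" for "File").
MEDIA_PREFIXES = {"file", "image"}
CATEGORY_PREFIXES = {"category"}

def classify_link(target: str) -> str:
    """Classify a wikilink target as 'media', 'category', or 'article'."""
    prefix, sep, _ = target.partition(":")
    if sep:
        p = prefix.strip().lower()
        if p in MEDIA_PREFIXES:
            return "media"
        if p in CATEGORY_PREFIXES:
            return "category"
    # No recognized namespace prefix: treat as a standard article link.
    return "article"

print(classify_link("File:Spot the difference.jpg"))  # media
print(classify_link("Category:Paintings"))            # category
print(classify_link("Lady Cockburn"))                 # article
```

Even this sketch shows why per-language prefix lists are needed: "Catégorie:" on French Wikipedia or "Datei:" on German Wikipedia would fall through to "article" without them.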

It is not easy to consistently identify words or sentences across Wikipedia’s over 300 languages. Many languages (like English) are written with words that are separated by spaces. Many languages are not, though, either because the spaces actually separate syllables or because there are no spaces between characters at all. While the former are easy to tokenize into words, the latter set of languages requires specialized parsing or a different approach to describing the scale of changes. For now, we have borrowed a list of languages that would require specialized parsing and report character counts for them as opposed to word counts (code). For sentences, we aim for consistency across languages. The challenge is constructing a global list of punctuation that is used to indicate the ends of sentences, including Latin scripts like in this blog post, as well as characters such as the danda or the many CJK punctuation marks. Challenges like these remind us of the richness and diversity of Wikipedia.
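A minimal sketch of that fallback logic (the language set here is an illustrative sample, not the library's actual list):

```python
import re

# Languages written without spaces between words would need specialized
# tokenization; for those, report character counts instead of word counts.
# Hardcoded illustrative sample, not the library's actual list.
NO_SPACE_DELIMITED = {"zh", "ja", "th"}

def text_size(text: str, lang: str) -> tuple:
    """Return (unit, count) describing the size of a span of text."""
    if lang in NO_SPACE_DELIMITED:
        # Count non-whitespace characters.
        return ("characters", len(re.sub(r"\s", "", text)))
    # Whitespace-delimited language: naive word tokenization.
    return ("words", len(text.split()))

print(text_size("The quick brown fox", "en"))  # ('words', 4)
print(text_size("敏捷的棕色狐狸", "zh"))         # ('characters', 7)
```

Reporting the unit alongside the count keeps the output honest: a "4-word change" and a "4-character change" are not comparable quantities across languages.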

What’s next?

We welcome researchers and developers (or folks who are just curious) to try out the library and let us know what you find! You can download the Python library yourself or test out the tool via our UI. Feedback is very welcome on the talk page or as a Github issue. We have showcased a few examples of how to apply the library to the history dumps for Wikipedia or use the diffs as inputs into machine-learning models. We hope to make the diffs more accessible as well so that they can be easily used in tools and dashboards. 

While this library is generally stable, our development is likely not finished. Our initial scope was Wikipedia articles with a focus on the current format and norms of wikitext. As the scope for the library expands, additional tweaks may be necessary. The most obvious place is around types of wikilinks. Identifying media and category links is largely sufficient for the current state of Wikipedia articles, but applying this to e.g. talk pages would likely require extending this to at least include User and Wikipedia (policy) namespaces (and other aspects of signatures). Extending to historical revisions would require separating out interlanguage links.

We have attempted to develop a largely feature-complete diff library, but, for some applications, a little accuracy can be sacrificed in return for speed. For those use-cases, we have also built a simplified version that ignores document structure. The simplified library loses the ability to detect content moves or tell the difference between e.g., a category being inserted and a separate one being removed vs. a single category being changed. In exchange, it has an approximately 10x speed-up and far smaller footprint, especially for larger diffs. This can actually lead to more complete results when the full library otherwise times out.
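The trade-off can be seen in a small standard-library sketch that stands in for the simplified approach: because nodes are collected as an unordered bag, a renamed category is indistinguishable from one removal plus one insertion:

```python
import re
from collections import Counter

def category_bag(wikitext: str) -> Counter:
    """Collect categories as an unordered bag, ignoring document structure.

    Toy stand-in for the simplified, structure-ignoring diff; the naive
    regex handles none of wikitext's edge cases.
    """
    return Counter(re.findall(r"\[\[Category:([^\]|]+)\]\]", wikitext))

old = "Text. [[Category:Painters]]"
new = "Text. [[Category:Italian painters]]"

removed = category_bag(old) - category_bag(new)
inserted = category_bag(new) - category_bag(old)
# A structure-aware tree diff could report this as one *changed* category;
# the bag view can only report one removal plus one insertion.
print(dict(removed), dict(inserted))
```

Giving up that distinction (and move detection) is what buys the simplified library its speed and memory savings.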

[1] For the complete list, see: https://meta.wikimedia.org/wiki/Research:Wikipedia_Edit_Types#Edit_Types_Taxonomy 

[2] For more information, see this great description by Thalia Chan from Wikimania 2017: https://www.mediawiki.org/wiki/VisualEditor/Diffs#Technology_used 

About this post

Featured image credit: File:Spot the difference.jpg by Eoneill6, licensed under Creative Commons Attribution 4.0 International

Figure 1 credit: File:Wikitext diff example.png by Isaac (WMF), licensed under the Creative Commons Attribution-Share Alike 4.0 International license.

Figure 2 credit: File:Edit types example.png by Isaac (WMF), licensed under the Creative Commons Attribution-Share Alike 4.0 International license.

We are excited to be dropping the 3rd episode of WIKIMOVE, our podcast on everything Wikimedia Movement Strategy. In this episode we talk about innovation and explore the opportunities created by the UNLOCK accelerator within our movement and beyond. 

Good news!

Our podcast is now available via RSS feed on Acast, Spotify, Soundcloud, Stitcher and Castbox. More podcast platforms will follow. 

The video version of our show is also available on YouTube with English subtitles. 

What’s in this episode? 

We are looking back at years of complaints about how Wikimedia technology is outdated and exclusive. Non-encyclopedic forms of knowledge are still impossible or hard to insert into our existing formats. The last big innovation from our movement is Wikidata, which is now almost ten years old. Movement Strategy calls on us to innovate our technical and social systems so that new and marginalized communities can join and share their knowledge. We talk about the UNLOCK accelerator program, how it is being implemented in collaboration with WMS and WMDE this year, and explore how the movement can become more of an innovation ecosystem.

Our guests are…

Kannika Thaimai, Program lead of the UNLOCK accelerator at Wikimedia Deutschland

Ivana Madžarević, Program and Community Support Manager at Wikimedia Serbia

Please visit our meta page to react to the episode and subscribe to our newsletter to get notified of each new release. 

We wish you all a summer break and will be back in August with our next episode! 

Tech/News/2022/26

21:29, Monday, 27 2022 June UTC

Other languages: Bahasa Indonesia, Deutsch, English, français, italiano, magyar, polski, português, português do Brasil, čeština, русский, українська, עברית, العربية, فارسی, বাংলা, 中文, 日本語, 한국어

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

Changes later this week

  • The new version of MediaWiki will be on test wikis and MediaWiki.org from 28 June. It will be on non-Wikipedia wikis and some Wikipedias from 29 June. It will be on all wikis from 30 June (calendar).
  • Some wikis will be in read-only mode for a few minutes because of a switch of their main database. This will be performed on 28 June at 06:00 UTC (targeted wikis). [1]
  • Some global and cross-wiki services will be in read-only mode for a few minutes because of a switch of their main database. This will be performed on 30 June at 06:00 UTC. It will impact ContentTranslation, Echo, StructuredDiscussions, Growth experiments, and a few more services. [2]
  • Users will be able to sort columns within sortable tables in the mobile skin. [3]

Future meetings

  • The next open meeting with the Web team about Vector (2022) will take place tomorrow (28 June). The following meetings will take place on 12 July and 26 July.

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

Woman working a seaweed farm in Zanzibar. Farmers have had a lot of problems growing seaweed due to climate change. Two decades ago, 450 seaweed farmers worked in Paje; now, only about 150 farmers remain.

The Wikimedia Foundation continues its effort to ensure that our work and mission support are in step with a more sustainable world. We recently published our annual environmental sustainability (carbon footprint) report that covers calendar year 2021, while also looking forward to our efforts in 2022 and beyond. This report is part of an ongoing annual series of carbon footprint reports and renews our commitment to be conscious of our overall environmental impact while we work to make free knowledge available to every human being. 

The details of our environmental sustainability carbon footprint report can be found on Wikimedia Commons, which holds much of the media used on Wikipedia and its sister projects. A summary of the report is below.

• • • 

The Wikimedia Foundation’s overall carbon emissions continue to be at a lower level than in previous years due to the COVID-19 global pandemic and associated business travel restrictions. Our overall emissions decreased by 7% in 2021 – a smaller decline than expected, but one that reflects updates to our calculation methodology, particularly as it relates to our data center emissions.

Our overall carbon footprint for calendar year 2021 was approximately 1,090 metric tons CO2-equivalent (mtCO2e):

  • 0.47 mtCO2e is attributed to the natural gas and refrigerants used in the San Francisco office: scope 1 emissions
  • 15.7 mtCO2e is attributed to the electricity and steam used in the San Francisco office: scope 2 emissions
  • 1,074 mtCO2e was due to electricity usage in our data centers and the San Francisco office: scope 3 emissions
  • A negligible amount of our overall emissions were associated with travel and commuting
Wikimedia Foundation Carbon Footprint – year over year graph

Key updates to our methodology

We have reallocated our data center electricity emissions from scope 2 to scope 3 category, based on guidance outlined in the GHG Protocol (using the category 8: upstream leased assets definition). The electricity used by our servers is procured by our vendors and is included in their greenhouse gas inventory as scope 2 emissions. 

  • By moving these emissions to the scope 3 category, we will correct previous double counting of these numbers in earlier reports.

Electricity use by the San Francisco server room has been folded into overall office electricity consumption. Because the electricity used by the San Francisco office’s server room is directly related to office operations and does not support the running of our sites, we have reassigned the Foundation’s San Francisco office server room emissions from the Data Center functional category to the Office functional category.

Exclusion of data center water usage. Due to challenges related to data collection and the relatively insignificant impact that data center water use has on our overall carbon footprint, we have decided to exclude emissions related to data center water management from our reporting going forward. 

  • Several of our data centers already use water-free cooling or operate on closed-loop systems, and recently the codfw data center site in Carrollton, Texas, upgraded to water-free cooling.
Wikimedia Foundation Carbon Footprint report, year over year, by scope graph

Impacts in 2021

Overall travel was significantly reduced as several large attendance annual events were held virtually and business travel was restricted to essential purposes only.

New caching servers were added in France to better serve our African, Middle Eastern, and Asian audiences. This new location utilizes a river water cooling system and has a designed Power usage effectiveness (PUE) of 1.23.

A special discussion with Louise Mabulo, hosted by the Foundation, focused on sustainable agriculture, local food production, climate change mitigation, and the relationship to broader information ecosystems; an associated editathon was also held.

An Earth Day editathon was held for Foundation staff and contractors, in their volunteer capacities, as part of our third annual Earth Day celebration.

The #WikiForHumanRights 2021 campaign Right to a Healthy Environment drew 300+ participants at 34+ community events, working on 2,000+ articles in 30+ languages to encourage the expansion of Wikipedia’s climate change content.

Approximately 324 million annual page views were recorded on Wikipedia across 26,000 articles explicitly about climate change in 2021, with billions more page views on other climate-related topics.

Wikimedia Foundation Carbon Footprint Report, year over year, by functional area graph

Looking forward – 2022 and beyond

The Wikimedia Foundation has decided to discontinue direct acceptance of cryptocurrency as a means of donating. We began our direct acceptance of cryptocurrency in 2014 based on requests from our volunteers and donor communities. We are making this decision based on recent feedback from those same communities. Specifically, we will be closing our Bitpay account, which will remove our ability to directly accept cryptocurrency as a method of donating.

Travel restrictions have eased and new guidance is in place on how we travel. In 2022, the Wikimedia Foundation All Hands annual event will be held virtually, and the community event Wikimania will be a primarily virtual event with support for local gatherings where possible.

Beginning in fiscal year 2023 (July 2022 – June 2023), we will charge ourselves an internal carbon fee, per metric ton of carbon equivalent, calculated for the Foundation’s scope 1, 2, and 3 carbon emissions. We will direct these funds to support community-led initiatives focused on knowledge gaps and “topics for impact” in the environmental sustainability and sustainable development space. This is an experimental effort and more details will be released soon. 

For more information about the Wikimedia Foundation’s sustainability efforts, please visit our sustainability page; you can also ask us a question on the discussion page at meta.wikimedia.org/wiki/Talk:Sustainability.

[Editor’s note: This updated article restates information related to cryptocurrency as a means of donation and a new internal carbon fee.]

Since 2019, the Web team at the Wikimedia Foundation has been working to create a better desktop experience for Wikipedia readers and community members, regardless of their language or location. Our focus is on creating a more welcoming experience, ensuring that anyone who comes to our sites can easily find the knowledge they need, and for those interested, the tools they need to begin their journey towards contributing.

The process of changing Wikipedia is a complex one — we have worked feature by feature, with the global volunteer community, to optimize every important function or tool to serve the various needs of our diverse contributors and readers. We began by making changes to the navigation, introducing a collapsible sidebar and a limited content width, and continued by refining some of our most used tools, like search and language switching. Since our last post on Diff, we have made even more significant changes to the experience. These changes, explained below, are meant to make our interface welcoming, intuitive, and easy to use.

User Menu

Becoming a Wikipedia editor is exciting and rewarding. Yet it often takes a while to learn the details of the site and its navigation, and to understand the tools necessary to edit and participate in conversations successfully.

Unfortunately, our previous navigation system did not provide new editors with much guidance on this journey. The top right corner of the page was full of links: user name, talk, sandbox, preferences, beta, watchlist, contributions, log out. Not only did these take up important space on the page, making the interface feel cluttered and unfocused; they also did not indicate their purpose or how, or whether, they were related to one another.

These links are what we call the user tools — instrumental to helping a user become an efficient editor. They provide entry points to a user’s sandbox — the place where they can begin working on articles, their personal talk page where they can discuss questions and needs with other editors, their preferences where they can set the settings and customizations they need from the interface, and more.

The lack of cohesion between these links often led to confusion. For example, when two links on a page are both named “talk”, how can we set an expectation that they lead to different places?

Our solution was simple: gather all links that pertained to the user and their personal tools under a single menu. This not only reduced visual clutter, but also created a better-organized interface in which it is clear where personal links are located.

Sticky Header

Currently, many of the essential tools that readers or editors use when browsing or contributing to Wikipedia are located at the top of the page. This is where people search for a new article to read, reference the history of edits and versions an article went through, or access the discussion on an existing article to ask important questions before making an edit. These functionalities are vital to the usage of Wikipedia. Yet, previously, they were only available in a single location — at the very top of the page.

Imagine you are an editor reading an article and you notice something off — a sentence that doesn’t make sense or a date that is clearly wrong — a potential case of vandalism. Your next step would be to go to the history page immediately to see when the last change was made, and who made it. Previously, you would have to scroll to the top of the page every single time. With short articles, this wouldn’t be an issue: with a quick scroll you are there. For a longer article, however, you would find yourself scrolling for a while, losing valuable time in the process.

We decided to improve this by making certain important functionality accessible throughout the page via a fixed or sticky header.

Our first task here was deciding which functionality to include. To answer this question, we looked at two things: 1) the data we already had, such as which links were getting the most clicks, and 2) the needs of users in different areas, especially within emerging markets. We were able to work with the design research team and a number of contractors to survey multiple communities of readers and editors and do in-depth testing and research to define this. Overall, we performed testing in Ghana, Argentina, and Indonesia. This allowed us to select the types of links to be included in the sticky header. We then continued with further testing within our communities, receiving input across 30 languages and iterating on the feedback.

Our new persistent or “sticky” header allows direct access to the most important functionality for readers and editors — search, language switching, links to history and talk pages, and more — decreasing by 16 percent the need to scroll to the top of the page in order to use a given link.

Table of Contents

Similar to the header, the table of contents was previously only available at the very top of the page, causing unnecessary scrolling. Unlike the sticky header, which was more closely related to the tools, the table of contents and its location affected the perception of the content itself.

Important context about the page or article — how long it is, how many sections it has, and what their content is — was lost once you scrolled down. Jumping from section to section was impossible, locking the user into reading the article linearly and spending a long time finding the exact piece of information they were looking for.

By making the table of contents persistent as you scroll, we make sure that people have the necessary context they need, as well as the ability to jump from section to section whenever they want.

As with the sticky header, we were able to test different versions of the table of contents with new and existing readers in three of our target markets – Ghana, Argentina, and Indonesia. This testing was crucial. It allowed us to narrow down multiple prototypes to one single design, define the basic requirements for the feature, and identify what readers really needed from a table of contents.

Next, we tested with our editors across 30 languages. They focused on the details, identifying their special needs, and edge cases that only people with a lot of experience on the site would notice.

Currently, our new table of contents is available on our pilot wikis, as well as for all those who have opted into the new experience.

Next Steps

We are excited to share that we are approaching the last stages of the Desktop Improvements Project. We will be putting the final touches on our current work, and we hope to bring the updated desktop to every Wikipedia in the coming months. So far, the latest iteration of the desktop has been deployed on 31 language versions, and will include Japanese Wikipedia next.

We welcome feedback on the ideas presented above, as well as any others that might help in creating a more welcoming and intuitive interface for both readers and editors. If you have an account, you can also follow along with our changes as they come to you – just go to your Preferences page and select the Vector (2022) option in the second (Appearance) tab. You can also contact us on MediaWiki. Reach out to us with your thoughts in any language!

Tech News issue #26, 2022 (June 27, 2022)

00:00, Monday, 27 2022 June UTC

Tech News: 2022-26

weeklyOSM 622

10:13, Sunday, 26 2022 June UTC

14/06/2022-20/06/2022

lead picture

Osmose using open data in France and Spain now [1] © Osmose | map data © OpenStreetMap contributors

Breaking news

  • The next OSMF Board meeting will take place on Thursday 30 June 2022, at 13:00 UTC via the OSMF video room (which opens about 20 minutes before the meeting). The draft agenda is available on the wiki. The topics to be covered are:
    • Treasurer’s report
    • Updated membership prerequisites plan
    • Consider directing the OWG to cut access off due to attribution or other
      legal policy reasons, if flagged by the LWG
    • OSM Carto
    • OSM account creation API
    • Advisory Board – monthly update
    • Presentation by Mapbox Workers Union
    • Guest comments or questions.

Mapping

  • ViriatoLusitano has updated (pt) > de his very detailed and richly illustrated guide describing how to integrate data from the National Institute of Statistics (INE) into OSM, with names and georeferenced boundaries of different urban agglomerations.
  • Anne-Karoline Distel made a short report on her mapping trip to North Wales.
  • At this year’s SotM France conference, Stéphane Péneau gave (fr) an overview of street-level imagery, from hardware choice to file management.
  • Requests have been made for comments on the following proposals:
    • school=entrance to deprecate the use of the tag school=entrance.
    • exit=* to deprecate entrance=exit, entrance=emergency, and entrance=entrance in favour of clearer tags.
    • Emergency access and exits to address issues with the current tagging of these items.
    • aeroway=stopway for mapping the area beyond the runway that has a full-strength pavement able to support aircraft, which can be used for deceleration in the event of a rejected take-off.
    • runway=displaced_threshold for mapping the part of a runway which can be used for take-off, but not landing.
    • school:for=* a tag for schools to indicate what kinds of facilities are available for special needs students.
    • information=qr_code for tagging a QR code that provides information about a location of interest to tourists.
  • Voting on the pitch:net=* proposal, for indicating if a net is available at a sports pitch, is open until Saturday 2 July.
  • Voting on the following proposals has closed:
    • aeroway=aircraft_crossing to mark a point where the flow of traffic is impacted by crossing aircraft, was approved with 14 votes for, 0 votes against and 0 abstentions.
    • substation=* to improve tagging of power substations and transformers mixing on the same node, was approved with 11 votes for, 1 vote against and 0 abstentions.

Community

  • In the 133rd episode of the Geomob Podcast, Muki Haklay, Professor of Geoinformatics at UCL, an early adopter of combining geography with computer science and one of the earliest supporters of OpenStreetMap, is the guest. There is a discussion about extreme Citizen Science.
  • Nathalie Sidibé (fr) > de, from OSM Mali, is now involved in another community: Wikipedia! Her commitment to the Malian community, to open source data, and of course to OSM has already been featured in several profiles. Now there is her full biography (fr) > de and an initiative of the ‘Les sans pagEs’ (fr) > de women geographers project.

Imports

  • Daniel Capilla provided (es) > de an update about the import of Iberdrola charging stations for electric vehicles in Malaga, which is now complete. The data is available under an open licence from the Municipality of Malaga (Spain). He maintains a corresponding wiki page for the documentation and coordination of open data imports.

Events

  • YouthMappers UMSA, a recently opened chapter of YouthMappers in Bolivia, tweeted (es) about their first OpenStreetMap training activity on 22 June.
  • Videos of the presentations at the SofM-Fr 2022 conference are now available (fr) online. A session listing for the conference, which was held 10 to 12 June in Nantes, is available (fr) > en on their website.

Education

  • Anne-Karoline Distel explained in a new video how to add running trails to OpenStreetMap.
  • Astrid Günther explained, in a tutorial, how she created vector tiles for a small area of Earth and hosts them herself.

OSM research

  • Youjin Choe, a PhD student in Geomatics at the University of Melbourne, Australia, is looking for your advice on a potential focus group study on the design of the OSM changeset discussion interface. Her research topic is on the user conflict management process in online geospatial data communities (which has mixed components of GIS, HCI, and organisational management).

Maps

  • Hub and spoke is a map that shows the 10 nearest airports to a given position.
  • CipherBliss published (fr) a thematic map of places to eat based on OpenStreetMap, ‘Melting Pot’ (fr) > en.

Open Data

  • [1] Osmose is now using open data to compare against OpenStreetMap data to find any missing roads or power lines in OSM. At present comparisons are made for power lines in France and highways in Spain.

Software

  • The first version of ‘Organic Maps’, a free and open source mobility map app for Android and iOS, was released (ru) > en last June (2021). After more than 100,000 installations and one year of intensive development work, the results and plans for the future are presented.

Programming

  • The new OSM app OSM Go! is looking for translators and developers.

Releases

  • Version 17.1 of the Android OSM editor Vespucci has been released.
  • With version StreetComplete v45.0-alpha1 Tobias Zwick introduced the new overlays functionality.

Did you know …

  • … that there are apps out there helping you find windy roads? Curvature, Calimoto and Kurviger are just some examples.
  • … the MapCSS style for special highlighting of bicycle infrastructure in JOSM?
  • HistOSM.org, which renders historical features exclusively?
  • … the Japanese local chapter of OSMF, OSMFJ, maintains a tile server and also offers a vector tile download service (via user smellman)? More details are on the wiki (ja) > en.

OSM in the media

  • OpenStreetMap featured (fr) > en (see video (fr)) in an overview of a wide range of modern mapping technologies in a segment on the France24 news channel. The OSM examples were: participative mapping in Africa (3m17s); and Grab’s use of OSM in South-East Asia (4m10s), which allows them, unlike other map providers, to take into account the reality of Asia, with rainy seasons and many narrow roads. Other topics include Apple’s 3-D visualisation of Las Vegas, 360 degree image capture, indoor mapping, and geoblocking.

Other “geo” things

  • Matthew Maganga wrote, in ArchDaily, about the inequalities created through modern mapping methods and especially Google StreetView.
  • Google Earth blogged about how they process Copernicus Sentinel-2 satellite images daily to create a current and historical land cover data set.
  • Saman Bemel Benrud, an early Mapbox employee, looked back at the 12 years he worked at the company and described how it changed over time – leading to a failed attempt to found a union, which was part of the reason he left the company last year.
  • Canada and Denmark had a decades-long land dispute, called the Whisky War, over an uninhabited Arctic island between Nunavut and Greenland. Following an agreement to divide control of Hans Island / Tartupaluk / ᑕᕐᑐᐸᓗᒃ, Canada now has a land border with a second country after the United States. Note that Canada also shares a maritime border with a second European country (France) near Newfoundland (second because Greenland is a constituent country of the Kingdom of Denmark).

Upcoming Events

Where | What | When
Arlon | EPN d’Arlon – Atelier ouvert OpenStreetMap – Contribution | 2022-06-28
Hlavní město Praha | MSF Missing Maps CZ Mapathon 2022 #2 Prague, KPMG office (Florenc) | 2022-06-28
City of New York | A Synesthete’s Atlas (Brooklyn, NY) | 2022-06-29
Roma | Incontro dei mappatori romani e laziali | 2022-06-29
[Online] | OpenStreetMap Foundation board of Directors – public videomeeting | 2022-06-30
Washington | A Synesthete’s Atlas (Washington, DC) | 2022-07-01
Essen | 17. OSM-FOSSGIS-Communitytreffen | 2022-07-01 – 2022-07-03
 | OSM Africa July Mapathon: Map Liberia | 2022-07-01
 | OSMF Engineering Working Group meeting | 2022-07-04
臺北市 | OpenStreetMap x Wikidata Taipei #42 | 2022-07-04
London | Missing Maps London Mapathon | 2022-07-05
Berlin | OSM-Verkehrswende #37 (Online) | 2022-07-05
San Jose | South Bay Map Night | 2022-07-06
Salt Lake City | OSM Utah Monthly Meetup | 2022-07-07
Fremantle | Social Mapping Sunday: Fremantle | 2022-07-10
München | Münchner OSM-Treffen | 2022-07-12
Hamburg | Hamburger Mappertreffen | 2022-07-12
Landau an der Isar | Virtuelles Niederbayern-Treffen | 2022-07-12
Salt Lake City | OSM Utah Monthly Meetup | 2022-07-14

Note:
If you would like to see your event here, please add it to the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, Nordpfeil, PierZen, SK53, SeverinGeo, Strubbl, Supaplex, TheSwavu, YoViajo, derFred.

Women’s suffrage and the Hunger Strike Medal

16:16, Friday, 24 June 2022 UTC

Dr Sara Thomas, Scotland Programme Coordinator for Wikimedia UK

On International Women’s Day, I ran training for long-term Wikimedia UK partners Protests & Suffragettes and Women’s History Scotland. The editathon focused on Scottish Suffrage(ttes), and is just one of a series of events that they’ll be running over the next few months.  

A few days after the event, I was tagged in a brilliant Twitter thread from one participant and new Wikipedia editor, Becky Male. Becky had been working on the Hunger Strike Medal article. I was really struck not only by her new-found enthusiasm for Wikipedia editing, but also by this quote: “Knowledge activism matters because, for most people, Wikipedia is their first port of call for new info. I did the Cat and Mouse Act in GCSE History. Don’t remember learning about the medal or the names of the women.”

We often talk about Knowledge Activism in the context of fixing content gaps that pertain to voices and communities left out by structures of power and privilege, and how the gender gap manifests in different ways on-wiki. I thought that this was a great example of how the Wikimedia community’s work is helping to address those gaps, so I reached out to Becky to ask if she’d like to write a blog for us which you can read below. Thanks Becky!

Picture of the English suffragette Emily Davison, date unknown, but c.1910-12. CC0.

By Becky Male, @beccamale

Joining Wikipedia was one of those things I’d thought about doing from time to time – I’d come across an article that was woefully short and think to myself “someone should probably do something about that”. But fear of accidentally breaking something stopped me.

But then it’s International Women’s Day, and Women’s History Scotland, Protests & Suffragettes and Wikimedia UK are organising an Editathon to get some of the information P&S has found – they’ve created fantastic educational resources on the Scottish suffrage movement – added to Wikipedia. This is the Knowledge Gap: even when things are known about women, that knowledge hasn’t made it on to Wikipedia. It’s most people’s first port of call for new information, which makes this a big problem.

So I signed up and did the intro tutorial. A misspent adolescence on LiveJournal meant the leap from basic HTML to editing in source was fairly small. And there’s something about sitting in a Zoom call of two dozen women, all a bit nervous about this process too and being told “It’s okay, you really can’t screw this up that badly” that’s genuinely reassuring – failure’s a lot less scary when you’ve got backup.

Offline, I volunteer at Glasgow Women’s Library digitising artefacts. Creating the article on the Suffragette Penny sounded like a perfect extension of that. But it was wisely suggested that I should pick an existing article for my first. The Hunger Strike Medal needed work and was similar enough to get me started.

I studied the Cat and Mouse Act for GCSE History, so I already had some background knowledge of the suffragette tactic of hunger striking. I cleaned up the lead, separated the information into sections and added a few other interesting titbits – as I learned at the Editathon, Wikipedia users love trivia. But the biggest change I made was to the list of medal recipients.

The medal was the WSPU’s highest honour – not only had a woman been gaoled for her beliefs, she’d risked her life and health for the cause. The hunger strikes and subsequent force-feeding by prison authorities contributed to early deaths, caused serious illnesses, and destroyed women’s mental health. They suffered horrifically and their sacrifices deserve to be remembered.

The list is now over 90 names, each one sourced, each medal confirmed. Some I found in books, maybe just one line about them. Others I found with a Google search, the suggested images showing me new medals the deeper I went, leading me to the sites of auction houses and local museums. My favourites, though, are in newsreels from 1955, women well into their 60s still proudly wearing their medals.

There are another 60+ hunger strikers whose medals haven’t been found yet. I moved some names to the Talk page where the evidence didn’t support their place on the list. I can’t say for sure that this is the most comprehensive list of WSPU hunger strikers, but I think it’s likely – I certainly haven’t found one anywhere else.

And I’ve still got that Suffragette Penny article to write.

Militant suffragette Janie Terrero (1858-1944) wearing her Hunger Strike Medal and Holloway brooch c1912. CC0.

The post Women’s suffrage and the Hunger Strike Medal appeared first on WMUK.

The Global Data & Insights team’s Equity Landscape metrics project completed the first pilot cohort for data use and consultation in Feb-March 2022.

The purpose of the pilot and consultation was to gather feedback and suggestions from community use case participants either reflecting on the project and metrics design, or directly piloting the dashboard design options and metrics to understand their geographic space in terms of movement engagement signals.

The Pilot

The first cohort of the pilot launched in February with nine participants, who joined under a non-disclosure agreement (NDA) as one of the targeted community audiences. After an initial Doodle poll to arrive at a common meeting time, two office hour slots were selected based on the time preferences indicated by the participants. The virtual office hour sessions were hosted on Google Meet and offered with two options in February: the first, on a Friday from 4-5 PM UTC, with seven participants attending; the second, on a Tuesday from 5-6 PM UTC, with five.

The initial pilot session provided a basic overview of the pilot and data privacy boundaries, as well as a demonstration on how to navigate the data of the equity landscape within the dashboard testing space. The following sessions provided more demonstrations and specific walk-throughs with the data to answer key use case questions which had been selected as most relevant by the pilot cohort:

  1. What countries may be underrepresented in the grants or affiliates ecosystem? 
  2. How many other affiliates may be competing for resources in this same space? 
  3. Which countries have a strong affiliate presence and reader base but are lacking in editorship? 
  4. To what extent does a country engage in various languages and projects? 
  5. Does a country have a well-balanced movement organizer ecosystem to support an international event or extensive collaborative partnership?  

All demonstrations were recorded and made available to those who could not attend the sessions live. Upon the group’s request, we continued to host optional office hours every two to three weeks for technical support and for pilot participants to share back their experiences. We also asked pilot participants to alert us to bugs in the data, and help us to understand barriers within their user experience which would be used to improve our tooling design to ensure accessibility.

The Outcome

Throughout the pilot sessions, community participants provided input and feedback via email and direct talk page comments on the ten Directed Review Questions and Design Considerations. We captured and analyzed roughly 29 points of input: meta portal comments from the 5 community commenters who were actively involved, as well as participant feedback gathered during the course of our first pilot sessions with 9 community participants.

The results have been analyzed and posted on Meta-Wiki. We must continue to pilot and consult more closely over the coming months with some of our key movement organizing and governance bodies, ensuring further dialogue on the metrics design as well as connecting for future sense-making around the metrics.

Challenges & Barriers

The pilot process revealed important challenges. We faced time-zone difficulties as we tried to find a common time for the monthly office hours: the community participants were geographically dispersed, spanning from Latin America to Eastern Asia, so we could not find a single time that worked for all. The language barrier was another challenge: the office hour sessions were hosted in English and all communications were drafted in English, and some participants had difficulty understanding and interpreting the purpose of the sessions due to the technical language required to discuss the Equity Landscape. Lastly, connectivity issues also brought challenges during calls, where internet bandwidth could be overwhelmed when attempting to run both the video call and Google Data Studio at the same time.

What Next?

We are looking to continue with the pilot period. We continue to monitor the talk page for new pilot volunteers and we plan to present some targeted demonstration workshops in July to preview the metrics to key use cases. 

We are also exploring translation support for key documentation and working to connect with key stakeholder groups who would benefit from a guided workshop and demo session.

If you are part of a strategic Wikimedia organizing or governance group interested in attending a demonstration workshop (1) watch for announcements and/or (2) reach out on our talk page if you have a specific workshop request!

Should Vector be responsive?

20:35, Thursday, 23 June 2022 UTC

Here I share some thoughts around the history of "responsive" MediaWiki skins and how we might want to think about it for Vector.

The buzzword "responsive" is thrown around a lot in Wikimedia-land, but essentially what we are talking about is whether to include a single tag in the page: adding a meta tag with the name viewport tells the page how to adapt to a mobile device.

<meta name="viewport" content="width=device-width, initial-scale=1">

More information: https://css-tricks.com/snippets/html/responsive-meta-tag/

Since the viewport tag must be explicitly added, websites are not mobile-friendly by default. The traditional Wikimedia skins were built before mobile sites and this tag existed, so CologneBlue, Modern, and Vector did not add it.

When viewing these skins on mobile, the content does not adapt to the device and instead appears zoomed out. One benefit of this is that the reader sees a design consistent with the one they see on desktop: the interface is familiar and easy enough to navigate, as the user can pinch and zoom to parts of the UI. The downside is that reading is very difficult and requires far more hand manipulation to move between sentences and paragraphs, and for this reason many search engines penalize such pages in their rankings.

Enter Minerva

The Minerva skin (and MobileFrontend before it) was introduced to let us start adapting our content for mobile. This turned out to be a good decision, as it prevented our projects from being penalized in search rankings. However, building Minerva showed that making content mobile-friendly takes more than adding a meta tag. For example, many templates used HTML elements with fixed widths bigger than the available space; this was notably a problem with large tables. Minerva swept many of these issues under the rug with generic fixes (for example, enforcing horizontal scrolling on tables). Minerva took a bottom-up approach, adding features only after they were mobile-friendly. The result was a minimal experience that was not popular with editors.
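A generic fix of the kind described above might look like the following CSS sketch. This is illustrative only, not Minerva's actual stylesheet, and the selector is an assumption:

```css
/* Sketch: let wide tables scroll horizontally inside the content column
   instead of overflowing the page. The .mw-content selector is hypothetical. */
.mw-content table {
    display: block;    /* allow the table box to shrink to the column width */
    max-width: 100%;
    overflow-x: auto;  /* show a scrollbar when the table is too wide */
}
```

The trade-off is that the table remains laid out at its natural width; the content is reachable by scrolling but is not actually reflowed for the small screen.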

Timeless

Timeless was the second responsive skin added to Wikimedia wikis. It was popular with editors because it took the opposite, top-down approach to Minerva, adding features despite their shortcomings on a mobile screen. It ran into many of the same issues that Minerva had, e.g. large tables, and copied many of Minerva's solutions.

MonoBook

During the building of Timeless, the MonoBook skin was made responsive (T195625). Interestingly, this led to a lot of backlash from users (particularly on German Wikipedia), revealing that many users did not want a skin that adapted to the screen (presumably for the reasons I outlined earlier: while reading is harder, it's easier to get around a complex site). Because of this, a preference was added to allow editors to disable responsive mode (the viewport tag). This preference was later generalized to apply to all skins.
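For context, a MediaWiki skin can declare itself responsive in its skin registration, which is what emits the viewport tag (and what the opt-out preference toggles). A minimal sketch, assuming the current skin.json registration format and a hypothetical skin name:

```json
{
	"ValidSkinNames": {
		"myskin": {
			"class": "MediaWiki\\Skin\\SkinMustache",
			"args": [ {
				"name": "myskin",
				"responsive": true
			} ]
		}
	}
}
```

With "responsive" set, the skin serves the viewport meta tag by default, while users who prefer the zoomed-out desktop layout can still switch it off in their preferences.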

Responsive Vector

Around the same time, several volunteers attempted to force Vector to work as a responsive skin. Given the backlash against MonoBook's responsive mode, this was put behind a feature flag. The flag saw little development, presumably because many gadgets popped up providing the same service.

Vector 2022

The responsive feature flag was removed from legacy Vector in T242772 and efforts were redirected into making the new Vector responsive. Currently, the new Vector skin can be resized comfortably down to 500px. It does not yet add a viewport tag, so it does not adapt to a mobile screen.

However, during the building of the table of contents, many mobile users started complaining (T306910). The reason is that when you don't define a viewport tag, the browser makes decisions for you. To avoid these kinds of issues popping up, it might make sense for us to define an explicit viewport that requests content scaled out at a width of our choosing. For example, we could explicitly set a width of 1200px with a zoom level of 0.25.

If Vector were responsive, it would encourage people to think about mobile-friendly content as they edit on mobile. Editors who insist on using the desktop skin on their mobile phones rather than Minerva have their reasons, but by not serving them a responsive skin, we encourage them to create content that does not work in Minerva or in other skins that adapt to the mobile device.

There is a little more work needed on our part to deal with content that cannot fit into 320px, i.e. at widths below 500px. Currently, if the viewport tag is set, a horizontal scrollbar will be shown; for example, the header does not adapt to that breakpoint.


Decisions to be made

  1. Should we enable Vector 2022's responsive mode? The only downside of doing this is that some users may dislike it and need to visit preferences to opt out.
  2. When a user doesn't want responsive mode, should we be more explicit about what we serve them? For example, should we tell a mobile device to render at a width of 1000px with a scale of 0.25 (1/4 of the normal size)? This would avoid issues like T306910. Example code [1], demo.
  3. Should we apply the responsive mode to legacy Vector too? This would fix T291656, as it would mean the option applies to all skins.

[1]

<meta name="viewport" content="width=1400, initial-scale=0.22">
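The relationship between an explicit viewport width and the scale a device needs can be sketched with a little arithmetic. The 360px device width below is an illustrative assumption, not a value from the post:

```python
# Sketch: pick an initial-scale that fits a fixed layout width onto a phone
# screen. A 360px CSS-pixel device width is assumed for illustration.

def fitting_scale(layout_width: int, device_width: int = 360) -> float:
    """Scale factor at which `layout_width` CSS pixels fill the device width."""
    return round(device_width / layout_width, 2)

print(fitting_scale(1400))  # 0.26 -- in the same ballpark as the 0.22 above
print(fitting_scale(1000))  # 0.36 -- a 0.25 scale renders slightly smaller
```

Picking a scale a little below the exact fit leaves a margin so that slightly over-wide content still avoids a horizontal scrollbar.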

The Board of Trustees has formally concluded the work of the Funds Dissemination Committee (FDC) with gratitude and appreciation for all of the dedication of its members over the years.

The FDC was developed in 2012 as an advisory body to the Board of Trustees, to “provide recommendations on requests for funding by eligible entities within the movement to achieve the mission goals of the movement”. The FDC framework and the members who served on it across the years have been instrumental in shaping our movement’s organizational development. Over the past three years, the FDC deliberations were put on pause so that key movement actors, individuals and groups could concentrate on developing our Movement Strategy.

The FDC framework, which oversaw the Wikimedia chapters’ Annual Plan Grants program, was based on participatory grantmaking – where the decision-makers were people from the movement. The FDC made recommendations to the Board of Trustees for APG and also reviewed the Foundation’s annual plan. The FDC also introduced common practices for metrics and evaluation. The new, collaboratively developed Wikimedia Foundation Funds strategy builds upon valuable lessons from the FDC and focuses on decentralized community decision making, building a relationship of partnership with grantees, and learning.

Moving forward, the seven new Regional Funds committees will support the review of grant applications in their region, aligned with the new funding strategy that brings decision-making closer to local communities and needs. The role previously filled by the FDC as a sub-committee to the Board to review the Wikimedia Foundation’s annual plan will be carried out through a movement-wide feedback process that will be built into the Annual Planning Process. This year, the Foundation has already shared a draft annual plan on Meta and hosted conversations with the movement to invite feedback and thoughts on the draft. That feedback is informing the next iteration of the plan, which will be finalized and shared with the Board in June. The experience from this year’s open community feedback process will also help shape future iterations of a collaboratively created Foundation plan. 

On behalf of the Board of Trustees, I would like to extend our thanks to everyone who has thoughtfully participated in the FDC framework to elevate our grants programs. This includes a special thank you to the members of the FDC, each of whom volunteered their time and spent countless hours reviewing applications and reports. I also want to thank the organizations who participated in the program, created thoughtful applications, and developed diverse and inspiring programs to support our work to bring knowledge to all. 

To ensure the institutional memory and learnings of the FDC framework are preserved, a retrospective was completed with members of the most recent committee and is available on Meta-wiki.

On May 28, 2022, an editathon was held at the OYA Soichi Library in Tokyo, Japan, which specializes in magazines. Araisyohei (a sysop of the Japanese Wikipedia) supported this event and was in charge of video recording and photography. A report in Japanese by Natsumi Miura, a member of the staff, was published in the web-based e-mail magazine “ACADEMIC RESOURCE GUIDE” under CC-BY. Note that only the text of this article was published in the original e-mail magazine; the pictures were inserted for this article.

Publication

Contents

Wikipedia OYA was held.

Miura Natsumi (Sumida Midori Library, Certified Librarian, Japan Library Association 1154)

■ Summary

On May 28, 2022, “Wikipedia OYA,” a volunteer project, was held at Oya Soichi Library (Setagaya-ku, Tokyo), which specializes in magazines. I would like to introduce this project from the perspective of a staff member.

The aim of this project is to edit articles of Wikipedia, an Internet encyclopedia that can be edited by anyone, using magazines in OYA Soichi Library. The Wikipedians who participated in this event were: さえぼー, Swanee, 逃亡者, のりまき, and Eugene Ormandy (the planner of this project).

This event was originally scheduled to take place in April 2021, but it was postponed due to the spread of Covid-19 and held in May 2022. Considering the infection situation, we did not recruit participants widely. Information about this event was also posted on social media with the hashtag #WikipediaOYA.

■ Aim of the project

“Wikipedia OYA” was planned by a group of volunteers with the aim of spreading awareness of how well Wikipedia pairs with magazines, which strongly reflect the social trends and fashions of their times, and of promoting the use of the Oya Soichi Library.

Because magazines contain a variety of topics scattered throughout a single volume, it is difficult to find the information you need. However, the OYA Soichi Library’s unique Oya-style index classification method makes it easy to find the information you are looking for.

We held this event to let Wikipedia editors and readers know the value of the OYA Soichi Library: its large collection of magazines and its convenient search system.

■ OYA Soichi Library

OYA Soichi Library, a magazine library in Japan, was established in 1971 based on the magazine collection of a critic Soichi Oya (1900-1970) who had wanted to make his collection “available for every person to use”.

Last year (2021) marked the 50th anniversary of its establishment, and it continues to collect magazines. The collection amounts to approximately 12,000, and it is used by the press, such as newspapers and TV stations, as well as by researchers and students.

However, due to the spread of the Internet and the Covid-19 lockdowns, the number of users has decreased. In 2017, the library started crowdfunding on READYFOR, and its monetary goal was achieved in 3 days. Moreover, the library launched a new paid membership system named “OYA-bunko patronage”.

■ The theme “Bread”.

“Bread” was chosen as the theme for this event to take advantage of the magazine’s characteristics. The inspiration for this theme came from a magazine pathfinder (resource list) of “Curry Bread” on the website of OYA Soichi Library.

Another reason is that information about unique breads in Japan has not been compiled into a book and is not well organized on Wikipedia either.

■ Visiting

Firstly, Mr. Hiroshi Kamoshida, a staff member of the OYA Soichi Library, explained to the participants how to use its original database, “Web OYA-bunko”. Users can search magazines, mainly from 1988 onward, by keyword and by the library's unique classification. I searched for “brain bread (zunou-pan)”, which is said to make you smarter, and was surprised to find information I had thought unavailable. You can also use this database at public libraries that have contracts with the OYA Soichi Library, but there are not many of them.

After that, Mr. Kamoshida showed us the stacks, which are usually closed to the public (magazines are retrieved by library staff). The stacks, divided into eight rooms, are full of magazines.

We saw many magazines: the first issue of a magazine that is still published today, a magazine that has already ceased publication, and the oldest magazine in the collection, “Kaikan Zasshi (會館雑誌)”, published in 1875. We could also flip through the pages of a magazine published 100 years ago, which included special features about the world 100 years later.

Magazines in the OYA Soichi Library are stored, searchable, findable, and accessible. In public libraries, serials such as magazines are often discarded after their storage period has passed. I reaffirmed the significance of the OYA Soichi Library.

■ Research and Edit

Participants used the database and the “OYA Soichi Library Index Catalog,” a catalog of magazines from before 1987, to find the information they needed. Not only the Wikipedians but also the staff actually used the database.

We picked up some fashion and cuisine magazines such as “Hanako”, “Tokyo Walker”, and “Dancyu”, and other magazines such as “Mainichi Graph”, “Shukan Jitsuwa”, and “Seventeen”. One participant commented that “I didn’t expect to find so many articles”.

MARC (Machine Readable Cataloging) records used by public libraries do not contain such detailed information, especially about serialized articles or columns included in feature articles.

Four articles ([[ウチキパン]] [[かにぱん]] [[なかよしパン]] [[マグノリアベーカリー]]) were written and published on the day of the project using magazines. More articles will be published.

■ After the project

I often feel that if information cannot be found, it might as well not exist. Conversely, if we can find information, we can use it.

Soichi Oya said, “Books are not for reading. They are for searching.” I felt that the OYA Soichi Library, which not only collects magazines but also stores them in a searchable form, has many more places to be useful, not limited to Wikipedia.

Visiting the OYA Soichi Library is the best way to enjoy its uniqueness, but if you can’t visit, you can use the database “Web OYA-bunko”. If this database were available at other libraries, such as prefectural libraries, I wonder how much missing information could be found. How can we support this rare specialized library, which holds materials that are not available anywhere else? This is one of the main reasons why I agreed with the planners and participated in the project.

We want to continue to hold Wikipedia OYA. Through this project, we would like to support the Oya Soichi Library so that it will continue to be used in the future. The next event has not yet been decided, but we will post information on social media with the hashtag #WikipediaOYA. If you are interested in this event, please keep an eye on this tag. Sharing the information is also welcome! Why don’t you join us in supporting the Oya Soichi Library?

Author’s profile

Natsumi Miura. After working in the magazine editorial department of a publishing company, she joined TRC Library Service Co., Ltd. in 2008 and works at Sumida Midori Library as a designated manager. She was the responsible editor for LRG No. 35, which was released last year. Outside of the library, she is active in drawing illustrations and focusing on the library’s book post. On June 18 (Sat.), she held “Wikipedia Town Sumida” with volunteers.

*This text is released under the Creative Commons License CC-BY.

Episode 115: BTB Digest 18

18:15, Tuesday, 21 June 2022 UTC

🕑 30 minutes

It's another BTB Digest episode! Mike Cariaso explains why you should use SQLite, Tyler Cipriani talks about teaching deployment to volunteers, Dror Snir-Haim compares translation options, Alex Hollender defends sticky headers, Kunal Mehta criticizes Bitcoin miners, and more!

June 21, 2022, San Francisco, CA, USA ― Wikimedia Enterprise, a first-of-its-kind commercial product designed for companies that reuse and source Wikipedia and Wikimedia projects at a high volume, today announced its first customers: multinational technology company Google and nonprofit digital library Internet Archive. Wikimedia Enterprise was recently launched by the Wikimedia Foundation, the nonprofit that operates Wikipedia, as an opt-in product. Starting today, it also offers a free trial account; new users can sign up themselves to better assess their needs with the product.

As Wikipedia and Wikimedia projects continue to grow, knowledge from Wikimedia sites is increasingly being used to power other websites and products. Wikimedia Enterprise was designed to make it easier for these entities to package and share Wikimedia content at scale in ways that best suit their needs: from an educational company looking to integrate a wide variety of verified facts into their online curricula, to an artificial intelligence startup that needs access to a vast set of accurate data in order to train their systems. Wikimedia Enterprise provides a feed of real-time content updates on Wikimedia projects, guaranteed uptime, and other system requirements that extend beyond what is freely available in publicly-available APIs and data dumps. 

“Wikimedia Enterprise is designed to meet a variety of content reuse and sourcing needs, and our first two customers are a key example of this. Google and Internet Archive leverage Wikimedia content in very distinct ways, whether it’s to help power a portion of knowledge panel results or preserve citations on Wikipedia,” said Lane Becker, Senior Director of Earned Revenue at the Wikimedia Foundation. “We’re thrilled to be working with them both as our longtime partners, and their insights have been critical to build a compelling product that will be useful for many different kinds of organizations.” 

Organizations and companies of any size can access Wikimedia Enterprise offerings with dedicated customer-support and Service Level Agreements, at a variable price based on their volume of use. Interested companies can now sign up on the website for a free trial account which offers 10,000 on-demand requests and unlimited access to a 30-day Snapshot. 

Google and the Wikimedia Foundation have worked together on a number of projects and initiatives to enhance knowledge distribution to the world. Content from Wikimedia projects helps power some of Google’s features, including being one of several data sources that show up in its knowledge panels. Wikimedia Enterprise will help make the content sourcing process more efficient. Tim Palmer, Managing Director, Search Partnerships at Google said, “Wikipedia is a unique and valuable resource, created freely for the world by its dedicated volunteer community. We have long supported the Wikimedia Foundation in pursuit of our shared goals of expanding knowledge and information access for people everywhere. We look forward to deepening our partnership with Wikimedia Enterprise, further investing in the long-term sustainability of the foundation and the knowledge ecosystem it continues to build.”

Internet Archive is a long-standing partner to the Wikimedia Foundation and the broader free knowledge movement. Their product, the Wayback Machine, has been used to fix more than 9 million broken links on Wikipedia. Wikimedia Enterprise is provided free of cost to the nonprofit to further support their mission to digitize knowledge sources. Mark Graham, Director of the Internet Archive’s Wayback Machine shared, “The Wikimedia Foundation and the Internet Archive are long-term partners in the mission to provide universal and free access to knowledge. By drawing from a real time feed of newly-added links and references in Wikipedia sites – in all its languages, we can now archive more of the Web more quickly and reliably.”

Wikimedia Enterprise is an opt-in, commercial product. Within a year of its commercial launch, it is covering its current operating costs and has a growing list of users exploring the product. All Wikimedia projects, including the suite of publicly available datasets, tools, and APIs the Wikimedia Foundation offers, will continue to be available for free use to all users.

The creation of Wikimedia Enterprise arose, in part, from the recent Movement Strategy – the global, collaborative strategy process to direct Wikipedia’s future by the year 2030 devised side-by-side with movement volunteers. By making Wikimedia content easier to discover, find, and share, the product speaks to the two key pillars of the 2030 strategy recommendations: advancing knowledge equity and knowledge as a service. 

Interested companies are encouraged to visit the Wikimedia Enterprise website for more information on the product offering and features, as well as to sign up for their free account. 

About the Wikimedia Foundation 

The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Wikimedia Enterprise is operated by Wikimedia, LLC, a wholly owned limited liability company (LLC) of the Wikimedia Foundation. The Foundation’s vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge freely. We host Wikipedia and the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. 

The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive donations from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.

For more information on Wikimedia Enterprise:

How does Internet Archive know?

19:30, Monday, 20 2022 June UTC

The Internet Archive discovers in real-time when WordPress blogs publish a new post, and when Wikipedia articles reference new sources. How does that work?

Wikipedia

Wikipedia, and its sister projects such as Wiktionary and Wikidata, run on the MediaWiki open-source software. One of its core features is “Recent changes”. This enables the Wikipedia community to monitor site activity in real-time. We use it to facilitate anti-spam, counter-vandalism, machine learning, and many more quality and research efforts.

MediaWiki’s built-in REST API exposes this data in machine-readable form to query (or poll). For wikipedia.org, we have an additional RCFeed plugin that broadcasts events to the stream.wikimedia.org service (docs).
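For instance, a recent-changes poll is a single GET against api.php. A minimal sketch (English Wikipedia as the example wiki; the `rclimit` of 5 is an arbitrary choice):

```shell
# Build a recent-changes query against MediaWiki's api.php.
base='https://en.wikipedia.org/w/api.php'
params='action=query&list=recentchanges&rclimit=5&format=json'
url="${base}?${params}"
echo "$url"

# To actually poll it (requires network access):
#   curl -s "$url"
```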

The service implements the HTTP Server-Sent Events protocol (SSE). Most programming languages have an SSE client via a popular package. Most exciting to me, though, is the original SSE client: the EventSource API — built straight into the browser.1 This makes cool demos possible, getting started with only the following JavaScript:

new EventSource('https://stream.wikimedia.org/…');

And from the command-line, with cURL:

$ curl 'https://stream.wikimedia.org/v2/stream/recentchange'

event: message
id: …
data: {"$schema":…,"meta":…,"type":"edit","title":…}
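Each `data:` line carries a JSON event. Even without a JSON parser you can pull a field like the page title out of the stream; a rough sketch with sed (a text hack for demos, which assumes the title value contains no escaped quotes):

```shell
# Print the "title" field of each SSE data line.
titles_from_stream() {
  sed -n 's/^data:.*"title":"\([^"]*\)".*/\1/p'
}

# Demo on a captured sample line; a live run would pipe curl in:
#   curl -s 'https://stream.wikimedia.org/v2/stream/recentchange' | titles_from_stream
printf '%s\n' 'data: {"type":"edit","title":"Example_page"}' | titles_from_stream
```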

WordPress

WordPress played a major role in the rise of the blogosphere. In particular, ping servers (and pingbacks2) helped the early blogging community with discovery. The idea: your website notifies a ping server over a standardized protocol. The ping server in turn notifies feed reader services (Feedbin, Feedly), aggregators (FeedBurner), podcast directories, search engines, and more.3

Ping servers today implement the weblogsCom interface (specification), introduced in 2001 and based on the XML-RPC protocol.4 The default ping server in WordPress is Automattic’s Ping-O-Matic, which in turn powers the WordPress.com Firehose.
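A ping is just a small XML-RPC POST with two string parameters: the site’s name and its URL. A sketch of a `weblogUpdates.ping` call to Ping-O-Matic (the blog name and URL below are made-up placeholders):

```shell
# Build the XML-RPC request body for weblogUpdates.ping.
ping_body() {
  cat <<EOF
<?xml version="1.0"?>
<methodCall>
  <methodName>weblogUpdates.ping</methodName>
  <params>
    <param><value><string>$1</string></value></param>
    <param><value><string>$2</string></value></param>
  </params>
</methodCall>
EOF
}

ping_body 'My Blog' 'https://example.com/'

# To send it to Ping-O-Matic (requires network access):
#   ping_body 'My Blog' 'https://example.com/' |
#     curl -s -X POST -H 'Content-Type: text/xml' --data-binary @- http://rpc.pingomatic.com/
```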

This firehose is a Jabber/XMPP server at xmpp.wordpress.com:8008. It provides events about blog posts published in real-time from any WordPress site, both WordPress.com and self-hosted ones.5 The firehose is also available as an HTTP stream.

$ curl -vi xmpp.wordpress.com:8008/posts.org.json # self-hosted
{ "published":"2022-06-05T21:26:09Z",
  "verb":"post",
  "generator":{},
  "actor":{},
  "target":{"objectType":"blog",…,},
  "object":{"objectType":"article",…}
}
{}

$ curl -vi xmpp.wordpress.com:8008/posts.json # WordPress.com
{}

Internet Archive

It might be surprising, but the Internet Archive does not try to index the entire Internet. This is in contrast to commercial search engines.

The Internet Archive consists of bulk datasets from curated sources (“collections”). Collections are often donated by other organizations, and go beyond capturing web pages. They can also include books, music6, and software.7 Any captured web pages are additionally surfaced via the Wayback Machine interface.

Perhaps you’ve used the “Save Page Now” feature, where you can manually submit URLs to capture. While also represented by a collection, these actually go to the Wayback Machine first, and appear in bulk as part of the collection later.

The Common Crawl and Wide Crawl collections represent traditional crawlers. These start with a seed list and go breadth-first to every site they find (within certain global and per-site depth limits). Such a crawl can take months to complete, and captures a portion of the web from a particular period in time, regardless of whether a page was indexed before. Other collections are narrower in focus, e.g. regularly crawling a news site and capturing any articles not previously indexed.

Wikipedia collection

One such collection is Wikipedia Outlinks.8 This collection is fed several times a day with bulk crawls of new URLs. The URLs are extracted from recently edited or created Wikipedia articles, as discovered via the events from stream.wikimedia.org (Source code: crawling-for-nomore404).

en.wikipedia.org, revision by Krinkle, on 30 May 2022 at 21:03:30.

Last month, I edited the VodafoneZiggo article on Wikipedia. My edit added several new citations. The articles I cited were from several years ago, and most already made their way into the Wayback Machine by other means. Among my citations was a 2010 article from an Irish news site (rtl.ie). I searched for it on archive.org and no snapshots existed of that URL.

A day later I searched again, and there it was!

web.archive.org found 1 result, captured at 30 May 2022 21:03:55. This capture was collected by: Wikipedia Eventstream.

I should note that, while the snapshot was uploaded a day later, the crawling occurred in real-time. I published my edit to Wikipedia on May 30th, at 21:03:30 UTC. The snapshot of the referenced source article was captured at 21:03:55 UTC. A mere 25 seconds later!

In addition to archiving citations for future use, Wikipedia also integrates with the Internet Archive in the present. The so-called InternetArchiveBot (source code) continuously crawls Wikipedia, looking for “dead” links. When it finds one, it searches the Wayback Machine for a matching snapshot, preferring one taken on or near the date that the citation was originally added to Wikipedia. This is important for online citations, as web pages may change over time.

The bot then edits Wikipedia (example) to rescue the citation by filling in the archive link.

Wikipedia.org, revision by InternetArchiveBot, on 4 June 2022. Rescuing 1 source. The source was originally cited on 29 September 2018. The added archive URL is also from 29 September 2018. web.archive.org, found 1 result, captured 29 September 2018. This capture was collected by: Wikipedia Eventstream.

WordPress collection

The NO404-WP collection on archive.org works in a similar fashion. It is fed by a crawler that uses the WordPress Firehose (source code). The firehose, as described above, is pinged by individual WordPress sites after publishing a new post.

For example, this blog post by Chris. According to the post metadata, it was published at 12:00:42 UTC. And by 12:01:55, one minute later, it was captured.9

In addition to preserving blog posts, the NO404-WP collection goes a step further and also captures any new material your post links to. (Akin to Wikipedia citations!) For example, this css-tricks.com post links to a file on GitHub inside the TT1 Blocks project. This deep link was not captured before and is unlikely to be picked up by regular crawling due to depth limits. It got captured and uploaded to the NO404-WP collection a few days later.

Further reading

Footnotes:

  1. The “Server-sent events” technology was around as early as 2006, originating at Opera (announcement, history). It was among the first specifications to be drafted through WHATWG, which formed in 2004 after the W3C XHTML debacle. 

  2. Pingback (Pingbacks explained, history) provides direct peer-to-peer discovery between blogs when one post mentions or links to another post. By the way, the Pingback and Server-Sent Events specifications were both written by Ian Hickson. 

  3. Feedbin supports push notifications. While these could come from its periodic RSS crawling, it tries to deliver them in real-time where possible. It does this by mapping pings from blogs that notify Ping-O-Matic to feed subscriptions. 

  4. The weblogUpdates spec for Ping servers was written by Dave Winer in 2001, who took over Weblogs.com around that time (history) and needed something more scalable. This, by the way, is the same Dave Winer who developed the underlying XML-RPC protocol, the OPML format, and worked on RSS 2.0. 

  5. That is, unless the blog owner opts out by disabling the “search engine” and “ping” settings in WordPress Admin. 

  6. The Muziekweb collection is one that stores music rather than web pages. Muziekweb is a library in the Netherlands that lends physical CDs, via local libraries, to patrons. They also digitize their collection for long-term preservation. One cool application of this is that you can stream any album in full from a library computer. And… they mirror to the Internet Archive! You can search for an artist, and listen online. For copyright reasons, most music is publicly limited to 30-second samples. Through Controlled digital lending, however, you can access many more albums in full. Plus, you can publicly stream any music that is in the public domain, under a free license, or pre-1972 and no longer commercially available. 

  7. I find it particularly impressive that the Internet Archive also hosts platform emulators for the software it preserves; that these platforms include not only game consoles but also Macintosh and MS-DOS; and that these emulators are compiled via Emscripten to JavaScript and integrated right on the archive.org entry! For example, you can play the original Prince of Persia for Mac (via pce-macplus.js), the later color edition, or Wolfenstein 3D for MS-DOS (via js-dos or em-dosbox), or check out Bill Atkinson’s 1985 MacPaint. 

  8. The “Wikipedia Outlinks” collection was originally populated via the NO404-WKP subcollection, which used the irc.wikimedia.org service from 2013 to 2019. It was phased out in favour of the wikipedia-eventstream subcollection. 

  9. In practice, the ArchiveTeam URLs collection tends to beat the NO404-WP collection and thus the latter doesn’t crawl it again. Perhaps the ArchiveTeam scripts also consume the WordPress Firehose? For many WordPress posts I checked, the URL is only indexed once, which is from “ArchiveTeam URLs” doing so within seconds of original publication. 

Tech News issue #25, 2022 (June 20, 2022)

00:00, Monday, 20 2022 June UTC
previous 2022, week 25 (Monday 20 June 2022) next

Tech News: 2022-25

weeklyOSM 621

09:59, Sunday, 19 2022 June UTC

07/06/2022-13/06/2022

lead picture

Chaz Hutton is probably not the only one for whom the benefits of OSM are new. [1] © G-Maps | map data © OpenStreetMap contributors

Mapping

  • Data curator arredond explained how working on map layers for the Felt company was an opportunity to find and correct tagging errors.
  • In the fourth part of a series about the specific challenges of working with map data, Daniel Mescheder wrote about the importance of tracking changes.
  • MarcoR noted that there are templates (KeyDescription and ValueDescription) in the OSM wiki that are used by taginfo to display some useful information to the user. Since the same information is included in the data element associated with the feature page, some wiki users in good faith truncate these templates to their minimum (e.g. ‘{{KeyDescription}}’), preventing taginfo from retrieving the data.
  • willkmis shared a personal view on urban road classification from a North American perspective.
  • The proposal for county, city and local highway networks in the United States was approved with 17 votes for, 0 votes against and 2 abstentions.

Community

  • Amanda McCann wrote about the new moderator team and etiquette guidelines for the talk and osmf-talk mail lists.
  • Amanda’s work report for May 2022 is available online.
  • The OpenStreetMap Taiwan Community (OSMTW) gathered for its second workshop. Whether participating on-site or online, the participants worked hard to map on OpenStreetMap or to upload related images to Wikimedia Commons. Even during the peak of the COVID-19 pandemic in Taiwan, two geography teachers attended the workshop to learn more about OpenStreetMap and how to use it in the curriculum. OpenStreetMap Taiwan will use resources from its Wikimedia Foundation alliance grant to support related workshops scheduled from March 2022 until February 2023. OSMTW is dedicated to organising at least six street-view expeditions and six edit workshops.
  • Zhengyi Cao and Chris Park briefly reported on their projects in this year’s GSoC.
  • Ed Freyfogle talked to Ilya Zverev about mapping in general and about Ilya’s new OSM editor Every Door, in Geomob Podcast #132.
  • OSM Belgium has chosen Nicxon Piaso, from Papua New Guinea, as Mapper of the Month and introduced him in an interview.
  • Pieter Vander Vennet wrote about educational facilities, the current way of tagging them, and examined how to converge towards unified tagging of schools. Previous discussions on the subject and the difficulty in unifying even a simple country are reported by other contributors.

OpenStreetMap Foundation

  • Paul Norman pointed out the North American capacity issues with pyrene, the only US render server. It looks like Amazon may help with the server problem.
  • Simon Poole shared his insights about the limits and possibilities of reaching a EU-wide General Data Protection Regulations (GDPR)-compatible OSM world.

Local chapter news

  • There are still places available (de) > en at the OSM-FOSSGIS community meeting on the weekend of 1 to 3 July at the Linuxhotel in Essen. Travel is at your own expense; accommodation and meals will be provided by the FOSSGIS Association, the German regional representation of OSM.

Education

  • Anne-Karoline Distel shared a new video on the topic ‘Adding roads to hiking route relations’.
  • Daniel Capilla presented (es) > en a brief exposition of the possibilities offered by the OpenStreetMap data mining tool for those who wish to collaborate in the task of field verification of recycling containers in the municipality of Malaga, based on the key check_date and using the overpass turbo query wizard.

OSM research

  • Veniamin Veselovsky, Dipto Sarkar, Jennings Anderson and Robert Soden published a scientific paper about the development of an automated technique for identifying corporate mapping in OpenStreetMap.

Maps

  • Christoph Hormann published the third and fourth parts of a series about the depiction of trees in maps.
  • Holocrypto provides OSM Planet, Europe and Netherlands vectorial MBTiles for personal or educational use. The MBtiles packages are updated regularly.
  • Neue Zürcher Zeitung is publishing an OSM-based, daily updated interactive map of developments in the Ukraine war.

Software

  • Grab launched GrabMaps, which aims to tap into the US$1 billion map and location-based services market in Southeast Asia. Grab still uses OpenStreetMap as its map base.

Did you know …

  • flipcoords, OpenCage’s new tool for reformatting coordinates to and from lat/lng to lng/lat or into named parameters?

Other “geo” things

  • [1] You might think the whole world knows about OpenStreetMap, and then you read this light-bulb moment (aha moment) from Chaz Hutton on Twitter. ‘Shout out to Ed Freyfogle for getting me onto it’, Chaz comments on his new insight.
  • Without words
  • The European Space Agency has released (fr) > en a three-dimensional map of the Milky Way, including nearly two billion stars. It took the Gaia satellite, 1.5 million kilometres from Earth, ten years to collect the data, and the mission will continue until 2025.
  • About a hundred people were rescued (fr) > en during a school trip this week in Kleinwalsertal, Austria. The teachers were apparently misguided by false information on the internet, leading the group onto the Heuberggrat path without warning them about its sheer difficulty.
  • As TechCrunch reported, Russian tech giant Yandex has removed national borders from its map apps.

Upcoming Events

Where What Online When Country
Arrondissement de Tours La liberté numérique osmcalpic 2022-06-18 flag
京都市 京都!街歩き!マッピングパーティ:第31回 妙法院 osmcalpic 2022-06-18 flag
新店區 OpenStreetMap 街景踏查團 #2 三峽-大溪踏查 osmcalpic 2022-06-19 flag
OSMF Engineering Working Group meeting osmcalpic 2022-06-20
Arlon EPN d’Arlon – Atelier ouvert OpenStreetMap – Initiation osmcalpic 2022-06-21 flag
Kaiserslautern Erfassung von Barrieren in Kaiserslautern osmcalpic 2022-06-21 flag
Lyon Rencontre mensuelle Lyon osmcalpic 2022-06-21 flag
152. Treffen des OSM-Stammtisches Bonn osmcalpic 2022-06-21
San Jose South Bay Map Night osmcalpic 2022-06-22 flag
City of Nottingham OSM East Midlands/Nottingham meetup (online) osmcalpic 2022-06-21 flag
TeachOSM Map-Along osmcalpic 2022-06-22
Lüneburg Lüneburger Mappertreffen (online) osmcalpic 2022-06-21 flag
Manila Making OSM a Safer Space for LGBTQIA+ Mapper – An Intro to SOGIESC (Sexual Orientation, Gender Identity and Expression, and Sex Characteristics) and How to be a better Ally? osmcalpic 2022-06-22 flag
Washington OpenStreetMap US Mappy Hour osmcalpic 2022-06-23 flag
Roma Capitale Incontro dei mappatori romani e laziali osmcalpic 2022-06-22 flag
Kaiserslautern Erfassung von Barrieren in Kaiserslautern osmcalpic 2022-06-23 flag
Oriental Mindoro Open Mapping Hub Asia Pacific’s Map and Chat Hour (PRIDE Celebration) osmcalpic 2022-06-24 flag
Düsseldorf Düsseldorfer OpenStreetMap-Treffen osmcalpic 2022-06-24 flag
IJmuiden OSM Nederland bijeenkomst (online) osmcalpic 2022-06-25 flag
Tanzania Mapping Groups June Mapathon osmcalpic 2022-06-25
Arlon EPN d’Arlon – Atelier ouvert OpenStreetMap – Contribution osmcalpic 2022-06-28 flag
Hlavní město Praha MSF Missing Maps CZ Mapathon 2022 #2 Prague, KPMG office (Florenc) osmcalpic 2022-06-28
[Online] OpenStreetMap Foundation board of Directors – public videomeeting osmcalpic 2022-06-30
Essen 17. OSM-FOSSGIS-Communitytreffen osmcalpic 2022-07-01 – 2022-07-03 flag
San Jose South Bay Map Night osmcalpic 2022-07-06 flag
London Missing Maps London Mapathon osmcalpic 2022-07-05 flag
Salt Lake City OSM Utah Monthly Meetup osmcalpic 2022-07-07 flag

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, Nordpfeil, PierZen, Sammyhawkrad, Strubbl, TheSwavu, derFred.

Cool desktops don’t change 😎

18:52, Thursday, 16 2022 June UTC

Tools can be a subtle trap.

– Neil Gaiman, The Sandman

My ThinkPad X220 in all its glory

Working on my old ThinkPad x220 feels easy because I’ve used the same software for over a decade.

And while it’s tempting to switch to one of the endless new apps out there, there are good reasons to trust old tools.

The Lindy Effect

My boring desktop

The Lindy effect posits that the older something is, the longer it’ll be around.

According to the Lindy Effect, you can assume most software is halfway through its lifespan.

So, Vi will be around in 2068, whereas Visual Studio Code will be defunct before the end of this decade.1

Debian 1993 28 years old
Bash 1989 33 years old
XMonad 2007 15 years old
URXvt 2001 20 years old
Tmux 2007 15 years old
Vim 1991 30 years old

The average age of the software running my laptop is 24 years. So, 24 more years of this desktop (right!? 😅).
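That 24 is just the arithmetic mean of the ages in the table (141 years over six tools, i.e. 23.5, rounded up):

```shell
# Mean age of the six tools listed above.
printf '%s\n' 28 33 15 20 15 30 |
  awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
```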

Preserve your flow state

My desktop has features that are missing from other people’s computers.

These features whisk me into a flow state and keep me there; they preserve my limited attention, willpower, and (frankly) mental capacity.

Vim instead of a new notetaking app

Sage advice from @netcapgirl on Twitter, 2022-04-27

The problem with most notetaking apps is that editing text outside Vim breaks my brain.

Vimwiki has piqued my interest, but I have yet to use it.

Meanwhile, I keep boring notes in Vim using a bash script, Pandoc, markdown, and a distraction-free writing environment.

In the end, I get a bunch of webpages available on http://localhost/~thcipriani/brain:

This is basically Roam + Obsidian + Notion…right?
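The build step behind those pages can be a short loop over markdown files. A sketch of what such a script might look like (the actual script isn’t shown, so the names and paths here are assumptions; the pandoc call is guarded so the naming logic runs anywhere):

```shell
# Map a markdown note to its HTML output name.
html_name() {
  printf '%s\n' "${1%.md}.html"
}

# Convert every note in $1 into $2 (e.g. ~/public_html/brain).
build_notes() {
  mkdir -p "$2"
  for note in "$1"/*.md; do
    [ -e "$note" ] || continue
    out="$2/$(basename "$(html_name "$note")")"
    if command -v pandoc >/dev/null; then
      pandoc --standalone "$note" -o "$out"
    fi
  done
}
```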

Bash instead of DuckDuckGo

Some folks rely on DuckDuckGo (or, worse yet, Google) for basic utilities:

  • Calculator
  • Dictionary
  • Spell check
  • Unit conversion

But you can achieve the same thing faster, without breaking your concentration.

  • Calculator

    I stole the calc bash function from Addy Osmani in 2012 and have used it daily since.

    (/^ヮ^)/*:・゚✧ calc 6922251*8
    55378008
  • Dictionary

    I 😍 dictionaries.

    One of the best dictionaries for writers is available as the gcide-dict package in Debian:

    (/^ヮ^)/*:・゚✧ dict fustian
      Fustian \Fus"tian\, n.
         1. A kind of coarse twilled cotton or cotton and linen stuff,
            including corduroy, velveteen, etc.
            [1913 Webster]
    
         2. An inflated style of writing; a kind of writing in which
            high-sounding words are used, above the dignity of the
            thoughts or subject; bombast.
            [1913 Webster]
    
                  Claudius . . . has run his description into the most
                  wretched fustian.                              --Addison.
            [1913 Webster]
  • Spell checker

    Spellcheck is available almost everywhere, but when it isn’t, people tend to search for whatever word they’re spelling. I wrote a script called spell which uses aspell to improve my spelling:

    spell in action
  • Temperature conversion

    units is a unit conversion and calculation program. And its database is one of my favorite sources of trivia.

    You can make scripts for the most common functions. I made one called temp, which uses units to show temp in both °C and °F—handy for talking about the weather.

    (/^ヮ^)/*:・゚✧ temp 100
    37.777778°C
    212°F
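None of these helpers needs to be more than a few lines. The original scripts aren’t shown, so the definitions below are sketches: `calc` evaluates its expression with awk, `temp` uses plain awk arithmetic in place of units to stay self-contained, and `spell` is reduced to the step that parses aspell’s suggestion lines.

```shell
# calc: evaluate an arithmetic expression (quote it to avoid globbing).
calc() {
  awk "BEGIN { print $* }"
}

# temp: print a number read as °F converted to °C, then read as °C converted to °F.
temp() {
  awk -v t="$1" 'BEGIN {
    printf "%.6f°C\n", (t - 32) * 5 / 9
    printf "%g°F\n", t * 9 / 5 + 32
  }'
}

# spell: parse aspell's "& word n offset: s1, s2, ..." suggestion lines.
suggestions() {
  sed -n 's/^& [^ ]* [0-9]* [0-9]*: //p' | tr -d ' ' | tr ',' '\n'
}
# Full pipeline (requires aspell):
#   printf '%s\n' "$1" | aspell -a | suggestions
```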

Scratchpads instead of pinned tabs

Scratchpads are little windows you summon with a keyboard shortcut. I’ve combined XMonad and Chrome to get little floating web apps all over my desktop.

  • ⌘ + Shift + p is an ever-present notetaking terminal window.
  • ⌘ + Shift + s is Google calendar.
  • ⌘ + Shift + o used to bring up an org-mode capture template, but now it brings up todoist (yes, I’m suitably ashamed).
XMonad NamedScratchpads 👏

The year decade of Linux on the desktop

There’s a bitter joke that goes like this: “It’s the year of Linux on the desktop.”

People say it in video calls when they can’t get their audio to work. But, honestly, I’ve had a pleasant decade of Linux on the desktop.

And when Wayland finally happens? Well. I guess I’ll have no choice but to stop using computers forever ¯\_(ツ)_/¯2.

Anyway. Here’s to the next decade and beyond.


Thanks to Brennen, Kostah, and Željko for reading an early draft of this post and making it less terrible. <3


  1. There is precedent here: https://github.blog/2022-06-08-sunsetting-atom/↩︎

  2. or, I suppose, I could finally figure out how to use SwayWM↩︎

When a Wikipedia research project becomes a thesis

15:51, Thursday, 16 2022 June UTC
Maria Murad
Maria Murad. Image courtesy Maria Murad, all rights reserved.

Maria Murad decided to take Heather Sharkey’s course at the University of Pennsylvania because it involved learning how to write a Wikipedia article.

“I was already working for my school newspaper, The Daily Pennsylvanian, and I wanted to explore more avenues that allowed me to create short form, accessible content on important topics,” Maria explains. “I find that academic articles can be pretty inaccessible for most. Wikipedia or news articles are more accessible to the masses. I know when I want to learn more about something, the first thing I do is search it on Wikipedia. This course felt like an opportunity to have a meaningful impact on a platform that virtually everyone uses to learn about new topics.”

The assignment provided a meaningful impact for Wikipedia’s readers — but also to Maria herself. Dr. Sharkey had provided a list of women connected to the University of Pennsylvania Museum of Archaeology and Anthropology (better known as the Penn Museum) as potential subjects for their Wikipedia assignment. One name on the list was Florence Shotridge, or Kaatxwaaxsnéi, a Native Alaskan Tlingit ethnographer, museum educator, and weaver who worked at the Penn Museum for several years. There was little to no information online about her. Intrigued, Maria set out to research her.

Florence Shotridge
Florence Shotridge

“I took a lot of trips to the Museum Archives and learned that she was one of the first American Indians to lead an anthropological expedition (alongside her husband), an excellent Chilkat weaver, and a museum educator guide to schoolchildren who would visit the museum,” Maria says. “Her husband, Louis Shotridge, already had a Wikipedia article and there was a lot of information about him at the Museum, but it seemed like Florence’s legacy was mostly invisible.”

Maria made it more visible by creating her biography on Wikipedia. But the assignment inspired Maria further: She also made Florence Shotridge the focus of her senior thesis, including creating a short documentary film about her life.

“I think one of the best skills I gained from writing for Wikipedia was the ability to succinctly synthesize various sources,” Maria says. “Since little was written about most of these women before, I had to combine primary research I discovered in the Museum archives with object histories in the Museum with brief mentions in academic articles. I had to marry a variety of sources together in a clear and accessible way in order to publish it on Wikipedia. I think this is a very important skill to have in academic writing.”

It’s a skill Maria is now putting to use. A Kentucky native, she graduated from Penn in 2021. Now, she’s studying Visual, Material, and Museum Anthropology in a master’s program at the University of Oxford. While Oxford is keeping her busy, she hopes to get back to editing Wikipedia soon, especially creating new articles about women.

“Though supplementing details and information on extant articles was a worthwhile and rewarding task, it felt very special to contribute something new to the platform that would lead to so many more people learning about important women in Penn’s history that would have never known about them before,” says Maria.

To learn more about the Wikipedia Student Program, visit teach.wikiedu.org.

Image credit: Bain News Service, publisher, Public domain, via Wikimedia Commons

Production Excellence #44: May 2022

01:13, Thursday, 16 2022 June UTC

How’d we do in our strive for operational excellence last month? Read on to find out!

Incidents

By golly, we've had quite the month! 10 documented incidents, which is more than three times the two-year median of 3. The last time we experienced ten or more incidents in one month was June 2019, when we had eleven (Incident graphs, Excellence monthly of June 2019).

I'd like to draw your attention to something positive. As you read the below, take note of incidents that did not impact public services, and did not have lasting impact or data loss. For example, the Apache incident benefited from PyBal's automatic health-based depooling. The deployment server incident recovered without loss thanks to Bacula. The Etcd incident impact was limited by serving stale data. And, the Hadoop incident recovered by resuming from Kafka right where it left off.

2022-05-01 etcd
Impact: For 2 hours, Conftool could not sync Etcd data between our core data centers. Puppet and some other internal services were unavailable or out of sync. The issue was isolated, with no impact on public services.

2022-05-02 deployment server
Impact: For 4 hours, we could not update or deploy MediaWiki and other services, due to corruption on the active deployment server. No impact on public services.

2022-05-05 site outage
Impact: For 20 minutes, all wikis were unreachable for logged-in users and non-cached pages. This was due to a GlobalBlocks schema change causing significant slowdown in a frequent database query.

2022-05-09 Codfw confctl
Impact: For 5 minutes, all web traffic routed to Codfw received error responses. This affected central USA and South America (local time after midnight). The cause was human error and lack of CLI parameter validation.

2022-05-09 exim-bdat-errors
Impact: Over five days, about 14,000 incoming emails from Gmail users to wikimedia.org were rejected and returned to sender.

2022-05-21 varnish cache busting
Impact: For 2 minutes, all wikis and services behind our CDN were unavailable to all users.

2022-05-24 failed Apache restart
Impact: For 35 minutes, numerous internal services that use Apache on the backend were down. This included Kibana (logstash) and Matomo (piwik). For 20 of those minutes, there was also reduced MediaWiki server capacity, but no measurable end-user impact for wiki traffic.

2022-05-25 de.wikipedia.org
Impact: For 6 minutes, a portion of logged-in users and non-cached pages experienced a slower response or an error. This was due to increased load on one of the databases.

2022-05-26 m1 database hardware
Impact: For 12 minutes, internal services hosted on the m1 database (e.g. Etherpad) were unavailable or at reduced capacity.

2022-05-31 Analytics Hadoop failure
Impact: For 1 hour, all HDFS writes and reads were failing. After recovery, ingestion from Kafka resumed and caught up. No data loss or other lasting impact on the Data Lake.


Incident follow-up

Recently completed incident follow-up:

Invalid confctl selector should either error out or select nothing
Filed by Amir (@Ladsgroup) after the confctl incident this past month. Giuseppe (@Joe) implemented CLI parameter validation to prevent human error from causing a similar outage in the future.

Backup opensearch dashboards data
Filed back in 2019 by Filippo (@fgiunchedi). The OpenSearch homepage dashboard (at logstash.wikimedia.org) was accidentally deleted last month. Bryan (@bd808) tracked down its content and re-created it. Cole (@colewhite) and Jaime (@jcrespo) worked out a strategy and set up automated backups going forward.

Remember to review and schedule Incident Follow-up work in Phabricator! These are preventive measures and tech debt mitigations written down after an incident is concluded. Read more about past incidents at Incident status on Wikitech.

💡Did you know?: The form on the Incident status page now includes a date, to more easily create backdated reports.

Trends

In May we discovered 28 new production errors, of which 20 remain unresolved and have come with us to June.

Last month the workboard totalled 292 tasks still open from prior months. Since the last edition, we completed 11 tasks from previous months, gained 11 additional errors from May (part of May was already counted in the previous edition), and have 7 fresh errors in the current month of June. As of today, the workboard houses 299 open production error tasks (spreadsheet, phab report).

Take a look at the workboard and look for tasks that could use your help.
View Workboard


Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

New discovery tool for technical documentation

14:00, Wednesday, 15 June 2022 UTC

The technical community now has a new tool to discover information. The Developer Portal is a centralized entry point to help you:

  • Find the key documentation you need for common developer tasks.
  • Discover available tools and technologies.
  • Learn how to get started in Wikimedia technical areas.

For a general overview of this project, see the companion post on Diff; the following post focuses on details of the Developer Portal relevant to technical audiences.

Design principles

At its core, the Developer Portal is an index of categorized links to key sources of technical information. These sources are hosted primarily on wikis—the portal contains no actual documentation. 

Technical writers, developer advocates, software engineers, and designers worked together to create the Developer Portal, with lots of input and feedback from the community. We did user research, analyzed documents, created a content strategy, and implemented the portal as a static site. 

In designing the Developer Portal, we followed these principles:

  • Progressive disclosure: Avoid information overload by limiting the amount of content on each page. Provide only relevant, contextualized information at each step.
  • Well-lit paths: Focus on the most important and reliable resources. Do not attempt to index documentation for all Wikimedia technologies. Prioritize content that lowers barriers to entry.
  • Inclusivity: Support the widest possible set of developers. (See the Diff blog post for details.) This includes providing translations, and making the portal accessible and usable on low-speed internet connections.

Paths to explore

Browse tutorials

Tutorials are crucial to help developers get started with Wikimedia technology, but it can be hard to find tutorials when they live on different wikis. The Developer Portal makes it easier to browse available tutorials:

Explore by programming language

In feedback sessions and user testing, developers often expressed a desire to browse documentation and projects by programming language. The Developer Portal provides several paths to do that:

Find community and educational resources

It can be hard to keep up with all the events, news, and opportunities in the technical community! The Developer Portal brings together some essential resources to help people stay connected:

Technical architecture

The navigation-focused design of the portal made its implementation different from a standard, content-focused website. We wanted to use a static site generator to simplify the process of constructing and rendering the portal, but we didn’t want to create many pages with paragraphs of written content. Instead, a single page on the portal displays a collection of links to wiki pages or other key technical resources. We wanted to maintain only a single description of each key technical resource, and be able to easily combine or transclude those units to create modular collections of links.

The Developer Portal site is generated by MkDocs using the Material for MkDocs theme. We built custom plugins to integrate with translatewiki.net and to render markdown pages based on categorized sets of links, which are described in YAML files.  For more implementation details, see the Developer Portal docs on mediawiki.org.
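As a rough illustration of that pipeline, a categorized set of links can be described as structured data and rendered into a Markdown fragment for the static site generator to turn into a page. The data below stands in for one of the portal's YAML files, expressed as a Python literal; the field names and URLs are illustrative, not the portal's actual schema:

```python
# A categorized link collection, as it might appear in one of the
# portal's YAML data files (schema here is illustrative only).
links = {
    "title": "Tutorials",
    "items": [
        {"label": "MediaWiki Action API tutorial",
         "url": "https://www.mediawiki.org/wiki/API:Tutorial",
         "description": "Make your first API request."},
        {"label": "Toolforge quickstart",
         "url": "https://wikitech.wikimedia.org/wiki/Help:Toolforge/Quickstart",
         "description": "Deploy a tool on Toolforge."},
    ],
}

def render_markdown(section: dict) -> str:
    """Render one link collection as a Markdown fragment, the kind of
    page body MkDocs would then convert to HTML."""
    lines = [f"## {section['title']}", ""]
    for item in section["items"]:
        lines.append(f"- [{item['label']}]({item['url']}): {item['description']}")
    return "\n".join(lines)
```

Keeping each link's description in one data file, and rendering it wherever that link appears, is what lets the portal reuse a single description across multiple modular collections.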

Ongoing and future work

So far, the portal is 100% translated into French, Macedonian, and Turkish. If you can help add translations in more languages, visit the project’s page on translatewiki.net!

In the future, we plan to do more user testing in languages other than English, and we’d also like to test with users of assistive technology.

A major part of this project includes reviewing and updating the key documents linked from the Developer Portal.  We’re continuing that work into the coming year, and also investigating how to improve and scale the process.  Helping the technical community find documentation is just the first step—the larger goal is to empower everyone to contribute to and benefit from high-quality, reliable information about Wikimedia technical projects.

Learn more