June 21, 2022, San Francisco, CA, USA ― Wikimedia Enterprise, a first-of-its-kind commercial product designed for companies that reuse and source Wikipedia and Wikimedia projects at a high volume, today announced its first customers: multinational technology company Google and nonprofit digital library Internet Archive. Wikimedia Enterprise was recently launched by the Wikimedia Foundation, the nonprofit that operates Wikipedia, as an opt-in product. Starting today, it also offers a free trial account that new users can sign up for themselves to better assess their needs with the product.

As Wikipedia and Wikimedia projects continue to grow, knowledge from Wikimedia sites is increasingly being used to power other websites and products. Wikimedia Enterprise was designed to make it easier for these entities to package and share Wikimedia content at scale in ways that best suit their needs: from an educational company looking to integrate a wide variety of verified facts into its online curricula, to an artificial intelligence startup that needs access to a vast set of accurate data to train its systems. Wikimedia Enterprise provides a feed of real-time content updates on Wikimedia projects, guaranteed uptime, and other system requirements that extend beyond what the free public APIs and data dumps offer.

“Wikimedia Enterprise is designed to meet a variety of content reuse and sourcing needs, and our first two customers are a key example of this. Google and Internet Archive leverage Wikimedia content in very distinct ways, whether it’s to help power a portion of knowledge panel results or preserve citations on Wikipedia,” said Lane Becker, Senior Director of Earned Revenue at the Wikimedia Foundation. “We’re thrilled to be working with them both as our longtime partners, and their insights have been critical to build a compelling product that will be useful for many different kinds of organizations.” 

Organizations and companies of any size can access Wikimedia Enterprise offerings with dedicated customer support and Service Level Agreements, at a variable price based on their volume of use. Interested companies can now sign up on the website for a free trial account, which offers 10,000 on-demand requests and unlimited access to a 30-day Snapshot.

Google and the Wikimedia Foundation have worked together on a number of projects and initiatives to enhance knowledge distribution to the world. Content from Wikimedia projects helps power some of Google’s features, including being one of several data sources that show up in its knowledge panels. Wikimedia Enterprise will help make the content sourcing process more efficient. Tim Palmer, Managing Director, Search Partnerships at Google said, “Wikipedia is a unique and valuable resource, created freely for the world by its dedicated volunteer community. We have long supported the Wikimedia Foundation in pursuit of our shared goals of expanding knowledge and information access for people everywhere. We look forward to deepening our partnership with Wikimedia Enterprise, further investing in the long-term sustainability of the foundation and the knowledge ecosystem it continues to build.”

Internet Archive is a long-standing partner to the Wikimedia Foundation and the broader free knowledge movement. Their product, the Wayback Machine, has been used to fix more than 9 million broken links on Wikipedia. Wikimedia Enterprise is provided free of cost to the nonprofit to further support their mission to digitize knowledge sources. Mark Graham, Director of the Internet Archive’s Wayback Machine shared, “The Wikimedia Foundation and the Internet Archive are long-term partners in the mission to provide universal and free access to knowledge. By drawing from a real time feed of newly-added links and references in Wikipedia sites – in all its languages, we can now archive more of the Web more quickly and reliably.”

Wikimedia Enterprise is an opt-in, commercial product. Within a year of its commercial launch, it covers its current operating costs, and a growing list of users is exploring the product. All Wikimedia projects, including the suite of publicly available datasets, tools, and APIs the Wikimedia Foundation offers, will continue to be available for free use to all users.

The creation of Wikimedia Enterprise arose, in part, from the recent Movement Strategy: the global, collaborative process, devised side-by-side with movement volunteers, to direct Wikipedia’s future by the year 2030. By making Wikimedia content easier to discover, find, and share, the product speaks to the two key pillars of the 2030 strategy recommendations: advancing knowledge equity and knowledge as a service.

Interested companies are encouraged to visit the Wikimedia Enterprise website for more information on the product offering and features, as well as to sign up for their free account. 

About the Wikimedia Foundation 

The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Wikimedia Enterprise is operated by Wikimedia, LLC, a wholly owned limited liability company (LLC) of the Wikimedia Foundation. The Foundation’s vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge freely. We host Wikipedia and the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. 

The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive donations from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.

For more information on Wikimedia Enterprise:

How does Internet Archive know?

19:30, Monday, 20 June 2022 UTC

The Internet Archive discovers in real-time when WordPress blogs publish a new post, and when Wikipedia articles reference new sources. How does that work?

Wikipedia

Wikipedia, and its sister projects such as Wiktionary and Wikidata, run on the MediaWiki open-source software. One of its core features is “Recent changes”. This enables the Wikipedia community to monitor site activity in real-time. We use it to facilitate anti-spam, counter-vandalism, machine learning, and many more quality and research efforts.

MediaWiki’s built-in REST API exposes this data in machine-readable form to query (or poll). For wikipedia.org, we have an additional RCFeed plugin that broadcasts events to the stream.wikimedia.org service (docs).
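Polling works with an ordinary HTTP request. As a minimal sketch, assuming the Action API's list=recentchanges module at /w/api.php is the machine-readable interface meant here, a helper (the name buildRecentChangesUrl is hypothetical) could build the polling URL:

```javascript
// Build a URL for polling MediaWiki's recent changes.
// Assumption: the Action API module list=recentchanges served from /w/api.php.
function buildRecentChangesUrl(host, { limit = 50, since } = {}) {
  const url = new URL(`https://${host}/w/api.php`);
  url.searchParams.set("action", "query");
  url.searchParams.set("list", "recentchanges");
  url.searchParams.set("rcprop", "title|ids|timestamp");
  url.searchParams.set("rclimit", String(limit));
  url.searchParams.set("format", "json");
  if (since) {
    // The listing runs newest-first by default, so rcend stops it at `since`
    // and only changes made after that timestamp are returned.
    url.searchParams.set("rcend", since);
  }
  return url.toString();
}
```

Requesting that URL every few seconds and remembering the newest timestamp seen gives a crude feed; the push-based stream service avoids that polling loop.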

The service implements the HTTP Server-Sent Events protocol (SSE). Most programming languages have an SSE client via a popular package. Most exciting to me, though, is the original SSE client: the EventSource API — built straight into the browser.1 This makes cool demos possible, getting started with only the following JavaScript:

new EventSource('https://stream.wikimedia.org/…');
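A slightly fuller browser sketch, using the stream URL from the cURL example, might log the title of each edit. The helper summarizeChange is hypothetical and reads only the type and title fields visible in that sample; the EventSource wiring runs only in a browser.

```javascript
// Stream URL as used in the cURL example in this post.
const STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange";

// Turn one event's JSON payload into a short summary, or null to ignore it.
// Kept separate from the network code so it can run outside a browser.
function summarizeChange(rawData) {
  const change = JSON.parse(rawData);
  // Illustrative filter: report page edits, ignore other event types.
  if (change.type !== "edit") return null;
  return `Edited: ${change.title}`;
}

// Only open the stream in a browser context, where EventSource is built in.
if (typeof window !== "undefined" && typeof EventSource !== "undefined") {
  const source = new EventSource(STREAM_URL);
  // The stream labels its events "message", so onmessage receives them.
  source.onmessage = (event) => {
    const summary = summarizeChange(event.data);
    if (summary) console.log(summary);
  };
}
```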

And from the command-line, with cURL:

$ curl 'https://stream.wikimedia.org/v2/stream/recentchange'

event: message
id: …
data: {"$schema":…,"meta":…,"type":"edit","title":…}
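Under the hood, an SSE client splits the stream into events like the one above: each line is a "field: value" pair, and a blank line ends the event. The parser below is a simplified illustration; it handles only the event, id, and data fields, skips comment lines starting with a colon, and is not a complete implementation of the specification's parsing rules.

```javascript
// Parse Server-Sent Events text into { event, id, data } objects.
// Simplified: "retry" and unknown fields are ignored.
function parseSse(text) {
  const events = [];
  let current = { event: "message", id: null, data: [] };
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // A blank line dispatches the event, but only if it carried data.
      if (current.data.length > 0) {
        events.push({ event: current.event, id: current.id, data: current.data.join("\n") });
      }
      // The event type resets; the last event id carries over to later events.
      current = { event: "message", id: current.id, data: [] };
      continue;
    }
    if (line.startsWith(":")) continue; // comment line, e.g. a keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // drop one leading space
    if (field === "event") current.event = value;
    else if (field === "id") current.id = value;
    else if (field === "data") current.data.push(value); // multiple data lines join with "\n"
  }
  return events;
}
```

An event without a terminating blank line is not returned, since the stream may still be delivering the rest of it.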

WordPress

WordPress played a major role in the rise of the blogosphere. In particular, ping servers (and pingbacks2) helped the early blogging community with discovery. The idea: your website notifies a ping server over a standardized protocol. The ping server in turn notifies feed reader services (Feedbin, Feedly), aggregators (FeedBurner), podcast directories, search engines, and more.3

Ping servers today implement the weblogsCom interface (specification), introduced in 2001 and based on the XML-RPC protocol.4 The default ping server in WordPress is Automattic’s Ping-O-Matic, which in turn powers the WordPress.com Firehose.
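A ping is a small XML-RPC request. The sketch below builds the body of a weblogUpdates.ping call, whose two string parameters are the blog's name and its URL. The helper names are hypothetical (this is not WordPress's own code); http://rpc.pingomatic.com/ is the Ping-O-Matic address WordPress lists under its Update Services setting.

```javascript
// Escape characters that are special in XML text content.
function escapeXml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build an XML-RPC request body for weblogUpdates.ping(blogName, blogUrl).
function buildWeblogPing(blogName, blogUrl) {
  const param = (v) => `<param><value><string>${escapeXml(v)}</string></value></param>`;
  return [
    '<?xml version="1.0"?>',
    "<methodCall>",
    "<methodName>weblogUpdates.ping</methodName>",
    `<params>${param(blogName)}${param(blogUrl)}</params>`,
    "</methodCall>",
  ].join("\n");
}

// Sending it (not executed here) is a plain HTTP POST:
// fetch("http://rpc.pingomatic.com/", {
//   method: "POST",
//   headers: { "Content-Type": "text/xml" },
//   body: buildWeblogPing("My blog", "https://example.org/"),
// });
```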

This firehose is a Jabber/XMPP server at xmpp.wordpress.com:8008. It provides events about blog posts published in real-time, from any WordPress site, both WordPress.com and self-hosted ones.5 The firehose is also available as an HTTP stream.

$ curl -vi xmpp.wordpress.com:8008/posts.org.json # self-hosted
{ "published":"2022-06-05T21:26:09Z",
  "verb":"post",
  "generator":{},
  "actor":{},
  "target":{"objectType":"blog",…,},
  "object":{"objectType":"article",…}
}
{}

$ curl -vi xmpp.wordpress.com:8008/posts.json # WordPress.com
{}
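A consumer of the HTTP stream might reduce each event to what an archiver needs. This sketch assumes the events have already been decoded into objects shaped like the sample above (how the stream separates events isn't covered here). It also assumes the post's permalink is in object.url, which the elided sample does not confirm; toNewPost is a hypothetical name.

```javascript
// Keep only "new article published" events, using fields visible in the
// sample: verb, object.objectType and published.
// Assumption: object.url holds the post's permalink (elided in the sample).
function toNewPost(event) {
  if (event.verb !== "post") return null;
  if (!event.object || event.object.objectType !== "article") return null;
  return {
    published: new Date(event.published),
    url: event.object.url ?? null, // null when the assumed field is absent
  };
}
```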

Internet Archive

It might be surprising, but the Internet Archive does not try to index the entire Internet. This is in contrast to commercial search engines.

The Internet Archive consists of bulk datasets from curated sources (“collections”). Collections are often donated by other organizations, and go beyond capturing web pages. They can also include books, music,6 and software.7 Any captured web pages are additionally surfaced via the Wayback Machine interface.

Perhaps you’ve used the “Save Page Now” feature, where you can manually submit URLs to capture. While also represented by a collection, these actually go to the Wayback Machine first, and appear in bulk as part of the collection later.

The Common Crawl and Wide Crawl collections represent traditional crawlers. These start with a seed list and go breadth-first to every site they find (within a certain global and per-site depth limit). Such a crawl can take months to complete, and captures a portion of the web from a particular period in time, regardless of whether a page was indexed before. Other collections are narrower in focus, e.g. regularly crawling a news site and capturing any articles not previously indexed.
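The breadth-first strategy with depth limits can be sketched over an in-memory link graph that stands in for fetching pages. How the two limits are defined here is an assumption for illustration: the global limit counts hops from the seed list, and the per-site limit counts consecutive hops within one host. Real crawlers define their limits in their own ways.

```javascript
// Breadth-first crawl with a global depth limit (hops from the seeds) and a
// per-site depth limit (consecutive hops within the same host).
// `links` maps a URL to the URLs it links to, in place of real fetching.
function crawl(seeds, links, { maxDepth = 3, maxSiteDepth = 2 } = {}) {
  const seen = new Set(seeds);
  const queue = seeds.map((url) => ({ url, depth: 0, siteDepth: 0 }));
  const captured = [];
  while (queue.length > 0) {
    const { url, depth, siteDepth } = queue.shift(); // FIFO order = breadth-first
    captured.push(url);
    if (depth >= maxDepth) continue; // global limit reached: capture, don't expand
    const host = new URL(url).host;
    for (const next of links.get(url) ?? []) {
      if (seen.has(next)) continue;
      const nextSiteDepth = new URL(next).host === host ? siteDepth + 1 : 0;
      if (nextSiteDepth > maxSiteDepth) continue; // too deep within one site
      seen.add(next);
      queue.push({ url: next, depth: depth + 1, siteDepth: nextSiteDepth });
    }
  }
  return captured;
}
```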

Wikipedia collection

One such collection is Wikipedia Outlinks.8 This collection is fed several times a day with bulk crawls of new URLs. The URLs are extracted from recently edited or created Wikipedia articles, as discovered via the events from stream.wikimedia.org (Source code: crawling-for-nomore404).
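Whatever the exact pipeline in crawling-for-nomore404, a core step is deciding which URLs are new. The helper below is hypothetical and not taken from that project: it compares an article's external links before and after an edit, and skips URLs that were already queued.

```javascript
// Hypothetical helper: return external links that an edit added and that
// haven't been queued for capture yet. `queued` is updated in place.
function newOutlinks(before, after, queued) {
  const old = new Set(before);
  const added = [];
  for (const url of after) {
    if (old.has(url) || queued.has(url)) continue;
    queued.add(url); // remember it so later edits (or duplicates) don't re-queue it
    added.push(url);
  }
  return added;
}
```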

en.wikipedia.org, revision by Krinkle, on 30 May 2022 at 21:03:30.

Last month, I edited the VodafoneZiggo article on Wikipedia. My edit added several new citations. The articles I cited were from several years ago, and most had already made their way into the Wayback Machine by other means. Among my citations was a 2010 article from an Irish news site (rtl.ie). I searched for it on archive.org, and no snapshots of that URL existed.

A day later I searched again, and there it was!

web.archive.org found 1 result, captured at 30 May 2022 21:03:55. This capture was collected by: Wikipedia Eventstream.

I should note that, while the snapshot was uploaded a day later, the crawling occurred in real-time. I published my edit to Wikipedia on May 30th, at 21:03:30 UTC. The snapshot of the referenced source article was captured at 21:03:55 UTC. A mere 25 seconds later!

In addition to archiving citations for future use, Wikipedia also integrates with the Internet Archive in the present. The so-called InternetArchiveBot (source code) continuously crawls Wikipedia, looking for “dead” links. When it finds one, it searches the Wayback Machine for a matching snapshot, preferring one taken on or near the date that the citation was originally added to Wikipedia. This is important for online citations, as web pages may change over time.
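Choosing a snapshot "on or near" the citation date comes down to minimizing the time difference. Here is a sketch over Wayback-style 14-digit timestamps (YYYYMMDDhhmmss, UTC). The candidate list, the function names, and the tie-break rule (prefer the earlier snapshot) are assumptions, not InternetArchiveBot's actual logic.

```javascript
// Convert a Wayback-style "YYYYMMDDhhmmss" timestamp (UTC) to epoch milliseconds.
// Expects at least the 8-digit date part; missing time digits count as 0.
function waybackTimeToMs(ts) {
  const part = (start, len) => Number(ts.slice(start, start + len));
  return Date.UTC(part(0, 4), part(4, 2) - 1, part(6, 2), part(8, 2), part(10, 2), part(12, 2));
}

// Pick the snapshot closest in time to when the citation was added.
// Ties go to the earlier snapshot; returns null if there are no candidates.
function closestSnapshot(snapshotTimestamps, citationTimestamp) {
  const target = waybackTimeToMs(citationTimestamp);
  let best = null;
  let bestDistance = Infinity;
  for (const ts of snapshotTimestamps) {
    const distance = Math.abs(waybackTimeToMs(ts) - target);
    // Equal-length digit strings compare chronologically as plain strings.
    if (distance < bestDistance || (distance === bestDistance && ts < best)) {
      best = ts;
      bestDistance = distance;
    }
  }
  return best;
}
```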

The bot then edits Wikipedia (example) to rescue the citation by filling in the archive link.

Wikipedia.org, revision by InternetArchiveBot, on 4 June 2022. Rescuing 1 source. The source was originally cited on 29 September 2018. The added archive URL is also from 29 September 2018. web.archive.org, found 1 result, captured 29 September 2018. This capture was collected by: Wikipedia Eventstream.

WordPress collection

The NO404-WP collection on archive.org works in a similar fashion. It is fed by a crawler that uses the WordPress Firehose (source code). The firehose, as described above, is pinged by individual WordPress sites after publishing a new post.

Take, for example, this blog post by Chris. According to the post metadata, it was published at 12:00:42 UTC. By 12:01:55, about a minute later, it was captured.9

In addition to preserving blog posts, the NO404-WP collection goes a step further and also captures any new material your post links to. (Akin to Wikipedia citations!) For example, this css-tricks.com post links to a file on GitHub inside the TT1 Blocks project. This deep link was not captured before and is unlikely to be picked up by regular crawling due to depth limits. It got captured and uploaded to the NO404-WP collection a few days later.
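Finding the new material a post links to starts with extracting its outbound links. Below is a rough sketch, not the NO404-WP crawler's code: it uses a regular expression over href attributes (a real crawler would use an HTML parser), resolves relative links against the post URL, and keeps only http(s) links to other hosts. The same-host filter is an assumption for illustration.

```javascript
// Extract absolute http(s) links to other hosts from an HTML string.
// Regex-based for brevity; unusual markup may be missed.
function outboundLinks(html, pageUrl) {
  const pageHost = new URL(pageUrl).host;
  const found = new Set();
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    let url;
    try {
      url = new URL(match[1], pageUrl); // resolves relative hrefs
    } catch {
      continue; // skip hrefs that don't form a valid URL
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    if (url.host === pageHost) continue;
    url.hash = ""; // a fragment points into the same document
    found.add(url.toString());
  }
  return [...found];
}
```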


Footnotes:

  1. The “Server-sent events” technology was around as early as 2006, originating at Opera (announcement, history). It was among the first specifications to be drafted through WHATWG, which formed in 2004 after the W3C XHTML debacle.

  2. Pingback (Pingbacks explained, history) provides direct peer-to-peer discovery between blogs when one post mentions or links to another post. By the way, the Pingback and Server-Sent Events specifications were both written by Ian Hickson. 

  3. Feedbin supports push notifications. While these could come from its periodic RSS crawling, it tries to deliver them in real-time where possible. It does this by mapping pings from blogs that notify Ping-O-Matic to feed subscriptions.

  4. The weblogUpdates spec for ping servers was written in 2001 by Dave Winer, who took over Weblogs.com around that time (history) and needed something more scalable. This, by the way, is the same Dave Winer who developed the underlying XML-RPC protocol, the OPML format, and worked on RSS 2.0.

  5. That is, unless the blog owner opts out by disabling the “search engine” and “ping” settings in WordPress Admin.

  6. The Muziekweb collection is one that stores music rather than web pages. Muziekweb is a library in the Netherlands that lends physical CDs, via local libraries, to patrons. They also digitize their collection for long-term preservation. One cool application of this is that you can stream any album in full from a library computer. And… they mirror to the Internet Archive! You can search for an artist, and listen online. For copyright reasons, most music is publicly limited to 30-second samples. Through Controlled digital lending, however, you can access many more albums in full. Plus you can publicly stream any music that is in the public domain, under a free license, or recorded before 1972 and no longer commercially available.

  7. I find it particularly impressive that the Internet Archive also hosts platform emulators for the software it preserves. These platforms include not only game consoles but also Macintosh and MS-DOS, and the emulators are compiled via Emscripten to JavaScript and integrated right on the archive.org entry! For example, you can play the original Prince of Persia for Mac (via pce-macplus.js), the later color edition, or Wolfenstein 3D for MS-DOS (via js-dos or em-dosbox), or check out Bill Atkinson’s 1985 MacPaint.

  8. The “Wikipedia Outlinks” collection was originally populated via the NO404-WKP subcollection, which used the irc.wikimedia.org service from 2013 to 2019. It was phased out in favour of the wikipedia-eventstream subcollection.

  9. In practice, the ArchiveTeam URLs collection tends to beat the NO404-WP collection, and thus the latter doesn’t crawl the URL again. Perhaps the ArchiveTeam scripts also consume the WordPress Firehose? For many WordPress posts I checked, the URL is indexed only once, by “ArchiveTeam URLs”, within seconds of original publication.

Tech News issue #25, 2022 (June 20, 2022)

00:00, Monday, 20 June 2022 UTC

Tech News: 2022-25

weeklyOSM 621

09:59, Sunday, 19 June 2022 UTC

07/06/2022-13/06/2022

Chaz Hutton is probably not the only one for whom the benefits of OSM are new. [1] © G-Maps | map data © OpenStreetMap contributors

Mapping

  • Data curator arredond explained how working on map layers for the Felt company was an opportunity to find and correct tagging errors.
  • In the fourth part of a series about the specific challenges of working with map data, Daniel Mescheder wrote about the importance of tracking changes.
  • MarcoR noted that there are templates (KeyDescription and ValueDescription) in the OSM wiki that are used by taginfo to display some useful information to the user. Since the same information is included in the data element associated with the feature page, some wiki users in good faith truncate these templates to their minimum (e.g. ‘{{KeyDescription}}’), preventing taginfo from retrieving the data.
  • willkmis shared a personal view on urban road classification from a North American perspective.
  • The proposal for county, city and local highway networks in the United States was approved with 17 votes for, 0 votes against and 2 abstentions.

Community

  • Amanda McCann wrote about the new moderator team and etiquette guidelines for the talk and osmf-talk mail lists.
  • Amanda’s work report for May 2022 is available online.
  • The OpenStreetMap Taiwan Community (OSMTW) gathered for its second workshop. Whether participating on-site or online, the participants worked very hard to map on OpenStreetMap or upload related images to Wikimedia Commons. Even during the peak of the COVID-19 pandemic in Taiwan, two geography teachers attended the workshop to learn more about OpenStreetMap and how to use it in the curriculum. OpenStreetMap Taiwan will use the resources provided by the alliance grant from the Wikimedia Foundation to support related workshops scheduled from March 2022 until February 2023. OSMTW is dedicated to organising at least six street-view expeditions and six editing workshops.
  • Zhengyi Cao and Chris Park briefly reported on their projects in this year’s GSoC.
  • Ed Freyfogle talked to Ilya Zverev about mapping in general and about Ilya’s new OSM editor Every Door, in Geomob Podcast #132.
  • OSM Belgium has chosen Nicxon Piaso, from Papua New Guinea, as Mapper of the Month and introduced him in an interview.
  • Pieter Vander Vennet wrote about educational facilities, the current way of tagging them, and examined how to converge towards unified tagging of schools. Other contributors point to previous discussions on the subject and to the difficulty of unifying tagging even within a single country.

OpenStreetMap Foundation

  • Paul Norman pointed out the North American capacity issues with pyrene, the only US render server. It looks like Amazon may help with the server problem.
  • Simon Poole shared his insights about the limits and possibilities of reaching an EU-wide General Data Protection Regulation (GDPR)-compatible OSM world.

Local chapter news

  • There are still places available (de) > en at the OSM-FOSSGIS community meeting on the weekend of 1 to 3 July at the Linuxhotel in Essen. Travel is at your own expense; accommodation and meals will be provided by the FOSSGIS Association, the German regional representation of OSM.

Education

  • Anne-Karoline Distel shared a new video on the topic ‘Adding roads to hiking route relations’.
  • Daniel Capilla presented (es) > en a brief exposition of the possibilities offered by the OpenStreetMap data mining tool for those who wish to collaborate in the task of field verification of recycling containers in the municipality of Malaga, based on the key check_date and using the overpass turbo query wizard.

OSM research

  • Veniamin Veselovsky, Dipto Sarkar, Jennings Anderson and Robert Soden published a scientific paper about the development of an automated technique for identifying corporate mapping in OpenStreetMap.

Maps

  • Christoph Hormann published the third and fourth parts of a series about the depiction of trees in maps.
  • Holocrypto provides OSM Planet, Europe and Netherlands vector MBTiles for personal or educational use. The MBTiles packages are updated regularly.
  • Neue Zürcher Zeitung is publishing an OSM-based, daily updated interactive map of developments in the Ukraine war.

Software

  • Grab launched GrabMaps, which aims to tap into the US$1 billion map and location-based services market in Southeast Asia. Grab still uses OpenStreetMap as its map base.

Did you know …

  • flipcoords, OpenCage’s new tool for converting coordinates between lat/lng and lng/lat order, or into named parameters?
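The reformatting such a tool performs is simple to sketch. This is not OpenCage's code; the input format (a comma-separated pair as text) and the named-parameter output (lat= and lng= query parameters) are assumptions for illustration.

```javascript
// Parse "a, b" into two numbers; throws on malformed input.
function parsePair(text) {
  const parts = text.split(",").map((s) => s.trim());
  if (parts.length !== 2 || parts.some((s) => s === "" || !Number.isFinite(Number(s)))) {
    throw new Error(`not a coordinate pair: ${text}`);
  }
  return parts.map(Number);
}

// Swap between lat,lng and lng,lat order (applying it twice restores the input).
function flipCoords(text) {
  const [a, b] = parsePair(text);
  return `${b},${a}`;
}

// Turn a lat,lng pair into named query parameters.
function toNamedParams(text) {
  const [lat, lng] = parsePair(text);
  return new URLSearchParams({ lat: String(lat), lng: String(lng) }).toString();
}
```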

Other “geo” things

  • [1] You might think the whole world knows about OpenStreetMap, and then you read this light-bulb moment (aha moment) from Chaz Hutton on Twitter. ‘Shout out to Ed Freyfogle for getting me onto it’, Chaz comments on his new insight.
  • Without words
  • The European Space Agency has released (fr) > en a three-dimensional map of the Milky Way, including nearly two billion stars. It took ten years for the Gaia satellite, 1.5 million kilometres away from Earth, to collect the data, and the mission will continue until 2025.
  • About a hundred people were rescued (fr) > en during a school trip this week in Kleinwalsertal, Austria. The teachers were apparently misled by false information on the internet, which led the group onto the Heuberggrat path without warning them about its sheer difficulty.
  • As TechCrunch reported, Russian tech giant Yandex has removed national borders from its map apps.

Upcoming Events

Where | What | When
Arrondissement de Tours | La liberté numérique | 2022-06-18
京都市 | 京都!街歩き!マッピングパーティ:第31回 妙法院 | 2022-06-18
新店區 | OpenStreetMap 街景踏查團 #2 三峽-大溪踏查 | 2022-06-19
– | OSMF Engineering Working Group meeting | 2022-06-20
Arlon | EPN d’Arlon – Atelier ouvert OpenStreetMap – Initiation | 2022-06-21
Kaiserslautern | Erfassung von Barrieren in Kaiserslautern | 2022-06-21
Lyon | Rencontre mensuelle Lyon | 2022-06-21
– | 152. Treffen des OSM-Stammtisches Bonn | 2022-06-21
San Jose | South Bay Map Night | 2022-06-22
City of Nottingham | OSM East Midlands/Nottingham meetup (online) | 2022-06-21
– | TeachOSM Map-Along | 2022-06-22
Lüneburg | Lüneburger Mappertreffen (online) | 2022-06-21
Manila | Making OSM a Safer Space for LGBTQIA+ Mapper – An Intro to SOGIESC (Sexual Orientation, Gender Identity and Expression, and Sex Characteristics) and How to be a better Ally? | 2022-06-22
Washington | OpenStreetMap US Mappy Hour | 2022-06-23
Roma Capitale | Incontro dei mappatori romani e laziali | 2022-06-22
Kaiserslautern | Erfassung von Barrieren in Kaiserslautern | 2022-06-23
Oriental Mindoro | Open Mapping Hub Asia Pacific’s Map and Chat Hour (PRIDE Celebration) | 2022-06-24
Düsseldorf | Düsseldorfer OpenStreetMap-Treffen | 2022-06-24
IJmuiden | OSM Nederland bijeenkomst (online) | 2022-06-25
– | Tanzania Mapping Groups June Mapathon | 2022-06-25
Arlon | EPN d’Arlon – Atelier ouvert OpenStreetMap – Contribution | 2022-06-28
Hlavní město Praha | MSF Missing Maps CZ Mapathon 2022 #2 Prague, KPMG office (Florenc) | 2022-06-28
– | [Online] OpenStreetMap Foundation board of Directors – public videomeeting | 2022-06-30
Essen | 17. OSM-FOSSGIS-Communitytreffen | 2022-07-01 – 2022-07-03
San Jose | South Bay Map Night | 2022-07-06
London | Missing Maps London Mapathon | 2022-07-05
Salt Lake City | OSM Utah Monthly Meetup | 2022-07-07

Note:
If you would like to see your event here, please add it to the OSM calendar. Only events listed there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, Nordpfeil, PierZen, Sammyhawkrad, Strubbl, TheSwavu, derFred.

When a Wikipedia research project becomes a thesis

15:51, Thursday, 16 June 2022 UTC
Maria Murad. Image courtesy Maria Murad, all rights reserved.

Maria Murad decided to take Heather Sharkey’s course at the University of Pennsylvania because it involved learning how to write a Wikipedia article.

“I was already working for my school newspaper, The Daily Pennsylvanian, and I wanted to explore more avenues that allowed me to create short form, accessible content on important topics,” Maria explains. “I find that academic articles can be pretty inaccessible for most. Wikipedia or news articles are more accessible to the masses. I know when I want to learn more about something, the first thing I do is search it on Wikipedia. This course felt like an opportunity to have a meaningful impact on a platform that virtually everyone uses to learn about new topics.”

The assignment had a meaningful impact on Wikipedia’s readers, but also on Maria herself. Dr. Sharkey had provided a list of women connected to the University of Pennsylvania Museum of Archaeology and Anthropology (better known as the Penn Museum) as potential subjects for their Wikipedia assignment. One name on the list was Florence Shotridge, or Kaatxwaaxsnéi, a Native Alaskan Tlingit ethnographer, museum educator, and weaver who worked at the Penn Museum for several years. There was little to no information online about her. Intrigued, Maria set out to research her.

Florence Shotridge

“I took a lot of trips to the Museum Archives and learned that she was one of the first American Indians to lead an anthropological expedition (alongside her husband), an excellent Chilkat weaver, and a museum educator guide to schoolchildren who would visit the museum,” Maria says. “Her husband, Louis Shotridge, already had a Wikipedia article and there was a lot of information about him at the Museum, but it seemed like Florence’s legacy was mostly invisible.”

Maria made it more visible by creating her biography on Wikipedia. But the assignment inspired Maria further: She also made Florence Shotridge the focus of her senior thesis, including creating a short documentary film about her life.

“I think one of the best skills I gained from writing for Wikipedia was the ability to succinctly synthesize various sources,” Maria says. “Since little was written about most of these women before, I had to combine primary research I discovered in the Museum archives with object histories in the Museum with brief mentions in academic articles. I had to marry a variety of sources together in a clear and accessible way in order to publish it on Wikipedia. I think this is a very important skill to have in academic writing.”

It’s a skill Maria is now putting to use. A Kentucky native, she graduated from Penn in 2021. Now, she’s studying Visual, Material, and Museum Anthropology in a master’s program at the University of Oxford. While Oxford is keeping her busy, she hopes to get back to editing Wikipedia soon, especially creating new articles about women.

“Though supplementing details and information on extant articles was a worthwhile and rewarding task, it felt very special to contribute something new to the platform that would lead to so many more people learning about important women in Penn’s history that would have never known about them before,” says Maria.

To learn more about the Wikipedia Student Program, visit teach.wikiedu.org.

Image credit: Bain News Service, publisher, Public domain, via Wikimedia Commons

Production Excellence #44: May 2022

01:13, Thursday, 16 June 2022 UTC

How’d we do in our striving for operational excellence last month? Read on to find out!

Incidents

By golly, we've had quite the month! 10 documented incidents, which is more than three times the two-year median of 3. The last time we experienced ten or more incidents in one month was June 2019, when we had eleven (Incident graphs, Excellence monthly of June 2019).

I'd like to draw your attention to something positive. As you read the list below, take note of incidents that did not impact public services and did not have lasting impact or data loss. For example, the Apache incident benefited from PyBal's automatic health-based depooling. The deployment server incident recovered without loss thanks to Bacula. The Etcd incident impact was limited by serving stale data. And the Hadoop incident recovered by resuming from Kafka right where it left off.

2022-05-01 etcd
Impact: For 2 hours, Conftool could not sync Etcd data between our core data centers. Puppet and some other internal services were unavailable or out of sync. The issue was isolated, with no impact on public services.

2022-05-02 deployment server
Impact: For 4 hours, we could not update or deploy MediaWiki and other services, due to corruption on the active deployment server. No impact on public services.

2022-05-05 site outage
Impact: For 20 minutes, all wikis were unreachable for logged-in users and non-cached pages. This was due to a GlobalBlocks schema change causing significant slowdown in a frequent database query.

2022-05-09 Codfw confctl
Impact: For 5 minutes, all web traffic routed to Codfw received error responses. This affected central USA and South America (local time after midnight). The cause was human error and lack of CLI parameter validation.

2022-05-09 exim-bdat-errors
Impact: Over five days, about 14,000 incoming emails from Gmail users to wikimedia.org were rejected and returned to sender.

2022-05-21 varnish cache busting
Impact: For 2 minutes, all wikis and services behind our CDN were unavailable to all users.

2022-05-24 failed Apache restart
Impact: For 35 minutes, numerous internal services that use Apache on the backend were down. This included Kibana (logstash) and Matomo (piwik). For 20 of those minutes, there was also reduced MediaWiki server capacity, but no measurable end-user impact for wiki traffic.

2022-05-25 de.wikipedia.org
Impact: For 6 minutes, a portion of logged-in users and non-cached pages experienced a slower response or an error. This was due to increased load on one of the databases.

2022-05-26 m1 database hardware
Impact: For 12 minutes, internal services hosted on the m1 database (e.g. Etherpad) were unavailable or at reduced capacity.

2022-05-31 Analytics Hadoop failure
Impact: For 1 hour, all HDFS writes and reads were failing. After recovery, ingestion from Kafka resumed and caught up. No data loss or other lasting impact on the Data Lake.


Incident follow-up

Recently completed incident follow-up:

Invalid confctl selector should either error out or select nothing
Filed by Amir (@Ladsgroup) after the confctl incident this past month. Giuseppe (@Joe) implemented CLI parameter validation to prevent human error from causing a similar outage in the future.

Backup opensearch dashboards data
Filed back in 2019 by Filippo (@fgiunchedi). The OpenSearch homepage dashboard (at logstash.wikimedia.org) was accidentally deleted last month. Bryan (@bd808) tracked down its content and re-created it. Cole (@colewhite) and Jaime (@jcrespo) worked out a strategy and set up automated backups going forward.
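The idea behind the first fix, rejecting a malformed selector rather than letting it match more than intended, can be illustrated generically. The key=value,key=value syntax, the allowed keys, and the name parseSelector are assumptions for illustration, not confctl's actual grammar or implementation.

```javascript
// Hypothetical allowed keys for a selector such as "dc=codfw,cluster=appserver".
const ALLOWED_KEYS = new Set(["dc", "cluster", "service", "name"]);

// Parse and validate a selector. Any doubt raises an error instead of guessing,
// so a typo can never silently widen the selection.
function parseSelector(text) {
  if (text.trim() === "") throw new Error("empty selector");
  const selector = {};
  for (const term of text.split(",")) {
    const eq = term.indexOf("=");
    const key = eq === -1 ? "" : term.slice(0, eq).trim();
    const value = eq === -1 ? "" : term.slice(eq + 1).trim();
    if (key === "" || value === "") {
      throw new Error(`malformed selector term: "${term}"`);
    }
    if (!ALLOWED_KEYS.has(key)) throw new Error(`unknown selector key: "${key}"`);
    if (Object.prototype.hasOwnProperty.call(selector, key)) {
      throw new Error(`duplicate selector key: "${key}"`);
    }
    selector[key] = value;
  }
  return selector;
}
```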

Remember to review and schedule Incident Follow-up work in Phabricator! These are preventive measures and tech debt mitigations written down after an incident is concluded. Read more about past incidents at Incident status on Wikitech.

💡Did you know?: The form on the Incident status page now includes a date, to more easily create backdated reports.

Trends

In May we discovered 28 new production errors, of which 20 remain unresolved and have carried over into June.

Last month the workboard totalled 292 tasks still open from prior months. Since the last edition, we completed 11 tasks from previous months, gained 11 additional errors from May (part of May was already counted in last month's edition), and have 7 fresh errors in the current month of June. As of today, the workboard houses 299 open production error tasks (spreadsheet, phab report).
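The workboard figures reconcile as follows:

```latex
\underbrace{292}_{\text{open at last edition}}
- \underbrace{11}_{\text{completed}}
+ \underbrace{11}_{\text{added from May}}
+ \underbrace{7}_{\text{new in June}}
= 299
```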

Take a look at the workboard and look for tasks that could use your help.


Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

Illuminating pathways to technical documentation

14:00, Wednesday, 15 June 2022 UTC

Wikimedia has many entrance points for developers and technical contributors. Our technical community includes people with a wide range of skills, experience, and interests—from students learning to build their first app, to professional software engineers with deep knowledge of MediaWiki. Many developers use APIs to access free knowledge content, and many volunteers contribute their technical skills to Wikimedia open source projects.

Today, the Wikimedia Foundation Developer Advocacy team is launching a new, centralized entry point for finding technical documentation and community resources: the Wikimedia Developer Portal. Just like the community itself, our technical documentation takes many forms and resides in many different places. From mediawiki.org to GitHub to Wikitech, the key information developers need may exist in wiki pages, code repositories, websites, and more. This complex landscape can make it hard to find the information you need. The goal of the Developer Portal project was to make it easier for developers and technical contributors to:

  • Find the information they need to achieve a certain task.
  • Discover available tools and technologies.
  • Learn how to get started in Wikimedia technical areas.

Understanding audiences and their documentation needs

Part of the complexity of Wikimedia technical documentation comes from the multiple audiences it serves. The new Developer Portal seeks to support the following types of users:

Content reusers

Developers who want to use Wikimedia content in their projects.

Data consumers and researchers

Data scientists, machine learning engineers, and researchers who want to use Wikimedia data in their projects.

Tool developers

Wikimedians who have created or contributed to Wikimedia tools.

New tool developers

Wikimedians who are interested in learning to create, contribute to, and use Wikimedia tools, usually to solve a problem they face in maintaining their local wiki.

Open source contributors

Developers who want to use their skills to contribute to Wikimedia technical projects.

Learners

Students who are learning programming and are interested in Wikimedia technical projects.

Each of these audiences has different (but often overlapping) goals, motivations, and tasks to complete. They have different information needs and may rely on different types of documentation. How can we help all these types of users find what they need without risking information overload?

To begin, we talked with stakeholders, gathered community feedback, and reviewed previous research about developer documentation hubs. We then completed multiple rounds of user research to identify the major user journeys for each audience. A user journey captures the steps to complete a specific task. For example, the user journey below illustrates the steps a new developer might take to start contributing code to a Wikimedia open source project:

Example user journey for open source contributors

Applying user journeys to guide our content strategy enabled us to identify the key docs necessary for each step in the journey. Key docs might be landing pages for a major technical area, tutorials for a given topic, or overviews that explain essential concepts. 

Improving technical docs and processes

This project involved an entirely new user research process for our team. We worked with the Wikimedia Foundation Design Strategy team to design a research study and recruit a diverse group of users to test the final version of the site. We learned a lot from this process! Here are some key takeaways:

  • User types overlap more than we expected.
  • People have very different navigation behaviors: some use menus heavily, others use page content, and sometimes that behavior changes based on context. Offering multiple paths to information is important.
  • Users appreciate the option to view content in languages other than English, but most developers say they would still use the site in English because that’s the language of most technical content. 

Helping users discover key technical docs is only part of the solution; the information in those docs must also be reliable, up to date, and user-friendly. To ensure that the Developer Portal links to high-quality documentation, we developed a documentation review process. It currently includes checklists to standardize documentation reviews and make it simpler for anyone to help improve documentation. We’ve used this process to review and improve over 20 key docs so far, and we’re continuing to streamline and scale the process. While work in this area continues, today’s launch of the Developer Portal is an exciting step towards better empowering developers to share in the sum of all knowledge.

Image credits: Mam’Gobozi Design Factory (MDF), CC BY-SA 3.0, via Wikimedia Commons

New discovery tool for technical documentation

14:00, Wednesday, 15 June 2022 UTC

The technical community now has a new tool to discover information. The Developer Portal is a centralized entry point to help you:

  • Find the key documentation you need for common developer tasks.
  • Discover available tools and technologies.
  • Learn how to get started in Wikimedia technical areas.

For a general overview of this project, see the companion post on Diff; this post focuses on details of the Developer Portal relevant to technical audiences.

Design principles

At its core, the Developer Portal is an index of categorized links to key sources of technical information. These sources are hosted primarily on wikis—the portal contains no actual documentation. 

Technical writers, developer advocates, software engineers, and designers worked together to create the Developer Portal, with lots of input and feedback from the community. We did user research, analyzed documents, created a content strategy, and implemented the portal as a static site. 

In designing the Developer Portal, we followed these principles:

  • Progressive disclosure: Avoid information overload by limiting the amount of content on each page. Provide only relevant, contextualized information at each step.
  • Well-lit paths: Focus on the most important and reliable resources. Do not attempt to index documentation for all Wikimedia technologies. Prioritize content that lowers barriers to entry.
  • Inclusivity: Support the widest possible range of developers (see the Diff blog post for details). This includes providing translations and making the portal accessible and usable on low-speed internet connections.

Paths to explore

Browse tutorials

Tutorials are crucial to help developers get started with Wikimedia technology, but it can be hard to find tutorials when they live on different wikis.  The Developer Portal makes it easier to browse available tutorials:

Explore by programming language

In feedback sessions and user testing, developers often express a desire to browse documentation and projects by programming language. The Developer Portal provides several paths to do that:

Find community and educational resources

It can be hard to keep up with all the events, news, and opportunities in the technical community! The Developer Portal brings together some essential resources to help people stay connected:

Technical architecture

The navigation-focused design of the portal made its implementation different from a standard, content-focused website. We wanted to use a static site generator to simplify the process of constructing and rendering the portal, but we didn’t want to create many pages with paragraphs of written content. Instead, each page on the portal displays a collection of links to wiki pages or other key technical resources. We wanted to maintain only a single description of each key technical resource, and be able to easily combine or transclude those units to create modular collections of links.

The Developer Portal site is generated by MkDocs using the Material for MkDocs theme. We built custom plugins to integrate with translatewiki.net and to render Markdown pages based on categorized sets of links, which are described in YAML files. For more implementation details, see the Developer Portal docs on mediawiki.org.
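The idea of keeping one description per resource and assembling pages from those units can be illustrated with a small Python sketch. It does not reproduce the portal’s actual MkDocs plugins: the `LINKS` and `PAGES` mappings, their field names, and `render_page` are hypothetical, and in the real site this data lives in YAML files rather than Python literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One key technical resource, described exactly once."""
    title: str
    url: str
    description: str


# Hypothetical registry: each resource is defined a single time, keyed by an ID.
LINKS = {
    "action-api": Link(
        "Action API",
        "https://www.mediawiki.org/wiki/API:Main_page",
        "Read and write wiki content programmatically.",
    ),
    "toolforge": Link(
        "Toolforge",
        "https://wikitech.wikimedia.org/wiki/Portal:Toolforge",
        "Host bots, tools, and web services.",
    ),
}

# Hypothetical page definitions: pages only reference link IDs, so the same
# description can be reused (transcluded) on several pages.
PAGES = {
    "Use content": ["action-api"],
    "Build tools": ["toolforge", "action-api"],
}


def render_page(title: str) -> str:
    """Render one portal page as Markdown from its list of link IDs."""
    lines = [f"# {title}", ""]
    for link_id in PAGES[title]:
        link = LINKS[link_id]  # a KeyError here surfaces typos in page definitions
        lines.append(f"- [{link.title}]({link.url}): {link.description}")
    return "\n".join(lines) + "\n"


print(render_page("Build tools"))
```

Because pages hold only IDs, editing a description in one place updates every page that lists that resource, which is the property the paragraph above describes.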

Ongoing and future work

So far, the portal is 100% translated into French, Macedonian, and Turkish. If you can help add translations in more languages, visit the project’s page on translatewiki.net!

In the future, we plan to do more user testing in languages other than English, and we’d also like to test with users of assistive technology.

A major part of this project includes reviewing and updating the key documents linked from the Developer Portal.  We’re continuing that work into the coming year, and also investigating how to improve and scale the process.  Helping the technical community find documentation is just the first step—the larger goal is to empower everyone to contribute to and benefit from high-quality, reliable information about Wikimedia technical projects.

Upload your photographs during June to be in with a chance of winning country and national prizes.

This year, for the second time, Wales is taking part in the international photography competition ‘Wiki Loves Earth’ organised by the Wikimedia movement. Founded 9 years ago as a focus for nature heritage, the competition aims to raise awareness of protected sites. The Welsh campaign is also organised by Wikimedia UK, the National Library of Wales and WiciMon.

Robin Owain, who leads the Wikimedia UK projects across Wales, said, “This year, our key supporters include the Welsh Government, the Ramblers Association and all three National Parks! We are calling on people across Wales to share their photographs of nature: flora, fauna and fungi!”

This is one of the largest photography competitions in the world focusing on National Parks, Sites of Special Scientific Interest and all protected areas. Robin explained, “The biodiversity and geology of Wales is unique, and this competition allows Welsh photographers to share our protected areas on a world stage.”

Other organisations supporting this exciting competition include Natural Resources Wales; all three National Parks: Eryri (Snowdonia), Pembrokeshire and the Brecon Beacons; Ramblers (Cymru); and both the Edward Llwyd and Llên Natur nature societies.

Examples of past winners can be seen at http://wikilovesearth.org and last year’s Welsh winners can be found here.

Any photographs you have taken in the past can be uploaded during June, with prizes awarded to winners at both country and national level. Robin added, “The competition is open to everyone. We play rugby and football on the world stage, therefore we ask our friends, volunteers and staff to take photographs on that international stage, and at the same time exhibit their photographs of our diverse countryside.”

Read more about Wiki Loves Earth 2022 in Wales here on Wikimedia Commons.

More on Wiki Loves Earth can be found here.

Further information & images [email protected]

About Wikimedia UK here

The post Wales and international photography competition #WikiLovesEarth appeared first on WMUK.

Tech/News/2022/24

07:00, Tuesday, 14 June 2022 UTC

Other languages: Bahasa Indonesia, Deutsch, English, Yorùbá, español, français, italiano, magyar, polski, português, português do Brasil, suomi, svenska, čeština, Ελληνικά, русский, українська, עברית, العربية, فارسی, বাংলা, 中文, 日本語

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

Changes later this week

  • The new version of MediaWiki will be on test wikis and MediaWiki.org from 14 June. It will be on non-Wikipedia wikis and some Wikipedias from 15 June. It will be on all wikis from 16 June (calendar).
  • Some wikis will be read-only for a few minutes while their main database is switched. The switch will take place on 14 June at 06:00 UTC (targeted wikis). [3]
  • Starting on Wednesday, a new set of Wikipedias will get “Add a link” (Abkhazian Wikipedia, Achinese Wikipedia, Adyghe Wikipedia, Afrikaans Wikipedia, Akan Wikipedia, Alemannisch Wikipedia, Amharic Wikipedia, Aragonese Wikipedia, Old English Wikipedia, Aramaic Wikipedia, Egyptian Arabic Wikipedia, Asturian Wikipedia, Atikamekw Wikipedia, Avaric Wikipedia, Aymara Wikipedia, Azerbaijani Wikipedia, South Azerbaijani Wikipedia). This is part of the progressive deployment of this tool to more Wikipedias. The communities can configure how this feature works locally. [4]
  • The New Topic Tool will be deployed for all editors at Commons, Wikidata, and some other wikis soon. You will be able to opt out from within the tool and in Preferences. [5][6]

Future meetings

  • The next open meeting with the Web team about Vector (2022) will take place today (13 June). The following meetings will take place on: 28 June, 12 July, 26 July.

Future changes

  • By the end of July, the Vector 2022 skin should be ready to become the default across all wikis. Discussions on how to adjust it to the communities’ needs will begin in the coming weeks. It will always be possible to revert to the previous version on an individual basis. Learn more.

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

Explore wiki project data faster with mwsql

21:24, Monday, 13 June 2022 UTC

By Slavina Stefanova, Wikimedia Cloud Services

The mwsql library is the latest addition to MediaWiki-utilities, a collection of lightweight Python tools for extracting and processing MediaWiki data. It provides a simple interface for downloading, inspecting, and transforming SQL dump files into other more user-friendly formats such as Pandas dataframes or CSV. mwsql is available through PyPI and can be installed using pip.

Why mwsql?

Data from Wikimedia projects is open-source licensed and publicly available in a variety of formats, such as:

While utilities for working with most of these data sources have existed for quite some time, for example mwapi and mwxml, no such tool existed for SQL dumps. Because of this gap, developing mwsql was proposed as a joint Outreachy project between the Research and Technical Engagement teams during the May-August round of 2021.

SQL dumps

Before diving into exploring the different features of mwsql, let’s take a look at what a raw SQL dump file looks like.

A dump of SQL data

The dump contains information about the database table structure, as well as the actual table contents (records) in the form of a list of SQL statements, plus some additional metadata. Database dumps are most often used for backing up a database so that its contents can be restored in the event of data loss. They are not designed to be worked with ‘as is’ (e.g., parsed, filtered, or searched). However, being able to access data directly from the dumps enables offline processing and lowers the barrier to entry for users such as data scientists, researchers, or journalists, because the only prerequisite is basic Python knowledge.
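To make the structure concrete, here is a short, standard-library-only Python sketch that pulls records out of an `INSERT INTO ... VALUES (...),(...);` statement of the kind found in these dumps. It is not how mwsql is implemented: `parse_insert_values` and the `example` table are hypothetical, the sample values are invented, and the tokenizer only handles quoted strings, backslash escapes, `NULL`, and bare numbers.

```python
def _convert(token: str, quoted: bool) -> object:
    """Map one SQL literal to a Python value: str, None, int, or float."""
    if quoted:
        return token
    if token == "NULL":
        return None
    try:
        return int(token)
    except ValueError:
        return float(token)  # raises ValueError for any other unquoted literal


def parse_insert_values(statement: str) -> list[list[object]]:
    """Split the VALUES part of one INSERT statement into rows of Python values."""
    values = statement.split(" VALUES ", 1)[1]
    rows: list[list[object]] = []
    row: list[object] = []
    token: list[str] = []
    in_row = in_string = quoted = False
    chars = iter(values)
    for ch in chars:
        if in_string:
            if ch == "\\":
                # Simplification: keep the escaped character as-is,
                # so \' becomes ' (correct) but \n becomes n (not a newline).
                token.append(next(chars, ""))
            elif ch == "'":
                in_string = False
            else:
                token.append(ch)
        elif not in_row:
            if ch == "(":
                in_row = True  # commas and the final ';' between rows are skipped
        elif ch == "'":
            in_string = quoted = True
        elif ch in ",)":
            # End of a field: convert it and start the next one.
            row.append(_convert("".join(token), quoted))
            token, quoted = [], False
            if ch == ")":
                rows.append(row)
                row, in_row = [], False
        elif not ch.isspace():
            token.append(ch)
    return rows


# Invented sample line, shaped like a row-insert statement in a SQL dump.
SAMPLE = (
    "INSERT INTO `example` VALUES "
    "(1,'mw-replace',0,10453),(2,'visualeditor',0,309141),(3,'it\\'s',1,NULL);"
)

for parsed_row in parse_insert_values(SAMPLE):
    print(parsed_row)
```

Commas and parentheses inside quoted strings are treated as text, which is why a character-level scan is used here instead of simply splitting on `),(`.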

mwsql features

mwsql’s main features are:

  • easily downloading SQL dump files
  • parsing the database table into a Dump object
  • allowing fast exploration of the table’s metadata and contents
  • transforming the SQL dump into other more convenient data structures and file formats
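The download and transform steps can be approximated with the standard library alone. This is not mwsql’s API: `dump_url` and `rows_to_csv` are hypothetical helpers, the URL follows the `https://dumps.wikimedia.org/<wiki>/<date>/<wiki>-<date>-<table>.sql.gz` naming pattern used for public dumps, and the rows stand in for records already parsed from a dump. The library’s own calls for these steps are shown in the project’s tutorial notebook.

```python
import csv
import io


def dump_url(wiki: str, table: str, date: str = "latest") -> str:
    """Build the download URL for one table's SQL dump (assumed naming scheme)."""
    return f"https://dumps.wikimedia.org/{wiki}/{date}/{wiki}-{date}-{table}.sql.gz"


def rows_to_csv(columns: list[str], rows: list[list[object]]) -> str:
    """Serialize parsed rows to CSV text; None (SQL NULL) becomes an empty cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)  # header row with the table's column names
    writer.writerows(rows)
    return buffer.getvalue()


print(dump_url("simplewiki", "page"))
print(rows_to_csv(["id", "name", "count"], [[1, "mw-replace", 10453], [2, "a,b", None]]))
```

The resulting CSV text can then be loaded into a Pandas dataframe with `pandas.read_csv(io.StringIO(text))`, which mirrors the "more convenient data structures and file formats" step listed above.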

Use mwsql with a wiki data dump

The rest of this tutorial demonstrates each of these features through a concrete example hosted on GitHub: a Jupyter notebook that walks through downloading dump files, parsing the SQL dump file, exploring the data, and writing to CSV.

You’re welcome to clone, fork, or adapt the Jupyter notebook containing the source code for this tutorial to meet your needs.

Future of mwsql

As many of the dump files are huge (>10GB), having to download them before being able to process their contents can be time-consuming. This is less of a problem in a WMF-hosted environment, such as PAWS, where the dumps are available through a public directory. Having the opportunity to inspect a file before committing to download all of it, as well as being able to process it as it is downloading (streaming), would be a huge performance improvement for users working in non-WMF environments.
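The streaming idea can be sketched with Python’s standard library: `gzip.open` accepts an existing binary file object, so in principle the response from `urllib.request.urlopen` could be decompressed and scanned line by line as it arrives, without first saving the whole file. `iter_insert_statements` is a hypothetical helper, not part of mwsql, and the example feeds it an in-memory buffer instead of a network stream.

```python
import gzip
import io
from typing import BinaryIO, Iterator


def iter_insert_statements(raw: BinaryIO) -> Iterator[str]:
    """Yield INSERT statements from a gzip-compressed SQL dump, one line at a time.

    Memory use is bounded by the longest line rather than by the file size,
    so processing can start long before a large dump has been fully read.
    """
    with gzip.open(raw, mode="rt", encoding="utf-8", errors="replace") as lines:
        for line in lines:
            if line.startswith("INSERT INTO"):
                yield line.rstrip("\n")


# A tiny compressed "dump" built in memory stands in for a download.
sample = (
    b"-- MySQL dump\n"
    b"CREATE TABLE `t` (`id` int);\n"
    b"INSERT INTO `t` VALUES (1),(2);\n"
    b"INSERT INTO `t` VALUES (3);\n"
)
for statement in iter_insert_statements(io.BytesIO(gzip.compress(sample))):
    print(statement)
```

Against a real file, the same generator could be given `urllib.request.urlopen(url)` directly; inspecting only the first few statements before committing to a full download then amounts to stopping the loop early.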

mwsql project info

The project repository is hosted on GitHub. Anyone is welcome to submit a patch, file a bug report, request new features, and help improve the existing documentation. Have you used mwsql to do something interesting with Wikimedia data? Leave a post on this Talk page, and together we can think of a way to showcase your work.

Further reading

This tutorial explains how you can use mwsql along with other tools from the MediaWiki-utilities suite and Pandas to explore how mobile editing has evolved over time.

13 June 2022, San Francisco — The Wikimedia Foundation today announced the appointment of Selena Deckelmann as Chief Product and Technology Officer. Selena is currently serving as Senior Vice President of Mozilla, where she is responsible for Firefox. She will officially join on August 1, 2022.

Selena will lead the product and technology teams at the Wikimedia Foundation. These teams support the technology infrastructure and innovation that powers Wikimedia projects, including Wikipedia, one of the most visited websites in the world with more than 16 billion pageviews per month. They also enable more than 300,000 global volunteers to edit Wikimedia projects each month. 

“Selena has a proven track record of delivering results by enabling individuals and teams to tackle unique and often complex challenges,” said Maryana Iskander, CEO of the Wikimedia Foundation. “She has dedicated her career to open source technologies for empowerment and inclusion.” 

At Mozilla, where she has been for nearly a decade, Selena currently leads the Firefox organization of more than 400 people responsible for all Firefox product and technology functions including desktop, mobile, web platform, and browser services. She oversaw some of the company’s most significant achievements including performance projects like Quantum Flow, architectural changes like Project Fission, key features like Enhanced Tracking Protection and Total Cookie Protection, and services such as Firefox Monitor. In her nine years at Mozilla, Selena held various other roles including Vice President for Firefox Desktop, Senior Director for Web Platform Engineering and Gecko Runtime, and Senior Manager for Gecko Security Engineering. 

Selena also brings experience from her previous roles as co-founder of Prime Radiant, a software as a service business that explored how to improve business processes at scale with checklist automation software, and as Consulting Director of Development for The Ada Initiative, an organization that was dedicated to increasing the participation of women in open source and technology communities. She was a major contributor to PostgreSQL, one of the largest free and open source databases in the world. 

“Open collaboration produces better solutions for the world, and technology is a critical enabler of making this true,” said Selena. “I look forward to contributing to Wikimedia’s inspiring free knowledge mission.”

As Chief Product and Technology Officer, Selena will work with Wikimedia Foundation staff, technical contributors, volunteer developers, researchers, and communities to support Wikimedia’s 2030 Movement Strategy to advance free and open access to knowledge. The majority of the Foundation is focused on product and technology development in service of our mission.


Wikipedia is not just a place where the world goes for quick and reliable information. It’s a place where stories can be reframed, where the record can be corrected, where longstanding inequities can be addressed. This is exactly what Professor Nicole Lugosi-Schimpf’s students at the University of Alberta attempted to do in her Fall 2020 course on Colonialism and the Criminal Justice System in Canada. Content related to Indigenous communities is woefully underdeveloped on Wikipedia, and Professor Lugosi-Schimpf’s students tackled this glaring content gap through the lens of criminal justice.

We wrote about Professor Lugosi-Schimpf’s class last year, highlighting some of the important contributions her students made. With the field being wide open, her students tackled everything from specific court cases involving Indigenous populations to the very broad subject of Indigenous Peoples and the Canadian Criminal Justice System itself. As the student who wrote this article noted, “The most difficult part of selecting a topic was that every relevant topic I considered writing about would have first required educating the reader on the broader context of the Indigenous experience in Canada. This is because there was no relevant or accurate article to backlink to. This was the basis for the decision to write on the broad topic of ‘Indigenous Peoples and the Canadian Criminal Justice System’.”

In her recent paper, Theorizing and implementing meaningful Indigenization: Wikipedia as an opportunity for course-based digital advocacy, Professor Lugosi-Schimpf and two of her students provide a systematic overview of how the Wikipedia assignment can be a critical tool in the process of decolonial-Indigenization. As they note, “To realize decolonial-Indigenization aims, instructors must acknowledge that it is not theoretically or pedagogically possible to understand and teach about Indigenous oppression without attention to how colonialism and systemic racism are intertwined.” In other words, it’s not enough to provide content where none exists; that content must also be carefully curated so that it accurately represents the experience of Indigenous populations. All too often these communities are presented as victims, and this idea is perpetuated in all aspects of society, from the media to institutions of higher education. But as the article notes, “Wikipedia, if properly curated, can play an important role in decolonial-Indigenization projects.”

Who edits Wikipedia matters. As Lugosi-Schimpf and her co-authors so eloquently write, “A result of unrepresentative authorship is unrepresentative content.” This is especially true on Wikipedia where the majority of editors identify as white, male, and from Western countries. Wiki Education has long striven to not only diversify Wikipedia’s content, but to diversify its editor base as well. Diversity of content and authorship are two sides of the same coin. Though all Wikipedia articles are supposed to be neutrally written and entirely fact-based, the author ultimately decides which facts to include and which to leave out. Wikipedia can provide a diverse array of communities with an opportunity to shape their own narrative within Wikipedia’s guidelines designed to uphold accuracy and reliability. As Lugosi-Schimpf notes in the article, “Editing Wikipedia was an impetus for students to contemplate what narratives and histories are told, how they are told, and by whom. … Actively engaging the politics of citation affords students an invaluable opportunity to push back against disciplinary canons often found on syllabi to bring scholarship from the margins to the forefront.”

Peoples from historically marginalized communities have largely been left out of the story because they have rarely been given the chance to write their own narratives. If they are injected into mainstream history, it’s often as victims without agency or depth. Wikipedia offers such people a unique space to present history in and on their own terms. As Professor Lugosi-Schimpf notes, “From the course evaluations, it was clear the Wikipedia experience was rewarding for all of the students, and it was especially meaningful for the students that identify as BIPOC and/or sexual and gender minorities whose voices and perspectives are often missing from mainstream media.” And as one of Lugosi-Schimpf’s students confirmed, “As a Black bi-racial woman, I have embodied experiences with misrepresentation and stereotyping that stems from structures of white supremacy and systemic racism. The opportunity to create Wikipedia content that dispelled taken for granted assumptions for another equity seeking group, from within a supported environment, was both empowering and inspiring.”

Wikipedia is in many ways a reflection of the systemic biases present throughout society. Its reliance strictly on written sources means that many peoples and cultures are left out because they have been left out of the written record. It’s often argued that Wikipedia isn’t a place for activism. Its requirements around neutrality dictate that no single point of view should dominate an article. Its notability policies have often been criticized for excluding those who have been left out of the written record — namely historically marginalized communities. In spite of its limitations, Lugosi-Schimpf and her co-authors argue that Wikipedia is in fact a place where longstanding institutional biases can be overturned: “Despite its constraints, we assert that Wikipedia can still be leveraged as a site of digital advocacy to foster positive change. For example, once a reader has more facts and sees an assemblage of colonial projects, it is difficult to refute the damage done by settler-colonialism. Even a balanced viewpoint can cause readers to question their taken-for-granted assumptions. Striving for neutrality, while contentious, opens Wikipedia up to be an ideal place to rewrite history, because history as previously written has not been neutral.”

When students contribute to Wikipedia, it can be easy to get caught up in the technicalities of the project and the demands of the term. At its core though, students are engaging in the politics of knowledge production. They are the ones deciding which facts to include and which to leave out, and it’s our hope that the work they produce makes Wikipedia a more equitable place.

Interested in teaching with Wikipedia? Visit teach.wikiedu.org for more information.

Image: ibourgeault_tasse, CC BY 2.0, via Wikimedia Commons

Don’t Blink: Public Policy Snapshot for May 2022

15:54, Monday, 13 June 2022 UTC

Welcome to the “Don’t Blink” series! Every month we compile developments from around the world that shape people’s ability to participate in the free knowledge movement. In case you blinked this month, here are the most important topics that have kept the Wikimedia Foundation’s Global Advocacy and Public Policy team busy in May.

To learn more about our team and the work we do, join our first-ever monthly conversation hour, follow us on Twitter (@WikimediaPolicy), sign up to our Wikimedia public policy mailing list, or visit our Meta-Wiki page.


Latin America and the Caribbean

  • World Press Freedom Day event in Uruguay: On 1 May, the Wikimedia Foundation sponsored and co-hosted an event to mark World Press Freedom Day with the Observatorio Latinoamericano de Regulación, Medios y Convergencia (OBSERVACOM). The session, titled More Transparency in Content Moderation: How Do We Achieve It?, explored transparency and accountability around online platforms’ content moderation practices. Amalia Toledo (Lead Public Policy Specialist for Latin America & the Caribbean) and Ricky Gaines (Senior Human Rights Advocacy Manager) participated in the event, which aimed to enhance debate around these issues in Latin America, so that civil society groups and their allies can better anticipate and respond to legislative proposals in the region that would threaten online communities and human rights on the internet. You can read our full recap of the event.

Asia

  • C20 global civil society forum: Rachel Arinii Judhistari (Lead Public Policy Specialist for Asia) represented the Foundation at the C20, the global civil society forum that runs parallel to the Group of 20 (G20) forum. As we wrote previously, the process convened civil society organizations from around the world to discuss issues related to digital transformation and inequities, among other topics. The Foundation is participating in the Working Group on Digitalization, Education, and Global Citizenship, and is contributing to a civil society policy brief that will shape G20 summit outcome documents on issues relating to free knowledge and internet regulation. 
  • Read our deep dive on the Australian Basic Online Safety Expectations (BOSE): Earlier this year, we published a blog post on the common pitfalls of online safety regulations. Aspects of these bills may threaten open knowledge communities and individuals’ fundamental and human rights to privacy, freedom of expression, and access to knowledge. We have now published a deep dive on the Australian government’s approach to online safety. The BOSE contain overly prescriptive content identification, removal, and enforcement expectations, as well as threats to encryption and privacy practices that could disproportionately expose historically underserved groups to online harm. 
  • Association of Southeast Asian Nations (ASEAN) meeting on human rights: On 18 May, Rachel Arinii Judhistari represented the Foundation at a meeting with the ASEAN Intergovernmental Commission on Human Rights (AICHR) Representative from Thailand. The topics discussed included online disinformation, harmful business models of social media platforms, and others. The Thai Representative will aim to organize a regional dialogue on digital rights this year, and we are looking forward to contributing the perspective of the open knowledge movement.

United States

  • Texas social media law: The US Supreme Court has blocked, at least for now, a Texas state law that would harm the free exchange of knowledge around the world and threaten the legal protections that enable the community editing model of Wikimedia projects. We signed onto an amicus brief in the case a week before this decision, which was made on 31 May. Our brief asked the US Supreme Court to stop the law from going into effect and set out the law's harmful impacts on free speech online. We signed the brief alongside allies at the Center for Democracy and Technology, Electronic Frontier Foundation, and others.
  • Declaration for the Future of the Internet: Since our last update, covering April, more than 60 countries have issued a Declaration for the Future of the Internet. We strongly support the principles in the declaration, which advance free expression and the exchange of free knowledge, and have explained that we intend to hold signatories to their commitments to support a free, open, interoperable, and accessible Internet.
  • Comments to US Copyright Office’s public inquiry: We filed comments to the United States (US) Copyright Office in its inquiry into standard technical measures (STMs) under the Digital Millennium Copyright Act (DMCA). STMs are technical tools meant to assist in identifying copyrighted works on online platforms. In our comments, we explained why STMs must be open source, that costs to free expression and privacy must be considered when identifying STMs, and our concerns that forcing platforms to use inappropriate STMs could interfere with the exchange of free knowledge and, specifically, with Wikimedia projects’ effective copyright enforcement system.

Additional Developments

  • United Kingdom (UK) government to protect internet access in Russia: The UK government has made a major decision to protect access to the free and open internet within Russia. On 30 May, the UK government exempted transactions that enable civilian telecommunications and news media services from sanctions against Russia. The decision comes on the heels of an open letter that we signed alongside allies such as AccessNow, Article 19, and Open Rights Group, urging the UK government to protect access to the global, open internet for the Russian people. This access is essential: it enables people to consult reliable information and prevents further isolation of those within the country who speak out for human rights and against the war. Our request to the UK government mirrors a similar initiative that we pursued in the context of US sanctions: in response to our advocacy efforts, President Biden authorized US internet companies to continue providing essential internet services within Russia. Kate Ruane (Lead Policy Specialist for US) led both of these efforts.
  • World Intellectual Property Organization (WIPO): On 9 May, six Wikimedia chapters (France, Germany, Italy, Mexico, Sweden and Switzerland) were denied accreditation to WIPO’s Standing Committee on Copyright and Related Rights (SCCR), the body responsible for shaping the future of global copyright policy. China was the only country to oppose the accreditation of the Wikimedia chapters, inaccurately claiming that the chapters were complicit in spreading disinformation. China also objected twice to the Wikimedia Foundation’s application for observer status, first in 2020 and again in 2021.

Announcements from our Team

  • Launch of monthly conversation hours: In June, our team is launching our monthly conversation hours. We want to create a dedicated space to engage directly with Wikimedia volunteers, affiliates, and Foundation staff. This forum offers you an opportunity to ask questions about our work, to share information about your own projects and initiatives, and to connect and learn from each other. Come and talk with us! All details, links, and dates are on our Meta page.
  • Human Rights Policy community conversations and survey: Our team has finished the first phase of Community Conversations on the Foundation’s Human Rights Policy. These events provided spaces for volunteers, Foundation staff, and contractors to ask questions about the policy, to share their own experiences, and to offer ideas and recommendations for its implementation. If you were unable to participate or wanted to share more about your experiences, please consider filling out our anonymous survey. This survey is currently being translated into multiple languages, but is already available in English. It will be conducted via a third-party service, LimeSurvey, which may subject it to additional terms. For more information on privacy and data-handling, see the survey’s privacy statement. The survey will remain open through 30 June 2022.

This Month in GLAM: May 2022

14:34, Monday, 13 June 2022 UTC

On 6 June 2022, the Wikimedia Foundation filed an appeal to challenge a Moscow Court’s decision that the Foundation committed an administrative offense by failing to remove “prohibited” information on Wikipedia, largely related to the Russian invasion of Ukraine. In its appeal, the Wikimedia Foundation argues that information on Wikipedia should be protected by freedom of expression and does not constitute disinformation, as found by the Court. The information at issue is fact-based and verified by volunteers who continuously edit and improve articles on the site; its removal would therefore constitute a violation of people’s rights to free expression and access to knowledge.

The court fined the Foundation a total of 5 million rubles (the equivalent of approximately $65,000 USD) for refusing to remove information from Russian Wikipedia articles: Russian Invasions of Ukraine (2022), Black powder, Battle for Kyiv, War Crimes during the Russian Invasion of Ukraine, Shelling of Hospital in Mariupol, Bombing of the Mariupol Theater, Massacre in Bucha. The appeal comes on the heels of a growing number of requests by the Russian government to censor fact-based knowledge on Wikipedia and Wikimedia projects amidst the government’s ongoing invasion of Ukraine. 

According to the lower Court’s decision, the information on Wikipedia is considered disinformation, which poses a risk of mass public disorder in Russia. Further, the Court declared that the Wikimedia Foundation is operating inside Russian territory and would therefore be required to comply with Russian law.

“This decision implies that well-sourced, verified knowledge on Wikipedia that is inconsistent with Russian government accounts constitutes disinformation,” said Stephen LaPorte, Associate General Counsel at the Wikimedia Foundation. “The government is targeting information that is vital to people’s lives in a time of crisis. We urge the court to reconsider in favor of everyone’s rights to knowledge access and free expression.”  

This action is part of a growing trend of companies and websites being asked to set up legal entities in the country, thereby placing users, staff, and equipment under the authority of the Russian government, and making it easier to request content removal from their platforms. 

 In addition to arguing that the Russian government’s request to remove information from Wikimedia projects constitutes a violation of human rights, the Wikimedia Foundation appeal contends that Russia does not have jurisdiction over the Wikimedia Foundation. Describing Wikipedia as operating inside of Russian territory mischaracterizes the global nature of its model. Wikipedia is a global resource available in over 300 languages. All of its language editions, including Russian Wikipedia, are available to anyone in any country around the world. 

Russian-language Wikipedia is a crucial second draft of history, written by and for Russian speakers around the world who volunteer their time to make reliable, fact-checked information available to all. Blocking access to Wikipedia in Russia would deny more than 145 million people access to this vital information resource. Further, the articles flagged for removal uphold Wikipedia’s standards of neutrality, verifiability, and reliable secondary sources to ensure articles are based in fact. They are well-sourced, including citations to a variety of established news sources. The articles continue to be improved by Wikipedia volunteer editors from all over the world with more sources and up-to-date information.

The Wikimedia Foundation remains committed to defending the right of everyone to freely access and share knowledge. We have not complied with any orders from the Russian government to date, and will continue to stand by our mission to deliver free knowledge to the world. 

The Russian government will have an opportunity to make a filing in response to our appeal in the coming weeks. 

For more information, please see our previous statements on 1 March 2022 and 3 March 2022.

Tech News issue #24, 2022 (June 13, 2022)

00:00, Monday, 13 June 2022 UTC

Tech News: 2022-24

weeklyOSM 620

10:04, Sunday, 12 June 2022 UTC

31/05/2022-06/06/2022

OSMCha with a new feature funded by Wikimedia Italy [1] © OSMCha | map data © OpenStreetMap contributors

Mapping campaigns

  • It is with great sadness that we learnt of the sudden death of Innocent Dibloni Soungalo, an OSM contributor from Burkina Faso. A geographer by training and a geomatician by profession, a former volunteer of the Francophonie in Senegal, Innocent had worked hard to strengthen OpenStreetMap in West Africa since his debut in 2015. From Cotonou to Dakar, via Ouagadougou, Bamako, Lomé or Bouaké, many people have benefited from his teachings and will mourn him. In his memory, the African OSM community has decided to map Gaoua, his home town.
  • Christoffs reported that OSM Poland (OSMP) has recently established contacts with the blind community in Poland. These contacts have helped identify the community's special needs and the potential for OSM to support them. The next step is to encourage contributors to pay special attention, when mapping, to tags that support the mobility of blind people.

Mapping

  • French news agency Agence France Presse reported (fr) > en on the colossal work of a volunteer team in Ukraine that is scanning buildings deemed of interest at a 5 mm resolution, both for historical purposes and for future rebuilding. Similar methods had been used for the Notre-Dame de Paris cathedral in France before the 2019 fire.
  • Franjo Lukezic wrote a guide to making before-and-after GIFs for visualising OpenStreetMap editing sessions.
  • The vote on the improved tagging of neighbourhood places (place=*) in Japan is open until Thursday 16 June.

Community

  • French mapper Djiril shared (fr) their main takeaways after one month of using OpenStreetMap. Future updates will be on Github (fr) > en.
  • François Lacombe posted (fr) > en a call, on LinkedIn, asking for data about communications and power poles in France. Following a partnership between OpenStreetMap France and Enedis, the main French electric power distribution company, more than one million of the estimated total of 24 million poles have been mapped.
  • The UN Mapper of the month for June is Yacouba Diarra, from Mali, a member of YouthMappers UnivSegou.

OpenStreetMap Foundation

  • The OSMF Board intends to change the requirements for becoming a normal foundation member, so that people must have edited on at least 15 days and must have registered as a mapper at least 3 months ago. The announcement triggered a lengthy discussion on osmf-talk, which is still ongoing. Mikel Maron has summarised what is proposed here.
  • Cristoffs wrote an open letter to the OSMF board, as a diary entry (following on from this issue), which attracted considerable comment, including from the EWG, who commented here, and indirectly from the repository author, here. The key request was that the style ought to reflect community requests for the display of new tags regardless of cartographic issues, and that the OSMF board (who currently don’t directly mandate what that style shows) should make that happen, due to the special status that the ‘standard’ style has (rendered by OSM itself, cached by Fastly, etc.). Topics covered also included the behaviour of style authors (and how issues there should be reported).
  • The OSMF Engineering Working Group has commissioned Jochen Topf to write a report outlining the problems with the current OSM data model, their impact on OSM systems, and possible improvements. Steve Coast, founder of OSM, responded by saying that there is nothing to fix. If the discussion on Y Combinator is any guide, this is not a truth universally acknowledged.

Events

  • The State of the Map Working Group is happy to announce that tickets and the programme are now accessible through the SotM 2022 website.

OSM research

  • You can now find out about the four accepted student projects for the Google Summer of Code 2022.

Humanitarian OSM

  • The HOT unSummit is offering travel funding for active HOT and humanitarian open mapping / open data contributors and community members to attend two of the conferences that it is supporting: FOSS4G and SOTM.

Maps

  • The AdV working group Smart Mapping has developed a bilingual map ((uk)/(de)), as an aid for refugees and as an experiment for future applications. More can be found about AdV projects at basemap.de (de) > en and AdV Smart Mapping.
  • Andy Townsend (SomeoneElse) wrote a diary entry describing the recent style changes visible on the map here, including the use of colours to differentiate tourist and mainline railway stations, locked gates, and better handling of roadside cycleway and footpath names. Most of these changes are made using Lua tag transforms to keep the styling code simple.

switch2OSM

  • Big Tech’s maps have led ride-sharing giant Grab astray. Grab is now building its own maps based on OSM and says it has become the largest contributor to OpenStreetMap in Southeast Asia.

Open Data

  • Cristiano Giovando wrote about the current state of affairs with OpenAerialMap version 2.

Software

  • OSMCha now allows users to search changesets that affect a certain OSM tag. For example: you can find changesets that created, modified or deleted restaurants. For more details, see the Development Seed blog post.
  • OsmAnd invites you to celebrate 12 years with OsmAnd! You can share a photo with your trip story on Instagram with the hashtag #12YearsOsmAnd; OsmAnd will choose the best and award a prize.
  • Sammyhawkrad built a simple tool to help OpenStreetMap contributors see statistics of their most common editing issues as flagged by Osmose.

Releases

  • Eugene Kizevich outlined what’s new in version 4.2 of OsmAnd for Android. Besides an adaptation of the map style, new quick actions and recording widgets, the ‘OSM Mapper assistant’ option was split into separate options: fixme tags, note tags, icons at low zooms, and waterway tunnels.

OSM in the media

  • French dataviz company WeDoData tweeted (fr) > en about the Arte series Europe, a disrupted continent (fr) episode on people-driven solutions to European transportation issues. They explain how they used OpenStreetMap to fetch the bicycle network at a continental scale and computed its evolution since 2014.

Other “geo” things

  • Jules Grandin has made (fr) > en a series of maps comparing humans and different livestock populations in French departments using data (fr) > en from the Institut national de la statistique et des études économiques (Insee).

Upcoming Events

Where | What | When
Nantes | State of the Map France 2022 | 2022-06-10 – 2022-06-12
Zürich | 141. OSM-Stammtisch/Mappingparty | 2022-06-11
臺北市 | OpenStreetMap x Wikidata Taipei #41 | 2022-06-13
Washington | MappingDC Mappy Hour | 2022-06-15
20095 | Hamburger Mappertreffen | 2022-06-14
Berlin | Missing Maps – GRC Online Mapathon | 2022-06-14
Guadalajara | Curso Gratuito JOSM | 2022-06-16
Arrondissement de Tours | La liberté numérique | 2022-06-18
京都市 | 京都!街歩き!マッピングパーティ:第31回 妙法院 | 2022-06-18
新店區 | OpenStreetMap 街景踏查團 #2 三峽-大溪踏查 | 2022-06-19
 | OSMF Engineering Working Group meeting | 2022-06-20
Arlon | EPN d’Arlon – Atelier ouvert OpenStreetMap – Initiation | 2022-06-21
San Jose | South Bay Map Night | 2022-06-22
 | 152. Treffen des OSM-Stammtisches Bonn | 2022-06-21
City of Nottingham | OSM East Midlands/Nottingham meetup (online) | 2022-06-21
 | TeachOSM Map-Along | 2022-06-22
Lüneburg | Lüneburger Mappertreffen (online) | 2022-06-21
Washington | OpenStreetMap US Mappy Hour | 2022-06-23
Düsseldorf | Düsseldorfer OpenStreetMap-Treffen | 2022-06-24
IJmuiden | OSM Nederland bijeenkomst (online) | 2022-06-25
 | Tanzania Mapping Groups June Mapathon | 2022-06-25
Arlon | EPN d’Arlon – Atelier ouvert OpenStreetMap – Contribution | 2022-06-28
[Online] | OpenStreetMap Foundation board of Directors – public videomeeting | 2022-06-30
Essen | 17. OSM-FOSSGIS-Communitytreffen | 2022-07-01 – 2022-07-03

Note:
If you would like to see your event here, please add it to the OSM calendar. Only events listed there will appear in weeklyOSM.

This weeklyOSM was produced by Lejun, MatthiasMatthias, Nordpfeil, PierZen, Sammyhawkrad, SomeoneElse, Strubbl, TheSwavu, derFred, erenozdemir.

Shocking tales from ornithology

02:45, Saturday, 11 June 2022 UTC
Manipulative people have always made use of the dynamics of ingroups and outgroups to create diversions from bigger issues. The situation is made worse when misguided philosophies are peddled by governments that put economics ahead of ecology. For such governments, the pursuit of easily gamed targets such as GDP is preferable to ecological amelioration, since money is a man-made and controllable entity. Nationalism, pride, other forms of chauvinism, the creation of enemies and the magnification of war threats are all effective tools in Machiavelli's arsenal for misdirecting the masses when things go wrong. One might imagine that the educated, especially scientists, would be smart enough not to fall into these traps, but cases from history dampen such optimism.

There is a very interesting book in German by Eugeniusz Nowak called "Wissenschaftler in turbulenten Zeiten" (or scientists in turbulent times) that deals with the lives of ornithologists, conservationists and other naturalists during the Second World War. Preceded by a series of recollections published in various journals, the book was published in 2010, but I became aware of it only recently while translating some biographies into the English Wikipedia. I have not yet actually seen the book (it has about five pages on Salim Ali as well) and have had to go by secondary quotations in other sources. Nowak was a student of Erwin Stresemann (the subject of the first chapter), and he writes about several European (but mostly German, Polish and Russian) ornithologists and their lives during the turbulent 1930s and 40s. Although Europe is pretty far from India, the ripples of these events reached far. Incidentally, Nowak's ornithological research includes studies on the range expansion of the collared dove (Streptopelia decaocto), which the Germans called the Türkentaube, literally the "Turkish dove", a name carrying the baggage of cultural prejudice.

Nowak's first paper of "recollections" notes that: [he] presents the facts not as accusations or indictments, but rather as a stimulus to the younger generation of scientists to consider the issues, in particular to think “What would I have done if I had lived there or at that time?” - a thought to keep as you read on.

A shocker from this period is a paper by Dr Günther Niethammer on the birds of Auschwitz (Birkenau). This paper (read it online here) was published when Niethammer was posted to security duty at the main gate of the concentration camp. You might be forgiven if you thought he was just a victim of the war, but Niethammer was a proud nationalist who had volunteered to join the Nazi forces in 1937, leaving his position as a curator at the Museum Koenig at Bonn.
The contrast of Niethammer looking at the birds on one side while ignoring the inhumanity on the other gave novelist Arno Surminski a title for his 2008 novel, Die Vogelwelt von Auschwitz (i.e. the birdlife of Auschwitz).

G. Niethammer
Niethammer studied birds around Auschwitz and also shot ducks in numbers, both for himself and to supply the commandant of the camp, Rudolf Höss (if the name means nothing to you, please see the linked article or search for it online). Upon Niethammer's death, an obituary (open access PDF here) was published in the Ibis of 1975, a tribute with little mention of the war years or of the fact that he had risen to the rank of Obersturmführer. The Bonn museum journal had a special tribute issue noting the works and influence of Niethammer. Among the many tributes is one by Hans Kumerloeve (starts here online). A subspecies of the common jay was named Garrulus glandarius hansguentheri by the Hungarian ornithologist Andreas Keve in 1967, after the first names of Kumerloeve and Niethammer. Fortunately for the poor jay, this name is a junior synonym of G. g. anatoliae, described by Seebohm in 1883.

Meanwhile, inside Auschwitz, the Polish artist Wladyslaw Siwek was making sketches of everyday life in the camp. After the war he became a zoological artist of repute. Unfortunately, there is very little about him that is readily accessible to English readers on the internet (beyond the Wikipedia entry).
Siwek, an artist who documented life at Auschwitz before working as a wildlife artist.
 
Hans Kumerloeve
Now for Niethammer's friend Dr Kumerloeve, who also worked in the Museum Koenig at Bonn. His name was originally spelt Kummerlöwe, and like Niethammer he was a doctoral student of Johannes Meisenheimer. Kumerloeve and Niethammer made journeys on a small motorcycle to study the birds of Turkey. Kummerlöwe's political activities started earlier than Niethammer's: he joined the NSDAP (German: Nationalsozialistische Deutsche Arbeiterpartei = The National Socialist German Workers' Party) in 1925 and started the first student union of the party in 1933. Kummerlöwe soon became a member of the Ahnenerbe, a think tank meant to provide "scientific" support for the party's ideas on race and history. In 1939 he wrote an anthropological study on "Polish prisoners of war". At the Dresden museum that he headed, he came up with ideas for promoting the party's politics, and he published them in 1939 and 1940. After the war, he is thought to have visited all the European libraries that held copies of this journal (anyone interested in hunting it down should look for copies of Abhandlungen und Berichte aus den Staatlichen Museen für Tierkunde und Völkerkunde in Dresden 20:1-15) and purged them of the incriminating article. According to Nowak, he even managed to get his hands (and scissors) on copies of the journal held in Moscow and Leningrad!

The Dresden museum was also home to the German ornithologist Adolf Bernhard Meyer (1840–1911). In 1858, he translated the works of Charles Darwin and Alfred Russel Wallace into German and introduced evolutionary theory to a whole generation of German scientists. Among Meyer's amazing works is a series of avian osteological works which uses photography and depicts birds in nearly-life-like positions (wonder how it was done!) - a less artistic precursor to Katrina van Grouw's 2012 book The Unfeathered Bird. Meyer's skeleton images can be found here. In 1904 Meyer was eased out of the Dresden museum because of rising anti-semitism. Meyer does not find a place in Nowak's book.
 
Niethammer stands behind Salim Ali, 1967.
International Ornithological Congress, 1967


Nowak's book includes entries on the following scientists: (I keep this here partly for my reference as I intend to improve Wikipedia entries on several of them as and when time and resources permit. Would be amazing if others could pitch in!).
In the first of his "recollection papers" (his 1998 article), Nowak explains why he wrote them: he had noticed that the obituary for Prof. Ernst Schäfer was a whitewash that carefully avoided any mention of his wartime activities. And this brings us to India. In a recent article in Indian Birds, Sylke Frahnert and coauthors have written about the bird collections from Sikkim in the Berlin natural history museum. Their article includes a brief statement that "The collection in Berlin has remained almost unknown due to the political circumstances of the expedition". This might be a bit cryptic for many, but the best read on the topic is Himmler's Crusade: The true story of the 1939 Nazi expedition into Tibet (2009) by Christopher Hale. Hale writes:
He [Himmler] revered the ancient cultures of India and the East, or at least his own weird vision of them.
These were not private enthusiasms, and they were certainly not harmless. Cranky pseudoscience nourished Himmler’s own murderous convictions about race and inspired ways of convincing others...
Himmler regarded himself not as the fantasist he was but as a patron of science. He believed that most conventional wisdom was bogus and that his power gave him a unique opportunity to promulgate new thinking. He founded the Ahnenerbe specifically to advance the study of the Aryan (or Nordic or Indo-German) race and its origins
From there, Hale goes on to examine the motivations of Schäfer and his team. He looks at how much of the science was politically driven. Swastika signs dominate some of the photos from the expedition, as if the symbol provided a natural tie with Buddhism in Tibet. It seems that Himmler gave Schäfer the opportunity to rise within the political hierarchy. The team that went to Sikkim included Bruno Beger, a physical anthropologist with less than innocent motivations, although such motives would be much harder to ascribe to the team's other pursuits, like botany and ornithology. One of the results of the expedition was a film made by the group's entomologist, Ernst Krause: Geheimnis Tibet, or Secret Tibet. A copy of this 1 hour and 40 minute film is on YouTube. At around 26 minutes, you can see Bruno Beger creating face casts, first as a negative in plaster of Paris, from which a positive copy was made using resin. Hale describes how one of the Tibetans, encased in a cast with only straws to breathe through, went into an epileptic seizure brought on by claustrophobia and fear. The real horror, however, is revealed when Hale quotes a May 1943 letter from an SS officer to Beger: ‘What exactly is happening with the Jewish heads? They are lying around and taking up valuable space . . . In my opinion, the most reasonable course of action is to send them to Strasbourg . . .’ Apparently Beger had to select some prisoners from Auschwitz who appeared to have Asiatic features. Hale shows that Beger knew the fate of those he selected: they were gassed for research conducted by Beger and August Hirt.
SS-Sturmbannführer Schäfer at the head of the table in Lhasa

In all, Hale makes a clear case that the Schäfer mission had quite a bit of political activity underneath. We find that Sven Hedin (a Nazi sympathizer who funded and supported the mission, and whom Schäfer had greatly admired in his youth) was in contact with fellow Nazi supporter Erica Schneider-Filchner and her father Wilhelm Filchner in India, both of whom were later interned at Satara, while Bruno Beger made contact with Subhash Chandra Bose more than once. [Two of the pictures from the Bundesarchiv show a certain Bhattacharya, who appears to be a chemist working on snake venom at the Calcutta snake park; one wonders if he is Abhinash Bhattacharya.]

My review of Nowak's book must be uniquely flawed, as I have never managed to access it beyond some online snippets and English reviews. The war affected the entire region, Nowak's coverage is limited, and there were many other interesting characters, including the Russian ornithologist Malchevsky, who survived German bullets thanks to a fat bird observation notebook in his pocket! In the 1950s, Trofim Lysenko, the crank scientist who controlled science in the USSR, sought Malchevsky's help in proving his own pet theories, one of which was the idea that cuckoos were the result of feeding hairy caterpillars to young warblers!

Issues arising from race and perceptions are, of course, not restricted to this period or region. One of the less glorious stories of the Smithsonian Institution concerns the honorary curator Robert Wilson Shufeldt (1850 – 1934), who, in the infamous Audubon affair, turned his personal troubles with his second wife, a granddaughter of Audubon, into a matter of race. He also wrote books such as America's Greatest Problem: The Negro (1915), in which we learn of the ideas of other scientists of the period, like Edward Drinker Cope! Like many other obituaries, Shufeldt's is a classic whitewash.

Even as recently as 2015, the University of Salzburg withdrew an honorary doctorate that it had given to the Nobel Prize-winning Konrad Lorenz because of his support for the political setup of the time and its racial beliefs. It should not be that hard for scientists to figure out whether they are on the wrong side of history, even if they are funded by the state. Perhaps salaried scientists in India would do well to look more carefully at the legal contracts they sign with their employers, especially the state. The current rules make government employees less free than ordinary citizens, but will the educated speak out, or do they prefer to shackle themselves?

Postscripts:
  • Mixing natural history with war sometimes led to tragedy for the participants as well. In the case of Dr Manfred Oberdörffer, who used his cover as an expert on leprosy to visit the borders of Afghanistan with the entomologist Fred Hermann Brandt (1908–1994), an exchange of gunfire with British forces killed him, although Brandt lived on to tell the tale.
  • Apparently Himmler's entanglement with ornithology also led him to dream up "Storchbein Propaganda" - a plan to send pamphlets to the Boers in South Africa via migrating storks! The German ornithologist Ernst Schüz quietly (and safely) pointed out the inefficiency of it purely on the statistics of recoveries!

Wikimedia Ukraine is collecting and telling stories of Ukrainian Wikimedia community members affected by Russia’s invasion of Ukraine. This article was first published in The Signpost. See the full collection of stories on Meta.

The Russian siege of Mariupol, a major city in southeastern Ukraine, has become one of the most profound tragedies of the 21st century.

Authorities estimate that over 20,000 civilians have died since early March as a result of shelling and of the siege's effects, such as the lack of food and water. The vast majority of Mariupol’s buildings have been destroyed or severely damaged by indiscriminate shelling.

Oleksandr, known on Wikipedia as Wanderer777, was born in Mariupol and spent much of his life in the city. He eventually managed to escape from the city and is safe now, but before that he had witnessed the siege and its effects first-hand.

Mariupol after Russian shelling, photo taken by Oleksandr (credits: Wanderer 777, CC BY-SA 4.0 / Wikimedia Commons)

Oleksandr graduated from the Pryazovskyi State Technical University in Mariupol, specializing in the automation of metallurgical processes and computer-integrated technologies.

On Wikipedia, he’s been most active in the Russian-language edition; over the past 15 years, he has served as an administrator, a bureaucrat, a member of the Arbitration Committee, and a mediator on the topic of Ukraine. Oleksandr has also contributed to Ukrainian Wikipedia, Wikimedia Commons, and other wiki projects.

When Russia openly invaded Ukraine on February 24th, Oleksandr and his family contemplated leaving Mariupol but decided to stay, hoping that the war would not reach them quickly. The predictions proved too optimistic – Russian forces advanced rapidly in the east of Ukraine, and soon Mariupol was encircled. On the third day of the invasion, leaving was already impossible, Oleksandr recalls.

Oleksandr’s family moved to a safer western part of the city. Within a few days, the occupiers destroyed practically all civilian infrastructure. Supermarkets, electrical transformer substations and water supply pumping stations were shattered, and so were fire stations and funeral homes.

Oleksandr and other people in his building moved to the basement and lived there for a few weeks. He remembers constant shelling – a picture of a Russian tank approaching the neighborhood and indiscriminately shooting at residential buildings was not uncommon. Oleksandr’s house was hit and damaged but not destroyed – unlike most of the buildings around it, which collapsed completely.

Mariupol’s drama theater destroyed by Russian bombing (credits: Donetsk Regional Military Civil Administration, CC BY-SA 4.0)

At the first opportunity, in mid-March, Oleksandr and his family managed to sneak out of the city to a nearby village. This wasn’t the end of their ordeal, though: they spent another month looking for ways to escape from occupied territory. Finally, they managed to leave by car in the second half of April, reaching the Ukrainian-controlled city of Zaporizhzhia.

Oleksandr says it’s a miracle he managed to leave Mariupol. People leaving later, especially military-age men, were either not allowed to leave or placed in filtration camps, effectively being jailed for an indefinite period without trial.

He helped his family move abroad and remains in Dnipro, a city in eastern Ukraine that’s controlled by the Ukrainian government and is relatively safe compared with the beleaguered Donbas.

Oleksandr says we’ll never know the full extent of the devastation in Mariupol. As he describes on his user page in Russian Wikipedia, “many people died, truly many … People were dying from missiles and shells. In houses and on the streets, in yards and shelters. When they were trying to get at least some food from destroyed shops, when they were cooking food in bonfires, when they were looking for a place that still had mobile connection. People were dying when buildings collapsed from air bombs and in basements from smoke caused by fires. People were dying from the lack of insulin, antibiotics and medications for heart diseases. People were dying from hunger and thirst.”

Now, what once was a major industrial center with over 400,000 residents is in ruins – and fully occupied by Russia. Active fighting has stopped, but the humanitarian disaster is not over – the city’s infrastructure was destroyed, and the occupying authorities aren’t likely to rebuild it soon.

For another account of the Mariupol tragedy, check the diary of doctor Oleh Zyma – also a Wikipedia editor – published by “Bird in Flight”.

On 15 June at 17:00 hrs UTC, join a Pride month panel on improving LGBTQIA+ representation within the Wikimedia movement and projects. You can watch live on YouTube.

Throughout the Wikimedia movement and its projects, many identities and backgrounds are represented and celebrated. Unfortunately, within general online spaces, there are still many challenges for Queer individuals and groups that steer them away from joining communities because of their identities.

This Pride month, on 15 June, LGBTQIA+ folks from across the Wikimedia movement will share their experiences from within the movement and talk about the achievements and challenges they continue to face in online spaces. What are some of these challenges? There is still a clear lack of representation of the LGBTQIA+ community, as well as a big gender gap in the Wikimedia projects. There’s also a lack of representation in terms of the people who build our communities: only 1% of all editors identify as trans, and less than 1% of Wikipedia biographies cover trans or nonbinary people. 

According to the open data project Humaniki, the majority of content about people on all Wikimedia projects is about men. For example, as of May 2022, only 18.38% of content in all Wikimedia projects, including biographies on Wikipedia, is about women.

Also, much of the content about LGBTQIA+ projects, activists and history suffers from damaging edits. According to research conducted by the Foundation, 8.8% of some 500 edits made to Marsha P. Johnson’s biography were .

This panel will be hosted by the Diversity, Equity and Inclusion team, and will shed light on perspectives experienced by folks in different areas of the movement, all with different tenures. 

The panelists are:

  • Andrea Denisse. Based in Mexico, Denisse is a Site Reliability Engineer with the Wikimedia Foundation, working on the Observability project.
  • Vic Sfriso. Sfriso is Cooperation and Inclusion Program Assistant with Wikimedia Argentina.
  • Rae Adimer, User:Vermont. Adimer is a new Movement Communications Associate with the Wikimedia Foundation, as well as a Wikimedia Steward and Admin/CU on Meta-Wiki and the Simple English Wikipedia. They are also part of the Universal Code of Conduct Revisions Committee and the Leadership Development Working Group.
  • Marinus Uys. Based in South Africa, Uys is a Lead Learning and Development Specialist with the Wikimedia Foundation. He is also a Wikipedia editor and volunteer in LGBTQIA+ organizations within South Africa.

You can join the event on 15 June at 17:00 hrs UTC. Watch live on YouTube.

Adding women physicists to the Spanish Wikipedia

16:03, Wednesday, 08 June 2022 UTC
Sofía Flores Fuentes.
Image courtesy Sofía Flores Fuentes, all rights reserved.

Sofía Flores Fuentes is a science communicator. She’s been a university professor, a civil servant, and an independent public engagement person. Currently, she’s working as a communicator at the Physics Institute of the National Autonomous University of Mexico (UNAM). Her most recent medium of science communication? Wikipedia.

“Wikipedia is a great platform, if not the best platform, to freely communicate science and information based on evidence,” Sofía says. “It reaches any corner of the world (that has internet access) so anyone can exploit the information located here. I think as science communicators we have the responsibility of knowing how to use Wikipedia.”

Sofía learned to edit Wikipedia through a recent Wiki Scientists course run by Wiki Education and sponsored by the American Physical Society (APS). A colleague had recommended the course, and she knew it was the help she needed to jumpstart her work on Wikipedia. The course focused on improving biographies of underrepresented physicists on Wikipedia, a cause near and dear to Sofía’s heart.

While the course was taught in English and focused on the English Wikipedia, Sofía took the opportunity to use her bilingualism to improve Spanish Wikipedia articles too. She expanded the article on María Ester Brandan and created the article on Myriam Mondragón Ceballos.

“The Wiki Scientists course gave me the tools to write an article. Even though the Spanish version changes a bit, I had the chance to go into the platform, learn the process and how it works in general terms,” Sofía says. “However, the most important thing I got from the course was the confidence to do it. Wikipedia seemed like a dark universe to me, that couldn’t be penetrated that easily. After this course I now feel like it is a fascinating world created and nourished by a vibrant community, and all the respect and values involved.”

Sofía found the differences in processes between the Spanish and English Wikipedia interesting, as well as the differences in discussions. She’s inspired to keep editing articles about Mexican physicists, especially women. And she hopes to have events at her institution to support others to edit as well.

“I am a science communicator who loves writing articles. But I also stand for the access to information, so I try to dedicate my professional work so people can have the possibility of learning and being informed. I also think that humanity can do great things that can benefit other people, so I believe Wikipedia is a great effort for humans to reach this goal,” Sofía says. “I’m just grateful for APS giving me the opportunity to learn. I think that a lot of people like me can make the most from your work so we can also help others.”

Episode 114: Lionel Scheepmans

16:59, Tuesday, 07 June 2022 UTC

🕑 1 hour 38 minutes

Lionel Scheepmans is a co-founder of the Wikimedia Belgium chapter, an open source and open knowledge activist, and a PhD student at the University of Louvain. He is also currently running for the Wikimedia Foundation Board of Trustees.

Links for some of the topics discussed:

50,000 video games on Wikidata

16:26, Tuesday, 07 June 2022 UTC

Wikidata’s WikiProject Video games recently passed a major milestone: 50,000 video game (Q7889) items on Wikidata. Let’s use that opportunity to draw up a quick mid-year report.

Description

Let’s look at how these items are described along some basic properties − asking the Wikidata Query Service for some pretty graphs, and using my trusted inteGraality for some more advanced statistics.

Over 85% of the items have a platform (P400) statement (which does not mean that we have 85% completion on that topic, since many games are published on several platforms, and we may only have recorded one or a couple of them).

78% of the items have a publication date (P577)

67% have a genre (P136) − we have a very long tail of 600 distinct values as genres (some of which could use a clean-up, granted 🙂 )

Just above 53% have a country of origin (P495)

Just under 50% of the items have a developer (P178) or a publisher (P123).
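Coverage figures like these can be approximated with a query against the Wikidata Query Service. The sketch below is ours, not the author's (the post used the query service and inteGraality without publishing its queries). It counts direct instance of (P31) video game (Q7889) items and how many of them carry a given property, using truthy `wdt:` statements only, so subclasses and deprecated statements are left out and the numbers may differ slightly from inteGraality's. The helper names and the User-Agent string you pass in are illustrative assumptions.

```python
import json
import re
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS = "https://query.wikidata.org/sparql"


def coverage_query(prop, cls="Q7889"):
    """SPARQL counting items with P31=cls and how many have a `prop` statement."""
    if not re.fullmatch(r"P\d+", prop) or not re.fullmatch(r"Q\d+", cls):
        raise ValueError("expected identifiers such as 'P400' and 'Q7889'")
    # COUNT(DISTINCT ...) guards against items with several values for prop.
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?total) "
        "(COUNT(DISTINCT ?covered) AS ?withProp) WHERE {\n"
        f"  ?item wdt:P31 wd:{cls} .\n"
        f"  OPTIONAL {{ ?item wdt:{prop} ?value . BIND(?item AS ?covered) }}\n"
        "}"
    )


def coverage_ratio(results):
    """Share of items having the property, from a SPARQL JSON result."""
    row = results["results"]["bindings"][0]
    return int(row["withProp"]["value"]) / int(row["total"]["value"])


def fetch_coverage(prop, user_agent):
    """Network call; WDQS expects a descriptive User-Agent header."""
    url = WDQS + "?" + urllib.parse.urlencode({"query": coverage_query(prop)})
    request = urllib.request.Request(url, headers={
        "User-Agent": user_agent,
        "Accept": "application/sparql-results+json",
    })
    with urllib.request.urlopen(request) as response:
        return coverage_ratio(json.load(response))
```

For example, `fetch_coverage("P400", "my-report-script/0.1 (contact address)")` would return the share of items with a platform statement; the same call with P577, P136, P495, P178 or P123 covers the other properties listed above.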

Links to Wikipedia

77% of the items are linked to an article in at least one language-version of Wikipedia − English comes first (52%), then French (30%) and then Japanese (25%).

What I also find interesting is to look at items linked to only one Wikipedia language version: some 13% only have an article in the English-language Wikipedia, almost 10% only in the Japanese-language Wikipedia, and then comes the French-language Wikipedia with 3% of items.

External identifiers

Over at Wikidata we link to hundreds of other video game databases.

The king here is MobyGames game ID (P1933), used on over 50% of our Q7889 items. Then come the 34% of Internet Game Database game ID (P5794), 27% of GameFAQs game ID (P4769), 20% of PCGamingWiki ID (P6337), 19% of speedrun.com game ID (P6783), 17.4% of the Media Arts Database ID (P7886), 16% of Giant Bomb ID (P5247), 15.2% of OGDB game title ID (P7564), 14.8% of Igromania ID (P6827)… and a very, very long tail of sometimes highly specialized databases.

(The most represented are English-language databases, but the list above includes one Japanese, one German and one Russian database.)

Some caveats

1/ By the time of writing this, we already reached 50,444 items. Ah well 🙂

2/ We had actually passed the milestone of “50K games” on Wikidata before. Looking strictly at instance of (P31)=video game (Q7889) items does not tell the full story, as we have a long tail of subclasses also used as P31: some refer to distinct concepts (the 850 DLCs or 587 expansion packs), while others are indeed games (192 mobile game, 120 video game remaster, 102 browser game, 100 video game remake…)
Both raise questions on our modelling − which we shall leave for another day and another post.

3/ 50,000 is definitely something to be proud of, but it is still far from the almost 300,000 entries in MobyGames, the 80,000 in Giant Bomb, the 63,000 in OGDB… and as such, it is indeed a milestone on the road we have ahead of us.

Link collection

Tech/News/2022/22

14:39, Tuesday, 07 June 2022 UTC

Other languages: Bahasa Indonesia, Deutsch, English, español, français, italiano, magyar, polski, português, português do Brasil, svenska, čeština, русский, українська, עברית, العربية, فارسی, हिन्दी, বাংলা, ไทย, 中文, 日本語

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Changes later this week

  • The new version of MediaWiki will be on test wikis and MediaWiki.org from 7 June. It will be on non-Wikipedia wikis and some Wikipedias from 8 June. It will be on all wikis from 9 June (calendar).
  • A new str_replace_regexp() function can be used in abuse filters to replace parts of text using a regular expression. [1]

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

By: Evelin Heidel, Wikimedistas de Uruguay & Brisa Ceccon, Wikimedia Foundation

With the global #WikiForHumanRights campaign, Wikimedia organizers have been doing local or regional activities to highlight the importance of open knowledge to tackle the triple planetary crisis of biodiversity loss, pollution and climate change. The connection between human rights and the environment has been vivid in Latin America in recent decades. 

Join us for a panel discussion!

To learn more, we invite you to join us on 15 June 2022, from 15:00 to 16:30 UTC, for the panel conversation: “Climate Change, Access to Information and Open Knowledge in Latin America: Challenges and Future Prospects”. Register for the event on Zoom. We will be providing simultaneous interpretation into Portuguese, English and Spanish.

Experts from various international and regional organizations such as the Inter-American Development Bank (IDB), the Food and Agriculture Organization of the United Nations (FAO), the Climate Finance Group for Latin America and the Caribbean (GFLAC), and the United Nations Economic Commission for Latin America and the Caribbean (ECLAC), who play an important role as generators of knowledge and advisors in the design of public policies and investments in climate change in Latin America and the Caribbean, will discuss the following topics: 

  • What are the main climate change information gaps in the region and their causes? 
  • What are the impacts of the lack of access to accurate, quality and locally/regionally relevant climate change information in Latin America? 
  • What actions and partnerships can we undertake to facilitate access to such information? Where should we collectively put the focus?

The expert panelists include:

  • Sandra Guzmán, Climate Policy Initiative manager and founder of the GFLAC, the Climate Finance Group of Latin America and the Caribbean.
  • Graham Watkins, Chief of the Climate Change Division, IDB.
  • Martial Bernoux, Senior Natural Resources Officer, FAO.
  • Julie Gail Lennox, Head of the Agricultural Development Unit and Focal Point for Climate Change at UN ECLAC’s Subregional Headquarters in Mexico.
Help us spread the word on social media!

Access to environmental knowledge matters more than ever

One of the main challenges that has inspired our actions in the #WikiForHumanRights campaign this year is that knowledge about the climate crisis is not accessible to everyone. The climate movement has taught us that not everyone is impacted by environmental crises in the same way. Climate change has disparate impacts that worsen already existing inequalities, such as those of race, gender and age. For that reason, not every region will suffer the same impacts or have the same responses or solutions to climate change.

Latin America is home to six of the world’s 17 megadiverse countries, and these are under threat from human activities such as deforestation, mining and intensive agriculture. This crisis is compounded by the climate crisis. We depend on access to information that is locally relevant and culturally appropriate in order to understand the ongoing changes, adapt better to their impacts, and design creative solutions to these crises.

Wikipedia, the free and multilingual online encyclopedia, is the largest and most-read reference work in history, one of the 15 most popular websites on the Internet, and the only non-profit site on that list. Wikipedia therefore plays an important role in providing access to this information in the main languages spoken in Latin America, including Spanish and Portuguese.

Millions of people access Wikipedia to search for information about climate change in their own language. Last year, around 25 thousand articles on climate change across Wikipedias were viewed 325 million times. Everyone needs access to environmental information in their own language, as recognized in Latin America recently by the Escazú Agreement, in order to make decisions in their everyday lives. Wikipedia can provide that knowledge to all the decision makers in their own language, so that we can work together to solve these compounding crises.
The virtual event will be open to the public and we will be providing simultaneous interpretation into Portuguese, English and Spanish. In order to access the interpretation, you need to register via the Zoom link. The original audio will be streamed through the Wikimedistas de Uruguay YouTube channel. A more detailed agenda is available here.

Announcing WikiArabia 2022 Conference in Dubai, UAE

16:21, Monday, 06 June 2022 UTC

The Wikimedians of UAE user group is excited to announce that the 6th edition of the WikiArabia conference will be held in Dubai, United Arab Emirates, from 28 to 30 October 2022. This will be the first Wikimedia event of its kind held in the UAE and the Gulf region.

As a new addition to the Arabic Wikimedia community, the Wikimedians of UAE User Group believes in the importance of sharing the opinions and experiences of those who have been active in various Arabic user groups in the region. This will strengthen and enrich the Arab Wikimedia community and give Arabic speakers a cohesive and unified voice within the international community.

Additionally, the WikiArabia 2022 conference will provide an important opportunity to work on policies that concern the Arab community as well as design strategies on increasing participation and creating sustainable communities of editors.

We look forward to meeting and connecting with the wider Wikimedia Arab community in person.

The Organizing Team

Ahlam Bolooki, Hani Yakan, Serine Ben Brahim, Dania Droubi, Reda Kerbouche

We are also honored to announce the names of the programming and scholarship committees who will be supporting us in organizing a successful event:

Programming Committee:

The Scholarship Committee:

Registration for attendance and submissions is now open, from today until Monday, 11 July 2022.

To register for attendance, please visit this page. For submissions, please visit this link.

For more details, please visit our meta page.

In case you have any questions regarding the conference, please do not hesitate to contact the Organising Committee members or email us at [email protected]

In search of the least viewed article on Wikipedia

07:00, Monday, 06 June 2022 UTC

Wikipedia sure is popular. The most popular articles in a given week routinely get millions of views. But with 6 million plus articles, Wikipedia has plenty of room for articles about topics which are profoundly obscure, even downright boring. I should know, I’ve written dozens of them! Some of what I consider to be my finest contributions to Wikipedia are lucky to get a couple of views per day, for example:

Of my creations, the least popular seems to be Sunday reading periodical, an article about a Victorian magazine genre which averages around a dozen views per month.

Are there articles with even less popular appeal than that?

Though Wikipedia page view data is publicly available (as a massive raw data dump, and through an API), there’s unfortunately no easy way to sort out the least viewed pages, short of a very slow linear search for the needle in the haystack…
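One way to get at those numbers is the per-article pageviews endpoint of the Wikimedia REST API. The sketch below builds that URL and sums the returned counts. The 2021 date range, daily granularity, and `agent="user"` (to approximate the "probably-human" views mentioned later) are our assumptions, the helper names are hypothetical, and the author's own scraping code on GitHub may work differently. Running it over millions of titles is exactly the slow linear search described above.

```python
import json
import urllib.parse
import urllib.request

# Wikimedia REST API per-article pageviews endpoint.
API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start="20210101", end="20211231",
                  project="en.wikipedia", access="all-access",
                  agent="user", granularity="daily"):
    """Build the per-article URL; agent="user" asks for probably-human views."""
    # Spaces become underscores; everything else (including "/") is percent-encoded.
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return "/".join([API, project, access, agent, article, granularity, start, end])


def total_views(payload):
    """Sum the "views" field over the items returned by the endpoint."""
    return sum(item["views"] for item in payload.get("items", []))


def fetch_total_views(title, user_agent):
    """Network call; Wikimedia asks clients to send a descriptive User-Agent."""
    request = urllib.request.Request(pageviews_url(title),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return total_views(json.load(response))
```

Pages with no recorded views in the window may come back as an HTTP error rather than an empty list, so a real scraper would likely need to catch `urllib.error.HTTPError`.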

A smaller haystack

As a starting point, I grabbed 2021 pageview data for a random sample of about 32,000 Wikipedia articles. Maybe the properties of the least viewed articles in the sample will lead us to some heuristics we can use to narrow our search for the least viewed articles.

Here’s what the distribution of views looks like for that sample. I’ve used a logarithmic scale, since the values are widely spread out. The median article gets a little under 1,000 views annually. The average is around 13,000, thanks to the long tail.

We have almost 100 articles in the sample whose total views in 2021 are in the single digits(!). Here’s a peek at the first few:

But these are disambiguation pages – navigational aids which link to similarly named articles, but which aren’t themselves “real” articles, at least for our purposes. And in fact, all of the 50 least viewed pages in our dataset are disambiguation pages – they seem to have a notably lower floor on their pageviews than other articles.

After filtering out disambiguation pages, we’re left with a small handful of articles with single-digit annual views (ranging from 7 to 9):

These obscure two- or three-sentence stubs average less than one view per month! That figure is so small, I suspect most or all of those views might come from readers hitting the “Random article” button. This would help explain why the least viewed pages in our sample are all disambiguation pages: the “Random article” button was coded to ignore disambiguation pages starting in 2015.

There’s an effective way we can test this hypothesis. And if it’s true, it will give us an important clue for finding the least viewed article on Wikipedia.

Interlude: how the “Random article” button works

Here’s a dark secret about Wikipedia: due to some peculiarities in its implementation, the “Random article” button isn’t as random as you might think.

Whenever an article is created on Wikipedia, it’s assigned a random number between 0 and 1 (stored in the database as a field called page_random). As a toy example, suppose our encyclopedia has just 5 pages, with the following page_random values:

When someone hits the “Random article” button, the server generates a random number between 0 and 1.

ASCII archer by jah/SSt via asciiart.eu.

Let’s say our drunken archer’s arrow randomly lands at 0.29. The server will then search for and return the article in the database with the next-highest page_random value after 0.29. In this case, that’s Cow Tools.

ASCII arrow: own work.

As you might have surmised, this is not exactly a “fair” process. There is only a small range of values that will get us to Musca depicta: those between 0.15 and 0.2 (represented by the orange region above). It will only come up about 5% of the time, whereas Fox tossing will come up 46% of the time.

The probability of a given article being landed on is equal to the size of what I’ll call its random gap: the difference between the article’s page_random value and the next-lowest page_random value in the database. In the diagrams above, the size of each article’s colored rectangle corresponds to its random gap.
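The lookup described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not MediaWiki's actual implementation. The five titles and page_random values are hypothetical, chosen so that two of the gaps echo the 5% and 46% figures above. The search returns the first page whose value is greater than or equal to the draw, and a draw above the highest page_random is assumed to wrap around to the lowest one. The simulation checks that each page's landing frequency matches its random gap.

```python
import bisect
import random

# Hypothetical toy encyclopedia (title -> page_random). The values are
# illustrative only; they are not taken from the real database.
TOY_PAGES = {
    "Article A": 0.10,
    "Article B": 0.15,
    "Article C": 0.20,
    "Article D": 0.66,
    "Article E": 0.90,
}


def build_index(pages):
    """Sort pages by page_random; return parallel lists of values and titles."""
    ordered = sorted(pages.items(), key=lambda kv: kv[1])
    return [v for _, v in ordered], [t for t, _ in ordered]


def random_article(values, titles, r):
    """Return the first title whose page_random is >= r.

    Assumption: if r exceeds every stored value, wrap around to the lowest.
    """
    i = bisect.bisect_left(values, r)
    return titles[i % len(titles)]


def random_gaps(values, titles):
    """Landing probability per title: distance down to the next-lowest value.

    The lowest title also collects the stretch above the highest value,
    matching the wraparound assumption in random_article().
    """
    gaps = {}
    for i, (value, title) in enumerate(zip(values, titles)):
        previous = values[i - 1] if i > 0 else values[-1] - 1.0
        gaps[title] = value - previous
    return gaps


def simulate(values, titles, n=100_000, seed=0):
    """Empirical landing frequencies from n simulated button presses."""
    rng = random.Random(seed)
    counts = dict.fromkeys(titles, 0)
    for _ in range(n):
        counts[random_article(values, titles, rng.random())] += 1
    return {title: c / n for title, c in counts.items()}


if __name__ == "__main__":
    values, titles = build_index(TOY_PAGES)
    gaps = random_gaps(values, titles)
    freqs = simulate(values, titles)
    for title in titles:
        print(f"{title}: gap={gaps[title]:.2f}  simulated={freqs[title]:.3f}")
```

Article D, which sits above a wide empty stretch, has a gap of 0.46 and is landed on close to 46% of the time, while Articles B and C each have a gap of 0.05.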

If the random article button is responsible for most of the pageviews for the project’s least popular articles, this leads to a couple of testable predictions:

  1. That the least viewed articles will have unusually small random gaps.
  2. That there is a (weak) correlation between random gap size and pageviews. This correlation should be most apparent when looking at the least viewed articles.

Are the least viewed articles in our sample “unlucky”?

Since there are around 6 million Wikipedia articles, the average random gap must be about 1/6,000,000, or 1.67e-7 in scientific notation. How big are the random gaps for the least viewed articles in our sample?

The least viewed article in the sample, Erygia sigillata, has a page_random value of 0.500764585777. The article Katherine Hanley is right on its tail with a value of 0.500764582314, which is just 0.000000003 less, or 3e-9 in scientific notation. This is 98% smaller than the average random gap. In other words, Erygia sigillata is an extremely unlucky article as far as the “Random article” button is concerned! It’s 50 times less likely to be landed on than an average article.

The random gaps for the other articles in our sample with single-digit annual views are: 3e-9, 9e-9, 8e-9, 4e-9, 8e-9, 2e-8. All are roughly an order of magnitude or more smaller than average. Quite a strong pattern!
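The arithmetic behind Erygia sigillata's "unluckiness" can be written out explicitly. Because the gaps tile the interval from 0 to 1 (up to the small stretch above the highest page_random value), they sum to roughly 1. With N ≈ 6 million articles, the average gap and the ratio for Erygia sigillata work out as:

```latex
\begin{aligned}
\sum_{i=1}^{N} g_i &\approx 1
  \quad\Rightarrow\quad
  \bar{g} \approx \frac{1}{N} = \frac{1}{6\times 10^{6}} \approx 1.67\times 10^{-7},\\
g_{\text{Erygia}} &= 0.500764585777 - 0.500764582314 = 3.463\times 10^{-9},\\
\frac{g_{\text{Erygia}}}{\bar{g}} &\approx 3.463\times 10^{-9}\times 6\times 10^{6} \approx 0.021 \approx \frac{1}{48}.
\end{aligned}
```

A ratio of about 0.021 means the gap is roughly 98% smaller than average. The article is therefore landed on about 48 times less often than average, which the text rounds to 50.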

Is there a correlation between random gap and views?

In the grand view of our sample of 32,000 articles, it seems like a wash:

(If anything, it might look like articles with smaller gaps get more views, but this is just an artefact of the fact that most articles have gaps which are close to the average.)

But we predicted that the random gap will only have a noticeable effect on the floor of pageviews. Let’s do an extreme zoom-in on the very bottom of the plot, looking only at articles with fewer than 200 annual views:

An even clearer picture emerges if we limit our analysis to articles which are a priori probably uninteresting, such as short articles about moth species (sorry, entomologists). Here’s a scatterplot of random gap vs. total views in 2021 for all ~1,500 pages in Category:Phaegopterina stubs:

This must be how those scientists felt when they first saw a graph of the cosmic microwave background radiation! (To get a sense of how coherent this pattern is, here is what the same graph would look like under the null hypothesis of no association between random gap and page views. I synthesized this by randomly permuting the pageview values in the dataset.)
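The null-hypothesis plot described above comes from shuffling the pageview column. The sketch below goes a step further and turns the shuffle into a permutation test. Several choices here are our own assumptions, not part of the original analysis: Spearman rank correlation (so the long tail of views does not dominate), the function names, and the number of permutations.

```python
import random


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Rank correlation, less sensitive to the heavy tail of pageviews."""
    return pearson(ranks(a), ranks(b))


def permutation_test(gaps, views, n_perm=1000, seed=0):
    """Observed rho and a two-sided p-value under the shuffled-views null."""
    rng = random.Random(seed)
    observed = spearman(gaps, views)
    shuffled = list(views)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between gap and views
        if abs(spearman(gaps, shuffled)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

Feeding in the (gap, views) pairs for a category like Phaegopterina stubs would give a single number for how far the real scatterplot sits from its shuffled counterpart.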

Based on our findings above, the least viewed articles on Wikipedia are not going to be merely about topics with little popular interest – they must also be “unlucky” in the sense of having very small random gaps.

We can considerably narrow our search for the least viewed articles of 2021 by limiting our analysis to pages with small random gaps. I set a threshold of 1.7e-8, or about 1/10th of the average gap size.

Of these 600,000 least lucky articles, all received at least a few views in 2021. The booby prize for least popular article of 2021 is shared by two articles which received exactly 3 probably-human pageviews:

If you guessed that these are both moth species, you would be right.
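The narrowing step above amounts to three operations: compute each article's random gap, keep the gaps under the threshold, and sort the survivors by views. A minimal sketch follows. The function names and the wraparound for the lowest page_random value are our assumptions. The sketch also checks the size of the candidate pool. If page_random values are independent uniform draws, the gaps are approximately exponentially distributed with mean 1/N. In that case about 1 − e^(−0.102) ≈ 9.7% of articles, roughly 580,000 of 6 million, should fall under 1.7e-8, which is in line with the 600,000 mentioned above.

```python
import math

N_ARTICLES = 6_000_000
AVERAGE_GAP = 1 / N_ARTICLES      # ~1.67e-7
THRESHOLD = 1.7e-8                # ~1/10 of the average gap, as in the post


def unlucky_titles(page_random, threshold=THRESHOLD):
    """Titles whose random gap is below threshold, in page_random order."""
    ordered = sorted(page_random.items(), key=lambda kv: kv[1])
    unlucky = []
    for i, (title, value) in enumerate(ordered):
        # Assumption: the lowest value wraps around to the highest one.
        previous = ordered[i - 1][1] if i > 0 else ordered[-1][1] - 1.0
        if value - previous < threshold:
            unlucky.append(title)
    return unlucky


def least_viewed(titles, views, k=2):
    """The k titles with the fewest views; missing titles count as 0."""
    return sorted(titles, key=lambda t: views.get(t, 0))[:k]


def expected_unlucky_fraction(threshold=THRESHOLD, average_gap=AVERAGE_GAP):
    """Share of gaps below threshold if gaps are ~exponential (uniform draws)."""
    return 1 - math.exp(-threshold / average_gap)
```

Only the titles that survive `unlucky_titles` need a pageviews lookup, which cuts the linear search by roughly a factor of ten.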

Patterns in unpopular articles

You can check out a larger leaderboard of the 500 least viewed articles here. The list is remarkably consistent in its subject matter:

  • A significant majority of them are about species or other taxa of insects (plus 17 gastropods, and one fungus).
  • The next most common category is obscure geographical features, especially (for some reason) towns in Iran and Sri Lanka. My favourite of these is the deliciously laconic Kälberbuckel.
  • One other recurring genre is set index articles like C24H31FO5, Dottley, Sukmanovka, and Great polemonium. (A set index article is a page which looks and functions like a disambiguation page but isn’t, because of reasons.)

There are a small number of articles not falling into the previously-mentioned categories. Some feel like living fossils from an earlier age of Wikipedia when standards of demonstrated notability were looser. It’s a little questionable whether articles like DMZ//38 or EuroNanoForum 2009 could weather a deletion discussion today.

Why so many moths?

The Wikipedia community’s policies and practices around which articles are “notable” (worthy of an article) and which get deleted have a healthy pragmatism to them. If Wikipedia allowed articles about anything, we would see a lot more articles about obscure garage bands, businesses, and living people. The authors of these articles would not be disinterested scholars writing with the goal of expanding the largest collection of knowledge on the internet. Rather, we would get a lot of editors with conflicts of interest, using Wikipedia for publicity, profit, or to settle a score. Before the community tightened up its notability criteria, it was not so uncommon in the very early days of the project to see blatant autobiographies, advertisements, or attack pages. Here are just a few examples based on real articles from Wikipedia’s early years which have since been deleted (names and details have been altered to protect the “innocent”):

Mian Amir Rashid is the youngest elected chairman of Pakistan chapter of Mensa. He assumed the post in 2001 at the age of 23. Under his tenure Mensa has grown very rapidly and now operating in 5 cities of Pakistan including Karachi, Lahore & Capital Islamabad

Mr. Rashid is a Public Relations & Marketing consultant by profession.

Union Cab is a cab company in Saint Paul, MN. They can be reached at http://www.unioncab.biz or 555-242-2000.

–Sam

Trevor Shelby is a Canadian businessman and robotics engineer. He is the founder and CEO of Polybonk.

Mr. Shelby and Polybonk were the subject of a Human Rights Tribunal of Quebec inquiry alleging discrimination in employment practices.[1] During the course of the inquiry, Mr. Shelby’s professional qualifications were called into question.[2]

Shelby also created controversy in a highly publicized case of road rage. According to the police report, he menaced another driver with a tennis racquet while hurling obscenities.[3]

The Ghosties are a small band from Melbourne. Nick sings and plays guitar, Sumeet plays bass if he hasn’t been naughty, Clark plays guitar properly and Kris makes the band seem good on the drums.

With their trademark songs Dear Robby, and Firecracker, this band are very cool, and their unmeasurable spontaneity is the stuff of legends. Learn more about The Ghosties on their websiteThe Forum should contain the dates and times of any upcoming gigs.

Over time, Wikipedia has developed a strong immune response against those who would try to use it for nefarious purposes, in the form of strict sourcing requirements for the sorts of topics shown above (e.g. living people, companies, bands). The existence of, say, the Union Cab company may be verifiable via primary sources, such as local business listings, but that’s not enough to secure it a place on Wikipedia. It needs significant coverage in multiple independent secondary sources. It makes sense then that we see almost no articles about these sorts of topics in the bottom 500. Any subject that meets these strict sourcing requirements is probably going to be of interest to someone beyond just those surfing the “Random article” button.

On the other hand, no-one has yet come up with a way to monetize a topic like Pseudoneuroterus mazandarani or use it to push a contentious point of view. Hence articles about species and populated places are generally not deleted, even if the topic is only weakly sourced – and most of our unpopular articles are weakly sourced, often having just a single citation to a primary source such as a database or gazetteer, or a passing mention in a single book or journal article.

Because the bar for these topics is so low, many of these articles feel a little soulless, having the appearance of being popped out via a mechanical (perhaps even fully automated) process. For example, the 12-word stub Pottallinda (5 views last year) was created on 18 January 2011 by User:Ser Amantio di Nicolao, who happens to be the most active editor in all of Wikipedia (as measured by number of edits). Within 60 seconds of creating this page, the same editor also created Polmalagama, Polommana, Polpitiya, Polwatta, and dozens of other substantially identical articles.

But hey, these hyper-obscure, tiny articles aren’t doing any harm (other than maybe disappointing the dozen people per year who land on them, rather than a more interesting fleshed-out article, when hitting the “Random article” button), and they lay a groundwork that other editors might build on in the future.

The pageview data used in this post, as well as the code used to scrape and analyse it, is available on GitHub here.

Originally published by Colin Morris at colinmorris.github.io on May 26, 2022

Licensed CC BY-SA 4.0

Tech News issue #23, 2022 (June 6, 2022)

00:00, Monday, 06 June 2022 UTC

Tech News: 2022-23