Week 5: Think About your Audience - Outreachy Round 24 Blog

Hello! The aim of this week’s blog post is to explain my Outreachy project to a newcomer to the Wikimedia community. Here’s my attempt:

You’ve almost definitely heard of Wikipedia: a community-run encyclopedia aiming to provide accurate, comprehensive, and cited information on just about everything. Wikipedia is only one project of the Wikimedia Foundation: an assortment of communities aiming to provide free information and resources to the world.

Encyclopedias like Wikipedia or image collections like Wikimedia Commons are completely untraversable if not categorised and structured. Within the Wikiemdia foundation, Wikidata helps achieve this task: operating as a structured data storage hub for Wikimedia projects.

As mentioned above, information on Wikimedia is (ideally) cited. As a result, much of the data stored on Wikidata are scientific papers: reliable sources that make sure the information provided across Wikimedia is accurate.

Each item corresponding to a scientific paper in Wikidata contains information about said paper: its title, on what academic databases it can be found, when it was published… and ideally, its authors.

Many scientific papers on Wikidata are missing author information: either failing to have it completely or having it in incomplete chunks. The aim of my Outreachy project is to accurately add author information to existing Wikidata articles.

This is not a task as simple as simply extracting and adding author names: names are complicated. What makes a first name a first name and a last name a last name is a complex question, particularly when dealing with non-standard Anglosaxon names. Chinese and Japanese names, for instance, are often transliterated in the same way despite being spelled differently, and Latin American names are often incorrectly transcribed, with last names and middle names often being mixed up or discarded entirely.

My project is focused on making sure names are added accurately to Wikidata pages. This is done by finding said names on existing academic databases and parsing them correctly - making sure to account for possible cultural difficulties. By adding authors to items, articles are able to be searched by author in a way that allows for more comprehensive and detailed searching of information across Wikimedia.

Much of the work being done on the project involves accurately using academic APIs, or Application Programming Interfaces. APIs are calls on other computer programs that provide useful information. For this project, we’re interested in gathering citations from APIs which author names can be extracted from. APIs can be (ideally) provided officially by an organisation or found on a website.

That’s basically it! Adding accurate name information to scientific articles on Wikidata, making sure names are being found accurately. Hope that all made sense!