Week 7: Modifying Expectations

This week, Outreachy interns are encouraged to discuss how their projects are progressing, giving particular importance to comparing where we thought we would be as described in our original timeline.

I’ll start out my thanking one of my mentors, Mike, for encouraging me to space out my timeline a bit. My original timeline was overly ambitious and sprawled out, with me not yet aware of how difficult and error-prone the process of name identification really is!

The timeline I ended up submitting as part of my Outreachy application was much more reasonable, and as a result I’m actually not too far off from where I said I would be at this point! Let’s dig in:

Successes

  • I sought out to have a successful citation extractor and name adder that worked with both simple and complex names. This is a success!
  • I’m actually ahead in terms of bot request status: I aimed to have a bot request sent out during Week 8, and I got my bot request up at the beginning of the week! Great news, too, as dealing with bot request questions is turning out to be fairly tricky.
  • Algorithm works with both simple and complex names and checks for distinct name types through a variety of check processes.

Changes

  • I aimed to have a program that automatically identified authors from author name strings and linked to them on the Wikidata page when added. This proved to be much harder than I thought, mainly due to the abundance of authors with the same name. Even filtering by profession ended up not being enough, as there can be multiple people with the same name working in the same field. Maybe there’s another Wikimedian named Roberto Garcia that I don’t know anything about! As a result of this, I’ve set up an ‘extension goal’ of matching ORCID IDs to authors in articles that are already on Wikidata for exact matches. This is fairly difficult and something that others have already tried to do, although I’m unsure if the project is still active.
  • I have not prioritised using ML methods thus far, mainly because it’s seemed unnecessary in a matching algorithm that has overall worked very well, but it might be good to start thinking about it!

My goals for the second half of the internship haven’t changed considerably: continue polishing up my program, get my bot approved, and make sure names are dealt with the best way possible! I am a bit upset that I’m mostly working with author name strings, but manual author matching seems to be a good next step for the future should I continue cooperating with the Wikimedia foundation (and I see no reason why I wouldn’t!).