Learning to Code, Learning to Collaborate

During the 2014-2015 academic year, I was a historian in an engineering school. As a postdoc at the Coleman Fung Institute for Engineering Leadership at UC Berkeley, I had the opportunity to collaborate with Professor Lee Fleming and his team on using a world-leading database of patent data to develop data visualizations for better understanding the history of technology.

One tool we developed is the Patent Co-Inventor Network Tool, which creates social network diagrams showing who inventors co-invented with (and who these co-inventors co-invented with), in a given time frame. Examples of the visualizations it produces include:

Inventors, co-inventors, and co-co-inventors within 1990-2001

Inventors, co-inventors, co-co-inventors, and co-co-co-inventors within 1996-2001
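
For readers curious how a diagram like this might be assembled, here is a minimal sketch in Python. It is not the tool's actual code: the sample patent records, the function names, and the idea of expanding outward one "co-" at a time are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: (patent id, grant year, inventors). Not real data.
PATENTS = [
    ("P1", 1991, ["Ada", "Ben"]),
    ("P2", 1995, ["Ben", "Cy"]),
    ("P3", 1999, ["Cy", "Dee"]),
    ("P4", 2005, ["Ada", "Eve"]),  # falls outside a 1990-2001 window
]


def build_graph(patents, start_year, end_year):
    """Link every pair of inventors who share a patent within the time frame."""
    graph = defaultdict(set)
    for _patent_id, year, inventors in patents:
        if start_year <= year <= end_year:
            for a in inventors:
                for b in inventors:
                    if a != b:
                        graph[a].add(b)
    return graph


def co_inventor_network(patents, seed, start_year, end_year, depth):
    """Return {inventor: number of 'co-' steps from the seed}, up to depth."""
    graph = build_graph(patents, start_year, end_year)
    distances = {seed: 0}
    frontier = [seed]
    for level in range(1, depth + 1):
        next_frontier = []
        for person in frontier:
            for other in graph.get(person, ()):
                if other not in distances:
                    distances[other] = level
                    next_frontier.append(other)
        frontier = next_frontier
    return distances


if __name__ == "__main__":
    # depth=2 means co-inventors and co-co-inventors of the seed inventor.
    print(co_inventor_network(PATENTS, "Ada", 1990, 2001, depth=2))
```

Raising depth to 3 would pull in the co-co-co-inventors as well, which is roughly the difference between the two examples above.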

This post is not about that digital history project. Instead, I have been asked to share my thoughts about getting into the nebulous (but sometimes lucrative) field of digital humanities, and collaborating with technically-skilled colleagues.

The Value of DH

Is it worth diverting time from your research for side projects? As one data point, I got interviews for about one out of every seven history positions I applied for last year, and three out of four "digital humanities" positions. I'm probably more qualified in history per se, but there are just fewer people doing this work. The position I took, based in a history department, required DH qualifications, but supports my non-DH work. It also opens a lot of doors if I leave academia.

Learning to Code

There is a lot to be said for learning some basic coding, even if you are never going to be an expert. When Lee went looking for someone to fill this postdoc, he wanted a historian of technology with sufficient coding skills (especially an ability to write MySQL queries) to play around with the database independently. Having to ask someone else to gather data would have meant major delays at each step, and hindered any kind of exploration and experimentation. The principle applies more broadly – even if you're going to be leaning on someone else for the heavy technical lifting, you should at least try to learn enough to recognize how much you're asking of your collaborators.

For grad students lacking any formal background in coding (I had one course of scientific programming in C/C++ during undergrad), there are a lot of great ways to teach yourself the beginnings of coding, such as Codecademy, Berkeley's D-Lab courses, the Programming Historian, or summer coding boot camps. The languages you need to learn depend on your project, but the following are especially flexible and useful for social scientists: Python, PHP/MySQL, R, and JavaScript. It's also worth picking up the basics of TeX, a typesetting system that's better suited than a word processor for displaying technical content.

As with a spoken language, you'll probably never learn to code if you don't have an opportunity to apply it to real problems. No amount of textbook exercises and verb conjugation will stand in for speaking German over beers for a few hours each week. No amount of coding classes will stand in for finding an interesting but achievable project and building it up step-by-step. You'll get stuck, a lot. You'll have to search the Internet for hours to figure out why something isn't working, and eventually discover it was just a typo somewhere. But nearly every problem you'll stumble across has been solved, and there's a solution somewhere on the Internet. It's just a matter of effective googling. That's how you learn to code.

For me, that project was the FamilyGiftLister.com website. I have a big extended family, and the logistics of getting each other Christmas presents without duplication was a real hassle. I threw together a site where we each posted wish lists, then we could secretly reserve gifts off of each other's lists. It was rudimentary, but it worked. The next year, I spent a week making it better: adding features, fixing some bugs we had discovered the previous year, and making it look nicer. The next year, I did the same and opened it up to other families. That project has now been dormant for years (and remains very ugly), but it still works, and it taught me a lot about coding in PHP and using MySQL databases, piece-by-piece, as I needed it.
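
To give a sense of how small the core of such a project can be, here is a rough sketch of the "secret reservation" idea. The real site was written in PHP with a MySQL database; this Python version, including the WishList class and its method names, is purely hypothetical and keeps everything in memory.

```python
class WishList:
    """One person's wish list; reservations stay hidden from the owner."""

    def __init__(self, owner):
        self.owner = owner
        self.items = {}  # item name -> name of the reserver, or None

    def add(self, item):
        # Adding an item twice leaves any existing reservation untouched.
        self.items.setdefault(item, None)

    def reserve(self, item, giver):
        """Reserve an item; return False if someone else already has it."""
        if giver == self.owner:
            raise ValueError("owners cannot reserve gifts on their own list")
        if self.items[item] is not None:
            return False
        self.items[item] = giver
        return True

    def view(self, viewer):
        """List items with a status; the owner never sees who reserved what."""
        rows = []
        for item, reserver in self.items.items():
            if viewer == self.owner:
                status = "hidden"
            elif reserver is None:
                status = "available"
            else:
                status = "reserved"
            rows.append((item, status))
        return rows


if __name__ == "__main__":
    wishes = WishList("Ann")
    wishes.add("scarf")
    wishes.reserve("scarf", "Bob")
    print(wishes.view("Ann"))  # what the list owner sees
    print(wishes.view("Cat"))  # what another family member sees
```

Everything else (logins, saving to a database, making it look nice) can be layered on one piece at a time, which is exactly how the site grew.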

Project Ideas

Having a hard time coming up with something? Here are some I would build if I had time:

Academic Wiki v2. There's a website called the Academic Wiki where people post academic job postings, then update information as they learn more about each one. We could use a better version of that, where users can track individual jobs and quickly skim what jobs have been posted since they last logged in. The current site gets to be a real mess later in the job season. Even if your site never catches on, it would still be useful for tracking your own job apps.
Glassdoor for academic journals. Each sub-field of history has a few journals with an informal pecking order of prestige, as well as some unspoken information about what type of content belongs there. That's fine for your own field, since over time you will gather this information by reading the journals, but it can be hard to assess when you're dabbling in a new sub-field. Make a website where people can submit their opinions about the prestige order for their subfields, and have it sort the journals in some kind of reasonable order (average ranking? Weighted somehow? Maybe a composite where you can search for a journal that fits "History of Science" and "Diplomatic History"?). Bonus points if users can submit their experience with turn-around time, to get crowd-sourced data. Sometimes lower prestige but faster beats high prestige that requires an 8-month wait for any answer.
Public History Search Crawler. Certain search engines like DuckDuckGo let you pull data from their servers via an easy-to-use "API." Build a site that finds every search that includes the term "history," puts the results through some semi-intelligent filters to remove junk, and shows what questions people are asking the internet about history (or sociology, or anthropology, or whatever your field is). Rank them weekly, and see if you can chart trends across time (e.g. was there a spike in searching for Civil War history around the Charleston shooting and Confederate flag controversy?). This would make the basis for a pretty swell blog if you could get guest posts on the week's top issues.

Required skills: I would build these in PHP/MySQL, but that's a case of "when your only tool is a hammer..." You could also use JavaScript, Python, or plenty of other languages. To make it pretty, you might benefit from learning to integrate CSS.
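
As one illustration, the "average ranking" option from the journal idea above takes only a few lines of Python. The journal names, the submission format, and the rank_journals function are all made up for this sketch; a real site would also need storage, user accounts, and some thought about how to weight the votes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical submissions: (subfield, one user's ordering, most prestigious first).
SUBMISSIONS = [
    ("History of Science", ["Journal A", "Journal B", "Journal C"]),
    ("History of Science", ["Journal B", "Journal A", "Journal C"]),
    ("History of Science", ["Journal A", "Journal C", "Journal B"]),
    ("Diplomatic History", ["Journal D", "Journal A"]),
]


def rank_journals(submissions, subfield):
    """Sort a subfield's journals by the average position users gave them."""
    positions = defaultdict(list)
    for field, ordering in submissions:
        if field == subfield:
            for position, journal in enumerate(ordering, start=1):
                positions[journal].append(position)
    averages = {journal: mean(places) for journal, places in positions.items()}
    # Lower average position means higher perceived prestige; ties break by name.
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    for journal, average in rank_journals(SUBMISSIONS, "History of Science"):
        print(f"{journal}: average position {average:.2f}")
```

A plain average is the simplest choice and treats every submission equally. Swapping in a weighted scheme later would only mean changing the line that computes the averages.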

Digital Projects Cost Time, Not (Much) Money

Web hosting, including pre-installed MySQL and a domain name, can be had for as little as $20/year – it really doesn't need to cost a great deal. There is also funding available, including collaborative research funds from DH at Berkeley, NEH DH Start-Up Grants, and more.

Once you have hosting for one project, you can usually use that same hosting for any future sites, making it even cheaper. I rented hosting some years ago for FamilyGiftLister, and that same website also hosts my personal website and the Patent Co-Inventor Network Tool (though we intend to migrate that to Fung Institute servers eventually). You can buy several domain names and host them under the same plan – for example, www.douglasoreagan.com is the same as http://familygiftlister.com/personal/

Even better than building a project alone is building it as part of a team. Each of you should focus especially on making sure everything is well-documented and clearly structured so you don't get in each other's way. Businesses are often suspicious of hiring humanities PhDs because of a perception that they won't fit into a team environment. Having collaborated on a substantial digital humanities project, and having good, clean code to show for it, can be a really valuable commodity both within and beyond academia. Plus, it's always true that the best way to learn is to teach, so helping each other out with problems and figuring out each other's code will be great for your own learning process.

Broader Thoughts on Collaboration

Expectations for scholarly output vary considerably across academic disciplines. History is a book field, meaning articles are worthwhile but secondary for my career. Starting on a new topic and producing one top-quality article in a year (and a few presentations) is a reasonable pace of work. Meanwhile, in Industrial Engineering and Operations Research (the specific department hosting me), a handful of articles per year is a low baseline.

The obvious lesson here is also the most important one:
Communicate expectations clearly. Do it early, keep doing it throughout. Ultimately, I took the lead on one paper, Lee took the lead on another, I contributed somewhat to a third group paper, and then I helped out at the center in other ways (running a seminar, helping an undergraduate researcher get up to speed on the technical end of things, etc.). Set up weekly meetings if possible, and at least keep in frequent email contact.

That compromise – more papers, and somewhat shorter ones – had a few consequences:

Collaboration often means delays, because everyone is very busy. That means you need to juggle multiple projects to keep up a decent pace of progress, which can be a very different workflow than a historian / humanities scholar might be used to. Our visualization specialist, for example, had an enormous amount of work on his plate, and so a request that he prioritize X or Y might still mean 2-3 weeks before it could get done. Our database specialist similarly had other obligations, and data issues could be corrected overnight, or it could take weeks/months, depending on a lot of factors over which I had no control. I was used to being able to clear my schedule for a week and plow through a major portion of a project. That's just not always feasible when working with others, and it's worth thinking through thoroughly before you get started.

Plan time to learn as much as possible about your colleagues' fields. That's also one of the chief benefits of collaboration. Despite the significant overlap in the subject matter of innovation studies, legal scholarship on intellectual property, and history of science and technology, those communities almost never read each other's work or even communicate informally. There is a world of benefit to be gained from making connections between different areas of scholarship, and being able to signal to people in new fields that you speak their language by dropping the right reference at the right time. It does mean slotting in a lot of extra reading (and time to figure out exactly how to even find that other field's readings) in an already overly busy schedule.

Finally, it might not all work out quite right - and that’s okay. Digital humanities is still a very experimental field, and sometimes experiments fail. We didn’t set out to build the tool described above. Our first several project ideas had marginal success, producing reasonable but uninteresting results. Developing our final research project (of which the Network Tool is just one small part) took time, effort, and some frustration. If, upon reflection, you find that your project isn’t living up to your standards for academic rigor, don’t be afraid to rethink and restart. The skills you’ve gained in making this first step will still be helpful in taking the next step - and you may even be able to reuse some of your code. There is a world of possibilities out there for teams willing to combine different skillsets - with all the challenges that entails - to find new ways of learning about our world.