Last week, we discussed project sustainability through the lens of who builds DH projects and who maintains them. Project sustainability can also be viewed as the way a project connects to adjacent tools and communities, both actively and passively. Who uses your project, and how easy is it for someone to create a derivative use? How interoperable is your project: how easy is it to connect to other tools and data sources? What choices, technical and social, can you make to enable future collaborations? This blog post continues a series reflecting on project sustainability, using the DiRT Directory as a case study.

Open Data & Open Culture

In the STEM fields, as well as the data-driven social sciences, there has been a cultural push for truly reproducible research. Funding institutions and peer-reviewed journals are increasingly asking researchers to make their data publicly available by default, rather than by request. In some cases, funding agencies have even made this a requirement. Humanities agencies, such as the NEH Office of Digital Humanities, now require data management plans and sustainability plans for DH Start-Up Grants and DH Implementation Grants. The new culture of open data and these funding requirements have also oriented humanists towards making their research materials, the objects of their analysis, freely available where intellectual property agreements, fair use, or Open Access licenses permit redistribution.

Access and the right to build upon these materials also form a key pillar of open culture. Cultural data aggregation projects like the Digital Public Library of America and Europeana, and research collectives like HathiTrust, unite digital collections and institutional repositories and make that data searchable and mineable via APIs. In 2014, DH at Berkeley had the opportunity to participate in forays into open culture through hackathons co-hosted with the Bancroft Library (#HackFSM) and the Phoebe A. Hearst Museum of Anthropology (HackTheHearst). At these events, students and community members were invited to play with digital collections data, made available via an API. Where copyright does not allow the aggregation of content, the practice of linked open data allows us to sketch the landscape of information by making it easier to aggregate metadata, descriptive data, and bibliographic data.

Enabling Editing, Forking, and Remixing

Making data accessible opens it up to reuse in novel ways, such as running independent analyses of the data, combining it with other data, improving or editing it, or building new features or functionality. For example, the DiRT Directory makes its tool data available via an API, which is used to enrich other spaces in the DH community. DHCommons now uses the DiRT Directory’s API to add context to tools tagged in collaborator and project pages.

Another way to approach sustainability is to not only allow the redistribution of your data, but to invite users to edit existing data and contribute new material. Projects and tools take different approaches to inviting and managing the efforts of contributors. In this regard, DH continues to be affected by larger movements in open source software and open culture / remix culture.

Some of these efforts harness the help of a few dedicated users (such as an editorial board or a development team), while others draw on the help of many users (such as volunteers or paid crowdworkers) for short intervals of time. Since its beginning, the DiRT Directory has drawn inspiration from a vision of community-maintained resources (found in projects like Wikipedia and open source software communities) and thus initially chose a wiki platform for its first website. Since its 2011 reformulation in Drupal, the DiRT Directory has continued to be freely editable by any user with a registered account. Revision tracking, another feature enabled by Drupal, allows the DiRT community to give credit to editors’ contributions.

Digital humanities projects can also borrow the powerful concepts of forking and remixing from the open source software and open culture movements. Forking involves building new projects that are derived from someone else’s material. For example, Paper Machines is a plugin that connects to the open source bibliographic management system, Zotero, to perform machine learning tasks on collected texts.

The DiRT Directory participates in this culture of reuse by using a Creative Commons license, which enables scholars to use data for their own purposes and form communities around it. For example, ADHO’s GeoHumanities Special Interest Group is currently at work on GeoDiRT, a directory of geospatial analysis tools that draws upon data from the DiRT Directory. Some basic metadata categories, such as whether the tool is actively maintained and what type of license it’s available under, will automatically be ported to GeoDiRT via an API. GeoDiRT editors will add more discipline-specific information about geospatial software tools.
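The selective porting described above, where a few shared metadata fields are copied from a source record and local editors add their own information, might be sketched roughly as follows. This is only an illustration: the record structure, the field names, and the `port_record` helper are hypothetical and do not reflect the actual DiRT or GeoDiRT API.

```python
# Hypothetical field names; the real directory schema may differ.
SHARED_FIELDS = ("name", "maintained", "license")


def port_record(source, local_additions=None):
    """Copy only the shared fields from a source record, then merge in
    any fields added by local editors."""
    # Fields missing from the source are carried over as None.
    ported = {field: source.get(field) for field in SHARED_FIELDS}
    # Local additions are merged on top of the shared fields.
    ported.update(local_additions or {})
    return ported


# An invented source record, standing in for data fetched from an API.
source_record = {
    "name": "Example Mapping Tool",
    "maintained": True,
    "license": "GPL-3.0",
    "internal_notes": "not meant for reuse",
}

geo_record = port_record(source_record, {"coordinate_systems": ["WGS 84"]})
print(geo_record)
```

Keeping the list of shared fields explicit makes it easy to see, and to renegotiate, exactly what one project inherits from another.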

Sustaining Your Project’s Community

Any project that is fueled mainly by user contributions will need to invest serious time and effort into outreach work, designing attractive content, and moderating user submissions. Projects like Metadata Games and Zooniverse invite users to interact with collections through a game interface. These games entertain the user and further the project by facilitating activities like the description of collection images or the transcription of World War I diaries.

For many projects, attracting this level of user participation and/or providing resources for that level of engagement will be unfeasible. For example, while the DiRT Directory is used by a huge number of users and is open for anyone to edit, the vast majority of its content contributions come from a small group of dedicated editors. Not all projects will have the resources for (or interest in) coordinating an open source community, maintaining a communications plan, or actively implementing integrations with other projects. However, there are less involved practices that a project can adopt, such as:

  • Posting project files (database, code, etc.) to a public GitHub repository
  • Making data available via an API
  • Making data available as static downloads (e.g. CSV, JSON)
  • Choosing technical solutions that allow for multiple users to edit data
  • Licensing content for reuse with Creative Commons
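As one illustration of the static-download option in the list above, here is a minimal Python sketch that writes the same records to both CSV and JSON using only the standard library. The records, the field names (`name`, `license`, `maintained`), and the output filenames are invented for this example and are not the DiRT Directory's actual schema.

```python
import csv
import io
import json

# Hypothetical tool records; field names are illustrative assumptions.
TOOLS = [
    {"name": "Example Tool A", "license": "GPL-3.0", "maintained": True},
    {"name": "Example Tool B", "license": "MIT", "maintained": False},
]


def to_csv(records):
    """Serialize a list of flat dicts to CSV text with a header row."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records to indented JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Write both formats side by side so users can pick whichever they prefer.
    with open("tools.csv", "w", encoding="utf-8", newline="") as f:
        f.write(to_csv(TOOLS))
    with open("tools.json", "w", encoding="utf-8") as f:
        f.write(to_json(TOOLS))
```

Files like these can be regenerated on a schedule and committed to the project's public repository, which covers several of the practices above with little ongoing effort.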

We encourage researchers to imagine DH projects, from the beginning, as connected to other data sources, workflows, and communities. Taking a holistic view of sustainability reveals the true collaborative potential of digital projects, which can extend well beyond the initial team that digitizes, develops, and/or analyzes a set of data.

Other Resources: