Open source software (OSS) is ubiquitous, and contributions to and adoption of OSS have immense value at national and global scales (missing reference). Open source software is at the core of most major operating systems and drives most of the websites and internet services we use every day; however, it is often invisible. When we unzip a file, for example, it is rarely apparent that we are directly engaging with open source software. Within academic institutions, this issue of visibility may be even more pressing. Professors, graduate students, undergraduates, staff, extension employees, and other affiliates may all produce open source software, or intend to build it. But unless their work is widely publicized, we may never know about it, and these individuals often receive little reward for producing it.
One of the first things we wanted to do as part of the University of Wisconsin OSPO was to build a tool to help us understand the landscape of open source software at the University. With that picture in hand, we could begin to identify how our work could help community members build better software, build new communities, and learn from one another and from external experts, fostering a community of innovation and discovery around open source software at the University.
A significant challenge was understanding and capturing the extent of open source development in our community. We can use basic search tools and APIs from open source code hosting platforms like GitHub and GitLab, but Wikipedia's comparison of source-code hosting facilities lists 16 open source code repositories, and within academia, individuals may also share code through their personal websites or as supplementary material within journal articles.
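As an illustration of the kind of basic API search described above, the sketch below queries GitHub's repository search endpoint (`GET /search/repositories`). It is a minimal sketch, not the OSPO's actual tooling: the function names (`build_search_url`, `parse_search_results`, `search_repositories`) are hypothetical, and the query in the usage note is only an example of a search one might try. It uses only the Python standard library and keeps URL building and response parsing separate from the network call, so each piece can be checked on its own.

```python
import json
import urllib.parse
import urllib.request

GITHUB_API = "https://api.github.com"


def build_search_url(terms, qualifiers=None, per_page=30, page=1):
    """Build a GitHub repository-search URL (hypothetical helper).

    `terms` is free text; `qualifiers` maps GitHub search qualifiers
    such as "in", "user", or "org" to their values.
    """
    parts = [terms]
    for key, value in (qualifiers or {}).items():
        parts.append(f"{key}:{value}")
    params = urllib.parse.urlencode(
        {"q": " ".join(parts), "per_page": per_page, "page": page}
    )
    return f"{GITHUB_API}/search/repositories?{params}"


def parse_search_results(payload):
    """Reduce a decoded search response to (full_name, html_url) pairs."""
    return [(item["full_name"], item["html_url"]) for item in payload.get("items", [])]


def search_repositories(terms, qualifiers=None, page=1, token=None):
    """Run one page of the search against the live API (needs network access)."""
    request = urllib.request.Request(
        build_search_url(terms, qualifiers, page=page),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        # Authenticated requests get a higher rate limit.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return parse_search_results(json.load(response))
```

For example, `search_repositories('"University of Wisconsin"', {"in": "readme"})` would look for repositories whose README mentions the University. Search results are paginated and unauthenticated requests are more tightly rate limited, so a real harvester would pass a token and step through `page`. GitLab exposes its own search API with different parameters, which is one reason a single query cannot cover every hosting platform.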
Regardless of where the code is hosted, we wanted to store some critical metadata:
Beyond this, many of the larger public code hosting platforms provide additional metadata about individual repositories, such as the number of commits, file and language types, collaborator networks, and more (see, for example, the GitHub REST API and GitLab REST API).
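The sketch below shows how some of this additional metadata could be pulled from the GitHub REST API, using the repository endpoint (`GET /repos/{owner}/{repo}`) and the languages endpoint (`GET /repos/{owner}/{repo}/languages`, which reports bytes of code per language). The `RepoRecord` layout and the helper names are hypothetical illustrations, not the fields or code the OSPO actually uses. Commit counts and collaborator networks are not part of these two responses and would need further calls (for example, to the commits and contributors endpoints).

```python
import json
import urllib.request
from dataclasses import dataclass, field
from typing import Dict, Optional

GITHUB_API = "https://api.github.com"


@dataclass
class RepoRecord:
    """Hypothetical record layout; these fields are illustrative only."""
    full_name: str
    url: str
    description: Optional[str]
    license_spdx: Optional[str]
    stars: int
    forks: int
    created_at: str
    last_push: str
    languages: Dict[str, int] = field(default_factory=dict)


def _get_json(url: str, token: Optional[str] = None) -> dict:
    """GET a GitHub API URL and decode the JSON body (needs network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_record(repo: dict, languages: Dict[str, int]) -> RepoRecord:
    """Map decoded repository and languages responses onto a RepoRecord."""
    license_info = repo.get("license") or {}  # "license" is null when none is detected
    return RepoRecord(
        full_name=repo["full_name"],
        url=repo["html_url"],
        description=repo.get("description"),
        license_spdx=license_info.get("spdx_id"),
        stars=repo.get("stargazers_count", 0),
        forks=repo.get("forks_count", 0),
        created_at=repo.get("created_at", ""),
        last_push=repo.get("pushed_at", ""),
        languages=dict(languages),
    )


def language_shares(languages: Dict[str, int]) -> Dict[str, float]:
    """Convert bytes-per-language counts into fractions of the codebase."""
    total = sum(languages.values())
    if total == 0:
        return {}
    return {name: round(count / total, 3) for name, count in languages.items()}


def fetch_repo_record(owner: str, name: str, token: Optional[str] = None) -> RepoRecord:
    """Fetch both endpoints for one repository and combine them."""
    base = f"{GITHUB_API}/repos/{owner}/{name}"
    return to_record(_get_json(base, token), _get_json(f"{base}/languages", token))
```

Calling `fetch_repo_record("owner", "repo")` returns a `RepoRecord`, and `language_shares(record.languages)` turns byte counts into proportions. Keeping `to_record` separate from the HTTP call makes it straightforward to write a parallel adapter for the GitLab REST API, whose project responses use different field names.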
By reaching out to the community through our OSPO Survey, we began to understand how individuals in the UW community engage with and understand open source software. It was also clear that individuals engaged with open source, and reported that engagement, in a number of ways, from contributions to projects they had developed to needs they reported. This means that any product we build to better understand the community needs to track a range of open source software resources. We recognize that helping build a stronger culture around open source software will require us to establish baselines, identify best practices, and measure improvements, so that we can see how the community changes and evolves over time.
Over the next few months we will be releasing posts detailing how we made these decisions, from surveying the community directly, to scraping data using the APIs of open source code repositories, to searching journal articles and the web with full-text search tools. We will describe how we built our Open Software Database, the design decisions behind an API for accessing data from the database, and how we chose to analyze, present, and track the data we gathered. We will also discuss how we plan to move from our analysis to helping build a stronger community culture around OSS.