A Look Into GitHub / GitLab Repository Quality Metrics

Abe Megahed, University of Wisconsin Data Science Institute

Summary

As part of its investigation into open source activity around the state, the University of Wisconsin-Madison Open Source Program Office (OSPO) collected and analyzed repository metadata from GitHub and GitLab. The data yields some insights into the general state of code and repository quality.

Methodology

The UW OSPO opened in the fall of 2023, and one of our first tasks was to understand the current state of open source activity at the University of Wisconsin and around the state (in accordance with the Wisconsin Idea). To aid in this task, we built a set of scripts that use the GitHub and GitLab APIs to download and compile metadata about relevant repositories.
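
As an illustration of what such a metadata script might look like, here is a minimal sketch against GitHub's public REST endpoint for a single repository. It is not the OSPO's actual code (that lives in the repository linked below). The function names and the particular set of features checked are assumptions chosen for the example; the field names (`description`, `license`, `topics`, `homepage`) are those returned by the GitHub API.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def fetch_repo_metadata(owner: str, repo: str) -> dict:
    """Download the public metadata record for one GitHub repository.

    Unauthenticated requests are rate limited, so a real survey script
    would likely add an access token; that is omitted from this sketch.
    """
    request = urllib.request.Request(
        f"{GITHUB_API}/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_features(meta: dict) -> dict:
    """Reduce a repository record to present/absent flags.

    The choice of features here is an assumption for illustration,
    not the list used in the OSPO study.
    """
    return {
        "name": meta.get("full_name"),
        "has_description": bool(meta.get("description")),
        "has_license": meta.get("license") is not None,
        "has_topics": bool(meta.get("topics")),
        "has_homepage": bool(meta.get("homepage")),
    }
```

Calling `summarize_features(fetch_repo_metadata(owner, repo))` for each repository of interest would produce one row per repository, which could then be tallied to see how often each feature is absent. A GitLab version would follow the same pattern against the GitLab projects API, with its own field names.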

The code and findings from this investigation are available at the following URL: https://github.com/UW-Madison-DSI/UW-Open-Source-Exploration

Findings

The most notable finding in the data was what most repositories were missing. These results are summarized in the following charts: https://projects.dsi.wisc.edu/amegahed/open-source

We found the following GitHub statistics:

[Figure: GitHub Features Chart]
[Figure: GitLab Features Chart]

Proposed Remediations

We are considering several ways to address the code quality problem and to help people create better repositories.

  1. Educational Materials

    We can create a set of instructional materials that would describe the importance of each of the repository elements and serve as a guide for building quality repositories.

  2. Curated Examples

    Sometimes people learn best by example, so the OSPO is working to build a curated set of showcase repositories that demonstrate what a high-quality, complete repository looks like in practice.

  3. Templates

    There are a number of examples of README templates, but there are differences of opinion on the essential elements and preferred formatting. We could create our own OSPO README template that reflects our own thoughts and experience.

  4. Classes or Workshops

    The OSPO could hold classes or workshops on repository creation. The University of Wisconsin’s Data Science Hub also conducts classes on how to use Git, which could include a section about repository quality.

  5. Automated Repository Evaluation Tools

    We could create an automated evaluation tool that examines a given repository and suggests improvements. This might also be an opportunity to explore using AI tools to parse the README text and help generate more meaningful automated suggestions.
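
To make the last proposal concrete, an automated evaluation tool could start from something as simple as the sketch below. It scans the top level of a local checkout for a few commonly expected files and returns a suggestion for each one that is missing. The checklist contents, the accepted file names, and the function name `evaluate_repository` are all hypothetical choices for this example, not an OSPO specification.

```python
from pathlib import Path

# Hypothetical checklist: element name -> file name stems (lowercase,
# extension removed) that count as satisfying it.
CHECKLIST = {
    "README": ("readme",),
    "license": ("license", "licence", "copying"),
    "contributing guide": ("contributing",),
    "code of conduct": ("code_of_conduct",),
}


def evaluate_repository(root: str) -> list[str]:
    """Return one suggestion per checklist element missing from `root`.

    Only files directly inside `root` are examined; subdirectories such
    as `docs/` or `.github/` are ignored in this sketch.
    """
    stems = {p.stem.lower() for p in Path(root).iterdir() if p.is_file()}
    suggestions = []
    for element, accepted in CHECKLIST.items():
        if not stems.intersection(accepted):
            suggestions.append(f"Add a {element} file.")
    return suggestions
```

Running `evaluate_repository` on a checkout that contains only a README and a license file would return suggestions for the contributing guide and the code of conduct. Judging the quality of a README's content, as opposed to its mere presence, is where the AI-assisted parsing mentioned above might come in.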

Conclusions

Because so many repositories lack even the most basic elements, the UW Open Source Program Office has an opportunity to make a significant difference in the code quality of current repositories.