Just like the dust that settles in the atmosphere, open source programs have begun to settle into nearly every area of research. Just ask Paolo Veglio, a researcher at the University of Wisconsin-Madison Space Science and Engineering Center. His research brings both ideas together: the study of the atmosphere and the use of open source simulation.
Veglio says open source programming has become standard among him and his colleagues.
“Scientists are okay sharing stuff. Again, the reward is always like, oh yeah, I have this powerful tool. I’m happy for you to use it, just give me credit.”
Veglio’s main research project is currently the NASA Cloudmask program. In simple terms, this program reads images and measurements from NASA and NOAA satellites and assigns every pixel of every image a probability that a cloud is present at that spot. The open source algorithm that calculates this probability has been in continuous development for over 30 years, and current optimization efforts focus on reducing the software’s memory usage.
“Say you’re looking at a photograph you snapped with your phone of a field scattered with flowers,” Veglio explained. “You can use colors to identify which pixels are flowers instead of grass.”
“The cloud mask behaves in a similar way; you take a photo of the atmosphere and you try to identify what pixels are cloudy. The main difference is that while your phone or camera only captures images in Red-Green-Blue channels, satellite-borne sensors capture data in additional channels, both in the visible and near-infrared parts of the spectrum to provide a more complete picture of the atmosphere.”
What kind of research can come from studying the cloudiness of Earth? It is not just cloudiness they are measuring, Veglio insisted; it is the presence of contaminants as well. By determining whether a pixel contains a cloud, researchers can begin to identify what other particles are present there. They can then measure how abundant those particles were and how they interacted with other elements, and from there assess how they may have affected the climate.
Veglio’s efforts are mainly focused on training an algorithm that can create synthetic datasets realistic enough to simulate a wide range of potential scenarios. He hopes these datasets will make climate predictions more accurate than ever before.
The Cloudmask program was built with open source tools such as Python and the C programming language. It also relies on other openly available resources, such as the NASA GEOS-5 model, to obtain contextual information and improve its retrieval output, which shows how much open source resources contribute to scientific progress.
For Veglio, the main benefit of open source is that it lets scientists avoid redoing work that has already been done.
“It’s like Newton said: I see far because I stand on the shoulders of giants,” Veglio said. “If everyone started reinventing logic and arithmetic… we would never go anywhere. Open source is the same idea… we have all these tools, let’s use them.”
An especially important part of open source is the community it has built. Around the world, thousands of scientists have worked on the same pieces of code. For Veglio, that means no piece of code is a true mystery: there is someone, somewhere, who knows exactly how it works, down to the deepest bits of binary. That kind of shared knowledge allows everyone to contribute, even if all they leave is a bug report or a pull request.
The open source community here at UW-Madison is especially vibrant. Veglio himself is directly involved, serving as a mentor for one of the Open Source Program Office’s internship projects.
Working directly with the intern, Veglio has been able to pass on his knowledge of the Cloudmask project. With a dedicated student working on it, the Cloudmask program has developed at a rapid pace. The intern has contributed to the memory optimization of the algorithm, which Veglio says has greatly improved its performance and scalability.
Madison’s open source community extends even beyond the Open Source Program Office. There is a robust network of programmers, Veglio says, some of whom work in his own office. Often, when he finds a valuable piece of open source code, he discovers that it began with a researcher at the University of Wisconsin-Madison.
“Sure enough, I was searching online and found a piece of open source code that was related to what I was doing,” Veglio recalled. “Before I grabbed it, I looked at the creator. It was literally taken from the code that someone gave me. It was actually from a guy who worked in my office.”
Much of Veglio’s research builds on the progress of others. Progress, he insists, depends on sharing resources. Whether those resources come from someone on campus or from someone across the world, improvements happen when people work together.
“When we share our projects, we double our knowledge,” Veglio said. “That idea applies directly to open source. Our knowledge is shareable, and it keeps scaling up. When we work together, our knowledge just keeps adding.”