The next generation of environmental software needs a vision and some help

ISEES1

At a three day workshop, a group of scientists explored a vision of the “grand challenges” that eco- and earth science face in the coming decade. Each of these challenges, if answered, would provide invaluable new knowledge to resource planners and managers across the planet. And every challenge contained a workflow that called upon software capabilities, many of which do not currently exist: capabilities to handle remote and in situ observations and environmental model output in order to incorporate multiple data layers and models at several resolutions, from a prairie to the planet. Water cycles, pollution streams, carbon sequestration, climate modeling, soil dynamics, and food systems—achieving the next plateau of understanding these processes will require a massive investment in computing and software. The reason for this workshop was to help inform a new institute that can provide key services to make this investment pay off.

Much of this software will be built by research teams that propose projects to solve these grand challenges. These teams will be multi-institutional and are likely to be more focused on the science side of their project, and less on the value their software might acquire by being built on standards, using best-practice coding, and ready for reuse by others. The history of federally-funded science software is crowded with abandoned ad hoc project-based software services and products. I’ve helped to author some of these. One of the federally-funded products (a science education software package) I helped produce had its home-page URL baked into its user interface. After the project funding ended, the PI did not renew the domain name, and this was picked up by a Ukrainian hacker, who used it as the front end of a pornography portal. So the software UI (distributed in hundreds of DVDs) now pointed students to a porn site. A far more prevalent issue is that of software built with 3rd-party services (remember HyperCard?) that have subsequently changed or died, breaking the software after the funding is gone and the programmer has moved on. The point here is that there are dozens of lessons already learned by science software developers, and these need to be assembled and shared with the teams that are building new software.

There is still more value to be added here. A software institute can offer a range of services that will make funded software more reliable, more reusable, and more valuable to science. Much of the federally-funded software development will be done by university staff scientists and graduate students. Most of the latter are in the beginning stages of learning how to program. A crash course on agile programming and Git, or other basic programming skills, could help them get up to speed over a summer. An up-to-date clearinghouse of data and file format issues and recommendations, a help-desk for common CMS and data access problems, and particularly, personal (Skyped) help when the grad student hits a wall: these services can save a funded project from floundering. All together, these services can save the project’s software from an early grave. Research into extending the lifecycle of science software is needed to help science maintain the longer-term provenance of its methods and findings.

Isees2

This workshop was organized by the team that is looking to build the Institute for Sustainable Earth and Environmental Software. Here is their website: http://isees.nceas.ucsb.edu