Category Archives: eScience

EarthCube is poised to start its mission to transform the geosciences

The red areas are sandstone.

Here is the current vision statement of EarthCube:

EarthCube enables transformative geoscience by fostering a community committed to providing unprecedented discovery, access, and analysis of geoscience data.

The primary goal of membership in EarthCube, and indeed of the entire culture of the EarthCube organization, is to support this vision. The EarthCube vision describes a future where geoscience data are openly shared, and where a new science, one based on an abundance of sharable data, assembles new knowledge about our planet. Certainly shared open-source software and open-access publishing are anticipated in this vision. The vision accepts that it will take a committed community of domain and data scientists to realize this goal.

What can we predict about the culture of a community committed to transformational geoscience? How is this different from the culture of the community pursuing geoscience today? We need to start building out our imagination of what transformative geoscience will look like and do. One thing we might agree on is that it will be a much more open and collaborative effort.

Unprecedented data discovery, access, and analysis in the geosciences, coupled with open-science best practices, will drive knowledge production to a new plateau. Many of today’s grand-challenge questions about climate change, water cycles, human population interaction with ecosystems, and other arenas will no longer be refractory to solution. For now, we can call the engine for this process “Open Geosciences,” or OG for short. What will OG pioneers be doing, and how can EarthCube foster these activities?

  • Pioneering OG scientists will collect new data using shared methodologies, workflows, and data formats.
  • These OG scientists will describe their data effectively (through shared metadata) and contribute this to a shared repository.
  • OG scientists will analyze their data with software tools that collect and maintain a record of the data provenance, as well as metrics on the software platform (a rough sketch of this habit follows this list).
  • OG scientists will report out their findings in open access publications, with links to the data and software.
  • OG scientists will peer review and add value to the work of others in open review systems.
  • OG domain and data scientists will reuse open data to synthesize new knowledge, and to build and calibrate models.
  • OG software engineers will collaborate on open software to improve capabilities and sustainability.
  • OG scientists will share more than data. They will share ideas and null results, questions and problems, building on the network effect of organizations such as EarthCube to grow collective intelligence.
  • OG science funding agencies will work with OG communities to streamline research priority decisions and access to funding.
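As a very rough illustration of the provenance habit mentioned in the list above, here is a minimal Python sketch. It is purely hypothetical (the function names, record fields, and file layout are invented here, not part of any EarthCube specification), but it shows the basic move: fingerprint the inputs, record the software platform, and save that record alongside the result.

    import hashlib
    import json
    import platform
    import sys
    from datetime import datetime, timezone

    def sha256_of_file(path):
        # Fingerprint an input file so the exact data used can be verified later.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    def run_with_provenance(analysis_fn, input_paths, record_path):
        # Run an analysis and write a provenance record next to the result.
        record = {
            "started": datetime.now(timezone.utc).isoformat(),
            "inputs": {p: sha256_of_file(p) for p in input_paths},
            "platform": {"python": sys.version.split()[0], "os": platform.platform()},
        }
        result = analysis_fn(input_paths)
        record["finished"] = datetime.now(timezone.utc).isoformat()
        with open(record_path, "w") as f:
            json.dump(record, f, indent=2)
        return result

A real provenance system would capture far more (library versions, parameters, workflow steps), but even this much makes a result reproducible in a way a bare output file is not.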

At this stage, EarthCube is in its most institutionally reflexive moment and is most responsive to new ideas. Like a Silicon Valley start-up flush with cash and enthusiasm, EarthCube is poised to build its future from the ground up. EarthCube can succeed in its vision without attempting to directly influence the embedded cultures of government organizations, tier-one universities, professional societies, and commercial publishers. EarthCube will succeed by building its own intentional culture, starting with its membership model and focused on its vision. EarthCube will only transform geoscience by proving that its members can do better science, faster and cheaper, through their commitment to the modes of scientific collaboration now made possible through EarthCube. EarthCube will transform science by transforming the practices and attitudes of its own members.

NASA image by Robert Simmon with ASTER data. Caption by Holli Riebeek with information and review provided by David Mayer, Robert Simmon, and Michael Abrams.

Hitting the target makes all the difference for the software life cycle

Sky diver jumping from plane

At a recent NSF-funded workshop that looked at how a new institute might help scientists become better stewards of the software they create for their research, a day was devoted to discussing the entire software life cycle and the differences between commercial software, open-source, community-led software, and academic science software. A long list of positives and negatives accumulated to describe the triumphs and the pitfalls of each of these arenas for software development. Most of the triumphs were in the commercial software column, and the great majority of pitfalls were common to science software development.

That evening, upon reflection, it occurred to me that commercial software developers are simply very good at determining a target (a feature and/or a customer) and then hitting it. Academic software developers, admittedly working on shoestring budgets, seemed only to cobble together whatever feature their next experiment might require, with the result being software that was almost secreted over time (I almost said excreted…) instead of crafted for the long haul.

It struck me—reflecting back on my single skydiving adventure, in the days when you still took your first jump solo on a static line—that my focus at the time had narrowed down to the single fear of getting out of the plane. I did not want to freeze in the door. Consequently, I did not pay as close attention as I might have to what happened next. As a result, I ended up landing in a field away from the airport (upside: I did not land on the Interstate). I hit the ground, no problem, without breaking anything, but I missed the target completely.

Again, commercial software developers are firmly focused on their targets, and they make software that helps others find the same target too. To do this, they know how to jump and when to pivot in order to land right on that X. When Instagram created its software, it focused on the simplicity of sharing photos.

Open-source, community-led software tends to lose that target focus, in part because the developer community usually has several simultaneous targets in mind. What they are good at is designing the parachute and the gadgets that help to figure altitude and wind. They make jumping safer and more fun, and their goal is to enable more people to do the same.

Getting back to science software developers: these are often individuals or small teams working as part of a larger project. They wrangle the datasets and finagle some visualizations. They add a button or a drop-down list and call it a GUI. They tell their team how to use it and what not to do so it won’t crash. Then they go ahead and do their experiments and write them up. In software life-cycle terms, all they know how to do is jump out of the plane. Forget the target, never mind the parachute…just jump.

The goal of the NSF workshop was to help design an institute that would support better software development practices across the environmental and earth sciences. To do that, science software developers need to keep their focus all the way through to the common targets of resourceful, reliable, and reusable software. Do you have some ideas? Feel free to join the ongoing conversation at the ISEES Google+ Community.

Photo Credit: CC licensed on Flickr by US Air Force

The next generation of environmental software needs a vision and some help


At a three day workshop, a group of scientists explored a vision of the “grand challenges” that eco- and earth science face in the coming decade. Each of these challenges, if answered, would provide invaluable new knowledge to resource planners and managers across the planet. And every challenge contained a workflow that called upon software capabilities, many of which do not currently exist: capabilities to handle remote and in situ observations and environmental model output in order to incorporate multiple data layers and models at several resolutions, from a prairie to the planet. Water cycles, pollution streams, carbon sequestration, climate modeling, soil dynamics, and food systems—achieving the next plateau of understanding these processes will require a massive investment in computing and software. The reason for this workshop was to help inform a new institute that can provide key services to make this investment pay off.

Much of this software will be built by research teams that propose projects to solve these grand challenges. These teams will be multi-institutional and are likely to be more focused on the science side of their project, and less on the value their software might acquire by being built on standards, using best-practice coding, and being ready for reuse by others. The history of federally funded science software is crowded with abandoned, ad hoc, project-based software services and products. I’ve helped to author some of these. One of the federally funded products (a science education software package) I helped produce had its home-page URL baked into its user interface. After the project funding ended, the PI did not renew the domain name, which was picked up by a Ukrainian hacker, who used it as the front end of a pornography portal. So the software UI (distributed on hundreds of DVDs) now pointed students to a porn site. A far more prevalent issue is that of software built with third-party services (remember HyperCard?) that have subsequently changed or died, breaking the software after the funding is gone and the programmer has moved on. The point here is that there are dozens of lessons already learned by science software developers, and these need to be assembled and shared with the teams that are building new software.
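One of the simplest of those lessons, drawn from the URL story above, can be sketched in a few lines of Python. Everything here is hypothetical (the file name, keys, and URLs are invented for illustration); the point is only that endpoints read from a small configuration file can be repointed long after the funding ends, while a URL baked into the interface cannot.

    import json
    import os

    # Defaults ship with the software; every one of them can be
    # overridden later by editing a small local config file.
    DEFAULT_ENDPOINTS = {
        "home": "https://example.org/project",        # placeholder URL
        "data": "https://example.org/project/data",   # placeholder URL
    }

    def load_endpoints(config_path="endpoints.json"):
        # Return service URLs, preferring a local override file if one exists.
        endpoints = dict(DEFAULT_ENDPOINTS)
        if os.path.exists(config_path):
            with open(config_path) as f:
                endpoints.update(json.load(f))
        return endpoints

    # The UI asks load_endpoints()["home"] at startup instead of
    # hard-coding the project's home page into a button or splash screen.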

There is still more value to be added here. A software institute can offer a range of services that will make funded software more reliable, more reusable, and more valuable to science. Much of the federally funded software development will be done by university staff scientists and graduate students. Most of the latter are in the beginning stages of learning how to program. A crash course on agile programming and Git, or other basic programming skills, could help them get up to speed over a summer. An up-to-date clearinghouse of data and file-format issues and recommendations, a help desk for common CMS and data access problems, and particularly, personal (Skyped) help when a grad student hits a wall: these services can save a funded project from floundering. Taken together, these services can save the project’s software from an early grave. Research into extending the life cycle of science software is needed to help science maintain the longer-term provenance of its methods and findings.


This workshop was organized by the team that is looking to build the Institute for Sustainable Earth and Environmental Software. Here is their website: http://isees.nceas.ucsb.edu

From carrots and sticks to donuts and heroin: what academic software producers need to learn from their commercial counterparts.

Carrot and Stick

I’ve spent much of the past decade managing software development projects. These projects can be sorted into two types. One type involves collaboration with academic organizations, mainly with government agency funding. The other type is with commercial partners and an eye toward the open marketplace. Software project management for both types is similar in most ways. Both types use the same agile software development process. The agile project management process includes a conversation about user experience and engagement. In fact, it starts with user problems, stories, and use cases.

The notion of customer-driven design is a central feature of all good software development. So too is the goal of creating something of immediate use and widespread need. There are some differences that, when teased out, suggest arenas where academic (and other open-source) software developers might want to learn something from commercial software development practices. The reverse is not as obvious at the software development level, but is more evident at the user licensing and IP level. At the code level, the process of development and design for academic/government agency software can be quite different from that for commercial software. This difference is mainly a matter of user expectations. As Doc Searls noted, “…Microsoft needed to succeed in the commercial marketplace, Linux simply needed to succeed as a useful blob of code” (Searls 2012, Kindle Locations 2262-2263).

A couple of conversations in the academic software code arena can illustrate how far apart these two types are. In the first, I was told that “we can deliver this with the warts showing, as long as it works.” And in the second, someone noted that some combination of “carrot and stick” could be applied to make sure people used the software service. Compare this to the goal that Guy Kawasaki promotes for software: enchantment. “There are many tried-and-true methods to make a buck, yuan, euro, yen, rupee, peso, or drachma. Enchantment is on a different curve: When you enchant people, your goal is not to make money from them or to get them to do what you want, but to fill them with great delight” (Kawasaki 2011, Kindle Locations 185-187). No warts or sticks allowed if your goal is enchantment. In fact, not that many carrots, either.

I countered the carrot and stick suggestion with one of my own, “How about donuts and heroin?” In commercial software development, it’s not uncommon to ask “So, what is the heroin in this software?” The idea is that the customer would be so enchanted with the software that they would gladly use it every day. Even the worst experience should still be a donut, and not a wart, and certainly not a stick.

Certain realities do intrude here. Academic and agency software developers work on the tiniest of budgets. They tackle massive problems to connect to data resources and add value to these. They commonly have no competition, which means they solve every problem on their own. A “useful blob of code” is better than no code at all. But still, they might consider imagining how to enchant their users, and provide a few dimples and donuts along with the warts and the carrots. Because their users spend most of their digital lives on the daily heroin supplied by Apple and Google and Facebook, being handed a carrot may not do the trick.

Kawasaki, Guy (2011-03-08). Enchantment: The Art of Changing Hearts, Minds, and Actions. Penguin Group. Kindle Edition.

Searls, Doc (2012-04-10). The Intention Economy: When Customers Take Charge. Perseus Books Group. Kindle Edition.

Photo credits, CC licensed from Flickr:

carrot and stick: bthomso

carrot on plate: malias

donuts: shutterbean

eating donut: Sidereal

Hulk want Negroponte Shift in publishing now

It’s one of those weeks where the clanking chains of the ancient devices of academic publishers have been more than a bit annoying, and it looks like no amount of WD-40 will smooth the transition into digital delivery without first demolishing these anachronistic machines and their devilish DRM schemes.

This morning on NPR there was a bit on how public libraries need to subscribe each year to access the same digital files for eBooks, in order to provide these in serial increments to individual users. No overdue books here, the narrator notes, the digital files simply disappear from the user’s device, forcing them to queue up (and wait for weeks or months) until the digital file is again available. The more popular the book, the longer the wait.

I also had the opportunity to search for and find a book published by an academic press in Europe, and was informed by their website that I could download a digital copy for only $100+. Makes the iTunes bookstore seem cheap.

And then, with a link from William Gunn, I made myself read a response in the Guardian to the open access demands of scholars, students, and libraries, written by an (IMHO fatuous) mouthpiece of the publishing industry. At one point he justifies the bloated profits of the industry by noting that publishers pay taxes on these (not if they can help it) and that the government uses these taxes to fund (wait for it)… research. For-profit academic publishers are the “research producers” that keep the wheels of science rolling. Lord help us if the (socialist) open access lumpen masses get their way.

On the more hopeful side, John Wilbanks and the Open Access Gang of Four are most of the way to a successful open-access-for-research petition on the White House petition site. And if only the intern who programmed the Drupal user authentication for that site had hooked in a better module, the necessary 25,000 signatures would likely already have been gathered.

At last look, over at The Cost of Knowledge, 11,923 scholars have pledged to not give their services to Elsevier.

And, today we will find out if Redditors can oust Texas representative Lamar Smith, who co-authored SOPA.

We are all waiting for everything to be digital and available and searchable and browsable, and linked, and curated, filtered and yet without the bubble, semantically rich, and, of course, free. We are not there yet; there is plenty of cruft to clear away. Time to point the Hulk at the entire academic publishing enterprise and say “SMASH.” It couldn’t hurt.

Use a virtual organization to borrow enough requisite variety to innovate in a data-rich world

Photo from Flickr, used with CC license, by http://www.flickr.com/photos/dominik99/

What can you do when your research team says, “This is way too complex.”

When your government agency or university laboratory looks to innovate in a world where multiple, large data inputs are coming online, how can you stay ahead of the inherent complexity of the systems you are creating and interrogating? One way to look at this problem is through Ashby’s principle (or law) of requisite variety. A principle of cybernetic management, requisite variety holds that unless a control system has at least as much variety as the environment it controls, it will fail; in practice, this means that some part of the environment will be controlled elsewhere. Elsewhere is also where innovation happens, because unless you can corral the inherent variety of the problem you face, it will seem too complex for your team to innovate a response (Kofman [1]). You can either go out and hire a bigger team, or you can borrow enough requisite variety just long enough to bring your own team up to speed. That is a great use for a virtual organization (VO).
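Ashby’s point can be made concrete with a toy simulation in Python. This is an invented sketch, not anything from the cybernetics literature: an environment that can disturb a system in ten distinct ways, faced by a controller with responses for only four of them, leaves the other six disturbances to be “controlled elsewhere.”

    import random

    random.seed(42)

    def fraction_regulated(environment_states, responses, trials=10000):
        # Count how often the controller has a matching response
        # for a randomly chosen disturbance from the environment.
        handled = sum(
            1 for _ in range(trials)
            if random.choice(environment_states) in responses
        )
        return handled / trials

    env = list(range(10))          # ten distinct disturbances
    small_team = set(env[:4])      # responses to only four of them
    full_team = set(env)           # requisite variety: one response for each

    print(fraction_regulated(env, small_team))  # ~0.4: most variety goes uncontrolled
    print(fraction_regulated(env, full_team))   # 1.0: variety matches variety

The only ways the small team reaches 1.0 are to add responses (hire) or to borrow them, which is the argument for joining a virtual organization.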

Theorists of knowledge management have applied Ashby’s law in various modes, including a thread of interest in what is called a “learning organization” and a mode of business communications management known as “systems thinking.” [There is a great amount of information about this available at the Society for Organizational Learning: http://www.solonline.org/.] The point they make is that the team you build to tackle a tough problem needs to have a broad enough portfolio of knowledge and skills to address all parts of the problem. Not only that, but the members need to communicate their skills and knowledge to one another so that each team member shares in this collective intelligence. Andrew Van de Ven put it this way: “Requisite variety is more nearly achieved by making environmental scanning a responsibility of all unit members, and by recruiting personnel within the innovation unit who understand and have access to each of the key environmental stakeholder groups or issues that affect the innovation’s development” (Van de Ven, 600).

Virtual organizations include online communities, research collaboratories, open-source software programmer collectives, and other groups in a great variety of arenas and professions. What they offer is an open network of common interest and complementary talents. When your business or agency is looking to innovate in a world where data are more plentiful than insights (Abbott, 114), then it makes great sense, in terms of time and effort, to join a VO and gain enough requisite variety to conquer complexity and kickstart some innovation.

References

Abbott, Mark R. (2009) “A New Path for Science?” in The Fourth Paradigm: Data-Intensive Scientific Discovery. Hey, Tony, Stewart Tansley, and Kristin Tolle, eds.  Pp. 111-116. Redmond: Microsoft Research.

Kofman, Fred. [1] http://www.youtube.com/watch?v=4ARZBxIzsKk&feature=relmfu

Van de Ven, Andrew H. (1986) “Central Problems in the Management of Innovation.” Management Science, Vol. 32, No. 5 (May), pp. 590-607.

Post-Publication Peer Review: It’s in your future

I recently attended the PLoS Forum in San Francisco. For a couple of years, I’ve been encouraging PLoS to find a way to experiment with post-publication peer review. However, the pathway from the current academic peer review system to something potentially better (faster, fairer, more precise) must first overcome the enormous weight of influence that the current publication system holds over academic careers. I was encouraged that half of the day was spent trying to figure out how to move ahead with post-publication peer review.

Here is an excerpt from a Knol I wrote about scaffolding a new system based upon the reputation system of the old system:

“The real sticking points preventing scientific communication from taking full advantage of digital distribution are the following: 1) top ranked journals have cornered the reputation economy in terms of impact on tenure (they are a virtual whuffieopoly:  for the term “whuffie” see Hunt and Doctorow). 2) the very same journals remain locked into the 20th century (with resemblances to prior centuries) print-based publishing model, built on blind peer review and informed by the scarcity of space available in any printed journal. The task then, is to release them from their print-based constraints, while rewarding and supporting them to continue to be a high-end filter for quality science; and then transitioning their whuffie-abilities to a form more suited to the rapid digital dissemination of scientific outcomes. The academy needs great filters to help guide readers to the best science among hundreds of thousands of new papers every year. Universities need fair and broad feedback from the academic community to decide which faculty deserve promotion. The research community needs to accelerate publication speed and minimize editorial overhead. And the public needs markers that help them determine good science from the rest. Open-access content is the first step. The next step might need some badges.”

You can read the whole piece at Post-Publication Peer Review in the Digital Age