Tag Archives: NSF

Thoughts on Governance for your New Big Data VO

A well-cared-for volunteer community is like a great South Berkeley garden!


NOTE: too long for a blog (sorry), but I did want this to be available.
The West Big Data Innovation Hub held its first all-hands meeting in Berkeley last Thursday. What follows is a short talk I gave to the newly formed Governance Working Group.
The Hub seeks to become a community-led, volunteer-run organization that can bring together the academy and industry… and that other academy (the one with the statues), and regional and metro government organizations into a forum where new knowledge will be born to build the practices and the technologies for big data use in the western US.
To become this organization it will need to spin up governance. An initial task for the governance working group is to draft a preliminary governance document that outlines the shape of the Hub’s decision space, and the desired processes to enable those Hub activities needed to realize the mission of the organization.
Virtual organization governance is hard, and how to succeed at it is not well understood. We do know that the opportunities for failure are numerous. Funders will need to exercise patience and forbearance during the spin-up process.
I don’t know of any NSF-funded community-led, volunteer-run organization that can be a model for this governance. I would be very happy to hear about one.  It would be great if this Hub becomes that successful organization.    
I have three suggestions (with the usual caveats) to help frame the work of this working group.

NUMBER ONE: Your community does not yet exist.  

There is a quote attributed to either Abraham Lincoln or Darrell Royal (depending on whether you’re from Texas)… “If you have five minutes to cut down a tree, spend the first three sharpening your axe.”
Community-building activities are the Hub sharpening its axe.
Right now, when someone talks about the “big data community,” that’s just another name for a bunch of people whose jobs or research involve big data. That’s a cohort, not a community. If you want community—and you do want community—you have to build it first. That’s why you need to spend resources getting more people into the process and giving them every reason to stay involved.
The first real job of the Hub is to build your member community.
Part of building your community is to give your members a stage for their vision of the future.  Challenge your members to envision the destination that marks the optimal big-data future for a wide range of stakeholders, then build a model for this destination inside the Hub.  
Melding vision with action and purpose to forge something new and useful: that’s a great goal. Think of the Hub as the Trader Joe’s of big data, the place people know to go to get what they need.
NOTE: Why do you actually need community? There’s a whole other talk there….  Community is the platform for supporting trustful teamwork… without it, you will not get things done. Without it emails will not get answered, telecons will not be attended, ideas and problems will not surface in conversations… and meetings will be tedious.

NUMBER TWO: Engagement is central. 

ANOTHER QUOTE: Terry Pratchett, the philosopher poet, once wrote: “Give a man a fire and he’s warm for a day. Ah, but set a man on fire and he’s warm for the rest of his life…” 
Your governance effort should be centered on maximizing member engagement by giving the greatest number of members opportunities to do what they believe is most important for them to do RIGHT NOW. Invite new members to join and then ask them what the Hub can do for them. This is not a Kennedy moment.
Your members want pizza… it’s your job to build them a kitchen and let them cook.
Your steering committee (or whatever this is called) needs to be 90% listening post and 10% command center. It needs to listen and respond to members who want to use the Hub to do what they think the Hub should do. It needs to coordinate activities and look for gaps. It needs to remind members of the vision, the values, and the mission goals of the organization, and then remind them that this vision, these values, and the mission belong to them and are open to all members to reconfigure and improve.
The Hub needs to be a learning organization with multiple coordinated communication channels… Members need to know their ideas have currency in the organization.  
Do not be afraid of your members, but do be wary of members that seem to want to lead without first attracting any followers. Spread leadership around. Look for leadership on the edges and grow it.
Engagement will lead to expertise.   Over time, the members will learn to become better members.  The organization should improve over time. It will not start out amazing.  It can become amazing if you let it.
Each member needs to get more than they give to the organization. If they don’t, then you’re probably doing it wrong. This will be difficult at first, so the shared vision will need to carry people through that initial phase.
Creating a bunch of committees and a list of tasks to be finished on a deadline is NOT the way to engage members. If you think that’s engagement, you are probably doing it wrong. YES, some things need to be done soon to get the ball rolling. But remember that volunteers have other, full-time jobs.

NUMBER THREE:  There can be a great ROI for the NSF

The Hub’s success will provide the NSF with a return on its investment that is likely to be quite different from what it expects today, but also hugely significant and valuable.
Final quote here: Brandon Sanderson, the novelist, wrote: “Expectations are like fine pottery. The harder you hold them, the more likely they are to break.”
The Hub is NOT an NSF-funded facility, or a facsimile of a facility…
Unlike a facility, the NSF will not need to fund a large building somewhere and maintain state-of-the-art equipment. The NSF already funds these facilities for its big data effort.  The Hub is not funded to be a facility and will not act like a facility. 
The Hub is also not just another funded project…
Unlike a fully funded project, the NSF will not be paying every member to accomplish work in a managed effort with timelines and deliverables. 
Volunteers are not employees. They cannot and should not be tasked to do employee-style work. They have other jobs.  The backbone coordination projects for the hubs and spokes are paid to enable their volunteer members to do the work of volunteers. The Hub is not a giant funded project. It will not work like a giant funded project. It cannot be managed. It must be governed.  This means it needs to govern itself. 
Self-governance is the biggest risk of failure for the Hub. That’s why the work you do in this working group is crucial.
Self-governance is also the only pathway to success. So, there is a possible downside and potentially a really big upside…
Remember that process is always more important than product.  You may need to remind your NSF program managers of this from time to time.
The Hub needs to take full advantage of the opportunities and structural capacities it inherits as a community-led, volunteer-run organization. Its goal is to be the best darn community-led, volunteer-run organization it can be. Not a facility, and not a big, clumsy funded project.
Here are Seven Things the NSF can get only by NOT funding them directly, but by supporting the Hub as a community-led virtual organization of big-data scientists and technologists:
1. The NSF gets to query and mine a durable, expandable level of collective intelligence and a requisite variety of knowledge within the Hub;
2. The NSF can depend on an increased level of adoption of standards and shared practices that emerge from the Hub;
3. The NSF will gain the ability to use the Hub’s community network to create new teams capable of tackling important big-data issues (it can also expect better proposals led by Hub member teams);
4. The NSF can use the Hub’s community to evaluate high-level decisions before these are implemented (higher-quality feedback than simple RFIs);
5. Social media becomes even more social inside the Hub’s big-data community, with lateral linkages across the entire internet. This can amplify the NSF’s social media impact;
6. The Hub’s diverse stakeholders will be able to self-manage a broad array of goals and strategies tuned to a central vision and mission and with minimal NSF funding; and,
7. The NSF and the Hub will be able to identify emergent leadership for additional efforts.
Bottom Line: Sponsoring a community-led, volunteer-run big data Hub offers a great ROI for the NSF. There are whole arenas of valuable work to be done, but only if the NSF funds not this work directly but the backbone organization that supports a community of volunteers. This is the promise of a community-led organization.
And it all starts with self-governance…
To operationalize your community-building effort you will be spinning up the first iteration of governance. If you keep this first effort nimble, direct, as open to member participation as possible, and easy to modify, all will be good. Do not sweat the details at this point. Right now you are building just the backbone of the organization, just enough to enable and legitimate the first round of decisions.
Make sure that this document is not set in concrete… it will need to change several times in the next 3-5 years. In the beginning, create a simple process and a low threshold for changes (not a supermajority). TIP: keep all the governance documents on GitHub or something like that. Stay away from Google Docs! Shun Word and PDFs!


Hallmark moments in the future of this Hub if it is successful:
At some point 90% of the work being done through the Hub will be by people not in this room today. The point is to grow and get more diverse. With proper engagement new people will be finding productive activities in the hub. [with growth and new leadership from the community] 
At some point none of the people on the steering committee will be funded by the NSF for this project…  [this is a community-led org… yes?]… 
At a future AHM meeting more than 50% of the attendees will be attending for the first time.

How about a little democracy for your virtual organization




What follows is the text from an unfunded NSF proposal in 2008.

We had offered to assemble a knowledge resource for NSF-funded virtual organizations to create governance systems that were “open, trustworthy, generative, and courageous” (taking the lead here from Maddie Grant and Jamie Notter’s book, Humanize). The idea was to raise the level of knowledge and awareness of NSF program managers and funded PIs about the challenges and rewards of creating actual democratic governance when they build a community-led, volunteer-run virtual science organization. The operative word above is “unfunded.” From recent events it looks like the NSF could still use a broader view of the role of governance in its funded networks.

New Knowledge is Essential to guide Governance Plan Decisions for future CI Projects

Building the cyber-social-structure that supports cyberinfrastructure projects is just as important as building the information technologies. While critical-path project management might be sufficient to get the code done, it takes community engagement to get that code used. Every project that uses “community-based” research or promises to “serve a user community” needs to consider the issue of project governance outside of critical-path task management. However, a search for the term “governance plan” on the NSF website (January 5, 2008) shows that only five program RFPs (ITEST, PFC, MSP, CREST, and RDE) have ever asked for a plan for project governance. Even in these cases, governance was associated with task management rather than community engagement and community building. Other large-scale NSF CI projects, such as the DLESE digital library effort, which were/are centered on community-based content development, have had no requirement (nor guidance) on matters of community-based governance. The simple fact is this: the knowledge that would enable the NSF to give guidance to CI/VO projects about community governance planning and execution does not today exist.

Today, there is no place where NSF Program Managers or project PIs can go to gather the knowledge required to make an informed decision on a community-based/led governance plan for a proposed project. The literature on VO project/task management and communication has grown considerably of late (see Jarvenpaa and Leidner (1999); Monge and DeSanctis (1998)). However, the role of community participation in decision making for VOs is mostly undertheorized and poorly understood. The Virtual Democracy Project will produce usable knowledge that the NSF and project PIs can use to make concrete decisions on the issue of community-based governance.

Dialogic Democracy in the Virtual Public Sphere

The Virtual Democracy Project centers its work on a novel extension of the theory and practice of “dialogic democracy,” as this occurs within virtual organizations (VO). This term was coined by Anthony Giddens, who wrote in 1994, “…it is the aspect of being open to deliberation, rather than where it occurs, which is most important. This is why I speak of democratization as the (actual and potential) extension of dialogic democracy—a situation where there is developed autonomy of communication, and where such communication forms a dialogue by means of which policies and activities are shaped.” The notion owes much to Habermas’s (1992) notion of the role of conversation in the public sphere (see also: Calhoun 1992).

Large-scale VOs (such as digital libraries and national collaboratories) are created outside of single institutions. They serve as bridges between communities and organizations. In order to be truly interdisciplinary (and/or inter-organizational, inter-agency, or international), they require an external position to their constituent groups. They become, in fact, “virtual public spheres” where discussions concerning the needs and goals of the VO must avoid collapsing into competing voices from within the various communities to which the members also belong (academic disciplines, universities, etc.). A VO of any scale engages this virtual public sphere whenever it proposes to use “community-based (or -led)” research or outreach.

Just as the Public Sphere opens up the space for dialogic democracy in the modern nation-state (Calhoun 1992), the virtual public sphere inside the VO opens up the dialogic space necessary for authentic community-based governance. How is this virtual public sphere created and sustained? How are practices within it enabled to shape policies and activities of the VO? How does this governance effort interact with the project management effort? These are questions that many VOs must face or ignore at their own risk.

Which form of governance is right for your CI effort?

A funded project’s policies and activities can be shaped and decisions made in many ways. When these are made through open communication among peers, a form of democracy is achievable. Conversations, commentaries, discussions, multiple opportunities for feedback into the decision process: practices such as these mark the emergence of a dialogic democracy within a VO. Fortunately for researchers, dialogic democracy is not a subtle, hidden practice. The implementation of community-led governance is a visible, recordable, completely reflexive event. This means that its absence is also markedly noticeable. Ask any member of a VO who makes the decisions for the project, and the answer will reveal the presence or absence, the strength or weakness, of dialogic democracy in that organization. Examples of strong and weak community governance in VOs are available for study.

Take, for example, two large, currently active VOs that have chosen completely different governance structures. The Federation of Earth Science Information Partners (ESIPFED) uses dialogic democracy as the basis of all of its workings. Its members spent three years creating the organization’s Constitution and Bylaws (ESIP Federation 2000). By contrast, the National Science Digital Library (NSDL), early in its founding period, chose not to embrace community-led governance, even though this was prominent in early discussions (NSDL 2001). How important is/was dialogic democracy to the work and the sustainability of VOs such as the ESIPFED and the NSDL? How much will this affect future CI-funded VOs? How does the NSF manage funding when this also needs to be managed through community-based governance structures? As a part of the Virtual Democracy Project, PIs (past and present) from the ESIPFED and the NSDL will be surveyed about the role of dialogic democracy in these organizations. The Virtual Democracy Project will be the first NSF-funded effort to evaluate the practices and the return on investment of dialogic democracy (or its absence) in existing VOs.

Software/services with built-in democracy features

While many social networking and peer feedback software services appear to offer functionalities that can be used as-is within community-led governance efforts, democracy places its own requirements on the channels and administration of communication resources. In addition to the need for active communication among peers, there is a new need for appropriate monitoring of these channels to ensure that their use is transparent, sufficient to support minority voices, and able to sustain a record for review and possible redress.

The Virtual Democracy Project (VDP) provides paradigm-shifting research for both social-science and computer-science research approaches. The application of the public-sphere-based dialogic democracy model to “virtual public spheres” within VOs represents a novel research perspective on CI governance issues. The software services that constitute the vehicles for peer interaction need also to be democratically available to members of VOs, just as the files and folders, the rooms and chambers (the venues that inform the councils of government) need to be available to citizens.

Computer scientists on the VDP team will be evaluating available social networking and peer-evaluation services to devise ways for software/services to be open to community inspection. Other software issues include maintaining the privacy of online voting records while allowing for independent validation of results, and maintaining logs of more public member contributions for proper attribution and rewards.
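The twin requirements named here, keeping votes private while allowing independent validation of results, can be illustrated with a simple hash-commitment sketch. This is not part of the VDP design; it is just one well-known pattern, shown in Python with standard-library modules, and the function names are my own for illustration:

```python
import hashlib
import secrets

def commit_vote(vote):
    """Return a (salt, commitment) pair for a vote.

    The commitment can be published immediately; the vote itself stays
    private until the voter (or an auditor) reveals vote + salt.
    """
    salt = secrets.token_hex(16)
    commitment = hashlib.sha256(f"{salt}:{vote}".encode()).hexdigest()
    return salt, commitment

def verify_vote(vote, salt, commitment):
    """Independent validation: anyone holding vote + salt can recompute
    the hash and check it against the published record."""
    return hashlib.sha256(f"{salt}:{vote}".encode()).hexdigest() == commitment

# Example: a member commits to a ballot choice
salt, commitment = commit_vote("yes")
assert verify_vote("yes", salt, commitment)      # honest reveal checks out
assert not verify_vote("no", salt, commitment)   # a substituted vote fails
```

The random salt is what preserves privacy: without it, anyone could hash the small set of possible ballot choices and match them against the public commitments.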

Geography offers a particularly useful domain for VOs that include unstructured crowd-sourcing (such as Yahoo Maps, Wikimapia, and geo-tagging on Flickr). Thousands of strangers every day add nodes and layers to Internet maps that are openly shared. The role of community-building and community-governance practices that would promote reliable management of these voluntary contributions for scientific research offers a window into the very front end of Web 2.0 development.

New IT services are generally built according to the emerging needs of users. Through the proposed research, new user needs for IT in support of dialogic communication will certainly emerge. Because of the dual requirements of privacy and attribution, one can predict that these software services will require novel thinking about database structures and security. The need for non-technical persons to have confidence that information assembled by the VO to inform its decisions is accurate and reflects the contributions of its members requires the construction of new diagnostic tools that can monitor software services to look for evidence of tampering or rigging. A whole new set of questions and concerns will inform the next generation of IT-based social networking services, which will need to meet new standards for use within VO governance structures.
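As one minimal sketch of what such a tamper-evidence diagnostic could look like (my own illustration, not a tool proposed by the VDP), the snippet below chains each logged member contribution to the hash of the previous entry, so that editing any earlier record invalidates every later hash:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def append_entry(log, payload):
    """Append a member contribution, chained to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
    log.append({"prev": prev,
                "payload": payload,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify_chain(log):
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = json.dumps({"prev": prev, "payload": entry["payload"]},
                          sort_keys=True)
        if entry["prev"] != prev:
            return False
        if entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"member": "A", "note": "proposal draft"})
append_entry(log, {"member": "B", "note": "amendment"})
assert verify_chain(log)

log[0]["payload"]["note"] = "altered draft"  # simulated tampering
assert not verify_chain(log)
```

Canonical JSON serialization (`sort_keys=True`) matters here: without a stable byte representation of each entry, honest logs could fail verification for formatting reasons alone.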

Meeting concerns for the future of an inclusive cyberinfrastructure

This research effort will have immediate benefits for the remainder of the CI effort, as its outcomes will lead to practical guidance about which forms of governance might best be applied to any proposed CI program/project. Where the proposed effort embraces community participation, the activity of governance for community-building can be better budgeted for time, labor, and timing. Democracy takes time. A three-year project that starts community-building in year three will probably fail at this task. The larger question of how much a government agency should spend on community-building efforts for any project also needs to be addressed. Planners and program directors will be able to turn to the cybersocialstructure.org site for decision support.

Where issues of community participation and dialogic democracy really come to the fore is in practices designed to improve and reward the efforts of underrepresented communities and individuals within VO decision making. Assuming the goal is actual inclusion of a diverse range of voices and interests in the decision process, authentic (and authenticatable) democratic processes are an obvious need and solution. The Virtual Democracy Project will explore the use of dialogic democratic practices as a feature of building a more inclusive cyberinfrastructure.

A final note, however, is that democratic practices also can inform and potentially improve communication by building community (and so, trust and identification with project goals) within the core group of PIs and Co-PIs (Wiesenfeld, et al 1999). There are potential benefits to the core task management effort that need to be considered in any cost-benefit decision.

Photo Credit: Backbone Campaign (CC Generic 2.0)

EarthCube is poised to start its mission to transform the geosciences

The red areas are sandstone.


Here is the current vision statement of EarthCube

EarthCube enables transformative geoscience by fostering a community committed to providing unprecedented discovery, access, and analysis of geoscience data.

The primary goal of membership in EarthCube, and indeed of the entire culture of the EarthCube organization, is to support this vision. The EarthCube vision describes a future where geoscience data is openly shared, and where a new science, one based on an abundance of sharable data, assembles new knowledge about our planet. Certainly shared open source software and open access publishing are anticipated in this vision. The vision accepts that it will take a committed community of domain and data scientists to realize this goal.

What can we predict about the culture of a community committed to transformational geosciences? How is this different from the culture of a community pursuing geoscience currently? We need to start building out our imagination of what transformative geoscience will look like and do.  One thing we might agree on is that this will be a much more open and collaborative effort.

Unprecedented data discovery, access, and analysis in the geosciences coupled with open science best practices will drive knowledge production to a new plateau. Many of today’s grand challenge questions about climate change, water cycles, human population interaction with ecosystems, and other arenas will no longer be refractory to solution. For now, we can call the engine of this process “Open Geosciences,” or OG for short. What will OG pioneers be doing, and how can EarthCube foster these activities?

  • Pioneering OG scientists will collect new data using shared methodologies, workflows, and data formats.
  • These OG scientists will describe their data effectively (through shared metadata) and contribute this to a shared repository.
  • OG scientists will analyze their data with software tools that collect and maintain a record of the data provenance as well as metrics on the software platform.
  • OG scientists will report out their findings in open access publications, with links to the data and software.
  • OG scientists will peer review and add value to the work of others in open review systems.
  • OG domain and data scientists will reuse open data to synthesize new knowledge, and to build and calibrate models.
  • OG software engineers will collaborate on open software to improve capabilities and sustainability.
  • OG scientists will share more than data. They will share ideas, and null results, questions and problems, building on the network effect of organizations such as EarthCube to grow collective intelligence.
  • OG science funding agencies will work with OG communities to streamline research priority decisions and access to funding.

At this stage, EarthCube is in its most institutionally reflexive moment and is most responsive to new ideas. Like a Silicon Valley start-up flush with cash and enthusiasm, EarthCube is poised to build its future up from the ground. EarthCube can succeed in its vision without attempting to directly influence the embedded cultures of government organizations, tier-one universities, professional societies, and commercial publishers. EarthCube will succeed by building its own intentional culture, starting with its membership model and focused on its vision. EarthCube will only transform geoscience by proving that its members can do better science faster and cheaper through their commitment to the modes of scientific collaboration now made possible through EarthCube. EarthCube will transform science by transforming the practices and the attitudes of its own members.

NASA image by Robert Simmon with ASTER data. Caption by Holli Riebeek with information and review provided by David Mayer, Robert Simmon, and Michael Abrams.

Hitting the target makes all the difference for the software life cycle

Sky diver jumping from plane

At a recent NSF-funded workshop that looked at how a new institute might help scientists become better stewards of the software they create for their research, a day was devoted to discussing the entire software life cycle and the differences between commercial software, open-source community-led software, and academic science software. A long list of positives and negatives accumulated to describe the triumphs and the pitfalls of each of these arenas for software development. Most of the triumphs were in the commercial software column, and the great majority of pitfalls were common to science software development.

That evening, upon reflection, it occurred to me that commercial software developers are simply very good at determining a target (a feature and/or customer) and then hitting that target. Academic software developers, admittedly working on shoestring budgets, seemed only to cobble together whatever feature their next experiment might require, with the result being software that was almost secreted over time (I almost said excreted…) instead of crafted for the long haul.

It struck me—reflecting back on my single skydiving adventure, in the days when you still took your first jump solo on a static line—that my focus at the time had narrowed down to the single fear of getting out of the plane. I did not want to freeze in the door. Consequently, I did not pay as close attention as I might have to what happens next. As a result I ended up landing in a field away from the airport (upside: I did not land on the Interstate). I hit the ground, no problem, and without breaking anything, but I missed the target completely.

Again, commercial software developers are firmly focused on their targets, and they make software that helps others find the same target too. To do this they know how to jump and when to pivot in order to land right on that X. When Instagram created its software it focused on the simplicity of sharing photos.

Open-source, community-led software tends to lose that target focus, in part because the developer community usually has several simultaneous targets in mind. What they are good at is designing the parachute and the gadgets that help to figure altitude and wind. They make jumping safer and more fun, and their goal is to enable more people to do the same.

Getting back to science software developers: these are often individuals or small teams working as part of a larger project. They wrangle the datasets and finagle some visualizations. They add a button or a drop-down list and call it a GUI. They tell their team how to use it and what not to do so it won’t crash. Then they go ahead and do their experiments and write it up. In software life cycle terms, all they know how to do is jump out of the plane. Forget the target, never mind the parachute… just jump.

The goal of the NSF workshop was to help design an institute that would support better software development practices across the environmental and earth sciences. To do that, science software developers need to focus all the way to common targets of resourceful, reliable, and reusable software. Do you have some ideas? Feel free to join the ongoing conversation at the ISEES Google+ Community.

Photo Credit: CC licensed on Flickr by US Air Force

The next generation of environmental software needs a vision and some help


At a three day workshop, a group of scientists explored a vision of the “grand challenges” that eco- and earth science face in the coming decade. Each of these challenges, if answered, would provide invaluable new knowledge to resource planners and managers across the planet. And every challenge contained a workflow that called upon software capabilities, many of which do not currently exist: capabilities to handle remote and in situ observations and environmental model output in order to incorporate multiple data layers and models at several resolutions, from a prairie to the planet. Water cycles, pollution streams, carbon sequestration, climate modeling, soil dynamics, and food systems—achieving the next plateau of understanding these processes will require a massive investment in computing and software. The reason for this workshop was to help inform a new institute that can provide key services to make this investment pay off.

Much of this software will be built by research teams that propose projects to solve these grand challenges. These teams will be multi-institutional and are likely to be more focused on the science side of their project, and less on the value their software might acquire by being built on standards, using best-practice coding, and ready for reuse by others. The history of federally-funded science software is crowded with abandoned ad hoc project-based software services and products. I’ve helped to author some of these. One of the federally-funded products (a science education software package) I helped produce had its home-page URL baked into its user interface. After the project funding ended, the PI did not renew the domain name, and this was picked up by a Ukrainian hacker, who used it as the front end of a pornography portal. So the software UI (distributed in hundreds of DVDs) now pointed students to a porn site. A far more prevalent issue is that of software built with 3rd-party services (remember HyperCard?) that have subsequently changed or died, breaking the software after the funding is gone and the programmer has moved on. The point here is that there are dozens of lessons already learned by science software developers, and these need to be assembled and shared with the teams that are building new software.

There is still more value to be added here. A software institute can offer a range of services that will make funded software more reliable, more reusable, and more valuable to science. Much of the federally-funded software development will be done by university staff scientists and graduate students. Most of the latter are in the beginning stages of learning how to program. A crash course on agile programming and Git, or other basic programming skills, could help them get up to speed over a summer. An up-to-date clearinghouse of data and file format issues and recommendations, a help-desk for common CMS and data access problems, and particularly, personal (Skyped) help when a grad student hits a wall: these services can save a funded project from floundering. Taken together, these services can save the project’s software from an early grave. Research into extending the life cycle of science software is needed to help science maintain the longer-term provenance of its methods and findings.


This workshop was organized by the team that is looking to build the Institute for Sustainable Earth and Environmental Software. Here is their website: http://isees.nceas.ucsb.edu