Thoughts on Governance for your New Big Data VO

A well cared for volunteer community is like a great South Berkeley garden!
NOTE: too long for a blog (sorry), but I did want this to be available.
The West Big Data Innovation Hub held its first all-hands meeting in Berkeley last Thursday. What follows is a short talk I gave to the newly formed Governance Working Group.
The Hub seeks to become a community-led, volunteer-run organization that can bring together the academy and industry… and that other academy (the one with the statues), and regional and metro government organizations into a forum where new knowledge will be born to build the practices and the technologies for big data use in the western US.
To become this organization it will need to spin up governance. An initial task for the governance working group is to draft a preliminary governance document that outlines the shape of the Hub’s decision space, and the desired processes to enable those HUB activities needed to realize the mission of the organization.
Virtual organization governance is hard, and how to succeed at it is not well understood. We do know that the opportunities for failure are numerous. Funders will need to exercise patience and forbearance during the spin-up process.
I don’t know of any NSF-funded community-led, volunteer-run organization that can be a model for this governance. I would be very happy to hear about one.  It would be great if this Hub becomes that successful organization.    
I have three suggestions (with the usual caveats) to help frame the work of this working group.

NUMBER ONE: Your community does not yet exist.  

There is a quote attributed to either Abraham Lincoln or Darrell Royal (depending on whether you’re from Texas or not)… “If you have five minutes to cut down a tree, spend the first three sharpening your axe.”
Community-building activities are the Hub sharpening its axe.
Right now, when someone talks about the “big data community” that’s just another word for a bunch of people whose jobs or research involve big data. That’s a cohort, not a community. If you want community—and you do want community—you have to build it first.  That’s why you need to spend resources getting more people into the process and give them every reason to stay involved.
The first real job of the hub is to build your member community. 
Part of building your community is to give your members a stage for their vision of the future.  Challenge your members to envision the destination that marks the optimal big-data future for a wide range of stakeholders, then build a model for this destination inside the Hub.  
To meld vision with action and purpose and forge something that is new and useful, that’s a great goal: think of the Hub as the Trader Joe’s of big data. The place people know to go to… in order to get what they need.
NOTE: Why do you actually need community? There’s a whole other talk there….  Community is the platform for supporting trustful teamwork… without it, you will not get things done. Without it emails will not get answered, telecons will not be attended, ideas and problems will not surface in conversations… and meetings will be tedious.

NUMBER TWO: Engagement is central. 

ANOTHER QUOTE: Terry Pratchett, the philosopher poet, once wrote: “Give a man a fire and he’s warm for a day. Ah, but set a man on fire and he’s warm for the rest of his life…” 
Your governance effort should be centered on maximizing member engagement by giving the greatest number of members opportunities to do what they believe is most important for them to do RIGHT NOW. Invite new members to join and then ask them what the Hub can do for them. This is not a Kennedy moment.
Your members want pizza… it’s your job to build them a kitchen and let them cook.
Your steering committee (or whatever this is called) needs to be 90% listening post and 10% command center. It needs to listen and respond to members who want to use the Hub to do what they think the hub should do. It needs to coordinate activities and look for gaps. It needs to remind members of the vision, the values, and the mission goals of the organization, and then remind them that this vision, these values, and the mission belong to them and are open to all members to reconfigure and improve.
The Hub needs to be a learning organization with multiple coordinated communication channels… Members need to know their ideas have currency in the organization.  
Do not be afraid of your members, but do be wary of members who seem to want to lead without first attracting any followers. Spread leadership around. Look for leadership on the edges and grow it.
Engagement will lead to expertise.   Over time, the members will learn to become better members.  The organization should improve over time. It will not start out amazing.  It can become amazing if you let it.
Each member needs to get more than they give to the organization. If they don’t, then you’re probably doing it wrong. This will be difficult at first, so the shared vision will need to carry people through that initial phase.
Creating a bunch of committees and a list of tasks that need to be finished on a deadline is NOT the way to engage members. If you think that’s engagement, you are probably doing it wrong. YES, some things need to be done soon to get the ball rolling. But remember that volunteers have other, full-time jobs.

NUMBER THREE:  There can be a great ROI for the NSF

The Hub’s success will provide the NSF with a return on its investment that is likely to be quite different from what it expects today, but also hugely significant and valuable.
Final quote here: Brandon Sanderson, the novelist, wrote: “Expectations are like fine pottery. The harder you hold them, the more likely they are to break.”
The hub is NOT an NSF-funded facility, or a facsimile of a facility…
Unlike a facility, the NSF will not need to fund a large building somewhere and maintain state-of-the-art equipment. The NSF already funds these facilities for its big data effort.  The Hub is not funded to be a facility and will not act like a facility. 
The hub is also not just another funded project… 
Unlike a fully funded project, the NSF will not be paying every member to accomplish work in a managed effort with timelines and deliverables. 
Volunteers are not employees. They cannot and should not be tasked to do employee-style work. They have other jobs.  The backbone coordination projects for the hubs and spokes are paid to enable their volunteer members to do the work of volunteers. The Hub is not a giant funded project. It will not work like a giant funded project. It cannot be managed. It must be governed.  This means it needs to govern itself. 
Self-governance is the biggest risk of failure for the Hub. That’s why the work you do in this working group is crucial.
Self-governance is also the only pathway to success. So, there is a possible downside and potentially a really big upside…
Remember that process is always more important than product.  You may need to remind your NSF program managers of this from time to time.
The Hub needs to take full advantage of the opportunities and structural capacities it inherits as a community-led, volunteer-run organization. Its goal is to be the best darn community-led, volunteer-run organization it can be. Not a facility and not a big, clumsy funded project.
Here are Seven Things the NSF can get only by NOT funding them directly, but through supporting the HUB as a community-led virtual organization of big-data scientists/technologists:
1. The NSF gets to query and mine a durable, expandable level of collective intelligence and a requisite variety of knowledge within the HUB;
2. The NSF can depend on an increased level of adoption of standards and shared practices that emerge from the HUB;
3. The NSF will gain an ability to use the HUB’s community network to create new teams capable of tackling important big-data issues (also it can expect better proposals led by hub member teams);
4. The NSF can use the HUB’s community to evaluate high-level decisions before these are implemented (=higher quality feedback than simple RFIs);
5. Social media becomes even more social inside the HUB big-data community, with lateral linkages across the entire internet. This can amplify the NSF’s social media impact;
6. The Hub’s diverse stakeholders will be able to self-manage a broad array of goals and strategies tuned to a central vision and mission and with minimal NSF funding; and,
7. The NSF and the Hub will be able to identify emergent leadership for additional efforts.
Bottom Line: Sponsoring a community-led, volunteer-run big data Hub offers a great ROI for the NSF. There are whole arenas of valuable work to be done, but only if the sponsor funds the backbone organization that supports a community of volunteers rather than funding this work directly. This is the promise of a community-led organization.
And it all starts with self-governance…
To operationalize your community-building effort you will be spinning up the first iteration of governance.  If you can keep this first effort nimble, direct, as open to membership participation as you can, and easy to modify, all will be good.  Do not sweat the details at this point.  Right now you are building just the backbone for the organization. Just enough to enable and legitimate the first round of decisions.
Make sure that this document is not set in concrete… it will need to change several times in the next 3-5 years. In the beginning, create a simple process and a low threshold for changes (not a super majority). TIP: Keep all the governance documents on GitHub or something like that. Stay away from Google Docs! Shun Word and PDFs!   

Postscript:

Hallmark moments in the future of this Hub if it is successful:
At some point 90% of the work being done through the Hub will be by people not in this room today. The point is to grow and get more diverse. With proper engagement new people will be finding productive activities in the hub. [with growth and new leadership from the community] 
At some point none of the people on the steering committee will be funded by the NSF for this project…  [this is a community-led org… yes?]… 
At a future all-hands meeting (AHM), more than 50% of the attendees will be attending for the first time.

Yes, your agency/foundation can sponsor world-class virtual organizations to transform the sciences

For VRVOs, conviviality is essential

I’ve just returned from the Summer meeting of the Federation of Earth Science Information Partners (ESIP). After nearly two decades of “making data matter”, ESIP continues to show real value to its sponsors. Indeed, the next few years might be a period where ESIP grows well beyond its original scope (remotely sensed Earth data) to tackle data and software issues throughout the geosciences. A good deal of the buzz at this year’s Summer meeting was a new appreciation for the “ESIP way” of getting things done.
ESIP champions open science at all levels, and this openness extends to everything ESIP does internally. ESIP is building a strong culture for the pursuit of open science in the geosciences, and remains a model for other volunteer-run virtual organizations (VRVO) across science domains. There are lessons learned here that can be applied to any arena of science.
I hope other agency sponsors will take note of ESIP when they propose to fund a “community-led, volunteer-run virtual organization.” In this letter I’m going to point out some central dynamics that can maximize the ROI for sponsors and enable these organizations to do their work of transforming science. One note: I am using the term “sponsor” here to designate agencies or foundations that fund the backbone organization, the staff of the VRVO. The work of volunteers is, of course, not directly funded (apart from some logistic support).

The biggest picture
The real potential for any science VRVO to return value to its sponsors is realized as this organization develops into an active, vibrant community-led, volunteer-run virtual science/technology organization. To capture this value, the VRVO needs to focus on those activities that leverage the advantages peculiar to this type of organization, with special attention to activities that could not be realized through direct funding as, say, a funded research center. This is a crucial point. The real advantages that the VRVO offers to science and to its sponsors are based on the fact that it is not a funded project or center, and that the difference between it and funded centers (or facilities, or projects) is intentional and generative to its ROI.
The simple truth is that any volunteer-run organization will never be able to perform exactly like a funded center, just as centers cannot perform like VRVOs. Community-led organizations make, at best, mediocre research centers. Volunteers cannot be pushed to return the same type of deliverables as those expected by a center.
The biggest return that any VRVO will provide to its sponsors will come from circumstances where incentives other than funding are in play. In fact, adding money is generally a counter-incentive in these circumstances. Among these returns are the following:

  • A durable, expandable level of collective intelligence that can be queried and mined;
  • An amplified positive level of adoption of standards and shared practices;
  • An ability to use the network to create new teams capable of tackling important issues (=better proposals); and,
  • The ability to manage a diverse set of goals and strategies within the group, each of them important to a single stakeholder community, but all of them tuned to a central vision and mission.

Elsewhere I have outlined a larger number of such returns on investment. I continue to receive comments listing additional ones. I’ll do an updated list before the end of the year.

None of these returns can be funded directly by the sponsors, apart from supporting the backbone organization that in turn supports the VRVO. And none of these could effectively be funded through a center or other entity. They are predictable outcomes only of precisely the type of organization that the VRVO will, hopefully, achieve.

The real test for a science VRVO is to develop fully within the scope and logic of its organizational type. The concomitant test for the sponsors is to understand that sponsoring a new and different type of organization will require some new expectations and some period (a few years) of growth and experimentation to allow the virtual organization to find its own strength and limits.

Experiments, such as micro-funding, are easier in a VRVO

Governance NOT Management
One important lesson learned at ESIP is this: governance must never be reduced to management. Funded projects and centers are managed. VRVOs are self-governed. Volunteer-run organizations are intrinsically unmanageable as a whole, even at their best. A VRVO can certainly house dozens or hundreds of small, self-directed teams where real work can be managed. ESIP “clusters” are a good example. These teams can produce valuable and timely deliverables for science and for the sponsors.
The style of governance is also very important here. Attempts to shift governance away from the membership and into top-down executive or oversight committees are always counterproductive. They give the membership a clear alibi to not care about the organization. Academics have enough alibis to not volunteer without adding this one. The members need to own the mission, vision, and strategies for the VO. Successful activities will emerge from initiatives that have been started independently and with some immediate urgency by small groups and which grow into major efforts with broadly valued deliverables. Bottom-up governance will outperform top-down management over the long term.

Science culture shifting
Probably the largest recognized impact that science VRVOs can make here—and perhaps only these can accomplish this—is to model a new, intentional cultural mode of producing science. This new cultural model will likely be centered on sharing (sharing is also one of the oldest cultural traits of science, only recently neglected). Sharing ideas. Sharing software, tools, techniques, data, metadata, workflows, algorithms, methodologies, null data, and then sharing results. Reuse needs to become a key metric of science knowledge (Cameron Neylon noted this at the original Beyond the PDF conference).
Transforming science means changing the culture of science. Science VRVOs must perform real culture work here. This is often a challenge for their sponsors, as these organizations are usually well situated at the center of the existing science culture. The key learning moment, and perhaps the highest ROI for sponsoring a science VRVO, comes when this organization teaches its sponsor to change.

Three critical governance conditions any agency/foundation sponsor needs to heed.

There are three necessary conditions for an agency-sponsored, community-led organization to be accepted as legitimate by a science community.

  1. The sponsoring agency needs to allow the community to build its own governance. Governance documents and practices are not subject to approval or even review by the sponsoring agency, apart from needing to follow standard fiduciary rules. The sponsoring agency can offer input the same way other individuals and groups do, but the community decides its own practices. The metrics for the governance are the growth of volunteer participation, the spread of community involvement, the perceived transparency and fairness of decisions, and the value the community places on the work being done.
  2. The sponsoring agency has no right to review or in any way interfere with elections. All organization members have the right to run for office and to be elected.
  3. The agency’s sponsorship is designed to help the organization grow into its potential as a volunteer-run, community-led scientific organization. The returns on investment for the agency are multiple, but do not include tasking the organization to perform specific duties, other than to improve over time.

Postscript: of course, the golden rule of any volunteer organization, new or old, is this: DFUTC.

EarthCube is poised to start its mission to transform the geosciences

The red areas are sandstone.

Here is the current vision statement of EarthCube:

EarthCube enables transformative geoscience by fostering a community committed to providing unprecedented discovery, access, and analysis of geoscience data.

The primary goal of membership in EarthCube, and indeed of the entire culture of the EarthCube organization, is to support this vision. The EarthCube vision describes a future where geoscience data is openly shared, and where a new science, one based on an abundance of sharable data, assembles new knowledge about our planet. Certainly shared open source software and open access publishing are anticipated in this vision. The vision accepts that it will take a committed community of domain and data scientists to realize this goal.

What can we predict about the culture of a community committed to transformational geosciences? How is this different from the culture of a community pursuing geoscience currently? We need to start building out our imagination of what transformative geoscience will look like and do.  One thing we might agree on is that this will be a much more open and collaborative effort.

Unprecedented data discovery, access, and analysis in the geosciences coupled with open science best practices will drive knowledge production to a new plateau. Many of today’s grand challenge questions about climate change, water cycles, human population interaction with ecosystems, and other arenas will no longer be refractory to solution. For now, we can call the engine for this process “Open Geosciences” or OG for short. What will OG pioneers be doing, and how can EarthCube foster these activities?

  • Pioneering OG scientists will collect new data using shared methodologies, workflows, and data formats.
  • These OG scientists will describe their data effectively (through shared metadata) and contribute this to a shared repository.
  • OG scientists will analyze their data with software tools that collect and maintain a record of the data provenance as well as metrics on the software platform.
  • OG scientists will report out their findings in open access publications, with links to the data and software.
  • OG scientists will peer review and add value to the work of others in open review systems.
  • OG domain and data scientists will reuse open data to synthesize new knowledge, and to build and calibrate models.
  • OG software engineers will collaborate on open software to improve capabilities and sustainability.
  • OG scientists will share more than data. They will share ideas, and null results, questions and problems, building on the network effect of organizations such as EarthCube to grow collective intelligence.
  • OG science funding agencies will work with OG communities to streamline research priority decisions and access to funding.

At this stage, EarthCube is in its most institutionally reflexive moment and is most responsive to new ideas. Like a Silicon Valley start-up flush with cash and enthusiasm, EarthCube is poised to build its future up from the ground. EarthCube can succeed in its vision without attempting to directly influence the embedded cultures of government organizations, tier one universities, professional societies, and commercial publishers. EarthCube will succeed by building its own intentional culture, starting with its membership model and focused on its vision. EarthCube will only transform geoscience by proving that its members can do better science faster and cheaper through their commitment to the modes of scientific collaboration now made possible through EarthCube. EarthCube will transform science by transforming the practices and the attitudes of its own members.

NASA image by Robert Simmon with ASTER data. Caption by Holli Riebeek with information and review provided by David Mayer, Robert Simmon, and Michael Abrams.