Tuesday, April 1, 2008

Only Panthers Share Archaeological Data

Sebastian Heath's comments on my last post, where I made some rather self-important statements about the problems of linking data and publication, are (as one would expect) thoughtful and stimulating.

Sebs is right to pick me up on my glib statement that "nobody wants to share their data," and he presents some good examples of admirable projects that are taking a lead in being transparent and open with their findings. As I'll suggest in a moment, however, I think that the projects he highlights are the exceptions rather than the norm, and there are some distinctive features that make them exceptional.

Firstly, however, I would like to point out that I am not alone in my pessimism about the readiness of scholars to share. Since my last post also referred to it vigorously, it may seem that Christine Borgman's synthetic Scholarship in the Digital Age (2007) is the only book I have read. It's not quite (I have just re-read the last Harry Potter!), but I am finding some good stuff in it, especially in Chapter 8 where the author explores four areas of concern that discourage information sharing among scientists, social scientists, and humanists alike:

(1) Reward: Scholars are rewarded for publication through promotion and tenure but rarely acknowledged for managing their information. This means that they feel little obligation to self-archive data for others to use.

(2) Effort: Preparing data so that it is useful to others is a huge effort. A logical fieldwork database structure is less than half the story; documenting the information so that others can use it is the real sweat.

(3) Priority: Although archaeology isn't medicine, being first with information is still the key to prestige and future funding.

(4) Ownership: Confusion about intellectual property rights and related concerns about the ownership and control of data make the prospect of sharing information risky.

These worries ring true to me for archaeology, and I find two of the points particularly persuasive. Having just spent a day watching my editorial colleagues almost literally tearing their hair out over a poorly organized pottery catalog, I think that it is hard to overestimate the amount of thought that needs to go into preparing raw information so that it is useful (point 2 above). Archaeological data is socially constructed, so it needs context to be properly interpreted. There are norms of presentation to obey, concerns about interoperability to address, and a multitude of structuring decisions that need to be made to differentiate a catalog from a data dump. The question of "when" the records of a project become data is a big one in our field.

Although intellectual property concerns (point 4 above) are often linked to "copyright" and the naughtiness of publishers, other concerns that archaeologists have to wrestle with over the ownership and control of data seem to me to be much more urgent. In multi-institutional projects, which director's institution owns the copyright in the work its staff member is doing (remember that it is usually employers rather than employees who own the rights to work done on "company time")? When working overseas, what rights does the host nation have over the information being extracted? (Mexico, Greece, and France have recently attempted to tax photographs of their "patrimony.") These are all scary considerations, made more frightening by being poorly understood.

But let's move back to the examples of good sharers Sebs brings up: Jack Davis with PRAP and MRAP, Ian Hodder at Çatalhöyük, Martha Joukowsky at Petra, and Brian Rose at Troy. Let's face it, Sebastian, these are legendary names, the "gray panthers" who have nothing to prove. Tenured, funded, at the top of their profession, they have little need for further reward, have access to some of the best minds around to help shape their data for other users, have less need than others to retain the right to priority, and are savvy in their abilities to navigate the intellectual property minefield. If you are a powerful feline, the obstacles to data sharing drop away.

When we talk about sharing, we need to look more at scholarly behavior at the starting-out level. Think graduate students and untenured faculty, the baby armadillos and raccoons rather than the panthers of the scholarly ecosystem. With their institutional repositories standing largely empty, libraries are currently puzzling over why these first "net gen" grad students and junior faculty aren't loading up university servers with data sets, conference presentations, articles in progress, course materials, and all the other good digital stuff a lively intellect produces. A glance at Borgman's book may suggest some simple truths about motivations, disincentives, and the law of the grasslands.

