Sunday, March 30, 2008

The "drill down" dilemma. Why can't we link archaeological publication to the underlying data?

In Scholarship in the Digital Age (2007), Christine Borgman writes that "scholarly publications tell the story of data, regardless of whether those data are biological specimens, ecological sensor data, answers to interview questions, potshards found in an archaeological site, or themes in fourteenth-century manuscripts. The story may be lost when the data and the publications are separated. Making better links between data and the documents that describe them is a common need across disciplines."

In archaeology, the dream of making seamless links between a publication and the data that inspired it can be traced back at least to 1983, when Barry Cunliffe proposed a multi-level publication strategy in his report on archaeological publication commissioned by the Council for British Archaeology. Drawing on his own experience at the Danebury Iron Age hillfort, Cunliffe seems to have imagined a visitor to the site, puzzled by a particular question, beginning their research with the site guide. Basic information in that popular guide would link to the academic monograph series, which would in turn reference specialist reports, by this stage perhaps available only on microfilm. The real enthusiast could then follow the trail right back to the physical site archives, stored in the basement of the local museum.

The immediate attraction of such an idea for archaeology is apparent. Even the imagery of "drilling down" and "digging for information" seems perfectly suited to a discipline that revolves around concepts of layering and stratigraphy. Why, then, are there so few examples of the "drill down" publication strategy a quarter of a century later?

Admittedly, a number of archaeological journals have "supplements," a perennial source of confusion to publishers and libraries alike ("Are supplements books or journals?" "Do we receive them as part of our subscription?"). At the American School of Classical Studies, the first Hesperia supplement was published in 1937, only five years after the journal itself was established. The Journal of Roman Archaeology and the Bulletin of the American Schools of Oriental Research are two other long-running periodicals with supplements. However, these supplements are almost always used to publish self-contained studies that are simply too long for the journal itself or, in format terms, more suitable for "book" publication, a distinction that is usually poorly defined.

Archaeology is also starting to follow other disciplines in experimenting with electronic-only "supplementary materials," where online hosting platforms permit. Recent book publications by the American School include CD-ROMs full of color images too expensive to print on paper (The Neolithic Pottery from Lerna (2007)) and of original texts (such as Ottoman tax records) written before the standardization of paper sizes (A Historical and Economic Geography of Ottoman Greece (2006)). As digital repository technology stabilizes, these supplementary materials will surely move online.

I would suggest that truly seamless linking between archaeological publication and its raw materials remains an unrealized goal for three reasons:

1. Nobody wants to share their data. It is now largely accepted that the culture of humanities scholarship is not amenable to sharing, as the American Council of Learned Societies report, Our Cultural Commonwealth (2006), emphasizes. Not only is there no mechanism in the system to reward data sharing (tenure committees don't care), but the concept of national, institutional, and personal control over the rights to study particular materials extracted during archaeological fieldwork is so embedded as to be largely unchallenged within the discipline itself. The costs of upsetting "established professor X" or not being able to work "in museum Y" are potentially too great to make a fuss!

2. The link between data and its publication is not a simple one. Unlike scientific experiments, archaeological excavation cannot be replicated, so the record a project produces is the only version of the evidence that survives, and that record is shaped by the preconceptions and expertise of the person who made it. Other factors play a role too: in his study of the methodology of archaeological survey, for example, Robert Schon has tested the degree to which alcohol consumed the night before affects the number and types of finds spotted the next day! Post-excavation analysis occupies even more uneasy ground between art and science. When characteristics such as culinary taste feed into the categorization techniques used by pottery specialists, the problems of replicability become apparent.

3. The necessary infrastructure for archiving archaeological data is lacking. Since archaeological excavation is "the unrepeatable experiment," the importance of finding durable solutions for archiving excavation records should be obvious, especially now that these records most often consist of ephemeral digital files. However, the general lack of resources in the field and the fragmented institutional infrastructure (there are 17 foreign archaeological institutes in Athens alone) mean that the creation of archives is still under discussion. Initiatives like the Archaeology Data Service in the UK are inspiring projects like Archaeoinformatics.org, but the motives behind each proposed program are met with suspicion by the elders of a discipline noted for its academic tribalism. Since control of data (see 1 above) and divergent standards (see 2 above) are central features of the discipline, a certain skepticism about the prospects for "catch-all" solutions is not unreasonable.

So where next for the dream of seamlessly linked publications and their data in archaeology? Some current trends are encouraging.

Firstly, semantic technologies for mapping different types of data are reducing our need to agree on common standards with colleagues with whom we are barely on speaking terms. A recent initiative by the American School, sponsored by the Mellon Foundation, aimed to develop a prototype digital repository mapping sample data from the long-running excavations at Corinth (since 1896) and the Athenian Agora (since 1931). Led by Thornton Staples of the Fedora Commons Foundation, the School's information architecture team showed how even a basic data dictionary allowed new connections to be discovered instantly, not only across two sites with divergent histories but also across a range of different data types and sources. Published records were not part of the repository, but the potential to include them in the web of digital objects was clear. The links may not be strictly hierarchical in the way Barry Cunliffe envisioned, yet a "good enough" linkage between unpublished and published records in Fedora-like software systems is clearly within reach technologically. Even if the bit is blunter and the results less certain, the ability to drill sideways and upwards as well as down is better than not having a drill at all!

Secondly, the elephants in the archaeological ecosystem (funders like the National Science Foundation) look set to make the deposit of digital data in a trusted repository a condition of funding for archaeological projects. What level of data must be deposited, and who will accept responsibility for its creation, are topics of much discussion. But the discussion itself clearly signals that squatting on data for too long is unacceptable behavior.

Thirdly, other disciplines are developing models that may prove powerful for archaeology. Borgman's opening quote acknowledges that linking publications to data is a common problem, and some initiatives from other disciplines show great potential. At the Protein Data Bank, for example, Phil Bourne and his colleagues have extended both the range and the granularity of their use of Digital Object Identifiers (DOIs) to link published with unpublished materials. Meanwhile, astronomers, faced with comparing much larger datasets than archaeologists have to deal with, collected from radio telescopes around the world, are developing clever "on-the-fly" mapping techniques to cope with divergent data standards.

The "drill down" may never be as easy as it sounds, but it is more attainable technologically, intellectually, and politically now than ever before. The prospect of linking archaeological publication with the data that inspired it is coming into sight.

1 comment:

Anonymous said...

Dear Charles,

This is a good, thoughtful article. The internet is the perfect vehicle for linking interpretations to the data that support them (or don't, in some cases).

The internet journal, Internet Archaeology, has been drilling down for several years now - and your blog has been incorporated into the editorial of the current issue. The URL is:

http://intarch.ac.uk/journal/issue23/index.html

all the best,


Claire