Sunday, March 30, 2008

The "drill down" dilemma. Why can't we link archaeological publication to the underlying data?

In Scholarship in the Digital Age (2007), Christine Borgman writes that "scholarly publications tell the story of data, regardless of whether those data are biological specimens, ecological sensor data, answers to interview questions, potshards found in archaeological sites, or themes in fourteenth-century manuscripts. The story may be lost when the data and the publications are separated. Making better links between data and the documents that describe them is a common need across disciplines."

In archaeology, the dream of making seamless links between the publication and the data that inspired it can be traced back to at least 1983, when Barry Cunliffe proposed a multi-level publication strategy in his report on archaeological publication commissioned by the Council for British Archaeology. Inspired by his own experiences at the Danebury Iron Age hillfort, Cunliffe seemed to imagine a visitor to the site, puzzled by a particular question, starting their research with the site guide. Basic information in that popular guide would link to the academic monograph series, which would in turn reference specialist reports, by this stage potentially available only on microfilm. The real enthusiast could then follow the trail right back to the physical site archives, stored in the basement of the local museum.

The immediate attraction of such an idea for archaeology is apparent. Even the imagery of "drilling down" and "digging for information" seems perfectly suited to a discipline that revolves around concepts of layering and stratigraphy. Why, then, are there so few examples of the "drill down" publication strategy even 25 years later?

Sure, a number of archaeological journals have "supplements," a perennial source of confusion to publishers and libraries alike ("Are supplements books or journals?" "Do we receive them as part of our subscription?"). At the American School of Classical Studies, the first Hesperia supplement was published in 1937, only five years after the journal itself was established. The Journal of Roman Archaeology and the Bulletin of the American Schools of Oriental Research are two other long-running periodicals with supplements. However, these supplements are almost always used to publish self-contained studies that are simply too long for the journal itself or that are, in format terms, more suitable for "book" publication (a distinction that is usually poorly defined).

Certainly, archaeology is also starting to follow other disciplines in experimenting with electronic-only "supplementary materials," where online hosting platforms permit. Recent book publications by the American School include CD-ROMs containing color images too expensive to print on paper (The Neolithic Pottery from Lerna (2007)) and reproductions of original texts (such as Ottoman tax records) written before the standardization of paper sizes (A Historical and Economic Geography of Ottoman Greece (2006)). As digital repository technology stabilizes, these supplementary materials will certainly appear online.

I would suggest that truly seamless linking between archaeological publication and its raw materials remains an unrealized goal for three reasons:

1. Nobody wants to share their data. It is now largely accepted that the culture of humanities scholarship is not amenable to sharing, as the American Council of Learned Societies report, Our Cultural Commonwealth (2006), emphasizes. Not only is there no mechanism in the system to reward data sharing (tenure committees don't care), but the concept of national, institutional, and personal control over the rights to study particular materials extracted during archaeological fieldwork is so embedded as to be largely unchallenged within the discipline itself. The costs of upsetting "established professor X" or not being able to work "in museum Y" are potentially too great to make a fuss!

2. The link between data and its publication is not a simple one. Unlike scientific experiments, archaeological excavation cannot be replicated: the evidence is destroyed as it is recorded. The information an archaeological project generates is therefore shaped by the preconceptions and expertise of the person recording it, and no one can go back to check. Other factors may play a role too: in his study of the methodology of archaeological survey, for example, Robert Schon has tested the degree to which alcohol consumption the night before may affect the number and types of finds spotted the next day! Post-excavation analysis occupies even more uneasy ground between art and science. When characteristics such as culinary taste feed into the categorization techniques used by pottery specialists, the problems of replicability become apparent.

3. The necessary infrastructure for archiving archaeological data is lacking. Since archaeological excavation is "the unrepeatable experiment," the importance of finding durable solutions for archiving excavation records should be obvious, especially now that these records most often consist of ephemeral digital files. However, the general lack of resources in the field and the segmented nature of the institutional infrastructure (there are 17 foreign archaeological institutes in Athens alone) mean that the creation of archives is still largely at the discussion stage. Initiatives like the Archaeological Data Service in the UK are inspiring, but the motivations of each proposed program are met with suspicion by the elders of a discipline noted for its academic tribalism. Since control of data (see 1 above) and divergent standards (see 2 above) are central features of the discipline, a certain degree of skepticism about the prospects for success of "catch all" solutions is not unreasonable.

So where next for the dream of seamlessly linked publications and their data in archaeology? Some current trends are encouraging.

Firstly, semantic technologies for mapping different types of data are lessening our need to develop common standards with colleagues we are barely on speaking terms with. A recent initiative by the American School, sponsored by the Mellon Foundation, aimed to develop a prototype digital repository mapping sample data from the long-running excavations at Corinth (since 1896) and the Athenian Agora (since 1931). Led by Thornton Staples of the Fedora Commons Foundation, the School's information architecture team showed how the creation of even a basic data dictionary allowed new connections to be discovered instantly, not only across two sites with divergent histories but also across a range of different data types and sources. Although published records were not part of the repository, the potential to include them in the web of digital objects was clear. While the links are not strictly hierarchical in the way Barry Cunliffe envisioned, a "good enough" linkage between unpublished and published records in Fedora-like software systems is clearly within reach technologically. Even if the bit is blunter and the results less certain, the ability to drill sideways and upwards as well as down is better than not having a drill at all!

Secondly, the elephants in the archaeological ecosystem (funders like the National Science Foundation) look set to mandate the deposit of digital data in some trusted repository as a requirement of funding archaeological projects. What level of data needs to be deposited and who will accept responsibility for its creation are topics of much discussion. But the discussion itself clearly signals that squatting on data for too long is unacceptable behavior.

Thirdly, other disciplines are developing models that could prove very useful to archaeology. Borgman's opening quote acknowledges that linking publications to data is a common problem, and some interesting initiatives from other disciplines show great potential. At the Protein Data Bank, for example, Phil Bourne and his colleagues have increased both the range and granularity of their use of Digital Object Identifiers (DOIs) to link published with unpublished materials. Meanwhile, astronomers, faced with the problem of comparing much larger datasets than archaeologists have to deal with, collected from radio telescopes around the world, are developing clever "on-the-fly" mapping techniques to deal with divergent data standards.

The "drill down" may never be as easy as it sounds, but it is more attainable technologically, intellectually, and politically now than it has ever been in the past. The prospect of linking archaeological publication with the data that inspired it is coming within sight.

Thursday, March 6, 2008

An Institutional Response to the Challenges of Digital Scholarship in Archaeology at the American School of Classical Studies at Athens

There follows the text of a presentation I gave at the Mellon Foundation's All-Projects, Archaeology, meeting held in New York in March 2008. A version with pictures can be found on the meeting's website.

The project we will be talking about is somewhat different from the other presentations today in that our focus is on an institution rather than a particular work of electronic scholarship. Unlike most of you, the co-PIs on the Mellon Foundation grant (Chuck Jones and I) are not primarily scholars (although Chuck has an impressive scholarly record). I am Director of Publications at the American School and, until last Monday, Chuck was Head of the Blegen Library at the School. We are therefore support staff, and our goal is to facilitate the institution’s scholarly goals rather than do the research ourselves. The research and teaching activities that the American School engages in are increasingly conducted using computer technologies, and its support needs are therefore becoming rather different from what they were even ten years ago.

In this brief presentation, we will introduce the American School of Classical Studies at Athens to those of you who don’t know it, highlight both the opportunities of digital scholarship and the challenges that new technologies are posing to the institution, and describe some of the ways in which we are trying to meet these challenges with the support of the Mellon Foundation.

The American School of Classical Studies at Athens is a research and teaching institution dedicated to the advanced study of all aspects of Greek culture, from antiquity to the present day. Founded in 1881, the School has always had a particular focus on archaeological research, and it conducts excavations at two of the most important sites in the classical world: Ancient Corinth (since 1896) and the Athenian Agora (since 1931).

Digital technologies offer the ASCSA opportunities to further its mission in a number of ways: (1) We can extend access to our publications and enrich them through supplementary materials. (2) We can provide more teaching resources for the 180 North American institutions that already send their graduate students to our year-long academic program. (3) We can help scholars discover our rich information resources so that they are better prepared to launch straight into research when they arrive in Athens. (4) We can further support excavation and survey projects working in Greece by providing a trusted repository for the increasing amount of digital data they are producing. By structuring the data deposited we can allow cross-searching and can advance scholarship by showing how the results of these projects fit together.

We started planning in 2003 to take advantage of these digital opportunities, when the Mellon Foundation funded the first of two committees of information experts to visit Athens. With their help, we identified three main needs: (1) an enhanced web presence; (2) a digital repository to manage, display, and curate all the different types of electronic resources the institution might be responsible for, especially the irreplaceable records of archaeological excavation (information on our work on this and other digital initiatives can be accessed through the “digital library” tab on our website); and (3) an information resources structure and staff in the libraries, archives, and IT department capable of sustaining this digital infrastructure.

In May 2006, we applied to the Foundation for a grant to help us fund Phase 1 of these changes. We are now close to completing this phase.

One major focus of the Mellon Foundation’s generous 2006 grant has been on the creation of a digital architecture capable of managing and delivering the School’s collections. In creating a prototype repository we have been very fortunate to have had the guidance of a visionary data architect, Thornton Staples from the Fedora Commons Foundation. It is on this prototype that we will now focus.

While good digital library models exist for the management and delivery of textual and visual materials (we think of DSpace, for example), incorporating archaeological data and seamlessly interweaving it with archival and library materials presents a more complex set of problems. Much of the archaeological data being produced now is born-digital, ranging from digital photographs to GIS datasets.

Meanwhile, scanning projects are being aggressively funded by the EU “Information Society” program, of which we have been a substantial beneficiary in the last two years, resulting in hundreds of thousands of digital surrogates of notebooks, photos, plans, and so on. The management and curation of both types of electronic product, born-digital and digitized, present substantial technological challenges.

As well as posing problems related to technological sustainability, the creation of an integrated digital architecture involves dealing with the political challenges of an extremely territorial academic culture that sometimes seems to find its worst expression in archaeology. The American School is a single institution but this does not prevent the formation of fiefdoms.

Since the beginning of fieldwork, the excavation processes at Ancient Corinth and the Athenian Agora have developed separately. Even terminology sometimes varies. For example, at a very basic level, the vase-type known as a kotyle in Corinth is referred to as a Corinthian skyphos at the Agora. These issues are not unique to our institution, of course, but creating systems flexible enough to handle them is essential if our goal is the unification of data in a single context.
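To make the kotyle/skyphos problem concrete, here is a minimal sketch in Python of value-level vocabulary mapping: each site keeps its own term, and a small synonym table links both to a shared concept. The table, the concept label, and the record IDs are all hypothetical illustrations, not the School's actual vocabulary or data.

```python
# Hypothetical synonym table mapping (site, local term) to a shared concept label.
# The labels and record IDs below are illustrative, not the ASCSA's actual vocabulary.
SHAPE_SYNONYMS = {
    ("corinth", "kotyle"): "kotyle/corinthian-skyphos",
    ("agora", "corinthian skyphos"): "kotyle/corinthian-skyphos",
}


def normalize_shape(site: str, local_term: str) -> str:
    """Return a shared concept label for a site-specific vase-shape name.

    Terms not in the table pass through unchanged, prefixed with the site,
    so no local information is lost.
    """
    key = (site.strip().lower(), local_term.strip().lower())
    return SHAPE_SYNONYMS.get(key, f"{key[0]}:{key[1]}")


records = [
    {"site": "Corinth", "id": "corinth-find-001", "shape": "Kotyle"},
    {"site": "Agora", "id": "agora-find-001", "shape": "Corinthian skyphos"},
    {"site": "Agora", "id": "agora-find-002", "shape": "Amphora"},
]

# Group record IDs by shared concept, so one query finds the same vase type at both sites.
by_concept: dict[str, list[str]] = {}
for record in records:
    concept = normalize_shape(record["site"], record["shape"])
    by_concept.setdefault(concept, []).append(record["id"])

print(by_concept)
```

Neither site has to rename anything; the mapping lives outside both databases, which is the kind of flexibility the paragraph above calls essential.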

The prototype repository we developed aims to meet these challenges. It provides a very flexible way of managing many different types of digital objects in a single environment. It also allows database systems at different projects to contribute data without dictating the software or metadata fields they use.

The prototype is very far from being a shiny, fully functioning tool. As well as being rough round the edges in appearance, it contains only a small sample of excavation data, drawn from recent seasons at Corinth and the Athenian Agora. As a proof of concept, the prototype also contains retroactively entered data from early-20th-century excavations at Korakou, a site a few miles from the Corinth excavations with a similar suite of material culture, and a variety of digital surrogate material (such as photographs and notebook pages) drawn from the EU-funded scanning project.

The repository consists of a range of independent information objects, each of which can be related to any other information object. The user may explore from node to node, or may view the web of all the direct relationships of a particular object. As a colleague at the University of Cincinnati commented a couple of days ago, the arrows should really point in both directions, and there should of course also be links to other types of material. However, such a spider-web might be harder to visualize.
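The "arrows in both directions" point is easy to illustrate. Below is a minimal Python sketch, assuming relationships are stored as simple (subject, predicate, object) triples: each link is indexed forwards and backwards, and a breadth-first walk returns everything within a given number of links of a starting object. The identifiers are loosely modeled on the examples in this talk; Lot7 and DepositD3 are invented, and the `inverse:` prefix is a hypothetical convention, not the prototype's.

```python
from collections import defaultdict, deque

# Hypothetical triples loosely modeled on the talk's examples; Lot7 and DepositD3 are invented.
TRIPLES = [
    ("FindA45", "wasPartOf", "BasketB23"),
    ("BasketB23", "wasPartOf", "Lot7"),
    ("BasketB23", "wasDerivedFrom", "Notebook42-P45"),
    ("Lot7", "wasPartOf", "DepositD3"),
]


def build_index(triples):
    """Index every relationship in both directions, so arrows can be followed either way."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))  # forward link
        index[obj].append((f"inverse:{predicate}", subject))  # backward link
    return index


def neighborhood(index, start, max_depth=1):
    """Breadth-first walk: map each object within max_depth links of start to its distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_depth:
            continue  # do not expand beyond the requested radius
        for _predicate, neighbor in index.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


index = build_index(TRIPLES)
print(neighborhood(index, "BasketB23"))
```

Starting from the basket, a depth of one reaches the find it contained, the lot it belongs to, and the notebook page it was derived from; raising `max_depth` widens the web, which is also why the full spider-web quickly becomes hard to draw.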

Within the archaeological data, there are four kinds of information objects, reflecting the conceptual model that forms the base for both the Agora and Corinth databases. While the terminology will vary for other projects, at the Agora and Corinth the information objects are:

1. “baskets or loci” (the units of earth extracted during fieldwork)
2. “finds” (the objects that are found when the dirt is sifted)
3. “lots” (the groups of baskets that after excavation are interpreted as deriving from a single depositional process and chronological phase)
4. “buildings, deposits, or features” (the conceptual contexts that result from the interpretation of the excavation data, informed by other textual or cultural evidence).
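One way to picture this four-part conceptual model is as typed objects with rules about what can be "part of" what. The Python sketch below assumes a strict chain (find in basket, basket in lot, lot in feature); that rule set, and the class and field names, are hypothetical simplifications for illustration, and the real Agora and Corinth databases may allow other relationships.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    BASKET = "basket/locus"  # unit of earth extracted during fieldwork
    FIND = "find"  # object recovered when the dirt is sifted
    LOT = "lot"  # baskets interpreted as one depositional process and phase
    FEATURE = "building/deposit/feature"  # interpretive context


# Assumed "part of" rules for this sketch only; real databases may allow more.
PART_OF_ALLOWED = {
    (Kind.FIND, Kind.BASKET),
    (Kind.BASKET, Kind.LOT),
    (Kind.LOT, Kind.FEATURE),
}


@dataclass
class InfoObject:
    id: str
    kind: Kind
    part_of: list = field(default_factory=list)  # parent InfoObjects

    def add_parent(self, parent: "InfoObject") -> None:
        """Record that this object is part of parent, rejecting disallowed pairings."""
        if (self.kind, parent.kind) not in PART_OF_ALLOWED:
            raise ValueError(f"a {self.kind.value} cannot be part of a {parent.kind.value}")
        self.part_of.append(parent)


find = InfoObject("FindA45", Kind.FIND)
basket = InfoObject("BasketB23", Kind.BASKET)
find.add_parent(basket)
print([p.id for p in find.part_of])
```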

In the repository, each information object includes a structured description, expressed in XML, which, among other things, describes the relationships of that information object to others. Examples of such relationships would include “FindA45 wasPartOf BasketB23” to describe a pot’s relationship to the basket in which it was found, and “Basket67 wasDerivedFrom Notebook42-P45” to delineate the connection back from a basket object to the image of the page-spread from the notebook where it was first described.
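As a rough illustration of what such a structured description might look like, here is a Python sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names (`object`, `relation`, `predicate`, `target`, `kind`) are hypothetical stand-ins; the prototype's actual core schema is not reproduced here.

```python
import xml.etree.ElementTree as ET


def describe(obj_id: str, kind: str, relations: list[tuple[str, str]]) -> str:
    """Serialize one information object and its outgoing relationships as XML.

    The element and attribute names are hypothetical stand-ins for the
    prototype's core schema.
    """
    root = ET.Element("object", {"id": obj_id, "kind": kind})
    for predicate, target in relations:
        ET.SubElement(root, "relation", {"predicate": predicate, "target": target})
    return ET.tostring(root, encoding="unicode")


# The two relationship examples given in the text.
find_xml = describe("FindA45", "find", [("wasPartOf", "BasketB23")])
basket_xml = describe("Basket67", "basket", [("wasDerivedFrom", "Notebook42-P45")])
print(find_xml)
print(basket_xml)
```

Because each object carries its own outgoing links, any object can be added or moved without touching a central table of relationships.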

Since all this information is stored in XML files, the system offers a durable long-term solution for archaeological data. The RDF (Resource Description Framework) indexing program used to generate the full-text search (through which the user finds the starting point of her exploration) and the relationship index (which creates the links she will then follow) is, like the whole system, Open Source. While the system is designed to be moved into a Fedora repository, quite a bit of further work will be needed to create a full Fedora implementation.
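The two indexes described above can be sketched in a few lines. This Python example deliberately swaps the prototype's RDF indexing program for plain dictionaries: it parses hypothetical XML descriptions, collects relationship triples, and builds a word-to-object full-text index with a simple AND search. The XML format and sample content are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical object descriptions in a made-up XML format.
DOCS = [
    '<object id="FindA45" kind="find"><label>Corinthian kotyle fragment</label>'
    '<relation predicate="wasPartOf" target="BasketB23"/></object>',
    '<object id="BasketB23" kind="basket"><label>Fill above floor</label>'
    '<relation predicate="wasDerivedFrom" target="Notebook42-P45"/></object>',
]


def build_indexes(docs):
    """Build a relationship index (triples) and a full-text index (word -> object ids)."""
    triples = []
    words = defaultdict(set)
    for doc in docs:
        root = ET.fromstring(doc)
        obj_id = root.get("id")
        for rel in root.iter("relation"):
            triples.append((obj_id, rel.get("predicate"), rel.get("target")))
        for token in re.findall(r"\w+", " ".join(root.itertext()).lower()):
            words[token].add(obj_id)
    return triples, words


def search(words, query):
    """Return ids of objects whose text contains every word of the query."""
    hits = [words.get(token, set()) for token in re.findall(r"\w+", query.lower())]
    return set.intersection(*hits) if hits else set()


triples, words = build_indexes(DOCS)
print(search(words, "kotyle"))  # a starting point for exploration
print([t for t in triples if t[0] == "FindA45"])  # links to follow from it
```

The search answers "where do I start?", and the triples answer "where can I go from here?", mirroring the division of labor described in the paragraph above.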

A central concern of our project was to minimize disruption for the departmental specialists who have curatorial control of, and responsibility for, the different data sets being drawn into the system. A wide range of FileMaker and Access databases are in use at the excavations and in the archives, and data from these is mechanically translated into the repository using a “data dictionary” produced by Thorny and ASCSA information specialists during three workshops in Athens. Metadata fields from the Agora and Corinth databases are mapped to a core XML schema. This schema is currently of Thorny’s own invention, but he is optimistic that he can fairly easily change it to use the new Visual Resources Association (VRA) Core schema. The principle that no archaeological project is forced to change its databases or conform to a centrally imposed database scheme is absolutely integral to our approach. Different archaeological datasets, and a potentially unlimited number of other data types from multiple locations, can be mapped in the same way.
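A field-level data dictionary of this kind can be sketched as a per-site lookup table applied to exported rows. In the Python example below, the local field names, the core field names (`identifier`, `objectType`, `context`), and the CSV export are all hypothetical; they illustrate the principle that the source database keeps its own structure while the mapping does the translation.

```python
import csv
import io

# Hypothetical data dictionary: each site's local field names -> shared core-schema fields.
DATA_DICTIONARY = {
    "agora": {"Inventory No.": "identifier", "Shape": "objectType", "Deposit": "context"},
    "corinth": {"Acc_Num": "identifier", "VaseType": "objectType", "Lot": "context"},
}


def to_core(site: str, row: dict) -> dict:
    """Translate one exported database row into core-schema fields.

    Unmapped fields are kept under a site-prefixed key, so the source
    database never has to change its own structure.
    """
    mapping = DATA_DICTIONARY[site]
    core = {"source": site}
    for local_field, value in row.items():
        core[mapping.get(local_field, f"{site}:{local_field}")] = value
    return core


# A made-up CSV export standing in for a FileMaker or Access dump.
corinth_export = io.StringIO("Acc_Num,VaseType,Lot,Excavator\nC-0001,kotyle,Lot 12,AB\n")
for row in csv.DictReader(corinth_export):
    print(to_core("corinth", row))
```

Adding a new project would mean adding a new entry to the dictionary rather than asking that project to restructure its database.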

So what’s next?

Since December 2007 we have been demonstrating this prototype to American colleagues on affiliated projects and to specialists from the other foreign archaeological schools and Greek academic institutions. Provided their support continues, we will move to the next development stage for the repository, perhaps inviting participation from North American projects working in well-explored archaeological regions such as the Corinthia and Crete. Funding permitting, we also plan to move the prototype into a Fedora implementation. Fedora is powerful and flexible but complicated. As a privately funded institution without access to the IT resources of a major university, we also hope to find a development partner among our Cooperating Institutions to help with this implementation.

At the same time, we need to continue streamlining and coordinating our library, IT, and archival services, a process that the Foundation has also supported over the last few years. What we have learnt so far is that no institution should underestimate the costs, both financial and emotional, of transforming the experimental project we have shown here into a full-scale trusted digital repository program. The opportunities for developing new regional perspectives and revealing serendipitous links between once-siloed projects are tremendously exciting. But we are also very conscious of the seriousness of the responsibilities we are now taking on in extending ASCSA’s 127-year-old role as a coordinating force for North American archaeology in Greece, to become a trusted archive and publisher in the digital, as well as analog, world.

Credits: The work described in this presentation was accomplished between July 2006 and December 2007 with the financial support of the Mellon Foundation. The PIs were Charles Watkinson and Chuck Jones. The information architecture consultant was Thornton Staples of the Fedora Commons Foundation. During three workshops held in Athens, Thorny worked with an Information Architecture Team consisting of the following ASCSA information specialists:

Tarek Elemam, Information Technology Manager, ASCSA
Bruce Hartzler, Information Specialist, Athenian Agora Excavations
James Herbst, Architect, Corinth Excavations
Carol A. Stein, Managing Editor, ASCSA Publications

This presentation reflects the hard work of all of these colleagues.