Saturday, August 30, 2008

The Next Conspiracy Theory?

A humanities publisher is always happy to be recognized by the STM world, especially a giant like Elsevier, so it is delightful to receive a message addressed to the Editor of Hesperia telling us that "in recognition of the quality and relevance of your journal to the scientific community, we are pleased to inform you that the publication listed above has been selected to be covered for indexing in SCOPUS; the largest abstract and citation database of research literature across all subject areas and quality web sources."

"By being indexed in Scopus," we are told, "your publication will be visible to millions of users every day. Once these users discover the abstracts of your journal in Scopus, they will then make use of (or request access to) the print and/or electronic version in your own library."

We immediately say "yes please" of course, but it leaves me wondering why we are getting this unprecedented interest from a commercial giant. Although our publishing is wonderful, of course, why is an STM giant suddenly so fascinated by the relatively obscure products of a small humanities non-profit? Maybe they have just hit "H" in a random trawl through some serials list but, even if that is the case, the question remains "why now?"

As small non-profit publishers like ASCSA become increasingly persuaded by their more activist scholarly stakeholders and by the library community to allow their authors to self-archive post-prints, to remove the subscription barriers to older content, to preview larger quantities of text, and generally to engage in various forms of open access experiment, a cautionary note from the ever-perceptive Don Waters (published in the Winter 2008 edition of the Journal of Electronic Publishing) comes to my attention:

"It is easy to imagine—especially in the absence of hard-nosed and aggressive strategic planning by, and collective action among, scholars, libraries, information technologists, and their universities—that the large, heavily capitalized publishing and other media firms will simply exploit open access repositories, cherry-pick the most valuable open access products, combine them with the most valuable new databases and resources, and sell them back to the academy at a significant profit, thereby chasing out sources of capital from within the academic community that are desperately needed to advance scientific, humanistic, and social science study. If the academy is unwilling or unable to think carefully now about possible downstream consequences of open access publishing and ways to steer clear of undesirable consequences, then the mantra about journal publishing—that the academy gives away its products only to buy them back at exorbitant prices—will surely return to haunt the academy in an even scarier garb than before, and prove to be even more financially debilitating."

Hmmm.

Saturday, April 5, 2008

The European Reference Index for the Humanities: Friend or Foe?

Trouble is brewing in the arcane world of humanities bibliometrics, and it looks as if a major debate on the measurement of what constitutes "quality" in scholarship in fields such as archaeology and classical studies is about to begin.

The catalyst for this is a European Science Foundation initiative to compile a European Reference Index for the Humanities (ERIH), defined (in its first iteration) as "a reference index of top journals in 15 areas of the Humanities, across the continent and beyond" but due to expand "to include book-form publications and non-traditional formats" with the aim of eventually operating "as a backbone to a fully-fledged research information system for the Humanities." Wow!

In 2007, with little fanfare, the first ERIH lists were published, presenting what aimed to be comprehensive catalogues of journals (ca. 1,300 in total and almost 300 in Classical Studies alone), each with an A, B, or C classification. Although the Committees of top scholars in each discipline who compiled them emphasized that "the lists are not a bibliometric tool" and that they therefore advised "against using the lists as the only basis for assessment of individual candidates for positions or promotions or of applicants for research grants," the presumption that universities and other organizations would not start to use them in this way was naive.

The reality is that funding agencies, university administrators, library acquisitions staff, and hiring committees alike have been desperate to find some objective way of measuring the quality of humanities research for years. Although subject to increasing criticism (and attempts to find a web-based metric using Google-like algorithms), the citation-based "impact factor" has been an acceptable measure of article quality in the sciences for decades, since Eugene Garfield invented the measure and later institutionalized it by selling his Institute for Scientific Information (ISI) to the mighty Thomson Corporation. Journals in the humanities, in the meantime, tend to be ranked on the basis of the extremely qualitative and fuzzy scale of "peer perception" which understandably drives busy bureaucrats within the higher establishment wild. There is an Arts and Humanities citation index, and sometimes I will get a panicked call from a junior scholar whose Dean has asked what Hesperia's impact factor is, but the AHCI index has never been widely accepted and ISI does not provide a means of extracting an impact factor from it. Among other problems with AHCI, its coverage of journals is limited and it doesn't acknowledge the important role books play in the humanities.

In an aside, related to previous discussions on this blog and elsewhere about the segmented and "tribal" nature of disciplines like archaeology, I once heard a rumor that ISI didn't produce an impact factor for the Arts and Humanities partly because their statistical analyses tended to find "odd clumping" when analyzing humanities journals, perhaps explained by the tendency of some sub-disciplines to almost exclusively cite themselves. However, I asked Eugene Garfield a question about this at a meeting of the Society for Scholarly Publishing a couple of years ago, and he claimed that his algorithm would work just as well for humanities journals as for sciences. Statistics is far from being my strong suit, so I didn't pursue this further.

In this context, ERIH sounds like an attempt to turn a qualitative measure of "peer perception" into something quantitative, and it undoubtedly has some good motivations.

Firstly, while the method by which the expert panels were selected is obscure, the people on them are distinguished. For Archaeology, the panel consists of Lin Foxhall (Chair), University of Leicester (UK); Csanád Bálint, Hungarian Academy of Sciences, Budapest (HU); Serge Cleuziou, CNRS / Nanterre (FR); Kristian Kristiansen, Göteborgs Universitet (SE), and Jacek Lech, Polish Academy of Sciences, Warsaw (PL). For Classical Studies, Claudia Antonetti (Chair), Università Ca' Foscari, Venice (IT); Angelos Chaniotis, Universität Heidelberg (DE); Antonio Gonzales, Université de Franche-Comté, Besançon (FR); Richard Hunter, University of Cambridge (UK), and Paul Schubert, Université de Genève (CH).

Secondly, the compilation of lists that include important journals from developing as well as developed countries and new periodicals is extremely praiseworthy since almost any library catalogue is incomplete in this regard, and the high quality work of publishing colleagues in some of the new European countries is often unfairly ignored.

Thirdly, the humanities are probably losing out on funding by not catering to bureaucrats. ERIH proponents, thinking in European terms, argue that a quantitative measure of humanities research quality would enable the humanities to compete alongside the sciences to access the €7bn funding provided by the European Research Council each year. At a more provincial level, it is probably true that one reason why high-quality but independently-published journals like Hesperia are seeing steadily declining numbers of institutional subscribers is that librarians don't have a quantitative measure of quality to rely on when making their choices, and therefore tend to make scattershot decisions to subscribe to large commercial packages in the hope that they will hit some of the "core" periodicals. For independent journals as well, quantitative measures in the humanities would probably level the playing field.

On the other hand, a growing body of academics, especially in the traditionally Euro-sceptic UK, are spotting problems with ERIH. They seem to be led, one is somewhat proud to note, by the traditional "awkward squad" disciplines of archaeology and classics.

A good summary of the arguments against ERIH can be found in the PDF minutes of a meeting of the Arts and Humanities Research Council in the UK on February 27, 2008, subsequently reported in the Times Higher Education Supplement of March 19, 2008. Also worth watching may soon be the website of the Council of University Classical Departments (CUCD) which seems to be leading the opposition to ERIH under the control of their distinguished Chairman, Robin Osborne.

The main problems identified were:

1. By categorizing journals into disciplines, the importance of interdisciplinary journals is understated. A single list of journals would be more useful than 15 separate ones.

2. The methodology on categorization (involving quantitative data on % of authors from different countries, acceptance rate, level of peer-review but also qualitative data about who is on the advisory board etc.) is obscure and unscientific. However good the expert panels are, surely their own research preferences and integration into networks would show.

3. The lists were not complete, especially in regard to non-European including North American periodicals, and some of the journals listed were defunct.

The biggest concern was that a system like ERIH, which even its proponents agreed was still in a beta phase, was already being used to make hiring and funding decisions. Although the evidence for this was anecdotal, the probability that, after such a long period of frustration over the lack of quantitative measures of humanities research quality, ERIH would not be used by data-starved administrators seemed low.

In recent weeks, the generators of ERIH have clearly been acting to head off its critics. For the first time last week, Hesperia received formal notification of the project and a feedback form with which to comment on our rankings. How did we do? Not too bad with category "A" (for "high ranking, international level publication") in Classical Studies and Archaeology and "B" (for "standard, international level publication") in History. The initial list in Art and Art History, another important field in which the journal publishes, is yet to be announced.

It's nice to get grade "A"s, so perhaps my decision to suspend judgement on ERIH for the moment is biased. Peer review works for the contents of journals, so why shouldn't it work for compiling lists of journals themselves? How else would an obvious gap in the market for information on humanities publications be filled than by a major international initiative? Will ERIH's promise to index publications in "non-traditional formats" in the future provide objective measures for the quality of electronic publications that have so far been poorly recognized by employers? However, I also see a lot of validity in the criticism of the project which seems to have been unduly secretive in its development and perhaps naive in its implementation. The important debate about how to measure the quality of publications in the humanities that ERIH has reopened is definitely one to watch.

Archaeology as Ecology?

I was extremely flattered and amused last week to have stimulated a debate about ecology through my rather glib identification of some of the giants in our field of broader Mediterranean archaeology (Hodder, Davis, Joukowsky, Rose) as "grey panthers," who are confident enough of their positions to take the risk of sharing data.

I think that the distinguished colleagues who have recently identified themselves as "archaeological data critters" (Eric Kansa's great term) are downplaying their role in the food chain; Bill Caraher is more capybara (uber-rodent and one of my favorite animals) than squirrel, Eric Kansa is more an eagle than a bluebird, and Tom Elliott is more like the nicest kind of giant pan-galactic gorilla than cranky space monkey. As for me, Tom's reference to "the Watkinson Archaeological Cyberverse" momentarily transformed me into one of those tiny tree-frogs that temporarily conceal their true minuteness by puffing up.

Before I deflate and go back to my hole in the tree, however, I do want to make an additional comment about the idea of there being different kinds of "archaeological critters" and its relevance to larger debates about how to support emerging digital scholarship.

The use of a biological metaphor in discussions of how scholarship works is, of course, not new. The idea of an "ecology of scholarly communication" seems to be cropping up more and more in discussions of the respective roles of libraries, presses, and IT departments in supporting the "information life-cycle" from production, through management and preservation, to dissemination. The nature of academia as an interdependent and complex system of groups with their own behavior patterns is often discussed in terms of either "species" or "tribes" (e.g., the bestselling book, Academic Tribes and Territories: Intellectual Enquiry and the Culture of Disciplines).

Anyone involved in archaeology recognizes that our discipline must be one of the most segmented fields of study out there. Look at conferences, for example. As the veteran of many conference displays in a past job at The David Brown Book Company, I know that hardly anybody who goes to the Society for American Archaeology (SAA) meetings attends the Archaeological Institute of America (AIA) or the Society for Historical Archaeology (SHA). Funding constraints mean that most scholars can only attend one big conference a year, and this economic "accident" perpetuates the disciplinary divide. Even within an AIA meeting, a multitude of different worlds with their own logics can be observed. Individual conference identities are shaped by scholarly specialty, institutional affiliation, age cohort, qualitative or quantitative orientation etc., and groupings form and reform in a way that is probably better described in terms of theories of "ethnicity" than biology.

The recognition that, in academic terms, fields like archaeology consist of many sub-disciplines with their own logics and modes of presentation and discussion is very clear when working with authors. There are notable exceptions, but my experience in publishing classical archaeology, for example, is that epigraphers and amphora specialists tend to be extremely territorial; reviewers tend to be very opinionated and outspoken, citations seem to always be to a limited range of sources, data sharing seems unusual, and there is usually a long time-delay between discovery and publication.

In a commentary on the "archaeological critters" discussion, my friend and sadly-missed ex-colleague Chuck Jones (some kind of good bear, I think) asked the following question: "it occurs to me that some rather ambitious projects seeking to offer generalized platforms for the archiving and distribution of archaeological data have not yet come up in the conversations, and I wonder why not."

Understanding that there are multiple species, even in a small ecological zone like archaeology, is important for the designers and funders of the evolving cyberinfrastructure. The recognition that "archaeological informatics" is a specific field of information science is becoming more widespread. Archaeology is different from computer science, economics, medicine or physics in the way researchers use, produce, manage, and disseminate information. However, we all also need to recognize that the discipline is internally segmented to such a degree that (a) any overly ambitious "global" repository- or digital toolkit-building exercise will only gain very limited acceptance, unless it tailors itself to support a range of different communities, and (b) perhaps the most logical level at which to sustain repositories and digital tools is at a "species" rather than "ecological zone" level, meaning that it is likely that each discipline will remain fairly wary of solutions created by "outsiders" (e.g., Americanist archaeologists in the Mediterranean).

Of course, the ecological metaphor loses its usefulness after a certain point. After all, in our quest for interoperability between digital repositories, is what we are really trying to do to get different species to interbreed? (If so, no wonder it's so difficult!) And, anyway, what kind of sick, demented research project is that? None of the pleasant and visionary archaeological informaticians I have met look like Dr. Moreau.

Tuesday, April 1, 2008

Only Panthers Share Archaeological Data

Sebastian Heath's comments on my last post, where I made some rather self-important statements about the problems of linking data and publication, are (as one would expect) thoughtful and stimulating.

Sebs is right to pick me up on my glib statement that "nobody wants to share their data," and he presents some good examples of admirable projects that are taking a lead in being transparent and open with their findings. As I'll suggest in a moment, however, I think that the projects he highlights are the exceptions rather than the norm, and there are some distinctive features that make them exceptional.

Firstly, however, I would like to point out that I am not alone in my pessimism about the readiness of scholars to share. Since my last post also referred to it vigorously, it may seem that Christine Borgman's synthetic Scholarship in the Digital Age (2007) is the only book I have read. It's not quite (I have just re-read the last Harry Potter!), but I am finding some good stuff in it, especially in Chapter 8 where the author explores four areas of concern that discourage information sharing among scientists, social scientists, and humanists alike:

(1) Reward: Scholars are rewarded for publication through promotion and tenure but rarely acknowledged for managing their information. This means that they feel little obligation to self-archive data for others to use.

(2) Effort: The preparation of data so that it is useful to others is a huge effort. The logical fieldwork database structure is less than half the story. The documentation so that the information is useful to others is the real sweat.

(3) Priority: Although archaeology isn't medicine, being first with information is still the key to prestige and future funding.

(4) Ownership: Confusion about intellectual property rights and related concerns about the ownership and control of data make the prospect of sharing information risky.

These worries ring true to me for archaeology, and I find two of the points particularly persuasive. Having just spent a day watching my editorial colleagues almost literally tearing their hair out over a poorly organized pottery catalogue, I think that it is hard to overestimate the amount of thought that needs to go into preparing raw information so that it is useful (point 2 above). Archaeological data is socially constructed, so it needs context to be properly interpreted. There are norms of presentation to obey, concerns about interoperability to address, and a multitude of structuring decisions that need to be made to differentiate a catalogue from a data dump. The question of "when" the records of a project become data is a big one in our field.

Although intellectual property concerns (point 4 above) are often linked to "copyright" and the naughtiness of publishers, other concerns that archaeologists have to wrestle with over the ownership and control of data seem to me to be much more urgent. In multi-institutional projects, which director's institution owns the copyright in the work its staff member is doing (remember that it is usually employers rather than employees who own the rights to work done on "company time")? When working overseas, what rights does the host nation have over the information being extracted? (Mexico, Greece, and France have recently attempted to tax photographs of their "patrimony"). These are all scary considerations, and issues that are made more frightening by being poorly understood.

But let's move back to the examples of good sharers Sebs brings up: Jack Davis with PRAP and MRAP, Ian Hodder at Çatalhöyük, Martha Joukowsky at Petra, and Brian Rose at Troy. Let's face it, Sebastian, these are legendary names, the "grey panthers" who have nothing to prove. Tenured, funded, at the top of their profession, they have little need for further reward, have access to some of the best minds around to help shape their data for other users, have less need than others to retain the right to priority, and are savvy in their abilities to navigate the intellectual property minefield. If you are a powerful feline, the obstacles to data sharing drop away.

When we talk about sharing, we need to look more at scholarly behavior at the starting-out level. Think graduate students and untenured faculty, the baby armadillos and raccoons rather than the panthers of the scholarly ecosystem. With their institutional repositories standing largely empty, libraries are currently puzzling about why these first "net gen" grad students and junior faculty aren't loading up university servers with data sets, conference presentations, articles in progress, course materials, and all the other good digital stuff a lively intellect produces. A glance at Borgman's book may suggest some simple truths about motivations, disincentives, and the law of the grasslands.

Sunday, March 30, 2008

The "drill down" dilemma. Why can't we link archaeological publication to the underlying data?

In Scholarship in the Digital Age (2007), Christine Borgman writes that "scholarly publications tell the story of data, regardless of whether those data are biological specimens, ecological sensor data, answers to interview questions, potshards found in an archaeological site, or themes in fourteenth-century manuscripts. The story may be lost when the data and the publications are separated. Making better links between data and the documents that describe them is a common need across disciplines."

In archaeology, the dream of making seamless links between the publication and the data that inspired it can be traced back to at least 1983, when Barry Cunliffe proposed a multi-level publication strategy in his report on archaeological publication commissioned by the Council for British Archaeology. Inspired by his own experiences at the Danebury Iron Age hillfort, Cunliffe seemed to be imagining a situation where a visitor to the site, puzzled by a particular question, might start their researches with the site guide. Basic information provided in that popular guide would link to the academic monograph series which itself would reference specialist reports, by this stage potentially available only in microfilm form. The real enthusiast could then follow the trail right back to the physical site archives, stored in the basement of the local museum.

The immediate attraction of such an idea for archaeology is apparent. Even the imagery of "drilling down" and "digging for information" seems perfectly geared to a discipline that revolves around concepts of layering and stratigraphy. Why then are there so few examples of the "drill down" publication strategy even 25 years later?

Sure, a number of archaeological journals have "supplements," a perennial source of confusion to publishers and libraries alike ("Are supplements books or journals?" "Do we receive them as part of our subscription?"). At the American School of Classical Studies, the first Hesperia supplement was published in 1937, only 5 years after the journal itself was established. The Journal of Roman Archaeology and the Bulletin of the American Schools of Oriental Research are two other long-running periodicals with supplements. However, these supplements are almost always used to publish self-contained studies simply too long for the journal itself or, in format terms, more suitable for "book" publication--a distinction that is usually poorly defined.

Certainly also, archaeology is starting to be inspired by other disciplines in experimenting with electronic-only "supplementary materials," where online hosting platforms permit. Recent book publications by the American School include CD-ROMs full of color images too expensive to print on paper (The Neolithic Pottery from Lerna (2007)) and original texts (such as Ottoman tax records) written before the standardization of paper sizes (A Historical and Economic Geography of Ottoman Greece (2006)). As digital repository technology stabilizes, these supplementary materials will certainly appear online.

I would suggest that truly seamless linking between archaeological publication and its raw materials remains an unrealized goal for three reasons:

1. Nobody wants to share their data. It is now largely accepted that the culture of humanities scholarship is not amenable to sharing, as the American Council of Learned Societies report, Our Cultural Commonwealth (2006), emphasizes. Not only is there no mechanism in the system to reward data sharing (tenure committees don't care), but the concept of national, institutional, and personal control over the rights to study particular materials extracted during archaeological fieldwork is so embedded as to be largely unchallenged within the discipline itself. The costs of upsetting "established professor X" or not being able to work "in museum Y" are potentially too great to make a fuss!

2. The link between data and its publication is not a simple one. Unlike scientific experiments, archaeological excavation cannot be replicated. The information an archaeological project generates, therefore, is shaped by the preconceptions and expertise of the person recording it. Other kinds of factors may play a role: in his study of the methodology of archaeological survey, for example, Robert Schon has tested the degree to which alcohol consumption the night before may impact the number and types of finds spotted the next day! Post-excavation analysis occupies an even more uneasy ground between art and science. When characteristics such as culinary taste feed into the categorization techniques used by pot specialists, the problems of replicability become apparent.

3. The necessary infrastructure for archiving archaeological data is lacking. Since archaeological excavation is "the unrepeatable experiment," the importance of finding durable solutions to the archiving of excavation records should be obvious--especially when these records most often consist now of ephemeral digital files. However, the general lack of resources in the field and the segmented nature of the institutional infrastructure (there are 17 foreign archaeological institutes in Athens alone) mean that the creation of archives is still in a state of discussion. Initiatives like the Archaeology Data Service in the UK are inspiring projects like Archaeoinformatics.org, but the motivations of each proposed program are met with suspicion by the elders in a discipline noted for its academic tribalism. Since control of data (see 1 above) and divergent standards (see 2 above) are central features of the discipline, a certain degree of skepticism about the prospects for success of "catch all" solutions is not unreasonable.

So where next for the dream of seamlessly linked publications and their data in archaeology? Some current trends are encouraging.

Firstly, semantic technologies for mapping different types of data are overcoming our need to develop common standards with colleagues we are barely on speaking terms with. A recent initiative by the American School, sponsored by the Mellon Foundation, aimed to develop a prototype digital repository mapping sample data from the long-running excavations at Corinth (since 1896) and the Athenian Agora (since 1931). Led by Thornton Staples of the Fedora Commons Foundation, the School's information architecture team showed how the creation of even a basic data dictionary allowed new connections to be discovered instantly not only across two sites with divergent histories but also across a range of different data types and sources. Although published records were not part of the repository, the potential to include them in the web of digital objects was clear. Although the links are not strictly hierarchical, as Barry Cunliffe envisioned, a "good enough" linkage between unpublished and published records in Fedora-like software systems is clearly within reach technologically. Even if the bit is blunter and the results less certain, the ability to drill sideways and upwards as well as down is better than not having a drill at all!
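
For readers who like to see the idea in miniature, here is a rough sketch of what a "basic data dictionary" can do. It is not the School's prototype or anything built on Fedora: the site keys, field names, and records below are all invented for illustration, and the only assumption is that each excavation uses its own local labels for roughly comparable concepts.

```python
# Hypothetical data dictionary: each site's local field names (invented here)
# are mapped onto a small set of shared terms.
DATA_DICTIONARY = {
    "corinth": {"lot_no": "context", "obj_type": "object_type", "date_range": "period"},
    "agora": {"deposit": "context", "category": "object_type", "chronology": "period"},
}


def normalize(site, record):
    """Rename a site's local field names to the shared terms and tag the site."""
    mapping = DATA_DICTIONARY[site]
    shared = {mapping[key]: value for key, value in record.items() if key in mapping}
    shared["site"] = site
    return shared


def find_connections(records, field):
    """Group normalized records by a shared field, keeping only values
    that occur at more than one site."""
    groups = {}
    for rec in records:
        value = rec.get(field)
        if value is not None:
            groups.setdefault(value, []).append(rec)
    return {
        value: recs
        for value, recs in groups.items()
        if len({r["site"] for r in recs}) > 1
    }


# Invented sample records, one schema per site.
corinth_records = [
    {"lot_no": "C-101", "obj_type": "amphora", "date_range": "Late Roman"},
]
agora_records = [
    {"deposit": "A-7", "category": "amphora", "chronology": "Hellenistic"},
    {"deposit": "A-9", "category": "lamp", "chronology": "Late Roman"},
]

normalized = [normalize("corinth", r) for r in corinth_records]
normalized += [normalize("agora", r) for r in agora_records]

by_type = find_connections(normalized, "object_type")
by_period = find_connections(normalized, "period")
```

With these made-up records, the amphorae from the two sites end up grouped together by object type, and a Corinth amphora is linked to an Agora lamp by period, even though neither site changed how it records its own finds. The design point is simply that the mapping lives in the dictionary rather than in a standard everybody has to agree on first.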

Secondly, the elephants in the archaeological ecosystem (funders like the National Science Foundation) look set to mandate the deposit of digital data in some trusted repository as a requirement of funding archaeological projects. What level of data needs to be deposited and who will accept responsibility for its creation are topics of much discussion. But the discussion itself clearly signals that squatting on data for too long is unacceptable behavior.

Thirdly, other disciplines are developing models that may be of powerful use to archaeology. Borgman's opening quote acknowledges that linking publications to data is a common problem, and some interesting initiatives from other disciplines show great potential. At the Protein Data Bank, for example, Phil Bourne and his colleagues have increased both the range and granularity of the use of Digital Object Identifiers (DOIs) to link published with non-published materials. Meanwhile, astronomers faced with the problem of comparing much larger datasets than archaeologists have to deal with, collected from radio telescopes around the world, are developing clever "on-the-fly" mapping techniques to deal with divergent data standards.

The "drill down" may never be as easy as it sounds, but it is more attainable technologically, intellectually, and politically now than it has ever been in the past. The prospect of linking archaeological publication with the data that inspired it is coming within sight.

Thursday, March 6, 2008

An Institutional Response to the Challenges of Digital Scholarship in Archaeology at the American School of Classical Studies at Athens

There follows the text of a presentation I gave at the Mellon Foundation's All-Projects, Archaeology, meeting held in New York in March 2008. A version with pictures can be found on the meeting's website.

The project we will be talking about is somewhat different from the other presentations today in that our focus is on an institution rather than a particular work of electronic scholarship. Unlike most of you, the co-PIs on the Mellon Foundation grant (Chuck Jones and I) are not primarily scholars (although Chuck has an impressive scholarly record). I am Director of Publications at the American School and, until last Monday, Chuck was Head of the Blegen Library at the School. We are therefore support staff and our goal is to facilitate the institution’s scholarly goals, rather than do the research ourselves. The research and teaching activities that the American School engages in are increasingly conducted using computer technologies, and the support needs are therefore becoming rather different than they were even ten years ago.

In this brief presentation, we will introduce the American School of Classical Studies at Athens to those of you who don’t know it, highlight both the opportunities of digital scholarship and the challenges that new technologies are posing to the institution, and describe some of the ways in which we are trying to meet these challenges with the support of the Mellon Foundation.

The American School of Classical Studies at Athens is a research and teaching institution dedicated to the advanced study of all aspects of Greek culture, from antiquity to the present day. Founded in 1881, the School has always had a particular focus on archaeological research and it conducts excavations at two of the most important sites in the classical world: at Ancient Corinth since 1896 and in the Athenian Agora since 1931.

Digital technologies offer the ASCSA opportunities to further its mission in a number of ways: (1) We can extend access to our publications and enrich them through supplementary materials. (2) We can provide more teaching resources for the 180 North American institutions that already send their graduate students to our year-long academic program. (3) We can help scholars discover our rich information resources so that they are better prepared to launch straight into research when they arrive in Athens. (4) We can further support excavation and survey projects working in Greece by providing a trusted repository for the increasing amount of digital data they are producing. By structuring the data deposited we can allow cross-searching and can advance scholarship by showing how the results of these projects fit together.

We started planning to take advantage of the digital opportunities described in 2003, when the Mellon Foundation funded the first of two committees of information experts to visit Athens. With their help, we identified three main needs: (1) An enhanced web presence (http://www.ascsa.edu.gr). (2) A digital repository to manage, display, and curate all the different types of electronic resources the institution might have responsibility for—especially the irreplaceable records of archaeological excavation (information on our work on this and other digital initiatives can be accessed through the “digital library” tab on our website). (3) An information resources structure and staff in the libraries, archives, and IT department capable of sustaining this digital infrastructure.

In May 2006, we applied to the Foundation for a grant to help us fund Phase 1 of these changes. We are now close to completing this phase.

One major focus of the Mellon Foundation’s generous 2006 grant has been on the creation of a digital architecture capable of managing and delivering the School’s collections. In creating a prototype repository we have been very fortunate to have had the guidance of a visionary data architect, Thornton Staples from the Fedora Commons Foundation. It is on this prototype that we will now focus.

While good digital library models exist for the management and delivery of textual and visual materials (we think of DSpace, for example), incorporating archaeological data and seamlessly interweaving it with archival and library materials presents a more complex set of problems. Much of the archaeological data being produced now is born-digital, ranging from digital photographs to GIS datasets.

Meanwhile, scanning projects are being aggressively funded by the EU “Information Society” program, of which we have been a substantial beneficiary in the last two years, resulting in hundreds of thousands of digital surrogates of notebooks, photos, plans, etc. The management and curation of both types of electronic product present substantial technological challenges.

As well as posing problems related to technological sustainability, the creation of an integrated digital architecture involves dealing with the political challenges of an extremely territorial academic culture that sometimes seems to find its worst expression in archaeology. The American School is a single institution but this does not prevent the formation of fiefdoms.

Since the beginning of fieldwork, the excavation processes at Ancient Corinth and the Athenian Agora have developed separately. Even terminology sometimes varies. For example, at a very basic level, the vase-type known as a kotyle in Corinth is referred to as a Corinthian skyphos at the Agora. These issues are not unique to our institution, of course, but creating systems flexible enough to handle them is essential if our goal is the unification of data in a single context.

The prototype repository developed aims to meet these challenges. It provides a very flexible way of managing multiple different types of digital objects in a single environment. It also allows database systems at different projects to contribute data without dictating the software or metadata fields they use.

The prototype is very far from being a shiny, fully functioning tool. As well as being rough round the edges in appearance, it also contains only a small sample of excavation data, drawn from recent seasons at Corinth and the Athenian Agora. As a proof-of-concept, the prototype also contains retroactively-entered data from early 20th century excavations at Korakou, a site a few miles from the Corinth excavations sharing a similar suite of material culture, and a variety of digital surrogate material (such as photographs and notebook pages) pulled in from the EU-funded scanning project.

The repository consists of a range of independent information objects that can be related to any other information object. The user may explore from node to node, or may view the web of all the contiguous relationships from a particular object. As a colleague at the University of Cincinnati commented a couple of days ago, the arrows should really be going both directions, and there should also of course be other links to other types of material. However, such a spider-web might be harder to visualize.
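For readers who think in code, the "web of all the contiguous relationships" can be sketched as a walk over a graph in which every link is followed in both directions. This is only an illustration in Python, not the prototype's actual implementation: apart from the two relationships quoted elsewhere in this presentation, the object names (Lot7, FindZ9, BasketQ1) and the function names are hypothetical.

```python
from collections import defaultdict, deque

# (subject, predicate, object) relationships. The first and third follow the
# pattern of the examples in this presentation; the rest are hypothetical.
TRIPLES = [
    ("FindA45", "wasPartOf", "BasketB23"),
    ("BasketB23", "wasPartOf", "Lot7"),
    ("BasketB23", "wasDerivedFrom", "Notebook42-P45"),
    ("FindZ9", "wasPartOf", "BasketQ1"),  # an unrelated pair of objects
]

def build_neighbors(triples):
    """Index every relationship in both directions, ignoring the predicate."""
    neighbors = defaultdict(set)
    for subject, _predicate, obj in triples:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)
    return neighbors

def connected_web(start, triples):
    """Return every object reachable from `start` by following links."""
    neighbors = build_neighbors(triples)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

web = connected_web("FindA45", TRIPLES)
```

Starting from the find, the walk reaches its basket, the lot, and the notebook page, but not the unrelated objects. Indexing each link in both directions is one simple way to get arrows that "go both directions" without storing every relationship twice.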

Within the archaeological data, there are four kinds of information objects, reflecting the conceptual model that forms the base for both the Agora and Corinth databases. While the terminology will vary for other projects, at the Agora and Corinth the information objects are:

1. “baskets or loci” (the units of earth extracted during fieldwork)
2. “finds” (the objects that are found when the dirt is sifted)
3. “lots” (the groups of baskets that after excavation are interpreted as deriving from a single depositional process and chronological phase)
4. “buildings, deposits, or features” (the conceptual contexts that result from the interpretation of the excavation data, informed by other textual or cultural evidence).

In the repository, each information object includes a structured description, expressed in XML, which, among other things, describes the relationships of that information object to others. Examples of such relationships would include “FindA45 wasPartOf BasketB23” to describe a pot’s relationship to the basket in which it was found, and “Basket67 wasDerivedFrom Notebook42-P45” to delineate the connection back from a basket object to the image of the page-spread from the notebook where it was first described.
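To make this concrete, here is a minimal sketch in Python of what such a description might look like and how its relationships could be read out. The element and attribute names (object, relation, predicate, target) are assumptions made for illustration; they are not the prototype's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML description of one information object. Only the
# relationship itself ("FindA45 wasPartOf BasketB23") comes from the text;
# the markup around it is invented for this sketch.
SAMPLE = """
<object id="FindA45" type="find">
  <description>Pot found during sifting</description>
  <relationships>
    <relation predicate="wasPartOf" target="BasketB23"/>
  </relationships>
</object>
"""

def extract_triples(xml_text):
    """Return (subject, predicate, object) triples from one description."""
    root = ET.fromstring(xml_text.strip())
    subject = root.get("id")
    return [
        (subject, rel.get("predicate"), rel.get("target"))
        for rel in root.iter("relation")
    ]

triples = extract_triples(SAMPLE)
```

Because the relationships travel inside each object's own description, an index of links can in principle be rebuilt from the files at any time.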

Since all this information is contained in XML files, the system presents a durable long-term solution for archaeological data. The RDF (Resource Description Framework) indexing program, which generates both the full-text search (through which the user can find the starting point of her exploration) and the relationship index (which creates the links that she will start to follow), is, like the whole system, Open Source. While the system is designed to be moved into a Fedora repository, quite a bit of further work will be needed to create a full Fedora implementation.

A central concern of our project was to generate minimal disruption for the departmental specialists who have curatorial control and responsibility for the different data sets being drawn into the system. A wide range of FileMaker and Access databases are in use at the excavations and in the archives, and data from these are mechanically translated into the repository using a “data dictionary” produced by Thornton “Thorny” Staples and ASCSA information specialists during three workshops in Athens. Metadata fields from the Agora and Corinth databases are mapped to a core XML schema. This is currently one of Thorny’s own invention, but he is optimistic that he can fairly easily change this to use the new Visual Resources Association Core schema. The idea that no archaeological project is forced to change its databases or conform to some centrally-imposed database scheme is absolutely integral to our approach. Different archaeological datasets and an infinite number of other data types from multiple locations can be mapped.
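The data dictionary idea can be sketched in a few lines of Python. Everything named here is hypothetical: the local field names, the core field names, and the function are stand-ins chosen for illustration, not the contents of the actual dictionary or schema.

```python
# Hypothetical data dictionary: local field name -> core schema field,
# kept separately for each contributing project.
DATA_DICTIONARY = {
    "agora": {"ObjNo": "identifier", "Shape": "object_type", "Basket": "context"},
    "corinth": {"inv_num": "identifier", "vessel_form": "object_type",
                "basket_id": "context"},
}

def to_core(project, record):
    """Translate one local database record into core-schema fields.

    Fields with no entry in the dictionary are kept under "unmapped"
    rather than dropped, so nothing a project records is lost.
    """
    mapping = DATA_DICTIONARY[project]
    core = {"source_project": project}
    unmapped = {}
    for field, value in record.items():
        if field in mapping:
            core[mapping[field]] = value
        else:
            unmapped[field] = value
    core["unmapped"] = unmapped
    return core

corinth_record = {"inv_num": "C-1", "vessel_form": "kotyle", "notes": "mended"}
core_record = to_core("corinth", corinth_record)
```

The point of the design is that adding a new project means adding a new entry to the dictionary, while the project's own database stays exactly as its curators built it.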

So what’s next?

Since December 2007 we have been demonstrating this prototype to American colleagues on affiliated projects, and specialists from the other foreign archaeological schools and Greek academic institutions. As we continue to detect their support, we will move to the next development stage for the repository, perhaps inviting participation from North American projects working in well-explored archaeological regions such as the Corinthia and Crete. Funding permitting, we also plan to move the prototype into a Fedora implementation. Fedora is powerful and flexible but complicated. As a privately-funded institution without access to the IT resources of a major university, we also hope that we can find a development partner among our Cooperating Institutions to help with this implementation.

At the same time we need to continue streamlining and coordinating our library, IT, and archival services, a process that has also been supported by the Foundation over the last few years. What we have learnt so far is that no institution should underestimate the costs, both financial and emotional, of transforming the experimental project we have shown here into a full-scale trusted digital repository program. The opportunities for developing new regional perspectives and revealing serendipitous links between once silo-ed projects are tremendously exciting. But we are also very conscious of the seriousness of the responsibilities we are now taking on in extending ASCSA’s 127-year-old role as a coordinating force for North American archaeology in Greece, to become a trusted archive and publisher in the digital, as well as analog, world.

Credits: The work described in this presentation was accomplished between July 2006 and December 2007 with the financial support of the Mellon Foundation. The PIs were Charles Watkinson and Chuck Jones. The information architecture consultant was Thornton Staples of the Fedora Commons Foundation. During three workshops held in Athens, Thorny worked with an Information Architecture Team consisting of the following ASCSA information specialists:

Tarek Elemam, Information Technology Manager, ASCSA
Bruce Hartzler, Information Specialist, Athenian Agora Excavations
James Herbst, Architect, Corinth Excavations
Carol A. Stein, Managing Editor, ASCSA Publications

This presentation reflects the hard work of all of these colleagues.