from Victorian Studies Volume 41, Number 4

Textual Scholarship, Textual Theory, and the Uses of Electronic Tools: A Brief Report on Current Undertakings

Jerome McGann




Like many other professional observers, P. Aaron Potter worries that scholarly editing and textual scholarship face a clear and present danger from electronic instruments. He focuses on my work because he sees it as exemplifying general trends and practices. His critical vantage should be taken seriously, for the truth is that no one now adequately understands what these new tools hold in store for editing and textual studies, much less for so-called creative uses of text and mixed media materials. The current scene is volatile to a degree. On the other hand, like the printing press 500 years ago, the tools are now there, and they are being taken up by many people and institutions, not least of all by libraries (which are still our central scholarly and educational institutions). What lies in our futures, scholarly and otherwise, is already being determined by what is being done now with these tools. Persons like myself who have decided to test and deploy electronic tools in critical and scholarly practice do so because, like Potter, we are concerned about the future of our disciplines. My life has been spent in education and with books. I want to understand more clearly how these new media and technologies impinge on both. I also want to do what I can to ensure that if engineering technologies are certain to alter our libraries and our educational institutions--and they are--a humanistic perspective toward those changes is preserved and developed.

Textual studies is ground zero of everything we do. We read, we write, we think in a textual condition. Because that is true, the new information and media technologies go to the core of our work. As humane scholars we should not leave the development of these tools, which includes their introduction into our institutions, to administrators, systems analysts, and electronic engineers. Hence The Rossetti Archive, which was begun in 1992 with two goals in mind: first, to design a multi-media electronic model for scholarly editing that would have general applicability; second, to use this practical task as a vehicle for exploring the theoretical structure of imaginative texts and other aesthetic works as seen from a critical or ''user's'' perspective. Because the project had few models to depend upon, the work has been slow to emerge, though we are now on the brink of publishing the first installment of the Archive with University of Michigan Press. Potter worries about the stability of tools like The Rossetti Archive, as he should. Such concerns have been foundational at every stage of our work, as has our commitment to a thorough and accurate treatment of the Archive's materials, the thousands of SGML texts and digital images contained in the Archive's complex analytic organization. Aware of the volatile character of computer hardware and software, we designed the Archive from the beginning to be as free as possible from particular platforms, tools, and applications.

From that general context, then, let me address Potter's critical remarks in two ways. First, I want to make a brief pass through his essay in order to correct some of its errors and misunderstandings. After doing that, I'll address a few issues arising from his comments that have, in my view, some clear importance for humanities and literary scholarship.

I

Potter writes, mistakenly, that The Rossetti Archive is to be a CD-ROM product.1 The error here is both telling and grievous, for one of the most important objects of the Archive--it has been widely discussed and is well-known--is to build this complex system online. I am not sure that Potter understands the magnitude or significance of the difference between designing such a work as a CD-ROM and designing it as an online product. Furthermore, Potter seems to equate ''online'' systems with World Wide Web products, which of course can be (and often are) ephemeral, and which need not command the stability and rigor we expect from scholarly works (though they may in fact command such virtues). But the Archive is specifically not to be modeled on the HTML-based World Wide Web formats, which lack the analytic power we have designed into the structure of The Rossetti Archive. CD-ROM products, for their part, have several marked disadvantages, the most important being the limited range of their materials and of the connections that can be made to other related materials. Unlike George Landow's several Victorian ''web'' works, which Potter equates with The Rossetti Archive, the latter is organized for full on-the-fly collation of its database of texts and digital images as well as full structured search and analysis of all of its materials. Building such a work online, on such a scale, has been a considerable undertaking. Nothing quite like it exists, so far as we are aware.2

The misunderstanding about CD-ROM and online scholarly products suggests that Potter does not understand what is happening in the field of electronic textuality. The fact that he never mentions text-markup, TEI, or SGML, least of all the important new efforts to develop more flexible markup tools that would be adequate to noninformational, ''imaginative'' texts, is eloquent. And the fact that he apparently does not understand The Rossetti Archive's relation to these matters simply means he does not know how it works or what it has been designed to do.

Potter cites two of my essays but passes by other essays and materials where these matters of the logic and design of electronic textual instruments are discussed. The first treatment of these subjects has been available for some time as part of the online demo version of the Archive that we put up on the Web for critique and comment in 1992. This is the ''Brief Introduction'' to The Rossetti Archive, which I cannot see that Potter could have read or, if he read it, could have understood. As development of the Archive has proceeded I have written and lectured about its design extensively. Besides ''The Rationale of HyperText'' (1997), with which Potter is familiar, the most relevant of these materials are ''The Rossetti Archive and Image-based Electronic Editing'' (1996) and my presidential address to the Society for Textual Scholarship (Spring 1997), neither of which Potter indicates he knows. There is, as well, the essay ''Imagining What You Don't Know: The Theoretical Goals of the Rossetti Archive'' (1999), an earlier version of which has been available online. These works all treat in various ways the design theory of the Archive and the problems its development has exposed and been forced to deal with.3

Two of those essays are primarily critiques of The Rossetti Archive--the online essay ''Imagining What You Don't Know'' and the 1997 STS lecture ''Editing as a Theoretical Pursuit.'' They are critical reflections based on what we have discovered about editorial theory and scholarship as a consequence of trying to accommodate it to electronic tools. I do not think it's absolutely necessary, in order to reflect critically on these kinds of undertakings, that one have concrete experience in building them. But I do think that without such experience--because of the journalistic style of the discussion being carried on at all levels--one may easily misunderstand the nature of the problems involved. The scholars I know who are actually working with these tools are rarely ignorant of their limitations and problems. On the contrary. And the most important of these scholars pursue their work exactly because they want to bring enlightenment to the situation. Failure comes to these persons in a regular way and under many guises and masks. In an important sense, failure is what they know best and what they court most faithfully. The positive results they aim for, that is to say, the works they are trying to imagine into reality, have to be built with instruments (software and hardware, abstract and concrete) whose design is as yet unsettled. More significantly still, the instruments that are needed for this scholarly work are being created and modified out of the actual scholarly demands and experiences of these very projects. That result is one of the principal reasons driving people to these undertakings, whose designers are rarely naive about the perils involved.

The development of a tool like TEI is exemplary here. It came out of a rigorous reflection on how SGML could be adapted to the markup of codex materials in libraries.4 The elegance of its design structure was quickly recognized so that it was soon made a kind of standard for marking up library materials and documents. This large-scale use raised the level of our awareness of TEI and exposed its limitations. At first these problems seemed largely local ones that could be handled through modifications that did not call into question the basic structure of the tool. But a problem recognized early on--the problem of concurrency, it is called--gradually acquired greater and greater significance. The problem has now come to cast a clear and important light on the limits of TEI.5 The system's hierarchical organization, inherited from SGML, assumes that ''texts'' are informational structures. Imaginative texts, however--novels, poetry, even certain kinds of philosophical works like those of Wittgenstein, Kierkegaard, Plato, Peirce, and Derrida--are primarily organized as complex orders of concurrent and recursive forms; and these structural forms cannot be easily dealt with in a TEI and an SGML order of things.

Let me hasten to add that scholarly tools organized in TEI and SGML are not therefore benighted or useless as practical ''editions'' or scholarly instruments. They function a priori at vastly higher levels of scholarly usefulness than anything developed in less rigorously analytical forms (say Web-based HTML products). The limitations of TEI/SGML have been important in other ways as well. What does not come naturally to TEI/SGML, however, often represents what scholars and students of imaginative works most want to know or to have clarified, as I shall try to explain more particularly in a moment. Consequently, scholars who adopt a TEI/SGML approach to text markup may often ask their software to perform unnatural acts. Certainly this has been our case in developing The Rossetti Archive.6 We use a straight TEI markup for purely informational materials: for the Archive's various general commentaries, and for some of the standard critical materials that we include in the Archive. For Rossetti's works we fall back to SGML, which we have modified to make it more serviceable to the study of Rossetti's textual and pictorial works.

So, The Rossetti Archive has knowingly accepted an SGML approach to its materials, with all the virtues and vices of that approach. I don't have the slightest doubt that the Archive will prove very helpful to students of Victorian literature and culture, to students of art and poetry, and to students interested in theory of texts and theory of editing. Nor do I have any doubt that its problems and limitations will be clarified beyond what we already know from our experience in building the Archive. These will not be, however, the problems raised in Potter's essay.

II

What, then, are the specific problems and issues we have emphasized, by design or by fate, in undertaking The Rossetti Archive? Let me begin my response to that question by retreating for a moment to reflect briefly on some well-known matters.

The book is an exceedingly flexible invention. As a tool for studying and analyzing texts, it has proved its power for well over a millennium, and moveable type only enhanced its effectiveness. Long use has added the benefit of standardization: people know how books work, and can learn pretty quickly how to use even complex forms of the book, like scholarly editions. Scholars in particular appreciate the value of such standardized forms and procedures, which is exactly why many scholars--Potter is one--take a skeptical view of computerized tools. Why take up with uncertain and unstable instruments (computer hardware and software) when the ones we have work so well? The only sensible answer to that question is that one shouldn't, unless the potential benefits to be gained are so great that the effort is justified in that fact, or that expectation. Furthermore, even if it's clear that the benefits justify the effort, the community of scholars will not and should not adopt these instruments right away. They will come into scholarly use through a gradual process as experimental and exploratory projects test, modify, and standardize the new tools so that they can be easily assimilated into the traditional interests that define the work of scholars in their particular fields.

My own interests are traditional to a degree: I want to be able to study--to analyze and to interpret--imaginative works like novels, poems, and paintings. The documentary record that preserves and instantiates these works--primary as well as secondary documents--is the foundation of everything we do. Electronic tools hold out the promise that we will be able to advance our understanding of these kinds of works if we can learn to exploit computerized technology.7 I decided to work with Dante Gabriel Rossetti because his work raises extreme problems of analysis and interpretation, whether one is working with codex tools or with computerized instruments. The practical question then became: can computerized resources make possible what was impossible with codex resources, such as a scholarly instrument that could ''edit'' for full analysis and interpretation all the works of D. G. Rossetti (his textual works, his paintings and drawings, his book designs, his photography, his stained glass and arts and crafts works)?

Initially we saw our greatest problems in digital images. On one hand, they offered immense opportunities, for one could use the computer to store and retrieve large repositories of visual materials like pictures, photographs, and the physical book itself, and to link these images to related textual materials of all kinds. On the other hand, the information in digital images is not open to the standard kinds of analysis that make computers such apt students of texts. The problem of using computers to analyze and interpret digital images remains formidable, but in privileging this issue, The Rossetti Archive has helped to develop new ways of dealing with these problems, as one can already see in The Blake Archive, a work developed under the auspices of The Rossetti Archive.8

The aptitude of computers for analyzing textual materials allowed us to develop great plans in this area for Rossetti's works. After studying the opportunities offered by both TEI and SGML, we decided that a specially modified SGML design could be built that would allow us to carry out comprehensive collations as well as other search and analysis operations on the entire extant corpus of Rossetti's textual documents (all editions, all proofs, all manuscripts--all to be stored in the database in both SGML and digitized forms). Developing this tool, however, as I noted above, exposed many unanticipated problems. Dealing with the problems in a practical way has led us to begin rethinking the entire question of how to prepare textual materials for search and analysis so that they will be most available to students of imaginative literature.
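The kind of collation at issue can be suggested, in a very reduced form, by the following Python sketch. It is offered only as an illustration and not as a description of the Archive's own SGML-based machinery: it uses the standard difflib module, compares two witnesses word by word, and reports the places where they differ. The function name collate and the two witness readings are invented for the example and are not Rossetti's text.

```python
import difflib
from typing import List, Tuple

# (kind of variant, reading in the base text, reading in the witness)
Variant = Tuple[str, str, str]

def collate(base: str, witness: str) -> List[Variant]:
    """Word-level comparison of two witnesses; returns only the variants."""
    a, b = base.split(), witness.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants

# Invented readings, used only to exercise the function.
WITNESS_A = "she leaned upon the silver rail"
WITNESS_B = "she leaned against the silver rail at dusk"

for variant in collate(WITNESS_A, WITNESS_B):
    print(variant)
# ('replace', 'upon', 'against')
# ('insert', '', 'at dusk')
```

A sketch of this sort compares flat strings of words. It knows nothing of proofs, manuscripts, or page images, which is exactly where the difficulties described in this section begin.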

The problem is easily described if not easily solved. Our standard tools for analyzing text--TEI and SGML protocols, for example--theorize text as nested sets of hierarchical forms. And it is indeed the case that all texts, informational as well as imaginative, deploy those kinds of forms. But while hierarchical forms appear in imaginative texts, often in multiple and overlapping ways, they do not govern such texts, as they do seem to govern informational texts. There is an important sense in which one could define an imaginative or poetical text as a documentary structure organized as a confederated arrangement of concurrent and recursive forms.
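A small Python sketch may make the difficulty concrete. It is a toy under stated assumptions, not part of TEI, SGML, or the Archive: the two verse lines are invented, and the helper names spans_for, crosses, and crossing_pairs are hypothetical. The sketch records two hierarchies over the same text, verse lines and sentences, as character spans, and then lists every pair of spans that overlap without one containing the other. Such pairs cannot be expressed as properly nested elements in a single hierarchy.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label); end is exclusive

def spans_for(text: str, pieces: List[str], label: str) -> List[Span]:
    """Locate each piece in order and return its character span."""
    spans, cursor = [], 0
    for number, piece in enumerate(pieces, 1):
        start = text.index(piece, cursor)
        end = start + len(piece)
        spans.append((start, end, f"{label}{number}"))
        cursor = end
    return spans

def crosses(a: Span, b: Span) -> bool:
    """True when two spans overlap but neither contains the other."""
    overlap = a[0] < b[1] and b[0] < a[1]
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    return overlap and not (a_in_b or b_in_a)

def crossing_pairs(xs: List[Span], ys: List[Span]) -> List[Tuple[str, str]]:
    """Labels of every pair that could not be nested in one hierarchy."""
    return [(x[2], y[2]) for x in xs for y in ys if crosses(x, y)]

# Invented verse: the second sentence runs across the line break.
TEXT = "The river turns. And where it bends\nthe light goes out. We wait."
LINES = spans_for(TEXT, TEXT.split("\n"), "line")
SENTENCES = spans_for(
    TEXT,
    ["The river turns.", "And where it bends\nthe light goes out.", "We wait."],
    "sentence",
)

print(crossing_pairs(LINES, SENTENCES))
# [('line1', 'sentence2'), ('line2', 'sentence2')]
```

Here only two hierarchies are in play. An imaginative text sets many more in motion at once (metrical, rhetorical, figural, and so on), and each additional one multiplies the crossings.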

Such works are most like that fabulous form once used as a metaphor for God, the circle whose center is everywhere and whose circumference is nowhere. Each textual unit in an imaginative structure lies open to any number of formal transactions: grammatical, rhetorical, metrical/rhythmic; generic, figural, intertextual. Each of those formal horizons is itself open to any number of specific instantiations that can function simultaneously, as the well-known phenomenon of poetic ''ambiguity'' illustrates. No markup system could hope to capture even a fraction of those procedural processes, and yet from the student's and scholar's point of view these are exactly the matters that focus our attention and that always will. Indeed, given this situation one can easily see that our received codex-based materials are far more apt for analyzing and interpreting imaginative works than are any computerized instruments yet developed. This is the case because of the complexity and stability of our received literary institution--our system of libraries, museums, schools, publishing venues, and research centers. Books and their users fit into that institutional structure in ways that computers and their users do not (yet). The latter are groping to find their place, and they will become more integrated as time passes--which is to say, as time allows us to discover how to exploit the potential of these new electronic instruments.

We are positioned for those kinds of discoveries by the failure of current standard markup procedures to deal adequately with imaginative works. Something has to be done because scholars of imaginative work will never not be deeply interested in how to analyze and interpret concurrent and recursive forms. At this point at the Institute we are beginning to consider new approaches to the problem of searching and analyzing imaginative texts for those (concurrent and recursive) forms that are their most characteristic features. I believe that AI (Artificial Intelligence) models hold out interesting possibilities. One can imagine building tools programmed to execute on-the-fly searches of textual materials for various rule-governed patterns. Examples of these kinds of instruments can already be found, although none I have seen offers anything of practical usefulness (yet). But they offer procedural models that humanities and literary scholars might well adapt.9 At University of Virginia's Institute for Advanced Technology in the Humanities we have begun to discuss practical designs for computerized analyses and interpretations of poetical texts along these lines.
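What a search for a rule-governed pattern might look like, at its very simplest, can be suggested by the Python sketch below. It is a hypothetical illustration, not one of the designs under discussion at the Institute: the function alliterative_runs and the sample line are invented, and the rule it applies (adjacent words sharing an initial letter) is only a crude stand-in for alliteration, since it looks at spelling and not at sound.

```python
import re
from typing import List

def alliterative_runs(line: str, min_len: int = 2) -> List[List[str]]:
    """Return runs of adjacent words that begin with the same letter."""
    words = re.findall(r"[A-Za-z']+", line)
    runs: List[List[str]] = []
    current: List[str] = []
    for word in words:
        if current and word[0].lower() == current[-1][0].lower():
            current.append(word)          # the run continues
        else:
            if len(current) >= min_len:   # close a finished run
                runs.append(current)
            current = [word]              # start a new candidate run
    if len(current) >= min_len:
        runs.append(current)
    return runs

# Invented line, used only to exercise the rule.
print(alliterative_runs("Slow silver streams and bright birds"))
# [['Slow', 'silver', 'streams'], ['bright', 'birds']]
```

The rule is applied on the fly to unmarked text, so nothing need be decided in advance by markup. The interpretive interest would lie in choosing and varying the rules.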

The difficulty of using computers to analyze and interpret digital images has already led us to try new approaches to those materials. For example, during the past several years the Institute designed and built a piece of software, the so-called Image Tool, that facilitates the analysis of a digital image's visual information through an SGML structure of marked-up metadata.10 The tool was first put to practical use with The Blake Archive and the plan is to develop it much further. At about the same time, while playing with Adobe Photoshop with a friend, I stumbled on an odd and startling potential of image-editing software. One could use its filters and editing protocols to develop interesting lines of analysis and interpretation of digital images.11 It so happened that I was working at the same time with a graduate student in literature who was carrying out physical manipulations of poetical texts for interpretive purposes.12 We have since combined our work in order to write up, in a brief and preliminary way, an explanation of how and why deliberate acts of ''deformation'' carried out on imaginative works (textual as well as pictorial) possess significant potential for explaining and understanding such works. This fairly extensive essay, ''Deformance and Interpretation,'' will appear in New Literary History in 1999.

Electronic tools have interested me most because they help us to understand books and the works that books organize and transmit--most especially imaginative works. They also help us to understand pictures better, and in these improved understandings they have, in my experience, begun to deepen our grasp of general issues of analysis and interpretation. The differentials that separate computers from more traditional texts are enabling ones. So, for instance, paradoxical as it may seem, these machines are already driving us to recover the resources and special precisions of more ''subjective'' forms of analysis and interpretation, forms and procedures that have fallen into disuse under the pressure of neoclassical and scientistic models of thinking and imagining. Impersonal as they seem (and are) in one perspective, computerized tools encourage the deployment of stochastic procedures which call upon individual and idiosyncratic choices. Much of our received philosophy of art and literature is grounded in essentialist thought, as if there were such a self-identical thing as ''the poem itself.'' Computerized environments implicitly argue for dialectical models, and many try to make that implicit frame of mind as explicit as possible.

The first hypertexts were books, as were the first hypermedia works, and the model for the World Wide Web most familiar to us is the library. It helps to remember those things when we try to grapple with the problems of bringing these new instruments into our work.13 We want computers only if they can function at least as well as books and libraries function as instruments for organizing and understanding texts. For the moment they do not function nearly so well. But we can already see how they might. It is part of their immediate usefulness that they are forcing us to imagine how we will shape these inventions to our needs and desires--as an acute student of this general scene has put it, how to ''imagine what we don't know.''14

University of Virginia

NOTES

1 There may be some CD-ROM components attached to the Archive that will facilitate its use in, for instance, a home desktop environment. Nonetheless, the Archive is fundamentally an online system.

2 The special character of The Rossetti Archive can perhaps be gauged by comparing it to three currently available scholarly works: Gregory Crane's Perseus Project (interactive sources and studies on ancient Greece); Peter Robinson's Canterbury Tales project; and The Blake Archive (Joseph Viscomi, Morris Eaves, and Robert Essick, eds.). Because the first two are not online products, the interactive range of their materials is limited. Robinson's edition has been designed for complex textual collations, something that lies far beyond the range of Crane's project, which has minimal analytic capabilities. But Crane's project surpasses Robinson's Canterbury Tales in the range of its materials and its hypermedia structure, and it dwarfs The Blake Archive in the same way. On the other hand, because the latter is an online product it stands ready to grow and develop well beyond its current, relatively modest range. It is also a hypermedia environment, and it has been marked in SGML for complex structured search operations. The Rossetti Archive, in a sense, amalgamates the features of all three of these projects. Of the three, The Blake Archive stands closest to The Rossetti Archive in its design and structure, though the scale of The Rossetti Archive is much larger than that of The Blake Archive (many thousands of SGML files and digital images vs. several hundred). The similarity between the two is perhaps inevitable, since The Blake Archive was designed and developed within The Rossetti Archive's horizon at University of Virginia's Institute for Advanced Technology in the Humanities, where a whole series of ''Archive'' works has sprung up in the wake of the Institute's experience in developing The Rossetti Archive.

3 For the ''Introduction'' to The Rossetti Archive see The Journal of Pre-Raphaelite Studies N.S. 6 (Spring 1997): 22-32 (also at http://jefferson.village.virginia.edu/rossetti/introduction.html); ''The Rossetti Archive and Image-based Electronic Editing'' in The Literary Text in the Digital Age (University of Michigan Press, 1996): 145-84, rpt. in The Journal of Pre-Raphaelite Studies N.S. 6 (Spring 1997): 5-21; ''Imagining What You Don't Know: The Theoretical Goals of the Rossetti Archive'' at http://jefferson.village.virginia.edu/~jjm2f/chum.html; ''Editing as a Theoretical Pursuit,'' TEXT 12 (1998): 1-16.

4 There are many guides and introductions to TEI and SGML. Very useful ones are available from the E-Text Center at the University of Virginia, which also offers helpsheets for VRML and HTML at http://etext.lib.virginia.edu/helpsheets/helpsheets.html.

5 For some interesting critical reflections on TEI and theory of text markup see Allen Renear, ''Text Ontology from Below: The Contribution of Computing Practice to New Theories of Textuality,'' lecture abstract at http://www.cs.ucc.ie/renearTalk.html, as well as his ''Out of Praxis.'' See also Renear, Mylonas, and Durand.

6 I have discussed this matter in a general way in the essay ''Imagining What You Don't Know.'' Here is an example of a typical kind of problem. There is a large corpus of extant proof material relating to Rossetti's 1870 volume of Poems. Most of these are integral proofs, more or less intact, but many are different kinds of proof and/or manuscript assemblages put together by Rossetti and/or other persons, with or without Rossetti's cooperation or knowledge. These heteroglot and wildly non-hierarchical assemblages do not easily submit to an SGML design, least of all to a TEI structure. By manipulating certain standard markup fields, however, one can ''trick'' the markup system into organizing these kinds of documents so that they lay themselves open to structured search and analysis. Or, I should say, one can trick the system so that the marked-up document parses against the DTD. Theoretically, then, the SGML software should be able to process these documents. What is theoretically correct, however, may turn out to be highly problematic in practice. For further discussions of these matters see Barnard, et al. The scholars working on the Wittgenstein Project are attempting a different approach to text markup in order to evade some of the limitations of TEI/SGML (see Huitfeldt).

7 The general case for using hypermedia tools to study literary and imaginative works is set out in ''The Rationale of HyperText.'' It may be helpful to remark here on the relations I entered into with computer specialists when I began work on the Archive. At that point I was fortunate to be thrown in with some very astute engineers, who approached their work with me under the following rule: Tell us what are the kinds of activities you do as a textual scholar, the kinds of problems and questions you're interested in, and we'll tell you whether our computer tools and resources can help you do better what you're already doing.

8 The address of The Blake Archive at the Institute for Advanced Technology in Humanities is http://jefferson.village.virginia.edu/blake/main.html.

9 I mention here two intriguing programs: one that generates English language anagrams from a given word-string, at http://www.infobahn.com/pages/anagram.html; the other, called ''BatMemes,'' which generates transformations of a given text according to several sets of rules (the primary one being derived from the work of the Oulipo group). The transformations are created from a vocabulary of words arbitrarily established--in the case of the demo version I used, the vocabulary of all the words in Bram Stoker's novel Dracula (1897). ''BatMemes'' is a shareware program that can be downloaded from http://www.winsite.com.

10 Information on the Image Tool can be found at http://jefferson.village.virginia.edu/inote/.

11 I give a brief description of this work in the last section of the essay ''Imagining What You Don't Know.''

12 This work was initially explored in a Ph.D. thesis; see Samuels, ''Poetic Arrest.''

13 In this context I must mention Espen J. Aarseth's recent book, Cybertext (1997). It is easily the best exploration we now have of the relation of hypertext and hypermedia to literary work.

14 See Samuels's exploration of this idea in her ''Introduction to Poetry and the Problem of Beauty.''

WORKS CITED

Aarseth, Espen J. Cybertext: Perspectives on Ergodic Literature. Baltimore and London: Johns Hopkins University Press, 1997.

Barnard, David T., Ron Hayter, Maria Karababa, George Logan, and John McFadden. ''SGML-Based Markup for Literary Texts: Two Problems and Some Solutions.'' Computers and the Humanities 22 (1988): 265-76.

Huitfeldt, Claus. ''Multi-Dimensional Texts in a One-Dimensional Medium.'' Ed. Paul Henry and Arild Utaker. Wittgenstein and Contemporary Theories of Language, Working Papers from the Wittgenstein Archives at the University of Bergen, No. 5 (1992): 142-61. Rpt. in Computers and the Humanities 28 (1994): 235-41.

Renear, Allen. ''Out of Praxis: Three (Meta)Theories of Textuality.'' Ed. Kathryn Sutherland. Electronic Text: Investigations in Method and Theory. Oxford: Clarendon, 1997. 107-26.

Renear, Allen, Elli Mylonas, and David Durand. ''Refining Our Notion of What Text Really Is: The Problem of Overlapping Hierarchies.'' Ed. Nancy Ide and Susan Hockey. Research in Humanities Computing 4 (1996): 263-80.

Samuels, Lisa. ''Introduction to Poetry and the Problem of Beauty.'' Modern Language Studies 27.2 (Spring 1997): 1-7.

------. ''Poetic Arrest: Laura Riding, Wallace Stevens, and the Modernist Afterlife.'' Diss. Department of English, University of Virginia, 1997.
