Laura White is a renowned expert on Jane Austen. However, she has chosen a novel approach to this classic British icon. The Nebraska professor is an innovator in the emerging field of digital humanities, and studies literature by means of a computer. The rigid divide between human creativity and the world of binary computer code is quickly being bridged, according to Google. I had a few words with Professor White about this new branch of study, and what it means for writing and literary studies.
Historyradio.org: You have studied Jane Austen. Have you discovered something new about her, something we couldn’t have discovered without the use of a computer?
Professor White: Yes, I think so. What we did was identify (code) each and every word in the six major novels as to speaker. That’s easy to do for the narrator and character speech, but trickier with free indirect diction, when the narrator is “speaking for” a character, using his or her vocabulary and point of view. Such shared speech we coded as such, and weighted to reflect the depth of ventriloquism. The results are not yet fully known, because what we created is a public sandbox in which people can design their own searches about diction to use the coding we created—there is a lot waiting to be discovered. But at the very least we found that Austen’s use of free indirect discourse (and she was the first major novelist to exploit FID fully) was far more complex and varied than we (and all the scholars writing on the subject thus far) had suspected. We also have found some cute nuggets—for instance, the fact that no male character in Austen uses the word “wedding” and no female character uses the word “marriage”!
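The kind of search described above, where every word is attributed to a speaker, shared speech is weighted, and the vocabulary of male and female characters is compared, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tokens, the weights, the `GENDER` lookup and the function names are all invented for the example, and do not come from the project's actual coding or data.

```python
from collections import defaultdict

# Hypothetical toy data, NOT taken from the novels or the project's coding.
# Each token carries weighted speaker attributions: 1.0 means one speaker
# owns the word outright; shared (free indirect) speech splits the weight
# between the narrator and a character.
TOKENS = [
    ("marriage", {"Darcy": 1.0}),
    ("wedding", {"Lydia": 1.0}),
    ("happiness", {"narrator": 0.4, "Elizabeth": 0.6}),
    ("happiness", {"Darcy": 1.0}),
    ("wedding", {"narrator": 0.5, "Lydia": 0.5}),
]

# Hypothetical lookup; the narrator is deliberately left out.
GENDER = {"Darcy": "male", "Lydia": "female", "Elizabeth": "female"}


def weighted_counts_by_gender(tokens, gender):
    """Sum attribution weights per word and per gender group."""
    counts = defaultdict(lambda: defaultdict(float))
    for word, attributions in tokens:
        for speaker, weight in attributions.items():
            group = gender.get(speaker)
            if group is not None:  # skip speakers with no gender entry
                counts[word][group] += weight
    return counts


def exclusive_words(counts, group):
    """Return the words that only one gender group ever uses."""
    return sorted(w for w, by_group in counts.items() if set(by_group) == {group})


counts = weighted_counts_by_gender(TOKENS, GENDER)
print(exclusive_words(counts, "male"))
print(exclusive_words(counts, "female"))
```

On this invented data, a word counts as exclusive to a group when no character of the other group has any share in it. A real search would also have to decide how much weight a shared passage needs before it counts as a character's own word.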
Historyradio.org: When you began your studies of Austen, did you have to create your own methodology?
Professor White: We had to create coding that would properly reflect the complexity of Austen’s speakers: was the speech spoken or written? How many levels of speaker are in a given phrase (in one letter, for instance, we have the string of Mrs.Younge-told-Darcy-told-Mr.Gardiner-told-Mrs.Gardiner-told-Elizabeth-told-reader). But the flexible marvel of .xml already existed, and even more importantly, the program TokenX, designed by our team member Brian Pytlik Zillig (Professor at UNL’s Center for Digital Research in the Humanities), was at our service. TokenX determines unique frequencies of words and thus provides an easy-to-use interface for text analysis (especially through frequency tables) and visualization.
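To give a rough idea of how speaker coding in XML can feed a frequency table, here is a minimal Python sketch. The element name `said`, the attributes `who` and `mode`, and the sample sentences are assumptions made up for this illustration. They are not the project's real schema, and the code does not reproduce how TokenX works.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: <said>, who= and mode= are invented for this sketch.
SAMPLE = """
<chapter>
  <said who="narrator" mode="narration">It was a fine morning.</said>
  <said who="Emma" mode="spoken">What a fine, fine day!</said>
  <said who="Emma" mode="written">I hope the day finds you well.</said>
</chapter>
"""


def frequency_table(xml_text):
    """Count words per (speaker, mode) pair in speaker-coded XML."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for node in root.iter("said"):
        key = (node.get("who"), node.get("mode"))
        # Lowercase the text and keep runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", "".join(node.itertext()).lower())
        table.setdefault(key, Counter()).update(words)
    return table


table = frequency_table(SAMPLE)
for (who, mode), counter in table.items():
    print(who, mode, counter.most_common(3))
```

Keeping the speaker and the spoken-or-written distinction as separate attributes means the same text can be regrouped for different questions without recoding it. Nested chains of reported speech would need a richer structure than this flat example.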
Historyradio.org: I have heard of similar studies on Agatha Christie, and that they were able to create a profile of her style. Is this your goal with Jane Austen?
Professor White: You can’t actually get to a full knowledge of Austen’s style, even by understanding her patterns of diction, because verbal irony (and its reverberations) can’t be caught mathematically—and her verbal irony is pervasive. But you can learn a lot about her use of free indirect diction, which is in turn important to understanding her style. One could make a profile about percentages of indirect diction, dialogue, and so on—but that would only be helpful to compare with other writers—or, using big data searches, comparing that data against the profile of such a thing as “the eighteenth-century British novel” or “Henry James” (the latter being an author who took Austen’s innovations with FID and ran with them about as far as they can be run). Our project may indeed do such things in the future—it’s the next obvious step.
Historyradio.org: If you had something resembling a profile, not only of her choice of words, but of the larger patterns in her plot construction, do you think a computer could emulate Austen? Could it produce a fake Austen, so to speak?
Professor White: You could perhaps create an Austen that could fool some people, but it wouldn’t be a good fake. Unless you can feed in her values (not possible) AND her education, including but not restricted to her reading (difficult) AND the operations of her irony (not possible), you’ll just get a partial simulacrum.
Historyradio.org: This new approach could be used to compare authors, and then detect larger patterns in literary and cultural history. Austen is of course a central figure in the development of the English novel. In the past, this has been studied by Ian Watt and others. Do you think we now could have a more empirical history of literature?
Professor White: I do think we can have more data that tells us interesting things about patterns of diction and clusters of tropes across large bodies of texts—a lot of people at UNL, such as my colleagues Steve Ramsay and Matt Jockers, do work on just that sort of thing. Matt for instance has very recently uncovered a lot of information about patterns among popular fiction, especially bestsellers. If we can design the right questions, we can find some interesting answers. But as I pointed out before, huge literary elements such as irony can’t be reckoned computationally, so a Theory of All Lit from digital humanities is impossible.
Historyradio.org: Gillian Beer, Arthur O. Lovejoy and others have specialized in detecting patterns from the history of ideas in fiction. Can a computer assist us in this type of study?
Professor White: This kind of work is my favorite kind of scholarship to read, that which finds the largest patterns in imaginative literature over the centuries. I’d recommend your readers go to Northrop Frye’s Anatomy of Criticism (1957) for the best of such work; Erich Auerbach’s Mimesis (1946) is also marvelous. For finding the largest patterns in the Bible, read Frye’s The Great Code (1982) (admittedly very demanding). And, yes, to some degree, computers can help with this kind of work, especially with discerning patterns of diction and plot (though in the latter case obviously the text won’t tell you its own plot; a human being has to schematize what happens and feed that information in).
Historyradio.org: Where do you see the field of digital humanities in 20 years or so?
Professor White: Moving upward and onwards. By the 90s, the humanities had been somewhat exhausted by following the usual roads of literary criticism. I don’t generally advise students to focus on Jane Austen, for instance, because it’s very hard to find room for an original thought. One way the humanities are being revitalized is with a much more stringent attention to history, and digital humanities plays a role here too by making it easy to read texts long forgotten, literary and otherwise. For instance, in my recent book on Carroll’s Alice books, I made much use of the texts in Carroll’s library of about 3,000 volumes. They were all auctioned off at his death, but catalogs of the library which have been produced by Jeffrey Stern and Charlie Lovett let one read his library cover to cover through Google Books, HathiTrust, and other such digitization initiatives. Read in detail, this virtual library corrects many of the critical and biographical misperceptions about Carroll. And these resources are just a small part of how digital humanities is transforming literary studies: visualization, archives, data-mining all play a part.
Historyradio.org: Some people think that creativity is unique to us as humans, and may feel threatened by the fact that our “cultural soul” is gradually dissected by computers. What do you say to them?
Professor White: I’d agree that creativity is unique to humans, though some of the higher apes do seem to like to finger-paint. Computers can’t be creative—it isn’t possible. They can be programmed to make wild outputs, and we might think creative thoughts about those generated outputs, but there’s no creativity on the part of the computer involved. We are more threatened by computers in terms of surveillance; we are not at all private when we’re online, and big data (which doesn’t care about us as individuals) can nonetheless potentially be retrofitted to be small data, fingering us one by one. So people are right to worry about this—since human beings are in charge of computers, it is very unlikely that they will always be used for good (no other human invention has been).