Digital Humanities

Text Visualization with Paper and Yarn

“I am quite content to go down to posterity as a scissors and paste man for that seems to me a harsh but not unjust description.”—James Joyce

So, let’s say you took Gabler’s edition of Ulysses, photocopied each page of “Wandering Rocks” (episode 10) at 50% of its normal size, and then taped them all together. You now have one long piece of paper. Cut at the breaks between the nineteen sections of which the episode is composed and you have nineteen pieces of paper—one for each of the episode’s sections. The sizes of these pieces, of course, would vary; the first (describing Father Conmee’s walk and trip on the tram) would be the longest.

Next, grab some yarn and some paper clips (because they’re handy). Cut some short lengths of yarn. Tie a paper clip to each end. Now let’s have a look at the second sheet (“Text Interruptions”) in this Google Doc, which contains a list of moments in “Wandering Rocks” where a recognizable line from one section intrudes into another. Clip one end of your paper-clip/yarn device to the line where a reference occurs; connect the other end to the passage referenced. (The Gabler edition has line numbers; that is chiefly why we’re using it.)
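
Clipping the yarn is, in effect, a lookup: for each line number in the “Text Interruptions” sheet, find which of the nineteen sections contains it. Here is a minimal Python sketch of that step, assuming each section is recorded only by its starting line. The section boundaries and interruption pairs are invented placeholders, not Gabler’s actual line numbers.

```python
from bisect import bisect_right

# Hypothetical starting line of each section (placeholders, NOT Gabler's numbers).
SECTION_STARTS = [1, 206, 223, 266, 315]

def section_of(line, starts=SECTION_STARTS):
    """Return the 1-based number of the section containing `line`."""
    if line < starts[0]:
        raise ValueError(f"line {line} precedes the first section")
    # bisect_right counts how many section starts are <= line.
    return bisect_right(starts, line)

# Hypothetical interruptions: (line where a fragment appears, line it echoes).
INTERRUPTIONS = [(213, 90), (290, 230)]

links = [(section_of(a), section_of(b)) for a, b in INTERRUPTIONS]
print(links)  # [(2, 1), (4, 3)]
```

With only nineteen sections a linear scan would do just as well; bisecting a sorted list of starts simply keeps each lookup cheap.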

When all is said and done, with some variability depending on how exactly you connect things, you might have something that looks like this:

Wandering Rocks, Visualized in Paper

(Check out this Flickr image for an annotated version of the same image.)

This is a sort of basic visualization of the connections between the sections of “Wandering Rocks.” Using scissors and some basic office supplies you can begin to get a grip on how, precisely, the various sections of the episode relate to one another.

This only visualizes, however, one of the ways in which these sections are connected. Certain characters, for example, help synchronize the sections by appearing in more than one of them: this is not visualized (perhaps we could highlight proper names which occur in more than one section, or connect them with a different colored thread). Location also helps synchronize the sections: Bloom, Stephen, and Stephen’s sister all appear at the book stall, for instance. Maybe we could lay the nineteen pieces of paper out on a map (but how would we handle the sections where characters are moving?). You’ll notice that I haven’t even tried to make sense of the final, nineteenth section, where many of the characters see the cavalcade as it moves through Dublin (its movement appears in a number of other sections too). Have a look at the Google Doc to see my raw data; if you think you can improve it, email me (cforster @ virginia.edu) and I’ll happily add you as an editor for the document.

It is also worth remembering that the chief unit of analysis here is the “line” in the Gabler edition. But not all “lines” are equal in terms of the time they narrate. So you can line up the sections based on the points of synchronization within them, but each of these provides only a single point of synchronization; you cannot extrapolate beyond that point.

There isn’t too much to be learned from this very basic attempt to get a handle on the complexity of this episode. But it does seem interesting that sections tend to branch out, rather than, for instance, many sections all referring to one section (though this is precisely the situation of the final section, which I have ignored; and, in another sense, of Father Conmee’s walk, which, through its geographical progression, may relate to other sections in ways I have ignored).

This yarn stuff is fun, but wouldn’t it be nice to have this digitally? Let’s take this and do it in Processing. In trying to code up this same visualization, I think the chief lesson of playing with yarn is that there are essentially two key types of objects for this analysis: chunks of text (representable as rectangles of length proportional to the amount of text they represent); and flexible connections between parts of the text (not necessarily between sections: a link could, theoretically, be within a single section).

These two types of things were instantiated as two basic classes in my code: textChunk and connection. A textChunk contains its starting line and ending line, its length (computed as the difference between those first two pieces of data; I keep it on board rather than re-computing it constantly), and a quick description (stored as a String); each textChunk object also contains the coordinates of its current location on the screen. The connection objects similarly contain the points they link together (stored simply as two integers representing the two line numbers that are linked; we don’t even need x,y coordinates, since we’re working with a basically one-dimensional representation of the text here). There are also a handful of methods for these objects: constructors to load up the data (though the way the data is currently stored and loaded is an embarrassment), some methods to draw the objects, etc.
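
The two classes might look something like the following Python analogue of the Processing originals. This is a sketch, not the actual code: the names `PIXELS_PER_LINE`, `contains`, `point_for_line`, and `endpoints`, the left-to-right layout, and the sample line numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

PIXELS_PER_LINE = 0.5  # hypothetical scale: screen pixels per line of text

@dataclass
class TextChunk:
    start_line: int
    end_line: int
    description: str
    x: float = 0.0  # current screen position, changed when the block is dragged
    y: float = 0.0
    length: int = field(init=False, default=0)

    def __post_init__(self):
        # Computed once and kept on board rather than recomputed every frame.
        self.length = self.end_line - self.start_line

    def contains(self, line):
        return self.start_line <= line <= self.end_line

    def point_for_line(self, line):
        """Screen position of a line, assuming the chunk is drawn left to right."""
        return (self.x + (line - self.start_line) * PIXELS_PER_LINE, self.y)


@dataclass
class Connection:
    # Just the two linked line numbers; screen positions are derived when drawing.
    line_a: int
    line_b: int

    def endpoints(self, chunks):
        """Return the current screen positions of both linked lines."""
        def locate(line):
            for chunk in chunks:
                if chunk.contains(line):
                    return chunk.point_for_line(line)
            raise ValueError(f"no chunk contains line {line}")
        return locate(self.line_a), locate(self.line_b)


# Hypothetical line ranges for two sections.
chunks = [TextChunk(1, 200, "Father Conmee", x=0, y=0),
          TextChunk(201, 240, "Corny Kelleher", x=50, y=40)]
link = Connection(213, 90)
print(link.endpoints(chunks))  # ((56.0, 40), (44.5, 0))
```

Because a connection stores line numbers rather than coordinates, dragging a chunk only changes that chunk’s `x` and `y`; the yarn “stretches” the next time `endpoints` is called.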

Here is what it looks like, comparing my yarn visualization with my version in Processing (not too bad, huh?):

Two Visualizations Compared: One Paper, One Digital

(In mapping things out, I got some of the inspiration here from my friend & colleague Jean Bauer’s much more sophisticated tool for visualizing relational databases, Davila, also written in Processing. Originally I was simply going to gut her code and repurpose it here, but her code is far more elegant than mine and is designed for far more complex situations. It made sense to just start from scratch.)

Each object bridges the gap between what it represents (which remains basically static) and the current state of its representation (which can be moved around and interacted with).

Wandering Rocks Visualized

The interactions are basic. You can grab each textChunk and move it around manually. Hovering over a block will produce a little description of that block in the white section near the bottom of the window. You can hit ‘a’ and the blocks will automatically align. That function isn’t working perfectly yet, so I had to massage things manually a bit to get them to look as they do above.

But as you move the blocks around, the connections stretch and keep the links between the sections evident. The blocks lined up on the right-hand side are those without connections. (Oh yeah, those curved white lines: they’re the beginning of an attempt to mark the skiff’s progress.)
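
One way such an automatic alignment could work (a guess at the idea, not the code behind the screenshots) is to treat chunks as nodes and connections as edges, then walk outward from each chunk, shifting each neighbor so that the two linked lines fall at the same horizontal position. Chunks with no links come back separately, to be parked off to the side. Since each link is only a single point of synchronization, the sketch trusts the first link it reaches for each chunk and ignores later, possibly conflicting ones. The dictionary layout and the sample line ranges are hypothetical.

```python
from collections import deque

def align(chunks, connections):
    """Offset each chunk (in lines) so that both ends of a link line up.

    chunks: dict mapping a section name to its (start_line, end_line)
    connections: list of (line_a, line_b) pairs
    Returns (offsets, unconnected).
    """
    def owner(line):
        for name, (start, end) in chunks.items():
            if start <= line <= end:
                return name
        raise ValueError(f"no chunk contains line {line}")

    # Undirected graph: each edge remembers which line it links on each side.
    edges = {name: [] for name in chunks}
    for a, b in connections:
        ca, cb = owner(a), owner(b)
        edges[ca].append((cb, a, b))
        edges[cb].append((ca, b, a))

    offsets = {}
    for root in chunks:
        if root in offsets or not edges[root]:
            continue
        offsets[root] = 0  # each group of linked chunks starts at offset 0
        queue = deque([root])
        while queue:
            here = queue.popleft()
            for there, line_here, line_there in edges[here]:
                if there in offsets:
                    continue  # already placed by an earlier link; ignore this one
                # Position of a line = chunk offset + distance from chunk start.
                pos = offsets[here] + (line_here - chunks[here][0])
                offsets[there] = pos - (line_there - chunks[there][0])
                queue.append(there)

    unconnected = [name for name in chunks if not edges[name]]
    return offsets, unconnected


# Hypothetical section ranges and links (not Gabler's numbers).
sections = {"1": (1, 205), "2": (206, 222), "3": (223, 265),
            "4": (266, 314), "5": (315, 400)}
offsets, parked = align(sections, [(213, 90), (290, 230)])
print(offsets, parked)  # {'1': 0, '2': 82, '3': 0, '4': -17} ['5']
```

Separate groups of linked chunks each start from offset 0, so they still need to be stacked at different heights on screen.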

There would be other ways to begin visualizing “Wandering Rocks” (and I’d love to hear suggestions). And there are certainly ways to improve this one. One could attach the entire text of each section (it’s available through Project Gutenberg, y’know), though I’m not sure what the advantage of doing that would be. The colors just alternate now (for odd and even sections), to avoid sheer monochromatism. But the color of the textChunk could be tied to location or character; similarly, the color of the connection could be made meaningful in some way.

I may post the code if I can get it cleaned up enough; if you’d like to see it in its current state, just email me, and I’ll chuck you a tar ball with everything as it stands.

What we’re playing with here is the tension between narrated time and narrative time. This neglects the entire dimension of space, which is central to the text of “Wandering Rocks” itself. In the comments of my previous post, crazymonk pointed to these maps from the wonderful Robot Wisdom site, which is full of interesting Joyce material. The next step on this odd little project will be to continue to improve this visualization with an eye to moving towards a mapped visualization of the action of the episode. The simultaneity I’ve been trying to visualize here is directly connected to the way the episode attempts to unify diverse locations. Bringing together a basic geographical representation of the episode’s action (and the action of the novel) with the concerns I’m tinkering with here would allow this visualization to move from merely playing to something else, I think… Of which, more anon (or, anonish).

This is the first of (at least) two posts on Joyce, Ulysses, and simultaneity.

Throughout Ulysses, moments of synchronization occur which allow the reader to align the narrated events within the sequence of narrative time. Perhaps the first of these is the cloud which passes over Stephen Dedalus in the first episode and reappears in the fourth, allowing us to synchronize the events in the Stephen narrative with those in the Bloom narrative (references are to the Gabler edition, episode & line numbers):

  • A cloud began to cover the sun slowly, wholly, shadowing the bay in deeper green. (1.248 – 49)
  • A cloud began to cover the sun slowly, wholly. Grey. Far. (4.218)
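
Spotting the repeated cloud is a matter of a reader’s memory, but verbatim echoes of this kind can also be looked for mechanically. Here is a rough Python sketch, assuming the passages are available as plain text: collect each passage’s word sequences of length five (an arbitrary choice) and intersect them. Run on the two lines quoted above, it recovers the shared phrase; it would, however, miss any echo that is paraphrased rather than repeated word for word.

```python
import re

def word_ngrams(text, n):
    """Set of lower-cased word n-grams in `text`, ignoring punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_ngrams(a, b, n=5):
    """Word sequences of length n that occur verbatim in both passages."""
    return word_ngrams(a, n) & word_ngrams(b, n)

telemachus = ("A cloud began to cover the sun slowly, wholly, "
              "shadowing the bay in deeper green.")
calypso = "A cloud began to cover the sun slowly, wholly. Grey. Far."

shared = shared_ngrams(telemachus, calypso)
print(len(shared))  # 5
```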

The “Wandering Rocks” episode, however, is surely the most extreme such instance. Its nineteen sections follow different characters around Dublin during a single period of time. (The time covered in each of the sections, however, is surely not equal or isomorphic.) Certain figures move through the episode, allowing one to place the events in one section in relation to other sections: the wanderings of the “mad” (quick, someone, grab Madness and Civilization) Cashel Boyle O’Connor Fitzmaurice Tisdall Farrell, or the progress of the crumpled piece of paper stating “Elijah is coming” as it wends its way down the Liffey, or the blind stripling (who will continue to play this synchronizing role in the next episode as well). These characters’ movements provide some anchors. Places also help the reader begin to make sense of the moving set of characters. Both Bloom and Stephen, for instance, end up at the same book stall (Stephen arriving after Bloom has departed). And if you knew the geography of Dublin’s streets, one imagines, you could locate the characters in relation to one another even more precisely (about which more in my next post). The most radical and disorienting technique of synchronization, however, is the intrusion of small pieces of one section into another, which gives the reader a sort of anchor or bookmark (or even link) indicating that these two events are occurring simultaneously.

While the device of these fragments from one section appearing in another, divorced from their proper context, may be a bit disorienting, the effect itself feels quite familiar. Though unusual in prose narrative, this sort of thing is common in film and television. What better way to wrap up a one-hour crime drama than to cut in sequence between the criminal doing his perp-walk, the detective crossing his/her arms over his/her chest in satisfaction, and the victim doing some everyday activity which suggests the closure they now feel? (I am thinking here, primarily, of shows like The Mentalist or Cold Case, rather than, for instance, the Law and Order franchise.) A related example, too complicated to really fold into this discussion, is the infamous scene in Magnolia which plays with these conventions by offering a comparable survey of characters in a moment of time, complicated by the fact that the characters break the film’s diegesis in order to sing along to Aimee Mann’s “Wise Up.”

In film & television we understand intuitively the meaning and purpose of this device. “Ah, we’re seeing all these characters while they play this pop song so that I know the episode is ending, and all is right with the world again.” But what is its function in Ulysses and in “Wandering Rocks”? Nabokov, in passing, compares “Wandering Rocks” to the famous “agricultural fair” scene in Madame Bovary, where Flaubert juxtaposes the fair with Rodolphe’s wooing of Emma. That sort of juxtaposition, however, seems (to continue my tenuous film analogies) more like the crosscuts between the baptism and the murders in The Godfather (the link is to the Spike TV page, the only place I could find this scene online).

Temporal simultaneity is not really the object of these contrasts though; instead they contrast the content of what is juxtaposed: Rodolphe’s lofty Romantic rhetoric and the crude materiality of the fair; the high ceremony of the baptism with the violence of Michael Corleone’s power grab. The 19 mini-narratives of “Wandering Rocks” do not serve to highlight some relationship between what happens in each of them.

Nor are these “Nineteen Ways of Looking at a City”; they are one attempt to bring a single massive object within reach of apprehension. In some novels (an epistolary novel like The Expedition of Humphry Clinker, or Faulkner’s As I Lay Dying or even The Sound and the Fury), one sees the same event from different perspectives; the effect can be comic or tragic. But the focus is ultimately on character and interpretation. This is not the case, however, in “Wandering Rocks.”

Instead, there is a sort of brute empiricism to the audacity of “Wandering Rocks.” It tries to occupy a sort of God’s eye view and simply report everything that is happening. In my reading, trying to keep track of where each character is located and what they are doing (and when) is like trying to troubleshoot a complex mechanism. You try to hold within your understanding the many different states of the mechanism, in order to follow its logic and diagnose some problem.

And maybe this “mechanical” feeling isn’t totally ungrounded or irrelevant. One source for Joyce’s interest in trying to capture simultaneous actions (perhaps) lies in technological developments. Consider a far less radical instance of Joyce’s interest in simultaneity: the close of “The Dead.” As Gabriel Conroy looks out the window after hearing his wife’s recollection of Michael Furey, Joyce writes in the story’s concluding paragraph (quoted in part):

It had begun to snow again. He watched sleepily the flakes, silver and dark . . . Yes, the newspapers were right: snow was general all over Ireland. It was falling on every part of the dark central plain, on the treeless hills, falling softly upon the Bog of Allen and, farther westward, softly falling into the dark mutinous Shannon waves. It was falling, too, upon every part of the lonely churchyard on the hill where Michael Furey lay buried.

Here too Joyce is interested in simultaneity: in using a single moment to bring together different locations. Here, of course, the snow which falls alike on the “living and the dead” brings together Gabriel Conroy with the memory of Michael Furey, upsetting Gabriel’s sense of himself and his marriage in the process. But Gabriel’s ability to experience this connection to Michael Furey’s grave is mediated by the newspapers which forecast “snow… general all over Ireland.” Weather forecasts were themselves a product of the use of telegraph communications to pass along observations about weather conditions elsewhere. It has been suggested (I think; no reference ready to hand) that the telegram offers one source for the modernist fragment. Equally important, however, is the recognition of the changed spatio-temporal awareness introduced by the telegraph, a phenomenon whose manifestations may not always be obvious. (Are such concerns, we might ask, registered in “Proteus,” where Stephen wonders about “space[s] of time” and “times of space,” trying to reconcile Kant’s two forms of pure intuition?) In “Wandering Rocks” the spatio-temporal collapse effected by the telegraph becomes the object of the prose itself, which in its dislocated fragments telegraphs points of synchronization between the nineteen sections.

(So, yeah; that last paragraph was some pretty reductive technological determinism. You can figure out your own caveats, right?)

Outside prose, of course, the representation of simultaneous relationships that Joyce attempts in “Wandering Rocks” is much easier. In the next post (promised, dear reader, before week’s end) I’ll share my bumbling attempt to do precisely that in Processing. Until then, two screenshots of my still-in-progress attempt.

Visualization 1 of "Wandering Rocks"

A first visualization.


Visualization 2 of "Wandering Rocks"

A second visualization; this one interactive!

Tux knocks Caravaggio's St. Paul from his Horse.

This month Ubuntu will release version 10.04, Lucid Lynx. (Ubuntu versions are released on a six-month cycle, and version numbers are [year].[month]; they’re also alliteratively named after animals.) The name is at least somewhat unfortunate for confusing the namespace of the text-based web browser Lynx (just fyi, you can use Lynx with Twitter if you use the mobile version; very cool). For most folks this is not a huge deal. But for a Linux user who relies on Ubuntu for nearly everything, it’s a happy day indeed. (I also use CrunchBang, an Ubuntu variant, on my eee and straight Debian on a beat-up old Pentium III for various things.)

To celebrate, or commemorate, or at least recognize this, I thought I’d write up a quick(ish) account of my own conversion to Linux: the tale of how a somewhat geeky humanities grad student fares relying exclusively on Linux. I know from Twitter that I am certainly not the only person with a humanities background who runs Linux. And some of you other folks are probably much savvier Linux users than I am. Having run nothing but Linux for more than three years now, however, I think I can offer an honest assessment of it as a desktop experience. Despite what the image above (created with five minutes of work in GIMP) may suggest, I don’t have a Pauline zeal for converting non-Linux folks. If anything, over three years, I’ve grown more pragmatic in my attitude towards operating systems. I use Flash (rather than Gnash), I listen to mp3s (rather than oggs): I have chosen, in short, pragmatism and usability over virtue. What follows is an honest assessment of my experience from the perspective of such pragmatism. It is not an argument in favor of the virtues of free software (which I trust you can find better articulated elsewhere), nor an entry into the debate between “free” and “open-source,” nor into the larger debate between proprietary (read: Apple) and other models. I’ve run Linux exclusively for three years. I love it, despite some of the blemishes which remain from an end-user perspective. Let me try to explain why.

Linux is gaining no small amount of attention because of Google’s Android OS and Chrome OS, both of which are based on Linux. (And plenty of other stuff runs Linux, from devices like the Roku Box to D-Link’s Boxee Box to many more.) My own love affair with Linux, though, started when we were both a little younger. And it began with rather disastrous results. It must have been the mid-1990s, and I had received the massive Linux Bible as a holiday gift. I read about installing LILO as a boot-loader, and with that little bit of knowledge I set about trying to install Yggdrasil Linux on the family computer. I was entirely too successful. After an hour of work the family computer would boot Linux… and nothing else.

For the sake of the other computer users with whom I then shared a home, I took some time off from Linux. After high school I was briefly (for exactly 2 quarters) a computer science major at Worcester Polytech. The computer I brought with me to my dorm was a trash-picked 486. Over the course of that year I tinkered and learned enough to finally get X-Windows running on it. That’s it: plenty of tinkering and reading about xorg.conf, and I never even had internet access on the thing (which, at the time, was a separate cost from the dorm room itself anyway).

After that I let Linux fall by the wayside. But a little more than 3 years ago my desktop stopped booting after a motherboard failure (an exploded capacitor). I managed to fix my desktop computer (new motherboard; I wish I were cool enough to repair the motherboard by simply replacing the popped cap) only to realize that I had long ago lost my Windows XP disk and that the new motherboard and the installed OS were not playing nicely together. So, I figured, why not try Linux again, if only until I could dig up an XP disk? A quick download of Ubuntu later, and I was booting a live CD. That popped capacitor may very well have been my road to Damascus experience. The installation was easy, and nearly everything just worked (“nearly everything”: I was using a PCI WiFi card with what I would learn was a Broadcom chipset, which was a bit of a hassle…). The computer was soon running better than it ever had, and for a while I talked about the virtues of Linux with everyone I bumped into. I even got into the unfortunate habit of digging up obscure old hardware just for the fun of installing Linux on it.

That early enthusiasm has waned and been tempered by some of the drawbacks I’ll mention below. But Linux’s virtues are many.

  • Linux is Fun—If you feel like spending time customizing your desktop, Linux offers more options than either Windows or OSX; because the GUI is separate from the OS itself, there is a plethora of options: the standard desktop environments, Gnome and KDE; lighter-weight alternatives like Xfce, Fluxbox, and Openbox; or truly minimalistic options like ratpoison. More generally, if you are at all geeky, it is fun to run Linux. The amount of under-the-hood stuff available to tinker with makes Linux a real hobbyist OS. When something doesn’t work, you can always find a reason, if not a solution. (For instance, I am running 32-bit Ubuntu on a 64-bit Athlon chip, and right now, at least, I can’t run Adobe AIR apps, like TweetDeck, because of some obscure problem.)
  • Linux is Free (as in Beer)—Um, there isn’t too much to this one. As an end-user, download and go. $0.
  • Linux is Free (as in Speech)—This is probably the best reason to run Linux. The precise politics of OSS strike me as rather ambiguous: an odd mixture of ideologies that can alternately sound libertarian or socialist. Of course, Linux is a big tent, and I think it is impossible to ascribe any coherent “politics” to an operating system. The tension is perhaps well captured in the figures of Richard Stallman (who advocates free software on grounds I don’t think it excessive to describe as deontological, and with a zeal that approaches the religiousness I have taken as my playful conceit here) and Eric S. Raymond, whose essay The Cathedral and the Bazaar offers an apologia for open source in terms that are more utilitarian and market-oriented.
  • Linux works—Linux is very stable and wonderfully snappy. With the exception of a few blips during installation, I’ve never had any of my Linux machines crash or become completely unresponsive. (I have had occasion to ctrl-alt-backspace out of an x-server that wasn’t… living up to expectations; often when playing with linux eye candy). Software installation on Linux is, unless you have to build something from source (which is quite rare these days), absolutely wonderful. Debian’s APT is the way all software installation should be done. And the amount of software available to tinker with, for free, is pretty impressive.

Which brings me to the drawbacks. Linux is a great choice. But it probably isn’t for everyone. I’ve heard others describe Linux as the ideal OS for folks on either end of the computer use spectrum—serious geeks and absolute neophytes. The former will be empowered by Linux; the latter will have a stable, unbloated OS to just get things done. But, according to this argument, a large chunk of users at the center of the bell curve, from basically well informed folks to so-called “power users,” will just find Linux frustrating. Consider my parents: I often wish that my mother, who is using a very old laptop to email and browse the web, were running a stripped down Linux rather than Windows XP. Her computer is almost unusably slow; with Linux it could be peppy and wonderful and she could do all the things she currently does with greater ease. But my father, who relies on various pieces of software for his job and has a long established familiarity with Windows, would find the experience maddening.

I think there is more than a little truth in this suggestion. But I think it may obscure as much as it illuminates. Let me try to unpack it a little.

Much of the frustration a “power user” (that term just drips marketing-speak) is likely to find in Linux comes from the software that is (not) available on Linux. If you want to use Photoshop, no luck. If iTunes organizes your life, you’re out of luck (though for my money Amarok 1.4 is a much better music player/organizer anyway). If you use any of those nifty pieces of Mac software (DevonThink, Things, Scrivener… the sort of stuff I am always envying when I read about it on Prof. Hacker), well you’re out of luck too. Same goes for Microsoft Office (and though it may sound like heresy, I quite like Microsoft Office 2007). (Oh, and I won’t even touch the issues with games on Linux…)

That said, the software available on Linux is more than sufficient for all my purposes. I use Chrome and Firefox for my browsers; I use OpenOffice for most of my word processing and spreadsheet needs; for music I am a partisan of Amarok version 1.4, which works very nicely for podcasts and plays nice with my iPod (which was a gift; it is a great little device, but in the future I would choose a non-Apple device… though there is always iPodLinux or Rockbox). Skype works without a hitch (nearly; my webcam color goes wonky at random intervals), as do the Amazon MP3 downloader and the Jungle Disk client. F-Spot is a great photo management program (though, to be honest, I’ve never really used anything comparable on another OS). Tomboy Notes is great, as is Tasque (which provides a simple desktop client for Remember the Milk). Gnome-Do (a Linux equivalent of Quicksilver) is also absolutely necessary. Conky is the best system monitor for any OS I know of (though I’ve heard OSX folks mention GeekTool). Without Adobe AIR, for now, I don’t have a great Twitter client.

The customizability of Linux, which makes it very attractive, can also make it rather frustrating. Eric Raymond’s description of Unix captures this nicely:

the most enduring objections to Unix are consequences of a feature of its philosophy first made explicit by the designers of the X windowing system. X strives to provide “mechanism, not policy”, supporting an extremely general set of graphics operations and deferring decisions about toolkits and interface look-and-feel (the policy) up to application level. Unix’s other system-level services display similar tendencies; final choices about behavior are pushed as far toward the user as possible. Unix users can choose among multiple shells. Unix programs normally provide many behavior options and sport elaborate preference facilities.

This tendency reflects Unix’s heritage as an operating system designed primarily for technical users, and a consequent belief that users know better than operating-system designers what their own needs are. . .

But the cost of the mechanism-not-policy approach is that when the user can set policy, the user must set policy. Nontechnical end-users frequently find Unix’s profusion of options and interface styles overwhelming and retreat to systems that at least pretend to offer them simplicity.

I’d love to be able to say that I am one of those “technical users,” for whom the pretended offer of simplicity has no appeal whatsoever. I’m not. The “mechanism-not-policy” approach has a steep learning curve. And sometimes I have Mac envy.

But I like Linux. A lot. Philosophically, I believe in free software. And after three years, I do things much more efficiently in Linux than I could anywhere else. The phrase “it just works” is often offered as a virtue above all others in discussions of computers and usability. And thankfully, at this point, many things indeed “just work” in Linux (you don’t need to manually mount a USB device after you’ve plugged it in anymore). But if you want an OS that smoothly disappears from view, Linux is not it (yet). But maybe it shouldn’t be. There are virtues other than invisibility in an operating system. Some amount of resistance forces the user to better understand his tools. That is the current state of Linux: it may not just work, but it works just as well as you need it to. This is not to say it is good enough; it is exactly as good as you’re willing to make it.

So, interested in giving it a try? Bryan Lunduke, of the Linux Action Show, did a nice job comparing Ubuntu 9.10 with Windows 7 and OSX a while back (though he may have his thumb on Ubuntu’s side of the scale… just a teeny bit). Or why not just download a live CD and have a gander for yourself? Booting from a live CD lets you see how your hardware is going to behave with Linux without making any changes to your hard drive. If you’re looking to learn more, check out Chess Griffin’s Linux Reality podcast. Intended for the complete newcomer, Linux Reality is simply wonderful; by the end (Chess discontinued the podcast after a year, once he felt he’d done everything he wanted to), he gets through some pretty great stuff (setting up a LAMP server, using screen, etc.). And Chess is in many ways the perfect guide: a passionate Linux advocate, a wonderful sharer of enthusiasm, and not a professional developer or system administrator (meaning no offense to these august professions, but they might not be the best guides to a new OS for folks who aren’t already pretty well informed).

Happy Tinkering.

So, lately I’ve been writing about my interest in the changing semantics of the word bitch, trying to pin down when it went from a term meaning primarily “female dog” to being primarily an obscenity. I still don’t have a good answer. In this post I’ll try to explain why.

Along the way I’ll talk about:

That might seem like a lot of stuff, especially if you’re the poor soul reading this; those links above can hopefully get you where you might be interested in going (and of course, there are more entertaining places on the internet anyway you know).

I need to make very clear here, though, that despite everything that follows, nothing in it even approaches an answer to the question with which I began. I will share a sort of dummy visualization of the changing semantics of “bitch,” but it is worthless as an answer to that question: G.I.G.O. (garbage in, garbage out).

Last time on “Mining Obscenity”. . .

So, as I’ve written about before, I became interested in tracing the changing definitions of the word “bitch,” of trying to get some idea about when the shift occurred from bitch being used in print to mean “a female dog” (and, I learned, sometimes other animals) to its being a (mildly) obscene obloquy. (This is, of course, just one change in the term’s meanings—more recently, for instance, one could chart the way the term comes increasingly to be used by men to emasculate other men.)

But I think (and hope) the general premise is clear enough: to provide some sense about when, historically, the obscene/derogatory meaning took precedence over “female dog”, at least as reflected in print (which itself raises questions about how one gets a historically valid sample, et cetera, et cetera).
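
Put computationally, the question is where a curve crosses the halfway mark: in each period, what share of printed uses of the word carry the derogatory sense rather than “female dog”? The sketch below assumes, hypothetically, that every occurrence has already been dated and hand-labeled with a sense; the labels, function names, and sample data are invented. Producing trustworthy labels from a historically valid sample is, of course, exactly the hard part.

```python
from collections import Counter

def sense_share_by_decade(occurrences, sense):
    """Share of occurrences in each decade labeled with `sense`.

    occurrences: iterable of (year, label) pairs; labels are assumed to have
    been assigned by hand (e.g. "dog" or "insult").
    """
    totals, hits = Counter(), Counter()
    for year, label in occurrences:
        decade = year - year % 10
        totals[decade] += 1
        if label == sense:
            hits[decade] += 1
    return {d: hits[d] / totals[d] for d in sorted(totals)}

def first_majority_decade(shares):
    """Earliest decade in which the share exceeds one half, or None."""
    for decade in sorted(shares):
        if shares[decade] > 0.5:
            return decade
    return None

# Invented sample data, for illustration only.
sample = [(1851, "dog"), (1855, "dog"), (1858, "insult"),
          (1862, "dog"), (1869, "insult"),
          (1873, "insult"), (1877, "insult"), (1879, "dog")]
shares = sense_share_by_decade(sample, "insult")
print(first_majority_decade(shares))  # 1870
```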


Exploring Project Gutenberg

So one source of textual data is Project Gutenberg. The amount of data ready to hand at Gutenberg, as well as its availability in vanilla plaintext, has made it attractive to folks doing stylometric analyses. And the very kind folks at Gutenberg have even included a very helpful way of getting all their ebooks. (Zipped up, that is something like 14.5 gigabytes according to the PG website).

Project Gutenberg also makes available its catalog data in one big RDF file. As a preliminary step I decided to start with this catalog file just to get some sense of the distribution of texts in the Gutenberg archive. So, using Python to extract data from the RDF and Processing to visualize it, I produced this picture of the distribution of texts.
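
The first step, getting each author’s lifespan out of the catalog, might look like this in Python. It assumes, without having checked it against the actual RDF file, that creator fields read like “Shakespeare, William, 1564-1616”; anything else (no dates, approximate dates) is simply skipped. The regular expression, function name, and sample strings are illustrative.

```python
import re

# Assumed creator format: "Surname, Forename, 1564-1616" (verify against the RDF).
LIFESPAN = re.compile(
    r"^(?P<name>.*?),\s*(?P<birth>\d{3,4})\s*-\s*(?P<death>\d{3,4})$")

def parse_creator(creator):
    """Return (name, birth, death), or None if no plain year range is found."""
    match = LIFESPAN.match(creator.strip())
    if match is None:
        return None
    return match["name"], int(match["birth"]), int(match["death"])

for creator in ["Shakespeare, William, 1564-1616",
                "Confucius",                      # no dates: skipped
                "Chaucer, Geoffrey, 1343?-1400"]:  # approximate date: skipped
    print(parse_creator(creator))
```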

Authors in Project Gutenberg

Each gray horizontal line represents the lifespan of an author who has at least one work in the Project Gutenberg archive. Authors with more than 50 works in the archive get more than a line: they get a box with their name in it. These “major authors” are then color-coded: authors with more than 150 works in PG get a red box; authors with between 100 and 150 get a blue box; and authors with more than 50 but fewer than 100 get a green box. The lines are stacked (using a very crude algorithm; “major authors” aren’t stacked the same way but are just chucked in at some height), so that the height of the stacked lines gives some sense of the number of authors writing in a given period.

It isn’t especially pretty (and some boxes are hard to see because others have been drawn over them), mostly because my programming ability is limited. But it offers some insight into the historical distribution of PG’s authors. There are a lot of authors in the nineteenth and twentieth centuries, because the novel (with the predictable exception of Shakespeare) dominates PG’s holdings. (I’ve focused on the period from 1500 to 2000 here; PG includes some works from before 1500, such as translations of the classics, some Li Po, some Confucius, and so on, but not many by comparison.)
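The color-coding and stacking described above can be sketched in a few lines of Python. This is not the code behind the graph (which was drawn in Processing): where exactly 100 or 150 works fall is a guess, and the stacking shown is a simple greedy "lowest free row" scheme, offered as one plausible crude algorithm rather than the one actually used. The function names are hypothetical.

```python
def box_color(work_count):
    """Map an author's number of PG works to the box colors described above.

    Where exactly 100 or 150 works fall is an assumption.
    """
    if work_count > 150:
        return "red"
    if work_count >= 100:
        return "blue"
    if work_count > 50:
        return "green"
    return None  # 50 or fewer: just a gray lifespan line, no box


def stack_lifespans(lifespans):
    """Greedily stack (birth, death) intervals into rows.

    Intervals are taken in order of birth year, and each goes into the
    lowest row whose previous occupant died before this author was born.
    The total number of rows therefore equals the largest number of
    authors alive in any single year.
    """
    row_ends = []  # row_ends[r] = death year of the last author placed in row r
    placed = []
    for birth, death in sorted(lifespans):
        for row, end in enumerate(row_ends):
            if end < birth:
                row_ends[row] = death
                break
        else:  # no free row: open a new one on top of the stack
            row = len(row_ends)
            row_ends.append(death)
        placed.append((birth, death, row))
    return placed


print(stack_lifespans([(1564, 1616), (1572, 1631), (1621, 1695)]))
# [(1564, 1616, 0), (1572, 1631, 1), (1621, 1695, 0)]
```

The third lifespan reuses row 0 because its first occupant died before the third author was born, which is what keeps the stack's height tied to the number of authors alive at once.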

But there are still lots of problems. If you were paying attention, you’ll have noticed that Shakespeare gets a red box, reserved for authors with more than 150 works in PG, which would seem to suggest that he was even more prolific than you remembered. The number is inflated because PG’s Shakespeare holdings include several different versions of each of his plays, translations of some of them, and a version of The Complete Works. So what gets tallied up as a “work” is not really a work. (Of course, what exactly defines “a work,” how we define its unity and its singularity, is just one more of those thorny questions that I’m trying to shunt aside to get some heuristic peek into literary history.)

This is (I hope) somewhat interesting, at least as a glimpse into PG. But if you’ve been paying attention, you should be asking (by now you’re probably screaming in frustration): why visualize the lifetimes of authors rather than the publication dates of individual works? Well, that is simple. PG’s catalog does not include publication dates. (For that matter, it doesn’t include any data at all about which edition a particular etext represents.)

Well, that’s certainly a problem.


But what if we just ignored all that: Visualizing the Semantics of Bitch (with Bad Data)

Okay, so that’s a problem. But let’s say we ignored this problem and tried to forge ahead anyway. Maybe you could take Gutenberg’s textual data and get metadata about the works from some other source. Great idea! But this solution proved more difficult than I could easily manage.

Well you could always just make the data up. Let’s just take each author’s birth year, add it to the year in which s/he dies, and divide by 2, effectively assuming that each author produced all of their work in one great burst of creativity midway on life’s journey.

This would be an assumption so ugly as to call any resulting visualization seriously into question, at least in terms of its philological accuracy. But as a proof of concept, I decided to make the assumption anyway.
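In code, the invented date is nothing more than the midpoint of the author's lifespan. A minimal sketch, using a hypothetical helper name; integer division rounds any half-year down, which is an arbitrary choice:

```python
def estimated_year(birth, death):
    """Pretend every work was written halfway through its author's life.

    This is the deliberately crude assumption described above, not a real date.
    """
    return (birth + death) // 2


print(estimated_year(1835, 1910))  # 1872
```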

So, after waiting for the massive 15-gig-ish download of PG’s etexts, how would one proceed? I imagine there are other, perhaps better, ways to approach this, but I used rgrep to search all the files for instances of the (case-insensitive) string “bitch.” With the right options, rgrep will also return one line of context on either side of each match. The results look something like this:

./etext97/itwls10.txt-4218-of the stag; but, partaking more of the nature of the domestic than
./etext97/itwls10.txt:4219:of the wild animal, it remained with the herd of cattle.  A bitch
./etext97/itwls10.txt-4220-also was pregnant by a monkey, and produced a litter of whelps
--
./etext05/8cptm10.txt-62244-"Yes; yes, by the stitching 'tis plain to be seen
./etext05/8cptm10.txt:62245:"It was made by that Bourbonite bitch, VICTORINE!"
./etext05/8cptm10.txt-62246-What a word for a hero!--but heroes _will_ err,

Above are two results from such a search; the middle line of each contains the search term. Each line begins with the file in which the grepped-for term occurs, followed by the line number and then the line of text itself. Redirect all those results into a text file and you have the raw material you need. The file information (by way of the etext number) can connect each text to its entry in the RDF catalog, and thence to the author, title, and birth/death dates.
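Turning that output file back into structured records takes only a regular expression. Here is a minimal sketch, assuming the output came from grep's `-n` (line numbers) and `-C 1` (one line of context) options, so that `:` marks a matching line, `-` marks a context line, and `--` separates groups. The function name is hypothetical, and linking each path to its etext number and catalog entry is left out, since that mapping depends on the catalog itself.

```python
import re

# path, then ':' (match) or '-' (context), line number, the same separator, text.
# The lazy path group stops at the first "<sep><digits><sep>", so a file name
# that itself contains something like "-12-" would be split in the wrong place.
GREP_LINE = re.compile(r"^(?P<path>.+?)(?P<sep>[:-])(?P<lineno>\d+)(?P=sep)(?P<text>.*)$")


def parse_grep_output(lines):
    """Group `grep -n -C 1` output into one record per block of context."""
    groups, current = [], None
    for raw in lines:
        match = GREP_LINE.match(raw.rstrip("\n"))
        if match is None:  # the "--" separator (or a blank line)
            current = None
            continue
        if current is None or current["path"] != match["path"]:
            current = {"path": match["path"], "hits": [], "context": []}
            groups.append(current)
        current["context"].append(match["text"])
        if match["sep"] == ":":  # a matching line rather than a context line
            current["hits"].append(int(match["lineno"]))
    return groups


SAMPLE = """\
./etext97/itwls10.txt-4218-of the stag; but, partaking more of the nature of the domestic than
./etext97/itwls10.txt:4219:of the wild animal, it remained with the herd of cattle.  A bitch
./etext97/itwls10.txt-4220-also was pregnant by a monkey, and produced a litter of whelps
--
./etext05/8cptm10.txt-62244-"Yes; yes, by the stitching 'tis plain to be seen
./etext05/8cptm10.txt:62245:"It was made by that Bourbonite bitch, VICTORINE!"
./etext05/8cptm10.txt-62246-What a word for a hero!--but heroes _will_ err,
"""

for group in parse_grep_output(SAMPLE.splitlines()):
    print(group["path"], group["hits"])
```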

Determining the meaning of “bitch” in these passages, though, is not easy. One can imagine a machine-learning solution, but on such a small sample it seems unlikely to work well, and it would introduce a whole other level of complexity. You could instead search for selected key terms (like “dog” or “litter”) within a certain distance of each occurrence of “bitch” and draw a conclusion from the result. But since the number of results was relatively low (around 400), I thought it would be easier and better to just do it by hand. To ease the task I wrote a quick Python script that displays each extract and accepts as input a number (0 – 4) classifying the term. Here is what it looked like:

There are certainly other ways to break up the meanings, but after surveying the data this seemed sufficient. With this scheme, one could skip an entry if it was a false positive (for example, the name Bitchov or similar; there were actually a couple of these). I classified “son-of-a-bitch” separately only because it occurred so frequently that it seemed worth keeping an eye on (as a specialized instance of the range of the term’s obscene meaning). And I left open the possibility of classifying an occurrence as “ambiguous,” since even with three lines of context the term’s meaning might not be obvious. (Because ambiguous results are kept separate from false positives, which are coded “0,” one could later go back and grab more context to resolve the ambiguity.)
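A classifying loop of this kind need not be elaborate. The sketch below is illustrative rather than the script actually used: only "0 = false positive" comes from the text, so the assignment of codes 1 to 4 is a hypothetical stand-in, and the `ask` parameter exists only so the function can be exercised without a keyboard.

```python
# Hypothetical code assignments; only "0 = false positive" comes from the text.
CODES = {
    "0": "false positive",
    "1": "female dog",
    "2": "obscene",
    "3": "son-of-a-bitch",
    "4": "ambiguous",
}


def classify(extracts, ask=input):
    """Show each extract and record the code typed in for it.

    Invalid answers are asked again; returns a list of (extract, label) pairs.
    """
    results = []
    menu = "  ".join(f"{code}={label}" for code, label in CODES.items())
    for number, extract in enumerate(extracts, start=1):
        answer = ask(f"\n[{number}/{len(extracts)}]\n{extract}\n{menu}\n> ").strip()
        while answer not in CODES:  # re-ask until a valid code is typed
            answer = ask("Please type a number from 0 to 4: ").strip()
        results.append((extract, CODES[answer]))
    return results


# Scripted answers stand in for the keyboard: a mistyped "7", then "1".
answers = iter(["7", "1"])
print(classify(["...A bitch also was pregnant by a monkey..."],
               ask=lambda prompt: next(answers)))
```

For a job spread over several days, one would also want to append each answer to a file as it is given, so that a session can be interrupted and resumed.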

So, for a couple of days I left this simple program running. Whenever I had a few free minutes (while talking on the phone, say, or waiting for water to boil), I classified a few more occurrences of “bitch.” Once all of them had been classified and the output written to a file, it was time to return to Processing to try to visualize the results. After some futzing around, here is what all that bitch data looked like.

Visualization of the Relative Obscenity of "Bitch"

Let me first reiterate that this visualization does not really show anything: the data it represents is fundamentally flawed. As I noted above, because date of publication was not easily available, the dates used here are effectively inventions. (They are accurate within a tolerance of, say, half three score and ten.) Moreover, even with all that text downloaded from Gutenberg, we still have a pretty small number of points from which to draw any conclusions. (You’ll note that, for purposes of visualization, I’ve grouped occurrences by the decade in which they occur, fudging the dates still further.) And, as if that weren’t enough, let’s recall that the same “work” can appear more than once in PG, leading to double-counting. (I went through the data by hand to try to remove these duplicates, but I could have missed some.)

So, this sure seems like a long blog post for a useless visualization, doesn’t it?

Well, here is what I like: this visualization divides the two meanings of “bitch” with a central horizontal line. Points below the line represent instances of the term used in its obscene sense (the color code gives some further insight into how these break down using the 4-part division discussed above); points above the line represent instances where the term is used in a non-obscene way (to mean “female dog”). This is simple, but it has the advantage of allowing that both meanings might be equally available, or available in some mixed proportion, at any historical moment. With a larger data set, and with correct publication dates, this seems to me an elegant way of answering the admittedly amorphous question with which I began (though I’m certainly open to criticism of this entire approach).

It could also be improved upon. You could keep the text samples from which these points were derived in memory, so that a reader could mouse over each point and get the author and work it represents, a keyword-in-context view, and even a link to the full text.

With a sufficiently complete data set, I would expect to see occurrences of the term in its obscene sense greatly increase during the twentieth century, while occurrences meaning “female dog” decrease. Exactly when the obscene meaning takes precedence would be the interesting thing to know. (Indeed, it is the thing I was originally interested in.)
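To make "exactly when" concrete: once each occurrence has an (estimated) year and a label, the above/below layout reduces to per-decade counts, and the moment of precedence is the first decade in which the count below the line exceeds the count above it. This is a sketch of the data side only (the drawing itself was done in Processing). The label names and functions are hypothetical, "son-of-a-bitch" is counted with the obscene uses because the text treats it as a specialized instance of that meaning, and the sample data are invented.

```python
from collections import Counter

# Hypothetical label names; which side of the center line each is drawn on.
ABOVE = {"female dog"}
BELOW = {"obscene", "son-of-a-bitch"}


def decade(year):
    """Group a year into its decade, e.g. 1875 -> 1870."""
    return year // 10 * 10


def layout(occurrences):
    """Map each decade to (count above the line, count below the line).

    occurrences is an iterable of (estimated_year, label) pairs; labels in
    neither set (false positives, ambiguous cases) are left out.
    """
    above, below = Counter(), Counter()
    for year, label in occurrences:
        if label in ABOVE:
            above[decade(year)] += 1
        elif label in BELOW:
            below[decade(year)] += 1
    return {d: (above[d], below[d]) for d in sorted(set(above) | set(below))}


def first_obscene_decade(counts):
    """Return the first decade in which obscene uses outnumber canine ones."""
    for d, (up, down) in counts.items():
        if down > up:
            return d
    return None


# Made-up occurrences, purely to show the shape of the output.
example = [(1872, "female dog"), (1875, "obscene"), (1879, "female dog"),
           (1901, "son-of-a-bitch"), (1903, "ambiguous")]
counts = layout(example)
print(counts)                        # {1870: (2, 1), 1900: (0, 1)}
print(first_obscene_decade(counts))  # 1900
```

With sparse data a single decade can tip over on a handful of points, so with a real corpus one would probably want a minimum count or some smoothing before trusting the crossover.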


A Final Thought

While I want to stress once again that this exercise in digital philological visualization (does that sound suitably buzzword-worthy to win me a prize of some sort?) fails, it fails because the data is not readily available; a meaningful result would require more, and better, data than is available from PG at present. (I’ll be putting this little toy problem on the back burner now, but I would be interested in exploring other sources of data. Google Books is the obvious choice, but after spending some time playing with the Books API, I’m not sure the necessary data is currently available [nor am I confident that such a use is even permissible within the terms of use].)

If you will grant that ferreting out the historical contours of the changing uses of the term “bitch” is worthwhile (and maybe it isn’t; perhaps this whole post reeks of sheer pedantry), a visualization like this one seems to illustrate that change (or at least one aspect of it). And if you’ll grant all that, there is a final point worth making. This sort of visualization answers the question posed simply and without oversimplification, but it is tailor-made to this particular problem. This recalls something I recently read on the Humanist discussion list in a message by Richard Lewis. He wrote:

…I’m increasingly of the opinion that end user application style software is not really what scholars who are serious about exploring the possibilities of using technology to enhance their research or open new avenues of research require. Rather, I’m beginning to feel that a good grounding in programming, a simple, expressive language, and good provision of libraries for abstracting over data encodings and difficult algorithms required in each discipline will be much more conducive to interesting computational scholarship.

The things that make computational scholarship interesting can’t, I think, be packaged up in an end user application. Like scholarship conducted in any paradigm, computational scholarship is interesting and worthwhile when it’s exploratory. But the restrictions of an end user application seriously stiffle any possibility for exploration.

Such a statement risks stirring up a debate I’ve seen elsewhere about whether “Digital Humanists” should learn to program, and I have no interest in doing that. Nevertheless, at least for tasks like the one I’ve (painfully) described here, I think the perspective Lewis describes is helpful. Insofar as I even made half a stab at solving this little riddle, it is because of the availability of a set of tools that are easy enough to be picked up by a nonspecialist, but supple enough to be used in unanticipated ways. In particular I would single out Python, the Natural Language Toolkit (NLTK), and Processing. As has been noted elsewhere, Python’s simplicity makes it fun to work with and perfect for this sort of problem. In addition to Python’s native facility with strings, the NLTK makes all sorts of text analysis tasks (frequency counts, etc.) very simple (and it is all wonderfully well documented). And Processing does for visualization what the NLTK does for text analysis.

Using them as I have here produces an admittedly heterogeneous solution, cobbled together out of what one can learn on the fly (biggest challenge: figuring out SAX processing to handle PG’s massive RDF catalog file). One could instead do everything I’ve done here with rgrep, Python, and Processing within a single language: there are graphics libraries for Python, and one could do all the string and data manipulation in Processing (perhaps with some help from native Java libraries). But using each language in a task-specific way seems to provide a helpful midway point between spending too much time learning how to code and just waiting for exactly the right tool to appear (in this case, the obscene-semantics-historical-separator; surely it’s next from Google Labs).
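Since SAX processing was the biggest hurdle, here is a minimal sketch of what it involves, using Python's built-in `xml.sax` module, which streams through a file and fires a callback for each element instead of loading the whole thing into memory. The element and attribute names (`pgterms:etext`, `rdf:ID`, `dc:title`, `dc:creator`) and the convention that a creator string ends in a birth-death range are assumptions about the catalog's shape, not a checked description of PG's schema; the tiny sample is made up to match those assumptions, and the handler and helper names are hypothetical.

```python
import re
import xml.sax

# Made-up miniature catalog; element and attribute names are assumptions.
SAMPLE_RDF = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:pgterms="http://www.gutenberg.org/rdfterms/">
  <pgterms:etext rdf:ID="etext100">
    <dc:title>The Complete Works of William Shakespeare</dc:title>
    <dc:creator>Shakespeare, William, 1564-1616</dc:creator>
  </pgterms:etext>
  <pgterms:etext rdf:ID="etext74">
    <dc:title>The Adventures of Tom Sawyer</dc:title>
    <dc:creator>Twain, Mark, 1835-1910</dc:creator>
  </pgterms:etext>
</rdf:RDF>"""

LIFESPAN = re.compile(r"(\d{4})\s*-\s*(\d{4})")


class CatalogHandler(xml.sax.ContentHandler):
    """Streams through the catalog, collecting one record per etext."""

    def __init__(self):
        super().__init__()
        self.records = {}      # etext id -> {"title": ..., "creator": ...}
        self.current_id = None
        self.field = None      # which child element we are inside, if any
        self.buffer = []

    def startElement(self, name, attrs):
        if name == "pgterms:etext":
            self.current_id = attrs.get("rdf:ID")
            self.records[self.current_id] = {}
        elif self.current_id and name in ("dc:title", "dc:creator"):
            self.field = name.split(":")[1]
            self.buffer = []

    def characters(self, content):
        # SAX may deliver one element's text in several chunks, so accumulate.
        if self.field:
            self.buffer.append(content)

    def endElement(self, name):
        if self.field and name == "dc:" + self.field:
            self.records[self.current_id][self.field] = "".join(self.buffer).strip()
            self.field = None
        elif name == "pgterms:etext":
            self.current_id = None


def lifespan(creator):
    """Return (birth, death) if the creator string contains a date range."""
    match = LIFESPAN.search(creator or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


handler = CatalogHandler()
xml.sax.parseString(SAMPLE_RDF, handler)
for etext_id, record in handler.records.items():
    print(etext_id, record.get("creator"), lifespan(record.get("creator")))
```

For the full catalog file one would call `xml.sax.parse(path, handler)` rather than `parseString`; memory use stays small because only the current record is being assembled at any moment.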

In a great, collectively authored post at Profhacker, Janine Utell observes the comparative dearth of tweets concerning our shared field. “There was a silence, a whistling void where there should have been voices: where were the literature folks, people doing research, giving and listening to papers in my area? Where are my fellow modernists, commenting on what we were all learning at the convention?” I felt a particular pang of guilt.

While I attended plenty of panels at MLA this year, I didn’t tweet too much (that is, at all). There are technological reasons for my relative silence (anyone want to give me a Droid?), but the primary reason I didn’t tweet was my assumption that only Digital Humanities folks follow MLA on Twitter (not, say, modernists). I mean, what is the sense of tweeting a panel on Pound into a whistling void? But, of course, if there was a whistling void, it is at least partially (though probably only partially) my fault. As Janine’s comment makes clear, there are actually a good bunch of modernists already on Twitter. So, inspired by Janine’s comment (and the excellent write-up of the Legacy of David Foster Wallace Panel by Kathleen Fitzpatrick, which I was sure had me up at 8:30 Wednesday), here is a quick digest of my notes from MLA, with all snark and doodles redacted, made in atonement for my silence.

I haven’t included my notes for every panel I attended. I should mention that “653. Cognitive Approaches to Literature: Are We Beyond Science Envy Yet?” was, with the Freud panel described below, the best attended of the panels I sat in on (I’ll leave the implied contrast between these two panels to your imagination); but I was so intent on trying to understand exactly what a “Cognitive Approach to Literature” would be that I didn’t really take any notes. Should any of the panelists mentioned below find this page and wish to amend, change, or contest anything I say, please let me know in the comments; I’m happy to amend the post as needed. Indeed, going over my notes, I learned that I probably need to take better notes in the future… But if nothing else, these notes may, through the deliberate serendipity of Google, allow some folks to find one another.

Panels

  • 150. Unboxing Modernism: Beyond the Divides: Introducing this panel, Melba Cuddy-Keane provided a brief outline of the development of modernist studies from the 1970s to the present, from the consolidation of definitions of modernism in terms of formal experimentation, to the recognition of the exclusions such formulations entail (broadly speaking, this narrative seems applicable to literary studies as a whole). Our own period, she suggested, is one of refusing closure: of attempting to keep the very definition of modernism open. The panelists, she suggested, offer visions of how this might be achieved.

    The panelists seemed to split into two groups. Ann Ardis and Michael Leja were interested in locating modernism within a larger frame of cultural reference, taking modernism out of the hermetically sealed “box” of high culture (to use the somewhat abused metaphor dominating the panel). Leja was interested in showing the similarities between modernist art (construed broadly enough to include abstract expressionism) and larger developments in visual culture. Ardis discussed periodical studies as one avenue that can enrich our understanding of the period by forcing us to return to the complexity of the primary source. She mentioned anonymous/pseudonymous/collective authorship, and the complex international circulation of such periodicals, as obvious areas of interest. Anita Patterson and Steven Yao were interested in challenging the geography of modernism, locating modernism within a transnational framework. Patterson’s work focuses on modernist poets connected to the Americas (Jules Laforgue, St. John Perse, Wilson Harris). Yao’s work focuses on the Pacific, particularly on some modernists’ fascination with translating works they could not really read (all those poems “from the Chinese”).

    In the comments, the provocative question of whether “modernism” was even a valuable term anymore was raised. Panelists did not seem to come to any consensus about this important question, and (alack!) the panel ended before it was fully pursued.

    The panelists also provided a helpful rundown of some of the most interesting recent works in modernist studies. Among the works mentioned were:

    • Christopher Bush, Ideographic Modernism: China, Writing, Media (Oxford Univ. Press, 2010)
    • Pacific Rim Modernisms, edited by Mary Ann Gillies, Helen Sword, and Steven Yao (Univ. of Toronto Press, 2009)
    • Lesley Wheeler, Voicing American Poetry: Sound and Performance from the 1920s to the Present (Cornell Univ. Press, 2008)
    • Pericles Lewis, Religious Experience and the Modernist Novel (Cambridge Univ. Press, 2010)
  • 235. Law and the Modernist Atlantic: These three papers all considered some aspect of modernism’s (broadly construed) encounter with “the law.” Lisa Fluet’s paper “‘Liberal Fascism,’ Human Rights, and the State: On H.G. Wells” pursued the imagining of the state in the work of H. G. Wells. While the state has tended to be an object of critique in leftist and Foucauldian narratives, Wells’s narrative, she suggested, offers a way of imagining the state more positively. Unlike figures like Henry James or Virginia Woolf, concerned with recording subjective experience (“how the world feels”), Wells offers something like a “novel of information” (Fluet here borrows James Wood’s term for the contemporary novel), concerned with describing how the world actually works. For Fluet, Wells’s work offers an important opportunity to do what the novels of James and Woolf cannot do: imagine the state.

    Kelly McDowell’s “The Perverse ‘Look’ of the Law: Ulysses and Obscenity” offered a close, theoretically informed reading of the “Nausicaa” episode. The episode’s representation of Gerty MacDowell and Leopold Bloom demonstrates the perversity inherent in the law itself: in the interacting gazes of Bloom and Gerty, the normativity of the law undermines itself. McDowell closed by reading the logic of the “Nausicaa” chapter into the obscenity trials that it sparked.

    Thomas Cohen offered a fascinating look at Kathy Acker’s literary appropriations, and the legal controversy they provoked, through Acker’s text “Dead Doll Humility.” Drawing on Lyotard’s notion of the différend, Cohen traced the conflict between experimental writing and intellectual property in Acker’s work. Cohen helpfully quoted Geoffrey Bennington on Lyotard: “an accusation of theft might well also involve a différend, if one of the parties does not recognize that the object in question is a legitimate object of property.” Such, Cohen suggested, is the case with Acker’s appropriation/plagiarism of four pages of Harold Robbins’s The Pirate in Acker’s “The Adult Life of Toulouse Lautrec” (“Dead Doll Humility” responds to the controversy that followed this plagiarism).

  • 294. The Death of Freud?: This was the most crowded panel I attended. The second panelist was unable to attend because of illness, allowing Jean-Michel Rabaté to speak at length. His paper, entitled “What is to be preferred, Death or Obsolescence?”, provided a fascinating meditation on the place of death in Freud’s work. Rabaté began by contrasting his two titular terms, obsolescence being a sort of incomplete, unsuccessful death. Freudian psychoanalysis, by midcentury, had been co-opted by a “weak adaptive culturalism” (what Lacan decried in “ego-psychology”). Adorno and Lacan both sought to save psychoanalysis from this fate (cf. Adorno’s “In psychoanalysis nothing is true except the exaggerations”; the entire Lacanian rereading of Freud). In this regard, death, indeed, seems preferable to obsolescence.

    From here Rabaté moved to a discussion of the changing place of death in Freud’s work. This preoccupation with death begins early, in a set of letters written in Spanish to Eduard Silberstein. Indeed, Freud seems to have taught himself at least passable Spanish in order to conduct this correspondence, which was inspired by Cervantes’s Dialogue of the Dogs. The letters are interesting because they are structured by an injunction similar to that of free association. They also, however, feature a prohibition on describing death (one is not to say “One has died,” but must substitute some sort of euphemism). This correspondence, with its anticipation of free association, the obvious importance of language (it was conducted in Spanish), and its vexed relationship to death, provides a model for a set of issues that will continue to constellate in interesting ways throughout Freud’s work. (In its ambition to trace key themes, and death in particular, throughout Freud’s work, Rabaté’s talk frequently reminded me of Laplanche’s Life and Death in Psychoanalysis.)

    This concern with death puts Freud in dialogue with the better part of nineteenth-century German philosophy: Hegel, Schopenhauer, Nietzsche. Rabaté moved on to the famous discussion of death in Beyond the Pleasure Principle, where precisely the question of the origin of death is broached explicitly. Is death an internal necessity, or is it merely imposed from without? (This is the key question about whether a “death drive” exists.) In closing, he briefly discussed a thinker (about whom I knew nothing) who most emphatically believed that death was not a necessity: the Russian philosopher Nikolai Fyodorov, a resolutely anti-Heideggerian (and somewhat crazy) figure for whom death is a purely external phenomenon to be resisted (he proposed, as humanity’s key project, resurrecting everyone… yeah).

    In the Q&A period, questions returned to the panel’s titular question, trying to think about Freud’s continuing relevance through the psychoanalytic categories of mourning/melancholia, or the Derridean notion of ‘hauntology’ (preserving “a specter of Freud”). Rabaté responded by trying to move past these oft-referenced categories. “There is no ontology of psychoanalysis,” he insisted. Ontology itself is not a Freudian category; the concern with language, the concern with the Other in us (the work of culture): that is Freudian.

    (Oh, and I learned that Freud, like W. B. Yeats, had had a vasectomy—or Steinach procedure—in the belief that the procedure had rejuvenating effects.)

  • 364. D. H. Lawrence’s Short Stories: Beth McFarland-Wilson’s paper, “A Family Systems Interpretation of ‘Horse Dealer’s Daughter’,” offered a reading of Lawrence’s story from the perspective of “family systems theory.” This approach allowed McFarland-Wilson to read the story outside the terms of Oedipal desire that predominate in existing readings. Carrie Rohman’s “Ecology and the Creaturely in ‘Sun’” drew on Merleau-Ponty and read the character Juliet, in “Sun,” as an experiment in the role of the irrational and the creaturely, a flight from humanism to a view of the subject as ecologically situated. The streaming “dark flow” between Juliet and the sun captures well the relationship between the body and the world that Merleau-Ponty describes as the “flesh of the world,” the perceiving body that is at once part of the world it perceives. Pamela K. Wright’s paper, “Till Death Do Us Part: The Implications of Illness, Disability, and Death on Love and Romance in Lawrence’s ‘The Blind Man’ and Somerset Maugham’s ‘Sanatorium’,” explored the role of disability in the two stories. Lawrence’s ‘The Blind Man’, Wright suggested, offers a more complex and sympathetic representation of the disabled body than, for example, that of Clifford Chatterley, whose disability comes to symbolize a broader cultural impotence.

  • 513. Joycean Materialities: Christy Burns’s “Circean Sense: Phenomenology in Joyce” examined the representation of sensate experience in Ulysses. Burns suggested that Stephen (on the beach in “Proteus”) and Leopold Bloom (on the beach later in Ulysses) offer two different attitudes toward the object world: Stephen’s disdain for brute materiality and Bloom’s immersion in the sensual world. “Circe,” in which objects take on a life of their own, dramatizes the tension between these two attitudes. David Earle’s fascinating “James Joyce, Gently Used: Republication and Dissemination of Popular Modernism” contested the fetishization of the first edition, suggesting that pulp editions of modernist works served a too-often ignored role in popularizing these works. Earle shared many fascinating popular versions of modernist texts, including an appearance of Joyce’s poems in American Girl (the periodical of the American Girl Scouts), and even mentioned the pulp edition of Bubu of Montparnasse, with an introduction by T. S. Eliot (which I’ve mentioned here). Sean Latham’s “Joyce’s Dirty Work” took as its object of analysis the literal dirt of “dear dirty Dublin” as an especially valuable way of thinking about the mongrel nature of the Irish nation as it emerges in the age of what Ulrich Beck calls the “risk society.”

  • 588. Copyright and the Modernist Atlantic: Versions of the three papers from this panel will all appear in a forthcoming volume, Modernism and Copyright, edited by Paul Saint-Amour. Robert Spoo’s “Copyright Deformations and the Transatlantic Publishing Scene” offered a historically rich account of the complex ways copyright affected modernist literature. The US in this period (and until 1989) did not participate in the Berne Convention, which establishes international copyright standards. Instead, to claim a copyright in the United States, a book had to be published/printed in the United States (the “manufacturing clause” of US copyright law). Informally, “courtesy of the trade” prevented rampant piracy, but this informal system withered in the early years of the twentieth century as new publishers emerged and a more competitive publishing environment developed. Joseph Slaughter’s “Plagiarism, Promiscuous Translation, and Yambo Ouologuem’s Primitivism: or, The Following Takes Place (Again) between 12am and 1am, 14 July 1913” began by comparing two different translations of Ouologuem’s Le Devoir de Violence (the long title makes sense). The 1971 translation by Ralph Manheim introduced allusions to Eliot and Dickinson into the novel (replacing allusions in the original to Homer). These allusions became an object of controversy in discussions of the novel. Generally, Slaughter suggested, allusion becomes plagiarism when writer and reader are not able to share a common text/heritage/culture. In this way, the question of allusion/plagiarism in Ouologuem’s novel became a question of cultural authenticity: to what extent can an African novelist allude to Western canonical works without being accused of plagiarism? Paul Saint-Amour’s “Modernism, Copyright, and the Counter Factual” suggested a shift in the concept of copyright during the twentieth century from the individual to the population, from the individualized logic of the author function to a more biopolitical logic.

    This shift in the conceptualization of intellectual property, Saint-Amour suggested, and the counterfactual logic the law sometimes uses, are behind contemporary extensions of copyright. But they might also open up new avenues of contestation. For example, while some arguments for extending copyright terms rely on longer life expectancies, mightn’t this same logic suggest that copyright should expire sooner in nations with lower life expectancies?

  • 612. The Legacy of David Foster Wallace: Why would I write this panel up, when Kathleen Fitzpatrick has already done a fantastic job?
