Those partial to the backhoe approach can use Ward Cunningham's Signature Survey program. Billed as a "method for browsing unfamiliar code," Signature Survey scans through source code and compresses lines of text down to their punctuation symbols. Operating on the assumption that a file's size is proportional to the number of punctuation marks separating individual elements (packages and files in Java, for example), Signature Survey offers a quick guide to programming thickets and areas of heavy repetition.
"It's a satellite system for looking over large bodies of work," Cunningham says. "It lets you use your own human pattern recognition to see variation over the whole program. It also leads you to interesting parts of the program to read."
Thomas says his own preferred technique is to import a program's contents into Microsoft Word and reduce the zoom factor as far as it will go. The resulting 50-page image leaves little for the eye to make out other than jagged patterns of text and blank page. Still, even these patterns can reveal peculiar anomalies in developer mood or style. "Sometimes the structure is easier to see at that level than if you're digging around line-by-line," he says.
Both Thomas and Cunningham liken their techniques to the aerial surveys some archaeologists use to spot the overall structure of burial mound networks, neolithic cairn patterns, etc.
"It shows the most interesting places to dig," says Cunningham.
The survey technique also provides a quick way to track the flow of ideas and source code from one program to the next. Cunningham, a man best known on the Web as the creator of the Wiki collaborative online authoring language, has lent his forensic talents to companies embroiled in legal disputes over intellectual property and prior art. He has also used the technique to refactor, or streamline, his own programs, stripping out redundant sections and commands.
When it comes to the toothbrush level, forensic tools and techniques are still in development. Booch says the fall workshop will discuss ways to analyze the fine structure of programs and to detect the emergence of novel techniques. One potential benefit of such knowledge would be a steep reduction in the number of frivolous patent claims filed by software companies.
"IBM believes in patents. I believe in them, too, but there are a lot that look suspicious," Booch says. "What better way to check for prior art than to have the source code ready and available for inspection?"
Herein lies the final goal of the fall Computer History Museum conference: to provide a foundation for a future exhibit on classic software programs and to provide a "vocabulary" for the intellectual dissection and discussion of these programs.
"Maybe I'm horribly geeky," says Booch, "but I find tremendous beauty in looking at well-written software programs. There's an elegance, a brilliance that we're only now developing the critical means to describe. We have literary critics. We have art critics. We don't have any software critics, yet. We need software critics, too."
Booch and his allies will need to overcome a number of obstacles first. The largest obstacle at the moment is the lack of a central source code repository. In an online article, Elisabeth Kaplan, an archivist at the University of Minnesota's Charles Babbage Institute, lays out the frustrating history of software preservation. In 1986, the Computer Museum, a Boston forerunner of the current Computer History Museum, commissioned a report on how to archive software programs. That report identified many of the challenges but left the solutions to future reports. In 1988, the Library of Congress created a Machine Readable Collections Reading Room, essentially a repository of old machines capable of reading out-of-date programs. The project was phased out a few years later, however.
Since then, the topic of preservation has resurfaced every three years or so, an interval that, incidentally, roughly matches the upgrade cycle of most commercial software programs.
"The issue comes up again and again," says Kaplan. "From an archival perspective, though, it's just not worth it to put resources into preserving software. There's just not enough projected use. The fact is, when you add up the amount of people who can use these programs, there are like five of them."
One institution willing to take up the burden is Kahle's Internet Archive. The Internet Archive already stores screen shots of Web sites and other artifacts of the digital age. Adding source code to the mix would be easy enough, says staff software preservationist Simon Carless. Unfortunately, legal issues and aging copy-protection mechanisms make it difficult to provide a decent record of historic programs.
Carless says the Digital Millennium Copyright Act clouds the current preservation landscape. Although the 1998 law lets archives make copies of copyright-protected works for preservation purposes, it imposes harsh criminal penalties for any circumvention of copy-protection mechanisms. Rather than risk legal blowback, Carless and the Internet Archive are currently petitioning Congress to clarify that archival organizations are exempt from such penalties.
"Even if you're an institution that's allowed to archive stuff, there's still a possible DMCA problem," Carless says. "If there's a physical hardware dongle that restricts copying, are you allowed to emulate that dongle to get the software running or does that qualify as a circumvention? We don't know."
Carless and the Internet Archive have recently requested that Congress expand its list of exemptions to Sec. 1201 of the DMCA, the portion that prohibits the circumvention of copy-protection mechanisms, to include software source-code preservation efforts. While waiting for a response, the Internet Archive has built a page displaying famous programs currently on the brink of software extinction.
In a similar attempt to rally the public, the Computer History Museum's Booch has sent out surveys asking programmers to nominate "classic" programs for a potential source-code exhibit. The list, originally intended to be a Top 50, already includes more than 150 games, applications, tools and programming languages. He hopes to devote the upcoming seminar to discussing how to present such programs to the public in a way that encourages further study and preservation.
"There's a great difference between walking up and showing somebody the Illiac and showing them the original source code for Lotus 1,2, 3," Booch admits.
Booch hopes to ally the preservation movement with two powerful forces: the World Wide Web and the open-source software community. Both have already proven invaluable in the preservation and publication of coding techniques, he says. He also plans to lobby companies with a stake in seeing their early works preserved.
Though Booch is hesitant to predict a donation of the original DOS source code from Microsoft, he has spoken with archivists inside the Redmond-based company who are wrestling with the same ideas. He also holds out hope that, with a little schmoozing and a little ego massage, the Computer History Museum might be able to encourage a more direct form of participation.
"Imagine somebody 100 years from now watching Bill Gates explaining the structures of his first program," says Booch, throwing out yet another hypothetical scenario. "Just think: Fox could have a reality show on software programming."
Booch punctuates his dream scenario with a quick laugh: "Actually, that's pretty scary when you think about it."