How Did Your Computer Crash?
Check the Instant Replay


By MICHAEL FITZGERALD

ANYONE who uses a computer knows what it’s like to have the system crash. Crashes are the digital world’s addition to that short list of inevitables, death and taxes. But what if you could record the crash and play it back, like TiVo for software?

That idea inspired two software engineers, Jonathan Lindo and Jeffrey Daudel, to develop such a product. They have succeeded, and are now moving from the niche market where they proved the idea onto a bigger stage.

System crashes and other software flaws are more than an annoyance. A 2002 study by the National Institute of Standards and Technology estimated that software flaws cost the United States economy as much as $59.5 billion a year.

For software developers, the flaws that cause crashes rank among their biggest problems, especially the ones that can’t be reproduced, like the proverbial noise in the car engine that disappears when you visit the mechanic.

Mr. Lindo says he and Mr. Daudel found themselves overwhelmed by bugs they couldn’t find while working together at an Internet start-up in 2002. “We were spending almost all of our time not fixing the issues, but trying to get to the point where we could just see the issue, and we said, ‘Wouldn’t it be great if we could just TiVo this and replay it?’” Mr. Lindo recalls.

Innovation by analogy is a powerful concept, says Giovanni Gavetti, an associate professor at the Harvard Business School who, with his colleague Jan W. Rivkin, has published research on how businesses can use analogical reasoning as a strategic tool. Human beings are analogy machines, he notes, dealing with new information by comparing it to things they already know something about.

It would take time for Mr. Lindo and Mr. Daudel to prove that their analogy worked. They were tackling a daunting problem — in fact, friends told them that they had a great idea, but one that was probably impossible to carry out. For one thing, they had to account for everything that can affect a program, from keystrokes, mouse movements and other software applications to network traffic and programming instructions that are designed to occur randomly. (For instance, in a computer game, the villain shouldn’t always do the same thing.)
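Stripped to its core, the problem the two engineers describe is capturing every source of unpredictability so that it can be fed back later. The Python sketch below is a hypothetical illustration of that general record-and-replay idea, not Replay Solutions’ implementation: the `ReplayLog` class and the toy `villain_moves` game loop are invented for this example, and only random numbers are captured, whereas a real tool would also have to log keystrokes, mouse movements, network traffic and more.

```python
import random
from typing import Callable, List, Optional


class ReplayLog:
    """Hypothetical helper: in record mode it remembers every value a
    nondeterministic source returns; in replay mode it hands those values
    back in the same order instead of calling the source."""

    def __init__(self, recorded: Optional[List[float]] = None) -> None:
        self.replaying = recorded is not None
        self.values: List[float] = list(recorded) if recorded is not None else []
        self.cursor = 0

    def capture(self, source: Callable[[], float]) -> float:
        if self.replaying:
            # Replay: ignore the live source and return the next recorded value.
            value = self.values[self.cursor]
            self.cursor += 1
            return value
        # Record: call the live source and remember what it returned.
        value = source()
        self.values.append(value)
        return value


def villain_moves(log: ReplayLog, turns: int) -> List[str]:
    """Toy game loop whose villain picks each move at random, via the log."""
    moves = []
    for _ in range(turns):
        roll = log.capture(random.random)
        moves.append("attack" if roll < 0.5 else "hide")
    return moves


# Record a session; the villain's moves vary from run to run.
recording = ReplayLog()
original = villain_moves(recording, turns=5)

# Replay the session from the log; the same moves come back.
replayed = villain_moves(ReplayLog(recording.values), turns=5)
print(original == replayed)  # True
```

Because the replayed session reads its “random” rolls from the log instead of generating new ones, the villain repeats the recorded moves exactly, which is what would let a developer watch an otherwise unrepeatable failure happen again.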

Ideally, their tool would not slow down the system as it recorded what was happening. They were also developing it for game platforms, among the most complex of software environments.

There were already programs on the market that could do things like log all the various inputs a program received. But none of them worked as the program was running, which is what developers really want, say analysts like Theresa Lanowitz of Voke, a technology research firm. In effect, that meant these products took snapshots, not the “video” that Mr. Lindo and Mr. Daudel thought was necessary.

Eventually, they got their technology working, and in late December 2003 quit their jobs and started Replay Solutions — so named because it would replay software crashes.

Their product, ReplayDirector, works on the Xbox gaming platform and several versions of the Microsoft Windows operating system. One customer is Electronic Arts, which began using the product in the fall of 2006, according to Steven Giles, its director of online operations.

Mr. Giles says he was referred to Replay by a venture capitalist he knows. The venture capitalist was worried that the company’s software might be “smoke and mirrors,” and Mr. Giles initially felt the same way. But when he realized that it worked, he convinced a number of developer teams at Electronic Arts to license the tool. (The venture capitalist ended up investing in Replay.)

Mr. Giles says that he liked a number of things about the tool, but that one stood out: its ability to capture bugs that cannot be reproduced.

“That’s something that nobody inside or outside our industry has really been able to solve,” he says. “We refer to it almost as magic.”

He declined to say how much Electronic Arts saves by using Replay, but says that “where it’s obvious that there are savings is when you see your most senior and expensive developers working on the game rather than chasing these ghost bugs.”

Replay’s gaming software sells for $50,000 a project, with negotiated pricing for multiple projects.

There are still pieces missing from the TiVo analogy. For instance, software developers can’t yet fast-forward to see their crashes. Still, having tackled games, Replay is now expanding into new markets. Last week, it released a beta version of its next product, for developers who write code in the Java language. Versions for other markets will also start appearing later this year, and the technology should transfer to almost any kind of software environment, says Vishwanath Venugopalan, an enterprise software analyst at the 451 Group, a research firm in New York.

THE best news for business is that Replay is one of a number of innovators, big and small, aiming to improve how software is developed, says Dana Gardner, president of Interarbor Solutions, a consulting firm in Gilford, N.H. This burst of innovation, he says, reflects the increasing importance of software across the business world.

Even a clever tool like Replay, however, cannot completely eliminate system crashes. With new software applications, “you’re always trying to solve a problem nobody’s ever solved before,” notes Michael D. Ernst, an associate professor of computer science at M.I.T. who studies programmer productivity and has independently been researching the idea of “replaying” software.

New software, then, guarantees new bugs. But having a way to replay problems should make it much faster to find — and swat — those bugs.

Michael Fitzgerald writes about business, technology and culture. E-mail: mfitz@nytimes.com.

http://www.nytimes.com/2008/03/23/te...f75&ei=5087%0A

Copyright 2008 The New York Times Company