How We Fixed eBay: Lessons from the Crisis Management Trenches

January 13th, 2014

Crises happen every day, from BP’s oil spill along the Gulf Coast to the ongoing backlash directed at the NSA. It’s easy to judge from the sidelines, but dealing with these problems in the fray is difficult. It’s not important how you get into them; what’s important is how you get out of them.

So, what do you do when a crisis comes your way? It has been almost 15 years since I was asked to join eBay and fix its catastrophic systems issues, but I remember it like it was yesterday. Below is a methodology I came up with after 35 years in the technology industry, along with practical tactics we used at eBay and, later, at other companies.

Assess the seriousness

You have to know your product, service and customers well enough to know the size and scope of the problem. Is this comparable to one individual with food poisoning, or is it an E. coli outbreak? Both are bad, but one may be catastrophic. At eBay, Meg Whitman and I would judge incoming issues on a Richter Scale model, which gave us a quick way to sync up on the seriousness of an issue. An example of a “nine” was when the site crashed, the power went down on the main servers, and the backup didn’t come on.

Deploy your resources

If you determine you have a serious problem, as we did with our “nine,” then you must ensure you have the right resources on it. The first thing to do is sound the alarm to get immediate attention — and action. At eBay we developed severity codes (Severity 1, Severity 2, etc.) and terminology like “911s,” which meant that every resource in the company could be pulled off whatever they were doing to work on the current issue.
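The escalation scheme described above can be sketched as a simple severity-to-paging policy. This is only an illustration of the idea; the level names, thresholds, and paging groups here are hypothetical, not eBay's actual system.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Illustrative severity codes, loosely modeled on the scheme described above."""
    SEV3 = 3   # degraded feature: normal on-call rotation handles it
    SEV2 = 2   # major feature down: pull in the owning team's leads
    SEV1 = 1   # site-wide outage, a "911": anyone in the company can be pulled in

def page_list(sev: Severity) -> list[str]:
    """Decide who gets alerted; a hypothetical policy for illustration only."""
    if sev == Severity.SEV1:
        return ["on-call", "engineering-leads", "executives", "all-hands"]
    if sev == Severity.SEV2:
        return ["on-call", "engineering-leads"]
    return ["on-call"]

print(page_list(Severity.SEV1))
# → ['on-call', 'engineering-leads', 'executives', 'all-hands']
```

The point of codifying levels like this is the same one the article makes: everyone syncs on seriousness instantly, and the highest level carries an unambiguous license to pull people off other work.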

Plan for crises in advance

Ideally, you want to be deploying a playbook rather than developing one. Most often, people don’t plan in advance and then have to develop processes while in battle — that’s much harder. At eBay, when we learned that, hours after 9/11, people were putting debris from the World Trade Center up for sale on the site (a “six” on the Richter Scale), we were able to respond immediately and take the listings down, because we already had a policy in place that we would not profit from disaster. Test processes and procedures in advance: once we understood what was happening with the denial of service attacks, we tested solutions so we would be ready for the next situation. Keep everyone on high alert so they are ready to execute.

Create a multifaceted plan

When problems hit, you don’t always know what’s wrong, and you certainly don’t always know how to fix it. That was the case in early 2000, when several Internet companies fell victim to denial of service attacks: everything looked fine on the site itself, but consumers couldn’t get to the service. We had to work with vendors, deploy patches, and collaborate with other tech companies and law enforcement to figure out how to stop it. I was always a fan of pursuing several possible solutions simultaneously, just in case our hypothesis was wrong. If you want to solve a problem fast, it’s always better to have several fixes going in parallel than solutions launched serially.

Make sure you have the right people in place

Talent is everything. When you are in crisis mode you quickly see people at their best and their worst. I decided I needed to add several key executives to my team as direct reports, ASAP. Within weeks I had hired Marty Abbott, who had worked for me at Gateway, to run Operations, and by November I had hired Lynn Reedy to run Development.

Don’t have a lot of pride

It’s not about hierarchy. The best answers can come from anywhere. What’s most important is to solve the problems fast and prevent their recurrence. Are you getting better every day?

Minimize impact for customers

At eBay we used drives from a big vendor that, when they crashed, would recycle and recover on their own. The vendor recommended we rely on that process, but we couldn’t afford the wait. So we created an intensive interim plan until a firmware fix was ready: 24 hours a day, we kept people watching for the warning that would flash before a crash, so they could take a disk out of service before the crash happened. It was a highly labor-intensive solution, later automated with the software fix, but in the meantime we had to do everything within our power to reduce how customers were affected.
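The interim “watch and pull” process described above can be sketched as a single monitoring pass. The hooks here are hypothetical stand-ins: `is_warning` plays the role of the people watching for the vendor's pre-crash warning, and `take_out_of_service` is whatever mechanism removes a drive from rotation.

```python
# A minimal sketch of the "watch and pull" interim process, assuming
# hypothetical is_warning / take_out_of_service hooks (not a real drive API).

def sweep(disk_ids, is_warning, take_out_of_service):
    """One monitoring pass: pull any disk whose pre-crash warning is flashing."""
    pulled = []
    for disk_id in disk_ids:
        if is_warning(disk_id):           # the warning that flashed before a crash
            take_out_of_service(disk_id)  # remove the disk before it actually fails
            pulled.append(disk_id)
    return pulled

# Demo with a fake fleet: only disk "b" is showing the warning.
warned = {"b"}
removed = []
print(sweep(["a", "b", "c"], warned.__contains__, removed.append))  # → ['b']
```

Run in a loop, a pass like this is essentially what the later firmware and software fix automated; the manual version had humans doing the `is_warning` check around the clock.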

Get management’s support

Have management on high alert and present until permanent solutions are in place. The best way to handle crises is to stay calm and provide encouragement, but be relentless about getting the immediate issue fixed.

Communicate continually — to customers, vendors, employees

Create a culture of transparency and accountability. If we had a problem, we always communicated it to our community. I wrote personal updates on issues to the management team, and every week I gave status reports on what went well and what didn’t. Be careful about what you say: be truthful, but keep in mind that what you think is going to happen, or what you believe is causing the issue, often turns out to be something else. At eBay, and later at other companies, we built a “trust site” and dashboard that gave the community full transparency into what was happening on the site.

Postmortems are essential

Once the problem is solved, figuring out what went wrong (was it an execution issue, a vendor or product issue, a software bug, an external event?) and how to ensure it won’t recur is essential. Remember, great companies want to be world class at dealing with crises, but they want to be even better at never needing to show it.
