Thursday, August 23, 2007

Skype recounts tale of 'perfect storm' outage

It was a dark and stormy upgrade, and it won't happen again, they say

Peter Sayer

August 21, 2007 (IDG News Service) -- The situation that prevented millions of people from accessing Skype Ltd.'s Internet telephony service late last week was a "perfect storm" and should not reoccur, the company said Tuesday.

The company initially attributed the problem, which began on Aug. 16, to the near-simultaneous rebooting of millions of computers, as Skype users running the Windows operating system attempted to reconnect to the service after downloading a series of routine software patches from Microsoft Corp.'s Windows Update service.

Skype's service relies on some of its users' computers to act as "supernodes," routing traffic for other, less well-connected users. But as Skype customers tried to reconnect, many of those supernodes were themselves in the process of rebooting. The remaining supernodes were soon overwhelmed because a bug in the company's software kept it from efficiently allocating the available network resources.

Users were skeptical of this explanation. Microsoft regularly issues patches that may cause Windows computers to reboot, and they haven't caused problems for Skype before. Microsoft releases software updates on the second Tuesday of each month, a day known to systems administrators as "Patch Tuesday."

Skype spokesman Villu Arak offered a more detailed explanation of Skype's outage on Tuesday: Last week's problems were the result of a "perfect storm" of exceptionally high traffic through the service at the same time as the Windows Update process led to a shortage of supernodes in the service's peer-to-peer network.

The company did not offer an explanation for the high traffic, but accepted full responsibility for the software problem.

"Skype and Microsoft engineers went through the list of patches that had been pushed out," Arak wrote. "We ruled each one out as a possible cause for Skype's problems. We also walked through the standard Windows Update process to understand it better and to ensure that nothing in the process had changed from the past (and nothing had)."

The catastrophic effect on Skype's service was entirely Skype's fault -- a result of its software being unable to deal with simultaneous high load and supernode rebooting, according to Arak.

On Aug. 17, the day after the problems began, Skype released a new version of its software client for Windows to correct the problem. That update should behave better the next time high traffic coincides with a scarcity of supernodes, he said.

Skype had updated versions of its software client for Windows, Mac and Linux since July's Patch Tuesday and before last week's outage, but the changes made in those updates were not responsible for the problem, according to company spokeswoman Imogen Bailey.

Reprinted with permission from idg.net
Story copyright 2006 International Data Group. All rights reserved.
