A World of Content on Every Web Site
Solving the Performance Challenges for Media and Portal Sites
In April 2009, Google sent a video reporter to Times Square to ask passers-by a simple question: “What is a browser?” Fewer than 8 percent of the respondents answered correctly. It would be interesting to follow up this informal survey with the similarly disarming question, “What is a Web site?” No doubt the average Internet user would find that question equally challenging, if not more so.
Wikipedia (see accompanying interview) defines a Web site this way:
A website (also spelled Web site or web site) is a collection of related web pages, images, videos or other digital assets that are addressed with a common domain name or IP address in an Internet Protocol-based network. A web site is hosted on at least one web server, accessible via a network such as the Internet or a private local area network.
A web page is a document, typically written in plain text interspersed with formatting instructions of Hypertext Markup Language (HTML, XHTML). A web page may incorporate elements from other websites with suitable markup anchors.
That definition encompasses the many elements and permutations that make up a Web page or site. But to the average Internet user, a Web site is a single destination that delivers information or entertainment in various forms — video streams, photos, localized weather, feature stories. What does it matter where the content is coming from? For the average user (emphasis on average), all the content is coming from the one site they’re visiting. But as anyone who works in the industry is well aware, the content on any given site can originate from a number of sources.
Media Mash-Ups and Personalized Portals
Some of the most complicated sites on the Web are media outlets for news, sports, and entertainment, along with portal sites such as AOL, Yahoo!, and iGoogle. These destinations define “rich” content, loaded as they are with video streams, Flash movies, news feeds, tweets and, of course, advertising. Some of these mash-ups are completely transparent. Google News, for example, clearly attributes its stories to their sources; none of its content is original. On other sites, pulled-in content is less obvious. And in most cases, content is pulled in from multiple sources with no visible sign to the viewer.
Today’s Web is not yet the “semantic Web” that Sir Tim Berners-Lee, the man credited with inventing the World Wide Web, envisioned a decade ago: a Web where machines can understand, analyze, and combine information in usable ways without human intervention. Still, today’s Web relies heavily on the free and automatic exchange of content among unrelated sites. Whether it’s external news feeds, product information from a manufacturer, live updates from Twitter, or the ubiquitous ad banners and boxes, more and more Web sites populate their pages with content that comes from somewhere else on the Web, including owned or outsourced content delivery networks and the site owner’s own affiliated domains. For webmasters, this creates complicated site performance and user experience challenges.
The Complexities of Content
“Gone are the days of a simple Web page with one or two people updating it,” says Shawn White, director of external operations for Keynote. “Nowadays, you have dozens, if not hundreds, of servers and computers that are all over the world trying to serve up this content as fast as possible, and being updated by any number of people, including the public. For Web operations and IT managers who are responsible for uptimes and availability, it just makes things a whole lot more complex.”
In those days, around the turn of the century, when sites created and pushed out content more or less from a single source, performance was in the site owners’ hands. They were responsible for implementing and configuring the capacity they needed to handle their expected traffic, and when performance lapsed, they looked in the mirror to find the causes and solutions. After 9/11, for example, many of the major news sites went down for hours or days. They simply weren’t designed to handle the tremendous surge in visitors as Americans flocked to the Internet for news and updates. These were perhaps the largest spontaneous flash crowds the adolescent Internet had seen, and many sites’ inability to handle them was painfully obvious. But the problems were mainly ones of internal capacity and external bandwidth.
Fast-forward to today. In the less than a decade since 9/11, the capacity of the Internet as a whole, and of individual Web sites, to handle traffic has grown enormously, as have user expectations that sites will be available and fast 24/7/365. Today, Web sites are no longer single-source, singly hosted affairs; content is often fed from multiple external sources to populate a page. Bandwidth-intensive, processor-hungry video is everywhere and is the lifeblood of many media sites. Flash-crowd events large and small are not uncommon, and by and large, most sites take them reasonably in stride. Site crashes are much rarer, even in the crush of traffic after a tsunami, a historic election, or a plane landing in the Hudson. Site performance, however, can still degrade significantly under a major surge in traffic.
Rooting Out Page-Load Problems
One recent event brought many sites to their knees: the death of pop icon Michael Jackson. Akamai reported an 11 percent spike in worldwide Internet traffic in the hour the news broke (source: Adotas.com, “Ad Networks, Not Websites, Choked on Michael Jackson News,” by Edward Barrera, July 1, 2009). Availability for major news outlets tracked in the Keynote Performance Index plummeted to as low as 10 percent. The Los Angeles Times, Jackson’s hometown news site, had significant problems.
Analysis revealed, however, that in a number of cases the problem was not the sites’ inability to handle the surge of traffic, but rather the inability of the third-party servers delivering ads to those sites to keep up with demand. Pages froze while they waited for the ads to load, and users had to wait for their news, if they got it at all.
The Michael Jackson story is a dramatic example, but third-party content can eat away at site performance every day. Ads can be notoriously slow to load, but ads are not the only culprit. Twitter feeds, linked content from other sites, page assets delivered by content delivery networks, even Google Analytics code embedded in pages can all slow a site down to uncompetitive, if not unacceptable, levels.
You can’t have a media site without video, and apparently, if a little video is good, a lot of video is better. It’s the heart and soul of entertainment sites. It’s de rigueur for the broadcast news networks. And the Web has given traditional print journalism brands the opportunity to compete on broadcast journalism’s video turf. New technology has made it almost as easy to shoot, edit and post a video online as to prepare a written story with accompanying photos. Online media sites, with help from YouTube, have enabled a mass Web audience that prefers to watch rather than read.
There’s also no faster way to lose an audience than with a video stream that stutters and constantly stops to rebuffer. But again, monitoring streams from multiple servers or domains, and understanding actual end-user performance, is a significant test and measurement challenge.
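One common way to put a number on a stuttering stream is a rebuffering ratio: the share of a viewing session spent stalled rather than playing. The sketch below assumes a player that logs hypothetical "play" and "stall" intervals; it illustrates the metric only and is not how any particular monitoring product computes it.

```python
from dataclasses import dataclass


@dataclass
class PlaybackEvent:
    kind: str     # hypothetical event names: "play" or "stall"
    start: float  # seconds since the session started
    end: float


def rebuffer_ratio(events: list[PlaybackEvent]) -> float:
    """Fraction of session time spent stalled (rebuffering)."""
    stalled = sum(e.end - e.start for e in events if e.kind == "stall")
    playing = sum(e.end - e.start for e in events if e.kind == "play")
    total = stalled + playing
    return stalled / total if total > 0 else 0.0  # empty session counts as 0


# Hypothetical session: 12 s of play, a 3 s stall, then 15 s more of play
session = [
    PlaybackEvent("play", 0.0, 12.0),
    PlaybackEvent("stall", 12.0, 15.0),
    PlaybackEvent("play", 15.0, 30.0),
]
print(f"{rebuffer_ratio(session):.1%}")  # 10.0%
```

Measuring the same ratio per serving domain or CDN would be one way to see which source is behind the stutter.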
Who’s on First? What’s on Second?
Site owners are more pressured than ever to deliver the fast, flawless experiences users now demand, and can often find at a competitor’s site. Monitoring and measuring their performance is no longer the simple task of measuring overall page load time. Knowing only that the site is running slowly gives a webmaster nothing to act on. Is it their own content? The CDN that’s pushing out their videos? The sister site that’s hosting their image library? The Flash banner promoting upcoming programming on their TV network? Or the ad network servers that supply the bulk of the site’s revenue? How does the site owner identify the bottlenecks, and gain actionable data to demand better performance from weak providers in the content chain?
Measuring for Management
“When a site is being updated from different sources, you have to be able to figure out where your slowdowns are happening,” White explains. “Is it happening with a third-party ad? Is it with a Twitter or RSS feed? Is it with Flash or some other content that’s being uploaded?
“The bottom line is, you can’t manage what you don’t measure. First you have to determine what’s normal for your site. You need a benchmark for what’s normal for you — and that can vary by the hour or day of the week depending on your traffic patterns — and it’s also helpful to benchmark against your competition.”
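White’s point that “normal” varies by hour and day of the week can be made concrete with a baseline keyed on (weekday, hour). The sketch below takes the median load time for each slot from history and flags a new measurement that is more than a chosen multiple of its slot’s baseline. The 1.5× tolerance and the sample data are hypothetical choices for illustration, not a recommended threshold.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def build_baseline(samples):
    """samples: iterable of (timestamp, load_seconds).
    Returns {(weekday, hour): median load time} where Monday is weekday 0."""
    buckets = defaultdict(list)
    for ts, secs in samples:
        buckets[(ts.weekday(), ts.hour)].append(secs)
    return {slot: median(times) for slot, times in buckets.items()}


def is_abnormal(baseline, ts, secs, tolerance=1.5):
    """Flag a measurement slower than tolerance x the baseline for its slot."""
    normal = baseline.get((ts.weekday(), ts.hour))
    if normal is None:
        return False  # no history for this slot yet, so nothing to compare
    return secs > tolerance * normal


# Hypothetical history: three Mondays at 09:00
history = [
    (datetime(2009, 6, 22, 9), 2.0),
    (datetime(2009, 6, 29, 9), 2.4),
    (datetime(2009, 7, 6, 9), 2.2),
]
baseline = build_baseline(history)
print(baseline[(0, 9)])                                       # 2.2
print(is_abnormal(baseline, datetime(2009, 7, 13, 9), 4.0))   # True
```

The same baseline could be built from a competitor’s measurements to benchmark against the competition, as White suggests.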
Keynote offers services that drill down into overall page performance to provide sub-data that reflects the mosaic of content types and sources that typify complex media and portal Web pages. Using a suite of fast, simple-to-use tools, individual page components can be isolated or grouped into measurable “virtual pages,” so that the performance impact of each can be specifically characterized. Page components can be filtered by a variety of criteria, including domain, page element, size, and more. Third-party content providers, and internal resource managers, can then be held accountable for any performance shortcomings. Or the construction of the page itself can be tweaked for greater responsiveness.
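The idea of grouping components into “virtual pages” by domain can be sketched in a few lines. The example below assumes per-component measurements are already available as (URL, bytes, milliseconds) tuples; the hostnames and numbers are hypothetical, and this is not Keynote’s implementation. Because browsers fetch components in parallel, the per-group milliseconds are total download time attributed to that source, not its share of wall-clock page time.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical per-component measurements: (url, bytes, milliseconds)
components = [
    ("http://news.example.com/index.html", 45_000, 300),
    ("http://news.example.com/style.css", 12_000, 80),
    ("http://ads.example.net/banner.js", 30_000, 1_900),
    ("http://cdn.example.org/video/player.swf", 250_000, 700),
]


def virtual_pages(components, key=lambda url: urlsplit(url).hostname):
    """Group components by `key` (hostname by default) and total count,
    bytes, and download time per group."""
    groups = defaultdict(lambda: {"count": 0, "bytes": 0, "ms": 0})
    for url, size, ms in components:
        group = groups[key(url)]
        group["count"] += 1
        group["bytes"] += size
        group["ms"] += ms
    return dict(groups)


# Slowest source first
for host, stats in sorted(virtual_pages(components).items(),
                          key=lambda kv: -kv[1]["ms"]):
    print(f"{host:24} {stats['count']} items {stats['bytes']:>8} B {stats['ms']:>6} ms")
```

Passing a different `key` function (for example, one that maps URLs to content types) would group the same data by another criterion.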
“There are a number of ways that IT managers, webmasters and Web developers can implement improvements,” White says. “It’s surprising that there are still a lot of people who don’t use these fundamental tricks of the trade; either they just don’t know about them or it hasn’t been as big of an issue.
“So they’re preloading advertisements or putting ads first in the code; one fix could be as simple as making the ads last to load on the page. Using Keynote tools, you can make adjustments and measure to see if it has any effect, and repeat that process on various components until the page actually meets your requirements for responsiveness.”
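Why moving ads to the end can help is easiest to see in a deliberately simplified model in which every component blocks the next, as synchronously loaded scripts can. The component names and durations below are hypothetical; real browsers overlap many downloads, so this shows only the direction of the effect, which is exactly what the measure-and-repeat loop White describes would confirm on a real page.

```python
def content_ready_time(components, target="article"):
    """Seconds until `target` finishes, assuming each component must finish
    loading before the next one starts (fully blocking, in page order)."""
    elapsed = 0.0
    for name, seconds in components:
        elapsed += seconds
        if name == target:
            return elapsed
    raise ValueError(f"{target!r} is not on the page")


# Hypothetical load times in seconds; the slow ad server dominates
ads_first = [("ad", 3.0), ("article", 1.0), ("video", 2.0)]
ads_last = [("article", 1.0), ("video", 2.0), ("ad", 3.0)]
print(content_ready_time(ads_first))  # 4.0
print(content_ready_time(ads_last))   # 1.0
```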
Mobile Web the New Mainstream?
Love it or hate it, the iPhone has dramatically changed the way masses of consumers use the Web. For media companies and online portals like AOL and Google, an online presence is not complete without a robust mobile site and/or application. Delivering an exceptional user experience on a mobile device poses the same challenges as delivering content to computers, with the added complexity of hundreds and hundreds of device profiles and the bandwidth limits of cellular signals. And again, the challenge lies not only with the site owner’s hosted content, but with third-party feeds including ads and videos. How do you measure what the end user is actually experiencing?
“Can you emulate that experience, or do you need to do it from a real iPhone?” White asks. “With Keynote’s Mobile Device Perspective, we have a network of real, actual iPhones around the world. We have real iPhones connected to computers, so we can take a recorded transaction or scenario. We can set up a script that says load the CNN app and click on the first headline, and how long does that take?
“We also have a service where we can emulate a phone — or 1,600 different phones — and do similar types of things. The advantage of our emulated service is that we get more details about the network — signal strength, what cell tower is the signal coming through.”
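A scripted transaction like “load the app, then open the first headline” reduces to timing each step in order. In the sketch below, `launch_app` and `tap_first_headline` are hypothetical placeholders that merely sleep; in practice they would drive a real device or an emulator through whatever automation interface is available. This illustrates step-level timing, not Keynote’s Mobile Device Perspective.

```python
import time
from typing import Callable


def run_transaction(steps: list[tuple[str, Callable[[], None]]]) -> dict[str, float]:
    """Run scripted steps in order and record each step's wall-clock duration."""
    timings: dict[str, float] = {}
    for name, action in steps:
        start = time.perf_counter()
        action()
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())  # summed before "total" is added
    return timings


# Hypothetical stand-ins for real device automation
def launch_app() -> None:
    time.sleep(0.2)


def tap_first_headline() -> None:
    time.sleep(0.1)


result = run_transaction([
    ("launch app", launch_app),
    ("open first headline", tap_first_headline),
])
for step, secs in result.items():
    print(f"{step}: {secs:.2f} s")
```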
Performance Pays — Or Not
Slow page loads make for a bad user experience that can cause visitors to abandon sites. Recent studies suggest that visitors expect a page to load in just two seconds. So ad delivery that slows page performance down, or videos that take forever to stream, have a real financial impact. The site owner potentially loses revenue because they are delivering less traffic to the advertiser. The ad networks take a hit because it lowers the number of eyeballs they are delivering as well. And the advertisers themselves are not getting the exposure they are counting on to market their products or services.
All three parties then — site owner, ad network and advertiser — have a stake in understanding where the performance issues lie. With accurate performance data in hand, site owners can demand that ad networks perform to their minimum standards, or they can switch their sites to competitive networks (after making sure, that is, that their own page construction is optimized for best performance). Ad networks, in turn, can use the data to improve their delivery or to demonstrate to clients that they are delivering as promised. And advertisers can know if their message is getting out, and if it isn’t, they can explore alternate channels for their advertising.
Best Practices for Page Component Testing
The only accurate way to gauge page performance for end users is with live testing, in the field, using real browsers located across all the geography being served. Keynote’s testing network includes some 3,000 “typical” computers running Internet Explorer in 80 countries around the world. Tests can be constructed to simply measure home page load times, or to measure a specified task sequence. And with “virtual page” testing, individual page components — ad feeds or Flash movies, for example — can be individually benchmarked.
“With the data from this kind of testing,” White says, “IT and site managers can find out, where are my slowdowns? Are my slowdowns regional? Is it East Coast versus West Coast? Is it third-party feeds that are hanging my pages up? Or is it the ISP in a particular region?
“At the end of the day, you have to test in Johannesburg to know how your site is performing in Johannesburg. There’s nothing that beats the real thing. And that’s what our testing products do. We go to great lengths to make it as real-life as possible.”
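Answering “are my slowdowns regional?” amounts to comparing each location’s typical load time against the overall picture. The sketch below flags regions whose median load time exceeds a chosen multiple of the overall median; the region names, measurements, and 1.5× factor are all hypothetical, standing in for results gathered from real browsers in each location.

```python
from collections import defaultdict
from statistics import median

# Hypothetical field measurements: (region, page load seconds)
measurements = [
    ("US-East", 1.8), ("US-East", 2.0),
    ("US-West", 1.9), ("US-West", 2.1),
    ("Johannesburg", 4.5), ("Johannesburg", 5.5),
]


def slow_regions(measurements, factor=1.5):
    """Return {region: median} for regions whose median load time exceeds
    factor x the overall median across all regions."""
    overall = median(secs for _, secs in measurements)
    by_region = defaultdict(list)
    for region, secs in measurements:
        by_region[region].append(secs)
    medians = {region: median(times) for region, times in by_region.items()}
    return {region: m for region, m in medians.items() if m > factor * overall}


print(slow_regions(measurements))  # {'Johannesburg': 5.0}
```

A flagged region would then be drilled into by domain, as in a virtual-page breakdown, to tell a slow third-party feed from a slow regional ISP.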