About the Internet Health Report

The Internet is a network of networks with no single owner or manager. As a result, there has always been a lack of high-level information on the status and performance of the Internet as a whole. The Internet Health Report from Keynote Systems gives network engineers and corporate web site managers a unique service for monitoring and diagnosing the Internet.

How it Works
The Internet Health Report delivers data about network performance (latency) between major United States Internet backbones. The measurement agents selected for the Internet Health Report have direct, single-homed connections to a backbone, so their connectivity is unambiguous. Every 15 minutes, each measurement agent connects to every other agent and measures the latency (delay) of establishing a TCP connection across the Internet.

The resulting data provides a logical performance map of the Internet. The Internet Health Report matrix is updated every 15 minutes. Any time the delay between two networks exceeds a threshold, a color-coded alert indicates the level of the performance problem. Users can drill down to see specifically where the delays are occurring by clicking on any of the color-coded boxes in the display.

The numbers in the display represent latency between networks in milliseconds. The number in each box is the geometric mean of all the data points collected between the networks during the specified interval.

Interpretation

Top-level Matrix

At the top level, the matrix shows aggregate latency between all the agents on one backbone and all the agents on another. The color of the cell represents the rating of the highest latency among all cells in the second-level matrix. The color of the number represents the rating of the geometric mean of the data that makes up the second-level matrix.

[Figure: Internet Health Report]

As an example, consider the following top-level matrix cell:

[Figure: Top Level Matrix Cell]

and the second-level matrix that sits beneath it:

[Figure: Second Level Matrix Cell]

We visualize the data this way so that you can determine at a glance the overall performance between a pair of backbones (by looking at the color of the number on the top-level matrix) as well as the poorest performance between any pair of agents on the pair of backbones (by looking at the color of the cell).
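The two colors of a top-level cell can be sketched in Python as follows. This is only an illustration of the description above, not Keynote's actual code: the function and key names are hypothetical, the color bounds are taken from the Thresholds section, and treating each bound as inclusive is an assumption.

```python
import math

# Upper bounds in milliseconds taken from the Thresholds section;
# treating each bound as inclusive is an assumption.
RATINGS = [(90.0, "Green"), (120.0, "Blue"), (180.0, "Yellow")]


def rating(latency_ms):
    """Map a latency in ms to a color; anything above 180 ms is Red."""
    for upper_bound_ms, color in RATINGS:
        if latency_ms <= upper_bound_ms:
            return color
    return "Red"


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(x)))."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def top_level_cell(samples_by_agent_pair):
    """Summarize one backbone pair.

    samples_by_agent_pair maps a hypothetical (agent_a, agent_b) key to the
    list of latencies in ms measured between those two agents.
    """
    second_level = {
        pair: geometric_mean(samples)
        for pair, samples in samples_by_agent_pair.items()
        if samples  # pairs with no data would be black cells; skip them here
    }
    pooled = [ms for samples in samples_by_agent_pair.values() for ms in samples]
    number = geometric_mean(pooled)
    return {
        "number_ms": number,
        "number_color": rating(number),
        "cell_color": rating(max(second_level.values())),
    }
```

With made-up samples of 40 and 50 ms for one agent pair and 200 and 220 ms for another, the pooled geometric mean is about 97 ms, so the number would be Blue, while the slower agent pair averages about 210 ms, so the cell would be Red.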

Since the measurements are round-trip delays, the top-level matrix is symmetric around the upper-left to lower-right diagonal. The numbers for a given pair of backbones will be the same on either side of the diagonal.

Second-level Matrix
To drill down to the second-level matrix, click on the cell of interest in the top-level matrix:

[Figure: Second Level Matrix Cell]

Each cell at this level reports the geometric mean of all data points taken in the last hour (or last day, depending on your selection) for each agent pair. At this level, the color of the number and the color of the cell will always be the same.

You may occasionally see a black cell in the matrix:

[Figure: Black Cell in the Matrix]

This indicates that we have no data for that particular agent pair over the time period you are examining. This can happen for a number of reasons, including loss of connectivity, agent failure, or Keynote database maintenance.

A Note on Comparison
Please note that using the top-level grid to compare the internal performance of different backbones is problematic. Because we use existing Keynote Perspective agents to take these measurements, there is not necessarily an equal number of measurement points on each measured backbone. In addition, the absolute physical distance between agents varies widely between backbones. For example, consider comparing a backbone that has two agent locations on opposite coasts to one that has two agent locations on the east coast. The latter backbone will report shorter latencies simply because there is less distance between the agents.

As we add additional agents to the measured backbones, this will become less of a problem.

Methodology

Measurements
Keynote maintains a network of measurement agents in over 50 cities around the world. These agents are placed on backbones that are statistically selected to represent the average experience of a business-to-business end user. Each location (a city-backbone pair) is connected to the selected backbone over a dedicated DS3 circuit or a dedicated Ethernet connection. A list of all our agent locations is available here.

We have selected up to four agents on six US backbones for the initial release of Internet Health Report. Over time, we plan on adding additional agents per backbone, additional backbones, international measurements, and backbone-to-hosting facility measurements.

Every fifteen minutes each agent measures the time it takes to perform a TCP Open to every agent in the matrix, including itself. These measurements are randomly distributed across each fifteen minute period. The times are reported back to Keynote's central database.

We use TCP Open time rather than ICMP (Ping) time because the TCP Open is a much better representation of actual Internet user experience. The most-used Internet applications (Web, Email, Telnet, ...) begin by performing a TCP Open. Thus, all the intervening routers will treat this measurement just as they treat actual end-user requests, and not in some special way that may be triggered by the use of ICMP packets.
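A TCP Open timing of the kind described above can be sketched in Python as follows. This is a minimal illustration, not the code the Keynote agents run: the function name, the timeout value, and the choice of clock are all assumptions.

```python
import socket
import time


def tcp_open_latency_ms(host, port, timeout_s=10.0):
    """Return the time in milliseconds taken to establish a TCP connection.

    Returns None if the connection cannot be established within timeout_s
    (an assumed value) or is refused.
    """
    start = time.perf_counter()
    try:
        # create_connection returns once the TCP three-way handshake
        # (the "TCP Open") has completed.
        with socket.create_connection((host, port), timeout=timeout_s):
            elapsed_s = time.perf_counter() - start
    except OSError:
        return None
    return elapsed_s * 1000.0
```

Passing an IP address rather than a host name keeps DNS lookup time out of the measurement, since `socket.create_connection` resolves names before connecting.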

We do not currently measure packet loss directly. Packet loss does show up indirectly, however: the loss of a packet during the TCP Open handshake results in a much longer latency, because the sender retransmits the lost packet only after a timeout.

Reporting
Every fifteen minutes a report is run against the database to extract every data point for the previous hour and the previous day. These data points are then grouped by agent pair, a geometric mean is taken for each pair, and the Internet Health Report pages are generated and copied to the web server for public view.
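The grouping step of such a report could look roughly like the Python sketch below. The record layout (timestamp in seconds, source agent, target agent, latency in ms) and the function name are hypothetical; the actual Keynote database schema is not described here.

```python
import math
from collections import defaultdict


def build_report(data_points, now_s, window_s=3600):
    """Return {(source, target): geometric mean latency in ms}.

    data_points is an iterable of hypothetical records
    (timestamp_s, source_agent, target_agent, latency_ms).
    Only records from the last window_s seconds are used
    (3600 for the hourly view, 86400 for the daily view).
    """
    logs_by_pair = defaultdict(list)
    for timestamp_s, source, target, latency_ms in data_points:
        if now_s - window_s <= timestamp_s <= now_s:
            logs_by_pair[(source, target)].append(math.log(latency_ms))
    # Geometric mean per agent pair: exponentiate the mean of the logs.
    return {
        pair: math.exp(sum(logs) / len(logs))
        for pair, logs in logs_by_pair.items()
    }
```

An agent pair with no data points in the window is simply absent from the result, which corresponds to a black cell in the matrix.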

We use geometric means because Internet latencies are not normally distributed. They tend to have a "heavy-tailed" distribution: most measurements cluster around a relatively small value (say, 50 ms), with a few measurements at a very large value (say, 400 ms). As a result, the arithmetic mean (simple average), which suits normally distributed data, does not give the best view of actual performance.

As an example, consider the situation of five houses in a neighborhood selling for $100K, and one selling for $10M. The average house price would be $1.75M (not an optimal representation of actual house prices - the expensive house pulls the average too high), the median house price would be $100K (not an optimal representation either, as the information added by the price of the expensive house is lost), while the geometric mean would be $215K (much closer to the cost of the five houses, and the information about the expensive house is not lost).

To calculate a geometric mean, we take the log of each data point, average the resulting numbers, and then raise the base of the logarithm to the power of that average. Since the same base is used for the logarithm and the exponentiation, it does not matter which base it is: the exponentiation cancels the logarithm.
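The calculation can be sketched in Python as follows, using the house-price example as input. The function name and the `base` parameter are illustrative choices, not part of the Internet Health Report itself.

```python
import math


def geometric_mean(values, base=math.e):
    """Geometric mean: log each value, average the logs, exponentiate.

    All values must be greater than zero. The base is arbitrary because
    the exponentiation cancels the logarithm.
    """
    if not values:
        raise ValueError("geometric mean needs at least one data point")
    mean_log = sum(math.log(v, base) for v in values) / len(values)
    return base ** mean_log


# Five houses at $100K and one at $10M, as in the example above.
prices = [100_000] * 5 + [10_000_000]
arithmetic_mean = sum(prices) / len(prices)  # 1,750,000
house_gm = geometric_mean(prices)            # roughly 215,000
house_gm_base10 = geometric_mean(prices, base=10)
```

Apart from floating-point rounding, `house_gm` and `house_gm_base10` agree, which illustrates that the choice of base does not affect the result.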

Thresholds
We have set the latency thresholds at up to 90 ms for Green (Healthy), up to 120 ms for Blue (Stable), up to 180 ms for Yellow (Severe), and greater than 180 ms for Red (Critical). The 90 ms Green threshold is based on the bi-coastal round-trip time for light through fiber (40 ms per ITU-T G.114) plus an allowance of 50 ms for propagation delay through intervening routers. We may change these thresholds over time as overall Internet performance improves, and may use different thresholds for international connectivity measurements.
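As a rough illustration, the thresholds above could be applied like this in Python. The function and constant names are hypothetical, and treating each threshold as an inclusive upper bound is an assumption (the text only states explicitly that Red is "greater than 180").

```python
FIBER_ROUND_TRIP_MS = 40  # bi-coastal round trip through fiber, per the text
ROUTER_ALLOWANCE_MS = 50  # allowance for intervening routers, per the text
GREEN_LIMIT_MS = FIBER_ROUND_TRIP_MS + ROUTER_ALLOWANCE_MS  # 40 + 50 = 90


def rate_latency(latency_ms):
    """Return the color rating for a latency in milliseconds.

    Upper bounds are treated as inclusive, which is an assumption.
    """
    if latency_ms <= GREEN_LIMIT_MS:
        return "Green (Healthy)"
    if latency_ms <= 120:
        return "Blue (Stable)"
    if latency_ms <= 180:
        return "Yellow (Severe)"
    return "Red (Critical)"
```

Under these assumptions a 50 ms measurement rates Green and a 400 ms measurement rates Red.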


Contacting Us
For more information on Keynote products and services, please contact Keynote Sales at sales@keynote.com or 1-888-KEYNOTE (539-6683).

For comments or suggestions for the Internet Health Report, please send email to inthealth@keynote.com.

Press or analysts may contact Keynote Public Relations at press@keynote.com or 650-403-3254.