An Architecture for Content Routing Support in the Internet
Mark Gritter, David R. Cheriton
Computer Science Department
Stanford University
{mgritter,cheriton}@dsg.stanford.edu
Abstract
The primary use of the Internet is content distribu-
tion — the delivery of web pages, audio, and video
to client applications — yet the Internet was never
architected for scalable content delivery. The re-
sult has been a proliferation of proprietary protocols
and ad hoc mechanisms to meet growing content de-
mand.
In this paper, we describe a content routing design
based on name-based routing as part of an explicit
Internet content layer. We claim that this content
routing is a natural extension of current Internet di-
rectory and routing systems, allows efficient content
location, and can be implemented to scale with the
Internet.
1 Introduction
With the emergence of the World Wide Web, the
primary use of the Internet is content distribution,
primarily in the form of web pages, but increasingly
audio and video streams as well. Some measure-
ments indicate that 70 to 80 percent of wide-area
Internet traffic is HTTP traffic. Much of the remain-
der consists of RealAudio streams and DNS [4, 12].
That is, almost all of the traffic in the wide area is
delivery of content, and ancillary traffic to locate it.
Today, millions of clients are accessing thousands of
web sites on a daily basis, with relatively few popu-
lar sites supplying a large proportion of the traffic.
Moreover, new popular web sites and temporarily
attractive web sites can prompt the sudden arrival
of a “flash crowd” of clients, often overwhelming the
resources of the associated servers.
To scale content delivery to support these demands,
more and more content providers and content host-
ing sites are replicated at multiple geographically
dispersed sites around the world, either on-demand
(i.e. caching) or by explicit replication. The problem
is then to route client requests to a nearby replica of
the content, the content routing problem.
In this paper, we argue that current content rout-
ing designs are unsuitable due to their closed na-
ture and scalability limits. Section 3 describes a
content routing system that forms part of an ex-
plicit Internet content layer, and we claim that this
system provides better latency and scalability than
current approaches. We support these arguments by
an analysis of the scalability of name-based routing
in section 4, and close with a description of related
work, future goals, and conclusions.
2 The Content Routing Problem
The goal of content routing is to reduce the time
needed to access content. This is accomplished by
directing a client to one of many possible content
servers; the particular server for each client is chosen
to reduce round-trip latency, avoid congested points
in the network, and prevent servers from becoming
overloaded. These content servers may be complete
replicas of a popular web site or web caches which
retrieve content on demand.
Currently, a variety of ad hoc and, in some cases,
proprietary mechanisms and protocols have been de-
ployed for content routing. In the basic approach,
the domain name of the desired web site or con-
tent volume is handled by a specialized name server.
When the client initiates a name lookup on the DNS
portion of the content URL, the request goes to this
specialized name server which then returns the ad-
dress of a server “near” the client, based on special-
ized routing, load monitoring and Internet “map-
ping” mechanisms. There may be multiple levels of
redirection, so that the initial name lookup returns
the address of a local name server which returns the
actual server to be used, and the client must send
out additional DNS requests.
As shown in figure 1, a client which misses in the DNS cache first incurs the round-trip time to access the DNS root server, to obtain the address of the authoritative name server for a site, e.g. microsoft.com. Next the client must query this name server to receive the address of a nearby content server, incurring another round-trip time. Finally, it incurs the round-trip time to access the content on the designated server. If in this example the client is located in Turkey, the first round-trip is likely to go to Norway or London. The second round-trip may have to travel as far as Redmond, Washington, and the final might be to a content distribution site in Germany.

[Figure 1: Conventional Content Routing. A client queries the root DNS server and then a site's DNS server across the Internet before contacting the content server.]
Thus, the conventional content routing design does
not scale well because it requires the world-wide
clients of a site, on a cache miss, to incur the long
round-trip time to a centralized name server as part
of accessing that site, from wherever the client is lo-
cated. These round-trip times are purely overhead,
and are potentially far higher than the round-trip
times to the content server itself; this long latency
becomes the dominant performance issue for clients
as Internet data rates move to multiple gigabits, re-
ducing the transfer time for content to insignificance.
These name requests may also use congested por-
tions of the network that the content delivery system
is otherwise designed to avoid.
DNS-based content routing systems typically use short time-to-live (TTL) values on the address records they return to a client, in order to respond quickly to
changes in network conditions. This places addi-
tional demands on the DNS system, since name re-
quests must be sent more frequently, increasing load
on DNS servers. This can lead to increased latency
due to server loads, as well as increased probability
of a dropped packet and costly DNS timeout. As
shown in [5], DNS lookup can be a significant por-
tion of web transaction latency.
Both of these problems can be ameliorated by intro-
ducing multiple levels of redirection. Higher-level
names (e.g., m.contentdistribution.net) specify
a particular network or group of networks and have a
relatively long time-to-live (30 minutes to an hour).
The records identifying individual servers (such as
s12.m.contentdistribution.net) expire in just
seconds. However, this increases the amount of work
a client (or a client’s name server) must perform on
a “higher-level” cache miss, and requires additional
infrastructure. Also, such a design conflicts with the
desire for high availability, since alternate location
choices are unavailable to a client with cached DNS
records in the event of network failure.
Conventional content routing systems may also suf-
fer from other availability problems. A system which
uses only network-level metrics does not respond
to application-level failure, so a client may be con-
tinually redirected to an unresponsive web server.
Designs which rely upon measurements taken from
server locations may also choose servers which are
not suitable from the client’s perspective, due to
asymmetric routing. A smaller, related, problem is
that DNS requests go through intermediate name
servers, so that the actual location of the client may
be hidden.
Finally, content routing systems may have difficulty
scaling to support multiple content provider net-
works and large numbers of content providers. Some
content providers (such as CNN.com) serve HTML
pages from a central web site but provide graphics
and other high-bandwidth objects from a content de-
livery network; the URLs of these objects are located
under the content delivery network’s domain name.
This has the advantage of increasing the probabil-
ity of DNS cache hits, since the same server location
information (akamai.net, for example) can be used
for other sites. However, it does not help increase
availability of the site’s HTML content or improve
latency to access it. Performing content routing on a
larger set of domain names in order to improve web
latency may result in lower DNS hit ratios, in ad-
dition to the costs of a larger database at a content
delivery network’s name servers.
Obtaining access to the network routing informa-
tion needed to perform content routing may also be
problematic. Content provider networks must ei-
ther obtain routing information from routers near
to their servers (via BGP peering or a proprietary
mechanism) or else make direct network measure-
ments. Both these schemes require aggregating net-
work information for scalability, duplicating the ex-
isting routing functions of the network. It may also
be politically infeasible to obtain the necessary in-
formation from the ISPs hosting content servers.
There is also no clear path for integrating access to
multiple content delivery networks. In order to do
so, a content provider would have to include an ad-
ditional level of indirection to decide which CDN to
direct clients to. This may be infeasible in prac-
tice (for example, if a URL-rewriting scheme is used
to indicate the CDN in use), or at the very best
difficult due to conflicting mechanisms and metrics.
The proprietary approaches to content routing vi-
olate the basic philosophy of the Internet of using
open, community-based standard protocols, impos-
ing a closed overlay on top of the current Internet
that duplicates many of the existing functions in the
Internet, particularly the routing mechanisms.
3 Network-Integrated Content Routing
Our approach to the content routing problem is to
view it as, literally, a routing problem. Clients (and
users) desire connectivity not to a particular server
or IP address but to some piece of content, speci-
fied by name (typically a URL). Replicated servers
can be viewed as offering alternate routes to access
that content, as depicted in Figure 2. That is, the
client can select the path through server 1, server
2 or server 3 to reach the content, assuming each
server is hosting the desired content. Thus, it is the
same multi-path routing problem addressed in the
current Internet in routing to a host.
[Figure 2: Content-Layer Routing. The client can reach the content via alternate paths through server 1, server 2, or server 3.]
Network-integrated content routing provides sup-
port in the core of the Internet to distribute, main-
tain, and make use of information about content
reachability. This is performed by routers which
are extended to support naming. These content
routers (CRs) act as both conventional IP routers
and name servers, and participate in both IP routing
and name-based routing. This integration forms the
basis of the content layer. Not every router need be
a content router; instead, we expect firewalls, gate-
ways, and BGP-level routers to be augmented while
the vast majority of routers are oblivious to the con-
tent layer.
3.1 Content Lookup
[Figure 3: Internet Name Resolution Protocol. A client's query for example.com is forwarded through a chain of content routers toward content servers at 1.2.3.4, 1.4.9.6, and 8.4.2.1; the answer 1.2.3.4 is relayed back along the same path.]
Name lookup is supported by the Internet Name
Resolution Protocol (INRP); this protocol is backward-compatible with DNS, using the same record types
and packet format, but with different underlying se-
mantics. Clients initiate a content request by con-
tacting a local content router, just as they would
contact a preconfigured DNS server. Their requests
may include just the “server” portion of a URL, al-
though in the long run it would be advantageous to
include the entire URL.
Each content router maintains a set of name-to-next-
hop mappings, just as an IP router maps address
prefixes to next hops. (This name routing informa-
tion is maintained using a dynamic routing protocol
detailed below.) When an INRP request arrives, the
desired name is looked up in the name routing table,
and the next hop is chosen based on information as-
sociated with the known routes. The content router
forwards the request to the next content router, and
in this way the request proceeds toward the “best”
content server, as shown in Figure 3. The routing in-
formation kept for a name is typically just the path
of content routers to the content server, although it
may be augmented with load information or metrics
directly measured by a content router.
When an INRP request reaches the content router
adjacent to the “best” content server, that router
sends back a response message containing the ad-
dress of the preferred server. This response is sent
back along the same path of content routers. If no
response appears, intermediate content routers can
select alternate routes and retry the name lookup.
A client application which is INRP-aware can also
request exclusion of a non-responsive server in an
INRP request.
In this fashion, client requests are routed over the
best path to the desired content in the normal case,
yet can recover from a failing server or out-of-date
routing information. INRP thus provides an “any-
cast” capability at the content level, with network
and client control to re-select alternatives based on
direct experience with the chosen server.
Routing is done at the granularity of server names
rather than full URLs (although the latter could be
useful for proxies or content transformers). This de-
cision does limit some possible caching applications,
but provides little practical obstacle to web design-
ers, since directory names from the file portion of
a URL can be moved into the front of the server
name (e.g., http://foo.com/bar/index.html can
become http://bar.foo.com/index.html). Rout-
ing is longest-suffix match, since this can be
much more efficiently performed than other possi-
ble matches on URLs.
Relaying the name lookup request across the same
path as the packets are to flow ensures that naming
is as available as endpoint connectivity— and that
the replica selected is actually reachable. Moreover,
the trust in name lookup matches the trust in deliv-
ery because both depend on the same set of network
nodes. Also, the name lookup load for a path is im-
posed just on the routers on that path, so upgrad-
ing a router on that path for increased data capacity
can also upgrade the name lookup capacity on that
path. Additionally, we are exploring piggybacking
connection setup onto these name lookups, in which case
the name lookup would progress all the way to the
content server itself.
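To make the forwarding step concrete, the following C++ sketch (C++ being the language of our prototype, though the type and function names here are illustrative placeholders rather than the prototype's actual interfaces) shows how a content router might process an INRP request:

#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Illustrative types; the actual implementation is described in section 5.
struct Route {
    uint32_t next_hop;   // IPv4 address of the next content router (or server)
    bool     terminal;   // true if next_hop is the content server itself
};

struct NameRouteTable {
    // Longest-suffix match; routes are returned in preference order
    // (path length, possibly adjusted by load information).
    std::vector<Route> lookup(const std::string& name) const;
};

// Network I/O, elided: forward the request and wait for a relayed response.
std::optional<uint32_t> forward_and_wait(uint32_t next_hop,
                                         const std::string& name);

// Handle one INRP request: move it toward the "best" content server,
// falling back to alternate routes if no response appears.
std::optional<uint32_t> handle_inrp_request(const NameRouteTable& table,
                                            const std::string& name) {
    for (const Route& route : table.lookup(name)) {
        if (route.terminal)
            return route.next_hop;            // adjacent to the best server
        if (auto addr = forward_and_wait(route.next_hop, name))
            return addr;                      // response relayed back toward the client
        // No response: retry the lookup over the next-best route.
    }
    return std::nullopt;                      // no route produced an answer
}

The retry loop is what provides the anycast-with-recovery behavior described above: a failed server or stale route costs one timeout rather than a failed lookup.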
3.2 Name-Based Routing
The Name-Based Routing Protocol (NBRP) per-
forms routing by name with a structure similar to
BGP [15]. Just as BGP distributes address pre-
fix reachability information among autonomous sys-
tems, NBRP distributes name suffix reachability to
content routers. Like BGP, NBRP is a distance-
vector routing algorithm with path information; an
NBRP routing advertisement contains the path of
content routers toward a content server.
At its most basic, a BGP routing advertisement con-
sists of an address range, a next-hop router address,
and a list of the autonomous system (AS) numbers
through which the advertised route will direct traf-
fic. For example, an advertisement for Stanford’s IP
address range might specify 171.64/255.192 as the
CR
CR
CR
Server
[aaa.wh.net]
example.com NH 1.2.3.4
example.com NH 9.8.7.6
[aaa.wh.net gw.wh.net]
Figure 4: Name-Based Routing
range, 192.41.177.8 as the next hop router, and
7170, 1 as the AS-path.
As shown in figure 4, a name-based routing adver-
tisement contains essentially the same information.
The advertised content is named example.com, the
next hop toward that content is the address of the
content server or content router, and the path of
routers through which the content is accessed.
Routing advertisements from content servers may
also include a measure of the load at that server,
specified in terms of the expected response latency.
With this extra attribute, content which takes longer to access appears "further away" from a routing perspective; a content router may treat the load internally as extra hops in the routing path.
The distance this load information is propagated is
limited to keep the number of routing updates man-
ageable.
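A minimal sketch of the information such an advertisement carries, with the load attribute folded into route ranking as extra hops (the field names are ours, not the NBRP wire format):

#include <cstdint>
#include <string>
#include <vector>

// Illustrative NBRP advertisement, by analogy with a BGP update.
struct NbrpAdvertisement {
    std::string              name_suffix;      // e.g. "example.com"
    uint32_t                 next_hop;         // content server or content router
    std::vector<std::string> router_path;      // e.g. {"aaa.wh.net", "gw.wh.net"}
    uint32_t                 load_extra_hops;  // expected response latency,
                                               //   expressed as equivalent hops
};

// A heavily loaded server appears "further away": rank routes by
// path length plus the advertised load penalty.
bool preferred(const NbrpAdvertisement& a, const NbrpAdvertisement& b) {
    return a.router_path.size() + a.load_extra_hops
         < b.router_path.size() + b.load_extra_hops;
}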
NBRP updates can be authenticated by crypto-
graphic signatures, in a manner similar to Secure
BGP [10]. A content server’s authenticity is veri-
fied by the signature on its initial routing update;
content routers receive explicit permission from this
content server to advertise routes with their name
added to the path list.
Content routers should apply information learned
from IP routing to the content routing; if a content
peer becomes unreachable then all the content avail-
able through that peer is unreachable as well. IP
routing information can also be used to select among
routes that appear identical at the content routing
level. Finally and most importantly, IP routing poli-
cies must be consistent with content routing policies
so that the decisions made at the content level are
faithfully carried out by the IP forwarding level. (It
is possible that existing traffic engineering schemes
can be used to ensure this behavior; however, we pro-
vide some additional ideas on how the two layers can
be integrated in the future work section.) Content
routers may also make routing decisions based upon
information obtained via measurement and mapping
techniques.
3.3 Benefits
Using INRP and NBRP as described above, a client
request is mapped to a nearby content server within
one round-trip time to a content router near to the
client, without the need to contact off-path name
servers. Latency is typically dominated by round-
trip time to the content server, not by content rout-
ing; cache misses require only one RTT to do a new
name lookup. Moreover, by increasing the number
of content routers, this property is retained even as
the Internet scales to ever larger size and increasing
number of clients.
By making name lookup low-latency, INRP elimi-
nates the need to perform multiple levels of redi-
rection in DNS. Instead, low-TTL address records
can be returned at the first layer of naming, to pre-
serve sensitivity to network conditions. (Assuming,
of course, that the content routers can handle the
name lookup load required, which will be addressed
below.)
By making INRP and NBRP open standard Internet
protocols, all ISPs, router manufacturers and con-
tent providers can participate in this content routing
layer, further enhancing the cost-effective scalability
to the clients.
The key issue raised by our solution is the scalability
of NBRP, given it is distributing naming and load in-
formation, not just aggregatable addressing informa-
tion. Ideally, we would like to completely replace the
current Domain Name System by INRP and NBRP,
to remove dependence on root name servers— them-
selves a large source of connection setup latency and
scalability concerns.
4 Scaling Mechanisms for Name-Based Routing
At the global top-level domain name (GTLD) level,
the domain name system is essentially flat. There
is little aggregation possible with the domain name
space beyond that performed at the organization
level.
For example, most names of the form
*.stanford.edu appear in the same part of the
network, but stanford.edu is not aggregatable as
part of an edu route. So, “default-free” content
routers have to know essentially all second-level do-
main names.
4.1 Explicit Aggregation
To handle large numbers of names which appear
globally in name-based routing tables, NBRP sup-
ports combining collections of name suffixes that
map to the same routing information into routing
aggregates. For instance, we expect an ISP content
router to group all of the names from its customers
into a small number of aggregates. Routing updates
then consist of a small number of aggregates rather
than the large number of individual name entries
contained in each aggregate. Load on an entire data
center or network may be advertised as load on the
aggregates advertised by that data center or net-
work.
Routing aggregate advertisements contain a version
number, so that a content router can detect a change
in the contents of an aggregate. Aggregate contents
are discovered by sending an INRP request back to
the router advertising the aggregate; this request is
for a “diff” between the last known version and the
advertised version, so that large aggregates do not
have to be resent in their entirety. Aggregate mem-
bership is relatively long-lived, compared to dynamic
routing state, so that content routers can amortize
the cost of learning the names in an aggregate over
many routing updates.
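A sketch of the bookkeeping this implies at a receiving content router follows; the types and the fetch_diff helper are hypothetical stand-ins for the actual protocol exchange:

#include <cstdint>
#include <set>
#include <string>

// An aggregate as tracked locally: advertisements carry only the name
// and version; membership is fetched on demand.
struct Aggregate {
    std::string           name;        // aggregate identifier
    uint64_t              version = 0; // bumped whenever membership changes
    std::set<std::string> members;     // name suffixes covered
};

struct AggregateDiff {
    std::set<std::string> added, removed;
};

// Hypothetical helper: an INRP request to the advertising router for the
// membership changes between two versions, so large aggregates are never
// resent in their entirety.
AggregateDiff fetch_diff(const std::string& name, uint64_t have, uint64_t want);

void on_aggregate_advertisement(Aggregate& local, uint64_t advertised) {
    if (advertised == local.version)
        return;                                      // membership unchanged
    AggregateDiff diff = fetch_diff(local.name, local.version, advertised);
    for (const auto& n : diff.removed) local.members.erase(n);
    for (const auto& n : diff.added)   local.members.insert(n);
    local.version = advertised;                      // now up to date
}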
All names in a routing aggregate are treated iden-
tically in routing calculation, thus reducing load at
content routers. This is accomplished in our im-
plementation by mirroring the indirection provided
by aggregates in the routing table, as shown in fig-
ure 5. A routing table entry for a name appear-
ing in an aggregate contains a special entry pointing
to the entry for the aggregate itself. This indirect
pointer is treated as the preferred route for the ag-
gregate. Thus, when the routing information for the
aggregate (calren.net) changes, the routing infor-
mation for its constituent names (stanford.edu and
berkeley.edu) is automatically updated.
[Figure 5: Routing Table with Aggregates. The entries for stanford.edu and berkeley.edu point indirectly at the entry for the calren.net aggregate, which holds routes with preferences 100 (A), 90 (B), 20 (C); an entry with preference 115 (D) is also shown.]
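In code, this indirection might look like the following sketch, in which a member's table entry holds a pointer to the aggregate's entry rather than its own copy of the routes (illustrative types only):

#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// A routing table entry; for a name inside an aggregate, 'aggregate'
// points at the aggregate's own entry, whose routes are treated as the
// preferred route for every member.
struct RouteEntry {
    std::vector<int>            route_prefs;  // e.g. {100, 90, 20}
    std::shared_ptr<RouteEntry> aggregate;    // non-null for aggregated names
};

using NameTable = std::unordered_map<std::string, std::shared_ptr<RouteEntry>>;

const std::vector<int>& routes_for(const NameTable& table,
                                   const std::string& name) {
    const auto& entry = table.at(name);
    // One pointer chase: when the aggregate's routes change, every member
    // (stanford.edu, berkeley.edu, ...) sees the change with no per-member
    // update.
    return entry->aggregate ? entry->aggregate->route_prefs
                            : entry->route_prefs;
}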
The number of aggregates a name belongs to is likely to be relatively small for all but the largest content providers, which can be advertised unaggregated. So, even though this design requires a linear search through the entire list of routes for aggregated names, we expect that this cost will be relatively small. (It would eliminate much of the benefit of aggregation to re-sort all such lists on a routing change.)

                                  aggregate threshold
site threshold  Affixes (1000s)      3           5           10          20
      2              1727       19.5 (6.7)  20.1 (5.6)  25.7 (4.4)  37.0 (3.4)
      3              1692       14.9 (5.9)  16.1 (5.0)  20.6 (4.0)  30.1 (3.2)
     10              1679       14.8 (5.9)  16.0 (5.0)  20.6 (4.0)  30.3 (3.2)

Table 1: Number of routes (and aggregates) in thousands for different site and aggregate threshold values.
Fine-grain information such as server load is hid-
den by the aggregate; regions of the network where
the aggregate is advertised but individual members
are not must base their decision on just the aggre-
gate. This is similar to the way BGP routing pro-
vides coarse-grained address range information and
does not indicate whether particular hosts or sub-
nets are up or down. In fact, the situation is im-
proved by INRP’s request-response nature: unlike
datagram delivery, a name lookup can pursue alter-
nate routes if a route proves inaccurate— at the cost
of additional latency.
To evaluate the expected performance of aggrega-
tion if applied in the current Internet, we processed
a comprehensive list of address-to-name mappings in
the Domain Name System[6] and BGP table dumps
from the MAE-East exchange point[7] by the follow-
ing algorithm, making the assumption that name-
based routing structure will roughly correspond to
current BGP autonomous system boundaries:
1. Each address range from the BGP table is
matched with the DNS zones represented. (If
fewer than site threshold hosts in a range be-
long to an existing zone, they are removed from
the table completely and assumed to be han-
dled with the redirection mechanism described
below in section 4.2.)
2. Names whose associated routing information is
made redundant by a superzone are also re-
moved.
3. Aggregates are created for any set of names
larger than aggregate threshold that have iden-
tical routing information (i.e., all known routes
were identical, not just the preferred route.)
The resulting aggregates, although incomplete, pro-
vide an estimate of those expected to be generated
by name-based routers.
One representative set of results is shown in Ta-
ble 1. The original BGP table had 68,200 routing
table entries. Aggressive aggregation (site threshold
of 10 and aggregate threshold of 3) results in a table
with 1,679,000 entries, but only 14,800 advertised
names (including 5,900 aggregates). Thus, the num-
ber of routes is actually smaller than in the original
IP routing table. Even relaxed values of the model
parameters result in a routing “back-end” size com-
parable with the original BGP table. Higher-level
aggregation may be able to reduce this yet further
without resorting to renumbering or renaming.
The number of routing entries is comparable with
the best possible achieved under IP routing. BGP
does have a limited mechanism for aggregation: a
single route update may include several address pre-
fixes. It is not clear the extent to which BGP soft-
ware makes use of this to optimize update calcula-
tions: there is no requirement that advertisements
keep these address prefixes together, and the address
ranges must appear separately in the IP routing ta-
ble. Aggregating all identical address prefixes
would result in 11,800 routing table entries for the
original MAE-East table.
An additional challenge NBRP addresses is the addition of new names, which is much more common
than addition of new BGP prefixes; this name in-
formation must propagate to all default-free content
routers. The rate of name addition is far smaller than the rate of normal
routing updates, since addition of new names is done
on human time scales; even the addition of 10 million
new globally routed suffixes per year results in just
19 updates per minute. Some new names are added
to multiple locations or duplicated to handle flash
crowds, increasing the rate of routing table change.
However, the number of routing updates seen by an
individual router on name addition is limited by the
number of direct peers. To put this in perspective, a
backbone router may receive more than 2,000 rout-
ing updates per minute; new naming information is
dwarfed by the normal rate of topological change.

Also, the actual level of routing updates necessary
for new names can be lower in some cases because
changes to aggregates can be “batched” to reflect
many new names with one update.
The cost of distributing the contents of routing ag-
gregates is acceptable as well, even at Internet scales.
The aggregates obtained from the analysis described
above show a heavy-tailed distribution; the mean
number of names per aggregate is 304, while the me-
dian is 24. The average size of the domain names in
these aggregates is 16 bytes, although these statistics
could be somewhat different if the content routing
system was used more aggressively to do redirection
on finer granularity. Even an aggregate of 50,000 en-
tries requires only 782 kilobytes to be sent initially,
estimating 16 bytes for each name suffix. Later up-
dates can be sent as deltas to the known aggregate,
since aggregates can be kept in permanent storage.
The long-term bandwidth consumed by aggregate
updates (across a single peering connection) can be
estimated as

    number of names · 2 · (3 · average name size + 170) / average name lifetime
The estimate represents the cost of sending a query
(including IP and TCP headers) for a change in ag-
gregate membership and getting the response; this
transaction occurs twice since it happens when the
name enters and leaves an aggregate. The packet
sizes were measured by our implementation and as-
sume one TCP packet per query. Even assuming a
relatively short time of 1 day for the average time
a name suffix appears in an aggregate, a database
of 30 million names (with average size 16) requires
1.2 Mbit/second data transfer. When compared to
the 10-gigabit expected capacity of backbone links,
this number represents only 0.01 percent of the avail-
able bandwidth. Moreover, all aggregate routing up-
date traffic takes place only between two immediate
NBRP peers.
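Plugging in the figures used in the text (30 million names of average size 16 bytes, each spending an average of one day in an aggregate) confirms the estimate:

    30,000,000 · 2 · (3 · 16 + 170) bytes / 86,400 s ≈ 151,000 bytes/s ≈ 1.2 Mbit/s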
4.2 Redirection
Not all hosts with names in a given suffix are
connected to the network globally advertising that
name suffix. For example, there may be hosts with
stanford.edu names scattered throughout the In-
ternet, even though most Stanford names are located
together. It does not seem feasible to advertise all
of these names globally. Such hosts could simply be
assigned fixed addresses by the content router oper-
ated by Stanford. However, we see some benefit in
allowing local flexibility in address assignment with-
out updating a remote server— particularly in sit-
uations where network address translation is being
used, or in mobile networks.
INRP provides a redirection mechanism for finding
isolated names not advertised in the name-based
routing system. Such names have records indicating
their actual topological location in the Internet in
terms of a more well-known name. When a request
is answered with a redirection record, the client (or
the first-hop content router acting on its behalf)
restarts the query using the proper name. For exam-
ple, if the host gritter.stanford.edu is located at
Berkeley, a name lookup might return a redirection
to guest32.berkeley.edu. This redirection mech-
anism trades fate-sharing and name lookup latency
for decreased routing state; economic factors may
well determine what names appear in routing tables
and which are found through redirection. That is,
an ISP may charge per name it places in the routing
table, so an organization weighs the cost-benefit of
having a name handled by the ISP versus incurring
the redirection cost on a name.
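A first-hop content router (or an INRP-aware client) can follow such redirections with a bounded restart loop. A sketch, with inrp_lookup standing in for the actual network query:

#include <cstdint>
#include <optional>
#include <string>
#include <variant>

using Address = uint32_t;
struct Redirect { std::string target; };   // e.g. "guest32.berkeley.edu"
using InrpAnswer = std::variant<Address, Redirect>;

// Network query, elided: resolve via the name-based routing system.
InrpAnswer inrp_lookup(const std::string& name);

std::optional<Address> resolve_with_redirects(std::string name,
                                              int max_hops = 4) {
    for (int i = 0; i < max_hops; ++i) {   // bound the redirection chain
        InrpAnswer answer = inrp_lookup(name);
        if (const Address* addr = std::get_if<Address>(&answer))
            return *addr;                  // found the host's address
        name = std::get<Redirect>(answer).target;  // restart with the
                                                   //   better-known name
    }
    return std::nullopt;                   // too many redirections
}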
This secondary mechanism is really only needed
when using NBRP to replace all DNS usage; for con-
tent routing, name-based routing tables contain only
site and content volume names rather than host and
network interface names.
5 Implementation and Analysis
Our prototype content router has been implemented
in C++. The name routing table is implemented as
a hash trie, allowing longest-suffix matching to be
performed in time linear with name length. (For
most names in our sample and experiments this is
simply two hash table lookups.) The following mea-
surements were taken on a 600 MHz Pentium III
system running Linux 2.2.12.
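A minimal sketch of such a hash trie, with one hash level per domain label walked from the top-level domain inward, so that a second-level name such as stanford.edu costs two hash probes (the structure shown is illustrative, not the prototype's code):

#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct TrieNode {
    bool has_route = false;   // some advertised suffix ends at this node
    std::unordered_map<std::string, std::unique_ptr<TrieNode>> children;
};

// 'labels' lists a name's labels from the TLD inward,
// e.g. {"edu", "stanford"} for stanford.edu.
const TrieNode* longest_suffix_match(const TrieNode& root,
                                     const std::vector<std::string>& labels) {
    const TrieNode* node = &root;
    const TrieNode* best = root.has_route ? &root : nullptr;
    for (const std::string& label : labels) {      // linear in name length
        auto it = node->children.find(label);
        if (it == node->children.end()) break;
        node = it->second.get();
        if (node->has_route) best = node;          // deepest matching suffix
    }
    return best;                                   // null if nothing matches
}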
5.1 Content-Layer Overhead
We measured an overhead of 0.5 milliseconds (total
for both request and response) for going through a
single hop of the content routing layer, on a name
routing table of 5 million entries stored entirely in
memory. The 5 million names were randomly gener-
ated second-level domain names, with 80% in .com
and 10% in each of .org and .net; a uniform distri-
bution of name lengths between 3 and 17 was used.
These names were divided into aggregates of 15,000.
Measurements done on the 1.7 million-name
database from our aggregation experiment show no
significant difference in overhead. Profiling infor-
mation shows that most of this time is spent doing
packet processing; measurements on the actual rout-
ing table show that route lookup takes as little as 6
microseconds. Our implementation can easily sus-
tain a throughput of 650 requests/second without
any degradation of response time, and a peak throughput of 1600 requests/second.
The total amount of memory used by the content
router for a 5 million entry table was 344MB, while
that on a similarly generated 100,000 entry (but un-
aggregated) table was 20MB. This leads to an esti-
mate of 69 bytes per routing table entry. We can
extrapolate that a 30-million entry database would
require nearly 2GB of memory; while large, this is
not an infeasible amount of DRAM, costing only
about $4000. (It is worth noting that name lookups
which must go to the DNS root already encounter a
database lookup of approximately this size.)
5.2 Improved Performance with INRP
The improved performance provided by INRP is
illustrated by considering access times to content
servers through Akamai versus our proposed con-
tent routing. An additional example shows the ben-
efit of using INRP rather than contacting root name
servers.
A conventional name lookup of a388.g.akamai.net
from Stanford returns the addresses of two content
servers which are located 6.6 ms round-trip-time
away.
At the next level, the name servers for akamai.net
are located throughout the Internet, with round-trip times ranging from 12 ms to 93 ms.
Overall, this set of name servers has a mean response
time of 65 ms and median of 83 ms, ignoring dropped
requests.
Using INRP, the same request would go through
about 5 content routers (at least one per intervening
network), so we will estimate 3 ms extra round-trip
time. The direct path to the content servers would
then require approximately 10 ms for the name re-
quest. A similar example for a miss at the root name
servers is carried out in Table 2 for www.cisco.com.
As the latency measurements in Table 2 indicate,
INRP reduces average request latency in these exam-
ples by 86 to 95 percent and also eliminates the vari-
ability in latency, providing more predictable perfor-
mance.
          Server           request latency
Site      prefix          minimum     mean
Akamai    akamai.net       12 ms      65 ms
          g.akamai.net      7 ms       7 ms
          Total            19 ms      72 ms
          INRP (5 hops)    10 ms      10 ms   (-86%)
Cisco     com               9 ms     101 ms
          cisco.com         4 ms      40 ms
          Total            13 ms     141 ms
          INRP (5 hops)     7 ms       7 ms   (-95%)

Table 2: Example name request round-trip times on cache miss (measured from Stanford) for a388.g.akamai.net and www.cisco.com.

5.3 Name-Based Routing Performance

We measured the routing throughput of our prototype implementation on a local network using a ran-
dom routing update traffic generator. A single ma-
chine was the source of all routing traffic; an instru-
mented content router was configured to advertise
its preferred routes to a variable number of peered
content routers, all connected by a 100Mbit LAN. To
maximally exercise the content router, the generated
traffic consisted of previously unknown routes and
changes to aggregate membership, so that all rout-
ing updates were propagated to all configured peers.
The routing preference function used was minimal;
with more complicated routing policies, the cost of
calculating route preferences dominates, so the re-
sults presented below should be considered upper
bounds on the performance of this particular imple-
mentation.
Figure 6 shows the routing throughput for each num-
ber of peers. The “no aggregates” data set repre-
sents routing updates advertising individual names.
Throughput gracefully declines with the number of
routing peers, from a maximum of 1050 updates per
second with one peer, to 370 updates per second
with six peers.
For “1 percent update”, 1% of the routing adver-
tisements consisted of a one-name change to an ag-
gregate. As the figure shows, this reduces through-
put by approximately 5%, due to the extra queries
needed to obtain the new name and the file system
accesses to store the new aggregate contents. In-
creasing the update size to 20 names showed only a
1-2% additional reduction in throughput, indicating
there is some benefit to batching aggregate mem-
bership changes. Higher proportions of aggregate
changes to normal routing updates result in further
reductions in throughput; an experiment where all routing updates were aggregate changes resulted in only 87 updates/second.

[Figure 6: Name-based Routing Performance (updates/sec). Update throughput declines gracefully with the number of peers for both the "no aggregates" and "1 percent update" workloads.]
6 Deployment
The content layer has a simple deployment path,
based on user need and on an ISP’s motivation to
provide a better web experience to customers and su-
perior service to colocated service providers. INRP
and NBRP can initially be implemented in ISP name
servers, which fail over to normal DNS behavior
for unrecognized names. INRP provides a way for
ISPs to quickly direct their customers to colocated
servers, eliminating any need for name requests to
leave their network. NBRP is not strictly needed,
but may prove a convenient way to advertise new
content to an ISP’s name servers.
This initial deployment requires no changes to end
hosts and no change to the basic IPv4 routers and
switches constituting the infrastructure of the leaf
and backbone networks. It only requires the de-
ployment of content routers, which can be imple-
mented on top of existing hardware using packet
filtering and redirection techniques. In particular,
hosts still use conventional DNS lookup to get an ad-
dress, but benefit from reduced dependence on dis-
tant root name servers and lower latency to access
local content servers. However, some customers may
be running their own name servers and avoiding the
use of ISP DNS servers, and thus see no benefit until
they reconfigure.
The overhead of doing content routing in this man-
ner is very small, since the ISP’s name server already
does DNS packet handling. Only the cost of access-
ing the name-based routing table would be wasted
on names not in the content routing system— much
less than the 0.5 ms described above.
ISPs who already peer at the IP routing level are
motivated to peer at the content routing level to pro-
vide their customers faster access to nearby content
servers— and increase the benefit of placing con-
tent servers in their network. As demand grows,
additional content routers can be placed to handle
the increased usage without user-visible changes. As
the content routing topology evolves to more closely
match IP routing topology, the content routing
system can make more accurate decisions.
7 Related Work
The original Internet directory service was supplied
by a “hosts.txt” file that listed all hosts in the In-
ternet. As the Internet grew, this approach was
replaced by DNS [13] in 1985. Subsequent work
on “network directories” such as X.500 attempts to support naming of other types of objects, such as mailboxes and users, and to provide more flexible ways of specifying identification, such as lists of at-
tributes. We choose instead to restrict the entities
being named (and the namespace) in order to im-
prove scalability.
Content routing builds on the decentralized naming
approach advocated from experience in the V dis-
tributed system [3]. That is, as in V, names are the
primary identification of objects, hosts and content
volumes in this case, and each server implements the
naming for the objects it implements. There is no
centralized name repository.
Current wide-area content routing depends on
HTTP- or DNS-level redirection, and is generally
done on the “server side”. For instance, Cisco’s
Distributed Director (DD) redirects a name lookup from the main site to a replica site closer to the requesting client's address, based on responses from a set of participating routers running an agent protocol that supports DD. Unfortunately, the client incurs the
response time penalty of accessing this main site
DD before being directed to the closer site. Pro-
prietary schemes by Akamai, Sightpath, Arrowpoint
and others appear to work similarly. These propri-
etary content distribution networks can be centrally
monitored and managed, unlike name-based rout-
ing. This may lead to better understanding of net-
work performance; however, CDNs still rely upon
the existing IP routing framework for content de-
livery, so the amount of benefit to be gained from
a proprietary overlay network is limited. Addition-
ally, we believe future work on advanced routing de-
signs and improved network management is applica-
ble to name-based routing as well as IP routing. (As
research and experience with BGP has shown [11],
wide-area routing is neither easy to understand nor
easily tuned.)
Smart Server Selection [14], in contrast, is a “client-
side” approach to content routing. An authoritative
name server for a content volume returns all avail-
able addresses for replicas of the content, and the
client (or the client’s name server) interacts with a
nearby router to obtain routing metrics for this set
of addresses and chooses the nearest one. This re-
quires cooperation from the router in the form of a
request protocol, adding an extra step to the name
lookup process. Smart Server Selection also does not
address the problem of name lookup latency.
Similarly, some DNS servers measure round-trip
times to known name servers in order to choose the
lowest-latency server, especially at the root level.
Although this can improve the performance of name
lookups by lowering the mean lookup latency, it only
helps at one level of a cache miss. Further, such
name servers are ignorant of network conditions and
thus may experience several timeouts before switch-
ing their preference to an alternate server.
Intentional naming [1] integrates name resolution
and message delivery, offering application-layer any-
cast and multicast similar to our proposed con-
tent layer. The Intentional Naming System offers
attribute-based naming, a much richer form of con-
tent addressing than URLs or domain names. How-
ever, INS is not designed to provide global reacha-
bility information, and the attribute-based naming
is less scalable than the hierarchical namespace pro-
vided by URLs. INS’s “late binding”, where every
message packet contains a name, is too expensive
to use for content distribution; our proposed archi-
tecture corresponds to INS’s “early binding”, where
names are resolved to addresses before content is ex-
changed.
Much work has been done on distributed caching
schemes; one design very similar in spirit to our
content-layer routing is “adaptive web caching” [16].
In this system, caches exchange information about
which web pages they currently hold (in order to
eliminate the need for “cache probing”) and main-
tain “URL routing tables”. Our design does not offer
routing on the granularity of URL prefixes, as adap-
tive web caching does, but offers a more compre-
hensive solution intended to replace current naming
systems.
Network-level anycast designs such as GIA [8] at-
tempt to solve server location problems at the IP
level. The semantics of an anycast IP address are
to deliver a packet destined for that address to the
“best” of an available pool of servers. GIA does
not incorporate application-level metrics, so anycast
packets may be routed to an unresponsive server
without providing any recourse for the client. Also,
unless a client is statically configured with all needed
anycast addresses, it must still use a directory to de-
termine the address to use.
8 Future Directions
The motivation for a “content layer” approach came
as part of the TRIAD [2] project. TRIAD is a
new, NAT-friendly Internet architecture which seeks
to reduce dependency on addresses by promoting
names as transport-layer endpoints. In a TRIAD
Internet, all large-scale routing would occur at the
naming level. We believe this approach is ultimately
more scalable and deployable than attempts to solve
problems (such as mobility, multihoming, anycast-
ing, and wide-area addressing) at the network level.
Two features of TRIAD enhance the content rout-
ing architecture. TRIAD provides extended ad-
dressing via the Wide-Area Relay Addressing Pro-
tocol (WRAP), which provides loose-source routing
among multiple address realms. WRAP addresses
can be used to specify a path through the network,
ensuring that the route selected by the content rout-
ing layer is the path actually used by data packets.
TRIAD also integrates TCP connection setup into
INRP name lookup; by sending TCP connection ini-
tiation information inside an INRP request, the la-
tency for web transactions can be lowered yet fur-
ther. Thus, the full TRIAD architecture integrates
naming, routing, and connection setup into a single
framework.
INRP allows proxies and web caches to intercept
content requests based on URL. We have not imple-
mented or fully explored this design, but it appears
to be a promising way to provide “semi-transparent”
proxies, which would require no explicit configura-
tion at the client, but would be used by the client as
a content request’s TCP connection endpoint.
Finally, the integration of naming and routing al-
lows feedback-based routing. Conventional IP routing
schemes have few ways to tell if the routes they select
actually deliver packets to the intended destination.
Content routers, however, can track the responses
they receive to forwarded queries, allowing them
to make better decisions and react more quickly to
routing problems than conventional routers. For ex-
ample, if content servers send back load information
in INRP responses, then content routers can obtain
up-to-date load information on heavily used sites
without placing this load information into routing
updates.
The most current version of this paper can be found
at [9].
9 Conclusions
Current content routing solutions will not scale to
handle increasing global demands for content. Con-
ventional content routing distributes content deliv-
ery but does not effectively distribute content discov-
ery. Further, the proprietary nature of most content
routing designs in use today makes them undesirable for global use and puts them in conflict with the Internet's open-standard philosophy.
The content layer — integrated naming and routing
— provides a mechanism for large-scale content rout-
ing that addresses these issues. By pushing naming
information out into the network, content routers
allow fast location of nearby content replicas; in
essence, content routers provide the same service for
naming that CDNs do for the content itself.
We developed NBRP to distribute names in this
fashion and INRP to perform efficient lookup on this
distributed, integrated name-based routing system.
Our results indicate that client name lookup is then
faster and far less variable.
The content layer can be easily deployed to provide
immediate benefits to ISPs and their customers. Our
implementation, and the networking community’s
experience with BGP, give confidence that name-
based routing can scale at least to the demands of
content routing for popular content. We anticipate
that additional research and experience will demon-
strate the feasibility of using name-based routing for
all Internet naming.
10 Acknowledgments
The TRIAD project was supported by the
US Defense Advanced Research Projects Agency
(DARPA) under contract number MDA972-99-C-
0024. This paper greatly benefited from the guid-
ance of our shepherd, Steven Gribble, and the com-
ments of the USITS reviewers; thanks also to Vin-
cent Laviano and Dan Li for their feedback.
References
[1] William Adjie-Winoto, Elliot Schwartz, Hari
Balakrishnan, and Jeremy Lilley, “The design
and implementation of an intentional naming
system”, Proc. 17th ACM SOSP, Dec. 1999.
[2] David Cheriton and Mark Gritter, “TRIAD:
A New Next-Generation Internet Architec-
ture”, http://www.dsg.stanford.edu/triad,
July 2000.
[3] D. R. Cheriton and T.P. Mann, “Decentralizing
a Global Naming Service for Improved Perfor-
mance and Fault Tolerance”, ACM TOCS, May
1989.
[4] k claffy, G. Miller, and K. Thompson, “The na-
ture of the beast: recent traffic measurements
from an Internet backbone”, INET 98.
[5] E. Cohen and H. Kaplan, “Prefetching the
means for document transfer: A new approach
for reducing Web latency”, INFOCOM 2000.
[6] Internet Domain Survey, July 1999, http://www.isc.org/ds/.
[7] Internet Performance Measurement and Analy-
sis project, http://www.merit.edu/ipma/.
[8] Dina Katabi and John Wroclawski, “A Frame-
work for Scalable Global IP-Anycast (GIA)”,
SIGCOMM 2000.
[9] Mark Gritter and David R. Cheriton, “An Architecture for Content Routing Support in the Internet”, http://www.dsg.stanford.edu/papers/contentrouting/, 2001.
[10] S. Kent, C. Lynn, and K. Seo, “Secure Border
Gateway Protocol (S-BGP)”, IEEE Journal on
Selected Areas in Communication, 1999.
[11] Craig Labovitz, G. Robert Malan, and Farnam
Jahanian, “Internet Routing Instability”, ACM
SIGCOMM Conference, September 1997.
[12] Sean McCreary and kc claffy, “Trends in Wide
Area IP Traffic Patterns”, ITC Specialist Sem-
inar on IP Traffic Modeling, Measurement and
Management, September 2000.

[13] P. Mockapetris, “Domain Names – Concepts
and Facilities”, RFC 882, November 1983 (ob-
soleted by RFC 1034 and 1035).
[14] W. Tang, F. Du, M. W. Mutka, L. Ni, and
A. Esfahanian, “Supporting Global Replicated
Services by a Routing-Metric-Aware DNS”,
Proceedings of the 2nd International Workshop
on Advanced Issues of E-Commerce and Web-
Based Information Systems (WECWIS 2000),
June 2000.
[15] Y. Rekhter, T. Li (editors), “A Border Gateway
Protocol 4 (BGP-4)”, RFC 1771, March 1995.
[16] Lixia Zhang, Scott Michel, Khoi Nguyen, Adam
Rosenstein, Sally Floyd, and Van Jacobson,
“Adaptive Web Caching: Towards a New
Global Caching Architecture”, 3rd Interna-
tional WWW Caching Workshop, June 1998.