1. Choosing A Proxy Server
ApacheCon 2014
Bryan Call
ATS Committer / Yahoo
2. About Me
• Yahoo! Employee
– WebRing, GeoCities, Personals, Tiger Team, Platform Architect, Edge Team, Research, ATS and HTTP (HTTP/2 and TLS at IETF)
• Working on Traffic Server for 7 years
– Since 2007
• Part of the team that open sourced it in 2009
• ATS Committer
3. Overview
• Types of Proxies
• Features
• Architecture
• Cache Architecture
• Performance
• Pros and Cons
6. Reverse Proxy
• Proxy in front of your own web servers
• Caching?
• Geographic location?
• Connection handling?
• SSL termination?
• SPDY support?
• Adding business logic?
14. Features
                ATS   NGiNX   Squid     Varnish   Apache httpd mod_proxy
Reverse Proxy   Y     Y       Y         Y         Y
Forward Proxy   Y     N       Y         N         Y
Transp. Proxy   Y     N       Y         N         Y
Plugin APIs     Y     Y       partial   Y         Y
Cache           Y     Y       Y         Y         Y
ESI             Y     N       Y         partial   N
ICP             Y     N       Y         N         N
SSL             Y     Y       Y         N         Y
SPDY            Y*    Y       N         N         partial
* 5.0.0 (May 2014)
31. Cache
• Mainly two types
– File system
– Database like
• In memory index
– Bytes per object
• Minimize disk seeks and system calls
32. Cache
ATS NGiNX Squid Varnish Apache httpd mod_cache
File system X X X
mmap X
Raw disk/direct IO X X
Ram cache X X
Memory index X X X*
Persistent cache X X X X
41. • Squid used the most CPU and had the worst median latency
• 95th percentile latency with NGiNX, Squid and httpd
[Chart: RPS / CPU Usage (ATS, NGiNX, Squid, Varnish, httpd)]
[Chart: Requests Per Second (ATS, NGiNX, Squid, Varnish, httpd)]
[Chart: Latency, Median and 95th (ATS, NGiNX, Squid, Varnish, httpd)]
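The latency charts report a median and a 95th percentile per server. As a rough sketch of how such a summary can be computed from raw samples (the function name and the nearest-rank percentile method are assumptions for illustration, not necessarily what the benchmark tool used):

```python
import statistics


def latency_summary(samples_ms):
    """Return (median, 95th percentile) for a list of latency samples.

    The 95th percentile uses the nearest-rank method: the smallest sample
    such that at least 95% of all samples are less than or equal to it.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer arithmetic for ceil(0.95 * n), avoiding float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return statistics.median(ordered), ordered[rank - 1]


if __name__ == "__main__":
    median, p95 = latency_summary([1, 2, 2, 3, 3, 3, 4, 5, 9, 40])
    print(f"median={median} ms, 95th={p95} ms")
```

A single slow request barely moves the median but dominates the 95th percentile, which is why the slides show both.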
42. Benchmark 2
• 1,000 clients
• 8KB response
• 100% cache hit
• Keep-alive off
43. • Squid used the most CPU again
• NGiNX had latency issues
• ATS most throughput
[Chart: RPS / CPU Usage (ATS, NGiNX, Squid, Varnish, httpd)]
[Chart: Requests Per Second (ATS, NGiNX, Squid, Varnish, httpd)]
[Chart: Latency, Median and 95th (ATS, NGiNX, Squid, Varnish, httpd)]
44. ATS
• Pros
– Scales well automatically, little config needed
– Best cache implementation
• Cons
– Too many config files
– Too many options in the default config files
45. NGiNX
• Pros
– Lots of plugins
– FastCGI support
• Cons
– HTTP/1.1 compliance
– Latency issues around accepting new connections
– Rebuild server for new plugins
46. Squid
• Pros
– Best HTTP/1.1 compliance
• Cons
– Memory index for cache using 10x that of ATS
– Least efficient with CPU
– Worst median latency for keep-alive benchmarks
47. Varnish
• Pros
– VCL (Varnish Configuration Language)
• Can do a lot without writing plugins
• Cons
– Thread per connection
– mmap for cache
• Persistence is experimental
– No SSL or SPDY support
48. Apache httpd
• Pros
– Lots of plugins
– Most used http server
– Best 95th percentile latency for non-keep-alive
• Cons
– SPDY Support
49. Why ATS?
• Scales well
– CPU Usage, auto config
• Cache scales well
– Efficient memory index, minimizes seeks
• Apache Community
• Plugin support
– Easy to port existing plugins over
A reverse proxy, aka a web accelerator, does not require the browser to cooperate in any special way. As far as the user (browser) is concerned, it looks like it’s talking to any other HTTP web server on the internet. The reverse proxy server, on the other hand, must be explicitly configured for what traffic it should handle, and how such requests are routed to the backend servers (aka Origin Servers). Just as with a forward proxy, many reverse proxies are configured to cache content locally. A reverse proxy can also help with load balancing and redundancy on the Origin Servers, and help solve difficult problems like Ajax routing.
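To make the "explicitly configured" part concrete, here is a minimal sketch of a routing table that maps incoming requests to Origin Servers. The host names, ports, and the `route` helper are all hypothetical; real proxies express this in their own configuration formats.

```python
# Hypothetical rules: (public host, path prefix, origin base URL).
# More specific prefixes are listed first because the first match wins.
REMAP_RULES = [
    ("www.example.com", "/api/", "http://api-origin.internal:8080"),
    ("www.example.com", "/", "http://web-origin.internal:8080"),
]


def route(host, path, rules=REMAP_RULES):
    """Return the origin URL for a request, or None if no rule matches."""
    for rule_host, prefix, origin in rules:
        if host == rule_host and path.startswith(prefix):
            return origin + path
    return None


if __name__ == "__main__":
    print(route("www.example.com", "/api/users"))
    print(route("www.example.com", "/index.html"))
    print(route("unknown.example.com", "/"))
```

Requests that match no rule are rejected rather than forwarded, which is the main difference from a forward proxy that accepts arbitrary destinations.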
Before we go into details of what drives Traffic Server, and how we use it, let me briefly discuss the three most common proxy server configurations. In a forward proxy, the web browser has to be configured manually (or via auto-PAC files etc.) to use a proxy server for all (or some) requests. The browser typically sends the “full” URL as part of the GET request. The forward proxy typically is not required to be configured for “allowed” destination addresses, but can be configured with Access Control Lists or blacklists controlling what requests are allowed, and by whom. A forward proxy is typically allowed to cache content, and a common use case is inside corporate firewalls.
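The “full” URL point is easiest to see side by side. The sketch below builds the start of a GET request as a browser would send it to a forward proxy and as it would send it directly to a web server; `request_head` is a hypothetical helper written for this illustration.

```python
from urllib.parse import urlsplit


def request_head(url, via_forward_proxy):
    """Build the request line and Host header for a GET of `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # To a forward proxy the client sends the full URL;
    # to an origin server it sends only the path and query.
    target = url if via_forward_proxy else path
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


if __name__ == "__main__":
    print(request_head("http://example.com/index.html", via_forward_proxy=True))
    print(request_head("http://example.com/index.html", via_forward_proxy=False))
```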
An intercepting proxy, also commonly called a transparent proxy, is very similar to a forward proxy, except the client (browser) does not require any special configuration. As far as the user is concerned, the proxying happens completely transparently. A transparent proxy will intercept the HTTP requests, modify them accordingly, and typically “forge” the source IP before forwarding the request to the final destination. Transparent proxies usually also implement traffic filters and monitoring, allowing for strict control of what HTTP traffic passes through the mandatory proxy layer. Typical use cases include ISPs and very strictly controlled corporate firewalls. I’m very excited to announce that as of a few days ago, code for transparent proxy is available in the subversion tree.
Squid – SPDY not on roadmap - http://wiki.squid-cache.org/Squid-3.5 or in the bugs for 3.5 – no progress http://wiki.squid-cache.org/Features/HTTP2
ESI – Edge Side Includes - http://en.wikipedia.org/wiki/Edge_Side_Includes
ICP – Internet Cache Protocol - http://www.ietf.org/rfc/rfc2186.txt
httpd – mod_spdy uses Chromium's SpdyFramer class to encode and decode SPDY frames.
https://istlsfastyet.com/ - Ilya Grigorik
NGiNX – doesn’t handle Accept-Encoding or Vary at all
Multithreading allows a process to split itself and run multiple tasks in “parallel”. There is significantly less overhead running threads compared to individual processes, but threads are still not free. They need memory resources and incur context switches. It’s a known methodology for solving the concurrency problem, and many, many server implementations rely heavily on threads. Modern OSes have good support for threads, and standard libraries are widely available.
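A minimal sketch of the thread-per-connection idea, assuming each "connection" is just a string to be processed; the function names are hypothetical. Note the lock around the shared dictionary: shared state is where the cost and the risk of threading show up.

```python
import threading


def serve_with_threads(requests):
    """Handle each request in its own thread and return responses in order."""
    results = {}
    lock = threading.Lock()

    def handle(conn_id, payload):
        response = payload.upper()  # stand-in for real request handling
        with lock:  # shared state must be protected
            results[conn_id] = response

    threads = [
        threading.Thread(target=handle, args=(i, payload))
        for i, payload in enumerate(requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(requests))]


if __name__ == "__main__":
    print(serve_with_threads(["get /a", "get /b", "get /c"]))
```

With one thread per connection, memory use and context switching grow with the number of concurrent clients.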
Events are scheduled by the event loop, and event handlers execute specific code for specific events. This makes it easier to code for, since there is no risk of deadlocks or race conditions. An event loop can handle a good number of connections (but not unlimited). Squid is a good example of an event-driven server.
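As a rough illustration of an event loop, the sketch below uses Python's standard `selectors` module and a local socket pair in place of real network connections. A single thread waits for readiness events and calls the handler registered for each socket; `run_echo_once` and the handlers are hypothetical names, and this is not how Squid itself is implemented.

```python
import selectors
import socket


def run_echo_once(messages):
    """Send each message through a socket pair using one event loop thread."""
    sel = selectors.DefaultSelector()
    client, server = socket.socketpair()
    client.setblocking(False)
    server.setblocking(False)
    received = []

    def on_server_readable(sock):
        # "Server" handler: read the request and reply in upper case.
        sock.send(sock.recv(4096).upper())

    def on_client_readable(sock):
        # "Client" handler: collect the reply.
        received.append(sock.recv(4096))

    sel.register(server, selectors.EVENT_READ, on_server_readable)
    sel.register(client, selectors.EVENT_READ, on_client_readable)
    try:
        for msg in messages:
            client.send(msg)
            target = len(received) + 1
            while len(received) < target:
                # The loop: wait for ready sockets, dispatch to their handlers.
                for key, _events in sel.select(timeout=1.0):
                    key.data(key.fileobj)
    finally:
        sel.close()
        client.close()
        server.close()
    return received


if __name__ == "__main__":
    print(run_echo_once([b"get /a", b"get /b"]))
```

Handlers run one at a time on the same thread, so no locks are needed, but a slow handler stalls every other connection.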
Squid – 72 or 104 bytes of metadata in memory for every object in your cache. http://wiki.squid-cache.org/SquidFaq/SquidMemory#Why_does_Squid_use_so_much_memory.21.3F
ATS – 10 bytes
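To see what the per-object figures above mean at scale, here is a small back-of-the-envelope calculation. The byte counts are the ones quoted in the note; the object count in the demo is an arbitrary example.

```python
SQUID_BYTES_PER_OBJECT = (72, 104)  # the two figures quoted for Squid
ATS_BYTES_PER_OBJECT = 10           # the figure quoted for ATS


def index_size_mib(num_objects, bytes_per_object):
    """Memory needed for the in-memory index, in MiB."""
    return num_objects * bytes_per_object / (1024 * 1024)


if __name__ == "__main__":
    objects = 100_000_000  # arbitrary example cache size
    print(f"ATS:   {index_size_mib(objects, ATS_BYTES_PER_OBJECT):.0f} MiB")
    for per_object in SQUID_BYTES_PER_OBJECT:
        print(f"Squid: {index_size_mib(objects, per_object):.0f} MiB "
              f"({per_object / ATS_BYTES_PER_OBJECT:.1f}x ATS)")
```

The ratio works out to between 7.2x and 10.4x, which is where the "10x that of ATS" line on the Squid slide comes from.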
Squid – ufs (filesystem) – rock store (database style)
Varnish – since it is an mmap cache and the index is part of the mmap, it has an in-memory index
ATS – using a “cyclone cache”, similar to a log-based file system – merges writes, less seeking
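The log-style idea behind a “cyclone cache” can be sketched as a ring buffer: new objects are always written at a moving write head, and when the head wraps around, the oldest data is overwritten. This is a toy model under that assumption only, not the actual ATS implementation; `RingCache` and its methods are hypothetical.

```python
class RingCache:
    """Toy log-structured cache: sequential writes into a fixed-size buffer."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0     # next write offset
        self.index = {}   # in-memory index: key -> (offset, length)

    def put(self, key, data):
        n = len(data)
        if n > self.size:
            raise ValueError("object larger than cache")
        if self.head + n > self.size:
            self.head = 0  # wrap around to the start of the buffer
        self._evict_overlapping(self.head, self.head + n)
        self.buf[self.head:self.head + n] = data
        self.index[key] = (self.head, n)
        self.head += n

    def _evict_overlapping(self, start, end):
        """Drop index entries whose bytes fall inside [start, end)."""
        for key, (off, length) in list(self.index.items()):
            if off < end and off + length > start:
                del self.index[key]

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None
        off, length = entry
        return bytes(self.buf[off:off + length])


if __name__ == "__main__":
    cache = RingCache(10)
    cache.put("a", b"aaaa")
    cache.put("b", b"bbbb")
    cache.put("c", b"cccc")  # wraps around and overwrites "a"
    print(cache.get("a"), cache.get("b"), cache.get("c"))
```

Because writes are always sequential, there is no free-space management and little seeking; the trade-off is that old objects are evicted by position rather than by popularity.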
ATS – should auto config accept threads
NGiNX – uses the least CPU, but has really bad latencies
ATS – uses a lot less CPU than Squid, Varnish, httpd