Apache attacked by a "slow loris"

Posted Jun 24, 2009 15:28 UTC (Wed) by mcmanus (guest, #4569)
Parent article: Apache attacked by a "slow loris"

The root of the problem is that threads are a scarce resource in the Apache model, and slow clients block those threads, right? In this case the consumers are intentionally slow, but fundamentally it isn't much different from your website becoming really popular with users on bad cell phone networks.

An async architecture based around epoll would certainly make the thing scale much better - I've worked on systems that could handle 100,000 idle (or extremely slow) HTTP connections without meaningfully impacting the performance of a few hundred live ones. Such an architecture is even mentioned on the apache-dev link that is provided in the article, so it's not like that's news.
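To illustrate the point about idle connections, here is a minimal Python sketch (not Apache code): one thread watches 100 sockets through the standard `selectors` module, which picks epoll on Linux, and only wakes up for the three that actually have data. The socket pairs stand in for client connections, and both counts are arbitrary values chosen for the demonstration.

```python
import selectors
import socket

TOTAL_CONNECTIONS = 100   # arbitrary; kept small to stay under typical fd limits
ACTIVE_CONNECTIONS = 3    # arbitrary

sel = selectors.DefaultSelector()  # epoll-backed on Linux
pairs = [socket.socketpair() for _ in range(TOTAL_CONNECTIONS)]

# Register the "server" end of every connection with the single selector.
for _client, server in pairs:
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

# Only a few clients send anything; the rest stay idle.
for client, _server in pairs[:ACTIVE_CONNECTIONS]:
    client.sendall(b"GET / HTTP/1.0\r\n\r\n")

# One call reports just the sockets with data; idle ones occupy no thread.
ready = sel.select(timeout=1)
requests = [key.fileobj.recv(4096) for key, _events in ready]
print(len(ready), "of", TOTAL_CONNECTIONS, "connections needed attention")

for client, server in pairs:
    sel.unregister(server)
    client.close()
    server.close()
sel.close()
```

The idle sockets still cost a file descriptor and some kernel memory each, but no thread is parked on them.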

But that kind of hand-managed parallelism development style is what a thread is meant to abstract away. The real question to me is why threads are such a scarce and/or unscalable resource?

Specifically, I am curious to know what happened to all the talk of Linux 2.6 and 100,000 concurrent threads from a few years back: http://kerneltrap.org/node/422

Should the default config be cranking up the max number of thread allocations (and the default stack sizes for them down), and if not - where is the bottleneck that prevents that?

Apache attacked by a "slow loris"

Posted Jun 24, 2009 15:34 UTC (Wed) by mjthayer (guest, #39183) [Link] (6 responses)

I think that the main bottleneck is virtual memory - each thread requires a few kilobytes of stack, and on 32-bit architectures that quickly reaches its limit. With 64 bits and enough memory that is much less of a problem as far as I know.

Apache attacked by a "slow loris"

Posted Jun 24, 2009 16:17 UTC (Wed) by NAR (subscriber, #1313) [Link] (1 responses)

So I suppose this bug wouldn't affect an Erlang-based web server like yaws, because the Erlang VM is actually supposed to handle 100000 threads.

Anyway, is this bug HTTP-protocol specific? I'd guess it can affect any services which are expected to keep a session open for a long time, if the client can use a lot less memory for each connection than the server...

Apache attacked by a "slow loris"

Posted Jun 25, 2009 9:52 UTC (Thu) by Darkmere (subscriber, #53695) [Link]

This "bug" is the same as the SMTP tarpit and others, but instead of working on the server side, it works on the client side, against the server.

So, it works on the same basis that so many other attacks do: find a limited resource on the server, find a way to make the server hit that limit, lean back and reap profits.

Apache attacked by a "slow loris"

Posted Jun 24, 2009 16:20 UTC (Wed) by mcmanus (guest, #4569) [Link]

I think it should work: 32 bit gives a default 3:1 split, and can get 98,000 32KB stacks out of 3GB... using 8KB kernel stacks you can get 128,000 corresponding kernel stacks.
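For anyone who wants to check that arithmetic, a quick sketch in Python. It assumes the whole 3GB user side and the whole 1GB kernel side of the split are available for stacks, which is an upper bound rather than something a real system would reach.

```python
KIB = 1024
GIB = 1024 ** 3

def max_stacks(region_bytes, stack_bytes):
    """How many fixed-size stacks fit in a region, ignoring every other mapping."""
    return region_bytes // stack_bytes

user_stacks = max_stacks(3 * GIB, 32 * KIB)    # user side of a 3:1 split
kernel_stacks = max_stacks(1 * GIB, 8 * KIB)   # kernel side of a 3:1 split

print(user_stacks)    # 98304
print(kernel_stacks)  # 131072
```

So "98,000" is 98,304 rounded down, and "128,000" is 128 x 1024 = 131,072.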

Obviously you can't use all of the vmm map for thread stacks - but you can use a lot of it. And if 32KB is not enough, then it's a state problem, not a threading problem (i.e. an async design would need to stash the state somewhere too). Plus Apache has that hybrid process/thread model, so each process has its own vm map.

Maybe 100K threads is hyperbole if you're all in one process, but 50+K seems quite reasonable from a memory management standpoint and makes a thread much less of a scarce resource than it is in the default Apache config (2 or 3 hundred iirc).

I'm more curious whether the Linux kernel/libc could handle the creation and scheduling of so many threads. If not, then that's kind of a setback from the direction of things when 2.6 was being released (see the link I provided last post), and if it can, it seems a much easier approach than rearchitecting the application.

Fundamentally, isn't this multiplexing the kind of thing the OS should be doing for you efficiently?

Apache attacked by a "slow loris"

Posted Jun 24, 2009 16:31 UTC (Wed) by smurf (subscriber, #17840) [Link] (1 responses)

64-bit machines also have a limited amount of memory. It's not just the thread stack size; Apache carries a whole lot of memory per connection. It's much worse when each connection is handled by a Perl/Python/PHP thread.

Apache is generally configured so that the maximum number of _real_ work threads (i.e. including all that state) doesn't cause the system to swap excessively. A slowloris connection eats far fewer resources than that, but Apache doesn't know that and thus reaches the configuration's limit far too quickly.

Apache attacked by a "slow loris"

Posted Jun 24, 2009 17:03 UTC (Wed) by mcmanus (guest, #4569) [Link]

Fair enough.

But it does seem to me that if what Apache is trying to prevent is memory exhaustion, then it should be doing admission control based on memory allocated instead of using max clients as a poor stand-in. Especially as the two don't correlate well at all even in normal (i.e. non DoS) situations.
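A rough Python sketch of what I mean by admitting on memory rather than on a connection count. The class name, the budget, and the per-connection estimates are all made up for illustration; this is not how Apache accounts for anything today.

```python
KIB = 1024
MIB = 1024 * KIB

class MemoryAdmission:
    """Admit work against a byte budget instead of a fixed client count."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.in_use = 0

    def try_admit(self, estimated_bytes):
        """Reserve memory for a connection; return False if it would exceed the budget."""
        if self.in_use + estimated_bytes > self.budget_bytes:
            return False
        self.in_use += estimated_bytes
        return True

    def release(self, estimated_bytes):
        """Give back a reservation when the connection finishes."""
        self.in_use = max(0, self.in_use - estimated_bytes)

# Hypothetical numbers: a 64 MiB budget, 20 MiB per interpreter-backed
# request, 32 KiB per connection that is still trickling in headers.
gate = MemoryAdmission(64 * MIB)
heavy = [gate.try_admit(20 * MIB) for _ in range(4)]
slow_client = gate.try_admit(32 * KIB)

print(heavy)        # [True, True, True, False]
print(slow_client)  # True
```

The fourth heavy request is refused, but a cheap slow connection still gets in, which is exactly the distinction a max-clients count can't make.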

Apache attacked by a "slow loris"

Posted Jun 25, 2009 5:29 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

(Reading this over, it's a bit of a ramble. Sorry about that.) Virtual memory isn't the problem per se. Apache isn't running out of virtual memory. The attack is against Apache's own limit on the number of simultaneous outstanding requests. Returning to the supermarket analogy, the problem isn't running out of space in the store for cashiers, but simply tying up all the existing ones.

Now, you can increase these limits in order to defeat the attack. There's the problem: Apache's resource consumption can be too high to service the number of simultaneous connections required. For prefork servers, the problem is especially severe: each simultaneous request requires its own process. Now, any serious operating system can handle thousands of processes; that's not the problem. You can even have thousands of Apache processes, since Apache workers are actually pretty light, memory-wise, and all the code and some of the data structures are shared among all workers.

More an issue is the mod_* idiocy that embeds an interpreter in each worker process*. Then, when you have thousands of processes, you run into a problem far more severe than virtual address space exhaustion: swap thrashing, poor performance, machine lockups, and OOM killing sprees.

Using threads, in principle, helps the problem. Each thread shares the same process, meaning the additional memory occupied by each new simultaneous connection is just the memory needed to keep track of that connection; if all the threads share a single copy of the interpreter and other data structures, I bet you can avoid the slowloris problem entirely by simply setting connection limits high enough.

A small variant of this approach is to use multiplexed I/O, as in lighttpd; that's like using the threaded approach above, but you don't need a thread stack for each simultaneous connection. In practice, if you make the thread stack small enough, you can still fit thousands in a 32-bit virtual address space.

The problem with threads is that you need threads, though! The free world's most inexplicably-popular web scripting language, PHP, doesn't have a thread-safe interpreter. That limitation leads us to another solution: divorce the heavyweight state from the HTTP handler, and use something like FastCGI to communicate with the interpreter. That way, you can have as many (cheap) HTTP connection handlers (threads, processes, or state machines) as you'd like, and still limit the number of heavyweight interpreters you need. (Using separate processes for the heavy lifting is better than mixing all the threads together anyway; see below.)

To deal with slowloris attacks, just batch all the headers up in a request before sending the whole request on to the heavyweight process that actually does something meaningful with it. (That's what mod_fastcgi does.) That way, a slowloris-using attacker can pound away at the lightweight worker thread (or process), and only when a complete request is read will the system actually commit to using a significant resource, one of the heavyweight FastCGI servers.
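The batching idea can be sketched in a few lines of Python. This is only an illustration of the principle, not what mod_fastcgi actually does internally; `HeaderBuffer` and its 8192-byte cap are names and values I made up.

```python
class HeaderBuffer:
    """Accumulates bytes from a client until a full HTTP header block has arrived."""

    TERMINATOR = b"\r\n\r\n"  # blank line that ends the HTTP headers

    def __init__(self, max_bytes=8192):
        self.max_bytes = max_bytes
        self._data = bytearray()

    def feed(self, chunk):
        """Add bytes; return the complete header block once it is all there, else None."""
        self._data.extend(chunk)
        end = self._data.find(self.TERMINATOR)
        if end != -1:
            return bytes(self._data[: end + len(self.TERMINATOR)])
        if len(self._data) > self.max_bytes:
            raise ValueError("header block too large")
        return None

buf = HeaderBuffer()
partial = buf.feed(b"GET / HTTP/1.1\r\nHost: example.org\r\n")  # slow client, still trickling
complete = buf.feed(b"X-a: b\r\n\r\n")                          # headers finally finished

print(partial)   # None
print(complete)
```

Only when `feed` returns something other than `None` would the lightweight handler hand the request to one of the heavyweight backends.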

lighttpd can't use embedded interpreters; its state machine precludes that. Instead, people generally set it up to use FastCGI: guess why lighttpd is not vulnerable to slowloris attacks. If you configure Apache appropriately, you can make it work similarly.

(Before you say, "but wait! FastCGI is pretty slow!" -- that's simply not true. The HTTP user-agent's communication to the server is orders of magnitude slower than the local communication between Apache and a FastCGI server. The FastCGI communication doesn't add any meaningful time to each request, so it doesn't increase latency. And since each application is sequestered into its own process instead of running in each worker process in Apache, total memory use can actually be lower than the mod_* model.

As if that weren't a good enough reason to use FastCGI (or something similar, like scgi), it's also a lot easier to keep track of your web application's system resource usage as distinct from the web server's. You can actually measure the CPU consumption of, say, squirrelmail without conflating it with Apache's.)

Oh, and there's an even better reason: nothing forces a FastCGI server to run as the same user as the web server. Finally, you can put an end to anything remotely related to the web having to be owned by apache. Each application can have its own user, and be limited to its own little corner of the machine just like any other kind of network-facing daemon. Really, mod_* is just an ugly hack that's completely unnecessary today.

Apache attacked by a "slow loris"

Posted Jun 24, 2009 19:22 UTC (Wed) by guus (subscriber, #41608) [Link]

A simple solution would be this: if you exhausted all threads/sockets/whatever resource, and a new request comes in, drop an old thread/socket/resource. Of course, there should be some tweaks to make things fair for slow but legitimate clients.
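In Python, the bare idea looks something like the sketch below. `ConnectionTable` is a made-up name, "old" here simply means "admitted first", and none of the fairness tweaks are included.

```python
from collections import OrderedDict

class ConnectionTable:
    """Fixed-size table that drops the oldest connection to make room for a new one."""

    def __init__(self, limit):
        self.limit = limit
        self._conns = OrderedDict()  # conn_id -> state, oldest admission first

    def admit(self, conn_id, state=None):
        """Admit a connection; return the id of the one evicted to make room, or None."""
        evicted = None
        if len(self._conns) >= self.limit:
            evicted, _state = self._conns.popitem(last=False)  # pop the oldest entry
        self._conns[conn_id] = state
        return evicted

    def close(self, conn_id):
        """Remove a connection that finished normally."""
        self._conns.pop(conn_id, None)

table = ConnectionTable(limit=2)
first = table.admit("a")
second = table.admit("b")
third = table.admit("c")   # table is full, so "a" is dropped

print(first, second, third)  # None None a
```

A real server would want to prefer dropping connections that have not yet sent a complete request, so that a slow but legitimate download isn't the first thing to go.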

Apache attacked by a "slow loris"

Posted Jun 25, 2009 15:35 UTC (Thu) by elanthis (guest, #6227) [Link]

You can't lower the stack sizes. One of the biggest reasons that Apache administrators have to stick with older Apache MPMs is crap like mod_php, which is likely going to crash with lower stack sizes.

Honestly, if at the end of the day you don't need any of the Apache modules (maybe you're using regular CGI or FastCGI or something), then just don't use Apache, but instead use one of the newer, more efficient, more scalable open web servers, like lighttpd, cherokee, etc.

(Unfortunately, the lack of .htaccess support in those servers, and hence the lack of mod_rewrite-compatible user-configurable rewrites and redirects, is a real issue for a lot of hosts. If those servers added an optional module for basic .htaccess-like support, they'd probably skyrocket in adoption.)