Monday, July 25, 2011


Inventing the Internet

I’m in Québec City this week for the IETF meeting. A group of us were having dinner last evening, and at the end of the meal, as we were paying, the waitress asked us what we were all in town for. We told her we were at a meeting to work on standards for how things talk to each other on the Internet.

So she tells us about a crazy lady who comes in the restaurant every afternoon. The lady claims to have invented a bunch of things, and one thing she says is that she invented the Internet. After someone makes the required Al Gore joke, I say, well, to tell you the truth, no one at this table qualifies, but we do have some people in our group who actually did invent the Internet. She says, It’s one person who did it?, and we say no, maybe eight or ten or so... and at least four of them really are here this week.

Tuesday, June 21, 2011


Misconceptions about DKIM

I chair the DKIM working group in the IETF. The working group is finishing up its work, about ready to publish an update to the DKIM protocol, which moves DomainKeys Identified Mail up the standards track to Draft Standard.

DKIM is a protocol that uses digital signatures to attach a confirmed domain name to an email message (see part 7, in particular). DKIM started from a simple place, with a simple problem statement and a simple goal:

  • Email messages have many addresses associated with them, but none are authenticated, so none can be relied on.
  • Bad actors — spammers and phishers — take advantage of that to pretend they are sending mail from a place (a domain name) the recipient might trust, in an attempt to fool the recipient.
  • If we can provide an authenticated domain name, something that’s confirmed and that a sender can’t fake, then that information can be used as part of the delivery system, as part of deciding how to handle incoming mail.

It’s important to note that mail signed with DKIM isn’t necessarily good mail, nor even mail from a good place. All we know is that mail signed with DKIM was digitally signed by a specified domain. We can then use other information we have about that domain as part of the decision to deliver the message to the user’s inbox, to put it in junk mail, to subject it to further analysis or to skip that analysis, and so on.

“Domain example.com signed this message” is just one of many pieces of information that might help decide what to do.
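
To make that concrete, here’s a minimal sketch of verification using the third-party dkimpy library for Python (my choice of library, not something from the spec; any DKIM implementation behaves the same way). Note what a successful verification does and doesn’t tell you: only that the signing domain vouched for the message.

```python
import dkim  # third-party package "dkimpy"

with open("message.eml", "rb") as f:
    raw = f.read()

# Cryptographically checks the DKIM-Signature header against the
# signing domain's public key, fetched via DNS. Success means only
# "this domain signed this message" -- nothing about whether the
# message is good mail.
print("verified" if dkim.verify(raw) else "no valid signature")
```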

But some people — even some who have worked on the development of the DKIM protocol — miss the point, and put DKIM in a higher position than it should be. Or, perhaps more accurately, they give it a different place in the email delivery system than it should have.

Consider this severely flawed blog post from Trend Micro, a computer security company that should know better, but doesn’t:

In a recently concluded discussion by the [DKIM Working Group], some of those involved have decided to disregard phishing-related threats common in today’s effective social engineering attacks. Rather than validating DKIM’s input and not relying upon specialized handling of DKIM results, some members deemed it a protocol layer violation to examine elements that may result in highly deceptive messages when accepted on the basis of DKIM signatures.

The blog post describes an attack that takes a legitimately signed message, alters it in a way that does not invalidate the DKIM signature (taking advantage of some intentional flexibility in DKIM), and re-sends the message as spam or phishing. The attacker can add a second from address and appear, to the user, to be sending from a trusted domain, even though the DKIM signature belongs to a different domain.

The attack sounds bad, but it really isn’t, and the Trend Micro blog’s conclusion that failure to absolutely block this makes DKIM an EVIL protocol (their words) is not just overstated, but laughable and ridiculous. It completely undermines Trend Micro’s credibility.

Here’s why the attack is overstated:

  1. It relies on the sender’s ability to get a DKIM signature on a phishing message, and assumes the message will be treated as credible by the delivery system.
  2. It ignores the facts that delivery systems use other factors in deciding how to handle incoming messages and that they will downgrade the reputation score of a domain that’s seen to sign these sorts of things.
  3. It ignores the fact that high-value domains, with strong reputations, will not allow the attackers to use them for signing.
  4. The attack creates a message with two from lines, and such messages are not valid. It ignores the fact that delivery systems will take that into account as they score the message and make their decisions.

Apart from that, the blog insists that the right way to handle this attack would be to have DKIM go far beyond what it’s designed to do. Rather than just attaching a confirmed domain name to the message, DKIM would, Trend Micro says, now have to check the validity of messages during signature validation. Yes, that is a layer violation. Validity checking is an important part of the analysis of incoming email, but it is a separate function that’s not a part of DKIM. All messages, whether DKIM is in use or not, should be checked for being well-formed, and deviations from correct form should increase the spam score of a message. That has nothing to do with DKIM.
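
That sort of well-formedness check is easy to do outside DKIM. Here’s a minimal sketch (Python’s standard email parser; how the result feeds into spam scoring is left abstract) that catches the doubled from line used in the attack above:

```python
from email import message_from_bytes

def looks_malformed(raw: bytes) -> bool:
    msg = message_from_bytes(raw)
    # A valid message has exactly one From header; the attack
    # described above produces two.
    from_headers = msg.get_all("From") or []
    return len(from_headers) != 1

# A delivery system would fold this into its spam score,
# entirely independently of DKIM signature verification.
```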

In fact, the updated DKIM specification does address this attack, and suggests things that delivery systems might do in light of it. But however good that advice might be, it’s not mandated by the DKIM protocol, because it belongs in a separate part of the analysis of the message.


Others have also posted rebuttals of the Trend Micro blog post. You can find one here, at CircleID, and look in the comments there for pointers to others.

Wednesday, April 27, 2011


Ephemeral clouds

I’ve talked about cloud computing a number of times in these pages. It’s a model of networking that in some ways brings us back to the monolithic data center, but in other ways makes that data center distributed, rather than central. A data cloud, an application cloud, a services cloud. An everything cloud, and, indeed, when one reads about cloud computing one sees a load of [X]aaS acronyms, the aaS part meaning as a service: Software as a Service (SaaS), Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and so on.

I use email in the cloud. I keep my blog in the cloud. I post photos in the cloud. I have my own hosted domain, and I could have my email there, my blog there, my photos there... but who would maintain the software? I could pay my hosting service extra for that, perhaps, but, well, the cloud works for me.

It works for many small to medium businesses, as well. Companies pay for cloud-based services, and, in return, the services promise things. There are service-level agreements, just as we’ve always had, and companies that use cloud-based services get reliability and availability guarantees, security guarantees, redundancy, backups in the cloud, and so on. Their data is out there, and their data is protected.

But what happens when they want to move? Suppose there’s a better deal from another cloud service. Suppose I, as a user, want to move my photos from Flickr to Picasa, or from one of those to a new service. Suppose a company has 2.5 terabytes of stuff out there, in a complex file-system-like hierarchy, all backed up and encrypted and safe and secure... and they want to move it to another provider.

In the worst case, suppose they have to, because their current service provider is going out of business.

Recently, Google Video announced that they would take their content down, after having shut the uploads down (in favour of YouTube) some time ago. This week, Friendster announced that they would revamp their service, removing most of their data in the process.

Of course, you understand that when I say their data, here, I really mean your data, yes? Because those Google Video things were uploaded by their users, and the Friendster stuff is... well, here’s what they say:

An e-mail sent Tuesday to registered users told them to expect a new and improved Friendster site in the coming weeks. It also warned them that their existing account profile, photos, messages, blog posts and more will be deleted on May 31. A basic profile and friends list will be preserved for each user.

Now, that sort of thing can happen: when you rely on a company for services, the company might, at some point, go away, terminate the service, or whatnot. But what’s the backup plan? Where’s the migration path? In short...

...how do you save your data?

Friendster has, it seems, provided an exporter app that will let people grab their stuff before it goes away. Google Video did no such thing, and there’s a crowd-sourced effort to save the content. But in the general case, this is an issue: if your provider goes away — or becomes abusive or hostile — how easy will it be for you to get hold of what you have stored there, and to move it somewhere else?

Be sure you consider that when you make your plans.

[Just for completeness: I have copies on my own local disks of everything I’ve put online... including archives of the content of these pages. If things should go away, it might be a nuisance, but I’ll have no data loss.]

Tuesday, March 08, 2011


You’re in a maze of twisty little passages, all alike.

There’s a phrase given to us by the venerable computer game called Adventure, which fits many situations. The game, in which one explores caverns, searches for treasures, and solves puzzles to obtain the treasures and bring them back to the surface, contains two mazes.

Most adventurers find the first maze when they go south from a particular room in the cave. You’re in a maze of twisty little passages, all alike, says the computer, adding that there are passages leading off in all directions. One’s first thought is to go north to retrace one’s steps, but, well, the passages are twisty (and little), and going north from there only lands one in another room within the maze. Again, You’re in a maze of twisty little passages, all alike.

The directions are not random, and there actually is a well-defined maze here, which one can map. Enter the maze while carrying as many items as you can, and you can drop the items like bread crumbs. You have to keep carrying the lamp in order to see, but as you drop the rest, one by one, the rooms become distinct:

You’re in a maze of twisty little passages, all alike.
There is a bottle of water here.

You’re in a maze of twisty little passages, all alike.
There is tasty food here.

You’re in a maze of twisty little passages, all alike.
There are some keys on the ground here.

Interestingly, the maze comprises two lobes, connected by a single passage. Because the only (non-magic) exit from the maze is the way you came in, wandering into the far lobe makes it much more difficult to ever get out, and you’re likely to run out of battery power in your lamp, fall into a pit in the dark, and die.

Such is the maze of twisty little passages, all alike.

The second maze is interesting only for its differences. It’s also entered by heading south, from a different starting room. Its map is much more complex than that of the other, with many more passages interconnecting the rooms, though with fewer rooms. When you enter it, you see, You’re in a maze of twisty little passages, all different.

But this time you don’t have to leave bread crumbs; you have only to read the descriptions carefully:

You’re in a maze of twisty little passages, all different.

You’re in a maze of little twisty passages, all different.

You’re in a twisty maze of little passages, all different.

You’re in a little twisty maze of passages, all different.

...and so on.

The all-different maze is cute, and, as I said above, mostly there for its contrast with the all-alike maze (most adventurers stumble into the all-alike one first). There’s a treasure in the all-alike maze, so you have to go in there (and kill the pirate) in order to get it. There’s nothing you need in the all-different maze, and experienced adventurers just avoid it.

And anyway, it’s the first description that’s stuck with us as a catch phrase: You’re in a maze of twisty little passages, all alike.

When you’re having a discussion that keeps going around in circles with no hope for resolution: You’re in a maze of twisty little passages, all alike.

When you’re debugging a problem, but everything you try just makes the problem happen without adding any clue as to why: You’re in a maze of twisty little passages, all alike.

When you’re trying to deal with bureaucracy, and every attempt to get something done just sends you to another office that you know won’t help any more than the last did: You’re in a maze of twisty little passages, all alike.

Very useful sentence, that.

Friday, March 04, 2011


Reasonable network management

Back in December, the U.S. Federal Communications Commission released a Report and Order specifying new rules related to network neutrality. The rules have since been challenged in court in separate suits by Verizon and Metro PCS. They’re also under attack by the House of Representatives, though whatever it does is unlikely to get past the Senate and the president.

The Report and Order is quite long and involved, a typical federal document that runs to 194 pages (here’s a PDF of it, in case you’d like to read the whole thing). On page 135 there begins a statement by FCC Chairman Genachowski, which contains, on page 137, five points, key principles, as Mr Genachowski says, that lead to key rules designed to preserve Internet freedom and openness. That’s sort of an executive summary of the document.

I’ll note principles four and five here:

Fourth, the rules recognize that broadband providers need meaningful flexibility to manage their networks to deal with congestion, security, and other issues. And we also recognize the importance and value of business-model experimentation, such as tiered pricing. These are practical necessities, and will help promote investment in, and expansion of, high-speed broadband networks. So, for example, the rules make clear that broadband providers can engage in reasonable network management.

Fifth, the principle of Internet openness applies to mobile broadband. There is one Internet, and it must remain an open platform, however consumers and innovators access it. And so today we are adopting, for the first time, broadly applicable rules requiring transparency for mobile broadband providers, and prohibiting them from blocking websites or blocking certain competitive applications.

In apparent response to those points, and taking transparency seriously, Verizon Wireless has recently updated their Customer Agreement (Terms and Conditions). If you scroll down to the bottom of that document, you’ll find a section called Additional Disclosures, the first paragraph of which says this:

We are implementing optimization and transcoding technologies in our network to transmit data files in a more efficient manner to allow available network capacity to benefit the greatest number of users. These techniques include caching less data, using less capacity, and sizing the video more appropriately for the device. The optimization process is agnostic to the content itself and to the website that provides it. While we invest much effort to avoid changing text, image, and video files in the compression process and while any change to the file is likely to be indiscernible, the optimization process may minimally impact the appearance of the file as displayed on your device. For a further, more detailed explanation of these techniques, please visit www.verizonwireless.com/vzwoptimization

That URL at the end lacks the http at the beginning and has not been made into a clickable link, but if you copy/paste it into your browser’s address bar, you’ll be redirected to a long page called Explanation of Optimization Deployment, full of technical details. It’s perhaps the most detailed and technical disclosure I’ve seen presented to consumers, full of terms such as Internet latency, quantization, codecs, caching, transcoding, and buffer tuning.

I have to say that the policy looks reasonable. They say that they apply their optimization (not really the right term, here, but that’s the marketing spin) to all content, including Verizon Wireless branded content. They compress images and transcode video to reach a compromise between fidelity to the original content and what’s likely to be useful on a mobile device, conserving transmission resources in the process. It also benefits the consumer by way of reduced data charges. They also, basically, stream the content (buffer tuning), so if you stop a video in the middle you don’t have to transmit (nor pay for the transmission of) the unwatched portion.

The only disadvantage of any of this, as I see it, is that there’s no way to turn it off. If you notice degradation of your video content and want to watch the original — and are willing to pay for the extra data transmission that entails — you can’t.

As a first step, this looks good: it’s a reasonable policy that preserves the essence of neutrality and fits the reasonable network management model. Of course, Verizon Wireless may just be testing the water, introducing changes a little at a time, with the most benign changes first. We’ll have to see.

Tuesday, March 01, 2011


URL shorteners

If you’re a twit, er, a Twitter user, you’ve likely used one or another of the URL shorteners out there. Even if you’re not, you may have run across a shortened URL. The first one I encountered, several years ago, was tinyurl.com, but there are plenty of them, including bit.ly, tr.im, qoiob.com, tinyarrow.ws, tweak, and many others.

The way they work is that you go to one of them and enter a URL — say, the URL for this page you’re reading:

http://staringatemptypages.blogspot.com/2011/03/url_shorteners.html

...you click a button and get back a short link, such as this one:

http://bit.ly/eqHg3S

...that will get users to the same page. The shortened link redirects to the target page, and won’t take up too many characters in a Twitter or SMS message. It also may hide the ugliness of some horrendously long URL generated by, say, Lotus Domino.

On the other hand, it will also hide the URL that it points to. When you look at the bit.ly link above, you have no idea where it will take you. Maybe it’ll be to one of these august pages, maybe it will be to a New York Times article, maybe to a YouTube video, and maybe to a page of pornography. Click on a shortened URL at your own peril.

In addition, any URL you post, long or short, might eventually disappear (or, perhaps worse, point to content that differs from what you’d meant to link to), but if you post a load of shortened URLs to your blog or Twitter stream and then the service you used goes out of business, all your links will break at once. That didn’t use to happen, but it can now. And because some of them use country-code top-level domains (.ly, .im, .tk, and .ws, for example), the services may be subject to disruption for other reasons — one imagines that the Isle of Man and Western Samoa might be stable enough, but if you’ve been watching the news lately you might be less sanguine about Libya.

The more popular URL shorteners can also collect a lot of information about people’s usage patterns, using cookies to separate the clicks from distinct users. If they can get you to sign up and log in, they can also connect your clicks to your identity. There are definite privacy concerns with all this. URL shorteners run by bad actors can include mechanisms for infecting computers with worms and viruses before they send you on to the target site.

Of course, any URL can hide a redirect, and any URL can hide a redirect to a page you’d rather not visit. It’s just that URL shorteners are designed to hide redirects, and there are, as yet, no lists of best practices for these services, nor lists of reputable shorteners that follow those best practices.

What would best practices for URL shortening services look like? Some suggestions, from others as well as from me:

  • Publish a usage policy that includes privacy disclosures and descriptions, parameters, and limitations for other items such as the ones below.
  • Provide an open interface to allow browsers to retrieve the target URLs without having to visit them (a sketch of such a lookup appears after this list). This allows browsers to display the actual target URL on mouse-over or with a mouse click. Of course, shortening-service providers might not want you to be able to snag the URL without clicking, because they may be getting business from the referrals. Services such as Facebook, while not shorteners, front-end the links posted on their sites for this reason. So we have a conflict between the interests of the users and the interests of the services.
  • Filter the URLs you redirect to, refusing to redirect to known illegal or abusive sites. Provide intermediate warning pages when the content is likely to be offensive, but not at the level of blocking.
  • Provide a working mechanism for people to report abusive targets, and respond to the reports quickly.
  • Don’t allow the target URL to be changed after the short link is created.
  • Related to the previous item, develop some mechanism to address target-page content changes. This one is trickier, because ads and other incidental content might change, while the intended content remains the same. It’s not immediately clear what to do, or whether there’s a good answer to this one.
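
On the open-interface point: even today, the redirect itself is visible in an ordinary HTTP exchange, because the shortener answers with a Location header. A minimal sketch (Python with the requests library; the bit.ly link from earlier serves as the example) that retrieves the target without visiting it:

```python
import requests

def peek(short_url):
    # Ask for the headers only, and don't follow the redirect.
    resp = requests.head(short_url, allow_redirects=False)
    # Shorteners answer 301/302 with the target in Location.
    return resp.headers.get("Location")

print(peek("http://bit.ly/eqHg3S"))
```

A browser add-on could do the same lookup to show the target on mouse-over.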

Meanwhile, I never use URL shorteners to create links, and I try to avoid visiting links that are hidden behind them. I like to know where I’m clicking to.


Update, 11 March, this just in from BoingBoing:

Dear readers! URL shorteners’ popularity with spammers means we’ve blocked some of the big ones (at least temporarily) to cut down on the spammation. Sorry for the inconvenience! While we plan a long-term fix, just use normal URLs. You are welcome to use anchor tags in BB comments, too.

Monday, February 28, 2011


IP blocklists, email, and IPv6

Engineers in the Internet Engineering Task Force, in the Messaging Anti-Abuse Working Group, and elsewhere have been debating how to handle e-mail-server blocklists in an IPv6 network. Let’s take a look at the problem here.

We have basically three ways to address spam, with the goal of reducing the amount of it in our inboxes:

  1. Prevent its being sent in the first place.
  2. Refuse to accept it when it’s presented for relay or delivery.
  3. Discard it or put it into a junk mail folder at (or after) delivery.

The last is handled by what we usually think of as spam filters, which analyze the content and other aspects of the messages. Dealing with the first involves law enforcement, as well as adoption of best practices for legal email marketers. To implement the second, we try to do various analyses during the actual transmission of the email messages, in order to respond at the protocol level with some sort of refusal. It’s rather like standing between your postal carrier and the mailbox at your house, and telling the carrier that she may put this envelope into the box, but she should take those two catalogues and the credit-card offer right back to the post office with her.

And one can actually imagine doing that, by looking at the envelopes and applying rules such as, If it’s pre-sorted, it’s probably junk, and, The more urgent it claims to be, the more likely it is to be junk. But a better way, still, would be if we could get this to happen as soon as the junk mail entered the postal system, by having a way to say, See that guy who’s dropping that pile of mail at the post office? He only sends junk, and when you see him coming just make him go away. Don’t even let him bring his pile in the door.

We have that in our email systems, in what we call IP blocklists (or blacklists). These are lists of the numeric Internet addresses of email servers that we think send so much spam that we won’t even let them come to the door. When one of these servers makes an Internet connection to one of our mail servers, we don’t even start an email protocol exchange with them — we just refuse the connection. We make them go away.
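
Mechanically, most of these blocklists are published over DNS (as DNSBLs): you reverse the octets of the connecting IP address, append the list’s zone, and do an ordinary DNS lookup; getting an answer back means listed. A minimal sketch in Python (Spamhaus’s zen zone is one well-known list; check any list’s usage policy before querying it):

```python
import socket

def is_listed(ip, zone="zen.spamhaus.org"):
    # 192.0.2.99 -> 99.2.0.192.zen.spamhaus.org
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        socket.gethostbyname(query)   # any A record means "listed"
        return True
    except socket.gaierror:           # no record: not on the list
        return False

# A mail server consults this before even starting the SMTP exchange.
```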

Estimates vary as to what portion of attempted spam this blocks, but at least some estimates are on the order of 90%. Despite the problems with this mechanism (legitimate mail servers do find themselves on blocklists, for various reasons, and sometimes have a hard time getting the list-managers to remove them), it’s a critical one in the fight against spam, saving a great deal of time and computing resources by cutting the spam messages off much earlier in the process.

But note that it deals with IP addresses. Today, of course, that means IPv4 addresses, those things that look like 192.168.0.1, and that there are around 4 billion of. 4 billion is a large number, but, as we’ve seen, it’s notably finite and manageable. It’s reasonable to take every IP address we ever see trying to send mail, and keep it on a list, sorting the addresses into the good ones and the bad ones. It’s feasible to block Internet connections from the ones in our list that are marked bad.

Not so when we consider IPv6. Bumping the IP address from 32 bits to 128, bumping the 4 billion up to a billion billion billion or so — the number doesn’t matter, at that point — makes it infeasible to keep a list of bad addresses. There are enough addresses there to allow the bad guys to use a new one every time, so we’d never see repeats. There are, of course, ways we can group addresses into large blocks, and know that any address we see in one of those blocks will be bad, but even that isn’t enough to make it work.

We could switch to a pass list, a whitelist of known good addresses — that would still be small enough to be manageable — and refuse anything else. But that makes it very hard for an organization to deploy a new server, or for a new organization to join in.

John Levine has one approach: leave the email system on IPv4 for the foreseeable future. Even, John points out, when many other services, customer endpoints, mobile and household devices, and the like have been — have to have been — switched to IPv6, we can still run the Internet email infrastructure on IPv4 for a long time, leaving the IP blocklists with v4 addresses, and a system that we’re already managing fine with.

Of course, some day, we’ll want to completely get rid of IPv4 on the Internet, and by then we’ll need to have figured out a replacement for the IP blocklist mechanism. But John’s right that that won’t be happening for many years yet, and he makes a good case for saying that we don’t have to worry about it.

At least not until he and I have long been retired.

Thursday, February 17, 2011


Watson’s third day

I hadn’t planned to make three posts, one per day, about Watson on Jeopardy!, but there ya go. The third day — the second game of the two-game tournament — was perhaps even more interesting than the first two.

Watson seemed to have a lot more trouble with the questions this time, sometimes making runs of correct answers, but at other times having confidence levels well below the buzz-in threshold. Also, at many of those times its first answer was not the correct one, and sometimes its second and even its third were not either. Some of the problems seemed to be in the categories, but some just seemed to deal with particular clues, regardless of category.

Watson also did not have domination of the buzzer this time, even when it had enough confidence to buzz in. I don’t know whether they changed anything — I suspect not, since they didn’t say so. It’s likely that Mr Jennings and Mr Rutter simply were more practiced at anticipating and timing their button-presses by then (remember that the three days’ worth of shows were all recorded at the same time, a month ago).

Those factors combined to make Watson not the run-away winner going into the Final Jeopardy! round that it was in the first game. In yesterday’s final round (category: 19th-century novelists), all three contestants (and your reporter, at home) came up with the right answer, and Watson pulled far ahead with an aggressive bet that Mr Rutter didn’t have the funds to match. Mr Jennings, meanwhile, chose to be conservative: assuming he would lose to Watson (the first game’s results made that certain), he made his bet of only $1000 to ensure that he would come in second even if he got the answer wrong.

The result, then, was Watson winning the two-game match handily, and earning $1 million for two charities. Other charities will get half of Mr Jennings’s and Mr Rutter’s winnings (whether that’s before or after taxes, I don’t know; I also don’t know whether taxes will reduce Watson’s million-dollar contribution).

One other thing: in a New Scientist article yesterday, talking about the second day and the first Final Jeopardy! round, Jim Giles makes a sloppy mistake (but see update below):

Watson’s one notable error came right at the end, when it was asked to name the city that features two airports with names relating to World War II. Jennings and Rutter bet almost all their money on Chicago, which was the correct answer. Watson went for Toronto.

Even so, the error showed another side to Watson’s intelligence: knowing that it was unsure about the answer, the machine wagered less than $1000 on its answer.

Of course, Watson’s wager had nothing to do with how sure it was about the answer: it had to place the bet before the clue was revealed. Its wager had something to do with the category, but likely was far more heavily controlled by its analysis of the game position and winning strategy. In determining its bets, it runs through all the bets it and its opponents might make, and decides on a value that optimizes its own position. And its strategy in the second game was different from that in the first.


Update: The New Scientist article was updated shortly after it was published. It now says this:

Even so, the error did not hurt Watson too much. Knowing that it was far ahead of Jennings and Rutter, the machine wagered less than $1000 on its answer.

Wednesday, February 16, 2011


Watson’s second day

Commenting on yesterday’s entry, The Ridger notes this:

I find looking at the second-choice answers quite fascinating. "Porcupine" for what stiffens a hedgehog’s bristles, for instance. There is no way that would be a human’s second choice (after keratin). Watson is clearly getting to the answers by a different route than we do.

That’s one way to look at it, and clearly it’s true that Watson goes about determining answers very differently from the way humans do — Watson can’t reason, and it’s all about very sophisticated statistical associations.

Consider that both humans (in addition to this one, at home) got the Final Jeopardy question with no problem, in seconds... but Watson had no idea (and, unfortunately, we didn’t get to see the top-three analysis that we saw in the first two rounds). My guess is that the question (the answer) was worded in a manner that made it very difficult for the computer to pick out the important bits. It also didn’t understand the category, choosing Toronto in the category U.S. Cities, which I find odd (that doesn’t seem a hard category for Watson to suss).

But another way to look at it is that a human wouldn’t have any second choice for some of these questions, but Watson always does (as well as a third), by definition (well, or by programming). In the case of the hedgehog question that The Ridger mentions, keratin had 99% confidence, porcupine had 36%, and fur had 8%. To call fur a real third choice is kind of silly, as it was so distant that it only showed up because something had to be third.

But even the second choice was well below the buzz-in threshold. That it was as high as it was, at 36% confidence, does, indeed, show Watson’s different thought process — there’s a high correlation between hedgehog and porcupine, along with the other words in the clue. Nevertheless, Watson’s analysis correctly pushed that well down in the answer bin as it pulled out the correct answer at nearly 100% confidence.

In fact, I think most adult humans do run the word porcupine through their heads in the process of solving this one. It’s just that they rule it out so quickly that it doesn’t even register as a possibility. That sort of reasoning is beyond what Watson can do. In that sense it’s behaving like a child, who might just leave porcupine as a candidate answer, lacking the knowledge and experience to toss it.

No one will be mistaking a computer for a human any time soon, though Watson probably is the closest we’ve come to something that could pass the Turing test. However well it does at Jeopardy! — and from the perspective of points, it’s doing fabulously (and note how skilled it was at pulling all three Daily Doubles) — it would quickly fall on its avatar-face if we actually tried to converse with it.

Tuesday, February 15, 2011


Watson’s first day

Interesting.

Watson did very well on its first day. In order to have time to explain things and introduce the concept of Watson, they set it up so that only two games are played over the three days. The first day was for the first round, and the second day (this evening) will have Double Jeopardy and Final Jeopardy.

It wasn’t surprising that there were a few glitches, where Watson didn’t fully get the question — for instance, answering leg, rather than missing a leg, in describing the anatomical oddity of an Olympic winner. And, as we knew might happen, Watson repeated an incorrect answer from Ken Jennings, because the computer has no way to know what the other contestants have said.

What I found interesting, though, is that Watson does have a very strong advantage with the buzzer. Despite the attempts to smooth that out by setting up a mechanical system whereby Watson sends a signal to cause a button to be physically pushed, and despite whatever the humans can do through anticipation, it’s clear that people just can’t match the computer’s reactions. Almost every time Watson was highly confident of its answer — a green bar (see below) — it won the buzz. Surely, on things like the names of people in Beatles songs, Mr Jennings and Mr Rutter were as confident of the answer as Watson was, and had the answers ready well before Alex finished reading. Yet Watson won the buzz on every one of those.

It was fun to have a little of Watson’s thought process shown: at the bottom of the screen, we saw Watson’s top three answer possibilities, along with its confidence for each, shown as a percentage bar that was coloured red, yellow, or green, depending upon the percentage. That was interesting whether or not Watson chose to buzz in. On a Harry Potter question for which the answer was the villain, Voldemort, Watson’s first answer was Harry Potter — it didn’t understand that the question was looking for the bad guy, even though the whole category related to bad guys. But its confidence in the answer was low (red, and well below the buzz threshold), it didn’t buzz in, and Mr Rutter gave the correct answer (which had been Watson’s second choice).

Of course, they didn’t use any audio or video clues, according to the agreement — Watson can neither hear nor see — but they didn’t seem to pull any punches on the categories or types of questions. It feels like a normal Jeopardy! game.

Oh, and by the way: the TiVo has it marked as copy-protected, so I can’t put it on a DVD. Damn. I don’t know whether regular Jeopardy! games are that way or not; I’ve never recorded one before.

Sunday, February 13, 2011


Jeopardy! tomorrow

Monday through Wednesday are the days when the Jeopardy! games will air that pit IBM Research’s Watson computer against former champions Ken Jennings and Brad Rutter.

My TiVo is set to record them, and it’s also recorded last week’s NOVA program, Smartest Machine on Earth (which you can watch on the PBS site). I’m eager to see how the games, recorded last month, came out.


Update, 15 Feb, answer to Nathaniel’s question in the comments: Ken Jennings says this, on his blog:

On Twitter, Watson (okay, his human handlers) have said that video will be posted on Watson’s website on Thursday, for those unable to watch one or more of the games live. You know: non-Americans, the gainfully employed, the Tivo-less, those with significant others expecting a romantic night out tonight instead of a quiz show, etc.

Thursday, February 10, 2011


Foiling offline password attacks

Jarno, at F-Secure — an excellent Finnish anti-malware company — has posted a nice analysis of encoding password files. Because he assumes some knowledge of the way things work, I’ll try to expand a bit on that here. Some of this has been in these pages before, so this is a review.

A cryptographic hash algorithm is a mathematical algorithm that will take some piece of data as input, and will generate as output a piece of data — a number — of a fixed size. The output is called a hash value, or simply a hash (and it’s sometimes also called a digest). The algorithm has the following properties:

  1. It’s computationally simple to run the algorithm on any input.
  2. Given two different inputs, however similar, the hashes will almost certainly differ, and it’s computationally infeasible to find two inputs that produce the same hash (it is collision resistant).
  3. Given a hash value, it’s computationally infeasible to determine an input that will generate that hash (it is preimage resistant).
  4. Given an input, it’s computationally infeasible to choose another input that gives the same hash (it has second preimage resistance).

Cryptographic hash algorithms go by names like MD5 (for Message Digest) and SHA-1 (for Secure Hash Algorithm), and they’re used for many things. Sometimes they’re used to convert a large piece of data into a small value, in order to detect modifications to the data. They’re used that way in digital signatures. But sometimes they’re just used to hide the original data (which might actually be smaller than the hash value).
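
A tiny demonstration with Python’s standard hashlib (SHA-1, one of the algorithms just named): the output is always the same fixed size, and even a one-character change in the input yields a completely different digest.

```python
import hashlib

for text in (b"password", b"passw0rd"):
    # Same-length hex digests, with no visible resemblance between them.
    print(text, hashlib.sha1(text).hexdigest())
```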

Unix systems used to store user names and passwords in a file called /etc/passwd, with the passwords hashed to hide (obfuscate) them. A standard attack was to find a way to get a copy of a system’s /etc/passwd file, and try to guess the passwords offline. If you know what hash algorithm they’re using, that’s easy: guess a password, hash it, then look in the /etc/passwd file to see if any user has that hash value for its password.
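
The offline attack just described is nothing more than a loop. A sketch, assuming an old-style file of unsalted SHA-1 password hashes already parsed into a dictionary (the file layout is hypothetical):

```python
import hashlib

GUESSES = ["password", "123456", "qwerty", "letmein"]

def crack(stolen):
    # stolen: {username: hex_sha1_of_password}
    for guess in GUESSES:
        h = hashlib.sha1(guess.encode()).hexdigest()
        for user, stored in stolen.items():
            if h == stored:
                print(user, "is using", repr(guess))
```

No login attempts, no lockouts: the attacker can run this as fast as the hardware allows.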

Nowadays, most systems have moved away from storing the passwords that way, but there are still services that do it, there are still ways of snatching password files, and the attack’s still current. Jarno’s article looks at some defenses.

Salting the hashed passwords involves including some other data along with the password when the hash is computed, to make sure that two different users who use the same password will have different hashes in the password file. That prevents the sort of global attack that says, Let’s hash the word ‘password’, and see if anyone’s using that. Of course, if the salt is discoverable (it’s the user name, or something else that’s stored along with the user’s information), users’ passwords can still be attacked individually.
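
A sketch of the salted variant (the storage scheme here is illustrative, not any particular system’s): each user gets a random salt stored beside the hash, so verification can repeat the computation, but two users with the same password no longer share an entry.

```python
import hashlib, os

def make_entry(password):
    salt = os.urandom(16)  # per-user random salt
    digest = hashlib.sha1(salt + password.encode()).hexdigest()
    return salt, digest

# make_entry("password") and make_entry("password") differ,
# so the "hash the word 'password' once" attack no longer works.
```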

Even using individual attacks, it’s long been easy to crack a lot of passwords offline: we know that a good portion of people will use one of the 1000 or so most popular passwords (password, 123456, and so on), and it never has taken very long to test those. Even if that only nets the attacker 5% of the passwords in the database, that’s pretty good. But now that processors are getting faster, it’s feasible to test not only the 1000 most popular passwords, but tens or hundreds of thousands. All but the best passwords will fall to a brute-force offline attack.

The reason offline attacks are important is that most systems have online protections: if, as an attacker, you actually try to log in, you’ll only be allowed a few tries before the account is locked out and you have to move on to another. But if you can play with the password file offline, you have no limits.

Of course, the best defense is for a system administrator to make sure no one can get hold of the system’s or the service’s password file. That said, one should always assume that will fail, and someone will get the file. Jarno suggests the backup defense of using different salt values for each user and making a point of picking a slow hash algorithm. The reasoning is that it doesn’t make much difference if it takes a few hundred milliseconds for legitimate access — it doesn’t matter if a login takes an extra quarter or half second — but at a quarter of a second per attempt, it will be much harder for an attacker to crack a bunch of passwords on the system.

Just two small points:

First, Jarno recommends specific alternatives to SHA-1, but he doesn’t have it quite right. PBKDF2 and HMAC are not themselves hash algorithms. They are algorithms that make use of hash algorithms within them. You’d still be using SHA-1, but you’d be wrapping complexity around it to slow it down. That’s fine, but it’s not an alternative to SHA-1.
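
For instance, Python’s standard library exposes PBKDF2 directly: it’s still SHA-1 underneath, wrapped in HMAC and iterated to be deliberately slow (the iteration count below is an illustrative choice, not a recommendation from Jarno’s article).

```python
import hashlib, os

salt = os.urandom(16)
# 100000 iterations of HMAC-SHA-1: a fraction of a second for one
# legitimate login, crippling for millions of offline guesses.
key = hashlib.pbkdf2_hmac("sha1", b"my password", salt, 100000)
print(key.hex())
```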

The same is the case for bcrypt, only, worse, bcrypt uses a non-standard hash algorithm within it. I would not recommend that, because the hash algorithm hasn’t been properly vetted by the security community. We don’t really know how its cryptographic properties compare with those of SHA-1.

Second, Jarno suggests that as processors get faster, the hashing can be changed to maintain the time required to do it. He’s right, but that still leaves an exposure: because the server doesn’t have the passwords (only the hashes of the passwords), no hash can be changed until the user logs in. If the system doesn’t lock out unused accounts periodically, those unused accounts become weak points for break-ins over time.

That said, this is sound advice for system administrators and designers. And perhaps at least a little interesting to some of the rest of you.

Friday, January 14, 2011


Watson and Jeopardy!

Today, the folks at Jeopardy! will be recording the competition, to be aired on 14-16 February, between IBM’s Watson computer and two of the game’s biggest champions, Ken Jennings and Brad Rutter. I’m told that the Watson Research Center lab is closed to employees today, and that employees were asked to work from home or make other working arrangements for the day.

They did a practice round that Watson won, and you can watch some video of that in the ZDNet article. In that round, no one answered any questions wrong — it will be interesting to see how it all works out when the errors start coming in — and it looks like Watson has an edge on the buzzer timing.

Some observers are not impressed by all this. One commenter to the ZDNet article says, This is not progress. I’ve talked with others who think the whole thing is a WOMBAT.[1] And, indeed, one has to wonder about an expenditure of a million dollars on a replica Jeopardy! set (according to CNN Money).

But, of course, they want to make a spectacle of this, just as they did with Deep Blue and Garry Kasparov.

Spectacle aside, though, is this just a silly waste? We’ll have to see what comes of the technology after the Jeopardy! match. It’s not directly clear what IBM did with the technology that went into Deep Blue, but it’s unlikely that the technology that has gone into Watson will languish. If all these projects do is produce machines that can play chess or Jeopardy!, then, indeed, they’re wasteful, no more than novelties.

But, surely, technology that can understand human-language questions and answer them has many practical uses. Such a system could be a useful front-end to many systems that have to direct people to the right experts, diagnose problems, and answer common questions. Of course, on the other side, many of us might find ourselves more frustrated than we are already, when it becomes even harder to get a real human on the phone.

Though, might we be getting closer to passing the Turing test? Perhaps before too long we won’t be able to tell whether we have a real human on the phone or not. And if that computer, Watson XVII perhaps, can answer our questions and give us a smooth and pleasant experience in the process, does it matter?

It’s clear that, while word processors and spreadsheets are useful, it’s games that have really pushed and expanded the limits of technology. 3-D graphics rendering, hand-held motion sensors, and even parts of the underlying network technology are where they are because of games. If we take advantage of where the games move us and use the technology beyond the realm of entertainment — by, say, rendering images of heart scans in 3-D to give doctors diagnostic capabilities that our parents’ doctors couldn’t even dream about, and allowing them to perform surgery with amazing levels of precision — then what we spent on the frivolity of the games was well worth it.

So let’s see what’s next for the Watson technology after Jeopardy!

We’ve come a long way since Eliza.


[1]WOMBAT = Waste Of Money, Brains, And Time

Monday, January 03, 2011


Свободного программного обеспечения

Interesting: Vladimir Putin has signed an order to move the Russian government to free software over the next four years.

The transition to open-source, or free, software will begin in the second quarter of 2011, with the Ministry of Communications examining what base software packages are needed for government agencies, according to the documents. During the same quarter, the ministry and other agencies will develop proposals for user support centers and for mechanisms to support software developers, the documents said.

Russian agencies will also begin an inventory of their IT assets during the second quarter of 2011, the documents said. Pilot agencies will begin using a basic package of open-source software in the second quarter of 2012, according to the transition schedule.

Official adoption of Firefox has been going around here and there, but this goes way beyond that, with plans to deploy Linux in place of Windows, to replace Microsoft Office, and so on. The order talks of replacing proprietary software with free software, including operating systems, drivers for hardware and application software for servers and user workstations.

On the other hand, while Computerworld’s report mentions open-source software, I’m not sure about the idiom. The word the order uses, свободного (genitive of свободный), seems to mean free as in unrestricted, which is the same sense as it’s used by the Free Software Foundation — not free of charge, so much as free access. There are differences between free software and open-source software; they’re similar, but they’re not the same. Because I don’t know Russian, I can’t tell whether the Russian term (used here in the title, the words taken from the official plan) applies to the latter or not.

Friday, December 17, 2010


Someone else’s impressions of a Mac

Here’s an odd essay in the Huffington Post (one might call that a tautology), a rant by one Joshua Kors, who calls himself an Investigative Reporter, about why he’s returning the iMac he just bought.

I say that it’s odd because it seems that anyone who’s paid any attention to recent technology would know a few things that Mr Kors seems not to: that Mac computers are different from Windows machines, that there are usually ways to accomplish what you want, if you’re willing to look into it a bit (investigate, one might say), and suchwhat.

When I got my first (and, so far, only) Mac, three and a half years ago, I knew there’d be differences between it and Windows, and I knew there’d be things it’d take me a while to learn. I wrote about some of those differences here and here. My overall impression was (and is) that Macs aren’t better than Windows machines, nor are they worse; they’re just different. And the differences in either take some getting used to, if you’re accustomed to the other. From my first impression, after only one day:

There’s a lot that seems like it’s wrong until you find out that it’s really there, and you just need to learn how to do it. I’ve found quite a few of those so far (the Dock is much nicer than it first seems, for instance). We’ll see how I like it as I learn more.

And that’s the thing: Mr Kors didn’t seem to be interested in learning the differences, or in figuring out what works better for him and what doesn’t. He says he was really excited to buy an iMac, but then, after two weeks, he returned it, annoyed:

Two weeks later I’m back at Apple headquarters — my teeth worn down, my face prematurely aged from endless hours of sleeplessness and technological frustration — certain that the iMac was the worst purchase I’d ever made.

His account, though, starts off with hyperbole and goes from there:

My iMac and I got off on the wrong foot. Turns out there’s a video camera embedded in the screen, and before I could boot her up for the very first time, she wanted to take my picture. For identity purposes, she said. I stumbled to the bathroom, brushed my hair (and my teeth), exchanged my raggedy Raiders t-shirt for a professionally ironed button-up and returned to my desk, smirking at the turn of events. My old PC didn’t care if I called the Pentagon in my bathrobe. My iMac apparently had registered with Match.com.

Come on, Mr Kors. Just put your cat in front of the camera, and move on. I do agree that the overly familiar tone of things is annoying, at least to this techie, but it’s also not rocket science to understand that it’s just fluff. Windows, too, wants a personal image for the login screen. Only, Windows starts off with some canned images — a chess piece, a flower, a skateboard, and so on. Whatever.

He goes on to complain about the mouse and the keyboard, before getting used to them. He complains that not all the software he wants is bundled with his machine (was it really so on Windows?), and that the software he’s used to doesn’t work (had no one clued him in about needing new software?), and he implies that he knows of no alternatives.

He says he knew that he wouldn’t be able to transfer files from one computer to another over the network, something that’s news to this Mac-and-Windows user, who does that all the time — Mac to Mac, Mac to Windows, Windows to Mac... it all works fine. In copying files using an intermediate external drive (a canoe, in his metaphor), he’s frustrated at the manual bookkeeping he has to do, because he can’t figure out that copy-then-move-to-trash is the equivalent of move (yes, I find the two-step requirement mildly annoying as well, but it’s hardly critical).

Migrating settings from iTunes, Thunderbird, and so on is much easier than he makes out, with or without an ocean of Mac dork chat boards. Changing display fonts is also easy, and you don’t have to change the font size in your sent mail in order to do it.

All in all, it appears that Mr Kors didn’t want to try something new, but just wanted to write a rant. And so he did. Only, he comes across as an idiot in the process.

Well, he writes for the Huffington Post, so maybe that’s enough to make him come across as an idiot all by itself.

Tuesday, October 26, 2010


More on Internet cafés and public networks

For my readers who aren’t terribly fond of the entries tagged technology, please stick with this one. It’s important.

Do you log into web sites from public computers, even though I advised against it four years ago? That post only scratched the surface, really: it just talked about using public computers. These days, most people have their laptops with them, and they connect them to the public wireless networks in the cafés.

Most of those networks are unencrypted. That means that you don’t have to enter a key or a password when you access the network. You just select the network name (or let your computer snag it automatically), go to a web page in your browser, and get redirected to some sort of login and/or usage-agreement screen on the network you’ve connected to. Once you click through that, you’re on the Internet.

Suppose there are twenty people in there using that particular network. All twenty of them are sending and receiving stuff through the air. How is it that I only get my stuff, and you only get yours, and we don’t see each other’s, nor the web pages of the other eighteen users? It must be that my web pages are beamed straight to me, and yours to you, right?

No. In fact, everything that everyone sends and receives is out there for all twenty computers to see. But each of our computers is given an IP address, each data packet contains the address that the packet is being sent to... and all of our well-behaved computers just look at the addresses and ignore any packets that aren’t meant for them.

Computers do not have to be well behaved. Any computer in the café — or near enough to hear the wireless signals — can see everything that everyone is sending to and receiving from the network. Because the network isn’t encrypted, it’s all out there, in the clear, visible to all who care to be badly behaved.

But we aren’t completely unprotected: we have something called TLS (or SSL, depending upon the version). When the web site’s address, the URL, begins with https, your communication with that web site is encrypted and safe from eavesdropping, even if the network itself isn’t. Perhaps you don’t care who sees you reading the New York Times, but you want to be protected when you visit your bank online. Use http for the Times and https for the bank, and all is well.

And that’s important, because most web authentication just has you send your username and password openly from your browser to the web site. Anyone could snoop your ID and password as you logged in, if your connection to the web site wasn’t encrypted. But that https saves you.

But wait: I have a New York Times account, and I’ve logged into the Times web site (using https). Every time I visit the site, it knows who I am. Even when I just go to http://www.nytimes.com/ ! How does it know that, when I’m not logging in all the time?

Web sites use things called browser cookies to remember stuff about you. A cookie is a short bit of data that the web site sends and asks your browser to attach a name to and keep. Later, when you return, the web site asks if you have a cookie with a particular name, and if you do, your browser sends it. For web sites that you log into, such as your bank and the Times, the login (session) cookie is sent every time your browser touches the web site. Every time I click on another Times article, my Times session cookie is sent again. Every time I go to another page on my bank’s site, my bank’s session cookie is sent again.

My bank is set up securely, as is my credit card site, as is Gmail, as is PayPal: every contact from the login screen until I’m logged out is through https. It’s all encrypted. Not only is my password encrypted when I log in, but the session cookie that the site gives me is encrypted too, every time I send it.

The New York Times, though, doesn’t work that way: only the login itself uses https. Once it gives me the session cookie, everything switches back to http, and there’s no encryption. When I click on an article and my browser sends my cookie again, anyone in the café can grab it.

Now, the cookie doesn’t contain my password, so no one can get my password this way. But as long as I stay logged in, and the cookie is valid, anyone who has that cookie can masquerade as me. If they send my cookie to the New York Times, it will treat them as though they were me, as though they had logged in with my password.
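
One thing the curious can look at is whether a site marks its session cookie Secure: that flag tells the browser never to send the cookie over plain http at all. A minimal sketch with the Python requests library (the URL is a placeholder, not a real login endpoint):

```python
import requests

resp = requests.get("https://example.com/login")  # placeholder URL
for cookie in resp.cookies:
    # Without the Secure flag, the browser will also send this
    # cookie over plain http, where an open wireless network
    # exposes it to anyone listening.
    print(cookie.name, "Secure" if cookie.secure else "NOT secure")
```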

Of course, it’s not just the New York Times that does this. Amazon does it. So do eBay, Twitter, Flickr, Picasa, Blogger, and Facebook. So do many other sites where you can buy and sell things. (All the airline sites I’ve checked do it right, using https after login.) That means that if you use Facebook while you’re at Panera, someone else can borrow your Facebook session cookie and be you, until you log out. If you stop by Starbucks and get on eBay, someone else can use your cookie to make bids from your account.

There’s some protection at some sites. Amazon, for example, will let the cookie thief browse around as you, but will want your password before placing an order... assuming you didn’t enable one-click purchasing. And depending upon the options you have set, eBay might or might not ask for your password when the thief places a bid. But Facebook and Twitter are certainly wide open, here.

To try to increase awareness of this, a guy named Eric Butler has created a Firefox add-on called Firesheep, which will make it trivial for anyone, even someone who knows nothing about the technical details of this stuff, to be a cookie thief and pretend she’s you on Facebook, or Twitter, or Blogger, or the New York Times. Eric isn’t trying to abet unethical or criminal behaviour; he’s trying to push the popular web sites, whose users will be targets of these sorts of attacks, to fix their setups and use https for everything whenever you’re logged in.

So here’s an expanded form of the warning: Don’t do private stuff on public networks, unless you’re absolutely sure your sessions are encrypted. If you don’t know how to be sure, then err on the side of caution.

Monday, October 25, 2010

.

Challenge/response still lives (barely)

Wow; I haven’t gotten one of these in a long time:

ATTENTION!

A message you recently sent to a 0Spam.com user with the subject "[redacted]" was not delivered because they are using the 0Spam.com anti-spam service. Please click the link below to confirm that this is not spam. When you confirm, this message and all future messages you send will automatically be accepted.

I wrote about challenge/response anti-spam systems about three years ago, but probably haven’t seen a challenge message in at least two years. I thought people had given up on them.

Alas, no. But if the last two years are anything to judge by, they’ve at least fallen further into disfavour.

Anyway, it’s worth re-posting my three-year-old item about them. All the problems, all the reasons one shouldn’t use them, are still valid now. So, here’s the link again: head over and read (or re-read) it.

Monday, October 11, 2010

.

Search engines and their responsibility

A French court has just decided a case that, if the decision is upheld on appeal, will likely have a great deal of effect on online search engines. A French man had been accused of crimes relating to the corruption of a minor, ultimately receiving a suspended sentence. He found that Google had indexed the news items about his case, putting them at the top of search results on his name:

Given extensive press coverage of the alleged crime at the time, querying the man’s name on the popular search engine returns web pages from news publications that suggested he was a rapist, among other non-favorable descriptions.

The man argues that the statements in the online articles still available today adversely characterize him, which puts him in a disadvantageous social position when meeting new people and applying for jobs, among other situations and opportunities.

The man previously contacted Google directly to remove the defamatory articles from its search index, but the company did not do so arguing its proprietary algorithms simply return web pages in its index related to the keywords searched, that is, there is no direct human manipulation of top search results.

The result from the court was this:

The French court sided with the plaintiff, agreeing that those representations were defamatory, and ruled Google could have mitigated costs to the plaintiff by removing the pages.

The ruling ordered Google to pay €100,000, and to reimburse €5,000 in litigation costs incurred by the plaintiff. The ruling also ordered the company to disassociate the man’s name from the defamatory characterizations in Google Suggest, which suggests popular phrases while a person enters search terms in the Google search-box prior to completing a search. Additionally, for every single day the defamatory information remains in the company’s search results, Google would be fined an additional €5,000.

This decision will be disastrous for search engines and other Internet services if it stands. Moreover, it’s just horribly wrong on the surface. It makes no sense to hold indexing services responsible for the information they index, unless it can clearly be shown that they preferentially indexed certain material with a goal of creating a biased view.

Long before Internet search tools were widely available, research facilities helped people find news items and other public information that we might rather they didn’t point to, including false information and stories that have since been debunked. We’ve always considered it the researcher’s responsibility to winnow the data.

The difference now, of course, is that the researchers are friends, neighbours, potential romantic partners, and prospective employers... and the information is much more readily available than it ever was. It’s tempting to try to make the search engines let go of obsolete information and only find the current stuff.

The problems with that idea, though, are several. It’s essentially impossible to sort out in any automated way what’s appropriate and what’s not. Even if they prefer legitimate news outlets to other sources of information, and prefer newer articles to older ones, the amount of cross-linking, re-summarizing, and background information will still show searchers plenty of nasty stuff. And who decides what the legitimate news outlets are? The search engines shouldn’t be making those filtering decisions for us.

Any mechanism that isn’t entirely automated doesn’t scale. With the untold millions upon millions of web pages that Google and other search engines have to index every day, there would be no way to respond to individual requests — or demands backed by court mandates — to unlink or otherwise remove specific information.

If this should stand, I can see that Google might have to cease operations in France. If it should spread, it might easily deprive all of us of easy searching on the Internet. That would be a far greater disaster than having a guy in Paris have to explain away unflattering news stories about a false or exaggerated accusation.

Clearing one’s name has always been a difficult challenge, and it’s only been made harder — perhaps, ultimately, impossible — on the Internet. I have a great deal of sympathy for anyone who finds himself relentlessly pursued by his past, especially when that past contains errors that weren’t his.

But this can’t be an answer to that. It just comes with too much collateral damage.

Monday, October 04, 2010

.

A couple of things about Stuxnet

There’s a recently discovered (within the last few months) computer worm called Stuxnet, which exploits several Windows vulnerabilities (some of which were patched some time ago) as it installs itself on people’s computers. It replicates mostly through USB memory sticks, and not so much over the Internet (though it can replicate through storage devices shared over networks). And it’s something of an odd bird: its main target isn’t (at least for now) the computers it’s compromised, and it’s not trying to enslave those computers to send spam, collect credit card numbers, or mount attacks on web sites.

It’s specifically designed to attack one particular industrial automation system by Siemens, and it’s made headlines because of how extensive and sophisticated it is. People suspect it’s the product of a government, aimed at industrial sabotage — very serious stuff.

The folks at F-Secure have a good Q&A blog post about it.

There are two aspects of Stuxnet that I want to talk about here. The first is one of the Windows vulnerabilities that it exploits: a vulnerability in .lnk files that kicks in simply by having an infected Windows shortcut show its icon:

This security update resolves a publicly disclosed vulnerability in Windows Shell. The vulnerability could allow remote code execution if the icon of a specially crafted shortcut is displayed. An attacker who successfully exploited this vulnerability could gain the same user rights as the local user. Users whose accounts are configured to have fewer user rights on the system could be less impacted than users who operate with administrative user rights.

Think about that. You plug in an infected USB stick, and you look at it with Windows Explorer. You don’t click on the icon, you don’t run anything, you don’t try to copy it to your disk... nothing. Simply by looking at the contents of the memory stick (or network drive, or CD, or whatever), while you’re staring at the icon and saying, Hm, I wonder what that is; I’d better not click on it, the worm is already infecting your computer. And since most Windows users prior to Windows 7 ran with administrator rights, the worm could get access to anything on the system.

You need to make sure this security update is on your Windows systems.

The other aspect is interesting from a security point of view. From the F-Secure Q&A:

Q: Why is Stuxnet considered to be so complex?
A: It uses multiple vulnerabilities and drops its own driver to the system.

Q: How can it install its own driver? Shouldn’t drivers be signed for them to work in Windows?
A: Stuxnet driver was signed with a certificate stolen from Realtek Semiconductor Corp.

Q: Has the stolen certificate been revoked?
A: Yes. Verisign revoked it on 16th of July. A modified variant signed with a certificate stolen from JMicron Technology Corporation was found on 17th of July.

I’ve talked about digital signatures before, at some length. When the private keys are kept private, digital signatures that use current cryptographic suites are, indeed, secure. But...

...anyone who has the private key can create a spoofed signature, and if the private keys are compromised the whole system is compromised. When one gets a signing certificate, the certificate file has both private and public keys in it. Typically, one installs the certificate, then exports a version that only contains the public key, and that certificate is made public. The original certificate, containing the private key, has to be kept close.

But it’s just a file, and anyone with access to it can give it to someone else. Shouldn’t, but can. If you can compromise an employee with the right level of access, you can snag the private key and make unauthorized signatures that verify as genuine.
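To see how little the math cares who is holding the key, here’s a minimal sketch in Python using the third-party cryptography package; the key, message, and padding choices here are illustrative, not Stuxnet’s:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Whoever holds this private key, rightful owner or thief, looks
# exactly the same to the verifier.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

message = b"driver code, malicious or not"
signature = private_key.sign(message, padding.PKCS1v15(), hashes.SHA256())

# Verification raises InvalidSignature on a bad signature; success
# proves only that the private key was used, not who was using it.
public_key.verify(signature, message, padding.PKCS1v15(), hashes.SHA256())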

In most cases, it’s far easier to find corruptible (or unsophisticated) people than it is to break the crypto. And if the stakes are high enough, finding corruptible people isn’t hard at all. The Stuxnet people may well have a host of other stolen certs in their pockets.

Monday, September 20, 2010

.

Interesting hacks: IPv6 addresses in UNC names

Microsoft’s Raymond Chen tells us about an interesting hack that Microsoft uses. When you’re using a disk drive over the network from Windows, you normally refer to it with what’s called a UNC name (for Universal Naming Convention). Normally, what goes into the UNC name is the name of the computer the drive is on, so if you want to use a share called Banana on a computer called HomeSrv, you write it as \\HomeSrv\Banana. So, for example, you might copy MyFile.html into the WebFiles subdirectory this way:

copy MyFile.html \\HomeSrv\Banana\WebFiles

That uses a NetBIOS name for the computer, but it’s common to use an Internet address instead, often written as a domain name. So, maybe:

copy MyFile.html \\homesrv.example.net\Banana\WebFiles

Sometimes, you have to use a computer that doesn’t have a resolvable name, perhaps because it’s on a private network that doesn’t have name-resolution service (DNS). In that case, you have to use the numeric Internet address (IP address):

copy MyFile.html \\192.168.2.13\Banana\WebFiles

Now, 192.168.2.13 is an IPv4 address — the form of address most of us are using today. But as we switch to IPv6, we’ll be using addresses in a different form. They’re 128-bit addresses, and they’re written as eight sixteen-bit chunks, separated by colons (not dots). Like this:

copy MyFile.html \\2001:DB8:0:0:8:800:200C:417A\Banana\WebFiles

The trouble is that the colon character has a special meaning in Windows identifiers, from way back before anyone had thought of IPv6, and many programs can’t deal with something that looks like that. To help out, Microsoft registered the domain name ipv6-literal.net, and you can do this with it:

copy MyFile.html \\2001-DB8-0-0-8-800-200C-417A.ipv6-literal.net\Banana\WebFiles

That special name, 2001-DB8-0-0-8-800-200C-417A.ipv6-literal.net, will resolve to the IPv6 address 2001:DB8:0:0:8:800:200C:417A... and it might look long and ugly, but it will work in any program that supports domain names in UNC names.
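The transformation is mechanical: colons become dashes, and the magic suffix goes on the end. Here’s a trivial sketch in Python, purely to illustrate the naming rule (this isn’t code Windows runs; it does the equivalent internally):

def ipv6_literal_name(addr):
    # Colons become dashes; the special suffix is appended.
    return addr.replace(":", "-") + ".ipv6-literal.net"

print(ipv6_literal_name("2001:DB8:0:0:8:800:200C:417A"))
# prints 2001-DB8-0-0-8-800-200C-417A.ipv6-literal.net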

The amusing part of the hack is that it doesn’t actually go out to DNS and resolve that name. Indeed, if you try it from nslookup, it will resolve to the same address that ipv6-literal.net does. If you put it in a web browser, it will do a Bing search on the address string and ipv6-literal. No, what’s interesting is that the name is specially handled by Windows, and resolved in the Windows internal name resolution scheme, without its ever going out to the Internet.

It’s the true definition of a hack, put in to make old resource-name parsers happy. And it only has to work on Windows, because Windows systems are the only ones that have that issue with the colon character.