I skipped the 8:30 AM session today; it was mostly not interesting to me and I badly needed the extra hour of sleep. I'm sorry to miss On the Feasibility of Internet-Scale Author Identification, but I will read the paper. I also skipped the business meeting, so the summaries start with the 10:30 session and end with the short talks.
Network Security
Scalable Fault Localization under Dynamic Traffic Patterns
This is a new scheme to find malicious routers, which may be dropping or misrouting packets they should have forwarded, modifying them in transit, or injecting entirely new packets. The problem has been studied for more than a decade, but it is very hard, with no truly satisfying solutions to date. A malicious router can pretend to work correctly when it notices probes; mandatory per-hop acks are high-overhead, may be forgeable, and may also enable framing attacks (making it look like some other router is malicious); you have to distinguish malice from normal packet loss; and path-based approaches require per-path state, per-source key management, and long-term path stability.
The authors propose neighborhood-based monitoring, in which each router monitors only its 1-hop neighbors and doesn't worry about end-to-end delivery. This scheme (they say) requires only a constant amount of state on each router and has no problem with frequent route changes. They assume reliable neighbor discovery and at least one benign node in each neighborhood. The idea is that each node records a traffic summary of packets received from and sent to each of its neighbors. Periodically, all the routers report to a trusted controller, which decides whether packet loss at any given node is excessive.
For storage efficiency, they want to store some sort of fingerprint rather than full packets. These need to be immune to forgery, and yet allow the controller to do all the processing it needs to do, in particular arithmetic on packet counts. Their proposed technique is to divide traffic into loosely synchronized epochs; within each epoch, each node stores cryptographic hashes of every packet; the controller broadcasts an encryption key after the epoch ends, which all nodes use to encrypt their hashes. (At this point it is too late to forge anything, so it's OK that everyone knows the key. Or so they said; I'm a little confused about how this part works.) For traffic efficiency, the controller also sends out an RNG seed which selects only a fraction of nodes to send in a report.
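Here's roughly how I picture the per-epoch summaries, as a minimal sketch (the names and the exact crypto choices here are my guesses, not the paper's construction):

```python
import hashlib
import hmac

class NeighborSummary:
    """One router's per-epoch record of traffic with one neighbor."""
    def __init__(self):
        self.sent, self.received = [], []

    def record(self, direction, packet: bytes):
        digest = hashlib.sha256(packet).digest()
        (self.sent if direction == "sent" else self.received).append(digest)

    def seal(self, epoch_key: bytes) -> dict:
        # The controller reveals epoch_key only after the epoch ends, so
        # (per the talk) it's too late to fabricate hashes for packets
        # the router never actually saw.
        def mac(h):
            return hmac.new(epoch_key, h, hashlib.sha256).digest()
        return {"sent": [mac(h) for h in self.sent],
                "received": [mac(h) for h in self.received]}

def excessive_loss(a_sent, b_received, threshold=0.02):
    """Controller-side check: compare A's 'sent to B' summary against
    B's 'received from A'; too many missing packets flags the A-B
    neighborhood as suspect. The threshold is made up."""
    missing = len(set(a_sent) - set(b_received))
    return missing / max(len(a_sent), 1) > threshold
```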
They claim security even against colluding nodes, and to be able to detect a faulty neighborhood (not necessarily the exact troublesome node) with 99% accuracy.
I like this general idea—and it dovetails neatly with a project I was working on last term—but I’d like to see them try to remove the trusted controller, making the scheme fully distributed and acephalous.
Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail
This is another salvo in a long research war between techniques to hide what you are doing on the interwebs, and techniques to unmask what you are doing on the interwebs. Specifically, if you use an encrypted proxy server or mix network that's supposed to conceal precisely what you are doing from an attacker who can observe your traffic, then the attacker can try to use traffic analysis to figure it out anyway. Generally, both sides assume that the attacker can see low-level packet information despite the encryption (packet lengths, timings, immediate destinations, overall transfer sizes, etc.), knows which hosts the client might contact, and has an opportunity to train machine classifiers on the low-level features of interest. Packet lengths, in particular, have repeatedly been shown to reveal all kinds of things, ranging from website identities to search queries to voice-over-IP conversations (if the language is known). The obvious countermeasure is to pad the packets, but it's not clear whether this helps.
The authors of this paper want to make a higher-level point: even if you make all the fine-grained features useless, an attacker can still figure out something (what websites you’re visiting) from coarse features like overall communication time and overall bandwidth, that are prohibitively expensive to conceal. They built deliberately simple (naive Bayes) classifiers looking only at one or two coarse features, and were able to identify websites from a set of 512 with 70 to 80% accuracy.
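As a sketch of how little the attacker needs, here's a naive Bayes classifier over just two coarse features. The data below is synthetic (the paper trained on real traces of 512 sites), so take it as shape, not evidence:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n_sites, visits_per_site = 512, 20

# Pretend each site has a characteristic (total time, total bytes)
# signature with some per-visit noise -- the assumption the attack
# exploits.
site_means = rng.uniform([0.5, 50e3], [10.0, 5e6], size=(n_sites, 2))
X = np.repeat(site_means, visits_per_site, axis=0)
X *= rng.normal(1.0, 0.15, size=X.shape)          # per-visit jitter
y = np.repeat(np.arange(n_sites), visits_per_site)

clf = GaussianNB().fit(X, y)

# Classify a fresh "observed" visit to site 42.
obs = site_means[42] * rng.normal(1.0, 0.15, size=2)
print("guessed site:", clf.predict([obs])[0])
```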
This is a scary result, but there might still be room for countermeasures, because this is a lab result. It's not clear whether real-world artifacts interfere; it's possible that one might be able to apply application-level countermeasures that change the coarse features sufficiently to break the classifier; and (I think this last is most important) nobody in academia has tried these attacks at Internet scale. I want to know what happens if there are 512 sites of interest that the attacker cares about, but a couple orders of magnitude more uninteresting sites that might also be visited.
Off-Path TCP Sequence Number Inference Attack—How Firewall Middleboxes Reduce Security
An off-path TCP sequence number attack is when you try to inject packets into an existing TCP connection by guessing the sequence numbers, and you’re not a man-in-the-middle so you can’t predict them. TCP starts off with random sequence numbers nowadays to make this hard, but they claim it’s possible to infer a much smaller range if you have malware on the destination host and/or there are middleboxes trying to be clever.
They gave only one detailed example. A TCP sequence number checking firewall discards packets that don't fall within a sliding window around its best guess of the current sequence number for each connection. The goal here is to get rid of blindly injected packets. Ironically, this enables the attack, because if the attacker can observe the network on the protected side of the firewall, it can see whether its packets got through, which allows a binary search for the right sequence number range. As a special case, malware on the destination host can detect whether the packets got through by looking at netstat error counters.
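Here's the binary-search idea as I understood it, as a sketch; the window size and probing details are simplified, and the oracle abstracts "did my packet get past the firewall?" (observed by sniffing the protected side or by malware reading error counters):

```python
WINDOW = 2**20            # firewall's acceptance window (illustrative)
SEQ_SPACE = 2**32

def find_window_start(oracle):
    # Phase 1: probing in window-sized steps must land inside the
    # window at least once.
    hit = next(s for s in range(0, SEQ_SPACE, WINDOW) if oracle(s))
    # Phase 2: binary-search the left edge between a known-outside
    # point and the known-inside hit.
    lo, hi = hit - WINDOW, hit          # oracle(lo) False, oracle(hi) True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        lo, hi = (lo, mid) if oracle(mid) else (mid, hi)
    return hi                           # window is [hi, hi + WINDOW)

# Demo against a fake firewall with a secret window position.
secret = 123_456_789
print(find_window_start(lambda s: secret <= s < secret + WINDOW))  # 123456789
```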
(If you have malware on the destination host, isn’t it game over already? Maybe the malware doesn’t have enough privilege to do what you want, yet. Recall the talk from yesterday about how processes with no special privileges at all can still read diagnostic info from the kernel. Also, they claimed that if the connection is sufficiently long-lived you don’t need the malware, but they didn’t explain any further.)
This notion generalizes to other sensitive, redundant state in middleboxes. I’d editorialize that they got it right ’way back when they designed the Internet with all the smarts at the edges of the network, and also I’d point out that this is a special case of the classic error-oracle attack. If something can fail two different ways depending on secret information, often an attacker can use that to discover the secret.
Attacks, 2
Signing Me onto Your Accounts through Facebook and Google: a Traffic-Guided Security Study of Commercially Deployed Single-Sign-On Web Services
This is a broad study of single-sign-on protocols actually deployed in the wild, such as Google's use of OpenID to federate your Google identity to third parties, and Facebook's Connect protocol to the same end. The attack goal is to impersonate a user on a site that makes use of the single sign-on. It was done without insider information, i.e. all the authors know is the messages sent to and from the browser, not server-to-server messages or server-internal logic. This made their job harder, but it also means that any vulnerability they found is a real live exploitable thing rather than a theoretical problem. Conveniently, many of these protocols use browser-relayed messages; that is, the identity provider and the relying site communicate through the browser, by means of HTTP redirects.
They gave two examples of genuine exploits. With Google/OpenID, an attacker can insert the impersonation victim’s email address into a valid, signed authorization message. The signature doesn’t cover the substituted email address, but many relying websites don’t care. With Facebook, the attacker can rewrite an authorization to use app A as authorization to use app B. (Both of these have since been fixed, they say.)
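To make the Google/OpenID flaw concrete, here's a toy reconstruction; the field names and the shared-secret HMAC are illustrative stand-ins, not the actual OpenID wire format:

```python
import hashlib
import hmac

IDP_KEY = b"shared-secret"   # stand-in for the IdP/RP association key

def idp_sign(fields: dict, signed_keys: list) -> dict:
    """Identity provider signs only the fields listed in signed_keys."""
    payload = "&".join(f"{k}={fields[k]}" for k in sorted(signed_keys))
    fields["signed"] = ",".join(sorted(signed_keys))
    fields["sig"] = hmac.new(IDP_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return fields

def careless_rp(msg: dict) -> str:
    """Relying party verifies the signature correctly..."""
    signed_keys = msg["signed"].split(",")
    payload = "&".join(f"{k}={msg[k]}" for k in sorted(signed_keys))
    sig = hmac.new(IDP_KEY, payload.encode(), hashlib.sha256).hexdigest()
    assert hmac.compare_digest(sig, msg["sig"]), "bad signature"
    # BUG: ...but trusts the email even though it isn't a signed field.
    return msg["email"]

msg = idp_sign({"identity": "alice-id", "email": "alice@example.com"},
               signed_keys=["identity"])
msg["email"] = "victim@example.com"   # attacker swaps the unsigned field
print(careless_rp(msg))               # logs the attacker in as the victim
```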
Then they drew some conclusions. These flaws are in concrete deployment details, not the protocol specs; discovering them requires careful examination of actual systems. Real-world integration is via APIs, SDKs and sample code, not protocol specs, which is both good and bad. And underlying runtime systems matter; the Facebook exploit depends on details of Flash’s cross-domain access control rules.
Unleashing Mayhem on Binary Code
This is a scheme for automatically detecting security-exploitable bugs in compiled code (e.g. buffer overflows) and then automatically generating exploits. They use symbolic execution (a standard technique for program analysis) to discover cases where user input gets written to the program counter, and then to attempt to place attack code and transfer control to it.
As always with symbolic execution, they had to come up with clever tricks to prevent combinatorial explosion of stored state; they spent most of the talk on this, but I’m going to gloss over it, because it’s nothing new. The interesting bit is: the tool picks up all known bugs in their (fairly small) test set, and finds two more. They don’t seem to have tried it on anything really big, alas.
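The core step, as I understand it, is asking a constraint solver whether any input can reach the program counter. Here's a toy with z3 (the z3-solver Python package) against a hand-modeled unchecked copy into an 8-byte stack buffer; this is my own simplification, not Mayhem's actual engine:

```python
from z3 import BitVec, Concat, Solver, sat

SHELLCODE_ADDR = 0xDEADBEEF
BUF_LEN = 8

# 12 symbolic input bytes: enough to fill the buffer and spill 4 bytes
# over the saved return address that sits right after it.
inp = [BitVec(f"in_{i}", 8) for i in range(12)]

# After the unchecked copy, the saved return address is whatever the
# 4 bytes past the buffer became (little-endian).
ret = Concat(inp[BUF_LEN + 3], inp[BUF_LEN + 2],
             inp[BUF_LEN + 1], inp[BUF_LEN])

s = Solver()
s.add(ret == SHELLCODE_ADDR)   # "does user input reach the PC?"
if s.check() == sat:
    m = s.model()
    payload = bytes(m.eval(b, model_completion=True).as_long() for b in inp)
    print("exploit input:", payload.hex())
```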
Clash Attacks on the Verifiability of E-Voting Systems
E-voting systems often issue cryptographic receipts to voters, which they can use later to confirm that their vote was indeed counted; in the simplest case, all the receipts are posted publicly after the election. The receipts do not reveal individual votes, but if you have the complete list you can confirm that the overall tally is correct. Suppose the vote-recording system is malicious, and gives two or more voters back the same receipt for their ballot. Then all but one of those votes can be replaced by forged votes.
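A toy of the clash, to show why every individual receipt check still passes (purely illustrative; the paper's protocols are far more involved):

```python
import secrets

def fresh_receipt():
    return secrets.token_hex(8)

# Voters 0 and 1 both voted "alice", so a malicious recorder can safely
# hand them the SAME receipt -- the attack needs the clashed voters to
# have cast identical ballots.
true_votes = {0: "alice", 1: "alice", 2: "bob"}
clash = fresh_receipt()
receipts = {0: clash, 1: clash, 2: fresh_receipt()}

# Published board: one entry covers both clashed voters, and the
# freed-up slot holds a forged ballot.
board = {clash: "alice",
         fresh_receipt(): "mallory",   # the forged vote
         receipts[2]: "bob"}

# Every voter's check passes: their receipt is on the board, next to
# the vote they actually cast...
assert all(board[receipts[v]] == true_votes[v] for v in true_votes)
# ...yet the tally has shifted from (alice: 2, bob: 1) to:
print(sorted(board.values()))          # ['alice', 'bob', 'mallory']
```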
This was almost entirely a theoretical presentation; they showed that a malicious implementation of a particular e-voting protocol can indeed carry out this attack, but the details are not interesting. The interesting thing here, in my opinion, is the observation that in electronic voting, the protocol has to be robust against a malicious vote-recording system as well as against malicious voters. Which is nothing new out here in the real world; it seems that e-voting proposals are often a bit divorced from what the real threats to elections are.
Informal Demo
During the break, a team from Google demonstrated an experimental single-sign-on system they're calling Belay. I'm going to call this "inverted single sign-on", because what it really does is let each site issue its own credential, which you then collect and manage in one web station. You don't need a password for anything except maybe the web station (which doesn't even have to be on the web: it could be in your browser, or on your smartphone, or something like that), and you manage which sites get to know what personally-identifiable information after you get the credentials from the websites. They also understood that people want to maintain multiple overlapping identities and reveal different facets of each in different contexts, which it's nice to see someone get thoroughly right. I want more technical detail, but it seems like a nifty idea.
Web Security
Third-Party Web Tracking: Policy and Technology
This is a report by the donottrack.us group on the state of existing third-party tracking schemes, using the auditing framework they’ve constructed. Key points:
- Existing voluntary opt-outs are (by corporate admission) not really opting you out of tracking, just out of behavioral advertising, which (in the speaker's words) is totally backward and deceptive.
- Tracking systems really do use all those dirty tricks you've heard about for not letting you erase their tracking info (supercookies and the like).
- Anti-tracking technology tends to prevent the browser from talking to these third parties at all; it all uses manually curated blacklists, which are fragile, not comprehensive, and time-intensive to maintain, and the tracking agencies deliberately try to evade them. The speaker called this a cat-and-mouse arms race.
- Worse, users have to pick a blocklist, and it's not fair to expect them to do this in an informed manner. Some commercially-sponsored lists are not only ineffective but negate the effects of other lists!
EvilSeed: A Guided Approach to Finding Malicious Web Pages
The authors wanted to develop a more efficient way to identify malicious URLs than blind crawling. Their idea was to derive new candidate URLs from information in known malicious URLs and/or pages. Malicious pages often contain links to other malicious pages, or keywords that are likely to lead to more malicious pages (they tried both keyword and n-gram extraction). Malicious domains are often registered in bulk, so if a domain is malicious, other domains registered at about the same time are probably malicious too. Putting it all together, they get a stream of candidate URLs of which 2% are malicious. This is a tenfold improvement over blind crawling, but it wasn’t clear how much it helped overall. The ultimate goal was to cut down on the load on Bing’s malware crawler, but they didn’t give any numbers on that.
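Here's how I picture the guided seeding, as a sketch; the search and whois_batch callables stand in for external services, and the heuristics are my paraphrase of the talk, not EvilSeed's actual gadgets:

```python
import re
from collections import Counter
from urllib.parse import urljoin

def candidate_urls(url, html, search, whois_batch):
    out = set()
    # 1. Outlinks: bad pages often link to other bad pages.
    for href in re.findall(r'href="([^"]+)"', html):
        out.add(urljoin(url, href))
    # 2. Keywords: query a search engine with terms common on this page.
    words = re.findall(r"[a-z]{4,}", html.lower())
    for term, _ in Counter(words).most_common(5):
        out.update(search(term))
    # 3. Bulk registration: domains registered alongside this one are
    #    suspect too.
    domain = re.sub(r"^https?://([^/]+).*", r"\1", url)
    out.update(f"http://{d}/" for d in whois_batch(domain))
    return out   # feed these to the real malware scanner
```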
Rozzle: De-Cloaking Internet Malware
This talk tackled another aspect of the crawling-for-malware problem: Malicious sites often attempt to conceal themselves from search engines. That is, if you claim to be a search engine (by IP, or by user-agent, or by browser fingerprint) you may get an innocuous-looking page instead of the malware. There are obvious workarounds for this. It gets worse: nowadays malware-distribution sites try to detect known-vulnerable user configurations, and only bother actually sending down an exploit if it’ll work. This involves fingerprinting the browser and its plugins in detail, which is typically done with JavaScript.
The difficulty, of course, is that if you want to find all the
malware that a site can produce, you have to poke it with lots of
different system configurations. Or you can take the Rozzle approach,
which is to do a little bit of symbolic execution and investigate all
possible control flow paths involving environment-sensitive branches.
Since these are rare, there isn’t much of a combinatorial explosion
problem, and the effect is to appear as vulnerable as possible.
They claim less than 2% overhead on a sample of both malicious and
innocuous URLs, again from Bing’s malware scanner.
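Rozzle itself does low-overhead multi-execution with symbolic environment values; as a much cruder stand-in, here's a sketch that just re-runs a modeled cloaking script under every combination of a few environment-sensitive values and unions whatever payloads fall out (the toy script and profiles are mine):

```python
from itertools import product

def cloaked_script(env):
    """Toy malware loader: only serves the exploit to one config."""
    if env["plugin"] == "oldflash" and env["ua"] == "msie6":
        return {"http://evil.example/exploit.swf"}
    return set()

PROFILES = {"plugin": ["oldflash", "newflash", "none"],
            "ua": ["msie6", "firefox", "googlebot"]}

found = set()
for combo in product(*PROFILES.values()):
    found |= cloaked_script(dict(zip(PROFILES.keys(), combo)))
print(found)   # {'http://evil.example/exploit.swf'}
```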
(It seems to me that an easier approach would be simply to treat this fingerprinting code as itself an indication of malware. They said themselves that only 1.2% of all JS visible to Bing's crawler contains branches based on environment-sensitive values, but 89.5% of malicious JS does.)
Short Talks
Described briefly.
- SecuritySpeak: Oakland Papers Crammed Into 32 Pictures
Word clouds of all the Oakland papers ever, as a function of time. Constants: "system", "security", "information", "protocol", "message". "Alice" comes and goes. "Attack" shows up in 2004 or so, "privacy" in 2005. Went by a little too fast for me to try to pick out older trends.
- Metric-driven Fuzzing Strategy for Network Protocols
Really about meta-strategies for developing network protocol fuzzers. Does not offer any actual metrics, only ways to come up with protocol-specific metrics based on black-box analysis of protocol or implementations.
- Differential Privacy by Typing for Security Protocols
Theory people like differential privacy because it’s a provable property, but the proofs are machine-generated, lengthy and difficult to comprehend. It gets worse in a distributed environment. Speaker claims to have a scheme that makes it better, and also a way to apply it to protocols as well as databases.
- Privacy, Linearity, and Accountability in Proof-Carrying Authorization Systems
Theory people also like proof-carrying authorization schemes, but they have serious practical limitations, e.g. exposing sensitive data to the world. Speaker proposes to use zero-knowledge proofs instead of direct proofs, then patch up some holes with pseudonyms. Blithe, unsubstantiated claim that pseudonyms cannot be correlated across services.
- Fighting Parasite Hosting: Identifying and Mitigating Unauthorized Ads on Your Webserver
"Parasite hosting" is spamvertised content on hacked sites which is visible only to search engines and to visitors following links from search engines or spam; this makes it hard for the legitimate administrator to find and remove. Speaker developed a crawler that can find this stuff on a suspected-compromised site by pretending to be a search engine, in various ways. Also some preliminary findings about what kinds of sites tend to get hit with this stuff.
- Learning from Our Mistakes: Us LUSSRs Are In Such a MESS
Announcement of the LASER workshop, encouraging people to share both successes and failures for methodological dissection. Unfortunately by invitation only and limited to 50 attendees.
- Provably Secure and Practical Onion Routing
Black-box formal model of Tor-style onion routing, with a bunch of analytic deductions claimed but not actually described (it would have been hard to do it in five minutes); interested parties referred to the full paper.
- Long Tail Cyber Education
What do we want people to learn about cyber-security (blech), and who should learn it? Proposes a long-tail model for specialized knowledge in this field. Wants people to study the transition region from "everyone should know this" to "only a few specialists need to know this". Points at the Weather Channel as a good example of a seamless presentation of both general- and special-interest info.
- Why Johnny’s not Robust
Claim that everything is too complicated; abstraction boundaries are in the wrong place; system services are inadequate; and lots of wheels are getting badly reinvented. Want to engage in radical simplification and provision of one-size-fits-all APIs that are impossible to use incorrectly. Complete failure to understand the actual problem here, IMNSHO, and in particular the true nature of complexity. (These people also had a poster yesterday. I talked to them. I was not impressed.)
- TARDIS: Secure Time Keeping For Embedded Devices Without Clocks
Defense against brute-force attacks on unpowered smartcard-type devices, which have no clock and therefore no notion of query rate. Use SRAM decay as a way to measure time: for most use cases, the device can simply refuse queries if the memory contents have not fully decayed. An attacker can make the time shorter by heating the card, but only within its design operating range, and it still slows down brute-forcing by orders of magnitude. No special hardware required. Cute backronym: Time And Remanence Decay In SRAM. (A sketch of the decay check follows the list.)
- Protecting Query Privacy in Cloud Database Hosting
It’s known how to prevent a cloud-database provider from extracting sensitive information directly from the tables, but what about inference from queries? No answers offered; speaker tried a few obvious things but found them lacking.
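Circling back to TARDIS, here's a toy of the decay check as I read it; the threshold and the decay model are made up for illustration:

```python
# At power-up, firmware inspects a region of SRAM it previously wrote
# to all-ones; bits decay toward a roughly random state while the card
# is unpowered, so the fraction still set bounds how recently the card
# was last used.
def fraction_intact(sram_region: bytes) -> float:
    set_bits = sum(bin(b).count("1") for b in sram_region)
    return set_bits / (8 * len(sram_region))

def allow_query(sram_region: bytes, decayed_threshold=0.55) -> bool:
    # Fully decayed SRAM settles near 50% ones. If clearly more than
    # that are still set, not enough wall-clock time has passed since
    # the last query: refuse, throttling brute-force attempts.
    return fraction_intact(sram_region) <= decayed_threshold

# After deciding, firmware re-arms the timer by writing all-ones back.
print(allow_query(b"\xff" * 64))   # just used -> False, query refused
print(allow_query(b"\x5a" * 64))   # ~50% ones -> True, decayed enough
```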