What goes on Planet Mozilla: a survey

Attention conservation notice: probably not of interest to anyone who doesn’t read this blog via Planet Mozilla.

I currently syndicate everything on this blog to Planet Mozilla. Given the ongoing discussion of what does and does not belong there, I would like to poll the audience: How do you feel about any of the following topics appearing on Planet?

  • Details of my recent trip to $COUNTRY
  • Offers to give away unwanted items prior to moving
  • Musings about pottery
  • Musings about video game design
  • Small programs that were a pain to write and might be useful to someone else maybe someday
  • Various Internet-security-related topics which may or may not have anything to do with The Web
  • Incredibly hypothetical ideas, zany schemes, and related philosophizing
  • Detailed reports on my academic research
  • Explanations for a lay audience of how to use the Internet safely
  • Summaries of the research presented at $CONFERENCE

This is a sincere question, which I am asking in order to decide whether I, personally, should start filtering what gets syndicated to Planet from here on.

Notes from Poland

My grandfather David and his brothers grew up in the small city of Ostrowiec. They emigrated to the USA in 1938, and as far as I know, none of the family have set foot in Poland since. Until now: this academic year, my sister Dara is living in Warsaw on a Fulbright scholarship to study Polish theater and its relationship to the Greek chorus. Pam and I went to visit her over the winter holidays, continuing a family tradition of her studying abroad and me visiting. Here are highlights and selected photos. Full photo album, as usual, on Flickr.

Continued…

The ethics of preventing third-party net filtering

I haven’t posted anything research-related in a while because I’ve been on a project that I’m not supposed to talk about till it’s done, and it’s not done yet. I can say, though, that it’s about ways to get around country-scale filtration of the Internet. I’m writing it up now, starting with the threat model, as you do:

Alice Arishat wishes to publish things for Brutus to read. Cato does not approve of what Arishat has to say, and seeks to prevent her from publishing anything.

Most online discussion of censorship starts from the premise that Cato is automatically in the wrong here. That’s one of the cypherpunk premises that underpin most discussion of theoretical Internet security. I want to play devil’s advocate today, though, and explore circumstances where we might choose to support Cato. In the offline world, we trade off free speech against all sorts of other values every day:

Continued…

unearthed arcana (music division)

Some time ago (I don’t remember how long, precisely) I started working on a mixtape. I got as far as writing down a bunch of songs in categories, and then I lost interest, and the list has been cluttering up my desk ever since. The category tags no longer make a great deal of sense and I’m not even sure who sings some of these songs anymore, but if I put the list into the computer, I can throw away the paper, and maybe the magic of the internets will do something with it.

Continued…

test your file locking

This PUBLIC SERVICE ANNOUNCEMENT is brought to you by the I JUST WASTED AN HOUR ON THAT Foundation:

Do you suffer from mysteriously hanging autotools processes? Or perhaps other mysteriously hanging processes? If so, you may have a problem with your file locking, and the IJWAHOT Foundation recommends you compile and run this program on the computer with the problem, preferably under strace or equivalent. If it, too, hangs, then you do indeed have a problem with your file locking. The Foundation does not presently know the cause of this problem, but we suspect that it is NFS’s fault somehow. If you do know the cause of this problem, we would love to hear about it in the comments.
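The program mentioned above isn’t reproduced here. As a rough stand-in, here is a minimal Python sketch, assuming a POSIX system and Python 3.7 or later: in a child process, it creates a scratch file in the directory you name, takes and releases an exclusive fcntl lock on it, and reports a problem if that doesn’t finish within a timeout. The function name locking_works and the 10-second default are illustrative choices, not part of any existing tool.

```python
import subprocess
import sys
import tempfile

# Program run in a child interpreter: create a scratch file in the directory
# named by argv[1], take an exclusive POSIX (fcntl) lock on it, then release it.
# If file locking is broken, the lockf() call may never return.
CHILD_SOURCE = """
import fcntl, sys, tempfile
with tempfile.NamedTemporaryFile(dir=sys.argv[1]) as f:
    fcntl.lockf(f, fcntl.LOCK_EX)
    fcntl.lockf(f, fcntl.LOCK_UN)
"""


def locking_works(directory: str, timeout: float = 10.0) -> bool:
    """Return True if an fcntl lock in `directory` is taken and released in time."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD_SOURCE, directory],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child hung; subprocess.run has already sent it SIGKILL.
        return False
    # A nonzero exit status means file creation or a lock call raised an error.
    return result.returncode == 0


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    if locking_works(target):
        print(f"fcntl locking in {target}: OK")
    else:
        print(f"fcntl locking in {target}: HUNG or FAILED")
```

Run it with the suspect directory (say, somewhere on the NFS mount) as its only argument; to see which call hangs, run it under strace -f. It only exercises fcntl-style locks, so a pass doesn’t rule out trouble with other locking calls such as flock.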

Breaking things every six weeks

Attention conservation notice: 900 words of inside baseball about Mozilla. No security content whatsoever.

The Mozilla Project has been taking a whole lot of flak recently over its new rapid release cycle, in which there is a new major version of Firefox (and Thunderbird) every six weeks, and it potentially breaks all your extensions. Especially the big complicated extensions like Firebug that people cannot live without. One might reasonably ask, what the hell? Why would any software development team in their right mind—especially a team developing a critical piece of system infrastructure, which is what Web browsers are these days, like it or not—inflict unpredictable breakage on all their users at six-week intervals?

Continued…

Icons of the Future City

Way back at the 2010 Mozilla Summit, one of the keynote speakers showed us an amazing demo flythrough of a 3D-rendered futuristic city, with embedded video, tweets, and the like, all running live inside a Firefox 4 beta thanks to awesome new tech like WebGL and JägerMonkey. (Note: in the linked video, the city only appears about a minute in.) That’s not what I want to talk about, though.

It occurred to me, while I was watching, that there is a standard futuristic city used in demos like this one. It’s night. You can’t see the ground. Skyscrapers stretch all the way to the horizon. Said skyscrapers are glass oblongs, for the most part; this demo mixed it up quite a bit with interesting cross-sections, but still had hardly any ornamentation, terracing, or what-have-you. All the skyscrapers’ windows are lit up. There may be flying vehicles between or around the towers, but there is no sign of any other type of transportation. It is, in short, the future of the Futurists of the nineteen-teens, the city of Metropolis, Blade Runner, and Neuromancer.

Now the thing is, no city in the real world has ever looked like that. Even in the densest and most skyscraper-ful urban areas (have a look at these aerial videos of Manhattan and Hong Kong, for instance), there are buildings that are fewer than ten stories tall (these are in fact the majority in Manhattan, although possibly not in Hong Kong); there are parks and other open spaces; and by no means are all of the buildings boring oblongs. Furthermore, people doing actual urban design argue, vehemently, over whether or not dense skyscraper-ful cities are best (e.g.: pro, con), and I think nobody would argue anymore that open space is unnecessary.

And yet, when we want an icon of the city of the Future, the Futurists’ vision is what we turn to. Why? Perhaps because it’s instantly recognizable, or because it’s easy to build 3D models for. But I claim this is causing this discredited vision to occupy a share of the casual imagination that it does not deserve anymore. It crowds out other visions with its readiness to hand. Let’s invent some new icons for the future city. Let’s make the next demo flythrough be of something like this or this or this. (But watch out for the just-as-discredited Radiant City vision, please.)

A Zany Scheme for Compact Secure Hashes

Lots of current and near-future tech relies heavily on secure hashes as identifiers; these are usually represented as hexadecimal strings. For instance, in a previous post I threw out the strawman h: URN scheme that looks like this:

<!-- jQuery 1.5.2 -->
<script src="h:sha1,b8dcaa1c866905c0bdb0b70c8e564ff1c3fe27ad"></script>

Now the problem with this is, these hexadecimal strings are inconveniently long and are only going to get longer. SHA-1 (as shown above) produces 160-bit hashes, which take 40 characters to represent in hex. That algorithm is looking kinda creaky these days; the most convenient replacement is SHA-256. As the name implies, it produces 256-bit hashes, which take 64 characters to write out in hex. The next generation of secure hash algorithms, currently under development at NIST, are also going to produce 256-bit (and up) hashes. The inconvenience of these lengthy hashes becomes even worse if we want to use them as components of a URI with structure to it (as opposed to being the entirety of a URN, as above). Clearly some encoding other than hex, with its 2x expansion, is desirable.
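The length arithmetic is easy to check with Python’s standard hashlib module. A small sketch; the helper name hex_length is made up for illustration:

```python
import hashlib


def hex_length(algorithm: str) -> tuple[int, int]:
    """Return (digest size in bits, number of hex characters needed to write it)."""
    h = hashlib.new(algorithm, b"any input; digest length doesn't depend on it")
    return h.digest_size * 8, len(h.hexdigest())


for name in ("sha1", "sha256"):
    bits, chars = hex_length(name)
    # Hex spends one character per 4 bits, i.e. two characters per byte.
    print(f"{name}: {bits} bits -> {chars} hex characters")
# sha1: 160 bits -> 40 hex characters
# sha256: 256 bits -> 64 hex characters
```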

Hashes are incompressible, so at one byte per character, we can’t hope to pack a 256-bit hash into fewer than 32 characters, or a 160-bit hash into fewer than 20 characters. And we can’t just dump the raw binary string into our HTML, because HTML is not designed for that: there is no way to tell the HTML parser that the next 20 characters are a binary literal. However, what we can do is find 256 printable, letter-like characters among the first 2,048 Unicode code points (everything below U+0800) and use them as an encoding of the 256 possible bytes. Continuing with the jQuery example, that might look something like this:

<script src="h:sha1,пՎЦbηúFԱщблMπĒÇճԴցmЩ"></script> <!-- jQuery 1.5.2 -->

See how we can fit the annotation on the same line now? Even with sha256, it’s still a little shorter than the original in hex:

<!-- jQuery 1.5.2 -->
<script src="h:sha256,ρKZհνàêþГJEχdKmՌYψիցyԷթνлшъÁÐFДÂ"></script>

Here’s my proposed encoding table:

    0              0 1              1
    0123456789ABCDEF 0123456789ABCDEF
 00 ABCDEFGHIJKLMNOP QRSTUVWXYZÞabcde
 20 fghijklmnopqrstu vwxyzþ0123456789
 40 ÀÈÌÒÙÁÉÍÓÚÂÊÎÔÛÇ ÄËÏÖÜĀĒĪŌŪĂĔĬŎŬÐ
 60 àèìòùáéíóúâêîôûç äëïöüāēīōūăĕĭŏŭð
 80 αβγδεζηθικλμνξπρ ςστυφχψωϐϑϒϕϖϞϰϱ
 A0 БГДЖЗИЙЛПФЦЧШЩЪЬ бгджзийлпфцчшщъь
 C0 ԱԲԳԴԵԶԷԸԹԺԻԽԾԿՀՁ ՂՃՄՅՆՇՈՉՊՋՌՍՎՐՑՒ
 E0 աբգդեզէըթժիխծկհձ ղճմյնշոչպջռսվրցւ

All of the characters in this table have one- or two-byte encodings in UTF-8. Every punctuation character below U+007F is given special meaning in some context or other, so I didn’t use any of them. This unfortunately does mean that only 62 of the 256 bytes get one-byte encodings, but storage compactness is not the point here, and it’s no worse than hex, anyway. What this gets us is display compactness: a 256-bit hash will occupy exactly 32 columns in your text editor, leaving room for at least a few other things on the same line.

Choosing the characters is a little tricky. A whole lot of the code space below U+0800 (the range UTF-8 encodes in at most two bytes) is taken up by characters we can’t use for this purpose: combining diacritics, control characters, punctuation, and right-to-left scripts. I didn’t want to use diacritics (even in precomposed form) or pairs of characters that might be visually identical to each other in some (combination of) fonts. Unfortunately, even with the rich well of Cyrillic and Armenian to work with, I wasn’t able to avoid using a bunch of Latin-alphabet diacritics. Someone a little more familiar with the repertoire might be able to do better.
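Here is a minimal Python sketch of the table as an encoder and decoder, reading it as row label plus column label (the right-hand half of each row covers the next sixteen byte values); that reading reproduces the SHA-1 example above exactly. The names ALPHABET, encode and decode are hypothetical, not part of the h: scheme as proposed. The assertions check the properties claimed above: 256 distinct characters, none longer than two bytes in UTF-8, exactly 62 of them one byte long, and all of them letters or digits rather than punctuation or combining marks.

```python
import unicodedata

# The proposed table, one string per 0x20 byte values (left half, then right
# half), copied from above. Index i in ALPHABET encodes byte value i.
ROWS = [
    "ABCDEFGHIJKLMNOPQRSTUVWXYZÞabcde",  # 0x00-0x1F
    "fghijklmnopqrstuvwxyzþ0123456789",  # 0x20-0x3F
    "ÀÈÌÒÙÁÉÍÓÚÂÊÎÔÛÇÄËÏÖÜĀĒĪŌŪĂĔĬŎŬÐ",  # 0x40-0x5F
    "àèìòùáéíóúâêîôûçäëïöüāēīōūăĕĭŏŭð",  # 0x60-0x7F
    "αβγδεζηθικλμνξπρςστυφχψωϐϑϒϕϖϞϰϱ",  # 0x80-0x9F
    "БГДЖЗИЙЛПФЦЧШЩЪЬбгджзийлпфцчшщъь",  # 0xA0-0xBF
    "ԱԲԳԴԵԶԷԸԹԺԻԽԾԿՀՁՂՃՄՅՆՇՈՉՊՋՌՍՎՐՑՒ",  # 0xC0-0xDF
    "աբգդեզէըթժիխծկհձղճմյնշոչպջռսվրցւ",  # 0xE0-0xFF
]
ALPHABET = "".join(ROWS)
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(digest: bytes) -> str:
    """Map each byte of a digest to one character of the table."""
    return "".join(ALPHABET[b] for b in digest)


def decode(text: str) -> bytes:
    """Inverse of encode(); raises KeyError on characters not in the table."""
    return bytes(DECODE[ch] for ch in text)


# Sanity checks on the table itself.
assert len(ALPHABET) == 256 and len(DECODE) == 256  # 256 distinct characters
assert all(len(ch.encode("utf-8")) <= 2 for ch in ALPHABET)  # 1 or 2 UTF-8 bytes
assert sum(len(ch.encode("utf-8")) == 1 for ch in ALPHABET) == 62  # ASCII letters/digits
assert all(unicodedata.category(ch)[0] in "LN" for ch in ALPHABET)  # no punctuation or marks
```

For instance, encode(bytes.fromhex("b8dcaa1c866905c0bdb0b70c8e564ff1c3fe27ad")) gives the same 20 characters as the jQuery example. The assertions can’t check the visual-confusability requirement; that still takes a human looking at fonts.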

How To Choose Passwords

When I talk to people who aren’t security researchers about history sniffing, they want to know whether they should worry about it, and I say no: the only thing you can do to protect yourself is use the latest version of your favorite browser, which you should do anyway; besides, the interactive attacks will probably never appear in the wild. But if I only ever talk about computer security topics that are only relevant to researchers, I’m not helping people as much as I could, and I’m scaring them about things they can’t control. So this post is about something you should worry about, because it’s under your direct control; lots of people do it poorly and that does make them less safe online; and it’s easy to do well. That thing is choosing passwords.

You have probably heard that you shouldn’t reuse the same password on many different websites, and that your passwords should be long, contain numbers and punctuation, and avoid dictionary words. But you probably haven’t heard anyone explain why, and you probably have noticed that these two pieces of advice are hard to follow at the same time, because long gibberish passwords are hard to remember even if you only have one of them. I’m going to tell you why you should do these things, and how to do them without too much grief.

Don’t use the same password on many different websites

No matter how good your password is, the bad guys might discover what it is. For instance, if you log into an unencrypted website over an unencrypted wireless network, anyone else on the same wireless network can listen in on the radio traffic and discover your password. (It’s just like eavesdropping on a private conversation.) Or you might accidentally type your password into a website that looks like the real thing but is actually a fake created to trick you.

Suppose the bad guys have discovered your password for a Web forum. By itself, that’s probably not a big deal: the worst likely outcome is someone impersonating you on that one forum, and you might have to apologize to some people for letting some schmuck insult them while pretending to be you. But the bad guys know that people often use the same password on many different websites, so they’re going to try to log into your email with that password, and your bank, and so on. If they succeed (if you did use the same password), they might be able to ruin your life, or at least steal some of your money. But if you always use different passwords on different websites, the bad guys have to discover the password you use for your bank (and nothing else) in order to steal your money.

How do you manage to remember lots of different passwords, especially when (as I’m about to explain) they all need to be long and complicated? The best way is to let the computer (specifically, your browser’s password manager) do it for you. This may seem unsafe, but it’s actually much safer than using the same password for everything. The password manager cannot be fooled by phishing sites, and it has no trouble remembering lots of long complicated passwords. Yes, all the passwords are in a file on your computer. But the only way the bad guys can get at that is by physically stealing your computer, or installing spyware on it remotely. If you keep your computer up to date with security patches, you don’t have to worry about spyware much. If your computer is in danger of being physically stolen (e.g. it’s a laptop), you should use the master password mode of your browser’s password manager, so that the file on your computer is encrypted. Whether or not you have to worry about theft, you should enable Sync, or an equivalent feature, even if you have no other computer to sync with; that way, if your computer breaks, there’s still a backup of all your passwords out there in the cloud (safely encrypted).

Use long, complicated passwords

The other way the bad guys discover passwords is by breaking into servers that store entire databases of them. If these databases have been designed correctly, that doesn’t tell them anything by itself, because the passwords are hashed. Hashing deserves a little explanation: suppose my password on some site is 12345 (the kind of thing that an idiot would have on his luggage). The server doesn’t store 12345 in its database; it stores 827ccb0eea8a706c4c34a16891f84e7b, which is the result of running 12345 through a cryptographic hash, in this case MD5. It’s easy to convert a password into its hash, but it’s prohibitively hard to do the reverse. MD5 is old and no longer considered a good choice for passwords (or anything, for that matter), but there is still no known practical algorithm to take an arbitrary MD5 hash and reveal an input that produces that hash, other than guess-and-check.

So the bad guys can’t just read the passwords from a database once they have it. But they can guess passwords, run the guesses through MD5 (or whatever was used), and compare the results to the database entries. (They can guess passwords even if they haven’t stolen a database, by feeding the guesses to the site’s login form—but that’s much slower and the site admins are likely to notice.) 12345 isn’t a good password because it’s easy to guess—but so is any five-digit number: a cheap laptop can calculate the MD5 of all 100,000 five-digit (or smaller) numbers in less than a second. There are something like 250,000 words in English—that’s maybe five seconds’ worth of work for the same laptop—so any word in the dictionary is bad, too. You can buy a 40-million-entry word list for $30 that has not only all the words in 20 different languages, but mangled versions of them (e.g. f0od)—that might take an hour or two to process.
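Guess-and-check is short enough to sketch. This is a minimal Python version, assuming an unsalted MD5 database like the one in the 12345 example; md5_hex, guess_and_check and all_small_numbers are illustrative names. The logic is just: hash each guess, compare it with the stolen hash, repeat.

```python
import hashlib
from typing import Iterable, Optional


def md5_hex(password: str) -> str:
    """What a server using plain, unsalted MD5 would store for this password."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def guess_and_check(target_hash: str, guesses: Iterable[str]) -> Optional[str]:
    """Hash each guess in turn; return the first one whose hash matches, else None."""
    for guess in guesses:
        if md5_hex(guess) == target_hash:
            return guess
    return None


def all_small_numbers() -> Iterable[str]:
    """Every number from 0 to 99,999 written in decimal: 100,000 guesses."""
    return (str(n) for n in range(100_000))
```

Trying the numbers in order, guess_and_check(md5_hex("12345"), all_small_numbers()) returns "12345" after 12,346 guesses (0 through 12345). A dictionary attack is the same call with a word list in place of all_small_numbers().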

The longer and more complicated your password is, the harder it is to guess; but that makes it harder to remember as well. Adding punctuation and numbers doesn’t help as much as one would like. There are 95 characters that you can type on a US keyboard, so there are 95⁸, or about 6.6 quadrillion (short scale), possible eight-character passwords, if you use all those characters. That many possibilities are out of the reach of a cheap laptop, but it’s a few weeks’ effort for a small cluster of beefy computers; a determined bad guy could do this for maybe $25,000.

The good news is, you can have passwords that can’t be guessed this way but are still easy to remember. The trick is to use phrases rather than words. One random English word is 250,000 possibilities. Two random English words are 62.5 billion possibilities (250,000 squared). That’s still not enough. But ten random English words is 250,000¹⁰ ≈ 10⁵⁴ possibilities, which is big enough that a modern supercomputer tasked with the problem would still be guessing when the Sun burns out five billion years from now.
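The numbers in the last few paragraphs can be checked with a few lines of Python. This sketch assumes, as the text does, that every character or word is chosen independently and uniformly at random; the rate of 10¹² guesses per second is a made-up illustrative figure, not a benchmark of any real hardware. It also prints the base-2 logarithm of each count, the usual way of expressing password strength in bits.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace(symbols: int, length: int) -> int:
    """Possible passwords made of `length` independent picks from `symbols` choices."""
    return symbols ** length


def years_to_exhaust(space: int, guesses_per_second: float) -> float:
    """Worst-case time, in years, to try every possibility at a fixed guess rate."""
    return space / guesses_per_second / SECONDS_PER_YEAR


CASES = {
    "five-digit (or smaller) number": keyspace(10, 5),
    "one English word": keyspace(250_000, 1),
    "eight printable ASCII characters": keyspace(95, 8),
    "two random words": keyspace(250_000, 2),
    "ten random words": keyspace(250_000, 10),
}

if __name__ == "__main__":
    rate = 1e12  # hypothetical attacker speed, for illustration only
    for label, space in CASES.items():
        print(f"{label:35} {space:.3g} possibilities, "
              f"{math.log2(space):6.1f} bits, "
              f"{years_to_exhaust(space, rate):.3g} years at 1e12 guesses/s")
```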

You can’t take just any phrase, though. The bad guys could easily try every phrase in the Oxford Dictionary of Quotations, because there are only 20,000 of them. I haven’t worked out the math, but I think guessing every sentence in the complete works of Shakespeare is doable. But nobody has a database of every sentence in every work of literature that was written with the Latin alphabet. A phrase taken from somewhere in the middle of an obscure but lengthy book is a good choice. Or you could follow this procedure:

  1. Go to Wikipedia and click on random article. (You can use any site with a random article feature for this step, if you’d rather.)
  2. Copy the URL of the page you get, and paste it into the Eater of Meaning. Leave the drop-down on Eat word endings.
  3. Choose ten consecutive words from the result. They don’t have to all come from the same sentence.
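If you’d rather not depend on Wikipedia and the Eater of Meaning, a swapped-in alternative (essentially the Diceware idea, not the procedure above) is to let a cryptographically secure random number generator pick the words from a word list. This is a sketch: /usr/share/dict/words is an assumed location that exists on many Unix-like systems but not all, and the phrase is only as strong as the arithmetic above says if the list is large and every word is picked independently.

```python
import secrets
from pathlib import Path


def random_phrase(words: list[str], count: int = 10) -> str:
    """Pick `count` words independently and uniformly at random using a CSPRNG."""
    if not words:
        raise ValueError("word list is empty")
    return " ".join(secrets.choice(words) for _ in range(count))


def load_words(path: str = "/usr/share/dict/words") -> list[str]:
    """Read a word list (one word per line), keeping purely alphabetic words
    and dropping duplicates. The default path is an assumption."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return sorted({w.lower() for w in text.split() if w.isalpha()})
```

Usage would be print(random_phrase(load_words())) on a system that has that file; as with the procedure above, the password manager can remember the result for you.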

Don’t worry about finding a sentence that you can remember yourself, because you’re going to have the password manager do it (unless you’re trying to pick the master password).

Some sites have limits on the length of their passwords. This is bad, and you should complain; but until they fix it, just use the first letter of each word in your ten-word phrase, with some numbers and punctuation if they insist on numbers and punctuation. That kind of password is theoretically crackable, as I said earlier, but it’s likely to be better than lots of other passwords in the database. So if the bad guys get the database, they will crack so many other people’s passwords before they get to yours that they won’t feel they have to bother cracking yours. (It’s kind of like the joke about how fast you need to run away from a lion.)

If there’s no limit on the length of the password, but the site still insists on numbers and/or punctuation, put them in between the words; that’s easier to type.
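Both workarounds are mechanical. A minimal sketch with hypothetical function names; which digits and punctuation to use is up to you (or the site’s rules), so they are passed in rather than chosen here.

```python
def initials_password(phrase: str, extra: str = "") -> str:
    """For sites with a length limit: the first letter of each word of the
    phrase, followed by whatever digits/punctuation the site insists on."""
    return "".join(word[0] for word in phrase.split()) + extra


def separated_password(phrase: str, separators: str) -> str:
    """For sites without a length limit that still demand digits/punctuation:
    put them between the words, cycling through `separators` in order."""
    words = phrase.split()
    if not words or not separators:
        raise ValueError("need at least one word and one separator")
    pieces = [words[0]]
    for i, word in enumerate(words[1:]):
        pieces.append(separators[i % len(separators)])
        pieces.append(word)
    return "".join(pieces)
```

For example, separated_password("one two three", "4-") gives "one4two-three".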

PSA: like buttons

Because I hit empty spam just a little too fast, erasing the question about this: There are no Facebook like buttons on this site because I myself barely ever use Facebook and don’t really see the point; the same goes for digg, reddit, etc. If you like something you read here enough to want to promote it, please consider mentioning it somewhere you can put in a few words to explain why people should click through (twitter, Facebook wall, that sort of thing). Or write a full-sized response article and link back.