Attention conservation notice: 900 words of inside baseball about Mozilla. No security content whatsoever.
The Mozilla Project has
been taking a whole lot of flak recently over its new rapid
release cycle
, in which there is a new major version of
Firefox (and Thunderbird) every six weeks, and it potentially breaks all
your extensions. Especially the big complicated extensions like Firebug that people cannot live
without. One might reasonably ask, what the hell? Why would any
software development team in their right mind—especially a team
developing a critical piece of system infrastructure, which is what Web
browsers are these days, like it or not—inflict unpredictable breakage
on all their users at six-week intervals?
The first thing to know is that Firefox’s core developers are really
focused on making the Web better. If we weren’t, we would be hacking on
something other than a Web browser. The old release cycle was way too
slow for us to do that effectively; as Jono Xia describes in his blog
post It’s
Not About the Version Numbers,
anything we did might not get out
to end users for over a year. When David Baron fixed visited-link history
sniffing, he patched Firefox first—but Chrome and Safari shipped the
change before we did.
You should read Jono’s post now. I’ll be here when you get back.
Shipping new versions of the browser every six weeks is clearly a better way to improve the Web rapidly, than shipping a new version only once a year or so. But what’s stopping the Mozilla team from shipping a new batch of under-the-hood improvements to the Web every six weeks without breaking anything? Why do we need to break things?
Well, we tried not breaking things for ten years, give or take, and
it didn’t work. The second thing to know is that the core browser
(Gecko
) suffers from enormous technical debt. Like any large,
15-year-old piece of software, we have code in abundance that was
written under too much time pressure to get it right, was written so
long ago that nobody remembers how it works, isn’t comprehensively
tested, or any combination of the above. We also have major components
that reasonably seemed like good ideas at the time, but have since
proven to be a hindrance (XUL, XBL, XPConnect, etc). We have other major
components that should have been recognized as bad ideas at the time,
but weren’t (XPCOM, NSPR, etc). And we have code for which there is no
excuse at all (Firefox still had code using the infamous Mork
file format until just this summer, and I understand it’s still live
in Thunderbird).
It gets worse: many of the bugs can’t be fixed without
breaking stuff. For example, take bug
234856. That’s a seven-year-old display glitch. What could
possibly be an excuse for not fixing a simple display glitch for seven
years? Well, the root cause of that bug (described more clearly in bug
643041, where the actual fix is posted) is an error in an XPCOM
interface that, until we decided we weren’t going to do this anymore
(post-FF4), was frozen
—it could not be changed even though it
was wrong, precisely so that extensions could depend on it not
changing. There are thousands of XPCOM interfaces, and
extensions can use all of them. That’s a great strength: it
lets Firefox extensions do far more than, say, Chrome extensions can.
That’s also a huge problem for people trying to make the core better.
(Only about 200 of these interfaces were permanently frozen, but pre-FF4
we tried to avoid changing even the un-frozen ones as much as possible.)
You’ll notice that the change in bug 643041 makes it easier to
write extensions that manipulate SSL certificates, because now there’s
just one nsIX509Cert
interface, not three. But
taking away nsIX509Cert2
and nsIX509Cert3
breaks code that was using them.
Some bugs can’t even be fixed without breaking Web sites. Any time
Gecko doesn’t do the same thing Webkit and/or IE do, we (and the Webkit
and IE people) want to make that difference go away—but to do that, at
least one of the three has to change, and there may be sites out there
relying on the behavior that just got taken away. In some cases, adding
features breaks the web. For instance, if you write
directly in
your HTML, when the event fires, the JavaScript interpreter will try to
call a method of <element onevent="do_something()">
element
’s DOM API named
do_something
before it tries to call a global function with
that name. Which means that adding DOM methods to any HTML element
potentially breaks websites. (This is not a problem if you assign to
element.onevent
from a <script>
.)
This is why Mozilla core developers can seem so callous to the needs of extension and website developers built on Gecko. We know that we depend on both groups for our continued relevance—a browser is no use at all with no websites to browse, and without extensions there is not much reason to pick one browser over another. But we feel that right now it is more important to fix the problems with our existing platform than to provide stability. In the long run, we will have a better platform for both groups to work with. And in the long run, stability will come back. There are many bugs to fix first, but there are not infinitely many bugs, even if it seems like it sometimes. Having said that, there are some things we could be doing right now to make extension and website developers’ lives better … but I’m going to save them for the next post. 900 words is enough.
Note to commenters: I know lots of people are unhappy with the UX changes post-FF3.6, but let’s keep this to discussion of API breakage, please.