I'm back in Mountain View, living at the same apartment complex, working at the same place with the same people, waiting at the same traffic lights on my way to and from work. Even the dented PLDI water bottle that I left in the kitchen cabinet at the office was still there after nine months.1
If I'm lucky, I might even manage to convince myself that I'm qualified for, and capable of, doing my job, on account of having literally done it before.
I'm actually delighted about this. For background, last summer all the Rust interns went to PLDI and received, among other swag, water bottles that I liked a lot. But my bottle, after being crushed underneath the fold-down bed in Alex's and my apartment, was dented badly enough that it was more or less unusable. Sully gave me his bottle as a replacement, and I used it happily for some time, but during a bike ride this past winter, the cap flew off and was lost forever, making that bottle more or less unusable. So it was great to come back to Mozilla and find my dented bottle (with intact cap), which, together with Sully's intact bottle (with missing cap), leaves me with a complete, intact bottle-and-cap pair!
Yesterday and today, I worked on adding destructors to classes. A destructor is a method that can't be named directly, to which the compiler inserts a call whenever the memory for the class is about to be freed (so, whenever its reference count becomes zero). Here's a super simple example of a class with a destructor:
class shrinky_pointer {
let i: @@mut int;
fn look_at() -> int { ret **(self.i); }
new(i: @@mut int) { self.i = i; }
drop { **(self.i) -= 1; }
}
I've been writing posts but not posting them for the past couple days, so now I'm catching up. This one is from Tuedsay, May 8.
Tuesday featured a marathon meeting session in the morning, after I continued working on heap-allocated classes. I started out being kind of puzzled about how to annotate classes in the AST to indicate whether they're allocated in the stack or the heap -- because they already have region parameter annotations. ( Read more... )
I didn't get to writing a blog post on Friday, since I had a couple of
interruptions and didn't get that much code written anyway. The main
news from Friday is that classes can now implement parameterized
interfaces! This was a pretty minor fix. That is to say, I can write:
class cat implements map<int, str> {
... }
which says that the type cat supports table- or map-like
operations mapping integer keys onto string variables. Why a cat would
support that, I don't know, but it's just an example. I went on to
write a test like this:
( Cut for length )
Here is one of those guessing games whose answers will depress you: how many (partial) MIME parsers do we have in Thunderbird? If you guessed zero, ha-ha, very funny; now go sit in the corner. If you guessed one, you've obviously never worked with libmime before. The actual answer depends on how liberal you want to be. On the conservative end, there are no fewer than 5 parsers which go at least as far as parsing multipart messages (including two in a single file). If you choose to go liberally, counting everybody who attempts to split header values to look for a specific value, you gain an additional six that I am aware of… and I have no doubt that more are lurking behind them. I suppose this means that we now have more implementations of message header parsing than we do base64-decoding (although nowadays, it seems like most base64-decoding got stuffed into one method with several wrappers). The complete list as I know it:
nsMsgBodyHandler
nsMsgDBFolder::GetMsgTextFromStream
libmime
IMAP fakeserver (one for body parts, one for body structure, and another spot happily hunts down headers)
NNTP fakeserver (hunts down headers)
nsParseMailbox
nsNNTPProtocol (hunts down headers)
nsMsgLocalStoreUtils (hunts down headers)
Somewhere in compose code. Probably, although it's not clear how much is being funneled back to libmime and how much is not.
A class in necko implements in the RFC 2231 and RFC 2047 decoding.
Well, it's time for to increment that count by one. And then decrement it by at least three (two of the IMAP and the NNTP fakeserver decoders are switched over in a queued patch, and I hope to get the third IMAP fakeserver switched over). I've been working on a new JS MIME parser to Thunderbird for a bit at a time over the past several months, and I have local patches that start using it in tests (and as a replacement for nsIMimeHeaders).
So, why replace libmime instead of consolidating everyone onto it? The short answer is because libmime is a festering pile of crap. It uses an internal object system that can be summarized as "reimplementing C++ in C" and has existed largely in the same form since at least Netscape 5 (the Mozilla classic version gives a date of May 15, 1996). All of that would be manageable were it not for the fact that the architecture is clearly broken. Reliably parsing multipart MIME messages is not a trivial task (indeed, not only does one of the above do it incorrectly, but there is actually a test which may rely on it being wrong. Guess which one it is.), and the fact that so many people have variants on it is a clear indication that the API fails to expose what it ought to expose. This means that the code is in need of change, and the implementation is a form which makes changing things extremely difficult.
The choice to do it in JS was motivated mostly by Andrew Sutherland. There has been a long-term ideal in Thunderbird dating back to at least around 2008 to move more implementation of core code into JS, which would help avoid massive churn spurred on by mozilla-central; nowadays, there is the added benefit that it would aid in efforts like B2G or porting to platforms where native code is frowned upon. MIME code, being extremely self-contained (charset conversion and S/MIME encryption make up the biggest dependencies in the core parser). As of my current implementation, the only external dependency that the MIME parser has is atob, although charset conversion (via the proposed string encoding API) will be added when I get there. In other words, this code is usable by anyone who wants to write a mail client in JS, not just the internal mailnews code.
Another advantage to writing my own library is that it gives me a lot of time to lay out specifications in clearer terms. One called out in the spec are on how to handle seeing boundaries for the non-leaf-most part. My own brain went further and started musing on non-leaf parts getting QP or base64 content-transfer-encodings (which RFC 6532 went ahead and allowed anyways), or multiple nested parts having the same boundary (which the specification, in my view, hints at resolving in a particular fashion). Other developments include the fact that most MIME parsers do not permit as much CFWS as the specification indicates could be present ("Content-Type: multipart / mixed" would be unusable in every MIME parser source I read)I also produced a list of all the specifications that the parser will need to refer to that I have found so far (13 RFCs and a handful of non-RFCs, plus 9 obsoleted RFCs that may still warrant viewing). As for size? The JS implementation is about 600-700 lines right now (including over 300 lines of comments), while the equivalent C++ portions take over a thousand lines to do more or less the same thing.
As I said earlier, one of the problems with libmime is its oppressive architecture. It is primarily designed to drive the message pane while living as a streaming converter. However, it encompasses roughly three steps: the production of the raw MIME tree (or a tree that combine the raw MIME data with pseudo-MIME alternatives), conversion of that tree into a more traditional body-and-attachments view, and then the final driving of the UI. Getting it to stop any earlier is between difficult and impossible; what I have now, once it gets some more testing, can satisfy people who just need the first step. Doing the second step would require me to sit down and document how libmime makes its choices, which is not a task I look forward to doing.
Today is Thursday, and Thursday is bug-triage day. Each day of the week gets assigned to one of the core paid Rust contributors to go through the bug database and triage newer bugs as well as reviewing older ones, looking for duplicates, closing obsolete bugs, and so on. When there's any time left, I grep for the string "FIXME" in the code -- not everybody uses "FIXME", but there's been more than enough FIXMEs to keep me busy -- which usually denotes either a minor issue that can be fixed in minutes (because it was blocked on another compiler feature that's since been completed), or something subtle that unexpectedly ends up taking days. I try to nudge myself towards just filing a new issue in the issue tracker (the goal is to have every FIXME in the code map to at least one issue in the issue tracker), but it's hard to resist that "I can do it!" urge that usually sends me too deep down into some rabbit hole.
I've been keeping track of what bug ranges I review, and today I went over the most recent 34 issues. I should probably start going over the older issues again, since it's been six weeks since I looked at some of them and probably some things have changed. It's always a bit difficult to figure out who to assign bugs to, since our team of paid contributors is still pretty small, and I generally don't want to drop work non-consensually on the volunteer contributors. Assigning milestones is a bit random as well, but I figure if I make a mistake, it'll eventually get noticed in a group meeting and we'll move something to a different milestone. I wish github's issue tracker had a notion of priorities ("trivial", "important", "crucial") as well as milestones -- since some of the issues arising from FIXMEs, especially, can be pretty trivial, but I still want them to be on the record -- but it doesn't. So if at all possible, I try to make sure everything is milestoned (except for RFCs, enhancements, and "wishlist" items -- I see "wishlist" as like "enhancement", but even more optional); I also try to make sure bugs are assigned (using "git blame" to figure out who last touched the relevant bit of code, although it's misleading for occasions when we move directories around and do big reformats), though I'm more ok with leaving things unassigned than unmilestoned. (After all, unassigned bugs are openings for new contributors to get involved!)
Then I started in on FIXMEs. Most of the ones today were either comments I didn't understand, or quickly proved to be way too complicated (like, when I removed an unused constant -- should have been trivial -- promptly causing a scary memory corruption error, probably because the RTS also depended on the "unused" constant). So I filed issues. There were a few minor issues that I could fix quickly, and that was satisfying, like moving code over to use new for loops. When I left the office I had gotten started on one of those bugs where you pull the thread in the sweater and the entire thing unravels (changing some constants from int-type to unsigned-int-type in the back-end -- I think it should be done, but it's death by a thousand type errors). I won't detail the other bugs I fixed and filed, because that would, maybe, be too much detail even for this blog. I find joy and satisfaction in mundane, routine work. It's something I can do, and it's always still there for me no matter how long I've been away. And even though my highfalutin job title is "Research Engineer", it's my feeling that most of the effort involved in turning research into a thing that people can use (whether that thing is a programming language, or a compiler for it, or a browser implemented in the language) is the mundane, routine details that you're just Not Allowed to be too interested in when you're a grad student. Something feels right to me about giving myself over to that unglamorous work now. So while I might say that my blog posts are boring, that's just because I don't automatically expect other people to care just because I care. I've been many things at work over the past few months, but I haven't been bored.
...Except when something's recompiling and I don't have the energy to context-switch when I know I'll have to context-switch again in 2 or 3 minutes. Then I'm bored.
I'm going to see if I can productively write these posts a bit at a time, during long compiles. Whee!
I mentioned how yesterday, I got everything to stop being broken and returned to a state in which the only thing that was breaking was my new test case. Here it is, btw:
// xfail-fast
// aux-build:cci_class_cast.rs
use cci_class_cast;
import cci_class_cast::kitty::*;
import to_str::*;
import to_str::to_str;
fn print_out<T: to_str>(thing: T, expected: str) {
let actual = thing.to_str();
#debug("%s", actual);
assert(actual == expected);
}
fn main() {
let nyan : to_str = cat(0u, 2, "nyan") as to_str;
print_out(nyan, "nyan");
}
I'm going to make the bold assumption that you already can read Rust (but if not, there's a tutorial that may or may not reflect the version of the language I'm using.) The test depends on an auxiliary crate; the "aux-build" directive in a comment tells our test runner to build it; the use directive links this program with the other crate, called cci_class_cast, and finally, the import directive imports the module kitty within the crate cci_class_cast. And yes, I try to make as many of my examples as possible involve cats. There's something wrong with that?
( Cut for length )
In general, I don't blog about work. That's not a matter of policy, just that
the topics that move me enough to cause me to sit down and write for a couple
of hours tend not to be work-related. You might say that I should question the
line of work I'm in because of that -- and believe me, I have. On the other
hand, I do stay up late working because I have to know the
solution to this problem rather than because there's a deadline (deadlines?)
on a regular basis. And when I do find out the solution, I'm usually too tired
to write about it. So there's that.
There are plenty of really good reasons to document what I do as I go along,
though, and all of those reasons are centered around me rather than other
people. Thus I'm going to start trying to write for an audience of one. But if I
write here, in my public blog, rather than in a private text file, I'll be
forced to write clearly enough that others might have a chance at
understanding it -- which means I'll probably be able to understand it later
myself, even if no one else ever does. So there's also that.
Some of my colleagues do a really great job writing
beautifully detailed, explanatory posts, but I'm not going to try to do that,
because it's just too intimidating. Instead, I'm going to write as close to
every day as possible (though I won't beat myself up too much if I miss a day), as much as possible, and as uninterestingly as
possible. You have been warned.
Today, I went back to working on implementing classes in Rust, as
I've been doing for more or less the past four months. (The last two days of
last week, I took some time to do bug triage
and to fix what I
thought was going to be an easy bug (to give myself that key little
dopamine-surge of accomplishment) and turned out not to be.) When I left off
working on it before, I was in one of those truly gruesome states where you're
trying to add support for a new feature -- in this case, the ability to cast a
class to an interface
type -- but it breaks everything and you don't know why. In this case, I
was trying to unify the code that handles implementations of interfaces (existing code), and classes that
implement interfaces (new code), so I guess I did know why it broke (I messed
up how interfaces got typechecked), but not how to fix it.
( Cut for length )
Dave’s efforts to expose undergraduate computer science students to the realities of Firefox development are inspiring and unparalleled in Higher Ed. I highly recommend those interested in CS Ed to familiarize themselves with his past teaching work at Seneca, and the extremely positive outcomes for his students.
This feels a little bit like the web platform having opened a door to hell and Zombies running out of it. I wonder if we can ever close it again.
– Malte Ubl
Let’s see if we can. I’ve had a bunch of productive conversations since my post the other day.
I talked about how specifying little-endian would force big-endian browser vendors to choose one implementation strategy—emulate little-endian by byte-swapping and try to optimize as best they can—and concluded that it was better to let them decide for themselves and see how the market shakes out before specifying. But that doesn’t take into account the cost to web developers, which should always be the first priority (mea culpa).
Leaving it unspecified or forcing developers to opt in to a specified endianness taxes developers: it leaves them open to the possibility of their sites breaking on systems they likely can’t even test on, or forces them to make sure they pass the argument (in which case, they’d always be one forgotten argument away from possible bustage on some platform they can’t test on).
Imagine that instead of defaulting to unspecified behavior, we defaulted to little-endian—which is the de facto semantics of the web today—but apps could opt in to big-endian with an optional argument. Then a carefully-written app could use this (in combination with, say, a navigator.endianness feature test API) to decide which byte order would give them better performance. On little-endian systems, they’d use little-endian, on big-endian systems, they’d use big-endian. Less carefully-written apps that just went with the default might get some performance degradation in big-endian platforms, but we don’t actually know how bad it would be. But crucially, there would be no way to accidentally break your app’s behavior.
But let me take it one step further. I don’t even think we know that that additional option will be needed. For now, we don’t even know of any big-endian user agents that are implementing WebGL, nor do we know if byte-swapping will be prohibitively expensive. Until then, I say any additional API surface area is premature optimization. YAGNI.
In summary: let’s prioritize web developers over hypothetical performance issues on hypothetical browsers. Typed arrays should be standardized as little-endian—full stop.
But the typed arrays spec doesn’t specify a byte order. So a browser on a big-endian system (say, a PowerPC console like Xbox or PS3) is allowed to print 0. In short: casting an ArrayBuffer to different types is unportable by default. It’s up to web developers to canonicalize bytes for different architectures.
Now, we could just require typed arrays to be little-endian, once and for all. After all, almost all platforms are little-endian these days. The few big-endian platforms could just automatically reorder bytes for all typed array accesses. But this would have to be made to work with WebGL, which works by sending application-generated buffers to the GPU. In order to make this work on a big-endian architecture, little-endian-encoded ArrayBuffer data would need to be translated when sending back and forth to the GPU. Technically, this might be possible, but there’s really no evidence that it would have acceptable performance.
On the other hand, can we really trust that web applications will write portable code? Imagine a hashing algorithm that builds an internal ArrayBuffer and casts it to different types. If the code isn’t written portably, it’ll break on a browser implementing big-endian typed arrays.
This leaves big-endian browsers with a nasty decision: try to emulate little-endian typed arrays to protect against unportable application logic, and suffer the complexity and performance costs of translating data back and forth to the GPU, or just hope that not too many web pages break. Or perhaps surface an annoying decision to users: do you want to run this application in fast mode or correct mode?
For now, we should let browser vendors on big-endian systems make that decision, and not force the decision through the spec. If they end up all choosing to emulate little-endian, I’ll be happy to codify that in the standards. As I understand it, TenFourFox can’t support WebGL, so there the best decision is probably to emulate little-endianness. On an Xbox, I would guess WebGL performance would be a higher priority than web sites using internal ArrayBuffers. But I’m not sure. I’d say this is a decision for big-endian browsers to make, but I would greatly welcome their input.
In the meantime, we should do everything we can to make portability more attractive and convenient. For working with I/O, where you need explicit control over endianness, applications can use DataView. For heterogeneous data, there’ll be ES6 structs. Finally, I’d like to add an option for ArrayBuffers and typed arrays to be given an optional explicit endianness:
12345
varbuffer=newArrayBuffer(1024,"little");// a little-endian buffervara1=newUint32Array(buffer);a1[0]=1;vara2=newUint8Array(buffer);a2[0];// must be 1, regardless of system architecture
With the endianness specified explicitly, you can still easily write portable logic even when casting—without having to canonicalize bytes yourself. Emscripten and Mandreel could benefit from this increased portability, for example, and I think crypto algorithms would as well. I’ll propose this extension to Khronos and TC39, and discuss it with JavaScript engine implementors.
I’ve never really understood what “homoiconic” is supposed to mean. People often say something like “the syntax uses one of the language’s basic data structures.” That’s a category error: syntax is not a data structure, it’s just a representation of data as text. Or you hear “the syntax of the language is the same as the syntax of its data structures.” But S-expressions don’t “belong” to Lisp; there’s no reason why Perl or Haskell or JavaScript couldn’t have S-expression libraries. And every parser generates a data structure, so if you have a Python parser in Python, then is Python homoiconic? Is JavaScript?
Maybe there’s a more precise way to define homoiconicity, but frankly I think it misses the point. What makes Lisp’s syntax powerful is not the fact that it can be represented as a data structure, it’s that it’s possible to read it without parsing.
Wait, what?
It’s hard to explain these concepts with traditional terminology, because the distinction between reading and parsing simply doesn’t exist for languages without macros.
Parsing vs reading: the compiler’s view
In almost every non-Lispy language ever, the front end of every interpreter and compiler looks pretty much the same:
Take the text, run it through a parser, and you get out an AST. But that’s not how it works when you have macros. You simply can’t produce an AST without expanding macros first. So the front-end of a Lispy language usually looks more like:
What’s this intermediate syntax tree? It’s an almost entirely superficial understanding of your program: it basically does paren-matching to create a tree representing the surface nesting structure of the text. This is nowhere near an AST, but it’s just enough for the macro expansion system to do its job.
Parsing vs reading: the macro expander’s view
If you see this statement in the middle of a JavaScript program:
123
for(letkeyinobj){print(key);}
you know for sure that it’s a ForInStatement, as defined by the spec (I’m using let because… ES6, that’s why). If you know the grammar of JavaScript, you know the entire structure of the statement. But in Scheme, we could implement for as a macro. When the macro expander encounters:
12
(for(keyobj)(printkey))
it knows nothing about the contents of the expression. All it knows is the macro definition of for. But that’s all it needs to know! The expander just takes the two subtrees, (key obj) and (print key), and passes them as arguments to the for macro.
This macro works by pattern matching: it expects two sub-trees, the first of which can itself be broken down into two identifier nodes x and e1, and it expands into the for-each expression. So when the expander calls the macro with the above example, the result of expansion is:
1
(for-each (λ(key)(printkey))obj)
The power of the parenthesis
If you’ve ever wondered why Lisp weirdos are so inexplicably attached to their parentheses, this is what it’s all about. Parentheses make it unambiguous for the expander to understand what the arguments to a macro are, because it’s always clear where the arguments begin and end. It knows this without needing to understand anything about what the macro definition is going to do. Imagine trying to define a macro expander for a language with syntax like JavaScript’s. What should the expander do when it sees:
1
quux(mumble,flarg)[1,2,3]{foo:3}grunch/wibble/i
How many arguments does quux take? Is the curly-braced argument a block statement or an object literal? Is the thing at the end an arithmetic expression or a regular expression literal? These are all questions that can’t be answered in JavaScript without knowing your parsing context—and macros obscure the parsing context.
None of this is to say that it’s impossible to design a macro system for languages with non-Lispy syntax. My point is just that the power of Lisp’s (Scheme’s, Racket’s, Clojure’s, …) macros comes not from being somehow tied to a central data structure of the language, but rather to the expander’s ability to break up a macro call into its separate arguments and then let the macro do all the work of parsing those arguments. In other words, homoiconicity isn’t the point, read is.
relies on Automatic Semicolon Insertion (ASI) and so cannot be minified except by parsing fully (including ASI), observing the significance of the newline after clearMenus(), and inserting a semicolon when stripping that newline.
FWIW, I agree with Doug’s canonically grumpy tone if not his substance; more below on the substance.
I also agree with @cramforce and @jedschmidt that the && line is an abusage, allowed due to JS’s C heritage by way of Java, but frowned upon by most JS hackers; and that an if statement would be much better style (and, I take it, help JSMin do right). But this particular criticism is too ad hoc to help resolve the general “Let me have my ASI freedom and still minify, dammit!” debate.
TC39 is considering the use of ! as an infix operator. This code will break in the future. Fix it now. Learn to use semicolons properly. ! is not intended to be a statement separator. ; is.
The !-as-infix-operator idea is proposed as syntactic sugar for promises, which may or may not make it into Harmony with that exact syntax, or with any syntactic sugar at all.
Doug’s right that ! is not a statement terminator or “initiator”. And (my point here), neither is newline.
But search for [nlth] in the proposed promises grammar and you’ll see something surprising about ASI and infix operators: we can add new infix operators in the future, whether new contextual keyword-operators (e.g., is and isnt — BTW these are in doubt) or retasked, existing unary-prefix operators, provided that we insist on [no LineTerminator here] immediately to the left of any such infix operator.
(In ECMA-262, [no LineTerminator here] is used in so-called “restricted productions” to make contextually-significant newlines, e.g., after return without any expression of the return value on the same line.)
This future-friendliness to new infix operators comes directly from ASI as a newline-sensitive error correction procedure, as the example at top demonstrates. Try other examples using a leading identifier on a well-formed second line and you’ll see the same effect. Removing the newline introduces an early error, which creates homesteading space for new infix operators in a later edition of ECMA-262. Examples:
let flag = x is y; // no \n before 'is'!
x ! p = v; // Q(x).put(’p’, v)
An aside on coding style: if we add new infix operators used in restricted productions, this gives weight to the JS coding style that puts infix operators in multiline expressions at the end of continued lines, rather than at the beginning of continuation lines.
So while I agree with Doug on those two lines of code from Bootstrap (an excellent JS library, BTW) exhibiting poor style, it is not the case that such code as written could break in the future, even if we were to adopt the !-as-infix-operator strawman. The first line terminator in that example is indeed significant.
The moral of this story: ASI is (formally speaking) a syntactic error correction procedure. If you start to code as if it were a universal significant-newline rule, you will get into trouble. A classic example from ECMA-262:
a = b + c
(d + e).print()
Similar hazards arise with [, /, and unary + and -. Remember, if there wasn’t an error, ASI does not apply.
This problem may seem minor, but JS file concatenation ups the ante. For this reason some style guides (Dojo, IIRC) advocate starting your reusable JS file with ;, but people don’t know and it’s easy to forget.
I wish I had made newlines more significant in JS back in those ten days in May, 1995. Then instead of ASI, we would be cursing the need to use infix operators at the ends of continued lines, or perhaps \ or brute-force parentheses, to force continuation onto a successive line. But that ship sailed almost 17 years ago.
The way systematic newline significance could come to JS is via an evolution of paren-free that makes it to Harmony status. I intend to work on this in the strawman, but not for ES6.
Some of the github issue comments are naive or idealistic to the point of being silly. Since when does any programming language not have syntax arguments? All living, practical languages that I know of, even those with indentation-based block structure and similar restrictions, have degrees of freedom of expression that allow abusage as well as good usage. Language designers can try to reduce degrees of freedom, but not eliminate them completely.
My two cents: be careful not to use ASI as if it gave JS significant newlines. And please don’t abuse && and || where the mighty if statement serves better.
I’ll also say that if it were up to me, in view of JS’s subtle and long history, I’d fix JSMin. But I would still log a grumpy comment or two first!
Similar to the previously-posted PoC Light Table, but less ambitious and actually available. Also has a lot of headroom wrt. visualizing code coverage, data flow, …
If we start to try to make “Mozilla” mean “those people who share not only the Mozilla mission but also my general political / social / religious / environmental view” we will fail. If we focus Mozilla on our shared consensus regarding the Mozilla mission and manifesto then the opportunities before us are enormous.
Mozilla’s diversity is a success condition. Our mission and our goal is truly global. Our mission taps into a shared desire for respect and control and user sovereignty that runs across cultures and across many other worldviews. We may even offend each other in some of our other views. Despite this, we share a commitment to the Mozilla mission. This is a remarkable achievement and important to our continued success.
I agree with every word of this, and I believe it applies to other communities of which I’m a member. If not, these communities will tend to splinter, and that is likely to be a net loss for everyone.
Background
A donation that I made in support of California Proposition 8 four years ago became publicknowledge and sparked a firestorm of comments in the last few days, mostly on Twitter.
People in other countries or other U.S. states do not know why “Mozilla” was listed in the donation data. Donors above a certain amount are required by the State of California to disclose their employer. Mozilla had nothing to do with the donation.
I’m not going to discuss Prop 8 here or on Twitter. There is no point in talking with the people who are baiting, ranting, and hurling four-letter abuse. Personal hatred conveyed through curse words is neither rational nor charitable, and strong feelings on any side of an issue do not justify it.
In contrast, people expressing non-abusive anger, sadness, or disagreement, I understand, grieve, and humbly accept.
No Hate
Ignoring the abusive comments, I’m left with charges that I hate and I’m a bigot, based solely on the donation. Now “hate” and “bigot” are well-defined words. I say these charges are false and unjust.
First, I have been online for almost 30 years. I’ve led an open source project for 14 years. I speak regularly at conferences around the world, and socialize with members of the Mozilla, JavaScript, and other web developer communities. I challenge anyone to cite an incident where I displayed hatred, or ever treated someone less than respectfully because of group affinity or individual identity.
Second, the donation does not in itself constitute evidence of animosity. Those asserting this are not providing a reasoned argument, rather they are labeling dissenters to cast them out of polite society. To such assertions, I can only respond: “no”.
If we are acquainted, have good-faith assumptions, and circumstances allow it, we can discuss 1:1 in person. Online communication doesn’t seem to work very well for potentially divisive issues. Getting to know each other works better in my experience.
The Larger Point
There’s a larger point here, the one Mitchell made: people in any group or project of significant size and diversity will not agree on many crucial issues unrelated to the group or project.
I know people doing a startup who testify that even at fewer than 20 employees, they have to face this fact. It’s obviously true for much larger communities such as JS and Mozilla. Not only is insisting on ideological uniformity impractical, it is counter-productive.
So I do not insist that anyone agree with me on a great many things, including political issues, and I refrain from putting my personal beliefs in others’ way in all matters Mozilla, JS, and Web. I hope for the same in return.
One of the great features of ES6 modules is the direct style module loading syntax:
12
importmapfrom"underscore.js";...map(a,f)...
This makes it as frictionless as possible to grow or refactor your code into multiple modules, and to pull third-party modules into an existing codebase. It also makes a common module format that can be shared between the browser and JS servers like Node.
But this direct style requires loading its dependencies before it can execute. That is, it’s a synchronous module load. Put in the context of a script tag, this would make it all too easy to block page rendering on I/O:
Throwing this syntax into the browser like this would be an invitation to jank. Thanks to insight from Luke Hoban, I think we have the right approach to this for ES6, which is in fact similar to our approach to avoiding turning eval into a synchronous I/O operation.
In previous versions of ECMAScript, there’s only one syntactic category of program that you can evaluate, called Program in the grammar. In ES6, we’ll define a restricted version of the syntax to be used in synchronous settings, which makes it illegal to do synchronous loads. Within a blocking script, the only access to modules is via the dynamic loading API:
This eliminates the footgun, and all of your modules can themselves use the synchronous loading syntax. For example, if jquery.js wants to use a module—say, a data structure library—it can go ahead and load it synchronously:
But still, this restriction on the top-level loses the convenience of directly importing modules from scripts. Thing is, in an asynchronous context, there’s nothing wrong with doing a synchronous load. So just like the asynchronously loaded jquery.js can use the synchronous syntax, we can also allow it in a defer or async script:
This allows the full flexibility and expressiveness of ES6 embedded in HTML, without any hazard of blocking page rendering for I/O.
The eval function for ES6 will work the same way, disallowing synchronous loading syntax in the grammar it recognizes, to prevent turning it into a synchronous API. We’ll also add an asynchronous version of eval that, like script async, recognizes the full grammar.
The little slideshow I presented is in part quaint. WPF/E and Adobe Apollo, remember those? (Either the code names, or the extant renamed products?) The Web has come a long way since 2007.
But other parts of my slideshow are still relevant, in particular the part where Mozilla and Opera committed to an unencumbered <video> element for HTML5:
We did what we said we would. We fought against the odds. We carried the unencumbered HTML5 <video> torch even when it burned our hands.
We were called naive (no) idealists (yes). We were told that we were rolling a large stone up a tall hill (and how!). We were told that we could never overcome the momentum behind H.264 (possibly true, but Mozilla was not about to give up and pay off the patent rentiers).
At Google I/O in May 2010, Adobe announced that it would include VP8 (but not all of WebM?) support in an upcoming Flash release.
On January 11, 2011, Mike Jazayeri of Google blogged:
… we are changing Chrome’s HTML5 <video> support to make it consistent with the codecs already supported by the open Chromium project. Specifically, we are supporting the WebM (VP8) and Theora video codecs, and will consider adding support for other high-quality open codecs in the future. Though H.264 plays an important role in video, as our goal is to enable open innovation, support for the codec will be removed and our resources directed towards completely open codec technologies.
These changes will occur in the next couple months….
A followup post three days later confirmed that Chrome would rely on Flash fallback to play H.264 video.
Where we are today
It is now March 2012 and the changes promised by Google and Adobe have not been made.
What’s more, any such changes are irrelevant if made only on desktop Chrome — not on Google’s mobile browsers for Android — because authors typically do not encode twice (once in H.264, once in WebM), they instead write Flash fallback in an <object> tag nested inside the <video> tag. Here’s an example adapted from an Opera developer document:
The Opera doc nicely carried the unencumbered video torch by including
<source src="video.webm" type="video/webm">
after the first <source> child in the <video> container (after the first, because of an iOS WebKit bug, the Opera doc said), but most authors do not encode twice and host two versions of their video (yes, you who do are to be commended; please don’t spam my blog with comments, you’re not typical — and YouTube is neither typical nor yet completely transcoded [1]).
Of course the ultimate fallback content could be a link to a video to download and view in a helper app, but that’s not “HTML5 video” and it is user-hostile (profoundly so on mobile). Flash fallback does manage to blend in with HTML5, modulo the loss of expressiveness afforded by DHTML playback controls.
Now, consider carefully where we are today.
Firefox supports only unencumbered formats from Gecko’s <video> implementation. We rely on Flash fallback that authors invariably write, as shown above. Let that sink in: we, Mozilla, rely on Flash to implement H.264 for Firefox users.
Adobe has announced that it will not develop Flash on mobile devices.
In spite of the early 2011 Google blog post, desktop Chrome still supports H.264 from <video>. Even if it were to drop that support, desktop Chrome has a custompatched Flash embedding, so the fallback shown above will work well for almost all users.
Mobile matters most
Android stock browsers (all Android versions), and Chrome on Android 4, all support H.264 from <video>. Given the devices that Android has targeted over its existence, where H.264 hardware decoding is by far the most power-efficient way to decode, how could this be otherwise? Google has to compete with Apple on mobile.
Steve Jobs may have dealt the death-blow to Flash on mobile, but he also championed and invested in H.264, and asserted that “[a]ll video codecs are covered by patents”. Apple sells a lot of H.264-supporting hardware. That hardware in general, and specifically in video playback quality, is the gold standard.
Google is in my opinion not going to ship mobile browsers this year or next that fail to play H.264 content that Apple plays perfectly. Whatever happens in the very long run, Mozilla can’t wait for such an event. Don’t ask Google why they bought On2 but failed to push WebM to the exclusion of H.264 on Android. The question answers itself.
So even if desktop Chrome drops H.264 support, Chrome users almost to a person won’t notice, thanks to Flash fallback. And Apple and Google, along with Microsoft and whomever else might try to gain mobile market share, will continue to ship H.264 support on all their mobile OSes and devices — hardware-implemented H.264, because that uses far less battery than alternative decoders.
Here is a chart of H.264 video in HTML5 content on the Web from MeFeedia:
And here are some charts showing the rise of mobile over desktop from The Economist:
These charts show Bell’s Law of Computer Classes in action. Bell’s Law predicts that the new class of computing devices will replace older ones.
In the face of this shift, Mozilla must advance its mission to serve users above all other agendas, and to keep the Web — including the “Mobile Web” — open, interoperable, and evolving.
What Mozilla is doing
We have successfully launched Boot to Gecko (B2G) and we’re preparing to release a new and improved Firefox for Android, to carry our mission to mobile users.
What should we do about H.264?
Andreas Galproposes to use OS- and hardware-based H.264 decoding capabilities on Android and B2G. That thread has run to over 240 messages, and spawned some online media coverage.
Some say we should hold out longer for someone (Google? Adobe?) to change something to advance WebM over H.264.
Remember, dropping H.264 from <video> only on desktop and not on mobile doesn’t matter, because of Flash fallback.
Others say we should hold out indefinitely and by ourselves, rather than integrate OS decoders for encumbered video.
I’ve heard people blame software patents. I hate software patents too, but software isn’t even the issue on mobile. Fairly dedicated DSP hardware takes in bits and puts out pixels. H.264 decoding lives completely in hardware now.
Yes, some hardware also supports WebM decoding, or will soon. Too little, too late for HTML5 <video> as deployed and consumed this year or (for shipping devices) next.
As I wrote in the newsgroup thread, Mozilla has never ignored users or market share. We do not care only about market share, but ignoring usability and market share can easily lead to extinction. Without users our mission is meaningless and our ability to affect the evolution of open standards goes to zero.
Clearly we have principles that prohibit us from abusing users for any end (e.g., by putting ads in Firefox’s user interface to make money to sustain ourselves). But we have never rejected encumbered formats handled by plugins, and OS-dependent H.264 decoding is not different in kind from Flash-dependent H.264 decoding in my view.
We will not require anyone to pay for Firefox. We will not burden our downstream source redistributors with royalty fees. We may have to continue to fall back on Flash on some desktop OSes. I’ll write more when I know more about desktop H.264, specifically on Windows XP.
What I do know for certain is this: H.264 is absolutely required right now to compete on mobile. I do not believe that we can reject H.264 content in Firefox on Android or in B2G and survive the shift to mobile.
Losing a battle is a bitter experience. I won’t sugar-coat this pill. But we must swallow it if we are to succeed in our mobile initiatives. Failure on mobile is too likely to consign Mozilla to decline and irrelevance. So I am fully in favor of Andreas’s proposal.
Our mission continues
Our mission, to promote openness, innovation, and opportunity on the Web, matters more than ever. As I said at SXSW in 2007, it obligates us to develop and promote unencumbered video. We lost one battle, but the war goes on. We will always push for open, unencumbered standards first and foremost.
In particular we must fight to keep WebRTC unencumbered. Mozilla and Opera also lost the earlier skirmish to mandate an unencumbered default format for HTML5 <video>, but WebRTC is a new front in the long war for an open and unencumbered Web.
We are researching downloadable JS decoders via Broadway.js, but fully utilizing parallel and dedicated hardware from JS for battery-friendly decoding is a ways off.
Can we win the long war? I don’t know if we’ll see a final victory, but we must fight on. Patents expire (remember the LZW patent?). They can be invalidated. (Netscape paid to do this to certain obnoxious patents, based on prior art.) They can be worked around. And patent law can be reformed.
Mozilla is here for the long haul. We will never give up, never surrender.
Google has at best transcoded only about half the videos into WebM. E.g., this YouTube search for “cat” gives ~1.8M results, while the same one for WebM videos gives 704K results.
WebM on YouTube is presented only for videos that lack ads, which is a shrinking number on YouTube. Anything monetizable (i.e., popular) has ads and therefore is served as H.264.
All this is moot when you consider mobile, since there is no Flash on mobile, and as of yet no WebM hardware, and Apple’s market-leading position.
One of my goals in life is to participate in a true political debate. Not a shouting match, but a true debate where both sides argue their points reasonably and, crucially, are open to the fact that they may be holding the wrong ideas and willing to change their beliefs. The Internet, allowing for an easy way to facilitate discussions across a very large, very diverse group of participants ought to have made this easier; alas, it seems that the past few years has only increased the vitriol and spitting bile in these shouting matches.
When I read news articles, sometimes I think "this sounds thought-provoking", and I turn to the online comments to see if it started one. Almost invariably, most of the comments turn out to be rants that are only distantly germane to the article (e.g., any article on the Middle East sets off a firestorm with ... well, you can guess; any article on Ron Paul sets off commenters screaming about how he's being "ignored" by the media, etc.). Sometimes, the ones I think are most thought-provoking have no comments (typically any article that mentions Africa). Go figure.
It is also surprising how many people you think ought to know better end up doing nothing but shouting. I went to a well-regarded selective-admissions public high school (the mean score for the standard college-admissions test in the US is not far from perfect) (and no, I did not get a perfect score on said test), but when we had a sponsored debate between the Democratic and Republican groups in our school, it still devolved into a (literal) shouting match.
Which brings me, in a roundabout manner, to the true topic of this post. Apparently, there is a firestorm currently brewing over a post on planet.mozilla.org that dealt with a political topic of current interest to some people. I discovered, years ago, that you catch a lot more flak if you write a post that goes to an aggregator that people disagree with than if you write one they agree with (I explained why I did not like the results of the 2008 presidential election, and caught more flak than the few posts earlier celebrating those results). I don't mind political discussion—indeed, as I described earlier, I'd love one. But I can understand why people might not like sensitive political topics being collected on a site which can be interpreted as official Mozilla policy.
However, there is one thing I don't like about it. Some people have taken it upon themselves to call the viewpoint presented as hate speech. I don't know what hate speech is exactly, but I don't think it should ever exist, no matter what one defines. That is because terming something as hate speech is an attempt to avoid discussing it: it is a way to say "that statement is so wrong I am not going to bother to refute it", or, equivalently, "I am so right that any opposition to my idea cannot be rationally argued." It is at best an insult to one's opponent and at worst an admission of outright arrogance.
Even worse, I think, are people who want to outlaw hate speech. Speech is, fundamentally, a reflection of one's belief. Attempting to outlaw the utterance of specific beliefs is nothing more than trying to outlaw those beliefs in the first place. Sure, some of them are injurious insults (such as those who deny that the Holocaust happened), but there is, to me, a steep slippery slope you start out on. Why outlaw denials of the Holocaust but not also outlaw denials of other historical religiously-motivated genocides? Hell, why stop there? We could outlaw any false statement: think how much more rational our discussions would be if people didn't state such lies as "Sharia law is invading the US legal system" or "the New Deal did nothing to bring us out of the Great Depression." Wait, maybe that's why.
I remember, when doing a project for a US Government class that involved redebating several key cases on freedom speech, that I found a Supreme Court decision which mentioned something to the effect of "Free speech is at its best when it causes discomfort" (I cannot find the exact quote right now, sadly enough). At that moment, I think, I solidified my stance on free speech to the following: all speech ought to be free, no matter how insulting, radical, or slanderous it may be. It may be that your speech causes injurious actions, but it's only the intent and the action that matters (e.g., falsely yelling "Fire!" in a crowded theater is grounds for manslaughter, but because the action was intended to create a stampede, not because you actually said "Fire"). Attempting to censor others' opinions—whether through legal means, or through attempts to shame them by calling their words "hate speech"—is not an action I can support.
And if you disagree with me, feel free to try to discuss it with me. Who knows, I may just change my mind.
When I checked email this morning, the first message I read was a posting on Homozilla, the internal LGBTQ/ally mailing list at Mozilla, about a posting from Gervase Markham, a Mozilla contractor who works on community relations, that appeared on the Planet Mozilla blog aggregator. Planet Mozilla syndicates blogs from various people (employees and volunteers) in the Mozilla community. Some people choose to only publish posts of theirs that are tagged with a certain tag on Planet Mozilla, while others publish their entire blogs. This leads to a mix of Mozilla-related and non-Mozilla-related content.
Gerv's post was not just non-Mozilla-related, but was a call for other UK residents to sign an anti-marriage-equality petition. While Gerv is entitled to his own opinions and to publish them on his own blog, publishing this opinion as a paid Mozilla staff member under the Mozilla banner implies that Mozilla endorses the hate speech that he chose to release. And yes, I'm calling it hate speech because saying that I don't deserve to have a fundamental human right is saying that I'm not a person. If that's not clear to you, perhaps you have a hard time empathizing with people who aren't as privileged as yourself. I would suggest you work on that; it's not really my job to help you learn.
The Mozilla staff members who administer Planet Mozilla responded by essentially standing behind Gerv's hate speech as published under the Mozilla banner. Again, Gerv has the right to say whatever he wants about how I'm not a human being, but as a company, Mozilla can make a choice about what kind of content to publish under their name.
Whatever it accomplishes for Mozilla's mission of protecting the open Web to disseminate speech that denies the humanity of a marginalized minority group, I guess that's more important than affirming that Mozilla values the contributions made by its LGBTQ employees and volunteers. To me, defending rather than repudiating Gerv's hateful post says that my contributions, and those of every other LGBTQ contributor at Mozilla, aren't important -- that Mozilla as a company is willing to give up all of those contributions just to be able to distribute hate speech through the Planet Mozilla aggregator.
I can't help but see parallels with my experience at PSU -- in that case, authority figures made the judgment that another grad student's right to talk about raping another student, to their face, at work was more important than their or my education. Apparently, the value of rape jokes at work was so high to my group at Portland State that it was not possible to take any serious action against pervasive sexual harassment or to discipline the person who committed the most heinous act in any serious way. I thought Mozilla was better than that, but apparently there's a similar calculus at work here: the value of having hate speech targeting LGBTQ people on a blog aggregator that is clearly under the Mozilla umbrella (nobody could read it and not think it's an official Mozilla feed) is being deemed greater than the value of everything that LGBTQ contributors have to offer to Mozilla.
If you think that the term "hate speech" is overreaching, you may be confused about the distinction between offense and oppression. For example, a homophobe might be offended by this post, but it is not hate speech against homophobes, since homophobes are not an oppressed class (quite the opposite) and I have no systematic power over them. At least under California state law, it's easy to find out which classes are protected classes: for example, women, people of color, people with disabilities, members of gender and sexual minorities, among others. Speech that targets any of those groups as a group and tears down their humanity (for example, by suggesting they don't deserve a fundamental human right) is hate speech. Speech that targets individuals independently from their group membership, or that targets powerful groups that are not protected classes, is not hate speech. Speech that is merely offensive to somebody and does not have the power and violence of a dominant group (like men, white people, heterosexuals, or cisgender people with cissexual bodies) to back it up cannot be hate speech. Speech on behalf of heterosexuals that targets LGBTQ people is absolutely hate speech; words that imply we're not worthy of basic rights are the theory, and a fist to the face is the practice. Each incites the other.
Back in September at the Mozilla All-Hands meeting, Gary Kovacs, our CEO, said that people had already been fired for making bigoted remarks at work and he wouldn't hesitate to do it again. The room applauded. I felt like I was finally in a place where I could feel free to focus on work without being afraid that someone else would decide that their discomfort with my gender or sexual minority status was my problem and that the administration would side with the bully. Now, I'm not so sure. See, the thing is, you can't be neutral when a bully is bullying -- being neutral means taking the bully's side. You can't cite "free speech" when a bully is using words to commit an act of violence by asserting and renewing their superior social standing and power in a situation. Tolerating hate speech means destroying free speech for people in minority groups; unchecked free speech means that only people in powerful majority groups get to speak. When bullies are allowed to use their power to remind me that I'm not a person, that silences my voice.
I'm using the "research" tag for this post since if my setup from long ago still works, this post will be syndicated on Planet Mozilla. And, of course, this post represents about an hour that I could have spent doing research had a number of individuals not made the collective decision that my workplace should be a place where my humanity is not a given but, rather, up for debate.
Edit: Edited to remove the name of a particular person who I had mistakenly attributed more to than he was actually responsible for.
One of the problems with working with compilers is that you tend not to find some bugs until they get used on real code. And unlike most test suites, real code can involve some very large files, which is where you tend to find bugs. As luck would have it, my debugging fun today led to two bugs being found in the same function... which is only a 200 line function that hides infinite loops in macros and is written using continuations to boot. The LLVM IR for this function, after being compiled with -O3, was 4500 lines (and one block had 87 predecessors and several &varphi instructions as well). At such a size, finding out why my code crashes on such a function is impossible, let alone figuring out which optimizations I need to blame for it.
Fortunately, LLVM has a tool called bugpoint which can reduce this IR into a more manageable size of work. Doing manual reduction via an iterative process of "this doesn't look necessary; cut it out" the first time took me about an hour to produce a pile of code small enough to actually analyze. Doing it via bugpoint on the second bug took closer to 30 minutes. Unfortunately, the hard part is figuring out how to actually use the tool in the first place: none of the manuals give an example command line invocation, and they start playing games of "look at that documentation over there". So, I am going to remedy this situation by actually giving functional documentation.
In this case, I have an assert in an LLVM pass that is being triggered. This pass isn't being run as part of opt, but rather its own tool that takes as input an LLVM IR file. So the first step is to get the IR file (clang -cc1 -emit-llvm -O3 is sufficient for my needs). After that, it's time to prepare a shell script that actually compiles the code; you can skip this step if you don't actually need to provide arguments to the program. For example, my bugpoint.sh script would look like:
After that, the next step is to actually invoke bugpoint. Here's the command line that's running as I write this post: bugpoint --compile-custom -compile-command ./bugpoint.sh io_lat4.ll. Since my program causes an assertion failure, bugpoint figures out that I'm trying to crash on code generation (it can also detect miscompilation errors). Hopefully, the first bit of output you get should look like the following:
Error running tool:
./bugpoint.sh bugpoint-test-program.bc-FB1NoU
Text indicating your program crashed
*** Debugging code generator crash!
If you've seen this much (you may need to wait for it to crash; it can take a long time if you doing Debug+Asserts builds), you know that it's trying to find code that makes your tool crash. After that, bugpoint tries to first reduce global initializers and then tries to eliminate as many functions as possible. After that, it tries eliminating basic blocks and then goes to work eliminating instructions. You will see lots of streaming output telling you what stuff it's removing; the documentation says it's helpful to capture this output, but I've found it useless.
When everything is done, you should get several files of the form bugpoint-*.bc; the most useful one is bugpoint-reduced-simplified.bc, which is the most reduced testcase. What you get now is a nice, reduced testcase? Not quite. First, it gives you just a bitcode file (I'd prefer .ll files, simply because my first thought is to read them to figure out what's going on). Another problem I have with bugpoint is that it doesn't do things like attempt to eliminate unused struct entries, nor does it bother to clean up some of its names. Take a look at this struct monstrosity: %struct.su3_matrix.4.134.147.173.199.212.238.290.303.329.381.394.407.459.472.485.498.524.550.563.576.589.602.615.628.654.667.680.719.758.823.836.849.862.1044.1057.1096.1808 = type { [3 x [3 x %struct.complex.3.133.146.172.198.211.237.289.302.328.380.393.406.458.471.484.497.523.549.562.575.588.601.614.627.653.666.679.718.757.822.835.848.861.1043.1056.1095.1807]] }.
Anyways, I hope this helps anyone else who has looked at bugpoint before and wondered "How do I actually use this tool to do something useful?".
This is a problem that came up during my research. Suppose I have a function foo whose assembly looks like the following:
foo:
mov global($rip), %rax
mov (%rax), %rax
retq
Are the following two lines of code equivalent?
/* defines somewhere */
int val;
extern int foo(void);
/* actual code */
asm("callq foo" : "=a"(val));
val = foo();
The answer, as you might guess by the fact that I'm writing this post, is "no." The reason why is a lot more complex. In the case that caused me endless toil and grief, the function foo is not located in the same executable as the caller; it is in an external library file. Now, the platform ABI allows functions to clobber a fair number of registers which callers must assume are unusable. If the loader needs to perform work when calling a global relocated function, then it is a good idea to use those registers where possible, since you don't need to bother saving those values. When the asm construct is invoked, we've declared that only the return register is clobbered, so the compiler is free to store some values over the call. When the function call is direct and doesn't need to go through the GOT or anything similar, this is perfectly fine and works as one would expect. But if you need to invoke the loader when calling through the GOT, then you now have registers whose values have suddenly mysteriously changed on you when calling a function which doesn't (appear to) use them. Hope you have more fun debugging those cases than I did!
Mozilla is happy to support Facebook in forming a Core Mobile Web PlatformW3C Community Group in which to curate prioritized, tiered lists of emerging and de facto standards that browsers should support in order for the Web to compete with native application stacks on mobile devices.
The W3C Community Groups do not create normative specifications; their work is informative at most [UPDATED per Ian Jacob's comment]. However I believe they can add significant value, especially by helping developers make their priorities clear to the implementors who tend to control the normative specs (W3C Recommendations).
Standards-making like law-making is definitely sausage-making. How could it be otherwise, with intensely competitive companies trying to work together?
On top of this, consider how conflicted many standards bodies are by pay-to-play, however muted and tamed by “process”. Anyone can join with enough money, and inject a divergent agenda or random noise into the process.
One inevitable outcome of these conflicts is too many proposed and even finalized standards for all browsers possibly to implement correctly and completely. The nice thing about standards is….
Who is best situated to advise implementors (mainly browser vendors) on which standards to prototype and finalize first? In my view, developers. But of course you can’t ask developers questions to answer with one voice. Developer communities must acclaim their own leaders, who then speak to standards bodies.
Last year, Facebook joined the W3C. I thought at the time “there is a company with skin in the Web content game, not only for pages but especially for apps.” Facebook relies heavily on HTML5, CSS, and JS. Facebook has no browser in the market to pull focus or inject asymmetric browser/service integration agendas.
And Facebook has hired long-time Open Web developers who have risen to be leaders in their communities: James Pearce and Tobie Langel.
So I encourage everyone interested in helping to join with James, Tobie and others in the new Core Mobile Web Platform community group. Together we can get the specs that Web developers deserve, completed in the right order with multiple interoperating implementations.
A bit of background. Mozilla has always contributed to web standards, going back to the start of the project. We co-founded the WHAT-WG to kick off HTML5. As readers of this blog know, we are a leader in JS standardization. We have some of the top CSS and layout experts in the world.
In the last eight months, our efforts to extend the web standards to include new APIs needed to build compelling apps and OS components on mobile devices have really caught fire. B2G and Open Web Apps are the fuel for this fire.
So I thought I would compile a list of emerging APIs to which we’ve contributed. In citing Mozillans I do not mean to minimize the efforts of standardization colleagues at Google, Microsoft, Nokia, Opera, the W3C and elsewhere. Standards are a multi-vendor effort (although excluding WebGL [see UPDATE below] one shiny name is conspicuously absent from this list).
The Mozilla contributions are worth noting both to acknowledge the individuals involved, and to highlight how Mozilla is championing device APIs for the web without having a native application stack blessed with such APIs on offer. We see the Web as quickly evolving to match native stacks. We have no other agenda than improving the Web to improve its users’ lives, including Web developers’ lives — especially mobile users and developers.
As always, standards in progress are subject to change, yet require prototype implementation and user-testing. Mozilla remains committed to playing fairly by not forging de-facto standards out of prototypes, rather proposing before disposing and in the end tracking whatever is standardized.
Here is the list, starting with some 2011-era work:
Geolocation, with Google contributing the editor and Firefox (thanks to Jay Sullivan leading the charge) implementing early.
The group is deeply indebted to Mounir Lamouri, Jonas Sicking, and the Mozilla WebAPI team in general for providing the WebVibrator prototype as an initial input.
File API. Editors: Arun Ranganathan, Jonas Sicking.
I did not list most of the HTML5 and Web API work aimed at Desktop Firefox, to focus on the new mobile-oriented additions. There’s more to say, including about bundled-permission follies and how to weave permission-granting (with memorization) into interactions, but not here.
One last note. The CSS vendor prefix brouhaha had, among many salutary effects, the benefit of shining light on an important requirement of competitive mobile web development: CSS style properties such as -webkit-animation-*, however you spell them, must have fast and beautiful implementations across devices for developers to find them usable: 60Hz, artifact-free rendering under touch control. This requires such work as off-main-thread compositing and GL layers.
This is a high technical bar, but we are in the process of meeting it in the latest Firefox for Android and B2G builds, thanks to hard work from many people, especially Patrick Walton, Robert O’Callahan, Chris Jones, and Andreas Gal. Onward!
I discovered that the compiler I was using last night managed to run out of memory while compiling some code. This was on our research group's server which has a whopping 128GB of memory. And by "ran out of memory", I literally watched in top as clang went from dozens of megabytes of memory to 120GB of memory, evict everything from the cache and thrash swap like crazy before the out-of-memory killer decided it ought to go.
What was I doing? No, I wasn't linking the source code to the universe; I was merely compiling a file in the Spec 2006 benchmarks. And this wasn't anything large--only 3700 lines of code or so in a single C++ file. And the pass that wasn't dying wasn't some horribly written pass that leaked memory like a leaky bucket. Indeed, LLVM tends to get you to free memory too quickly, judging by how many times I've chased down memory corruption. If you don't believe me, the offending code was this:
Code = CloneBasicBlock(CodeOrig, ValMap, "", F, NULL);
for (II = Code->begin(), EE = Code->end(); II != EE; ++II) {
if (!isa(II))
RemapInstruction(II, ValMap, RF_IgnoreMissingEntries);
}
For those of you who don't know what the code is doing, it roughly amounts to this: make a copy of a basic block of a function, and then tell all of the instructions to use the new instructions instead. On some programs, though, this simple instruction seems to have compelled the code to eat up gigabytes of memory.
Fortunately, I am now proud to announce that I have reduced my memory consumption by 99%… with a one-line fix. So I guess I should get the achievement for "Fix the Memory Leak" as well?
Back in September, I wrote about what the first half of grad school was like for me. That post covers a period of about three years, from the middle of 2008 to the middle of 2011. I've been wanting very much to write the companion piece covering the second half of grad school, but so far, I've been hampered by the fact that most of the second half of grad school has not actually transpired yet.
Still, I'm willing to give it a shot!1
To recap from last time: while I was at Mozilla working on Rust last summer, my advisor invited me to come with her to Northeastern University for the rest of my Ph.D., which meant that I had to choose between IU and NU. It was an extraordinarily difficult decision to make. NU has a world-class group of PL researchers; living in Boston probably would have been cool; and, of course, I'd have gotten to keep working with Amal in person. But my husband, Alex, is also in the middle of his Ph.D., and while it hasn't been nonstop rainbows and sunshine for him here at IU, starting over at yet another school (assuming he could find a graduate program in Boston that was a good fit for him), and then probably having to take yet more courses and jump through more qualifying exam hoops, would be far worse. Living apart isn't a viable option for us, either -- we did that for a year and a half, and we're not interested in trying to do it again. (Empirically, if we're apart for more than a week these days, I go nocturnal and spend all my time either reading genre fiction or watching the same Lady Gaga video over and over. Bad scene.)
So, when I got back to school last August, I started working with a new advisor, Ryan Newton, who had just joined the faculty at IU. I'm now being advised by both Amal and Ryan, but Ryan is who I see day to day. I've spent a lot of time in the last several months telling Ryan about my work with Amal, and now that Ryan and I have a project going as well, I've started to do some of the reverse. Happily, I learn a lot in the process of doing so!
In order to get a better sense of the kind of research Ryan does, I took his course on embedded DSLs for parallelism in the fall. This was a new area for me to be reading in, and one of the papers that particularly grabbed me was about the Concurrent Collections system that Ryan had worked on while he was at Intel (abbreviated "CnC", as per some Intel marketing team). The paper had a proof of determinism for a parallel language called Featherweight CnC that's a lightweight model of full CnC. I noticed that the determinism proof hinges on a monotonicity lemma that says that if one state s in the semantics steps to another state s', then either s' is an error, or the store in s' is a valid extension of the store that s had. The monotonicity lemma, in turn, hinges on a single-assignment property, which states that we can only write to previously unused memory locations. It was the sort of thing that smelled like it might fit into a possible-worlds semantic model like the ones that I have some familiarity with from working with Amal. I was also curious about the analogy between the single-assignment property and the "weak updates only" requirement that some systems have, which requires that updates to a store can only change the value, not the type, of the contents of a cell -- and whether any of the tricks that allow loosening of the weak-updates-only requirement had analogues that would allow loosening of the single-assignment restriction.
When I went to talk to Ryan about that, it turned out that he had also been thinking about loosening the single-assignment restriction on single-celled I-structures, sometimes known as IVars, which are the communication mechanism that the Haskell monad-par package uses. If it's okay to write to a cell once, it seems like it should be fine to write the same value multiple times. But at that point, it would be interesting to try to loosen the notion of "same" and replace it with a less restrictive equivalence relation. "Values that unify with each other" is one possibility that Ryan brought up; for me, good ol' "values that are indistinguishable for a given number of steps" comes to mind. Also, what would happen if we parameterized a language by an equivalence relation for sameishness, lambda-eek-style? What properties would have to hold of that equivalence relation to make the language deterministic? (Wren sloganed this idea as "determinism modulo theories", which I like.) Or, what about only allowing writes of values that are monotonically increasing according to some partial order that we also parameterize the language by? And speaking of monotonicity: what do the notions of monotonicity that turn up in various models of deterministic parallel computation -- for instance, in monad-par, in CnC, and in Kahn process networks -- have in common? What would it look like to have a unified theory of monotonicity that's general enough to account for existing deterministic parallel models as well as guide the design of new ones?
By the time we were asking all these questions, it was late November, and Ryan suggested that we end the semester by writing a grant proposal. The NSF solicitation that he had in mind had a deadline of December 19th, so there was no time to waste, but we worked hard and got the proposal submitted on time. (Hooray!) "Generalizing Monotonic Data Structures for Expressive, Deterministic Parallel Programming" is the title of our proposal. It'll probably be a few months before we find out whether it will be recommended for funding, but one nice thing about being in theoretical CS is that there's no big outlay of funds needed to get a project off the ground -- the tools of our trade are pretty cheap.2 So we've been able to get started on the work anyway, and it's going to be my main focus for the next month.
In the meantime, the project that I've been working on with Amal for the last two years (!) is still underway. I made a trip to Boston back in October to work on it, and I'll be there again in mid-March. We were also able to sneak a little time to work on it during POPL last month. (While at POPL, which was awesome, I also had the chance to pick lots of clever people's brains about monotonic data structures, and I got some cool ideas that I'm still in the process of following up on and that I hope to have more to say about soon.) So, I'm still juggling separate research projects, but so far, I've mostly managed to work on each one for enough time at a stretch that the context-switching overhead doesn't kill my productivity. It also helps that I'm finally done taking classes, plus, this semester Ryan is funding me, so I don't have to work as a teaching assistant in order to pay the bills. So, all things considered, I'm very happy about how things are going! Although it's too early to say for sure, I think it's likely that I'll propose a thesis that has to do with the monotonic data structures project sometime in the next year or so, after we've submitted a paper or two. I'm excited about that, and I'm looking forward to seeing what the next couple of years will bring.
I'll end on that note, but first, one thing bears some explanation. Back in July, I was thinking I might want to develop an intermediate language for LLVM-based compilers (of which the Rust compiler is an example) as my dissertation work. Although that would be a cool project, it probably isn't the right project for me. I'm interested in doing more theoretical work. I probably won't be interested in doing more theoretical work forever, and in fact, I really enjoy doing heads-down compiler hacking, plan to do some more of it this summer, and could imagine myself doing a lot more of it after I graduate. I just don't particularly want to do it for my Ph.D.
If I'm completely, unflinchingly honest with myself, one reason that hacking on a Rust-related project came to mind as a potential thesis topic back in July was because right then, I was grappling with the decision between IU and NU, and some part of me fantasized that if I could get Mozilla to fund my research, then it wouldn't matter so much who my advisor or institutional affiliation was, and I could just avoid the decision entirely. Problem solved, right? Except, no, not at all, because the thing is that I would like to actually get a Ph.D. at the end of all this, and Mozilla, for all the cool things that it is, is not a degree-granting institution.3 Moreover, Mozilla would like me to actually get a Ph.D. at the end of all this, and as Dave Herman, my fantastic mentor at Mozilla, pointed out, they can only fund Ph.D. students for projects that the student's advisor is actively involved in, because otherwise, the student would essentially be doing contract work for Mozilla. That's not to say that there's anything wrong with doing contract work for Mozilla, of course, but you can't get a Ph.D. that way. An advisor is much more than a source of funding; an advisor is someone who guides their students through the grad school process and, along with the rest of a student's committee, eventually signs off on their Ph.D. So, although there are various ways to fund a Ph.D., to actually get one you really need to have an advisor -- or two.
Part of my job as a researcher is to try to make informed guesses about the future, right? If I wanted to really be cute, I could even title this post "The Next 700 Days of Grad School." The timing probably isn't even too far off.
Recently, I have had the misfortuneopportunity to fix bug 695309. This bug is perhaps a good exemplar of why "obvious" bugs take weeks or months to fix; it is also a good example of why just reporting that a bug occurs for the filer is insufficient to fix. In the hope that others might find this useful, I will explain in a fake liveblogging format how I fixed the bug (fake because I'm writing most of these entries well after they happened).
October 20
There's a bug that "Thunderbird sometimes marks entire newsgroups as unread" in my email. Time to open up the bug report… reported in version 8… I'm pretty sure I've seen this once or twice before, so I don't think it's just a "the news server borked itself" issue. Time to file it away until when I have more time.
Another comment reminded me that I have this tab open. I've definitely seen it a few times, but I need to remember to keep Thunderbird open with logging enabled to figure out what's going on. It's reported as being a regression from version 8, when I checked in a slew of the NNTP URI parsing patches, so that seems like a probable target to look at. The question is why URI parsing would be causing an intermittent problem instead of a constant issue?
Some NNTP logs. Nothing is obviously amiss (not terribly unexpected, since logging either tends to drown you in useless information or omit the things you really want to know). Note after the fact: the NNTP logs in fact contain the smoking gun; it's just that the poster trimmed it out.
December 6
Bienvenu comments that no developer has seen the problem; this isn't true, as I have seen it (but just taken to avoiding it as much as possible). At some point in time, I figured out that the issue could be avoided by shutting down Thunderbird before putting my laptop to sleep. After being prodded over IRC, I finally sat down and attempted to debug it. The working thesis is that the problem is that newsrc files are getting corrupted, so I faithfully save several copies of the newsrc for confirmation. The msf files are also generally good candidates, but since other flags are untouched, it's highly unlikely in this bug.
I successfully trigger the bug once. The reports are mixed: the newsrc didn't get trashed as expected at first, but later it did. However, there is a brief window of time after the bug happens which allows you to fix it, if you shut down Thunderbird. Knowing what I now know, this is because the newsrc file is written out on a timer, so the newly-all-unread messages wouldn't have been saved to disk to show the bug. Since I always think of what to log after the fact, I try to trigger it a few more times.
None of the later tests are quite so successful as the first one. It does start to dawn on me that the bug probably has two parts. There is a first step that puts the group into a pre-bug situation; a second step actually produces the symptoms of the bug. Omniscient addendum: the first step is where the bug happens; the second step is just the delay in getting the bug to occur.I report my findings, and subsequent comments all but confirm my hypothesis for a necessary component.
December 12
This bug seems to be happening for me only when I don't want it to happen. I thought of more things to test earlier, and had them ready to copy-paste into my error console to try to find a smoking gun. This test confirms that the fault originates with the newsrcLine (so something reset that); more investigation leads to only one truly likely scenario to cause this to happen. At this point, I am all but convinced that the bug happens in two parts. All of the debugging needs to focus on when the first part happens; the symptoms (everything being marked read) are a result of Thunderbird attempting to repair the damage that had already been done. Since most people are going to try to report based on when they see the problems, I'm probably not going to get anything useful.
December 13
Hey, I found it again. Tests confirm that the high water number was first set to 0. This means either we have memory corruption or we are setting the value incorrectly initially. Looking at the code, this is set from a call to sscanf. A simple test indicates that I can set the input value to 0 if I don't have it scan a number. Now all I need to do is figure out network traffic that can cause this. Time to try to trigger it with xpcshell tests. And then get frustrated because I still can't do it reliably.
December 18
With Wireshark (logging on Windows is rather annoying), I finally can track down the network traffic of interest. It turns out that we are off-by-one in terms of responses. This really should be enough to get xpcshell to report the error. However, it also means I probably should finally give up and go for the NNTP logs again to catch the error: it is painfully obvious that the problem is that the internal state is futzed up. This also means that other various newly-reported issues (more frequent auth failures, various alerts like "Not in a newsgroup", etc.) are pretty much confirmed to be the same bug.
January 9
I finally buckled down and turned my very old hackish code for NSPR log management into an extension. This means that I don't have to futz with environment variables on Windows, and it also gives me an easy way to trim down my log size (since unlike environment variables, I can stop logging). I test and finally get the NNTP log of interest.
Now that I have a log, I can try to work backwards and figure out how this bug happens. The failure is easily ascribable to somehow thinking we've already opened the socket when we open a socket; this much I have known for almost a month (the Wireshark told me). What the NNTP log gives me is a more fine-grained ability to try to trace the code after-the-fact to figure out where the bug is.
Working backwards from the log
NNTP logs helpfully record the address of the the NNTP protocol object when emitting output, so the best place to start is from the first line of that object. Since the numbers are hard to read, so I replace the addresses with a more obvious string like "evil." The next state was set to SEND_FIRST_NNTP_COMMAND—that clearly should be NNTP_LOGIN_RESPONSE, so let's see why it might be skipped. The latter is set if m_socketIsOpen is false, so perhaps someone could open a socket if it isn't set.
This variable is set only by nsMsgProtocol, specifically the LoadUrl method. So who might call that? Still nothing out of the ordinary is popping out at me, so it's time to return to the logs (A particularly astute developer might be able to find out the bug without returning to the log here, but I doubt most people would come up with the final flash of insight yet).
To figure out the problem, it's important (to me, at least) to set out the time frames. The natural sequence of events for a connection is to be created, connected to the server, and then go through several iterations of running URLs. Is this bad connection a new connection or an old one (since I started the log after TB started up, I can't guarantee that this is truly a new connection)? Thunderbird stupidly (but helpfully, in this case) tells me when the connection is created, since we get a log message of "creating" in the constructor. Since this message doesn't appear, it's being reused.
Wait—that message is there, tucked just underneath the lines that tell me LoadUrl is run. As I am often wont to do, I announce my revelation to IRC: "< jcranmer> bienvenu: please slap me". The code for the constructor is pretty simple, but there is one little function call that ends up causing the problem. If I tell you the exact revision that caused the regression (specifically, the first hunk of that the patch), can you find the regression? Click me if you can't
Reproducing the bug
Now that I know the exact sequence of events to cause the bug, I can write a test to reliably reproduce the bug. I can also explain why the symptoms occur. First, a connection's socket dies, but it doesn't fail the timeout check in mailnews code, so it remains in the cache. Next, the server schedules a URL on the connection, which promptly dies. If there are many more URLs to be run (when we're doing, say, a timed update), then we have several elements in the queue waiting to be processed. Now, we attempt to run a new URL; with no connections available in the cache, we now create a new connection and run the new URL on that. Since the queue is not empty, the SetIsBusy(false) call ends up pulling a queued URL and setting up the state (including opening up the connection) and running that. Then the constructor finishes, and the new URL for which the connection was constructed is used to initialize it. Since the connection is by now open, the code moves straight to the run-my-URL part of the state machine, which misinterprets the login response as the answer to the command it just sent. This ends up leading to all of the various badness subsequently observed.
The description of these events is pretty complicated. A condensed version is added to the test I constructed to get this to happen, which is called, fittingly enough, test_bug695309.js. I am not going to try to come up with a better name, as I think most people might agree that this is where naming tests after bug numbers is most desirable. Naturally, the code to actually trigger this is complicated: I started with code to trigger just the unread effect, and then switched to monitoring the highwater mark instead (one less download of messages). I need to fill up the connection cache and then figure out how to kill the connections (restarting the server seems to do an adequate job). After that, I need to run enough URLs to cause the connections to queue, and then trigger a load of the folder in question at precisely the right moment. All of these requires paying careful attention to what happens synchronously and what does not, and it all relies on knowing the precise steps of what will happen as a result of each action. Any of a small number of changes I could make in the future is probably going to "break" the test, in that it will cease to test anything meaningful.
Postmortem on fixing it
I'm sure some people might look at this bug and ask several questions. Let me try to anticipate some of them and answer them.
Why did the reviewers fail to notice this change?
This bug is a result of the confluence of several changes. In the beginning, SetIsBusy was mostly a no-op, so calling it from the constructor was safe. Then the URL queue was added, and SetIsBusy was used to indicate that the connection was ready to receive a new URL from the queue. This turned it into a time bomb, since the code was correct only because the server was null in the constructor and the queue should have been empty during connection construction anyways. The final change was moving initialization of the server, which triggered the time bomb. But even then, it triggered the bomb only in a rather abnormal circumstance. At no time would the code have failed any automated tests or any of the "does your patch work" tests that a reviewer normally goes through: robustness during connection interruption is both sufficiently difficult to test and rare enough that it generally doesn't get tested.
Why did this bug take so long to track down?
The time since December 1 is truly a result of lack of time: I have had maybe 1 good week to work on anything due to outside factors. There are two, maybe three, people in active participation with Mozilla who know this code well, and out of those, only one was able to somewhat reliably reproduce it (specifically, the one with the least amount of time on his hands). I could have relied on others to gather the information I needed, but my previous efforts in QA have taught me that getting people to give you necessary information is often very difficult, especially when the information collectable at the obvious point in time is completely useless. I realize in retrospect that there were some more technically inclined people in the bug (at which point in time I was already devoting as much attention as I felt I could spare on the bug anyways). The bug is also particularly pernicious in that the most valuable information is that which is gathered before any visible symptoms occur; it was after I figured out how to discover that the bug was going to occur that I could track it down further.
Will the fix be backported to more stable branches?
At this stage, I am looking at two fixes: the small one-liner that fixes the specific bug, and the larger patch which fixes the larger issue of "the NNTP URL connection queue is a ticking time bomb". The former I would definitely like to see backported to aurora and beta (and possibly release if there are plans for another one before the next uplift), and I can't see any reason a small patch on the most-highly-voted TB bug filed since 01-01-2011 would get rejected (it is the newest bug to have more than 5 votes, and the next largest has 1/3 less and is almost twice as old).
Why is this bug so hard to reproduce?
If you think to the cause of the bug, it is that it is passing a check in a function that is always called. It therefore seems like this bug should be triggered a majority of the time, and not an undependable minority. As I alluded to earlier, the problem is that this also has to interact with a buggy pending URL queue management system. In short, most of the time, this queue will be empty during the initial, buggy call; with a poor timing of events (less likely to happen on most developers' computers, in short), the queue will cease to be empty and cause the problem. Indeed, when crafting my test for reproduction, I discovered that letting the event loop spin in one particular location (after killing connections) would fail to trigger the bug. In the real world, that is one of the places where the event loop is most liable to spinning for the longest.
If I remember right, today is my two year anniversary working full time at Mozilla. And it works out to about six years of working with Mozilla and TC39. I could stop and get sentimental, but there’s work to do.
This feels a bit late to be talking about the 2011 LLVM Developers' Meeting (seeing as how it happened almost exactly a month ago), but since the slides have been put up over the past week and the talks were only put up on Youtube this week, I suppose I can finally back notes up with links and not talk so abstractly.
Of the talks I went to, I think by far the most interesting one was Doug Gregor's talk on Extending Clang. It covered various extension points in Clang and some of their capabilities with simple examples. It also ended in what might be impolitely titled "Where Extending Clang Sucks" and what was politely titled "Help Wanted." This boils down to "plugins are hard to use" and "one-off rewriters are hard to write". Indeed, I think the refrain of problematic architecture for further tools was repeated in more than a few presentations: Clang has the information you want and need, but squeezing the information out is inordinately difficult.
What was probably the most popular talk was Chandler Carruth's talk on Clang MapReduce—Automatic C++ Refactoring at Google Scale (his slides appear to not yet be posted). From what I recall, the more interesting parts of the talk are closer to the end. He had a discussion on developing a language for semantic queries to identify code that needs to be replaced (the example query was essentially "find all calls to Foo::get()"); the previous things people have tried are regexes (C++ is not syntactically regular, it's recursively enumerable), XPath (unfortunately, ASTs, despite their name, aren't exactly tree structures), and pattern matching ASTs (not always sufficient textual clues). The team's idea was to use a matcher library on "AST" values as predicates. He also spoke a bit about some of the efforts they did on using Clang with Chromium.
There is a point brought up in Chandler's talk that I want to reiterate. One of the problems with writing refactoring tools (or static analysis tools in general) is in getting people to trust that they work. For any sufficiently large codebase—millions of lines of code or more—it is fairly certain that the project will run up against the nasty edge cases in parsing tools. If a tool is intended to run mostly automated analysis on such code, it is impossible for anyone to be able to look at the output and ensure complete correctness. However, most sufficiently large codebases come with massive testsuites to help people believe their code is correct; if the same parser is capable of producing code that passes the entire testsuite, then one only needs to trust that the analysis is done correctly, a much smaller task. Without such a capability, no one would be willing to trust the analysis; in other words, any parser which is not capable of then generating code is never going to be trusted as a basis for further work, at least, not if you are looking at million-line projects.
Speaking of Chromium and Clang, there was talk on this too. As I am sure most of my audience is well aware, Chromium uses Clang for several of its buildbots (and all of its recent Mac development); I wish Mozilla could get Clang to be a tier 2 or tier 1 platform for Mac and Linux. As for why they prefer it, the brief rundown is this: better diagnostics, faster, smaller object sizes, they can write a style checker, and they can build better tools off of it (like AddressSanitizer, which, unsurprisingly, itself had a talk). Again, there were the complaints: building the rewriter proved to be difficult a task (notice a trend here?). Incidentally, I also attended the AddressSanitizer talk, but I'm getting tired of copying down notes of talks, so I'll let those slides and the video speak for themselves.
Finally, I did give a talk on DXR. It seemed to be well-received; I had a few people coming up to me later in the day thanking me for the talk and asking more questions about DXR. Something I did discover when giving it, though, is just how difficult giving a talk really is. I had specific notes on the slides written in my notebook, only to discover that I wasn't able to gracefully retreat to the podium and read the notes long enough to figure out what to say next. If you're wondering what I forgot to mention in the talk, it's mostly a more coherent explanation during the undead demo.
The topic of coroutines (or
fibers, or continuations) for JavaScript comes up from time to time,
so I figured I’d write down my thoughts on the matter. I admit to
having a soft spot for crazy control-flow features like continuations,
but they’re unlikely ever to make it into ECMAScript. With good
reason.
The big justification for coroutines in JavaScript is non-blocking
I/O. As we all know, asynchronous I/O leads to callback API’s, which
lead to nested lambdas, which lead to… the pyramid of doom:
That looks pretty sweet. It’s a synchronous version of setTimeout
that doesn’t block the main thread. This seems like a nice combination
of the sequential style of synchronous code but with the
responsiveness of non-blocking I/O. Why wouldn’t we want something
like this in ECMAScript?
Coroutines are almost as pre-emptive as threads
Part of the beauty of JavaScript’s event loop is that there’s a very
clear synchronization point for reaching a stable state in your
programs: the end of the current turn. You can go ahead and leave
things in a funky intermediate state for as long as you like, and as
long as you stitch everything back up in time for the next spin of the
event loop, no other code can run in the meantime. That means you can
be sure that while your object is lying in pieces on the floor, nobody
else can poke at it before you put it back together again.
Once you add coroutines, you never know when someone might call
yield. Any function you call has the right to pause and resume you
whenever they want, even after any number of spins of the event
loop. Now any time you find yourself modifying state, you start
worrying that calling a function might interrupt some code you
intended to be transactional. Take something as simple as swapping a
couple fields of an object:
What happens if munge does a yield and only resumes your code
after a few other events fire? Those events could interact with obj,
and they’d see it in this intermediate state where both obj.foo and
obj.bar are the same value, because obj.bar hasn’t yet been
updated.
We’ve seen this movie before. This is just like Java’s threads, where
any time you’re working with state, you have to worry about who might
try to touch your data before it reaches a stable point. To be fair,
life is actually far worse in Java, where almost every single basic
operation of the language can be pre-empted. But still, with
coroutines, every function call becomes a potential pre-emption point.
Host frames make coroutines unportable
And then there’s the implementation problem. Unless your JavaScript
engine doesn’t use a stack (and they all do), coroutines would have to
be able to save a stack on the heap and restore it back on the stack
later. But what if JavaScript code calls into code implemented in the
host language (usually C++)? Some engines implement functions like
Array.prototype.forEach in C++. How would they handle code like
this?
Other languages with coroutines take different approaches. Lua allows
implementations to throw an error
if user code tries to suspend host activations. This would simply be
unportable, since different engines would implement different standard
libraries in C++.
The Scheme community tends to demand a lot from their continuations,
so they expect functions like for-each and map to be
suspended. This could mean either forcing all the standard libraries
to be self-hosted, or using more complicated implementation strategies
than traditional stacks.
Simply put: browser vendors are not going to do this. Modern JS
engines are extraordinary feats of engineering, and rearchitecting
their entire stack mechanism is just not realistic. Then when you
consider that these changes could hurt performance of ordinary
function calls, well… end of discussion.
Shallow coroutines to the rescue
OK, back to the pyramid of doom. It really does kind of suck. I mean,
you could name and lift out your functions, but then you break up the
sequential flow even worse, and you get a combinatorial explosion of
function arguments for all those upvars.
This is why I’m excited about
generators. Generators
are a lot like coroutines, with one important difference: they only
suspend their own function activation. In ES6, yield isn’t a
function that anyone can use, it’s a built-in operator that only a
generator function can use. With generators, calling a JS function is
as benign as it ever was. You never have to worry that a function call
might yield and stop you from doing what you were trying to do.
But it’s still possible to build an API similar to node-fibers. This
is the idea of task.js. The
fibers example looks pretty similar in task.js:
The big difference is that the sleep function doesn’t implicitly
yield; instead, it returns a
promise. The
task then explicitlyyields the promise back to the task.js
scheduler. When the promise is fulfilled, the scheduler wakes the task
back up to continue. Hardly any wordier than node-fibers, but with the
benefit that you can always tell when and what you’re suspending.
Coroutines no, generators yes
Coroutines are not going to happen in JavaScript. They would break one
of the best features of JavaScript: the simplicity of the event loop
execution model. And the demands they would place on current engines
for portability are simply unrealistic. But generator functions are
easy to add to existing engines, they have none of the portability
issues of coroutines, and they give you just enough power to write
non-blocking I/O in a synchronous style without being “threads lite.”
I’m joining the throngs of programmer-bloggers using Octopress for my new blog.
There’s so much to commend about it, but it really comes down to one thing:
Programmers should be able to write their blogs in text editors.
Also, from now on, everything I ever do in my life should be in GitHub.
Chris Williams makes a moving plea for an end to negativity, meaning trolling, flaming, mocking, and hating in online media.
This sounds utopian, like “an end to history”. But it is good as an aspiration, a constant reminder, since we’ve all seen how many people tend to be more negative online than they are in person. This isn’t just a matter of isolated individual behavior, free of cultural feedback loops. The new media reinforce tribalism.
However, it is hard to be positive about somethings. I will persevere….
I would like to single out Alon Zakai‘s Emscripten talk. Emscripten is an LLVM-to-JS compiler, which means it enables compiling C, C++, and Objective-C (and other languages with LLVM front ends) to JS. What’s more, interpreters written in C for Python, Ruby, and Lua have been compiled and hosted on the web.
Alon’s results are impressive, with lots of room for more wins. At JSConf.eu, jaws dropped and eyes were opened.
For my talk, I reprised some CapitolJS material, including the RiverTrail demo, which won loud and enthusiastic applause when I clicked on the “Parallel” button.
(A few people asked afterward about whether the graphics was running on one of four cores. I’ll repeat the answer here: the particle system demo uses WebGL targeting the GPU for rendering, and the four CPUs’ vector units for n-body solving. All from deadlock-free, data-race-free, seemingly single-threaded JS.)
Here’s the video of my talk:
The amazing Anna Lena Schiller created infographics for all the talks, on the spot — a truly impressive display of concentration and stamina. Here’s the one she did for my talk:
And here are the updated and new slides I presented, showing ES6 work-in-progress (none of it final, so don’t panic) and covering some current controversies.
From recentes-discussmessages, I’m afraid that classes are on their way out of ES6. This seems a shame, and avoidable. In hindsight, we did not have all class advocates working in concert on the hard issues last year and earlier this year. But we also do not agree on what’s required for ES6, and some on TC39 view minimizing as future-hostile.
To be blunt, we lost some “classes” advocates who work for Google to Dart. Others at Google on TC39 seem to want more out of ES6 classes than even Dart guarantees (see the future-hostile point above).
I’m not slamming Google as a company here, since it does still support people working on JS in TC39. I respect the people involved and believe they’re for the most part making their own choices. But Dart and other unrelated Google agenda items do impose clear and significant opportunity costs on Google’s standards actiivities.
To remain positive per “An End to Negativity”, I’ll simply conclude that we TC39ers should pay attention to Dart now that it is out, even though we’ve lost time and potential contributions.
The famous Tony Hoare quote that Bill Frantz cited, which argues for deferring classes, is this:
When any new language design project is nearing completion, there is always a mad rush to get new features added before standardization. The rush is mad indeed, because it leads into a trap from which there is no escape. A feature which is omitted can always be added later, when its design and its implications are well understood. A feature which is included before it is fully understood can never be removed later.
From C.A.R.Hoare’s 1980 ACM Turing Award Lecture
I agree with Erik Arvidsson that “[b]y not providing [class] syntax we are continuing to encourage a million incompatible ‘class’ libraries.” I’m with Erik: I would still like to see TC39 agree on minimal classes. But not at any cost.
Onward to new proposals with sometimes-tentative syntax. I’m continuing to “live in a fishbowl” by showing these proposals, even though doing so risks drive-by misinterpretation that we have finalized the sum of all proposals.
So, please don’t freak out. Not all of this will make it as proposed. We may also make cuts. But it’s important to address the use-cases motivating these proposals, take in the fullness of the problem space and potential solutions, and do the hermeneutic spiral.
Apart from font issues that make <| look lopsided or non-triangular, this proposal looks good. It replaces the main legitimate use-case for assigning to __proto__: presetting the prototype link in an object literal.
Unlike Object.extend, .{ copies only “own” properties from its right-hand-side object literal, and (this is a crucial difference) it also copies properties with private name object keys (which are non-enumerable by definition). For example, base.{[privateName]: value, publicName: value2} given a private name object reference denoted privateName in scope.
Design patterns point to programming language bugs. Nevertheless, this class pattern shows clever work by Allen Wirfs-Brock, decomposing classes-as-sugar into chained operator expressions. It’s still a bit verbose and error-prone in my opinion, and cries out for the ultimate sugar of minimal class syntax (if only we could agree on that).
Much of the Dart class syntax design looks good to me. Possibly TC39 can agree to adopt it, with necessary adjustments. It would still be sugar for constructors and prototypes.
Arrow function syntax faces an uphill battle due to the combination of TC39′s agreement to future-proof by having an unambiguous LR(1) grammar (after ASI and with lookahead restrictions); mixed with the comma expression, (a, b, c), which I copied into JS’s grammar straight from C (not from Java, which left it out, instead providing comma-separated special forms in a few contexts, e.g. for(;;) loop heads). You can’t have both, and we do not want to remove the comma expression in Harmony.
I’m quite in favor of block-lambdas, and they meet formal approval from TC39′s strictest grammarian. Some still object to them as an alien DNA injection from Ruby and Smalltalk, both syntactically and (with Tennent Correspondence Principle conformance regarding return, break, continue, and this) semantically.
At this point, ES6 has no shorter function syntax. This seems like a loss, and fixable, to me. Your comments welcome, especially if they make novel distinctions that help forge consensus.
During the talk and Q&A, I recounted how the WHAT-WG was created to counteract a standards body gone wrong (the 2004-era W3C). I then raised the idea of a community-based group, a “JS-WG”, to augment the much healthier but still under-staffed Ecma TC39 committee.
Besides floating more ideas (really, the point is not to bikeshed endlessly or take in too many proposals to digest), a JS-WG worth organizing might actually develop draft specs and prototype implementation patches for JavaScriptCore, SpiderMonkey, and V8. The maintainers of those engines could use the help, and with patches and patched builds, we could scale up user testing beyond what’s in the cards now.
I know it’s hard to believe, but people are finally realizing that with V8 prototyping alongside SpiderMonkey, ES6 is happening. It’ll be prototyped in pieces. I hope many will be “on by default” (e.g., not under a flag in Chrome) well before the new edition is standardized (end of 2013). That’s how we roll in Firefox with SpiderMonkey.
My journey to become a contributor to Mozilla started during the summer of 2007, which was when I started very heavily using Thunderbird. The initial impetus is probably best traced to this thread in a newsgroup: a long, semi-flame war. I wanted to excise the flame war portions from the rest of the thread, so I filed bug 392404 (the first bug I ever filed!). That bug was marked as a duplicate of bug 11054.
At that time, I had a fair amount of free time (this was around the time my summer internship ended but before school started). I downloaded the source code to Thunderbird and then built it, only to discover that every time I built it, it crashed in a linker. After bugging people about it on IRC, I learned that I needed a swap file, and I finally managed to get a successful build. Then, having had programming experience, I decided just to fix the bug myself. I honestly can't remember for certain, but I believe I built the original patch without much aide from people on IRC--it was only the inability to build that I had needed to ask for help.
I do remember that I needed guidance on how to get a patch committed. This being mailnews code around the time that Mozilla Messaging was being set up, the pool of potential reviewers and superreviewers was rather slim. I had been advised that David Bienvenu and Scott McGregor were my best candidates for review. Unfortunately, around this time, Scott had cut off all work with Mozilla; after two months of not getting any response from him whatsoever, I switched to another reviewer.
While I was waiting for the patch to be reviewed, I recall Dan Mosedale talking about bug 132340 indirectly; later, when I asked if there was a fix that people would be interested in, that bug was what I was pointed towards. This patch took about two months and four review cycles to be accepted, but it eventually satisfied my reviewer and superreviewer and became my first contribution to Mozilla. That first patch I worked on? It turned out to be significantly more complicated than I first appreciated and finally was closed 9 months after I started work on it. After bug 132340, I started getting more and more involved with Mozilla development and have been a contributor since then.
Click this link. It’s one of the best overviews of architecture/strategy differences between Google, Amazon, Facebook, and Microsoft. I also think it is a good approximation to the differences between Firefox and Webkit as platforms.
My colleague Jamie already broke the news a while ago, so I guess there's nothing stopping me: my adviser1, Amal, has left Indiana University and joined the faculty at Northeastern. I've sat down to try to write about that a dozen times since Amal told me the news in mid-June, but it appears that my attempts have to do so have instead yielded a 4000-word screed about how my Ph.D. career had been progressing up until about mid-June.
I'd like to share that story with you, but before we start, a warning about me. Some people manage to keep dedicated research blogs that they keep entirely separate from wherever it is, if anywhere, that they write about goings-on in their personal lives. I am not one of those people. I love to write about research, but if I have a research-related epiphany, not only am I going to tell you what it is, I'm going to tell you which metaphorical or actual bathroom stall it happened in. At length. I can do pithy, too, but if "pithy" is what you're after right now, I think you're in the wrong place, friend.
Okay; warning imparted -- off we go, then!
What I usually tell people is that for the last three years, I've been a grad student in the programming languages group at my school. That isn't quite true. In fact, I joined IU without being a member of any research group. In my first year, I had a hooray-you're-a-girl fellowship, so I didn't need to teach or do research in order to have funding; in effect, that meant that I was able to treat the first part of grad school as an extension of undergrad.2 All I had to do was take classes, and that was something I already knew how to do, so I did pretty well at it. I finished my first semester with excellent grades and a feeling of smug superiority to everyone I'd ever heard say that grad school was hard.
Then, Dan asked me to help teach his course during my second semester. My fellowship would have supported me until the end of the year, so I didn't have to teach anything. But I'd loved taking the course in the fall, and I was flattered to be asked to help out, so I went for it. Dan's student at the time, Will, was deeply into dissertation mode, so working for Dan meant a lot of hanging around Will as he worked on miniKanren. That was how I began to dip a toe into research, which is, I'm sure, what Dan had intended to have happen all along. For Dan, teaching and research are inextricably linked. Even just the teaching part by itself was a challenge for me, though, and I was doing it on top of my normal course load, which was harder than the previous semester's, since now all the courses I was taking were covering material I hadn't seen before. I stopped feeling smug pretty quickly, and it wasn't too long before I was really getting killed.3
Halfway through the spring, Dan invited me to spend the summer doing research with him and Will, and I very nearly declined the invitation, because I was exhausted and desperately needed a break. But I wrote to John Stone for advice, and thankfully, he talked me out of saying no. In May, I declared victory over the bloody remains of the semester and stuck around to hang out with Dan and Will over the summer. I think that it was at that point that I stopped treating grad school like Undergrad, Part Deux and started really attempting to be a researcher.
The summer was a mixed bag, though. I got a first publication out of it, and even an attempt at a second, plus quite a lot of free pizza. But I never felt like I knew what the hell I was doing, and now I can put my finger on at least one of the things that was wrong. In a theoretical field like ours, I think that reading is one of the most important parts of doing research, and I wasn't doing nearly enough reading. When I was working on our FLOPS submission, I didn't understand that citing someone else's work should mean not only that you've read and understood the relevant part of that person's work, but also that you've surveyed the rest of the nearby research landscape and come up with a reason why that particular paper, rather than some other one, is the right one to cite for your purposes. You might have to look at several papers for every one you actually end up citing. I was citing all kinds of stuff I hadn't really read. Moreover, because I hadn't read enough papers, I didn't have a sense of what a good research paper was supposed to look like or sound like or be. That all goes a long way toward explaining why the FLOPS submission was rejected. At no point had anyone sat me down and said, "You want to be a researcher? You need to learn to read papers if you want to learn to write them."
Amal joined the faculty at IU that fall, just as my second year was starting, and I took her type theory course and loved it. We began working on research together in the spring, and it started to become clear that working with Amal was a good fit for me. Part of the reason I didn't do very much reading when I was working with Dan is that Dan's modus operandi with students is to give them fiendish puzzles to solve -- koans, you might say -- and then actively discourage them from reading anything that might give away the answer, because that would deprive them of the learning experience that comes from figuring it out for themselves. For a certain kind of learner, I'm sure that that approach leads to enlightenment. For me, as a first-year grad student, it just led to frustration. But Amal was different: she gave me piles of stuff to read! I took two more courses from her during spring and fall 2010, both research seminars, and it was in those courses that I finally started to read a lot of papers. Some were more relevant to me than others, but they all filled the need of getting me to just read a whole lot of PL papers, so that I would know how to read papers and therefore be prepared to absorb information from sources that were relevant.
And "prepared to absorb information" was what I needed to be. The project that Amal had suggested for us to work on was an offshoot of research that she had done with Jacob Matthews in 2008. They had published a paper that dealt with how the parametricity property could be made to hold for a language that was partly statically typed and partly dynamically typed, where the two languages were combined using a Matthews-Findler-style multi-language embedding. Having dynamically typed code in the mix would ordinarily mean giving up on parametricity, but their multi-language system used a technique called dynamic sealing that forces programs to behave parametrically at run-time, or die trying. Dynamic sealing wasn't a new idea, but Jacob and Amal were the first to present a proof that it worked as advertised to preserve parametricity. The only problem was that their work had turned out to have a flaw in it that invalidated the proof.4 But Amal had an idea for fixing the problem, and the plan was that we'd fix it, redo the proof, revise the paper to go along with it, and then publish the updated paper as a journal article together.
The Brian Kernighan quotation goes, "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." This is true of math as well, and the flawed proof that we were to fix employed a technique that I would go so far as to call not only clever, but actually kind of sneaky. (All the best math is sneaky.) When we started on the project in January 2010, Amal thought we'd have it done by May. What actually happened, of course, was that it took until May for me to get enough background knowledge to be able to even start the project. I had only just started studying type theory at all, and I'd only had the briefest of encounters with logical relations as a proof technique, let alone the step-indexed logical relations that this proof used. I couldn't even do an SILR proof, much less debug one. Nor did I understand what a multi-language embedding was, how dynamic sealing worked, or even what parametricity was really all about. So I had a long way to go before I was finally ready to start working on the project in May.
According to the commit logs, I then worked on it from precisely May 1 to May 19, at which point I scuttled off to my summer 2010 internship at GrammaTech and proceeded to not look at the thing for four months. Working at GT was great, but when I came back in the fall after my internship ended and tried to pick up where I'd left off on our project in May, I had a hard time remembering what any of it really meant or where we were headed with it. I flailed a bit, thought a lot, wrote down some notes, and thought I'd gotten a decent handle on things again. But then I went to ICFP in Baltimore at the end of September, and Derek Dreyer asked me a very simple question about the project that I should have been able to answer but couldn't, and I wanted to curl up and die right there under the buffet table.
So I got to work. I worked pretty hard on the project for the rest of that semester -- largely made possible by the fact that I was finally, in my third year, being paid to do research rather than being paid to teach5 -- and I slowly and gradually started to understand what the hell was going on.
In our multi-language system, parametricity on the statically typed side is enforced the way it's usually enforced: to wit, statically, by the type system! But on the dynamically typed side, it's enforced dynamically. A seal is nothing more than a unique value, generated at run-time by gensym or its equivalent. If you want to hide the number 5, you generate a unique seal, then wrap the 5 in a function that takes a seal as argument and compares whatever seal was passed in to your uniquely generated seal, which the function has stored in its closure. If the function's caller knows the correct seal and passes it in, you'll get the 5 back; otherwise, you'll get an error. It's possible to get information hiding, and therefore parametricity, by creating a seal in a private scope and using it to seal values before handing them out to clients, then unsealing them when they come back. And if dynamically typed code is embedded inside statically typed code, it can use this dynamic sealing mechanism to wrap values that came from the static side and had abstract types that were hidden behind type variables over on that side. Thus parametricity can be enforced even in the part of the system that isn't statically typed. Run-time is a rather strange time to be trying to enforce parametricity, but nevertheless, that's when we were doing it. So if we wanted to show that parametricity was actually being enforced, our proof needed to somehow account for the run-time semantics of our language.
What was needed was a sort of run-time environment that could capture information about the seals that had been generated so far. Then we needed to develop an equivalence relation on terms that was parameterized by just such a run-time environment. Such a relation turns out to be what is known as a Kripke logical relation parameterized by possible worlds.6 As it turns out, step-indexed logical relations are already a special case of Kripke logical relations, because a step index (which is just a natural number, so called because it represents the number of steps remaining for future execution of a program) can be thought of as a very low-information form of "world". Actually, for a lot of situations, that step index turns out to be all you need, so part of the contribution of our paper is to illustrate a situation where a step index alone is not enough. In fact, the reason the original version of the proof was broken was because it tried to use step indices alone. We needed to fix it by using richer worlds that capture information about which seals are generated, when they're generated, and which compile-time entities (in particular, type variables on the statically-typed side of the language) the generated seals correspond to. That's what I finally came to understand after the better part of a year.
The very end of the semester was rough, because I unexpectedly also had my oral qual exam to study for. A few days before the exam, everything that I had been working on and thinking about with regard to both quals and research culminated in a breathless middle-of-the-night email to Dan in which I explained how the step-by-step derivation of the Y combinator in The Little Schemer is precisely analogous to Appel and McAllester's (step-)indexed model of recursive types.7 Dan wrote back, "Thanks so much for sharing this with me. Amal explained it to me, but I did not see the relation with fixpoints. Your description is quite clear because I can relate to types and chapter 9 and much of Davey & Priestly. I think that D&P might hold some insight into how to make step-indexing more beautiful. What a very pretty model." That made me pretty happy, but then, one day later, Stevie sstrickl Strickland and Aaron Turon made me even happier by inviting me to come give a talk at Northeastern's PL seminar in Boston. At that point, I had just committed to starting at Mozilla in March. (Oh, yeah, I was somehow also interviewing at Mozilla while all of this was going on. Oh, and also, getting engaged. Come to think of it, those things may have also had a little to do with the happiness.) So if I were going to do the talk in Boston, it would have to be in February at the latest. In December 2010, February 2011 seemed like a very long time away, so I said, "Sign me up!"
I took the qual exam in mid-December, and I passed, but I didn't make my best showing. I knew that with quals, the only thing that matters is whether you pass. Still, I wanted another chance to really show Amal what I had learned. I knew I'd learned a lot, but we'd been working together for a year and I still had no concrete results to point to other than a shaky paper draft, a half-finished proof, and a predilection for sending spastic emails to Dan in the middle of the night. So I decided that my talk at Northeastern needed to be about the multi-language parametricity project, even though it still wasn't done, and even though Amal was going to be on leave starting in January and wouldn't be around to help much.
I kept working on our proof through January and the early part of February, and I got most of the interesting cases done. It actually turned out to be a good thing that Amal wasn't there to answer my every question, because I learned more from wrestling with the proof on my own and digging up relevant references for myself. Then, in mid-February, I put the proof aside -- still not done, but closer! -- and started working on the talk. I worked really hard on that thing. I probably spent 40 hours working on the slides alone, if you count both times I overhauled them after confronting parts of the project about which my own understanding was fuzzy. One might have thought that working on our proof for so long would have forced me to have confronted those things already, but, embarrassingly, enough of this kind of work is mindless symbol-pushing that it's possible to get quite a long way on a proof without actually understanding some key idea. So it was a good thing that I was choosing to give the talk about this project -- because it made me understand my own project a lot better! I ended up going to Boston at the end of February feeling quite well prepared. The talk went well, and I got a wonderful message from Aaron afterward: "Your talk was one of the most enjoyable and comprehensible in recent memory. I particularly liked your intuitions on the duality between universal and existential types." He was referring to the beginning part of the talk, in which I'd tried to touch on What Parametricity Really Means And Why We Care About It In The First Place. I'd written that part with no guidance from Amal at all, so I was especially glad that Aaron liked it!
Another good thing about working on the talk was that it made me think critically about the way that Amal and I had organized our paper. The paper makes a series
of false starts in which it sneaks something that won't work past
the reader, but then turns around and says "Hah, tricked you! We can't
do it that way. Let's do it this other way." That style of organization didn't seem to work for the talk, so I did it in a different way; it builds up the work in
stages, but doesn't set out to fool the audience. After having written the talk that way, I'm tempted to change the paper to a similar style. I don't know if we actually will, but the point is that before working on the talk, I wouldn't have had enough perspective to be able to make meta-level criticisms of the paper. I feel a lot more comfortable doing so now.
After getting back from Boston, I went on vacation with Alex for a few days; then, I almost immediately moved to California and started working for Mozilla on Rust. Working on Rust didn't quite start out in the way I had expected. The job posting that had originally attracted me to Rust had mentioned its type-and-effect system, which was supposed to track mutability of values in Rust. The Rust team, the posting said, was interested in having a soundness proof of a simple model of Rust as reassurance of the cohesiveness of the type-and-effect system design. I thought that my familiarity with Kripke models would serve me well in working on such a thing, since one of the things Kripke models are good for is reasoning about type safety in the presence of mutable state. (Our parametricity project doesn't deal with mutable state as such, but seal generation is a stateful notion akin to storing a value in a new memory location.) By the time I arrived at Mozilla in March, though, the type-and-effect system was already on its way out of the language.8 So I needed to figure out what I was going to work on instead.
After flipping through a draft of the language manual, I was intrigued by Rust's classless, prototype-based object system with flexible, extensible objects -- pretty unusual for a statically typed language. I asked if the prototype-based stuff was for real, and was told, "Oh, most of that isn't implemented yet; you should implement it!" At that point, Rust had the beginnings of an object system: it was possible to create an object that had methods and fields and call those methods or access those fields from the methods, but objects weren't extensible, methods couldn't be overridden, and there was no form of self-dispatch, so methods couldn't call other methods on the same object except by way of a lengthy workaround. In fact, there wasn't even a "self" (or "this") keyword in the language at that time. So I spent my spring and summer implementing pieces of the Rust object system and learning that those pieces interact with each other in interesting ways. It was a great experience, and I learned a hell of a lot. By the end of the summer, I was also contemplating a thesis proposal idea based on Rust, but now I'm jumping ahead -- more on that will have to wait until the second half of this story.
I was originally only going to be at Mozilla for the summer, but Dave Herman had asked if I could show up any earlier than May, and there was nothing really stopping me, since Amal was going to be on leave anyway. I'm very glad that I arrived at Mozilla in March, because it meant that I had more time to work on my project -- I really needed all five of the months I had. But, moreover, it meant that I got to be there in April when the compiler went self-hosting -- first, the first time that the compiler written in Rust compiled itself, and then, a few days later, the first time that that compiler successfully compiled itself, producing a fixpoint. That was one of the cooler moments in my career so far -- so cool that I got the chills.
And that brings us more or less up to mid-June. I was halfway through my internship when I got the news from Amal, and the timing of it divides grad school in half rather neatly, too. So, sometime soon, I'll write about what's happened since then, and about my plans (and dreams!) for what the second half of grad school will be like.
I still spell "adviser" in the Grinnell style much of the time. If it's good enough for the Firefox spell-checker, it's good enough for me. (On the other hand, the Firefox spell-checker doesn't approve of "parametricity", so.)
An external fellowship can be a bad thing for precisely this reason. As Tim points out, "Being employed as a research assistant for a specific professor or research group integrates you socially and binds you to a commitment to deliver a particular kind of results -- a commitment that motivates you to finish your task by any means necessary, including collaborating with others. Having a fellowship empowers you to fuck around for almost three years and never get called on your shit." To be fair, though, the one-year fellowship I had did me a lot of good, because when I arrived at grad school, my formal CS education comprised precisely eight courses, instead of the twelve or sixteen that most people take before entering a CS graduate program. (Or more. At one point, I asked Alex oniugnip to count how many CS courses he took at Georgia Tech -- he took twenty-two as an undergrad alone.) So the fellowship turned out to be a good thing, because at the beginning I really needed some time to concentrate on taking classes. But I think I'd be worse off had it gone on any longer.
It also didn't help that I kept going off to attendworkshops that were only vaguely academic. I remember walking across campus at some point in April of 2009, trying to find a place where I could sit and work on my compiler. I couldn't decide where to go, so I thought to myself, "Where have I consistently been able to get a lot of work done on this project?" Then I realized that the answer was "SFO". When you live in Indiana, that's a strange realization to have.
Neis, Dreyer, and Rossberg writing in JFP this year about the Matthews and Ahmed 2008 paper: "A key technical difference is that their formulation of the logical relations does not use possible worlds to capture the type store (the latter is left implicit in their operational semantics). Unfortunately, this resulted in a significant flaw in their paper. They have since reportedly fixed the problem -- independently of our work -- using a technique similar to ours, but they have yet to write up the details." We're working on it!
Or to be female.
Actually, this is redundant, since as far as I can tell, authors in my field use "Kripke logical relation" interchangeably with "logical relation that is parameterized by possible worlds". (After squinting at the relevant Wikipedia talk pages, I see that not everyone is happy about the terms being used interchangeably, but I'm a computer scientist, not a logician, so for me, that's splitting hairs.) In her dissertation, Amal points out that Kripke models are named in honor of logician and philosopher Saul Kripke (you know; the Naming and Necessity guy), citing the 1963 paper in which he developed a model theory for propositional modal logic. But for the most part, PL authors don't seem to cite Kripke when they use Kripke LRs. From the point of view of a PL researcher, his work was general enough and long enough ago that I think citing him would be kind of like citing Alonzo Church every time you write a lambda. (Although that would be kind of awesome.) In fact, there are two POPL 2011 papers that actually mention Kripke LRs in the title, but neither cites Kripke at all.
And, more generally, that the notion of stratification in domain theory is analogous to the notion of stratification in Kripke models. Up until that point, I'd thought of domain theory, if I thought of it at all, as some difficult and mysterious thing that we didn't have to use because we used Kripke models instead. Now I recognize domain theory and Kripke models as sharing an essential ingredient.
It turns out, though, that some of the ideas that the type-and-effect system was about are working their way back into the language by way of the kind system.
I took time away from the Mozilla all-hands last week to help out on-stage at the Intel Developer Forum with the introduction of RiverTrail, Intel’s technology demonstrator for Parallel JS — JavaScript utilizing multicore (CPU) and ultimately graphics (GPU) parallel processing power, without shared memory threads (which suck).
Then over the weekend, I spoke at CapitolJS, talking about ES6 and Dart, and demo’ing RiverTrail to the JS faithful. As usual, I’ll narrate my slides, but look out for something new at the end: a screencast showing the RiverTrail IDF demo.
I had a lot to cover in a half-hour (a good talk-time in my view — we’ll see how the video comes out, but I found it invigorating). CapitolJS had a higher-than-JSConf level of newcomers to JS in attendance, so I ran through material that should be familiar to readers of this blog, presented here without much commentary:
The meaning of my color-coding should be intuitively obvious .
Here I must add the usual caveat that “ES6″ might be renumbered. Were I more prudent, I’d call it “ES.next”, but it’s highly likely to be the 6th edition, and you’re all sophisticated close readers of the spec and its colorful history, right?
The @ notation is actually “out” for ES6, per the July meeting (see notes). Thanks to private name objects and object literal extensions (see middle column), private access syntax is factored out of classes. Just use this[x] or p[x], or (in an object literal for the computed property name, which need not be a private name) [x]: value.
As close to CoffeeScript as I can get them, given JS’s grammar and curly-brace (not indentation-based) block structure.
Ruby-esque, Smalltalk is the grandfather.
You know you want it.
I’ve been clear about preferring block-lambdas over arrows for their added semantic value and compelling syntax mimicking control statements. It’s good to hear @jashkenas agree in effect.
@mikeal is right that JS syntax is not a problem for some (perhaps many) users. For others, it’s a hardship. Adding block-lambdas helps that cohort, adds value for code generators (see Harmony Goals above), and on balance improves the language in my view. It won my straw show-of-hands poll at CapitolJS against all of arrows, vaguer alternatives, and doing nothing.
The epic Hacker News thread on my last blog post in relation to Dart needs its own Baedeker. For now I’ll just note that Dart and the politics evident in the memo are not making some of my standards pals who work for other browser vendors happy. Google may fancy itself the new Netscape, but it doesn’t have the market share to pull off proprietary power-move de facto standards.
The leaked memo makes some observations I agree with, some unbacked assertions about unfixable JS problems that TC39 work in progress may falsify this year, and a few implicit arguments that are just silly on their face.
Still, I think we should react to valid complaints about JS, whatever the source. The number type has well-known usability and (in practice, in spite of aggressively optimizing JIT-compiling VMs) performance problems.
The last bullet shows pragmas for switching default numeric type and arithmetic evaluation regime. This would have to affect Number and Math, but lexically — no dynamic scope. Still a bit hairy, and not yet on the boards for Harmony. But perhaps it ought to be.
Coordinated strawman prototyping in SpiderMonkey and V8 is a tall order. Perhaps we need a separate jswg.org, as whatwg.org is to the w3c, to run ahead? I’ve been told I should be BDFL of such an org. Would this work? Comments welcome.
Remember, ridiculouslyparallel processing power is coming, if not already present, on your portable devices. It’s here on your laptops and desktops. The good news is that JS can exploit it without your having to deal with data races and deadlocks.
RiverTrail is a Narcissus-based JS to OpenCL compiler, packaged as a Firefox add-on. It demonstrates the utility of a new ParallelArray built-in, based on typed arrays. The JS-to-OpenCL compiler automatically multicore-short-vectorizes your JS for you.
As noted in the previous slide, because the ParallelArray methods compile to parallelized folds (see Guy Steele’s excellent ICFP 2009 keynote), associative operations will be reordered, resulting in small non-deterministic floating point imprecision errors. This won’t matter for graphics code in general, and it’s an inevitable cost of business using parallel floating point hardware.
The code looks like typical JS, with no hairy callbacks, workers, or threads. It requires thinking in terms of immutable trees and reductions or other folds, but that is not a huge burden. As Guy’s talk makes plain, learning to program this way is the key to parallel speedups.
Here is my screencast of the demo. Alas, since RiverTrail currently targets the CPU and its short vector unit (SSE4), and my screencast software uses the same parallel hardware, the frame rate is not what it should be. But not to worry, we’re working on GPU targeting too.
At CapitolJS and without ScreenFlow running, I saw frame rates above 35 for the Parallel demo, compared to 3 or 2 for Sequential.
The point of a technology demonstrator is to show where real JS engines can go. Automatic parallelization of ParallelArray-based code can be done by next year’s JS engines, based on this year’s Firefox add-on. We’re very close to exploiting massively parallel hardware from JS, without having to write WebCL and risk terrible safety and DoS bugs.
To close, I sought inspiration from Wesley Snipes in Passenger 57. Ok, not his best movie, but I miss the ’90s action movie era.
Seriously, the shortest path on the Web usually is the winning play. JS is demonstrably able to grow new capabilities with less effort than a “replacement” entails. Always bet on JS!
I gave a brownbag internship talk on http://air.mozilla.org yesterday. I don’t have access to a recording of the talk yet, but I’m going to go ahead and post the (slightly revised) slides that I used. Let me know if anything is unclear. I’ll post a link to a recording once it’s available.
Browser instrumentation is the technical basis for performance tuning and many web-related research projects. Such projects boil down to recording interesting data as the browser runs, and optionally modifying the behavior of the browser itself based on recorded data. In an extensible browser such as Firefox, there are a number of means to accomplish these ends, each with certain tradeoffs:
Extensions/addons are code written in high-level JavaScript and dynamically loaded at runtime. Extensions can record and modify the behavior of the browser to the extent allowed by the APIs of browser components (in Firefox, XPCOM components).
Extensions are eminently portable, cross-platform and relatively easy to author, but are limited in the data and behavior to which they have access. Many low-level subsystems (such as the layout engine and JavaScript engine) are off-limits due to the performance and complexity costs of exposing certain functionality and data to a COM-like API. Furthermore, even if there are no performance considerations, many potentially interesting pieces of data remain unexposed for lack of (widespread) need.
Many platform/OS-level dynamic instrumentation frameworks (such as DTrace, SystemTap, and Windows ETW) allow for custom instrumentation in both kernel and user code. A common metaphor for these systems is the insertion of “probes” into the code, which perform some user-defined action.
The main advantages of these probe frameworks is that they are incredibly low-overhead, and probes can be added and removed dynamically without recompiling or restarting. As a building block for browser research and tools, they are inadequate: probes are difficult to write correctly, are not cross-platform, and the obtained data does not easily feed back into the program being inspected.
Modifying the C++ source code of the browser is the brute-force method used by many performance analysts and researchers. The developer creates a fork of the browser source tree, makes local modifications, and optionally distributes a modified binary or source tarball to others who want to run the instrumented browser.
“Hacking” the browser in this way provides the ultimate power to record and change any behavior, but has significant drawbacks. The researcher/developer must invest much time and energy to navigate and understand a large, foreign codebase just to figure out where to add instrumentation. Many independent developers do not have the resources to test their changes on all relevant platforms. Most importantly, these modifications almost always doom the forked codebase (and thus the research project) to become a transitory software artifact instead of a useful tool.
jsprobes
jsprobes is cross-platform, portable, and flexible browser instrumentation framework. It follows the event-driven idiom common to DTrace and many browser components: certain locations in the code (“probe points”), when executed, can run custom bits of instrumentation. A significant feature of jsprobes is that this instrumentation (“probe handlers”) is written in JavaScript. Browser data structures are exposed to instrumentation through a safe, easy-to-use data API.
The framework was created to address the weak points of existing approaches and build on their strengths. More specifically, the design goals include:
cross-platform: jsprobes is implemented as part of the browser platform, instead of beneath it. There are no dependencies on specific operating system features, architectures, kernel modules, or escalated privileges (root access/jailbeaks).
flexible: Probe handlers have access to the full JavaScript language, and can leverage existing libraries. Adding new probe points or exposing additional data to instrumentation code is simple and customizable.
The handler execution model could be extended to serialize the probe event stream and data to disk, allowing for offline or remote evaluation of handlers.
easy-to-use: handlers are written in JavaScript, and handler-writers need not know the inner workings of Firefox data structures and architecture. Simple probes can be written using higher-level data API’s, while complex probes can use lower-level data API’s that more closely mirror the C++ view of browser data structures.
portable: Extensions are the de-facto way to share browser customizations; extensions can add, remove, and communicate with jsprobes-based probe handlers using a standard XPCOM interface.
cross-language: Firefox is implemented using a mix of C++ and JavaScript. jsprobes supports custom conversions between the data representations of C++ and JavaScript (or even JS-JS).
low overhead: The architecture of jsprobes minimizes time spent by the main thread to execute probe handler code. Most handlers are executed asynchronously on a side thread, and communication happens via asynchronous message passing.
Although not yet implemented, the design is also conducive to future adoption of “zero probe effect” techniques used by other instrumentation frameworks like DTrace. These techniques are a prerequisite for jsprobes to land in Firefox trunk.
Current status
In it’s current incarnation, jsprobes is exposed via the nsIProbeService XPCOM interface, which provides a high-level API to control the core instrumentation engine implemented inside SpiderMonkey. Accompanying this service and the core instrumentation engine are dozens of stub calls throughout the Firefox code base. These source locations are well-known and are shared by other instrumentation frameworks such as DTrace and ETW.
Currently jsprobes is distributed as a patch series, and is available on my bitbucket account. The patches track mozilla-inbound fairly closely as of the time of this writing. See the bitbucket page for more specifics.
Example: understanding GC behavior
Suppose you would like to know which benchmarks cause garbage collection (GC), the duration of such collections, and how much garbage is collected. At a data level, one needs to know the start and end times of each GC cycle, and the heap size before and after each GC.
Registering your instrumentation
Instrumentation is added and removed via an event handler-like interface, which should be familiar to DOM/JavaScript programmers. Each place where instrumentation could be added (i.e., at GC start and end) is called a probe point. Each probe point has a fixed set of arguments: for example, the current JavaScript context, the current runtime, or the GC compartment to be collected.
The instrumentation that should be executed upon reaching a probe point is called a probe handler (to distinguish it from event handlers). Probe handlers have access to the probe point arguments, but the actual handler code is run asynchronously on a separate thread with its own JavaScript heap. So, the probe point arguments are serialized when the probe point is reached, and then later deserialized into the handler-heap when the handler runs.
In our example, the probe points of interest are GC_DID_START and GC_WILL_END. These probe points are constants defined in the nsIProbeService interface, and each are described in the (forthcoming) documentation. Their interfaces are:
GC_DID_START(runtime)
GC_WILL_END(runtime)
Each argument type has a specific API as well. Below, to the left of the arrow is the field name, and to the right of the arrow is the data type. Capitalized types are representable by instances of the equivalent JavaScript prototypes. (env is a special implicit argument to all probe points, and is always available.)
runtime:
heapSize -> Number
gcTriggerBytes -> Number
heapLastSize -> Number
heapMaxSize -> Number
env:
currentTimeMS -> Date
At each of these points, we want to record heap size and current time. Fortunately, this data is readily available from the probe point arguments above. So, our two handlers should look like so:
/* handler for GC_DID_START */
/* format: [startTime, stopTime, startHeap, stopHeap] */
record = [env.currentTimeMS, 0, runtime.heapSize, 0];
/* handler for GC_WILL_END */
record[1] = env.currentTimeMS;
record[3] = runtime.HeapSize;
pendingData.push(record);
There is one more piece required before we can register these probe handlers: a data specification. Probe handlers run asynchronously, but data must be captured and serialized on the main thread when the probe point is reached. Capturing all of the data is expensive—-Data specifications tell the instrumentation engine exactly which values need to be captured.
Data specifications are easy to write. Both of the above probe handlers have the same data specification:
Note that probes.addHandler returns a cookie, which is usable later as an argument to the corresponding probes.removeHandler API method.
Now that we’ve registered some probe handlers, subsequent garbage collections will trigger our instrumentation. This is great, but we eventually want to know the results of our instrumentation. nsIProbeService has one more method called asyncQuery. The arguments to asyncQuery are 1) some JavaScript source to run, and 2) an optional callback to invoke whenever a message is sent using postMessage from the handler thread to the probe service (main thread). For our example, we need two calls to asyncQuery: one for setup, and a periodic script that collects pending data records:
/* setup: run this before registering handlers */
probes.asyncQuery("pendingData = [];", function(){});
/* collection: run this periodically to marshal results */
probes.asyncQuery("while (pendingData.length) { " +
" postMessage(pendingData.pop()); " +
"} ",
function(msg) { results.push(msg); });
Once data starts rolling in via the callback, we can take action on the results array of records. For example, we could graph the data to show heap size over time, or plot GC pause length vs. garbage claimed. In fact, I have developed a demo extension called about:gc which does exactly this. It is available at my bitbucket.org/burg/aboutgc.
A brief note to all application and operating system developers:
Failure to respond to a prompt does not indicate accession to perform the actions that would otherwise occur. Especially if those actions have a high risk of losing my data. So I would like to thank you (whichever product is responsible for my most recent undesired restart) for completely obliterating my class notes. At the very least, you did not destroy all of my research too, just a day's worth.
Well, it has the beginning of a macro system. There are a few things that are glaringly absent:
hygiene — It’s really nice to have hygiene, but the standard Dybvig algorithm for hygienic macro expansion was a bit too big to implement, and still have time for all the other things.
macro scope / macro export — Currently, macros can only be used in the same file they were defined, and they don’t respect any kind of scope.
AST agnosticism — The macro expander gives privileged status to expressions. But it would be nice to be able to abstract over other parts of the AST.
In spite of these things, a story for declarative macros in Rust has taken shape:
Macro expansion has its own phase, and the compiler can emit expanded code, which can be useful for debugging.
Macro expansion happens in the “correct” but unintuitive order: the outermost macro invocation is expanded first.
Syntax stores a “backtrace” indicating the series of macro invocations that produced it. This has been useful for debugging errors in code produced by complex macros. However, it’s a bit of a hack. In particular, doing the Right Thing for macro-defining macros would require making the backtrace into a tree…
You can use Macro By Example to define macros. It’s a simple, intuitive, declarative way to specify macros, borrowed from Scheme. Unfortunately, I can’t find any good introductions to it, so all I can suggest to the curious reader is that they search for “ellipses” in JRM’s Syntax-rules Primer for the Merely Eccentric, or reading the paper that codified the system: “Macro-by-example: Deriving syntactic transformations from their specifications”.
Complicated macro invocations in Rust currently look much like Scheme macros, if you turned all the parentheses into square brackets. We have some interesting ideas for how things could be different. But that’s for the future, and for now, I’ll sign off.
I spoke at TXJS, a really excellent regional JS conference, in June. Thanks to @SlexAxton, rmurphey, and everyone else involved. My talk was concerned with the good, bad, and ugly of Ecma TC39 (and I mean those words in the best possible way!), mixing philosophy with historical events from the last 14 years.
After describing the standards process and its history in Ecma, I presented the good stuff that’s going into ES6 (I gave a remixed version of this part of the talk at an SFTechTalk hosted by Twitter last month — thanks to @valueof and @chanian for hosting). As a bonus, I closed with a paren-free update, whose punchline slide is included at the bottom.
I am not Tuco.
We don’t smoke, but the rest is pretty much right.
Here I praised Jan van den Beld, former S-G, Ecma — a raconteur and polymath, for his stewardship of Ecma.
Yes, Netscape picked ECMA (now Ecma, pay attention ) as standards body to which to submit JS mainly to give Microsoft grief, since ECMA had bravely standardized some part of the Windows API. But Jan and the entire TC39 TG1 group played fair on ES1, the first ECMA-262 Edition. Microsoft was so happy they standardized early C# and CLI metadata specs at Ecma.
Not Bruce Campbell, so not me. That means I’m the Bad.
Here is something that the Google leak about Dart (née Dash) telegraphs: many Googlers, especially V8 principals, do not like JS and don’t believe it can evolve “in time” (whatever that might mean — and Google of course influences JS’s evolution directly, so they can put a finger on the scale here).
They’re wrong, and I’m glad that at least some of the folks at Google working in TC39 actually believe in JS — specifically its ability to evolve soon enough and well enough to enable both more predictable performance and programming in the large.
There’s a better-is-better bias among Googlers, but the Web is a brutal, shortest-path, Worse-is-Better evolving system.
I’ve spent the last 16 years betting on the Web. Evolving systems can face collapses, die-offs, exigent circumstances. I don’t see JS under imminent threat of death due to such factors, though. Ironic that Google would put a death mark on it.
Here I propose that Crock is Plato and I am Aristotle, and that while we need to “keep it real”, we must also hew to high ideals.
The ideas I cite here, represented by the Hermenuetic Circle, definitely apply to TC39′s understanding of the text of ECMA-262, as well as various canonical texts in Computer Science. The committee works best when it spirals in on a solid design, avoiding local and premature optimizations and pessimizations.
Every one of these “Bad Parts” has been on parade in TC39 in recent years, including 2011. I’ve been tempted by the second one, horse-trading, so I’m not innocent (no one is).
The “Scenario Solving” one is particularly subtle in that the Scenario proposed to be solved is quite often a very real developer problem. But a complex, ad-hoc solution to it, especially when rushed, too often is unsound. We would do better to take the Scheme lesson to heart, and develop sound and valid orthogonal primitives. Higher-level libraries built on them can be standardized post hoc.
These meta-discussions are sometimes quite funny in light of the vendor whose representative is making them. I note that C# can now be written to look like JS. Why shouldn’t any particular extension to C# be at least considered (not rubber-stamped of course) for JS standardization?
These slides provide a glimpse of ES6, still under construction but already full of good stuff.
I hope the way JS is developed and standardized, as much in the open as the existing standards process permits, and with developer feedback via es-discuss, twitter, and the various conference talks, helps. If not, we may as well wait for some single-source solution to descend from the mountain. And then hold our breaths waiting for Firefox, IE and Safari to implement!
While paren-free in all likelihood won’t make ES6, the for-of loop will, and comprehensions and generator expressions in ES6 will be paren-free. Yay!
Perhaps I should have read the Avro documentation a little more closely. It requires Boost headers, the Boost regular expression library, as well as flex and bison in order to compile. If I don’t want my code to have a chance of making it upstream, adding these big libraries is a good start. Meanwhile, it seems upb is still too premature. In the meantime, I will just pass bulky, inefficient C++ data structures around :(
Since the last collabopt update, I’ve put some more work into the server software (dubbed optserver) that collects and aggregates profiles. This server is implemented in NodeJS and sqlite3. I have a support request in for a test phone that can run Fennec (aka Mobile Firefox). Once this comes, I will collect some profiles and see what the potential speedups are for the image load order optimization. (There may also be some tuning necessary to make the profile recording-extension work with mobile.)
Recently I have also thought of some other potential optimizations that should not be too hard to aggregate or optimize:
GC frequency/allocation rate: if we can observe that a particular website is animation-intensive or allocates a lot of short-lived objects, then we could pre-emptively run the GC more often than normal.
Compartment heap size: If we notice that particular websites keep a lot of objects alive, then fruitless GC time can be reduced by making the heap larger than normal.
Default image size: When loading webpages, often the size of image elements is unknown until the resource has been downloaded and decoded. In absence of size hints (such as the width="80" HTML attribute or min-width: 80px CSS property), the rendering engine must guess a default size for the image element. If this guess was inaccurate (known after downloading the image), the guessed image size may negatively effect on any following page content. If the guess was too big, page content may be pushed further down than necessary, and cause a layout reflow.
To avoid costly reflow, we can observe actual image dimensions during multiple loads of a website, and give this size information to new site visitors.
All three of these optimizations are conceptually simple, but recording profiles with this data is difficult or impossible if one can only use the internal JavaScript APIs of the browser. The values we seek to record are deep in the guts of the JavaScript engine (1 and 2) and layout engine (3). We require a different approach to recording this data in a low-overhead way. This different recording approach is the next major technical challenge of my internship. I have been brainstorming most of this week on approaches, and will write another post once the design has settled more.
My brother is learning how to play Dwarf Fortress. We talk about it over IM sometimes.
Eric: “I’m learning how irrigation works”
Eric: “for example, I have now irrigated my hospital”
If all you’ve heard about Dwarf Fortress is isolated anecdotes, you probably don’t have a very clear idea of what kind of game it is. Neither do I, and I’ve spent many hours playing it. Things that you may need to think about in Dwarf Fortress include (a) prevalence of ores in different geological layers (b) danger of infection of untreated wounds (c) jealousy amongst aristocrats regarding room furnishings (d) risk of military training injuries (e) ease of access of drawbridge/floodgate/hatch control levers (f) pasture size for grazing animals (g) psychological impact of failure to bury the dead.
Dwarf Fortress has a maddening difficulty curve, and even a good player has many corners of the game that they still do not understand. The UI is inconsistent and inconvenient, but often quite powerful.
It reminds me of some software tools. TeX comes to mind, and Git (a little), and especially Emacs. They are powerful, strange, and complex. It’s virtually impossible to learn any of these tools without having an in-house guru to run to when you get stuck. And when you do start to get the hang of one of them, to the point that you can be a mini-guru to your friends, you begin to worry about what percentage of your brain you’ve devoted to it. There are probably other essential tools that are similarly baroque.
Now here’s the thing that puzzles me: what’s going to happen in the future? Emacs has been around for over 30 years. I’d be surprised if the number of Emacs users were shrinking. Most of the programmers I’ve known use it, regardless of what generation they come from. Can this go on forever?
Like Dwarf Fortress, I suspect that everyone who’s ever used it has had fantasies about throwing it away and building a new Emacs, without all of the baked-in, grotty design decisions from the past. None of this dynamically-scoped elisp nonsense, let’s use Scheme! Let’s revamp the built-in terminology so that “window” means what it means everywhere else! Let’s figure out a better default set of keybindings! Let’s have something like ELPA, except less not-used-by-anyone!
Someday, one of those mad visionaries will succeed. And we will be given a text editor better than Emacs. Can you imagine?
Back when I was looking at reducing disk usage of running DXR, I used the sizes of the generated CSV files to make a rough estimate of how many times the average line of mozilla-central needs to be parsed and compiled (the answer is around 20). Now, having just respun a build of mozilla-central with the newest version of DXR, I have a larger database with some fairly accurate statistics of its size. This all started when I wondered what our most common function name was, so I decided to make a list of factoids (it's also a chance to refresh my SQL memory). All of the statistics come from a build of Firefox today on Linux x86-64 with debug and tests disabled.
Notes on accuracy
DXR tries as hard as possible to correlate data back to the original source code, and it is very likely that it can get itself confused in some weird circumstances. I don't yet have a good idea of where all of the buggy cases are, but I am aware of some broad strokes. Pretty much all information that deals with references is likely missing large swathes of the codebase. Information about callers only counts explicit calls (i.e, call expressions). The most accurate data I have is the macro information and the least accurate is information involving a templated class in almost any fashion. I also suspect that scope information is malformed in a non-empty set of cases, and I'm pretty sure that the number of global objects in any count is overcounted.
Type statistics
I count in mozilla-central just 33,555 distinct types. Of these, we have 13,648 typedefs, 6,523 structs, 6,470 classes, 5,147 enums, 1,410 interfaces, and 357 unions. Of all of these, 13,740 types are nested in another in some fashion, and another 4,524 types are templated. I'm not counting separate template instantiations as new types, but I am counting specializations individually.
Then there's the inheritance. I found 7,715 direct inheritance relations, all but a handful of which (around 200) are public. These relations account for 2,317 distinct base classes and 6,157 distinct subclasses. Naturally, some types are inherited much more than others. The winner, by a factor of 6, is nsISupports, having a whopping 3,156 implementations. Pickle and IPC::Message tie for second place with 501 implementations each; nsIRunnable takes 4th place at 282 implementations. The 5th place goes to nsXPCOMCycleParticipant (242). Rounding out the top 10 are nsIDOMElement (226), nsRunnable (224), nsScriptObjectTracer (209), nsSupportsWeakReference (154), and nsIObserver with 144. Subclass relationships involving templates are not counted, which may bump a few classes up into this list.
Macros
There are 42,457 distinct definitions of macros to produce 38,045 distinct names. Of these, 30,475 look like variables and 7,647 look like functions, which implies that 77 macros take on both depending on how they got defined.
How about calling them? I count just 482,142 macro invocations, so each macro is being invoked about 12 times on average. But... 20,628 of our macros are never used (or almost 48.6% of them), so the average is closer to 25 times. Of course, some macros really get used. Here are the top 5:
Count
Macro name
23,109
nsnull
22,941
PR_FALSE
20,139
NS_OK
13,154
NS_IMETHOD
13,138
NS_IMETHODIMP
Functions
There are 137,903 functions in mozilla-central. Of these, there are 68,867 having a distinct name. Of these, I found 33,693 in the global scope, and 8,943 that were a member of a class. Directly templated functions comprise about 2,291 functions. Just for fun, I found that there are 774 functions named exactly "Init" and a further 1,681 that begin with "Init" (case-insensitively in the last case).
In terms of calling these functions, I found 291,598 distinct edges in the callgraph (this is definitely an underestimate, since I am missing a large number of cases). For my usage, a callgraph is not a traditional directed graph but rather a hypergraph, where each edge goes from a single head to a set of nodes in the tail. These comprise 85,144 distinct callers and 67,098 distinct targets. Of the targets, I found 51,590 distinct functions being called statically, 13,274 distinct virtual functions invoked, and 2,234 distinct function pointers or pointers-to-member-functions being called. If I break it up by calls, 246,086 of the calls are static function calls, 41,770 virtual function calls, and 3,742 function pointer calls. I want to emphasize here that information pertaining to templates, in particular nsCOMPtr is completely missing, so a lot of calls to nsISupport's methods are missing, which is going to horribly skew the statistics.
Counting the function pointers, we have 65 pointers-to-member-functions and around 1700-1800 function pointers (those numbers do not add up to what I should get above, but I'm not sure who's in error here). I count about 19,721 virtual functions that I generated target information for (a subset of all virtual functions), and 65,761 implementations of those virtual functions, so the average virtual function has about 3.4 implementations. In addition, I found about 981 of these virtual functions were also called statically.
Now I'm sure, having mentioned it earlier, that you too are now wondering what the most common function name in mozilla-central is. The answer should be pretty obvious when you consider the most heavily-implemented class. And the winners are...
Count
Function name
1,687
Release
1,680
AddRef
1,600
GetIID
1,587
QueryInterface
774
Init
659
operator=
549
Log
517
Read
506
Write
396
GetType
The top 4 methods are related to XPCOM; the famed Init method is a mere 5th place. Of interesting note is that there are 659 assignment operators; I'm guessing some of these may be default copy constructors implemented for non-POD classes.
Variables
There's not much to say here. We have some 623,237 variables of some kind. Most of these, naturally, are local variables or parameters: 516,627, to be precise. We additionally have 82,008 members of some compound type, and 24,602 global variables. Some interesting statistics would be to compute the number of variables whose names defy our naming convention or the number of static constructors that need to be run before startup, but that data is harder to compute.
Warnings
I count 5,283 reported warnings for mozilla-central. Of these, 1,608 are warning about the use of non-virtual destructors with virtual functions. Another 940 warn about our use of mismatched enumeration types. There are 1,348 warnings of unused things. Finally, there are 1,551 warnings about use of extensions, leaving 466 "miscellaneous warnings".
Build statistics
The final set of statistics I have is mere size statistics for the build information. There are about 8,581,746 non-empty lines of text comprising some 320MiB of data. These are organized into about 51,469 files, including 1,469 files of generated files in the build directory. My output SQLite file was 464MiB, easily comprising around 3.5 million rows of data. We also have about 38MiB of binary files (like PNGs) in the source tree. Almost 15MiB of the generated included files were produced, comprising around 354,893 non-empty lines of text.
I think the best way to summarize this data is "mozilla-central is a massive codebase." It also goes to show you why just looking at source code without compiling is a bad idea: around 3-5% of our source code actually isn't in the source tree to begin with but is instead automatically created at compile time.
While testing the separation of database generation and the actual webpage, I had discovered that there was a weird bug in that SQLite cheerily opened the database but then complained that the database was invalid. I thought this was quite weird, and double-checked that the file had copied correctly: same size, correct permissions, file gives the same information (SQLite database, version 3). After a bit of thought, I found the problem:
That worked wonderfully. So if you ever need to fix a problem with SQLite-incompatible versions, that is how you dump a database to sqlite and import it again. While I'm on the topic, this is worth paying attention to:
-rw-r--r-- 1 jcranmer jcranmer 27394048 Aug 2 17:13 spidermonkey.sqlite
-rw-r--r-- 1 jcranmer jcranmer 27419572 Aug 2 17:17 statements.sql
The list of SQL statements is only 0.1% larger than the SQLite file. If I look at the older database, it's actually smaller than the database. Food for thought.
Some of you may be wondering why it is taking me so long to build a DXR index of mozilla-central. Part of the answer is that I'm waiting to bring DXR on clang back up to the original instantiation in terms of feature support. The other part of the answer is that mozilla-central is rather big to index. How big? Well, some statistics await…
I first noticed a problem when I was compiling and found myself at 4 GiB of space remaining and dwindling fast. I'm used to fairly tight space restrictions (a result of malportioning the partitions on my last laptop), so that's not normally a source of concern. Except I had been at 10 GiB of free space an hour before that, before I started the run to index mozilla-central.
The original indexing algorithm is simple: for each translation unit (i.e., C++ file), I produce a corresponding CSV file of the "interesting" things I find in that translation unit—which includes all of the header files included. As a result, I emit the data for every header file each time it is included by some compiled file. The total size of all data files produced by the mozilla-central run turned out to be around 12GiB. When I collected all of the data and removed duplicate lines, the total size turned out to be about 573MiB.
Step back and think about what this means for a moment. Since "interesting" things to DXR basically boil down to all warnings, declarations, definitions, and references (macros and templates underrepresented), this implies that every declaration, definition, and reference is parsed by the compiler, on average, about 20-25 times. Or, if you take this as a proxy for the "interesting" lines of code, the compiler must read every line of code in mozilla-central about 20-25 times.
The solution for indexing is obviously not to output that much data in the first place. Since everyone who includes, say, nscore.h does so in the same way, there's no reason to reoutput its data in all of the thousands of files that end up including it. However, a file like nsTString.h (for the uninitiated, this is part of the standard internal string code interfaces, implemented using C's versions of templates, more commonly known as "the preprocessor") can be included multiple times but produce different results: one for nsString and one for nsCString in the same place. Also, there is the question of making it work even when people compile with -jN, since I have 4 cores that are begging for the work.
It was Taras who thought up the solution. What we do is we separate out all of the CSV data by the file that it comes in. Then, we store each of the CSV data in a separate file whose name is a function of both the file it comes in and its contents (actually, it's <file>.<sha1(contents)>.csv, for the curious). This also solves the problem of multiple compilers trying to write the same file at the same time: if we open with O_CREAT | O_EXCL and the open fails because someone else created the file… we don't need to do anything because the person who opens the file will write the same data we wanted to write! Applying this technique brings the total generated CSV file data down to around 1GiB (declaration/definition mappings account for the need for duplicates), or down to about 2 times the real data size instead of 20 times. Hence why the commit message for fixing this is titled Generate much less data. MUCH LESS.
Of course, that doesn't solve all of my problems. I still have several hundred MiBs of real data that need to be processed and turned into the SQL database and HTML files. Consuming the data in python requires a steady state of about 3-3.5 GiB of resident memory, which is a problem for me since I stupidly gave my VM only 4GiB of memory and no swap. Switching the blob serialization from using cPickle (which tries to keep track of duplicate references) to marshal (which doesn't) allowed me to postpone crashing until I generate the SQL database, where SQL allocates just enough memory to push me over the edge, despite generating all of the SQL statements using iterators (knowing that duplicate processing is handled more implicitly). I also switched from using multiple processes to using multiple threads to avoid Linux failing to fork Python due to not enough memory (shouldn't it only need to copy-on-write?).
Obviously, I need to do more work on shrinking the memory usage. I stripped out the use of the SQL for HTML generation and achieved phenomenal speedups (I swore that I had to have broken something since it went from a minute to near-instantaneous). I probably need to move to a light-weight database solution, for example, LevelDB, but I don't quite have a simple (key, value) situation but a (file, plugin, table, key, value) one. Still not hard to do, but more than I want to test for a first pass.
There remains, as of this blog post, one major regression from the original dehydra-based DXR, and that is the lack of support for callgraph. The main problem I have here is trying to pin down the scope of the feature. To give an idea of why this hard, I pose the following challenge: how many different ways are there in C++ of invoking a function, as would be viewed from the generated assembly code? Here is my list:
Global function
Invocation of non-virtual member
Nonvirtual invocation of virtual member
Constructor, copy constructor invocations
Implicit destructor invocation
Functors (objects that implement operator())
C++0x lambdas (which appears to be a functor)
Count all of these as "one," if you like, since they're all still more or less boil down to a call addr instruction, so it really boils down to how pedantic you get about the differences. Most of these look fairly distinct in source code level as well too. But, on the plus side, being a statically-known function call, all of these are easy to handle.
Virtual member function invoked virtually
This is clearly different at an assembly level, since it requires a vtable lookup with the added possibility of thunking. From a handling perspective, though, this is more or less statically known, since we know the possible targets of a virtual method call are limited to subclasses of the method, if we assume that people aren't sneaky.
Function pointers
Easy to dip into, impossible to think about off the top of your head (how many tries does it take you to write the type of a function pointer that returns a function pointer without using typedefs?), and it requires data-flow analysis to solve. The main question is if it's even worth trying to at least plan to support function pointers, or if they should just be left out entirely.
Pointer-to-member function pointers
Theoretically easier than function pointers (because we again have a more limited state space), but they still suffer from the same need to data flow analysis. Not to mention that they are much harder to use, and that their implementations are much different from function pointers.
Template dependent name resolution
Templates, as far as C++ is concerned, is essentially a macro system that is more powerful and more inane than the C preprocessor. So if the expression that is to be called is dependent on the type parameter, then the function call can be any (or all) of the above depending on what was passed in.
throw
The entire exception handling semantics in C++ essentially gets handled by the ABI as a combination of magic function calls and magic data sections; undoubtedly a throw itself is a function call which gets to do stack unwinding. Most of this stuff isn't exposed in the code, it just leaks out in semantics.
One goal I've had in mind is to make sure I can do some reasonable support for dynamic languages as well as static languages. To that end, I've looked up other callgraph implementations to see what they do in the realm of indirect function calls. The answer is a mixture of either "do runtime instrumentation to build the callgraph" or "ignore them altogether." The closest I've seen is one implementation that indicated when the address of a function was taken.
Over the past two weeks I’ve been prototyping collaborative optimization (as proposed in the FIT paper). The most simple optimization in that proposal is image load order optimization. Recall that we want to observe actual image load orders across multiple page loads/users, calculate whether there is a better ordering than can reduce render time for the initial viewable area of the webpage, and then disseminate/use the calculations to actually improve load time.
The main task thus far has been collecting profiles for the “image load order” optimization. This is implemented using the new Addon SDK (aka Jetpacks) for Firefox. Actually, two profiles are collected: one characterizes the placement of image elements and background images, while the other characterizes the load times of all resource requests. The latter will be used to perform a simple limit study—that is, calculate the potential speedup using real timing data. I want to do this to see whether it’s worth implementing changes to the image request logic deep in the innards of Gecko. If there’s very little theoretical speedup, then I will spend my time looking for more profitable optimizations.
The rest of this week will be spent generating some more profiles and writing code to store and mine the data (probably using NodeJS and node-sqlite). I’ve found plenty of rough edges in the Addon SDK, and too many ideas for my main project this summer at Mozilla (which I will detail next week).
I really love open source development, open research, and am interning at Mozilla right now. So, it’s a natural decision to make my experiments and code publicly available. Right now the collabopt repository is on BitBucket at bitbucket.org/burg/collabopt/. Take a look if you are so inclined, and definitely let me know if you have ideas for collaboration, experimentation or improvement. A website will appear sometime in the fall, when I have more to discuss.
When I came to Mozilla this summer, I told my co-workers that I was casting about for a thesis topic. For the last few weeks, I've been mulling over an idea that I think could lead to one.
Although my idea addresses a problem that I don't think is specific to Rust, I did arrive at it by working on Rust, so I'll start there. The Rust compiler is implemented with an LLVM backend; it translates
Rust code to the LLVM IR. Before I started working on Rust this
spring, I knew nothing about LLVM, but I've become a fan. By
targeting LLVM, there are a whole set of machine-specific issues
(register allocation, instruction selection, code generation) that our
team (mostly) no longer has to think about, allowing us to (mostly) concentrate on higher-level language design issues. I like the modular design of LLVM; perhaps the best part is that by targeting LLVM we're able
to take advantage of optimizations that other people contribute. If
a compiler targets LLVM, then at least in theory, improvements to LLVM
will make the compiler better over time without the compiler writers having to do
anything. In the case of Rust, we've made a few contributions upstream to
LLVM, too, so the relationship is, we hope, mutually beneficial.
Now, the LLVM folks have a way of making it sound like it's very easy
to write an LLVM-backed compiler: all you have to
do is write a straightforward front-end that massages source
programs into LLVM IR, and you're done. This might not be too
far-fetched of a thing to say if the source language you're compiling
is C. If you're compiling a much higher-level language, though,
translating to LLVM IR is nontrivial. In the Rust compiler, the translation from the fully-fleshed-out-and-annotated AST to LLVM IR happens via a single, monolithic, 9000-line pass known
affectionately as "trans". It's almost impossible to try to reason about a compiler pass this monolithic. We can ameliorate some of the pain of dealing with trans by separating it into
modules that handle, say, task and communication-related things,
vector operations, and so on; we've started doing this, and it does
help. We've also talked about abstracting low-level stuff out into what we call an "LLVM combinator library", which would present higher-level abstractions for common things we do with LLVM, and that
will help, too. But what would really be nice would be if we had more intermediate
languages. Even just one well-designed intermediate language would go
a long way toward making this part of the compiler easier to reason about and less
buggy, and at least one Rust developer besides me has advocated for such an intermediate language in discussions on our IRC channel.
But here's the thing: it's not just Rust!
Everyone who compiles a high-level language to LLVM is going to have
this problem and is going to have to deal with it one way or another.
So, wouldn't it be great to have a language that sits on top of LLVM, compiles to LLVM in a
semantics-preserving way, and serves as a target language for compiler
writers who are trying to compile high-level languages but who want to
take advantage of the goodness of LLVM? That's the language that I propose designing.
In order to do the proof of semantics preservation, I'd need a
formal semantics for LLVM, and from what I can tell, nobody has one of those yet. (It looks like Steve Zdancewic and Jianzhou Zhao recently embarked
on an LLVM-in-Coq project, but they haven't published anything yet.) So, the main contributions of this work would be
as follows:
It would make compiler writers' lives easier by giving them a
language to target that isn't as low-level as LLVM IR, but still
lets them exploit the goodness of LLVM in their compiler -- which,
after working on Rust, I believe is a real need.
It would give a formal semantics to a subset of LLVM, which I think would
be useful in many contexts outside of this work.
Rather than having to give a semantics to all of LLVM, I
can just give a semantics to the target language of compilation, which
I think could be a rather smaller subset of LLVM IR. I think a proof
of semantics preservation would be valuable even if it's only for a small subset of LLVM, because part of
the point of the LLVM infrastructure is that it's okay to produce
correct-but-inefficient IR and hand it off to the LLVM optimizer. If
we don't have to worry too much about producing efficient
LLVM IR, as long as we produce correct LLVM IR, then we can
compile to some minimal subset of LLVM that's expressive enough for
our needs and then let the optimizer go to town on code written in
that subset.
What we're after here, by the way, is not end-to-end verified compilation; for
that, there's CompCert and its ilk. Instead, we are working on the
assumption that an LLVM backend is something some people are going to
want anyway. Having said that, Greg Morrisett, Jean-Baptiste Tristan,
and Paul Govereau have work
underway on validating LLVM optimizations, so if someone
did want an end-to-end verified compiler that used LLVM, this
work and theirs could be complementary pieces of that larger
puzzle.
In the last couple of weeks, I've gotten the chance to pick some brains about my idea. So far I've gotten some great questions and feedback:
Adam Foltzer asked how my proposed intermediate language
would compare to C-- in terms of its level of abstraction. Adam may have been thinking of how, in the LLVM-backed version of GHC, an intermediate language that's based on C-- compiles to LLVM. However, C-- isn't
really higher-level than LLVM; in fact, the level of abstraction they
offer is quite similar. The C-- developers characterize it as a
portable assembly language, and that sounds a lot like LLVM. (So why
does GHC compile from C-- to LLVM, then? There's a great
paper about the LLVM GHC backend from last year's Haskell
Symposium that answers that question in detail.) So the answer is
that I'm aiming for something higher-level than C--. How much
higher-level is still an open question. The trans pass of Rust gives
me an idea of the range of abstraction I'm aiming for, but between the
level of abstraction of the input to Rust's trans pass (which is a big, hairy
AST with a whole bunch of stuff hanging off of it) and the level of
abstraction of LLVM, there's still a lot of room. This is something
that I'd especially like more feedback from people about. I'd like to
choose a set of abstractions that serve as a sane target for a variety
of source languages -- say, a purely functional one, a
super-OO-Kool-Aid one, and a super-concurrent-Kool-Aid one. I
wouldn't, I hope, actually need to implement a compiler from each of
these languages to my intermediate language. That seems like too much
work for one grad student. But I'd like to be able to have a reasonably convincing story for how my language could be a target for
any of them.
Dan Grossman pointed out that I'd need to consider how a
compiler writer who wanted to target my language would deal with
run-time system matters, particularly garbage collection. That
reminded me that most of the complexity of the aforementioned enormous
trans pass in Rust is code that handles reference counting. This is a
stopgap measure that will be replaced by a real garbage collector one
day, and I suppose that might cut down on the complexity, but in any
case, I think I'd want my language to sit at a level of abstraction
above garbage collection. On the other hand, I don't want it to tie
the compiler writer to a particular set of GC choices. This, I think,
is an example of the high-level-abstraction-versus-low-level-control
tension that drives a lot of research in our field. Actually, C--
found a neat solution to the eat-cake-and-have-it-too dilemma by
providing an interface for compiler writers to plug in their own
garbage collector; this run-time interface thing is the most
interesting and novel aspect of C--. Putting an interface like that
in my language is a possibility, but I recently realized (from talking
with my co-workers and from reading the aforementioned LLVM-backed GHC
paper) that LLVM
now offers such an interface, too. So maybe my story can be "not
my problem; use the LLVM GC interface".
Caylee Hogg asked what the framework for semantics preservation would be. I think I'd want to mechanize a proof of semantics preservation in a proof assistant and then extract the compiler from the proof, like y' do. Of the various proof assistant options, Coq is the only one I've tried, and I'm not married to it; I'm willing to invest in learning something new. More generally, though, I need to pin down exactly what "semantics preservation" means in this setting. A lot of authors seem to use "type-preserving compilation" and "semantics-preserving compilation" interchangeably, so is type preservation what I want, or do I want something more? Or something less? I've always been fuzzy on this point, and I'd like to get very, very clear on it before embarking on a proof.
So now I'd like to pick your brain. Who should I be talking to? What related work should I be looking at? What potential pitfalls have I not considered? I'd love to know what you think!
A record/replay framework provides the possibility to execute a program once and replay the execution of the program an arbitrary number of times. The main challenge in the record phase is to capture all sources of non-determinism (e.g., the schedule in a multi-threaded program). In the replay phase, this information is used to reconstruct the original (recorded) behavior of the program.
There are various levels of granularity at which information is recorded. For example, the recorder can log every executed instruction. One advantage of such a fine-grained recording strategy is that all information is available to replay the application deterministically. The disadvantage, however, is that recording slows down the program execution approximately two orders of magnitude. Such a large slowdown renders recording of programs with large workloads impractial. Fine-grained record/replay frameworks are therefore used for debugging programs rather than testing a program. Fine-grained record/replay frameworks are usually implemented with dynamic binary translation tool (e.g., PIN (http://www.pintool.org/) or FastBT (http://nebelwelt.net/?id=47)).
Our work takes a different approach in that we do not record (results of) individual instructions but events that introduce non-determinism to the program. We monitor a program (without rewriting code) and record all events that (1) provide an input to the program and (2) can affect the control-flow of the program. A list of events is described in the next paragraph – “Challenges”. The “Solution” paragraph explains how we eliminate/record events. Finally, the “Implementation” paragraph highlight the most important implementation details.
Challenges
Here is a list of sources of non-determinism we are considering so far. The list is probably not complete. So if you have an item to add, please let me know.
Address space randomization (ASR)
Signals
Schedule/interleaving of parallel programs
Instructions (e.g., rdts, rdmsr)
System calls (e.g., gettimeofday())
Solution
To eliminate the non-determinism that is introduced by the OS/hardware we propose the following strategies:
ASR: can be disabled by the OS. E.g., in Linux system, sysctl -w kernel.randomize_va_space=0 disables ASR
Signals: To be able to replay the original program execution as concise as possible, signals should be delivered at the exact spot as in the recording phase. This is a non-trivial problem since signals are, in general, asynchronous; a process can receive a signal at an arbitrary program point, and based on the signal, change the control flow. We use hardware performance counters to count the number of retired conditional branches in the recording phase and count the value down in the replay. When the counter reaches zero,the replay-process is stopped and single-steps to the eip where the original signal was received and sends the signal. This way signalling of multi-(threaded/process) applications can replayed exactly.
Scheduling: Multi-threaded programs that share memory can introduce non-determinism due to data-races and deadlocks. Trying to replay parallel programs deterministically either requires modifications to the kernel and/or language or runtime systems. All these efforts are still quite “academic” and time will show which approach is most promising. To avoid “parallel problems” we serialize the execution of parallel programs by letting only one thread/process run at a time. More specifically, the recorder determines which process runs and logs the schedule, which is used in the replayer to implement the same schedule. Well, now you might think that we are already in the area of parallelism and we are building a recorder that serializes my parallel program. In fact I think that many programs will remain mostly sequential, or apply only a couple of threads. As a result serializing a parallel program will slow down the recording/replay by a modest factor. The reasoning behind my assumption is that many problems are inherently sequential, or at least large parts of it. Keeping Amdahl’s Law in mind, many programs will use a couple of threads only.
Instructions: There are some (x86) instructions that introduce non-determinism. For example, the rdtsc reads the current time, which is given by a 64-bit value. Clearly, this value will differ in the record/replay phase. Since rdtsc is a common instruction we must make sure that the values are the same in the record/replay. In a Linux environment rdtsc can be virtualized by using the fcntl system call. The rdmsr instruction is used to access the PMU (Performance Monitoring Unit). Most data that can be gathered by the PMU is non-deterministic (e.g., cache hit/miss). Currently, we do not handle the rdmsr instruction specifically. However, since most applications do not make use of PMU data we do not consider this as a major limitation of our approach.
System calls: There are many system calls that provide input to the application. E.g., read, gettimeofday(), or sockets can give different varying (from run-to-run) inputs to the target application. Therefor, the recorder logs all inputs in the record phase, and the replayer injects the recorded data in the replay phase. This way we can guarantee that the executions in the record/replay do not differ due to system calls.
Implementation
The current system is implemented for a 32-bit Linux-based system. Subsequently, you can find a list of the most important implementation/design decisions:
The monitoring process(recorder/replayer) is implemented as a separate process. The reason for using separate address spaces is that the execution of the original (not-monitored) program should be influenced as llittle as possible.
We use ptrace to stop at system call entries/exits and to intercept signals.
We virtuaize every system call that uses file descriptors (including sockets)
Only system calls that must be executed are actually executed (e.g., mmap)
I will add more implementation details over time. The project is still in an experimental phase and things might change over time. I’ll keep you informed.
When I last wrote
about my work on Rust a month ago, I had just finished
implementing some very basic support for extending Rust objects with new methods. In the grand tradition of computer science publishing, the example program I gave was one that very carefully avoided doing any of the things that didn't work yet. The most egregious issue was that if you extended an object with new methods, you could, um, no longer call its original methods. (Really. Well, I had to start somewhere.) Even once that was fixed, a lot of things were wrong. Outer methods couldn't refer to inner ones, since object extension didn't play nicely with self. And method overriding didn't work at all.
In the last month, I've made all of those things work to some extent. For instance, programs like thesetwo from the Rust test suite that exercise the features I just mentioned can now compile, run, and exit normally, having behaved as expected. (And do, on three platforms, whenever someone lands a change to the compiler.) Hooray!
A lot of work remains to be done on the object system, of course, including a pretty huge issue that I've been putting off fixing. Everyone loves a brainteaser, right? Consider the following Rust program:
use std;
fn main() {
obj cat() {
fn ack() -> str {
ret "ack";
}
fn meow() -> str {
ret "meow";
}
fn zzz() -> str {
ret self.meow();
}
}
auto shortcat = cat();
auto longcat = obj() {
fn lol() -> str {
ret "lol";
}
fn nyan() -> str {
ret "nyan";
}
with shortcat
};
assert (shortcat.zzz() == "meow");
assert (longcat.zzz() == "meow");
}
Here we've got an object with three methods,
shortcat, that's created by calling the
cat() object constructor. When the call to
shortcat.zzz() is made, we look up zzz in
shortcat's vtable and run the code associated with it, which directs us to look up meow in
shortcat's vtable and run the code associated with it. Finally, that code returns the string "meow", and the first assertion succeeds.
Now consider the call to longcat.zzz(). The
longcat object was created by extending shortcat with two new
methods, lol and nyan. Neither one of the new
methods overrides any of shortcat's methods, and neither one should interfere with zzz or meow.
So we would expect the call to longcat.zzz() to return
"meow", too. Instead, in the current buggy implementation, it returns something else, and the assertion fails.
So, the brainteaser: can you guess what longcat.zzz() returns
instead of "meow", and why? (Note that the program compiles with no errors or warnings, doesn't segfault when run, and doesn't raise any errors under Valgrind.)
Scroll down for the answer and an explanation...
First, note that longcat has five entries in its
vtable, appearing in alphabetical order: ack,
lol, meow, nyan,
zzz. Of those, lol and nyan
are "normal" entries, and the others are "forwarding functions" that, roughly speaking, forward us along to the
appropriate vtable entries from shortcat. (Implementing
this forwarding behavior has taken me much of the last month.) So,
when we call longcat.zzz(), we get forwarded along to
shortcat's vtable entry for zzz, which contains the compiled code for
ret self.meow().
This is where the problem arises: the code we're looking at was
compiled under the assumption that self is shortcat, since of course that's what it was when shortcat's vtable was created. shortcat's vtable only has three methods,
ack, meow, and zzz, in that
order. The call to self.meow() therefore compiled
to, roughly, "do whatever the second slot in my vtable says to
do". But since self is now longcat, the second slot in
its vtable is not meow but lol. So longcat.zzz() returns
"lol", and the assertion fails.
Interestingly, as I mentioned, this code runs cleanly under Valgrind -- no memory corruption or anything like that. Furthermore, the assertion might have accidentally succeeded if I'd chosen names for the methods that alphabetize differently. So this is a truly nasty
bug.
To fix the mismatch, we need the self in
shortcat.zzz() to have the same type as
shortcat does. By "same type", I mean that its vtable
must have the same number of methods, with the same names, in the same
order, and having the same types as the original. But self is supposed to be longcat at that point, so how are we going to manage to give it the same type as shortcat? The answer, unsurprisingly, is another level of indirection. We give self a vtable that has three
backwarding functions, one for each of the original three
methods ack, meow, and zzz, which point to the corresponding entries in longcat's
vtable. From there, of course, they will forward right back to
shortcat's vtable -- unless longcat has overridden one of them, in which case we'd use the overriding entry from longcat. (There's no overriding happening in this example, but we need a solution that would work in case there were.) With luck, I'll land this backwarding behavior in Rust sometime in the next week. Stay tuned for the next exciting episode of Lindsey Implements An Object System!
This morning, I have made available a public version of the viewsource code. Viewsource was originally a web tool to output a pretty-printed version of the Dehydra objects for simple snippets of source code. Since I found it a useful idea for debugging, I also added dumps for the now-defunct Pork rewriting toolset, JSHydra, and now, finally Clang.
The odd tool out here is clang, as it has neither a debug-useful XML output nor a simple JS representation that can report the output, which requires me to make the tool a two-stage process: the first stage is a compiler plugin that figures out from clang::RecursiveASTVisitor which functions and classes it needs to worry about dumping out, and then writes out the code for the second stage, another compiler plugin that actually dumps the generated AST information to the console. In other words, I have a plugin to write a plugin that is used to dump information to write plugins.
Eventually, I hope to be more complete in the information I can dump out (particularly type locations and type information), but this is complete enough to be at least somewhat useful for finding interesting things, such as understanding what the location information actually refers to. As I work on this tool more and more, this should enable me to find and fix many bugs in DXR.
I am pleased to announce the third release of DXR. Compared to the previous release, the UI has been tweaked and several bugs have been noticed and fixed. In particular, I have improved the support of links, and fixed the bug that prevented the type hierarchy from being properly realized. I have also introduced support for indexing the IDL code in Mozilla's codebase, as well as indexing generated code in general. I have also added yet another tree, comm-central.
Compared to the original implementation of DXR by Dave Humphrey, there remains one unimplemented feature, namely support for the callgraph. My plan for the next release will focus on implementing this feature as well as any other bugs I uncover via real-world testing. The other major features for the next release will be a JSON query API for data and improved support for multiple trees.
In a more general viewpoint, the plugin architecture has been much improved. There is now support for viewing code-coverage data in the source tree (I do not have a visible demo of this yet), and the support for IDL is entirely a separate plugin. Implementing these plugins has caused me to realize that there is need for better hooks in the current architecture (especially with regards to the generated HTML), and these will also be forthcoming in the next release.
The final point to make is that I have moved the viewsource code into its own repository, as well as adding support for a limited subset of the clang AST. The website does not properly support clang at this time owing to problems in the server's current toolchain configuration; when these issues are resolved, I will be demonstating the tool better.
I've been playing around with the idea of replacing mork for leveldb, the two being at roughly the same "no feature database" level. The sanest way to do a decent head-to-head comparison involves picking the least powerful wrapping around this API (the message folder cache) and then reimplementing it twice. Actually, I originally intended to try implementing mdb with a leveldb backend for a drop-in replacement to mork. Five minutes successfully dissuaded me, as the mdb interfaces require you to implement about five interfaces to get to "open a database".
For realistic testing, I grabbed a real database and instrumented all of the accesses in real use cases (i.e., startup and shutdown). Since the real data is what I love to call my Profile from Hell™, there's enough data that I can get some reasonable statistical accuracy in timing. I'm also testing what is a path of note in Thunderbird startup, so I made sure to test cold startup, which, incidentally, is very annoying. A 1 second test takes several seconds to reload xpcshell.
The first performance results were very surprising to me. I expected that not touching the knobs would probably result in a mild performance regression (mork is fast by virtue of choosing the "space" option in the space/time tradeoff). What I wasn't expecting was something that was an order of magnitude slower. I eventually got off my butt and hooked up jprof to see what was going wrong. Unfortunately, I forgot that jprof's visualization of output results sucks, so I traipsed back to my very old post to pull down my script to turn the dumb HTML file into a more manageable dot file (which needed fixing, since the only commit to jprof in 3 years changed the output syntax on me).
Clearly, the output file implicates JS as being the hotpath. Since this is xpcshell and I had to hack the jprof initialization in, I decided to move the jprof initialization to the execution point to eliminate the effect of parsing my ~20K-line input file (I did say it loads a lot of data). When I last used jprof for serious performance tracking, I noted that the main regression appeared to be caused by XPCThrower::ThrowBadResult (about 60%, apparently). So here, the main regression appears to be... you guessed it, JS exceptions (the method names have changed in 3 years, though).
That is truly unexpected, since I'm not supposed to be actually changing any API. After looking back at the code and inspecting much more deeply, I found out that I actually was changing API. You see, the nsIMsgFolderCacheElement has two ways to access properties: as strings or as integers. Accessing integers clearly throws an error if the value isn't there; accessing strings also appears to throw an error. Assuming this to be the case, I explicitly threw an error in both cases in my new implementation. It turns out that the current implementation actually doesn't throw for strings (I'm not sure if it's intended to or not). Fixing that greatly improved the times. A brief summary of the real results is that mork is faster on startup, even on general property access, and slower on shutdown, only the last of which is not surprising.
In summary: throwing JS exceptions, at least via XPCOM, is very, very, very slow. If that code is a potential hotpath, just say no to exceptions.
One of the hardest parts about writing a compiler is producing correct location information. Well, it is very easy to get the precise location of a token (most of the time, in any case), but the hard part is assigning token locations to actual AST productions, since they are composed of several tokens. Take a simple function definition: void foo(int bar) {}. Nearly every token in that definition can be argued to be the right location of the function. Hence showing only line information is common for compilers: in most normal usage, you get the same information no matter which token you pick. However, there are times when the actual column number is crucial: if you need to correlate something to the original source code (e.g., for rewriting or for DXR), you want that column number to be spot on.
Now let's consider pathological cases. Suppose we define that function in a templated class. Do I want the location of the template or the location of the template instantiation? The former is often better, but there are times you want the latter. But you can also make the function in a macro: #define FUNCTION(name) void name(int bar) {} and FUNCTION(foo). Do I want the location within the macro definition or the location in the macro instantiation? Macros let you be really evil: #define FUNCTION(name) void start##name(int bar) {}. Now which location do I want? Oh yeah, and while we're on the topic, there is also the issue of autogenerated code (e.g., I want the location to be what it was in the original file). So which location do I use?
There are two orthogonal issues here: which token do I use in the "final" version of the text, and which version of the text do I grab it from. There are several versions of the text (think of the lexer not as a single unit but instead of a chain of steps that pass the information between them); we don't care about some of the intermediate representations (e.g., what the code looks like after replacing trigraphs). The lowest level of these is the position in the last form of text, just before being parsed for real. It's also important to track the position through repeated invocations of macros (most of the time). We also want the location in the text file as it is when fed to the compiler to begin with, as well as the position in the original file before it is fed to whatever output the C/C++ source file.
Clang gives us some of this information for any SourceLocation. The source of the actual token is the spelling location. On the other hand, the instantiation location is the location of the outermost macro invocation to generate the production. It is possible to get the tree of locations by repeatedly calling getImmediateSpellingLocation, which only looks down one level of invocation. To get the result after following #line directives, you need to use the presumed location, which looks at the instantiation location. It is possible to pass in the spelling location to getPresumedLoc to produce the correct results, though. Unfortunately, precise location information goes out the window if your macro uses # or ## (it goes into "scratch space" whose line numbers don't correlate well enough to the original source code for me to make sense of).
The other direction is one of the possible methods to get the source location in the class. A function declaration merely has 6 of these: the start and end of the declaration, the location of the name identifier, the location of the type specifier, the location after the template stuff, and the location of the right brace. I would go in more detail, but I haven't yet catalogued all of the ways to get this information to be sure that the method names actually mean what they claim to mean.
Over the last week or so, I made more improvements to the task system in Rust than I realized. The graph below shows the current performance curve for the pfib benchmark in red, and the orange one shows the results from last week for comparison. In the interest of time and avoiding frustration dealing with a certain well-known spreadsheet application, I’m only including the benchmarks for Mac OS X.
There are a couple of observations here. First, tasks seem to be about twice as fast as they were before. Secondly, the red curve is longer than the orange one.
Last week there was one master scheduler lock. In order to get things working sooner, I was very conservative about when to grab this lock, which meant a lot of things were serialized that didn’t need to be. Over the last week, I’ve been reducing the number of times we need to grab the scheduler lock to the point where now we basically only hold it when we’re doing scheduler-related things like creating tasks, switching tasks, etc. Making this work safely required a couple more locks. One is on ports, and the other is on tasks. Fortunately, these are much smaller in scope, which reduces the overall amount of contention.
For the moment, we’re not using the cool lock-free message queue system we had in rustboot. We may resurrect this in the future though to maximize our scalability.
The next change is that we can create a lot more tasks than we could before. Rust is supposed to have dynamic stack growth, but this is currently unimplemented. Instead, we fixed each task’s stack size at a value that seemed to work. Last week it was 1 MB. Unfortunately, our compiler has been known to exceed that from time to time, so Patrick moved it up to 2 MB. This causes problems with pfib, because this test creates a huge number of tasks. Since each task gets its own stack, we can quickly exhaust the 32-bit virtual address space. Thus, I added a RUST_MIN_STACK environment variable. We still use the default of 2 MB, which works for most programs. If you need more or less, however, you can set the stack size now. The benchmarks in this post were run using a 16 KB stack. I’ve seen it do up to the 25th Fibonacci number, but it takes a really long time at that point, so I only went up to 22 for this post.
There is still a lot more room for optimization. Creating and destroying tasks still requires a global scheduler lock. For a test like pfib, this means we can only really use one core at a time. Moving to per-thread task queues and adding work stealing would let us use a local lock to create and destroy tasks, and hopefully allow us to use more cores.
For the time being, though, I’m going to try to implement some more interesting parallel programs.
I found a deterministic hardware performance counter event on a Sandy Bridge architecture. The event counts retired conditional branches. That’s great news, since we can use this event to find the exact spot to deliver a signal in the replayer by counting down the retired conditional branch instructions that were recorded.
I am pleased to announce a second alpha release of DXR. Several improvements have been made, including changes to the UI as well as better cooperation with macros. A complete list of changes can be found at the end. As this is the second release, I have expanded the live demo to include not one but two separate indexing trees:
clang and mozilla-central, both of which are current as of last night.
The advantage of DXR is that it sees the source code by instrumenting the compiler at build time, so it is able to understand classes most of whose methods are defined by macros, like Clang's RecursiveASTVisitor. Like any source code browser, it is possible to click on an identifier and then go see information about the identifier, such as its points of declaration, or its definition. Unlike LXR and MXR, however, it knows what type of identifier it's working with, so you won't be offered the definition of BuiltinType::getKind if you look up the getKind method of an Expr object.
Another new feature is support for plugins, to allow people to use different compilers, add in support for more languages, or even augment the UI with new features like capturing line-coverage information. The current support is rudimentary, but there is already active work on a dehydra-based indexer.
A list of new features:
Improved search speed at runtime, and dropped glimpse as a prerequisite
The web UI works in newer versions of Firefox
Support for plugins has been added
Disk space requirements have been radically reduced
Setup configuration is easier
Link extents have been fixed in many cases
Declarations can now be seen along with definitions
For the past few days, it appears that Google Groups has stopped mirroring newsgroups properly; this included the various Mozilla mailing lists. It does, however, appear to be forwarding posted messages, so if you send a message via the UI and you don't see anything show up... don't panic, and don't repost it 5 times.
I don't know how long Google Groups will be down, but, in the meantime, you can use your favorite newsreader to read the newsgroups. Heck, Thunderbird has a built in news client. If you wish to use mozilla mailing lists, just add in the server news.mozilla.org. You can then happily see everything that you would be able to see with Google Groups, if it were working.
Task support in Rust has been moving right along. Since about a week ago when we could run 8 infinite loops at a time, we’ve not gotten to the point where we can very inefficiently calculate Fibonacci Numbers in parallel. The task system is also being used for more useful things, like building an asynchronous IO system for Rust.
Tasks in Rust are meant to serve both as a fundamental unit of isolation and for doing things concurrently, asynchronously or in parallel. We expect them to be used frequently, which means they need to be fast and cheap. So far my focus has been on making them work, which means being conservative about grabbing locks (where conservative means grabbing locks more often than needed to err on the side of caution). Now it’s time to start making this quicker. To do this, I need to measure things, so I can say “Look, I changed X and that improved performance by Y%.”
Here are some graphs for a very simple benchmark. It calculates Fibonacci Numbers by spawning a task for each recursive call. The X axis shows which Fibonacci Number we’re calculating, and the Y axis shows the average number of microseconds per task spawned. I ran each number 10 times, and the line connects all the average values. The Windows and Linux benchmarks were done on a Lenovo ThinkPad with a 1.73 GHz Core i7 Processor, and the Mac benchmarks were done on a MacBook Pro with a 2.3 GHz Core i7 Processor.
Oh, great. I’m hitting the extended metaphors already.
The studio apartment that I’m living in for the summer is quite nice, but it came with the dullest knives I have ever used. It’s common to hear that sharp knives are safer than dull knives, but I’m not sure that I agree. Sharp knives are slightly more dangerous to carry around, or wash, or even to look at (though if you get injured looking at a knife, your looking technique needs a lot of work). The advantage that sharp knives have is that they cut things.
If, upon discovering your knives are dull, you decide not to eat fresh produce, then dull knives are, in fact, quite safe, right up until you die of scurvy. On the other hand, if you decide that you really need chopped onions, you’ll wind up moving your dull knives in a sort of mashing motion that’s kind of like a parody of cutting. (Also, you’ll have to stop repeatedly to wipe the tears from your eyes so that you can see.) This is dangerous. It’d be more dangerous to do with a sharp knife, but you don’t, because you could just cut things with a sharp knife.
As you can probably tell, I’m thinking about type safety here.
Most things that people are interested in doing are inherently a little bit unsafe. But if you’re using a crippled tool (even if it was crippled in the name of safety), you’ll wind choosing between (a) doing things the convenient yet appallingly dangerous way, and (b) not actually doing anything (at least at any noticeable speed). For this reason, adding a restriction to a language can endanger users, by forcing them to use unsafe constructs and workarounds.
Rust’s type system’s strength doesn’t lie in having fancy features to prevent array bounds issues or integer overflow. As type systems go, it’s fairly traditional. Its strength is that the type-related features of the language are designed so that the user doesn’t need to (and can’t) subvert the type system. The main guarantee that Rust provides over C++ isn’t the avoidance of some particular kind of error, it’s that when the type system says an expression has type τ, the value it produces actually has type τ. It’s like having a knife rack that doesn’t drop your knives on your feet.
I'm going to go out on a limb and guess that the GNU autotools are one of the most heavily used build systems in existence. What I'm not going to guess is that, as a build system, it sucks. I will grant you that C and C++ code in particular are annoying to compile by virtue of the preprocessor mechanics, and I will also grant that autotools can be useful in trying to work out the hairiness of working on several close-but-not-quite-the-same platforms. But that doesn't justify why you need to make a build system that is so bad.
One of the jobs of autotools (the configure script in particular) is to figure out which compilers to use, how to invoke them, and what capabilities they support. For example, is charsigned or unsigned? Or does the compiler support static_assert(expr) or need to resort to extern void func(int arg[expr ? 1 : -1])? Language standards progress, and, particularly in the case of C++, compilers can be slow to correct their implementations to the language.
The reason I bring this up is because I discovered today that my configure failed because my compiler "didn't produce an executable." Having had to deal with this error several times (my earlier work with dxr spent a lot of time figuring out how to manipulate $CXX and still get a workable compiler), I immediately opened up the log, expecting to have to track down configure guessing the wrong file is the executable (last time, it was a .sql file instead of .o). No, instead the problem was that the compiler had crashed (the reason essentially boiling down to std::string doesn't like being assigned NULL). That reason, this is the program that autoconf uses to check that the compiler works:
(I literally copied it from mozilla's configure script, go look around line 1861 if you don't believe me). That is a problem because it's not legal C99 code. No, seriously, autoconf has decided to verify that my C compiler is working by relying on a feature so old that it's been removed (not deprecated) in a 12-year old specification. While I might understand that there are some incredibly broken compilers out there, I'm sure this line of code is far more likely to fail working compilers than the correct code would be, especially considering that it is probably harder to write a compiler to accept this code than not except it. About the only way I can imagine writing this "test" program to make it more failtastic is to use trigraphs (which is legal C code that gcc does not honor by default). Hey, you could be running on systems that don't have a `#' key, right?
Addendum: Okay, yes, I'm aware that the annoying code is a result of autoconf2.13 and that the newest autoconfs don't have this problem. In fact, after inspecting some source history (I probably have too much time on my hands), the offending program was changed by a merge of the experimental branch in late 1999. But the subtler point, which I want to make clearer, is that the problem with autoconf is that it spends time worrying about arcane configurations that the projects who use them probably don't even support. It also wraps the checks for these configurations in scripts which render the actual faults incomprehensible, including "helpfully" cleaning up after itself so you can't actually see the offending command lines and results. Like, for example, the fact that your compiler never produced a .o file to begin with.
As you may know, I wrote JavaScript in ten days. JS was born under the shadow of Java, and in spite of support by marca and Bill Joy, JS in 1995 was essentially a one-man show.
I had a bit of help, even at the start, that I’d like to acknowledge again. Ken Smith, a Netscape acquiree from Borland, ported JDK 1.0-era java.util.Date (we both just drafted off of the Java truck, per management orders; we did not demur from the Y2K bugs in that Java class). My thanks also to Netscape 2′s front-end hackers, chouck, atotic, and garrett for their support. EDIT: can’t forget spence on the X front end!
That was 1995. Engine prototype took ten days in May. Bytecode compiler and interpreter from the start, because Netscape had a server-side JS product in the works. The rest of the year was browser integration, mainly what became known as “DOM level 0″. Only now standardized in HTML 5 and Anne’s wg. Sentence fragments here show my PTSD from that sprint :-/.
In 1996 I finally received some needed assistance from RRJ, who helped port David M. Gay and Guy Steele’s dtoa.c and fix date/time bugs.
Also in summer 1996, nix interned at Netscape while a grad student at CMU, and wrote the first LiveConnect. I am still grateful for his generous contributions in wide-ranging design discussions and code-level interactions.
At some point in late summer or early fall 1996, it became clear to me that JS was going to be standardized. Bill Gates was bitching about us changing JS all the time (some truth to it; but hello! Pot meet Kettle…). We had a standards guru, Carl Cargill, who knew Jan van den Beld, then the Secretary-General of ECMA (now Ecma). Carl steered our standardization of JS to ECMA.
Joining ECMA and participating in the first JS standards meeting was an eye-opener. Microsoft sent a B-team, and Borland and a company called NOMBAS also attended. “Success has many fathers” was the theme. The NOMBAS founder greeted me by saying “oh, we’ve been doing JavaScript for *years*”. I did not see how that could be the case, unless JS meant any scripting language with C-based syntax. I had not heard of NOMBAS before then.
At that first meeting, I think I did well enough in meta-debate against the Microsoft team that they sent their A-team to the next meeting. This was all to the good, and Microsoft in full-blooded compete mode, but also with individual initiative beyond the call of corporate duty by Shon Katzenberger, materially helped create ES1. Sun contributed Guy Steele, who is composed of pure awesome. Guy even brought RPG for fun to a few meetings (Richard contributed ES1 Clause 4).
Meanwhile, in fall 1996, I was under some pressure from Netscape management to write a proto-spec for JS, but that was not something I could do while also maintaining the “Mocha” engine all by myself in both shipping and future Netscape releases, along with all of the DOM code.
This was a ton of work, and on top of it I had to pay off substantial technical debt that I had willingly taken on in the first year. So I actually stayed home for two weeks to rewrite Mocha as the codebase that became known asSpiderMonkey, mainly to get it done (no other way), also to go on a bit of a strike against the Netscape management team that was still underinvesting in JS. This entailed garbage collection and tagged values instead of slower reference-counting and fat discriminated union values.
Also in fall 1996, chouck decided to join me as the second full-time JS team-mate. He and I did some work targeting the (ultimately ill-fated) Netscape 4 release. This work was ahead of its time. We put the JS engine in a separate thread from the “main thread” in Netscape (still in Mozilla). This allowed us to better overlap JS and HTML/CSS/image computations, years ahead of multicore. You could run an iloop in JS and the “slow script dialog” seamlessly floated above it, allowing you to stop the loop or permit it to continue.
After summer 1996 and the start of ECMA-262 standardization, Netscape finally invested more in JS. Clayton Lewis joined as manager, and hired Norris Boyd, who ended up creating Rhino from SpiderMonkey’s DNA transcoded to Java. This was ostensibly because Netscape was investing in Java on the server, in particular in an AppServer that wanted JS scripting.
I met shaver for the first time in October 1996 at Netscape’s NY-based Developer Conference, where he nimbly nerd-blocked some Netscape plugin API fanboys and saved me from having to digress from the main thing, which was increasingly JS.
Some time in 1997, shaver contributed “LiveConnect 2″, based on more mature Java reflection APIs not available to nix in 1996. Clayton hired shaver and the JS team grew large by end of 1997, when I decided to take a break from JavaScript (having delivered ES1 and ES2) and join the nascent mozilla.org.
I handed the keys to the JS kingdom to Waldemar Horwat, now of Google, in late 1997. Waldemar did much of the work on ES3, and threw his considerable intellect into JS2/ES4 afterwards, but without overcoming the market power and stalling tactics of Microsoft.
True story: Waldemar’s Microsoft nemesis on TC39 back then, at the time a static language fan who hated JS, has come around and now endorses JS and dynamic languages.
Throughout all of this, I maintained module ownership of SpiderMonkey.
Fast-forward to 2008. After a great (at the time) Firefox 3 release where @shaver and I donned the aging super-hero suits one more time to compete successfully on interpreter performance against JavaScriptCore in WebKit, Andreas Gal joined us for the summer in which we hacked TraceMonkey, which we launched ahead of Chrome and V8.
A note on V8: I’d learned of it in 2006, when I believe it was just starting. At that point there was talk about open-sourcing it, and I welcomed the idea, encouraging any of: hosting on code.google.com, hosting without any pressure to integrate into Firefox on mozilla.org (just like Rhino), or hosting with an integration plan to replace SpiderMonkey in Firefox. I had to disclose that another company was about to release their derived-from-JS engine to Mozilla, but my words included “the more the merrier”. It was early days as far as JS JITs were concerned.
V8 never open-sourced in 2006, and stealthed its way to release in September 2008. This may have been a prudent move by Google to avoid exciting Microsoft. Clearly, in 1995, the “Netscape + Java kills Windows” talk from Netscape antagonized Microsoft. I have it on good authority that a Microsoft board member wrote marca at the end of 1995 warning “you’ve waved the cape in the bull’s face — prepare to get the horns!” One could argue that Chrome in 2008 was the new red cape in the bull’s face, which begot IE9 and Chakra.
Whatever Google’s reasoning, keeping V8 closed-source for over two years hurt JS in this sense: it meant Apple and Mozilla had to climb the JIT learning curves on their own (at first; then finally with the benefit of being able to inspect V8 sources). Sure, the Anamorphic work on Self and Smalltalk was somewhat documented, and I had learned it in the ’90s, in part with a stint on loan from Netscape to Sun when they were doing due dliigence in preparation for acquiring Anamorphic. But the opportunity to build on a common engine codebase was lost to path dependence.
On the upside, different competing open source engines have demonstrably explored a larger design space than one engine codebase could under consolidated management.
In any event, the roads not taken in JS’s past still give me pause, because similar roads lie ahead. But the past is done, and once we had launched TraceMonkey, and Apple had launched SquirrelFish Extreme, the world had multiple proofs along with the V8 release that JS was no longer consigned to be “slow” or “a toy”, as one referee dismissed it in rejecting a PLDI submission from Andreas in 2006.
You know the rest: JS performance has grown an order of magnitude over the last several years. Indeed, JS still has upside undreamed of in the Java world where 1% performance win is remarkable. And, we are still at an early stage in studying web workloads, in order to synthesize credible benchmarks. On top of all this, the web is still evolving rapidly, so there are no stable workloads as far as I can tell.
Around the time TraceMonkey launched, Mozilla was lucky enough to hire Dave Mandelin, fresh from PhD work at UCB under Ras Bodik.
The distributed, open source Mozilla JS team delivered the goods in Firefox 4, and credit goes to all the contributors. I single Dave out here because of his technical and personal leadership skills. Dave is even-tempered, super-smart, and a true empirical/skeptical scientist in the spirit of my hero, Richard Feynman.
So it is with gratitude and more than a bit of relief, after a very long 16 years in full, 13 years open source, that I’m announcing the transfer of SpiderMonkey’s module ownership to @dmandelin.
I am pleased to announce an alpha release of DXR built on top of Clang. A live demo of DXR can be found at http://dxr.mozilla.org/clang/, which is an index of a relatively recent copy of the Clang source code. Since this is merely an alpha release, expect to find bugs and inconsistencies in the output. For more information, you can go to #static on irc.mozilla.org or contact the staticanalysis mailing list. A list of most of the bugs I am aware of is at the end of this blog post.
So what is DXR? DXR is a smart code browser that works by using instrumented compilers to use what the compiler knows about the code to provide a database of the code. For C and C++ in particular, using an instrumented compiler is necessary, since it is the only reliable way to fix the issue of macros. Take, for instance, RecursiveASTVistor in the Clang codebase. Most of the almost 1200 functions are defined via macros as opposed to in raw code; as a consequence, the doxygen output for this class is useless: as far as I can tell, there are only five methods I can override to visit AST nodes. On the other hand, DXR neatly tells me all of the methods that are defined, and can point me to the place where that function is defined (within the macro, of course).
Where can you get the code? DXR is available both as a github repository (use the dxr-clang branch) and as a Mercurial repository. Instructions on how to use can be found on the wiki page.
The following is a list of known problems:
Links occur at odd boundaries
Some lines have id="l234"/> prepended
Non-root installs (i.e., installing to http://dxr.mozilla.org/clang/) cause issues. Interestingly, refreshing the page often causes things to work.
There is a long list of scrolling text when compiling code. Ignore it.
HTML generation produces IndexErrors
.csv files are created in the source directory and HTML code is generated.
Inheritance searches don't match the full hierarchy, only one or two levels.
(My posts with the tag mozilla,
such as this one, are now being syndicated onto Planet Mozilla Research along with those of my colleagues. Yay!)
This week will have been my fourteenth week (!) on the Rust team at Mozilla. Fourteen
weeks is about the length of a normal internship, so now seems like a
good time to take stock of what I've accomplished so far. When I left off writing about Rust in April,
I'd just spent a couple of weeks learning my way around PLT Redex and getting started
on a Redex model of Rust. Working in Redex was a lot of fun, but I felt rather disconnected from the rest of the team, who were
all hacking on, you know, the actual implementation of Rust.
So it was good to get back to working on Rust proper at the end of
those two weeks. Of course, Rust is changing so fast that it's easy
to lose track of what's going on if you stop paying attention for even
a moment, so when I came back to Rust it was another week or so of catching up before
I was really ready to start writing code again.
If you're new here, Rust is a new systems programming language
pursuing the trifecta of safe, concurrent, and
fast. It has an expressive type system, pattern matching,
concurrency primitives, and a lightweight, structural object system.
My project for the summer is focused on the object system and type system: I'm
designing and implementing support for "self" expressions in Rust,
which involves having a notion of self-types, or "the type of the
current object". Figuring out an appropriate type for the current
object is surprisingly subtle, especially in the presence of object
extension. For instance, suppose that you add some more methods or
fields to an instance of an object, then call a method on the extended
object. If the method you call happens to return self,
what's the type of that return value? Is it the type of the extended
object, or the type of the original one? Ideally, it would be the
type of the extended object, which is a more precise type, but some
languages, like Java, aren't able to statically determine that the
returned value has the more precise type. (Apparently, it's possible
to fake
that behavior using generics, but it ain't pretty.) It would be
great if Rust could do better.
Since object extension is one of the things that makes determining
the type of self hard, I decided to work on object
extension first. This alone has taken a pretty long time. Rust
compiles to the LLVM intermediate language, and I was forced to
confront my ignorance of how our translation to LLVM works.1 As it turns out, it's pretty hairy -- I spent at least a couple of the last eight weeks
doing nothing but staring at the translation pass of the compiler and
adding hundreds of lines of comments to it. But I can finally report that as of
yesterday, Rust supports extending an object with new methods! Here's
a complete, if rather uninteresting, Rust program that uses the new
feature:
use std;
fn main() {
obj normal() {
fn foo() -> int { ret 2; }
}
auto my_normal_obj = normal();
// Extending an object with a new method
auto my_anon_obj = obj {
fn bar() -> int {
ret 3;
}
with my_normal_obj
};
assert (my_normal_obj.foo() == 2);
assert (my_anon_obj.bar() == 3);
auto another_anon_obj = obj {
fn baz() -> int {
ret 4;
}
with my_anon_obj
};
assert (another_anon_obj.baz() == 4);
}
Here, the expression we're assigning to my_anon_obj is
what we're calling an "anonymous object expression" -- it takes
my_normal_obj, which is an instance of the object
normal, adds a new method to it, and evaluates to a
brand-new object. (Note the with my_normal_obj part.)
It's not necessary to call any kind of constructor -- the new object
simply springs into being. This seems obvious in retrospect, but it
caused me a lot of confusion when I was trying to figure out how to
translate anonymous object expressions. When normal objects such as
normal are translated, the translation is side-effecting:
it causes an object constructor function to end up in the generated
executable. Translating anonymous object expressions, on the other
hand, doesn't do that -- it produces an object "inline", and no
constructor is generated. The resulting extended object is still a
bona fide object, though, like one you'd get from a constructor, so it
can be extended again with more methods, producing
another_anon_obj.
There's still a lot of work to do here -- as you can see, I'm not
doing anything with self yet. Also conspicuously absent
are calls like, say, my_anon_obj.foo(), which
should work fine but which doesn't yet. To do that, I need
to put so-called "forwarding slots" in the anonymous object's vtable
that forward method calls to the appropriate methods in the object it
extended. And we also want to support field extension -- so those
things are all on my plate for the rest of the summer. I have nine weeks to go -- we'll see how far I get!
I'm also excited about continuing to work on Rust when the summer
is over. By August, I think I'll be in an unusual position as someone
who's familiar with the Rust compiler internals, but who has a
theoretical background and wants to do theoretical/foundational work. One thing I'm wondering about, for instance, is how well Rust's
translation to LLVM preserves the semantics of types. My guess is
that since many Rust types compile to one LLVM type, the translation is not particularly semantics-preserving -- so does a
semantics-preserving encoding exist in the LLVM type system, and if not, how much more
sophisticated would LLVM's type system have to be to make it possible?
(And a semantics for LLVM doesn't exist yet, which by itself is a huge
open problem that I think a lot of
people would like to have a solution for.)
In the more immediate term, though, I'm taking tomorrow off work to compete in the ICFP Programming Contest, which starts today at 5 p.m. local time and goes until 5 p.m. on Sunday. This will be the third year that Alex oniugnip and I have competed
as Team K&R2. As soon as our long weekend of hacking is over, Alex is headed off to a conference in Portland. He'll have something like three hours between the time the contest ends and the time his flight leaves, which meets our usual standards for ridiculousness. Who else is playing?
In fact, I didn't know anything about the LLVM compiler infrastructure, so I started by reading the chapter about LLVM from the recently published Architecture of Open Source Applications book. I highly recommend this book; everything I've read in it has been good, and it's available free online. They're also accepting contributions of more chapters.
For various reasons, I want to get the extent of a particular expression in clang. Most AST objects in clang have a method along the lines of getSourceRange(), which, on inspection, returns start and end locations for the object in question. Naturally, it's the perfect fit (more or less), so I start using it. However, it turns out that the documentation is a bit of a liar. I would expect the end location to return the location (either file offset or file/line) of the end of the expression, give or take a character depending on whether people prefer half-open or fully closed ranges. Instead, it returns the location of the last token in the expression. Getting the true last token requires this mess: sm.getFileOffset(Lexer::getLocForEndOfToken(end, 0, sm, features)). Oh yeah, in case you couldn't have guessed, that causes it to relex the file starting at that point. For an idea of my current mood, pick a random image from Jennifer's recent post and you'll likely capture it.
It's been almost a week since I last discussed DXR, but, if you look at the commit log, you can see that I've not exactly been laying back and doing nothing. No, the main reason is because most of the changes I've been doing so far aren't exactly ground-breaking.
In terms of UI, the only thing that's changed is that I know actually link the variable declarations correctly, a result of discovering the typo that was causing it not to work properly in the midst of the other changes. From a user's perspective, I've also radically altered the invocation of DXR. Whereas before it involved a long series of steps consisting of "build here, export these variables, run this script, build there, run that script, then run this file, and I hope you did everything correctly otherwise you'll wait half an hour to find you didn't" (most of which takes place in different directories), it's now down to . dxr/setup-env.sh (if you invoke that incorrectly, you get a loud error message telling you how to do it correctly. Unfortunately, the nature of shells prohibits me from just doing the right thing in the first place), build your code, and then python dxr/dxr-index.py in the directory with the proper dxr.config. Vastly improved, then. But most of my changes are merely in cleaning up the code.
The first major change I made was replacing the libclang API with a real clang plugin. There's only one thing I've found harder (figuring out inheritance, not that I'm sure I've had it right in the first place), and there are a lot of things I've found easier. Given how crazy people get with build systems, latching onto a compiler makes things go a lot smoother. The biggest downside I've seen with clang is that it's documentation is lacking. RecursiveASTVisitor is the main visitor API I'm using, but the documentation for that class is complete and utter crap, a result of doxygen failing to handle the hurdle of comprehending macros. I ended up using the libindex-based dxr to dump a database of all of the functions in the class, of which there are something like 1289, or around 60 times the functions listed in the documentation. Another example is Decl, where the inheritance picture is actually helpful. It, however, manages to have failed to document DeclCXX.h, which is a rather glaring omission if what you are working with is C++ source.
The last set of changes I did was rearchitecting the code to make it hackable by other people. I have started on a basic pluggable architecture for actually implementing new language modules, although most of the information is still hardcoded to just use cxx-clang. In addition, I've begun work on ripping out SQL as the exchange medium of choice: the sidebar list is now directly generated using the post-processing ball of information, and linkification is now set up so that it can be generated independent of tokenization. In the process, I've greatly sped up HTML generation times only to regress a lot of it by tokenizing the file twice (the latter part will unfortunately remain until I get around to changing tokenization API). It shouldn't take that much longer for me to rip SQL fully out of the HTML builder and shove SQL generation into a parallel process for improved end-to-end time.
Some things I've discovered about python along the way. First, python closures don't let you mutate closed-over-variables. That breaks things. Second, if you have a very large python object, don't pass it around as an argument to multiprocessing: just make it a global and let the kernel's copy-on-write code make cloning cheap. Third, apparently the fastest way to concatenate strings in python is to use join with a list comprehension. Finally, python dynamic module importing sucks big time.
Where am I going from here? I'll have HTMLification (i.e., remove SQL queries and replace with ball lookups) fixed up tomorrow; after that, I'll make the cgi scripts work again. The next major change after that is getting inheritance and references in good shape, and then making sure that I can do namespaces correctly. At that point in time, I'll think I'll be able to make a prerelease of DXR.
This is perhaps a bit belated, but I thought I'd give a brief overview of where I think fakeserver should be going over the next several months (i.e., when I find time to work on it). For those who don't know, Fakeserver is a generalized testing framework for the 4 major mail protocols: IMAP, NNTP, SMTP, and POP. For various reasons, I'm actually getting around to fixing problems with it right now. The following are the features I am going to be adding:
This is a long-standing bug in fakeserver, which I first discovered about 3 years ago, when I started testing the IMAP implementation. Fakeserver uses the same connection handler for every connection, which means the state of the connection is shared between all connections, obviously very problematic for a protocol as stateful as IMAP. In my defense, I cribbed from the stateless (more or less) HTTP server, and used the barely-stateful NNTP as my first testbed. This bug is probably the most API-breaking of any change to fakeserver code: every invocation of the server must replace new nsMailServer(new handler(daemon)) with new nsMailServer(function (d) { return new handler(d); }, daemon). Subsequent to that, handlers are no longer easily exposed by the server, which means all manipulation must be on the daemon end. This causes major changes to SMTP code and NNTP code for different reasons; pretty much everyone else hides between helper functions that act as a firewall to minimizing code changes.
This change (which has no patch yet) will move the fakeserver running into another process. I have some working prototypes for manipulating objects across two different processes, but I don't yet have a good IPC mechanism (which Mozilla strangely lacks) for actually communicating this information. As for API changes, I haven't gotten far enough yet to know the impact of multiprocess fakeserver, but the answer will probably that handler information goes from "hard to get" to "impossible to get". On the plus side, this should essentially enable mozmill tests to be able to use fakeserver.
I figure fakeserver needs SSL support--it's common in the mail world to use SSL instead of plaintext, so we need to support it. In terms of external API, the only change will be some added methods to start up an SSL server (and methods to get STARTTLS working to, provided my knowledge of how SSL works is correct). The API pseudo-internal to maild.js gets more convoluted, though: essentially, I need to support multiple pumping routines for communication. On the plus side, though, if the API I'm planning on implementing turns out to be feasible, we should also be able to get generic charset decoding for nearly free, for i18n tests.
LDAP address book code is almost completely untested. I'd like to see that fixed. However, LDAP is a binary protocol (BER, to be precise). If I can implement the SSL support in the API framework I want to, then it shouldn't be that much more to get a BER-to-text-ish layer thrown in on top of it. The downside is that I'm pretty much looking at using our own LDAP library to do BER encoding/decoding since I don't want to write that myself.
A primary goal of DXR is to be easy to use. Like any simple statement, translating that into design decisions is inordinately difficult, and I best approach such issues by thinking out loud. Normally, I do this in IRC channels, but today I think it would be best to think in a louder medium, since the problem is harder.
There are three distinct things that need to be made "easy to use". The first is the generation of the database and the subsequent creation of the web interface. The second is extension of DXR to new languages, while the last is the customization of DXR to provide more information. All of them have significant issues.
Building with DXR
Starting with the first one, build systems are complicated. For a simple GNU-ish utility, ./configure && make is a functioning build system. But where DXR is most useful is on the large, hydra-like projects where figuring out how to build the program is itself a nightmare: Mozilla, OpenJDK, Eclipse, etc. There is also a substantial number of non-autoconf based systems which throw great wrenches in everything. At the very least, I know this much about DXR: I need to set environment variables before configuring and building (i.e., replace the compiler), I need to "watch" the build process (i.e., follow warning spew), and I need to do things after the build finishes (post-processing). Potential options:
Tell the user to run this command before the build and after the build. On the plus side, this means that DXR needs to know absolutely nothing about how the build system works. On the down side, this requires confusing instructions: in particular, since I'm setting environment variables, the user has to specifically type ". " in the shell to get them set up
properly. There are a lot of people who do not have significant shell exposure to actually understand why that is necessary, and general usage is different enough from the commands that people are liable to make mistakes doing so.
Guess what the build system looks like and try to do it all by ourselves. This is pretty much the opposite extreme, in that it foists all the work on DXR. If your program is "normal", this won't be a problem. If your program isn't... it will be a world of pain. Take also into consideration that any automated approach is likely to fail hard on Mozilla code to begin with, which effectively makes this a non-starter.
Have the user input their build system to a configuration file and go from there. A step down from the previous item, but it increases the need for configuration files.
Have DXR spawn a shell for the build system. Intriguing, solves some problems but causes others.
Conclusion: well, I don't like any of those options. While the goal of essentially being able to "click" DXR and have it Just Work™ is nice, I have reservations about such an approach being able to work in practice. I think I'll go for a "#1 and punt on this issue to someone with more experience."
Multiple language
I could devote an entire week's worth of blog posts to this topic, I think, and I would wager that this is more complicated and nebulous than even build systems are. In the end, all we really need to worry about with build systems is replacing compilers with our versions and getting to the end; with languages, we actually need to be very introspective and invasive to do our job.
Probably the best place to start is actually laying out what needs to be done. If the end goal is to produce the source viewer, then we need to at least be able to do syntax highlighting. That by itself is difficult, but people have done it before: I think my gut preference at this point is to basically ask authors of DXR plugins to expose something akin to vim's syntax highlighting instead of asking them to write a full lexer for their language.
On the other end of the spectrum is generating the database. The idea is to use an instrumenting compiler, but while that works for C++ or Java, someone whose primary code is a website in Perl or a large Python utility has a hard time writing a compiler. Perhaps the best option here is just parsing the source code when we walk the tree. There is also the question about what to do with the build system: surely people might want help understanding what it is their Makefile.in is really doing.
So what does the database look like? For standard programming languages, we appear to have a wide-ranging and clear notion of types/classes, functions, and variables, with slightly vaguer notions of inheritance, macros (in both the lexical preprocessing sense and the type-based sense of C++'s templates), and visibility. Dynamic languages like JavaScript or Python might lack some reliable information (e.g., variables don't have types, although people often still act as if they have implicit type information), but they still uphold this general contract. If you consider instead things like CSS and HTML or Makefiles in the build system, this general scheme completely fails to hold, but you can still desire information in the database: for example, it would help to be able to pinpoint which CSS rules apply to a given HTML element.
This begs the question, how does one handle multiple languages in the database? As I ponder this, I realize that there are multiple domains of knowledge: what is available in one language is not necessarily available in another. Of the languages Mozilla uses, C, assembly, C++, and Objective-C[++] all share the same ability to access any information written in the other languages; contrast this to JS code, which can only interact with native code via the use of a subset of IDL or interactions with native functions. IDL is a third space, which is a middle ground between native and JS code, but is insufficiently compatible with either to be lumped in with one. Options:
Dump each language into the same tables with no distinction. This has problems in so far as some languages can't be shoehorned into the same models, but I think that in such cases, one is probably looking for different enough information anyways that it doesn't matter. The advantage of this is that searching for an identifier will bring it up everywhere. The disadvantage... is that it gets brought up everywhere.
Similar to #1, but make an extra column for language, and let people filter by language.
Going a step further, take the extra language information and build up the notion of different bindings: this foo.bar method on a python object may be implemented by this Python_foo_bar C binding. In other words, add another table which lists this cross-pollination and takes it into account when searching or providing detailed information<.
Instead of the language column in tables, make different tables for every language.
Instead of tables, use databases?
Hmm. I think the binding cross-reference is important. On closer thought, it's not really languages themselves that are the issue here, it's essentially the target bindings: if we have a system that is doing non-trivial build system work that involves cross-compiling, it matters if what we are doing is being done for the host or being done for the target. Apart from that, I think right now that the best approach is to have different tables.
Extraneous information
The previous discussion bleeds into this final one, since they both ultimately concern themselves with one thing: the database. This time, the question is how to handle generation of information beyond the "standard" set of information. Information, as I see it, comes in a few forms. There is additional information at the granularity of identifiers (this function consumes 234 bytes of space or this is the documentation for the function), lines (this line was not executed in the test suite), files (this file gets compiled to this binary library), and arguably directories or other concepts not totally mappable to constructs in the source tree (e.g., output libraries).
The main question here is not on the design of the database: it's only a question of extra tables or extra columns (or both!). Instead, the real question is in the design of the programmatic mechanisms. In dxr+dehydra, the simple answer is to load multiple scripts. For dxr+clang, however, the question becomes a lot more difficult since the code is written in C++ and isn't dynamically loading modules like dehydra does. It also begins to beg the question of the exposed API. On the other hand, I'm not sure I know enough of the problem space to be able to actually come up with solutions. I think I'll leave this one for later