Mike Hommey: Analyzing git-cinnabar memory use

In a previous post, I looked at the allocations git-cinnabar makes. While I had the data, I figured I’d also look at how the memory use correlates with expectations based on repository data, to put things in perspective.

As a reminder, this is what the allocations look like (horizontal axis being the number of allocator function calls):

There are 7 different phases happening during a git clone using git-cinnabar, most of which can easily be identified on the graph above:

  • Negotiation.

    During this phase, git-cinnabar talks to the mercurial server to determine what needs to be pulled. Once that is done, a getbundle request is emitted, whose response is read in the next three phases. This phase is essentially invisible on the graph.

  • Reading changeset data.

    The first thing that a mercurial server sends in the response for a getbundle request is changesets. They are sent in the RevChunk format. Translated to git, they become commit objects. But to create commit objects, we need the entire corresponding trees and files (blobs), which we don’t have yet. So we keep this data in memory.

    In the git clone analyzed here, there are 345643 changesets loaded in memory. Their raw size in RawChunk format is 237MB. By the end of this phase, I think we made 20 million allocator calls, and have about 300MB of live data in about 840k allocations. (There is no certainty here, because I don’t actually have definite data that would allow me to correlate the phases with allocator calls, and the memory usage change between this phase and the next is not as clear-cut as with other phases.) This puts us at less than 3 live allocations per changeset, with “only” about 60MB of overhead over the raw data.

  • Reading manifest data.

    In the stream we receive, manifests follow changesets. Each changeset points to one manifest; several changesets can point to the same manifest. Manifests describe the content of the entire source code tree in a similar manner to git trees, except they are flat (there’s one manifest for the entire tree, where git trees would reference other git trees for subdirectories). And like git trees, they only map file paths to file SHA1s. The way they are currently stored by git-cinnabar (which is planned to change) requires knowing the corresponding git SHA1s for those files, and we haven’t got those yet, so again, we keep everything in memory.

    In the git clone analyzed here, there are 345398 manifests loaded in memory. Their raw size in RawChunk format is 1.18GB. By the end of this phase, we made 23 million more allocator calls, and have about 1.52GB of live data in about 1.86M allocations. We’re still at less than 3 live allocations for each object (changeset or manifest) we’re keeping in memory, and barely over 100MB of overhead over the raw data, which, on average, puts the overhead at about 150 bytes per object.

    The three phases so far are relatively fast and account for a small part of the overall process, so the boundaries between them don’t appear clear-cut, and they don’t take much space on the graph.

  • Reading and Importing files.

    After the manifests, we finally get file data, grouped by path, such that we get all the file revisions of e.g. .cargo/.gitignore, followed by all the file revisions of .cargo/config.in, .clang-format, and so on. The data here doesn’t depend on anything else, so we can finally import it directly.

    This means that for each revision, we actually expand the RawChunk into the full file data (RawChunks contain patches against a previous revision), and don’t keep the RawChunk around. We also don’t keep the full data after it was sent to the git-cinnabar-helper process (as far as cloning is concerned, it’s essentially a wrapper for git-fast-import), except for the previous revision of the file, which is likely the patch base for the next revision.

    We do, however, keep one or two things in memory for each file revision: a mapping between its mercurial SHA1 and the git SHA1 of the imported data, and, when there is one, the file metadata (containing information about file copies/renames) that lives as a header in the file data in mercurial, but can’t be stored in the corresponding git blobs, otherwise we’d have irrelevant data in checkouts.

    On the graph, this is where there is a steady and rather long increase of both live allocations and memory usage, the latter in stair-steps.

    In the git clone analyzed here, there are 2.02M file revisions, 78k of which have copy/move metadata for a cumulated size of 8.5MB of metadata. The raw size of the file revisions in RawChunk format is 3.85GB. The expanded data size is 67GB. By the end of this phase, we made 622 million more allocator calls, and peaked at about 2.05GB of live data in about 6.9M allocations. Compared to the beginning of this phase, that added about 530MB in 5 million allocations.

    File metadata is stored in memory as python dicts with 2 entries each, instead of in raw form, for convenience and future-proofing. That would be at least 3 allocations each: one for each value, one for the dict, and maybe one for the dict storage; their keys are all the same and are probably interned by python, so they wouldn’t count.

    As mentioned above, we store a mapping of mercurial to git SHA1s, so for each file that makes 2 allocations, 4.04M total. Plus the 230k or 310k from metadata. Let’s say 4.45M total. We’re short 550k allocations, but considering the numbers involved, it would take less than one extra allocation per file on average to make up the difference.

    As for memory size, per this answer on stackoverflow, python strings have an overhead of 37 bytes, so each SHA1 (kept in hex form) will take 77 bytes (note, that’s partly why I didn’t particularly care about storing them in binary form; that would only save 25%, not 50%). That’s 311MB just for the SHA1s, to which the size of the mapping dict needs to be added. If it were a plain array of pointers to keys and values, it would take 2 * 8 bytes per file, or about 32MB. But that would be a hash table with no room for more items. (By the way, I suspect the stairs that can be seen on the requested and in-use bytes are the hash table being realloc()ed.) Plus at least 290 bytes per dict for each of the 78k metadata dicts, which is an additional 22MB. All in all, 530MB doesn’t seem too much of a stretch.
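
    These per-object sizes are easy to check with sys.getsizeof. A minimal sketch; the exact figures below assume a 64-bit CPython 2.7 build and vary between versions:

    import sys

    # 37 bytes of str overhead on 64-bit CPython 2.7: a 40-character hex
    # SHA1 takes 37 + 40 = 77 bytes, while the 20-byte binary form would
    # take 57, hence the ~25% (not 50%) saving mentioned above.
    print(sys.getsizeof('0' * 40))  # 77
    print(sys.getsizeof('0' * 20))  # 57

    # A small dict like the 2-entry metadata dicts weighs a few hundred
    # bytes on its own (around 280 on this build), before counting the
    # values it points to. The keys here are hypothetical.
    print(sys.getsizeof({'k1': 'v1', 'k2': 'v2'}))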

  • Importing manifests.

    At this point, we’re done receiving data from the server, so we begin by dropping objects related to the bundle we got from the server. On the graph, I assume this is the big dip that can be observed after the initial increase in memory use, bringing us down to 5.6 million allocations and 1.92GB.

    Now begins the most time-consuming process, as far as mozilla-central is concerned: transforming the manifests into git trees, while also storing enough data to be able to reconstruct manifests later (which is required to be able to pull from the mercurial server after the clone).

    So for each manifest, we expand the RawChunk into the full manifest data, and generate new git trees from that. The latter is mostly performed by the git-cinnabar-helper process. Once we’re done pushing data about a manifest to that process, we drop the corresponding data, except when we know it will be required later as the delta base for a subsequent RevChunk (which can happen in bundle2).

    As with file revisions, for each manifest, we keep track of the mapping of SHA1s between mercurial and git. We also keep a DAG of the manifests history (contrary to git trees, mercurial manifests track their ancestry; files do too, but git-cinnabar doesn’t actually keep track of that separately; it just relies on the manifests data to infer file ancestry).

    On the graph, this is where the number of live allocations increases while both requested and in-use bytes decrease, noisily.

    By the end of this phase, we made about 1 billion more allocator calls. Requested allocations went down to 1.02GB, for close to 7 million live allocations. Compared to the end of the dip at the beginning of this phase, that added 1.4 million allocations, and released 900MB. By now, we expect everything from the “Reading manifests” phase to have been released, which means we allocated around 620MB (1.52GB – 900MB), for a total of 3.26M additional allocations (1.4M + 1.86M).

    We have a dict for the SHA1s mapping (345k * 77 * 2 for strings, plus the hash table with 345k items, so at least 60MB), and the DAG, which, now that I’m looking at memory usage, I figure possibly has one of the worst structures, using 2 sets for each node (at least 232 bytes per set, so that’s at least 160MB, plus 2 hash tables with 345k items). I think 250MB for those data structures would be a large underestimate. It’s not hard to imagine them taking 620MB, because really, that DAG implementation is awful. The number of allocations expected from them would be around 1.4M (4 * 345k), but I might be missing something. That’s way less than the actual number, so it would be interesting to take a closer look, but not before doing something about the DAG itself.

    Fun fact: the amount of data we’re dealing with in this phase (the expanded size of all the manifests) is close to 2.9TB (yes, terabytes). With about 4700 seconds spent on this phase on a real clone (less with the release branch), we’re still handling more than 615MB per second.

  • Importing changesets.

    This is where we finally create the git commits corresponding to the mercurial changesets. For each changeset, we expand its RawChunk, find the git tree we created in the previous phase that corresponds to the associated manifest, and create a git commit for that tree, with the right date, author, and commit message. For data that appears in the mercurial changeset that can’t be stored or doesn’t make sense to store in the git commit (e.g. the manifest SHA1, the list of changed files[*], or some extra metadata like the source of rebases), we keep some metadata we’ll store in git notes later on.

    [*] Fun fact: the list of changed files stored in mercurial changesets does not necessarily match the list of files in a `git diff` between the corresponding git commit and its parents, for essentially two reasons:

    • Old buggy versions of mercurial have generated erroneous lists that are now there forever (they are part of what makes the changeset SHA1).
    • Mercurial may create new revisions for files even when the file content is not modified, most notably during merges (but that also happened on non-merges due to, presumably, bugs).
    … so we keep it verbatim.

    On the graph, this is where both requested and in-use bytes are only slightly increasing.

    By the end of this phase, we made about half a billion more allocator calls. Requested allocations went up to 1.06GB, for close to 7.7 million live allocations. Compared to the end of the previous phase, that added 700k allocations, and 400MB. By now, we expect everything from the “Reading changesets” phase to have been released (at least the raw data we kept there), which means we may have allocated at most around 700MB (400MB + 300MB), for a total of 1.5M additional allocations (700k + 840k).

    All of this is extra data we keep for the next and final phase. It’s hard to evaluate the exact size we’d expect here in memory, but if we divide by the number of changesets (345k), that’s less than 5 allocations per changeset and less than 2KB per changeset, which is low enough not to raise eyebrows, at least for now.

  • Finalizing the clone.

    The final phase is where we actually go ahead storing the mappings between mercurial and git SHA1s (all 2.7M of them), the git notes where we store the data necessary to recreate mercurial changesets from git commits, and a cache for mercurial tags.

    On the graph, this is where the requested and in-use bytes, as well as the number of live allocations peak like crazy (up to 21M allocations for 2.27GB requested).

    This is very much unwanted, but easily explained by the current state of the code. The mappings between mercurial and git SHA1s are stored via a tree similar to how git notes are stored. So for each mercurial SHA1, we have a file that points to the corresponding git SHA1, through git links for commits or directly for blobs (look at the output of git ls-tree -r refs/cinnabar/metadata^3 if you’re curious about the details). If I remember correctly, it’s faster if the tree is created with an ordered list of paths, so the code created a list of paths, and then sorted it to send commands to create the tree. The former creates a new str of length 42 and a tuple of 3 elements for each and every one of the 2.7M mappings. With the 37-byte overhead per str instance and the 56 + 3 * 8 bytes per tuple, we have at least 429MB wasted. Creating the tree itself keeps the corresponding fast-import commands in a buffer, where each command is going to be a tuple of 2 elements: a pointer to a method, and a str of length between 90 and 93. That’s at least another 440MB wasted.
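
    For the first half, one way to avoid most of that waste (a sketch of the general idea, not necessarily the fix that actually landed) is to sort the existing key strings directly and format each path on the fly, so no per-mapping tuple or path str stays alive. path_for() and emit() below are hypothetical stand-ins:

    # Wasteful: one new 42-char str and one tuple per mapping, all alive
    # at once while the list is sorted.
    #   entries = [(path_for(hg), git) for hg, git in mapping.iteritems()]
    #   entries.sort()

    # Cheaper: sorted() only builds a list of pointers to the existing
    # key strings, and each path str is short-lived. This assumes
    # path_for() preserves the SHA1 sort order, which it does for a
    # simple "ab/cdef..." style split.
    def send_tree_commands(mapping, emit, path_for):
        for hg_sha1 in sorted(mapping):
            emit(path_for(hg_sha1), mapping[hg_sha1])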

    I already fixed the first half, but the second half still needs addressing.

Overall, except for the stupid spike during the final phase, the manifest DAG and the glibc allocator runaway memory use described in previous posts, there is nothing terribly bad with the git-cinnabar memory usage, all things considered. Mozilla-central is just big.

The spike is already half addressed, and work is under way for the glibc allocator runaway memory use. The manifest DAG, interestingly, is actually mostly useless. It’s only used to track the heads of the DAG, and it’s very much possible to track heads of a DAG without actually storing the entire DAG. In fact, that’s what git-cinnabar already does for changeset heads… so we would only need to do the same for manifest heads.
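
Tracking the heads of a DAG without storing it is indeed simple, as long as nodes arrive parents-first (topological order), which is the case for the data git-cinnabar receives. A minimal sketch, not git-cinnabar’s actual code:

heads = set()

def add_node(node, parents):
    # A parent that has a child is, by definition, not a head.
    heads.difference_update(parents)
    heads.add(node)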

One could argue that the 1.4GB of raw RevChunk data we’re keeping in memory for later use could be kept on disk instead. I haven’t done this so far because I didn’t want to have to handle temporary files (and answer questions like “where to put them?”, “what if there isn’t enough disk space there?”, “what if disk access is slow?”, etc.). But the majority of this data is from manifests. I’m already planning changes in how git-cinnabar stores manifests data that will actually allow importing them directly, instead of keeping them in memory until files are imported. This would instantly remove 1.18GB of memory usage. The downside, however, is that this would be more CPU intensive: importing changesets will require creating the corresponding git trees, and getting the stored manifest data. I think it’s worth it, though.

Finally, one thing that isn’t obvious here, but that was found while analyzing why RSS would be going up despite memory usage going down, is that git-cinnabar is doing way too many reallocations and substring allocations.

So let’s look at two metrics that hopefully will highlight the problem:

  • The cumulated amount of requested memory. That is, the sum of all sizes ever given to malloc, realloc, calloc, etc.
  • The compensated cumulated amount of requested memory (naming is hard). That is, the sum of all sizes ever given to malloc, calloc, etc., except realloc. For realloc, we only count the delta between the size before and the size after the realloc.

Assuming all the requested memory is filled at some point, the former gives us an upper bound to the amount of memory that is ever filled or copied (the amount that would be filled if no realloc was ever in-place), while the latter gives us a lower bound (the amount that would be filled or copied if all reallocs were in-place).

Ideally, we’d want the upper and lower bounds to be close to each other (indicating few realloc calls), and the total amount at the end of the process to be as close as possible to the amount of data we’re handling (which we’ve seen is around 3TB).
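
Computing both metrics from an allocation log is straightforward. A sketch, assuming each event has already been parsed into a (kind, old_size, new_size) tuple, where old_size is only meaningful for realloc events:

def requested_totals(events):
    cumulated = 0    # upper bound: every requested size counts in full
    compensated = 0  # lower bound: reallocs only count their growth
    for kind, old_size, new_size in events:
        cumulated += new_size
        if kind == 'realloc':
            compensated += max(new_size - old_size, 0)
        else:
            compensated += new_size
    return cumulated, compensated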

… and this is clearly bad. Like, really bad. But we already knew that from the previous post, although it’s nice to put numbers on it. The lower bound is about twice the amount of data we’re handling, and the upper bound is more than 10 times that amount. Clearly, we can do better.

We’ll see how things evolve after the necessary code changes happen. Stay tuned.

Air Mozilla: March Privacy Lab: Cryptographic Engineering for Everyone 3.22.17

March Privacy Lab: Cryptographic Engineering for Everyone 3.22.17. Our March speaker is Justin Troutman, creator of PocketBlock - a visual, gamified curriculum that makes cryptographic engineering fun. It's suitable for everyone from an...

Emma Humphries: Linking to GitHub issues from bugzilla.mozilla.org

I wanted to draw your attention to a lovely new feature in BMO which went out with this week's push: auto-linking to GitHub issues.

Now, in a Bugzilla bug's comments, if you reference a GitHub issue, such as mozilla-bteam/bmo#26, Bugzilla converts that to a link to the issue on GitHub.

References to GitHub Issues now become links

This will save you some typing in the future, and if you used this format in earlier comments, they'll be linkified as well.
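
Under the hood, this kind of auto-linking boils down to a pattern substitution. The following is only an illustrative sketch, not BMO's actual implementation:

import re

# Matches owner/repo#number references like "mozilla-bteam/bmo#26".
GITHUB_ISSUE = re.compile(r'\b([\w.-]+)/([\w.-]+)#(\d+)\b')

def linkify(comment):
    return GITHUB_ISSUE.sub(
        r'<a href="https://github.com/\1/\2/issues/\3">\1/\2#\3</a>',
        comment)

print(linkify("See mozilla-bteam/bmo#26 for details."))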

Thanks to Sebastin Santy for his patch, and Xidorn Quan for the suggestion and code review.

If you come across a false positive, please file a bug against bugzilla.mozilla.org::General.

The original bug: 1309112 - Detect and linkify GitHub issue in comment




Air Mozilla: Bugzilla Project Meeting, 22 Mar 2017

Bugzilla Project Meeting. The Bugzilla Project developers meeting.

Air Mozilla: Weekly SUMO Community Meeting Mar. 22, 2017

Weekly SUMO Community Meeting Mar. 22, 2017. This is the SUMO weekly call.

Mike Hommey: When the memory allocator works against you, part 2

This is a followup to the “When the memory allocator works against you” post from a few days ago. You may want to read that one first if you haven’t, and come back. In case you don’t or didn’t read it, it was all about memory consumption during a git clone of the mozilla-central mercurial repository using git-cinnabar, and how the glibc memory allocator uses more memory than one would expect. This post is going to explore how/why it’s happening.

I happen to have written a basic memory allocation logger for Firefox, so I used it to log all the allocations happening during a git clone exhibiting the runaway memory increase behavior (using a python that doesn’t use its own allocator for small allocations).

The result was a 6.5 GB log file (compressed with zstd; 125 GB uncompressed!) with 2.7 billion calls to malloc, calloc, free, and realloc, recorded across (mostly) 2 processes (the python git-remote-hg process and the native git-cinnabar-helper process; there are other short-lived processes involved, but they do less than 5000 calls in total).

The vast majority of those 2.7 billion calls is made by the python git-remote-hg process: 2.34 billion calls. We’ll only focus on this process.

Replaying those 2.34 billion calls with a program that reads the log allowed me to reproduce the runaway memory increase behavior to some extent. I went the extra mile and modified glibc’s realloc code in memory so it doesn’t call memcpy, to make things faster. I also ran under setarch x86_64 -R to disable ASLR for reproducible results (two consecutive runs return the exact same numbers, which doesn’t happen with ASLR enabled).

I also modified the program to report the number of live allocations (allocations that haven’t been freed yet), and the cumulated size of the actually requested allocations (that is, the sum of all the sizes given to malloc, calloc, and realloc calls for live allocations, as opposed to what the memory allocator really allocated, which can be more, per malloc_usable_size).
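
The bookkeeping for that is simple. A sketch of what the replay program tracks; the callback names here are made up for illustration:

live = {}        # address -> requested size, for not-yet-freed allocations
requested = 0    # cumulated requested size of live allocations

def on_malloc(addr, size):
    global requested
    live[addr] = size
    requested += size

def on_free(addr):
    global requested
    requested -= live.pop(addr, 0)

def on_realloc(old_addr, new_addr, size):
    on_free(old_addr)
    on_malloc(new_addr, size)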

RSS was not tracked because, to make things faster, the allocations are never filled; as a result, pages for large allocations are never dirtied, and RSS doesn’t grow as much.

Full disclosure: it turns out the “system bytes” and “in-use bytes” numbers I had been collecting in the previous post were smaller than what they should have been, and were excluding memory that the glibc memory allocator would have mmap()ed. That however doesn’t affect the trends that had been witnessed. The data below is corrected.

(Note that in the graph above and the graphs that follow, the horizontal axis represents the number of allocator function calls performed)

While I was here, I figured I’d check how mozjemalloc performs, and it behaves better (although it has more overhead).

What doesn’t appear on this graph, though, is that mozjemalloc also tells the OS to drop some pages even if it keeps them mapped (madvise(MADV_DONTNEED)), so in practice, it is possible the actual RSS decreases too.

And jemalloc 4.5:

(It looks like it has better memory usage than mozjemalloc for this use case, but its stats are being thrown off at some point; I’ll have to investigate.)

Going back to the first graph, let’s get a closer look at what the allocations look like when the “system bytes” number is increasing a lot. The highlights in the following graphs indicate the range the next graph will be showing.

So what we have here is a bunch of small allocations (small enough that they don’t seem to move the “requested” line; most are under 512 bytes, so under normal circumstances, they would be allocated by python, and a few are between 512 and 2048 bytes), and a few large allocations, one of which triggers a bump in memory use.

What can appear weird at first glance is that we have a large allocation not requiring more system memory, later followed by a smaller one that does. What the allocations actually look like is the following:


void *ptr0 = malloc(4850928); // #1391340418
void *ptr1 = realloc(some_old_ptr, 8000835); // #1391340419
free(ptr0); // #1391340420
ptr1 = realloc(ptr1, 8000925); // #1391340421
/* ... */
void *ptrn = malloc(879931); // #1391340465
ptr1 = realloc(ptr1, 8880819); // #1391340466
free(ptrn); // #1391340467

As it turns out, inspecting all the live allocations at that point shows that, while there was a hole large enough for the first two reallocs (the second actually happens in place), by the third one there wasn’t a hole large enough to fit 8.8MB.

What inspecting the live allocations also reveals is that there are a large number of large holes between the allocated memory ranges, presumably left behind by previous similar patterns. There are, in fact, 91 holes larger than 1MB, 24 of which are larger than 8MB. It’s the accumulation of those holes that can’t be used to fulfil larger allocations that makes the memory use explode. And there aren’t enough small allocations happening to fill those holes. In fact, the global trend is for less and less memory to be allocated, so smaller holes are also being poked all the time.
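
Counting those holes from the set of live allocations is a matter of sorting the allocated ranges and looking at the gaps between consecutive ones. A sketch, reusing the live mapping from the replay bookkeeping above (it only knows requested sizes, not the allocator’s real usable sizes, so the numbers are approximate):

def count_holes(live, min_size=1024 * 1024):
    spans = sorted((addr, addr + size) for addr, size in live.items())
    gaps = (nxt_start - prev_end
            for (_, prev_end), (nxt_start, _) in zip(spans, spans[1:]))
    return sum(1 for gap in gaps if gap >= min_size)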

Really, it’s all a straightforward case of memory fragmentation. The reason it tends not to happen with jemalloc is that jemalloc groups allocations by size, which the glibc allocator doesn’t seem to be doing. The following is how we got a hole that couldn’t fit the 8.8MB allocation in the first place:


ptr1 = realloc(ptr1, 8880467); // #1391324989; ptr1 is 0x5555de145600
/* ... */
void *ptrx = malloc(232); // #1391325001; ptrx is 0x5555de9bd760 ; that is 13 bytes after the end of ptr1.
/* ... */
free(ptr1); // #1391325728; this leaves a hole of 8880480 bytes at 0x5555de145600.

All would go well if ptrx was free()d, but it looks like it’s long-lived. At least, it’s still allocated by the time we reach allocator call #1391340466. And since the hole is 339 bytes too short for the requested allocation, the allocator has no other choice than to request more memory from the system.

What’s bothersome, though, is that the allocator chose to allocate ptrx in the space following ptr1, when it allocated similarly sized buffers in completely different places after allocating ptr1 and before allocating ptrx, and while there are plenty of holes in the allocated memory where it could fit.

Interestingly enough, ptrx is a 232-byte allocation, which means that under normal circumstances, python itself would have allocated it. In all likelihood, when the python allocator is enabled, it’s allocations larger than 512 bytes that become obstacles to the larger allocations. Another possibility is that the 256KB fragments that the python allocator itself allocates to hold its own allocations become the obstacles (my original hypothesis). I think the former is more likely, though, putting the blame back entirely on glibc’s shoulders.

Now, it looks like the allocation pattern we have here is suboptimal, so I re-ran a git clone under a debugger to catch when a realloc() for 8880819 bytes happens (the size is peculiar enough that it only happened once in the allocation log). But doing that with a conditional breakpoint is just too slow, so I injected a realloc wrapper with LD_PRELOAD that sends a SIGTRAP signal to the process, so that an attached debugger can catch it.

Thanks to the support for python in gdb, it was then possible to pinpoint the exact python instructions that made the realloc() call (it didn’t come as a surprise; in fact, that was one of the places I had in mind, but I wanted definite proof):


new = ''
end = 0
# ...
for diff in RevDiff(rev_patch):
    new += data[end:diff.start]
    new += diff.text_data
    end = diff.end
    # ...
new += data[end:]

What happens here is that we’re creating a mercurial manifest that we got from the server in patch form against a previous manifest. So data contains the original manifest, and rev_patch the patch. The patch essentially contains instructions of the form “replace n bytes at offset o with the content c”.

The code here just does that in the most straightforward way, implementation-wise, but also, it turns out, possibly the worst way.

So let’s unroll this loop over a couple iterations:

new = ''

This allocates an empty str object. In fact, this doesn’t actually allocate anything, and only creates a pointer to an interned representation of an empty string.

new += data[0:diff.start]

This is where things start to get awful. data is a str, so data[0:diff.start] creates a new, separate, str for the substring. One allocation, one copy.

It then appends the substring to new. Fortunately, CPython is smart enough to just assign data[0:diff.start] to new, since new was empty. This can easily be verified with the CPython REPL:

>>> foo = ''
>>> bar = 'bar'
>>> foo += bar
>>> foo is bar
True

(and this is not happening because the example string is small here; it also happens with larger strings, like 'bar' * 42000)

Back to our loop:

new += diff.text_data

Now, new is realloc()ated to have the right size to fit the appended text, and the contents of diff.text_data are copied. One realloc, one copy.

Let’s go for a second iteration:

new += data[diff.end:new_diff.start]

Here again, we’re doing an allocation for the substring, and one copy. Then new is realloc()ated again to append the substring, which is an additional copy.

new += new_diff.text_data

new is realloc()ated yet again to append the contents of new_diff.text_data.

We now finish with:

new += data[new_diff.end:]

which, again, creates a substring from the data, and then proceeds to realloc()ate new one more freaking time.

That’s a lot of malloc()s and realloc()s to be doing…

  • It is possible to limit the number of realloc()s by using new = bytearray() instead of new = ''. I haven’t looked in the CPython code to see what the growth strategy is, but, for example, appending a 4KB string to a 500KB bytearray makes it grow to 600KB instead of the 504KB you would get when using str. (A sketch combining this item and the memoryview one follows this list.)
  • It is possible to avoid realloc()s completely by preallocating the right size for the bytearray (with bytearray(size)), but that requires looping over the patch once first to know the new size, or using an estimate (the new manifest can’t be larger than the size of the previous manifest + the size of the patch) and truncating later (although I’m not sure it’s possible to truncate a bytearray without a realloc()). As a downside, this initializes the buffer with null bytes, which is a waste of time.
  • Another possibility is to reuse bytearrays previously allocated for previous manifests.
  • Yet another possibility is to accumulate the strings to append and use ''.join(). CPython is smart enough to create a single allocation for the total size in that case. That would be the most convenient solution, but see below.
  • It is possible to avoid the intermediate allocations and copies for substrings from the original manifest by using memoryview.
  • Unfortunately, you can’t use ''.join() on a list of memoryviews before Python 3.4.
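
Combining the first and fifth items gives something like the following sketch, assuming RevDiff and the diff fields behave as in the code above:

def apply_patch(data, rev_patch):
    view = memoryview(data)  # zero-copy access to substrings of data
    new = bytearray()        # grows geometrically: far fewer realloc()s
    end = 0
    for diff in RevDiff(rev_patch):
        new += view[end:diff.start]  # no intermediate substring allocated
        new += diff.text_data
        end = diff.end
    new += view[end:]
    return new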

After modifying the code to implement the first and fifth items, memory usage during a git clone of mozilla-central looks like the following (with the python allocator enabled):

(Note this hasn’t actually landed on the master branch yet)

Compared to what it looked like before, this is largely better. But that’s not the only difference: the clone was also about 1000 seconds faster. That’s more than 15 minutes! It’s not so surprising, though, when you know the volumes of data handled here. More insight about this is coming in an upcoming post.

But while the changes outlined above make the glibc allocator behavior less likely to happen, they don’t eliminate it entirely. In fact, it seems it is still happening by the end of the manifest import phase. We’re still allocating increasingly large temporary buffers because the size of the imported manifests grows larger and larger, and every one of them is the result of patching a previous one.

The only way to avoid those large allocations creating holes would be to avoid doing them in the first place. My first attempt at doing that, keeping manifests as lists of lines instead of raw buffers, worked, but was terribly slow. So slow, in fact, that I had to stop a clone early and estimated the process would likely have taken a couple of days. Apparently, iterating over multiple generators at the same time, a lot, kills performance. I’ll have to try with significantly less of that.

Mark Côté: Conduit's Commit Index

As with MozReview, Conduit is being designed to operate on changesets. Since the end result of work on a codebase is a changeset, it makes sense to start the process with one, so all the necessary metadata (author, message, repository, etc.) are provided from the beginning. You can always get a plain diff from a changeset, but you can’t get a changeset from a plain diff.

Similarly, we’re keeping the concept of a logical series of changesets. This encourages splitting up a unit of work into incremental changes, which are easier to review and to test than large patches that do many things at the same time. For more on the benefits of working with small changesets, a few random articles are Ship Small Diffs, Micro Commits, and Large Diffs Are Hurting Your Ability To Ship.

In MozReview, we used the term commit series to refer to a set of one or more changesets that build up to a solution. This term is a bit confusing, since the series itself can have multiple revisions, so you end up with a series of revisions of a series of changesets. For Conduit, we decided to use the term topic instead of commit series, since the commits in a single series are generally related in some way. We’re using the term iteration to refer to each update of a topic. Hence, a solution ends up being one or more iterations on a particular topic. Note that the number of changesets can vary from iteration to iteration in a single topic, if the author decides to either further split up work or to coalesce changesets that are tightly related. Also note that naming is hard, and we’re not completely satisfied with “topic” and “iteration”, so we may change the terminology if we come up with anything better.

As I noted in my last post, we’re working on the push-to-review part of Conduit, the entrance to what we sometimes call the commit pipeline. However, technically “push-to-review” isn’t accurate, as the first process after pushing might be sending changesets to Try for testing, or static analysis to get quick automated feedback on formatting, syntax, or other problems that don’t require a human to look at the code. So instead of review repository, which we’ve used in MozReview, we’re calling it a staging repository in the Conduit world.

Along with the staging repository is the first service we’re building, the commit index. This service holds the metadata that binds changesets in the staging repo to iterations of topics. Eventually, it will also hold information about how changesets moved through the pipeline: where and when they were landed, if and when they were backed out, and when they were uplifted into release branches.

Unfortunately a simple “push” command, whether from Mercurial or from Git, does not provide enough information to update the commit index. The main problem is that not all of the changesets the author specifies for pushing may actually be sent. For example, I have three changesets, A, B, and C, and pushed them up previously. I then update C to make C′ and push again. Despite all three being in the “draft” phase (which is how we differentiate work in progress from changes that have landed in the mainline repository), only C′ will actually be sent to the staging repo, since A and B already exist there.

Thus, we need a Mercurial or Git client extension, or a separate command-line tool, to tell the commit index exactly what changesets are part of the iteration we’re pushing up—in this example, A, B, and C′. When it receives this information, the commit index creates a new topic, if necessary, and a new iteration in that topic, and records the data in a data store. This data will then be used by the review service, to post review requests and provide information on reviews, and by the autoland service, to determine which changesets to transplant.

The biggest open question is how to associate a push with an existing topic. For example, locally I might be working on two bugs at the same time, using two different heads, which map to two different topics. When I make some local changes and push one head up, how does the commit index know which topic to update? Mercurial bookmarks, which are roughly equivalent to Git branch names, are a possibility, but as they are arbitrarily named by the author, the risk of collisions is too great. We need to be sure that each topic is unique.

Another straightforward solution is to use the bug ID, since the vast majority of commits to mozilla-central are associated with a bug in BMO. However, that would restrict Conduit to one topic per bug, requiring new bugs for all follow-up work or work in parallel by multiple developers. In MozReview, we partially worked around this by using an “ircnick” parameter and including that in the commit-series identifiers, and by allowing arbitrary identifiers via the --reviewid option to “hg push”. However, this is unintuitive, and it still requires each topic to be associated with a single bug, whereas we would like the flexibility to associate multiple bugs with a single topic. Although we’re still weighing options, likely an intuitive and flexible solution will involve some combination of commit-message annotations and/or inferences, command-line options, and interactive prompts.

Air Mozilla: Rust Libs Meeting 2017-03-21

Rust Libs Meeting 2017-03-21

Support.Mozilla.Org: Guest post: “That Bug about Mobile Bookmarks”

Hi, SUMO Nation!

Time for a guest blog post by Seburo – one of our “regulars”, who wanted to share a very personal story about Firefox with all of you. He originally posted it on Mozilla’s Discourse, but the more people it reaches, the better. Thank you for sharing, Seburo! (As always, if you want to post something to our blog about your Mozilla and/or SUMO adventures and experiences, let us know.)

Here we go…

 

As a Mozillian I like to set myself goals and targets. It helps me to plan what I would like to do and to ensure that I am constantly focusing on activities that help Mozilla as well as maintain a level of contribution. But under these “public” goals are a number of things that are more long term, that are possible and have been done by many Mozillians, but for me just seem a little out of reach. If you were to see the list, it may seem a little odd and possibly a little egotistical, even laughable, but however impossible some of them are, they serve as a reminder of what I may be able to achieve.

This blog entry is about me achieving one of them…

In the time leading up to the London All-Hands, I had been invited by a fellow SUMO contributor to attend a breakfast meeting to learn more about the plans around Nightly. This clashed with another breakfast meeting between SUMO and Sync to continue to work to improve our support for this great and useful feature of Firefox. Not wanting to upset anyone, I went with the first invite, but hoped to catch up with members of the Sync team during the week.

Having spent the morning better understanding how SUMO fits into the larger corporate structure, I made use of the open time in the schedule to visit the Firefox Homeroom which was based in a basement meeting room, home for the week to all the alchemists and magicians that bring Mozilla software to life. It was on the way back up the stairs that I bumped into Mark from the Firefox Desktop team. Expecting to arrange some time for later in the week, Mark was free to have a chat there and then.

Sync is straightforward when used to connect desktop and mobile versions of Firefox, but I wanted to better understand how it would work if a third device was included. It was at the end of the conversation that one of us mentioned how the bookmarks coming to desktop Firefox could be shown in a Mobile Bookmarks folder in the bookmark drop-down menus. But it is not there, which can make it look like your bookmarks have disappeared. Sure, you can open the bookmark library, but that takes extra mouse clicks to open a separate tool. Mark suggested that this could be easy to fix and that I should file a bug, a task that duly went on the list of things to do on returning from the week.

A key goal for contributors at an All-Hands is to come back with a number of ways to build upon your ability to contribute in the future and I came back with a long list that took time to work through. The bug was also delayed in filing due to natural pessimism about its chances of success. But I realised…what if we all thought like that? All things that we have done started with someone having an idea that was put forward knowing that other ideas had failed, but they still went ahead regardless.

So I wrote a bug and submitted it and nothing much happened. But after a while there was a spark of activity. Thom from the Sync team had decided to resolve it and seemed to fully understand how this could work. The bug was assigned various flags and it soon became clear to me that work was being done on it. Not having any coding ability, I was not able to provide any real help to Thom aside from positive feedback on an early mock-up of how the user experience would look. But to be honest, I was too nervous to say much more. A number of projects I had come back from MozLondon with had fallen through, and I did not want to say anything that could “jinx it” and stop it from proceeding.

A few months passed after which I started getting copied in on bugmail about code needing review with links to systems I barely knew existed. And there, partway down a page were two words:

Ship It.

I know that these words are not unusual for many people at Mozilla, indeed their very existence is one of the reasons that many staff turn on their computers (the other is probably cat gifs), but for me it was the culmination of something that I never thought would happen. The sobriety of this moment increased with the release of Nightly 54 – I could actually see and use what Thom and Mark had spent time and effort crafting. If you use version 54 (which is currently Firefox Developer Edition) and use Firefox Sync, you should now see a “Mobile Bookmarks” folder in the drop down from the menu bar and from the toolbar. This folder is an easier way for you to access the bookmarks that you have saved on the bus, in the pub, on the train or during that really boring meeting you thought would never end.

I never thought that I would be able to influence the Firefox end product, and I had, in a very small way. Whilst full credit should go to Thom and Mark and the Sync team for building this and those who herded and QA’d the bug (never forget these people, their work is vital), credit should also go to the SUMO team for enabling me to be in a position to understand the user perspective and help make Sync work for more users. Sync is a great feature of Firefox and one that I hope can be improved and enhanced further.

I sincerely hope that you have enjoyed reading this little story, but I hope that you have learned from it and that those learnings will help you as a contributor. In particular:

  • Have goals, however impossible.
  • Contribute your ideas. Nobody else in the world has the same idea as you and imagines it in the same way.
  • Work outside of your own team, build bridges to other areas.
  • Use Nightly and (if you also use a mobile version of Firefox) use it with Firefox Sync.
  • Be respectful of Mozilla staff as they are at work and they are busy people, but also be prepared to be in awe of their awesomeness.

Whilst this was (I have been told) a simple piece of code, the result for me was to see a feature in Firefox that I helped make happen. Along the way, I have broadened my understanding of the effort that goes into Firefox but I can also see that some of the bigger goals I have are achievable.

There is still so much I want to do.

QMO: Firefox 53 Beta 3 Testday Results

Hello Mozillians!

As you may already know, last Friday – March 17th – we held a new Testday event, for Firefox 53 Beta 3.

Thank you all for helping us make Mozilla a better place – Iryna Thompsn, Surentharan and Suren, Jeremy Lam and jaustinlam.

From Bangladesh team: Nazir Ahmed Sabbir | NaSb, Rezaul Huque Nayeem, Md.Majedul islam, Rezwana Islam Ria, Maruf Rahman, Aminul Islam Alvi | AiAlvi, Sayed Mahmud, Mohammad Mosfiqur Rahman, Ridwan, Tanvir Rahman, Anmona Mamun Monisha, Jaber Rahman, Amir Hossain Rhidoy, Ahmed Safa, Humayra Khanum, Sajal Ahmed, Roman Syed, Md Rakibul Islam, Kazi Nuzhat Tasnem, Md. Almas Hossain, Md. Asif Mahmud Apon, Syeda Tanjina Hasan, Saima Sharleen, Nusrat jahan, Sajedul Islam, আল-যুনায়েদ ইসলাম ব্রোহী, Forhad Hossain and Toki Yasir.

From India team: Guna / Skrillex, Subhrajyoti Sen / subhrajyotisen, Pavithra R, Nagaraj.V, karthimdav7, AbiramiSD/@Teens27075637, subash M, Monesh B, Kavipriya.A, Vibhanshu Chaudhary | vibhanshuchaudhary, R.KRITHIKA SOWBARNIKA, HARITHA KAMARAJ and VIGNESH B S.

Results:

– several test cases executed for the WebM Alpha, Compact Themes and Estimated Reading Time features.

– 2 bugs verified: 1324171, 1321472.

– 2 new bugs filed: 1348347, 1348483.

Again thanks for another successful testday! 🙂

We hope to see you all in our next events, all the details will be posted on QMO!

Robert O'Callahan: Deterministic Hardware Performance Counters And Information Leaks

Summary: Deterministic hardware performance counters cannot leak information between tasks, and more importantly, virtualized guests.

rr relies on hardware performance counters to help measure application progress, to determine when to inject asynchronous events such as signal delivery and context switches. rr can only use counters that are deterministic, i.e., executing a particular sequence of application instructions always increases the counter value by the same amount. For example rr uses the "retired conditional branches" (RCB) counter, which always returns exactly the number of conditional branches actually retired.

rr currently doesn't work in environments such as Amazon's cloud, where hardware performance counters are not available to virtualized guests. Virtualizing hardware counters is technically possible (e.g. rr works well in Digital Ocean's KVM guests), but for some counters there is a risk of leaking information about other guests, and that's probably one reason other providers haven't enabled them.

However, if a counter's value can be influenced by the behavior of other guests, then by definition it is not deterministic in the sense above, and therefore it is useless to rr! In particular, because the RCB counter is deterministic ("proven" by a lot of testing), we know it does not leak information between guests.

I wish Intel would identify a set of counters that are deterministic, or at least free of cross-guest information leaks, and Amazon and other cloud providers would enable virtualization of them.

This Week In Rust: This Week in Rust 174

Hello and welcome to another issue of This Week in Rust! Rust is a systems language pursuing the trifecta: safety, concurrency, and speed. This is a weekly summary of its progress and community. Want something mentioned? Tweet us at @ThisWeekInRust or send us a pull request. Want to get involved? We love contributions.

This Week in Rust is openly developed on GitHub. If you find any errors in this week's issue, please submit a PR.

Updates from Rust Community

News & Blog Posts

Crate of the Week

We don't have a Crate of this Week for lack of suggestions. Sorry.

Submit your suggestions and votes for next week!

Call for Participation

Always wanted to contribute to open-source projects but didn't know where to start? Every week we highlight some tasks from the Rust community for you to pick and get started!

Some of these tasks may also have mentors available, visit the task page for more information.

If you are a Rust project owner and are looking for contributors, please submit tasks here.

Updates from Rust Core

117 pull requests were merged in the last week.

New Contributors

  • David Roundy
  • Dawid Ciężarkiewicz
  • Petr Zemek
  • portal
  • projektir
  • Russell Mackenzie
  • ScottAbbey
  • z1mvader

Approved RFCs

Changes to Rust follow the Rust RFC (request for comments) process. These are the RFCs that were approved for implementation this week:

Final Comment Period

Every week the team announces the 'final comment period' for RFCs and key PRs which are reaching a decision. Express your opinions now. This week's FCPs are:

New RFCs

Style RFCs

Style RFCs are part of the process for deciding on style guidelines for the Rust community and defaults for Rustfmt. The process is similar to the RFC process, but we try to reach rough consensus on issues (including a final comment period) before progressing to PRs. Just like the RFC process, all users are welcome to comment and submit RFCs. If you want to help decide what Rust code should look like, come get involved!

Issues in final comment period:

Other significant issues:

Upcoming Events

If you are running a Rust event please add it to the calendar to get it mentioned here. Email the Rust Community Team for access.

Rust Jobs

Tweet us at @ThisWeekInRust to get your job offers listed here!

Quote of the Week

#rustlang is a very strange place
sans null deref nor data race
it has its own styles
but once it compiles
it will not blow up in your face

llogiq on Twitter. Check out his Twitter feed for more #rustlang limericks!

Submit your quotes for next week!

This Week in Rust is edited by: nasa42, llogiq, and brson.

The Mozilla Blog: How Do We Connect First-Time Internet Users to a Healthy Web?

Fresh research from Mozilla, supported by the Bill & Melinda Gates Foundation, explores how low-income, first-time smartphone users in Kenya experience the web — and what digital skills can make a difference

 

Three billion of us now share the Internet. But our online experiences differ greatly, depending on geography, gender and income.

For a software engineer in San Francisco, the Internet can be open and secure. But for a low-income, first-time smartphone user in Nairobi, the Internet is most often a small collection of apps in an unfamiliar language, limited further by high data costs.

This undercuts the Internet’s potential as a global public resource — a resource everyone should be able to use to improve their lives and societies.

Twelve months ago, Mozilla set out to study this divide. We wanted to understand the barriers that low-income, first-time smartphone users in Kenya face when adapting to online life. And we wanted to identify the skills and education methods necessary to overcome them.

To do this, Mozilla created the Digital Skills Observatory: a participatory research project exploring the complex relationship between devices, digital skills, social life, economic life and digital life. The work — funded by the Bill & Melinda Gates Foundation — was developed and led by Mozilla alongside Digital Divide Data and A Bit of Data Inc.

Today, we’re sharing our findings.

 

For one year, Mozilla researchers and local Mozilla community members worked with about 200 participants across seven Kenyan regions. All participants identified as low income and were coming online for the first time through smartphones. To hone our focus, we paid special attention to the impact of digital skills on digital financial services (DFS) adoption. Why? A strong grasp of digital financial services can open doors for people to access the formal financial environment and unlock economic opportunity.

In conducting the study, one group of participants was interviewed regularly and shared smartphone browsing and app usage data. A second group did the same, but also received digital skills training on topics like app stores and cybersecurity.

Our findings were significant. Among them:

  • Without proper digital skills training, smartphone adoption can worsen — not improve — existing financial and social problems.
    • Without media literacy and knowledge of online scams, users fall prey to fraudulent apps and news. The impact of these scams can be devastating on people who are already financially precarious
    • Users employ risky methods to circumvent the high price of data, like sharing apps via Bluetooth. As a result, out-of-date apps with security vulnerabilities proliferate
  • A set of 53 teachable skills can reduce barriers and unlock opportunity.
    • These skills — identified by both participants and researchers — range from managing data usage and recognizing scams to resetting passwords, managing browser settings and understanding business models behind app stores
    • Our treatment group learned these skills, and the end-of-study evaluation showed increased agency and understanding of what is possible online
    • Without these fundamental skills, users are blocked in their discoveries and adoption of digital products
  • Gender and internet usage are deeply entwined.
    • Men often have an effect on the way women use apps and services — for example, telling them to stop, or controlling their usage
    • Women were almost three times as likely to be influenced by their partner when purchasing a smartphone, usually in the form of financial support
  • Language and Internet usage are deeply entwined.
    • The web is largely in English — a challenge for participants who primarily speak Swahili or Sheng (a Swahili-English hybrid)
    • Colloquial language (like Sheng) increases comfort with technology and accommodates learning
  • Like most of us, first-time users found an Internet that is highly centralized.
    • Participants encountered an Internet dominated by just a few entities. Companies like Google, Facebook and Safaricom control access to apps, communication channels and more. This leads to little understanding of what is possible online and little agency to leverage the web
  • Digital skills are best imparted through in-person group workshops or social media channels.
    • Community-based learning was the most impactful — workshops provide wider exposure to what’s possible online and build confidence
    • Mobile apps geared toward teaching digital skills are less effective. Many phones cannot support them, and they are unlikely to “stick”
    • Social networks can be highly effective for teaching digital skills. Our chatbot experiment on WhatsApp showed positive results
  • Local talent is important when teaching digital skills.
    • Without a community of local practitioners and teachers, teaching digital skills becomes far more difficult
    • Research and teaching capacity can be grown and developed within a community
  • Digital skills are critical, but not a panacea.
    • Web literacy is one part of a larger equation. To become empowered digital citizens, individuals also must have access (like hardware and affordable data) and need (a perceived use and value for technology).

Mozilla’s commitment to digital literacy doesn’t end with this research. We’re holding roundtables and events in Kenya — and beyond — to share findings with allies like NGOs and technologists. We’re asking others to contribute to the conversation.

We’re also rolling our learnings into our ongoing Internet Health work, and building on the concept that access alone isn’t enough — we need solutions that account for the nuances of social and economic life, too.

Read the full report here.


Roberto A. Vitillo: On technical leadership

I have been leading a team of data engineers for over a year, and I feel like I have a much better idea of what leadership entails now than I did at the beginning of my journey. Here is a list of things I have learned so far:

Have a vision

As a leader, you are supposed to have an idea of where you are leading your project or product. That doesn’t mean you have to be the person that comes up with a plan and has all the answers! You work with smart people who have opinions and ideas; use them to shape the vision, but make sure everyone on your team is aligned.

Be your team’s champion

During my first internship in 2008, I worked on a real-time monitoring application used to assess the quality of the data coming in from the ATLAS detector. My supervisor at the time had me present my work at CERN in front of a dozen scientists and engineers. I recall being pretty anxious, especially because my spoken English wasn’t that great. Even so, he championed my work and pushed me beyond my comfort zone and ultimately ensured I was recognized for what I built. Having me fly over from Pisa to present my work in person might not have been a huge deal for him but it made all the difference to me.

I had the luck to work with amazing technical leads over the years, and they all had one thing in common: they championed my work and made sure people knew about it. You can build amazing things, but if nobody knows about them, it’s like they never happened.

Split your time between maker & manager mode

Leading a team means you are going to be involved in lots of non-coding activities that don’t necessarily feel immediately productive, like meetings. While it’s all too easy to just jump back to coding, one shouldn’t neglect the managerial activities that a leadership role necessarily entails. One of your goals should be to improve the productivity of your colleagues, and coding isn’t the best way to do that. That doesn’t mean you can’t be a maker, though!

When I am in manager mode, interruptions and meetings dominate all my time. On the other hand, when I am in maker mode I absolutely don’t want to be interrupted. One simple thing I do is to schedule some time on my calendar to fully focus on my maker mode. It really helps to know that I have a certain part of the day that others can’t schedule over. I also tend to stay away from IRC/Slack during those hours. So far this has been working great; I stopped feeling unproductive and not in control of my time as soon as I adopted this simple hack.

Pick strategic tasks

After being around a company long enough, you will probably have picked up considerable domain-specific technical expertise. That expertise allows you to not only easily identify the pain points of your software architecture, but also know what solutions are appropriate to get rid of them. When those solutions are cross-functional in nature and involve changes to various components, they provide a great way for you to add value, as they are less likely to be tackled spontaneously by more junior peers.

Be a sidekick

The best way I found to mentor a colleague is to be their sidekick on a project of which they are the technical lead, while I act more as a consultant intervening when blockers arise. Ultimately you want to grow leaders and experts and the best way to do that is to give them responsibilities even if they only have 80% of the skills required. If you give your junior peers the chance to prove themselves, they will work super hard to learn the missing 20% and deliver something amazing. When they reach a milestone, let them take the credit they deserve by having them present it to the stakeholders and the rest of the team.

Have regular one-on-ones with people you work with

Having a strong relationship based on mutual respect is fundamental to leading people. There isn’t a magic bullet to get there, but you can bet it requires time and dedication. I found recurring one-on-ones to be very helpful in that regard, as your colleagues know there is a time during the week when they can count on having your full attention. This is even more important if you work remotely.

Talk to other leads

Projects don’t live in isolation and sooner or later your team will be blocked on someone else’s team. It doesn’t matter how fast your team can build new features or fix bugs if there is a bottleneck in the rest of the pipeline. Similarly, your team might become the blocker for someone else.

To avoid cross-functional issues, you should spend time aligning your team’s vision with the goals of the teams you work with. A good way to do that is to schedule recurring one-on-ones with technical leads of teams you work with.

Give honest feedback

If something isn’t right with a colleague then just be honest about it. Don’t be afraid to tell the truth even when it’s uncomfortable. Even though it might hurt to give, and receive, negative feedback, ultimately people appreciate that they always know where they stand.

Be open about your failures

The best teams I have worked in were the ones in which everyone felt safe admitting mistakes. We all make mistakes, and if you don’t hear about them it simply means they are getting covered up.

The only way to become an expert is to make all the mistakes one can possibly make. When you are open about your missteps, you not only encourage others to do the same but also share a learning opportunity with the rest of the team.


Air MozillaMozilla Weekly Project Meeting, 20 Mar 2017

Mozilla Weekly Project Meeting The Monday Project Meeting

The Mozilla BlogWebVR and AFrame Bringing VR to Web at the Virtuleap Hackathon

Imagine an online application that lets city planners walk through three-dimensional virtual versions of proposed projects, or a math program that helps students understand complex concepts by visualizing them in three dimensions. Both CityViewR and MathworldVR are amazing application experiences that bring to life the possibilities of virtual reality (VR).
Both are concept virtual reality applications for the web that were created for the Virtuleap WebVR Hackathon. Amazingly, nine out of ten of the winning projects used AFrame, an open source project sponsored by Mozilla, which makes it much easier to create VR experiences. CityViewR really illustrates the capability of WebVR to have real-life benefits that impact the quality of people’s daily lives beyond the browser.

A top-notch batch of leading VR companies, including Mozilla, funded and supported this global event with the goal of building the grassroots community for WebVR. For non-techies, WebVR is the experimental JavaScript API that allows anyone with a web browser to experience immersive virtual reality on almost any device. WebVR is designed to be completely platform and device agnostic and so it is a scalable and democratic path to stoking a mainstream VR industry that can take advantage of the most valuable thing the web has to offer: built-in traffic and hundreds of millions of users.

Over the three-month-long contest, teams from a dozen countries submitted 34 VR concepts. Seventeen judges and audience panels voted on the entries. Below is a list of the top 10 projects. I want to congratulate @ThePascalRascal and @Geczy for their work, which won the €30,000 prize and spots in VR accelerator programs in Amsterdam, respectively.

Here’s the really excellent part. With luck and solid code, virtual reality should start appearing in standard general availability web browsers in 2017. That’s a big deal. To date, VR has been accessible primarily on proprietary platforms. To put that in real world terms, the world of VR has been like a maze with many doors opening into rooms. Each room held something cool. But there was no way to walk easily and search through the rooms, browse the rooms, or link one room to another. This ability to link, browse, collaborate and share is what makes the web powerful and it’s what will help WebVR take off.

To get an idea of how we envision this might work, consider the APainter app built by Mozilla’s team. It is designed to let artists create virtual art installations online. Each APainter work has a unique URL and other artists can come in and add to or build on top of the creation of the first artist, because the system is open source. At the same time, anyone with a browser can walk through an APainter work. And artists using APainter can link to other works within their virtual works, be it a button on a wall, a traditional text block, or any other format.

Mozilla participated in this hackathon, and is supporting WebVR, because we believe keeping the web open and ensuring it is built on open standards that work across all devices and browsers is key to keeping the internet vibrant and healthy. To that same end, we are sponsoring the AFrame project. The goal of AFrame is to make coding VR apps for the web even easier than coding web apps with standard HTML and JavaScript. Our vision at Mozilla is that, in the very near future, any web developer who wants to build VR apps can learn to do so, quickly and easily. We want to give them the power of creative self-expression.

It’s gratifying to see something we have worked so hard on enjoy such strong community adoption. And we’re also super grateful to Amir and the folks that put in the time and effort to organize and staff the Virtuleap Global Hackathon. If you are interested in learning more about AFrame, you can do so here.

The post WebVR and AFrame Bringing VR to Web at the Virtuleap Hackathon appeared first on The Mozilla Blog.

Daniel Stenbergcurlup 2017: curl now

At curlup 2017 in Nuremberg, I did a keynote and talked a little about the road to what we are and where we are right now in the curl project. There will hopefully be a recording of this presentation made available soon, but I wanted to entertain you all by also presenting some of the graphs from that presentation in a blog format for easy access and to share the information.

Some stats and numbers from the curl project in early 2017. Unless otherwise mentioned, this is based on the data we have available. The git repository has data from December 1999 and we have detailed release information since version 6.0 (September 13, 1999).

Web traffic

First out, web site traffic to curl.haxx.se over the last seven full years that I have stats for. The switch to an HTTPS-only site happened in February 2016. The main explanation for the decrease in bandwidth spent in 2016 is us removing the HTML and PDF versions of all documentation from the release tarballs (October 2016).

My log analyzer software also tries to identify “human” traffic, so this graph should not include the very large amount of bots and automation that hits our site. In total we serve almost twice as much data to “bots” as to humans. A large share of those download the cacert.pem file we host.

Since our switch to HTTPS we have a 301 redirect from the HTTP site, and we still suffer from a large number of user-agents hitting us over and over without seemingly following said redirect…

Number of lines in git

Since we also have documentation and related things, this isn’t only lines of code. Plain and simple: lines added to files that we have in git, and how that number has increased over time.

There’s one notable dip and one climb and I think they both are related to how we have rearranged documentation and documentation formatting.

Top-4 authors’ share

This could also talk about how seriously we suffer from “the bus factor” in this project. Look at how large a share of all commits the top-4 committers have authored. Not committed; authored. Of course we didn’t have proper separation between authors and committers before git (March 2010).

Interesting to note here is also that the author listed second, Yang Tse, hasn’t authored anything since August 2013. I personally seem to have plateaued at around 57% of all commits during the recent year or two, and the top-4 share is slowly decreasing but is still over 80% of the commits.

I hope we can get the top-4 share well below 80% if I rerun this script next year!

Number of authors over time

In comparison to the above graph, I did one that simply counts the total number of unique authors that have contributed a change to git, and looks at how that number changes over time.

The time before git is, again, somewhat of a lie since we didn’t keep track of authors vs committers properly then so we shouldn’t put too much value into that significant knee we can see on the graph.

To me, the main take away is that in spite of the top-4 graph above, this authors-over-time line is interestingly linear and shows that the vast majority of people who contribute patches only send in one or maybe a couple of changes and then never appear again in the project.

My hope is that this line will continue to climb over the coming years.

Commits per release

We started doing proper git tags for releases with curl 6.5. So how many commits have we done between releases ever since? It seems to have gone up and down over time, and I added an average line to this graph, which sits at about 150 commits per release (and remember that we have attempted to do them every 8 weeks for a few years now).

Towards the right we can see the last 20 releases or so showing a pattern of high bar, low bar, and I’ll get to that more in a coming graph.

Of course, counting commits is a rough measurement as they can be big or small, easy or hard, good or bad and this only counts them.

Commits per day

As the release frequency has varied a bit over time I figured I should just check and see how many commits we do in the project per day and see how that has changed (or not) over time. Remember, we are increasing the number of unique authors fairly fast but the top-4 share of “authorship” is fairly stable.

Turns out the number of commits per day has gone up and down a little bit through the git history, but I can’t spot any obvious trend here. In recent years we seem to keep up more than 2 commits per day, and during intense periods up to 8.

Days per release

Our general plan, since a bunch of years back, is to do releases every 8 weeks like clockwork. 8 weeks is 56 days.

When we run into serious problems, like bugs that are really annoying or tedious to users or if we get a really serious security problem reported, we sometimes decide to go outside of the regular release schedule and ship something before the end of the 8-week cycle.

This graph clearly shows that over the last, say, 20 releases we have felt ourselves “forced” to do follow-up releases outside of the regular schedule. The right end of the graph shows a very clear saw-tooth look that proves this.

We’ve also discussed this slightly on the mailing list recently, and I’m certainly willing to go back and listen to people as to what we can do to improve this situation.

Bugfixes per release

We keep close track of all bugfixes done in git and mark them up and mention them in the RELEASE-NOTES document that we ship in every new release.

This makes it possible for us to go back and see how many bug fixes we’ve recorded for each release since curl 6.5. This shows a clear growth over time. It’s interesting since we don’t see this when we count commits, so it may just be attributed to having gotten better at recording the bugs in the files. Or that we now spend fewer commits per bug fix. Hard to tell exactly, but I do enjoy that we fix a lot of bugs…

Days spent per bugfix

Another way to see the data above is to count the number of bug fixes we do over time and just see how many days we need on average to fix bugs.

The last few years we do more bug fixes than there are days so if we keep up the trend this shows for 2017 we might be able to reach down to 0.5 days per bug fix on average. That’d be cool!

Coverity scans

We run Coverity scans on the curl code regularly, and this service keeps a little graph for us showing the number of found defects over time. These days we have a policy of never allowing a defect detected by Coverity to linger around. We fix them all and we should have zero detected defects at all times.

The second graph here shows a comparison line with “other projects of comparable size”, indicating that we’re at least not doing badly here.

Vulnerability reports

So in spite of our grand intentions and the track record shown above, people keep finding security problems in curl at a higher frequency than ever before.

Out of the 24 vulnerabilities reported to the curl project in 2016, 7 were the result of the special security audit that we explicitly asked for, but even if we hadn’t asked for that and they had remained unknown, 17 would still have stood out in this graph.

I do however think that finding – and reporting – security problems is generally more good than bad. The problems these reports have found have generally been around for many years already, so this is not a sign of us getting more sloppy in recent years. I take it as a sign that people look for these problems better and report them more often than before. The industry as a whole looks at security problems and their importance differently now than it did years ago.

Daniel Stenbergcurl up 2017, the venue

The first ever physical curl meeting took place this last weekend, before curl’s 19th birthday. Today curl turns nineteen years old.

After much work behind the scenes to set this up and arrange everything (and thanks to our awesome sponsors who contributed to this), over twenty eager curl hackers and friends from a handful of countries gathered in a somewhat rough-looking building at curl://up 2017 in Nuremberg, March 18-19 2017.

The venue was in this old factory-like facility but we put up some fancy signs so that people would find it:

Yes, continue around the corner and you’ll find the entrance door for us:

I know, who’d have guessed that we would’ve splashed out on this fancy conference center, right? This is the entrance door. Enter and look for the next sign.

Yes, move in here through this door to the right.

And now, up these stairs…

When you’ve come that far, this is basically the view you could experience (before anyone entered the room):

And when Igor Chubin presented wttr.in and using curl to build console-based applications, it looked like this:

It may sound a bit lame to you, but I doubt this would’ve happened at all and it certainly would’ve been less good without our great sponsors who helped us by chipping in what we didn’t want to charge our visitors.

Thank you very much Kippdata, Ergon, Sevenval and Haxx for backing us!

Daniel Stenberg19 years ago

19 years ago on this day I released the first ever version of a software project I decided to name curl. Just a little hobby you know. Nothing fancy.

19 years ago that was a few hundred lines of code. Today we’re at around 150,000 lines.

19 years ago that was mostly my thing and I sent it out hoping that *someone* would like it and find good use for it. Today virtually every modern internet-connected device in the world runs my code. Every car, every TV, every mobile phone.

19 years ago was a different age, not only to me, as I had no kids nor a house back then; the entire Internet and world have changed significantly since.

19 years ago we had a handful of persons sending back bug reports and a few patches. Today we have over 1500 persons having helped out, and we’re adding people to that list at a rapid pace.

19 years ago I would not have imagined that someone could actually stick around in a project like this for such a long time and still find it so amazingly fun and interesting.

19 years ago I hadn’t exactly established my “daily routine” of spare time development yet, but I was close, and for the larger part of this period I have spent a few hours every day. All days, really. Working on curl and related stuff. 19 years of a few hours every day equals a whole lot of time.

It took us 19 years minus two days to have our first ever physical curl meeting, or conference if you will.

The Servo BlogThis Week In Servo 95

In the last week, we landed 110 PRs in the Servo organization’s repositories.

Planning and Status

Our overall roadmap is available online, including the overall plans for 2017 and Q1. Please check it out and provide feedback!

This week’s status updates are here.

Congratulations to our new reviewers, avadacatavra and canaltinova. Diane joined the Servo team last year and has been upgrading our networking and security stack, while Nazım has been an important part of the Stylo effort so far. We’re excited to see them both use their new powers for good!

Notable Additions

  • SimonSapin reduced the overhead of locking associated with CSSOM objects.
  • nox corrected a case that did not properly merge adjacent text nodes.
  • glennw improved the rendering quality of transforms in WebRender.
  • Manishearth added support for CSS system colors in Stylo.
  • mukilan and canaltinova implemented HTML parser support for form owners.
  • n0max fixed a panic when resizing canvases.
  • ajeffrey implemented support for setting document.domain.
  • mchv removed assumptions that browsing contexts could be safely unwrapped in many circumstances.
  • ajeffrey made the constellation store more information about the original request for a document, rather than just the URL.
  • montrivo implemented missing constructors for the ImageData API.
  • ajeffrey made the top and parent APIs work for cross-thread origins.
  • paulrouget added support for vetoing navigation in embeddings.
  • ajeffrey implemented cross-thread postMessage support.
  • Manishearth converted a macro into a higher-order macro for cleaner, more idiomatic code.

New Contributors

Interested in helping build a web browser? Take a look at our curated list of issues that are good for new contributors!

James LongHow I Became a Better Programmer

Several people at React Conf asked me for advice on becoming a better programmer. For some reason, people see me as a pretty advanced programmer worth listening to. I thought it would be worthwhile to write down my "mental model" for how I have approached programming over the years.

Some details about me: I'm 32 years old and have over 10 years of solid experience. It probably wasn't until the last few years that I really felt confident in what I was doing. Even now, though, I continually doubt myself. The point is that this feeling doesn't go away, so just try to ignore it, keep hacking, and keep building experience.

Let me be clear that these are only a few tips for improving your skills. Ultimately you need to figure out what works best for you. These are just things that I have found helpful.

  • Find people who inspire you, but don't idolize them.

    Over the years there have been many people that I looked up to and watched for new tech. I learned a lot by simply trusting they were right and digging into things they worked on. These people tend to be very productive, brilliant, and inspiring. Find them and let them inspire and teach you.

    However, make sure not to idolize them. It's easy for someone to seem intimidating from a Twitter feed, but if you look at how they work in real life, you'll see that they aren't that different. Hacks everywhere, etc. We're all just experimenting. Lastly, don't blindly trust them; if you disagree, engage them and learn from it. Some of my most productive conversations happened this way.

    My Emacs config is a mess. I don't know why my OCaml autocompletion is broken (it's been broken for over a month). I don't automate stuff and have to dig around in my shell history to find commands I need sometimes. I write the ugliest code at first. I stick things on the global object until I know what I'm doing. The most experienced programmer uses hacks all the time; the important part is that you're getting stuff done.

  • Don't devalue your work.

    Newer programmers tend to feel like their work isn't worth much because they are new. Or maybe you are an experienced programmer, but working in a new area that makes you uncomfortable. In my opinion, some of the best ideas come from newer programmers who see improvements to existing tech that those who have already-formed opinions don't see.

    Your work is worthwhile, no matter what. In the worst case, if your idea doesn't work out, the community will have learned better why that approach doesn't make sense. (A note to the community: it's up to us to execute on this and be welcoming to newcomers.)

  • Don't feel pressured to work all the time.

    With new tech coming out every day, it can feel like the world will move on without you if you take a night off. That's not true. In fact, you will do better work if you disengage a lot. Your perspective will be fresh, and I find myself subconsciously coming up with new ideas when I'm not working.

    The majority of the stuff being released every day is just a rehash of the same ideas. Truly revolutionary stuff only happens every few years. A good talk to watch on this subject is Hammock Driven Development.

  • Ignore fluff.

    One of the biggest ways you can objectively get better faster is by ignoring "fluff" that won't actually improve your skills very much. Another way to say this is "use your time wisely". You only have so many hours in the day and if you spend it on deeper things you will see a big difference over time.

    So what is "fluff"? It's up to you, but I can give you some examples of what I consider fluff: language syntax, library APIs, and configuring build tooling. Learning a new ES7 JS syntax won't make you a better programmer nearly as much as learning how compilers work, for example. Adopting a new library that implements the same idea but with a new API isn't that interesting. All of those things are important, of course, but I recommend spending more time learning deeper concepts that will reward you for years.

    Here's a question I like to ask: do you spend most of your time making your code look "nice"? If so, I recommend not focusing on it so much. Your code is going to change a lot over time anyway. It's better to focus hard on the core problems you're trying to solve and think hard about your layers of abstractions. After you've nailed all of that you can spend a little time polishing your code. (This also applies to the DRY principle. Don't worry about it so much. Feel free to duplicate.)

  • Dig into past research.

    If you're excited about an idea, it's super tempting to sit down and immediately get going. But you shouldn't do that until you've done some cursory research about how people have solved it before. Spending a few days researching the topic always completely changes how I am going to solve it.

    It's valuable to learn how to read academic papers. I don't know anything about denotational/operational/etc semantics so there are a lot of papers I can't read. But there are many that use code instead of math and aren't too hard to read. There is a huge amount of knowledge sitting in papers from the last 30 years. If you get good at extracting this, you'll be a thought-leader in no time.

    Prettier is a perfect example of this. I knew what I wanted but I had no idea how to implement it. After a little research I found this paper and after a few days I knew exactly what I needed to do. I had something basic working in a week. If I ignored previous research it would have taken a lot longer.

    If you're looking for papers, the Papers We Love GitHub repo is a great place to start.

  • Take on big projects. Get uncomfortable.

    There's nothing better than experience. Not everyone is in the position to experiment, but if you have time, try and take on some big projects. You don't even need to finish them. Just trying to tackle something like writing a compiler will teach you tons in the first few weeks.

    I honestly hate the feeling where I have no idea how to solve a complex problem. It's uncomfortable. I know I'll have to do a lot of research and learning before I'm even close to a solution. But I'm always a much better programmer afterwards.

    Start with learning a new language. It's the most effective way to force you out of your current habits and see things in a new light. For me, the best thing I did as a young programmer was learn Scheme. It's an extremely simple language and forces you to do everything in a functional style, and really learn the fundamentals of how code works. The few years I spent in Scheme are still paying off today; the way I see code is fundamentally changed. (I even named my company Shift Reset LLC after the shift/reset operators from Scheme.)

    Here's a list of a few things I would recommend doing. These are all things that had huge impacts on my programmer career. Most of them continue to pay off to this day in subtle ways and help me to deconstruct new ideas mentally. You don't need to do these to become a good programmer, and there are many other things you can learn to improve yourself, but these are what helped me.

    • Learn C - Just the basics, if you don't already. I think it's valuable to understand why everyone complains about it.
    • Write a compiler - Perhaps the best way to get uncomfortable and learn. Check out the super tiny compiler.
    • Learn macros - See Scheme, Lisp, or Clojure(Script). Macros will really change how you see code.
    • SICP - SICP is an old book that I think is still relevant today (some people disagree). It assumes very little programming knowledge and walks you all the way up to implementing a meta-circular evaluator and compiler. Another book I really enjoyed and goes a lot deeper in compilers is Lisp In Small Pieces.
    • Understand continuations - Continuations are a low-level control flow mechanism. Scheme is the only language to implement them, and while you will never use them in production, they will change how you think about control flow. I wrote a blog post trying to explain them.
    • If anything, just try a new language - Regardless of what you do, you really should explore other languages. I would recommend any of the following: Clojure, Rust, Elm, OCaml/Reason, Go, or Scheme. All of them have unique features and will force you to learn a new way of thinking.

Mozilla Reps CommunityReps of the Month – February 2017

Please join us in congratulating Siddhartha Rao and Vishal Chavan, our Reps of the Month for February 2017!

Siddhartha
Siddhartha is a Computer Engineering major from New York and a technology leader with rich experience in security, web development, and data privacy evangelism. He started his Mozilla contributions by leading the Firefox Club and Cyber Cell for a year at his university. Sid was a speaker and one of the Data Privacy Ingeniouses at the Glassroom initiative by Tactical Tech and Mozilla.

He contributed to data privacy work spanning browser metadata, mobile devices, and social networks, and took responsibility for delving deep into those topics and spreading awareness about them at the Data Detox Bar. He is also an important part of the privacy month campaign in the New York community. Creating quality content for the campaign, Sid was responsible for putting together the social media content for Twitter and Facebook posts and ensuring timely delivery of graphics to global teams for localization.

 

Vishal
Vishal is a web entrepreneur and a champion Mozilla contributor since April 2013. He started his Mozilla journey by organizing sessions focused on teaching the web to school and university students. He then became a part of the Mozilla Foundation when he was assigned the role of Regional Coordinator for Mozilla Clubs. One of the key roles he played as a contributor was being a core team member who started the January Privacy Month Campaign, a campaign which, as of 2017, has proven to be a global success for the third year in a row.

In the last six months, Vishal has contributed as a Regional Coordinator and has helped set up and mentor many new Mozilla Clubs in and around India. He has been an integral part of campaigns to spread privacy awareness in India. Involved in the planning and the flow of the privacy month campaign, Vishal was responsible for assigning roles, mobilizing contributors, and encouraging participation in India and in global communities. He also mentors at the Maker-Fest in India.

Congrats to both of you! Join us in congratulating them in discourse.

Alex VincentA practical whitelist in JavaScript: es7-membrane, version 0.7

Several months ago, I announced es7-membrane, a new project for letting JavaScript developers control, through JS proxies, what their customers see of their own libraries.  A proxy, as you may recall, lets its creator define rules for looking up properties, defining properties, calling methods, etc., often with a real object underneath which the proxy’s handler internally refers to.

First off, a little bit of basics: an object is a collection of properties (some of which are functions, and officially named “methods”). This may sound obvious, but it’s important: you refer to other values by the object and a property name.

Proxies allow you to rewrite the rules for referring to other values by that tuple of the (containing) object and a property name. For instance, you can hide “private” members behind a proxy.

(Stack Overflow)

A membrane presents a one-to-one relationship between each proxy and a corresponding real object.  If you’re given a proxy to a document, and not the document itself, you can get other proxies, and those proxies can offer other properties that let you get back to the proxy of the document… but you can’t break out of any of the proxies to the underlying set of objects (or “object graph”).

When I made the announcement back in August of this new project, the membrane I presented was quite useless, implementing only a mirroring capability.  Not anymore.  There are a few features that the latest version currently supports:

  1. Membrane owners can replace any proxy the membrane generates with a custom proxy, using the .modifyRules API
    • .createChainHandler(…) creates a new ProxyHandler derived from a handler used for an object graph the membrane already knows about.
    • .replaceProxy(oldProxy, newHandler) takes the new handler and returns a new proxy for the original value, provided the new proxy handler was created via .createChainHandler().
  2. Membrane owners can require a proxy to store new properties locally on the proxy, instead of propagating them through to the underlying object.  (.storeUnknownAsLocal(…) )
  3. Membrane owners can require a proxy to delete properties locally, instead of on the underlying object.  (.requireLocalDelete(…) )
  4. Membrane owners can hide existing properties of an object from the proxy’s users, as if those properties do not exist.  (.filterOwnKeys(…) )
  5. Membrane owners can be notified when a new proxy is about to go to the customer, and set up new rules for that proxy before the customer ever sees it.  (ObjectGraphHandler.prototype.addProxyListener(callback), .removeProxyListener(callback) )
  6. Membrane owners can define as many object graphs as they want.  Traditionally, a membrane in JavaScript has supported only two object graphs.  (“wet”/“dry”, or “protected”/“public”, or “internal”/“external”… you get the idea)  But there’s no technical reason for that to be the upper limit.  The initial design of es7-membrane allowed the owner to define an object graph by name with a string.
    • Having more than one object graph could have a few applications:  different privileges for different users or customers, for example.
    • The es7-membrane test code uses “dry”, “wet” and “damp” object graphs.
    • For lack of a better name, es7-membrane calls these “multisided” membranes.  I have a document explaining how this works at a very temporary location.
  7. A new feature of ECMAScript 6, however, is symbols – which can be used as valid keys to an object, but are not strings.  Per a Github issue by Dr. Mark Miller of Google (who has been advising me on applications every now and then – thanks, Mark), now membrane owners can define object graphs by a “private” JavaScript symbol they create, as well.

Features two through five above combine to make whitelisting of properties in JavaScript very doable:  properties you don’t want exposed you filter out, and customers receiving proxies configured for local properties won’t propagate their changes to your objects.  The listeners also mean you can apply those filters and local property rules before the customer ever sees any newly created proxies.  So properties you want private and/or protected really are private and/or protected from the malicious end-user.

Version 0.7 added the symbol support today, along with a probable memory leak clean-up and a little more protection for revoked object graphs.

The production files are available in the dist subdirectory of the es7-membrane project’s GitHub repository, or directly usable as an npm module.

I’m still looking for help in a few areas:

  • Building a static interactive demo site on Github
  • Code and documentation reviews
  • More testing
    • additional tests in Jasmine
    • converting tests to test-262 format (someone suggested the standard tests for ECMAScript might use some integration tests, such as a membrane implementation)
    • if there are weaknesses, where a customer who has only “dry” proxies and good old ECMAScript 6+ (no membrane access, no other proxies) can break out to reach “wet” proxies… no one’s seriously explored that yet.
  • Using the membrane module to protect itself from evildoers (dogfooding bonus)
  • Implementing other use cases for a multisided membrane
  • And just in general, another volunteer developer or two would be extremely welcome!

Manish GoregaokarI Never Hear the Phrase 'INHTPAMA' Anymore

Imagine never hearing the phrase ‘INHTPAMA’ again.

Oh, that’s already the case? Bummer.

Often, when talking about Rust, folks refer to the core aliasing rule as “that &mut thing”, “compile-time RWLock” (or “compile-time RefCell”), or something similar. Basically, referring to the fact that you can’t mutate the data that is currently held via an & reference, and that you can’t mutate or read the data currently held via an &mut reference except through that reference itself.
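
To make the rule concrete, here is a minimal sketch (the example and names are mine, purely for illustration); each commented-out line is one the borrow checker rejects:

fn main() {
    let mut v = vec![1, 2, 3];

    let shared = &v[0];      // `v` is aliased via `&`
    // v.push(4);            // rejected: cannot mutate `v` while `shared` aliases it
    println!("{}", shared);

    let unique = &mut v;     // `v` is held via `&mut`
    // println!("{:?}", v);  // rejected: cannot read `v` except through `unique`
    unique.push(4);          // mutating through the `&mut` itself is fine
}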

It’s always bugged me that we really don’t have a name for this thing. It’s one of the core bits of Rust, and crops up often in discussions.

But we did have a name for it! It was “INHTPAMA” (which was later butchered into “INHTWAMA”).

This is a reference to Niko’s 2012 blog post, titled “Imagine Never Hearing The Phrase ‘aliasable, mutable’ again”. It’s where the aliasing rules came from. Go read it, it’s great. It talks about this weird language with at symbols and purity, but I assure you, that language is Baby Rust. Or maybe Teenage Rust. The lifecycle of rusts is complex and interesting and I don’t know how to categorize it.

The point of this post isn’t really to encourage reviving the use of “INHTWAMA”; it’s a rather weird acronym that will probably confuse folks. I would like to have a better way of referring to “that &mut thing”, but I’d prefer if it wasn’t a confusing acronym that carries no meaning of its own if you don’t know the history of it. That’s a recipe for making new community members feel like outsiders.

But that post is amazing and I’d hate to see it drop out of the collective memory of the Rust community.

Cameron Kaiser45.8.1 not available (also: 45.9 and FPR1 progress, and goodbye, App.net)

TenFourFox 45.8.1 is not available, because there isn't one, even though Firefox 52.0.1 is available to fix the fallout from Pwn2Own. However, the exploited API in question does not exist in Firefox 45 (against which we are based) and a second attack against Firefox was apparently unsuccessful, so at least right now no urgent TenFourFox chemspill is required. 45.9, the last release we will make at source parity against the Mozilla code base, is still on schedule for April 18th.

45.9 has more microimprovements to JavaScript, including some sections of hand-written assembly code that have been completely overhauled (especially in the inline caches for arithmetic operations), a fix for a stupid bug that caused logical comparisons on floating point operations to always hit a slow code path, some additional microimprovements to hard-coded code flow with xptcall, graphics and text runs, and then bug fixes for geolocation and the font blacklist. The last two should be done by the end of next week or slightly after, and then, after I test it internally, there will be a beta prior to release.

Today, though, I've been playing with Google's new Guetzli JPEG encoder, which promises higher compression ratios at great quality in files that any JPEG decoder can view, because you really can have "tastes great" and "less filling" at the same time, apparently. Yes, I was able to get it to compile on the Power Mac and it works pretty well on my G5 from the command line; I'm trying to package it as a drag-and-drop tool which I might release a little later if I have time. If Guetzli takes off, maybe Google will stop it with their stupid WebP fetish, since this is a much better solution and far more compatible.

Finally, yesterday was the last day of App.net, fondly called "ADN" by denizens such as myself, Martin, Sevan and Riccardo. It unfairly got tarred as a "pay Twitter clone," which in fairness its operators didn't do enough to dispel, though most of us longtimers think that the service sealed its doom when they moved from a strictly pay model to a freemium model. That then destabilized the service by allowing a tier of user that wasn't really invested in its long-term success (like, say, blog spammers, etc.), and it gradually dropped below profitability because the pay tier didn't offer enough at that point.

But ADN had a real sense of community that just doesn't exist with Facebook, nor Twitter in particular. There were far fewer trolls and mob packs, and those that did engage in that behaviour found themselves ostracised quickly. Furthermore, you didn't have the sense of people breathing down your neck or endlessly searching for victims who might post the wrong thing so they can harass and "out" you for not toeing the party line. I think the smaller surface area and user base really led to that kind of healthier online relating, and I still believe that a social media service that forces a smaller number of people to be invested in the success of that service -- that in turn treats them as customers and not cattle -- is the most effective way to get around the problems the large free social sites have.

Meanwhile, most of us ADN refugees have moved to Pnut, made by another ADN denizen. Pnut is getting around the jerk problem by going invite-only. If you're not a jerk and you're interested in a better community to socially interact online, contact me and I'll get you a code.

Here are the last moments of the ADN global stream, as witnessed by Texapp, my custom ADN client. Thanks, Dalton and Bryan, and all the great ADN staff that participated over the years. It was a good ride while it lasted.

Air MozillaWebdev Beer and Tell: March 2017

Webdev Beer and Tell: March 2017 Once a month web developers across the Mozilla community get together (in person and virtually) to share what cool stuff we've been working on in...

Mozilla Addons BlogMigrating to WebExtensions? Don’t Forget Your Users

Stranded users feel sadness.

If you’re the developer of a legacy add-on with an aim to migrate to WebExtensions, here are a few tips to consider so your users aren’t left behind.

Port your user data

If your legacy add-on stores user data, we encourage you to take advantage of Embedded WebExtensions to transfer the data to a format that can be used by WebExtensions. This is critical if you want to seamlessly migrate your users—without putting any actionable burden on them—to the new WebExtensions version when it’s ready. (Embedded WebExtensions is a framework that contains your WebExtension inside of a bootstrapped or SDK extension.)

Testing beta versions

If you want to test a WebExtensions version of your add-on with a smaller group of users, you can make use of the beta versions feature on addons.mozilla.org (AMO). This lets you test pre-release beta versions that are signed and available to Firefox users who want to give them a spin. You’ll benefit from real users interacting with your new version and providing valuable feedback—without sacrificing the good reputation and rating of your listed version. We don’t recommend creating a separate listing on release because this will fragment your user base and leave a large number of them behind when Firefox 57 is released.

Don’t leave your users behind

Updating your listing on AMO when your WebExtension is ready is the only way to ensure all of your users move over without any noticeable interruption.

Need further assistance with your migration journey? You can find real-time help during office hours, or by emailing webextensions-support [@] mozilla [dot] org.

The post Migrating to WebExtensions? Don’t Forget Your Users appeared first on Mozilla Add-ons Blog.

Niko MatsakisThe Lane Table algorithm

For some time now I’ve been interested in better ways to construct LR(1) parsers. LALRPOP currently allows users to choose between the full LR(1) algorithm or the LALR(1) subset. Neither of these choices is very satisfying:

  • the full LR(1) algorithm gives pretty intuitive results but produces a lot of states; my hypothesis was that, with modern computers, this wouldn’t matter anymore. This is sort of true – e.g., I’m able to generate and process even the full Rust grammar – but this results in a ton of generated code.
  • the LALR(1) subset often works but sometimes mysteriously fails with indecipherable errors. This is because it is basically a hack that conflates states in the parsing table according to a heuristic; when this heuristic fails, you get strange results.

The Lane Table algorithm published by Pager and Chen at APPLC ’12 offers an interesting alternative. It is an alternative to earlier work by Pager: the “lane tracing” algorithm and the “practical general method”. In any case, the goal is to generate an LALR(1) state machine when possible and gracefully scale up to the full LR(1) state machine as needed.

I found the approach appealing, as it seemed fairly simple, and also seemed to match what I would try to do intuitively. I’ve been experimenting with the Lane Table algorithm in LALRPOP and I now have a simple prototype that seems to work. Implementing it required that I cover various cases that the paper left implicit, and the aim of this blog post is to describe what I’ve done so far. I do not claim that this description is what the authors originally intended; for all I know, it has some bugs, and I certainly think it can be made more efficient.

My explanation is intended to be widely readable, though I do assume some familiarity with the basic workings of an LR-parser (i.e., that we shift states onto a stack, execute reductions, etc). But I’ll review the bits of table construction that you need.

First example grammar: G0

To explain the algorithm, I’m going to walk through two example grammars. The first I call G0 – it is a reduced version of what the paper calls G1. It is interesting because it does not require splitting any states, and so we wind up with the same number of states as in LR(0). Put another way, it is an LALR(1) grammar.

I will be assuming a basic familiarity with the LR(0) and LR(1) state construction.

Grammar G0

G0 = X "c"
   | Y "d"
X  = "e" X
   | "e"
Y  = "e" Y
   | "e"

The key point here is that if you have "e" ..., you could build an X or a Y from that "e" (in fact, there can be any number of "e" tokens). You ultimately decide based on whether the "e" tokens are followed by a "c" (in which case you build an X) or a "d" (in which case you build a Y).

LR(0), since it has no lookahead, can’t handle this case. LALR(1) can, since it augments LR(0) with a token of lookahead; using that, after we see the "e", we can peek at the next thing and figure out what to do.

Step 1: Construct an LR(0) state machine

We begin by constructing an LR(0) state machine. If you’re not familiar with the process, I’ll briefly outline it here, though you may want to read up separately. Basically, we will enumerate a number of different states indicating what kind of content we have seen so far. The first state S0 indicates that we are at the very beginning of our “goal item” G0:

S0 = G0 = (*) X "c"
   | G0 = (*) Y "d"
   | ... // more items to be described later

The G0 = (*) X "c" indicates that we have started parsing a G0; the (*) is how far we have gotten (namely, nowhere). There are two items because there are two ways to make a G0. Now, in these two items, immediately to the right of the (*) we see the symbols that we expect to see next in the input: in this case, an X or a Y. Since X and Y are nonterminals – i.e., symbols defined in the grammar rather than tokens in the input – this means we might also be looking at the beginning of an X or a Y, so we have to extend S0 to account for that possibility (these are sometimes called “epsilon moves”, since these new possibilities arise from consuming no input, which is denoted as “epsilon”):

S0 = G0 = (*) X "c"
   | G0 = (*) Y "d"
   | X = (*) "e" X
   | X = (*) "e"
   | Y = (*) "e" Y
   | Y = (*) "e"

This completes the state S0. Looking at these various possibilities, we see that a number of things might come next in the input: an "e", an X, or a Y. (The nonterminals X and Y can “come next” once we have seen their entire contents.) Therefore we construct three successor states: one (S1) accounts for what happens when we see an "e". The other two (S3 and S5 below) account for what happens after we recognize an X or a Y, respectively.

S1 (what happens if we see an "e") is derived by advancing the (*) past the "e":

S1 = X = "e" (*) X
   | X = "e" (*)
   | ... // to be added later
   | Y = "e" (*) Y
   | Y = "e" (*)
   | ... // to be added later

Here we dropped the G0 = ... possibilities, since those would have required consuming an X or Y. But we have kept the X = "e" (*) X etc. Again we find that there are nonterminals that can come next (X and Y again) and hence we have to expand the state to account for the possibility that, in addition to being partway through the X and Y, we are at the beginning of another X or Y:

S1 = X = "e" (*) X
   | X = "e" (*)
   | X = (*) "e"     // added this
   | X = (*) "e" X   // added this
   | Y = "e" (*) Y
   | Y = "e" (*)
   | Y = (*) "e"     // added this
   | Y = (*) "e" Y   // added this

Here again we expect either an "e", an X, or a Y. If we again check what happens when we consume an "e", we will find that we reach S1 again (i.e., moving past an "e" gets us to the same set of possibilities that we already saw). The remaining states all arise from consuming an X or a Y from S0 or S1:

S2 = X = "e" X (*)

S3 = G0 = X (*) "c"

S4 = Y = "e" Y (*)

S5 = G0 = Y (*) "d"

S6 = G0 = X "c" (*)

S7 = G0 = Y "d" (*)

We can represent the set of states as a graph, with edges representing the transitions between states. The edges are labeled with the symbol ("e", X, etc) that gets consumed to move between states:

S0 -"e"-> S1
S1 -"e"-> S1
S1 --X--> S2
S0 --X--> S3
S1 --Y--> S4
S0 --Y--> S5
S3 -"c"-> S6
S5 -"d"-> S7
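
The closure and successor computations we just walked through are mechanical enough to sketch in code. Here is a rough Rust sketch of both operations (an illustrative sketch only; the types and names are mine, not LALRPOP’s actual implementation):

use std::collections::{BTreeSet, HashMap};

// An item like `X = "e" (*) X`: a production plus a cursor position.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Item {
    lhs: String,      // the nonterminal being produced, e.g. `X`
    rhs: Vec<String>, // the symbols on the right-hand side
    cursor: usize,    // position of the `(*)` within `rhs`
}

impl Item {
    fn next_symbol(&self) -> Option<&String> {
        self.rhs.get(self.cursor)
    }
}

// Maps each nonterminal to the list of its right-hand sides.
type Grammar = HashMap<String, Vec<Vec<String>>>;

// Epsilon moves: whenever the cursor sits just before a nonterminal, add
// all of that nonterminal's productions with the cursor at the far left.
fn closure(mut items: BTreeSet<Item>, grammar: &Grammar) -> BTreeSet<Item> {
    let mut stack: Vec<Item> = items.iter().cloned().collect();
    while let Some(item) = stack.pop() {
        if let Some(nt) = item.next_symbol() {
            for rhs in grammar.get(nt).into_iter().flatten() {
                let new = Item { lhs: nt.clone(), rhs: rhs.clone(), cursor: 0 };
                if items.insert(new.clone()) {
                    stack.push(new); // newly added: expand it too
                }
            }
        }
    }
    items
}

// Successor state: advance the cursor past `symbol` in every item that
// expects it next, then close the result.
fn transition(state: &BTreeSet<Item>, symbol: &str, grammar: &Grammar) -> BTreeSet<Item> {
    let moved: BTreeSet<Item> = state
        .iter()
        .filter(|item| item.next_symbol().map(|s| s == symbol).unwrap_or(false))
        .map(|item| Item { cursor: item.cursor + 1, ..(*item).clone() })
        .collect();
    closure(moved, grammar)
}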

Reducing and inconsistent states. Let’s take another look at this state S1:

S1 = X = "e" (*) X
   | X = "e" (*)      // reduce possible
   | X = (*) "e"
   | X = (*) "e" X
   | Y = "e" (*) Y
   | Y = "e" (*)      // reduce possible
   | Y = (*) "e"
   | Y = (*) "e" Y

There are two interesting things about this state. The first is that it contains some items where the (*) comes at the very end, like X = "e" (*). What this means is that we have seen an "e" in the input, which is enough to construct an X. If we chose to do so, that is called reducing. The effect would be to build up an X, which would then be supplied as an input to a prior state (e.g., S0).

However, the other interesting thing is that our state actually has three possible things it could do: it could reduce X = "e" (*) to construct an X, but it could also reduce Y = "e" (*) to construct a Y; finally, it can shift an "e". Shifting means that we do not execute any reductions, and instead we take the next input and move to the next state (which in this case would be S1 again).

A state that can do both shifts and reduces, or more than one reduce, is called an inconsistent state. Basically it means that there is an ambiguity, and the parser won’t be able to figure out what to do – or, at least, it can’t figure out what to do unless we take some amount of lookahead into account. This is where LR(1) and LALR(1) come into play.
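
In code terms, the possible actions of a state and the inconsistency test might be summarized like this (again an illustrative sketch, not LALRPOP’s types); this predicate is what step 3 below uses to find the states that need fixing:

// One possible action for a state.
enum Action {
    Shift,          // the `(*)` is not at the end: consume the next token
    Reduce(String), // the `(*)` is at the end: build the named nonterminal
}

// Inconsistent: more than one reduce, or a shift alongside a reduce,
// with no lookahead to discriminate between them.
fn is_inconsistent(actions: &[Action]) -> bool {
    let reduces = actions.iter().filter(|a| matches!(a, Action::Reduce(_))).count();
    let shifts = actions.len() - reduces;
    reduces > 1 || (reduces == 1 && shifts > 0)
}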

In an LALR(1) grammar, we keep the same set of states, but we augment the reductions with a bit of lookahead. In this example, as we will see, that suffices – for example, if you look at the grammar, you will see that we only need to do the X = "e" (*) reduction if the next thing in the input is a "c". And similarly we only need to do the Y = "e" (*) reduction if the next thing is a "d". So we can transform the state to add some conditions, and then it is clear what to do:

S1 = X = "e" (*) X    shift if next thing is an `X`
   | X = "e" (*)      reduce if next thing is a "c"
   | X = (*) "e"      shift if next thing is an "e"
   | X = (*) "e" X    shift if next thing is an "e"
   | Y = "e" (*) Y    shift if next thing is a `Y`
   | Y = "e" (*)      reduce if next thing is a "d"
   | Y = (*) "e"      shift if next thing is an "e"
   | Y = (*) "e" Y    shift if next thing is an "e"

Note that the shift vs reduce part is implied by where the (*) is: we always shift unless the (*) is at the end. So usually we just write the lookahead part. Moreover, the “lookahead” for a shift is pretty obvious: it’s whatever is to the right of the (*), so we’ll leave that out. That leaves us with this, where ["c"] (for example) means “only do this reduction if the lookahead is "c"”:

S1 = X = "e" (*) X
   | X = "e" (*)      ["c"]
   | X = (*) "e"
   | X = (*) "e" X
   | Y = "e" (*) Y
   | Y = "e" (*)      ["d"]
   | Y = (*) "e"
   | Y = (*) "e" Y

We’ll call this augmented state an LR(0-1) state (it’s not quite how an LR(1) state is typically defined). Now that we’ve added the lookahead, this state is no longer inconsistent, as the parser always knows what to do.

The next few sections will show how we can derive this lookahead automatically.

Step 2: Convert LR(0) states into LR(0-1) states.

The first step in the process is to naively convert all of our LR(0) states into LR(0-1) states (with no additional lookahead). We will denote the “no extra lookahead” case by writing a special “wildcard” lookahead _. We will thus denote the inconsistent state after transformation as follows, where each reduction has the “wildcard” lookahead:

S1 = X = "e" (*) X
   | X = "e" (*)     [_]
   | X = (*) "e"
   | X = (*) "e" X
   | Y = "e" (*) Y
   | Y = "e" (*)     [_]
   | Y = (*) "e"
   | Y = (*) "e" Y

Naturally, the state is still inconsistent.
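
In a data structure, the wildcard is naturally just one more variant of the lookahead (an illustrative sketch):

// Lookahead attached to a reduce action in an LR(0-1) item.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Lookahead {
    Wildcard,         // `[_]`: reduce no matter what token comes next
    Terminal(String), // `["c"]`: reduce only if the next token is "c"
}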

Step 3: Resolve inconsistencies.

In the next step, we iterate over all of our LR(0-1) states. In this example, we will not need to create new states, but in future examples we will. The iteration thus consists of a queue and some code like this:

use std::collections::VecDeque;

let mut queue = VecDeque::new();
queue.extend(/* all states */);
while let Some(s) = queue.pop_front() {
    if is_inconsistent(s) { // placeholder for "s is an inconsistent state"
        resolve_inconsistencies(s, &mut queue);
    }
}

Step 3a: Build the lane table.

To resolve an inconsistent state, we first construct a lane table. This is done by the code in the lane module (the table module maintains the data structure). It works by starting at each conflict and tracing backwards. Let’s start with the final table we will get for the state S1 and then we will work our way back to how it is constructed. First, let’s identify the conflicting actions from S1 and give them indices:

S1 = X = (*) "e"         // C0 -- shift "e"
   | X = "e" (*)     [_] // C1 -- reduce `X = "e" (*)`
   | X = (*) "e" X       // C0 -- shift "e"
   | X = "e" (*) X
   | Y = (*) "e"         // C0 -- shift "e"
   | Y = "e" (*)     [_] // C2 -- reduce `Y = "e" (*)`
   | Y = (*) "e" Y       // C0 -- shift "e"
   | Y = "e" (*) Y

Several of the items can cause “Conflicting Action 0” (C0), which is to shift an "e". These are all mutually compatible. However, there are also two incompatible actions: C1 and C2, both reductions. In fact, we’ll find that if we look back at state S0, these ‘conflicting’ actions all occur with distinct lookahead. The purpose of the lane table is to summarize that information. The lane table we will end up constructing for these conflicting actions is as follows:

| State | C0    | C1    | C2    | Successors |
| S0    |       | ["c"] | ["d"] | {S1}       |
| S1    | ["e"] | []    | []    | {S1}       |

Here the idea is that the lane table summarizes the lookahead information contributed by each state. Note that for the shift, the state S1 already has enough lookahead information: we only shift when we see the terminal we need next ("e"). But for conflicts C1 and C2, the lookahead actually comes from S0, which is a predecessor state.
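
Concretely, the table might be represented along the following lines, with one cell per (state, conflict) pair plus the successor sets (an illustrative sketch; the index types are mine):

use std::collections::{BTreeMap, BTreeSet};

type StateIndex = usize;
type ConflictIndex = usize;

#[derive(Default)]
struct LaneTable {
    // (state, conflict) -> the lookahead tokens that state contributes
    lookaheads: BTreeMap<(StateIndex, ConflictIndex), BTreeSet<String>>,
    // state -> successor states (the rightmost column above)
    successors: BTreeMap<StateIndex, BTreeSet<StateIndex>>,
}

impl LaneTable {
    fn add_lookahead(&mut self, state: StateIndex, conflict: ConflictIndex, token: &str) {
        self.lookaheads
            .entry((state, conflict))
            .or_default()
            .insert(token.to_string());
    }

    // The union of one conflict's column, used in step 3b below.
    fn column_union(&self, conflict: ConflictIndex) -> BTreeSet<String> {
        self.lookaheads
            .iter()
            .filter(|(key, _)| key.1 == conflict)
            .flat_map(|(_, tokens)| tokens.iter().cloned())
            .collect()
    }
}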

As I said earlier, the algorithm for constructing the table works by looking at the conflicting item and walking backwards. So let’s illustrate with conflict C1. We have the conflicting item X = "e" (*), and we are basically looking to find its lookahead. We know that somewhere in the distant past of our state machine there must be an item like

Foo = ...a (*) X ...b

that led us here. We want to find that item, so we can derive the lookahead from ...b (whatever symbols come after X).

To do this, we will walk the graph. Our state at any point in time will be the pair of a state and an item in that state. To start out, then, we have (S1, X = "e" (*)), which is the conflict C1. Because the (*) is not at the “front” of this item, we have to figure out where this "e" came from on our stack, so we look for predecessors of the state S1 which have an item like X = (*) "e". This leads us to S0 and also S1. So we can push two states in our search: (S0, X = (*) "e") and (S1, X = (*) "e"). Let’s consider each in turn.

The next state is then (S0, X = (*) "e"). Here the (*) lies at the front of the item, so we search the same state S0 for items that would have led to this state via an epsilon move. This basically means an item like Foo = ... (*) X ... – i.e., where the (*) appears directly before the nonterminal X. In our case, we will find G0 = (*) X "c". This is great, because it tells us some lookahead (“c”, in particular), and hence we can stop our search. We add to the table the entry that the state S0 contributes lookahead “c” to the conflict C1. In some cases, we might find something like Foo = ... (*) X instead, where the X we are looking for appears at the end. In that case, we have to restart our search, but looking for the lookahead for Foo.

The next state in our case is (S1, X = (*) "e"). Again the (*) lies at the beginning and hence we search for things in the state S1 where X is the next symbol. We find X = "e" (*) X. This is not as good as last time, because there are no symbols appearing after X in this item, so it does not contribute any lookahead. We therefore can’t stop our search yet, but we push the state (S1, X = "e" (*) X) – this corresponds to the Foo state I mentioned at the end of the last paragraph, except that in this case Foo is the same nonterminal X we started with.

Looking at (S1, X = "e" (*) X), we again have the (*) in the middle of the item, so we move it left, searching for predecessors with the item X = (*) "e" X. We will (again) find that S0 and S1 have such items. In the case of S0, we will (again) find the context "c", which we dutifully add to the table (this has no effect, since it is already present). In the case of S1, we will (again) wind up at the state (S1, X = "e" (*) X). Since we’ve already visited this state, we stop our search; it will not lead to new context.

At this point, our table column for C1 is complete. We can repeat the process for C2, which plays out in an analogous way.
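
The whole backwards walk can be summarized in one loop. Below is a rough sketch reusing the Item and LaneTable types from the earlier snippets; predecessors(…) and items_with_cursor_before(…) are assumed helpers (predecessor states that can shift into this one on the rewound item, and the items in a state whose cursor sits just before a given nonterminal), and for brevity it treats the symbol after the nonterminal as a terminal, ignoring the FIRST-set computation a real implementation would need:

use std::collections::BTreeSet;

fn trace_lookahead(
    start_state: StateIndex,
    start_item: Item,
    conflict: ConflictIndex,
    table: &mut LaneTable,
) {
    let mut visited: BTreeSet<(StateIndex, Item)> = BTreeSet::new();
    let mut stack = vec![(start_state, start_item)];
    while let Some((state, item)) = stack.pop() {
        if !visited.insert((state, item.clone())) {
            continue; // already explored; this path adds no new context
        }
        if item.cursor > 0 {
            // `(*)` not at the front: the symbol before it came from a
            // predecessor state, so rewind the cursor and walk back.
            let rewound = Item { cursor: item.cursor - 1, ..item };
            for pred in predecessors(state, &rewound) {
                stack.push((pred, rewound.clone()));
            }
        } else {
            // `(*)` at the front: this item arose from an epsilon move, so
            // look in the same state for items like `Foo = ... (*) X ...`.
            for source in items_with_cursor_before(state, &item.lhs) {
                match source.rhs.get(source.cursor + 1) {
                    // A symbol follows the nonterminal: it contributes
                    // lookahead, and this path of the search is done.
                    Some(next) => table.add_lookahead(state, conflict, next),
                    // Nothing follows: restart the search for `source`.
                    None => stack.push((state, source)),
                }
            }
        }
    }
}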

Step 3b: Update the lookahead

Looking at the lane table we built, we can union the context sets in any particular column. We see that the context sets for each conflicting action are pairwise disjoint. Therefore, we can simply update each reduce action in our state with those lookaheads in mind, and hence render it consistent:

S1 = X = (*) "e"
   | X = "e" (*)     ["c"] // lookahead from C1
   | X = (*) "e" "X"
   | X = "e" (*) X
   | Y = (*) "e"
   | Y = "e" (*)     ["d"] // lookahead from C2
   | Y = (*) "e" Y
   | Y = "e" (*) Y

This is of course also what the LALR(1) state would look like (though it would also include context for the other items, which doesn’t play into the final machine execution).

At this point we’ve covered enough to handle the grammar G0. Let’s turn to a more complex grammar, grammar G1, and then we’ll come back to cover the remaining steps.

Second example: the grammar G1

G1 is a (typo corrected) version of the grammar from the paper. This grammar is not LALR(1) and hence it is more interesting, because it requires splitting states.

Grammar G1

G1 = "a" X "d"
   | "a" Y "c"
   | "b" X "c"
   | "b" Y "d"
X  = "e" X
   | "e"
Y  = "e" Y
   | "e"

The key point of this grammar is that when we see ... "e" "c" and we wish to know whether to reduce to X or Y, we don’t have enough information. We need to know what is in the ..., because "a" "e" "c" means we reduce "e" to Y and "b" "e" "c" means we reduce to X. In terms of our state machine, this corresponds to splitting the states responsible for X and Y based on earlier context.

Let’s look at a subset of the LR(0) states for G1:

S0 = G0 = (*) "a" X "d"
   | G0 = (*) "a" Y "c"
   | G0 = (*) "b" X "c"
   | G0 = (*) "b" X "d"
   
S1 = G0 = "a" (*) X "d"
   | G0 = "a" (*) Y "c"
   | X = (*) "e" X
   | X = (*) "e"
   | Y = (*) "e" Y
   | Y = (*) "e"

S2 = G0 = "b" (*) X "c"
   | G0 = "b" (*) Y "d"
   | X = (*) "e" X
   | X = (*) "e"
   | Y = (*) "e" Y
   | Y = (*) "e"

S3 = X = "e" (*) X
   | X = "e" (*)      // C1 -- can reduce
   | X = (*) "e"      // C0 -- can shift "e"
   | X = (*) "e" "X"  // C0 -- can shift "e"
   | Y = "e" (*) Y
   | Y = "e" (*)      // C2 -- can reduce
   | Y = (*) "e"      // C0 -- can shift "e"
   | Y = (*) "e" Y    // C0 -- can shift "e"

Here we can see the problem. The state S3 is inconsistent. But it is reachable from both S1 and S2. If we come from S1, then we can have (e.g.) X "d", but if we come from S2, we expect X "c".

Let’s walk through our algorithm again. I’ll start with step 3a.

Step 3a: Build the lane table.

The lane table for state S3 will look like this:

| State | C0    | C1    | C2    | Successors |
| S1    |       | ["d"] | ["c"] | {S3}       |
| S2    |       | ["c"] | ["d"] | {S3}       |
| S3    | ["e"] | []    | []    | {S3}       |

Now if we union each column, we see that both C1 and C2 wind up with lookahead {"c", "d"}. This is our problem. We have to isolate things better. Therefore, step 3b (“update lookahead”) does not apply. Instead we attempt step 3c.

Step 3c: Isolate lanes

This part of the algorithm is only loosely described in the paper, but I think it works as follows. We will employ a union-find data structure. With each set, we will record a “context set”, which records for each conflict the set of lookahead tokens (e.g., {C1:{"d"}}).

A context set tells us how to map the lookahead to an action; therefore, to be self-consistent, the lookaheads for each conflict must be mutually disjoint. In other words, {C1:{"d"}, C2:{"c"}} is valid, and says to do C1 if we see a “d” and C2 if we see a “c”. But {C1:{"d"}, C2:{"d"}} is not, because if we see a “d”, there are two possible actions.
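Concretely, a context set and its consistency check might be sketched in Rust like this (the types are hypothetical, not LALRPOP’s):

use std::collections::{HashMap, HashSet};

// Hypothetical types: conflicts are numbered, tokens are strings.
type Conflict = usize;
type Token = &'static str;

// A context set maps each conflict to the lookaheads that select it.
type ContextSet = HashMap<Conflict, HashSet<Token>>;

// Union two context sets, succeeding only if the result stays
// self-consistent: no token may end up selecting two different actions.
fn try_merge(a: &ContextSet, b: &ContextSet) -> Option<ContextSet> {
    let mut merged = a.clone();
    for (&conflict, tokens) in b {
        merged.entry(conflict).or_insert_with(HashSet::new).extend(tokens);
    }
    // Check that the lookahead sets are pairwise disjoint.
    let mut seen = HashSet::new();
    for tokens in merged.values() {
        for &token in tokens {
            if !seen.insert(token) {
                return None; // some token selects two actions
            }
        }
    }
    Some(merged)
}

For example, merging {C1:{"d"}, C2:{"c"}} with {C0:{"e"}} succeeds, while merging it with {C1:{"c"}, C2:{"d"}} fails, because “c” and “d” would then each select two actions.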

Initially, each state in the lane table is mapped to itself, and its context set is derived from its row in the lane table:

S1 = {C1:d, C2:c}
S2 = {C1:c, C2:d}
S3 = {C0:e}

We designate “beachhead” states as those states in the table that are not reachable from another state in the table (i.e., using the successors). In this case, those are the states S1 and S2. We will be doing a DFS through the table and we want to use those as the starting points.

(Question: is there always at least one beachhead state? Seems like there must be.)
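For what it’s worth, computing the beachheads is straightforward (a sketch with a hypothetical row type; only the successor sets matter here):

use std::collections::{HashMap, HashSet};

type State = usize;

// Hypothetical lane-table row: we only need the successors here.
struct Row {
    successors: Vec<State>,
}

// Beachheads: states that never appear as a successor of *another*
// state in the table (self-loops don't disqualify a state).
fn beachheads(rows: &HashMap<State, Row>) -> Vec<State> {
    let reachable: HashSet<State> = rows
        .iter()
        .flat_map(|(&s, row)| {
            row.successors.iter().copied().filter(move |&succ| succ != s)
        })
        .collect();
    rows.keys().copied().filter(|s| !reachable.contains(s)).collect()
}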

So we begin by iterating over the beachhead states.

for beachhead in beachheads { ... }

When we visit a state X, we will examine each of its successors Y. We consider whether the context set for Y can be merged with the context set for X. So, in our case, X will be S1 to start and Y will be S3. In this case, the context set can be merged, and hence we union S1, S3 and wind up with the following union-find state:

S1,S3 = {C0:e, C1:d, C2:c}
S2    = {C1:c, C2:d}

(Note that this union is just for the purpose of tracking context; it doesn’t imply that S1 and S3 are the ‘same states’ or anything like that.)

Next we examine the edge S3 -> S3. Here the contexts are already merged and everything is happy, so we stop. (We already visited S3, after all.)

This finishes our first beachhead, so we proceed to the next edge, S2 -> S3. Here we find that we cannot union the context: it would produce an inconsistent state. So what we do is we clone S3 to make a new state, S3’, with the initial setup corresponding to the row for S3 from the lane table:

S1,S3 = {C0:e, C1:d, C2:c}
S2    = {C1:c, C2:d}
S3'   = {C0:e}

This also involves updating our LR(0) state set to have a new state S3’. All edges from S2 that led to S3 now lead to S3’; the outgoing edges from S3’ remain unchanged. (At least to start.)

Therefore, the edge S2 -> S3 is now S2 -> S3'. We can now merge the conflicts:

S1,S3  = {C0:e, C1:d, C2:c}
S2,S3' = {C0:e, C1:c, C2:d}

Now we examine the outgoing edge S3’ -> S3. We cannot merge these conflicts, so we search (greedily, I guess) for a clone of S3 where we can merge the conflicts. We find one in S3’, and hence we redirect the S3 edge to S3’ and we are done. (I think the actual search we want is to first look for a clone of S3 that is using literally the same context as us (i.e., same root node), as in this case. If that is not found, then we search for one with a mergeable context. If that fails, then we clone a new state.)

The final state machine thus has two copies of S3, one for the path from S1 and one for the path from S2, which gives us enough context to proceed.
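Putting the walkthrough into code, the per-edge decision of the DFS might look like this (reusing ContextSet and try_merge from the earlier sketch; again, my reconstruction rather than LALRPOP’s actual implementation):

// What to do with an edge X -> Y, given the context sets involved.
enum EdgeAction {
    Merge(ContextSet),  // compatible: union Y's group context into X's
    Redirect(usize),    // point the edge at an existing clone of Y
    Clone(ContextSet),  // make a fresh clone of Y, seeded with Y's row
}

fn decide_edge(
    from_ctx: &ContextSet,  // merged context of X's group
    to_ctx: &ContextSet,    // merged context of Y's group
    row_ctx: &ContextSet,   // Y's own row from the lane table
    clones: &[ContextSet],  // contexts of existing clones of Y
) -> EdgeAction {
    // First preference: merge with Y's current group.
    if let Some(merged) = try_merge(from_ctx, to_ctx) {
        return EdgeAction::Merge(merged);
    }
    // Second: redirect to some existing clone with a compatible context.
    for (i, clone_ctx) in clones.iter().enumerate() {
        if try_merge(from_ctx, clone_ctx).is_some() {
            return EdgeAction::Redirect(i);
        }
    }
    // Last resort: clone Y afresh; a state's own row context is assumed
    // to always be compatible with the group arriving at it.
    EdgeAction::Clone(try_merge(from_ctx, row_ctx).expect("row context should merge"))
}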

Conclusion

As I wrote, I’ve been experimenting with the Lane Table algorithm in LALRPOP and I now have a simple prototype that seems to work. It is not by any means exhaustively tested – in fact, I’d call it minimally tested – but hopefully I’ll find some time to play around with it some more and take it through its paces. It at least handles the examples in the paper.

The implementation is also inefficient in various ways. Some of them are minor – it clones more than it needs to, for example – and easily corrected. But I also suspect that one can do a lot more caching and sharing of results. Right now, for example, I construct the lane table for each inconsistent state completely from scratch, but perhaps there are ways to preserve and share results (it seems naively as if this should be possible). On the other hand, constructing the lane table can probably be made pretty fast: it doesn’t have to traverse that much of the grammar. I’ll have to try it on some bigger examples and see how it scales.

Edits

  • The lane table I originally described had the wrong value for the successor column. Corrected.

Air MozillaReps Weekly Meeting Mar. 16, 2017

Reps Weekly Meeting Mar. 16, 2017 This is a weekly call with some of the Reps to discuss all matters about/affecting Reps and invite Reps to share their work with everyone.

Doug BelshawWhat does it mean to be 'digitally employable'?

Let’s define terms:

Employable

  1. Suitable for paid work.
  2. Able to be used.

So being employable means that someone is useful and can fill a particular role, whether as an employee or freelancer, for paid work.


Having written my doctoral thesis on digital literacy, and led Mozilla’s Web Literacy Map from inception to v1.5, one of my biggest frustrations has been how new literacies are developed. It’s all very well having a framework and a development methodology, but how on earth do you get to actually effect the change you want to see in the world?

For the past few weeks, the germ of an idea has been growing in my mind. On the one hand there are digital literacy frameworks specifying what people should be able to know, do, and think. On the other there are various approaches to employability skills. The latter is a reaction to formal education institutions being required to track the destination of people who graduate (or leave) them.

The third factor in play here is Open Badges. This is a metadata specification and ecosystem for verifiable digital credentials that can be used to provide evidence of anything. In addition, anybody can earn and issue them, the value coming from recognition of the issuing body, and/or whether the supplied evidence shows that the individual has done something worthwhile.

The simplest way of representing these three areas of focus is using a Venn diagram:

Digital Employability?

The reason it’s important that Open Badges are in there is that they provide evidence of digital literacies and employability skills. They provide a ‘route to market’ for the frameworks to actually make a difference in the world.

I know there’s an appetite for this, as I present on a regular basis and one of the slides I use is this one from Sussex Downs College:

Sussex Downs Employability Passport

People ask me to go back to the slide ‘so they can take a photo of it’. Our co-op knows the team behind this project: we helped with its kick-off, and then ran a thinkathon for them as they looked to scale it.

The Sussex Downs approach has a number of great elements to it: a three part badge system; competencies defined with the help of local businesses; a very visual design; and the homely metaphor of a ‘passport’.

One thing that’s perhaps lacking is the unpacking of ‘Digital Literacy’ as more than a single element on the grid. To my mind, there’s a plethora of digital knowledge, skills, and behaviours that’s directly applicable to employability.

In my experience, it’s important to come up with a framework for the work that you do. That’s why Sussex Downs’ Employability Passport is so popular. However, I think there’s also a need for people to be able to do the equivalent of pressing ‘view source’ on the framework to see why and how it was put together.

What I’d like to do is to come up with a framework for digital employability that looks at the knowledge, skills, and understanding to thrive in the new digital economy. That will necessarily be a mix of old-school things like using Outlook and newer workflows such as GitHub.

Once that’s defined, preferably with input from a diverse list of contributors and endorsement from various relevant organisations, it will be time to issue badges. Once the ‘rubber hits the road’ we’ll no doubt then need to update the framework. It’s an iterative process.

Early days, but I wanted to put something out there. Get in touch if this sounds interesting!


Comments? Questions? I’m @dajbelshaw on Twitter, or you can email me: hello@dynamicskillset.com

QMOExtra Testday event held by Mozilla Tamilnadu community

Hello Mozillians,

This week, the Mozilla community from Tamilnadu organized and held a Testday event in various campus clubs in their region.

I just wanted to thank you all for taking part in this. With the community help, Mozilla is improving every day.

Several test cases were executed for the WebM Alpha, Reader Mode Displays Estimate Reading Time and Quantum – Compositor Process features.

Many thanks to Prasanth P, Surentharan R A, Monesh, Subash, Rohit R, @varun1102, Akksaya, Roshini, Swathika, Suvetha Sri, Bhava, aiswarya.M, Aishvarya, Divya, Arpana, Nivetha, Vallikannu, Pavithra Roselin, Suryakala, prakathi, Bhargavi.G, Vignesh.R, Meganisha.B, Aishwarya.k, harshini.k, Rajesh, Krithika Sowbarnika, harini shilpa, Dhinesh kumar, KAVIPRIYA.S, HARITHA K SANKARI, Nagaraj V, abarna, Sankararaman, Harismitaa R K, Kavya, Monesh, Harini, Vignesh, Anushri, Vishnu Priya, Subash.M, Vinothini K, Pavithra R.

Keep up the good work!
Mihai Boldan, QA Community Mentor
Firefox for Desktop, Release QA Team

The Rust Programming Language BlogAnnouncing Rust 1.16

The Rust team is happy to announce the latest version of Rust, 1.16.0. Rust is a systems programming language focused on safety, speed, and concurrency.

If you have a previous version of Rust installed, getting Rust 1.16 is as easy as:

$ rustup update stable

If you don’t have it already, you can get rustup from the appropriate page on our website, and check out the detailed release notes for 1.16.0 on GitHub.

What’s in 1.16.0 stable

The largest addition to Rust 1.16 is cargo check. This new subcommand should speed up the development workflow in many cases.

What does it do? Let’s take a step back and talk about how rustc compiles your code. Compilation has many “passes”, that is, there are many distinct steps that the compiler takes on the road from your source code to producing the final binary. You can see each of these steps (and how much time and memory they take) by passing -Z time-passes to a nightly compiler:

> rustc .\hello.rs -Z time-passes
time: 0.003; rss: 16MB  parsing
time: 0.000; rss: 16MB  recursion limit
time: 0.000; rss: 16MB  crate injection
time: 0.000; rss: 16MB  plugin loading
time: 0.000; rss: 16MB  plugin registration
time: 0.049; rss: 34MB  expansion
<snip>

There’s a lot of them. However, you can think of this process in two big steps: first, rustc does all of its safety checks, makes sure your syntax is correct, all that stuff. Second, once it’s satisfied that everything is in order, it produces the actual binary code that you end up executing.

It turns out that that second step takes a lot of time. And most of the time, it’s not necessary. That is, when working on some Rust code, many developers get into a workflow like this:

  1. Write some code.
  2. Run cargo build to make sure it compiles.
  3. Repeat 1-2 as needed.
  4. Run cargo test to make sure your tests pass.
  5. GOTO 1.

In step two, you never actually run your code. You’re looking for feedback from the compiler, not to actually run the binary. cargo check supports exactly this use-case: it runs all of the compiler’s checks, but doesn’t produce the final binary.
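In other words, step 2 of that workflow can swap cargo build for cargo check:

$ cargo check   # fast: runs the compiler's checks, no codegen
$ cargo build   # only when you actually need a runnable binary
$ cargo test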

So how much speedup do you actually get? Like most performance-related questions, the answer is “it depends.” Here are some very un-scientific benchmarks:

|                 | thanks  | cargo   | diesel |
| initial build   | 134.75s | 236.78s | 15.27s |
| initial check   | 50.88s  | 148.52s | 12.81s |
| speedup         | 2.648   | 1.594   | 1.192  |
| secondary build | 15.97s  | 64.34s  | 13.54s |
| secondary check | 2.9s    | 9.29s   | 12.3s  |
| speedup         | 5.506   | 6.925   | 1.100  |

The ‘initial’ categories are the first build after cloning down a project. The ‘secondary’ categories involved adding one blank line to the top of src\lib.rs and running the command again. That’s why the initial ones are more dramatic; they involve also doing this for all dependencies, as well as the crate itself. As you can see, larger projects with many dependencies see a big improvement, but smaller ones see much more modest gains.

We are still working on improving compile-times generally as well, though we don’t have anything in particular to highlight at this time.

Other improvements

To support cargo check, rustc has learned to emit a new kind of file: .rmeta. This file will contain only the metadata about a particular crate. cargo check needs this for your dependencies, to let the compiler check types and such from them. It’s also useful for the Rust Language Server, and possibly more tools in the future.
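If you’re curious what these look like, you can ask rustc to emit metadata directly (the exact output file name below is my assumption, and may vary):

> rustc --emit=metadata --crate-type=lib foo.rs

This runs the checks and writes a libfoo.rmeta file instead of a compiled library.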

Another large change is the removal of a long-standing diagnostic: consider using an explicit lifetime parameter. This diagnostic would kick in whenever you had an incorrect lifetime annotation, and the compiler thought that you might have meant something else. Consider this code:

use std::str::FromStr;

pub struct Name<'a> {
    name: &'a str,
}

impl<'a> FromStr for Name<'a> {
    type Err = ();

    fn from_str(s: &str) -> Result<Name, ()> {
        Ok(Name { name: s })
    }
}

Here, Rust isn’t sure what to do with the lifetimes; as written, the code doesn’t guarantee that s will live as long as Name, which is required for Name to be valid. Let’s try to compile this code with Rust 1.15.1:

> rustc +1.15.1 foo.rs --crate-type=lib
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in generic type due to conflicting requirements
  --> .\foo.rs:10:5
   |
10 |       fn from_str(s: &str) -> Result<Name, ()> {
   |  _____^ starting here...
11 | |         Ok(Name { name: s })
12 | |     }
   | |_____^ ...ending here
   |
help: consider using an explicit lifetime parameter as shown: fn from_str(s: &'a str) -> Result<Name, ()>
  --> .\foo.rs:10:5
   |
10 |       fn from_str(s: &str) -> Result<Name, ()> {
   |  _____^ starting here...
11 | |         Ok(Name { name: s })
12 | |     }
   | |_____^ ...ending here

The compiler explains the issue, and gives a helpful suggestion. So let’s try it: modify the code to add in the 'a, and compile again:

> rustc +1.15.1 .\foo.rs --crate-type=lib
error[E0308]: method not compatible with trait
  --> .\foo.rs:10:5
   |
10 |       fn from_str(s: &'a str) -> Result<Name, ()> {
   |  _____^ starting here...
11 | |         Ok(Name { name: s })
12 | |     }
   | |_____^ ...ending here: lifetime mismatch
   |
<snip>
help: consider using an explicit lifetime parameter as shown: fn from_str(s: &'a str) -> Result<Name<'a>, ()>
  --> .\foo.rs:10:5
   |
10 |       fn from_str(s: &'a str) -> Result<Name, ()> {
   |  _____^ starting here...
11 | |         Ok(Name { name: s })
12 | |     }
   | |_____^ ...ending here

It still doesn’t work. That help message was not actually helpful. It does suggest adding another lifetime, this time on Name. If we do that…

> rustc +1.15.1 .\foo.rs --crate-type=lib
<snip>
help: consider using an explicit lifetime parameter as shown: fn from_str(s: &'a str) -> Result<Name<'a>, ()>
  --> .\foo.rs:10:5

… that’s what we already have, compiler!

This diagnostic was well-intentioned, but when it was wrong, it was very wrong, as you can see here. Sometimes it wouldn’t even suggest valid Rust syntax! Furthermore, more advanced Rust users didn’t really need the suggestion, but new Rustaceans would take it to heart, and then be led down this bad path. As such, we decided that for now, we should remove the help message entirely. We may bring it back in the future, but only if we can limit false positives.

An aside: the above implementation is not possible; Name would need to use String, not &str.
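For completeness, here is one version that does compile (my own example, not from the release notes), with Name owning its data:

use std::str::FromStr;

pub struct Name {
    name: String,
}

impl FromStr for Name {
    type Err = ();

    fn from_str(s: &str) -> Result<Name, ()> {
        // Copy the borrowed input so Name doesn't depend on its lifetime.
        Ok(Name { name: s.to_string() })
    }
}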

In other diagnostic changes, previous versions of Rust would helpfully attempt to suggest fixes for typos:

let foo = 5;

println!("{}", ffo);

Would give this error:

error[E0425]: cannot find value `ffo` in this scope
 --> foo.rs:4:20
  |
4 |     println!("{}", ffo);
  |                    ^^^ did you mean `foo`?

However, this would only happen in certain circumstances: sometimes for local variables, and for fields in structs. This now happens nearly everywhere. When combined with some other related improvements, this results in a significant improvement in these sorts of diagnostics.

See the detailed release notes for more.

Library stabilizations

21 new bits of API were stabilized in this release.

In addition, a number of small improvements to existing functions landed. For example writeln! now has a single-argument form, just like println! has. This ends up writing only a newline, but is a nice bit of symmetry.
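A quick illustration (my own example):

use std::io::Write;

fn main() {
    let mut out = std::io::stdout();
    writeln!(out, "hello {}", "world").unwrap(); // the usual form
    writeln!(out).unwrap();                      // new: writes just "\n"
}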

All structs in the standard library now implement Debug.

When slicing a &str, you’ll see better errors. For example, this code:

&"abcαβγ"[..4]

Is incorrect. It generates this error:

thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'

The part after the ; is new.

See the detailed release notes for more.

Cargo features

In addition to cargo check, Cargo and crates.io have some new polish added. For example, cargo build and cargo doc now take a --all flag for building and documenting every crate in your workspace with one command.
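For example:

$ cargo build --all   # build every crate in the workspace
$ cargo doc --all     # document every crate in the workspace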

Cargo now has a --version --verbose flag, mirroring rustc.

Crates.io now can show off your TravisCI or AppVeyor badges on your crate’s page.

In addition, both Cargo and crates.io understand categories. Unlike keywords, which are free-form, categories are curated. Also, keywords are used for searching, but categories are not. In other words, categories are intended to assist browsing, and keywords are intended to assist searching.

You can browse crates by category here.

See the detailed release notes for more.

Contributors to 1.16.0

Last release, we introduced thanks.rust-lang.org. We have been doing some behind-the-scenes refactoring work to allow for more projects than only Rust itself; we’re hoping to introduce that in the next release.

We had 135 individuals contribute to Rust 1.16. Thanks!

Myk MelezIntroducing qbrt

I recently blogged about discontinuing Positron. I’m trying a different tack with a new experiment, codenamed qbrt, that reuses an existing Gecko runtime (and its existing APIs) while simplifying the process of developing and packaging a desktop app using web technologies.

qbrt is a command-line interface written in Node.js and available via NPM:

npm install -g qbrt

Installing it also installs a Gecko runtime (currently a nightly build of Firefox, but in the future it could be a stable build of Firefox or a custom Gecko runtime). Its simplest use is then to invoke the ‘run’ command with a URL:

qbrt run https://eggtimer.org/

This will start a process and load the URL into a native window.

URLs loaded in this way don’t have privileged access to the system. They’re treated as web content, not application chrome.

To load a desktop application with system privileges, point qbrt at a local directory containing a package.json file and main entry script:

qbrt run path/to/my/app/
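The post doesn’t show the package.json here, but presumably it follows standard Node conventions, with "main" naming the privileged entry script (an assumption on my part; qbrt’s example/ app, mentioned below, is the authoritative reference):

{
  "name": "my-app",
  "version": "1.0.0",
  "main": "main.js"
}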

For example, clone qbrt’s repo and try its example/ app:

git clone https://github.com/mozilla/qbrt.git
qbrt run qbrt/example/

This will start a process and load the app into a privileged context, giving it access to Gecko’s APIs for opening windows and loading web content along with system integration APIs for file manipulation, networking, process management, etc.

(Another good example is the “shell” app that qbrt uses to load URLs.)

To package an app for distribution, invoke the ‘package’ command, which creates a platform-specific package containing both the app’s resources and the Gecko runtime:

qbrt package path/to/my/app/

Note that while qbrt is written in Node.js, it doesn’t provide Node.js APIs to apps. It might be useful to do so, using SpiderNode, as we did with Positron, although Gecko’s existing APIs expose equivalent functionality.

Also, qbrt doesn’t yet support runtime version management (i.e. being able to specify which version of Gecko to use, and to switch between them). At the time you install it, it downloads the latest nightly build of Firefox. (You can update that nightly build by reinstalling qbrt.)

And the packaging support is primitive. qbrt creates a shell script (batch script on Windows) to launch your app, and it packages your app using a platform-specific format (ZIP on Windows, DMG on Mac, and tar/gzip on Linux). But it doesn’t set icons nor most other package meta-data, and it doesn’t create auto-installers nor support signing the package.

In general, qbrt is immature and unstable! It’s appropriate for testing, but it isn’t yet ready for you to ship apps with it.

Nevertheless, I’m keen to hear how it works for you, and whether it supports your use cases. What would you want to do with it, and what additional features would you need from it?

The Mozilla BlogFive issues that will determine the future of Internet Health

In January, we published our first Internet Health Report on the current state and future of the Internet. In the report, we broke down the concept of Internet health into five issues. Today, we are publishing issue briefs about each of them: online privacy and security, decentralization, openness, web literacy and digital inclusion. These issues are the building blocks to a healthy and vibrant Internet. We hope they will be a guide and resource to you.

We live in a complex, fast moving, political environment. As policies and laws around the world change, we all need to help protect our shared global resource, the Internet. Internet health shouldn’t be a partisan issue, but rather, a cause we can all get behind. And our choices and actions will affect the future health of the Internet, for better or for worse.

We work on many other policies and projects to advance our mission, but we believe that these issue briefs help explain our views and actions in the context of Internet health:


 

1. Online Privacy & Security:

Security and privacy on the Internet are fundamental and must not be treated as optional.

In our brief, we highlight the following subtopics:

  • Meaningful user control – People care about privacy. But effective understanding and control are often difficult, or even impossible, in practice.
  • Data collection and use – The tech industry, too often, reflects a culture of ‘collect and hoard all the data’. To preserve trust online, we need to see a change.
  • Government surveillance – Public distrust of government is high because of broad surveillance practices. We need more transparency, accountability and oversight.
  • Cybersecurity – Cybersecurity is user security. It’s about our Internet, our data, and our lives online. Making it a reality requires a shared sense of responsibility.

Protecting your privacy and security doesn’t mean you have something to hide. It means you have the ability to choose who knows where you go and what you do.


2. Openness:

A healthy Internet is open, so that together, we can innovate.

To make that a reality, we focus on these three areas:

  • Open source – Being open can be hard. It exposes every wrinkle and detail to public scrutiny. But it also offers tremendous advantages.
  • Copyright – Offline copyright law built for an analog world doesn’t fit the current digital and mobile reality.
  • Patents – In technology, overbroad and vague patents create fear, uncertainty and doubt for innovators.

Copyright and patent laws should better foster collaboration and economic opportunity. Open source, open standards, and pro-innovation policies must continue to be at the heart of the Internet.


3. Decentralization:

There shouldn’t be online monopolies or oligopolies; a decentralized Internet is a healthy Internet.

To accomplish that goal, we are focusing on the following policy areas.

  • Net neutrality – Network operators must not be allowed to block or skew connectivity or the choices of Internet users.
  • Interoperability – If short-term economic gains limit long-term industry innovation, then the entire technology industry and economy will suffer the consequences.
  • Competition and choice – We need the Internet to be an engine for competition and user choice, not an enabler of gatekeepers.
  • Local contribution – Local relevance is about more than just language; it’s also tailored to the cultural context and the local community.

When there are just a few organizations and governments who control the majority of online content, the vital flow of ideas and knowledge is blocked. We will continue to look for public policy levers to advance our vision of a decentralized Internet.


4. Digital Inclusion:

People, regardless of race, income, nationality, or gender, should have unfettered access to the Internet.

To help promote an open and inclusive Internet, we are focusing on these issues:

  • Advancing universal access to the whole Internet – Everyone should have access to the full diversity of the open Internet.
  • Advancing diversity online – Access to and use of the Internet are far from evenly distributed. This represents a connectivity problem and a diversity problem.
  • Advancing respect online – We must focus on changing and building systems that rely on both technology and humans, to increase and protect diverse voices on the Internet.

Numerous and diverse obstacles stand in the way of digital inclusion, and they won’t be overcome by default. Our aim is to collaborate with, create space for, and elevate everyone’s contributions.


5. Web Literacy:

Everyone should have the skills to read, write and participate in the digital world.

To help people around the globe participate in the digital world, we are focusing on these areas:

  • Moving beyond coding –  Universal web literacy doesn’t mean everyone needs to learn to code; other kinds of technical awareness and empowerment can be very meaningful.
  • Integrating web literacy into education – Incorporating web literacy into education requires examining the opportunities and challenges faced by both educators and youth.
  • Cultivating digital citizenship – Everyday Internet users should be able to shape their own Internet experience, through the choices that they make online and through the policies and organizations they choose to support.

Web literacy should be foundational in education, like reading and math. Empowering people to shape the web enables people to shape society itself. We want people to go beyond consuming and contribute to the future of the Internet.


Promoting, protecting, and preserving a healthy Internet is challenging, and takes a broad movement working on many different fronts. We hope that you will read these and take action alongside us, because in doing so you will be protecting the integrity of the Internet. For our part, we commit to advancing our mission and continuing our fight for a vibrant and healthy Internet.

The post Five issues that will determine the future of Internet Health appeared first on The Mozilla Blog.

Air MozillaThe Joy of Coding - Episode 95

The Joy of Coding - Episode 95 mconley livehacks on real Firefox bugs while thinking aloud.

Air MozillaBuilding Habit-Forming Products with Nir Eyal

Building Habit-Forming Products with Nir Eyal Nir Eyal has built and invested in products reaching hundreds of millions of users including AdNectar, Product Hunt and EventBrite. He'll draw on core psychological...

Mozilla Addons BlogAdd-ons Update – 2017/03

Here’s the state of the add-ons world this month.

The Road to Firefox 57 explains what developers should look forward to in regards to add-on compatibility for the rest of the year. Please give it a read if you haven’t already.

The Review Queues

In the past month, 1,414 listed add-on submissions were reviewed:

  • 1132 (80%) were reviewed in fewer than 5 days.
  • 31 (2%) were reviewed between 5 and 10 days.
  • 251 (18%) were reviewed after more than 10 days.

There are 594 listed add-ons awaiting review.

We met last week to discuss the state of the queues and our plans to reduce waiting times. There are already some changes coming in the next month or so that should help significantly, but we have larger plans that we will share soon that should address this recurring problem permanently.

If you’re an add-on developer and are looking for contribution opportunities, please consider joining us. Add-on reviewers are critical for our success, and can earn cool gear for their work. Visit our wiki page for more information.

Compatibility

The blog post for 53 is up and the bulk validation will be run soon. Firefox 54 is coming up.

Multiprocess Firefox is enabled for some users, and will be deployed for most users very soon. Make sure you’ve tested your add-on and either use WebExtensions or set the multiprocess compatible flag in your add-on manifest.

As always, we recommend that you test your add-ons on Beta and Firefox Developer Edition to make sure that they continue to work correctly. End users can install the Add-on Compatibility Reporter to identify and report any add-ons that aren’t working anymore.

Recognition

We would like to thank the following people for their recent contributions to the add-ons world:

  • Piotr Drąg
  • Niharika Khanna
  • saintsebastian
  • Atique Ahmed Ziad
  • gilbertginsberg
  • felixgirault
  • StandB
  • lavish205
  • numrut
  • fitojb
  • totaki
  • ingoe

You can read more about their work in our recognition page.

The post Add-ons Update – 2017/03 appeared first on Mozilla Add-ons Blog.

Robert KaiserFinal Round for My LCARStrek and EarlyBlue Themes

As you may have noted, Mozilla published a plan for a new themes system that doesn't fully cover my thoughts on the matter and ends up making themes that go as far as my LCARStrek theme impossible.

The only way I could still hold up this extent of theming is to spread it guerilla-style as userChrome.css mods, i.e. a long CSS sheet to be copied into people's userChrome.css manually. That would still allow the extent of theming, but be extremely inconvenient to distribute.

Because of that, I will stop development of my themes as soon as Firefox 57 hits Nightly and I can't use the LCARStrek theme myself any more (EarlyBlue, which is SeaMonkey-only, is something I just dragged along anyhow). Given the uncertainty of even having releases and the small "market", I also will not continue them for SeaMonkey only; Firefox has been the only thing that really mattered there any more.

Also, explicit theming support for Firefox devtools is being removed from LCARStrek with the 2.49 release that I just submitted to AMO, as it's extremely complicated to maintain and, with the looming removal of full themes from Firefox, that amount of work is not worth my time any more. Because of this, there is a bit of a mixture of styles in some areas of devtools, esp. in Firefox 52 (improving in newer versions), but that is outside of the control of a theme author. I tested that devtools are usable this way; the contrast of icons in toolbars isn't optimal at times, but they are visible enough that developers can work with them. To any LCARStrek users: sorry for the inconvenience, I would have put more work into this if theming of this extent were not being removed.


This is a hard step for me, as the first thing I experimented with when I downloaded my first Mozilla M5 build in 1999 was actually the theming files, and LCARStrek came out of that as a demonstration of how awesome this system of customization was and how far it could go. It achieved a look that really was out of this world, but I guess the new direction of Firefox is not compatible with a 24th century look. ;-)

It will also be hard for me to move back to the bland look of the default theme, esp. as it looks even more boring on Linux than on other platforms, but I have a few months to get used to the idea before I actually have to do it, and I will keep the themes going for that little while.

Somehow this fits well with the overall theme that MoCo and I are at odds right now on a number of things, but you can be assured that I'm not gone from the community; as a matter of fact, I have planned a few activities in Vienna in the next months, from WebVR workshops to conference appearances, and I'm just about to finish the Tech Speakers training and hope to be more active in that area in the future.

LLAP!

Firefox NightlyThese Weeks in Firefox: Issue 12

Highlights

  • Turning on the permissions dialog for WebExtensions this week

    The WebExtension permission dialog for Ghostery, asking the user if the add-on can access browser activity during navigation, browser tabs, and browsing history

    Now you get a better sense of what the add-on is capable of.

  • Download progress indication redesign landed and is now in Nightly and Firefox Developer Edition! (UX spec)
    • Special thanks to the front-end engineering and user experience team who worked on the project: Bryant Mao, Carol Huang, Morpheus Chen, Rex Lee, and Sean Lee!
  • e10s-multi tentatively scheduled to ship to release in Firefox 55 with 4 content processes
    • Recent measurements show that we’re quite competitive, memory-wise, in this configuration

      A set of bar graphs comparing memory usage in Firefox and competing browsers with different content process numbers.

      Feeling slim and trim!

  • Updated orphaning dashboard! This will help us figure out why some people are stuck on older versions of Firefox.
  • You can now reorder tabs in Firefox for Android (Nightly)!
  • 🔥New Test Pilot experiment🔥: Containers launched March 2nd
  • ehsan sent out a Quantum Flow Engineering Newsletter that you should read

Friends of the Firefox team

Project Updates

Add-ons

Activity Stream

Electrolysis (e10s)

Firefox Core Engineering

  • Flash
    • Running a telemetry experiment this month on 55 (nightly) to see how the Flash-as-click-to-play-by-default behaves, in advance of a 53 release Shield study
  • Crash pings on Nightly 55 and Aurora 54:
    • …are being sent by pingSender;
    • …have content crash pings;
    • …have raw stacks in those crash pings.
    • Building the ability to gather info from that data starts this week.
    • Reminder that FPO is off as of 53.
  • Main shutdown pings are going to be sent via pingSender in 55
  • Beginning work on Update Agent, which will continue downloading an update if the user’s session has ended
  • Updater
    • Simplification of Updater UI aimed at 55
    • Changes for MAR signing are unblocked (compression (LZMA) and cert (SHA-384)), aiming to land by the end of Q1.

Form Autofill

Mobile

  • Shipped Firefox Focus 3.1 with support for 20 new locales!
  • Extended the beta period for Firefox for iOS 7.0 as well as a new beta build for user testing

Platform UI and other Platform Audibles

Privacy/Security

  • Polish work for permission notifications + insecure password warning (live in 52 🔥)
  • johannh is working on getting right click + autocomplete/insecure password warning behavior to work correctly for password and username fields
  • nhnt11 is working on a few polish bugs for permissions notifications
  • paolo is fixing regressions (and importantly, bug 1345449 – doorhanger won’t stay open when the browser window is minimized)

Quality of Experience

  • Will soon be handing off most of the Theme API work to the WebExtensions team to allow transitioning to the Photon work.
  • Preferences work is moving along, the patches are looking good and we are going through review cycles now.
  • Continuing to work on performance of importing data from other browsers.
    • Currently looking at a 10 times runtime improvement for bookmark import \o/ 🔥🔥🔥
    • History import improvements have landed on 54 and uplifted to 53.

Search

Sync / Firefox Accounts

  • The Mobile Bookmarks folder is now visible in the Bookmarks menu bar and toolbar after a successful Sync! This is the result of work done by the Sync and SUMO teams to help make Firefox Sync easier to use.

Test Pilot

  • Page Shot in 54 is on track.
  • Experiments updating this week:
    • Snoozetabs (adding localization and bug fixes)
    • Min Vid (adding better metrics and bug fixes).  History/Upcoming queues coming soon
    • Cliqz v2 (new UI) coming soon

Here are the raw meeting notes that were used to derive this list.

Want to help us build Firefox? Get started here!

Here’s a tool to find some mentored, good first bugs to hack on.

Mark CôtéConduit Field Report, March 2017

For background on Conduit, please see the previous post and the Intent to Implement.

Autoland

We kicked off Conduit work in January starting with the new Autoland service. Right now, much of the Autoland functionality is located in the MozReview Review Board extension: the permissions model, the rewriting of commit messages to reflect the reviewers, and the user interface. The only part that is currently logically separate is the “transplant service”, which actually takes commits from one repo (e.g. reviewboard-hg) and applies them to another (e.g. try, mozilla-central). Since the goal of Conduit is to decouple all the automation from code-review tools, we have to take everything that’s currently in Review Board and move it to new, separate services.

The original plan was to switch Review Board over to the new Autoland service when it was ready, stripping out all the old code from the MozReview extension. This would mean little change for MozReview users (basically just a new, separate UI), but would get people using the new service right away. After Autoland, we’d work on the push-to-review side, hooking that up to Review Board, and then extend both systems to interface with BMO. This strategy of incrementally replacing pieces of MozReview seemed like the best way to find bugs as we went along, rather than a massive switchover all at once.

However, progress was a bit slower than we anticipated, largely due to the fact that so many things were new about this project (see below). We want Autoland to be fully hooked up to BMO by the end of June, and integrating the new system with both Review Board and BMO as we went along seemed increasingly like a bad idea. Instead, we decided to put BMO integration first, and then follow with Review Board later (if indeed we still want to use Review Board as our rich-code-review solution).

This presented us with a problem: if we wouldn’t be hooking the new Autoland service up to Review Board, then we’d have to wait until the push service was also finished before we hooked them both up to BMO. Not wanting to turn everything on at once, we pondered how we could still launch new services as they were completed.

Moving to the other side of the pipeline

The answer is to table our work on Autoland for now and switch to the push service, which is the entrance to the commit pipeline. Building this piece first means that users will be able to push commits to BMO for review. Even though they would not be able to Autoland them right away, we could get feedback and make the service as easy to use as possible. Think of it as a replacement for bzexport.

Thanks to our new Scrum process (see also below), this priority adjustment was not very painful. We’ve been shipping Autoland code each week, so, while it doesn’t do much yet, we’re not abandoning any work in progress or leaving patches half finished. Plus, since this new service is also being started from scratch (although involving lots of code reuse from what’s currently in MozReview), we can apply the lessons we learned from the last couple months, so we should be moving pretty quickly.

Newness

As I mentioned above, although the essence of Conduit work right now is decoupling existing functionality from Review Board, it involves a lot of new stuff. Only recently did we realize exactly how much new stuff there was to get used to!

New team members

We welcomed Israel Madueme to our team in January and threw him right into the thick of things. He’s adapted tremendously well and started contributing immediately. Of course a new team member means new team dynamics, but he already feels like one of us.

Just recently, we’ve stolen dkl from the BMO team, where he’s been working since joining Mozilla 6 years ago. I’m excited to have a long-time A-Teamer join the Conduit team.

A new process

At the moment we have five developers working on the new Conduit services. This is more people on a single project than we’re usually able to pull together, so we needed a process to make sure we’re working to our collective potential. Luckily one of us is a certified ScrumMaster. I’ve never actually experienced Scrum-style development before, but we decided to give it a try.

I’ll have a lot more to say about this in the future, as we’re only just hitting our stride now, but it has felt really good to be working with solid organizational principles. We’re spending more time in meetings than usual, but it’s paying off with a new level of focus and productivity.

A new architecture

Working within Review Board was pretty painful, and the MozReview development environment, while amazing in its breadth and coverage, was slow and too heavily focussed on lengthy end-to-end tests. Our new design follows more of a microservice-based approach. The Autoland verification system (which checks users permissions and ensures that commits have been properly reviewed) is a separate service, as is the UI and the transplant service (as noted above, this last part was actually one of the few pieces of MozReview that was already decoupled, so we’re one step ahead there). Similarly, on the other side of the pipeline, the commit index is a separate service, and the review service may eventually be split up as well.

We’re not yet going whole-hog on microservices—we don’t plan, for starters at least, to have more than 4 or 5 separate services—but we’re already benefitting from being able to work on features in parallel and preventing runaway complexity. The book Building Microservices has been instrumental to our new design, as well as pointing out exactly why we had difficulties in our previous approach.

New operations

As the A-Team is now under Laura Thomson, we’re taking advantage of our new, closer relationship to CloudOps to try a new deployment and operations approach. This has freed us of some of the constraints of working in the data centre while letting us take advantage of a proven toolchain and process.

New technologies

We’re using Python 3.5 (and probably 3.6 at some point) for our new services, which I believe is a first for an A-Team project. It’s new for much of the team, but they’ve quickly adapted, and we’re now insulated against the 2020 deadline for Python 2, as well as benefitting from the niceties of Python 3 like better Unicode support.

We also used a few technologies for the Autoland service that are new to most of the team: React and Tornado. While the team found it interesting to learn them, in retrospect using them now was probably a case of premature optimization. Both added complexity that was unnecessary right now. React’s URL routing was difficult to get working in a way that seamlessly supported a local, Docker-based development environment and a production deployment scenario, and Tornado’s asynchronous nature led to extra complexity in automated tests. Although they are both fine technologies and provide scalable solutions for complex apps, the individual Conduit services are currently too small to really benefit.

We’ve learned from this, so we’re going to use Flask as the back end for the push services (commit index and review-request generator), for now at least, and, if we need a UI, we’ll probably use a relatively simple template approach with JavaScript just for enhancements.

Next

In my next post, I’m going to discuss our approach to the push services and more on what we’ve learned from MozReview.

Air MozillaMartes Mozilleros, 14 Mar 2017

Martes Mozilleros Reunión bi-semanal para hablar sobre el estado de Mozilla, la comunidad y sus proyectos. Bi-weekly meeting to talk (in Spanish) about Mozilla status, community and...

Christian HeilmannWant to learn more about using the command line? Remy helps!

This is an unashamed plug for Remy Sharp’s terminal training course command line for non-techies. Go over there and have a look at what he’s lined up for a very affordable price. In a series of videos he explains all the ins and outs of the terminal and its commands that can make you much more effective in your day-to-day job.

Working the command line ebook

I’ve read the ebook of the same course and have to say that I learned quite a few things but – more importantly – remembered a lot I had forgotten. By using the findings over and over a lot has become muscle memory, but it is tough to explain what I am doing. Remy did a great job making the dark command line magic more understandable and less daunting. Here is what the course covers:

Course material

“Just open the terminal”

  • Just open the terminal (03:22)
  • Why use a terminal? (03:23)
  • Navigating directories (07:71)
  • Navigation shortcuts (01:06)

Install all the things

  • Running applications (05:47)
  • brew install fun (07:46)
  • gem install (06:32)
  • npm install—global (09:44)
  • Which is best? (02:13)

Tools of the Terminal Trade

  • Connecting programs (08:25)
  • echo & cat (01:34)
  • grep “searching” (06:22)
  • head tail less (10:24)
  • sort | uniq (07:58)

How (not) to shoot yourself in the foot

  • Delete all the things (07:42)
  • Super user does…sudo (07:50)
  • Permissions: mode & owner (11:16)
  • Kill kill kill! (12:21)
  • Health checking (12:54)

Making the shell your own

  • Owning your terminal (09:19)
  • Fish ~> (10:18)
  • Themes (01:51)
  • zsh (zed shell) (10:11)
  • zsh plugins: z st… (08:26)
  • Aliases (05:43)
  • Alias++ → functions (08:15)

Furthering your command line

  • Piping workflow (08:14)
  • Setting environment values (03:04)
  • Default environment variable values (01:46)
  • Terminal editors (06:41)
  • wget and cURL (09:53)
  • ngrok for tunnelling (06:38)
  • json command for data massage (07:51)
  • awk for splitting output into columns (04:11)
  • xargs (for when pipes won’t do) (02:15)
  • …fun bonus-bonus video (04:13)

I am not getting anything for this, except for making sure that someone as lovely and dedicated as Remy may reach more people with his materials. So, take a peek.

The Mozilla BlogA Public-Private Partnership for Gigabit Innovation and Internet Health

Mozilla, the National Science Foundation and U.S. Ignite announce $300,000 in grants for gigabit internet projects in Eugene, OR and Lafayette, LA

 

By Chris Lawrence, VP, Leadership Network

At Mozilla, we believe in a networked approach — leveraging the power of diverse people, pooled expertise and shared values.

This was the approach we took nearly 15 years ago when we first launched Firefox. Our open-source browser was — and is — built by a global network of engineers, designers and open web advocates.

This is also the approach Mozilla takes when working toward its greater mission: keeping the internet healthy. We can’t build a healthy internet — one that cherishes freedom, openness and inclusion — alone. To keep the internet a global public resource, we need a network of individuals and organizations and institutions.

One such partnership is Mozilla’s ongoing collaboration with the National Science Foundation (NSF) and U.S. Ignite. We’re currently offering a $2 million prize for projects that decentralize the web. And together in 2014, we launched the Gigabit Community Fund. We committed to supporting promising projects in gigabit-enabled U.S. cities — projects that use connectivity 250-times normal speeds to make learning more engaging, equitable and impactful.

Today, we’re adding two new cities to the Gigabit Community Fund: Eugene, OR and Lafayette, LA.

 

Beginning in May 2017, we’re providing a total of $300,000 in grants to projects in both new cities. Applications for grants will open in early summer 2017; applicants can be individuals, nonprofits and for-profits.

We’ll support educators, technologists and community activists in Eugene and Lafayette who are building and beta-testing the emerging technologies that are shaping the web. We’ll fuel projects that leverage gigabit networks to make learning more inclusive and engaging through VR field trips, ultra-high definition classroom collaboration, and real-time cross-city robot battles. (These are all real examples from the existing Mozilla gigabit cities of Austin, Chattanooga and Kansas City.)

We’re also investing in the local communities on the ground in Eugene and Lafayette — and in the makers, technologists, and educators who are passionate about local innovation. Mozilla will bring its Mozilla Network approach to both cities, hosting local events and strengthening connections between individuals, schools, nonprofits, museums, and other organizations.

Video: Learn how the Mozilla Gigabit Community Fund supports innovative local projects across the U.S.

Why Eugene and Lafayette? Mozilla Gigabit Community Fund cities are selected based on a range of criteria, including a widely deployed high-speed fiber network; a developing conversation about digital literacy, access, and innovation; a critical mass of community anchor organizations, including arts and educational organizations; an evolving entrepreneurial community; and opportunities to engage K-12 school systems.

We’re excited to fuel innovation in the communities of Eugene and Lafayette  — and to continue our networked approach with NSF, U.S. Ignite and others, in service of a healthier internet.

 

The post A Public-Private Partnership for Gigabit Innovation and Internet Health appeared first on The Mozilla Blog.

This Week In RustThis Week in Rust 173

Hello and welcome to another issue of This Week in Rust! Rust is a systems language pursuing the trifecta: safety, concurrency, and speed. This is a weekly summary of its progress and community. Want something mentioned? Tweet us at @ThisWeekInRust or send us a pull request. Want to get involved? We love contributions.

This Week in Rust is openly developed on GitHub. If you find any errors in this week's issue, please submit a PR.

Updates from Rust Community

News & Blog Posts

Crate of the Week

This week's crate of the week is µtest, a testing framework for embedded software. Thanks to nasa42 for the suggestion.

Submit your suggestions and votes for next week!

Call for Participation

Always wanted to contribute to open-source projects but didn't know where to start? Every week we highlight some tasks from the Rust community for you to pick and get started!

Some of these tasks may also have mentors available, visit the task page for more information.

If you are a Rust project owner and are looking for contributors, please submit tasks here.

Updates from Rust Core

142 pull requests were merged in the last week.

New Contributors

  • CrazyMerlyn
  • Fabjan Sukalia
  • Gibson Fahnestock
  • Joel Gallant
  • Jonas Bushart
  • Joshua Horwitz
  • madseagames
  • Paul Daniel Faria
  • Petr Hosek
  • Tobias Schottdorf

Approved RFCs

Changes to Rust follow the Rust RFC (request for comments) process. These are the RFCs that were approved for implementation this week:

Final Comment Period

Every week the team announces the 'final comment period' for RFCs and key PRs which are reaching a decision. Express your opinions now. This week's FCPs are:

New RFCs

Style RFCs

Style RFCs are part of the process for deciding on style guidelines for the Rust community and defaults for Rustfmt. The process is similar to the RFC process, but we try to reach rough consensus on issues (including a final comment period) before progressing to PRs. Just like the RFC process, all users are welcome to comment and submit RFCs. If you want to help decide what Rust code should look like, come get involved!

PRs in final comment period:

Issues in final comment period:

Other significant issues:

Upcoming Events

If you are running a Rust event please add it to the calendar to get it mentioned here. Email the Rust Community Team for access.

Rust Jobs

Tweet us at @ThisWeekInRust to get your job offers listed here!

Quote of the Week

In #rustlang, None is always an Option<_>.

llogiq on Twitter.

Thanks to Johan Sigfrids for the suggestion.

Submit your quotes for next week!

This Week in Rust is edited by: nasa42, llogiq, and brson.

Mitchell BakerThe “Worldview” of Mozilla

There is a set of topics that are important to Mozilla and to what we stand for in the world — healthy communities, global communities, multiculturalism, diversity, tolerance, inclusion, empathy, collaboration, technology for shared good and social benefit.  I spoke about them at the Mozilla All Hands in December; if you want to (re)listen to the talk, you can find it here.  The sections where I talk about these things are at the beginning, and also starting at about the 14:30 minute mark.

These topics are a key aspect of Mozilla’s worldview.  However, we have not set them out officially as part of who we are, what we stand for and how we describe ourselves publicly.   I’m feeling a deep need to do so.

My goal is to develop a small set of principles about these aspects of Mozilla’s worldview. We already have clear principles on topics such as security and free and open source software (principles 4 and 7 of the Manifesto). Similarly clear principles on topics such as global communities and multiculturalism will serve us well as we go forward. They will also give us guidance as to the scope and public voice of Mozilla, spanning official communications from Mozilla to the unofficial ways each of us describes Mozilla.

Currently, I’m working on a first draft of the principles.  We are working quickly, as quickly as we can have rich discussions and community-wide participation. If you would like to be involved and can potentially spend some hours reviewing and providing input please sign up here. Jascha and Jane are supporting me in managing this important project.  
I’ll provide updates as we go forward.  

Chris CooperShameless self (release) promotion: Firefox 53.0b1 from TaskCluster

You may recall two short months ago when we moved Linux and Android nightlies from buildbot to TaskCluster. Due to the train model, this put us (release engineering) on a clock: either we’d be ready to release a beta version of Firefox 53 for Linux and Android using release promotion in TaskCluster, or we’d need to hold back our work for at least the next cycle, causing uplift headaches galore.

I’m happy to report that we were able to successfully release Firefox 53.0b1 for Linux and Android from TaskCluster last week. This is impressive for 3 reasons:

  1. Mac and Windows builds were still promoted from buildbot, so we were able to seamlessly integrate the artifacts of two different continuous integration (CI) platforms.
  2. The process whereby nightly builds are generated has always been different from how we generate release builds. Firefox 53.0b1 represents the first time a beta build was generated using the same taskgraph we use for a nightly, thereby reducing the delta between CI builds and release builds. More work to be done here, for sure.
  3. Nobody noticed. With all the changes under the hood, this may be the most impressive achievement of all.

A round of thanks to Aki, Johan, Kim, and Mihai who worked hard to get the pieces in place for Android, and a special shout-out to Rail who handled the Linux beta while also dealing with the uplift requirements for ESR52. Of course, thanks to everyone else who has helped with the migration thus far. All of that foundational work is starting to pay off.

Much more to do, but I look forward to updating you about Mac and Windows progress soon.

Air MozillaMozilla Weekly Project Meeting, 13 Mar 2017

Mozilla Weekly Project Meeting The Monday Project Meeting

Mozilla Addons BlogWebExtensions in Firefox 54

Firefox 54 landed in Developer Edition this week, so we have another update on WebExtensions for you. In addition to new APIs to help more developers port over to WebExtensions, we also announced a new Office Hours Support schedule where developers can get more personalized help with the transition.

New APIs

A new API for creating sidebars was implemented. This allows you to place a local HTML file inside the sidebar. The API is similar to the one in Opera. If you specify the sidebar_action manifest key, Firefox will create a sidebar:
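
For reference, a minimal manifest entry might look something like this (the keys come from the WebExtensions documentation; the file names are illustrative):

  "sidebar_action": {
    "default_title": "My sidebar",
    "default_panel": "sidebar.html",
    "default_icon": "sidebar_icon.png"
  }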

To allow keyboard commands to be sent to the sidebar, a new _execute_sidebar_action was added to the commands API which allows you to trigger the showing of the sidebar.

The ability to override about:newtab with pages inside your extension was added to the chrome_url_overrides field in the manifest. Check out the example that uses the topSites API to show the top sites you visit.

The privacy API gives you the ability to flip certain Firefox preferences related to privacy. Although the preferences in Chrome and Firefox aren’t a direct mapping, we’ve mapped the Firefox preferences that make sense to the APIs. Currently implemented are: networkPredictionEnabled, webRTCIPHandlingPolicy and hyperlinkAuditingEnabled.

The protocol_handler API lets you easily map protocols to actions in your extension. For example: we use irccloud at Mozilla, so we can map ircs:// links to irccloud by adding this into an extension:

  "protocol_handlers": [
    {
      "protocol": "ircs",
      "name": "IRC Mozilla Extension",
      "uriTemplate": "https://irccloud.mozilla.com/#!/%s"
    }
  ]

When a user clicks on an IRC link, it shows the application selector with the IRC Mozilla Extension visible:

This release also marks the landing of the first sets of devtools APIs. Quite a few APIs landed including: inspectedWindow.reload(), inspectedWindow.eval(), inspectedWindow.tabId, network.onNavigated, and panels.create().

Here’s an example of the Redux DevTools extension running on Firefox:

Backwards incompatible changes

The webRequest API will now require that you’ve requested the appropriate hosts’ permissions before allowing you to perform webRequest operations on a URL. This will be a backwards-incompatible change for any extension which used webRequest but did not request the host permission.
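
For instance, an extension that observes requests to a particular site now needs a matching host permission in its manifest alongside webRequest (an illustrative fragment; the host pattern is hypothetical):

  "permissions": [
    "webRequest",
    "*://*.example.com/*"
  ]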

Deletes in storage.sync are now encrypted. This would be a breaking change for any extensions using storage.sync on Developer Edition.

API Changes

Some key events were completed in some popular APIs:

  • webRequest.onBeforeRequest is initiated before a server-side redirect is about to occur, and webRequest.onAuthRequired is fired when an authentication failure occurs. These allow you to catch authentication requests from servers, such as proxy authentication.
  • webNavigation.onCreatedNavigationTarget event has been completed. This is fired when a new window or tab is created to be navigated to.
  • runtime.onMessageExternal event has been implemented. This is fired when a message is sent from another extension.

Other notable bugs include:

Android

Notably in Firefox 54, basic tabs API support was landed for Android. The API support focuses on the parts of the API that make sense for Android, so some tab methods and events are deliberately not implemented.

This is an important API in its own right, but other key APIs did not have good support without this. By landing this, Android WebExtensions got much better webNavigation and webRequest support. This gives us a clear path to getting ad blockers, the most common extension type on Android.

Contributors

A big thank you to our contributors Rob Wu, Tomislav Jovanovic and Tushar Saini who helped out with this release.

The post WebExtensions in Firefox 54 appeared first on Mozilla Add-ons Blog.

Tim Guan-tin ChienBen’s Story of Firefox OS

Like my good old colleague Ben Francis, I too have a lot to say about Firefox OS.

It’s been a little over a year since my team and I moved away from Firefox OS and the ill-fated Connected Devices group. Over the course of the last year, each time I thought about the Firefox OS experience, I arrived at different conclusions and complicated, sometimes emotional, beliefs. I didn’t feel I was ready to take a snapshot of these thoughts and publish them permanently, so I didn’t write about it here. Frankly, I don’t even think I am ready now.

In his post, Ben has pointed out many of the turning points of the project. I agree with many of his portrayals, most importantly the loss of direction between being a product (which must ship fast and deliver whatever partners/consumers want and are used to) and a research project (which involves engineering endeavors that answer the questions asked in the original announcement). I, however, have not figured out what could have been done instead (which Ben proposed nicely in his post).

Lastly, a final point: to you, this might as well be another story in the volatile tech industry, but to me, I felt the human cost enormously whenever a change was announced during the “slow death” of Firefox OS.

People move on and recover, including me (and I fortunately was nowhere near being hit the hardest). I can only extend my best wishes to those I fought the good fight with.

Gregory Szorcfrom __past__ import bytes_literals

Last year, I simultaneously committed one of the ugliest and most impressive hacks of my programming career. I haven't had time to write about it. Until now.

In summary, the hack is a source-transforming module loader for Python. It can be used by Python 3 to import a Python 2 source file while translating certain primitives to their Python 3 equivalents. It is kind of like 2to3 except it executes at run-time during import. The main goal of the hack was to facilitate porting Mercurial to Python 3 while deferring having to make the most invasive - and therefore most annoying - elements of the port in the canonical source code representation.

For the technically curious, it works as follows.

The hg Python executable registers a custom meta path finder instance. This entity is invoked during import statements to try to find the module being imported. It tells a later phase of the import mechanism how to load that module from wherever it is (usually a .py or .pyc file on disk) to a Python module object. The custom finder only responds to requests for modules known to be managed by the Mercurial project. For these modules, it tells the next stage of the import mechanism to invoke a custom SourceLoader instance. Here's where the real magic is: when the custom loader is invoked, it tokenizes the Python source code using the tokenize module, iterates over the token stream, finds specific patterns, and rewrites them to something more appropriate. It then untokenizes back to Python source code and falls back to the built-in loader, which does the heavy lifting of compiling the source to Python code objects. So, we have Python 2 source files on disk that magically get transformed to be Python 3 compatible when they are loaded by Python 3. Oh, and there is no performance penalty for the token transformation on subsequent loads because the transformed bytecode is cached in the .pyc file (using a custom header so we know it was transformed and can be invalidated when the transformation logic changes).

At the time I wrote it, the token stream manipulation converted most string literals ('') to bytes literals (b''). In other words, it restored the Python 2 behavior of string literals being bytes and not unicode. We jokingly call it from __past__ import bytes_literals (a play on Python 2's from __future__ import unicode_literals special syntax which changes string literals from Python 2's str/bytes type to unicode to match Python 3's behavior).
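
To give a flavor of the mechanics, here is a minimal sketch of such a source-transforming importer using Python 3's importlib. This is not Mercurial's actual code (among other things, the real loader also uses a custom .pyc header for cache invalidation, which is omitted here), and the 'mypkg' module prefix is hypothetical:

import importlib.abc
import importlib.machinery
import io
import sys
import token
import tokenize

def rewrite_source(source_bytes):
    # Prefix unprefixed string literals with b'' using the tokenize module.
    # Literals that already carry a prefix (b'', r'', u'', ...) are left alone.
    tokens = []
    for tok in tokenize.tokenize(io.BytesIO(source_bytes).readline):
        if tok.type == token.STRING and tok.string[0] in '\'"':
            tok = tok._replace(string='b' + tok.string)
        tokens.append(tok)
    # untokenize() returns bytes because the stream starts with an ENCODING token.
    return tokenize.untokenize(tokens)

class RewritingLoader(importlib.machinery.SourceFileLoader):
    def source_to_code(self, data, path, *, _optimize=-1):
        # Transform the raw source, then let the built-in machinery compile
        # it to a code object (which gets cached as a .pyc as usual).
        return compile(rewrite_source(data), path, 'exec',
                       dont_inherit=True, optimize=_optimize)

class RewritingFinder(importlib.abc.MetaPathFinder):
    def __init__(self, prefix):
        self.prefix = prefix

    def find_spec(self, fullname, path, target=None):
        if not fullname.startswith(self.prefix):
            return None  # only respond to requests for modules we manage
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is not None and isinstance(spec.loader, importlib.machinery.SourceFileLoader):
            spec.loader = RewritingLoader(spec.loader.name, spec.loader.path)
        return spec

sys.meta_path.insert(0, RewritingFinder('mypkg'))  # 'mypkg' is a hypothetical prefix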

Since I implemented the first version, others have implemented:

As one can expect, when I tweeted a link to this commit, many Python developers (including a few CPython core developers) expressed a mix of intrigue and horror. But mostly horror.

I fully concede that what I did here is a gross hack. And, it is the intention of the Mercurial project to undo this hack and perform a proper port once Python 3 support in Mercurial is more mature. But, I want to lay out my defense on why I did this and why the Mercurial project is tolerant of this ugly hack.

Individuals within the Mercurial project have wanted to port to Python 3 for years. Until recently, it hasn't been a project priority because a port was too much work for too little end-user gain. And, on the technical front, a port was just not practical until Python 3.5. (Two main blockers were no u'' literals - restored in Python 3.3 - and no % formatting for b'' literals - restored in 3.5. And as I understand it, senior members of the Mercurial project had to lobby Python maintainers pretty hard to get features like % formatting of b'' literals restored to Python 3.)

Anyway, after a number of failed attempts to initiate the Python 3 port over the years, the Mercurial project started making some positive steps towards Python 3 compatibility, such as switching to absolute imports and addressing syntax issues that allowed modules to be parsed into an AST and even compiled and loadable. These may seem like small steps, but for a larger project, it was a lot of work.

The porting effort hit a large wall when it came time to actually make the AST-valid Python code run on Python 3. Specifically, we had a strings problem.

When you write software that exchanges data between machines - sometimes machines running different operating systems or having different encodings - and there is an expectation that things work the same and data roundtrips accordingly, trying to force text encodings is essentially impossible and inevitably breaks something or someone. It is much easier for Mercurial to operate bytes first and only take text encoding into consideration when absolutely necessary (such as when emitting bytes to the terminal in the wanted encoding or when emitting JSON). That's not to say Mercurial ignores the existence of encodings. Far from it: Mercurial does attempt to normalize some data to Unicode. But it often does so with a special Python type that internally stores the raw byte sequence of the source so that a consumer can choose to operate at the bytes or Unicode level.

Anyway, this means that practically every string variable in Mercurial is a bytes type (or something that acts like a bytes type). And since string literals in Python 3 are the str type (which represents Unicode), that would mean having to prefix almost every '' string literal in Mercurial with b'' in order to placate Python 3. Having to update every occurrence of simple primitives that could be statically transformed automatically felt like busy work. We wanted to spend time on the meaningful parts of the Python 3 port so we could find interesting problems and challenges, not toil with mechanical conversions that add little to no short-term value while simultaneously increasing cognitive dissonance and quite possibly increasing the odds of introducing a bug in Python 2. In other words, why should humans do the work that machines can do for us? Thus, the source-transforming module importer was born.

While I concede what Mercurial did is a giant hack, I maintain it was the correct thing to do. It has allowed the Python 3 port to move forward without being blocked on the more tedious and invasive transformations that could introduce subtle bugs (including performance regressions) in Python 2. Perfect is the enemy of good. People time is valuable. The source-transforming module importer allowed us to unblock an important project without sinking a lot of people time into it. I'd make that trade-off again.

While I won't encourage others to take this approach to porting to Python 3, if you want to, Mercurial's source is available under a GPL license and the custom module importer could be adapted to any project with minimal modifications. If someone does extract it as reusable code, please leave a comment and I'll update the post to link to it.

The Servo BlogThese Weeks In Servo 94

In the last two weeks, we landed 185 PRs in the Servo organization’s repositories.

Planning and Status

Our overall roadmap is available online, including the overall plans for 2017 and Q1. Please check it out and provide feedback!

This week’s status updates are here.

Notable Additions

  • samgiles and rabisg added the Origin header to fetch requests.
  • hiikezoe made CSS animations be processed by Stylo.
  • Manishearth supported SVG presentation attributes in Stylo.
  • ferjm allowed redirects to occur after a CORS preflight fetch request.
  • sendilkumarn corrected the behaviour of loading user scripts in iframes.
  • pcwalton improved the performance of layout queries and requestAnimationFrame.
  • emilio removed unnecessary heap allocation for some CSS parsers.
  • mephisto41 implemented gradient border support in WebRender.
  • jdm avoided some panics triggered by image elements initiating multiple requests.
  • MortimerGoro improved the Android integration and lifecycle hooks.
  • nox removed the last uses of serde_codegen.
  • ferjm avoided a deadlock triggered by the Document.elementsFromPoint API.
  • fitzgen improved the rust-bindgen support for complex template parameter usages.
  • gw implemented page zoom support in WebRender.
  • dpyro added support for the nosniff algorithm in the Fetch implementation.
  • KiChjang implemented the :lang pseudoclass.

New Contributors

Interested in helping build a web browser? Take a look at our curated list of issues that are good for new contributors!

Mike HommeyWhen the memory allocator works against you

Cloning mozilla-central with git-cinnabar requires a lot of memory. Actually too much memory to fit in a 32-bit address space.

I hadn’t optimized for memory use in the first place. For instance, git-cinnabar keeps sha-1s in memory as hex values (40 bytes) rather than raw values (20 bytes). When I wrote the initial prototype, it didn’t matter that much, and while close(ish) to the tipping point, it didn’t require more than 2GB of memory at the time.

Time passed, and mozilla-central grew. I suspect the recent addition of several thousand commits and files has made things worse.

In order to come up with a plan to make things better (short or longer term), I needed data. So I added some basic memory resource tracking, and collected data while cloning mozilla-central.

I must admit, I was not ready for what I witnessed. Follow me for a tale of frustrations (plural).

I was expecting things to have gotten worse on the master branch (which I used for the data collection) because I am in the middle of some refactoring and did many changes that I suspected might have affected memory usage. I wasn’t, however, expecting to see the clone command using 10GB(!) of memory at peak usage across all processes.

(Note, those memory sizes are RSS, minus “shared”)

It also was taking an unexpectedly long time, but then, I hadn’t cloned a large repository like mozilla-central from scratch in a while, so I wasn’t sure if it was just related to its recent growth in size or otherwise. So I collected data on 0.4.0 as well.

Less time spent, less memory usage… ok. There’s definitely something wrong on master. But wait a minute, that slope from ~2GB to ~4GB on the git-remote-hg process doesn’t actually make any kind of sense. I mean, I’d understand it if it were starting and finishing with the “Import manifest” phase, but it starts in the middle of it, and ends long before it finishes. WTH?

First things first, since RSS can be a variety of things, I checked /proc/$pid/smaps and confirmed that most of it was, indeed, the heap.
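
For example, something along these lines can tally that (a rough sketch, not the actual script I used):

# Sum the Rss of the [heap] mappings from /proc/$pid/smaps. Note that
# glibc arenas and python arenas can also live in anonymous mmap()ed
# regions, so this only accounts for the brk()-based heap.
def heap_rss_kb(pid):
    total = 0
    in_heap = False
    with open('/proc/%d/smaps' % pid) as f:
        for line in f:
            fields = line.split()
            if '-' in fields[0]:  # a mapping header line, e.g. "55e3...-55e3... rw-p ..."
                in_heap = line.rstrip().endswith('[heap]')
            elif in_heap and line.startswith('Rss:'):
                total += int(fields[1])  # value is in kB
    return total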

That’s the point where you reach for Google, type something like “python memory profile” and find various tools. One from the results that I remembered having used in the past is guppy’s heapy.

Armed with pdb, I broke execution in the middle of the slope, and tried to get memory stats with heapy. SIGSEGV. Ouch.

Let’s try something else. I reached out to objgraph and pympler. SIGSEGV. Ouch again.

Tried working around the crashes for a while (too long a while, retrospectively; hindsight is 20/20), and was somehow successful at avoiding them by peeking at a smaller set of objects. But whatever I did, despite being attached to a process that had 2.6GB RSS, I wasn’t able to find more than 1.3GB of data. This wasn’t adding up.

It surely didn’t help that getting to that point took close to an hour each time. Retrospectively, I wish I had investigated using something like Checkpoint/Restore in Userspace.

Anyways, after a while, I decided that I really wanted to try to see the whole picture, not smaller peaks here and there that might be missing something. So I resolved myself to look at the SIGSEGV I was getting when using pympler, collecting a core dump when it happened.

Guess what? The Debian python-dbg package does not contain the debug symbols for the python package. The core dump was useless.

Since I was expecting I’d have to fix something in python, I just downloaded its source and built it. Ran the command again, waited, and finally got a backtrace. First Google hit for the crashing function? The exact (unfixed) crash reported on the python bug tracker. No patch.

Crashing code is doing:

((f)->f_builtins != (f)->f_tstate->interp->builtins)

And (f)->f_tstate is NULL. Classic NULL deref.

Added a guard (assessing it wouldn’t break anything). Ran the command again. Waited. Again. SIGSEGV.

Facedesk. Another crash on the same line. Did I really use the patched python? Yes. But this time (f)->f_tstate->interp is NULL. Sigh.

Same player, shoot again.

Finally, no crash… but still stuck on only 1.3GB accounted for. Ok, I know not all python memory profiling tools are entirely reliable, let’s try heapy again. SIGSEGV. Sigh. No debug info on the heapy module, where the crash happens. Sigh. Rebuild the module with debug info, try again. The backtrace looks like heapy is recursing a lot. Look at %rsp, compare with the address space from /proc/$pid/maps. Confirmed. A stack overflow. Let’s do ugly things and increase the stack size in brutal ways.

Woohoo! Now heapy tells me there’s even less memory used than the 1.3GB I found so far. Like, half less. Yeah, right.

I’m not clear on how I got there, but that’s when I found gdb-heap, a tool from Red Hat’s David Malcolm, and the associated talk “Dude, where’s my RAM?” A deep dive into how Python uses memory (slides).

With a gdb attached, I would finally be able to rip python’s guts out and find where all the memory went. Or so I thought. The gdb-heap tool only found about 600MB. About as much as heapy did, for that matter, but it could be coincidental. Oh. Kay.

I don’t remember exactly what went through my mind then, but, since I was attached to a running process with gdb, I typed the following on the gdb prompt:

gdb> call malloc_stats()

And that’s when the truth was finally unveiled: the memory allocator was just acting up the whole time. The output was something like:

Arena 0:
system bytes    =  some number above (but close to) 2GB
in use bytes    =  some number above (but close to) 600MB

Yes, the glibc allocator was telling me it had allocated 600MB of memory, but was holding onto 2GB. I must have found a really bad allocation pattern that causes massive fragmentation.

One thing that David Malcolm’s talk taught me, though, is that python uses its own allocator for small sizes, so the glibc allocator doesn’t know about them. And, roughly, adding the difference between RSS and what glibc said it was holding onto to the “in use bytes” it reported matches the 1.3GB I had found so far: 2.6GB RSS minus the ~2GB glibc got from the system leaves ~600MB of python-managed small allocations, which, added to the ~600MB glibc reported in use, comes to about 1.3GB.

So it was time to see how those things evolved in time, during the entire clone process. I grabbed some new data, tracking the evolution of “system bytes” and “in use bytes”.

There are two things of note on this data:

  • There is a relatively large gap between what the glibc allocator says it has gotten from the system, and the RSS (minus “shared”) size, that I’m expecting corresponds to the small allocations that python handles itself.
  • Actual memory use is going down during the “Import manifests” phase, contrary to what the evolution of RSS suggests.

In fact, the latter is exactly how git-cinnabar is supposed to work: It reads changeset and manifest chunks, and holds onto them while importing files. Then it throws away those manifest and changeset chunks one by one as it imports them. There is, however, some extra bookkeeping that requires some additional memory, but it’s expected to be less memory consuming than keeping all the changeset and manifest chunks in memory.

At this point, I thought a possible explanation is that since both python and glibc are mmap()ing their own arenas, they might be intertwined in a way that makes things not go well with the allocation pattern happening during the “Import manifest” phase (which, in fact, allocates and frees increasingly large buffers for each manifest, as manifests grow in size in the mozilla-central history).

To put the theory at work, I patched the python interpreter again, making it use malloc() instead of mmap() for its arenas.

“Aha!” I thought. That definitely looks much better. Less gap between what glibc says it requested from the system and the RSS size. And, more importantly, no runaway increase of memory usage in the middle of nowhere.

I was preparing myself to write a post about how mixing allocators could have unintended consequences. As a comparison point, I went ahead and ran another test, with the python allocator entirely disabled, this time.

Heh. It turns out glibc was acting up all alone. So much for my (plausible) theory. (I still think mixing allocators can have unintended consequences.)

(Note, however, that the reason why the python allocator exists is valid: without it, the overall clone took almost 10 more minutes)

And since I had been getting all this data with 0.4.0, I gathered new data without the python allocator with the master branch.

This paints a rather different picture than the original data on that branch, with much less memory use regression than one would think. In fact, there isn’t much difference, except for the spike at the end, which got worse, and some of the noise during the “Import manifests” phase that got bigger, implying larger amounts of temporary memory used. The latter may contribute to the allocation patterns that throw glibc’s memory allocator off.

It turns out tracking memory usage in python 2.7 is rather painful, and not all the tools paint a complete picture of it. I hear python 3.x is somewhat better in that regard, and I hope it’s true, but at the moment, I’m stuck with 2.7. The most reliable tool I’ve used here, it turns out, is pympler. Or rebuilding the python interpreter without its allocator, and asking the system allocator what is allocated.

With all this data, I now have some defined problems to tackle, some easy (the spike at the end of the clone), and some less easy (working around glibc allocator’s behavior). I have a few hunches as to what kind of allocations are causing the runaway increase of RSS. Coincidentally, I’m half-way through a refactor of the code dealing with manifests, and it should help dealing with the issue.

But that will be the subject of a subsequent post.

Eric RahmAre they slim yet, round 2

A year later let’s see how Firefox fares on Windows, Linux, and OSX with multiple content processes enabled.

Results

Graph comparing memory usage, chrome is still quite high

We can see that Firefox with four content processes fares better than Chrome on all platforms which is reassuring; Chrome is still about 2X worse on Windows and Linux. Our current plan is to only move up to four content processes, so this is great news.

Two content processes are still better than IE; with four we’re a bit worse. This is pretty impressive given that last year we were in the same position with one content process.

Surprisingly, on Mac, Firefox with two content processes is better than Safari. Compared with last year, when we used 2X the memory with just one process, we’re now on par with four content processes.

I included Firefox with eight content processes to keep us honest. As you can see we actually do pretty well, but I don’t think it’s realistic to ship with that many nor do we currently plan to. We already have or are adding additional processes such as the plugin process for Flash and the GPU process. These need to be taken into consideration when choosing how many content processes to enable and pushing to eight doesn’t give us much breathing room. Making sure we have measurements now is important; it’s good to know where we can improve.

Overall I feel solid about these numbers, especially considering where we were just a year ago. This bodes well for the e10s-multi project.

Test setup

This is the same setup as last year. I load the first 30 pages of the tp5 page set (a snapshot of Alexa top 100 websites from a few years ago), each in its own tab, with 10 seconds in between loads and 60 seconds of settle time at the end.

Note: There was a minor change to the setup to give each page a unique domain. At least Safari and Chrome are roughly doing process per domain, so just using different ports on localhost was not enough. A simple solution was to modify my /etc/hosts file to add localhost-<1-30> aliases.

Methodology

Measuring multiprocess browser memory usage is tricky. I’ve settled on a somewhat simple formula of:

total_memory = sum_uss(content processes) + sum_rss(parent processes); 

Where a parent process is defined as anything that is not a content process (I’ll explain in a moment). Historically there was just one parent process that managed all other processes; this is still somewhat the case, but each browser has other executables it may run in addition to content processes. A content process has a slightly different definition per browser, but is generally “where the pages are loaded” — this is an oversimplification, but it’s good enough for now.

My definitions:

  • Firefox: content = firefox processes launched with the -contentproc command line; example “parent” = firefox without the -contentproc command line, plugin-process (used for Flash), etc.
  • Chrome: content = chrome processes launched with the --type command line; example “parent” = chrome without the --type command line, nacl_helper, etc.
  • Safari: content = WebContent processes; example “parent” = Safari, SafariServices, SafariHistory, Webkit.Networking, etc.
  • IE: content = iexplore.exe processes launched with the /prefetch command line; example “parent” = iexplore without the /prefetch command line.
  • Edge: content = MicrosoftEdgeCP.exe processes; example “parent” = MicrosoftEdge.exe, etc.

For Firefox this is a reasonable and fair measurement; for other browsers we might be undercounting memory by a bit. For example Edge has a parent executable, MicrosoftEdge.exe, and a different content executable, MicrosoftEdgeCP.exe; arguably we should measure the RSS of one of the MicrosoftEdgeCP.exe processes, and USS for the rest, so we’re probably undercounting. On the other hand we might end up overcounting if the parent and content processes are sharing dynamic libraries. In future measurements I may tweak how we sum the memory, but for now I’d rather possibly undercount than worry about being unfair to other browsers.
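
To make the formula concrete, here is a hypothetical sketch of such a measurement using psutil, shown for the Firefox case (this is not my actual harness, and the process matching is simplified):

import psutil

def total_firefox_memory():
    # USS for content processes, RSS for everything else, per the formula above.
    total = 0
    for p in psutil.process_iter(['name', 'cmdline']):
        try:
            if not (p.info['name'] or '').startswith('firefox'):
                continue
            if '-contentproc' in (p.info['cmdline'] or []):
                total += p.memory_full_info().uss  # content process
            else:
                total += p.memory_info().rss       # "parent" process
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return total

print('%.0f MB' % (total_firefox_memory() / (1024.0 * 1024.0)))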

Raw numbers

Ubuntu 16.04 LTS:

  • Chrome 54 (see note): 1,478 MB
  • Firefox 55 – 2 CP: 765 MB
  • Firefox 55 – 4 CP: 817 MB
  • Firefox 55 – 8 CP: 990 MB

macOS 10.12.3:

  • Chrome 59: 1,365 MB
  • Firefox 55 – 2 CP: 1,113 MB
  • Firefox 55 – 4 CP: 1,215 MB
  • Firefox 55 – 8 CP: 1,399 MB
  • Safari 10.2 (see note): 1,203 MB

Windows 10:

  • Chrome 59: 1,382 MB
  • Edge (see note): N/A
  • Firefox 55 – 2 CP: 587 MB
  • Firefox 55 – 4 CP: 839 MB
  • Firefox 55 – 8 CP: 905 MB
  • IE 11: 660 MB

Browser Version Notes

  • Chrome 54 — aka chrome-unstable — was used on Ubuntu 16.04 LTS as that’s the latest branded version available (rather than Chromium)
  • Firefox Nightly 55 – 2 CP is Firefox with 2 content processes and one parent process, the default configuration for Nightly.
  • Firefox Nightly 55 – 4 CP is Firefox with 4 content processes and one parent process, this is a longer term goal.
  • Firefox Nightly 55 – 8 CP is Firefox with 8 content processes and one parent process, this is aspirational, a good sanity check.
  • Safari Technology Preview 10.2 release 25 was used on macOS as that’s the latest branded version available (rather than Webkit nightly)
  • Edge was disqualified because it seemed to bypass the hosts file and wouldn’t load pages from unique domains. I can do measurements so I might revisit this, but it wouldn’t have been a fair comparison as-is.

Gervase MarkhamFirefox Secure Travel Addon

In these troubled times, business travellers occasionally have to cross borders where the border guards have significant powers to seize your electronic devices, and even compel you to unlock them or provide passwords. You have the difficult choice between refusing, and perhaps not getting into the country, or complying, and having sensitive data put at risk.

It is possible to avoid storing confidential data on your device if it’s all in the cloud, but then your browser is logged into (or has stored passwords for) various important systems which have lots of sensitive data, so anyone who has access to your machine has access to that data. And simply deleting all these passwords and cookies is a) a pain, and b) hard to recover from.

What might be very cool is a Firefox Secure Travel addon where you press a “Travelling Now” button and it:

  • Disconnects you from Sync
  • Deletes all cookies for a defined list of domains
  • Deletes all stored passwords for the same defined list of domains

Then when you arrive, you can log back in to Sync and get your passwords back (assuming it doesn’t propagate the deletions!), and log back in to the services.

I guess the border authorities can always ask for your Sync password but there’s a good chance they might not think to do that. A super-paranoid version of the above would also:

  • Generate a random password
  • Submit it securely to a company-run web service
  • On receiving acknowledgement of receipt, change your Sync password to
    the random password

Then, on arrival, you just need to call your IT department (who would ID you e.g. by voice or in person) to get the random password from them, and you are up and running. In the mean time, your data is genuinely out of your reach. You can unlock your device and tell them any passwords you know, and they won’t get your data.

Worth doing?

Mozilla Addons BlogImprovements to add-on review communications

We recently made some improvements to our tools and processes to better support communication between add-on developers and reviewers.

Previously, when you submitted an add-on to addons.mozilla.org (AMO) and a reviewer emailed you, your replies went to a mailing list (amo-editors AT mozilla DOT org) where a few reviewers (mostly admins) handled every response. This approach had some flaws—it put the burden on very few people to reply, who had to first get familiar with the add-on code and previous review actions. Further replies from either party went to the mailing list only, rather than being fed back into the review tools on AMO. These flaws slowed things down unnecessarily, and contributed to information clutter.

Now, add-on developers can choose to reply to a review by email—like they’re used to—or from the Manage Status & Versions page of the add-on in the developer hub. Replies are picked up by AMO and displayed in the review history for reviewers and developers. In addition, everyone  involved in the review of the particular version will be notified by email. Admin reviewers will make sure all inquiries are followed up with.

This long-anticipated feature will not only make follow-ups for reviews more efficient for both developers and reviewers, it also makes upcoming reviews easier by having all information in the same place.

The mailing list (amo-editors AT mozilla DOT org) will be discontinued shortly, so we ask all developers to use this system instead. For other questions not related to a particular review, please send a message to amo-admins AT mozilla DOT org.

The Add-on Review team would like to thank Andrew Williamson for implementing this new feature and our QA team for testing it!

The post Improvements to add-on review communications appeared first on Mozilla Add-ons Blog.

Daniel GlazmanOS X TouchBar

Currently adding OS X TouchBar support to XUL :-) So far so good. Should be trivial to add multiple touchbars to a given XUL window when it's done.

Karl Dubost[worklog] Edition 058 - Rain, sign of spring

webcompat life

webcompat issues

webcompat.com dev

  • By introducing HTTP caching for our HTML resources, I also introduced an issue. The bug report says that people can't log in or log out. In fact they can; the issue is what the page looks like: it doesn't display that the user is logged in. That's expected: I initially put a max-age of one day in addition to the etag value. In browser speak, that means that if the browser has a cached copy, it will not even try a conditional request with If-None-Match. Because the login process hits the same resource, users receive the previously cached copy and don't see that they have actually logged in. A force reload solves that. So instead of a max-age I could solve this with no-cache. Unfortunately, browsers don't respect the HTTP semantics of no-cache and treat it as a no-store. Back to square one. must-revalidate, max-age=0 solves it and forces the conditional request (see the sketch after this list).
  • Opened quick issue about our current twitter link based on the UX research currently done.
  • Discussions about issues with dependencies and how do we handle them?
  • Discussing about the Contact page friendliness
  • We currently have some outreachy participants. Time for review and help: review
  • Our issue titles are… pretty poor for now.
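
The caching fix mentioned in the first item looks roughly like this (a sketch assuming a Flask-style handler like webcompat.com's; the route and template are hypothetical):

from flask import Flask, make_response, render_template

app = Flask(__name__)

@app.route('/')
def index():
    response = make_response(render_template('index.html'))
    # Keep the etag-based cache, but force a conditional request on every
    # load so a fresh logged-in page is fetched when the etag has changed:
    response.headers['Cache-Control'] = 'must-revalidate, max-age=0'
    return response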

Otsukare!

Hub FiguièreSo, I got a Dell

Long. Overdue. Upgrade.

I bought a Dell XPS13 as my new portable workstation for Linux and GNOME. This is the model 9360 that is currently available as a Developer Edition with Ubuntu 16.04 LTS (project Sputnik, for those who follow). It satisfies everything I was looking for in a laptop: lightweight, small 13", 16 GB of RAM (at least), core i7 CPU (this is a Kaby Lake), and it must run Linux well.

But I didn't buy the Developer Edition. Price-wise the Developer Edition is CAD$150 less than the Windows version in the "business" line and CAD$50 less than the Windows version in the "home" line (which only has that version). Exact same hardware. Last week, it was on sale, CAD$500 off the regular price, so it didn't matter. I got one. I had delayed getting one for so long, this was the opportunity for a bargain. I double-checked, and unlike the previous Skylake-based model that didn't have the same wifi card, this one is really the same thing.

I got a surprise doorbell ring from the delivery person (the website didn't tell me it was en route).

Unboxing

The black box

It came in a box that contained a cardboard wrap and a nice black box. The cardboard wrap contained the power brick and the AC cord. I'm genuinely surprised by the power adapter. It is small, smaller than what I'm used to. It is just odd that it doesn't come in the same box as the laptop (not a separate shipping box, just boxes in a box, shipped to you at once). The nice black box only bears the Dell logo and contains the laptop and two small booklets. One is for the warranty, the other is a getting started guide. Interestingly, it mentions Ubuntu as well, which leads me to think that it is the same for both preloads. This doesn't really matter in the end, but it just shows the level of refinement for a high-end laptop, which, until the last Apple refresh, was still more expensive than the Apple equivalent.

New laptop...

Fiddling with the BIOS

It took me more time to download the Fedora live ISO and flash it to the USB stick than the actual setup of the machine took, minus some fiddling. Once booted, Fedora 25 was installed in 20 minutes. I did wipe the internal SSD, BTW.

Once I figured out it was F2 I had to press to get into the BIOS upon boot, to set it to boot off the USB drive, I also had to configure the disk controller to AHCI instead of RAID, as without that the Fedora installer didn't find the internal storage. Note that this might be more troublesome if you want dual boot, but I didn't. Also, I don't know what the default setup is in the Developer Edition.

Dell XPS13 BIOS configuration

Nonetheless, the Fedora installer worked well with mostly defaults. HiDPI is managed, which means that I finally have a laptop with a "Retina Display".

Hands on

The laptop is small, with a HiDPI screen, a decent trackpad that works out of the box with two-finger scrolling, an aluminium shell with a non-glowing logo in the middle, a black inside with the keyboard, and a glass touch screen. The backlit keyboard has a row of function keys at the top, like it should be, a row that doubles as "media" buttons with the Fn key, or actually without it. Much like on a MacBook. The only difference I will have to get used to is that the Control key is actually in the corner. Like it used to be... (not sure if that's the case on all Dells, but I remember hating not having Control in the corner, then getting used to it like there was no option, and that was more than a decade ago). That will make for some finger reeducation, that's for sure. The whole laptop is a good size, very compact. Light but solid.

As for connectivity, there is an SD card reader, 2 USB 3.0 ports (A type) and a USB 3.1 Type-C port that carries HDMI and Thunderbolt. HDMI looks like it needs a CAD$30 adapter, but it seems to be standard so that might not be a huge problem.

Conclusion

I am happy with it. GNOME is beautiful in HiDPI, and it is all smooth.

Mozilla Addons BlogOffice Hours Support for Transitioning and Porting to WebExtensions

To help facilitate more mutual support among developers migrating and porting to WebExtensions, we asked the add-on developer community to sign up for blocks of time when they can be available to assist each other. This week, we published the schedule, which shows you the days and hours (in your time zone) when people are available to answer questions in IRC and the add-on forum. Each volunteer helper has indicated their specialties, so you can find the people who are most likely able to help you.

If you’d like to get help asynchronously, you can join and email the webextensions-support [at] mozilla [dot] org mailing list, where more people are on hand to answer questions.

If you have any knowledge in or expertise with add-ons, please sign up to help! Just go to the etherpad and add your IRC handle, times you’re available, and your specialties, and we’ll add you to the schedule. Or, join the mailing list to help out at any time.

The post Office Hours Support for Transitioning and Porting to WebExtensions appeared first on Mozilla Add-ons Blog.

Air MozillaEqual Ratings Conference Demo Day Presentations 3.09.17

Equal Ratings Conference Demo Day Presentations 3.09.17 We Believe in Equal Rating Mozilla seeks to make the full range of the Internet's extraordinary power and innovative potential available to all. We advocate...

Air MozillaDenise Graveline on Graceful ways with Q & A

Denise Graveline on Graceful ways with Q & A Some speakers love Q & A. Others dread it. No matter which group you are in, this session will share tips for how to plan...

Air MozillaEqual Ratings Conference Judges' Panel Discussion 3.09.17

Equal Ratings Conference Judges' Panel Discussion 3.09.17 We Believe in Equal Rating Mozilla seeks to make the full range of the Internet's extraordinary power and innovative potential available to all. We advocate...

Air MozillaReps Weekly Meeting Mar. 09, 2017

Reps Weekly Meeting Mar. 09, 2017 This is a weekly call with some of the Reps to discuss all matters about/affecting Reps and invite Reps to share their work with everyone.

Air MozillaEqual Ratings Conference AM Session 3.09.17

Equal Ratings Conference AM Session 3.09.17 We Believe in Equal Rating Mozilla seeks to make the full range of the Internet's extraordinary power and innovative potential available to all. We advocate...

Gervase MarkhamMozilla seeks Internet Policy Manager

“Are you passionate about the Open Web? Do you want to help protect the Internet’s tremendous socioeconomic benefits through policy and advocacy?”

Mozilla is looking for an Internet Policy Manager, to work with the wonderful Raegan in the EU. If you know anyone who might be suitable, please encourage them to apply :-)

Emma IrwinHelp Review Mozilla’s Community Participation Guidelines 2.0 (Draft)

CC BY-SA 3.0 Nick Youngson


TL;DR We need you! There are 3 ways to give feedback on the draft of Mozilla’s Community Participation Guidelines. The draft is available in both English and Spanish (and yes, more languages will be included in future!).

1. By joining our #Mozillians Telegram Group or on Twitter tomorrow, Thursday  March 9th at 7 AM PST (your time).
2. Directly on Discourse – English or Spanish.
3. Via the Activate Campaign (which comes with recognition of completion) – English or Spanish

Longer version!

Larissa Shapiro, Mozilla's D&I lead, and her team have been working very hard on a 2.0 version of our Community Participation Guidelines; you can find the current version online here.

Why are we revisiting and improving our guidelines?

Diversity is the mix of people, and Inclusion is getting the mix to work well together.


The Community Participation Guidelines draft is intended to support a diverse and inclusive Mozilla by laying out expected behavior, consequences and reporting. Please help us make this resource one that enriches our experiences, making Mozilla an empowered, safe and rewarding place to be.

There are 3 ways to give feedback (and hear from others)

  1. By joining our #Mozillians Telegram Group or on Twitter tomorrow, Thursday  March 9th at 7 AM PST (your time).
  2. Directly on Discourse – English or Spanish.
  3. Via the Activate Campaign (which comes with recognition of completion) – English or Spanish



James LongTalking Open Source on Changelog

After writing my post "Why I'm Frequently Absent from Open Source", the Changelog podcast invited me to come talk more about it. It was great to dive into open source and talk about some problems it brings. If you are a maintainer of a project, I think you'll connect with a lot of what is said here.

We talk about the burden that open source places on individuals who do it in their spare time, and the guilt that it brings, and try to find some answers.

The Changelog 242: The Burden of Open Source with James Long – Listen on Changelog.com

Will Kahn-GreeneBleach v2.0 released!

What is it?

Bleach is a Python library for sanitizing and linkifying text from untrusted sources for safe usage in HTML.

Bleach v2.0 released!

Bleach 2.0 is a massive rewrite. Bleach relies on the html5lib library. html5lib 0.99999999 (8 9s) changed the APIs that Bleach was using to sanitize text. As such, in order to support html5lib >= 0.99999999 (8 9s), I needed to rewrite Bleach.

Before embarking on the rewrite, I improved the tests and added a set of tests based on XSS example strings from the OWASP site. Spending quality time with tests before a rewrite or refactor is both illuminating (you get a better understanding of what the requirements are) and also immensely helpful (you know when your rewrite/refactor differs from the original). That was time well spent.

Given that I was doing a rewrite anyways, I decided to take this opportunity to break the Bleach API to make it more flexible and easier to use:

  • added Cleaner and Linkifier classes that you can create once and reuse to reduce redundant work--suggested in #125 (see the sketch after this list)
  • created BleachSanitizerFilter which is now an html5lib filter that can be used anywhere you can use an html5lib filter
  • created LinkifyFilter as an html5lib filter that can be used anywhere you use an html5lib filter including as part of cleaning allowing you to clean and linkify in one pass--suggested in #46
  • changed arguments for attribute callables and linkify callbacks
  • and so on
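
For instance, the reusable classes look like this (a minimal sketch against the 2.0 API):

from bleach.sanitizer import Cleaner
from bleach.linkifier import Linker

# Build these once and reuse them for every fragment to avoid redundant setup.
cleaner = Cleaner(tags=['a', 'b', 'em', 'i'], strip=True)
linker = Linker()

for text in ['<b>hi</b> <script>alert(1)</script>', 'see http://example.com']:
    print(linker.linkify(cleaner.clean(text)))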

During and after the rewrite, I improved the documentation converting all the examples to doctest format so they're testable and verifiable and adding examples where there weren't any. This uncovered bugs in the documentation and pointed out some annoyances with the new API.

As I rewrote and refactored code, I focused on making the code simpler and easier to maintain going forward and also documented the intentions so I and others can know what the code should be doing.

I also adjusted the internals to make it easier for users to extend, subclass, swap out and whatever else to adjust the functionality to meet their needs without making Bleach harder to maintain for me or less safe because of additional complexity.

For API-adjustment inspiration, I went through the Bleach issue tracker and tried to address every possible issue with this update: infinite loops, unintended behavior, inflexible APIs, suggested refactorings, features, bugs, etc.

The rewrite took a while. I tried to be meticulous because this is a security library and it's a complicated problem domain and I was working on my own during slow times on work projects. When working on one's own, you don't have benefit of review. Making sure to have good test coverage and waiting a day to self-review after posting a PR caught a lot of issues. I also go through the PR and add comments explaining why I did things to give context to future me. Those habits help a lot, but probably aren't as good as a code review by someone else.

Some stats

OMG! This blog post is so boring! Walls of text everywhere so far!

There were 61 commits between v1.5 and v2.0:

  • Vadim Kotov: 1
  • Alexandr N. Zamaraev: 2
  • me: 58

I closed out 22 issues--possibly some more.

The rewrite has the following git diff --shortstat:

64 files changed, 2330 insertions(+), 1128 deletions(-)

Lines of code for Bleach 1.5:

~/mozilla/bleach> cloc bleach/ tests/
      11 text files.
      11 unique files.
       0 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.07 s (152.4 files/s, 25287.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          11            353            200           1272
-------------------------------------------------------------------------------
SUM:                            11            353            200           1272
-------------------------------------------------------------------------------
~/mozilla/bleach>

Lines of code for Bleach 2.0:

~/mozilla/bleach> cloc bleach/ tests/
      49 text files.
      49 unique files.
      36 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.13 s (101.7 files/s, 20128.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          13            545            406           1621
-------------------------------------------------------------------------------
SUM:                            13            545            406           1621
-------------------------------------------------------------------------------
~/mozilla/bleach>

Some off-the-cuff performance benchmarks

I ran some timings between Bleach 1.5 and various uses of Bleach 2.0 on the Standup corpus.

Here's the results:

Time to clean and linkify:

  • Bleach 1.5: 1m33s
  • Bleach 2.0 (no code changes): 41s
  • Bleach 2.0 (using Cleaner and Linker): 10s
  • Bleach 2.0 (clean and linkify in one pass): 7s

How'd I compute the timings?

  1. I'm using the Standup corpus which has 42000 status messages in it. Each status message is like a tweet--it's short, has some links, possibly has HTML in it, etc.
  2. I wrote a timing harness that goes through all those status messages and times how long it takes to clean and linkify the status message content, accumulates those timings and then returns the total time spent cleaning and linking.
  3. I ran that 10 times and took the median. The timing numbers were remarkably stable and there was only a few seconds difference between the high and low for all of the sets.
  4. I wrote the median number down in that table above.
  5. Then I'd adjust the code as specified in the table and run the timings again. (The harness looked roughly like the sketch below.)
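
In code, the harness looked roughly like this (my reconstruction, not the original; the corpus loader is hypothetical):

import statistics
import time

import bleach

def time_corpus(messages):
    total = 0.0
    for text in messages:
        start = time.time()
        bleach.linkify(bleach.clean(text))
        total += time.time() - start
    return total

messages = load_standup_corpus()  # hypothetical: returns the 42000 status messages
print('median: %.1fs' % statistics.median(time_corpus(messages) for _ in range(10)))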

I have several observations/thoughts:

First, holy moly--1m33s to 7s is a HUGE performance improvement.

Second, just switching from Bleach 1.5 to 2.0 and making no code changes (in other words, keeping your calls as bleach.clean and bleach.linkify rather than using Cleaner and Linker and LinkifyFilter), gets you a lot. Depending on whether you have attribute filter callables and linkify callbacks, you may be able to just upgrade the libs and WIN!

Third, switching to reusing Cleaner and Linker also gets you a lot.

Fourth, your mileage may vary depending on the nature of your corpus. For example, Standup status messages are short so if your text fragments are larger, you may see more savings by clean-and-linkify in one pass because HTML parsing takes more time.

How to upgrade

Upgrading should be straight-forward.

Here's the minimal upgrade path:

  1. Update Bleach to 2.0 and html5lib to >= 0.99999999 (8 9s).
  2. If you're using attribute callables, you'll need to update them.
  3. If you're using linkify callbacks, you'll need to update them.
  4. Read through version 2.0 changes for any other backwards-incompatible changes that might affect you.
  5. Run your tests and see how it goes.

Note

If you're using html5lib 1.0b8, then you have to explicitly upgrade the version. 1.0b8 is equivalent to html5lib 0.9999999 (7 9s) and that's not supported by Bleach 2.0.

You have to explicitly upgrade because pip will think that 1.0b8 comes after 0.99999999 (8 9s) and it doesn't. So it won't upgrade html5lib for you.

If you're doing 9s, make sure to upgrade to 0.99999999 (8 9s) or higher.

If you're doing 1.0bs, make sure to upgrade to 1.0b9 or higher.

If you want better performance:

  1. Switch to reusing bleach.sanitizer.Cleaner and bleach.linkifier.Linker.

If you have large text fragments:

  1. Switch to reusing bleach.sanitizer.Cleaner and set filters to include LinkifyFilter, which lets you clean and linkify in one step (sketched below).
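
That one-pass variant looks like this (a sketch following the 2.0 documentation):

from bleach.linkifier import LinkifyFilter
from bleach.sanitizer import Cleaner

# Clean and linkify in a single parse of the HTML.
cleaner = Cleaner(filters=[LinkifyFilter])
print(cleaner.clean('a link: http://example.com'))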

Many thanks

Many thanks to James Socol (previous maintainer) for walking me through why things were the way they were.

Many thanks to Geoffrey Sneddon (html5lib maintainer) for answering questions, helping with problems I encountered and all his efforts on html5lib which is a huge library that he works on in his spare time for which he doesn't get anywhere near enough gratitude.

Many thanks to Lonnen (my manager) who heard me talk about html5lib zero point nine nine nine nine nine nine nine nine a bunch.

Also, many thanks to Mozilla for letting me work on this during slow periods of the projects I should be working on. A bunch of Mozilla sites use Bleach, but none of mine do.

Where to go for more

For more specifics on this release, see here: https://bleach.readthedocs.io/en/latest/changes.html#version-2-0-march-8th-2017

Documentation and quickstart here: https://bleach.readthedocs.org/en/v2.0/

Source code and issue tracker here: https://github.com/mozilla/bleach

Air MozillaThe Joy of Coding - Episode 94

The Joy of Coding - Episode 94 mconley livehacks on real Firefox bugs while thinking aloud.

Myk MelezPositron Discontinued

After some consideration, I’ve decided to discontinue development of Positron.

Positron was an experimental runtime for creating desktop apps using web technologies. It was based on Firefox, and its principal feature was that it was Electron-compatible. I started working on it—in conjunction with several colleagues—to enable Tofino to run on Gecko.

But Tofino is dead (long live the Browser Futures Group!), and Electron compatibility isn’t essential for a viable Gecko runtime. It’s also hard, since Electron has a large API surface area, is a moving target, requires Node.js integration (itself a moving target), and is designed for Chromium’s process architecture, which is substantially different from Firefox’s.

I’ve previously written about the utility of desktop runtimes (among other embedding projects). I still think they’re valuable for a variety of use cases, and Gecko can provide unique value to desktop application development. I’ll continue to pursue the realization of that value. I just won’t do it through Positron.

Chris H-CFirefox on Windows XP: End of the Line

With the release of Firefox 52 to all users worldwide, we now have the final Windows XP-supported Firefox release out the door.

This isn’t to say that support is done. As I’ve mentioned before, Windows XP users will be transitioned to the ESR update channel where they’ll continue to receive security updates for the next year or so.

And I don’t expect this to be the end of me having to blog about weird clients that are inexplicably on Windows XP.

However, this does take care of one of the longest-standing data questions I’ve looked at on this blog and in my career at Mozilla. So I feel that it’s worth taking a moment to mark the occasion.

Windows XP is dead. Long live Windows XP.

:chutten


Air Mozilla: Weekly SUMO Community Meeting Mar. 08, 2017

This is the weekly SUMO community call.

Gregory Szorc: Better Compression with Zstandard

I think I first heard about the Zstandard compression algorithm at a Mercurial developer sprint in 2015. At one end of a large table a few people were uttering expletives out of sheer excitement. At developer gatherings, that's the universal signal for something is awesome. Long story short, a Facebook engineer shared a link to the RealTime Data Compression blog operated by Yann Collet (then known as the author of LZ4 - a compression algorithm known for its insane speeds) and people were completely nerding out over the excellent articles and the data within showing the beginnings of a new general purpose lossless compression algorithm named Zstandard. It promised better-than-deflate/zlib compression ratios and performance on both compression and decompression. This being a Mercurial meeting, many of us were intrigued because zlib is used by Mercurial for various functionality (including on-disk storage and compression over the wire protocol) and zlib operations frequently appear as performance hot spots.

Before I continue, if you are interested in low-level performance and software optimization, I highly recommend perusing the RealTime Data Compression blog. There are some absolute nuggets of info in there.

Anyway, over the months, the news about Zstandard (zstd) kept getting better and more promising. As the 1.0 release neared, the Facebook engineers I interact with (Yann Collet - Zstandard's author - is now employed by Facebook) were absolutely ecstatic about Zstandard and its potential. I was toying around with pre-release versions and was absolutely blown away by the performance and features. I believed the hype.

Zstandard 1.0 was released on August 31, 2016. A few days later, I started the python-zstandard project to provide a fully-featured and Pythonic interface to the underlying zstd C API while not sacrificing safety or performance. The ulterior motive was to leverage those bindings in Mercurial so Zstandard could be a first class citizen in Mercurial, possibly replacing zlib as the default compression algorithm for all operations.

Fast forward six months and I've achieved many of those goals. python-zstandard has a nearly complete interface to the zstd C API. It even exposes some primitives not in the C API, such as batch compression operations that leverage multiple threads and use minimal memory allocations to facilitate insanely fast execution. (Expect a dedicated post on python-zstandard from me soon.)

Mercurial 4.1 ships with the python-zstandard bindings. Two Mercurial 4.1 peers talking to each other will exchange Zstandard compressed data instead of zlib. For a Firefox repository clone, transfer size is reduced from ~1184 MB (zlib level 6) to ~1052 MB (zstd level 3) in the default Mercurial configuration while using ~60% of the CPU that zlib required on the compressor end. When cloning from hg.mozilla.org, the pre-generated zstd clone bundle hosted on a CDN using maximum compression is ~707 MB - ~60% the size of zlib! And, work is ongoing for Mercurial to support Zstandard for on-disk storage, which should bring considerable performance wins over zlib for local operations.

I've learned a lot working on python-zstandard and integrating Zstandard into Mercurial. My primary takeaway is Zstandard is awesome.

In this post, I'm going to extol the virtues of Zstandard and provide reasons why I think you should use it.

Why Zstandard

The main objective of lossless compression is to spend one resource (CPU) so that you may reduce another (I/O). This trade-off is usually made because data - either at rest in storage or in motion over a network or even through a machine via software and memory - is a limiting factor for performance. So if compression is needed for your use case to mitigate I/O being the limiting resource and you can swap in a different compression algorithm that magically reduces both CPU and I/O requirements, that's pretty exciting. At scale, better and more efficient compression can translate to substantial cost savings in infrastructure. It can also lead to improved application performance, translating to better end-user engagement, sales, productivity, etc. This is why companies like Facebook (Zstandard), Google (brotli, snappy, zopfli), and Pied Piper (middle-out) invest in compression.

Today, the most widely used compression algorithm in the world is likely DEFLATE. And, software most often interacts with DEFLATE via what is likely the most widely used software library in the world, zlib.

Being at least 27 years old, DEFLATE is getting a bit long in the tooth. Computers are completely different today than they were in 1990. The Pentium microprocessor debuted in 1993. If memory serves (pun intended), it used PC66 DRAM, which had a transfer rate of 533 MB/s. For comparison, a modern NVMe M.2 SSD (like the Samsung 960 PRO) can read at 3000+ MB/s and write at 2000+ MB/s. In other words, persistent storage today is faster than the RAM from the era when DEFLATE was invented. And of course CPU and network speeds have increased as well. We also have completely different instruction sets on CPUs for well-designed algorithms and software to take advantage of. What I'm trying to say is the market is ripe for DEFLATE and zlib to be dethroned by algorithms and software that take into account the realities of modern computers.

(For the remainder of this post I'll use zlib as a stand-in for DEFLATE because it is simpler.)

Zstandard initially piqued my interest by promising better-than-zlib compression and performance in both the compression and decompression directions. That's impressive. But it isn't unique. Brotli achieves the same, for example. But what kept my attention was Zstandard's rich feature set, tuning abilities, and therefore versatility.

In the sections below, I'll describe some of the benefits of Zstandard in more detail.

Before I do, I need to throw in an obligatory disclaimer about data and numbers that I use. Benchmarking is hard. Benchmarks should not be trusted. There are so many variables that can influence performance and benchmarks. (A recent example that surprised me is the CPU frequency/power ramping properties of Xeon versus non-Xeon Intel CPUs. tl;dr a Xeon won't hit max CPU frequency if only a core or two is busy, meaning that any single or low-threaded benchmark is likely misleading on Xeons unless you change power settings to mitigate its conservative power ramping defaults. And if you change power settings, does that reflect real-life usage?)

Reporting useful and accurate performance numbers for compression is hard because there are so many variables to care about. For example:

  • Every corpus is different. Text, JSON, C++, photos, numerical data, etc all exhibit different properties when fed into compression and could cause compression ratios or speeds to vary significantly.
  • Few large inputs versus many smaller inputs (some algorithms work better on large inputs; some libraries have high per-operation overhead).
  • Memory allocation and use strategy. Performance can vary significantly depending on how a compression library allocates, manages, and uses memory. This can be an implementation specific detail as opposed to a core property of the compression algorithm.

Since Mercurial is the driver for my work in Zstandard, the data and numbers I report in this post are mostly Mercurial data. Specifically, I'll be referring to data in the mozilla-unified Firefox repository. This repository contains over 300,000 commits spanning almost 10 years. The data within is a good mix of text (mostly C++, JavaScript, Python, HTML, and CSS source code and other free-form text) and binary (like PNGs). The Mercurial layer adds some binary structures to e.g. represent metadata for deltas, diffs, and patching.

There are two Mercurial-specific pieces of data I will use. One is a Mercurial bundle. This is essentially a representation of all data in a repository. It stores a mix of raw, fulltext data and deltas on that data. For the mozilla-unified repo, an uncompressed bundle (produced via hg bundle -t none-v2 -a) is ~4457 MB. The other piece of data is revlog chunks. This is a mix of fulltext and delta data for a specific item tracked in version control. I frequently use the changelog corpus, which is the fulltext data describing changesets or commits to Firefox. The numbers quoted and used for charts in this post are available in a Google Sheet.

All performance data was obtained on an i7-6700K running Ubuntu 16.10 (Linux 4.8.0) with a mostly stock config. Benchmarks were performed in memory to mitigate storage I/O or filesystem interference. Memory used is DDR4-2133 with a cycle time of 35 clocks.

While I'm pretty positive about Zstandard, it isn't perfect. There are corpora for which Zstandard performs worse than other algorithms, even ones I compare it directly to in this post. So, your mileage may vary. Please enlighten me with your counterexamples by leaving a comment.

With that (rather large) disclaimer out of the way, let's talk about what makes Zstandard awesome.

Flexibility for Speed Versus Size Trade-offs

Compression algorithms typically contain parameters to control how much work to do. You can choose to spend more CPU to (hopefully) achieve better compression or you can spend less CPU to sacrifice compression. (OK, fine, there are other factors like memory usage at play too. I'm simplifying.) This is commonly exposed to end-users as a compression level. (In reality there are often multiple parameters that can be tuned. But I'll just use level as a stand-in to represent the concept.)
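
In python-zstandard terms, the level knob looks something like this (a minimal sketch; the module name follows the python-zstandard releases of this era, and the input file path is a hypothetical stand-in):

    import zstd  # python-zstandard

    with open('bundle.hg', 'rb') as fh:  # hypothetical input file
        data = fh.read()

    # One knob, wide range: level 1 favors speed, level 22 favors size.
    fast = zstd.ZstdCompressor(level=1).compress(data)
    small = zstd.ZstdCompressor(level=22).compress(data)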

But even with adjustable compression levels, the performance of many compression algorithms and libraries tend to fall within a relatively narrow window. In other words, many compression algorithms focus on niche markets. For example, LZ4 is super fast but doesn't yield great compression ratios. LZMA yields terrific compression ratios but is extremely slow.

This can be visualized in the following chart showing results when compressing a mozilla-unified Mercurial bundle:

bundle compression with common algorithms

This chart plots the logarithmic compression speed in megabytes per second against achieved compression ratio. The further right a data point is, the better the compression and the smaller the output. The higher up a point is, the faster compression is.

The ideal compression algorithm lives in the top right, which means it compresses well and is fast. But the powers of mathematics push compression algorithms away from the top right.

On to the observations.

LZ4 is highly vertical, which means its compression ratios are limited in variance but it is extremely flexible in speed. So for this data, you might as well stick to a lower compression level because higher values don't buy you much.

Bzip2 is the opposite: a horizontal line. That means it is consistently the same speed while yielding different compression ratios. In other words, you might as well crank bzip2 up to maximum compression because it doesn't have a significant adverse impact on speed.

LZMA and zlib are more interesting because they exhibit more variance in both the compression ratio and speed dimensions. But let's be frank, they are still pretty narrow. LZMA looks pretty good from a shape perspective, but its top speed is just too slow - only ~26 MB/s!

This small window of flexibility means that you often have to choose a compression algorithm based on the speed versus size trade-off you are willing to make at that time. That choice often gets baked into software. And as time passes and your software or data gains popularity, changing the software to swap in or support a new compression algorithm becomes harder because of the cost and disruption it will cause. That's technical debt.

What we really want is a single compression algorithm that occupies lots of space in both dimensions of our chart - a curve that has high variance in both compression speed and ratio. Such an algorithm would allow you to make an easy decision choosing a compression algorithm without locking you into a narrow behavior profile. It would allow you to make a completely different size versus speed trade-off in the future by only adjusting a config knob or two in your application - no swapping of compression algorithms needed!

As you can guess, Zstandard fulfills this role. This can clearly be seen in the following chart (which also adds brotli for comparison).

bundle compression with modern algorithms

The advantages of Zstandard (and brotli) are obvious. Zstandard's compression speeds go from ~338 MB/s at level 1 to ~2.6 MB/s at level 22 while covering compression ratios from 3.72 to 6.05. On one end, zstd level 1 is ~3.4x faster than zlib level 1 while achieving better compression than zlib level 9! That fastest speed is only 2x slower than LZ4 level 1. On the other end of the spectrum, zstd level 22 runs ~1 MB/s slower than LZMA at level 9 and produces a file that is only 2.3% larger.

It's worth noting that zstd's C API exposes several knobs for tweaking the compression algorithm. Each compression level maps to a pre-defined set of values for these knobs. It is possible to set these values beyond the ranges exposed by the default compression levels 1 through 22. I've done some basic experimentation with this and have made compression even faster (while sacrificing ratio, of course). This covers the gap between Zstandard and brotli on this end of the tuning curve.

The wide span of compression speeds and ratios is a game changer for compression. Unless you have special requirements such as lightning fast operations (which LZ4 can provide) or special corpora that Zstandard can't handle well, Zstandard is a very safe and flexible choice for general purpose compression.

Multi-threaded Compression

Zstd 1.1.3 contains a multi-threaded compression API that allows a compression operation to leverage multiple threads. The output from this API is compatible with the Zstandard frame format and doesn't require any special handling on the decompression side. In other words, a compressor can switch to the multi-threaded API and decompressors won't care.

This is a big deal for a few reasons. First, today's advancements in computer processors tend to yield more capacity from more cores not from faster clocks and better cycle efficiency (although many cases do benefit greatly from modern instruction sets like AVX and therefore better cycle efficiency). Second, so many compression libraries are only single-threaded and require consumers to invent their own framing formats or storage models to facilitate multi-threading. (See Blosc for such a library.) Lack of a multi-threaded API in the compression library means trusting another piece of software or writing your own multi-threaded code.
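
In python-zstandard this surfaces as just another compressor parameter; a sketch, with the caveat that the threads argument is my assumption based on the releases I've used, and write_content_size is enabled here so the one-shot decompress can size its output buffer:

    import zstd

    # Compress with 4 worker threads; the output is a normal zstd frame.
    cctx = zstd.ZstdCompressor(level=3, threads=4, write_content_size=True)
    compressed = cctx.compress(data)  # data: bytes from the earlier sketch

    # Any decompressor handles it; no special handling required.
    original = zstd.ZstdDecompressor().decompress(compressed)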

The following chart adds a plot of Zstandard multi-threaded compression with 4 threads.

multi-threaded compression

The existing curve for Zstandard basically shifted straight up. Nice!

The ~338 MB/s speed for single-threaded compression on zstd level 1 increases to ~1,376 MB/s with 4 threads. That's ~4.06x faster. And, it is ~2.26x faster than the previous fastest entry, LZ4 at level 1! The output size only increased by ~4 MB or ~0.3% over single-threaded compression.

The scaling properties for multi-threaded compression on this input are terrific: all 4 cores are saturated and the output size barely changed.

Because Zstandard's multi-threaded compression API produces data compatible with any Zstandard decompressor, it can logically be considered an extension of compression levels. This means that the already extremely flexible speed vs ratio curve becomes even wider in the speed axis. Zstandard was already a justifiable choice with its extreme versatility. But when you throw in native multi-threaded compression API support, the flexibility for tuning compression performance is just absurd. With enough cores, you are likely to run into I/O limits long before you exhaust the CPU, at which point you can crank up the compression level and sacrifice as much CPU as you are willing to burn. That's a good position to be in.

Decompression Speed

Compression speed and ratios only tell half the story about a compression algorithm. Except for archiving scenarios where you write once and read rarely, you probably care about decompression performance.

Popular compression algorithms like zlib and bzip2 have less than stellar decompression speeds. On my i7-6700K, zlib decompression can deliver decompressed output at 200+ MB/s for many data sets. However, on the input/compressed end, it frequently fails to reach 100 MB/s or even 80 MB/s. This is significant because if your application is reading data over a 1 Gbps network or from a local disk (modern SSDs can read at several hundred MB/s or more), then your application has a CPU bottleneck at decoding the data - and that's before you actually do anything useful with the data in the application layer! (Remember: the idea behind compression is to spend CPU to mitigate an I/O bottleneck. So if compression makes you CPU bound, you've undermined the point of compression!) And if my Skylake CPU running at 4.0 GHz is CPU - not I/O - bound, a Xeon in a data center will be even slower and even more CPU bound (Xeons tend to run at much lower clock speeds - the laws of thermodynamics require that in order to run more cores in the package). In short, if you are using zlib for high throughput scenarios, there's a good chance it is a bottleneck and slowing down your application.

We again measure the speed of algorithms using a Firefox Mercurial bundle. The following charts plot decompression speed versus ratio for this file. The first chart measures decompression speed on the input end of the decompressor. The second measures speed at the output end.

decompression input

decompression output

Zstandard matches its great compression speed with great decompression speed. Zstandard can deliver decompressed output at 1000+ MB/s while consuming input at 200-275 MB/s. Furthermore, decompression speed is mostly independent of the compression level. (Although higher compression levels require more memory in the decompressor.) So, if you want to throw more CPU at re-compression later so data at rest takes less space, you can do that without sacrificing read performance. I haven't done the math, but there is probably a break-even point where having dedicated machines re-compress terabytes or petabytes of data at rest offsets the cost of those machines through reduced storage spending.

While Zstandard is not as fast decompressing as LZ4 (which can consume compressed input at 500+ MB/s), its performance is often ~4x faster than zlib. On many CPUs, this puts it well above 1 Gbps, which is often desirable to avoid a bottleneck at the network layer.

It's also worth noting that while Zstandard and brotli were comparable on the compression half of this data, Zstandard has a clear advantage doing decompression.

Finally, you don't appear to pay a price for multi-threaded Zstandard compression on the decompression side (zstdmt in the chart).

Dictionary Support

The examples so far in this post have used a single 4,457 MB piece of input data to measure behavior. Large data can behave completely differently from small data. This is because so much of what compression algorithms do is find patterns that came before so incoming data can be referenced to old data instead of uniquely stored. And if data is small, there isn't much of it that came before to reference!

This is often why many small, independent chunks of input compress poorly compared to a single large chunk. This can be demonstrated by comparing the widely-used zip and tar archive formats. On the surface, both do the same thing: they are a container of files. But they employ compression at different phases. A zip file will zlib compress each entry independently. However, a tar file doesn't use compression internally. Instead, the tar file itself is fed into a compression algorithm and compressed as a whole.

We can observe the difference on real world data. Firefox ships with a file named omni.ja. Despite the weird extension, this is a zip file. The file contains most of the assets for non-compiled code used by Firefox. This includes the JavaScript, HTML, CSS, and images that power many parts of the Firefox frontend. The file weighs in at 9,783,749 bytes for the 64-bit Windows Firefox Nightly from 2017-03-06. (Or 9,965,793 bytes when using zip -9 - the code for generating omni.ja is smarter than zip and creates smaller files.) But a zlib level 9 compressed tar.gz file of that directory is 8,627,155 bytes. That 1,156KB / 13% size difference is significant when you are talking about delivering bits to end users! (In this case, the content within the archive needs to be individually addressable to facilitate fast access to any item without having to decompress the entire archive: this matters for performance.)

A more extreme example of the differences between zip and tar is the files in the Firefox source checkout. On revision a08ec245fa24 of the Firefox Mercurial repository, a zip file of all files in version control is 430,446,549 bytes versus 322,916,403 bytes for a tar.gz file (1,177,430,383 bytes uncompressed spanning 180,912 files). Using Zstandard, compressing each file discretely at compression level 3 yields 391,387,299 bytes of compressed data versus 294,926,418 as a single stream (without the tar container). Same compression algorithm. Different application method. Drastically different results. That's the impact of input size on compression performance.
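
A toy illustration of that effect with python-zstandard (chunks is a hypothetical list of many small byte strings):

    import zstd

    cctx = zstd.ZstdCompressor(level=3)

    # Each chunk compressed on its own: no shared history between chunks.
    discrete = sum(len(cctx.compress(c)) for c in chunks)

    # The same bytes as one stream: later data can reference earlier data.
    combined = len(cctx.compress(b''.join(chunks)))

    # Expect combined < discrete, often dramatically so.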

While the compression ratio and speed of a single large stream is often better than multiple smaller chunks, there are still use cases that either don't have enough data or prefer independent access to each piece of input (like Firefox's omni.ja file). So a robust compression algorithm should handle small inputs as well as it does large inputs.

Zstandard helps offset the inherent inefficiencies of small inputs by supporting dictionary compression. A dictionary is essentially data used to seed the compressor's state. If the compressor sees data that exists in the dictionary, it references the dictionary instead of storing new data in the compressed output stream. This results in smaller output sizes and better compression ratios. One drawback to this is the dictionary has to be used to decompress data, which means you need to figure out how to distribute the dictionary and ensure it remains in sync with all data producers and consumers. This isn't always trivial.

Dictionary compression only works if there is enough repeated data and patterns in the inputs that can be extracted to yield a useful dictionary. Examples of this include markup languages, source code, or pieces of similar data (such as JSON payloads from HTTP API requests or telemetry data), which often have many repeated keywords and patterns.

Dictionaries are typically produced by training them on existing data. Essentially, you feed a bunch of samples into an algorithm that spits out a meaningful and useful dictionary. The more coherency in the data that will be compressed, the better the dictionary and the better the compression ratios.
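
With python-zstandard, training and using a dictionary looks roughly like this (sample_paths is hypothetical; the 131,072-byte size matches the omni.ja experiment below):

    import zstd

    samples = [open(p, 'rb').read() for p in sample_paths]

    # Train a 128 KB dictionary from the sample corpus.
    d = zstd.train_dictionary(131072, samples)

    # Compress each input discretely, seeded with the dictionary.
    cctx = zstd.ZstdCompressor(level=12, dict_data=d)
    compressed = [cctx.compress(s) for s in samples]

    # The same dictionary is required on the decompression side.
    dctx = zstd.ZstdDecompressor(dict_data=d)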

Dictionaries can have a significant effect on compression ratios and speed.

Let's go back to Firefox's omni.ja file. Compressing each file discretely at zstd level 12 yields 9,177,410 bytes of data. But if we produce a 131,072 byte dictionary by training it on all files within omni.ja, the total size of each file compressed discretely is 7,942,886 bytes. Including the dictionary, the total size is 8,073,958 bytes, 1,103,452 bytes smaller than non-dictionary compression! (The zlib-based omni.ja is 9,783,749 bytes.) So Zstandard plus dictionary compression would likely yield a meaningful ~1.5 MB size reduction to the omni.ja file. This would make the Firefox distribution smaller and may improve startup time (since many files inside omni.ja are accessed at startup), which would make a number of people very happy. (Of course, Firefox doesn't yet contain the zstd C library. And adding it just for this use case may not make sense. But Firefox does ship with the brotli library, and brotli supports dictionary compression and has similar performance characteristics to Zstandard, so, uh, someone may want to look into transitioning omni.ja away from zlib.)

But the benefits of dictionary compression don't end at compression ratios: operations with dictionaries can be faster as well!

The following chart shows performance when compressing Mercurial changeset data (describes a Mercurial commit) for the Firefox repository. There are 382,530 discrete inputs spanning 221,429,458 bytes (mean: 579 bytes, median: 306 bytes). (Note: measurements were conducted in Python and therefore may introduce some overhead.)

dictionary compression performance

Aside from zstd level 3 dictionary compression, Zstandard is faster than zlib level 6 across the board (I suspect this one-off is an oddity with the zstd compression parameters at this level and this corpus because zstd level 4 is faster than level 3, which is weird).

It's also worth noting that non-dictionary Zstandard compression has similar compression ratios to zlib. Again, this demonstrates the intrinsic difficulties of compressing small inputs.

But the real takeaway from this data is the speed differences with dictionary compression enabled. Dictionary decompression is 2.2-2.4x faster than non-dictionary decompression. Already respectable ~240 MB/s decompression speed (measured at the output end) becomes ~530 MB/s. Zlib level 6 was ~140 MB/s, so swapping in dictionary compression makes things ~3.8x faster. It takes ~1.5s of CPU time to zlib decompress this corpus. So if Mercurial can be taught to use Zstandard dictionary compression for changelog data, certain operations on this corpus will complete ~1.1s faster. That's significant.

It's worth stating that Zstandard isn't the only compression algorithm or library to support dictionary compression. Brotli and zlib do as well, for example. But Zstandard's support for dictionary compression seems to be more polished than that of other libraries I've seen. It has multiple APIs for training dictionaries from sample data. (Brotli has none, nor does brotli's documentation say how to generate dictionaries, as far as I can tell.)

Dictionary compression is definitely an advanced feature, applicable only to certain use cases (lots of small, similar data). But there's no denying that if you can take advantage of dictionary compression, you may be rewarded with significant performance wins.

A Versatile C API

I spend a lot of my time these days in higher-level programming languages like Python and JavaScript. By the time you interact with compression in high-level languages, the low-level compression APIs provided by the compression library are most likely hidden from you and bundled in a nice, friendly abstraction, suitable for a higher-level language. And more often than not, many features of that low-level API are not exposed for you to call. So, you don't get an appreciation for how good (or bad) or feature rich (or lacking) the low-level API is.

As part of writing python-zstandard, I've spent a lot of time interfacing with the zstd C API. And, as part of evaluating other compression libraries for use in Mercurial, I've been looking at C APIs for other libraries and the Python bindings to them. A takeaway from this is an appreciation for the quality of zstd's C API.

Many compression library APIs are either too simple or too complex. Zstandard's is in the Goldilocks zone. Aside from a few minor missing features, its C API was more than adequate in its 1.0 release.

What I really appreciate about the zstd C API is that it provides high, medium, and low-level APIs. From the highest level, you throw it pointers to input and output buffers and it does an operation. From the medium level, you use a reusable context holding state and other parameters and it does an operation. From the low-level, you are calling multiple functions and shuffling bytes around, maintaining your own state and potentially bypassing the Zstandard framing format in the process. The different levels give you almost total control over everything. This is critical for performance optimization and when writing bindings for higher-level languages that may have different expectations on the behavior of software. The performance I've achieved in python-zstandard just isn't (easily) possible with other compression libraries because of their lacking API design.
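
python-zstandard mirrors some of that layering; a sketch under the assumption that these entry points behave as documented in the releases of this era:

    import io

    import zstd

    cctx = zstd.ZstdCompressor(level=3)  # reusable context holding state

    # Highest level: one-shot, buffer in, buffer out.
    frame = cctx.compress(b'some data')

    # Incremental: a zlib-style compressobj for feeding data piecemeal.
    cobj = cctx.compressobj()
    parts = cobj.compress(b'some ') + cobj.compress(b'data') + cobj.flush()

    # Streaming: pump one file object into another.
    src, dst = io.BytesIO(b'some data'), io.BytesIO()
    cctx.copy_stream(src, dst)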

Oftentimes when interacting with a C library I think, "if only there were a function to let me do X, my life would be much easier." I rarely have this experience with Zstandard. The C API is well thought out, has almost all the features I want/need, and is pretty easy to use. While most won't notice this difference, it should be a significant advantage for Zstandard in the long run, as more bindings are written and more people have a high-quality experience with it because the C API allows them to.

Zstandard Isn't Perfect

I've been pretty positive about Zstandard so far in this post. In fear of sounding like a fanboy who is so blinded by admiration that he can't see faults and because nothing is perfect, I need to point out some negatives about Zstandard. (Aside: put little faith in the words uttered by someone who can't find a fault in something they praise.)

First, the framing format is a bit heavyweight in some scenarios. The frame header is at least 6 bytes. For input of 256-65791 bytes, recording the original source size and its checksum will result in a 12 byte frame. Zlib, by contrast, is only 6 bytes for this scenario. When storing tens of thousands of compressed records (this is a use case in Mercurial), the frame overhead can matter and this can make it difficult for compressed Zstandard data to be as small as zlib for very small inputs. (It's worth noting that zlib doesn't store the decompressed size in its header. There are pros and cons to this, which I'll discuss in my eventual post about python-zstandard and how it achieves optimal performance.) If the frame overhead matters to you, the zstd C API does expose a block API that operates at a level below the framing format, allowing you to roll your own framing protocol. I also filed a GitHub issue to make the 4 byte magic number optional, which would go a long way to cutting down on frame overhead.
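
For a sense of how those optional frame fields trade bytes for convenience, python-zstandard exposes them as compressor flags (my understanding is that both default to off in this era, so the second frame below carries the extra header bytes):

    import zstd

    payload = b'tiny record'

    lean = zstd.ZstdCompressor(level=3).compress(payload)
    rich = zstd.ZstdCompressor(level=3,
                               write_content_size=True,  # embed original size
                               write_checksum=True).compress(payload)

    # rich is a few bytes larger per frame, which adds up across
    # tens of thousands of small compressed records.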

Second, the C API is not yet fully stabilized. There are a number of functions marked as experimental that aren't exported from the shared library and are only available via static linking. There's a ton of useful functionality in there, including low-level compression parameter adjustment, digested dictionaries (for reusing computed dictionaries across multiple contexts), and the multi-threaded compression API. python-zstandard makes heavy use of these experimental APIs. This requires bundling zstd with python-zstandard and statically linking with this known version because functionality could change at any time. This is a bit annoying, especially for distro packagers.

Third, the low-level compression parameters are under-documented. I think I understand what a lot of them do. But it isn't obvious when I should consider adjusting what. The default compression levels seem to work pretty well and map to reasonable compression parameters. But a few times I've noticed that tweaking things slightly can result in desirable improvements. I wish there were a guide of sorts to help you tune these parameters.

Fourth, dictionary compression is still a bit too complicated and hand-wavy for my liking. I can measure obvious benefits when using it largely out of the box with some corpora. But it isn't always a win and the cost for training dictionaries is too high to justify using it outside of scenarios where you are pretty sure it will be beneficial. When I do use it, I'm not sure which compression levels it works best with, how many samples need to be fed into the dictionary trainer, which training algorithm to use, etc. If that isn't enough, there is also the concept of content-only dictionaries where you use a fulltext as the dictionary. This can be useful for delta-encoding schemes (where compression effectively acts like a diff/delta generator instead of using something like Myers diff). If this topic interests you, there is a thread on the Mercurial developers list where Yann Collet and I discuss this.

Fifth, the patent rights grant. There is some wording in the PATENTS file in the Zstandard project that may... concern lawyers. While Zstandard is covered by the standard BSD 3-Clause license, that supplemental PATENTS file may scare some lawyers enough that you won't be able to use Zstandard. You may want to talk to a lawyer before using Zstandard, especially if you or your company likes initiating patent lawsuits against companies (or wishes to reserve that right - as many companies do), as that is the condition upon which the license terminates. Note that there is a long history between Facebook and consumers of its open source software regarding this language in the PATENTS file. Do a search for React patent grant to read more.

Sixth and finally, Zstandard is still relatively new. I can totally relate to holding off until something new and shiny proves itself. That being said, the Zstandard framing protocol has some escape hatches for future needs. And, the project proved during its pre-1.0 days that it knows how to handle backwards and future compatibility issues. And considering Facebook and others are using Zstandard in production, I wouldn't be too worried. I think the biggest risk is to people (like me) who are writing code against the experimental C APIs. But even then, the changes to the experimental APIs in the past several months have been minor. I'm not losing sleep over it.

That may seem like a long and concerning list. Most of the issues are relatively minor. The language in the PATENTS file may be a showstopper to some. From my perspective, the biggest thing Zstandard has going against it is its youth. But that will only improve with age. While I'm usually pretty conservative about adopting new technology (I've gotten burned enough times that I prefer the neophytes do the field testing for me), the upside to using Zstandard is potentially drastic performance and efficiency gains. And that can translate to success versus failure or millions of dollars in saved infrastructure costs and productivity gains. I'm willing to take my chances.

Conclusion

For the corpora I've thrown at it, Zstandard handily outperforms zlib in almost every dimension. And, it even manages to best other modern compression algorithms like brotli in many tests.

The underlying algorithm and techniques used by Zstandard are highly parameterized, lending themselves to a variety of use cases from embedded hardware to massive data crunching machines with hundreds of gigabytes of memory and dozens of CPU cores.

The C API is well-designed and facilitates high performance and adaptability to numerous use cases. It is batteries included, providing functions to train dictionaries and perform multi-threaded compression.

Zstandard is backed by Facebook and seems to have a healthy open source culture on GitHub. My interactions with Yann Collet have been positive and he seems to be a great project maintainer.

Zstandard is an exciting advancement for data compression and therefore for the entire computing field. As someone who has lived in the world of zlib for years, was a casual user of compression, and thought zlib was good enough for most use cases, I can attest that Zstandard is game changing. After being enlightened to all the advantages of Zstandard, I'll never casually use zlib again: it's just too slow and inflexible for the needs of modern computing. If you use compression, I highly recommend investigating Zstandard.

(I updated the post on 2017-03-08 to include a paragraph about the supplemental license in the PATENTS file.)

Mozilla Privacy Blog: Mozilla statement on CIA / WikiLeaks

Today, the organization WikiLeaks published a compendium of information alleged to be documents from the U.S. Central Intelligence Agency (CIA) pertaining to tools and techniques to compromise the security of mobile phones, computers, and internet-connected devices. We released the following statement on these reports:

If the information released in today's reports is accurate, then it proves the CIA is undermining the security of the internet – and so is Wikileaks. We've said before that cybersecurity is a shared responsibility, and this is true in this example, regarding the disclosure of security vulnerabilities. It appears that neither the CIA nor Wikileaks are living up to that standard – the CIA seems to be stockpiling vulnerabilities, and Wikileaks seems to be using that trove for shock value rather than coordinating disclosure to the affected companies to give them a chance to fix it and protect users.

The government may have legitimate intelligence or law enforcement reasons for delaying disclosure of vulnerabilities (for example, to enable lawful hacking), but these same vulnerabilities can endanger the security of billions of people. These two interests must be balanced, and recent incidents demonstrate just how easily stockpiling vulnerabilities can go awry without proper policies and procedures in place.

Once governments become aware of a security vulnerability, they have a responsibility to consider how and when (not whether) to disclose the vulnerability to the affected company so they can fix the problem and protect users.

We have been advocating for broader, open conversations about disclosure of security vulnerabilities and although today’s disclosures are jarring, we hope this raises awareness of the severity of these issues and the urgency of collaborating on reforms.


Ben Kero: The Dark Arts of SSH presentation materials

Here is the presentation material for my talk entitled The Dark Arts of SSH. Please note this is a single HTML rendering that includes presenter's notes.