Planet Release Engineering

October 14, 2014

Jordan Lund (jlund)

This week in Releng - Oct 5th, 2014

Major highlights:

Completed work (resolution is 'FIXED'):

In progress work (unresolved and not assigned to nobody):

October 14, 2014 04:36 AM

October 07, 2014

Ben Hearsum (bhearsum)

Redo 1.3 is released – now with more natural syntax!

We’ve been using the functions packaged in Redo for a few years now at Mozilla. One of the things we’ve been striving for with it is the ability to write the most natural code possible. In its simplest form you call retry, passing a callable that may raise, the exceptions to retry on, and a callable to run for cleanup before another attempt. As a result, we have a number of code blocks like this, which don’t feel very Pythonic:

retry(self.session.request, sleeptime=5, max_sleeptime=15,
      retry_exceptions=(requests.HTTPError,
                        requests.ConnectionError),
      kwargs=dict(method=method, url=url, data=data,
                  config=self.config, timeout=self.timeout,
                  auth=self.auth, params=params))

It’s particularly unfortunate that you’re forced to let retry do your exception handling and cleanup – I find that it makes the code a lot less readable. It’s also not possible to do anything in a finally block, unless you wrap the retry in one.

Recently, Chris AtLee discovered a new method of doing retries that results in much cleaner and more readable code. With it, the above block can be rewritten as:

for attempt in retrier(attempts=self.retries):
    try:
        self.session.request(method=method, url=url, data=data,
                             config=self.config, timeout=self.timeout,
                             auth=self.auth, params=params)
        break
    except (requests.HTTPError, requests.ConnectionError), e:
        pass
else:
    raise e

retrier simply handles the mechanics of tracking attempts and sleeping between them, leaving your code to do all of its own exception handling and cleanup – just as if you weren’t retrying at all. Note that the break at the end of the try block is essential; without it, self.session.request would be retried even after it succeeded.
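Under the hood, retrier only needs to be a generator that yields once per attempt and sleeps between yields. This is not Redo's actual implementation, just a minimal sketch of the idea; the parameter defaults here are assumptions:

```python
import time

def retrier(attempts=5, sleeptime=10, max_sleeptime=300, sleepscale=1.5):
    """Yield once per attempt, sleeping with backoff between yields.

    A minimal sketch of the idea behind Redo's retrier -- not the
    library's real code; the defaults are assumptions.
    """
    for attempt in range(attempts):
        yield attempt
        if attempt < attempts - 1:
            # Exponential backoff, capped at max_sleeptime.
            time.sleep(min(sleeptime * sleepscale ** attempt, max_sleeptime))
```

Because it is a plain generator, all exception handling, cleanup, and any finally blocks stay in the calling code.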

I released Redo 1.3 with this new functionality this morning – enjoy!

October 07, 2014 12:48 PM

October 02, 2014

Hal Wine (hwine)

bz Quick Search

October 02, 2014 07:00 AM

September 29, 2014

Jordan Lund (jlund)

This Week In Releng - Sept 21st, 2014

Major Highlights:

Completed work (resolution is 'FIXED'):

In progress work (unresolved and not assigned to nobody):

September 29, 2014 06:08 PM

This Week In Releng - Sept 7th, 2014

Major Highlights

Completed work (resolution is 'FIXED'):

In progress work (unresolved and not assigned to nobody):

September 29, 2014 05:44 PM

September 25, 2014

Armen Zambrano G. (@armenzg)

Making mozharness easier to hack on and try support

Yesterday, we presented a series of proposed changes to Mozharness at the bi-weekly meeting.

We're mainly focused on making things easier for developers and allowing for more flexibility.
We will initially focus on the testing side of the automation and lay the groundwork for further improvements down the line.

The set of changes discussed for this quarter are:

  1. Move remaining set of configs to the tree - bug 1067535
    • This makes it easier to test harness changes on try
  2. Read more information from the in-tree configs - bug 1070041
    • This increases the number of harness parameters we can control from the tree
  3. Use structured output parsing instead of regular where it applies - bug 1068153
    • This is part of a larger goal of making test reporting more reliable, easier to consume, and less of a burden on infrastructure
    • The aim is to establish uniform criteria for setting a job status based on test results, relying on structured log data (JSON) rather than regex-based output parsing
    • "How does a test turn a job red or orange?"
    • We will then have a simple answer that is the same for all test harnesses
  4. Mozharness try support - bug 791924
    • This will allow us to lock which repo and revision of mozharness is checked out
    • This isolates mozharness changes to a single commit in the tree
    • This gives us try support for user repos (freedom to experiment with mozharness on try)
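The idea in #3 can be sketched with a hypothetical status function: each harness emits JSON records, and a single shared rule maps the results to a job status. The field names and status values below are illustrative, not the actual mozharness or structured-log schema:

```python
import json

def job_status(log_lines):
    """Map structured (JSON) log lines to a job status.

    Hypothetical sketch: the field names and status values are
    illustrative, not the real structured-log schema.
    """
    results = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip unstructured noise instead of regexing it
        if record.get("action") == "test_end":
            results.append(record.get("status"))
    if "CRASH" in results:
        return "busted"      # red
    if "FAIL" in results:
        return "testfailed"  # orange
    return "success"         # green
```

With one function like this shared by every harness, "how does a test turn a job red or orange?" has a single answer.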

Even though we feel the pain of #4, we decided that #1 and #2 give developers immediate value, while for #4 we at least know our painful workarounds.
I don't know if we'll complete #4 this quarter; however, we are committed to the first three.

If you want to contribute to the longer term vision on that proposal please let me know.

In the following weeks we will have more updates with regards to implementation details.

Stay tuned!

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

September 25, 2014 07:42 PM

September 23, 2014

Ben Hearsum (bhearsum)

Stop stripping (OS X builds), it leaves you vulnerable

While investigating some strange update requests on our new update server, I discovered that we have thousands of update requests from Beta users on OS X that aren’t getting an update, but should. After some digging I realized that most, if not all of these are coming from users who have installed one of our official Beta builds and subsequently stripped out the architecture they do not need from it. In turn, this causes our builds to report in such a way that we don’t know how to serve updates for them.

We’ll look at ways of addressing this, but the bottom line is that if you want to be secure: Stop stripping Firefox binaries!

September 23, 2014 05:38 PM

September 19, 2014

Ben Hearsum (bhearsum)

New update server has been rolled out to Firefox/Thunderbird Beta users

Yesterday marked a big milestone for the Balrog project when we made it live for Firefox and Thunderbird Beta users. Those with a good long term memory may recall that we switched Nightly and Aurora users over almost a year ago. Since then, we’ve been working on and off to get Balrog ready to serve Beta updates, which are quite a bit more complex than our Nightly ones. Earlier this week we finally got the last blocker closed and we flipped it live yesterday morning, pacific time. We have significantly (~10x) more Beta users than Nightly+Aurora, so it’s no surprise that we immediately saw a spike in traffic and load, but our systems stood up to it well. If you’re into this sort of thing, here are some graphs with spikey lines:
The load average on 1 (of 4) backend nodes:

The rate of requests to 1 backend node (requests/second):

Database operations (operations/second):

And network traffic to the database (MB/sec):

Despite hitting a few new edge cases (mostly around better error handling), the deployment went very smoothly – it took less than 15 minutes to be confident that everything was working fine.

While Nick and I are the primary developers of Balrog, we couldn’t have gotten to this point without the help of many others. Big thanks to Chris and Sheeri for making the IT infrastructure so solid, to Anthony, Tracy, and Henrik for all the testing they did, and to Rail, Massimo, Chris, and Aki for the patches and reviews they contributed to Balrog itself. With this big milestone accomplished we’re significantly closer to Balrog being ready for Release and ESR users, and retiring the old AUS2/3 servers.

September 19, 2014 02:31 PM

September 17, 2014

Kim Moir (kmoir)

Mozilla Releng: The ice cream

A week or so ago, I was commenting in IRC that I was really impressed that our interns had such amazing communication and presentation skills.  One of the interns, John Zeller, said something like "The cream rises to the top", to which I replied "Releng: the ice cream of CS".  From there, the conversation went on to discuss what would be the best ice cream flavour to capture the spirit of Mozilla releng.  The consensus at the end was that Irish Coffee (coffee with whisky) with cookie dough chunks was the favourite.  A lot of people on the team like coffee, whisky makes it better, and who doesn't like cookie dough?

I made this recipe over the weekend with some modifications.  I used the coffee recipe from the Perfect Scoop.  After it was done churning in the ice cream maker,  instead of whisky, which I didn't have on hand, I added Kahlua for more coffee flavour.  I don't really like cookie dough in ice cream but cooked chocolate chip cookies cut up with a liberal sprinkling of Kahlua are tasty.

Diced cookies sprinkled with Kahlua

Ice cream ready to put in freezer

Finished product
I have to say, it's quite delicious :-) If open source ever stops being fun, I'm going to start a dairy empire.  Not really. Now back to bugzilla...

September 17, 2014 01:43 PM

September 16, 2014

Armen Zambrano G. (@armenzg)

Which builders get added to buildbot?

To add/remove jobs, we have to modify buildbot-configs.

You can learn how to make changes by looking at previous patches; however, there's a bit of an art to getting it right.

I just landed a script that sets up buildbot for you inside a virtualenv; you can pass it a buildbot-configs patch and it will determine which builders get added or removed.

You can run this by checking out braindump and running something like this:
buildbot-related/ -j path_to_patch.diff

NOTE: This script does not check that the job has all the right parameters once live (e.g. you forgot to specify the mozharness config for it).
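At its core, what the script reports is a set difference between the builder lists generated with and without the patch; something along these lines (a sketch of the idea, not the braindump script itself):

```python
def diff_builders(before, after):
    """Compare builder-name lists from two buildbot config runs.

    Sketch of the idea behind the braindump script, not its actual
    code: 'before' and 'after' are the builder names generated
    without and with the patch applied.
    """
    before, after = set(before), set(after)
    added = sorted(after - before)
    removed = sorted(before - after)
    return added, removed
```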

Happy hacking!

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

September 16, 2014 03:26 PM

September 11, 2014

Armen Zambrano G. (@armenzg)

Run tbpl jobs locally with Http authentication - take 2

Back in July, we deployed the first version of Http authentication for mozharness, however, under some circumstances, the initial version could fail and affect production jobs.

This time around we have:

If you read How to run Mozharness as a developer you should see the new changes.

As a quick reminder, it only takes 3 steps:

  1. Find the command from the log. Copy/paste it.
  2. Append --cfg
  3. Append --installer-url/--test-url with the right values
To see a real example visit this

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

September 11, 2014 12:45 PM

Massimo Gerva (mgerva)

Canada, whale watching

We loved our stay in Canada; here are some pictures from our trip.

Let’s start from the amazing whale watching day in Vancouver:


September 11, 2014 10:12 AM

September 10, 2014

Kim Moir (kmoir)

Mozilla pushes - August 2014

Here's August 2014's monthly analysis of the pushes to our Mozilla development trees.  You can load the data as an HTML page or as a json file.

It was another record breaking month.  No surprise here!


General Remarks
Both Try and Gaia-Try each have about 36% of the pushes.  The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) account for around 21% of all the pushes.

August 2014 was the month with the most pushes (13,090 pushes)
August 2014 has the highest pushes/day average with 620 pushes/day
July 2014 has the highest average of "pushes-per-hour" with 23.51 pushes/hour
August 20, 2014 had the highest number of pushes in one day with 690 pushes

September 10, 2014 02:09 PM

September 09, 2014

Nick Thomas (nthomas)

ZNC and Mozilla IRC

ZNC is great for having a persistent IRC connection, but it’s not so great when the IRC server or network has a blip. Then you can end up failing to rejoin with

nthomas (…) has joined #releng
nthomas has left … (Max SendQ exceeded)

over and over again.

The way to fix this is to limit the number of channels ZNC can connect to simultaneously. In the Web UI, change the ‘Max Joins’ preference to something like 5. In the config file, use ‘MaxJoins = 5’ in a <User foo> block.

September 09, 2014 10:19 AM

September 08, 2014

Jordan Lund (jlund)

This Week In Releng - Sept 1st, 2014

Major Highlights:

Completed work (resolution is 'FIXED'):

In progress work (unresolved and not assigned to nobody):

September 08, 2014 04:58 AM

September 06, 2014

Hal Wine (hwine)

New Hg Server Status Page

New Hg Server Status Page

Just a quick note to let folks know that the Developer Services team continues to make improvements on Mozilla’s Mercurial server. We’ve set up a status page to make it easier to check on current status.

As we continue to improve monitoring and status displays, you’ll always find the “latest and greatest” on this page. And we’ll keep the page updated with recent improvements to the system. We hope this page will become your first stop whenever you have questions about our Mercurial server.

September 06, 2014 07:00 AM

September 01, 2014

Nick Thomas (nthomas)

Deprecating our old rsync modules

We’ve removed the rsync modules mozilla-current and mozilla-releases today, after calling for comment a few months ago and hearing no objections. Those modules were previously used to deliver Firefox and other Mozilla products to end users via a network of volunteer mirrors but we now use content delivery networks (CDN). If there’s a use case we haven’t considered then please get in touch in the comments or on the bug.

September 01, 2014 10:09 PM

August 26, 2014

Chris AtLee (catlee)

Gotta Cache 'Em All


Waaaaaaay back in February we identified overall network bandwidth as a cause of job failures on TBPL. We were pushing too much traffic over our VPN link between Mozilla's datacentre and AWS. Since then we've been working on a few approaches to cope with the increased traffic while at the same time reducing our overall network load. Most recently we've deployed HTTP caches inside each AWS region.

Network traffic from January to August 2014

The answer - cache all the things!

Obligatory XKCD

Caching build artifacts

The primary target for caching was downloads of build/test/symbol packages by test machines from file servers. These packages are generated by the build machines and uploaded to various file servers. The same packages are then downloaded many times by different machines running tests. This was a perfect candidate for caching, since the same files were being requested by many different hosts in a relatively short timespan.

Caching tooltool downloads

Tooltool is a simple system RelEng uses to distribute static assets to build/test machines. While the machines do maintain a local cache of files, the caches are often empty because the machines are newly created in AWS. Having the files in local HTTP caches speeds up transfer times and decreases network load.

Results so far - 50% decrease in bandwidth

Initial deployment was completed on August 8th (end of week 32 of 2014). You can see by the graph above that we've cut our bandwidth by about 50%!

What's next?

There are a few more low hanging fruit for caching. We have internal pypi repositories that could benefit from caches. There's a long tail of other miscellaneous downloads that could be cached as well.

There are other improvements we can make to reduce bandwidth as well, such as moving uploads from build machines to be outside the VPN tunnel, or perhaps to S3 directly. Additionally, a big source of network traffic is doing signing of various packages (gpg signatures, MAR files, etc.). We're looking at ways to do that more efficiently. I'd love to investigate more efficient ways of compressing or transferring build artifacts overall; there is a ton of duplication between the build and test packages between different platforms and even between different pushes.

I want to know MOAR!

Great! As always, all our work has been tracked in a bug, and worked out in the open. The bug for this project is 1017759. The source code lives in, and we have some basic documentation available on our wiki. If this kind of work excites you, we're hiring!

Big thanks to George Miroshnykov for his work on developing proxxy.

August 26, 2014 02:21 PM

August 18, 2014

Jordan Lund (jlund)

This week in Releng - Aug 11th 2014

Completed work (resolution is 'FIXED'):

In progress work (unresolved and not assigned to nobody):

August 18, 2014 06:38 AM

August 12, 2014

Ben Hearsum (bhearsum)

Upcoming changes to Mac package layout, signing

Apple recently announced changes to how OS X applications must be packaged and signed in order for them to function correctly on OS X 10.9.5 and 10.10. The tl;dr version of this is “only mach-O binaries may live in .app/Contents/MacOS, and signing must be done on 10.9 or later”. Without any changes, future versions of Firefox will cease to function out-of-the-box on OS X 10.9.5 and 10.10. We do not have a release date for either of these OS X versions yet.

Changes required:
* Move all non-mach-O files out of .app/Contents/MacOS. Most of these will move to .app/Contents/Resources, but files that could legitimately change at runtime (eg: everything in defaults/) will move to .app/MozResources (which can be modified without breaking the signature). This work is in progress, but no patches are ready yet.
* Add new features to the client side update code to allow partner repacks to continue to work.
* Create and use 10.9 signing servers for these new-style apps. We still need to use our existing 10.6 signing servers for any builds without these changes.
* Update signing server code to support new v2 signatures.

We are intending to ship the required changes with Gecko 34, which ships on November 25th, 2014. The changes required are very invasive, and we don’t feel that they can be safely backported to any earlier version quickly enough without major risk of regressions. We are still looking at whether or not we’ll backport to ESR 31. To this end, we’ve asked that Apple whitelist Firefox and Thunderbird versions that will not have the necessary changes in them. We’re still working with them to confirm whether or not this can happen.

This has been cross posted a few places – please send all follow-ups to the newsgroup.

August 12, 2014 05:05 PM

August 11, 2014

Jordan Lund (jlund)

This Week In Releng - Aug 4th, 2014

Major Highlights:

Completed work (resolution is 'FIXED'):

In progress work (unresolved and not assigned to nobody):

August 11, 2014 01:09 AM

August 08, 2014

Kim Moir (kmoir)

Mozilla pushes - July 2014

Here's the July 2014 monthly analysis of the pushes to our Mozilla development trees. You can load the data as an HTML page or as a json file.
Like every month for the past while, we had a new record number of pushes. In reality, given that July is one day longer than June, the numbers are quite similar.


General remarks
Try keeps on having around 38% of all the pushes. Gaia-Try is in second place with around 31% of pushes.  The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) account for around 22% of all the pushes.

July 2014 was the month with the most pushes (12,755 pushes)
June 2014 has the highest pushes/day average with 418 pushes/day
July 2014 has the highest average of "pushes-per-hour" with 23.51 pushes/hour
June 4th, 2014 had the highest number of pushes in one day with 662 pushes

August 08, 2014 06:16 PM

August 07, 2014

Kim Moir (kmoir)

Scaling mobile testing on AWS

Running tests for Android at Mozilla has typically meant running on reference devices: physical devices that run jobs on our continuous integration farm via test harnesses.  However, this leads to the same problem that we have for other tests that run on bare metal: we can't scale up our capacity without buying new devices, racking them, configuring them for the network and updating our configurations.  In addition, reference cards, rack mounted or not, are rather delicate creatures and have higher retry rates (tests fail due to infrastructure issues and need to be rerun) than those running on emulators (tests run on an Android emulator in a VM on bare metal or in the cloud).

Do Androids Dream of Electric Sheep?  ©Bill McIntyre, Creative Commons by-nc-sa 2.0
Recently, we started running Android 2.3 tests on emulators in AWS.  This works well for unit tests (correctness tests).  It's not really appropriate for performance tests, but that's another story.  The impetus behind this change was so we could decommission Tegras, the reference devices we used for running Android 2.2 tests.

We run many Linux based tests, including Android emulators, on AWS spot instances.  Spot instances are AWS excess capacity that you can bid on.  If someone outbids the price you have paid for your spot instance, your instance can be terminated.  But that's okay, because we retry jobs if they fail for infrastructure reasons.  The overall percentage of spot instances that are terminated is quite small.  The huge advantage of using spot instances is price.  They are much cheaper than on-demand instances, which has allowed us to increase our capacity while continuing to reduce our AWS bill.

We have a wide variety of unit tests that run on emulators for mobile on AWS.  We encountered an issue where some of the tests wouldn't run on the default instance type (m1.medium) that we use for our spot instances.  Given the number of jobs we run, we want to use the cheapest AWS instance type where the tests will complete successfully.  At the time we first tested it, we couldn't find an instance type where certain CPU/memory intensive tests would run.  So when I first enabled Android 2.3 tests on emulators, I separated the tests so that some would run on AWS spot instances and the ones that needed a more powerful machine would run on our in-house Linux capacity.  But this change consumed all of the capacity of that pool and we had a very high number of pending jobs in it.  This meant that people had to wait a long time for their test results.  Not good.

To reduce the pending counts, we needed to either buy more in-house Linux capacity, run only a selected subset of the tests that need more resources, or find a new AWS instance type where they would complete successfully.  Geoff from the A-Team ran the tests on the c3.xlarge instance type he had tried before, and this time they worked.  In his earlier work the tests did not complete successfully on this instance type, and we are unsure why.  One of the things about working with AWS is that we don't have a window into the bugs that they fix on their end.  So this particular instance type didn't work before, but it does now.

The next step for me was to create a new AMI (Amazon Machine Image) to serve as the "golden" version for instances created in this pool.  Previously, we used Puppet to configure our AWS test machines, but now we just regenerate the AMI every night via cron and this is the version that's instantiated.  The AMI was a copy of the existing Ubuntu64 image we have, but configured to run on the c3.xlarge instance type instead of m1.medium.  This was a bit tricky because I had to exclude regions where the c3.xlarge instance type was not available.  For redundancy (to still have capacity if an entire region goes down) and cost (some regions are cheaper than others), we run instances in multiple AWS regions.

Once I had the new AMI up that would serve as the template for our new slave class, I created a slave with the AMI and verified running the tests we planned to migrate on my staging server.  I also enabled two new Linux64 buildbot masters in AWS to service these new slaves, one in us-east-1 and one in us-west-2.  When enabling a new pool of test machines, it's always good to look at the load on the current buildbot masters and see if additional masters are needed so the current masters aren't overwhelmed with too many slaves attached.

After the tests were all green, I modified our configs to run this subset of tests on a branch (ash), enabled the slave platform in Puppet and added a pool of devices to this slave platform in our production configs.  After the reconfig deployed these changes into production, I landed a regular expression in watch_pending.cfg so that the new tst-emulator64-spot pool of machines would be allocated to the subset of tests and the branch I had enabled them on.  The script watches the number of pending jobs on AWS and creates instances as required.  We also have scripts to terminate or stop idle instances when we don't need them.  Why pay for machines when you don't need them now?  After the tests ran successfully on ash, I enabled running them on the other relevant branches.

Royal Border Bridge.  Also, release engineers love to see green builds and tests.  ©Jonathan Combe, Creative Commons by-nc-sa 2.0
The end result is that some Android 2.3 tests, such as mochitests, run on m1.medium (tst-linux64-spot) instances.

And some Android 2.3 tests, such as crashtests, run on c3.xlarge (tst-emulator64-spot) instances.


In enabling this slave class within our configs, we were also able to reuse it for some b2g tests which also faced the same problem where they needed a more powerful instance type for the tests to complete.

Lessons learned:
Use the minimum (cheapest) instance type required to complete your tests
As usual, test on a branch before full deployment
Scaling mobile tests doesn't mean more racks of reference cards

Future work:
Bug 1047467 c3.xlarge instance types are expensive, let's test running those tests on a range of instance types that are cheaper

Further reading:
AWS instance types 
Chris Atlee wrote about how we Now Use AWS Spot Instances for Tests
Taras Glek wrote How Mozilla Amazon EC2 Usage Got 15X Cheaper in 8 months
Rail Aliiev 
Bug 980519 Experiment with other instance types for Android 2.3 jobs 
Bug 1024091 Address high pending count in in-house Linux64 test pool 
Bug 1028293 Increase Android 2.3 mochitest chunks, for aws 
Bug 1032268 Experiment with c3.xlarge for Android 2.3 jobs
Bug 1035863 Add two new Linux64 masters to accommodate new emulator slaves
Bug 1034055 Implement c3.xlarge slave class for Linux64 test spot instances
Bug 1031083 Buildbot changes to run selected b2g tests on c3.xlarge
Bug 1047467 c3.xlarge instance types are expensive, let's try running those tests on a range of instance types that are cheaper

August 07, 2014 06:24 PM

August 04, 2014

Jordan Lund (jlund)

This Week In Releng - July 28th, 2014

Major Highlights:

Completed Work (marked as resolved):

In progress work (unresolved and not assigned to nobody):

August 04, 2014 04:22 PM

July 28, 2014

Kim Moir (kmoir)

2014 USENIX Release Engineering Summit CFP now open

The CFP for the 2014 Release Engineering summit (Western edition) is now open.  The deadline for submissions is September 5, 2014 and speakers will be notified by September 19, 2014.  The program will be announced in late September.  This one day summit on all things release engineering will be held in concert with LISA, in Seattle on November 10, 2014. 

Seattle skyline © Howard Ignatius, Creative Commons by-nc-sa 2.0

From the CFP

"URES '14 West is looking for relevant and engaging speakers and workshop facilitators for our event on November 10, 2014, in Seattle, WA. URES brings together people from all areas of release engineering—release engineers, developers, managers, site reliability engineers, and others—to identify and help propose solutions for the most difficult problems in release engineering today. Suggestions for topics include (but are not limited to):"

War and horror stories. I like to see that in a CFP.  Stories describing how you overcame problems with infrastructure and tooling to ship software are the best kind.  They make people laugh, or maybe cry as they realize they are currently living that situation.  Good times.  Also, I think talks about scaling high-volume continuous integration farms will be interesting.  Scaling issues are a lot of fun and expose many problems you don't see when you're only running a few builds a day.

If you have any questions surrounding the CFP, I'm happy to help as I'm on the program committee.   (my irc nick is kmoir (#releng) as is my email id at

July 28, 2014 09:28 PM

July 25, 2014

Aki Sasaki (aki)

on leaving mozilla

Today's my last day at Mozilla. It wasn't an easy decision to move on; this is the best team I've been a part of in my career. And working at a company with such idealistic principles and the capacity to make a difference has been a privilege.

Looking back at the past five-and-three-quarter years:

I will stay a Mozillian, and I'm looking forward to seeing where we can go from here!


July 25, 2014 07:26 PM

July 18, 2014

Kim Moir (kmoir)

Reminder: Release Engineering Special Issue submission deadline is August 1, 2014

Just a friendly reminder that the deadline for the Release Engineering Special Issue is August 1, 2014.  If you have any questions about the submission process or a topic you'd like to write about, the guest editors, including myself, are happy to help you!

July 18, 2014 10:03 PM

Mozilla pushes - June 2014

Here's June 2014's analysis of the pushes to our Mozilla development trees. You can load the data as an HTML page or as a json file.

This was another record breaking month, with a total of 12,534 pushes.  As a note of interest, this is over double the number of pushes we had in June 2013. So big kudos to everyone who helped us scale our infrastructure and tooling.  (Actually, we had 6,433 pushes in April 2013, which would make this less than double, because June 2013 was a bit of a dip.  But still impressive :-)


General Remarks
The introduction of Gaia-try in April has been very popular; it comprised around 30% of pushes in June, compared to 29% last month.
The Try branch itself accounted for around 38% of pushes.
The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) account for around 21% of all the pushes, compared to 22% in the previous month.

June 2014 was the month with the most pushes (12,534 pushes)
June 2014 has the highest pushes/day average with 418 pushes/day
June 2014 has the highest average of "pushes-per-hour" with 23.17 pushes/hour
June 4th, 2014 had the highest number of pushes in one day with 662 pushes
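The per-day averages follow directly from the monthly totals; a quick check:

```python
def pushes_per_day(total_pushes, days_in_month):
    """Average pushes per day, rounded to the nearest whole push."""
    return int(round(total_pushes / float(days_in_month)))

# June 2014: 12,534 pushes over 30 days.
print(pushes_per_day(12534, 30))  # 418
```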

July 18, 2014 09:46 PM

Massimo Gerva (mgerva)

apache rewrite rules


Always serve the content from an external web server unless the content is available locally:

RewriteEngine on
RewriteCond %{REQUEST_URI} !-U
RewriteRule ^(.+)$1

thanks mod_rewrite!

July 18, 2014 06:53 PM

July 15, 2014

Armen Zambrano G. (@armenzg)

Developing with GitHub and remote branches

I have recently started contributing to the Firefox OS certification suite using Git and GitHub.

It has been interesting switching from Mercurial to Git. I honestly believed it would be more straightforward, but I have had to re-read things again and again until the new ways sank in.

jgraham shared some notes with me (thanks!) about what his workflow looks like, and I want to document it for my own sake and perhaps yours:
git clone

# Time passes

# To develop something on master
# Pull in all the new commits from master

git fetch origin

# Create a new branch (this will track master from origin,
# which we don't really want, but that will be fixed later)

git checkout -b my_new_thing origin/master

# Edit some stuff

# Stage it and then commit the work

git add -p
git commit -m "New awesomeness"

# Push the work to a remote branch
git push --set-upstream origin HEAD:jgraham/my_new_thing

# Go to the GH UI and start a pull request

# Fix some review issues
git add -p
git commit -m "Fix review issues" # or use --fixup

# Push the new commits
git push

# Finally, the review is accepted
# We could rebase at this point, however,
# we tend to use the Merge button in the GH UI
# Working off a different branch is basically the same,
# but you replace "master" with the name of the branch you are working off.

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

July 15, 2014 09:04 PM

July 14, 2014

Massimo Gerva (mgerva)

bash magic

I found this command in one of our startup scripts:

<command>
ret=$?
return $?

Our scripts worked fine for months, but then some random errors appeared. The problem with the above code is that it will always return 0.

ret (an unused variable) stores the exit code of <command>, but then the script returns $?.

The second $? refers to the status of the assignment to the variable, not the exit code of <command>.

Here is an updated (and working) version of the code:

<command>
return $?

Note to self: remember to remove all unused bash variables.
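To see the difference concretely, here is a self-contained demonstration (function names invented; `false` stands in for <command>):

```shell
check_broken() {
    false          # stands in for <command>; exits with status 1
    ret=$?         # ret is now 1, but the assignment itself succeeds...
    return $?      # ...so this returns the assignment's status: always 0
}

check_fixed() {
    false          # stands in for <command>; exits with status 1
    return $?      # $? is still 1 here, so the real status propagates
}

check_broken
echo "broken returns: $?"   # broken returns: 0

check_fixed
echo "fixed returns: $?"    # fixed returns: 1
```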

July 14, 2014 11:33 PM

July 11, 2014

Armen Zambrano G. (@armenzg)

Introducing HTTP authentication for Mozharness

A while ago, I asked a colleague (you know who you are! :P) of mine how to run a specific type of test job on tbpl on my local machine and he told me with a smirk, "With mozharness!"

I wanted to punch him (HR: nothing to see here! Not a literal punch, a figurative one); however, he was right. He had good reason to say it, and I knew why he was smiling. I had to close my mouth and take it.

Here's why he said that: most jobs running on tbpl are driven by Mozharness; however, they're optimized to run within Release Engineering's protected network. This is good. This is safe. This is sound. However, when we try to reproduce a job outside of the Releng network, it becomes problematic for various reasons.

Many times we have had to guide people who are unfamiliar with mozharness through running it locally (Docs: How to run Mozharness as a developer). On other occasions, when a job depends on binaries stored on private web hosts, it has been necessary to loan a machine, since a loaned machine can reach those files through internal domains from within the Releng network.

Today, I have landed a piece of code that does two things:
This change, plus the recently-introduced developer configs for Mozharness, makes it much easier to run mozharness outside of continuous integration infrastructure.

I hope this will help developers have a better experience reproducing the environments used in the tbpl infrastructure. One less reason to loan a machine!

This makes me *very* happy (see below) since I don't have VPN access anymore.

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

July 11, 2014 07:42 PM

Using developer configs for Mozharness

To help developers run mozharness, I have landed some configs that can be appended to the command that appears on tbpl.

All you have to do is:
  • Find the mozharness script line in a log from tbpl (search for "script/scripts")
  • Look for the --cfg parameter and add it again but it should end with ""
    • e.g. --cfg android/ --cfg android/
  • Also add the --installer-url and --test-url parameters as explained in the docs
Developer configs have these things in common:
  • They have the same name as the production one but instead end in ""
  • They overwrite the "exes" dict with an empty dict
    • This allows you to use the binaries in your personal $PATH
  • They overwrite the "default_actions" list
    • The main reason is to remove the action called read-buildbot-configs
  • They fix URLs to point to the right publicly reachable domains
Here are the currently available developer configs:
You can help by adding more of them!

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

July 11, 2014 07:15 PM

July 04, 2014

Kim Moir (kmoir)

This week in Mozilla Releng - July 4, 2014

This is a special double issue of This Week in Releng; I was so busy last week that I didn't get a chance to post it. Despite the fireworks for Canada Day and Independence Day, Mozilla release engineering managed to close some bugs.

Major highlights:
 Completed work (resolution is 'FIXED'):
In progress work (unresolved and not assigned to nobody):

July 04, 2014 09:39 PM

July 03, 2014

Armen Zambrano G. (@armenzg)

Tbpl's blobber uploads are now discoverable

What is blobber? Blobber is a server- and client-side set of tools that allows Releng's test infrastructure to upload files without requiring ssh keys to be deployed on the test machines.

This is useful since it allows uploads of screenshots, crash dumps and any other files needed to debug what failed in a test job.

Up until now, if you wanted your scripts to determine which files were uploaded in a job, you had to download the log and parse it for the TinderboxPrint lines of Blobber uploads, e.g.
15:21:18 INFO - (blobuploader) - INFO - TinderboxPrint: Uploaded 70485077-b08a-4530-8d4b-c85b0d6f9bc7.dmp to
Now you can find the set of files uploaded in a job by looking at the uploaded_files.json manifest that we upload at the end of all uploads. It can be discovered by inspecting the buildjson files or by listening to the pulse events. The key is called "blobber_manifest_url", e.g.
"blobber_manifest_url": "",
In the future, this feature will be useful when we start uploading structured logs. It will help us not to download logs to extract meta-data about the jobs!
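For example, a script can now list the uploaded files straight from the manifest instead of scraping the log. A sketch (the manifest contents and URLs below are invented; in real use you would first fetch the file from the job's "blobber_manifest_url" property):

```shell
# A miniature uploaded_files.json with invented contents; in real use:
#   curl -s "$blobber_manifest_url" > uploaded_files.json
cat > uploaded_files.json <<'EOF'
{
  "70485077-b08a-4530-8d4b-c85b0d6f9bc7.dmp": "https://blobs.example.com/sha512/abcd",
  "screenshot.png": "https://blobs.example.com/sha512/ef01"
}
EOF

# List the uploaded filenames (the JSON keys) without touching the log:
sed -n 's/^ *"\([^"]*\)":.*/\1/p' uploaded_files.json
```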

No, your uploads are not this ugly
This work was completed in bug 986112. Thanks to aki, catlee, mtabara and rail to help me get this out the door. You can read more about Blobber by visiting: "Blobber is live - upload ALL the things!" and "Blobber - local environment setup".

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

July 03, 2014 12:02 PM

July 02, 2014

Hal Wine (hwine)

2014-06 try server update

2014-06 try server update

Chatting with Aki the other day, I realized that word of all the wonderful improvements to the try server has not been publicized. A lot of folks have done a lot of work to make things better – here’s a brief summary of the good news.

Try server pushes could appear to take up to 4 hours, during which time others would be locked out.
The major time-taker has been found and eliminated: ancestor processing. We understand that the remaining occasional slowdowns are related to caching. Fortunately, there are some steps developers can take now to minimize delays.

What folks can do to help

The biggest remaining slowdown is caused by rebuilding the cache, which is only invalidated if a push is interrupted. If you can avoid causing a disconnect until your push is complete, that helps everyone! So, please, no Ctrl-C during a push! The other changes should address the long wait times you used to see.

What has been done to infrastructure

There has long been a belief that many of our hg problems, especially on try, came from the fact that we had r/w NFS mounts of the repositories across multiple machines (both hgssh servers & hgweb servers). For various historical reasons, a large part of this was due to the way pushlog was implemented.

Ben did a lot of work to get sqlite off NFS, and much of the work to synchronize the repositories without NFS has been completed.

What has been done to our hooks

All along, folks have been discussing our try server performance issues with the hg developers. A key confusing issue was that we saw processes “hang” for VERY long times (45 minutes or more) without making a system call. Kendall managed to observe an hg process in such an infinite-looking-loop-that-eventually-terminated a few times. A stack trace showed it was looking up an hg ancestor without making system calls or library accesses. In discussions, this confused the hg team, as they did not know of any reason the ancestor code should be invoked during a push.

Thanks to lots of debugging help from glandium one evening, we found and disabled a local hook that invoked the ancestor function on every commit to try. \o/ team work!

Caching – the remaining problem

With the ancestor-invoking hook disabled, we still saw some longish periods of time where we couldn’t explain why pushes to try appeared hung. Granted, it was a much shorter time and always self-corrected, but it was still puzzling.

A number of our old theories, such as “too many heads”, were discounted by the hg developers, as (a) we didn’t have that many heads, and (b) lots of heads shouldn’t be a significant issue – hg wants to support even more heads than we have on try.

Greg did a wonderful bit of sleuthing to find the impact of ^C during push. Our current belief is once the caching is fixed upstream, we’ll be in a pretty good spot. (Especially with the inclusion of some performance optimizations also possible with the new cache-fixed version.)

What is coming next

To take advantage of all the good stuff upstream Hg versions have, including the bug fixes we want, we’re going to be moving towards removing roadblocks to staying closer to the tip. Historically, we had some issues due to http header sizes and load balancers; ancient python or hg client versions; and similar. The client issues have been addressed, and a proper testing/staging environment is on the horizon.

There are a few competing priorities, so I’m not going to predict a completion date. But I’m positive the future is coming. I hope you have a glimpse into that as well.

July 02, 2014 07:00 AM

July 01, 2014

Armen Zambrano G. (@armenzg)

Down Memory Lane

It was cool to find an article from "The Senecan" that talks about how, through Seneca, Lukas and I got involved with and hired by Mozilla. Here's the article.

Here's an excerpt:
From Mozilla volunteers to software developers 
It pays to volunteer for Mozilla, at least it did for a pair of Seneca Software Development students. 
Armen Zambrano and Lukas Sebastian Blakk are still months away from graduating, but that hasn't stopped the creators behind the popular web browser Firefox from hiring them. 
When they are not in class learning, the Senecans will be doing a wide range of software work on the company’s browser including quality testing and writing code. “Being able to work on real code, with real developers has been invaluable,” says Lukas. “I came here to start a new career as soon as school is done, and thanks to the College’s partnership with Mozilla I've actually started it while still in school. I feel like I have a head start on the path I've chosen.”  
Firefox is a free open source web browser that can...

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

July 01, 2014 05:58 PM

June 30, 2014

Nick Thomas (nthomas)

Keeping track of buildbot usage

Mozilla Release Engineering provides some simple trending of the Buildbot continuous integration system, which can be useful to check how many jobs are currently running versus pending. There are graphs of the last 24 hours broken out in various ways – for example compilation separate from tests, compilation on try and everything else. This data also feeds into the pending queue on trychooser.

Until recently the mapping of job name to machine pool was out of date, due to our rapid growth for b2g and into Amazon’s AWS, so the graphs were more misleading than useful. This has now been corrected, and I’m working on making sure it stays up to date automatically.

Update: Since July 18 the system stays up to date automatically, in just about all cases.

June 30, 2014 04:31 AM

June 24, 2014

Chris AtLee (catlee)

B2G now building using unified sources

Last week, with the help of Ehsan and John, we finally enabled unified source builds for B2G devices.

As a result we're building device builds approximately 40% faster than before.

Between June 12th and June 17th, 50% of our successful JB emulator builds on mozilla-inbound finished in 97 minutes or less. Using unified sources for these builds reduced the 50th percentile of build times down to 60 minutes (from June 19th to June 24th).
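A quick back-of-the-envelope check of the emulator numbers above: dropping from a 97-minute median to 60 minutes is roughly a 38% reduction, in line with the ~40% headline figure for device builds generally.

```shell
# Median JB emulator build time: 97 minutes before unified sources, 60 after.
awk 'BEGIN { printf "%.0f%% faster\n", (97 - 60) / 97 * 100 }'
# prints "38% faster"
```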

To mitigate the risks of changes landing that break non-unified builds, we're also doing periodic non-unified builds for these devices.

As usual, all our work here was done in the open. If you're interested, read along in bug 950676, and bug 942167.

Do you enjoy building, debugging and optimizing build, test & release pipelines? Great, because we're hiring!

June 24, 2014 07:23 PM

June 20, 2014

Kim Moir (kmoir)

Introducing Mozilla Releng's summer interns

The Mozilla Release Engineering team recently welcomed three interns to our team for the summer.

Ian Connolly is a student at Trinity College in Dublin. This is his first term with Mozilla and he's working on preflight slave tasks and an example project for Releng API.
Andhad Jai Singh is a student at the Indian Institute of Technology Hyderabad. This is his second term working at Mozilla; he was a Google Summer of Code student with the A-team last year. This term he's working on generating partial updates on request.
John Zeller is also a returning student and studies at Oregon State University. He previously had a work term with Mozilla releng and also worked during the past school term as a student worker implementing Mozilla Releng apps in Docker. This term he'll work on updating our ship-it application so that release automation updates it more frequently, letting us see the state of a release, as well as integrating post-release tasks.

View from Mozilla San Francisco Office

Please drop by and say hello to them if you're in our San Francisco office.  Or say hello to them in #releng - their irc nicknames are ianconnolly, ffledgling and zeller respectively.


June 20, 2014 09:24 PM

This week in Mozilla Releng - June 20, 2014

Ben is away for the next few Fridays, so I'll be covering this blog post for the next couple of weeks.

Major highlights:

Completed work (resolution is 'FIXED'):
In progress work (unresolved and not assigned to nobody):

June 20, 2014 09:23 PM

Armen Zambrano G. (@armenzg)

My first A-team project: install all the tests!

As a welcoming bug to the A-team, I had to change what tests get packaged.
The goal was to include all tests in the packages, regardless of whether they are marked as disabled in the test manifests.

Changing the packaging was not too difficult, as I already had pointers from jgriffin; the problem came with the runners.
The B2G emulator and desktop mochitest runners did not read the manifests; what they did was run all the tests that came inside the package (even disabled ones).

Unfortunately for me, the mochitest runner code is very, very old and it was hard to figure out how to make it work as cleanly as possible. I made a lot of mistakes and landed it incorrectly twice (an improper try landing, and I lost my good patch somewhere) - sorry, Ryan!

After a lot of tweaking, reviews from jmaher, and help from ted & ahal, it landed last week.

For more details you can read bug 989583.

PS = Using was priceless to speed up my development.

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

June 20, 2014 08:06 PM

June 19, 2014

John Zeller (zeller)

Tupperware: Mozilla apps in Docker!

Announcing Tupperware, a setup for Mozilla apps in Docker! Tupperware is portable, reusable, and containerized. But unlike typical tupperware, please do not put it in the microwave.



This is a project born out of a need to lower the barriers to entry for new contributors to apps and services maintained by Release Engineering (RelEng). Historically, RelEng has had more difficulty attracting community contributors than other parts of Mozilla, due in large part to how much knowledge is needed to get going in the first place. For a new contributor, it can be quite overwhelming to jump into any of the code bases that RelEng maintains, and this often means quickly losing that new contributor to exasperation. Beyond new contributors, Tupperware is great for experienced contributors as well, helping to keep an unpolluted development environment and to test patches.


Currently Tupperware includes the following Mozilla apps:

BuildAPI – a Pylons project used by RelEng to surface information collected from two databases updated through our buildbot masters as they run jobs.

BuildBot – a job (read: builds and tests) scheduling system that queues/executes jobs when the required resources are available and reports the results.

Dependency apps currently included:

RabbitMQ – a messaging queue used by RelEng apps and services

MySQL – Forked from orchardup/mysql


Vagrant is used as a quick and easy way to provision the docker apps and make the setup truly plug n’ play. The current setup only has a single Vagrantfile which launches BuildAPI and BuildBot, with their dependency apps RabbitMQ and MySQL.

How to run:

– Install Vagrant 1.6.3

– hg clone && cd tupperware && vagrant up (takes >10 minutes the first time)

Where to see apps:

– BuildAPI:

– BuildBot:

– RabbitMQ Management:

Troubleshooting tips are available in the Tupperware README.

What’s Next?

Now that Tupperware is out there, it’s open to contributors! The setup does not need to stay solely usable for RelEng apps and services. So please submit bugs to add new ones! There are a few ideas for adding functionality to Tupperware already:

Have ideas? Submit a bug!

June 19, 2014 12:00 AM

June 16, 2014

Ben Hearsum (bhearsum)

June 17th Nightly/Aurora updates of Firefox, Fennec, and Thunderbird will be slightly delayed

As part of the ongoing work to move our Beta and Release builds to our new update server, I’ll be landing a fairly invasive change to it today. Because it requires a new schema for its data, updates will be slightly delayed while the data repopulates in the new format as the nightlies stream in. While that’s happening, updates will continue to point at the builds from today (June 16th).

Once bug 1026070 is fixed, we will be able to do this sort of upgrade without any delay to users.

June 16, 2014 07:05 PM

How to not get spammed by Bugzilla

Bugmail is a running joke at Mozilla. Nearly everyone I know that works with Bugzilla (especially engineers) complains about the amount of bugmail they get. I too suffered from this problem for years, but with some tweaks to preferences and workflow, this problem can be solved. Here’s how I do it:

E-mail preferences

Here’s what my full e-mail settings look like:

And here’s my Zimbra filter for changes made by me (I think the “from” header part is probably unnecessary, though):


This section is mostly just an advertisement for the “My Dashboard” feature on Mozilla’s Bugzilla. By default, it shows you your assigned bugs, requested flags, and flags requested of you. Look at it at regular intervals (I try to restrict myself to once in the morning, and once before my EOD), particularly the “flags requested of you” section.

The other important thing is to generally stop caring about a bug unless it’s either assigned to you, or there’s a flag requested of you specifically. This ties in to some of the e-mail pref changes above. Changing my default state from “I must keep track of all bugs I might care about” to “I will keep track of my bugs & my requests, and opt-in to keeping tracking of anything else” is a shift in mindset, but a game changer when it comes to the amount of e-mail (and cognitive load) that Bugzilla generates.

With these changes it takes me less than 15 minutes to go through my bugmail every morning (even on Mondays). I can even ignore it at times, because “My Dashboard” will make sure I don’t miss anything critical. Big thanks to the Bugzilla devs who made some of these new things possible, particularly glob and dkl. Glob also mentioned that even more filtering possibilities are being made possible by bug 990980. The preview he sent me looks infinitely customizable:

June 16, 2014 01:11 PM

June 13, 2014

Ben Hearsum (bhearsum)

This week in Mozilla RelEng – June 13th, 2014 – *double edition*

I spaced and forgot to post this last week, so here’s a double edition covering everything so far this month. I’ll also be away for the next 3 Fridays, and Kim volunteered to take the reins in my stead. Now, on with it!

Major highlights:

Completed work (resolution is ‘FIXED’):

In progress work (unresolved and not assigned to nobody):

June 13, 2014 04:45 PM

June 11, 2014

Armen Zambrano G. (@armenzg)

Who doesn't like cheating on the Try server?

Have you ever forgotten about adding a platform to your Try push and had to push again?
Have you ever wished to *just* make changes to a file without having to build it first?
Well, this is your lucky day!

In this wiki page, I describe how to trigger arbitrary jobs on your try push.
As always, be gentle with how you use it, as we all share the resources.

Go crazy!

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

June 11, 2014 04:23 PM

June 10, 2014

Kim Moir (kmoir)

Talking about speaking up

We all interpret life through the lens of our previous experiences.  It's difficult to understand what each day is like for someone who has had a life fundamentally different from your own, because you simply haven't had those experiences.  I don't understand what it's like to transition from male to female while involved in an open source community.  I don't know the steps taken to become an astrophysicist.  To embark to a new country as an immigrant.  I haven't struggled to survive on the streets as a homeless person, or as a person battered by domestic abuse.  To understand the experiences of others, all we can do is listen and learn, with empathy.

There have been many news stories recently about women or other underrepresented groups in technology.   I won't repeat them because frankly, they're quite depressing.  They go something like this:
1.  Incident of harassment/sexism either online/at a company/in a community/at a conference
2.  People call out this behaviour online and ask the organization to apologize and take steps to prevent this in the future.
3.  People from underrepresented groups who speak up about behaviour are told that their feelings are not valid or they are overreacting.  Even worse, they are harassed online with hateful statements telling them they don't belong in tech or are threatened with sexual assault or other acts of violence.
4.  Company/community/conference apologizes and issues a written statement. Or not.
5. Goto 1

I watched an extraordinary talk the other day that provided a vivid perspective on the challenges that women in technology face and what people can do to help. Brianna Wu is head of development at Giant Spacekat, a game development company.  She gave the talk "Nine ways to stop hurting and start helping women in tech" at AltConf last week.  She is brutally honest about the problems that exist in our companies and communities, and about the steps forward to make them better.

She talks about how she is threatened and harassed online. She also discusses how random people threatening you on the internet is not just theoretical, but really frightening, because she knows it could result in actual physical violence.  The same applies to street harassment.

Here's the thing about being a woman.  I'm a physically strong person. I can run.  But I'm keenly aware that men are almost always bigger than me, and by basic tenets of physiology, stronger than me. So if a man tried to physically attack me, chances are I'd lose that fight.  So when someone threatens you, online or not, it is profoundly frightening because you fear for your physical safety. And to have that happen over and over again, like many women in our industry experience, apart from being terrifying, is exhausting and has a huge emotional toll.

I was going to summarize the points she brings up in her talk but she speaks so powerfully that all I can do is encourage you to watch the talk.

One of her final points really drives home the need for change in our industry when she says to the audience "This is not a problem that women can solve on their own....If you talk to your male friends out there, you guys have a tremendous amount of power as peers.  To talk to them and say, look dude this isn't okay.  You can't do this, you can't talk this way.  You need to think about this behaviour. You guys need to make a difference in a way that I can't."  Because when she talks about this behaviour to men, it often goes in one ear and out the next.  To be a ally in any sense of the word, you need to speak up.

THIS 1000x THIS.

Thank you Brianna for giving this talk.  I hope that when others see it they will gain some insight and feel some empathy on the challenges that women, and other underrepresented groups in the technology industry face.  And that you will all speak up too.

Further reading
Ashe Dryden's The 101-Level Reader: Books to Help You Better Understand Your Biases and the Lived Experiences of People                                                                                                           
Ashe Dryden Our most wicked problem

June 10, 2014 01:30 AM

June 04, 2014

Ben Hearsum (bhearsum)

More on “How far we’ve come”

After I posted “How far we’ve come” this morning a few people expressed interest in what our release process looked like before, and what it looks like now.

The earliest recorded release process I know of was called the “Unified Release Process”. (I presume “unified” comes from unifying the ways different release engineers did things.) As you can see, it’s a very lengthy document, with lots of shell commands to tweak/copy/paste. A lot of the things that get run are actually scripts that wrap some parts of the process – so it’s not as bad as it could’ve been.

I was around for much of the improvements to this process. A while back I wrote a series of blog posts detailing some of them. For those interested, you can find them here:

I haven’t gotten around to writing a new one for the most recent version of the release automation, but if you compare our current Checklist to the old Unified Release Process, I’m sure you can get a sense of how much more efficient it is. Basically, we have push-button releases now. Fill in some basic info, push a button, and a release pops out:

June 04, 2014 06:57 PM

How far we’ve come

When I joined Mozilla’s Release Engineering team (Build & Release at the time) back in 2007, the mechanics of shipping a release were a daunting task with zero automation. My earliest memories of doing releases are ones where I get up early, stay late, and spend my entire day on the release. I logged onto at least 8 different machines to run countless commands, sometimes forgetting to start “screen” and losing work due to a dropped network connection.

Last night I had a chat with Nick. When we ended the call I realized that the Firefox 30.0 release builds had started mid-call – completely without us. When I checked my e-mail this morning I found that the rest of the release build process had completed without issue or human intervention.

It’s easy to get bogged down thinking about current problems. Times like this make me realize that sometimes you just need to sit down and recognize how far you’ve come.

June 04, 2014 01:26 PM

June 02, 2014

Kim Moir (kmoir)

Mozilla pushes - May 2014

Here's May's monthly analysis of the pushes to our Mozilla development trees.  You can load the data as an HTML page or as a json file

This was a record-breaking month in which we surpassed our previous record of 8100+ pushes with 11000+ pushes. Gaia-try, just created in April, has become a popular branch with 29% of pushes.

General Remarks
The introduction of Gaia-try in April has been very popular; it comprised around 29% of pushes in May. The Try branch itself accounted for around 38% of pushes.
The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) accounted for around 22% of all the pushes, compared to 30% in the previous month.

May 2014 was the month with the most pushes (11711 pushes)
May 2014 had the highest pushes/day average, with 378 pushes/day
May 2014 had the highest "pushes-per-hour" average, with 22 pushes/hour
May 29th, 2014 had the highest number of pushes in one day, with 613 pushes
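As a quick sanity check, the pushes/day figure follows directly from the monthly total spread over May's 31 days:

```shell
# 11711 pushes over the 31 days of May:
awk 'BEGIN { printf "%.0f pushes/day\n", 11711 / 31 }'
# prints "378 pushes/day"
```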

May 2014 was a record-setting month: 11711 pushes!

Note that Gaia-try was added in April and has quickly become a high volume branch

I changed the format of this pie chart this month. It previously seemed to be based on several months of data, but not all the data from the previous year, so I changed it to be based only on the current month's data, which seems more logical.

June 02, 2014 09:43 PM

May 30, 2014

Ben Hearsum (bhearsum)

This week in Mozilla RelEng – May 30th, 2014

Major highlights:

Completed work (resolution is ‘FIXED’):

In progress work (unresolved and not assigned to nobody):

May 30, 2014 08:20 PM

May 28, 2014

Armen Zambrano G. (@armenzg)

How to create local buildbot slaves

For the longest time I have wished for *some* documentation on how to set up a buildbot slave outside of the Release Engineering setup, without needing to go through the Puppet manifests.

In a previous post, I documented how to set up a production buildbot master.
In this post, I'm only covering the slave side of the setup.

Install buildslave

virtualenv ~/venvs/buildbot-slave
source ~/venvs/buildbot-slave/bin/activate
pip install zope.interface==3.6.1
pip install buildbot-slave==0.8.4-pre-moz2 --find-links
pip install Twisted==10.2.0
pip install simplejson==2.1.3
NOTE: You can figure out what to install by looking in here:

Create the slaves

NOTE: I already have a build master and a test master on my localhost, on ports 9000 and 9001 respectively.
buildslave create-slave /builds/build_slave localhost:9000 bld-linux64-ix-060 pass
buildslave create-slave /builds/test_slave localhost:9001 tst-linux64-ec2-001 pass

Start the slaves

On a normal day, you can do this to start your slaves up:
 source ~/venvs/buildbot-slave/bin/activate
 buildslave start /builds/build_slave
 buildslave start /builds/test_slave

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

May 28, 2014 07:05 PM

May 23, 2014

Ben Hearsum (bhearsum)

This week in Mozilla RelEng – May 23rd, 2014

Major highlights:

Completed work (resolution is ‘FIXED’):

In progress work (unresolved and not assigned to nobody):

May 23, 2014 08:40 PM

Armen Zambrano G. (@armenzg)

Technical debt and getting rid of the elephants

Recently, I had to deal with code where I knew there were elephants and I did not want to see them. Namely, adding a new build platform (mulet) and running a b2g desktop job through mozharness on my local machine.

As I passed by, I decided to spend some time going to get some peanuts to coax at least a few of those elephants out of there:

I know I can't use "the elephant in the room" metaphor like that, but I just did, and you know what I meant :)

Well, how do you deal with technical debt?
Do you take a chunk every time you pass by that code?
Do you wait for the storm to pass by (you've shipped your awesome release) before throwing the elephants off the ship?
Or else?

Let me know; I'm eager to hear about your own de-elephantization stories.

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

May 23, 2014 03:35 AM

May 22, 2014

Peter Moore (pmoore)

Protected: Setting up a Mozilla vcs sync -> mapper development environment

This post is password protected. You must visit the website and enter the password to continue reading.

May 22, 2014 02:35 PM

May 16, 2014

Ben Hearsum (bhearsum)

This week in Mozilla RelEng – May 16th, 2014

Major highlights:

Completed work (resolution is ‘FIXED’):

In progress work (unresolved and not assigned to nobody):

May 16, 2014 07:50 PM

Kim Moir (kmoir)

20 years on the web

Note: I started writing this a long time ago as part of #mynerdstory but never got around to finishing it until recently.  So I changed it a bit when I noticed it had been over 20 years since I first used the internet.

I found this picture the other day.  It's me on graduation day at Acadia,  twenty years ago this month.  A lot has changed since then.

In the picture, I'm in Carnegie Hall, where the Computer Science department had their labs, classrooms and offices. I'm sitting in front of a Sun workstation, which ran an early version of Mosaic.  I recall being awestruck the first time I saw a web browser display a web page.   I think it was NASA's web page.  My immediate reaction was that I wanted to work on that, to be on the web.

As I've mentioned before, my Dad was a manager at a software and services firm in Halifax.  He brought home our first computer when I was 9.  Dad was always upgrading the computers or fixing them, and I'd watch him and ask lots of questions about how the components connected together.  In junior high, I taught myself BASIC from the manual, wrote a bunch of simple programs, and played so many computer games that my dreams at night became pixelated.  When I was 16, I started working at my Dad's office doing clerical work during the school break.  One of my tasks was to run a series of commands to connect to BITNET via an acoustic coupler using Kermit and download support questions from their university customers.  I thought it was so magical that these computers that were so physically distant could connect and communicate.

In high school, I took computer science in grade 12 and we wrote programs in Pascal on Apple IIs.  My computer science teacher was very enthusiastic and welcoming.  He taught us sorting algorithms, binary trees, and lots of other advanced topics that weren't on the curriculum.  Thanks Mr. B.

When it was time to apply to university,  I didn't apply to computer science.  I don't know why; my grades were fine and I certainly had the background.  I really lacked the self-confidence to believe I could do it.  In retrospect, I would have been fine.  I enrolled at Acadia in their Bachelor of Business Administration program, probably because I liked reading the Globe and Mail.

I arrived on campus with a PC to write papers and do my accounting assignments.  The reason I had access to a computer was that the company my Dad worked for allowed their employees to borrow a computer for home use for a year at a time, then return it.  Otherwise, they were prohibitively expensive at the time.  In my third year of university I decided that I was better suited to computer science than business, so I started taking all my elective courses from the computer science faculty.  I still wanted to graduate in four years, so I didn't switch majors.  It was such a struggle to scrape together the money from part-time jobs and student loans to pay for four years of university, let alone six.

One of my part-time jobs was helping people in the university computer labs with questions and fixing problems.  Everything was very text-based back then.  We used Archie to search for files, read books transcribed by Project Gutenberg, and used uudecode to assemble pictures posted to Usenet groups.  I applied for a Unix account on the Sun system that only the Computer Science students had access to.   It was called dragon, and the head sysadmin had a sig that said "don't flame me, I'm on dragon".  I loved learning all the obscure yet useful Unix commands.

In my third year I had a 386 portable running Windows 3.1.  I carried this computer all over campus, plugging it in at the student union centre and working on finance projects with my business school colleagues.  By my fourth year, they had installed Sun workstations in the Computer Science labs with Mosaic installed.   This was my first view of the world wide web.   It was beautiful.  The web held such promise.

I applied for 40 different jobs before I graduated from Acadia and was offered a job in Ottawa working for the IT department of Revenue Canada.  A ticket out of rural Nova Scotia! I didn't like my first job there that much but they paid for networking and operating system courses that I took at night.  I was able to move to a new job in a year and started being a sysadmin for their email servers that served 30,000 users.  It was a lot of fun and I learned a tremendous amount about networking, mail related protocols and operating systems.  I also spent a lot of time in various server rooms across Canada installing servers.  Always bring a sweater.

I left after a few years to work at Nortel as technical support for a telephony switch that offloaded internet traffic from voice switches to a dedicated switch.  Most internet traffic back then went over modems, whose calls lasted much longer than typical voice calls and caused traffic issues.  I took a lot of courses on telephony protocols, various Unix variants and networking. I traveled to several telco customers to help configure systems and demonstrate product features. More time in cold server rooms.

Shortly after Mr. Releng and I got married we moved to Athens, Georgia, where he was completing his postdoc.  I found a great job as a sysadmin for UGA's computer systems division.  The group provided FTP, electronic courseware and email services to the campus.  We also secured a lot of hacked Linux servers set up by unknowing graduate students in various departments.  When I started, I didn't know Linux very well, so my manager just advised me to install Red Hat about 30 times, changing the options every time, learn how to compile custom kernels, and so on.  So that's what I did.  At that time you also had to compile Apache from source to include any modules such as SSL support or different databases, so I also had fun doing that.

We used to do maintenance on the computer systems between 5 and 7am once a week.  Apparently not many students are awake at that hour.  I'd get up at 4am and drive in to the university in the early morning, the air heavy with the scent of Georgia pine and the ubiquitous humidity.  My manager, M, always made a list the night before of what we had to do, how long it would take, and how long it would take to back the changes out.  His attention to detail and reluctance to ever go over the maintenance window has stayed with me over time. In fact, I'm still kind of a maintenance nerd, always figuring out how to conduct system maintenance in the least disruptive way to users.  The server room at UGA was huge and had been in operation since the 1960s.  The layers of cable under the tiles were an archeological record of the progress of cabling over the past forty years.  M typed on a DVORAK keyboard, and was one of the most knowledgeable people about all the Unix variants and how they differed. If he found a bug in Emacs or any other open source software, he would just write a patch and submit it to their mailing list.  I thought that was very cool.

After Mr. Releng finished his postdoc, we moved back to Ottawa.  I got a job at a company called OTI as a sysadmin.  Shortly after joining, my colleague J said "We are going to release an open source project called Eclipse, are you interested in installing some servers for it?"  So I set up Bugzilla, CVS, mailman, nntp servers etc.  It was a lot of fun and the project became very popular and generated a lot of traffic.  A couple years later the Eclipse consortium became the Eclipse Foundation and all the infrastructure management moved there. 

I moved to the release engineering team at IBM and started working with S, who taught me the fundamentals of release engineering.  We would spend many hours testing and implementing new features in the build and test environment, and working with the development team to implement new functionality, since we used Eclipse bundles to build Eclipse.  I have written a lot about that before on my blog so I won't reiterate.  Needless to say, being paid to work full time in an open source community was a dream come true.

A couple of years ago, I moved to work at Mozilla.  And the 20-year-old who looked at Mosaic for the first time and saw the beauty and promise of the web couldn't believe where she ended up almost 20 years later.

Many people didn't grow up with the privilege that I have, with access to computers at such a young age, and encouragement to pursue it as a career.  I thank all of you who I have worked with and learned so much from.  Lots still to learn and do!

May 16, 2014 07:40 PM

Release Engineering Special Issue

A different type of mobile farm  ©Suzie Tremmel, Creative Commons by-nc-sa 2.0

Are you a release engineer with a great story to share?  Perhaps the ingenious way that you optimized your build scripts to reduce end-to-end build time?  Or how you optimized your cloud infrastructure to reduce your IT costs significantly?  How you integrated mobile testing into your continuous integration farm?  Or are you a researcher who would like to publish your latest research in an area related to release engineering?

If so, please consider submitting a report or paper to the first IEEE Release Engineering special issue.   Deadline for submissions is August 1, 2014 and the special issue will be published in the Spring of 2015.

IEEE Release Engineering Special Issue

If you have any questions about the process or the special issue in general, please reach out to any of the guest editors.  We're happy to help!

We're also conducting a roundtable interview with several people from the release engineering community in the issue.  This should raise some interesting insights given the different perspectives that people from organizations with large scale release engineering efforts bring to the table.

May 16, 2014 07:23 PM

May 15, 2014

Massimo Gerva (mgerva)

local storage for builds

Using the instance storage space on aws

Bug 977611 enables the use of the instance storage space for builds. Instance storage comes for free with your instance and it is faster than EBS, especially if the instance type comes with SSDs. Here is how we are managing the instances:

detect if any instance storage is available

instance storage can be a single disk or multiple volumes; to detect your instance storage query this url:

$ curl 

in this case, the instance has two volumes: ephemeral0 and ephemeral1. ephemeral0 maps to

$ curl

(and ephemeral1 maps to /dev/sdc)

prepare the disk space

if the instance type has multiple disks, we need to use lvm

create a physical volume (man page) for each device (/dev/xvdb, /dev/xvdc):

dd if=/dev/zero of=/dev/xvdb bs=512 count=1
pvcreate -ff -v /dev/xvdb

create a volume group (man page):

vgcreate vg /dev/xvdb /dev/xvdc

create a logical volume (man page):

lvcreate -l 100%VG --name local vg

now format the new logical volume:

mkfs.ext4 /dev/mapper/vg-local
mount the new disk

add the following line to /etc/fstab

/dev/mapper/vg-local /builds/slave ext4 defaults,noatime 0 0

reboot, and have fun!


May 15, 2014 03:09 PM

May 13, 2014

Armen Zambrano G. (@armenzg)

Do you need a used Mac Mini for your Mozilla team? or your not-for-profit project?

If so, visit this form and fill it out by May 22nd (9 days from today).
There are a lot of disclaimers in the form. Please read them carefully.

These minis have been deprecated after 4 years of usage. Read more about it here.

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

May 13, 2014 05:38 PM

Chris AtLee (catlee)

Limiting coalescing on the build/test farm

tl;dr - as of yesterday we've limited coalescing on all builds/tests to merge at most 3 pending jobs together

Coalescing (aka queue collapsing aka merging) has been part of Mozilla's build/test CI for a long, long time. Back in the days of Tinderbox, a single machine would do a checkout/build/upload loop. If there were more checkins while the build was taking place, well, they would get built on the next iteration through the loop.

Fast forward a few years to our move to buildbot, and to having pools of machines all able to do the same builds. Now we create separate jobs in the queue for each build for each push. However, we didn't always have capacity to do all these builds in a reasonable amount of time, so we left buildbot's default behaviour (merging all pending jobs together) enabled for the majority of jobs. This means that if there are pending jobs for a particular build type, the first free machine skips all but the most recent item on the queue. The skipped jobs are "merged" into the job that was actually run.

In the case that all builds and tests are green, coalescing is actually a good thing most of the time. It saves you from doing a bunch of extra useless work.

However, not all pushes are perfect (just see how often the tree is closed due to build/test failures), and coalescing makes bisecting the failure very painful and time consuming, especially in the case that we've coalesced away intermediate build jobs.

To try and find a balance between capacity and sane results, we've recently added a limit to how many jobs can be coalesced at once.

By rigorous statistical analysis:

@catlee     so it's easiest to pick a single upper bound for coalescing and go with that at first
@catlee     did you have any ideas for what that should be?
@catlee     I was thinking 3
edmorley|sheriffduty        catlee: that sounds good to me as a first go :-)
mshal       chosen by fair dice roll? :)
@catlee     1d4
bhearsum    Saving throw failed. You are dead.
philor      wfm

we've chosen 3 as the upper bound on the number of jobs we'll coalesce, and we can tweak this as necessary.
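As a rough sketch (hypothetical code, not the actual buildbot patch), capped coalescing behaves something like this: a free machine takes the oldest few pending jobs, runs the newest of that batch, and marks the rest as merged into it:

```python
# Hypothetical sketch of capped coalescing; not the actual buildbot patch.
MAX_COALESCE = 3

def take_next_batch(pending):
    """pending is a list of jobs, oldest first.

    Returns (job_to_run, merged_jobs, still_pending)."""
    if not pending:
        return None, [], []
    batch = pending[:MAX_COALESCE]          # at most 3 jobs merge together
    still_pending = pending[MAX_COALESCE:]  # the rest stay queued
    job_to_run = batch[-1]                  # run the most recent of the batch
    merged = batch[:-1]                     # older jobs are coalesced into it
    return job_to_run, merged, still_pending
```

With five pending jobs, the first free machine would run the third, merge the first two into it, and leave the last two queued for the next machine.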

I hope this makes the trees a bit more manageable! Please let us know what you think!

As always, all our work is done in the open. See the bug with the patch here:

May 13, 2014 11:17 AM

May 12, 2014

Kim Moir (kmoir)

Mozilla pushes - April 2014

Here's April's monthly analysis of the pushes to our Mozilla development trees.  You can load the data as an HTML page or as a json file.

General Remarks

The data collected prior to 2014 could be slightly off since different data collection methods were used.

May 12, 2014 01:33 PM

Nick Thomas (nthomas)

Rethinking rsync at Mozilla

Some time ago Mozilla moved away from volunteer servers for delivering installers and updates to our end users, but we still offer two rsync modules

Both of these have been unmaintained for some time, so they are out of date (by a year) and huge (500GB), respectively. We still serve quite a lot of traffic through those modules, although there were no complaints when the service was down for a few days recently.

I’m interested to hear opinions on whether we should maintain rsync access to release bits. From the logs it’s clear that some of our former mirrors are still pulling data, which is fine if it’s intentional rather than legacy. There may be other use cases we’re not aware of, so please let us know in the comments or on bug 807543.

May 12, 2014 04:08 AM

May 09, 2014

Ben Hearsum (bhearsum)

This week in Mozilla RelEng – May 9th, 2014

This was a quieter week than most. With everybody flying home from Portland on Friday/Saturday, it took some time for most of us to get back into the swing of things.

Major highlights:

Completed work (resolution is ‘FIXED’):

In progress work (unresolved and not assigned to nobody):

May 09, 2014 08:10 PM

May 08, 2014

Aki Sasaki (aki)

brainstorm: splitting mozharness

[stating the problem]

Mozharness currently handles a lot of complexity. (It was designed to be able to, but the ideal is still elegantly simple scripts and configs.)

Our production-oriented scripts take (and sometimes expect) config inputs from multiple locations, some of them dynamic; and they contain infrastructure-oriented behavior like clobberer, mock, and tooltool, which don't apply to standalone users.

We want mozharness to be able to handle the complexity of our infrastructure, but make it elegantly simple for the standalone user. These are currently conflicting goals, and automating jobs in infrastructure often wins out over making the scripts user friendly. We've brainstormed some ideas on how to fix this, but first, some more details:

[complex configs]

A lot of the current complexity involves config inputs from many places:

We want to lock the running config at the beginning of the script run, but we also don't want to have to clone a repo or make external calls to web resources during __init__(). Our current solution has been to populate runtime configs during one of our script actions, but then to support those runtime configs we have to check multiple config locations for our script logic. (self.buildbot_config, self.test_config, self.config, ...)

We're able to handle this complexity in mozharness, and we end up with a single config dict that we then dump to the log + to a json file on disk, which can then be reused to replicate that job's config. However, this has a negative effect on humans who need to either change something in the running configs, or who want to simplify the config to work locally.
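A toy illustration of the locking idea (hypothetical code; mozharness' real ReadOnlyDict differs): the config can be populated freely during setup, and once locked, any later mutation is a hard error.

```python
# Toy illustration of config locking; mozharness' actual ReadOnlyDict differs.
class LockableDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._locked = False

    def lock(self):
        self._locked = True

    def __setitem__(self, key, value):
        if self._locked:
            raise TypeError("config is locked; runtime values belong elsewhere")
        super().__setitem__(key, value)

config = LockableDict(branch="mozilla-central")
config["platform"] = "linux64"   # fine: config is still being populated
config.lock()
# config["platform"] = "macosx"  # would now raise TypeError
```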

[in-tree vs out-of-tree]

We also want some of mozharness' config and logic to ride the trains, but other portions need to be able to handle outside-of-tree processes and config, for various reasons:

[brainstorming solutions]

Part of the solution is to move logic out of mozharness. Desktop Firefox builds and repacks moving to mach makes sense, since they're

  1. configurable by separate mozconfigs,
  2. tasks completely shared by developers, and
  3. completely dependent on the tree, so tying them to the tree has no additional downside.

However, Andrew Halberstadt wanted to write the in-tree test harnesses in mozharness, and have mach call the mozharness scripts. This broke some of the above assumptions, until we started thinking along the lines of splitting mozharness: a portion in-tree running the test harnesses, and a portion out-of-tree doing the pre-test-run machine setup.

(I'm leaning towards both splitting mozharness and using helper objects, but am open to other brainstorms at this point...)

[splitting mozharness]

In effect, the wrapper, out-of-tree portion of mozharness would be taking all of the complex inputs, simplifying them for the in-tree portion, and setting up the environment (mock, tooltool, downloads+installs, etc.); the in-tree portion would take a relatively simple config and run the tests.

We could do this by having one mozharness script call another. We'd have to fix the logging bug that causes us to double-log lines when we instantiate a second BaseScript, but that's not an insurmountable problem. We could also try execing the second script, though I'd want to verify how that works on Windows. We could also modify our buildbot ScriptFactory to be able to call two scripts consecutively, after the first script dynamically generates the simplified config for the second script.

We could land the portions of mozharness needed to run test harnesses in-tree, and leave the others out-of-tree. There will be some duplication, especially in the mozharness.base code, but that's changing less than the scripts and mozharness.mozilla modules.

We would be able to present a user-friendly "inner" script with limited inputs that rides the trains, while also allowing for complex inputs and automation-oriented setup beforehand in the "outer" script. We'd most likely still have to allow for automation support in the inner script, if there's some reporting or error checking or other automation task that's needed after the handoff, but we'd still be able to limit the complexity of that inner script. And we could wrap that inner script in a mach command for easy developer use.
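The outer/inner handoff described above could be sketched like this (script names and config keys are invented for illustration, not actual mozharness code):

```python
# Hypothetical sketch of the outer/inner split; names and keys are invented.
import json

def outer_script(buildbot_config, defaults):
    """Out-of-tree: resolve complex inputs into a simplified config file."""
    simple = dict(defaults)
    simple["test_suite"] = buildbot_config["properties"]["suite"]
    with open("inner_config.json", "w") as f:
        json.dump(simple, f)
    return "inner_config.json"

def inner_script(config_path):
    """In-tree: only ever sees the simplified config."""
    with open(config_path) as f:
        config = json.load(f)
    return "running %s at log level %s" % (config["test_suite"],
                                           config["log_level"])

path = outer_script({"properties": {"suite": "mochitest-1"}},
                    {"log_level": "info"})
print(inner_script(path))  # running mochitest-1 at log level info
```

The simplified JSON config on disk doubles as the user-friendly entry point: a developer can hand-write one and invoke the inner script directly, without any of the outer script's automation inputs.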

[helper objects]

Currently, most of mozharness' logic is encapsulated in self. We do have helper objects: the BaseConfig and the ReadOnlyDict self.config for config; the MultiFileLogger self.log_obj that handles all logging; MercurialVCS for cloning, ADBDeviceHandler and SUTDeviceHandler for mobile device wrangling. But a lot of what we do is handled by mixins inherited by self.

A while back I filed a bug to create a LocalLogger and BaseHelper to enable parallelization in mozharness scripts. Instead of cloning 90 locale repos serially, we could create 10 helper objects that each clone a repo in parallel, and launch new ones as the previous ones finish. This would have simplified Armen's parallel emulator testing code. But even if we're not planning on running parallel processes, creating a helper object allows us to simplify the config and logic in that object, similar to the "inner" script if we split mozharness into in-tree and out-of-tree instances, which could potentially also be instantiated by other non-mozharness scripts.

Essentially, as long as the object has a self.log_obj, it will use that for logging. The LocalLogger would log to memory or disk, outside of the main script log, to avoid parallel log interleaving; we would use this if we were going to run the helper objects in parallel. If we wanted the helper object to stream to the main log, we could set its log_obj to our self.log_obj. Similarly with its config. We could set its config to our self.config, or limit what config we pass to simplify.

(Mozharness' config locking is a feature that promotes easier debugging and predictability, but in practice we often find ourselves trying to get around it somehow. Other config dicts, self.variables, editing self.config in _pre_config_lock() ... Creating helper objects lets us create dynamic config at runtime without violating this central principle, as long as it's logged properly.)
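A minimal sketch of the helper-object pattern (class names and APIs here are invented for illustration): each helper carries its own minimized config and its own logger, so its log can be kept separate from the main script log when run in parallel.

```python
# Invented-for-illustration sketch of the helper-object pattern.
class LocalLogger:
    """Buffers log lines in memory, outside the main script log."""
    def __init__(self):
        self.lines = []

    def info(self, message):
        self.lines.append(message)

class CloneHelper:
    """Each helper carries its own (minimized) config and its own logger."""
    def __init__(self, repo, config, log_obj):
        self.repo = repo
        self.config = config      # minimized config, not the full script config
        self.log_obj = log_obj    # LocalLogger, or the main script's logger

    def run(self):
        self.log_obj.info("cloning %s into %s" % (self.repo, self.config["dest"]))
        # ... the real clone would happen here ...

helpers = [CloneHelper("l10n/repo-%d" % i, {"dest": "build/l10n"}, LocalLogger())
           for i in range(3)]
for h in helpers:
    h.run()  # could be launched in parallel; each log stays separate
```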

Because this "helper object" solution overlaps considerably with the "splitting mozharness" solution, we could use a combination of the two to great effect.

[functions and globals]

This idea completely alters our implementation of mozharness, by moving self.config to a global config, directly calling logging methods (or wrapped logging methods). By making each method a standalone function that's only slightly different from a standard python function, it lowers the bar for contribution or re-use of mozharness code. It does away with both the downsides and benefits of objects.

The first, large downside I see is this solution appears incompatible with the "helper objects" solution. By relying on a global config and logging in our functions, it's difficult to create standalone helpers that use minimized configs or alternate logging configurations. I also think the global logging may make the double-logging bug more prevalent.

It's quite possible I'm downplaying the benefit of importing individual functions like a standard python script. There are decorators to transform functions into class methods and vice versa, which might allow for both standalone functions and object-based methods with the same code.

[related links]

  • Jordan Lund has some ideas + wip patches linked from bug 753547 comment 6.
  • Andrew Halberstadt's Sharing code not always a good thing and How to deal with IFFY requirements
  • My mozharness core principles example scripts+configs and video
  • Lars Lohn's Crouching Argparse Hidden Configman. Afaict configman appears to solve similar problems to mozharness' BaseConfig, but Argparse requires python 2.7 and mozharness locks the config.


    May 08, 2014 04:09 AM

    May 07, 2014

    Ben Hearsum (bhearsum)

    Redo – Utilities to retry Python callables

    We deal with a lot of flaky things in RelEng. The network can drop. Code can have race conditions. Servers can go offline temporarily. Freak errors can happen (more often than you’d think). One of the ways we’ve learned to cope with this is to add “retry” behaviour to damn near everything that could fail intermittently. We use it so much that we’ve got a Python library and command line tool that are used all over the place.

    Last week I finally got around to packaging and publishing ours, and I’m happy to present: Redo – Utilities to retry Python callables. Redo provides a decorator, context manager, plain old function, and even a command line tool to retry all sorts of things that may break. It’s very simple to use; here are some examples from the docs:
    The plain old function:

    from redo import retry
    from requests import HTTPError

    def maybe_raises(foo, bar=1):
        return 1

    def cleanup():
        pass  # clean up any partial state before the next attempt

    ret = retry(maybe_raises, retry_exceptions=(HTTPError,),
                cleanup=cleanup, args=(1,), kwargs={"bar": 2})

    The decorator:

    from redo import retriable

    @retriable(attempts=100, sleeptime=10)
    def foo():
        pass  # flaky work that may raise

    The context manager:

    from redo import retrying
    from requests import HTTPError

    def foo(a, b):
        pass  # flaky work that may raise HTTPError

    with retrying(foo, retry_exceptions=(HTTPError,)) as retrying_foo:
        r = retrying_foo(1, 3)

    You can grab version 1.0 from PyPI, or find it on Github, where you can send issues or pull requests.

    May 07, 2014 07:07 PM

    May 06, 2014

    Kim Moir (kmoir)

    Releng 2014 invited talks available

    On April 11,  there was a Releng workshop held at Google in Mountain View. The two keynote talks and panel at the end of the day were recorded and made available on the Talks at Google channel on YouTube.  Thank you Google!

    Moving to mobile: The challenges of moving from web to mobile releases, Chuck Rossi, Facebook

    Some interesting notes from Chuck's talk:
    The 10 Commandments of Release Engineering Dinah McNutt, Google

    Some notes from Dinah's talk. 
    • Release engineering is accelerating the path for development to operations
    • You want to be able to reproduce your build environment and source code management system if you have to recreate a very old build
    • Configuration management and release engineering as disciplines will probably merge over the next few years
    • Reproducibility is a virtue. Binaries don't belong in SCMs.  However, it's important to be able to reproduce binaries.  If you do need a repo with binaries, put them in a separate repo. 
    • Use the right tool for the job, you will have multiple tools.  Both commercial and open source.
    • View the job of a release engineer as making a developer's job easier, by setting up tooling and best practices.
    • Package management provides auditing, upgrading, installation and removal.  Tars and jars are not package managers.
    • You need to think about your upgrade process before you release 1.0.
    • Customers find problems we cannot find ourselves. Even if we're dogfooding.
    • As release engineers, step back and look at the big picture.  Look and see how we can make things better from a cost perspective so we have the resources we need to do our jobs.
    • It's a great year to be a release engineer. Dinah is on the organizing committee for the Release Engineering Summit on June 20 in Philadelphia, part of USENIX. There is also one as part of LISA in Seattle in November.  Overwhelming interest for a first-time summit in terms of submissions!
    Closing discussion panel:

    Stephany Bellomo, one of my colleagues from the organizing committee for this workshop, moderated the panel.  Really interesting discussions, well worth a listen.  I liked that the first question was "What is your worst operational nightmare and how did you recover from it?"  I love war stories.

    As an aside, we charged $50 per attendee for this workshop.  We talked to other people who had organized similar events and they suggested this would be an appropriate fee.  I've read that if you don't charge a fee for an event, you have more no-shows on the day of the event because, psychologically, people attach a lesser value to an event they didn't pay for.  However, we didn't have many expenses to pay for the workshop other than speaker gifts, Eventbrite fees and badges.  Google provided the venue and lunch; again, thank you for the sponsorship. So we donated $1,531.00 USD to each of the following organizations from the remaining proceeds.

    YearUp's mission, in their words: "Year Up empowers low-income young adults to go from poverty to professional careers in a single year."  I know Mozilla has partnered with YearUp to provide mentoring opportunities within the IT group and it was an amazing experience for all involved. 
    The second organization we donated to is the Tanzania Education Fund.  This organization was one that Stephany mentioned since she had colleagues who were involved with it for many years.  They provide pre-school, elementary, secondary and high school education for students in Tanzania.   Secondary education is not publicly funded in Tanzania.  In addition, 50% of their students are girls, in an area where education for girls is given low priority.  Education is so important to empower people.

    Thanks to all those that attended and spoke at the workshop!

    May 06, 2014 06:40 PM

    Aki Sasaki (aki)

    Gaia Try

    We're now running Gaia tests on TBPL against Gaia pull requests.

    John Ford wrote a commit hook in bug 989131 to push an update to the gaia-try repo that looks like this. The buildbot test scheduler master is polling this repo; when it sees a change, it triggers tests against the latest b2g desktop builds. The test jobs download the pre-built desktop binaries, and clone the appropriate pull-request Gaia repo and revision on top, then run the tests. The buildbot work was completed in bug 986209.

    This should allow us to verify our code doesn't break anything before it's merged to Gaia proper, saving human time and reducing code churn.

    Armen pointed out how gaia code changes currently show up in the push count via bumper processes, but that only reflects merged pull requests, not these un-reviewed pull requests. Now that we've turned on gaia-try, this is equivalent to another mozilla-inbound (an additional 10% of our push load, iirc). Our May pushes should see a significant bump.


    May 06, 2014 06:15 PM

    May 05, 2014

    Armen Zambrano G. (@armenzg)

    Releng goodies from Portlandia!

    Last week, Mozilla's Release Engineering met at the Portland office for a team week.
    The week was packed with talks and several breakout sessions.
    We recorded a lot of our sessions and have put them all here for your enjoyment (with associated slide decks where applicable)!

    Here's a brief list of the talks you can find:
    Follow us at @MozReleng and Planet Releng.

    Many thanks to jlund for helping me record it all.

    UPDATE: added thanks to jlund.

    The Releng dreams are alive in Portland

    Creative Commons License
    This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

    May 05, 2014 08:03 PM

    Kim Moir (kmoir)

    Remote work in review

    I recently read two books on remote work: Scott Berkun's The Year Without Pants and Remote: Office Not Required by Jason Fried and David Heinemeier Hansson.  The first book describes Scott Berkun's year as a remote manager at Automattic, which runs WordPress.com.  The second book describes the authors' experiences at the fully distributed company 37signals, which has since renamed itself to Basecamp to reflect the importance of its flagship product.  Both books were interesting reflections on the nature of remote work in the context of their respective companies.  They were not books that addressed the nature of remote work in general, but described the approach each company found successful.  Of the two books, I would really recommend reading The Year Without Pants.  Remote isn't as compelling, but it's a short read.

    Some notes from "The Year without Pants" 

    "To work at a remote company demanded great communication skills and everyone had them"

    Chapter 4, on how culture always wins, is a fantastic read; it's available for free on his website. 

    "Trust is everything". 

    "1. Hire great people
     2. Set good priorities
     3. Remove distractions
     4. Stay out of the way"

    In other words, treat people like grown ups and they will do good work.
    "In every meeting in every organization around the world where bad behavior is happening, there is someone with the most power in the room who can do something about it.  What that person does shapes the culture.  If the most powerful person is silent, this signals passive acceptance of whatever is going on."

    Wow, this is very significant.  If you are the most powerful person in the room, speak up and call out bad behaviour.  The people with less power are often hesitant to speak up because there may be consequences for them and they feel they lack authority.

    "Hire self sufficient, passionate people"

    I often get questions from people who don't work at home how I don't get distracted and goof off all day since I work from home.  It's simple.  I love my job.  It's a lot of fun.  I want to be shipping, not slacking.

    Shipping every day gives people a sense of accomplishment.  Many bug fixes are deployed to WordPress.com every day. There are no gatekeepers to deployment, but the people deploying a change are expected to watch the site for a few hours afterward to ensure there aren't unexpected problems.

    Some notes from "Remote: Office Not Required"

    The book suggests asking managers or employees who work in an office to work from home a few days a week, to level the playing field with respect to communications between employees who work in an office and those who are remote.  This will ensure they appreciate what hinders communication and take steps for improvement.  And it will reduce the tendency to treat remote workers as second class citizens and cut them out of essential conversations.

    "...coming into the office just means that people have to put on pants.  There is no guarantee of productivity."

    "If you view those who work under you as capable adults who will push themselves to excel even when you're not breathing down their necks, they'll delight you in return."

    Again, trust people and have high expectations of them and you'll be rewarded with excellence. 

    The authors note that international exposure is good as a selling point with clients.  Hiring around the world increases the talent pool available, but is not without tax or legal complications.  Also, given the degree of written communication with remote work, it's best to hire people with the language skills that can thrive in this situation.  

    The book stresses that workflow tools need to be available to all team members at all times in order to be productive, e.g. recording the state of a project in a wiki or bug tracker, and recording meetings.  If a team member is working in a timezone offset from the majority of the team and doesn't have this in place, it can be a productivity drain.

     "Would-be remote workers and managers have a lot to learn from how the open source software movement has conquered the commercial giants over the past few decades. Open source is a triumph of asynchronous collaboration and communication like few the world has ever seen."

    Absolutely.  I learned so much working in open source for so many years.

    The authors also mention that you'll have to worry about your employees overworking, not underworking.  Because the office is physically in your home, it's easy to get sucked in at all hours to just work on one little thing that takes longer than you expect.

    My thoughts on remote work
    If you had asked me five years ago if I ever thought I'd work full time from my home, my answer would have been a definitive no.  But I wanted to work for Mozilla, and wasn't interested in moving to a city where they had a physical office. Here are my personal suggestions for working successfully on a remote team, given my two years of experience being a part of one:

    As an aside, I was supposed to be at a Mozilla work week in Portland this past week.  But I didn't fly there because I came down with a bad cold.  Despite this, I could connect to the room they were in, see all the talks they were giving, and also give a presentation.  This was so excellent.  Since we are so used to being a distributed team, having one person remote when we were supposed to all be together wasn't a problem.  We already had the culture and tools in place to accommodate this. Thank you Mozilla releng for being such an amazing team to work with.

    My idea for another book on remote work would be one with 10+ chapters, each written by an employee from a different company, about how they approach it, what tools they use and so on. I think this could be a very interesting read.

    I'll close with some thoughts from Scott Berkun's book on whether remote work is for everyone 

    "For me,  I know that for any important relationship I'd want to be physically around that person as much as possible. If I started a rock band or a company, I'd want to share the same physical space often.  The upsides outweigh the downsides.  However, if the people I wanted to work with were only available remotely, I'm confident that we could do great work from thousands of miles away."

    What do you think are the keys to successfully working as a remote employee?

    Further reading
    John O'Duinn's We are all Remoties Talk

    May 05, 2014 02:23 PM

    May 01, 2014

    Chris Cooper (coop)

    Dispatches from the releng team week: Portland

    Releng has been much more diligent during our current team week about preparing presentations and, more importantly, recording sessions for posterity.

    Sessions are still ongoing, but the list of presentations is in the wiki. We will continue to add links there.

    Special thanks to Armen for helping remoties get dialed-in and for getting everything recorded.

    May 01, 2014 09:11 PM

    April 29, 2014

    Peter Moore (pmoore)

    How we do automated mobile device testing at Mozilla – Part 3

    Video of this presentation from Release Engineering work week in Portland, 29 April 2014

    Part 3: Keeping the devices running

    So in Part 1 and 2, we saw how Buildbot tegra and panda masters can assign jobs to Buildbot slaves, and that these slaves run on foopies, and that these foopies then connect to the SUT Agent on the device, to deploy and perform the tests, and pull back results.

    However, over time, since these devices can fail, how do we make sure they are running ok, and handle the case that they go awol?

    The answer has two parts:

    1. watch_devices.sh
    2. mozpool

    What is watch_devices.sh?

    You remember that in Part 2, we said you need to create a directory under /builds on the foopy for any device that foopy should be taking care of.

    Well, there is a cron job installed under /etc/cron.d/foopy that takes care of running watch_devices.sh every 5 minutes.

    This script will look for device directories under /builds to see which devices are associated with this foopy. For each of these, it will check there is a buildbot slave running for that device. It automatically starts buildbot slaves as necessary if they are not running, but it also checks the health of the device, using the verification tools of SUT tools (discussed in Part 2). If it finds a problem with a device, it will shut down the buildbot slave, so that it does not get new jobs. In short, it keeps the state of the buildbot slave consistent with what it believes the availability of the device to be. If the device is faulty, it brings down the buildbot slave for that device. If the device is healthy and passing the verification tests, it will start up the buildbot slave if it is not already running.

    It also checks the “disabled” state of the device from slavealloc, and makes sure if it is “disabled” in slavealloc, that the buildbot slave will be shutdown.

    Therefore, if you need to disable a device, mark it as disabled in slavealloc; watch_devices.sh, running from a cron job on the foopy, will bring down the buildbot slave for the device.
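    The reconciliation described above boils down to a simple predicate plus a start/stop decision. Here is a hypothetical Python sketch of that logic; the function and parameter names are invented for illustration, and this is not the script's actual code:

```python
def slave_should_run(device_verified, disabled_in_slavealloc, error_flag_present):
    """Keep the buildbot slave's state consistent with the device's
    availability: run the slave only if the device passed verification,
    is not disabled in slavealloc, and has no error.flg set."""
    return device_verified and not disabled_in_slavealloc and not error_flag_present


def reconcile(device, slave_running, should_run, start_slave, stop_slave):
    """Start or stop the slave so reality matches the desired state.
    start_slave/stop_slave are injected callables (hypothetical)."""
    if should_run and not slave_running:
        start_slave(device)
    elif not should_run and slave_running:
        stop_slave(device)
```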

    Where are the log files of watch_devices.sh?

    They are on the foopy:

    If during a buildbot test we determine that a device is not behaving properly, how do we pull it out of use?

    If a serious problem is found with a device during a buildbot job, the buildbot job will create an error.flg file under the device directory on the foopy. This signals to watch_devices.sh that when that job has completed, it should kill the buildbot slave, since the device is faulty. It will not respawn a buildbot slave while that error.flg file remains. Once per hour, it deletes the error.flg file, to force another verification test of the device.
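    The hourly expiry of error.flg can be sketched as a simple file-age check. This is illustrative Python, assuming the flag's modification time marks when the fault was recorded; the real script may track this differently:

```python
import os
import time

ERROR_FLAG_MAX_AGE = 60 * 60  # one hour, per the behaviour described above


def error_flag_expired(flag_path, now=None):
    """True once error.flg is over an hour old, meaning it should be
    deleted so the device gets another verification attempt."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(flag_path)) > ERROR_FLAG_MAX_AGE


def maybe_clear_error_flag(device_dir):
    """Delete a stale error.flg under the device's /builds directory."""
    flag = os.path.join(device_dir, "error.flg")
    if os.path.exists(flag) and error_flag_expired(flag):
        os.remove(flag)
        return True
    return False
```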

    But wait, I heard that mozpool verifies devices and keeps them alive?

    Yes and no. Mozpool is a tool (written by Dustin) to take care of the life-cycle management of panda boards. It does not manage tegras. Remember: tegras cannot be automatically reimaged – you need fingers to press buttons on the devices, and physically connect a laptop to them. Pandas can. This is why mozpool only takes care of pandas.

    Mozpool is made up of three layered components. From the mozpool overview:

    1. Mozpool is the highest-level interface, where users request a device in a certain condition, and Mozpool finds a suitable device.
    2. Lifeguard is the middle level. It manages the state of devices, and knows how to cajole and coddle them to achieve reliable behavior.
    3. Black Mobile Magic is the lowest level. It deals with devices directly, including controlling their power and PXE booting them. Be careful using this level!

    So the principle behind mozpool is that all the logic around getting a panda board, making sure it is clean and ready to use, and ensuring it contains the right OS image you want to run, can be handled outside of the buildbot jobs. You simply query mozpool, tell it you’d like a device, specify the operating system image you want, and it will get you one.

    In the background it is monitoring the devices and checking they are ok, only handing you a “good” device, and cleaning up when you finish with it.
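    Requesting a device from mozpool is an HTTP call. Below is a minimal Python sketch of what such a request might look like; the endpoint path (`/api/device/any/request/`), the server URL, and the field names in the JSON body are assumptions for illustration, not the documented mozpool API, so check the mozpool source for the real schema:

```python
import json
import urllib.request

MOZPOOL_URL = "http://mozpool.example.com"  # hypothetical server


def build_device_request(image, duration=3600, requester="releng@example.com"):
    """Build the JSON body for a device request: 'give me a device
    running this image, for this long'. Field names are illustrative."""
    return json.dumps({
        "requester": requester,
        "duration": duration,
        "image": image,
    })


def request_any_device(image, base_url=MOZPOOL_URL):
    """POST the request to mozpool and return its parsed reply.
    The endpoint path here is an assumption."""
    req = urllib.request.Request(
        base_url + "/api/device/any/request/",
        data=build_device_request(image).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```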

    So watch_devices and mozpool are both routinely running verification tests against the pandas?

    No. This used to be the case, but now the verification step for pandas simply queries mozpool to get the status of the device. It no longer runs verification tests directly against the panda, to avoid having two systems doing the same thing. It trusts mozpool to report the correct state.

    So if I dynamically get a device from mozpool when I ask for one, does that mean my buildbot slave might get different devices at different times, depending on which devices are currently available and working at the time of the request?

    No. Since the name of the buildbot slave is the same as the name of the device, the buildbot slave is bound to that one device only. This means it cannot take advantage of the “give me a panda with this image, I don’t care which one” model.

    Summary part 3

    So we’ve learned:

    < Part 2

    April 29, 2014 06:02 AM

    How we do automated mobile device testing at Mozilla – Part 2

    Video of this presentation from Release Engineering work week in Portland, 29 April 2014

    Part 2: The foopy, Buildbot slaves, and SUT tools

    So how does buildbot interact with a device, to perform testing?

    By design, Buildbot masters require a Buildbot slave to perform any job. For example, if we have a Windows slave for creating Windows builds, we would expect to run a Buildbot slave on the Windows machine, and this would then be assigned tasks from the Buildbot master, which it would perform, and feed results back to the Buildbot master.

    In the mobile device world, this is a problem:

    1. Running a slave process on the device would consume precious limited resources
    2. Buildbot does not run on phones, or mobile boards

    Thus was born …. the foopy.

    What the hell is a foopy?

    A foopy is a machine, running CentOS 6.2, that is devoted to the task of interfacing with pandas or tegras, and running buildbot slaves on their behalf.

    My first mistake was thinking that a “foopy” is a special piece of hardware. This is not the case. It is nothing more than a regular CentOS 6.2 machine: just a regular server, with no special physical connection to the mobile device boards. It is simply a machine that has been set aside for this purpose and that has network access to the devices, just like other machines on the same network.

    For each device that a foopy is responsible for, it runs a dedicated buildbot slave. Typically each foopy serves between 10 and 15 devices. That means it will have around 10-15 buildbot slaves running on it, in parallel (assuming all devices are running ok).

    When a Buildbot master assigns a job to a Buildbot slave running on the foopy, it will run the job inside its slave, but parts of the job will involve communicating with the device, pushing binaries onto it, running tests, and gathering results. As far as the Buildbot master is concerned, the slave is the foopy, and the foopy is doing all the work. It doesn’t need to know that the foopy is executing code on a tegra or panda. As far as the device is concerned, it is receiving tasks over the SUT Agent listener network interface, and performing those tasks.

    So does the foopy always connect to the same devices?

    Yes. Each foopy has a static list of devices for it to manage jobs for.

    How do you see which devices a foopy manages?

    If you ssh onto the foopy, you will see the devices it manages as subdirectories under /builds:

    pmoore@fred:~/git/tools/sut_tools master $ ssh foopy106
    Last login: Mon Apr 28 22:01:18 2014 from
    Unauthorized access prohibited
    [ ~]$ find /builds -maxdepth 1 -type d -name 'tegra-*' -o -name 'panda-*'
    [ ~]$

    How did those directories get created?

    Manually. Each directory contains artefacts related to that panda or tegra, such as log files for verify checks, error flags if it is broken, disable flags if it has been disabled, etc. More about this later. Just know at this point that if you want that foopy to look after that device, you better create a directory for it.

    So the directory existence on the foopy is useful to know which devices the foopy is responsible for, but how do you know which foopy manages an arbitrary device, without logging on to all foopies?

    In the tools repository, the file buildfarm/mobile/devices.json also defines the mapping between foopy and device. Here is a sample:

     "tegra-010": {
         "foopy": "foopy109",
         "pdu": "",
         "pduid": ".AA1"
     },
     "tegra-011": {
         "foopy": "foopy109",
         "pdu": "",
         "pduid": ".AA1"
     },
     "tegra-012": {
         "foopy": "foopy109",
         "pdu": "",
         "pduid": ".AA1"
     },
     "panda-0168": {
         "foopy": "foopy45",
         "relayhost": "",
         "relayid": "2:6"
     },
     "panda-0169": {
         "foopy": "foopy45",
         "relayhost": "",
         "relayid": "2:7"
     },
     "panda-0170": {
         "foopy": "foopy46",
         "relayhost": "",
         "relayid": "1:1"
     },

    So what if the devices.json lists different foopy -> devices mappings than the foopy filesystems list? Isn’t there a danger this data gets out of sync?

    Yes, there is nothing checking that these two data sources are equivalent. For example, if /builds/tegra-0123 was created on foopy39, but devices.json said tegra-0123 was assigned to foopy65, nothing would report this difference, and we would have non-deterministic behaviour.
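    Since nothing enforces consistency, a small audit script could catch this kind of drift. Below is a hypothetical sketch (function name and data shapes invented for illustration) that compares the devices.json mapping with the device directories actually found under /builds on each foopy:

```python
def find_mismatches(devices_json, foopy_dirs):
    """Compare the foopy -> device mapping in devices.json with the
    device directories present under /builds on each foopy.

    devices_json: {"tegra-0123": {"foopy": "foopy39"}, ...}
    foopy_dirs:   {"foopy39": ["tegra-0123", ...], ...}

    Returns (missing_dirs, orphan_dirs): devices whose assigned foopy
    lacks a /builds directory for them, and directories with no matching
    devices.json assignment.
    """
    missing_dirs, orphan_dirs = [], []
    for device, info in devices_json.items():
        foopy = info.get("foopy")
        if foopy in (None, "None"):
            continue  # unassigned, e.g. loaned out
        if device not in foopy_dirs.get(foopy, []):
            missing_dirs.append(device)
    assigned = {(info.get("foopy"), dev) for dev, info in devices_json.items()}
    for foopy, dirs in foopy_dirs.items():
        for dev in dirs:
            if (foopy, dev) not in assigned:
                orphan_dirs.append(dev)
    return missing_dirs, orphan_dirs
```

Run against the example above (tegra-0123 assigned to foopy65 in devices.json but with a directory on foopy39), it would flag the device in both lists.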

    Why is the foopy data not in slavealloc?

    Currently the fields for the slaves are static across different slave types – so if we added a field for “foopy” for the foopies, it would also appear for all other slave types, which don’t have a foopy association.

    What is that funny other data in the devices.json file?

    The “pdu” and “pduid” are the coordinates required to determine the physical power supply of the tegra. These are the values that you call the PDU API with to enable/disable power for that particular tegra.

    The “relayhost” and “relayid” are the equivalent values for the panda power supplies.
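    Given a devices.json entry, picking the right coordinates for "pulling the power" is just a lookup keyed on which fields are present. A minimal sketch (the function name and the example hostnames below are invented):

```python
def power_coordinates(device_name, devices_json):
    """Return ("pdu", host, outlet) for a tegra, or
    ("relay", host, bank_and_relay) for a panda, i.e. the values you
    would pass to the PDU or relay-board API to cut power."""
    info = devices_json[device_name]
    if "pdu" in info:
        return ("pdu", info["pdu"], info["pduid"])
    if "relayhost" in info:
        return ("relay", info["relayhost"], info["relayid"])
    raise KeyError("no power coordinates for %s" % device_name)
```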

    Where does this data come from?

    This data is maintained in IT’s inventory database. We duplicate this information in this file.


    So is a PDU and a relay board essentially the same thing, just one is for pandas, and the other for tegras?


    What about if we want to write comments in this file? json doesn’t support comments, right?

    For example, you may want to put a comment explaining why a tegra is not assigned to a PDU. Since json does not support comments, we add a _comment field, e.g.:

     "tegra-024": {
         "_comment": "Bug 727345: Assigned to WebQA",
         "foopy": "None"
     },

    Is there any sync process between inventory and devices.json to guarantee integrity of the relayboard and PDU data?

    No. We do not sync the data, so there is a risk our data can get out-of-sync. This could be solved by having an auto-sync to the devices.json file, or using inventory as the data source, rather than the devices.json file.

    So how do we interface with the PDUs / relay boards to hard reboot devices?

    This is done using a script in the sut_tools folder.

    Is there anything else useful in this “sut tools” folder?

    Yes, lots. This provides scripts for doing all sorts, like deploying artefacts on tegras and pandas, rebooting, running smoke tests and verifying the devices, cleaning up devices, accessing device logs, etc.

    Summary part 2

    So we’ve learned:

    < Part 1    Part 3 >

    April 29, 2014 05:00 AM

    April 26, 2014

    Peter Moore (pmoore)

    How we do automated mobile device testing at Mozilla – Part 1

    Video of this presentation from Release Engineering work week in Portland, 29 April 2014

    Part 1: Back to basics

    What software do we produce for mobile phones?

    What environments do we use for building and testing this software?

                Building                        Testing
    Fennec      CentOS 6.2                      Tegra / Panda / Emulator
                (bld-linux64-ix-*) in-house
                (bld-linux64-ec2-*) AWS
    B2G         CentOS 6.2                      Emulator

    So first key point unveiled:

    Second key point:

    So why do we test Fennec on tegras, pandas and emulators?

    To answer this, first remember the wide variety of builds and tests we perform:

    Screenshot from tbpl

    Screenshot from tbpl

    The answer is:


    What are the main differences between our tegras and pandas?

    Tegras:
    - Older, running Android 2.2
    - Hanging in shoe racks
    - Not very reliable
    - Can only be reimaged by physically connecting them to a laptop and pressing buttons in a magical sequence
    - Connected to a “PDU”, which allows us to programmatically call an API to “pull the power”

    Pandas:
    - Newer, running Android 4.0
    - Racked professionally in Faraday cages
    - Quite reliable
    - Can be remotely reimaged by mozpool (moar to come later)
    - Connected to a “relay host”, which allows us to programmatically call an API to “pull the power”
    So as you see, a panda is a more serious piece of kit than a tegra. Think of a tegra as a toy.

    So what are tegras and pandas, actually?

    Both are mobile device boards, as you see above, like you would get in a phone, but not actually in a phone.

    So why don’t we just use real phones?

    1. Real phones use batteries
    2. Real phones have wireless network

    Basically, by using the boards directly, we can:

    1. control the power supply (by connecting them to power units – PDUs) which we have API access to (i.e. we have an API to pull the power to a device)
    2. use ethernet, rather than wireless (which is more reliable, wireless signals don’t interfere with each other, less radiation, …)

    OK, so we have phones (or “phone circuit boards”) wired up to our network – but how do we communicate with them?

    Fennec historically ran on more platforms than just Android. It also ran on:

    For this reason, it was decided to create a generic interface, which would be implemented on all supported platforms. The SUT Agent was born.

    Please note: nowadays, Fennec is only available for Android 2.2+. It is not available for iOS (iPhone, iPad, iPod Touch), Windows Phone, Windows RT, Bada, Symbian, Blackberry OS, webOS or other mobile operating systems.

    Therefore, the original reason for creating a standard interface to all devices (the SUT Agent) no longer exists. It would also be possible to use a different mechanism (telnet, ssh, adb, …) to communicate with the device. However, this is not what we do.

    So what is the SUT Agent, and what can it do?

    The SUT Agent is a listener running on the tegra or panda, that can receive calls over its network interface, to tell it to perform tasks. You can think of it as something like an ssh daemon, in the sense that you can connect to it from a different machine, and issue commands.

    How do you connect to it?

    You simply telnet to the tegra or panda, on port 20700 or 20701.

    Why two ports? Are they different?

    Only marginally. The original idea was that users would connect on port 20701, and that automated systems would connect on port 20700. For this reason, if you connect on port 20700, you don’t get a prompt. If you connect on port 20701, you do. However, everything else is the same. You can issue commands to both listeners.
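    Since the listener speaks a plain line-based protocol over TCP, issuing a command is just a raw socket conversation. A minimal Python sketch is below; error handling and reply framing are simplified (the real client in the Mozilla tree, devicemanagerSUT, is far more careful), and the assumption here is that on the interactive port a "$>" prompt marks the end of a reply:

```python
import socket

SUT_PORT_AUTOMATION = 20700   # no prompt; intended for automated systems
SUT_PORT_INTERACTIVE = 20701  # prints a "$>" prompt; intended for humans


def sut_command(host, cmd, port=SUT_PORT_INTERACTIVE, timeout=30):
    """Send a single command to the SUT Agent listener and return the
    raw reply, reading until the prompt appears or the peer closes."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(cmd.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # listener closed the connection
                break
            chunks.append(data)
            if b"$>" in data:     # interactive prompt marks end of reply
                break
        return b"".join(chunks).decode(errors="replace")
    finally:
        sock.close()


def strip_prompt(reply):
    """Remove the interactive "$>" prompt from a reply."""
    return reply.replace("$>", "").strip()
```

For example, `sut_command("panda-0149", "ver")` would return the agent's version string followed by the prompt.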

    What commands does it support?

    The most important command is “help”. It displays this output, showing all available commands:

    pmoore@fred:~/git/tools/sut_tools master $ telnet panda-0149 20701
    Connected to
    Escape character is '^]'.
    run [cmdline] - start program no wait
    exec [env pairs] [cmdline] - start program no wait optionally pass env
     key=value pairs (comma separated)
    execcwd <dir> [env pairs] [cmdline] - start program from specified directory
    execsu [env pairs] [cmdline] - start program as privileged user
    execcwdsu <dir> [env pairs] [cmdline] - start program from specified directory as privileged user
    execext [su] [cwd=<dir>] [t=<timeout>] [env pairs] [cmdline] - start program with extended options
    kill [program name] - kill program no path
    killall - kill all processes started
    ps - list of running processes
    info - list of device info
     [os] - os version for device
     [id] - unique identifier for device
     [uptime] - uptime for device
     [uptimemillis] - uptime for device in milliseconds
     [sutuptimemillis] - uptime for SUT in milliseconds
     [systime] - current system time
     [screen] - width, height and bits per pixel for device
     [memory] - physical, free, available, storage memory
     for device
     [processes] - list of running processes see 'ps'
    alrt [on/off] - start or stop sysalert behavior
    disk [arg] - prints disk space info
    cp file1 file2 - copy file1 to file2
    time file - timestamp for file
    hash file - generate hash for file
    cd directory - change cwd
    cat file - cat file
    cwd - display cwd
    mv file1 file2 - move file1 to file2
    push filename - push file to device
    rm file - delete file
    rmdr directory - delete directory even if not empty
    mkdr directory - create directory
    dirw directory - tests whether the directory is writable
    isdir directory - test whether the directory exists
    chmod directory|file - change permissions of directory and contents (or file) to 777
    stat processid - stat process
    dead processid - print whether the process is alive or hung
    mems - dump memory stats
    ls - print directory
    tmpd - print temp directory
    ping [hostname/ipaddr] - ping a network device
    unzp zipfile destdir - unzip the zipfile into the destination dir
    zip zipfile src - zip the source file/dir into zipfile
    rebt - reboot device
    inst /path/filename.apk - install the referenced apk file
    uninst packagename - uninstall the referenced package and reboot
    uninstall packagename - uninstall the referenced package without a reboot
    updt pkgname pkgfile - unpdate the referenced package
    clok - the current device time expressed as the number of millisecs since epoch
    settime date time - sets the device date and time
    tzset timezone - sets the device timezone format is
     GMTxhh:mm x = +/- or a recognized Olsen string
    tzget - returns the current timezone set on the device
    rebt - reboot device
    adb ip|usb - set adb to use tcp/ip on port 5555 or usb
    activity - print package name of top (foreground) activity
    quit - disconnect SUTAgent
    exit - close SUTAgent
    ver - SUTAgent version
    help - you're reading it
    $>Connection closed by foreign host.

    Typically we use the SUT Agent to query the device, push Fennec and tests onto it, run tests, perform file system commands, execute system calls, and retrieve results and data from the device.

    What is the difference between quit and exit commands?

    I’m glad you asked. “quit” will terminate the session. “exit” will shut down the sut agent. You really don’t want to do this. Be very careful.

    Is the SUT Agent a daemon? If it dies, will it respawn?

    No, it isn’t, but yes, it will!

    The SUT Agent can die, and sometimes does. However, it has a daddy, who watches over it. The Watcher is a daemon, also running on the pandas and tegras, that monitors the SUT Agent. If the SUT Agent dies, the Watcher will spawn a new SUT Agent.

    Probably it would be possible to have the SUT Agent as an auto-respawning daemon – I’m not sure why it isn’t this way.
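    The Watcher's job reduces to a poll-and-respawn loop. Here is a hypothetical sketch, with the spawn and liveness checks injected as callables so the loop logic can be shown (and exercised) without a real device; none of these names come from the actual Watcher code:

```python
import time


def watcher_loop(spawn, is_alive, poll_interval=60, max_iterations=None):
    """Poll the SUT Agent and respawn it if it has died.

    spawn: callable that starts a new SUT Agent process.
    is_alive: callable returning True while the agent is running.
    max_iterations bounds the loop for demonstration; the real daemon
    runs forever.
    """
    i = 0
    while max_iterations is None or i < max_iterations:
        if not is_alive():
            spawn()  # the agent died; bring up a replacement
        time.sleep(poll_interval)
        i += 1
```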

    Who created the Watcher?

    Legend has it, that the Watcher was created by Bob Moss.

    Where is the source code for the SUT Agent and the Watcher?

    The SUT Agent codebase lives in the firefox desktop source tree:

    The Watcher code lives there too:

    Do the Watcher and the SUT Agent get automatically deployed when there are new changes?

    No. If there are changes, they need to be manually built (no continuous integration) and manually deployed to all tegras, and a new image needs to be created for pandas in mozpool (will be explained later).

    Fortunately, there are very rarely changes to either component.

    Summary part 1

    So we’ve learned:

    > Part 2

    April 26, 2014 11:27 PM