Planet Release Engineering

December 11, 2014

Kim Moir (kmoir)

Releng 2015 CFP now open

Florence, Italy.  Home of beautiful architecture.

Il Duomo di Firenze by ©runner310, Creative Commons by-nc-sa 2.0


Delicious food and drink.

Panzanella by © Pete Carpenter, Creative Commons by-nc-sa 2.0

Caffè ristretto by © Marcelo César Augusto Romeo, Creative Commons by-nc-sa 2.0


And next May, release engineering :-)

The CFP for Releng 2015 is now open.  The deadline for submissions is January 23, 2015.  It will be held on May 19, 2015 in Florence, Italy, co-located with ICSE 2015.  We look forward to seeing your proposals about the exciting work you're doing in release engineering!

If you have questions about the submission process or anything else, please contact any of the program committee members. My email is kmoir and I work at mozilla.com.

December 11, 2014 09:00 PM

December 09, 2014

Armen Zambrano G. (@armenzg)

Running Mozharness in developer mode will only prompt once for credentials

Thanks to Mozilla contributor kartikgupta0909, we now only have to enter LDAP credentials once when running the developer mode of Mozharness.

He accomplished it in bug 1076172.

Thank you Kartik!


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

December 09, 2014 09:43 PM

December 08, 2014

Armen Zambrano G. (@armenzg)

Test mozharness changes on Try

You can now push to your own mozharness repository (even a specific branch) and have it be tested on Try.

A few weeks ago we developed mozharness pinning (aka mozharness.json), and recently we enabled it for Try. Read the blog post to learn how to make use of it.

NOTE: This currently only works for desktop, mobile and b2g test jobs. More to come.
NOTE: We only support named branches, tags or specific revisions. Do not use bookmarks, as they don't work.


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

December 08, 2014 06:59 PM

December 04, 2014

Morgan Phillips (mrrrgn)

shutdown -h never

Mozilla's build/test infrastructure is complex. The jobs can be expensive and messy. So, for a while now, machines have been rebooted after completing tasks to ensure that environments remain fresh. This strategy works marvelously at preventing unnecessary failures, but it wastes a lot of resources. To fix this, for the past month I've worked on achieving the effects of a reboot without actually doing one: a sort of "virtual" reboot.

Yesterday I turned on these "virtual" reboots for all of our Linux hosts, and it seems to be working well. By next month I should also have them turned on for OS X and Windows machines. With reboots taking something like two minutes to complete, and around 100k jobs running per day, this could save us a whopping 200,000 minutes. That's nearly five months of machine time saved every day!



What's more, this estimate does not take into account the fact that jobs run faster on a machine that's already "warmed up."

What does a "virtual" reboot look like?

For starters [pun intended], each job requires a good amount of setup and teardown, so, a sort of init system is necessary. To achieve this a utility called runner has been created. Runner is a project that manages starting tasks in a defined order. If tasks fail, the chain can be retried, or halted. Many tasks that once lived in /etc/init.d/ are now managed by runner including buildbot itself.
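The run-tasks-in-order, retry-or-halt behavior described above can be sketched roughly as follows. This is a simplification, not the real runner (which executes scripts from configured task directories); the task names and callables here are illustrative stand-ins:

```python
def run_chain(tasks, max_retries=3):
    """Run tasks in order; if any fails, retry the whole chain, then halt.

    `tasks` is an ordered list of (name, func) pairs, where each func
    returns True on success -- stand-ins for the scripts the real
    runner would execute in its defined order.
    """
    for attempt in range(1, max_retries + 1):
        for name, task in tasks:
            if not task():
                print("task %s failed (attempt %d)" % (name, attempt))
                break  # restart the chain from the top
        else:
            return True  # every task succeeded in order
    return False  # halt: retries exhausted

# Example: a cleanup step followed by starting buildbot (both faked here).
ok = run_chain([("cleanup-tmp", lambda: True), ("start-buildbot", lambda: True)])
```

The key design point is that a failure anywhere restarts the whole chain, so each job always begins from the same known-good sequence of setup steps.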



Among runner's tasks are various scripts for cleaning up temporary files, starting/restarting services, and a utility called cleanslate. Cleanslate resets a user's running processes to a previously recorded state.

At boot, cleanslate takes a snapshot of all running processes, then, before each job it kills any processes (by name) which weren't running when the system was fresh. This particular utility is key to maintaining stability and may be extended in the future to enforce other kinds of system state as well.
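The snapshot-and-kill idea can be modeled as two small steps. This is a sketch of the concept, not the actual cleanslate code: the real tool inspects live processes and kills strays, while these helpers just operate on lists of process names:

```python
def snapshot(process_names):
    """Record the set of process names present on a freshly booted system."""
    return set(process_names)

def find_strays(baseline, current_names):
    """Return process names running now that weren't in the boot snapshot.

    These are the processes a cleanslate-style tool would kill (by name)
    before taking the next job.
    """
    return sorted(set(current_names) - baseline)

# At boot: record what's running.
baseline = snapshot(["init", "sshd", "buildbot"])
# Before the next job: anything new gets flagged for termination.
strays = find_strays(
    baseline,
    ["init", "sshd", "buildbot", "leftover-test", "orphan-daemon"])
# strays == ["leftover-test", "orphan-daemon"]
```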



The end result is this:

old work flow

Boot + init -> Take Job -> Reboot (2-5 min)

new work flow

Boot + Runner -> Take Job -> Shutdown Buildslave
(runner loops and restarts slave)

December 04, 2014 06:54 PM

December 03, 2014

Kim Moir (kmoir)

Mozilla pushes - November 2014

Here's November's monthly analysis of the pushes to our Mozilla development trees.  You can load the data as an HTML page or as a json file.

Trends
Not a record-breaking month; in fact, we are down over 2,000 pushes since last month.

Highlights
10376 pushes
346 pushes/day (average)
Highest number of pushes/day: 539 pushes on November 12
17.7 pushes/hour (average)

General Remarks
Try had around 38% of all the pushes, and gaia-try about 30%. The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) account for around 23% of all the pushes.

Records
August 2014 was the month with most pushes (13,090  pushes)
August 2014 has the highest pushes/day average with 422 pushes/day
July 2014 has the highest average of "pushes-per-hour" with 23.51 pushes/hour
October 8, 2014 had the highest number of pushes in one day with 715 pushes    







December 03, 2014 09:41 PM

November 24, 2014

Armen Zambrano G. (@armenzg)

Pinning mozharness from in-tree (aka mozharness.json)

Since mozharness came along 2-3 years ago, we have had the same issue: we test a mozharness change against the trunk trees, land it, and get it backed out because we regressed one of the older release branches.

This is due to the nature of the mozharness setup: once a change lands, all jobs start running the same code, and it does not matter which branch a job is running on.

I have recently landed some code that is now active on Ash (and soon on Try) that will read a manifest file that points your jobs to the right mozharness repository and revision. We call this process "pinning mozharness". In other words, we fix an external factor of our job execution.

This will allow you to point your Try pushes to your own mozharness repository.

In order to pin your jobs to a repository/revision of mozharness you have to change a file called mozharness.json which indicates the following two values:

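The two values aren't listed in this excerpt, but given the description above (a mozharness repository and a revision to pin to), the manifest presumably looks something like the snippet below. The key names and URL are my guesses for illustration; check the actual mozharness.json in-tree for the real format:

```python
import json

# A hypothetical mozharness.json pinning jobs to a repo + revision;
# the keys "repo" and "revision" are assumptions, not the confirmed schema.
manifest_text = """
{
    "repo": "https://hg.mozilla.org/build/mozharness",
    "revision": "production"
}
"""

manifest = json.loads(manifest_text)
# A job would then clone manifest["repo"] and update to manifest["revision"]
# before running, instead of always using the tip of a shared repository.
```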

This is a similar concept to the one talos.json introduced, which locks every job to a specific revision of talos. The original version of it landed in 2011.

Even though we have had a similar concept since 2011, that doesn't mean it was as easy to make it happen for mozharness. Let me explain a bit why:

Coming up:
  • Enable on Try
  • Free up Ash and Cypress
    • They have been used to test custom mozharness patches and the default branch of Mozharness (pre-production)
Long term:
  • Enable the feature on all remaining Gecko trees
    • We would like to see this run at scale for a bit before rolling it out
    • This will allow mozharness changes to ride the trains
If you are curious, the patches are in bug 791924.

Thanks to Rail for all his patch reviews and to Jordan for sparking me to tackle it.



Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

November 24, 2014 05:35 PM

November 12, 2014

Kim Moir (kmoir)

Scaling capacity while saving cash

There was a very interesting release engineering summit this Monday, held in concert with LISA in Seattle.  I was supposed to fly there this past weekend so I could give a talk on Monday, but late last week I became ill and was unable to go.  That was very disappointing, because the summit looked really great and I was looking forward to meeting the other release engineers and learning about the challenges they face.

Scale in the Market  ©Clint Mickel, Creative Commons by-nc-sa 2.0

Although I didn't have the opportunity to give the talk in person, the slides for it are available on slideshare and my mozilla people account.  The talk describes how we scaled our continuous integration infrastructure on AWS to handle double the number of pushes it handled in early 2013, all while reducing our monthly AWS bill by 2/3.

Cost per push from October 2012 until October 2014. This does not include costs for on-premise equipment; it reflects our monthly AWS bill divided by the number of monthly pushes (commits).

Thank you to Dinah McNutt and the other program committee members for organizing this summit.  I look forward to watching the talks once they are online.

November 12, 2014 07:34 PM

Mozilla pushes - October 2014

Here's the October 2014 monthly analysis of the pushes to our Mozilla development trees.  You can load the data as an HTML page or as a json file.

Trends
We didn't have a record-breaking month in terms of the number of pushes; however, we did have a daily record on October 8 with 715 pushes.

Highlights
12821 pushes, up slightly from the previous month
414 pushes/day (average)
Highest number of pushes/day: 715 pushes on October 8
22.5 pushes/hour (average)

General Remarks
Try had around 39% of all the pushes, and gaia-try about 31%. The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) account for around 21% of all the pushes.

Records
August 2014 was the month with most pushes (13,090  pushes)
August 2014 has the highest pushes/day average with 422 pushes/day
July 2014 has the highest average of "pushes-per-hour" with 23.51 pushes/hour
October 8, 2014 had the highest number of pushes in one day with 715 pushes




November 12, 2014 03:45 PM

Morgan Phillips (mrrrgn)

AirMozilla: Distrusting Our Own [Build] Infrastructure

If you missed last week's AirMozilla broadcast: Why and How of Reproducible Builds: Distrusting Our Own Infrastructure For Safer Software Releases by the Tor Project, consider checking it out.

The talk is an in-depth look at how one can protect release pipelines from being owned via attacks which target build systems, particularly attacks where compromised compilers may be used to create unsafe binaries from safe source code.

Meanwhile, RelEng is putting these ideas into practice.

November 12, 2014 08:54 AM

A Simple Trusting Trust Attack

....

November 12, 2014 07:54 AM

November 10, 2014

Morgan Phillips (mrrrgn)

A Note on Deterministic Builds

Since I joined Mozilla's Release Engineering team I've had the opportunity to put my face into a firehose of interesting new knowledge and challenges. Maintaining a release pipeline for binary installers and updates used by a substantial portion of the Earth's population is a whole other kind of beast from ops roles where I've focused on serving some kind of SaaS or internal analytics infrastructure. It's really exciting!

One of the most interesting problems I've seen getting attention lately is deterministic builds, that is, builds that produce the same sequence of bytes from source on a given platform at any time.

What good are deterministic builds?

For starters, they aid in detecting "Trusting Trust" attacks. That's where a compromised compiler produces malicious binaries from perfectly harmless source code by replacing certain patterns during compilation. It sort of defeats the whole security advantage of open source when you download binaries, right?

Luckily for us users, a fellow named David A. Wheeler rigorously proved a method for circumventing this class of attacks altogether via a technique he coined "Diverse Double-Compiling" (DDC). The gist of it is: you compile a project's source code with a trusted toolchain, then compare a hash of the result with some potentially malicious binary. If the hashes match, you're safe.

DDC also detects the less clever scenario where an adversary patches otherwise-open source code during the build process and serves up malware-ified packages. In either case, it's easy to see that this works if and only if builds are deterministic.
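The final step of that check boils down to a byte-for-byte hash comparison. A minimal sketch of just that step, using in-memory byte strings as stand-ins for real build artifacts:

```python
import hashlib

def digest(data):
    """SHA-256 hexdigest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

# Stand-ins for real artifacts: the binary a vendor shipped, and the one
# you rebuilt from the same source with a trusted toolchain.
vendor_binary = b"\x7fELF...payload..."
trusted_rebuild = b"\x7fELF...payload..."

safe = digest(vendor_binary) == digest(trusted_rebuild)
# If the build is deterministic and nothing was tampered with, safe is True;
# any divergence in the compiler or source flips it to False.
```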

Aside from security, they can also help projects that support many platforms take advantage of cross-building with less stress. That is, one could compile ARM packages on an x86_64 host, then compare the results to a native build and make sure everything matches up. This can be a huge win for folks who want to cut back on infrastructure overhead.

How can I make a project more deterministic?

One bit of good news is, most compilers are already pretty deterministic (on a given platform). Take hello.c for example:

#include <stdio.h>

int main() {
    printf("Hello World!");
}


Compile that a million times and take the md5sum. Chances are you'll end up with a million identical md5sums. Scale that up to a million lines of code, and there's no reason why this won't hold true.

However, take a look at this doozy:

#include <stdio.h>

int main() {
    printf("Hello from %s! @ %s", __FILE__, __TIME__);
}


Having timestamps and other platform specific metadata baked into source code is a huge no-no for creating deterministic builds. Compile that a million times, and you'll likely get a million different md5sums.

In fact, in an attempt to make Linux more deterministic, all __TIME__ macros were removed, and the makefile specifies a compiler option (-Werror=date-time) that turns any use of them into an error.

Unfortunately, removing all traces of such metadata in a mature code base could be all but impossible. However, a fantastic tool called gitian will allow you to compile projects within a virtual environment where timestamps and other metadata are controlled.

Definitely check gitian out and consider using it as a starting point.

Another trouble spot to consider is static linking. Here, unless you're careful, determinism sits at the mercy of third parties. Be sure that your build system has access to identical libraries from anywhere it may be used. Containers and pre-baked vms seem like a good choice for fixing this issue, but remember that you could also be passing around a tainted compiler!

Scripts that automate parts of the build process are also a potent breeding ground for non-deterministic behaviors. Take this python snippet for example:

import os

with open('manifest', 'w') as manifest:
    for dirpath, dirnames, filenames in os.walk("."):
        for filename in filenames:
            manifest.write("{}\n".format(filename))


The problem here is that os.walk will not always yield filenames in the same order. :(
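A deterministic version of the same snippet just needs explicit ordering, for example:

```python
import os

def write_manifest(root, path):
    """Write a manifest whose line order doesn't depend on os.walk's whims."""
    names = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                  # fix the traversal order of subdirectories
        names.extend(sorted(filenames))  # fix the order within each directory
    with open(path, "w") as manifest:
        for filename in names:
            manifest.write("{}\n".format(filename))
```

Sorting `dirnames` in place matters too: os.walk honors that order on subsequent descents, so the whole traversal becomes reproducible, not just each directory's listing.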

One also has to keep in mind that certain data structures become very dangerous in such scripts. Consider this pseudo-python that auto generates some sort of source code in a compiled language:

weird_mapping = dict(file_a=99, file_b=1)
things_in_a_set = set([thing_a, thing_b, thing_c])
for k, v in weird_mapping.items():
    ... generate some code ...
for thing in things_in_a_set:
    ... generate some code ...


A pattern like this would dash any hope that your project had of being deterministic because it makes use of unordered data structures.

Beware of unordered data structures in build scripts and/or sort all the things before writing to files.
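In practice that means iterating over a fixed ordering. Applying sorted() to the structures from the pseudo-code above (with concrete stand-in values) makes the generated output identical on every run:

```python
# Deterministic iteration: sort keys and set members before generating
# anything from them, so the output order never depends on hash order.
weird_mapping = dict(file_a=99, file_b=1)
things_in_a_set = {"thing_c", "thing_a", "thing_b"}

lines = []
for k, v in sorted(weird_mapping.items()):
    lines.append("{} = {}".format(k, v))
for thing in sorted(things_in_a_set):
    lines.append("emit {}".format(thing))
# `lines` now comes out the same on every run and every Python version.
```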

Enforcing determinism from the beginning of a project's life cycle is the ideal situation, so, I would highly recommend incorporating it into CI flows. When a developer submits a patch it should include a hash of their latest build. If the CI system builds and the hashes don't match, reject that non-deterministic code! :)

EOF

Of course, this hardly scratches the surface on why deterministic builds are important; but I hope this is enough for a person to get started on. It's a very interesting topic with lots of fun challenges that need solving. :) If you'd like to do some further reading, I've listed a few useful sources below.

https://blog.torproject.org/blog/deterministic-builds-part-two-technical-details

https://wiki.debian.org/ReproducibleBuilds#Why_do_we_want_reproducible_builds.3F

http://www.chromium.org/developers/testing/isolated-testing/deterministic-builds

November 10, 2014 07:54 PM

Justin Wood (Callek)

Firefox Launches Developer Edition (Minor Papercut Issues)

So, as you may have heard, Firefox is launching a dev edition.

This post does not attempt to elaborate on that specifically too much, but it’s more to identify some issues I hit in early testing and the solutions to them.

Theme

While I do admire the Developer Edition theme, I'm a guy who likes to stick with "what I know" rather than making a drastic change like that. What I didn't realize was that this is possible out of the box in Developer Edition.

After the tour you're given, you'll want to open the Customize panel and then deselect "Use Firefox Developer Edition Theme" (see the following image — arrow added), and that will get you back to what you know.

DevEditionTheme

Sync

As a longtime user, I had "Old Firefox Sync" enabled; this was the one that very few users enabled, and even fewer used across devices.

Firefox Developer Edition, however, creates a new profile (so you can use it alongside whatever Firefox version you want) and supports setting up only the “New” sync features. Due to creating a new profile, it also leaves you without history or saved passwords.

To sync my old profile with developer edition, I had to:

  1. Unlink my Desktop Firefox from old sync
  2. Unlink my Android Firefox from old sync
  3. Create a new sync account
  4. Link my old Firefox profile with new sync
  5. Link my Android with new sync
  6. Link Dev Edition with new sync
  7. Profit

Now other than steps 6 and 7 (yea, how DO I profit?) this is all covered quite well in a SuMo article on the subject. I will happily help guide people through this process, especially in the near future, as I’ve just gone through it!

(Special Thanks to Erik for helping to copy-edit this post)

November 10, 2014 04:30 PM

I’m a wordpress newbie

If this is on planet.mozilla.org, and so is a “content is password protected” post below it, I’m sorry.

The post is merely that way because it's unfinished, but I wanted to share it with a few others for early feedback.

I’ll delete this post, and unhide that one once things are ready. (Sorry for any confusion)

November 10, 2014 05:18 AM

November 06, 2014

Armen Zambrano G. (@armenzg)

Setting buildbot up a-la-releng (Create your own local masters and slaves)

buildbot is what Mozilla's Release Engineering uses to run the infrastructure behind tbpl.mozilla.org.
buildbot assigns jobs to machines (aka slaves) through hosts called buildbot masters.

All the different repositories and packages needed to set up buildbot are installed through Puppet, and I'm not aware of a way of setting up my local machine through Puppet (I doubt I would want to do that!).
I managed to set this up a while ago by hand [1][2] (it was even more complicated in the past!); however, these one-off attempts were not easy to keep up to date and isolated.

I recently landed a few scripts that make it trivial to set up as many buildbot environments as you want, all isolated from each other.

All the scripts have been landed under the "community" directory under the "braindump" repository:
https://hg.mozilla.org/build/braindump/file/default/community

The main two scripts:

If you call create_community_slaves_and_masters.sh with -w /path/to/your/own/workdir you will have everything set up for you. From there on, all you would have to do is this:
  • cd /path/to/your/own/workdir
  • source venv/bin/activate
  • buildbot start masters/test_master (for example)
  • buildslave start slaves/test_slave
Each paired master and slave have been setup to talk to each other.

I hope this is helpful for people out there. It's been great for me when I contribute patches for buildbot (bug 791924).

As always in Mozilla, contributions are always welcome!

PS 1 = Only tested on Ubuntu. If you want to port this to other platforms, please let me know and I can give you a hand.

PS 2 = I know that there is a repository of docker images called "tupperware"; however, I had this set of scripts in the works for a while. Perhaps someone wants to figure out how to set up a similar process through the docker images.



Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

November 06, 2014 02:02 PM

November 05, 2014

Massimo Gervasini (mgerva)

Sign the hash of the bundle, not the full bundle!

With Bug 1083683, we are stopping direct processing of .bundle and .source files by our signing servers. This means that in the near future we will not have new *.bundle.asc and *.source.tar.bz2.asc files on the ftp server.
Bundles and source files have grown quite a bit, and getting them signed sometimes ends up in retries and failed jobs, disrupting and delaying the release process. There's also no benefit in having them signed directly: the source-package job already calculates the MD5/SHA1/SHA512 hashes of the bundle/source files, and those hashes get included in the .checksum file, which is signed with the release automation key.
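So instead of a detached signature per file, verification becomes: check the signature on the checksums file once, then compare your download's hash against its entry. The hash-comparison half might look like the sketch below; the line format, filename, and contents here are made-up approximations, not the exact release checksums format:

```python
import hashlib

def verify_against_checksums(file_bytes, filename, checksums_text):
    """Check a downloaded file against its SHA512 entry in a checksums file.

    Assumes lines of the form "<hexdigest> sha512 <size> <filename>",
    which is an approximation of the real format.
    """
    actual = hashlib.sha512(file_bytes).hexdigest()
    for line in checksums_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[1] == "sha512" and parts[3] == filename:
            return parts[0] == actual
    return False  # no entry found for this filename

# A fake bundle and a matching checksums entry (signature check omitted).
bundle = b"fake bundle contents"
checksums = "{} sha512 {} firefox.bundle".format(
    hashlib.sha512(bundle).hexdigest(), len(bundle))
ok = verify_against_checksums(bundle, "firefox.bundle", checksums)
```

In the real pipeline you would verify the GPG signature on the checksums file first; only then does the hash comparison prove anything.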


November 05, 2014 04:57 PM

October 31, 2014

Chris Cooper (coop)

10.8 testing disabled by default on Try

Mountain LionIf you’ve recently submitted patches to the Mozilla Try server, you may have been dismayed by the turnaround time for your test results. Indeed, last week we had reports from some developers that they were waiting more than 24 hours to get results for a single Try push in the face of backlogs caused by tree closures.

The chief culprit here was Mountain Lion, or OS X 10.8, which is our smallest pool (99) of test machines. It was not uncommon for there to be over 2,000 pending test jobs for Mountain Lion at any given time last week. Once we reach a pending count that high, we cannot make headway until the weekend when check-in volume drops substantially.

In the face of these delays, developers started landing some patches on mozilla-inbound before the corresponding jobs had finished on Try, and worse still, not killing the obsolete pending jobs on Try. That's just bad hygiene and practice. Sheriffs had to actively look for the duplicate jobs and kill them to help decrease load.

We cannot easily increase the size of the Mountain Lion pool. Apple does not allow you to install older OS X versions on new hardware, so our pool size here is capped at the number of machines we bought when 10.8 was released over 2 years ago, plus what we can scrounge from resellers.

To improve the situation, we made the decision this week to disable 10.8 testing by default on Try. Developers must now select 10.8 explicitly from the “Restrict tests to platform(s)” list on TryChooser if they want to run Mountain Lion tests. If you have an existing Mac Try build that you’d like to back-fill with 10.8 results, please ping the sheriff on duty (sheriffduty) in #developers or #releng and they can help you out *without* incurring another full Try run.

Please note that we do plan to stand up Yosemite (10.10) testing as a replacement for Mountain Lion early in 2015. This is a stop-gap measure until we’re able to do so.

October 31, 2014 08:25 PM

October 27, 2014

Kim Moir (kmoir)

Mozilla pushes - September 2014

Here's September 2014's monthly analysis of the pushes to our Mozilla development trees.
You can load the data as an HTML page or as a json file.


Trends
Surprise!  No records were broken this month.

Highlights
12267 pushes
409 pushes/day (average)
Highest number of pushes/day: 646 pushes on September 10, 2014
22.6 pushes/hour (average)

General Remarks
Try has around 36% of pushes and Gaia-Try comprises about 32%.  The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) account for around 22% of all the pushes.

Records
August 2014 was the month with most pushes (13,090  pushes)
August 2014 has the highest pushes/day average with 422 pushes/day
July 2014 has the highest average of "pushes-per-hour" with 23.51 pushes/hour
August 20, 2014 had the highest number of pushes in one day with 690 pushes





October 27, 2014 09:11 PM

Release Engineering in the classroom

The second week of October, I had the pleasure of presenting lectures on release engineering to university students in Montreal as part of the PLOW lectures at École Polytechnique de Montréal.    Most of the students were MSc or PhD students in computer science, with a handful of postdocs and professors in the class as well. The students came from Montreal area universities and many were international students. The PLOW lectures consisted of several invited speakers from various universities and industry spread over three days.

View looking down from the university

Université de Montréal administration building

École Polytechnique building.  Each floor is painted a different colour to represent a different layer of the earth.  So the ground floor is red, the next orange and finally green.

The first day, Jack Jiang from York University gave a talk about software performance engineering.
The second day, I gave a lecture on release engineering in the morning.  The rest of the day we did a lot of labs to configure a Jenkins server to build and run tests on an open source project. Earlier that morning, I had set up m3.large instances for the students on Amazon that they could ssh into to conduct their labs.  Along the way, I talked about some release engineering concepts.  It was really interesting and I learned a lot from their feedback.  Many of the students had not been exposed to release engineering concepts, so it was fun to share the information.

Several students came up to me during the breaks and said "So, I'm doing my PhD in release engineering, and I have several questions for you", which was fun.  Also, some of the students were making extensive use of code bases from Mozilla or other open source projects, so it was interesting to learn more about that.  For instance, one research project was looking at the evolution of multi-threading in Mozilla code bases, and another student was conducting Bugzilla comment sentiment analysis.  Are angry bug comments correlated with fewer bug fixes?  Looking forward to the results of this research!

I ended the day by providing two challenge exercises to the students that they could submit answers to.  One exercise was to set up a build pipeline in Jenkins for another open source project.  The other challenge was to use the Jenkins REST API to query the Apache project's Jenkins server and present some statistics on its build history.  The results were pretty impressive!

My slides are on GitHub and the readme file describes how I set up the Amazon instances so Jenkins and some other required packages were installed beforehand.  Please use them and distribute them if you are interested in teaching release engineering in your classroom.

Lessons I learned from this experience:
The third day there was a lecture by Michel Dagenais of Polytechnique Montréal on tracing heterogeneous cloud instances using (tracing framework for Linux).  The Eclipse Trace Compass project also made an appearance in the talk; I always like to see Eclipse projects highlighted.  One of his interesting points was that none of the companies that collaborate on this project wanted to sign a bunch of IP agreements so they could collaborate behind closed doors.  They all wanted to collaborate via an open source community and source code repository.  Another thing he emphasized was that students should make their work available on the web, via GitHub or other repositories, so they have a portfolio of work available.  It was fantastic to see him promote the idea of students being involved in open source as a way to help their job prospects when they graduate!

Thank you Foutse and Bram for the opportunity to lecture at your university!  It was a great experience!  Also, thanks Mozilla for the opportunity to do this sort of outreach to our larger community on company time!

Also, I have a renewed respect for teachers and professors.  Writing these slides took so much time, with many long nights for me, especially in the days leading up to the class.  Kudos to you all who teach every day.

References
The slides are on GitHub and the readme file describes how I setup the Amazon instances for the labs

October 27, 2014 01:34 PM

Beyond the Code 2014: a recap

I started this blog post about a month ago and didn't finish it because, well, life is busy.

I attended Beyond the Code last September 19.  I heard about it several months ago on twitter.  A one-day conference about celebrating women in computing, in my home town, with a fantastic speaker lineup?  I signed up immediately.  In the opening remarks, we were asked for a show of hands to indicate how many of us were developers, in design or product management, or students, and there was a good representation from all those categories.  I was especially impressed by the number of students in the audience; it was nice to see so many of them taking time out of their busy schedules to attend.

View of the Parliament Buildings and Chateau Laurier from the MacKenzie street bridge over the Rideau Canal
Ottawa Conference Centre, location of Beyond the Code
 
There were seven speakers, three workshop organizers, a lunchtime activity, and a panel at the end. The speakers were all women, but not all white women or all heterosexual women.  There were many young women, not all industry veterans :-) like me.  To see this level of diversity at a tech conference filled me with joy.  Almost every conference I go to is very homogeneous in the makeup of the speakers and the audience.  To see ~200 tech women at a conference, with about 10% men (thank you for attending :-)), was quite a role reversal.

I was completely impressed by the caliber of the speakers.  They were simply exceptional.

The conference started out with Kronda Adair giving a talk on Expanding Your Empathy.  One of the things that struck me from this talk was how everyone lives in a bubble and doesn't see the things others face, due to privilege.  She gave the example of how privilege is like a browser, and colours how we see the world.  For a straight white guy, a web page looks great: he's running the latest Chrome on Mac OS X.  For a middle-class black lesbian, the web page doesn't look as great, because it's like she's running IE7.  There is less inherent privilege.  For a "differently abled trans person of color" the world is like running IE6 in quirks mode. This was a great example. She also gave a shout-out to the Ascend Project, which she and Lukas Blakk are running in Mozilla's Portland office. Such an amazing initiative.

The next speaker was Bridget Kromhout, who gave a talk about Platform Ops in the Public Cloud.
I was really interested in this talk because we do a lot of scaling of our build infrastructure in AWS, and I wanted to see if she had faced similar challenges. She works at DramaFever, which she described as Netflix for Asian soap operas.  The most interesting things to me were that she used all AWS regions to host their instances, because they wanted their users to be able to download from a region as geographically close to them as possible.  At Mozilla, we use only a couple of AWS regions, but more instances than DramaFever, so this was an interesting contrast in the services used. In addition, the monitoring infrastructure they use is quite complex.  Her slides are here.

I was going to summarize the rest of the speakers but Melissa Jean Clark did an exceptional job on her blog.  You should read it!

Thank you Shopify for organizing this conference.  It was great to meet so many brilliant women in the tech industry! I hope there is an event next year too!

October 27, 2014 01:33 PM

October 14, 2014

Jordan Lund (jlund)

This week in Releng - Oct 5th, 2014

Major highlights:

Completed work (resolution is 'FIXED'):


In progress work (unresolved and not assigned to nobody):

October 14, 2014 04:36 AM

October 07, 2014

Ben Hearsum (bhearsum)

Redo 1.3 is released – now with more natural syntax!

We’ve been using the functions packaged in Redo for a few years now at Mozilla. One of the things we’ve been striving for with it is the ability to write the most natural code possible. In its simplest form, the callable that may raise, the exceptions to retry on, and the callable to run for cleanup before another attempt are all passed to retry as arguments. As a result, we have a number of code blocks like this, which don’t feel very Pythonic:

retry(self.session.request, sleeptime=5, max_sleeptime=15,
      retry_exceptions=(requests.HTTPError, 
                        requests.ConnectionError),
      attempts=self.retries,
      kwargs=dict(method=method, url=url, data=data,
                  config=self.config, timeout=self.timeout,
                  auth=self.auth, params=params)
)

It’s particularly unfortunate that you’re forced to let retry do your exception handling and cleanup – I find that it makes the code a lot less readable. It’s also not possible to do anything in a finally block, unless you wrap the retry in one.

Recently, Chris AtLee discovered a new method of doing retries that results in much cleaner and more readable code. With it, the above block can be rewritten as:

for attempt in retrier(attempts=self.retries):
    try:
        self.session.request(method=method, url=url, data=data,
                             config=self.config,
                             timeout=self.timeout, auth=self.auth,
                             params=params)
        break
    except (requests.HTTPError, requests.ConnectionError):
        pass

retrier simply handles the mechanics of tracking attempts and sleeping, leaving your code to do all of its own exception handling and cleanup – just as if you weren’t retrying at all. Note that the break at the end of the try block is essential; without it, self.session.request would run again even after it succeeded.

I released Redo 1.3 with this new functionality this morning – enjoy!

October 07, 2014 12:48 PM

October 02, 2014

Hal Wine (hwine)

bz Quick Search

October 02, 2014 07:00 AM

September 29, 2014

Jordan Lund (jlund)

This Week In Releng - Sept 21st, 2014

Major Highlights:

Completed work (resolution is 'FIXED'):


In progress work (unresolved and not assigned to nobody):

September 29, 2014 06:08 PM

This Week In Releng - Sept 7th, 2014

Major Highlights

Completed work (resolution is 'FIXED'):


In progress work (unresolved and not assigned to nobody):

September 29, 2014 05:44 PM

September 25, 2014

Armen Zambrano G. (@armenzg)

Making mozharness easier to hack on and try support

Yesterday, we presented a series of proposed changes to Mozharness at the bi-weekly meeting.

We're mainly focused on making it easier for developers and allow for further flexibility.
We will initially focus on the testing side of the automation and make ground work for other further improvements down the line.

The set of changes discussed for this quarter are:

  1. Move remaining set of configs to the tree - bug 1067535
    • This makes it easier to test harness changes on try
  2. Read more information from the in-tree configs - bug 1070041
    • This increases the number of harness parameters we can control from the tree
  3. Use structured output parsing instead of regular where it applies - bug 1068153
    • This is part of a larger goal where we make test reporting more reliable, easy to consume and less burdening on infrastructure
    • It's to establish a uniform criteria for setting a job status based on a test result that depends on structured log data (json) rather than regex-based output parsing
    • "How does a test turn a job red or orange?" 
    • We will then have a simple answer that is that same for all test harnesses
  4. Mozharness try support - bug 791924
    • This will allow us to lock which repo and revision of mozharness is checked out
    • This isolates mozharness changes to a single commit in the tree
    • This gives us try support for user repos (freedom to experiment with mozharness on try)


Even though we feel the pain of #4, we decided to prioritize #1 and #2, since they give developers immediate value, while for #4 we already have (painful) workarounds.
I don't know if we'll complete #4 in this quarter, however, we are committed to the first three.

If you want to contribute to the longer term vision on that proposal please let me know.


In the following weeks we will have more updates with regards to implementation details.


Stay tuned!



Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

September 25, 2014 07:42 PM

September 23, 2014

Ben Hearsum (bhearsum)

Stop stripping (OS X builds), it leaves you vulnerable

While investigating some strange update requests on our new update server, I discovered that we have thousands of update requests from Beta users on OS X that aren’t getting an update, but should. After some digging I realized that most, if not all, of these are coming from users who have installed one of our official Beta builds and subsequently stripped out the architecture they do not need from it. In turn, this causes our builds to report in such a way that we don’t know how to serve updates for them.

We’ll look at ways of addressing this, but the bottom line is that if you want to be secure: Stop stripping Firefox binaries!

September 23, 2014 05:38 PM

September 19, 2014

Ben Hearsum (bhearsum)

New update server has been rolled out to Firefox/Thunderbird Beta users

Yesterday marked a big milestone for the Balrog project when we made it live for Firefox and Thunderbird Beta users. Those with a good long term memory may recall that we switched Nightly and Aurora users over almost a year ago. Since then, we’ve been working on and off to get Balrog ready to serve Beta updates, which are quite a bit more complex than our Nightly ones. Earlier this week we finally got the last blocker closed and we flipped it live yesterday morning, pacific time. We have significantly (~10x) more Beta users than Nightly+Aurora, so it’s no surprise that we immediately saw a spike in traffic and load, but our systems stood up to it well. If you’re into this sort of thing, here are some graphs with spikey lines:
The load average on 1 (of 4) backend nodes:

The rate of requests to 1 backend node (requests/second):

Database operations (operations/second):

And network traffic to the database (MB/sec):

Despite hitting a few new edge cases (mostly around better error handling), the deployment went very smoothly – it took less than 15 minutes to be confident that everything was working fine.

While Nick and I are the primary developers of Balrog, we couldn’t have gotten to this point without the help of many others. Big thanks to Chris and Sheeri for making the IT infrastructure so solid, to Anthony, Tracy, and Henrik for all the testing they did, and to Rail, Massimo, Chris, and Aki for the patches and reviews they contributed to Balrog itself. With this big milestone accomplished we’re significantly closer to Balrog being ready for Release and ESR users, and retiring the old AUS2/3 servers.

September 19, 2014 02:31 PM

September 17, 2014

Kim Moir (kmoir)

Mozilla Releng: The ice cream

A week or so ago, I was commenting in IRC that I was really impressed that our interns had such amazing communication and presentation skills.  One of the interns, John Zeller, said something like "The cream rises to the top", to which I replied "Releng: the ice cream of CS".  From there, the conversation went on to discuss what would be the best ice cream flavour to capture the spirit of Mozilla releng.  The consensus at the end was that Irish Coffee (coffee with whisky) with cookie dough chunks was the favourite.  Because a lot of people on the team like coffee, whisky makes it better, and who doesn't like cookie dough?

I made this recipe over the weekend with some modifications.  I used the coffee recipe from the Perfect Scoop.  After it was done churning in the ice cream maker,  instead of whisky, which I didn't have on hand, I added Kahlua for more coffee flavour.  I don't really like cookie dough in ice cream but cooked chocolate chip cookies cut up with a liberal sprinkling of Kahlua are tasty.

Diced cookies sprinkled with Kahlua

Ice cream ready to put in freezer

Finished product
I have to say, it's quite delicious :-) If open source ever stops being fun, I'm going to start a dairy empire.  Not really. Now back to bugzilla...

September 17, 2014 01:43 PM

September 16, 2014

Armen Zambrano G. (@armenzg)

Which builders get added to buildbot?

To add/remove jobs on tbpl.mozilla.org, we have to modify buildbot-configs.

You can learn how to make changes by looking at previous patches; however, there's a bit of an art to getting it right.

I just landed a script that sets up buildbot for you inside of a virtualenv and you can pass a buildbot-config patch and determine which builders get added/removed.

You can run this by checking out braindump and running something like this:
buildbot-related/list_builder_differences.sh -j path_to_patch.diff

NOTE: This script does not check that the job has all the right parameters once live (e.g. you forgot to specify the mozharness config for it).
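
End to end, that looks something like this (a sketch — assuming the braindump repository is cloned from its usual hg.mozilla.org location; the patch path is a placeholder):

hg clone https://hg.mozilla.org/build/braindump
cd braindump/buildbot-related
./list_builder_differences.sh -j path_to_patch.diff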

Happy hacking!


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

September 16, 2014 03:26 PM

September 11, 2014

Armen Zambrano G. (@armenzg)

Run tbpl jobs locally with Http authentication (developer_config.py) - take 2

Back in July, we deployed the first version of Http authentication for mozharness, however, under some circumstances, the initial version could fail and affect production jobs.

This time around we have:

If you read How to run Mozharness as a developer you should see the new changes.

As a quick reminder, it only takes 3 steps:

  1. Find the command from the log. Copy/paste it.
  2. Append --cfg developer_config.py
  3. Append --installer-url/--test-url with the right values
To see a real example visit this
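
As an illustrative (non-runnable) sketch of those three steps — the script name, config, suite, and URLs below are placeholders for whatever appears in your own log:

# Step 1: command copied from the log (placeholder example)
python scripts/desktop_unittest.py --cfg unittests/linux_unittest.py --mochitest-suite plain1

# Steps 2 and 3: the same command with the developer config and artifact URLs appended
python scripts/desktop_unittest.py --cfg unittests/linux_unittest.py --mochitest-suite plain1 \
    --cfg developer_config.py \
    --installer-url http://.../firefox.en-US.linux-x86_64.tar.bz2 \
    --test-url http://.../firefox.en-US.linux-x86_64.tests.zip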


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

September 11, 2014 12:45 PM

Massimo Gerva (mgerva)

Canada, whale watching

We loved our stay in Canada; here are some pictures from our trip.

Let’s start with the amazing whale watching day in Vancouver:

September 11, 2014 10:12 AM

September 10, 2014

Kim Moir (kmoir)

Mozilla pushes - August 2014

Here's August 2014's monthly analysis of the pushes to our Mozilla development trees.  You can load the data as an HTML page or as a json file.



Trends
It was another record breaking month.  No surprise here!

Highlights

General Remarks
Both Try and Gaia-Try account for about 36% of the pushes each.  The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) account for around 21% of all the pushes.


Records
August 2014 was the month with the most pushes (13,090 pushes)
August 2014 has the highest pushes/day average with 620 pushes/day
July 2014 has the highest average of "pushes-per-hour" with 23.51 pushes/hour
August 20, 2014 had the highest number of pushes in one day with 690 pushes






September 10, 2014 02:09 PM

September 09, 2014

Nick Thomas (nthomas)

ZNC and Mozilla IRC

ZNC is great for having a persistent IRC connection, but it’s not so great when the IRC server or network has a blip. Then you can end up failing to rejoin with

nthomas (…) has joined #releng
nthomas has left … (Max SendQ exceeded)

over and over again.

The way to fix this is to limit the number of channels ZNC tries to join simultaneously. In the Web UI, change the 'Max Joins' preference to something like 5. In the config file, use 'MaxJoins = 5' in a <User foo> block.
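
For reference, the corresponding fragment of znc.conf might look something like this (a sketch — other settings omitted, and the exact layout varies between ZNC versions):

<User foo>
        MaxJoins = 5
</User>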

September 09, 2014 10:19 AM

September 08, 2014

Jordan Lund (jlund)

This Week In Releng - Sept 1st, 2014

Major Highlights:

Completed work (resolution is 'FIXED'):


In progress work (unresolved and not assigned to nobody):

September 08, 2014 04:58 AM

September 06, 2014

Hal Wine (hwine)

New Hg Server Status Page

Just a quick note to let folks know that the Developer Services team continues to make improvements on Mozilla’s Mercurial server. We’ve set up a status page to make it easier to check on current status.

As we continue to improve monitoring and status displays, you’ll always find the “latest and greatest” on this page. And we’ll keep the page updated with recent improvements to the system. We hope this page will become your first stop whenever you have questions about our Mercurial server.

September 06, 2014 07:00 AM

September 01, 2014

Nick Thomas (nthomas)

Deprecating our old rsync modules

We’ve removed the rsync modules mozilla-current and mozilla-releases today, after calling for comment a few months ago and hearing no objections. Those modules were previously used to deliver Firefox and other Mozilla products to end users via a network of volunteer mirrors but we now use content delivery networks (CDN). If there’s a use case we haven’t considered then please get in touch in the comments or on the bug.

September 01, 2014 10:09 PM

August 26, 2014

Chris AtLee (catlee)

Gotta Cache 'Em All

TOO MUCH TRAFFIC!!!!

Waaaaaaay back in February we identified overall network bandwidth as a cause of job failures on TBPL. We were pushing too much traffic over our VPN link between Mozilla's datacentre and AWS. Since then we've been working on a few approaches to cope with the increased traffic while at the same time reducing our overall network load. Most recently we've deployed HTTP caches inside each AWS region.

Network traffic from January to August 2014

The answer - cache all the things!

Obligatory XKCD

Caching build artifacts

The primary target for caching was downloads of build/test/symbol packages by test machines from file servers. These packages are generated by the build machines and uploaded to various file servers. The same packages are then downloaded many times by different machines running tests. This was a perfect candidate for caching, since the same files were being requested by many different hosts in a relatively short timespan.
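
As a rough illustration of the idea only — this is not Proxxy's actual configuration, and the hostnames are placeholders — a caching reverse proxy in front of a file server might look like this in nginx:

# http-level context: define an on-disk cache for downloaded artifacts
proxy_cache_path /var/cache/artifacts levels=1:2 keys_zone=artifacts:100m
                 max_size=20g inactive=24h;

server {
    listen 80;

    location / {
        # forward cache misses to the real file server (placeholder name)
        proxy_pass http://fileserver.example.com;
        proxy_cache artifacts;
        # keep successful package downloads for a day
        proxy_cache_valid 200 24h;
    }
}

Test machines would then point their download URLs at the cache host; the first request warms the cache, and subsequent downloads of the same package are served locally within the region.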

Caching tooltool downloads

Tooltool is a simple system RelEng uses to distribute static assets to build/test machines. While the machines do maintain a local cache of files, the caches are often empty because the machines are newly created in AWS. Having the files in local HTTP caches speeds up transfer times and decreases network load.

Results so far - 50% decrease in bandwidth

Initial deployment was completed on August 8th (end of week 32 of 2014). You can see by the graph above that we've cut our bandwidth by about 50%!

What's next?

There are a few more low hanging fruit for caching. We have internal pypi repositories that could benefit from caches. There's a long tail of other miscellaneous downloads that could be cached as well.

There are other improvements we can make to reduce bandwidth as well, such as moving uploads from build machines to be outside the VPN tunnel, or perhaps to S3 directly. Additionally, a big source of network traffic is doing signing of various packages (gpg signatures, MAR files, etc.). We're looking at ways to do that more efficiently. I'd love to investigate more efficient ways of compressing or transferring build artifacts overall; there is a ton of duplication between the build and test packages between different platforms and even between different pushes.

I want to know MOAR!

Great! As always, all our work has been tracked in a bug, and worked out in the open. The bug for this project is 1017759. The source code lives in https://github.com/mozilla/build-proxxy/, and we have some basic documentation available on our wiki. If this kind of work excites you, we're hiring!

Big thanks to George Miroshnykov for his work on developing proxxy.

August 26, 2014 02:21 PM

August 18, 2014

Jordan Lund (jlund)

This week in Releng - Aug 11th 2014

Completed work (resolution is 'FIXED'):


In progress work (unresolved and not assigned to nobody):

August 18, 2014 06:38 AM

August 12, 2014

Ben Hearsum (bhearsum)

Upcoming changes to Mac package layout, signing

Apple recently announced changes to how OS X applications must be packaged and signed in order for them to function correctly on OS X 10.9.5 and 10.10. The tl;dr version of this is “only mach-O binaries may live in .app/Contents/MacOS, and signing must be done on 10.9 or later”. Without any changes, future versions of Firefox will cease to function out-of-the-box on OS X 10.9.5 and 10.10. We do not have a release date for either of these OS X versions yet.

Changes required:
* Move all non-mach-O files out of .app/Contents/MacOS. Most of these will move to .app/Contents/Resources, but files that could legitimately change at runtime (eg: everything in defaults/) will move to .app/MozResources (which can be modified without breaking the signature): https://bugzilla.mozilla.org/showdependencytree.cgi?id=1046906&hide_resolved=1. This work is in progress, but no patches are ready yet.
* Add new features to the client side update code to allow partner repacks to continue to work. (https://bugzilla.mozilla.org/show_bug.cgi?id=1048921)
* Create and use 10.9 signing servers for these new-style apps. We still need to use our existing 10.6 signing servers for any builds without these changes. (https://bugzilla.mozilla.org/show_bug.cgi?id=1046749 and https://bugzilla.mozilla.org/show_bug.cgi?id=1049595)
* Update signing server code to support new v2 signatures.

Timeline:
We are intending to ship the required changes with Gecko 34, which ships on November 25th, 2014. The changes required are very invasive, and we don’t feel that they can be safely backported to any earlier version quickly enough without major risk of regressions. We are still looking at whether or not we’ll backport to ESR 31. To this end, we’ve asked that Apple whitelist Firefox and Thunderbird versions that will not have the necessary changes in them. We’re still working with them to confirm whether or not this can happen.

This has been cross posted a few places – please send all follow-ups to the mozilla.dev.platform newsgroup.

August 12, 2014 05:05 PM

August 11, 2014

Jordan Lund (jlund)

This Week In Releng - Aug 4th, 2014

Major Highlights:

Completed work (resolution is 'FIXED'):

In progress work (unresolved and not assigned to nobody):

August 11, 2014 01:09 AM

August 08, 2014

Kim Moir (kmoir)

Mozilla pushes - July 2014

Here's the July 2014 monthly analysis of the pushes to our Mozilla development trees. You can load the data as an HTML page or as a json file.
 
Trends
Like every month for the past while, we had a new record number of pushes. In reality, given that July is one day longer than June, the numbers are quite similar.

Highlights


General remarks
Try continues to account for around 38% of all the pushes. Gaia-Try is in second place with around 31% of pushes.  The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) account for around 22% of all the pushes.

Records 
July 2014 was the month with the most pushes (12,755 pushes)
June 2014 has the highest pushes/day average with 662 pushes/day
July 2014 has the highest average of "pushes-per-hour" with 23.51 pushes/hour
June 4th, 2014 had the highest number of pushes in one day with 662 
 

August 08, 2014 06:16 PM

August 07, 2014

Kim Moir (kmoir)

Scaling mobile testing on AWS

Running tests for Android at Mozilla has typically meant running on reference devices: physical devices that run jobs on our continuous integration farm via test harnesses.  However, this leads to the same problem that we have for other tests that run on bare metal: we can't scale up our capacity without buying new devices, racking them, configuring them for the network and updating our configurations.  In addition, reference cards, rack mounted or not, are rather delicate creatures and have higher retry rates (tests fail due to infrastructure issues and need to be rerun) than those running on emulators (an Android emulator in a VM on bare metal or in the cloud).

Do Android's Dream of Electric Sheep?  ©Bill McIntyre, Creative Commons by-nc-sa 2.0
Recently, we started running Android 2.3 tests on emulators in AWS.  This works well for unit tests (correctness tests).  It's not really appropriate for performance tests, but that's another story.  The impetus behind this change was so we could decommission Tegras, the reference devices we used for running Android 2.2 tests.

We run many Linux based tests, including Android emulators, on AWS spot instances.  Spot instances are AWS excess capacity that you can bid on.  If someone outbids the price you have paid for your spot instance, your instance can be terminated.  But that's okay, because we retry jobs if they fail for infrastructure reasons.  The overall percentage of spot instances that are terminated is quite small.  The huge advantage to using spot instances is price.  They are much cheaper than on-demand instances, which has allowed us to increase our capacity while continuing to reduce our AWS bill.

We have a wide variety of unit tests that run on emulators for mobile on AWS.  We encountered an issue where some of the tests wouldn't run on the default instance type (m1.medium) that we use for our spot instances.  Given the number of jobs we run, we want to run on the cheapest AWS instance type where the tests will complete successfully.  At the time we first tested it, we couldn't find an instance type where certain CPU/memory intensive tests would run.  So when I first enabled Android 2.3 tests on emulators, I separated the tests so that some would run on AWS spot instances and the ones that needed a more powerful machine would run on our in-house Linux capacity.  But this change consumed all of the capacity of that pool and we had a very high number of pending jobs.  This meant that people had to wait a long time for their test results.  Not good.

To reduce the pending counts, we needed to buy more in-house Linux capacity, run only a selected subset of the tests that need more resources, or find an AWS instance type where they would complete successfully.  Geoff from the A-Team re-ran the tests on the c3.xlarge instance type he had tried before, and this time they passed.  In his earlier work the tests did not complete successfully on this instance type, and we are unsure why.  One of the things about working with AWS is that we don't have a window into the bugs that they fix at their end.  So this particular instance type didn't work before, but it does now.

The next steps for me were to create a new AMI (Amazon machine image) that would serve as the "golden" version for instances created in this pool.  Previously, we used Puppet to configure our AWS test machines, but now we just regenerate the AMI every night via cron and this is the version that's instantiated.  The AMI was a copy of the existing Ubuntu64 image that we have, but configured to run on the c3.xlarge instance type instead of m1.medium. This was a bit tricky because I had to exclude regions where the c3.xlarge instance type was not available.  For redundancy (to still have capacity if an entire region goes down) and cost (some regions are cheaper than others), we run instances in multiple AWS regions.

Once I had the new AMI up that would serve as the template for our new slave class, I created a slave with the AMI and verified running the tests we planned to migrate on my staging server.  I also enabled two new Linux64 buildbot masters in AWS to service these new slaves, one in us-east-1 and one in us-west-2.  When enabling a new pool of test machines, it's always good to look at the load on the current buildbot masters and see if additional masters are needed so the current masters aren't overwhelmed with too many slaves attached.

After the tests were all green, I modified our configs to run this subset of tests on a branch (ash), enabled the slave platform in Puppet and added a pool of devices to this slave platform in our production configs.  After the reconfig deployed these changes into production, I landed a regular expression in watch_pending.cfg so that the new tst-emulator64-spot pool of machines would be allocated to the subset of tests and the branch I enabled them on. The watch_pending.py script watches the number of pending jobs on AWS and creates instances as required.  We also have scripts to terminate or stop idle instances when we don't need them.  Why pay for machines when you don't need them now?  After the tests ran successfully on ash, I enabled running the tests on the other relevant branches.

Royal Border Bridge.  Also, release engineers love to see green builds and tests.  ©Jonathan Combe, Creative Commons by-nc-sa 2.0
The end result is that some Android 2.3 tests, such as mochitests, run on m1.medium (tst-linux64-spot) instances.



And some Android 2.3 tests, such as crashtests, run on c3.xlarge (tst-emulator64-spot) instances.

 

In enabling this slave class within our configs, we were also able to reuse it for some b2g tests which also faced the same problem where they needed a more powerful instance type for the tests to complete.

Lessons learned:
Use the minimum (cheapest) instance type required to complete your tests
As usual, test on a branch before full deployment
Scaling mobile tests doesn't mean more racks of reference cards

Future work:
Bug 1047467 c3.xlarge instance types are expensive, let's test running those tests on a range of instance types that are cheaper

Further reading:
AWS instance types 
Chris Atlee wrote about how we Now Use AWS Spot Instances for Tests
Taras Glek wrote How Mozilla Amazon EC2 Usage Got 15X Cheaper in 8 months
Rail Aliiev http://rail.merail.ca/posts/firefox-builds-are-way-cheaper-now.html 
Bug 980519 Experiment with other instance types for Android 2.3 jobs 
Bug 1024091 Address high pending count in in-house Linux64 test pool 
Bug 1028293 Increase Android 2.3 mochitest chunks, for aws 
Bug 1032268 Experiment with c3.xlarge for Android 2.3 jobs
Bug 1035863 Add two new Linux64 masters to accommodate new emulator slaves
Bug 1034055 Implement c3.xlarge slave class for Linux64 test spot instances
Bug 1031083 Buildbot changes to run selected b2g tests on c3.xlarge
Bug 1047467 c3.xlarge instance types are expensive, let's try running those tests on a range of instance types that are cheaper

August 07, 2014 06:24 PM

August 04, 2014

Jordan Lund (jlund)

This Week In Releng - July 28th, 2014

Major Highlights:

Completed Work (marked as resolved):

In progress work (unresolved and not assigned to nobody):

August 04, 2014 04:22 PM

July 28, 2014

Kim Moir (kmoir)

2014 USENIX Release Engineering Summit CFP now open

The CFP for the 2014 Release Engineering summit (Western edition) is now open.  The deadline for submissions is September 5, 2014 and speakers will be notified by September 19, 2014.  The program will be announced in late September.  This one day summit on all things release engineering will be held in concert with LISA, in Seattle on November 10, 2014. 

Seattle skyline © Howard Ignatius, https://flic.kr/p/6tQ3H Creative Commons by-nc-sa 2.0


From the CFP


"URES '14 West is looking for relevant and engaging speakers and workshop facilitators for our event on November 10, 2014, in Seattle, WA. URES brings together people from all areas of release engineering—release engineers, developers, managers, site reliability engineers, and others—to identify and help propose solutions for the most difficult problems in release engineering today. Suggestions for topics include (but are not limited to):"

War and horror stories. I like to see that in a CFP.  Stories about how you overcame problems with infrastructure and tooling to ship software are the best kind.  They make people laugh. Maybe cry, as they realize they are currently living in that situation.  Good times.  Also, I think talks around scaling high volume continuous integration farms will be interesting.  Scaling issues are a lot of fun and expose many issues you don't see when you're only running a few builds a day.

If you have any questions surrounding the CFP, I'm happy to help as I'm on the program committee.   (my irc nick is kmoir (#releng) as is my email id at mozilla.com)

July 28, 2014 09:28 PM

July 25, 2014

Aki Sasaki (aki)

on leaving mozilla

Today's my last day at Mozilla. It wasn't an easy decision to move on; this is the best team I've been a part of in my career. And working at a company with such idealistic principles and the capacity to make a difference has been a privilege.

Looking back at the past five-and-three-quarter years:

I will stay a Mozillian, and I'm looking forward to seeing where we can go from here!



July 25, 2014 07:26 PM

July 18, 2014

Kim Moir (kmoir)

Reminder: Release Engineering Special Issue submission deadline is August 1, 2014

Just a friendly reminder that the deadline for the Release Engineering Special Issue is August 1, 2014.  If you have any questions about the submission process or a topic you'd like to write about, the guest editors, including myself, are happy to help you!

July 18, 2014 10:03 PM

Mozilla pushes - June 2014

Here's June 2014's analysis of the pushes to our Mozilla development trees. You can load the data as an HTML page or as a json file.

Trends
This was another record breaking month with a total of 12,534 pushes.  As a note of interest, this is over double the number of pushes we had in June 2013. So big kudos to everyone who helped us scale our infrastructure and tooling.  (Actually, we had 6,433 pushes in April 2013, which would make this slightly less than double, because June 2013 was a bit of a dip.  But still impressive :-)

Highlights

General Remarks
The introduction of Gaia-Try in April has been very popular; it comprised around 30% of pushes in June, compared to 29% last month.
The Try branch itself consisted of around 38% of pushes.
The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) account for around 21% of all the pushes, compared to 22% in the previous month.

Records
June 2014 was the month with the most pushes (12,534 pushes)
June 2014 has the highest pushes/day average with 418 pushes/day
June 2014 has the highest average of "pushes-per-hour" with 23.17 pushes/hour
June 4th, 2014 had the highest number of pushes in one day with 662 pushes





July 18, 2014 09:46 PM

Massimo Gerva (mgerva)

apache rewrite rules

problem:

always serve the content from an external web server unless the content is available locally:

RewriteEngine on
# -U tests whether the URI would resolve locally (via an internal subrequest);
# when it does not, send the request to the external server instead
RewriteCond %{REQUEST_URI} !-U
RewriteRule ^(.+) http://example.com/$1

thanks mod_rewrite!


July 18, 2014 06:53 PM

July 15, 2014

Armen Zambrano G. (@armenzg)

Developing with GitHub and remote branches

I have recently started contributing with Git, using GitHub, to the Firefox OS certification suite.

It has been interesting switching from Mercurial to Git. I honestly believed it would be more straightforward, but I have to re-read things again and again until the new ways sink in.

jgraham shared some notes with me (thanks!) about what his workflow looks like, and I want to document it for my own sake and perhaps yours:
git clone git@github.com:mozilla-b2g/fxos-certsuite.git

# Time passes

# To develop something on master
# Pull in all the new commits from master

git fetch origin

# Create a new branch (this will track master from origin,
# which we don't really want, but that will be fixed later)

git checkout -b my_new_thing origin/master

# Edit some stuff

# Stage it and then commit the work

git add -p
git commit -m "New awesomeness"

# Push the work to a remote branch
git push --set-upstream origin HEAD:jgraham/my_new_thing

# Go to the GH UI and start a pull request

# Fix some review issues
git add -p
git commit -m "Fix review issues" # or use --fixup

# Push the new commits
git push

# Finally, the review is accepted
# We could rebase at this point, however,
# we tend to use the Merge button in the GH UI
# Working off a different branch is basically the same,
# but you replace "master" with the name of the branch you are working off.
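
The --fixup option mentioned in the comments above can be confusing the first time, so here is a self-contained sketch in a throwaway repository. The paths, names, and the non-interactive GIT_SEQUENCE_EDITOR trick are my additions, not part of jgraham's notes:

```shell
# throwaway repo so the demo is safe to run anywhere
rm -rf /tmp/fixup-demo
git init -q /tmp/fixup-demo
cd /tmp/fixup-demo
git config user.email demo@example.com
git config user.name Demo

echo one > file
git add file
git commit -qm "New awesomeness"

# address review comments, then mark the new commit as a fixup
echo two > file
git add file
git commit -q --fixup HEAD

# squash the fixup into the original commit non-interactively:
# GIT_SEQUENCE_EDITOR=true accepts the autosquash-generated todo as-is
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root

git log --oneline   # a single "New awesomeness" commit remains
```

After the rebase, the review fixes are folded into the original commit, which keeps the history clean if you choose to rebase before merging instead of using the Merge button.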


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

July 15, 2014 09:04 PM

July 14, 2014

Massimo Gerva (mgerva)

bash magic

I found this snippet in one of our startup scripts:

<command>
ret=$?
return $?

Our scripts worked fine for months, but then some random errors appeared. The problem with the above code is that it will always return 0, regardless of how <command> exited.

ret (an unused variable) stores the exit code of <command>, but then the script returns $?.

The second $? refers to the exit status of the variable assignment (which is always 0), not the exit code of <command>.

Here is an updated (and working) version of the code:

<command>
return $?

Note to self: remember to remove all unused bash variables.
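
To make the failure mode concrete, here is a minimal runnable sketch (run_and_report is a hypothetical function, and false stands in for a failing <command>):

```shell
# demonstrates capturing an exit code so it survives later commands
run_and_report() {
  false              # stand-in for <command>; exits with code 1
  ret=$?             # capture the exit code immediately
  echo "command exited with $ret"
  return $ret        # return the captured code, not the $? of echo
}

run_and_report || echo "caller saw $?"   # prints: caller saw 1
```

The key point is that $? must be captured on the very next line after the command whose status you care about; any command in between (even an assignment) overwrites it.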


July 14, 2014 11:33 PM

July 11, 2014

Armen Zambrano G. (@armenzg)

Introducing HTTP authentication for Mozharness

A while ago, I asked a colleague (you know who you are! :P) of mine how to run a specific type of test job on tbpl on my local machine and he told me with a smirk, "With mozharness!"

I wanted to punch him (HR: nothing to see here! This is not a literal punch, but a figurative one), however he was right. He had good reason to say that, and I knew why he was smiling. I had to close my mouth and take it.

Here's the explanation of why he said that: most jobs running inside of tbpl are driven by Mozharness, however they're optimized to run within the protected network of Release Engineering. This is good. This is safe. This is sound. However, when we try to reproduce a job outside of the Releng network, it becomes problematic for various reasons.

Many times we have had to guide people who are unfamiliar with mozharness through running it locally (docs: How to run Mozharness as a developer). However, on other occasions, when it comes to binaries stored on private web hosts, it becomes necessary to loan a machine. A loaned machine can reach those files through internal domains since it is hosted within the Releng network.

Today, I have landed a piece of code that does two things:
This change, plus the recently-introduced developer configs for Mozharness, makes it much easier to run mozharness outside of continuous integration infrastructure.

I hope this will help developers have a better experience reproducing the environments used in the tbpl infrastructure. One less reason to loan a machine!

This makes me *very* happy (see below) since I don't have VPN access anymore.




Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

July 11, 2014 07:42 PM

Using developer configs for Mozharness

To help developers run mozharness, I have landed some configs that can be appended to the command that appears on tbpl.

All you have to do is:
  • Find the mozharness script line in a log from tbpl (search for "script/scripts")
  • Look for the --cfg parameter and add it a second time, with the value ending in "_dev.py"
    • e.g. --cfg android/androidarm.py --cfg android/androidarm_dev.py
  • Also add the --installer-url and --test-url parameters as explained in the docs
Developer configs have these things in common:
  • They have the same name as the production one but instead end in "_dev.py"
  • They overwrite the "exes" dict with an empty dict
    • This allows you to use the binaries in your personal $PATH
  • They overwrite the "default_actions" list
    • The main reason is to remove the action called read-buildbot-configs
  • They fix URLs to point to the right publicly reachable domains 
Here are the currently available developer configs:
You can help by adding more of them!
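
Put together, a developer invocation might look like the following; the script name, config paths, and URLs are illustrative placeholders based on the androidarm example, not exact values:

```shell
python scripts/scripts/android_emulator_unittest.py \
  --cfg android/androidarm.py \
  --cfg android/androidarm_dev.py \
  --installer-url https://example.com/builds/fennec.apk \
  --test-url https://example.com/builds/fennec.tests.zip
```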















Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

July 11, 2014 07:15 PM

July 04, 2014

Kim Moir (kmoir)

This week in Mozilla Releng - July 4, 2014

This is a special double issue of This Week in Releng. I was so busy last week that I didn't get a chance to post it. Despite the fireworks for Canada Day and Independence Day, Mozilla release engineering managed to close some bugs.

Major highlights:
 Completed work (resolution is 'FIXED'):
In progress work (unresolved and not assigned to nobody):

July 04, 2014 09:39 PM

July 03, 2014

Armen Zambrano G. (@armenzg)

Tbpl's blobber uploads are now discoverable

What is blobber? Blobber is a server- and client-side set of tools that allows Releng's test infrastructure to upload files without needing ssh keys deployed on the test machines.

This is useful since it allows uploads of screenshots, crashdumps and any other file needed to debug what failed on a test job.

Up until now, if you wanted your scripts to determine which files were uploaded in a job, you had to download the log and parse it to find the TinderboxPrint lines for Blobber uploads, e.g.
15:21:18 INFO - (blobuploader) - INFO - TinderboxPrint: Uploaded 70485077-b08a-4530-8d4b-c85b0d6f9bc7.dmp to http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/5778e0be8288fe8c91ab69dd9c2b4fbcc00d0ccad4d3a8bd78d3abe681af13c664bd7c57705822a5585655e96ebd999b0649d7b5049fee1bd75a410ae6ee55af
Now you can find the set of uploaded files by looking at the uploaded_files.json manifest that we upload at the end of all uploads. Its location can be discovered by inspecting the buildjson files or by listening to the pulse events; the key used is called "blobber_manifest_url", e.g.
"blobber_manifest_url": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/39e400b6b94ac838b4e271ed61a893426371990f1d0cc45a7a5312d495cfdb485a1866d7b8012266621b4ee4df0cf9aa7d0f6d0e947ff63785543d80962aaf9b",
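A consumer script can then read the manifest instead of parsing logs. The sketch below fakes a manifest locally; the assumption that it maps uploaded file names to their URLs is mine, so check the real schema before relying on it:

```shell
# create a stand-in for an uploaded_files.json manifest (assumed shape:
# file name -> public URL); a real script would fetch blobber_manifest_url
cat > uploaded_files.json <<'EOF'
{"70485077-b08a-4530-8d4b-c85b0d6f9bc7.dmp": "http://mozilla-releng-blobs.s3.amazonaws.com/blobs/example"}
EOF

# list the uploaded file names, one per line, without touching any logs
python3 -c 'import json; print("\n".join(json.load(open("uploaded_files.json"))))'
```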
In the future, this feature will be useful when we start uploading structured logs. It will help us not to download logs to extract meta-data about the jobs!

No, your uploads are not this ugly
This work was completed in bug 986112. Thanks to aki, catlee, mtabara and rail to help me get this out the door. You can read more about Blobber by visiting: "Blobber is live - upload ALL the things!" and "Blobber - local environment setup".


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

July 03, 2014 12:02 PM

July 02, 2014

Hal Wine (hwine)

2014-06 try server update

Chatting with Aki the other day, I realized that word of all the wonderful improvements to the try server has not been publicized. A lot of folks have done a lot of work to make things better – here’s a brief summary of the good news.

Before:
Try server pushes could appear to take up to 4 hours, during which time others would be locked out.
Now:
The major time-taker has been found and eliminated: ancestor processing. And we understand the remaining occasional slowdowns are related to caching. Fortunately, there are some steps that developers can take now to minimize delays.

What folks can do to help

The biggest remaining slowdown is caused by rebuilding the cache. The cache is only invalidated if the push is interrupted. If you can avoid causing a disconnect until your push is complete, that helps everyone! So, please, no Ctrl-C during the push! The other changes should address the long wait times you used to see.

What has been done to infrastructure

There has long been a belief that many of our hg problems, especially on try, came from the fact that we had r/w NFS mounts of the repositories across multiple machines (both hgssh servers & hgweb servers). For various historical reasons, a large part of this was due to the way pushlog was implemented.

Ben did a lot of work to get sqlite off NFS, and much of the work to synchronize the repositories without NFS has been completed.

What has been done to our hooks

All along, folks have been discussing our try server performance issues with the hg developers. A key confusing issue was that we saw processes “hang” for VERY long times (45 min or more) without making a system call. Kendall managed to observe an hg process in such an infinite-looking-loop-that-eventually-terminated a few times. A stack trace would show it was looking up an hg ancestor without making system calls or library accesses. In discussions, this confused the hg team, as they did not know of any reason the ancestor code should be invoked during a push.

Thanks to lots of debugging help from glandium one evening, we found and disabled a local hook that invoked the ancestor function on every commit to try. \o/ team work!

Caching – the remaining problem

With the ancestor-invoking-hook disabled, we still saw some longish periods of time where we couldn’t explain why pushes to try appeared hung. Granted it was a much shorter time, and always self corrected, but it was still puzzling.

A number of our old theories, such as “too many heads” were discounted by hg developers as both (a) we didn’t have that many heads, and (b) lots of heads shouldn’t be a significant issue – hg wants to support even more heads than we have on try.

Greg did a wonderful bit of sleuthing to find the impact of ^C during push. Our current belief is once the caching is fixed upstream, we’ll be in a pretty good spot. (Especially with the inclusion of some performance optimizations also possible with the new cache-fixed version.)

What is coming next

To take advantage of all the good stuff upstream Hg versions have, including the bug fixes we want, we’re going to be moving towards removing roadblocks to staying closer to the tip. Historically, we had some issues due to http header sizes and load balancers; ancient python or hg client versions; and similar. The client issues have been addressed, and a proper testing/staging environment is on the horizon.

There are a few competing priorities, so I’m not going to predict a completion date. But I’m positive the future is coming. I hope you have a glimpse into that as well.

July 02, 2014 07:00 AM

July 01, 2014

Armen Zambrano G. (@armenzg)

Down Memory Lane

It was cool to find an article from "The Senecan" which talks about how, through Seneca, Lukas and I got involved with and were hired by Mozilla. Here's the article.



Here's an excerpt:
From Mozilla volunteers to software developers 
It pays to volunteer for Mozilla, at least it did for a pair of Seneca Software Development students. 
Armen Zambrano and Lukas Sebastian Blakk are still months away from graduating, but that hasn't stopped the creators behind the popular web browser Firefox from hiring them. 
When they are not in class learning, the Senecans will be doing a wide range of software work on the company’s browser including quality testing and writing code. “Being able to work on real code, with real developers has been invaluable,” says Lukas. “I came here to start a new career as soon as school is done, and thanks to the College’s partnership with Mozilla I've actually started it while still in school. I feel like I have a head start on the path I've chosen.”  
Firefox is a free open source web browser that can...



Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

July 01, 2014 05:58 PM

June 30, 2014

Nick Thomas (nthomas)

Keeping track of buildbot usage

Mozilla Release Engineering provides some simple trending of the Buildbot continuous integration system, which can be useful to check how many jobs are currently running versus pending. There are graphs of the last 24 hours broken out in various ways – for example compilation separate from tests, compilation on try and everything else. This data also feeds into the pending queue on trychooser.

Until recently the mapping of job name to machine pool was out of date, due to our rapid growth for b2g and into Amazon’s AWS, so the graphs were more misleading than useful. This has now been corrected and I’m working on making sure it stays up to date automatically.

Update: Since July 18 the system stays up to date automatically, in just about all cases.

June 30, 2014 04:31 AM

June 24, 2014

Chris AtLee (catlee)

B2G now building using unified sources

Last week, with the help of Ehsan and John, we finally enabled unified source builds for B2G devices.

As a result we're building device builds approximately 40% faster than before.

Between June 12th and June 17th, 50% of our successful JB emulator builds on mozilla-inbound finished in 97 minutes or less. Using unified sources for these builds reduced the 50th percentile of build times down to 60 minutes (from June 19th to June 24th).

To mitigate the risks of changes landing that break non-unified builds, we're also doing periodic non-unified builds for these devices.

As usual, all our work here was done in the open. If you're interested, read along in bug 950676, and bug 942167.

Do you enjoy building, debugging and optimizing build, test & release pipelines? Great, because we're hiring!

June 24, 2014 07:23 PM

June 20, 2014

Kim Moir (kmoir)

Introducing Mozilla Releng's summer interns

The Mozilla Release Engineering team recently welcomed three interns to our team for the summer.

Ian Connolly is a student at Trinity College in Dublin. This is his first term with Mozilla and he's working on preflight slave tasks and an example project for Releng API.
Andhad Jai Singh is a student at Indian Institute of Technology Hyderabad.  This is his second term working at Mozilla, he was a Google Summer of Code student with the Ateam last year.  This term he's working on generating partial updates on request.
John Zeller is also a returning student and studies at Oregon State University.  He previously had a work term with Mozilla releng, and worked during the past school term as a student worker implementing Mozilla Releng apps in Docker. This term he'll work on updating our ship-it application so that release automation updates it more frequently, letting us see the state of a release, as well as integrating post-release tasks.

 
View from Mozilla San Francisco Office

Please drop by and say hello to them if you're in our San Francisco office.  Or say hello to them in #releng - their irc nicknames are ianconnolly, ffledgling and zeller respectively.

Welcome!

June 20, 2014 09:24 PM

This week in Mozilla Releng - June 20, 2014

Ben is away for the next few Fridays, so I'll be covering this blog post for the next couple of weeks.

Major highlights:


Completed work (resolution is 'FIXED'):
In progress work (unresolved and not assigned to nobody):

June 20, 2014 09:23 PM

Armen Zambrano G. (@armenzg)

My first A-team project: install all the tests!


As a welcome bug on the A-team, I had to deal with changing what tests get packaged.
The goal was to include all tests in tests.zip, regardless of whether they are marked as disabled in the test manifests.

Changing the packaging was not too difficult, as I already had pointers from jgriffin; the problem came with the runners.
The B2G emulator and desktop mochitest runners did not read the manifests; they simply ran all tests that came inside tests.zip (even disabled ones).

Unfortunately for me, the mochitest runners' code is very old and it was hard to figure out how to make it work as cleanly as possible. I made a lot of mistakes and landed it incorrectly twice (an improper try landing, and I lost my good patch somewhere) - sorry Ryan!

After a lot of tweaking it, reviews from jmaher and help from ted & ahal, it landed last week.

For more details you can read bug 989583.

PS: Using trigger_arbitrary_builds.py was priceless for speeding up my development.


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

June 20, 2014 08:06 PM

June 19, 2014

John Zeller (zeller)

Tupperware: Mozilla apps in Docker!

Announcing Tupperware, a setup for Mozilla apps in Docker! Tupperware is portable, reusable, and containerized. But unlike typical tupperware, please do not put it in the microwave.

Tupperware

Why?

This is a project born out of a need to lower the barriers to entry for new contributors to Release Engineering (RelEng) maintained apps and services. Historically, RelEng has had greater difficulty attracting community contributors than other parts of Mozilla, due in large part to how much knowledge is needed to get going in the first place. For a new contributor, it can be quite overwhelming to jump into any of the code bases that RelEng maintains, which often means quickly losing that new contributor to exasperation. Beyond new contributors, Tupperware is great for experienced contributors as well, helping to keep an unpolluted development environment and to test patches.

What?

Currently Tupperware includes the following Mozilla apps:

BuildAPI – a Pylons project used by RelEng to surface information collected from two databases updated through our buildbot masters as they run jobs.

BuildBot – a job (read: builds and tests) scheduling system to queue/execute jobs when the required resources are available, and reporting the results.

Dependency apps currently included:

RabbitMQ – a messaging queue used by RelEng apps and services

MySQL – Forked from orchardup/mysql

How?

Vagrant is used as a quick and easy way to provision the docker apps and make the setup truly plug n’ play. The current setup only has a single Vagrantfile which launches BuildAPI and BuildBot, with their dependency apps RabbitMQ and MySQL.

How to run:

– Install Vagrant 1.6.3

– hg clone https://hg.mozilla.org/build/tupperware/ && cd tupperware && vagrant up (takes >10 minutes the first time)

Where to see apps:

– BuildAPI: http://127.0.0.1:8888/

– BuildBot: http://127.0.0.1:8000/

– RabbitMQ Management: http://127.0.0.1:15672/

Troubleshooting tips are available in the Tupperware README.

What’s Next?

Now that Tupperware is out there, it’s open to contributors! The setup does not need to stay solely usable for RelEng apps and services. So please submit bugs to add new ones! There are a few ideas for adding functionality to Tupperware already:

Have ideas? Submit a bug!

June 19, 2014 12:00 AM

June 16, 2014

Ben Hearsum (bhearsum)

June 17th Nightly/Aurora updates of Firefox, Fennec, and Thunderbird will be slightly delayed

As part of the ongoing work to move our Betas and Release builds to our new update server, I’ll be landing a fairly invasive change to it today. Because it requires a new schema for its data, updates will be slightly delayed while the data repopulates in the new format as the nightlies stream in. While that’s happening, updates will continue to point at the builds from today (June 16th).

Once bug 1026070 is fixed, we will be able to do this sort of upgrade without any delay to users.

June 16, 2014 07:05 PM

How to not get spammed by Bugzilla

Bugmail is a running joke at Mozilla. Nearly everyone I know that works with Bugzilla (especially engineers) complains about the amount of bugmail they get. I too suffered from this problem for years, but with some tweaks to preferences and workflow, this problem can be solved. Here’s how I do it:

E-mail preferences

Here’s what my full e-mail settings look like:

And here’s my Zimbra filter for changes made by me (I think the “from” header part is probably unnecessary, though):

Workflow

This section is mostly just an advertisement for the “My Dashboard” feature on Mozilla’s Bugzilla. By default, it shows you your assigned bugs, requested flags, and flags requested of you. Look at it at regular intervals (I try to restrict myself to once in the morning, and once before my EOD), particularly the “flags requested of you” section.

The other important thing is to generally stop caring about a bug unless it’s either assigned to you, or there’s a flag requested of you specifically. This ties in to some of the e-mail pref changes above. Changing my default state from “I must keep track of all bugs I might care about” to “I will keep track of my bugs & my requests, and opt-in to keeping tracking of anything else” is a shift in mindset, but a game changer when it comes to the amount of e-mail (and cognitive load) that Bugzilla generates.

With these changes it takes me less than 15 minutes to go through my bugmail every morning (even on Mondays). I can even ignore it at times, because “My Dashboard” will make sure I don’t miss anything critical. Big thanks to the Bugzilla devs who made some of these new things possible, particularly glob and dkl. Glob also mentioned that even more filtering possibilities are being made possible by bug 990980. The preview he sent me looks infinitely customizable:

June 16, 2014 01:11 PM

June 13, 2014

Ben Hearsum (bhearsum)

This week in Mozilla RelEng – June 13th, 2014 – *double edition*

I spaced and forgot to post this last week, so here’s a double edition covering everything so far this month. I’ll also be away for the next 3 Fridays, and Kim volunteered to take the reins in my stead. Now, on with it!

Major highlights:

Completed work (resolution is ‘FIXED’):

In progress work (unresolved and not assigned to nobody):

June 13, 2014 04:45 PM

June 11, 2014

Armen Zambrano G. (@armenzg)

Who doesn't like cheating on the Try server?

Have you ever forgotten about adding a platform to your Try push and had to push again?
Have you ever wished to *just* make changes to a tests.zip file without having to build it first?
Well, this is your lucky day!

In this wiki page, I describe how to trigger arbitrary jobs on your try push.
As always be gentle with how you use it as we all share the resources.

Go crazy!













Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

June 11, 2014 04:23 PM

June 10, 2014

Kim Moir (kmoir)

Talking about speaking up

We all interpret life through the lens of our previous experiences.  It's difficult to understand what each day is like for someone whose life has been fundamentally different from your own, because you simply haven't had those experiences.  I don't understand what it's like to transition from male to female while involved in an open source community.  I don't know the steps taken to become an astrophysicist.  To embark to a new country as an immigrant.  I haven't struggled to survive on the streets as a homeless person.  Or as a person who has been battered by domestic abuse.  To understand the experiences of others, all we can do is listen and learn, with empathy.

There have been many news stories recently about women or other underrepresented groups in technology.   I won't repeat them because frankly, they're quite depressing.  They go something like this:
1.  Incident of harassment/sexism either online/at a company/in a community/at a conference
2.  People call out this behaviour online and ask the organization to apologize and take steps to prevent this in the future.
3.  People from underrepresented groups who speak up about behaviour are told that their feelings are not valid or they are overreacting.  Even worse, they are harassed online with hateful statements telling them they don't belong in tech or are threatened with sexual assault or other acts of violence.
4.  Company/community/conference apologizes and issue written statement. Or not.
5. Goto 1


I watched an extraordinary talk the other day that really provided a vivid perspective about the challenges that women in technology face and what people can do to help. Brianna Wu is head of development at Giant Spacekat, a game development company.  She gave the talk "Nine ways to stop hurting and start helping women in tech" at AltConf last week.  She is brutally honest with the problems that exist in our companies and communities, and the steps forward to make it better. 




She talks about how she is threatened and harassed online. She also discusses how random people threatening you on the internet is not just theoretical, but really frightening, because she knows it could result in actual physical violence.   The same thing applies to street harassment. 

Here's the thing about being a woman.  I'm a physically strong person. I can run.  But I'm keenly aware that men are almost always bigger than me, and by basic tenets of physiology, stronger than me. So if a man tried to physically attack me, chances are I'd lose that fight.  So when someone threatens you, online or not, it is profoundly frightening because you fear for your physical safety. And to have that happen over and over again, like many women in our industry experience, apart from being terrifying, is exhausting and has a huge emotional toll.

I was going to summarize the points she brings up in her talk but she speaks so powerfully that all I can do is encourage you to watch the talk.

One of her final points really drives home the need for change in our industry when she says to the audience "This is not a problem that women can solve on their own....If you talk to your male friends out there, you guys have a tremendous amount of power as peers.  To talk to them and say, look dude this isn't okay.  You can't do this, you can't talk this way.  You need to think about this behaviour. You guys need to make a difference in a way that I can't."  Because when she talks about this behaviour to men, it often goes in one ear and out the next.  To be a ally in any sense of the word, you need to speak up.

THIS 1000x THIS.

Thank you Brianna for giving this talk.  I hope that when others see it they will gain some insight and feel some empathy on the challenges that women, and other underrepresented groups in the technology industry face.  And that you will all speak up too.

Further reading
Ashe Dryden's The 101-Level Reader: Books to Help You Better Understand Your Biases and the Lived Experiences of People                                                                                                           
Ashe Dryden Our most wicked problem

June 10, 2014 01:30 AM

June 04, 2014

Ben Hearsum (bhearsum)

More on “How far we’ve come”

After I posted “How far we’ve come” this morning a few people expressed interest in what our release process looked like before, and what it looks like now.

The earliest recorded release process I know of was called the “Unified Release Process”. (I presume “unified” comes from unifying the ways different release engineers did things.) As you can see, it’s a very lengthy document, with lots of shell commands to tweak/copy/paste. A lot of the things that get run are actually scripts that wrap some parts of the process – so it’s not as bad as it could’ve been.

I was around for much of the improvements to this process. A while back I wrote a series of blog posts detailing some of them. For those interested, you can find them here:

I haven’t gotten around to writing a new one for the most recent version of the release automation, but if you compare our current Checklist to the old Unified Release Process, I’m sure you can get a sense of how much more efficient it is. Basically, we have push-button releases now. Fill in some basic info, push a button, and a release pops out:

June 04, 2014 06:57 PM

How far we’ve come

When I joined Mozilla’s Release Engineering team (Build & Release at the time) back in 2007, the mechanics of shipping a release were a daunting task with zero automation. My earliest memories of doing releases are ones where I get up early, stay late, and spend my entire day on the release. I logged onto at least 8 different machines to run countless commands, sometimes forgetting to start “screen” and losing work due to a dropped network connection.

Last night I had a chat with Nick. When we ended the call I realized that the Firefox 30.0 release builds had started mid-call – completely without us. When I checked my e-mail this morning I found that the rest of the release build process had completed without issue or human intervention.

It’s easy to get bogged down thinking about current problems. Times like this make me realize that sometimes you just need to sit down and recognize how far you’ve come.

June 04, 2014 01:26 PM

June 02, 2014

Kim Moir (kmoir)

Mozilla pushes - May 2014


Here's May's monthly analysis of the pushes to our Mozilla development trees.  You can load the data as an HTML page or as a json file

Trends
This was a record-breaking month in which we overcame our previous record of 8,100+ pushes, with 11,000+ pushes this month.  Gaia-try, just created in April, has become a popular branch with 29% of pushes.


Highlights
General Remarks
The introduction of Gaia-try in April has been very popular and comprised around 29% of pushes in May.  The Try branch itself consisted of around 38% of pushes.
The three integration repositories (fx-team, mozilla-inbound and b2g-inbound) account for around 22% of all the pushes, compared to 30% in the previous month.


Records
May 2014 was the month with the most pushes (11,711 pushes)
May 2014 had the highest pushes/day average, with 378 pushes/day
May 2014 had the highest pushes/hour average, with 22 pushes/hour
May 29th, 2014 had the highest number of pushes in one day, with 613 pushes

May 2014 was a record-setting month: 11,711 pushes!

Note that Gaia-try was added in April and has quickly become a high volume branch


I changed the format of this pie chart this month.  It was previously based on several months of data, but not all data from the previous year, so I changed it to be based only on the data from the current month, which seemed more logical.

June 02, 2014 09:43 PM

May 30, 2014

Ben Hearsum (bhearsum)

This week in Mozilla RelEng – May 30th, 2014

Major highlights:

Completed work (resolution is ‘FIXED’):

In progress work (unresolved and not assigned to nobody):

May 30, 2014 08:20 PM

May 28, 2014

Armen Zambrano G. (@armenzg)

How to create local buildbot slaves


For the longest time I have wished for *some* documentation on how to set up a buildbot slave outside of the Release Engineering setup, without needing to go through the Puppet manifests.

In a previous post, I documented how to set up a production buildbot master.
In this post, I'm only covering the slave side of the setup.

Install buildslave

virtualenv ~/venvs/buildbot-slave
source ~/venvs/buildbot-slave/bin/activate
pip install zope.interface==3.6.1
pip install buildbot-slave==0.8.4-pre-moz2 --find-links http://pypi.pub.build.mozilla.org/pub
pip install Twisted==10.2.0
pip install simplejson==2.1.3
NOTE: You can figure out what to install by looking in here: http://hg.mozilla.org/build/puppet/file/ad32888ce123/modules/buildslave/manifests/install/version.pp#l19

Create the slaves

NOTE: I already have a build and a test master on my localhost, using ports 9000 and 9001 respectively.
buildslave create-slave /builds/build_slave localhost:9000 bld-linux64-ix-060 pass
buildslave create-slave /builds/test_slave localhost:9001 tst-linux64-ec2-001 pass

Start the slaves

On a normal day, you can do this to start your slaves up:
 source ~/venvs/buildbot-slave/bin/activate
 buildslave start /builds/build_slave
 buildslave start /builds/test_slave
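
For completeness, the matching shutdown commands look like this; buildslave also has a "restart" subcommand:

```shell
source ~/venvs/buildbot-slave/bin/activate
buildslave stop /builds/build_slave
buildslave stop /builds/test_slave
```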


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

May 28, 2014 07:05 PM

May 23, 2014

Ben Hearsum (bhearsum)

This week in Mozilla RelEng – May 23rd, 2014

Major highlights:

Completed work (resolution is ‘FIXED’):

In progress work (unresolved and not assigned to nobody):

May 23, 2014 08:40 PM

Armen Zambrano G. (@armenzg)

Technical debt and getting rid of the elephants

Recently, I had to deal with code where I knew there were elephants in the code and I did not want to see them. Namely, adding a new build platform (mulet) and running a b2g desktop job through mozharness on my local machine.

As I passed by, I decided to spend some time getting some peanuts to lure at least a few of those elephants out of there:

I know I can't use "the elephant in the room" metaphor like that but I just did and you just know what I meant :)

Well, how do you deal with technical debt?
Do you take a chunk every time you pass by that code?
Do you wait for the storm to pass by (you've shipped your awesome release) before throwing the elephants off the ship?
Or else?

Let me know; I'm eager to hear about your own de-elephantization stories.





Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

May 23, 2014 03:35 AM

May 22, 2014

Peter Moore (pmoore)

Protected: Setting up a Mozilla vcs sync -> mapper development environment

This post is password protected. You must visit the website and enter the password to continue reading.


May 22, 2014 02:35 PM