Planet Mozilla Automation

May 14, 2012

Jeff Hammel

Automation and Testing : Overhaul of Talos Configuration

Last week I pushed a fix to bug 704654 that fixes a number of issues, conceptual and user-facing, with how Talos handles configuration. I've had an idea of how I wanted to do this for a few months now, but it has always been tabled. But with my (joking, sorry) pledge to Bob Moss to fix all bugs in Talos by the end of the quarter, and a free weekend at hand, I decided to tackle the problem in one go instead of killing the prerequisite bugs first as I usually do. My goals:

  • remove the need to edit several different configuration files to change a configuration basis. Most .config edits needed to happen in 5 places (formerly 6). This is not only prone to human error (which I and others have been guilty of many times), it also discourages changing the default configuration.
  • consistent and declarative serialization/deserialization. Serialization in PerfConfigurator was mostly awful, scanning through line by line and looking for particular strings in (basically) an if-else tree, often depending on particular whitespace or other subtle (and undocumented) formatting issues. While the .config files conform to YAML, we don't make use of this for de/serialization. In addition, while in run_tests.py we allow command line overrides for the YAML items, we do not post-process them as we would in PerfConfigurator.
  • consistent error checking. Currently some of our config-checking is in PerfConfigurator and some is done in run_tests. This opens the possibility that either one may miss cases that the other would catch. If you call run_tests.py with a .yml file, you will not get the checking done for the combination of command line items and the .yml configuration that is done in PerfConfigurator. Since we process a lot of command line items into resulting configuration, this can lead to interesting results (e.g. while --activeTests is a command line item for run_tests.py, it is not used anywhere). In general, configuration should be checked in one place before any program logic takes place. While this patch doesn't completely address this issue, it is a big step forward and should pave the way for future improvement.
  • configuration should be declarative. You should get what you expect from configuration, not inconsistent results. If you edit a (e.g.) .yml file with the existing Talos, you have no real way to know if the keys you add or edit are going to be used by run_tests.py (and what format they should be in, etc.) Having a basis for configuration gives a single place to denote what is expected (and thereby what isn't allowed) and the form that it is supposed to be in. It is also nice to have all configuration in a single place instead of having to look at a bunch of config files for the basis as well as all over the code to see what is expected and how it is processed.
  • allow running directly from run_tests.py. For particular (e.g. production) systems, it may be advisable to use tuned (.yml) configuration files to have highly customized runs (note that we don't do this and use (remote)PerfConfigurator in all cases for reasons that may be inferred from the above). However, for a typical developer, there is little reason to run PerfConfigurator -e `which firefox` -a ts --develop -o ts.yml && talos -n -d ts.yml for a particular run. Instead, the entirety of this may be invoked with this patch as talos -n -d -e `which firefox` -a ts --develop -o ts.yml in a one-step process. (Note that we're still dumping to ts.yml, though one wouldn't have to if the result is intended as ephemeral.)

I hear people prefer blog posts with pictures, so with no reason here is a bunch of cute foxes:

/mozilla/images/panda_adoption.jpg

I've moved the basis of the Talos configuration to PerfConfigurator.py instead of some combination of .config files, PerfConfigurator.py, and run_tests.py. This gets rid of the duplication between the various config files as well as the command line options. In fact, there isn't much left of the configuration files.

I don't like configuration to live in code, and so empathize with those who look at this cautiously from that point of view. However, PerfConfigurator following my rework isn't so much configuration, but a configuration basis. Given the goals above, some piece of code has to validate a given configuration, has to know what data is in a configuration, and has to provide whatever command line options are used to front-end the configuration. The previous incarnation of Talos and PerfConfigurator had a significant amount of code to this end, but it was both spread out and incomplete. So I don't think putting it all in one place is a big conceptual change. Having a piece of code that knows the allowable form of configuration gives great power and having the code all in one place just makes it more human-readable.
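To make the idea of a configuration basis concrete, here is a minimal sketch in Python. It is not the code from bug 704654: the option table, the browser_path name, and the make_config helper are all hypothetical, chosen only to show one table driving the defaults, the command line front-end, and the validation.

```python
import argparse

# Hypothetical basis: option name -> (type, default, help text).
# These are illustrative entries, not Talos's real option set.
BASIS = {
    "browser_path": (str, None, "path to the browser executable"),
    "activeTests": (str, None, "tests to run"),
    "develop": (bool, False, "use developer-friendly settings"),
}
REQUIRED = ("browser_path", "activeTests")


def build_parser():
    """Generate the command line front-end from the basis."""
    parser = argparse.ArgumentParser()
    for name, (kind, _default, help_text) in sorted(BASIS.items()):
        if kind is bool:
            parser.add_argument("--" + name, action="store_true",
                                default=None, help=help_text)
        else:
            parser.add_argument("--" + name, type=kind,
                                default=None, help=help_text)
    return parser


def make_config(file_values, argv):
    """Merge defaults, file values and command line overrides, then validate once."""
    unknown = set(file_values) - set(BASIS)
    if unknown:
        raise ValueError("unknown keys: %s" % ", ".join(sorted(unknown)))
    config = dict((name, spec[1]) for name, spec in BASIS.items())
    config.update(file_values)
    overrides = vars(build_parser().parse_args(argv))
    config.update((k, v) for k, v in overrides.items() if v is not None)
    missing = [name for name in REQUIRED if config[name] is None]
    if missing:
        raise ValueError("missing required keys: %s" % ", ".join(missing))
    for name, value in config.items():
        if value is not None and not isinstance(value, BASIS[name][0]):
            raise TypeError("%s should be of type %s" % (name, BASIS[name][0].__name__))
    return config


# file_values stands in for the contents of a parsed .yml file.
config = make_config({"browser_path": "/usr/bin/firefox"},
                     ["--activeTests", "ts", "--develop"])
print(config)
```

Because every consumer goes through make_config, a key that the basis does not know about is rejected up front instead of being silently ignored later.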

The unofficial history of Talos configuration, as I understand it, goes something like this: Initially, there was one configuration file. You copied it, edited it by hand, and ran your tests on it. At some point, this became cumbersome, and PerfConfigurator was created to automatically fill in values from a set of command-line choices, and in addition allow the values to be marked up a bit. The road was already paved for some part of configuration basis living in code versus in the .config file. Then, as the need to run tests in different configurations grew, .config files flourished to this end. I'd like to think of the changes for bug 704654 as the next logical step in Talos's configuration evolution.

Longer term, we'd like to remove even more of Talos's configuration and replace .yml files with command line options. The complexity of configuration will be managed by mozharness .

May 14, 2012 09:03 AM

May 07, 2012

William Lachance

Ghetto retroscope with ffmpeg and the <video> tag

So yesterday we had a small get-together at my place, which gave me the opportunity to try something I’d been meaning to do for a while: build my own retroscope.

The idea is pretty simple: have a webcam record bits and pieces of a social event, then play them back on-the-spot a few minutes/hours later. I first heard about the concept from reading Nat Friedman’s blog entry from 2005 — if you read that, you see that he just hooked up a video camera to his TiVo. 7 years in the future, laptop webcams are ubiquitous and we have the awesome HTML5 <video> tag, so I figured it would be easy to knock up something interesting in short order with zero custom hardware.

Having only remembered that I wanted to do this about 30 minutes before people were scheduled to start arriving, I didn’t have much time to do anything really perfect. I settled on using this little snippet from stackoverflow to generate short (5 second) movies on my laptop, then used scp to copy them over and display a montage of them in an auto-refreshing webpage on my “television” (which is a Mac-Mini connected to a large computer monitor). Despite being a total hack job, the end result generated much amusement. I think this is a bit different from what Nat originally did (it sounds from his blog like his retroscope played back longer segments), but I think the end result is actually a bit more fun.

Perhaps unfortunately, but probably ultimately for the best, only a few snippets from the actual night got stored away. One example is this gem:

(yes, that handsome fellow with the Pernot is me)

I thought it might be fun to release the slightly-cleaned up results of this experiment as opensource for others to play with, so I created a small project for it on github. Unlike the original version, no complicated scp scheme is required — I just reused Joel Maher’s most excellent mozhttpd library from mozbase to run a web server in the same process as the capture logic. All you need to do is run the server on a Linux machine with a webcam and connect to it with a web browser from any other machine on your local network.

https://github.com/wlach/retroscope

Enjoy!

May 07, 2012 03:08 PM

May 04, 2012

William Lachance

Launching random web browsers on Android

Ok, this is somewhat mundane, but I’ve already had to do it twice (and helped someone do something similar on #mobile), so I figured I might as well blog about it for posterity.

For various automation tasks (notably the Eideticker dashboard and the cross-browser startup tests), we need to be able to launch an Android browser on the command line (via adb shell or our own custom SUTAgent). This is a bit of a black art, but you can find references on how to do this on stackoverflow and other places. The magic incantation is:

am start -a android.intent.action.VIEW -n <application/intent> -d <url>

So, for example, to launch Fennec, you’d run this on the Android command prompt:

am start -a android.intent.action.VIEW -n org.mozilla.fennec/.App -d http://mygreatsite.info

Ok, easy enough, but what if we want to launch a new browser that we just downloaded (e.g. Google Chrome)? Where do we get the application and intent names?

The short answer is that you need to reach into the apk and dig. ;) There’s probably many ways of doing this, but here’s what I do (which has the distinct advantage of not needing to compile, download or run weird java applications):

1. Copy the apk onto your machine (the apk should be in /data/app: if you have a rooted phone, you should be able to copy that off to your machine).

2. Extract AndroidManifest.xml from the apk (it’s just a .zip) and run axml2xml.pl on it.

3. Examine the resultant xml file and look for the <manifest> tag. It should have a property called package, whose value is the package name. For example:

We can see pretty clearly that the application name in this case is com.android.chrome (you can also get this by running ps when using the application)

4. Finally, look for a tag called <intent-filter> containing an <action> tag with android.intent.action.VIEW as the android-name property. Scan up for the overarching activity tag and read its android-name property. This is the activity name. For example:


Likewise here we see that the activity name we want is .Main (which Android explicitly expands out to com.android.chrome.Main)
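The lookup in steps 3 and 4 can also be scripted. The following is a minimal sketch, assuming the decoded manifest is well-formed XML that uses the standard android namespace (the output of your decoding tool may differ). The sample manifest is hand-written for illustration and is not the real Chrome manifest; find_view_component is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree uses for android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hand-written sample for illustration only.
SAMPLE_MANIFEST = """\
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.browser">
  <application>
    <activity android:name=".Settings"/>
    <activity android:name=".Main">
      <intent-filter>
        <action android:name="android.intent.action.VIEW"/>
      </intent-filter>
    </activity>
  </application>
</manifest>
"""


def find_view_component(manifest_xml):
    """Return (package, activity) for the first activity that handles VIEW."""
    root = ET.fromstring(manifest_xml)
    package = root.get("package")
    for activity in root.iter("activity"):
        for action in activity.iter("action"):
            if action.get(ANDROID_NS + "name") == "android.intent.action.VIEW":
                return package, activity.get(ANDROID_NS + "name")
    return package, None


package, activity = find_view_component(SAMPLE_MANIFEST)
print("%s/%s" % (package, activity))
```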

Armed with this information, you should now have enough information to launch the application. Furthering the example above, here’s how to start Chrome on Android via adb’s shell:

am start -a android.intent.action.VIEW -n com.android.chrome/.Main -d http://mygreatsite.info
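If you launch browsers from an automation script, it can help to build the command in one place. Here is a small sketch; am_start_command is a hypothetical helper, and the command is only constructed here, not executed. Passing the resulting list to subprocess would run it through adb, assuming adb is on your PATH and a device is attached.

```python
def am_start_command(package, activity, url):
    """Build the adb argument list that opens url in the given activity."""
    component = "%s/%s" % (package, activity)
    return ["adb", "shell", "am", "start",
            "-a", "android.intent.action.VIEW",
            "-n", component,
            "-d", url]


cmd = am_start_command("com.android.chrome", ".Main", "http://mygreatsite.info")
print(" ".join(cmd))
```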

Hope this helps someone, somewhere.

May 04, 2012 08:39 PM

Jonathan Griffin

Writing WebAPI tests for B2G using Marionette

At Mozilla, we have many different testing frameworks, each of which fills a different niche (although there is definitely some degree of overlap among them). For testing WebAPIs in B2G, some of these existing frameworks can be utilized, depending on the API. For example, mozSettings and mozContacts can be tested using mochitests, since there isn’t much, if anything, that’s device-specific to them. (We’re not currently running mochitests on B2G devices, but will be soon.)

But there are many other WebAPIs which are not testable using any of our standard frameworks, because tests for them need to interact with hardware in interesting ways, and most of our frameworks are designed to operate entirely within a gecko context, and thus have no ability to directly access hardware.

Malini Das and I have been working on a new framework called Marionette which can help. Marionette is a remote test driver, so it can remotely execute test steps within a gecko process while retaining the ability to interact with the outside world, including devices running B2G. When this is combined with the B2G emulator’s ability to query and set hardware state, we have a solution for testing a number of WebAPIs that would be difficult or impossible to test otherwise.

To illustrate how this works, I’m going to walk through the entire process of writing WebAPI tests for mozBattery and mozTelephony, to be run on B2G emulators. We already have such tests running in continuous integration, reporting to autolog. If developers add new Marionette WebAPI tests, they will be run and reported here as well. Eventually, they will likely be migrated over to TBPL.

Building the emulator

These tests will be run on the emulator, so you’ll have to build the B2G Ice Cream Sandwich emulator first, if you don’t have one already. You’ll need to do this on Linux, preferably Ubuntu. Make sure to install the build prerequisites before you begin, if you haven’t built B2G before.

git clone https://github.com/andreasgal/B2G
cd B2G
make sync              # get a cup of coffee, this takes quite a while
make config-qemu-ics   # get another cup of coffee
make gonk              # get another drink, but I think you've had enough coffee by now
make

You should now have an emulator, which you can launch using:

./emu-ics.sh

After you’ve verified the emulator is working, close it again.

Running a Marionette sanity test

Now we’ll run a single Marionette test to verify that everything is working as expected.   First, ensure that you have Python 2.7 on your system.  Then, install some prerequisites:

pip install manifestdestiny   # or: easy_install manifestdestiny
pip install mozhttpd          # or: easy_install mozhttpd
pip install mozprocess        # or: easy_install mozprocess

Now, from the directory where you cloned the B2G repo:

cd gecko/testing/marionette/client/marionette
python runtests.py --emulator --homedir /path/to/B2G/repo \
  tests/unit/test_simpletest_sanity.py

If everything has gone well, you should see something like the following:

TEST-START test_simpletest_sanity.py
test_is (test_simpletest_sanity.SimpletestSanityTest) ... ok
test_isnot (test_simpletest_sanity.SimpletestSanityTest) ... ok
test_ok (test_simpletest_sanity.SimpletestSanityTest) ... ok

----------------------------------------------------------------------
Ran 3 tests in 2.952s

OK

SUMMARY
-------
passed: 3
failed: 0
todo: 0

Writing a battery test

The B2G emulator allows you to arbitrarily set the battery level and charging state, by telnetting into the emulator’s console port and issuing certain commands.  Marionette has an EmulatorBattery class which abstracts these operations, and allows you to interact with the emulator’s battery using a very simple API.

A simple example is given in the EmulatorBattery documentation on MDN.  Save this example to a file named test_battery_example.py, and run this command:

python runtests.py --emulator --homedir /path/to/B2G/repo /path/to/test_battery_example.py

Marionette should launch an emulator and run the test; when it’s done you should see:

TEST-START test_battery_example.py
test_level (test_battery_example.TestBatteryLevel) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.391s

OK

SUMMARY
-------
passed: 1
failed: 0
todo: 0

How it works

This test, like all Marionette Python tests, is written using Python’s unittest framework, which provides the assert methods used in the test.  Other methods used by the test are provided by the Marionette and EmulatorBattery classes.

When the test executes this line:

self.marionette.emulator.battery.level = 0.25

the EmulatorBattery class telnets into the emulator and sets the battery’s level.  We then read the level back (which invokes another telnet command) to verify that the emulator’s battery state was updated as expected.  And finally, we execute a snippet of JavaScript inside gecko:

moz_level = self.marionette.execute_script("return navigator.mozBattery.level;")

and verify that it returns the same battery level as the emulator is reporting directly.
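The flow described above can be sketched without an emulator by swapping in in-memory stand-ins. This is not the example from MDN: the Fake* classes below are hypothetical and exist only so the sketch runs anywhere. A real test gets self.marionette from the Marionette harness, and setting the level would go through the emulator's console rather than a plain attribute.

```python
import unittest


class FakeBattery(object):
    """Stand-in for EmulatorBattery: keeps the level in memory."""
    def __init__(self):
        self.level = 1.0


class FakeEmulator(object):
    def __init__(self):
        self.battery = FakeBattery()


class FakeMarionette(object):
    """Stand-in for the Marionette client; understands only one script."""
    def __init__(self):
        self.emulator = FakeEmulator()

    def execute_script(self, script):
        if script == "return navigator.mozBattery.level;":
            return self.emulator.battery.level
        raise ValueError("unsupported script: %r" % script)


class TestBatteryLevel(unittest.TestCase):
    def setUp(self):
        # Hypothetical: a real Marionette test does not create this itself.
        self.marionette = FakeMarionette()

    def test_level(self):
        # 1. Set the level on the (fake) emulator.
        self.marionette.emulator.battery.level = 0.25
        # 2. Read it back from the emulator.
        self.assertEqual(self.marionette.emulator.battery.level, 0.25)
        # 3. Ask gecko (here: the stub) what it sees.
        moz_level = self.marionette.execute_script("return navigator.mozBattery.level;")
        self.assertEqual(moz_level, 0.25)


suite = unittest.TestLoader().loadTestsFromTestCase(TestBatteryLevel)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```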

More tests with hardware interaction

In addition to battery interaction, the B2G emulator allows you to query and set the state of other properties normally set by hardware, like GPS location, network status, and various sensors.  Tests for all these could be written in a similar way.  It probably makes sense to make classes for these similar to EmulatorBattery which abstract the details of getting and setting the state of the underlying hardware.  I would encourage WebAPI developers to add as many WebAPI tests as possible; if you would like us to add convenience classes, please ping us on IRC (jgriffin and mdas, on #ateam or #b2g) or file a bug under Testing:Marionette.

Multi-emulator tests

There are some WebAPIs which cannot be completely tested using a single device or emulator, like telephony and SMS. Marionette can help with these too, as Marionette can be used to manipulate two emulator instances which are capable of communicating with each other.

In any tests run with the --emulator switch, Marionette launches an emulator before running the tests, and this emulator is associated with an instance of the Marionette class available to the test as self.marionette. Tests can invoke a second emulator instance using self.get_new_emulator(), and these emulator instances can call and text each other using their port numbers as their phone numbers.

To illustrate how this works, Malini has written an example test in which one emulator is used to dial another, and the caller’s number is verified on the receiver. See this example at https://developer.mozilla.org/en/Marionette/Marionette_Python_Tests/Emulator_Integrated_Tests#Manage_Multiple_Emulators.

If you save this example to test_dial_example.py and run the command:

python runtests.py --emulator --homedir /path/to/B2G/repo /path/to/test_dial_example.py

you should see Marionette launch one emulator, and then after it starts execution of the test, you should see a second emulator instance launch. After the test is done, you should see a successful report, similar to the one shown for the battery test.

We currently have a few tests for mozTelephony, but many more could be added, and new tests should be added for SMS/MMS as well.

Adding new tests to the B2G continuous integration

When new tests are ready to be added to the CI, they should be checked into gecko under their dom component, e.g., dom/telephony/test/marionette. They should be added to the manifest.ini file in the same directory, and then for new manifest.ini files, the path to the .ini file should be added to the master manifest at http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/tests/unit-tests.ini. After this is done, the tests should be picked up by the B2G CI once the gecko fork of B2G is updated, and they will be reported along with the other tests to autolog.

Caveats, provisos, and miscellanea

B2G builds go to sleep after 60 seconds of inactivity.  In the emulator, this “sleep” will completely lock up Marionette if it occurs while a test is running.  This is very inconvenient while testing.  See bug 739476. Until some better mechanism of handling this is available, I usually edit gecko/b2g/apps/b2g.js to increase the value of the power.screen.timeout pref before building, to prevent the emulator from going to sleep.

The current test failures in autolog are being tracked as bug 751403 and bug 751406.

Network access in the emulator currently doesn’t seem to work (see https://github.com/andreasgal/B2G/issues/287).  This prevents some parts of Gaia from working correctly but doesn’t interfere with the above style of WebAPI tests, none of which rely on Gaia or network access.

Building the emulator is very time-consuming, mostly due to the time required to sync all the various repos needed by B2G.  We hope to be able to post emulator builds for download soon, after a few details are worked out.

More reading

What is Marionette

Marionette Python tests

Marionette Emulator tests

the Marionette class

the Emulator class

Please contribute tests

There are many WebAPIs which are less tested than they could be.  Please help us expand test coverage by contributing tests in areas similar to those described above.    If you need help, contact :jgriffin or :mdas on IRC, or file a bug under Testing:Marionette.


May 04, 2012 05:31 PM

April 25, 2012

Jeff Hammel

Automation and Testing : Considering a Page-Centric Talos

Currently, the canonical unit of Talos tests is a page set. However, a page-centric point of view offers several intrinsic advantages on top of being, in my opinion, more conceptually coherent.

A page-centric point of view allows easy adding and updating of pages. Currently, making a new page set is a big deal. Since we average over all pages in a page set to obtain a quality metric, adding a new page (or removing a page) will change this number and the entire baseline for comparison has to be recentered. If we made the page the canonical unit of testing, then adding or removing a page doesn't involve a recentering as each page has a quality metric associated with it.

Taking an average over all pages to get a quality metric, as we do, gives a higher weight to pages that take (e.g.) longer to load. For instance, consider the output for tsvg:

|i|pagename|runs|
|0;gearflowers.svg;79;65;68;68;67
|1;composite-scale.svg;46;35;44;41;42
|2;composite-scale-opacity.svg;21;22;24;22;20
|3;composite-scale-rotate.svg;23;21;21;20;19
|4;composite-scale-rotate-opacity.svg;19;24;19;19;23
|5;hixie-001.xml;45643;14976;17807;14971;17235
|6;hixie-002.xml;51257;15193;21693;14969;14974
|7;hixie-003.xml;5016;37375;5021;5024;5008
|8;hixie-004.xml;5052;5053;5054;5054;5053
|9;hixie-005.xml;4618;4533;4611;4532;4554
|10;hixie-006.xml;5059;5107;9741;5107;5089
|11;hixie-007.xml;1629;1651;1648;1652;1649

A performance loss (or gain) in e.g. gearflowers.svg is likely not to be noticed in this pageset, as its timings are more than two orders of magnitude lower than those of (e.g.) hixie-002.xml, so a small percentage of noise in the latter could easily hide a legitimate regression in the former.
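To put rough numbers on this, the sketch below takes the run times from the tsvg output above, computes a plain mean per page and a plain mean over pages, and then applies a hypothetical 10% regression to gearflowers.svg. The plain mean is a simplification used for illustration; it is not necessarily the exact aggregation Talos applies.

```python
# Per-page run times copied from the tsvg output above.
TSVG_RUNS = {
    "gearflowers.svg": [79, 65, 68, 68, 67],
    "composite-scale.svg": [46, 35, 44, 41, 42],
    "composite-scale-opacity.svg": [21, 22, 24, 22, 20],
    "composite-scale-rotate.svg": [23, 21, 21, 20, 19],
    "composite-scale-rotate-opacity.svg": [19, 24, 19, 19, 23],
    "hixie-001.xml": [45643, 14976, 17807, 14971, 17235],
    "hixie-002.xml": [51257, 15193, 21693, 14969, 14974],
    "hixie-003.xml": [5016, 37375, 5021, 5024, 5008],
    "hixie-004.xml": [5052, 5053, 5054, 5054, 5053],
    "hixie-005.xml": [4618, 4533, 4611, 4532, 4554],
    "hixie-006.xml": [5059, 5107, 9741, 5107, 5089],
    "hixie-007.xml": [1629, 1651, 1648, 1652, 1649],
}


def mean(values):
    return sum(values) / float(len(values))


page_means = dict((page, mean(runs)) for page, runs in TSVG_RUNS.items())
suite_mean = mean(list(page_means.values()))
total = sum(page_means.values())

# Share of the suite average contributed by each page, smallest first.
for page in sorted(page_means, key=page_means.get):
    share = 100 * page_means[page] / total
    print("%-36s mean=%10.1f share=%7.3f%%" % (page, page_means[page], share))

# Hypothetical 10% regression in gearflowers.svg only.
regressed = dict(page_means)
regressed["gearflowers.svg"] *= 1.10
shift = (mean(list(regressed.values())) - suite_mean) / suite_mean
print("suite average moves by %.4f%%" % (100 * shift))
```

With these numbers gearflowers.svg contributes well under 0.1% of the summed page means, so even a 10% regression on that page barely moves the averaged figure.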

Having this additional data of what changes regress which pages allows us to explore how these particular page modifications affect performance. If we can isolate patterns, we can fix them.

One conceptual disadvantage to a page-centric approach is that deciding whether a changeset is a net regression or not becomes harder. Ideally a human (or other expert system) would evaluate all of the data across pages and decide whether a change is a regression or not. However, we have many pages and not enough people, so this is harder to do than to craft a formula for a quality metric. To obtain an overall quality metric for a push, some sort of averaging over pages must be done. We currently throw away the highest value and take the mean of remaining page averages. If we continue with this approach we throw away the ability to easily add and remove pages without futzing with the metric. Instead, a method should be sought whereby adding a new page does not affect a metric.
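The recentering problem is easy to see with a toy version of the current metric (drop the highest page average, take the mean of the rest). The page names and averages below are made up for illustration, and suite_metric is a hypothetical helper.

```python
def suite_metric(page_averages):
    """Drop the highest page average, then take the mean of the rest."""
    values = sorted(page_averages.values())[:-1]
    return sum(values) / float(len(values))


# Made-up per-page averages, for illustration only.
pages = {"a.svg": 20.0, "b.svg": 40.0, "c.xml": 5000.0}
before = suite_metric(pages)

# Add one page; nothing about the browser has changed.
pages["d.xml"] = 1600.0
after = suite_metric(pages)

print("before: %.2f  after: %.2f" % (before, after))
```

The metric jumps even though no existing page got slower, which is why the baseline has to be recentered whenever the page set changes.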

April 25, 2012 09:33 AM

William Lachance

Eideticker with less eideticker

[ For more information on the Eideticker software I'm referring to, see this entry ]

tl;dr: You can now run the standard eideticker benchmarks easily on any Android phone without any kind of specialized hardware.

So Eideticker is pretty great at comparing relative performance between different browsers and generally measuring things in an absolutely neutral way. Unfortunately it’s a bit of a pain to use it at the moment to catch regressions: the software still has a few bugs and encoding/decoding/analyzing the capture still takes a great deal of time. Not to mention the fact that it currently requires specialized hardware (though that will soon be less of a concern at least inside MoCo, where we have a bunch of Eideticker boxes on order for the Toronto and Mountain View offices).

A few months ago, Chris Lord wrote up some great code to internally measure the amount of checkerboarding going on in Fennec. I’ve thought for a while that it would be a neat idea to hook this up to the Eideticker harness, and today I finally did so. After installing Eideticker, you can now run the benchmark on any machine against an arbitrary fennec build just by typing this from the eideticker root directory:

adb shell setprop log.tag.GeckoLayerRendererProf DEBUG
./bin/get-metric-for-build.py --no-capture --get-internal-checkerboard-stats --num-runs 3 nightly.apk src/tests/scrolling/taskjs.org/index.html

In return, you’ll get some nice clean results like this:

=== Internal Checkerboard Stats (sum of percents, not percentage) ===
[167.34348, 171.871015, 175.3420296]

Just to be sure that the results were comparable, I did a quick set of runs on the Eideticker machine in Mountain View with both internal checkerboard statistics gathering and HDMI capturing enabled.

Stats file     HDMI capturing
167.34348      177.022
171.87         184.46
175.34         184.44

While the results aren’t identical (we measure number of frames differently inside Fennec than we do with Eideticker, for one thing), they do seem roughly correlated. So go forth, benchmark and tweak! ;)

P.S. If you’ve been following mobile automation, you might be asking why I don’t just suggest running Talos and Robocop on your workstation. Can’t they do the same sorts of things? The short answer is that yes, they can, but unfortunately they’re much more involved to set up and use at the moment. Arguably they shouldn’t be, and this is something we (Mozilla tools & automation) need to work on. We’ll get there eventually (and help would be welcome!). For now, hacks like this should help with getting out the first release of Fennec by providing a fast, easy to use tool for bisection and analysis.

April 25, 2012 01:47 AM

April 19, 2012

William Lachance

GoFaster dashboard back online

Build times for mozilla-central are a major factor in developer productivity. Faster build times mean more people using try (reducing breakage) and more fine-grained regression ranges (reducing the impact of breakages). As a side benefit, it allows us to avoid buying and maintaining more hardware (or put new hardware to better use). About a half-year ago, we set up a project called BuildFaster to try to bring these times down, setting the ambitious goal of getting build times (from checkin to tests done) down to 2 hours. We didn’t quite succeed, though we did make some major strides. As part of this project, we also developed a dashboard to track our progress and narrow down the major bottlenecks which were keeping up our build times.

Unfortunately, this dashboard went down earlier this year with the rest of Brasstacks and we hadn’t had the chance to bring it back up. I’m pleased to announce that thanks to Jonathan Griffin, it’s finally back online.

While no one is actively working on build performance at the moment (at least to my knowledge), it’s still useful to keep track of build times to make sure that we don’t regress. Anecdotally, it has seemed to me that the time needed to get results from try has been pretty stable over the last while, and this is borne out by the results:

As the cliche goes: no news is good news.

April 19, 2012 10:27 PM

April 06, 2012

William Lachance

Yet more adventures in mobile performance analysis

[ For more information on the Eideticker software I'm referring to, see this entry ]

Participated in an interesting meeting on checkerboarding in Firefox for Android yesterday. As a reminder, checkerboarding refers to the amount of time you spend waiting to see the full page after you do a swipe on your mobile device, and it’s a big issue right now – so much so that it puts our delivery goal for the new native browser at risk.

It seems like we have a number of strategies for improving performance which will likely solve the problem, but we need to be able to measure improvements to make sure that we’re making progress. This is one of the places where Eideticker could be useful (especially with regards to measuring us against the competition), though there are a few things that we need to add before it’s going to be as useful as it could be. The most urgent, as I understand, is to come up with a suite of tests which accurately represent the set of pages that we’re having issues with. The current main measure of checkerboarding that we’re using with eideticker is taskjs.org which, while an interesting test case in some ways, doesn’t accurately represent the sort of site that the user would normally go to in the wild (and thus be annoyed by). ;)

This is going to take a few days (and a lot of review: I’m definitely no expert when it comes to this stuff) to get right, but I just added two tests for the New York Times which I think are a step in the right direction of being more representative of real-world use cases. Have a look here:

http://wrla.ch/eideticker/dashboard/#/nytimes-scrolling
http://wrla.ch/eideticker/dashboard/#/nytimes-zooming

The results here actually aren’t as bad as I would have expected/remembered. The amount of checkerboarding after a zoom out is a bit annoying (I understand this is a known issue with font caching, or something) but not too terrible. Still, any improvements that show up here will probably apply across a wide variety of sites, as the design patterns on the New York Times site are very common.

(P.S. yes, I know I promised a comparison with Google Chrome for Android last time… rest assured that’s still coming soon!)

April 06, 2012 01:12 AM

April 03, 2012

William Lachance

An even better way of taking screenshots on Android

Just thought I’d mention this because I found it handy.

A while back AaronMT wrote up some clever instructions on taking Android screenshots by dumping the contents of ‘/dev/fb0’ and running ffmpeg on the results. This is useful, but you need to know the resolution of the device you have connected to pass the right arguments to ffmpeg. Wouldn’t it be better if you had just one script that would work for whatever device you had plugged in?

In fact, there is a way to do this using the monkeyrunner utility. Intended mainly as a tool for synthesizing input on Android (more on that some other time), you can also easily get a capture of the Android screen with its python/jython API (assuming you have the Android SDK installed). Here’s a quick script which does the job:

from com.android.monkeyrunner import MonkeyRunner, MonkeyDevice
import os
import sys

if len(sys.argv) != 2:
    print "Usage: %s <filename>" % os.path.basename(sys.argv[0])
    sys.exit(1)

device = MonkeyRunner.waitForConnection()
result = device.takeSnapshot()
result.writeToFile(sys.argv[1], 'png')

Copy that into a file called capture.py (or whatever), then run it like so:

monkeyrunner capture.py screenshot.png

And you’re off to the races! Nice screenshot, no utilities or non-essential command line arguments required!

(credit to this stackoverflow answer for the idea)

April 03, 2012 09:49 PM

March 22, 2012

William Lachance

Eideticker dashboard update

[ For more information on the Eideticker software I'm referring to, see this entry ]

Since my first Eideticker dashboard post was so well received, I thought I’d give a quick update on another metric that I just brought online: checkerboarding (a.k.a. the amount of time you spend waiting to see the full page after you do a swipe on your mobile device).

[ link to real thing ]

Unfortunately the news here is not as good as before: as the numbers indicate, the new Native Fennec currently performs substantially worse than the version in Android market. This is a known issue, and is currently being tracked in bug 719447.

Next up: Seeing how we do against Google Chrome for Android.

March 22, 2012 10:07 PM

March 16, 2012

William Lachance

Announcing the Eideticker mobile performance dashboard

Over the last while, Clint Talbert and I have been working on setting up automatic mobile performance tests using Eideticker (a framework to measure perceived Firefox performance by video capturing automated browser interactions: for more information, see my earlier post).

There are many reasons why this is interesting, but probably the most important one is that it can measure differences reliably across different types of mobile browsers. Currently I’m testing the old XUL fennec, the Android stock browser, and the latest nightlies.

I’m pleased to announce that the first iteration of the dashboard is available for public consumption, on my site.

http://wrla.ch/eideticker/dashboard/#/canvas

Eideticker Results

The demo is pretty cheesy (just click on any of the datapoints to see the video capture), but nonetheless does seem to illustrate some interesting differences between the three browsers. The big jump in performance for nightly comes from the landing of the Maple branch, which happened earlier this week. Hopefully this validates some of the work that the mobile/graphics team has been doing over the past while. Exciting times!

March 16, 2012 06:51 PM

March 12, 2012

Joel Maher

Reducing the Noise in Talos

Over the last year there has been a lot of research into reducing the noise in our Talos performance numbers. For example, looking at tscroll, we have a fluctuation in the reported numbers of almost 400 (out of 14000). Jan Larres took a look at this problem in his master’s thesis, and found a variety of factors that did and didn’t contribute to the noise. We actually have Bug 706912 filed to implement some of his suggestions on how to calculate the posted number. Last fall, Stephen Lewchuk looked at the raw data that was collected and found some inconsistencies in the way we were aggregating the data. In short, we have a lot of ground to cover if we want to reduce the noise in our numbers.

Over the last couple of months, we have been working on a project called Signal From Noise. This is an attempt to fix the way we collect some numbers and redo the way we aggregate numbers for reporting. We have done a lot of experimenting with the primary focus on tp5. The way we run tp5 is to load each of the 100 pages once, then repeat 10 times. For each page, we would drop the highest value and take the median value of the remaining 9 numbers. This results in an array of 100 data points which get reported to the graph server. We take those 100 data points and average them out to generate the single number for tp5. It is easy to imagine that the small samples and median/average combination will produce a lot of noise.
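The original tp5 aggregation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the sample timings are made up for the example, and the real logic lives in pageloader and the graph server.

```python
from statistics import mean, median


def page_value(samples):
    """Drop the single highest sample, then take the median of the rest."""
    trimmed = sorted(samples)[:-1]
    return median(trimmed)


def suite_value(samples_by_page):
    """Average the per-page medians into one number for the whole suite."""
    return mean(page_value(s) for s in samples_by_page.values())


# Hypothetical timings (ms) for two pages, 10 loads each
samples_by_page = {
    "page-a": [310, 120, 118, 125, 119, 121, 117, 122, 120, 118],
    "page-b": [95, 90, 91, 240, 92, 89, 90, 93, 91, 90],
}

print(suite_value(samples_by_page))
```

With only nine samples per page feeding the median, and a mean taken over very different pages, a shift in a single page is easy to lose in the final number.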

Going forward, we are looking to change from column major to row major and collect 30 samples instead of 10. This means we focus on one page and load it 30 times, then move to the next page and repeat until all 100 pages have been loaded. The downside is the runtime, as we move from an average of 17 minutes to an average of 39 minutes for the entire tp5 run. Collecting 30 samples will give us a much more meaningful number, but we also found that the first 5-10 iterations contain the most noise. So initially we are looking to throw away the first 10 numbers instead of throwing away the highest number as we originally did. When looking at the raw numbers (not the aggregated number), here are some graphs to highlight the difference:

Image

Image
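The proposed per-page summary (load a page 30 times, discard the first 10 loads) might look like the sketch below. It assumes the median is still the statistic taken over the remaining samples, which the text above does not spell out; the function name and the timings are hypothetical.

```python
from statistics import median


def page_value(samples, ignore_first=10):
    """Discard the first `ignore_first` loads (warm-up noise), then take the median.

    Using the median for the remaining samples is an assumption of this sketch.
    """
    kept = samples[ignore_first:]
    if not kept:
        raise ValueError("no samples left after discarding warm-up runs")
    return median(kept)


# Hypothetical timings (ms): 10 noisy warm-up loads followed by 20 steady ones
samples = [400, 350, 300, 280, 260, 240, 220, 200, 180, 160] + [120, 122] * 10

print(page_value(samples))
```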

 

This is only the first step in many changes needed. After rolling this out, we need to evaluate the other test suites as well and ensure we are running adequate cycles to get a valid sample size. We are also working on allowing the database to accept the raw values instead of the single median value per page. Likewise, we are looking to stop doing an average([median(page)]). All of this will allow us to find regressions per page more easily instead of having them washed out by the other numbers.


March 12, 2012 02:33 PM

March 06, 2012

Malini Das

DXR status!

So alongside my work with Marionette, I've been working with the DXR folks to help get their builds tested. Since a lot of DXR status updates occur over at the #static IRC channel on irc.mozilla.org, here's a quick rundown of what's going on: Many of you have been asking if DXR is ready to replace MXR, and what the timeline for that is. Well, now there's some good news! Most of the implementation work to get DXR's search features running, and running quickly, is done, and all that's left to get MXR parity is a few UI features and tweaks. Taras, the manager of the DXR team, is aiming to have these UI changes worked out next month, so expect big things for DXR in the near future. Right now, the Lanedo team is developing the codebase, and I'm running between them and release engineering to get production ...

March 06, 2012 10:07 PM

March 05, 2012

Mark Côté

Mozilla A-Team: Peptest results, an exercise in statistical analysis

UPDATE: It's been pointed out that the current metric (sum of squares of unresponsive periods, divided by 1000) is used in Talos and has had a fair bit of thought put into it. I was curious what not squaring the results would do, but I wouldn't go with another metric without more careful thought.

UPDATE 2: It has also been pointed out that peptest tests performance, not correctness, and hence should report its results elsewhere (essentially as I've done with the sampled data) and not be a strict pass/fail test. This approach definitely warrants some consideration.




About a week and a half ago, peptest was deployed to try. To recap, peptest identifies periods of unresponsiveness, where "unresponsiveness" is currently defined as any time the event loop takes more than 50 ms to complete. We have a very small suite of basic tests at the moment, looking for unresponsiveness while opening a blank tab, opening a new window, opening the bookmarks menu, opening and using context menus, and resizing a window.

The results are currently ignored, since we still don't know how useful they will be, but you can see them by going to https://tbpl.mozilla.org/?tree=Try&noignore=1. They are marked by a "U" (not sure why exactly, but it will change at some point to something more obvious).

At the moment, every platform fails at least one of these tests, and most of the time there are multiple failed tests. This isn't too surprising, since 50 ms is a pretty bold target. However, going forward, we need some sort of baseline result, so that we can identify real regressions. To accomplish this, peptest tests can be configured with a failure threshold. We calculate a metric for each test (see below), and, if a failure threshold is configured, a metric value below this threshold is considered a pass. Hopefully, we can identify a threshold for each test (or, likely, a threshold for each platform-test combination) such that all the tests pass but significant increases in unresponsiveness will trigger failures. At the same time, we will also file bugs on all the tests so we don't forget about the fact that there are still unresponsive periods during their execution that are being hidden by the thresholds. We can lower or eliminate the thresholds if these bugs are partially or fully fixed.

Things, of course, aren't that simple. I gathered and analyzed the peptest logs from try over a four-day period, and there is quite a lot of variance in the results, even on the same platform. With a sufficiently generous threshold, we could get the tests to pass most of the time, but there are occasionally some crazy outliers that no reasonable threshold could contain. However, it is probably okay to have the tests turn orange once in a while. 0 oranges might be an unreasonable target for this project, and intermittent oranges would be a reminder that, sometimes, there are really unacceptable periods of unresponsiveness.

(Btw one test, test_contextMenu.js, appears to only fail on Linux and Linux64, but this is actually a bug in the test--on all the other platforms, it's erroring out before it hits the end. I've since fixed this but haven't collected new data yet.)

I experimented a bit with the test metric, to see if that improved the situation. Right now, as deployed on try, the metric is calculated as the sum of the squares of the unresponsive periods in a single test (an unresponsive period being, by definition, a value above 50). I tried just summing the periods without squaring them, which seemingly increases the variance in some tests and decreases it in others. I also experimented with raising the minimum unresponsive period from 50 ms to 100 ms, since there are strong arguments that 50 ms is pretty unrealistic, at least at this stage.
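To make the variants concrete, here is a rough Python sketch of the metrics discussed above. It is not peptest's code: the function names and sample data are invented, the division by 1000 follows the description in the first update, leaving the plain sum unscaled is my assumption, and so is treating a metric equal to the threshold as a pass.

```python
def sum_of_squares_metric(periods_ms, min_unresponsive_ms=50):
    """Sum of squares of the unresponsive periods, divided by 1000."""
    unresponsive = [p for p in periods_ms if p > min_unresponsive_ms]
    return sum(p * p for p in unresponsive) / 1000.0


def plain_sum_metric(periods_ms, min_unresponsive_ms=50):
    """Alternative metric: just the sum of the unresponsive periods (unscaled)."""
    return sum(p for p in periods_ms if p > min_unresponsive_ms)


def passes(metric, failure_threshold=0):
    """A test passes when its metric does not exceed the configured threshold."""
    return metric <= failure_threshold


# Hypothetical event-loop delays (ms) recorded during one test
periods = [12, 60, 30, 120, 51]

print(sum_of_squares_metric(periods))                           # cutoff at 50 ms
print(plain_sum_metric(periods))                                # cutoff at 50 ms
print(sum_of_squares_metric(periods, min_unresponsive_ms=100))  # cutoff at 100 ms
```

Squaring weights one long stall far more heavily than several short ones, which is why the two metrics can rank the same runs differently.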

I've graphed the failures, along with their mean and standard deviations, at http://people.mozilla.com/~mcote/peptest/results/. I also plotted passes as 0s (there are certainly lots of unresponsive periods less than 50 ms in those passes, but for all intents and purposes they are 0) in a different colour. There are unique URLs to all combinations of platform, test, and metric. The raw data is also available there (in JSON).

Following is a brief discussion of some of the problems with identifying good failure thresholds.

Some of the simple tests don't have much variance. test_openBlankTab.js, which just measures the responsiveness when opening and closing a blank tab, mostly passes, with just a few outliers. Some slightly more complicated tests, however, have quite a bit of variance. The bookmarks-menu test, test_openBookmarksMenu.js, scrolls through the bookmarks menu and then opens the bookmarks window. The results on snowleopard are particularly egregious:



As you can see, most of the failures are clustered around the mean. The standard deviation encompasses most of them. Changing the metric from the sum of squares of unresponsive periods to just the sum of the periods improves things a little:



There is only one point above a single standard deviation, although two are rather close. Increasing the allowable unresponsive period to 100 ms reduced the standard deviation, but only because a few low points became passes:



So this is one example where we would expect to see at least one orange every few days, even if we set the failure threshold to about 25% higher than the mean.

In other cases we have mostly passes but some really crazy outliers. On snowleopard, test_openWindow.js, which merely opens a new window, has mostly passes, but in this sample there is one run that had unresponsive periods totalling more than 250 ms.



So here, we could leave the failure threshold at 0 ms, although we'd still have oranges every few days. In this case, setting the unresponsive threshold to 100 ms wouldn't make a difference, since the few failures are significantly above 100 ms.

test_openWindow.js on leopard, however, is all over the place when using just the sum of unresponsive periods:



There aren't really any outliers here, just a large spread of values. A reasonable failure threshold here would have to be twice the mean to ensure that oranges only occur occasionally.

In this case, switching to a sum of squares makes the outliers more obvious, although the standard deviation becomes quite large:



And in case it wasn't obvious, the results are completely different on a different OS. Take test_openWindow.js on Windows 7:



Most results are clustered, but there are 5-6 real outliers, depending on how you define an outlier. This test-platform combination looks to have real potential for regular oranges unless an extremely generous failure threshold is defined.

In conclusion, it's going to be kind of tough to define failure thresholds such that most runs pass and that real regressions are identified. There doesn't seem to be a huge difference between using the sum of unresponsive periods versus the sum of their squares, although in some instances the latter makes the outliers more obvious. Raising the minimum acceptable unresponsive period unsurprisingly causes more passes but doesn't really improve the variance in the failures. Regardless, it looks like I will have to go through the sampled results and, for each test, set a failure threshold that encompasses the majority of the failures, but even still there will be intermittent oranges. Comments and suggestions welcome!

March 05, 2012 03:51 PM

February 26, 2012

Mark Côté

flot-axislabels v2.0

I've released version 2.0 of flot-axislabels, the flot plug-in for labelling axes. Flot is a great, easy-to-use JavaScript graphing lib, based on canvas; however, many people (myself included) viewed the lack of support for axis labels to be a big fault. With flot-axislabels, you can get said labels by just loading the script after flot and setting one extra option per axis (or a couple more if you have specific needs).

Version 2.0 (which is actually the first "real" release but has a lot of recent changes) now supports any number of X and Y axes. Previously only 2 X and 2 Y axes were supported (top, bottom, left and right).

Having more than 4 axes on a single plot probably sounds a bit weird, but apparently it is useful when plotting weather conditions:



You can see the live example and view its source to see how it's done. It's really quite simple.

flot-axislabels continues to support CSS translations, canvas, and traditional CSS positioning (plus a special mode for IE 8 combining CSS positioning with IE's special rotation functions). In the first two modes, labels for Y axes are rotated to face the plot. Graceful degradation is attempted based on the browser's detected capabilities.

Internally, it no longer pays attention to the name of the axis (yaxis, y2axis, etc.) but rather looks at the 'position' variable, which flot automatically sets if it is not provided. I believe this means that it will only work with flot 0.7, however.

Read the README, download the zip, and follow the project on github.

February 26, 2012 04:17 AM

February 15, 2012

Jeff Hammel

Talos Signal from Noise: Configurable Talos Data Filters

As part of Signal from Noise I introduced a patch that changes the way --ignoreFirst works and adds configurable data filters to Talos :

While this is a small change in terms of how the code currently works, it lays the groundwork for a window of possibilities in terms of Talos statistics. Currently, pageloader calculates the "median" (ignoring the high value), the mean, the max, and the min, and outputs these along with the raw run data. Pageloader is for loading pages and taking measurements, not really for doing statistics. So it would be nice to move this upstream: first to Talos, then to graphserver proper.

Being able to specify data filters with --filter from the command line and filter: in the .yml configuration file allows the test-runner to change the "interesting number" by which we measure performance metrics on the fly. While there are currently only a few filters available, it is easy to add more metrics as we need them.
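As a rough illustration of the idea (not the actual Talos implementation), a chain of data filters can be modelled as small functions applied in order to the raw run data. The filter names, the `name:argument` spec syntax, and the sample values below are assumptions made for this sketch.

```python
from statistics import mean, median


def ignore_first(values, count=1):
    """Drop the first `count` data points."""
    return values[int(count):]


def ignore_max(values):
    """Drop one occurrence of the highest data point."""
    trimmed = list(values)
    trimmed.remove(max(trimmed))
    return trimmed


# Registry of available filters (hypothetical names)
FILTERS = {
    "ignore_first": ignore_first,
    "ignore_max": ignore_max,
    "median": median,
    "mean": mean,
}


def apply_filters(values, specs):
    """Apply filters in order; each spec is 'name' or 'name:argument'."""
    result = values
    for spec in specs:
        name, _, arg = spec.partition(":")
        func = FILTERS[name]
        result = func(result, arg) if arg else func(result)
    return result


runs = [100, 300, 900, 110, 120]  # made-up page load times (ms)

print(apply_filters(runs, ["ignore_max", "median"]))      # drop the high value first
print(apply_filters(runs, ["ignore_first:1", "median"]))  # drop the first run first
```

Changing the "interesting number" then amounts to changing the list of filter specs, with no change to the code that collects the data.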

In a parallel effort, the JetPerf software consumes Talos filters . This is a good example of the expansion of the Talos ecosystem: because Talos is a critical part of our performance testing infrastructure, we are building tests and frameworks on top of it. In general, the A Team is moving towards a testing ecosystem of reusable parts and sane APIs.

Data filters were added to talos as an interim measure to make the "interesting number" calculations more flexible. As we play with different types of statistics, we need the ability to change configuration without having to jump through too many hoops and this fulfills this immediate need.

However, in the longer term, Talos and pageloader shouldn't really be doing statistics at all. They are in the "statistics gathering" camp whereas graphserver is in the "statistics processing" business. It would also be nice if there were a piece of software that let you analyze Talos results locally, ideally using the same statistics processing package that graphserver uses. This is outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=721902 .

http://k0s.org/mozilla/talos/bug-721902.gv.txt

February 15, 2012 12:44 PM

February 11, 2012

William Lachance

Playing with pandas

For the last few days I’ve been experimenting with getting a Pandaboard running Android 4.0, continuing the work that Clint Talbert started in the fall to get these boards for use as a replacement for the Tegra in Mozilla’s android automation. The first objective is to get a reproducible build going, after that we’ll try to get some of our custom tools (SUTAgent & friends) installed by default.

So far this has been… interesting. Much as Clint did before, I thought I’d document some of the notes on what I did in the hopes that they’ll be helpful to other people trying to do similar things.

Getting things up and running is a two step process. First, you build the beast. At least the build part is more or less straightforward. Just follow the instructions here:

Note that you almost certainly want to build in the “eng” configuration, which is rooted and (apparently) has some extra tools installed.

Installing it is a little more tricky. The way they want you to do this is put the pandaboard into a special mode and copy the stuff you built onto an sdcard. Seem a little funny to you? Yeah, it does to me too. Why not just build an sdcard image directly?
Nonetheless, this is the officially supported way of imaging a pandaboard, so let’s just follow it until we can think of a better way of doing things. :) The instructions for doing this on the pandaboard are located in the source tree here:

device/ti/panda/README

These are mostly correct as far as I can tell, but there are a few gotchas. First, you need to run the commands mentioned as root unless you’ve configured the USB device to be accessible by your user. Second, most of those commands are not in the path by default so you’ll need to specify the full path to e.g. the fastboot utility. The instructions here cover these exception cases: I recommend following them instead.

One thing which neither document mentions is that you really need to make sure your sdcard is wiped completely clean before using fastboot. The “oem format” step only recreates the partition table, it doesn’t delete any corrupted partitions. If you reboot while these are still in place, it will try to bring up your corrupted version of Android, not the fastboot console. I spent quite some time debugging why I couldn’t properly flash the operating system before realizing this. Easiest way to get around this is to dd /dev/zero onto the sdcard before beginning the flashing process.

Also, while not strictly necessary to get something up and running, I highly recommend getting an HDMI monitor as well as a serial<->USB adapter. The former is useful to see if your Android device actually successfully booted up, the latter is useful for debugging boot issues where you don’t get that far (the serial console is always available from boot).

So, after painfully learning about the above caveats, I have managed to get things mostly working. I can see the ICS homescreen on my attached HDMI monitor and interact with it if I attach a USB mouse. The one gotcha is that both ethernet and WIFI networking are totally broken. Plugging in an ethernet cable or connecting to a WIFI network seems to result in the machine randomly rebooting, with the logs saying nothing useful. Both of these things are ostensibly supposed to be working according to the latest I’ve read from Google so I’m not exactly sure what’s going on. Investigations will continue.

February 11, 2012 12:04 AM

January 31, 2012

Jeff Hammel

Talos Signal from Noise: analyzing the data

Recently, a change was pushed as part of the Signal from Noise effort in order to make Talos statistics better: https://bugzilla.mozilla.org/show_bug.cgi?id=710484 . The idea is that the way we are doing things is skewing the data and not helping with noise.

Currently, pageloader calculates the median after throwing out the highest point: http://hg.mozilla.org/build/pageloader/file/beca399c3a16/chrome/report.js#l114 We introduced --ignoreFirst to instead ignore the first point and calculate the median of the remaining runs.

However, after introducing the change we noticed that our distribution had gone bimodal during side by side staging:

Were we doing something other than what we thought we were doing? Were our calculations wrong? Or was something else going on?

So jmaher and I dove in to take a look at the data. jmaher dug up a high-mode and low-mode case from the TBPL logs corresponding to the push sets displayed on graphserver:

https://tbpl.mozilla.org/php/getParsedLog.php?id=8982519&tree=Firefox&full=1
high point:
NOISE: __start_tp_report
NOISE: _x_x_mozilla_page_load,109,NaN,NaN
NOISE: _x_x_mozilla_page_load_details,avgmedian|109|average|354.25|minimum|NaN|maximum|NaN|stddev|NaN
NOISE: |i|pagename|median|mean|min|max|runs|
NOISE: |0;big-optimizable-group-opacity-2500.svg;123.5;354.25;92;1130;147;1130;1078;92;100
NOISE: |1;small-group-opacity-2500.svg;109;2333.25;103;9247;103;9012;9247;111;107
NOISE: __end_tp_report


https://tbpl.mozilla.org/php/getParsedLog.php?id=8982267&tree=Firefox&full=1
low point:
NOISE: __start_tp_report
NOISE: _x_x_mozilla_page_load,108,NaN,NaN
NOISE: _x_x_mozilla_page_load_details,avgmedian|108|average|113.00|minimum|NaN|maximum|NaN|stddev|NaN
NOISE: |i|pagename|median|mean|min|max|runs|
NOISE: |0;big-optimizable-group-opacity-2500.svg;119;353.75;91;1132;139;1132;1086;91;99
NOISE: |1;small-group-opacity-2500.svg;108;113;103;9116;103;133;9116;108;108
NOISE: __end_tp_report

From http://pastebin.mozilla.org/1470000 .

Since I can't really read this being a mere human being, I modified results.py to parse this data:

+
+if __name__ == '__main__':
+    import sys
+    string_high = """
+|0;big-optimizable-group-opacity-2500.svg;123.5;354.25;92;1130;147;1130;1078;92;100
+|1;small-group-opacity-2500.svg;109;2333.25;103;9247;103;9012;9247;111;107
+"""
+    string_low = """
+|0;big-optimizable-group-opacity-2500.svg;119;353.75;91;1132;139;1132;1086;91;99
+|1;small-group-opacity-2500.svg;108;113;103;9116;103;133;9116;108;108
+"""
+    big = PageloaderResults(string_high)
+    small = PageloaderResults(string_low)
+    import pdb; pdb.set_trace()

This creates some PageloaderResults objects that are explorable with pdb . While I did this for a one-off hack, this is something we'll probably generally want as part of Signal from Noise: https://bugzilla.mozilla.org/show_bug.cgi?id=722915
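For readers without the Talos source at hand, a standalone parser for these report lines can be sketched as follows. This is not the PageloaderResults class from results.py; the function names are hypothetical, and the field order is taken from the `|i|pagename|median|mean|min|max|runs|` header in the logs above.

```python
def parse_pageloader_line(line):
    """Parse one '|i;pagename;median;mean;min;max;run;run;...' report line."""
    fields = line.strip().split(";")
    index, page = fields[0], fields[1]
    median, mean, low, high = (float(x) for x in fields[2:6])
    return {
        "index": index,
        "page": page,
        "median": median,
        "mean": mean,
        "min": low,
        "max": high,
        "runs": [float(x) for x in fields[6:]],  # everything after max is raw runs
    }


def parse_report(text):
    """Parse every data line (the ones containing ';') in a report string."""
    return [parse_pageloader_line(line)
            for line in text.splitlines() if ";" in line]


string_low = """
|0;big-optimizable-group-opacity-2500.svg;119;353.75;91;1132;139;1132;1086;91;99
|1;small-group-opacity-2500.svg;108;113;103;9116;103;133;9116;108;108
"""

for result in parse_report(string_low):
    print(result["page"], result["runs"])
```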

Then I looked at the data:

(Pdb) pp(small.results)
[{'index': '|0',
  'max': 1132.0,
  'mean': 353.75,
  'median': 119.0,
  'min': 91.0,
  'page': 'big-optimizable-group-opacity-2500.svg',
  'runs': [139.0, 1132.0, 1086.0, 91.0, 99.0]},
 {'index': '|1',
  'max': 9116.0,
  'mean': 113.0,
  'median': 108.0,
  'min': 103.0,
  'page': 'small-group-opacity-2500.svg',
  'runs': [103.0, 133.0, 9116.0, 108.0, 108.0]}]
(Pdb) pp(big.results)
[{'index': '|0',
  'max': 1130.0,
  'mean': 354.25,
  'median': 123.5,
  'min': 92.0,
  'page': 'big-optimizable-group-opacity-2500.svg',
  'runs': [147.0, 1130.0, 1078.0, 92.0, 100.0]},
 {'index': '|1',
  'max': 9247.0,
  'mean': 2333.25,
  'median': 109.0,
  'min': 103.0,
  'page': 'small-group-opacity-2500.svg',
  'runs': [103.0, 9012.0, 9247.0, 111.0, 107.0]}]

You'll notice a few things from the runs data:

  • the runs data is indeed bifurcated. In all cases there is a low value, around a hundred, and a high value in the thousands
  • contrary to the assumption that the first datapoint may be biased and high, you can't really see any bias, at least compared to the magnitude of the bifurcation

So how does this compare to the graphserver results? http://graphs-new.mozilla.org/graph.html#tests=[[170,1,21],[57,1,21]]&sel=1327791635000,1328041307110&displayrange=7&datatype=running

For the old data and the low value of the new data, we see times around 110-120ms. The high value of the new data is around 590ms. Are these numbers what we'd expect?

Throwing away the high value and taking the median for both data sets gives a number of the order of 100 or so (the old algorithm). Taking the median functions as a filter for the bifurcated results towards the majority population. Since the low population is slightly larger, dropping the highest number in the way that pageloader does further biases towards it. It is not surprising we see no bifurcation in the old data.

For the new data, we drop the first run. Coincidentally or not, for the cases studied the first run was part of the low population, so that tends towards bifurcation. Taking the median of the remaining data points gives

High case:

  • big-optimizable-group-opacity-2500.svg : (1078 + 100) / 2 = 589
  • small-group-opacity-2500.svg : (9012 + 111) / 2 = 4561.5

Low case:

  • big-optimizable-group-opacity-2500.svg : (99 + 1086) / 2 = 592.5
  • small-group-opacity-2500.svg : (133 + 108) / 2  =  120.5
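The four medians above can be checked with a short script. The run data is copied from the logs earlier in this post; the helper name is made up for the example.

```python
from statistics import median

# Raw runs per (case, page), copied from the TBPL logs above
runs = {
    ("high", "big-optimizable-group-opacity-2500.svg"): [147, 1130, 1078, 92, 100],
    ("high", "small-group-opacity-2500.svg"): [103, 9012, 9247, 111, 107],
    ("low", "big-optimizable-group-opacity-2500.svg"): [139, 1132, 1086, 91, 99],
    ("low", "small-group-opacity-2500.svg"): [103, 133, 9116, 108, 108],
}


def ignore_first_median(values):
    """Drop the first run, then take the median of the remaining runs."""
    return median(values[1:])


results = {key: ignore_first_median(values) for key, values in runs.items()}

for (case, page), value in results.items():
    print(case, page, value)
```

With four values left after dropping the first run, the median is the mean of the middle two, which is how one low and one high value end up averaged together.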

So why does the high case come out high and the low case come out low? There is even more magic. Graphserver reports an average by taking the mean of all the pages but discarding the high result: http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l208 (from http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l265 from http://hg.mozilla.org/graphs/file/d93235e751c1/server/collect.cgi ). Since both of the pages exhibit the high value of the bifurcation in the high case, you report the lower of the two bifurcated values: 589, from big-optimizable-group-opacity-2500.svg. Since in the low case only one of the values is bifurcated, you get the low value: 120.5, from small-group-opacity-2500.svg .
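The effect of that last step is easy to reproduce. The sketch below is not the code from collect.py; it only mimics the behaviour described above (discard the highest per-page value, average the rest), with a hypothetical function name.

```python
def average_without_max(page_values):
    """Discard the highest per-page value and average what is left."""
    kept = sorted(page_values)[:-1]
    if not kept:
        # With a single page nothing is left to average in this sketch
        raise ValueError("need at least two page values")
    return sum(kept) / len(kept)


# Per-page medians computed above: [big-optimizable..., small-group...]
high_case = [589.0, 4561.5]
low_case = [592.5, 120.5]

print(average_without_max(high_case))  # reports the big-optimizable page
print(average_without_max(low_case))   # reports the small-group page
```

With only two pages, "average minus the high value" degenerates into "report the smaller of the two".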

Okay mystery solved. We know why graphserver is reporting what data it is reporting and we also know that our algorithm is doing what we think it is doing. However, this is the beginning instead of the end of the problem.

By taking the average and discarding the high value of two data points, we are doing something weird and wrong. We are effectively only reporting one of the two pages. Note that for the high and the low case we are actually viewing data from different pages! This is misleading and probably outright wrong. We essentially have two pages just to throw one of them away, and then we have no confidence in what we are looking at. I'm not sure if the code at http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l208 would even work for a single page. Probably not. In general I grow increasingly skeptical of our amalgamation of results. We increasingly need to be able to get to and manipulate the raw data. We certainly need a way of digging into the stats so that we know what we're looking at and have confidence in it. In general, talos, pageloader, and graphserver need to be made such that it is both easier to try new filters and more transparent what is actually happening.

We have been trying to bias towards the low numbers. Looking at the data for the four tests shows that there are 13 low-state numbers and 7 high-state numbers. While there are more numbers in the low state, it is not an overwhelming majority.
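The 13-versus-7 split can be verified by counting. The 500 ms cut-off between the "low" and "high" states is an assumption chosen for this sketch; any value between the two clusters gives the same counts.

```python
# All 20 raw runs from the four tests above
all_runs = [
    147, 1130, 1078, 92, 100,    # high case, big-optimizable-group-opacity
    103, 9012, 9247, 111, 107,   # high case, small-group-opacity
    139, 1132, 1086, 91, 99,     # low case, big-optimizable-group-opacity
    103, 133, 9116, 108, 108,    # low case, small-group-opacity
]

THRESHOLD_MS = 500  # assumed boundary between the two states

low_state = [r for r in all_runs if r < THRESHOLD_MS]
high_state = [r for r in all_runs if r >= THRESHOLD_MS]

print(len(low_state), len(high_state))
```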

This leaves the big elephant in the room: why are these runs bifurcated? Are we seeing a code path, or is something else happening on these builders that leads to bifurcated results? While this will be challenging to investigate, IMHO we should know why this happens. While our method of throwing out the highest data point, getting the median, throwing the data to graphserver, then getting the average of the whole pageset back, has a positive effect of minimizing noise (which is important), it is also sweeping a lot under the rug. We need to have confidence that what we're ignoring is okay to ignore. I don't have that confidence yet.

January 31, 2012 04:42 PM

January 25, 2012

William Lachance

Yet more checkerboarding analysis

I’ve been spending a bit more time on refining the checkerboarding tests in Eideticker that I talked about last time. Most of my work has been focused on making the results as representative of a real world scenario as possible, to that effect I’ve been working on:

The end result of this is a framework that gives much more meaningful results. The bad news is that the results that I’m measuring don’t show a very positive picture for where we’re at with the native re-write of Firefox. Even relative to the version of mobile Firefox which is currently on the Android Market, we still have some catching up to do. Here’s some video of the “old” firefox in action:

And here’s the Native fennec (what we’re currently offering in nightly, with some minor modifications by me to change the way the “checkerboard” is drawn for analysis purposes):

The numbers behind this comparison:

Platform         Percent checkerboarding over run of test
Old Fennec       2%
Native Fennec    57%

(by the way, this performance regression is filed as bug 719447)

I know there’s lots of great effort going into improving this situation, so I have hope that we’ll be doing much better on this metric in the coming days/weeks. The process for creating these videos/analyses is mostly automated at this point, so my plan is to create a small dashboard (ala arewefastyet.com) to measure these numbers over time on the latest nightlies. Stay tuned!

January 25, 2012 10:18 PM

January 24, 2012

Jeff Hammel

Mozilla Automation and Testing - Jetpack Performance Testing

I have a working proof of concept for Jetpack performance testing (JetPerf): http://k0s.org/mozilla/hg/jetperf . JetPerf uses mozharness to run Talos ts tests with an addon built with the Jetpack addon-sdk to measure differences between performance with and without the addon installed.

Playing with Jetpack + Talos performance lets us explore statistics in a bit more straightforward manner than the production Talos numbers. In the Signal from Noise project, which I am also part of, there are a lot of steps to staging even small changes in how we process Talos data, since the system involved has many moving parts ( Talos, pageloader, graphserver ). By contrast, since JetPerf is a new project, it is much more flexible for exploring the data in ways that we have not hitherto explored.

I made a mozharness script to clone the hg mirror of addon-sdk . It then builds a sample addon and runs Talos with it installed.

Looking at raw numbers wasn't very interesting, so I made a parser for Talos's data format. It was pretty quick to get some averages out before and after the addon was installed, but I thought it would be more useful to display the raw data along with the averages.

https://bug717036.bugzilla.mozilla.org/attachment.cgi?id=591224

These really aren't fair numbers, as currently the stub jetpack I use prints to a file, but it's at least the start of a methodology.

The reason I'm sharing this isn't just to make a progress report, but more to present some ideas about thinking about what to do with Talos data. While this was done for JetPerf, much of this also applies to Signal from Noise. You run Talos and get some results. What do you do with them? Currently we just shove them into http://graphs.mozilla.org/ and say that's where you process them, but I think looking at them locally is not only important but necessary if you're doing development work. I think a big part of any statistics-heavy projects is to make it easy for all of the stakeholders to explore data, apply different filters and see how things fit together. While it takes a statistician to be rigorous about the process, anyone can play with statistics and it takes a village to really conceptualize what is being looked at. I hope, to this end, developers will use my software so that they can understand what it is doing and provide the valuable feedback I need.

TODO

JetPerf is still very much at a proof of concept stage. Ignoring the fact that none of it is in production, there are still many outstanding questions about basic facts of what we are doing here. But outside of polishing rough edges, here are some things in the pipeline.

  • test more variations of addons; currently we just load a panel and print something to a file
  • test on checkin (CI): the main point of JetPerf is to get a better idea of what SDK changes cause addon performance regressions and hits, and to be able to quantify them. While as stated this is a very open-ended project, one thing that would turn this from a casual exploration into a developer tool is running the tests on checkin. This will show in real time whether a checkin hurts performance.
  • graphserver: in order to assess Jetpack's performance over time, we will want to send numbers to some sort of graphserver. This will allow us to keep track of the data, to view it, and apply various operations to it.

I may also spin off the (ad hoc) graphing portion and the Talos log parser portion into their own modules, as they may be useful outside of just JetPerf.

January 24, 2012 01:10 PM

January 03, 2012

William Lachance

Measuring reduced checkerboarding in mobile Fennec

After my post on measuring checkerboarding in mobile Firefox, Clint Talbert (my fearless manager) suggested I run a before and after test to measure the improvement that just landed as part of bug 709512. After a bit of cleanup, I did so, measuring the delta between my build on December 20th and the latest version of Aurora. The difference is pretty remarkable: at least on the LG G2X that I’ve been using for testing, we’ve gone from checkerboarding 10-20% of the time to almost not checkerboarding at all (across two runs of the test with the Aurora build, there is exactly one frame that checkerboards). All credit to Chris Lord for that!

See the video evidence for yourself. Before:

After:

January 03, 2012 08:18 PM

January 01, 2012

Jeff Hammel

Mozilla Automation + Testing - MozBase Continuous Integration

As part of the A-Team 2011 Q4 goals I was able to devote a few days to setting up continuous integration (CI) for MozBase. I revived and extended autobot to support buildbot 0.8.5, set up tests and a simple test runner for mozbase, and deployed a test instance to k0s.org. You can see the waterfall here: http://k0s.org:8010

While buildbot comes with a gitpoller, the version in buildbot 0.8.5 (the current one on http://pypi.python.org/ ) did not work with git 1.6.3, the version on k0s.org. Since my box is on an ancient version of Ubuntu (and is remote and not trivially upgradable), I brought the generic autobot poller from being buildbot 0.8.3 compatible to 0.8.5 compatible (which, it is worth noting, is not trivial). Also, while a patch for an hgpoller was submitted by Mozilla developers some four years ago, it has been WONTFIXed, so I went ahead with a generic polling architecture which (IMHO) seems a wiser architectural choice. While I sympathize with the architectural ideology of using a push-based architecture, and believe this is closer to ideal, polling will always work and does not require access to the repository servers, which is a huge factor when using https://github.com or even Mozilla hg repositories. (Incidentally, I found neither this patch nor http://hg.mozilla.org/build/buildbotcustom/file/tip/changes/hgpoller.py to work OOTB, so, sadly, I proceeded to roll my own. Also incidentally, it is not trivial to depend on buildbotcustom using install_requires due to its lack of a setup.py file.) After debugging the gitpoller I pushed a test change and was happy to see that autobot built correctly. Autobot now listens to MozBase changes!
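
To make the polling idea concrete, here is a minimal Python sketch of a generic poller. It is not autobot's or buildbot's code: `RevisionPoller` and `get_head` are hypothetical names, and the repository is simulated with a list of revision ids. The only thing the poller needs is some read-only way to ask for the current head, which is why no access to the repository server is required.

```python
class RevisionPoller:
    """Remember the last revision seen and report when it changes."""

    def __init__(self, get_head):
        # get_head is any callable returning the current revision id as a
        # string; in practice it could wrap a read-only VCS command.
        self.get_head = get_head
        self.last_seen = None

    def poll(self):
        """Return the new revision if the head moved, otherwise None."""
        head = self.get_head()
        if head == self.last_seen:
            return None
        first_poll = self.last_seen is None
        self.last_seen = head
        # The first poll only records a baseline; it should not trigger a build.
        return None if first_poll else head


# Simulate a repository whose head changes between polls.
heads = iter(["abc123", "abc123", "def456"])
poller = RevisionPoller(lambda: next(heads))
results = [poller.poll() for _ in range(3)]
print(results)
```

A real deployment would call `poll()` on a timer and hand any non-None result to the build scheduler.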

I was unable to finish the (parenthetical) Q4 goal of having autobot report to autolog, so this remains outstanding work. There is a lot that could be done with autolog. The basic idea and TODOs are outlined in the README (which itself could use some work; it is largely up to date except the Projects section, though incomplete). I will endeavor to work on this in my available time or as need escalates, but my priority for 2012 Q1 will be separating Talos Signal From Noise so it is unlikely I will be able to put a lot of time into autobot (sadly). On the other hand, I am more than willing to help and advise if anyone wants any features or to iron out the crinkles. While the architecture is not completely straightforward, it is a decent approximation to a convex hull over the problem space of having simple to write, simple to maintain, simple to debug continuous integration for small(er) projects. As usual, if anyone wants to seek out alternate solutions, that is fine too, but I am essentially happy with my architecture decisions and technology choices.

Regardless of whether the CI solution for MozBase is autobot or (other), it is important to remember that continuous integration is a safety net and not a first line of defense. It is regrettable that autobot has no more notifications (yet) than the waterfall display and the autobot character lurking in #ateam (the default IRC bot isn't very verbal OOTB and I haven't had time to customize it). But I think having some (admittedly smokescreen) automated testing for MozBase is an important step towards the evolution of the software as well as towards development practices in general.

January 01, 2012 04:51 PM

December 28, 2011

Jeff Hammel

Auto-tools Q4 in reflection: progress on mozbase and talos

Most of my effort this quarter was spent on two related goals:

  1. developing a sane set of python packages to build test harnesses on top of. We call this MozBase: https://wiki.mozilla.org/Auto-tools/Projects/MozBase
  2. Making Talos sane and porting it to use the MozBase set of packages.

These are illustrated in our goals page: https://wiki.mozilla.org/Auto-tools/Goals/2011Q4#Mozbase

From one point of view, this isn't exciting work. But I live for this stuff. I think of software as an ecosystem to be cultivated and I live to cultivate it. So while, for the most part, I can't point to any exciting features that I implemented (nor were there planned to be), in retrospect I am proud of the fruits of my efforts and those of my team-mates and comrades. A big shout out to BYK and others who have stepped up to the plate to help the A-Team with these super-important efforts.

When I look back I see:

  • Talos wasn't a python package. Now it is!
  • MozBase didn't even exist or have a repo. Now it does
  • MozBase didn't have documentation or tests worth speaking of. Now it has at least a good start!
  • Talos even has a test for installation. We need more tests, but it's a good start!
  • There has been a lot of cleanup of Talos towards the end of making it more robust, easier to use, and easier to contribute to.
  • The A-Team didn't have any community contributors. Now we do! This one actually makes me the happiest :)

When I look at the progress, I see Talos evolving towards what I would call real software (instead of a one-off that has been extended to do way too much to remain a one-off) that Mozillians can hack on, extend and make useful changes to. This also sets the stage for making Talos easier for developers to use locally to test their changes, for getting more of our test harnesses to use the MozBase suite of utilities, and for making it easier to write new harnesses without reinventing so much of the wheel.

One of our next priorities towards these ends is Bug 713055 - get Talos on Mozharness in production. This is a huge step towards making buildbot more extensible as well as making desktop talos more accessible to developers in a way that should be identical to the way that it is done in automation. :aki has done a bunch of work to start moving our aging buildbot infrastructure towards something more sane. This is mozharness.

Armen (:armenzg) also updated the way that talos.zip is sought so that it can be decoupled from buildbot. This is another big step forward that he details in his blog post: talos.zip, talos.json and you.

So a huge shout out to :jmaher and :wlach for all the Talos help, and :ahal and :ctalbert as well as all the help from those in release engineering for making all of this possible. I look forward to getting this all better in the coming year.

December 28, 2011 12:18 PM

December 27, 2011

Clint Talbert

Cross Browser Startup Automation

One of the longest running performance measurements we have is how long it takes Firefox to start. We do it very simply just to get a raw number (and yes, there have been many improvements made but this is the gist of the automation):

Pretty simple.  Applying this to different browsers, you have to nix the “print to console” idea.  But, how hard could it be to POST to a web service that stuffs your result in a database?  Do that and the rest of it will all “just work”, right?

Well, not really.  Every browser implements the cross-origin access policy to a different degree, and since we did this on android, some of them don’t seem to support it at all.  Once we found a way around that, we realized that not all the data was making it into the database because the automation would kill the browser before it had a chance to POST its results.  So we slowed that down, forcing the automation to wait 20s before closing the browser.  Then our database crashed, this part we had nothing to do with, but Murphy’s law states that you can’t have an automation project without at least one bonfire igniting under your chair.

Add to this cross-browser headache that we’re automating this on multiple phones.  The older Nexus phones (Nexus One and Nexus S) will not stay connected to a wireless network after reboot (appears fixed with Galaxy-Nexus or with ICS, not sure which).  Even if you put these phones on an open network with no contention and they are set to “join automatically”, they will at some point boot into a state with their wireless disabled. We had to write some service code to ensure the wireless remained on and connected to our specific network on boot.  Our other phones (a Droid Pro and a Samsung Galaxy S2) have no problem staying connected to the network, but they alternately “freeze”.  I’m still trying to debug what this “freeze” actually is, but everything is functioning fine on the phone – network, logcat, process list etc are all normal.  However, the phone stops running the automation.  It’s interesting that the Nexus phones never encounter this issue and they are all running the same version of the automation code and browsers.

At long last, we have fought through enough of these issues so that we can start to see the results of the data coming into our database (select “2 months” or “all” to see data).  Because we are merely firing our “timing” function when the “onload” event happens for the page, we can see the different interoperability issues with measuring this event. We knew it wasn’t perfect, but the results we are seeing on Android make me call into question the usefulness of this as a cross-browser comparison tool at all.

Onload results for Native Fennec

The system is far from perfect.  Measuring onload is at best an artificial metric, and not at all indicative of what the user sees.  In desktop automation, we don’t even use onload, we use the “mozafterpaint” event notification.  For the next stage of the cross-browser test we are going to automate some visual comparison tests to get closer to measuring the metric that really matters: real-life user experience.  In the meantime, the onload tests will continue to give us a rough barometer of our regressions and performance, especially against our own historical data.  To that end, I am going to undertake the next few improvements to this automation:


December 27, 2011 10:48 PM

December 23, 2011

William Lachance

Year end Eideticker update

Just before I leave for some Christmas vacation, it’s time for another update on the state of Eideticker. Since I last blogged about the software, I’ve been working on the following three areas:

  1. Coming up with a better algorithm (green screen / red screen) for determining both the area of the capture and the start/end of the capture. The harness was already flood filling the area with these colours at the beginning/end of the capture, but now we’re actually using this information. The code’s a little hacky, but it seems to work well enough for the test cases I’ve been using so far.
  2. As a demonstration, I wrote a quick test that demonstrates checkerboarding on mobile Fennec, along with a quick bit of analysis code to detect this pattern and give an overall measure of how much this test “checkerboards” (i.e. has regions that are not fully painted when the user scrolls). As I understand it, our mobile team is currently working on this problem quite a bit, so it will be interesting to watch the numbers given by this test and see if things improve.
  3. It’s a minor thing, but you can now view a complete webm movie of the capture right from the web interface.

Here’s a quick demonstration video that shows all the above in action. As before, you might want to watch this full screen:

Happy holidays!

December 23, 2011 04:59 PM

December 21, 2011

Joel Maher

Love Mozilla, Love Python, Want to Help, What Next?

I have been asked a few times over the last couple months how to help out at Mozilla, specifically with python.  I know there are dozens of teams within Mozilla that have various python related projects.  I am on the automation and tools team at Mozilla (known as the A*Team) and we do a lot of python related work.  It seems that we are asked to add new and crazy stuff to harnesses or write new and interesting tools (usually a blend of python and javascript).

1) Mozbase.  Our effort in our spare time is to refactor the test harnesses within Mozilla to share common code where possible; we call the result mozbase.  I recommend doing a git clone of mozbase and getting it installed on your system: git clone git@github.com:mozilla/mozbase.git

2) Talos.  Next is to pick up a test harness.  We have been focusing on talos.  Mostly because you don’t have to pull the entire mozilla-central tree, do hour long builds, and really because the talos code base is in need of some serious updating.  To get talos, you need to clone it: hg clone http://hg.mozilla.org/build/talos

3) Configure Talos.  Talos is run in 2 steps right now: a configuration step and an execution step.  The configuration step requires a path to firefox.exe as well as an active test (I use ts to keep it simple) and is pretty easy:  “python PerfConfigurator.py --develop -a ts -e <path>/firefox.exe --output mytest.yml”.

4) Run Talos.  This step is easy.  Make sure you don’t have another instance of firefox.exe running on the computer and then run: “python run_tests.py -d -n mytest.yml”.

5) Take a look at some of these bugs that we have which are related to mozbase and/or talos:  http://bit.ly/tZHs3G
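
Steps 3 and 4 above can be scripted. The sketch below only wraps the two command lines quoted in those steps, assuming the long options are spelled with double hyphens; the function names and the `talos_dir` parameter are hypothetical and not part of Talos.

```python
import subprocess


def configure_command(firefox_path, test="ts", output="mytest.yml"):
    """Build the PerfConfigurator command line quoted in step 3."""
    return ["python", "PerfConfigurator.py", "--develop",
            "-a", test, "-e", firefox_path, "--output", output]


def run_command(config="mytest.yml"):
    """Build the run_tests command line quoted in step 4."""
    return ["python", "run_tests.py", "-d", "-n", config]


def run_talos(firefox_path, talos_dir):
    """Configure and then run Talos from inside a talos checkout."""
    # check_call raises CalledProcessError if either step exits non-zero.
    subprocess.check_call(configure_command(firefox_path), cwd=talos_dir)
    subprocess.check_call(run_command(), cwd=talos_dir)
```

Calling `run_talos("<path>/firefox.exe", "talos")` with your own firefox path would run both steps in order and stop if the configuration step fails.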

While this isn’t exhaustive or a perfect guide for how to work on the perfect bug in an hour or less, these 5 steps should get you setup to work on basic Mozilla code and start fixing bugs!  Pop into #ateam on irc.mozilla.org and ask some questions.

Now back to the other PI(e) that I always talk about!


December 21, 2011 04:38 AM

December 20, 2011

Joel Maher

How the mobile automation for Android became reliable in the last few months

Making the Mozilla automation infrastructure run reliably for each checkin on mobile devices has been my primary focus for the last few years.  Last year at this time we were just trying to get Android automation up and running; all the tools and harnesses had been written and were ready to run.  The core buildbot code for running the tests was in place.  The problem was that we just had so many failures of the devices (NVidia Tegra development boards) and the tests.

So as the months went on from last December and up through August, we really made little progress.  A few tests were fixed, some disabled, some checks in place to make the boards stay online, but really no consistent set of test results.

There were a couple things that fixed our problems:

1) a rock star intern (:jchen) who found and fixed some workarounds with the OS so fennec wouldn’t crash all the time (issues with networking and libc).

2) a weekly meeting started by :blassey to go over all the bugs, status, issues, future work, and other items.

Both of these items are signs that the mobile development team was serious about testing and wanting to see Android unittests become a part of Mozilla.  While this seems trivial, it was next to impossible to keep tests running smoothly without support from the entire team.

Enjoy the reliable unit tests on Android!


December 20, 2011 09:03 PM

December 11, 2011

Andrew Halberstadt

Peptest: Running and Adding Tests in Mozilla Central

Peptest is an automated test harness for measuring responsiveness (or lack thereof) in Firefox (see my older post).

Recently, peptest landed in Mozilla Central along with a make target. This means that you can run the tests with:

cd path/to/objdir
make peptest
There's some more in depth documentation on MDN.

Adding Tests

Being able to run the tests is great and all, the only problem is that there aren't any tests yet (aside from a handful of example tests)! Tests need to be added by developers and should correspond to a bug. I'll be making another post about best practices in a bit, but for now you can check out the test format to get started.

Once you have your test finished, tested and reviewed you can add it to the default list of tests. Tests currently live in testing/peptest. For example, say I had a couple tests that tested the responsiveness of Firefox's tabs. I might:

  1. Create a new directory called testing/peptest/tests/firefox/tabbrowsing
  2. Place the tests into this directory
  3. Create a manifest called tabbrowsing.ini and populate it with my tests eg:
    [tabbrowsing_open.js]
    [tabbrowsing_close.js]
    [tabbrowsing_switch.js]
    
  4. Add this manifest to the firefox_all.ini manifest:
    [include:tabbrowsing/tabbrowsing.ini]
    
Any tests that are included in the firefox_all.ini manifest will be run when make peptest is run.
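
If a directory holds many tests, the manifest can be generated instead of typed by hand. The following Python sketch only reproduces the two line formats shown above (`[name.js]` sections and an `[include:...]` line); the helper names are hypothetical and this is not part of peptest.

```python
def manifest_text(test_names):
    """Return manifest contents with one [section] per test file."""
    return "".join("[%s]\n" % name for name in sorted(test_names))


def include_line(manifest_path):
    """Return the line that pulls a manifest into a parent manifest."""
    return "[include:%s]\n" % manifest_path


tests = ["tabbrowsing_open.js", "tabbrowsing_close.js", "tabbrowsing_switch.js"]
print(manifest_text(tests), end="")
print(include_line("tabbrowsing/tabbrowsing.ini"), end="")
```

The first output would go into tabbrowsing.ini and the second line would be appended to firefox_all.ini.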

Feel free to ping me (ahal) on irc if you have any questions or need help writing tests.

December 11, 2011 03:42 AM

December 09, 2011

William Lachance

Eideticker areas to explore

So I got some nice feedback on my Eideticker post yesterday on various channels. It seems like some people are interested in hacking on the analysis portion, so I thought I’d give some quick pointers and suggestions of things to look at.

  1. As I mentioned yesterday, the frame analysis is rather stupid. We need to come up with a better algorithm for disambiguating input noise (small fluctuations in the HDMI signal?) from actual changes in the page. Unfortunately the breadth of things that Eideticker’s meant to analyze makes this a bit difficult. I.e. edge detection probably wouldn’t work for something like Microsoft’s psychedelic browsing demo. I suspect the best route here is to put some work into better understanding the nature of this “noise” and finding a way to filter it out explicitly.
  2. Our analysis code is still rather slow, and is crying out to be parallelized (either by using multiple cores of the same CPU or a GPU). Burak Yiğit Kaya recommended I look into PyCuda which looks interesting. It looks like there are other possibilities as well though.
  3. Clipping capture by green screen/red screen. This should be doable by writing some relatively simple code to detect large amounts of green and red and then ignoring previous/current/subsequent frames as appropriate.
  4. Moar test cases! It was initially suggested to use some of the classic benchmarks, but these only seem to barely work on Fennec (at least with the setup I have). I don’t know if this is fixable or not, but until it is, we might be better off coming up with more reasonable/realistic measures of visual performance.
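
To give a flavour of item 3, here is a rough Python sketch of clipping a capture by green and red marker frames. It is an assumption-laden illustration, not Eideticker code: frames are plain lists of (r, g, b) tuples, and `mostly_color`, `clip_capture` and the threshold values are all made up.

```python
def mostly_color(frame, channel, fraction=0.9, level=200):
    """True if at least `fraction` of pixels are dominated by one channel.

    frame is a list of (r, g, b) tuples; channel is 0 for red, 1 for green.
    """
    def dominant(pixel):
        others = [v for i, v in enumerate(pixel) if i != channel]
        return pixel[channel] >= level and max(others) < level // 2

    hits = sum(1 for pixel in frame if dominant(pixel))
    return hits >= fraction * len(frame)


def clip_capture(frames):
    """Keep the frames after the last green frame and before the next red one."""
    green = [i for i, f in enumerate(frames) if mostly_color(f, 1)]
    red = [i for i, f in enumerate(frames) if mostly_color(f, 0)]
    start = green[-1] + 1 if green else 0
    end = next((i for i in red if i >= start), len(frames))
    return frames[start:end]


GREEN = [(0, 255, 0)] * 4
RED = [(255, 0, 0)] * 4
PAGE_A = [(120, 120, 120)] * 4
PAGE_B = [(30, 60, 90)] * 4

clipped = clip_capture([GREEN, GREEN, PAGE_A, PAGE_B, RED])
print(len(clipped))
```

A page that is itself mostly green or red would confuse this version, so a real implementation would probably only look for markers near the start and end of the capture.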

You might be able to find other inspiration on the Eideticker project page (note that some of this is out of date).

You obviously need the decklink card to perform captures, but the analysis portion of Eideticker can be used/modified on any machine running Linux (Mac should also work, but is untested). To get up and running, just follow the instructions in README.md, dump a pregenerated capture into the captures/ directory (here’s one of a clock), and off you go! The actual analysis code (such as it is) is currently located in src/videocapture/videocapture/capture.py while the web interface is in https://github.com/mozilla/eideticker/blob/master/src/webapp.

I’m going to be out later today (Friday), but I’m mostly around on IRC M-F 9ish-5ish EST on irc.mozilla.org #ateam as `wlach`. Feel free to pester me with questions!

P.S. I didn’t really cover infrastructure/automation portions above as I suspect people will find that less interesting (especially without a video capture card to test with), but you can look at my newsgroup post from yesterday if you want to see what I’ll likely be up to over the next few weeks.

December 09, 2011 03:47 PM

December 08, 2011

William Lachance

Eideticker update

Since I last blogged about Eideticker, I’ve made some good progress. Here’s some highlights:

  1. Eideticker has a new, much simpler harness and tests are much easier to write. Initially, I was using Talos for this task with the idea that it’s better not to have duplicate code where it’s not really required. Seemed like a fine idea in principle, but in practice Talos’s architecture (which is really oriented around running a large sequence of tests and uploading the results to a central server) was difficult to extend to do what we need to do. At heart, eideticker really only needs to do a few things right now (start up Firefox, start videocapture, load a webpage, stop videocapture) so it’s best to keep things simple.
  2. I’ve reworked the capture analysis API to use numpy behind the scenes. It’s still not quite as fast as I would like (doing a framediff analysis on a 30 second animation still takes a minute or so on my fast machine), but we’re doing an order of magnitude better than before. numpy also seems to have quite the library of routines for doing the types of matrix algebra useful in image analysis, which should be helpful as the project progresses.
  3. I added the beginnings of a fancy pants web interface for browsing captures and doing visualizations on them! I’m pretty happy with how this is turning out so far, it’s already been an incredibly useful tool for debugging Eideticker’s analysis system and I think it will be equally useful for understanding Firefox’s behaviour in general.

Here’s an example analysis session, where I examine a ~60 second capture of the fishtank demo from Microsoft, borrowed from Mark Cote’s speedtest library. You might want to view this fullscreen:

A few interesting things to note about this capture:

1. Our frame comparison algorithm is still comparatively dumb: it just computes the norm of the difference in RGB values between two frames. Since there’s a (very tiny) amount of noise in the capture, we have to use a threshold to determine whether two frames are the same or not. For all that, the FPS estimate it comes up with for the fishtank demo seems about right (and unfortunately at 2 fps, it’s not particularly good).
2. I added a green screen / red screen at the start / end of every capture to eliminate race conditions with starting the capture, but haven’t yet actually taken those frames out of the analysis.
3. If you look carefully at the animation, not all of the fish that should be displaying in the demo are. I think this has to do with the new native version of Fennec that I’m using to test (old versions don’t exhibit this property). I filed a bug for this.
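
The frame comparison in point 1 can be sketched in a few lines of Python. This is a simplified illustration rather than Eideticker's implementation: plain lists stand in for the numpy arrays, the Euclidean norm is assumed (the text only says "the norm"), and the function names and threshold value are hypothetical.

```python
import math


def frame_difference(frame_a, frame_b):
    """Euclidean norm of the difference between two flattened RGB frames."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)))


def count_distinct_frames(frames, threshold):
    """Count frames that differ from their predecessor by more than threshold."""
    if not frames:
        return 0
    distinct = 1
    for previous, current in zip(frames, frames[1:]):
        # Differences at or below the threshold are treated as capture noise.
        if frame_difference(previous, current) > threshold:
            distinct += 1
    return distinct


def estimate_fps(frames, capture_seconds, threshold):
    """Distinct frames per second of capture."""
    return count_distinct_frames(frames, threshold) / capture_seconds


# Four tiny made-up frames: two pairs that differ only by noise.
frames = [[0, 0, 0], [1, 0, 0], [50, 50, 50], [50, 51, 50]]
print(estimate_fps(frames, capture_seconds=2.0, threshold=5))
```

The threshold is what separates signal noise from a real repaint; picking it too high hides genuine small changes, too low counts noise as new frames.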

What’s next? Well, as I mentioned last time, the real goal is to create a tool that developers will find useful. To that end, we have plans to set up an Eideticker machine in Mozilla’s Mountain View office that more people can use (either locally or remotely over the VPN). For this to be workable, I need to figure out how to get the full setup working on “demand”. Most of the setup already allows this, with one big exception: the actual Android device that we want to capture video from. The LG G2X that I’m currently using works fine when I have physical access to it, but as far as I can tell it’s not possible to get it outputting proper video of an application unless it’s in an unlocked state (which it obviously isn’t most of the time).

My current thinking is that a Panda Board running a vanilla version of Android might be a good candidate for a permanently-connected device. It is capable of HDMI output, doesn’t have the unwanted bells and whistles of a physical phone (e.g. a lock screen), and should be much more reliable due to its physical networking. So far I haven’t had much luck getting the video output working with the Decklink capture card, but I’ve only just started trying. Work will continue.

If I can somehow figure that out, and smooth out some of the rough edges with the web interface and capture API, I think the stage will be set for us all to do some pretty interesting stuff! Looking forward to it.

December 08, 2011 05:11 PM

December 05, 2011

Joel Maher

TegraPool – Bathing suit not required

I have had the honor of working with Trevor on a few projects during his internship at Mozilla.  One of the earlier projects he worked on was TegraPool, a utility to check out a tegra and run tests on it as we do on tinderbox.  Trevor doesn’t have a blog set up or a feed to planet, so here is what Trevor has to say:

A few weeks ago, I got nominated as a friend of the tree for helping mobile development get past issues with talos. Being a relatively new hire, I too am aware of how difficult it can be to get a testing environment set up, especially if you haven’t worked with mobile devices at all. Luckily, the first project I worked on this term is especially useful for those who can’t be bothered with setting up talos, or getting a Tegra and setting up the proper config for it.

Tegra Pool is an internal-only site for those who need to debug the issues in an automated testing run (such as on TBPL).  It can be found here: TegraPool. It will show you a table of all the devices available, and a couple forms to checkin and checkout.

If you are local and have the entire mobile testing suite set up, then it’s easy. Just put in your LDAP credentials and click “Check Out”. The IP of the Tegra you get will pop up, and not only will you be able to telnet and use the SUTAgent, but the AndroidDeviceBridge(ADB) will use TCP/IP, allowing you to connect with “adb connect <ip>”.

Remoties have a bit more of a problem, as most tests will require the Tegra to contact an “external server”, but we don’t want it to be making requests off-network. Also, many users don’t want to run the tests on their own computer, because they might need to run other things, or might not want to set up the entire testing environment. Luckily TegraPool solves these problems too.

If you are remote, or don’t have the time to set up the entire testing set, you can select the “I want server…” checkbox. To set up everything for you, you will have to point it to a test folder to get the app and zip files. This is best done with going to the build on Try in TinderBoxPushLog and clicking, “go to build directory”, when B is selected, and selecting the try-android-xul directory (or equivalent). Alternatively, ftp…mozilla-central-android  (or other folders in the nightly directory) is usually a good folder to use. This will set up a temporary account for you on the TegraPool server (based on your LDAP username).

Once you have checked out a Tegra and received a temporary account, you can now SSH into the machine (The password is a standard “giveMEtegra”). If you look in the home folder, you can see a lot of scripts that will just run. runMochiRemote.sh will run every single mochitest, runTalosRemote.sh will run the quick tpan test, and runRefRemote.sh will run the ref tests. If you connect to the device with ADB before running this, you should be able to pinpoint where issues are occurring.
This directory is a quick product, and should not limit what you can do. You can sftp new fennec.apk files, or modify the .sh files to run the necessary tests (i.e. other talos tests or specific mochitests).

Hopefully, this should let anybody who wants to debug mobile issues have a fast and easy option. Right now there are only 2 Tegra boards and 2 Panda boards running android, but if there is enough usage, more devices will be added. Happy debugging!

-Trevor (tfair on IRC)


December 05, 2011 10:41 PM

Jeff Hammel

The state of Talos this week:

  • we need to fix Talos importing of mozbase. We want to get Talos to consume mozdevice, mozinfo, mozhttpd, mozrunner, mozprocess, and mozprofile
  • The current state of things:
    - talos includes the files of mozdevice and mozhttpd.py
    - we mirror these manually but things get out of sync
  • An interim solution is posed in bug 707218: mirror mozdevice, mozinfo, and mozhttpd to talos for the purpose of creating a tests.zip file and list them in setup.py for setuptools installation. This works because these are all simple dependencies, but will not work for mozprocess, mozrunner, and mozprofile as these all have dependencies of their own
  • In order to use these dependencies (mozprocess, mozrunner, mozprofile) in production talos, we will need a releng python package index: Bug 701506 . I will do a mock-up there; whether it will fulfill releng needs or not is hard to say. We will probably want to transition to mozharness soon thereafter or at the same time, but we shouldn't block on any more than we need to. These are all big changes to our deployment strategy, and for purposes of QA we will want to make as strategic and specific decisions as possible.
  • Once the transition is completely done, we can do away with talos.zip entirely.
  • Additionally, in order to make talos work with setup.py, the pywin32 package should be listed. However, pywin32 is in general compiled for specific python and windows versions. See e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=673132#c8 . BYK is looking into this, possibly switching the linked dependency based on the platform you are on.
  • slewchuk is looking into Talos data aggregation: bug 707486
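
One way to read the pywin32 item above is "only list it when installing on Windows". The Python sketch below shows that idea in isolation; it is not Talos's setup.py, the `install_requires` helper is hypothetical, and it does not solve the separate problem that pywin32 is compiled for specific python and windows versions.

```python
import sys


def install_requires(platform=sys.platform):
    """Return the dependency list for the given platform string."""
    # The simple dependencies named in the list above.
    deps = ["mozdevice", "mozinfo", "mozhttpd"]
    if platform.startswith("win"):
        # pywin32 only makes sense on Windows.
        deps.append("pywin32")
    return deps


print(install_requires("linux2"))
print(install_requires("win32"))
```

The returned list could then be passed as the `install_requires` argument of a setuptools `setup()` call.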

This is a rough map of what we want to do. As said, with so many balls in the air, we will want to block on as little as possible and make as few really big changes at a time so that we can ensure that each piece of the puzzle fits together correctly.

December 05, 2011 08:06 PM

November 30, 2011

Mark Côté

Mozilla A-Team: Writing tests for Peptest

With ahal's impending return to studentdom and myself coming back from paternity leave, I will be taking over Peptest development and maintenance. To get myself up to speed, I wrote some tests.




The easiest test to write is one that looks for unresponsiveness while simply loading a page. I've noticed that the site for my favourite blog, The Daily What, causes some pain in Firefox, what with all the videos and images and so forth. I wrote a very simple test to see if I was onto something:

Components.utils.import('resource://mozmill/driver/mozmill.js');
let c = getBrowserController();

pep.performAction('open_page', function() {
  c.open('http://thedailywh.at');
  c.waitForPageLoad();
})


Indeed, while there were no very long pauses, there was a string of short ones. Remember, we care about pauses longer than 50 ms, which Peptest identifies for us:

PEP TEST-START | test_dailyWhat.js
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 103 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 199 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 112 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 204 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 105 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 57 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 79 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 194 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 202 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 68 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 182 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 63 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 84 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 118 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 51 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 55 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 67 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 215 ms
PEP TEST-UNEXPECTED-FAIL | test_dailyWhat.js | fail threshold: 0.0 | metric: 322.362
PEP TEST-END   | test_dailyWhat.js | finished in: 9426 ms


Not awful, but not great. I filed bug 706250 to investigate this.

Next, I decided to delve into some Project Snappy bugs to see what else I could find.

Changing the URL in the above test was all I needed to confirm that the page in comment 62 of bug 61684 was still an issue. This time I got about 50 unresponsive periods, the longest being 2.6 s. Ouch.

Bug 430106 is a little more interesting. Someone reported problems switching back to a tab in which a large image was loaded. The simplest way I could replicate this was by loading the example URL in one tab, loading any old page in a second tab, waiting for about 20 seconds, then switching back to the first tab. In peptest form,

Components.utils.import('resource://mozmill/driver/mozmill.js');
let c = getBrowserController();

while (c.window.gBrowser.tabs.length < 2) {
  c.window.gBrowser.addTab();
}

// Load large image in first tab.
c.tabs.selectTabIndex(0);
c.open('http://flickr.com/photos/thomasstache/2429920499/sizes/o/');
c.waitForPageLoad();

// Load any page in second tab.
c.tabs.selectTabIndex(1);
c.open('http://www.mozilla.org');
c.waitForPageLoad();

// Wait for memory to be freed from first tab.
c.sleep(20000);

pep.performAction('switch_tab', function() {
  c.tabs.selectTabIndex(0);
  // Wait for image to repaint.
  c.sleep(2000);
});


When I ran this, I saw a visible delay before the image was repainted. Peptest confirmed this:

PEP TEST-START | test_largeImgTabSwitchLocal.js
PEP WARNING    | test_largeImgTabSwitchLocal.js | switch_tab | unresponsive time: 54 ms
PEP WARNING    | test_largeImgTabSwitchLocal.js | switch_tab | unresponsive time: 835 ms
PEP TEST-UNEXPECTED-FAIL | test_largeImgTabSwitchLocal.js | fail threshold: 0.0 | metric: 700.141
PEP TEST-END   | test_largeImgTabSwitchLocal.js | finished in: 24083 ms


The unresponsiveness appears to scale with the size of the image: an image of about twice the dimensions, that is, 3.4 times as many pixels, resulted in a delay about 3.4 times as long.
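
The metric lines in both logs above are consistent with summing the squares of the reported unresponsive times (in ms) and dividing by 1000. That formula is inferred from the output shown here, not read from the Peptest source, and the function names below are made up for illustration. Under that assumption, the metric for the tab-switch run can be recomputed like this:

```python
import re

# Matches the "unresponsive time: N ms" part of a Peptest warning line.
UNRESPONSIVE_RE = re.compile(r"unresponsive time: (\d+) ms")


def unresponsive_times(log_text):
    """Return the unresponsive times (in ms) reported in a Peptest log."""
    return [int(ms) for ms in UNRESPONSIVE_RE.findall(log_text)]


def inferred_metric(times_ms):
    """Sum of squared unresponsive times, divided by 1000.

    Assumption: inferred from the log output, not from the Peptest code.
    """
    return sum(ms * ms for ms in times_ms) / 1000.0


LOG = """\
PEP WARNING    | test_largeImgTabSwitchLocal.js | switch_tab | unresponsive time: 54 ms
PEP WARNING    | test_largeImgTabSwitchLocal.js | switch_tab | unresponsive time: 835 ms
"""

times = unresponsive_times(LOG)
print(times)                   # [54, 835]
print(inferred_metric(times))  # 700.141
```

If the formula holds, squaring means a single long pause dominates the result: the 835 ms pause alone accounts for 697.225 of the 700.141 total.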




One of the next steps for Peptest is to add JS-function tracing so we can figure out the exact sources of unresponsiveness. This, however, requires a fix for bug 580055, which in turn depends on bug 702740. As soon as patches for those bugs have landed, we'll add support to Peptest.

For more information on Peptest, see the wiki article and/or check out the code, which has recently been moved to hg.mozilla.org under mozilla-central/testing/peptest.

November 30, 2011 06:38 PM

November 28, 2011

Joel Maher

work in progress – turning on reftests in the new native UI for firefox on android

The latest builds of Mobile Firefox are switching to using a Java-based UI, which means the tests that depend on a traditional window environment and backend XUL will most likely fail.  In general we have mochitests and some talos tests running, but reftests are a huge piece of testing that hasn’t been working.

In bug 704509, I have a patch to get reftests working with a Java front end.  This is really just using the XUL backend, but making it work with the limited support we have for addons and XUL. Here are some differences:

I need to make this work with our current reftest harness for Firefox.  So most of these changes will need to be cleaned up to work in a way acceptable to everybody and minimize the special case hacking for android.


November 28, 2011 04:17 PM

November 21, 2011

Jeff Hammel

I've been developing Talos (https://wiki.mozilla.org/Buildbot/Talos) ...

I've been developing Talos recently. There are many caveats to working on this test harness that demand a more rigorous process than, say, a webapp. It has a large amount of necessary platform-specific code. It is deployed in a complex infrastructure environment. And it has no tests.

In order to test Talos, the A*Team has an internal staging environment (thanks to the efforts of anode and bhearsum and others) that mirrors the production testing infrastructure environment. Like production, it requires an HTTP-hosted URL structure containing pageloader, a pageset (tp5), and other resources necessary for buildbot plus Talos. (We should probably document the directory structure.)

In order to test Talos, you point the A*Team staging environment configuration to the HTTP-hosted location of your copy of this structure of resources. Then you issue a buildbot sendchange (which can be scripted for ease of use) that corresponds to a set of Talos tests that are run on each platform of interest. We have some simple scripts (e.g. ./chrome.sh or ./dirty.sh) to run sets of tests as we do in production. Each translates to a variety of buildbot sendchange commands appropriate for the tests to be run. Green runs mean good.

In order to test my Talos changes, I needed to set up a system whereby I could translate my changes into a hosted copy of talos, pageloader, etc. So here is what I did.

Steps:

  1. Replicate http://people.mozilla.org/~jmaher/taloszips/tip/

    It would be nice to provide a sane base template for this.

  2. Put the talos zips on a web server:

    cd mozilla/web/talos # change to a desired hosted directory
    wget -r -l0 --no-parent http://people.mozilla.org/~jmaher/taloszips/tip/
    mv people.mozilla.org/~jmaher/taloszips/tip . # the piece you need
    rm -rf people.mozilla.org # cleanup unneeded directories
    find tip -iname 'index.html*' -delete # remove unneeded index pages
    

    [Example: http://k0s.org/mozilla/talos/tip/]

  3. Clone a copy of Talos:

    cd ~/mozilla/src/
    virtualenv.py talos-staging
    cd talos-staging; mkdir src; cd src
    hg clone http://k0s.org/mozilla/hg/talos
    echo 'default-push = ssh://k0s.org/mozilla/hg/talos' >> talos/.hg/hgrc
    
  4. Development process:

Based on jmaher's update_talos.sh, I wrote a script to help me turn changes into changes in my hosted copy of talos.zip. Since I work largely in diffs hosted on bugzilla or my mercurial queue of Talos patches, I wanted a script that would apply a series of changes to a checkout of talos. In addition, I wanted to keep the flexibility of being able to edit these files on disk.

The script lives at http://k0s.org/mozilla/update_talos.py . I will endeavor to improve it as testing needs become more apparent. It sadly loses jmaher's update_talos.sh feature to create versioned zips. I thought about hosting a dedicated talos repository for testing (and still may, if that seems better down the line), but I usually want to test a specific change and then roll back to a known state.

The script does the following:

  1. Cleans up and reclones, optionally
  2. Applies a series of diffs
  3. Creates a talos.zip and moves it to the appropriate place on disk.
  4. Fetches a fresh copy of pageloader.xpi
  5. Syncs the files with the HTTP server
  6. Cleans up and reclones, optionally
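
As an illustration of step 3 only, here is a minimal sketch of zipping a checkout so that every entry sits under a top-level talos/ directory. It is not taken from update_talos.py; the make_zip helper, the skipped .hg directory, and the placeholder files are assumptions made for the example.

```python
import os
import tempfile
import zipfile


def make_zip(source_dir, zip_path, skip_dirs=(".hg",)):
    """Zip source_dir into zip_path, skipping version-control directories."""
    parent = os.path.dirname(os.path.abspath(source_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(source_dir):
            # Prune skipped directories in place so os.walk does not descend.
            dirs[:] = sorted(d for d in dirs if d not in skip_dirs)
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to the parent so entries start
                # with the checkout's directory name.
                archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


# Build a throwaway "checkout" so the example is self-contained.
workdir = tempfile.mkdtemp()
checkout = os.path.join(workdir, "talos")
os.makedirs(os.path.join(checkout, ".hg"))
for relative in ("run_tests.py", os.path.join(".hg", "hgrc")):
    with open(os.path.join(checkout, relative), "w") as handle:
        handle.write("# placeholder\n")

zip_path = make_zip(checkout, os.path.join(workdir, "talos.zip"))
with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()
print(names)  # ['talos/run_tests.py']
```

Writing the zip outside the checkout keeps the archive from trying to include itself.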

After the HTTP copy is updated, I can run (e.g.) xperf.sh to trigger that set of tests in the staging environment and watch the waterfall to assess the viability of the change.

It would be nice to have something more generic, but the path to good software is through iteration. Perhaps as more people develop their own scripts to test Talos in the staging environment, we will evolve to a more generic script to update talos, along with copies or templates of the needed URL/directory structure and the staging software.

November 21, 2011 10:42 AM

November 18, 2011

William Lachance

A planet to call our own

Just a quick note that a planet for Mozilla Tools & Automation (the so-called “a team”) is now up, thanks to Reed Loden. With the exception of Jeff Hammel, everyone there was already being syndicated on Planet Mozilla, but this should offer a more focused feed of our doings for those who can’t always keep up with the firehose. Have a look:

http://planet.mozilla.org/ateam

Who should care? Well, we maintain all the major testing frameworks like Mochitest, Reftest, and Talos as well as automated tooling for QA like Mozmill. Our latest work is focused on making sure that Firefox is as robust, responsive, and performant as possible on desktop and mobile. In short, if you’re writing or verifying code from mozilla-central, what we’re doing probably affects you. Please let us know what you think about our projects and whether there’s anything we can do to make your job easier: we’re listening.

Quick bonus note: It’s not immediately obvious (or at least it wasn’t to me), but Mozilla has some fairly finely tuned infrastructure for running planets. If your team or group wants one, it’s definitely better to plug into that instead of rolling your own. ;) Reed Loden is the maintainer and the source lives in subversion.

November 18, 2011 10:47 PM

Clint Talbert

Meetings, meetings, meetings

If you work someplace, you have meetings.  It’s impossible not to.  Because the Automation and Tools team works on many different projects simultaneously, it was natural for us to have one big meeting a week to discuss the status of these projects, raise concerns, make announcements etc.  This is also the one meeting I’d invite outside contributors to so that they can learn who everyone on the team is and what we’re all doing.

However, week after week as I asked for each project’s status and listened to it, I wondered why on earth would anyone want to come to this?  And why were we spending an hour each week boring ourselves to tears when we could be doing something useful like being silly on IRC? So, the A-team and I talked about it, and we decided to do an experiment with the meeting.  Here’s what we’ve been doing for November:

The entire thing takes no more than twenty minutes, and most weeks it takes less than ten. So far, I have to say I’m a fan of the new meeting.  I worried that I’d lose my ability to stay abreast of what is happening on our projects, but that hasn’t been the case.  In fact, if you compare the wiki pages from before with these new ones, you’ll see that our emcees do an amazing job pulling together the data and communicating the highlights.

The other benefit this gives us is that as we grow into a larger team, it’s harder for all of us to interact.  Our rotating emcee gives each person a chance to talk with everyone else on the team and learn something about everyone’s projects.

I don’t know if this would work well for other teams, but it has worked really well for us so far.  If you’d like to drop in, here’s the information about our meeting.  This week’s emcee is our illustrious maple-bacon-cake-baking, cowboy-boot-wearing intern, Tfair.


November 18, 2011 02:42 AM

November 16, 2011

Malini Das

Attention Mozilla Test Writers!

We've just introduced a change to Mozilla's Mochitest harness to improve test run times, as per Bug 367393. This involved the removal of unnecessary MochiKit usage. We found that we were including a minified version of MochiKit, packed.js, in all our tests and within the harness, but we would only use a small portion of this enormous suite. That added an extra load of about 5 minutes per debug test run, so we removed MochiKit from our harness and added replacement functionality to SimpleTest. Note that the SimpleTest.js file in MXR may not yet be updated, so pull the latest mozilla-central code to see latest changes! What this means for you: If you're writing a test that doesn't require MochiKit, please do not include packed.js in your test. This just adds extra load. If your test does require some part of MochiKit, please check if that functionality ...

November 16, 2011 09:38 AM

November 15, 2011

Andrew Halberstadt

An Easier way to Manage Mozconfigs

Mozconfigwrapper is a tool inspired by Doug Hellmann's magnificent virtualenvwrapper. In a nutshell, mozconfigwrapper hides all of your mozconfigs in a configurable directory (defaults to ~/.mozconfigs), and lets you easily switch, create, remove, edit and list them. Mozconfigwrapper is Unix only for now.

Mozconfigwrapper is brand new. I still need to add some better error checking and do testing on OSX. So if you have any problems installing or using it, please let me know or file an issue.

Installation

To install, first make sure you have pip. Then run the command

sudo pip install mozconfigwrapper

Next open up your ~/.bashrc file and add the line

source /usr/local/bin/mozconfigwrapper.sh

Note that it may have been installed to a different location on your system. You can use the command 'which mozconfigwrapper.sh' to find it.

Finally run the command

source ~/.bashrc

Mozconfigwrapper is now installed.

Usage

Mozconfigwrapper allows you to create, remove, switch, list and edit mozconfigs. To build with (activate) a mozconfig named foo, run:

buildwith foo

To create a mozconfig named foo, run:

mkmozconfig foo

To delete a mozconfig named foo, run:

rmmozconfig foo

To see the currently active mozconfig, run:

mozconfig

To list all mozconfigs, run:

mozconfig -l

To edit the currently active mozconfig, run (the $EDITOR variable must be set):

mozconfig -e

Configuration

By default mozconfigs are stored in the ~/.mozconfigs directory, but you can override this by setting the $BUILDWITH_HOME environment variable. For example, add:

export BUILDWITH_HOME=~/my/custom/mozconfig/path

to your ~/.bashrc file.

When you make a new mozconfig, it will be populated with some basic build commands and the name of the mozconfig will be appended to the end of the OBJDIR instruction. You can modify what gets populated by default by editing the ~/.mozconfigs/.template file. For example, if I wanted my default configuration to store object directories in a folder called objdirs and enable debugging and tests, I'd edit the ~/.mozconfigs/.template file to look like:

mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdirs/
ac_add_options --enable-application=browser
ac_add_options --enable-debug
ac_add_options --enable-tests

Now if I ran the command 'mkmozconfig foo', foo would be populated with the above and have the word 'foo' appended to the first line.
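
The substitution described above can be sketched in a few lines of Python. This illustrates the described behaviour only and is not mozconfigwrapper's actual code; render_mozconfig is a hypothetical name, and matching on the MOZ_OBJDIR= line is an assumption.

```python
TEMPLATE = """\
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdirs/
ac_add_options --enable-application=browser
ac_add_options --enable-debug
ac_add_options --enable-tests
"""


def render_mozconfig(template, name):
    """Append `name` to the MOZ_OBJDIR line of `template`."""
    rendered = []
    for line in template.splitlines():
        if "MOZ_OBJDIR=" in line:
            # The mozconfig name becomes the last path component.
            line = line.rstrip() + name
        rendered.append(line)
    return "\n".join(rendered) + "\n"


mozconfig = render_mozconfig(TEMPLATE, "foo")
print(mozconfig.splitlines()[0])
# mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdirs/foo
```

Each mozconfig therefore gets its own object directory, so switching between them does not clobber an existing build.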

November 15, 2011 08:11 PM

Jeff Hammel

Introducing MozBase

Over the years, Mozilla has developed a number of test harnesses for automated testing of Firefox and other applications. Most of the harness code is written in python due to its utility towards this type of development. As one would expect, the harnesses arose from necessity and grew organically. However, as the harnesses grew it became apparent that there were several generic tasks that the harnesses shared:

  • creating and manipulating a profile
  • installing addons into the profile
  • invoking (e.g.) Firefox in a desired manner
  • process management
  • ...a few other things

These pieces have largely been developed in a vacuum (in the early stages) or copy+pasted from other harnesses (in the later stages). This has led to duplicated functionality, harness software that is inconsistent and difficult to maintain (since fixing something in one place means it probably needs fixing in other places too), and a system that was fully understood by no one once it became sufficiently complex. The harness software could not be reused because it was tightly coupled to the implementation even when the underlying intent was generic.

Meet MozBase!

As software grows, it should be cultivated such that its effectiveness and its knowledge base are maximized. Code should be made reusable and the architecture evolved towards a representation of intent. This is the goal of the MozBase effort by the A-Team: https://wiki.mozilla.org/Auto-tools/Projects/MozBase

  • we want to make high quality components to build test harnesses
  • ... and other pieces of software
  • ... that might be useful on their own
  • we want to replace existing code with these pieces
  • ... but cultivate their knowledge base
  • we want to develop canonical and reusable python tools
  • ... and encourage the community to use them

Developing MozBase is one of the A-Team goals this quarter. While cultivating software is an ongoing effort, we're off to a good start. We already have several MozBase python packages:

Our immediate goals are to cultivate these into high-quality tools, taking lessons from the existing harnesses, and then to port the harnesses to these tools so that they can be maintained in a unified manner. Right now, we're working on Talos both because this is a good proving ground for these tools and because much of its code can be replaced with MozBase code easily (for some definition of "easy").

While MozBase is about software, it is also about having a sane and maintainable environment to cultivate software in. While modular packages are great, their utility is in how they may be used together (as well as with other code) instead of in the craft of an individual package. So we're tackling these issues too.

Python importing in Mozilla Central: currently (most) python in mozilla central is not packaged and we manually futz with pythonpath and sys.path in several inconsistent and hard to maintain ways. In order to move towards python packages in any reasonable fashion we need to make importing easy and unified as well as moving towards how the python world typically does importing. There is bug 661908 for creating a unified virtualenv in the $OBJDIR. Work is likely to start on this or a similar effort soon (either this quarter or Q1 2012).

Mirroring software to Mozilla Central: we have hampered ourselves -- rewritten software and avoided fixing bugs -- by not using third-party python packages for tools that live in mozilla-central. In addition, since many of the test harnesses already live in m-c, if we are going to move these to consume mozbase we will need a strategy to mirror it and other software to the tree. While nothing has been definitively decided, preliminary discussion has pointed towards having a script to fetch resources from a variety of locations and add them to mozilla-central or elsewhere. We're having a meeting this week to figure out what we really want to do and go from there.

Such is the MozBase effort. I am excited to start moving our code into a solid maintainable structure, and I hope you are too. If you are, please check out our github project or sign in to #ateam and tell us what you think. We'd love contributors!

November 15, 2011 10:59 AM

November 14, 2011

Jonathan Griffin

B2G and WebAPI testing in Emulators

Malini Das and I have been working on a new test framework called Marionette, in which tests of a Gecko-based product (B2G, Fennec, etc.) are driven remotely, à la Selenium.  Marionette has client and server components; the server side is embedded inside Gecko, and the client side runs on a (possibly remote) host PC.  The two components communicate using a JSON protocol over a TCP socket.  The Marionette JSON protocol is based loosely on the Selenium JSON Wire Protocol; it defines a set of commands that the Marionette server inside Gecko knows how to execute.
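
To make the idea of JSON commands over a stream socket concrete, here is a minimal sketch of encoding a command and splitting received bytes back into messages. The newline-delimited framing, the "type" field, and the executeScript command name are assumptions made for illustration; they are not Marionette's actual wire format.

```python
import json


def encode_command(command_type, **params):
    """Serialize one command as a single line of JSON (UTF-8 bytes)."""
    message = {"type": command_type}
    message.update(params)
    # json.dumps escapes newlines inside strings, so "\n" is a safe delimiter.
    return json.dumps(message, sort_keys=True).encode("utf-8") + b"\n"


def decode_messages(buffer):
    """Split received bytes into complete JSON messages.

    Returns (messages, remainder); remainder holds any partial trailing
    message that has not been fully received yet.
    """
    *lines, remainder = buffer.split(b"\n")
    messages = [json.loads(line.decode("utf-8")) for line in lines if line]
    return messages, remainder


wire = encode_command("executeScript", value="return sms_body;")
# Simulate a socket read that ends in the middle of the next message.
received = wire + b'{"type": "new'
messages, rest = decode_messages(received)
print(messages)  # [{'type': 'executeScript', 'value': 'return sms_body;'}]
print(rest)      # b'{"type": "new'
```

Because a TCP stream has no message boundaries of its own, some framing rule like this is needed on both sides; the remainder is kept and prepended to the next read.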

This differs from past approaches to remote automation in that we don’t need any extra software (i.e., a SUTAgent) running on the device, we don’t need special access to the device via something like adb (although we do use adb to manage emulators), nor do tests need to be particularly browser-centric.  These differences seem advantageous when thinking about testing B2G.

The first use case to which we might apply Marionette in B2G seems to be WebAPI testing in emulators.  There are some WebAPI features that we can’t test well in an automated manner using either desktop builds or real devices, such as WebSMS.  But we can write automated tests for these using emulators, since we can manipulate the emulator’s hardware state and emulators know how to “talk” to each other for the purposes of SMS and telephony.

Since Marionette tests are driven from the client side, they’re written in Python.  This is what a WebSMS test in Marionette might look like:

from marionette import Marionette

if __name__ == '__main__':
    # launch the emulators that will do the sending and receiving
    sender = Marionette(emulator=True)
    assert(sender.emulator.is_running)
    assert(sender.start_session())

    receiver = Marionette(emulator=True)
    assert(receiver.emulator.is_running)
    assert(receiver.start_session())

    # setup the SMS event listener on the receiver
    receiver.execute_script("""
        var sms_body = "";
        window.addEventListener("smsreceived",
                                 function(m) { sms_body = m.body });

    """)

    # send the SMS event on the sender
    message = "hello world!"
    sender.execute_script("navigator.sms.send(%d, '%s');" %
        (receiver.emulator.port, message))

    # verify the message was received by the receiver
    assert(receiver.execute_script("return sms_body;") == message)

The JavaScript portions of the test could be split into a separate file from the Python, for easier editing and syntax highlighting.  Here’s the adjusted Python file:

from marionette import Marionette

if __name__ == '__main__':
    # launch the emulators that will do the sending and receiving and
    # load the JS scripts for each
    sender = Marionette(emulator=True)
    assert(sender.emulator.is_running)
    assert(sender.start_session())
    assert(sender.load_script('test_sms.js'))

    receiver = Marionette(emulator=True)
    assert(receiver.emulator.is_running)
    assert(receiver.start_session())
    assert(receiver.load_script('test_sms.js'))

    # setup the SMS event listener on the receiver
    receiver.execute_script_function("setup_sms_listener")

    # send the SMS event on the sender
    message = "hello world!"
    target = receiver.emulator.port
    sender.execute_script_function("send_sms", [target, message])

    # verify the message was received by the receiver
    assert(receiver.execute_script_function("get_sms_body") == message)

And here’s the JavaScript file:

function send_sms(target, msg) {
    navigator.sms.send(target, msg);
}

var sms_body = "";

function setup_sms_listener() {
    window.addEventListener("smsreceived",
                            function(m) { sms_body = m.body });
}

function get_sms_body() {
    return sms_body;
}

Both of these options are just about usable in Marionette right now.  Note that the test is driven from the client side, and some of the test logic (like asserts) also resides there, in Python.  This makes synchronization between multiple emulators straightforward, and provides a natural fit for Python libraries that will be used to interact with the emulator’s battery and other hardware.

What if we wanted JavaScript-only WebAPI tests in emulators, without any Python?  Driving a multiple-emulator test from JavaScript running in Gecko introduces some complications, chief among them the necessity of sharing state between the tests, the emulators, and the Python testrunner, all from within the context of the JavaScript test.  We can imagine such a test might look like this:

var message = "hello world!";
var device_number = Marionette.get_device_number(Marionette.THIS_DEVICE);

if (device_number == 1) {
  // we're being run in the "sender"

  // wait for the test in the other emulator to be in a ready state
  Marionette.wait_for_state(Marionette.THAT_DEVICE, Marionette.STATE_READY);

  // send the SMS
  navigator.sms.send(Marionette.get_device_port(Marionette.THAT_DEVICE), message);
}
else {
  // we're being run in the "receiver"

  // notify Marionette that this test is asynchronous
  Marionette.test_pending();

  // setup the event listener
  window.addEventListener("smsreceived", function (m) {
    // perform the test assertion and notify Marionette
    // that the test is finished
    is(m.body, message, "Wrong message body received");
    Marionette.test_finished();
  });

  // notify Marionette we're in a ready state
  Marionette.set_state(Marionette.STATE_READY);
}

Conceptually, this is more similar to xpcshell tests, but implementing support for this kind of test in Marionette (or inside the existing xpcshell harness) would require substantial additional work. As it currently exists, Marionette is designed with a client-server architecture, in which information flows from the client (the Python part) to the server (inside Gecko) using TCP requests, and then back. Implementing the above JS-only test syntax would require us to implement the approximate reverse, in which requests could be initiated at will from within the JS part of the test, and this would require non-trivial changes to Marionette in several different areas, as well as requiring new code to handle the threading and synchronization that would be required.

Do you think the Python/JS hybrid tests will be sufficient for WebAPI testing in emulators?


November 14, 2011 09:12 PM

Jeff Hammel

jhammel now maintains mozregression

So the secret is out!

http://harthur.wordpress.com/2011/11/01/new-mozregression-owner/

I am going to be maintaining mozregression going forward. I released a 0.6 version to pypi today which hopefully fixes a few setup.py issues. You can find me at jhammel __at__ mozilla __dot__ com or as jhammel in #ateam.

http://groups.google.com/group/mozilla.tools/t/b1f12f5127761207

November 14, 2011 03:13 PM

Talos is now a python package

The A-Team is working on creating a set of high-quality python utilities that are consumable, general purpose, and interoperable in an effort called MozBase. A huge part of this quarter's effort is to improve Talos to consume MozBase software and to make it an extensible harness that may also be consumed.

As one of the first steps towards making Talos consume upstream MozBase packages, I have made Talos a python package. This allows Talos to depend on upstream python packages in an automated fashion, permit additional setup/install time steps to be automated, and install in such a manner that dotted paths against talos can be resolved by python import. That is, other packages can now usefully import talos without depending on a set directory structure.

Unfortunately, since the talos repository was arranged such that all the python scripts and other data lived in a fairly disorganized top-level directory, this involved making a talos subdirectory and moving all files (except the README) into that subdirectory and carefully ensuring that all data resources were properly installed alongside the python scripts.

Even more unfortunately, this change led to some confusion that could have been avoided ahead of time. Talos uses a tests.zip file that contains both the scripts and the data, and though I would have liked to do additional cleanup as part of making Talos a python package, I deliberately held off on changing anything that would invalidate this methodology. However, unbeknownst to me, there were other resources that depended on the talos directory structure, and these got broken with my change. I apologize for that, and will communicate these changes more widely next time. In the meantime, if you have any tools that depend on the talos directory structure, know that they will break next time you update. If you have questions about this, please contact me.

Although the fallout was regrettable, I think this is a necessary and forward-facing change in the light of MozBase, Mozharness, and general good python practices. We're now looking at deprecating the tests.zip methodology and moving towards a Mozharness script for running Talos for both desktop testers and production. More on that as things progress.

November 14, 2011 02:32 PM

November 11, 2011

William Lachance

Measuring what the user sees

I’ve been spending the last month or so at Mozilla prototyping a new project called Eideticker which aims to use video capture data and image/frame analysis for performance measurement of Firefox Mobile. It’s still in quite a rough state, but it’s now complete enough that I thought it would be worth spending a bit of time describing both its motivation and how it works.

First, a bit of an introduction. Up to now, our automated performance tools have used entirely synthetic benchmarks (how long til we get the onload event? how many ms since we last hit the main loop?) to gather performance information. As we’ve found out, there’s a lot you can measure with synthetic benchmarks. Tools like Talos have proven themselves by catching performance regressions on a very regular basis.

Still, there’s many things that synthetic benchmarks can’t easily or reliably measure. For example, it’s nice to know that a page has triggered an “onload” event (and the sooner it does that, the better), but what does the browser look like before then? If it’s a complicated or image intensive page, it might take 10 or 15 seconds to load. In this interval, user studies have clearly shown that an application displaying something sooner rather than later is always desirable if it’s not possible to display everything immediately (due to network traffic, CPU constraints, whatever). It’s this area of user-perceived performance that Eideticker aims to help with. Eideticker creates a system to capture live data of what the browser is displaying, then performs image/frame analysis on the result to see how we’re actually doing on these inherently subjective metrics. The above was just one example, others might include:

It turns out that it’s possible to put together a system that does this type of analysis using off-the-shelf components. We’re still very much in the early phase, but initial signs are promising. The initial test system has the following pieces:

  1. A Linux workstation equipped with a Decklink extreme 3D video capture card
  2. An Android phone with HDMI output (currently using the LG G2X)
  3. A version of talos modified to video capture the results of a test.
  4. A bit of python code to actually analyze the video capture data.

So far, I’ve got the system working end-to-end for two simple cases. The first is the “pageload” case. This lets you capture the results of loading any page within a talos pageset. Here’s a quick example of the movie we generate from a tsvg test:

Here’s another example, a color cycle test (actually the first test case I created, as a throwaway):

After the video is captured, the next step is to analyze it! As described above (and in further detail on the Eideticker wiki page), there’s lots of things we could measure but the easiest thing is probably just to count the number of unique frames and derive a frame rate for the capture based on that (the higher the better, obviously). Based on an initial prototype from Chris Jones, I’ve started work on a python library to do exactly this. Assuming you have an eideticker capture handy, you can run a tool called “analyze.py” on the command line, and it’ll give you its best guess of the # of unique frames:


(eideticker)wlach@eideticker:~/src/eideticker$ bin/analyze.py ./src/talos/talos/captures/capture-2011-11-11T11:23:51.627183.zip
Unique frames: 121/272

(There are currently some rough edges with this: we’re doing frame comparisons based on per-pixel changes, but the video capture data is slightly noisy so sometimes a pixel changes its value even when nothing has actually happened in the browser)
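The per-pixel comparison and the noise problem described above can be illustrated with a minimal sketch. This is not the actual analyze.py code: the function names, the flat-list frame representation, and the `pixel_tolerance` and `max_changed_pixels` knobs are all assumptions made for illustration.

```python
def frames_differ(a, b, pixel_tolerance=0, max_changed_pixels=0):
    """Return True if frame b differs from frame a beyond the noise limits.

    Frames are flat sequences of integer pixel values (hypothetical format).
    """
    changed = sum(1 for x, y in zip(a, b) if abs(x - y) > pixel_tolerance)
    return changed > max_changed_pixels


def count_unique_frames(frames, pixel_tolerance=0, max_changed_pixels=0):
    """Count frames that differ from the frame captured just before them."""
    unique = 0
    previous = None
    for frame in frames:
        if previous is None or frames_differ(
            previous, frame, pixel_tolerance, max_changed_pixels
        ):
            unique += 1
        previous = frame
    return unique


def unique_frame_rate(unique_frames, total_frames, capture_fps):
    """Derive an effective frame rate; capture_fps is an assumed input."""
    if total_frames <= 0 or capture_fps <= 0:
        raise ValueError("total_frames and capture_fps must be positive")
    duration_seconds = total_frames / capture_fps
    return unique_frames / duration_seconds


# Four tiny made-up "frames"; the second differs from the first by one
# slightly noisy pixel, the fourth is identical to the third.
frames = [[0, 0, 0, 0], [0, 1, 0, 0], [200, 200, 0, 0], [200, 200, 0, 0]]
print(count_unique_frames(frames))
print(count_unique_frames(frames, pixel_tolerance=2))
```

With a strict comparison the noisy second frame counts as new; allowing a small per-pixel tolerance ignores it. Where to set such thresholds for real capture data is an open question that this sketch does not answer.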

So that’s what I’ve got working so far. What’s next? Short term, we have some specific high-level goals about where we want to be with the system by the end of the quarter. The big unfinished pieces are getting an end-to-end test involving real user interaction (typing into the URL bar, etc.) going and turning this prototype system into something that’s easy for others to duplicate and is robust enough to be easily extended. Hopefully this will come together fairly quickly now that the basics are in place.

The longer term picture really depends on feedback from the community. Unlike many of the projects we work on in automation & tools, Eideticker is not meant to be something that’s run on every checkin. Rather, it’s intended to be a useful tool that can be run on an as-needed basis by developers and QA. We obviously have our own ideas on how something like this might be useful (and what a reasonable user interface might be), but I’ve found in cases like this it’s much better to go to the people who will actually be using this thing. So with that in mind, here’s a call for feedback. I have two very specific questions:

My goal is to make something that people will love, so please do let me know what you think. :) Nothing about this project is cast in stone and the last thing I want is to deliver a product that people don’t actually want to use.

Equally, while Eideticker is being written primarily with the goal of making Mobile Firefox better (and in the slightly-less short term, desktop Firefox and Boot to Gecko), much of it is broadly applicable to any user-facing mobile or desktop application. If you think some component of Eideticker might be interesting to your project and want to collaborate, feel free to get in touch.

November 11, 2011 08:57 PM

November 06, 2011

Andrew Halberstadt

Peptest: A new harness for testing responsiveness

While responsiveness is one of the main goals for Firefox this quarter, we still don't quite have the means to measure and test our progress towards this goal. The good news is that there are, and have been for some time, several efforts to fix this problem. Back in June, Ted wrote some event tracing instrumentation that gives us a reasonable idea of when the browser becomes unresponsive. This event tracer is already being used by some Talos tests which gives us a good general idea of whether or not Firefox is more or less responsive than it was previously. What it doesn't give us is a method for developers to write their own tests and determine whether a specific action or feature they are working on is causing unresponsiveness.

Peptest is designed for the missing use case. Namely, it can be used to automate user interactions in the browser and determine whether those actions are causing unresponsiveness. This may be useful for creating a suite of responsiveness regression tests, or for developers working on a responsiveness related feature or fix. The Peptest harness is designed to be lightweight (so as not to interfere with results), simple to run and easy to write tests for.

Tests are nothing but JavaScript files that will be executed in chrome scope. This means that Peptests are basically browser-chrome tests without any of the assertions (since assertions aren't needed in this context). However, since many Peptests will likely need to perform some kind of UI automation, the Peptest harness also exposes Mozmill's driver for convenience. I feel it's important to note that importing Mozmill is completely optional (though recommended if you need to do any automation). I also feel it's important to note that I did some work to isolate Mozmill's driver which means that the actual test harness bits of Mozmill have been completely stripped out. What's left over is surprisingly lightweight and lives in a handful of JS files.

Currently, it is possible to run tests locally on your machine, though I could potentially add features or change any aspect of the harness. I've also been working on a Mozharness script in bug 692091 so we can run tests automatically for tinderbox builds.

Finally, I'd like to say: I need feedback! The requirements of this harness have been very vague from the outset. I've been doing my best to interpret the requirements in a way that makes sense, but I'm still kind of flying blind so to speak. What I mean is, I'm not sure what developers want and/or need. I'm also not sure how useful what I've thrown together so far will be. So if you have any ideas or general comments, please ping ahal on irc, or e-mail ahalberstadt@mozilla.com and I'd be very grateful.

November 06, 2011 03:56 AM

November 03, 2011

Joel Maher

Work In Progress – making Talos easier to run

This quarter I became the proud owner of Talos (well at least for a quarter or two).  Over the last few years talos has not had much churn, but this year (2011 proper) we have seen addons, responsiveness, xperf, mozafterpaint and experiments with eideticker.  With all of this talos has grown and more people are working on writing patches for it.

So there are plenty of efforts underway to refactor talos to make it easier to expand.  This is fine and dandy, but for a developer wanting to help out or reproduce a bug it is next to impossible.  We have standalone talos, but that still requires some effort and hacking.

If you are interested in running talos, or if you have some pet peeve that you have encountered while running talos please file a bug, comment on existing bugs, or let us know in #ateam on irc.


November 03, 2011 01:49 PM

October 25, 2011

William Lachance

Faster, but not quite there yet…

So as others have been posting about, we’ve been making some headway on the GoFaster project. Unfortunately it seems like we’re still some distance away from reaching our magic number of a 2 hour turnaround for each revision pushed.

It’s a bit hard to see the exact number on the graph (someone should fix that), but we seem to be teetering around an average of 3 hours at this point. Looking at our build charts, it seems like the critical path has shifted in many cases from Windows to MacOS X. Is there something we can do to close the gap there? Or is there a more general fix which would lead to substantial savings? If you have any thoughts, or would like to help out, we’re scheduled to have a short meeting tomorrow.

Anyone is welcome to join, but note that we’re practical, results-oriented people. Crazy ideas are fun, but we’re most interested in proposals that have measurable data behind them and can be implemented in reasonable amounts of time. :)

October 25, 2011 10:13 PM

October 24, 2011

Joel Maher

Android Orange rate is 4.46%

Ok, the title might be misleading, but as of the last few days we are <5% orange for android unittests on mozilla-central.  The reason is that we have hidden J1 and R2 from the results.  We are tracking these in the weekly mobile automation meetings, and will continue to do so until those tests are live again.

For more specifics on our test failure distribution, please check out this spreadsheet and look at the different sheets.  We have spent the last few weeks trying to reproduce these failures, and the only concrete reproducible bug we could come up with was bug 691073.

We will continue to fix a few oranges that we see on the other tests as well as reduce the number of red/purple.


October 24, 2011 04:51 PM

October 20, 2011

Andrew Halberstadt

Isolating Mozmill's Driver

At the beginning of September, I was asked to write yet another automated test harness for testing user responsiveness. Among other things, the harness needed to be capable of automating a wide range of user interactions in Firefox (such as opening context menus, clicking buttons etc). Oh and by the way this needs to be finished as quickly as possible.

It turns out that machines aren't very good at interacting with user interfaces designed for humans. Properly...

October 20, 2011 09:00 PM

October 14, 2011

Clint Talbert

How I Started at Mozilla

In response to David Boswell’s post on getting involved at Mozilla, I thought I’d relate my own story.

I worked at a company called SimDesk that decided to reuse the Thunderbird and Sunbird code bases and make a great email application–this was long before the Lightning extension came into being.  Like any good closed-source company, we stole the code and worked on it in secret until we had a shining example of an “Outlook killer” (well, more or less).

Then we started feeling like we should contribute some of that code back to Mozilla.  We had a bunch of very awkward meetings with Dan Mosedale and Mike Shaver as they tried to teach us how to do open source.  They kept saying, “just submit a patch”, we kept wondering which lawyers we’d have to get involved to do that. :)

Eventually, Mike Hovis (an old friend and superior developer) and I started writing those patches.  It became clear that our changes wouldn’t apply cleanly to the newly refactored “Lightning” source base.  We decided that I’d make it part of my job (20% of my time, as I recall) to make patches for functionality we cared about and get it to the Mozilla calendar team.

I started attending the calendar team’s public meetings, and during one, when they asked if anyone wanted to lead a calendar QA team, I volunteered.  I had no idea how to actually do this, but I wanted to try organizing online to see if some of my offline organizing skills would translate.  My contribution of time grew.  As SimDesk directed me to work on Outlook extensions rather than an Outlook killer, I spent more and more of my time working with my calendar team, writing patches, mentoring, and aiding volunteers as they found their roles as leaders and developers in the calendar project.

And one day, when I could plainly see the writing on the wall, I asked Dan if Mozilla would actually consider a resume from me.  After his enthusiastic “yes”, I applied, and the rest is history.

Starting in the calendar project was incredible.  It was smaller (of course so was Mozilla in those days–even though it felt huge to me at the time).  It was easier to see your impact in such a small space, easier to identify volunteers, and easier to mentor people through the process and watch them become leaders.

Starting in that small area was also fortuitous because there was so much that needed to be done and opportunities were everywhere.

I still think that there are small areas across Mozilla where people can start and have a similar experience.  However, I think that Mozilla seems so monolithic these days that it is daunting to even try to find those niches where you can start out as a volunteer.  It is up to us on our teams to identify those areas where people can start, publicize them, and help people make that leap from “casually interested party” to “volunteer”.  In that vein, I tried articulating the roles that we’d like to see people step up to fill on my team.  If you’re interested, you know where to find me.


October 14, 2011 01:55 AM

October 13, 2011

Clint Talbert

Pandaboard Status

We’re looking at updating our Android support with these PandaBoard cards.  We already run with Tegras in our automation, but the Tegra 250s are discontinued, and we can’t update to newer versions of Android with them, so we are introducing PandaBoards.

Well, PandaBoards come with nothing, not even a power supply.  They can be powered off USB, but it’s pretty difficult to get adb working in that state (if you have steps, I’d love to hear them).  So, here are the steps to getting something usable working (See the official getting started too):

  1. Order power cord, specifically the adapter and the cord
  2. Order 8-16GB SDCard
  3. Ensure you have a mini USB cord
  4. Ensure you have a CAT 5 network cable.

Once you have this, you can build or download a build onto your SDCard.  Oh yeah, you’ll need an SDCard writer/reader.  Most computers have them by default these days, thankfully.

Then, plug it all in, and it should work.  I’ve noticed a few oddities:


October 13, 2011 12:16 AM

October 07, 2011

William Lachance

“Developers can’t do UI”

Despite making a dramatic shift from front-end development to back-end stuff since I started at Mozilla a few months ago, I’ve still had occasion to have to do a fair bit of user-facing code, even if an audience of other developers is a bit more limited than what I’ve been used to. Since my mission is to make the rest of Mozilla more productive, it’s worth putting a bit of time and intention into the user interface for my stuff. If I can reduce learning curves or streamline day-to-day workflows, that’s a win for everyone since they can spend that much more time rocking at their jobs (whether that be release engineering, platform work, or whatever). This brings up a point that I’ve had in the back of my mind for a while:

Despite conventional wisdom, developers can design half-decent user interfaces (if they try)!

I used to be certain that a project really needed graphic designers and/or usability experts to provide guidance on UI issues, but my experience over the last few years with iOS/web development has made me reconsider. Sure, pixel pushing and vector art is never going to be a programmer’s strong suit (and there are certain high-level techniques that take years of study to acquire/understand), but the basic principles behind good UI design are accessible to anyone. There are really only three core skills:

* An ability to put yourself in the shoes of the user. Who are you designing for, and what are they trying to accomplish? How can I streamline my UI to help them quickly solve the task at hand? This is one of the reasons why I find user stories so helpful.

* An understanding of common vocabulary for describing/designing applications and knowing what is “good”. Unfortunately I haven’t found anything like this for the web, but Apple’s human interface guidelines have some good general advice on this (just ignore the stuff specific to phones/tablet apps if that’s not what you’re doing).

* A willingness to iterate. The best ideas usually aren’t apparent immediately, and may only come out of a back and forth. It’s been my experience that the more constructive dialog there is between people actively involved in the project on user experience issues, the better the end result is likely to be.

For example, one of the things that release engineering has found most useful in the GoFaster Dashboard has been the build charts. Believe it or not, the idea for that view started out as this useless piece of junk (I can say that because I created it). It was only after a good half hour back and forth on irc between myself, jgriffin, and jmaher (all of us backend/tool developers) that we came up with the view that inspired so much good analysis on the project.

All this is not to say that usability experts and graphic designers don’t have special skills that are worthy of respect. Indeed, if you’re a designer and would like to get involved with our work, please join us, we’d love your help. My only point is that on a project where a design resource isn’t available, thinking explicitly about usability is still worthwhile. And even where you have a UX expert on staff, programmers can have useful feedback too. Good UI is everyone’s responsibility!

October 07, 2011 02:48 PM

October 05, 2011

Joel Maher

Android automation is becoming more stable ~7% failure rate

At Mozilla we have made unit testing on Android devices as important as desktop testing. Earlier today I was asked how we measure this and what our definition of success is. The obvious answer is no failures except for code that breaks a test, but in reality we allow for random failures and infrastructure failures. Our current goal is 5%.

So what are these acceptable failures and what does 5% really mean? Failures can happen when we have tests which fail randomly, usually poorly written tests or tests which were written a long time ago and hacked to work in today’s environment. This doesn’t mean any test that fails is a problem; it could be a previous test that changes a Firefox preference by accident. For Android testing, this currently means the browser failed to launch and load the test webpage properly or it crashed in the middle of the test. Other failures are the device losing connectivity, our host machine having hiccups, the network going down, sdcard failures, and many other problems. With our current state of testing this mostly falls into the category of losing connectivity to the device. Infrastructure problems are indicated as Red or Purple, and test-related problems as Orange.

I took a look at the last 10 runs on mozilla-central (where we build Firefox nightlies from) and built this little graph:

Firefox Android Failures

Here you can see that our tests are causing 6.67% of the failures and 12.33% of the time we can expect a failure on Android.

We have another branch called mozilla-inbound (we merge this into mozilla-central regularly) where most of the latest changes get checked in.  I did the same thing here:

mozilla-inbound Android Failures

Here you can see that our tests are causing 7.77% of the failures and 9.89% of the time we can expect a failure on Android.

This is only a small sample of the tests, but it should give you a good idea of where we are.


October 05, 2011 07:35 PM

October 03, 2011

Jonathan Griffin

OrangeFactor changes: Talos, bug correlations

Talos oranges now in OrangeFactor

When OrangeFactor (aka WOO) premiered, it did not include Talos oranges in its calculations.  There were many reasons for this, including the fact that Talos oranges were quite rare at the time.

As philor noted last week, that is no longer the case; there are now several frequent Talos oranges on the Android platform.  Because of this, I’ve just added Talos oranges into OrangeFactor.  The result is that the OrangeFactor has jumped from 4.01 (678 failures in 169 testruns) to 5.44 (921 failures in 169 testruns) on mozilla-central.
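The figures quoted above are consistent with reading OrangeFactor as the average number of orange failures per testrun. A small sketch under that assumption (the function name is made up for illustration and is not part of OrangeFactor itself):

```python
def orange_factor(failures, testruns):
    """Average number of orange failures per testrun (assumed definition)."""
    if testruns <= 0:
        raise ValueError("testruns must be positive")
    return failures / testruns


# Numbers taken from the paragraph above (mozilla-central, 169 testruns).
before = orange_factor(678, 169)   # without Talos oranges
after = orange_factor(921, 169)    # with Talos oranges included
talos_failures = 921 - 678         # failures contributed by Talos

print("before: %.4f  after: %.4f  Talos failures: %d"
      % (before, after, talos_failures))
```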

New bug correlations view

Mark Côté has recently implemented a new view in OrangeFactor, the bug correlations view.  This view shows bugs which occur together on the same commit.  We’ve already had a couple of suggestions for this page which we’re going to implement:  add bug summaries, and show the actual revision numbers for the correlations.  If anyone has other suggestions, please file a bug under Testing:Orange Factor.

Upcoming changes

Next up:  adding the ability to guess when an orange was introduced.  Stay tuned!


October 03, 2011 09:54 PM

September 29, 2011

Mark Côté

Automated Speed Tests, take two!

I recently implemented some improvements in the A-Team's Automated Speed Tests as per some requests I got back when I first announced them in July. Not everything's done, but I think this is a good point to advertise what's been changed thus far.

Firstly, I ditched the awful BIRT reports in favour of a custom web app that is faster, easier to use, and more flexible. You can restrict the date range (default is the last four weeks) and switch between tests and machines. The graph is also more responsive when turning on and off particular browsers (just click on the name in the legend). All the same data is there, but it's less cluttered and, well, less ugly!

By the way, BIRT appears to have a security hole in that it will insert the value of some GET parameters directly into the page without sanitizing them! So beware of that if you want to use BIRT for some reason.

Secondly, more tests! The first is MazeSolver, one that Firefox is particularly bad at. The second is test262, a JavaScript conformance test that has unfortunately made the name "Speed Tests" a bit of a lie.

A couple interesting observations I've made recently:

If you're wondering, the two Windows machines are running different hardware; Win7 1 is a 32-bit machine, and Win7 2 is a 64-bit machine, although I only switched it to use the 64-bit nightlies today. Email me if you want more particulars on the hardware.

Still more to come, including

And, as always, please let me know if there's more I can do to make the framework, tests, or data more useful.

September 29, 2011 09:31 PM

Joel Maher

notes on a python webserver

Last week I created a python webserver as a patch for make talos-remote.  This ended up being fraught with performance issues, so I have started looking into it.  I based it off of the profileserver.py that we have in mozilla-central, and while it worked I was finding my tp4 tests were timing out.

It turns out we were using a synchronous webserver, which is easy to fix with a ThreadingMixIn, just like the chromium perf.py script:

import BaseHTTPServer
from SocketServer import ThreadingMixIn
class MyThreadedWebServer(ThreadingMixIn, BaseHTTPServer.HTTPServer):
    pass

Now the test was finishing, but very very slowly (20+ minutes vs <3 minutes).  After doing a CTRL+C on the webserver, I saw a lot of requests hanging on log_message and gethostbyaddr() calls.  So I ended up overriding the address_string and log_message methods and things worked.

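import SimpleHTTPServer

class MozRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    # I found on my local network that calls to this were timing out
    # (the default implementation does a reverse DNS lookup via
    # gethostbyaddr), so return a fixed placeholder instead
    def address_string(self):
        return "a.b.c.d"

    # This produces a LOT of noise, so drop all log output
    def log_message(self, format, *args):
        pass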

Now tp4m runs as fast as using apache on my host machine.
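For anyone wanting to try the same two changes today, here is a rough Python 3 port combining the threaded server and the quiet handler. It is a sketch, not the talos patch: the class names and the make_server helper are invented for this example. Note that current Python 3 versions of address_string already return the client IP without a name lookup, so that override is kept only to mirror the snippet above.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler
from socketserver import ThreadingMixIn


class ThreadedWebServer(ThreadingMixIn, HTTPServer):
    """Handle each request in its own thread instead of serially."""
    # Let the process exit even if request threads are still running.
    daemon_threads = True


class QuietRequestHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory without DNS lookups or logs."""

    def address_string(self):
        # Return the raw client IP; never resolve it to a host name.
        return self.client_address[0]

    def log_message(self, format, *args):
        # Drop per-request logging entirely.
        pass


def make_server(host="127.0.0.1", port=0):
    """Build (but do not start) a threaded server; port 0 picks a free port."""
    return ThreadedWebServer((host, port), QuietRequestHandler)
```

To serve files, something like make_server(port=8000).serve_forever() would start it; the port number here is arbitrary.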


September 29, 2011 02:54 PM

September 21, 2011

Joel Maher

talos pageloader now supports timestamps using MozAfterPaint

Today we rolled out changes to talos such that tests that use the pageloader (chrome, nochrome, tp) will have the option to report the page load times after we receive a MozAfterPaint event instead of a Load event.

Currently this is only active on Mozilla-Central as we will run the numbers side by side to ensure we get a solid new baseline number.  In addition we upgraded the version of flash we are using and this seems to cause a small increase in the numbers as well.

We will run these side by side for a week and then we will turn off the non paint versions.  This will go branch by branch until we have no more side by side tests running.  If you look at the talos names, the original tests are marked as old_{testname} (e.g. old_tp, or old_chrome), and on the graph server the new tests are called {testname}_paint (e.g. tp_paint, tdhtml_paint, etc…)

 

 


September 21, 2011 06:39 PM

September 19, 2011

Joel Maher

Professional Development, Improv and your audience

I had the opportunity to attend some really exciting professional development sessions at the All Hands.  Personally I found these very interesting, but I heard a lot of grumbling that they do not add much value or are not of interest.

One reason I found these interesting is that in a previous life I had attended a few years of Improv acting classes and did a short stint of real onstage Improv acting.  In looping back to these professional development sessions, they reminded me of the core concepts we learned in Improv 101.  So if you felt that you missed out, sign up for an Improv class.  Maybe if there are professional development sessions at a future event they could just have an Improv acting class.

Related to the professional development courses, I found that most of these were sparsely attended.  Among those who did attend, the courses received great reviews/ratings.  To be fair, the technical tracks that I attended had about the same attendance as the professional development tracks.  Maybe we are not creating sessions that are of interest to our audience?  I know for the technical tracks we just propose something and it magically becomes a session.  I don’t recall getting any input into what sessions would be available to me.  Maybe in the future we can do a better job of getting input from the community (a.k.a. the audience)!


September 19, 2011 03:37 PM

September 15, 2011

Joel Maher

5 minute challenge: fresh samsung tablet to mochitest results

I did an experiment today: I took my new shiny Samsung Galaxy tablet (which I received late yesterday) and tried to run mochitests on it with a stopwatch running.  As a preamble, I had an existing objdir.  Here are a few of the things I had to do:

The challenge is out, can you beat my 5 minutes?


September 15, 2011 02:08 PM

September 06, 2011

Jonathan Griffin

GoFaster: deeper data analysis

For the GoFaster project, releng and the A-team have been working on various tasks which we hope will result in getting the total commit to all-tests-done time down to 2 hours for the main branches (try excluded).   This total turnaround time was 6-8 hours a couple of months ago when we began this project.

We’ve recently made some improvements that seriously reduce the total machine time required to run all tests for a given commit.  These include hiding the mochitest results table, removing packed.js from mochitest, and streamlining individual slow tests (see bug 674738, bug 676412, and bug 670229).  These together have reduced the total machine time for tests from about 40 hours to around 25 hours per commit, a big win.

However, the total turnaround times are still much slower than our goal:

We already knew that PGO builds are slow, and jhford is working on turning on-demand builds into non-PGO builds, and making PGO builds every four hours (bug 658313).  However, we needed a way to dig deeper into the data to see what our other pain points are.

Will Lachance made some awesome build charts which help us visualize what’s going on in these buildbot jobs.  Clicking any commit will show a chart that displays all the relevant buildbot jobs in relative clock time; this makes it easier to see where the bottlenecks are.

Build times

Display the build chart for just about any commit (e58e98a89827 for instance), and you’ll see the problem right away:  just about every commit includes builds that far exceed 2 hours.  These aren’t always opt builds, and they sometimes occur even on our ‘fast’ OS:  linux.  Check out 5d9989c3bff6, which has a linux64 opt build that takes 214 min, compared to the linux32 opt build that takes 61 minutes.  198c7de0699d has an OSX 10.5 debug build that takes 171 minutes, but the 10.6 debug build takes only 82 minutes.  Clearly, we can’t hit our 2-hour goal with builds that take 2+ hours.  What’s going on?

It’s necessary to spend a little time digging through build logs to find out.  It turns out there are multiple factors.

  1. We already know that PGO builds are slow, particularly on Windows.  Once bug 658313 lands, we expect the overall situation to improve dramatically.
  2. On some builds, the ‘update’ step includes a full ‘hg clone’ of mozilla-central, while others use ‘hg pull -u’.  Below is a graph of update times; the average time for an update that includes ‘hg clone’ is 12.9 min, for those that use ‘hg pull’ the average is 0.6 min.  Each full clone is costing us an average of 12 minutes.

  3. On some build slaves, we do a full build (with no obj dir from a previous build), on others we do an incremental build.   Below is a graph showing incremental vs full compile times for opt and debug builds.   On average, full compiles are taking 17 minutes longer than incremental ones.

  4. We have a mix of slow and fast slaves.  This can easily be seen in the below graph of linux compile times.  On linux and linux64 builds, full compiles with moz2-linux(64)-* slaves are slow (those > 75 min), while those made with linux(64)-ix-* slaves are fast (those < 75 min).  32-bit mac builds show a similar split, with those on moz2-darwin9* slaves slow, and those on bm-xserve* slaves fast.  Hardware doesn’t appear to create a significant difference for windows and 64-bit mac builds.

  5. On macosx64 machines, the ‘alive test’ step takes an average of 6 min (vs 1 min on other os’s).
  6. The ‘checking clobber times’ step often takes just a couple of seconds, however when this step actually results in some clobbering being done, it can take up to 21 minutes (average: 6 min).

When all these factors coincide, we can get builds (which include compile, update, and other steps) that exceed 4 hours.  This suggests doing away with on-demand PGO builds may not in itself get us to our 2-hour goal.

From this data, two of the more obvious ways to improve our build times might be:

  1. Investigate retiring slow linux and 32-bit mac build slaves.
  2. Investigate ways to reduce clobbering.  Clobbering itself takes time (see bullet #6 above), but also indirectly costs time through increased update and compile times.  Currently, about 51% of our builds are operating on clobbered slaves, requiring full hg clones and full compiles.  If this number could be reduced, we might see a significant reduction in our average turnaround times.
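As a back-of-the-envelope illustration of the clobbering point, the averages quoted above can be combined into a rough per-build cost. This sketch assumes that a clobbered build always pays for both a full clone and a full compile, that the averages simply add up, and it ignores the time spent in the clobber step itself; none of this is a measured result.

```python
# Averages quoted in the post, in minutes.
CLONE_UPDATE_MIN = 12.9        # 'update' step with a full hg clone
PULL_UPDATE_MIN = 0.6          # 'update' step with hg pull -u
FULL_COMPILE_EXTRA_MIN = 17.0  # full compile vs incremental compile
CLOBBERED_FRACTION = 0.51      # share of builds on clobbered slaves


def clobber_penalty_minutes():
    """Extra minutes a clobbered build pays over an incremental one."""
    return (CLONE_UPDATE_MIN - PULL_UPDATE_MIN) + FULL_COMPILE_EXTRA_MIN


def expected_penalty_per_build(clobbered_fraction):
    """Average extra minutes per build at a given clobber rate."""
    if not 0.0 <= clobbered_fraction <= 1.0:
        raise ValueError("clobbered_fraction must be between 0 and 1")
    return clobbered_fraction * clobber_penalty_minutes()


def minutes_saved(old_fraction, new_fraction):
    """Average minutes saved per build if the clobber rate drops."""
    return (expected_penalty_per_build(old_fraction)
            - expected_penalty_per_build(new_fraction))


print("penalty per clobbered build: %.1f min" % clobber_penalty_minutes())
print("average penalty per build:   %.1f min"
      % expected_penalty_per_build(CLOBBERED_FRACTION))
```

Calling minutes_saved with the current rate and a hypothetical target rate gives a feel for how much a reduction in clobbering might be worth, under the same assumptions.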

Test times

According to Will’s build charts, the E2E time for tests is often within our 30-minute target range.  The exception is mochitest-other on debug builds, which often takes from 60 to 90 minutes.  We could improve this situation somewhat by splitting mochitest-browser-chrome (the longest-running chunk of mochitest-other) into its own test job.

Additionally, wait times for test slaves running android and win 7 tests are sometimes non-trivial; see e.g. the details for commit 97216ae0fc04.  We should try to understand why this happens; the graph of test wait times doesn’t show a clear trend, other than highlighting the fact that wait times for windows and android are usually worse than the other os’s.

 

 


September 06, 2011 11:50 PM

September 01, 2011

William Lachance

Changes!

A bit quiet here for the last few months. What’s been happening?

1. I got married and had a wonderful honeymoon in France.
2. I started a fantastic new job with Mozilla’s tools & automation group. Currently working on bringing down the build/test times for Firefox (part of a project called GoFaster), which has been really interesting.
3. I moved into a fantastic new apartment in an old victorian building near Vendôme metro.

In short, life has been treating me really well! More updates soon.

September 01, 2011 04:07 PM

August 12, 2011

Joel Maher

all mochitests are updated to remove packed.js

Thanks to :mdas, we will have faster mochitests because packed.js isn’t being loaded for every single test case.  Read a bit more about it on mdas’s blog or stop by #ateam on irc if you have any questions.

/me posting because :mdas doesn’t have a feed to planet.mozilla.org.


August 12, 2011 07:59 PM

August 09, 2011

Jonathan Griffin

GoFaster: hiding the mochitest results table

I’m sure anyone who has ever submitted a patch to a Mozilla tree is familiar with this drill:

  1. hg push
  2. check TBPL, wait
  3. check TBPL again, wait some more
  4. go to Starbucks for a caramel macchiato, install a new OS on your laptop, review all the patches in your queue, plan next winter’s tropical vacation, check TBPL, and….
  5. wait some more

Recently, the total end-to-end time from submit to all-tests-done has been around 6-8 hours, depending on load.  That’s too long, and RelEng and the A-Team think we can do something about it.  For the past couple of months we’ve been working on the GoFaster project; our goal is to get that turnaround time down to 2 hours.  We have a list of tasks, and recently one of these landed with some significant improvements.

Cameron McCormack wrote a patch which hides the mochitest results table when MOZ_HIDE_RESULTS_TABLE=1 (see: bug 479352).  The initial version of this patch caused frequent hangs during mochitest-1/5.  We didn’t discover the reason behind this, but  I updated the patch to hide the result table in a different way, and the hang vanished.  I pushed this change to mozilla-central, and Cameron made a table displaying before and after durations for all the test runs.

The results?  That one change saves about 13 hours of machine time per checkin.  The entire suite of unit tests which prior to that change took about 40 machine-hours to run now takes 27.  Wow!

What kind of improvement in the end-to-end time does that translate into?  I’m not sure.  Sam Liu, an A-Team intern, has been working on a dashboard to help track this, but it’s currently using canned (stale) data.  RelEng is working on exposing live data to be consumed by the dashboard, and when that’s ready we should be able to easily track the effect of changes like this in the overall time.

Meanwhile, check out the project’s wiki page or attend one of our meetings.  If you have thoughts on ways we can improve our total turnaround time, we’d love to hear from you.


August 09, 2011 05:21 PM

July 12, 2011

Mark Côté

Automated Speed Tests!

It's hard to find a discussion of the speed of modern browsers that doesn't mention Microsoft's Test Drive speed demos. It's a common occurrence to find hundreds of fish swimming around a graphics developer's monitor. Continuing our mission to make developers' lives easier, the Mozilla A-Team has put together a framework to automatically run a few of these tests and put the results online. They're a bit ugly and slow, but some day I'll get around to cleaning them up.

We have set up a small framework that executes 5 speed tests twice daily against all the major browsers: IE, Safari, Chrome, Opera, and Firefox. Since we're particularly concerned with the latter, we run both the latest released version of Firefox and the latest Nightly.

For most tests, we sample the FPS every 5 seconds, since there is often a ramping-up time as objects are created and such. We then plot the median FPS for the test for each browser to make comparison easy. The results of any particular test run are also available through links in the graph and table for those curious about how the browser performs at various points during the test run.
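As a rough illustration of why the median is a handy summary here, this is a minimal Python sketch, not the framework's actual code. The function name and the sample numbers are made up for the example. The slow samples from the ramp-up period barely move the median, whereas they would drag a mean down.

```python
from statistics import median


def median_fps(samples):
    """Summarize one test run by the median of its FPS samples."""
    if not samples:
        raise ValueError("no FPS samples to summarize")
    return median(samples)


# Made-up samples, one every 5 seconds; the first few are low because
# objects are still being created during ramp-up.
runs = {
    "browser_a": [12, 41, 58, 60, 60, 59, 60],
    "browser_b": [8, 20, 31, 33, 32, 33, 32],
}

for browser, samples in sorted(runs.items()):
    print(browser, median_fps(samples))
```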

One test, Psychedelic Browsing, uses a different metric, namely, the RPMs of a spinning patterned wheel. This is sampled only once, at the end of the test.

Disclaimer: I won't get into technical issues here, but suffice it to say that automating one browser is a little tricky; automating 5 browsers from different vendors is very tricky. One way we've reduced the number of variables is by limiting network access to prevent automatic, potentially performance-affecting browser and OS upgrades. This requires some periodic manual maintenance to update everything. We also reboot the machine after every test run. But mistakes happen and bugs crop up, so there are gaps in some of the graphs where one or more browsers were unable to start up or load the test suite, and some swings in results where the browser or machine was perturbed by some force (unfortunately this happened recently, which is why the last Firefox 4 results are all over the place after running stably for weeks). But the main way we deal with all this is by running the suite twice every day, even though all browsers (except Nightly) change much less frequently. So, as in any scientific endeavour, ignore the outlying points and focus on the trends.

Here are a few things I've noticed, some obvious, some less so.

- Different browsers excel at different tests, though, not surprisingly, IE does well on all of them. Firefox is good or excellent at 4 of the 5 tests, but it's much worse than Chrome and IE at Santa's Workshop (see roc's post about this).

- Some browsers max out (60 FPS) on some tests. These tests would have to be modified for a true comparison. However some tests report FPSs above 60, which means they must be using some sort of "virtual" frame rate, since no monitor can display that much. More investigation needs to be done to see if this is a valid statistic for comparison.

- Nightly generally outperforms Firefox 4 except where they have maxed out. This is especially noticeable in SpeedReading, where 4 was only at about 32-33 FPS, but Nightly and Firefox 5 are at 60 FPS.

- Some browser/test combinations are quite stable, with almost all results being the same, and some vary up and down. For instance, most browsers have stable results for Mr Potato Gun, but IE varies by 20-30 FPS.

- OS and browser updates definitely affect performance. Recently the network was left fully connected, and Firefox, Opera, and potentially Windows downloaded updates during test runs. This dropped performance noticeably.

As usual, feel free to make suggestions. Specifically, if there are particularly useful tests out there, I am more than willing to add them to the suite.

July 12, 2011 05:22 PM

May 31, 2011

Andrew Halberstadt

Why I'm Returning to Mozilla for a Third Internship

Before I started interning at Mozilla back in May 2010, I really didn't know what to expect. How does a non-profit company with an open source product operate? After working at giant corporations like IBM and McAfee I couldn't fathom what this experience would be like.

Although I've always been somewhat of a Firefox fanboy, I also had my worries. You may remember that at that time, Chrome had been out for a...

May 31, 2011 08:15 PM

May 20, 2011

Joel Maher

converting xpcshell from listing directories to a manifest

Last year we ventured down the path of adding test manifests for xpcshell in bug 616999.  Finding a manifest format is not easy because there are plenty of objections to the format, syntax and relevance to the project at hand.  At the end of the day, we depend too much on our build system to filter tests and after that we have hardcoded data in tests or harnesses to run or ignore based on certain criteria.  So for xpcshell unittests, we have added a manifest so we can start to keep track of all these tests and not depend on iterating directories and sorting or reverse sorting head and tail files.

The first step is to get a manifest format for all existing tests.  This was landed today in bug 616999 and is currently on mozilla-central.  This requires that all test files in directories be in the manifest file and that the manifest file includes all files in the directory (verified at make time).  Basically if you do a build, it will error out if you forget to add a manifest or test file to the manifest.  Pretty straightforward.

The manifest we have chosen is the ini format from mozmill.  We found that there is no silver bullet for a perfect test manifest, which is why we chose an existing format that met the needs of xpcshell.  It is easy to hand edit (as opposed to json) and easy to parse from Python and JavaScript.  Compared to reftests, which have a custom manifest format, we just needed a list of test files and, more specifically, a way to associate a head and tail script file (not easy with reftest manifests).  The format might not work for everything, but it gives us a second format to work with depending on the problem we are solving.


May 20, 2011 07:35 PM

May 06, 2011

Alice Nodelman

Addon Performance Testing: Updates and Future Work

The addons performance testing system has been up and live for a few weeks now.  With so many more eyes than mine on the system I’ve seen a bunch of bug filings - which is awesome.  With each bug fixed the Talos system works better for both addons and for the general Firefox performance testing.

Here’s what’s already fixed and rolled out:

Here’s what’s fixed but waiting on deployment (which will probably happen early next week):

There are still more bug fixes in the works. Next on my list is Bug 648225 - Performance of platform-dependent add-ons is not tested, which will improve the testing system’s simulation of the real world.

In terms of future plans, we are going forward with a second quarter goal of completing an on-demand addon testing service.  Basically, this would allow an addon author to request that their addon be tested at any time, instead of waiting for our weekly tests of the 100 most downloaded addons.  This will gain us greater coverage of more addons along with a means to double or triple check results.  Did your addon perform poorly?  Retest!  Are you suspicious of the results?  Retest!  Did the addon fail to download or install?  Retest!  If you want to follow along, the bugs that will lead to this system are:

Once the on-demand system is in place, we’ll be working to introduce a greater variety of tests.  The ts (test startup) was an easy test to begin with, but it can be of limited meaning for a lot of addons.  I’d like us to cover far more of the available Talos tests, concentrating on the tp (test pageload) tests.  Tp is interesting because it uses a set of collected, local web pages (100 culled from the Alexa top 300 list of worldwide top used sites) that are then cycled through ten times.  With a given addon installed and active (for some meaning of ‘active’ which will be different for different addons) this will give a greater idea of how real world page load time is impacted.  As a side benefit of Tp, Talos will monitor the memory footprint and CPU usage during this test.  By comparing an addon run to a no-addon run we’ll be able to observe memory and CPU usage differences.
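The addon versus no-addon comparison can be pictured with a small Python sketch. This is not how Talos computes its numbers. The data layout (page name mapped to load times in milliseconds, one per cycle), the function names and the choice of per-page median are all assumptions made for the example.

```python
from statistics import mean, median


def page_summary(load_times_by_page):
    """Collapse each page's per-cycle load times into one median value."""
    return {page: median(times) for page, times in load_times_by_page.items()}


def addon_overhead_percent(baseline, with_addon):
    """Percent change in average page load time caused by the addon."""
    base = page_summary(baseline)
    addon = page_summary(with_addon)
    shared = sorted(base.keys() & addon.keys())
    if not shared:
        raise ValueError("the two runs have no pages in common")
    base_avg = mean(base[page] for page in shared)
    addon_avg = mean(addon[page] for page in shared)
    return (addon_avg - base_avg) / base_avg * 100.0


# Made-up timings for two pages over three cycles.
baseline_run = {"page1": [100, 110, 90], "page2": [200, 210, 190]}
addon_run = {"page1": [110, 120, 100], "page2": [220, 230, 210]}
print(round(addon_overhead_percent(baseline_run, addon_run), 1))
```

The same shape of comparison would apply to memory footprint or CPU usage samples gathered during the two runs.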

I believe that we are also going to need to put effort into providing some testing hooks/prefs for Talos to use, for example Bug 459965 - Add standardized support for first-run pages to install.rdf.  Talos doesn’t react well to first run pages and it would be great to have the means to disable them with a single pref, instead of a customized pref per-addon - especially since the standard use case is that users do not see the first-run page on a regular basis, as they only see it post-installation and never again.  I believe that there are probably some other settings like this that would standardize creating a Talos testing environment, and thus make addon-testing more applicable to the type of bulk testing that Talos does.

Testing addons has been a whole new area of Talos testing, and it has its own unique set of challenges.  I’ve spent most of my time at Mozilla concentrating on automated browser performance testing and the addons world is still quite new to me.  Each addon affects the browser in its own way; while I’ve grown accustomed to standardizing tests across browser versions, platforms and machines this is definitely a new horizon.  The Talos tests are just one way to look at performance impact, but not the final word.  They may never be appropriate for a given type of addon, but I most definitely want to work with addon developers to try and get the best coverage that we can.

May 06, 2011 11:36 PM

Mark Côté

From the Vaults of the A-Team: flot plugins

tl;dr From work on the War on Orange, I spun off three flot plugins: flot-axislabels, flot-hiddengraphs, and flot-tickrotor. Use 'em how you will & feel free to gimme feedback.

I'm going to step away from telling you about the A-Team's projects for a few minutes and talk about our by-products. Yup, software by-products. Think virtual horse-glue or electronic fertilizer. Well actually those comparisons aren't very good. Anyway, what I mean is that sometimes I overcome my natural laziness and package up bits of the work that I do that I think would be of particular benefit to others.

The War on Orange is all about statistics, and statistics are boooooring. Pictures, however, make the whole stats thing somewhat bearable, so the War on Orange makes extensive use of graphs. We decided on flot, a popular program for doing a whole buncha different plot types. Of course flot can't do everything, so it has support for plugins so you can add functionality without too much effort.

The first thing that we noticed was absent from flot was axis labels. We have graphs that show daily orange counts alongside the "orange factor"—oranges per test run—and we were using a second axis since the two stats are orders of magnitude apart. Not sure why axis labels weren't available out of the box, since they seem to be a pretty fundamental part of a graph, but luckily someone had already started on a plugin. Alas, it only provided labels for primary axes. But the plugin structure was all there, along with some interesting hacks. I've had a github account for a little while but didn't really use it, so it was very exciting to get down to some hardcore forking action. A while later, I had secondary-axis labels going in my flot-axislabels fork:



In the spirit of github, I submitted a pull request so the original author could incorporate my work into the original plugin, but I guess he had lost interest and never accepted it. So as far as I know, my improved flot-axislabels plugin is still the most fully featured one out there, although it does have a bit of weird behaviour sometimes as a side effect of the hacks needed to fit the labels in. Btw I accept pull requests...

The War on Orange has been going on for some time, and all the information we were trying to cram into our graphs started making them feel cramped. Experience modifying flot-axislabels gave me the courage to create my own plugin to solve this problem: flot-hiddengraphs. This plugin allows you to hide and show the various graphs on one plot via the legend:





I made some interesting discoveries while working on that plugin, including the fact that mouseenter and mouseleave don't seem to always fire. Maybe if I weren't so lazy I'd fix it to use mousemove. Oh and it's still a bit ugly and I dunno why I have this fascination with links in square brackets. Did I mention I accept pull requests?

Well now that this plugin thing was old hat, I had to get creative to continue to ensure my life as a software developer was still painful. We've got a graph that can have quite a lot of columns (whether this is the right kind of display for this data is another matter). While conducting a different war, the Battle to Understand BIRT (aka BIRT Y U NO LIKE ME?), I stumbled on a nice control that allows you to rotate tick labels, so you can fit more, and longer, labels in. I started with the same hack as flot-axislabels to allocate some space for the labels... but how much space? Well I've never worked in graphics (to which my UIs will attest), and ten years is ample time for formal education to abandon me, so I couldn't even think of the word trigonometry at first. But Google knows all, and a short while later I was all Math.sin() and Math.cos() and Math.PI. Felt good to know that a few more university dollars paid off. So now the universe has flot-tickrotor (making up for a string of boring project names).
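For the curious, the trigonometry in question is roughly the bounding box of a rotated rectangle. Here it is as a minimal Python sketch rather than the plugin's JavaScript. The function name and the sample label size are invented for the example, and this is not lifted from flot-tickrotor itself.

```python
import math


def rotated_label_size(width, height, angle_degrees):
    """Return the (width, height) of the box enclosing a rotated label."""
    theta = math.radians(angle_degrees)
    sin_t = abs(math.sin(theta))
    cos_t = abs(math.cos(theta))
    # Each side of the label contributes to both dimensions once rotated.
    box_width = width * cos_t + height * sin_t
    box_height = width * sin_t + height * cos_t
    return box_width, box_height


# A made-up 80x12 pixel label at a few angles; the box height is the
# extra room that has to be reserved under the axis.
for angle in (0, 45, 90):
    box_w, box_h = rotated_label_size(80, 12, angle)
    print(angle, round(box_w, 1), round(box_h, 1))
```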



Now's the part in which I tell you what sucks about it: I had some problems with the allocation of space (seemingly the hardest part of these plugins) for long labels slanted down and to the right, and 'cause I'm lazy and think down-left-slanting labels look better anyway, I left it out. Automatically scaling fonts would be teh aw3som3 as well. Pull requests: I accept 'em.

So yeah, please use them and tell people about them and complain to me when they don't work and then send me pull requests when I tell you I'm too lazy to fix them.

If you're still reading, maybe you care about some of the interesting bits (read: sublime (in the Schopenhauerean sense) hacks) in flot plugin development. The main one I had to contend with was, as I've mentioned above, the allocation of space for new or bigger elements. As the code comments in the original flot-axislabels state,


This is kind of a hack. There are no hooks in Flot between
the creation and measuring of the ticks (setTicks, measureTickLabels
in setupGrid() ) and the drawing of the ticks and plot box
(insertAxisLabels in setupGrid() ).

Therefore, we use a trick where we run the draw routine twice:
the first time to get the tick measurements, so that we can change
them, and then have it draw it again.


What that comes down to, I figured out after a while, is that there's no way to tell flot "hey make the graph itself smaller 'cause I got stuff to put in the margin", since you don't know how big the graph is going to be until the plot is drawn. So a plugin that wants those margins to be bigger needs to do some calculations based on the standard size, set the label-dimension options appropriately, then trigger the draw event a second time. Now you've got spacier margins and can insert your elements. Note that this actually seems to be invisible to the user; I guess the first and second draw events happen before anything is actually displayed.

Unfortunately, this approach can screw over other plugins that also want to put stuff in the margins. flot-axislabels is actually okay because it just allocates a bit more space and doesn't replace anything. But flot-tickrotor replaces the labels entirely... oh wait, maybe I can fix it if I display the labels in the first draw and then just calculate how much bigger the labels will have to be and ugh man this stuff is tedious. Anyway for now, if you use both, make sure tickrotor is loaded first. Because you're sick of hearing it, I'll make up a French version: j'accepte les demandes de tire. Oh hey there's a French github... demandes de "pull"? Pah, how unoriginal.

May 06, 2011 08:32 PM

April 23, 2011

Bob Clary

100,000,000

100,000,015 Firefox Downloads

April 23, 2011 07:13 AM

April 21, 2011

Joel Maher

Some notes about adding new tests to talos

Over the last year and a half I have been editing the talos harness for various bug fixes, but just recently I have needed to dive in and add new tests and pagesets to talos for Firefox and Fennec.  Here are some of the things I didn’t realize or had inconveniently forgotten about what goes on behind the scenes.

This is my experience from getting ts_paint, tpaint, and tp4m (mobile only) tests added to Talos over the last couple months.


April 21, 2011 04:25 PM

April 16, 2011

Clint Talbert

Mozmill 1.5.3 Released

I’ve been very heads down on a number of projects recently, as you can no doubt tell from the number of updates I’ve made on this blog.  The last post was Mozmill 1.5.2’s release announcement, and now I’m here to announce Mozmill 1.5.3.  This is a small bug-fix-only release to fix a few small issues that QA brought to our attention.  Work is continuing on 2.0 which will have a far better architecture.  I’ll write some posts about the changes we’re making.

But, today, Mozmill 1.5.3 is released.  It’s already on PyPI, and it was just submitted for review on AMO.

We fixed:

Thanks very much to the tireless folk who’ve fixed all these bugs and verified them.


April 16, 2011 12:58 AM

April 08, 2011

Andrew Halberstadt

How to bulk install Firefox Addons

Firefox is known for its extensibility. In fact, over 2.4 *billion* add-ons have been downloaded to date, meaning there are a lot of people using a lot of add-ons. While having 20+ add-ons can undoubtedly personalize your browsing experience, it can also be a pain in the arse to manually install them every time you set up a new Firefox profile. As a developer working on Firefox related automation tools, this is twice as true since I create a...

April 08, 2011 07:00 AM

March 21, 2011

Mark Côté

Autolog

The A-Team is embarking on a new initiative, and we need your help! After all, the A-Team's customer is Mozilla at large, and we like to keep our customers happy.

The project this time is Autolog. It's intended to be a generic tbpl-like results viewer for all the various projects that have test suites but aren't part of mozilla-central and the related branches.

We've already got a good start on the back-end: we're using an Elastic Search database to store results, and we're serving them up, and accepting new results, via a RESTful interface.

But now's the hard part: the UI! As we mentioned, the original concept was a tbpl-like interface, something clean and easy to scan. But tbpl is tied tightly to tinderbox, so it isn't easy to extend. We've spent some time staring at the code, and it looks like some extensive modifications would be in order, and they wouldn't necessarily make future extensions any easier.

Then we were told about an alternative to tbpl, asuth's ArbPL. This was designed to be extensible and has some neat features: it tells you what area has been changed (e.g. "Accessibility: Tests", "Layout: C++ Code"), it displays some details of failed tests automatically so you don't have to click on the failures first, and, for Mozmill, it has some very pretty stack traces and other information (example from the Thunderbird tree).

To be brutally honest, though, the A-Team is biased towards tbpl's look, both because it's the current standard and because it's cleaner. ArbPL has some very nice features, though, so it might be worth the effort to implement a tbpl-like interface, as time consuming as that might be.

But in the end, it's YOU, the customer, that is important to us. So let's hear it: do you like tbpl? What do you think of ArbPL? Is one much better than the other? Are there aspects you like of one and wish were in the other?

For the linkophobic, here are contemporaneous screenshots from tbpl and ArbPL (click to embiggen).



tbpl




ArbPL

March 21, 2011 10:13 PM

March 15, 2011

Bob Clary

test262.ecmascript.org shootout

Dave Fugate of Microsoft announced an update to test262.ecmascript.org, the test suite for ECMAScript 5. I thought I would check it out. Internet Explorer 9 was tested on 32bit Windows Vista, while the others were all tested on Mac OS X 10.5.

Browser               Tests To Run   Total Tests Ran   Pass    Fail   Failed To Load
Internet Explorer 9   10456          10456             10439   17     0
Firefox 4.0           10456          10456             10155   301    0
Chrome 10             10456          10456             9959    497    0
Safari 5.0.4          10456          10456             9156    1300   72
Opera 11              10456          10456             6905    3551   66

The test suite looks very cool. Kudos to everyone involved in creating test262!

March 15, 2011 09:04 PM

March 07, 2011

Jonathan Griffin

WebGL Conformance Tests now in GrafxBot

GrafxBot has been updated to include the mochitest version of the WebGL Conformance Tests.  When you run GrafxBot tests using the new version, it will run the usual reftests first, followed by the new WebGL tests.  Both sets of test results are posted to the database at the end of test.

The WebGL tests may be skipped for a couple of reasons:  they’ll be skipped if you have a Mac running a version of OS X older than 10.6, or if WebGL isn’t enabled in Firefox on your machine, which could happen if you don’t have supported hardware or drivers.  GrafxBot doesn’t try to force-enable either WebGL or accelerated layers.

Partially to support these tests, GrafxBot now reports some additional details about Firefox’s acceleration status, similar to what you see in about:support:

webgl results 132 pass / 7 fail
webgl renderer Google Inc. — ANGLE — OpenGL ES 2.0 (ANGLE 0.0.0.541)
acceleration mode 2/2 Direct3D 10
d2d enabled true
directwrite enabled true: 6.1.7600.20830, font cache n/a

I encourage users to download and run the new version; I’d like to get some feedback before I update it on AMO, to make sure users aren’t running into problems with the new tests.

The new version of GrafxBot can be downloaded here.


March 07, 2011 10:36 PM