The Mozilla Blog: Firefox Team Looks Within to Lead Into the Future

For Firefox products and services to meet the needs of people’s increasingly complex online lives, we need the right organizational structure. One that allows us to respond quickly as we continue to excel at delivering existing products and develop new ones into the future.

Today, I announced a series of changes to the Firefox Product Development organization that will allow us to do just that, including the promotion of long-time Mozillian Selena Deckelmann to Vice President, Firefox Desktop.

“Working on Firefox is a dream come true,” said Selena Deckelmann, Vice President, Firefox Desktop. “I collaborate with an inspiring and incredibly talented team, on a product whose mission drives me to do my best work. We are all here to make the internet work for the people it serves.”

Selena Deckelmann, VP Firefox Desktop

During her eight years with Mozilla, Selena has been instrumental in helping the Firefox team address over a decade of technical debt, beginning with transitioning all of our build infrastructure over from Buildbot. As Director of Security and then Senior Director, Firefox Runtime, Selena led her team to some of our biggest successes, ranging from big infrastructure projects like Quantum Flow and Project Fission to key features like Enhanced Tracking Protection and new services like Firefox Monitor. In her new role, Selena will be responsible for growth of the Firefox Desktop product and search business.

Rounding out the rest of the Firefox Product Development leadership team are:

Joe Hildebrand, who moves from Vice President, Firefox Engineering into the role of Vice President, Firefox Web Technology. He will lead the team charged with defining and shipping our vision for the web platform.

James Keller, who currently serves as Senior Director, Firefox User Experience, will help us better navigate the difficult trade-off between empowering teams and maintaining a consistent user journey. This work is critically important because, since the Firefox Quantum launch in November 2017, we have been focused on putting the user back at the center of our products and services. That begins with a coherent, engaging and easy-to-navigate experience in the product.

I’m extraordinarily proud to have such a strong team within the Firefox organization that we could look internally to identify this new leadership team.

These Mozillians and I will eventually be joined by two additional team members: one who will head up our Firefox Mobile team and another who will lead the team that has been driving our paid subscription work. Searches for both roles will be posted.

Alongside Firefox Chief Technology Officer Eric Rescorla and Vice President, Product Marketing Lindsey Shepard, I look forward to working with this team to meet Mozilla’s mission and serve internet users as we build a better web.

You can download Firefox here.

Daniel Stenberg: Coming to FOSDEM 2020

I’m going to FOSDEM again in 2020; this will be the 11th consecutive year I’m traveling to this awesome conference in Brussels, Belgium.

At this, my 11th FOSDEM visit, I will also deliver my 11th FOSDEM talk: “HTTP/3 for everyone”. It will happen at 16:00 on Saturday the 1st of February 2020, in Janson, the largest room on the campus. (My third talk in the main track.)

For those who have seen me talk about HTTP/3 before, this talk will certainly have overlaps, but I’m always refreshing and improving the slides, and I update them as the process moves on, things change and I get feedback. I already spoke about HTTP/3 at FOSDEM 2019 in the Mozilla devroom (at which time there was a looong line of people who tried, but couldn’t, get a seat in the room) – but I think you’ll find that there are enough changes and improvements in this talk to keep you entertained this year as well!

If you come to FOSDEM, don’t hesitate to come say hi and grab a curl sticker or two – I intend to bring and distribute plenty – and talk curl, HTTP and Internet transfers with me!

You will most likely find me at my talk, in the cafeteria area or at the wolfSSL stall. (DM me on twitter to pin me down! @bagder)

Patrick Cloke: Cleanly removing a Django app (with models)

While pruning features from our product, it was necessary to fully remove some Django apps that had models in them. If the code is just removed, then the tables (and some other references) will be left in the database.

After doing this a few times for work I came up …

Patrick Cloke: Using MySQL’s LOAD DATA with Django

While attempting to improve the performance of bulk inserting data into a MySQL database, my coworker came across the LOAD DATA SQL statement. It allows you to read data from a text file (in a comma separated variable-like format) and quickly insert it into a table. There are two variations of it …

Mozilla VR Blog: Hello WebXR

We are happy to share a brand new WebXR experience we have been working on called Hello WebXR!

Here is a preview video of how it looks:

We wanted to create a demo to celebrate the release of the WebXR v1.0 API!

The demo is designed as a playground where you can try different experiences and interactions in VR, and as a way to introduce newcomers to the VR world and its special language in a smooth, easy and nice way.

How to run it

You just need to open the Hello WebXR page in a WebXR-capable browser (or a WebVR-capable one, thanks to the WebXR polyfill), like Firefox Reality or Oculus Browser on standalone devices such as the Oculus Quest, or with Chrome on desktop (>79). For an updated list of supported browsers, please visit the ImmersiveWeb.dev support table.

Features

The demo starts in the main hall, where you can find:

  • Floating spheres containing 360º mono and stereo panoramas
  • A pair of sticks that you can grab to play the xylophone
  • A painting exhibition where paintings can be zoomed and inspected at will
  • A wall where you can use a graffiti spray can to paint whatever you want
  • A Twitter feed panel where you can read tweets with the hashtag #hellowebxr
  • Three doors that will teleport you to other locations:
    • A dark room to experience positional audio (can you find where the sounds come from?)
    • A room displaying a classical sculpture captured using photogrammetry
    • The top of a building in a skyscraper area (are you scared of heights?)

Goals

Our main goal for this demo was to build a good-looking, well-performing experience where you could try different interactions and explore multiple use cases for WebXR. We used the Quest as our target device to demonstrate that WebXR is a perfectly viable platform not only for powerful desktops and headsets but also for more humble devices like the Quest or Go, where resources are scarce.

Also, by building real-world examples we learn how web technologies, tools, and processes can be optimized and improved, helping us to focus on implementing actual, useful solutions that can bring more developers and content to WebXR.

Tech

The demo was built using web technologies, with the three.js engine and our ECSY framework in some parts. We also used the latest standards, such as glTF with Draco compression for models and Basis for textures. The models were created using Blender, and baked lighting is used throughout the demo.

We also used third-party content: the photogrammetry sculpture (from this fantastic scan by Geoffrey Marchal on Sketchfab), public domain sounds from freesound.org, and classic paintings taken from the public online galleries of the museums where they are exhibited.

Conclusions

There are many things we are happy with:

  • The overall aesthetic and “gameplay” fit perfectly with the initial concepts.
  • The way we handle the different interactions in the same room, based on proximity or state, made everything easier to scale.
  • The demo was created initially using only three.js, but we successfully integrated some functionality using ECSY.

And other things that we could improve:

  • We released fewer experiences than we initially planned.
  • Overall the tooling is still a bit rough and we need to keep on improving it:
    • When something goes wrong it is hard to debug remotely on the device. This is even worse if the problem comes from WebGL. ECSY tools will help here in the future.
    • State of the art technologies like Basis or glTF still lack good tools.
  • Many components could be designed to be more reusable.

What’s next?

  • One of our main goals for this project is also to have a sandbox that we could use to prototype new experiences and interactions, so you can expect this demo to grow over time.
  • At the same time, we would like to release a template project with an empty room and a set of default VR components, so you can build your own experiments using it as a boilerplate.
  • Improve the input support by using the great WebXR gamepads module and the WebXR Input profiles.
  • We plan to write a more technical postmortem article explaining the implementation details and content creation.
  • ECSY was released after the project started so we only used it on some parts of the demo. We would like to port other parts in order to make them reusable in other projects easily.
  • Above all, we will keep investing in new tools to improve the workflow for content creators and developers.

Of course, the source code is available for everyone. Please give Hello WebXR a try and share your feedback or issues with us on the GitHub repository.

The Firefox Frontier: Data detox: Five ways to reset your relationship with your phone

There’s a good chance you’re reading this on a phone, which is not surprising considering that most of us spend up to four hours a day on our phones. And … Read more

Rubén Martín: Modernizing communities – The Mozilla way

For a long time I’ve wanted to write in depth about my work empowering communities. I want to start with this article, sharing some high-level learnings from working on community strategy.

Hi, I’m Rubén Martín and I work as a Community Strategist for Mozilla, the non-profit behind Firefox Browser.

I’ve been working with and participating in open/libre source communities since 2004 – the first decade as a volunteer before making open source my full-time career – joining Mozilla as staff five years ago, where as part of my Community Strategist role I’ve been focused on:

  • Identifying and designing opportunities for generating organizational impact through the participation of volunteer communities.
  • Designing open processes for collaboration that provide a nice, empowering and rich experience for contributors.

During these many years I have witnessed incredible change in how communities engage, grow and thrive in the open source ecosystem and beyond, and ‘community’ has become a term more broadly tied to product marketing and its success in many organizations. At Mozilla, our community strategy, while remaining dedicated to the success of projects and people, has been fundamentally modernized and optimized to unlock the true potential of a Mission-Driven Organization by:

  • Prioritizing data, or what I refer to as ‘humanizing the data-driven approach’ 
  • Implementing our Open By Design strategy. 
  • Investing in our contributor experience holistically.

Today Mozilla’s communities are a powerhouse of innovation, unlocking much more impact across different areas of the organization – like the Common Voice 2019 Campaign, where we collected 406 hours of public domain voices for unlocking the speech-to-text market, or the Firefox Preview Bug Hunter Campaign, with more than 500 issues filed and 8,000 installs in just two weeks, which were fundamental to launching this browser to the market sooner.

Follow along through this post to learn how we got there.

The roots, how did we get here?

“IMG_2255” by Janellie is licensed under CC BY-NC-ND 2.0

Mozilla has grown from a small community of volunteers and a few employees, to an organization with 1200+ employees and tens of thousands of volunteers around the world. You can imagine that Mozilla and its needs in 2004 were completely different to the current ones.

Communities were created and organized organically for a long time here at Mozilla. Anyone with the time and energy to mobilize a local community created one and tried to provide value to the organization, usually through product localization or helping with user support.

I like a quote from my colleague Rosana Ardila, who long ago said that contributing opportunities at Mozilla were a “free buffet” where anyone could jump into anything, with no particular order of importance, and where many of the dishes had long been empty. We needed to “refill” the buffet.

This is how a lot of libre and open source communities usually operate: open by default, with a ton of entry points for contribution. Unfortunately, some or most of those entry points are not fully optimized for contributions and therefore don’t offer a great contribution experience.

Impact and humanizing the data-driven approach

“Interview with David LePôle for Databit.me” by hellocatfood is licensed under CC BY-NC-SA 2.0

Focusing on impact while humanizing the data-driven approach were two fundamental shifts that happened around 4-5 years ago and completely changed our approach to communities.

When you have a project with a community around it, there are usually two fundamental problems to solve:

  1. Provide value to the organization’s current goals.
  2. Provide value to the volunteers contributing *

If you tip the balance too much toward the first one, you risk turning your contributors into “free labor”, but if you tip it too much in the other direction, your contributors’ efforts are highly likely to become irrelevant to the organization.

* The second point is the key factor to humanize the approach, and something people forget when using data to make decisions: It’s not just about numbers, it’s also people, human beings!

How do you even start to balance both?

RESEARCH!

Any decision you take should be informed by data. People in charge of community strategy or management sometimes have “good hunches” or “assumptions”, but acting on those is a risky business you need to avoid unless you have data to support them.

Do internal research to understand your organization: where it is heading, what the most important things are, and what the immediate goals for this year are. Engage in conversations with key decision makers to understand why these goals are important.

Do internal research to also understand your communities and contributors: who they are, why they are contributing (motivation), where, and how – both quantitatively (stats from tools) and qualitatively (surveys, conversations).

This will provide you with an enormous amount of information to figure out where impact can be boosted and to understand how your communities and contributors are operating. Are they aligned? Are they happy? If not, why?

Also do external research to understand how other, similar organizations are solving the same problems; get out of your internal bubble and be open to learning from others.

A few years ago we did all of this at Mozilla from the Open Innovation team, and it really informed our strategy moving forward. We keep doing internal and external research regularly in all of our projects to inform any important decisions.

Open by default vs open by design

<figcaption>“Sometimes Open Needs a Push” by cogdogblog is licensed under CC0 1.0</figcaption>

I initially mentioned that being open by default can lead to a poor contributor experience, which is something we learned from this research. If you think your approach will benefit from being open, please do so with intention: do so by design.

Pointing people to donate their free time to suboptimal contributor experiences will do more harm than good. And if something is not optimized or doesn’t need external contributions, you shouldn’t point people there, and you should clarify the expectations with everyone upfront.

Working with different teams and stakeholders inside the organization is key to designing and optimizing impactful opportunities. This is something we have done over the past years at Mozilla in the form of Activate Campaigns: a regular-cadence set of opportunities designed by the community team in collaboration with different internal projects, optimized for boosting their immediate goals and for being an engaging and fun experience for our Mission-Driven contributors.

The contributor experience

In every organization there is always going to be a tension between immediate impact and long term sustainability, especially when we are talking about communities and contributors.

Some organizations will have more room than others to operate in the long term, and I’m privileged to work in an organization that understands the value of long term sustainability.

If you optimize only for immediate value, you risk your communities falling apart in the medium term, but if you optimize only for the long term, you risk the immediate success of the organization.

Find the sweet spot between the two – maybe that’s 70-30% immediate versus long term, or 80-20%; it really depends on the resources you have and where your organization is right now.

The way we approached it was to always have relevant and impactful (measurable) opportunities for people to jump into (through campaigns) and, at the same time, work on the seven big themes we found we needed to fix as part of our internal research.

I suspect these themes are also relevant to other organizations. I won’t go into full detail in this article, but I’d like to list them here:

  • Group identities: Recognize and support groups both at regional and functional level.
  • Metrics: Increase your understanding of the impact and health of your communities.
  • Diversity and inclusion: How do you create processes, standards and workflows to be more inclusive and diverse?
  • Volunteer leadership: Shared principles and agreements on volunteer responsibility roles to have healthier and more impactful communities.
  • Recognition: Create a rewarding contributor experience for everyone.
  • Resource distribution: Standards and systems to make resource distribution fair and consistent across the project.
  • Contributor journey and opportunity matching: Connect more people to high impact opportunities and make it easy to join.

Obviously, this is something you will need a strong community team to move forward, and I was lucky to work on this with excellent colleagues on the Mozilla Community Development Team: Emma, Konstantina, Lucy, Christos, George, Kiki, Mrinalini and a ton of Mozilla volunteers across the Reps program and Reps Council.

You can watch a short video of the project we called “Mission-Driven Mozillians” and how we applied all of this:

What’s next?

I hope this article has helped you understand how we have been modernizing our community approach at Mozilla, and I also hope it can inspire others in their work. I’ve personally been following this approach in all the projects where I’ve been helping with community strategy, from Mission-Driven Mozillians to Mozilla Reps, Mozilla Support and Common Voice.

I truly believe that having a strong community strategy is key for any organization where volunteers play a key role – not only for providing value to the organization or project, but also for bringing that value back to the people who decided to donate their precious free time because they believe in what you are doing.

There is no way for your strategy to succeed in the long term if volunteers don’t feel like – and actually ARE – part of the team, working together with you and your team and influencing the direction of the project.

Which part of my work are you most interested in, so I can write about it in more detail next?

Feel free to reach out to me via email (rmartin at mozilla dot com) or Twitter if you have questions or feedback. I also really want to hear from others solving similar problems.

Thanks!

The Mozilla Blog: ICANN Directors: Take a Close Look at the Dot Org Sale


As outlined in two previous posts, we believe that the sale of the nonprofit Public Interest Registry (PIR) to Ethos Capital demands close and careful scrutiny. ICANN — the body that granted the dot org license to PIR and which must approve the sale — needs to engage in this kind of scrutiny.

When ICANN’s board meets in Los Angeles over the next few days, we urge directors to pay particular attention to the question of how the new PIR would steward and be accountable to the dot org ecosystem. We also encourage them to seriously consider the analysis and arguments being made by those who are proposing alternatives to the sale, including the members of the Cooperative Corporation of .ORG Registrants.

As we’ve said before, there are high stakes behind this sale: Public interest groups around the world rely on the dot org registrar to ensure free expression protections and affordable digital real estate. Should this reliance fail under future ownership, a key part of the public interest internet infrastructure would be diminished — and so would the important offline work it fuels.

Late last year, we asked ISOC, PIR and Ethos to answer a series of questions about how the dot org ecosystem would be protected if the sale went through. They responded and we appreciate their engagement, but key questions remain unanswered.

In particular, the responses from Ethos and ISOC proposed a PIR stewardship council made up of representatives from the dot org community. However, no details about the structure, role or powers of this council have been shared publicly. Similarly, Ethos has promised to change PIR’s corporate structure to reinforce its public benefit orientation, but provided few details.

Ambiguous promises are not nearly enough given the stakes. A crystal-clear stewardship charter — and a chance to discuss and debate its contents — are needed before ICANN and the dot org community can even begin to consider whether the sale is a good idea.

One can imagine a charter that provides the council with broad scope, meaningful independence, and practical authority to ensure PIR continues to serve the public benefit. One that guarantees Ethos and PIR will keep their promises regarding price increases, and steer any additional revenue from higher prices back into the dot org ecosystem. One that enshrines quality service and strong rights safeguards for all dot orgs. And one that helps ensure these protections are durable, accounting for the possibility of a future resale.

At the ICANN board meeting tomorrow, directors should discuss and agree upon a set of criteria that would need to be satisfied before approving the sale. First and foremost, this list should include a stewardship charter of this nature, a B corp registration with a publicly posted charter, and a public process of feedback related to both. These things should be in place before ICANN considers approving the sale.

ICANN directors should also discuss whether alternatives to the current sale should be considered, including an open call for bidders. Internet stalwarts like Wikimedia, experts like Marietje Schaake and dozens of important non-profits have proposed other options, including the creation of a co-op of dot orgs. In a Washington Post op-ed, former ICANN chair Esther Dyson argues that such a co-op would “[keep] dot-org safe, secure and free of any motivation to profit off its users’ data or to upsell them pricy add-ons.”

Throughout this process, Mozilla will continue to ask tough questions, as we have on December 3 and December 19. And we’ll continue to push ICANN to hold the sale up against a high bar.

Mozilla Addons Blog: Extensions in Firefox 72

After the holiday break we are back with a slightly belated update on extensions in Firefox 72. Firefox releases are changing to a four week cycle, so you may notice these posts getting a bit shorter. Nevertheless, I am excited about the changes that have made it into Firefox 72.

Welcome to the (network) party

Firefox determines whether a network request is considered third party and will now expose this information to the webRequest listeners, as well as the proxy onRequest listener, as a new thirdParty property. Content blockers can use this information as an additional factor to determine whether a request needs to be blocked.

Doubling down on security

On the road to Manifest v3, we also recently announced the possibility to test our new content security policy for content scripts. The linked blog post will fill you in on all the information you need to determine if this change will affect you.

More click metadata for browser- and pageActions

If your add-on has a browserAction or pageAction button, you can now provide additional ways for users to interact with them. We’ve added metadata information to the onClicked listener, specifically the keyboard modifier that was active and a way to differentiate between a left click or a middle click. When making use of these features in your add-on, keep in mind that not all users are accustomed to using keyboard modifiers or different mouse buttons when clicking on icons. You may need to guide your users through the new feature, or consider it a power-user feature.

Changing storage.local using the developer tools

In Firefox 70 we reported that the storage inspector would be able to show keys from browser.storage.local. Initially the data was read-only, but since Firefox 72 we also have limited write support. We hope this will allow you to better debug your add-ons.

Miscellaneous

  • The captivePortal API now provides access to the canonicalURL property. This URL is requested to detect the captive portal state and defaults to http://detectportal.firefox.com/success.txt
  • The browserSettings API now supports the onChange listener, allowing you to react accordingly if browser features have changed.
  • Extension files with the .mjs extension, commonly used with ES6 modules, will now correctly load. You may come across this when using script tags, for example.

A shout out goes to contributors Mélanie Chauvel, Trishul Goel, Myeongjun Go, Graham McKnight and Tom Schuster for fixing bugs in this version of Firefox. Also we’ve received a patch from James Jahns from the MSU Capstone project. I would also like to thank the numerous staff members from different corners of Mozilla who have helped to make extensions in Firefox 72 a success. Kudos to all of you!

The Firefox Frontier: Firefox Extension Spotlight: Privacy Badger

People can’t be expected to understand all of the technically complex ways their online behavior is tracked by hidden entities. As you casually surf the web, there are countless techniques … Read more

Mozilla Open Policy & Advocacy Blog: What could an “Open” ID system look like?: Recommendations and Guardrails for National Biometric ID Projects

Digital ID systems are increasingly the battlefield where the fight for privacy, security, competition, and social inclusion is playing out. In our ever more connected world, some form of identity is almost always mediating our interactions online and offline – from the corporate giants that dominate our online lives using services like Apple ID and Facebook and Google’s login systems, to government IDs which are increasingly required to vote, access welfare benefits, get loans, pay taxes, use transportation or access medical care.

Part of the push to adopt digital ID comes from the international development community, which argues that this is necessary in order to expand access to legal ID. The UN Sustainable Development Goals (SDGs) call for “providing legal identity for all, including birth registration” by 2030. Possessing legal identity is increasingly a precondition to accessing basic services and entitlements from both state and private providers. For the most marginalised communities, using digital ID systems to access these essential services and entitlements is often one of their first interactions with digital technologies. Without these commonly recognized forms of official identification, individuals are at risk of exclusion and denial of services. However, the conflation of digital identity with (or as an extension of) “legal identity”, especially by the international development community, has led to an often uncritical embrace of digital ID projects.

In this white paper, we survey the landscape around government digital ID projects and biometric systems in particular. We recommend several policy prescriptions and guardrails for these systems, drawing heavily from our experiences in India and Kenya.

In designing, implementing, and operating digital ID systems, governments must make a series of technical and policy choices. It is these choices that largely determine if an ID system will be empowering or exploitative and exclusionary. While several organizations have published principles around digital identity, too often they don’t act as a meaningful constraint on the relentless push to expand digital identity around the world. In this paper, we propose that openness provides a useful framework to guide and critique these choices and to ensure that identity systems put people first. Specifically, we examine and make recommendations around five elements of openness: multiplicity of choices, decentralization, accountability, inclusion, and participation.

  • Openness as in multiplicity of choices: There should be a multiplicity of choices with which to identify aspects of one’s identity, rather than the imposition of a single and rigid ID system across purposes. The consequences of insisting on a single ID can be dire. As the experiences in India and Peru demonstrate, not having a particular ID or failure of authentication via that ID can lead to denial of essential services or welfare for the most vulnerable.
  • Openness as in decentralisation: Centralisation of sensitive biometric data presents a single point of failure for malicious attacks. Centralisation of authentication records can also amplify the surveillance capability of those entities that have visibility into the records. Digital IDs should, therefore, be designed to prevent their use as a tool to amplify government and private surveillance. When national IDs are mandatory for accessing a range of services, the resulting authentication record can be a powerful tool to profile and track individuals.
  • Openness as in accountability: Legal and technical accountability mechanisms must bind national ID projects. Data protection laws should be in force and with a strong regulator in place before the rollout of any national biometric ID project. National ID systems should also be technically auditable by independent actors to ensure trust and security.
  • Openness as in inclusion: Governments must place equal emphasis on ensuring individuals are not denied essential services simply because they lack that particular ID or because the system didn’t work, as well as ensuring individuals have the ability to opt-out of certain uses of their ID. This is particularly vital for those marginalised in society who might feel the most at risk of profiling and will value the ability to restrict the sharing of information across contexts.
  • Openness as in participation: Governments must conduct wide-ranging consultation on the technical, legal, and policy choices involved in the ID systems right from the design stage of the project. Consultation with external experts and affected communities will allow for critical debate over which models are appropriate if any. This should include transparency in vendor procurement, given the sensitivity of personal data involved.

Read the white paper here: Mozilla Digital ID White Paper

Hacks.Mozilla.Org: The Mozilla Developer Roadshow: Asia Tour Retrospective and 2020 Plans

It’s a wrap!

November 2019 was a busy month for the Mozilla Developer Roadshow, with stops in five Asian cities: Tokyo, Seoul, Taipei, Singapore, and Bangkok. Today, we’re releasing a playlist of the talks presented in Asia.

We are extremely pleased to include subtitles for all these talks in languages spoken in the countries on this tour: Japanese, Korean, Chinese, Thai, as well as English. One talk, Hui Jing Chen’s “Making CSS from Good to Great: The Power of Subgrid”, was delivered in Singlish (a Singaporean creole) at the event in Singapore!

In addition, because our audiences included non-native English speakers, presenters took care to include local language vocabulary in their talks, wherever applicable, and to speak slowly and clearly. We hope to continue to provide multilingual support for our video content in the future, to increase access for all developers worldwide.

Mozillians Ali Spivak, Hui Jing Chen, Kathy Giori, and Philip Lamb presented at all five stops on the tour.

Additional speakers Karl Dubost, Brian Birtles, and Daisuke Akatsuka joined us for the sessions in Tokyo and Seoul.

Dev Roadshow Asia talk videos (all stops):

Ali Spivak: Introduction: What’s New at Mozilla

Hui Jing Chen: Making CSS from Good to Great: The Power of Subgrid

Philip Lamb: Developing for Mixed Reality on the Open Web Platform
Kathy Giori: Mozilla WebThings: Manage Your Own Private Smart Home Using FOSS and Web Standards

Tokyo and Seoul:

Karl Dubost and Daisuke Akatsuka: Best Viewed With… Let’s Talk about WebCompatibility
Brian Birtles: 10 things I hate about Web Animation (and how to fix them)

The Dev Roadshow took Center Stage (literally!) at Start Up Festival, one of the largest entrepreneurship events in Taiwan. Mozilla Taiwan leaders Stan Leong, VP for Emerging Markets, and Max Liu, Head of Engineering for Firefox Lite, joined us to share their perspectives on why Mozilla and Firefox matter for developers in Asia. Our video playlist includes an additional interview with Stan Leong at Mozilla’s Taipei office.

Taiwan videos:

Interview with Stan Leong
Stan Leong: The state of developers in the Asia Pacific region, and Mozilla’s outreach and impact within this region
Max Liu: Learn more about Firefox Lite!

Venetia Tay and Sri Subramanian joined us in Singapore, at the Grab offices high above Marina One Towers.

Singapore videos:

Venetia Tay: Designing for User-centered Privacy
Sriraghavan Subramanian: Percentage Rollout for Single Page web applications
Hui Jing Chen: Making CSS from Good to Great: The Power of Subgrid (Singlish version)

In Asia, we kept the model of past Dev Roadshows. Again, our goal was to meet with local developer communities and deliver free, high-quality technical talks on topics relevant to Firefox, Mozilla and the web.

At every destination, developers shared unique perspectives on their needs. We learned a lot. In some communities, concern for security and privacy is not a top priority. In other locations, developers have extremely limited influence and autonomy in selecting the tools or frameworks they use in their work. We realized that sometimes the “best” solutions are out of reach due to factors beyond our control.

Nevertheless, all the developers we spoke to, across all the locales we visited, expressed a strong desire to support diversity in their communities. Everyone we met championed the value of inclusion: attracting more people with diverse backgrounds and growing the community positively.

The Mozilla DevRel team is planning what’s ahead for our Developer Roadshow program in 2020. One of our goals is to engage even more deeply with local developer community leaders and speakers in the year ahead. We’d like to empower dev community leaders and speakers to organize and produce Roadshow-style events in new locations. We’re putting together a program and application process (opening February 2020 – watch here and on our @MozHacks Twitter account for updates), and will share more information soon!

Mozilla Security Blog: CRLite: Speeding Up Secure Browsing

CRLite pushes bulk certificate revocation information to Firefox users, reducing the need to actively query such information one by one. Additionally this new technology eliminates the privacy leak that individual queries can bring, and does so for the whole Web, not just special parts of it. The first two posts in this series about the newly-added CRLite technology provide background: Introducing CRLite: All of the Web PKI’s revocations, compressed and The End-to-End Design of CRLite.

Since mid-December, our pre-release Firefox Nightly users have been evaluating our CRLite system while performing normal web browsing. Gathering information through Firefox Telemetry has allowed us to verify the effectiveness of CRLite.

The questions we particularly wanted to ask about Firefox when using CRLite are:

  1. What were the results of checking the CRLite filter?
    1. Did it find the certificate was too new for the installed CRLite filter;
    2. Was the certificate valid, revoked, or not included;
    3. Was the CRLite filter unavailable?
  2. How quickly did the CRLite filter check return compared to actively querying status using the Online Certificate Status Protocol (OCSP)?

How Well Does CRLite Work?

With Telemetry enabled in Firefox Nightly, each invocation of CRLite emits one of these results:

  • Certificate Valid, indicating that CRLite authoritatively returned that the certificate was valid.
  • Certificate Revoked, indicating that CRLite authoritatively returned that the certificate was revoked.
  • Issuer Not Enrolled, meaning the certificate being evaluated wasn’t included in the CRLite filter set, likely because the issuing Certificate Authority (CA) did not publish CRLs.
  • Certificate Too New, meaning the certificate being evaluated was newer than the CRLite filter.
  • Filter Not Available, meaning that the CRLite filter either had not yet been downloaded from Remote Settings, or had become so stale as to be out-of-service.

Figure 1: One month of CRLite results in Firefox Nightly (5 December 2019 to 6 Jan 2020)

Immediately, one sees that over 54% of secure connections (500M) could have benefited from the improved privacy and performance of CRLite for Firefox Nightly users.

Of the other data:

  • We plan to publish updates up to 4 times per day, which will reduce the incidence of the Certificate Too New result.
  • The Filter Not Available bucket correlates well with independent telemetry indicating a higher-than-expected level of download issues retrieving CRLite filters from Remote Settings; work to improve that is underway.
  • Certificates Revoked but used actively on the Web PKI are, and should be, rare. This number is in-line with other Firefox Nightly telemetry for TLS connection results.

How Much Faster is CRLite?

In contrast to OCSP which requires a network round-trip to complete before a web page can load, CRLite needs only to perform a handful of hash calculations and memory or disk lookups. We expected that CRLite would generally outperform OCSP, but to confirm we added measurements and let OCSP and CRLite race each other in Firefox Nightly.

Figure 2: How often is CRLite faster than OCSP? (11 December 2019 to 6 January 2020)

Over the month of data, CRLite was faster to query than OCSP 98.844% of the time.

CRLite is Faster 99% of the Time

The speedup of CRLite versus OCSP was rather stark; 56% of the time, CRLite was over 100 milliseconds faster than OCSP, which is a substantial and perceptible improvement in browsing performance.

Figure 3: Distribution of occurrences where CRLite outperformed OCSP, which was 99% of CRLite operations. [source]

Almost 10% of the collected data reports showed an entire second of speedup, indicating that the OCSP query reached the default timeout. The delay in this figure is time spent while a Firefox user is waiting for the page to start loading, so it has a substantial impact on perceived quickness in the browser.

To verify that outlier at the timeout, we checked our OCSP telemetry probe, which shows that over the same period, 9.9% of OCSP queries timed out:

Figure 4: Results of Live OCSP queries in Firefox Nightly [source]

Generally speaking, when loading a website where OCSP wasn’t already cached, 10% of the time Firefox users pause for a full second before the site loads, and they don’t even get revocation data in exchange for the wait.

The 1% When OCSP is Faster

On the 500k occasions when OCSP was faster than CRLite, it was generally not much faster: 50% of the time it was less than 40 milliseconds faster, and on only 20% of those occasions was OCSP 100 milliseconds or more faster.

Figure 5: Distribution of occurrences where OCSP outperformed CRLite, which was 1% of CRLite operations. [source]

Interesting as this is, it represents only 1% of CRLite invocations for Firefox Nightly users in this time period. Almost 99% of CRLite operations were faster, much faster.

Much Faster, More Private, and Still Secure

Our study confirmed that CRLite will maintain the integrity of our live revocation checking mechanisms while also speeding up TLS connections.

At this point it’s clear that CRLite lets us keep checking certificate revocations in the Web PKI without compromising on speed, and the remaining areas for improvement are on shrinking our update files closer to the ideal described in the original CRLite paper.

In the upcoming Part 4 of this series, we’ll discuss the architecture of the CRLite back-end infrastructure, the iteration from the initial research prototype, and interesting challenges of working in a “big data” context for the Web PKI.

In Part 5 of this series, we will discuss what we’re doing to make CRLite as robust and as safe as possible.

This Week In Rust: This Week in Rust 322

Hello and welcome to another issue of This Week in Rust! Rust is a systems language pursuing the trifecta: safety, concurrency, and speed. This is a weekly summary of its progress and community. Want something mentioned? Tweet us at @ThisWeekInRust or send us a pull request. Want to get involved? We love contributions.

This Week in Rust is openly developed on GitHub. If you find any errors in this week's issue, please submit a PR.

Updates from Rust Community

News & Blog Posts

Crate of the Week

This week's crate is fasteval, a crate for fast and safe evaluation of algebraic expressions.

Thanks to Christopher Sebastian for the suggestions!

Submit your suggestions and votes for next week!

Call for Participation

Always wanted to contribute to open-source projects but didn't know where to start? Every week we highlight some tasks from the Rust community for you to pick and get started!

Some of these tasks may also have mentors available, visit the task page for more information.

If you are a Rust project owner and are looking for contributors, please submit tasks here.

Updates from Rust Core

270 pull requests were merged in the last week

Approved RFCs

Changes to Rust follow the Rust RFC (request for comments) process. These are the RFCs that were approved for implementation this week:

No RFCs were approved this week.

Final Comment Period

Every week the team announces the 'final comment period' for RFCs and key PRs which are reaching a decision. Express your opinions now.

RFCs
Tracking Issues & PRs

New RFCs

No new RFCs were proposed this week.

Upcoming Events

Europe
North America
South America

If you are running a Rust event please add it to the calendar to get it mentioned here. Please remember to add a link to the event too. Email the Rust Community Team for access.

Rust Jobs

Tweet us at @ThisWeekInRust to get your job offers listed here!

Quote of the Week

Rc<RefCell> is like duct tape.

It's very versatile, and can fix a multitude of problems in a pinch. For some problems, it's even the best thing to use. But if the thing you're building is more than about 10% wrapped in duct tape, you might want to reconsider your design process!

trentj on rust-users

Thanks to Tom Phinney for the suggestion!

Please submit quotes and vote for next week!

This Week in Rust is edited by: nasa42 and llogiq.

Discuss on r/rust.

The Firefox Frontier: How to set Firefox as your default browser on Windows

During a software update, your settings can sometimes change or revert back to their original state. For example, if your computer has multiple browsers installed, you may end up with … Read more

Daniel Stenberg: curl cheat sheet refresh

Several years ago I made a first version of a “curl HTTP cheat sheet” to facilitate the most common curl command line options when working with HTTP.

This has now been refreshed after I took lots of feedback from friends on twitter, and then worked on rearranging the lines and columns so that it got as compact as possible without sacrificing readability (too much).

See the updated online version here.

The curl HTTP cheat sheet, 2020 edition.

The plan is to make stickers out of this – and possibly t-shirts as well. I did some test prints and deemed that with a 125 mm width, all the text is still clearly readable.

If things go well, I’ll hand out these beauties at curl up 2020 and of course leftovers will then be given away to whoever I run into at other places and conferences where I bring stickers…

Daniel Stenberg: curl ootw: -m, --max-time

(Previous option of the week posts.)

This is the lowercase -m. The long form is called --max-time.

One of the oldest command line options in the curl tool box. It existed already in the first curl release ever: 4.0.

This option also requires a numerical argument. That argument can be an integer number or a decimal number, and it specifies the maximum time – in seconds – that you allow the entire transfer to take.

The intended use for this option is basically to provide a means for a user to stop a transfer that stalls or hangs for a long time due to whatever reason. Since transfers may sometimes be very quick and sometimes be ridiculously slow, often due to factors you cannot control or know about locally, figuring out a sensible maximum time value to use can be challenging.

Since -m covers the whole operation, even a slow name resolve phase can trigger the timeout if you set it too low. Setting a very low max-time value is almost guaranteed to occasionally abort otherwise perfectly fine transfers.

If the transfer is not completed within the given time frame, curl will return with error code 28.

Examples

Do the entire transfer within 10 seconds or fail:

curl -m 10 https://example.com/

Complete the download of the file within 2.25 seconds or return an error:

curl --max-time 2.25 https://example.com

Caveat

Due to restrictions in underlying system functions, some curl builds cannot specify -m values under one second.

Related

Related, and often better, options to use include --connect-timeout and the --speed-limit and --speed-time combo.

Karl Dubost: Week notes - 2020 w03 - worklog - Murphy

Work

Basically, I spent most of the week dealing with coding the new workflow for anonymous bug reporting. And we (Guillaume, Ksenia, Mike and myself) made good progress. I believe we will be ready for Berlin All Hands.

Also, GitHub decided to revive our anonymous bugs: around 39,000 bugs are back. We haven't yet reactivated our anonymous reporting.

Next week I will be working on the last bit that will allow us to restart the normal process.

Oh! Before I forget, a shout out to Ksenia. When I was reviewing one of her pull requests, I discovered and explored a couple of things I didn't know in the unittest.mock Python module. That is one thing we forget too often: reviewing code is a way to learn cool stuff.

On Friday, I set the code aside and did a bit of diagnosis so the pile of bugs doesn't grow insanely, specifically with the Berlin events coming.

Murphy

Also as Murphy's law put it, last week bad omens kept creeping into this week. My personal VMs just decided to not reboot, and basically I lost everything. This blog (Pelican driven with markdown files), personal blog and a couple of other sites, and my personal email server. So next week, I'll have to rebuild things during night. I have backups of blog content, and emails are under imap on my local machine. The joy of having everything as static content is making things a lot simpler.

That's why this post is being published later rather than sooner.

Mozilla lay-offs

Let's address the elephant in the room.

By now, everyone knows the sad news: 70 persons that you met, crossed paths with, worked with. 70 persons who shared a desire to make the Web a better place. My social upbringing (fight for your workers' rights) is making things difficult. I find the outpouring of help wonderful, but at the same time, I can't help thinking that there would not be a #WalmartLifeBoat type of support for people who need to find jobs so as not to be on the street the day after. The value of Syndicats de salariés (loosely translated as trade unions in the USA, but really with a completely different culture) and the laws protecting one's job in France are essential. Every day working in a North American job culture makes you realize this.

All the best to my coworkers who were forced to leave. It's always a sad moment. Every individual circumstance is unique, emotionally, economically, etc. and if I can do anything, tell me.

Otsukare!

Niko Matsakis: Async Interview #5: Steven Fackler

Hello! For the latest async interview, I spoke with Steven Fackler (sfackler). sfackler has been involved in Rust for a long time and is a member of the Rust libs team. He is also the author of a lot of crates, most notably tokio-postgres.

I particularly wanted to talk to sfackler about the AsyncRead and AsyncWrite traits. These traits are on everybody’s list of “important things to stabilize”, particularly if we want to create more interop between different executors and runtimes. On the other hand, in [tokio-rs/tokio#1744], the tokio project is considering adopting its own variant traits that diverge significantly from those in the futures crate, precisely because they have concerns over the design of the traits as is. This seems like an important area to dig into!

Video

You can watch the video on YouTube. I’ve also embedded a copy here for your convenience:

One note: something about our setup meant that I was hearing a lot of echo. I think you can sometimes hear it in the recording, but not nearly as bad as it was live. So if I seem a bit spacey, or take very long pauses, you might know the reason why!

Background: concerns on the async-read trait

So what are the concerns that are motivating tokio-rs/tokio#1744? There are two of them:

  • the current traits do not permit using uninitialized memory as the backing buffer;
  • there is no way to test presently whether a given reader supports vectorized operations.

This blog post will focus on uninitialized memory

sfackler and I spent most of our time talking about uninitialized memory. We did also discuss vectorized writes, and I’ll include some notes on that at the end, but by and large sfackler felt that the solutions there are much more straightforward.

Important: The same issues arise with the sync Read trait

Interestingly, neither of these issues is specific to AsyncRead. As defined today, the AsyncRead trait is basically just the async version of Read from std, and both of these concerns apply there as well. In fact, part of why I wanted to talk to sfackler specifically is that he is the author of an excellent paper document that covers the problem of using uninitialized memory in great depth. A lot of what we talked about on this call is also present in that document. Definitely give it a read.

Read interface doesn’t support uninitialized memory

The heart of the Read trait is the read method:

fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>

This method reads data and writes it into buf and then – assuming no error – returns Ok(n) with the number n of bytes written.

Ideally, we would like it if buf could be an uninitialized buffer. After all, the Read trait is not supposed to be reading from buf, it’s just supposed to be writing into it – so it shouldn’t matter what data is in there.

Problem 1: The impl might read from the buf, even if it shouldn’t

However, in practice, there are two problems with using uninitialized memory for buf. The first one is relatively obvious: although it isn’t supposed to, the Read impl can trivially read from buf without using any unsafe code:

impl Read for MyReader {
  fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
    let x = buf[0];
    ...
  }
}

Reading from an uninitialized buffer is Undefined Behavior and could cause crashes, segfaults, or worse.

Problem 2: The impl might not really initialize the buffer

There is also a second problem that is often overlooked: when the Read impl returns, it returns a value n indicating how many bytes of the buffer were written. In principle, if buf was uninitialized to start, then the first n bytes should be written now – but are they? Consider a Read impl like this one:

impl Read for MyReader {
  fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
    Ok(buf.len())
  }
}

This impl has no unsafe code. It claims that it has initialized the entire buffer, but it hasn’t done any writes into buf at all! Now if the caller tries to read from buf, it will be reading uninitialized memory, and causing UB.

One subtle point here. The problem isn’t that the read impl could return a false value about how many bytes it has written. The problem is that it can lie without ever using any unsafe code at all. So if you are auditing your code for unsafe blocks, you would overlook this.
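
To make that concrete, here is a deliberately unsound caller-side sketch of my own (a hypothetical helper, not code from the interview): if the caller hands read an “uninitialized but pretend-initialized” buffer and trusts the returned n, a lying-but-safe impl like the one above turns the caller’s unsafe block into undefined behavior, even though that block looks locally reasonable.

use std::io::Read;
use std::mem::MaybeUninit;

fn read_to_vec<R: Read>(reader: &mut R) -> std::io::Result<Vec<u8>> {
  let mut storage = [MaybeUninit::<u8>::uninit(); 1024];
  // UNSOUND: pretend the uninitialized bytes are initialized so that the
  // existing `read` signature can be called at all.
  let buf: &mut [u8] = unsafe {
    std::slice::from_raw_parts_mut(storage.as_mut_ptr() as *mut u8, storage.len())
  };
  let n = reader.read(buf)?;
  // If the impl lied about `n`, `buf[..n]` may still be uninitialized here,
  // and reading it is undefined behavior.
  Ok(buf[..n].to_vec())
}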

Constraints and solutions

There have been a lot of solutions proposed to this problem. sfackler and I talked about all of them, I think, but I’m going to skip over most of the details. You can find them either in the video or in sfackler’s paper document, which covers much of the same material.

In this post, I’ll just cover what we said about three of the options:

  • First, adding a freeze operation.
    • This is in some ways the simplest, as it requires no change to Read at all.
    • Unfortunately, it has a number of limitations and downsides.
  • Second, adding a second read method that takes a &mut dyn BufMut value.
    • This is the solution initially proposed in [tokio-rs/tokio#1744].
    • It has much to recommend it, but requires virtual calls in a core API, although initial benchmarks suggest such calls are not a performance problem.
  • Finally, creating a struct BufMut in the stdlib for dealing with partially initialized buffers, and adding a read method for that (a rough sketch of this shape follows the list).
    • This overcomes some of the downsides of using a trait, but at the cost of flexibility.
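
To give a feel for that last option, here is a minimal sketch of my own of a cursor-style buffer type (hypothetical names; the actual proposals differ in their details): the type owns the “how much is initialized” bookkeeping, so a read impl can neither peek at uninitialized bytes through the safe API nor claim to have written more than it did without writing unsafe code itself.

use std::mem::MaybeUninit;

pub struct BufCursor<'a> {
  // Invariant: data[..initialized] has been written via `append`.
  data: &'a mut [MaybeUninit<u8>],
  initialized: usize,
}

impl<'a> BufCursor<'a> {
  pub fn new(data: &'a mut [MaybeUninit<u8>]) -> Self {
    BufCursor { data, initialized: 0 }
  }

  // Append initialized bytes; only this method advances `initialized`.
  pub fn append(&mut self, bytes: &[u8]) {
    let end = self.initialized + bytes.len();
    assert!(end <= self.data.len(), "buffer overflow");
    for (dst, src) in self.data[self.initialized..end].iter_mut().zip(bytes) {
      *dst = MaybeUninit::new(*src);
    }
    self.initialized = end;
  }

  // The bytes written so far, safe for the caller to read.
  pub fn filled(&self) -> &[u8] {
    // SAFETY: the invariant above guarantees these bytes are initialized.
    unsafe { std::slice::from_raw_parts(self.data.as_ptr() as *const u8, self.initialized) }
  }
}

// A read method built on top of such a type might then look roughly like
//   fn read_buf(&mut self, buf: &mut BufCursor<'_>) -> std::io::Result<()>;
// so callers never hand out a &mut [u8] that points at uninitialized memory.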

Digression: how to think about uninitialized memory

Before we go further, let me digress a bit. I think the common understanding of uninitialized memory is that “it contains whatever values happen to be in there at the moment”. In other words, you might imagine that when you first allocate some memory, it contains some value – but you can’t predict what that is.

This intuition turns out to be incorrect. This is true for a number of reasons. Compiler optimizations are part of it. In LLVM, for example, an uninitialized variable is not assigned to a fixed stack slot or anything like that. It is instead a kind of “free floating” “uninitialized” value, and – whenever needed – it is mapped to whatever register or stack slot happens to be convenient at the time for most optimal code. What this means in practice is that each time you try to read from it, the compiler will substitute some value, but it won’t necessarily be the same value every time. This behavior is justified by the C standard, which states that reading uninitialized memory is “undefined behavior”.

This can cause code to go quite awry. The canonical example in my mind is the case of a bounds check. You might imagine, for example, that code like this would suffice for legally accessing an array:

let index = compute_index();
if index < length {
  return &array[index];
} else {
  panic!("out of bounds");
}

However, if the value returned by compute_index is uninitialized, this is incorrect. Because in that case, index will also be “the uninitialized value”, and hence each access to it conceptually yields different values. So the value that we compare against length might not be the same value that we use to index into the array one line later. Woah.

But, as sfackler and I discussed, there are other layers below the compiler – even the kernel – that rely on uninitialized memory never being read. For example, in the Linux kernel, the virtual memory system has a flag called MADV_FREE. This flag is used to mark virtual memory pages that are considered uninitialized. For each such virtual page, the kernel is free to change the backing physical memory page at will – until the virtual page is written to. At that point, the memory is potentially initialized, and so the virtual page is pinned. What this means in practice is that when you get memory back from your allocator, each read from that memory may yield different values, unless you’ve written to it first.

For all these reasons, it is best to think of uninitialized memory not as having “some random value” but rather as having the special value “uninitialized”. This value can, when forced, be converted to a random value – but if it is accessed multiple times, it may yield a different value each time.
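
To make this concrete in Rust terms, the standard library models the “uninitialized” value explicitly with MaybeUninit. Here is a minimal sketch (my own illustration, not something from the interview) of where the safe/unsafe boundary sits:

use std::mem::MaybeUninit;

// Minimal illustration (not from the interview): Rust models "the value
// uninitialized" explicitly with `MaybeUninit`.
fn demo() {
    // Four uninitialized bytes. This alone is fine; nothing has read them yet.
    let mut buf: [MaybeUninit<u8>; 4] = [MaybeUninit::uninit(); 4];

    // Reading before writing would be undefined behavior:
    // let first = unsafe { buf[0].assume_init() }; // UB!

    // Writing first makes the later read sound.
    buf[0] = MaybeUninit::new(42);
    let first = unsafe { buf[0].assume_init() };
    assert_eq!(first, 42);
}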

If you’d like a deeper treatment, I recommend Ralf’s blog post.

Possible solution to read: Freeze operation

So, given the above, what is the freeze operation, and how could it help with handling uninitialized memory in the read API?

The general idea is that we could have a primitive called freeze that, given some (potentially) uninitialized value, converts any uninitialized bits into “some random value”. We could use this to fix our indexing, for example, by “freezing” the index before we compare against the length:

let index = freeze(compute_index());
if index < length {
  return &array[index];
} else {
  panic!("out of bounds");
}

In a similar way, if we have a reference to an uninitialized buffer, we could conceivably “freeze” that reference to convert it to a reference of random bytes, and then we can safely use that to invoke read. The idea would be that callers do something like this:

let uninitialized_buffer = ...;
let mut buffer = freeze(uninitialized_buffer);
let n = reader.read(&mut buffer)?;
...

If we could do this, it would be great, because the existing read interface wouldn’t have to change at all!

There are a few complications though. First off, there is no such freeze operation in LLVM today. There is talk of adding one, but that operation wouldn’t quite do what we need. For one thing, it freezes the value it is applied to, but it doesn’t apply through a reference. So you could use it to fix our array bounds checking example, but you can’t use it to fix read – we don’t need to freeze the &mut [u8] reference itself, we need to freeze the memory it refers to.

Secondly, that primitive would only apply to compiler optimizations. It wouldn’t protect against kernel optimizations like MADV_FREE. To handle that, we have to do something extra, such as writing one byte per memory page. That’s conceivable, of course, but there are some downsides:

  • It feels fragile. What if Linux adds some new optimizations in the future? How will we work around those?
  • It feels disappointing. After all, MADV_FREE was presumably added because it makes things faster – and we all agree that, given a “well-behaved” Read implementation, taking advantage of it should be reasonable.
  • It can be expensive. sfackler pointed out that it is common to “over-provision” your read buffers, such as creating a 16MB buffer, so as to avoid blocking. This is fairly cheap in practice, but only thanks to optimizations (like MADV_FREE) that allow that memory to be lazily allocated and so forth. If we start writing a byte into every page of a 16MB buffer, you’re going to notice the difference.
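
As an aside, the “write one byte per memory page” workaround mentioned above might look something like this sketch (the page size is an assumption here; real code would query it from the OS):

// Hedged sketch of the "write one byte per page" workaround; the page
// size is assumed rather than queried from the OS.
const ASSUMED_PAGE_SIZE: usize = 4096;

unsafe fn touch_every_page(ptr: *mut u8, len: usize) {
    let mut offset = 0;
    while offset < len {
        // A volatile write keeps the compiler from optimizing the store away,
        // forcing the kernel to commit (pin) the backing physical page.
        std::ptr::write_volatile(ptr.add(offset), 0);
        offset += ASSUMED_PAGE_SIZE;
    }
}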

For these reasons, sfackler felt like freeze isn’t the right answer here. It might be a useful primitive for things like array bounds checking, but it would be better if we could modify the Read trait in such a way that we permit the use of “unfrozen” uninitialized memory.

Incidentally, this is a topic we’ve hit on in previous async interviews. cramertj and I talked about it, for example. My own opinion has shifted – at first, I thought a freeze primitive was obviously a good idea, but I’ve come to agree with sfackler that it’s not the right solution here.

Fallback and efficient interoperability

If we don’t take the approach of adding a freeze primitive, then this implies that we are going to have to extend the Read trait with a second method. Let’s call it read2 for short. And this raises an interesting question: how are we going to handle backwards compatibility?

In particular, read2 is going to have a default implementation, so that existing impls of Read are not invalidated. And this default is going to have to fall back to calling read, since that is the only method that we can guarantee to exist. Since read requires a fully initialized buffer, this will mean that read2 will have to zero its buffer if it may be uninitialized. This by itself is ok – it’s no worse than today.
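
In sketch form, that defaulted fallback might look like the following (read2 and its signature are hypothetical, purely for illustration):

use std::io;
use std::mem::MaybeUninit;

// Hypothetical sketch only: `read2` and its signature are illustrative,
// not an actual std API.
trait Read2: io::Read {
    fn read2(&mut self, buf: &mut [MaybeUninit<u8>]) -> io::Result<usize> {
        // Zero the buffer so every byte is initialized before handing it to
        // the old `read`, which requires a fully initialized `&mut [u8]`.
        for byte in buf.iter_mut() {
            *byte = MaybeUninit::new(0);
        }
        // SAFETY: every byte was just initialized above.
        let initialized: &mut [u8] = unsafe {
            std::slice::from_raw_parts_mut(buf.as_mut_ptr() as *mut u8, buf.len())
        };
        self.read(initialized)
    }
}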

The problem is that some of the solutions discussed in sfackler’s doc can wind up having to zero the buffer multiple times, depending on how things play out. And this could be a big performance cost. That is definitely to be avoided.

Possible solution to read: Take a trait object, and not a buffer

Another proposed solution, in fact the one described in [tokio-rs/tokio#1744], is to modify read so it takes a trait object (in the case of the Read trait, we’d have to add a new, defaulted method):

fn read_buf(&mut self, buf: &mut dyn BufMut) -> io::Result<()>

The idea here is that BufMut is a trait that lets you safely access a potentially uninitialized set of buffers:

pub trait BufMut {
    fn remaining_mut(&self) -> usize;
    unsafe fn advance_mut(&mut self, cnt: usize);
    unsafe fn bytes_mut(&mut self) -> &mut [u8];
    ...
}

You might wonder why the definition takes a &mut dyn BufMut, rather than a &mut impl BufMut. Taking impl BufMut would mean that the code is specialized to the particular sort of buffer you are using, so that would potentially be quite a bit faster. However, it would also make Read not “dyn-safe” [1], and that’s a non-starter.

There are some nifty aspects to this proposal. One of them is that the same trait can to some extent “paper over” vectorized writes, by distributing the data written across buffers in a chain.

But there are some downsides. Perhaps most important is that requiring virtual calls to write into the buffer could be a significant performance hazard. Thus far, measurements don’t suggest that it is, but it seems like a cost that can only be recovered by heroic compiler optimizations, and that’s the kind of thing we prefer to avoid.

Moreover, the ability to be generic over vectorized writes may not be as useful as you might think. Often, the caller wants to know whether the underlying Read supports vectorized writes, and it would operate quite differently in that case. Therefore, it doesn’t really hurt to have two read methods, one for normal and one for vectorized writes.

Variant: use a struct, instead of a trait

The variant that sfackler prefers is to replace the BufMut trait with a struct [2]. The API of this struct would be fairly similar to the trait above, except that it wouldn’t make much attempt to unify vectorized and non-vectorized writes.

Basically, we’d have a struct that encapsulates a “partially initialized slice of bytes”. You could create such a struct from a standard slice, in which case everything is initialized, or you could create it from a slice of “maybe initialized” bytes (e.g., &mut [MaybeUninit<u8>]). There could also be convenience methods to create a BufMut that refers to the uninitialized tail of bytes in a Vec (i.e., pointing into the vector’s internal buffer).

The safe methods of the BufMut API would permit:

  • writing to the buffer, which will track the bytes that were initialized;
  • getting access to a slice, but only one that is guaranteed to be initialized.

There would be unsafe methods for getting access to memory that may be uninitialized, or for asserting that you have initialized a big swath of bytes (e.g., by handing the buffer off to the kernel to get written to).

The buffer has state: it can track what has been initialized. This means that any given part of the buffer will get zeroed at most once. This ensures that fallback from the new read2 method to the old read method is reasonably efficient.
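
Putting those pieces together, a rough sketch of the kind of struct being described might look like this (all names and signatures here are my own guesses, not a final design):

use std::mem::MaybeUninit;

// Rough sketch only: names and signatures are guesses, not a final design.
pub struct BufMut<'a> {
    data: &'a mut [MaybeUninit<u8>],
    // Bytes at the start of `data` known to be initialized (zeroed or written).
    initialized: usize,
    // Bytes at the start of `data` that actually contain read data.
    filled: usize,
}

impl<'a> BufMut<'a> {
    // Safe: writing through this method tracks initialization as it goes.
    pub fn append(&mut self, bytes: &[u8]) {
        assert!(bytes.len() <= self.data.len() - self.filled);
        for (dst, src) in self.data[self.filled..].iter_mut().zip(bytes) {
            *dst = MaybeUninit::new(*src);
        }
        self.filled += bytes.len();
        self.initialized = self.initialized.max(self.filled);
    }

    // Safe: only hands out the prefix that is guaranteed to be initialized
    // and filled with real data.
    pub fn filled(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.data.as_ptr() as *const u8, self.filled) }
    }

    // Unsafe: the caller promises the first `n` bytes really were written,
    // e.g. by handing the buffer off to the kernel.
    pub unsafe fn assume_filled(&mut self, n: usize) {
        self.filled = self.filled.max(n);
        self.initialized = self.initialized.max(self.filled);
    }
}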

Sync vs async, how to proceed

So, given the above thoughts, how should we proceed with AsyncRead? sfackler felt that the question of how to handle uninitialized output buffers was basically “orthogonal” to the question of whether and when to add AsyncRead. In other words, sfackler felt that the AsyncRead and Read traits should mirror one another, which means that we could add AsyncRead now and add a solution for uninitialized memory later – or we could do it in the reverse order.

One minor question has to do with defaults. Currently the Read trait requires an implementation of read – any new method (read_uninit or whatever) will therefore have to have a default implementation that invokes read. But this is sort of the wrong incentive: we’d prefer that users implement read_uninit, and implement read in terms of the new method. We could conceivably reverse the defaults for the AsyncRead trait to this preferred style. Alternatively, sfackler noted that we could give both read and read_uninit default implementations, each written in terms of the other. In this case, users would have to implement one or the other (implementing neither would lead to an infinite loop, and we would likely want a lint for that case).
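
The “both methods have defaults” option is easiest to see in sketch form (method names hypothetical); an impl that overrides neither method would recurse forever, hence the suggested lint:

use std::io;

// Hypothetical sketch of the "both methods have defaults" option.
// An impl that overrides neither method would recurse forever.
trait ReadSketch {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Default: delegate to `read_uninit` (a `&mut [u8]` is already initialized).
        self.read_uninit(buf)
    }

    fn read_uninit(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Default: delegate to `read`. In the real proposal this would take a
        // partially initialized buffer type; that detail is elided here.
        self.read(buf)
    }
}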

We also discussed what it would mean if tokio adopted its own AsyncRead trait that diverged from std. While not ideal, sfackler felt like it wouldn’t be that big a deal either way, since it ought to be possible to efficiently interconvert between the two. The main constraint is having some kind of stateful entity that can remember how much of the buffer has been initialized, thus preventing the inefficient fallback behavior of zeroing the same memory more than once.

Is the ability to use uninitialized memory even a problem?

We spent a bit of time at the end discussing how one could gain data on this problem. There are two things that would be nice to know.

First, how big is the performance impact from zeroing? Second, how ergonomic is the proposed API to use in practice?

Regarding the performance impact, I asked the same question on tokio-rs/tokio#1744, and I did get back some interesting results, which I summarized in this hackmd at the time. In short, hyper’s benchmarks show a fairly sizable impact, with uninitialized data getting speedups [3] of 1.3-1.5x. Other benchmarks, though, are much more mixed, showing either no difference or small differences on the order of 2%. Within the stdlib, we found about a 7% impact on microbenchmarks.

Still, sfackler raised another interesting data point (both on the thread and in our call). He was pointing out #23820, a PR which rewrote read_to_end in the stdlib. The older implementation was simple and obvious, but suffered from massive performance cliffs related to the need to zero buffers. The newer implementation is fast, but much more complex. Using one of the APIs described above would permit us to avoid this complexity.

Regarding ergonomics, as ever, that’s a tricky thing to judge. It’s hard to do better than prototyping as well as offering the API on nightly for a time, so that people can try it out and give feedback.

Having the API on nightly would also help us to make branches of frameworks like tokio and async-std so we can do bigger measurements.

Higher levels of interoperability

sfackler and I talked a bit about what the priorities should be beyond AsyncRead. One of the things we talked about is whether there is a need for higher-level traits or libraries that expose more custom information beyond “here is how to read data”. One example that has come up from time to time is the need to know, for example, the URL or other information associated with a request.

Another example might be the role of crates like http, which aims to define Rust types for things like HTTP header codes that are fairly standard. These would be useful types to share across all HTTP implementations and libraries, but will we be able to achieve that sort of sharing without offering the crate as part of the stdlib (or at least as part of the Rust org)? I don’t think we had a definitive answer here.

Priorities beyond async read

We next discussed what other priorities the Rust org might have around Async I/O. For sfackler, the top items would be:

  • better support for GATs and async fn in traits;
  • some kind of generator or syntactic support for streams;
  • improved diagnostics, particularly around send/sync.

Conclusion

sfackler and I focused quite heavily on the AsyncRead trait and how to manage uninitialized memory. I think that it would be fair to summarize the main points of our conversation as:

  • we should add AsyncRead to the stdlib and have it mirror Read;
  • in general, it makes sense for the synchronous and asynchronous versions of the traits to be analogous;
  • we should extend both traits with a method that takes a BufMut struct to manage uninitialized output buffers, as the other options all have a crippling downside;
  • we should extend both traits with a “do you support vectorized output?” callback as well;
  • beyond that, the Rust org should focus heavily on diagnostics for async/await, but streams and async fns in traits would be great too. =)

Comments?

There is a thread on the Rust users forum for this series.

Appendix: Vectorized reads and writes

There is one minor subthread that I’ve skipped over – vectorized reads and writes. I skipped it in the blog post because this problem is somewhat simpler. The standard read interface takes a single buffer to write the data into. But a vectorized interface takes a series of buffers – if there is more data than will fit in the first one, then the data will be written into the second one, and so on until we run out of data or buffers. Vectorized reads and writes can be much more efficient in some cases.

Unfortunately, not all readers support vectorized reads. For that reason, the “vectorized read” method has a fallback: by default, it just calls the normal read method using the first non-empty buffer in the list. This is theoretically equivalent, but obviously it could be a lot less efficient – imagine that I have supplied one buffer of size 1K and one buffer of size 16K. The default vectorized read method will just always use that single 1K buffer, which isn’t great – but still, not much to be done about it. Some readers just cannot support vectorized reads.

The problem here then is that it would be nice if there were some way to detect when a reader supports vectorized reads. This would allow the caller to choose between a “vectorized” call path, where it tries to supply many buffers, or a single-buffer call path, where it just allocates a big buffer.

Apparently hyper will do this today, but using a heuristic: if a call to the vectorized read method returns just enough data to fit in the first buffer, hyper guesses that in fact vectorized reads are not supported, and switches dynamically to the “one big buffer” strategy. (Neat.)

There is perhaps a second issue, related to ergonomics: since the vectorized read method has a default implementation, it is easy to forget to implement it, even if you would have been able to do so.

In any case, this problem is relatively easy to solve: we basically need to add a new method like

fn supports_vectorized_reads(&self) -> bool

to the trait.

The matter of deciding whether or not to supply a default is a bit trickier. If you don’t supply a default, then everybody has to implement it, even if they just want the default behavior. But if you do, people who could have implemented the method may forget to do so – this is particularly unfortunate for readers that wrap another reader, which is a pretty common case.

Footnotes

  1. Most folks say “object-safe” here, but I’m trying to shift our terminology to talk more about the dyn keyword. 

  2. Carl Lerche proposed something similar on the tokio thread here

  3. I am defining a “speedup” here as the ratio of U/Z, where U/Z are the throughput with uninitialized/zeroed buffers respectively. 

Karl DubostWeek notes - 2020 w02 - worklog - Fixing the mess

What a week!

webcompat.com: Project belt-on.

So last week, on Friday (Japanese time), I woke up to our website being half disabled and then completely disabled. We had been banned by GitHub because of illegal content that we had failed to flag early enough. And GitHub did what they had to do.

Oh… and last but not least… Mike asked me what Belt-on meant. I guess I should make it more explicit.

Seat Belt Sign On

Now we need to fix our own mess, which will take a series of steps. Some of them have already been taken.

  • Right now, anonymous reporting is disabled. And it will not come back as it was. Done
  • The only way to allow anonymous reporting again would be to switch to a moderation-first model, then make reports public once they have been properly reviewed. ToDo
  • We also probably need to remove inline screenshot images. Currently the images are shown directly to users. On webcompat.com, our nsfw tag blurs images, but this has no effect on GitHub, obviously, as we do not control the CSS there. ToDo
  • Remove illegal content. Done
  • Discussions have started on how we need to handle the future workflow for anonymous bug reports.
  • Created a diagram for a solution which doesn't involve URL redirections in case of moderation.

Berlin All Hands

  • The Berlin All Hands will be a very interesting work week for the Webcompat team, with a lot of new things to think about and probably changes for the better of the webcompat.com experience. Incidents can become the source of improvements.

Misc

Reading

  • Interesting read about making the meta viewport the default value. I wonder what breakage impact it could have on Web compatibility. Some websites test whether a meta viewport is already present in the markup and insert one dynamically based on user agent strings or screen sizes. Thomas reminded me of the discussion going on about @viewport
  • Release Notes for Safari Technology Preview 98

Otsukare!

Marco ZeheThis blog has moved

As discussed previously, I planned a few changes to my blog. Well, this Sunday, it all happened.

I moved my blog back to a self-hosted WordPress, but am powering it with Jetpack to offer many of the same features as during the seven months it ran on WordPress.com. I am also using the same theme, just have rearranged a few things. The privacy policy was updated to reflect the new status.

The site is hosted at a Germany-based provider that is very open-source friendly. The main server tasks happen on the command line. The most accessible: Text in, text out.

I also managed to transfer the followers acquired with the wordpress.com-hosted blog. I hope to merge the stats with the ones I had already gathered with this blog, which Jetpack remembered because I am using the same domain again.

What did not transfer over were e-mail subscribers that do not use a WordPress.com account. Sorry about that, you’ll have to resubscribe if you want to. There is technically no way to transfer these.

Hoping to get you all on board again with the blog that now moved back to self-hosting. 🙂


Mike HommeyAnnouncing git-cinnabar 0.5.3

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows you to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.2?

  • Updated git to 2.25.0 for the helper.
  • Fixed small memory leaks.
  • Combinations of remote ref styles are now allowed.
  • Added a git cinnabar unbundle command that allows importing a mercurial bundle.
  • Experimental support for python >= 3.5.
  • Fixed erroneous behavior of git cinnabar {hg2git,git2hg} with some forms of abbreviated SHA1s.
  • Fixed handling of the GIT_SSH environment variable.
  • Don’t eliminate closed tips when they are the only head of a branch.
  • Better handle manifests with double slashes created by hg convert from Mercurial < 2.0.1, and the following updates to those paths with normal Mercurial operations.
  • Fix compatibility with Mercurial libraries >= 3.0, < 3.4.
  • Windows helper is now statically linked against libcurl.

Armen ZambranoThis post focuses on the work I accomplished as part of the Treeherder team during the last half…

This post focuses on the work I accomplished as part of the Treeherder team during the last half of last year.

Some work I already blogged about:

Deprecated the taskcluster-treeherder service

The Taskcluster team requested that we stop ingesting tasks from the taskcluster-treeherder service and instead use the official Taskcluster Pulse exchanges (see work in bug 1395254). This required rewriting their code from Javascript to Python and integrating it into the Treeherder project. This needed to be accomplished by July, since the Taskcluster migration would leave the service in an old setup without much support. Integrating this service into Treeherder gives us more control over all Pulse messages getting into our ingestion pipeline, without an intermediary service. The project was accomplished ahead of the timeline. The impact is that the Taskcluster team had one less system to worry about ahead of their GCP migration.

Create performance dashboard for AWSY

In September I created a performance dashboard for the Are We Slim Yet project (see work in bug 1550235). This was not complicated because all I had to do was branch off the existing Firefox performance dashboard I wrote last year and deploy it on a different Netlify app.

One thing I did not do at the time was to create different configs for the two branches. This would have made it easier to merge changes from the master branch to the awsy branch without conflicts.

On the bright side, Divyanshu came along and fixed the issue! We can now use an env variable to start AWFY & another for AWSY. No need for two different branches!

Miscellaneous

Other work I did was to create a compare_pushes.py script that allows you to compare two different instances of Treeherder to determine if the ingestion of jobs for a push is different.

I added a management command to ingest all Taskcluster tasks from a push, an individual task or push, or all tasks associated with a Github PR. Up until this point, the only way to ingest this data would be by having the data ingestion pipeline set up locally before those tasks started showing up in the Pulse exchanges. This script is extremely useful for local development since you can test ingesting data with precise control.

I added Javascript code coverage for Treeherder through codecov.io, and we have managed not to regress further since then. This was in response to the fact that most code backouts for Treeherder were due to frontend regressions. Being notified when we regress is useful for adjusting tests appropriately.

I maintain a regular schedule (twice a week) for Treeherder production deployments. This ensures that Treeherder does not have too many changes deployed to production all at once (I remember the odd day when we had gone 3 weeks without any code being promoted).

I helped with planning the work for Treeherder for H2 of this year and H1 of next year, and I helped with CI-A’s planning for 2020.

I automated merging Javascript dependency updates for Treeherder.

I created a process to share secrets safely within the team instead of using a Google Doc.

David TellerUnits of Measure in Rust with Refinement Types

Years ago, Andrew Kennedy published a foundational paper about a type checker for units of measure, and later implemented it for F#. To this day, F# is the only mainstream programming language which provides first class support to make sure that you will not accidentally confuse meters and feet, euros and dollars, but that you can still convert between watts·hours and joules.

I decided to see whether this could be implemented in and for Rust. The answer is not only yes, but it was fun :)
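
As a taste of the general idea (and only an illustration; the post’s actual approach uses refinement types rather than this sketch), one well-known way to keep units apart in Rust is with zero-sized marker types:

use std::marker::PhantomData;
use std::ops::Add;

// Illustration only (not the post's refinement-type approach): zero-sized
// marker types keep meters and feet from being mixed up at compile time.
#[derive(Debug)]
struct Meters;
#[derive(Debug)]
struct Feet;

#[derive(Debug)]
struct Length<Unit> {
    value: f64,
    _unit: PhantomData<Unit>,
}

impl<Unit> Length<Unit> {
    fn new(value: f64) -> Self {
        Length { value, _unit: PhantomData }
    }
}

// Adding lengths is only allowed when the units match.
impl<Unit> Add for Length<Unit> {
    type Output = Length<Unit>;
    fn add(self, other: Self) -> Self::Output {
        Length::new(self.value + other.value)
    }
}

fn main() {
    let a = Length::<Meters>::new(3.0);
    let b = Length::<Meters>::new(4.0);
    println!("{:?}", a + b); // fine: both are meters
    // let _ = Length::<Meters>::new(1.0) + Length::<Feet>::new(1.0); // compile error
}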

Mozilla Future Releases BlogA brand new browsing experience arrives in Firefox for Android Nightly

It’s been almost 9 years since we released the first Firefox for Android. Hundreds of millions of users have tried it and over time provided us with valuable feedback that allowed us to continuously improve the app, bringing more features to our users that increase their privacy and make their mobile lives easier. Now we’re starting a new chapter of the Firefox experience on Android devices.

Testing to meet the users’ needs

Back in school, most of us weren’t into tests. They were stressful and we’d rather be playing or hanging out with friends. As adults, however, we see the value of testing — especially when it comes to software: testing ensures that we roll out well-designed products to a wide audience that deliver on their intended purposes.

At Firefox, we have our users at heart, and the value our products provide to them is at the center of everything we do. That’s why we test a lot. It’s why we make our products available as Nightly (an early version for developers) and Beta versions (a more stable preview of a new piece of software), put the Test Pilot program in place and sometimes, when we enter entirely new territory, we add yet another layer of user testing. It’s exactly that spirit that motivated us to launch Firefox Preview Beta in June 2019. Now we’re ready for the next step.

A new Firefox for Android: the making-of

When we started working on this project, we wanted to create a better Firefox for Android that would be faster, more reliable, and able to address today’s user problems. Plus, we wanted it to be based on our own mobile browser engine GeckoView in order to offer the highest level of privacy and security available on the Android platform. In short: we wanted to make sure that our users would never have to choose between privacy and a great browsing experience.

We had an initial idea of what that new Android product would look like, backed up by previous user research. And we were eager to test it, see how users feel about it, and find out what changes we needed to make and adjust accordingly. To minimize user disruption, early versions of this next generation browser were offered to early adopters as a separate application called Firefox Preview.

In order to ensure a fast, efficient and streamlined user experience, we spent the last couple of months narrowing down on what problems our users wanted us to solve, iterating on how we built and surfaced features to them. We looked closely at usage behaviour and user feedback to determine whether our previous assumptions had been correct and where changes would be necessary.

The feedback from our early adopters was overwhelmingly positive: the Firefox Preview Beta users loved the app’s fresh modern looks and the noticeably faster browsing experience due to GeckoView as well as new UI elements, such as the bottom navigation bar. When it came to tracking protection, we learned that Android users prefer a browsing experience with a more strict protection and less distractions — that’s why we made Strict Mode the default in Firefox Preview Beta, while Firefox for Desktop comes with Standard Mode.

Firefox Preview Beta goes Nightly

Based on the previous 6 months of user testing and the positive feedback we have received, we’re confident that Android users will appreciate this new browsing experience and we’re very happy to announce that, as of Tuesday (January 21, 2020), we’re starting to roll it out to our existing Firefox for Android audience in the Nightly app. For current Nightly users, it’ll feel like a big exciting upgrade of their browsing experience once they update the app, either manually or automatically, depending on their preset update routine. New users can easily download Firefox Preview here.

As for next milestones, the brand new Firefox for Android will go into Beta in Spring 2020 and land in the main release later in the first half of this year. In the meantime, we’re looking forward to learning more about the wider user group’s perception of the new Firefox for Android as well as to more direct feedback, allowing us to deliver the best-in-class mobile experience that our users deserve.

The post A brand new browsing experience arrives in Firefox for Android Nightly appeared first on Future Releases.

Daniel StenbergYou’re invited to curl up 2020: Berlin

The annual curl developers conference, curl up, is being held in Berlin this year; May 9-10 2020.

Starting now, you can register to the event to be sure that you have a seat. The number of available seats is limited.

Register here

curl up is the main (and only?) event of the year where curl developers and enthusiasts get together physically in a room for a full weekend of presentations and discussions on topics that are centered around curl and its related technologies.

We move the event around to different countries every year to accommodate different crowds better and worse every year – and this time we’re back again in Germany – where we once started the curl up series back in 2017.

The events are typically small (20-30 persons) with a very friendly spirit.

Sign up

We will only be able to let you in if you have registered – and received a confirmation. There’s no fee – but if you register and don’t show, you lose karma.

The curl project can even help fund your travel and accommodation expenses (if you qualify). We really want curl developers to come!

Register here

Date

May 9-10 2020. We’ll run the event both days of that weekend.

Agenda

The program is not done yet and will not be so until just a week or two before the event, and then it will be made available => here.

We want as many voices as possible to be heard at the event. If you have done something with curl, want to do something with curl, have a suggestion etc – even just a short talk will be much appreciated. Or if you have a certain topic or subject you really want someone else to speak about, let us know as well!

Expect topics to be about curl, curl internals, Internet protocols, how to improve curl, what to not do in curl and similar.

Location

We’ll be in the co.up facilities in central Berlin. On Adalbertstraße 8.

Planning and discussions on curl-meet

Everything about the event, planning for it, volunteering, setting it up, agenda bashing and more will be done on the curl-meet mailing list, dedicated for this purpose. Join in!

(Photo caption: The Stockholm edition.)

Anti Google?

If you have a problem with filling in the Google-hosted registration form, please email us at curlup@haxx.se instead and we’ll ask you for the information over email instead.

Credits

The images were taken by me, Daniel. The top one at the Nuremburg curl up in 2017, the in-article photo in Stockholm 2018.

Nick FitzgeraldAnnouncing Better Support for Fuzzing with Structured Inputs in Rust

Today, on behalf of the Rust Fuzzing Authority, I’d like to announce new releases of the arbitrary, libfuzzer-sys, and cargo fuzz crates. Collectively, these releases better support writing fuzz targets that take well-formed instances of custom input types. This enables us to combine powerful, coverage-guided fuzzers with smart test case generation.

Install or upgrade cargo fuzz with:

cargo install --force cargo-fuzz

To upgrade your fuzz targets, bump your libfuzzer-sys dependency to 0.2.0 on crates.io. That should be all that’s needed for most cases. However, if you were already using Arbitrary inputs for your fuzz target, some changes will be required. See the upgrading fuzz targets section below for more details.

Fuzzing with Well-Formed, Custom Inputs

Imagine we are testing an ELF object file parser. In this scenario, it makes sense to fuzz the parser by feeding it raw byte buffers as input. Parsers for binary formats take raw byte buffers as input; there’s no impedance mismatch here. Additionally, coverage-guided fuzzers like libFuzzer naturally generate byte buffers as inputs for fuzz targets, so getting fuzzing up and running is easy.

Now instead imagine we are testing a color conversion library: converting RGB colors to HSL colors and back. Our color conversion functions don’t take raw byte buffers as inputs, they take Rgb or Hsl structures. And these structures are defined locally by our library; libFuzzer doesn’t have any knowledge of them and can’t generate instances of them on its own.

We could write a color string parser, parse colors from the fuzzer-provided raw input bytes, and then pass the parsed results into our color conversion functions. But now the fuzzer is going to spend a bunch of time exploring and testing our color string parser before it can get “deeper” into the fuzz target, where our color conversions are. Small changes to the input can result in invalid color strings that bounce off the parser. Ultimately, our testing is less efficient than we want, because our real goal is to exercise the conversions but we’re doing all this other stuff. On top of that, there’s this hump we have to get over to start fuzzing, since we have to write a bunch of parsing code if we didn’t already have it.

This is where the Arbitrary trait comes in. Arbitrary lets us create structured inputs from raw byte buffers with as thin a veneer as possible. As best it can, it preserves the property that small changes to the input bytes lead to small changes in the corresponding Arbitrary instance constructed from those input bytes. This helps coverage-guided fuzzers like libFuzzer efficiently explore the input space. The new 0.3.0 release of arbitrary contains an overhaul of the trait’s design to further these goals.

Implementing Arbitrary for custom types is easy: 99% of the time, all we need to do is automatically derive it. So let’s do that for Rgb in our color conversion library:

use arbitrary::Arbitrary;

#[derive(Arbitrary, Debug, PartialEq, Eq)]
pub struct Rgb {
    r: u8,
    g: u8,
    b: u8,
}

The libfuzzer-sys crate lets us define fuzz targets where the input type is anything that implements Arbitrary:

// The fuzz target takes a well-formed `Rgb` as input!
libfuzzer_sys::fuzz_target!(|rgb: Rgb| {
    let hsl = rgb.to_hsl();
    let rgb2 = hsl.to_rgb();

    // RGB -> HSL -> RGB is the identity function. This
    // property should hold true for all RGB colors!
    assert_eq!(rgb, rgb2);
});

Now we have a fuzz target that works with our custom input type directly, is the thinnest abstraction possible over the raw, fuzzer-provided input bytes, and we didn’t need to write any custom parsing logic ourselves!

This isn’t limited to simple structures like Rgb. The arbitrary crate provides Arbitrary implementations for everything in std that you’d expect it to: bool, u32, f64, String, Vec, HashMap, PathBuf, etc… Additionally, the custom derive works with all kinds of structs and enums, as long as each sub-field implements Arbitrary.

For more details check out the new Structure-Aware Fuzzing section of The Rust Fuzzing Book and the arbitrary crate. The book has another neat example, where we’re testing a custom allocator implementation and the fuzz target takes a sequence of malloc, realloc, and free commands.
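
To give a flavor of what that looks like, here is a sketch along the same lines (the names are mine, not the book’s exact code): the input type is just an enum of commands that derives Arbitrary.

use arbitrary::Arbitrary;

// My own sketch of the book's allocator example; names are illustrative.
#[derive(Arbitrary, Debug)]
enum AllocatorCommand {
    Malloc { size: usize },
    Free { index: usize },
    Realloc { index: usize, new_size: usize },
}

libfuzzer_sys::fuzz_target!(|commands: Vec<AllocatorCommand>| {
    // Replay each command against the allocator under test, checking
    // invariants along the way (the actual checks are omitted here).
    for command in commands {
        match command {
            AllocatorCommand::Malloc { size } => { /* allocate `size` bytes */ }
            AllocatorCommand::Free { index } => { /* free the `index`-th live allocation */ }
            AllocatorCommand::Realloc { index, new_size } => { /* grow or shrink it */ }
        }
    }
});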

Bonus: Improved UX in cargo fuzz

When cargo fuzz finds a failing input, it will display the Debug formatting of the failing input (particularly nice with custom Arbitrary inputs) and suggest common next tasks, like reproducing the failure or running test case minimization.

For example, if we write a fuzz target that panics when r < g < b for a given Rgb instance, libFuzzer will very quickly find an input that triggers the failure, and then cargo fuzz will give us this friendly output:

Failing input:

    fuzz/artifacts/my_fuzz_target/crash-7bb2b62488fd8fc49937ebeed3016987d6e4a554

Output of `std::fmt::Debug`:

    Rgb {
        r: 30,
        g: 40,
        b: 110,
    }

Reproduce with:

    cargo fuzz run my_fuzz_target fuzz/artifacts/my_fuzz_target/crash-7bb2b62488fd8fc49937ebeed3016987d6e4a554

Minimize test case with:

    cargo fuzz tmin my_fuzz_target fuzz/artifacts/my_fuzz_target/crash-7bb2b62488fd8fc49937ebeed3016987d6e4a554

Upgrading Fuzz Targets

First, make sure you’ve upgraded cargo fuzz:

cargo install --force cargo-fuzz

Next, upgrade your libfuzzer-sys from a git dependency to the 0.2.0 version on crates.io in your Cargo.toml:

 [dependencies]
-libfuzzer-sys = { git = "https://github.com/rust-fuzz/libfuzzer-sys.git" }
+libfuzzer-sys = "0.2.0"

If your existing fuzz targets were not using custom Arbitrary inputs, and were taking &[u8] slices of raw bytes, then you’re done!

If you’re implementing Arbitrary for your own custom input types, you’ll need to bump your dependency on arbitrary to version 0.3. Unless you have specialized logic in your Arbitrary implementation, we recommend that you use the custom derive.

Enable the custom derive by requiring the "derive" feature:

[dependencies]
arbitrary = { version = "0.3", features = ["derive"] }

And then derive Arbitrary automatically:

use arbitrary::Arbitrary;

#[derive(Arbitrary, Debug)]
struct MyStruct {
    // ...
}

Finally, if you do have specialized logic in your Arbitrary implementation, and can’t use the custom derive, your implementations will change something like this:

 use arbitrary::{Arbitrary, Unstructured};

 impl Arbitrary for MyType {
-    fn arbitrary<U>(u: &mut U) -> Result<Self, U::Error>
-    where
-        U: Unstructured + ?Sized,
-    {
+    fn arbitrary(
+        u: &mut Unstructured
+    ) -> arbitrary::Result<Self> {
         // ...
     }
 }

The trait has been simplified a little bit, and Unstructured is a concrete type now, rather than a trait you need to parameterize over. Check out the CHANGELOG for more details.

CHANGELOGs

Thank You! 💖

Thanks to everyone who contributed to these releases!

  • Alex Rebert
  • koushiro
  • Manish Goregaokar
  • Nick Fitzgerald
  • Simonas Kazlauskas

And a special shout out to Manish for fielding so many pull request reviews!

FAQ

What is fuzzing?

Fuzzing is a software testing technique used to find security, stability, and correctness issues by feeding pseudo-random data as input to the software.

Learn more about fuzzing Rust code in The Rust Fuzzing Book!

How is all this different from quickcheck and proptest?

If you’re familiar with quickcheck or proptest and their own versions of the Arbitrary trait, you might be wondering what the difference is between what’s presented here and those tools.

The primary goal of what’s been presented here is to have a super-thin, efficient layer on top of coverage-guided, mutation-based fuzzers like libFuzzer. That means the paradigm is a little different from quickcheck and proptest. For example, arbitrary::Arbitrary doesn’t take a random number generator like quickcheck::Arbitrary does. Instead it takes an Unstructured, which is a helpful wrapper around a raw byte buffer provided by the fuzzer. It’s similar to libFuzzer’s FuzzedDataProvider. The goal is to preserve, as much as possible, the actual input given to us by the fuzzer, and make sure that small changes in the raw input lead to small changes in the value constructed via arbitrary::Arbitrary. Similarly, we don’t want different, customizable test case generation strategies like proptest supports, because we leverage the fuzzer’s insight into code coverage to efficiently explore the input space. We don’t want to get in the way of that, and step on the fuzzer’s toes.
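
For example, here is roughly how an Arbitrary value gets built by hand from raw fuzzer bytes (a sketch that assumes the Rgb type from earlier and the 0.3-style Unstructured constructor):

use arbitrary::{Arbitrary, Unstructured};

// Sketch: assumes the `Rgb` type from earlier (which derives Arbitrary)
// and that `Unstructured::new` takes the raw byte slice, as in 0.3.
fn build_from_raw_bytes() -> arbitrary::Result<Rgb> {
    // In a real fuzz run these bytes come from libFuzzer, not a literal.
    let raw_bytes: &[u8] = &[30, 40, 110, 7];
    let mut u = Unstructured::new(raw_bytes);
    Rgb::arbitrary(&mut u)
}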

This isn’t to say that quickcheck and proptest are without value! On the contrary, if you are not using a coverage-guided fuzzer like libFuzzer or AFL — perhaps because you’re working on an unsupported platform — and are instead using a purely-random, generation-based fuzzing setup, then both quickcheck and proptest are fantastic options to consider.

David TellerLayoff survival guide

If you’re reading these lines, you may have recently been laid off from your job. Or maybe, depending on your country and its laws, you’re waiting to know if you’re being laid off.

Well, I’ve been there and I’ve survived it, so, based on my experience, here are a few suggestions:

The Mozilla BlogReadying for the Future at Mozilla

Mozilla must do two things in this era: Continue to excel at our current work, while we innovate in the areas most likely to impact the state of the internet and internet life. From security and privacy network architecture to the surveillance economy, artificial intelligence, identity systems, control over our data, decentralized web and content discovery and disinformation — Mozilla has a critical role to play in helping to create product solutions that address the challenges in these spaces.

Creating the new products we need to change the future requires us to do things differently, including allocating resources for this purpose. We’re making a significant investment to fund innovation. In order to do that responsibly, we’ve also had to make some difficult choices which led to the elimination of roles at Mozilla which we announced internally today.

Mozilla has a strong line of sight on future revenue generation from our core business. In some ways, this makes this action harder, and we are deeply distressed about the effect on our colleagues. However, to responsibly make additional investments in innovation to improve the internet, we can and must work within the limits of our core finances.

We make these hard choices because online life must be better than it is today. We must improve the impact of today’s technology. We must ensure that the tech of tomorrow is built in ways that respect people and their privacy, and give them real independence and meaningful control. Mozilla exists to meet these challenges.

The post Readying for the Future at Mozilla appeared first on The Mozilla Blog.

Hacks.Mozilla.OrgHow we built Picture-in-Picture in Firefox Desktop with more control over video

Picture-in-Picture support for videos is a feature that we shipped to Firefox Desktop users in version 71 for Windows users, and 72 for macOS and Linux users. It allows the user to pull a <video> element out into an always-on-top window, so that they can switch tabs or applications, and keep the video within sight — ideal if, for example, you want to keep an eye on that sports game while also getting some work done.

As always, we designed and developed this feature with user agency in mind. Specifically, we wanted to make it extremely easy for our users to exercise greater control over how they watch video content in Firefox.

Firefox is shown playing a video, and a mouse cursor enters the frame. Upon clicking on the Picture-in-Picture toggle on the video, the video pops out into its own always-on-top player window. (Caption: Using Picture-in-Picture in Firefox is this easy!)

In these next few sections, we’ll talk about how we designed the feature and then we’ll go deeper into details of the implementation.

The design process

Look behind and all around

To begin our design process, we looked back at the past. In 2018, we graduated Min-Vid, one of our Test Pilot experiments. We asked the question: “How might we maximize the learning from Min-Vid?“. Thanks to the amazing Firefox User Research team, we had enough prior research to understand the main pain points in the user experience. However, it was important to acknowledge that the competitive landscape had changed quite a bit since 2018. How were users and other browsers solving this problem already? What did users think about those solutions, and how could we improve upon them?

We had two essential guiding principles from the beginning:

  1. We wanted to turn this into a very user-centric feature, and make it available for any type of video content on the web. That meant that implementing the Picture-in-Picture spec wasn’t an option, as it requires developers to opt-in first.
  2. Given that it would be available on any video content, the feature needed to be discoverable and straight-forward for as many people as possible.

Keeping these principles in mind helped us to evaluate all the different solutions, and was critical for the next phase.

Three sketches showing a possible drag and drop interaction for picture-in-picture. (Caption: Exploring different interactions for Picture-in-Picture)

Try, and try again

Once we had an understanding of how others were solving the problem, it was our turn to try. We wanted to ensure discoverability without making the feature intrusive or annoying. Ultimately, we wanted to augment — and not disrupt — the experience of viewing video. And we definitely didn’t want to cause issues with any of the popular video players or platforms.

A screenshot of a YouTube page with a small blue rectangle on the right edge of the video, center aligned. (Caption: A screenshot of one of our early prototypes)

This led us to building an interactive, motion-based prototype using Framer X. Our prototype provided a very effective way to get early feedback from real users. In tests, we didn’t focus solely on usability and discoverability. We also took the time to re-learn the problems users are facing. And we learned a lot!

The participants in our first study appreciated the feature, and while it did solve a problem for them, it was too hard to discover on their own.

So, we rolled our sleeves up and tried again. We knew what we were going after, and we now had a better understanding of users’ basic expectations. We explored, brainstormed solutions, and discussed technical limitations until we had a version that offered discoverability without being intrusive. After that, we spent months polishing and refining the final experience!

Stay tuned

From the beginning, our users have been part of the conversation. Early and ongoing user feedback is a critical aspect of product design. It was particularly exciting to keep Picture-in-Picture in our Beta channel as we engaged with users like you to get your input.

We listened, and you helped us uncover new blind spots we might have missed while designing and developing. At every phase of this design process, you’ve been there. And you still are. Thank you!

Implementation detail

The Firefox Picture-in-Picture toggle exists in the same privileged shadow DOM space within the <video> element as the built-in HTML <video> controls. Because this part of the DOM is inaccessible to page JavaScript and CSS stylesheets, it is much more difficult for sites to detect, disable, or hijack the feature.

Into the shadow DOM

Early on, however, we faced a challenge when making the toggle visible on hover. Sites commonly structure their DOM such that mouse events never reach a <video> that the user is watching.

Often, websites place transparent nodes directly over top of <video> elements. These can be used to show a preview image of the underlying video before it begins, or to serve an interstitial advertisement. Sometimes transparent nodes are used for things that only become visible when the user hovers the player — for example, custom player controls. In configurations like this, transparent nodes prevent the underlying <video> from matching the :hover pseudo-class.

Other times, sites make it explicit that they don’t want the underlying <video> to receive mouse events. To do this, they set the pointer-events CSS property to none on the <video> or one of its ancestors.

To work around these problems, we rely on the fact that the web page is being sent events from the browser engine. At Firefox, we control the browser engine! Before sending out a mouse event, we can check to see what sort of DOM nodes are directly underneath the cursor (re-using much of the same code that powers the elementsFromPoint function).

If any of those DOM nodes are a visible <video>, we tell that <video> that it is being hovered, which shows the toggle. Likewise, we use a similar technique to determine if the user is clicking on the toggle.

We also use some simple heuristics based on the size, length, and type of video to determine if the toggle should be displayed at all. In this way, we avoid showing the toggle in cases where it would likely be more annoying than not.

A browser window within a browser

The Picture-in-Picture player window itself is a browser window with most of the surrounding window decoration collapsed. Flags tell the operating system to keep it on top. That browser window contains a special <video> element that runs in the same process as the originating tab. The element knows how to show the frames that were destined for the original <video>. As with much of the Firefox browser UI, the Picture-in-Picture player window is written in HTML and powered by JavaScript and CSS.

Other browser implementations

Firefox is not the first desktop browser to ship a Picture-in-Picture implementation. Safari 10 on macOS Sierra shipped with this feature in 2016, and Chrome followed in late 2018 with Chrome 71.

In fact, each browser maker’s implementation is slightly different. In the next few sections we’ll compare Safari and Chrome to Firefox.

Safari

Safari’s implementation involves a non-standard WebAPI on <video> elements. Sites that know the user is running Safari can call video.webkitSetPresentationMode("picture-in-picture"); to send a video into the native macOS Picture-in-Picture window.

Safari includes a context menu item for <video> elements to open them in the Picture-in-Picture window. Unfortunately, this requires an awkward double right-click to access video on sites like YouTube that override the default context menu. This awkwardness is shared with all browsers that implement the context menu option, including Firefox.

(Caption: Safari’s video context menu on YouTube.)

Safari users can also right-click on the audio indicator in the address bar or the tab strip to trigger Picture-in-Picture:

The Safari web browser playing a video, with the context menu for the audio toggle in the address bar displayed. “Enter Picture in Picture” is one of the menu items. (Caption: Here’s another way to trigger Picture-in-Picture in Safari.)

On newer MacBooks, Safari users might also notice the button immediately to the right of the volume-slider. You can use this button to open the currently playing video in the Picture-in-Picture window:

A close-up photograph of the MacBook Pro touchbar when a video is playing. There is an icon next to the playhead scrubber that opens the video in an always-on-top player window. (Caption: Safari users with more recent MacBooks can use the touchbar to enter Picture-in-Picture too.)

Safari also uses the built-in macOS Picture-in-Picture API, which delivers a very smooth integration with the rest of the operating system.

Comparison to Firefox

Despite this, we think Firefox’s approach has some advantages:

  • When multiple videos are playing at the same time, the Safari implementation is somewhat ambiguous as to which video will be selected when using the audio indicator. It seems to be the most recently focused video, but this isn’t immediately obvious. Firefox’s Picture-in-Picture toggle makes it extremely obvious which video is being placed in the Picture-in-Picture window.
  • Safari appears to have an arbitrary limitation on how large a user can make their Picture-in-Picture player window. Firefox’s player window does not have this limitation.
  • There can only be one Picture-in-Picture window system-wide on macOS. If Safari is showing a video in Picture-in-Picture, and then another application calls into the macOS Picture-in-Picture API, the Safari video will close. Firefox’s window is Firefox-specific. It will stay open even if another application calls the macOS Picture-in-Picture API.

Chrome’s implementation

The PiP WebAPI and WebExtension

Chrome’s implementation of Picture-in-Picture mainly centers around a WebAPI specification being driven by Google. This API is currently going through the W3C standardization process. Superficially, this WebAPI is similar to the Fullscreen WebAPI. In response to user input (like clicking on a button), site authors can request that a <video> be put into a Picture-in-Picture window.

Like Safari, Chrome also includes a context menu option for <video> elements to open in a Picture-in-Picture window.

The Chrome web browser playing a video, with the context menu for the video element hovering over top of it. “Picture in Picture” is one of the menu items. (Caption: Chrome’s video context menu on YouTube.)

This proposed WebAPI is also used by a PiP WebExtension from Google. The extension adds a toolbar button. The button finds the largest video on the page, and uses the WebAPI to open that video in a Picture-in-Picture window.

The Chrome web browser playing a video. The mouse cursor clicks a button in the toolbar provided by a WebExtension which pops the video out into an always-on-top player window. (Caption: There’s also a WebExtension for Chrome that adds a toolbar button for opening Picture-in-Picture.)

Google’s WebAPI lets sites indicate that a <video> should not be openable in a Picture-in-Picture player window. When Chrome sees this directive, it doesn’t show the context menu item for Picture-in-Picture on the <video>, and the WebExtension ignores it. The user is unable to bypass this restriction unless they modify the DOM to remove the directive.

Comparison to Firefox

Firefox’s implementation has a number of distinct advantages over Chrome’s approach:

  • The Chrome WebExtension only targets the largest <video> on the page. In contrast, the Picture-in-Picture toggle in Firefox makes it easy to choose any <video> on a site to open in a Picture-in-Picture window.
  • Users have access to this capability on all sites right now. Web developers and site maintainers do not need to develop, test and deploy usage of the new WebAPI. This is particularly important for older sites that are not actively maintained.
  • Like Safari, Chrome seems to have an artificial limitation on how big the Picture-in-Picture player window can be made by the user. Firefox’s player window does not have this limitation.
  • Firefox users have access to this Picture-in-Picture capability on all sites. Websites are not able to directly disable it via a WebAPI. This creates a more consistent experience for <video> elements across the entire web, and ultimately more user control.

Recently, Mozilla indicated that we plan to defer implementation of the WebAPI that Google has proposed. We want to see if the built-in capability we just shipped will meet the needs of our users. In the meantime, we’ll monitor the evolution of the WebAPI spec and may revisit our implementation decision in the future.

Future plans

Now that we’ve shipped the first version of Picture-in-Picture in Firefox Desktop on all platforms, we’re paying close attention to user feedback and bug intake. Your inputs will help determine our next steps.

Beyond bug fixes, we’d like to share some of the things we’re considering for future feature work:

  • Repositioning the toggle when there are visible, clickable elements overlapping it.
  • Supporting video captions and subtitles in the player window.
  • Adding a playhead scrubber to the player window to control the current playing position of a <video>.
  • Adding a control for the volume level of the <video> to the player window.

How are you using Picture-in-Picture?

Are you using the new Picture-in-Picture feature in Firefox? Are you finding it useful? Please let us know in the comments section below, or send us a Tweet with a screenshot! We’d love to hear what you’re using it for. You can also file bugs for the feature here.

The post How we built Picture-in-Picture in Firefox Desktop with more control over video appeared first on Mozilla Hacks - the Web developer blog.

Doug BelshawStrategic approaches to the development of digital literacies

Updated! Slides now include outputs and link below to archive.org backup

AMICAL 2020 pre-conference workshop

Slides: http://bit.ly/AMICAL-digilit
Backup: https://archive.org/details/AMICAL2020

I’m in Kuwait City today, leading a pre-conference workshop for the AMICAL consortium of American international liberal arts institutions, who work together on common goals for libraries, technology and learning.

This isn’t a ‘tools’ session but rather, as the title suggests, a strategic look at developing digital literacies across institutions.

This workshop will cover the eight essential elements of digital literacies, exploring ways in which AMICAL institutions can benefit from a strategic approach to the area. The sessions will be of particular use to those who wish to think critically about the role of universities in 21st century society. Participants will leave the workshop empowered with the knowledge and skills to begin implementing digital literacies in a relevant context at their home institution.

I intend to update this post with a backup of the slides in PDF format on archive.org after the workshop. Done!

Zibi BranieckiThe New Localization System for Firefox is in!

After nearly 3 years of work, 13 Firefox releases, 6 milestones and a lot of bits flipped, I’m happy to announce that the project of integrating the Fluent Localization System into Firefox is now complete!

It means that we consider Fluent to be well integrated into Gecko and ready to be used as the primary localization system for Firefox!

Below is a story of how that happened.

3 years of history

At Mozilla All-Hands in December 2016 my team at the time (L10n Drivers) presented a proposal for a new localization system for Firefox and Gecko – Fluent (code name at the time – “L20n“).

The proposal was sound, but at the time the organization was crystallizing its vision for what later became known as Firefox Quantum, and couldn’t afford to pull additional people in to make the required transition or to risk the stability of Firefox during the push for Quantum.

Instead, we developed a plan to spend the Quantum release cycle bringing Fluent to 1.0, modernizing the internationalization stack in Gecko and getting everything ready, so that once the Quantum release was complete, we’d be ready to land Fluent in Firefox!

<figcaption>Original schema of the proposed system integration into Gecko</figcaption>

We divided the work between two main engineers on the project – Staś Małolepszy took the lead of Fluent itself, while I became responsible for integrating it into Firefox.

My initial task was to refactor all of the locale management and higher-level internationalization integration (date/time formatting, number formatting, plural rules etc.) to unify around a common Unicode-backed model, all while avoiding any disruption to the Quantum project and, by all means, avoiding any regressions.

I documented the progress of the first half of 2017 in a blog post, “Multilingual Gecko in 2017”, which became a series of reports on the progress of our internationalization module and culminated in a summary of the whole rearchitecture, which involved rewriting 90% of the code in the intl::locale component.

Around May 2017, we had ICU enabled in all builds and all the required APIs in place, including the unified mozilla::intl::LocaleService, and the time had come to plan how we were going to integrate Fluent into Gecko.

Planning

Measuring

Before we began, we wanted to understand what success would mean and how we were going to measure progress.

Stating that we aimed to make Fluent a full replacement for the previous localization systems in Firefox (DTD and .properties) may sound overwhelming. The path from landing the new API in Gecko to having all of our UI migrated would likely take years and many engineers, and without a good way to measure our progress, we’d be unable to evaluate it.

<figcaption>Original draft of a per-component dashboard</figcaption>

Together with Axel, Staś and Francesco, we spent a couple of days in Berlin going back and forth on what we should measure. After brainstorming through ideas such as fluent-per-component, fluent-per-XUL-widget and so on, we eventually settled on the simplest one – the percentage of localization messages that use Fluent.

<figcaption>Original draft of a global percentage view</figcaption>

We knew we could answer more questions with more detailed breakdowns, but each additional metric required additional work to collect and keep up to date. With limited resources, we slowly gave up on aiming for detail and focused on the big picture.

Getting the raw percentage of strings in Fluent to start with, and then adding more details, allowed us to get the measurements up quickly and have them available independently of further additions. Big picture first.

Staś took ownership of the measuring dashboard, wrote the code and the UI, and soon after we had https://www.arewefluentyet.com running!

<figcaption>AreWeFluentYet.com as of January 12th 2020</figcaption>

Later, with the help from Eric Pang, we were able to improve the design and I added two more specific milestones: Main UI, and Startup Path.

The dashboard is immensely useful, both for monitoring progress and for evangelizing the effort, and today if you visit any Mozilla office around the world, you’ll see it cycling through on the screens in common areas!

Target Component

To begin, we needed to get agreement with the Firefox Product Team on the intended change to their codebase, and select a target for the initial migration to validate the new technology.

We had a call with the Firefox Product Lead who advised that we start with migrating Preferences UI as a non-startup, self-contained, but sufficiently large piece of UI.

It felt like the right scale. Not starting with the startup path limited the risk of breaking people’s Nightly builds, and the UI itself is complex enough to test Fluent against large chunks of text, giving our team and the Firefox engineers time to verify that the API worked as expected.

We knew the main target would be Preferences, but we couldn’t just start migrating all of it. We needed smaller steps to validate that the whole ecosystem was ready for Fluent, and we needed to plan separate steps to enable Fluent everywhere.

I split the whole project into 6 phases, each one gradually building on top of the previous ones.

<figcaption>Outline of the phases used by the stakeholders to track progress</figcaption>

Phase 1 – Soft Launch (May 2017 – November 2017)

In this phase we intend to introduce a single, new, trivial message using Fluent into Firefox. That will require all core systems, like LocaleService, Intl, L10nRegistry, Fluent, Fluent-DOM, and Fluent-Gecko, to work. On top of that, we’ll need compare-locales and Pontoon support. In this phase we will intentionally keep the new API under control with a whitelist of enabled .ftl files to remove the risk of engineers starting to use our API prematurely.

Bug 1402061 – Soft-launch the new localization API
<figcaption>Outline of Phase 1</figcaption>

Introducing a new localization system is already a huge challenge, and if we caused regressions, it’d be much harder to get buy-in from the organization for future work. We needed that trust, and had to be careful not to lose it.

With the aim of migrating all eleven thousand strings in Firefox to Fluent, having a whole phase devoted to migrating just one may seem like overkill, but we wanted to start very small and be very careful.

The first phase landed just a single string in our JavaScript code, behind a flag intl.l10n.fluent.disabled, which we could flip on/off.

It’s easy to underestimate how much ecosystem alignment work is needed, and a milestone like this is a great way to expose it.

In order to complete this milestone, we had to remodel our language packs, localization tools, build system and other bits. A lot of small and medium size blockers were discovered quite late in the cycle, contributing to a delay of close to 6 weeks and a total time of 3 months to migrate a single string!

Eventually, the string landed in Firefox 58, and while the patch itself may seem small, the dependency tree of changes that were required to get there tells a different story.

In the end, with support from Axel Hecht and Matjaz Horvat, we were able to complete this cycle in time for the Firefox 58 train, and move on to the next one!

Phase 2 – First Migration

In this phase we will migrate the first piece of UI to Fluent. It will be a stable and fairly hidden piece of UI that is unlikely to change within months. We’ll also want the strings involved to be as simple as possible. This phase will test our migration infrastructure in Fluent, Gecko and Pontoon.

Bug 1407256 – First-migration release for the new localization API
<figcaption>Outline of Phase 2</figcaption>

In Phase 1, we added a string. But with close to 11 thousand strings in Firefox, migration of existing strings was known to be a substantial task on its own.

Staś wrote a new application devoted to facilitating migration by applying “migration recipes”. It works like this: first, we write a patch that migrates the strings and callsites in our codebase. Then, we generate a small Python script that is later used by the migration application to take existing translations from all 100+ locales and place them in the new Fluent files.

The intended result is that we can migrate from the old systems to the new one without asking our localizers to manually re-translate all the new strings that appear as a result of the migration.

In Firefox 59, we migrated 5 strings from DTD to Fluent, and successfully applied the produced migration recipe to all the locales Firefox is translated into!

Phase 3 – Preferences

In this phase, we will migrate a single, complex, component off the start up path – Preferences. It’ll require us to prove Fluent integration with Intl, and Pontoon handling of more complex FTL features. We will be closely working with the most senior Firefox Front-End Team engineers as reviewers and reacting to their feedback on missing and required features in Fluent.

<figcaption>Outline of Phase 3</figcaption>

The jump between Phase 2 and Phase 3 was quite massive – from 5 strings to ~1600. The Preferences UI is a text-heavy, large component which is important to the regular user experience, and any regression in this area would be considered critical.

As you can see from the outline, this phase was divided into a very large number of bugs that were tracked separately and, depending on which Fluent features a given part of the UI used, had different dependencies and blockers.

The area of concern in this phase shifted from refactoring the ecosystem, bindings and tooling back to the core of Fluent, as we needed to finalize many of its features, such as DOM Overlays and the PLATFORM selector, and update Fluent itself to 0.6.

With the flock of new strings landing in our repository, Pontoon – our web localization platform – had to add support for many Fluent features that were now used in production.

This phase was particularly long, as we were monitoring the impact of our changes and fine-tuning the whole network of tools with each piece of UI migrated, but in the end, we were able to migrate all of the Preferences to Fluent, significantly simplify the localization API usage, and maintain the performance characteristics.

<figcaption>Statistics on Preferences UI L10n API between Firefox 57 and Firefox 71</figcaption>

The most impressive number here is the 10-fold reduction in JS calls. It means that we removed 180 places where JS code had to retrieve a string and push it into the UI, replacing all of them with declarative bindings. Declarative bindings are much less error prone, easier to handle asynchronously, and easier to maintain with tools.
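
To make the difference concrete, here is a rough sketch of the two styles in privileged JavaScript. It is only an illustration: the button variable and the message id are made up, and it assumes the DOMLocalization API exposed as document.l10n.

// Imperative style (what we removed): JS fetches the translation and pushes it
// into the DOM (assumes an async context).
const label = await document.l10n.formatValue("restart-button-label");
button.textContent = label;

// Declarative style (what replaced it): JS only associates the element with a
// message id, and the localization system keeps the element's text up to date.
document.l10n.setAttributes(button, "restart-button-label");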

Firefox 62 shipped with over 1600 Fluent strings in production!

Phase 4 – Open Migration

In this phase we will start migrating more pieces of Firefox front-end to Fluent one by one. All work will be tightly monitored by the L10n Drivers team to allow us to evaluate readiness of our system for each component, postpone it if needed, and adjust the speed of transition on the go.

<figcaption>Outline of Phase 4</figcaption>

After Firefox 62, we started two phases in parallel.

In Phase 4 – Open Migration – we intended to build on the work we had done in Phase 3.

With over 1600 strings migrated, we started cleaning up our integration code and moving more of it deep into the DOM API. In particular, we integrated the main entry point for Fluent – document.l10n – into our Document WebIDL, making it available to all internal UI documents.

Knowing that we could migrate chunks of strings as large and complex as the ones we encountered in Preferences, we were able to continuously migrate batches of strings and extend Fluent coverage to other areas of the Firefox UI, such as System Add-Ons and non-privileged documents (about:neterror etc.).

At the same time, my focus shifted to the most challenging phase – Phase 5.

Phase 5 – Startup Path

In this phase we expect to be ready to enable Fluent on the startup path of Firefox. This phase may happen at the same time as the previous one, but if we encounter delays, we can specifically postpone this one without blocking the previous one.

Bug 1441035 – Improve performance of Fluent on the startup path
<figcaption>Outline of Phase 5</figcaption>

Despite how small the outline is, we knew that Phase 5 would be the longest and most challenging one.

We had one goal here – enable Fluent on the startup path without regressing performance.

The previous localization systems were very simple and well integrated into Gecko. Over 10 years of performance profiling and optimization had led to very tight and optimized code paths for DTD and .properties, which we now had to replicate with Fluent in order to enable its use on the startup path.

Initial Fluent numbers, even from before we started this project, indicated a 15-30 ms performance regression on startup, and performance profiling showed that the majority of that cost came from using JavaScript to apply translations to the DOM.

JavaScript was a very good choice for the prototyping phase of Fluent, but with the shift from the design phase to the implementation phase, we had to remove the cost of calling JS from C++ and C++ from JS.

The bulk of the work went into migrating all the pieces of Fluent that interact with the DOM to C++. On top of that, with help from Dave Townsend and Olli Pettay, I was able to hook the Fluent localization cycle into the XUL cache to get on par with what DTD was doing.

There was one more tricky piece to add. Originally, per request from Benjamin Smedberg, Fluent was designed to be fully asynchronous, but during a design brainstorm on the startup path model with Boris Zbarsky, he asked me to add a synchronous mode which would be used just for the startup path.

The rationale is that while having localization resource I/O not block the UI makes sense in almost all cases, there is no reason to delay the I/O for the resources needed to paint the initial UI.

Fortunately, adding synchronous mode (triggered by setting data-l10n-sync attribute on the root element of the document) to an asynchronous code is much easier than doing the reverse, and with the help from Axel Hecht, I was able to get this dual-mode to work!

In the end, this phase took close to a year, and we finally completed it in August of 2019, enabling the first strings in the main browser UI to be migrated away from DTD to Fluent!

Phase 6 – Full Launch

In this phase we will remove the whitelist and let the developers start using the new API. We’ll still monitor their work and will ask for a period of adding us as reviewers on any patch that uses the new API, until we gain confidence in the new system.

<figcaption>Outline of Phase 6</figcaption>

With Fluent available and tested in various major pieces of the Firefox UI, such as Preferences, the startup path, privileged and non-privileged documents and add-ons, Phase 6 allowed us to wrap up the loose ends and tighten the experience of working with Fluent.

We improved error messages, tests, and migration recipes, and continuously migrated more of Firefox UI to Fluent with increasing confidence that the system holds well and is a capable replacement for the older systems.

Finally, in November, we decided that the remaining items in Phase 6 were not of high priority, and with ~75% of DTD strings removed and close to 4000 Fluent strings in place, we announced the deprecation of DTD as a localization system in Firefox. That, symbolically, marked the completion of this project!

Takeaways

Here’s a subset of the lessons learned we accumulated from our post-mortems organized at the end of each of the six phases:

  • Start with a very small and well defined milestone.
  • Keep an up-to-date single-view dashboard of project status to align stakeholders.
  • Hold kick-off and post-mortem meetings even if it seems mundane. They create space for stakeholders to improve the process.
  • Be vocal about delays. They accumulate fast in the phantom traffic jam model.
  • Syntax changes against in-production code are much harder than API changes.
  • When working cross-team, document everything. External stakeholders are driven toward well-documented projects.
  • Divide a large project into small phases in an exponential rather than linear model.
  • Landing a major in-development project into Gecko over a 3 year span is very challenging. There’s movement below (Gecko is a moving target) and above (Fluent was a moving target). Integration ends up being a very delicate task.
  • The best technology stack for prototyping is likely not going to be the best stack for production. Make decisions about how you’re going to handle that.

Next steps

Today, we have only 1100 DTD strings left from the original ~4000, and are focused on removing the 500 of them which are still on the startup path.

This is not the end of the work yet, as both Fluent development and its integration into Firefox remain active, but it is a symbolic milestone for all those involved: we have now completed a task that we clearly defined for ourselves in December 2016, and it is a rare occurrence in the software engineering realm for a project to stay on track for so long and deliver closure in line with the original expectations.

We’re also bringing Fluent as an input proposal to the newly formed Unicode Message Format Working Group, with the hopes of working with the whole industry to develop a future Unicode Standard.

In Firefox, in 2020 we hope to eradicate all DTD calls, bring new features to Fluent, deprecate .properties, migrate Fluent in Gecko to fluent-rs, and start building new capabilities that become possible when our engine uses a single, modern, unified localization system. Onwards!

Experience

It was an exciting and challenging project. It spanned from the beginning of 2017 till the end of 2019, and challenged me in many new ways.

I had to design a multi-stage, multi-year roadmap; coordinate the effort between multiple teams and people, impacting a high number of modules in a product installed on hundreds of millions of machines; keep track of the progress of work for many engineers on various teams such as build system, add-ons, DOM, performance, front end, security and l10n; run post-mortems for each phase; and balance the workload while ensuring minimal time was spent being blocked.

In parallel, Staś had to plan and lead the standardization of the Fluent syntax and the implementation of the API. The parallelism of those two efforts was both a blessing and a curse for us.

Having a tight loop in which we were able to test the revisions against production helped us avoid conceptual rabbit holes and also helped us shape Fluent much faster. At the same time, the growing number of strings that already used Fluent in Firefox became an increasingly strong factor limiting our decisions and forcing us to make sub-optimal compromises from the design perspective, just to avoid having to refactor all the already-in-production bits that relied on the earlier prototypes.

While the project was challenging on many fronts and encountered numerous delays, and each post-mortem collected many lessons learned and things to do differently next time, I’m really happy that the roadmap designed at the end of 2016 worked without any major changes all the way to completion 3 years later, and that all stakeholders reported a positive experience working on it and called it a success!

To recognize that achievement, we’re going to hold a small celebration at the upcoming Mozilla All Hands in Berlin!

Mozilla GFXmoz://gfx newsletter #50

Hi there! Another gfx newsletter incoming.

Glenn and Sotaro’s work on integrating WebRender with DirectComposition on Windows is close to being ready. We hope to let it ride the trains for Firefox 75. This will lead to lower GPU usage and energy consumption. Once this is done we plan to follow up with enabling WebRender by default for Windows users with (some subset of) Intel integrated GPUs, which is both challenging (these integrated GPUs are usually slower than discrete GPUs and we have run into a number of driver bugs with them on Windows) and rewarding as it represents a very large part of the user base.

Edit: Thanks to Robert in the comments section of this post for mentioning the Linux/Wayland progress! I copy-pasted it here:

Some additional highlights for the Linux folks: Martin Stránský is making good progress on the Wayland front, especially concerning DMABUF. It will allow better performance for WebGL and hardware decoding for video (eventually). Quoting from https://bugzilla.mozilla.org/show_bug.cgi?id=1586696#c2:

> there’s a WIP dmabuf backend patch for WebGL, I see 100% performance boost with it for simple WebGL samples at GL compositor (it’s even faster than chrome/chromium on my box).

And there is active work on partial damage to reduce power consumption: https://bugzilla.mozilla.org/show_bug.cgi?id=1484812

What’s new in gfx

  • Handyman fixed a crash in the async plugin infrastructure.
  • Botond fixed (2) various data races in the APZ code.
  • Sean Feng fixed another race condition in APZ code.
  • Andrew fixed a crash with OMTP and image decoding.
  • Sotaro fixed a crash with the GL compositor on Wayland.
  • Botond worked with Facebook developers to resolve a scrolling-related usability problem affecting Firefox users on messenger.com, primarily on MacOS.
  • Botond fixed (2) divisions by zero in various parts of the APZ code.
  • Sean Feng added some telemetry for touch input latency.
  • Timothy made sure all uses of APZCTreeManager::mGeckoFixedLayerMargins are protected by the proper mutex.
  • Boris Chiou moved animations of transforms with preserve-3d off the main thread.
  • Jamie clamped some scale transforms at 32k to avoid excessively large rasterized areas.
  • Jonathan Kew reduced the emboldening strength used for synthetic-bold faces with FreeType.
  • Andrew implemented NEON accelerated methods for unpacking RGB to RGBA/BGRA.
  • Alex Henrie fixed a bug in Moz2D’s Skia backend.

What’s new in WebRender

WebRender is a GPU based 2D rendering engine for the web written in Rust, currently powering Firefox‘s rendering engine as well as Mozilla’s research web browser servo.

  • Miko avoided calculating snapped bounds twice for some display items.
  • Kris fixed snapping and rounding errors causing picture caching invalidation when zoomed in.
  • Glenn fixed a picture caching invalidation bug.
  • Kvark ensured shader programs are bound after changing the blend mode. While not necessary for OpenGL, this makes it easier to efficiently implement backends for vulkan and other modern GPU APIs.
  • Glenn refactored the OS compositor abstraction.
  • Jamie implemented a texture upload path that plays better with Adreno OpenGL drivers.
  • Nical prevented invalid glyphs from generating expensive rasterization requests every frame.
  • Nical reduced the number of memory allocations associated with clip chain stacks.
  • Nical reduced the number of memory allocations in various parts of the picture caching code.
  • Glenn fixed a picture caching invalidation issue when scrollbars are disabled.
  • Glenn and Andrew adjusted tradeoffs between text rendering quality and performance.
  • Miko simplified some of the scene building code.
  • Jamie switched to local raster space when animating a double tap zoom to avoid uploading glyphs continuously on Android.
  • Glenn fixed an intermittent compositor surface creation bug.
  • Andrew fixed a shader compilation error on some Android devices.
  • Bert improved the way picture cache tile sizes are selected for scroll bars.
  • Gankra removed some unsafe code in wrench.
  • Glenn fixed an issue with picture cache tile merging heuristics.
  • Glenn fixed tile opacity getting out of sync with compositor surfaces.
  • Glenn added an API for tagging image descriptors as candidates for native compositor surfaces (typically video frames).
  • Sotaro followed up by tagging the appropriate image descriptors on the content side.
  • Andrew removed pixel snapping from most shaders, now that it is handled earlier in the pipeline.
  • Glenn improved the invalidation logic for images with picture caching.
  • Glenn improved the logic to detect identical frames and skip composition.
  • Glenn fixed the shader implementation of rounded rectangles with very small radii.
  • Kris fixed misplaced text selection popup with GeckoView.
  • Markus fixed a ton of issues with WebRender/CoreAnimation integration.
  • Markus shared the depth buffer between OS compositor tiles on MacOS to save memory.
  • Sotaro fixed image bitmap canvases with WebRender.
  • Sotaro fixed a crash at the intersection between picture-in-picture and WebRender frame throttling.
  • Timothy implemented support for respecting fixed layer margins during hit-testing.
  • Timothy implemented GeckoView’s setVerticalClipping API for WebRender.
  • Jeff fixed an SVG rendering bug.
  • Jamie fixed an issue where the screen would remain black after resuming Firefox for Android.

To enable WebRender in Firefox, in the about:config page, enable the pref gfx.webrender.all and restart the browser.

WebRender is available under the MPLv2 license as a standalone crate on crates.io (documentation) for use in your own rust projects.

What’s new in Wgpu

  • Kvark implemented buffer creation and mapping, with an ability to both provide data and read it back from the GPU.
  • Kvark set up the synchronization from Mozilla Central to Github repository.
  • jdashg created a separate category for WebGPU mochitests.
  • Kvark heavily reworked lifetime and usage tracking of resources.
  • Many fixes and improvements were made by the contributors to wgpu (thank you!)

 

Will Kahn-GreeneSwitching from pyup to dependabot


I maintain a bunch of Python-based projects including some major projects like Crash Stats, Mozilla Symbols Server, and Mozilla Location Services. In order to keep up with dependency updates, we used pyup to monitor dependencies in those projects and create GitHub pull requests for updates.

pyup was pretty nice. It would create a single pull request with many dependency updates in it. I could then review the details, wait for CI to test everything, make adjustments as necessary, and then land the pull request and go do other things.

Starting in October of 2019, pyup stopped doing monthly updates. A co-worker of mine tried to contact them to no avail. I don't know what happened. I got tired of waiting for it to start working again.

Since my projects are all on GitHub, we had already switched to GitHub security alerts. Given that, I decided it was time to switch from pyup to dependabot (also owned by GitHub).

Switching from pyup to dependabot

I had to do a bunch of projects, so I ended up with a process along these lines:

  1. Remove projects from pyup.

    All my projects are either in mozilla or mozilla-services organizations on GitHub.

    pyup was configured with a separate service account, so I'm not able to make changes to pyup myself.

    I had to ask Greg to remove my projects from pyup.

    I wouldn't suggest proceeding until your project has been removed from pyup. Otherwise, it's possible you'll get PRs from pyup and dependabot for the same updates.

  2. Add dependabot configuration to repo.

    Then I added the required dependabot configuration to my repository and removed the pyup configuration.

    I used these resources:

    I created a pull request with these changes, reviewed it, and landed it.

  3. Enable dependabot.

    For some reason, I couldn't enable dependabot for my projects. I had to ask Greg who I think asked Hal to enable dependabot for my projects.

    Once this was done, then dependabot created a plethora of pull requests.

While there are Mozilla-specific bits in here, it's probably generally helpful.

Dealing with incoming pull requests

dependabot isn't as nice as pyup was. It can only update one dependency per PR. That stinks for a bunch of reasons:

  1. working through 30 PRs is extremely time consuming

  2. every time you finish up work on one PR, it triggers dependabot to update the others and that triggers email notifications, CI builds, and a bunch of spam and resource usage

  3. dependencies often depend on each other and need to get updated as a group

Since we hadn't been keeping up with Python dependencies, we ended up with between 20 and 60 pull requests to deal with per repository.

For Antenna, I rebased each PR, reviewed it, and merged it by hand. That took a day to do. It sucked. I can't imagine doing this four times every month.

While working on PRs for Socorro, I hit a case where I needed to update multiple dependencies at the same time. I decided to write a tool that combined pull requests.

Thus was born paul-mclendahand, a tool for combining pull requests. Using it, I worked through 20 pull requests for Tecken in about an hour. This saves me tons of time!

My process goes like this:

  1. create a new branch on my laptop based off of master

  2. list all open pull requests by running pmac listprs

  3. make a list of pull requests to combine into it

  4. for each pull request, I:

    1. run pmac add PR

    2. resolve any cherry-pick conflicts

    3. (optional) rebuild my project and run tests

  5. push the new branch to GitHub

  6. create a pull request

  7. run pmac prmsg and copy-and-paste the output as the pull request description

I can then review the pull request. It has links to the other pull requests and the data that dependabot puts together for each update. I can rebase, add additional commits, etc.

When I'm done, I merge it and that's it!

paul-mclendahand v1.0.0

I released paul-mclendahand 1.0.0!

Install it with pipx:

pipx install paul-mclendahand

Install it with pip:

pip install paul-mclendahand

It doesn't just combine pull requests from dependabot--it's general and can work on any pull requests.

If you find any issues, please report them in the issue tracker.

I hope this helps you!

Daniel StenbergBackblazed

I’m personally familiar with Backblaze as a fine backup solution I’ve helped my parents in law setup and use. I’ve found it reliable and easy to use. I would recommend it to others.

Over the Christmas holidays 2019, someone emailed me and mentioned that Backblaze has stated that they use libcurl, yet there’s no license or other information about this anywhere in the current version, nor on their web site. (I’m always looking for screenshotted curl credits, or for data to use as input when trying to figure out how many curl installations there are or how many internet transfers per day are done with curl…)

libcurl is MIT licensed (well, a slightly edited MIT license) so there’s really not a lot a company needs to do to follow the license, nor does it leave me with a lot of “muscle” or remedies in case anyone blatantly refused to adhere. However, the impression I had was that this company was one that tried to do right, and this omission could then simply be a mistake.

I sent an email. Brief and focused. Can’t hurt, right?

Immediate response

Brian Wilson, CTO of Backblaze, replied to my email within hours. He was very friendly and to the point. The omission was a mistake and Brian expressed his wish and intent to fix this. I couldn’t ask for a better or nicer response. The mentioned fixup was all that I could ask for.

Fixed it

Today Brian followed up and showed me the changes. Delivering on his promise. Just totally awesome.

Starting with the Windows build 7.0.0.409, the Backblaze about window looks like this (see image below) and builds for other platforms will follow along.

15,600 US dollars

At the same time, Backblaze also became the new largest single-shot donor to curl when they donated no less than 15,600 USD to the project, making the recent Indeed.com donation fall to second place in this, my favorite new game of 2020.

Why this particular sum you may ask?

Backblaze was started in my living room on Jan 15, 2007 (13 years ago tomorrow) and that represents $100/month for every month Backblaze has depended on libcurl back to the beginning.

/ Brian Wilson, CTO of Backblaze

I think it is safe to say we have another happy user here. Brian also shared this most awesome statement. I’m happy and proud to have contributed my little part in enabling Backblaze to make such cool products.

Finally, I just want to say thank you for building and maintaining libcurl for all these years. It’s been an amazing asset to Backblaze, it really really has.

Thank you Backblaze!

This Week In RustThis Week in Rust 321

Hello and welcome to another issue of This Week in Rust! Rust is a systems language pursuing the trifecta: safety, concurrency, and speed. This is a weekly summary of its progress and community. Want something mentioned? Tweet us at @ThisWeekInRust or send us a pull request. Want to get involved? We love contributions.

This Week in Rust is openly developed on GitHub. If you find any errors in this week's issue, please submit a PR.

Updates from Rust Community

News & Blog Posts

Crate of the Week

This week's crate is cxx, a library to build a C++ FFI safely by taking care of both sides.

Thanks to Ehsan M. Kermani for the suggestions!

Submit your suggestions and votes for next week!

Call for Participation

Always wanted to contribute to open-source projects but didn't know where to start? Every week we highlight some tasks from the Rust community for you to pick and get started!

Some of these tasks may also have mentors available, visit the task page for more information.

If you are a Rust project owner and are looking for contributors, please submit tasks here.

Updates from Rust Core

311 pull requests were merged in the last week

Approved RFCs

Changes to Rust follow the Rust RFC (request for comments) process. These are the RFCs that were approved for implementation this week:

No RFCs were approved this week.

Final Comment Period

Every week the team announces the 'final comment period' for RFCs and key PRs which are reaching a decision. Express your opinions now.

RFCs
Tracking Issues & PRs

New RFCs

Upcoming Events

Europe
North America
South America

If you are running a Rust event please add it to the calendar to get it mentioned here. Please remember to add a link to the event too. Email the Rust Community Team for access.

Rust Jobs

No jobs listed for this week.

Tweet us at @ThisWeekInRust to get your job offers listed here!

Quote of the Week

@ZiCog: Does anyone have a 'no holds barred, unsafe or not' solution to the problem in Rust that can match C?

@kornel: Pipe the C version through c2rust :slight_smile:

@ZiCog: Yay! Rust now beats both Clang and GCC!

ZiCog and Kornel on rust-users

Thanks to Jan Riemer for the suggestion!

Please submit quotes and vote for next week!

This Week in Rust is edited by: nasa42 and llogiq.

Discuss on r/rust.

Mozilla Security BlogJanuary 2020 CA Communication

Mozilla has sent a CA Communication to inform Certificate Authorities (CAs) who have root certificates included in Mozilla’s program about current events relevant to their membership in our program and to remind them of upcoming deadlines. This CA Communication has been emailed to the Primary Point of Contact (POC) and an email alias for each CA in Mozilla’s program, and they have been asked to respond to the following 7 action items:

  1. Read and fully comply with version 2.7 of Mozilla’s Root Store Policy.
  2. Ensure that their CP and CPS complies with the updated policy section 3.3 requiring the proper use of “No Stipulation” and mapping of policy documents to CA certificates.
  3. Confirm their intent to comply with section 5.2 of Mozilla’s Root Store Policy requiring that new end-entity certificates include an EKU extension expressing their intended usage.
  4. Verify that their audit statements meet Mozilla’s formatting requirements that facilitate automated processing.
  5. Resolve issues with audits for intermediate CA certificates that have been identified by the automated audit report validation system.
  6. Confirm awareness of Mozilla’s Incident Reporting requirements and the intent to provide good incident reports.
  7. Confirm compliance with the current version of the CA/Browser Forum Baseline Requirements.

The full action items can be read here. Responses to the survey will be automatically and immediately published by the CCADB.

With this CA Communication, we reiterate that participation in Mozilla’s CA Certificate Program is at our sole discretion, and we will take whatever steps are necessary to keep our users safe. Nevertheless, we believe that the best approach to safeguard that security is to work with CAs as partners, to foster open and frank communication, and to be diligent in looking for ways to improve.

The post January 2020 CA Communication appeared first on Mozilla Security Blog.

The Firefox FrontierNo judgment digital definitions: Online advertising strategies

It’s hard to go anywhere on the internet without seeing an ad. That’s because advertising is the predominant business model of the internet today. Websites and apps you visit every … Read more

The post No judgment digital definitions: Online advertising strategies appeared first on The Firefox Frontier.

Mozilla Open Policy & Advocacy BlogCompetition and Innovation in Software Development Depend on a Supreme Court Reversal in Google v. Oracle

Today, Mozilla filed a friend of the court brief with the Supreme Court in Google v. Oracle, the decade-long case involving questions of copyright for functional elements of Oracle’s Java SE. This is the fourth amicus brief so far that Mozilla has filed in this case, and we are joined by Medium, Cloudera, Creative Commons, Shopify, Etsy, Reddit, Open Source Initiative, Mapbox, Patreon, Wikimedia Foundation, and Software Freedom Conservancy.

Arguing from the perspective of small, medium, and open source technology organizations, the brief urges the Supreme Court to reverse the Federal Circuit’s holdings first that the structure, sequence, and organization (“SSO”) of Oracle’s Java API package was copyrightable, and subsequently that Google’s use of that SSO was not a “fair use” under copyright law.

At bottom in the case is the issue of whether copyright law bars the commonplace practice of software reimplementation, “[t]he process of writing new software to perform certain functions of a legacy product.” (Google brief p.7) Here, Google had repurposed certain functional elements of Java SE (less than 0.5% of Java SE overall, according to Google’s brief, p. 8) in its Android operating system for the sake of interoperability—enabling Java apps to work with Android and Android apps to work with Java, and enabling Java developers to build apps for both platforms without needing to learn the new conventions and structure of an entirely new platform.

Mozilla believes that software reimplementation and the interoperability it facilitates are fundamental to the competition and innovation at the core of a flourishing software development ecosystem. However, the Federal Circuit’s rulings would upend this tradition of reimplementation, not only by prohibiting it in the API context of this case but by calling into question enshrined tenets of the software industry that developers have long relied on to innovate without fear of liability. The consequence would be that small software developers are disadvantaged and innovations are fewer, that incumbents’ positions in the industry are reinforced with less incentive to improve their products, and that consumers lose out. We believe that a healthy internet depends on the Supreme Court reversing the Federal Circuit and reaffirming the current state of play for software development, in which copyright does not stand in the way of software developers reusing SSOs for API packages in socially, technologically, and economically beneficial ways.

The post Competition and Innovation in Software Development Depend on a Supreme Court Reversal in Google v. Oracle appeared first on Open Policy & Advocacy.

William LachanceConda is pretty great

Lately the data engineering team has been looking into productionizing (i.e. running in Airflow) a bunch of models that the data science team has been producing. This often involves languages and environments that are a bit outside of our comfort zone — for example, the next version of Mission Control relies on the R-stan library to produce a model of expected crash behaviour as Firefox is released.

To make things as simple and deterministic as possible, we’ve been building up Docker containers to run/execute this code along with its dependencies, which makes things nice and reproducible. My initial thought was to use just the language-native toolchains to build up my container for the above project, but I quickly found a number of problems:

  1. For local testing, Docker on Mac is slow: when doing a large number of statistical calculations (as above), you can count on your testing iterations taking 3 to 4 (or more) times longer.
  2. On initial setup, the default R packaging strategy is to have the user of a package like R-stan recompile from source. This can take forever if you have a long list of dependencies with C-compiled extensions (pretty much a given if you’re working in the data space): rebuilding my initial docker environment for missioncontrol-v2 took almost an hour. This isn’t just a problem for local development: it also makes continuous integration using a service like Circle or Travis expensive and painful.

I had been vaguely aware of Conda for a few years, but didn’t really understand its value proposition until I started working on the above project: why bother with a heavyweight package manager when you already have Docker to virtualize things? The answer is that it solves both of the above problems: for local development, you can get something more-or-less identical to what you’re running inside Docker with no performance penalty whatsoever. And for building the docker container itself, Conda’s package repository contains pre-compiled versions of all the dependencies you’d want to use for something like this (even somewhat esoteric libraries like R-stan are available on conda-forge), which brought my build cycle times down to less than 5 minutes.

tl;dr: If you have a bunch of R / python code you want to run in a reproducible manner, consider Conda.

Daniel Stenbergcurl ootw: –raw

(ootw is short for “option of the week“!)

--raw

Introduced back in April of 2007 in curl 7.16.2, the man page description for this option is very brief:

(HTTP) When used, it disables all internal HTTP decoding of content or transfer encodings and instead makes them passed on unaltered, raw.

This option is for HTTP(S) and it was brought to curl when someone wanted to use curl in a proxy solution. In that setup the user parsed the incoming headers and acted on them. When, for example, a chunked encoded response was received, curl would automatically “decode” it so that it could deliver the pure clean data – but the user would then find headers in the received response saying “chunked” even though the body, already decoded by libcurl, wasn’t actually chunked any more when it landed!

On the libcurl side, an application can explicitly switch this off by disabling transfer and content decoding with CURLOPT_HTTP_TRANSFER_DECODING and CURLOPT_HTTP_CONTENT_DECODING.

The --raw option is the command line version that disable both of those at once.

With --raw, no transfer or content decoding is done and the “raw” stream is instead delivered or saved. You really only do this if you for some reason want to handle those things yourself instead.

Content decoding includes automatic gzip decompression, so --raw will also disable that, even if you use --compressed.

It should be noted that chunked encoding is an HTTP/1.1 thing. We don’t do that anymore in HTTP/2 and later – and curl has defaulted to HTTP/2 over HTTPS, when possible, for a while now. Users can also often avoid chunked encoded responses by insisting on HTTP/1.0, for example with the --http1.0 option (since chunked wasn’t included in 1.0).

Example command line

curl --raw https://example.com/dyn-content.cgi

Related options

--compressed asks the server to provide the response compressed and curl will then decompress it automatically. This reduces the amount of data that gets sent over the wire.

Wladimir PalantPwning Avast Secure Browser for fun and profit

Avast took an interesting approach when integrating their antivirus product with web browsers. Users are often hard to convince that Avast browser extensions are good for them and should be activated in their browser of choice. So Avast decided to bring out their own browser with the humble name Avast Secure Browser. Their products send a clear message: ditch your current browser and use Avast Secure Browser (or AVG Secure Browser as AVG users know it) which is better in all respects.

Avast Secure Browser is based on Chromium and its most noticeable difference is the numerous built-in browser extensions, usually not even visible in the list of installed extensions (meaning that they cannot be disabled by regular means). Avast Secure Browser has eleven custom extensions, AVG Secure Browser has eight. Now putting eleven extensions of questionable quality into your “secure” browser might not be the best idea. Today we’ll look at the remarkable Video Downloader extension which essentially allowed any website to take over the browser completely (CVE-2019-18893). An additional vulnerability then allowed it to take over your system as well (CVE-2019-18894). The first issue was resolved in Video Downloader 1.5, released at some point in October 2019. The second issue remains unresolved at the time of writing. Update (2020-01-13): Avast notified me that the second issue was resolved in an update yesterday; I can confirm that the application is no longer vulnerable after updating.

Malicious actors coming through Avast software

Note: I did not finish my investigation of the other extensions which are part of the Avast Secure Browser. Given how deeply this product is compromised on another level, I did not feel that there was a point in making it more secure. In fact, I’m not going to write about the Avast Passwords issues I reported to Avast – nothing special here, yet another password manager that made several of the usual mistakes and put your data at risk.

Summary of the findings

Browser vendors put a significant effort into limiting the attack surface of browser extensions. The Video Downloader extension explicitly chose to disable the existing security mechanisms however. As a result, a vulnerability in this extension had far reaching consequences. Websites could inject their JavaScript code into the extension context (CVE-2019-18893). Once there, they could control pretty much all aspects of the browser, read out any data known to it, spy on the user as they surf the web and modify any websites.

This JavaScript code, like any browser extension with access to localhost, could also communicate with the Avast Antivirus application. This communication interface has a vulnerability in the command starting Banking Mode which allows injecting arbitrary command line flags (CVE-2019-18894). This can be used to gain full control of Avast Secure Browser in Banking Mode and even execute local applications with user’s privileges. End result: visiting any website with Avast Secure Browser could result in malware being installed on your system without any user interaction.

Selecting a target

As I already mentioned, Avast Secure Browser comes with eleven custom browser extensions out of the box, plus one made by Google which is always part of Google Chrome. Given the large code bases, prioritization is necessary when looking for security issues here. I checked the extension manifests and noticed this huge “please hack me” sign in one of them:

"content_security_policy": "script-src 'self' 'unsafe-eval'; object-src 'self'",
"permissions": [
  "activeTab", "alarms", "bookmarks", "browsingData", "clipboardRead", "clipboardWrite",
  "contentSettings", "contextMenus", "cookies", "debugger", "declarativeContent", "downloads",
  "fontSettings", "geolocation", "history", "identity", "idle", "management",
  "nativeMessaging", "notifications", "pageCapture", "power", "privacy", "proxy", "sessions",
  "storage", "system.cpu", "system.display", "system.memory", "system.storage", "tabCapture",
  "tabs", "tts", "ttsEngine", "unlimitedStorage", "webNavigation", "webRequest",
  "webRequestBlocking", "http://*/*", "https://*/*", "\u003Call_urls>"
],

Let me explain: this extension requests access to almost every extension API available in the browser. It also wants access to each and every website. Not just that, it lists 'unsafe-eval' in its Content Security Policy. This allows dynamically generated JavaScript to be executed in the extension context, something that browsers normally disallow to reduce the attack surface of extensions.

Download bar displayed by Video Downloader on a YouTube video

The extension in question is called Video Downloader and it is fairly simple: it tries to recognize video players on web pages. When it finds one, it shows a “download bar” on top of it, letting the user download the video. Does it need to call eval() or similar functions? No, it doesn’t. Does it need all these extension APIs? Not really, only the downloads API is really required. But since this extension is installed by default and the user doesn’t need to accept a permissions prompt, the developers apparently decided to request access to everything – just in case.

Note that Video Downloader wasn’t the only Avast extension featuring these two manifest entries, but it was the only one combining both of them.

Getting into the extension

Looking at the background.js file of the Video Downloader extension, there are a bunch of interesting (indirect) eval() calls. All of these belong to the jQuery library. Now jQuery is meant to be simple to use, which in its interpretation means that it will take your call parameters and try to guess what you want it to do. This used to be a common source of security vulnerabilities in websites, due to jQuery interpreting selectors as HTML code.

But jQuery isn’t used for manipulating DOM here, this being the invisible background page. Instead, the code uses jQuery.ajax() to download data from the web. And you certainly know that jQuery.ajax() isn’t really safe to call with default parameters because that’s what it says in the official documentation. What, no big warning at the top of this page? Maybe if you scroll down to the dataType parameter. Yes, here it is:

The type of data that you’re expecting back from the server. If none is specified, jQuery will try to infer it based on the MIME type of the response (an XML MIME type will yield XML, in 1.4 JSON will yield a JavaScript object, in 1.4 script will execute the script, and anything else will be returned as a string).

No, this really doesn’t sound as scary as it should have been. Let me try it… If you call jQuery.ajax() and you don’t set the dataType parameter, jQuery will just guess how you want it to treat the data. And if it gets a response with text/javascript MIME type then it will run the code. Because that’s probably what you meant to do, right?

Well, Video Downloader developers clearly didn’t mean that. They probably assumed that they would always get JSON data back or something similarly benign. I mean, they were sending requests to services like YouTube and nobody would ever expect YouTube to suddenly turn evil, right?
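
For illustration, here is roughly what the risky pattern looks like compared to a safer call. This is only a sketch: the videoUrl variable and the handler are made up, not the extension’s actual code.

// Risky: no dataType given, so jQuery infers it from the response's MIME type.
// A response served as text/javascript gets executed in the caller's context.
jQuery.ajax({ url: videoUrl }).done(function (data) {
  // ... parse video metadata ...
});

// Safer: state explicitly what you expect. A JavaScript payload is then just
// a string that fails JSON parsing, instead of running.
jQuery.ajax({ url: videoUrl, dataType: "json" }).done(function (data) {
  // ... parse video metadata ...
});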

What were they requesting? Video metadata, mostly. There is code to recognize common video players on web pages and retrieve additional information. One rule is particularly lax in recognizing video sources:

playerRegExp: "(.*screen[.]yahoo[.]com.*)"

And the corresponding Yahoo.getMetadata handler will simply download the video URL to extract information from it. Which brings us to my proof of concept page:

<div>
  <video src="rce.js?screen.yahoo.com"></video>
</div>

Yes, that’s it. If the user opens this page, Video Downloader will download the file rce.js and jQuery will run its code in the context of the extension, granting it access to all the extension APIs.

What can be done on the inside?

Once a malicious website uses this approach to inject code into the Video Downloader extension, it controls pretty much all aspects of your browser. This code can read out your cookies, history, bookmarks and other information; it can read out and replace clipboard contents; it can spy on you while you are browsing the web; and it can manipulate the websites you are visiting in an almost arbitrary way.
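
A few lines are enough to illustrate that level of access. These are standard extension APIs covered by the permissions requested in the manifest above; the snippet is only a sketch of what injected code could call, not the actual exploit (tabId stands in for any open tab’s id):

// Read every cookie the browser knows about, including session cookies.
chrome.cookies.getAll({}, cookies => { /* send them to the attacker's server */ });

// Read out the browsing history.
chrome.history.search({ text: "", maxResults: 1000 }, entries => { /* exfiltrate */ });

// Run arbitrary script in any open tab (Manifest V2 API).
chrome.tabs.executeScript(tabId, { code: "document.title" });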

In short: it’s not your browser any more. Not even closing the problematic website will help at this point; the code is running in a context that you don’t control. Only restarting your browser will make it disappear. That is: if you are lucky.

Going beyond the browser

There is at least one way for the malicious code to get out of the browser. When looking into the Avast Online Security extension (yes, the one spying on you) I noticed that it communicates with Avast Antivirus via a local web server. Video Downloader can do that as well, for example to get a unique identifier of this Avast install or to read out some Avast Antivirus settings.

But the most interesting command here turned out to be SWITCH_TO_SAFEZONE. This one will open a website in Banking Mode, which is an isolated Avast Secure Browser instance. Only website addresses starting with http: and https: are accepted, which appears to be sufficient validation. That is, until you try to put whitespace into the website address. Then you will suddenly see Banking Mode open two websites, with the second address not going through any validation.

In fact, what we have here is a Command Injection vulnerability. And we can inject command line flags that will be passed to AvastBrowser.exe. With it being essentially Chromium, there is a whole lot of command line flags to choose from.

So we could enable remote debugging for example:

request(commands.SWITCH_TO_SAFEZONE, ["https://example.com/ --remote-debugging-port=12345"]);

Avast Secure Browser doesn’t have Video Downloader when running in Banking Mode, yet the regular browser instance can compromise it via remote debugging. In fact, a debugging session should also be able to install browser extensions without any user interaction, at least the ones available in the Chrome Web Store. And Chromium’s internal pages like chrome://settings have access to special APIs; remote debugging allows accessing those as well, possibly compromising the system even deeper.
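
To give an idea of how little is needed once that flag has been injected: Chromium’s remote debugging interface exposes a plain HTTP endpoint listing debuggable targets, and each target advertises a WebSocket URL that accepts DevTools protocol commands. Here is a sketch of what the code already running in the compromised extension context (which has host permissions for localhost) could do; the port number simply matches the injected flag:

// List the debuggable targets of the Banking Mode instance started with
// --remote-debugging-port=12345.
fetch("http://localhost:12345/json")
  .then(response => response.json())
  .then(targets => {
    // Each entry contains a webSocketDebuggerUrl; connecting to it gives full
    // DevTools protocol control over that page.
    console.log(targets);
  });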

But Jaroslav Lobačevski pointed me towards an even more powerful command line flag: --utility-cmd-prefix. This can specify an arbitrary executable that will be run when the browser starts up:

request(commands.SWITCH_TO_SAFEZONE, ["https://example.com/ --utility-cmd-prefix=calc.exe"]);

This will in fact open the calculator. Running any other command would have been possible as well, e.g. cmd.exe with some parameters.

Process list showing Windows Calculator executed by Avast Secure Browser

Conclusions

Here we have it: a browser with “secure” literally in its name can be compromised by any website that the user happens to visit. That happens because of Video Downloader, a preinstalled extension which ironically has no security value. And only because that extension disabled existing security mechanisms for no good reason.

Not just that, once the attackers control any browser extension, Avast Antivirus makes it easy for them to escape the browser. In the worst case scenario they will be able to install malware or ransomware in the user’s account. This vulnerability is still open for any malicious or compromised browser extension to exploit, from any browser. Update 2020-01-13: This vulnerability is also resolved now.

Timeline

  • 2019-10-09: Reported Remote Code Execution vulnerability in Video Downloader to Avast. Publication deadline: 2020-01-13.
  • 2019-10-09: Got confirmation that vulnerability details have been received and forwarded to the developers.
  • 2019-10-30: Discovered that the vulnerability was fixed in the current extension version already, no notification from Avast.
  • 2019-10-30: Contacted Avast with details on how the compromise could be expanded using SWITCH_TO_SAFEZONE command.
  • 2019-11-05: Avast stated that they want to address SWITCH_TO_SAFEZONE vulnerability before publication.

Nick FitzgeraldSynthesizing Loop-Free Programs with Rust and Z3


Automatically finding a program that implements a given specification is called program synthesis. The main difficulty is that the search space is huge: the number of programs of size \(n\) grows exponentially. Naïvely enumerating every program of size \(n\), checking whether each one satisfies the specification, and then moving on to programs of size \(n+1\) and so on doesn’t scale. However, the field has advanced by using smarter search techniques to prune the search space, leveraging performance improvements in SMT solvers, and at times limiting the scope of the problem.

In this post, I’ll explain one approach to modern program synthesis: counterexample-guided iterative synthesis of component-based, loop-free programs, as described in Synthesis of Loop-Free Programs by Gulwani et al. We’ll dissect exactly what each of those terms mean, and we’ll also walk through an implementation written in Rust that uses the Z3 solver.

My hopes for this post are two-fold:

  1. I hope that people who are unfamiliar with program synthesis — just like I was not too long ago — get a little less unfamiliar and learn something new about the topic. I’ve tried to provide many examples, and break down the dense logic formulas from the paper into smaller, approachable pieces.

  2. I hope that folks who are already familiar with this kind of program synthesis can help me diagnose some performance issues in the implementation, where I haven’t been able to reproduce the synthesis results reported in the literature. For some of the more difficult benchmark problems, the synthesizer fails to even find a solution before my patience runs out.


Motivation

Why write a program that writes other programs for me? Am I just too lazy to write them myself? Of course I am. However, there are many valid reasons why a person who is not as lazy as I am might want to synthesize programs.

Some programs are quite tricky to write correctly by hand, and a program synthesizer might succeed where you or I might fail. Quick! How do you isolate the rightmost zero bit in a word using only three bit manipulation instructions?!

              ,--- The rightmost zero bit.
              |
              V
Input:  011010011

Output: 000000100
              ^
              |
              '--- Only that bit is set.

Did you get it yet?

Okay, here’s the answer:

isolate_rightmost_zero_bit(x): // x = 011010011
    a ← not x                  // a = 100101100
    b ← add 1, x               // b = 011010100
    c ← and a, b               // c = 000000100
    return c

Our program synthesizer will find a solution in under a second, and that minimal-length solution in a minute or so. It would take me quite a while longer than that to do the same by hand. We’ll return to this problem throughout the rest of this post, and use it as a running example.
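
If you want to double check that the trick actually works, here is the same three-instruction program written out as plain Rust. This is just a sanity check on my part, not output from the synthesizer:

// The synthesized trick by hand: `!x & (x + 1)` isolates the rightmost
// zero bit of `x`.
fn isolate_rightmost_zero_bit(x: u32) -> u32 {
    let a = !x;                // a ← not x
    let b = x.wrapping_add(1); // b ← add 1, x
    a & b                      // c ← and a, b
}

fn main() {
    assert_eq!(isolate_rightmost_zero_bit(0b011010011), 0b000000100);
}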

Another reason to use a program synthesizer might be that we need to write many more programs than we have time to write by hand. Take for example a compiler’s peephole optimizer: it considers a sliding window of instruction sequences, and for each sequence, it checks if it knows of an equivalent-but-faster-or-smaller instruction sequence. When it does know of a better instruction sequence, it replaces the original instructions with the better ones.

Peephole optimizers are typically constructed from pattern-matching rules that identify suboptimal instruction sequences paired with the improved instruction sequence to replace matches with:

new PeepholeOptimizer(
    pattern_0 → replacement_0
    pattern_1 → replacement_1
    pattern_2 → replacement_2
    // ...
    pattern_n → replacement_n
)

Each replacement_i is a little, optimized mini-program. If we were writing a new peephole optimizer from scratch and by hand, we would have to write \(n\) optimized mini-programs ourselves. And \(n\) can be big: LLVM’s InstCombine peephole optimizer has over 1,000 pattern-and-replacement pairs. Even half that many is way more than I want to write myself.

Instead of writing those optimized mini-programs by hand, we can use each original instruction sequence as a specification, feed it into a program synthesizer, and see if the synthesizer can find the optimal instruction sequence that does the same thing. Finally, we can use all these original instruction sequences and their synthesized, optimal instruction sequences as pattern-and-replacement pairs to automatically construct a peephole optimizer! This idea was first proposed by Bansal et al in Automatic Generation of Peephole Superoptimizers.

Edit: John Regehr pointed out to me that this idea has been floating around since much earlier than 2006, when the Bansal et al paper was published. He pointed me to The Design and Application of a Retargetable Peephole Optimizer by Davidson et al from 1980 as an example, but noted that even this wasn’t the first time it came up.
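
To make the pattern-and-replacement idea a bit more concrete, here is a rough Rust sketch of what such a table could look like. The PeepholeRule and PeepholeOptimizer types are my own illustration, not anything from this post's implementation or from the cited papers:

/// A hypothetical pattern-and-replacement pair: the original, unoptimized
/// instruction sequence serves as the specification, and the synthesized,
/// optimal sequence is what matches get rewritten into.
struct PeepholeRule {
    pattern: Program,
    replacement: Program,
}

/// A peephole optimizer is then just a collection of such rules.
struct PeepholeOptimizer {
    rules: Vec<PeepholeRule>,
}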

An Overview of Our Task

Program synthesis is the act of taking a specification, and automatically finding a program that satisfies it. In order to make the problem a little more tractable, we’re limiting its scope in two ways:

  1. Loop-free: We are only synthesizing programs without loops.

  2. Component-based: We are only synthesizing programs that can be expressed as the composition of a given library of components.

The loop-free limitation is not very limiting for many use cases. For example, peephole optimizers often don’t consider instruction sequences that span across loop boundaries.

Component-based synthesis means that rather than synthesizing programs using any combination of any number of the target language’s expressions, the synthesizer is given a library of components and synthesizes programs that use each of those components exactly once. The synthesizer rearranges the components, rewiring their inputs and outputs, until it finds a configuration that satisfies the specification.

That is, given a library of \(N\) components, it constructs a program of the form

synthesized_program(inputs...):
    temp_0 ← component_0(params_0...)
    temp_1 ← component_1(params_1...)
    // ...
    temp_N-1 ← component_N-1(params_N-1...)
    return temp_N-1

where each parameter in params_i is either a temp_j variable defined earlier in the program or one of the original inputs.

For example, given the two components

  • f(a)
  • g(a, b)

and an input parameter x, the synthesizer can construct any of the following candidate programs (implicitly returning the variable defined last):

a ← g(x, x)
b ← f(x)

or

a ← g(x, x)
b ← f(a)

or

a ← f(x)
b ← g(x, x)

or

a ← f(x)
b ← g(a, x)

or

a ← f(x)
b ← g(x, a)

or

a ← f(x)
b ← g(a, a)

That’s it. That’s all of the programs it can possibly construct given just those two components.

The synthesizer cannot construct the following program, because it doesn’t use every component:

a ← f(x)

And the synthesizer cannot construct this program, because it uses the f component more than once:

a ← f(x)
b ← f(a)
c ← g(b, b)

And, finally, it cannot construct this last program, because this last program uses some function h that is not a component in our given library:

a ← f(x)
b ← h(a, x)

The following comparison describes some of the properties of component-based synthesis, relative to the fully general version of program synthesis:

  • Shape of synthesized programs: general synthesis can use any number of any of the target language’s expressions, while component-based synthesis can only use the components in the library.

  • Size of synthesized programs: the size of generally synthesized programs varies, while component-based programs are exactly the size of the library, since each component in the library is used exactly once.

In our synthesizer, the components will be functions over fixed bit-width integers (also known as “bitvectors” in the SMT solver parlance) and they will correspond to a single instruction in our virtual instruction set: add, and, xor, etc. But in principle they could also be higher-level functions or anything else that we can encode in SMT queries. More on SMT queries later.

While component-based synthesis makes the synthesis problem easier, it does foist a decision on us each time we invoke the synthesizer: we must choose the library of available components. Each component is used exactly once in the synthesized program, but if we want to synthesize a program that performs multiple additions, we can include multiple instances of the add component in the library. Too few components, and the synthesizer might not be able to find a solution. Too many components will slow down the synthesizer, and let it generate non-optimal programs that potentially contain dead code.

To summarize, in component-based synthesis of loop-free programs, our synthesizer’s inputs are

  • a specification, and

  • a library of components.

Its output is a program that satisfies the specification, expressed in terms of the given components, or an error if it can’t find such a program.

Formalizing the Problem

In order to synthesize a program, we need a specification describing the desired program’s behavior. The specification is a logical expression that describes the output when the program is given these inputs.

We define the specification with:

  • \(\vec{I}\) as the program inputs,

  • \(O\) as the program output, and

  • \(\phi_\mathrm{spec}(\vec{I}, O)\) as the expression relating the inputs to the output. This expression should be true when \(O\) is the desired output of running the program on inputs \(\vec{I}\).

The library of components we’re given is a multi-set of specifications describing each component’s behavior. Each component specification comes with how many inputs it takes (e.g. an add(a, b) component takes two inputs, and a not(a) component takes one input) as well as a logical formula relating the component’s inputs to its output. The component inputs, output, and expression all have similar notation to the program specification, but with a subscript:

  • \(\vec{I}_i\) is the \(i^\mathrm{th}\) component’s input variables,

  • \(O_i\) is the \(i^\mathrm{th}\) component’s output variable, and

  • \(\phi_i(\vec{I}_i, O_i)\) is the logical expression relating the \(i^\mathrm{th}\) component’s inputs with its output.

We define \(N\) as the number of components in the library.

For our isolating-the-rightmost-zero-bit example, what is the minimal components library we could give to the synthesizer, while still preserving its ability to find our desired solution? It would be a library consisting of exactly the components that correspond to each of the three instructions in the solution program: a not, an add1, and an and component.

  • \( \phi_0(I_0, O_0) \), defined as \( O_0 = \texttt{bvadd}(1, I_0) \): the add-one operation on bitvectors.

  • \( \phi_1(I_1, I_2, O_1) \), defined as \( O_1 = \texttt{bvand}(I_1, I_2) \): the bitwise and operation on bitvectors.

  • \( \phi_2(I_3, O_2) \), defined as \( O_2 = \texttt{bvnot}(I_3) \): the bitwise not operation on bitvectors.

Program synthesis can be expressed as an exists-forall problem: we want to find whether there exists some program \(P\) that satisfies the specification for all inputs given to it and outputs it returns.

\( \begin{align} & \exists P: \\ & \quad \forall \vec{I},O: \\ & \quad \quad P(\vec{I}) = O \implies \phi_\mathrm{spec}(\vec{I}, O) \end{align} \)

Let’s break that down and translate it into English:

\( \exists P \) There exists some program \(P\), such that
\( \forall \vec{I},O \) for all inputs \(\vec{I}\) and output \(O\),
\( P(\vec{I}) = O \) if we run the program on the inputs \(\vec{I}\) to get the output \(O\),
\( \implies \) then
\( \phi_\mathrm{spec}(\vec{I}, O) \) our specification \(\phi_\mathrm{spec}\) is satisfied.

This exists-forall formalization is important to understand because our eventual implementation will query the SMT solver (Z3 in our case) with pretty much this formula. It won’t be exactly the same:

  • \(P\) is an abstraction that’s hiding some details about components,
  • there are a few algebraic transformations we will perform, and
  • we won’t pose the whole problem to the solver in a single query all at once.

Nonetheless, the implementation follows from this formalization, and we won’t get far if we don’t have a handle on this.

A Brief Introduction to SMT Solvers

We can’t continue any further without briefly discussing SMT solvers and their capabilities. SMT solvers like Z3 take a logical formula, potentially containing unbound variables, and return whether it is:

  • Satisfiable: there is an assignment to the unbound variables that makes the assertions true, along with a model describing those assignments.

  • Unsatisfiable: the formula’s assertions are false; there is no assignment of values to the unbound variables that can make them true.

SMT solvers take their assertions in a Lisp-like input language called SMT-LIB2. Here is an example of a satisfiable SMT query:

;; `x` is some integer, but we don't know which one.
(declare-const x Int)

;; We do know that `x + 2 = 5`, however.
(assert (= 5 (+ x 2)))

;; Check whether the assertions are satisfiable. In
;; this case, they should be!
(check-sat)

;; Get the model, which has assignments to each of
;; the free variables. The model should report that
;; `x` is `3`!
(get-model)

Open and run this snippet in an online Z3 editor!

Note that even though there isn’t any \(\exists\) existential quantifier in there, the solver is implicitly finding a solution for \(x\) in \(\exists x: x + 2 = 5\), i.e. there exists some \(x\) such that \(x + 2\) equals 5. While some SMT solvers have some support for working with higher-order formulas with explicit \(\exists\) existential and \(\forall\) universal quantifiers nested inside, these modes tend to be much slower and also incomplete. We can only rely on first-order, implicitly \(\exists\) existential queries: that is, formulas with potentially unbound variables and without any nested \(\exists\) existential and \(\forall\) universal quantification.

We can add a second assertion to our example that makes it unsatisfiable:

(declare-const x Int)

(assert (= 5 (+ x 2)))

;; NEW: also, x + 1 should be 10.
(assert (= 10 (+ x 1)))

;; This time, the result should be unsatisfiable,
;; because there are conflicting requirements for `x`.
(check-sat)
(get-model)

Open and run this snippet in an online Z3 editor!

The assertions 10 = x + 1 and 5 = x + 2 put conflicting requirements on x, and therefore there is no value for x that can make both assertions true, and therefore the whole query is unsatisfiable.
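
Since the implementation we’ll get to later drives Z3 from Rust via the z3 crate rather than through SMT-LIB2 text, here is roughly what those same two queries look like from Rust. Treat this as a sketch: the z3 crate’s API has shifted between versions, so the exact names may differ from whichever version you have:

use z3::ast::{Ast, Int};
use z3::{Config, Context, SatResult, Solver};

fn main() {
    let cfg = Config::new();
    let ctx = Context::new(&cfg);
    let solver = Solver::new(&ctx);

    // `x` is some integer, but we don't know which one.
    let x = Int::new_const(&ctx, "x");
    let two = Int::from_i64(&ctx, 2);
    let five = Int::from_i64(&ctx, 5);

    // We do know that `x + 2 = 5`, however. This is satisfiable,
    // and the model assigns `x = 3`.
    solver.assert(&Int::add(&ctx, &[&x, &two])._eq(&five));
    assert!(matches!(solver.check(), SatResult::Sat));

    // Adding the conflicting `x + 1 = 10` makes the query unsatisfiable.
    let one = Int::from_i64(&ctx, 1);
    let ten = Int::from_i64(&ctx, 10);
    solver.assert(&Int::add(&ctx, &[&x, &one])._eq(&ten));
    assert!(matches!(solver.check(), SatResult::Unsat));
}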

Counterexample-Guided Iterative Synthesis

Counterexample-guided iterative synthesis (CEGIS) enables us to solve second-order, exists-forall queries — like our program synthesis problem — with off-the-shelf SMT solvers. CEGIS does this by decomposing these difficult queries into multiple first-order, \(\exists\) existentially quantified queries. These are the kind of queries that off-the-shelf SMT solvers excel at solving. First, we’ll look at CEGIS in general, and after that we’ll examine what is required specifically for component-based CEGIS.

CEGIS begins by choosing an initial, finite set of inputs. There has to be at least one, but it doesn’t really matter where it came from; we can use a random number generator. Then we start looping. The first step of the loop is finite synthesis, which generates a program that is correct at least for the inputs in our finite set. It may or may not be correct for all inputs, but we don’t know that yet. Next, we take that candidate program and verify it: we want determine whether it is correct for all inputs (in which case we’re done), or if there is some input for which the candidate program is incorrect (called a counterexample). If there is a counterexample, we add it to our set, and continue to the next iteration of the loop. The next program that finite synthesis produces will be correct for all the old inputs, and also this new counterexample. The counterexamples force finite synthesis to come up with more and more general programs that are correct for more and more inputs, until finally it comes up with a fully general program that works for all inputs.

Without further ado, here is the general CEGIS algorithm:

\(\begin{align} & \texttt{CEGIS}(\phi_\mathrm{spec}(\vec{I}, O)): \\ & \qquad S = \langle \text{initial finite inputs} \rangle \\ & \qquad \\ & \qquad \textbf{loop}: \\ & \qquad \qquad \texttt{// Finite synthesis.} \\ & \qquad \qquad \textbf{solve for $P$ in } \exists P,O_0,\ldots,O_{\lvert S \rvert - 1}: \\ & \qquad \qquad \qquad \left( P(S_0) = O_0 \land \phi_\mathrm{spec}(S_0, O_0) \right) \\ & \qquad \qquad \qquad \land \ldots \\ & \qquad \qquad \qquad \land \left( P(S_{\lvert S \rvert - 1}) = O_{\lvert S \rvert - 1} \land \phi_\mathrm{spec}(S_{\lvert S \rvert - 1}, O_{\lvert S \rvert - 1}) \right) \\ & \qquad \qquad \textbf{if } \texttt{unsat}: \\ & \qquad \qquad \qquad \textbf{error} \text{ “no solution”} \\ & \qquad \qquad \\ & \qquad \qquad \texttt{// Verification.} \\ & \qquad \qquad \textbf{solve for $\vec{I}$ in } \exists \vec{I},O: \,\, P(\vec{I}) = O \land \lnot \phi_\mathrm{spec}(\vec{I}, O) \\ & \qquad \qquad \textbf{if } \texttt{unsat}: \\ & \qquad \qquad \qquad \textbf{return $P$} \\ & \qquad \qquad \textbf{else}: \\ & \qquad \qquad \qquad \textbf{append $\vec{I}$ to $S$} \\ & \qquad \qquad \qquad \textbf{continue} \end{align}\)

CEGIS decomposes the exists-forall synthesis problem into two parts:

  1. Finite synthesis: The first query, the finite synthesis query, finds a program that is correct for at least the finite example inputs in \(S\). Here’s its breakdown:

    \( \exists P,O_0,\ldots,O_{\lvert S \rvert - 1}: \) There exists some program \(P\) and outputs \(O_0,\ldots,O_{\lvert S \rvert - 1}\) such that
    \( ( \,\, P(S_0) = O_0 \) \(O_0\) is the output of running the program on inputs \(S_0\)
    \( \land \) and
    \( \phi_\mathrm{spec}(S_0, O_0) \,\, ) \) the specification is satisfied for the inputs \(S_0\) and output \(O_0\),
    \( \land \ldots \) and…
    \( \land \left( P(S_{\lvert S \rvert - 1}) = O_{\lvert S \rvert - 1} \land \phi_\mathrm{spec}(S_{\lvert S \rvert - 1}, O_{\lvert S \rvert - 1}) \right) \) and \(O_{\lvert S \rvert - 1}\) is the output of running the program on inputs \(S_{\lvert S \rvert - 1}\), and the specification is satisfied for these inputs and output.

    Note that this is a first-order, existential query; it is not using nested \(\forall\) universal quantification over all possible inputs! Instead, it is instantiating a new copy of \(P(S_i) = O_i \land \phi_\mathrm{spec}(S_i, O_i)\) for each example in our finite set \(S\).

    For example, if \(S = \langle 3, 4, 7 \rangle\), then the finite synthesis query would be

    \(\begin{align} & \exists P,O_0,O_1,O_2: \\ & \qquad \left( P(3) = O_0 \land \phi_\mathrm{spec}(3, O_0) \right) \\ & \qquad \land \,\, \left( P(4) = O_1 \land \phi_\mathrm{spec}(4, O_1) \right) \\ & \qquad \land \,\, \left( P(7) = O_2 \land \phi_\mathrm{spec}(7, O_2) \right) \\ \end{align}\)

    This inline expansion works because the finite set of inputs \(S\) is much smaller in practice (typically in the tens, if even that many) than the size of the set of all possible inputs (e.g. there are \(2^{32}\) bitvectors 32 bits wide).

    If the query was unsatisfiable, then there is no program that can implement the specification for every one of the inputs in \(S\). Since \(S\) is a subset of all possible inputs, that means that there is no program that can implement the specification for all inputs. And since that is what we are searching for, it means that the search has failed, so we return an error.

    If the query was satisfiable, the resulting program \(P\) satisfies the specification for all the inputs in \(S\), but we don’t know whether it satisfies the specification for all possible inputs or not yet. For example, if \(S = \langle 0, 4 \rangle \), then we know that the program \(P\) is correct when given the inputs \(0\) and \(4\), but it may or may not be correct when given the input \(1\). We don’t know yet.

  2. Verification: Verification takes the program \(P\) produced by finite synthesis and checks whether it satisfies the specification for all inputs. That’s naturally expressed as a \(\forall\) universally quantified query over all inputs, but we can instead ask if there exists any input to the program for which the specification is not satisfied thanks to De Morgan’s law.

    Here’s the breakdown of the verification query:

    \( \exists \vec{I}, O: \) Does there exist some inputs \(\vec{I}\) and output \(O\) such that
    \( P(\vec{I}) = O\) \(O\) is the result of running the program on \(\vec{I}\)
    \( \land \) and
    \( \lnot \phi_\mathrm{spec}(\vec{I}, O) \) the specification is not satisfied.

    If the verification query is unsatisfiable, then there are no inputs to \(P\) for which the specification is not satisfied, which means that \(P\) satisfies the specification for all inputs. If so, this is what we are searching for, and we’ve found it, so return it!

    However, if the verification query is satisfiable, then we’ve discovered a counterexample: a new input \(\vec{I}\) for which the program does not satisfy the specification. That is, the program \(P\) is buggy when given \(\vec{I}\), so \(P\) isn’t the program we are searching for.

    Next, we add the new \(\vec{I}\) to our finite set of inputs \(S\), so that in the next iteration of the loop, we will synthesize a program that produces a correct result when given \(\vec{I}\) in addition to all the other inputs in \(S\).

As the loop iterates, we add more and more inputs to \(S\), forcing the finite synthesis query to produce more and more general programs. Eventually it produces a fully general program that satisfies the specification for all inputs. In the worst case, we are adding every possible input into \(S\): finite synthesis comes up with a program that fails verification when given 1, and then when given 2, and then 3, and so on. In practice, each counterexample \(\vec{I}\) that verification finds tends to be representative of many other inputs that are also currently unhandled-by-\(P\). By adding \(\vec{I}\) to \(S\), the next iteration of finite synthesis will produce a program that handles not just \(\vec{I}\) but also all the other inputs that \(\vec{I}\) was representative of.

For example, finite synthesis might have produced a program that handles all of \(S = \langle 0,1,5,13 \rangle\) but which is buggy when given a positive, even number. Verification finds the counterexample \(I = 2\), which gets appended to \(S\), and now \(S = \langle 0,1,5,13,2 \rangle\). Then, the next iteration of the loop synthesizes a program that doesn’t just also work for 2, but works for all positive even numbers. This is what makes CEGIS effective in practice.

Finally, notice that both finite synthesis and verification are first-order, \(\exists\) existentially quantified queries that off-the-shelf SMT solvers like Z3 can solve.
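
Here is that loop sketched as generic Rust, with finite synthesis and verification abstracted behind closures. This only illustrates the control flow; it is not the implementation we’ll build later:

/// A sketch of the CEGIS loop. `finite_synthesis` returns a candidate
/// program that is correct for all of the example inputs so far (or
/// `None` if no such program exists), and `verify` returns a
/// counterexample input for which the candidate is incorrect (or `None`
/// if the candidate is correct for all inputs).
fn cegis<Prog, Input>(
    mut finite_synthesis: impl FnMut(&[Input]) -> Option<Prog>,
    mut verify: impl FnMut(&Prog) -> Option<Input>,
    initial_input: Input,
) -> Option<Prog> {
    let mut examples = vec![initial_input];
    loop {
        // Finite synthesis: correct for the examples seen so far.
        let candidate = finite_synthesis(&examples)?;
        match verify(&candidate) {
            // No counterexample: correct for all inputs. We're done.
            None => return Some(candidate),
            // Otherwise, remember the counterexample and iterate.
            Some(counterexample) => examples.push(counterexample),
        }
    }
}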

CEGIS with Components

Now that we know how CEGIS works in the abstract, let’s dive into how we can use it to synthesize component-based programs.

For every loop-free program that is a composition of components, we can flip the program’s representation into a location mapping:

  • Instead of listing the program line-by-line, defining what component is on each line, we can list components, defining what line the component ends up on.

  • Instead of referencing the arguments to each component by variable name, we can reference either the line on which the argument is defined (if it comes from the result of an earlier component) or as the \(n^\mathrm{th}\) program input.

For example, consider our program to isolate the rightmost zero bit in a word:

isolate_rightmost_zero_bit(x):
    a ← not x
    b ← add 1, x
    c ← and a, b
    return c

We can exactly represent this program with the following location mapping:

{
    inputs: ["x"],

    components: {
        // Line 1: `b ← add 1, x`
        add1: {
            // The line in the program where this
            // component is placed.
            line: 1,

            // Each entry `a` represents where the
            // argument comes from.
            //
            // If `0 <= a < inputs.len()`, then the
            // argument is `inputs[a]`.
            //
            // Otherwise, when `inputs.len() <= a`,
            // then the argument comes from the value
            // defined by the component on line
            // `a - inputs.len()`.
            arguments: [
                // `x`
                0,
            ],
        },

        // Line 2: `c ← and a, b`
        and: {
            line: 2,
            arguments: [
                // `a`
                1,
                // `b`
                2,
            ],
        },

        // Line 0: `a ← not x`
        not: {
            line: 0,
            arguments: [
                // `x`
                0,
            ],
        },
    },
}

With component-based CEGIS, we’ll be synthesizing this kind of location mapping. This lets us represent a whole component-based program with a handful of numbers for lines and argument indices. And numbers are something that we can represent directly in an SMT query.
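
As a rough Rust rendering of that JSON-y syntax, a location mapping boils down to a handful of indices. This struct is my own illustration; the actual implementation later in the post encodes locations directly as solver variables rather than building a data structure like this:

/// A hypothetical in-memory rendering of a location mapping.
struct LocationMapping {
    /// How many inputs the synthesized program takes.
    num_inputs: usize,
    /// One entry per component in the library.
    components: Vec<ComponentLocation>,
}

struct ComponentLocation {
    /// The line of the synthesized program this component is placed on.
    line: usize,
    /// Where each argument comes from: values below `num_inputs` refer to
    /// program inputs, and `num_inputs + j` refers to the value defined on
    /// line `j`.
    arguments: Vec<usize>,
}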

Verifying a Component-Based Program

Let’s start with verifying a component-based program before we look at finite synthesis. Verification takes a location mapping, connects the components’ input and output variables together as described by the location mapping, and asks the SMT solver to find a counterexample.

For convenience, so we don’t have to keep repeating \(\vec{I}_0,\ldots,\vec{I}_{N-1}\) all the time, we define \(\textbf{P}\) as the set of all the parameter variables for each component in the library:

\( \textbf{P} = \, \vec{I}_0 \, \cup \, \ldots \, \cup \, \vec{I}_{N-1} \)

And similarly we define \(\textbf{R}\) as the set of all temporary result variables for each component in the library:

\( \textbf{R} = \{O_0, \, \ldots, \, O_{N-1}\} \)

With our running example of isolating the rightmost zero bit, our minimal library consists of

\( \begin{align} \phi_0(I_0, O_0) &= [O_0 = \texttt{bvadd}(1, I_0)] \\ \phi_1(I_1, I_2, O_1) &= [O_1 = \texttt{bvand}(I_1, I_2)] \\ \phi_2(I_3, O_2) &= [O_2 = \texttt{bvnot}(I_3)] \end{align} \)

and therefore its

\( \begin{align} N &= 3 \\ \textbf{P} &= \{ I_0, I_1, I_2, I_3 \} \\ \textbf{R} &= \{ O_0, O_1, O_2 \} \end{align} \)

We want to constrain the whole library to behave according to its individual component specifications. The output of each and component should indeed be the bitwise and of its inputs, and the output of each not component should indeed be the bitwise not of its input, etc… We define \(\phi_\mathrm{lib}\) as the combination of every component specification \(\phi_i\):

\( \phi_\mathrm{lib}(\textbf{P}, \textbf{R}) = \phi_0(\vec{I}_0, O_0) \land \ldots \land \phi_{N-1}(\vec{I}_{N-1}, O_{N-1}) \)

So for our minimal example library, the \(\phi_\mathrm{lib}\) we get is:

\( \begin{align} \phi_\mathrm{lib}(\textbf{P}, \textbf{R}) &= [ O_0 = \texttt{bvadd}(1, I_0) ] \\ &\land [ O_1 = \texttt{bvand}(I_1, I_2) ] \\ &\land [ O_2 = \texttt{bvnot}(I_3) ] \end{align} \)

That is, the library’s constraints are satisfied when all of

  • \(O_0\) is the wrapping addition of \(I_0\) and 1,
  • \(O_1\) is the bitwise and of \(I_1\) and \(I_2\), and
  • \(O_2\) is the bitwise not of \(I_3\),

Because finite synthesis runs before verification, we already have access to the candidate program’s location mapping when we’re constructing our verification query. This location mapping tells us which actual arguments align with which formal parameters of a component. That means we know what the connections are from each component’s input variables to the program inputs and the temporary result variables for other components. We know the dataflow between components.

Let’s make this concrete with our isolating-the-rightmost-zero-bit example. Having produced this candidate program:

a ← not x
b ← add 1, x
c ← and a, b

With this library:

\( \begin{align} \phi_0(I_0, O_0) &= [O_0 = \texttt{bvadd}(1, I_0)] \\ \phi_1(I_1, I_2, O_1) &= [O_1 = \texttt{bvand}(I_1, I_2)] \\ \phi_2(I_3, O_2) &= [O_2 = \texttt{bvnot}(I_3)] \end{align} \)

We know that \(\texttt{a} = O_2\), since it is the result of the not component \(\phi_2(I_3, O_2)\). And since a is the first argument to and a, b, which uses component \(\phi_1(I_1, I_2, O_1)\), we know that \(\texttt{a} = I_1\). Therefore, we know that \(O_2 = I_1\).

We have these equalities from each component input variable \(I_i\) in \(\textbf{P}\) to either some other component’s output variable \(O_j\) in \(\textbf{R}\) or to one of the program inputs \(\vec{I}\). These equalities are given to us directly by the location mapping for the candidate program that we’re verifying.

Additionally, because our candidate program is implicitly returning the last temporary variable c, which is the result of the and component \(\phi_1(I_1, I_2, O_1)\), and because the \(O\) in \(\phi_\mathrm{spec}(\vec{I}, O)\) represents the result of the whole program, we know that \(O = O_1\).

If we put all these equalities together for our example program we get:

\( \left( I_0 = x \right) \land \left( I_1 = O_2 \right) \land \left( I_2 = O_0 \right) \land \left( I_3 = x \right) \land \left( O = O_1 \right)\)

This represents all the connections between our library’s various components and to the candidate program’s inputs and output. If you imagine connecting the components together like a circuit, this represents all the wires between each component.

We define these component-connecting equalities as \(\phi_\mathrm{conn}\), and its general definition is:

\( \begin{align} \phi_\mathrm{conn}(\vec{I}, O, \textbf{P}, \textbf{R}) &= \left( O = O_\mathrm{last} \right) \\ & \land \left( \vec{I}_0 \, = \, \vec{V}_0 \right) \\ & \land \, \ldots \\ & \land \left( \vec{I}_{N-1} \, = \, \vec{V}_{N-1} \right) \end{align} \)

Where

  • \(\vec{V}_i\) are the actual arguments that the candidate program passes into the \(i^\mathrm{th}\) component \(\phi_i\). Each \(\vec{V}_i\) is made up of entries from either the program’s inputs \(\vec{I}\) or from temporary results from \(\textbf{R}\) that are defined by earlier components in the program. This is equivalent to the arguments field defined for each component in our example location mapping’s components map.

  • \(O_\mathrm{last}\) is the output variable for the component on the last line of the program, according to our candidate program’s location mapping.

Once again, let’s break that down:

\( \left( O = O_\mathrm{last} \right) \) The output of the whole program is equal to the result of the component on the last line of the program,
\( \land \) and
\( \left( \vec{I}_0 \, = \, \vec{V}_0 \right) \) the first component's inputs and its assigned arguments are equal to each other,
\( \land \, \ldots \) and...
\( \left( \vec{I}_{N-1} \, = \, \vec{V}_{N-1} \right) \) the last component's inputs and its assigned arguments are equal to each other.

Note that both \(O_\mathrm{last}\) and each \(\vec{V}_i\) are properties of the candidate program’s location mapping, and are known at “compile time” of the verification query. They are not variables that we are \(\exists\) existentially or \(\forall\) universally quantifying over in the query itself. We expand them inline when constructing the verification query.

Ok, so with all of that out of the way, we can finally define the verification constraint that we use in component-based CEGIS:

\( \begin{align} & \exists \vec{I}, O, \textbf{P} , \textbf{R} : \\ & \qquad \phi_\mathrm{conn}(\vec{I}, O, \textbf{P}, \textbf{R}) \land \phi_\mathrm{lib}(\textbf{P}, \textbf{R}) \land \lnot \phi_\mathrm{spec}(\vec{I}, O) \end{align} \)

The verification constraint asks: given that we’ve connected the components together as described by the candidate program’s location mapping, are there any inputs for which the specification is not satisfied?

Let’s break that down once more:

\( \exists \vec{I}, O, \textbf{P} , \textbf{R} : \) Does there exist some inputs and output such that
\(\phi_\mathrm{conn}(\vec{I}, O, \textbf{P}, \textbf{R}) \) when the components are connected together as described by our candidate program's location mapping,
\( \land \,\, \phi_\mathrm{lib}(\textbf{P}, \textbf{R}) \) and when the components behave as defined by our library,
\( \land \,\, \lnot \phi_\mathrm{spec}(\vec{I}, O) \) the specification is not satisfied?

Finding a solution to this query gives us a new counterexample \(\vec{I}\) that we can add to our set of examples \(S\) for future iterations of the CEGIS loop. Failure to find any solution to this query means that the candidate location mapping corresponds to a program that is correct for all inputs, in which case we’re done.

Finite Synthesis of a Component-Based Program

Finite synthesis composes the library components into a program that will correctly handle all the given example inputs. It does this by querying the SMT solver for a location mapping that contains assignments of components to lines in the program, and assignments of variables to each component’s actual arguments.

Recall our example location mapping:

{
    inputs: ["x"],

    components: {
        // Line 1: `b ← add 1, x`
        add1: {
            line: 1,
            arguments: [0], // `[x]`
        },

        // Line 2: `c ← and a, b`
        and: {
            line: 2,
            arguments: [1, 2], // `[a, b]`
        },

        // Line 0: `a ← not x`
        not: {
            line: 0,
            arguments: [0], // `[x]`
        },
    },
}

To encode a location mapping in the finite synthesis query, every component parameter in \(\textbf{P}\) and every component result in \(\textbf{R}\) gets an associated location variable. The finite synthesis query is searching for an assignment to these location variables.

We call the set of all location variables \(L\), and we refer to a particular location variable as \(l_x\) where \(x\) is either a component result in \(\textbf{R}\) or component parameter in \(\textbf{P}\):

\( L = \{ \, l_x \, \vert \, x \in \textbf{P} \cup \textbf{R} \, \} \)

The location variable for a result \(l_{O_i}\) is equivalent to the line field for a component in our JSON-y syntax for example location mappings. It determines the line in the program that the component is assigned to, and therefore where its temporary result is defined.

The location variable for a parameter \(l_p\) is equivalent to an entry in a component’s arguments list in our JSON-y syntax. These location variables determine where the associated parameter gets its value from: either the \(i^\mathrm{th}\) program input or the temporary result defined on the \(j^\mathrm{th}\) line of the program.

To use one index space for both line numbers and program inputs, we follow the same convention that we did with entries in the arguments list in the JSON syntax:

  • When \(l_x\) is less than the number of program inputs, then it refers to the \({l_x}^\mathrm{th}\) program input.

  • Otherwise, when \(l_x\) is greater than or equal to the number of program inputs, then subtract the number of inputs from \(l_x\) to get the line number it’s referring to.

  • Values \(0\) through \(\lvert \vec{I} \rvert - 1\) refer to the program inputs \(\vec{I}\): input 0 through input \(\lvert \vec{I} \rvert - 1\).

  • Values \(\lvert \vec{I} \rvert + 0\) through \(\lvert \vec{I} \rvert + N - 1\) refer to line numbers: line 0 through line \(N - 1\).
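
Decoding that convention is a one-liner. Here is a small hypothetical helper, not taken from the post’s implementation, that makes the two cases explicit:

/// Where a location value points, following the convention above.
enum Location {
    /// The n-th program input.
    Input(usize),
    /// The value defined on the n-th line of the program.
    Line(usize),
}

fn decode_location(l: usize, num_inputs: usize) -> Location {
    if l < num_inputs {
        Location::Input(l)
    } else {
        Location::Line(l - num_inputs)
    }
}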

All loop-free, component-based programs can be described with a location mapping. However, the reverse is not true: not all location mappings describe a valid program.

Consider this location mapping:

{
    inputs: ["x"],

    components: {
        // Line 0: `a ← add 1, x`
        add1: {
            line: 0,
            arguments: [0], // `[x]`
        },

        // Line 0: `a ← sub x, 1`
        sub1: {
            line: 0,
            arguments: [0], // `[x]`
        }
    }
}

This location mapping is inconsistent because it wants to put both of its components on line zero of the program, but each line in the program can only use a single component.

To forbid the solver from providing bogus answers of this sort, we add the consistency constraint \(\psi_\mathrm{cons}\) to the finite synthesis query. It requires that no pair of distinct component result location variables can be assigned the same line.

\( \psi_\mathrm{cons}(L) = \bigwedge\limits_{x,y \in \textbf{R}, x \not\equiv y} \left( l_x \neq l_y \right) \)

Once more, let’s break that down:

\( \bigwedge\limits_{x,y \in \textbf{R}, x \not\equiv y} \) For each \(x,y\) pair of component results, where \(x\) and \(y\) are not the same result variable,
\( \left( l_x \neq l_y \right) \) the location of \(x\) and the location of \(y\) are not the same.
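
For a flavor of how a constraint like this gets built with the z3 crate, here is a sketch of \(\psi_\mathrm{cons}\) that takes the result location variables and asserts pairwise inequality. The BitVec alias matches the naming used in the post’s later snippets, the exact crate API varies by version, and this is an illustration rather than the post’s actual code:

use z3::ast::{Ast, Bool, BV as BitVec};

/// ψ_cons: no two component results may be assigned the same line.
fn consistency<'a>(
    ctx: &'a z3::Context,
    result_locations: &[BitVec<'a>],
) -> Bool<'a> {
    let mut clauses = vec![];
    // For each distinct pair of result location variables...
    for (i, x) in result_locations.iter().enumerate() {
        for y in &result_locations[i + 1..] {
            // ...require that they are not assigned the same location.
            clauses.push(x._eq(y).not());
        }
    }
    let clause_refs: Vec<&Bool> = clauses.iter().collect();
    Bool::and(ctx, &clause_refs)
}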

But there are even more ways that a location mapping might describe an invalid program! Consider this location mapping:

{
    inputs: ["x"],

    components: {
        // Line 0: `a ← and x, b`
        and: {
            line: 0,
            arguments: [0, 2], // `[x, b]`
        },

        // Line 1: `b ← sub a, 1`
        sub1: {
            line: 1,
            arguments: [1], // `[a]`
        }
    }
}

That location mapping describes this program:

f(x):
    a ← and x, b
    b ← sub a, 1

The b temporary result is used before it is defined, and in order to compute b, we need to compute a, but computing a requires computing b, which requires computing a, etc… We have a cycle on our hands.

To forbid mappings that correspond to bogus programs with dataflow cycles, we use the acyclicity constraint \(\psi_\mathrm{acyc}\). This constraint enforces that a particular component’s parameters are defined before this component’s line.

\( \psi_\mathrm{acyc}(L) = \bigwedge\limits_{i=0}^{N-1} \left( \bigwedge\limits_{p \in \vec{I}_i} \left( l_p < l_{O_i} \right) \right) \)

Let’s break that down:

\( \bigwedge\limits_{i=0}^{N-1} \) For each component index \(i\),
\( \bigwedge\limits_{p \in \vec{I}_i} \) and for each of the \(i^\mathrm{th}\) component's input parameters,
\( l_p < l_{O_i} \) the location of the parameter should be less than the location of the component, meaning that the parameter is defined before the component is used.
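
And here is \(\psi_\mathrm{acyc}\) in the same sketchy style, again an illustration rather than the post’s code: every parameter’s location must come strictly before the location of its component’s result.

use z3::ast::{Bool, BV as BitVec};

/// ψ_acyc: each component's parameters are defined before its own line.
fn acyclicity<'a>(
    ctx: &'a z3::Context,
    // For each component: (its parameter location variables, its result
    // location variable).
    locations: &[(Vec<BitVec<'a>>, BitVec<'a>)],
) -> Bool<'a> {
    let mut clauses = vec![];
    for (params, result) in locations {
        for p in params {
            // Unsigned less-than, since locations are small indices.
            clauses.push(p.bvult(result));
        }
    }
    let clause_refs: Vec<&Bool> = clauses.iter().collect();
    Bool::and(ctx, &clause_refs)
}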

The only other way that location mappings can be invalid is if a location is out of bounds of the program inputs and line numbers, so we’re ready to define the well-formed-program constraint \(\psi_\mathrm{wfp}\). This constraint enforces that any location mapping we synthesize will correspond to a well-formed program.

A well-formed program is

  • consistent,

  • acyclic,

  • its component parameter locations point to either a program input or a line number, and

  • its component temporary result locations point to a line number.

Let’s define \(M\) as the number of program inputs plus the number of components in the library:

\( M = \lvert \vec{I} \rvert + N \)

A component parameter location \(l_{p \in \textbf{P}}\) can point to either

  • a program input in the range from zero to the number of program inputs: \(0 \leq l_{p} \lt \lvert \vec{I} \rvert\), or

  • a line number, which corresponds to the \(N\) locations following the program inputs: \(\lvert \vec{I} \rvert \leq l_p \lt M \).

Since those two ranges are contiguous, it means that component parameter locations ultimately fall within the range \(0 \leq l_p \lt M\).

A component temporary result location \(l_{r \in \textbf{R}}\) must point to a line number, which means that they fall within the range \(\lvert \vec{I} \rvert \leq l_r \lt M\).

Put all that together and we get the well-formed-program constraint \(\psi_\mathrm{wfp}\):

\( \begin{align} \psi_\mathrm{wfp}(L) &= \bigwedge\limits_{p \in \textbf{P}} \left( 0 \leq l_p \lt M \right) \\ & \land \, \bigwedge\limits_{r \in \textbf{R}} \left( \lvert \vec{I} \rvert \leq l_r \lt M \right) \\ & \land \, \psi_\mathrm{cons}(L) \\ & \land \, \psi_\mathrm{acyc}(L) \end{align} \)

And here is its breakdown:

\( \bigwedge\limits_{p \in \textbf{P}} \left( 0 \leq l_p \lt M \right) \) Each component parameter location \(l_p\) points to either a program input or a line number,
\( \land \, \bigwedge\limits_{r \in \textbf{R}} \left( \lvert \vec{I} \rvert \leq l_r \lt M \right) \) and each component result location \(l_r\) points to a line number,
\( \land \, \psi_\mathrm{cons}(L) \) and the location mapping is consistent,
\( \land \, \psi_\mathrm{acyc}(L) \) and the location mapping is acyclic.

Now that we can constrain finite synthesis to only produce location mappings that correspond to well-formed programs, all we need to do is encode the connections between components and the behavior of the library. This should sound familiar: we need the finite synthesis equivalent of \(\phi_\mathrm{conn}\) and \(\phi_\mathrm{lib}\) from verification. And it turns out that \(\phi_\mathrm{lib}\) doesn’t need to be tweaked at all, because the behavior of the library remains the same whether we are in verification or finite synthesis. But while \(\phi_\mathrm{conn}\) was checking a set of already-known connections between components, in finite synthesis we are searching for those connections, so we need a different query.

These connections define the dataflow between components. They are the wires in the circuit built from our components. A connection from some component result into another component’s input means that we need to constrain the result and input variables to be equal in the finite synthesis query. For example, if component \(\phi_i\)’s result is assigned location 3, and parameter \(p\) is also assigned location 3, then \(p\) must take on the same value as the output \(O_i\) of that component.

This leads us to our definition of \(\psi_\mathrm{conn}\): for every pair of location variables \(l_x\) and \(l_y\), if they refer to the same location, then \(x\) and \(y\) must have the same value.

\( \psi_\mathrm{conn}(L, \vec{I}, O, \textbf{P}, \textbf{R}) = \bigwedge\limits_{x,y \in \vec{I} \cup \textbf{P} \cup \textbf{R} \cup \{ O \} } \left( \left( l_x = l_y \right) \implies \left( x = y \right) \right) \)

Here is its piece-by-piece breakdown:

\( \bigwedge\limits_{x,y \in \vec{I} \cup \textbf{P} \cup \textbf{R} \cup \{ O \} } \) For each pair of location variables \(l_x\) and \(l_y\), where \(x\) and \(y\) are either a program input, or a component's parameter, or a component's temporary result, or the program output,
\( \left( l_x = l_y \right) \) if the location variables refer to the same location,
\( \implies \) then
\( \left( x = y \right) \) \(x\) and \(y\) must have the same value.

We’re finally ready to define our finite synthesis query for a location mapping. This query asks the solver to find some location mapping that corresponds to a well-formed program that satisfies our specification for each example input in \(S\). In other words, it must enforce that

  • the location mapping corresponds to a well-formed program, and

  • when the components are connected as described by the location mapping, and when the components behave as described by our library,

  • then the specification is satisfied for each of our example inputs in \(S\).

Here it is, finally, our finite synthesis query:

\( \begin{align} & \exists L, O_0, \ldots, O_{\vert S \rvert - 1}, \textbf{P}_0, \ldots, \textbf{P}_{\vert S \rvert - 1}, \textbf{R}_0, \ldots, \textbf{R}_{\vert S \rvert - 1}: \\ & \qquad \psi_\mathrm{wfp}(L) \,\, \land \\ & \qquad \qquad \bigwedge\limits_{i=0}^{\lvert S \rvert - 1} \left( \phi_\mathrm{lib}(\textbf{P}_i, \textbf{R}_i) \land \psi_\mathrm{conn}(L, S_i, O_i, \textbf{P}_i, \textbf{R}_i) \land \phi_\mathrm{spec}(S_i, O_i) \right) \\ \end{align} \)

That’s quite a mouthful, so, one last time, let’s pull it apart and break it down:

\( \exists L, O_0, \ldots, O_{\vert S \rvert - 1}, \textbf{P}_0, \ldots, \textbf{P}_{\vert S \rvert - 1}, \textbf{R}_0, \ldots, \textbf{R}_{\vert S \rvert - 1}: \) There exists some location mapping \(L\), and program outputs, component parameters, and component results variables for each example in \(S\), such that
\( \psi_\mathrm{wfp}(L) \,\, \land \) the location mapping is a well-formed program, and
\( \bigwedge\limits_{i=0}^{\lvert S \rvert - 1} \) for each example input index \(i\),
\( \phi_\mathrm{lib}(\textbf{P}_i, \textbf{R}_i) \) the components behave as described by the library,
\( \land \,\, \psi_\mathrm{conn}(L, S_i, O_i, \textbf{P}_i, \textbf{R}_i) \) and the components are connected as described by the location mapping,
\( \land \,\, \phi_\mathrm{spec}(S_i, O_i) \) and the specification is satisfied for the \(i^\mathrm{th}\) example input.

When the solver finds a satisfiable assignment for this query, we get a new candidate location mapping that corresponds to a program that is correct for each of the example inputs in \(S\). When the solver finds the query unsatisfiable, that means there is no location mapping that corresponds to a program that is correct for each of the example inputs, which means that our search has failed.

Implementation

I implemented a loop-free, component-based program synthesizer in Rust that uses Z3 to solve the finite synthesis and verification queries. The implementation’s repository is over here.

Our target language has all the operations you would expect for working with fixed-width integers. It has arithmetic operations like add and mul. It has bitwise operations like and and xor. It has comparison operations like eq, which evaluate to one if the comparison is true and zero otherwise. Finally, it has a select operation that takes three operands: a condition, a consequent, and an alternative. When the condition is non-zero, it evaluates to the consequent, and otherwise it evaluates to the alternative.

Values are neither signed nor unsigned. For operations like division that behave differently on signed and unsigned integers, we have a different instruction for each behavior: div_s for signed division and div_u for unsigned division.
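
To pin down what a couple of those operations mean, here are reference semantics written as plain Rust. This is just my own illustration of the behavior described above, not the post’s evaluator:

/// `select`: if the condition is non-zero, the consequent; otherwise the
/// alternative.
fn select(condition: u64, consequent: u64, alternative: u64) -> u64 {
    if condition != 0 { consequent } else { alternative }
}

/// `div_u`: plain unsigned division on the raw bits.
fn div_u(a: u64, b: u64) -> u64 {
    a / b
}

/// `div_s`: reinterpret the bits as signed integers, then divide.
fn div_s(a: u64, b: u64) -> u64 {
    ((a as i64) / (b as i64)) as u64
}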

Program Representation

A program is a sequence of instructions:

pub struct Program {
    instructions: Vec<Instruction>,
}

An instruction has an operation and binds its result to an identifier:

pub struct Instruction {
    result: Id,
    operator: Operator,
}

An identifier is an opaque wrapper around a number:

pub struct Id(u32);

We won’t support any nice names for identifiers, since essentially everything is a temporary. When we stringify programs, we’ll turn the identifiers into a, b, c, etc…

Additionally, we will uphold the invariant that Id(i) always refers to the value defined by the ith instruction in the program.

Finally, the Operator enumeration has a variant for each operation in the language, along with the identifiers of its operands:

pub enum Operator {
    Var,                // New input variable.

    Const(u64),         // A constant value.

    Eqz(Id),            // Equals zero?
    Clz(Id),            // Count leading zeros.
    Ctz(Id),            // Count trailing zeros.
    Popcnt(Id),         // Population count.

    Eq(Id, Id),         // Equal?
    Ne(Id, Id),         // Not equal?
    LtS(Id, Id),        // Signed less than?
    LtU(Id, Id),        // Unsigned less than?
    GtS(Id, Id),        // Signed greater than?
    GtU(Id, Id),        // Unsigned greater than?
    LeS(Id, Id),        // Signed less than or equal?
    LeU(Id, Id),        // Unsigned less than or equal?
    GeS(Id, Id),        // Signed greater than or equal?
    GeU(Id, Id),        // Unsigned greater than or equal?

    Add(Id, Id),        // Addition.
    Sub(Id, Id),        // Subtraction.
    Mul(Id, Id),        // Multiplication.
    DivS(Id, Id),       // Signed division.
    DivU(Id, Id),       // Unsigned division.
    RemS(Id, Id),       // Signed remainder.
    RemU(Id, Id),       // Unsigned remainder.

    And(Id, Id),        // And.
    Or(Id, Id),         // Or.
    Xor(Id, Id),        // Exclusive or.
    Shl(Id, Id),        // Shift left.
    ShrS(Id, Id),       // Signed shift right.
    ShrU(Id, Id),       // Unsigned shift right.
    Rotl(Id, Id),       // Rotate left.
    Rotr(Id, Id),       // Rotate right.

    Select(Id, Id, Id), // If then else.
}

Building Programs

To make constructing programs by hand easier, we have a builder for programs. It wraps an inner Program and exposes a method for each operation, to append an instruction using that operation to the program that’s being built.

pub struct ProgramBuilder {
    program: Program,
}

impl ProgramBuilder {
    /// Start building a new program!
    pub fn new() -> ProgramBuilder {
        ProgramBuilder {
            program: Program {
                instructions: vec![],
            },
        }
    }

    /// Finish building this program!
    pub fn finish(self) -> Program {
        self.program
    }

    fn next_id(&self) -> Id {
        Id(self.program.instructions.len() as u32)
    }

    // ...

    /// Append an `add a, b` instruction!
    pub fn add(&mut self, a: Id, b: Id) -> Id {
        let result = self.next_id();
        self.program.instructions.push(Instruction {
            result,
            operator: Operator::Add(a, b),
        });
        result
    }

    // ...
}

Here is our isolate-the-rightmost-zero-bit example program constructed via the ProgramBuilder:

// isolate_rightmost_zero_bit(x):
//     a ← not x
//     b ← add 1, x
//     c ← and a, b
//     return c

let mut builder = ProgramBuilder::new();

let x = builder.var();
let a = builder.not(x);
let one = builder.const_(1);
let b = builder.add(one, x);
let c = builder.and(a, b);

let isolate_rightmost_zero_bit = builder.finish();

Having the builder is nice since we intend to write unoptimized programs ourselves, which we then use as specifications for synthesizing optimized programs.

Defining Components

Every operator has an associated component, so that we can synthesize programs that use that operator. Because a library can have a heterogeneous set of components, we’ll define an object-safe Component trait, so we can box them up as trait objects with dynamically dispatched methods, and store them all in the same collection.

A Component has two roles:

  1. It must provide the constraints between its inputs and its output for the solver. This is its \(\phi_i(\vec{I}_i, O_i)\) that will be part of the whole library’s \(\phi_\mathrm{lib}(\textbf{P}, \textbf{R})\).

  2. After we’ve synthesized some location mapping for a program that uses this component, in order to construct the corresponding Program, the component needs to know how to create the Operator it represents.

The make_expression trait method will handle the first role, and the make_operator trait method will handle the second. make_expression takes Z3 BitVec variables as its operands (the \(\vec{I}_i\) input variables) and returns a new Z3 BitVec expression that represents the result of running the operation on the operands (the \(O_i\) output variable). make_operator takes the Ids of its operands (as described by a synthesized location mapping) and returns its associated Operator with those Ids as its operands.

Note that, because different components have different arities, these methods don’t take their operands as multiple, individual parameters like a: BitVec, b: BitVec or a: Id, b: Id. Instead, components report their arity with the operand_arity trait method and callers pass in a slice of that many operands.

Finally, we also support synthesizing constant values for the const operator. These immediates are handled separately from operands, and so we also have the immediate_arity trait method, and make_expression and make_operator also takes slices of immediates.

pub trait Component {
    /// How many operands does this component take?
    fn operand_arity(&self) -> usize;

    /// How many immediates does this component need
    /// synthesized?
    fn immediate_arity(&self) -> usize;

    /// Construct this component's constraint as a Z3
    /// expression.
    fn make_expression<'a>(
        &self,
        context: &'a z3::Context,
        immediates: &[BitVec<'a>],
        operands: &[BitVec<'a>],
        bit_width: u32,
    ) -> BitVec<'a>;

    /// Construct an `Operator` from the given
    /// immediates and operands.
    fn make_operator(
        &self,
        immediates: &[u64],
        operands: &[Id]
    ) -> Operator;
}

Let’s take a look at the implementation of a simple, representative component: the add component.

  • It has two operands and zero immediates.

  • The make_operator implementation is mechanical, pulling out its individual operands from the given slice, just to push them into the Operator::Add.

  • The make_expression implementation is similarly concise, but is not quite mechanical. This is where we encode the add operator’s semantics into a Z3 expression. SMT-LIB2 defines a bvadd operation for wrapping addition on bitvectors, and the z3 crate we’re using exposes it to us as a method on BitVec. Wrapping addition is exactly what we want for our add instruction, and so all we have to do to encode our add operation’s semantics is bvadd our two operands.

  • Because we are always working with components as trait objects, we also define a helper function for boxing up the Add component and turning it into a trait object with dynamically dispatched methods.

struct Add;

impl Component for Add {
    fn operand_arity(&self) -> usize {
        2
    }

    fn immediate_arity(&self) -> usize {
        0
    }

    fn make_operator(
        &self,
        immediates: &[u64],
        operands: &[Id],
    ) -> Operator {
        Operator::Add(operands[0], operands[1])
    }

    fn make_expression<'a>(
        &self,
        _context: &'a z3::Context,
        immediates: &[BitVec<'a>],
        operands: &[BitVec<'a>],
        _bit_width: u32,
    ) -> BitVec<'a> {
        operands[0].bvadd(&operands[1])
    }
}

pub fn add() -> Box<dyn Component> {
    Box::new(Add) as _
}

Most components look pretty much like that: there is some direct encoding into a single SMT expression for the operation.

Other components, like eqz, don’t have a direct encoding as a single SMT expression, and we need to compose multiple.

impl Component for Eqz {
    // ...

    fn make_expression<'a>(
        &self,
        context: &'a z3::Context,
        _immediates: &[BitVec<'a>],
        operands: &[BitVec<'a>],
        bit_width: u32,
    ) -> BitVec<'a> {
        let zero = BitVec::from_i64(
            context,
            0,
            bit_width,
        );
        let one = BitVec::from_i64(
            context,
            1,
            bit_width,
        );
        // `ite` is "if-then-else". If the operand
        // is zero, evaluate to one, otherwise
        // evaluate to zero.
        zero._eq(&operands[0]).ite(&one, &zero)
    }
}

The const component is the only component that uses immediates. It doesn’t have operands. Either its constant value is unbound, in which case we’re synthesizing the value, or we provide a value for it upon constructing the component:

struct Const(Option<u64>);

impl Component for Const {
    fn operand_arity(&self) -> usize {
        0
    }

    fn immediate_arity(&self) -> usize {
        if self.0.is_some() {
            0
        } else {
            1
        }
    }

    fn make_operator(
        &self,
        immediates: &[u64],
        _operands: &[Id]
    ) -> Operator {
        if let Some(val) = self.0 {
            Operator::Const(val)
        } else {
            Operator::Const(immediates[0])
        }
    }

    fn make_expression<'a>(
        &self,
        context: &'a z3::Context,
        immediates: &[BitVec<'a>],
        _operands: &[BitVec<'a>],
        bit_width: u32,
    ) -> BitVec<'a> {
        if let Some(val) = self.0 {
            BitVec::from_i64(
                context,
                val as i64,
                bit_width,
            )
        } else {
            immediates[0].clone()
        }
    }
}

pub fn const_(val: Option<u64>) -> Box<dyn Component> {
    Box::new(Const(val)) as _
}

Finally, a library is just a vector of components:

pub struct Library {
    pub components: Vec<Box<dyn Component>>,
}
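
For illustration, here is a minimal sketch of how a library for our running example might be assembled, using the const_ and add constructors shown above (the example’s remaining components would get analogous constructors):

// A small library: a constant-one component and an add component,
// built with the constructors defined earlier.
let library = Library {
    components: vec![
        const_(Some(1)),
        add(),
        // ...plus whatever other components the problem calls for,
        // e.g. xor and and components for the rightmost-zero-bit
        // example.
    ],
};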

Specifications

A specification looks quite similar to a component, but we only need to create its SMT constraints. We don’t need to construct Operators from it, so it doesn’t have an equivalent of the Component::make_operator method.

pub trait Specification {
    fn arity(&self) -> usize;

    fn make_expression<'a>(
        &self,
        context: &'a z3::Context,
        inputs: &[BitVec<'a>],
        output: &BitVec<'a>,
        bit_width: u32,
    ) -> Bool<'a>;
}
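
Programs are not the only possible specifications; any type can implement this trait. As a minimal hand-written sketch (using the same wrapper types as the rest of this post), here is a specification for a program that adds one to its single input:

struct AddOneSpec;

impl Specification for AddOneSpec {
    fn arity(&self) -> usize {
        // One input variable.
        1
    }

    fn make_expression<'a>(
        &self,
        context: &'a z3::Context,
        inputs: &[BitVec<'a>],
        output: &BitVec<'a>,
        bit_width: u32,
    ) -> Bool<'a> {
        // The output must equal the input plus one.
        let one = BitVec::from_i64(context, 1, bit_width);
        output._eq(&inputs[0].bvadd(&one))
    }
}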

We want to use existing, unoptimized programs as specifications for new, optimized programs, and so we implement Specification for Program.

The arity of a program specification is how many input variables the program defines with the var operator:

impl Specification for Program {
    fn arity(&self) -> usize {
        self.instructions
            .iter()
            .take_while(|inst| {
                inst.operator == Operator::Var
            })
            .count()
    }

    // ...
}

To create a Z3 expression for a whole program, we build up subexpressions instruction by instruction in a table. This table serves as our lexical environment. By inspecting an operation’s operand Ids, we pluck the corresponding subexpressions out of this table to pass as operands into the instruction’s Component::make_expression implementation. The final entry in the table is the expression for the whole program, since the program implicitly returns its last value.

impl Specification for Program {
    // ...

    fn make_expression<'a>(
        &self,
        context: &'a z3::Context,
        inputs: &[BitVec<'a>],
        output: &BitVec<'a>,
        bit_width: u32,
    ) -> Bool<'a> {
        // `exprs[i]` is the Z3 subexpression for the
        // value defined on the `i`th instruction of
        // this program.
        let mut exprs: Vec<_> = inputs
            .iter()
            .cloned()
            .collect();

        let mut operands = vec![];

        for instr in self
            .instructions
            .iter()
            .skip(inputs.len())
        {
            // All `const`s in the program already have
            // a bound value, so no instruction will
            // require new immediates.
            let immediates = [];

            // Gather the subexpressions for each
            // operand that this operator is using.
            operands.clear();
            instr
                .operator
                .operands(|Id(i)| {
                    let v = exprs[i as usize].clone();
                    operands.push(v);
                });

            // Create the Z3 expression for this
            // instruction and push it onto `exprs`.
            exprs.push(
                instr
                    .operator
                    .make_expression(
                        context,
                        &immediates,
                        &operands,
                        bit_width,
                    )
            );
        }

        // The last instruction defines our return
        // value, and it should be equal to the
        // `output` variable.
        exprs.last().unwrap()._eq(output)
    }
}

The Synthesizer

The Synthesizer structure solves one synthesis problem for us. It takes a library and spec upon construction; we can then configure how it will run with a few builder-style methods, and finally call its synthesize method to actually kick off the CEGIS loop.

/// A program synthesizer for creating a component-
/// based, loop-free program.
pub struct Synthesizer<'a> {
    // ...
}

impl<'a> Synthesizer<'a> {
    /// Create a new synthesizer with the given
    /// library and specification.
    ///
    /// After creation, we have the opportunity
    /// to configure the synthesis process with
    /// the various builder-style methods.
    pub fn new(
        context: &'a z3::Context,
        library: &'a Library,
        spec: &'a dyn Specification,
    ) -> Result<Self> {
        // ...
    }

    /// Configure whether we should require synthesis
    /// to find the minimal-length program that
    /// satisfies the specification.
    ///
    /// This produces the optimally smallest possible
    /// program, but it tends to take more time.
    pub fn should_synthesize_minimal_programs(
        &mut self,
        should: bool,
    ) -> &mut Self {
        // ...
    }

    /// Configure the timeout.
    ///
    /// No timeout means that we will keep going forever
    /// if necessary. Providing a number of milliseconds
    /// means we will have a soft maximum runtime of
    /// that many milliseconds before giving up.
    pub fn set_timeout(
        &mut self,
        milliseconds: Option<u32>,
    ) -> &mut Self {
        // ...
    }

    /// Run this configured synthesizer to find a
    /// program that satisfies the specification!
    pub fn synthesize(
        &mut self,
    ) -> Result<Program> {
        // ...
    }
}

This allows us to create, configure, and run a synthesis problem like this:

let program = Synthesizer::new(&context, &library, &spec)
    .should_synthesize_minimal_programs(true)
    .set_timeout(Some(60 * 60 * 1000))
    .synthesize()?;

Location Mappings

A location mapping uses a bunch of line indices to represent a component-based program. Lines are something we can represent directly in SMT expressions, and so location mappings are how we will encode programs for synthesis.

We have the choice of either encoding a line as a mathematical integer, or as a bitvector. I originally used mathematical integers, and perhaps unsurprisingly, it is way slower than using a bitvector. By using a bitvector representation, we can use the minimal number of bits required for our library of components. For example, if there is one program input and seven components in the library, the largest zero-based line number we will ever need is seven, which means we only need bitvectors that are three bits wide to represent every line. The narrower the bitvector, the faster Z3 can work with it. I haven’t seen line representation mentioned in any of the papers I’ve read, but I’m sure it is tribal knowledge that “everyone knows.”

// The index of a line that a component is located
// at, or gets a parameter from.
type Line<'a> = BitVec<'a>;
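
The bit-width computation itself is tiny. As a rough sketch (the helper name here is just for illustration; the synthesizer only stores the resulting line_bit_width):

// The minimal number of bits needed to represent any zero-based
// line index in a program with `num_lines` lines.
fn min_line_bit_width(num_lines: u32) -> u32 {
    let max_index = num_lines.saturating_sub(1);
    // The number of significant bits in `max_index`, at least one.
    (32 - max_index.leading_zeros()).max(1)
}

// One input plus seven components is eight lines, so the largest
// zero-based line index is seven, and `min_line_bit_width(8) == 3`.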

Location mappings are used differently at different phases of CEGIS. Before we synthesize a program, they are a collection of unbound variables in an SMT query that we’re constructing. After synthesis, those variables get bound to concrete values and we only want to work with the values from then on; we don’t need the variables anymore. Therefore, we split the location mapping into two distinct representations:

  1. Before and during finite synthesis we have LocationVars, which is a collection of line variables.

  2. After finite synthesis, we pull the values assigned to those variables out into an Assignments structure that is a collection of index values.

A LocationVars has line variables for the program inputs \(\vec{I}\), for all the components’ parameters \(\textbf{P}\), for all the components’ temporary results \(\textbf{R}\), and for the whole program’s output \(O\). It also keeps track of how wide our line location bitvectors are.

struct LocationVars<'a> {
    // Variables for lines assigned to program inputs.
    // These are always assigned `0..inputs.len()`.
    inputs: Vec<Line<'a>>,

    // `params[i]` is the variable for the line that
    // defines the `i`th component parameter.
    params: Vec<Line<'a>>,

    // `results[i]` is the variable for the line on
    // which the `i`th component is placed in the
    // synthesized program, and therefore where the
    // `i`th component's temporary result is defined.
    results: Vec<Line<'a>>,

    // The variable for the line on which the program's
    // output is defined.
    output: Line<'a>,

    // The minimum bit width necessary to represent
    // any line location.
    line_bit_width: u32,
}

Unlike the original paper, we do not always take the program output from the last line in the synthesized location mapping. Instead, we allow setting the output line earlier in the program, essentially inserting early returns. This lets us (optionally) synthesize programs with optimal code size: we can find the earliest output line for which we can still synthesize a correct program. This little trick is taken from Souper.

The Assignments are all those same things, but instead of the variable for the line, we have the value of the line that we pulled out of the model the SMT solver gave us. It also has any synthesized immediate values.

struct Assignments {
    // Synthesized immediate values.
    immediates: Vec<u64>,

    // `params[i]` is the line in the program where the
    // `i`th parameter is defined.
    params: Vec<u32>,

    // `results[i]` is the line in the program where the
    // `i`th component is located, and therefore the
    // `i`th temporary result is defined.
    results: Vec<u32>,

    // The line where the program's final output is
    // defined.
    output: u32,
}

Verification

Verification takes the Assignments produced by finite synthesis and attempts to find a counterexample. If it finds one, then we need to continue the CEGIS loop. If it can’t find a counterexample, then we’ve found a solution and we’re done.

In the original paper, they represent the candidate program in the verification query with \(\phi_\mathrm{lib}\) and \(\phi_\mathrm{conn}\), essentially asking the solver to interpret the connections between components. This is what I originally did as well, but then I had a realization:

  • We already need to have the ability to turn an Assignments into a Program for when we find a solution.

  • We can already turn a Program into an SMT expression, since we use unoptimized programs as specifications for finding optimized ones.

That means we can convert our Assignments into a Program and then into an SMT expression. Why not verify that this SMT expression implies the specification directly? This means that

  • we don’t need to construct \(\phi_\mathrm{lib}\) and \(\phi_\mathrm{conn}\) during verification,

  • the solver doesn’t need to unify all the connections between components’ inputs and outputs in \(\phi_\mathrm{conn}\) to solve the verification query, and

  • this should generally create smaller queries, which should generally be faster to solve than larger queries.

This did end up saving a little bit of code in the implementation, but does not seem to have yielded a speed up for overall synthesis time. Probably because nearly all time is spent in finite synthesis, and not verification.

Anyways, here is the verification implementation:

impl<'a> Synthesizer<'a> {
    fn verification(
        &mut self,
        assignments: &Assignments,
        bit_width: u32,
    ) -> Result<Verification> {
        // Create fresh input and output variables.
        let inputs: Vec<_> = (0..self.spec.arity())
            .map(|_| {
                fresh_input(self.context, bit_width)
            })
            .collect();
        let output = fresh_output(
            self.context,
            bit_width,
        );

        // Convert the assignments into a program, and
        // then convert that into an SMT expression.
        let prog = assignments
            .to_program(self.spec.arity(), self.library)
            .make_expression(
                self.context,
                &inputs,
                &output,
                bit_width,
            );

        let spec = self
            .spec
            .make_expression(
                self.context,
                &inputs,
                &output,
                bit_width,
            );

        // Are there any inputs to our program for
        // which the specification is not upheld?
        let not_spec = spec.not();
        let query = prog.and(&[&not_spec]);
        let solver = self.solver();
        solver.assert(&query);

        match solver.check() {
            // We don't know whether there is a
            // counterexample or not. This typically
            // means that we hit a timeout limit that
            // was previously configured.
            z3::SatResult::Unknown => {
                Err(Error::SynthesisUnknown)
            }

            // There are no more inputs that don't
            // satisfy the spec! We're done!
            z3::SatResult::Unsat => {
                Ok(Verification::WorksForAllInputs)
            }

            // There still exist inputs for which the
            // synthesized program does not satisfy the
            // spec. Return the counterexample.
            z3::SatResult::Sat => {
                let model = solver.get_model();
                self.add_invalid_assignment(assignments);
                let inputs = eval_bitvecs(
                    &model,
                    &inputs,
                );
                Ok(Verification::Counterexample(inputs))
            }
        }
    }
}

In the case where verification finds a counterexample, we make a call to the add_invalid_assignment method. This is another trick taken from Souper: we remember assignments that work for some inputs, but not all of them, and explicitly forbid them in future finite synthesis queries. This saves the SMT solver from reconsidering these seemingly promising assignments, only to find once again that they don’t work for one of the example inputs.
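
The real method isn’t shown here, but a minimal sketch of add_invalid_assignment might look like this, assuming the not_invalid_assignments field and the and helper used elsewhere in this post (handling of immediates, and of the reset logic we’ll see later, is left out):

impl<'a> Synthesizer<'a> {
    fn add_invalid_assignment(
        &mut self,
        assignments: &Assignments,
    ) {
        // Build `location variable == assigned line` clauses for
        // every param and result location in this assignment.
        let mut same = vec![];
        for (var, val) in self
            .locations
            .params
            .iter()
            .zip(&assignments.params)
            .chain(
                self.locations
                    .results
                    .iter()
                    .zip(&assignments.results),
            )
        {
            let val = self
                .locations
                .line_from_u32(self.context, *val);
            same.push(var._eq(&val));
        }

        // Forbid this exact assignment in future finite
        // synthesis queries: `not(and(same...))`.
        let this_assignment = and(self.context, &same);
        let updated = self
            .not_invalid_assignments
            .and(&[&this_assignment.not()]);
        self.not_invalid_assignments = updated;
    }
}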

Finite Synthesis

Recall that finite synthesis searches for a location mapping that satisfies the specification for each of our example inputs, but might or might not satisfy the specification when given other inputs.

The first constraint on the location mapping is that it corresponds to a well-formed program. This constraint actually remains the same for every iteration of CEGIS, so we generate the constraint once when creating a new Synthesizer instance, and then reuse the already constructed constraint for each finite synthesis query. Similarly, we also store the location variables in the Synthesizer and reuse them across all iterations of the CEGIS loop.

pub struct Synthesizer<'a> {
    // ...
    locations: LocationVars<'a>,
    well_formed_program: Bool<'a>,
    // ...
}

Recall the definition of the \(\psi_\mathrm{wfp}\) well-formed program constraint:

\( \begin{align} \psi_\mathrm{wfp}(L) &= \bigwedge\limits_{p \in \textbf{P}} \left( 0 \leq l_p \lt M \right) \\ & \land \, \bigwedge\limits_{r \in \textbf{R}} \left( \lvert \vec{I} \rvert \leq l_r \lt M \right) \\ & \land \, \psi_\mathrm{cons}(L) \\ & \land \, \psi_\mathrm{acyc}(L) \end{align} \)

It says that each parameter location \(l_p\) falls within the range \(0 \leq l_p < M \), each temporary result location \(l_r\) falls within the range \(\lvert \vec{I} \rvert \leq l_r < M\), that the locations are consistent, and that there are no dataflow cycles.

Constructing the equivalent SMT expression in Z3 follows pretty much directly from that, except it is a bit noisier, since we’re passing contexts around and need to convert Rust usizes and u32s into Z3 bitvectors of the appropriate width:

impl<'a> LocationVars<'a> {
    fn well_formed_program(
        &self,
        context: &'a z3::Context,
        library: &Library,
        invalid_connections: &mut HashSet<(u32, u32)>,
    ) -> Bool<'a> {
        // We'll build up all of our well-formed program
        // constraints in this list, and then `and` them
        // all together at the end.
        let mut wfp = vec![];

        // A well-formed program is consistent.
        wfp.push(self.consistent(
            context,
            invalid_connections,
        ));

        // A well-formed program is acyclic.
        wfp.push(self.acyclic(context, library));

        let i_len = self.line_from_u32(
            context,
            self.inputs.len() as u32,
        );
        let m = self.line_from_u32(
            context,
            (self.results.len() +
                self.inputs.len()) as u32,
        );
        let zero = self.line_from_u32(context, 0);

        // The inputs are always assigned to the first
        // locations.
        for (i, l) in self.inputs.iter().enumerate() {
            let i = self.line_from_u32(
                context,
                i as u32,
            );
            wfp.push(l._eq(&i));
        }

        // All component parameter locations are in
        // the range `0 <= l < M`.
        for l in &self.params {
            // 0 <= l
            wfp.push(line_le(&zero, l));
            // l < M
            wfp.push(line_lt(l, &m));
        }

        // All component result locations are in the
        // range `|i| <= l < M`.
        for l in &self.results {
            // |i| <= l
            wfp.push(line_le(&i_len, l));
            // l < m
            wfp.push(line_lt(l, &m));
        }

        // `and` all of our constraints together.
        and(context, &wfp)
    }
}

Remember that the \(\psi_\mathrm{cons}\) consistency constraint ensures that we don’t assign two components to the same line:

\( \psi_\mathrm{cons}(L) = \bigwedge\limits_{x,y \in \textbf{R}, x \not\equiv y} \left( l_x \neq l_y \right) \)

Our consistent method creates the corresponding SMT expression for Z3, and additionally records pairs of location indices into the invalid_connections map. This map keeps a record of location variables that well-formed programs cannot have dataflow between. Therefore, we don’t need to add connection clauses for these pairs of location variables in our eventual implementation of the \(\psi_\mathrm{conn}\) connectivity constraint. Leaving those pairs out of the connectivity constraint keeps the SMT expression that much smaller, which should help Z3 solve it that much faster. Once again, this trick was taken from Souper.

impl<'a> LocationVars<'a> {
    fn consistent(
        &self,
        context: &'a z3::Context,
        invalid_connections: &mut HashSet<(u32, u32)>,
    ) -> Bool<'a> {
        // Build up all the consistency constraints
        // in this list.
        let mut cons = vec![];

        for (i, (index_x, loc_x)) in self
            .results_range()
            .zip(&self.results)
            .enumerate()
        {
            for (index_y, loc_y) in self
                .results_range()
                .zip(&self.results)
                .skip(i + 1)
            {
                // Can't have dataflow from a result to
                // another result directly, they need to
                // be connected through parameters, so
                // add this pair to the invalid
                // connections map.
                invalid_connections.insert(
                    (index_x, index_y)
                );

                // Components cannot be assigned to the
                // same line, so these location variables
                // cannot have the same value:
                // `loc_x != loc_y`.
                cons.push(loc_x._eq(loc_y).not());
            }
        }

        // `and` all the constraints together.
        and(context, &cons)
    }
}

The \(\psi_\mathrm{acyc}\) acyclicity constraint enforces that our synthesized location mappings don’t have dataflow cycles:

\( \psi_\mathrm{acyc}(L) = \bigwedge\limits_{i=0}^{N-1} \left( \bigwedge\limits_{p \in \vec{I}_i} \left( l_p < l_{O_i} \right) \right) \)

Our implementation of the acyclicity constraint doesn’t do anything fancy, and is just a direct translation of the constraints into an SMT expression:

impl<'a> LocationVars<'a> {
    fn acyclic(
        &self,
        context: &'a z3::Context,
        library: &Library,
    ) -> Bool<'a> {
        // Build up the constraints in this list.
        let mut acycs = vec![];

        let mut params = self.params.iter();

        for (result_loc, comp) in self
            .results
            .iter()
            .zip(&library.components)
        {
            for _ in 0..comp.operand_arity() {
                let param_loc = params.next().unwrap();

                // The component's param's location
                // must be less than the component's
                // result's location, meaning the
                // param is already defined before it
                // is used: `param_loc < result_loc`.
                acycs.push(
                    line_lt(param_loc, result_loc)
                );
            }
        }

        // `and` all the constraints together.
        and(context, &acycs)
    }
}

Our next steps are implementing the \(\psi_\mathrm{conn}\) connectivity constraint and the \(\phi_\mathrm{lib}\) library specification. These do change with each iteration of the CEGIS loop, because we instantiate them for each example input we’re working with. That means we can’t cache-and-reuse them, like we do with the well-formed program constraint.

The \(\psi_\mathrm{conn}\) connectivity constraint encodes the dataflow connections between components in the library. If two location variables refer to the same location, then the entities whose location is assigned by those variables must have the same value.

\( \psi_\mathrm{conn}(L, \vec{I}, O, \textbf{P}, \textbf{R}) = \bigwedge\limits_{x,y \in \vec{I} \cup \textbf{P} \cup \textbf{R} \cup \{ O \} } \left( \left( l_x = l_y \right) \implies \left( x = y \right) \right) \)

The connectivity method constructs a Z3 expression that implements this constraint. Previously, we had recorded pairs of location variables that cannot be connected directly. Now, we filter out any such connections from this constraint, keeping the SMT expression smaller, as explained earlier.
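
The is_invalid_connection helper referenced in the code below isn’t shown elsewhere; a minimal sketch, assuming the synthesizer holds on to the invalid_connections set that consistent populated, might be:

impl<'a> Synthesizer<'a> {
    fn is_invalid_connection(&self, x: u32, y: u32) -> bool {
        // `consistent` only inserts each pair in one order,
        // so check both orderings.
        self.invalid_connections.contains(&(x, y))
            || self.invalid_connections.contains(&(y, x))
    }
}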

impl<'a> Synthesizer<'a> {
    fn connectivity(
        &self,
        inputs: &[BitVec<'a>],
        output: &BitVec<'a>,
        params: &[BitVec<'a>],
        results: &[BitVec<'a>],
    ) -> Bool<'a> {
        // A table mapping a location variable's index
        // to the location variable and its associated
        // entity: `index -> (l[x], x)`.
        let locs_to_vars: Vec<_> = self
            .locations
            .inputs
            .iter()
            .zip(inputs)
            .chain(
                self.locations
                    .params
                    .iter()
                    .zip(params)
            )
            .chain(
                self.locations
                    .results
                    .iter()
                    .zip(results)
            )
            .chain(iter::once(
                (&self.locations.output, output)
            ))
            .collect();

        // Build up the constraints in this list.
        let mut conn = vec![];

        for (index_x, (l_x, x)) in locs_to_vars
            .iter()
            .enumerate()
        {
            for (index_y, (l_y, y)) in locs_to_vars
                .iter()
                .enumerate()
                .skip(index_x + 1)
            {
                // If these locations can't ever be
                // connected directly, don't bother
                // constraining them.
                if self.is_invalid_connection(
                    index_x as u32,
                    index_y as u32,
                ) {
                    continue;
                }

                // If these two location variables
                // refer to the same location, then
                // their entities are connected to
                // each other, and must have the
                // same value.
                conn.push(
                    l_x._eq(l_y).implies(&x._eq(y))
                );
            }
        }

        // `and` all the constraints together.
        and(self.context, &conn)
    }
}

Our final prerequisite before we can implement the whole finite synthesis query is the \(\phi_\mathrm{lib}\) library specification, which describes the behavior of the library as a whole:

\( \phi_\mathrm{lib}(\textbf{P}, \textbf{R}) = \phi_0(\vec{I}_0, O_0) \land \ldots \land \phi_{N-1}(\vec{I}_{N-1}, O_{N-1}) \)

Implementing the \(\phi_\mathrm{lib}\) library specification involves getting the immediates, operands, and result variables for each component, asking the component to make its SMT expression relating those variables to each other, and finally anding all those expressions together:

impl<'a> Synthesizer<'a> {
    fn library(
        &self,
        immediates: &[BitVec<'a>],
        params: &[BitVec<'a>],
        results: &[BitVec<'a>],
        bit_width: u32,
    ) -> Bool<'a> {
        // Build up each component's spec expression
        // in here.
        let mut lib = vec![];

        let mut immediates = immediates;
        let mut params = params;

        for (result, comp) in results
            .iter()
            .zip(&self.library.components)
        {
            // Chop the immediates for this component off
            // the front of the list.
            let (these_imms, rest) = immediates.split_at(
                comp.immediate_arity()
            );
            immediates = rest;

            // Chop the params for this component off the
            // front of the list.
            let (these_params, rest) = params.split_at(
                comp.operand_arity()
            );
            params = rest;

            // Create the spec for this component, and
            // make sure its result is constrained by it.
            lib.push(
                result._eq(&comp.make_expression(
                    self.context,
                    these_imms,
                    these_params,
                    bit_width,
                ))
            );
        }

        // `and` all of the component specs together.
        and(self.context, &lib)
    }
}

Ok, we finally have all the functions we need to implement finite synthesis! The finite synthesis query assigns values to our location variables such that the resulting mapping corresponds to a well-formed program, and that program satisfies our specification for each of our example inputs.

\( \begin{align} & \exists L, O_0, \ldots, O_{\lvert S \rvert - 1}, \textbf{P}_0, \ldots, \textbf{P}_{\lvert S \rvert - 1}, \textbf{R}_0, \ldots, \textbf{R}_{\lvert S \rvert - 1}: \\ & \qquad \psi_\mathrm{wfp}(L) \,\, \land \\ & \qquad \qquad \bigwedge\limits_{i=0}^{\lvert S \rvert - 1} \left( \phi_\mathrm{lib}(\textbf{P}_i, \textbf{R}_i) \land \psi_\mathrm{conn}(L, S_i, O_i, \textbf{P}_i, \textbf{R}_i) \land \phi_\mathrm{spec}(S_i, O_i) \right) \\ \end{align} \)

Our implementation of the finite synthesis query mostly follows the same structure as that formula:

  • We already have the well-formed program constraint cached and ready for reuse, so we don’t need to recreate it here.

  • We loop over each of our example inputs:

    • Create fresh program output variables, fresh component parameter variables, and fresh component result variables for each example.

    • Constrain each component’s parameters and output to behave as dictated by our library.

    • Connect cross-component variables together as described by the location mapping, with the connectivity constraint.

    • Require that the program specification is satisfied for this example input.

  • Additionally, we assign the program’s output’s location to the line we were given. This lets callers optionally force the synthesis of optimally small programs.

  • Finally, we run the query and parse the resulting location mapping’s assignments out of the solver’s model.

impl<'a> Synthesizer<'a> {
    fn finite_synthesis(
        &mut self,
        example_inputs: &HashSet<Vec<u64>>,
        output_line: u32,
        bit_width: u32,
    ) -> Result<Assignments> {
        let immediates = self.fresh_immediates(
            bit_width
        );

        // We'll build up the constraints that the
        // location assignments work for each input
        // in this list.
        let mut works_for_inputs = vec![];

        for inputs in example_inputs {
            // Convert the inputs into bitvectors.
            let inputs: Vec<_> = inputs
                .iter()
                .map(|i| BitVec::from_i64(
                    self.context,
                    *i as i64,
                    bit_width,
                ))
                .collect();

            // Create fresh params, results, and
            // output variables. These are the
            // existentially-quantified `P[i]`,
            // `R[i]`, and `O[i]` from the formula.
            let params = self.fresh_param_vars(
                bit_width
            );
            let results = self.fresh_result_vars(
                bit_width
            );
            let output = fresh_output(
                self.context,
                bit_width,
            );

            // The components behave as described by
            // the library.
            let lib = self.library(
                &immediates,
                &params,
                &results,
                bit_width,
            );
            works_for_inputs.push(lib);

            // The components are connected together as
            // described by the location variables.
            let conn = self.connectivity(
                &inputs,
                &output,
                &params,
                &results,
            );
            works_for_inputs.push(conn);

            // And given that, our specification for
            // desired program behavior is satisfied.
            let spec = self
                .spec
                .make_expression(
                    self.context,
                    &inputs,
                    &output,
                    bit_width,
                );
            works_for_inputs.push(spec);
        }

        let works_for_inputs: Vec<&_> = works_for_inputs
            .iter()
            .collect();

        // Enforce that the output is on the required
        // line.
        let output_line = self.locations.line_from_u32(
            self.context,
            output_line,
        );
        let output_on_line = self
            .locations
            .output
            ._eq(&output_line);

        // Put all the pieces together!
        let query = self
            .well_formed_program
            .and(&works_for_inputs)
            .and(&[
                &output_on_line,
                // Do not synthesize any of the
                // assignments we've already
                // discovered to be invalid.
                &self.not_invalid_assignments,
            ]);

        let solver = self.solver();
        solver.assert(&query);

        match solver.check() {
            // It is unknown whether an assignment
            // that fits the constraints exists or
            // not. Usually this is because a timeout
            // was configured and we ran out of time.
            z3::SatResult::Unknown => {
                Err(Error::SynthesisUnknown)
            }

            // There are no assignments for the
            // location variables that satisfy our
            // synthesis constraints. We can't create
            // a program that satisfies the spec with
            // this component library.
            z3::SatResult::Unsat => {
                Err(Error::SynthesisUnsatisfiable)
            }

            // Ok, we have an assignment for the
            // location variables that fits our
            // synthesis constraints! Pull them out
            // of the model and return them.
            z3::SatResult::Sat => {
                let model = solver.get_model();

                let immediates = eval_bitvecs(
                    &model,
                    &immediates,
                );

                let params = eval_lines(
                    &model,
                    &self.locations.params
                );

                let results = eval_lines(
                    &model,
                    &self.locations.results,
                );

                let output = eval_line(
                    &model,
                    &output_line,
                );

                Ok(Assignments {
                    immediates,
                    params,
                    results,
                    output,
                })
            }
        }
    }
}

The CEGIS Loop

When the synthesizer is configured to produce optimally small programs, we wrap the CEGIS loop in another loop, similar to how Souper does. Souper begins synthesizing at the shortest program length and then allows longer and longer programs until it finds an optimally short solution. In contrast, we start by synthesizing the longest programs that can be expressed with the given library first, and then try shorter and shorter programs from there. This is motivated by two insights:

  1. In practice, it appears that a failed search for a program of length n-1, because no such program exists, takes much longer than searching for and successfully finding a program of length n. By iterating longest to shortest target program lengths, we only hit the expensive case once.

  2. When we synthesize a longer-than-necessary program, often that program contains dead code. Running dead code elimination (DCE) on a loop-free program is super easy, and when the dead code is removed, it additionally lets us skip ahead a bunch of iterations in this loop.
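
That second point deserves a quick sketch. This is not the real Program::dce, but on a loop-free program the mark phase can be a single backwards pass over the instructions (actually dropping the dead instructions and renumbering the surviving Ids is elided):

impl Program {
    fn dce_sketch(&mut self) {
        // Nothing is live until proven otherwise, except the last
        // instruction, which defines the program's return value.
        let mut live = vec![false; self.instructions.len()];
        if let Some(last) = live.last_mut() {
            *last = true;
        }

        // Operands always refer to earlier lines, so one backwards
        // pass marks everything reachable from the return value.
        for i in (0..self.instructions.len()).rev() {
            if !live[i] {
                continue;
            }
            self.instructions[i].operator.operands(|Id(j)| {
                live[j as usize] = true;
            });
        }

        // Finally, remove the instructions that were never marked
        // live and remap the remaining operand `Id`s (elided).
    }
}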

When the synthesizer is not configured to produce optimally small programs, we adjust the range of target program lengths such that we only call into the CEGIS loop once. We accomplish this by setting the shortest length equal to the longest length.

impl<'a> Synthesizer<'a> {
    pub fn synthesize(
        &mut self,
    ) -> Result<Program> {
        let mut example_inputs = self
            .initial_concrete_inputs()?;

        // The length of the longest program we can
        // synthesize with this library.
        let longest = self.spec.arity() as u32
            + self.library.components.len() as u32;

        // The shortest length program we'll attempt to
        // synthesize.
        let shortest = if self
            .should_synthesize_minimal_programs
        {
            self.spec.arity() as u32 + 1
        } else {
            longest
        };

        // In practice, the cost of searching for a
        // program of length `n-1` when no such program
        // exists is much more expensive than
        // successfully searching for a program of
        // length `n`. Therefore, start synthesizing
        // longer programs first, shortening the length
        // as we iterate.
        let mut best = Err(Error::SynthesisUnknown);
        let mut target_length = longest;
        while target_length >= shortest {
            // Call into the CEGIS loop, synthesizing
            // a program of the current target length.
            match self.synthesize_with_length(
                target_length,
                &mut example_inputs,
            ) {
                // We found a program at this length --
                // it's our new `best`!
                //
                // Run DCE on it, potentially allowing us
                // to skip a few iterations of this loop,
                // and then search for a program that is
                // one instruction shorter than it.
                Ok(mut program) => {
                    program.dce();
                    target_length =
                        program.instructions.len() as u32
                        - 1;
                    best = Ok(program);

                    // Reset the invalid-assignments set,
                    // since an assignment that was an
                    // invalid program when returning
                    // line `i` might be valid when
                    // returning line `i-1`.
                    self.reset_invalid_assignments();
                    continue;
                }

                // We can't synthesize a program with the
                // target length, or we timed out. Return
                // our current `best` result, or if we
                // don't have one, return this error.
                err => return best.or_else(|_| err),
            }
        }

        // If we completed the loop without ever failing
        // to synthesize a program and then early
        // returning, that means we have a program of
        // length `shortest` on our hands. Return it!
        best
    }
}

Originally, this implementation chose the initial example inputs with a random number generator. After reading through the Souper sources, I saw that they present the program specification to the solver with all the input and output variables unbound, and then have the solver solve for example inputs itself. These inputs are presumably more “interesting” than what a random number generator might choose, which should in turn help finite synthesis come up with better solutions more quickly.

I implemented this technique for our synthesizer and additionally constrain the input variables to be distinct from any that we’ve already added to our initial inputs \(S\).

Here’s the logic formula for the find-an-example-input query:

\( \begin{align} & \exists \vec{I}, O: \\ & \qquad \phi_\mathrm{spec}(\vec{I}, O) \land \bigwedge\limits_{i=0}^{\lvert S \rvert - 1} \left( \vec{I} \neq S_i \right) \end{align} \)

And here’s its breakdown:

\( \exists \vec{I}, O: \) Find some inputs \(\vec{I}\) and output \(O\) such that
\( \phi_\mathrm{spec}(\vec{I}, O) \) our specification is satisfied
\(\land\) and
\( \bigwedge\limits_{i=0}^{\lvert S \rvert - 1} \left( \vec{I} \neq S_i \right) \) we don't already have those inputs in our list of example inputs \(S\).

The initial_concrete_inputs method constructs this query, passes it to Z3, and pulls each discovered example out of each query’s resulting model:

impl<'a> Synthesizer<'a> {
    fn initial_concrete_inputs(
        &mut self,
    ) -> Result<HashSet<Vec<u64>>> {
        // Shamelessly cargo cult'd from Souper.
        const NUM_INITIAL_INPUTS: usize = 4;

        let mut inputs: HashSet<Vec<u64>> =
            HashSet::with_capacity(NUM_INITIAL_INPUTS);

        // Create unbound inputs and output variables.
        let input_vars: Vec<_> = (0..self.spec.arity())
            .map(|_| fresh_input(
                self.context,
                FULL_BIT_WIDTH,
            ))
            .collect();
        let output_var = fresh_output(
            self.context,
            FULL_BIT_WIDTH,
        );

        let spec = self.spec.make_expression(
            self.context,
            &input_vars,
            &output_var,
            FULL_BIT_WIDTH,
        );

        for _ in 0..NUM_INITIAL_INPUTS {
            // Make sure that we find unique new
            // inputs by constraining the input
            // vars to be not equal to any of the
            // ones we've already found.
            let mut existing_inputs = vec![];
            for input_set in &inputs {
                // Build up all the expressions
                // `var[i] == input_set[i]` in
                // `eq_this_input`.
                let mut eq_this_input = vec![];
                for (inp, var) in input_set
                    .iter()
                    .zip(&input_vars)
                {
                    let inp = BitVec::from_i64(
                        self.context,
                        *inp as i64,
                        FULL_BIT_WIDTH,
                    );
                    eq_this_input.push(inp._eq(var));
                }
                let eq_this_input = and(
                    self.context,
                    &eq_this_input,
                );
                existing_inputs.push(eq_this_input);
            }

            // The new inputs should not be equal
            // to any of the existing inputs
            let not_existing_inputs = or(
                self.context,
                &existing_inputs,
            ).not();

            let query = spec.and(&[
                &not_existing_inputs,
            ]);

            let solver = self.solver();
            solver.assert(&query);

            match solver.check() {
                // If the result is unknown, try and
                // use the examples we've already got,
                // but if we don't have any, just bail
                // out now.
                z3::SatResult::Unknown => {
                    if inputs.is_empty() {
                        return Err(
                            Error::SynthesisUnknown
                        );
                    } else {
                        break;
                    }
                }

                // If the query is unsatisfiable, then
                // our spec is bogus.
                z3::SatResult::Unsat => {
                    return Err(
                        Error::SynthesisUnsatisfiable
                    );
                }

                // Found some example inputs, so pull
                // them out of the model and add them
                // to our set.
                z3::SatResult::Sat => {
                    let model = solver.get_model();
                    let new_inputs = eval_bitvecs(
                        &model,
                        &input_vars,
                    );
                    inputs.insert(new_inputs);
                }
            }
        }

        Ok(inputs)
    }
}

Alright so now we are ready to examine the implementation of our inner CEGIS loop!

Close readers will have noticed all the bit_width parameters being passed around everywhere and the usage of a FULL_BIT_WIDTH constant in earlier code samples. That’s because we implement the well-known optimization of synthesizing programs initially at narrow bit widths, and then verifying with wider and wider bit widths until we’ve verified at our target bit width: FULL_BIT_WIDTH, aka 32 in this implementation.

This works well because programs that are correct at, say, eight bits wide often work just as well at 32 bits wide. Note that this is “often” and not “always”: widening synthesized constants, in particular, can be troublesome. A constant synthesized to serve as the sign-bit mask at eight bits wide (0x80), for example, is no longer the sign-bit mask at 32 bits wide. Still, with many fewer bits for the solver to keep track of, synthesis at eight bits wide is a lot faster than synthesis at 32 bits wide.

When verifying a program synthesized at a narrower bit width, we verify at the narrow bit width, and then verify at double that bit width, and then we double the bit width again, etc… until we’ve reached the full bit width, at which point we’re done. Whenever verification finds a counterexample, from then on we synthesize at whatever bit width we’ve reached so far. That is, we’ll never narrow our current bit width again after widening it.

impl<'a> Synthesizer<'a> {
    fn synthesize_with_length(
        &mut self,
        program_length: u32,
        inputs: &mut HashSet<Vec<u64>>,
    ) -> Result<Program> {
        let output_line = program_length - 1;
        let mut bit_width = 2;

        'cegis: loop {
            // Finite synthesis finds some assignments
            // that satisfy the specification for our
            // example inputs at the current bit width.
            let assignments = self.finite_synthesis(
                inputs,
                output_line,
                bit_width,
            )?;

            // Now start the verification and bit width
            // widening loop.
            loop {
                // Verify the assignments at the current
                // bit width.
                match self.verification(
                    &assignments,
                    bit_width,
                )? {
                    // This assignment works for all
                    // inputs at the current bit width!
                    // If we're at our full bit width,
                    // then we have the answer and we're
                    // done. Otherwise, double our bit
                    // width and keep verifying at the
                    // new bit width.
                    Verification::WorksForAllInputs => {
                        if bit_width == FULL_BIT_WIDTH {
                            let program =
                                assignments.to_program(
                                    self.spec.arity(),
                                    self.library,
                                );
                            return Ok(program);
                        } else {
                            bit_width *= 2;
                        }
                    }

                    // Verification found a counter-
                    // example, so add it to our set of
                    // example inputs and try finite
                    // synthesis again.
                    Verification::Counterexample(
                        new_inputs
                    ) => {
                        inputs.insert(new_inputs);
                        continue 'cegis;
                    }
                }
            }
        }
    }
}

Results

We have a new program synthesizer, and the natural next question is: how well does it work? How does it compare to existing synthesizers? Can it solve as many synthesis problems as other synthesizers? Can it find programs as fast as other synthesizers can?

First off, our synthesizer can indeed find an optimal solution for isolating the rightmost zero bit in a word! However, there are multiple optimal solutions, and it doesn’t always find the one that’s exactly how we would probably write it by hand. According to our synthesizer’s cost model, which is just number of instructions used, a less-than-or-equal comparison that always evaluates to true is just as good as a const 1:

a  var           // a = 011010011
b  le_u a, a     // b = 000000001 (aka `b ← const 1`)
c  add a, b      // c = 011010100
d  xor c, a      // d = 000000111
e  and d, c      // e = 000000100
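
For reference, the classic hand-written formulation from Hacker’s Delight is ¬a & (a + 1): on the same example input, a + 1 = 011010100 and ¬a = 100101100, and their and is 000000100, the same result. Assuming a not component in the library, that version is also five instructions, so under our instruction-count cost model the le_u trick really is just as good.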

Phew!

The Synthesis of Loop-Free Programs paper by Gulwani et al defines a benchmark suite of 25 bitvector problems taken from the big book of bit-twiddling hacks, Hacker’s Delight. They use it to benchmark their program synthesizer, Brahma. Our running example of isolating the rightmost zero bit in a word is the seventh problem in the benchmark suite. Other benchmark problems include:

  • turning off the rightmost one bit in a word,

  • taking the average of two integers without overflow,

  • rounding up to the next power of two,

  • etc…

The problems are roughly ordered in increasing difficulty: the first problem is much easier than the last problem.

They use a standard library of twelve base components for the first seventeen problems. For the last eight problems, they augment that library with additional components, depending on the problem at hand.

I ported the benchmark problems into Programs that can be fed into our synthesizer as specifications, and created a small runner that times how long it takes our synthesizer to rediscover solutions to these problems.

Porting their standard library over to our synthesizer required some small changes. For example, our language does not allow immediates in the add operation, so their add1 component becomes two components in our library: an add component and a const 1 component. There are a couple other instances where they had a single component that corresponded to multiple operations. We don’t support that scenario, so I split those components into multiple components when porting them over to our synthesizer. Additionally, for the later, more difficult problems, I used the minimal library required to solve the problem, since these more difficult problems are prone to take quite a long time.

For these benchmarks, I’m using Z3 version 4.4.1, which is what my OS’s package manager provides. Brahma used the Yices solver, and my understanding is that Souper can plug in a few different solvers, including Z3.

Our goal here is only to get a general sense of our synthesizer’s performance. We are not precisely evaluating our synthesizer’s performance and comparing it with Brahma in excruciating detail because this isn’t an apples-to-apples comparison. But we should get a sense of whether anything is horribly off.

All problems are run with a timeout of one hour, meaning that if we can’t find a solution in that amount of time, we give up and move on to the next problem. For comparison, the longest that Brahma took on any problem was 46 minutes.

First, I measured how long it takes to synthesize a program of any length for each problem. Next, I measured how long it takes to synthesize an optimally small program for each problem. Finally, I measured how long it takes to synthesize both an optimally small program and the constant values used in each const operation, rather than being given the necessary constant value as part of the const component.

The results are summarized in the table below, alongside the results reported in the paper for their Brahma synthesizer. The timings are in units of seconds.

Benchmark   Brahma    Our Synthesizer
                      Any Length   Min. Length   Min. Length & Synth. Constants
p1          3.2       0.5          109.0         170.4
p2          3.6       1.1          136.0         99.4
p3          1.4       0.6          32.8          31.9
p4          3.3       0.5          100.5         130.1
p5          2.2       0.3          125.2         119.2
p6          2.4       0.6          96.3          87.7
p7          1.0       0.5          100.1         80.3
p8          1.4       1.0          90.5          77.6
p9          5.8       36.5         241.4         165.6
p10         76.1      6.8          34.2          42.1
p11         57.1      4.1          31.9          36.9
p12         67.8      11.5         30.6          52.8
p13         6.2       20.9         129.7         165.0
p14         59.6      85.1         1328.7        1598.8
p15         118.9     20.0         1897.3        1742.7
p16         62.3      25.2         3600.0*       3600.0*
p17         78.1      8.2          102.0         243.5
p18         45.9      1.5          27.4          33.5
p19         34.7      3.3          10.0          6.0
p20         108.4     757.5        3600.1*       3600.0*
p21         28.3      timeout      timeout       timeout
p22         279.0     timeout      timeout       timeout
p23         1668.0    Z3 crash     timeout       timeout
p24         224.9     timeout      timeout       timeout
p25         2778.7    timeout      timeout       timeout

* For these problems, the synthesizer found a solution, but not necessarily the optimally small solution. It timed out before it could complete the search and either discover or rule out the existence of smaller programs.

For the easier, early problems, our synthesizer does pretty well. Finding an optimally small program takes much longer than finding a program of any length. Synthesizing constants doesn’t seem to add much overhead to synthesis over finding an optimally small program. Our synthesizer hits some sort of cliff around problems 20 and 21, and run times start skyrocketing and running into the timeout ceiling.

Problem 23 is to synthesize an implementation of popcount. Brahma solves the problem in just under 28 minutes. The Souper folks report that they can solve this problem as well, although they do say that their synthesizer scales less well than Brahma does. When I was running this benchmark, Z3 repeatedly crashed while trying to solve a finite synthesis query. Ultimately, the synthesizer wasn’t able to complete the problem. I ran earlier versions of our synthesizer on this problem during development, when Z3 wasn’t crashing, and I was still never able to synthesize a program for popcount. I let the synthesizer run for nearly three days, and it couldn’t do it.

Maybe Yices is much better at solving these sorts of queries than Z3 is? Perhaps there are some well-known optimizations to make these queries easier for solvers to solve, and I’m just ignorant of them? Maybe the fact that compound components get split into multiple components in our library, resulting in a larger library and more location variables to solve for, is the root of the problem? I don’t know.

This performance cliff with larger synthesis problems is my one disappointment with this project. If anyone has ideas about why the synthesizer is falling off this cliff, or how to profile Z3 and dig into why it is struggling with these queries, I’d love to hear from you. Particularly if you have experience with this kind of program synthesis. SMT solvers are very much a black box to me right now.

Conclusion

Counterexample-guided iterative synthesis is a wonderfully clever way to transform a single problem that is too difficult for SMT solvers to solve, into multiple little problems, each of which off-the-shelf solvers can handle. Inverting the representation of a component-based program into a location mapping, so that it can be wholly represented inside an SMT query, is equally deft.

Program synthesis — writing programs that write other programs — is fun and delightful in a meta, ouroboros-y sense. It also foretells a future where we can mechanically generate large swaths of an optimizing compiler’s backend. A future where compilers emit actually optimal code sequences. I find that lofty promise exciting, but I’m also frustrated by my lack of insight into SMT solver performance. There do exist other approaches to program synthesis that only rely on SMT solvers for verification, and not for finite synthesis. However, it seems premature to give up on this approach when I haven’t yet reproduced the results reported for Brahma and Souper. Nonetheless, I remain optimistic, and I intend to keep digging in until I resolve these discrepancies.

Thanks for reading!

References

  • fitzgen/synth-loop-free-prog: The git repository for the synthesizer described in this post!

  • Synthesis of Loop-Free Programs by Gulwani et al: This is the paper that most of this post is based upon! A great paper, and you should read it if you found this interesting!

  • Program Synthesis by Gulwani et al: A survey paper summarizing the state of the art for program synthesis. A great way to quickly get an overview of the whole field!

  • A Synthesizing Superoptimizer by Sasnauskas et al: This paper describes Souper, a superoptimizer for LLVM that uses program synthesis at its core. Souper’s synthesizer is also based on the Gulwani paper, but extends it in a few different ways. I’ve looked through its sources quite a bit to find optimizations and tricks!

  • Automatic Generation of Peephole Superoptimizers by Bansal et al: The first paper describing how to generate a table-driven peephole optimizer offline with a superoptimizer.

  • Hacker’s Delight by Henry S. Warren: A book dedicated to concise and non-obvious bit twiddling tricks. It is an incredible book if you’re into that sort of thing! The benchmark problems presented in the Synthesis of Loop-Free Programs paper are taken from this book.

  • SyGuS: SyGuS, or Syntax-Guided Synthesis, formalizes a class of program synthesis problems, standardizes a language for describing them, and holds a yearly competition for synthesizers. The aim is to drive innovations in the field with a little friendly competition.

Niko MatsakisAsync Interview #4: Florian Gilcher

Hello! For the latest async interview, I spoke with Florian Gilcher (skade). Florian is involved in the async-std project, but he’s also one of the founders of Ferrous Systems, a Rust consulting firm that also does a lot of trainings. In that capacity, he’s been teaching people to use async Rust now since Rust’s 1.0 release.

Video

You can watch the video on YouTube. I’ve also embedded a copy here for your convenience:

One note: something about our setup meant that I was hearing a lot of echo. I think you can sometimes hear it in the recording, but not nearly as bad as it was live. So if I seem a bit spacey, or take very long pauses, you might know the reason why!

Prioritize stability, read/write traits

The first thing we discussed was some background on async-std itself. From there we started talking about what the Rust org ought to prioritize. Florian felt like having stable, uniform AsyncRead and AsyncWrite traits would be very helpful, as most applications are interested in having access to a “readable/writable thing” but don’t care that much where the bytes are coming from.

He felt that Stream, while useful, might be somewhat lower priority. The main reason was that while streams are useful, in many of the applications that he’s seen, there wasn’t as much need to be generic over a stream. Of course, having a standard Stream trait would still be of use, and would enable libraries as well, so it’s not an argument not to do it, just a question of how to prioritize.

Prioritize diagnostics perhaps even more

Although we’ve done a lot of work on it, there continues to be a need for improved error diagnostics. This kind of detailed ergonomics work may indeed be the highest priority overall.

(A quick plug for the async await working group, which has been steadily making progress here. Big thanks especially to tmandry, who has been running the triage meetings lately, but also (in no particular order) csmoe, davidtwco, gilescope, and centril – and perhaps others I’ve forgotten (sorry!).)

Levels of stability and the futures crate

We discussed the futures crate for a while, and in particular the question of whether we should be “stabilizing” traits by moving them into the standard library, or whether we can use the futures crate as a “semi-stable” home. There are obviously advantages either way.

On the one hand, there is no clearer signal of stability than adding something to libstd. On the other, the futures crate facade gives a “finer-grained” ability to talk about semver.

One thing Florian noted is that the futures crate itself, although it has evolved a lot, has always maintained an internal consistency, which is good.

One other point Florian emphasized is that people really want to be building applications, so in some ways the most important thing is simply to keep moving towards stability, letting them stop worrying about the sand shifting under their feet.

Deprioritize: Attached and detached streams

I asked Florian how much he thought it made sense to wait on things like streams until the GAT story is straightened out, so that we might have support for “attached” streams. He felt like it would be better to move forward with what we have now, and consider extensions later.

He noted an occasional tendency to try and create the single, perfect generic abstraction that can handle everything – while this can be quite elegant, it can sometimes also lead to really confusing interfaces that are complex to use.
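
For readers who haven’t followed the GAT discussions, here is a rough sketch of the two trait shapes being weighed. The trait names are made up for illustration; the “attached” variant is exactly the part that needs generic associated types, which were not yet stable at the time of this conversation.

    use std::pin::Pin;
    use std::task::{Context, Poll};

    // "Detached" stream: each yielded item is independent of the stream.
    // This is the shape of the futures crate's Stream trait today.
    trait DetachedStream {
        type Item;
        fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>)
            -> Poll<Option<Self::Item>>;
    }

    // "Attached" stream: an item may borrow from the stream itself, so its
    // type carries a lifetime. Expressing this needs generic associated
    // types (GATs).
    trait AttachedStream {
        type Item<'a> where Self: 'a;
        fn poll_next<'a>(self: Pin<&'a mut Self>, cx: &mut Context<'_>)
            -> Poll<Option<Self::Item<'a>>>;
    }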

Deprioritize: Special syntax for streams

I asked about syntactic support for generators, but Florian felt that it was too early to prioritize that, and that it would be better to focus first on the missing building blocks.
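
To illustrate what leaning on the existing building blocks looks like in the meantime, here is a small example of mine that builds a stream with futures::stream::unfold instead of any dedicated generator syntax.

    use futures::stream::{self, Stream};

    // The first `n` squares as a stream, built from the unfold combinator
    // rather than from any dedicated `yield`-style syntax.
    fn squares(n: u64) -> impl Stream<Item = u64> {
        stream::unfold(0u64, move |i| async move {
            if i < n {
                Some((i * i, i + 1))
            } else {
                None
            }
        })
    }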

The importance of building and discovering patterns

Florian felt that we’re now in a stage where we’re transitioning a little. Until now, we’ve been tinkering about with the most primitive layers of the async ecosystem, such as the Future trait, async-await syntax, etc. As these primitives are stabilized, we’re going to see a lot more tinkering with the “next level up” of patterns. These might be questions like “how do I stop a stream?”, or “how do I construct my app?”. But it’s going to be hard for people to focus on these higher-level patterns (and in particular to find new, innovative solutions to them) until the primitives even out.

As these patterns evolve, they can be extracted into crates and types and shared and reused in many contexts. He gave the example of the async-task crate, which extracts out quite a bit of the complexity of managing allocation of an async task. This allows other runtimes to reuse that fairly standard logic. (Editor’s note: If you haven’t seen async-task, you should check it out, it’s quite cool.)

Odds and ends

We then discussed a few other features and how much to prioritize them.

Async fn in traits. Don’t rush it; the async-trait crate is a pretty reasonable workaround in practice, and we can probably “get by” with that for quite a while. (A sketch of that workaround follows below.)

Async closures. These can likely wait too, but they would be useful for stabilizing convenience combinators. On the other hand, those combinators often come attached to the base libraries you’re using.
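
As a concrete illustration of the first point, this is roughly what the async-trait workaround looks like; the trait and types below are hypothetical, and the crate’s attribute does the heavy lifting.

    use async_trait::async_trait;

    // The attribute rewrites each `async fn` into a method returning a boxed
    // future, which is what makes this work on today's stable Rust.
    #[async_trait]
    trait KeyValueStore {
        async fn get(&self, key: &str) -> Option<String>;
    }

    struct InMemoryStore;

    #[async_trait]
    impl KeyValueStore for InMemoryStore {
        async fn get(&self, key: &str) -> Option<String> {
            // A real implementation would await some I/O here.
            Some(format!("value for {}", key))
        }
    }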

Communicating over the futures crate

Returning to the futures crate, I raised the question of how best to help convey its design and stability requirements. I’ve noticed that there is a lot of confusion around its various parts and how they are meant to be used.

Florian felt like one thing that might be helpful is to break apart the facade pattern a bit, to help people see the smaller pieces. Currently the futures crate seems a bit like a monolithic entity. Maybe it would be useful to give more examples of what each part is and how it can be used in isolation, or the overall best practices.
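
One way to read “breaking apart the facade” is simply to show the sub-crates on their own terms. A hedged illustration (the exact import paths are from my memory of the crates, so treat them as an assumption): each piece below is already published as its own crate and can be depended on and versioned independently of the futures umbrella.

    // Depending on the pieces directly instead of the `futures` umbrella crate.
    use futures_core::stream::Stream;        // just the core trait definition
    use futures_io::{AsyncRead, AsyncWrite}; // just the async I/O traits
    use futures_util::StreamExt;             // the combinator extension methods

    // Each bound below comes from its own small crate rather than the facade.
    async fn count_items<S, IO>(items: S, _io: IO) -> usize
    where
        S: Stream + Unpin,
        IO: AsyncRead + AsyncWrite,
    {
        items.count().await
    }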

Learning

Finally, I posed to Florian the question of how we can help people learn async coding. I’m very keen on the way that Rust manages to avoid hard-coding a single runtime, but one of the challenges that comes with that is that it is hard to teach people how to use futures without referencing a runtime.

We didn’t solve this problem (shocker that), but we did talk some about the general value in having a system that doesn’t make all the choices for you. To be quite honest I remember that at this point I was getting very tired. I haven’t listened back to the video because I’m too afraid, but hopefully I at least used complete sentences. =)

One interesting idea that Florian raised is that it might be really useful for people to create a “learning runtime” that is oriented not at performance but at helping people understand how futures work, or how their own applications behave. Such a runtime might gather a lot of data, do tracing, or otherwise help in visualizing. Reading back over my notes, I personally find that idea sort of intriguing, particularly if the focus is on helping people learn how futures work early on – i.e., I don’t think we’re anywhere close to the point where you could take a production app written against async-std and then have it use this debugging runtime. But I could imagine having a “learner’s runtime” that you start with initially, and then once you’ve got a feel for things, you can move over to more complex runtimes to get better performance.

Conclusion

I think the main points from the conversation were:

  • Diagnostics and documentation remain of very high importance. We shouldn’t get all dazzled with new, shiny things – we have to keep working on polish.
  • Beyond that, though, we should be working to stabilize building blocks so as to give more room for the ecosystem to flourish and develop. The AsyncRead/AsyncWrite traits, along with Stream, seem like plausible candidates.
    • We shouldn’t necessarily try to make those traits be as generic as possible, but instead focus on building something usable and simple that meets the most important needs right now.
  • We need to give time for people to develop patterns and best practices, and in particular to figure out how to “capture” them as APIs and crates. This isn’t really something that the Rust organization can do, it comes from the ecosystem, by library and application developers.

Comments?

There is a thread on the Rust users forum for this series.

Christopher Arnold: The Momentum of Openness - My Journey From Netscape User to Mozillian Contributor

Working at Mozilla has been a very educational experience over the past eight years.  I have had the chance to work side-by-side with many engineers at a large non-profit whose business and ethics are guided by a broad vision to protect the health of the web ecosystem.  How did I go from sitting in front of a computer screen in 1995 to working behind the scenes of the web now?  Below is my story of how my path wended from being a Netscape user to working at Mozilla, the heir to the Netscape legacy.  It's amazing to think that a product I used 24 years ago ended up altering the course of my life so dramatically thereafter.  But the world and the web were much different back then.  And the same is true for thousands of people with similar stories, who came together for a cause they believed in.

The Winding Way West

Like many people my age, I followed the emergence of the World Wide Web in the 1990’s with great fascination.  My father was an engineer at International Business Machines when the Personal Computer movement was just getting started.  His advice to me during college was to focus on the things you don't know or understand rather than the wagon-wheel ruts of the well trod path.  He suggested I study many things, not just the things I felt most comfortable pursuing.  He said, "You go to college so that you have interesting things to think about when you're waiting at the bus stop."  He never made an effort to steer me in the direction of engineering.  In 1989 he bought me a Macintosh personal computer and said, "Pay attention to this hypertext trend.  Networked documents is becoming an important new innovation."   This was long before the World Wide Web became popular in the societal zeitgeist.  His advice was prophetic for me.

After graduation, I moved to Washington DC and worked for a financial news wire that covered international business, US economy, World Trade Organization, G7, US Trade Representative, the Federal Reserve and breaking news that happened in the US capital.  This era stoked my interest in business, international trade and economics.  During my research (at the time, via a Netscape browser, using AltaVista search engine) I found that I could locate much of what I needed on the web rather than in the paid LexisNexis database, which I also had access to at the National Press Club in Washington, DC.

When the Department of Justice initiated its anti-trust investigation into Microsoft, for what was called anti-competitive practices against Netscape, my interest was piqued.  Philosophically, I didn’t particularly see what was wrong with Microsoft standing up a competing browser to Netscape.  Isn’t it good for the economy for there to be many competing programs for people to use on their PCs?  After all, from my perspective, it seemed that Netscape had held a monopoly on the browser space at the time.

Following this case was my first exposure to the ethical philosophy of the web developer community.  During the testimony, I learned how Marc Andreessen and his team of software developer pioneers had an idea that access to the internet (like the underlying TCP/IP protocol) should not be centralized, or controlled by one company, government or interest group.  And the mission behind the Mosaic and Netscape browsers had been to ensure that the web could be device and operating system agnostic as well.  This meant that you didn’t need to have a Windows PC or Macintosh to access it.

It was fascinating to me that there were people acting like Jiminy Cricket, Pinocchio's conscience, overseeing the future openness of this nascent developer environment.  Little did I know then that I myself was being drawn into this cause.  The more I researched about it, the more I was drawn in.  What I took away from the DOJ/Microsoft consent decree was the concept that our government wants to see our economy remain inefficient in the interest of spurring diversity of competitive economic opportunity, which it asserted would spur a plurality of innovations which could compete in the open marketplace to drive consumer choice and thereby facilitate lower consumer prices.  In the view of the US government, monopolies limit this choice, keep consumer prices higher, and stifle entrepreneurial innovation.  US fiscal and trade policy was geared toward the concept of creating greater open market access to the world markets, while driving prices for consumers lower in an effort to increase global quality of life for all participating economies it traded with.

The next wave of influence in my journey came from the testimony of the chairman of the Federal Reserve Bank in the US congress.  The Federal Reserve is the US Central Bank.  They would regularly meet at the G7 conference in Washington DC with the central bank heads of major economic influencing countries to discuss their centrally-managed interest rates and fiscal policies.  At the time, the Fed Chairman was Allan Greenspan.  Two major issues were top of the testimony agenda during his congressional testimonies in the late 1990’s.  First, the trade imbalances between the US (a major international importer) and the countries of Asia and South America (which were major exporters) who were seeking to balance out their trade deficits via the WTO and regional trade pacts.  In Mr. Greenspan’s testimonies, Congressional representatives would repeatedly ask whether the internet would change this trade imbalance as more of the services sector moved online.
As someone who used a dial-up modem to connect to the internet at home (DSL and cable/dish internet were not yet common at the time), I had a hard time seeing how services could offset a multi-billion dollar asymmetry between the US and its trading partners.  But at one of Mr. Greenspan’s sessions, Barney Frank (one of the legislators behind the "Dodd-Frank" financial reform bill which passed after the financial crisis) asked Mr. Greenspan to talk about the impact of electronic commerce on the US economy.  Mr. Greenspan, always wont to avoid stoking market speculation, dodged the question, saying that the Fed couldn’t forecast what the removal of warehousing costs could do to market efficiency, and therefore to markets at large.  This exchange stuck with me.  At the time they were discussing Amazon, a book seller which could avoid the typical overhead of a traditional retailer by eliminating brick and mortar store fronts and their burden of stocking inventory that consumers hadn't yet decided they wanted.  Amazon was able to source the books at the moment the consumer decided to purchase, which eliminated the inefficiency of retail.

It was at this time also that my company decided to transition its service to a web-based news portal as well.  In this phase, Mr. Greenspan cautioned against "irrational exuberance" where the stock market valuations of internet companies were soaring to dizzying proportions relative to the future value of their projected sales.  Amid this enthusiastic fervor, I decided that I wanted to move to Silicon Valley to enter the fray myself.  I decided that my contribution would be in conducting international market launches and business development for internet companies.
After a stint doing web development with a small design agency, I found my opportunity to pitch a Japanese market launch for a leading search engine called LookSmart, which was replicating the Inktomi-style distributed search strategy.  Distributed search was an enterprise business-to-business (B2B) model, providing search infrastructure for companies like Yahoo, Excite, MSN, AOL and other portals that had their own dedicated audiences.

After my company was reasonably successful in Japan, Yahoo! Japan took interest in acquiring the company, and I moved back to the US to work with Yahoo! on distributing search services to other countries across Asia Pacific.  In parallel, Netscape had followed a bumpy trajectory.  AOL purchased the company and tried to fold it into its home internet subscriber service.  America Online (AOL) was a massively popular dialup modem service in the US at the time.  AOL had a browser of their own too.  But it was a "walled-garden" browser that tried to give users their "daily clicks" like news, weather and email, but didn't promote the open web.  It's easy to understand their perspective.  They wanted to protect their users from the untamed territory of the world wide web, which at the time they felt was too risky for the untrained user to venture out into.  It was a time of a lot of Windows viruses, pop-ups, scams, and few user protections.  AOL's stock had done really well based on their success in internet connectivity services.  Once AOL's stock valuation surpassed Netscape's valuation, they were able to execute an acquisition. 

The team at Netscape may have been disappointed that their world-pioneering web browser was being acquired by a company that had a sheltered view of the internet and a walled-garden browser, even if AOL had been pioneers in connecting the unconnected.  It may have been a time of a lot of soul searching for Marc Andreessen's supporters, considering that the idea of Netscape had been one of decentralization, not corporate mergers.

A group of innovators inside AOL argued that a world dominated by Microsoft's IE browser was a risky future for the open, competitive ecosystem of web developers.  So they persuaded the AOL executive team to set up a skunk-works team inside AOL to atomize the Netscape Communicator product suite into component parts that could then be uploaded into a modular, hierarchical bug triage tree, called Bugzilla, so that people outside of AOL could help fix code problems that were too big for internal AOL teams alone to solve.  There is a really good movie about this phase in AOL's history called "Code Rush."

The Mozilla project grew inside AOL for a long while beside the AOL and Netscape browsers.  But at some point the executive team believed that this needed to be streamlined.  Mitchell Baker, an AOL attorney, Brendan Eich, the inventor of JavaScript, and an influential venture capitalist named Mitch Kapor came up with a suggestion that the Mozilla project should be spun out of AOL.  Doing this would allow all of the enterprises who had an interest in working on open source versions of the project to foster the effort, while the Netscape/AOL product team could continue to rely on any code innovations for their own software within the corporation.

A Mozilla in the wild would need resources if it were to survive.  First, it would need to have all the patents that were in the Netscape patent portfolio to avoid hostile legal challenges from outside.  Second, there would need to be a cash injection to keep the lights on as Mozilla tried to come up with the basis for its business operations.  Third, it would need protection from take-over bids that might come from AOL competitors.  To achieve this, they decided Mozilla should be a non-profit foundation with the patent grants and trademark grants from AOL.  Engineers who wanted to continue to foster AOL/Netscape vision of an open web browser specifically for the developer ecosystem could transfer to working for Mozilla. 

Mozilla left Netscape's crowdsourced web index (called DMOZ or open directory) with AOL.  DMOZ went on to be the seed for the PageRank index of Google when Google decided to split out from powering the Yahoo! search engine and seek its own independent course.  It's interesting to note that AOL played a major role in helping Google become an independent success as well, which is well documented in the book The Search by John Battelle.

Once the Mozilla Foundation was established (along with a $2 Million grant from AOL) they sought donations from other corporations who were to become dependent on the project.  The team split out Netscape Communicator's email component as the Thunderbird email application as a stand-alone open source product and the Phoenix browser was released to the public as "Firefox" because of a trademark issue with another US company on usage of the term "Phoenix" in association with software. 

Google had by this time broken off from its dependence on Yahoo! as a source of web traffic for its nascent advertising business.  They offered to pay the Mozilla Foundation for search traffic that the browser could route to Google preferentially over Yahoo! or the other search engines of the day.  Taking "revenue share" from advertising was not something that the non-profit Mozilla Foundation was particularly well set up to do.  So they needed to structure a corporation that could ingest these revenues and re-invest them into a conventional software business that could operate under the contractual structures of partnerships with other public companies.  The Mozilla Corporation could function much like any typical California company with business partnerships, without requiring its partners to structure their payments as grants to a non-profit.

When Firefox emerged from the Mozilla team, it rapidly spread in popularity, in part because they did clever things to differentiate their browser from what people were used to in the Internet Explorer experience such as letting their users block pop-up banners or customize their browser with add-ons.  But the surge in its usage came at a time when there was an active exploit capability in IE6 that allowed malicious actors to take-over the user's browser for surveillance or hacking in certain contexts.  The US government urged companies to stop using IE6 and to update to a more modern browser.  It was at this time I remember our IT department at Yahoo! telling all its employees to switch to Firefox.  And this happened across the industry.

Naturally, as Firefox market share grew, because Mozilla was a non-profit, it had to reinvest all proceeds from its growing revenues back into web development and new features, so it began to expand outside the core focus of JavaScript and browser engines.  As demand for alternative web browsers surged, several Mozillians departed to work on alternative browsers.  The ecosystem grew suddenly with Apple and Google launching their own browsers.  As these varied browsers grew, the companies collaborated on standards that all their software would use, to ensure that web developers didn't have to customize their websites to address the idiosyncrasies of each browser consumers had a choice of.

When I joined Mozilla, there were three major issues that were seen as potential threats to the future of the open web ecosystem: 1) the "app-ification" of the web that was coming about on new phones, and how they encapsulated parts of the web; 2) the proliferation of dynamic web content that was locked in behind fragmented social publishing environments; and 3) the proliferation of identity management systems using social logins that were cumbersome for web developers to utilize.  Mozilla, like a kind of vigilante super hero, tried to create innovative tactics and propose technologies to address each one of these.  It reminded me of the verve of the early Netscape pioneers in trying to organize an industry toward solving the problems the entire ecosystem was facing.
To discuss these different threads, it may be helpful to look at what had been transforming the web in the years immediately prior.

What the Phone Did to the Web and What the Web Did Back

The web is generally based on HTML, CSS and JavaScript. A web developer would publish a web page once, and those three components would render the content of the webpage to any device with a web browser.  What we were going into in 2008 was an expansion of content publication technologies, page rendering capabilities and even devices which were making new demands of the web.  It was obvious to us at Yahoo! at the time that the industry was going through a major phase shift.  We were building our web services on mashups of content from many different sources.  The past idea of the web was based on static webpages that were consistent for all viewers.  What we were going toward was a spontaneously assembled web.  The concept of the static, consistent web of the 1990s was referred to as "web 1.0" in the web development community.  The new style was frequently called "mash-up" or "re-mix", using multi-paned web pages that would assemble multiple discrete sources of content at the time of page load.  We called this AJAX, for asynchronous JavaScript and XML (extensible markup language), which allowed personalized web content to be rendered on demand.  Web pages of this era appeared like dashboards and would be constantly refreshing elements of the page as the user navigated within panes of the site.

In the midst of this shift to the spontaneously assembled dynamic web, Apple launched the iPhone.  What ensued immediately thereafter was a kind of developer confusion as Apple started marketing the concept that every developer wishing to be included in its phones needed to customize content offerings as an app tailored to the environment of the phone.  It was a kind of exclusion where web developers had to break their sites into smaller chunks for ease of consumption in a smaller form factor and a different user context than the desktop environment.

The launch of the iPhone, which sought to combine this wave of the personalized dynamic web with location-based content discovery, seemed an outright rethinking of the potential of the web at large.  I was working on the AT&T partnership with Yahoo! at the time when Apple had nominated AT&T to be the exclusive carrier of choice for the iPhone launch.  Yahoo! had done its best to bring access to web content on low-end phones that industry professionals referred to as “feature phones”.  But these devices’ view of the web was incredibly restricted, like the AOL browser of the early web. Collaborating with Yahoo! Japan, we brought a limited set of mobile-ready web content to the curated environment of the NTT Docomo “iMode” branded phones.  We tried to expand this effort to the US.  But it was not a scalable approach.  The broader web needed to adapt to mobile.  No curatorial effort to bootstrap a robust mobile web would achieve broad adoption.

The concept behind the iPhone was to present the breadth of the web itself to the phone of every person.  In theory, every existing webpage should be able to render to the smaller screen without needing to be coded uniquely.  Håkon Wium Lie had created the idea of CSS (the cascading style sheet) which allowed an html coded webpage to adapt to whatever size screen the user had.  Steve Jobs had espoused the idea that content rendered for the iPhone should be written in html5. However, at the time of the phone’s release, many websites had not yet adapted their site to the new standard means of developing html to be agnostic of the device of the user.  Web developers were very focused on the then-dominant desktop personal computer environment. While many web pioneers had sought to push the web forward into new directions that html5 could enable, most developers were not yet on board with those concepts. So the idea of the “native mobile app” was pushed forward by Apple to ensure the iPhone had a uniquely positive experience distinct from the experience every other phone would see, a poorly-rendered version of a desktop focused website.

The adoption of the modern web architecture that existed in html5 hadn't reached broad developer appeal at the time that the market opportunity of iPhone and Android emerged.  Mozilla saw it as a job that it could tackle: The de-appification of the app ecosystem.  Watching this ambitious project was awe inspiring for everyone who contributed to the project at the time.  Mozilla's Chief Technical Officer, Brendan Eich, and his team of engineers decided that we could make a 100% web phone without using the crutch of app-wrappers.  The team took an atomized view of all the elements of a phone and sought to develop a web-interface to allow each element of the device to speak web protocols such that a developer could check battery life, status of motion, gesture capture or other important signals relevant to the mobile user that hadn't been utilized in the desktop web environment.  And they did it.  The phone was app-less with everything running in JavaScript on user demand.  The phones launched in 28 countries around the world. I worked on the Brazilian market launch, where there was dizzy enthusiasm about the availability of a lower cost smart phone based on open source technology stack. 

As we prepared for the go-live of the “FirefoxOS” phone launch in Brazil, the business team coordinated outreach through the largest telecommunications carriers to announce availability (and shelf space in carrier stores) for the new phones, as I and the Mozilla contributors in Brazil reached out to the largest websites in the country to “consent” to listing their sites as web applications on the devices.  Typically, when you buy a computer, web services and content publishers aren’t “on” the device; they are just accessible via the device’s browser.  But iPhone and Android’s trend of “appification” of web content was so embedded in people’s thinking that many site owners thought they needed to do something special to be able to provide content and services to our phone’s users.  Mozilla therefore borrowed the concept of a “marketplace”, which was a web index of sites that had posted their site’s availability to FirefoxOS phone users.

Steve Jobs was a bit haunted by the app ecosystem he created.  It became a boon for his company, with Apple being able to charge a toll of $.99 or more for content that was already available on the internet for free.  But he urged the developer community to embrace html5 even while most developers were just plopping money down to wrap their web content in the iTunes-required app packaging.  (The iPhone grew out of the Apple music player project called iPod, which is why every app the phone needed had to be installed from the music player application “iTunes” that Apple included on every device it sold for distributing music and podcasts.)  Companies such as PhoneGap and Titanium popped up to shim web content to the app packaging frameworks required by the Google-acquired Android platform and Apple iTunes.  But the idea of using shims and app wrappers was an inelegant way to advance the gradual embrace of the open web. Something needed to change to de-appify the untidy hacks of the Jobs era.  And this continues to this day.

Mozilla’s engineers suggested that there shouldn’t be the concept of a “mobile web”.  And we should do everything we can to persuade web developers and content publishers to embrace mobile devices as “1st class citizens of the web.”  So they hearkened back to the concepts in CSS, a much earlier development of web architecture mentioned previously, and championed the concept of device-aware responsive web design with a moniker of “Progressive Web Apps.”  The PWA concept is not a new architecture per se.  It’s the idea that a mobile-enhanced internet should be able to do certain things that a phone wielding user expects it to do.  So a webmaster should take advantage of certain things a user on the move might expect differently from a user sitting at a desktop computer.  PWA work is being heavily championed by Google for the Android device ecosystem now, because it is larger than the iPhone ecosystem, and also because Google understands the importance of encouraging the seamless experience of web content agnostic of which device you happen to possess.

After the launch of the phone, because Mozilla open sources its code, many other companies picked up and furthered the vision.  Now the operating system has been forked into TVs, smart watches, micro-computers and continues to live on in phones under different brand names to this day.  In addition, the project of the atomized phone with hardware elements that can speak https for networking with other devices is now expanded to the current Internet of Things project in Mozilla’s Emerging Technologies group to bring the hardware products we buy (which all speak relatively incompatible radio frequencies) to the common lingua franca of the protocols of the internet.  Not everyone has a Mozilla phone in their pocket. But that was never a goal of the project. 
This brings me to one of the concepts that I appreciate most about Mozilla and the open source community.  An idea can germinate in one mind, be implemented in code, then set free in the community of open source enthusiasts.  Then, anyone can pick it up and innovate upon it. While the open sourcing of Netscape wasn’t the start of this movement, it has contributed significantly to the practice.  The people who created the world wide web continue to operate under the philosophy of extensibility. The founders of Google’s Chromium project were also keen Mozillians. The fact that a different company, with a different code base, created a similarly thriving open source ecosystem of developers aiming to serve the same user needs as Mozilla’s is the absolute point of what Mozilla’s founders set out to promote in my view.  And it echoes those same sentiments I’d heard expressed back in Washington, DC back in the early 1990’s. 

One of the things that I have studied a great deal, with fervor, fascination and surprise, was the concept of the US patent system.  Back in the early days of the US government, Secretary of State Jefferson created the concept of a legal monopoly. It was established by law for the government first, then expanded to the broader right of all citizens, and later all people globally via the US Patent and Trademark Office.  I had an invention that I wished to patent and produce for the commercial market. My physics professor suggested that I not wait until I finish my degree to pursue the project. He introduced me to another famous inventor from my college and suggested I meet with him. Armed with great advice I went to the USPTO to research prior art that might relate to my invention.  Upon thorough research, I learned that anyone in the world can pursue a patent and be given a 17 year monopoly option to protect the invention while the merits of the market fit could be tested. Thereafter, the granted patent would belong as open source, free of royalties to the global community. “What!?” thought I. I declare the goods to the USPTO so they can give it away to all humanity shortly thereafter once I did all the work to bring it to market?  This certainly didn’t seem like a very good deal for inventors in my view.  But it also went back to my learnings about why the government prefers certain inefficiencies to propagate for the benefit of the greater common good of society.  It may be that Whirlpool invented a great washing machine. But Whirlpool should only be able to monopolize that invention for 17 years before the world at large should be able to reap the benefits of the innovation without royalties due to the inventor.

My experiences with patents at Yahoo! were also very informative.  Yahoo! had regularly pursued patents, including for one of the projects I launched in Japan.  But their defense of patents had been largely in the vein of the “right to operate” concept in a space where their products were similar to those of other companies which also had patents or amicable cross-licensing with other organizations that operated in a similar space.  (I can’t speak for Yahoo!’s philosophical take on patents as I don’t represent them. But these opinions stem from how I observed them enforcing their patent rights for formally granted USPTO patents and how they exercised those rights in the market.) I believed that the behaviors of Yahoo!, AOL and Google were particularly generous and lenient.  As an inventor myself, I was impressed with how the innovators of Silicon Valley, for the most part, did not pursue legal action against each other. It seemed they actually promoted the iteration upon their past patents. I took away from this that Silicon Valley is more innovation focused than business focused. When I launched my own company, I asked a local venture capitalist whether I should pursue patents for a couple of the products I was working on .  The gentleman who was a partner at the firm said, paraphrasing: “I prefer action over patents. Execute your business vision and prove the market value. Execution is more valuable than ideas. I’d rather invest in a good executor than an inventor.” And from the 20 years I’ve seen here, it always seems to be the fast follower rather than the inventor who gets ahead, probably precisely because they focus on jumping directly to execution rather than spending time scrawling protections and illustrations with lawyers.

Mozilla has, in the time I’ve worked with them, focused on implementing first in the open, without thinking that an idea needed to be protected separately.  Open source code exists to be replicated, shared and improved. When AOL and the Mozilla project team open sourced the code for Netscape, it was essentially opening the patent chest of the former Netscape intellectual property for the benefit of all small developers who might wish to launch a browser without the cumbersome process of watching out for the licenses for the code provenance.  Bogging down developers with patent “encumbered” code would slow those developers from seeking to introduce their own unique innovations. Watching a global market launch of a new mobile phone based on entirely open source code was a phenomenal era to witness. And it showed me that the benevolent community of Silicon Valley’s innovators had a vision much akin to those of the people I’d witness in Washington DC.  But this time I’d seen it architected by the intentional acts of thousands of generous and forward-thinking innovators rather than through the act of legislation or legal prompting of politicians.

The Web Disappears Behind Social Silos

The web 2.0 era, with its dynamically assembled web pages, was a tremendous advance for the ability of web developers to streamline user experiences.  A page mashed-up of many different sources could enhance the user’s ability to navigate across troves of information that would take a considerable amount of time to click and scroll through.  But something is often lost when something else is gained. When Twitter introduced its micro-blog platform, end users of the web were able to publicize content they curated from across the web much faster than having to author full blog posts and web pages about content they sought to collate and share.  Initially, the Twitter founders maintained an open platform where content could be mashed-up and integrated into other web pages and applications. Thousands of great new visions and utilities were built upon the idea of the open publishing backbone it enabled. My own company and several of my employers also built tools leveraging this open architecture before the infamous shuttering of what the industry called “The Twitter Firehose”.  But it portended a phase shift yet again of the very nascent era of the newly invented social web. The Twitter we knew of became a diaspora of sorts as access to the firehose feed was locked down under identity-protecting logins. This may be a great boon to those seeking anonymity and small “walled gardens” of circles of friends. But it was not particularly good for what many of the innovators of the web 2.0 era had hoped for: the greater enfranchisement of web citizenry.
Many of the early pioneers of the web wanted to foster a web ecosystem where all content linked on the web could be accessible to all, without hurdles on the path that delayed users or obscured content from being sharable.  Just as the app-ified web of the smartphone era cordoned off chunks of web content that could be gated by a paywall, the social web went into a further splitting into factions as the login walls descended around environments that users had previously been able to easily publish and share content across.

The parts of the developer industry that weren’t mourning the loss of the open-web components of this great social fragmentation were complaining about psychological problems that emerged once-removed from the underlying cause.  Fears of censorship and filter-bubbles spread through the news. The idea that web citizens now had to go out and carefully curate their friends and followers led to psychologists criticizing the effect of social isolation on one side and the risks of altering the way we create genuine off-line friendships on the other. 

Mozilla didn’t take a particular stance on the philosophical underpinnings of the social web.  In a way, the Bugzilla platform we used to build and maintain Firefox and Thunderbird was a purpose-built social network of collaboration, with up-voting and hierarchical structures.  But it was all open source, like the code commits it housed.  We did have discussions around the code of conduct of our Bugzilla community, geared to ensuring that it remained a collaborative environment where people from all walks of life and all countries could come together and participate without barriers or behaviors that would discourage or intimidate open participation.

But there were certain specific problems that social utilities introduced to the web architecture in terms of the code they required webmasters to integrate for them to be used.  So we focused on those. The first one we hit upon was the “log in with x” problem. In the United States, people love to watch race cars go around concrete tracks. They consider it a sport.  One of the most famous racing brands was called Nascar, which was famous for having their cars and drivers covered with small advertisements from their commercial sponsors.  As the social web proliferated, webmasters were putting bright icons on their websites with JavaScript prompts to sign in with five or more different social utilities.  We called this problem “Nascar” because the webmaster never knew which social site a user had an identity registered for. So if a user visited once and logged in with Twitter, and another time accidentally logged in with Facebook, their persona represented on the original site might be lost and irretrievable.  Mozilla thought this was something a browser could help with. If a user stored a credential, agnostic of which source, at the browser level, the user wouldn’t need to be constantly peppered with options to authenticate with 10 different social identities. This movement was well received initially, but many called for a more prominent architecture to show the user where their logged identities were being stored.  So Mozilla morphed the concept from “BrowserID” to the idea of a Firefox Accounts tool that could be accessed across all the different copies of Firefox a user had (on PCs, Phones, TVs or wherever they browsed the web.) Mozilla then allowed users to synchronize their identities across all their devices with highly reliable cryptography to ensure data could never be intercepted between any two paired devices.

Firefox Accounts has expanded to allow users to securely synchronize session history, browser extensions and preferences, and stored passwords (to prevent low-risk key-stroke logging for those who were paranoid about that), as well as file transmission with Firefox Send.  Over the years the Firefox team has experimented with many common utilities that add to user convenience for leveraging their saved account data.  And where Mozilla didn’t offer it, but an addon developer did, the Firefox Account could be used to synchronize those addon-based services as well.

The other great inconvenience of the social web was the number of steps necessary for users to share conventional web content on the social web.  Users would have to copy and paste URLs between browser windows if they wished to comment on or share web content. Naturally there was a Nascar solution for that as well: if the web developers for every web site would put in a piece of JavaScript that users could click to upvote or forward content, that would solve everything, right?  Yeah, sure. And it would also bog down the pages with lots of extraneous code that had to be loaded from different web servers around the internet as well. Turning every webpage into a Frankenstein hodge-podge of Nascar-ed promotions of Twitter and Facebook buttons didn’t seem like an elegant solution to Mozilla’s engineers either!

Fortunately, this was obvious to a large part of the web development and browser community as well.  So the innovative engineers of Mozilla, Google and others put their heads together on a solution that we could standardize across web browsers, so that every single website in the world didn’t have to code in a solution that was unique to every single social service provider.  The importance of this was also accentuated when the White House website integrated a social engagement platform that was found to be tracking visitors to the page with tiny code snippets that the White House itself hadn’t engineered.  People generally like the idea of privacy when they’re visiting web pages.  The idea that a visit to read what the president had to say came along with a tradeoff that readers were going to be subsequently tracked because of that visit didn’t appeal to the site visitors any more than it appealed to the US government!

To enable a more privacy-protecting web, yet preserve the convenience users sought in engaging with social utilities, Mozilla’s engineers borrowed a concept from the progressive web app initiative.  PWAs, which emulated the engagement metaphors of phone apps, utilized the concept of a user “intent”.  Just as a thermostat in a house expresses its setting as a “call for heat” from the house’s furnace, there needed to be a means for a user to express an “intent to share”.  And as phone manufacturers had enabled the concept of sharing at the operating system level for the applications users leveraged to express those intentions on the phone, a browser needed to have the same capability.

At Mozilla we packaged these concepts as “Social APIs.”  API is an abbreviation for application program interface, a kind of hand-shake socket through which one program can interface with another.  More generally, it refers to any socketing capability by which a piece of hardware, stand-alone software, or a web service can interface with another entity that is not controlled by the originating interface.  Microsoft’s Outlook email software can interface effortlessly with a Google Gmail account using an API if the user authenticates their program to make such requests to their Gmail account, without Microsoft or Google ever having to be directly involved in the authentication the user initiates.  Just as Firefox Accounts could sync services on behalf of a user without knowing any of the details of the accounts the user requested to sync, so too should the browser be able to recognize when a user wants to share something without having the user dance around between browser windows with copy and paste.

So Mozilla promoted the concept of browsers supporting share intents, as well as notification intents, so that our users didn’t have to always be logged into their social media accounts in order to be notified if something required their attention on any given social media account.  We did this with some great consideration.  There was a highly-marketed trend in Silicon Valley at the time around “gamification.”  This was a concept that web developers could use points and rewards to try to drive loyalty and return visits among web users.  Notifications were heralded by some as a great way to drive the sense of delight for visitors of your website along with the opportunity to lure them back for more of your web goodness, whatever you offered.  Would developers over-notify, we wondered.  There was a potential for oversaturation and distraction, which could cost the user more in attention and time than it benefited them.

Fortunately, we did not see huge notification abuses from the sites that supported Social API.  And we did see widespread interest from the likes of Facebook, Twitter, Yahoo! and Google, which were the major messaging service providers of the day.  And so we jointly worked to uplevel this to the web standards body called the World Wide Web Consortium (abbreviated as the W3C) for promotion outside the Firefox ecosystem, so that it could be used across all web browsers which supported W3C standards.

Working with this team I learned a great deal from my peers in the engineering organization.  First I thought, if this is such a great idea, why doesn’t Firefox try to make this a unique selling point of our software?  What’s the rush to standardize this? Jiminy Cricket voices across the organization pointed out that the goal of our implementation of open source code in the browser is precisely to have others adopt the greatest ideas and innovate upon them.  The purpose of the standards organizations we work with was to pass on those innovations so that everyone else could utilize them without having to adopt Firefox-specific code.  Good ideas, like the USPTO’s concept of eventual dissemination to the broader global community, are meant to spread to the entire ecosystem so that webmasters avoid the pitfall of coding their website to the functionality of a single piece of software or web browser.  Mozilla engineers saw their mission as in part being to champion web-compatibility, which they often shortened to webcompat in our discussions at developer events.  Firefox is a massive population of addressable users.  But we want web developers to code for all users to have a consistently great experience of the web, not just our audience of users.  There is a broad group of engineers across Microsoft, Google, Apple, Samsung, Mozilla and many small software developers who lay down the flags of their respective companies and band together in the standards bodies to dream of a future internet beyond the capability of the software and web we have today.  They do this with a sense of commitment to the future we are creating for the next generation of internet, software and hardware developers who are going to follow in our footsteps, just as we inherited code, process and standards from our forebears.  It is the yoke of our current responsibility to pass on the baton without being hampered by the partisanship of our competing companies.  The web we want has to be built today if the future generations are going to be set up for success in the demands of the technology environment we will create for tomorrow.

Twice a year the executive team at Mozilla convenes the team of people who support the Mozilla Foundation non-profit (and its daughter corporate entity that ships the software) in all-hands meetings where we discuss our part in this shared vision.  Our Chairwoman Mitchell Baker, who has managed the Mozilla Project since the spin-out from the AOL organization many years ago, gets up on stage to discuss the opportunities she and the foundation see as the web ecosystem evolves.  She speaks in rousing language with phrases like “The Web We Want” in order to instill our team of contributors with an inspiring sense of passion and responsibility. We all go off around the globe as denizens of this mission, carriers of the flag and inspiration, to try to champion and inspire others in turn.
After one of these events I went off to muse on our projects with one of my mentors, an engineer named Shane Caraveo.  I’d been researching and thinking a lot about all the bluster and buzz that had been happening in the broader internet and press communities about social media platforms.  Facebook had been commissioning studies on the psychological benefits and pitfalls of social media use.  I’d listened to their commentaries defending the new paradigms of the social web.  I asked Shane what he thought.  Shouldn’t we be championing people to go off and build their own web pages instead of making it ever easier to lean on social platforms and tools? Shane pointed out that Mozilla does do that, especially around the Mozilla Developer Network, which demonstrates with code examples exactly how to integrate various W3C code specs for website owners, systems administrators and general web enthusiasts.  Shane made a comment that sat with me for years after.  “I don’t care how people create content and share it on the internet.  I care that they do.”

The First Acquisition

The standardization of web notifications across browsers was one of the big wins of our project.  The other, for Mozilla specifically, was the acquisition of the Pocket platform.  When I worked at Yahoo!, one of the first web 2.0 acquisitions they had made was the bookmark backup and sharing service del.icio.us.  (The name was awkward because many of the companies of the day had given up the idea of paying for overpriced .com URLs in favor of a new surfeit of domains that had become available under the “.us” top level domain name space.) Our Yahoo! team had seen the spreading of the web-sharing trend, pre-Facebook, as one of the greatest new ways for web publishers to disseminate their content, driven by the praise and the desire to “re-post” content among circles of friends.  Many years later Yahoo! sold the cloud bookmarking business to the founder of YouTube, who sought to rekindle the idea.  But another entrepreneur named Nate Weiner had taken a different approach to solving the same problem.  He’d built addons for web browsers to address the need for cloud-bookmarking.

Saving web content may seem like a particularly fringe use case for only the most avid web users.  But the Pocket service received considerable demand.  With funding from Google’s venture investing arm among others, Nate was able to grow Pocket to support archiving addons for Google’s Chrome browser, Android and iOS phones, and even expand into a destination website where users could browse the recommendations of saved content from other users in a small tight-knit group of curators.  (If this sounds to you like Netscape’s DMOZ project from 20 years ago and del.icio.us from 10 years ago, that was my thought too.)  But it was perhaps the decentralization of Pocket’s approach that made it work so well.  The community of contributors supporting it was web-wide!  And the refined stream of content coming out of its recommendations was very high quality journalism that was in no way influenced by the news publishing industry, which had its own approaches to content promotion.

When I first met the Pocket team, they commented that their platform was not inherently social.  So the constraints of the Social API architecture didn’t fit the needs of their users.  They suggested that we create a separate concept around “save” intents that were not fitting in the constraints of social media intents that the phones and services were pursuing at the time.  When Firefox introduced the “save” function in our own browser, it seemed to be duplicating the concept of the architecture of “Save to Bookmarks”+”Firefox Accounts Sync”.  But we found that a tremendous number of users were keen on Pocket save rather than the sync-bookmarks architecture we already had.
Because Google had already invested in Pocket, I had thought that it was more likely that they would join the Chrome team eventually.  But by a stroke of good fortune, the Pocket team had had a very good experience working alongside the Mozilla team and decided that they preferred to join Mozilla to pursue the growth of their web services.  This was the first acquisition Mozilla had executed.  Because I had seen how acquisition integrations sometimes fared in Silicon Valley, I was fascinated to see how Mozilla would operate another company with its own unique culture.  Fortunately in my view, Pocket continues to support all browsers that compete with Firefox.  And the active community of Pocket users and contributors continues to stay robust and active to this day.

Protection of Anonymity

One of the most fascinating industry-wide efforts I saw at Mozilla was the campaign behind protecting user anonymity requests and initiatives to enable pseudonymity for users.  As social networking services proliferated in the Web 2.0 era, there were several mainstream services that sought to force users into a web experience where they could have only one single, externally verified, web identity.  The policy was lambasted in the web community as a form of censorship, where internet authors were blocked from using pen-names and aliases (The way Mark Twain authored books under his nom de plume rather than his birth name.)

On the flip side of the argument, proponents of the real-name policy theorized that anonymity of web identities led to trolling behaviors in social media, where people would be publicly criticized by anonymous voices who could avoid reputational repercussions.  This would, in theory, let those anonymous voices say things about others that were not constrained by the normal ethical decency pressures of daily society. 

Wired magazine wrote editorial columns against real names policies saying that users turn to the web to be whomever they want to be and express anonymously ideas that they couldn't without multiple pen-names.   A person’s web identity (Sometimes referred to as “handles” from the early CB radio practice of using declared identities in radio transmissions) would allow them to be more creative than they otherwise would.  One opinion piece suggested that the web is where people go to be a Humpty Dumpty assortment of diverse identities, not to be corralled together as a single source of identity. I myself had used multiple handles for my web pages.  I wanted my music hobby websites, photography website and business websites to all be distinct. In part, I didn’t want business inquiries to be routed to my music website. And I didn’t want my avocation to get tangled with my business either.

European governments jumped in to legislate the preservation of anonymity with laws referred to as the “Right to be forgotten”, which would force internet publishers to take down content if a user requested it.  In a world where content was already fragmented in ways detached from the initial author, how could any web publisher comply with individual requests for censorship? It wasn’t part of the web protocol to disambiguate names across the broader internet.  So reputation policing in a decentralized content publishing ecosystem proved tremendously complicated for web content hosts.
Mozilla championed investigations, such as the Coral Project, to address the specific problems of internet trolling targeted at public commenting platforms on news sites.  But as a relatively small player in the broader market, Mozilla would have found it challenging to address a behavioral problem with open source code.  A broader issue was emerging as a threat to Mozilla’s guiding principles: behaviorally-targeted advertising that spanned across websites loomed as a significant threat to internet users’ right to privacy.

The founders of Mozilla had penned a manifesto of principles to serve as the guiding framework for how they would govern the projects they intended to sponsor in the early days of the non-profit.  (The full manifesto can be read here: https://www.mozilla.org/en-US/about/manifesto/) In general, the developers of web software have the specific interests of their end users as their guiding light.  They woo customers to their services and compete by introducing new utilities and conveniences that contribute to the delight of their users.  But sometimes the companies that make the core services we rely on have to outsource some of the work they do to bring those services to us.  With advertising, this became a slippery slope of outsourcing. The advertising ecosystem’s evolution in the face of the Web 2.0 emergence, and the trade-offs publishers were making with regard to end-user privacy, became too extreme for Mozilla’s comfort.  Many outside Mozilla also believed the compromises in privacy that were being made were unacceptable, and so banded together in support of us.

While this is a sensitive subject that raises ire for many people, I can sympathize with the motivations of the various complicit parties that contributed to the problem.  As a web publisher myself, I had to think a lot about how I wanted to bring my content to my audience. Web hosting costs increase with the size of the audience you wish to entertain.  The more people who read and streamed my articles, pictures, music and video content, the more I would have to pay each month to keep them happy and to keep the web servers running. All free web hosting services came with compromises.  So, eventually I decided to pay my own server fees and incorporate advertising to offset those fees.

Deciding to post advertising on your website is a concession to give up control.  If you utilize an ad network with dynamic ad targeting, the advertising platform makes the decision of what goods or services show up on your web pages.  When I was writing about drum traditions from around the world, advertisers might decide my website was about oil drums, and show ads for steel barrels on my pages.  As a web publisher, I winced. Oil barrels aren’t actually relevant to the people who read about African drums. But it paid the bills, so I tolerated it. And I thought my site visitors would forgive the inconvenience of seeing oil barrels next to my drums.

I was working at Yahoo! when the professed boon of behavioral advertising swept through the industry.  Instead of serving semantically derived, keyword-matched ads on my drum web page, I could suddenly allow the last webpage you visited to buy “re-targeting” ads on my webpage, continuing a more personally relevant experience for you and replacing those oil barrel ads with offers from sites that had been relevant to your personal journey the day before, regardless of what my website was about.  This did result in the unsightly side effect that products you purchased on an ecommerce site would follow you around for months. But it paid the bills. And it paid better than the mis-targeted ads. So more webmasters started doing it.

Behaviorally targeted ads seemed like a slight improvement in a generally under-appreciated industry at the start.  But because it worked so well, significant investment demand spurred ever more refined targeting platforms in the advertising technology industry.  And internet users became increasingly uncomfortable with what they perceived as pervasive intrusions of their privacy. Early on, I remember thinking, “They’re not targeting me, they’re targeting people like me.”  Because the ad targeting was approximate, not personal, I wasn’t overly concerned.

One day at Yahoo! I received a call.  It had been escalated through the customer support channels as a potential product issue.  As I was the responsible director in the product channel, they asked me if I would talk to the customer.  Usually, business directors don’t do customer support directly. But as nobody was available to field the call, I did.  The customer was receiving inappropriate advertising in their browser. It had nothing to do with a Yahoo! hosted page, which has filters for such advertising.  It was caused by a tracking cookie that the user, or someone who had used the user’s computer, had acquired in a previous browsing session. I instructed the user how to clear the cookie store in their browser, which was not a Yahoo! browser either, and the problem was resolved.  This experience made me take to heart how seriously people fear the perceived invasions of privacy from internet platforms. The source of the problem had not been related to my company. But this person had nobody to turn to to explain how web pages work. And considering how rapidly the internet emerged, it dawned on me that many people who experienced the internet’s emergence in their lifetime likely never had a mentor or teacher to explain how these technologies worked.

Journalists started to uncover some very unsettling stories about how ad targeting can actually become directly personal.  Coupon offers on printed store receipts were revealing customers’ purchase behaviors, which could highlight details of their personal life and even their health.  Because principle #4 of Mozilla’s manifesto argues that “Individuals’ security and privacy on the internet are fundamental and must not be treated as optional,” Mozilla decided to tackle the ills of personal data tracking on the web with the concept of open source code transmitted in browser headers, the handshake that happens between a computer and a web server at the start of a browsing session.
The most savvy web users do know what browser cookies are, where to find them, and how to clear them if needed.  But one of our security engineers pointed out to me that we don’t want our customers to always be chasing down errant, irritating cookies and compulsively flushing their browser history.  This was friction, noise and inconvenience that the web was creating for the web’s primary beneficiaries. The web browser, as the user’s delegated agent, should be able to handle these irritations without wasting its users’ time or forcing them to hunt down pesky plumbing issues in the software’s preference settings.  The major browser makers banded together with Mozilla to try to eradicate this ill.

It started with a very simple tactic.  The browser cookie had been invented as a convenience: a way for a website to remember a visitor between page loads, so that a return visit could pick up where the last one left off.  Every web page you visit sets a cookie if it needs to offer you some form of customization.  Subsequently, advertisers viewed a visit to their webpage as a kind of consent to be cookied, even if the visit happened inside a browser frame, called an inline frame (iframe). You visited Amazon previously, surely you’d want to come back, they assumed.  There should be a kind of explicit statement of trust, which had been described as an “opt-in”, even though a visit to a web destination was in no way a contract between a user and a host. Session history seemed like a good implied vector to define trust. Except that not all elements of a web page are served from a single source. Single origin was a very Web 1.0 concept.  In the modern web environment, dynamically aggregated web pages pulled content, code and cookies from dozens of sources in a single page load.

The environment of trust was deemed to be the 1st-party relationship between a site a user visits in their web browser and the browser cookie store which was a temporary “cache of history” that could be used in short time frames.  Cookies and other history tracking elements that could be served in iframe windows of the webpage (the portion of the web page that web designers “outsource” to external content calls) were considered outside the environment of user-delegated trust in the 1st party.  They were called “3rd party cookies” and were considered ephemeral.

Browser makers tended to standardize how web content was handled across their separate platforms through the W3C or other working groups.  And in order to create a standard, there had to be a reference architecture that multiple companies could implement and test. The first attempt at this was called “Do Not Track” (DNT), a preference that a user could set in their browser to have trackers quarantined or blocked for certain sessions.  The browser would send a header with each visit to a web server stating that any cookie was to be used for that session only, not to endure across later visits to other web pages. This seemed innocuous enough. It allowed the page architecture to remember the session just as long as necessary to complete the session.  And most viewed the DNT setting in a browser as a simple enough statement of the trust environment between a web publisher and a visitor for the purpose of daily browser usage.
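
For a sense of what that preference looks like from a page’s point of view, here is a minimal sketch of a script reading the user’s Do Not Track signal. The property names are real (if now legacy) browser APIs; the normalization and the decision to skip trackers are purely illustrative.

// Minimal sketch: read the user's Do Not Track signal on the client side.
// Browsers exposed it slightly differently ("1", "yes", "unspecified", null),
// so normalize defensively before acting on it.
function userPrefersNotToBeTracked() {
  const dnt =
    navigator.doNotTrack ||   // most browsers
    window.doNotTrack ||      // older Firefox
    navigator.msDoNotTrack;   // legacy Internet Explorer
  return dnt === "1" || dnt === "yes";
}

if (userPrefersNotToBeTracked()) {
  // A site honoring the signal would skip loading third-party trackers here.
}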

All the major browser vendors addressed the concern of government supervision with the concept that they should self-regulate.  Meaning, browsers, publishers and advertisers should come to some general consensus, usable across the industry, on how to best serve people using their products and services without having government legislators mandate how code should be written or operate.  Oddly, it didn’t work so well. Eventually, certain advertisers decided not to honor the DNT header request. The US Congress invited Mozilla to discuss what was happening and why some browsers and advertising companies had decided to ignore the user preferences stated by our shared code in browser headers.

Our efforts to work in open source via DNT with the other industry parties did not ultimately protect users from belligerent tracking. It resulted in a whack-a-mole problem we referred to as “fingerprinting”, where advertising companies re-targeted off of computer or phone hardware characteristics, or even off the preference not to be tracked itself!  It was a bit preposterous to watch this happen across the industry and to hear the explanations from those doing it. What was very inspiring to watch on the other side was the effort of the Mozilla product, policy and legal teams to push this concern to the fore without asking for legislative intervention.  Ultimately, European and US regulators did decide to step in to create legal frameworks to punitively address breaches of user privacy that were enabled by the technology of an intermediary.  Even after the launch of the European GDPR regulatory framework, the ensuing scandals around lax handling of users’ private data in internet services are now very widely publicized and at the forefront of technology discussions and education.

(More to come as the story unfolds)

Daniel Stenbergcurl even more wolfed

I’m happy to announce that curl now supports a third SSH library option: wolfSSH. Using this, you can build curl and libcurl to do SFTP transfers with a really small footprint that’s perfectly suitable for embedded systems and others. This goes excellently together with the tiny-curl effort.

SFTP only

The initial merge of this functionality only provides SFTP ability and not SCP. There are really no deeper thoughts behind this other than that the work has been staged, the code is smaller for SFTP-only, and it may well be that users on these smaller devices are happy with SFTP-only.

Work on adding SCP support for the wolfSSH backend can be done at a later time if we feel the need. Let me know if you’re one such user!

Build time selection

You select which SSH backend to use at build time. When you invoke the configure script, you decide if wolfSSH, libssh2 or libssh is the correct choice for you (and you need to have the correct dev version of the desired library installed).

The initial SFTP and SCP support was added to curl in November 2006, powered by libssh2 (the first release to ship it was 7.16.1). Support for getting those protocols handled by libssh instead (which is a separate library, they’re just named very similarly) was merged in October 2017.

Number of supported SSH backends over time in the curl project.

WolfSSH uses WolfSSL functions

If you decide to use the wolfSSH backend for SFTP, it is also possibly a good idea to go with WolfSSL for the TLS backend to power HTTPS and others.

A plethora of third party libs

WolfSSH becomes the 32nd third party component that curl can currently be built to use. See the slide below and click on it to get the full resolution version.

32 possible third party dependencies curl can be built to use

Credits

I, Daniel, wrote the initial new wolfSSH backend code. Merged in this commit.

Wolf image by David Mark from Pixabay

The Talospace ProjectFirefox 72 on POWER

Firefox 72 builds out of the box and uneventfully on OpenPOWER. The marquee feature this time around is picture-in-picture, which is now supported in Linux and works just fine for playing Trooper Clerks ("salsa shark! we're gonna need a bigger boat!"). The blocking of fingerprinting scripts should also be very helpful since it will reduce the amount of useless snitchy JavaScript that gets executed. The irony of that statement on a Blogger site is not lost on me, by the way.

The bug that mashed Firefox 71 (ultimately fallout from bug 1601707 and its many dupes) did not get fixed in time for Firefox 72 and turned out to be a compiler issue. The lifetime change that the code in question relies upon is in Clang 7 and up, but unless you are using a pre-release build this fix is not (yet) in any official release of gcc 9 or 10. As Clang is currently unable to completely build the browser on ppc64le, if your extensions are affected (mine aren't) you may want to add this patch which was also landed on the beta release channel for Firefox 73.

The debug and opt configurations are, again, otherwise unchanged from Firefox 67.

Spidermonkey Development BlogNewsletter 2 (Firefox 73)

Happy new year from the SpiderMonkey team!

Heads up: the next newsletter will likely cover both Firefox 74 and Firefox 75 due to the shorter release cycles this year.

JavaScript

New features

  • The relatedYear field type for Intl.DateTimeFormat.prototype.formatToParts is now part of the spec so André Bargull made it ride the trains.

Project Visage

Project Visage is a project to write a new frontend (parser and bytecode emitter) for JavaScript in Rust that’s more maintainable, modular, efficient, and secure than the current frontend. The team (Jason Orendorff, Nicolas Pierron, Tooru Fujisawa, Yulia Startsev) is currently experimenting with a parser generator that generates a custom LR parser.

There’s a fork of mozilla-central where passing --rust-frontend to the shell makes it try the new frontend, falling back on the C++ frontend for scripts it can’t handle (currently almost everything). LibFuzzer is used to identify cases where the new parser accepts inputs that the current parser rejects.

Jason also improved most of our bytecode documentation.

(Jason pronounces it “VIZZ-udge”, like the English word, but you can say whatever you want.)

JSScript/LazyScript unification

Ted Campbell fixed the parser to avoid saving trivial data between syntax parsing and the eventual delazification. This saves memory and brings us closer to being able to reconstruct a lazy script directly from its non-lazy version.

The js::BaseScript type now contains the fields it needed to represent lazy and non-lazy scripts. This is an important milestone on the path to unifying JSScript and LazyScript.

Project Stencil

As the GC-free parser work continues, Project Stencil aims to define a meaningful data format for the parser to generate. This paves the way to integrating a new frontend (Visage) and allows us to modernize the bytecode caches and improve page-load performance. We call it ‘stencil’ because this data structure is the template from which the VM’s JSScript will be instantiated.

  • Matthew Gaudet started fuzzing the deferred allocation path in the front end. The hope is that we will enable this code path by default early in the Firefox 74 cycle. The end goal of turning this on by default is to avoid having to maintain two allocation paths indefinitely.
  • The LazyScript unification work continues to simplify (and clarify) the semantics of internal script flags that the parser must generate. These flags will become part of the stencil data structures.

Regular expression engine update

Iain Ireland is writing a shim layer to enable Irregexp, V8’s regexp engine, to be embedded in SpiderMonkey with minimal changes relative to upstream. He is currently working on rewriting the existing regular expression code to call the new engine.

Bytecode and IonBuilder simplifications

The previous newsletter mentioned some large IonBuilder code simplifications landing in Firefox 72. This cycle Jan de Mooij changed all loops to have the same bytecode structure so IonBuilder can use the same code for all of them. This also allowed us to remove more source notes and JIT compilation no longer has to look up any source notes.

These changes help the new frontend because it’s now easier to generate correct bytecode and also laid the groundwork for more Ion cleanup work this year. Finally, it ended up fixing some performance cliffs: yield* expressions, for example, can be at least 5x faster.

toSource/uneval removal

Tom Schuster is investigating removing the non-standard toSource and uneval functions. This requires fixing a lot of code and tests in Firefox so as a first step we may do this only for content code. André Bargull helped out by fixing SpiderMonkey tests to stop using these functions.

Debugger

Logan Smyth rewrote debugger hooks to use the exception-based implementation for forced returns. This ended up removing and simplifying a lot of code in the debugger, interpreter and JITs because the exception handler is now the only place where forced returns have to be handled.

Miscellaneous

  • André Bargull fixed an Array.prototype.reverse performance issue by avoiding GC post barriers if the array is in the nursery.
  • André also contributed more BigInt optimizations. He reduced allocations and added fast paths for uint64 BigInts. For example, certain multiplications are now 3-11x faster and exponentiations can be up to 30x faster!

WebAssembly

JS BigInt <-> wasm I64 conversion

Igalia (Asumu Takikawa) has landed the JS-BigInt-integration proposal, so i64 values in WebAssembly can be converted to/from JavaScript BigInt. This is behind a flag and Nightly-only for the time being.
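
Roughly, the integration looks like this from the JavaScript side. This is only a sketch: module.wasm and its exported add64 function are hypothetical, and the behavior requires the Nightly flag mentioned above.

// Sketch: with JS-BigInt-integration, wasm i64 parameters accept BigInt
// values and i64 results come back to JavaScript as BigInt.
async function demo() {
  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("module.wasm") // hypothetical module exporting add64(i64, i64) -> i64
  );
  const sum = instance.exports.add64(2n ** 40n, 1n);
  console.log(typeof sum, sum); // "bigint" 1099511627777n
}
demo();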

Reference types and bulk memory

SpiderMonkey continues to track the reference types and bulk memory proposals, with several tweaks and bug fixes having landed recently.

Wasm Cranelift

Cranelift is a code generator (written in Rust) that we want to use in Firefox as the next optimizing compiler for WebAssembly.

  • Andrew Brown (from Intel) has kept on implementing more opcodes for the wasm SIMD proposal.
  • Work is still being carried out by Julian Seward & Benjamin Bouvier to implement different register allocation algorithms: a simple linear scan as well as a backtracking allocator à la IonMonkey.
  • Ryan Hunt has added WebAssembly bulk memory operations support to Cranelift. This is enabled when using Cranelift as the wasm backend in Firefox.
  • Yury Delendik has implemented basic support for the reference types proposal. It is not complete yet, so it is not available in general in Cranelift.
  • Sean Stangl and @bjorn3 have landed initial support for automatically determining what the REX prefix should be for x86 instructions, which will ease supporting more x86 instructions.

Ongoing work

  • The multi-value proposal allows WebAssembly functions and blocks to return multiple values. Andy Wingo from Igalia is making steady progress implementing this feature in SpiderMonkey. It works for blocks; function calls are in progress.
  • Tom Tung, Anne van Kesteren and others are working on re-enabling SharedArrayBuffer by default.
  • Lars Hansen has done initial work (based on older work by David Major) on implementing the exception handling proposal in our production compilers and in the runtime.

Anne van KesterenFeature detection of SharedArrayBuffer objects and shared memory

If you are using feature detection with SharedArrayBuffer objects today you are likely impacted by upcoming changes to shared memory. In particular, you can no longer assume that if you have access to a SharedArrayBuffer object you can also use it with postMessage(). Detecting if SharedArrayBuffer objects are exposed can be done through the following code:

if (self.SharedArrayBuffer) {
  // SharedArrayBuffer objects are available.
}

Detecting if shared memory is possible by using SharedArrayBuffer objects in combination with postMessage() and workers can be done through the following code:

if (self.crossOriginIsolated) {
  // Passing SharedArrayBuffer objects to postMessage() will succeed.
}

Please update your code accordingly!

(As indicated in the aforelinked changes document, obtaining a cross-origin isolated environment (i.e., one wherein self.crossOriginIsolated returns true) requires setting two headers and a secure context. Simply put, you set the Cross-Origin-Opener-Policy header to isolate yourself from attackers and the Cross-Origin-Embedder-Policy header to isolate yourself from victims.)
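
As a concrete illustration, here is a minimal sketch of a server opting into cross-origin isolation using Node’s built-in http module; the two header values shown are the ones the platform requires, but everything else (the port, the inline test page) is illustrative rather than a recommended setup.

const http = require("http");

http
  .createServer((request, response) => {
    // Both headers are needed for self.crossOriginIsolated to be true,
    // alongside a secure context (HTTPS, or localhost during development).
    response.setHeader("Cross-Origin-Opener-Policy", "same-origin");
    response.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
    response.setHeader("Content-Type", "text/html");
    response.end("<script>console.log(self.crossOriginIsolated);</script>");
  })
  .listen(8080);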

Mozilla Security BlogThe End-to-End Design of CRLite

CRLite is a technology to efficiently compress revocation information for the whole Web PKI into a format easily delivered to Web users. It addresses the performance and privacy pitfalls of the Online Certificate Status Protocol (OCSP) while avoiding a need for some administrative decisions on the relative value of one revocation versus another. For details on the background of CRLite, see our first post, Introducing CRLite: All of the Web PKI’s revocations, compressed.

To discuss CRLite’s design, let’s first discuss the input data, and from that we can discuss how the system is made reliable.

Designing CRLite

When Firefox securely connects to a website, the browser validates that the website’s certificate has a chain of trust back to a Certificate Authority (CA) in the Mozilla Root CA Program, including whether any of the CAs in the chain of trust are themselves revoked. At this time Firefox knows the issuing certificate’s identity and public key, as well as the website’s certificate’s identity and public key.

To determine whether the website’s certificate is trusted, Firefox verifies that the chain of trust is unbroken, and then determines whether the website’s certificate is revoked. Normally that’s done via OCSP, but with CRLite Firefox simply has to answer the following questions:

  1. Is this website’s certificate older than my local CRLite Filter, e.g., is my filter fresh enough?
  2. Is the CA that issued this website’s certificate included in my local CRLite Filter, e.g. is that CA participating?
  3. If “yes” to the above, and Firefox queries the local CRLite Filter, does it indicate the website’s certificate is revoked?

That’s a lot of moving parts, but let’s inspect them one by one.

Freshness of CRLite Filter Data

Mozilla’s infrastructure continually monitors all of the known Certificate Transparency logs for new certificates using our CRLite tooling; the details of how that works will be in a later blog post about the infrastructure. Since multiple browsers now require that all website certificates are disclosed to Certificate Transparency logs to be trusted, in effect the tooling has total knowledge of the certificates in the public Web PKI.

Figure 1: CRLite Information Flow. More details on the infrastructure will be in Part 4 of this blog post series.

Four times per day, all website certificates that haven’t reached their expiration date are processed, drawing out lists of their Certificate Authorities, their serial numbers, and the web URLs where they might be mentioned in a Certificate Revocation List (CRL).

All of the referenced CRLs are downloaded, verified, processed, and correlated against the lists of unexpired website certificates.

Figure 2: CRLite Filter Generation Process

At the end, we have a set of all known issuers that publish CRLs we could use, the identification numbers of every certificate they issued that is still unexpired, and the identification numbers of every certificate they issued that hasn’t expired but was revoked.

With this knowledge, we can build a CRLite Filter.

Structure of A CRLite Filter

CRLite data comes in the form of a series of cascading Bloom filters, with each filter layer adding data to the one before it. Individual Bloom filters have a certain chance of false-positives, but using Certificate Transparency as an oracle, the whole Web PKI’s certificate corpus is verified through the filter. When a false-positive is discovered, the algorithm adds it to another filter layer to resolve the false positive.

Figure 3: CRLite Filter Structure

The certificate’s identifier is defined as shown in Figure 4:

Figure 4: CRLite Certificate Identifier

For complete details of this construction see Section III.B of the CRLite paper.
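
To make the query side concrete, here is a rough sketch of how a lookup against such a cascade can be resolved. The mightContain() membership test and the layer layout are illustrative assumptions, not the actual Firefox or CRLite tooling API.

// Sketch of a filter-cascade lookup. Odd layers encode revoked candidates,
// even layers encode the false positives recorded while building the layer
// above, so the depth at which a query first misses decides the answer.
function crliteSaysRevoked(layers, certificateId) {
  for (let depth = 1; depth <= layers.length; depth++) {
    if (!layers[depth - 1].mightContain(certificateId)) {
      return depth % 2 === 0; // a miss at an even depth means genuinely revoked
    }
  }
  // Present in every layer: the parity of the final layer decides.
  return layers.length % 2 === 1;
}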

After construction, the included Web PKI’s certificate corpus is again verified through the filter, ensuring accuracy at that point-in-time.

Ensuring Filter Accuracy

A CRLite filter is accurate at a given point-in-time, and should only be used for the certificates that were both known to the filter generator, and for which there is revocation information.

We can know whether a certificate could have been included in the filter if that certificate was delivered with a Signed Certificate Timestamp from a participating Certificate Transparency log that is at least one Maximum Merge Delay older than our CRLite filter date.

If that is true, we also determine whether the certificate’s issuer is included in the CRLite filter, by referencing our preloaded Intermediate data for a boolean flag reporting whether CRLite includes its data. Specifically, the CA must be publishing accessible, fresh, verified CRL files at a URL included within their certificates’ Authority Information Access data. This flag is updated with the same cadence as CRLite itself, and generally remains constant.
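
Putting those two checks together, the eligibility test is roughly the following sketch; every name in it (earliestSCTTimestamp, logMaxMergeDelay, issuerId and so on) is a placeholder for illustration, not a Firefox internal.

// Sketch: a certificate is CRLite-eligible if the filter generator could have
// seen it (an SCT at least one Maximum Merge Delay older than the filter) and
// its issuer is enrolled (publishing usable CRLs).
function isCRLiteEligible(certificate, filter, enrolledIssuers) {
  const sctOldEnough =
    certificate.earliestSCTTimestamp + certificate.logMaxMergeDelay <=
    filter.timestamp;
  const issuerEnrolled = enrolledIssuers.has(certificate.issuerId);
  return sctOldEnough && issuerEnrolled;
}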

Firefox’s Revocation Checking Algorithm Today

Today, Firefox Nightly is using CRLite in telemetry-only mode, meaning that Firefox will continue to rely on OCSP to determine whether a website’s certificate is valid. If an OCSP response is provided by the webserver itself — via OCSP Stapling — that is used. However, at the same time, CRLite is evaluated, and that result is reported via Firefox Telemetry but not used for revocation.

At a future date, we will prefer to use CRLite for revocation checks, and only if the website cannot be validated via CRLite would we use OCSP, either live or stapled.

Firefox Nightly has a preference security.pki.crlite_mode which controls CRLite; set to 1 it gathers telemetry as stated above. Set to 2, CRLite will enforce revocations in the CRLite filter, but still use OCSP if the CRLite filter does not indicate a revocation.  A future mode will permit CRLite-eligible certificates to bypass OCSP entirely, which is our ultimate goal.
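
The resulting decision flow can be sketched roughly as follows. This is only an illustration of the modes described above, not actual Gecko code; the eligible and crliteSaysRevoked inputs stand for the checks sketched earlier, and checkOCSP stands for the existing live or stapled OCSP path.

// Sketch of the revocation decision for the crlite_mode values described above.
function decideRevocation(crliteMode, eligible, crliteSaysRevoked, checkOCSP) {
  if (crliteMode === 1 && eligible) {
    console.log("CRLite result (telemetry only):", crliteSaysRevoked);
  }
  if (crliteMode === 2 && eligible && crliteSaysRevoked) {
    return "revoked"; // enforce the CRLite filter
  }
  // Mode 2 still consults OCSP when CRLite does not indicate a revocation;
  // a future mode would skip OCSP entirely for eligible certificates.
  return checkOCSP();
}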

Participating Certificate Authorities

Only public CAs within the Mozilla Root Program are eligible to be included, and CAs are automatically enrolled when they publish CRLs. If a CA stops publishing CRLs, or problems arise with their CRLs, they will be automatically excluded from CRLite filters until the situation is resolved.

As mentioned earlier, if a CA chooses not to log a certificate to a known Certificate Transparency log, then CRLite will not be used to perform revocation checking for that certificate.

Ultimately, we expect CAs to be very interested in participating in CRLite, as it could significantly reduce the cost of operating their OCSP infrastructure.

Listing Enrolled Certificate Authorities

The list of CAs currently enrolled is in our Intermediate Preloading data served via Firefox Remote Settings. In the FAQ for CRLite on Github, there’s information on how to download and process that data yourself to see which CAs’ revocations are included in the CRLite state.

Notably, Let’s Encrypt currently does not publish CRLs, and as such their revocations are not included in CRLite. The CRLite filters will increase in size as more CAs become enrolled, but the size increase is modeled to be modest.

Portion of the Web PKI Enrolled

Currently CRLite covers only a portion of the Web PKI as a whole, though a sizable portion: as generated over roughly the period covering December 2019, CRLite covered approximately 100M certificates in the WebPKI, of which about 750k were revoked.

Figure 5: Number of Enrolled Revoked vs Enrolled But Not Revoked Certificates

The whole size of the WebPKI trusted by Mozilla with any CRL distribution point listed is 152M certificates, so CRLite today includes 66% of the potentially-compatible WebPKI  [Censys.io]. The missing portion is mostly due to CRL downloading or processing errors which are being addressed. That said, approximately 300M additional trusted certificates do not include CRL revocation information, and are not currently eligible to be included in CRLite.

Data Sizes, Update Frequency, and the Future

CRLite promises substantial compression of the dataset; the binary form of all unexpired certificate serial numbers comprises about 16 GB of memory in Redis; the hexadecimal form of all enrolled and unexpired certificate serial numbers comprises about 6.7 GB on disk, while the resulting binary Bloom filter compresses to approximately 1.3 MB.

Figure 6: CRLite Filter Sizes over the month of December 2019 (in kilobytes)

To ensure freshness, our initial target was to produce new filters four times per day, with Firefox users generally downloading small delta difference files to catch-up to the current filter. At present, we are not shipping delta files, as we’re still working toward an efficient delta-expression format.

Filter generation is a reasonably fast process even on modest hardware, with the majority of time being spent aggregating together all unexpired certificate serial numbers, all revoked serial numbers, and producing a final set of known-revoked and known-not-revoked certificate issuer-serial numbers (mean of 35 minutes). These aggregated lists are then fed into the CRLite bloom filter generator, which follows the process in Figure 2 (mean of 20 minutes).

 

Figure 7: Filter Generation Time [source]

For the most part, faster disks and more efficient (but not human-readable) file formats would speed this process up, but the current speeds are more than sufficient to meet our initial goals, particularly while we continue improving other aspects of the system.

Our next blog post in this series, Part 3, will discuss the telemetry results that our current users of Firefox Nightly are seeing, while Part 4 will discuss the design of the infrastructure.

The post The End-to-End Design of CRLite appeared first on Mozilla Security Blog.

Mozilla Security BlogIntroducing CRLite: All of the Web PKI’s revocations, compressed

CRLite is a technology proposed by a group of researchers at the IEEE Symposium on Security and Privacy 2017 that compresses revocation information so effectively that 300 megabytes of revocation data can become 1 megabyte. It accomplishes this by combining Certificate Transparency data and Internet scan results with cascading Bloom filters, building a data structure that is reliable, easy to verify, and easy to update.

Since December, Firefox Nightly has been shipping with CRLite, collecting telemetry on its effectiveness and speed. As can be imagined, replacing a network round-trip with local lookups makes for a substantial performance improvement. Mozilla currently updates the CRLite dataset four times per day, although not all updates are currently delivered to clients.

Revocations on the Web PKI: Past and Present

The design of the Web’s Public Key Infrastructure (PKI) included the idea that website certificates would be revocable to indicate that they are no longer safe to trust: perhaps because the server they were used on was being decommissioned, or there had been a security incident. In practice, this has been more of an aspiration, as the imagined mechanisms showed their shortcomings:

  • Certificate Revocation Lists (CRLs) quickly became large, and contained mostly irrelevant data, so web browsers didn’t download them;
  • The Online Certificate Status Protocol (OCSP) was unreliable, and so web browsers had to assume if it didn’t work that the website was still valid.

Since revocation is still crucial for protecting users, browsers built administratively-managed, centralized revocation lists: Firefox’s OneCRL, combined with Safe Browsing’s URL-specific warnings, provide the tools needed to handle major security incidents, but opinions differ on what to do about finer-grained revocation needs and the role of OCSP.

The Unreliability of Online Status Checks

Much has been written on the subject of OCSP reliability, and while reliability has definitely improved in recent years (per Firefox telemetry; failure rate), it still suffers under less-than-perfect network conditions: even among our Beta population, which historically has above-average connectivity, over 7% of OCSP checks time out today.

Because of this, it’s impractical to require OCSP to succeed for a connection to be secure, and in turn, an adversarial monster-in-the-middle (MITM) can simply block OCSP to achieve their ends. For more on this, a couple of classic articles are:

Mozilla has been making improvements in this realm for some time, implementing OCSP Must-Staple, which was designed as a solution to this problem, while continuing to use online status checks whenever there’s no stapled response.

We’ve also made Firefox skip revocation information for short-lived certificates; however, despite improvements in automation, such short-lived certificates still make up a very small portion of the Web PKI, because the majority of certificates are long-lived.

Does Decentralized Revocation Bring Dangers?

The ideal in question is whether a Certificate Authority’s (CA) revocation should be directly relied upon by end-users.

There are legitimate concerns that respecting CA revocations could be a path to enabling CAs to censor websites. This would be particularly troubling in the event of increased consolidation in the CA market. However, at present, if one CA were to engage in censorship, the website operator could go to a different CA.

If censorship concerns do bear out, then Mozilla has the option to use its root store policy to influence the situation in accordance with our manifesto.

Does Decentralized Revocation Bring Value?

Legitimate revocations are either done by the issuing CA because of a security incident or policy violation, or they are done on behalf of the certificate’s owner, for their own purposes. The intention becomes codified to render the certificate unusable, perhaps due to key compromise or service provider change, or as was done in the wake of Heartbleed.

Choosing specific revocations to honor and refusing others dismisses the intentions of all left-behind revocation attempts.  For Mozilla, it violates principle 6 of our manifesto, limiting participation in the Web PKI’s security model.

There is a cost to supporting all revocations – checking OCSP:

  1. Slows down our first connection by ~130 milliseconds (CERT_VALIDATION_HTTP_REQUEST_SUCCEEDED_TIME, https://mzl.la/2ogT8TJ),
  2. Fails unsafe, if an adversary is in control of the web connection, and
  3. Periodically reveals to the CA the HTTPS web host that a user is visiting.

Luckily, CRLite gives us the ability to deliver all the revocation knowledge needed to replace OCSP, and do so quickly, compactly, and accurately.

Can CRLite Replace OCSP?

Firefox Nightly users are currently only using CRLite for telemetry, but by changing the preference security.pki.crlite_mode to 2, CRLite can enter “enforcing” mode and respect CRLite revocations for eligible websites. There’s not yet a mode to disable OCSP; there’ll be more on that in subsequent posts.

This blog post is the first in a series discussing the technology for CRLite, the observed effects, and the nature of a collaboration of this magnitude between industry and academia. The next post discusses the end-to-end design of the CRLite mechanism, and why it works. Additionally, some FAQs about CRLite are available on Github.

The post Introducing CRLite: All of the Web PKI’s revocations, compressed appeared first on Mozilla Security Blog.

Marco ZeheWhat’s new for accessibility in Gutenberg 7.2

Gutenberg 7.2 has just been released as a plugin. The development cycle was longer than usual. As a result, this version contains a lot of changes. Several of them improve Gutenberg’s accessibility.

The tab order in the editor

The tab order when editing a block has been adjusted. Rather than tabbing to the next block, for example from one paragraph to the next, pressing Tab will now put focus into the side bar for the active block. Further tabbing will move through the controls of said side bar. Shift+Tab will go in the opposite direction.

Likewise, when in the main contents area of a block, Shift+Tab will now move focus to the toolbar consistently and through its controls. It will also skip the drag handle for a block, because this is not keyboard operable. Tab will stop on the items to move the block up or down within the current set of blocks.

This makes the keyboard focus much more consistent and alleviates the need to use the custom keyboard shortcuts for the side bar and toolbar. These do still work, so if you have memorized them, you can continue using them. But you do not need to; Tab and Shift+Tab will now also take you to the expected places consistently.

Improvements to the Welcome guide

The modal for the Welcome guide has been enhanced. The modal now always gets a proper title for screen readers, so it no longer speaks an empty dialog when focus moves into it. The current page is now indicated for screen readers, so it is easy to know which of the steps in the current guide is showing. The main content is now a document, so screen readers that apply a special reading mode for content sections can provide this functionality inside the modal.

This was one of the first two code contributions to Gutenberg by yours truly.

More enhancements and fixes

The justification radio menu items in the formatting toolbar are now properly exposed as such. This was the other of the two code contributions I made to this Gutenberg version.

The Social block now has proper labels.

The block wrapper, which contains the current set of blocks, now properly identifies as a group rather than a section. This will make it easier when dealing with nested blocks or parallel groups of blocks when building pages.

In conclusion

Gutenberg continues to improve. And now that I am a team member as well, I’ll try to help as time and capacity permit. The changes, especially to the keyboard focus and the semi-modality of blocks, are a big step in improving usability.

One other thing that will hopefully land soon, once potential plugin compatibility issues are resolved, is that toolbars will conform to the WAI-ARIA design pattern. That will mean that every toolbar container will be one tab stop, and elements within will be navigable via arrow keys. That will reduce the number of tab stops and thus improve efficiency and compliance.
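
For readers unfamiliar with that pattern, here is a minimal roving-tabindex sketch of the behavior described above. It is purely illustrative, not Gutenberg’s implementation, and assumes the page already contains a [role="toolbar"] element with button children.

// Minimal roving-tabindex sketch: one tab stop per toolbar, arrow keys move
// focus between the controls inside it.
const toolbar = document.querySelector('[role="toolbar"]');
const items = Array.from(toolbar.querySelectorAll("button"));
let current = 0;

// Only the first control starts in the tab order.
items.forEach((item, i) => (item.tabIndex = i === 0 ? 0 : -1));

function focusItem(index) {
  items[current].tabIndex = -1;
  current = (index + items.length) % items.length;
  items[current].tabIndex = 0; // the active control becomes the single tab stop
  items[current].focus();
}

toolbar.addEventListener("keydown", (event) => {
  if (event.key === "ArrowRight") focusItem(current + 1);
  if (event.key === "ArrowLeft") focusItem(current - 1);
});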

Daniel Stenbergwebinar: Why everyone is using curl and you should too

I’m pleased to invite you to our live webinar, “Why everyone is using curl and you should too!”, hosted by wolfSSL. Daniel Stenberg (me!), founder and Chief Architect of curl, will be live and talking about why everyone is using curl and you should too!

This is planned to last roughly 20-30 minutes with a following 10 minutes Q&A.

Space is limited so please register early!

When: Jan 14, 2020 08:00 AM Pacific Time (US and Canada) (16:00 UTC)

Register in advance for this webinar!

After registering, you will receive a confirmation email containing information about joining the webinar.

Not able to attend? Register now and after the event you will receive an email with link to the recorded presentation.


Niko MatsakisTowards a Rust foundation

In my #rust2020 blog post, I mentioned rather off-handedly that I think the time has come for us to talk about forming a Rust foundation. I wanted to come back to this topic and talk in more detail about what I think a Rust foundation might look like. And, since I don’t claim to have the final answer to that question by any means, I’d also like to talk about how I think we should have this conversation going forward.

Hat tip

Before going any further, I want to say that most of the ideas in this post arose from conversations with others. In particular, Florian Gilcher, Ryan Levick, Josh Triplett, Ashley Williams, and I have been chatting pretty regularly, and this blog post generally reflects the consensus that we seemed to be arriving at (though perhaps they will correct me). Thanks also to Yehuda Katz and Till Schneidereit for lots of detailed discussions.

Why do we want a Rust foundation?

I think this is in many ways the most important question for us to answer: what is it that we hope to achieve by creating a Rust foundation, anyway?

To me, there are two key goals:

  • to help clarify Rust’s status as an independent project, and thus encourage investment from more companies;
  • to alleviate some practical problems caused by Rust not having a “legal entity” nor a dedicated bank account.

There are also some anti-goals. Most notably:

  • the foundation should not replace the existing Rust teams as a decision-making apparatus.

The role of the foundation is to complement the teams and to help us in achieving our goals. It is not to set the goals themselves.

Start small and iterate

You’ll notice that I’ve outlined a fairly narrow role for the foundation. This is no accident. When designing a foundation, just as when designing many other things, I think it makes sense for us to move carefully, a step at a time.

We should try to address immediate problems that we are facing and then give those changes some time to “sink in”. We should also take time to experiment with some of the various funding possibilities that are out there (some of which I’ll discuss later on). Once we’ve had some more experience, it should be easier for us to see which next steps make sense.

Another reason to start small is being able to move more quickly. I’d like to see us setup a foundation like the one I am discussing as soon as this year.

Goal #1: Clarifying Rust’s status as an independent project

So let’s talk a bit more about the two goals that I set forth for a Rust foundation. The first was to clarify Rust’s status as an independent project. In some sense, this is nothing new. Mozilla has from the get-go attempted to create an independent governance structure and to solicit involvement from other companies, because we know this makes Rust a better language for everyone.

Unfortunately, there is sometimes a lingering perception that Mozilla “owns” Rust, which can discourage companies from getting invested, or create the perception that there is no need to support Rust since Mozilla is footing the bill. Establishing a foundation will make official what has been true in practice for a long time: that Rust is an independent project.

We have also heard a few times from companies, large and small, who would like to support Rust financially, but right now there is no clear way to do that. Creating a foundation creates a place where that support can be directed.

Mozilla wants to support Rust… just not alone

Now, establishing a Rust foundation doesn’t mean that Mozilla plans to step back. After all, Mozilla has a lot riding on Rust, and Rust is playing an increasingly important role in how Mozilla builds our products. What we really want is a scenario where other companies join Mozilla in supporting Rust, letting us do much more.

In truth, this has already started to happen. For example, just this year Microsoft started sponsoring Rust’s CI costs and Amazon is paying Rust’s S3 bills. In fact, we recently added a corporate sponsors page to the Rust web site to acknowledge the many companies that are starting to support Rust.

Goal #2: Alleviating some practical difficulties

While the Rust project has its own governance system, it has never had its own distinct legal entity. That role has always been played by Mozilla. For example, Mozilla owns the Rust trademarks, and Mozilla is the legal operator for services like crates.io. This means that Mozilla is (in turn) responsible for ensuring that DMCA requests against those services are properly managed and so forth. For a long time, this arrangement worked out quite well for Rust. Mozilla Legal, for example, provided excellent help in drafting Rust’s trademark agreements and coached us through how to handle DMCA takedown requests (which thankfully have arisen quite infrequently).

Lately, though, the Rust project has started to hit the limits of what Mozilla can reasonably support. One common example that arises is the need to have some entity that can legally sign contracts “for the Rust project”. For example, we wished recently to sign up for Github’s Token Scanning program, but we weren’t able to figure out who ought to sign the contract.

Is token scanning by itself a burning problem? No. We could probably work out a solution for it, and for other similar cases that have arisen, such as deciding who should sign Rust binaries. But it might be a sign that it is time for the Rust project to have its own legal entity.

Another practical difficulty: Rust has no bank account

Another example of a “practical difficulty” that we’ve encountered is that Rust has no bank account. This makes it harder for us to arrange for joint sponsorship and support of events and other programs that the Rust project would like to run. The most recent example is the Rust All Hands. Whereas in the past Mozilla has paid for the venue, catering, and much of the airfare by itself, this year we are trying to “share the load” and have multiple companies provide sponsorship. However, this requires a bank account to collect and pool funds. We have solved the problem for this year, but it would be easier if the Rust organization had a bank account of its own. I imagine we would also make use of a bank account to fund other sorts of programs, such as Increasing Rust’s Reach.

On paying people and contracting

One area where I think we should move slowly is on the topic of employing people and hiring contractors. As a practical matter, the foundation is probably going to want to employ some people. For example, I suspect we need an “operations manager” to help us keep the wheels turning (this is already a challenge for the core team, and it’s only going to get worse as the project grows). We may also want to do some limited amount of contracting for specific purposes (e.g., to pay for someone to run a program like Increasing Rust’s Reach, or to help do data crunching on the Rust survey).

The Rust foundation should not hire developers, at least to start

But I don’t think the Rust foundation should do anything like hiring full-time developers, at least not to start. I would also avoid trying to manage larger contracts to hack on rustc. There are a few reasons for this, but the biggest one is simply that it is expensive. Funding that amount of work will require a significant budget, which will require significant fund-raising.

Managing a large budget, as well as employees, will also require more superstructure. If we hire developers, who decides what they should work on? Who decides when it’s time to hire? Who decides when it’s time to fire?

This is a bit difficult: on the one hand, I think there is a strong need for more people to get paid for their work on Rust. On the other hand, I am not sure a foundation is the right institution to be paying them; even if it were, it seems clear that we don’t have enough experience to know how to answer the sorts of difficult questions that will arise as a result. Therefore, I think it makes sense to fall back on the approach to “start small and iterate” here. Let’s create a foundation with a limited scope and see what difference it makes before we make any further decisions.

Some other things the foundation wouldn’t do

I think there are a variety of other things that a hypothetical foundation should not do, at least not to start. For example, I think the foundation should not pay for local meetups nor sponsor Rust conferences. Why? Well, for one thing, it’ll be hard for us to come up with criteria on when to supply funds and when not to. For another, both meetups and conferences I think will do best if they can forge strong relationships with companies directly.

However, even if there are things that the Rust foundation wouldn’t fund or do directly, I think it makes a lot of sense to collect a list of the kinds of things it might do. If nothing else, we can try to offer suggestions for where to find funding or obtain support, or perhaps offer some lightweight “match-making” role.

We should strive to have many kinds of Rust sponsorship

Overall, I am nervous about a situation in which a Rust Foundation comes to have a kind of “monopoly” on supporting the Rust project or Rust-flavored events. I think it’d be great if we can encourage a wider variety of setups. First and foremost, I’d like to see more companies that use Rust hiring people whose job description is to support the Rust project itself (at least in part). But I think it could also work to create “trade associations” where multiple companies pool funds to hire Rust developers. If nothing else, it is worth experimenting with these sorts of setups to help gain experience.

We should create a “project group” to figure this out

Creating a foundation is a complex task. In this blog post, I’ve just tried to sketch the “high-level view” of what responsibilities I think a foundation might take on and why (and which I think we should avoid or defer). But I left out a lot of interesting details: for example, should the Foundation be a 501(c)(3) (a non-profit, in other words) or not? Should we join an umbrella organization and – if so – which one?

The traditional way that the Rust project makes decisions, of course, is through RFCs, and I think that a decision to create a foundation should be no exception. In fact, I do plan to open an RFC about creating a foundation soon. However, I don’t expect this RFC to try to spell out all the details of how a foundation would work. Rather, I plan to propose creating a project group with the goal of answering those questions.

In short, I think the core team should select some set of folks who will explore the best design for a foundation. Along the way, we’ll keep the community updated with the latest ideas and take feedback, and – in the end – we’ll submit an RFC (or perhaps a series of RFCs) with a final plan for the core team to approve.

Feedback

OK, well, enough about what I think. I’m very curious (and a bit scared, I won’t lie) to hear what people think about the contents of this post. To collect feedback, I’ve created a thread on internals. As ever, I’ll read all the responses, and I’ll do my best to respond where I can. Thanks!

Cameron KaiserTenFourFox not vulnerable to CVE-2019-17026

After doing some analysis late last night and today to determine if we need a chemspill build, I have concluded that TenFourFox is not vulnerable to CVE-2019-17026, or at least not to any of the PoCs or test cases available to me. This is the 0-day that was fixed in Firefox 72.0.1 and 68.4.1. Though a portion of the affected code exists in the TenFourFox code base, there doesn't seem to be a way to trigger the exploit due to various other missing optimizations and the oddities of our JIT. (Firefox 45-based browsers using our patches as upstream should bear in mind this may not be true for other architectures, however.) Absent evidence to the contrary it will be nevertheless patched as part of the standard security fixes in FPR19.

The Mozilla BlogExpanding Mozilla’s Boards in 2020

Mozilla is a global community that is building an open and healthy internet. We do so by building products that improve internet life, giving people more privacy, security and control over the experiences they have online. We are also helping to grow the movement of people and organizations around the world committed to making the digital world healthier.

As we grow our ambitions for this work, we are seeking new members for the Mozilla Foundation Board of Directors. The Foundation’s programs focus on the movement building side of our work and complement the products and technology developed by Mozilla Corporation.

What is the role of a Mozilla board member?

I’ve written in the past about the role of the Board of Directors at Mozilla.

At Mozilla, our board members join more than just a board, they join the greater team and the whole movement for internet health. We invite our board members to build relationships with management, employees and volunteers. The conventional thinking is that these types of relationships make it hard for the Executive Director to do his or her job. I wrote in my previous post that “We feel differently”. This is still true today. We have open flows of information in multiple channels. Part of building the world we want is to have built transparency and shared understandings.

It’s worth noting that Mozilla is an unusual organization. We’re a technology powerhouse with broad internet openness and empowerment at its core. We feel like a product organization to those from the nonprofit world; we feel like a non-profit organization to those from the technology industry.

It’s important that our board members understand the full breadth of Mozilla’s mission. It’s important that Mozilla Foundation Board members understand why we build consumer products, why it happens in the subsidiary and why they cannot micro-manage this work. It is equally important that Mozilla Corporation Board members understand why we engage in the open internet activities of the Mozilla Foundation and why we seek to develop complementary programs and shared goals.

What are we looking for?

Last time we opened our call for board members, we created a visual role description. Below is an updated version reflecting the current needs for our Mozilla Foundation Board.

Here is the full job description: https://mzl.la/MoFoBoardJD

Here is a short explanation of how to read this visual:

  • In the vertical columns, we have the particular skills and expertise that we are looking for right now. We expect new board members to have at least one of these skills.
  • The horizontal lines speak to things that every board member should have. For instance, to be a board member, you should have some cultural sense of Mozilla. These are things that are important for every candidate. In addition, there is a set of things that are important for the board as a whole, for instance international experience. The board makeup overall should cover these areas.
  • The horizontal lines will not change too much over time, whereas the vertical lines will change, depending on who joins the Board and who leaves.

Finding the right people who match these criteria and who have the skills we need takes time. We hope to have extensive discussions with a wide range of people. Board candidates will meet the existing board members, members of the management team, individual contributors and volunteers. We see this as a good way to get to know how someone thinks and works within the framework of the Mozilla mission. It also helps us feel comfortable including someone at this senior level of stewardship.

We want your suggestions

We are hoping to add three new members to the Mozilla Foundation Board of Directors over the next 18 months. If you have candidates that you believe would be good board members, send them to msurman@mozillafoundation.org. We will use real discretion with the names you send us.

The post Expanding Mozilla’s Boards in 2020 appeared first on The Mozilla Blog.

Wladimir PalantAvast complies to respect users' privacy

December last year was an interesting month in Avast-land. After my investigation into Avast’s data collection practices didn’t attract any attention initially, Mozilla and Opera removed Avast’s browser extensions from their respective add-on stores immediately after I reported them. Google spent two weeks evaluating the issue but eventually did the same. The matter of Avast selling users’ data even attracted the attention of high-level politicians.

Avast watching you while browsing the web

Avast’s official communication throughout that month was nothing short of amazing. I found it hard to believe that a company could keep denying any wrongdoing despite all the evidence to the contrary. Avast’s CEO Ondrej Vlcek even gave an interview to Forbes magazine where he claimed that there was no privacy scandal here. Users clearly disagreed, and so did most journalists. But the company’s stance didn’t change: all the data collected is necessary to protect users, and selling it later without users’ agreement is completely unproblematic due to the data being “anonymized.”

So when on December 22nd they finally brought out updated versions of their extensions, I was very curious to see what they changed other than writing a usable privacy policy. The updates have been accepted by all browser vendors and, at the time of writing, all four extensions are available for Firefox and Chrome. The Opera Add-ons site currently lists three extensions, with Avast Online Security still missing.

Let’s say this much up front: the changes are far more extensive and far more convincing than I would have expected. While the Chrome and Opera versions appear identical, there are some additional changes in the Firefox version. That’s presumably to comply with the stricter privacy requirements of the Mozilla Add-ons site.

Update (2020-01-10): Avast contacted me to clarify the situation. One piece of information stood out here: “we completely discontinued the practice of using any data from the browser extensions for any other purpose than the core security engine.” In other words, Avast isn’t merely doing the bare minimum required to comply with store policies; they have completely given up collecting too much data via their browser extensions, and they won’t share this data with anybody either. That’s a massive privacy improvement for any Avast users out there. The open question is whether this policy change also applies to the Avast SafePrice extension and Avast Secure Browser. I’ll update the post again once I have the answer.

Update (2020-01-16): The quoted statement from Avast seemed unambiguous, yet further communication established that sharing data with Jumpshot is going to be opt-in functionality for users of the free antivirus application. It’s still an improvement of course, but quite different from the initial statement. As to Avast SafePrice and Avast Secure Browser, improvements are expected here in future. Supposedly, the data collected by these was never used, a statement that is impossible to validate.

Just to be clear: with the large codebases and without any official information from Avast I might have overlooked some of the changes. On Firefox I looked at Avast Online Security 19.4.426, on Chrome at Avast Online Security 19.4.433 and on Opera at AVG Online Security 19.4.433.

The bogus consent screen

One change is very obvious when you install the Firefox extension. Upon installation the extension will open this consent screen:

Consent screen asking permission to look at web addresses

Currently, this only happens if you install Avast Online Security from Mozilla Add-ons website. That’s because the antivirus application installs an older version of the extension, and the consent screen isn’t displayed on updates. I assume however that installs via the antivirus application will also produce this consent screen once a new version of the application is available. Chrome and Opera extensions generally won’t show this screen.

But that doesn’t really matter. Do you think that clicking “No thanks” here will switch the extension to a privacy friendly mode? No, the extension will rather drop dead and suggest that the user uninstall it. This consent screen satisfies Mozilla’s requirement to have any URL collection be strictly opt-in, but the user does not really have a choice here. Luckily, this isn’t the only change.

The “share data with Jumpshot” setting

Update (2020-01-10): This section originally discussed an “Allow usage data to be shared with Jumpshot with analytics” setting only visible to Firefox users. As mentioned above, Avast no longer wants to share extension data with Jumpshot or other third parties. According to them, having this setting still present in the Firefox extension was an oversight and has been corrected in Avast Online Security 19.4.434.

Setting named 'Allow usage data to be shared with Jumpshot with analytics'

In previous versions this setting was called “Allow usage data to be shared with 3rd parties for analytics.” When analyzing the extension before I didn’t really understand what this setting was doing, because changing it showed so little effect and because of the misleading internal name communityIQ. As I realize now, this setting (and the setting of the same name in the antivirus application) was responsible for the fact that I didn’t see user identifiers being sent to the servers under some conditions.

This setting still exists internally in the current version of the extension. It is off by default however and will only be switched on if the Avast antivirus application is installed and data sharing is allowed there.

Setting named 'Allow usage data to be shared with 3rd parties for analysis of trends, business, and marketing'

Since the settings of the Avast antivirus application are quite extensive, note that this particular setting can be found under General > Personal Privacy.

The new data collection practices

If you are a Firefox user, things are quite simple: only minimal data will be sent to Avast now. In addition to the full page address, that data includes information about the extension and the browser you are using.

It’s the same for Chrome and Opera users who didn’t agree to share data with third parties: only a minimal amount of data is sent. If they agreed to share their data, however, the extension will send the same data set as previous versions to the Avast servers.

This is a considerable improvement to the previous versions where the “share data” setting had a very limited effect. When unchecked it would ensure that user identifiers are omitted, but otherwise all data would be sent along with a dnl flag (short for “do not log”). Also, this setting was on by default, particularly for users who never installed the antivirus application and hence couldn’t deny data collection on its consent screen.

Update (2020-01-10): Regardless of the browser used, the latest extension versions will now always set the dnl flag. That’s different from Avast Online Security 19.4.426 for Firefox that I looked at two days ago. According to Avast, this flag no longer serves a purpose and will go away eventually.

What does the “dnl” flag do?

Since the dnl flag is being processed on the server side, we can only speculate about what it does. It would be logical to assume that it is being processed according to its name: if data comes in with a request that has the dnl flag set, that data is only used to produce a response but nothing is stored. Given that previous versions of Avast extensions were setting this flag as well and at least users of the Avast antivirus application have seen a consent screen asking them to allow data usage, it would make the whole issue a much smaller one.

There are some oddities here however which make me doubt whether the logical assumption is the correct one. First are the official statements by Avast in reaction to this issue being raised. If data was being shared with Jumpshot only for the users who agreed to it, why not say so? Even if the users have no way to validate this claim, it’s still a much stronger statement than “the data is anonymized, nothing to be concerned about.”

And if the dnl flag is being considered correctly by the server, why is it always set for Chrome and Opera users? With the majority of Avast users on Chrome, I don’t think that Avast would give up so much data intentionally. So it must be a bug, one that has been in production for more than two weeks now. Even with the holiday season, somebody certainly would have noticed a sharp decline in the number of data samples collected? With data being so important to Avast’s business, they certainly would have rushed a fix?

Update (2020-01-10): The speculation in the paragraph above is incorrect, the dnl flag always being set is intentional and not a bug. That’s in line with Avast’s decision to stop sharing data with third parties. So I am now more inclined to believe that historically the dnl flag indeed caused data to be discarded at some point, ideally before this data was passed on to Jumpshot.

And the shopping helpers?

I didn’t spend too much time investigating Avast SafePrice. This extension being a shopping helper, it apparently cannot be expected to be too privacy-friendly. So there is no “data sharing” setting and no dnl flag here. Merely a bogus consent screen was added on Firefox: “either you allow us to collect all this data or the extension won’t be usable.” To add insult to injury, the extension won’t remember you declining or closing this consent screen, so when you restart your browser it will simply assume consent.

Consent screen asking for access to an extensive set of data

At least I noticed one change to the data collection practices. While the data collected here is still quite extensive and will always contain a unique user identifier as well as window and tab identifiers for example, the page address is now being shortened to contain protocol and host name only. At least that much.

Update (2020-01-10): I clarified that the consent screen is required by Mozilla’s policies. There have been a number of other corrections based on information provided by Avast; these are marked separately.

Update (2020-01-16): Avast’s statement above has been extended based on further communication.

Daniel Stenbergcurl 7.68.0 with etags and BearSSL

The year is still young, and we’re here to really kick off 2020 with a brand new curl release! curl 7.68.0 is available at curl.haxx.se as always. Once again we’ve worked hard and pushed through another release cycle to bring you the very best we could do in the 63 days since 7.67.0.

(The previous release was said to be the 186th, but it turned out we’ve been off-by-one on the release counter for a while.)

Numbers

the 188th release
6 changes
63 days (total: 7,964)

124 bug fixes (total: 5,788)
193 commits (total: 25,124)
1 new public libcurl function (total: 82)
0 new curl_easy_setopt() options (total: 269)

3 new curl command line options (total: 229)
70 contributors, 32 new (total: 2,088)
31 authors, 13 new (total: 756)
1 security fix (total: 93)
400 USD paid in Bug Bounties

Security Vulnerability

CVE-2019-15601: SMB access smuggling via FILE URL on Windows.

Simply put: you could provide a FILE:// URL to curl that could trick it into trying to access a host name over SMB – on Windows machines. This could happen because Windows apparently always does this automatically if given the right kind of file name, and curl had no specific filter to avoid it.

For this discovery and report, the curl Bug Bounty program has rewarded Fernando Muñoz 400 USD.

Changes

We ship a new TLS backend: BearSSL. The 14th.

We ship two new command line options for ETags.
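
The two options are --etag-save and --etag-compare (names taken from the curl documentation; double-check against your installed version). A simple revalidation flow could look something like this, with the URL and file name as placeholders:

# first fetch: save the response's ETag to a file
curl --etag-save etag.txt -o page.html https://example.com/

# later fetch: send If-None-Match with the saved ETag, so the body is
# only downloaded again if the resource changed on the server
curl --etag-compare etag.txt -o page.html https://example.com/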

We provide a new API call to wake up "sleeping" libcurl poll calls.
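
The new call is curl_multi_wakeup(), which makes a waiting curl_multi_poll() return early from another thread. A minimal sketch of the idea (error handling omitted, pthreads assumed):

#include <curl/curl.h>
#include <pthread.h>
#include <stdio.h>

static CURLM *multi;

/* runs in a second thread and interrupts the sleeping poll below */
static void *waker(void *arg)
{
  (void)arg;
  curl_multi_wakeup(multi);
  return NULL;
}

int main(void)
{
  int numfds = 0;
  pthread_t tid;

  curl_global_init(CURL_GLOBAL_DEFAULT);
  multi = curl_multi_init();
  pthread_create(&tid, NULL, waker, NULL);

  /* would block for up to 10 seconds without the wakeup call */
  curl_multi_poll(multi, NULL, 0, 10000, &numfds);
  printf("poll returned\n");

  pthread_join(tid, NULL);
  curl_multi_cleanup(multi);
  curl_global_cleanup();
  return 0;
}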

We changed the default handling in libcurl with OpenSSL for verifying certificates. We now allow “partial chains” by default, meaning that you can use an intermediate cert to verify the server cert, not necessarily the whole chain to the root, like you did before. This brings the OpenSSL backend to work more similar to the other TLS backends, and we offer a new option for applications to switch back on the old behavior (CURLSSLOPT_NO_PARTIALCHAIN).

The progress callback has a new feature: if you return CURL_PROGRESSFUNC_CONTINUE from the callback, it will continue and call the internal progress meter.
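
One way to use this is with the CURLOPT_XFERINFOFUNCTION callback. A minimal sketch (the URL is just a placeholder):

#include <curl/curl.h>

/* returning CURL_PROGRESSFUNC_CONTINUE lets libcurl's own progress
   meter keep running in addition to this callback */
static int xferinfo(void *clientp, curl_off_t dltotal, curl_off_t dlnow,
                    curl_off_t ultotal, curl_off_t ulnow)
{
  (void)clientp; (void)dltotal; (void)dlnow; (void)ultotal; (void)ulnow;
  /* custom bookkeeping could go here */
  return CURL_PROGRESSFUNC_CONTINUE;
}

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
    curl_easy_setopt(curl, CURLOPT_XFERINFOFUNCTION, xferinfo);
    curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L); /* enable the callback */
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}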

The new command line option --parallel-immediate is added, and if used will make curl do parallel transfers the way it did before 7.68.0. Starting with 7.68.0, curl will by default defer new connections and rather try to multiplex new transfers over an existing connection if more than one transfer is specified to be done from the same host name.
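
For example, to get the pre-7.68.0 behavior back when doing parallel transfers (the URLs are placeholders):

curl --parallel --parallel-immediate -O https://example.com/one.bin -O https://example.com/two.bin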

Bug-fixes

Some of my favorite fixes done since the last release include…

Azure CI and torture

This cycle we started running a bunch of CI tests on Azure Pipelines, both Linux and macOS tests. We also managed to get torture tests running thanks to the new shallow mode.

Azure seems to run faster and more reliably than Travis CI, so moving a few jobs over means a full build run now often completes in less total time.

prefer multiplexing to using new connections

A regression was found that made the connection reuse logic in libcurl prefer new connections to multiplexing more than was actually intended. Once fixed, we should see libcurl-using applications do more and better HTTP/2 multiplexing.

support for ECDSA and ed25519 knownhost keys with libssh2

libssh2 is the primary SSH backend people use with curl. While the library itself has supported these new “knownhost” keys for a while, we hadn’t previously adjusted curl to play nicely with them. Until now.

openssl: Revert to less sensitivity for SYSCALL errors

Another regression in the OpenSSL backend code made curl overly sensitive to some totally benign TLS messages which would cause a curl error when they should just have been silently handled and closed the connection cleanly.

openssl: set X509_V_FLAG_PARTIAL_CHAIN by default

The OpenSSL backend now behaves more similar to other TLS backends in curl and now accepts “partial” certificate chains. That means you don’t need to have the entire chain locally all the way to the root in order to verify a server certificate. Unless you set CURLSSLOPT_NO_PARTIALCHAIN to enforce that behavior.

parsedate: offer a getdate_capped() alternative

The date parser was extended to make curl handle dates beyond 2038 better on 32 bit systems, which primarily seems to matter for cookies. Now the parser understands that such dates are too big and will use the maximum time value it can hold, instead of failing and using a zero, as a zero would turn the cookies into “session cookies” with slightly different behavior.

Video presenting the release

Cameron KaiserThe new Overbite Android (works with Firefox Android too): Gopherspace on your mobile Android device

Since this blog is syndicated to Planet Mozilla and I periodically post Mozilla- and Firefox-relevant posts, here is another: if you still dwell in Gopherspace and use OverbiteWX and OverbiteNX on desktop Firefox, Overbite Android has been updated to full Android Q compatibility so you can use it with Android Firefox as well. Instead of an add-on, just sideload the .apk, and whenever you tap a Gopher URL in Firefox it will automatically load in Overbite Android so you can seamlessly jump back and forth. (Or Chrome, I guess, but who uses that?)

Naturally Overbite Android works just fine as a standalone application and responds to any gopher:// intent sent by any other activity, including Firefox. However, this latest version has been updated specially for Android Q support, including dark theme:

I also purged a lot of the old API usage, replacing it with a more Material Design-style UI, an actual address bar you can edit for a change, and a dynamic menu off a floating action button (as opposed to the old school Menu button menu, support for which was removed from Android Q). There are also fixes to scrolling and zooming, and you can still generate and drop shortcuts on your Android launcher as bookmarks.

Now that I've gotten off my butt and converted it to Android Studio, I suppose I should start working on download support again but everything else (including searches) functions just dandy.

Overbite Android is offered to you under the BSD license and supports 4.0 (Ice Cream Sandwich) through 10 (Q). You can get it, the Android Studio project and source code, and all the rest of the Overbite gopher client family from the Overbite website or directly from Gopherspace.

Will Kahn-GreeneHow to pick up a project with an audit

Over the last year, I was handed a bunch of projects in various states. One of the first things I do when getting a new project that I'm suddenly responsible for is to audit the project. That helps me figure out what I'm looking at and what I need to do with it next.

This blog post covers my process for auditing projects I'm suddenly the proud owner of.

Read more… (5 min remaining to read)

Mozilla Open Policy & Advocacy BlogOpen Letter to Indian IT Minister by Mozilla, GitHub, and Cloudflare: Release draft intermediary liability rules, assuage concerns voiced during public consultation

With the Indian government’s commitment to the Supreme Court to notify the intermediary liability amendments by January 15, 2020 fast approaching, global internet organizations Mozilla, GitHub, and Cloudflare have penned an open letter to the Union Minister of Electronics & Information Technology, Shri. Ravi Shankar Prasad. The letter highlights significant concerns with the rules and calls for improved transparency by allowing the public an opportunity to see a final version of these amendments prior to their enactment.

An excerpt from the letter is extracted below, and the full letter is available online:

“On behalf of a group of global internet organisations with millions of users in India, we are writing to urge you to ensure the planned amendments to India’s intermediary liability regime allow for the Internet to remain an open, competitive, and empowering space for Indians. We understand and respect the need to ensure the internet is a safe space where large platforms take appropriate responsibility. However, the last version of these amendments which were available in the public domain suggest that the rules will promote automated censorship, tilt the playing field in favour of large players, substantially increase surveillance, and prompt a fragmentation of the internet in India that would harm users while failing to empower Indians.

The current safe harbour liability protections have been fundamental to the growth of the internet in India. They have enabled hosting platforms to innovate and flourish without fear that they would be crushed by a failure to police every action of their users. Imposing the obligations proposed in these new rules would place a tremendous, and in many cases fatal, burden on many online intermediaries – especially new organizations and companies. A new community or a startup would be significantly challenged by the need to build expensive filtering infrastructure and hire an army of lawyers.

Given your government’s commitment to the Supreme Court of India to notify these rules by January 15, 2020, it is vital that the public has the opportunity to see a final version of these amendments to help ensure that they assuage the concerns which have been voiced by a wide variety of stakeholders during the public consultation. We appeal for this increased transparency and we remain committed to working with you to achieve the broader objective of these amendments while allowing Indians to benefit from a global internet.”

 

About Mozilla
Mozilla is the not-for-profit behind the popular web browser, Firefox. We believe the Internet is a global public resource, open and accessible to all. We work to ensure it stays open by building products, technologies and programs that put people in control of their online lives, and contribute to a healthier internet. Mozilla is also leading a public petition to Shri. Ravi Shankar Prasad, India’s IT Minister, to make the latest draft of the intermediary liability amendments public prior to their enactment.

About GitHub
GitHub is the developer company. We make it easier for developers to be developers: to work together, to solve challenging problems, to create the world’s most important technologies. We foster a collaborative community that can come together—as individuals and in teams—to create the future of software and make a difference in the world.

About Cloudflare
Cloudflare, Inc. (NYSE: NET / www.cloudflare.com / @cloudflare) is on a mission to help build a better Internet. Cloudflare’s platform protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare have all web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks.

The post Open Letter to Indian IT Minister by Mozilla, GitHub, and Cloudflare: Release draft intermediary liability rules, assuage concerns voiced during public consultation appeared first on Open Policy & Advocacy.

Hacks.Mozilla.OrgFirefox 72 — our first song of 2020

2020 is upon us, folks. We’d like to wish everyone reading this a happy new year, wherever you are. As you take your first steps of the new year, figuring out what your next move is, you may find it comforting to know that there’s a new Firefox release to try out!

Version 72 to be exact.

One of the highlights that we are most proud of is that user gestures are now required for a number of permission-reliant methods, such as Notification.requestPermission(). User research commonly brings up permission prompt spam as a top user annoyance, so we decided to do something about it. This change reduces permission spam and strengthens users’ agency over their online experience.

This release brings several other new features, including DevTool improvements such as Watchpoints, WebSockets inspector improvements, and resource download times; support for CSS features like shadow parts, motion path, and transform properties; and JS/API features such as event-based form participation and the nullish coalescing operator.

Read on for more highlights. To find the full list of additions, check out the following MDN articles:

Now that we’ve moved to a 4-week browser release cycle, you’ll see fewer new features in each individual release, but features will be added to Firefox more often. This gives you faster access to new functionality and bug fixes. You can read our full rationale for the change in Moving Firefox to a faster 4-week release cycle.

DevTools improvements

First, we’ll look at Firefox 72 DevTools improvements in more detail.

Pause on variable access or change

Watchpoints are a new type of breakpoint that can pause execution when an object property gets read or set. You can set watchpoints from the context menu of any object listed in the Scopes panel.

setting watchpoints in the debugger, using options in the context menu of objects in the scopes panel

This feature is described in more detail in the Use watchpoints article on MDN, and Debugging Variables With Watchpoints in Firefox 72 on Hacks.

Firefox DevEdition only: Asynchronous Stacks in Console

Console stacks capture the full async execution flow for console.trace() and console.error(). This lets you understand scheduling of timers, events, promises, generators, etc. over time, which would otherwise be invisible.
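
For example, with code like the following, the async stack shown for the console.trace() call also includes the frames that scheduled the setTimeout() and the promise, not just the callback itself:

function scheduleWork() {
  setTimeout(() => {
    Promise.resolve().then(() => {
      console.trace('where was this scheduled?');
    });
  }, 1000);
}

scheduleWork();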

an asynchronous callstack being shown in the javascript console

They are only enabled in Firefox Developer Edition for now. We are working to make this feature available to all users after improving performance. Async stacks will also be rolled out to more types of logs, and of course the Debugger.

SignalR formatting & download/upload size for WebSockets

Before shipping the new WebSocket inspector in 71 we had it available in Firefox DevEdition and asked for your input. We didn’t just get a lot of fantastic ideas; some of you even stepped up to contribute code. Thanks a lot for that, and keep it coming!

Messages sent in ASP.NET’s Core SignalR format are now parsed to show nicely-formatted metadata. The bug was filed thanks to feedback from the ASP.NET community and then picked up by contributor Bryan Kok.

Similarly, the community asked to have the total transfer size for download and upload available. This is now a reality thanks to contributor Hayden Huang, who took up the bug as their first Firefox patch.

Start and end times for Network resources

The Timings tab of the Network Monitor now displays timings for each downloaded resource, making dependency analysis a lot easier:

  • Queued — When the resource was queued for download.
  • Started — When the resource started downloading.
  • Downloaded — When the resource finished downloading.

And as always, faster and more reliable

Here are just a few highlights from our continued performance and quality investments:

  • In the Inspector, editing CSS is no longer blocked by CSP rules.
  • The Inspector‘s badge for Custom Elements now correctly opens the original script for source maps.
  • The Inspector now correctly preserves the selected element for <iframe>s when reloading.
  • The Debugger now loads faster when many tabs are open, by prioritizing visible tabs first.

CSS additions

Now let’s move on to the most interesting new CSS features in Firefox 72.

Shadow Parts

One problem with styling elements contained inside a Shadow DOM is that you can’t just style them from CSS applied to the main document. To make this possible, we’ve implemented Shadow Parts, which allow shadow hosts to selectively expose chosen elements from their shadow tree to the outside page for styling purposes.

Shadow parts require two new features. The part attribute exposes an element inside a shadow tree to the outside page:

<custom-element>
  <p part="example">A paragraph</p>
</custom-element>

The ::part() pseudo-element is then used to select elements with a specific part attribute value:

custom-element::part(example) {
  border: solid 1px black;
  border-radius: 5px;
  padding: 5px;
}

CSS Motion Path

Motion Path is an interesting new spec for all you animators out there. The idea here is that you can define a path shape and then animate a DOM node along that path. The spec offers an alternative to animating transform: translate(), position properties like top and right, or some other property that often isn’t ideal for this purpose and can result in very complex sets of keyframes.

With motion path, you define the shape of the path using offset-path:

offset-path: path('M20,20 C20,100 200,0 200,100');

Define an animation to animate the element between different values of the offset-distance property, which defines how far along the defined path you want the element to appear:

@keyframes move {
  0% {
    offset-distance: 0%;
  }

  100% {
    offset-distance: 100%;
  }
}

Then, animate the element using those keyframes:

animation: move 3000ms infinite alternate ease-in-out;

This is a simple example. There are additional properties available, such as offset-rotate and offset-anchor. With offset-rotate, you can specify how much you want to rotate the element being animated. Use offset-anchor to specify which background-position of the animated element is anchored to the path.
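
For example, a rule along these lines (the class name is just for illustration) keeps the element facing the direction of travel while anchoring its center to the path:

.mover {
  offset-path: path('M20,20 C20,100 200,0 200,100');
  offset-rotate: auto 45deg; /* face the direction of travel, plus 45 degrees */
  offset-anchor: center;     /* anchor the element's center to the path */
  animation: move 3000ms infinite alternate ease-in-out;
}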

Individual transform properties

In this release the following individual transform properties are enabled by default: scale, rotate, and translate. These can be used to set transforms on an element, like so:

scale: 2;
rotate: 90deg;
translate: 100px 200px;

These can be used in place of:

transform: scale(2);
transform: rotate(90deg);
transform: translate(100px, 200px);

Or even:

transform: scale(2) rotate(90deg) translate(100px, 200px);

These properties are easier to write than the equivalent individual transforms, map better to typical user interface usage, and save you having to remember the exact order of multiple transform functions specified in the transform property.

JavaScript and WebAPI updates

If JavaScript is more your thing, this is the section for you. 72 has the following updates.

User gestures required for a number of permission-reliant methods

Notification permission prompts always show up in research as a top web annoyance, so we decided to do something about it. To improve security and avoid unwanted and annoying permission prompts, a number of methods have been changed so that they can only be called in response to a user gesture, such as a click event. These are Notification.requestPermission(), PushManager.subscribe(), and MediaDevices.getDisplayMedia().

By requiring a user gesture before the permission prompts are shown, Firefox significantly reduces permission spam, thereby strengthening users’ agency over their online experience.

So, for example, prompting for notification permission on initial page load is no longer possible. You now need something like this:

btn.addEventListener('click', function() {
  Notification.requestPermission();
  // Handle other notification permission stuff in here
});

For more detail on associated coding best practices for Notification permissions, read Using the Notifications API.

Nullish coalescing operator

The nullish coalescing operator, ??, returns its right-hand side operand when its left-hand side operand is null or undefined. Otherwise, it returns its left-hand side operand.

This is a useful timesaver in a number of ways, and it is also useful when you only consider null and undefined to be unwanted values, and not other falsy values like 0 and '' (the empty string).

For example, if you want to check whether a value has been set and return a default value if not, you might do something like this:

let value;

if(!value) {
  value = 'default';
}

That’s a bit long, so you might instead use this common pattern:

let value;
value = value || 'default';

This also works OK, but will return unexpected results if you want to accept values of 0 or ''.

With ??, you can do this instead, which is concise and solves the problem described above:

let value;
value = value ?? 'default';

Event-based form participation

Event-based form participation is now enabled by default. This involves using the new FormData event, which fires when the form is submitted, but can also be triggered by the invocation of a FormData() constructor. This allows a FormData object to be quickly obtained in response to a formdata event firing, rather than needing to create it yourself — useful when you want to submit a form via XHR, for example.

Here’s a look at this feature in action:

formElem.addEventListener('submit', (e) => {
  // on form submission, prevent default
  e.preventDefault();

  // construct a FormData object, which fires the formdata event
  new FormData(formElem);
});

formElem.addEventListener('formdata', (e) => {
  console.log('formdata fired');

  // Get the form data from the event object
  let data = e.formData;

  // submit the data via XHR
  let request = new XMLHttpRequest();
  request.open("POST", "/formHandler");
  request.send(data);
});

Picture-in-picture for video now available on macOS & Linux

In the previous release post, we announced that Picture-in-picture had been enabled in Firefox 71, albeit for Windows only. However, today we have the goods: this very popular feature is now available on macOS and Linux too!

picture in picture on mac os; a video being played in a separate overlay form the page where it is actually embedded

The post Firefox 72 — our first song of 2020 appeared first on Mozilla Hacks - the Web developer blog.

The Firefox FrontierHow to block fingerprinting with Firefox

If you wonder why you keep seeing the same ad, over and over, the answer could be fingerprinting. What is fingerprinting? Fingerprinting is a type of online tracking that’s more … Read more

The post How to block fingerprinting with Firefox appeared first on The Firefox Frontier.

The Firefox FrontierSay goodbye to annoying notification requests (unless you want them)

Life is full of interruptions and distractions. Like getting robocalls during dinner or any meal time. It’s inevitable that when you’re trying to get things done, something else demands your … Read more

The post Say goodbye to annoying notification requests (unless you want them) appeared first on The Firefox Frontier.

Daniel Stenbergcurl option-of-the-week

The curl command line tool has a lot of options. And I really mean a lot, as in more than two hundred of them. After the pending release, I believe the exact count is 229.

Starting now, I intend to write and blog about a specific curl command line option per week (or so). With details, examples, maybe history and whatever I can think of is relevant to the option. Quite possibly some weeks will just get shorter posts if I don’t have a lot to say about it.

curl command line option of the week; ootw.

This should keep me occupied for a while.

Options

This will later feature the list of options with links to the dedicated blog posts about them. If you have preferences for which options to cover, do let me know and I can schedule them early on; otherwise I plan to go over them in a mostly random order.

--raw (January 13th, 2020)
-m, --max-time (January 20th, 2020)

The Firefox FrontierPicture-in-Picture lets you watch videos while “working”

Some days there’s something extra interesting to watch online — the big game, congressional testimony, a certain show is leaving Netflix so you gotta binge — but you’ve got work … Read more

The post Picture-in-Picture lets you watch videos while “working” appeared first on The Firefox Frontier.

Mozilla Security BlogFirefox 72 blocks third-party fingerprinting resources

Privacy is a human right, and is core to Mozilla’s mission. However, many companies on the web erode privacy when they collect a significant amount of personal information. Companies record our browsing history and the actions we take across websites. This practice is known as cross-site tracking, and its harms include unwanted targeted advertising and divisive political messaging.

Last year we launched Enhanced Tracking Protection (ETP) to protect our users from cross-site tracking. In Firefox 72, we are expanding that protection to include a particularly invasive form of cross-site tracking: browser fingerprinting. This is the practice of identifying a user by the unique characteristics of their browser and device. A fingerprinting script might collect the user’s screen size, browser and operating system type, the fonts the user has installed, and other device properties—all to build a unique “fingerprint” that differentiates one user’s browser from another.
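
As a simplified illustration (not code from any particular tracker), a fingerprinting script might combine a handful of such signals into a single identifier:

// a toy example of the kinds of signals a fingerprinting script combines
const fingerprint = [
  screen.width,
  screen.height,
  navigator.userAgent,
  navigator.language,
  new Date().getTimezoneOffset(),
].join('|');

// the resulting string is typically hashed and sent back to the tracker
console.log(fingerprint);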

Fingerprinting is bad for the web. It allows companies to track users for months, even after users clear their browser storage or use private browsing mode. Despite a near complete agreement between standards bodies and browser vendors that fingerprinting is harmful, its use on the web has steadily increased over the past decade.

We are committed to finding a way to protect users from fingerprinting without breaking the websites they visit. There are two primary ways to protect against fingerprinting: to block parties that participate in fingerprinting, or to change or remove APIs that can be used to fingerprint users.

Firefox 72 protects users against fingerprinting by blocking all third-party requests to companies that are known to participate in fingerprinting. This prevents those parties from being able to inspect properties of a user’s device using JavaScript. It also prevents them from receiving information that is revealed through network requests, such as the user’s IP address or the user agent header.

We’ve partnered with Disconnect to provide this protection. Disconnect maintains a list of companies that participate in cross-site tracking, as well as a list of those that fingerprint users. Firefox blocks all parties that meet both criteria [0]. We’ve adapted measurement techniques from past academic research to help Disconnect discover new fingerprinting domains. Disconnect performs a rigorous, public evaluation of each potential fingerprinting domain before adding it to the blocklist.

Firefox’s blocking of fingerprinting resources represents our first step in stemming the adoption of fingerprinting technologies. The path forward in the fight against fingerprinting will likely involve both script blocking and API-level protections. We will continue to monitor fingerprinting on the web, and will work with Disconnect to build out the set of domains blocked by Firefox. Expect to hear more updates from us as we continue to strengthen the protections provided by ETP.

 

[0] A tracker on Disconnect’s blocklist is any domain in the Advertising, Analytics, Social, Content, or Disconnect category. A fingerprinter is any domain in the Fingerprinting category. Firefox blocks domains in the intersection of these two classifications, i.e., a domain that is both in one of the tracking categories and in the fingerprinting category.

The post Firefox 72 blocks third-party fingerprinting resources appeared first on Mozilla Security Blog.

This Week In RustThis Week in Rust 320

Hello and welcome to another issue of This Week in Rust! Rust is a systems language pursuing the trifecta: safety, concurrency, and speed. This is a weekly summary of its progress and community. Want something mentioned? Tweet us at @ThisWeekInRust or send us a pull request. Want to get involved? We love contributions.

This Week in Rust is openly developed on GitHub. If you find any errors in this week's issue, please submit a PR.

Updates from Rust Community

News & Blog Posts

Crate of the Week

This week's crate is sqlx, a modern SQL client library.

Thanks to Jan Riemer for the suggestions!

Submit your suggestions and votes for next week!

Call for Participation

Always wanted to contribute to open-source projects but didn't know where to start? Every week we highlight some tasks from the Rust community for you to pick and get started!

Some of these tasks may also have mentors available, visit the task page for more information.

If you are a Rust project owner and are looking for contributors, please submit tasks here.

Updates from Rust Core

207 pull requests were merged in the last week

Approved RFCs

Changes to Rust follow the Rust RFC (request for comments) process. These are the RFCs that were approved for implementation this week:

Final Comment Period

Every week the team announces the 'final comment period' for RFCs and key PRs which are reaching a decision. Express your opinions now.

RFCs
Tracking Issues & PRs

New RFCs

Upcoming Events

Europe
North America
South America

If you are running a Rust event please add it to the calendar to get it mentioned here. Please remember to add a link to the event too. Email the Rust Community Team for access.

Rust Jobs

Tweet us at @ThisWeekInRust to get your job offers listed here!

Quote of the Week

relatively speaking, my rust programs are like Leonardo DiCaprio in the Revenant, killing grizzly bears with their bare hands, dying and being frozen into a giant ice cubes then, surprise!, they're actually alive.

they can handle a lot, they tend to experience far fewer bugs that come around days or weeks after going into production.

my python programs, otoh, are like William Henry Harrison. Inauguration day! exciting! kind of chilly out here. uh oh -- pneumonia ... dang it!

Jonathan Strong on reddit

Thanks to Jan Riemer for the suggestion!

Please submit quotes and vote for next week!

This Week in Rust is edited by: nasa42 and llogiq.

Discuss on r/rust.

Karl Dubostwebcompat.com Anonymous Reporting - Some context

Context

The first week of January, we had to disable anonymous reporting. GitHub, in a two-step strike, blocked webcompat-bot (which allows us to handle anonymous reporting) and finally the full web-bugs repo (which handles all the issues for webcompat.com). The reason for blocking was illegal content.

Previous situation

Anonymous reporting was open to everyone and we would moderate after the fact if an issue was really a liability for either GitHub or us. For the last 5 years, I guess the webcompat.com site was not well known enough to be a target of bots, and problematic issues were not frequent enough. The situation has evolved.

The fall: We missed one issue which needed to be moderated and deleted. It was in public view for quite a long time. We need to review our process around that.

Current situation

  • Any authenticated GitHub users can still report issues.
  • Anonymous reporting is disabled
  • webcompat-bot is still disabled, until we can prove to GitHub that we handle the issues related to the initial blocking.

Why people report issues anonymously?

There are a couple of reasons why someone would report an issue about a website anonymously:

  • Regular browser users, not necessarily with a GitHub account and not necessarily tech-savvy enough to go through the hassle of creating an otherwise useless account on GitHub.
  • A tech-savvy user who already has a GitHub account, but the excessive permissions required to file a bug using GitHub make some people uncomfortable.
  • Someone who wants to report a website with a real issue, but does not want to be associated with the report. For example, reports about websites where the content or topic would create an issue for their own personal safety in certain contexts.

What are the possible issues with reporting?

While anonymity or soft-anonymity is an important feature in our society, it also creates challenges in some contexts. Some of these issues are not only tied to anonymous reporting, but anonymous reporting makes it more difficult to have a direct discussion about them.

  • spam bots randomly posting issues for the purpose of making websites known
  • people with an agenda of just being nasty
  • people genuinely reporting problematic issues:
      • illegal content with possible liabilities
      • private personal information (imagine personal bank accounts, private page on social network, …)
      • internal pages of a company
      • private personal information of others (imagine a social network where the screenshot includes information about users which are usually not publicly visible)

The future

This post will help me to re-think our strategy for anonymous reporting, and what we need to put in place for the future. We have ideas and options already, which I will probably flesh out during this week.

If you have any ideas to contribute, you are welcome to post comments on the issue. Be constructive. I will be drastic in removing things which are out of line.

Otsukare!

Dave HuntState of Performance Test Engineering (H2/2019)

It’s a new year, and time for me to post an update on the state of Firefox performance test engineering. The last update was in July 2019 and covered the first half of the year. This update covers the second half of 2019.

Team

The team consists of 9 engineers based in California, Toronto, Montreal, and Romania.

Tests

We currently support 4 frameworks:

  • Are We Slim Yet (AWSY) - memory consumption by the browser
  • build_metrics - build times, installer size, and other compiler-specific insights
  • Raptor - page load and browser benchmarks
  • Talos - various time-based performance KPIs on the browser

At the time of writing, there are 511 test suites (query) for the above frameworks. This is up from 263 in the H1/2019 report.

The following are some highlights of tests introduced in H2/2019:

Dashboards

In addition to updating health.graphics and related dashboards based on test changes, we have also contributed summary dashboards for media playback performance and power usage.

We have also contributed several improvements to the Perfherder tool used by the performance sheriffs for monitoring and investigating regression and improvement alerts. Some examples are listed below:

  • Ability for sheriffs to assign alerts
  • Allow users to retrigger from the compare view
  • New view showing active performance tests
  • Highlight prioritised alerts for investigation
  • Show measurement unit in graphs

Hardware

Our tests are running on the following hardware in continuous integration:

  • 38 Moto G5 devices (with two of these dedicated to power testing)
  • 23 Pixel 2 devices (with two of these dedicated to power testing)
  • 16 Acer Aspire 15 laptops (2017 reference hardware)
  • 35 Lenovo Yoga C630 laptops (Windows ARM64)
  • 2 Apple MacBook Pro laptops (used for power testing)

See this wiki page for more details on the hardware used.

Sheriffs

We have 3 performance sheriffs 🤠 dedicating up to 50% of their time to this role. Outside of this, they assist with improving our tools and test harnesses.

Q3/2019

During the third quarter of 2019 our performance tools generated 1110 alert summaries. This is an average of 12 alert summaries every day.

Of the four test frameworks, build_metrics caused the most alert summaries, accounting for 42% of the total. The raptor framework was the second biggest contributor, with 30%.

Alert Summaries by Framework (Q3/2019)

Of the alert summaries generated, 13% were improvements. Of the alert summaries showing regressions, 51% were determined to be invalid, 5% were backed out, 23% were accepted, and 18% were fixed.

Regressions by Status (Q3/2019)

Our performance sheriffs triaged 63% of alerts within a day, and an additional 22% within 3 days.

Triage Response (Q3/2019)

Bugs were filed within 3 days for 53% of confirmed regressions, with a further 16% within 5 days.

Regression Bug Response (Q3/2019)

Here are some highlights for some of our sheriffed frameworks:

Are We Slim Yet (AWSY)

  • On July 22nd we detected up to 18.9% improvement in AWSY for macOS. This was caused by Paul Bone’s patch on bug 1567366, which switched to using MADV_FREE_REUSABLE.
  • The largest fixed regression detected from AWSY was noticed on August 2nd, and had an impact of up to 21.57% regression to images. This was attributed to bug 1570745 and was fixed by switching AWSY to the common scenario for new tab page rather than the new user experience.

Raptor

  • The improvement alert with the highest magnitude for Raptor was created on July 16th, which saw up to a massive 46.06% improvement to page load time on desktop. The gains were attributed to bug 1541229, which tweaked idle detection during page load.
  • The largest regression alert that was ultimately fixed for Raptor was generated on August 13th. This showed up to 21.86% regression to the cold page load tests on Android, and was caused by bug 1557282. This was also noticed in Telemetry and was fixed via bug 1575794.

Talos

  • For Talos, the alert showing the largest improvement was created on August 13th, which showed up to 13.09% improvement to ts_paint. It was attributed to bug 1572646, which optimised picture cache tiles that are solid colours.
  • The largest fixed regression alert for Talos was spotted on August 8th, and includes a 4.24% hit to ts_paint on Windows. It was caused by bug 1539651, and was fixed by Brian Grinstead in bug 1573158.

Q4/2019

In the final quarter, our performance tools generated 996 alert summaries. This is an average of 11 alert summaries every day.

Of the four test frameworks, build_metrics caused the most alert summaries, accounting for 49% of the total. The raptor framework was the second biggest contributor, with 25%.

Alert Summaries by Framework (Q4/2019)

Of the alert summaries generated, 14% were improvements. Of the alert summaries showing regressions, 50% were determined to be invalid, 8% were backed out, 17% were accepted, and 10% were fixed.

Regressions by Status (Q4/2019)

Our performance sheriffs triaged 61% of alerts within a day, and an additional 26% within 3 days.

Triage Response (Q4/2019)

Bugs were filed within 3 days for 67% of confirmed regressions, with a further 11% within 5 days.

Regression Bug Response (Q4/2019)

Here are some highlights for some of our sheriffed frameworks:

Are We Slim Yet (AWSY)

  • The largest improvement noticed for AWSY was on October 11th, where we saw up to a 3.88% decrease in Base Content JS. This was attributed to the work by André Bargull bug 1570370 to move language tag parsing and Intl.Locale to C++.
  • On September 30th, a 6.15% regression was noticed in JS, caused by bug 1345830. This was fixed by Gijs in bug 1586220.
  • The open regression with the highest impact for AWSY was detected on June 12th. Whilst there were a large number of improvements in this alert, the 12.67% regression to images was also significant. The regression was caused by bug 1558763, which changed the value of a preference within Marionette. It looks like this may have been fixed, but our performance sheriffs have yet to verify this.

Raptor

  • The most significant improvement detected by Raptor was on December 2nd: up to a 35.59% boost to many page load and benchmark tests on macOS. This was due to Nathan Froyd’s patch in bug 1599133 to enable constructing Sequence from moved nsTArrays.
  • On November 4th, a regression alert of 9.53% was reported against the Wikipedia page load test for bing.com on desktop. It turned out that bug 1591717 caused the unexpected regression. It was fixed by a patch by Emilio Cobos Álvarez to turn layout.css.notify-of-unvisited off for now.
  • Due to many page recordings being recreated, there are a lot of open alerts that require closer examination.

Talos

  • On December 18th we detected an improvement of up to 32.21% to several tests on macOS. This was thanks to Chris Manchester’s work on enabling PGO (Profile-Guided Optimisations) for the platform in bug 1604578.
  • Talos detected a 35.21% regression to perf_reftest_singletons on November 14th, which was caused by bug 1588431 and fixed by Emilio Cobos Álvarez in bug 1596712.
  • The open regression alert with the highest magnitude was opened on November 26th, and reports up to 196.47% regression to tp5o. It was caused by bug 1512011, which replaced mozhttpd with wptserve in Talos.

Will Kahn-GreeneSocorro Engineering: Year in Review 2019

Summary

Last year at about this time, I wrote a year in review blog post. Since I only worked on Socorro at the time, it was all about Socorro. In 2019, that changed, so this blog post covers the efforts of two people across a bunch of projects.

2019 was pretty crazy. We accomplished a lot, but picking up a bunch of new projects really threw a wrench in the wheel of ongoing work.

This year in review covers highlights, some numbers, and some things I took away.

Here's the list of projects we worked on over the year:

Read more… (13 min remaining to read)

Mozilla VR BlogMozilla Announces Deal to Bring Firefox Reality to Pico Devices

Mozilla Announces Deal to Bring Firefox Reality to Pico Devices

For more than a year, we at Mozilla have been working to build a browser that was made to showcase the best of what you love about browsing, but tailor made for Virtual Reality.

Now we are teaming up with Pico Interactive to bring Firefox Reality to its latest VR headset, the Neo 2 – an all-in-one (AIO) device with 6 degrees of freedom (DoF) head and controller tracking that delivers key VR solutions to businesses. Pico’s Neo 2 line includes two headsets: the Neo 2 Standard and the Neo 2 Eye featuring eye tracking and foveated rendering. Firefox Reality will also be released and shipped with previous Pico headset models.

Mozilla Announces Deal to Bring Firefox Reality to Pico Devices

This means anytime someone opens a Pico device, they’ll be greeted with the speed, privacy, and great features of Firefox Reality.

Firefox Reality includes the ability to sync your Firefox Account enabling you to send tabs, sync history and bookmarks, making great content easily discoverable. There’s also a curated section of top VR content, so there’s always something fresh to enjoy.

“We are pleased to be partnered with Pico to bring Firefox Reality to their users, especially the opportunity to reach more people through their large Enterprise audience,” says Andre Vrignaud, Head of Mixed Reality Platform Strategy at Mozilla. “We look forward to integrating Hubs by Mozilla to bring fully immersive collaboration to business.”

As part of Firefox Reality, we are also bringing Hubs by Mozilla to all Pico devices. In Hubs, users can easily collaborate online around virtual objects, spaces, and tasks - all without leaving the headset.

The virtual spaces created in Hubs can be used similarly to a private video conference room to meet up with your coworkers and share documents and photos, but with added support for all of your key 3D assets. You can fully brand the environment and avatars for your business, and with web-based access the meetings are just a link away, supported on any modern web browser.

Firefox Reality will be available on Pico VR headsets later in Q1, 2020. Stay tuned to our mixed reality blog and twitter account for more details.

Ryan HarterSyncthing

I did a lot of reading and exploring over my holiday break. One of the things I'm most excited about is finding Syncthing. If you haven't seen it yet, take a look. It's like an open-source, decentralized Dropbox.

It works everywhere, which for me means Linux and Android. Google Drive …

Ryan HarterSyncthing and Open Source Data Collection

I don't see many open source packages collecting telemetry, so when Syncthing asked me to opt-in to telemetry I was intrigued.

I see a lot of similarities between how Syncthing and Firefox collects data. Both collect daily pings and make it easy to view the data you're submitting (in Firefox …