So let’s start with X-Ray Goggles: the X-Ray Goggles are a tool made by Mozilla that lets you “remix” web pages after loading them in your browser. You can go to your favourite place on the web, fire up the goggles (similar to how a professional web developer would open up their dev tools), and then change text, styling, images, and whatever else you might want to change, for as long as you want to change things. Then, when you’re happy with the result and you want to show your remix to your friends, you can publish that remix so that it has its own URL that you can share.
However, the X-Ray Goggles use a publishing service that hosts all its content over https, because we care about secure communication at Mozilla, and using https is best practice. But in this particular case, it’s also kind of bad: large parts of the web still use http, and even if a website has an https equivalent, people usually visit the http version anyway. Unless those websites force users to the https version of the site (using a redirect message), the site they’ll be on, and the site they’ll be remixing, will use HTTP. The moment the user publishes their remix with X-Ray Goggles, gets an https URL back, and opens that URL in their browser… well, let’s just say “everything looks broken” is not wrong.
But this is not because the Goggles, or even the browser, are doing something wrong – ironically, it’s because they’re doing something right, and in so doing, what the user wants to do turns out to be incompatible with what the technology wants them to do. So let’s look at what’s going on here.
If you’re a user of the web, no doubt you’ll have heard about http and https, even if you can’t really say what they technically-precisely mean. In simple terms (but without dumbing it down), HTTP is the language that servers and browsers use to negotiate data transfers. The original intention was for those two to talk about HTML code, so that’s where the http comes from (it stands for “hypertext” in both HTML and HTTP).
However, HTTP is a bit like regular English: you can listen in on it. If you go to a bar and sit yourself with a group of people, you can listen to their conversations. The same goes for HTTP: in order for your browser and the server to talk, they rely on a chain of other computers connected to the internet to get messages relayed from one to the other, and any of those computers can listen in on what the browser and server are saying to each other. In an HTTP setting it gets even stranger, because any of those computers could look at what the browser or server is saying, replace what is being said with something else, and then forward that on. And you’ll have no way of knowing whether that’s what happened. It’s literally as if the postal service took a letter you sent, opened it, rewrote it, resealed it, and then sent that on. We trust that they won’t, and computers connected to the internet trust that other computers don’t mess with the communication, but… they can. And sometimes they do.
And that’s pretty scary, actually. You don’t want to have to “trust” that your communication isn’t read or tampered with, you want to know that’s the case.
Well, we can use HTTPS, or “secure HTTP”, instead. Now, I need to be very clear here: the term “secure” in “secure HTTP” refers to secure communication. Rather than talking “in English”, the browser and server agree on a secret language: you could listen in, but you won’t know what’s being said, and so you can’t intercept-and-modify the communication willy-nilly without both parties knowing that their communications are being tampered with. However, it does not mean that the data the browser and server agree to receive or send is “safe data”. It only means that both parties can be sure that what one of them receives is what the other intended to send. All we can be sure of is that no one will have been able to see what got sent, and that no one modified it somewhere along the way without us knowing.
However, those are big certainties, so for this reason the internet’s been moving more and more towards preferring HTTPS for everything. But not everyone’s using HTTPS yet, and so we run into something called the “Mixed Content” issue.
http://......, and everything worked fine.
But then I hear about the problems with HTTP and the privacy and security implications sound horrible! So, to make sure my visitors don’t have to worry about whether the page they get from my server is my page, or a modified version of my page, I spring into action, I switch my page over to HTTPS; I get a security certificate, I set everything on my own server up so that it can “talk” in HTTPS, and done!
This is a classic case of mixed-content blocking. My web page is being served on HTTPS, so it’s indicating that it wants to make sure everything is secure, but the resources I rely on still use HTTP, and now the browser has a problem. It can’t trust those resources, because it can’t trust that they won’t have been inspected or even modified when it requests them. And because the web page that’s asking for them to be loaded expressed that it cares about secure communication a great deal, the browser can’t just fetch those insecure elements: things might go wrong, and there’s no way to tell!
So, better safe than sorry, it does the only thing it knows is safe: it flat out refuses to even request them, giving you a warning about “mixed content”.
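To make that rule concrete, here is a minimal sketch of the decision in Python. It is a simplification, not how any particular browser implements it: real browsers apply more nuanced rules depending on the kind of resource being requested. The function names and the example URLs are hypothetical, made up for illustration.

```python
from typing import List
from urllib.parse import urlsplit


def is_mixed_content(page_url: str, resource_url: str) -> bool:
    """Return True when an HTTPS page asks for a plain-HTTP resource."""
    page_scheme = urlsplit(page_url).scheme.lower()
    resource_scheme = urlsplit(resource_url).scheme.lower()
    return page_scheme == "https" and resource_scheme == "http"


def blocked_resources(page_url: str, resource_urls: List[str]) -> List[str]:
    """List the resources a strict browser would refuse to request."""
    return [url for url in resource_urls if is_mixed_content(page_url, url)]


if __name__ == "__main__":
    # Hypothetical page and resources, for illustration only.
    resources = [
        "https://ourpage.org/style.css",
        "http://cdn.example.com/library.js",
    ]
    print(blocked_resources("https://ourpage.org/", resources))
    print(blocked_resources("http://ourpage.org/", resources))
```

For the HTTPS version of the page, only the `http://` script ends up in the blocked list. For the HTTP version of the same page nothing is blocked, which is exactly why everything “worked fine” before the switch.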
Normally, that’s great. It lets people who run websites know that they’re relying on potentially insecure third party content in an undeniably clear way, but it gets a bit tricky in two situations:
- third party resources that themselves require other third party resources, and
- embedding and rehosting
comments.WeDoCommentsForYou.com. If we have a page that uses HTTPS, running on
https://ourpage.org then we can certainly make sure that we load the comment system from
http://, then too bad, the browser will block that. Sure, it’s a thing that “WeDoCommentsForYou” should fix, but until they do your users can’t comment, and that’s super annoying.
The second issue is kind of like the first, but is about entire web pages. Say you want to embed a page; for instance, you’re transcluding an entire wiki page into another wiki page. If the page you’re embedding is http and the page it’s embedded on is https, too bad, that’s not going to work. Or, and that brings us to what I really want to talk about, if you remix a page that uses http resources, and host that remix on a site that uses https, then that’s not going to work either…
And that’s the problem we were hitting with X-Ray Goggles, too.
While the browser is doing the same kind of user protection that it does for any other website, in this particular case it’s actually a big problem: if a user remixed an HTTP website, then knowing what we know now, obviously that’s not going to work if we try to view it using HTTPS. But that also means that instead of a cool tool that people can use to start learning about how web pages work “on the inside”, the result of which they can share with their friends, they have a tool that lets them look at the insides of a web page and then when they try to share their learning, everything breaks.
That’s not cool.
And so the solution to this problem is based on first meeting the expectations of people, and then educating them on what those expectations actually mean.
There are quite a few solutions to the mixed-content problem, and some are better than others. Some are downright not nice to other people on the web, like making a full copy of someone’s website and then hosting that on Mozilla’s servers. That’s not okay. Others may open people up to exploits, like running a proxy server, which runs on HTTPS and can fetch HTTP resources, then send them on as if they were on HTTPS, effectively lying about the security of the communication. So the solution we settled on is, really, the simplest one:
If you remix an http://... website, we will give you a URL that starts with http://, and if you remix an https:// website, we will give you a URL that starts with https://.... However, we also want you to understand what’s going on with the whole “https” thing, so when you visit a remix that starts with http:// the remix notice bar at the top of the page also contains a link to the https:// version – same page, just served using HTTPS instead of HTTP – so that you can see exactly how bad things get if you can’t control which protocol gets used for resources on a page.
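As a rough illustration of that rule, the sketch below picks the scheme of the published URL from the scheme of the remixed page, and works out the HTTPS link that the notice bar would show for an HTTP remix. This is not the actual X-Ray Goggles publishing code: the function names, the host `remix.example.org`, and the path `/abc123` are all assumptions made up for the example.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def published_url(source_url: str, remix_host: str, remix_path: str) -> str:
    """Build a remix URL that uses the same scheme as the remixed page."""
    scheme = urlsplit(source_url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError("can only remix http or https pages: %r" % source_url)
    return urlunsplit((scheme, remix_host, remix_path, "", ""))


def https_counterpart(remix_url: str) -> Optional[str]:
    """For an http remix, return the https link for the notice bar.

    Returns None when the remix is already served over https.
    """
    parts = urlsplit(remix_url)
    if parts.scheme != "http":
        return None
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical publishing host and remix path, for illustration only.
    remix = published_url("http://example.com/news", "remix.example.org", "/abc123")
    print(remix)
    print(https_counterpart(remix))
```

The point of keeping both functions separate is that the share link always matches what the user actually remixed, while the HTTPS link is an extra, opt-in way to see the mixed-content breakage for yourself.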
Security is everybody’s responsibility, and explaining the risks on the web that are inherent to the technology we use every day is always worth doing. But that doesn’t mean we need to lock everything down so “you can’t use it, the end, go home, stop using HTTP”. That’s not how the real world works.
So we want you to be able to remix your favourite sites, even if they’re HTTP, and have a learning/teaching opportunity there around security. Yes, things will look bad when you try to load an HTTP site on HTTPS, but there’s a reason for that, and it’s important to talk about it.
And it’s equally important to talk about it without making you lose an hour or more of working on your awesome remix.