The idea of bundling is deceptively simple. Take a bunch of stuff and glom them into a single package. So why is it so difficult to teach the web how to bundle?
The Web already does bundling
A bundled resource is a resource that composes multiple pieces of content. Bundles can consist of content of a single type only, or of mixed types.
Take something like JavaScript[1]. A very large proportion of the JavaScript content on the web is bundled today. If you haven’t bundled, minified, and compressed your JavaScript, you have left easy performance wins unrealized.
HTML is a bundling format in its own right, with inline JavaScript and CSS.
Bundling other content is also possible with data: URIs, even if this has some drawbacks.
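As a trivial illustration of the data: approach, a small image can ride along inside the HTML instead of being fetched separately (the payload here is just a 1×1 transparent GIF):

<img alt="spacer" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">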
Then there are CSS preprocessors, which provide bundling options, image spriting, and myriad other hacks.
And that leaves aside the whole mess of zipfiles, tarballs, and self-extracting executables that are used for a variety of Web-adjacent purposes. Those matter too, but they are generally not Web-visible.
Why we might want bundles
What is immediately clear from this brief review of available Web bundling options is that they are all terrible in varying degrees. The reasons are varied, and a close examination of them is probably not worthwhile.
It might be best just to view this as the legacy of a system that evolved in piecemeal fashion; an evolutionary artifact along a dimension that nature did not regard as critical to success.
I’m more interested in what reasons we might have for improving the situation. There are reasons in support of bundling, but I doubt that introducing native support for bundling technology will fundamentally change the way Web content is delivered.
Expanding the set of options for content delivery could still have value for some use cases or deployment environments.
In researching this, I was reminded of work that Jonas Sicking did to identify use cases. There are lots of reasons and requirements that are worth looking at. Some of the reasoning is dated, but there is a lot of relevant material, even five years on.
Efficiency
One set of touted advantages for bundling relates to performance and efficiency. Today, we have a better understanding of the ways in which performance is affected by resource composition, so this has been narrowed down to two primary features: compression efficiency and reduced overheads.
Compression efficiency can be dramatically improved if similar resources are bundled together. This is because the larger shared context results in more repetition and gives a compressor more opportunities to find and exploit similarities.
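As a minimal sketch of that effect, assuming Node.js and two hypothetical scripts a.js and b.js that share a lot of code, compressing the concatenation usually produces fewer bytes than compressing each file separately:

const { gzipSync } = require('zlib');
const { readFileSync } = require('fs');

// Two hypothetical scripts that share a lot of common code.
const a = readFileSync('a.js');
const b = readFileSync('b.js');

// Compress separately, then compress the concatenation (the "bundle").
const separate = gzipSync(a).length + gzipSync(b).length;
const bundled = gzipSync(Buffer.concat([a, b])).length;

console.log({ separate, bundled }); // bundled is typically the smaller number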
Bundling is not the only way to achieve this. Alternative methods of attaining compression gains have been explored, such as SDCH and cross-stream compression contexts for HTTP/2. Prototypes of the latter showed immense improvements in compression efficiency and corresponding performance gains. However, general solutions like these have not been successful in finding ways to manage operational security concerns.
Bundling could also reduce overheads. While HTTP/2 and HTTP/3 reduce the cost of making requests, those costs still compound when multiple resources are involved. The claim here is that internal handling of individual requests in browsers has inefficiencies that are hard to eliminate without some form of bundling.
I find it curious that protocol-level inefficiencies are not blamed here, but rather inter-process communication between internal browser processes. Not having examined this closely, I can’t really speak to these claims, but they are quite credible.
What I do know is that performance in this space is subtle. When we were building HTTP/2, we found that performance was highly sensitive to the number of requests that could be made by clients in the first few round trips of a connection. The way that networking protocols work means that there is very limited space for sending anything early in a connection[2]. The main motivation for HTTP header compression was that it allowed significantly more requests to be made early in a connection. By reducing request counts, bundling might do the same.
One of the other potential benefits of bundling is in eliminating additional round trips. For content that is requested, a bundle might provide resources that a client does not know that it needs yet. Without bundling, a resource that references another resource adds an additional round trip as the first resource needs to be fetched before the second one is even known to the client.
Again, experience with HTTP/2 suggests that performance gains from sending extra resources are not easy to obtain. This is exactly what HTTP/2 server push promised to provide, but as we have learned, the wins are not easy to realize. Attempts to improve performance with server push have produced mixed results and sometimes large regressions. The problem is that servers are unable to accurately predict when to push content, so they push data that is not needed. To date, no studies have shown reliable strategies that servers can use to improve performance with server push.
The uncertainty regarding server push performance means that compression gains and reductions in overhead are the primary focus of current performance-seeking uses of bundles. These together might be enough to counteract the waste of delivering unwanted resources.
I personally remain lukewarm on using bundling as a performance tool. Shortcomings in protocols — or implementations — seem like they could be addressed at that level.
Ergonomics
The use of bundlers is an established practice in Web development. Being able to outsource some of the responsibility for managing the complexities of content delivery is no doubt part of the appeal.
The value of being able to compose complex content into a single package should not be underestimated.
Bundling of content into a single file is a property common to many systems. Providing a single item to manage with a single identity simplifies interactions. This is how we expect content of all kinds to be delivered, whether it is applications, books, libraries, or any other sort of digital artifact. The Web here is something of an aberration in that it resists the idea that parts of it can be roped off into a discrete unit with a finite size and name.
Though this usage pattern might be partly attributed to path dependence, the usability benefits of individual files cannot be so readily dismissed. Being able to manage bundles as a single unit where necessary, while still being able to identify the component pieces, is likely to be a fairly large gain for developers.
For me, this reason might be enough to justify using bundles, even despite some of their drawbacks.
Why we might not want bundles
The act of bundling subsumes the identity of each piece of bundled content with the identity of the bundle that is formed. This produces a number of effects, some of them desirable (as discussed), some of them less so.
As far as effects go, whether they are valuable or harmful might depend on context and perspective. Some of these effects might simply be managed as trade-offs, with site or server developers being able to choose how content is composed in order to balance various factors like total bytes transferred or latency.
If bundling only represented trade-offs that affected the operation of servers, then we might be able to resolve whether the feature is worth pursuing on the grounds of simple cost-benefit. Where things get more interesting is where choices might involve depriving others of their own choices. Balancing the needs of clients and servers is occasionally necessary. Determining the effect of server choices on clients — and the people they might act for — is therefore an important part of any analysis we might perform.
Cache efficiency and bundle composition
Content construction and serving infrastructure generally operates with imperfect knowledge of the state of caches. Not knowing what a client might need can make it hard to know what content to serve at any given point in time.
Optimizing the composition of the bundles used on a site for clients with a variety of cache states can be particularly challenging if caches operate at the granularity of resources. Clients that have no prior state might benefit from maximal bundling, which allows better realization of the aforementioned efficiency gains.
On the other hand, clients that have previously received an older version of the same content might only need to receive updates for those things that have changed. Similarly, clients that have previously received content for other pages might already hold some of the same content. In both cases, receiving copies of content that was already transferred might negate any efficiency gains.
This is a problem that JavaScript bundlers have to deal with today. As an optimization problem it is made difficult by the combination of poor information about client state with the complexity of code dependency graphs and the potential for clients to follow different paths through sites.
For example, consider the code that is used on an article page on a hypothetical news site and the code used on the home page of the same site. Some of that code will be common, if we make the assumption that site developers use common tools. Bundlers might deal with this by making three bundles: one of common code, plus one each of article and home page code. For a very simple site like this, that allows all the code to be delivered in just two bundles on either type of page, plus an extra bundle when navigating from an article to the home page or vice versa.
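A rough sketch of how a bundler might be asked to produce that split, using webpack's splitChunks option as one illustration; the entry names and paths are hypothetical, and real configurations involve many more knobs:

// webpack.config.js (hypothetical)
module.exports = {
  entry: {
    article: './src/article.js',
    home: './src/home.js',
  },
  optimization: {
    splitChunks: {
      chunks: 'all',
      cacheGroups: {
        common: {
          name: 'common', // code shared by both pages lands in a common chunk
          minChunks: 2,   // ...as long as at least two entry points use it
        },
      },
    },
  },
};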
As the number of different types of page increases, splitting code into multiple bundles breaks down. The number of bundle permutations can increase much faster than the number of discrete uses. In the extreme, the number of bundles could end up being exponential in the number of types of page, limited only by the number of resources that might be bundled. Of course, well before that point is reached, the complexity cost of bundling likely exceeds any benefits it might provide.
To deal with this, bundlers have a bunch of heuristics that balance the costs of providing too much data in a bundle for a particular purpose, against the costs of potentially providing bundled data that is already present. Some sites take this a little further and use service workers to enhance browser caching logic[3].
It is at this point that you might recognize an opportunity. If clients understood the structure of bundles, then maybe they could do something to avoid fetching redundant data. Maybe providing a way to selectively request pieces of bundles could reduce the cost of fetching bundles when parts of the bundle are already present. That would allow the bundlers to skew their heuristics more toward putting stuff in bundles. It might even be possible to tune first-time queries this way.
The thing is, we’ve already tried that.
A standard for inefficient caching
There is a long history in HTTP of failed innovation when it comes to standardizing improvements for cache efficiency. Though cache invalidation is recognized as one of the hard problems in computer science, there are quite a few examples of successful deployments of proprietary solutions in server and CDN infrastructure.
A few caching innovations have made it into HTTP over time, such as the recent immutable Cache-Control directive. That particular solution is quite relevant in this context due to the way that it supports content-based URI construction, but it is still narrower in applicability than a good solution in this space might need.
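For illustration, the pattern looks something like this, with a hypothetical content-hashed filename: the HTML references the hashed URL, and the response for that URL can be marked as never needing revalidation because any change to the content produces a new URL.

<link rel="stylesheet" href="/styles.a74fs3.css">
Cache-Control: public, max-age=31536000, immutable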
More general solutions that aim to improve the ability to eliminate wasted requests in a wider range of cases are more difficult. Cache digests are notable here in that the idea got a lot further, going through several revisions in the IETF working group process. It still failed.
If the goal of failing is to learn, then this too was a failure, largely for the most ignominious of reasons: no interest. Claims from clients that cache digests are too expensive to implement are credible here, but not entirely satisfactory in light of the change to use Cuckoo filters in later versions and recent storage partitioning work.
The point of this little digression is to highlight the inherent difficulties in trying to standardize enhancements to caching on the Web. My view is that it would be unwise to attempt to tackle a difficult problem as part of trying to introduce a new feature. If the success of bundling depends on finding a solution to this problem, then I would be surprised, but it might suggest that the marginal benefit of bundling — for performance — is not sufficient to justify the effort[4].
Erasing resource identity
An issue that was first[5] raised by Brave is that the use of bundles creates opportunities for sites to obfuscate the identity of resources. The thesis is that bundling could confound content blocking techniques, as it would make rewriting of identifiers easier.
For those who rely on the identity of resources to understand the semantics and intent of the identified resource, there are some ways in which bundling might affect their decision-making. The primary concern is that references between resources in the same bundle are fundamentally more malleable than other references. As the reference and reference target are in the same place, it is trivial - at least in theory - to change the identifier.
Brave and several others are therefore concerned that bundling will make it easier to prevent URL-based classification of resources. In the extreme, identifiers could be rewritten for every request, negating any attempt to use those identifiers for classification.
One of the most interesting properties of the Web is the way that it insinuates a browser - and user agency - into the process. The way that happens is that the Web[6] is structurally biased toward functioning better when sites expose semantic information to browsers. This property, that we like to call semantic availability, is what allows browsers to be opinionated about content rather than acting as a dumb pipe[7].
Yes, it’s about ad blockers
Just so that this is clear, this is mostly about blocking advertising.
While more advanced ad blocking techniques also draw on contextual clues about resources, those methods are more costly. Most ad blocking decisions are made based on the URL of resources. Using the resource identity allows the ad blocker to prevent the load, which not only means that the ad is not displayed, but the resources needed to retrieve it are not spent[8].
While many people might choose to block ads, sites don’t like being denied the revenue that advertising provides. Some sites already use techniques that are designed to show advertising to users of ad blockers, so it is not unreasonable to expect tools to be used to prevent classification.
It is important to note that this is not a situation that requires absolute certainty. The sorry state of Web privacy means that we have a lot of places where various forces are in tension or transition. The point of Brave's complaint here is not that bundling outright prevents the sort of classification they seek, but that it changes the balance of system dynamics by giving sites another tool that they might employ to avoid classification.
Of course, when it is a question of degree, we need to discuss and agree how much the introduction of such a tool affects the existing system. That’s where this gets hard.
Coordination artifacts
As much as these concerns are serious, I tend to think that Jeffrey Yasskin’s analysis of the problem is correct. That analysis essentially concludes that the reason we have URIs is to facilitate coordination between different entities. As long as there is a need to coordinate between the different entities that provide the resources that might be composed into a web page, that coordination will expose information that can be used for classification.
That is, to the extent to which bundles enable obfuscation of identifiers, that obfuscation relies on coordination. Any coordination that would enable obfuscation with bundling is equally effective and easy to apply without bundling.
Single-page coordination
Take a single Web page. Pretend for a moment that the web page exists in a vacuum, with no relationship to other pages at all. You could take all the resources that comprise that page and form them into a single bundle. As all resources are in the one place, it would be trivial to rewrite the references between those resources. Or, the identity of resources could be erased entirely by inlining everything. If every request for that page produced a bundle with a different set of resource identifiers, it would be impossible to infer anything about the contents of resources based on their identity alone.
Unitary bundles for every page is an extreme that is almost certainly impractical. If sites were delivered this way, there would be no caching, which means no reuse of common components. Using the Web would be virtually intolerable.
Providing strong incentive to deploy pages as discrete bundles — something Google Search has done to enable preloading search results for cooperating sites — could effectively force sites to bundle in this way. Erasing or obfuscating internal links in these bundles does seem natural at this point, if only to try to reclaim some of the lost performance, but that assumes an unnatural pressure toward bundling[9].
Absent perverse incentives, sites are often built from components developed by multiple groups, even if that is just different teams working at the same company. To the extent that teams operate independently, they need to agree on how they interface. The closer the teams work together, and the more tightly they are able to coordinate, the more flexible those interfaces can be.
There are several natural interface points on the Web. Of these the URL remains a key interface point[10]. A simple string that provides a handle for a whole bundle[11] of collected concepts is a powerful abstraction.
Cross-site coordination
Interfaces between components therefore often use URLs, especially once cross-origin content is involved. For widely-used components that enable communication between sites, URLs are almost always involved. If you want to use React, the primary interface is a URL:
<script src="https://unpkg.com/react@17/umd/react.production.min.js" crossorigin></script>
<script src="https://unpkg.com/react-dom@17/umd/react-dom.production.min.js" crossorigin></script>
If you want to add Google Analytics, there is a bit of JavaScript[12] as well, but the URL is still key:
<script async src="https://www.googletagmanager.com/gtag/js?id=$XXX"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', '$XXX');
</script>
The same applies to advertising.
The scale of coordination required to change these URLs is such that changes cannot be effected on a per-request basis; they need months, if not years[13].
Even for resources on the same site, a version of the same coordination problem exists. Content that might be used by multiple pages will be requested at different times. At a minimum, changing the identity of resources would mean forgoing any reuse of cached resources. Caching provides such a large performance advantage that I can’t imagine sites giving that up.
Even if caching were not incentive enough, I suggest that the benefits of stability of references are enough to ensure that identifiers don’t change arbitrarily.
Loose coupling
As long as loose coupling is a feature of Web development, the way that resources are identified will remain a key part of how the interfaces between components are managed. Those identifiers will therefore tend to be stable. That stability will allow the semantics of those resources to be learned.
Bundles do not change these dynamics in any meaningful way, except to the extent that they might enable better atomicity. That is, it becomes easier to coordinate changes to references and content if the content is distributed in a single indivisible unit. That’s not nothing, but — as the case of selective fetches and cache optimization highlights — content from bundles needs to be reused in a different context, so the application of indivisible units is severely limited.
Of course, there are ways of enabling coordination that might allow for constructing identifiers that are less semantically meaningful. To draw on the earlier point about the Web already having bundling options, advertising code could be inlined with other JavaScript or in HTML, rather than having it load directly from the advertiser. In the extreme, servers could rewrite all content to use URLs encrypted with a per-user key. None of this depends on the deployment of new Web bundling technology, but it does require close coordination.
All or nothing bundles
Even if it were possible to identify unwanted content, opponents of bundling point out that placing that content in the same bundle as critical resources makes it difficult to avoid loading the unwanted content. Some of the performance gains from content blockers are the result of not fetching content[14]. Bundling unwanted content might eliminate the cost and performance benefits of content blocking.
This is another important criticism that ties in with the earlier concerns regarding bundle composition and reuse. And, similar to previous problems, the concern is not that this sort of bundling is newly enabled by native, generic bundling capabilities, but that it becomes more readily accessible as a result.
This problem, more so than the caching one, might motivate designs for selective acquisition of bundled content.
Existing techniques for selective content fetching, like HTTP range requests, don’t reliably work here, as compression can render byte ranges useless. That leads to inventing new systems for selective acquisition of bundles. Selective removal of content from compressed bundles does seem to be possible at some levels, but this leads to a complex system, and the effects on other protocol participants are non-trivial.
At some level, clients might want to say “just send me all the code, without the advertising”, but that might not work so well. Asking for bundle manifests so that content might be selectively fetched adds an additional round trip. Moving bundle manifests out of the bundles and into content[15] gives clients the information they need to be selective about which resources they want, but it requires moving information about the composition of resources into the content that references it. That too requires coordination.
For caches, this can add an extra burden. Using the Vary HTTP header field would be necessary to ensure that caches would not break when content from bundles is fetched selectively[16]. But unless a cache fully understands these requests and how they are applied, it is exposed to a combinatorial explosion of different bundles. Without updating caches to understand selectors, the effect is that caches end up bearing the load for the myriad permutations of bundles that might be needed.
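To make that concrete, a hypothetical exchange might look like the following. The Bundle-Select field and its semantics are invented purely for illustration; the point is that the Vary entry is what tells a shared cache that the selector changes the response, and that the cache then faces one stored variant per selector combination.

GET /bundle.wbn HTTP/1.1
Host: example.com
Bundle-Select: /article.js, /common.js

HTTP/1.1 200 OK
Vary: Bundle-Select
Content-Type: application/webbundle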
Supplanting resource identity
A final concern is the ability — at least in active proposals — for bundled content to be identified with URLs from the same origin as the bundle itself. For example, a bundle at https://example.com/foo/bundle might contain content that is identified as https://example.com/foo/script.js. This is a long-standing concern that applies to many previous attempts at bundling or packaging.
This ability is constrained, but the intent is to have content in a bundle act as a valid substitute for other resources. This has implications for anyone deploying a server, who now needs to ensure that bundles aren’t hosted adjacent to content that might not want interference from the bundle.
At this point, I will note that this is also the purpose of signed exchanges. The constraints on what can be replaced and how are important details, but the goal is the same: signed exchanges allow a bundle to speak for other resources, except that in that case the resources are served by a completely different origin.
You might point out that this sort of thing is already possible with service workers. Service workers take what it means to subvert the identity of resources to the next level. A request that is handled by a service worker can be turned into any other request or requests (with any cardinality). Service workers are limited though. A site can opt to perform whatever substitutions it likes, but it can only do that for its own requests. Bundles propose something that might be enabled for any server, inadvertently or otherwise.
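For instance, a few lines of service worker are enough to turn a request for one resource into a fetch for something else entirely; the paths here are hypothetical:

self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  // Answer a request for one script with the contents of a different one.
  if (url.pathname === '/analytics.js') {
    event.respondWith(fetch('/stub.js'));
  }
});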
One proposal says that all supplanted resources must be identical to the resources they supplant. The theory there is that clients could fetch the resource from within a bundle or directly and expect the same result. It goes on to suggest that a mismatch between these fetches might be cause for a client to stop using the bundle. However, it is perfectly normal in HTTP for the same resource to return different content when fetched multiple times, even when the fetch is made by the same client or at the same time. So it is hard to imagine how a client would treat inconsistency as anything other than normal. If bundling provides advantages, abandoning bundles for that reason would make those advantages totally unreliable.
One good reason for enabling equivalence of bundled and unbundled resources is to provide a graceful fallback in the case that bundling is not supported by a client. One bad reason is to ensure that the internal identifiers in bundles are “real” and that the fallback does not change behaviour; see the previous points about the folly of attempting to enforce genuine equivalence.
Indirection for identifiers
Addressing the problem of one resource speaking unilaterally for another resource requires a little creativity. Here the solution is hinted at by both service workers and JavaScript import maps. Import maps are especially instructive here, as they make it clear that the mapping from an import specifier to a URL is not the URL resolution function in RFC 3986; import specifiers are explicitly not URLs, relative or otherwise.
This leaves open the possibility of a layer of indirection, either the limited form provided in import maps that takes one string and produces another, or the Turing-complete version that service workers enable.
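For reference, a small import map looks like this; the bare specifier on the left is not a URL at all, and the mapping (the hashed filename is hypothetical) is what turns it into one:

<script type="importmap">
{
  "imports": {
    "app": "/js/app.1a2b3c.js"
  }
}
</script>
<script type="module">
  import "app"; // resolved through the map, not by RFC 3986 URL resolution
</script>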
In other words, we allow those places that express the identity of resources to tell the browser how to interpret values. This is something that HTML has had forever, with the <base> element. This is also the fundamental concept behind the fetch maps proposal, which looks like this[17]:
<script type="fetchmap">
{
  "urls": {
    "/styles.css": "/styles.a74fs3.css",
    "/bg.png": "/bg.8e3ac4.png"
  }
}
</script>
<link rel="stylesheet" href="/styles.css">
Then, when the browser is asked to fetch /styles.css, it knows to fetch /styles.a74fs3.css instead.
The beauty of this approach is that the change only exists where the reference is made. The canonical identity of the resource is the same for everyone (it’s https://example.com/styles.a74fs3.css); only the way that reference is expressed changes.
In other words, the common property between these designs — service workers, <base>, import maps, or fetch maps — is that the indirection only occurs at the explicit request of the thing that makes the reference. A site deliberately chooses to use this facility, and if it does, it controls the substitution of resource identities. There is no lateral replacement of content, as all of the logic occurs at the point the reference is made.
Making resource maps work
Of course, fitting this indirection into an existing system requires a few awkward adaptations. But it seems like this particular design could be quite workable.
Anne van Kesteren pointed out that many of the places where identifiers appear are concretely URLs. APIs assume that they can be manipulated as URIs and violating that expectation would break things that rely on that. If we are going to enable this sort of indirection, then we need to ensure that URIs stay URIs. That doesn’t mean that URIs need to be HTTP, just that they are still URIs. Thus, you might choose to construct identifiers with a new URI scheme in order to satisfy this requirement[18]:
<a href="scheme-for-mappings:hats">buy hats here</a>
Of course, in the fetch map example given, those identifiers look like and can act like URLs. In the absence of a map, they translate directly to relative URLs. That’s probably a useful feature to retain as it means that you can find local files when the reference is found in a local file during development. Using a new scheme won’t have that advantage. A new scheme might be an option, but it doesn’t seem to be a necessary feature of the design.
I can also credit Anne with the idea that we model this indirection as a redirect, something like an HTTP 303 (See Other). The Web is already able to manage redirection for all sorts of resources, so that would not naturally disrupt things too much.
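On the wire, that kind of redirect would look something like the following; with a resource map the indirection would happen inside the browser rather than costing a round trip, but the semantics are similar:

GET /styles.css HTTP/1.1
Host: example.com

HTTP/1.1 303 See Other
Location: /styles.a74fs3.css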
That is not to say that this is easy, as these redirects will need to conform to established standards for the Web, with respect to the origin model and integration with things like Content Security Policy. It will need to be decided how resource maps affect cross-origin content. And many other details will need to be thought about carefully. But again, the design seems at least plausible.
Of note here is that resource maps can be polyfilled with service workers. That suggests we might just have sites build this logic into service workers. That could work, and it might be the basis for initial experiments. A static format is likely superior as it makes the information more readily available.
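A sketch of what such a polyfill might look like, reusing the mapping from the earlier fetch map example; a real implementation would also need to handle scope, failures, and cross-origin rules:

// service-worker.js: a minimal, hypothetical resource map polyfill
const resourceMap = {
  '/styles.css': '/styles.a74fs3.css',
  '/bg.png': '/bg.8e3ac4.png',
};

self.addEventListener('fetch', (event) => {
  const { pathname } = new URL(event.request.url);
  const target = resourceMap[pathname];
  if (target) {
    event.respondWith(fetch(target));
  }
});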
Alternatives and bundle URIs
Providing indirection is just one piece of enabling use of bundled content. Seamless integration needs two additional pieces.
The first is an agreed method of identifying the contents of bundles. The IETF WPACK working group has had several discussions about this. These discussions were inconclusive, in part because it was difficult to manage conflicting requirements. However, a design grounded in a map-like construct might loosen some of the constraints that disqualified past options.
In particular, the idea that a bundle might itself have an implicit resource map was not considered. That could enable the use of simple identifiers for references between resources in the same bundle without forcing links in bundled content to be rewritten. And any ugly URI scheme syntax for bundles might then be abstracted away elegantly.
The second major piece to getting this working is a map that provides multiple alternatives. In previous proposals, mappings were strictly one-to-one. A one-to-many map could offer browsers a choice of resources that the referencing entity considers to be equivalent[19]. The browser is then able to select the option that it prefers. If an alternative references a bundle the browser already has, that would be good cause to use that option.
Presenting multiple options also allows browsers to experiment with different policies with respect to fetching content when bundles are offered. If bundled content tends to perform better on initial visits, then browsers might request bundles then. If bundled content tends to perform poorly when there is some valid, cached content available already, then the browser might request individual resources in that case.
A resource map might be used to enable deployment of new bundling formats, or even new retrieval methods[20].
Selective acquisition
One advantage of providing an identifier map like this is that it provides a browser with some insight into what bundles contain before fetching them[21]. Thus, a browser might be able to make a decision about whether a bundle is worth fetching. If most of the content is stuff that the browser does not want, then it might choose to fetch individual resources instead.
Having a reference map might thereby reduce the pressure to design mechanisms for partial bundle fetching and caching. Adding some additional metadata, like hints about resource size, might further allow for better tuning of this logic.
Reference maps could even provide content classification tools with more information about resources. Even in a simple one-to-one mapping, like with an import map, there are two identifiers that might be used to classify content. Even if one of these is nonsense, the other could be usable.
While this requires more sophistication on the part of classifiers, it also provides opportunities for better classification. With alternative sources, even if the identifier for one source does not reveal any useful information, an alternative might.
Now that I’m fully into speculating about possibilities, this opens some interesting options. The care that was taken to ensure that pages don’t break when Google Analytics is blocked could be managed differently. Remember that script:
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', '$XXX');
As you can see, the primary interface is always defined and the window.dataLayer object is replaced with a dumb array if the script didn’t load. With multiple alternatives, the fallback logic here could be encoded in the map as a data: URI instead:
<element-for-mappings type="text/media-type-for-mappings+json">
{ "scheme-for-mappings:ga": [
"https://www.googletagmanager.com/gtag/js?id=$XXX",
"data:text/javascript;charset=utf-8;base64,d2luZG93LmRhdGFMYXllcj1bXTtmdW5jdGlvbiBndGFnKCl7ZGF0YUxheWVyLnB1c2goYXJndW1lbnRzKTt9Z3RhZygnanMnLG5ldyBEYXRlKCkpO2d0YWcoJ2NvbmZpZycsJyRYWFgnKTs="
]}</element-for-mappings>
<script async src="scheme-for-mappings:ga"></script>
In this case, a content blocker that decides to block the HTTPS fetch could allow the data: URI and thereby preserve compatibility. Nothing really changed, except that the fallback script is async too. Of course, this is an unlikely outcome as this is not even remotely backward-compatible, but it does give some hints about some of the possibilities.
Next steps
So that was many more words than I expected to write. The size and complexity of this problem continues to be impressive. No doubt this conversation will continue for some time before we reach some sort of conclusion.
For me, the realization that it is possible to provide finer control over how outgoing references are managed was a big deal. We don’t have to accept a design that allows one resource to speak for others; we just have to allow for control over how references are made. That’s a fairly substantial improvement over most existing proposals and the basis upon which something good might be built.
I still have serious reservations about the caching and performance trade-offs involved with bundling. Attempting to solve this problem with selective fetching of bundle contents seems like far too much complexity. Not only does it require addressing the known-hard problem of cache invalidation, it also requires that we find solutions to problems that have defied solutions on numerous occasions in the past.
That said, I’ve concluded that giving servers the choice in how content is assembled does not result in bad outcomes for others, so we are no longer talking about negative externalities.
If we accept that selective fetching is a difficult problem, supporting bundles only gives servers more choices. What we learn from that might give us the information that allows us to find solutions later. Resource maps mean that we can always fall back to fetching resources individually, which has been pretty effective so far. But resource maps might also be the framework on which we build new experiments with alternative resource fetching models.
All that said, the usability advantages provided by bundles seem to be sufficient justification for enabling their support. That applies even if there is uncertainty about performance. That applies even if we don’t initially solve those performance problems. One enormous problem at a time, please.
Have I ever mentioned that I loathe CamelCase names? Thanks 1990s. ↩︎
This is due to the way congestion control algorithms operate. These start out slow in case the network is constrained, but gradually speed up. ↩︎
Tantek Çelik pointed out that you can use a service worker to load old content at the same time as checking asynchronously for updates. The fact is, service workers can do just about anything discussed here. You only have to write and maintain a service worker. ↩︎
You might reasonably suggest that this sort of thinking tends toward suboptimal local minima. That is a fair criticism, but my rejoinder there might be that conditioning success on a design that reduces to a previously unsolved problem is not really a good strategy either. Besides, accepting suboptimal local minima is part of how we make forward progress without endless second-guessing. ↩︎
I seem to recall this being raised before Pete Snyder opened this issue, perhaps at the ESCAPE workshop, but I can’t put a name to it. ↩︎
In particular, the split between style (CSS) and semantics (HTML). ↩︎
At this point, a footnote seems necessary. Yes, a browser is an intermediary. All previous complaints apply. It would be dishonest to deny the possibility that a browser might abuse its position of privilege. But that is the topic for a much longer posting. ↩︎
This more than makes up for the overheads of the ad blocker in most cases, with page loads being considerably faster on ad-heavy pages. ↩︎
If it isn’t clear, I’m firmly of the opinion that Google’s AMP Cache is not just a bad idea, but an abuse of Google’s market dominance. It also happens to be a gross waste of resources in a lot of cases, as Google pushes content that can be either already present or content for links that won’t ever be followed. Of course, if they guess right and you follow a link, navigation is fast. Whoosh. ↩︎
With increasing amounts of script, interfaces might also be expressed at the JavaScript module or function level. ↩︎
Yep. Pun totally intended. ↩︎
Worth noting here is the care Google takes to structure the script to avoid breaking pages when their JavaScript load is blocked by an ad blocker. ↩︎
I wonder how many people are still fetching ga.js from Google. ↩︎
Note that, at least for ad blocking, the biggest gains come from not executing unwanted content, as executing ad content almost always leads to a chain of additional fetches. Saving the CPU time is the third major component to savings. ↩︎
Yes, that effectively means bundling them with content. ↩︎
Curiously, the Variants design might not be a good fit here as it provides enumeration of alternatives, which is tricky for the same reason that caching in ignorance of bundling is. ↩︎
There is lots to quibble about in the exact spelling in this example, but I just copied from the proposal directly. ↩︎
It’s tempting here to suggest urn:, but that might cause some heads to explode. ↩︎
The thought occurs that this is something that could be exploited to allow for safe patching of dependencies when combined with semantic versioning. For instance, I will accept any version X.Y.? of this file greater than X.Y.Z. We can leave that idea for another day though. ↩︎
Using IPFS seems far more plausible if you allow it as one option of many with the option for graceful fallback. ↩︎
To what extent providing information ahead of time can be used to improve performance is something that I have often wondered about; it seems like it has some interesting trade-offs that might be worth studying. ↩︎