Used billions of times each day, the Chrome address bar (which we call the “omnibox”) is a powerful tool to make searching the web easier, whether you’re trying to quickly find your tabs or bookmarks, return to a web page you previously visited, or find information.

With the latest release of Chrome (M124), we’re integrating machine learning models to power the Chrome omnibox on desktop, so that web page suggestions are more precise and relevant to you. In the future, these models will also help improve the relevance scoring of search suggestions. Here’s a closer look at some of the important insights that help our team build this integration and where we hope the new model takes us.

How we got here

As the engineering lead for the team responsible for the omnibox, every launch feels special, but this one is truly near and dear to my heart. When I first started working on the Chrome omnibox, I asked around for ideas on how we could make it better for users. The number one answer I heard was, "improve the scoring system." The issue wasn't that the scoring was bad. In fact, the omnibox often feels magical in its ability to surface the URL or query you want! The issue was that it was inflexible. A set of hand-built and hand-tuned formulas did the job well, but were difficult to improve or to adapt to new scenarios. As a result, the scoring system went largely untouched for a long time.

For most of that time, an ML-trained scoring model was the obvious path forward. But it took many false starts to finally get here. Our inability to tackle this challenge for so long was due to the difficulty of replacing the core mechanism of a feature used literally billions of times every day. Software engineering projects are sometimes described as "building the plane while flying it." This project felt more like "replacing all the seats in every plane in the world while they're all flying." The scale was enormous and the changes are felt directly by every user.

This ambitious undertaking would not have been possible without the work of such a talented and dedicated team. There were bumps in the road, walls we had to break through, and unanticipated issues that slowed us down, but the team was driven by a sincere belief in the impact of getting this right for our users.

A Surprising Insight

One of the fun things about working with ML systems is that the training considers all the data at a scale that would be difficult to impossible for any individual person or team. And that can lead to surprising insights.

The coolest example of this phenomenon on this project was when we looked at the scoring curve of one particular signal: time since last navigation. The expectation with this signal is that the smaller it is (the more recently you've navigated to a particular URL), the bigger the contribution that signal should make towards a higher relevance score.

And that is, in fact, what the model learned. But when we looked closer, we noticed something surprising: when the time since navigation was very low (seconds instead of hours, days or weeks), the model was decreasing the relevance score. It turns out that the training data reflected a pattern where users sometimes navigate to a URL that was not what they really wanted and then immediately return to the Chrome omnibox and try again. In that case, the URL they just navigated to is almost certainly not what they want, so it should receive a low relevance score during this second attempt.

In retrospect, this is obvious. And if we had not launched ML scoring, we definitely would have added a new rule to the old system to reflect this scenario. But before the training system observed and learned from this pattern, it never occurred to anyone that this might be happening.

The Future

With the new ML models, we believe this will open up many new possibilities to improve the user experience by potentially incorporating new signals, like differentiating between time of the day to improve relevance. We want to explore training specialized versions of the model for particular environments: for example, mobile, enterprise or academic users, or perhaps different locales.

Additionally, we observe that the way users interact with the Chrome omnibox changes over time and we believe the relevance scoring should change with them. With the new scoring system, we can now simply collect fresher signals, re-train, evaluate, and deploy new models periodically over time.

By Justin Donnelly, Chrome software engineer



Today’s The Fast and the Curious post covers the release of Speedometer 3.0 an upgraded browser benchmarking tool to optimize the performance of Web applications.

In collaboration with major web browser engines, Blink/V8, Gecko/SpiderMonkey, and WebKit/JavaScriptCore, we’re excited to release Speedometer 3.0. Benchmarks, like Speedometer, are tools that can help browser vendors find opportunities to improve performance. Ideally, they simulate functionality that users encounter on typical websites, to ensure browsers can optimize areas that are beneficial to users.

Let’s dig into the new changes in Speedometer 3.0.

Applying a multi-stakeholder governance model

Since its initial release in 2014 by the WebKit team, browser vendors have successfully used Speedometer to optimize their engines and improve user experiences on the web. Speedometer 2.0, a result of a collaboration between Apple and Chrome, followed in 2018, and it included an updated set of workloads that were more representative of the modern web at that time.

The web has changed a lot since 2018, and so has Speedometer in its latest release, Speedometer 3. This work has been based on a joint multi-stakeholder governance model to share work, and build a collaborative understanding of performance on the web to help drive browser performance in ways that help users. The goal of this collaborative project is to create a shared understanding of web performance so that improvements can be made to enhance the user experience. Together, we were able to to improve how Speedometer captures and calculates scores, show more detailed results and introduce an even wider variety of workloads. This cross-browser collaboration introduced more diverse perspectives that enabled clearer insights into a broader set of web users and workflows, ensuring the newest version of Speedometer will help make the web better for everyone, regardless of which browser they use.

Why is building workloads challenging?

Building a reliable benchmark with representative tests and workloads is challenging enough. That task becomes even more challenging if it will be used as a tool to guide optimization of browser engines over multiple years. To develop the Speedometer 3 benchmark, the Chrome Aurora team, together with colleagues from other participating browser vendors, were tasked with finding new workloads that accurately reflect what users experience across the vast, diverse and eclectic web of 2024 and beyond.

A few tests and workloads can’t simulate the entire web, but while building Speedometer 3 we have established some criteria for selecting ones that are critical to user’s experience. We are now closer to a representative benchmark than ever before. Let’s take a look at how Speedometer workloads evolved

How did the workloads change?

Since the goal is to use workloads that are representative of the web today, we needed to take a look at the previous workloads used in Speedometer and determine what changes were necessary. We needed to decide which frameworks are still relevant, which apps needed updating and what types of work we didn’t capture in previous versions. In Speedometer 2, all workloads were variations of a todo app implemented in different JS frameworks. We found that, as the web evolved over the past six years, we missed out on various JavaScript and Browser APIs that became popular, and apps tend to be much larger and more complicated than before. As a result, we made changes to the list of frameworks we included and we added a wider variety of workloads that cover a broader range of APIs and features.

Frameworks

To determine which frameworks to include, we used data from HTTP Archive and discussed inclusion with all browser vendors to ensure we cover a good range of implementations. For the initial evaluation, we took a snapshot of the HTTP Archive from March 2023 to determine the top JavaScript UI frameworks currently used to build complex web apps.



Another approach is to determine inclusion based on popularity with developers: Do we need to include frameworks that have “momentum”, where a framework's current usage in production might be low, but we anticipate growth in adoption? This is somewhat hard to determine and might not be the ideal sole indicator for inclusion. One data point to evaluate momentum might be monthly NPM downloads of frameworks.

Here are the same 15 frameworks NPM downloads for March 2023:



With both data points on hand, we decided on a list that we felt gives us a good representation of frameworks. We kept the list small to allow space for brand new types of workloads, instead of just todo apps. We also selected commonly used versions for each framework, based on the current usage.



In addition, we updated the previous JavaScript implementations and included a new web-component based version, implemented with vanilla JavaScript.

More Workloads

A simple Todo-list only tests a subset of functionality. For example: how well do browsers handle complicated flexbox and grid layouts? How can we capture SVG and canvas rendering and how can we include more realistic scenarios that happen on a website?

We collected and categorized areas of interest into DOM, layout, API and patterns, to be able to match them to potential workloads that would allow us to test these areas. In addition we collected user journeys that included the different categories of interest: editing text, rendering charts, navigating a site, and so on.



There are many more areas that we weren’t able to include, but the final list of workloads presents a larger variety and we hope that future versions of Speedometer will build upon the current list.

Validation

The Chrome Aurora team worked with the Chrome V8 team to validate our assumptions above. In Chrome, we can use runtime-call-stats to measure time spent in each web API (and additionally many internal components). This allows us to get an insight into how dominant certain APIs are.

If we look at Speedometer 2.1 we see that a disproportionate amount of benchmark time is spent in innerHTML.



While innerHTML is an important web API, it's overrepresented in Speedometer 2.1. Doing the same analysis on the new version 3.0 yields a slightly different picture:



We can see that innerHTML is still present, but its overall contribution shrunk from roughly 14% down to 4.5%. As a result, we get a better distribution that favors more DOM APIs to be optimized. We can also see that a few Canvas APIs have moved into this list, thanks to the new workloads in v3.0.

While we will never be able to perfectly represent the whole web in a fast-running and stable benchmark, it is clear that Speedometer 3.0 is a giant step in the right direction.

Ultimately, we ended up with the following list of workloads presented in the next few sections.

What workloads are included?

TodoMVC

Many developers might recognize the TodoMVC app. It’s a popular resource for learning and offers a wide range of TodoMVC implementations with different frameworks.



TodoMVC is a to-do application that allows a user to keep track of tasks. The user can enter a new task, update an existing one, mark a task as completed, or delete it. In addition to the basic CRUD operations, the TodoMVC app has some added functionality: filters are available to change the view to “all”, “active” or “completed” tasks and a status text displays the number of active tasks to complete.

In Speedometer, we introduced a local data source for todo items, which we use in our tests to populate the todo apps. This gave us the opportunity to test a larger character set with different languages.

The tests for these apps are all similar and are relatable to typical user journeys with a todo app:

  1. Add a task
  2. Mark task as complete
  3. Delete task
  4. Repeat steps 1-3 a set amount of times

These tests seem simple, but it lets us benchmark DOM manipulations. Having a variety of framework implementations also cover several different ways how this can be done.

Complex DOM / TodoMVC

The complex DOM workloads embed various TodoMVC implementations in a static UI shell that mimics a complex web page. The idea is to capture the performance impact on executing seemingly isolated actions (e.g. adding/deleting todo items) in the context of a complex website. Small performance hits that aren’t obvious in an isolated TodoMVC workload are amplified in a larger application and therefore capture more real-world impact.

The tests are similar to the TodoMVC tests, executed in the complex DOM & CSSOM environment.

This introduces an additional layer of complexity that browsers have to be able to handle effortlessly.



Single-page-applications (News Site)

Single-page-applications (SPAs) are widely used on the web for streaming, gaming, social media and pretty much anything you can imagine. A SPA lets us capture navigating between pages and interacting with an app. We chose a news site to represent a SPA, since it allows us to capture the main areas of interest in a deterministic way. An important factor was that we want to ensure we are using static local data and that the app doesn’t rely on network requests to present this data to the user.

Two implementations are included: one built with Next.js and the other with Nuxt. This gave us the opportunity to represent applications built with meta frameworks, with the caveat that we needed to ensure to use static outputs.



Tests for the news site mimic a typical user journey, by selecting a menu item and navigating to another section of the site.

  1. Click on ‘More’ toggle of the navigation
  2. Click on a navigation button
  3. Repeat steps 1 and 2 a set amount of times

These tests let us evaluate how well a browser can handle large DOM and CSSOM changes, by changing a large amount of data that needs to be displayed when navigating to a different page.

Charting Apps & Dashboards

Charting apps allow us to test SVG and canvas rendering by displaying charts in various workloads.

These apps represent popular sites that display financial information, stock charts or dashboards.

Both SVG rendering and the use of the canvas api weren’t represented in previous releases of Speedometer.

Observable Plot displays a stacked bar chart, as well as a dotted chart. It is based on D3, which is a JavaScript library for visualizing tabular data and outputs SVG elements. It loops through a big dataset to build the source data that D3 needs, using map, filter and flatMap methods. As a result this exercises creation and copying of objects and arrays.

Chart.js is a JavaScript charting library. The included workload displays a scatter graph with the canvas api, both with some transparency and with full opacity. This uses the same data as the previous workload, but with a different preparation phase. In this case it makes a heavy use of trigonometry to compute distances between airports.

React Stockcharts displays a dashboard for stocks. It is based on D3 for all computation, but outputs SVG directly using React.

Webkit Perf-Dashboard is an application used to track various performance metrics of WebKit. The dashboard uses canvas drawing and web components for its ui.

These workloads test DOM manipulation with SVG or canvas by interacting with charts. For example here are the interactions of the Observable Plot workload:

  1. Prepare data: compute the input datasets to output structures that D3 understands.
  2. Add stacked chart: this draws a chart using SVG elements.
  3. Change input slider to change the computation parameters.
  4. Repeat steps 1 and 2
  5. Reset: this clears the view
  6. Add dotted chart: this draws another type of graph (dots instead of bars) to exercise different drawing primitives. This also uses a power scale.



Code Editors

Editors, for example WYSIWYG text and code editors, let us focus on editing live text and capturing form interactions. Typical scenarios are writing an email, logging into a website or filling out an online form. Although there is some form interaction present in the TodoMVC apps, the editor workloads use a large data set, which lets us evaluate performance more accurately.



Codemirror is a code editor that implements a text input field with support for many editing features. Several languages and frameworks are available and for this workload we used the JavaScript library from Codemirror.

Tiptap Editor is a headless, framework-agnostic rich text editor that's customizable and extendable. This workload used Tiptap as its basis and added a simple ui to interact with.

Both apps test DOM insertion and manipulation of a large amount of data in the following way:

  1. Create an editable element.
  2. Insert a long text.: Codemirror uses the development bundle of React, whileTipTap loads an excerpt of Proust’s Du Côté de Chez Swann.
  3. Highlight text: Codemirror turns on syntax highlighting, while TipTap sets all the text to bold.

Parting words

Being able to collaborate with all major browser vendors and having all of us contribute to workloads has been a unique experience and we are looking forward to continuing to collaborate in the browser benchmarking space.

Don’t forget to check out the new release of Speedometer and test it out in your favorite browser, dig into the results, check out our repo and feel free to open issues with any improvements or ideas for workloads you would like to see included in the next version. We are aiming for a more frequent release schedule in the future and if you are a framework author and want to contribute, feel free to file an issue on our Github to start the discussion.

Posted by Thorsten Kober, Chrome Aurora

  • Phishing and social engineering attacks: With the move to asynchronous checks, such sites may start to load while server-side Safe Browsing checks are in progress. We have studied the timing data and concluded that it is extremely unlikely a user would have significantly interacted with (e.g. typed in a password) such a site by the time a warning is shown.

  • Exploits against the browser: Chrome maintains a local Safe Browsing list of some sites which are known to deliver browser exploits, and we'll continue to check that synchronously. Besides this, we always recommend updating Chrome as soon as an update is available, to stay protected online.



Sub-resource checks


Most sites we encounter include various sub-resources as a way to render their content. These sub-resources can include images, scripts, and more. Chrome has historically checked both top-level URLs as well as sub-resources with Safe Browsing in order to warn on potentially harmful sites. While the majority of sub-resources are safe, in the past, we'd commonly observe compromised sites embedding sub-resources that were being leveraged by bad actors to distribute malware and exploit browsers at scale.


In recent years, we've seen this attacker trend decline – large scale campaigns that exploit sub-resources are no longer common, making sub-resource checks less important. Additionally, our advances in intelligence gathering, threat detection, and Safe Browsing APIs mean that we now have other ways to protect users in real-time without relying on sub-resource checks. For example, Chrome’s client-side visual ML model can spot images used to create phishing pages, regardless of their use of sub-resources.


As such, moving forward Chrome will no longer check the URLs of sub-resources with Safe Browsing. This means that Chrome clients now connect to Google less frequently, which reduces unnecessary network bandwidth cost for users. On the Safe Browsing side, the change allows us to drastically simplify detection logic and APIs, which helps improve infrastructure reliability and warning accuracy, thus reducing risk overall.



PDF download checks


Finally, we have vastly reduced the frequency with which Chrome contacts Safe Browsing to check PDF downloads.


In the past, PDF was a widely exploited file type due to its popularity. As time has passed, thanks in part to the ongoing hardening of PDF viewers (for example, Chrome's PDF viewer is sandboxed), we aren't seeing widespread exploitation of PDF anymore, nor do we hear industry reports about it being a dangerous file type. Even when we have observed malicious PDFs in the wild, they have contained links that redirect users back to Chrome which gives us another chance to protect users.


As a result of this change, Chrome is now contacting Safe Browsing billions of times less often each week.



What to expect


The changes described above, while mostly under the hood, should result in a smoother web browsing experience for Chrome users without a degradation in security posture. We'll continue to monitor trends in the threat landscape, and remain ready to respond to keep you safe online.


Posted by Jasika Bawa, Chrome Security & Jonathan Li, Safe Browsing




These updates are also designed to help you manage your data. When you sign in to Chrome, the browsing data that’s already on your device will be kept separate as local data on your device. You’ll be able to easily distinguish local from account data in settings. And, if you want any local data to be available on your other devices, you can simply go to settings and save it to your account.


Signing in to Chrome on iOS remains entirely optional. If you don’t sign in, you can still save your bookmarks, passwords and more, but they will be available only on the device where you saved them. You can also continue to sign in to Google web services like Gmail without signing in to Chrome.


We’re hoping these changes make it even easier for you to get the best of Chrome while offering you all the flexibility you need.

Posted by Nico Jersch, Chrome Product Manager