In this first post, I will present the browser fingerprinting technique, the main attributes forming fingerprints and the key properties of fingerprints.
Many techniques exist to identify a browser. The most commonly used is the cookie: an identifier generated on the server side and stored on the client side in the cookie storage space. Similar techniques store identifiers in other places, such as HTML5 Local/Session Storage, IndexedDB, or less common locations like ETags.
A browser fingerprint is different. It is created by gathering attributes collected from the browser, which describe the browser's properties and configuration. To be effective at identification, the attributes should display two main properties: uniqueness and stability. Uniqueness, generally measured through the attribute's entropy, means it is possible to distinguish attributes, and therefore browsers, by looking at their values. Stability implies that the values of the attributes change rarely or, if they do, change in a way that can be predicted. Each attribute shows different levels of uniqueness and stability.
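To make the entropy measure concrete, here is a minimal sketch that computes the Shannon entropy, in bits, of the values observed for a single attribute. The function name `shannonEntropy` and the sample values are hypothetical and only serve the illustration.

```javascript
// Shannon entropy (in bits) of the values observed for one attribute.
// The more evenly spread the values are, the higher the entropy,
// and the more the attribute helps to tell browsers apart.
function shannonEntropy(values) {
  const counts = new Map();
  for (const value of values) {
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / values.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Hypothetical sample: the platform value reported by eight browsers.
const platforms = [
  'Win32', 'Win32', 'Win32', 'Win32',
  'MacIntel', 'MacIntel',
  'Linux x86_64',
  'iPhone',
];
console.log(shannonEntropy(platforms)); // about 1.75 bits
```

An attribute that takes the same value in every browser has an entropy of 0 and is useless for identification, whatever its stability.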
HTTP headers are parameters added to requests the user's device sends to servers. They contain information for the server to understand what type of device made the request, how the response should be sent, which languages the user understands, as well as other information. The following table shows the most used HTTP headers for browser fingerprinting.
|Header name|Example of value|
|---|---|
|Accept-Encoding|gzip, deflate, br|
|DNT (Do Not Track)|1|
|User-Agent|Mozilla/5.0 (X11; Ubuntu; Linux x86\_64; rv:66.0) Gecko/20100101 Firefox/66.0|
Configuration changes in the device or browser impact these headers, and the information in the headers has high entropy, meaning it is highly discriminating [Laperdrix16]. Furthermore, HTTP headers are sent by all browsers and are easily collected by the server, making them ideal attributes to use in a browser fingerprint.
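As a rough sketch of the server side, the function below picks fingerprinting headers out of a request's header map. It assumes a Node.js server, where `req.headers` exposes header names in lower case; the Do Not Track header is sent under the name `DNT`. The list `FINGERPRINT_HEADERS` and the function name are hypothetical, and `accept-language` is included as an assumption, since it carries the language information mentioned above.

```javascript
// Header names as they appear in Node's `req.headers` (lower-cased).
// This list is illustrative, not exhaustive.
const FINGERPRINT_HEADERS = [
  'accept-encoding',
  'accept-language',
  'dnt',
  'user-agent',
];

// Keep only the headers of interest. A missing header is recorded
// as null, because its absence is itself a piece of information.
function extractHeaderAttributes(headers) {
  const attributes = {};
  for (const name of FINGERPRINT_HEADERS) {
    attributes[name] = headers[name] !== undefined ? headers[name] : null;
  }
  return attributes;
}

// Possible usage inside a Node.js HTTP server:
//   require('http').createServer((req, res) => {
//     const attributes = extractHeaderAttributes(req.headers);
//     res.end(JSON.stringify(attributes));
//   }).listen(8080);
```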
|Attribute|Description|
|---|---|
|Battery|Battery object, containing the battery state (charging or not) and the battery level.|
|Device memory|Returns the amount of memory on the device, in gigabytes. Only available on Chrome 63 and Opera 50 (or higher).|
|Do Not Track|Represents the user's wish not to be tracked. Similar to the HTTP header.|
|Hardware concurrency|Number of logical processors available to the browser.|
|Languages|Preferred languages of the user. Similar to the HTTP header, but without the weight information.|
|Platform|Represents the OS value. Can be used to check the truthfulness of the User-Agent.|
|User-Agent|Gives several pieces of information about the device making the request. Similar to the HTTP header.|
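Most of the attributes in this table can be read from the browser's `navigator` object. The sketch below takes that object as a parameter so it can also be tried outside a browser. The function names are hypothetical, the battery is left out because it is only reachable through the asynchronous `navigator.getBattery()`, and the platform check is a deliberately crude heuristic of my own, not a complete rule set.

```javascript
// Read the synchronous attributes from a navigator-like object.
// Attributes the browser does not expose are recorded as null.
function collectNavigatorAttributes(nav) {
  return {
    deviceMemory: nav.deviceMemory ?? null,
    doNotTrack: nav.doNotTrack ?? null,
    hardwareConcurrency: nav.hardwareConcurrency ?? null,
    languages: Array.isArray(nav.languages) ? nav.languages.join(',') : null,
    platform: nav.platform ?? null,
    userAgent: nav.userAgent ?? null,
  };
}

// Crude consistency check between platform and User-Agent.
// Returns false when the two values clearly disagree.
function platformMatchesUserAgent(platform, userAgent) {
  const rules = [
    ['Win', ['Windows']],
    ['Mac', ['Macintosh', 'Mac OS X']],
    ['Linux', ['Linux', 'Android']],
    ['iPhone', ['iPhone']],
  ];
  for (const [prefix, markers] of rules) {
    if (platform.startsWith(prefix)) {
      return markers.some((marker) => userAgent.includes(marker));
    }
  }
  return true; // unknown platform: no verdict
}

// In a browser: collectNavigatorAttributes(window.navigator)
```

A mismatch between the two values suggests that the User-Agent has been spoofed, which is itself a distinguishing trait.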
The window.screen object, together with the window's own dimensions, allows scripts to collect the screen size, the available screen size, and the window size.
These metrics can leak even more information. By computing the difference between the screen size and the available screen size, scripts can detect the presence of elements such as a dock. The difference between the available screen size and the window size can also reveal whether the bookmark bar is present at the top of the browser.
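The differences described here can be sketched as follows. The function takes a `window`-like object as a parameter; the name `screenMetrics` is hypothetical, and using `innerWidth`/`innerHeight` as the window size is an assumption on my part.

```javascript
// Collect screen and window sizes and derive the two differences.
function screenMetrics(win) {
  const { screen } = win;
  return {
    screenSize: [screen.width, screen.height],
    availableScreenSize: [screen.availWidth, screen.availHeight],
    windowSize: [win.innerWidth, win.innerHeight],
    // Space reserved by the OS, for instance a dock or a taskbar.
    reservedByOs: [
      screen.width - screen.availWidth,
      screen.height - screen.availHeight,
    ],
    // Space between the available screen and the page viewport,
    // which includes browser UI such as toolbars.
    outsideViewport: [
      screen.availWidth - win.innerWidth,
      screen.availHeight - win.innerHeight,
    ],
  };
}

// In a browser: screenMetrics(window)
```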
Rendering APIs are implemented by browsers to display advanced representations of content to end-users. In the context of browser fingerprinting, rendering can produce a wide diversity of side effects that are interesting to analyze to capture the true nature of the browsing environment.
```javascript
const baseFont = 'serif';
const testSize = '72px';
const testChar = 'A';
// getElementsByTagName returns a collection, so take its first element.
const h = document.getElementsByTagName('body')[0];

// Create a span in the document to get the default width/height.
const s = document.createElement('span');
s.style.fontSize = testSize;
s.innerText = testChar;
s.style.fontFamily = baseFont;
h.appendChild(s);
const defaultOffsetWidth = s.offsetWidth;
const defaultOffsetHeight = s.offsetHeight;
h.removeChild(s);

// Testing a font.
const fontToBeChecked = 'Apple Chancery';
// Name of the font along with the base font for fallback.
s.style.fontFamily = fontToBeChecked + ',' + baseFont;
h.appendChild(s);
const offsetWidth = s.offsetWidth;
const offsetHeight = s.offsetHeight;
h.removeChild(s);

// The font is installed if the text no longer has the default size.
const detected = offsetWidth !== defaultOffsetWidth || offsetHeight !== defaultOffsetHeight;
```
On a desktop, fonts are installed at the system level. This means that if you install software that ships with fonts (like Microsoft Word), those fonts become available to all other software. As many applications now bundle their own fonts, font detection is a powerful technique to distinguish browsers and add entropy to a fingerprint.
In the early years of browser fingerprinting, the list of fonts installed on a device was available via the Flash plugin. As Flash is now disappearing, a new technique has emerged to detect fonts.
Font enumeration [Nikiforakis13] consists of checking for the presence of a set of fonts on the device. It works by measuring the size of a text rendered with a default font, usually a system fallback font, then measuring the same text again with the font the script wants to check. If the dimensions differ, the font is installed; otherwise, the browser fell back to the default font. The example above shows this technique used to detect the presence of the Apple Chancery font. The test is completely transparent, as it does not need to display anything visible to the user, and it is fast: dozens or even hundreds of fonts can be checked on a basic device in a second.
On the original page, this technique is applied to a list of more than 1,000 fonts to count how many are installed on the visitor's device.
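Checking a whole list of fonts is a loop over the single-font test. In the sketch below, the measuring step is passed in as a function so the logic can be exercised without a DOM; `detectFonts` and `measureInBrowser` are hypothetical names.

```javascript
// Return the candidate fonts whose rendering differs from the base font.
// `measure(fontFamily)` must return { width, height } for a test string.
function detectFonts(candidates, measure, baseFont = 'serif') {
  const baseline = measure(baseFont);
  return candidates.filter((font) => {
    const size = measure(`'${font}', ${baseFont}`);
    return size.width !== baseline.width || size.height !== baseline.height;
  });
}

// Browser-only measuring function, following the same span-based approach.
function measureInBrowser(fontFamily) {
  const span = document.createElement('span');
  span.style.fontSize = '72px';
  span.style.fontFamily = fontFamily;
  span.textContent = 'A';
  document.body.appendChild(span);
  const size = { width: span.offsetWidth, height: span.offsetHeight };
  document.body.removeChild(span);
  return size;
}

// In a browser:
//   detectFonts(['Apple Chancery', 'Calibri'], measureInBrowser)
```

A font whose glyphs happen to have exactly the same dimensions as the base font would go undetected by this sketch.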
```javascript
const canvas = document.createElement('canvas');
canvas.height = 60;
canvas.width = 400;
const canvasContext = canvas.getContext('2d');
canvas.style.display = 'inline';
canvasContext.textBaseline = 'alphabetic';

// Orange rectangle drawn behind the text.
canvasContext.fillStyle = '#f60';
canvasContext.fillRect(125, 1, 62, 20);

// First line of text, with a font name that does not exist,
// so the browser uses its fallback font.
canvasContext.fillStyle = '#069';
canvasContext.font = '11pt no-real-font-123';
canvasContext.fillText('Cwm fjordbank glyphs vext quiz, \ud83d\ude03', 2, 15);

// Second line of text, semi-transparent, in Arial.
canvasContext.fillStyle = 'rgba(102, 204, 0, 0.7)';
canvasContext.font = '18pt Arial';
canvasContext.fillText('Cwm fjordbank glyphs vext quiz, \ud83d\ude03', 4, 45);
```
Because the canvas was shown to be one of the most unique and most stable attributes for fingerprinting [Mowery12, Laperdrix16], access to it started to be restricted. The Tor Browser blocks access to the binary content of the image by default, and Firefox now offers an option to behave similarly. If you're using one of these browsers, the canvas might not display.
The canvas API was later enriched with the WebGL API, which allows browsers to render 3D graphics. The API exposes a lot of information, such as the renderer and vendor of the graphics card.
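To turn a rendered canvas into a fingerprint attribute, a script can read the image back with `canvas.toDataURL()` and reduce the result to a short value. The sketch below uses the 32-bit FNV-1a hash purely for illustration; the cited studies do not prescribe it, and the function names are hypothetical.

```javascript
// 32-bit FNV-1a hash of a string, returned as 8 hex digits.
// Data URLs are ASCII, so hashing UTF-16 code units is adequate here.
function fnv1a32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Reduce the rendered image to a short, comparable value.
function canvasFingerprint(canvas) {
  return fnv1a32(canvas.toDataURL());
}

// In a browser, after drawing: canvasFingerprint(canvas)
```

Two browsers that render the same instructions to slightly different pixels produce different data URLs, and therefore different hashes.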
The API also gives access to many configuration elements, such as the list of extensions supported by the browser.
Finally, it can draw 3D shapes, color gradients, and other elements. As with the canvas element, the binary result of the drawing can be collected. For example, a script can draw a 3D scene with a triangle and a color gradient and read back the result.
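The renderer, vendor, and extension list can be read as sketched below. The function takes a WebGL context as a parameter; the name `collectWebGLInfo` is hypothetical. The unmasked renderer and vendor are exposed through the `WEBGL_debug_renderer_info` extension, which a browser may not provide, so the sketch falls back to null.

```javascript
// Collect renderer, vendor and supported extensions from a WebGL context.
function collectWebGLInfo(gl) {
  const info = {
    vendor: null,
    renderer: null,
    extensions: gl.getSupportedExtensions() || [],
  };
  const debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
  if (debugInfo) {
    info.vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
    info.renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
  }
  return info;
}

// In a browser:
//   const gl = document.createElement('canvas').getContext('webgl');
//   if (gl) console.log(collectWebGLInfo(gl));
```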
The audio fingerprinting technique processes an audio signal generated by the browser. This technique is similar to canvas fingerprinting: it asks the browser to render elements whose result will vary depending on some hardware or OS features. It was revealed by a 2016 study [Englehardt16].
To be effective, fingerprints must fulfill two main properties: uniqueness and stability. Several studies aim at measuring these properties. Panopticlick and AmIUnique are dedicated websites studying fingerprints and their properties. Their studies [Eckersley10, Laperdrix16] revealed a high percentage of unique fingerprints (85 to 90%).
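One possible way to do this with the Web Audio API is sketched below: an oscillator is rendered offline through a dynamics compressor, and the resulting samples are summed into a single number. The oscillator type, frequency, buffer length, and the summing step are assumptions chosen for illustration, not the exact setup of the cited study, and the function names are hypothetical.

```javascript
// Reduce rendered samples to one number by summing absolute values.
function summarizeSamples(samples, start = 0, end = samples.length) {
  let sum = 0;
  for (let i = start; i < end; i++) {
    sum += Math.abs(samples[i]);
  }
  return sum;
}

// Browser-only: render one second of audio offline (nothing is played).
async function audioFingerprint() {
  const context = new OfflineAudioContext(1, 44100, 44100);

  const oscillator = context.createOscillator();
  oscillator.type = 'triangle';
  oscillator.frequency.value = 10000;

  // The compressor's processing is where implementations tend to differ.
  const compressor = context.createDynamicsCompressor();
  oscillator.connect(compressor);
  compressor.connect(context.destination);

  oscillator.start(0);
  const buffer = await context.startRendering();
  return summarizeSamples(buffer.getChannelData(0));
}

// In a browser: audioFingerprint().then(console.log)
```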
However, these two studies have one major drawback: they collect data on dedicated websites that attract a biased set of users. Due to the precise and technical goal of these websites, the users visiting them are often people who care about their privacy, who are more technically capable than the average web user, and who are more likely to have special configurations, browsers, or extensions to protect themselves. Because these behaviors are not representative of all web users, the datasets of these websites suffer from a bias that is hard to study or remove.
In 2018, in an attempt to study an unbiased fingerprint dataset, researchers [Gomez-Boix18] set up a fingerprinting script on one of the 15 most popular French websites. They put the script on two pages: a weather forecast page and a politics page. They collected around 2 million fingerprints using the same 17 attributes as AmIUnique. They measured a uniqueness percentage of 33.6% overall, 35% for desktop computers and 18% for mobile devices, showing a strong reduction in fingerprint uniqueness compared to previous studies.
Fingerprints change over time. Because of browser or software updates, attribute values change, and APIs may appear in or disappear from browsers. Consequently, the stability of fingerprints, or the predictability of their changes, is a major property when linking fingerprints.
The stability property was studied in 2018. Researchers [Vastel18] collected dozens of fingerprints from thousands of different browser instances over the years to see whether fingerprint evolutions were predictable and linkable.
There are two key takeaways from this work. First, a segment of the population is difficult to track for extended periods using only browser fingerprinting, because these users have common devices, popular browsers, and few customizations, which makes fingerprint collisions more common. In their dataset, this is close to 20% of the browser instances. Second, another segment, around 25% of browser instances in the dataset, is highly trackable and has very distinctive fingerprints with highly identifiable attributes. The authors hypothesize that users who focus more on privacy, have more experience with technology, and tend to use uncommon devices or browsers with extra customization are the most susceptible to fingerprint-based tracking.
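The uniqueness percentage reported by these studies is the share of fingerprints that appear exactly once in the dataset. A minimal sketch, assuming each fingerprint has already been serialized to a string; `uniquenessRate` is a hypothetical name and the sample data is made up.

```javascript
// Share of fingerprints that occur exactly once in the dataset.
function uniquenessRate(fingerprints) {
  if (fingerprints.length === 0) return 0;
  const counts = new Map();
  for (const fingerprint of fingerprints) {
    counts.set(fingerprint, (counts.get(fingerprint) || 0) + 1);
  }
  let unique = 0;
  for (const count of counts.values()) {
    if (count === 1) unique += 1;
  }
  return unique / fingerprints.length;
}

// Made-up dataset: two of the four fingerprints are shared.
console.log(uniquenessRate(['fp-a', 'fp-b', 'fp-b', 'fp-c'])); // 0.5
```

The same browser population can therefore look very different depending on the dataset: the larger and more homogeneous it is, the more fingerprints collide.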
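To give an intuition of what linking means, here is a deliberately naive sketch: a new fingerprint is attached to the known fingerprint that differs in the fewest attributes, provided the difference is small and there is no tie. This is not the algorithm of [Vastel18]; the function names and the `maxChanges` threshold are hypothetical.

```javascript
// Names of the attributes whose values differ between two fingerprints.
function changedAttributes(previous, current) {
  const names = new Set([...Object.keys(previous), ...Object.keys(current)]);
  return [...names].filter((name) => previous[name] !== current[name]);
}

// Return the id of the closest known fingerprint, or null when the
// best match changed too much or several candidates are equally close.
function linkFingerprint(known, candidate, maxChanges = 2) {
  let best = null;
  let bestChanges = Infinity;
  let tie = false;
  for (const [id, fingerprint] of Object.entries(known)) {
    const changes = changedAttributes(fingerprint, candidate).length;
    if (changes < bestChanges) {
      best = id;
      bestChanges = changes;
      tie = false;
    } else if (changes === bestChanges) {
      tie = true;
    }
  }
  return !tie && bestChanges <= maxChanges ? best : null;
}
```

The tie case mirrors the first takeaway: when many browsers share the same common configuration, a new fingerprint matches several known ones equally well and cannot be linked with confidence.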
In the next post, I will present the browser fingerprinting defenses.