From c2c1ee4567c6e1e96abb307fc7e1ba2d586f4787 Mon Sep 17 00:00:00 2001
From: e-moran
Date: Thu, 14 Nov 2024 18:16:59 +0100
Subject: [PATCH] docs: document the structure of the json file

---
 README.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/README.md b/README.md
index 48f55f5..6002586 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,33 @@ to evade detection.
 For more details, see [Non-Technical Notes in the
 browser-fingerprinting][non-tech-notes-url] project.
 
+## Structure
+
+Each entry in the JSON file describes a single bot or crawler and has the following fields:
+
+- `id`: A unique identifier for the bot
+- `categories`: An array of categories the bot belongs to (e.g., "search-engine", "advertising")
+- `pattern`: A regular expression used to identify the bot in user agent strings
+- `url`: (optional) A URL with more information about the bot
+- `verification`: An array of supported methods for verifying the bot's identity (empty if the bot cannot be verified)
+- `instances`: An array of example user agent strings for the bot
+
+### Verification
+
+Each verification entry contains the following fields:
+
+- `type`: The verification method (currently only `dns` is supported)
+- `masks`: An array of mask patterns used for verification
+
+### Verification mask patterns
+
+The mask patterns use the following special characters:
+
+- `*`: Matches zero or one arbitrary character
+- `@`: Acts as a wildcard, matching any number of characters
+
+All other characters in the mask must match exactly.
+
 ## License
 
 The project is a hard-fork of [crawler-user-agents][forked-repo-url] at commit
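
The entry fields and mask rules documented in this patch can be illustrated with a short Python sketch. It assumes that `*` matches zero or one arbitrary character, that `@` matches zero or more characters, and that a mask must match the whole value rather than a substring. `EXAMPLE_ENTRY` and the helpers `mask_to_regex`, `mask_matches`, `dns_masks` and `matches_entry_user_agent` are hypothetical and not part of the project. The sketch also assumes the `pattern` regexes are compatible with Python's `re` syntax, which may not hold for every entry.

```python
import re

# Hypothetical entry following the documented fields; every value here is
# invented for illustration and does not come from the real JSON file.
EXAMPLE_ENTRY = {
    "id": "example-bot",
    "categories": ["search-engine"],
    "pattern": "ExampleBot/",
    "url": "https://example.com/bot",
    "verification": [
        {"type": "dns", "masks": ["crawl-@.example.com"]},
    ],
    "instances": ["Mozilla/5.0 (compatible; ExampleBot/1.0)"],
}


def mask_to_regex(mask: str) -> re.Pattern[str]:
    """Translate a verification mask into a compiled regular expression.

    Assumed semantics: `*` is "zero or one arbitrary character",
    `@` is "any number of characters (including none)", and every other
    character is matched literally.
    """
    parts = []
    for ch in mask:
        if ch == "*":
            parts.append(".?")
        elif ch == "@":
            parts.append(".*")
        else:
            # Escape so characters such as "." are treated literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def mask_matches(mask: str, value: str) -> bool:
    # fullmatch: the mask must cover the entire value, not just a prefix.
    return mask_to_regex(mask).fullmatch(value) is not None


def dns_masks(entry: dict) -> list[str]:
    # Collect the masks of every verification method whose type is "dns".
    return [
        mask
        for method in entry["verification"]
        if method["type"] == "dns"
        for mask in method["masks"]
    ]


def matches_entry_user_agent(entry: dict, user_agent: str) -> bool:
    # `pattern` is a regular expression; search() looks for it anywhere
    # in the user agent string (an assumption about intended usage).
    return re.search(entry["pattern"], user_agent) is not None
```

For instance, `mask_matches("crawl-@.example.com", "crawl-66-249.example.com")` returns `True`, while the same mask rejects `crawl.example.com` because the literal `-` is missing. Translating each mask into a regular expression keeps the matcher short; a hand-written character-by-character matcher would avoid the regex engine but takes more code.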