List of well-known bots and user-agent patterns to detect them

arcjet/well-known-bots

Well Known Bots

This repository contains a list of Well Known Bots, including robots, crawlers, validators, monitors, and spiders, in a single JSON file. Each bot has a unique identifier and a RegExp pattern to match against the HTTP User-Agent header. Additional metadata is available on each item.

Install

Direct download

Download the well-known-bots.json file directly.

Realities

It's impossible to create a system that can detect all bots. Well-behaving bots identify themselves in a consistent manner, usually via the User-Agent patterns this project provides. It is straightforward to identify these well-behaving bots, but misbehaving bots pretend to be real clients and use various mechanisms to evade detection.

For more details, see Non-Technical Notes in the browser-fingerprinting project.

Structure

Each entry in the JSON represents a specific bot or crawler and includes the following fields:

  • id: A unique identifier for the bot
  • categories: An array of categories the bot belongs to (e.g., "search-engine", "advertising")
  • pattern: A regular expression pattern used to identify the bot in user agent strings
  • url: (optional) A URL with more information about the bot
  • verification: A list of supported methods for verifying the bot's identity (empty if the bot cannot be verified)
  • instances: An array of example user agent strings for the bot
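
As a minimal sketch of how these fields might be used, the following Python snippet matches a User-Agent string against each entry's pattern. The sample entry, its values, and the `find_bots` helper are hypothetical and exist only for illustration; they are not copied from well-known-bots.json. It also assumes the patterns are compatible with Python's `re` module, which may not hold for every pattern in the real file.

```python
import json
import re

# Hypothetical entry shaped like the fields listed above; the values are
# illustrative and are not taken from well-known-bots.json.
SAMPLE_JSON = """
[
  {
    "id": "example-bot",
    "categories": ["search-engine"],
    "pattern": "ExampleBot/",
    "url": "https://example.com/bot",
    "verification": [],
    "instances": [
      "Mozilla/5.0 (compatible; ExampleBot/1.0; +https://example.com/bot)"
    ]
  }
]
"""


def find_bots(user_agent, entries):
    """Return the ids of all entries whose pattern matches the user agent."""
    return [e["id"] for e in entries if re.search(e["pattern"], user_agent)]


entries = json.loads(SAMPLE_JSON)

# Each entry's example instances should match its own pattern.
for entry in entries:
    for instance in entry["instances"]:
        assert entry["id"] in find_bots(instance, entries)
```

To work with the real list, the same helper could be given the parsed contents of a downloaded well-known-bots.json instead of `SAMPLE_JSON`.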

Verification

Each verification entry contains the following fields:

  • type: The method of verification (currently only dns is supported)
  • masks: An array of mask patterns used for verification

Verification mask patterns

The mask patterns use the following special characters:

  • *: Matches zero or one occurrence of any character
  • @: Acts as a wildcard, matching any number of characters (including none)

All other characters in the mask require an exact match.
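
The mask rules above can be sketched as a translation into a regular expression. This is one possible reading rather than the project's own implementation: the `mask_to_regex` and `hostname_matches` helpers and the example masks are hypothetical, and the snippet assumes a mask must match the whole hostname (for example, one obtained from a reverse DNS lookup).

```python
import re


def mask_to_regex(mask):
    """Translate a verification mask into a compiled regular expression."""
    parts = []
    for ch in mask:
        if ch == "*":
            parts.append(".?")   # zero or one of any character
        elif ch == "@":
            parts.append(".*")   # any number of characters
        else:
            parts.append(re.escape(ch))  # everything else matches exactly
    return re.compile("".join(parts))


def hostname_matches(hostname, masks):
    """Return True if the whole hostname matches at least one mask."""
    return any(mask_to_regex(m).fullmatch(hostname) for m in masks)


# Illustrative masks only; real masks come from an entry's "verification" list.
masks = ["crawl-***-***-***-***.example.com", "@.bot.example.com"]

print(hostname_matches("crawl-66-249-66-1.example.com", masks))
print(hostname_matches("crawl-66-249-66-1.example.org", masks))
```

Under this reading, a run such as `***` accepts up to three characters, which suits the variable-length octets of an IP address embedded in a hostname.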

License

This project is a hard fork of crawler-user-agents at commit 46831767324e10c69c9ac6e538c9847853a0feb9, which is distributed under the MIT License.
