This repository implements a Dockerized REST service that looks up URLs in the Google Safe Browsing v4 API. It is based on gglsbl and uses Flask and gunicorn.
The main challenge with running gglsbl in a REST service is that updating the local sqlite database takes several minutes. In addition, sqlite locks the database during writes by default, so a naive setup would either suffer very noticeable downtime during each update or risk race conditions.
Since version 1.4.0, gglsbl-rest sets the sqlite database to write-ahead logging mode so that readers and writers can work concurrently. A cron job runs every 30 minutes to update the database and then performs a full checkpoint to ensure readers have optimal performance.
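As a rough illustration of the write-ahead logging idea (not the code gglsbl-rest itself runs), the sketch below uses Python's standard sqlite3 module on a throwaway database. The table name `hashes` and the helper names `open_wal` and `full_checkpoint` are made up for this example.

```python
import os
import sqlite3
import tempfile


def open_wal(path):
    """Open a SQLite database and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # The journal mode is stored in the database file, so other
    # connections opened later also use WAL.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


def full_checkpoint(conn):
    """Copy the WAL contents back into the main database file."""
    # SQLite returns one row: (busy, log_frames, checkpointed_frames).
    return conn.execute("PRAGMA wal_checkpoint(FULL)").fetchone()


# WAL needs a real file; it is not available for in-memory databases.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer, mode = open_wal(db_path)
writer.execute("CREATE TABLE hashes (prefix TEXT)")  # hypothetical table
writer.execute("INSERT INTO hashes VALUES ('abcd')")
writer.commit()

reader = sqlite3.connect(db_path)

# The writer starts a new transaction and leaves it open...
writer.execute("INSERT INTO hashes VALUES ('ef01')")
# ...while the reader can still query the last committed state.
rows_seen = reader.execute("SELECT COUNT(*) FROM hashes").fetchone()[0]

writer.commit()
busy, log_frames, checkpointed = full_checkpoint(writer)
print(mode, rows_seen, busy)
```

In the default rollback-journal mode the reader and writer would contend for locks instead; with WAL the reader simply sees the last committed snapshot until the writer commits.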
Versions before 1.4.0 maintained two sets of files on disk and switched between them, which is why the output of the status endpoint lists "alternatives". The current approach has the advantage of reusing freshly downloaded data and cached full hash data across updates.
For security reasons, even though crond is run as root, both the background task of updating the database and the gunicorn process are executed as a non-root user called gglsbl.
The configuration of the REST service can be done using the following environment variables:
- GSB_API_KEY is required and should contain your Google Safe Browsing v4 API key.
- WORKERS controls how many gunicorn workers to instantiate. Defaults to 8 times the number of detected cores plus one.
- TIMEOUT controls how many seconds before gunicorn times out on a request. Defaults to 120.
- MAX_REQUESTS controls how many requests a worker can serve before it is restarted, as per the max_requests gunicorn setting. Defaults to restarting workers after they serve 16,384 requests.
- LIMIT_REQUEST_LINE controls the maximum size of the HTTP request line (operation, protocol version, URI and query parameters), as per the limit_request_line gunicorn setting. Defaults to 8190; set to 0 to allow any length.
- KEEPALIVE controls how long a persistent connection can be idle before it is closed, as per the keepalive gunicorn setting. Defaults to 60 seconds.
- MAX_RETRIES controls how many times the service should retry performing the request if an error occurs. Defaults to 3.
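To make the defaults above concrete, here is a minimal sketch of how such variables could be resolved in Python. It is only an illustration of the documented defaults, not the repository's actual configuration code; the function name `load_settings` and the dictionary keys are hypothetical.

```python
import os


def load_settings(env=None, cores=None):
    """Resolve the documented environment variables into a settings dict."""
    env = os.environ if env is None else env
    cores = (os.cpu_count() or 1) if cores is None else cores

    def as_int(name, default):
        # Environment values are strings; fall back to the documented default.
        return int(env.get(name, default))

    api_key = env.get("GSB_API_KEY")
    if not api_key:
        raise ValueError("GSB_API_KEY is required")

    return {
        "api_key": api_key,
        "workers": as_int("WORKERS", 8 * cores + 1),
        "timeout": as_int("TIMEOUT", 120),
        "max_requests": as_int("MAX_REQUESTS", 16384),
        "limit_request_line": as_int("LIMIT_REQUEST_LINE", 8190),
        "keepalive": as_int("KEEPALIVE", 60),
        "max_retries": as_int("MAX_RETRIES", 3),
    }


# Example with an explicit environment and core count.
settings = load_settings({"GSB_API_KEY": "example-key", "TIMEOUT": "30"}, cores=2)
print(settings["workers"], settings["timeout"])
```

With two detected cores and no WORKERS override, the worker count resolves to 8 * 2 + 1 = 17.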
You can run the latest automated build from Docker Hub as follows:
docker run -e GSB_API_KEY=<your API key> -p 127.0.0.1:5000:5000 mlsecproject/gglsbl-rest
This will cause the service to listen on port 5000 of the host machine's loopback interface (127.0.0.1). Note that when the service first starts, it downloads a new local partial hash database from scratch before starting the REST service, so it might take several minutes to become available.
You can run docker logs --follow <container name/ID> to tail the output and determine when the gunicorn workers start, if necessary.
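If a script needs to wait for the service instead of a person watching the logs, one option is to poll the status endpoint until it answers. The sketch below assumes the status endpoint returns HTTP 200 once the workers are up; the function names, the retry count and the delay are arbitrary choices for illustration.

```python
import time
import urllib.request


def http_status(url):
    """Return the HTTP status code for a GET request to url."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status


def wait_until_ready(status_url, attempts=60, delay=10.0,
                     fetch=http_status, sleep=time.sleep):
    """Poll status_url until it answers 200.

    Returns the attempt number that succeeded, or None if every
    attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            if fetch(status_url) == 200:
                return attempt
        except OSError:
            # Connection refused, timeouts and urllib's URLError/HTTPError
            # are all OSError subclasses: treat them as "not ready yet".
            pass
        if attempt < attempts:
            sleep(delay)
    return None


if __name__ == "__main__":
    ready = wait_until_ready("http://127.0.0.1:5000/gglsbl/v1/status")
    print("ready" if ready else "gave up")
```

The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running container.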
In production, you might want to mount /home/gglsbl/db in a tmpfs RAM disk for improved performance. The recommended size is 4+ gigabytes, which is roughly twice the size of a freshly initialized database, but YMMV.
The REST service will respond to queries for /gglsbl/v1/lookup/<URL>. Make sure you percent-encode the URL you are querying. If no sign of maliciousness is found, the service responds with a 404 status. Otherwise, it returns a 200 response with a JSON body describing the matches.
Here's an example query and response:
$ curl "http://127.0.0.1:5000/gglsbl/v1/lookup/http%3A%2F%2Ftestsafebrowsing.appspot.com%2Fapiv4%2FANY_PLATFORM%2FSOCIAL_ENGINEERING%2FURL%2F"
{
  "matches": [
    {
      "platform": "ANY_PLATFORM",
      "threat": "SOCIAL_ENGINEERING",
      "threat_entry": "URL"
    },
    {
      "platform": "WINDOWS",
      "threat": "SOCIAL_ENGINEERING",
      "threat_entry": "URL"
    },
    {
      "platform": "CHROME",
      "threat": "SOCIAL_ENGINEERING",
      "threat_entry": "URL"
    },
    {
      "platform": "LINUX",
      "threat": "SOCIAL_ENGINEERING",
      "threat_entry": "URL"
    },
    {
      "platform": "ALL_PLATFORMS",
      "threat": "SOCIAL_ENGINEERING",
      "threat_entry": "URL"
    }
  ],
  "url": "http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/SOCIAL_ENGINEERING/URL/"
}
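The same lookup can be scripted. The following is a minimal client sketch using only the Python standard library; the helper names `build_lookup_url` and `lookup` are invented for this example. It percent-encodes the whole target URL (including slashes) and maps the 404 "no match" response to None.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_lookup_url(base, target_url):
    """Build the lookup endpoint URL with target_url fully percent-encoded."""
    # safe="" makes quote() encode "/" and ":" as well.
    encoded = urllib.parse.quote(target_url, safe="")
    return base.rstrip("/") + "/gglsbl/v1/lookup/" + encoded


def lookup(base, target_url, timeout=10):
    """Return the parsed JSON body on a match, or None on a 404."""
    request_url = build_lookup_url(base, target_url)
    try:
        with urllib.request.urlopen(request_url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no sign of maliciousness found
        raise


example = build_lookup_url(
    "http://127.0.0.1:5000",
    "http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/SOCIAL_ENGINEERING/URL/",
)
print(example)
```

Calling `lookup("http://127.0.0.1:5000", some_url)` against a running container would then return either a dictionary with "matches" and "url" keys or None.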
There's an additional /gglsbl/v1/status URL that you can access to check if the service is running and also get some indication of how old the current sqlite database is:
$ curl "http://127.0.0.1:5000/gglsbl/v1/status"
{
  "alternatives": [
    {
      "active": true,
      "ctime": "2017-10-30T20:20:55+0000",
      "mtime": "2017-10-30T20:20:55+0000",
      "name": "/home/gglsbl/db/sqlite.db",
      "size": 2079985664
    }
  ],
  "environment": "prod"
}
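For monitoring, the mtime field can be turned into a database age. The sketch below assumes the timestamp format shown in the sample output above and treats the entry marked "active" as the current database; the function name `active_db_age_seconds` is hypothetical.

```python
from datetime import datetime, timezone


def active_db_age_seconds(status, now=None):
    """Return the age in seconds of the active database, or None if absent."""
    now = now or datetime.now(timezone.utc)
    for alternative in status.get("alternatives", []):
        if alternative.get("active"):
            # Timestamps look like 2017-10-30T20:20:55+0000.
            mtime = datetime.strptime(alternative["mtime"], "%Y-%m-%dT%H:%M:%S%z")
            return (now - mtime).total_seconds()
    return None


sample_status = {
    "alternatives": [
        {
            "active": True,
            "ctime": "2017-10-30T20:20:55+0000",
            "mtime": "2017-10-30T20:20:55+0000",
            "name": "/home/gglsbl/db/sqlite.db",
            "size": 2079985664,
        }
    ],
    "environment": "prod",
}

# Pretend "now" is 30 minutes after the last modification.
reference = datetime(2017, 10, 30, 20, 50, 55, tzinfo=timezone.utc)
age = active_db_age_seconds(sample_status, now=reference)
print(age)
```

Since the cron job updates the database every 30 minutes, an age well beyond that could be a hint that updates are failing, though the exact threshold is a judgment call.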
- Niddel uses gglsbl-rest as an enrichment in its Magnet product;
- neonknight reports gglsbl-rest is used as a bridge between the fuglu mail filter engine and Google Safebrowsing API through a plug-in.
If your project or company is using gglsbl-rest and you would like it to be listed here, please open a GitHub issue and we'll include you.