diff --git a/README.md b/README.md index 0a301c7..c977754 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,8 @@ By flagging clients originating from these sources you can achieve a nice securi The databases created from the gathered data will be and stay open-source! +If you (*just*) want to keep track of abusers internally, you can also host your own dedicated instance of [this app](https://github.com/O-X-L/risk-db/blob/latest/src). + World Map Example ASN Chart Example @@ -48,6 +50,7 @@ You may also want to check out these projects: (*not open/free data*) * [CrowdSec](https://www.crowdsec.net/) * [AbuseIP-DB](https://www.abuseipdb.com/) * [IPInfo Privacy-DB](https://ipinfo.io/products/proxy-vpn-detection-api) +* [nitefood/asn CLI-Tools](https://github.com/nitefood/asn) ---- diff --git a/reporting/Graylog.md b/reporting/Graylog.md index c0db4f0..74696dc 100644 --- a/reporting/Graylog.md +++ b/reporting/Graylog.md @@ -4,6 +4,8 @@ We can create a Graylog Alert Notification to Report Abusers to this Risk-Databa You can find an example on how to split HAProxy logs into different fields here: [gist.github.com](https://gist.github.com/superstes/a2f6c5d855857e1f10dcb51255fe08c6#haproxy-split) (*via Pipeline Rules*) +Hint: You can use [Lookup Tables](https://graylog.org/post/how-to-use-graylog-lookup-tables/) to check whether an IP-Address is in your custom safe-ip-list and flag it for further filtering (*to exclude it from being reported*). + ## API Service As Graylog has no option to add advanced filters for the data sent by the notifications, we will have to add a minimal service to do so.
@@ -36,8 +38,8 @@ As Graylog has no option to add advanced filters for the data sent by the notifi app = Flask(__name__) - @app.route('/report-abuse/haproxy', methods=['POST']) - def report_abuse_haproxy(): + @app.route('/report-abuse', methods=['POST']) + def report_abuse(): unique_list = [] for log in request.json['backlog']: @@ -141,9 +143,9 @@ As Graylog has no option to add advanced filters for the data sent by the notifi `https:///alerts/notifications` -* **Title**: `Report Abuse - HAProxy` +* **Title**: `Report Abuse` * **Notification Type**: `HTTP Notification` -* **URL**: `http://127.0.0.1:8000/report-abuse/haproxy` +* **URL**: `http://127.0.0.1:8000/report-abuse` ### Create an Alert-Event @@ -152,13 +154,13 @@ As Graylog has no option to add advanced filters for the data sent by the notifi **Event Details**: - * **Title**: `HAProxy Abuse` + * **Title**: `Abuse` * **Priority**: `Low` **Condition**: * **Condition Type**: `Filter & Aggregation` - * **Streams**: Select your HAProxy Access-Log stream + * **Streams**: Select your App's Access-Log stream * **Search Query**: Filter Logs to only include blocks of your security filters. Also exclude your `safe-ips` and so on * **Search within the last**: 1 minute * **Execute search every**: 1 minute @@ -168,6 +170,6 @@ As Graylog has no option to add advanced filters for the data sent by the notifi **Notifications**: - * **Choose Notification**: `Report Abuse - HAProxy` + * **Choose Notification**: `Report Abuse` * **Grace Period**: Disable * **Message Backlog**: 500 (duplicates will be filtered by the API-service) diff --git a/src/README.md b/src/README.md index f78fb06..ea138dd 100644 --- a/src/README.md +++ b/src/README.md @@ -1,6 +1,8 @@ -# Risk-DB Generator +# Risk-DB Sources -This Python3 scripts are used to generate the Risk-Databases from the reports we received. +These Python3 scripts are used for building and managing the Risk-DB. + +You can also run your own dedicated instances of these services. 
We want to be transparent. All code that is not security-related will be Open-Source. @@ -9,3 +11,7 @@ We want to be transparent. All code that is not security-related will be Open-So Contributions like [reporting issues](https://github.com/O-X-L/risk-db/issues/new), [engaging in discussions](https://github.com/O-X-L/risk-db/discussions) or [PRs](https://github.com/O-X-L/risk-db/pulls) are welcome! Feel free to share your opinion about possible optimizations/extensions. + +## Docker + +Dockerized services will be added later on. diff --git a/src/api/README.md b/src/api/README.md new file mode 100644 index 0000000..8c8a929 --- /dev/null +++ b/src/api/README.md @@ -0,0 +1,76 @@ +# Risk-DB API + +This Python3 script provides the Risk-DB API. + +We want to be transparent. All code that is not security-related will be Open-Source. + +## Contribute + +Contributions like [reporting issues](https://github.com/O-X-L/risk-db/issues/new), [engaging in discussions](https://github.com/O-X-L/risk-db/discussions) or [PRs](https://github.com/O-X-L/risk-db/pulls) are welcome! + +Feel free to share your opinion about possible optimizations/extensions.
+ +---- + +## Service User + +To run the API as a non-root user, you need to create a dedicated service user: + +```bash +useradd -U --shell /usr/sbin/nologin --home-dir /var/local/lib/risk-db --create-home risk-db +``` + +---- + +## VirtualEnv + +You need to create a Python3 virtualenv to run this app: + +```bash +sudo apt install python3-virtualenv +python3 -m virtualenv /var/local/lib/risk-db/venv +source /var/local/lib/risk-db/venv/bin/activate +pip install flask waitress maxminddb +``` + +---- + +## Service + +You can run it as a systemd service: + +``` +# file: /etc/systemd/system/risk-db.service + +[Unit] +Description=Service to run OXL Risk-DB API Service +Documentation=https://github.com/O-X-L/oxl-riskdb + +[Service] +Type=simple +Environment=PYTHONUNBUFFERED=1 +WorkingDirectory=/var/local/lib/risk-db +ExecStart=/bin/bash -c 'source /var/local/lib/risk-db/venv/bin/activate && \ + python3 /var/local/lib/risk-db/main.py' +User=risk-db +Group=risk-db +Restart=on-failure +RestartSec=10s + +StandardOutput=journal +StandardError=journal +SyslogIdentifier=oxl-riskdb + +[Install] +WantedBy=multi-user.target +``` + +Enable & Start: + +``` +systemctl enable risk-db.service +systemctl start risk-db.service +``` + + + diff --git a/src/api/main.py b/src/api/main.py new file mode 100644 index 0000000..47247a6 --- /dev/null +++ b/src/api/main.py @@ -0,0 +1,224 @@ +#!/usr/bin/env python3 + +from ipaddress import IPv4Address, IPv6Address, AddressValueError, IPv4Interface, IPv6Interface +from re import sub as regex_replace +from threading import Lock +from json import dumps as json_dumps +from json import loads as json_loads +from time import time +from socket import gethostname +from pathlib import Path +from datetime import datetime + +from flask import Flask, request, Response, json, redirect +from waitress import serve +import maxminddb + +app = Flask('risk-db') +BASE_DIR = Path('/var/local/lib/risk-db') +RISKY_DB_FILE = { + 4: BASE_DIR / 'risk_ip4_med.mmdb', + 6: BASE_DIR /
'risk_ip6_med.mmdb', +} +ASN_JSON_FILE = BASE_DIR / 'risk_asn_med.json' +NET_JSON_FILES = { + 4: BASE_DIR / 'risk_net4_med.json', + 6: BASE_DIR / 'risk_net6_med.json', +} + +RISK_CATEGORIES = ['bot', 'attack', 'crawler', 'rate', 'hosting', 'vpn', 'proxy', 'probe'] +RISK_REPORT_DIR = BASE_DIR / 'reports' +TOKENS = [] +NET_SIZE = {4: '24', 6: '64'} +report_lock = Lock() + + +def _valid_ipv4(ip: str) -> bool: + try: + IPv4Address(ip) + return True + + except AddressValueError: + return False + + +def _valid_public_ip(ip: str) -> bool: + ip = str(ip) + try: + ip = IPv4Address(ip) + return ip.is_global and \ + not ip.is_loopback and \ + not ip.is_reserved and \ + not ip.is_multicast and \ + not ip.is_link_local + + except AddressValueError: + try: + ip = IPv6Address(ip) + return ip.is_global and \ + not ip.is_loopback and \ + not ip.is_reserved and \ + not ip.is_multicast and \ + not ip.is_link_local + + except AddressValueError: + return False + + +def _valid_asn(_asn: str) -> bool: + return _asn.isdigit() and 0 <= int(_asn) <= 4_294_967_294 + + +def _safe_comment(cmt: str) -> str: + return regex_replace(r'[^\sa-zA-Z0-9_=+.-]', '', cmt)[:50] + + +def _response_json(code: int, data: dict) -> Response: + return app.response_class( + response=json.dumps(data, indent=2), + status=code, + mimetype='application/json' + ) + + +def _get_ipv(ip: str) -> int: + if _valid_ipv4(ip): + return 4 + + return 6 + + +def _get_src_ip() -> str: + if _valid_public_ip(request.remote_addr): + return request.remote_addr + + if 'X-Real-IP' in request.headers: + return request.headers['X-Real-IP'].replace('::ffff:', '') + + if 'X-Forwarded-For' in request.headers: + return request.headers['X-Forwarded-For'].replace('::ffff:', '') + + return request.remote_addr + + +# curl -XPOST https://risk.oxl.app/api/report --data '{"ip": "1.1.1.1", "cat": "bot"}' -H 'Content-Type: application/json' +@app.route('/api/report', methods=['POST']) +def report() -> Response: + if 'Content-Type' not in 
request.headers or request.headers['Content-Type'] != 'application/json': + return _response_json(code=400, data={'msg': 'Expected JSON'}) + + data = request.get_json() + + if 'ip' in data and data['ip'].startswith('::ffff:'): + data['ip'] = data['ip'].replace('::ffff:', '') + + if 'ip' not in data or not _valid_public_ip(data['ip']): + return _response_json(code=400, data={'msg': 'Invalid IP provided'}) + + if 'cat' not in data or data['cat'].lower() not in RISK_CATEGORIES: + return _response_json( + code=400, + data={'msg': f'Invalid Category provided - must be one of: {RISK_CATEGORIES}'}, + ) + + r = { + 'ip': data['ip'], 'cat': data['cat'].lower(), 'time': int(time()), + 'v': 4 if _valid_ipv4(data['ip']) else 6, 'cmt': None, 'token': None, 'by': _get_src_ip(), + } + + if 'cmt' in data: + r['cmt'] = _safe_comment(data['cmt']) + + if 'Token' in request.headers and request.headers['Token'] in TOKENS: + r['token'] = request.headers['Token'] + + out_file = RISK_REPORT_DIR / f'{datetime.now().strftime("%Y-%m-%d")}_{gethostname()}.txt' + with report_lock: + with open(out_file, 'a+', encoding='utf-8') as f: + f.write(json_dumps(r) + '\n') + + return _response_json(code=200, data={'msg': 'Reported'}) + + +@app.route('/api/ip/', methods=['GET']) +def check(ip) -> Response: + if ip.startswith('::ffff:'): + ip = ip.replace('::ffff:', '') + + if not _valid_public_ip(ip): + return _response_json(code=400, data={'msg': 'Invalid IP provided'}) + + try: + with maxminddb.open_database(RISKY_DB_FILE[_get_ipv(ip)]) as m: + r = m.get(ip) + if r is None: + return _response_json(code=404, data={'msg': 'Provided IP not reported'}) + + return _response_json(code=200, data=r) + + except FileNotFoundError: + return _response_json(code=404, data={'msg': 'Temporary lookup failure'}) + + +@app.route('/api/net/', methods=['GET']) +def check_net(ip) -> Response: + if ip.startswith('::ffff:'): + ip = ip.replace('::ffff:', '') + + if ip.find('/') != -1: + ip = ip.split('/', 1)[0] + + if not
_valid_public_ip(ip): + return _response_json(code=400, data={'msg': 'Invalid IP provided'}) + + ipv = _get_ipv(ip) + + if ipv == 4: + net = IPv4Interface(f"{ip}/{NET_SIZE[ipv]}").network.network_address.compressed + + else: + net = IPv6Interface(f"{ip}/{NET_SIZE[ipv]}").network.network_address.compressed + + net = f"{net}/{NET_SIZE[ipv]}" + + try: + return _response_json(code=200, data={**NET_DATA[ipv][net], 'network': net}) + + except KeyError: + return _response_json(code=404, data={'msg': 'Provided network not reported'}) + + +@app.route('/api/asn/', methods=['GET']) +def check_asn(nr) -> Response: + if not _valid_asn(nr): + return _response_json(code=400, data={'msg': 'Invalid ASN provided'}) + + try: + return _response_json(code=200, data=ASN_DATA[str(nr)]) + + except KeyError: + return _response_json(code=404, data={'msg': 'Provided ASN not reported'}) + + +@app.route('/') +def catch_base(): + return redirect(f"/api/ip/{_get_src_ip()}", code=302) + + +@app.route('/') +def catch_all(path): + del path + return redirect(f"/api/ip/{_get_src_ip()}", code=302) + + +if __name__ == '__main__': + with open(ASN_JSON_FILE, 'r', encoding='utf-8') as f: + ASN_DATA = json_loads(f.read()) + + NET_DATA = {} + + for _ipv, file in NET_JSON_FILES.items(): + with open(file, 'r', encoding='utf-8') as f: + NET_DATA[_ipv] = json_loads(f.read()) + + serve(app, host='127.0.0.1', port=8000) diff --git a/src/builder/README.md b/src/builder/README.md new file mode 100644 index 0000000..bca2fea --- /dev/null +++ b/src/builder/README.md @@ -0,0 +1,11 @@ +# Risk-DB Generator + +These Python3 scripts are used to generate the Risk-Databases from the reports we received. + +We want to be transparent. All code that is not security-related will be Open-Source. + +## Contribute + +Contributions like [reporting issues](https://github.com/O-X-L/risk-db/issues/new), [engaging in discussions](https://github.com/O-X-L/risk-db/discussions) or [PRs](https://github.com/O-X-L/risk-db/pulls) are welcome! 
+ +Feel free to share your opinion about possible optimizations/extensions. diff --git a/src/build.py b/src/builder/build.py similarity index 99% rename from src/build.py rename to src/builder/build.py index 8038833..ffb1a6d 100644 --- a/src/build.py +++ b/src/builder/build.py @@ -143,7 +143,7 @@ def build_dbs_net(networks: dict): for n, nv in net_list.items(): ipv = nv.pop('ipv') nv = {**nv, **net_asn_info(n)} - n = f"{n}/{BGP_NET_SIZE[ipv]}" + n = f"{n}/{NET_SIZE[ipv]}" if ipv =='4': json4[n] = nv diff --git a/src/config.py b/src/builder/config.py similarity index 97% rename from src/config.py rename to src/builder/config.py index 1441cba..b4f423b 100644 --- a/src/config.py +++ b/src/builder/config.py @@ -41,4 +41,4 @@ 'info': 3 }, } -BGP_NET_SIZE = {'4': '24', '6': '48'} +NET_SIZE = {'4': '24', '6': '64'} diff --git a/src/enrich_data.py b/src/builder/enrich_data.py similarity index 100% rename from src/enrich_data.py rename to src/builder/enrich_data.py diff --git a/src/example_reports.txt b/src/builder/example_reports.txt similarity index 100% rename from src/example_reports.txt rename to src/builder/example_reports.txt diff --git a/src/kind/hosting.txt b/src/builder/kind/hosting.txt similarity index 100% rename from src/kind/hosting.txt rename to src/builder/kind/hosting.txt diff --git a/src/kind/proxy.txt b/src/builder/kind/proxy.txt similarity index 100% rename from src/kind/proxy.txt rename to src/builder/kind/proxy.txt diff --git a/src/kind/scanner.txt b/src/builder/kind/scanner.txt similarity index 100% rename from src/kind/scanner.txt rename to src/builder/kind/scanner.txt diff --git a/src/kind/vpn.txt b/src/builder/kind/vpn.txt similarity index 100% rename from src/kind/vpn.txt rename to src/builder/kind/vpn.txt diff --git a/src/load_reports.py b/src/builder/load_reports.py similarity index 100% rename from src/load_reports.py rename to src/builder/load_reports.py diff --git a/src/main.py b/src/builder/main.py similarity index 100% rename from 
src/main.py rename to src/builder/main.py diff --git a/src/reputation.py b/src/builder/reputation.py similarity index 100% rename from src/reputation.py rename to src/builder/reputation.py diff --git a/src/requirements.txt b/src/builder/requirements.txt similarity index 100% rename from src/requirements.txt rename to src/builder/requirements.txt diff --git a/src/util.py b/src/builder/util.py similarity index 62% rename from src/util.py rename to src/builder/util.py index abe4557..5ced1b5 100644 --- a/src/util.py +++ b/src/builder/util.py @@ -1,7 +1,7 @@ from time import time from ipaddress import IPv4Address, AddressValueError, IPv4Interface, IPv6Interface -from config import BGP_NET_SIZE +from config import NET_SIZE start_time = time() @@ -22,11 +22,11 @@ def get_ip_version(ip: str) -> str: def get_network_address(ip: str) -> str: try: IPv4Address(ip) - return IPv4Interface(f"{ip}/{BGP_NET_SIZE['4']}").network.network_address.compressed + return IPv4Interface(f"{ip}/{NET_SIZE['4']}").network.network_address.compressed except AddressValueError: - return IPv6Interface(f"{ip}/{BGP_NET_SIZE['6']}").network.network_address.compressed + return IPv6Interface(f"{ip}/{NET_SIZE['6']}").network.network_address.compressed # def get_network_cidr(ip: str) -> str: -# return f"{get_network_address(ip)}/{BGP_NET_SIZE[get_ip_version(ip)]}" +# return f"{get_network_address(ip)}/{NET_SIZE[get_ip_version(ip)]}" diff --git a/src/write.py b/src/builder/write.py similarity index 100% rename from src/write.py rename to src/builder/write.py
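This change renames `BGP_NET_SIZE` to `NET_SIZE` and moves IPv6 grouping from /48 to /64 in both the builder and the new API. The sketch below is a minimal illustration, not code from this repository. It shows how a single address maps to the network key that `check_net()` and `get_network_address()` build. The helper name `network_key` is hypothetical, and the example addresses come from the documentation ranges. It uses `ipaddress.ip_network(..., strict=False)` in place of the `IPv4Interface`/`IPv6Interface` calls in the diff, and should produce the same network address.

```python
from ipaddress import ip_address, ip_network

# Prefix sizes mirroring NET_SIZE in src/api/main.py (IPv4 /24, IPv6 /64).
NET_SIZE = {4: '24', 6: '64'}


def network_key(ip: str, sizes: dict = NET_SIZE) -> str:
    """Hypothetical helper: return the CIDR network a single address is grouped under."""
    # Strip the IPv4-mapped IPv6 prefix, as the API handlers do.
    if ip.startswith('::ffff:'):
        ip = ip.replace('::ffff:', '')

    addr = ip_address(ip)
    # strict=False zeroes the host bits instead of raising ValueError.
    net = ip_network(f"{addr}/{sizes[addr.version]}", strict=False)
    return net.with_prefixlen


if __name__ == '__main__':
    print(network_key('198.51.100.23'))      # 198.51.100.0/24
    print(network_key('2001:db8:1:2:3::1'))  # 2001:db8:1:2::/64
    # Previous IPv6 grouping size, for comparison.
    print(network_key('2001:db8:1:2:3::1', sizes={4: '24', 6: '48'}))  # 2001:db8:1::/48
```

With /64, two clients in `2001:db8:1:2::/64` and `2001:db8:1:3::/64` now count as separate networks. Under the old /48 size they both fell into `2001:db8:1::/48`.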