Merge pull request #2 from jerry871002/make-package
Make PyBitShred into a package
jerry871002 authored Jul 16, 2024
2 parents 719b854 + 962cd9d commit ee8f2eb
Showing 12 changed files with 127 additions and 37 deletions.
3 changes: 0 additions & 3 deletions .gitmodules

This file was deleted.

19 changes: 0 additions & 19 deletions Dockerfile

This file was deleted.

21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Jerry Yang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
63 changes: 59 additions & 4 deletions README.md
@@ -1,14 +1,69 @@
# bitshred-python
Bitshred python implementation
# PyBitShred

PyBitShred is a Python reimplementation of [BitShred](https://github.com/dbrumley/bitshred), a tool for large-scale malware similarity analysis and clustering.

## Build (Docker)
This is my fall 2023 semester project at EURECOM, France, supervised by Antonino Vitale and Prof. Simone Aonzo. We implemented the tool in Python to make it more accessible and easier to use, and added two new modes: the "all section" mode and the "raw file" mode.

Check the presentation slides [here](https://docs.google.com/presentation/d/1L-4U6vH8q7YYatx5d8Q5gbiRkpo0UWAFfMVTqlKDWVw/edit?usp=sharing).

## Getting Started

### Installation

```bash
pip install pybitshred
```

### Usage

Use the following command to see the available options:

```bash
pybitshred -h
```

The BitShred pipeline has three stages: ***update***, ***compare***, and ***cluster***. The ***update*** stage adds new samples to the database, the ***compare*** stage compares the samples already in the database, and the ***cluster*** stage groups those samples into clusters.

Check the [original paper](https://users.ece.cmu.edu/~dbrumley/pdf/Jang,%20Brumley,%20Venkataraman_2011_BitShred%20Feature%20Hashing%20Malware%20for%20Scalable%20Triage%20and%20Semantic%20Analysis.pdf) for more details.
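
The three stages can be sketched roughly as follows. This is an illustrative sketch, not PyBitShred's actual code: the fingerprint size `FP_BITS`, the n-gram length `NGRAM`, every function name, and the single-linkage merge are assumptions made for the example. Only the use of a djb2 hash, a Jaccard measure, and the 0.6 default threshold are taken from names and defaults that appear in the package.

```python
from itertools import combinations
from typing import Dict, List, Set

FP_BITS = 1 << 16  # hypothetical fingerprint size in bits
NGRAM = 4          # hypothetical n-gram ("shred") length in bytes


def djb2(data: bytes) -> int:
    """Classic djb2 hash (h = h * 33 + byte), truncated to 32 bits."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def fingerprint(data: bytes) -> int:
    """Update stage: hash each n-gram to one bit of a bit vector (a Python int)."""
    bits = 0
    for i in range(len(data) - NGRAM + 1):
        bits |= 1 << (djb2(data[i:i + NGRAM]) % FP_BITS)
    return bits


def jaccard(a: int, b: int) -> float:
    """Compare stage: shared set bits divided by total set bits."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


def cluster(fps: Dict[str, int], threshold: float = 0.6) -> List[Set[str]]:
    """Cluster stage: merge samples whose similarity meets the threshold."""
    parent = {name: name for name in fps}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fps, 2):
        if jaccard(fps[a], fps[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in fps:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

Storing the fingerprint as a single integer keeps the comparison to two bitwise operations and two bit counts, which is the property that makes pairwise comparison over a large database practical.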


## Running Original BitShred

@im-overlord04 wrote a `Dockerfile` to run the original BitShred tool in a container. This is useful as a reference to compare the results of the original tool with the results of the reimplemented tool.

```dockerfile
FROM ubuntu:14.04

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
apt-get install --no-install-recommends -y \
wget git automake libtool make cmake gcc g++ pkg-config libmagic-dev \
tar unzip libglib2.0-0 libssl-dev libdb-dev
RUN ld -v
ARG GIT_SSL_NO_VERIFY=1
RUN apt-get install -y automake1.11 binutils-dev

COPY bitshred/bitshred_single_steps bitshred_single_steps
RUN cd bitshred_single_steps/ && ./configure && make

COPY bitshred/bitshred_single bitshred_single
RUN cd bitshred_single/ && ./configure && make

COPY bitshred/bitshred_openmp bitshred_openmp
RUN cd bitshred_openmp/ && make

WORKDIR /
```


### Build

```bash
docker build . -t nino:bitshred
```

## Usage (Docker)
### Usage

```bash
docker run --rm --volume <input_folder>:/input:ro --volume <output_folder>:/db --entrypoint <entrypoint>/src/bitshred nino:bitshred <options>
1 change: 0 additions & 1 deletion bitshred
Submodule bitshred deleted from 9f3ea3
34 changes: 34 additions & 0 deletions pyproject.toml
@@ -0,0 +1,34 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "pybitshred"
version = "0.1.1"
authors = [
{ name="Jerry Yang", email="[email protected]" }
]
description = "A tool for large-scale malware similarity analysis and clustering"
readme = "README.md"
requires-python = ">=3.8"
classifiers = [
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Topic :: Security"
]
dependencies = [
"pefile==2023.2.7",
"psutil==6.0.0"
]

[project.scripts]
pybitshred = "pybitshred.bitshred:main"

[project.urls]
Homepage = "https://github.com/jerry871002/pybitshred"
Issues = "https://github.com/jerry871002/pybitshred/issues"
Empty file added src/pybitshred/__init__.py
Empty file.
File renamed without changes.
15 changes: 9 additions & 6 deletions bitshred-python/bitshred.py → src/pybitshred/bitshred.py
@@ -1,12 +1,14 @@
 import argparse
 import logging
 
-from fingerprint_db import cluster_fingerprint_db, compare_fingerprint_db, update_fingerprint_db
+from .fingerprint_db import cluster_fingerprint_db, compare_fingerprint_db, update_fingerprint_db
 
 logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s', level=logging.DEBUG)
 
 
-def main(args: argparse.Namespace) -> None:
+def main() -> None:
+    args = parse_cli_arguments()
+
     if args.binary or args.raw:
         update_fingerprint_db(
             binary=args.binary,
@@ -22,8 +24,7 @@ def main(args: argparse.Namespace) -> None:
     elif args.cluster:
         cluster_fingerprint_db(args.db, args.jacard)
 
-
-if __name__ == '__main__':
+def parse_cli_arguments() -> argparse.Namespace:
     parser = argparse.ArgumentParser(description='BitShred reimplementation in Python')
 
     execution_mode = parser.add_mutually_exclusive_group(required=True)
@@ -55,6 +56,8 @@ def main(args: argparse.Namespace) -> None:
     )
     parser.add_argument('-j', '--jacard', help='Set Jaccard threshold', default=0.6, type=float)
 
-    args = parser.parse_args()
+    return parser.parse_args()
 
-    main(args)
+
+if __name__ == '__main__':
+    main()
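
Note on the `main()` change: the `[project.scripts]` entry `pybitshred = "pybitshred.bitshred:main"` makes the installed command call `main` with no arguments, which is why argument parsing moves out of the `if __name__ == '__main__':` block. The pattern can be sketched as below. The optional `argv` parameter and the `int` return value are additions for testability and are not part of this commit; only the `-j/--jacard` option and its default are copied from the diff.

```python
import argparse
from typing import Optional, Sequence


def parse_cli_arguments(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Build the parser; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser(description='Entry point sketch')
    parser.add_argument('-j', '--jacard', help='Set Jaccard threshold', default=0.6, type=float)
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Callable with no arguments, as a console-script entry point requires."""
    args = parse_cli_arguments(argv)
    print(f'threshold={args.jacard}')
    return 0  # becomes the process exit status when wrapped in sys.exit(main())


if __name__ == '__main__':
    raise SystemExit(main())
```

Keeping `argv` optional lets tests call `main(['-j', '0.8'])` directly while the installed command still works with a bare `main()`.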
@@ -2,7 +2,7 @@
 from dataclasses import dataclass
 from typing import Literal
 
-from utils import bit_count, bit_vector_set
+from .utils import bit_count, bit_vector_set
 
 
 @dataclass
@@ -11,9 +11,9 @@
 
 import psutil
 
-from binary_file import BinaryFile, initailaize_binary_file
-from fingerprint import Fingerprint, create_fingerprint, fingerprint_encoder, jaccard_distance
-from utils import djb2_hash
+from .binary_file import BinaryFile, initailaize_binary_file
+from .fingerprint import Fingerprint, create_fingerprint, fingerprint_encoder, jaccard_distance
+from .utils import djb2_hash
 
 FINGERPRINT_BASE = 'fingerprints'
 JACCARD_BASE = 'jaccard'
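
Note on the import changes: once the modules live inside the `pybitshred` package, a sibling module is no longer importable by its bare name, so `from utils import ...` has to become `from .utils import ...`. The self-contained sketch below reproduces this in a throwaway package. The names `demo_pkg_bitshred` and `_demo_utils` are hypothetical and chosen only to avoid clashing with real modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Dict

PKG = 'demo_pkg_bitshred'  # hypothetical package name for the demo


def build_demo_package(root: Path) -> None:
    """Create a package with one helper and two modules that import it."""
    pkg = root / PKG
    pkg.mkdir()
    (pkg / '__init__.py').write_text('')
    (pkg / '_demo_utils.py').write_text('def helper():\n    return 42\n')
    # Relative import: resolved against the package itself.
    (pkg / 'relative.py').write_text('from ._demo_utils import helper\n')
    # Bare import: looked up as a top-level module on sys.path.
    (pkg / 'absolute.py').write_text('from _demo_utils import helper\n')


def try_import(name: str) -> str:
    """Return 'ok' or the name of the import error that was raised."""
    try:
        importlib.import_module(name)
        return 'ok'
    except ImportError as exc:
        return type(exc).__name__


def run_demo() -> Dict[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)  # only the package's parent is on sys.path
        importlib.invalidate_caches()
        try:
            return {m: try_import(f'{PKG}.{m}') for m in ('relative', 'absolute')}
        finally:
            sys.path.remove(tmp)


if __name__ == '__main__':
    print(run_demo())
```

The bare import only worked before this commit because running a script directly puts the script's own directory on `sys.path`; an installed package gets no such entry.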
File renamed without changes.
