combine capa detection with tool detection to attempt to alleviate fns
evandowning committed Feb 26, 2021
1 parent c7eed5d commit fa0ec0c
Showing 12 changed files with 421 additions and 84 deletions.
29 changes: 16 additions & 13 deletions README.md
@@ -105,6 +105,12 @@ For technical details, please see the paper cited below.

## Grading
- Here we provide real malware binaries compiled from source code which have been [open-sourced or leaked](https://thezoo.morirt.com/). **These are real malware. Do NOT execute these binaries. They should be used for educational purposes only.**
- [Download](https://github.com/fireeye/capa/releases) CAPA release binary and move it to `grader/capa/capa`
- Extract CAPA results
```
(dr) $ cd grader/capa/
(dr) $ ./output_data.sh
```
- Graph ROC curves
```
(dr) $ cd grader/
@@ -135,26 +141,23 @@ For technical details, please see the paper cited below.
- Sort functions by MSE value to list TPs before FPs
- Our intuition is that functions more unrecognizable by the autoencoder are more likely to be malicious.
- Sort functions by number of basic blocks to list TPs before FPs
- We observed that a lot of malicious functions are larger than benign functions.
- **TODO** - Sort functions by uniqueness compared to other malware samples in population
- An analyst might want to prioritize seeing *unique* functions first.
- Sort functions randomly
- We observed that (on average) malicious functions from our ground-truth samples have more basic blocks than benign functions.
- Sort functions randomly to list TPs before FPs
- This is a gut-check to make sure something naive won't work better
- Sort functions by address
- Sometimes core functionalities are implemented before others. But this is a poor assumption (obviously).
- Sort functions by address to list TPs before FPs
- It is *obviously* a poor assumption that malicious functionalities would appear in the binary in a specific order linearly.
- This is a gut-check to make sure something naive won't work better
- Grade each option from above
```
# Run "roc.sh" above first
(dr) $ ./grade_sort.sh 9.053894787328584e-08 > grade_stdout.txt
(dr) $ vim grade_sort_stdout.txt
```
- **TODO** - Reduce FNs
- Signature-based solutions can be used to identify known functionalities, and thus could catch FNs missed by this tool.
- Grade each option from above
```
(dr) $ ./grade_fp.sh 9.053894787328584e-08 > grade_fp_stdout.txt
(dr) $ vim grade_fp_stdout.txt
```
- Reduce FNs
- Signature-based solutions can be used to identify *known* functionalities, and thus could catch FNs missed by DeepReflect.
- If CAPA identifies a function, it's marked. Else it gets a score from DeepReflect.
- See above grader section for this option's results
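
The rule in the last bullets (a function flagged by CAPA is marked, otherwise it keeps its DeepReflect score) can be sketched as below. This is a minimal illustration on made-up scores and labels, not output from the grader. The helper names `combine_scores` and `false_negatives` are hypothetical, and the sketch assumes a function counts as reported when its score is at or above the threshold passed to the grading scripts.

```python
import numpy as np

def combine_scores(dr_score, capa_flagged):
    """Return 1.0 where CAPA flagged the function, else the DeepReflect score."""
    dr_score = np.asarray(dr_score, dtype=float)
    capa_flagged = np.asarray(capa_flagged, dtype=bool)
    return np.where(capa_flagged, 1.0, dr_score)

def false_negatives(score, label, threshold):
    """Count malicious functions (label 1) whose score falls below the threshold."""
    score = np.asarray(score, dtype=float)
    label = np.asarray(label)
    return int(np.sum((label == 1) & (score < threshold)))

# Toy data (assumed for illustration): four functions, three of them malicious.
threshold = 9.053894787328584e-08
dr_score = [2.0e-07, 1.0e-08, 3.0e-08, 5.0e-09]
label = [1, 1, 1, 0]
capa_flagged = [False, True, False, False]

combined = combine_scores(dr_score, capa_flagged)
fn_before = false_negatives(dr_score, label, threshold)
fn_after = false_negatives(combined, label, threshold)
print(fn_before, fn_after)
```

In this toy case one of the two functions missed by the DeepReflect score alone is recovered because CAPA flagged it. A function that neither tool recognizes remains a false negative.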

## FAQs
- Why don't you release the binaries used to train and evaluate DeepReflect (other than ground-truth samples)?
97 changes: 97 additions & 0 deletions grader/capa/dr_plus_capa.py
@@ -0,0 +1,97 @@
import sys
import os
import argparse
import json
import numpy as np

def get_data(capaFN, drFN):
    # Load CAPA results
    content = json.loads(open(capaFN,'r').read())

    capa_result = dict()

    for rule_name in content['rules']:

        # Skip rule if... (from: https://github.com/fireeye/capa/blob/8510f0465122ef11c1d259e47eadc0b0f6946f6c/capa/render/utils.py#L31)
        rule = content['rules'][rule_name]
        if rule["meta"].get("lib"):
            continue
        if rule["meta"].get("capa/subscope"):
            continue
        if rule["meta"].get("maec/analysis-conclusion"):
            continue
        if rule["meta"].get("maec/analysis-conclusion-ov"):
            continue
        if rule["meta"].get("maec/malware-category"):
            continue
        if rule["meta"].get("maec/malware-category-ov"):
            continue

        scope = content['rules'][rule_name]['meta']['scope']

        for addr in content['rules'][rule_name]['matches']:
            success = content['rules'][rule_name]['matches'][addr]['success']

            if success is True:
                capa_result[int(addr)] = '{0}: {1}'.format(rule_name,scope)

    # Load DeepReflect results
    deepreflect_result = np.load(drFN)
    dr_addr = deepreflect_result['addr']
    dr_y = deepreflect_result['y']
    dr_score = deepreflect_result['score']


    addr = list()
    score = list()
    label = list()

    # For each DeepReflect address, determine if CAPA flagged it
    for e,a in enumerate(dr_addr):
        l = dr_y[e]

        # If address in capa results, it means CAPA has flagged this function
        if a in capa_result.keys():
            s = 1.0
        # Else, score is DR score
        else:
            s = dr_score[e]

        addr.append(a)
        score.append(s)
        label.append(l)

    return addr,score,label

def _main():
    # Each argument comes in pairs [CAPA json, DR npz]
    length = len(sys.argv) - 2
    if length % 2 != 0:
        sys.stderr.write('Error, arguments incorrect\n')
        sys.exit(2)

    # Last argument is output numpy file
    outFN = sys.argv[-1]

    addr = list()
    score = list()
    label = list()

    # For each file pair
    for i in range(1,length,2):
        capaFN = sys.argv[i]
        drFN = sys.argv[i+1]

        a,s,l = get_data(capaFN,drFN)
        addr.extend(a)
        score.extend(s)
        label.extend(l)

    # Output data file
    np.savez(outFN,
             y=np.asarray(label),
             score=np.asarray(score),
             addr=np.asarray(addr))

if __name__ == '__main__':
    _main()
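
To look at the CAPA-side filtering in isolation, here is a small self-contained sketch of the rule loop in `get_data`. The JSON layout is inferred only from the keys the script reads (`rules`, `meta`, `matches`, `success`). The sample document and the helper name `flagged_addresses` are invented for illustration and are not real CAPA output.

```python
import json

# Metadata keys that cause a rule to be skipped, mirroring get_data() above.
SKIP_META_KEYS = (
    "lib",
    "capa/subscope",
    "maec/analysis-conclusion",
    "maec/analysis-conclusion-ov",
    "maec/malware-category",
    "maec/malware-category-ov",
)

def flagged_addresses(content):
    """Map each successfully matched address to 'rule name: scope'."""
    result = {}
    for rule_name, rule in content["rules"].items():
        meta = rule["meta"]
        # Skip library, subscope, and MAEC summary rules.
        if any(meta.get(key) for key in SKIP_META_KEYS):
            continue
        for addr, match in rule["matches"].items():
            if match["success"] is True:
                result[int(addr)] = "{0}: {1}".format(rule_name, meta["scope"])
    return result

# Hypothetical sample document (assumed shape, not real CAPA output).
sample = json.loads("""
{
  "rules": {
    "create process": {
      "meta": {"scope": "function"},
      "matches": {"4198400": {"success": true}, "4202496": {"success": false}}
    },
    "contain loop": {
      "meta": {"scope": "function", "lib": true},
      "matches": {"4206592": {"success": true}}
    }
  }
}
""")

flagged = flagged_addresses(sample)
print(flagged)
```

Only the successful match of the non-library rule survives. As in the script, a later rule matching the same address would overwrite the earlier description, which does not matter for scoring because only membership is checked.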
4 changes: 2 additions & 2 deletions grader/capa/output_data.py
@@ -49,7 +49,7 @@ def get_data(capaFN, drFN):
    for e,a in enumerate(dr_addr):
        l = dr_y[e]

        # If address, in capa results, it means CAPA has flagged this function
        # If address in capa results, it means CAPA has flagged this function
        if a in capa_result.keys():
            s = 1.0
        else:
@@ -62,7 +62,7 @@ def get_data(capaFN, drFN):
    return addr,score,label

def _main():
    # Each argument comes in pairs [CAPA json, annotation]
    # Each argument comes in pairs [CAPA json, DR npz]
    length = len(sys.argv) - 2
    if length % 2 != 0:
        sys.stderr.write('Error, arguments incorrect\n')
61 changes: 24 additions & 37 deletions grader/capa/output_data.sh
@@ -1,56 +1,43 @@
#!/bin/bash

function capa() {
target="${1}/${2}"
output="${3}_${2}.json"
target="${1}"
output="${2}.json"

# Extract CAPA data
./capa -j "$target" > "$output"
}

# Rbot
echo "Rbot"
root="../../malware-gt/old/"
root="../malware/"
family="rbot"
name="rbot.exe"
capa $root $name $family
python output_data.py "${family}_${name}.json" "../rbot_final_corrected/rbot_ae_acfg_plus_roc_data_func.npz" \
"${family}_capa_data_func.npz"
base="${root}/${family}/"
mkdir -p "${family}"
capa "${base}/rbot.exe" "${family}/rbot"

# Pegasus
echo "Pegasus"
root="../../malware-gt/new/pegasus/binres"
root="../malware/"
family="pegasus"
capa $root "idd.x32" $family
capa $root "mod_CmdExec.x32" $family
capa $root "mod_DomainReplication.x32" $family
capa $root "mod_LogonPasswords.x32" $family
capa $root "mod_NetworkConnectivity.x32" $family
capa $root "rse.x32" $family
python output_data.py "${family}_idd.x32.json" "../pegasus_final/pegasus_ae_acfg_plus_roc_pegasus_idd_data_func.npz" \
"${family}_mod_CmdExec.x32.json" "../pegasus_final/pegasus_ae_acfg_plus_roc_pegasus_mod_cmdexec_data_func.npz" \
"${family}_mod_DomainReplication.x32.json" "../pegasus_final/pegasus_ae_acfg_plus_roc_pegasus_mod_domainreplication_data_func.npz" \
"${family}_mod_LogonPasswords.x32.json" "../pegasus_final/pegasus_ae_acfg_plus_roc_pegasus_mod_logonpasswords_data_func.npz" \
"${family}_mod_NetworkConnectivity.x32.json" "../pegasus_final/pegasus_ae_acfg_plus_roc_pegasus_mod_networkconnectivity_data_func.npz" \
"${family}_rse.x32.json" "../pegasus_final/pegasus_ae_acfg_plus_roc_pegasus_rse_data_func.npz" \
"${family}_capa_data_func.npz"
base="${root}/${family}/"
mkdir -p "${family}"
capa "${root}/${family}/idd.x32" "${family}/idd"
capa "${root}/${family}/mod_CmdExec.x32" "${family}/mod_CmdExec"
capa "${root}/${family}/mod_DomainReplication.x32" "${family}/mod_DomainReplication"
capa "${root}/${family}/mod_LogonPasswords.x32" "${family}/mod_LogonPasswords"
capa "${root}/${family}/mod_NetworkConnectivity.x32" "${family}/mod_NetworkConnectivity"
capa "${root}/${family}/rse.x32" "${family}/rse"

# Carbanak
echo "Carbanak"
root="../../malware-gt/new/carbanak/bin/Release"
root="../malware/"
family="carbanak"
capa $root "bot.exe" $family
capa $root "botcmd.exe" $family
capa $root "downloader.exe" $family
root2="../../malware-gt/new/carbanak/bin/Release simple/plugins/"
capa "$root2" "AutorunSidebar.dll" $family
capa "$root2" "cve2014-4113.dll" $family
capa "$root2" "rdpwrap.dll" $family
python output_data.py "${family}_bot.exe.json" "../carbanak_final/carbanak_ae_acfg_plus_roc_carbanak_bot_data_func.npz" \
"${family}_botcmd.exe.json" "../carbanak_final/carbanak_ae_acfg_plus_roc_carbanak_botcmd_data_func.npz" \
"${family}_downloader.exe.json" "../carbanak_final/carbanak_ae_acfg_plus_roc_carbanak_downloader_data_func.npz" \
"${family}_AutorunSidebar.dll.json" "../carbanak_final/carbanak_ae_acfg_plus_roc_carbanak_autorunsidebar_data_func.npz" \
"${family}_cve2014-4113.dll.json" "../carbanak_final/carbanak_ae_acfg_plus_roc_carbanak_cve2014-4113_data_func.npz" \
"${family}_rdpwrap.dll.json" "../carbanak_final/carbanak_ae_acfg_plus_roc_carbanak_rdpwrap_data_func.npz" \
"${family}_capa_data_func.npz"

base="${root}/${family}/"
mkdir -p "${family}"
capa "${root}/${family}/bot.exe" "${family}/bot"
capa "${root}/${family}/botcmd.exe" "${family}/botcmd"
capa "${root}/${family}/downloader.exe" "${family}/downloader"
capa "${root}/${family}/AutorunSidebar.dll" "${family}/AutorunSidebar"
capa "${root}/${family}/cve2014-4113.dll" "${family}/cve2014-4113"
capa "${root}/${family}/rdpwrap.dll" "${family}/rdpwrap"
110 changes: 103 additions & 7 deletions grader/carbanak.sh
@@ -8,7 +8,11 @@ roc_multi()
base="${root}/malware/${family}/output/"

python roc_multi.py "${base}/combined_roc_func_data.npz" \
"${base}/combined_capa_func_data.npz" \
"${base}/combined_dr_plus_capa_func_data.npz" \
"DeepReflect" \
"CAPA" \
"DeepReflect+CAPA" \
"Carbanak" \
"${base}/combined_roc.png"
}
@@ -27,9 +31,25 @@ combine ()
"${base}/downloader_roc_func_data.npz" \
"${base}/rdpwrap_roc_func_data.npz" \
"${base}/combined_roc_func_data.npz"

python combine.py "${base}/AutorunSidebar_capa_func_data.npz" \
"${base}/bot_capa_func_data.npz" \
"${base}/botcmd_capa_func_data.npz" \
"${base}/cve2014-4113_capa_func_data.npz" \
"${base}/downloader_capa_func_data.npz" \
"${base}/rdpwrap_capa_func_data.npz" \
"${base}/combined_capa_func_data.npz"

python combine.py "${base}/AutorunSidebar_dr_plus_capa_func_data.npz" \
"${base}/bot_dr_plus_capa_func_data.npz" \
"${base}/botcmd_dr_plus_capa_func_data.npz" \
"${base}/cve2014-4113_dr_plus_capa_func_data.npz" \
"${base}/downloader_dr_plus_capa_func_data.npz" \
"${base}/rdpwrap_dr_plus_capa_func_data.npz" \
"${base}/combined_dr_plus_capa_func_data.npz"
}

roc ()
dr ()
{
family="$1"
name="$2"
@@ -83,25 +103,101 @@ roc ()
--roc "${roc_name}" &> "${roc_out}"
}

capa ()
{
family="$1"
name="$2"

root=`pwd`
root_input="${root}/malware/${family}/"
binary="${root_input}/${name}"

root_output="${root_input}/output"

base="${root_output}/${name: 0:-4}"
bndb="${base}.bndb"
raw="${base}_raw.txt"

feature="${base}_feature.npy"
feature_path="${base}_feature_path.txt"

function="${base}_function.txt"
mse="${base}_mse"
annotation="${root_input}/${name: 0:-4}_annotation.txt"
roc_name="${base}_roc"
roc_out="${base}_roc_stdout_stderr.txt"

echo "${base}/${name: 0:-4}_roc_func_data.npz"

cd capa/
python output_data.py "${family}/${name: 0:-4}.json" "${base}_roc_func_data.npz" \
"${base}_capa_func_data.npz"
cd ../
}

dr_capa()
{
family="$1"
name="$2"

root=`pwd`
root_input="${root}/malware/${family}/"
binary="${root_input}/${name}"

root_output="${root_input}/output"

base="${root_output}/${name: 0:-4}"
bndb="${base}.bndb"
raw="${base}_raw.txt"

feature="${base}_feature.npy"
feature_path="${base}_feature_path.txt"

function="${base}_function.txt"
mse="${base}_mse"
annotation="${root_input}/${name: 0:-4}_annotation.txt"
roc_name="${base}_roc"
roc_out="${base}_roc_stdout_stderr.txt"

echo "${base}/${name: 0:-4}_roc_func_data.npz"

cd capa/
python dr_plus_capa.py "${family}/${name: 0:-4}.json" "${base}_roc_func_data.npz" \
"${base}_dr_plus_capa_func_data.npz"
cd ../
}

family="carbanak"

name="AutorunSidebar.dll"
roc "${family}" "${name}"
dr "${family}" "${name}"
capa "${family}" "${name}"
dr_capa "${family}" "${name}"

name="bot.exe"
roc "${family}" "${name}"
dr "${family}" "${name}"
capa "${family}" "${name}"
dr_capa "${family}" "${name}"

name="botcmd.exe"
roc "${family}" "${name}"
dr "${family}" "${name}"
capa "${family}" "${name}"
dr_capa "${family}" "${name}"

name="cve2014-4113.dll"
roc "${family}" "${name}"
dr "${family}" "${name}"
capa "${family}" "${name}"
dr_capa "${family}" "${name}"

name="downloader.exe"
roc "${family}" "${name}"
dr "${family}" "${name}"
capa "${family}" "${name}"
dr_capa "${family}" "${name}"

name="rdpwrap.dll"
roc "${family}" "${name}"
dr "${family}" "${name}"
capa "${family}" "${name}"
dr_capa "${family}" "${name}"

# Combine ROC data
combine "${family}"