Commit

Merge pull request #193 from uclaacm/cyber-lab-blog-posts
Lots of blog posts
bliutech authored Sep 19, 2024
2 parents 6634747 + 60e36ff commit 9f9df4c
Showing 9 changed files with 649 additions and 59 deletions.
59 changes: 0 additions & 59 deletions data/blog/2023-08-26-first-blog-post.md

This file was deleted.

69 changes: 69 additions & 0 deletions data/blog/2024-03-11-winter-2024-dynamic-malware-analyzer-lab.md
@@ -0,0 +1,69 @@
---
title: Dynamic Malware Analysis
authors: [Christopher Simaan, Cameron Monast, Arnav Vora, Teong Seng, Yvana Mouawad, Andy Huang, Mark Epstein, Salma Alandary]
category: Projects
tags: [winter-2024, cyber-lab]
description: Dynamic Malware Analysis using PANDA
---

<img src="https://csisolutions.in/wp-content/uploads/2022/03/malware.png" alt="blog image" />

In this lab, we delved into the world of Dynamic Program Analysis, working in small groups of 2-3 to create "plugins" to analyze and detect different subsets of malware recorded on a Windows Virtual Machine.

## Introduction to Dynamic Analysis

### PANDA-RE

"PANDA is an open-source Platform for Architecture-Neutral Dynamic Analysis. It is built upon the QEMU whole system emulator, and so analyses have access to all code executing in the guest and all data. PANDA adds the ability to record and replay executions, enabling iterative, deep, whole system analyses. PANDA can be controlled from the command line, through our Python package, or even a Jupyter notebook." -- [PANDA website](https://panda.re/)

## Plugins with PANDA

Alongside being runnable through Python, PANDA offers the ability to create custom plugins in C/C++ that can be run alongside its own. This allows PANDA to be incredibly versatile as we can create our own plugins to detect different types of malware. A more in-depth guide to writing PANDA plugins can be found [here](https://docs.google.com/document/d/1AGxl1yYBoNI0-PFvTKVz--xifcNl4s9GVA0A-zQ89xU/edit?usp=sharing).


## Ransomware Detection
In developing our ransomware detection plugin, we focused on a simple yet effective strategy: monitoring the number of write operations performed by each process on the system. Ransomware typically encrypts files rapidly, which produces an unusually high number of write operations. This distinctive behavior serves as a clear indicator that can aid early detection.

To effectively utilize this indicator, we initially set a standard for normal write operation levels. This standard acts as a reference point, enabling our plugin to differentiate between regular system operations and the irregular patterns associated with ransomware. By establishing a threshold for what is considered normal, any process surpassing this threshold in terms of write operations will trigger a flag, identifying it as potentially harmful.

This process-centered monitoring lets our plugin pinpoint and highlight any suspicious activity that deviates from the established norms.
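
The thresholding idea can be sketched in a few lines of Python. This is only an illustration, not the plugin itself: the event format (one process name per observed write), the `flag_heavy_writers` helper, and the threshold value are all hypothetical stand-ins for whatever a plugin would collect from the replay.

```python
from collections import Counter

# Hypothetical baseline: the most writes we expect from a benign process
# during the observed window. A real value would come from measuring normal runs.
WRITE_THRESHOLD = 100


def flag_heavy_writers(write_events, threshold=WRITE_THRESHOLD):
    """Return the processes whose write count exceeds the threshold.

    write_events is an iterable of process names, one entry per observed
    write operation (an assumed format for this sketch).
    """
    counts = Counter(write_events)
    return {proc: n for proc, n in counts.items() if n > threshold}


# Toy data: one process writes far more often than the others.
events = ["explorer.exe"] * 12 + ["notepad.exe"] * 3 + ["suspicious.exe"] * 450
print(flag_heavy_writers(events))
```

Counting per process rather than system-wide keeps a busy but benign machine from tripping the flag just because many processes each write a little.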

## Malware Replication Detection

### Synopsis
The goal of our plugin is to detect when malware is replicating. In short, we taint a suspicious file that may try to copy itself, find the program counter (PC) values where the taint resurfaces, and use them to find the address space identifier (ASID) of the process responsible.

### Steps
The following steps are more of a proof of concept, but they illustrate how the idea put forth in the synopsis may be carried out. In the example pictures, we used a particularly simple recording that involved copying (using the `cp` command) a file 10 characters long.

1. We used the file_taint plugin to apply labels to the bytes of a file. These labels can then be queried with the tainted_instr plugin, which returns the instructions (specifically, the program counter values) that involve tainted data.

![image](https://hackmd.io/_uploads/Syu4uwapT.png)

2. We used the pc_search plugin to figure out the corresponding guest instruction values (i.e. ASID values).

![image](https://hackmd.io/_uploads/rknY_wTTT.png)

3. Finally, we used asidstory on the replay to get the ranges of instruction values for each process. We then mapped the guest instruction values we got from pc_search to a specific process. That is, we found the ranges that contained our guest instruction values, which told us what processes those instructions were a part of. This lets us determine if copy instructions were used and thus, if the malware is replicating.

![image](https://hackmd.io/_uploads/S1gjOPTaa.png)
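
The lookup in step 3 amounts to a range-containment check, sketched below in Python. Everything here is made up for illustration: the `ProcessRange` record, the `processes_for` helper, and the numbers are assumptions, and the real values would be read from the asidstory and pc_search output.

```python
from typing import NamedTuple


class ProcessRange(NamedTuple):
    """One process and the span of guest instruction values it was seen in."""
    name: str
    asid: int
    first_instr: int
    last_instr: int


def processes_for(instr_values, ranges):
    """Map each instruction value to the processes whose range contains it."""
    return {
        value: [r.name for r in ranges if r.first_instr <= value <= r.last_instr]
        for value in instr_values
    }


# Made-up values for illustration only.
ranges = [
    ProcessRange("bash", 0x1F4A000, 0, 4_000_000),
    ProcessRange("cp", 0x2B7C000, 3_500_000, 5_200_000),
]
tainted_values = [3_900_000, 4_800_000, 6_000_000]

for value, names in processes_for(tainted_values, ranges).items():
    print(value, names or "no matching process")
```

Because processes interleave, their ranges can overlap, so a single value may match more than one process and need further narrowing.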


## Malicious IP Detection
For this plugin, we wanted to look through network traffic in a meaningful way and identify telltale signs of malware. Specifically, we planned to scrutinize the destination IPs of outgoing packets and then flag potentially malicious IPs through the VirusTotal API. Another goal was to experiment with PyPANDA, a Python interface for interacting with PANDA.

After building the regular C++ binaries required for PANDA, we installed the pandare Python package (which contains PyPANDA) and the other PyPANDA dependencies. The verification objective for our plugin was simple: send a ping to 8.8.8.8 (Google's public DNS server) and confirm that it is not a suspicious IP. Initially, we prepared a recording that performs this ping, believing that we could analyze it with PyPANDA as long as we specified the architecture (i386) and the memory allocated to the system captured in the recording. To use PyPANDA, we instantiated a Panda object that takes the recording's system information as parameters. Oddly enough, we got allocated system memory mismatch errors even after specifying the memory in different formats and values. When we first ran the PyPANDA script, it immediately installed a certain i386 image, so it may be that PyPANDA only recognizes that particular system configuration.

A quick look through the PyPANDA plugin examples revealed that most implementations make the recording and analyze it within the same session. Following this record-and-replay convention, we extended our script to perform the ping itself and then run the analysis, as a quick remedy.

We loaded the C++-based network plugin from base PANDA (loading such plugins is one of PyPANDA's many abilities) to produce a PCAP file containing all the network traffic seen in the recording. We then reran the 8.8.8.8 ping recording with this plugin and fed the resulting PCAP file to another script that extracts IP addresses using pyshark (a module for parsing packets in Python using Wireshark dissectors). Finally, we passed the list of extracted IP addresses to the VirusTotal API, which successfully performed our verification. Although the results were as expected, we hope to look into the Panda object instantiation more deeply in the future and potentially debug the issue with the system specification.

<br/>

![image](https://hackmd.io/_uploads/r1oQAb_T6.png)
*Performing the record and replay with network plugin*

![tempsnip](https://hackmd.io/_uploads/HJf4xzd6p.png)
*Parsing outputted PCAP and returning IP verification*
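
The extraction step between the PCAP file and the VirusTotal lookup can be sketched as follows. This is not our script: `unique_public_destinations` is a hypothetical helper, and filtering out non-global addresses is an added assumption to avoid looking up local traffic. The code only relies on packets exposing `packet.ip.dst`, which is how pyshark presents IPv4 packets, so the sketch runs on stand-in objects without pyshark or a capture file. The VirusTotal request itself is left out.

```python
import ipaddress
from types import SimpleNamespace


def unique_public_destinations(packets):
    """Collect distinct, globally routable destination IPs in first-seen order."""
    seen = []
    for packet in packets:
        ip_layer = getattr(packet, "ip", None)
        if ip_layer is None:  # e.g. ARP or other non-IP traffic
            continue
        dst = ip_layer.dst
        # Skip private, loopback, and other non-global addresses.
        if ipaddress.ip_address(dst).is_global and dst not in seen:
            seen.append(dst)
    return seen


# Stand-in packets mimicking the attribute layout described above.
packets = [
    SimpleNamespace(ip=SimpleNamespace(dst="8.8.8.8")),
    SimpleNamespace(ip=SimpleNamespace(dst="10.0.2.2")),  # private address
    SimpleNamespace(),                                     # no IP layer
    SimpleNamespace(ip=SimpleNamespace(dst="8.8.8.8")),   # duplicate
]
print(unique_public_destinations(packets))
```

Deduplicating before the lookup matters in practice, since a single ping exchange produces several packets to the same address and API quotas are usually limited.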

## Conclusion

Overall, we managed to get quite a lot of interesting detection capability out of PANDA, both by using its existing plugins and by creating our own. Dynamic malware analysis has many benefits in terms of automation, but our detection ability relies on specific assumptions holding true, and context-aware malware can often bypass it. Regardless of these pros and cons, this project provides an interesting proof of concept for using PANDA to do dynamic malware analysis.
42 changes: 42 additions & 0 deletions data/blog/2024-03-11-winter-2024-fuzzing-lab.md
@@ -0,0 +1,42 @@
---
title: Fuzzing Lab
authors: [Alex Zhang, Ronak Badhe, Simon Koski, Ryan Chang, Renuka Bhusari]
category: Projects
tags: [winter-2024, cyber-lab]
description: Making a fuzzer
---

# Finding Vulnerabilities in Open-Source Software with Fuzzing

In this quarter's Fuzzing Lab workshop, our members learned to use the powerful technique of fuzzing to discover software vulnerabilities. We spent the first few weeks covering fundamental topics such as compiling targets, writing harnesses, and running [Honggfuzz](https://honggfuzz.dev/). To practice these skills, we fuzzed old versions of some software and rediscovered known vulnerabilities. After that, we moved on to finding new vulnerabilities in open-source projects, focusing on parsers for various file formats that haven't already been fuzzed. You can read about some of our results in the following sections written by our members.
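
To show how a harness and a fuzzer fit together, here is a toy Python sketch. It is a deliberate simplification: Honggfuzz is feedback-driven and targets native code, while this loop mutates blindly. The `toy_parse` target, its magic byte, and its planted bug are all hypothetical.

```python
import random


def toy_parse(data: bytes) -> int:
    """Stand-in target with a planted bug: it trusts a length byte unchecked."""
    if len(data) < 2 or data[0] != 0x57:  # hypothetical magic byte
        return 0
    length = data[1]
    payload = data[2:]
    # Bug: no check that `length` fits inside the payload.
    return payload[length - 1] if length else 0


def harness(data: bytes) -> bool:
    """Feed one input to the target and report True if it crashed."""
    try:
        toy_parse(data)
        return False
    except IndexError:
        return True


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte of the seed with a random value."""
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int, rng: random.Random) -> list:
    """Return the mutated inputs that crashed the harness."""
    mutated = (mutate(seed, rng) for _ in range(iterations))
    return [m for m in mutated if harness(m)]


crashes = fuzz(b"\x57\x03abc", iterations=2000, rng=random.Random(0))
print(f"{len(crashes)} crashing inputs found")
```

The harness is the only part that knows about the target, which is why the same fuzzing loop can be pointed at a different library by swapping one function.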

## libwbxml

This library parses WBXML files, a binary representation of XML, and converts them back to XML. WBXML is used to compact XML files and reduce bandwidth in mobile communications. For example, it is used to send settings, calendar information, address books, notes, and instant messages.

While the library's own tests convert XML files to WBXML and then convert them back, I only fuzzed the wbxml2xml (WBXML => XML) portion. That meant we needed WBXML test files, which were difficult to find, but Alex Zhang found some [here](https://github.com/dalgleish/wbxml). The harness creates a new parser and then runs `wbxml_parser_parse` on the data from the file.

Another issue we ran into was the harness being dynamically linked to libwbxml, but the linker not being configured to find the library in the non-standard directory. We solved this by adding additional flags to our build commands so that the library is linked statically.

In the end, we found two unique vulnerabilities that lead to crashes. They cannot be disclosed yet since they haven't been fixed.

## NanoSVG

This library parses SVG (Scalable Vector Graphics) files and turns them into a list of cubic Bézier shapes, which can then be drawn.

Though the library is usually given an `.svg` file to parse, I directly called `nsvgParse`, the function that the file-based entry point itself calls. I then used the sample `.svg` files provided in the GitHub repository as seeds for the fuzzer to build on.

Because the library is a single header file, there were no difficulties compiling and running the fuzzer with the harness.

After over 1 billion inputs and over 80% coverage, no crashes were detected, suggesting that a bug is unlikely in the code (at least within the function that was tested).


## cxml

This library is a parser for XML written in C. We were not actually able to fuzz this library as it contained a couple of compiler errors.

The most notable of these errors was a use-after-free error, where a lexer was freed and then used by a function a couple of lines later. There was also an unsigned char error, which was easily fixed by explicitly casting the variable to signed. This might be due to differences in whether `char` is signed by default on the server we used.
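
In C, whether a plain `char` is signed is left to the implementation, so the same byte can read as a different number on different platforms. The short Python illustration below (unrelated to the cxml code itself) shows how one byte with its high bit set reads under each interpretation.

```python
byte = b"\xe9"  # a single byte with the high bit set

# Interpret the same bit pattern as an unsigned and as a signed 8-bit value.
as_unsigned = int.from_bytes(byte, "little", signed=False)
as_signed = int.from_bytes(byte, "little", signed=True)

print(as_unsigned, as_signed)  # prints: 233 -23
```

Any comparison or array index that uses such a value behaves differently depending on which interpretation the compiler picks, which is why an explicit cast removes the ambiguity.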

After fixing these two issues on our end, we still ran into unresolved linker errors when building.

I decided to contact the author of the library directly to ask a couple of questions about the errors we received, and he was actually unaware of the use-after-free error in his code. He also stated that our linker errors may be due to the server we used. However, we did try to compile the code on an x86 machine and ran into the same errors.