
michaelkonstantinou/android-malware-detection


Android Malware Detection

Using a custom deep learning model

This repository contains a neural network model that detects whether a given application is malware.

TL;DR

I built the same deep learning model in TensorFlow and in PyTorch to classify a given application as malicious or not. Check the notebooks or the website for more details.


Problem Description

The widespread proliferation of Android devices has led to a concerning increase in malware threats, which pose significant risks to users' personal data and digital security. Malicious apps often disguise themselves as legitimate software, making them difficult to identify without specialized tools.

The provided dataset describes each application by some of the features it has and the services it uses. Given this input, I developed a model that looks for patterns among those features that may reveal whether an application is malicious.

This project is educational: its purpose was to become more familiar with the two biggest ML frameworks, TensorFlow and PyTorch.

Approach

While learning how neural networks are implemented in each framework, I built this model first in TensorFlow and later in PyTorch.

There is no technical reason to have two implementations, and they do not differ significantly. Both frameworks were used purely for educational purposes.

Pre-processing

The dataset was fairly clean, but a few pre-processing steps were needed before feeding the data to the model: the few missing values were replaced with the mean of their column, and the labels were encoded. For more details, please check the two notebooks.
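The two steps described above can be sketched with pandas as follows. This is a minimal illustration rather than the notebooks' actual code: the column name `Label`, the class names, and the toy feature columns are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, label_col: str = "Label"):
    """Mean-impute feature columns and encode string labels as integers.

    `label_col` is a hypothetical column name; adjust it to the real dataset.
    """
    features = df.drop(columns=[label_col])
    # Replace each missing value with the mean of its own column.
    features = features.fillna(features.mean(numeric_only=True))
    # Encode class names as integer codes, assigned in sorted order.
    classes = sorted(df[label_col].unique())
    mapping = {name: idx for idx, name in enumerate(classes)}
    labels = df[label_col].map(mapping)
    return (
        features.to_numpy(dtype=np.float32),
        labels.to_numpy(dtype=np.int64),
        mapping,
    )


# Toy data standing in for the real 241-feature dataset.
toy = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0],
    "feature_b": [0.0, 1.0, np.nan],
    "Label": ["Malware", "Goodware", "Malware"],
})
X, y, mapping = preprocess(toy)
```

Here the missing entry in `feature_a` becomes 2.0 (the mean of 1.0 and 3.0), and the labels are mapped to 0 and 1.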

Model

This is a binary classification problem: the model outputs whether the given attributes describe Android malware or goodware.

To tackle this, a neural network is used with an input layer of 241 features followed by 3 hidden layers. For further details, please check the two notebooks.
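As a rough illustration of that shape, here is a PyTorch sketch with 241 inputs, 3 hidden layers, and a single output. Only the input size and the number of hidden layers come from this README; the hidden layer widths, the ReLU activations, the single-logit output, and the class name `MalwareClassifier` are assumptions, so the notebooks may differ.

```python
import torch
from torch import nn


class MalwareClassifier(nn.Module):
    """Feed-forward binary classifier: 241 inputs, 3 hidden layers, 1 logit."""

    def __init__(self, n_features: int = 241, hidden=(128, 64, 32)):
        super().__init__()
        layers = []
        in_dim = n_features
        # The hidden widths are illustrative, not taken from the notebooks.
        for out_dim in hidden:
            layers += [nn.Linear(in_dim, out_dim), nn.ReLU()]
            in_dim = out_dim
        layers.append(nn.Linear(in_dim, 1))  # one logit for the binary decision
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


model = MalwareClassifier()
batch = torch.rand(4, 241)            # 4 fake applications, 241 features each
probs = torch.sigmoid(model(batch))   # probability of the positive class
```

Applying a sigmoid to the logit gives a value between 0 and 1, which can be thresholded (for example at 0.5) to obtain the malware/goodware decision.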

TensorFlow vs PyTorch model

The main difference between the two models is how much control each framework gives over the individual steps. With TensorFlow, most steps are handled at a high level of abstraction: for instance, neither the training loop nor the dataset split had to be implemented by hand, since a few method calls cover both.
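To make the "few method calls" concrete, here is a hedged Keras sketch in which `model.fit` performs both the training loop and a validation split. The random data, layer widths, optimizer, and the 80/20 split are assumptions for illustration and are not taken from the notebooks.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data: 200 samples with 241 features and binary labels.
rng = np.random.default_rng(0)
X = rng.random((200, 241)).astype("float32")
y = rng.integers(0, 2, size=200).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(241,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# fit() runs the training loop; validation_split holds out the last 20%
# of the samples for validation, so no manual split is needed.
history = model.fit(X, y, epochs=2, batch_size=32, validation_split=0.2, verbose=0)
```

The returned `history.history` dictionary holds per-epoch training and validation metrics, which is convenient for plotting learning curves.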

In PyTorch, more effort was needed to achieve the same result, but this allowed finer control over the process. For instance, the training loop had to be written manually. Additionally, a manual seed was set in PyTorch in order to "ensure" reproducibility.
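A hand-written training loop with a manual seed might look like the sketch below. It is not the notebook's code: the loss (`BCEWithLogitsLoss`), the Adam optimizer, the small stand-in network, the synthetic data, and the helper names `train` and `run` are all assumptions chosen to keep the example short.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, epochs=5, lr=1e-3):
    """Manual training loop; returns the mean loss of each epoch."""
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    model.train()
    for _ in range(epochs):
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()                        # clear old gradients
            loss = criterion(model(xb).squeeze(-1), yb)  # forward pass
            loss.backward()                              # backpropagation
            optimizer.step()                             # weight update
            total += loss.item() * len(xb)
        history.append(total / len(loader.dataset))
    return history


def run(seed=42):
    # Seeding first fixes the synthetic data, weight init, and shuffle order.
    torch.manual_seed(seed)
    X = torch.rand(64, 241)
    y = (X[:, 0] > 0.5).float()  # synthetic labels, for illustration only
    model = nn.Sequential(nn.Linear(241, 16), nn.ReLU(), nn.Linear(16, 1))
    loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)
    return train(model, loader)


losses_a = run()
losses_b = run()
```

On CPU, two runs with the same seed should produce matching loss curves. On GPU, some operations are non-deterministic by default, which is one reason reproducibility is better described as "encouraged" than guaranteed.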

Results

  • Accuracy: 99.89%
  • Precision: 99.46%
  • Recall: 100%
  • F1-Score: 99.73%
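For reference, these four metrics follow directly from the confusion-matrix counts. The sketch below uses plain Python and made-up toy counts (not the project's actual confusion matrix); the function names are illustrative. It also shows that the reported F1-score is consistent with the reported precision and recall.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy counts for illustration only.
toy = classification_metrics(tp=8, fp=2, fn=0, tn=10)

# Consistency check on the reported figures: precision 99.46%, recall 100%.
reported_f1 = round(f1_from(0.9946, 1.0), 4)
```

A recall of 100% means no malicious sample was classified as goodware (zero false negatives), while the precision below 100% indicates that a small number of goodware samples were flagged as malware.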

Confusion Matrix

[Image: confusion matrix]

Technologies used

Docs website

  • Pico CSS
  • Vanilla JS
  • Jupyter notebooks exported as HTML

AI Models

  • Python
  • TensorFlow
  • PyTorch
  • NumPy
  • pandas
  • scikit-learn
  • Matplotlib
  • Seaborn

Credits

Special thanks to

  • Freepik and sentavio for the featured image. More here
  • Joakim Arvidsson for the dataset. More here