Phishing is the practice of sending fraudulent communications that appear to come from a legitimate and reputable source, usually through email and text messaging. The attacker's goal is to steal money, gain access to sensitive data and login information, or to install malware on the victim's device. Phishing is a dangerous, damaging, and an increasingly common type of cyberattack.
This project involves an analysis on emails classified as 'phishing'.
Data source : https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset
Data Source Reference
Al-Subaiey, A., Al-Thani, M., Alam, N. A., Antora, K. F., Khandakar, A., & Zaman, S. A. U. (2024, May 19). Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection. ArXiv.org. https://arxiv.org/abs/2405.11619
Python
- Applied in the whole Data Cleaning process.SQL
- Applied in the Data Exploration phase.Jupyter Notebooks
- Used as the Data Cleaning environment.PostgreSQL
- Used as the Database Management System to handle the data.
- To find out the most frequent emails associated with phishing.
- Identify the frequent phishing targets.
- Identify the trend of phishing emails over time.
- To find out the most frequent target time and day.
- To identify the sender-receiver pattern in phishing.
SELECT
COUNT(*) AS total_observations
FROM
public.fraud_data;
SELECT
sender_email,
COUNT(*) as total_sent
FROM
public.fraud_data
GROUP BY
sender_email
ORDER BY
total_sent DESC;
SELECT
sender_name,
COUNT(*) as total
FROM
public.fraud_data
GROUP BY
sender_name
ORDER BY
total DESC;
SELECT
receiver_email,
COUNT(*) as total_received
FROM
public.fraud_data
WHERE
receiver_email != '[email protected]'
GROUP BY
receiver_email
ORDER BY
total_received DESC
LIMIT 10;
SELECT
date AS date_sent,
COUNT(*) AS total_emails
FROM
public.fraud_data
GROUP BY
date
ORDER BY
total_emails DESC
LIMIT 10;
SELECT
TO_CHAR(date, 'Day') AS day_of_week,
COUNT(*) AS total_sent
FROM
public.fraud_data
GROUP BY
day_of_week
ORDER BY
total_sent DESC;
SELECT
subject
FROM
public.fraud_data
WHERE
subject LIKE '%Important%' OR
subject LIKE '%Money%' OR
subject LIKE '%Urgent%' OR
subject LIKE '%action%';
SELECT
sender_email,
receiver_email,
COUNT(*) AS total
FROM
public.fraud_data
GROUP BY
sender_email,
receiver_email
ORDER BY
total DESC
LIMIT 10;
SELECT
CASE
WHEN EXTRACT(HOUR FROM time) = 0 THEN '12 AM'
WHEN EXTRACT(HOUR FROM time) < 12 THEN CONCAT(EXTRACT(HOUR FROM time)::text, ' AM')
WHEN EXTRACT(HOUR FROM time) = 12 THEN '12 PM'
ELSE CONCAT((EXTRACT(HOUR FROM time) - 12)::text, ' PM')
END AS hour_of_day,
COUNT(*) AS total
FROM
public.fraud_data
GROUP BY
hour_of_day
ORDER BY
total DESC;
SELECT
EXTRACT(HOUR FROM time) AS hour_of_day,
TO_CHAR(date, 'Day') AS day_of_week,
COUNT(*) AS total
FROM
public.fraud_data
GROUP BY
hour_of_day,
day_of_week
ORDER BY
total DESC
LIMIT 10;
- You're more likely to receive a phishing email on
Tuesday
than any other day. - An email from
USAA
is likely to be used in phishing. Urgent, Make Money, Important
are likely to be used in phishing emails.- Phishing emails are likely to be sent between
12 PM
and1 PM
. - Phishing emails are likely to contain lure words such as
offers
. Example :[email protected]
- Verify all emails before clicking any link.
- Implement mandatory and regular phishing awareness training for all employees.
- Develop and regularly update incident response plans specifically tailored for phishing attacks.
- Enforce the use of multi-factor authentication (MFA) across all corporate email accounts and critical systems.
- Enhance email filtering systems by using machine learning algorithms that can detect and flag phishing attempts more accurately.