Project of Technologies for Advanced Programming
Grade: 30 with honors / 30
Antonio Scardace @ Dept of Math and Computer Science, University of Catania
The course covers technologies for building end-to-end solutions that ingest, manage, store, process, and visualize large volumes of data in real time. Topics included Docker containers and pipelines built with Logstash (data ingestion), Kafka (data streaming), Spark (data processing), Elasticsearch (data storage), and Kibana (data visualization).
This project was developed for the exam, to practice and demonstrate the following skills:
- Knowledge of Docker
- Knowledge of Data Ingestion via Logstash
- Knowledge of Data Streaming via Kafka
- Knowledge of Data Processing via Spark
- Knowledge of Data Storing via Elasticsearch
- Knowledge of Data Visualization via Kibana
- Knowledge of Jupyter Notebook (for the presentation)
The aim of the project is to compute real-time statistics on how a single user, and users in general, use their systems.
It can be useful as:
- System Monitor for Operating System owners
- System Monitor for Public Offices Computers
- System Monitor for Prison Computers
- Parental Control
- Spyware
The data source is a Windows keylogger that sends a log to the TCP server on each foreground window change or after 1 minute of user inactivity.
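The sending rule above can be sketched as a small decision function. This is only an illustration of the logic, not the keylogger's actual code: `TypingSession`, `should_send`, and the choice of `>=` for the 60-second threshold are assumptions made for the example.

```python
from dataclasses import dataclass

# "1 minute of user inactivity", taken from the description above
INACTIVITY_TIMEOUT_S = 60.0


@dataclass
class TypingSession:
    """Hypothetical state kept while the user types in one window."""
    window_title: str
    last_key_time: float  # seconds, from any monotonic clock


def should_send(session: TypingSession, foreground_title: str, now: float) -> bool:
    """Return True when the buffered log should be sent to the TCP server."""
    window_changed = foreground_title != session.window_title
    idle_too_long = (now - session.last_key_time) >= INACTIVITY_TIMEOUT_S
    return window_changed or idle_too_long
```

Keeping the rule in a pure function like this makes it easy to check without hooking into real window or keyboard events.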
The log has the following pattern:
[UUID] :: [Window Title] :: [Timestamp Start]
Logged Text...
[Timestamp End] :: [IP Address]
Each log is composed of:
- UUID: Uniquely identifies the PC. Has the following format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
- Window Title: Is the title of the window where the user has typed.
- Timestamp Start: Indicates when the user started typing in that window.
- Logged Text: Is the set of keys pressed by the user and logged by the keylogger.
- Timestamp End: Indicates when the user finished typing in that window.
- IP Address: Is the public IP address. If the PC has no connection, the default value is "Unknown".
For instance:
[154A9DC6-FF4E-4149-B81C-610AE7BBD151] :: [WhatsApp] :: [2022-01-01 12:00:00]
Hi Nicole, happy new year!!
[2022-01-01 12:00:13] :: []
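A minimal parsing sketch for this pattern is shown below. The names `LogEntry` and `parse_log` are hypothetical, and the regular expressions are inferred only from the pattern and example above. Mapping an empty IP field to "Unknown" is an assumption made here, since the example shows empty brackets while the field description mentions "Unknown" as the default.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# First line: [UUID] :: [Window Title] :: [Timestamp Start]
HEADER_RE = re.compile(
    r"^\[(?P<uuid>[0-9A-Fa-f-]{36})\] :: \[(?P<title>.*)\] :: \[(?P<start>[^\]]+)\]$"
)
# Last line: [Timestamp End] :: [IP Address] (the IP may be empty)
FOOTER_RE = re.compile(r"^\[(?P<end>[^\]]+)\] :: \[(?P<ip>[^\]]*)\]$")
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class LogEntry:
    uuid: str
    window_title: str
    start: datetime
    text: str
    end: datetime
    ip_address: str


def parse_log(raw: str) -> LogEntry:
    """Parse one raw log; raise ValueError if it does not match the pattern."""
    lines = raw.strip().splitlines()
    if len(lines) < 2:
        raise ValueError("a log needs at least a header and a footer line")
    header = HEADER_RE.match(lines[0])
    footer = FOOTER_RE.match(lines[-1])
    if header is None or footer is None:
        raise ValueError("log does not match the expected pattern")
    return LogEntry(
        uuid=header["uuid"],
        window_title=header["title"],
        start=datetime.strptime(header["start"], TS_FORMAT),
        text="\n".join(lines[1:-1]),  # everything between header and footer
        end=datetime.strptime(footer["end"], TS_FORMAT),
        ip_address=footer["ip"] or "Unknown",  # assumption: empty means no IP
    )
```

With the example log above, `parse_log` yields the window title `WhatsApp`, and `end - start` gives the time spent typing in that window.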
The TCP server receives logs from multiple clients and passes them to the processing pipeline.
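The receiving side can be sketched with Python's standard `socketserver` module. This is not the project's actual server: it assumes each client connection carries exactly one UTF-8 log and closes afterwards, and it uses an in-memory queue (`received_logs`, a hypothetical name) as a stand-in for the hand-off to the pipeline.

```python
import queue
import socketserver

# Stand-in for the pipeline entry point (an assumption for this sketch)
received_logs: "queue.Queue[str]" = queue.Queue()


class LogHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # Assumption: one log per connection; read until the client closes
        raw = self.rfile.read()
        if raw:
            received_logs.put(raw.decode("utf-8", errors="replace"))


class LogServer(socketserver.ThreadingTCPServer):
    # One thread per client, so several keyloggers can send concurrently
    allow_reuse_address = True
    daemon_threads = True


def make_server(host: str = "127.0.0.1", port: int = 0) -> LogServer:
    """Create the server; port 0 lets the OS pick a free port."""
    return LogServer((host, port), LogHandler)
```

Calling `make_server().serve_forever()` starts accepting clients; a real deployment would replace the queue with whatever forwards logs to the ingestion stage.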
The following functions are available for each user (personal stats) and for all users (general stats):
- For Logged Text:
  - Top 8 Last Logged Texts 📄
  - Sentiment analysis 📈
- For Metadata:
  - Top 10 most used applications 🔖
  - Used windows classification 📊
    - Social
    - Utility
    - Entertainment
    - Web Browsing
    - Office & Study
    - Other
  - Customers Geolocation by IP 🌎
  - Different stats about time spent writing to the PC 👀
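As a rough illustration of two of the metadata functions, the sketch below classifies window titles into the categories listed above and counts the most used applications. The keyword map is entirely hypothetical; the project may classify windows in a different way (for example inside Spark), and `classify_window` and `top_applications` are names invented for this example.

```python
from collections import Counter

# Hypothetical keyword map; the real classification rules are not shown here
CATEGORY_KEYWORDS = {
    "Social": ("whatsapp", "telegram", "discord"),
    "Utility": ("file explorer", "settings", "task manager"),
    "Entertainment": ("spotify", "netflix", "steam"),
    "Web Browsing": ("chrome", "firefox", "edge"),
    "Office & Study": ("word", "excel", "powerpoint"),
}


def classify_window(title: str) -> str:
    """Return the first category whose keyword appears in the title."""
    lowered = title.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Other"


def top_applications(titles, n=10):
    """Return the n most frequent window titles with their counts."""
    return Counter(titles).most_common(n)
```

For instance, `classify_window("Google Chrome")` falls under "Web Browsing", while a title matching no keyword is reported as "Other".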
Let's see the structure of the project and how I have used all the components.
Each component used in this project has been put inside a Docker Container 🐳
To clone the repository and run the project smoothly, make sure the requirements below are met and then follow the steps.
- At least 12 GB of RAM.
- At least 25 GB of free space.
- Linux, macOS, or Windows with WSL.
- Docker must be installed (Docker Desktop is optional).
- The use of Visual Studio Code is strongly recommended.
$ git clone
$ cd YOUR_PATH/System-Stats-By-Keylogger/
$ bash
| Container | URL | Description |
| --- | --- | --- |
| broker | http://localhost:8080 | UI for Kafka |
| elasticsearch | http://localhost:9200 | Elasticsearch base URL |
| elasticsearch | http://localhost:9200/keylogger_stats/_search | Elasticsearch index URL |
| elasticsearch | http://localhost:9200/keylogger_stats/_search?... | Elasticsearch URL to get all logs |
| elasticsearch | http://localhost:9200/keylogger_stats/_search?... | Elasticsearch URL to get all metadata |
| kibana | http://localhost:5601 | Kibana base URL |
| kibana | http://localhost:5601/dashboards/list?... | Kibana Dashboards List |
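Once the containers are running, the index URL in the table can also be queried from a script. The sketch below uses only the Python standard library and a generic `match_all` query; it does not reproduce the elided query strings in the table, and `build_search_request` and `fetch_hits` are hypothetical helper names.

```python
import json
import urllib.request

ES_BASE_URL = "http://localhost:9200"  # from the table above
INDEX = "keylogger_stats"              # from the table above


def build_search_request(size: int = 10) -> urllib.request.Request:
    """Build a POST request for the index's _search endpoint."""
    body = json.dumps({"query": {"match_all": {}}, "size": size}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_BASE_URL}/{INDEX}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_hits(size: int = 10) -> list:
    """Send the request and return the matching documents.

    Requires the elasticsearch container to be reachable on localhost.
    """
    with urllib.request.urlopen(build_search_request(size)) as resp:
        payload = json.load(resp)
    return payload["hits"]["hits"]
```

Calling `fetch_hits(5)` returns up to five documents, each with its stored fields under the `_source` key.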
Author: Antonio Scardace.
Distributed under the GNU General Public License v3.0. See LICENSE for more information.