System monitoring with Python, the Elastic (ELK) Stack, AWS, and Docker.
Build a monitoring and analytical system that can support internal teams with data inquiries. Equip the system with automated alerts and tickets to reduce team overhead.
- Design real-time and scheduled data processing solutions that collect and aggregate data.
- Build a system that can alert the team about problems and anomalies on Slack.
- Create an automated ticketing system on Jira.
- Run Logstash in a Docker container on an AWS EC2 instance to continuously extract newly created CloudWatch Logs.
- Transform these logs and load them into Elastic Cloud for monitoring purposes.
- Use AWS Lambda to extract data from different sources such as AWS and Balena clouds.
- Transform and aggregate data for monitoring and alerting purposes.
- Extract web app metrics by using Elastic Application Performance Monitoring (APM). Use the metrics to monitor user clicks, API latency, and web page latency.
- Extract Docker metrics by using Elastic Metricbeat. These metrics report uptime, resource usage, and errors.
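The Slack alerting and Jira ticketing steps above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the function names, the message wording, and the `Bug` issue type are assumptions. The payload shapes follow Slack's incoming-webhook format (a JSON object with a `text` field) and the body that Jira's "create issue" REST endpoint expects.

```python
import json
import urllib.request
from typing import Optional


def build_slack_message(service: str, metric: str, value: float, threshold: float) -> dict:
    """Return a Slack incoming-webhook payload describing one anomaly."""
    text = (
        f":warning: {service}: {metric} is {value:.1f}, "
        f"above the threshold of {threshold:.1f}"
    )
    return {"text": text}


def build_jira_issue(project_key: str, service: str, metric: str, value: float) -> dict:
    """Return a request body for Jira's "create issue" REST endpoint."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": f"{service}: abnormal {metric}",
            "description": f"Observed {metric} = {value:.1f} on {service}.",
            # Assumption: the target project has an issue type named "Bug".
            "issuetype": {"name": "Bug"},
        }
    }


def post_json(url: str, payload: dict, headers: Optional[dict] = None) -> int:
    """POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the payloads separately from sending them keeps the alert logic testable without network access. In use, `post_json` would be called with the team's own webhook URL or Jira base URL plus authentication headers, none of which are shown here.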
It is difficult to tell error logs apart during log processing and filtering. The backend needs a better logging system that assigns a unique code to each error, which would make it easier to process and filter logs for alerting purposes.
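One way such error codes could look in a Python backend is sketched below, using the standard `logging` module. The code values (`DB-001`), the JSON field names, and the `should_alert` helper are hypothetical and chosen only for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that includes an error code."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "error_code": getattr(record, "error_code", None),
                "message": record.getMessage(),
            }
        )


def should_alert(log_line: str, alert_codes: set) -> bool:
    """Return True when a JSON log line carries a code that warrants an alert."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return False  # not a structured line; leave it to other filters
    return isinstance(entry, dict) and entry.get("error_code") in alert_codes


logger = logging.getLogger("backend")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# The code travels in `extra`, so it never has to be parsed out of the message text.
logger.error("Database connection refused", extra={"error_code": "DB-001"})
```

With a dedicated `error_code` field, the downstream filter reduces to a set lookup instead of pattern matching on free-form messages.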
Scheduled AWS Lambda functions are far cheaper than continuously running AWS EC2 instances for processing data that is not accessed frequently.
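A scheduled Lambda of this kind might look like the sketch below. It assumes the function is invoked on a schedule (for example by an Amazon EventBridge rule) and that it averages one metric per device. `fetch_records`, the record fields, and the sample values are hypothetical placeholders for a real data source.

```python
from collections import defaultdict
from statistics import mean


def fetch_records() -> list:
    """Hypothetical stand-in for a real source (for example an AWS or Balena API call)."""
    return [
        {"device": "sensor-a", "cpu_pct": 40.0},
        {"device": "sensor-a", "cpu_pct": 60.0},
        {"device": "sensor-b", "cpu_pct": 90.0},
    ]


def aggregate(records: list) -> dict:
    """Average cpu_pct per device."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["device"]].append(record["cpu_pct"])
    return {device: mean(values) for device, values in grouped.items()}


def lambda_handler(event, context):
    """Entry point that the schedule invokes; `event` is not used in this sketch."""
    summary = aggregate(fetch_records())
    # A real function would ship `summary` to the monitoring backend here.
    return {"devices": len(summary), "avg_cpu_pct": summary}
```

Because the function only runs when the schedule fires, compute is billed per invocation rather than for an instance that sits idle between runs.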
The quality of data depends mainly on the initial preprocessing and aggregation steps. Enhance these steps further.
- Enhance error logs by using specific error codes, so warnings and severe bugs can be differentiated.
- Monitor the health of Docker containers and Kubernetes pods in near real time.
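As a rough sketch of the container health check, the Python below classifies metric samples into `healthy`, `degraded`, or `down`. The sample fields (`name`, `status`, `cpu_pct`, `mem_pct`) and the 0.9 limits are assumptions made for illustration, not the schema of any particular metrics shipper. The same idea would apply to Kubernetes pods given equivalent samples.

```python
def classify_container(sample: dict, cpu_limit: float = 0.9, mem_limit: float = 0.9) -> str:
    """Return "down", "degraded", or "healthy" for one metrics sample."""
    if sample.get("status") != "running":
        return "down"
    if sample.get("cpu_pct", 0.0) > cpu_limit or sample.get("mem_pct", 0.0) > mem_limit:
        return "degraded"
    return "healthy"


def unhealthy(samples: list) -> dict:
    """Map container name to state for every container that is not healthy."""
    states = {s["name"]: classify_container(s) for s in samples}
    return {name: state for name, state in states.items() if state != "healthy"}
```

Running `unhealthy` on each new batch of samples yields only the containers that need attention, which could then feed the alerting path.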