Initial commit
juadde committed Mar 29, 2024
0 parents commit 9158892
Showing 7 changed files with 919 additions and 0 deletions.
72 changes: 72 additions & 0 deletions .github/workflows/CI.yml
@@ -0,0 +1,72 @@
# This file is part of Kubernetes Log Fetcher.
#
# Copyright (C) 2023 Airbus CyberSecurity SAS
#
# Kubernetes Log Fetcher is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option) any
# later version.
#
# Kubernetes Log Fetcher is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
# details.
#
# You should have received a copy of the GNU General Public License along with
# Kubernetes Log Fetcher. If not, see <https://www.gnu.org/licenses/>.

name: CI

on:
  push:

jobs:
  docker_build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build docker
        uses: docker/build-push-action@v5
        with:
          context: kubernetes-log-fetcher
          tags: kubernetes-log-fetcher
          load: true
      - name: Save results
        run: docker save kubernetes-log-fetcher | gzip > kubernetes-log-fetcher.docker.tar.gz
      - name: Prepare package
        run: zip kubernetes-log-fetcher.docker.zip COPYING LICENSE kubernetes-log-fetcher.docker.tar.gz
      # Store build as artifact
      - name: Archive docker images as artifact
        uses: actions/upload-artifact@v4
        with:
          name: kubernetes-log-fetcher
          path: kubernetes-log-fetcher.docker.zip
          if-no-files-found: error

  docker_release:
    runs-on: ubuntu-latest
    needs: docker_build
    if: ${{ startsWith(github.ref, 'refs/tags/') }}
    steps:
      - name: Download plugin
        uses: actions/download-artifact@v4
        with:
          name: kubernetes-log-fetcher
      - name: Extract artifact
        run: unzip kubernetes-log-fetcher.docker.zip -d kubernetes-log-fetcher
      - name: Load images
        run: docker load --input kubernetes-log-fetcher/kubernetes-log-fetcher.docker.tar.gz
      - name: Tag image
        run: docker tag kubernetes-log-fetcher kubernetes-log-fetcher:${{ github.ref_name }}
      - name: Save newly tagged image
        run: docker save kubernetes-log-fetcher:${{ github.ref_name }} | gzip > kubernetes-log-fetcher-${{ github.ref_name }}.docker.tar.gz
      - name: Prepare license files
        run: mv kubernetes-log-fetcher/LICENSE kubernetes-log-fetcher/COPYING .
      - name: Prepare release
        run: zip kubernetes-log-fetcher-${{ github.ref_name }}.docker.zip LICENSE COPYING kubernetes-log-fetcher-${{ github.ref_name }}.docker.tar.gz
      - name: Release
        uses: softprops/action-gh-release@v1
        with:
          files: kubernetes-log-fetcher-${{ github.ref_name }}.docker.zip
          fail_on_unmatched_files: true
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.vscode/
674 changes: 674 additions & 0 deletions COPYING

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions LICENSE
@@ -0,0 +1,16 @@
Kubernetes Log Fetcher

Copyright (C) 2023 Airbus CyberSecurity SAS

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
57 changes: 57 additions & 0 deletions README.md
@@ -0,0 +1,57 @@
# Kubernetes log fetcher

The purpose of this software is to collect logs from a Kubernetes cluster.

Kubernetes log fetcher runs on Docker. The directory "kubernetes-log-fetcher" contains the Dockerfile used to build the log collector container, which runs get_logs.sh. The script writes the logs to files. One possible use is to forward those logs to Graylog through rsyslog.

## License

Kubernetes log fetcher

Copyright (C) 2023 Airbus CyberSecurity SAS

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.

### Third-party software usage

This program uses the following software to run:

| Software | Version | Copyright | License |
|-|-|-|-|
| Bitnami package for Kubectl | 1^ | 2023 VMware, Inc. | Apache-2.0 |
| jq | 1^ | 2012 Stephen Dolan | MIT |
| Kubectl | 1^ | 2023 Kubernetes | Apache-2.0 |

## Log Format

Predicting the exact format of the logs retrieved from a Kubernetes cluster is not easy. The format depends mainly on the applications running in that cluster, and every vendor represents the logs of its applications differently.
However, in some cases we can understand the logs without knowing the format beforehand:
- When the logs are in JSON format, we have both the fields and the values, which is easy to parse and makes the format intelligible. This is the case for the Kubernetes events that we monitor (kubectl get events).
- Every cluster has some containers that are always present, because every Kubernetes distribution follows some common rules. This is why they all have containers for the API Server, Scheduler, Controller Manager, and etcd. If we study their log format, we know that every distribution will produce roughly the same logs, even if some have enhanced capabilities.
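
As an illustration of why JSON events are easy to work with, here is a minimal sketch that condenses each event into one tab-separated line. It assumes one JSON event object per line, as get_logs.sh writes to /var/log/cloud_cluster/events.log, and it assumes jq is installed. The function name `summarize_events` is hypothetical and not part of this repository; the fields used (`lastTimestamp`, `metadata.namespace`, `type`, `reason`, `message`) are standard fields of a Kubernetes Event.

```shell
#!/bin/bash
# Hypothetical helper, not part of get_logs.sh.
# Reads one JSON event per line (stdin or the files given as arguments) and
# prints: lastTimestamp, namespace, type, reason, message, separated by tabs.
summarize_events () {
    # map(tostring) turns any remaining non-string value into text for @tsv.
    jq -r '[.lastTimestamp, .metadata.namespace, .type, .reason, .message]
           | map(tostring) | @tsv' "$@"
}

# Example usage (path taken from get_logs.sh):
#   summarize_events /var/log/cloud_cluster/events.log
```

Keeping the output tab-separated makes it simple to filter further with standard tools such as awk or cut.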

Every container log line starts with a header that identifies where the log comes from:
`[resource_type/pod_name/container_name] <TIMESTAMP> <MESSAGE>`

For the controller manager:
[pod/cloud-controller-manager-cloud-cluster/cloud-controller-manager] 2023-05-31T06:48:34.354618248Z W0531 06:48:34.354571 1 controllermanager.go:288] "service" is disabled

For etcd:
[pod/etcd-cloud-cluster/etcd] 2023-06-23T12:51:06.225562796Z {"level":"info","ts":"2023-06-23T12:51:06.225Z","caller":"fileutil/purge.go:77","msg":"purged","path":"/var/lib/rancher/rke2/server/db/etcd/member/snap/0000000000000003-0000000000dc5c4e.snap"}
In this case, the message is in JSON format, so it can be parsed easily.

For the API server:
[pod/kube-apiserver-cloud-cluster/kube-apiserver] 2023-06-23T12:52:15.539558976Z W0623 12:52:15.539458 1 watcher.go:229] watch chan error: etcdserver: mvcc: required revision has been compacted

For the scheduler:
[pod/kube-scheduler-cloud-cluster/kube-scheduler] 2023-05-31T06:47:57.154583108Z I0531 06:47:57.154215 1 leaderelection.go:258] successfully acquired lease kube-system/kube-scheduler
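
The header described above can be split with plain shell parameter expansion. The following is a minimal sketch, assuming every line follows the `[resource_type/pod_name/container_name] <TIMESTAMP> <MESSAGE>` layout shown in the examples. The function names `parse_log_line` and `message_is_json` are hypothetical and not part of this repository.

```shell
#!/bin/bash
# Hypothetical helpers, not part of get_logs.sh.

# Split one collected log line into its header fields.
# On success, sets: resource_type, pod_name, container_name, timestamp, message.
# Returns 1 if the line does not start with a "[type/pod/container] " header.
parse_log_line () {
    line=$1
    case $line in
        "["*"] "*) ;;
        *) return 1 ;;
    esac
    header=${line%%"] "*}       # everything before the first "] "
    header=${header#"["}        # drop the leading "["
    rest=${line#*"] "}          # everything after the first "] "
    case $header in
        */*/*) ;;
        *) return 1 ;;
    esac
    resource_type=${header%%/*}
    pod_and_container=${header#*/}
    pod_name=${pod_and_container%%/*}
    container_name=${pod_and_container#*/}
    timestamp=${rest%% *}       # first space-separated token
    message=${rest#* }          # remainder of the line
    return 0
}

# Succeed if the message looks like a JSON object (as in the etcd example).
message_is_json () {
    case $1 in
        "{"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A consumer could call `parse_log_line` on each line and route messages for which `message_is_json` succeeds to a JSON parser, leaving the others as plain text.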
8 changes: 8 additions & 0 deletions kubernetes-log-fetcher/Dockerfile
@@ -0,0 +1,8 @@
FROM bitnami/kubectl

COPY get_logs.sh get_logs.sh
VOLUME /var/log/cloud_cluster/

HEALTHCHECK CMD kubectl cluster-info

ENTRYPOINT ["./get_logs.sh"]
91 changes: 91 additions & 0 deletions kubernetes-log-fetcher/get_logs.sh
@@ -0,0 +1,91 @@
#!/bin/bash

# This file is part of Kubernetes Log Fetcher.
#
# Copyright (C) 2023 Airbus CyberSecurity SAS
#
# Kubernetes Log Fetcher is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option) any
# later version.
#
# Kubernetes Log Fetcher is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
# details.
#
# You should have received a copy of the GNU General Public License along with
# Kubernetes Log Fetcher. If not, see <https://www.gnu.org/licenses/>.

## Functions
# Function to search for pattern in array
array_contains () {
    local array="$1[@]"
    local seeking=$2
    local in=1
    for element in "${!array}"; do
        if [[ $element == "$seeking"* ]]; then
            in=0
            break
        fi
    done
    return $in
}

while true; do

    ## Launch a "kubectl logs" command for each pod, unless that command is already running on the system.
    # Retrieve all running pods in all namespaces.
    pods=$(kubectl get pods -o=jsonpath="{.items[*]['metadata.name', 'metadata.namespace']}" --all-namespaces --field-selector=status.phase=Running)

    # Create an array from the pods variable.
    read -ra pod_array <<< "$pods"

    # Compute the middle index of the array (the first half contains the pod names, the second half the pod namespaces).
    mid_index=$(( ${#pod_array[@]} / 2 ))

    # Retrieve the command lines of running processes containing "kubectl logs".
    mapfile -t ps_array < <(ps -A --no-headers --format cmd | grep "[k]ubectl logs")

    # Parse the array.
    for ((i=0; i<mid_index; i++)); do
        pod="${pod_array[i]}"
        namespace="${pod_array[i+mid_index]}"

        # Check if the kubectl logs command is already running on the system for the current pod/namespace pair.
        # If it is not the case, run the command.
        if ! array_contains ps_array "kubectl logs $pod -n $namespace"; then
            # Check if a log file already exists for that pod/namespace pair
            log_file="/var/log/cloud_cluster/kubelogs/${namespace}_${pod}.log"
            if [ -e "$log_file" ]; then
                # Retrieve the timestamp of the last entry of the log file (format 2023-04-28T10:58:18.328114804Z)
                last_log=$(tail -n 1 "$log_file" | awk -F" " '{print $2}')
                timestamp="file lastline"
                # Check if the retrieved date is valid; if not, take the last modification time of the file.
                # This ensures the kubectl command does not fail.
                if ! [[ $last_log =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]*Z$ ]] || ! date -d "$last_log" &> /dev/null; then
                    # %M is minutes and %N is nanoseconds; -u prints UTC to match the trailing "Z".
                    last_log=$(date -u -r "$log_file" "+%Y-%m-%dT%H:%M:%S.%NZ")
                    timestamp="file timestamp"
                fi

                # Continue log where it was left off
                echo "[INFO] Resuming log collection ($timestamp) for $pod : $namespace" >> /proc/1/fd/1
                kubectl logs "$pod" -n "$namespace" --prefix=true --timestamps=true --all-containers=true --since-time="$last_log" --follow >> "$log_file" &
            else
                # Initial log provisioning
                echo "[INFO] Initializing log collection for $pod : $namespace" >> /proc/1/fd/1
                kubectl logs "$pod" -n "$namespace" --prefix=true --timestamps=true --all-containers=true --follow > "$log_file" &
            fi
        fi
    done

    ## Launch the kubectl get events command unless it is already running on the system
    if ! ps -A --no-headers --format cmd | grep -q "[k]ubectl get events"; then
        echo "[INFO] Launching/Resuming kubectl get events command" >> /proc/1/fd/1
        kubectl get events --all-namespaces -o json --watch | jq 'recurse(.[]?) |= if . == null then "null" else . end' -c | jq 'if .items then .items[] else . end' -c > /var/log/cloud_cluster/events.log &
    fi

    # Loop every minute
    sleep 60
    echo "############ LOOP ############" >> /proc/1/fd/1
done
