Move and rename files and add state machine

This commit is contained in:
Robin Meier 2024-03-19 22:02:37 +01:00
parent 53cb304d7e
commit 699a2e16e8
19 changed files with 189 additions and 102 deletions

3
.gitignore vendored
View File

@ -3,6 +3,9 @@
.last_changelog_read
storage/
log/
config/*
!config/*.EXAMPLE
# This file is unused atm
docker_health_check.sh

View File

@ -9,6 +9,8 @@ The covered tasks range from file change tracking via http/ssh monitoring to zfs
## Installation
The scripts in this repo ***must*** be checked out into `/root/scripts`.
This is subject to change.
The scripts should be installable wherever.
```bash
cd /root
@ -30,21 +32,25 @@ This will be helpful when updating the admin scripts later on.
### Config Files
For each script there is a `.[script_name]_env.EXAMPLE` file, which you must copy (remove `.EXAMPLE` part) and edit while providing your own information.
Config files are located in the `config/` directory.
Each script has its own cofiguration file, in this repo there only are the `config/[script_name].EXAMPLE` example configuration files.
For each script you want to use, you must copy the example and fill in your own data.
For example (`monitoring.sh`):
```bash
SCRIPT_NAME=zfs_health_check
cp /root/scripts/.${SCRIPT_NAME}_env.EXAMPLE /root/scripts/.${SCRIPT_NAME}_env
vim /root/scripts/.${SCRIPT_NAME}_env
cd /root/scripts
cp config/monitoring.EXAMPLE config/monitoring
vim config/monitoring
```
If you want to use the example configuration, you could symbolic link the files instead of just copying them.
This really only makes sense for `.system_health_check`.
If you want to use the provided example configuration, you could symbolic link the files instead of just copying them.
This really only makes sense for `system_health_check`.
The command for this is:
```bash
cd /root/scripts
ln -s .system_health_check.EXAMPLE .system_health_check
ln -s config/system_health_check.EXAMPLE config/system_health_check
```
### Shutdown Notification
@ -64,7 +70,7 @@ To install the [startup helper script](#using-startup-helper) into the regular u
```bash
USRNAME=radioelephant
cp /root/scripts/post_startup.sh /home/$USRNAME/post_startup.sh
cp /root/scripts/.post_startup_env.EXAMPLE /home/$USRNAME/.post_startup_env
cp /root/scripts/config/post_startup.EXAMPLE /home/$USRNAME/.post_startup_env
chown $USRNAME:$USRNAME /home/$USRNAME/post_startup.sh
chown $USRNAME:$USRNAME /home/$USRNAME/.post_startup_env
vim /home/$USRNAME/.post_startup_env
@ -83,10 +89,10 @@ cd /root/scripts
git pull
```
For most of the scripts you only need to check if the `.[script_name]_env.EXAMPLE` has changed and contains different keys than your copied `.[script_name]_env` file.
For your convenience, changes to environment variable files will be documented in the [CHANGELOG](CHANGELOG.md).
If you followed the instructions in this README, then you will find the last time you updated this repository in the `.last_changelog_read` file.
Read it with `cat /root/scripts/.last_changelog_read`.
For most of the scripts you only need to check if the example configuration file (i.e. `config/monitoring.EXAMPLE`) has changed and contains different keys than your copy (i.e. `config/monitoring`).
For your convenience, changes to configuration files will be documented in the [CHANGELOG](CHANGELOG.md).
If you followed the instructions in this README, then you will find the last time you pulled this repository and read the [CHANGELOG](CHANGELOG.md) in the `.last_changelog_read` file.
Get the value with `cat /root/scripts/.last_changelog_read`.
**Make sure to update the last reading time file after reading the CHANGELOG with `date > /root/scripts/.last_changelog_read`**
@ -123,9 +129,11 @@ This will be noted in the CHANGELOG for your convenience.
The check and monitoring scripts in this repo can be run periodically be run and if any problems are detected, they produce output.
The output of these scripts can be redirected and used however you like.
Typically I redirect the output to the `telegram_notification.sh` script which notifies me of any noisy scripts.
Typically I redirect the output to the `helpers/tg_notify.sh` script which notifies me of any noisy scripts.
In case of expected repeating failures, I first redirect the ouput to `helpers/state_machine.sh "keyword"` which silences repeated messages.
The "state machine" only saves a copy of the last message per keyword and compares it to the next message.
Regardless of any problems each script also logs its executions in `/root/logs`.
Regardless of any problems each script also logs its executions under `logs/`
Make sure you created this folder during [installation](#installation).
### Crontab Scheduling
@ -137,20 +145,20 @@ If you are unsure about the cron schedule, use [Crontab Guru](https://crontab.gu
My current crontab looks like this:
```crontab
* * * * * bash -c '/root/scripts/file_monitor.sh | /root/scripts/telegram_notification.sh'
*/2 * * * * bash -c '/root/scripts/monitoring.sh | /root/scripts/telegram_notification.sh'
*/4 * * * * bash -c '/root/scripts/dyndns.sh | /root/scripts/telegram_notification.sh'
*/3 * * * * bash -c '/root/scripts/system_health_check.sh | /root/scripts/telegram_notification.sh'
15 * * * * bash -c '/root/scripts/docker_health_check.sh | /root/scripts/telegram_notification.sh'
*/15 * * * * bash -c '/root/scripts/zfs_health_check.sh | /root/scripts/telegram_notification.sh'
* * * * * bash -c 'cd /root/scripts && ./file_monitor.sh | ./helpers/tg_notify.sh'
*/2 * * * * bash -c 'cd /root/scripts && ./monitoring.sh | ./helpers/state_machine.sh "monitoring" | ./helpers/tg_notify.sh'
*/4 * * * * bash -c 'cd /root/scripts && ./dyndns.sh | ./helpers/tg_notify.sh'
*/3 * * * * bash -c 'cd /root/scripts && ./system_health_check.sh | ./helpers/state_machine.sh "system" | ./helpers/tg_notify.sh'
15 * * * * bash -c 'cd /root/scripts && ./docker_health_check.sh | ./helpers/state_machine.sh "docker" | ./helpers/tg_notify.sh'
*/15 * * * * bash -c 'cd /root/scripts && ./zfs_health_check.sh | ./helpers/state_machine.sh "zfs" | ./helpers/tg_notify.sh'
@reboot sleep 10 && /root/scripts/telegram_notification.sh '[STARTUP] System just booted'
@reboot sleep 30 && bash -c '/root/scripts/zfs_health_check.sh | /root/scripts/telegram_notification.sh'
@reboot sleep 10 && /root/scripts/helpers/tg_notify.sh '[STARTUP] System just booted'
@reboot sleep 30 && bash -c 'cd /root/scripts && ./zfs_health_check.sh | ./helpers/state_machine.sh "zfs" | ./helpers/tg_notify.sh'
```
Adapt this to your needs, you might also implement other checks and only use the `telegram_notification.sh` script from this repo.
Adapt this to your needs, you might also implement other checks and only use the `helpers/tg_notify.sh` script from this repo.
Or you might implement your own notification script to notify you via another service.
The `telegram_notification.sh` can easily be adapted (just remove comment) to forward all notifications to `STDOUT` which typically makes cron send a mail.
The `helpers/tg_notify.sh` can easily be adapted (just remove comment) to forward all notifications to `STDOUT` which typically makes cron send a mail.
### Using Startup Helper

View File

@ -0,0 +1,2 @@
STORAGE_PATH=/root/scripts/storage/state_machine # NO trailing slash
RENOTIFY_AGE_SEC=7200 # In seconds (2h)

View File

@ -1,18 +1,15 @@
#!/bin/bash
logfile=/root/logs/dyndns.log
log_identifier="[DNS]"
log() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] $log_identifier" >> $logfile
}
log_echo() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] $log_identifier" | tee -a $logfile
}
# Load configuration
set -o allexport
source /root/scripts/.dyndns_env
source /root/scripts/config/dyndns
set +o allexport
# Import logging functionality
logfile=/root/scripts/log/dyndns.log
log_identifier="DNS"
source /root/scripts/functions/logging.sh
url="https://${USERNAME}:${PASSWORD}@infomaniak.com/nic/update?hostname="
log "Updating DynDNS for ${MAIN_DOMAIN}"

View File

@ -8,30 +8,28 @@
# Author: Robin Meier - robin@meier.si
################################################################################
logfile=/root/logs/file_monitor.log
log_identifier="[FILE]"
log() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] $log_identifier" >> $logfile
}
log_echo() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] $log_identifier" | tee -a $logfile
}
# Load configuration
set -o allexport
source /root/scripts/.file_monitor_env
source /root/scripts/config/file_monitor
set +o allexport
# Import logging functionality
logfile=/root/scripts/log/file_monitor.log
log_identifier="FILE"
source /root/scripts/functions/logging.sh
# Make sure directory exists
mkdir -p /root/scripts/storage/file_monitor
for file in $FILES
do
# Touch storage file if not existing
if [ ! -f /root/scripts/storage/file_monitor/${file//\//_} ]; then
touch /root/scripts/storage/file_monitor/${file//\//_} ];
touch /root/scripts/storage/file_monitor/${file//\//_}
fi
if [ "$file" -nt "/root/scripts/storage/file_monitor/${file//\//_}" ]; then
log_echo "[CHANGE] $file"
touch /root/scripts/storage/file_monitor/${file//\//_} ];
touch /root/scripts/storage/file_monitor/${file//\//_}
fi
done

19
functions/logging.sh Executable file
View File

@ -0,0 +1,19 @@
#!/bin/bash
if [ -z "$logfile" ]; then
echo "logfile variable missing"
exit 1
fi
if [ -z "$log_identifier" ]; then
echo "log_identifier variable missing"
exit 1
fi
log() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] [$log_identifier]" >> $logfile
}
log_echo() {
log $@
echo -e "[$log_identifier] $@"
}

78
helpers/state_machine.sh Executable file
View File

@ -0,0 +1,78 @@
#!/bin/bash
################################################################################
# STATE_MACHINE.SH
# ----------------
# This script saves the last message for a certain key and compares the next
# message for the same key to not have any repeating notifications.
#
# Author: Robin Meier - robin@meier.si
################################################################################
set -o allexport
source /root/scripts/config/state_machine
set +o allexport
mkdir -p $STORAGE_PATH # Make sure STORAGE_PATH exists
# Get input from standard input or via first parameter
if [[ $# -eq 0 ]]; then
echo "[ERROR] Not enough arguments!"
exit 1
elif [[ $# -eq 1 ]]; then
MESSAGE=$(timeout 32 cat)
KEY=$1
elif [[ $# -eq 2 ]]; then
KEY=$1
MESSAGE=$2
else
echo "[ERROR] Too many arguments!"
exit 1
fi
# Check if KEY is empty
if [[ -z "${KEY}" ]]; then
echo "[ERROR] KEY argument is missing!"
exit 1
fi
KEY_FILE="${STORAGE_PATH}/${KEY}.txt"
if [[ -f $KEY_FILE && -z "${MESSAGE}" ]]; then
# Previous message present and empty message now
OLD_MESSAGE=$(cat $KEY_FILE)
echo "✅ Resolved"
echo "$OLD_MESSAGE"
rm $KEY_FILE
exit 0
elif [[ -f $KEY_FILE ]]; then
# Message and previous message present
OLD_MESSAGE=$(cat $KEY_FILE)
# Compare contents
if [[ "$OLD_MESSAGE" == "$MESSAGE" ]]; then
# Check last notification
if [ "$(( $(date +"%s") - $(stat -c "%Y" "$KEY_FILE") ))" -gt "$RENOTIFY_AGE_SEC" ]; then
touch $KEY_FILE
echo "‼Renotify"
else
exit 0
fi
else
echo "$MESSAGE" > $KEY_FILE
echo "⁉Changed"
fi
else
if [[ -z "${MESSAGE}" ]]; then
# No message present
exit 0
fi
# New message present, create KEY_FILE, continue to relaying
echo "$MESSAGE" > $KEY_FILE
echo "❗New"
fi
# Relay message if made it until here (Quotes are important here, so lines dont get .join(' ')-ed)
echo "$MESSAGE"
exit 0

View File

@ -1,24 +1,23 @@
#!/bin/bash
################################################################################
# TELEGRAM_NOTIFICATION.SH
# ------------------------
# This script takes input via stdin or parameters, removes timestamps from each
# line and replaces newlines with telegram compatible ones and then sends the
# message to a chat
# TG_NOTIFY.SH
# ------------
# This script takes input via stdin or parameters, replaces newlines with
# telegram compatible ones and then sends the message to a chat
#
# Author: Robin Meier - robin@meier.si
################################################################################
set -o allexport
source /root/scripts/.telegram_notification_env
source /root/scripts/config/tg_notify
set +o allexport
BOT_API_URL=https://api.telegram.org/bot${BOT_TOKEN}
# Get input from standard input or via first parameter
if [[ $# -eq 0 ]]; then
MESSAGE=$(timeout 30 cat)
MESSAGE=$(timeout 32 cat)
elif [[ $# -eq 1 ]]; then
MESSAGE=$1
elif [[ $# -eq 2 ]]; then
@ -33,11 +32,6 @@ if [[ -z "${MESSAGE}" ]]; then
exit 0
fi
# Strip timestamps from message
if [ "${MESSAGE:0:12}" == "$(echo '' | ts "[%Y-%m-%d")" ]; then
MESSAGE=$(echo -e "$MESSAGE" | cut -c 23-)
fi
# Replace newlines in message for telegram
TG_MESSAGE=${MESSAGE//$'\n'/\%0A}

View File

@ -1,18 +1,14 @@
#!/bin/bash
logfile=/root/logs/monitoring.log
log_identifier="[MON]"
log() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] $log_identifier" >> $logfile
}
log_echo() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] $log_identifier" | tee -a $logfile
}
# Load configuration
set -o allexport
source /root/scripts/.monitoring_env
source /root/scripts/config/monitoring
set +o allexport
# Import logging functionality
logfile=/root/scripts/log/monitoring.log
log_identifier="MON"
source /root/scripts/functions/logging.sh
problems=0
@ -23,28 +19,24 @@ do
if [[ $(nc -w 2 ${ssh_host//:/ } <<< "\0" ) =~ "OpenSSH" ]] ; then
log "[SSH] [OK] ${ssh_host} is reachable"
else
# TODO: Rate limit fail messages, also add is back up message
log_echo "[SSH] [FAIL] ${ssh_host} not reachable"
problems=1
fi
done
# TODO: HTTP Status Code 200 Monitoring
for http_host in $HTTP_MONITORING
do
status_code=$(curl --write-out %{http_code} --silent --output /dev/null $http_host)
if [[ "$status_code" -eq 200 ]] ; then
log "[WEB] [OK] ${http_host}"
else
# TODO: Rate limit fail messages, also add is back up message
log_echo "[WEB] [FAIL] ${http_host} status code is ${status_code}"
problems=1
fi
done
if [[ "$problems" -eq "0" ]]; then
log "Monitoring Run Successful"
else
log_echo "Monitoring Run Failed"
log "Monitoring Run Failed"
fi

View File

@ -5,7 +5,7 @@ Before=shutdown.target
[Service]
Type=oneshot
ExecStart=/root/scripts/telegram_notification.sh "[SHUTDOWN] System going down"
ExecStart=/root/scripts/helpers/tg_notify.sh "[SHUTDOWN] System going down"
TimeoutStartSec=0
[Install]

View File

@ -8,19 +8,15 @@
# Author: Robin Meier - robin@meier.si
################################################################################
logfile=/root/logs/system_health_check.log
log_identifier="[SYS]"
log() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] $log_identifier" >> $logfile
}
log_echo() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] $log_identifier" | tee -a $logfile
}
# Load configuration
set -o allexport
source /root/scripts/.system_health_check_env
source /root/scripts/config/system_health_check
set +o allexport
# Import logging functionality
logfile=/root/scripts/log/system_health_check.log
log_identifier="SYS"
source /root/scripts/functions/logging.sh
problems=0
@ -28,8 +24,9 @@ log "Starting System Health Check"
# RAM usage percentage
ram=$(free | awk '/Mem/{printf("%.2f"), $3/$2*100}')
if [ $(echo "$ram > $RAM_LIMT" | bc -l) -eq 1 ]; then
log_echo "[RAM] usage is ${ram}%! (Limit: $RAM_LIMIT)"
if [ $(echo "$ram > $RAM_LIMIT" | bc -l) -eq 1 ]; then
log_echo "[RAM] usage is abobe limit of ${RAM_LIMIT}%!"
log "[RAM] usage is ${ram}%! (Limit: $RAM_LIMIT)"
problems=1
else
log "[RAM] usage is ${ram}%"
@ -38,7 +35,8 @@ fi
# CPU usage percentage
cpu=$(top -bn1 | grep "Cpu(s)" | awk '{print $2 + $4}')
if [ $(echo "$cpu > $CPU_LIMIT" | bc -l) -eq 1 ]; then
log_echo "[CPU] load is ${cpu}%! (Limit: $CPU_LIMIT)"
log_echo "[CPU] load is above limit of ${CPU_LIMIT}%!"
log "[CPU] load is ${cpu}%! (Limit: $CPU_LIMIT)"
problems=1
else
log "[CPU] load is ${cpu}%"
@ -50,7 +48,8 @@ fi
# Temperature
avg_cpu_temp=$(sensors | awk '/^Core /{++r; gsub(/[^[:digit:]]+/, "", $3); s+=$3} END{print s/(10*r)}')
if [ $(echo "$avg_cpu_temp > $TEMP_LIMIT" | bc -l) -eq 1 ]; then
log_echo "[TEMP] is ${avg_cpu_temp}°C! (Limit: $TEMP_LIMIT)"
log_echo "[TEMP] is above limit of ${TEMP_LIMIT}°C!"
log "[TEMP] is ${avg_cpu_temp}°C! (Limit: $TEMP_LIMIT)"
problems=1
else
log "[TEMP] is ${avg_cpu_temp}°C"
@ -71,5 +70,5 @@ fi
if [ ${problems} -eq 0 ]; then
log "System Health Check Successful"
else
log_echo "System Health Check Found Problems"
log "System Health Check Found Problems"
fi

View File

@ -10,21 +10,18 @@
# Author: Robin Meier - robin@meier.si
################################################################################
logfile=/root/logs/zfs_health_check.log
log_identifier="[ZFS]"
log() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] $log_identifier" >> $logfile
}
log_echo() {
echo -e $@ | ts "[%Y-%m-%d %H:%M:%S] $log_identifier" | tee -a $logfile
}
# Load configuration
set -o allexport
source /root/scripts/config/zfs_health_check
set +o allexport
# Import logging functionality
logfile=/root/scripts/log/zfs_health_check.log
log_identifier="ZFS"
source /root/scripts/functions/logging.sh
problems=0
set -o allexport
source /root/scripts/.zfs_health_check_env
set +o allexport
log "Starting ZFS Health Check"
# Pool Status
@ -123,5 +120,5 @@ done
if [ ${problems} -eq 0 ]; then
log "ZFS Health Check Successful"
else
log_echo "ZFS Health Check Found Problems"
log "ZFS Health Check Found Problems"
fi