Zertrin monitors Federez's services from his personal server (zertrin.org) with monit. See the current configuration: https://haste.zertrin.org/rafugamege.txt (NB: federez-monit@zertrin.org is a redirect to federez@googlegroups.com, the technical team's fallback mailing list.)
FIXME Document how monitoring is set up on Federez's servers.
=== Setup ===
When setting up a new server, install monit so that it watches the base services and any others running on the machine:
<code bash>
apt-get install monit
</code>
Then paste the following configuration into /etc/monit/monitrc:
<code>
# monit configuration
# Only global settings go here; the list of monitored services is in /etc/monit/services
# Machine-specific settings can be added in /etc/monit/conf.d/
set daemon 60
set logfile /var/log/monit.log
set mailserver localhost, smtp.crans.org
set alert monitoring@federez.net { uid gid size nonexist data icmp invalid exec timeout resource checksum timestamp connection permission }

# localhost must be able to reach monit (the monit command-line tool uses this HTTP interface)
set httpd port 2812 and
    use address localhost   # only accept connections from localhost
    allow localhost         # allow localhost to connect to the server

set mail-format {
from: monit@$HOST
subject: monit alert -- $EVENT $SERVICE
message: $EVENT Service $SERVICE
Date: $DATE
Action: $ACTION
Host: $HOST
Description: $DESCRIPTION
Monit, Federez's only employee,
}

include /etc/monit/services
include /etc/monit/conf.d/*
</code>
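Monit refuses to start when its control file is more permissive than 0700, so the pasted monitrc should end up readable by its owner only. The shell function below is a hypothetical helper, not something shipped with monit: it assumes the edited file is passed as the first argument, installs it with mode 600 (default destination /etc/monit/monitrc), and, when monit is installed, runs `monit -t` to check the syntax of the result.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with monit): install a control file with
# owner-only permissions, then let monit check its syntax if monit is present.
install_monitrc() {
    src=$1
    dest=${2:-/etc/monit/monitrc}

    # monit rejects a control file with permissions wider than 0700.
    install -m 600 "$src" "$dest" || return 1

    # `monit -t` only parses the control file given with -c and reports errors.
    if command -v monit >/dev/null 2>&1; then
        monit -t -c "$dest"
    fi
}
```

For example, running `install_monitrc ./monitrc` as root and then `monit reload` would apply an edited local copy of the configuration.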
Finally, list the services to monitor in /etc/monit/services, the file included at the end of monitrc.
At a minimum, ssh, nslcd, nscd and munin-node must be monitored.
Adapt the list to the services actually running on the machine; the example below also covers freeradius and fail2ban.
<code>
# Services managed by monit (/etc/monit/services)

# freeradius
check process freeradius with pidfile /var/run/freeradius/freeradius.pid
    start program = "/etc/init.d/freeradius start"
    stop program = "/etc/init.d/freeradius stop"
    if 5 restarts within 5 cycles then timeout

# nslcd
check process nslcd with pidfile /var/run/nslcd/nslcd.pid
    start program = "/usr/sbin/service nslcd start"
    stop program = "/usr/sbin/service nslcd stop"
    if failed unixsocket /var/run/nslcd/socket then restart
    if 5 restarts within 5 cycles then timeout

# nscd
check process nscd with pidfile /var/run/nscd/nscd.pid
    start program = "/usr/sbin/service nscd start"
    stop program = "/usr/sbin/service nscd stop"
    if failed unixsocket /var/run/nscd/socket then restart
    if 5 restarts within 5 cycles then timeout

# fail2ban (checked through its own control socket, not the ssh port)
check process fail2ban with pidfile /var/run/fail2ban/fail2ban.pid
    start program = "/etc/init.d/fail2ban start"
    stop program = "/etc/init.d/fail2ban stop"
    if failed unixsocket /var/run/fail2ban/fail2ban.sock then restart
    if 5 restarts within 5 cycles then timeout

# ssh
check process ssh with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program = "/etc/init.d/ssh stop"
    if failed port 22 protocol ssh timeout 30 seconds then restart
    if children > 200 then restart
    if 5 restarts within 5 cycles then timeout

# munin-node
check process munin-node with pidfile /var/run/munin/munin-node.pid
    start program = "/usr/sbin/service munin-node start"
    stop program = "/usr/sbin/service munin-node stop"
    if 5 restarts within 5 cycles then timeout
</code>
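The entries above assume Debian pidfile locations. When a pidfile never appears, the service is either absent from the machine or writes its pidfile elsewhere, and monit will report it as not running and try to start it. As a hypothetical sanity check (not part of monit), the function below reads the `check process NAME with pidfile PATH` lines of a services file (default /etc/monit/services) and reports whether each pidfile currently exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of monit): for every
# "check process NAME with pidfile PATH" entry in a monit services file,
# print whether PATH currently exists on this machine.
check_pidfiles() {
    services=${1:-/etc/monit/services}

    # Fields: $1=check $2=process $3=NAME $4=with $5=pidfile $6=PATH
    awk '$1 == "check" && $2 == "process" && $4 == "with" && $5 == "pidfile" { print $3, $6 }' "$services" |
    while read -r name pidfile; do
        if [ -f "$pidfile" ]; then
            echo "ok      $name ($pidfile)"
        else
            echo "missing $name ($pidfile)"
        fi
    done
}
```

Each `missing` line points at an entry whose path should be corrected, or which should be removed, before reloading monit.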