[[:admin|< back to the technical administration page]]

Zertrin monitors the Federez services from his personal server (zertrin.org) using monit. See the current configuration: https://haste.zertrin.org/rafugamege.txt

(NB: federez-monit@zertrin.org is a redirect to federez@googlegroups.com, which is the technical team's fallback mailing list.)

FIXME Documentation of the monitoring running on the Federez servers.

=== Setup ===

When installing a new server, it is best to install monit to watch the base services and any others.

<code>
apt-get install monit
</code>

Next, paste the following configuration into /etc/monit/monitrc:

<code>
# monit configuration
# Only the general settings go here; the list of monitored services is in "services"
# Specific configuration can be added in conf.d

# Poll the services every 60 seconds
set daemon 60
set logfile /var/log/monit.log
set mailserver localhost, smtp.crans.org
set alert monitoring@federez.net { uid gid size nonexist data icmp invalid exec timeout resource checksum timestamp connection permission }

# localhost must be able to reach monit
set httpd port 2812 and
    use address localhost  # only accept connection from localhost
    allow localhost        # allow localhost to connect to the server

set mail-format {
  from: monit@$HOST
  subject: monit alert -- $EVENT $SERVICE
  message: $EVENT Service $SERVICE
    Date:        $DATE
    Action:      $ACTION
    Host:        $HOST
    Description: $DESCRIPTION

  Monit, unique employé de federez,
}

include /etc/monit/services
include /etc/monit/conf.d/*
</code>

Finally, list the services to monitor in /etc/monit/services (the file included by monitrc above). At a minimum, ssh, nslcd, nscd and munin-node should be monitored. Adapt the list to the services actually running on the machine.
<code>
# Services managed by monit

# freeradius
check process freeradius with pidfile /var/run/freeradius/freeradius.pid
  start program = "/etc/init.d/freeradius start"
  stop program = "/etc/init.d/freeradius stop"
  if 5 restarts within 5 cycles then timeout

# nslcd
check process nslcd with pidfile /var/run/nslcd/nslcd.pid
  start program = "/usr/sbin/service nslcd start"
  stop program = "/usr/sbin/service nslcd stop"
  if failed unixsocket /var/run/nslcd/socket then restart
  if 5 restarts within 5 cycles then timeout

# nscd
check process nscd with pidfile /var/run/nscd/nscd.pid
  start program = "/usr/sbin/service nscd start"
  stop program = "/usr/sbin/service nscd stop"
  if failed unixsocket /var/run/nscd/socket then restart
  if 5 restarts within 5 cycles then timeout

# fail2ban
check process fail2ban with pidfile /var/run/fail2ban/fail2ban.pid
  start program = "/etc/init.d/fail2ban start"
  stop program = "/etc/init.d/fail2ban stop"
  # Restarts fail2ban when the SSH port stops answering
  if failed port 22 protocol ssh timeout 30 seconds then restart
  if children > 200 then restart
  if 5 restarts within 5 cycles then timeout

# ssh
check process ssh with pidfile /var/run/sshd.pid
  start program = "/etc/init.d/ssh start"
  stop program = "/etc/init.d/ssh stop"
  if failed port 22 protocol ssh timeout 30 seconds then restart
  if children > 200 then restart
  if 5 restarts within 5 cycles then timeout

# munin-node
check process munin-node with pidfile /var/run/munin/munin-node.pid
  start program = "/usr/sbin/service munin-node start"
  stop program = "/usr/sbin/service munin-node stop"
  if 5 restarts within 5 cycles then timeout
</code>
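
Since monitrc also includes /etc/monit/conf.d/*, machine-specific checks can live in their own file there. The fragment below is only a sketch of what such a file might look like. The file name (/etc/monit/conf.d/system), the service names ''example-host'' and ''rootfs'', and all thresholds are assumptions chosen for illustration, not part of the current Federez setup.

<code>
# /etc/monit/conf.d/system (hypothetical file name)

# Overall machine health; "example-host" is a placeholder service name
check system example-host
  if loadavg (5min) > 4 then alert      # threshold is an assumption
  if memory usage > 85% then alert      # threshold is an assumption

# Root filesystem usage; "rootfs" is a placeholder service name
check filesystem rootfs with path /
  if space usage > 90% then alert       # threshold is an assumption
  if inode usage > 90% then alert       # threshold is an assumption
</code>

These checks raise ''resource'' events, which the ''set alert'' filter in monitrc already forwards. After adding or changing a file, ''monit -t'' checks the syntax of the control file and ''monit reload'' makes the daemon re-read it.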