Best Practices for Setting Up IT Monitoring and Alerting for Prometheus + Grafana (5)

Mondo Technology Updated on 2024-01-31

Hello everyone, I'm Xiaofei, and today I'll talk about installing node_exporter in batches.

The previous articles described how to build a complete monitoring and alerting platform with Prometheus + Grafana + AlertManager + Prometheus-Webhook.

Check out the previous 4 articles:

Best Practices for Setting Up IT Monitoring and Alerting for Prometheus + Grafana (1)
Best Practices for Setting Up IT Monitoring and Alerting for Prometheus + Grafana (2)
Best Practices for Setting Up IT Monitoring and Alerting for Prometheus + Grafana (3)
Best Practices for Setting Up IT Monitoring and Alerting for Prometheus + Grafana (4)

Today I'll explain how to install node_exporter in batches on all 400 virtual machines so that their monitoring data can be collected.

I originally wanted to use Ansible for the batch installation. However, Ansible needs a hosts file, and with this many hosts SSH password login is cumbersome: every machine would first have to be configured for key-based login. Internally we already use JumpServer as a bastion host that controls all machines. JumpServer ships with Ansible built in and offers a batch command feature in its web interface, so I decided to run the installation directly through that batch command feature.

Open the JumpServer web interface. I won't explain how to use JumpServer here; check the official documentation if you are unsure.

Open the Batch Command page:

Here I need to install node_exporter on the remaining 184 machines, so I select them all directly:

Then write the batch installation script:

#!/bin/bash
# Supported systems: Ubuntu 18.04, Ubuntu 20.04
cd ~
# Download the node_exporter package (hosted on OSS, see below)
sudo wget <OSS link to node_exporter-1.4.0.linux-amd64.tar.gz>
# Unpack it to ~/node_exporter and delete the archive
tar -xf node_exporter-1.4.0.linux-amd64.tar.gz
sudo mv ~/node_exporter-1.4.0.linux-amd64 node_exporter
sudo rm -rf ~/node_exporter-1.4.0.linux-amd64.tar.gz
# Create a dedicated non-login user to run node_exporter
sudo groupadd prometheus
sudo useradd -m -g prometheus -s /sbin/nologin prometheus
sudo chown -R prometheus:prometheus ~/node_exporter
# Write the service file
sudo cat > node_exporter.

After the script is written, put it somewhere the target machines can fetch it from, such as another machine, OSS, or a network drive, so it can be downloaded with wget or curl or copied over with scp. Here I put the script directly into OSS. Because the script depends on the node_exporter-1.4.0.linux-amd64.tar.gz package, I uploaded that package to OSS as well: the GitHub download link is unstable and far too slow.
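The script's last step writes a service file so that node_exporter runs under systemd. Below is a minimal sketch of what that step could look like, assuming systemd (the default on Ubuntu 18.04/20.04), the prometheus user and group created above, and a binary at ~/node_exporter/node_exporter. The function names are hypothetical. The unit is written through sudo tee, because with "sudo cat > file" the redirection is performed by the unprivileged shell and fails with "Permission denied" under /etc/systemd/system. The prometheus user also needs execute permission on every directory in the binary's path, which is typically the case for home directories on Ubuntu 18.04/20.04.

```shell
#!/usr/bin/env bash
# Sketch (assumption): render and install a systemd unit for node_exporter.
# The "prometheus" user/group mirror the install script; the binary path is
# passed in because it depends on which user ran the script.

# Print the unit file to stdout. $1 = absolute path of the node_exporter binary.
render_node_exporter_unit() {
    local bin_path="$1"
    cat <<EOF
[Unit]
Description=Prometheus node_exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
ExecStart=${bin_path} --web.listen-address=:9100
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Write the unit with sudo tee (the file itself is opened by a root process),
# then reload systemd and enable + start the service in one step.
install_node_exporter_unit() {
    render_node_exporter_unit "$1" | sudo tee /etc/systemd/system/node_exporter.service >/dev/null
    sudo systemctl daemon-reload
    sudo systemctl enable --now node_exporter
}
```

In the install script this would be called as install_node_exporter_unit "$HOME/node_exporter/node_exporter"; afterwards, curl -s localhost:9100/metrics | head is a quick way to confirm the exporter answers on its default port.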

In the JumpServer batch command, just run the following:

wget <OSS link to deploy-ubuntu.sh> && chmod +x deploy-ubuntu.sh && ./deploy-ubuntu.sh

As shown in the figure.
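Several of the failure causes listed next (a full disk, an incomplete package, missing permissions) can be caught before anything is installed. A sketch of such pre-flight checks follows; the function names and the thresholds you pass in are hypothetical and would need tuning for your environment.

```shell
#!/usr/bin/env bash
# Sketch (assumption): pre-flight checks to run before the batch install.

# Succeed if the filesystem holding directory $1 has at least $2 MiB free.
check_free_space_mib() {
    local dir="$1" need_mib="$2" avail_kib
    # df -Pk prints POSIX output in 1 KiB blocks; field 4 of line 2 is "Available".
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge $((need_mib * 1024)) ]
}

# Succeed if $1 is a complete, readable gzip-compressed tarball, so a
# truncated download fails here instead of half-way through "tar -xf".
check_tarball() {
    gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
}

# Succeed if the current user can use sudo without a password prompt
# (an unattended batch run cannot answer prompts).
check_sudo() {
    sudo -n true 2>/dev/null
}
```

For example, placing check_free_space_mib ~ 100 || exit 1 and check_tarball node_exporter-1.4.0.linux-amd64.tar.gz || exit 1 right after the download in deploy-ubuntu.sh stops the run on that host with a non-zero exit instead of leaving a half-installed exporter; the 100 MiB figure is only an illustrative value.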

If the installation fails, check the following causes in order:

1. The network is not reachable.
2. The virtual machine's disk is full (no space left).
3. The user lacks the required permissions.
4. The script has not been granted execute permission.
5. The installation package is broken. Do not download it from the GitHub link; make sure the package is complete (an incomplete package fails to decompress), and preferably upload it to internal OSS storage and use the OSS link.

Now write the IPs of these 184 machines into the configuration file used by Prometheus's file-based service discovery. Here is a basic YML template:

- labels:
    service: it-monitor
    brand: dell
    owner: zhangsan
  targets:
    - 172.17.40.51:9100
    - 172.17.40.54:9100
That's it. Reload the configuration with:

curl -X POST localhost:9090/-/reload

This endpoint only works if Prometheus was started with the --web.enable-lifecycle flag.
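Typing 184 addresses into the template by hand is error-prone. Below is a small generator sketch, assuming a plain text file with one IP per line and node_exporter on its default port 9100. gen_file_sd is a hypothetical helper, and the label values are just the template's examples. Prometheus also watches file_sd target files and picks up changes to them on its own, so the reload is mainly needed when prometheus.yml itself changes.

```shell
#!/usr/bin/env bash
# Sketch (assumption): build a file_sd target file from a list of IPs.
# Usage: gen_file_sd <service> <brand> <owner> < ips.txt > targets.yml

gen_file_sd() {
    printf '%s\n' '- labels:'
    printf '    service: %s\n    brand: %s\n    owner: %s\n' "$1" "$2" "$3"
    printf '%s\n' '  targets:'
    # Default IFS trims surrounding whitespace; "|| [ -n ... ]" keeps a final
    # line that has no trailing newline.
    while read -r ip || [ -n "$ip" ]; do
        # Skip blank lines and comment lines.
        case "$ip" in ''|'#'*) continue ;; esac
        printf '    - %s:9100\n' "$ip"
    done
}
```

For example, gen_file_sd it-monitor dell zhangsan < ips.txt > it-monitor.yml, where it-monitor.yml stands for whichever file your file_sd_configs entry in prometheus.yml points at.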

Open the Prometheus web console and you can see the 184 new monitoring targets. Batch installation with Ansible will be explained in the next article.
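Instead of counting targets on the web page, the same check can be scripted against the Prometheus HTTP API endpoint /api/v1/targets. Below is a rough sketch with a hypothetical helper. It greps the compact JSON the API returns rather than parsing it, so treat it as a quick check (jq is the robust option). The counts also include any other jobs, such as Prometheus scraping itself.

```shell
#!/usr/bin/env bash
# Sketch (assumption): summarise target health from a /api/v1/targets response.
# Relies on the API's compact JSON ("health":"up" with no spaces).

# Read the JSON on stdin; print "up=N down=M unknown=K".
target_health_summary() {
    local json out state n
    json=$(cat)
    out=""
    for state in up down unknown; do
        # grep -o emits one line per match; wc -l counts them (0 if none).
        n=$(printf '%s' "$json" | grep -o "\"health\":\"$state\"" | wc -l | tr -d ' ')
        out="$out${out:+ }$state=$n"
    done
    printf '%s\n' "$out"
}
```

Usage: curl -fsS 'http://localhost:9090/api/v1/targets?state=active' | target_health_summary. Any non-zero down count points back to the troubleshooting list above.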
