Prometheus Installation Guide
1. Preparation
It is recommended to run Prometheus as an unprivileged user.

```shell
useradd -m -s /bin/bash prometheus
su - prometheus  # switch to the new user
```
2. Prometheus
2.1 Download
Open https://prometheus.io/download/, find the prometheus section, and download the build for your platform.
Then extract it to the /home/prometheus/prometheus directory.
2.2 Configure the systemd service
As root, create the file /etc/systemd/system/prometheus.service:

```ini
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
User=prometheus
Restart=on-failure
# Change these paths if Prometheus is installed
# somewhere other than /home/prometheus/prometheus
ExecStart=/home/prometheus/prometheus/prometheus \
  --config.file=/home/prometheus/prometheus/prometheus.yml \
  --storage.tsdb.path=/home/prometheus/prometheus/data

[Install]
WantedBy=multi-user.target
```
2.3 Create the configuration file
Create the configuration file /home/prometheus/prometheus/prometheus.yml.
The sample below is the config file shipped with the official Docker image.
```yaml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
```
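As a rough capacity sketch for the 15s interval above (assuming roughly 1-2 bytes per sample after compression, an approximation from the Prometheus storage docs; the series count is a made-up example):

```python
# Back-of-the-envelope storage estimate for a given scrape interval.
# bytes_per_sample=2.0 is an assumption; treat results as an order of magnitude.
def samples_per_day(scrape_interval_s: int) -> int:
    """Samples written per time series per day."""
    return 24 * 60 * 60 // scrape_interval_s

def daily_bytes(num_series: int, scrape_interval_s: int,
                bytes_per_sample: float = 2.0) -> float:
    """Approximate bytes of TSDB growth per day."""
    return num_series * samples_per_day(scrape_interval_s) * bytes_per_sample

print(samples_per_day(15))           # 5760 samples per series per day
print(daily_bytes(1000, 15) / 1e6)   # ~11.5 MB/day for 1000 series
```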
2.4 Run
Use systemctl as root:

```shell
# Enable at boot
systemctl enable prometheus
# Start now
systemctl start prometheus
# Check the status
systemctl status prometheus
```
Once it is up, the dashboard is available at http://localhost:9090, and http://localhost:9090/metrics exposes Prometheus's own metrics.
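The /metrics endpoint serves plain text in the Prometheus exposition format, one `name{labels} value` sample per line. A minimal parsing sketch (the sample payload below is illustrative, not real output):

```python
import re

# Made-up example of exposition-format text; '#' lines are HELP/TYPE metadata.
SAMPLE = """\
# HELP prometheus_build_info Build information.
# TYPE prometheus_build_info gauge
prometheus_build_info{version="2.45.0"} 1
process_cpu_seconds_total 12.5
"""

# metric name, optional {labels}, then the value
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)$')

def parse_metrics(text: str) -> dict:
    """Return {metric_name: float_value}, ignoring comments and blank lines."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith('#'):
            continue
        m = LINE.match(line)
        if m:
            out[m.group(1)] = float(m.group(3))
    return out

print(parse_metrics(SAMPLE))
```

In practice a client library (or Prometheus itself) does this parsing for you; the sketch is only to show what the endpoint returns.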
3. node exporter
node_exporter is the Prometheus component that exposes machine-level metrics; it is the most commonly used exporter.
3.1 Download
Open https://prometheus.io/download/, find the node_exporter section, and download the build for your platform.
Then extract it to the /home/prometheus/node_exporter directory.
3.2 Configure the systemd service
As root, create the file /etc/systemd/system/node_exporter.service:

```ini
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/home/prometheus/node_exporter/node_exporter

[Install]
WantedBy=default.target
```
3.3 Run
Use systemctl as root:

```shell
# Enable at boot
systemctl enable node_exporter
# Start now
systemctl start node_exporter
# Check the status
systemctl status node_exporter
```
Once it is up, http://localhost:9100/metrics exposes the machine's metrics.
3.4 Scraping it from Prometheus
Edit the configuration file from section 2.3 and add a job under scrape_configs:
```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  # The newly added job
  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']
```
After editing, restart Prometheus:

```shell
systemctl restart prometheus
```
Adding further nodes works the same way.
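For example, several machines can share one job by listing more targets (the addresses below are placeholders):

```yaml
  - job_name: 'node'
    static_configs:
    - targets:
        - 'localhost:9100'
        - '192.168.1.10:9100'  # another node, placeholder address
        - '192.168.1.11:9100'
```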
3.5 Restricting access with Basic auth
See https://prometheus.io/docs/guides/basic-auth/ for details.
This uses nginx as a reverse proxy in front of the exporter.
- Install htpasswd

```shell
apt install apache2-utils
```

- Create a password file

```shell
# -c specifies the output file; admin is the username; you will be prompted for the password.
htpasswd -c /home/prometheus/node_htpasswd admin
```
- Configure nginx

```nginx
server {
    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /home/prometheus/node_htpasswd;
        proxy_pass http://localhost:9100/;
    }
}
```
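Under the hood, HTTP Basic auth is just a base64-encoded `user:password` pair sent in the Authorization header, which nginx checks against the htpasswd file. A quick illustration (the credentials are examples):

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a client sends for Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

print(basic_auth_header("admin", "secret"))  # Basic YWRtaW46c2VjcmV0
```

Note that base64 is an encoding, not encryption, which is why Basic auth should only be used over TLS, as in the HTTPS server block below.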
A more complete configuration:

```nginx
server {
    listen 443 ssl http2;
    server_name example.com;
    ssl_certificate /etc/nginx/ssl/example.com.cer;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;
    ssl_session_cache shared:SSL:5m;
    ssl_session_timeout 20m;
    ssl_protocols TLSv1.2;
    ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256';
    #ssl_prefer_server_ciphers on;

    gzip_vary on;
    gzip_comp_level 1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml font/ttf font/opentype;

    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /home/prometheus/node_htpasswd;
        proxy_pass http://localhost:9100/;
    }
}
```
- Configure Prometheus

Add basic_auth to the job:

```yaml
  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']
    # scheme: "https"
    basic_auth:
      username: admin
      password: your_password
```
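To keep the password out of the main config file, Prometheus also accepts `password_file` in place of an inline `password` (the path below is an assumption; the file should contain only the password):

```yaml
    basic_auth:
      username: admin
      password_file: /home/prometheus/node_password
```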
4. Grafana
Docker version:

```shell
docker run -d --name=grafana -p 3000:3000 grafana/grafana
```

The default username/password is admin/admin.
Plain Linux version (https://grafana.com/grafana/download):
```shell
wget https://dl.grafana.com/oss/release/grafana-6.6.1.linux-amd64.tar.gz
tar -zxvf grafana-6.6.1.linux-amd64.tar.gz
mv grafana-6.6.1 grafana
cd grafana
./bin/grafana-server
```
Configure a systemd service:
As root, create /etc/systemd/system/grafana.service:

```ini
[Unit]
Description=Grafana Server
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
WorkingDirectory=/home/prometheus/grafana
ExecStart=/home/prometheus/grafana/bin/grafana-server

[Install]
WantedBy=multi-user.target
```
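Once Grafana is running, Prometheus can be added as a data source in the UI (Configuration → Data Sources), or provisioned from a file. A minimal provisioning sketch, assuming a file under Grafana's conf/provisioning/datasources directory:

```yaml
# e.g. provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```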
5. Prometheus + Grafana (Docker)
prometheus.yml:

```yaml
# my global config
global:
  scrape_interval: 1m
  evaluation_interval: 1m
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'example1'
    static_configs:
    - targets: ['example1.com']
    scheme: "https"
    basic_auth:
      username: user
      password: password
  - job_name: 'example2'
    static_configs:
    - targets: ['example2.com:9100']
    scheme: "https"
    basic_auth:
      username: user
      password: password
```
docker-compose.yml:

```yaml
version: "3.7"
services:
  prometheus:
    image: prom/prometheus
    restart: always
    volumes:
      - prom_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    # ports:
    #   - "9090:9090"
  grafana:
    # default admin user is admin/admin
    image: grafana/grafana
    restart: always
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
volumes:
  prom_data:
  grafana_data:
```
**A fairly comprehensive Chinese-language Grafana dashboard is available under ID 8919.**
6. node-exporter (Docker)
```yaml
version: '3.8'
services:
  node_exporter:
    image: prom/node-exporter
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'
```