Prometheus Installation Guide
1. Preparation
It is recommended to run Prometheus as an unprivileged user.

```shell
useradd -m -s /bin/bash prometheus
su - prometheus  # switch to the new user
```
2. Prometheus
2.1 Download
Open https://prometheus.io/download/, find the prometheus section, and download the build for your platform.
Then extract it to the /home/prometheus/prometheus directory.
2.2 Configure the systemd service
As root, create the file /etc/systemd/system/prometheus.service:

```ini
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
User=prometheus
Restart=on-failure
# Change these paths if Prometheus is installed
# somewhere other than /home/prometheus/prometheus
ExecStart=/home/prometheus/prometheus/prometheus \
  --config.file=/home/prometheus/prometheus/prometheus.yml \
  --storage.tsdb.path=/home/prometheus/prometheus/data

[Install]
WantedBy=multi-user.target
```
2.3 Create the configuration file
Create the configuration file /home/prometheus/prometheus/prometheus.yml.
The sample below is the config file shipped with the official Docker image.
```yaml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
```
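As a rough capacity sketch for the 15s interval above (assuming roughly 1-2 bytes per sample after compression, an approximation from the Prometheus storage docs; the series count is a made-up example):

```python
# Back-of-the-envelope storage estimate for a given scrape interval.
# bytes_per_sample=2.0 is an assumption; treat results as an order of magnitude.
def samples_per_day(scrape_interval_s: int) -> int:
    """Samples written per time series per day."""
    return 24 * 60 * 60 // scrape_interval_s

def daily_bytes(num_series: int, scrape_interval_s: int,
                bytes_per_sample: float = 2.0) -> float:
    """Approximate bytes of TSDB growth per day."""
    return num_series * samples_per_day(scrape_interval_s) * bytes_per_sample

print(samples_per_day(15))           # 5760 samples per series per day
print(daily_bytes(1000, 15) / 1e6)   # ~11.5 MB/day for 1000 series
```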
2.4 Run
Use systemctl as root:

```shell
# Enable at boot
systemctl enable prometheus
# Start now
systemctl start prometheus
# Check the status
systemctl status prometheus
```
Once it is up, the dashboard is available at http://localhost:9090, and http://localhost:9090/metrics exposes Prometheus's own metrics.
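The /metrics endpoint serves plain text in the Prometheus exposition format, one `name{labels} value` sample per line. A minimal parsing sketch (the sample payload below is illustrative, not real output):

```python
import re

# Made-up example of exposition-format text; '#' lines are HELP/TYPE metadata.
SAMPLE = """\
# HELP prometheus_build_info Build information.
# TYPE prometheus_build_info gauge
prometheus_build_info{version="2.45.0"} 1
process_cpu_seconds_total 12.5
"""

# metric name, optional {labels}, then the value
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)$')

def parse_metrics(text: str) -> dict:
    """Return {metric_name: float_value}, ignoring comments and blank lines."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith('#'):
            continue
        m = LINE.match(line)
        if m:
            out[m.group(1)] = float(m.group(3))
    return out

print(parse_metrics(SAMPLE))
```

In practice a client library (or Prometheus itself) does this parsing for you; the sketch is only to show what the endpoint returns.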
3. node exporter
node_exporter is the Prometheus component that exposes machine-level metrics; it is the most commonly used exporter.
3.1 Download
Open https://prometheus.io/download/, find the node_exporter section, and download the build for your platform.
Then extract it to the /home/prometheus/node_exporter directory.
3.2 Configure the systemd service
As root, create the file /etc/systemd/system/node_exporter.service:

```ini
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/home/prometheus/node_exporter/node_exporter

[Install]
WantedBy=default.target
```
3.3 Run
Use systemctl as root:

```shell
# Enable at boot
systemctl enable node_exporter
# Start now
systemctl start node_exporter
# Check the status
systemctl status node_exporter
```
Once it is up, http://localhost:9100/metrics exposes the machine's metrics.
3.4 Scraping it from Prometheus
Edit the configuration file from section 2.3 and add a job under scrape_configs:
```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  # The newly added job
  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']
```
After editing, restart Prometheus:

```shell
systemctl restart prometheus
```
Adding further nodes works the same way.
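For example, several machines can share one job by listing more targets (the addresses below are placeholders):

```yaml
  - job_name: 'node'
    static_configs:
    - targets:
        - 'localhost:9100'
        - '192.168.1.10:9100'  # another node, placeholder address
        - '192.168.1.11:9100'
```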
3.5 Restricting access with Basic auth
See https://prometheus.io/docs/guides/basic-auth/ for details.
This uses nginx as a reverse proxy in front of the exporter.
- Install htpasswd

```shell
apt install apache2-utils
```

- Create a password file

```shell
# -c specifies the output file; admin is the username; you will be prompted for the password.
htpasswd -c /home/prometheus/node_htpasswd admin
```
- Configure nginx

```nginx
server {
    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /home/prometheus/node_htpasswd;
        proxy_pass http://localhost:9100/;
    }
}
```
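Under the hood, HTTP Basic auth is just a base64-encoded `user:password` pair sent in the Authorization header, which nginx checks against the htpasswd file. A quick illustration (the credentials are examples):

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a client sends for Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

print(basic_auth_header("admin", "secret"))  # Basic YWRtaW46c2VjcmV0
```

Note that base64 is an encoding, not encryption, which is why Basic auth should only be used over TLS, as in the HTTPS server block below.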
A more complete configuration:

```nginx
server {
    listen 443 ssl http2;
    server_name example.com;
    ssl_certificate /etc/nginx/ssl/example.com.cer;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;
    ssl_session_cache shared:SSL:5m;
    ssl_session_timeout 20m;
    ssl_protocols TLSv1.2;
    ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256';
    #ssl_prefer_server_ciphers on;

    gzip_vary on;
    gzip_comp_level 1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml font/ttf font/opentype;

    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /home/prometheus/node_htpasswd;
        proxy_pass http://localhost:9100/;
    }
}
```
- Configure Prometheus

Add basic_auth to the job:

```yaml
  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']
    # scheme: "https"
    basic_auth:
      username: admin
      password: your_password
```
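To keep the password out of the main config file, Prometheus also accepts `password_file` in place of an inline `password` (the path below is an assumption; the file should contain only the password):

```yaml
    basic_auth:
      username: admin
      password_file: /home/prometheus/node_password
```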
4. Grafana
Docker version:

```shell
docker run -d --name=grafana -p 3000:3000 grafana/grafana
```

The default username/password is admin/admin.
Plain Linux version (https://grafana.com/grafana/download):
```shell
wget https://dl.grafana.com/oss/release/grafana-6.6.1.linux-amd64.tar.gz
tar -zxvf grafana-6.6.1.linux-amd64.tar.gz
mv grafana-6.6.1 grafana
cd grafana
./bin/grafana-server
```
Configure a systemd service:
As root, create /etc/systemd/system/grafana.service:

```ini
[Unit]
Description=Grafana Server
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
WorkingDirectory=/home/prometheus/grafana
ExecStart=/home/prometheus/grafana/bin/grafana-server

[Install]
WantedBy=multi-user.target
```
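Once Grafana is running, Prometheus can be added as a data source in the UI (Configuration → Data Sources), or provisioned from a file. A minimal provisioning sketch, assuming a file under Grafana's conf/provisioning/datasources directory:

```yaml
# e.g. provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```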
5. Prometheus + Grafana (Docker)
prometheus.yml:

```yaml
# my global config
global:
  scrape_interval: 1m
  evaluation_interval: 1m
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'example1'
    static_configs:
    - targets: ['example1.com']
    scheme: "https"
    basic_auth:
      username: user
      password: password
  - job_name: 'example2'
    static_configs:
    - targets: ['example2.com:9100']
    scheme: "https"
    basic_auth:
      username: user
      password: password
```
docker-compose.yml:

```yaml
version: "3.7"
services:
  prometheus:
    image: prom/prometheus
    restart: always
    volumes:
      - prom_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    # ports:
    #   - "9090:9090"
  grafana:
    # default admin user is admin/admin
    image: grafana/grafana
    restart: always
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
volumes:
  prom_data:
  grafana_data:
```
**A fairly comprehensive Chinese-language Grafana dashboard is available under ID 8919.**
6. node-exporter (Docker)
```yaml
version: '3.8'
services:
  node_exporter:
    image: prom/node-exporter
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'
```