枫之叶: The world is balanced; everyone shapes what their own life looks like through their own effort.

Monitoring system: exploring Prometheus

Background

These are notes from exploring Prometheus in order to round out our monitoring system.

Overview:
* Prometheus server: installed with Docker
* Node exporter: installed natively, which makes host system information easy to collect
* Alertmanager: installed with Docker

Steps

Prometheus server installation and configuration

$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus

Version installed on 28 May 2020: prometheus, version 2.18.1

URL: http://prometheus.a.com/

$ docker exec -it prometheus sh
/prometheus $ find / -name prometheus.yml
find: /root: Permission denied
/etc/prometheus/prometheus.yml

# Check the version
$ prometheus --version
prometheus, version 2.18.1 (branch: HEAD, revision: ecee9c8abfd118f139014cb1b174b08db3f342cf)
  build user:       root@2117a9e64a7e
  build date:       20200507-16:51:47
  go version:       go1.14.2

Edit the configuration: /etc/prometheus/prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).


# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093 # uncommented


# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "first_rules.yml" # uncommented
  # - "second_rules.yml"


# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'


    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.


    static_configs:
    - targets: ['localhost:9090']
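
The configuration above only scrapes Prometheus itself, while the functional test later in this post depends on Prometheus also scraping Node exporter. A minimal sketch of an extra job to append under scrape_configs is shown here. The job name `node` and the `HOST_ADDR` default of 172.17.0.1 (the usual Docker bridge gateway) are assumptions, not taken from the original setup; inside the container, `localhost` refers to the container itself, so the host has to be addressed some other way.

```shell
# Sketch: write a hypothetical extra scrape job to a scratch file for review.
# HOST_ADDR is an assumption: use whatever address the Prometheus container
# can reach the host on.
HOST_ADDR="${HOST_ADDR:-172.17.0.1}"
SNIPPET="$(mktemp)"
cat > "$SNIPPET" <<EOF
  - job_name: 'node'
    static_configs:
    - targets: ['${HOST_ADDR}:9100']
EOF
# Print the fragment so it can be pasted under scrape_configs.
cat "$SNIPPET"
```

The fragment keeps the two-space indentation of the existing `prometheus` job so it can be pasted directly below it.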

first_rules.yml

groups:
- name: example
  rules:
  - alert:  InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Instance has been down for more than 1 minute
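
Before relying on the new rule, it may help to validate both files and then ask Prometheus to reload them. The sketch below assumes the container is named `prometheus` as in the docker run command above, and that first_rules.yml sits next to prometheus.yml in /etc/prometheus (relative rule_files paths are resolved against the config file's directory). `promtool` ships in the prom/prometheus image, and Prometheus reloads its configuration on SIGHUP.

```shell
# check_and_reload: validate the config and rule file inside the container,
# then send SIGHUP so Prometheus reloads them. Stops at the first failure.
# The rule file path is an assumption about where first_rules.yml was placed.
check_and_reload() {
  docker exec prometheus promtool check config /etc/prometheus/prometheus.yml &&
  docker exec prometheus promtool check rules /etc/prometheus/first_rules.yml &&
  docker kill --signal=HUP prometheus
}

# Usage, once the files have been edited:
#   check_and_reload
```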

Prometheus data backup: https://www.cnblogs.com/zqj-blog/p/12205063.html

Node exporter installation

Installing with Docker is not recommended, because Node exporter needs to read the host's system information.

wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz

tar zxvf node_exporter-1.0.0.linux-amd64.tar.gz
cd node_exporter-1.0.0.linux-amd64
➜  ~ ./node_exporter
➜  ~ netstat -tnlp | grep node
tcp6       0      0 :::9100                 :::*                    LISTEN      15748/./node_export

➜  ~ curl http://127.0.0.1:9100/metrics


# Query the data with PromQL: paste node_memory_MemFree_bytes into the Prometheus query box to see the graph
➜  ~ curl -s http://127.0.0.1:9100/metrics | grep node_memory_MemFree
# HELP node_memory_MemFree_bytes Memory information field MemFree_bytes.
# TYPE node_memory_MemFree_bytes gauge
node_memory_MemFree_bytes 1.33128192e+08
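
The raw value is in bytes and printed in scientific notation, which is hard to read at a glance. As a small sketch, the helper below (the name `memfree_mib` is made up for this post) reads the exporter's text output on stdin and prints free memory in MiB.

```shell
# memfree_mib: read Prometheus exposition text on stdin and print
# node_memory_MemFree_bytes converted to MiB (1 MiB = 1048576 bytes).
# Comment lines (# HELP, # TYPE) are skipped because their first field is "#".
memfree_mib() {
  awk '$1 == "node_memory_MemFree_bytes" { printf "%.1f MiB\n", $2 / 1048576 }'
}

# Usage against a live exporter (assumes it listens on 127.0.0.1:9100):
#   curl -s http://127.0.0.1:9100/metrics | memfree_mib
```

For the sample value shown above, 1.33128192e+08 bytes, this prints 127.0 MiB.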

Alertmanager installation and configuration

docker run -d -p 9093:9093 -v /data/prometheus/alertmanager:/etc/alertmanager/config.yml --name alertmanager prom/alertmanager

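# These are web UI pages, best opened in a browser: curl does not send the #/... fragment, so it only fetches the index page
curl http://127.0.0.1:9093/#/alerts
curl http://127.0.0.1:9093/#/status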
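
The docker run command above mounts a configuration file, but its contents are not shown in these notes. As a hedged sketch only, a minimal Alertmanager configuration could look like the one written below. The receiver name `default` is illustrative and no notification integration is attached, so alerts would appear in the web UI without being sent anywhere.

```shell
# Sketch: write a minimal, hypothetical Alertmanager config to a scratch file.
# It defines one route pointing at one receiver that has no integrations.
AM_CONFIG="$(mktemp)"
cat > "$AM_CONFIG" <<'EOF'
route:
  receiver: 'default'
receivers:
  - name: 'default'
EOF
# Print the file so it can be copied to the path mounted into the container.
cat "$AM_CONFIG"
```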

Reference:
* [Cluster monitoring] Deploying Prometheus + Alertmanager + Grafana on Docker for cluster monitoring: https://www.cnblogs.com/caizhenghui/p/9184082.html

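Functional test

  • After starting Node exporter, check its status in the Prometheus server dashboard
  • Stop Node exporter manually, wait 1 minute, then check the Prometheus server dashboard again: the alert is in the FIRING state

  • Check the Alertmanager web panel: the alert shows up there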
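
The same check can be done from the command line through the Prometheus HTTP API, whose /api/v1/alerts endpoint lists active alerts with their state. The helper name `alert_states` below is made up for this sketch, and it uses grep and cut rather than a JSON parser, which assumes the compact JSON the API returns.

```shell
# alert_states: read the JSON body of /api/v1/alerts on stdin and print the
# state of each alert (for example pending or firing), one per line.
alert_states() {
  grep -o '"state":"[a-z]*"' | cut -d'"' -f4
}

# Usage (assumes Prometheus is published on 127.0.0.1:9090 as above):
#   curl -s http://127.0.0.1:9090/api/v1/alerts | alert_states
```

While the `for: 1m` window is still running the alert is reported as pending; it switches to firing once the condition has held for the full minute.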

Q&A

The timezone shown is wrong. Can I change the timezone away from UTC?

To avoid any timezone confusion, it is best to use UTC consistently everywhere.
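
Prometheus stores sample timestamps as Unix time, so the timezone only matters when a timestamp is displayed. A small sketch for rendering such a timestamp in UTC from the shell follows. The helper name `to_utc` is made up here, and the `-d @SECONDS` syntax assumes GNU date.

```shell
# to_utc: print a Unix timestamp (in seconds) as an ISO 8601 UTC string.
# Relies on GNU date's "-d @SECONDS" syntax.
to_utc() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# Midnight UTC on the install date mentioned earlier in this post.
to_utc 1590624000
```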

(End)

About the author

Author: admin