Monitoring Virtual Machine Status with Prometheus

Original article by 软件书桌, last modified 2024-04-30 15:04:42

This post records a first hands-on use of Prometheus through a small example: monitor a virtual machine and send an alert email when it goes down.

  • Deploy Prometheus
# Install Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.25.0/prometheus-2.25.0.linux-amd64.tar.gz

tar xf  prometheus-2.25.0.linux-amd64.tar.gz -C /usr/local

cd /usr/local
mv prometheus-2.25.0.linux-amd64/ prometheus

# Run Prometheus as a systemd service
cd /usr/lib/systemd/system

# Create prometheus.service with the following contents
vim prometheus.service
[Unit]
Description=Prometheus
Documentation=https://prometheus.io

[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --web.listen-address=:9090

[Install]
WantedBy=multi-user.target


systemctl daemon-reload
systemctl start prometheus
systemctl enable prometheus
systemctl status prometheus

# Access the Prometheus web UI
http://178.104.163.109:9090
http://178.104.163.109:9090/metrics
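
Before moving on, it can help to confirm from the command line that the service is actually answering. The following is a minimal sketch, assuming Prometheus listens on port 9090 as configured above and that curl is installed. The function name `prom_check` is hypothetical; `/-/healthy` and `/-/ready` are Prometheus's built-in management endpoints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if both management endpoints answer 2xx.
# Assumes curl is installed; the default URL matches --web.listen-address=:9090.
prom_check() {
  local base_url="${1:-http://localhost:9090}"
  local endpoint
  for endpoint in healthy ready; do
    # -f turns HTTP errors into a non-zero exit code; --max-time bounds the wait.
    if ! curl -fsS --max-time 5 -o /dev/null "${base_url}/-/${endpoint}"; then
      echo "Prometheus check failed: ${base_url}/-/${endpoint}" >&2
      return 1
    fi
  done
  echo "Prometheus at ${base_url} is healthy and ready"
}

# Usage: prom_check                                # local instance
#        prom_check http://178.104.163.109:9090    # remote instance
```

A non-zero exit code makes the helper easy to chain, for example `prom_check && echo "continue with Grafana"`.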

  • Deploy Grafana
# Install Grafana
wget https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm/grafana-9.3.2-1.x86_64.rpm

yum install -y initscripts fontconfig
# Install the RPM that was just downloaded (the version must match the wget above)
yum install -y grafana-9.3.2-1.x86_64.rpm


# Start Grafana
systemctl start grafana-server.service
systemctl status grafana-server.service 
systemctl enable grafana-server.service


# Access the Grafana web UI (default login: admin / admin)
http://178.104.163.109:3000/login
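
Grafana needs Prometheus registered as a data source before any dashboard can show data. This can be done in the UI, or scripted against Grafana's HTTP API as in the sketch below. The default admin / admin credentials and the localhost URLs are assumptions taken from this walkthrough (the login fails if the password has already been changed), and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body describing a Prometheus data source.
datasource_payload() {
  local prom_url="${1:-http://localhost:9090}"
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$prom_url"
}

# Hypothetical helper: register the data source through POST /api/datasources.
# Assumes Grafana on localhost:3000 with the default admin / admin login.
add_datasource() {
  local grafana_url="${1:-http://localhost:3000}"
  local prom_url="${2:-http://localhost:9090}"
  curl -fsS --max-time 10 -u admin:admin \
    -H 'Content-Type: application/json' \
    -X POST "${grafana_url}/api/datasources" \
    -d "$(datasource_payload "$prom_url")"
}

# Usage: add_datasource http://178.104.163.109:3000 http://178.104.163.109:9090
```

Keeping the payload in its own function makes it possible to inspect the JSON before anything is sent.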

  • Deploy node-exporter
# With node-exporter listening on port 9100 of the monitored VM, add it as a scrape target in prometheus.yml
[root@desktop-a853 ~]# cat /usr/local/prometheus/prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations: Prometheus itself, plus the node-exporter
# on the monitored virtual machine.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  # Separate job, so that the alert rule's up{job="node"} selector matches this target.
  - job_name: 'node'
    static_configs:
    - targets: ['178.104.163.105:9100']
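
The configuration above assumes a node-exporter is already running on the monitored VM. For completeness, here is a minimal sketch of installing it there, mirroring the Prometheus steps. The version (1.1.2), the install path, the unit file name, and the function name are all assumptions and not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical installer, to be run as root on the monitored VM.
# NODE_EXPORTER_VERSION is an assumption; pick whichever release you need.
NODE_EXPORTER_VERSION="${NODE_EXPORTER_VERSION:-1.1.2}"

install_node_exporter() {
  local name="node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64"
  wget "https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/${name}.tar.gz" || return 1
  tar xf "${name}.tar.gz" || return 1
  install -m 0755 "${name}/node_exporter" /usr/local/bin/node_exporter || return 1

  # Minimal systemd unit; 9100 is node-exporter's default port.
  cat > /etc/systemd/system/node_exporter.service <<'EOF'
[Unit]
Description=Prometheus node exporter
After=network.target

[Service]
Restart=on-failure
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100

[Install]
WantedBy=multi-user.target
EOF

  systemctl daemon-reload
  systemctl enable --now node_exporter
}

# Usage (as root): install_node_exporter
# Then check:      curl -s http://localhost:9100/metrics | head
```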

Import the node-exporter dashboard into Grafana.

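  • Configure alert rules
# Point Prometheus at the rule directory (paths are relative to /usr/local/prometheus)
cd /usr/local/prometheus
mkdir -p rules
vi prometheus.yml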

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
  - static_configs:
    - targets:       # where alerts are sent: the Alertmanager instance
      - 192.168.1.20:9093     # Alertmanager address (use the host where Alertmanager runs)
  
  
# Add the alert rule
vi ./rules/node_rule.yml

groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node"} == 0  
    for: 10s                   
    labels:                    
      severity: 1              
      team: node
    annotations:               
      summary: "{{ $labels.instance }} has been down for more than 10s"
      description: hello world
      
# Restart Prometheus
systemctl restart prometheus
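
A mistake in the configuration or in a rule file can keep Prometheus from coming back up, so it can be worth validating before the restart and then confirming what the rule will see. The following is a minimal sketch, assuming the install path used above. `promtool` ships in the same tarball as Prometheus; the function names and the `PROM_DIR` variable are hypothetical.

```shell
#!/usr/bin/env bash
PROM_DIR="${PROM_DIR:-/usr/local/prometheus}"

# Hypothetical helper: validate prometheus.yml, which also checks the
# rule files it references. Returns non-zero on any problem.
check_prometheus_config() {
  local promtool="${PROM_DIR}/promtool"
  if [ ! -x "$promtool" ]; then
    echo "promtool not found at ${promtool}" >&2
    return 1
  fi
  "$promtool" check config "${PROM_DIR}/prometheus.yml"
}

# Hypothetical helper: show the current value of up for the node job.
# A value of 0 means the target is down and the node-up rule condition holds.
query_node_up() {
  local base_url="${1:-http://localhost:9090}"
  curl -fsS --max-time 5 -G "${base_url}/api/v1/query" \
    --data-urlencode 'query=up{job="node"}'
}

# Usage: check_prometheus_config && systemctl restart prometheus
#        query_node_up
```

If `query_node_up` returns an empty result, no target carries the label `job="node"`, and the alert rule can never fire.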

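  • Deploy Alertmanager
# Download and unpack Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz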

# Copy the binaries to /usr/bin with executable permissions
install -m 0755 alertmanager-0.21.0.linux-amd64/{alertmanager,amtool} /usr/bin

  • Configure alert emails
# Create the alertmanager.yml configuration file
# (">" rather than ">>", so re-running does not append a second copy)
cat > alertmanager.yml <<EOF
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25' # SMTP server (smarthost)
  smtp_from: 'demo*@163.com' # sender address
  smtp_auth_username: 'demo*@163.com' # SMTP login (the mailbox address)
  smtp_auth_password: 'QNHPB***XBRMWCB' # mailbox password or authorization code
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mail'
receivers:
- name: 'mail'
  email_configs:
  - to: '*@*.com'
EOF

# Install the file into place and set its permissions
install -m 0644 -D alertmanager.yml /etc/alertmanager/alertmanager.yml

# Create the systemd unit
cat > alertmanager.service <<EOF
[Unit]
Description=Alertmanager handles alerts sent by client applications such as the Prometheus server.
Documentation=https://prometheus.io/docs/alerting/alertmanager/
After=network.target
 
[Service]
User=root
ExecStart=/usr/bin/alertmanager \\
  --config.file=/etc/alertmanager/alertmanager.yml \\
  --storage.path=/var/lib/alertmanager \\
  --cluster.advertise-address=0.0.0.0:9093
ExecReload=/bin/kill -HUP \$MAINPID
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
EOF


# Install the unit file into place and set its permissions
install -m 0644 alertmanager.service /etc/systemd/system
 
# Start the service
systemctl daemon-reload
systemctl start alertmanager
systemctl status alertmanager
systemctl enable alertmanager

# Access the Alertmanager web UI
http://178.104.163.109:9093
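
If the service fails to start, the configuration file is the usual suspect. The following is a minimal sketch for checking it with `amtool`, which was installed alongside `alertmanager` above. The function name is hypothetical and the default path is the one used in this walkthrough.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate an Alertmanager configuration file.
# Returns non-zero if the file is missing or amtool rejects it.
check_alertmanager_config() {
  local config="${1:-/etc/alertmanager/alertmanager.yml}"
  if [ ! -r "$config" ]; then
    echo "config not readable: ${config}" >&2
    return 1
  fi
  amtool check-config "$config"
}

# Usage: check_alertmanager_config
#        check_alertmanager_config ./alertmanager.yml
#        journalctl -u alertmanager -n 50    # service logs, if it still fails
```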

Shutting down the monitored virtual machine, or stopping node-exporter inside it, now triggers an email alert.
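
To exercise the email path without taking anything down, a synthetic alert can also be pushed straight to Alertmanager's v2 API. The following is a minimal sketch: the labels mirror the node-up rule, while the function names, the placeholder instance value, and the summary text are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a one-alert JSON payload for Alertmanager's v2 API.
# The labels mirror the node-up rule; the instance value is a placeholder.
test_alert_payload() {
  local instance="${1:-test-instance:9100}"
  printf '[{"labels":{"alertname":"node-up","severity":"1","team":"node","instance":"%s"},"annotations":{"summary":"Test alert, please ignore"}}]\n' "$instance"
}

# Hypothetical helper: send the payload to POST /api/v2/alerts.
send_test_alert() {
  local am_url="${1:-http://localhost:9093}"
  curl -fsS --max-time 10 \
    -H 'Content-Type: application/json' \
    -X POST "${am_url}/api/v2/alerts" \
    -d "$(test_alert_payload)"
}

# Usage: send_test_alert http://178.104.163.109:9093
```

Because the payload carries no end time, Alertmanager treats the alert as resolved once `resolve_timeout` has passed without it being sent again.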

With this basic environment in place, it can serve as a sandbox for trying out further Prometheus features.

Whatever the new technology, building a minimum viable (MVP) environment first seems to be an indispensable step.

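Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.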

In case of infringement, contact cloudcommunity@tencent.com for removal.
