美文网首页
Grafana 系列文章(十四):Helm 安装Loki

Grafana 系列文章(十四):Helm 安装Loki

作者: 东风微鸣 | 来源:发表于2023-02-10 19:45 被阅读0次

    前言

    写或者翻译这么多篇 Loki 相关的文章了, 发现还没写怎么安装 😓

    现在开始介绍如何使用 Helm 安装 Loki.

    前提

    有 Helm, 并且添加 Grafana 的官方源:

    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update
    

    🐾Warning:

    网络受限, 需要保证网络通畅.

    部署

    架构

    Promtail(收集) + Loki(存储及处理) + Grafana(展示)

    Loki 架构图

    Promtail

    1. 启用 Prometheus Operator Service Monitor 做监控
    2. 增加external_labels - cluster, 以识别是哪个 K8S 集群;
    3. pipeline_stages 改为 cri, 以对 cri 日志做处理(因为我的集群用的 Container Runtime 是 CRI, 而 Loki Helm 默认配置是 docker)
    4. 增加对 systemd-journal 的日志收集:
    promtail:
      config:
        snippets:
          pipelineStages:
            - cri: {}
    
      extraArgs: 
        - -client.external-labels=cluster=ctyun
      # systemd-journal 额外配置:
      # Add additional scrape config
      extraScrapeConfigs:
        - job_name: journal
          journal:
            path: /var/log/journal
            max_age: 12h
            labels:
              job: systemd-journal
          relabel_configs:
            - source_labels: ['__journal__systemd_unit']
              target_label: 'unit'
            - source_labels: ['__journal__hostname']
              target_label: 'hostname'
    
      # Mount journal directory into Promtail pods
      extraVolumes:
        - name: journal
          hostPath:
            path: /var/log/journal
    
      extraVolumeMounts:
        - name: journal
          mountPath: /var/log/journal
          readOnly: true
    

    Loki

    1. 启用持久化存储
    2. 启用 Prometheus Operator Service Monitor 做监控
      1. 并配置 Loki 相关 Prometheus Rule 做告警
    3. 因为个人集群日志量较小, 适当调大 ingester 相关配置

    Grafana

    1. 启用持久化存储
    2. 启用 Prometheus Operator Service Monitor 做监控
    3. sidecar 都配置上, 方便动态更新 dashboards/datasources/plugins/notifiers;

    Helm 安装

    通过如下命令安装:

    helm upgrade --install loki --namespace=loki --create-namespace grafana/loki-stack -f values.yaml
    

    自定义 values.yaml 如下:

    loki:
      enabled: true
      persistence:
        enabled: true
        storageClassName: local-path
        size: 20Gi
      serviceScheme: https
      user: admin
      password: changit!
      config:
        ingester:
          chunk_idle_period: 1h
          max_chunk_age: 4h
        compactor:
          retention_enabled: true
      serviceMonitor:
        enabled: true
        prometheusRule:
          enabled: true
          rules:
            #  Some examples from https://awesome-prometheus-alerts.grep.to/rules.html#loki
            - alert: LokiProcessTooManyRestarts
              expr: changes(process_start_time_seconds{job=~"loki"}[15m]) > 2
              for: 0m
              labels:
                severity: warning
              annotations:
                summary: Loki process too many restarts (instance {{ $labels.instance }})
                description: "A loki process had too many restarts (target {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
            - alert: LokiRequestErrors
              expr: 100 * sum(rate(loki_request_duration_seconds_count{status_code=~"5.."}[1m])) by (namespace, job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route) > 10
              for: 15m
              labels:
                severity: critical
              annotations:
                summary: Loki request errors (instance {{ $labels.instance }})
                description: "The {{ $labels.job }} and {{ $labels.route }} are experiencing errors\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
            - alert: LokiRequestPanic
              expr: sum(increase(loki_panic_total[10m])) by (namespace, job) > 0
              for: 5m
              labels:
                severity: critical
              annotations:
                summary: Loki request panic (instance {{ $labels.instance }})
                description: "The {{ $labels.job }} is experiencing {{ printf \"%.2f\" $value }}% increase of panics\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
            - alert: LokiRequestLatency
              expr: (histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route!~"(?i).*tail.*"}[5m])) by (le)))  > 1
              for: 5m
              labels:
                severity: critical
              annotations:
                summary: Loki request latency (instance {{ $labels.instance }})
                description: "The {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}s 99th percentile latency\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    promtail:
      enabled: true
      config:
        snippets:
          pipelineStages:
            - cri: {}  
      extraArgs:
        - -client.external-labels=cluster=ctyun        
      serviceMonitor:
        # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
        enabled: true
    
      # systemd-journal 额外配置:
      # Add additional scrape config
      extraScrapeConfigs:
        - job_name: journal
          journal:
            path: /var/log/journal
            max_age: 12h
            labels:
              job: systemd-journal
          relabel_configs:
            - source_labels: ['__journal__systemd_unit']
              target_label: 'unit'
            - source_labels: ['__journal__hostname']
              target_label: 'hostname'
    
      # Mount journal directory into Promtail pods
      extraVolumes:
        - name: journal
          hostPath:
            path: /var/log/journal
    
      extraVolumeMounts:
        - name: journal
          mountPath: /var/log/journal
          readOnly: true
    
    fluent-bit:
      enabled: false
    
    grafana:
      enabled: true
      adminUser: caseycui
      adminPassword: changit!
      ## Sidecars that collect the configmaps with specified label and stores the included files them into the respective folders
      ## Requires at least Grafana 5 to work and can't be used together with parameters dashboardProviders, datasources and dashboards
      sidecar:
        image:
          repository: quay.io/kiwigrid/k8s-sidecar
          tag: 1.15.6
          sha: ''
        dashboards:
          enabled: true
          SCProvider: true
          label: grafana_dashboard
        datasources:
          enabled: true
          # label that the configmaps with datasources are marked with
          label: grafana_datasource
        plugins:
          enabled: true
          # label that the configmaps with plugins are marked with
          label: grafana_plugin
        notifiers:
          enabled: true
          # label that the configmaps with notifiers are marked with
          label: grafana_notifier
      image:
        tag: 8.3.5
      persistence:
        enabled: true
        size: 2Gi
        storageClassName: local-path
      serviceMonitor:
        enabled: true
      imageRenderer:
        enabled: disable
    
    filebeat:
      enabled: false
    
    logstash:
      enabled: false
    
    

    安装后的资源拓扑如下:

    Loki K8S 资源拓扑

    Day 2 配置(按需)

    Grafana 增加 Dashboards

    在同一个 NS 下, 创建如下 ConfigMap: (只要打上grafana_dashboard 这个 label 就会被 Grafana 的 sidecar 自动导入)

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: sample-grafana-dashboard
      labels:
         grafana_dashboard: "1"
    data:
      k8s-dashboard.json: |-
      [...]
    

    Grafana 增加 DataSource

    在同一个 NS 下, 创建如下 ConfigMap: (只要打上grafana_datasource 这个 label 就会被 Grafana 的 sidecar 自动导入)

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: loki-loki-stack
      labels:
        grafana_datasource: '1'
    data:
      loki-stack-datasource.yaml: |-
        apiVersion: 1
        datasources:
        - name: Loki
          type: loki
          access: proxy
          url: http://loki:3100
          version: 1
    

    Traefik 配置 Grafana IngressRoute

    因为我是用的 Traefik 2, 通过 CRD IngressRoute 配置 Ingress, 配置如下:

    apiVersion: traefik.containo.us/v1alpha1
    kind: IngressRoute
    metadata:
      name: grafana
    spec:
      entryPoints:
        - web
        - websecure
      routes:
        - kind: Rule
          match: Host(`grafana.ewhisper.cn`)
          middlewares:
            - name: hsts-header
              namespace: kube-system
            - name: redirectshttps
              namespace: kube-system
          services:
            - name: loki-grafana
              namespace: monitoring
              port: 80
      tls: {}
    

    最终效果

    如下:

    Grafana Explore Logs

    🎉🎉🎉

    📚️参考文档

    Grafana 系列文章

    Grafana 系列文章

    三人行, 必有我师; 知识共享, 天下为公. 本文由东风微鸣技术博客 EWhisper.cn 编写.

    相关文章

      网友评论

          本文标题:Grafana 系列文章(十四):Helm 安装Loki

          本文链接:https://www.haomeiwen.com/subject/kknkkdtx.html