Deploying server monitoring with Docker


Author: KS保 | Published 2021-09-01 11:55

    Overview
    Server monitoring with alerting, built from grafana + prometheus + influxdb + node_exporter + alertmanager. All components except node_exporter are deployed with Docker.

    1. grafana: Grafana is an open-source data-visualization tool written in Go. It supports monitoring dashboards and statistics, and has built-in alerting.
    2. prometheus: Prometheus is an open-source systems monitoring and alerting framework.
    3. influxdb: InfluxDB is an open-source time-series platform. It provides APIs for storing and querying data, background processing for ETL, monitoring, and alerting, user dashboards, and tools for visualizing and exploring data.
    4. node_exporter: collects basic operating-system metrics such as CPU, memory, and disk usage, and exposes them over HTTP for Prometheus to scrape and store. Prometheus periodically pulls monitoring samples from the exporter's HTTP endpoint (usually /metrics).
    5. alertmanager: Alertmanager receives the alerts that Prometheus sends. It supports a wide range of notification channels and makes it easy to deduplicate, silence, and group alerts.
    Note: the exposed ports are grafana 3000, prometheus 9090, influxdb 8086, node_exporter 9100, and alertmanager 9093.
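    The pull model in point 4 can be sketched as a minimal Prometheus scrape configuration. This is a hypothetical fragment, not the article's actual config: the targets assume node_exporter and Alertmanager run on localhost at the default ports listed above; replace them with your hosts.

    ```yaml
    # minimal prometheus.yml sketch (hypothetical targets)
    global:
      scrape_interval: 15s          # how often Prometheus pulls samples

    scrape_configs:
      - job_name: 'node'
        # node_exporter serves metrics at /metrics on port 9100 by default
        static_configs:
          - targets: ['localhost:9100']

    alerting:
      alertmanagers:
        # where Prometheus forwards firing alerts (Alertmanager default port)
        - static_configs:
            - targets: ['localhost:9093']
    ```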

    I. Grafana

    1. Start Grafana

    • docker-compose.yml
    version: '3.5'
    
    services:
      grafana:
        image: grafana/grafana
        container_name: grafana
        networks:
          - proxy
        ports:
          - "3000:3000"
        volumes:
          - ./grafana.ini:/etc/grafana/grafana.ini
          - ./grafana:/var/log/grafana
        
    networks:
      proxy:
        external: true
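
    A plausible startup sequence for the compose file above (a sketch, assuming Docker Compose is installed and the external `proxy` network does not exist yet):

    ```shell
    # create the external network referenced by docker-compose.yml
    docker network create proxy

    # the grafana/grafana image runs as uid 472, so the mounted
    # log directory must be writable by that user
    mkdir -p ./grafana && chown 472:472 ./grafana

    # start the service in the background
    docker-compose up -d

    # verify Grafana is answering (default login: admin / admin)
    curl -fsS http://localhost:3000/api/health
    ```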
    
    
    • grafana.ini
    ##################### Grafana Configuration Example #####################
    #
    # Everything has defaults so you only need to uncomment things you want to
    # change
    
    # possible values : production, development
    ;app_mode = production
    
    # instance name, defaults to HOSTNAME environment variable value or hostname if HOSTNAME var is empty
    ;instance_name = ${HOSTNAME}
    
    #################################### Paths ####################################
    [paths]
    # Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
    ;data = /var/lib/grafana
    
    # Temporary files in `data` directory older than given duration will be removed
    ;temp_data_lifetime = 24h
    
    # Directory where grafana can store logs
    ;logs = /var/log/grafana
    
    # Directory where grafana will automatically scan and look for plugins
    ;plugins = /var/lib/grafana/plugins
    
    # folder that contains provisioning config files that grafana will apply on startup and while running.
    ;provisioning = conf/provisioning
    
    #################################### Server ####################################
    [server]
    # Protocol (http, https, h2, socket)
    ;protocol = http
    
    # The ip address to bind to, empty will bind to all interfaces
    ;http_addr =
    
    # The http port  to use
    ;http_port = 3000
    
    # The public facing domain name used to access grafana from a browser
    ;domain = localhost
    
    # Redirect to correct domain if host header does not match domain
    # Prevents DNS rebinding attacks
    ;enforce_domain = false
    
    # The full public facing url you use in browser, used for redirects and emails
    # If you use reverse proxy and sub path specify full url (with sub path)
    ;root_url = %(protocol)s://%(domain)s:%(http_port)s/
    
    # Serve Grafana from subpath specified in `root_url` setting. By default it is set to `false` for compatibility reasons.
    ;serve_from_sub_path = false
    
    # Log web requests
    ;router_logging = false
    
    # the path relative working path
    ;static_root_path = public
    
    # enable gzip
    ;enable_gzip = false
    
    # https certs & key file
    ;cert_file =
    ;cert_key =
    
    # Unix socket path
    ;socket =
    
    # CDN Url
    ;cdn_url =
    
    # Sets the maximum time using a duration format (5s/5m/5ms) before timing out read of an incoming request and closing idle connections.
    # `0` means there is no timeout for reading the request.
    ;read_timeout = 0
    
    #################################### Database ####################################
    [database]
    # You can configure the database connection by specifying type, host, name, user and password
    # as separate properties or as one string using the url property.
    
    # Either "mysql", "postgres" or "sqlite3", it's your choice
    ;type = sqlite3
    ;host = 127.0.0.1:3306
    ;name = grafana
    ;user = root
    # If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
    ;password =
    
    # Use either URL or the previous fields to configure the database
    # Example: mysql://user:secret@host:port/database
    ;url =
    
    # For "postgres" only, either "disable", "require" or "verify-full"
    ;ssl_mode = disable
    
    # Database drivers may support different transaction isolation levels.
    # Currently, only "mysql" driver supports isolation levels.
    # If the value is empty - driver's default isolation level is applied.
    # For "mysql" use "READ-UNCOMMITTED", "READ-COMMITTED", "REPEATABLE-READ" or "SERIALIZABLE".
    ;isolation_level =
    
    ;ca_cert_path =
    ;client_key_path =
    ;client_cert_path =
    ;server_cert_name =
    
    # For "sqlite3" only, path relative to data_path setting
    ;path = grafana.db
    
    # Max idle conn setting default is 2
    ;max_idle_conn = 2
    
    # Max conn setting default is 0 (means not set)
    ;max_open_conn =
    
    # Connection Max Lifetime default is 14400 (means 14400 seconds or 4 hours)
    ;conn_max_lifetime = 14400
    
    # Set to true to log the sql calls and execution times.
    ;log_queries =
    
    # For "sqlite3" only. cache mode setting used for connecting to the database. (private, shared)
    ;cache_mode = private
    
    ################################### Data sources #########################
    [datasources]
    # Upper limit of data sources that Grafana will return. This limit is a temporary configuration and it will be deprecated when pagination will be introduced on the list data sources API.
    ;datasource_limit = 5000
    
    #################################### Cache server #############################
    [remote_cache]
    # Either "redis", "memcached" or "database" default is "database"
    ;type = database
    
    # cache connectionstring options
    # database: will use Grafana primary database.
    # redis: config like redis server e.g. `addr=127.0.0.1:6379,pool_size=100,db=0,ssl=false`. Only addr is required. ssl may be 'true', 'false', or 'insecure'.
    # memcache: 127.0.0.1:11211
    ;connstr =
    
    #################################### Data proxy ###########################
    [dataproxy]
    
    # This enables data proxy logging, default is false
    ;logging = false
    
    # How long the data proxy waits to read the headers of the response before timing out, default is 30 seconds.
    # This setting also applies to core backend HTTP data sources where query requests use an HTTP client with timeout set.
    ;timeout = 30
    
    # How long the data proxy waits to establish a TCP connection before timing out, default is 10 seconds.
    ;dialTimeout = 10
    
    # How many seconds the data proxy waits before sending a keepalive probe request.
    ;keep_alive_seconds = 30
    
    # How many seconds the data proxy waits for a successful TLS Handshake before timing out.
    ;tls_handshake_timeout_seconds = 10
    
    # How many seconds the data proxy will wait for a server's first response headers after
    # fully writing the request headers if the request has an "Expect: 100-continue"
    # header. A value of 0 will result in the body being sent immediately, without
    # waiting for the server to approve.
    ;expect_continue_timeout_seconds = 1
    
    # Optionally limits the total number of connections per host, including connections in the dialing,
    # active, and idle states. On limit violation, dials will block.
    # A value of zero (0) means no limit.
    ;max_conns_per_host = 0
    
    # The maximum number of idle connections that Grafana will keep alive.
    ;max_idle_connections = 100
    
    # The maximum number of idle connections per host that Grafana will keep alive.
    ;max_idle_connections_per_host = 2
    
    # How many seconds the data proxy keeps an idle connection open before timing out.
    ;idle_conn_timeout_seconds = 90
    
    # If enabled and user is not anonymous, data proxy will add X-Grafana-User header with username into the request, default is false.
    ;send_user_header = false
    
    #################################### Analytics ####################################
    [analytics]
    # Server reporting, sends usage counters to stats.grafana.org every 24 hours.
    # No ip addresses are being tracked, only simple counters to track
    # running instances, dashboard and error counts. It is very helpful to us.
    # Change this option to false to disable reporting.
    ;reporting_enabled = true
    
    # The name of the distributor of the Grafana instance. Ex hosted-grafana, grafana-labs
    ;reporting_distributor = grafana-labs
    
    # Set to false to disable all checks to https://grafana.net
    # for new versions (grafana itself and plugins), check is used
    # in some UI views to notify that grafana or plugin update exists
    # This option does not cause any auto updates, nor send any information
    # only a GET request to http://grafana.com to get latest versions
    ;check_for_updates = true
    
    # Google Analytics universal tracking code, only enabled if you specify an id here
    ;google_analytics_ua_id =
    
    # Google Tag Manager ID, only enabled if you specify an id here
    ;google_tag_manager_id =
    
    #################################### Security ####################################
    [security]
    # disable creation of admin user on first start of grafana
    ;disable_initial_admin_creation = false
    
    # default admin user, created on startup
    ;admin_user = admin
    
    # default admin password, can be changed before first start of grafana,  or in profile settings
    ;admin_password = admin
    
    # used for signing
    ;secret_key = SW2YcwTIb9zpOOhoPsMm
    
    # disable gravatar profile images
    ;disable_gravatar = false
    
    # data source proxy whitelist (ip_or_domain:port separated by spaces)
    ;data_source_proxy_whitelist =
    
    # disable protection against brute force login attempts
    ;disable_brute_force_login_protection = false
    
    # set to true if you host Grafana behind HTTPS. default is false.
    ;cookie_secure = false
    
    # set cookie SameSite attribute. defaults to `lax`. can be set to "lax", "strict", "none" and "disabled"
    ;cookie_samesite = lax
    
    # set to true if you want to allow browsers to render Grafana in a <frame>, <iframe>, <embed> or <object>. default is false.
    ;allow_embedding = false
    
    # Set to true if you want to enable http strict transport security (HSTS) response header.
    # This is only sent when HTTPS is enabled in this configuration.
    # HSTS tells browsers that the site should only be accessed using HTTPS.
    ;strict_transport_security = false
    
    # Sets how long a browser should cache HSTS. Only applied if strict_transport_security is enabled.
    ;strict_transport_security_max_age_seconds = 86400
    
    # Set to true if to enable HSTS preloading option. Only applied if strict_transport_security is enabled.
    ;strict_transport_security_preload = false
    
    # Set to true if to enable the HSTS includeSubDomains option. Only applied if strict_transport_security is enabled.
    ;strict_transport_security_subdomains = false
    
    # Set to true to enable the X-Content-Type-Options response header.
    # The X-Content-Type-Options response HTTP header is a marker used by the server to indicate that the MIME types advertised
    # in the Content-Type headers should not be changed and be followed.
    ;x_content_type_options = true
    
    # Set to true to enable the X-XSS-Protection header, which tells browsers to stop pages from loading
    # when they detect reflected cross-site scripting (XSS) attacks.
    ;x_xss_protection = true
    
    # Enable adding the Content-Security-Policy header to your requests.
    # CSP allows to control resources the user agent is allowed to load and helps prevent XSS attacks.
    ;content_security_policy = false
    
    # Set Content Security Policy template used when adding the Content-Security-Policy header to your requests.
    # $NONCE in the template includes a random nonce.
    # $ROOT_PATH is server.root_url without the protocol.
    ;content_security_policy_template = """script-src 'self' 'unsafe-eval' 'unsafe-inline' 'strict-dynamic' $NONCE;object-src 'none';font-src 'self';style-src 'self' 'unsafe-inline' blob:;img-src * data:;base-uri 'self';connect-src 'self' grafana.com ws://$ROOT_PATH wss://$ROOT_PATH;manifest-src 'self';media-src 'none';form-action 'self';"""
    
    #################################### Snapshots ###########################
    [snapshots]
    # snapshot sharing options
    ;external_enabled = true
    ;external_snapshot_url = https://snapshots-origin.raintank.io
    ;external_snapshot_name = Publish to snapshot.raintank.io
    
    # Set to true to enable this Grafana instance act as an external snapshot server and allow unauthenticated requests for
    # creating and deleting snapshots.
    ;public_mode = false
    
    # remove expired snapshot
    ;snapshot_remove_expired = true
    
    #################################### Dashboards History ##################
    [dashboards]
    # Number dashboard versions to keep (per dashboard). Default: 20, Minimum: 1
    ;versions_to_keep = 20
    
    # Minimum dashboard refresh interval. When set, this will restrict users to set the refresh interval of a dashboard lower than given interval. Per default this is 5 seconds.
    # The interval string is a possibly signed sequence of decimal numbers, followed by a unit suffix (ms, s, m, h, d), e.g. 30s or 1m.
    ;min_refresh_interval = 5s
    
    # Path to the default home dashboard. If this value is empty, then Grafana uses StaticRootPath + "dashboards/home.json"
    ;default_home_dashboard_path =
    
    #################################### Users ###############################
    [users]
    # disable user signup / registration
    ;allow_sign_up = true
    
    # Allow non admin users to create organizations
    ;allow_org_create = true
    
    # Set to true to automatically assign new users to the default organization (id 1)
    ;auto_assign_org = true
    
    # Set this value to automatically add new users to the provided organization (if auto_assign_org above is set to true)
    ;auto_assign_org_id = 1
    
    # Default role new users will be automatically assigned (if disabled above is set to true)
    ;auto_assign_org_role = Viewer
    
    # Require email validation before sign up completes
    ;verify_email_enabled = false
    
    # Background text for the user field on the login page
    ;login_hint = email or username
    ;password_hint = password
    
    # Default UI theme ("dark" or "light")
    ;default_theme = dark
    
    # Path to a custom home page. Users are only redirected to this if the default home dashboard is used. It should match a frontend route and contain a leading slash.
    ; home_page =
    
    # External user management, these options affect the organization users view
    ;external_manage_link_url =
    ;external_manage_link_name =
    ;external_manage_info =
    
    # Viewers can edit/inspect dashboard settings in the browser. But not save the dashboard.
    ;viewers_can_edit = false
    
    # Editors can administrate dashboard, folders and teams they create
    ;editors_can_admin = false
    
    # The duration in time a user invitation remains valid before expiring. This setting should be expressed as a duration. Examples: 6h (hours), 2d (days), 1w (week). Default is 24h (24 hours). The minimum supported duration is 15m (15 minutes).
    ;user_invite_max_lifetime_duration = 24h
    
    # Enter a comma-separated list of users login to hide them in the Grafana UI. These users are shown to Grafana admins and themselves.
    ; hidden_users =
    
    [auth]
    # Login cookie name
    ;login_cookie_name = grafana_session
    
    # The maximum lifetime (duration) an authenticated user can be inactive before being required to login at next visit. Default is 7 days (7d). This setting should be expressed as a duration, e.g. 5m (minutes), 6h (hours), 10d (days), 2w (weeks), 1M (month). The lifetime resets at each successful token rotation.
    ;login_maximum_inactive_lifetime_duration =
    
    # The maximum lifetime (duration) an authenticated user can be logged in since login time before being required to login. Default is 30 days (30d). This setting should be expressed as a duration, e.g. 5m (minutes), 6h (hours), 10d (days), 2w (weeks), 1M (month).
    ;login_maximum_lifetime_duration =
    
    # How often should auth tokens be rotated for authenticated users when being active. The default is each 10 minutes.
    ;token_rotation_interval_minutes = 10
    
    # Set to true to disable (hide) the login form, useful if you use OAuth, defaults to false
    ;disable_login_form = false
    
    # Set to true to disable the sign out link in the side menu. Useful if you use auth.proxy or auth.jwt, defaults to false
    ;disable_signout_menu = false
    
    # URL to redirect the user to after sign out
    ;signout_redirect_url =
    
    # Set to true to attempt login with OAuth automatically, skipping the login screen.
    # This setting is ignored if multiple OAuth providers are configured.
    ;oauth_auto_login = false
    
    # OAuth state max age cookie duration in seconds. Defaults to 600 seconds.
    ;oauth_state_cookie_max_age = 600
    
    # limit of api_key seconds to live before expiration
    ;api_key_max_seconds_to_live = -1
    
    # Set to true to enable SigV4 authentication option for HTTP-based datasources.
    ;sigv4_auth_enabled = false
    
    #################################### Anonymous Auth ######################
    [auth.anonymous]
    # enable anonymous access
    ;enabled = false
    
    # specify organization name that should be used for unauthenticated users
    ;org_name = Main Org.
    
    # specify role for unauthenticated users
    ;org_role = Viewer
    
    # mask the Grafana version number for unauthenticated users
    ;hide_version = false
    
    #################################### GitHub Auth ##########################
    [auth.github]
    ;enabled = false
    ;allow_sign_up = true
    ;client_id = some_id
    ;client_secret = some_secret
    ;scopes = user:email,read:org
    ;auth_url = https://github.com/login/oauth/authorize
    ;token_url = https://github.com/login/oauth/access_token
    ;api_url = https://api.github.com/user
    ;allowed_domains =
    ;team_ids =
    ;allowed_organizations =
    
    #################################### GitLab Auth #########################
    [auth.gitlab]
    ;enabled = false
    ;allow_sign_up = true
    ;client_id = some_id
    ;client_secret = some_secret
    ;scopes = api
    ;auth_url = https://gitlab.com/oauth/authorize
    ;token_url = https://gitlab.com/oauth/token
    ;api_url = https://gitlab.com/api/v4
    ;allowed_domains =
    ;allowed_groups =
    
    #################################### Google Auth ##########################
    [auth.google]
    ;enabled = false
    ;allow_sign_up = true
    ;client_id = some_client_id
    ;client_secret = some_client_secret
    ;scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
    ;auth_url = https://accounts.google.com/o/oauth2/auth
    ;token_url = https://accounts.google.com/o/oauth2/token
    ;api_url = https://www.googleapis.com/oauth2/v1/userinfo
    ;allowed_domains =
    ;hosted_domain =
    
    #################################### Grafana.com Auth ####################
    [auth.grafana_com]
    ;enabled = false
    ;allow_sign_up = true
    ;client_id = some_id
    ;client_secret = some_secret
    ;scopes = user:email
    ;allowed_organizations =
    
    #################################### Azure AD OAuth #######################
    [auth.azuread]
    ;name = Azure AD
    ;enabled = false
    ;allow_sign_up = true
    ;client_id = some_client_id
    ;client_secret = some_client_secret
    ;scopes = openid email profile
    ;auth_url = https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/authorize
    ;token_url = https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token
    ;allowed_domains =
    ;allowed_groups =
    
    #################################### Okta OAuth #######################
    [auth.okta]
    ;name = Okta
    ;enabled = false
    ;allow_sign_up = true
    ;client_id = some_id
    ;client_secret = some_secret
    ;scopes = openid profile email groups
    ;auth_url = https://<tenant-id>.okta.com/oauth2/v1/authorize
    ;token_url = https://<tenant-id>.okta.com/oauth2/v1/token
    ;api_url = https://<tenant-id>.okta.com/oauth2/v1/userinfo
    ;allowed_domains =
    ;allowed_groups =
    ;role_attribute_path =
    ;role_attribute_strict = false
    
    #################################### Generic OAuth ##########################
    [auth.generic_oauth]
    ;enabled = false
    ;name = OAuth
    ;allow_sign_up = true
    ;client_id = some_id
    ;client_secret = some_secret
    ;scopes = user:email,read:org
    ;empty_scopes = false
    ;email_attribute_name = email:primary
    ;email_attribute_path =
    ;login_attribute_path =
    ;name_attribute_path =
    ;id_token_attribute_name =
    ;auth_url = https://foo.bar/login/oauth/authorize
    ;token_url = https://foo.bar/login/oauth/access_token
    ;api_url = https://foo.bar/user
    ;allowed_domains =
    ;team_ids =
    ;allowed_organizations =
    ;role_attribute_path =
    ;role_attribute_strict = false
    ;tls_skip_verify_insecure = false
    ;tls_client_cert =
    ;tls_client_key =
    ;tls_client_ca =
    
    #################################### Basic Auth ##########################
    [auth.basic]
    ;enabled = true
    
    #################################### Auth Proxy ##########################
    [auth.proxy]
    ;enabled = false
    ;header_name = X-WEBAUTH-USER
    ;header_property = username
    ;auto_sign_up = true
    ;sync_ttl = 60
    ;whitelist = 192.168.1.1, 192.168.2.1
    ;headers = Email:X-User-Email, Name:X-User-Name
    # Read the auth proxy docs for details on what the setting below enables
    ;enable_login_token = false
    
    #################################### Auth JWT ##########################
    [auth.jwt]
    ;enabled = true
    ;header_name = X-JWT-Assertion
    ;email_claim = sub
    ;username_claim = sub
    ;jwk_set_url = https://foo.bar/.well-known/jwks.json
    ;jwk_set_file = /path/to/jwks.json
    ;cache_ttl = 60m
    ;expected_claims = {"aud": ["foo", "bar"]}
    ;key_file = /path/to/key/file
    
    #################################### Auth LDAP ##########################
    [auth.ldap]
    ;enabled = false
    ;config_file = /etc/grafana/ldap.toml
    ;allow_sign_up = true
    
    # LDAP background sync (Enterprise only)
    # At 1 am every day
    ;sync_cron = "0 0 1 * * *"
    ;active_sync_enabled = true
    
    #################################### AWS ###########################
    [aws]
    # Enter a comma-separated list of allowed AWS authentication providers.
    # Options are: default (AWS SDK Default), keys (Access && secret key), credentials (Credentials field), ec2_iam_role (EC2 IAM Role)
    ; allowed_auth_providers = default,keys,credentials
    
    # Allow AWS users to assume a role using temporary security credentials.
    # If true, assume role will be enabled for all AWS authentication providers that are specified in aws_auth_providers
    ; assume_role_enabled = true
    
    #################################### Azure ###############################
    [azure]
    # Azure cloud environment where Grafana is hosted
    # Possible values are AzureCloud, AzureChinaCloud, AzureUSGovernment and AzureGermanCloud
    # Default value is AzureCloud (i.e. public cloud)
    ;cloud = AzureCloud
    
    # Specifies whether Grafana hosted in Azure service with Managed Identity configured (e.g. Azure Virtual Machines instance)
    # If enabled, the managed identity can be used for authentication of Grafana in Azure services
    # Disabled by default, needs to be explicitly enabled
    ;managed_identity_enabled = false
    
    # Client ID to use for user-assigned managed identity
    # Should be set for user-assigned identity and should be empty for system-assigned identity
    ;managed_identity_client_id =
    
    #################################### SMTP / Emailing ##########################
    [smtp]
    ;enabled = false
    ;host = localhost:25
    ;user =
    # If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
    ;password =
    ;cert_file =
    ;key_file =
    ;skip_verify = false
    ;from_address = admin@grafana.localhost
    ;from_name = Grafana
    # EHLO identity in SMTP dialog (defaults to instance_name)
    ;ehlo_identity = dashboard.example.com
    # SMTP startTLS policy (defaults to 'OpportunisticStartTLS')
    ;startTLS_policy = NoStartTLS
    
    [emails]
    ;welcome_email_on_sign_up = false
    ;templates_pattern = emails/*.html
    
    #################################### Logging ##########################
    [log]
    # Either "console", "file", "syslog". Default is console and  file
    # Use space to separate multiple modes, e.g. "console file"
    ;mode = console file
    
    # Either "debug", "info", "warn", "error", "critical", default is "info"
    ;level = info
    
    # optional settings to set different levels for specific loggers. Ex filters = sqlstore:debug
    ;filters =
    
    # For "console" mode only
    [log.console]
    ;level =
    
    # log line format, valid options are text, console and json
    ;format = console
    
    # For "file" mode only
    [log.file]
    ;level =
    
    # log line format, valid options are text, console and json
    ;format = text
    
    # This enables automated log rotate(switch of following options), default is true
    ;log_rotate = true
    
    # Max line number of single file, default is 1000000
    ;max_lines = 1000000
    
    # Max size shift of single file, default is 28 means 1 << 28, 256MB
    ;max_size_shift = 28
    
    # Segment log daily, default is true
    ;daily_rotate = true
    
    # Expired days of log file(delete after max days), default is 7
    ;max_days = 7
    
    [log.syslog]
    ;level =
    
    # log line format, valid options are text, console and json
    ;format = text
    
    # Syslog network type and address. This can be udp, tcp, or unix. If left blank, the default unix endpoints will be used.
    ;network =
    ;address =
    
    # Syslog facility. user, daemon and local0 through local7 are valid.
    ;facility =
    
    # Syslog tag. By default, the process' argv[0] is used.
    ;tag =
    
    [log.frontend]
    # Should Sentry javascript agent be initialized
    ;enabled = false
    
    # Sentry DSN if you want to send events to Sentry.
    ;sentry_dsn =
    
    # Custom HTTP endpoint to send events captured by the Sentry agent to. Default will log the events to stdout.
    ;custom_endpoint = /log
    
    # Rate of events to be reported between 0 (none) and 1 (all), float
    ;sample_rate = 1.0
    
    # Requests per second limit enforced an extended period, for Grafana backend log ingestion endpoint (/log).
    ;log_endpoint_requests_per_second_limit = 3
    
    # Max requests accepted per short interval of time for Grafana backend log ingestion endpoint (/log).
    ;log_endpoint_burst_limit = 15
    
    #################################### Usage Quotas ########################
    [quota]
    ; enabled = false
    
    #### set quotas to -1 to make unlimited. ####
    # limit number of users per Org.
    ; org_user = 10
    
    # limit number of dashboards per Org.
    ; org_dashboard = 100
    
    # limit number of data_sources per Org.
    ; org_data_source = 10
    
    # limit number of api_keys per Org.
    ; org_api_key = 10
    
    # limit number of alerts per Org.
    ;org_alert_rule = 100
    
    # limit number of orgs a user can create.
    ; user_org = 10
    
    # Global limit of users.
    ; global_user = -1
    
    # global limit of orgs.
    ; global_org = -1
    
    # global limit of dashboards
    ; global_dashboard = -1
    
    # global limit of api_keys
    ; global_api_key = -1
    
    # global limit on number of logged in users.
    ; global_session = -1
    
    # global limit of alerts
    ;global_alert_rule = -1
    
    #################################### Alerting ############################
    [alerting]
    # Disable alerting engine & UI features
    ;enabled = true
    # Makes it possible to turn off alert rule execution but alerting UI is visible
    ;execute_alerts = true
    
    # Default setting for new alert rules. Defaults to categorize error and timeouts as alerting. (alerting, keep_state)
    ;error_or_timeout = alerting
    
    # Default setting for how Grafana handles nodata or null values in alerting. (alerting, no_data, keep_state, ok)
    ;nodata_or_nullvalues = no_data
    
    # Alert notifications can include images, but rendering many images at the same time can overload the server
    # This limit will protect the server from render overloading and make sure notifications are sent out quickly
    ;concurrent_render_limit = 5
    
    
    # Default setting for alert calculation timeout. Default value is 30
    ;evaluation_timeout_seconds = 30
    
    # Default setting for alert notification timeout. Default value is 30
    ;notification_timeout_seconds = 30
    
    # Default setting for max attempts to sending alert notifications. Default value is 3
    ;max_attempts = 3
    
    # Makes it possible to enforce a minimal interval between evaluations, to reduce load on the backend
    ;min_interval_seconds = 1
    
    # Configures for how long alert annotations are stored. Default is 0, which keeps them forever.
    # This setting should be expressed as a duration. Examples: 6h (hours), 10d (days), 2w (weeks), 1M (month).
    ;max_annotation_age =
    
    # Configures max number of alert annotations that Grafana stores. Default value is 0, which keeps all alert annotations.
    ;max_annotations_to_keep =
    
    #################################### Annotations #########################
    [annotations]
    # Configures the batch size for the annotation clean-up job. This setting is used for dashboard, API, and alert annotations.
    ;cleanupjob_batchsize = 100
    
    [annotations.dashboard]
    # Dashboard annotations means that annotations are associated with the dashboard they are created on.
    
    # Configures how long dashboard annotations are stored. Default is 0, which keeps them forever.
    # This setting should be expressed as a duration. Examples: 6h (hours), 10d (days), 2w (weeks), 1M (month).
    ;max_age =
    
    # Configures max number of dashboard annotations that Grafana stores. Default value is 0, which keeps all dashboard annotations.
    ;max_annotations_to_keep =
    
    [annotations.api]
    # API annotations means that the annotations have been created using the API without any
    # association with a dashboard.
    
    # Configures how long Grafana stores API annotations. Default is 0, which keeps them forever.
    # This setting should be expressed as a duration. Examples: 6h (hours), 10d (days), 2w (weeks), 1M (month).
    ;max_age =
    
    # Configures max number of API annotations that Grafana keeps. Default value is 0, which keeps all API annotations.
    ;max_annotations_to_keep =
    
    #################################### Explore #############################
    [explore]
    # Enable the Explore section
    ;enabled = true
    
    #################################### Internal Grafana Metrics ##########################
    # Metrics available at HTTP API Url /metrics
    [metrics]
    # Disable / Enable internal metrics
    ;enabled           = true
    # Graphite Publish interval
    ;interval_seconds  = 10
    # Disable total stats (stat_totals_*) metrics to be generated
    ;disable_total_stats = false
    
    # If both are set, basic auth will be required for the metrics endpoint.
    ; basic_auth_username =
    ; basic_auth_password =
    
    # Metrics environment info adds dimensions to the `grafana_environment_info` metric, which
    # can expose more information about the Grafana instance.
    [metrics.environment_info]
    #exampleLabel1 = exampleValue1
    #exampleLabel2 = exampleValue2
    
    # Send internal metrics to Graphite
    [metrics.graphite]
    # Enable by setting the address setting (ex localhost:2003)
    ;address =
    ;prefix = prod.grafana.%(instance_name)s.
    
    #################################### Grafana.com integration  ##########################
    # Url used to import dashboards directly from Grafana.com
    [grafana_com]
    ;url = https://grafana.com
    
    #################################### Distributed tracing ############
    [tracing.jaeger]
    # Enable by setting the address sending traces to jaeger (ex localhost:6831)
    ;address = localhost:6831
    # Tag that will always be included in when creating new spans. ex (tag1:value1,tag2:value2)
    ;always_included_tag = tag1:value1
    # Type specifies the type of the sampler: const, probabilistic, rateLimiting, or remote
    ;sampler_type = const
    # jaeger samplerconfig param
    # for "const" sampler, 0 or 1 for always false/true respectively
    # for "probabilistic" sampler, a probability between 0 and 1
    # for "rateLimiting" sampler, the number of spans per second
    # for "remote" sampler, param is the same as for "probabilistic"
    # and indicates the initial sampling rate before the actual one
    # is received from the mothership
    ;sampler_param = 1
    # sampling_server_url is the URL of a sampling manager providing a sampling strategy.
    ;sampling_server_url =
    # Whether or not to use Zipkin propagation (x-b3- HTTP headers).
    ;zipkin_propagation = false
    # Setting this to true disables shared RPC spans.
    # Not disabling is the most common setting when using Zipkin elsewhere in your infrastructure.
    ;disable_shared_zipkin_spans = false
    
    #################################### External image storage ##########################
    [external_image_storage]
    # Used for uploading images to public servers so they can be included in slack/email messages.
    # you can choose between (s3, webdav, gcs, azure_blob, local)
    ;provider =
    
    [external_image_storage.s3]
    ;endpoint =
    ;path_style_access =
    ;bucket =
    ;region =
    ;path =
    ;access_key =
    ;secret_key =
    
    [external_image_storage.webdav]
    ;url =
    ;public_url =
    ;username =
    ;password =
    
    [external_image_storage.gcs]
    ;key_file =
    ;bucket =
    ;path =
    
    [external_image_storage.azure_blob]
    ;account_name =
    ;account_key =
    ;container_name =
    
    [external_image_storage.local]
    # does not require any configuration
    
    [rendering]
    # Options to configure a remote HTTP image rendering service, e.g. using https://github.com/grafana/grafana-image-renderer.
    # URL to a remote HTTP image renderer service, e.g. http://localhost:8081/render, will enable Grafana to render panels and dashboards to PNG-images using HTTP requests to an external service.
    ;server_url =
    # If the remote HTTP image renderer service runs on a different server than the Grafana server you may have to configure this to a URL where Grafana is reachable, e.g. http://grafana.domain/.
    ;callback_url =
    # Concurrent render request limit affects when the /render HTTP endpoint is used. Rendering many images at the same time can overload the server,
    # which this setting can help protect against by only allowing a certain amount of concurrent requests.
    ;concurrent_render_request_limit = 30
    
    [panels]
    # If set to true Grafana will allow script tags in text panels. Not recommended as it enables XSS vulnerabilities.
    ;disable_sanitize_html = false
    
    [plugins]
    ;enable_alpha = false
    ;app_tls_skip_verify_insecure = false
    # Enter a comma-separated list of plugin identifiers to identify plugins to load even if they are unsigned. Plugins with modified signatures are never loaded.
    ;allow_loading_unsigned_plugins =
    # Enable or disable installing plugins directly from within Grafana.
    ;plugin_admin_enabled = false
    ;plugin_admin_external_manage_enabled = false
    ;plugin_catalog_url = https://grafana.com/grafana/plugins/
    
    #################################### Grafana Live ##########################################
    [live]
    # max_connections to Grafana Live WebSocket endpoint per Grafana server instance. See Grafana Live docs
    # if you are planning to make it higher than default 100 since this can require some OS and infrastructure
    # tuning. 0 disables Live, -1 means unlimited connections.
    ;max_connections = 100
    
    #################################### Grafana Image Renderer Plugin ##########################
    [plugin.grafana-image-renderer]
    # Instruct headless browser instance to use a default timezone when not provided by Grafana, e.g. when rendering panel image of alert.
    # See ICU’s metaZones.txt (https://cs.chromium.org/chromium/src/third_party/icu/source/data/misc/metaZones.txt) for a list of supported
    # timezone IDs. Fallbacks to TZ environment variable if not set.
    ;rendering_timezone =
    
    # Instruct headless browser instance to use a default language when not provided by Grafana, e.g. when rendering panel image of alert.
    # Please refer to the HTTP header Accept-Language to understand how to format this value, e.g. 'fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5'.
    ;rendering_language =
    
    # Instruct headless browser instance to use a default device scale factor when not provided by Grafana, e.g. when rendering panel image of alert.
    # Default is 1. Using a higher value will produce more detailed images (higher DPI), but will require more disk space to store an image.
    ;rendering_viewport_device_scale_factor =
    
    # Instruct headless browser instance whether to ignore HTTPS errors during navigation. Per default HTTPS errors are not ignored. Due to
    # the security risk it's not recommended to ignore HTTPS errors.
    ;rendering_ignore_https_errors =
    
    # Instruct headless browser instance whether to capture and log verbose information when rendering an image. Default is false and will
    # only capture and log error messages. When enabled, debug messages are captured and logged as well.
    # For the verbose information to be included in the Grafana server log you have to adjust the rendering log level to debug, configure
    # [log].filter = rendering:debug.
    ;rendering_verbose_logging =
    
    # Instruct headless browser instance whether to output its debug and error messages into running process of remote rendering service.
    # Default is false. This can be useful to enable (true) when troubleshooting.
    ;rendering_dumpio =
    
    # Additional arguments to pass to the headless browser instance. Default is --no-sandbox. The list of Chromium flags can be found
    # here (https://peter.sh/experiments/chromium-command-line-switches/). Multiple arguments are separated with a comma.
    ;rendering_args =
    
    # You can configure the plugin to use a different browser binary instead of the pre-packaged version of Chromium.
    # Please note that this is not recommended, since you may encounter problems if the installed version of Chrome/Chromium is not
    # compatible with the plugin.
    ;rendering_chrome_bin =
    
    # Instruct how headless browser instances are created. Default is 'default' and will create a new browser instance on each request.
    # Mode 'clustered' will make sure that only a maximum of browsers/incognito pages can execute concurrently.
    # Mode 'reusable' will have one browser instance and will create a new incognito page on each request.
    ;rendering_mode =
    
    # When rendering_mode = clustered you can instruct how many browsers or incognito pages can execute concurrently. Default is 'browser'
    # and will cluster using browser instances.
    # Mode 'context' will cluster using incognito pages.
    ;rendering_clustering_mode =
    # When rendering_mode = clustered you can define the maximum number of browser instances/incognito pages that can execute concurrently.
    ;rendering_clustering_max_concurrency =
    
    # Limit the maximum viewport width, height and device scale factor that can be requested.
    ;rendering_viewport_max_width =
    ;rendering_viewport_max_height =
    ;rendering_viewport_max_device_scale_factor =
    
    # Change the listening host and port of the gRPC server. Default host is 127.0.0.1 and default port is 0 and will automatically assign
    # a port not in use.
    ;grpc_host =
    ;grpc_port =
    
    [enterprise]
    # Path to a valid Grafana Enterprise license.jwt file
    ;license_path =
    
    [feature_toggles]
    # enable features, separated by spaces
    ;enable =
    
    [date_formats]
    # For information on what formatting patterns that are supported https://momentjs.com/docs/#/displaying/
    
    # Default system date format used in time range picker and other places where full time is displayed
    ;full_date = YYYY-MM-DD HH:mm:ss
    
    # Used by graph and other places where we only show small intervals
    ;interval_second = HH:mm:ss
    ;interval_minute = HH:mm
    ;interval_hour = MM/DD HH:mm
    ;interval_day = MM/DD
    ;interval_month = YYYY-MM
    ;interval_year = YYYY
    
    # Experimental feature
    ;use_browser_locale = false
    
    # Default timezone for user preferences. Options are 'browser' for the browser local timezone or a timezone name from IANA Time Zone database, e.g. 'UTC' or 'Europe/Amsterdam' etc.
    ;default_timezone = browser
    
    [expressions]
    # Enable or disable the expressions functionality.
    ;enabled = true
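
    Once the container is up (docker-compose up -d), it is worth confirming that Grafana started correctly before moving on. A minimal check, assuming the default port mapping from the compose file above:

    ```shell
    # Query Grafana's health endpoint; a healthy instance returns HTTP 200
    # with a small JSON body containing "database": "ok"
    curl -s http://localhost:3000/api/health
    ```

    The web UI itself is then reachable at http://localhost:3000 (default login admin/admin).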
    

    II. Prometheus

    1. Start Prometheus

    • docker-compose.yml
    version: "3.5"
    services:
      prometheus:
        image: prom/prometheus:v2.27.1
        restart: always
        container_name: prometheus
        networks:
          - proxy
        ports:
          - 9090:9090
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
          - ./rules:/etc/prometheus/rules
    
    networks:
      proxy:
        external: true
    
    
    • prometheus.yml
    global:
      scrape_interval:     15s # By default, scrape targets every 15 seconds.
    
    
      # Attach these labels to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
        monitor: 'codelab-monitor'
    
    # Remote read/write to InfluxDB; the target database must already exist in InfluxDB
    remote_write:
      - url: "http://xxx.xxx.xxx.xxx:8086/api/v1/prom/write?db=prometheus&u=root&p=***"
    
    remote_read:
      - url: "http://xxx.xxx.xxx.xxx:8086/api/v1/prom/read?db=prometheus&u=root&p=***"
    
    rule_files:
      - "./rules/*.yml"
    
    # Alerting configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['xxx.xxx.xxx.xxx:9093']
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
    
        # Override the global default and scrape targets from this job every 5 seconds.
        scrape_interval: 5s
    
        static_configs:
          - targets: ['localhost:9090']
    
      - job_name: 'agent'
        # Basic auth (the username/password configured when starting node_exporter)
        basic_auth:
          username: xxxx
          password: ****
    
        static_configs:
          # Servers to monitor
          - targets: ['xxx.xxx.xxx.xxx:9100', 'xxx.xxx.xxx.xxx:9100']
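
    Before relying on the `agent` job, confirm that Prometheus can actually reach a target with the configured credentials. A minimal check from the Prometheus host — the IP, username, and password below are placeholders for your own values:

    ```shell
    # Fetch a few metrics from node_exporter using the same basic-auth
    # credentials configured under the 'agent' job.
    # An HTTP 401 here means the username/password is wrong.
    curl -s -u xxxx:your-password http://xxx.xxx.xxx.xxx:9100/metrics | head -n 5
    ```

    Targets that scrape successfully show state UP on the Prometheus UI at /targets.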
    
    • rules/example.yml
    groups:
    - name: hostStatsAlert
      rules:
      - alert: HostCpuUsageHigh
        expr: (100 - avg (rate(node_cpu_seconds_total{job="agent",mode="idle"}[2m])) by (instance) * 100) > 8
        for: 1m
        labels:
          severity: level2
        annotations:
          summary: "Instance {{ $labels.instance }}: CPU usage is high"
          description: "CPU usage on {{ $labels.instance }} has exceeded 8% (current value: {{ $value }})"
          username: "lcb"
          link: https://grafana.llnovel.com/d/5rJJbzz7k/fu-wu-qi-xing-neng-jian-kong?orgId=1

      - alert: HostMemoryUsageHigh
        expr: ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes)*100 > 67
        for: 1m
        labels:
          severity: level2
        annotations:
          summary: "Instance {{ $labels.instance }}: memory usage is high"
          description: "Memory usage on {{ $labels.instance }} has exceeded 67% (current value: {{ $value }})"
          username: "lcb"
          link: https://grafana.llnovel.com/d/5rJJbzz7k/fu-wu-qi-xing-neng-jian-kong?orgId=1

      - alert: HostDiskUsageHigh
        expr: (node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_free_bytes{mountpoint="/"}) / node_filesystem_size_bytes{mountpoint="/"} * 100 > 68.5
        for: 1m
        labels:
          severity: level2
        annotations:
          summary: "Instance {{ $labels.instance }}: root filesystem usage is high"
          description: "Disk usage of / on {{ $labels.instance }} has exceeded 68.5% (current value: {{ $value }})"
          username: "lcb"
          link: https://grafana.llnovel.com/d/5rJJbzz7k/fu-wu-qi-xing-neng-jian-kong?orgId=1
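
    As a sanity check on the CPU rule above: the expression is simply 100 minus the per-instance average idle percentage. A quick sketch of that arithmetic with a made-up idle rate of 0.9 (i.e. the CPU is 90% idle):

    ```shell
    # Hypothetical value for avg(rate(node_cpu_seconds_total{mode="idle"}[2m])) on one instance
    idle_rate=0.9
    # CPU usage = 100 - idle%; with 90% idle this evaluates to 10
    usage=$(awk -v i="$idle_rate" 'BEGIN { printf "%.0f", 100 - i * 100 }')
    echo "CPU usage: ${usage}%"   # prints "CPU usage: 10%"
    ```

    Since 10 is above the `> 8` threshold, this alert would go pending and then fire once it had held for the 1m `for` window. You can also validate the rule file syntax with `promtool check rules rules/example.yml` before reloading Prometheus.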
    
    

    Parameter notes:

    for: while a metric stays above its threshold, the alert is first held in the pending state; once it has remained pending for the duration given by for, it switches to the firing state and the alert is sent to Alertmanager.
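
    You can watch an alert move between the pending and firing states through the Prometheus HTTP API; a minimal sketch, assuming Prometheus is reachable on localhost:9090:

    ```shell
    # Lists currently active alerts; each entry carries a "state" field
    # whose value is "pending" or "firing"
    curl -s http://localhost:9090/api/v1/alerts
    ```

    The same information is shown on the Prometheus UI under the Alerts tab.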

    III. Install node_exporter

    1. wget https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
    2. tar -zxf node_exporter-1.1.2.linux-amd64.tar.gz -C /usr/local/
    3. cd /usr/local/
    4. mv /usr/local/node_exporter-1.1.2.linux-amd64/ /usr/local/node_exporter
    5. vim config.yml (set a username/password for access; skip this step if you don't need authentication)
    
    • Hash the password with htpasswd: htpasswd -nBC 12 '' | tr -d ':\n'
    • config.yml
    basic_auth_users:
      # the user configured here is "prometheus"; multiple users may be listed
      # the value must be the bcrypt hash produced by: htpasswd -nBC 12 '' | tr -d ':\n'
      prometheus: <bcrypt-hash>
    
    6. cd /usr/local/node_exporter && nohup ./node_exporter --web.config=./config.yml &
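
    nohup works, but the process will not survive a reboot. As a more durable alternative you could run node_exporter under systemd — the unit below is a sketch (it is not part of the original setup; paths match the install location above, adjust as needed):

    ```ini
    # /etc/systemd/system/node_exporter.service
    [Unit]
    Description=Prometheus Node Exporter
    After=network.target

    [Service]
    WorkingDirectory=/usr/local/node_exporter
    ExecStart=/usr/local/node_exporter/node_exporter --web.config=./config.yml
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target
    ```

    Then enable it with systemctl daemon-reload && systemctl enable --now node_exporter.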
    

    IV. Install InfluxDB

    • docker-compose.yml
    version: "3.5"
      
    services:
      influxdb:
        image: influxdb:1.8
        volumes:
          - ./influxdb.conf:/etc/influxdb/influxdb.conf
          - ./influxdb:/var/lib/influxdb
        ports:
          - "8086:8086"
        command: -config /etc/influxdb/influxdb.conf
        environment:
          - INFLUXDB_DB=prometheus
          - INFLUXDB_ADMIN_ENABLED=true
          - INFLUXDB_ADMIN_USER=admin
          - INFLUXDB_ADMIN_PASSWORD=admin
          - INFLUXDB_USER=root
          - INFLUXDB_USER_PASSWORD=lcb123
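
    After the container starts, confirm that the prometheus database (created via INFLUXDB_DB above) exists and that authentication works. A minimal check using the influx CLI bundled in the image, with the admin credentials from this compose file:

    ```shell
    # Runs the influx shell inside the influxdb service container and lists
    # databases; "prometheus" should appear in the output
    docker-compose exec influxdb influx -username admin -password admin -execute 'SHOW DATABASES'
    ```

    If this fails with an authorization error, re-check the admin user settings and auth-enabled in influxdb.conf below.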
    
    • influxdb.conf
    ### Welcome to the InfluxDB configuration file.
    
    # The values in this file override the default values used by the system if
    # a config option is not specified. The commented out lines are the configuration
    # field and the default value used. Uncommenting a line and changing the value
    # will change the value used at runtime when the process is restarted.
    
    # Once every 24 hours InfluxDB will report usage data to usage.influxdata.com
    # The data includes a random ID, os, arch, version, the number of series and other
    # usage data. No data from user databases is ever transmitted.
    # Change this option to true to disable reporting.
    # reporting-disabled = false
    
    # Bind address to use for the RPC service for backup and restore.
    # bind-address = "127.0.0.1:8088"
    
    ###
    ### [meta]
    ###
    ### Controls the parameters for the Raft consensus group that stores metadata
    ### about the InfluxDB cluster.
    ###
    
    [meta]
      # Where the metadata/raft database is stored
      dir = "/var/lib/influxdb/meta"
    
      # Automatically create a default retention policy when creating a database.
      # retention-autocreate = true
    
      # If log messages are printed for the meta service
      # logging-enabled = true
    
    ###
    ### [data]
    ###
    ### Controls where the actual shard data for InfluxDB lives and how it is
    ### flushed from the WAL. "dir" may need to be changed to a suitable place
    ### for your system, but the WAL settings are an advanced configuration. The
    ### defaults should work for most systems.
    ###
    
    [data]
      # The directory where the TSM storage engine stores TSM files.
      dir = "/var/lib/influxdb/data"
    
      # The directory where the TSM storage engine stores WAL files.
      wal-dir = "/var/lib/influxdb/wal"
    
      # The amount of time that a write will wait before fsyncing.  A duration
      # greater than 0 can be used to batch up multiple fsync calls.  This is useful for slower
      # disks or when WAL write contention is seen.  A value of 0s fsyncs every write to the WAL.
      # Values in the range of 0-100ms are recommended for non-SSD disks.
      # wal-fsync-delay = "0s"
    
    
      # The type of shard index to use for new shards.  The default is an in-memory index that is
      # recreated at startup.  A value of "tsi1" will use a disk based index that supports higher
      # cardinality datasets.
      # index-version = "inmem"
    
      # Trace logging provides more verbose output around the tsm engine. Turning
      # this on can provide more useful output for debugging tsm engine issues.
      # trace-logging-enabled = false
    
      # Whether queries should be logged before execution. Very useful for troubleshooting, but will
      # log any sensitive data contained within a query.
      # query-log-enabled = true
    
      # Provides more error checking. For example, SELECT INTO will err out inserting an +/-Inf value
      # rather than silently failing.
      # strict-error-handling = false
    
      # Validates incoming writes to ensure keys only have valid unicode characters.
      # This setting will incur a small overhead because every key must be checked.
      # validate-keys = false
    
      # Settings for the TSM engine
    
      # CacheMaxMemorySize is the maximum size a shard's cache can
      # reach before it starts rejecting writes.
      # Valid size suffixes are k, m, or g (case insensitive, 1024 = 1k).
      # Values without a size suffix are in bytes.
      # cache-max-memory-size = "1g"
    
      # CacheSnapshotMemorySize is the size at which the engine will
      # snapshot the cache and write it to a TSM file, freeing up memory
      # Valid size suffixes are k, m, or g (case insensitive, 1024 = 1k).
      # Values without a size suffix are in bytes.
      # cache-snapshot-memory-size = "25m"
    
      # CacheSnapshotWriteColdDuration is the length of time at
      # which the engine will snapshot the cache and write it to
      # a new TSM file if the shard hasn't received writes or deletes
      # cache-snapshot-write-cold-duration = "10m"
    
      # CompactFullWriteColdDuration is the duration at which the engine
      # will compact all TSM files in a shard if it hasn't received a
      # write or delete
      # compact-full-write-cold-duration = "4h"
    
      # The maximum number of concurrent full and level compactions that can run at one time.  A
      # value of 0 results in 50% of runtime.GOMAXPROCS(0) used at runtime.  Any number greater
      # than 0 limits compactions to that value.  This setting does not apply
      # to cache snapshotting.
      # max-concurrent-compactions = 0
    
      # CompactThroughput is the rate limit in bytes per second that we
      # will allow TSM compactions to write to disk. Note that short bursts are allowed
      # to happen at a possibly larger value, set by CompactThroughputBurst
      # compact-throughput = "48m"
    
      # CompactThroughputBurst is the rate limit in bytes per second that we
      # will allow TSM compactions to write to disk.
      # compact-throughput-burst = "48m"
    
      # If true, then the mmap advise value MADV_WILLNEED will be provided to the kernel with respect to
      # TSM files. This setting has been found to be problematic on some kernels, and defaults to off.
      # It might help users who have slow disks in some cases.
      # tsm-use-madv-willneed = false
    
      # Settings for the inmem index
    
      # The maximum series allowed per database before writes are dropped.  This limit can prevent
      # high cardinality issues at the database level.  This limit can be disabled by setting it to
      # 0.
      # max-series-per-database = 1000000
    
      # The maximum number of tag values per tag that are allowed before writes are dropped.  This limit
      # can prevent high cardinality tag values from being written to a measurement.  This limit can be
      # disabled by setting it to 0.
      # max-values-per-tag = 100000
    
      # Settings for the tsi1 index
    
      # The threshold, in bytes, when an index write-ahead log file will compact
      # into an index file. Lower sizes will cause log files to be compacted more
      # quickly and result in lower heap usage at the expense of write throughput.
      # Higher sizes will be compacted less frequently, store more series in-memory,
      # and provide higher write throughput.
      # Valid size suffixes are k, m, or g (case insensitive, 1024 = 1k).
      # Values without a size suffix are in bytes.
      # max-index-log-file-size = "1m"
    
      # The size of the internal cache used in the TSI index to store previously 
      # calculated series results. Cached results will be returned quickly from the cache rather
      # than needing to be recalculated when a subsequent query with a matching tag key/value 
      # predicate is executed. Setting this value to 0 will disable the cache, which may
      # lead to query performance issues.
      # This value should only be increased if it is known that the set of regularly used 
      # tag key/value predicates across all measurements for a database is larger than 100. An
      # increase in cache size may lead to an increase in heap usage.
      series-id-set-cache-size = 100
    
    ###
    ### [coordinator]
    ###
    ### Controls the clustering service configuration.
    ###
    
    [coordinator]
      # The default time a write request will wait until a "timeout" error is returned to the caller.
      # write-timeout = "10s"
    
      # The maximum number of concurrent queries allowed to be executing at one time.  If a query is
      # executed and exceeds this limit, an error is returned to the caller.  This limit can be disabled
      # by setting it to 0.
      # max-concurrent-queries = 0
    
      # The maximum time a query is allowed to execute before being killed by the system.  This limit
      # can help prevent run away queries.  Setting the value to 0 disables the limit.
      # query-timeout = "0s"
    
      # The time threshold when a query will be logged as a slow query.  This limit can be set to help
      # discover slow or resource intensive queries.  Setting the value to 0 disables the slow query logging.
      # log-queries-after = "0s"
    
      # The maximum number of points a SELECT can process.  A value of 0 will make
      # the maximum point count unlimited.  This will only be checked every second so queries will not
      # be aborted immediately when hitting the limit.
      # max-select-point = 0
    
      # The maximum number of series a SELECT can run.  A value of 0 will make the maximum series
      # count unlimited.
      # max-select-series = 0
    
      # The maximum number of group by time buckets a SELECT can create.  A value of zero will make the
      # maximum number of buckets unlimited.
      # max-select-buckets = 0
    
    ###
    ### [retention]
    ###
    ### Controls the enforcement of retention policies for evicting old data.
    ###
    
    [retention]
      # Determines whether retention policy enforcement enabled.
      # enabled = true
    
      # The interval of time when retention policy enforcement checks run.
      # check-interval = "30m"
    
    ###
    ### [shard-precreation]
    ###
    ### Controls the precreation of shards, so they are available before data arrives.
    ### Only shards that, after creation, will have both a start- and end-time in the
    ### future, will ever be created. Shards are never precreated that would be wholly
    ### or partially in the past.
    
    [shard-precreation]
      # Determines whether shard pre-creation service is enabled.
      # enabled = true
    
      # The interval of time when the check to pre-create new shards runs.
      # check-interval = "10m"
    
      # The default period ahead of the endtime of a shard group that its successor
      # group is created.
      # advance-period = "30m"
    
    ###
    ### [monitor]
    ###
    ### Controls the system self-monitoring, statistics and diagnostics.
    ###
    ### The internal database for monitoring data is created automatically if
    ### it does not already exist. The target retention within this database
    ### is called 'monitor' and is also created with a retention period of 7 days
    ### and a replication factor of 1, if it does not exist. In all cases
    ### this retention policy is configured as the default for the database.
    
    [monitor]
      # Whether to record statistics internally.
      # store-enabled = true
    
      # The destination database for recorded statistics
      # store-database = "_internal"
    
      # The interval at which to record statistics
      # store-interval = "10s"
    
    ###
    ### [http]
    ###
    ### Controls how the HTTP endpoints are configured. These are the primary
    ### mechanism for getting data into and out of InfluxDB.
    ###
    
    [http]
      # Determines whether HTTP endpoint is enabled.
      # enabled = true
    
      # Determines whether the Flux query endpoint is enabled.
      # flux-enabled = false
    
      # Determines whether the Flux query logging is enabled.
      # flux-log-enabled = false
    
      # The bind address used by the HTTP service.
      # bind-address = ":8086"
    
      # Determines whether user authentication is enabled over HTTP/HTTPS.
      # auth-enabled = false
      auth-enabled = true
    
      # The default realm sent back when issuing a basic auth challenge.
      # realm = "InfluxDB"
    
      # Determines whether HTTP request logging is enabled.
      # log-enabled = true
    
      # Determines whether the HTTP write request logs should be suppressed when the log is enabled.
      # suppress-write-log = false
    
      # When HTTP request logging is enabled, this option specifies the path where
      # log entries should be written. If unspecified, the default is to write to stderr, which
      # intermingles HTTP logs with internal InfluxDB logging.
      #
      # If influxd is unable to access the specified path, it will log an error and fall back to writing
      # the request log to stderr.
      access-log-path = "/var/log/influxdb/influxdb-access.log"
    
      # Filters which requests should be logged. Each filter is of the pattern NNN, NNX, or NXX where N is
      # a number and X is a wildcard for any number. To filter all 5xx responses, use the string 5xx.
      # If multiple filters are used, then only one has to match. The default is to have no filters which
      # will cause every request to be printed.
      # access-log-status-filters = []
    
      # Determines whether detailed write logging is enabled.
      # write-tracing = false
    
      # Determines whether the pprof endpoint is enabled.  This endpoint is used for
      # troubleshooting and monitoring.
      # pprof-enabled = true
    
      # Enables authentication on pprof endpoints. Users will need admin permissions
      # to access the pprof endpoints when this setting is enabled. This setting has
      # no effect if either auth-enabled or pprof-enabled are set to false.
      # pprof-auth-enabled = false
    
      # Enables a pprof endpoint that binds to localhost:6060 immediately on startup.
      # This is only needed to debug startup issues.
      # debug-pprof-enabled = false
    
      # Enables authentication on the /ping, /metrics, and deprecated /status
      # endpoints. This setting has no effect if auth-enabled is set to false.
      # ping-auth-enabled = false
    
      # Determines whether HTTPS is enabled.
      # https-enabled = false
    
      # The SSL certificate to use when HTTPS is enabled.
      # https-certificate = "/etc/ssl/influxdb.pem"
    
      # Use a separate private key location.
      # https-private-key = ""
    
      # The JWT auth shared secret to validate requests using JSON web tokens.
      # shared-secret = ""
    
      # The default chunk size for result sets that should be chunked.
      # max-row-limit = 0
    
      # The maximum number of HTTP connections that may be open at once.  New connections that
      # would exceed this limit are dropped.  Setting this value to 0 disables the limit.
      # max-connection-limit = 0
    
      # Enable http service over unix domain socket
      # unix-socket-enabled = false
    
      # The path of the unix domain socket.
      # bind-socket = "/var/run/influxdb.sock"
    
      # The maximum size of a client request body, in bytes. Setting this value to 0 disables the limit.
      # max-body-size = 25000000
    
      # The maximum number of writes processed concurrently.
      # Setting this to 0 disables the limit.
      # max-concurrent-write-limit = 0
    
      # The maximum number of writes queued for processing.
      # Setting this to 0 disables the limit.
      # max-enqueued-write-limit = 0
    
      # The maximum duration for a write to wait in the queue to be processed.
      # Setting this to 0 or setting max-concurrent-write-limit to 0 disables the limit.
      # enqueued-write-timeout = 0
    
      # User supplied HTTP response headers
      #
      # [http.headers]
      #   X-Header-1 = "Header Value 1"
      #   X-Header-2 = "Header Value 2"
    
    ###
    ### [logging]
    ###
    ### Controls how the logger emits logs to the output.
    ###
    
    [logging]
      # Determines which log encoder to use for logs. Available options
      # are auto, logfmt, and json. auto will use a more user-friendly
      # output format if the output terminal is a TTY, but the format is not as
      # easily machine-readable. When the output is a non-TTY, auto will use
      # logfmt.
      # format = "auto"
    
      # Determines which level of logs will be emitted. The available levels
      # are error, warn, info, and debug. Logs that are equal to or above the
      # specified level will be emitted.
      # level = "info"
    
      # Suppresses the logo output that is printed when the program is started.
      # The logo is always suppressed if STDOUT is not a TTY.
      # suppress-logo = false
    
    ###
    ### [subscriber]
    ###
    ### Controls the subscriptions, which can be used to fork a copy of all data
    ### received by the InfluxDB host.
    ###
    
    [subscriber]
      # Determines whether the subscriber service is enabled.
      # enabled = true
    
      # The default timeout for HTTP writes to subscribers.
      # http-timeout = "30s"
    
      # Allows insecure HTTPS connections to subscribers.  This is useful when testing with self-
      # signed certificates.
      # insecure-skip-verify = false
    
      # The path to the PEM encoded CA certs file. If the empty string, the default system certs will be used
      # ca-certs = ""
    
      # The number of writer goroutines processing the write channel.
      # write-concurrency = 40
    
      # The number of in-flight writes buffered in the write channel.
      # write-buffer-size = 1000
    
    
    ###
    ### [[graphite]]
    ###
    ### Controls one or many listeners for Graphite data.
    ###
    
    [[graphite]]
      # Determines whether the graphite endpoint is enabled.
      # enabled = false
      # database = "graphite"
      # retention-policy = ""
      # bind-address = ":2003"
      # protocol = "tcp"
      # consistency-level = "one"
    
      # These next lines control how batching works. You should have this enabled
      # otherwise you could get dropped metrics or poor performance. Batching
      # will buffer points in memory if you have many coming in.
    
      # Flush if this many points get buffered
      # batch-size = 5000
    
      # number of batches that may be pending in memory
      # batch-pending = 10
    
      # Flush at least this often even if we haven't hit buffer limit
      # batch-timeout = "1s"
    
      # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
      # udp-read-buffer = 0
    
      ### This string joins multiple matching 'measurement' values providing more control over the final measurement name.
      # separator = "."
    
      ### Default tags that will be added to all metrics.  These can be overridden at the template level
      ### or by tags extracted from metric
      # tags = ["region=us-east", "zone=1c"]
    
      ### Each template line requires a template pattern.  It can have an optional
      ### filter before the template and separated by spaces.  It can also have optional extra
      ### tags following the template.  Multiple tags should be separated by commas and no spaces
      ### similar to the line protocol format.  There can be only one default template.
      # templates = [
      #   "*.app env.service.resource.measurement",
      #   # Default template
      #   "server.*",
      # ]
    
    ###
    ### [collectd]
    ###
    ### Controls one or many listeners for collectd data.
    ###
    
    [[collectd]]
      # enabled = false
      # bind-address = ":25826"
      # database = "collectd"
      # retention-policy = ""
      #
      # The collectd service supports either scanning a directory for multiple types
      # db files, or specifying a single db file.
      # typesdb = "/usr/local/share/collectd"
      #
      # security-level = "none"
      # auth-file = "/etc/collectd/auth_file"
    
      # These next lines control how batching works. You should have this enabled
      # otherwise you could get dropped metrics or poor performance. Batching
      # will buffer points in memory if you have many coming in.
    
      # Flush if this many points get buffered
      # batch-size = 5000
    
      # Number of batches that may be pending in memory
      # batch-pending = 10
    
      # Flush at least this often even if we haven't hit buffer limit
      # batch-timeout = "10s"
    
      # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
      # read-buffer = 0
    
      # Multi-value plugins can be handled two ways.
      # "split" will parse and store the multi-value plugin data into separate measurements
      # "join" will parse and store the multi-value plugin as a single multi-value measurement.
      # "split" is the default behavior for backward compatibility with previous versions of influxdb.
      # parse-multivalue-plugin = "split"
    ###
    ### [opentsdb]
    ###
    ### Controls one or many listeners for OpenTSDB data.
    ###
    
    [[opentsdb]]
      # enabled = false
      # bind-address = ":4242"
      # database = "opentsdb"
      # retention-policy = ""
      # consistency-level = "one"
      # tls-enabled = false
      # certificate= "/etc/ssl/influxdb.pem"
    
      # Log an error for every malformed point.
      # log-point-errors = true
    
      # These next lines control how batching works. You should have this enabled
      # otherwise you could get dropped metrics or poor performance. Only metrics
      # received over the telnet protocol undergo batching.
    
      # Flush if this many points get buffered
      # batch-size = 1000
    
      # Number of batches that may be pending in memory
      # batch-pending = 5
    
      # Flush at least this often even if we haven't hit buffer limit
      # batch-timeout = "1s"
    
    ###
    ### [[udp]]
    ###
    ### Controls the listeners for InfluxDB line protocol data via UDP.
    ###
    
    [[udp]]
      # enabled = false
      # bind-address = ":8089"
      # database = "udp"
      # retention-policy = ""
    
      # InfluxDB precision for timestamps on received points ("" or "n", "u", "ms", "s", "m", "h")
      # precision = ""
    
      # These next lines control how batching works. You should have this enabled
      # otherwise you could get dropped metrics or poor performance. Batching
      # will buffer points in memory if you have many coming in.
    
      # Flush if this many points get buffered
      # batch-size = 5000
    
      # Number of batches that may be pending in memory
      # batch-pending = 10
    
      # Will flush at least this often even if we haven't hit buffer limit
      # batch-timeout = "1s"
    
      # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
      # read-buffer = 0
    
    ###
    ### [continuous_queries]
    ###
    ### Controls how continuous queries are run within InfluxDB.
    ###
    
    [continuous_queries]
      # Determines whether the continuous query service is enabled.
      # enabled = true
    
      # Controls whether queries are logged when executed by the CQ service.
      # log-enabled = true
    
      # Controls whether queries are logged to the self-monitoring data store.
      # query-stats-enabled = false
    
      # interval for how often continuous queries will be checked if they need to run
      # run-interval = "1s"
    
    ###
    ### [tls]
    ###
    ### Global configuration settings for TLS in InfluxDB.
    ###
    
    [tls]
      # Determines the available set of cipher suites. See https://golang.org/pkg/crypto/tls/#pkg-constants
      # for a list of available ciphers, which depends on the version of Go (use the query
      # SHOW DIAGNOSTICS to see the version of Go used to build InfluxDB). If not specified, uses
      # the default settings from Go's crypto/tls package.
      # ciphers = [
      #   "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305",
      #   "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305",
      # ]
    
      # Minimum version of the tls protocol that will be negotiated. If not specified, uses the
      # default settings from Go's crypto/tls package.
      # min-version = "tls1.2"
    
      # Maximum version of the tls protocol that will be negotiated. If not specified, uses the
      # default settings from Go's crypto/tls package.
      # max-version = "tls1.3"
    
    
    The Prometheus database can be created this way:

    docker exec -it <container-name> ./run-basic.sh

    • run-basic.sh
    #!/bin/bash
    # Admin credentials (placeholders); passed to the influx CLI below
    user=xxx
    passwd=***
    
    # Give InfluxDB a few seconds to finish starting up
    sleep 5
    /usr/bin/influx -username "$user" -password "$passwd" <<EOF
    create database prometheus;
    exit
    EOF
    

    五、Install alertmanager

    • docker-compose.yml
    version: "3.5"
    services:
      alertmanager:
        image: docker.io/prom/alertmanager:latest
        restart: always
        container_name: alertmanager
        networks:
          - proxy
        ports:
          - 9093:9093
        volumes:
          - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
          - ./storage:/data/alertmanager/storage
          - ./template:/etc/alertmanager/template
    
    networks:
      proxy:
        external: true
    
    Email alert template
    • ./template/email.tmpl
    {{ define "email.from" }}xxx@163.com{{ end }}
    {{ define "email.to" }}xxx@qq.com{{ end }}
    {{ define "email.to.html" }}
    {{ range .Alerts }}
    =========start==========<br>
    Alerting program: prometheus_alert <br>
    Severity: {{ .Labels.severity }} <br>
    Alert type: {{ .Labels.alertname }} <br>
    Host: {{ .Labels.instance }} <br>
    Summary: {{ .Annotations.summary }} <br>
    Description: {{ .Annotations.description }} <br>
    Fired at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
    =========end==========<br>
    {{ end }}
    {{ end }}
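    The fields the template reads from each alert can be illustrated with a plain-Python sketch. This is ordinary string formatting, not Alertmanager's Go templating, and the sample alert values are invented; it only shows the shape of the per-alert data (labels, annotations, start time) that the template iterates over:

    ```python
    # Sketch: render the same fields the email template reads from each alert.
    # The dict mirrors Alertmanager's per-alert data model (labels/annotations).
    sample_alert = {
        "labels": {"severity": "critical", "alertname": "HostDown", "instance": "10.0.0.5:9100"},
        "annotations": {"summary": "host down", "description": "node_exporter target unreachable"},
        "startsAt": "2021-09-01 11:55:00",
    }

    def render(alert):
        return (
            "=========start==========\n"
            f"Severity: {alert['labels']['severity']}\n"
            f"Alert type: {alert['labels']['alertname']}\n"
            f"Host: {alert['labels']['instance']}\n"
            f"Summary: {alert['annotations']['summary']}\n"
            f"Description: {alert['annotations']['description']}\n"
            f"Fired at: {alert['startsAt']}\n"
            "=========end=========="
        )

    print(render(sample_alert))
    ```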
    
    
    Slack alert configuration
    • alertmanager.yml.slack
    global:
      slack_api_url: https://xxxxxxxx
    templates:
      - './template/*.tmpl'
    
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1m
      receiver: 'slack'
    
    receivers:
    - name: 'slack'
      slack_configs:
        - channel: '#告警'
          send_resolved: true
          text: "{{ range .Alerts }} {{ .Annotations.description}}\n {{end}} @{{ .CommonAnnotations.username}} <{{.CommonAnnotations.link}}| click here>"
          title: "{{.CommonAnnotations.summary}}"
          title_link: "{{.CommonAnnotations.link}}"
    
    Email alert configuration
    • alertmanager.yml.email
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.163.com:465'
      smtp_from: 'xxx@163.com'
      smtp_auth_username: 'xxx@163.com'
      smtp_auth_password: 'auth-code'   # the 163.com SMTP authorization code, not the login password
      smtp_require_tls: false
    
    templates:
      - './template/*.tmpl'
    
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 59s
      receiver: 'mail'
    receivers:
    - name: 'mail'
      email_configs:
        - to: '{{ template "email.to" . }}'
          html: '{{ template "email.to.html" . }}'
          send_resolved: true
    
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
    

    Parameter notes:

    resolve_timeout: how long to wait after an alert stops firing before declaring it resolved

    receivers: defines who receives alerts

    name: an alias for the receiver, referenced elsewhere in the configuration

    route: alerts enter here and are matched against routing policies to decide where they are sent

    receiver: the top-level (default) receiver, used when an incoming alert matches none of the child routes

    group_by: the labels used to group alerts together

    group_wait: how long to wait before sending the first notification for a new group

    group_interval: how long to wait before sending a notification about new alerts added to an existing group

    repeat_interval: how long to wait before re-sending a notification that has already been sent
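    How group_wait, group_interval, and repeat_interval interact can be sketched with a few lines of arithmetic. This is a simplification of Alertmanager's actual dispatch loop, with invented timings in seconds:

    ```python
    group_wait = 10        # delay before the first notification of a new group
    group_interval = 10    # gap before notifying about new alerts in an existing group
    repeat_interval = 60   # gap before re-sending an unchanged group

    t_first_alert = 0
    t_first_notification = t_first_alert + group_wait               # 10

    # A second alert with the same group labels arrives at t=12; it must wait
    # for group_interval to elapse since the last notification:
    t_second_alert = 12
    t_update_notification = max(t_second_alert,
                                t_first_notification + group_interval)  # 20

    # Nothing else changes, so the group is simply repeated later:
    t_repeat = t_update_notification + repeat_interval              # 80

    print(t_first_notification, t_update_notification, t_repeat)
    ```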

    inhibit_rules: inhibition rules, which suppress target alerts while a matching source alert is firing. For example, when a host goes down it may also trigger alerts for the services, databases, and middleware running on it; if those follow-on alerts add no value, an inhibition rule lets only the host-down alert go out

    source_match: matches the source alert by label

    target_match: matches the target alert by label

    equal: the labels in this set must have equal values in the source and target alerts. If a label in the set is absent from both, the target alert is still inhibited
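    The inhibition semantics can be sketched in a few lines (a simplification of Alertmanager's matcher logic; the sample alerts are invented, and only exact label matching is modeled, not regex matchers):

    ```python
    def inhibits(source, target, source_match, target_match, equal):
        """Return True if `source` suppresses `target` under one inhibit rule."""
        if any(source.get(k) != v for k, v in source_match.items()):
            return False
        if any(target.get(k) != v for k, v in target_match.items()):
            return False
        # Labels in `equal` must carry the same value; absent on both sides counts.
        return all(source.get(k) == target.get(k) for k in equal)

    rule = {
        "source_match": {"severity": "critical"},
        "target_match": {"severity": "warning"},
        "equal": ["alertname", "dev", "instance"],
    }

    host_down = {"severity": "critical", "alertname": "HostDown", "instance": "10.0.0.5"}
    svc_warn  = {"severity": "warning",  "alertname": "HostDown", "instance": "10.0.0.5"}
    other     = {"severity": "warning",  "alertname": "HostDown", "instance": "10.0.0.9"}

    print(inhibits(host_down, svc_warn, **rule))  # same instance -> suppressed
    print(inhibits(host_down, other, **rule))     # different instance -> kept
    ```

    Note that neither sample alert carries a `dev` label, yet the first target is still inhibited, matching the "absent from both" behavior described above.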

    • Monitoring rules

    CPU usage:
    100 - avg(rate(node_cpu_seconds_total{job="agent",mode="idle"}[2m])) by (instance) * 100

    Memory usage, rule 1:
    ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes) * 100

    Memory usage, rule 2:
    (1 - (node_memory_MemAvailable_bytes{job="agent"} / node_memory_MemTotal_bytes{job="agent"})) * 100

    Disk usage (root partition), rule 1:
    (node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_free_bytes{mountpoint="/"}) / node_filesystem_size_bytes{mountpoint="/"} * 100

    Disk usage (root partition), rule 2:
    (node_filesystem_size_bytes{job='agent',fstype=~"ext.*|xfs",mountpoint="/"} - node_filesystem_free_bytes{job='agent',fstype=~"ext.*|xfs",mountpoint="/"}) * 100 / (node_filesystem_avail_bytes{job='agent',fstype=~"ext.*|xfs",mountpoint="/"} + (node_filesystem_size_bytes{job='agent',fstype=~"ext.*|xfs",mountpoint="/"} - node_filesystem_free_bytes{job='agent',fstype=~"ext.*|xfs",mountpoint="/"}))
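    The arithmetic behind these expressions can be checked with plain numbers (the byte counts below are invented; PromQL evaluates the same arithmetic per time series):

    ```python
    # Sample node_exporter gauge values (invented, in bytes).
    mem_total = 16 * 1024**3
    mem_available = 4 * 1024**3

    # Memory rule 1 and rule 2 compute the same quantity algebraically:
    usage1 = (mem_total - mem_available) / mem_total * 100
    usage2 = (1 - mem_available / mem_total) * 100
    print(usage1, usage2)        # both 75.0

    # Disk rule 1: used / size for the root filesystem.
    fs_size, fs_free = 100 * 1024**3, 30 * 1024**3
    disk_usage = (fs_size - fs_free) / fs_size * 100
    print(disk_usage)            # 70.0

    # Disk rule 2 uses avail (space usable by non-root users) in the
    # denominator, so it reports used / (avail + used) rather than used / size,
    # which excludes the filesystem's reserved blocks.
    fs_avail = 25 * 1024**3      # avail <= free because of reserved blocks
    used = fs_size - fs_free
    disk_usage2 = used * 100 / (fs_avail + used)
    print(round(disk_usage2, 1))  # 73.7
    ```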

    六、Usage

    • Browse to ip:port (Grafana listens on 3000) and log in; the default username and password are both admin
    • Configure Prometheus as a data source

      (figure: configuring the Prometheus data source)
    • Import a dashboard template

      (figure: importing a dashboard template)
    • Select the data source

      (figure: selecting the data source)
    • Finally, the monitoring dashboard is ready to view

      (figure: system monitoring data)

    Title: docker部署服务器监控

    Link: https://www.haomeiwen.com/subject/orexwltx.html