Operational logging and monitoring

Operational logging and monitoring overview

HeartAI instances of Red Hat OpenShift are integrated with cloud-native operational logging, monitoring, and observability software. Prometheus collects metrics to monitor systems and services. Alertmanager handles the alerts that Prometheus raises, grouping and routing them to notification receivers to implement event-triggered system behaviour. Grafana provides a real-time observability solution, with adaptors for a range of data and metrics providers, allowing this data to be processed and visualised in reporting and user application dashboards.

OpenShift monitoring

OpenShift provides real-time system monitoring and logging with Grafana as a real-time observability solution, Prometheus for monitoring systems and services, and Alertmanager for event-triggered system behaviour.
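A client can also read these metrics programmatically. The following minimal Python sketch queries the Prometheus HTTP API (`/api/v1/query`) and flattens an instant-vector response. The base URL and token are assumptions. On OpenShift, the cluster monitoring stack is typically reached through the `thanos-querier` route in the `openshift-monitoring` namespace with a bearer token (for example from `oc whoami -t`). The exact endpoint and permissions depend on the cluster configuration.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_vector(response: dict) -> list[tuple[dict, float]]:
    """Flatten an instant-vector response into (labels, value) pairs."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample value is a [unix_timestamp, "string_value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def query(base_url: str, promql: str, token: str) -> list[tuple[dict, float]]:
    """Run an instant query with a bearer token (performs a network call)."""
    request = urllib.request.Request(
        build_query_url(base_url, promql),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as resp:
        return parse_vector(json.load(resp))
```

For example, `query(base_url, "sum(etcd_server_has_leader)", token)` would return a single `(labels, value)` pair, assuming that metric is scraped in the cluster.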

Red Hat OpenShift implementation

HeartAI orchestrates system services with the Kubernetes-based Red Hat OpenShift container platform. Further information about the HeartAI implementation of Red Hat OpenShift may be found in the following documentation:

OpenShift Monitoring for etcd

The following image shows the OpenShift console for monitoring the cluster etcd instances. The OpenShift console for monitoring these resources provides information for:

  • The number of active etcd replicas.
  • The remote procedure call (RPC) rate.
  • The number of active streams.
  • The etcd database size.
  • The duration of disk synchronisation.
  • Memory usage for active etcd replicas.
  • Client traffic in.
  • Client traffic out.
  • Peer traffic in.
  • Peer traffic out.
  • Total raft proposals.
  • Total leader elections per day.

openshift-console-cluster-monitoring-etcd.png
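The etcd panels above are driven by PromQL over etcd's own metrics. The sketch below lists expressions modelled on commonly used etcd dashboard panels. The metric names and the `5m` rate window are assumptions to verify against the targets scraped in a given cluster. The sketch also shows, in plain Python, how an increase is derived from raw counter samples when a process restart resets the counter. Unlike PromQL's `increase()`, it does not extrapolate to the window boundaries.

```python
# PromQL expressions modelled on common etcd dashboard panels.
# Assumption: these metric names match what the cluster actually scrapes.
ETCD_PANEL_QUERIES = {
    "active_replicas": "sum(etcd_server_has_leader)",
    "db_size_bytes": "etcd_mvcc_db_total_size_in_bytes",
    "disk_sync_p99_seconds": (
        "histogram_quantile(0.99, sum by (instance, le) "
        "(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])))"
    ),
    "client_traffic_in": "rate(etcd_network_client_grpc_received_bytes_total[5m])",
    "client_traffic_out": "rate(etcd_network_client_grpc_sent_bytes_total[5m])",
    "peer_traffic_in": "rate(etcd_network_peer_received_bytes_total[5m])",
    "peer_traffic_out": "rate(etcd_network_peer_sent_bytes_total[5m])",
    "leader_elections_per_day": "changes(etcd_server_leader_changes_seen_total[1d])",
}


def counter_increase(samples: list[tuple[float, float]]) -> float:
    """Total increase of a monotonic counter over (timestamp, value) samples."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A lower value means the process restarted and the counter began again at zero,
        # so the post-reset value itself is the increase for that step.
        total += cur - prev if cur >= prev else cur
    return total
```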

OpenShift Monitoring for cluster compute resources

The following image shows the OpenShift console for monitoring the compute resources of an OpenShift cluster instance. The OpenShift console for monitoring these resources provides information for:

  • Headlines: CPU utilisation
  • Headlines: CPU requests committed
  • Headlines: CPU requests limited
  • Headlines: Memory utilisation
  • Headlines: Memory requests committed
  • Headlines: Memory requests limited
  • CPU: CPU usage
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Requests by namespace
  • Network: Current network usage
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Average container bandwidth by namespace: Received
  • Network: Average container bandwidth by namespace: Transmitted
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

openshift-console-cluster-monitoring-compute.png
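The CPU headlines relate resource requests to cluster capacity. Upstream Kubernetes dashboards typically compute CPU requests commitment as summed container CPU requests divided by summed allocatable node CPU, for example from kube-state-metrics series. The sketch below reproduces that arithmetic from raw values. It assumes that CPU quantities use only the plain-number and millicore (`m`) forms of Kubernetes quantities.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" into cores.

    Assumption: only plain numbers and the millicore suffix "m" are handled.
    """
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def commitment(requested: list[str], allocatable_cores: list[float]) -> float:
    """Fraction of allocatable cluster CPU claimed by container requests."""
    return sum(parse_cpu(q) for q in requested) / sum(allocatable_cores)
```

A value above 1.0 would indicate more CPU requested than the nodes can allocate.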

OpenShift Monitoring for cluster networking resources

The following image shows the OpenShift console for monitoring the networking resources of an OpenShift cluster instance. The OpenShift console for monitoring these resources provides information for:

  • Bandwidth: Current rate of bytes received
  • Bandwidth: Current rate of bytes transmitted
  • Bandwidth: Current status
  • Bandwidth history: Receive bandwidth
  • Bandwidth history: Transmit bandwidth
  • Packets: Rate of received packets
  • Packets: Rate of transmitted packets
  • Errors: Rate of received packets dropped
  • Errors: Rate of transmitted packets dropped
  • Errors: Rate of TCP retransmits out of all sent segments
  • Errors: Rate of TCP SYN retransmits out of all retransmits

openshift-console-cluster-monitoring-networking.png
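The two TCP error ratios are derived from kernel counters that node-exporter exposes. Here these counters are assumed to be `node_netstat_Tcp_OutSegs`, `node_netstat_Tcp_RetransSegs`, and `node_netstat_TcpExt_TCPSynRetrans`. The following minimal sketch takes two snapshots of those counters, keyed by their short names:

```python
def retransmit_ratios(before: dict, after: dict) -> tuple[float, float]:
    """Return (retransmits / sent segments, SYN retransmits / retransmits).

    `before` and `after` are counter snapshots with the keys
    "OutSegs", "RetransSegs" and "TCPSynRetrans" (assumed names).
    """
    delta = {key: after[key] - before[key] for key in before}
    retrans_ratio = delta["RetransSegs"] / delta["OutSegs"] if delta["OutSegs"] else 0.0
    syn_ratio = delta["TCPSynRetrans"] / delta["RetransSegs"] if delta["RetransSegs"] else 0.0
    return retrans_ratio, syn_ratio
```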

OpenShift Monitoring for cluster USE Method

The following image shows the OpenShift console for monitoring the USE Method of an OpenShift cluster instance. The OpenShift console for monitoring these resources provides information for:

  • CPU utilisation
  • CPU saturation
  • Memory utilisation
  • Memory saturation
  • Network utilisation
  • Network saturation
  • Disk IO utilisation
  • Disk IO saturation
  • Disk space utilisation

openshift-console-cluster-monitoring-use-method.png
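Under the USE Method, CPU utilisation is the share of CPU time not spent idle. Saturation is commonly approximated by load average per core. The sketch below assumes per-node inputs equivalent to the idle and total CPU seconds from `node_cpu_seconds_total` over an interval, the 1-minute load (`node_load1`), and the core count. It aggregates across nodes before dividing, so larger nodes weigh proportionally more in the cluster figures.

```python
def cluster_cpu_use(nodes: list[dict]) -> tuple[float, float]:
    """Return (utilisation, saturation) for CPU across all nodes.

    Each node dict holds "idle_seconds" and "total_seconds" of CPU time over the
    same interval, plus "load1" and "cores" (assumed input shape).
    """
    idle = sum(n["idle_seconds"] for n in nodes)
    total = sum(n["total_seconds"] for n in nodes)
    load = sum(n["load1"] for n in nodes)
    cores = sum(n["cores"] for n in nodes)
    # Utilisation: busy fraction of CPU time. Saturation above 1.0 suggests run-queue pressure.
    return 1.0 - idle / total, load / cores
```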

OpenShift Monitoring for node USE Method

The following image shows the OpenShift console for monitoring the USE Method of an individual node within an OpenShift cluster instance. The OpenShift console for monitoring these resources provides information for:

  • CPU utilisation
  • CPU saturation
  • Memory utilisation
  • Memory saturation
  • Network utilisation
  • Network saturation
  • Disk IO utilisation
  • Disk IO saturation
  • Disk space utilisation

openshift-console-node-monitoring-use-method.png
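At the node level, the memory, disk IO, and disk space panels reduce to simple ratios. This sketch assumes inputs equivalent to the following node-exporter metrics: `node_memory_MemAvailable_bytes`, `node_memory_MemTotal_bytes`, `node_disk_io_time_seconds_total` (sampled twice over an interval), `node_filesystem_avail_bytes`, and `node_filesystem_size_bytes`.

```python
def memory_utilisation(mem_available: float, mem_total: float) -> float:
    """Share of memory not available to new workloads."""
    return 1.0 - mem_available / mem_total


def disk_io_utilisation(io_time_delta_s: float, elapsed_s: float) -> float:
    """Fraction of wall-clock time during which the device had I/O in progress."""
    return io_time_delta_s / elapsed_s


def disk_space_utilisation(avail_bytes: float, size_bytes: float) -> float:
    """Share of filesystem capacity that is not available."""
    return 1.0 - avail_bytes / size_bytes
```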

OpenShift Monitoring for cluster Prometheus resources

The following image shows the OpenShift console for monitoring Prometheus resources within an OpenShift cluster instance. The OpenShift console for monitoring these resources provides information for:

  • Prometheus stats
  • Discovery: Target sync
  • Discovery: Targets
  • Retrieval: Average scrape interval duration
  • Retrieval: Scrape failures
  • Retrieval: Appended samples
  • Storage: Head series
  • Storage: Head chunks
  • Query: Query rate
  • Query: Stage duration

openshift-console-cluster-monitoring-prometheus.png
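The discovery and retrieval panels summarise the state of Prometheus scrape targets. Prometheus also exposes this state through its `/api/v1/targets` endpoint. The following sketch counts active targets by health and lists failing scrapes from a response of that shape. Fetching the response is omitted. The sample fields follow the documented `activeTargets` format.

```python
from collections import Counter


def target_health(targets_response: dict) -> Counter:
    """Count active scrape targets by health ("up", "down" or "unknown")."""
    return Counter(t["health"] for t in targets_response["data"]["activeTargets"])


def scrape_failures(targets_response: dict) -> list[tuple[str, str]]:
    """List (scrapeUrl, lastError) for every target whose last scrape failed."""
    return [
        (t["scrapeUrl"], t["lastError"])
        for t in targets_response["data"]["activeTargets"]
        if t["health"] == "down"
    ]
```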

Grafana implementation

Grafana is a real-time observability solution. It provides adaptors for a range of data and metrics providers, allowing this data to be processed and visualised with a large selection of dashboard functionality. HeartAI instances of Red Hat OpenShift are natively integrated with Grafana. They are further supported by Prometheus for monitoring systems and services, and Alertmanager for event-triggered system behaviour.
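Dashboards of this kind are JSON documents. The following illustrative sketch builds a request body for Grafana's `POST /api/dashboards/db` endpoint, with one time-series panel per PromQL expression. It assumes a Grafana instance that accepts dashboard writes and has a default Prometheus data source. The Grafana bundled with OpenShift cluster monitoring may be read-only. The `timeseries` panel type assumes a recent Grafana version.

```python
import json


def dashboard_payload(title: str, panels: dict[str, str]) -> str:
    """Build a JSON body for POST /api/dashboards/db from {panel title: PromQL}."""
    return json.dumps({
        "dashboard": {
            "id": None,  # null id asks Grafana to create a new dashboard
            "title": title,
            "panels": [
                {
                    "id": index + 1,
                    "type": "timeseries",  # assumption: Grafana 8+ panel type
                    "title": panel_title,
                    # Stack panels vertically, each spanning the full 24-column width.
                    "gridPos": {"x": 0, "y": index * 8, "w": 24, "h": 8},
                    "targets": [{"expr": expr, "refId": "A"}],
                }
                for index, (panel_title, expr) in enumerate(panels.items())
            ],
        },
        "overwrite": False,
    })
```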

Grafana implementation

Further information about HeartAI instances of Grafana may be found in the following documentation:

Grafana for OpenShift cluster compute resources

The following example shows Grafana monitoring for cluster-level compute resources within a HeartAI instance of Red Hat OpenShift. The following compute metrics are collected and displayed:

  • Headlines: CPU utilisation
  • Headlines: CPU requests committed
  • Headlines: CPU requests limited
  • Headlines: Memory utilisation
  • Headlines: Memory requests committed
  • Headlines: Memory requests limited
  • CPU: CPU usage
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Requests by namespace
  • Network: Current network usage
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Average container bandwidth by namespace: Received
  • Network: Average container bandwidth by namespace: Transmitted
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

grafana-kubernetes-compute-resources-cluster.png

Grafana for OpenShift node-level pod compute resources

The following example shows Grafana monitoring for node-level pod compute resources within a HeartAI instance of Red Hat OpenShift. The following compute metrics are collected and displayed:

  • CPU usage
  • CPU quota
  • Memory usage
  • Memory quota

grafana-kubernetes-compute-resources-node-pods.png
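The CPU quota table sets each pod's usage against its requests and limits. The following minimal sketch builds one row of that table, assuming values already expressed in cores. A pod without a request or limit yields `None` for that column rather than a division by zero.

```python
from typing import Optional


def quota_row(usage_cores: float, request_cores: float,
              limit_cores: Optional[float]) -> dict:
    """One row of a CPU quota table: usage relative to request and limit, in percent."""
    return {
        "usage": usage_cores,
        "pct_of_request": usage_cores / request_cores * 100 if request_cores else None,
        "pct_of_limit": usage_cores / limit_cores * 100 if limit_cores else None,
    }
```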

Grafana for OpenShift namespace-level workloads

The following example shows Grafana monitoring for namespace-level workloads within a HeartAI instance of Red Hat OpenShift. The following workload metrics are collected and displayed:

  • CPU: CPU usage
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Memory quota
  • Network: Current network usage
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Average container bandwidth by workload: Received
  • Network: Average container bandwidth by workload: Transmitted
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

grafana-kubernetes-compute-resources-namespace-workloads.png

Grafana for OpenShift namespace-level pod compute resources

The following example shows Grafana monitoring for namespace-level pod compute resources within a HeartAI instance of Red Hat OpenShift. The following pod compute resource metrics are collected and displayed:

  • Headlines: CPU utilisation
  • Headlines: CPU requests committed
  • Headlines: CPU requests limited
  • Headlines: Memory utilisation
  • Headlines: Memory requests committed
  • Headlines: Memory requests limited
  • CPU: CPU usage
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Memory quota
  • Network: Current network usage
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

grafana-kubernetes-compute-resources-namespace-pods.png

Grafana for OpenShift workload-level pod compute resources

The following example shows Grafana monitoring for workload-level pod compute resources within a HeartAI instance of Red Hat OpenShift. The following workload pod compute resource metrics are collected and displayed:

  • CPU: CPU usage
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Memory quota
  • Network: Current network usage
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Average container bandwidth by pod: Received
  • Network: Average container bandwidth by pod: Transmitted
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

grafana-kubernetes-compute-resources-workload.png

Grafana for OpenShift pod-level compute resources

The following example shows Grafana monitoring for pod-level compute resources within a HeartAI instance of Red Hat OpenShift. The following compute resource metrics are collected and displayed:

  • CPU: CPU usage
  • CPU: CPU throttling
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Memory quota
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

grafana-kubernetes-compute-resources-pod.png
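CPU throttling reflects the Linux CFS quota that is enforced from a container's CPU limit. This sketch assumes a common measure: the fraction of enforcement periods in which the container was throttled. The inputs are cAdvisor's `container_cpu_cfs_throttled_periods_total` and `container_cpu_cfs_periods_total` counters, sampled at the start and end of an interval.

```python
def throttling_ratio(throttled_before: float, throttled_after: float,
                     periods_before: float, periods_after: float) -> float:
    """Fraction of CFS enforcement periods in which the container was throttled."""
    periods = periods_after - periods_before
    if periods <= 0:
        # No periods elapsed, or a counter reset: report no throttling.
        return 0.0
    return (throttled_after - throttled_before) / periods
```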

Grafana for OpenShift cluster networking

The following example shows Grafana monitoring for cluster-level networking within a HeartAI instance of Red Hat OpenShift. The following networking metrics are collected and displayed:

  • Bandwidth: Current rate of bytes received
  • Bandwidth: Current rate of bytes transmitted
  • Bandwidth: Current status
  • Bandwidth history: Receive bandwidth
  • Bandwidth history: Transmit bandwidth
  • Packets: Rate of received packets
  • Packets: Rate of transmitted packets
  • Errors: Rate of received packets dropped
  • Errors: Rate of transmitted packets dropped
  • Errors: Rate of TCP retransmits out of all sent segments
  • Errors: Rate of TCP SYN retransmits out of all retransmits

grafana-kubernetes-networking-cluster.png
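Cluster-level networking panels group per-pod rates by namespace. In PromQL this resembles `sum by (namespace) (rate(container_network_receive_bytes_total[5m]))`. The following Python sketch applies the same grouping to already-computed per-pod rates. The namespace and pod names in the test data are hypothetical.

```python
from collections import defaultdict


def bandwidth_by_namespace(samples: list[tuple[str, str, float]]) -> dict[str, float]:
    """Sum (namespace, pod, bytes_per_second) samples per namespace, largest first."""
    totals: defaultdict[str, float] = defaultdict(float)
    for namespace, _pod, bytes_per_second in samples:
        totals[namespace] += bytes_per_second
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```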

Grafana for OpenShift namespace-level pod networking

The following example shows Grafana monitoring for namespace-level pod networking within a HeartAI instance of Red Hat OpenShift. The following pod networking metrics are collected and displayed:

  • Bandwidth: Current rate of bytes received
  • Bandwidth: Current rate of bytes transmitted
  • Bandwidth: Current status
  • Bandwidth: Receive bandwidth
  • Bandwidth: Transmit bandwidth
  • Packets: Rate of received packets
  • Packets: Rate of transmitted packets
  • Errors: Rate of received packets dropped
  • Errors: Rate of transmitted packets dropped

grafana-kubernetes-networking-namespace-pods.png

Grafana for OpenShift pod-level networking

The following example shows Grafana monitoring for pod-level networking within a HeartAI instance of Red Hat OpenShift. The following networking metrics are collected and displayed:

  • Bandwidth: Current rate of bytes received
  • Bandwidth: Current rate of bytes transmitted
  • Bandwidth: Current status
  • Bandwidth: Receive bandwidth
  • Bandwidth: Transmit bandwidth
  • Packets: Rate of received packets
  • Packets: Rate of transmitted packets
  • Errors: Rate of received packets dropped
  • Errors: Rate of transmitted packets dropped

grafana-kubernetes-networking-pod.png

Grafana for OpenShift cluster USE Method

The following example shows cluster-level Grafana monitoring based on the Utilization, Saturation, and Errors (USE) Method within a HeartAI instance of Red Hat OpenShift. A variety of general resource metrics are collected and displayed:

  • CPU utilisation
  • CPU saturation
  • Memory utilisation
  • Memory saturation
  • Network utilisation
  • Network saturation
  • Disk IO utilisation
  • Disk IO saturation
  • Disk space utilisation

grafana-use-method-cluster.png

Grafana for OpenShift node-level USE Method

The following example shows node-level Grafana monitoring based on the Utilization, Saturation, and Errors (USE) Method within a HeartAI instance of Red Hat OpenShift. A variety of general resource metrics are collected and displayed:

  • CPU utilisation
  • CPU saturation
  • Memory utilisation
  • Memory saturation
  • Network utilisation
  • Network saturation
  • Disk IO utilisation
  • Disk IO saturation
  • Disk space utilisation

grafana-use-method-node.png

Grafana for OpenShift cluster etcd key-value store

The following example shows Grafana monitoring for the cluster-level integrated etcd key-value store within a HeartAI instance of Red Hat OpenShift. The following etcd metrics are collected and displayed:

  • RPC rate
  • Active streams
  • DB size
  • Disk sync duration
  • Memory
  • Client traffic in
  • Client traffic out
  • Peer traffic in
  • Peer traffic out
  • Raft proposals
  • Total leader elections per day

grafana-etcd.png

Grafana for OpenShift cluster Prometheus monitoring

The following example shows Grafana monitoring for the cluster-level integrated Prometheus monitoring solution within a HeartAI instance of Red Hat OpenShift. The following Prometheus metrics are collected and displayed:

  • Prometheus stats
  • Discovery: Target sync
  • Discovery: Targets
  • Retrieval: Average scrape interval duration
  • Retrieval: Scrape failures
  • Retrieval: Appended samples
  • Storage: Head series
  • Storage: Head chunks
  • Query: Query rate
  • Query: Stage duration

grafana-prometheus-overview.png

Red Hat OpenShift Logging implementation

Red Hat OpenShift provides integration support for logging and observability with the Red Hat OpenShift Logging (RHOL) framework. RHOL deploys instances of the following software:

  • Elasticsearch: distributed and high-performance search and analytics engine. Supports full-text and structured search. Allows indexing and search capabilities for large volumes of log and document data. Reference: https://www.elastic.co/elasticsearch/
  • Fluentd: pluggable and scalable log and data collector. Standardises upstream and downstream data integration. Reference: https://www.fluentd.org/
  • Kibana: robust data visualisation client application for Elasticsearch. Allows broadly customisable query functionality and corresponding visualisation capabilities. Supports operational and real-time observability. Reference: https://www.elastic.co/kibana/

Together, these software components are often referred to as the EFK stack. The composition of these technologies provides powerful and extensible mechanisms for logging and observability, including:

  • Broad support for log consumption, including native support for a variety of operational and software interfaces.
  • High-performance indexed search and retrieval of log data.
  • Visualisation and observability of log data and associated metrics.
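Log data in this stack can also be queried directly. The following sketch builds an Elasticsearch Query DSL body that selects recent log records for one namespace. The field names `kubernetes.namespace_name` and `@timestamp` are assumptions based on the ViaQ data model used by OpenShift Logging, and should be checked against the index mapping. The body would be sent to the `_search` endpoint of the relevant indices.

```python
def namespace_log_query(namespace: str, minutes: int = 15, size: int = 50) -> dict:
    """Query DSL body for the newest log records from one namespace."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Filter context: exact matches and ranges without relevance scoring.
                "filter": [
                    {"term": {"kubernetes.namespace_name": namespace}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
    }
```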

Kibana UI for log discovery

The following image shows the Kibana web interface for log discovery, providing log aggregation and observability approaches for available log data. The Kibana log discovery web interface contains a variety of functionalities for querying, processing, and visualising log data, including:

  • Real-time collection and analysis of log data from backing Elasticsearch instances.
  • In-built support for log data querying and processing, allowing the creation of log data reporting and visualisation pipelines.
  • Native support for Red Hat OpenShift instances, with the following default log fields specified:
    • Kubernetes namespace name.
    • Kubernetes namespace ID.
    • Kubernetes pod name.
    • Kubernetes pod hostname.
    • Kubernetes container name.
    • Kubernetes container ID.
    • Log message ID.
    • Log message.
    • Log timestamp.
    • Received timestamp.
  • The ability to save the search as a template, with support to export and import to other Kibana instances.

heartai-kibana-discover.png
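The default Discover columns correspond to nested fields in each log document. The following minimal sketch resolves dotted field names into a flat row. It assumes ViaQ-style names for a subset of the fields listed above. The exact names vary by logging version, particularly for IDs and hostnames.

```python
# Assumed ViaQ-style names for a subset of the default Discover fields.
DISCOVER_COLUMNS = [
    "kubernetes.namespace_name",
    "kubernetes.pod_name",
    "kubernetes.container_name",
    "message",
    "@timestamp",
]


def get_field(document: dict, dotted_path: str):
    """Resolve a dotted field name in a nested log document, or return None."""
    value = document
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def to_row(document: dict) -> dict:
    """Flatten one log document into the Discover-style columns above."""
    return {column: get_field(document, column) for column in DISCOVER_COLUMNS}
```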

Kibana UI for namespace-level log visualisation

The following image shows the Kibana web interface for log data visualisation from the HeartAI HIB interface service namespace, providing log aggregation and visualisation approaches for corresponding log data. The Kibana log data visualisation web interface contains a variety of functionalities for querying, processing, and visualising log data, including:

  • Real-time collection and analysis of log data from backing Elasticsearch instances.
  • In-built support for log data querying and processing, allowing the creation of log data visualisation pipelines.
  • A visualisation of log data from the HeartAI HIB interface service namespace, including:
    • Moving average of log count over time.
    • Time aggregation interval of 10 minutes.
  • The ability to save the visualisation as a template, with support to export and import to other Kibana instances.

heartai-kibana-visualize.png
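The moving average visualisation first counts log records per 10-minute interval and then smooths those counts. The following Python sketch shows that arithmetic with a trailing window. It illustrates the calculation and is not Kibana's implementation, which may position or size its window differently. The sketch also omits intervals that contain no records.

```python
from collections import Counter


def bucket_counts(timestamps: list[float], interval_s: int = 600) -> list[tuple[int, int]]:
    """Count events per fixed interval (600 s matches the 10-minute interval above)."""
    counts = Counter(int(ts // interval_s) * interval_s for ts in timestamps)
    return sorted(counts.items())


def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing mean over up to `window` points, including the current point."""
    averages = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        averages.append(sum(recent) / len(recent))
    return averages
```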

Kibana UI for namespace-level log dashboard

The following image shows the Kibana web interface for log data dashboarding, which composes several log data visualisations into a comprehensive overview dashboard. The Kibana log data dashboarding web interface supports:

  • Real-time collection and analysis of log data from backing Elasticsearch instances.
  • Composition of several log data visualisations onto a single dashboard.
  • The ability to save the dashboard as a template, with support to export and import to other Kibana instances.

heartai-kibana-dashboard.png

Kibana UI for cluster-level log visualisation

The following image shows the Kibana web interface for log data visualisation for a HeartAI instance of a Red Hat OpenShift cluster, providing log aggregation and visualisation approaches for corresponding log data. The Kibana log data visualisation web interface contains a variety of functionalities for querying, processing, and visualising log data, including:

  • Real-time collection and analysis of log data from backing Elasticsearch instances.
  • In-built support for log data querying and processing, allowing the creation of log data visualisation pipelines.
  • A visualisation of log data from a HeartAI instance of Red Hat OpenShift:
    • Log count per cluster namespace.
  • The ability to save the visualisation as a template, with support to export and import to other Kibana instances.

heartai-kibana-visualize-cluster-namespace-log-counts.png
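Log count per namespace amounts to a terms-style aggregation over the namespace field. The following minimal sketch works over already-retrieved documents and assumes the ViaQ-style `kubernetes.namespace_name` field. Records without that field are grouped under a `(none)` placeholder.

```python
from collections import Counter


def log_counts_by_namespace(documents: list[dict]) -> list[tuple[str, int]]:
    """Count log documents per namespace, most frequent first."""
    counts = Counter(
        doc.get("kubernetes", {}).get("namespace_name", "(none)") for doc in documents
    )
    return counts.most_common()
```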