Grafana implementation

Grafana is a real-time observability solution. It provides adaptors for interfacing with data and metrics providers, and a large selection of dashboard functionality for processing and visualising that data. HeartAI instances of Red Hat OpenShift are natively integrated with Grafana, and are further supported by Prometheus for monitoring systems and services, and by Alertmanager for event-triggered system behaviour.

Red Hat OpenShift implementation

HeartAI orchestrates system services with the Kubernetes-based Red Hat OpenShift container platform. Further information about the HeartAI implementation of Red Hat OpenShift is available in the HeartAI documentation for Red Hat OpenShift.

Grafana for OpenShift cluster compute resources

The following example shows Grafana monitoring for cluster-level compute resources within a HeartAI instance of Red Hat OpenShift. The following compute metrics are collected and displayed:

  • Headlines: CPU utilisation
  • Headlines: CPU requests committed
  • Headlines: CPU requests limited
  • Headlines: Memory utilisation
  • Headlines: Memory requests committed
  • Headlines: Memory requests limited
  • CPU: CPU usage
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Requests by namespace
  • Network: Current network usage
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Average container bandwidth by namespace: Received
  • Network: Average container bandwidth by namespace: Transmitted
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

grafana-kubernetes-compute-resources-cluster.png
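
The headline figures listed above are ratios against cluster capacity. The following is a minimal sketch of that arithmetic, not the dashboard's actual queries. It assumes that utilisation, requests committed, and requests limited are each expressed as a fraction of allocatable capacity. The `ClusterCpu` type, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterCpu:
    """Hypothetical snapshot of cluster CPU figures, all in cores."""
    used: float         # cores currently consumed by all containers
    requested: float    # sum of container CPU requests
    limited: float      # sum of container CPU limits
    allocatable: float  # cores the nodes can hand out to pods


def headline_ratios(cpu: ClusterCpu) -> dict[str, float]:
    """Return each headline figure as a fraction of allocatable capacity.

    Assumption: the headline panels divide by allocatable capacity.
    """
    if cpu.allocatable <= 0:
        raise ValueError("allocatable capacity must be positive")
    return {
        "utilisation": cpu.used / cpu.allocatable,
        "requests_committed": cpu.requested / cpu.allocatable,
        "requests_limited": cpu.limited / cpu.allocatable,
    }


# Illustrative values only, not taken from a real cluster.
snapshot = ClusterCpu(used=12.0, requested=24.0, limited=60.0, allocatable=48.0)
ratios = headline_ratios(snapshot)
for name, value in ratios.items():
    print(f"{name}: {value:.0%}")
```

Under this reading, a limits figure above 100% means the sum of container limits exceeds what the nodes can provide, so the cluster is overcommitted on limits. The same arithmetic applies to the memory headlines, with bytes in place of cores.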

Grafana for OpenShift node-level pod compute resources

The following example shows Grafana monitoring for node-level pod compute resources within a HeartAI instance of Red Hat OpenShift. The following compute metrics are collected and displayed:

  • CPU usage
  • CPU quota
  • Memory usage
  • Memory quota

grafana-kubernetes-compute-resources-node-pods.png

Grafana for OpenShift namespace-level workloads

The following example shows Grafana monitoring for namespace-level workloads within a HeartAI instance of Red Hat OpenShift. The following workload metrics are collected and displayed:

  • CPU: CPU usage
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Memory quota
  • Network: Current network usage
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Average container bandwidth by workload: Received
  • Network: Average container bandwidth by workload: Transmitted
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

grafana-kubernetes-compute-resources-namespace-workloads.png

Grafana for OpenShift namespace-level pod compute resources

The following example shows Grafana monitoring for namespace-level pod compute resources within a HeartAI instance of Red Hat OpenShift. The following pod compute resource metrics are collected and displayed:

  • Headlines: CPU utilisation
  • Headlines: CPU requests committed
  • Headlines: CPU requests limited
  • Headlines: Memory utilisation
  • Headlines: Memory requests committed
  • Headlines: Memory requests limited
  • CPU: CPU usage
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Memory quota
  • Network: Current network usage
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

grafana-kubernetes-compute-resources-namespace-pods.png

Grafana for OpenShift workload-level pod compute resources

The following example shows Grafana monitoring for workload-level pod compute resources within a HeartAI instance of Red Hat OpenShift. The following workload pod compute resource metrics are collected and displayed:

  • CPU: CPU usage
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Memory quota
  • Network: Current network usage
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Average container bandwidth by pod: Received
  • Network: Average container bandwidth by pod: Transmitted
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

grafana-kubernetes-compute-resources-workload.png

Grafana for OpenShift pod-level compute resources

The following example shows Grafana monitoring for pod-level compute resources within a HeartAI instance of Red Hat OpenShift. The following compute resource metrics are collected and displayed:

  • CPU: CPU usage
  • CPU: CPU throttling
  • CPU: CPU quota
  • Memory: Memory usage
  • Memory: Memory quota
  • Network: Receive bandwidth
  • Network: Transmit bandwidth
  • Network: Rate of received packets
  • Network: Rate of transmitted packets
  • Network: Rate of received packets dropped
  • Network: Rate of transmitted packets dropped

grafana-kubernetes-compute-resources-pod.png
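
CPU throttling is the one metric in this list that does not appear at the higher levels. A common way to express it, assumed here rather than confirmed from the dashboard, is the share of scheduler enforcement periods in which a container hit its CPU limit. The function and parameter names below are hypothetical.

```python
def throttled_fraction(throttled_periods: int, total_periods: int) -> float:
    """Fraction of scheduling periods in which the container was throttled.

    Both arguments are increases observed over the same time window.
    """
    if throttled_periods < 0 or total_periods < 0:
        raise ValueError("period counts must be non-negative")
    if throttled_periods > total_periods:
        raise ValueError("throttled periods cannot exceed total periods")
    if total_periods == 0:
        # No enforcement periods elapsed, so nothing could be throttled.
        return 0.0
    return throttled_periods / total_periods


# Illustrative values only: throttled in 25 of 100 periods.
fraction = throttled_fraction(throttled_periods=25, total_periods=100)
print(f"throttled in {fraction:.0%} of periods")
```

A persistently high fraction suggests that the CPU limit for the container is too low for its workload.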

Grafana for OpenShift cluster networking

The following example shows Grafana monitoring for cluster-level networking within a HeartAI instance of Red Hat OpenShift. The following networking metrics are collected and displayed:

  • Bandwidth: Current rate of bytes received
  • Bandwidth: Current rate of bytes transmitted
  • Bandwidth: Current status
  • Bandwidth history: Receive bandwidth
  • Bandwidth history: Transmit bandwidth
  • Packets: Rate of received packets
  • Packets: Rate of transmitted packets
  • Errors: Rate of received packets dropped
  • Errors: Rate of transmitted packets dropped
  • Errors: Rate of TCP retransmits out of all sent segments
  • Errors: Rate of TCP SYN retransmits out of all retransmits

grafana-kubernetes-networking-cluster.png
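
The packet and error panels above show rates, which are derived from cumulative counters. The following is a simplified sketch of that derivation, assuming the counter only resets to zero on restart. It does not reproduce the extrapolation that the Prometheus `rate()` function applies at window boundaries. The function names and sample values are hypothetical.

```python
def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a cumulative counter.

    samples: (timestamp_seconds, counter_value) pairs in time order.
    A drop in value is treated as a reset to zero, so the value seen
    after the reset is counted as the increase for that step.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current  # counter reset between the two samples
    return increase / elapsed


def retransmit_ratio(retransmitted_segments: float, sent_segments: float) -> float:
    """TCP retransmits as a fraction of all sent segments."""
    if sent_segments <= 0:
        return 0.0
    return retransmitted_segments / sent_segments


# Illustrative values only; the counter resets between t=30 and t=60.
received_packets = [(0.0, 1000.0), (30.0, 1600.0), (60.0, 600.0), (90.0, 1200.0)]
packets_per_second = counter_rate(received_packets)
print(f"received: {packets_per_second:.1f} packets/s")
print(f"retransmits: {retransmit_ratio(30.0, 1500.0):.1%} of sent segments")
```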

Grafana for OpenShift namespace-level pod networking

The following example shows Grafana monitoring for namespace-level pod networking within a HeartAI instance of Red Hat OpenShift. The following pod networking metrics are collected and displayed:

  • Bandwidth: Current rate of bytes received
  • Bandwidth: Current rate of bytes transmitted
  • Bandwidth: Current status
  • Bandwidth: Receive bandwidth
  • Bandwidth: Transmit bandwidth
  • Packets: Rate of received packets
  • Packets: Rate of transmitted packets
  • Errors: Rate of received packets dropped
  • Errors: Rate of transmitted packets dropped

grafana-kubernetes-networking-namespace-pods.png

Grafana for OpenShift pod-level networking

The following example shows Grafana monitoring for pod-level networking within a HeartAI instance of Red Hat OpenShift. The following networking metrics are collected and displayed:

  • Bandwidth: Current rate of bytes received
  • Bandwidth: Current rate of bytes transmitted
  • Bandwidth: Current status
  • Bandwidth: Receive bandwidth
  • Bandwidth: Transmit bandwidth
  • Packets: Rate of received packets
  • Packets: Rate of transmitted packets
  • Errors: Rate of received packets dropped
  • Errors: Rate of transmitted packets dropped

grafana-kubernetes-networking-pod.png

Grafana for OpenShift cluster USE Method

The following example shows Grafana monitoring based on the cluster-level Utilization, Saturation, and Errors (USE) Method within a HeartAI instance of Red Hat OpenShift. The following general resource metrics are collected and displayed:

  • CPU utilisation
  • CPU saturation
  • Memory utilisation
  • Memory saturation
  • Network utilisation
  • Network saturation
  • Disk IO utilisation
  • Disk IO saturation
  • Disk space utilisation

grafana-use-method-cluster.png
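
The USE Method separates utilisation (how busy a resource is) from saturation (how much work is queued beyond what the resource can serve). The following sketch illustrates that distinction for CPU. It assumes utilisation is taken as the non-idle share of CPU time and saturation as the one-minute load average per CPU, which is one common choice and not necessarily what this dashboard uses. All names and values are hypothetical.

```python
def cpu_utilisation(idle_seconds: float, total_seconds: float) -> float:
    """Share of CPU time spent doing work, from 0.0 to 1.0."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 1.0 - idle_seconds / total_seconds


def cpu_saturation(load_average_1m: float, cpu_count: int) -> float:
    """Runnable work per CPU; values above 1.0 indicate queued work."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average_1m / cpu_count


# Illustrative values only: 45 s idle in a 60 s window, load of 6 on 4 CPUs.
utilisation = cpu_utilisation(idle_seconds=45.0, total_seconds=60.0)
saturation = cpu_saturation(load_average_1m=6.0, cpu_count=4)
print(f"utilisation: {utilisation:.0%}, saturation: {saturation:.2f}")
```

The two figures are read together: a resource can show moderate utilisation on average while still being saturated during short bursts.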

Grafana for OpenShift node-level USE Method

The following example shows Grafana monitoring based on the node-level Utilization, Saturation, and Errors (USE) Method within a HeartAI instance of Red Hat OpenShift. The following general resource metrics are collected and displayed:

  • CPU utilisation
  • CPU saturation
  • Memory utilisation
  • Memory saturation
  • Network utilisation
  • Network saturation
  • Disk IO utilisation
  • Disk IO saturation
  • Disk space utilisation

grafana-use-method-node.png

Grafana for OpenShift cluster etcd key-value store

The following example shows Grafana monitoring for the cluster-level integrated etcd key-value store within a HeartAI instance of Red Hat OpenShift. The following etcd metrics are collected and displayed:

  • RPC rate
  • Active streams
  • DB size
  • Disk sync duration
  • Memory
  • Client traffic in
  • Client traffic out
  • Peer traffic in
  • Peer traffic out
  • Raft proposals
  • Total leader elections per day

grafana-etcd.png

Grafana for OpenShift cluster Prometheus monitoring

The following example shows Grafana monitoring for the cluster-level integrated Prometheus monitoring solution within a HeartAI instance of Red Hat OpenShift. The following Prometheus metrics are collected and displayed:

  • Prometheus stats
  • Discovery: Target sync
  • Discovery: Targets
  • Retrieval: Average scrape interval duration
  • Retrieval: Scrape failures
  • Retrieval: Appended samples
  • Storage: Head series
  • Storage: Head chunks
  • Query: Query rate
  • Query: Stage duration

grafana-prometheus-overview.png
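
The values behind panels such as these can also be read directly from the Prometheus HTTP API, which exposes instant queries at `/api/v1/query`. The following sketch builds such a query URL and parses a vector response without making a network call. The base URL, the sample response body, and the `job` label value are hypothetical, and the choice of `prometheus_tsdb_head_series` for the head series panel is an assumption. Authentication, which an OpenShift cluster would require, is omitted.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str, label: str = "job") -> dict[str, float]:
    """Map the chosen label to the sample value for each series in a vector result."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample value is a [timestamp, "value-as-string"] pair.
    return {
        series["metric"].get(label, ""): float(series["value"][1])
        for series in data["result"]
    }


# Hypothetical endpoint and response body, for illustration only.
url = instant_query_url("https://prometheus.example.invalid/", "prometheus_tsdb_head_series")
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{
            "metric": {"__name__": "prometheus_tsdb_head_series", "job": "prometheus-k8s"},
            "value": [1700000000, "123456"],
        }],
    },
})
head_series = parse_vector(sample_body)
print(url)
print(head_series)
```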