Cloud logging and monitoring

Azure Log Analytics

Azure Log Analytics collects and stores logs, and provides reporting and visualisation functionality.

azure-sah-heartai-aro-prod-aue-001-insights-6.png

Azure Insights

Azure Insights is an application-level telemetry tool that integrates natively with Azure Monitor to monitor and observe application components. It works by embedding a Microsoft-provided extension into an application written in a supported language. The extension collects a range of telemetry metrics and forwards them to Azure Monitor, where they can be analysed and visualised.

For HeartAI instances of Red Hat OpenShift, Azure Insights provides Azure Container Insights, which runs on every container host machine in the cluster and forwards cluster container logs and metrics to the receiving Azure Monitor instance. Container Insights collects processor and memory metrics from the cluster, nodes, controllers, and containers through the OpenShift monitoring API. Logs written to stdout and stderr are collected from all cluster containers. An integrated instance of Azure Storage and Azure Log Analytics collects and persists these metrics and logs, which are then available to Azure Monitor for reporting and visualisation.
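To illustrate how the collected stdout and stderr logs might be queried from Log Analytics, the following Python sketch builds a Kusto Query Language (KQL) query string. It assumes the workspace stores container logs in the `ContainerLogV2` table with `TimeGenerated`, `PodNamespace`, `LogSource`, `PodName`, `ContainerName` and `LogMessage` columns; these column names, the helper `build_container_log_query`, and any namespace passed to it are assumptions to check against the actual workspace schema, not HeartAI configuration.

```python
import re

# Kubernetes namespace names are RFC 1123 DNS labels: at most 63 characters,
# lowercase alphanumerics and '-', starting and ending with an alphanumeric.
_NAMESPACE_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def build_container_log_query(
    namespace: str, source: str = "stderr", hours: int = 1, limit: int = 100
) -> str:
    """Build a KQL query for recent container log lines in one namespace.

    Hypothetical helper: assumes the ContainerLogV2 table schema.
    """
    # Validating the namespace keeps arbitrary input out of the query text.
    if len(namespace) > 63 or not _NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"invalid Kubernetes namespace: {namespace!r}")
    if source not in ("stdout", "stderr"):
        raise ValueError("source must be 'stdout' or 'stderr'")
    if hours <= 0 or limit <= 0:
        raise ValueError("hours and limit must be positive")
    # Each pipe stage narrows the rows before projecting the useful columns.
    return "\n".join(
        [
            "ContainerLogV2",
            f"| where TimeGenerated > ago({hours}h)",
            f"| where PodNamespace == '{namespace}'",
            f"| where LogSource == '{source}'",
            "| project TimeGenerated, PodName, ContainerName, LogMessage",
            "| order by TimeGenerated desc",
            f"| take {limit}",
        ]
    )
```

The resulting string could be pasted into the Logs view of the Azure portal or submitted programmatically, for example with `LogsQueryClient.query_workspace` from the `azure-monitor-query` package.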

Example: Azure Insights UI for OpenShift cluster

The following image shows the Azure Insights web interface for cluster monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of the OpenShift cluster, with information describing the current cluster resource utilisation. The Azure Insights cluster web interface provides:

  • An overview of OpenShift cluster resource utilisation, including:
    • Node CPU utilisation.
    • Node memory utilisation.
    • Node count.
    • Active pod count.

heartai-azure-monitor-openshift-cluster.png

Example: Azure Insights UI for OpenShift Nodes

The following image shows the Azure Insights web interface for Node monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster Nodes, describing the status and resource utilisation of cluster Nodes. The Azure Insights nodes web interface provides:

  • A tabular report describing OpenShift cluster Nodes, including:
    • Node names.
    • Node health status.
    • Node CPU utilisation.
    • Active container deployment count.
    • Node uptime.

heartai-azure-monitor-openshift-nodes.png

Example: Azure Insights UI for OpenShift controllers

The following image shows the Azure Insights web interface for controller monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster controllers, describing the status and resource utilisation of cluster controllers. The Azure Insights controllers web interface provides:

  • A tabular report describing OpenShift cluster controllers, including:
    • Controller names.
    • Controller health status.
    • Controller CPU utilisation.
    • Active container deployment count.
    • Container deployment restart count.
    • Container uptime.

heartai-azure-monitor-openshift-controllers.png

Example: Azure Insights UI for OpenShift containers

The following image shows the Azure Insights web interface for container monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster containers, describing the status and resource utilisation of cluster containers. The Azure Insights containers web interface provides:

  • A tabular report describing OpenShift cluster containers, including:
    • Container names.
    • Container health status.
    • Container CPU utilisation.
    • Pod assignments.
    • Node assignments.
    • Container deployment restart count.
    • Container uptime.

heartai-azure-monitor-openshift-containers.png

Example: Azure Container Insights collected metrics

For HeartAI instances of Red Hat OpenShift that are integrated with Azure Container Insights, the following metrics are collected:

| Azure namespace | Metric | Description |
|---|---|---|
| Insights.container/nodes | cpuUsageMillicores | CPU utilization in millicores by host. |
| Insights.container/nodes | cpuUsagePercentage | CPU usage percentage by node. |
| Insights.container/nodes | memoryRssBytes | Memory RSS utilization in bytes by host. |
| Insights.container/nodes | memoryRssPercentage | Memory RSS usage percentage by host. |
| Insights.container/nodes | memoryWorkingSetBytes | Memory Working Set utilization in bytes by host. |
| Insights.container/nodes | memoryWorkingSetPercentage | Memory Working Set usage percentage by host. |
| Insights.container/nodes | nodesCount | Count of nodes by status. |
| Insights.container/nodes | diskUsedPercentage | Percentage of disk used on the node by device. |
| Insights.container/pods | podCount | Count of pods by controller, namespace, node, and phase. |
| Insights.container/pods | completedJobsCount | Count of completed jobs older than a user-configurable threshold (default is six hours) by controller and Kubernetes namespace. |
| Insights.container/pods | restartingContainerCount | Count of container restarts by controller and Kubernetes namespace. |
| Insights.container/pods | oomKilledContainerCount | Count of OOM-killed containers by controller and Kubernetes namespace. |
| Insights.container/pods | podReadyPercentage | Percentage of pods in the ready state by controller and Kubernetes namespace. |
| Insights.container/containers | cpuExceededPercentage | CPU utilization percentage for containers exceeding a user-configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, and pod name. |
| Insights.container/containers | memoryRssExceededPercentage | Memory RSS percentage for containers exceeding a user-configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, and pod name. |
| Insights.container/containers | memoryWorkingSetExceededPercentage | Memory Working Set percentage for containers exceeding a user-configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, and pod name. |
| Insights.container/persistentvolumes | pvUsageExceededPercentage | PV utilization percentage for persistent volumes exceeding a user-configurable threshold (default is 60.0) by claim name, Kubernetes namespace, volume name, pod name, and node name. |
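The `*ExceededPercentage` metrics above only report a value when utilisation passes a threshold. The following Python sketch models that behaviour under two assumptions: utilisation is computed as the used amount divided by the configured limit or capacity, and "exceeding" means strictly greater than the threshold. The default thresholds are taken from the table; the function `exceeded_percentage` is a hypothetical name, and this is a simplified model rather than Azure's implementation.

```python
from typing import Optional

# Default thresholds (percent) listed for the Container Insights metrics.
DEFAULT_THRESHOLDS = {
    "cpuExceededPercentage": 95.0,
    "memoryRssExceededPercentage": 95.0,
    "memoryWorkingSetExceededPercentage": 95.0,
    "pvUsageExceededPercentage": 60.0,
}


def exceeded_percentage(used: float, capacity: float, threshold: float) -> Optional[float]:
    """Return utilisation as a percentage if it exceeds threshold, else None.

    `used` and `capacity` must share a unit, e.g. millicores for CPU or
    bytes for memory and persistent volumes.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    percent = 100.0 * used / capacity
    # Values at or below the threshold produce no data point in this model.
    return percent if percent > threshold else None
```

For example, a container using 980 of a 1000-millicore limit yields 98.0 against the default 95.0 threshold, while one using 500 millicores yields no value.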

Example: Azure Monitor Action rules for Azure Container Insights

For HeartAI instances of Red Hat OpenShift that are integrated with Azure Container Insights, Azure Monitor Action rules provide alerting to HeartAI administrators and developers.

The following alerting rules are configured:

| Alert name | Description | Trigger |
|---|---|---|
| Average container CPU % | Calculates average CPU used per container. | When average CPU usage per container is greater than 95%. |
| Average container working set memory % | Calculates average working set memory used per container. | When average working set memory usage per container is greater than 95%. |
| Average CPU % | Calculates average CPU used per node. | When average node CPU utilization is greater than 80%. |
| Average Disk Usage % | Calculates average disk usage for a node. | When disk usage for a node is greater than 80%. |
| Average Persistent Volume Usage % | Calculates average PV usage per pod. | When average PV usage per pod is greater than 80%. |
| Average Working set memory % | Calculates average working set memory for a node. | When average working set memory for a node is greater than 80%. |
| Restarting container count | Calculates the number of restarting containers. | When container restarts are greater than 0. |
| Failed Pod Counts | Calculates whether any pod is in a failed state. | When the number of pods in a failed state is greater than 0. |
| Node NotReady status | Calculates whether any node is in the NotReady state. | When the number of nodes in the NotReady state is greater than 0. |
| OOM Killed Containers | Calculates the number of OOM-killed containers. | When the number of OOM-killed containers is greater than 0. |
| Pods ready % | Calculates the average ready state of pods. | When the percentage of pods in the ready state is less than 80%. |
| Completed job count | Calculates the number of jobs completed more than six hours ago. | When the number of stale jobs older than six hours is greater than 0. |
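Each rule in the alert table pairs an aggregated value with a comparison and a threshold. The Python sketch below models that pairing so the trigger conditions can be checked offline against sample values. The metric keys (such as `avg_node_cpu_pct`) and the `evaluate` helper are hypothetical; real evaluation, including aggregation windows, evaluation frequency and notification, happens inside Azure Monitor rather than in code like this.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AlertRule:
    name: str  # alert name as listed in the table
    metric: str  # hypothetical key for the aggregated value
    compare: Callable[[float, float], bool]
    threshold: float


# Thresholds and comparison directions taken from the alert table.
RULES = [
    AlertRule("Average container CPU %", "avg_container_cpu_pct", operator.gt, 95.0),
    AlertRule("Average container working set memory %", "avg_container_working_set_pct", operator.gt, 95.0),
    AlertRule("Average CPU %", "avg_node_cpu_pct", operator.gt, 80.0),
    AlertRule("Average Disk Usage %", "avg_node_disk_pct", operator.gt, 80.0),
    AlertRule("Average Persistent Volume Usage %", "avg_pod_pv_pct", operator.gt, 80.0),
    AlertRule("Average Working set memory %", "avg_node_working_set_pct", operator.gt, 80.0),
    AlertRule("Restarting container count", "restarting_containers", operator.gt, 0),
    AlertRule("Failed Pod Counts", "failed_pods", operator.gt, 0),
    AlertRule("Node NotReady status", "not_ready_nodes", operator.gt, 0),
    AlertRule("OOM Killed Containers", "oom_killed_containers", operator.gt, 0),
    AlertRule("Pods ready %", "pods_ready_pct", operator.lt, 80.0),
    AlertRule("Completed job count", "stale_completed_jobs", operator.gt, 0),
]


def evaluate(rules: List[AlertRule], observed: Dict[str, float]) -> List[str]:
    """Return the names of rules whose trigger condition holds, in rule order."""
    fired = []
    for rule in rules:
        value = observed.get(rule.metric)
        if value is None:
            continue  # no data for this metric: the sketch skips the rule
        if rule.compare(value, rule.threshold):
            fired.append(rule.name)
    return fired
```

Keeping the comparison as a function rather than assuming "greater than" matters for rules such as "Pods ready %", which fires when the value drops below its threshold.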

Azure Sentinel

Log Analytics workspaces are connected to Azure Sentinel, which provides an integrated security information and event management (SIEM) platform.

Azure Sentinel provides:

  • Real-time collection of Azure resource event data.
  • Event-driven alerting and pattern detection.
  • Detection of abnormal or suspicious event behaviour.
  • Visualisations of events and alerts.
  • Analysis of anomalous activity.
  • Geolocation detection for event behaviour patterns.

heartai-azure-sentinel.png

PostgreSQL monitoring with pgAdmin

HeartAI instances of PostgreSQL data servers are manageable with the pgAdmin data server administration and development platform.

Example: pgAdmin UI for PostgreSQL data servers

The following image shows the pgAdmin web interface for administration and development of a PostgreSQL data server instance. The pgAdmin web interface provides functionality to administer and develop with PostgreSQL instances, including:

  • An overview of pgAdmin-interfaced PostgreSQL data server instances, including:
    • Corresponding PostgreSQL databases.
    • Monitoring metrics and visualisations for:
      • Active database sessions.
      • Database transactions per second.
      • Database tuples in.
      • Database tuples out.
      • Database block I/O.
    • Server activity reporting.
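Charts such as transactions per second and tuples in/out are typically derived from the cumulative counters in PostgreSQL's `pg_stat_database` view by differencing two snapshots taken a known interval apart. The Python sketch below shows that arithmetic. The grouping of counters into charts is an assumption for illustration rather than a description of pgAdmin's internals, and `per_second_rates` is a hypothetical helper.

```python
from typing import Dict, Mapping, Tuple

# Cumulative pg_stat_database counters grouped by the dashboard chart they
# would feed (assumed grouping, for illustration).
CHART_COUNTERS: Dict[str, Tuple[str, ...]] = {
    "transactions_per_second": ("xact_commit", "xact_rollback"),
    "tuples_in": ("tup_inserted", "tup_updated", "tup_deleted"),
    "tuples_out": ("tup_fetched", "tup_returned"),
    "block_io": ("blks_read", "blks_hit"),
}


def per_second_rates(
    before: Mapping[str, int], after: Mapping[str, int], interval_s: float
) -> Dict[str, Dict[str, float]]:
    """Convert two counter snapshots into per-second rates for each chart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates: Dict[str, Dict[str, float]] = {}
    for chart, counters in CHART_COUNTERS.items():
        # Counters only grow until statistics are reset (e.g. pg_stat_reset()),
        # so a negative delta is treated as zero rather than a negative rate.
        rates[chart] = {
            name: max(0, after[name] - before[name]) / interval_s
            for name in counters
        }
    return rates
```

The snapshots could be fetched with any PostgreSQL driver using a query such as `SELECT xact_commit, xact_rollback, tup_inserted, tup_updated, tup_deleted, tup_fetched, tup_returned, blks_read, blks_hit FROM pg_stat_database WHERE datname = current_database();`.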

pgadmin-monitoring.png