Cloud logging and monitoring
Azure Log Analytics
Azure Log Analytics collects and stores logs, and provides reporting and visualisation functionality.
Azure Insights
Azure Insights is an application-level telemetry tool that natively integrates with Azure Monitor to monitor and observe application components. This is achieved by embedding a Microsoft extension for a supported language into the application. The extension collects a range of telemetry metrics and forwards them to Azure Monitor, where they may subsequently be analysed and visualised.
For HeartAI instances of Red Hat OpenShift, Azure Insights provides Azure Container Insights, which runs on every container host machine of the cluster and forwards cluster container logs and metrics to the receiving Azure Monitor instance. Container insights collects processor and memory metrics from the cluster, nodes, controllers, and containers through the OpenShift monitoring API. Logs from `stdout` and `stderr` are collected from all cluster containers. An integrated instance of Azure Storage and Azure Log Analytics collects and persists these logs and metrics, which are then available for Azure Monitor to process for reporting and visualisation.
Example: Azure Insights UI for OpenShift cluster
The following image shows the Azure Insights web interface for cluster monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of the OpenShift cluster, with information describing the current cluster resource utilisation. The Azure Insights cluster web interface provides:
- An overview of OpenShift cluster resource utilisation, including:
  - Node CPU utilisation.
  - Node memory utilisation.
  - Node count.
  - Active pod count.
Example: Azure Insights UI for OpenShift Nodes
The following image shows the Azure Insights web interface for Node monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster Nodes, describing the status and resource utilisation of cluster Nodes. The Azure Insights nodes web interface provides:
- A tabled report describing OpenShift cluster Nodes, including:
  - Node names.
  - Node health status.
  - Node CPU utilisation.
  - Active container deployment count.
  - Node uptime.
Example: Azure Insights UI for OpenShift controllers
The following image shows the Azure Insights web interface for controller monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster controllers, describing the status and resource utilisation of cluster controllers. The Azure Insights controllers web interface provides:
- A tabled report describing OpenShift cluster controllers, including:
  - Controller names.
  - Controller health status.
  - Controller CPU utilisation.
  - Active container deployment count.
  - Container deployment restart count.
  - Container uptime.
Example: Azure Insights UI for OpenShift containers
The following image shows the Azure Insights web interface for container monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster containers, describing the status and resource utilisation of cluster containers. The Azure Insights containers web interface provides:
- A tabled report describing OpenShift cluster containers, including:
  - Container names.
  - Container health status.
  - Container CPU utilisation.
  - Pod assignments.
  - Node assignments.
  - Container deployment restart count.
  - Container uptime.
Example: Azure Container Insights collected metrics
For HeartAI instances of Red Hat OpenShift that are integrated with Azure Container Insights, the following metrics are collected:
Azure namespace | Metric | Description |
---|---|---|
Insights.container/nodes | cpuUsageMillicores | CPU utilization in millicores by host. |
Insights.container/nodes | cpuUsagePercentage | CPU usage percentage by node. |
Insights.container/nodes | memoryRssBytes | Memory RSS utilization in bytes by host. |
Insights.container/nodes | memoryRssPercentage | Memory RSS usage percentage by host. |
Insights.container/nodes | memoryWorkingSetBytes | Memory Working Set utilization in bytes by host. |
Insights.container/nodes | memoryWorkingSetPercentage | Memory Working Set usage percentage by host. |
Insights.container/nodes | nodesCount | Count of nodes by status. |
Insights.container/nodes | diskUsedPercentage | Percentage of disk used on the node by device. |
Insights.container/pods | podCount | Count of pods by controller, namespace, node, and phase. |
Insights.container/pods | completedJobsCount | Count of completed jobs older than a user configurable threshold (default is six hours) by controller, Kubernetes namespace. |
Insights.container/pods | restartingContainerCount | Count of container restarts by controller, Kubernetes namespace. |
Insights.container/pods | oomKilledContainerCount | Count of OOMkilled containers by controller, Kubernetes namespace. |
Insights.container/pods | podReadyPercentage | Percentage of pods in ready state by controller, Kubernetes namespace. |
Insights.container/containers | cpuExceededPercentage | CPU utilization percentage for containers exceeding user configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, pod name. |
Insights.container/containers | memoryRssExceededPercentage | Memory RSS percentage for containers exceeding user configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, pod name. |
Insights.container/containers | memoryWorkingSetExceededPercentage | Memory Working Set percentage for containers exceeding user configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, pod name. |
Insights.container/persistentvolumes | pvUsageExceededPercentage | PV utilization percentage for persistent volumes exceeding user configurable threshold (default is 60.0) by claim name, Kubernetes namespace, volume name, pod name, and node name. |
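The relationship between the raw metrics and the threshold-based "exceeded" metrics above can be illustrated with a minimal Python sketch. It assumes that a usage percentage is usage divided by the container limit, and that "exceeding" means strictly greater than the threshold; neither detail is confirmed by this document. The class, function, and container names are hypothetical and are not part of Azure Container Insights.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ContainerCpuSample:
    """Hypothetical CPU sample for one container, in millicores."""
    container_name: str
    usage_millicores: float
    limit_millicores: float


def usage_percentage(usage_millicores: float, capacity_millicores: float) -> float:
    """Express usage as a percentage of capacity (assumed definition)."""
    if capacity_millicores <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * usage_millicores / capacity_millicores


def exceeded(samples: Iterable[ContainerCpuSample],
             threshold_percent: float = 95.0) -> Dict[str, float]:
    """Keep only containers whose CPU percentage is above the threshold.

    The default of 95.0 mirrors the table; the strict comparison is an assumption.
    """
    result = {}
    for sample in samples:
        percent = usage_percentage(sample.usage_millicores, sample.limit_millicores)
        if percent > threshold_percent:
            result[sample.container_name] = percent
    return result


if __name__ == "__main__":
    samples = [
        ContainerCpuSample("api", usage_millicores=490, limit_millicores=500),
        ContainerCpuSample("worker", usage_millicores=100, limit_millicores=500),
    ]
    # Only the "api" container is above the 95.0 threshold.
    print(exceeded(samples))
```

Under these assumptions, a metric such as cpuExceededPercentage only reports containers that are already above the configured threshold, which keeps the number of emitted time series small.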
Example: Azure Monitor Action rules for Azure Container Insights
For HeartAI instances of Red Hat OpenShift that are integrated with Azure Container Insights, Azure Monitor Action rules provide alerting functionality to HeartAI administrators and developers.
The following alerting rules are configured:
Alert name | Description | Trigger |
---|---|---|
Average container CPU % | Calculates average CPU used per container. | When average CPU usage per container is greater than 95%. |
Average container working set memory % | Calculates average working set memory used per container. | When average working set memory usage per container is greater than 95%. |
Average CPU % | Calculates average CPU used per node. | When average node CPU utilization is greater than 80%. |
Average Disk Usage % | Calculates average disk usage for a node. | When disk usage for a node is greater than 80%. |
Average Persistent Volume Usage % | Calculates average PV usage per pod. | When average PV usage per pod is greater than 80%. |
Average Working set memory % | Calculates average Working set memory for a node. | When average Working set memory for a node is greater than 80%. |
Restarting container count | Calculates number of restarting containers. | When container restarts are greater than 0. |
Failed Pod Counts | Calculates if any pod is in a failed state. | When the number of pods in a failed state is greater than 0. |
Node NotReady status | Calculates if any node is in NotReady state. | When the number of nodes in NotReady state is greater than 0. |
OOM Killed Containers | Calculates number of OOM killed containers. | When the number of OOM killed containers is greater than 0. |
Pods ready % | Calculates the average ready state of pods. | When ready state of pods is less than 80%. |
Completed job count | Calculates number of jobs completed more than six hours ago. | When number of stale jobs older than six hours is greater than 0. |
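The trigger conditions in the table above amount to comparing a metric value against a fixed threshold. The following Python sketch shows that logic for a subset of the rules. It is an illustration only and not how Azure Monitor evaluates alerts: the `AlertRule` class, the snapshot keys such as `node_cpu_percent`, and the sample values are all hypothetical; only the rule names and thresholds come from the table.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class AlertRule:
    """One threshold rule: fire when compare(value, threshold) is true."""
    name: str
    metric: str  # hypothetical key into a metrics snapshot
    compare: Callable[[float, float], bool]
    threshold: float


# Names and thresholds follow the table; metric keys are assumptions.
RULES = [
    AlertRule("Average container CPU %", "container_cpu_percent", operator.gt, 95.0),
    AlertRule("Average CPU %", "node_cpu_percent", operator.gt, 80.0),
    AlertRule("Average Disk Usage %", "node_disk_percent", operator.gt, 80.0),
    AlertRule("Restarting container count", "restarting_containers", operator.gt, 0),
    AlertRule("OOM Killed Containers", "oom_killed_containers", operator.gt, 0),
    AlertRule("Pods ready %", "pods_ready_percent", operator.lt, 80.0),
]


def triggered_alerts(snapshot: Dict[str, float],
                     rules: Sequence[AlertRule] = RULES) -> List[str]:
    """Return the names of rules whose condition holds for the snapshot.

    Metrics missing from the snapshot are skipped rather than treated as zero.
    """
    fired = []
    for rule in rules:
        value = snapshot.get(rule.metric)
        if value is not None and rule.compare(value, rule.threshold):
            fired.append(rule.name)
    return fired


if __name__ == "__main__":
    snapshot = {
        "container_cpu_percent": 40.0,
        "node_cpu_percent": 85.5,
        "node_disk_percent": 80.0,   # equal to the threshold, so not "greater than"
        "restarting_containers": 0,
        "oom_killed_containers": 1,
        "pods_ready_percent": 75.0,
    }
    print(triggered_alerts(snapshot))
```

Note that "Pods ready %" is the only rule in this subset that fires when a value falls below its threshold, which is why the comparison operator is stored per rule.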
Further information about these approaches may be found with the following external references:
Azure Sentinel
Log Analytics workspaces are aggregated together with Azure Sentinel, providing the functionality of an integrated security information and event management (SIEM) platform.
Azure Sentinel provides:
- Real-time collection of Azure resource event data.
- Event-driven alerting and pattern detection.
- Detection of abnormal or suspicious event behaviour.
- Visualisations of events and alerts.
- Analysis of anomalous activity.
- Geolocation detection for event behaviour patterns.
PostgreSQL monitoring with pgAdmin
HeartAI instances of PostgreSQL data servers are manageable with the pgAdmin data server administration and development platform.
Example: pgAdmin UI for PostgreSQL data servers
The following image shows the pgAdmin web interface for administration and development of a PostgreSQL data server instance. The pgAdmin web interface provides functionality to administer and develop with PostgreSQL instances, including:
- An overview of pgAdmin-interfaced PostgreSQL data server instances, including:
  - Corresponding PostgreSQL databases.
- Monitoring metrics and visualisations for:
  - Active database sessions.
  - Database transactions per second.
  - Database tuples in.
  - Database tuples out.
  - Database block I/O.
- Server activity reporting.
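Rate metrics of this kind can be derived from PostgreSQL's cumulative statistics. The `pg_stat_database` view exposes counters such as `xact_commit`, `xact_rollback`, `tup_inserted`, `tup_updated`, `tup_deleted`, `tup_fetched`, and `tup_returned`, and a per-second rate is the change in a counter between two samples divided by the elapsed time. The Python sketch below shows that calculation. How pgAdmin itself computes its dashboard values is not described here, so the grouping of counters into "tuples in" and "tuples out" is an assumption, and the `DbStatSnapshot` class and `rates` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DbStatSnapshot:
    """Cumulative counters for one database, as read from pg_stat_database."""
    taken_at: float  # sample time in seconds
    xact_commit: int
    xact_rollback: int
    tup_inserted: int
    tup_updated: int
    tup_deleted: int
    tup_fetched: int
    tup_returned: int


def rates(prev: DbStatSnapshot, curr: DbStatSnapshot) -> Dict[str, float]:
    """Convert two cumulative snapshots into per-second rates."""
    elapsed = curr.taken_at - prev.taken_at
    if elapsed <= 0:
        raise ValueError("snapshots must be in increasing time order")

    def per_second(*fields: str) -> float:
        # Sum the counter deltas for the given fields, then normalise by time.
        delta = sum(getattr(curr, f) - getattr(prev, f) for f in fields)
        return delta / elapsed

    return {
        "transactions_per_second": per_second("xact_commit", "xact_rollback"),
        # Assumed grouping: writes count as "in", reads count as "out".
        "tuples_in_per_second": per_second("tup_inserted", "tup_updated", "tup_deleted"),
        "tuples_out_per_second": per_second("tup_fetched", "tup_returned"),
    }


if __name__ == "__main__":
    prev = DbStatSnapshot(0.0, 100, 5, 1000, 200, 50, 5000, 9000)
    curr = DbStatSnapshot(10.0, 190, 15, 1100, 250, 50, 5400, 9600)
    print(rates(prev, curr))
```

In practice the two snapshots would come from querying `pg_stat_database` at a fixed interval; the sample values here are made up for illustration.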