Microsoft Azure is a major provider of cloud computing resources and services. Azure supplies on-demand and reserved instances of virtual machines, virtual networks, storage services, identity services, monitoring services, logging services, observability services, analytical services, and platform services. Azure services are generally categorised as infrastructure as a service (IaaS), platform as a service (PaaS), or software as a service (SaaS).
HeartAI naming standards for Azure resources follow Microsoft-recommended guidelines for resource naming and tagging:
- Microsoft Azure Naming and Tagging
- Microsoft Azure Resource Naming
- Microsoft Azure Resource Abbreviations
The HeartAI production environment is managed through the following Azure subscription:
The following Resource Groups partition the HeartAI production environment:
|sah-heartai-rg-prod-keyvault-aue-001||Azure Key Vault|
|sah-heartai-rg-prod-tfstate-aue-001||Terraform State backend|
|sah-heartai-rg-prod-aue-001||HeartAI production environment resources|
|aro-u541ij0x||HeartAI production environment resources for Microsoft Azure Red Hat OpenShift|
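The Resource Group names above appear to follow a regular pattern, which can be checked programmatically. The following is a minimal sketch in Python: the regular expression, the `ENVIRONMENTS` set, and the `parse_resource_name` helper are assumptions inferred from the names listed in this document, not an official HeartAI or Microsoft rule.

```python
import re

# Pattern inferred from names such as sah-heartai-rg-prod-keyvault-aue-001.
# This is an illustrative assumption, not an authoritative naming rule.
NAME_PATTERN = re.compile(
    r"^sah-heartai-"
    r"(?P<type>rg|vnet|snet)-"                 # resource type abbreviation
    r"(?P<middle>[a-z0-9]+(?:-[a-z0-9]+)*)-"   # environment and optional workload
    r"(?P<region>aue)-"                        # region abbreviation
    r"(?P<instance>\d{3})$"                    # three-digit instance number
)

# Environment tokens seen in this document (assumed to be the full set).
ENVIRONMENTS = {"prod", "test"}


def parse_resource_name(name):
    """Split a resource name into its components, or return None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    tokens = match.group("middle").split("-")
    environments = [t for t in tokens if t in ENVIRONMENTS]
    if len(environments) != 1:
        return None
    workload = [t for t in tokens if t not in ENVIRONMENTS]
    return {
        "type": match.group("type"),
        "environment": environments[0],
        "workload": "-".join(workload) if workload else None,
        "region": match.group("region"),
        "instance": match.group("instance"),
    }


for candidate in (
    "sah-heartai-rg-prod-keyvault-aue-001",
    "sah-heartai-rg-prod-aue-001",
    "aro-u541ij0x",
):
    print(candidate, "->", parse_resource_name(candidate))
```

The automatically generated `aro-u541ij0x` name does not fit the pattern and is reported as `None`, which is expected because Azure generates that Resource Group name.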
The following image shows the Azure Portal web interface for an Azure subscription. This interface displays:
- Information about the subscription, including the subscription ID and resource location directory.
- Costing by resource, including aggregated costing reports and forecasted costing.
- Summary details about the subscription.
The following image shows the Azure Portal for the HeartAI production environment sah-heartai-rg-prod-aue-001 Resource Group resource:
The HeartAI implementation of Microsoft Azure is managed with the Terraform declarative infrastructure-as-code software framework. Terraform allows system components to be declared in configuration files written in the HashiCorp Configuration Language (HCL). Collections of these configuration files provide a declarative representation of HeartAI infrastructure-level components, which can be synchronised with the state of Microsoft Azure environments through the Azure Resource Manager API. Deploying infrastructure with Terraform keeps HeartAI system infrastructure consistent, maintainable, scalable, and reproducible.
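To illustrate the declarative model in general terms, the following Python sketch compares a desired state with an actual state and derives the actions needed to reconcile them. It is a conceptual illustration only: it is not how Terraform is implemented, and the `plan` function, the resource attributes, and the `example-orphan-resource` name are hypothetical.

```python
def plan(desired, actual):
    """Return the (action, resource name) pairs needed to make `actual` match `desired`."""
    actions = []
    for name, config in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != config:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Desired state, as it might be declared in configuration (attributes are hypothetical).
desired = {
    "sah-heartai-rg-prod-aue-001": {"tags": {"environment": "prod"}},
    "sah-heartai-vnet-prod-aue-001": {"tags": {"environment": "prod"}},
}

# Actual state, as it might be reported by the cloud provider (hypothetical).
actual = {
    "sah-heartai-rg-prod-aue-001": {"tags": {"environment": "prod"}},
    "example-orphan-resource": {},
}

for action, name in plan(desired, actual):
    print(action, name)
```

Because the configuration describes the end state rather than the steps to reach it, applying the same configuration repeatedly converges on the same result.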
Further information about the HeartAI implementation of Terraform may be found with the following documentation:
Azure Virtual Network (VNet) provides cloud-hosted networking infrastructure. HeartAI services do not expose network endpoints to the public internet. All network resolution occurs internal to the HeartAI network or through private network extension.
HeartAI Azure Virtual Network address spaces
|Network name||Description||Network address range||Hosting network|
|sah-heartai-vnet-prod-aue-001||HeartAI production environment||10.X.X.0/24||Microsoft Azure|
|sah-heartai-vnet-test-aue-001||HeartAI testing environment||10.X.X.0/24||Microsoft Azure|
The HeartAI production environment partitions an Azure Virtual Network to the following subnetworks:
HeartAI production environment subnetworks
|Subnet name||Description||Subnet address||Address range||Available IPs||Hosts|
|sah-heartai-snet-aroworker-prod-aue-001||Azure Red Hat OpenShift worker nodes||10.X.X.0/25||10.X.X.0 - 10.X.X.127||10.X.X.1 - 10.X.X.126||126|
|sah-heartai-snet-aromaster-prod-aue-002||Azure Red Hat OpenShift master nodes||10.X.X.128/27||10.X.X.128 - 10.X.X.159||10.X.X.129 - 10.X.X.158||30|
|-||Subnet range unassigned||10.X.X.160/27||10.X.X.160 - 10.X.X.191||10.X.X.161 - 10.X.X.190||30|
|sah-heartai-snet-paas-prod-aue-004||Azure PaaS endpoints||10.X.X.192/26||10.X.X.192 - 10.X.X.255||10.X.X.193 - 10.X.X.254||62|
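The address ranges and host counts in the table above can be reproduced with the Python standard library `ipaddress` module. This is a minimal sketch: because the real address space is masked as `10.X.X.0/24` in this document, the code substitutes the hypothetical range `10.0.0.0/24`, and the `describe_subnet` helper is illustrative.

```python
import ipaddress

# Hypothetical stand-in for the masked 10.X.X.0/24 production address space.
VNET = ipaddress.ip_network("10.0.0.0/24")

# Same prefix lengths and offsets as the subnet table, on the stand-in range.
SUBNETS = ["10.0.0.0/25", "10.0.0.128/27", "10.0.0.160/27", "10.0.0.192/26"]


def describe_subnet(cidr):
    """Return the address range, usable host range, and host count for a subnet."""
    network = ipaddress.ip_network(cidr)
    hosts = list(network.hosts())  # excludes the network and broadcast addresses
    return {
        "subnet": str(network),
        "range": (str(network.network_address), str(network.broadcast_address)),
        "usable": (str(hosts[0]), str(hosts[-1])),
        "host_count": len(hosts),
    }


for cidr in SUBNETS:
    # Each subnet must sit inside the virtual network address space.
    assert ipaddress.ip_network(cidr).subnet_of(VNET)
    print(describe_subnet(cidr))
```

The host counts here follow the conventional calculation, which excludes only the network and broadcast addresses. Azure additionally reserves some addresses in every subnet, so the number of addresses that can actually be assigned to resources is lower than the counts shown.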
Azure Private Link provides a way to interface securely with Azure cloud services, such as Azure Key Vault, Azure Database for PostgreSQL, and Azure Cosmos DB. Through Private Link, these service endpoint addresses extend onto the HeartAI virtual network and are routable entirely through the Microsoft backbone network.
Azure Key Vault provides a secure store for sensitive data such as cryptographic keys and configuration secrets. These sensitive values are injectable to system environments by calling Key Vault and retrieving the corresponding data. The following features support the HeartAI system:
- Applications and secrets have no direct access to keys.
- Encryption keys may be created and imported within minutes.
- Highly available with 99.9% availability.
- Transaction processing within 5 seconds.
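Retrieving a secret from Key Vault at runtime can be sketched in Python with the Azure SDK packages `azure-identity` and `azure-keyvault-secrets`. This is an illustrative example under stated assumptions rather than the HeartAI implementation: the vault name and secret name in the usage note are hypothetical, and the `vault_url` and `fetch_secret` helpers are not part of the SDK.

```python
from typing import Optional


def vault_url(vault_name: str) -> str:
    """Build the Key Vault URL for a vault name in the Azure public cloud."""
    return f"https://{vault_name}.vault.azure.net"


def fetch_secret(vault_name: str, secret_name: str) -> Optional[str]:
    """Retrieve the current value of a secret from Azure Key Vault."""
    # Imported lazily so this module loads even where the Azure SDK is not installed.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    # DefaultAzureCredential tries several credential sources in turn,
    # such as environment variables and managed identity.
    client = SecretClient(
        vault_url=vault_url(vault_name),
        credential=DefaultAzureCredential(),
    )
    return client.get_secret(secret_name).value
```

A caller with suitable permissions might use `fetch_secret("example-vault", "example-database-password")`, where both names are placeholders.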
The HeartAI system provisions the following Azure Key Vault resources:
|Operations||10,000 / month|
- Highly available with 99.99% availability.
- Data redundancy with 3x replication.
The HeartAI production environment provisions the following Azure Database for PostgreSQL instance:
|Database option||Single server|
|Compute||Gen 5, 2 vCPU|
|Service usage||730 hours / month|
|Savings options||Pay as you go|
Azure Insights is an application-level telemetry tool that integrates natively with Azure Monitor to monitor and observe application components. This is achieved by embedding a Microsoft application extension for a supported language into the application. The extension collects and forwards a range of telemetry metrics to Azure Monitor, where they may subsequently be analysed and visualised.
For HeartAI instances of Red Hat OpenShift, Azure Insights provides Azure Container Insights, which runs on all container host machines of the cluster and forwards cluster container logs and metrics to the receiving Azure Monitor instance. Container Insights collects processor and memory metrics from the cluster, nodes, controllers, and containers through the OpenShift monitoring API. Logs from stderr are collected from all cluster containers. An integrated instance of Azure Storage and Azure Log Analytics collects and persists these metrics, which are then available for Azure Monitor to process for reporting and visualisation.
The following image shows the Azure Insights web interface for cluster monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of the OpenShift cluster, with information describing the current cluster resource utilisation. The Azure Insights cluster web interface provides:
- An overview of OpenShift cluster resource utilisation, including:
- Node CPU utilisation.
- Node memory utilisation.
- Node count.
- Active pod count.
The following image shows the Azure Insights web interface for Node monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster Nodes, describing the status and resource utilisation of cluster Nodes. The Azure Insights nodes web interface provides:
- A tabled report describing OpenShift cluster Nodes, including:
- Node names.
- Node health status.
- Node CPU utilisation.
- Active container deployment count.
- Node uptime.
The following image shows the Azure Insights web interface for controller monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster controllers, describing the status and resource utilisation of cluster controllers. The Azure Insights controllers web interface provides:
- A tabled report describing OpenShift cluster controllers, including:
- Controller names.
- Controller health status.
- Controller CPU utilisation.
- Active container deployment count.
- Container deployment restart count.
- Container uptime.
The following image shows the Azure Insights web interface for container monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster containers, describing the status and resource utilisation of cluster containers. The Azure Insights containers web interface provides:
- A tabled report describing OpenShift cluster containers, including:
- Container names.
- Container health status.
- Container CPU utilisation.
- Pod assignments.
- Node assignments.
- Container deployment restart count.
- Container uptime.
For HeartAI instances of Red Hat OpenShift that are integrated with Azure Container Insights, the following metrics are collected:
|Insights.container/nodes||cpuUsageMillicores||CPU utilization in millicores by host.|
|Insights.container/nodes||cpuUsagePercentage||CPU usage percentage by node.|
|Insights.container/nodes||memoryRssBytes||Memory RSS utilization in bytes by host.|
|Insights.container/nodes||memoryRssPercentage||Memory RSS usage percentage by host.|
|Insights.container/nodes||memoryWorkingSetBytes||Memory Working Set utilization in bytes by host.|
|Insights.container/nodes||memoryWorkingSetPercentage||Memory Working Set usage percentage by host.|
|Insights.container/nodes||nodesCount||Count of nodes by status.|
|Insights.container/nodes||diskUsedPercentage||Percentage of disk used on the node by device.|
|Insights.container/pods||podCount||Count of pods by controller, namespace, node, and phase.|
|Insights.container/pods||completedJobsCount||Count of completed jobs older than the user-configurable threshold (default is six hours) by controller, Kubernetes namespace.|
|Insights.container/pods||restartingContainerCount||Count of container restarts by controller, Kubernetes namespace.|
|Insights.container/pods||oomKilledContainerCount||Count of OOMkilled containers by controller, Kubernetes namespace.|
|Insights.container/pods||podReadyPercentage||Percentage of pods in ready state by controller, Kubernetes namespace.|
|Insights.container/containers||cpuExceededPercentage||CPU utilization percentage for containers exceeding user configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, pod name.|
|Insights.container/containers||memoryRssExceededPercentage||Memory RSS percentage for containers exceeding user configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, pod name.|
|Insights.container/containers||memoryWorkingSetExceededPercentage||Memory Working Set percentage for containers exceeding user configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, pod name.|
|Insights.container/persistentvolumes||pvUsageExceededPercentage||PV utilization percentage for persistent volumes exceeding user configurable threshold (default is 60.0) by claim name, Kubernetes namespace, volume name, pod name, and node name.|
For HeartAI instances of Red Hat OpenShift that are integrated with Azure Container Insights, Azure Monitor Action rules provide alerting functionality to HeartAI administrators and developers.
The following alerting rules are configured:
|Average container CPU %||Calculates average CPU used per container.||When average CPU usage per container is greater than 95%.|
|Average container working set memory %||Calculates average working set memory used per container.||When average working set memory usage per container is greater than 95%.|
|Average CPU %||Calculates average CPU used per node.||When average node CPU utilization is greater than 80%.|
|Average Disk Usage %||Calculates average disk usage for a node.||When disk usage for a node is greater than 80%.|
|Average Persistent Volume Usage %||Calculates average PV usage per pod.||When average PV usage per pod is greater than 80%.|
|Average Working set memory %||Calculates average Working set memory for a node.||When average Working set memory for a node is greater than 80%.|
|Restarting container count||Calculates number of restarting containers.||When container restarts are greater than 0.|
|Failed Pod Counts||Calculates whether any pod is in a failed state.||When the number of pods in a failed state is greater than 0.|
|Node NotReady status||Calculates whether any node is in the NotReady state.||When the number of nodes in the NotReady state is greater than 0.|
|OOM Killed Containers||Calculates number of OOM killed containers.||When the number of OOM killed containers is greater than 0.|
|Pods ready %||Calculates the average ready state of pods.||When ready state of pods is less than 80%.|
|Completed job count||Calculates number of jobs completed more than six hours ago.||When number of stale jobs older than six hours is greater than 0.|
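The threshold logic of several of the alerting rules above can be sketched in Python. This is an illustration of the comparison logic only, not the Azure Monitor implementation: the `AlertRule` type, the `evaluate` function, the sample values, and the mapping of each rule to a Container Insights metric name are all assumptions.

```python
import operator
from typing import Callable, NamedTuple


class AlertRule(NamedTuple):
    name: str
    metric: str                               # assumed mapping to a collected metric
    compare: Callable[[float, float], bool]   # comparison against the threshold
    threshold: float


# A subset of the rules in the table, with thresholds taken from the table.
RULES = [
    AlertRule("Average CPU %", "cpuUsagePercentage", operator.gt, 80.0),
    AlertRule("Average Disk Usage %", "diskUsedPercentage", operator.gt, 80.0),
    AlertRule("Average Working set memory %", "memoryWorkingSetPercentage", operator.gt, 80.0),
    AlertRule("Restarting container count", "restartingContainerCount", operator.gt, 0),
    AlertRule("OOM Killed Containers", "oomKilledContainerCount", operator.gt, 0),
    AlertRule("Pods ready %", "podReadyPercentage", operator.lt, 80.0),
]


def evaluate(rules, sample):
    """Return the names of the rules whose condition holds for a metric sample."""
    fired = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            continue  # metric not present in this sample, so the rule is skipped
        if rule.compare(value, rule.threshold):
            fired.append(rule.name)
    return fired


# Hypothetical metric sample for a single evaluation window.
sample = {
    "cpuUsagePercentage": 85.0,
    "diskUsedPercentage": 40.0,
    "restartingContainerCount": 0,
    "oomKilledContainerCount": 2,
    "podReadyPercentage": 75.0,
}

print(evaluate(RULES, sample))
```

Keeping the comparison operator as part of each rule lets "greater than" and "less than" conditions, such as the pod readiness rule, share one evaluation loop.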
Further information about these approaches may be found with the following external references:
Azure Sentinel provides:
- Real-time collection of Azure resource event data.
- Event-driven alerting and pattern detection.
- Detection of abnormal or suspicious event behaviour.
- Visualisations of events and alerts.
- Analysis of anomalous activity.
- Geolocation detection for event behaviour patterns.
- Fully managed Red Hat OpenShift cluster.
- Fully managed infrastructure for master and worker nodes.
- Enhanced security with integration through Azure Active Directory.
- Highly available with 99.95% availability.
- Jointly engineered and operated by Microsoft and Red Hat.
HeartAI orchestrates system services with the Kubernetes-based Red Hat OpenShift container platform. Further information about the HeartAI implementation of Red Hat OpenShift may be found with the following documentation:
The following image shows the Azure Portal for the aro-u541ij0x Resource Group, which is dynamically generated alongside a deployment of Microsoft Azure Red Hat OpenShift. The resources in this Resource Group are managed by Azure.
The following figure shows a structural overview of Microsoft Azure cloud resources within a HeartAI production environment instance. This figure represents:
- A corresponding Azure vWAN hub, including:
- An Azure ExpressRoute as an example of external network connectivity.
- An Azure Virtual WAN instance.
- Network peering between a HeartAI Azure Virtual Network instance and a corresponding Azure vWAN hub.
- A HeartAI Azure Virtual Network instance, with the following contained resources:
- Azure Red Hat OpenShift Master nodes.
- Azure Red Hat OpenShift Worker nodes.
- Azure private endpoints, with internal and private network connectivity to Azure cloud services.
- Azure cloud services, including: