Microsoft Azure implementation

Microsoft Azure is Microsoft's cloud computing platform. Azure supplies on-demand and reserved instances of virtual machines, together with virtual networks and storage, identity, monitoring, logging, observability, analytics, and platform services. Azure services may be broadly categorised as infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS).

Azure naming standards

HeartAI naming standards for Azure resources follow the Microsoft-recommended guidelines for resource naming and tagging.

Azure subscription for HeartAI production environment

The HeartAI production environment is managed through the following Azure subscription:

  • sah-heartai-prod

The following Resource Groups partition the HeartAI production environment:

| Resource Group | Description |
| --- | --- |
| sah-heartai-rg-prod-keyvault-aue-001 | Azure Key Vault |
| sah-heartai-rg-prod-tfstate-aue-001 | Terraform State backend |
| sah-heartai-rg-prod-aue-001 | HeartAI production environment resources |
| aro-u541ij0x | HeartAI production environment resources for Microsoft Azure Red Hat OpenShift |
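
As a rough illustration of the naming pattern visible in the Resource Group names above, the following Python sketch splits a name into its hyphen-delimited parts. The field labels (`organisation`, `project`, `resource_type`, `qualifiers`, `region`, `instance`) and the regular expression are assumptions inferred from the listed names; they are not an official HeartAI or Microsoft schema.

```python
import re
from typing import Optional

# Assumed pattern, inferred from names such as sah-heartai-rg-prod-aue-001.
# The field labels are hypothetical and used for illustration only.
NAME_PATTERN = re.compile(
    r"^(?P<organisation>[a-z0-9]+)-"
    r"(?P<project>[a-z0-9]+)-"
    r"(?P<resource_type>[a-z]+)-"
    r"(?P<qualifiers>[a-z0-9]+(?:-[a-z0-9]+)*)-"
    r"(?P<region>[a-z]{3})-"
    r"(?P<instance>\d{3})$"
)


def parse_resource_name(name: str) -> Optional[dict]:
    """Split a resource name into labelled parts, or return None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    # Environment and workload labels are kept together as an ordered list.
    parts["qualifiers"] = parts["qualifiers"].split("-")
    return parts


print(parse_resource_name("sah-heartai-rg-prod-keyvault-aue-001"))
print(parse_resource_name("aro-u541ij0x"))
```

Platform-generated names such as aro-u541ij0x do not fit the assumed pattern, so the function returns `None` for them rather than guessing at their structure.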

Example: Azure Portal for subscription

The following image shows the Azure Portal web interface for an Azure subscription. This interface displays:

  • Information about the subscription, including the subscription ID and resource location directory.
  • Costing by resource, including aggregated costing reports and forecasted costing.
  • Summary details about the subscription.

heartai-azure-subscription.png

Example: Azure Portal for Resource Group

The following image shows the Azure Portal view of the sah-heartai-rg-prod-aue-001 Resource Group in the HeartAI production environment:

azure-sah-heartai-rg-prod-aue-001.png

Terraform implementation

The HeartAI implementation of Microsoft Azure is managed with the Terraform declarative infrastructure-as-code software framework. Terraform allows system components to be declared in configuration files written in the HashiCorp Configuration Language (HCL). Collections of these configuration files provide a declarative representation of HeartAI infrastructure-level components, which can be synchronised with the state of Microsoft Azure environments through the Azure Resource Manager API. Deploying infrastructure with Terraform keeps HeartAI system infrastructure consistent, maintainable, scalable, and reproducible.
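
The idea of reconciling declared configuration with the actual state of an environment can be sketched in a few lines of Python. This is a conceptual illustration only, not Terraform's implementation or API: the `plan` function, the dictionary layout, and the sample attribute values are hypothetical.

```python
from typing import Dict, List, Tuple

Resource = Dict[str, str]


def plan(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Return (action, resource_name) pairs that would move actual towards desired."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical sample: one resource already exists, one is declared but missing.
desired_state = {
    "sah-heartai-rg-prod-aue-001": {"location": "australiaeast"},
    "sah-heartai-vnet-prod-aue-001": {"location": "australiaeast"},
}
actual_state = {
    "sah-heartai-rg-prod-aue-001": {"location": "australiaeast"},
}

for action, name in plan(desired_state, actual_state):
    print(action, name)
```

Because the desired state is data rather than a sequence of commands, running the same comparison against an already synchronised environment yields no actions, which is one reason declarative deployment is reproducible.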

Further information about the HeartAI implementation of Terraform may be found in the HeartAI Terraform documentation.

Azure Virtual Network

Azure Virtual Network (VNet) provides cloud-hosted networking infrastructure. HeartAI services do not expose network endpoints to the public internet. All network resolution occurs internal to the HeartAI network or through private network extension.

HeartAI Azure Virtual Network address spaces

| Network name | Description | Network address range | Hosting network |
| --- | --- | --- | --- |
| sah-heartai-vnet-prod-aue-001 | HeartAI production environment | 10.X.X.0/24 | Microsoft Azure |
| sah-heartai-vnet-test-aue-001 | HeartAI testing environment | 10.X.X.0/24 | Microsoft Azure |

The HeartAI production environment partitions an Azure Virtual Network into the following subnetworks:

HeartAI production environment subnetworks

| Subnet name | Description | Subnet address | Address range | Available IPs | Hosts |
| --- | --- | --- | --- | --- | --- |
| sah-heartai-snet-aroworker-prod-aue-001 | Azure Red Hat OpenShift worker nodes | 10.X.X.0/25 | 10.X.X.0 - 10.X.X.127 | 10.X.X.1 - 10.X.X.126 | 126 |
| sah-heartai-snet-aromaster-prod-aue-002 | Azure Red Hat OpenShift master nodes | 10.X.X.128/27 | 10.X.X.128 - 10.X.X.159 | 10.X.X.129 - 10.X.X.158 | 30 |
| | Subnet range unassigned | 10.X.X.160/27 | 10.X.X.160 - 10.X.X.191 | 10.X.X.161 - 10.X.X.190 | 30 |
| sah-heartai-snet-paas-prod-aue-004 | Azure PaaS endpoints | 10.X.X.192/26 | 10.X.X.192 - 10.X.X.255 | 10.X.X.193 - 10.X.X.254 | 62 |
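
The address ranges and host counts in the subnetwork table can be reproduced with Python's standard `ipaddress` module. Because the real address space is redacted as 10.X.X.0/24, the sketch below substitutes the placeholder 10.0.0.0/24; the short subnet labels and the `describe` helper are likewise illustrative rather than part of the HeartAI configuration.

```python
import ipaddress

# Placeholder only: the real address space is redacted as 10.X.X.0/24.
vnet = ipaddress.ip_network("10.0.0.0/24")

# Short labels are illustrative; prefix lengths follow the subnetwork table.
subnets = {
    "aroworker": ipaddress.ip_network("10.0.0.0/25"),
    "aromaster": ipaddress.ip_network("10.0.0.128/27"),
    "unassigned": ipaddress.ip_network("10.0.0.160/27"),
    "paas": ipaddress.ip_network("10.0.0.192/26"),
}


def describe(subnet):
    """Return (first address, last address, first host, last host, host count)."""
    hosts = subnet.num_addresses - 2  # exclude network and broadcast addresses
    first_host = subnet.network_address + 1
    last_host = subnet.broadcast_address - 1
    return (
        str(subnet.network_address),
        str(subnet.broadcast_address),
        str(first_host),
        str(last_host),
        hosts,
    )


for label, subnet in subnets.items():
    # Every subnet must sit inside the virtual network address space.
    assert subnet.subnet_of(vnet)
    print(label, describe(subnet))

# The four ranges together account for every address in the /24.
print(sum(s.num_addresses for s in subnets.values()) == vnet.num_addresses)
```

The host counts here follow the conventional calculation, which excludes only the network and broadcast addresses. Azure additionally reserves addresses within each subnet for platform use, so the number of addresses assignable to resources is slightly lower than these figures.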

Azure Private Link

Azure Private Link provides a way to interface securely with Azure cloud services, such as Azure Key Vault, Azure Database for PostgreSQL, and Azure Cosmos DB. Through Private Link, the endpoint addresses of these services extend onto the HeartAI virtual network and are routable entirely through the Microsoft backbone network.

Azure Key Vault

Azure Key Vault provides a secure store for sensitive data such as cryptographic keys and configuration secrets. These sensitive values can be injected into system environments by calling Key Vault and retrieving the corresponding data. The following features support the HeartAI system:

  • Applications have no direct access to keys.
  • Encryption keys may be created and imported within minutes.
  • Highly available with 99.9% availability.
  • Transaction processing within 5 seconds.

The HeartAI system provisions the following Azure Key Vault resources:

| Resource | Specification |
| --- | --- |
| Region | Australia East |
| Operations | 10,000 / month |
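
The pattern of calling Key Vault and injecting the retrieved value into an environment might look like the following Python sketch. It assumes the `azure-identity` and `azure-keyvault-secrets` packages; the helper names `build_client` and `inject_secret`, the secret name, and the environment variable are hypothetical and not taken from the HeartAI codebase.

```python
import os


def build_client(vault_url):
    """Create a Key Vault secret client.

    Requires the azure-identity and azure-keyvault-secrets packages, which are
    imported lazily so the rest of this sketch runs without them.
    """
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())


def inject_secret(client, secret_name, env_var):
    """Read one secret and expose it to the current process as an environment variable."""
    secret = client.get_secret(secret_name)
    os.environ[env_var] = secret.value
    return env_var


# Example usage (placeholder vault URL, hypothetical secret name):
# client = build_client("https://<vault-name>.vault.azure.net")
# inject_secret(client, "database-password", "DATABASE_PASSWORD")
```

Passing the client in as an argument keeps the retrieval logic independent of how credentials are obtained, which also makes it straightforward to exercise with a stand-in client in tests.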

Azure Database for PostgreSQL

Azure Database for PostgreSQL provides a fully-managed PostgreSQL database service. The following features support the HeartAI system:

  • Highly available with 99.99% availability.
  • Data redundancy with 3x replication.

The HeartAI production environment provisions the following Azure Database for PostgreSQL instance:

| Resource | Specification |
| --- | --- |
| Region | Australia East |
| Database option | Single server |
| Tier | General purpose |
| Compute | Gen 5, 2 vCPU |
| Service usage | 730 hours / month |
| Storage | 50 GB |
| Redundancy | Geo-redundant storage |
| Savings options | Pay as you go |

Azure Insights

Azure Insights is an application-level telemetry tool that integrates natively with Azure Monitor to monitor and observe application components. This is achieved by embedding a Microsoft-provided extension into an application written in a supported language. The extension collects a range of telemetry metrics and forwards them to Azure Monitor, where they may subsequently be analysed and visualised.

For HeartAI instances of Red Hat OpenShift, Azure Insights provides Azure Container Insights, which runs on every container host machine in the cluster and forwards cluster container logs and metrics to the receiving Azure Monitor instance. Container Insights collects processor and memory metrics from the cluster, nodes, controllers, and containers through the OpenShift monitoring API. Logs from stdout and stderr are collected from all cluster containers. An integrated instance of Azure Storage and Azure Log Analytics collects and persists these metrics, which are then available for Azure Monitor to process for reporting and visualisation.

Example: Azure Insights UI for OpenShift cluster

The following image shows the Azure Insights web interface for cluster monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of the OpenShift cluster, with information describing the current cluster resource utilisation. The Azure Insights cluster web interface provides:

  • An overview of OpenShift cluster resource utilisation, including:
    • Node CPU utilisation.
    • Node memory utilisation.
    • Node count.
    • Active pod count.

heartai-azure-monitor-openshift-cluster.png

Example: Azure Insights UI for OpenShift Nodes

The following image shows the Azure Insights web interface for Node monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster Nodes, describing the status and resource utilisation of cluster Nodes. The Azure Insights nodes web interface provides:

  • A tabled report describing OpenShift cluster Nodes, including:
    • Node names.
    • Node health status.
    • Node CPU utilisation.
    • Active container deployment count.
    • Node uptime.

heartai-azure-monitor-openshift-nodes.png

Example: Azure Insights UI for OpenShift controllers

The following image shows the Azure Insights web interface for controller monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster controllers, describing the status and resource utilisation of cluster controllers. The Azure Insights controllers web interface provides:

  • A tabled report describing OpenShift cluster controllers, including:
    • Controller names.
    • Controller health status.
    • Controller CPU utilisation.
    • Active container deployment count.
    • Container deployment restart count.
    • Container uptime.

heartai-azure-monitor-openshift-controllers.png

Example: Azure Insights UI for OpenShift containers

The following image shows the Azure Insights web interface for container monitoring and logging for a HeartAI instance of Red Hat OpenShift. This interface provides an overview of OpenShift cluster containers, describing the status and resource utilisation of cluster containers. The Azure Insights containers web interface provides:

  • A tabled report describing OpenShift cluster containers, including:
    • Container names.
    • Container health status.
    • Container CPU utilisation.
    • Pod assignments.
    • Node assignments.
    • Container deployment restart count.
    • Container uptime.

heartai-azure-monitor-openshift-containers.png

Example: Azure Container Insights collected metrics

For HeartAI instances of Red Hat OpenShift that are integrated with Azure Container Insights, the following metrics are collected:

| Azure namespace | Metric | Description |
| --- | --- | --- |
| Insights.container/nodes | cpuUsageMillicores | CPU utilization in millicores by host. |
| Insights.container/nodes | cpuUsagePercentage | CPU usage percentage by node. |
| Insights.container/nodes | memoryRssBytes | Memory RSS utilization in bytes by host. |
| Insights.container/nodes | memoryRssPercentage | Memory RSS usage percentage by host. |
| Insights.container/nodes | memoryWorkingSetBytes | Memory Working Set utilization in bytes by host. |
| Insights.container/nodes | memoryWorkingSetPercentage | Memory Working Set usage percentage by host. |
| Insights.container/nodes | nodesCount | Count of nodes by status. |
| Insights.container/nodes | diskUsedPercentage | Percentage of disk used on the node by device. |
| Insights.container/pods | podCount | Count of pods by controller, namespace, node, and phase. |
| Insights.container/pods | completedJobsCount | Count of completed jobs older than the user-configurable threshold (default is six hours) by controller, Kubernetes namespace. |
| Insights.container/pods | restartingContainerCount | Count of container restarts by controller, Kubernetes namespace. |
| Insights.container/pods | oomKilledContainerCount | Count of OOMkilled containers by controller, Kubernetes namespace. |
| Insights.container/pods | podReadyPercentage | Percentage of pods in ready state by controller, Kubernetes namespace. |
| Insights.container/containers | cpuExceededPercentage | CPU utilization percentage for containers exceeding the user-configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, pod name. |
| Insights.container/containers | memoryRssExceededPercentage | Memory RSS percentage for containers exceeding the user-configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, pod name. |
| Insights.container/containers | memoryWorkingSetExceededPercentage | Memory Working Set percentage for containers exceeding the user-configurable threshold (default is 95.0) by container name, controller name, Kubernetes namespace, pod name. |
| Insights.container/persistentvolumes | pvUsageExceededPercentage | PV utilization percentage for persistent volumes exceeding the user-configurable threshold (default is 60.0) by claim name, Kubernetes namespace, volume name, pod name, and node name. |

Example: Azure Monitor Action rules for Azure Container Insights

For HeartAI instances of Red Hat OpenShift that are integrated with Azure Container Insights, Azure Monitor Action rules provide alerting functionality to HeartAI administrators and developers.

The following alerting rules are configured:

| Alert name | Description | Trigger |
| --- | --- | --- |
| Average container CPU % | Calculates average CPU used per container. | When average CPU usage per container is greater than 95%. |
| Average container working set memory % | Calculates average working set memory used per container. | When average working set memory usage per container is greater than 95%. |
| Average CPU % | Calculates average CPU used per node. | When average node CPU utilization is greater than 80%. |
| Average Disk Usage % | Calculates average disk usage for a node. | When disk usage for a node is greater than 80%. |
| Average Persistent Volume Usage % | Calculates average PV usage per pod. | When average PV usage per pod is greater than 80%. |
| Average Working set memory % | Calculates average working set memory for a node. | When average working set memory for a node is greater than 80%. |
| Restarting container count | Calculates number of restarting containers. | When the number of container restarts is greater than 0. |
| Failed Pod Counts | Calculates if any pod is in a failed state. | When the number of pods in a failed state is greater than 0. |
| Node NotReady status | Calculates if any node is in NotReady state. | When the number of nodes in NotReady state is greater than 0. |
| OOM Killed Containers | Calculates number of OOM killed containers. | When the number of OOM killed containers is greater than 0. |
| Pods ready % | Calculates the average ready state of pods. | When the ready state of pods is less than 80%. |
| Completed job count | Calculates number of jobs completed more than six hours ago. | When the number of stale jobs older than six hours is greater than 0. |
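
To make the trigger conditions concrete, the following Python sketch encodes the thresholds from the table above and evaluates them against a sample of metric values. It is not how Azure Monitor evaluates rules: the metric keys, the `triggered_alerts` helper, and the sample values are hypothetical and exist only for illustration.

```python
import operator

# (alert name, metric key, comparison, threshold); names and thresholds come
# from the alerting table, metric keys are hypothetical.
ALERT_RULES = [
    ("Average container CPU %", "container_cpu_pct", operator.gt, 95),
    ("Average container working set memory %", "container_working_set_pct", operator.gt, 95),
    ("Average CPU %", "node_cpu_pct", operator.gt, 80),
    ("Average Disk Usage %", "node_disk_pct", operator.gt, 80),
    ("Average Persistent Volume Usage %", "pv_usage_pct", operator.gt, 80),
    ("Average Working set memory %", "node_working_set_pct", operator.gt, 80),
    ("Restarting container count", "restarting_containers", operator.gt, 0),
    ("Failed Pod Counts", "failed_pods", operator.gt, 0),
    ("Node NotReady status", "not_ready_nodes", operator.gt, 0),
    ("OOM Killed Containers", "oom_killed_containers", operator.gt, 0),
    ("Pods ready %", "pods_ready_pct", operator.lt, 80),
    ("Completed job count", "stale_completed_jobs", operator.gt, 0),
]


def triggered_alerts(metrics):
    """Return the names of rules whose condition holds; absent metrics are skipped."""
    return [
        name
        for name, key, compare, threshold in ALERT_RULES
        if key in metrics and compare(metrics[key], threshold)
    ]


# Hypothetical sample values.
sample = {
    "node_cpu_pct": 86.5,
    "pods_ready_pct": 92.0,
    "restarting_containers": 2,
    "failed_pods": 0,
}
print(triggered_alerts(sample))
```

With this sample, only the node CPU rule and the restarting container rule fire. Note that "Pods ready %" is the one rule that triggers when a value falls below its threshold rather than above it.
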

Azure Log Analytics workspaces

azure-sah-heartai-aro-prod-aue-001-insights-6.png

Azure Sentinel

Log Analytics workspaces are aggregated with Azure Sentinel, which provides an integrated security information and event management (SIEM) platform.

Azure Sentinel provides:

  • Real-time collection of Azure resource event data.
  • Event-driven alerting and pattern detection.
  • Detection of abnormal or suspicious event behaviour.
  • Visualisations of events and alerts.
  • Analysis of anomalous activity.
  • Geolocation detection for event behaviour patterns.

heartai-azure-sentinel.png

Azure Red Hat OpenShift

Microsoft Azure Red Hat OpenShift provides a fully managed Red Hat OpenShift service on Microsoft Azure. The following features support the HeartAI system:

  • Fully managed Red Hat OpenShift cluster.
  • Fully managed infrastructure for master and worker nodes.
  • Enhanced security with integration through Azure Active Directory.
  • Highly available with 99.95% availability.
  • Jointly engineered and operated by Microsoft and Red Hat.
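
As a worked example of what the availability figures quoted in this document mean in practice, the following Python sketch converts each percentage into a maximum downtime per month. It assumes the 730-hour month used in the Azure Database for PostgreSQL specification; the `max_downtime_minutes` helper is illustrative and says nothing about how the corresponding service level agreements are actually measured.

```python
HOURS_PER_MONTH = 730  # month length used in the PostgreSQL specification table

# Availability percentages as quoted in this document.
AVAILABILITY_TARGETS = {
    "Azure Key Vault": 99.9,
    "Azure Database for PostgreSQL": 99.99,
    "Azure Red Hat OpenShift": 99.95,
}


def max_downtime_minutes(availability_pct, hours=HOURS_PER_MONTH):
    """Minutes of downtime per period that still meet the availability percentage."""
    return (100 - availability_pct) / 100 * hours * 60


for service, pct in AVAILABILITY_TARGETS.items():
    print(f"{service}: {max_downtime_minutes(pct):.1f} minutes per month")
```

Under these assumptions, 99.9% corresponds to about 43.8 minutes per month, 99.95% to about 21.9 minutes, and 99.99% to about 4.4 minutes.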

Red Hat OpenShift implementation

HeartAI orchestrates system services with the Kubernetes-based Red Hat OpenShift container platform. Further information about the HeartAI implementation of Red Hat OpenShift may be found in the HeartAI Red Hat OpenShift documentation.

Example: Azure Portal for Resource Group of Red Hat OpenShift resources

The following image shows the Azure Portal view of the aro-u541ij0x Resource Group, which is dynamically generated alongside a deployment of Microsoft Azure Red Hat OpenShift. The resources in this Resource Group are managed by Azure.

azure-sah-heartai-arorg-prod-aue-001.png

Azure deployment overview

The following figure shows a structural overview of Microsoft Azure cloud resources within a HeartAI production environment instance. This figure represents:

  • A corresponding Azure vWAN hub, including:
    • An Azure ExpressRoute as an example of external network connectivity.
    • An Azure Virtual WAN instance.
    • Network peering between a HeartAI Azure Virtual Network instance and a corresponding Azure vWAN hub.
  • A HeartAI Azure Virtual Network instance, with the following contained resources:
    • Azure Red Hat OpenShift Master nodes.
    • Azure Red Hat OpenShift Worker nodes.
    • Azure private endpoints, with internal and private network connectivity to Azure cloud services.
  • Azure cloud services.

heartai-azure-network-architecture.svg