# Analytical capabilities

This documentation is currently in development and is actively updated. Please visit again soon.

Evolving approaches in probabilistic programming and computational statistics offer powerful methods for defining and building analytical systems. These methods can represent probabilistic models under a variety of conditions and constraints and, because the models are generative, can simulate new data from them. HeartAI deployments support inference from a Bayesian perspective, for example by generating samples from a prior or posterior predictive distribution, optionally after conditioning on or marginalising over some of its variables. Beyond providing a representative probabilistic construct, these models can generate new data under a variety of assumptions and hypothetical situations, and support the prediction and forecasting of future events and potential outcomes.
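As a minimal illustration of prior and posterior predictive sampling, the following Python sketch uses a conjugate Beta-Binomial model. It is not HeartAI code: the function names, the uniform Beta(1, 1) prior, and the observed counts are assumptions chosen only to keep the example small and self-contained.

```python
import random


def sample_predictive(n_trials, a, b, rng):
    """Draw one success count: theta ~ Beta(a, b), then y ~ Binomial(n_trials, theta)."""
    theta = rng.betavariate(a, b)
    return sum(1 for _ in range(n_trials) if rng.random() < theta)


def posterior_params(a, b, successes, failures):
    """Conjugate update: a Beta(a, b) prior with binomial data gives a Beta posterior."""
    return a + successes, b + failures


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Prior predictive: simulate data before seeing any observations.
prior_draws = [sample_predictive(10, 1.0, 1.0, rng) for _ in range(5000)]

# Posterior predictive: condition on 18 successes and 2 failures (made-up data).
a_post, b_post = posterior_params(1.0, 1.0, successes=18, failures=2)
post_draws = [sample_predictive(10, a_post, b_post, rng) for _ in range(5000)]

prior_mean = sum(prior_draws) / len(prior_draws)
post_mean = sum(post_draws) / len(post_draws)
print(f"prior predictive mean: {prior_mean:.2f}")
print(f"posterior predictive mean: {post_mean:.2f}")
```

The same two-step pattern (draw parameters, then draw data given those parameters) applies to models without a closed-form posterior; in that case the parameter draws would come from an MCMC sampler rather than from a conjugate update.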

## Analytical capabilities overview

HeartAI extends robust data systems with modern and high-performance analytical capabilities, including support for state-of-the-art probabilistic computation and machine learning methodologies. These capabilities range from conventional data reporting and analytics through to real-time artificial intelligence and learning systems, including real-time prediction, decision support, and optimisation.

For probabilistic computation, HeartAI uses Stan, a powerful probabilistic programming language and high-performance statistical computation library. Stan provides extensive support for probabilistic programming constructs and modern Markov chain Monte Carlo (MCMC) sampling methods, including Hamiltonian Monte Carlo (HMC) and the No-U-Turn Sampler (NUTS).
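To give a feel for how Hamiltonian Monte Carlo proposes new states, the following Python sketch runs a single-variable HMC chain on a standard normal target. It is a teaching sketch under stated assumptions, not Stan's implementation: the trajectory length is fixed (NUTS chooses it adaptively), there is no step-size adaptation, and the step size, number of leapfrog steps, and starting point are arbitrary.

```python
import math
import random


def potential(q):
    """Negative log density of a standard normal target, up to a constant."""
    return 0.5 * q * q


def grad_potential(q):
    """Gradient of the potential with respect to q."""
    return q


def hmc_step(q, step_size, n_leapfrog, rng):
    """One HMC transition: sample a momentum, integrate, then accept or reject."""
    p = rng.gauss(0.0, 1.0)
    q_new, p_new = q, p
    # Leapfrog integration: half momentum step, full position steps, half momentum step.
    p_new -= 0.5 * step_size * grad_potential(q_new)
    for step in range(n_leapfrog):
        q_new += step_size * p_new
        if step < n_leapfrog - 1:
            p_new -= step_size * grad_potential(q_new)
    p_new -= 0.5 * step_size * grad_potential(q_new)
    # Metropolis correction based on the change in total energy.
    h_old = potential(q) + 0.5 * p * p
    h_new = potential(q_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q


rng = random.Random(1)
q = 3.0  # deliberately poor starting point
samples = []
for i in range(5200):
    q = hmc_step(q, step_size=0.2, n_leapfrog=8, rng=rng)
    if i >= 200:  # discard warm-up iterations
        samples.append(q)

hmc_mean = sum(samples) / len(samples)
hmc_var = sum((s - hmc_mean) ** 2 for s in samples) / len(samples)
print(f"mean: {hmc_mean:.3f}, variance: {hmc_var:.3f}")
```

In Stan, the gradient is obtained by automatic differentiation of the model's log density, so the user only writes the model.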

For machine learning, HeartAI uses XGBoost for regularised gradient boosting, and extends this with modern deep learning approaches using PyTorch with the Python programming language.

## Stan implementation

Stan is a powerful probabilistic programming language and high-performance probabilistic computation library, with support for:

- Robust and mature probabilistic programming language constructs.
- High-performance mathematical computation libraries.
- Markov chain Monte Carlo (MCMC) sampling methods.
- Bayesian inference.
- Variational inference.
- Interfaces to data and analysis languages (R, Python, shell, MATLAB, Julia, Stata).

## Stan supported probability distributions

- Binary probability distributions
- Bounded discrete probability distributions
- Unbounded discrete probability distributions
- Unbounded continuous probability distributions
  - Cauchy
  - Double exponential (Laplace)
  - Double exponential (Laplace), skew
  - Gaussian process
  - Gaussian process, Cholesky parameterisation
  - Gumbel
  - Logistic
  - Multivariate normal
  - Multivariate normal, Cholesky parameterisation
  - Multivariate normal, precision parameterisation
  - Multivariate Student’s t
  - Normal
  - Normal, exponentially modified
  - Normal, skew
  - Student’s t
- Bounded continuous probability distributions
- Positive continuous probability distributions
- [0, 1]-bounded continuous probability distributions
- Correlation probability distributions
- Covariance probability distributions

## Stan supported probabilistic models

- Binary probabilistic models
- Bounded discrete probabilistic models
- Unbounded discrete probabilistic models
- Continuous models

## Service activity models

### Example: SA Virtual Care Service activity model

To support the SA Virtual Care Service (SAVCS), HeartAI has developed a variety of analytical capabilities that model length-of-service for individual patients and the overall status of the service. This includes models of admissions to the service over time, covering the patient population of admissions, the likelihood of potential outcomes for the individual patient, and the length-of-service for each episode of care. These models are generated in real time, are continuously updated as more patient information is determined or observed, and are accessible through an integrated user application interface.

The following example shows a Stan model for SAVCS activity:

```
functions {
  // Squared-exponential (quadratic) kernel evaluated on squared distances.
  matrix kern_quad(matrix dist_sqr, real tau, real lengthscale) {
    return square(tau) * exp(-0.5 * inv_square(lengthscale) * dist_sqr);
  }
}
data {
  // Time-series to model hourly-counts.
  int D; // Number of days. First day is 1, last day is D.
  int H; // Number of hours. First hour idx is at 1, last is at H.
  array[D, H] int hourly_count;
  // The density of these is modelled so that we can generate visits from the posterior.
  int N; // Number of visits.
  int A; // Number of histogrammed Age levels.
  int G; // Number of Gender levels.
  int C; // Number of Complaint levels.
  int U; // Number of Acuity levels.
  int S; // Number of VCS stream levels.
  int L; // Number of LHN catchment levels.
  array[N] int<lower=1, upper=A> age_idx;
  array[N] int<lower=1, upper=G> gender_idx;
  array[N] int<lower=1, upper=C> complaint_idx;
  array[N] int<lower=1, upper=U> acuity_idx;
  array[N] int<lower=1, upper=S> vcs_stream_idx;
  array[N] int<lower=1, upper=L> lhn_catchment_idx;
  // Independent noise added to diagonal of kernel matrix to make it numerically stable.
  real<lower=0> nugget;
}
transformed data {
  // Nugget matrices for numerical stability.
  matrix[D, D] nug_d = nugget * diag_matrix(rep_vector(1, D));
  matrix[H, H] nug_h = nugget * diag_matrix(rep_vector(1, H));
  vector[D] x_days = linspaced_vector(D, 1, D);
  vector[H] x_hours = linspaced_vector(H, 1, H);
  // Matrix versions of distances for kernel input, with periodic hour setup in order to use the kern_quad function.
  matrix[D, D] dist_d = fabs(rep_matrix(x_days, D) - rep_matrix(x_days', D));
  matrix[H, H] dist_h = sin(pi() / 24 * fabs(rep_matrix(x_hours, H) - rep_matrix(x_hours', H)));
  matrix[D, D] distsqr_d = dist_d .* dist_d;
  matrix[H, H] distsqr_h = dist_h .* dist_h;
}
parameters {
  // GP normal(0, 1) prior noise for non-centred parameterisation.
  vector[D] z_d;
  vector[H] z_h;
  // GP length-scales.
  real<lower=0> lambda_d;
  real<lower=0> lambda_h;
  // GP standard deviations. Only for the day GP, so that the rate is not overdetermined.
  real mu_d; // Constant mean offset.
  real<lower=0> tau_d; // Root-variance for the day GP, in units of log-count per hour. Positive, to match its gamma prior.
  real<lower=0> phi; // Neg-binom overdispersion.
  // Model density of factors.
  simplex[A] age_theta;
  simplex[G] gender_theta;
  simplex[C] complaint_theta;
  simplex[U] acuity_theta;
  simplex[S] vcs_stream_theta;
  simplex[L] lhn_catchment_theta;
}
transformed parameters {
  vector[D] alpha_d;
  vector[H] alpha_h;
  {
    // Cholesky factors of the kernel matrices turn the normal(0, 1) noise into GP draws.
    matrix[D, D] kern_chol_d = cholesky_decompose(kern_quad(distsqr_d, tau_d, lambda_d) + nug_d);
    matrix[H, H] kern_chol_h = cholesky_decompose(kern_quad(distsqr_h, 1.0, lambda_h) + nug_h);
    alpha_d = mu_d + (kern_chol_d * z_d);
    alpha_h = kern_chol_h * z_h;
  }
}
model {
  z_d ~ normal(0, 1);
  z_h ~ normal(0, 1);
  lambda_d ~ gamma(14, 2);
  lambda_h ~ gamma(2, 1);
  mu_d ~ normal(0, 1);
  tau_d ~ gamma(2, 1);
  phi ~ exponential(1);
  // Hourly counts: the log-rate is the day effect plus the hour-of-day effect.
  for (di in 1:D) {
    hourly_count[di] ~ neg_binomial_2_log(alpha_d[di] + alpha_h, phi);
  }
  age_theta ~ dirichlet(rep_vector(2, A));
  gender_theta ~ dirichlet(rep_vector(2, G));
  complaint_theta ~ dirichlet(rep_vector(2, C));
  acuity_theta ~ dirichlet(rep_vector(2, U));
  vcs_stream_theta ~ dirichlet(rep_vector(2, S));
  lhn_catchment_theta ~ dirichlet(rep_vector(2, L));
  // Per-visit factor likelihood.
  for (i in 1:N) {
    age_idx[i] ~ categorical(age_theta);
    gender_idx[i] ~ categorical(gender_theta);
    complaint_idx[i] ~ categorical(complaint_theta);
    acuity_idx[i] ~ categorical(acuity_theta);
    vcs_stream_idx[i] ~ categorical(vcs_stream_theta);
    lhn_catchment_idx[i] ~ categorical(lhn_catchment_theta);
  }
}
```
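The hour-of-day effect in the model above uses a periodic distance, so that the last hour of one day is treated as adjacent to the first hour of the next, and builds the Gaussian process draw with a non-centred parameterisation (a Cholesky factor multiplied by standard normal noise). The following Python sketch mirrors those two steps outside Stan. The length-scale and nugget values are assumptions for illustration, and the pure-Python Cholesky routine stands in for Stan's `cholesky_decompose`.

```python
import math
import random


def kern_quad(dist_sqr, tau, lengthscale):
    """Squared-exponential kernel on a squared distance, as in the Stan function."""
    return tau ** 2 * math.exp(-0.5 * dist_sqr / lengthscale ** 2)


def periodic_dist_sqr(h1, h2, period=24.0):
    """Squared periodic distance between two hours, matching dist_h in the model."""
    return math.sin(math.pi / period * abs(h1 - h2)) ** 2


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


H = 24
lengthscale = 2.0  # assumed value; the model infers lambda_h from data
nugget = 1e-6      # assumed value; supplied as data in the model

hours = range(1, H + 1)
kern_h = [
    [kern_quad(periodic_dist_sqr(i, j), 1.0, lengthscale) + (nugget if i == j else 0.0)
     for j in hours]
    for i in hours
]
chol_h = cholesky(kern_h)

# Non-centred draw: alpha_h = L * z with z ~ normal(0, 1).
rng = random.Random(2)
z_h = [rng.gauss(0.0, 1.0) for _ in range(H)]
alpha_h = [sum(chol_h[i][k] * z_h[k] for k in range(i + 1)) for i in range(H)]
print([round(a, 2) for a in alpha_h[:4]])
```

Because the distance is periodic, hours 1 and 24 are as strongly correlated as hours 1 and 2, which keeps the fitted daily profile continuous across midnight.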

Using a variety of model implementations, the following visualisations show SAVCS activity:

The following example shows HeartAI service demand forecasting for the SA Virtual Care Service. This figure shows recorded admissions to the service in white, the corresponding probabilistic modelling of historical admissions in blue, and forecasted admissions in orange. In addition to the modelling of service admissions, the probability distributions of several driving (predictive) features are shown, as well as the modelled service outcomes and service length-of-stay.
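One way to turn posterior draws of the log-rate into simulated admission counts is to sample from the same negative binomial likelihood the Stan model uses. The sketch below assumes the `neg_binomial_2_log` parameterisation (log-mean `eta`, overdispersion `phi`, variance `mu + mu^2 / phi`) and draws counts as a gamma-Poisson mixture. The helper names and the example values of `eta` and `phi` are illustrative, not taken from a fitted model.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; suitable for modest rates only."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_neg_binomial_2_log(eta, phi, rng):
    """Draw a count with mean exp(eta) and variance mu + mu^2 / phi."""
    mu = math.exp(eta)
    # Gamma-Poisson mixture: rate ~ Gamma(shape=phi, scale=mu / phi).
    rate = rng.gammavariate(phi, mu / phi)
    return sample_poisson(rate, rng)


rng = random.Random(3)
eta, phi = math.log(5.0), 2.0  # assumed values for a single hour
counts = [sample_neg_binomial_2_log(eta, phi, rng) for _ in range(20000)]

nb_mean = sum(counts) / len(counts)
nb_var = sum((c - nb_mean) ** 2 for c in counts) / len(counts)
print(f"mean: {nb_mean:.2f}, variance: {nb_var:.2f}")
```

Repeating this for every posterior draw of `alpha_d[di] + alpha_h` and `phi` gives a predictive distribution of counts, from which intervals can be summarised. Within Stan itself, the equivalent step would normally be written in a `generated quantities` block.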

In addition, these approaches allow an understanding of data relationships down to the level of individual variables. The following figure shows a subset of patient features across admissions to the SA Virtual Care Service. The empirical measures of these patient features are shown alongside the corresponding modelled posterior distributions.
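For the categorical patient factors in the Stan model above, each simplex parameter has a Dirichlet prior and appears only in its own categorical likelihood, so its posterior is available in closed form as a Dirichlet with the observed counts added to the prior concentration. The Python sketch below shows that update and one posterior draw. The index data are made up and the helper names are hypothetical.

```python
import random
from collections import Counter


def dirichlet_posterior_concentration(indices, n_levels, prior_conc=2.0):
    """Add observed level counts (1-based indices) to a symmetric Dirichlet prior."""
    counts = Counter(indices)
    return [prior_conc + counts.get(level, 0) for level in range(1, n_levels + 1)]


def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet by normalising independent gamma variates."""
    gammas = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(gammas)
    return [g / total for g in gammas]


gender_idx = [1, 2, 2, 1, 2, 3, 2, 2]  # made-up data with G = 3 levels
post_conc = dirichlet_posterior_concentration(gender_idx, n_levels=3)

# Posterior mean, and empirical proportions for comparison.
post_mean_theta = [c / sum(post_conc) for c in post_conc]
empirical = [gender_idx.count(level) / len(gender_idx) for level in (1, 2, 3)]

rng = random.Random(4)
theta_draw = sample_dirichlet(post_conc, rng)
print("posterior mean:", [round(p, 3) for p in post_mean_theta])
print("empirical:     ", [round(p, 3) for p in empirical])
```

With few observations the posterior mean is pulled towards equal proportions by the prior; as the number of visits grows, it approaches the empirical proportions.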