Analytical capabilities

In-development

This documentation is currently in development. Please visit again soon; this section is actively updated.

Analytical capabilities overview

HeartAI extends its robust data systems with modern, high-performance analytical capabilities, including support for state-of-the-art probabilistic computation and machine learning methodologies. These capabilities span conventional data reporting and analytics through to real-time artificial intelligence and learning systems, supporting real-time prediction, decision support, and optimisation.

For probabilistic computation, HeartAI implements Stan, a powerful probabilistic programming language and high-performance statistical computation library. Stan provides extensive support for probabilistic programming constructs and modern Markov chain Monte Carlo (MCMC) sampling methods, including Hamiltonian Monte Carlo (HMC) and the no-U-turn sampler (NUTS).

For machine learning methodologies, HeartAI implements the XGBoost regularising gradient boosting framework, and further extends to modern deep learning approaches using PyTorch with the Python programming language.
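
As an indicative example, the following sketch trains a small XGBoost regression model through its Python interface. This is a minimal illustration on synthetic data, not a HeartAI pipeline; all names and hyperparameter values are assumptions.

# A minimal sketch of regularising gradient boosting with XGBoost,
# on synthetic data. All hyperparameter values are illustrative.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)

# Synthetic regression task: 1000 observations, 10 features.
X = rng.normal(size=(1000, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# reg_lambda is the L2 regularisation term on leaf weights.
model = xgb.XGBRegressor(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    reg_lambda=1.0,
)
model.fit(X_train, y_train)

print(model.predict(X_test[:5]))

Deep learning models with PyTorch follow an analogous train-and-predict workflow.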

Stan implementation

Stan is a powerful probabilistic programming language and high-performance probabilistic computation library, with support for:

  • Robust and mature probabilistic programming language constructs.
  • High-performance mathematical computation libraries.
  • Markov chain Monte Carlo (MCMC) sampling methods, including Hamiltonian Monte Carlo (HMC) and the no-U-turn sampler (NUTS).
  • Bayesian inference.
  • Variational inference.
  • Interfaces to data and analysis languages (R, Python, shell, MATLAB, Julia, Stata), as shown in the sketch after this list.
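
As an indicative example, the CmdStanPy Python interface can compile and sample a Stan program in a few lines. The sketch below uses the bernoulli example model that ships with CmdStan; sampling uses the NUTS-HMC sampler by default.

# A minimal sketch of sampling with the CmdStanPy Python interface.
# "bernoulli.stan" and its data follow the example shipped with CmdStan.
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="bernoulli.stan")
fit = model.sample(
    data={"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]},
    chains=4,
    seed=123,
)  # Uses the NUTS-HMC sampler by default.
print(fit.summary())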

Stan-supported probability distributions

Stan-supported probabilistic models

Service activity models

Example: SA Virtual Care Service activity model
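
The model below combines a day-level Gaussian process with an hour-level Gaussian process, using a periodic kernel over hours, to model hourly visit counts through a negative-binomial observation model. Per-visit factors (age, gender, complaint, acuity, VCS stream, and LHN catchment) are modelled with categorical distributions under Dirichlet priors, so that synthetic visits can be generated from the posterior.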

functions {
  matrix kern_quad(matrix dist_sqr, real tau, real lengthscale) {
    return square(tau) * exp(-0.5 * inv_square(lengthscale) * dist_sqr);
  }
}

data {
  // Time-series to model hourly-counts.
  int D;  // Number of days. First day is 1, last day is D.
  int H;  // Number of hours. First hour idx is at 1, last is at H.
  array[D,H] int hourly_count;

  // The density of these is modelled so that we can generate visits from the posterior.
  int N;  // Number of visits.
  int A;  // Number of histogrammed Age levels.
  int G;  // Number of Gender levels.
  int C;  // Number of Complaint levels.
  int U;  // Number of Acuity levels.
  int S;  // Number of VCS stream levels.
  int L;  // Number of LHN catchment levels.
  array[N] int<lower=1, upper=A> age_idx;
  array[N] int<lower=1, upper=G> gender_idx;
  array[N] int<lower=1, upper=C> complaint_idx;
  array[N] int<lower=1, upper=U> acuity_idx;
  array[N] int<lower=1, upper=S> vcs_stream_idx;
  array[N] int<lower=1, upper=L> lhn_catchment_idx;

  // Independent noise added to diagonal of kernel matrix to make it numerically stable.
  real<lower=0> nugget;
}

transformed data {
  // Nugget matrices for numerical stability.
  matrix[D,D] nug_d = nugget * diag_matrix(rep_vector(1, D));
  matrix[H,H] nug_h = nugget * diag_matrix(rep_vector(1, H));

  vector[D] x_days = linspaced_vector(D, 1, D);
  vector[H] x_hours = linspaced_vector(H, 1, H);

  // Matrix versions of distances for kernel input, with periodic hour setup in order to use the kern_quad function.
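  // Squaring the sin-transformed distance and passing it through kern_quad
  // gives tau^2 * exp(-0.5 * sin(pi * |h - h'| / 24)^2 / lengthscale^2),
  // a periodic kernel with a 24-hour period.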
  matrix[D,D] dist_d =                 abs(rep_matrix(x_days , D) - rep_matrix(x_days' , D));
  matrix[H,H] dist_h = sin(pi() / 24 * abs(rep_matrix(x_hours, H) - rep_matrix(x_hours', H)));

  matrix[D,D] distsqr_d = dist_d .* dist_d;
  matrix[H,H] distsqr_h = dist_h .* dist_h;
}

parameters {

  // GP normal(0, 1) prior noise for non-centred parameterisation.
  vector[D] z_d;
  vector[H] z_h;

  // GP length-scales.
  real<lower=0> lambda_d;
  real<lower=0> lambda_h;

  // GP standard deviations. Only for the day GP, so that the rate is not overdetermined.
  real mu_d;           // Constant mean offset.
  real<lower=0> tau_d; // Root-variance for the day GP, in units of log-count per hour.

  real<lower=0> phi;  // Neg-binom overdispersion.

  // Model density of factors.
  simplex[A] age_theta;
  simplex[G] gender_theta;
  simplex[C] complaint_theta;
  simplex[U] acuity_theta;
  simplex[S] vcs_stream_theta;
  simplex[L] lhn_catchment_theta;
}

transformed parameters {
  vector[D] alpha_d;
  vector[H] alpha_h;

  {
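    // Non-centred parameterisation: multiplying standard-normal noise z by
    // the Cholesky factor of the kernel matrix yields draws with the intended
    // GP covariance, while presenting NUTS with better-conditioned geometry.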
    matrix[D, D] kern_chol_d = cholesky_decompose(kern_quad(distsqr_d, tau_d, lambda_d) + nug_d);
    matrix[H, H] kern_chol_h = cholesky_decompose(kern_quad(distsqr_h, 1.0, lambda_h) + nug_h);
  
    alpha_d = mu_d + (kern_chol_d * z_d);
    alpha_h =         kern_chol_h * z_h;
  }
}

model {

  z_d ~ normal(0, 1);
  z_h ~ normal(0, 1);

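  // Length-scale priors: gamma(14, 2) has mean 7 (days); gamma(2, 1) has mean 2 (hours).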
  lambda_d ~ gamma(14, 2);
  lambda_h ~ gamma(2, 1);

  mu_d ~ normal(0, 1);
  tau_d ~ gamma(2, 1);

  phi ~ exponential(1);
  
  for (di in 1:D) {
    hourly_count[di] ~ neg_binomial_2_log(alpha_d[di] + alpha_h, phi);
  }

  age_theta ~ dirichlet(rep_vector(2, A));
  gender_theta ~ dirichlet(rep_vector(2, G));
  complaint_theta ~ dirichlet(rep_vector(2, C));
  acuity_theta ~ dirichlet(rep_vector(2, U));
  vcs_stream_theta ~ dirichlet(rep_vector(2, S));
  lhn_catchment_theta ~ dirichlet(rep_vector(2, L));

  // Per-visit factor likelihoods. The categorical distribution is
  // vectorised over arrays of outcomes, so no loop is needed.
  age_idx ~ categorical(age_theta);
  gender_idx ~ categorical(gender_theta);
  complaint_idx ~ categorical(complaint_theta);
  acuity_idx ~ categorical(acuity_theta);
  vcs_stream_idx ~ categorical(vcs_stream_theta);
  lhn_catchment_idx ~ categorical(lhn_catchment_theta);

}
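
As an indicative usage sketch, the model above can be fitted with the CmdStanPy interface. The file name vcs_activity.stan, the synthetic counts, and the factor level counts below are illustrative assumptions; in practice the data dictionary would be assembled from SA Virtual Care Service records.

# A minimal sketch of fitting the model with CmdStanPy. The file name
# "vcs_activity.stan" and all synthetic inputs are illustrative only.
import numpy as np
from cmdstanpy import CmdStanModel

rng = np.random.default_rng(seed=1)

D, H = 28, 24  # Four weeks of hourly counts.
N = 500        # Number of visits.
A, G, C, U, S, L = 10, 3, 20, 5, 4, 10  # Hypothetical factor level counts.

data = {
    "D": D, "H": H,
    "hourly_count": rng.poisson(5, size=(D, H)).tolist(),
    "N": N, "A": A, "G": G, "C": C, "U": U, "S": S, "L": L,
    "age_idx": rng.integers(1, A + 1, size=N).tolist(),
    "gender_idx": rng.integers(1, G + 1, size=N).tolist(),
    "complaint_idx": rng.integers(1, C + 1, size=N).tolist(),
    "acuity_idx": rng.integers(1, U + 1, size=N).tolist(),
    "vcs_stream_idx": rng.integers(1, S + 1, size=N).tolist(),
    "lhn_catchment_idx": rng.integers(1, L + 1, size=N).tolist(),
    "nugget": 1e-6,
}

model = CmdStanModel(stan_file="vcs_activity.stan")
fit = model.sample(data=data, chains=4, iter_warmup=1000, iter_sampling=1000)
print(fit.summary())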