Bayesian Methods for Cumulative Probability Models and Population Pharmacokinetic Models

class: center, middle, inverse, title-slide

# Bayesian Methods for Cumulative Probability Models and Population Pharmacokinetic Models
## Vanderbilt University Department of Biostatistics
### Nathan T. James
### April 11, 2022

---

<!--

-->

<!-- outline

-- Bayes Inf & background (8 mins)
-- Bayes CPM (8 mins)
-- ADVI for Bayes pop PK (9 mins)
-- Dexmedetomidine (8 mins)
-- Conclusion/Thanks (5 mins)

-->

# Outline

- .font160[.bold[Bayesian Inference and Estimation]]

- .font160[.grey[Bayesian Cumulative Probability Models for Continuous and Mixed Outcomes]]

- .font160[.grey[Population Pharmacokinetic Modeling Using Automatic Differentiation Variational Inference]]

- .font160[.grey[Population Pharmacokinetic Analysis of Dexmedetomidine in Children using Real World Data from Electronic Health Records and Remnant Specimens]]

---

# Bayesian Inference: .font80[Benefits and Challenges]

.font170[

Benefits  
➕ Interpretation using posterior probabilities   
➕ Ability to formally incorporate prior information   
➕ Natural extension to hierarchical models
]

.font170[
Challenges  
➖ Determining appropriate prior distribution  
➖ Computationally intensive (except in a few special cases)
]

---

# Bayesian Inference: .font70[Background]

.font150[
Under the Bayesian paradigm we combine the `$\color{RoyalBlue}{\text{likelihood}}$` of observed data `$y$` with `$\color{orange}{\text{prior}}$` information to get a `$\color{LimeGreen}{\text{posterior}}$` for the parameters given the data.

`$$\color{LimeGreen}{p(\theta|y)}=\frac{\color{orange}{p(\theta)}\color{RoyalBlue}{p(y|\theta)}}{\int \color{orange}{p(\theta)} \color{RoyalBlue}{p(y|\theta)}\, d\theta}$$`
All inference is based on the `$\color{LimeGreen}{\text{posterior}}$` distribution of the parameters.

Usually can't calculate the posterior directly. Instead, we use different methods to get a numerical approximation of the posterior.
]

---

# Bayesian Inference:  .font80[Markov Chain Monte Carlo (MCMC)]

.font120[
Goal: Take random (i.e. Monte Carlo) draws from a sequence of parameter values (Markov Chain) that looks like the posterior distribution. 
]

.font110[
- Start with some initial value for the posterior parameters at the start of the chain.
]

.font110[
- Based on the current value, propose a guess for the next value in the chain.

]

.font110[
- Accept or reject the guess based on a prespecified rule. If the guess is accepted, use it as the next value in the chain. Otherwise, the next value in the chain is the same as the current value.

]

.font110[
- Repeat steps 2 & 3.
]

.font110[
The distribution used to propose the guesses and the prespecified rule are defined so that draws from the chain will *eventually* look like samples from the posterior.

Then we can take draws from the chain and use them to summarize the posterior distribution.

It can take a lot of iterations for the chain to look like the posterior.
]

---

# Bayesian Inference: .font80[MCMC example]

.font90[
MCMC with 500 warmup iterations, 1500 sampling iterations for parameter `$\theta$`  
]

---

# Bayesian Inference: .font80[Variational Inference (VI)]

.font120[
Goal: Find the variational family that has the smallest Kullback-Leibler (KL) divergence to the posterior and use it as an approximation.
]
--

.font110[
- Pick an easy to work with variational distribution family `$q(\theta;\phi)$` indexed by parameters `$\phi \in \Phi$`.

]

.font110[
- Find the value of `$\phi$` that maximizes `$E_{q}[\log p(y|\theta)]-KL(q(\theta;\phi)||p(\theta))$`. This term is called the **ELBO**.

]

.font110[
- Use an iterative optimization algorithm to find `$\phi^*=\underset{\phi \in \Phi}{\arg\max} \{ELBO\}$` and thus the best variational approximation `$q(\theta;\phi^*)$`
]

.font120[
Then we use the approximation `$q(\theta;\phi^*)$` to summarize the posterior distribution.

Optimization using VI is faster than MCMC, but is only an approximation to the true posterior.
]

---

# Bayesian Inference:  .font80[VI example]

![](fig/intro/advi_ex.gif)

---

# Outline

- .font160[.grey[Bayesian Inference and Estimation]]

- .font160[.bold[Bayesian Cumulative Probability Models for Continuous and Mixed Outcomes]]

- .font160[.grey[Population Pharmacokinetic Modeling Using Automatic Differentiation Variational Inference]]

- .font160[.grey[Population Pharmacokinetic Analysis of Dexmedetomidine in Children using Real World Data from Electronic Health Records and Remnant Specimens]]

---

# Cumulative Probability Models: .font70[Model specification]

.font150[
Let `$Y_i$` be the outcome for individual `$i=1, \ldots, n$`

Each `$Y_i$` falls into one of `$j=1,\ldots,J$` ordered categories  
]

.font150[
This can be modeled as `$Y_i \sim Multinomial(1,\boldsymbol{\pi_i})$`

where `$\boldsymbol{\pi_i}=(\pi_{i1},\ldots,\pi_{iJ})$` are the probabilities of individual `$i$` being in each of the `$J$` categories 
]

.font150[
Then the **cumulative probability** of falling into category `$j$` or lower is `$P(Y_i \le j)=\eta_{ij}=\sum_{k=1}^{j} \pi_{ik}$`
]

---

# Cumulative Probability Models: .font70[Likelihood]

.font120[
The CPM relates the cumulative probabilities to the covariates, `$\boldsymbol{x_i}$`, through a monotonically increasing link function `$G(\cdot)$` with inverse `$G^{-1}(\cdot)$`

`$$G(\eta_{ij})=\gamma_j-\boldsymbol{x_i^{T}\beta} \Longleftrightarrow  \eta_{ij}=G^{-1}(\gamma_j-\boldsymbol{x_i^{T}\beta})$$`

where the `$\gamma_j$` are latent continuous cutpoints `$-\infty \equiv \gamma_0 < \gamma_1 < \cdots < \gamma_{J-1} < \gamma_J \equiv \infty$`
]

.font120[
The likelihood for a random sample of observations `$(y_1,\ldots,y_n)$` is defined using differences in cumulative probabilities
`$$p(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{\gamma},\boldsymbol{\beta})=\prod_{j=1}^{J}\prod_{i:y_i=j} [G^{-1}(\gamma_j-\boldsymbol{x_i^{T}\beta}) - G^{-1}(\gamma_{j-1}-\boldsymbol{x_i^{T}\beta})]$$`

If is  `$G(\cdot)$` is the logit link `$G(p)=\log\left(\frac{p}{1-p}\right)$`, this is the likelihood for a proportional odds regression model

]

---

# Cumulative Probability Models: .font70[Continuous Outcomes]

.font140[
For continuous data with no ties `$J=n$`. Letting `$r(y_i)$` be the rank of `$y_i$` the `$\color{RoyalBlue}{\text{likelihood}}$` is
`$$\color{RoyalBlue}{p(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{\gamma},\boldsymbol{\beta})}=\prod_{i=1}^{n} [G^{-1}(\gamma_{r(y_i)}-\boldsymbol{x_i^{T}\beta}) - G^{-1}(\gamma_{r(y_i)-1}-\boldsymbol{x_i^{T}\beta})]$$`
]

--
.font140[
Why use a CPM for continuous outcomes?
]
--

.font130[
- Regression coefficients `$(\boldsymbol{\beta})$` are invariant to monotonic transformations of outcome
- Directly models full conditional cumulative distribution function (CDF)
- Handles any ordered outcomes including mixed continuous/discrete distributions (e.g., continuous outcome with lower limit of detection)
]

---

# Bayesian Cumulative Probability Models: .font70[Priors]

.font150[
To complete the Bayesian model specification we need to define `$\color{orange}{\text{priors}}$` for the unknown parameters `$\color{orange}{p(\boldsymbol{\beta},\boldsymbol{\gamma})}$`
]

.font150[
We assume a priori independence between `$\boldsymbol{\beta}$` and `$\boldsymbol{\gamma}$` so `$\color{orange}{p(\boldsymbol{\beta},\boldsymbol{\gamma})}=\color{orange}{p(\boldsymbol{\beta})p(\boldsymbol{\gamma})}$`

For convenience, we let the prior for the regression coefficients be non-informative `$\color{orange}{p(\boldsymbol{\beta})} \propto 1$` but other choices are possible
]

.font150[
Specifying a prior for the `$\boldsymbol{\gamma}$` directly is challenging because of the ordering constraint and high dimensionality

Instead, we specify a prior for a **transformation** of `$\boldsymbol{\gamma}$`
]

---

# Bayesian CPM: .font70[Priors]

.font150[
Let `$\pi_{\cdot j} \equiv Pr(r(y)=j|x=\boldsymbol{0})$` be the probability of being in category `$j$` when all the covariates are 0. From the previous definition
`$$\pi_{\cdot j}=G^{-1}(\gamma_j-0) - G^{-1}(\gamma_{j-1}-0)=G^{-1}(\gamma_j) - G^{-1}(\gamma_{j-1})$$`
]
--

.font150[
Conversely,
`$$\sum_{k=1}^{j}\pi_{\cdot k}=G^{-1}(\gamma_{j}) \quad \Rightarrow \quad G\left(\sum_{k=1}^{j} \pi_{\cdot k}\right)=\gamma_j$$`

These equations define a transformation `$h(\boldsymbol{\gamma})$` between the cutpoints `$\boldsymbol{\gamma}$` and the probabilities of category membership if all the covariates were 0, `$\boldsymbol{\pi_{\cdot}}=(\pi_{\cdot 1},\ldots,\pi_{\cdot J})$`
]

---

# Bayesian CPM: .font70[Dirichlet distribution]

.font150[
Finally, we specify a Dirichlet prior for `$\boldsymbol{\pi_{\cdot}}$` as `$\color{orange}{p(\boldsymbol{\pi_{\cdot}})} \propto \prod_{j=1}^{J}\pi_{\cdot j}^{\alpha_j-1}$` where `$\alpha_1 = \cdots =\alpha_J$`

The Dirichlet distribution is the multivariate generalization of a Beta distribution to `$J \ge 2$` dimensions with probabilities `$\pi_i \in (0,1)$` such that `$\sum_{i=1}^{J}\pi_i=1$`

]

.font150[
Because the Dirichlet is conjugate to a multinomial distribution the `$\alpha_i$` can be interpreted as the number of pseudo-observations in each category contributed by the prior

Additionally this prior implies `$\pi_{\cdot j}>0$` for all `$j$` when all the covariates are 0
]

---

# Bayesian CPM: .font70[Full Bayesian model]

.font160[
Combining the `$\color{orange}{\text{priors}}$` for `$\boldsymbol{\beta}$` and `$\boldsymbol{\gamma}$` with the `$\color{RoyalBlue}{\text{likelihood}}$` we have

`$$\begin{align}
\color{LimeGreen}{p(\boldsymbol{\gamma},\boldsymbol{\beta}|\boldsymbol{x},\boldsymbol{y})} & \propto \color{orange}{p(\boldsymbol{\gamma})p(\boldsymbol{\beta})} \color{RoyalBlue}{p(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{\gamma},\boldsymbol{\beta})}\\
&\propto \color{orange}{p(h(\boldsymbol{\gamma}))|\mathcal{J_h}|p(\boldsymbol{\beta})} \color{RoyalBlue}{p(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{\gamma},\boldsymbol{\beta})}\\
&\propto \color{orange}{p(\boldsymbol{\pi_{\cdot}})|\mathcal{J_h}|p(\boldsymbol{\beta})} \color{RoyalBlue}{p(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{\gamma},\boldsymbol{\beta})}
\end{align}$$`

Where `$h(\boldsymbol{\gamma})=\boldsymbol{\pi_{\cdot}}$` is the transformation from the cutpoints to the probabilities of category membership when all covariates are 0 and `$\mathcal{J_h}$` is the Jacobian of the transformation
]

---

# Bayesian CPM: .font70[Case Study]

.font110[
Data were previously collected from 216 HIV-positive adults on antiretroviral therapy in two cohort studies

The aim of the analysis was to estimate the association between body mass index (BMI) and several inflammation biomarkers

]

.font110[
We present results for the Interleukin 6 (*IL-6*) biomarker which is skewed and has values censored below a lower detection limit which are recorded as 0
]

<!--
.font80[
.footnote[Koethe et al. (2012) Serum Leptin Level Mediates the Association of Body Composition and Serum C-Reactive Protein in HIV-Infected Persons on Antiretroviral Therapy. *AIDS Research and Human Retroviruses*, 28, 552–557.

Koethe et al. (2015) The metabolic and cardiovascular consequences of obesity in persons with HIV on long-term antiretroviral therapy. *AIDS*] 
]
-->

---

# Bayesian CPM: .font70[Case Study]

.font140[
To deal with the skewness and censoring we fit a Bayesian CPM model to estimate the association between BMI and the conditional mean, median, and 90th quantile of *IL-6*

In addition to BMI, the analysis adjusted for age, sex, race, smoking status, study cohort, and CD4 cell count
]

.font140[
A more traditional approach would require three separate models, e.g.

- Censored regression with outcome transformation for conditional mean
- Two quantile regression models for conditional median and 90th quantile
]

---

# Bayesian CPM: .font70[Case Study]

Posterior estimates and intervals for the covariates for the IL-6 biomarker

---

# Bayesian CPM: .font70[Case Study]

Higher BMI is associated with higher IL-6, e.g. for white, male, nonsmoker with average age and CD4 count in the LiNC study

.font80[
- 95% probability that a subject with these covariates and BMI of 30.3 has a median IL-6 between 2.02 and 2.68

- For the same subject there is a 75% probability that the 90th percentile of IL-6 is greater than 5.75
]

---

# Bayesian CPM: .font70[Discussion]

.font120[
➡️ Reparameterizing CPM model cutpoints using Dirichlet prior allows model to handle a large number of ordinal categories
]

.font120[
➡️ Simulation Results
  - Best performance when the data are fairly dense and sample size is large enough (*n>100*) 
  - Can produce biased estimates for some quantities; censoring can exacerbate bias
  - Computation time increases exponentially with sample size using MCMC

]

.font120[
➡️ Bayesian CPM Advantages  
 - Avoid specification of outcome transformation  
 - Handle continuous and discrete ordered outcomes  
 - Estimate full conditional CDF, conditional mean, and quantiles with one model   
 - Provide exact inference based on interpretable posterior probabilities  
 - Incorporate prior information if available
]

---

# Outline

- .font160[.grey[Bayesian Inference and Estimation]]

- .font160[.grey[Bayesian Cumulative Probability Models for Continuous and Mixed Outcomes]]

- .font160[.bold[Population Pharmacokinetic Modeling Using Automatic Differentiation Variational Inference]]

- .font160[.grey[Population Pharmacokinetic Analysis of Dexmedetomidine in Children using Real World Data from Electronic Health Records and Remnant Specimens]]

---

# Pharmacokinetic Models

.font150[
- Pharmacokinetic (PK) models are used to estimate the amount of drug in the body over time
]

.font150[
- Compartmental PK models approximate absorption, distribution, metabolism, and elimination process by grouping body systems into hypothetical compartments
]

.font150[
- The movement of drug through the compartments is modeled by a system of differential equations whose solutions provide the amount of drug in each compartment over time]

.font150[
- For all but the simplest models, the solutions are *nonlinear* in the parameters
]

---

# Pharmacokinetic Models: .font70[Compartmental PK Models]

Example: One-compartment model with linear elimination, the concentration at time `$t$` for an individual with a single intravenous (IV) infusion dose `$d$` administered at time `$t_d$` for duration `$t_{inf}$` can be defined as:

`$$f\left(D=\{d,t_d,t_{inf}\},\psi=\{Cl,V\},t\right)=
\begin{cases}
\frac{d}{t_{inf}}\frac{1}{Cl}\left( 1-e^{-\frac{Cl}{V}(t-t_d)} \right) & \text{if } t-t_{d} \le t_{inf} \\
\frac{d}{t_{inf}}\frac{1}{Cl}\left( 1-e^{-\frac{Cl}{V}t_{inf}} \right)e^{-\frac{Cl}{V}(t-t_{d}-t_{inf})} & \text{otherwise.}
\end{cases}$$`

The structural model `$f(D,\psi,t)$` describes the change in concentration over time for data from a single individual.

<div class="figure" style="text-align: center">
<img src="dissertation_pres_files/figure-html/ex1-a-1.png" alt="Concentration for subject with Cl=12 and V=40 given 8 unit infusion over 12 hours" width="50%" />
Concentration for subject with Cl=12 and V=40 given 8 unit infusion over 12 hours
</div>

---

# Pharmacokinetic Models: .font70[Population PK Models]

- Population PK models extend this idea to multiple individuals

- PK profiles for all individuals are assumed to follow the same structural model, but PK parameter values can differ from subject to subject:
  - Quantify variability within and between individuals
  - Incorporate covariates that may affect each individual's PK profile
  - Estimate both population and individual-specific parameters

---

# Bayesian Population PK Models

.font130[
Fully Bayesian population PK models have several advantages
- Interpretation using posterior probabilities
- Incorporate prior information
- Account for uncertainty using an explicitly defined, flexible hierarchical model structure
- Straightforward computation of predictive distributions

]

.font130[
Example Questions

"Given information from 3 previous studies and our current data, what is the probability that a certain genotype variant reduces total clearance?"

"How likely is this patient to reach a specific concentration threshold?"
]

---

# Bayesian Population PK Models: .font70[Model specification]

Bayesian population PK models use a three stage hierarchical model specification

Stage one is represented by density `$p_1(\boldsymbol{Y}|\boldsymbol{\psi},\sigma_y^2)$`:
`$$Y_{ij} = f(D_i,\psi_{i},t_{ij})+\varepsilon_{ij}\\
\varepsilon_{ij} \sim N(0,\sigma_{ij}^2),\;\;\;\sigma_{ij}^2 = f(D_i,\boldsymbol{\psi_{i}},t_{ij})^{\gamma}\sigma_y^2,\;\;\; \gamma \ge 0,$$`

- `$Y_{ij}$` is the observed concentration for subject `$i=1,\ldots,N$` at the `$j$`th time point, `$t_{ij}$`, `$j=1,\ldots,n_i$` where `$n_i$` is the number of observed concentrations for subject `$i$`

- `$D_i$` is dose history for subject `$i$`, `$\boldsymbol{\psi}_{i}$` is the PK parameter vector for subject `$i$`, and `$\varepsilon_{ij}$` are independent random variables with mean zero representing intra-individual variability (residual error)

Stage two is represented by multivariate density `$p_2(\boldsymbol{\psi}|\boldsymbol{X},\boldsymbol{\mu},\boldsymbol{\Omega})$`:
`$$\boldsymbol{\psi_{i}}=\boldsymbol{f_2}(\boldsymbol{\mu},\boldsymbol{X_{i}})+\boldsymbol{\eta_i},\;\;\; \boldsymbol{\eta} \sim N(0,\boldsymbol{\Omega}),$$`

- `$\boldsymbol{\mu}$` is the vector of population regression coefficients (fixed effects), `$\boldsymbol{X_{i}}$` is the design matrix comprised of individual-specific covariates, and `$\boldsymbol{\eta_i}$` is a vector of individual-specific deviations (random effects) with covariance `$\boldsymbol{\Omega}$`

Together, stage one and two comprise the `$\color{RoyalBlue}{\text{likelihood, } p_1(\boldsymbol{Y}|\boldsymbol{\psi},\sigma_y^2)p_2(\boldsymbol{\psi}|\boldsymbol{X},\boldsymbol{\mu},\boldsymbol{\Omega})}$`

---

# Bayesian Pop. PK: .font70[Model specification]

.font120[
Stage three specifies the `$\color{orange}{\text{prior}}$` information for the unknown parameters: `$$p_3(\boldsymbol{\mu},\boldsymbol{\Omega},\sigma_y^2)=p(\boldsymbol{\mu})p(\boldsymbol{\Omega})p(\sigma_y^2)$$`
]

.font120[
- `$p(\boldsymbol{\mu})$` is the prior for the fixed effects at stage two, `$p(\boldsymbol{\Omega})$` is the prior for random effects at stage two, and `$p(\sigma_y^2)$` is the prior for residual error variance at stage one
]

.font120[
The `$\color{LimeGreen}{\text{posterior}}$` is
`$\color{LimeGreen}{p(\boldsymbol{\theta}=\{\boldsymbol{\mu},\boldsymbol{\Omega},\sigma_y^2\} | \boldsymbol{Z} =\{\boldsymbol{Y},\boldsymbol{X},D\})} \propto \color{RoyalBlue}{p_1(\boldsymbol{Y}|\boldsymbol{\psi},\sigma_y^2)p_2(\boldsymbol{\psi}|\boldsymbol{X},\boldsymbol{\mu},\boldsymbol{\Omega})} \color{orange}{p_3(\boldsymbol{\mu},\boldsymbol{\Omega},\sigma_y^2)}$` where `$\boldsymbol{\theta}$` contains all the parameters to be estimated and `$\boldsymbol{Z}$` contains all observed data.
]

---

# Bayesian Pop. PK: .font70[Estimation]

.font140[
Bayesian population PK models are usually estimated using MCMC methods (hybrid Gibbs-Metropolis Hastings sampling in `WINBUGS`/`PKBugs`; Hamiltonian Monte Carlo in `Stan`/`Torsten`).
]

.font140[
MCMC can take a long time to ensure convergence. This creates a bottleneck when fitting many models, such as during iterative model checking and development process.

]

.font140[
Therefore, a less time-consuming approximate method can be useful during this process.

We compare MCMC to a variational inference approximation, specifically automatic differentiation variational inference (ADVI).
]

---

# Bayesian Pop. PK: .font70[Automatic Differentiation Variational Inference]

<!-- 
.font90[
Kucukelbir et al. (2017) developed a VI method called automatic differentiation variational inference (ADVI).
]

- more accessible: avoids derivation of complete conditionals/optimal updates for each new model
- more general: requires only a differentiable probability model (compared to conjugacy or exponential family requirements). -->

.font90[
All unknown parameters are transformed to a common, unconstrained real coordinate space `$(T(\theta)=\zeta)$`. The transformation allows the use of a Gaussian variational family on the transformed space for all problems.
]

.font60[
.footnote[Image Credit: Kucukelbir et al. (2017) Automatic Differentiation Variational Inference. *Journal of Machine Learning Research*, 18, 430-474.
]
]

.font90[
A stochastic gradient ascent algorithm is used to optimize the variational objective function in the transformed space. A second transformation `$(S_{\phi}(\zeta)=\eta)$` allows calculation of expectations with Monte Carlo integration and gradients using automatic differentiation.
]

---

# Bayesian Pop. PK: .font70[Simulations]

.font140[
- Simulations for several scenarios and settings
  - 2 Structural models: one-compartment and two-compartment IV infusion models
  - 2 Sampling schemes: dense (100 subjects with 21 observations each) and sparse (300 subjects with 2 - 5 observations each)
  - 4 Priors: strong, weak, misspecified, and non-informative priors
]
--

.font140[
- 250 datasets simulated under each structural model and sampling scheme
]

.font140[
- For each dataset, models were fit using 4 priors with both ADVI and MCMC (No-U-Turns Hamiltonian Monte Carlo)
]

.font140[
- Compare computation time, population and individual parameter estimates, and posterior predictions
]

---

# Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model]

MCMC to ADVI computation time ratio

---

# Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model]

Median population parameter `$(\boldsymbol{\mu}$`, `$\boldsymbol{\Omega}$`, `$\sigma_y^2)$` percent bias

`$$\small Y_{ij} = f(D_i,\psi_{i},t_{ij})+\varepsilon_{ij},\;\;\;
\varepsilon_{ij} \sim N(0,\sigma_{ij}^2),\;\;\;\sigma_{ij}^2 = f(D_i,\boldsymbol{\psi_{i}},t_{ij})^{\gamma}\color{Red}{\sigma_y^2},\;\;\; \gamma \ge 0\\
\small \boldsymbol{\psi_{i}}=\boldsymbol{f_2}(\color{Red}{\boldsymbol{\mu}},\boldsymbol{X_{i}})+\boldsymbol{\eta_i},\;\;\; \boldsymbol{\eta} \sim N_2(0,\color{Red}{\boldsymbol{\Omega}})$$`

---

# Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model]

Population parameters `$(\boldsymbol{\mu}$`, `$\boldsymbol{\Omega}$`, `$\sigma_y^2)$` 90% credible interval widths

`$$\small Y_{ij} = f(D_i,\psi_{i},t_{ij})+\varepsilon_{ij},\;\;\;
\varepsilon_{ij} \sim N(0,\sigma_{ij}^2),\;\;\;\sigma_{ij}^2 = f(D_i,\boldsymbol{\psi_{i}},t_{ij})^{\gamma}\color{Red}{\sigma_y^2},\;\;\; \gamma \ge 0\\
\small \boldsymbol{\psi_{i}}=\boldsymbol{f_2}(\color{Red}{\boldsymbol{\mu}},\boldsymbol{X_{i}})+\boldsymbol{\eta_i},\;\;\; \boldsymbol{\eta} \sim N_2(0,\color{Red}{\boldsymbol{\Omega}})$$`

---

# Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model]

Individual parameters `$(\boldsymbol{\psi_{ij}})$` percent bias and credible interval widths

`$$\small Y_{ij} = f(D_i,\psi_{i},t_{ij})+\varepsilon_{ij},\;\;\;
\varepsilon_{ij} \sim N(0,\sigma_{ij}^2),\;\;\;\sigma_{ij}^2 = f(D_i,\boldsymbol{\psi_{i}},t_{ij})^{\gamma}\sigma_y^2,\;\;\; \gamma \ge 0\\
\small \boldsymbol{\color{Red}{\psi_{i}}}=\boldsymbol{f_2}(\boldsymbol{\mu},\boldsymbol{X_{i}})+\boldsymbol{\eta_i},\;\;\; \boldsymbol{\eta} \sim N_2(0,\boldsymbol{\Omega})$$`

---

# Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model]

Individual predictions - dense sampling

---

# Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model]

Individual predictions - sparse sampling

<img src="dissertation_pres_files/figure-html/ipred-4-1.png" width="67%" style="display: block; margin: auto;" />

---

# Bayesian Pop. PK: .font60[Simulation Results Two-compartment IV infusion model]
.font110[
For the more complex two-compartment model:
]

.font110[
- MCMC models take even longer to fit relative to ADVI (median MCMC to ADVI fit time ratios between 11 and 48 times)
]

.font110[
- Population parameter posterior medians using ADVI are more biased especially for parameters other than total clearance `$(Cl_{pop})$`
]

.font110[
- Population parameter 90% credible intervals are underestimated using ADVI
]

.font110[
- Individual parameter estimates using ADVI are fairly accurate for `$Cl_{i}$` with more bias for other individual parameters depending on sampling scheme and prior
]

.font110[
- Individual parameter 90% credible interval widths are similar using ADVI and MCMC
]

.font110[
- Individual and population predictions are similar using ADVI and MCMC
]

---

# Bayesian Pop. PK: .font70[Model Selection Simulations]

.font110[Explored model selection using correctly specified and two misspecified models. Clearance submodel can be described as:]

- Correctly specified model (M1) with `$\beta_{Cl_{X_1}}=0.4$`, `$X_1 \sim Bernoulli(\theta=0.4)$`
`$$\log(Cl_i)=\log(Cl_{pop})+0.75\times \ln \left(\frac{weight_i}{70}\right)+\beta_{Cl_{X_1}}X_{1i}+\eta_{Cl_i}$$`
--

- First misspecified model (M2) replace `$X_1$` with `$X_2 \sim Exponential(\lambda = 0.2)$` `$$\log(Cl_i)=\log(Cl_{pop})+0.75\times \ln \left(\frac{weight_i}{70}\right)+\beta_{Cl_{X_2}}X_{2i}+\eta_{Cl_i}$$`

- Second misspecified model (M3) include both `$X_1$` and `$X_2$`
`$$\log(Cl_i)=\log(Cl_{pop})+0.75\times \ln \left(\frac{weight_i}{70}\right)+\beta_{Cl_{X_1}}X_{1i}+\beta_{Cl_{X_2}}X_{2i}+\eta_{Cl_i}$$`
---

# Bayesian Pop. PK: .font70[Model Selection Strategies and Results]

.font110[Each dataset is fit using all three models with both ADVI and MCMC. Then we look at several model selection strategies:]

- Strategy 1: For both ADVI and MCMC, calculate information criteria (DIC, WAIC, and PSIS-LOOIC)

- Strategy 2: For ADVI, use maximized ELBO

- Strategy 3: For ADVI, use 5-fold leave-subject-out cross-validation to calculate log pointwise predictive density `$(lppd_{cv})$`

We compare strategies by calculating how frequently each of the 6 possible rankings — from best (M1, M3, M2) to worst (M2, M3, M1) — are chosen

<!--
-> [fit dataset generated from M1 using models M1, M2, and M3 with MCMC and ADVI]

-> [for each fit calculate  
a. 3 information criteria DIC, WAIC, PSIS-LOOIC using MCMC fits  
b. 3 information criteria DIC, WAIC, PSIS-LOOIC using ADVI fits  
c. ELBO using ADVI fits  
d. k-fold lppd_cv using ADVI fits
]

-> [calculate how frequently each of the 6 possible rankings are chosen]
-->

- Strategy 1 performs worst in terms of ranking the models; using information criteria with ADVI estimates is worse than MCMC estimates

- Strategy 3 was best for most scenarios, sampling schemes, priors, and sensitivity analyses

---

# Bayesian Pop. PK: .font70[Discussion]

.font120[
➡️ Using ADVI can provide a good balance between speed and accuracy when fitting Bayesian population PK models depending on question of interest
]

.font120[
➡️ Simulation Results
  - MCMC took 10x longer for one-compartment model and 45x longer for two-compartment model
  - For population parameters, posterior medians can be fairly accurate but credible intervals are *underestimated* by ADVI
  - For individual parameters, posterior medians can be fairly accurate and credible intervals are similar for ADVI and MCMC
  - Posterior predicted concentrations using ADVI are similar to MCMC 
]

.font120[
➡️ Reduced fit time using ADVI can be leveraged for model selection by using cross-validation
]

.font120[
➡️ Best used when fitting many models or to get results quickly when approximate solution is acceptable
]

---

# Outline

- .font160[.grey[Bayesian Inference and Estimation]]

- .font160[.grey[Bayesian Cumulative Probability Models for Continuous and Mixed Outcomes]]

- .font160[.grey[Population Pharmacokinetic Modeling Using Automatic Differentiation Variational Inference]]

- .font160[.bold[Population Pharmacokinetic Analysis of Dexmedetomidine in Children using Real World Data from Electronic Health Records and Remnant Specimens]]

---

# Population PK Analysis of Dexmedetomidine

.pull-left.w67[
.font90[
- Dexmedetomidine is an alpha2-agonist routinely used as part of intraoperative anesthetic management during surgical repairs of congenital heart disease and in the postoperative period in the intensive care unit

- Commonly dosed as a continuous intravenous (IV) infusion using a fixed weight-based rate (e.g., starting at 0.3 mcg/kg/h)

- This dosing regimen results in inappropriate dosing for some patients who are then at risk for dose-related dexmedetomidine side effects or use of additional sedative agents, including opioid analgesics 
- Accurate prediction of an individual’s dexmedetomidine requirement (precision dosing) could help minimize titration time to achieve sedation and analgesia goals without toxicity

- Objectives were to perform a population PK analysis of dexmedetomidine in children using remnant specimens and electronic health records (EHRs) and explore the impact of patient’s characteristics and pharmacogenetics on dexmedetomidine clearance
]
]

.pull-right.w33[
<img src="fig/bayes_pk_dex/Dexmed_0_9pct_Group_HighRes1.jpg" width="100%" style="display: block; margin: auto;" />
]

.font70[
.footnote[James et al. (2022) Population Pharmacokinetic Analysis of Dexmedetomidine in Children using Real- World Data from Electronic Health Records and Remnant Specimens. *British Journal of Clinical Pharmacology*] 
]

---

# Dex. Pop. PK: .font70[Frequentist Analysis]

.pull-left.w40[
<img src="fig/bayes_pk_dex/FlowDiagram_210919.png" width="85%" style="display: block; margin: auto;" />
]

.pull-right.w60[
.font90[
- Dexmedetomidine dosing and patient data were gathered from EHRs and combined with opportunistically sampled remnant specimens.

- Examined multiple structural, error, and variance models and association between PK and patient characteristics (weight, postmenstrual age, sex, cardiac bypass time, etc.) and genotype variables (*UGT1A4* and *UGT2B10* variants, novel *CYP2A6* risk score).

- Population PK models were estimated using a frequentist approach with stochastic approximation expectation-maximization (SAEM) algorithm.

- 354 post-cardiac surgery patients age 0 to 22 years (median 16 months).

- The data were best described with a two-compartment model with allometric scaling for weight and Hill maturation function for postmenstrual age. 
 
- Weight and age were important predictors of clearance; we did not find evidence for pharmacogenetic effects of *UGT1A4* or *UGT2B10* genotype or *CYP2A6* risk score.
]
]

---

# Dex. Pop. PK: .font70[Bayesian Reanalysis Methods]

.font130[
- A limitation of the frequentist analysis is the sparse, irregular, and potentially error-prone data which leads to a large amount of variation and imprecision for some PK parameters
]

.font130[
- Use a Bayesian framework to combine prior information from eight small, designed clinical studies with our larger observational dataset to more precisely estimate these parameters and the entire model 
]

.font130[
- Begin with two-compartment allometric scaling model from original paper and previous studies, then use ADVI to quickly fit and explore models including additional patient characteristics and pharmacogenomic effects
]

.font130[
- Use MCMC estimation to produce final model estimates and perform posterior checks
]

.font130[
- Examine sensitivity to priors 
]

---

# Dex. Pop. PK: .font70[Bayesian Reanalysis Results]

.font130[
Using ADVI with 10-fold cross-validation to estimate out-of-sample predictive accuracy, `$lppd_{cv}$`:
]
--

.font120[
- Two-compartment model with allometric scaling and Hill maturation factor `$(lppd_{cv}=856.6)$` was better than model with allometric scaling alone `$(lppd_{cv}=706.1)$`
]

.font120[
- Allometric scaling and Hill maturation model was not improved by adding serum creatinine, sex, STAT score1, cardiac bypass time, length of ICU stay, or *UGT1A4* or *UGT2B10* variant effects to model
]

.font70[
.footnote[[1] Society of Thoracic Surgery–European Association for Cardio-Thoracic Surgery Congenital Heart Surgery Mortality score
] 
]

.font120[
- Among the 350 subjects with *CYP2A6* score, the allometric scaling and Hill maturation factor model `$(lppd_{cv}=780.3)$` was improved by adding *CYP2A6* effect to model `$(lppd_{cv}=803.3)$`
]

---

# Dex. Pop. PK: .font70[Bayesian Reanalysis Results]

Allometric scaling, Hill maturation, and *CYP2A6* score model posterior median parameter estimates and 90% highest density intervals:
 - Total clearance (*CLpop*) 40.8 L/hr (36.1 - 45.6 L/hr)
 - Standardized *CYP2A6* effect on log total clearance `$(\beta_{CL,CYP2A6})$` 0.0634 (-0.000351 - 0.123)
 - Central compartment volume of distribution (*V1pop*) 142 L (122 – 164 L)
 - Intercompartmental clearance (*Qpop*) 13.5 L/hr (9.79 – 17.6 L/hr)
 - Peripheral compartment volume of distribution (*V2pop*) 1136 L (636 – 1661 L)
 - Postmenstrual age when 50% of adult clearance is achieved (*TM50*) 41.3 weeks (35.7 – 46.2 weeks)
 - Hill coefficient 4.34 (2.65 – 6.06)

A one standard deviation increase in *CYP2A6* score was associated with approximately 1.06 times higher total clearance.

This implies that a subject with a high *CYP2A6* score will require a larger dose to achieve the same concentration as a subject with lower *CYP2A6* score, holding other covariates constant.

---

# Dex. Pop. PK: .font70[Bayesian Reanalysis Results]

Hypothetical subjects with median weight (9.4kg), median postmenstrual age
(104.6 weeks) and *CYP2A6* scores of 2.0 and 2.43. Loading dose of 0.4 mcg/kg delivered over 5 minutes followed by a 6
hour infusion at 0.6 mcg/kg/h and a 2 hour infusion at 0.5 mcg/kg/h

(a) dosing same for both subjects

(b) dosing for high *CYP2A6* subject increased by 0.1 mcg/kg

---

# Dex. Pop. PK: .font70[Discussion]

.font120[
➡️ Using an informative prior, Bayesian estimation produced results much closer to the previous literature for most parameters compared to the frequentist analysis
]

.font120[
➡️ Including *CYP2A6* improved estimated predictive accuracy and effect is potentially clinically meaningful
]

.font120[
➡️ Using ADVI reduced modeling time, e.g. for final model ADVI fit time was 6 minutes, MCMC fit time was 595 minutes (9 hours, 55 minutes)
]

.font120[
➡️ Prior sensitivity analysis
 - Non-informative prior chains stuck/not converged after 3000 warmup iterations
 - Weakly informative prior (increased prior variance) predictions similar to main analysis prior, *Qpop* and *V2pop* parameters were sensitive to prior choice
]

---

# Conclusion
.pull-left[

.font120[Benefits]

➕ Interpretation using posterior probabilities
- 95% probability that a given subject has median IL-6 between 2.02 and 2.68

➕ Ability to formally incorporate prior information 
- Use information from previous dexmedetomidine studies to stabilize estimation of population PK models

➕ Natural extension to hierarchical models
- Population PK models are hierarchical models where concentrations are clustered within individual, can extend CPMs to structured data 
]

.pull-right[

.font120[Challenges]

➖ Determining appropriate prior distribution
- Reparameterize CPM model intercept prior to handle large number of ordinal categories

➖ Computationally intensive
- Use ADVI approximation for population PK models to reduce computation time for model development
]

---

# Thanks

.pull-left[
PhD Committee
- Bryan Shepherd, chair
- Leena Choi, advisor
- Matt Shotwell
- Sara Van Driest

Choi Lab
- Cole Beck
- Elizabeth McNeer
- Michael Williams
- Hannah Weeks
]

.pull-right[
- Frank Harrell
- Jeffrey Blume
- Robert Greevy
- Prince Kannenkeril
- John Koethe
- Yuqi Tian
- Chun Li
- *BJCP* paper co-authors and collaborators
- Vanderbilt Biostatistics students, faculty, and staff
]

---

# Bayesian Inference: .font80[Software]

.font150[
We use `Stan` through the `R` front-end `Rstan` or `CmdStanR` for all Bayesian models and the `Torsten` library for all Bayesian PK models
]

.font150[
Code and examples are available at
https://github.com/ntjames
]

---

# Bayesian CPM: .font70[Posterior conditional distributions]

.font150[
Using MCMC samples from posterior distribution ( `$\tilde{\boldsymbol{\gamma}}, \tilde{\boldsymbol{\beta}}$` ), it is straightforward to calculate quantities of interest for a given set of covariates

- Posterior conditional CDF:  `$\tilde{F}(y_i|\boldsymbol{x}_i)=G^{-1}(\tilde{\gamma}_{r(y_i)}-\boldsymbol{x}_i^{T}\tilde{\boldsymbol{\beta}})$`

- Posterior conditional mean:
`$\tilde{E}[Y|\boldsymbol{x}_i]=\sum_{i=1}^{n}y_i\tilde{f}(y_i|\boldsymbol{x}_i)$` where `$\tilde{f}(y_i|\boldsymbol{x}_i)=\tilde{F}(y_i|\boldsymbol{x}_i)-\tilde{F}(y_{i-1}|\boldsymbol{x}_i)$`

- Posterior conditional quantile:
To estimate the `$q^{th}$` quantile 
 - Find `$y_i=\inf\{y:\tilde{F}(y|x)\ge q\}$` and the next smallest value `$y_{i-1}$` 
 - Use linear interpolation to find quantile `$y_q$` where `$y_{i-1}<y_q<y_i$` 
]

---

# Bayesian CPM: .font70[Simulation Setting]

.font150[
To evaluate the long-run frequency properties of the Bayesian CPM we generate data from a log-normal model
`$$Y=\exp(X_1\beta_1 + X_2 \beta_2 + \varepsilon) \quad \varepsilon \sim N(0,1)$$`
where `$\beta_1=1$`, `$\beta_2=-0.5$`, `$X_1 \sim Bernoulli(0.5)$` and `$X_2 \sim N(0,1)$`

`$1000$` datasets were simulated for sample sizes `$n=25,50,100,200$` and `$400$`

A Bayesian CPM with the properly specified probit link was fit to each dataset
]

---

# Bayesian CPM: .font70[Simulation Results]

.font120[
The average bias of the posterior median and standard error are shown for the two `$\beta$` regression parameters and
five `$\gamma$` intercept parameters corresponding to `$y_1=e^{-1}$`, `$y_2=e^{-0.33}$`, `$y_3=e^{0.5}$`, `$y_4=e^{1.33}$`, `$y_5= e^{2}$`
]

---

# Bayesian CPM: .font70[Simulation Results]

.font120[
To simulate a mixed discrete/continuous outcome identical datasets were produced but with values of `$Y<0$` set to `$0$`
]

---

# Bayesian Pop. PK: .font70[Variational Inference]

.pull-left[
<img src="fig/bayes_pk_advi/VI_diag3.png" width="110%" style="display: block; margin: auto;" />
]

.pull-right[
- Goal of VI is to find the joint distribution, `$q(\boldsymbol{\theta};\phi)$`, in a specified distributional family indexed by variational parameters `$\phi \in \Phi$`, which has the smallest Kullback-Leibler (KL) divergence to the posterior distribution, `$p(\boldsymbol{\theta}|\boldsymbol{Z})$`

]

---

# Bayesian Pop. PK: .font70[Variational Inference]

.pull-left[
<img src="fig/bayes_pk_advi/VI_diag4.png" width="110%" style="display: block; margin: auto;" />
]

- The best variational approximation, `$q(\boldsymbol{\theta};\phi^*)$`, is found by optimizing, `$\phi^*= \underset{\phi \in \Phi}{\arg\min}\{KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}|\boldsymbol{Z}))\}$`
]

- KL can be decomposed as `$KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}|\boldsymbol{Z}))=KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}))-E_q[\log p(\boldsymbol{Z}|\boldsymbol{\theta})]+\log p(\boldsymbol{Z})$`
which shows the dependence of the optimization on an intractable term, the marginal likelihood (or evidence), `$\log p(\boldsymbol{Z})$`

- But, all expectations are with respect to `$q(\boldsymbol{\theta};\phi)$` (on which `$\log p(\boldsymbol{Z})$` does not depend), finding `$\phi^*$` that minimizes `$KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}|\boldsymbol{Z}))$` is equivalent to finding `$\phi^*=\underset{\phi \in \Phi}{\arg\max}\{E_q[\log p(\boldsymbol{Z}|\boldsymbol{\theta})]-KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}))\}$`

---

# Bayesian Pop. PK: .font70[Variational Inference]

.font130[
- The optimal variational parameter is `$\phi^*=\underset{\phi \in \Phi}{\arg\max}\{E_q[\log p(\boldsymbol{Z}|\boldsymbol{\theta})]-KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}))\}$`

- Since the KL divergence `$\ge0$` for all distributions:
`$$\begin{gather*}
KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}|\boldsymbol{z})) \ge 0\\
KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}))-E_q[\log p(\boldsymbol{z}|\boldsymbol{\theta})]+\log p(\boldsymbol{z}) \ge 0\\
\log p(\boldsymbol{z}) \ge \underbrace{E_q[\log p(\boldsymbol{z}|\boldsymbol{\theta})]-KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta})).}_{ELBO}
\end{gather*}$$`
- The term that is maximized to find `$\phi^*$` is also called the evidence lower bound (ELBO)
]

---

# Bayesian Pop. PK: .font70[Automatic Differentiation Variational Inference]

.font110[
- VI commonly uses a 'mean-field' approximation which assumes mutual independence between each of the `$\ell$` marginal distributions in the specified variational distributional family, `$q(\boldsymbol{\theta};\phi)=\Pi_{\ell}q_{\ell}(\theta_{\ell};\phi_{\ell})$`
]

.font110[
- Early work on VI focused on deriving closed form solutions for the optimal updates for special cases such as conjugate and exponential family models.
  - deriving and coding the optimal updates can be time-consuming; requires alternative methods for models outside of the special cases 
  - undercuts the aim of quickly iterating over multiple models without constraints on model and prior parameterization
]

.font110[
- Kucukelbir et al. (2017) developed a VI method called automatic differentiation variational inference (ADVI) that combines advances in probabilistic programming and automatic differentiation
  - more accessible: avoids derivation of complete conditionals/optimal updates for each new model
  - more general: requires only a differentiable probability model (compared to conjugacy or exponential family requirements). 
]

---

# Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model] 

Ratio of `$C_{max}$` predictions

---

# Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model]

Population predictions - dense sampling

<img src="dissertation_pres_files/figure-html/ppred-3-1.png" width="67%" style="display: block; margin: auto;" />

---

# Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model]

Population predictions - sparse sampling

<img src="dissertation_pres_files/figure-html/ppred-4-1.png" width="67%" style="display: block; margin: auto;" />

---

# Bayesian Pop. PK: .font70[Case Study Background]

.font130[
- Ketorolac is a nonsteroidal anti-inflammatory drug (NSAID) used to treat pain after surgery

- Data for ketorolac IV dosing and patient characteristics was gathered from EHRs and combined with remnant specimens collected using opportunistic sampling to determine blood plasma concentration

- Concentration samples are sparse so we use the Bayesian population PK framework to combine prior information from several small designed clinical studies with our larger observational dataset

- The primary goal is to find a model that adequately describes individual PK time-concentration profiles and parameters
]
---

# Bayesian Pop. PK: .font70[Case Study Methods]

Based on previous literature ketorolac PK can be described well using a two-compartment model. The two-compartment PK model for multiple IV bolus doses is given by:
`$$\begin{gather*}
D=\{d_k,t_{d_k}\}\;\; \psi=\{Cl,Q,V_1,V_2\}\\
\alpha = \frac{\frac{Q Cl}{V_2 V_1}}{\beta}\;\;\;
\beta = \frac{1}{2}\left[\frac{Q}{V_1}+\frac{Q}{V_2}+\frac{Cl}{V_1} - \sqrt{\left(\frac{Q}{V_1}+\frac{Q}{V_2}+\frac{Cl}{V_1} \right)^2} - 4 \frac{Q Cl}{V_2 V_1} \right]\;\;\;
A = \frac{1}{V_1}\frac{\alpha - \frac{Q}{V_2}}{\alpha-\beta}\;\;\;
B = \frac{1}{V_1}\frac{\beta - \frac{Q}{V_2}}{\beta-\alpha}\\
f(D,\psi,t)=\sum_{k=1}^{K}d_k(A e^{-\alpha (t-t_{d_k})} + B e^{-\beta (t-t_{d_k})}),
\end{gather*}$$`
where `$d_k$` and `$t_{d_k}$` are the amounts and administration times, respectively, for the `$k=1,\ldots,K$` doses. Using this compartmental model, the stage 1 model for observed concentration `$y_{ij}$` for individual `$i$` at time `$j$` including combined proportional and additive error can be defined as:
`$$\begin{gather*}
y_{ij}=f(D_i,\psi_{i},t_{ij})\times(1+\varepsilon_{ij}^{prop})+\varepsilon_{ij}^{add}\\
\varepsilon_{ij}^{prop} \sim N(0,\sigma_{prop}^2)\;\;\; \varepsilon_{ij}^{add} \sim N(0,\sigma_{add}^2)
\end{gather*}$$`

---

# Bayesian Pop. PK: .font70[Case Study Methods]

For stage two, we use a model with fixed allometric scaling to account for different weights:
`$$\begin{gather*}
\log(Cl_i) = \log(Cl_{pop}) + 0.75 \times \log\left(\frac{weight_i}{15}\right) + \eta_{Cl_i},\; \eta_{Cl} \sim N(0, \omega_{Cl}^2)\\
\log(Q_i) = \log(Q_{pop}) + 0.75 \times \log\left(\frac{weight_i}{15}\right) + \eta_{Q_i},\; \eta_{Q} \sim N(0, \omega_{Q}^2)\\
\log(V_{1i}) = \log(V_{1\,pop}) + 1 \times \log\left(\frac{weight_i}{15}\right) + \eta_{V_{1i}},\; \eta_{V_1} \sim N(0, \omega_{V_1}^2)\\
\log(V_{2i}) = \log(V_{2\,pop}) + 1 \times \log\left(\frac{weight_i}{15}\right) + \eta_{V_{2i}},\; \eta_{V_2} \sim N(0, \omega_{V_2}^2)\\
\{\eta_{Cl},\eta_{Q},\eta_{V_1},\eta_{V_2}\} =\boldsymbol{\eta} \sim N_4(0,\Omega)
\end{gather*}$$`

---

# Bayesian Pop. PK: .font70[Case Study Methods]

We use an informative prior based on previously published pediatric ketorolac studies and literature on PK scaling for children

---

# Bayesian Pop. PK: .font70[Case Study Results]

.font150[
The 406 subjects in ketorolac dataset were separated into a training dataset with 320 subjects used to fit the models and a validation dataset with 86 subjects

For the model with allometric scaling and maturation factor:
- Mean-field ADVI algorithm took around *6 minutes and 33 seconds* to converge
- MCMC took *341 minutes* (5 hours, 41 minutes) for 4 parallel chains running concurrently with 4000 warmup iterations and 250 sampling iterations 
]

---

# Bayesian Pop. PK: .font70[Case Study Results]

Posterior Population parameter distributions

---

# Bayesian Pop. PK: .font70[Case Study Results]

Individual posterior clearance `$(Cl_i)$` distributions for 20 individuals from training dataset

---

# Bayesian Pop. PK: .font70[Case Study Results]

---

# Bayesian Pop. PK: .font70[Case Study Results]

---

# Bayesian Pop. PK: .font70[Case Study Results]

---

# Bayesian Pop. PK: .font70[Case Study Results] 

<img src="dissertation_pres_files/figure-html/cs-recovar-1.png" width="50%" style="display: block; margin: auto;" />

---

# Bayesian Pop. PK: .font70[Case Study Results] 

<img src="dissertation_pres_files/figure-html/cs-vpc-1.png" width="50%" style="display: block; margin: auto;" />

---

# Bayesian Pop. PK: .font70[Additional Discussion]

ADVI compares favorably with other possible approximations
 - Maximum a posteriori (MAP) estimation provides very fast point estimates, but doesn't account for uncertainty, can be very inaccurate in hierarchical models
 - Laplace approximation also uses optimization and estimates covariances using the inverse Hessian of the optimization objective function, but makes stronger distributional assumptions and is less flexible than ADVI
 
--
 
There are several potential solutions to underestimated population parameter uncertainty
 - Replace the mean-field assumption with 'full-rank' VI, allow correlation between marginal variational distributions `$q_{\ell}(\theta_{\ell}; \phi_{\ell})$`; increases complexity and computation time, full-rank ADVI can provide worse results than mean-field ADVI 
 - Correct mean-field VI variance estimates using linear response methods3; increased computation time and requires unbiased mean estimates
 
.font70[
.footnote[[3] Giordano et al. (2018). Covariances, Robustness, and Variational Bayes. *Journal of Machine Learning Research*, 19, 1-49.] 
]

---

# Dex. Pop. PK: .font70[Frequentist Analysis Results]

- Population parameter estimates and 95% confidence intervals:
 - .font90[Total clearance (*CLpop*) 27.3 L/hr (24.0 – 31.1 L/hr)] 
 - .font90[Central compartment volume of distribution (*V1pop*) 161 L (139 – 187 L)]
 - .font90[Intercompartmental clearance (*Qpop*) 26.0 L/hr (22.5 – 30.0 L/hr)]
 - .font90[Peripheral compartment volume of distribution (*V2pop*) 7903 L (5617 – 11119 L)]
 - .font90[Postmenstrual age when 50% of adult clearance is achieved (*TM50*) 42.0 weeks (41.5 – 42.5 weeks)]
 - .font90[Hill coefficient 7.04 (6.99 – 7.08)]

---

# Dex. Pop. PK: .font70[Bayesian Reanalysis Priors]

.font70[
.footnote[Wiczling et al. (2016). The Pharmacokinetics of Dexmedetomidine during Long-term Infusion in Critically Ill Pediatric Patients: a Bayesian approach with informative priors. *Journal of Pharmacokinetics and Pharmacodynamics*, 43, 315-324.] 
]

---

# Dex. Pop. PK: .font70[Bayesian Reanalysis Priors]

---

# Dex. Pop. PK: .font70[Bayesian and Frequentist Analysis Comparison]

.pull-left.w40[
.font110[
- Frequentist model used time-varying weight and simplified variance structure; Bayesian model used baseline weight and more flexible variance structure

- Frequentist model found no significant genotype effects; Bayesian model found *CYP2A6* improved predictive accuracy

- Bayesian model includes more 'subjects' through prior

- Frequentist model used BICc which favors parsimony; Bayesian model used `$lppd_{cv}$` which favors predictive accuracy
]
]

.pull-right.w60[
<img src="dissertation_pres_files/figure-html/bayesfreqdex-1.png" width="100%" style="display: block; margin: auto;" />
]