class: center, middle, inverse, title-slide # Bayesian Methods for Cumulative Probability Models and Population Pharmacokinetic Models ## Vanderbilt University Department of Biostatistics ### Nathan T. James ### April 11, 2022 --- <!-- --> <!-- reference https://arm.rbind.io/slides/xaringan.html https://slides.yihui.org/xaringan https://github.com/gadenbuie/xaringanExtra --> <!-- outline -- Bayes Inf & background (8 mins) -- Bayes CPM (8 mins) -- ADVI for Bayes pop PK (9 mins) -- Dexmedetomidine (8 mins) -- Conclusion/Thanks (5 mins) --> # Outline - .font160[.bold[Bayesian Inference and Estimation]] - .font160[.grey[Bayesian Cumulative Probability Models for Continuous and Mixed Outcomes]] - .font160[.grey[Population Pharmacokinetic Modeling Using Automatic Differentiation Variational Inference]] - .font160[.grey[Population Pharmacokinetic Analysis of Dexmedetomidine in Children using Real World Data from Electronic Health Records and Remnant Specimens]] --- # Bayesian Inference: .font80[Benefits and Challenges] .font170[ Benefits ➕ Interpretation using posterior probabilities ➕ Ability to formally incorporate prior information ➕ Natural extension to hierarchical models ] -- .font170[ Challenges ➖ Determining appropriate prior distribution ➖ Computationally intensive (except in a few special cases) ] --- # Bayesian Inference: .font70[Background] .font150[ Under the Bayesian paradigm we combine the `\(\color{RoyalBlue}{\text{likelihood}}\)` of observed data `\(y\)` with `\(\color{orange}{\text{prior}}\)` information to get a `\(\color{LimeGreen}{\text{posterior}}\)` for the parameters given the data. `$$\color{LimeGreen}{p(\theta|y)}=\frac{\color{orange}{p(\theta)}\color{RoyalBlue}{p(y|\theta)}}{\int \color{orange}{p(\theta)} \color{RoyalBlue}{p(y|\theta)}\, d\theta}$$` All inference is based on the `\(\color{LimeGreen}{\text{posterior}}\)` distribution of the parameters. Usually can't calculate the posterior directly. Instead, we use different methods to get a numerical approximation of the posterior. ] --- # Bayesian Inference: .font80[Markov Chain Monte Carlo (MCMC)] .font120[ Goal: Take random (i.e. Monte Carlo) draws from a sequence of parameter values (Markov Chain) that looks like the posterior distribution. <!--For iterations `\(t=1,2,3,\ldots\)`, the sequential values in the chain are `\(\{\theta^{(1)}, \theta^{(2)}, \theta^{(3)},\ldots\}\)`.--> ] -- .font110[ - Start with some initial value for the posterior parameters at the start of the chain.<!--, `\(\theta^{(1)}\)`--> ] -- .font110[ - Based on the current value, propose a guess for the next value in the chain. <!-- - Based on the current value at iteration `\(t\)`, called `\(\theta^{(t)}\)`, propose a guess for the next value in the chain, call the guess `\(\theta^*\)` --> ] -- .font110[ - Accept or reject the guess based on a prespecified rule. If the guess is accepted, use it as the next value in the chain. Otherwise, the next value in the chain is the same as the current value. <!-- - Accept or reject the guess based on a prespecified rule. If the guess is accepted use it as the next value in the chain (set `\(\theta^{(t+1)}=\theta^*\)`), otherwise the next value in the chain is the same as the current value (set `\(\theta^{(t+1)}=\theta^{(t)}\)`)--> ] -- .font110[ - Repeat steps 2 & 3. ] -- .font110[ The distribution used to propose the guesses and the prespecified rule are defined so that draws from the chain will *eventually* look like samples from the posterior. Then we can take draws from the chain and use them to summarize the posterior distribution. It can take a lot of iterations for the chain to look like the posterior. ] --- # Bayesian Inference: .font80[MCMC example] .font90[ MCMC with 500 warmup iterations, 1500 sampling iterations for parameter `\(\theta\)` ] <!-- --> <img src="fig/intro/mcmc_ex.gif" width="200" height="525"/> --- # Bayesian Inference: .font80[Variational Inference (VI)] .font120[ Goal: Find the variational family that has the smallest Kullback-Leibler (KL) divergence to the posterior and use it as an approximation. ] -- .font110[ - Pick an easy to work with variational distribution family `\(q(\theta;\phi)\)` indexed by parameters `\(\phi \in \Phi\)`. <!--KL divergence between the variational family and the posterior is: $$ KL(q(\theta;\phi)||p(\theta|y))=KL(q(\theta;\phi)||p(\theta))-E_{q}[\log p(y|\theta)]+\log p(y) $$ --> ] -- .font110[ - Find the value of `\(\phi\)` that maximizes `\(E_{q}[\log p(y|\theta)]-KL(q(\theta;\phi)||p(\theta))\)`. This term is called the **ELBO**. <!--Find the value of `\(\phi\)` which minimizes `\(KL(q(\theta;\phi)||p(\theta|y))\)`. Equivalently, we can --> <!-- - Want to find the value of `\(\phi\)` which minimizes `\(KL(q(\theta;\phi)||p(\theta|y))\)` but `\(\log p(y)\)` is usually difficult to calculate. Since the expectations are with respect to `\(q(\theta;\phi)\)` which `\(\log p(y)\)` does not depend on we can equivalently find value of `\(\phi\)` that maximizes `\(E_{q}[\log p(y|\theta)]-KL(q(\theta;\phi)||p(\theta))\)`. This term is called the *ELBO*. --> ] -- .font110[ - Use an iterative optimization algorithm to find `\(\phi^*=\underset{\phi \in \Phi}{\arg\max} \{ELBO\}\)` and thus the best variational approximation `\(q(\theta;\phi^*)\)` ] -- .font120[ Then we use the approximation `\(q(\theta;\phi^*)\)` to summarize the posterior distribution. Optimization using VI is faster than MCMC, but is only an approximation to the true posterior. ] --- # Bayesian Inference: .font80[VI example]  --- # Outline - .font160[.grey[Bayesian Inference and Estimation]] - .font160[.bold[Bayesian Cumulative Probability Models for Continuous and Mixed Outcomes]] - .font160[.grey[Population Pharmacokinetic Modeling Using Automatic Differentiation Variational Inference]] - .font160[.grey[Population Pharmacokinetic Analysis of Dexmedetomidine in Children using Real World Data from Electronic Health Records and Remnant Specimens]] --- # Cumulative Probability Models: .font70[Model specification] .font150[ Let `\(Y_i\)` be the outcome for individual `\(i=1, \ldots, n\)` <!--with `\(p\)` associated covariates `\(\boldsymbol{X_i}=(X_{i1},\ldots,X_{ip})\)`--> Each `\(Y_i\)` falls into one of `\(j=1,\ldots,J\)` ordered categories ] -- .font150[ This can be modeled as `\(Y_i \sim Multinomial(1,\boldsymbol{\pi_i})\)` where `\(\boldsymbol{\pi_i}=(\pi_{i1},\ldots,\pi_{iJ})\)` are the probabilities of individual `\(i\)` being in each of the `\(J\)` categories <!--and `\(\sum_{j=1}^{J} \pi_{ij}=1\)`--> ] -- .font150[ Then the **cumulative probability** of falling into category `\(j\)` or lower is `\(P(Y_i \le j)=\eta_{ij}=\sum_{k=1}^{j} \pi_{ik}\)` ] --- # Cumulative Probability Models: .font70[Likelihood] .font120[ The CPM relates the cumulative probabilities to the covariates, `\(\boldsymbol{x_i}\)`, through a monotonically increasing link function `\(G(\cdot)\)` with inverse `\(G^{-1}(\cdot)\)` `$$G(\eta_{ij})=\gamma_j-\boldsymbol{x_i^{T}\beta} \Longleftrightarrow \eta_{ij}=G^{-1}(\gamma_j-\boldsymbol{x_i^{T}\beta})$$` where the `\(\gamma_j\)` are latent continuous cutpoints `\(-\infty \equiv \gamma_0 < \gamma_1 < \cdots < \gamma_{J-1} < \gamma_J \equiv \infty\)` ] -- .font120[ The likelihood for a random sample of observations `\((y_1,\ldots,y_n)\)` is defined using differences in cumulative probabilities `$$p(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{\gamma},\boldsymbol{\beta})=\prod_{j=1}^{J}\prod_{i:y_i=j} [G^{-1}(\gamma_j-\boldsymbol{x_i^{T}\beta}) - G^{-1}(\gamma_{j-1}-\boldsymbol{x_i^{T}\beta})]$$` If is `\(G(\cdot)\)` is the logit link `\(G(p)=\log\left(\frac{p}{1-p}\right)\)`, this is the likelihood for a proportional odds regression model ] --- # Cumulative Probability Models: .font70[Continuous Outcomes] .font140[ For continuous data with no ties `\(J=n\)`. Letting `\(r(y_i)\)` be the rank of `\(y_i\)` the `\(\color{RoyalBlue}{\text{likelihood}}\)` is `$$\color{RoyalBlue}{p(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{\gamma},\boldsymbol{\beta})}=\prod_{i=1}^{n} [G^{-1}(\gamma_{r(y_i)}-\boldsymbol{x_i^{T}\beta}) - G^{-1}(\gamma_{r(y_i)-1}-\boldsymbol{x_i^{T}\beta})]$$` ] -- .font140[ Why use a CPM for continuous outcomes? ] -- .font130[ - Regression coefficients `\((\boldsymbol{\beta})\)` are invariant to monotonic transformations of outcome - Directly models full conditional cumulative distribution function (CDF) - Handles any ordered outcomes including mixed continuous/discrete distributions (e.g., continuous outcome with lower limit of detection) ] --- # Bayesian Cumulative Probability Models: .font70[Priors] .font150[ To complete the Bayesian model specification we need to define `\(\color{orange}{\text{priors}}\)` for the unknown parameters `\(\color{orange}{p(\boldsymbol{\beta},\boldsymbol{\gamma})}\)` ] -- .font150[ We assume a priori independence between `\(\boldsymbol{\beta}\)` and `\(\boldsymbol{\gamma}\)` so `\(\color{orange}{p(\boldsymbol{\beta},\boldsymbol{\gamma})}=\color{orange}{p(\boldsymbol{\beta})p(\boldsymbol{\gamma})}\)` For convenience, we let the prior for the regression coefficients be non-informative `\(\color{orange}{p(\boldsymbol{\beta})} \propto 1\)` but other choices are possible ] -- .font150[ Specifying a prior for the `\(\boldsymbol{\gamma}\)` directly is challenging because of the ordering constraint and high dimensionality Instead, we specify a prior for a **transformation** of `\(\boldsymbol{\gamma}\)` ] --- # Bayesian CPM: .font70[Priors] .font150[ Let `\(\pi_{\cdot j} \equiv Pr(r(y)=j|x=\boldsymbol{0})\)` be the probability of being in category `\(j\)` when all the covariates are 0. From the previous definition `$$\pi_{\cdot j}=G^{-1}(\gamma_j-0) - G^{-1}(\gamma_{j-1}-0)=G^{-1}(\gamma_j) - G^{-1}(\gamma_{j-1})$$` ] -- .font150[ Conversely, `$$\sum_{k=1}^{j}\pi_{\cdot k}=G^{-1}(\gamma_{j}) \quad \Rightarrow \quad G\left(\sum_{k=1}^{j} \pi_{\cdot k}\right)=\gamma_j$$` These equations define a transformation `\(h(\boldsymbol{\gamma})\)` between the cutpoints `\(\boldsymbol{\gamma}\)` and the probabilities of category membership if all the covariates were 0, `\(\boldsymbol{\pi_{\cdot}}=(\pi_{\cdot 1},\ldots,\pi_{\cdot J})\)` ] --- # Bayesian CPM: .font70[Dirichlet distribution] .font150[ Finally, we specify a Dirichlet prior for `\(\boldsymbol{\pi_{\cdot}}\)` as `\(\color{orange}{p(\boldsymbol{\pi_{\cdot}})} \propto \prod_{j=1}^{J}\pi_{\cdot j}^{\alpha_j-1}\)` where `\(\alpha_1 = \cdots =\alpha_J\)` The Dirichlet distribution is the multivariate generalization of a Beta distribution to `\(J \ge 2\)` dimensions with probabilities `\(\pi_i \in (0,1)\)` such that `\(\sum_{i=1}^{J}\pi_i=1\)` <!--and parameters `\(\alpha_i > 0\; \forall i\)`--> ] -- .font150[ Because the Dirichlet is conjugate to a multinomial distribution the `\(\alpha_i\)` can be interpreted as the number of pseudo-observations in each category contributed by the prior <!-- Setting `\(\alpha_1 = \cdots =\alpha_J=\frac{1}{J}\)` implies a total prior contribution of `\(\sum_{i=1}^{J} \alpha_i=\sum_{i=1}^{J}\frac{1}{J}=1\)` observation --> Additionally this prior implies `\(\pi_{\cdot j}>0\)` for all `\(j\)` when all the covariates are 0 ] --- # Bayesian CPM: .font70[Full Bayesian model] .font160[ Combining the `\(\color{orange}{\text{priors}}\)` for `\(\boldsymbol{\beta}\)` and `\(\boldsymbol{\gamma}\)` with the `\(\color{RoyalBlue}{\text{likelihood}}\)` we have `$$\begin{align} \color{LimeGreen}{p(\boldsymbol{\gamma},\boldsymbol{\beta}|\boldsymbol{x},\boldsymbol{y})} & \propto \color{orange}{p(\boldsymbol{\gamma})p(\boldsymbol{\beta})} \color{RoyalBlue}{p(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{\gamma},\boldsymbol{\beta})}\\ &\propto \color{orange}{p(h(\boldsymbol{\gamma}))|\mathcal{J_h}|p(\boldsymbol{\beta})} \color{RoyalBlue}{p(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{\gamma},\boldsymbol{\beta})}\\ &\propto \color{orange}{p(\boldsymbol{\pi_{\cdot}})|\mathcal{J_h}|p(\boldsymbol{\beta})} \color{RoyalBlue}{p(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{\gamma},\boldsymbol{\beta})} \end{align}$$` Where `\(h(\boldsymbol{\gamma})=\boldsymbol{\pi_{\cdot}}\)` is the transformation from the cutpoints to the probabilities of category membership when all covariates are 0 and `\(\mathcal{J_h}\)` is the Jacobian of the transformation ] --- # Bayesian CPM: .font70[Case Study] .font110[ Data were previously collected from 216 HIV-positive adults on antiretroviral therapy in two cohort studies The aim of the analysis was to estimate the association between body mass index (BMI) and several inflammation biomarkers<!-- since people living with HIV have higher risk of diabetes and cardiovascular disease--> ] -- .font110[ We present results for the Interleukin 6 (*IL-6*) biomarker which is skewed and has values censored below a lower detection limit which are recorded as 0 ] <img src="fig/bayes_cpm/il_6_hist.png" width="45%" style="display: block; margin: auto;" /> <!-- .font80[ .footnote[Koethe et al. (2012) Serum Leptin Level Mediates the Association of Body Composition and Serum C-Reactive Protein in HIV-Infected Persons on Antiretroviral Therapy. *AIDS Research and Human Retroviruses*, 28, 552–557. Koethe et al. (2015) The metabolic and cardiovascular consequences of obesity in persons with HIV on long-term antiretroviral therapy. *AIDS*] ] --> --- # Bayesian CPM: .font70[Case Study] <!-- Interleukin 6 ( IL-6 ) and Interleukin 1 `\(\beta\)` ( IL-1-$\beta$ ) --> .font140[ To deal with the skewness and censoring we fit a Bayesian CPM model to estimate the association between BMI and the conditional mean, median, and 90th quantile of *IL-6* In addition to BMI, the analysis adjusted for age, sex, race, smoking status, study cohort, and CD4 cell count ] -- .font140[ A more traditional approach would require three separate models, e.g. - Censored regression with outcome transformation for conditional mean - Two quantile regression models for conditional median and 90th quantile ] --- # Bayesian CPM: .font70[Case Study] <!--Estimated transformation function based on the `\(\boldsymbol{\gamma}\)` and--> Posterior estimates and intervals for the covariates for the IL-6 biomarker <img src="fig/bayes_cpm/il_6_post_est_ct.png" width="85%" style="display: block; margin: auto;" /> --- # Bayesian CPM: .font70[Case Study] Higher BMI is associated with higher IL-6, e.g. for white, male, nonsmoker with average age and CD4 count in the LiNC study <img src="fig/bayes_cpm/il_6_bmi.png" width="60%" style="display: block; margin: auto;" /> .font80[ - 95% probability that a subject with these covariates and BMI of 30.3 has a median IL-6 between 2.02 and 2.68 - For the same subject there is a 75% probability that the 90th percentile of IL-6 is greater than 5.75 ] --- # Bayesian CPM: .font70[Discussion] .font120[ ➡️ Reparameterizing CPM model cutpoints using Dirichlet prior allows model to handle a large number of ordinal categories ] -- .font120[ ➡️ Simulation Results - Best performance when the data are fairly dense and sample size is large enough (*n>100*) - Can produce biased estimates for some quantities; censoring can exacerbate bias - Computation time increases exponentially with sample size using MCMC ] -- .font120[ ➡️ Bayesian CPM Advantages - Avoid specification of outcome transformation - Handle continuous and discrete ordered outcomes - Estimate full conditional CDF, conditional mean, and quantiles with one model - Provide exact inference based on interpretable posterior probabilities - Incorporate prior information if available ] --- # Outline - .font160[.grey[Bayesian Inference and Estimation]] - .font160[.grey[Bayesian Cumulative Probability Models for Continuous and Mixed Outcomes]] - .font160[.bold[Population Pharmacokinetic Modeling Using Automatic Differentiation Variational Inference]] - .font160[.grey[Population Pharmacokinetic Analysis of Dexmedetomidine in Children using Real World Data from Electronic Health Records and Remnant Specimens]] --- # Pharmacokinetic Models .font150[ - Pharmacokinetic (PK) models are used to estimate the amount of drug in the body over time ] -- .font150[ - Compartmental PK models approximate absorption, distribution, metabolism, and elimination process by grouping body systems into hypothetical compartments ] -- .font150[ - The movement of drug through the compartments is modeled by a system of differential equations whose solutions provide the amount of drug in each compartment over time] -- .font150[ - For all but the simplest models, the solutions are *nonlinear* in the parameters ] --- # Pharmacokinetic Models: .font70[Compartmental PK Models] Example: One-compartment model with linear elimination, the concentration at time `\(t\)` for an individual with a single intravenous (IV) infusion dose `\(d\)` administered at time `\(t_d\)` for duration `\(t_{inf}\)` can be defined as: `$$f\left(D=\{d,t_d,t_{inf}\},\psi=\{Cl,V\},t\right)= \begin{cases} \frac{d}{t_{inf}}\frac{1}{Cl}\left( 1-e^{-\frac{Cl}{V}(t-t_d)} \right) & \text{if } t-t_{d} \le t_{inf} \\ \frac{d}{t_{inf}}\frac{1}{Cl}\left( 1-e^{-\frac{Cl}{V}t_{inf}} \right)e^{-\frac{Cl}{V}(t-t_{d}-t_{inf})} & \text{otherwise.} \end{cases}$$` -- The structural model `\(f(D,\psi,t)\)` describes the change in concentration over time for data from a single individual. <div class="figure" style="text-align: center"> <img src="dissertation_pres_files/figure-html/ex1-a-1.png" alt="Concentration for subject with Cl=12 and V=40 given 8 unit infusion over 12 hours" width="50%" /> <p class="caption">Concentration for subject with Cl=12 and V=40 given 8 unit infusion over 12 hours</p> </div> --- # Pharmacokinetic Models: .font70[Population PK Models] - Population PK models extend this idea to multiple individuals -- - PK profiles for all individuals are assumed to follow the same structural model, but PK parameter values can differ from subject to subject: - Quantify variability within and between individuals - Incorporate covariates that may affect each individual's PK profile - Estimate both population and individual-specific parameters -- <img src="dissertation_pres_files/figure-html/ex2-1.png" width="65%" style="display: block; margin: auto;" /> --- # Bayesian Population PK Models .font130[ Fully Bayesian population PK models have several advantages - Interpretation using posterior probabilities - Incorporate prior information - Account for uncertainty using an explicitly defined, flexible hierarchical model structure - Straightforward computation of predictive distributions ] -- .font130[ Example Questions "Given information from 3 previous studies and our current data, what is the probability that a certain genotype variant reduces total clearance?" "How likely is this patient to reach a specific concentration threshold?" ] --- # Bayesian Population PK Models: .font70[Model specification] Bayesian population PK models use a three stage hierarchical model specification -- Stage one is represented by density `\(p_1(\boldsymbol{Y}|\boldsymbol{\psi},\sigma_y^2)\)`: `$$Y_{ij} = f(D_i,\psi_{i},t_{ij})+\varepsilon_{ij}\\ \varepsilon_{ij} \sim N(0,\sigma_{ij}^2),\;\;\;\sigma_{ij}^2 = f(D_i,\boldsymbol{\psi_{i}},t_{ij})^{\gamma}\sigma_y^2,\;\;\; \gamma \ge 0,$$` -- - `\(Y_{ij}\)` is the observed concentration for subject `\(i=1,\ldots,N\)` at the `\(j\)`th time point, `\(t_{ij}\)`, `\(j=1,\ldots,n_i\)` where `\(n_i\)` is the number of observed concentrations for subject `\(i\)` -- - `\(D_i\)` is dose history for subject `\(i\)`, `\(\boldsymbol{\psi}_{i}\)` is the PK parameter vector for subject `\(i\)`, and `\(\varepsilon_{ij}\)` are independent random variables with mean zero representing intra-individual variability (residual error) -- Stage two is represented by multivariate density `\(p_2(\boldsymbol{\psi}|\boldsymbol{X},\boldsymbol{\mu},\boldsymbol{\Omega})\)`: `$$\boldsymbol{\psi_{i}}=\boldsymbol{f_2}(\boldsymbol{\mu},\boldsymbol{X_{i}})+\boldsymbol{\eta_i},\;\;\; \boldsymbol{\eta} \sim N(0,\boldsymbol{\Omega}),$$` -- - `\(\boldsymbol{\mu}\)` is the vector of population regression coefficients (fixed effects), `\(\boldsymbol{X_{i}}\)` is the design matrix comprised of individual-specific covariates, and `\(\boldsymbol{\eta_i}\)` is a vector of individual-specific deviations (random effects) with covariance `\(\boldsymbol{\Omega}\)` -- Together, stage one and two comprise the `\(\color{RoyalBlue}{\text{likelihood, } p_1(\boldsymbol{Y}|\boldsymbol{\psi},\sigma_y^2)p_2(\boldsymbol{\psi}|\boldsymbol{X},\boldsymbol{\mu},\boldsymbol{\Omega})}\)` <!-- The PK parameters `\(\boldsymbol{\psi_{ij}}\)` are frequently transformed since they must be strictly positive and are often right-skewed. For example, in a one-compartment IV model if `\(\boldsymbol{\psi_{ij}}=\{\log Cl_{ij}, \log V_{ij}\}\)` the stage two model would typically be multivariate lognormal or multivariate log-Student-t. --> --- # Bayesian Pop. PK: .font70[Model specification] .font120[ Stage three specifies the `\(\color{orange}{\text{prior}}\)` information for the unknown parameters: `$$p_3(\boldsymbol{\mu},\boldsymbol{\Omega},\sigma_y^2)=p(\boldsymbol{\mu})p(\boldsymbol{\Omega})p(\sigma_y^2)$$` ] -- .font120[ - `\(p(\boldsymbol{\mu})\)` is the prior for the fixed effects at stage two, `\(p(\boldsymbol{\Omega})\)` is the prior for random effects at stage two, and `\(p(\sigma_y^2)\)` is the prior for residual error variance at stage one ] -- .font120[ The `\(\color{LimeGreen}{\text{posterior}}\)` is `\(\color{LimeGreen}{p(\boldsymbol{\theta}=\{\boldsymbol{\mu},\boldsymbol{\Omega},\sigma_y^2\} | \boldsymbol{Z} =\{\boldsymbol{Y},\boldsymbol{X},D\})} \propto \color{RoyalBlue}{p_1(\boldsymbol{Y}|\boldsymbol{\psi},\sigma_y^2)p_2(\boldsymbol{\psi}|\boldsymbol{X},\boldsymbol{\mu},\boldsymbol{\Omega})} \color{orange}{p_3(\boldsymbol{\mu},\boldsymbol{\Omega},\sigma_y^2)}\)` where `\(\boldsymbol{\theta}\)` contains all the parameters to be estimated and `\(\boldsymbol{Z}\)` contains all observed data. ] --- # Bayesian Pop. PK: .font70[Estimation] .font140[ Bayesian population PK models are usually estimated using MCMC methods (hybrid Gibbs-Metropolis Hastings sampling in `WINBUGS`/`PKBugs`; Hamiltonian Monte Carlo in `Stan`/`Torsten`). ] -- .font140[ MCMC can take a long time to ensure convergence. This creates a bottleneck when fitting many models, such as during iterative model checking and development process. ] -- .font140[ Therefore, a less time-consuming approximate method can be useful during this process. We compare MCMC to a variational inference approximation, specifically automatic differentiation variational inference (ADVI). ] <!-- .font80[ .footnote[Image credit: Box and Tiao (1992). *Bayesian Inference in Statistical Analysis* ] ] --> --- # Bayesian Pop. PK: .font70[Automatic Differentiation Variational Inference] <!-- .font90[ Kucukelbir et al. (2017) developed a VI method called automatic differentiation variational inference (ADVI). ] - more accessible: avoids derivation of complete conditionals/optimal updates for each new model - more general: requires only a differentiable probability model (compared to conjugacy or exponential family requirements). --> .font90[ All unknown parameters are transformed to a common, unconstrained real coordinate space `\((T(\theta)=\zeta)\)`. The transformation allows the use of a Gaussian variational family on the transformed space for all problems. ] <img src="fig/bayes_pk_advi/ADVI_transform1_trim.png" width="35%" style="display: block; margin: auto;" /> .font60[ .footnote[Image Credit: Kucukelbir et al. (2017) Automatic Differentiation Variational Inference. *Journal of Machine Learning Research*, 18, 430-474. ] ] -- .font90[ A stochastic gradient ascent algorithm is used to optimize the variational objective function in the transformed space. A second transformation `\((S_{\phi}(\zeta)=\eta)\)` allows calculation of expectations with Monte Carlo integration and gradients using automatic differentiation. ] <img src="fig/bayes_pk_advi/ADVI_transform2_trim.png" width="35%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font70[Simulations] .font140[ - Simulations for several scenarios and settings - 2 Structural models: one-compartment and two-compartment IV infusion models - 2 Sampling schemes: dense (100 subjects with 21 observations each) and sparse (300 subjects with 2 - 5 observations each) - 4 Priors: strong, weak, misspecified, and non-informative priors ] -- .font140[ - 250 datasets simulated under each structural model and sampling scheme ] -- .font140[ - For each dataset, models were fit using 4 priors with both ADVI and MCMC (No-U-Turns Hamiltonian Monte Carlo) ] -- .font140[ - Compare computation time, population and individual parameter estimates, and posterior predictions ] --- # Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model] MCMC to ADVI computation time ratio -- <img src="fig/bayes_pk_advi/fit_time_plt_b.png" width="70%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model] Median population parameter `\((\boldsymbol{\mu}\)`, `\(\boldsymbol{\Omega}\)`, `\(\sigma_y^2)\)` percent bias `$$\small Y_{ij} = f(D_i,\psi_{i},t_{ij})+\varepsilon_{ij},\;\;\; \varepsilon_{ij} \sim N(0,\sigma_{ij}^2),\;\;\;\sigma_{ij}^2 = f(D_i,\boldsymbol{\psi_{i}},t_{ij})^{\gamma}\color{Red}{\sigma_y^2},\;\;\; \gamma \ge 0\\ \small \boldsymbol{\psi_{i}}=\boldsymbol{f_2}(\color{Red}{\boldsymbol{\mu}},\boldsymbol{X_{i}})+\boldsymbol{\eta_i},\;\;\; \boldsymbol{\eta} \sim N_2(0,\color{Red}{\boldsymbol{\Omega}})$$` -- <img src="fig/bayes_pk_advi/scn1_post_med_plt_pct2_wide.png" width="63%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model] Population parameters `\((\boldsymbol{\mu}\)`, `\(\boldsymbol{\Omega}\)`, `\(\sigma_y^2)\)` 90% credible interval widths <!--\color{Red}{}--> `$$\small Y_{ij} = f(D_i,\psi_{i},t_{ij})+\varepsilon_{ij},\;\;\; \varepsilon_{ij} \sim N(0,\sigma_{ij}^2),\;\;\;\sigma_{ij}^2 = f(D_i,\boldsymbol{\psi_{i}},t_{ij})^{\gamma}\color{Red}{\sigma_y^2},\;\;\; \gamma \ge 0\\ \small \boldsymbol{\psi_{i}}=\boldsymbol{f_2}(\color{Red}{\boldsymbol{\mu}},\boldsymbol{X_{i}})+\boldsymbol{\eta_i},\;\;\; \boldsymbol{\eta} \sim N_2(0,\color{Red}{\boldsymbol{\Omega}})$$` -- <img src="fig/bayes_pk_advi/post_width_plt_a2_wide.png" width="63%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model] Individual parameters `\((\boldsymbol{\psi_{ij}})\)` percent bias and credible interval widths `$$\small Y_{ij} = f(D_i,\psi_{i},t_{ij})+\varepsilon_{ij},\;\;\; \varepsilon_{ij} \sim N(0,\sigma_{ij}^2),\;\;\;\sigma_{ij}^2 = f(D_i,\boldsymbol{\psi_{i}},t_{ij})^{\gamma}\sigma_y^2,\;\;\; \gamma \ge 0\\ \small \boldsymbol{\color{Red}{\psi_{i}}}=\boldsymbol{f_2}(\boldsymbol{\mu},\boldsymbol{X_{i}})+\boldsymbol{\eta_i},\;\;\; \boldsymbol{\eta} \sim N_2(0,\boldsymbol{\Omega})$$` -- <img src="fig/bayes_pk_advi/poppk1cpt_sim_d_ind_PK_med_plt.png" width="47%" /><img src="fig/bayes_pk_advi/poppk1cpt_sim_d_ind_ci_width_plt_a.png" width="47%" /> --- # Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model] Individual predictions - dense sampling <!-- One-compartment PK model simulations - observed concentrations and individual predicted concentrations for the first simulated dataset with dense sampling and weak priors. --> <img src="dissertation_pres_files/figure-html/ipred-3-1.png" width="67%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model] Individual predictions - sparse sampling <!--One-compartment PK model simulations - observed concentrations and individual predicted concentrations for the first simulated dataset with sparse sampling and weak priors.--> <img src="dissertation_pres_files/figure-html/ipred-4-1.png" width="67%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font60[Simulation Results Two-compartment IV infusion model] .font110[ For the more complex two-compartment model: ] -- .font110[ - MCMC models take even longer to fit relative to ADVI (median MCMC to ADVI fit time ratios between 11 and 48 times) ] -- .font110[ - Population parameter posterior medians using ADVI are more biased especially for parameters other than total clearance `\((Cl_{pop})\)` ] -- .font110[ - Population parameter 90% credible intervals are underestimated using ADVI ] -- .font110[ - Individual parameter estimates using ADVI are fairly accurate for `\(Cl_{i}\)` with more bias for other individual parameters depending on sampling scheme and prior ] -- .font110[ - Individual parameter 90% credible interval widths are similar using ADVI and MCMC ] -- .font110[ - Individual and population predictions are similar using ADVI and MCMC ] --- # Bayesian Pop. PK: .font70[Model Selection Simulations] .font110[Explored model selection using correctly specified and two misspecified models. Clearance submodel can be described as:] - Correctly specified model (M1) with `\(\beta_{Cl_{X_1}}=0.4\)`, `\(X_1 \sim Bernoulli(\theta=0.4)\)` `$$\log(Cl_i)=\log(Cl_{pop})+0.75\times \ln \left(\frac{weight_i}{70}\right)+\beta_{Cl_{X_1}}X_{1i}+\eta_{Cl_i}$$` -- - First misspecified model (M2) replace `\(X_1\)` with `\(X_2 \sim Exponential(\lambda = 0.2)\)` `$$\log(Cl_i)=\log(Cl_{pop})+0.75\times \ln \left(\frac{weight_i}{70}\right)+\beta_{Cl_{X_2}}X_{2i}+\eta_{Cl_i}$$` -- - Second misspecified model (M3) include both `\(X_1\)` and `\(X_2\)` `$$\log(Cl_i)=\log(Cl_{pop})+0.75\times \ln \left(\frac{weight_i}{70}\right)+\beta_{Cl_{X_1}}X_{1i}+\beta_{Cl_{X_2}}X_{2i}+\eta_{Cl_i}$$` --- # Bayesian Pop. PK: .font70[Model Selection Strategies and Results] .font110[Each dataset is fit using all three models with both ADVI and MCMC. Then we look at several model selection strategies:] -- - Strategy 1: For both ADVI and MCMC, calculate information criteria (DIC, WAIC, and PSIS-LOOIC) <!--calculate deviance information criteria (DIC), widely applicable information criteria (WAIC), and Pareto Smoothed Importance Sampling leave-one-out information criteria (PSIS-LOOIC). Models with lower information criteria have better out-of-sample predictive performance. --> -- - Strategy 2: For ADVI, use maximized ELBO <!--which is a lower bound on the marginal log likelihood `\(\log p(y)\)`.The marginal likelihood is also known as the evidence and is the probability of the data given the assumed model (after integrating over parameters). --> -- - Strategy 3: For ADVI, use 5-fold leave-subject-out cross-validation to calculate log pointwise predictive density `\((lppd_{cv})\)` <!--. Separate subjects into 5 groups; fit the model excluding one group and use the fit to calculate the log pointwise predictive density for subjects in the excluded group. Repeat for the other groups and sum to produce `\((lppd_{cv})\)`, a measure of out-of-sample predictive performance.--> -- We compare strategies by calculating how frequently each of the 6 possible rankings — from best (M1, M3, M2) to worst (M2, M3, M1) — are chosen <!-- -> [fit dataset generated from M1 using models M1, M2, and M3 with MCMC and ADVI] -> [for each fit calculate a. 3 information criteria DIC, WAIC, PSIS-LOOIC using MCMC fits b. 3 information criteria DIC, WAIC, PSIS-LOOIC using ADVI fits c. ELBO using ADVI fits d. k-fold lppd_cv using ADVI fits ] -> [calculate how frequently each of the 6 possible rankings are chosen] --> -- - Strategy 1 performs worst in terms of ranking the models; using information criteria with ADVI estimates is worse than MCMC estimates - Strategy 3 was best for most scenarios, sampling schemes, priors, and sensitivity analyses --- # Bayesian Pop. PK: .font70[Discussion] .font120[ ➡️ Using ADVI can provide a good balance between speed and accuracy when fitting Bayesian population PK models depending on question of interest ] -- .font120[ ➡️ Simulation Results - MCMC took 10x longer for one-compartment model and 45x longer for two-compartment model - For population parameters, posterior medians can be fairly accurate but credible intervals are *underestimated* by ADVI - For individual parameters, posterior medians can be fairly accurate and credible intervals are similar for ADVI and MCMC - Posterior predicted concentrations using ADVI are similar to MCMC ] -- <!-- - For ketorolac case study ADVI took 6.5 mins vs. nearly 6 hours for MCMC --> .font120[ ➡️ Reduced fit time using ADVI can be leveraged for model selection by using cross-validation ] -- .font120[ ➡️ Best used when fitting many models or to get results quickly when approximate solution is acceptable ] --- # Outline - .font160[.grey[Bayesian Inference and Estimation]] - .font160[.grey[Bayesian Cumulative Probability Models for Continuous and Mixed Outcomes]] - .font160[.grey[Population Pharmacokinetic Modeling Using Automatic Differentiation Variational Inference]] - .font160[.bold[Population Pharmacokinetic Analysis of Dexmedetomidine in Children using Real World Data from Electronic Health Records and Remnant Specimens]] --- # Population PK Analysis of Dexmedetomidine .pull-left.w67[ .font90[ - Dexmedetomidine is an alpha2-agonist routinely used as part of intraoperative anesthetic management during surgical repairs of congenital heart disease and in the postoperative period in the intensive care unit - Commonly dosed as a continuous intravenous (IV) infusion using a fixed weight-based rate (e.g., starting at 0.3 mcg/kg/h) - This dosing regimen results in inappropriate dosing for some patients who are then at risk for dose-related dexmedetomidine side effects or use of additional sedative agents, including opioid analgesics - Accurate prediction of an individual’s dexmedetomidine requirement (precision dosing) could help minimize titration time to achieve sedation and analgesia goals without toxicity - Objectives were to perform a population PK analysis of dexmedetomidine in children using remnant specimens and electronic health records (EHRs) and explore the impact of patient’s characteristics and pharmacogenetics on dexmedetomidine clearance ] ] .pull-right.w33[ <img src="fig/bayes_pk_dex/Dexmed_0_9pct_Group_HighRes1.jpg" width="100%" style="display: block; margin: auto;" /> ] .font70[ .footnote[James et al. (2022) Population Pharmacokinetic Analysis of Dexmedetomidine in Children using Real- World Data from Electronic Health Records and Remnant Specimens. *British Journal of Clinical Pharmacology*] ] --- # Dex. Pop. PK: .font70[Frequentist Analysis] .pull-left.w40[ <img src="fig/bayes_pk_dex/FlowDiagram_210919.png" width="85%" style="display: block; margin: auto;" /> ] .pull-right.w60[ .font90[ - Dexmedetomidine dosing and patient data were gathered from EHRs and combined with opportunistically sampled remnant specimens. - Examined multiple structural, error, and variance models and association between PK and patient characteristics (weight, postmenstrual age, sex, cardiac bypass time, etc.) and genotype variables (*UGT1A4* and *UGT2B10* variants, novel *CYP2A6* risk score). - Population PK models were estimated using a frequentist approach with stochastic approximation expectation-maximization (SAEM) algorithm. - 354 post-cardiac surgery patients age 0 to 22 years (median 16 months). - The data were best described with a two-compartment model with allometric scaling for weight and Hill maturation function for postmenstrual age. - Weight and age were important predictors of clearance; we did not find evidence for pharmacogenetic effects of *UGT1A4* or *UGT2B10* genotype or *CYP2A6* risk score. ] ] --- # Dex. Pop. PK: .font70[Bayesian Reanalysis Methods] .font130[ - A limitation of the frequentist analysis is the sparse, irregular, and potentially error-prone data which leads to a large amount of variation and imprecision for some PK parameters ] -- .font130[ - Use a Bayesian framework to combine prior information from eight small, designed clinical studies with our larger observational dataset to more precisely estimate these parameters and the entire model ] -- .font130[ - Begin with two-compartment allometric scaling model from original paper and previous studies, then use ADVI to quickly fit and explore models including additional patient characteristics and pharmacogenomic effects ] -- .font130[ - Use MCMC estimation to produce final model estimates and perform posterior checks ] -- .font130[ - Examine sensitivity to priors ] --- # Dex. Pop. PK: .font70[Bayesian Reanalysis Results] .font130[ Using ADVI with 10-fold cross-validation to estimate out-of-sample predictive accuracy, `\(lppd_{cv}\)`: ] -- .font120[ - Two-compartment model with allometric scaling and Hill maturation factor `\((lppd_{cv}=856.6)\)` was better than model with allometric scaling alone `\((lppd_{cv}=706.1)\)` ] -- .font120[ - Allometric scaling and Hill maturation model was not improved by adding serum creatinine, sex, STAT score<sup>1</sup>, cardiac bypass time, length of ICU stay, or *UGT1A4* or *UGT2B10* variant effects to model ] .font70[ .footnote[[1] Society of Thoracic Surgery–European Association for Cardio-Thoracic Surgery Congenital Heart Surgery Mortality score ] ] -- .font120[ - Among the 350 subjects with *CYP2A6* score, the allometric scaling and Hill maturation factor model `\((lppd_{cv}=780.3)\)` was improved by adding *CYP2A6* effect to model `\((lppd_{cv}=803.3)\)` ] --- # Dex. Pop. PK: .font70[Bayesian Reanalysis Results] Allometric scaling, Hill maturation, and *CYP2A6* score model posterior median parameter estimates and 90% highest density intervals: - Total clearance (*CL<sub>pop</sub>*) 40.8 L/hr (36.1 - 45.6 L/hr) - Standardized *CYP2A6* effect on log total clearance `\((\beta_{CL,CYP2A6})\)` 0.0634 (-0.000351 - 0.123) - Central compartment volume of distribution (*V1<sub>pop</sub>*) 142 L (122 – 164 L) - Intercompartmental clearance (*Q<sub>pop</sub>*) 13.5 L/hr (9.79 – 17.6 L/hr) - Peripheral compartment volume of distribution (*V2<sub>pop</sub>*) 1136 L (636 – 1661 L) - Postmenstrual age when 50% of adult clearance is achieved (*TM<sub>50</sub>*) 41.3 weeks (35.7 – 46.2 weeks) - Hill coefficient 4.34 (2.65 – 6.06) A one standard deviation increase in *CYP2A6* score was associated with approximately 1.06 times higher total clearance. This implies that a subject with a high *CYP2A6* score will require a larger dose to achieve the same concentration as a subject with lower *CYP2A6* score, holding other covariates constant. --- # Dex. Pop. PK: .font70[Bayesian Reanalysis Results] Hypothetical subjects with median weight (9.4kg), median postmenstrual age (104.6 weeks) and *CYP2A6* scores of 2.0 and 2.43. Loading dose of 0.4 mcg/kg delivered over 5 minutes followed by a 6 hour infusion at 0.6 mcg/kg/h and a 2 hour infusion at 0.5 mcg/kg/h (a) dosing same for both subjects (b) dosing for high *CYP2A6* subject increased by 0.1 mcg/kg <img src="dissertation_pres_files/figure-html/bayesdexdose-1.png" width="85%" style="display: block; margin: auto;" /> --- # Dex. Pop. PK: .font70[Discussion] .font120[ ➡️ Using an informative prior, Bayesian estimation produced results much closer to the previous literature for most parameters compared to the frequentist analysis ] -- .font120[ ➡️ Including *CYP2A6* improved estimated predictive accuracy and effect is potentially clinically meaningful ] -- .font120[ ➡️ Using ADVI reduced modeling time, e.g. for final model ADVI fit time was 6 minutes, MCMC fit time was 595 minutes (9 hours, 55 minutes) ] -- .font120[ ➡️ Prior sensitivity analysis - Non-informative prior chains stuck/not converged after 3000 warmup iterations - Weakly informative prior (increased prior variance) predictions similar to main analysis prior, *Q<sub>pop</sub>* and *V2<sub>pop</sub>* parameters were sensitive to prior choice ] --- # Conclusion .pull-left[ .font120[Benefits] ➕ Interpretation using posterior probabilities - 95% probability that a given subject has median IL-6 between 2.02 and 2.68 ➕ Ability to formally incorporate prior information - Use information from previous dexmedetomidine studies to stabilize estimation of population PK models ➕ Natural extension to hierarchical models - Population PK models are hierarchical models where concentrations are clustered within individual, can extend CPMs to structured data ] -- .pull-right[ .font120[Challenges] ➖ Determining appropriate prior distribution - Reparameterize CPM model intercept prior to handle large number of ordinal categories ➖ Computationally intensive - Use ADVI approximation for population PK models to reduce computation time for model development ] --- # Thanks .pull-left[ PhD Committee - Bryan Shepherd, chair - Leena Choi, advisor - Matt Shotwell - Sara Van Driest Choi Lab - Cole Beck - Elizabeth McNeer - Michael Williams - Hannah Weeks ] .pull-right[ - Frank Harrell - Jeffrey Blume - Robert Greevy - Prince Kannenkeril - John Koethe - Yuqi Tian - Chun Li - *BJCP* paper co-authors and collaborators - Vanderbilt Biostatistics students, faculty, and staff ] --- # Bayesian Inference: .font80[Software] .font150[ We use `Stan` through the `R` front-end `Rstan` or `CmdStanR` for all Bayesian models and the `Torsten` library for all Bayesian PK models ] <img src="fig/intro/stan_logo.png" width="20%" /><img src="fig/intro/torsten-white-stan-cropped-273x150.png" width="20%" /> .font150[ Code and examples are available at https://github.com/ntjames ] --- # Bayesian CPM: .font70[Posterior conditional distributions] .font150[ Using MCMC samples from posterior distribution ( `\(\tilde{\boldsymbol{\gamma}}, \tilde{\boldsymbol{\beta}}\)` ), it is straightforward to calculate quantities of interest for a given set of covariates - Posterior conditional CDF: `\(\tilde{F}(y_i|\boldsymbol{x}_i)=G^{-1}(\tilde{\gamma}_{r(y_i)}-\boldsymbol{x}_i^{T}\tilde{\boldsymbol{\beta}})\)` - Posterior conditional mean: `\(\tilde{E}[Y|\boldsymbol{x}_i]=\sum_{i=1}^{n}y_i\tilde{f}(y_i|\boldsymbol{x}_i)\)` where `\(\tilde{f}(y_i|\boldsymbol{x}_i)=\tilde{F}(y_i|\boldsymbol{x}_i)-\tilde{F}(y_{i-1}|\boldsymbol{x}_i)\)` - Posterior conditional quantile: To estimate the `\(q^{th}\)` quantile - Find `\(y_i=\inf\{y:\tilde{F}(y|x)\ge q\}\)` and the next smallest value `\(y_{i-1}\)` - Use linear interpolation to find quantile `\(y_q\)` where `\(y_{i-1}<y_q<y_i\)` ] --- # Bayesian CPM: .font70[Simulation Setting] .font150[ To evaluate the long-run frequency properties of the Bayesian CPM we generate data from a log-normal model `$$Y=\exp(X_1\beta_1 + X_2 \beta_2 + \varepsilon) \quad \varepsilon \sim N(0,1)$$` where `\(\beta_1=1\)`, `\(\beta_2=-0.5\)`, `\(X_1 \sim Bernoulli(0.5)\)` and `\(X_2 \sim N(0,1)\)` `\(1000\)` datasets were simulated for sample sizes `\(n=25,50,100,200\)` and `\(400\)` A Bayesian CPM with the properly specified probit link was fit to each dataset ] --- # Bayesian CPM: .font70[Simulation Results] .font120[ The average bias of the posterior median and standard error are shown for the two `\(\beta\)` regression parameters and five `\(\gamma\)` intercept parameters corresponding to `\(y_1=e^{-1}\)`, `\(y_2=e^{-0.33}\)`, `\(y_3=e^{0.5}\)`, `\(y_4=e^{1.33}\)`, `\(y_5= e^{2}\)` ] <img src="fig/bayes_cpm/sim_full.png" width="80%" style="display: block; margin: auto;" /> --- # Bayesian CPM: .font70[Simulation Results] .font120[ To simulate a mixed discrete/continuous outcome identical datasets were produced but with values of `\(Y<0\)` set to `\(0\)` ] <img src="fig/bayes_cpm/sim_cens.png" width="80%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font70[Variational Inference] .pull-left[ <img src="fig/bayes_pk_advi/VI_diag3.png" width="110%" style="display: block; margin: auto;" /> ] .pull-right[ - Goal of VI is to find the joint distribution, `\(q(\boldsymbol{\theta};\phi)\)`, in a specified distributional family indexed by variational parameters `\(\phi \in \Phi\)`, which has the smallest Kullback-Leibler (KL) divergence to the posterior distribution, `\(p(\boldsymbol{\theta}|\boldsymbol{Z})\)` ] --- # Bayesian Pop. PK: .font70[Variational Inference] .pull-left[ <img src="fig/bayes_pk_advi/VI_diag4.png" width="110%" style="display: block; margin: auto;" /> ] .pull-right[ - Goal of VI is to find the joint distribution, `\(q(\boldsymbol{\theta};\phi)\)`, in a specified distributional family indexed by variational parameters `\(\phi \in \Phi\)`, which has the smallest Kullback-Leibler (KL) divergence to the posterior distribution, `\(p(\boldsymbol{\theta}|\boldsymbol{Z})\)` - The best variational approximation, `\(q(\boldsymbol{\theta};\phi^*)\)`, is found by optimizing, `\(\phi^*= \underset{\phi \in \Phi}{\arg\min}\{KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}|\boldsymbol{Z}))\}\)` ] -- - KL can be decomposed as `\(KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}|\boldsymbol{Z}))=KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}))-E_q[\log p(\boldsymbol{Z}|\boldsymbol{\theta})]+\log p(\boldsymbol{Z})\)` which shows the dependence of the optimization on an intractable term, the marginal likelihood (or evidence), `\(\log p(\boldsymbol{Z})\)` - But, all expectations are with respect to `\(q(\boldsymbol{\theta};\phi)\)` (on which `\(\log p(\boldsymbol{Z})\)` does not depend), finding `\(\phi^*\)` that minimizes `\(KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}|\boldsymbol{Z}))\)` is equivalent to finding `\(\phi^*=\underset{\phi \in \Phi}{\arg\max}\{E_q[\log p(\boldsymbol{Z}|\boldsymbol{\theta})]-KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}))\}\)` --- # Bayesian Pop. PK: .font70[Variational Inference] .font130[ - The optimal variational parameter is `\(\phi^*=\underset{\phi \in \Phi}{\arg\max}\{E_q[\log p(\boldsymbol{Z}|\boldsymbol{\theta})]-KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}))\}\)` - Since the KL divergence `\(\ge0\)` for all distributions: `$$\begin{gather*} KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}|\boldsymbol{z})) \ge 0\\ KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta}))-E_q[\log p(\boldsymbol{z}|\boldsymbol{\theta})]+\log p(\boldsymbol{z}) \ge 0\\ \log p(\boldsymbol{z}) \ge \underbrace{E_q[\log p(\boldsymbol{z}|\boldsymbol{\theta})]-KL(q(\boldsymbol{\theta};\phi)||p(\boldsymbol{\theta})).}_{ELBO} \end{gather*}$$` - The term that is maximized to find `\(\phi^*\)` is also called the evidence lower bound (ELBO) ] --- # Bayesian Pop. PK: .font70[Automatic Differentiation Variational Inference] .font110[ - VI commonly uses a 'mean-field' approximation which assumes mutual independence between each of the `\(\ell\)` marginal distributions in the specified variational distributional family, `\(q(\boldsymbol{\theta};\phi)=\Pi_{\ell}q_{\ell}(\theta_{\ell};\phi_{\ell})\)` ] <!-- Then, a coordinate ascent algorithm iteratively optimizes each marginal factor, `\(q_j(\theta_j;\phi_j)\)`, using complete conditional densities given all the other parameters `\(\theta_{-j}\)` and the observed data. --> .font110[ - Early work on VI focused on deriving closed form solutions for the optimal updates for special cases such as conjugate and exponential family models. - deriving and coding the optimal updates can be time-consuming; requires alternative methods for models outside of the special cases - undercuts the aim of quickly iterating over multiple models without constraints on model and prior parameterization ] .font110[ - Kucukelbir et al. (2017) developed a VI method called automatic differentiation variational inference (ADVI) that combines advances in probabilistic programming and automatic differentiation - more accessible: avoids derivation of complete conditionals/optimal updates for each new model - more general: requires only a differentiable probability model (compared to conjugacy or exponential family requirements). ] --- # Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model] <!-- remove? --> Ratio of `\(C_{max}\)` predictions <img src="fig/bayes_pk_advi/poppk1cpt_sim_d_ind_Cmax_med_ratio_plt.png" width="48%" /><img src="fig/bayes_pk_advi/poppk1cpt_sim_d_ind_Cmax_ci_width_ratio_plt.png" width="48%" /> --- # Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model] Population predictions - dense sampling <!-- One-compartment PK model simulations - observed concentrations and population predicted concentrations for the first simulated dataset with dense sampling and weak priors. --> <img src="dissertation_pres_files/figure-html/ppred-3-1.png" width="67%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font60[Simulation Results One-compartment IV infusion model] Population predictions - sparse sampling <!-- One-compartment PK model simulations - observed concentrations and population predicted concentrations for the first simulated dataset with sparse sampling and weak priors. --> <img src="dissertation_pres_files/figure-html/ppred-4-1.png" width="67%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font70[Case Study Background] .font130[ - Ketorolac is a nonsteroidal anti-inflammatory drug (NSAID) used to treat pain after surgery - Data for ketorolac IV dosing and patient characteristics was gathered from EHRs and combined with remnant specimens collected using opportunistic sampling to determine blood plasma concentration - Concentration samples are sparse so we use the Bayesian population PK framework to combine prior information from several small designed clinical studies with our larger observational dataset - The primary goal is to find a model that adequately describes individual PK time-concentration profiles and parameters ] --- # Bayesian Pop. PK: .font70[Case Study Methods] Based on previous literature ketorolac PK can be described well using a two-compartment model. The two-compartment PK model for multiple IV bolus doses is given by: `$$\begin{gather*} D=\{d_k,t_{d_k}\}\;\; \psi=\{Cl,Q,V_1,V_2\}\\ \alpha = \frac{\frac{Q Cl}{V_2 V_1}}{\beta}\;\;\; \beta = \frac{1}{2}\left[\frac{Q}{V_1}+\frac{Q}{V_2}+\frac{Cl}{V_1} - \sqrt{\left(\frac{Q}{V_1}+\frac{Q}{V_2}+\frac{Cl}{V_1} \right)^2} - 4 \frac{Q Cl}{V_2 V_1} \right]\;\;\; A = \frac{1}{V_1}\frac{\alpha - \frac{Q}{V_2}}{\alpha-\beta}\;\;\; B = \frac{1}{V_1}\frac{\beta - \frac{Q}{V_2}}{\beta-\alpha}\\ f(D,\psi,t)=\sum_{k=1}^{K}d_k(A e^{-\alpha (t-t_{d_k})} + B e^{-\beta (t-t_{d_k})}), \end{gather*}$$` where `\(d_k\)` and `\(t_{d_k}\)` are the amounts and administration times, respectively, for the `\(k=1,\ldots,K\)` doses. Using this compartmental model, the stage 1 model for observed concentration `\(y_{ij}\)` for individual `\(i\)` at time `\(j\)` including combined proportional and additive error can be defined as: `$$\begin{gather*} y_{ij}=f(D_i,\psi_{i},t_{ij})\times(1+\varepsilon_{ij}^{prop})+\varepsilon_{ij}^{add}\\ \varepsilon_{ij}^{prop} \sim N(0,\sigma_{prop}^2)\;\;\; \varepsilon_{ij}^{add} \sim N(0,\sigma_{add}^2) \end{gather*}$$` --- # Bayesian Pop. PK: .font70[Case Study Methods] For stage two, we use a model with fixed allometric scaling to account for different weights: `$$\begin{gather*} \log(Cl_i) = \log(Cl_{pop}) + 0.75 \times \log\left(\frac{weight_i}{15}\right) + \eta_{Cl_i},\; \eta_{Cl} \sim N(0, \omega_{Cl}^2)\\ \log(Q_i) = \log(Q_{pop}) + 0.75 \times \log\left(\frac{weight_i}{15}\right) + \eta_{Q_i},\; \eta_{Q} \sim N(0, \omega_{Q}^2)\\ \log(V_{1i}) = \log(V_{1\,pop}) + 1 \times \log\left(\frac{weight_i}{15}\right) + \eta_{V_{1i}},\; \eta_{V_1} \sim N(0, \omega_{V_1}^2)\\ \log(V_{2i}) = \log(V_{2\,pop}) + 1 \times \log\left(\frac{weight_i}{15}\right) + \eta_{V_{2i}},\; \eta_{V_2} \sim N(0, \omega_{V_2}^2)\\ \{\eta_{Cl},\eta_{Q},\eta_{V_1},\eta_{V_2}\} =\boldsymbol{\eta} \sim N_4(0,\Omega) \end{gather*}$$` <!-- The stage 2 model with allometric scaling and maturation factor is identical except for the clearance model: `$$\begin{gather*} \log(Cl_i) = \log(Cl_{pop}) + 0.75 \times \log\left(\frac{weight_i}{15}\right) + \log\left(\frac{1}{1+(TM_{50}/pma_i)^{Hill}}\right)+ \eta_{Cl_i} \end{gather*}$$` --> --- # Bayesian Pop. PK: .font70[Case Study Methods] We use an informative prior based on previously published pediatric ketorolac studies and literature on PK scaling for children <img src="dissertation_pres_files/figure-html/cs-priors-1.png" width="65%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font70[Case Study Results] .font150[ The 406 subjects in ketorolac dataset were separated into a training dataset with 320 subjects used to fit the models and a validation dataset with 86 subjects For the model with allometric scaling and maturation factor: - Mean-field ADVI algorithm took around *6 minutes and 33 seconds* to converge - MCMC took *341 minutes* (5 hours, 41 minutes) for 4 parallel chains running concurrently with 4000 warmup iterations and 250 sampling iterations ] --- # Bayesian Pop. PK: .font70[Case Study Results] Posterior Population parameter distributions <img src="dissertation_pres_files/figure-html/cs-param-1.png" width="60%" style="display: block; margin: auto;" /> <!-- <img src="dissertation_pres_files/figure-html/cs-param2-1.png" width="60%" style="display: block; margin: auto;" /> --> --- # Bayesian Pop. PK: .font70[Case Study Results] Individual posterior clearance `\((Cl_i)\)` distributions for 20 individuals from training dataset <img src="dissertation_pres_files/figure-html/cs-ind-1.png" width="60%" style="display: block; margin: auto;" /> <!-- case_study_mod2_pr2_both_indCl_temp.pdf --> --- # Bayesian Pop. PK: .font70[Case Study Results] <img src="dissertation_pres_files/figure-html/cs-ipred-1.png" width="80%" style="display: block; margin: auto;" /> <!-- case_study_mod2_pr2_both_ipred_temp.pdf --> --- # Bayesian Pop. PK: .font70[Case Study Results] <img src="dissertation_pres_files/figure-html/cs-ppred-1.png" width="80%" style="display: block; margin: auto;" /> <!-- case_study_mod2_pr2_both_ppred_val.pdf --> --- # Bayesian Pop. PK: .font70[Case Study Results] <img src="dissertation_pres_files/figure-html/cs-ovp-1.png" width="55%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font70[Case Study Results] <!-- move to supplemental slides --> <img src="dissertation_pres_files/figure-html/cs-recovar-1.png" width="50%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font70[Case Study Results] <!-- move to supplemental slides --> <img src="dissertation_pres_files/figure-html/cs-vpc-1.png" width="50%" style="display: block; margin: auto;" /> --- # Bayesian Pop. PK: .font70[Additional Discussion] ADVI compares favorably with other possible approximations - Maximum a posteriori (MAP) estimation provides very fast point estimates, but doesn't account for uncertainty, can be very inaccurate in hierarchical models - Laplace approximation also uses optimization and estimates covariances using the inverse Hessian of the optimization objective function, but makes stronger distributional assumptions and is less flexible than ADVI -- There are several potential solutions to underestimated population parameter uncertainty - Replace the mean-field assumption with 'full-rank' VI, allow correlation between marginal variational distributions `\(q_{\ell}(\theta_{\ell}; \phi_{\ell})\)`; increases complexity and computation time, full-rank ADVI can provide worse results than mean-field ADVI - Correct mean-field VI variance estimates using linear response methods<sup>3</sup>; increased computation time and requires unbiased mean estimates .font70[ .footnote[[3] Giordano et al. (2018). Covariances, Robustness, and Variational Bayes. *Journal of Machine Learning Research*, 19, 1-49.] ] --- # Dex. Pop. PK: .font70[Frequentist Analysis Results] - Population parameter estimates and 95% confidence intervals: - .font90[Total clearance (*CL<sub>pop</sub>*) 27.3 L/hr (24.0 – 31.1 L/hr)] - .font90[Central compartment volume of distribution (*V1<sub>pop</sub>*) 161 L (139 – 187 L)] - .font90[Intercompartmental clearance (*Q<sub>pop</sub>*) 26.0 L/hr (22.5 – 30.0 L/hr)] - .font90[Peripheral compartment volume of distribution (*V2<sub>pop</sub>*) 7903 L (5617 – 11119 L)] - .font90[Postmenstrual age when 50% of adult clearance is achieved (*TM<sub>50</sub>*) 42.0 weeks (41.5 – 42.5 weeks)] - .font90[Hill coefficient 7.04 (6.99 – 7.08)] --- # Dex. Pop. PK: .font70[Bayesian Reanalysis Priors] <img src="dissertation_pres_files/figure-html/bayesdexprior-1.png" width="70%" style="display: block; margin: auto;" /> .font70[ .footnote[Wiczling et al. (2016). The Pharmacokinetics of Dexmedetomidine during Long-term Infusion in Critically Ill Pediatric Patients: a Bayesian approach with informative priors. *Journal of Pharmacokinetics and Pharmacodynamics*, 43, 315-324.] ] --- # Dex. Pop. PK: .font70[Bayesian Reanalysis Priors] <img src="dissertation_pres_files/figure-html/bayesdexpriorpred-1.png" width="100%" /> .font70[ .footnote[Wiczling et al. (2016). The Pharmacokinetics of Dexmedetomidine during Long-term Infusion in Critically Ill Pediatric Patients: a Bayesian approach with informative priors. *Journal of Pharmacokinetics and Pharmacodynamics*, 43, 315-324.] ] --- # Dex. Pop. PK: .font70[Bayesian and Frequentist Analysis Comparison] .pull-left.w40[ .font110[ - Frequentist model used time-varying weight and simplified variance structure; Bayesian model used baseline weight and more flexible variance structure - Frequentist model found no significant genotype effects; Bayesian model found *CYP2A6* improved predictive accuracy - Bayesian model includes more 'subjects' through prior - Frequentist model used BICc which favors parsimony; Bayesian model used `\(lppd_{cv}\)` which favors predictive accuracy ] ] .pull-right.w60[ <img src="dissertation_pres_files/figure-html/bayesfreqdex-1.png" width="100%" style="display: block; margin: auto;" /> ]