copula package

# load packages
libs<-c("copula","scatterplot3d")
invisible(lapply(libs, library, character.only = TRUE))

Elliptical and Archimedean copulas can be defined using the copula class.

Elliptical Copulas

First, we create a normalCopula object with 2 dimensions and (exchangeable) correlation of \(\rho=0.4\)

set.seed(1)

# create normalCopula object with 2 dimensions and exchangeable correlation
myCop.norm <- ellipCopula(family="normal", dim=2, dispstr="ex", param = 0.4)

# generate random draws from copula
u <- rCopula(1000, myCop.norm)

After generating \(n=1000\) random draws from the Normal copula, we check the correlation and produce a scatter plot of the draws

# check correlation
cor(u)
##           [,1]      [,2]
## [1,] 1.0000000 0.3995705
## [2,] 0.3995705 1.0000000
# scatter plot
plot(u[,1],u[,2], main="1000 draws from Normal copula")

We can also produce contour and perspective plots of the copula density and distribution functions

# contour and perspective plot of density function
contour(myCop.norm,dCopula, main="contour plot of Normal copula density")

persp(myCop.norm,dCopula, main="perspective plot of Normal copula density")

# contour and perspective plot of distribution function
contour(myCop.norm,pCopula, main="contour plot of Normal copula distribution")

persp(myCop.norm,pCopula, main="perspective plot of Normal copula distribution")

Another common elliptical copula is the \(t\). We create a tCopula model with 8 degrees of freedom and 3 dimensions and specify a banded (Toeplitz) correlation structure with parameters \(\rho_1=0.8\) and \(\rho_2=0.5\).

# create tCopula object with 3 dimensions and banded (toeplitz) correlation and 8 df
myCop.t <- ellipCopula(family="t", dim=3, dispstr="toep", param = c(0.8, 0.5), df=8)

We draw \(n=1000\) samples from this copula and verify that the marginals are uniform and the correlation is correct

v<-rCopula(1000, myCop.t)

par(mfrow=c(1,3), mar=c(2,2,1,1))
hist(v[,1])
hist(v[,2])
hist(v[,3])

#check correlation
cor(v)
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.7963426 0.4895844
## [2,] 0.7963426 1.0000000 0.7870277
## [3,] 0.4895844 0.7870277 1.0000000

We can also produce a 3-d scatterplot of a random subset the samples (to avoid overcrowding the plot) and evaluate the copula density and distribution at the sampled values.

# scatter plot
par(mfrow=c(1,1))
subsamp <- sample(1:nrow(v),100)
scatterplot3d(v[subsamp,], type="h")

#evaluate density
head(dCopula(v,myCop.t), 10)
##  [1] 0.2684407 4.3166171 9.8703657 4.4894340 2.0922633 1.5403350 0.6653583
##  [8] 3.5466681 0.2271933 3.1185452
#evaluate distribution P(U1 <= u1, U2 <= u2, U3 <= u3)
head(pCopula(v,myCop.t), 10)
##  [1] 0.01283143 0.67535588 0.02485953 0.11549992 0.58492358 0.19596911
##  [7] 0.19517213 0.10670446 0.37418632 0.03529708

Archimedean Copulas

Archimedean copulas are another popular choice. We create 3 dimensional Clayton copula with \(\alpha=2\) and a 2 dimensional Frank copula with \(\alpha=5\).

myCop.clayton <- archmCopula(family = "clayton", dim = 3, param = 2)

myCop.frank <- archmCopula(family = "frank", dim = 2, param = 5)

Contour plots for the Frank copula density and distribution are shown below.

par(mfrow=c(1,2))
contour(myCop.frank, dCopula, main="contour plot of Frank copula density", box01=FALSE)
contour(myCop.frank, pCopula, main="contour plot of Frank copula distribution", box01=FALSE)

Building multivariate distributions using copulas

Given a copula and marginal distribution functions, the mvdc class defines a multivariate distribution function.

We create a multivariate distribution function by combining 3 normal marginal distributions (\(N(0,2^2)\), \(N(0,1^2)\), and \(N(1,3^2)\)) with a 3-dimensional gumbel copula with \(\alpha=3\). We then draw 1000 samples from this mulivariate distribution and check the marginal means and sds.

myMvd <- mvdc(copula = archmCopula(family="gumbel", dim=3, param=3), 
              margins = c("norm", "norm", "norm"),
              paramMargins = list(list(mean=0, sd = 2), 
                                  list(mean=0, sd = 1),
                                  list(mean=1, sd = 3)))

# generate random draws from multivariate distribution 
w<-rMvdc(1000, myMvd)

# check marginal mean and sd
apply(w,2, function(x) c(mean(x),sd(x)) )
##              [,1]        [,2]      [,3]
## [1,] -0.004235106 0.001274415 0.9862673
## [2,]  1.931454581 0.971024048 2.9104349

In addition, we can evaluate the density and distribution at the sampled points

# evaluate density
head(dMvdc(w,myMvd), 10)
##  [1] 0.060681635 0.050706591 0.039972322 0.015332861 0.002910626
##  [6] 0.011808811 0.031937764 0.022124490 0.007403489 0.012785177
# evaluate distribution
head(pMvdc(w,myMvd), 10)
##  [1] 0.71825269 0.48390858 0.18002531 0.19355349 0.66116464 0.98823123
##  [7] 0.69735867 0.56023835 0.33979564 0.09292659

Next, we make a 2-dimensional distribution from normal marginals (\(N(0,3)\) and \(N(0,4)\)) and Clayton copula with \(\alpha = 2\) and show the contour and perspective plots of the density and distribution.

myMvd1 <- mvdc(copula = archmCopula(family = "clayton", param = 2), 
              margins = c("norm", "norm"),
              paramMargins = list(list(mean=0, sd = 3), 
                                  list(mean=0, sd = 4)))

# contour and perspective plots of density
contour(myMvd1, dMvdc, xlim=c(-10,10), ylim=c(-10,10), nlevels=10, 
        main="contour plot of 2-d density function\n (Clayton copula, Normal marginals)")

persp(myMvd1, dMvdc, xlim=c(-10,10), ylim=c(-10,10), 
      main="perspective plot of 2-d density function\n (Clayton copula, Normal marginals)")

# contour and perspective plots of distribution
contour(myMvd1, pMvdc, xlim=c(-10,10), ylim=c(-10,10), nlevels=10,
        main="contour plot of 2-d distribution function\n (Clayton copula, Normal marginals)")

persp(myMvd1, pMvdc, xlim=c(-10,10), ylim=c(-10,10),
      main="perspective plot of 2-d distribution function\n (Clayton copula, Normal marginals)")

Fitting copula models

The copula package provides the functions loglikCopula() and loglikMvdc() to compute the log-likelihood of the copula and the multivariate copula-based distribution function. fitCopula() and fitMvdc() perform maximum likelihood estimation.

# simulate data
Mvd <- mvdc(copula = ellipCopula(family="normal", param=0.5), 
            margins = c("gamma","gamma"),
            paramMargins = list(list(shape=2, scale=1),
                                list(shape=3, scale=2)))
dat <- rMvdc(200, Mvd)

# 'empty' mvdc object, i.e. parameters not specified 
Mvd0 <- mvdc(copula = ellipCopula(family="normal"), 
            margins = c("gamma","gamma"),
            paramMargins = list(list(shape=NA, scale=NA),
                                list(shape=NA, scale=NA)))

Mvd1 <- mvdc(copula = ellipCopula(family="normal", param=0.8), 
            margins = c("gamma","norm"),
            paramMargins = list(list(shape=4, scale=3),
                                list(mean=0, sd=1)))

# loglikelihood w/ correct parameter values
loglikMvdc(c(2,1,3,2,0.5),dat,Mvd)
## [1] -788.8245

We calculate the MLEs …

Only the distributional assumptions (form of copula and marginals) of the mvdc class object are used when passed to fitMvdc(); the parameter values from the mvdc object are not used

# method of moments estimates to start optimization
mm <- apply(dat,2,mean)
vv <- apply(dat,2,var)
b1.0 <- c(mm[1]^2/vv[1],vv[1]/mm[1])
b2.0 <- c(mm[2]^2/vv[2],vv[2]/mm[2])
a.0 <- sin(cor(dat[,1], dat[,2], method="kendall")*pi/2)
start <- c(b1.0, b2.0, a.0)

# get MLE
fit <- fitMvdc(dat, Mvd, start = start, 
               optim.control = list(trace=TRUE, maxit = 2000))
## initial  value 787.745363 
## final  value 787.113186 
## converged
## initial  value 787.113186 
## final  value 787.113186 
## stopped after 1 iterations
fit
## Call: fitMvdc(data = dat, mvdc = Mvd, start = start, optim.control = list(trace = TRUE, 
##     maxit = 2000))
## Maximum Likelihood estimation based on 200 2-dimensional observations.
## Copula:  normalCopula 
## Margin 1 :
## m1.shape m1.scale 
##    2.053    1.011 
## Margin 2 :
## m2.shape m2.scale 
##    3.427    1.713 
## Copula: 
##  rho.1 
## 0.4947 
## The maximized loglikelihood is -787.1 
## Optimization converged
# w/ 'empty' mvdc object
fit0 <- fitMvdc(dat, Mvd0, start = start, 
               optim.control = list(trace=TRUE, maxit = 2000))
## initial  value 787.745363 
## final  value 787.113186 
## converged
## initial  value 787.113186 
## final  value 787.113186 
## stopped after 1 iterations
fit0
## Call: fitMvdc(data = dat, mvdc = Mvd0, start = start, optim.control = list(trace = TRUE, 
##     maxit = 2000))
## Maximum Likelihood estimation based on 200 2-dimensional observations.
## Copula:  normalCopula 
## Margin 1 :
## m1.shape m1.scale 
##    2.053    1.011 
## Margin 2 :
## m2.shape m2.scale 
##    3.427    1.713 
## Copula: 
##  rho.1 
## 0.4947 
## The maximized loglikelihood is -787.1 
## Optimization converged
# misspecified
fit1 <- fitMvdc(dat, Mvd1, start = start, 
               optim.control = list(trace=TRUE, maxit = 2000))
## initial  value 1201.877971 
## iter  10 value 810.919579
## iter  20 value 800.683656
## iter  20 value 800.683651
## iter  20 value 800.683651
## final  value 800.683651 
## converged
## initial  value 800.683651 
## final  value 800.683651 
## stopped after 1 iterations
fit1
## Call: fitMvdc(data = dat, mvdc = Mvd1, start = start, optim.control = list(trace = TRUE, 
##     maxit = 2000))
## Maximum Likelihood estimation based on 200 2-dimensional observations.
## Copula:  normalCopula 
## Margin 1 :
## m1.shape m1.scale 
##    2.053    1.011 
## Margin 2 :
## m2.mean   m2.sd 
##   5.870   3.034 
## Copula: 
##  rho.1 
## 0.4829 
## The maximized loglikelihood is -800.7 
## Optimization converged

For large \(p\), we can use an alternative two-stage strategy called inference functions for margins (IFM). In step one IFM estimates the marginal parameters, \(\beta\). In step two, estimates of the copula association parameters \(\alpha\) given \(\hat{\beta}\) are produced. Also called two-stage parametric ML method in censored data setting Note that the SE of \(\hat{\alpha}\) is an underestimate since the variability in \(\hat{\beta}\) is not corectly accounted for.

We can obtain a consistent estimate of \(\alpha\) using canonical ML (CML) which uses the empirical CDFs of each marginal to transform the observations into pseudo-observations. The pseudo-observations are used to estimate \(\hat{\alpha}_{CML}\)

TO incorporate covariates into the margins, we need the log-likelihood function which is the summation of the copula log-likelihood and the log-likelihood of all marginals

(ref equation 9)

# example from appendix A