Structuring Markov Models for Multidimensional Health Outcomes
Construction of a Decision-Analytic Markov Model Using Global Burden of Disease Data
CEA
Author
John Graves
Published
December 15, 2023
Support for this work is graciously acknowledged from the Data to Policy initiative administered by Vital Strategies and funded by Bloomberg Philanthropies and the CDC Foundation.
A Markov Trace-Based approach that separately tracks deaths from disease and from other non-disease-related causes, and uses differences in disease-related deaths across cycles to calculate years of life lost to disease (YLLs).
A Transition States approach that is similar to Markov trace approach, but that draws on a death tracking state to calculate YLLs.
A Markov Chain with Rewards approach to Healthy Longevity as developed in Caswell and Zarulli (2018) and Caswell and van Daalen (2021).
Our model will be flexibly parameterized such that practitioners can draw on the provided code to model any disease or country/region represented in the GBD data. For our purposes here, however, we will focus on a general cardiovascular disease model for the United Kingdom. We will also focus on a model for the full population—though the GBD data also allow us to separately model mortality and disease-specific incidence and death separately for males and females.
A state (“bubble”) diagram of the basic model structure is provided in Figure 1:
life_table_year =2019location ="United Kingdom"cause_id =491# CVDsex ="Both"radix =100000# Population basis for the construction of life tables
Data Sources
As noted above, all data and parameters (unless assumed) are drawn from the 2019 Global Burden of Disease study. Citations for specific datasets used are provided in Table 1.
Global Burden of Disease Collaborative Network (2020a)
Process GBD Data
Construct a Cause-Deleted Life Table
For a disease as prevalent as CVD, it is important that background mortality in our model net out all CVD-related deaths—otherwise, we will overcount CVD deaths, as they would be captured in a separate death-from-CVD rate, and in the background mortality rate.
To construct a cause-deleted life table we will use the GBD query tool to extract an estimate of the percentage of deaths from CVD for various (5-year) age groups. We will then apply this percentage to deaths in the overall life table to obtain an estimate of the number of deaths from CVD. We net this from total deaths and reconstruct the life table using a Multiple Decrements approach.
Life Table Data
The GBD life table data are arranged by age group, so we need to download a separate file for each group, and extract the relevant location and year.
The overall life table for our country of interest is shown in Table 2. Relevant columns are described in Table 3.
Construct Overall Life Table
lt <- df_life_table %>%select(location_name,location_id,life_table) %>%unnest(cols =c(life_table)) %>%select(location_name, location_id, age, ex, qx) %>%mutate(age_diff =c(1,lead(diff(age)))) %>%# Age difference in age groupmutate(mx =-log(1- qx)/age_diff) %>%# Mortality rateselect(location_name, location_id, age, ex, qx, mx) %>%mutate(p =1- qx) %>%# Survival probabilitymutate(cump =ifelse(row_number() ==1, 1, lag(cumprod(p)))) %>%# Cumulative survivalmutate(lx = radix * cump) %>%# At riskmutate(dx = lx * qx) %>%# Deaths = at risk * probability of deathas.data.table()setkey(lt, age)lt %>%select(-location_id) %>%kable(digits =3) %>%kable_styling()
Table 2: Overall Life Table
location_name
age
ex
qx
mx
p
cump
lx
dx
United Kingdom
0
81.070
0.004
0.004
0.996
1.000
100000.000
350.925
United Kingdom
1
80.355
0.001
0.000
0.999
0.996
99649.075
59.041
United Kingdom
5
76.401
0.000
0.000
1.000
0.996
99590.034
35.692
United Kingdom
10
71.428
0.000
0.000
1.000
0.996
99554.341
43.338
United Kingdom
15
66.458
0.001
0.000
0.999
0.995
99511.003
120.053
United Kingdom
20
61.535
0.002
0.000
0.998
0.994
99390.950
189.262
United Kingdom
25
56.647
0.002
0.000
0.998
0.992
99201.688
227.279
United Kingdom
30
51.771
0.003
0.001
0.997
0.990
98974.409
311.052
United Kingdom
35
46.926
0.004
0.001
0.996
0.987
98663.357
437.835
United Kingdom
40
42.124
0.007
0.001
0.993
0.982
98225.522
669.101
United Kingdom
45
37.394
0.010
0.002
0.990
0.976
97556.421
984.145
United Kingdom
50
32.748
0.015
0.003
0.985
0.966
96572.276
1489.166
United Kingdom
55
28.219
0.024
0.005
0.976
0.951
95083.110
2312.887
United Kingdom
60
23.856
0.039
0.008
0.961
0.928
92770.223
3601.522
United Kingdom
65
19.712
0.060
0.012
0.940
0.892
89168.701
5376.422
United Kingdom
70
15.805
0.098
0.021
0.902
0.838
83792.279
8207.582
United Kingdom
75
12.233
0.160
0.035
0.840
0.756
75584.698
12084.596
United Kingdom
80
9.055
0.270
0.063
0.730
0.635
63500.101
17118.564
United Kingdom
85
6.434
0.433
0.113
0.567
0.464
46381.538
20064.172
United Kingdom
90
4.426
0.637
0.202
0.363
0.263
26317.366
16755.052
United Kingdom
95
3.100
0.799
0.320
0.201
0.096
9562.313
7636.243
United Kingdom
100
2.226
0.904
0.469
0.096
0.019
1926.070
1741.274
United Kingdom
105
1.720
0.956
0.623
0.044
0.002
184.796
176.607
United Kingdom
110
1.452
1.000
0.000
0.000
8.189
8.189
Table 3: Life Table Parameters
Parameter
Description
ex
Remaining life expectancy
qx
Probability of death in age group
mx
Death rate in age group
p
Survival probability (1-qx)
cump
Cumulative survival probability
lx
Number at risk (set at the radix parameter value for the first age group)
dx
Number of deaths in age group
Note the life expectancy at age 0 (ex in the table). We’ll come back to this later, when we benchmark life expectancy in our model vs. the life table value.
GBD Cause of Death and Disease Incidence Data
To construct cause-deleted life tables we need an estimate of the number of deaths (dx) due to CVD. And to model transitions from the Healthy state to the CVD health state we’ll need disease incidence rates. To obtain these data we’ll draw on the GBD data tool to extract the percentage of deaths due to CVD, and the CVD incidence rate.1
1 Note you need an account with login and password to download these data.
The specific data query used for this example is provided in Figure 2, and the specific cause-of-death percentages for the country/region of interest are shown in Table 4 and Table 5. Code for constructing the cause-deleted life table will be provided later in this document.
Download and process GBD CVD incidence and cause of death data.
Our next step is to parameterize the model. Note that for the matrix calculus method (Approach 3) we need to separately track transient (Healthy, CVD) and absorbing (background mortality, cause-specific mortality) health states, so we do that in the parameters here.
We will structure our model to consider outcomes under four different strategies:
A natural history model that captures the status quo.
A prevention strategy that reduces CVD incidence.
A treatment strategy that reduces CVD-related death.
A prevention and treatment strategy that combines (2) and (3) above.
# Parameterizeparams <-list(# Treatment Strategiestx_names =c("natural_history","prevent","treat","prevent_treat"), # treatment namesn_tx =4, # number of treatment strategiestr_names =c("Healthy","CVD"), # transient health statesab_names =c("DeathOC","DeathCVD"), # absorbing health statesn_states =4, # total number of health statesdf_gbd = df_gbd, # disease incidence and cause-of-death datalife_table = lt, # overall life tablehorizon =120, # model time horizon (in years) r_ann =1e-6, # annual discount rateDelta_t =1, # time step (1 = yearly, 1/12 = monthly, etc.)age0 =0, # initial age in models0T =c(1,0,0,0), # initial state occupancy # c(1,0,0,0) means the modeled cohort starts off heatlhyuCVD =0.773, # CVD utility payoff valueuD =0, # Death utility payoff valueuH =1, # Heatlhy utility payoff valuedw =0.041, # CVD disability weightc_H =0,c_Prevent =10, # Cycle cost of prevention strategy c_Treat =500, # Cycle cost of treatment strategyc_CVD =500, # Cycle cost of CVD health statec_DeathCVD =25000, # Transition cost of death from CVDhr_treat =0.85, # Hazard Ratio for Treatment Strategyhr_prevent =0.9, # Hazard Ratio for Prevention Strategy# GBD Calibration parameters (helps calibrate to reported prevalence by age)rr_deathCVD =1.1, rr_incidence =0.75,cycle_correction ="half-cycle",ExR =# Reference life table from GBD tibble::tribble(~Age, ~Life.Expectancy,0L, 88.8718951,1L, 88.00051053,5L, 84.03008056,10L, 79.04633476,15L, 74.0665492,20L, 69.10756792,25L, 64.14930031,30L, 59.1962771,35L, 54.25261364,40L, 49.31739311,45L, 44.43332057,50L, 39.63473787,55L, 34.91488095,60L, 30.25343822,65L, 25.68089534,70L, 21.28820012,75L, 17.10351469,80L, 13.23872477,85L, 9.990181244,90L, 7.617724915,95L, 5.922359078 ))params <-with(params,{modifyList(params,list(omega = horizon/Delta_t, # Total number of cyclesr_Delta_t = r_ann * Delta_t, # Cycle discount rateages = (0:(horizon/Delta_t))*Delta_t + age0, # Age in each cycle# Approximation function for reference life table life expectancies:fExR =function(x) pmax(0,unname(Hmisc::approxExtrap(ExR$Age, ExR$Life.Expectancy,xout = x)$y)) )) })# First cycle will be 1,0,0,0 so need to slice off one age for # construction of transition matrices, but leave one# for the trace.params$ages_trace <- params$agesparams$ages <- params$ages[-length(params$ages)]
Define Functions for Time Inhomogeneous Parameters
Nearly all health state progression parameters in our model are age-depdendent. We therefore need to define functions for each transition rate that are a function of the age of the cohort in the cycle. The functions below draw on the GBD data above to estimate age-specific transition rates for the following transitions:
rH_CVD(age) = Healthy to CVD (uses a slightly calibrated GBD disease incidence rate).
rCVD_D(age) = CVD to death from CVD (uses the cause-specific mortality rate from a cause-deleted life table).
rD(age) = Death from other causes (uses cause-deleted mortality rate from a cause-deleted life table).
For this model, we will simply draw rates directly from the GBD life tables and other data. A limitation of this approach is that ages are grouped into five-year buckets—meaning that the mortality rate will be held constant within an age group.
2 To explore more on parametric mortality modeling, see our interactive tutorial here.
The leftmost panel in Figure 3 shows the observed vs. fitted log(mortality rate) for our population, while the rightmost panel shows the residual. The residuals show clearly that the model fits mortality rates well up until about age 80, at which point the modeled mortality rate overstates the death rate until roughly age 95—after which the modeled mortality rate is well below the observed rate.
In the code below, we provide alternative functions to draw on the modeled mortality rates to return age-specific death transition rates—though note that in our results below, we continue to draw directly off the GBD data.
Using the functions defined above, we can easily extract transition rates for a given age. Echoing the comments above, note that the basic function (rD(age)) uses grouped mortality rate data, whereas the function based on the Heligman-Pollard model yields different rates for each age:
# Rate of non-cause death for a 75 and 76 year-oldrD(75)
[1] 0.02366835
rD(76)
[1] 0.02366835
# Rate of non-cause death for a 75 and 76 year-old based on Heligman Pollard modelrD_p(75, params = params)
[1] 0.02761481
rD_p(76, params = params)
[1] 0.03007687
Approach 1: Markov Trace
With the basic parameters and parameter functions defined, we can now move to our first approach. Under this approach, we will separate cause-specific and non-CVD deaths as separate health states. The updated state transition diagram is shown in Figure 4:
Figure 4: State Transition Diagram for Approach 1
Building out this model structure requires constructing a series of transition probability matrices.
Before we proceed, it is useful to stop and check whether the experience of our modeled cohort mirrors the experience of the population as summarized in the GBD input data sources.
In Figure 5 below, we calculate and plot the percent of the cohort surviving at each age (1 - DeathOC - DeathCVD) and overlay the survival probability from the population life table used to parameterize the model. As the figure shows, our modeled cohort nearly exactly matches the survival experience of the population.
Our next checkpoint is designed to assess whether CVD prevalance at any age matches the prevalance value in the GBD data. Figure 6 below shows this using the parameters above.
To calculate YLDs in a given cycle, we multiply state occupancy by a disability weight payoff vector \(\mathbf{d}\),
\[
\mathbf{d}=\left[\begin{array}{c}
0 \\
\text{dw}_{CVD} \big ( \frac{1}{r_{\Delta_t}}(1-e^{-r_{\Delta_t}}) \Delta_t \big ) \\
0 \\
0
\end{array}\right]
\] where \(\texttt{dwS1}\) and \(\texttt{dwS2}\) are the disability weights for the Sick and Sicker states, respectively.
Years of Life Lost (YLL)
YLLs are based on the present value of remaining life expectancy among deaths that occur in each cycle \(t\). Define \(a(t)\) as the age of the cohort at cycle \(t\), i.e., \(a(t) = t \cdot \Delta t + a0\), where \(a_0\) is the age of the cohort at \(t=0\). Define \(e(t)=e(a(t))\) as the present value of remaining life expectancy at cycle \(t\). Following the GBD continuous time discounting approach, \(e(a(t))\) is given by
where \(Ex(a)\) is the remaining life expectancy at age \(a\). \(Ex(a)\) is drawn from either an exogenous or an endogenous life table, depending on the objectives of the modeling exercise (Anand and Reddy 2019).
We next define a remaining life expectancy payoff vector at cycle \(t\):
Approach 3: Healthy Longevity (Markov Chain with Rewards)
This approach adapts the protocol in Caswell and van Daalen (2021) to calculate, YLL, YLD, and DALY outcomes using a Markov chain with rewards. The approach is quite math-heavy—though the code provided below is generalized enough that it can be adapted for nearly any decision analytic model.
The healthy longevity method proposed by Caswell and van Daalen (2021)Caswell and Zarulli (2018) are quite powerful, allowing for estimation of expected values as well as other higher-level moments in an outcome distribution (e.g., variance, skewness, etc.). To keep exposition as simple as possible, we focus here only on expected values—though note that formulas for calculating other moments can be found in the original studies.
Define
\[
\begin{aligned}
\tau &= \text{Number of transient (non-absorbing) states}\\
\alpha &= \text{Number of absorbing states}\\
\omega &= \text{Number of cycles} \\
s &= \text{Total number of states; }s=\tau\omega+\alpha \\
\mathbf{U}_{x} &= \text{Transition matrix for age }x, \text{for }x=1,\dots,\omega\\
\mathbf{D}_{j} &=\text{Age advancement matrix for stage }j, \text{for }j=1,\dots,\tau \\
\mathbf{M}_{i} &= \text{Mortality matrix for age class }x, \text{for } x = 1,\dots\omega \\
\mathbf{K} &= \text{vec-permutation matrix; parameters }\tau,\omega
\end{aligned}
\]
In the above notation, the matrix \(\mathbf{U}_x\) captures transition probabilities among transient (i.e., non-absorbing) health states, while \(\mathbf{M}_x\) captures transitions from transient health states to the absorbing death states (non-CVD mortality and CVD-mortality). Indexing by age class \(x\) indicates that separate matrices are defined for each age in the Markov model—that is, our Markov model is time inhomogeneous.
To construct \(\mathbf{U}_x\) and \(\mathbf{M}_x\) we define transition rate (“intensity”) matrices as in Approaches 1 and 2 above.3 The overall intensity matrix \(\mathbf{Q_x}\) is given by
3 The only difference with the two approaches above is that the rows in these rate matrices correspond to the final state, while the columns correspond to the starting state; this is the opposite of the rate matrices defined above, where the rows corresponded to the starting health state and the columns to the ending health state in a given cycle.
\[
\mathbf{Q}_x=\left(\begin{array}{c|c}
\mathbf{V}_x & \mathbf{0} \\
\hline \mathbf{S}_x & \mathbf{0}
\end{array}\right)
\] where \(\mathbf{V}_x\) is the rate matrix for the transitory (i.e., non-absorbing) states and \(\mathbf{S}_x\) is the rate matrix for the absorbing states. The diagonal elements of \(\mathbf{Q}_x\) are the negative sum of the non-diagonal column elements, thus making the column sums of \(\mathbf{Q}_x\) zero.
For the defined time step \(\Delta_t\), the discrete time transition probability matrix \(\mathbf{P}_x\) is obtained by taking the matrix exponential of the intensity matrix (\(\mathbf{Q}_x\)) multipled by the time step (\(\Delta_t\)):
\[
\mathbf{P}_x =e^{\mathbf{Q}_x \Delta t}
\]
We can then obtain \(\mathbf{U}_x\) and \(\mathbf{M}_x\) based on:
In addition, the matrix \(\mathbf{D}_j\) defines age advancement in the Markov chain. Using the example from Caswell and van Daalen (2021), if \(\omega=3\) then
In our implementation, we include the (optional) 1 value in the lower right corner; this assumes that after the last specified age, the cohort continues to experience transitions among health states according to the transition probabilities defined for the last age class. If this value is 0, the model will assume that everyone dies after the last cycle.
Parameterize the Healthy Longevity Model (Part 1)
params3_ <-with(params,modifyList(params,list(alpha =length(ab_names),tau =length(tr_names), s =length(tr_names)*omega +length(ab_names) #total number of states;s=τω+α)))params3_ <-with(params3_,modifyList(params3_,list(# Natural History Transition Rate MatrixmR = ages %>%map(~({ mR_nh =matrix(c(-(rH_CVD(.x) +rD(.x)), rH_CVD(.x), rD(.x), 0 ,0, -(rD(.x)+rCVD_D(.x)), rD(.x),rCVD_D(.x),0,0,0,0,0,0,0,0),nrow =4, ncol =4,byrow=TRUE, dimnames =list(c(tr_names,ab_names),c(tr_names,ab_names) )) mR_prevent =matrix(c(-(hr_prevent *rH_CVD(.x) +rD(.x)), hr_prevent *rH_CVD(.x), rD(.x), 0 ,0, -(rD(.x)+rCVD_D(.x)), rD(.x),rCVD_D(.x),0,0,0,0,0,0,0,0),nrow =4, ncol =4,byrow=TRUE, dimnames =list(c(tr_names,ab_names),c(tr_names,ab_names) )) mR_treat =matrix(c(-(rH_CVD(.x) +rD(.x)), rH_CVD(.x), rD(.x), 0 ,0, -(rD(.x)+ hr_treat *rCVD_D(.x)), rD(.x),hr_treat *rCVD_D(.x),0,0,0,0,0,0,0,0),nrow =4, ncol =4,byrow=TRUE, dimnames =list(c(tr_names,ab_names),c(tr_names,ab_names) )) mR_prevent_treat =matrix(c(-(hr_prevent *rH_CVD(.x) +rD(.x)), hr_prevent *rH_CVD(.x), rD(.x), 0 ,0, -(rD(.x)+ hr_treat *rCVD_D(.x)), rD(.x),hr_treat *rCVD_D(.x),0,0,0,0,0,0,0,0),nrow =4, ncol =4,byrow=TRUE, dimnames =list(c(tr_names,ab_names),c(tr_names,ab_names) ))array(c(as.vector(mR_nh),as.vector(mR_prevent), as.vector(mR_treat),as.vector(mR_prevent_treat)), dim =c(length(tr_names)+length(ab_names),length(tr_names)+length(ab_names),length(tx_names)),dimnames =list(c(tr_names,ab_names),c(tr_names,ab_names),tx_names)) %>%apply(.,3,function(x) t(x), simplify=FALSE) })) )))# not sure why this is needed, but otherwise the length gets set too long...params3 <-with(params3_, modifyList(params3_, list(mR_ = mR %>%transpose())))params3$mR = params3$mR_params3 <-with(params3,modifyList(params3,list(mV = mR %>%map(~({ R = .x R %>%map(~({ m <- .x[tr_names,tr_names] })) })),mQ = mR %>%map(~({ R = .x R %>%map(~({ V = .x[tr_names,tr_names] S = .x[ab_names,tr_names] zero_ <-matrix(0, nrow =length(tr_names)+length(ab_names), ncol =length(ab_names)) tmp <-cbind(rbind(V,S),zero_)dimnames(tmp) <-list(c(tr_names,ab_names),c(tr_names,ab_names)) tmp })) })) )))params3 <-with(params3,modifyList(params3,list(mP = mQ %>%map(~({ Q = .x Q %>%map(~(expm(.x * Delta_t))) })))))params3 <-with(params3,modifyList(params3,list(mU = mP %>%map(~({ P <- .x P %>%map(~(.x[tr_names,tr_names])) })),mM = mP %>%map(~({ P = .x P %>%map(~(.x[ab_names,tr_names])) })))))params3 <-with(params3,modifyList(params3,list(D = {# Create diagonal age advancement matrix D <-matrix(0, omega, omega) vec <-rep(1, omega-1) D[row(D)-1==col(D)] <- vec D[omega,omega] =1 D })))
We next combine the transition matrices (for all age classes) together into a series of block-structured matrices as follows:
where \(\mathbf{K}\) is a permutation matrix known as the vec-permutation matrix.4 See Henderson and Searle (1981) and Appendix B in Caswell and van Daalen (2021) for further information.
4 A function to construct a vec-permultation matrix is provided within the code snippet below.
vec <-# a simple function to return the vec of an arrayfunction(x) { y <-c(x)return(y) }vecperm <-# vecperm# function to calculate the vec permutation matrix K of index m,n# let X be a m x n matrix, and X' the transpose of X# then K satisfies # vec(X') = K*vec(X)function(m, n) { K <-matrix(0, m * n, m * n) a <-matrix(0, m, n)for (i in1:m) {for (j in1:n) { e <- a e[i, j] <-1 K <- K +kronecker(e, t(e)) } }return(K) }params3 <-with(params3,modifyList(params3,list(bbD_ =kronecker(diag(tau), D),bbU_ = mU %>%map(~(bdiag(.x))),K =vecperm(tau, omega))))params3 <-with(params3,modifyList(params3,list(mUtilde = bbU_ %>%map( ~ ({t(K) %*% bbD_ %*% K %*% .x })))))params3 <-with(params3,modifyList(params3,list(mMtilde = mM %>%map(~({do.call(cbind,.x) })) )))params3 <-with(params3,modifyList(params3,list(mPtilde =map2(mUtilde, mMtilde, ~ ({rbind(cbind(.x, matrix(0, tau * omega, alpha)) ,cbind(.y, diag(alpha))) })))))
We now (nearly) have the components needed to calculate outcomes. A key difference in the Healthy Longevity approach, relative to the approaches above, is that we do not calculate outcomes off a trace. Rather, the method utilizes matrix calculus to solve for expected outcomes and other moments in the outcome distribution (e.g., variance, skewness, etc.).5
5 Again, for our purposes here we will focus on expected outcomes—though note that formulas for higher-order moments are provided in Caswell and van Daalen (2021) and Caswell and Zarulli (2018).
To calculate outcomes, we must next define “reward” matrices \(\mathbf{R}_m\), where \(m\) indexes the moment of interest (e.g., expected value, variance, etc.). The structure and values of \(\mathbf{R}_m\) will differ, however, depending on the outcome.
To facilitate how we define rewards (i.e., payoffs), we briefly classify each of our outcomes (LE, YLL, YLD, DALYs, QALYs) into broad classes corresponding to whether the payoff or “reward” applies to occupancy, or to transitions, within or to a given health state:
Classification of Health Outcomes
Outcome
Reward Class
Life Expectancy
Occupancy (1.0 for each alive health state)
Years of life lived with disability (YLD)
Occupancy (disability weight applied to time in CVD state)
Yearls of life lost to disease (YLL)
Transition (remaining life expectancy applied to CVD death transitions)
Disability-Adjusted Life Years (DALYs)
Occupancy (YLD) + Transition (YLL)
Quality-Adjusted Life Years
Occupancy (utility weights applied to living health states)
Occupancy-Based Outcomes
To calculate occupancy-based outcomes, we start with a reward matrix \(\mathbf{H}\), which has dimensions \(\tau \times \omega\) and is structured as shown in Figure 7:
Figure 7: Reward matrix \(\mathbf{H}\)
Cell values within this matrix can be set to one if we want to “reward” that health state-age combination in our outcome measure, and zero otherwise.6
6 This structure allows us, for example, to estimate outcomes for certain age ranges or other decision-relevant combinations of health state and age.
For occupancy-based outcomes, we define \(\mathbf{H}\) such that each cell receives a value of one.
Figure 8: Reward matrix \(\mathbf{H}\)
We use this matrix to define the reward vector \(\mathbf{h}\):7
7 The \(\text{vec}\) operator stacks the columns of an \(m \times n\) matrix into a \(mn \times 1\) vector.
\[
\mathbf{h} = \text{vec } \mathbf{H}
\] We also define \(\neg \mathbf{h}\) as the complement of \(\mathbf{h}\), (i.e., values of 1.0 become 0, and vice versa).
Occupancy-based outcomes with partial rewards (e.g., YLDs, QALYs) require an additional matrix \(\mathbf{V}\), which has the same structure as \(\mathbf{H}\):
If \(\mathbf{S}\) is the set of states that receive a reward (e.g., the CVD state for both QALYs and YLDs), then a cycle spent in state \(i\) at age \(j\) is defined by
\[
\mathbf{V}(i, j)= \begin{cases}E\left[v(i, j)\right] & \text { if }(i, j) \in \mathcal{S} \\ 0 & \text { otherwise }\end{cases}
\] For YLD outcomes, \(\mathbf{V}\) is defined as shown in Figure 9, while for QALY outcomes \(\mathbf{V}\) is defined as shown in Figure 10 .
Figure 9: Reward matrix for YLD Outcome
Figure 10: Reward matrix for QALY Outcome
We simliarly define an occupancy indicator vector \(\mathbf{v}\) just as we did \(\mathbf{h}\):
Like many health economic applications, the Healthy Longevity approach makes assumptions on partial occupancy in a health state.8 Specifically, the approach assumes that partial occupancy in a health state receives half the reward—essentially, we draw on an assumption that mid-cycle transitions occur at the half-way point.9
8 For example, half-cycle corrections are often used—though there are other methdos (e.g., Simpson’s rule) that are also drawn upon.
9 This is similar to the standard “half-cycle” correction, however under this approach, a half-cycle correction occurs in each cycle, not in the first and last cycles. As we will see below, for rewards that are uniform across transient health states (e.g., calculating life expectancy involves a “payoff” of 1.0 for every living health state) this will yield identical answers as a Markov trace-based approach that adopts a half-cycle correction.
We combine this partial ocupancy assumption, along with the vectorized reward matrices \(\mathbf{h}\) and \(\mathbf{v}\) to obtain the following:
We combine \(\tilde{\mathbf{B}}_{1}\) and \(\tilde{\mathbf{C}}_{1}\) to obtain:
\[
\tilde{\mathbf{R}}_{1}=\left(\begin{array}{c|c}
\tilde{\mathbf{B}}_{1} & \mathbf{0} \\
\hline \tilde{\mathbf{C}}_{1} & \mathbf{0}
\end{array}\right)
\] which has same block structure as the transition probability matrix \(\tilde{\mathbf{P}}\).
Expected outcomes are based on
\[
\begin{aligned}
& \tilde{\boldsymbol{\rho}}_{1}=\tilde{\mathbf{N}}^{\top} \mathbf{Z}\left(\tilde{\mathbf{P}} \circ \tilde{\mathbf{R}}_{1}\right)^{\top} \mathbf{1}_{s}
\end{aligned}
\] where \(\tilde{\mathbf{N}}\) is the fundamental matrix
\[
\tilde{\mathbf{N}}=(\mathbf{I}-\tilde{\mathbf{U}})^{-1}
\] and \(\mathbf{Z}\) is
For transition-based outcomes such as YLLs, we define the first moment of remaining life expectancy as the vector \(\tilde{\boldsymbol{\eta}}^{\top}\). This vector has dimensions \(\tau\omega \times 1\) and has the following basic structure:
\[
\tilde{\mathbf{\eta}}=\left(\begin{array}{c}
\eta_{11} \\
\vdots \\
\eta_{\tau 1} \\
\hline \vdots \\
\hline \eta_{1 \omega} \\
\vdots \\
\eta_{\tau \omega}
\end{array}\right)
\] where \(\eta_{i x}\) is remaining life expectancy for an individual in health state \(i\) at a given age \(x\). In this structure, remaining life expectancy for each health state is grouped within age classes.
Our choice for remaining life expectancy values \(\eta_{i x}\) for YLL outcomes will depend on the context and research question at hand (Anand and Reddy 2019). Historically, the GBD has utilized an exogenous, external life table based on the maximum potential life span among humans (Global Burden of Disease Collaborative Network 2021). Anand and Reddy (2019) discuss alternative contexts in which remaining life expectancy based on an endogenous life table or life expectancy model might be preferred.
The distinction between exogenous and endogenous rewards for YLLs boils down to whether the remaining life expectancy value used originates from outside the population under study (i.e., mortality risks used to calculate remaining life expectancy at a given age are independent of the mortality risks of the population being assessed), or not.
For YLLs based on exogenous life tables, such as the reference life table published by the GBD, we define \(\tilde{\boldsymbol{\eta}}^{\top}\) based on the reference life table value at each age. For YLLs based on an endogenous life table, we could simply swap in external life table values from our country or region of interest, or use the model itself to estimate remaining life expectancy for a given age and initial health state.
To aid in intuition, Table 7 provides remaining life expectancy values under exogenous and endogenous approaches for our running example. The row structure of this table mimics the structure needed for \(\tilde{\boldsymbol{\eta}}^{\top}\)—though note that the table only provides values for a subset of ages, not all age classes.
The column based on exogenous life tables (Exogenous-Ref. Life Table) is simply the reference life table life expectancy value as published by the GBD (Global Burden of Disease Collaborative Network 2021). The column marked Endogenous-Model is based on solving for remaining life expectancy using the occupancy based rewards approach discussed above.10 Finally, the last column (Endogenous-Life Table) uses country-specific life expectancy information (by age) published by the GBD (Global Burden of Disease Collaborative Network 2020b). In our running example—which has been structured and (roughly) calibrated to match life table data—the values for the Healthy state are very similar under both endogenous approaches—though in Endogenous-Model, life expectancy values for those who start in the CVD health state are strictly lower, owing to the higher mortality rate from disease.
10 This column essentially reproduces the approach to remaining life expectancy as originally conceived in Caswell and van Daalen (2021). The only difference is that we use the full rewards-based approach (which allows for partial transitions and half-cycle corrections) rather than solving for the fundamental matrix \(\tilde{\mathbf{N}}\) and applying the formula \(\tilde{\boldsymbol{\eta}}_{1}^{\top}=\mathbf{1}_{\tau \omega}^{\top} \tilde{\mathbf{N}}\)
Healthy Longevity Outcome Functions
healthy_longevity_occupancy <-function(params, H, V) {with(params,{map2(mUtilde,mPtilde,~({ U = .x P = .y N =solve(diag(tau*omega)-U) h =vec(H) %>%as.matrix() not_h =1-h v <-vec(V) %>%as.matrix() B1 <- h %*%t(v) +0.5* (not_h %*%t(v)) +0.5* (v %*%t(not_h)) # Eq. 46 C1 =0.5* (rep(1,alpha) %*%t(v)) # Eq. 48 R1 =rbind(cbind(B1, matrix(0, tau * omega, alpha)) ,cbind(C1, diag(alpha))) R2 = R1 * R1 R3 = R1 * R1 * R1 Z =cbind(diag(tau*omega),matrix(0,nrow=tau*omega, ncol=alpha)) e =rep(1,s) rho1_ <-t(N)%*% Z %*%t(P * R1) %*% e# The following needs to be debugged# rho2_ <-# N %*% (Z %*% t(.y * R1) %*% e + 2 * t(.x * B1) %*% rho1_)# B2 <- R2[1:(tau * omega), 1:(tau * omega)]# rho3_ <- t(N) %*% (Z %*% ((t(.y * R3) %*% e)) + 3 * (t(.x * B2) %*% rho1_) + 3 * (t(.x * B1) %*% rho2_)) rho1_ })) })}healthy_longevity_yll <-function(params, life_expectancy, disc) {with(params,{map2(mUtilde,mPtilde,~({ U = .x P = .y N =solve(diag(tau*omega)-U) Z =cbind(diag(tau*omega),matrix(0,nrow=tau*omega, ncol=alpha)) disc_ =rev(sort(rep(disc,length(tr_names)))) eta1_ex_ =rev(sort(rep(life_expectancy,length(tr_names)))) eta1_ex = eta1_ex_ B1 =matrix(0,nrow=tau*omega, ncol = tau*omega) C1 =rbind(matrix(0,nrow=1,ncol=tau*omega),eta1_ex*disc_) R1 =cbind(rbind(B1,C1),matrix(0,nrow=tau*omega+2,ncol=2)) R2 = R1 * R1 R3 = R1 * R1 * R1 Z =cbind(diag(tau*omega),matrix(0,nrow=tau*omega, ncol=alpha)) e =rep(1,s) rho1_ =t(N) %*% Z %*%t(.y * R1) %*% e# The following needs to be debugged# rho2_ <-# N %*% (Z %*% t(.y * R1) %*% e + 2 * t(.x * B1) %*% rho1_)# B2 <- R2[1:(tau * omega), 1:(tau * omega)]# rho3_ <- t(N) %*% (Z %*% ((t(.y * R3) %*% e)) + 3 * (t(.x * B2) %*% rho1_) + 3 * (t(.x * B1) %*% rho2_)) rho1_ })) })}
Echoing the approach to occupancy-based rewards above, expected outcomes are based on
\[
\begin{aligned}
& \tilde{\boldsymbol{\rho}}_{1}=\tilde{\mathbf{N}}^{\top} \mathbf{Z}\left(\tilde{\mathbf{P}} \circ \tilde{\mathbf{R}}_{1}\right)^{\top} \mathbf{1}_{s}
\end{aligned}
\] where \(\circ\) denotes element-wise multiplication, \(\tilde{\mathbf{N}}\) is the fundamental matrix
\[
\tilde{\mathbf{N}}=(\mathbf{I}-\tilde{\mathbf{U}})^{-1}
\] and \(\mathbf{Z}\) is
For both occupancy- and transition-based outcomes, total (across all ages) outcomes for each starting health state are calculated as
\[
\boldsymbol{\rho}_{m}^{\text {stage }}(\operatorname{age} x)=\left(\mathbf{e}_{x}^{\top} \otimes \mathbf{I}_{\tau}\right) \tilde{\boldsymbol{\rho}}_{m} \quad \tau \times 1
\] where \(\otimes\) is the Kronecker operator.
Alternatively, we may wish to calculate outcomes separately under different starting ages, and for a specified starting health state (e.g., healthy). This is given by
Though later GBD iterations removed age and time discounting for the purposes of documenting disease burdens worldwide, the World Health Organization’s Choosing Interventions that are Cost-Effective (WHO-CHOICE) program recommends consideration of time discounting of health outcomes (Murray et al. 2020; Bertram et al. 2021). Discounting is also not covered in the original formulation of the Healthy Longevity approach as outlined in Caswell and van Daalen (2021).
While we have set the annual discount rate to zero for this example, our approach adopts the WHO-CHOICE recommendation and includes the option for time discounting in the provided code. Specifically, we apply the cycle-specific discount factor to the values in the rewards matrix \(\mathbf{V}\) and vector \(\mathbf{v}\) for occupancy-based outcomes. For YLL outcomes, we apply the appropriate discounting factor to the age classes in the remaining life expectancy vector \(\tilde{\boldsymbol{\eta}}^{\top}\).
Remaining life expectancy is identical across approaches, reflecting the fact that we used a half-cycle correction for Approaches 1 and 2, and the Healthy Longevity approach applies a similar half-cycle correction.11 YLDs, YLLs and DALYs are also similar across the three methods.
11 Had we used a different cycle correction method we would yield slightly different results—but the differences are rarely meaningful for optimal decisions.
Table 8: Life Expectancy, YLD, YLL and DALY Outcomes, by Approach
Approach
Life Expectancy (Life Table)
Life Expectancy (Model)
YLDs
YLLs
DALYs
natural_history
Markov Trace
81.07
81.081
0.363
1.921
2.284
Transition States
81.081
0.363
1.921
2.284
Markov Chain With Rewards
81.081
0.353
2.022
2.376
prevention
Markov Trace
81.188
0.334
1.787
2.121
Transition States
81.188
0.334
1.787
2.121
Markov Chain With Rewards
81.188
0.325
1.882
2.207
treatment
Markov Trace
81.350
0.341
1.585
1.926
Transition States
81.350
0.341
1.585
1.926
Markov Chain With Rewards
81.350
0.331
1.670
2.001
prevent_treat
Markov Trace
81.255
0.370
1.704
2.074
Transition States
81.255
0.370
1.704
2.074
Markov Chain With Rewards
81.255
0.360
1.794
2.155
DALYs for Different Starting Ages
Our models above assumed a starting cohort age of 0. One nice feature of the healthy longevity approach, however, is that we can calculate outcomes for any starting age.12 We can apply the following formula to our constructed transition and reward matrices to yield expected outcomes across ages:
12 We could do this using the Markov-traced approaches, too, but we’d need to re-run the model with a new cohort starting at each age.
Note that had we employed an annual discount rate in our results, the age-based plots shown in Figure 11 are not recommended, as the various later ages would be discounted and would therefore not represent cohorts with different initial agess from the same baseline period.
DALYs Based on Endogenous Life Expectancy
We next explore several approaches for calculating DALYs based on endogenous life expectancy.
DALY Shortcut: QALY-like DALY
Some applications and modeling platforms apply a “shortcut” to estimate DALYs by defining a payoff vector that “rewards” time in the sick state using the disability weight (as in the approaches above), and also applies a payoff value of 1 for each cycle spent in the disease-related death state.
To implement this shortcut we multiply state occupancy in a given cycle by the payoff vector \(\mathbf{d_{QALYDALY}}\),
\[
\mathbf{d}_{QALYDALY} =
\left[
\begin{array}{c}
0.0 \\
\text{dw}_{CVD} \\
0.0 \\
1.0
\end{array}
\right]
\begin{array}{l}
\text{Healthy} \\
\text{CVD} \\
\text{DeathOC} \\
\text{DeathCVD} \\
\end{array}
\] where \(\text{dw}_{CVD}\) is the disability weight for CVD.
Important
As we will see in the results below, this approach will yield different results for DALYs because it conflates the model time horizon with life expectancy. A primary issue is that it applies a payoff value of 1.0 to an absorbing state; the number of times this payoff value is applied to the population is purely a function of the time horizon in the model. In other words, we can arbitrarily increase or decrease the estimated contribution of YLLs to DALYs by lengthening or shortening the number of cycles in our model. Since occupancy in an absorbing state will remain positive even if everyone in the cohort has died, we will continue to accrue (or lose) YLLs as we add or subtract cycles to the end of the Markov trace
We do not face a similar issue when estimating YLDs because if everyone has entered an absorbing state, we are simply multiplying the disability weight by a (living) health state occupancy value that is essentially zero.
# Caswell and van Daalen (2021) recommend using eq (51), which multiplies the # fundamental matrix (eq 16) by a vector of ones. However, this estimate# of longevity does not make any kind of cycle correction. Instead, we will# use the estimate of life expectancy calculated earlier (LE3) to fill out the # "reward" matrix for remaining life expectancy at each age. The # code to use the original approach is provided (and commented out) below, however.# N = with(params3,({# mUtilde %>% map(~({# solve(diag(tau*omega)-.x)# }))# }))# # remaining_life_expectancy_endogenous = as.vector(t(rep(1,params3$tau*params3$omega)) %*% N[[1]]) %>% {.[seq(1,params3$tau*params3$omega,2)]}# YLL3_endogenous_ <- params3 %>% healthy_longevity_yll(life_expectancy = remaining_life_expectancy_endogenous, disc = disc[-length(disc)])remaining_life_expectancy_endogenous <- LE3_ %>%map(~({.x[seq(1,params3$tau*params3$omega,2)]})) %>% {.$natural_history}YLL3_endogenous_ <- params3 %>%healthy_longevity_yll(life_expectancy = remaining_life_expectancy_endogenous, disc = disc[-length(disc)])YLL3_endogenous <- YLL3_endogenous_ %>%map(~({ tmp <- (kronecker(t(c(1,rep(0,params3$omega-1))) ,diag(params3$tau)) %*%as.matrix(.x)) tmp[1,1] # life expectancy when starting healthy}))# Endogeneous life expectancy only affects YLLs, so we can simply use the YLD value from above.YLD3_endogenous <- YLD3_ %>%map(~({ tmp <- (kronecker(t(c(1,rep(0,params3$omega-1))) ,diag(params3$tau)) %*%as.matrix(.x)) tmp[1,1]}))DALY3_endogenous_ <-map2(YLL3_endogenous_,YLD3_,~(.x+.y))DALY3_endogenous <- DALY3_endogenous_ %>%map(~({ tmp <- (kronecker(t(c(1,rep(0,params3$omega-1))) ,diag(params3$tau)) %*%as.matrix(.x)) tmp[1,1]}))result4 <-cbind(LE3, YLD3_endogenous, YLL3_endogenous, DALY3_endogenous) %>%as.data.frame() %>%mutate_all(~as.numeric(.)) %>%rownames_to_column(var="strategy") %>%mutate(approach ="Markov Chain With Rewards - Endogenous") %>% dplyr::select(approach, strategy ,LE = LE3, YLD = YLD3_endogenous, YLL = YLL3_endogenous, DALY=DALY3_endogenous) %>%bind_rows(cbind(LE,YLD,YLL_QALY_DALY,QALY_DALY) %>%as.data.frame() %>%mutate_all(~as.numeric(.)) %>%rownames_to_column(var="strategy") %>%mutate(approach ="Markov Trace - QALY-like DALY") %>% dplyr::select(approach, strategy, LE, YLD , YLL= YLL_QALY_DALY, DALY=QALY_DALY) )tib <- result1 %>%bind_rows(result2) %>%bind_rows(result3) %>%bind_rows(result4)tib %>%arrange(strategy) %>%select(-strategy) %>%kable(digits =3, col.names =c("Approach","Life Expectancy (Life Table)","Life Expectancy (Model)","YLDs","YLLs","DALYs")) %>%kable_styling() %>%pack_rows(index =c("natural_history"=5, "prevention"=5, "treatment"=5,"prevent_treat"=5)) %>%kable_styling()
Approach
Life Expectancy (Life Table)
Life Expectancy (Model)
YLDs
YLLs
DALYs
natural_history
Markov Trace
81.07
81.081
0.363
1.921
2.284
Transition States
81.081
0.363
1.921
2.284
Markov Chain With Rewards
81.081
0.353
2.022
2.376
Markov Chain With Rewards - Endogenous
81.081
0.353
1.484
1.837
Markov Trace - QALY-like DALY
81.081
0.363
6.409
6.772
prevention
Markov Trace
81.188
0.334
1.787
2.121
Transition States
81.188
0.334
1.787
2.121
Markov Chain With Rewards
81.188
0.325
1.882
2.207
Markov Chain With Rewards - Endogenous
81.188
0.325
1.380
1.705
Markov Trace - QALY-like DALY
81.188
0.334
5.980
6.314
treatment
Markov Trace
81.350
0.341
1.585
1.926
Transition States
81.350
0.341
1.585
1.926
Markov Chain With Rewards
81.350
0.331
1.670
2.001
Markov Chain With Rewards - Endogenous
81.350
0.331
1.223
1.555
Markov Trace - QALY-like DALY
81.350
0.341
5.348
5.689
prevent_treat
Markov Trace
81.255
0.370
1.704
2.074
Transition States
81.255
0.370
1.704
2.074
Markov Chain With Rewards
81.255
0.360
1.794
2.155
Markov Chain With Rewards - Endogenous
81.255
0.360
1.315
1.675
Markov Trace - QALY-like DALY
81.255
0.370
5.732
6.102
Incremental Outcomes
DALY changes
tib %>%select(approach,strategy,YLD,YLL,DALY) %>%left_join( tib %>%filter(strategy =="natural_history") %>%select(-strategy) %>%select(approach,YLD_ref = YLD,YLL_ref = YLL,DALY_ref = DALY), c("approach") ) %>%mutate(YLD = YLD - YLD_ref,YLL = YLL - YLL_ref,DALY = DALY - DALY_ref) %>%select(approach,strategy,YLD,YLL,DALY) %>%mutate(approach =factor(approach, levels =c("Markov Trace","Transition States","Markov Chain With Rewards","Markov Chain With Rewards - Endogenous","Markov Trace - QALY-like DALY"))) %>%filter(strategy !="natural_history") %>%gather(outcome,value,-approach,-strategy) %>%mutate(outcome =factor(outcome, levels =c("YLD","YLL","DALY"))) %>%mutate(value = value*-1) %>%ggplot(aes(x=strategy, y = value, fill = approach)) +geom_bar(stat="identity", position =position_dodge()) +facet_grid(~outcome, scales="free") +theme_ipsum() +scale_fill_aaas(name="") +theme(legend.position="top") +geom_hline(aes(yintercept =0)) +coord_flip() +labs(x ="", y ="Change relative to natural_history\nNote: Scale changes in each panel")
Figure 12: Disability-Adjusted Life Year Outcome Changes Relative to Natural History Model
Anand, Sudhir, and Sanjay G. Reddy. 2019. “The Construction of the DALY: Implications and Anomalies.”SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3451311.
Bertram, Melanie Y., Jeremy A. Lauer, Karin Stenberg, and Tessa Tan Torres Edejer. 2021. “Methods for the Economic Evaluation of Health Care Interventions for Priority Setting in the Health System: An Update From WHO CHOICE.”International Journal of Health Policy and Management, January. https://doi.org/10.34172/ijhpm.2020.244.
Caswell, Hal, and Silke van Daalen. 2021. “Healthy Longevity from Incidence-Based Models: More Kinds of Health Than Stars in the Sky.”Demographic Research 45 (July): 397–452. https://doi.org/10.4054/demres.2021.45.13.
Caswell, Hal, and Virginia Zarulli. 2018. “Matrix Methods in Health Demography: A New Approach to the Stochastic Analysis of Healthy Longevity and DALYs.”Population Health Metrics 16 (1). https://doi.org/10.1186/s12963-018-0165-5.
Global Burden of Disease Collaborative Network. 2020a. “Global Burden of Disease Study 2019 (GBD 2019) Disability Weights.” Institute for Health Metrics; Evaluation (IHME). https://doi.org/10.6069/1W19-VX76.
———. 2020b. “Global Burden of Disease Study 2019 (GBD 2019) Life Tables 1950-2019.” Institute for Health Metrics; Evaluation (IHME). https://doi.org/10.6069/1PF5-1M37.
———. 2020c. “Global Burden of Disease Study 2019 (GBD 2019) Results.” Seattle, United States: Institute for Health Metrics and Evaluation (IHME).
———. 2021. “Global Burden of Disease Study 2019 (GBD 2019) Reference Life Table.” Institute for Health Metrics; Evaluation (IHME). https://doi.org/10.6069/1D4Y-YQ37.
Henderson, Harold V., and S. R. Searle. 1981. “The Vec-Permutation Matrix, the Vec Operator and Kronecker Products: A Review.”Linear and Multilinear Algebra 9 (4): 271–88. https://doi.org/10.1080/03081088108817379.
Murray, Christopher J L, Aleksandr Y Aravkin, Peng Zheng, Cristiana Abbafati, Kaja M Abbas, Mohsen Abbasi-Kangevari, Foad Abd-Allah, et al. 2020. “Global Burden of 87 Risk Factors in 204 Countries and Territories, 19902019: A Systematic Analysis for the Global Burden of Disease Study 2019.”The Lancet 396 (10258): 1223–49. https://doi.org/10.1016/s0140-6736(20)30752-2.