Julia simGH: simulating times to event from a general hazard structure

Author
Published

January 14, 2024

1 The simGH command.

The simGH command from the HazReg.jl Julia package allows one to simulate times to event from the following models:

A description of these hazard models is presented below as well as the available baseline hazards.

1.1 General Hazard model

The GH model is formulated in terms of the hazard structure

\[ h(t; \alpha, \beta, \theta, {\bf x}) = h_0\left(t \exp\{\tilde{\bf x}^{\top}\alpha\}; \theta\right) \exp\{{\bf x}^{\top}\beta\}. \]

where \({\bf x}\in{\mathbb R}^p\) are the covariates that affect the hazard level; \(\tilde{\bf x} \in {\mathbb R}^q\) are the covariates the affect the time level (typically \(\tilde{\bf x} \subset {\bf x}\)); \(\alpha \in {\mathbb R}^q\) and \(\beta \in {\mathbb R}^p\) are the regression coefficients; and \(\theta \in \Theta\) is the vector of parameters of the baseline hazard \(h_0(\cdot)\).

This hazard structure leads to an identifiable model as long as the baseline hazard is not a hazard associated to a member of the Weibull family of distributions (Chen and Jewell 2001).

1.2 Accelerated Failure Time (AFT) model

The AFT model is formulated in terms of the hazard structure

\[ h(t; \beta, \theta, {\bf x}) = h_0\left(t \exp\{{\bf x}^{\top}\beta\}; \theta\right) \exp\{{\bf x}^{\top}\beta\}. \]

where \({\bf x}\in{\mathbb R}^p\) are the available covariates; \(\beta \in {\mathbb R}^p\) are the regression coefficients; and \(\theta \in \Theta\) is the vector of parameters of the baseline hazard \(h_0(\cdot)\).

1.3 Proportional Hazards (PH) model

The PH model is formulated in terms of the hazard structure

\[ h(t; \beta, \theta, {\bf x}) = h_0\left(t ; \theta\right) \exp\{{\bf x}^{\top}\beta\}. \]

where \({\bf x}\in{\mathbb R}^p\) are the available covariates; \(\beta \in {\mathbb R}^p\) are the regression coefficients; and \(\theta \in \Theta\) is the vector of parameters of the baseline hazard \(h_0(\cdot)\).

1.4 Accelerated Hazards (AH) model

The AH model is formulated in terms of the hazard structure

\[ h(t; \alpha, \theta, \tilde{\bf x}) = h_0\left(t \exp\{\tilde{\bf x}^{\top}\alpha\}; \theta\right) . \]

where \(\tilde{\bf x}\in{\mathbb R}^q\) are the available covariates; \(\alpha \in {\mathbb R}^q\) are the regression coefficients; and \(\theta \in \Theta\) is the vector of parameters of the baseline hazard \(h_0(\cdot)\).

2 Available baseline hazards

The current version of the simGH command implements the following parametric baseline hazards for the models discussed in the previous section.

3 Illustrative example: Julia code

In this example, we simulate \(n=1,000\) times to event from the GH, PH, AFT, and AH models with PGW baseline hazards, using the Julia simGH command from the HazReg package. We censor these samples at a fixed value, and fit the corresponding models using the Julia package HazReg.

See also:

3.1 PGW-GH model

Code
# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))
des_t = rand(dist, n)


#----------------------------
# PGW-GH simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
alpha0 = 0.5
beta0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = des_t,
      theta = theta0, alpha = alpha0, beta = beta0, 
      hstr = "GH", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWGH = GHMLE(init = fill(0.0, 3 + 1 + size(des)[2]), times = simdat,
            status = status, hstr = "GH", dist = "PGW", 
            des = des, des_t = des_t, method = "NM", maxit = 1000)

MLEPGWGH = [exp(OPTPGWGH[1].minimizer[j]) for j in 1:3] 
append!(MLEPGWGH, OPTPGWGH[1].minimizer[4:end]) 

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,alpha0,beta0),MLEPGWGH);  
COMP = DataFrame(COMP, :auto);
 
rename!( COMP, ["True", "MLE"] );
println(COMP)
6×2 DataFrame
 Row │ True     MLE        
     │ Float64  Float64    
─────┼─────────────────────
   1 │    0.1    0.0987151
   2 │    2.0    2.00311
   3 │    5.0    5.06478
   4 │    0.5    0.498934
   5 │   -0.5   -0.489382
   6 │    0.75   0.760254

3.2 PGW-PH model

Code
# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))


#----------------------------
# PGW-PH simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
beta0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = nothing,
      theta = theta0, alpha = nothing, beta = beta0, 
      hstr = "PH", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWPH = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
            status = status, hstr = "PH", dist = "PGW", 
            des = des, des_t = nothing, method = "NM", maxit = 1000)

MLEPGWPH = [exp(OPTPGWPH[1].minimizer[j]) for j in 1:3] 
append!(MLEPGWPH, OPTPGWPH[1].minimizer[4:end]) 

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,beta0),MLEPGWPH);  
COMP = DataFrame(COMP, :auto);
 
rename!( COMP, ["True", "MLE"] );
println(COMP)
5×2 DataFrame
 Row │ True     MLE        
     │ Float64  Float64    
─────┼─────────────────────
   1 │    0.1    0.0987943
   2 │    2.0    2.0111
   3 │    5.0    5.08918
   4 │   -0.5   -0.49086
   5 │    0.75   0.764558

3.3 PGW-AFT model

Code
# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des = hcat(rand(dist, n), rand(dist, n))


#----------------------------
# PGW-AFT simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
beta0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = des, des_t = nothing,
      theta = theta0, alpha = nothing, beta = beta0, 
      hstr = "AFT", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWAFT = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
            status = status, hstr = "AFT", dist = "PGW", 
            des = des, des_t = nothing, method = "NM", maxit = 1000)

MLEPGWAFT = [exp(OPTPGWAFT[1].minimizer[j]) for j in 1:3] 
append!(MLEPGWAFT, OPTPGWAFT[1].minimizer[4:end]) 

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,beta0),MLEPGWAFT);  
COMP = DataFrame(COMP, :auto);
 
rename!( COMP, ["True", "MLE"] );
println(COMP)
5×2 DataFrame
 Row │ True     MLE       
     │ Float64  Float64   
─────┼────────────────────
   1 │    0.1    0.100572
   2 │    2.0    1.97949
   3 │    5.0    4.96817
   4 │   -0.5   -0.482975
   5 │    0.75   0.764053

3.4 PGW-AH model

Code
# Required packages
using HazReg
using Distributions
using Random
using DataFrames

# Sample size
n = 10000

# Simulated design matrices
Random.seed!(123)
dist = Normal()
des_t = hcat(rand(dist, n), rand(dist, n))


#----------------------------
# PGW-AH simulation
#----------------------------

# True parameters
theta0 = [0.1,2.0,5.0]
alpha0 = [-0.5,0.75]

# censoring
cens = 10

# Data simulation
simdat = simGH(seed = 1234, n = n, des = nothing, des_t = des_t,
      theta = theta0, alpha = alpha0, beta = nothing, 
      hstr = "AH", dist = "PGW")

# status variable
status = collect(Bool,(simdat .< cens))

# Inducing censoring
simdat = min.(simdat, cens)

# Model fit
OPTPGWAH = GHMLE(init = fill(0.0, 3 + size(des)[2]), times = simdat,
            status = status, hstr = "AH", dist = "PGW", 
            des = nothing, des_t = des_t, method = "NM", maxit = 1000)

MLEPGWAH = [exp(OPTPGWAH[1].minimizer[j]) for j in 1:3] 
append!(MLEPGWAH, OPTPGWAH[1].minimizer[4:end]) 

# True parameter values vs. MLE
COMP =  hcat(vcat(theta0,beta0),MLEPGWAH);  
COMP = DataFrame(COMP, :auto);
 
rename!( COMP, ["True", "MLE"] );
println(COMP)
5×2 DataFrame
 Row │ True     MLE       
     │ Float64  Float64   
─────┼────────────────────
   1 │    0.1    0.103402
   2 │    2.0    1.94664
   3 │    5.0    4.85217
   4 │   -0.5   -0.484067
   5 │    0.75   0.732805

References

Chen, Y. Q., and N. P. Jewell. 2001. “On a General Class of Semiparametric Hazards Regression Models.” Biometrika 88 (3): 687–702.
Chen, Y. Q., and M. C. Wang. 2000. “Analysis of Accelerated Hazards Models.” Journal of the American Statistical Association 95 (450): 608–18.
Cox, D. R. 1972. “Regression Models and Life-Tables.” Journal of the Royal Statistical Society: Series B (Methodological) 34 (2): 187–202.
Kalbfleisch, J. D., and R. L. Prentice. 2011. The Statistical Analysis of Failure Time Data. John Wiley & Sons.
Rubio, F. J., L. Remontet, N. P. Jewell, and A. Belot. 2019. “On a General Structure for Hazard-Based Regression Models: An Application to Population-Based Cancer Research.” Statistical Methods in Medical Research 28: 2404–17.