Aspen Gulley

Data Scientist | Behavior Analyst in Training


Bladder and Brain Cancer Survival Analysis


The Kaplan-Meier survival function gives the probability of surviving past time, t:

Kaplan-Meier Survival Function
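
For reference, the estimator can be written in its usual product-limit form. This is the standard textbook expression rather than a transcription of the original figure, and the symbols t_i (distinct event times), d_i (number of events at t_i) and n_i (number of patients still at risk just before t_i) are notation assumed here:

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Each factor is the estimated probability of getting past event time t_i without the event, given that the patient was still at risk at t_i.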

Eighty-six bladder cancer patients had tumors removed. After removal, these patients were separated into two groups: a placebo group (group 0) and a group treated with the drug thiotepa (group 1). The variable time is the number of months until a tumor recurred or, for censored observations, until follow-up ended. The variable censor indicates whether the tumor recurred; if it did not recur, the observation was censored. The variable number records how many tumors were removed (1 vs. 2 or more).

Estimate the survival function in each treatment group using the Kaplan-Meier estimator. Report the Kaplan-Meier estimates of S(5), S(10) and S(25) for the treatment group and the placebo group. Give an explanation of the Kaplan-Meier estimates of S(10) for the treatment group and the placebo group.

S(t) is the probability that a patient remains free of tumor recurrence past time t. The estimates of S(10) mean that an estimated 55.77% of the placebo group had not experienced a tumor recurrence by 10 months, compared with 66.87% of the treatment group.
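
To make the estimate concrete, here is a minimal R sketch that computes the Kaplan-Meier estimate by hand and compares it with `survfit()`. The data frame `toy` is made up for illustration and is not the bladder data; the column names simply mirror the ones used in this post.

```r
library(survival)

# Hypothetical toy data (NOT the bladder data set): months to recurrence
# and an event indicator (1 = recurrence observed, 0 = censored).
toy <- data.frame(
  time   = c(2, 3, 3, 5, 8, 8, 12, 15),
  censor = c(1, 1, 0, 1, 0, 1, 1, 0)
)

# Product-limit estimate by hand: at each distinct event time, multiply by
# (1 - events / number still at risk).
event_times <- sort(unique(toy$time[toy$censor == 1]))
at_risk <- sapply(event_times, function(t) sum(toy$time >= t))
events  <- sapply(event_times, function(t) sum(toy$time == t & toy$censor == 1))
km_manual <- cumprod(1 - events / at_risk)

# The same estimate from survfit(), evaluated at the event times.
km_fit <- summary(survfit(Surv(time, censor) ~ 1, data = toy),
                  times = event_times)$surv

print(data.frame(time = event_times, at_risk, events, km_manual, km_fit))
```

The two columns should agree, since `survfit()` uses the same product-limit calculation when there are no weights or strata.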

This Kaplan-Meier curve gives the probability that a patient remains free of tumor recurrence past time t. Group 0: Placebo, Group 1: Treatment

The log-rank test compares the treatment and placebo groups by testing whether their recurrence-free survival curves differ over time.

Null Hypothesis: There is no difference in tumor recurrence times between the placebo and treatment groups.

Alternative Hypothesis: There is a difference in tumor recurrence times between the placebo and treatment groups.

Conclusion: Fail to reject the null hypothesis. The data does not support the claim that there is a difference in tumor reoccurrence times between the placebo and treatment groups.
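
As a sketch of how that decision can be read off a `survdiff()` result, the snippet below runs a log-rank test on made-up data and converts the chi-squared statistic to a p-value. The data frame `toy` and the 0.05 significance level are assumptions for illustration, not values from the bladder analysis.

```r
library(survival)

# Hypothetical toy data (NOT the bladder data set).
toy <- data.frame(
  time   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  censor = c(1, 1, 1, 0, 1, 1, 0, 1, 1, 0),
  group  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
)

lr <- survdiff(Surv(time, censor) ~ group, data = toy)

# Under the null hypothesis the statistic is approximately chi-squared with
# (number of groups - 1) degrees of freedom.
df <- length(lr$n) - 1
p_value <- pchisq(lr$chisq, df = df, lower.tail = FALSE)

alpha <- 0.05  # assumed significance level
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
cat("chi-squared =", round(lr$chisq, 3), " p =", round(p_value, 4),
    "->", decision, "\n")
```

Printing `lr` directly shows the same statistic and p-value in the usual table form.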

The Cox proportional hazards model is $h(t) = h_0(t)\exp(\beta_1\,\text{group} + \beta_2\,\text{number})$, where $h(t)$ is the hazard function, the instantaneous rate of tumor recurrence at time $t$ among patients who have not yet had a recurrence; $h_0(t)$ is the baseline hazard function; and $\beta_1$ and $\beta_2$ are the regression coefficients.
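
A short derivation shows why the exponentiated coefficient is read as a hazard ratio. Comparing a treated patient (group = 1) with a placebo patient (group = 0) who has the same value of number, the baseline hazard cancels:

```latex
\frac{h(t \mid \text{group}=1,\ \text{number})}{h(t \mid \text{group}=0,\ \text{number})}
  = \frac{h_0(t)\exp(\beta_1 + \beta_2\,\text{number})}{h_0(t)\exp(\beta_2\,\text{number})}
  = \exp(\beta_1)
```

The ratio does not depend on t, which is the proportional hazards assumption.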

The estimated hazard of tumor recurrence in the treatment group is exp(−0.3928) = 0.6751 times that in the placebo group, holding the number of tumors fixed. The 95% CI for this hazard ratio is (0.3726, 1.223), which contains 1. This suggests that the hazard of recurrence in the treatment group is not significantly different from that in the placebo group at the 5% level.
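
The arithmetic behind these numbers can be checked with a few lines of R. This is only a sketch: it assumes the reported interval is a Wald interval of the form exp(coefficient ± z × SE), and it backs the standard error out of the interval width because the standard error itself is not quoted in this post.

```r
# Values reported in the text above.
beta_group  <- -0.3928
ci_reported <- c(0.3726, 1.223)

# Hazard ratio for treatment vs. placebo.
hr <- exp(beta_group)

# Assumption: the interval is a Wald interval, exp(beta +/- z * SE).
# Under that assumption the standard error follows from the interval width.
z  <- qnorm(0.975)
se <- diff(log(ci_reported)) / (2 * z)
ci_rebuilt <- exp(beta_group + c(-1, 1) * z * se)

# The difference is not significant at the 5% level when the interval covers 1.
covers_one <- ci_reported[1] < 1 && ci_reported[2] > 1

print(round(c(hr = hr, se = se, lower = ci_rebuilt[1], upper = ci_rebuilt[2]), 4))
print(covers_one)
```

Small differences in the last digit are expected because the coefficient and interval quoted above are already rounded.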

R Code:

# Load the bladder cancer data (the path is specific to the author's machine)
bladder <- read.csv("/Users/aspengulley/Desktop/bladder.csv", header = TRUE)
bladder

library(survival)
# Kaplan-Meier estimates by treatment group
bladder.km <- survfit(Surv(time, censor) ~ group, data = bladder)
bladder.km
# Estimates of S(5), S(10) and S(25) in each group
summary(bladder.km, times = 5)
summary(bladder.km, times = 10)
summary(bladder.km, times = 25)

library(survminer)
# Kaplan-Meier curves for the two groups
ggsurvplot(
 fit = bladder.km,
 data = bladder,
 xlab = "Months",
 ylab = "Overall Probability")

# Log-rank test comparing the two groups
bladder.diff <- survdiff(Surv(time, censor) ~ group, data = bladder)
bladder.diff

# Cox proportional hazards model with group and number as predictors
fit <- coxph(Surv(time, censor) ~ group + number, data = bladder)
summary(fit)

ISLR2 11.8 Lab: Survival Analysis

Reference:
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R, Second Edition. Springer-Verlag, New York. https://www.statlearning.com

Here is the online link to the book if you want to check out some other machine learning topics and labs. The labs are at the end of every chapter. https://hastie.su.domains/ISLR2/ISLRv2_website.pdf

BrainCancer {ISLR2}

A data set consisting of survival times for patients diagnosed with brain cancer. A data frame with 88 observations and 8 variables:
sex: factor with levels “Female” and “Male”
diagnosis: factor with levels “Meningioma”, “LG glioma”, “HG glioma”, and “Other”
loc: location factor with levels “Infratentorial” and “Supratentorial”
ki: Karnofsky index
gtv: gross tumor volume, in cubic centimeters
stereo: stereotactic method factor with levels “SRS” and “SRT”
status: whether the patient is still alive at the end of the study: 0=Yes, 1=No
time: survival time, in months

Source:
I. Selingerova, H. Dolezelova, I. Horova, S. Katina, and J. Zelinka. Survival of patients with primary brain tumors: Comparison of two statistical approaches. PLoS One, 11(2):e0148733, 2016. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4749663/

Kaplan-Meier Survival Curve for Brain Cancer
Kaplan-Meier Curve for Brain Cancer Stratified by Sex

The log-rank test compares the male and female groups by testing whether their survival curves differ over time.

Null Hypothesis: There is no difference in the probability of survival between the groups over time.

Alternative Hypothesis: There is a difference in the probability of survival between the groups over time.

Conclusion: Fail to reject the null hypothesis. The data does not support the claim that there is a difference in the probability of survival between the groups over time.

The Cox proportional hazard model results:

There is no evidence of a difference between survival in males and females.

Fit a Cox model and include other predictors

Holding the other predictors fixed, the estimated hazard of death with a diagnosis of LG glioma is about 2 times that of the baseline diagnosis of meningioma, while the estimated hazard of death with a diagnosis of HG glioma is more than 8 times that of meningioma. Additionally, the Karnofsky index has a negative coefficient, suggesting that higher values are associated with a lower hazard of death and therefore longer survival.

Survival Probability by Diagnosis. Red: Meningioma, Green: LG glioma, Dark Blue: HG glioma, Light Blue: Other

R Code:

library(ISLR2)
data("BrainCancer")
?BrainCancer

# Variable names and counts for the categorical variables
names(BrainCancer)
table(BrainCancer$sex)
table(BrainCancer$diagnosis)
table(BrainCancer$status)

library(survival)
# Overall Kaplan-Meier survival curve
fit.surv <- survfit(Surv(time, status) ~ 1, data = BrainCancer)
plot(fit.surv, xlab = "Months",
 ylab = "Estimated Probability of Survival")

library(survminer)
ggsurvplot(
 fit = fit.surv,
 data = BrainCancer,
 xlab = "Months",
 ylab = "Estimated Probability of Survival")

# Kaplan-Meier curves stratified by sex
fit.sex <- survfit(Surv(time, status) ~ sex, data = BrainCancer)
quartz() # opens a new graphics window on macOS only; use dev.new() elsewhere
plot(fit.sex, xlab = "Months",
 ylab = "Estimated Probability of Survival", col = c(2, 4))
legend("bottomleft", levels(BrainCancer$sex), col = c(2, 4), lty = 1)

# Log-rank test comparing males and females
logrank.test <- survdiff(Surv(time, status) ~ sex, data = BrainCancer)
logrank.test

# Cox model with sex as the only predictor
fit.cox <- coxph(Surv(time, status) ~ sex, data = BrainCancer)
summary(fit.cox)

# Cox model with the additional predictors
fit.cox <- coxph(Surv(time, status) ~ sex + diagnosis + loc + ki + gtv + stereo, data = BrainCancer)
summary(fit.cox)

# One row per diagnosis, with the other predictors held at fixed values.
# The factor labels must match the data exactly (no padding spaces).
modaldata <- data.frame(diagnosis = levels(BrainCancer$diagnosis),
 sex = rep("Female", 4),
 loc = rep("Supratentorial", 4),
 ki = rep(mean(BrainCancer$ki), 4),
 gtv = rep(mean(BrainCancer$gtv), 4),
 stereo = rep("SRT", 4))

# Predicted survival curves for each diagnosis
survplots <- survfit(fit.cox, newdata = modaldata)
plot(survplots, xlab = "Months",
 ylab = "Survival Probability", col = 2:5)
legend("bottomleft", levels(BrainCancer$diagnosis), col = 2:5, lty = 1)

By Aspen Gulley




