Predicting Mortality in Heart Failure Patients admitted to ICU¶

Jack B. Huber, Ph.D.
Senior Clinical Data Analyst
Providence Swedish Pharmacy
Seattle, WA

jack.huber@swedish.org
jack.bernard.huber@gmail.com


Introduction¶

Heart failure (HF) is a leading cause of hospitalization and readmission among United States adults.[1] Although outcomes for HF patients have improved over time, those who are hospitalized with HF are still at significant risk of mortality.[1] HF is among the top reasons for cardiac intensive care, and the prevalence of HF is increasing in CICU populations.[1]

These facts underscore ongoing need for research on the factors that predict mortality in this patient population. As one researcher recently suggested, factors predicting mortality in this population and setting remain "poorly characterized."[2] As a result, recent studies have used machine learning to predict mortality after considering the values on a range of features.[1,2,3,4]

One important but understudied dimension of this issue is gender. Does gender predict mortality? Are men or women with HF more likely to die in intensive care? If so, does this relationship depend on age? What are the implications for medical practice and prevention? Parissis et al. (2012) conducted a multivariate analysis of data on 4,953 HF patients drawn from a global registry.[5] They found that female HF patients were, relative to male patients, under-treated and under-represented in clinical trials.[5] The authors "documented the existence of differences in clinical characteristics, co-morbidities, precipitating factors, and therapeutic modalities between male and female patients admitted with AFH (acute heart failure). Despite the variety of the above factors, the length of in-hospital stay and the in-hospital mortality were similar in both genders."[5] While that study shed significant light on the topic, it was limited in not examining the interaction between gender and age.

This study aims to shed further light on this topic by examining the interactions between gender and age in mortality in the ICU among patients with HF. I will use multivariate methods to isolate the effects of gender, age, and their interaction on mortality controlling for pre-existing differences in comorbidities and vital signs. Based on data from the Centers for Disease Control,[6,7] I hypothesize that, all else equal, men with HF will be more likely to die in the ICU than women, and the effect of gender on mortality will depend on age.

This study has implications for medical practice and prevention. A clearer picture of the related mortality risks of both gender and age could help cardiologists and critical care providers tailor care to the needs of specific patients. The general public also stands to benefit from the findings of this study; data suggesting that men are more likely than women to die of HF at a younger age could prompt men to be more proactive about preventing HF from a younger age.


Methods¶

Data Source¶

The data for this study come from Zhou et al. (2021)[5] available here at Dryad, "an open data publishing platform and a community committed to the open availability and routine re-use of all research data." This dataset includes 49 fields for demographic characteristics, vital signs, and laboratory values on 1,177 adult HF patients. The dataset is a sample of patient-level medical data from the larger MIMIC-III Clinical Database, "a publicly available critical care database containing de-identified data on 46,520 patients and 58,976 admissions to the ICU of the Beth Israel Deaconess Medical Center, Boston, USA, between 1 June, 2001 and 31 October, 2012."[8,9,10] This dataset is useful because it contains a large number of diverse patient-level variables which facilitates multivariate analysis.

Measures¶

The outcome measure of all-cause hospital mortality is a binary variable in which a value 1 indicates death and 0 alive. Demographic characteristics include gender (dichotomized as 1 for men and 0 for all others), age (in years at ICU admission), and an interaction term of the two (gender x age). Comorbidities include body mass index (BMI), hypertensive, atrial fibrillation, chronic heart disease (CHD) with no myocardial infarction (MI), diabetes, deficiency anemias, depression, hyperlipemia, renal failure, and COPD, and these are all coded as binary variables in which a value of 1 indicates presence of the condition and 0 indicates absence of the condition. Vital signs include heart rate, systolic blood pressure, diastolic blood pressure, respiratory rate, temperature, oxygen saturation, and urine output.

Data Analysis¶

I will use multivariate logistic regression to regress mortality on the demographic characteristics, comorbidities, and vital signs with a focus on the adjusted odds ratios for gender, age, and the gender*age interaction. These values will illuminate the change in the logs odds of dying associated with being male, older, and older male, controlling for the other measures of comorbidities, vital signs and laboratory values.

Data Preparation¶

In [1]:
# -------------------- IMPORT LIBRARIES --------------------
%matplotlib inline
import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices

# -------------------- LOAD RAW DATA --------------------

# these data load from the working directory but the source 
# data come from here: https://datadryad.org/stash/dataset/doi:10.5061/dryad.0p2ngf1zd

d = pd.read_csv("data01.csv") 

# -------------------- SELECT COLUMNS --------------------
data = d.loc[:,["outcome",
     "gendera",
     "age",
     "BMI",
     "hypertensive",
     "Hyperlipemia",
     "atrialfibrillation",
     "CHD with no MI",
     "COPD",
     "diabetes",
     "deficiencyanemias",
     "depression",
     "Renal failure",
     "heart rate",
     "Systolic blood pressure",
     "Diastolic blood pressure",
     "Respiratory rate",
     "temperature",
     "SP O2",
     "Urine output"]]

# -------------------- RENAME COLUMNS --------------------
data.rename(columns={"outcome":"Mortality",
                    "gendera":"Gender",
                    "age":"Age",
                    "hypertensive":"Hypertensive",
                    "hyperlipemia":"Hyperlipemia",
                    "atrialfibrillation":"Atrial_fibrillation",
                    "CHD with no MI":"CHD_no_MI",
                    "diabetes":"Diabetes",
                    "deficiencyanemias":"Deficiency_anemias",
                    "depression":"Depression",
                    "Renal failure":"Renal_failure",
                    "heart rate":"Heart_rate",
                    "Systolic blood pressure":"Systolic_BP",
                    "Diastolic blood pressure":"Diastolic_BP",
                    "Respiratory rate":"Respiratory_rate",
                    "temperature":"Temperature",
                    "SP O2":"SpO2",
                    "Urine output":"Urine_output"}, inplace=True)

# -------------------- RECODE VARIABLES --------------------
data['Male'] = data['Gender'] # ----- Dummy variable for male
def recode(series):
    if series == 2:
        return 1
    else:
        return 0
data['Male'] = data['Male'].apply(recode)

data['Male*Age'] = data['Male']*data['Age'] # ----- Male x Age interaction term

def recode(genders): # ----- Gender with labels
    if genders == 1:
        return 'Female'
    else:
        return "Male"
data['Gender'] = data['Gender'].apply(recode)

def recode(death): # ----- Mortality with labels
    if death == 0:
        return 'Alive'
    else:
        return 'Deceased'
data['Death'] = data['Mortality'].apply(recode)

# -------------------- INSPECT THE DATA FRAME --------------------
data.info()
data.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1177 entries, 0 to 1176
Data columns (total 23 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Mortality            1176 non-null   float64
 1   Gender               1177 non-null   object 
 2   Age                  1177 non-null   int64  
 3   BMI                  962 non-null    float64
 4   Hypertensive         1177 non-null   int64  
 5   Hyperlipemia         1177 non-null   int64  
 6   Atrial_fibrillation  1177 non-null   int64  
 7   CHD_no_MI            1177 non-null   int64  
 8   COPD                 1177 non-null   int64  
 9   Diabetes             1177 non-null   int64  
 10  Deficiency_anemias   1177 non-null   int64  
 11  Depression           1177 non-null   int64  
 12  Renal_failure        1177 non-null   int64  
 13  Heart_rate           1164 non-null   float64
 14  Systolic_BP          1161 non-null   float64
 15  Diastolic_BP         1161 non-null   float64
 16  Respiratory_rate     1164 non-null   float64
 17  Temperature          1158 non-null   float64
 18  SpO2                 1164 non-null   float64
 19  Urine_output         1141 non-null   float64
 20  Male                 1177 non-null   int64  
 21  Male*Age             1177 non-null   int64  
 22  Death                1177 non-null   object 
dtypes: float64(9), int64(12), object(2)
memory usage: 211.6+ KB
Out[1]:
Mortality Gender Age BMI Hypertensive Hyperlipemia Atrial_fibrillation CHD_no_MI COPD Diabetes ... Heart_rate Systolic_BP Diastolic_BP Respiratory_rate Temperature SpO2 Urine_output Male Male*Age Death
0 0.0 Female 72 37.588179 0 1 0 0 0 1 ... 68.837838 155.866667 68.333333 16.621622 36.714286 98.394737 2155.0 0 0 Alive
1 0.0 Male 75 NaN 0 0 0 0 1 0 ... 101.370370 140.000000 65.000000 20.851852 36.682540 96.923077 1425.0 1 75 Alive
2 0.0 Male 83 26.572634 0 0 0 0 0 0 ... 72.318182 135.333333 61.375000 23.640000 36.453704 95.291667 2425.0 1 83 Alive
3 0.0 Male 43 83.264629 0 0 0 0 0 0 ... 94.500000 126.400000 73.200000 21.857143 36.287037 93.846154 8760.0 1 43 Alive
4 0.0 Male 75 31.824842 1 0 0 0 1 0 ... 67.920000 156.560000 58.120000 21.360000 36.761905 99.280000 4455.0 1 75 Alive

5 rows × 23 columns


Results¶

Table 1[10] presents descriptive information about sample characteristics. Approximately 13.6 percent of the 1,177 patients died. The statistically significant predictors of mortality are age, atrial fibrillation, deficiency anemias, depression, hypertension, and renal failure.

In [2]:
# -------------------- TABLE 1: DEMOGRAPHIC CHARACTERISTICS --------------------
from tableone import TableOne

t1_columns = ['Death','Gender', 'Age', 'BMI', 'Atrial_fibrillation', 'CHD_no_MI', 'COPD', 'Diabetes', 'Deficiency_anemias', 'Depression', 'Hyperlipemia', 'Hypertensive', 'Renal_failure',
             'Heart_rate','Systolic_BP','Diastolic_BP','Respiratory_rate','Temperature','SpO2','Urine_output']
t1_categorical = ['Gender', 'Atrial_fibrillation' , 'CHD_no_MI', 'COPD', 'Diabetes', 'Deficiency_anemias', 'Depression', 'Hyperlipemia', 'Hypertensive', 'Renal_failure']
t1_groupby = ['Death']
#t1_nonnormal = ['Age']
#t1_labels={'death': 'mortality'}
#t1 = TableOne(data, columns=t1_columns, categorical=t1_categorical, groupby=t1_groupby, nonnormal=nonnormal, rename=labels, pval=False)
t1 = TableOne(data, columns=t1_columns, categorical=t1_categorical, groupby=t1_groupby, pval=True)
print()
print("Table 1: Sample Characteristics")
t1
Table 1: Sample Characteristics
Out[2]:
Grouped by Death
Missing Overall Alive Deceased P-Value
n 1177 1017 160
Gender, n (%) Female 0 559 (47.5) 478 (47.0) 81 (50.6) 0.442
Male 618 (52.5) 539 (53.0) 79 (49.4)
Age, mean (SD) 0 74.1 (13.4) 73.7 (13.4) 76.3 (13.2) 0.023
BMI, mean (SD) 215 30.2 (9.3) 30.4 (9.2) 28.6 (9.9) 0.070
Atrial_fibrillation, n (%) 0 0 646 (54.9) 578 (56.8) 68 (42.5) 0.001
1 531 (45.1) 439 (43.2) 92 (57.5)
CHD_no_MI, n (%) 0 0 1076 (91.4) 928 (91.2) 148 (92.5) 0.709
1 101 (8.6) 89 (8.8) 12 (7.5)
COPD, n (%) 0 0 1088 (92.4) 935 (91.9) 153 (95.6) 0.139
1 89 (7.6) 82 (8.1) 7 (4.4)
Diabetes, n (%) 0 0 681 (57.9) 579 (56.9) 102 (63.7) 0.124
1 496 (42.1) 438 (43.1) 58 (36.2)
Deficiency_anemias, n (%) 0 0 778 (66.1) 653 (64.2) 125 (78.1) 0.001
1 399 (33.9) 364 (35.8) 35 (21.9)
Depression, n (%) 0 0 1037 (88.1) 888 (87.3) 149 (93.1) 0.048
1 140 (11.9) 129 (12.7) 11 (6.9)
Hyperlipemia, n (%) 0 0 730 (62.0) 620 (61.0) 110 (68.8) 0.072
1 447 (38.0) 397 (39.0) 50 (31.2)
Hypertensive, n (%) 0 0 332 (28.2) 274 (26.9) 58 (36.2) 0.019
1 845 (71.8) 743 (73.1) 102 (63.7)
Renal_failure, n (%) 0 0 747 (63.5) 625 (61.5) 122 (76.2) <0.001
1 430 (36.5) 392 (38.5) 38 (23.8)
Heart_rate, mean (SD) 13 84.6 (16.0) 83.8 (16.0) 89.8 (15.2) <0.001
Systolic_BP, mean (SD) 16 118.0 (17.4) 118.9 (17.3) 112.2 (16.6) <0.001
Diastolic_BP, mean (SD) 16 59.5 (10.7) 59.9 (10.9) 57.2 (8.9) 0.001
Respiratory_rate, mean (SD) 13 20.8 (4.0) 20.6 (3.9) 22.0 (4.4) <0.001
Temperature, mean (SD) 19 36.7 (0.6) 36.7 (0.6) 36.5 (0.7) 0.006
SpO2, mean (SD) 13 96.3 (2.3) 96.3 (2.1) 95.9 (3.1) 0.064
Urine_output, mean (SD) 36 1899.3 (1272.4) 1986.9 (1271.2) 1346.0 (1136.6) <0.001

Figure 1 aims to offer a closer look at the relationships among gender, age, and mortality. The visual differences are not striking, so these relationships are subtle. Male patients do appear to live longer than female patients.

In [3]:
# -------------------- FIGURE 1. BAR CHART --------------------
import matplotlib.pyplot as plt
import seaborn as sns
print()
print('Figure 1: Bar Chart of Gender, Age, and Mortality')
print()
sns.set_theme(style="whitegrid")
g = sns.catplot(
    data=data, kind="bar",
    x="Death", y="Age", hue="Gender",
    errorbar="sd", palette="dark", alpha=.6, height=6
)
g.despine(left=True)
g.set_axis_labels("", "Age")
g.legend.set_title("")
Figure 1: Bar Chart of Gender, Age, and Mortality

Although gender was not a significant predictor of mortality in Table 1, and no relationship between gender, age and mortality is apparent from Figure 1, it is possible that this relationship is suppressed, and depends on other variables. To explore this possibility, I specify and estimate a logistic regression model.

In [4]:
# -------------------- TABLE 2. LOGISTIC REGRESSION --------------------

# ---------- Demographics, comorbidities, and vitals
y, X = dmatrices('Mortality ~ Male + Age + Male*Age + BMI + Atrial_fibrillation + CHD_no_MI + COPD + Diabetes + Deficiency_anemias + Depression + Hyperlipemia + Hypertensive + Renal_failure + Heart_rate + Systolic_BP + Diastolic_BP + Respiratory_rate + Temperature + SpO2 + Urine_output', data=data, return_type='dataframe')

mod = sm.Logit(y, X) # ----- Describe model
res = mod.fit() # ----- Fit model
print()
print('Table 2: Results of Logistic Regression')
print()
print(res.summary())
Optimization terminated successfully.
         Current function value: 0.295019
         Iterations 8

Table 2: Results of Logistic Regression

                           Logit Regression Results                           
==============================================================================
Dep. Variable:              Mortality   No. Observations:                  927
Model:                          Logit   Df Residuals:                      906
Method:                           MLE   Df Model:                           20
Date:                Wed, 02 Aug 2023   Pseudo R-squ.:                  0.1948
Time:                        20:19:13   Log-Likelihood:                -273.48
converged:                       True   LL-Null:                       -339.66
Covariance Type:            nonrobust   LLR p-value:                 1.407e-18
=======================================================================================
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept              33.3027      8.450      3.941      0.000      16.741      49.864
Male                   -1.3877      1.419     -0.978      0.328      -4.170       1.394
Age                     0.0028      0.014      0.205      0.837      -0.024       0.030
Male:Age                0.0119      0.018      0.648      0.517      -0.024       0.048
BMI                     0.0041      0.015      0.272      0.785      -0.025       0.033
Atrial_fibrillation     0.1143      0.241      0.474      0.636      -0.359       0.587
CHD_no_MI               0.2494      0.405      0.616      0.538      -0.544       1.043
COPD                   -0.7732      0.474     -1.632      0.103      -1.702       0.156
Diabetes                0.2632      0.245      1.072      0.284      -0.218       0.744
Deficiency_anemias     -1.0143      0.286     -3.542      0.000      -1.575      -0.453
Depression             -0.7153      0.462     -1.550      0.121      -1.620       0.189
Hyperlipemia           -0.3282      0.247     -1.329      0.184      -0.812       0.156
Hypertensive           -0.0977      0.261     -0.375      0.708      -0.609       0.413
Renal_failure          -0.7933      0.272     -2.919      0.004      -1.326      -0.261
Heart_rate              0.0400      0.008      4.737      0.000       0.023       0.057
Systolic_BP             0.0052      0.008      0.636      0.525      -0.011       0.021
Diastolic_BP           -0.0466      0.014     -3.321      0.001      -0.074      -0.019
Respiratory_rate        0.0580      0.030      1.960      0.050    7.86e-06       0.116
Temperature            -0.8845      0.201     -4.403      0.000      -1.278      -0.491
SpO2                   -0.0430      0.047     -0.916      0.360      -0.135       0.049
Urine_output           -0.0006      0.000     -4.327      0.000      -0.001      -0.000
=======================================================================================

Table 2 reports the results of the full regression model of mortality on gender, age, the gender*age interaction, and all comorbidities, vital signs, and laboratory values. Values in the first column are regression coefficients for each predictor and represent the average change in the log odds of mortality associated with an average change in the value of the predictor. The second column is the standard error surrounding the coefficient. The third column, the Z statistic, is the ratio of the coefficient to the standard error, and the fourth column, P>|Z|, reports the probability values of the Z statistic.

All else equal, gender, age, and the interaction between the two are not statistically significant predictors of mortality after accounting for differences in comorbidities, vital signs, and laboratory values. The only statistically significant predictors of mortality are deficiency anemias, renal failure, heart rate, diastolic blood pressure, respiratory rate, temperature, and urine output.


Discussion¶

This study sought to model the relationships among age, gender, and mortality in a sample of data from 1,177 patients with heart failure admitted to intensive care units in the Boston area from 2001 to 2012. I hypothesized that men would be more likely to die in the ICU, and that age and gender would interact such that the effect of gender on mortality would depend on age.

The results diconfirm my hypotheses about the relationship between gender, age, and mortality. Controlling for differences in comorbidities and vital signs, gender, age, and the interaction between the two are not statistically significant predictors of mortality. This finding is consistent with the findings of Parissis et al. (2012) in their similar analysis of data from a global registry.[5]

This study is limited in three ways. The first, as mentioned previously, is the age of the data. Practice may have evolved in the 11 years since the data were collected. The second is the size of the data set. It includes data on 1,177 patients, and missing values on some variables reduced the sample size for the regression down to 927 patients; and with smaller sample sizes comes the risk of overfitting the model with too many variables. The third limitation is the data set does not include variables on medications or patient care in the ICU. Without this information it is difficult for caregivers to know how to better adapt care to the needs of specific patients.

Although this study failed to find evidence of relationships among gender, age, and mortality among HF patients in ICU settings, this line of inquiry is meritorious. Researchers and data scientists should continue to take full advantage of publicly available clinical data to isolate the factors that make the difference between life and death in the ICU.


References¶

[1] Jentzer JC, Reddy YN, Rosenbaum AN, Dunlay SM, Borlaug BA, Hollenberg SM. Outcomes and Predictors of Mortality Among Cardiac Intensive Care Unit Patients With Heart Failure. J Card Fail. 2022;28(7):1088-1099. https://doi:10.1016/j.cardfail.2022.02.015.

[2] Li J, Liu S, Hu Y, Zhu L, Mao Y, Liu J. Predicting Mortality in Intensive Care Unit Patients With Heart Failure Using an Interpretable Machine Learning Model: Retrospective Cohort Study. J Med Internet Res. 2022;24(8):e38082. Published 2022 Aug 9. https://doi:10.2196/38082

[3] Lombardi, C., Peveri, G., Cani, D., Latta, F., Bonelli, A., Tomasoni, D., Sbolli, M., Ravera, A., Carubelli, V., Saccani, N., Specchia, C., and Metra, M. (2020) In-hospital and long-term mortality for acute heart failure: analysis at the time of admission to the emergency department. ESC Heart Failure, 7: 2650– 2661. https://doi.org/10.1002/ehf2.12847.

[4] Zhou, Jingmin et al. (2021), Prediction model of in-hospital mortality in intensive care unit patients with heart failure: machine learning-based, retrospective analysis of the MIMIC-III database, Dryad, Dataset, https://doi.org/10.5061/dryad.0p2ngf1zd.

[5] Parissis, John T., Lilian Mantziari, Nikolaos Kaldoglou, Ignatios Ikonomidis, Maria Nikolaou, Alexandre Mebazaa, Johann Altenberger et al. "Gender-related differences in patients with acute heart failure: management and predictors of in-hospital mortality." International journal of cardiology 168, no. 1 (2012): 185-189. https://doi.org/10.1016/j.ijcard.2012.09.096.

[6] Centers for Disease Control and Prevention. Men and Heart Disease. https://www.cdc.gov/heartdisease/men.htm.

[7] Centers for Disease Control and Prevention. Women and Heart Disease. https://www.cdc.gov/heartdisease/women.htm

[8] Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.

[9] Johnson, A., Pollard, T., Shen, L. et al. MIMIC-III, a freely accessible critical care database. Sci Data 3, 160035 (2016). https://doi.org/10.1038/sdata.2016.35.

[10] Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

[11] Tom J Pollard, Alistair E W Johnson, Jesse D Raffa, Roger G Mark; tableone: An open source Python package for producing summary statistics for research papers, JAMIA Open, Volume 1, Issue 1, 1 July 2018, Pages 26–31, https://doi.org/10.1093/jamiaopen/ooy012