Missing Data Processing for Clinical Research: Explaining the Statistical Methods of a Top NEJM Paper

Mondo Education Updated on 2024-03-06


Today's article:

Missing data processing in an NEJM clinical trial: an interpretation.

Missing data is a major problem that plagues clinical research and leaves researchers at a loss. So much data goes missing that countless computer scientists, mathematicians, and medical researchers have laboured over the problem. Today we take a look at an article published in the top journal NEJM that uses multiple imputation for missing outcome data.

On December 28, 2023, a clinical RCT entitled "Restrictive or Liberal Transfusion Strategy in Myocardial Infarction and Anemia" was published in The New England Journal of Medicine (TOP journal, IF = 158.5). The authors are from the team of Jeffrey L. Carson at Rutgers University. The study enrolled 3504 patients with myocardial infarction and anemia, randomized to a restrictive or a liberal transfusion strategy, and used log-binomial regression models and multiple imputation to estimate the relationship between transfusion strategy and myocardial infarction or death. The results showed that in patients with myocardial infarction and anaemia, a liberal transfusion strategy did not significantly reduce the risk of myocardial infarction or death within 30 days; however, the potential harms of a restrictive transfusion strategy could not be ruled out.

Summary and key results

1. Research summary

Objective: To determine whether, in patients with myocardial infarction and anaemia, a liberal transfusion strategy reduces the risk of myocardial infarction or death within 30 days compared with a restrictive strategy.

Methods: In this phase 3 interventional trial, researchers randomized patients with myocardial infarction and haemoglobin levels below 10 g/dL to either a restrictive transfusion strategy (transfusion haemoglobin cut-off, 7 or 8 g/dL) or a liberal transfusion strategy (haemoglobin cut-off, < 10 g/dL). The primary outcome was myocardial infarction or death at 30 days.

Results: A total of 3504 patients were included in the main analysis. The mean (±SD) number of red-cell units transfused was 0.7 ± 1.6 in the restrictive-strategy group and 2.5 ± 2.3 in the liberal-strategy group. On days 1 to 3 after randomization, the mean haemoglobin level in the restrictive-strategy group was 1.3 to 1.6 g/dL lower than in the liberal-strategy group. A primary-outcome event occurred in 295 of the 1749 patients (16.9%) in the restrictive-strategy group and 255 of the 1755 patients (14.5%) in the liberal-strategy group (risk ratio with multiple imputation for incomplete follow-up, 1.15; 95% confidence interval [CI], 0.99 to 1.34; P = 0.07). Death occurred in 9.9% of patients with the restrictive strategy and 8.3% with the liberal strategy (risk ratio, 1.19; 95% CI, 0.96 to 1.47); myocardial infarction occurred in 8.5% and 7.2% of patients, respectively (risk ratio, 1.19; 95% CI, 0.94 to 1.49).

Conclusion: In patients with myocardial infarction and anaemia, a liberal transfusion strategy did not significantly reduce the risk of myocardial infarction or death within 30 days. However, the potential harms of a restrictive transfusion strategy cannot be ruled out.

2. Findings

1. Baseline characteristics

From April 2017 to April 2023, a total of 3506 patients were enrolled; 3504 were included in the analysis after 2 patients declined the use of their data. The mean age was 72.1 years, and 45.5% of the patients were women. These patients often had comorbidities: approximately one third had a history of myocardial infarction, coronary revascularization, or heart failure, and nearly half had renal failure. Multivessel disease and reduced left ventricular systolic function were common among patients who underwent coronary angiography and left ventricular function assessment before randomization. Most patients (55.8%) had type 2 myocardial infarction, followed by type 1 (41.7%). The mean haemoglobin level before randomization was 8.6 g/dL, and the median creatinine was 1.4 mg/dL (124 μmol/L). A total of 3447 patients (98.3%) who underwent randomization were followed for 30 days.

2. Implementation of the intervention

On day 1, the mean haemoglobin level in the restrictive-strategy group was 1.3 g/dL lower than in the liberal-strategy group (95% confidence interval [CI], 1.2 to 1.4); on day 3 it was 1.6 g/dL lower (95% CI, 1.5 to 1.7). The total number of red-cell units transfused in the liberal-strategy group was 3.5 times that in the restrictive-strategy group (4,325 units vs. 1,237 units). The mean (±SD) number of red-cell units transfused was 2.5 ± 2.3 in the liberal-strategy group and 0.7 ± 1.6 in the restrictive-strategy group. The median length of hospital stay between randomization and discharge, discontinuation, or death was 5 days (interquartile range, 2 to 10).

In the restrictive-strategy group, protocol discontinuation occurred in 46 patients (2.6%), of whom 24 discontinued for clinical reasons, including surgery and bleeding.

In the liberal-strategy group, protocol discontinuation occurred in 241 patients (13.7%); of these, 89 had clinical reasons, including adverse events, fluid overload, dialysis, and transfusion reactions.

Other reasons for discontinuation included patient preference (68 patients), provider preference (53), and other causes (31), including blood shortages and staffing issues.

3. Primary and secondary outcomes

Within 30 days, 295 of the 1749 patients (16.9%) in the restrictive-strategy group and 255 of the 1755 patients (14.5%) in the liberal-strategy group had a myocardial infarction or died from any cause (the primary outcome). The crude risk ratio (restrictive vs. liberal) was 1.16 (95% CI, 1.00 to 1.35). In a log-binomial model adjusted for site, with multiple imputation for the 57 patients with incomplete follow-up (20 with the restrictive strategy and 37 with the liberal strategy), the estimated risk ratio for the primary outcome was 1.15 (95% CI, 0.99 to 1.34; P = 0.07). After additional adjustment for baseline prognostic factors (risk ratio, 1.16; 95% CI, 1.00 to 1.36), the model's estimate of the primary outcome was consistent with the first two calculations.

4. Kaplan-Meier curves for the primary outcome (myocardial infarction or death).

5. Subgroup analysis

The effect of the restrictive transfusion strategy on the primary outcome, compared with the liberal strategy, was broadly consistent across the prespecified subgroups. In patients with type 1 myocardial infarction, the restrictive strategy was associated with more primary-outcome events than the liberal strategy (risk ratio, 1.32; 95% CI, 1.04 to 1.67), with no significant effect in patients with type 2 myocardial infarction (risk ratio, 1.05; 95% CI, 0.85 to 1.29).

Design and Statistical Methods

1. Research design

P: Adults (≥ 18 years of age) with ST-elevation or non-ST-elevation myocardial infarction and anaemia (haemoglobin level < 10 g/dL within the 24 hours before randomization), enrolled at 144 trial sites; a total of 3506 patients.

E/C: Liberal transfusion strategy (haemoglobin cut-off, < 10 g/dL) vs. restrictive transfusion strategy (transfusion haemoglobin cut-off, 7 or 8 g/dL).

O: Myocardial infarction or death at 30 days.

S: Open-label RCT.

2. Statistical methods

1. The analysis followed the intention-to-treat (ITT) principle, using a two-sided test with α = 0.05 and 80% power, assuming an overall incidence of myocardial infarction or death of 16.4%.

2. Log-binomial regression was used to analyse the primary outcome, with the assigned transfusion strategy as a fixed effect and the clinical trial site as a random effect.

3. Multiple imputation by chained equations (MICE) was used to impute the missing outcome data of patients who withdrew or were lost to follow-up before day 30 without a primary-outcome event.

4. For all trial outcomes, the crude 30-day risks were analysed without multiple imputation, and risk ratios (RRs) with 95% CIs were calculated.

5. As a secondary analysis of the primary outcome, the Kaplan-Meier method was used to estimate the cumulative risk of primary-outcome events at the time of discontinuation and at 30 days, and the log-rank statistic was used to compare the two cumulative-risk curves.

Imputation vs. multiple imputation

1. Basic knowledge of data imputation

For missing values, there are three common ways to handle them: leave them as-is (use an algorithm that tolerates missing values, or treat "missing" as its own category); delete them; or impute them. But every piece of clinical data is a valuable resource that we want to use as fully as possible, so reasonable imputation is often the ideal approach. Common imputation methods include the following:

1. Mean, median, and mode imputation.

For continuous quantitative data, the mean or median can be imputed. It is a simple, feasible, and commonly used method.

For discrete data, mode imputation can be used. The mode is the value that occurs most frequently in the data.

2. Fixed-value imputation.

Missing values are replaced with a uniform reference value, a standard value, or a special value.
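Items 1 and 2 above can be sketched in a few lines of pandas; the column names and values here are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "hemoglobin": [8.6, 9.1, np.nan, 7.8, np.nan],  # continuous variable
    "mi_type":    [1, 2, 2, np.nan, 2],             # discrete variable
})

# 1. mean / median imputation for a continuous variable
df["hb_mean"] = df["hemoglobin"].fillna(df["hemoglobin"].mean())
df["hb_median"] = df["hemoglobin"].fillna(df["hemoglobin"].median())

# mode imputation (most frequent value) for a discrete variable
df["mi_mode"] = df["mi_type"].fillna(df["mi_type"].mode().iloc[0])

# 2. fixed-value imputation with an agreed reference value
df["hb_fixed"] = df["hemoglobin"].fillna(10.0)
```

All of these are single imputations: each missing entry is replaced by one fixed number, which ignores the uncertainty about the true value.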

3.Proximity imputation.

The attribute values of the k samples closest to (most similar to) the sample with missing data are weighted and averaged to fill in the missing value. When k = 1, nearest-neighbour imputation is also called hot-deck imputation.

4. Regression imputation.

A regression model is built on the observed data and used to predict the missing values.
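Items 3 and 4 are both available in scikit-learn; this is a minimal sketch on a hypothetical age/haemoglobin matrix:

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([
    [72.0, 8.6],
    [65.0, 9.1],
    [80.0, np.nan],  # haemoglobin missing for this patient
    [77.0, 7.9],
])

# 3. proximity imputation: average the value over the k most similar rows
#    (with n_neighbors=1 this is nearest-neighbour / hot-deck imputation)
knn = KNNImputer(n_neighbors=2)
X_knn = knn.fit_transform(X)

# 4. regression imputation: each feature with missing values is modelled
#    as a function of the other features and predicted from that model
reg = IterativeImputer(max_iter=10, random_state=0)
X_reg = reg.fit_transform(X)
```

Here the kNN fill for the missing haemoglobin is the mean of the two patients closest in age (77 and 72 years), i.e. (7.9 + 8.6) / 2 = 8.25.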

5.Function interpolation.

For one-dimensional data, linear interpolation can be done between two neighbouring sample points. Lagrange or Newton interpolation can also be used with several nearby sample points. The difference between function interpolation and regression imputation is that function interpolation passes exactly through the sample points, whereas regression fits the data by minimizing the squared error.
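Both kinds of interpolation are one-liners with numpy and scipy; the data here are a made-up series y = 10x with the point at x = 3 missing:

```python
import numpy as np
from scipy.interpolate import lagrange

x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([10.0, 20.0, 40.0, 50.0])  # y = 10x, value at x = 3 is missing

# linear interpolation between the two neighbouring sample points
y3_linear = np.interp(3.0, x, y)

# Lagrange polynomial passing exactly through all four sample points
poly = lagrange(x, y)
y3_lagrange = poly(3.0)
```

Both recover 30.0 here because the data are exactly linear; on noisy data the interpolant still passes through every sample point, noise included, which is exactly the "perfect fit" property the text describes.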

6 Multiple imputation (MI).

Multiple imputation uses model estimation and repeated simulation to generate several complete data sets. The basic principle is to simulate the distribution of the missing data and then randomly draw values from it to fill in the missing entries.

For example, suppose we impute a missing value with y = ax + b; this ignores sampling variability. Adding a residual term e and imputing with y = ax + b + e is better, but it still treats a and b as the true values, when in fact they are only our estimates. So instead we draw a and b at random from their Bayesian posterior distribution. With the Markov chain Monte Carlo (MCMC) method, we can carry out this multiple imputation by generating chains that converge to a stationary distribution and sampling from them.

2. Multiple imputation in this paper

1. The multiple imputation method is described in detail in the appendix and protocol. The main approach is Markov chain Monte Carlo (MCMC) multiple imputation, used to impute the primary-outcome data of patients who dropped out or were lost during follow-up, i.e., the missing 30-day values (yes/no) for death and myocardial infarction. The software used was SAS 9.4 (PROC MI and PROC MIANALYZE).

2. The procedure is to use the complete data to build a log-binomial model estimating the relationship between the outcome and key variables. This model yields outcome probabilities for participants with missing 30-day results, and based on these probabilities, ten imputed data sets are created. A log-binomial model with site random effects is then estimated on each imputed data set, and the results are pooled into a single estimate of the effect with appropriately adjusted standard errors. A number of sensitivity analyses were then performed, with similar results across methods.
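The full pipeline, i.e. "proper" imputation with parameter draws and residual noise (the y = ax + b + e idea above), analysis of each completed data set, and pooling into a single estimate via Rubin's rules, can be sketched by hand in Python. This is a simplified linear-regression stand-in for the paper's SAS/MCMC procedure, on simulated data, but it keeps the trial's choice of ten imputed data sets:

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 500, 10  # m = number of imputed data sets, as in the trial

x = rng.normal(0, 1, n)
y = 2.0 * x + 1.0 + rng.normal(0, 1, n)  # true model: y = ax + b + e
miss = rng.random(n) < 0.2               # 20% of y missing at random
obs = ~miss

estimates, variances = [], []
for _ in range(m):
    # fit y = ax + b on the observed cases
    A = np.column_stack([x[obs], np.ones(obs.sum())])
    coef, res, *_ = np.linalg.lstsq(A, y[obs], rcond=None)
    sigma = np.sqrt(res[0] / (obs.sum() - 2))
    # draw a, b from their approximate sampling distribution, then add
    # residual noise e -- this is what makes the imputation "proper"
    cov = sigma**2 * np.linalg.inv(A.T @ A)
    a_s, b_s = rng.multivariate_normal(coef, cov)
    y_imp = y.copy()
    y_imp[miss] = a_s * x[miss] + b_s + rng.normal(0, sigma, miss.sum())
    # analyse the completed data set (here: re-estimate the slope)
    A_full = np.column_stack([x, np.ones(n)])
    c, r, *_ = np.linalg.lstsq(A_full, y_imp, rcond=None)
    estimates.append(c[0])
    s2 = r[0] / (n - 2)
    variances.append(s2 * np.linalg.inv(A_full.T @ A_full)[0, 0])

# Rubin's rules: pool the m estimates into one estimate, with a standard
# error combining within- and between-imputation variance
q_bar = np.mean(estimates)
w = np.mean(variances)             # within-imputation variance
b = np.var(estimates, ddof=1)      # between-imputation variance
total_se = np.sqrt(w + (1 + 1 / m) * b)
print(f"pooled slope = {q_bar:.2f} (SE {total_se:.3f})")
```

The between-imputation term (1 + 1/m)·b is what distinguishes multiple imputation from filling in one value and pretending it was observed: it inflates the standard error to reflect the uncertainty about the missing data.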

3. Variables included in the multiple imputation models: see the trial appendix and protocol.

Postscript:

If you have done clinical research, data collection, and statistical analysis, you have almost certainly run into missing data; it is practically a hallmark of real-world human research. What should you do with missing data? Simply delete it? You might instead try multiple imputation. Multiple imputation (MI) is a method for filling in missing values in complex data and has appeared in many high-quality SCI clinical studies in recent years.

In this paper, the missing data were filled in through multiple imputation, yielding data closer to the actual situation and more realistic conclusions. This also offers readers an approach for similar situations of their own. More detailed methods will require reading further literature; after all, there is no end to learning, and there is always something new.

