Levels of Evidence for Adult and Pediatric Cancer Treatment Studies (PDQ®): Treatment - Health Professional Information [NCI]
This information is produced and provided by the National Cancer Institute (NCI). The information in this topic may have changed since it was written. For the most current information, contact the National Cancer Institute via the Internet web site at http://cancer.gov or call 1-800-4-CANCER.
A variety of end points may be measured and reported from clinical studies in oncology. These may include total mortality (or survival from the initiation of therapy), cause-specific mortality, quality of life, or indirect surrogates such as event-free survival, disease-free survival, progression-free survival, or tumor response rate. End points may also be determined within study designs of varying strength, ranging from the gold standard—the randomized, double-blinded, controlled clinical trial—to case series experiences from nonconsecutive patients.
The PDQ editorial boards use a formal ranking system of levels of evidence to help the reader judge the strength of evidence linked to the reported results of a therapeutic strategy. For any given therapy, results can be ranked on each of the following two scales: (1) strength of the study design and (2) strength of the study end points. Together, the two rankings inform the overall level of evidence. Depending on perspective, different expert panels, professional organizations, or individual physicians may use different cut points of overall strength of evidence in formulating therapeutic guidelines or in taking action. However, a formal description of the level of evidence provides a uniform framework for the data, leaving specific recommendations to the reader.
The PDQ Adult Treatment Editorial Board and the PDQ Pediatric Treatment Editorial Board add information on levels of evidence, described below, to the PDQ Adult Cancer Treatment Summaries and the PDQ Pediatric Cancer Treatment Summaries when appropriate.
Strength of Study Design
The various types of study design are described below in descending order of strength:
- Randomized controlled clinical trials (RCTs).
The double-blinded RCT is the gold standard of study design. To meet this designation, the treatment allocation must be concealed from both the patient and the physician, before and after the randomization and treatment assignment take place. This design provides protection from allocation bias by the investigator and from bias in the assessment of outcomes by both the investigator and the patient. Unfortunately, most clinical trials in oncology cannot remain double-blinded after treatment allocation because procedures or toxic effects often differ substantially among study arms in ways that are obvious to both the health care professional and the patient.
- Meta-analyses of RCTs.
Meta-analyses of randomized studies offer a quantitative synthesis of previously conducted studies. The strength of evidence from a meta-analysis is based on the quality of the conduct of the individual studies. Moreover, meta-analyses can magnify small systematic errors in individual studies. A study comparing the results of single, large, randomized trials with those of meta-analyses of smaller trials published earlier on the same topics showed only fair agreement (kappa statistic, 0.35): the outcomes of the large RCTs were not predicted accurately by the earlier meta-analyses 35% of the time.[1,2] Meta-analyses performed by different investigators to address the same clinical issue can also reach contradictory conclusions. Therefore, meta-analyses of randomized studies are placed in the same category of strength of evidence as randomized studies or in a lower one, not at a higher level.
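The kappa statistic cited above measures agreement beyond what chance alone would produce. The following is a minimal Python sketch of Cohen's kappa, assuming the comparison is summarized as a square table of counts in which rows are the conclusion of the earlier meta-analysis and columns are the conclusion of the later large RCT. The function name and the counts in the example are hypothetical and are not the data of the cited study.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (hypothetical helper).

    table[i][j] = number of topics where source A (e.g., the earlier
    meta-analysis) gave category i and source B (e.g., the later large
    RCT) gave category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table must contain at least one observation")
    # Observed agreement: proportion of all counts on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of the product of the two
    # sources' marginal proportions.
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical table: rows = meta-analysis (positive, negative),
# columns = later large RCT (positive, negative).
example = [[20, 5],
           [10, 15]]
print(round(cohens_kappa(example), 2))  # 0.4
```

In this made-up example the two sources agree on 70% of topics, yet kappa is only 0.4, because half of the agreement would be expected by chance from the marginal totals alone.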
- Nonrandomized controlled clinical trials.
This category includes trials in which treatment allocation was nonrandomized and prospective. The allocation would be known to the investigator before informed consent is obtained from the patient, so imbalances in patient characteristics between treatment groups can occur under such circumstances. Subset analyses within randomized trials often fall into this category of evidence.
Subset analyses of randomized studies are subject to errors inherent in multiplicity (i.e., when many subsets are examined, some statistically significant results are expected by chance alone because of random variation in the measured effects). Therefore, subset analyses do not represent the same strength of evidence as the overall analysis of a randomized trial as designed, unless explicit prospective hypotheses are made for the analyzed subset. Subset analyses should be placed in the next lower category of study design, nonrandomized controlled clinical trials.
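The multiplicity problem can be made concrete with a short calculation. The Python sketch below assumes that the treatment truly has no effect in any subset, that each subset is tested at a significance level alpha of 0.05, and that the tests are independent; real subsets usually overlap, so this is an approximation for illustration. The function name is hypothetical.

```python
def prob_any_false_positive(num_subsets: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_subsets` independent tests is
    'significant' at level `alpha` when there is no true effect in any
    subset (hypothetical helper; assumes independent tests)."""
    if num_subsets < 0 or not 0 <= alpha <= 1:
        raise ValueError("need num_subsets >= 0 and 0 <= alpha <= 1")
    # Each null test stays nonsignificant with probability (1 - alpha).
    return 1 - (1 - alpha) ** num_subsets


for k in (1, 5, 10, 20):
    print(f"{k:>2} subsets: {prob_any_false_positive(k):.0%}")
```

Under these assumptions, examining 5, 10, or 20 subsets gives roughly a 23%, 40%, or 64% chance of at least one spuriously significant result, which is why an unplanned subset finding carries less weight than the trial's prespecified overall analysis.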
- Case series or other observational study designs.
These clinical experiences are the weakest form of study design, but they may be the only available or practical information in support of a therapeutic strategy. This is especially true in the case of rare diseases or when the evolution of the therapy predates the common use of randomized study designs in medical practice. Case series or observational designs may also provide the only practical design when treatments in study arms are radically different (e.g., amputation vs. limb-sparing surgery). Nevertheless, they always raise issues of patient selection and comparability with other populations. In terms of generalizability to other populations, the study designs, ranked from strongest to weakest, are the following: population-based studies that have a definable population, nonpopulation-based but consecutive series, and nonconsecutive cases. Some study designs (e.g., cohort and case-control studies) have internal-control study subjects, while others do not (e.g., case-only series with no internal comparison group or case-only series that are compared with historical controls).
Even large, population-based, observational studies with internal controls that compare therapeutic strategies in oncology should be interpreted with extreme caution. In a study that directly compared observational studies with RCT results, investigators performed a systematic MEDLINE search for observational studies published from 2000 to 2016 that used data from the Surveillance, Epidemiology, and End Results (SEER) Program, SEER-Medicare, or the National Cancer Database to compare treatment regimens for any cancer diagnosis.[3] The investigators matched 350 treatment comparisons to 121 RCTs that made the same comparison. They found no significant correlation between the hazard ratios (HRs) of the observational studies and those of the matching RCTs (concordance correlation coefficient, 0.08; 95% confidence interval [CI], -0.07 to 0.23). Only 40% of matched studies agreed with respect to treatment effects (kappa statistic, 0.037), and only 62% of the HRs in observational studies fell within the 95% CI of the matched RCT. None of these measures of agreement exceeded what would be expected by chance, and agreement did not improve in the studies that used the most sophisticated statistical methods of analysis, including propensity score weighting, instrumental variable adjustment, or sensitivity analysis. Of note, among the 70 observational studies that were ranked as rigorous and that reported overall survival, 35 reported a positive result, while the matched RCT reported either no difference or an effect in the opposite direction.
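The concordance correlation coefficient reported above (Lin's CCC) differs from an ordinary correlation in that it also penalizes systematic differences between paired values: it equals 2 s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), using population (1/n) variances and covariance. The Python sketch below assumes HRs are compared on the log scale; that choice, the function name, and the HR values in the example are hypothetical illustrations, not the method or data of the cited study.

```python
import math


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired values
    (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired sequences of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population (1/n) variances and covariance, as in Lin's definition.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both series are constant and equal")
    return 2 * cov_xy / denom


# Hypothetical HRs: every observational HR is 20% lower than its RCT match.
rct_hr = [0.70, 0.85, 1.00, 1.20]
obs_hr = [hr * 0.8 for hr in rct_hr]
log_rct = [math.log(hr) for hr in rct_hr]
log_obs = [math.log(hr) for hr in obs_hr]
print(round(concordance_ccc(log_rct, log_obs), 3))
```

In this made-up example the log HRs are perfectly correlated (a constant shift), yet the CCC falls well below 1 because the observational estimates are systematically more favorable than the matched RCTs.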
- LeLorier J, Grégoire G, Benhaddad A, et al.: Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 337 (8): 536-42, 1997.
- Bailar JC: The promise and problems of meta-analysis. N Engl J Med 337 (8): 559-61, 1997.
- Soni PD, Hartman HE, Dess RT, et al.: Comparison of Population-Based Observational Studies With Randomized Trials in Oncology. J Clin Oncol 37 (14): 1209-1216, 2019.
Strength of Study End Points
Commonly measured end points for adult and pediatric cancer treatment studies are listed below in descending order of strength:
- Overall survival from a defined time (or total mortality).
This outcome is arguably the most important one to patients and is also the most easily defined and least subject to investigator bias.
- Cause-specific mortality (or cause-specific mortality from a defined time).
Although this may be the most biologically important outcome in a disease-specific intervention, it is a more subjective end point than total mortality and more subject to investigator bias in its determination, because attributing each death to a specific cause requires judgment. This end point may also miss important effects of therapy that actually shorten overall survival (e.g., deaths from treatment-related toxic effects that are not attributed to the cancer).
- Carefully assessed quality of life.
This is an extremely important end point to patients. Careful documentation of this end point within a strong study design is therefore sufficient for most physicians to incorporate a treatment into their practices.
- Indirect surrogates.
- Event-free survival.
- Disease-free survival.
- Progression-free survival.
- Tumor response rate.
These end points may be subject to investigator interpretation. More importantly, they may, but do not automatically, translate into direct patient benefit such as survival or quality of life. Nevertheless, it is rational in many circumstances to use a treatment that improves these surrogate end points while awaiting a more definitive end point to support its use.
Summation of Levels of Evidence
After considering the strength of the study design and the strength of the study end points, the following levels of evidence may be added to PDQ Adult Cancer Treatment Summaries and PDQ Pediatric Cancer Treatment Summaries:
- A1 Evidence. Randomized controlled clinical trial (RCT) (double-blinded or nonblinded) with an end point of overall survival (OS) from a defined time, total mortality, or cause-specific mortality.
- A2 Evidence. Meta-analysis of RCTs with an end point of OS from a defined time, total mortality, or cause-specific mortality.
- A3 Evidence. RCT (double-blinded or nonblinded) with an end point of quality of life that is well-collected, clinically meaningful, and carefully assessed.
- B1 Evidence. RCT (double-blinded or nonblinded) with an end point of event-free survival (EFS), disease-free survival (DFS), or progression-free survival (PFS) differences.
- B2 Evidence. Meta-analysis of RCTs with an end point of EFS, DFS, PFS, or carefully assessed quality of life.
- B3 Evidence. RCT (double-blinded or nonblinded) with an end point of tumor response rate or quality-of-life measurement that does not reach the level described in A3.
- B4 Evidence. Nonrandomized, multicenter, prospective, controlled clinical trial with a planned comparison of efficacy including an end point of OS from a defined time, total mortality, cause-specific mortality, carefully assessed quality of life, EFS, DFS, PFS, or tumor response differences.
- C1 Evidence. Case series or other observational study design, including trials with nonconsecutive cases, with an end point of OS from a defined time, total mortality, cause-specific mortality, or carefully assessed quality of life.
- C2 Evidence. Case series or other observational study design, including trials with nonconsecutive cases, with an end point of EFS, DFS, or PFS differences.
- C3 Evidence. Case series or other observational study design, including trials with nonconsecutive cases, with an end point of tumor response rate or quality-of-life measurement that does not reach the level described in A3.
- D Evidence. Anecdotal experience or expert opinion.
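The summation list above combines the two scales into a single label. The Python sketch below encodes that list as a lookup table. The design and end point labels (`rct`, `meta_analysis`, `careful_qol`, and so on) are hypothetical names chosen for illustration, and combinations the list does not define (for example, a meta-analysis with a tumor response end point) return `None` rather than a guessed level. It also simplifies B4, which additionally requires a multicenter, prospective design with a planned comparison of efficacy; the caller is assumed to have checked that.

```python
from typing import Optional

# Hypothetical end point labels.
SURVIVAL = ("overall_survival", "total_mortality", "cause_specific_mortality")
EVENT_BASED = ("efs", "dfs", "pfs")
CAREFUL_QOL = "careful_qol"        # well-collected, clinically meaningful, carefully assessed
OTHER_QOL = "other_qol"            # QOL measurement below the A3 standard
TUMOR_RESPONSE = "tumor_response"

# (design, end points, level), transcribed from the A1 to C3 list.
_RULES = [
    ("rct", SURVIVAL, "A1"),
    ("meta_analysis", SURVIVAL, "A2"),
    ("rct", (CAREFUL_QOL,), "A3"),
    ("rct", EVENT_BASED, "B1"),
    ("meta_analysis", EVENT_BASED + (CAREFUL_QOL,), "B2"),
    ("rct", (TUMOR_RESPONSE, OTHER_QOL), "B3"),
    # B4 assumes a multicenter, prospective trial with a planned comparison.
    ("nonrandomized_controlled",
     SURVIVAL + EVENT_BASED + (CAREFUL_QOL, TUMOR_RESPONSE), "B4"),
    ("observational", SURVIVAL + (CAREFUL_QOL,), "C1"),
    ("observational", EVENT_BASED, "C2"),
    ("observational", (TUMOR_RESPONSE, OTHER_QOL), "C3"),
]

# Flatten the rules into a (design, end point) -> level dictionary.
LEVELS = {(design, ep): level for design, eps, level in _RULES for ep in eps}


def level_of_evidence(design: str, end_point: str) -> Optional[str]:
    """Return the level (A1 to D) for a design and end point pair, or None
    if the list does not define that combination (hypothetical helper)."""
    if design == "anecdotal_or_expert_opinion":
        return "D"  # D applies regardless of the end point
    return LEVELS.get((design, end_point))


print(level_of_evidence("rct", "overall_survival"))    # A1
print(level_of_evidence("meta_analysis", "pfs"))       # B2
print(level_of_evidence("observational", "other_qol")) # C3
```

A table keeps the mapping auditable against the list line by line; as the text notes, the resulting label is only a starting point and does not capture toxicity, confidence interval width, trial size, quality assurance, or cost.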
Beyond the level of evidence, all recommendations must take into account other issues that cannot be so easily quantified, such as toxicity, width of confidence intervals of observations, trial size, quality assurance in the trial, and cost. Nevertheless, the PDQ ranking system provides a categorization of strength of evidence as a starting point to evaluate study results.
Changes to This Summary (10/13/2022)
The PDQ cancer information summaries are reviewed regularly and updated as new information becomes available. This section describes the latest changes made to this summary as of the date above.
This summary was comprehensively reviewed and extensively revised.
This summary is written and maintained by the PDQ Adult Treatment Editorial Board, which is editorially independent of NCI. The summary reflects an independent review of the literature and does not represent a policy statement of NCI or NIH. More information about summary policies and the role of the PDQ Editorial Boards in maintaining the PDQ summaries can be found on the About This PDQ Summary and PDQ® - NCI's Comprehensive Cancer Database pages.
About This PDQ Summary
Purpose of This Summary
This PDQ cancer information summary for health professionals provides comprehensive, peer-reviewed, evidence-based information about the formal ranking system used by the PDQ Editorial Boards to assess evidence supporting the use of specific interventions or approaches. It is intended as a resource to inform and assist clinicians in the care of their patients. It does not provide formal guidelines or recommendations for making health care decisions.
Reviewers and Updates
This summary is reviewed regularly and updated as necessary by the PDQ Adult Treatment Editorial Board, which is editorially independent of the National Cancer Institute (NCI). The summary reflects an independent review of the literature and does not represent a policy statement of NCI or the National Institutes of Health (NIH).
Board members review recently published articles each month to determine whether an article should:
- be discussed at a meeting,
- be cited with text, or
- replace or update an existing article that is already cited.
Changes to the summaries are made through a consensus process in which Board members evaluate the strength of the evidence in the published articles and determine how the article should be included in the summary.
Any comments or questions about the summary content should be submitted to Cancer.gov through the NCI website's Email Us. Do not contact the individual Board Members with questions or comments about the summaries. Board members will not respond to individual inquiries.
Levels of Evidence
Some of the reference citations in this summary are accompanied by a level-of-evidence designation. These designations are intended to help readers assess the strength of the evidence supporting the use of specific interventions or approaches. The PDQ Adult Treatment Editorial Board uses a formal evidence ranking system in developing its level-of-evidence designations.
Permission to Use This Summary
PDQ is a registered trademark. Although the content of PDQ documents can be used freely as text, it cannot be identified as an NCI PDQ cancer information summary unless it is presented in its entirety and is regularly updated. However, an author would be permitted to write a sentence such as "NCI's PDQ cancer information summary about breast cancer prevention states the risks succinctly: [include excerpt from the summary]."
The preferred citation for this PDQ summary is:
PDQ® Adult Treatment Editorial Board. PDQ Levels of Evidence for Adult and Pediatric Cancer Treatment Studies. Bethesda, MD: National Cancer Institute. Updated <MM/DD/YYYY>. Available at: https://www.cancer.gov/publications/pdq/levels-evidence/treatment. Accessed <MM/DD/YYYY>. [PMID: 26389191]
Images in this summary are used with permission of the author(s), artist, and/or publisher for use within the PDQ summaries only. Permission to use images outside the context of PDQ information must be obtained from the owner(s) and cannot be granted by the National Cancer Institute. Information about using the illustrations in this summary, along with many other cancer-related images, is available in Visuals Online, a collection of over 2,000 scientific images.
Based on the strength of the available evidence, treatment options may be described as either "standard" or "under clinical evaluation." These classifications should not be used as a basis for insurance reimbursement determinations. More information on insurance coverage is available on Cancer.gov on the Managing Cancer Care page.
More information about contacting us or receiving help with the Cancer.gov website can be found on our Contact Us for Help page. Questions can also be submitted to Cancer.gov through the website's Email Us.
Last Revised: 2022-10-13