Chris Barker, an Independent Statistical Consultant, discusses the differences and biases in summaries obtained from a database called PDQ vs. CTG.
Documented in this article are systematic differences and biases in summaries of oncology clinical trial characteristics obtained from a database called Physician Data Query (PDQ) vs. clinicaltrials.gov (CTG). Due to the bias, I recommend that PDQ not be used for analysis of characteristics of oncology clinical trials or for policy decision-making, as suggested in the recent publication by Budish et al. (2013). On April 12, 2016, I extracted a subset of 14,476 industry-sponsored oncology clinical trials of a drug or biologic with non-missing clinical trial phase from more than 200,000 clinical studies for the 17-year period 1999 to 2016 in CTG. Budish et al. used PDQ in an analysis to assess characteristics of long-term pharmaceutical investments. The Budish et al. publication variously reports that the PDQ database has “nearly 12,000” and “more than 17,000” oncology clinical trials, or a mean of 945 trials for a 38-year period, 1973 – 2011. Policy decision-makers should be alert to the fact that PDQ is a non-random subset of CTG and gives a biased summary of characteristics of oncology clinical trials vs. CTG.
PDQ’s purpose, as stated on its website, is the dissemination of information about cancer research, and the database contains NCI-sponsored trials. PDQ contains only a non-random subset of CTG and is a biased source of information on characteristics of all clinical trials. Systematic differences in distributions of oncology trial characteristics are indicative of bias due to non-random selection. I recommend that PDQ not be used to make unbiased assessments of clinical trial characteristics. CTG, with reporting mandated by federal law, is the preferred unbiased “gold standard” comprehensive database for research on the universe of clinical trials and their characteristics and for policy recommendations.
The PDQ database was used by Budish et al. (2015) in “Do Firms Underinvest in Long-Term Research? Evidence from Cancer Clinical Trials.” That paper included summaries of oncology clinical trial characteristics from PDQ to support the conclusion that pharmaceutical firms in general underinvest in long-term research. The authors also stated that the PDQ database was not intended for research. I downloaded characteristics of oncology clinical trials from CTG and compared them with the characteristics of clinical trials in PDQ presented in the Budish et al. publication. Summaries of trial characteristics from CTG differ markedly and transparently from PDQ. Budish et al. did not report that PDQ consists of the trials that the National Cancer Institute (NCI) co-sponsors and is a non-random subset of the entire CTG database. Systematic differences between the CTG and PDQ databases are not unexpected due to the non-random subset. I caution researchers about using PDQ for policy decisions due to biases arising from a non-random subset of clinical trials in CTG.
I also note that Budish et al. provide analyses of characteristics of oncology clinical trials that may be used for policy decisions. More recently, Hirsch et al. recommend use of the CTG database for policy decisions. For the reasons discussed herein, CTG is the more definitive database for analysis, and the inferences apply to the universe of clinical trials included in the database. Because it is a non-randomly selected subset, PDQ may be used for developing inferences related only to the subset of clinical trials in PDQ but not for inferences for the broad universe of clinical trials such as in CTG. In addition, both Budish et al. and Hirsch et al. combine an external database with the PDQ or CTG database, respectively. For reasons described in more detail ahead, I caution researchers about the absence of detail in Budish et al. and Hirsch et al. as to how these combinations were prepared, and about the serious caveats on the interpretation of the combined data.
There are important caveats for using the CTG database for research on pharmaceutical drug development. It does not provide a link for the sequence of Phase I, II, and III trials that were intended for the approval of a drug, nor a link identifying the two Phase III clinical trials that are the basis of a regulatory agency drug approval and marketing authorization. This information would require confirmation with the trial sponsor.
The PDQ database was used in “Do Firms Underinvest in Long-Term Research? Evidence from Cancer Clinical Trials,” published in the July 7, 2015 issue of The American Economic Review. It included summaries of oncology clinical trial characteristics to support the authors’ assertion that pharmaceutical firms underinvest in long-term research. I retrieved a copy of the PDQ database from NCI, and the database help service confirmed that the database at present includes only clinical trials with sponsorship from NCI and does not include all clinical trials in CTG. Estimates from the two databases are very different, and analyses based on PDQ apply only to NCI-sponsored trials. PDQ is a non-random subset of the CTG database, and a priori it may be expected that distributions of oncology trial characteristics differ between the databases, and that this difference is a bias due to the non-random subset. In preparing this manuscript, I found that not all database elements in CTG are available for download and discovered that a second version of the database, Aggregate Analysis of ClinicalTrials.gov (AACT), is available and provides additional fields that are not currently available on download of CTG.
Budish et al. state that PDQ “… was not developed as a research database and - to the best of our knowledge - has not previously been used as a data source by other researchers…” (Budish et al. p. 22). Budish notes on page 64 of an online appendix to the article, labeled “not for publication,” that contributions to the PDQ database are “voluntary.” Per the documentation for PDQ, contributions to PDQ are reviewed by medical experts, and some portion of the clinical trials is obtained from CTG. CTG, however, with approval by Congress, was designed and intended for use in research. I note also that “online appendices” may not be readily available to readers of the Budish article.
The CTG database is maintained by the U.S. National Library of Medicine. Section 113 of the Food and Drug Administration Modernization Act (FDAMA) of 1997 mandates the inclusion of clinical trial data in the CTG database, with a few exceptions for early phase clinical trials, which are described on the CTG website. Entry of clinical trial information into the CTG database began in 1999. Clinical trial sponsors who do not report clinical trials in the CTG database or who delay reporting to the CTG database face fines and other penalties. The CTG reporting requirement was reinforced in a subsequent amendment to the FDAMA passed in 2014. It is also expected that informed consent was obtained in every trial before a patient enrolled (World Medical Association).
The database used in the present analysis was created from three separate extractions using the advanced search option for the keywords “oncology,” “cancer,” and “neoplasm” in the condition field. The three separate files were combined, and duplicate records were removed using SAS. The complete dataset, aggregated on April 12, 2016, represents a subset of 15,049 industry-sponsored oncology clinical trials involving drugs or biologic interventions from the more than 200,000 clinical trials contained in the CTG database (Figure I). This subset covers the 17 years from 1999 (the year of the CTG database’s inception) to 2016, a mean of 885 oncology clinical trials per year. Of the clinical trials in this subset, 14,476 had a non-missing clinical trial phase.
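The combine-and-deduplicate step can be sketched as follows. This is only an illustration in Python, not the SAS program used for this analysis. It assumes each extract is a CSV file with a trial identifier column named "NCT Number" and a "Phases" column; those column names, the helper functions, and the toy records are assumptions made for the example.

```python
import csv
import io

def read_extract(text):
    """Parse one CSV extract (given as text) into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def combine_extracts(extracts, key="NCT Number"):
    """Concatenate extracts, keeping the first record seen for each trial ID."""
    seen = set()
    combined = []
    for rows in extracts:
        for row in rows:
            trial_id = row[key].strip().upper()
            if trial_id in seen:
                continue  # same trial matched by more than one keyword
            seen.add(trial_id)
            combined.append(row)
    return combined

# Toy data standing in for the three keyword searches; the IDs are made up.
oncology = read_extract("NCT Number,Phases\nNCT00000001,Phase 2\nNCT00000002,Phase 3\n")
cancer = read_extract("NCT Number,Phases\nNCT00000002,Phase 3\nNCT00000003,Phase 1\n")
neoplasm = read_extract("NCT Number,Phases\nNCT00000003,Phase 1\nNCT00000004,\n")

trials = combine_extracts([oncology, cancer, neoplasm])
# Trials with a non-missing phase form the analysis subset.
with_phase = [t for t in trials if t["Phases"].strip()]
print(len(trials), len(with_phase))
```

Keeping the first occurrence is one possible rule; any rule works as long as each trial is counted once before the phase filter is applied.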
The information in the CTG database is required by federal law. Sponsors are not required to report in CTG whether a drug used in a clinical trial was ultimately approved or whether the drug development program was terminated. Nor does the CTG database require reporting of regulatory agreements such as expedited approval, accelerated approval, breakthrough designation, or other expedited regulatory agreements. The CTG database does not provide the specific regulatory definition of clinical trial endpoints as they will appear in the drug label approved by a regulatory authority. Definitions of oncology endpoints that may be used for drug approval appear in FDA’s Oncologic Drugs Advisory Committee (ODAC) guidelines. While preparing this manuscript, I discovered that there is a version of CTG, Aggregate Analysis of ClinicalTrials.gov (AACT), which includes variables that are collected for CTG but are not currently included in the download available from the website.
Sponsors report clinical trial endpoints, including primary, secondary, and other endpoints. Phase I and II oncology clinical trials are typically studies where patients are followed from time of informed consent to end of study for 30 days to six months. Patient tumor response is assessed using tumor evaluation criteria such as RECIST (Eisenhauer, 2009). The statistical methods used to analyze and report clinical trial results must follow ICH guidelines (ICH E9), but sponsors are not required to report the specific type of statistical method in CTG. Generally, these statistical methods are presented when clinical trial results are published in a clinical journal.
I did not include statistical tests to compare the databases in my analysis plan. Due to the non-random selection, differences between CTG and PDQ are expected, a situation often described statistically as a “foregone conclusion.” The databases overlap, and I did not undertake to process the PDQ XML files to determine the studies in common.
Two recent prominent publications address p-values: first, “Statistical Errors,” published in Nature (Nuzzo, 2014), and second, recommendations from a blue-ribbon panel of over 20 statistical experts organized by the American Statistical Association (ASA) (Wasserstein, 2016). The ASA is one of the largest professional organizations for statisticians. Nuzzo noted several concerns with p-values, among them “p-hacking,” which refers to performing multiple analyses and presenting only the most statistically significant result (i.e., the smallest p-value). Wasserstein et al. provided a list of six principles on p-values, the first of which is “P-values can indicate how incompatible the data are with a specified statistical model.” Note also that p-values themselves depend in part on sample size.
In this summary of CTG, sample sizes per category range from 1 to more than 13,000. With samples this large, a test may detect statistically significant differences between groups that are not meaningful; this diminishes the interpretability of p-values. Bishop (1975) suggested a correction for the effect of sample size, which involves dividing the value of the chi-square test statistic by the sample size. Smaller values of a chi-square statistic will tend not to reject the null hypothesis. A related problem for statistical analysis is the multiplicity of hypotheses that could be considered for these datasets. A commonly used correction is Bonferroni, where the reference level for significance is divided by the number of hypotheses to account for multiplicity. Comparisons between CTG and PDQ would need to account for any clinical trials that appear in both. I did not undertake the extensive processing of the XML data from PDQ to identify trials in both.
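To make the sample-size effect concrete, here is a minimal Python sketch with invented counts (not data from CTG or PDQ). It computes the Pearson chi-square statistic for a 2x2 table, divides it by the total sample size as described above (for a 2x2 table this ratio is the phi-squared measure), and applies a Bonferroni-adjusted significance level. The function names are illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

def size_adjusted(chi2, n):
    """Chi-square divided by the total sample size."""
    return chi2 / n

def bonferroni_alpha(alpha, n_hypotheses):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_hypotheses

# Same proportions (55% vs. 45%) at two very different sample sizes.
small = (55, 45, 45, 55)          # n = 200
large = (5500, 4500, 4500, 5500)  # n = 20,000

chi2_small = chi_square_2x2(*small)
chi2_large = chi_square_2x2(*large)

print(chi2_small, size_adjusted(chi2_small, sum(small)))
print(chi2_large, size_adjusted(chi2_large, sum(large)))
print(bonferroni_alpha(0.05, 10))
```

The chi-square statistic grows a hundredfold with the sample size (2.0 vs. 200.0) while the size-adjusted value stays at 0.01, which is why a significant result in a very large sample need not reflect a meaningful difference.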
Because the PDQ dataset is a non-randomly selected subset of CTG, there is no statistical model (synonymously, probability distribution) by which to prepare and justify p-values. Principle 6 of Wasserstein et al. states, “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.” Differences between PDQ and CTG are due to the non-random selection of the subset, and without a probability model associated with that subset, any p-values for PDQ would not be readily interpretable.
All programming for the summaries presented in this manuscript was completed using SAS V9, and graphs were prepared using R 2.3.2. Note that some clinical trials, reported in CTG as “Phase I/II” or “Phase II/III,” may represent recent statistical methodology advances called “seamless adaptive trials.” Unless specified otherwise, these were included in the summaries here under the higher phase.
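The rule that combined-phase labels are tabulated under the higher phase can be expressed in a few lines. The sketch below is in Python for illustration only; the label format ("Phase I/II") and the function name are assumptions, and the original summaries were programmed in SAS.

```python
# Rank of each phase numeral, used to pick the higher phase of a combined label.
PHASE_RANK = {"0": 0, "I": 1, "II": 2, "III": 3, "IV": 4}
RANK_TO_NUMERAL = {rank: numeral for numeral, rank in PHASE_RANK.items()}

def summary_phase(label):
    """Return the phase a trial is tabulated under, or None if phase is missing."""
    text = label.strip()
    if not text:
        return None  # missing phase; excluded from phase summaries
    numerals = text.replace("Phase", "").strip().split("/")
    ranks = [PHASE_RANK[n.strip()] for n in numerals]
    return "Phase " + RANK_TO_NUMERAL[max(ranks)]

for label in ["Phase I", "Phase I/II", "Phase II/III", ""]:
    print(repr(label), "->", summary_phase(label))
```

An unrecognized label raises a `KeyError` rather than being silently assigned to a phase, so unusual entries are surfaced for review.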
Relatively minor search and data quality issues, such as variations in spelling, missing indicators for multi-word search terms, and missing design features, were addressed. The CTG database permits the sponsor to enter free text in several search fields, and variations in the spelling of endpoints are not uncommon, such as “progressive free survival” and “progression free survival.” These discrepancies were reviewed, and the resulting datasets were combined after the correction of obvious misspellings. The CTG search engine does not include an indicator of the fields or keywords in the database matched by multi-word search terms, but this issue was addressed by using several variations on the search term. Sponsors occasionally failed to report some key study design features, such as the use of randomization and masking, and other trial characteristics; these are reported as missing values in the summaries presented here.
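A minimal Python sketch of this kind of free-text cleanup follows. The correction table is hypothetical apart from the one misspelling quoted above, and the normalization rules (lowercasing, treating hyphens and repeated spaces as a single space) are assumptions for illustration rather than the rules actually applied in SAS.

```python
import re

# Hypothetical correction table; only this one entry is taken from the text.
ENDPOINT_CORRECTIONS = {
    "progressive free survival": "progression free survival",
}

def normalize_endpoint(raw):
    """Lowercase, collapse hyphens/whitespace, then apply known spelling corrections."""
    text = re.sub(r"[-\s]+", " ", raw.strip().lower())
    return ENDPOINT_CORRECTIONS.get(text, text)

variants = [
    "Progression Free Survival",
    "progression-free  survival",
    "Progressive free survival",
]
print({normalize_endpoint(v) for v in variants})
```

All three variants collapse to a single value, so they are counted as one endpoint in a frequency table.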
The PDQ database is available by first contacting the NCI to request access. The database is made available for download in an XML format from a secure website for up to 90 days. Processing the data requires specialized software for reading the XML file and converting it into a format suitable for a statistical programming language, such as SAS. Variable fields in the XML were not readily ascertained, and considerable free-text field processing would be required to use the PDQ database. That was not undertaken in preparing this manuscript.
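For readers who do wish to attempt the conversion, the general approach of flattening XML records into a rectangular file can be sketched as below. This is not based on the actual PDQ schema: the element names (`Trials`, `Trial`, `Phase`, `Title`), the `id` attribute, and the sample document are all hypothetical placeholders, and the real field names would need to be determined from the PDQ files themselves.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real PDQ element names are not assumed here.
SAMPLE = """<Trials>
  <Trial id="T1"><Phase>Phase II</Phase><Title>Example A</Title></Trial>
  <Trial id="T2"><Title>Example B</Title></Trial>
</Trials>"""

def trials_to_rows(xml_text, fields=("Phase", "Title")):
    """Flatten each <Trial> element into a dict; absent fields become empty strings."""
    root = ET.fromstring(xml_text)
    rows = []
    for trial in root.iter("Trial"):
        row = {"id": trial.get("id", "")}
        for field in fields:
            row[field] = (trial.findtext(field) or "").strip()
        rows.append(row)
    return rows

def rows_to_csv(rows):
    """Serialize the flattened rows as CSV text readable by SAS, R, or similar tools."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = trials_to_rows(SAMPLE)
print(rows_to_csv(rows))
```

Missing elements are written as empty cells so that they appear as missing values after import, rather than causing the record to be dropped.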
The CTG database subset includes a free text field for disease “condition” for the disease population enrolling in the clinical trial. There are 5,291 unique values of condition in this dataset for oncology clinical trials not presented here. An oncology medical expert would be required to further process these conditions. No attempt was made to categorize these for this manuscript.
Table I describes criteria for comparing the CTG subset and PDQ oncology clinical trial characteristics based on data collected in the CTG and PDQ findings reported by Budish et al. All criteria considered here are based on data collected in the CTG database. Clinical trial characteristics such as “inclusion/exclusion” criteria are collected in CTG and are too detailed to summarize here.
Budish et al. “merge” the SEER database, which has over nine million records at the patient level, with the PDQ database, which describes characteristics of clinical trials. SEER patient-level data is coded using ICD codes. The documentation available on the SEER website does not describe “inclusion/exclusion” information. Budish et al. do not describe an algorithm for combining SEER data with PDQ that uses the ICD codes, disease type, and inclusion/exclusion criteria in CTG.
Table II demonstrates that 93.6% of oncology clinical trials are for development of a drug, and 6.4% are for a drug/biologic. Table III demonstrates that at least 20% of oncology clinical trials in the CTG database enroll patients with two disease conditions and, in some cases, as many as 10 conditions. For example, an early phase clinical trial may evaluate a treatment for any “solid tumors.” Budish et al. appear to assume one disease condition per clinical trial in the PDQ data they use. In CTG, after excluding trials with missing phase information, 81.1% of oncology clinical trials are Phase 0, I, and II, and 4.3% are Phase IV, consistent with considerable attrition in drug development prior to Phase III, as noted in Table IV. As would be expected, Phase III oncology clinical trials contain the largest number of patients, with a median size of 383 patients, and Phase I trials, which are typically devoted to determining appropriate doses, contain the smallest number of patients, with a median size of 32 patients. A caveat in Table IV is the unexpectedly large sample size (1,670) of some Phase I trials, where there is typically a small number of patients for selecting a dose or possibly a “first-in-human” study. Budish et al. do not report clinical trial phase information, and the two databases cannot be compared on this characteristic.
Figure II displays the longitudinal counts of clinical trials registered by year for the period from inception in 1999 to 2016 by trial Phase 0, I, II, III, and IV, demonstrating more early phase (Phase 0, I, II) trials versus fewer Phase III trials, as would be expected due to attrition of drugs in development. There is a marked increase in the number of reported clinical trials in the period approximately coinciding with 2004 - 2005. Review of the raw text data for the participation of “collaborators” indicates that often this includes contract research organizations (CROs) and other agencies such as the Veterans Administration (VA).
Table IV indicates that a relatively small percentage of studies combine Phase I and II (11%) or Phase II and III (1.2%). The number of Phase I/II and II/III trials is difficult to verify independently. The Tufts Center for the Study of Drug Development reports that one out of 20 clinical trials are adaptive (Getz, 2013), and a separate Tufts report indicates 475 trials in 2011 had a “20% adoption rate.” Table IV shows that 3.3% of clinical trials are Phase IV. Note that the FDA data standard manual uses the designation “Phase IV” for clinical trials that involve a post-marketing requirement, including clinical trials required to fulfill an expedited regulatory approval (FDA, 1995, 1996). In their manuscript, Budish et al. do not report specific clinical trials designated to fulfill accelerated approval and use a “hand count” of approvals for what they describe as a “surrogate endpoint.” It’s worth noting that when a sponsor fails to complete the required studies for accelerated or an otherwise expedited approval, the consequences may range from label restrictions to the complete withdrawal of the drug from the market, as was the case when the manufacturers of Luveris failed to complete a required clinical trial (Federal Register, 2016; Sasich, 2012).
As can be seen in Tables III and VI, respectively, approximately 23.1% to 36.2% of CTG Phase 0 through IV oncology clinical trials enrolled patients with one or more clinical conditions, and approximately 54.1% of Phase III oncology trials assessed patient survival. Based on a review of the data, it appears the remainder assess safety or other outcomes for ethics and the fulfillment of regulatory requirements. Table II presents the clinical trial status; 46.1% of studies in this extract are reported as “complete.” Table II also presents information about funding sources, the numerical distribution of collaborators, and whether an intervention is a drug or a biologic. As seen in Table III, 81.1% of oncology clinical trials are Phase II or earlier. The proportion of Phase II trials is 52.6%, while Phase III trials make up 14.3%. Sixty-three percent of CTG clinical trials described in Table II were reported as having been funded solely by industry; of the remainder, 31.26% were funded by industry in conjunction with other groups, such as NCI, the Department of Defense, and the VA. Funding by either industry or NIH alone is specifically reported in 0.7% of the oncology clinical trials. There is considerable variation among the open text responses in the “funding” field, and extensive text processing would be required to summarize funding sources more specifically.
Table II presents a summary of missing clinical trial endpoint data by clinical trial phase. When phase is reported, at least 93.1% of clinical trials report a clinical trial endpoint. Table II presents the counts (N) of completed clinical trials where the trial results were not reported, and in general, throughout every phase, more clinical trials have “no results available” than have “results available.” Table II presents other key design features of oncology clinical trials. More than 85% of Phase III trials are randomized (not presented here). Other key design features of clinical trials also include parallel, crossover, and factorial designs. Table II also aggregates information of each clinical trial’s cited primary purpose: 351 trials, or 90.1%, cite “treatment,” while only 2.3% cite “prevention.” Budish et al. report only a single drug for “prevention.” Table II also summarizes whether the clinical trials are ongoing or complete.
As shown in Table VI, 54.1% of Phase III clinical trials have a mortality endpoint, and 38.3% have a progression free survival (PFS) endpoint. Studies may have both endpoints. Of these clinical trials, 46.1% are reported as “complete.” Nineteen of these studies are reported as a drug approval. Table VI also shows that 43 clinical trials, or 0.35%, demonstrate progression-free survival (PFS) only where there is no survival endpoint and may have been for an accelerated approval.
The CTG database does not provide a method of “record linkage” that facilitates the association of specific Phase I, II, and III trials that are intended to be used as part of the approval and marketing authorization. As previously noted, this information would only be available from the trial sponsor. The website notes that there are penalties for failing to report clinical trials. The reporting of trial outcomes among oncology studies is very incomplete, and 90.8% of trials have no results available in CTG.
Clinical trial endpoints are also recorded in “free text” in the CTG and are not the precise wording used to describe an approvable oncology endpoint (FDA’s ODAC). Clinical trial publications in conjunction with the drug label would be required to provide the exact oncology endpoint wording and data collection and analysis details. The data generally appear complete, with the relatively minor proviso that only 96.2% of the trials that were analyzed included phase information (not tabulated).
Both Budish et al. and Hirsch et al. combine their databases with an external database to estimate long-term mortality rates for the cancers. Budish et al. combine PDQ with over eight million records in the SEER database. Hirsch et al. combine CTG with an unnamed external cancer database. Neither set of authors describes how the external database was reliably combined with the clinical trial database, or how a clinical trial that enrolls multiple cancer types was handled, and neither provides confidence intervals on the estimates. It is not unusual for clinical trials to enroll multiple tumor types. More detail is required to assess these combinations of databases. A further limitation on combining with an external database is that CTG does not include any protocol inclusion or exclusion information. In addition, SEER (used by Budish et al.) provides ICD9 coding for cancers, while CTG does not provide ICD9 codes. This level of detail is essential to assure that the combination was done correctly and to help others conduct similar research.
The differences in summaries of clinical trial characteristics from two databases, CTG and PDQ, are transparent and arise because PDQ is a non-randomly selected subset of clinical trials in CTG. Differences between databases are not unexpected. The CTG database used here includes clinical trials submitted during 17 years from 1999 (CTG inception) until 2016. Budish reports that PDQ includes clinical trials for 38 years, from 1973 to 2011, a difference of about 21 years. Due to the federal requirement for reporting, CTG is the more current, comprehensive, and robust of the two databases. CTG provides 14,476 oncology clinical trials with non-missing trial phases (15,049 with missing phases included) for the 17-year period 1999-2016, a mean of approximately 852 clinical trials per year vs. a mean of about 945 for PDQ for 38 years. The number of trials per year is standardized by the years of coverage in the database.
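The CTG per-year figures quoted above follow from simple division, reproduced here in Python as a check on the arithmetic. The counts and years are those stated in the text; the function name is illustrative. The PDQ figure of 945 is taken as reported by Budish et al. and is not recomputed here.

```python
def mean_per_year(n_trials, first_year, last_year):
    """Average number of registered trials per year of database coverage."""
    return n_trials / (last_year - first_year)

ctg_all = mean_per_year(15049, 1999, 2016)         # all trials in the subset
ctg_with_phase = mean_per_year(14476, 1999, 2016)  # non-missing phase only
phase_reported = 14476 / 15049                     # share with phase information

print(round(ctg_all), round(ctg_with_phase), round(100 * phase_reported, 1))
# prints: 885 852 96.2
```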
There is an additional methodologic limitation of using the SEER database that is not addressed in Budish et al. SEER provides specialized statistical methods for survival estimates. These estimates also account for systematic differences in registry data: SEER is a collection of individual cancer registries whose follow-up periods vary systematically according to when each registry was included and began reporting data to SEER. SEER estimates are also adjusted for reporting delays (Howlader, 2016).
Budish et al. do not report either trial phase or the number of Phase III clinical trials in PDQ. CTG provides a classification of each clinical trial by Phase I, II, III and, in some instances, nomenclature such as Phase I/II and II/III, which may reflect adaptive or seamless clinical trials. CTG includes 2,070 Phase III oncology clinical trials, potentially 1,035 drug approvals based on the regulatory standard of two adequate and well-controlled Phase III trials for an approval and marketing authorization.
Budish et al. incorporate a secondary database of 71 drug approvals between 1990 and 2002. This corresponds to 142 adequate and well-controlled randomized double blind Phase III trials, with the regulatory requirement of two adequate and well-controlled Phase III clinical trials for drug approval. The regulatory requirement of two adequate well-controlled randomized double blind Phase III trials for approval would mean that 16,858 (17,000 – 142) uncategorized clinical trials could be Phase I, Phase II, or earlier phases; failed Phase III; post-approval (post-Phase III); or other trials. Thus, approximately 99% of clinical trials in the PDQ database appear to be either clinical trial phases prior to Phase III, failed Phase III trials, or other trials. This is markedly different than the information contained in the CTG database, which indicates that approximately 80% of clinical trials are Phase II or earlier.
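The arithmetic behind these figures can be laid out explicitly. The short Python sketch below uses only numbers stated in the text (71 approvals, roughly 17,000 PDQ trials, 2,070 CTG Phase III trials) and the two-trials-per-approval standard; the variable names are illustrative.

```python
TRIALS_PER_APPROVAL = 2  # two adequate and well-controlled Phase III trials

# PDQ side: approvals in the secondary database used by Budish et al.
approvals = 71
pdq_total = 17000
phase3_for_approvals = approvals * TRIALS_PER_APPROVAL
uncategorized = pdq_total - phase3_for_approvals
uncategorized_share = uncategorized / pdq_total

# CTG side: upper bound on approvals implied by the Phase III count.
ctg_phase3 = 2070
max_ctg_approvals = ctg_phase3 // TRIALS_PER_APPROVAL

print(phase3_for_approvals, uncategorized, round(100 * uncategorized_share, 1))
# prints: 142 16858 99.2
print(max_ctg_approvals)
# prints: 1035
```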
The CTG database reports that throughout the 17-year period, 19 drugs were approved “for marketing,” an approval rate of 1.1 per year (19/17). This number is likely an underestimate because reporting the status of a drug’s approval is not a mandatory requirement in the CTG database, although that information may be provided by the sponsor entering data. Budish et al. use a secondary database that suggests 71 drug approvals for the 38-year period, approximately 1.9 approvals per year (71/38).
Budish et al. state they “hand-collect” clinical trial “surrogate” endpoints, but the specific endpoints and specific drugs and trials are undefined. They note that clinical trials may be approved based on a “surrogate” endpoint. Budish et al. do not define “surrogate,” although the FDA has a guidance on the definitions. This analysis attempted to replicate the count of surrogates cited by Budish et al. and found only 43 Phase III trials in the CTG database, out of 2,070 Phase III studies, with a primary endpoint of progression-free survival and without a survival endpoint, possibly indicating a trial intended for approval on a progression-free survival endpoint.
Budish et al. report “fewer than 500” clinical trials in their database for “prevention,” and do not provide the regulatory definition for approving a drug for “prevention.” The CTG database includes 351 clinical trials with an endpoint that includes the keyword “prevention,” and the database includes both primary and secondary clinical trial endpoints. Budish et al. also describe exactly one drug for “prevention” and do not report the drug’s name, sponsor, and approved endpoints, nor do they make any reference to the drug label or other pertinent clinical trial information such as the sample size requirements for that study.
Budish et al. do not report whether PDQ reports clinical trial endpoints and do not provide the approved regulatory statements of the endpoint as it appears in the drug label. CTG demonstrates that most clinical trials are Phase II or earlier due to the attrition of drugs in development. Generally, Phase 0, I, and II clinical trials do not result in efficacy claims in an approved label and marketing authorization. Information on trial phase is critical because efficacy labeling information arises from Phase III and from combined (seamless) Phase II/III trials. All clinical trials in which patients receive study treatment, except possibly for Phase 0, provide drug exposure and patient safety information.
I compared summaries of characteristics of oncology trials from two databases, clinicaltrials.gov and PDQ using multiple criteria, including disease condition, number of trials, trial phase, types of cancer eligible for the trial, funding sources, collaborators, trial size, trial endpoints, missing data, and miscellaneous characteristics. Not unexpectedly, the differences between the two databases are marked because PDQ is a non-random subset of clinical trials in CTG.
The CTG database is widely regarded as the gold standard and an unbiased source for the reporting of clinical trial characteristics. CTG is an extremely comprehensive database because its reporting requirements were approved by Congress and reporting is mandated by federal law. The implications of using PDQ rather than CTG data on the economic model and its policy implications by Budish are outside the scope of this paper. The PDQ database should not be used for policy decisions for clinical trials conducted by the pharmaceutical industry. The CTG database and access is available in the public domain. By federal law, CTG is available for download by anyone with internet access. Clinicaltrials.gov is the preferred database for analysis and reporting of characteristics of oncology clinical trials.
Chris Barker, PhD, is an Independent Statistical Consultant.