Papers and patents are becoming less disruptive over time

WoS data

We limit our focus to research papers published between 1945 and 2010. Although the WoS data begin in the year 1900, the scale and social organization of science shifted markedly in the post-war era, thereby making comparisons with the present difficult and potentially misleading67,68,69. We end our analyses of papers in 2010 because some of our measures require several subsequent years of data following paper publication. The WoS data archive 65 million documents published in 28,968 journals between 1900 and 2017 and 735 million citations among them. In addition, the WoS data include the titles and the full text of abstracts for 65 and 29 million records, respectively, published between 1913 and 2017. After eliminating non-research documents (for example, book reviews and commentaries) and subsetting the data to the 1945–2010 window, the analytical sample consists of n = 24,659,076 papers.

Patents View data

We limit our focus to patents granted from 1976, which is the earliest year for which machine-readable records are available in the Patents View data. As we did with papers, we end our analyses in 2010 because some measures require data from subsequent years for calculation. The Patents View data are the most exhaustive source of historical data on inventions, with information on 6.5 million patents granted between 1976 and 2017 and their corresponding 92 million citations. The Patents View data include the titles and abstracts for 6.5 million patents granted between 1976 and 2017. Following previous work12, we focused our attention on utility patents, which cover the vast majority (91% in our data) of patented inventions. After eliminating non-utility patents and subsetting the data to the 1976–2010 window, the analytical sample consists of n = 3,912,353 patents.

Highly disruptive papers and patents

Observations (and claims) of slowing progress in science and technology are increasingly common, supported not only by the evidence we report, but also by previous research from diverse methodological and disciplinary perspectives10,11,18,19,20,21,22,23,24. Yet as noted in the main text, there is a tension between observations of slowing progress from aggregate data on the one hand, and continuing reports of seemingly major breakthroughs in many fields of science and technology (spanning everything from the measurement of gravitational waves to the sequencing of the human genome) on the other. In an effort to reconcile this tension, we considered the possibility that whereas overall, discovery and invention may be less disruptive over time, the high-level view taken in previous work may mask considerable heterogeneity. Put differently, aggregate evidence of slowing progress does not preclude the possibility that some subset of discoveries and inventions is highly disruptive.

To evaluate this possibility, we plot the number of disruptive papers (Fig. 4a) and patents (Fig. 4b) over time, where disruptive papers and patents are defined as those with CD5 values >0. Within each panel, we plot four lines, corresponding to four evenly spaced intervals—(0, 0.25], (0.25, 0.5], (0.5, 0.75], (0.75, 1.00]—over the positive values of CD5. The first two intervals therefore correspond to papers and patents that are relatively weakly disruptive, whereas the latter two correspond to those that are more strongly so (for example, where we may expect to see major breakthroughs such as some of those mentioned above). Despite major increases in the numbers of papers and patents published each year, we see little change in the number of highly disruptive papers and patents, as evidenced by the relatively flat red, green and orange lines. Notably, this ‘conservation’ of disruptive work holds even despite fluctuations over time in the composition of the scientific and technological fields responsible for producing the most disruptive work (Fig. 4, inset plots). Overall, these results help to account for simultaneous observations of both major breakthroughs in many fields of science and technology and aggregate evidence of slowing progress.
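As an illustration only (this is not the authors' code), the binning described above can be sketched in Python. The function names and the `(year, cd5)` record layout are hypothetical; the four right-closed intervals are taken from the text.

```python
from collections import Counter

# Right-closed intervals over the positive CD5 values, as listed in the text.
BINS = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]


def cd_bin(cd5):
    """Return the index (0-3) of the interval containing cd5, or None if cd5 <= 0."""
    for idx, (lo, hi) in enumerate(BINS):
        if lo < cd5 <= hi:
            return idx
    return None


def count_disruptive_by_year(records):
    """Count disruptive works per (year, interval index).

    records: iterable of (year, cd5) pairs (a hypothetical input layout).
    Works with cd5 <= 0 are not counted, because they are not 'disruptive'
    under the CD5 > 0 definition used in the text.
    """
    counts = Counter()
    for year, cd5 in records:
        idx = cd_bin(cd5)
        if idx is not None:
            counts[(year, idx)] += 1
    return counts
```

Plotting one line per interval index against year would then give a figure of the kind described, with the last two intervals corresponding to the most strongly disruptive work.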

Relative contribution of field, year and author or inventor effects

Our results show a steady decline in the disruptiveness of science and technology over time. Moreover, the patterns we observe are generally similar across broad fields of study, which suggests that the factors driving the decline are not unique to specific domains of science and technology. The decline could be driven by other factors, such as the conditions of science and technology at a point in time or the particular individuals who produce science and technology. For example, exogenous factors such as economic conditions may encourage research or invention practices that are less disruptive. Similarly, scientists and inventors of different generations may have different approaches, which may result in greater or lesser tendencies for producing disruptive work. We therefore sought to understand the relative contribution of field, year and author (or inventor) factors to the decline of disruptive science and technology.

To do so, we decomposed the relative contribution of field, year and author fixed effects to the predictive power of regression models of the CD index. The unit of observation in these regressions is the author (or inventor) × year. We enter field fixed effects using granular subfield indicators (that is, 150 WoS subject areas for papers, 138 NBER subcategories for patents). For simplicity, we did not include additional covariates beyond the fixed effects in our models. Field fixed effects capture all field-specific factors that do not vary by author or year (for example, the basic subject matter); year fixed effects capture all year-specific factors that do not vary by field or author (for example, the state of communication technology); author (or inventor) fixed effects capture all author-specific factors that do not vary by field or year (for example, the year of PhD awarding). After specifying our model, we determine the relative contribution of field, year and author fixed effects to the overall model adjusted \(R^2\) using Shapley–Owen decomposition. Specifically, given our n = 3 groups of fixed effects (field, year and author), we evaluate the relative contribution of each set of fixed effects by estimating the adjusted \(R^2\) separately for the \(2^n\) models using subsets of the predictors. The relative contribution of each set of fixed effects is then computed using the Shapley value from game theory70.
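The decomposition step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it treats each group of fixed effects as one 'player' and assumes the adjusted R² of every subset model has already been estimated. The values in `toy_r2` are hypothetical numbers chosen for the example, not results from the study.

```python
from itertools import combinations
from math import factorial


def shapley_contributions(groups, value):
    """Shapley value of each predictor group.

    groups: list of group names.
    value:  dict mapping frozenset of group names -> adjusted R^2 of the model
            fitted with exactly those groups (frozenset() -> 0.0 for no predictors).
    """
    n = len(groups)
    contributions = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        for k in range(n):  # size of the coalition that g joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain in adjusted R^2 from adding group g to subset s.
                total += weight * (value[s | {g}] - value[s])
        contributions[g] = total
    return contributions


GROUPS = ["field", "year", "author"]

# Hypothetical adjusted R^2 values for the 2^3 = 8 subset models.
toy_r2 = {
    frozenset(): 0.0,
    frozenset({"field"}): 0.03,
    frozenset({"year"}): 0.05,
    frozenset({"author"}): 0.20,
    frozenset({"field", "year"}): 0.07,
    frozenset({"field", "author"}): 0.21,
    frozenset({"year", "author"}): 0.23,
    frozenset({"field", "year", "author"}): 0.24,
}

contrib = shapley_contributions(GROUPS, toy_r2)
```

By construction the contributions sum to the adjusted R² of the full model, which is why they can be read as segments of a single bar.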

Results of this analysis are shown in Extended Data Fig. 5, for both papers (top bar) and patents (bottom bar). Total bar size corresponds to the value of the adjusted \(R^2\) for the fully specified model (that is, with all three groups of fixed effects). Consistent with our observations from plots of the CD index over time, we observe that for both papers and patents, field-specific factors make the lowest relative contribution to the adjusted \(R^2\) (0.02 and 0.01 for papers and patents, respectively). Author fixed effects, by contrast, appear to contribute much more to the predictive power of the model, for both papers (0.20) and patents (0.17). Researchers and inventors who entered the field in more recent years may face a higher burden of knowledge and thus resort to building on narrower slices of existing work (for example, because of more specialized doctoral training), which would generally lead to less disruptive science and technology being produced in later years, consistent with our findings. The pattern is more complex for year fixed effects; although year-specific factors that do not vary by field or author hold more explanatory power than field for both papers (0.02) and patents (0.16), they appear to be substantially more important for the latter than the former. Taken together, these findings suggest that relatively stable factors that vary across individual scientists and inventors may be particularly important for understanding changes in disruptiveness over time. The results also confirm that domain-specific factors across fields of science and technology play a very small role in explaining the decline in disruptiveness of papers and patents.

Alternative samples

We also considered whether the patterns we document may be artefacts of our choice of data sources. Although we observe consistent trends in both the WoS and Patents View data, and both databases are widely used by the Science of Science community, our results may conceivably be driven by factors such as changes in coverage (for example, journals added to or excluded from WoS over time) or even data errors rather than fundamental changes in science and technology. To evaluate this possibility, we therefore calculated CD5 for papers in four additional databases: JSTOR, the American Physical Society corpus, Microsoft Academic Graph and PubMed. We included all records from 1930 to 2010 from PubMed (16,774,282 papers), JSTOR (1,703,353 papers) and American Physical Society (478,373 papers). The JSTOR data were obtained via a special request from ITHAKA, the data maintainer (http://www.ithaka.org), as were the American Physical Society data (https://journals.aps.org/datasets). We downloaded the Microsoft Academic Graph data from CADRE at Indiana University (https://cadre.iu.edu/). The PubMed data were downloaded from the National Library of Medicine FTP server (ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline). Owing to the exceptionally large scale of Microsoft Academic Graph and the associated computational burden, we randomly extracted 1 million papers. As shown in Extended Data Fig. 6, the downward trend in disruptiveness is evident across all samples.

Alternative bibliometric measures

Several recent papers have introduced alternative specifications of the CD index12. We evaluated whether the declines in disruptiveness we observe are corroborated using two alternative variations. One criticism of the CD index has been that the number of papers that cite only the focal paper’s references dominates the measure13. Bornmann et al.13 propose \(\mathrm{DI}_{l}^{\mathrm{nok}}\) as a variant that is less susceptible to this issue. Another potential weakness of the CD index is that it could be very sensitive to small changes in the forward citation patterns of papers that make no backward citations15. Leydesdorff et al.15 suggest DI* as an alternate indicator of disruption that addresses this issue. Therefore, we calculated \(\mathrm{DI}_{l}^{\mathrm{nok}}\) (with l = 5) and DI* for 100,000 papers and 100,000 patents randomly drawn from our analytic sample. Results are presented in Extended Data Fig. 7a (papers) and b (patents). The blue lines indicate disruption based on Bornmann et al.13 and the orange lines indicate disruption based on Leydesdorff et al.15. Across science and technology, the two alternative measures both show declines in disruption over time, similar to the patterns observed with the CD index. Taken together, these results suggest that the declines in disruption we document are not an artefact of our particular operationalization.

Robustness to changes in publication, citation and authorship practices

We also considered whether our results may be attributable to changes in publication, citation or authorship practices, rather than to substantive shifts in discovery and invention. Perhaps most critically, as noted in the main text, there has been a marked expansion in publishing and patenting over the period of our study. This expansion has naturally increased the amount of previous work that is relevant to current science and technology and therefore at risk of being cited, a pattern reflected in the marked increase in the average number of citations made by papers and patents (that is, papers and patents are citing more previous work than in previous eras)44,45. Recall that the CD index quantifies the degree to which future work cites a focal work together with its predecessors (that is, the references in the bibliography of the focal work). Greater citation of a focal work independently of its predecessors is taken to be evidence of a social process of disruption. As papers and patents cite more previous work, however, the probability of a focal work being cited independently of its predecessors may decline mechanically; the more citations a focal work makes, the more likely future work is to cite it together with one of its predecessors, even by chance. Consequently, increases in the number of papers and patents available for citing and in the average number of citations made by scientists and inventors may contribute to the declining values of the CD index. In short, given the marked changes in science and technology over our long study window, the CD index of papers and patents published in earlier periods may not be directly comparable to those of more recent vintage, which could in turn render our conclusions about the decline in disruptive science and technology suspect. We addressed these concerns using three distinctive but complementary approaches: normalization, regression adjustment and simulation.
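To make the mechanical concern concrete, the sketch below computes a CD-style index for one focal work from a toy citation network. It assumes the commonly used count-based form of the index, (n_i − n_j) / (n_i + n_j + n_k), which is not spelled out in this section. Here n_i counts later works citing only the focal work, n_j those citing the focal work together with at least one of its references, and n_k those citing only its references. The five-year window boundaries, the function name and the toy data are all assumptions made for illustration.

```python
def cd_index(focal, cites, year, window=5):
    """CD-style index for `focal`, or None if no later work qualifies.

    cites: dict mapping each work to the set of works it cites.
    year:  dict mapping each work to its publication year.
    Only works published after the focal work and within `window` years
    are considered (an assumed reading of the '5' in CD5).
    """
    refs = cites.get(focal, set())
    n_i = n_j = n_k = 0
    for work, cited in cites.items():
        if work == focal:
            continue
        if not (year[focal] < year[work] <= year[focal] + window):
            continue
        cites_focal = focal in cited
        cites_refs = bool(cited & refs)
        if cites_focal and not cites_refs:
            n_i += 1  # cites the focal work independently of its predecessors
        elif cites_focal and cites_refs:
            n_j += 1  # cites the focal work together with a predecessor
        elif cites_refs:
            n_k += 1  # cites only the predecessors
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else None


# Hypothetical toy network: F is the focal work, R1 and R2 are its references.
toy_cites = {
    "F": {"R1", "R2"},
    "A": {"F"},
    "B": {"F", "R1"},
    "C": {"R2"},
    "D": {"F"},  # published outside the five-year window, so ignored
    "E": {"F"},
}
toy_year = {"R1": 1990, "R2": 1995, "F": 2000, "A": 2001,
            "B": 2002, "C": 2003, "D": 2010, "E": 2004}

toy_cd = cd_index("F", toy_cites, toy_year)
```

In this form, a focal work with a longer reference list gives later works more chances to fall into the n_j or n_k categories, which is the mechanical effect the paragraph above describes.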

Verification using normalization

First, following common practice in bibliometric research39,40,41,42,43, we developed two normalized versions of the CD index, with the goal of facilitating comparisons across time. Among the various components of the CD index, we focused our attention on the count of papers or patents that only cite the focal work’s references (\(N_k\)), as this term would seem most likely to scale with the increases in publishing and patenting and in the average number of citations made by papers and patents to previous work13. Larger values of \(N_k\) lead to smaller values of the CD index. Consequently, marked increases in \(N_k\) over time, particularly relative to other components of the measure, may lead to a downward bias, thereby inhibiting our ability to accurately compare disruptive science and technology in later years with earlier periods.

Our two normalized versions of the CD index aim to address this potential bias by attenuating the effect of increases in \(N_k\). In the first version, which we call ‘Paper normalized’, we subtract from \(N_k\) the number of citations made by the focal paper or patent to previous work (\(N_b\)). The intuition behind this adjustment is that when a focal paper or patent cites more previous work, \(N_k\) is likely to be larger because there are more opportunities for future work to cite the focal paper or patent’s predecessors. This increase in \(N_k\) would result in lower values of the CD index, although not necessarily as a result of the focal paper or patent being less disruptive. In the second version, which we call ‘field × year normalized’, we subtract from \(N_k\) the average number of backward citations made by papers or patents in the focal paper or patent’s WoS research area or NBER technology category, respectively, during its year of publication (we label this quantity \(N_b^{\mathrm{mean}}\)). The intuition behind this adjustment is that in fields and time periods in which there is a greater tendency for scientists and inventors to cite previous work, \(N_k\) is also likely to be larger, thereby leading to lower values of the CD index, although again not necessarily as a result of the focal paper or patent being less disruptive. In cases in which either \(N_b\) or \(N_b^{\mathrm{mean}}\) exceeds the value of \(N_k\), we set \(N_k\) to 0 (that is, \(N_k\) is never negative in the normalized measures). Both adaptations of the CD index are inspired by established approaches in the scientometrics literature, and may be understood as a form of ‘citing side normalization’ (that is, normalization by correcting for the effect of differences in lengths of references lists)40.
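A minimal sketch of the two adjustments follows. It assumes the count-based form of the CD index, (n_i − n_j) / (n_i + n_j + n_k), where n_i and n_j (names introduced here, not in the text) count later works citing the focal work without and with its references, respectively. The function names are hypothetical.

```python
def cd_from_counts(n_i, n_j, n_k):
    """Count-based CD index (assumed form); None when there are no citing works."""
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else None


def normalized_cd(n_i, n_j, n_k, adjustment):
    """CD index after subtracting `adjustment` from n_k, floored at zero.

    adjustment: the focal work's own number of backward citations for the
                'Paper normalized' variant, or the field x year mean number
                of backward citations for the 'field x year normalized' one.
    """
    n_k_adjusted = max(n_k - adjustment, 0)  # n_k is never negative
    return cd_from_counts(n_i, n_j, n_k_adjusted)
```

For example, with n_i = 2, n_j = 1 and n_k = 5, the unadjusted value is 1/8; subtracting an adjustment of 3 leaves n_k = 2 and raises the value to 1/5, and any adjustment of 5 or more yields 1/3.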

In Extended Data Fig. 8, we plot the average values of both normalized versions of the CD index over time, separately for papers (Extended Data Fig. 8a) and patents (Extended Data Fig. 8d). Consistent with our findings reported in the main text, we continue to observe a decline in the CD index over time, suggesting that the patterns we observe in disruptive science and technology are unlikely to be driven by changes in citation practices.

Verification using regression adjustment

Second, we adjusted for potential confounding using a regression-based approach. This approach complements the bibliometric normalizations just described by allowing us to account for a broader array of changes in publication, citation and authorship practices in general (the latter of which is not directly accounted for in either the normalization approach or the simulation approach described next), and for increases in the amount of previous work that is relevant to current science and technology in particular. In Supplementary Table 1, we report the results of regression models predicting CD5 for papers (Models 1–4) and patents (Models 5–8), with indicator variables included for each year of our study window (the reference categories are 1945 and 1980 for papers and patents, respectively). Models 1 and 5 are the baseline models, and include no other adjustments beyond the year indicators. In Models 2 and 6, we add subfield fixed effects (WoS subject areas for papers and NBER technology subcategories for patents). Finally, in Models 3–4 and 7–8, we add control variables for several field × year-level characteristics (number of new papers or patents, mean number of papers or patents cited, mean number of authors or inventors per paper) and paper- or patent-level characteristics (number of papers or patents cited), thereby enabling more robust comparisons in patterns of disruptive science and technology over the long time period spanned by our study. For the paper models, we also include a paper-level control for the number of unlinked references (that is, the number of citations to works that are not indexed in WoS). We find that the inclusion of these controls improves model fit, as indicated by statistically significant Wald tests presented below the relevant models.

Across all eight models shown in Supplementary Table 1, we find that the coefficients on the year indicators are statistically significant and negative, and growing in magnitude over time, which is consistent with the patterns we reported based on unadjusted CD5 values in the main text (Fig. 2). In Extended Data Fig. 8, we visualize the results of our regression-based approach by plotting the predicted CD5 values separately for each of the year indicators included in Models 4 (papers) and 8 (patents). To enable comparisons with raw CD5 values shown in the main text, we present the separate predictions made for each year as a line graph. As shown in the figure, we continue to observe declining values of the CD index across papers and patents, even when accounting for changes in publication, citation and authorship practices.

Verification using simulation

Third, following related work in the Science of Science14,71,72,73, we considered whether our results may be an artefact of changing patterns in publishing and citation practices by using a simulation approach. In essence, the CD index measures disruption by characterizing the network of citations around a focal paper or patent. However, many complex networks, even those resulting from random processes, exhibit structures that yield non-trivial values on common network measures (for example, clustering)74,75,76. During the period spanned by our study, the citation networks of science and technology experienced significant change, with marked increases in both the numbers of nodes (that is, papers or patents) and edges (that is, citations). Thus, rather than reflecting a meaningful social process, the observed declines in disruption may result from these structural changes in the underlying citation networks.

To evaluate this possibility, we followed standard techniques from network science75,77 and conducted an analysis in which we recomputed the CD index on randomly rewired citation networks. If the patterns we observe in the CD index are the result of structural changes in the citation networks of science and technology (for example, growth in the number of nodes or edges) rather than a meaningful social process, then these patterns should also be visible in comparable random networks that experience similar structural changes. Therefore, finding that the patterns we see in the CD index differ for the observed and random citation networks would serve as evidence that the decline in disruption is not an artefact of the data.

We began by creating copies of the underlying citation network on which the values of the CD index used in all analyses reported in the main text were based, separately for papers and patents. For each citation network (one for papers, one for patents), we then rewired citations using a degree-preserving randomization algorithm. In each iteration of the algorithm, two edges (for example, A–B and C–D) are selected from the underlying citation network, after which the algorithm attempts to swap the two endpoints of the edges (for example, A–B becomes A–D, and C–D becomes C–B). If the degree centrality of A, B, C and D remains the same after the swap, the swap is retained; otherwise, the algorithm discards the swap and moves on to the next iteration. When evaluating degree centrality, we consider ‘in-degree’ (that is, citations from other papers or patents to the focal paper or patent) and ‘out-degree’ (that is, citations from the focal paper or patent to other papers or patents) separately. Furthermore, we also required that the age distribution of citing and cited papers or patents was identical in the original and rewired networks. Specifically, swaps were only retained when the publication year of the original and candidate citations was the same. In light of these design choices, our rewiring algorithm should be seen as fairly conservative, as it preserves substantial structure from the original network. There is no scholarly consensus on the number of swaps necessary to ensure the original and rewired networks are sufficiently different from one another; the rule we adopt here is 100 × m, where m is the number of edges in the network being rewired.
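The rewiring procedure can be sketched as below. This is an illustrative reading of the description above, not the authors' code: edges are (citing, cited) pairs, a swap exchanges the cited endpoints of two edges, and a swap is kept only if the two cited works share a publication year (one way to keep the age distribution of citations unchanged) and it creates no duplicate citation or self-citation. The function name, the toy data and the use of a fixed random seed are assumptions.

```python
import random


def rewire(edges, year, n_swaps, seed=0):
    """Degree- and year-preserving rewiring of a directed citation network.

    edges:   list of unique (citing, cited) pairs.
    year:    dict mapping each work to its publication year.
    n_swaps: number of swap attempts (the text uses 100 x the number of edges).
    """
    rng = random.Random(seed)
    edges = list(edges)
    if len(edges) < 2:
        return edges
    edge_set = set(edges)
    for _ in range(n_swaps):
        i = rng.randrange(len(edges))
        j = rng.randrange(len(edges))
        if i == j:
            continue
        (a, b), (c, d) = edges[i], edges[j]
        # Candidate swap: A->B, C->D becomes A->D, C->B.
        if a == d or c == b:
            continue  # would create a self-citation
        if year[b] != year[d]:
            continue  # would change the age of the cited work
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # would duplicate an existing citation, changing degrees
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


# Hypothetical toy network: q-works cite p-works.
toy_year = {"p1": 1990, "p2": 1990, "p3": 1990,
            "q1": 1995, "q2": 1995, "q3": 1996}
toy_edges = [("q1", "p1"), ("q2", "p2"), ("q3", "p3"),
             ("q1", "p2"), ("q3", "p1")]

rewired = rewire(toy_edges, toy_year, n_swaps=100 * len(toy_edges))
```

Because every accepted swap only exchanges cited endpoints between two existing edges, each work keeps its original in-degree and out-degree, and each citing work keeps the same set of cited publication years.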

Following previous work14, we created ten rewired copies of the observed citation networks for both papers and patents. After creating these rewired citation networks, we then recomputed CD5. Owing to the large scale of the WoS data, we base our analyses on a random subsample of ten million papers; CD5 was computed on the rewired network for all patents. For each paper and patent, we then compute a z score that compares the observed CD5 value to those of the same paper or patent in the ten rewired citation networks. Positive z scores indicate that the observed CD5 value is greater (that is, more disruptive) than would be expected by chance; negative z scores indicate that the observed values are lesser (that is, more consolidating).
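The z score for a single paper or patent can be sketched as follows. The text does not say whether the sample or population standard deviation was used; the sample standard deviation (`statistics.stdev`) is an assumption here, as is returning `None` when all rewired values coincide.

```python
from statistics import mean, stdev


def z_score(observed, rewired_values):
    """Compare an observed CD5 value with the values obtained for the same
    work in the rewired networks (ten of them in the text)."""
    mu = mean(rewired_values)
    sigma = stdev(rewired_values)  # sample standard deviation (assumed)
    if sigma == 0:
        return None  # undefined when the rewired values do not vary
    return (observed - mu) / sigma
```

Averaging these z scores over all works published in a given year gives the kind of yearly series that the comparison with rewired networks relies on.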

The results of these analyses are shown in Extended Data Fig. 8, separately for papers (Extended Data Fig. 8c) and patents (Extended Data Fig. 8f). Lines correspond to the average z score among papers or patents published in the focal year. The plots reveal a pattern of change in the CD index over and beyond that ‘baked in’ to the changing structure of the network. We find that on average, papers and patents tend to be less disruptive than would be expected by chance, and moreover, the gap between the observed CD index values and those from the randomly rewired networks is increasing over time, which is consistent with our findings of a decline in disruptive science and technology.

Taken together, the results of the foregoing analyses suggest that although there have been marked changes in science and technology over the course of our long study window, particularly with respect to publication, citation and authorship practices, the decline in disruptive science and technology that we document using the CD index is unlikely to be an artefact of these changes, and instead represents a substantive shift in the nature of discovery and invention.

Regression analysis

We evaluate the relationship between disruptiveness and the use of previous knowledge using regression models, predicting CD5 for individual papers and patents, based on three indicators of previous knowledge use: the diversity of work cited, mean number of self-citations and mean age of work cited. The diversity of work cited is measured at the field × year level; all other variables included in the regressions are defined at the level of the paper or patent. To account for potential confounding factors, our models included year and field fixed effects. Year fixed effects account for time-variant factors that affect all observations (papers or patents) equally (for example, global economic trends). Field fixed effects account for field-specific factors that do not change over time (for example, some fields may intrinsically value disruptive work over consolidating work). In contrast to our descriptive plots, for our regression models, we adjust for field effects using the more granular 150 WoS ‘extended subjects’ (for example, ‘biochemistry and molecular biology’, ‘biophysics’, ‘biotechnology and applied microbiology’, ‘cell biology’, ‘developmental biology’, ‘evolutionary biology’ and ‘microbiology’ are extended subjects within the life sciences and biomedicine research area) and 38 NBER technology subcategories (for example, ‘agriculture, food, textile’; ‘coating’; ‘gas’; ‘organic’; and ‘resins’ are subcategories within the chemistry technology category).

In addition, we also include controls for the ‘mean age of team members’ (that is, ‘career age’, defined as the difference between the publication year of the focal paper or patent and the first year in which each author or inventor published a paper or patent) and the ‘mean number of previous works produced by team members’. Although increases in rates of self-citations may indicate that scientists and inventors are becoming more narrowly focused on their own work, these rates may also be driven in part by the amount of previous work available for self-citing. Similarly, although increases in the age of work cited in papers and patents may indicate that scientists and inventors are struggling to keep up, they may also be driven by the rapidly aging workforce in science and technology78,79. For example, older scientists and inventors may be more familiar with or more attentive to older work, or may actively resist change80. These control variables help to account for these alternative explanations.

Supplementary Table 3 shows summary statistics for variables used in the ordinary-least-squares regression models. The diversity of work cited is measured by normalized entropy, which ranges from 0 to 1. Greater values on this measure indicate a more uniform distribution of citations to a wider range of existing work; lower values indicate a more concentrated distribution of citations to a smaller range of existing work. The summary statistics show that normalized entropy in a given field and year is nearly maximal, averaging 0.98 for both science and technology. About 16% of papers cited in a paper are by an author of the focal paper; the corresponding number for patents is about 7%. Papers tend to rely on older work and work that varies more greatly in age (measured by standard deviation) than patents. In addition, the average CD5 of a paper is 0.04 whereas the average CD5 of a patent is 0.12, meaning that the average paper tends to be less disruptive than the average patent.
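Normalized entropy can be sketched as Shannon entropy divided by its maximum possible value. The choice of normalizer (the logarithm of the number of distinct cited works) and the input layout (one identifier per citation made in a field and year) are assumptions for illustration; this section does not give the exact formula.

```python
from collections import Counter
from math import log


def normalized_entropy(cited_ids):
    """Shannon entropy of the citation distribution, scaled to [0, 1].

    cited_ids: iterable with one entry per citation, giving the identifier
               of the cited work (hypothetical input layout).
    """
    counts = Counter(cited_ids)
    n_distinct = len(counts)
    if n_distinct <= 1:
        return 0.0  # all citations go to one work (or there are none)
    total = sum(counts.values())
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(n_distinct)
```

Citations spread evenly over many works give values near 1, whereas citations concentrated on a few works pull the value toward 0.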

We find that using more diverse work, less of one’s own work and older work tends to be associated with the production of more disruptive science and technology, even after accounting for the average age and number of previous works produced by team members. These findings are based on our regression results, shown in Extended Data Table 1. Models 6 and 12 present the full regression models. The models indicate a consistent pattern for both science and technology, wherein the coefficients for diversity of work cited are positive and significant for papers (0.159, P < 0.01) and patents (0.069, P < 0.01), indicating that in fields in which there is more use of diverse work, there is greater disruption. Holding all other variables at their means, the predicted CD5 of papers and patents increases by 303.5% and 1.3%, respectively, when the diversity of work cited increases by 1 s.d. The coefficients of the ratio of self-citations to total work cited are negative and significant for papers (−0.011, P < 0.01) and patents (−0.060, P < 0.01), showing that when researchers or inventors rely more on their own work, discovery and invention tend to be less disruptive. Again holding all other variables at their means, the predicted CD5 of papers and patents decreases by 622.9% and 18.5%, respectively, with a 1 s.d. increase in the ratio. The coefficients of the interaction between mean age of work cited and dispersion in age of work cited are positive and significant for papers (0.000, P < 0.01) and patents (0.001, P < 0.01), suggesting that, holding the dispersion of the age of work cited constant, papers and patents that engage with older work are more likely to be disruptive. The predicted CD5 of papers and patents increases by a striking 2,072.4% and 58.4%, respectively, when the mean age of work cited increases by 1 s.d. (about nine and eight years for papers and patents, respectively), again holding all other variables at their means.
In summary, the regression results suggest that changes in the use of previous knowledge may contribute to the production of less disruptive science and technology.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
