Research Ideas and Outcomes : Research Idea
Research Idea
Citation functions revisited: learning from the princes
Joakim Philipson
‡ National Library of Sweden, Stockholm, Sweden
Open Access

Abstract

Background

This article explores the possibility of promoting knowledge export by means of citation function indexing. Instances of knowledge export are exemplified by cross-disciplinary citations, which, it is suggested, may indicate a prolonged useful lifetime of documents. For CiTO, the Citation Typing Ontology, to serve the purpose of promoting knowledge export, it should be more specific about citation functions, separating them from evaluation, and then be put to the test as a discovery tool.

New information

To promote cross-disciplinary knowledge export by means of improved indexing of citation functions, the examples of long "sleeping beauties" and the "princes" finally awakening them from their slumber should be analyzed thoroughly. The results of this analysis might help to render CiTO more specific and better targeted to the goal of serving as a scientific discovery tool. Citation function indexing terms should be combined with domain-specific subject headings to make the cross-disciplinary coupling of research areas complete.

Keywords

knowledge export, relevance, citation functions, cross-disciplinarity, CiTO

Overview and background

Citations can often be seen as observable results of a transfer of knowledge, as records of used information. The potential of citations as a measure of relevance was noted, at least implicitly, by Gilbert (1977). However, the use of citations varies greatly. We focus here in particular on cross-disciplinary citations and the different functions they fulfil. What purpose do they serve? We want to know how the cited information is used in the citing context, fully aware that there may be other reasons behind citations than strictly intra-scientific judgments of relevance, e.g. their use as a purely rhetorical device (Gilbert 1977; Moed 2005). The references ultimately appearing in an article may also be determined by factors outside the author's immediate control, such as peer review and journal policies.

Still, why is it that certain documents are found relevant for the most varied purposes over and over again, long after their publication, while others fall into oblivion only a few years after their appearance? Which factors distinguish the potentially long-lived cited document from the less successful, more short-lived ones? Van Raan (2004) and, more recently, Ke et al. (2015) studied so-called sleeping beauties in science, i.e. instances of a publication that goes unnoticed (‘sleeps’) for a long time and then, almost suddenly, attracts a lot of attention (‘is awakened by a prince’). Studying long 'sleeping beauties' (SBs) for the purpose of identifying cross-disciplinary citation functions promises to be rewarding, since "top SBs achieve delayed exceptional importance in disciplines different from those where they were originally published" (Ke et al. 2015). Levitt and Thelwall (2008) found a link between multi-disciplinarity and a high citedness rate. However, their study did not address the question of cross-disciplinary knowledge export. Multi-disciplinarity, and even more so interdisciplinarity or transdisciplinarity, have more to do with the integration or synthesis of scientific disciplines working on a common research project, as in the emerging so-called I2S, Integration and Implementation Sciences (Bammer 2015). Cross-disciplinarity, on the other hand, is more about researchers in one scientific discipline seeking to apply new methodologies, solutions or problems taken from another, sometimes very distant discipline. Thus, results from studies of interdisciplinarity, transdisciplinarity or multi-disciplinarity cannot automatically be applied to cases of cross-disciplinary knowledge export. By knowledge export we understand here the transfer of knowledge from one discipline to another as documented by cross-disciplinary citations.

Apart from the phenomenon of sleeping beauties, citation analyses have shown substantial variations in citation patterns over time from one discipline to another. There are indications, for example, that documents within the social sciences continue to be cited for a longer period of time than is the case for the natural sciences (Garfield 1979). However, there are also examples of remarkably long-lived documents from the natural sciences. A classic paper by Albert Einstein from 1906 was still being cited during the 1960s in journal articles within fields as diverse as dairy sciences, pharmacology, physiology, ceramics, water pollution, acoustics, fluid mechanics, sedimentary petrology and molecular biology (Garfield 1979), and well into this century again within ceramics, mechanics and sedimentology.

Another example is that of Molina and Rowland (1974), a paper from the field of atmospheric chemistry published in 1974, which has continually been cited at least up until the mid 1990s also within disciplines such as computer science, law, management, ophthalmology, optics, political science, pharmacology, sociology, and, even more recently, risk management and medicine. Noteworthy in cases like these, where papers continue to be used and cited over a long period of time, is precisely the subject dispersion of citing papers. In the case of Molina and Rowland (1974), the fact that the paper was published in a prestigious multi-disciplinary scientific journal like Nature most likely promoted its exposure also to scientists from outside atmospheric chemistry. The attention it received was no doubt renewed in 1995 when the authors, together with Paul Crutzen, were awarded the Nobel prize for their work in atmospheric chemistry, particularly concerning the formation and decomposition of ozone.

Still, most articles published in Nature never come near the very high citation score attained by this paper. Moreover, Molina and Rowland (1974) received most of its citations years after its publication, that is, not while it was still new and outsiders with a fresh issue of Nature in hand were more likely to come across it by accident, but still before the Nobel prize award. Admittedly, there was a new peak in its citation count in 1995, though still lower than in the top year 1976.

Understanding the multipurposeness of scientific papers and their potential for knowledge export calls for an explanation of the function that the cited source fulfils in the context of the citing documents. How does the cited information fit into this sometimes completely new disciplinary environment? In this paper we examine a few examples of cross-disciplinary citation functions, to see if they could also be expressed by the emerging standard citation typing ontology CiTO Shotton et al. (2015) for the purpose of promoting knowledge export.

Most citation analysis studies so far have been quantitative. Citation counts have been made, e.g., in order to identify the core literature of a scientific discipline, and co-citation clustering has been used for mapping the structure of scientific disciplines (Garfield 1979). Lipetz, a pioneer of qualitative citation analysis, investigated the relationship between cited reference and citing document, aiming to improve the selectivity of citation indexes, but the 29 categories he proposed were obviously not intended to constitute a final judgment on the matter (Lipetz 1965).

Since then, studies in qualitative or content-based citation analysis (Ding et al. 2014) have produced a multitude of different schemes describing the various functions of citations, with considerable overlap between categories, although the exact labels used for classification differ among authors (Liu 1993).

The earlier classification schemes for citation functions relied essentially on manual citation analysis of relatively small sets of articles (typically 10 to 100 items), while later attempts have been made to use semi-automated or computational methods for citation classification of larger samples of full-text articles. An overview of these attempts is found in Ding et al. (2014).

However, automated methods for citation classification, relying on explicit signals or cue words for identification of citation functions Teufel et al. (2006), may not capture more complex cross-disciplinary citation relationships of the syntagmatic kind described by Green and Bean (1995), where the relevance of the cited source to the citing document stems rather from the provision of a missing piece of information serving e.g. as part of an evidence chain. An example of this kind of relationship is given in the next section where we will be looking closer at some cross-disciplinary citations apparently representing instances of knowledge export. Thus, this paper still depends on a small number of manually extracted citations from a limited set of articles. The purpose is simply to understand why a scientific article was found useful also outside its original field of research.

Objectives

The main objective of this proposal is to find ways to promote cross-disciplinary scientific knowledge export. One way of doing this is to find and describe the functions of citations in real cross-disciplinary use cases. A second step is then to find corresponding citation functions, if any, in CiTO, the Citation Typing Ontology, and possibly to suggest improvements to CiTO, making it more specifically targeted to serve as a cross-disciplinary discovery tool for research outcomes relevant also to other, sometimes seemingly far removed scientific fields. A further step, yet to be taken, would be to look closer at those "princes" coming from distant disciplines who finally managed to wake up the long "sleeping beauties" from their scientific slumber.

Cross-disciplinary citation functions

What follows are some selected examples of citations of papers from the field of atmospheric chemistry or stratospheric ozone monitoring Philipson (1996), all introduced by a description of the identified citation function followed by an analysis and discussion of a possible application of CiTO object properties.

Comparison: Citation refers to similar results from another field of research. It may appear as a metaphorical type of relation, in which one complex unit is perceived as being structurally equivalent (as a whole or in part) to another (Green and Bean 1995:660). The importance of analogical, structural comparison (of similar or dissimilar elements) for knowledge transfer has been extensively described by Day and Goldstone (2012). So it seems only natural that it figures in cases of cross-disciplinary knowledge export and use of scientific data in another field of research. A possible instance of this type is Mastenbrook and Oltmans (1983):

"Similar long-term trends are to be found in total column ozone measurements.... London and Kelley (1974) examining global total ozone found an increase in both the Northern and the Southern Hemisphere during the 1960s."

This article had at the time of access no shared subject descriptors with the cited document London and Kelley (1974) in two different research databases (Aerospace database, accessed March, 1996, and the Pascal database, using exclusively English descriptors.) Thus, this is not a case of topic matching. However, the citation link between Mastenbrook and Oltmans (1983) and London and Kelley (1974) appears to be rather strong, with the citation providing both measurement data, functioning as an item of comparison and lending supporting evidence together with other cited documents to the conclusion that

"the long-term trend in stratospheric water and its similarity to the long-term trend in stratospheric ozone suggest that these changes arise from long-term changes in the intensity of the circulation." Mastenbrook and Oltmans (1983):2164

But obviously, the article is not about stratospheric ozone variation, which is the topic of London and Kelley (1974). The main topic of Mastenbrook and Oltmans (1983) is described by the title: Stratospheric water vapor variability for Washington, DC/ Boulder, CO: 1964-82. Citations for this type of non-topical comparisons seem difficult to represent by means of CiTO. A possible candidate for a suitable CiTO object property in this case would perhaps be cito:extends, but it does not capture accurately the non-topical quality of this instance.
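The absence of topic matching noted above amounts to an empty intersection between two sets of subject descriptors. The following is a minimal sketch of such a check; the descriptor values are invented for illustration and are not the actual database records, and the function name `shared_descriptors` is likewise hypothetical.

```python
# Hypothetical subject descriptors; the real database records are not reproduced here.
citing_descriptors = {"Water vapor", "Stratosphere", "Humidity measurement"}
cited_descriptors = {"Ozone", "Total column ozone", "Global trends"}


def shared_descriptors(a, b):
    """Return the descriptors common to both records, ignoring case."""
    return {d.lower() for d in a} & {d.lower() for d in b}


overlap = shared_descriptors(citing_descriptors, cited_descriptors)
is_topic_match = bool(overlap)  # an empty overlap means no topic matching
print(is_topic_match)
```

A citation link that exists despite an empty overlap is the kind of relationship that subject indexing alone would not reveal.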

Evidence: Citation is used for support of propositions in citing entity. Instances of conclusive, logically binding proofs may be rare; rather, reference is often to the apparent agreement between measurement data and predictions of a theory or a model. This type of citation might seem more natural for specialists within a narrower field of research, as it may sometimes require expertise in the field to seize the arguments involved. However, there are also clear examples of cross-disciplinary citations for evidence. Consider the following extract from an article published in a botanical journal as an illustration:

"Good estimates of the present stratospheric distribution of ozone and subsequent UV radiation are known (Koller, 1952; Dütsch, 1969; Cutchis, 1974). The total amount of ozone in the northern hemisphere is maximal in spring and minimal in fall. ... It is suggested that among flowering plants of the northern hemisphere, many of which have white or yellow flowers (Table 2), there has been convergent evolution in floral UV absorption. Yellow and white flowers are high in flavonoid pigments which strongly absorb UV light. The seasonality of UV radiation may be one major selective pressure. Yellow and white flowers comprise as much as 85% of an arctic flora (Kevan, 1972)." Utech and Kawano (1975).

To discern some of the more important premisses involved in the inference leading to the hypothesis in the third sentence of the extract: first, there is the observation of the seasonal variation of stratospheric ozone and the consequent seasonal variation of ultraviolet radiation reaching the earth, with a spring maximum of stratospheric ozone and a corresponding spring minimum of UV radiation in the northern hemisphere (since stratospheric ozone absorbs UV radiation). Then there is the knowledge that yellow and white flowers are strong absorbers of UV radiation. Finally, there is the evidence of the predominance of yellow and white flowers in the northern hemisphere. Together these premisses make probable the hypothesis that UV-absorption ability has acted as a selective evolutionary mechanism for flowers in the northern hemisphere. It is important to note here that the different premisses come from different subject areas. The first three cited sources in the extract belong to geophysics or climatology, whereas (Table 2) and (Kevan, 1972) are from botany. Despite the differences in subject, the premisses apparently fit together, as slots in a framework (Green and Bean 1995:660). One describes certain environmental conditions. Another describes an important property of the object being studied, influencing its adaptation to the conditions described by the first. The third premiss describes the frequency of occurrence of the object being studied, thereby corroborating the importance of the property described by the second premiss. Together they make up an evidential structure that accounts for the relevance of the cited entities to the purpose of the citing document. Thus, all the cited entities here could apparently be ascribed the CiTO object property cito:isCitedAsEvidenceBy Utech and Kawano (1975). Alternatively, some of these citations, e.g. those of the strictly botanical sources, might also be described by the CiTO property cito:isCitedAsDataSourceBy Utech and Kawano (1975).
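As an illustration only, the evidential citations discussed above could be written down as subject-property-object statements using the property cito:isCitedAsEvidenceBy. The sketch below uses plain Python tuples rather than an RDF library; the `ex:` document identifiers are hypothetical placeholders, not real identifiers of the works concerned.

```python
# CiTO namespace; the "ex:" identifiers below are hypothetical placeholders.
CITO = "http://purl.org/spar/cito/"

CITING = "ex:UtechKawano1975"
CITED_AS_EVIDENCE = [
    "ex:Koller1952",
    "ex:Dutsch1969",
    "ex:Cutchis1974",
    "ex:Kevan1972",
]


def evidence_triples(citing, cited_list):
    """Build one (cited, property, citing) statement per cited source."""
    prop = CITO + "isCitedAsEvidenceBy"
    return [(cited, prop, citing) for cited in cited_list]


triples = evidence_triples(CITING, CITED_AS_EVIDENCE)
for subject, prop, obj in triples:
    # Turtle-like rendering, for reading only
    print(f"{subject} <{prop}> {obj} .")
```

Each statement records only that a source was cited as evidence; which premiss it supplies within the evidential structure is not expressed.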

Force: Citation refers to a likely structure, mechanism or cause behind observed phenomena. A typical example is a reference to a chemical reaction described by the cited entity. Again this type of citation function would seem to be essentially an internal affair among specialists within a field of research, but examples of outsiders making use of it also occur, as this excerpt from a medical journal illustrates:

"Stratospheric ozone depletion, accompanied by increases in ambient, biologically destructive ultraviolet-B radiation,(104) may exacerbate the effect of climate change on infectious diseases. Arising from a different anthropogenic process than climate change, ozone destruction is occurring primarily from reactions between ozone and halogen free radicals derived from chlorofluorocarbons, other halocarbons, and methyl bromide."(105) Patz (1996); ref. (105) is to Molina and Rowland (1974)

No specific object property was found in CiTO for citations referring to a likely cause, mechanism or explanatory force. A significant difference between the evidence and the force citation functions appeared in Philipson (1996), where the 32 citations of Molina and Rowland (1974) for evidence had a median publishing year of 1975, only one year after the cited source, whereas the 26 citations of the force type appeared to be among the most long-lived in the sample, with a median publishing year of 1984, ten years after the cited source. The sample in that study was too small to allow any definite conclusions, but the apparent difference in age distribution is perhaps not surprising. The reference to an explanatory force in the form of a chemical reaction or structure should be of such permanence that it can be expected to be found not only in articles in scientific journals, but even in textbooks.

Method: Citation refers to the method employed in the cited work. This does not necessarily mean that the same method is used or even advocated by the citing article, as observed in the following example:

"Total ozone data were previously analyzed by a number of authors including Angell and Korshover (1973), London and Kelly [sic!] (1974) with particular interest in quantifying long-term trends. The statistical procedure commonly used in these studies is linear regression analysis (i.e. fitting a straight line) applied to adjusted total ozone values (e.g. deviations from monthly means ...). However, problems arise in the interpretation of results from these linear regression models since these models fail to take account of the positive autocorrelation that is present in the ozone data. Hence, we consider time series analysis that accounts for autocorrelation in a quantitative trend assessment of ozone data." Tiao (1983):460

In CiTO, the object property relating to method, cito:usesMethodIn, presupposes that the cited method is actually used by the citing document. This is a problematic feature of CiTO: while some properties seem to be too general to distinguish between different specific citation functions, other properties, like this one, presuppose an active use or endorsement of the content extracted from the cited entity. There are of course a number of other object properties in CiTO expressing a negative evaluation of the cited entity, but these are again more general and hold no information about which function or part of the cited entity is negatively evaluated. The methodological citations in the aforementioned study (Philipson 1996) were few in number, but their relatively long life might be more than just an accidental effect of the selection. If so, support could be gained from the results of Small (1977), showing how a scientific paper that was formerly frequently cited for theoretical reasons, as describing the structure of collagen, suddenly ceased to be among the highly cited papers for a short time when the focus of research in the field shifted from structural studies to biosynthesis, only to reappear as one of the high-ranking cited sources a year later, but then cited rather for its methodology (Garfield 1979:127f).

Motivation: Citation serves as a motivation, in adjunction with other reasons, for the research reported by the citing authors or, more generally, for their writing a paper. This kind of citation should be expected to appear primarily in the introduction of an article, in the rationale or statement of purpose of the paper, as in the following passage, where it announces a new factor from another discipline that specialists within the author's own field of research now must take into account:

"Ophthalmologists working in equatorial regions have long been familiar with the syndrome of solar conjunctivo-keratopathy. [Description of symptoms follows.] ... A new factor has now arisen which threatens significantly to increase this hazard and possibly to extend the geographical area in which this minor but apparently incurable syndrome may be encountered. It is the purpose of this note to bring the new circumstances to the attention of ophthalmologists. The new factor is the growing threat to the ozone layer from the ever-increasing quantities of chlorofluoromethane gases released into the atmosphere, mainly from aerosol sprays.... Molina and Rowland (1974) described the threat to the ozone layer." Youngson (1975), with ref. to Molina and Rowland (1974)

Again, there seems to be no directly corresponding object property in CiTO for this type of citation function expressing the motivation for or purpose of a research paper. A more general CiTO property that could be used to cover also cases like this is perhaps cito:obtainsBackgroundFrom, but it does not capture the specificity of this citation function, as documents can be cited for background information also without being directly instrumental for the specific purpose of the citing entity.

Result: Citation involves an implication, viz. if information C0 contained in cited document is true, and if furthermore conditions C1, C2, ... Cn hold good, then the consequences will be such and such. Hence, the citing article does not necessarily have to endorse a claim of truth for the cited information; the only claim is for the potential result, given the conditions described by the antecedent of the implication. The auxiliary conditions C1, C2, ... Cn furthermore do not have to be topically related to the cited information. The only requirement is that there must be no contradiction among them. In Philipson (1996) several instances of this type of citation appeared in articles from journals, that were clearly peripheral to the field of research concerned with stratospheric ozone monitoring, coming from such disciplines as molecular biology, botany, or ophthalmology. Researchers from outside naturally should be more concerned with the implications of the cited information for their own field of research, rather than with trying to assess the validity of that information, lacking the necessary specialist competence for that. The following passage may serve as an example:

"Recent studies by Cicerone (4) and Molina and Rowland (7) state that increased use of fluorocarbons in aerosols and refrigerants could severely deplete the protective layer of ozone in the stratosphere. This would increase the level of UV-B radiation reaching the earth's surface. ... The object of this study was to determine the effects of UV-B irradiation on local lesion development of Chenopodium quinoa Willd. 'Valdivia' plants inoculated with potato virus S (PVS)." Semeniuk and Goth (1980); ref. (7) is to Molina and Rowland (1974)

CiTO has an object property, cito:usesConclusionsFrom, that might fit this kind of citation function, but again it seems the CiTO object property presupposes an active claim of truth for the cited information, whereas the result function described here is more neutral and conditional. In general it would be preferable to separate citation functions from evaluative judgement as clearly as possible, so that each citation function identified could be given one of three values: positive (+), negative (-) or neutral (0).
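The suggested separation of citation function from evaluation can be pictured as a record with two independent fields. The sketch below is one possible representation, not part of CiTO: the class and field names are hypothetical, the six functions are those described in this section, and the negative value assigned to the Tiao (1983) example is an illustrative reading of that passage.

```python
from dataclasses import dataclass
from enum import Enum


class Function(Enum):
    """The six citation functions described in this section."""
    COMPARISON = "comparison"
    EVIDENCE = "evidence"
    FORCE = "force"
    METHOD = "method"
    MOTIVATION = "motivation"
    RESULT = "result"


class Evaluation(Enum):
    """Evaluative value, kept apart from the function itself."""
    POSITIVE = "+"
    NEGATIVE = "-"
    NEUTRAL = "0"


@dataclass(frozen=True)
class CitationAnnotation:
    citing: str
    cited: str
    function: Function
    evaluation: Evaluation = Evaluation.NEUTRAL  # neutral unless stated


# A conditional, neutral "result" citation
result_citation = CitationAnnotation(
    "Semeniuk and Goth (1980)", "Molina and Rowland (1974)", Function.RESULT
)
# A "method" citation where the method is discussed but not adopted
# (the negative value is an illustrative assumption)
method_citation = CitationAnnotation(
    "Tiao (1983)", "London and Kelley (1974)", Function.METHOD, Evaluation.NEGATIVE
)
print(result_citation.function.value, result_citation.evaluation.value)
print(method_citation.function.value, method_citation.evaluation.value)
```

With six functions and three values, such a scheme covers eighteen combinations using only nine index terms.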

Now, as we have seen, not all the above examples of citation functions are directly translatable into CiTO object properties, but they nevertheless shed some light on the use of scientific information outside the discipline whence it originated. Possibly other, even more compelling examples such as these can be found, where the age distance between cited and citing documents is larger, as we already saw in section 1 for Einstein (1906) and Molina and Rowland (1974).

Impact

Could citation indexing with CiTO serve the purpose of knowledge export? From the examples above it appears that CiTO is not specific enough to capture the finer differences between citation functions. At the same time there seems to be some redundancy in the present version of CiTO (Ciancarini et al. 2014), so having index terms describe citation functions more accurately, while separating them from value judgments, does not necessarily imply that the number of object properties would have to grow substantially.

We have seen some instances of cross-disciplinary citations characterized by the kind of hierarchical or structural, syntagmatic relationships between citing and cited source, described by Green and Bean (1995). With the citing entity representing the user need, "the topic of the user need and the topic of the cited passage are related as class and subclass, or... as class and class-member" Green and Bean (1995):659. This kind of type-token relationship can be expressed in citations by the provision of an instance of the class referred to. It may also appear in the form of the citation function referred to above as comparison with a structurally equivalent unit.

Structural (or syntagmatic) relationships are those where the topic of the cited passage corresponds to a component within a conceptual syntagmatic structure (...), while the topic of the user need corresponds to another component within the structure, or again, the structure at large Green and Bean (1995):660. We saw an example of this relationship in the evidence function in the case of Utech and Kawano (1975) above.

The limited importance of topic matching relationships in citations was confirmed in a study by Harter et al. (1993) from the area of library and information science, in which the subject similarity among pairs of cited and citing documents was found to be very small. However, independence from topic matching may vary between disciplines. Guerrero-Bote et al. (2007) found a significant correlation between the knowledge export and import rates of different subject categories: "This indicates that there are Subject Categories which are more independent, importing and exporting little knowledge, and others with greater flows of knowledge across subject boundaries" (Guerrero-Bote et al. 2007).

There are no doubt also numerous instances of scientists not being aware of the potential relevance to their own work of the research performed by scientists within other subject areas, simply because they do not know, let alone refer to the literature of those subject areas. A possible case in point and an example of the fruitfulness in crossing disciplinary barriers for the production of new knowledge is provided by Swanson (1990). The case concerns the discovery of how dietary fish oil could be used as a treatment for Raynaud's syndrome, a disease causing abnormally high blood viscosity and red blood cell rigidity in some patients. A search in MEDLINE and other databases revealed that the intersection set representing the topic matching of fish oil terms with those representing Raynaud's syndrome was actually void Swanson (1990):32. A further search in SCI confirmed the "bibliographic isolation" of the two bodies of literature: With rare exception, they do not have authors in common; they do not cite each other; they do not cite the same literature. However, the two literatures are evidentially relevant to the user's information need. But they are not on the same topic as the user need, that is, treatment options for Raynaud's syndrome Green (1995):650
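The situation described by Swanson, two literatures with no direct overlap that nevertheless share intermediate concepts, can be illustrated with a toy index. All document identifiers and index terms below are invented for illustration; only the general pattern (an empty direct intersection alongside shared bridging terms) is taken from the account above.

```python
# Toy index: document identifiers and index terms are invented for illustration.
fish_oil_literature = {
    "F1": {"fish oil", "blood viscosity"},
    "F2": {"fish oil", "red blood cell rigidity"},
}
raynaud_literature = {
    "R1": {"raynaud's syndrome", "blood viscosity"},
    "R2": {"raynaud's syndrome", "red blood cell rigidity"},
}


def docs_with(index, term):
    """Return the identifiers of all documents indexed with the given term."""
    return {doc for doc, terms in index.items() if term in terms}


def all_terms(index):
    """Return every index term used anywhere in the literature."""
    return set().union(*index.values())


combined = {**fish_oil_literature, **raynaud_literature}

# Topic matching: documents indexed with both topics (none in this toy index)
direct_matches = docs_with(combined, "fish oil") & docs_with(combined, "raynaud's syndrome")

# Bridging terms: concepts that occur in both literatures
bridging_terms = all_terms(fish_oil_literature) & all_terms(raynaud_literature)

print(sorted(direct_matches))
print(sorted(bridging_terms))
```

A search restricted to topic matching returns nothing here, while the bridging terms point to a possible connection between the two literatures.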

Implementation

Indexing citation functions is not so much about representing mental models or capturing the original intention of the citing author (Ciancarini et al. 2014; Teufel et al. 2006), but rather about describing the actual and potential use (past, present and future) of document contents. It is essential, then, to look at both sides of the citation relationship simultaneously: the citing entity and the cited source. A combination of citation functions and subject headings, extracted from both citing and cited entities, might offer even better prospects for knowledge export and provide researchers and readers with new context, adding new relevance to old documents and opening new opportunities for evidence mining. What is needed is a proper test of the capability of an indexing system of citation functions like CiTO, possibly revised and revamped, to serve as a discovery tool across scientific disciplines. Preparation for such a test may start by indexing a sample of outside 'princes' who have awakened some of those long 'sleeping beauties'. Ke et al. (2015) identified a number of these 'princes', suggesting

"that a partial explanation behind the sudden awakening of top SBs may lie in the fact that the paper in question is suddenly “discovered” as relevant by an entire community in another discipline." ... and making the observation that top Sleeping Beauties "are characterized by a typically very high fraction of citations from other disciplines: for about 80% of the top SBs, as much as 75% or more of citations are of interdisciplinary nature."

The next step would then be to have a panel of independent researchers from the same field as the princes, unaware of her history, find their way to la Belle au bois dormant, by exposing them to a collection of documents from their own research field, similarly indexed by citation functions, of which the prince would be only one item among many.

The indexing scheme resulting from a conclusive test should be sufficiently easy to use that virtually anyone who reads, writes and cites would be able to contribute to the indexing effort. Online publishers of scientific journals, managers of digital repositories like JSTOR, and existing citation indexes like the Web of Science and CiteSeerX could make it happen by means of crowd-sourcing from their users. Ideally, tagging a scientific article online with citation functions from a controlled index language should be only a little more complicated than liking a post on social media.
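How citation functions might be combined with subject headings for discovery can be sketched as a simple filter over index records. The record layout, the function name `cross_disciplinary` and the subject headings below are illustrative assumptions (in particular the heading given for Tiao (1983)); this is not an existing index format.

```python
# Hypothetical index records; field names and subject headings are illustrative.
records = [
    {"citing": "Patz (1996)", "cited": "Molina and Rowland (1974)",
     "function": "force",
     "citing_subject": "medicine", "cited_subject": "atmospheric chemistry"},
    {"citing": "Youngson (1975)", "cited": "Molina and Rowland (1974)",
     "function": "motivation",
     "citing_subject": "ophthalmology", "cited_subject": "atmospheric chemistry"},
    {"citing": "Tiao (1983)", "cited": "London and Kelley (1974)",
     "function": "method",
     "citing_subject": "atmospheric science", "cited_subject": "atmospheric science"},
]


def cross_disciplinary(index, function=None):
    """Return records whose citing and cited subjects differ,
    optionally restricted to one citation function."""
    hits = [r for r in index if r["citing_subject"] != r["cited_subject"]]
    if function is not None:
        hits = [r for r in hits if r["function"] == function]
    return hits


for record in cross_disciplinary(records, function="force"):
    print(record["citing"], "cites", record["cited"], "for", record["function"])
```

A reader from medicine could use such a filter to see which outside sources colleagues have cited for an explanatory mechanism, regardless of topical overlap.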

References