<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN" "../../nlm/tax-treatment-NS0.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tp="http://www.plazi.org/taxpub" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">17</journal-id>
      <journal-id journal-id-type="index">urn:lsid:arphahub.com:pub:8E638694-B4E0-570A-856A-746FF325BF6B</journal-id>
      <journal-title-group>
        <journal-title xml:lang="en">Research Ideas and Outcomes</journal-title>
        <abbrev-journal-title xml:lang="en">RIO</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="epub">2367-7163</issn>
      <publisher>
        <publisher-name>Pensoft Publishers</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3897/rio.2.e8860</article-id>
      <article-id pub-id-type="publisher-id">8860</article-id>
      <article-id pub-id-type="manuscript">5524</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Small Grant Proposal</subject>
        </subj-group>
        <subj-group subj-group-type="scientific_subject">
          <subject>Research cycle and integrity</subject>
          <subject>Social Science Methodology</subject>
          <subject>Theory and practice of scholarly communication</subject>
        </subj-group>
        <subj-group subj-group-type="sdg">
          <subject>Peace and justice strong institutions</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>The value of statistical tools to detect data fabrication</article-title>
      </title-group>
      <contrib-group content-type="authors">
        <contrib contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Hartgerink</surname>
            <given-names>Chris HJ</given-names>
          </name>
          <email xlink:type="simple">chjh@protonmail.com</email>
          <uri content-type="orcid">https://orcid.org/0000-0003-1050-6809</uri>
          <xref ref-type="aff" rid="A638">638</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>Wicherts</surname>
            <given-names>Jelte M</given-names>
          </name>
          <xref ref-type="aff" rid="A638">638</xref>
        </contrib>
        <contrib contrib-type="author" corresp="no">
          <name name-style="western">
            <surname>van Assen</surname>
            <given-names>Marcel ALM</given-names>
          </name>
          <xref ref-type="aff" rid="A638">638</xref>
        </contrib>
      </contrib-group>
      <aff id="A638">
        <label>638</label>
        <addr-line content-type="verbatim">Tilburg University, Tilburg, Netherlands</addr-line>
        <institution>Tilburg University</institution>
        <addr-line content-type="city">Tilburg</addr-line>
        <country>Netherlands</country>
      </aff>
      <author-notes>
        <fn fn-type="corresp">
          <p>Corresponding author: Chris HJ Hartgerink (<email xlink:type="simple">chjh@protonmail.com</email>).</p>
        </fn>
        <fn fn-type="edited-by">
          <p>Academic editor: </p>
        </fn>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2016</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>22</day>
        <month>04</month>
        <year>2016</year>
      </pub-date>
      <volume>2</volume>
      <elocation-id>e8860</elocation-id>
      <uri content-type="arpha" xlink:href="http://openbiodiv.net/B8D39C83-F3D1-5D94-ACC1-0FAC1146BAC9">B8D39C83-F3D1-5D94-ACC1-0FAC1146BAC9</uri>
      <uri content-type="zenodo_dep_id" xlink:href="https://zenodo.org/record/575839">575839</uri>
      <history>
        <date date-type="received">
          <day>15</day>
          <month>04</month>
          <year>2016</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>Chris Hartgerink, Jelte Wicherts, Marcel van Assen</copyright-statement>
        <license license-type="creative-commons-attribution" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">
          <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
        </license>
      </permissions>
      <abstract>
        <label>Abstract</label>
        <p>We aim to investigate how statistical tools can help detect potential data fabrication in the social and medical sciences. In this proposal we outline three projects that assess the value of such tools and take the first steps toward applying them automatically to detect data anomalies potentially due to data fabrication. In Project 1, we examine the performance of statistical methods to detect data fabrication in a mixture of genuine and fabricated data sets, where the fabricated data sets are generated by actual researchers who participate in our study. We also interview these researchers in order to investigate, in Project 2, different data fabrication characteristics and whether data generated with certain characteristics are better detected with current statistical tools than others. In Project 3 we use software to semi-automatically screen research articles for data anomalies that are potentially due to fabrication, and we develop and test new software that forms the basis for automated screening of research articles in the future.</p>
      </abstract>
      <kwd-group>
        <label>Keywords</label>
        <kwd>data fabrication</kwd>
        <kwd>statistics</kwd>
        <kwd>scientific misconduct</kwd>
        <kwd>integrity</kwd>
      </kwd-group>
      <counts>
        <fig-count count="3"/>
        <table-count count="2"/>
        <ref-count count="29"/>
      </counts>
    </article-meta>
    <notes>
      <sec sec-type="Funding programe">
        <title>Funding program</title>
        <p>This grant proposal has been submitted for the Phase I grant (IR-ORI-16-001) by the Office of Research Integrity. The only addition to the original grant proposal is the reference to Table 2 in the "Project Management" section.</p>
      </sec>
    </notes>
  </front>
  <body>
    <sec sec-type="Problem statement">
      <title>Problem statement</title>
      <p>There is a clear need to develop and validate statistical tools to detect (potential) data fabrication in the social and medical sciences. Of all researchers, 2% admit to having falsified or fabricated research data at least once in their professional career (<xref ref-type="bibr" rid="B3138142">Fanelli 2009</xref>), yet only a dozen cases are discovered per year in the U.S. and the Netherlands, which together have ~300,000 researchers. Arguably, then, the cases that are discovered are only the tip of the iceberg.</p>
      <p>Considering that data fabrication undermines the epistemological pursuit of science and negatively affects the validity of published findings, fairness in the scientific reward system, and trust in science, it is important to improve its detection. In the last decade, journals have started using automated tools as a screening device to detect image manipulation and plagiarism in submitted or accepted articles. Numerous case studies (e.g., in the Journal of Cell Biology) suggest that these screening tools are effective and useful for detecting various forms of research misconduct. In the social and medical sciences, tools that detect image manipulation are practically useless because data are primarily quantitative and based on observations of behavior, questionnaires, (cognitive) tests, etc. Despite their potential use as a screening device, there are currently no well-established tools to detect data anomalies (potentially) due to fabrication of quantitative data.</p>
      <p>Statistical tools to detect data fabrication have been successful in several ad hoc investigations in the social and medical sciences, of which the Diederik Stapel case is perhaps the best known. As in the Fujii case in anesthesiology (<xref ref-type="bibr" rid="B3138152">Carlisle 2012</xref>), statistical results reported in the articles of Stapel allowed for statistical tests that indicated his results were too good to be true (<xref ref-type="bibr" rid="B3138172">Levelt Committee et al. 2012</xref>). Similarly, the raw data underlying some of Stapel’s articles enabled the detection of patterns that were clearly different from what would be expected in data subject to random sampling. Such patterns were also used in the investigations of Smeesters and Sanna (<xref ref-type="bibr" rid="B3138181">Simonsohn 2013</xref>). These cases and earlier research (<xref ref-type="bibr" rid="B3138201">Mosimann et al. 1995</xref>, <xref ref-type="bibr" rid="B3138191">Mosimann et al. 2002</xref>) highlighted that researchers are often quite bad at fabricating data that look genuine. However, little is known about how to distinguish fabricated scientific data from genuine scientific data. In this project, we evaluate the value of statistical tools to detect data fabrication and ways to apply these statistical methods (semi-)automatically in a screening tool to detect data anomalies, potentially due to data fabrication.</p>
      <p>The use of statistical tools is of interest to integrity offices (e.g., ORI), editors, peer reviewers, and (potential) whistleblowers. Currently, editors and peer reviewers do not actively look for scientific misconduct whilst reviewing (<xref ref-type="bibr" rid="B3138211">Bornmann et al. 2008</xref>). Computerized tools to automatically screen articles for statistical irregularities could be helpful in detecting problematic data at any stage in the research process, but specifically during or after the publication process. To highlight the speed with which such tools could operate: we have previously applied methods to screen for statistical reporting errors, scanning hundreds of papers per minute (<xref ref-type="bibr" rid="B3138221">Nuijten et al. 2015</xref>).</p>
    </sec>
    <sec sec-type="Goal(s) and objective(s)">
      <title>Goal(s) and objective(s)</title>
      <p>We investigate the performance of statistical tools to detect potential data fabrication in the social and medical sciences and their potential as an automatic screening tool. To this end, Project 1 evaluates the performance of statistical tools to detect potential data fabrication by inspecting genuine datasets that are already available and fabricated datasets generated by researchers in our study. In Project 2, we qualitatively assess the ways in which researchers fabricate data, based on the interviews from Project 1. Finally, in Project 3, we develop and test software to screen research articles for data anomalies potentially due to data fabrication. With these projects, we aim to improve detection methods and lay the groundwork for a thoroughly developed screening tool for detecting data anomalies potentially due to data fabrication.</p>
    </sec>
    <sec sec-type="Project 1: The detection of fabricated raw data">
      <title>Project 1: The detection of fabricated raw data</title>
      <p><italic>Summary</italic>. <italic>We invite researchers to fabricate data for a fictional study, which we then try to detect as fabricated. We apply the following methods to detect data fabrication: (i) digit analyses, (ii) variance analyses, and (iii) analyses of multivariate associations. Excluding the analyses based on Benford’s law, these three types of analyses yield 10 tests of data fabrication, which we combine with the Fisher method to provide an overall test of data fabrication. We inspect the performance of these methods with ROC analyses.</italic></p>
      <p>This project examines the performance of statistical tools to detect data fabrication. To this end, we subject fictional data to various statistical methods. We examine the performance of such statistical tools using genuine data (already available) and fabricated data we ask researchers to generate. Additionally, we investigate the summary statistics of these data, providing a replication of a study we are currently conducting on validating methods to detect data fabrication with summary statistics.</p>
      <p>Digit analysis inspects whether reported values follow expected distributions based on mathematical laws or measurement properties. For instance, Benford’s law (<xref ref-type="bibr" rid="B3138236">Benford 1938</xref>) states that the first digit should be 1 in ~30% of the cases and 2 in ~18% of the cases, with higher digits occurring even less frequently. Based on <xref ref-type="bibr" rid="B3138270">Burns (2009)</xref>, <xref ref-type="bibr" rid="B3138304">Deckert et al. (2011)</xref>, and <xref ref-type="bibr" rid="B3138314">Diekmann (2007)</xref>, we hypothesize that a tool based on Benford’s law will not be helpful to distinguish genuine from fabricated <italic>latency</italic> data. Terminal (i.e., last) digit analysis tests whether the last digits are uniformly distributed (<xref ref-type="bibr" rid="B3138324">Mosimann and Ratnaparkhi 1996</xref>), because these are expected to contain mostly random (measurement) error.</p>
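      <p>As an illustration only, the two digit analyses described above could be sketched in Python as follows. This is a minimal sketch and not the implementation used in this project; all function names are hypothetical, and the sketch assumes that values are reported with a fixed number of decimals. It returns only the Pearson χ<sup>2</sup> statistic, and with small samples (e.g., 25 values) the χ<sup>2</sup> approximation is rough.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit proportions under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def uniform_expected():
    """Expected terminal-digit proportions: each digit 0-9 equally likely."""
    return {d: 0.1 for d in range(10)}


def first_digit(value):
    """First non-zero digit of a non-zero number (e.g., 1.45 -> 1)."""
    # Scientific notation always starts with the leading non-zero digit.
    return int(f"{abs(value):.10e}"[0])


def last_digit(value, decimals=2):
    """Last digit at a fixed number of reported decimals (e.g., 1.45 -> 5)."""
    return int(f"{abs(value):.{decimals}f}"[-1])


def chi_square_statistic(digits, expected):
    """Pearson chi-square statistic of observed digits against expected proportions."""
    counts = Counter(digits)
    n = len(digits)
    return sum((counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in expected.items())
```
]]></preformat>
      <p>The resulting statistic would be compared against a χ<sup>2</sup> distribution with 8 degrees of freedom for first digits (nine categories) and 9 degrees of freedom for terminal digits (ten categories).</p>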
      <p>Variance analysis inspects whether there is sufficient variation in the reported standard deviations (SDs; <xref ref-type="bibr" rid="B3138181">Simonsohn 2013</xref>), something that might be forgotten by data fabricators. Because SDs are subject to sampling fluctuation, there should be variation in those SDs. Based on the study’s sample size and mean SD, the expected amount of sampling fluctuation can be simulated. Subsequently, the observed variation in the SDs can be compared with the expected amount of sampling fluctuation to determine whether the data are overly consistent.</p>
      <p>Multivariate associations exist in real data but are often not taken into account by fabricators (e.g., <xref ref-type="bibr" rid="B3138334">Buyse et al. 1999</xref>), resulting in peculiar multivariate results in fabricated data. By comparing the multivariate associations observed in the fabricated data with the meta-analyzed multivariate associations observed in genuine data, we try to detect fabricated data by identifying discrepant multivariate associations.</p>
      <sec sec-type="Procedure project 1">
        <title>Procedure project 1</title>
        <p>From all Dutch/Flemish researchers who have published a peer-reviewed paper incorporating the Stroop task (see below), we recruit twenty participants for Project 1. We invite these researchers to participate and, if they are willing, schedule a 1.5-hour session in which the experimenter (Chris Hartgerink [CHJH] or a student-assistant) visits the researcher. In the invitation, researchers are provided with an information leaflet that explains the general procedure and states that their participation is compensated with €100. The leaflet includes the informed consent form, which explicitly states that the study entails fabricating data for a fictional study and explains that our study focuses on the detection of fabricated data with statistical tools. The leaflet also explains that the 3 (out of 20) fabricated datasets that are hardest to detect earn an additional reward of €100, which serves as an incentive to fabricate data that are hard to detect.</p>
        <p>During the session, the instructions specify the timeframe available for fabrication (i.e., 45 minutes) and the hypotheses of the fictional study for which participating researchers have to fabricate data. We use the Stroop task (<xref ref-type="bibr" rid="B3138354">Stroop 1935</xref>) for these fictional studies, a classic research paradigm in psychology that focuses on participants’ response times. In the actual Stroop paradigm, participants are asked to determine the color in which a word is presented (i.e., the word color), while the word itself also spells a color (i.e., the color word). A color word (e.g., “red”, “blue”, or “green”) can be presented in either the congruent color (e.g., “red” presented in red) or an incongruent color (e.g., “red” presented in green). The dependent variable in the Stroop task is the response latency, which is on average higher for incongruent than for congruent words. Researchers participating in our study are asked to fabricate the mean and SD of latency for the congruent and incongruent conditions, for 25 (fictional) individuals (i.e., 2 conditions × 2 statistics × 25 persons = 100 data points). A fabrication spreadsheet is provided, in which the researchers fill in their fabricated data and are immediately presented with the results for the specified hypotheses.</p>
        <p>Participants are requested to keep notes on how they fabricate the data, for use in the interview that follows immediately after they have finished fabricating data. This interview is semi-structured (audio recorded) and lasts approximately twenty to thirty minutes. They are asked:</p>
        <p>
          <list list-type="order"><list-item><p><italic>What tool or software did you use during the data fabrication process</italic>, <italic>if any? (e.g., Excel, SPSS, calculator, etc.)</italic></p></list-item><list-item><p>
                <italic>Did you apply a specific strategy in fabricating data?</italic>
              </p></list-item><list-item><p>
                <italic>Did you pay specific attention to how the fabricated data looked in the end?</italic>
              </p></list-item><list-item><p>
                <italic>Are you familiar with any statistical tools to detect data fabrication?</italic>
              </p></list-item><list-item><p>
                <italic>Is there anything else you would like to note about how you fabricated the results?</italic>
              </p></list-item></list>
        </p>
        <p>After answering these questions we debrief participants, which includes reminding the participant of ethical standards and professional guidelines that condemn data fabrication, to ensure that the participant realizes this was only an academic exercise.</p>
      </sec>
      <sec sec-type="Evaluation project 1">
        <title>Evaluation project 1</title>
        <p>We use both genuine and fabricated datasets (20 datasets each). We collect the fabricated datasets during the project and we download genuine data from the Many Labs 3 project (<ext-link ext-link-type="uri" xlink:href="https://osf.io/ct89g">osf.io/ct89g</ext-link>; <xref ref-type="bibr" rid="B3138364">Ebersole et al. 2016</xref>). These genuine and fabricated data are used to examine the statistical properties of the tools to detect data fabrication. We apply four different statistical methods, of which we combine three into an overall test (see Fig. <xref ref-type="fig" rid="F3138092">1</xref>).</p>
        <p>We apply digit analysis to the first and final digit of the fabricated mean and SD response latencies (e.g., for 1.45 we use 1 and 5). We apply Benford’s law to the first digit four times: 2 [congruent/incongruent] × 2 [mean/SD response latencies]. Terminal digit analysis is applied to the last digit four times: 2 [congruent/incongruent] × 2 [mean/SD response latencies]. Each of these applications is based on 25 values (e.g., 25 fabricated means for the congruent condition).</p>
        <p>Next, we test whether there is sufficient variance in the 25 fabricated SDs per condition. This results in two variance analyses, one per condition and each based on 25 values. Given that, for normally distributed data, a sample variance scaled by the population variance, (<italic>N</italic>-1)<italic>s</italic><sup>2</sup>/σ<sup>2</sup>, is χ<sup>2</sup> distributed with <italic>N</italic>-1 degrees of freedom, the expected distribution of the variance of the SDs is readily simulated. A <italic>p</italic>-value is then computed to determine how extreme the observed amount of variation in the SDs is, which serves as a test for potential data fabrication.</p>
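        <p>The simulation described above might look roughly like the following Python sketch. It is illustrative only and not the project’s implementation: the function name and parameters are hypothetical, the number of observations behind each SD (<italic>n_per_sd</italic>) is assumed to be known and equal across individuals, the population variance is approximated by the mean of the observed variances, and normality of the underlying data is assumed.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import random
import statistics


def sd_variation_p_value(observed_sds, n_per_sd, n_sim=2000, seed=1):
    """Proportion of simulated sets of SDs that vary as little as the observed set.

    A small value suggests the observed SDs are more similar to each other
    than sampling fluctuation alone would produce.
    """
    rng = random.Random(seed)
    df = n_per_sd - 1
    # Assumption: pooled population variance = mean of the observed variances.
    pooled_var = statistics.fmean([sd ** 2 for sd in observed_sds])
    observed_spread = statistics.stdev(observed_sds)

    as_small = 0
    for _ in range(n_sim):
        # A chi-square(df) draw is a gamma draw with shape df/2 and scale 2;
        # a simulated sample variance is pooled_var * chi-square(df) / df.
        sim_sds = [
            math.sqrt(pooled_var * rng.gammavariate(df / 2, 2) / df)
            for _ in observed_sds
        ]
        if statistics.stdev(sim_sds) <= observed_spread:
            as_small += 1
    return as_small / n_sim
```
]]></preformat>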
        <p>We test four multivariate associations between means and SDs of the response latencies by comparing them with the meta-analytic estimate from the genuine data. We inspect the association of means <italic>between</italic> conditions, the association of SDs <italic>between</italic> conditions, and the association of means and SDs <italic>within</italic> each condition (i.e., four in total). For example, if the association between the mean response latencies in the congruent and incongruent conditions is estimated to be normally distributed with μ = .23 and σ = .1 in genuine data, an observed association of -.7 is an extreme value and can be considered an anomaly, whereas .28 would not be extreme.</p>
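        <p>The worked example in the preceding paragraph can be made concrete with a short Python sketch. It is an illustration under the stated assumption that associations in genuine data are normally distributed with μ = .23 and σ = .1; the function name is hypothetical and the two-sided test is our choice for the illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from statistics import NormalDist


def association_extremity(observed, mu, sigma):
    """Return (z, two-sided p) for an observed association against N(mu, sigma)."""
    z = (observed - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Values taken from the example in the text.
z_extreme, p_extreme = association_extremity(-0.7, mu=0.23, sigma=0.1)
z_typical, p_typical = association_extremity(0.28, mu=0.23, sigma=0.1)
```
]]></preformat>
        <p>Here an association of -.7 lies 9.3 standard deviations below the genuine mean, whereas .28 lies only 0.5 standard deviations above it.</p>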
        <p>Finally, we combine the terminal digit analyses, variance analyses, and analyses of multivariate associations into an overall Fisher test (see Fig. <xref ref-type="fig" rid="F3138092">1</xref>; <xref ref-type="bibr" rid="B3138438">Fisher 1925</xref>). We exclude Benford’s law due to its expected lack of utility. This test is computed as</p>
        <p>
          <tex-math id="M1">\documentclass[12pt]{standalone}
\usepackage{varwidth}

\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}

\usepackage{amsmath, amssymb, graphics, setspace}
\newcommand{\mathsym}[1]{{}}
\newcommand{\unicode}[1]{{}}
\newcounter{mathematicapage}
\begin{document}
   \begin{varwidth}{50in}
        \begin{equation*}
            \chi^2_{2k}=-2\sum\limits^k_{i=1}\ln(p_i)
        \end{equation*}
    \end{varwidth}
\end{document}
</tex-math>
        </p>
        <p>where <italic>p<sub>i</sub></italic> is the <italic>p</italic>-value of the <italic>i</italic>th method and <italic>k</italic> is the number of combined <italic>p</italic>-values. The <italic>p</italic>-value of the Fisher test provides an overall indication of evidence for potential data fabrication based on the three methods. It is also used to rank-order the fabricators and select those who receive the bonus: the three fabricators with the largest <italic>p</italic>-values.</p>
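        <p>For illustration, the Fisher method can be computed with the Python standard library alone, because the χ<sup>2</sup> survival function has a closed form when the degrees of freedom are even (here 2<italic>k</italic>). The sketch below is not the project’s code; the function name is hypothetical, and it assumes independent <italic>p</italic>-values that are all strictly greater than zero.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, p), where statistic = -2 * sum(ln(p_i)) is compared
    against a chi-square distribution with 2k degrees of freedom.
    """
    k = len(p_values)
    statistic = -2 * sum(math.log(p) for p in p_values)  # requires every p > 0
    half = statistic / 2
    # For even df = 2k: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total
```
]]></preformat>
        <p>With a single <italic>p</italic>-value the combined result equals that <italic>p</italic>-value, which offers a quick sanity check of the computation.</p>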
        <p>For all tools, the false-positive and false-negative rates are investigated and related to sensitivity and specificity as a function of the significance level alpha (varying from .000001 to .1), with the data of individual labs and fabricators as the unit of analysis. We perform an ROC analysis and estimate the optimal criterion using a cost-benefit analysis of correct and false classifications for the 20 genuine and 20 fabricated data sets included in this project.</p>
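        <p>A minimal Python sketch of how sensitivity, specificity, and ROC points could be derived from the tests’ <italic>p</italic>-values is given below. The function names are hypothetical, a dataset is assumed to be flagged when its <italic>p</italic>-value falls below alpha, and the <italic>p</italic>-values in the example are made up purely to show the mechanics; they are not results.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def sensitivity_specificity(genuine_p, fabricated_p, alpha):
    """Flag a dataset as potentially fabricated when its p-value is below alpha."""
    flagged_fabricated = sum(p < alpha for p in fabricated_p)
    flagged_genuine = sum(p < alpha for p in genuine_p)
    sensitivity = flagged_fabricated / len(fabricated_p)
    specificity = 1 - flagged_genuine / len(genuine_p)
    return sensitivity, specificity


def roc_points(genuine_p, fabricated_p, alphas):
    """One (false-positive rate, true-positive rate) pair per alpha level."""
    points = []
    for alpha in alphas:
        sens, spec = sensitivity_specificity(genuine_p, fabricated_p, alpha)
        points.append((1 - spec, sens))
    return points


def auc(genuine_p, fabricated_p):
    """Area under the ROC curve: probability that a fabricated dataset
    has a smaller p-value than a genuine one (ties count as one half)."""
    wins = sum((f < g) + 0.5 * (f == g) for f in fabricated_p for g in genuine_p)
    return wins / (len(fabricated_p) * len(genuine_p))


# Made-up p-values for illustration only.
example_genuine = [0.40, 0.75, 0.03, 0.60]
example_fabricated = [0.001, 0.20, 0.0004, 0.02]
```
]]></preformat>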
      </sec>
      <sec sec-type="Outcomes project 1">
        <title>Outcomes project 1</title>
        <p>
          <list list-type="order"><list-item><p>Twenty publicly available datasets of fabricated raw data on the Stroop effect</p></list-item><list-item><p>Manuscript on the performance of statistical tools to detect potentially problematic data</p></list-item><list-item><p>Freely available functions to test for potentially problematic data in the R environment</p></list-item></list>
        </p>
      </sec>
    </sec>
    <sec sec-type="Project 2: Understanding data fabrication">
      <title>Project 2: Understanding data fabrication</title>
      <p><italic>Summary</italic>. <italic>In Project 2, we investigate how data are fabricated. We document how participants from Project 1 describe having fabricated their data. This information is qualitatively analyzed for fabrication characteristics that result in data that are easier or harder to detect as fabricated, in order to better understand which statistical tools can(not) detect certain data fabrication characteristics.</italic></p>
      <p>Whereas the previous project focuses on the performance and statistical properties of the tools to detect potential data fabrication, Project 2 focuses on understanding the process of data fabrication. We examine the interviews from Project 1 for data fabrication characteristics and relate these to whether we did (not) detect data fabrication in Project 1.</p>
      <p>From Project 1, participants’ descriptions of how they fabricated data (henceforth called a participant’s method) are used to answer questions such as: (i) How are participants’ methods linked to tool performance? (ii) Are some tools in Project 1 more successful than others in predicting the use of particular participants’ methods? (iii) If a participant’s method leads to data fabrication that goes undetected by existing tools, does this method suggest (further) development of a tool that may be sensitive to it?</p>
      <sec sec-type="Procedure project 2">
        <title>Procedure project 2</title>
        <p>Because the analyses and results of Project 2 are largely dependent on the behavior of participants, we can only provide the framework of our procedure.</p>
        <p>Interviews with participants from Project 1 are transcribed, qualitatively analyzed for data fabrication characteristics, and related to the (non-)detection of data fabrication in Project 1. We apply an inductive approach to identify data fabrication characteristics (<xref ref-type="bibr" rid="B3138447">Yamasaki and Rihoux 2009</xref>), where the transcripts are read and discussed (CHJH and student-assistant) to identify data fabrication characteristics. Subsequently, transcripts are coded for these characteristics by two independent coders. As a result, we acquire a list of data fabrication characteristics for each fabricator. An example of a data fabrication characteristic is whether the participant simulated data with a random number generator. These data fabrication characteristics are linked to whether we were able to detect data fabrication in Project 1, which allows us to assess whether specific data fabrication characteristics were easier or harder to detect than others were.</p>
      </sec>
      <sec sec-type="Evaluation project 2">
        <title>Evaluation project 2</title>
        <p>We apply crisp set qualitative comparative analysis (QCA; <xref ref-type="bibr" rid="B3138461">Rihoux and Ragin 2009</xref>) to identify unique combinations of data fabrication characteristics, which we link to whether we detected data fabrication in Project 1. The goal of this analysis is to identify recurring characteristics that improve or reduce detection of data fabrication, which allows us to assess whether specific characteristics of data fabrication yield higher detection rates. We also rank unique combinations of data fabrication characteristics by detection rate, allowing us to assess which characteristics are well detected with current statistical tools and which are not. For example, it might be the case that <italic>all</italic> data fabrication patterns that include copy-pasting data points are detected as fabricated. In that case, copy-pasting data during the fabrication process would appear to be a sufficient condition for detecting data fabrication (see Table <xref ref-type="table" rid="T3138096">1</xref>). Conversely, the analysis can highlight conditions that lead to non-detection (e.g., simulated data). For example, it seems plausible that when univariate data are simulated, statistical tools that inspect univariate results will have more difficulty detecting fabricated data, because simulation may yield sufficient amounts of sampling error.</p>
      </sec>
      <sec sec-type="Outcomes project 2">
        <title>Outcomes project 2</title>
        <p>
          <list list-type="order"><list-item><p>Collection of transcribed verbal interviews on fabrication characteristics in Project 1</p></list-item><list-item><p>Inductive approach to identifying data fabrication characteristics based on interviews</p></list-item><list-item><p>Dataset of applied data fabrication characteristics by 20 fabricators, including whether statistical tools from Project 1 were able to detect data fabrication</p></list-item><list-item><p>Manuscript on data fabrication characteristics and detection of data fabrication in relation to these characteristics.</p></list-item></list>
        </p>
      </sec>
    </sec>
    <sec sec-type="Project 3: automated detection of potential data fabrication">
      <title>Project 3: Automated detection of potential data fabrication</title>
      <p><italic>Summary</italic>. <italic>Project 3 applies semi-automatic ways of detecting data anomalies in articles and develops new software that facilitates automated detection of data anomalies. First, we inspect the usefulness of already available software to detect data anomalies (i.e., the R package statcheck) when combined with manual follow-up. Second, we cooperate with ContentMine, specialized in extracting information from research articles in different scientific fields, to improve automated data extraction (e.g., tables, figures). This project provides a proof of concept for using automated procedures to extract data from articles that can be used to detect data anomalies, potentially due to data fabrication. This lays the groundwork for the application of automated procedures in future research (e.g., in Phase II FOA by ORI).</italic></p>
      <p>Currently, relatively few articles are inspected for data anomalies; Project 3 investigates and develops methods to increase the number of articles that can be inspected for data anomalies by (semi-)automating this process, greatly decreasing marginal costs for an initial screening. Automated screening tools for data anomalies have been suggested (<xref ref-type="bibr" rid="B3138470">Carlisle et al. 2015</xref>, <xref ref-type="bibr" rid="B3138481">Miller 2015</xref>), but have yet to be developed.</p>
      <p>This final project investigates screening tools in two subprojects: (i) semi-automatic detection of data anomalies and (ii) development of software to facilitate future automatic detection of data anomalies. In subproject 1 we apply currently available software to semi-automatically detect data anomalies. This software was designed for purposes other than detecting data anomalies, so only a few statistical methods to detect data anomalies can be applied to its output. Hence, in subproject 2 we develop new software that extracts more data and allows for the application of more extensive statistical methods to detect data anomalies (including the methods from Projects 1 and 2).</p>
      <p>In subproject 1, we apply available software to screen ~30,000 psychology articles semi-automatically to detect data anomalies. This software, co-developed by the principal investigator CHJH and first released in 2015, is covered more extensively in the procedure section. It automatically extracts statistical results from research articles (e.g., <italic>t</italic> (85) = 2.86, <italic>p</italic> = .005) and methods inspecting <italic>p</italic>-values can subsequently be applied to flag potentially problematic papers. We follow up the flagged articles manually to investigate whether these were indeed anomalous or not (e.g., erroneous data extraction by the software), resulting in a qualitative assessment of what can go wrong in automated data extraction and an initial assessment of how many papers contain anomalies.</p>
      <p>In subproject 2, we team up with ContentMine to create new and more extensive data extraction software. The software from subproject 1 was developed for purposes other than detecting various kinds of data anomalies, and the methods from Projects 1 and 2 cannot be applied to the data it extracts. To extend the data that are extracted, and thereby the detection capabilities, we work with ContentMine on software that can extract other information from research articles. The main goals are to extract the raw data underlying scatterplots (e.g., Fig. <xref ref-type="fig" rid="F3138090">2</xref>), which facilitates digit analyses, and to extract data from tables (e.g., Fig. <xref ref-type="fig" rid="F3138491">3</xref>), which facilitates variance analyses. ContentMine has indicated that these goals are feasible within the timeframe of the contract (25 days of work).</p>
      <p>After developing this improved open-source software, we validate whether it properly extracts data. Even though subproject 1 provides a proof of concept of using automated tools to detect data anomalies, we need to validate whether these new tools are valid in extracting data prior to applying them to detect data anomalies. As such, the application of this new software to detect data anomalies is scope for future research that becomes possible upon completion of both subproject 1 and 2.</p>
      <sec sec-type="Procedure project 3">
        <title>Procedure project 3</title>
        <p>Subproject 1 uses semi-automatic procedures to flag psychology articles for data anomalies potentially due to data fabrication. We reuse data extracted from ~30,000 psychology articles with the R package <italic>statcheck</italic> (<ext-link ext-link-type="uri" xlink:href="http://osf.io/gdr4q">osf.io/gdr4q</ext-link>; <xref ref-type="bibr" rid="B3138221">Nuijten et al. 2015</xref>). The package scans HTML/PDF versions of articles and extracts all in-line reported results, provided that they are reported in the format required by the American Psychological Association (APA). This restriction limits the scope of the results <italic>statcheck</italic> extracts, but some statistical methods to detect data anomalies can already be applied. More specifically, we use the Fisher method (<xref ref-type="bibr" rid="B3138438">Fisher 1925</xref>) to identify papers that report more high <italic>p</italic>-values than low <italic>p</italic>-values.</p>
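        <p>To illustrate what extracting in-line APA-formatted results involves, the following minimal Python sketch pulls <italic>t</italic>-test reports out of plain text. It is not the <italic>statcheck</italic> implementation (which is an R package); the regular expression and the function name extract_t_tests are assumptions made for illustration, and only <italic>t</italic>-tests are handled.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical pattern for in-line reports such as "t(85) = 2.86, p = .005".
# The leading \b keeps the "t" from matching inside a longer word.
APA_T_TEST = re.compile(
    r"\bt\s*\(\s*(?P<df>\d+(?:\.\d+)?)\s*\)\s*=\s*(?P<stat>-?\d*\.?\d+)\s*,"
    r"\s*p\s*(?P<cmp>[<>=])\s*(?P<p>\d*\.\d+)"
)


def extract_t_tests(text):
    """Return one dict per APA-formatted t-test found in the text."""
    results = []
    for match in APA_T_TEST.finditer(text):
        results.append({
            "df": float(match.group("df")),
            "statistic": float(match.group("stat")),
            "comparison": match.group("cmp"),  # "=", "<" or ">"
            "p": float(match.group("p")),
        })
    return results


if __name__ == "__main__":
    sentence = "The effect was reliable, t(85) = 2.86, p = .005."
    print(extract_t_tests(sentence))
```
]]></preformat>
        <p>Results that deviate from the expected reporting format are silently skipped by a pattern of this kind, which is one reason why flagged articles need manual follow-up.</p>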
        <p>Research articles that the Fisher method flags as including data anomalies are inspected manually to determine whether there is indeed an anomaly. The <italic>statcheck</italic> procedure can produce false positives: an article may be flagged because results were extracted erroneously rather than because of an actual problem. The manual inspection allows us to determine whether articles were flagged correctly and, if not, why they were flagged nonetheless. This information can be used to improve the data extraction software in subproject 2. Once completed for all initially flagged research articles, this provides an initial prevalence estimate of how many of the ~30,000 inspected research articles contain data anomalies.</p>
        <p>In subproject 2 of Project 3, we develop software together with ContentMine to extract more information from articles. To this end, we use the ContentMine software <italic>ami</italic> (<ext-link ext-link-type="uri" xlink:href="http://github.com/contentmine/ami-plugin">github.com/contentmine/ami-plugin</ext-link>; <xref ref-type="bibr" rid="B3138503">Murray-Rust et al. 2014</xref>) as the primary infrastructure to extract information. The ContentMine team is contracted to build add-ons to this software that extract data from tables and figures, and to train the main applicant (CHJH) in developing so-called dictionaries, which specify the statistical information to be extracted. The main benefit of <italic>ami</italic> is that it is easily extended to search for additional statistical information. CHJH will extend the software to flexibly and extensively extract statistical results. This includes not only results of statistical tests, as <italic>statcheck</italic> extracts, but also statistical results such as Cronbach's alpha (a measure of scale reliability), means, and SDs. Moreover, in future projects (outside the scope of this proposal) <italic>ami</italic> can be extended to include Natural Language Processing (NLP), which can be applied to understand the structure of sentences in order to extract even more information from research papers.</p>
        <p>The open-source software developed with ContentMine will be applied to 60 empirical research articles and validated by comparing data extracted manually with data extracted automatically. We manually extract the data the software should extract and check whether the software also does so. In order to ensure cross-publisher applicability of the software, we investigate the validity for five publishers, who publish the majority of the social science literature (Elsevier, Wiley, Taylor &amp; Francis, Sage, Springer; <xref ref-type="bibr" rid="B3138721">Larivière et al. 2015</xref>). These publishers have various ways of formatting tables and figures, which affects whether the software can properly extract the data. In order to randomly sample 10 articles per publisher, a list of all social and medical science articles for these publishers is collected automatically from their respective websites (CHJH has previously developed software to this end; <ext-link ext-link-type="uri" xlink:href="http://github.com/chartgerink/journal-spiders">github.com/chartgerink/journal-spiders</ext-link>). Sampled articles must have been published in or after 2010 and contain at least a methods and a results section (which makes it plausible that they are empirical research articles).</p>
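        <p>The sampling step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure's actual code: the article records and their keys (publisher, year, has_methods, has_results), the function name sample_per_publisher, and the seed value are all hypothetical.</p>
        <preformat><![CDATA[
```python
import random


def sample_per_publisher(articles, per_publisher=10, min_year=2010, seed=2016):
    """Draw a random sample of eligible articles for each publisher.

    articles: iterable of dicts with the hypothetical keys
    "publisher", "year", "has_methods" and "has_results".
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    by_publisher = {}
    for article in articles:
        eligible = (
            article["year"] >= min_year
            and article["has_methods"]
            and article["has_results"]
        )
        if eligible:
            by_publisher.setdefault(article["publisher"], []).append(article)

    sample = {}
    for publisher, pool in sorted(by_publisher.items()):
        if len(pool) < per_publisher:
            raise ValueError(
                f"only {len(pool)} eligible articles for {publisher}"
            )
        sample[publisher] = rng.sample(pool, per_publisher)  # without replacement
    return sample
```
]]></preformat>
        <p>Filtering on eligibility before drawing, rather than after, keeps the number of sampled articles per publisher fixed.</p>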
      </sec>
      <sec sec-type="Evaluation project 3">
        <title>Evaluation project 3</title>
        <p>In subproject 1, we flag research articles as potentially problematic based on extracted <italic>p</italic>-values. To this end, we use the Fisher method and adjust it to investigate whether the <italic>p</italic>-value distribution is left-skew, instead of the theoretically expected uniform or right-skew distribution. This adjusted Fisher method is computed as</p>
        <p>
          <tex-math id="M2">\documentclass[12pt]{standalone}
\usepackage{varwidth}

\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}

\usepackage{amsmath, amssymb, graphics, setspace}
\newcommand{\mathsym}[1]{{}}
\newcommand{\unicode}[1]{{}}
\newcounter{mathematicapage}
\begin{document}
   \begin{varwidth}{50in}
        \begin{equation*}
            \chi^2_{2k}=-2\sum\limits^k_{i=1}\ln\left(1-\frac{p_i-t}{1-t}\right)
        \end{equation*}
    \end{varwidth}
\end{document}
</tex-math>
        </p>
        <p>where <italic>t</italic> is the lower bound (i.e., threshold) for the <italic>k</italic> <italic>p</italic>-values taken into account. This method is applied to the <italic>p</italic>-values available for each article and results in a χ<sup>2</sup> value with an accompanying <italic>p</italic>-value, which tests the null hypothesis that there is no indication for left-skew anomalies in the <italic>p</italic>-values. For example, if only nonsignificant values are taken into account (i.e., <italic>t</italic> = .05) and the <italic>p</italic>-values from one paper are {.99, .8, .01, .03, .87}, three values exceed the threshold (<italic>k</italic> = 3) and there is evidence for a left-skew anomaly in the <italic>p</italic>-values, χ<sup>2</sup>(6) = 16.20, <italic>p</italic> = .013. We are currently in the process of validating this method in a study similar to Project 1.</p>
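        <p>The worked example can be reproduced with a short Python sketch of the adjusted Fisher method. The function name adjusted_fisher is an assumption, and the sketch uses only the standard library: because the degrees of freedom (2<italic>k</italic>) are always even, the upper tail of the χ<sup>2</sup> distribution has a closed form and no statistics package is needed.</p>
        <preformat><![CDATA[
```python
import math


def adjusted_fisher(p_values, t=0.05):
    """Adjusted Fisher test for left-skew among the p-values above t.

    Returns (chi2, df, p), where
    chi2 = -2 * sum(ln(1 - (p_i - t) / (1 - t))) over the k p-values > t.
    A p-value of exactly 1 is not handled (the logarithm is undefined).
    """
    selected = [p for p in p_values if p > t]
    k = len(selected)
    if k == 0:
        raise ValueError("no p-values above the threshold")
    chi2 = -2.0 * sum(math.log(1.0 - (p - t) / (1.0 - t)) for p in selected)
    # For even df = 2k the chi-square upper tail is
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = chi2 / 2.0
    p_combined = math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(k)
    )
    return chi2, 2 * k, p_combined


if __name__ == "__main__":
    chi2, df, p = adjusted_fisher([0.99, 0.8, 0.01, 0.03, 0.87], t=0.05)
    print(f"chi2({df}) = {chi2:.2f}, p = {p:.3f}")  # chi2(6) = 16.20, p = 0.013
```
]]></preformat>
        <p>Only the three <italic>p</italic>-values above the threshold enter the sum; the two significant values (.01 and .03) are ignored.</p>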
        <p>In subproject 2, we validate the newly developed software by manually extracting information from 60 research articles and comparing it to the information extracted automatically. Statistical information that the software is supposed to extract from these 60 research articles (e.g., means, SDs) will be manually coded. Subsequently, we apply the new software to extract information and assess to what degree the automatically extracted results correspond to the manually extracted results. For scatterplots, manual extraction of the underlying data is hardly possible, so there may be cases where the automated procedures extract <italic>more</italic> information than the manual data extraction.</p>
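        <p>One way to quantify this correspondence is sketched below in Python. It is an illustration under assumptions rather than the planned evaluation code: the record layout (article identifier, statistic name, reported value as a string) and the function name compare_extractions are hypothetical, and exact matching of reported values is a simplification.</p>
        <preformat><![CDATA[
```python
def compare_extractions(manual, automated):
    """Compare manually coded records with automatically extracted ones.

    Both arguments are collections of hashable records, for example
    (article_id, statistic_name, value_as_reported) tuples. Values are
    kept as strings so that "3.20" is compared exactly as reported.
    """
    manual_set, automated_set = set(manual), set(automated)
    matched = manual_set & automated_set
    return {
        "matched": len(matched),
        "missed_by_software": len(manual_set - automated_set),
        "extra_from_software": len(automated_set - manual_set),
        # Share of manually coded records that the software also found.
        "recall": len(matched) / len(manual_set) if manual_set else float("nan"),
        # Share of software records that were confirmed by manual coding.
        "precision": (
            len(matched) / len(automated_set) if automated_set else float("nan")
        ),
    }
```
]]></preformat>
        <p>Records counted under extra_from_software are not necessarily errors: for scatterplots in particular, the software may recover data that could not be coded by hand.</p>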
      </sec>
      <sec sec-type="Outcomes project 3">
        <title>Outcomes project 3</title>
        <p>
          <list list-type="order"><list-item><p>Dataset on research papers automatically flagged with available software, including whether there was reason to believe it flagged erroneously upon manual inspection.</p></list-item><list-item><p>Newly developed open-source software to extract statistical information from empirical research articles (together with ContentMine).</p></list-item><list-item><p>Dataset of manually extracted statistical information and automatically extracted statistical information (extracted with new software) for 60 research articles.</p></list-item><list-item><p>Manuscript on automated detection of data anomalies, potentially due to data fabrication.</p></list-item></list>
        </p>
      </sec>
    </sec>
    <sec sec-type="Responsible conduct of research plan">
      <title>Responsible conduct of research plan</title>
      <p>To ensure the integrity of the proposed research, we cover (i) ethical considerations, (ii) openness of research materials, and (iii) reproducibility of research results. First, Project 1 is scrutinized by the Tilburg University Psychological Ethical Testing Committee before data collection commences. Second, all research files will be publicly available (data from Project 1 will be permanently anonymized). Third, reproducibility is promoted with dynamic manuscripts (i.e., with the <italic>knitr</italic> package; <xref ref-type="bibr" rid="B3139427">Xie 2014</xref>) and all analyses are double-checked (i.e., co-piloted; <xref ref-type="bibr" rid="B3139816">Wicherts 2011</xref>, <xref ref-type="bibr" rid="B3139733">Veldkamp et al. 2014</xref>).</p>
      <p>To ensure that all personnel are familiar with ethical guidelines, ethical approval, and research protocols, these are (re)distributed and (re)discussed at the start of each research project. CHJH provides several additional training days for the student-assistant, training him/her in essential responsible research skills (e.g., reproducibility, documenting decisions) and providing a theoretical framework for considering ethical issues that are not covered by protocols. This ensures that the student-assistant is familiar with the procedures and can work independently in an open, reproducible fashion.</p>
    </sec>
    <sec sec-type="Dissemination">
      <title>Dissemination</title>
      <p>We disseminate results on Twitter, at conferences, and in Open Access publications. On Twitter, three science-related accounts have agreed to disseminate results, together reaching approximately 50,000 followers (<ext-link ext-link-type="uri" xlink:href="http://twitter.com/openscience">@openscience</ext-link>, 43,000 followers; <ext-link ext-link-type="uri" xlink:href="http://twitter.com/onscience">@onscience</ext-link>, 650 followers; <ext-link ext-link-type="uri" xlink:href="http://twitter.com/osframework">@OSFramework</ext-link>, 4,500 followers). Part of the results of the projects will be presented at the World Conference on Research Integrity 2017 (Amsterdam) and the 2017 Association for Psychological Science (APS) convention in Boston. Manuscripts will be made available upon completion as preprints and submitted to Open Access journals, which results in more downloads per paper and more citations (<xref ref-type="bibr" rid="B3140386">Davis 2011</xref>).</p>
    </sec>
    <sec sec-type="Project management">
      <title>Project management</title>
      <p>The principal investigator, CHJH, carries day-to-day responsibility for the project (see Table <xref ref-type="table" rid="T3137995">2</xref> for timeline). Marcel van Assen (MvA) and Jelte Wicherts (JMW) provide supervision; both have strong expertise in (advising on) research misconduct cases. For instance, JMW took part in an ad hoc committee on research integrity investigating claims made against Nyborg (<xref ref-type="bibr" rid="B3140772">Vernon 2015</xref>), and MvA was the statistical advisor on one of the committees investigating Stapel’s data fabrication (<xref ref-type="bibr" rid="B3138172">Levelt Committee et al. 2012</xref>). CHJH has previously detected potential data fabrication and is meticulous in his research. The precision and understanding required to bring this project to completion are in place, and his ideal of opening up the entire scientific process gives him a large sense of responsibility. His doctoral project was lauded as most promising at the World Conference on Research Integrity (2015). The research materials themselves are managed continuously via the Open Science Framework, which provides both an online backup and a logbook of changes to research files.</p>
    </sec>
  </body>
  <back>
    <sec sec-type="Funding program">
      <title>Funding program</title>
      <p>This grant proposal has been submitted for the Phase I grant (IR-ORI-16-001) by the Office of Research Integrity. The only addition to the original grant proposal is the reference to Table 2 in the "Project Management" section.</p>
    </sec>
    <ref-list>
      <title>References</title>
      <ref id="B3138236">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Benford</surname>
              <given-names>F.</given-names>
            </name>
          </person-group>
          <year>1938</year>
          <article-title>The Law of Anomalous Numbers</article-title>
          <source>Proceedings of the American Philosophical Society</source>
          <volume>78</volume>
          <issue>4</issue>
          <fpage>551</fpage>
          <lpage>572</lpage>
        </element-citation>
      </ref>
      <ref id="B3138211">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Bornmann</surname>
              <given-names>Lutz</given-names>
            </name>
            <name name-style="western">
              <surname>Nast</surname>
              <given-names>Irina</given-names>
            </name>
            <name name-style="western">
              <surname>Daniel</surname>
              <given-names>Hans-Dieter</given-names>
            </name>
          </person-group>
          <year>2008</year>
          <article-title>Do editors and referees look for signs of scientific misconduct when reviewing manuscripts? A quantitative content analysis of studies that examined review criteria and reasons for accepting and rejecting manuscripts for publication</article-title>
          <source>Scientometrics</source>
          <volume>77</volume>
          <issue>3</issue>
          <fpage>415</fpage>
          <lpage>432</lpage>
          <uri>https://doi.org/10.1007/s11192-007-1950-2</uri>
          <pub-id pub-id-type="doi">10.1007/s11192-007-1950-2</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138270">
        <element-citation publication-type="conference-preoceeding">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Burns</surname>
              <given-names>Bruce D</given-names>
            </name>
          </person-group>
          <year>2009</year>
          <source>Sensitivity to statistical regularities: People (largely) follow Benford’s law</source>
          <conf-name>Proc. Thirty-First Annual Conference of the Cognitive Science Society, Cognitive Science Society</conf-name>
          <conf-loc>Austin, TX</conf-loc>
          <size units="page">2872-2877</size>
        </element-citation>
      </ref>
      <ref id="B3138334">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Buyse</surname>
              <given-names>Marc</given-names>
            </name>
            <name name-style="western">
              <surname>George</surname>
              <given-names>Stephen L.</given-names>
            </name>
            <name name-style="western">
              <surname>Evans</surname>
              <given-names>Stephen</given-names>
            </name>
            <name name-style="western">
              <surname>Geller</surname>
              <given-names>Nancy L.</given-names>
            </name>
            <name name-style="western">
              <surname>Ranstam</surname>
              <given-names>Jonas</given-names>
            </name>
            <name name-style="western">
              <surname>Scherrer</surname>
              <given-names>Bruno</given-names>
            </name>
            <name name-style="western">
              <surname>Lesaffre</surname>
              <given-names>Emmanuel</given-names>
            </name>
            <name name-style="western">
              <surname>Murray</surname>
              <given-names>Gordon</given-names>
            </name>
            <name name-style="western">
              <surname>Edler</surname>
              <given-names>Lutz</given-names>
            </name>
            <name name-style="western">
              <surname>Hutton</surname>
              <given-names>Jane</given-names>
            </name>
            <name name-style="western">
              <surname>Colton</surname>
              <given-names>Theodore</given-names>
            </name>
            <name name-style="western">
              <surname>Lachenbruch</surname>
              <given-names>Peter</given-names>
            </name>
            <name name-style="western">
              <surname>Verma</surname>
              <given-names>Babu L.</given-names>
            </name>
          </person-group>
          <year>1999</year>
          <article-title>The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials</article-title>
          <source>Statistics in Medicine</source>
          <volume>18</volume>
          <issue>24</issue>
          <fpage>3435</fpage>
          <lpage>3451</lpage>
          <uri>https://doi.org/10.1002/(sici)1097-0258(19991230)18:243.0.co;2-o</uri>
          <pub-id pub-id-type="doi">10.1002/(sici)1097-0258(19991230)18:243.0.co;2-o</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138152">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Carlisle</surname>
              <given-names>J. B.</given-names>
            </name>
          </person-group>
          <year>2012</year>
          <article-title>The analysis of 168 randomised controlled trials to test data integrity</article-title>
          <source>Anaesthesia</source>
          <volume>67</volume>
          <issue>5</issue>
          <fpage>521</fpage>
          <lpage>537</lpage>
          <uri>https://doi.org/10.1111/j.1365-2044.2012.07128.x</uri>
          <pub-id pub-id-type="doi">10.1111/j.1365-2044.2012.07128.x</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138470">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Carlisle</surname>
              <given-names>J. B.</given-names>
            </name>
            <name name-style="western">
              <surname>Dexter</surname>
              <given-names>F.</given-names>
            </name>
            <name name-style="western">
              <surname>Pandit</surname>
              <given-names>J. J.</given-names>
            </name>
            <name name-style="western">
              <surname>Shafer</surname>
              <given-names>S. L.</given-names>
            </name>
            <name name-style="western">
              <surname>Yentis</surname>
              <given-names>S. M.</given-names>
            </name>
          </person-group>
          <year>2015</year>
          <article-title>Calculating the probability of random sampling for continuous variables in submitted or published randomised controlled trials</article-title>
          <source>Anaesthesia</source>
          <volume>70</volume>
          <issue>7</issue>
          <fpage>848</fpage>
          <lpage>858</lpage>
          <uri>https://doi.org/10.1111/anae.13126</uri>
          <pub-id pub-id-type="doi">10.1111/anae.13126</pub-id>
        </element-citation>
      </ref>
      <ref id="B3140386">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Davis</surname>
              <given-names>P. M.</given-names>
            </name>
          </person-group>
          <year>2011</year>
          <article-title>Open access, readership, citations: a randomized controlled trial of scientific journal publishing</article-title>
          <source>The FASEB Journal</source>
          <volume>25</volume>
          <issue>7</issue>
          <fpage>2129</fpage>
          <lpage>2134</lpage>
          <uri>https://doi.org/10.1096/fj.11-183988</uri>
          <pub-id pub-id-type="doi">10.1096/fj.11-183988</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138304">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Deckert</surname>
              <given-names>J.</given-names>
            </name>
            <name name-style="western">
              <surname>Myagkov</surname>
              <given-names>M.</given-names>
            </name>
            <name name-style="western">
              <surname>Ordeshook</surname>
              <given-names>P. C.</given-names>
            </name>
          </person-group>
          <year>2011</year>
          <article-title>Benford's Law and the Detection of Election Fraud</article-title>
          <source>Political Analysis</source>
          <volume>19</volume>
          <issue>3</issue>
          <fpage>245</fpage>
          <lpage>268</lpage>
          <uri>https://doi.org/10.1093/pan/mpr014</uri>
          <pub-id pub-id-type="doi">10.1093/pan/mpr014</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138314">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Diekmann</surname>
              <given-names>Andreas</given-names>
            </name>
          </person-group>
          <year>2007</year>
          <article-title>Not the First Digit! Using Benford's Law to Detect Fraudulent Scientific Data</article-title>
          <source>Journal of Applied Statistics</source>
          <volume>34</volume>
          <issue>3</issue>
          <fpage>321</fpage>
          <lpage>329</lpage>
          <uri>https://doi.org/10.1080/02664760601004940</uri>
          <pub-id pub-id-type="doi">10.1080/02664760601004940</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138364">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Ebersole</surname>
              <given-names>Charles</given-names>
            </name>
            <name name-style="western">
              <surname>Atherton</surname>
              <given-names>Olivia</given-names>
            </name>
            <name name-style="western">
              <surname>Belanger</surname>
              <given-names>Aimee</given-names>
            </name>
            <name name-style="western">
              <surname>Skulborstad</surname>
              <given-names>Hayley</given-names>
            </name>
            <name name-style="western">
              <surname>Adams</surname>
              <given-names>Reginald</given-names>
            </name>
            <name name-style="western">
              <surname>Allen</surname>
              <given-names>Jill</given-names>
            </name>
            <name name-style="western">
              <surname>Banks</surname>
              <given-names>Jonathan</given-names>
            </name>
            <name name-style="western">
              <surname>Baranski</surname>
              <given-names>Erica</given-names>
            </name>
            <name name-style="western">
              <surname>Bernstein</surname>
              <given-names>Michael</given-names>
            </name>
            <name name-style="western">
              <surname>Bonfiglio</surname>
              <given-names>Diane</given-names>
            </name>
            <name name-style="western">
              <surname>Boucher</surname>
              <given-names>Leanne</given-names>
            </name>
            <name name-style="western">
              <surname>Brown</surname>
              <given-names>Elizabeth</given-names>
            </name>
            <name name-style="western">
              <surname>Budiman</surname>
              <given-names>Nancy</given-names>
            </name>
            <name name-style="western">
              <surname>Cairo</surname>
              <given-names>Athena</given-names>
            </name>
            <name name-style="western">
              <surname>Capaldi</surname>
              <given-names>Colin</given-names>
            </name>
            <name name-style="western">
              <surname>Chartier</surname>
              <given-names>Christopher</given-names>
            </name>
            <name name-style="western">
              <surname>Cicero</surname>
              <given-names>David</given-names>
            </name>
            <name name-style="western">
              <surname>Coleman</surname>
              <given-names>Jennifer</given-names>
            </name>
            <name name-style="western">
              <surname>Conway</surname>
              <given-names>John</given-names>
            </name>
            <name name-style="western">
              <surname>Davis</surname>
              <given-names>William</given-names>
            </name>
            <name name-style="western">
              <surname>Devos</surname>
              <given-names>Thierry</given-names>
            </name>
            <name name-style="western">
              <surname>Dopko</surname>
              <given-names>Raelyne</given-names>
            </name>
            <name name-style="western">
              <surname>Grahe</surname>
              <given-names>Jon</given-names>
            </name>
            <name name-style="western">
              <surname>German</surname>
              <given-names>Komi</given-names>
            </name>
            <name name-style="western">
              <surname>Hicks</surname>
              <given-names>Joshua</given-names>
            </name>
            <name name-style="western">
              <surname>Hermann</surname>
              <given-names>Anthony</given-names>
            </name>
            <name name-style="western">
              <surname>Humphrey</surname>
              <given-names>Brandon</given-names>
            </name>
            <name name-style="western">
              <surname>Johnson</surname>
              <given-names>David</given-names>
            </name>
            <name name-style="western">
              <surname>Joy-Gaba</surname>
              <given-names>Jennifer</given-names>
            </name>
            <name name-style="western">
              <surname>Juzeler</surname>
              <given-names>Hannah</given-names>
            </name>
            <name name-style="western">
              <surname>Klein</surname>
              <given-names>Richard</given-names>
            </name>
            <name name-style="western">
              <surname>Lucas</surname>
              <given-names>Richard</given-names>
            </name>
            <name name-style="western">
              <surname>Lustgraaf</surname>
              <given-names>Christopher</given-names>
            </name>
            <name name-style="western">
              <surname>Menon</surname>
              <given-names>Madhavi</given-names>
            </name>
            <name name-style="western">
              <surname>Metzger</surname>
              <given-names>Mitchell</given-names>
            </name>
            <name name-style="western">
              <surname>Moloney</surname>
              <given-names>Jaclyn</given-names>
            </name>
            <name name-style="western">
              <surname>Morse</surname>
              <given-names>Patrick</given-names>
            </name>
            <name name-style="western">
              <surname>Nelson</surname>
              <given-names>Anthony</given-names>
            </name>
            <name name-style="western">
              <surname>Prislin</surname>
              <given-names>Radmila</given-names>
            </name>
            <name name-style="western">
              <surname>Razza</surname>
              <given-names>Timothy</given-names>
            </name>
            <name name-style="western">
              <surname>Re</surname>
              <given-names>Daniel</given-names>
            </name>
            <name name-style="western">
              <surname>Rule</surname>
              <given-names>Nicholas</given-names>
            </name>
            <name name-style="western">
              <surname>Sacco</surname>
              <given-names>Donald</given-names>
            </name>
            <name name-style="western">
              <surname>Sauerberger</surname>
              <given-names>Kyle</given-names>
            </name>
            <name name-style="western">
              <surname>Shultz</surname>
              <given-names>Megan</given-names>
            </name>
            <name name-style="western">
              <surname>Smith</surname>
              <given-names>Jessi</given-names>
            </name>
            <name name-style="western">
              <surname>Sobocko</surname>
              <given-names>Karin</given-names>
            </name>
            <name name-style="western">
              <surname>Steiner</surname>
              <given-names>Troy</given-names>
            </name>
            <name name-style="western">
              <surname>Sternglanz</surname>
              <given-names>R. Weylin</given-names>
            </name>
            <name name-style="western">
              <surname>Tskhay</surname>
              <given-names>Konstantin</given-names>
            </name>
            <name name-style="western">
              <surname>Vaughn</surname>
              <given-names>Leigh</given-names>
            </name>
            <name name-style="western">
              <surname>van Allen</surname>
              <given-names>Zack</given-names>
            </name>
            <name name-style="western">
              <surname>Walker</surname>
              <given-names>Ryan</given-names>
            </name>
            <name name-style="western">
              <surname>Wilson</surname>
              <given-names>John</given-names>
            </name>
            <name name-style="western">
              <surname>Wirth</surname>
              <given-names>James</given-names>
            </name>
            <name name-style="western">
              <surname>Wortman</surname>
              <given-names>Jessica</given-names>
            </name>
            <name name-style="western">
              <surname>Zelenski</surname>
              <given-names>John</given-names>
            </name>
            <name name-style="western">
              <surname>Nosek</surname>
              <given-names>Brian</given-names>
            </name>
          </person-group>
          <year>2016</year>
          <article-title>Many Labs 3: Evaluating participant pool quality across the academic semester via replication</article-title>
          <source>Journal of Experimental Social Psychology</source>
        </element-citation>
      </ref>
      <ref id="B3138142">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Fanelli</surname>
              <given-names>Daniele</given-names>
            </name>
          </person-group>
          <year>2009</year>
          <article-title>How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data</article-title>
          <source>PLOS ONE</source>
          <volume>4</volume>
          <issue>5</issue>
          <fpage>e5738</fpage>
          <uri>https://doi.org/10.1371/journal.pone.0005738</uri>
          <pub-id pub-id-type="doi">10.1371/journal.pone.0005738</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138438">
        <element-citation publication-type="book">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Fisher</surname>
              <given-names>R. A.</given-names>
            </name>
          </person-group>
          <year>1925</year>
          <source>Statistical Methods for Research Workers</source>
          <publisher-name>Oliver and Boyd</publisher-name>
          <publisher-loc>Edinburgh, United Kingdom</publisher-loc>
        </element-citation>
      </ref>
      <ref id="B3138721">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Larivière</surname>
              <given-names>Vincent</given-names>
            </name>
            <name name-style="western">
              <surname>Haustein</surname>
              <given-names>Stefanie</given-names>
            </name>
            <name name-style="western">
              <surname>Mongeon</surname>
              <given-names>Philippe</given-names>
            </name>
          </person-group>
          <year>2015</year>
          <article-title>The Oligopoly of Academic Publishers in the Digital Era</article-title>
          <source>PLOS ONE</source>
          <volume>10</volume>
          <issue>6</issue>
          <fpage>e0127502</fpage>
          <uri>https://doi.org/10.1371/journal.pone.0127502</uri>
          <pub-id pub-id-type="doi">10.1371/journal.pone.0127502</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138172">
        <element-citation publication-type="website">
          <person-group person-group-type="author">
            <collab>Levelt Committee</collab>
            <collab>Drenth Committee</collab>
            <collab>Noort Committee</collab>
          </person-group>
          <article-title>Flawed science: The fraudulent research practices of social psychologist Diederik Stapel</article-title>
          <uri>https://www.commissielevelt.nl/</uri>
        </element-citation>
      </ref>
      <ref id="B3138481">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Miller</surname>
              <given-names>D. R.</given-names>
            </name>
          </person-group>
          <year>2015</year>
          <article-title>Probability screening in manuscripts submitted to biomedical journals - an effective tool or a statistical quagmire?</article-title>
          <source>Anaesthesia</source>
          <volume>70</volume>
          <issue>7</issue>
          <fpage>765</fpage>
          <lpage>768</lpage>
          <uri>https://doi.org/10.1111/anae.13165</uri>
          <pub-id pub-id-type="doi">10.1111/anae.13165</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138324">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Mosimann</surname>
              <given-names>James E.</given-names>
            </name>
            <name name-style="western">
              <surname>Ratnaparkhi</surname>
              <given-names>Makarand V.</given-names>
            </name>
          </person-group>
          <year>1996</year>
          <article-title>Uniform occurrence of digits for folded and mixture distributions on finite intervals</article-title>
          <source>Communications in Statistics - Simulation and Computation</source>
          <volume>25</volume>
          <issue>2</issue>
          <fpage>481</fpage>
          <lpage>506</lpage>
          <uri>https://doi.org/10.1080/03610919608813325</uri>
          <pub-id pub-id-type="doi">10.1080/03610919608813325</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138201">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Mosimann</surname>
              <given-names>James</given-names>
            </name>
            <name name-style="western">
              <surname>Wiseman</surname>
              <given-names>Claire</given-names>
            </name>
            <name name-style="western">
              <surname>Edelman</surname>
              <given-names>Ruth</given-names>
            </name>
          </person-group>
          <year>1995</year>
          <article-title>Data fabrication: Can people generate random digits?</article-title>
          <source>Accountability in Research</source>
          <volume>4</volume>
          <issue>1</issue>
          <fpage>31</fpage>
          <lpage>55</lpage>
          <uri>https://doi.org/10.1080/08989629508573866</uri>
          <pub-id pub-id-type="doi">10.1080/08989629508573866</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138191">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Mosimann</surname>
              <given-names>James</given-names>
            </name>
            <name name-style="western">
              <surname>Dahlberg</surname>
              <given-names>John</given-names>
            </name>
            <name name-style="western">
              <surname>Davidian</surname>
              <given-names>Nancy</given-names>
            </name>
            <name name-style="western">
              <surname>Krueger</surname>
              <given-names>John</given-names>
            </name>
          </person-group>
          <year>2002</year>
          <article-title>Terminal Digits and the Examination of Questioned Data</article-title>
          <source>Accountability in Research</source>
          <volume>9</volume>
          <issue>2</issue>
          <fpage>75</fpage>
          <lpage>92</lpage>
          <uri>https://doi.org/10.1080/08989620212969</uri>
          <pub-id pub-id-type="doi">10.1080/08989620212969</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138503">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Murray-Rust</surname>
              <given-names>Peter</given-names>
            </name>
            <name name-style="western">
              <surname>Smith-Unna</surname>
              <given-names>Richard</given-names>
            </name>
            <name name-style="western">
              <surname>Mounce</surname>
              <given-names>Ross</given-names>
            </name>
          </person-group>
          <year>2014</year>
          <article-title>AMI-diagram: Mining Facts from Images</article-title>
          <source>D-Lib Magazine</source>
          <volume>20</volume>
          <uri>https://doi.org/10.1045/november14-murray-rust</uri>
          <pub-id pub-id-type="doi">10.1045/november14-murray-rust</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138221">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Nuijten</surname>
              <given-names>Michèle B.</given-names>
            </name>
            <name name-style="western">
              <surname>Hartgerink</surname>
              <given-names>Chris H. J.</given-names>
            </name>
            <name name-style="western">
              <surname>van Assen</surname>
              <given-names>Marcel A. L. M.</given-names>
            </name>
            <name name-style="western">
              <surname>Epskamp</surname>
              <given-names>Sacha</given-names>
            </name>
            <name name-style="western">
              <surname>Wicherts</surname>
              <given-names>Jelte M.</given-names>
            </name>
          </person-group>
          <year>2015</year>
          <article-title>The prevalence of statistical reporting errors in psychology (1985–2013)</article-title>
          <source>Behavior Research Methods</source>
          <uri>https://doi.org/10.3758/s13428-015-0664-2</uri>
          <pub-id pub-id-type="doi">10.3758/s13428-015-0664-2</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138461">
        <element-citation publication-type="book">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Rihoux</surname>
              <given-names>B.</given-names>
            </name>
            <name name-style="western">
              <surname>Ragin</surname>
              <given-names>C. C.</given-names>
            </name>
          </person-group>
          <year>2009</year>
          <source>Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and related techniques</source>
          <publisher-name>Sage</publisher-name>
          <publisher-loc>London, United Kingdom</publisher-loc>
        </element-citation>
      </ref>
      <ref id="B3138493">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Ruys</surname>
              <given-names>Kirsten I.</given-names>
            </name>
            <name name-style="western">
              <surname>Stapel</surname>
              <given-names>Diederik A.</given-names>
            </name>
          </person-group>
          <year>2008</year>
          <article-title>Emotion Elicitor or Emotion Messenger? Subliminal Priming Reveals Two Faces of Facial Expressions</article-title>
          <source>Psychological Science</source>
          <volume>19</volume>
          <issue>6</issue>
          <fpage>593</fpage>
          <lpage>600</lpage>
          <uri>https://doi.org/10.1111/j.1467-9280.2008.02128.x</uri>
          <pub-id pub-id-type="doi">10.1111/j.1467-9280.2008.02128.x</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138181">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Simonsohn</surname>
              <given-names>U.</given-names>
            </name>
          </person-group>
          <year>2013</year>
          <article-title>Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone</article-title>
          <source>Psychological Science</source>
          <volume>24</volume>
          <issue>10</issue>
          <fpage>1875</fpage>
          <lpage>1888</lpage>
          <uri>https://doi.org/10.1177/0956797613480366</uri>
          <pub-id pub-id-type="doi">10.1177/0956797613480366</pub-id>
        </element-citation>
      </ref>
      <ref id="B3138354">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Stroop</surname>
              <given-names>J. R.</given-names>
            </name>
          </person-group>
          <year>1935</year>
          <article-title>Studies of interference in serial verbal reactions</article-title>
          <source>Journal of Experimental Psychology</source>
          <volume>18</volume>
          <issue>6</issue>
          <fpage>643</fpage>
          <lpage>662</lpage>
          <uri>https://doi.org/10.1037/h0054651</uri>
          <pub-id pub-id-type="doi">10.1037/h0054651</pub-id>
        </element-citation>
      </ref>
      <ref id="B3139733">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Veldkamp</surname>
              <given-names>Coosje L. S.</given-names>
            </name>
            <name name-style="western">
              <surname>Nuijten</surname>
              <given-names>Michèle B.</given-names>
            </name>
            <name name-style="western">
              <surname>Dominguez-Alvarez</surname>
              <given-names>Linda</given-names>
            </name>
            <name name-style="western">
              <surname>van Assen</surname>
              <given-names>Marcel A. L. M.</given-names>
            </name>
            <name name-style="western">
              <surname>Wicherts</surname>
              <given-names>Jelte M.</given-names>
            </name>
          </person-group>
          <year>2014</year>
          <article-title>Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science</article-title>
          <source>PLOS ONE</source>
          <volume>9</volume>
          <issue>12</issue>
          <fpage>e114876</fpage>
          <uri>https://doi.org/10.1371/journal.pone.0114876</uri>
          <pub-id pub-id-type="doi">10.1371/journal.pone.0114876</pub-id>
        </element-citation>
      </ref>
      <ref id="B3140772">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Vernon</surname>
              <given-names>Tony</given-names>
            </name>
          </person-group>
          <year>2015</year>
          <article-title>Editor’s Note</article-title>
          <source>Personality and Individual Differences</source>
          <volume>78</volume>
          <fpage>100</fpage>
          <lpage>101</lpage>
          <uri>https://doi.org/10.1016/j.paid.2015.01.024</uri>
          <pub-id pub-id-type="doi">10.1016/j.paid.2015.01.024</pub-id>
        </element-citation>
      </ref>
      <ref id="B3139816">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Wicherts</surname>
              <given-names>Jelte M.</given-names>
            </name>
          </person-group>
          <year>2011</year>
          <article-title>Psychology must learn a lesson from fraud case</article-title>
          <source>Nature</source>
          <volume>480</volume>
          <issue>7375</issue>
          <fpage>7</fpage>
          <lpage>7</lpage>
          <uri>https://doi.org/10.1038/480007a</uri>
          <pub-id pub-id-type="doi">10.1038/480007a</pub-id>
        </element-citation>
      </ref>
      <ref id="B3139427">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Xie</surname>
              <given-names>Y.</given-names>
            </name>
          </person-group>
          <year>2014</year>
          <article-title>Dynamic documents with R and knitr</article-title>
          <source>Journal of Statistical Software</source>
          <volume>56</volume>
        </element-citation>
      </ref>
      <ref id="B3138447">
        <element-citation publication-type="chapter">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Yamasaki</surname>
              <given-names>S.</given-names>
            </name>
            <name name-style="western">
              <surname>Rihoux</surname>
              <given-names>B.</given-names>
            </name>
          </person-group>
          <year>2009</year>
          <chapter-title>A commented review of applications</chapter-title>
          <person-group person-group-type="editor">
            <name name-style="western">
              <surname>Rihoux</surname>
              <given-names>B.</given-names>
            </name>
            <name name-style="western">
              <surname>Ragin</surname>
              <given-names>C. C.</given-names>
            </name>
          </person-group>
          <source>Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and related techniques</source>
          <publisher-name>Sage</publisher-name>
          <publisher-loc>London, United Kingdom</publisher-loc>
        </element-citation>
      </ref>
    </ref-list>
  </back>
  <floats-group>
    <fig id="F3138092" position="float" orientation="portrait">
      <object-id>10.3897/rio.2.e8860.figure1</object-id>
      <label>Figure 1.</label>
      <caption>
        <p>Statistical methods applied to test for data fabrication in Project 1, showing which of them are combined into an overall test for data fabrication using the Fisher method. Benford’s law is excluded from the overall test because it is expected to lack utility.</p>
      </caption>
      <graphic xlink:href="rio-02-e8860-g001.png" position="float" id="oo_85073.png" orientation="portrait" xlink:type="simple"/>
    </fig>
    <fig id="F3138090" position="float" orientation="portrait">
      <object-id>10.3897/rio.2.e8860.figure2</object-id>
      <label>Figure 2.</label>
      <caption>
        <p>Scatterplot with its accompanying correlation value. The raw data for variables X and Y are encoded in the individual points and can be extracted. Statistical methods such as terminal digit analysis can then be applied to these raw data to detect anomalies.</p>
      </caption>
      <graphic xlink:href="rio-02-e8860-g002.png" position="float" id="oo_85072.png" orientation="portrait" xlink:type="simple"/>
    </fig>
    <fig id="F3138491" position="float" orientation="portrait">
      <object-id>10.3897/rio.2.e8860.figure3</object-id>
      <label>Figure 3.</label>
      <caption>
        <p>Data table from <xref ref-type="bibr" rid="B3138493">Ruys and Stapel (2008)</xref>, a paper retracted due to data fabrication. The table contains 15 duplicate values (highlighted) among 32 cells, a serious data anomaly that could have been detected by, for example, automated screening procedures.</p>
      </caption>
      <graphic xlink:href="rio-02-e8860-g003.jpg" position="float" id="oo_85087.jpg" orientation="portrait" xlink:type="simple"/>
    </fig>
    <table-wrap id="T3138096" position="float" orientation="portrait">
      <label>Table 1.</label>
      <caption>
        <p>Example of qualitative comparative analysis. The first three columns indicate data fabrication characteristics (1 = present, 0 = absent); the last column indicates whether the fabrication was detected. Respondents marked * have identical combinations of characteristics, so three unique combinations are present. Responses that include copy-pasting are detected as fabricated, whereas the response that uses multivariate simulation is not. Based on these qualitative data, copy-pasting is a sufficient condition for detecting data fabrication.</p>
      </caption>
      <table rules="all" border="1" style="width:500px">
        <tbody>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Copy-paste</td>
            <td rowspan="1" colspan="1">Univariate simulation</td>
            <td rowspan="1" colspan="1">Multivariate simulation</td>
            <td rowspan="1" colspan="1">Detected</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Resp. 1*</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">Yes</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Resp. 2</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">No</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Resp. 3</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">Yes</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Resp. 4*</td>
            <td rowspan="1" colspan="1">1</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">0</td>
            <td rowspan="1" colspan="1">Yes</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
    <table-wrap id="T3137995" position="float" orientation="portrait">
      <label>Table 2.</label>
      <caption>
        <p>Timeline of the proposed projects</p>
      </caption>
      <table rules="all" border="1">
        <tbody>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">
              <italic>What</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>9</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>10</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>11</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>12</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>1</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>2</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>3</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>4</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>5</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>6</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>7</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>8</italic>
            </td>
            <td rowspan="1" colspan="1">
              <italic>Lead</italic>
            </td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Hire research assistant</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Train research assistant</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Project 1</td>
            <td rowspan="1" colspan="1">Ethical approval</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Study setup</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Invite researchers</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Conduct study</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Data analysis</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Write paper</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Programming R package</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Project 2</td>
            <td rowspan="1" colspan="1">Transcribe interviews</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Code interviews</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Qualitative analysis</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Write paper</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Project 3</td>
            <td rowspan="1" colspan="1">Apply Fisher method</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Manually check papers flagged by Fisher method</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Develop software to extract statistical information</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">ContentMine</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Collect 60 articles</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Automatically extract data from collected articles</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Manually extract data from collected articles</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Compare manually and automatically extracted data</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1"/>
            <td rowspan="1" colspan="1">Write paper</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">✓</td>
            <td rowspan="1" colspan="1">CHJH</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
  </floats-group>
</article>
