<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN" "https://riojournal.com/nlm/tax-treatment-NS0.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tp="http://www.plazi.org/taxpub" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">17</journal-id>
      <journal-id journal-id-type="index">urn:lsid:arphahub.com:pub:8E638694-B4E0-570A-856A-746FF325BF6B</journal-id>
      <journal-id journal-id-type="aggregator">urn:lsid:zoobank.org:pub:FEF66878-15EE-4F8B-B369-7652D735020E</journal-id>
      <journal-title-group>
        <journal-title xml:lang="en">Research Ideas and Outcomes</journal-title>
        <abbrev-journal-title xml:lang="en">RIO</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="epub">2367-7163</issn>
      <publisher>
        <publisher-name>Pensoft Publishers</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3897/rio.12.e176590</article-id>
      <article-id pub-id-type="publisher-id">176590</article-id>
      <article-id pub-id-type="manuscript">29454</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Methods</subject>
        </subj-group>
        <subj-group subj-group-type="scientific_subject">
          <subject>Artificial intelligence</subject>
          <subject>Bioinformatics</subject>
          <subject>Computational biology</subject>
          <subject>Data mining &amp; Machine learning</subject>
          <subject>Ecological informatics</subject>
          <subject>Microbiology &amp; Virology</subject>
          <subject>Mycology</subject>
          <subject>Soil science</subject>
        </subj-group>
        <subj-group subj-group-type="sdg">
          <subject>Industry, innovation &amp; infrastructure</subject>
          <subject>Life on land</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Automated extraction of fungal trophic modes from literature using BioBERT: an open pilot workflow</article-title>
      </title-group>
      <contrib-group content-type="authors">
        <contrib contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Bock</surname>
            <given-names>Beatrice Margareta</given-names>
          </name>
          <email xlink:type="simple">beabockm@gmail.com</email>
          <uri content-type="orcid">https://orcid.org/0000-0003-2240-9360</uri>
          <xref ref-type="aff" rid="A1">1</xref>
          <xref ref-type="aff" rid="A2">2</xref>
        </contrib>
      </contrib-group>
      <aff id="A1">
        <label>1</label>
        <addr-line content-type="verbatim">Department of Biological Sciences, Northern Arizona University, Flagstaff, United States of America</addr-line>
        <institution>Department of Biological Sciences, Northern Arizona University</institution>
        <addr-line content-type="city">Flagstaff</addr-line>
        <country>United States of America</country>
        <uri content-type="ror">https://ror.org/0272j5188</uri>
      </aff>
      <aff id="A2">
        <label>2</label>
        <addr-line content-type="verbatim">Center for Adaptable Western Landscapes, Northern Arizona University, Flagstaff, United States of America</addr-line>
        <institution>Center for Adaptable Western Landscapes, Northern Arizona University</institution>
        <addr-line content-type="city">Flagstaff</addr-line>
        <country>United States of America</country>
        <uri content-type="ror">https://ror.org/0272j5188</uri>
      </aff>
      <author-notes>
        <fn fn-type="corresp">
          <p>Corresponding author: Beatrice Margareta Bock (<email xlink:type="simple">beabockm@gmail.com</email>).</p>
        </fn>
        <fn fn-type="edited-by">
          <p>Academic editor: Editorial Secretary</p>
        </fn>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2026</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>28</day>
        <month>01</month>
        <year>2026</year>
      </pub-date>
      <volume>12</volume>
      <elocation-id>e176590</elocation-id>
      <uri content-type="arpha" xlink:href="http://openbiodiv.net/1A16F123-502A-5521-84D0-69CF39A59A46">1A16F123-502A-5521-84D0-69CF39A59A46</uri>
      <history>
        <date date-type="received">
          <day>30</day>
          <month>10</month>
          <year>2025</year>
        </date>
        <date date-type="accepted">
          <day>10</day>
          <month>01</month>
          <year>2026</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>Beatrice Margareta Bock</copyright-statement>
        <license license-type="creative-commons-attribution" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">
          <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
        </license>
      </permissions>
      <abstract>
        <label>Abstract</label>
        <p>Fungi exhibit diverse trophic strategies, ranging from obligate symbiosis to saprotrophy, with some taxa capable of occupying multiple ecological roles. Manually identifying trophic versatility from literature is time-consuming and difficult to scale. Here, we present a pilot workflow that automates the classification of fungal trophic modes using transformer-based language models. A curated dataset of 56 fungal ecology abstracts was manually labelled as dual (occupying multiple trophic modes) or solo (restricted to one mode) and used to fine-tune four models: BioBERT, BERT-base-cased, BERT-base-uncased and BiodivBERT. Stratified 5-fold cross-validation revealed that BioBERT and BERT-base-cased performed equally well (~ 89% accuracy, balanced precision and recall), highlighting the importance of case sensitivity in taxonomic text. BiodivBERT and uncased BERT models underperformed, indicating that domain adaptation alone is not sufficient. This pilot study emphasises reproducibility, transparency and open data integration, offering a generalisable proof-of-concept for linking literature-derived ecological information to existing fungal trait databases such as FUNGuild and FungalTraits. All code and data are openly available to support reuse and scaling to larger datasets.</p>
      </abstract>
      <kwd-group>
        <label>Keywords</label>
        <kwd>fungal ecology</kwd>
        <kwd>trophic modes</kwd>
        <kwd>natural language processing</kwd>
        <kwd>machine learning</kwd>
        <kwd>trait databases</kwd>
        <kwd>BioBERT</kwd>
        <kwd>saprotrophy-symbiosis continuum</kwd>
      </kwd-group>
      <counts>
        <fig-count count="4"/>
        <table-count count="1"/>
        <ref-count count="18"/>
      </counts>
    </article-meta>
  </front>
  <body>
    <sec sec-type="Introduction">
      <title>Introduction</title>
      <p>Fungi play essential roles in ecosystems as decomposers, pathogens and symbionts, and many taxa exhibit flexibility across these trophic modes (<xref ref-type="bibr" rid="B13584283">Berbee et al. 2017</xref>). Some species occupy multiple ecological strategies depending on host identity or environmental context, while others remain specialised to a single mode (<xref ref-type="bibr" rid="B13584313">Martin and Tan 2025</xref>). Understanding this trophic versatility is critical for biodiversity assessments, ecosystem modelling and agricultural applications where a fungus’ ecological role affects plant performance.</p>
      <p>Trait databases, such as FUNGuild (<xref ref-type="bibr" rid="B13584322">Nguyen et al. 2016</xref>) and FungalTraits (<xref ref-type="bibr" rid="B13597550">Põlme et al. 2021</xref>), have advanced fungal functional annotation at scale, but manual literature extraction remains a bottleneck. Human-curated classifications are time-intensive, subjective and limited by the accessibility of trait-relevant language in publications. Moreover, trait databases are valuable but can be limited in applicability, as annotations are often performed at the genus or family level despite often substantial interspecific variability (<xref ref-type="bibr" rid="B13584468">Violle et al. 2015</xref>).</p>
      <p>Natural language processing (NLP) offers a scalable way to extract trait-relevant information directly from text. Transformer-based models, such as BioBERT, pretrained on large biomedical corpora, excel at contextual understanding and have achieved state-of-the-art results in various text classification tasks (<xref ref-type="bibr" rid="B13584301">Lee et al. 2019</xref>). However, their use in fungal ecology and trait data integration remains largely unexplored.</p>
      <p>This pilot study tests the feasibility of fine-tuning BioBERT to classify fungal trophic modes from abstracts. The workflow is designed for transparency and future scaling, providing a reproducible pipeline that can complement or benchmark existing trait databases. By linking automated text classification with open trait resources, this work demonstrates a path towards more consistent, interoperable fungal functional data.</p>
      <sec sec-type="Related Work">
        <title>Related Work</title>
        <p>Recent advances in domain-specific NLP illustrate the potential for scaling and refining workflows like this one. BiodivBERT (<xref ref-type="bibr" rid="B13603935">Abdelmageed et al. 2023</xref>) represents the first pretrained language model tailored specifically for biodiversity research, achieving significant gains in named entity recognition and relation extraction. Its development from life-sciences corpora highlights how domain adaptation can substantially improve the precision and recall of ecological information retrieval. Similarly, <xref ref-type="bibr" rid="B13603850">Cornelius et al. (2025)</xref> demonstrated a machine-learning framework for extracting arthropod organismal traits, translating unstructured text into a machine-actionable database (ArTraDB). Their approach shows how targeted trait extraction can efficiently transform literature into structured ecological data.</p>
        <p>Parallel work in plant functional ecology supports the broader utility of transformer-based extraction for trait data. <xref ref-type="bibr" rid="B13603861">Domazetoski et al. (2025)</xref> developed a natural language pipeline that automatically identifies both categorical and numerical plant traits from unstructured descriptions with high precision and recall. This scalability across morphological, life history and functional trait types underscores the potential to adapt such methods to fungal traits, where similar data gaps persist.</p>
        <p>Beyond ecological trait extraction, innovations in model architecture and pretraining further extend applicability. BioT5 (<xref ref-type="bibr" rid="B13603873">Pei et al. 2023</xref>) introduced cross-modal pretraining to connect textual and molecular data using chemically informed representations, an approach that could eventually allow integration of genomic or metabolomic predictors into ecological models. ModernBERT (<xref ref-type="bibr" rid="B13603886">Warner et al. 2025</xref>) provides a computationally efficient encoder capable of handling long-context inputs, which is particularly relevant for large-scale biodiversity corpora. Together, these efforts suggest practical pathways to scale fungal trait text mining beyond small datasets, while maintaining interpretability and reproducibility. Foundational work by <xref ref-type="bibr" rid="B13603904">Gu et al. (2021)</xref> also reinforces the importance of domain-specific pretraining, demonstrating that models trained entirely within a target domain outperform general-domain models adapted later.</p>
      </sec>
    </sec>
    <sec sec-type="Materials and Methods">
      <title>Materials and Methods</title>
      <sec sec-type="Dataset Curation">
        <title>Dataset Curation</title>
        <p>Fig. <xref ref-type="fig" rid="F13800309">1</xref> shows the overall pipeline from data collection to evaluation. Fungal research abstracts were retrieved from the Web of Science Core Collection on 11 September 2025, using two Boolean search queries designed to capture fungi with distinct lifestyle classifications. For solo (single trophic mode) examples, we searched: '("obligate mycorrhizal" OR "strictly endophytic" OR "exclusive saprotroph") AND fungus' (119 results). For dual (multiple trophic mode) examples, we searched: '("dual lifestyle" OR "facultative lifestyle" OR "dual trophic mode" OR "lifestyle switching" OR "endophyte-saprotroph" OR "plant-associated saprotroph") AND fungi' (70 results).</p>
        <p>From these 189 candidate articles, abstracts were manually reviewed by a single labeller (BMB) and 56 were selected based on: (1) an unambiguous description of trophic mode in the abstract text; (2) English language; and (3) no duplicates between searches. Abstracts were labelled as:</p>
        <p>Dual: taxa reported to occupy more than one trophic mode (e.g. facultative pathogens that also decompose organic matter);</p>
        <p>Solo: taxa restricted to a single trophic mode (e.g. obligate symbionts or strict saprotrophs).</p>
        <p>Ambiguous abstracts without explicit trophic mode statements were excluded. The final dataset contained 56 abstracts, balanced between classes (28 dual, 28 solo), with abstract lengths ranging from 150 to 500 words (mean 360). A supplementary file listing the permissible trophic mode labels and their definitions is available in the repository (datasets/trophic_mode_labels.md).</p>
      </sec>
      <sec sec-type="Preprocessing and Model Training">
        <title>Preprocessing and Model Training</title>
        <p>Abstracts were cleaned and tokenised using model-specific tokenisers (maximum sequence length: 512 tokens). Token-length analysis revealed that three of the 56 abstracts (5.4%) exceeded this limit and were truncated; truncation affected only the end of each abstract, where classification-relevant context is typically less concentrated.</p>
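        <p>The truncation audit described above can be sketched as follows. This is a minimal, dependency-free illustration: in practice the token counts come from each model's own tokeniser rather than being supplied by hand.</p>

```python
# Sketch of the truncation audit. Token counts are assumed to be precomputed
# with the model-specific tokeniser; here they are passed in directly.
MAX_LEN = 512

def audit_truncation(token_counts, max_len=MAX_LEN):
    """Return (n_truncated, fraction) for abstracts exceeding max_len tokens."""
    over = [n for n in token_counts if n > max_len]
    return len(over), len(over) / len(token_counts)

# Example: 56 abstracts, 3 of which exceed the 512-token limit,
# matching the 5.4% reported in the text.
counts = [300] * 53 + [600, 700, 550]
n_over, frac = audit_truncation(counts)
```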
        <p>We compared four transformer-based language models to assess the impact of domain-specific pretraining and case sensitivity: (1) BERT-base-uncased ('google-bert/bert-base-uncased'; <xref ref-type="bibr" rid="B13800104">Devlin et al. (2018)</xref>), (2) BERT-base-cased ('google-bert/bert-base-cased'), (3) BioBERT v.1.1 ('monologg/biobert_v.1.1_pubmed', biomedical domain-adapted; <xref ref-type="bibr" rid="B13584301">Lee et al. (2019)</xref>) and (4) BiodivBERT ('NoYo25/BiodivBERT', biodiversity domain-adapted; <xref ref-type="bibr" rid="B13603935">Abdelmageed et al. (2023)</xref>). All models were fine-tuned for binary sequence classification using the Hugging Face Transformers library (v.4.40.0; <xref ref-type="bibr" rid="B13800266">Wolf (2020)</xref>) in Python 3.9 (<xref ref-type="bibr" rid="B13800281">Python Software Foundation 2020</xref>).</p>
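        <p>The four checkpoints listed above can be organised as a simple mapping for fine-tuning. This is a sketch only: the checkpoint identifiers are those stated in the text, and the loading call uses the standard Hugging Face sequence-classification API rather than the authors' exact training script.</p>

```python
# Checkpoint identifiers for the four compared models (as listed in the text).
MODEL_CHECKPOINTS = {
    "bert-uncased": "google-bert/bert-base-uncased",
    "bert-cased": "google-bert/bert-base-cased",
    "biobert": "monologg/biobert_v.1.1_pubmed",
    "biodivbert": "NoYo25/BiodivBERT",
}

def load_for_binary_classification(key):
    """Load tokeniser and model for binary (dual vs. solo) classification."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    name = MODEL_CHECKPOINTS[key]
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
    return tokenizer, model
```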
        <p>To maximise statistical robustness with the small dataset, we employed stratified 5-fold cross-validation on all 56 abstracts rather than a single train-test split. This approach ensures that each abstract is used once for validation while maintaining class balance across folds (seed = 42 for reproducibility). Models were trained using identical hyperparameters: learning rate = 5 × 10⁻⁵, batch size = 8, maximum epochs = 20 with early stopping (patience = 3 epochs), dropout = 0.2, optimiser = AdamW with class-weighted loss to account for any fold-level imbalances (PyTorch v.2.0.1; <xref ref-type="bibr" rid="B13800289">Paszke et al. (2019)</xref>). Training was executed on NAU's Monsoon HPC cluster using Tesla K80 GPUs with CUDA 11.4.</p>
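        <p>The cross-validation scheme can be sketched with scikit-learn (assuming labels encoded as 0 = solo, 1 = dual; the seed and hyperparameter values are those stated above):</p>

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# 56 abstracts, balanced classes: 28 solo (0) and 28 dual (1).
labels = np.array([0] * 28 + [1] * 28)

# Stratified 5-fold CV with a fixed seed, matching the setup described above.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = list(skf.split(np.zeros(len(labels)), labels))

for i, (train_idx, val_idx) in enumerate(folds):
    # Each fold validates on 11-12 abstracts with a near-even class split,
    # and every abstract appears in exactly one validation fold.
    print(f"fold {i}: {len(val_idx)} val samples, {labels[val_idx].sum()} dual")

# Hyperparameters applied identically to all four models (values from the text).
HYPERPARAMS = dict(learning_rate=5e-5, batch_size=8, max_epochs=20,
                   early_stopping_patience=3, dropout=0.2, optimiser="AdamW")
```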
      </sec>
      <sec sec-type="Evaluation">
        <title>Evaluation</title>
        <p>Performance was evaluated across all five folds for each model. For each fold, accuracy, precision, recall and F1-score were computed using scikit-learn v.1.4.2 (<xref ref-type="bibr" rid="B13800082">Pedregosa et al. 2011</xref>) and results were aggregated as mean ± standard deviation. Precision, recall and F1-score are reported as macro averages (unweighted mean across both classes), which treats solo and dual classes equally and is appropriate for balanced binary classification.</p>
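        <p>The per-fold metrics and their mean ± standard deviation aggregation can be sketched as follows (toy predictions only, for illustration; the real values come from the fine-tuned models):</p>

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def fold_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision/recall/F1 for one fold."""
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": p, "recall": r, "f1": f1}

# Toy example: two folds of 11 validation abstracts each, one error per fold.
folds = [([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]),
         ([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0])]
per_fold = [fold_metrics(t, p) for t, p in folds]

# Aggregate as mean +/- standard deviation across folds, as reported in Table 1.
summary = {k: (np.mean([m[k] for m in per_fold]),
               np.std([m[k] for m in per_fold])) for k in per_fold[0]}
```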
        <p>Comparative visualisations include: (1) model performance bar charts with error bars representing cross-fold variation (Fig. <xref ref-type="fig" rid="F13584482">2</xref>); (2) aggregated confusion matrices for BioBERT showing cumulative predictions across all folds (Fig. <xref ref-type="fig" rid="F13584484">3</xref>) and (3) training time comparison across models (Fig. <xref ref-type="fig" rid="F13800080">4</xref>). Error analysis identified abstracts misclassified by multiple models, highlighting particularly ambiguous cases (results/error_analysis.csv).</p>
      </sec>
      <sec sec-type="Code and Reproducibility">
        <title>Code and Reproducibility</title>
        <p>All code and data are openly available via: <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/records/18156720">Zenodo</ext-link> and <ext-link ext-link-type="uri" xlink:href="https://github.com/beabock/biobert_dualsolo">GitHub</ext-link> (<xref ref-type="bibr" rid="B13800311">Bock 2026</xref>).</p>
        <p>The repositories include scripts for dataset curation, model fine-tuning and evaluation, along with example outputs and documentation to facilitate reuse or adaptation for other trait-related classification tasks.</p>
      </sec>
    </sec>
    <sec sec-type="Results">
      <title>Results</title>
      <sec sec-type="Model Performance">
        <title>Model Performance</title>
        <p>We compared four transformer-based language models using stratified 5-fold cross-validation on all 56 abstracts: (1) BERT-base-uncased; (2) BERT-base-cased; (3) BioBERT v.1.1 (biomedical domain-adapted) and (4) BiodivBERT (biodiversity domain-adapted). All models were trained with identical hyperparameters (20 epochs maximum with early stopping, learning rate 5e-5, batch size 8, dropout 0.2).</p>
        <p>BioBERT achieved the highest overall performance (F1 = 0.892 ± 0.120, Accuracy = 0.894 ± 0.116), though its advantage over BERT-base-cased was marginal (F1 = 0.892 ± 0.100, Accuracy = 0.892 ± 0.100). Case sensitivity proved critical: cased models substantially outperformed their uncased counterparts (BERT-base-uncased: F1 = 0.700 ± 0.241, Accuracy = 0.749 ± 0.177), likely because taxonomic nomenclature capitalisation provides important classification signals. Surprisingly, BiodivBERT underperformed (F1 = 0.747 ± 0.198, Accuracy = 0.771 ± 0.166) despite its biodiversity-specific pre-training, suggesting that domain alignment alone does not guarantee superior performance on specialised classification tasks.</p>
        <p>Classification metrics for BioBERT are summarised in Table <xref ref-type="table" rid="T13584486">1</xref>. Results are reported as mean ± standard deviation across five folds. Comparative model performance (Fig. <xref ref-type="fig" rid="F13584482">2</xref>) demonstrates that cased models substantially outperform uncased variants, while BioBERT and BERT-cased achieve statistically equivalent results. The aggregated confusion matrices (Fig. <xref ref-type="fig" rid="F13584484">3</xref>) show balanced performance across both classes for BioBERT and BERT-cased. Training efficiency varied substantially (Fig. <xref ref-type="fig" rid="F13800080">4</xref>), with BioBERT and BERT-cased completing training in ~ 10-11 minutes while BiodivBERT and BERT-uncased required ~ 35 minutes, likely due to differences in tokenisation efficiency and convergence patterns.</p>
      </sec>
    </sec>
    <sec sec-type="Discussion">
      <title>Discussion</title>
      <p>This pilot study demonstrates that transformer-based NLP models can extract ecological information embedded in scientific text. With a small but carefully curated dataset, BioBERT and BERT-base-cased achieved ~ 89% accuracy in classifying fungal trophic modes, indicating that pretrained language models, whether biomedical or general-purpose, can generalise to ecological contexts with minimal fine-tuning.</p>
      <p>The comparative model analysis revealed three important insights. First, BioBERT's marginal advantage over BERT-base-cased suggests that biomedical domain adaptation provides limited benefit for this specific task, possibly because fungal ecology vocabulary differs substantially from clinical biomedical text. Second, case sensitivity proved critical: models trained on cased text outperformed uncased variants by ~ 15 percentage points, likely because taxonomic nomenclature capitalisation (e.g. <italic>Fusarium</italic> vs. <italic>fusarium</italic>) carries important classification signals. Third, BiodivBERT's underperformance, despite biodiversity-specific pre-training, indicates that domain alignment alone does not guarantee superior performance; the pre-training corpus must closely match the downstream task domain.</p>
      <p>The workflow presented here provides a proof-of-concept for trait-orientated text mining in fungi. Discrepancies between automated and curated classifications can highlight ambiguous or conflicting entries, which is particularly important given the known limitations in trait data provenance (<xref ref-type="bibr" rid="B13584468">Violle et al. 2015</xref>). When scaled, this approach may support more dynamic updating and cross-validation of trait resources such as FUNGuild and FungalTraits.</p>
      <p>Error analysis across all models (results/error_analysis.csv) reveals that 39 misclassifications occurred across 224 predictions (4 models × 56 abstracts, with each abstract serving as a validation sample exactly once per model across the five folds). Seven abstracts were misclassified by at least two models, suggesting inherent ambiguity in how trophic modes are described in these cases. For example, abstracts describing endophytes with both plant-beneficial and saprotrophic capabilities proved challenging for all models, likely because the text emphasises ecological context over explicit trophic mode labels. These consistently problematic cases highlight where human curation remains essential and where future annotation guidelines could improve clarity. Improving model learning, such as by increasing the dataset size, refining annotation or using more advanced models, may also help resolve these ambiguous cases in future work.</p>
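      <p>The cross-model error tally underlying this analysis can be sketched as follows. The prediction records here are hypothetical placeholders; the real ones are in the error-analysis CSV in the repository.</p>

```python
from collections import Counter

# Hypothetical records: (abstract_id, model, correct?) -- one validation
# prediction per abstract per model, pooled across the five folds.
records = [
    ("a01", "biobert", False), ("a01", "bert-cased", False),
    ("a01", "biodivbert", False), ("a01", "bert-uncased", True),
    ("a02", "biobert", True), ("a02", "bert-cased", False),
    ("a02", "biodivbert", True), ("a02", "bert-uncased", True),
]

# Count misclassifications per abstract and flag those missed by >= 2 models
# as the "consistently problematic" cases discussed above.
errors = Counter(aid for aid, _, ok in records if not ok)
ambiguous = sorted(aid for aid, n in errors.items() if n >= 2)
```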
      <sec sec-type="Limitations">
        <title>Limitations</title>
        <p>This pilot study has several deliberate constraints. The small sample size (56 abstracts) limits statistical power and generalisability, though stratified cross-validation helps mitigate overfitting concerns. The binary simplification (solo vs. dual) reduces the rich spectrum of fungal trophic strategies to a coarse categorisation; real ecological roles often exist along gradients rather than discrete classes. Manual label subjectivity by a single labeller may introduce bias, though this was mitigated by conservative inclusion criteria requiring explicit trophic mode statements.</p>
      </sec>
      <sec sec-type="Future Work">
        <title>Future Work</title>
        <p>Several directions could extend this pilot into a more comprehensive tool:</p>
        <p>1. Expanding dataset scope: Increasing the corpus to hundreds or thousands of abstracts would improve model robustness and allow detection of rarer trophic patterns. This could include full-text articles rather than abstracts alone and taxa beyond fungi to establish cross-kingdom applicability;</p>
        <p>2. Multi-label classification for ecological gradients: Rather than binary solo/dual classification, future models could predict specific trophic modes (e.g. saprotroph, symbiont, pathogen, endophyte) as non-exclusive labels. This would capture the continuous nature of ecological roles and better reflect biological reality where taxa may simultaneously occupy multiple niches;</p>
        <p>3. Integration with environmental metadata: Linking text-derived traits with geographic, climatic or substrate data could enable context-aware predictions; for example, predicting how a taxon's trophic mode might shift under different environmental conditions;</p>
        <p>4. Genomic and metabolomic predictors: Combining textual information with molecular data (e.g. gene content, secondary metabolite profiles) could improve prediction accuracy and provide mechanistic insights into trophic flexibility;</p>
        <p>5. Domain-specific pretraining: Training a BERT-style model from scratch on ecological and mycological corpora (following the BiodivBERT approach) could yield better performance than adapting biomedical models.</p>
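        <p>Point 2 above, replacing the binary scheme with non-exclusive trophic-mode labels, could start from a multi-label encoding such as the following sketch (the example annotations are invented for illustration):</p>

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Non-exclusive trophic-mode labels (hypothetical annotations for illustration).
annotations = [
    {"saprotroph"},                           # strict saprotroph ("solo")
    {"endophyte", "saprotroph"},              # dual-mode taxon
    {"symbiont"},                             # obligate symbiont ("solo")
    {"pathogen", "saprotroph", "endophyte"},  # multi-mode taxon
]

mlb = MultiLabelBinarizer(classes=["endophyte", "pathogen", "saprotroph", "symbiont"])
Y = mlb.fit_transform(annotations)  # (4, 4) binary indicator matrix

# A classifier head with one sigmoid output per label (rather than a 2-way
# softmax) would then be trained against Y, letting taxa occupy several
# trophic modes simultaneously.
```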
      </sec>
    </sec>
    <sec sec-type="Conclusions">
      <title>Conclusions</title>
      <p>This pilot study demonstrates that transformer-based NLP can successfully extract fungal trophic mode information from scientific abstracts. BioBERT and BERT-cased achieved ~ 89% accuracy in classifying abstracts as describing single or multiple trophic modes, validating the feasibility of automated trait extraction for fungal ecology. The key contributions of this work are:</p>
      <p>1. Proof-of-concept: Pretrained biomedical language models can generalise to ecological classification tasks with minimal fine-tuning;</p>
      <p>2. Reproducible workflow: All code and data are openly available, enabling replication and extension by other researchers;</p>
      <p>3. Trait database integration: The approach complements existing resources like FUNGuild and FungalTraits by providing a scalable method to flag taxa with lifestyle plasticity.</p>
      <p>As fungal trait databases continue to grow in importance for biodiversity assessments and ecosystem modelling, automated text mining offers a path towards more efficient, consistent and comprehensive trait annotation. This workflow provides a foundation for scaling to larger datasets and more nuanced ecological classifications.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgements</title>
      <p>Thank you to Dr. Nancy Johnson and Dr. Kitty Gehring for guidance and support on this and other projects. Special thanks to Anne-Marie Cooper, as well.</p>
    </ack>
    <sec sec-type="Conflicts of interest">
      <title>Conflicts of interest</title>
      <p>No conflicts of interest to declare.</p>
      <p>Disclaimer: This article is (co-)authored by any of the Editors-in-Chief, Managing Editors or their deputies in this journal.</p>
    </sec>
    <ref-list>
      <title>References</title>
      <ref id="B13603935">
        <element-citation publication-type="conference-paper">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Abdelmageed</surname>
              <given-names>N.</given-names>
            </name>
            <name name-style="western">
              <surname>Löffler</surname>
              <given-names>F.</given-names>
            </name>
            <name name-style="western">
              <surname>König-Ries</surname>
              <given-names>B.</given-names>
            </name>
          </person-group>
          <year>2023</year>
          <person-group person-group-type="editor">
            <name name-style="western">
              <surname>Yamaguchi</surname>
              <given-names>Atsuko</given-names>
            </name>
            <etal/>
          </person-group>
          <article-title>BiodivBERT: a Pre-Trained Language Model for the Biodiversity Domain</article-title>
          <source>SWAT4HCLS</source>
          <size units="page">62-71</size>
          <uri>https://ceur-ws.org/Vol-3415/paper-7.pdf</uri>
        </element-citation>
      </ref>
      <ref id="B13584283">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Berbee</surname>
              <given-names>Mary L.</given-names>
            </name>
            <name name-style="western">
              <surname>James</surname>
              <given-names>Timothy Y.</given-names>
            </name>
            <name name-style="western">
              <surname>Strullu-Derrien</surname>
              <given-names>Christine</given-names>
            </name>
          </person-group>
          <year>2017</year>
          <article-title>Early Diverging Fungi: Diversity and Impact at the Dawn of Terrestrial Life</article-title>
          <source>Annual Review of Microbiology</source>
          <volume>71</volume>
          <issue>1</issue>
          <fpage>41</fpage>
          <lpage>60</lpage>
          <pub-id pub-id-type="doi">10.1146/annurev-micro-030117-020324</pub-id>
        </element-citation>
      </ref>
      <ref id="B13800311">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Bock</surname>
              <given-names>Beatrice</given-names>
            </name>
          </person-group>
          <year>2026</year>
          <article-title>beabock/biobert_dualsolo: Reproducible BioBERT &amp; BERT model comparison (4 models, 5-fold CV)</article-title>
          <source>Zenodo</source>
          <pub-id pub-id-type="doi">10.5281/zenodo.17343492</pub-id>
        </element-citation>
      </ref>
      <ref id="B13603850">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Cornelius</surname>
              <given-names>Joseph</given-names>
            </name>
            <name name-style="western">
              <surname>Detering</surname>
              <given-names>Harald</given-names>
            </name>
            <name name-style="western">
              <surname>Lithgow-Serrano</surname>
              <given-names>Oscar</given-names>
            </name>
            <name name-style="western">
              <surname>Agosti</surname>
              <given-names>Donat</given-names>
            </name>
            <name name-style="western">
              <surname>Rinaldi</surname>
              <given-names>Fabio</given-names>
            </name>
            <name name-style="western">
              <surname>Waterhouse</surname>
              <given-names>Robert</given-names>
            </name>
          </person-group>
          <year>2025</year>
          <article-title>From literature to biodiversity data: mining arthropod organismal traits with machine learning</article-title>
          <source>Biodiversity Data Journal</source>
          <volume>13</volume>
          <pub-id pub-id-type="doi">10.3897/bdj.13.e153070</pub-id>
        </element-citation>
      </ref>
      <ref id="B13800104">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Devlin</surname>
              <given-names>Jacob</given-names>
            </name>
            <name name-style="western">
              <surname>Chang</surname>
              <given-names>Ming-Wei</given-names>
            </name>
            <name name-style="western">
              <surname>Lee</surname>
              <given-names>Kenton</given-names>
            </name>
            <name name-style="western">
              <surname>Toutanova</surname>
              <given-names>Kristina</given-names>
            </name>
          </person-group>
          <year>2018</year>
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          <source>CoRR</source>
          <uri>http://arxiv.org/abs/1810.04805</uri>
        </element-citation>
      </ref>
      <ref id="B13603861">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Domazetoski</surname>
              <given-names>Viktor</given-names>
            </name>
            <name name-style="western">
              <surname>Kreft</surname>
              <given-names>Holger</given-names>
            </name>
            <name name-style="western">
              <surname>Bestova</surname>
              <given-names>Helena</given-names>
            </name>
            <name name-style="western">
              <surname>Wieder</surname>
              <given-names>Philipp</given-names>
            </name>
            <name name-style="western">
              <surname>Koynov</surname>
              <given-names>Radoslav</given-names>
            </name>
            <name name-style="western">
              <surname>Zarei</surname>
              <given-names>Alireza</given-names>
            </name>
            <name name-style="western">
              <surname>Weigelt</surname>
              <given-names>Patrick</given-names>
            </name>
          </person-group>
          <year>2025</year>
          <article-title>Using large language models to extract plant functional traits from unstructured text</article-title>
          <source>Applications in Plant Sciences</source>
          <volume>13</volume>
          <issue>3</issue>
          <pub-id pub-id-type="doi">10.1002/aps3.70011</pub-id>
        </element-citation>
      </ref>
      <ref id="B13603904">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Gu</surname>
              <given-names>Yu</given-names>
            </name>
            <name name-style="western">
              <surname>Tinn</surname>
              <given-names>Robert</given-names>
            </name>
            <name name-style="western">
              <surname>Cheng</surname>
              <given-names>Hao</given-names>
            </name>
            <name name-style="western">
              <surname>Lucas</surname>
              <given-names>Michael</given-names>
            </name>
            <name name-style="western">
              <surname>Usuyama</surname>
              <given-names>Naoto</given-names>
            </name>
            <name name-style="western">
              <surname>Liu</surname>
              <given-names>Xiaodong</given-names>
            </name>
            <name name-style="western">
              <surname>Naumann</surname>
              <given-names>Tristan</given-names>
            </name>
            <name name-style="western">
              <surname>Gao</surname>
              <given-names>Jianfeng</given-names>
            </name>
            <name name-style="western">
              <surname>Poon</surname>
              <given-names>Hoifung</given-names>
            </name>
          </person-group>
          <year>2021</year>
          <article-title>Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing</article-title>
          <source>ACM Transactions on Computing for Healthcare</source>
          <volume>3</volume>
          <issue>1</issue>
          <fpage>1</fpage>
          <lpage>23</lpage>
          <pub-id pub-id-type="doi">10.1145/3458754</pub-id>
        </element-citation>
      </ref>
      <ref id="B13584301">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Lee</surname>
              <given-names>Jinhyuk</given-names>
            </name>
            <name name-style="western">
              <surname>Yoon</surname>
              <given-names>Wonjin</given-names>
            </name>
            <name name-style="western">
              <surname>Kim</surname>
              <given-names>Sungdong</given-names>
            </name>
            <name name-style="western">
              <surname>Kim</surname>
              <given-names>Donghyeon</given-names>
            </name>
            <name name-style="western">
              <surname>Kim</surname>
              <given-names>Sunkyu</given-names>
            </name>
            <name name-style="western">
              <surname>So</surname>
              <given-names>Chan Ho</given-names>
            </name>
            <name name-style="western">
              <surname>Kang</surname>
              <given-names>Jaewoo</given-names>
            </name>
          </person-group>
          <year>2019</year>
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          <source>Bioinformatics</source>
          <volume>36</volume>
          <issue>4</issue>
          <fpage>1234</fpage>
          <lpage>1240</lpage>
          <pub-id pub-id-type="doi">10.1093/bioinformatics/btz682</pub-id>
        </element-citation>
      </ref>
      <ref id="B13584313">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Martin</surname>
              <given-names>Francis</given-names>
            </name>
            <name name-style="western">
              <surname>Tan</surname>
              <given-names>Hao</given-names>
            </name>
          </person-group>
          <year>2025</year>
          <article-title>Saprotrophy-to-symbiosis continuum in fungi</article-title>
          <source>Current Biology</source>
          <volume>35</volume>
          <issue>11</issue>
          <pub-id pub-id-type="doi">10.1016/j.cub.2025.01.032</pub-id>
        </element-citation>
      </ref>
      <ref id="B13584322">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Nguyen</surname>
              <given-names>Nhu H.</given-names>
            </name>
            <name name-style="western">
              <surname>Song</surname>
              <given-names>Zewei</given-names>
            </name>
            <name name-style="western">
              <surname>Bates</surname>
              <given-names>Scott T.</given-names>
            </name>
            <name name-style="western">
              <surname>Branco</surname>
              <given-names>Sara</given-names>
            </name>
            <name name-style="western">
              <surname>Tedersoo</surname>
              <given-names>Leho</given-names>
            </name>
            <name name-style="western">
              <surname>Menke</surname>
              <given-names>Jon</given-names>
            </name>
            <name name-style="western">
              <surname>Schilling</surname>
              <given-names>Jonathan S.</given-names>
            </name>
            <name name-style="western">
              <surname>Kennedy</surname>
              <given-names>Peter G.</given-names>
            </name>
          </person-group>
          <year>2016</year>
          <article-title>FUNGuild: An open annotation tool for parsing fungal community datasets by ecological guild</article-title>
          <source>Fungal Ecology</source>
          <volume>20</volume>
          <fpage>241</fpage>
          <lpage>248</lpage>
          <pub-id pub-id-type="doi">10.1016/j.funeco.2015.06.006</pub-id>
        </element-citation>
      </ref>
      <ref id="B13800289">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Paszke</surname>
              <given-names>Adam</given-names>
            </name>
            <name name-style="western">
              <surname>Gross</surname>
              <given-names>Sam</given-names>
            </name>
            <name name-style="western">
              <surname>Massa</surname>
              <given-names>Francisco</given-names>
            </name>
            <name name-style="western">
              <surname>Lerer</surname>
              <given-names>Adam</given-names>
            </name>
            <name name-style="western">
              <surname>Bradbury</surname>
              <given-names>James</given-names>
            </name>
            <name name-style="western">
              <surname>Chanan</surname>
              <given-names>Gregory</given-names>
            </name>
            <name name-style="western">
              <surname>Killeen</surname>
              <given-names>Trevor</given-names>
            </name>
            <name name-style="western">
              <surname>Lin</surname>
              <given-names>Zeming</given-names>
            </name>
            <name name-style="western">
              <surname>Gimelshein</surname>
              <given-names>Natalia</given-names>
            </name>
            <name name-style="western">
              <surname>Chintala</surname>
              <given-names>Soumith</given-names>
            </name>
            <etal/>
          </person-group>
          <year>2019</year>
          <article-title>PyTorch: An Imperative Style, High-Performance Deep Learning Library</article-title>
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          <fpage>8024</fpage>
          <lpage>8035</lpage>
        </element-citation>
      </ref>
      <ref id="B13800082">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Pedregosa</surname>
              <given-names>F.</given-names>
            </name>
            <name name-style="western">
              <surname>Varoquaux</surname>
              <given-names>G.</given-names>
            </name>
            <name name-style="western">
              <surname>Gramfort</surname>
              <given-names>A.</given-names>
            </name>
            <name name-style="western">
              <surname>Michel</surname>
              <given-names>V.</given-names>
            </name>
            <name name-style="western">
              <surname>Thirion</surname>
              <given-names>B.</given-names>
            </name>
            <name name-style="western">
              <surname>Grisel</surname>
              <given-names>O.</given-names>
            </name>
            <name name-style="western">
              <surname>Blondel</surname>
              <given-names>M.</given-names>
            </name>
            <name name-style="western">
              <surname>Prettenhofer</surname>
              <given-names>P.</given-names>
            </name>
            <name name-style="western">
              <surname>Weiss</surname>
              <given-names>R.</given-names>
            </name>
            <name name-style="western">
              <surname>Dubourg</surname>
              <given-names>V.</given-names>
            </name>
            <name name-style="western">
              <surname>Vanderplas</surname>
              <given-names>J.</given-names>
            </name>
            <name name-style="western">
              <surname>Passos</surname>
              <given-names>A.</given-names>
            </name>
            <name name-style="western">
              <surname>Cournapeau</surname>
              <given-names>D.</given-names>
            </name>
            <name name-style="western">
              <surname>Brucher</surname>
              <given-names>M.</given-names>
            </name>
            <name name-style="western">
              <surname>Perrot</surname>
              <given-names>M.</given-names>
            </name>
            <name name-style="western">
              <surname>Duchesnay</surname>
              <given-names>E.</given-names>
            </name>
          </person-group>
          <year>2011</year>
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          <fpage>2825</fpage>
          <lpage>2830</lpage>
        </element-citation>
      </ref>
      <ref id="B13603873">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Pei</surname>
              <given-names>Qizhi</given-names>
            </name>
            <name name-style="western">
              <surname>Zhang</surname>
              <given-names>Wei</given-names>
            </name>
            <name name-style="western">
              <surname>Zhu</surname>
              <given-names>Jinhua</given-names>
            </name>
            <name name-style="western">
              <surname>Wu</surname>
              <given-names>Kehan</given-names>
            </name>
            <name name-style="western">
              <surname>Gao</surname>
              <given-names>Kaiyuan</given-names>
            </name>
            <name name-style="western">
              <surname>Wu</surname>
              <given-names>Lijun</given-names>
            </name>
            <name name-style="western">
              <surname>Xia</surname>
              <given-names>Yingce</given-names>
            </name>
            <name name-style="western">
              <surname>Yan</surname>
              <given-names>Rui</given-names>
            </name>
          </person-group>
          <year>2023</year>
          <article-title>BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations</article-title>
          <source>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>
          <pub-id pub-id-type="doi">10.18653/v1/2023.emnlp-main.70</pub-id>
        </element-citation>
      </ref>
      <ref id="B13597550">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Põlme</surname>
              <given-names>Sergei</given-names>
            </name>
            <name name-style="western">
              <surname>Abarenkov</surname>
              <given-names>Kessy</given-names>
            </name>
            <name name-style="western">
              <surname>Henrik Nilsson</surname>
              <given-names>R.</given-names>
            </name>
            <name name-style="western">
              <surname>Lindahl</surname>
              <given-names>Björn D.</given-names>
            </name>
            <name name-style="western">
              <surname>Clemmensen</surname>
              <given-names>Karina Engelbrecht</given-names>
            </name>
            <name name-style="western">
              <surname>Kauserud</surname>
              <given-names>Havard</given-names>
            </name>
            <name name-style="western">
              <surname>Nguyen</surname>
              <given-names>Nhu</given-names>
            </name>
            <name name-style="western">
              <surname>Kjøller</surname>
              <given-names>Rasmus</given-names>
            </name>
            <name name-style="western">
              <surname>Bates</surname>
              <given-names>Scott T.</given-names>
            </name>
            <name name-style="western">
              <surname>Baldrian</surname>
              <given-names>Petr</given-names>
            </name>
            <name name-style="western">
              <surname>Frøslev</surname>
              <given-names>Tobias Guldberg</given-names>
            </name>
            <name name-style="western">
              <surname>Adojaan</surname>
              <given-names>Kristjan</given-names>
            </name>
            <name name-style="western">
              <surname>Vizzini</surname>
              <given-names>Alfredo</given-names>
            </name>
            <name name-style="western">
              <surname>Suija</surname>
              <given-names>Ave</given-names>
            </name>
            <name name-style="western">
              <surname>Pfister</surname>
              <given-names>Donald</given-names>
            </name>
            <name name-style="western">
              <surname>Baral</surname>
              <given-names>Hans-Otto</given-names>
            </name>
            <name name-style="western">
              <surname>Järv</surname>
              <given-names>Helle</given-names>
            </name>
            <name name-style="western">
              <surname>Madrid</surname>
              <given-names>Hugo</given-names>
            </name>
            <name name-style="western">
              <surname>Nordén</surname>
              <given-names>Jenni</given-names>
            </name>
            <name name-style="western">
              <surname>Liu</surname>
              <given-names>Jian-Kui</given-names>
            </name>
            <name name-style="western">
              <surname>Pawlowska</surname>
              <given-names>Julia</given-names>
            </name>
            <name name-style="western">
              <surname>Põldmaa</surname>
              <given-names>Kadri</given-names>
            </name>
            <name name-style="western">
              <surname>Pärtel</surname>
              <given-names>Kadri</given-names>
            </name>
            <name name-style="western">
              <surname>Runnel</surname>
              <given-names>Kadri</given-names>
            </name>
            <name name-style="western">
              <surname>Hansen</surname>
              <given-names>Karen</given-names>
            </name>
            <name name-style="western">
              <surname>Larsson</surname>
              <given-names>Karl-Henrik</given-names>
            </name>
            <name name-style="western">
              <surname>Hyde</surname>
              <given-names>Kevin David</given-names>
            </name>
            <name name-style="western">
              <surname>Sandoval-Denis</surname>
              <given-names>Marcelo</given-names>
            </name>
            <name name-style="western">
              <surname>Smith</surname>
              <given-names>Matthew E.</given-names>
            </name>
            <name name-style="western">
              <surname>Toome-Heller</surname>
              <given-names>Merje</given-names>
            </name>
            <name name-style="western">
              <surname>Wijayawardene</surname>
              <given-names>Nalin N.</given-names>
            </name>
            <name name-style="western">
              <surname>Menolli</surname>
              <given-names>Nelson</given-names>
            </name>
            <name name-style="western">
              <surname>Reynolds</surname>
              <given-names>Nicole K.</given-names>
            </name>
            <name name-style="western">
              <surname>Drenkhan</surname>
              <given-names>Rein</given-names>
            </name>
            <name name-style="western">
              <surname>Maharachchikumbura</surname>
              <given-names>Sajeewa S. N.</given-names>
            </name>
            <name name-style="western">
              <surname>Gibertoni</surname>
              <given-names>Tatiana B.</given-names>
            </name>
            <name name-style="western">
              <surname>Læssøe</surname>
              <given-names>Thomas</given-names>
            </name>
            <name name-style="western">
              <surname>Davis</surname>
              <given-names>William</given-names>
            </name>
            <name name-style="western">
              <surname>Tokarev</surname>
              <given-names>Yuri</given-names>
            </name>
            <name name-style="western">
              <surname>Corrales</surname>
              <given-names>Adriana</given-names>
            </name>
            <name name-style="western">
              <surname>Soares</surname>
              <given-names>Adriene Mayra</given-names>
            </name>
            <name name-style="western">
              <surname>Agan</surname>
              <given-names>Ahto</given-names>
            </name>
            <name name-style="western">
              <surname>Machado</surname>
              <given-names>Alexandre Reis</given-names>
            </name>
            <name name-style="western">
              <surname>Argüelles-Moyao</surname>
              <given-names>Andrés</given-names>
            </name>
            <name name-style="western">
              <surname>Detheridge</surname>
              <given-names>Andrew</given-names>
            </name>
            <name name-style="western">
              <surname>de Meiras-Ottoni</surname>
              <given-names>Angelina</given-names>
            </name>
            <name name-style="western">
              <surname>Verbeken</surname>
              <given-names>Annemieke</given-names>
            </name>
            <name name-style="western">
              <surname>Dutta</surname>
              <given-names>Arun Kumar</given-names>
            </name>
            <name name-style="western">
              <surname>Cui</surname>
              <given-names>Bao-Kai</given-names>
            </name>
            <name name-style="western">
              <surname>Pradeep</surname>
              <given-names>C. K.</given-names>
            </name>
            <name name-style="western">
              <surname>Marín</surname>
              <given-names>César</given-names>
            </name>
            <name name-style="western">
              <surname>Stanton</surname>
              <given-names>Daniel</given-names>
            </name>
            <name name-style="western">
              <surname>Gohar</surname>
              <given-names>Daniyal</given-names>
            </name>
            <name name-style="western">
              <surname>Wanasinghe</surname>
              <given-names>Dhanushka N.</given-names>
            </name>
            <name name-style="western">
              <surname>Otsing</surname>
              <given-names>Eveli</given-names>
            </name>
            <name name-style="western">
              <surname>Aslani</surname>
              <given-names>Farzad</given-names>
            </name>
            <name name-style="western">
              <surname>Griffith</surname>
              <given-names>Gareth W.</given-names>
            </name>
            <name name-style="western">
              <surname>Lumbsch</surname>
              <given-names>Thorsten H.</given-names>
            </name>
            <name name-style="western">
              <surname>Grossart</surname>
              <given-names>Hans-Peter</given-names>
            </name>
            <name name-style="western">
              <surname>Masigol</surname>
              <given-names>Hossein</given-names>
            </name>
            <name name-style="western">
              <surname>Timling</surname>
              <given-names>Ina</given-names>
            </name>
            <name name-style="western">
              <surname>Hiiesalu</surname>
              <given-names>Inga</given-names>
            </name>
            <name name-style="western">
              <surname>Oja</surname>
              <given-names>Jane</given-names>
            </name>
            <name name-style="western">
              <surname>Kupagme</surname>
              <given-names>John Y.</given-names>
            </name>
            <name name-style="western">
              <surname>Geml</surname>
              <given-names>József</given-names>
            </name>
            <name name-style="western">
              <surname>Alvarez-Manjarrez</surname>
              <given-names>Julieta</given-names>
            </name>
            <name name-style="western">
              <surname>Ilves</surname>
              <given-names>Kai</given-names>
            </name>
            <name name-style="western">
              <surname>Loit</surname>
              <given-names>Kaire</given-names>
            </name>
            <name name-style="western">
              <surname>Adamson</surname>
              <given-names>Kalev</given-names>
            </name>
            <name name-style="western">
              <surname>Nara</surname>
              <given-names>Kazuhide</given-names>
            </name>
            <name name-style="western">
              <surname>Küngas</surname>
              <given-names>Kati</given-names>
            </name>
            <name name-style="western">
              <surname>Rojas-Jimenez</surname>
              <given-names>Keilor</given-names>
            </name>
            <name name-style="western">
              <surname>Bitenieks</surname>
              <given-names>Krišs</given-names>
            </name>
            <name name-style="western">
              <surname>Irinyi</surname>
              <given-names>Laszlo</given-names>
            </name>
            <name name-style="western">
              <surname>Nagy</surname>
              <given-names>László G.</given-names>
            </name>
            <name name-style="western">
              <surname>Soonvald</surname>
              <given-names>Liina</given-names>
            </name>
            <name name-style="western">
              <surname>Zhou</surname>
              <given-names>Li-Wei</given-names>
            </name>
            <name name-style="western">
              <surname>Wagner</surname>
              <given-names>Lysett</given-names>
            </name>
            <name name-style="western">
              <surname>Aime</surname>
              <given-names>M. Catherine</given-names>
            </name>
            <name name-style="western">
              <surname>Öpik</surname>
              <given-names>Maarja</given-names>
            </name>
            <name name-style="western">
              <surname>Mujica</surname>
              <given-names>María Isabel</given-names>
            </name>
            <name name-style="western">
              <surname>Metsoja</surname>
              <given-names>Martin</given-names>
            </name>
            <name name-style="western">
              <surname>Ryberg</surname>
              <given-names>Martin</given-names>
            </name>
            <name name-style="western">
              <surname>Vasar</surname>
              <given-names>Martti</given-names>
            </name>
            <name name-style="western">
              <surname>Murata</surname>
              <given-names>Masao</given-names>
            </name>
            <name name-style="western">
              <surname>Nelsen</surname>
              <given-names>Matthew P.</given-names>
            </name>
            <name name-style="western">
              <surname>Cleary</surname>
              <given-names>Michelle</given-names>
            </name>
            <name name-style="western">
              <surname>Samarakoon</surname>
              <given-names>Milan C.</given-names>
            </name>
            <name name-style="western">
              <surname>Doilom</surname>
              <given-names>Mingkwan</given-names>
            </name>
            <name name-style="western">
              <surname>Bahram</surname>
              <given-names>Mohammad</given-names>
            </name>
            <name name-style="western">
              <surname>Hagh-Doust</surname>
              <given-names>Niloufar</given-names>
            </name>
            <name name-style="western">
              <surname>Dulya</surname>
              <given-names>Olesya</given-names>
            </name>
            <name name-style="western">
              <surname>Johnston</surname>
              <given-names>Peter</given-names>
            </name>
            <name name-style="western">
              <surname>Kohout</surname>
              <given-names>Petr</given-names>
            </name>
            <name name-style="western">
              <surname>Chen</surname>
              <given-names>Qian</given-names>
            </name>
            <name name-style="western">
              <surname>Tian</surname>
              <given-names>Qing</given-names>
            </name>
            <name name-style="western">
              <surname>Nandi</surname>
              <given-names>Rajasree</given-names>
            </name>
            <name name-style="western">
              <surname>Amiri</surname>
              <given-names>Rasekh</given-names>
            </name>
            <name name-style="western">
              <surname>Perera</surname>
              <given-names>Rekhani Hansika</given-names>
            </name>
            <name name-style="western">
              <surname>dos Santos Chikowski</surname>
              <given-names>Renata</given-names>
            </name>
            <name name-style="western">
              <surname>Mendes-Alvarenga</surname>
              <given-names>Renato L.</given-names>
            </name>
            <name name-style="western">
              <surname>Garibay-Orijel</surname>
              <given-names>Roberto</given-names>
            </name>
            <name name-style="western">
              <surname>Gielen</surname>
              <given-names>Robin</given-names>
            </name>
            <name name-style="western">
              <surname>Phookamsak</surname>
              <given-names>Rungtiwa</given-names>
            </name>
            <name name-style="western">
              <surname>Jayawardena</surname>
              <given-names>Ruvishika S.</given-names>
            </name>
            <name name-style="western">
              <surname>Rahimlou</surname>
              <given-names>Saleh</given-names>
            </name>
            <name name-style="western">
              <surname>Karunarathna</surname>
              <given-names>Samantha C.</given-names>
            </name>
            <name name-style="western">
              <surname>Tibpromma</surname>
              <given-names>Saowaluck</given-names>
            </name>
            <name name-style="western">
              <surname>Brown</surname>
              <given-names>Shawn P.</given-names>
            </name>
            <name name-style="western">
              <surname>Sepp</surname>
              <given-names>Siim-Kaarel</given-names>
            </name>
            <name name-style="western">
              <surname>Mundra</surname>
              <given-names>Sunil</given-names>
            </name>
            <name name-style="western">
              <surname>Luo</surname>
              <given-names>Zhu-Hua</given-names>
            </name>
            <name name-style="western">
              <surname>Bose</surname>
              <given-names>Tanay</given-names>
            </name>
            <name name-style="western">
              <surname>Vahter</surname>
              <given-names>Tanel</given-names>
            </name>
            <name name-style="western">
              <surname>Netherway</surname>
              <given-names>Tarquin</given-names>
            </name>
            <name name-style="western">
              <surname>Yang</surname>
              <given-names>Teng</given-names>
            </name>
            <name name-style="western">
              <surname>May</surname>
              <given-names>Tom</given-names>
            </name>
            <name name-style="western">
              <surname>Varga</surname>
              <given-names>Torda</given-names>
            </name>
            <name name-style="western">
              <surname>Li</surname>
              <given-names>Wei</given-names>
            </name>
            <name name-style="western">
              <surname>Coimbra</surname>
              <given-names>Victor Rafael Matos</given-names>
            </name>
            <name name-style="western">
              <surname>de Oliveira</surname>
              <given-names>Virton Rodrigo Targino</given-names>
            </name>
            <name name-style="western">
              <surname>de Lima</surname>
              <given-names>Vitor Xavier</given-names>
            </name>
            <name name-style="western">
              <surname>Mikryukov</surname>
              <given-names>Vladimir S.</given-names>
            </name>
            <name name-style="western">
              <surname>Lu</surname>
              <given-names>Yongzhong</given-names>
            </name>
            <name name-style="western">
              <surname>Matsuda</surname>
              <given-names>Yosuke</given-names>
            </name>
            <name name-style="western">
              <surname>Miyamoto</surname>
              <given-names>Yumiko</given-names>
            </name>
            <name name-style="western">
              <surname>Kõljalg</surname>
              <given-names>Urmas</given-names>
            </name>
            <name name-style="western">
              <surname>Tedersoo</surname>
              <given-names>Leho</given-names>
            </name>
          </person-group>
          <year>2021</year>
          <article-title>FungalTraits: a user-friendly traits database of fungi and fungus-like stramenopiles</article-title>
          <source>Fungal Diversity</source>
          <volume>105</volume>
          <issue>1</issue>
          <fpage>1</fpage>
          <lpage>16</lpage>
          <pub-id pub-id-type="doi">10.1007/s13225-020-00466-2</pub-id>
        </element-citation>
      </ref>
      <ref id="B13800281">
        <element-citation publication-type="software">
          <person-group person-group-type="author">
            <collab>Python Software Foundation</collab>
          </person-group>
          <year>2020</year>
          <article-title>Python 3.9.0</article-title>
          <uri>https://docs.python.org/3.9</uri>
        </element-citation>
      </ref>
      <ref id="B13584468">
        <element-citation publication-type="article">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Violle</surname>
              <given-names>Cyrille</given-names>
            </name>
            <name name-style="western">
              <surname>Borgy</surname>
              <given-names>Benjamin</given-names>
            </name>
            <name name-style="western">
              <surname>Choler</surname>
              <given-names>Philippe</given-names>
            </name>
          </person-group>
          <year>2015</year>
          <article-title>Trait databases: misuses and precautions</article-title>
          <source>Journal of Vegetation Science</source>
          <volume>26</volume>
          <issue>5</issue>
          <fpage>826</fpage>
          <lpage>827</lpage>
          <pub-id pub-id-type="doi">10.1111/jvs.12325</pub-id>
        </element-citation>
      </ref>
      <ref id="B13603886">
        <element-citation publication-type="conference-paper">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Warner</surname>
              <given-names>Benjamin</given-names>
            </name>
            <name name-style="western">
              <surname>Chaffin</surname>
              <given-names>Antoine</given-names>
            </name>
            <name name-style="western">
              <surname>Clavié</surname>
              <given-names>Benjamin</given-names>
            </name>
            <name name-style="western">
              <surname>Weller</surname>
              <given-names>Orion</given-names>
            </name>
            <name name-style="western">
              <surname>Hallström</surname>
              <given-names>Oskar</given-names>
            </name>
            <name name-style="western">
              <surname>Taghadouini</surname>
              <given-names>Said</given-names>
            </name>
            <name name-style="western">
              <surname>Gallagher</surname>
              <given-names>Alexis</given-names>
            </name>
            <name name-style="western">
              <surname>Biswas</surname>
              <given-names>Raja</given-names>
            </name>
            <name name-style="western">
              <surname>Ladhak</surname>
              <given-names>Faisal</given-names>
            </name>
            <name name-style="western">
              <surname>Aarsen</surname>
              <given-names>Tom</given-names>
            </name>
            <name name-style="western">
              <surname>Adams</surname>
              <given-names>Griffin Thomas</given-names>
            </name>
            <name name-style="western">
              <surname>Howard</surname>
              <given-names>Jeremy</given-names>
            </name>
            <name name-style="western">
              <surname>Poli</surname>
              <given-names>Iacopo</given-names>
            </name>
          </person-group>
          <year>2025</year>
          <article-title>Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference</article-title>
          <source>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          <fpage>2526</fpage>
          <lpage>2547</lpage>
          <pub-id pub-id-type="doi">10.18653/v1/2025.acl-long.127</pub-id>
        </element-citation>
      </ref>
      <ref id="B13800266">
        <element-citation publication-type="conference-paper">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Wolf</surname>
              <given-names>Thomas</given-names>
            </name>
            <etal/>
          </person-group>
          <year>2020</year>
          <article-title>Transformers: State-of-the-Art Natural Language Processing</article-title>
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          <conf-loc>Online</conf-loc>
          <publisher-name>Association for Computational Linguistics</publisher-name>
          <uri>https://www.aclweb.org/anthology/2020.emnlp-demos.6</uri>
        </element-citation>
      </ref>
    </ref-list>
  </back>
  <floats-group>
    <fig id="F13800309" position="float" orientation="portrait">
      <object-id content-type="arpha">72BB6A22-3937-5AFB-A970-66A468126C82</object-id>
      <object-id content-type="doi">10.3897/rio.12.e176590.figure1</object-id>
      <label>Figure 1.</label>
      <caption>
        <p>Workflow diagram of the classification pipeline. The diagram summarises the full end-to-end process: literature search (Web of Science queries); manual curation (56 labelled abstracts); preprocessing (text cleaning, tokenisation with truncation at 512 tokens and token-length quality control); stratified 5-fold cross-validation; model fine-tuning across four models (BERT-base-uncased, BERT-base-cased, BioBERT v.1.1 and BiodivBERT) with standardised hyperparameters; evaluation (metrics, confusion matrices, learning curves); and outputs (fine-tuned models and predictions).</p>
      </caption>
      <graphic xlink:href="rio-12-e176590-g001.png" position="float" id="oo_1505896.png" orientation="portrait" xlink:type="simple">
        <uri content-type="original_file">https://binary.pensoft.net/fig/1505896</uri>
      </graphic>
    </fig>
    <fig id="F13584482" position="float" orientation="portrait">
      <object-id content-type="arpha">5E195455-E3B6-563B-A41E-BF308FA1CADD</object-id>
      <object-id content-type="doi">10.3897/rio.12.e176590.figure2</object-id>
      <label>Figure 2.</label>
      <caption>
        <p>Comparative model performance across four transformer-based architectures. Bar charts show mean classification metrics (accuracy, precision, recall, F1-score) ± standard deviation from stratified 5-fold cross-validation (n = 56 abstracts). BioBERT (biomedical domain-adapted) and BERT-base-cased achieved statistically equivalent performance (~ 89% accuracy), substantially outperforming BERT-base-uncased (~ 75%) and BiodivBERT (~ 77%). Case sensitivity proved critical, with cased models outperforming uncased models by ~ 15 percentage points. Metrics are calculated as macro averages (unweighted means across the dual and solo classes).</p>
      </caption>
      <graphic xlink:href="rio-12-e176590-g002.png" position="float" id="oo_1505872.png" orientation="portrait" xlink:type="simple">
        <uri content-type="original_file">https://binary.pensoft.net/fig/1505872</uri>
      </graphic>
    </fig>
    <fig id="F13584484" position="float" orientation="portrait">
      <object-id content-type="arpha">DD1BEBF5-1053-5CED-9DD2-E79E48C0E132</object-id>
      <object-id content-type="doi">10.3897/rio.12.e176590.figure3</object-id>
      <label>Figure 3.</label>
      <caption>
        <p>Aggregated confusion matrices for all four models (BioBERT, BERT-base-cased, BERT-base-uncased, BiodivBERT) across all five folds (56 predictions per model in total). Each matrix shows true labels (Solo = single trophic mode, Dual = multiple trophic modes) versus predicted labels, allowing direct comparison of error patterns and class balance for each model. BioBERT and BERT-base-cased show balanced performance, whereas BERT-base-uncased and BiodivBERT display more misclassifications. Colour intensity indicates prediction frequency; diagonal cells represent correct predictions.</p>
      </caption>
      <graphic xlink:href="rio-12-e176590-g003.png" position="float" id="oo_1505873.png" orientation="portrait" xlink:type="simple">
        <uri content-type="original_file">https://binary.pensoft.net/fig/1505873</uri>
      </graphic>
    </fig>
    <fig id="F13800080" position="float" orientation="portrait">
      <object-id content-type="arpha">C8D69530-B51C-51AF-8A76-5839389E6A18</object-id>
      <object-id content-type="doi">10.3897/rio.12.e176590.figure4</object-id>
      <label>Figure 4.</label>
      <caption>
        <p>Training time comparison across models. BioBERT and BERT-base-cased completed 5-fold cross-validation training in ~ 10-11 minutes, whereas BiodivBERT and BERT-base-uncased required ~ 35 minutes. These differences likely reflect tokenisation efficiency and convergence patterns rather than model size (all models have ~ 110M parameters). The faster convergence of the cased models correlates with their higher classification accuracy, suggesting that case-preserving tokenisation provides stronger learning signals for this taxonomic text classification task. Training was performed on the NAU Monsoon HPC cluster (Tesla K80 GPU, CUDA 11.4).</p>
      </caption>
      <graphic xlink:href="rio-12-e176590-g004.png" position="float" id="oo_1505874.png" orientation="portrait" xlink:type="simple">
        <uri content-type="original_file">https://binary.pensoft.net/fig/1505874</uri>
      </graphic>
    </fig>
    <table-wrap id="T13584486" position="float" orientation="portrait">
      <label>Table 1.</label>
      <caption>
        <p>BioBERT classification performance on fungal trophic modes (5-fold cross-validation). Precision, recall and F1-score are reported as macro averages (unweighted means across both classes), which is appropriate for balanced binary classification because it treats both classes equally regardless of support.</p>
      </caption>
      <table rules="all" border="1">
        <tbody>
          <tr>
            <td rowspan="1" colspan="1">
              <bold>Metric</bold>
            </td>
            <td rowspan="1" colspan="1">
              <bold>Value</bold>
            </td>
            <td rowspan="1" colspan="1">
              <bold>Note</bold>
            </td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Accuracy</td>
            <td rowspan="1" colspan="1">89.4% ± 11.6%</td>
            <td rowspan="1" colspan="1">Fraction of correctly predicted labels</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Precision</td>
            <td rowspan="1" colspan="1">89.9% ± 11.5%</td>
            <td rowspan="1" colspan="1">Positive predictive value (macro average)</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">Recall</td>
            <td rowspan="1" colspan="1">88.8% ± 12.4%</td>
            <td rowspan="1" colspan="1">True positive rate (macro average)</td>
          </tr>
          <tr>
            <td rowspan="1" colspan="1">F1-Score</td>
            <td rowspan="1" colspan="1">89.2% ± 12.0%</td>
            <td rowspan="1" colspan="1">Harmonic mean of precision and recall (macro average)</td>
          </tr>
        </tbody>
      </table>
    </table-wrap>
  </floats-group>
</article>
