Chemical Compounds and Their Links to Taxa on Wikidata

“Knowledge Graphs for Plant and Microbiome Multiomics” Symposium, Wageningen

Adriano Rutz

Institute for Molecular Systems Biology, ETH Zürich

October 14, 2025

Introduction

Linked open data as a macroscope for -omics

Slide adapted from Pierre-Marie Allard

Taxon-omics

Plant taxonomy would be the most useful guide to man in his search for new industrial and medicinal plants

de Candolle (1816)

Taxon-omics

For more information, see Rutz et al. (2019)

The LOTUS initiative

For more information, see Rutz et al. (2022)

Wikidata

For more information, see Waagmeester et al. (2020)

Triples on Wikidata

flowchart LR
    A(("Subject")) -->|"Predicate"| B(("Object"))

flowchart LR
    A(("Wikidata")) -->|"instance of"| B(("Knowledge base"))

flowchart LR
    A(("wd:Q2013")) -->|"wdt:P31"| B(("wd:Q33002955"))

PREFIX wd: <http://www.wikidata.org/entity/>

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
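The prefixed names in the last diagram are shorthand for full IRIs. A minimal Python sketch of that expansion, using only the two prefixes declared above; the helper name `expand` is hypothetical and not part of any Wikidata tooling:

```python
# Prefixes exactly as declared on the slide.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'wd:Q2013' into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES or not local:
        raise ValueError(f"unknown or malformed prefixed name: {prefixed_name!r}")
    return PREFIXES[prefix] + local


# The triple from the diagram: wd:Q2013 -- wdt:P31 --> wd:Q33002955
triple = ("wd:Q2013", "wdt:P31", "wd:Q33002955")
subject, predicate, obj = (expand(term) for term in triple)
print(subject)    # http://www.wikidata.org/entity/Q2013
print(predicate)  # http://www.wikidata.org/prop/direct/P31
print(obj)        # http://www.wikidata.org/entity/Q33002955
```

A SPARQL endpoint performs this expansion itself; the sketch only makes visible what the PREFIX lines stand for.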

flowchart LR
    A((<img src="https://upload.wikimedia.org/wikipedia/commons/6/66/Wikidata-logo-en.svg"; width="10" />)) -->|"instance of"| B(("Knowledge base"))
    A -->|"Language used"| C((Multilingual))
    D((Privacy policy)) -->|"Language used"| C((Multilingual))
    E((<img src="https://upload.wikimedia.org/wikipedia/commons/2/20/YouTube_2024.svg"; width="10" />)) -->|"Privacy policy URL"| D
    F(("Youtuber")) --> |"Named after"| E
    G((<img src="https://upload.wikimedia.org/wikipedia/commons/6/60/NikkiedeJager2020-2.jpg"; width="10" />)) --> |"Occupation"| F
    G --> |"Place of Birth"| H(("Wageningen"))
    I((<img src="https://upload.wikimedia.org/wikipedia/commons/6/6a/WUR_forum_building.JPG"; width="10" />)) --> |"Headquarters location"| H(("Wageningen"))

SPARQL (chemical compounds)

SELECT ?item WHERE {
  ?item wdt:P31 wd:Q11173 # chemical compound
}

Link to the query: https://w.wiki/Ff2k (on 2025-10-13: 66 results)

SELECT ?item WHERE {
  ?item wdt:P31 wd:Q113145171 # type of chemical entity
}

Link to the query: https://w.wiki/Ff2o (on 2025-10-13: 1,278,986 results)

SPARQL (taxa)

SELECT ?item WHERE {
  ?item wdt:P31 wd:Q16521 # taxon
}

Link to the query: https://w.wiki/Ff35 (on 2025-10-13: 3,794,152 results)

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?ncbi_id ?gbif_id WHERE {
  ?item wdt:P31 wd:Q16521;  # taxon
        wdt:P685 ?ncbi_id.  # NCBI ID
  OPTIONAL { 
    ?item wdt:P846 ?gbif_id # GBIF ID
  }
  MINUS {
    ?item wdt:P9157 ?ott_id # Open Tree of Life ID
  }
}

Link to the query: https://qlever.dev/wikidata/Q8VmpN (on 2025-10-13: 95,112 results)
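To make the roles of OPTIONAL and MINUS in the query above concrete, here is a sketch in Python over a tiny in-memory table. The records and identifier values are invented for illustration only and are not real Wikidata content. The sketch also assumes at most one value per property, which real items do not guarantee.

```python
# Hypothetical taxon records: keys mimic the query variables, values are made up.
taxa = [
    {"item": "taxon_A", "ncbi_id": "111", "gbif_id": "901", "ott_id": None},
    {"item": "taxon_B", "ncbi_id": "222", "gbif_id": None, "ott_id": None},
    {"item": "taxon_C", "ncbi_id": "333", "gbif_id": "903", "ott_id": "77"},
    {"item": "taxon_D", "ncbi_id": None, "gbif_id": "904", "ott_id": None},
]


def select_taxa(records):
    """Mimic the query: NCBI ID required, GBIF ID optional, OTT ID excluded."""
    rows = []
    for rec in records:
        if rec["ncbi_id"] is None:      # required pattern (wdt:P685) must match
            continue
        if rec["ott_id"] is not None:   # MINUS: drop items that have an OTT ID
            continue
        # OPTIONAL: keep the row even when the GBIF ID is missing (None)
        rows.append((rec["item"], rec["ncbi_id"], rec["gbif_id"]))
    return rows


for row in select_taxa(taxa):
    print(row)
# ('taxon_A', '111', '901')
# ('taxon_B', '222', None)
```

Only taxon_A and taxon_B survive: taxon_C is removed by the MINUS step and taxon_D lacks the required NCBI ID.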

SPARQL (P703)

PREFIX p: <http://www.wikidata.org/prop/>
PREFIX pr: <http://www.wikidata.org/prop/reference/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
#title: Referenced compound-taxon pairs on Wikidata.
SELECT DISTINCT ?compound ?compound_inchikey ?taxon ?taxon_name ?reference WHERE {
  ?compound wdt:P235 ?compound_inchikey;     # get the InChIKey
            p:P703 [                         # "found in taxon" statement
              ps:P703 ?taxon;                # get the taxon
              prov:wasDerivedFrom [
                pr:P248 ?reference           # get the reference
              ]
            ].
  ?taxon wdt:P225 ?taxon_name.               # get the taxon scientific name
}

Link to the query: https://w.wiki/FfPJ (on 2025-10-13: 675,781 results)
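SPARQL endpoints can usually return results in the W3C SPARQL 1.1 Query Results JSON Format. A sketch of flattening such a response into plain rows follows; the sample payload is invented (placeholder IRIs and InChIKey), and `rows_from_sparql_json` is a hypothetical helper name.

```python
import json

# Invented payload in the W3C "SPARQL 1.1 Query Results JSON Format";
# the values are placeholders, not real Wikidata results.
sample = json.loads("""
{
  "head": {"vars": ["compound", "compound_inchikey", "taxon", "taxon_name", "reference"]},
  "results": {"bindings": [
    {
      "compound": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_COMPOUND"},
      "compound_inchikey": {"type": "literal", "value": "PLACEHOLDER-INCHIKEY"},
      "taxon": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_TAXON"},
      "taxon_name": {"type": "literal", "value": "Genus species"},
      "reference": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_REFERENCE"}
    }
  ]}
}
""")


def rows_from_sparql_json(payload):
    """Turn SPARQL JSON results into a list of dicts, one per solution."""
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        # A variable may be unbound (e.g. under OPTIONAL); map it to None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


rows = rows_from_sparql_json(sample)
print(rows[0]["taxon_name"])  # Genus species
```

Each row then carries one compound-taxon-reference triple, which is a convenient shape for loading into a table.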

SPARQL (P703) on specific taxon

PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?structure ?structure_smiles ?structure_inchikey WHERE {
  VALUES ?organism {
    wd:Q43084 # Piper nigrum
  }
  ?organism_child (wdt:P171*) ?organism.        # the taxon itself and all its descendants
  ?structure wdt:P233 ?structure_smiles;        # SMILES
             wdt:P235 ?structure_inchikey;      # InChIKey
             (p:P703/ps:P703) ?organism_child.  # found in taxon
}

Link to the query: https://w.wiki/FfPL (on 2025-10-13: 433 results)

SPARQL (P703) on specific taxon

PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?structure ?structure_smiles ?structure_inchikey WHERE {
  VALUES ?organism {
    wd:Q156522 # Piperaceae
  }
  ?organism_child (wdt:P171*) ?organism.        # the taxon itself and all its descendants
  ?structure wdt:P233 ?structure_smiles;        # SMILES
             wdt:P235 ?structure_inchikey;      # InChIKey
             (p:P703/ps:P703) ?organism_child.  # found in taxon
}

Link to the query: https://w.wiki/FfPR (on 2025-10-13: 1,872 results)

SPARQL (P703) on specific taxon

PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?structure ?structure_smiles ?structure_inchikey WHERE {
  VALUES ?organism {
    wd:Q846071 # magnoliids
  }
  ?organism_child (wdt:P171*) ?organism.        # the taxon itself and all its descendants
  ?structure wdt:P233 ?structure_smiles;        # SMILES
             wdt:P235 ?structure_inchikey;      # InChIKey
             (p:P703/ps:P703) ?organism_child.  # found in taxon
}

Link to the query: https://w.wiki/FfPWL (on 2025-10-13: 9,774 results)
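The three queries above differ only in the QID placed in the VALUES clause. A sketch of a small Python helper that fills that slot; `build_taxon_query` and the `__QID__` marker are hypothetical names, and the validation assumes that Wikidata item identifiers have the shape `Q` followed by digits.

```python
import re

# Query from the slides (whitespace normalized), with the QID as a placeholder.
QUERY_TEMPLATE = """\
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?structure ?structure_smiles ?structure_inchikey WHERE {
  VALUES ?organism { wd:__QID__ }
  ?organism_child (wdt:P171*) ?organism.
  ?structure wdt:P233 ?structure_smiles;
             wdt:P235 ?structure_inchikey;
             (p:P703/ps:P703) ?organism_child.
}
"""


def build_taxon_query(qid: str) -> str:
    """Return the compound-in-taxon query for one taxon QID."""
    # Reject anything that is not a plain QID before it reaches the query text.
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    return QUERY_TEMPLATE.replace("__QID__", qid)


# QIDs taken from the slides: Piper nigrum, Piperaceae, magnoliids.
for qid in ("Q43084", "Q156522", "Q846071"):
    query = build_taxon_query(qid)
    print(qid, f"wd:{qid}" in query)
```

Moving up the taxonomy (species, family, clade) is then a one-argument change, which mirrors how the result counts grow across the three slides.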

Community curation

Community gadgets

Author display: https://www.wikidata.org/wiki/User:Ricordisamoa/WikidataTrust.js

Chemical depiction: https://www.wikidata.org/wiki/User:Egon_Willighagen/cdkdepict_gadget.js

Federation

PREFIX p: <http://www.wikidata.org/prop/>
PREFIX pr: <http://www.wikidata.org/prop/reference/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
#title: Pigments found in taxa, with reference.
# special thanks go to User:Lmichan for updating this information!
SELECT DISTINCT ?compound ?compoundLabel ?taxon ?taxonname ?DOI WHERE {
  ?compound p:P703 ?P703statement;                   # "found in taxon" statement
            ((wdt:P31*)/(wdt:P279*)) wd:Q161179.     # instance or subclass, at any depth
  ?P703statement ps:P703 ?taxon;                     # get the taxon
                 (prov:wasDerivedFrom/pr:P248) ?ref. # get the reference
  SERVICE <https://query-scholarly.wikidata.org/sparql> {
    ?ref wdt:P356 ?DOI.                              # DOI, fetched from the scholarly endpoint
  }
  ?taxon wdt:P225 ?taxonname.                        # get the taxon scientific name
  ?compound rdfs:label ?compoundLabel.
  FILTER((LANG(?compoundLabel)) = "en")
}
ORDER BY (?compoundLabel)
LIMIT 10000

Link to the query: https://lotus.nprod.net/lotus-sparql-examples/examples/NPs/wd_nps_known_pigments_scholarly_subgraph.html (on 2025-10-13: 6,099 results)

Linked open data has (almost) no limits

Acknowledgements

References

Bolleman, Jerven, Vincent Emonet, Adrian Altenhoff, Amos Bairoch, Marie-Claude Blatter, Alan Bridge, Séverine Duvaud, et al. 2025. “A Large Collection of Bioinformatics Question–Query Pairs over Federated Knowledge Graphs: Methodology and Applications.” GigaScience 14. https://doi.org/10.1093/gigascience/giaf045.
de Candolle, Augustin Pyramus. 1816. Essai Sur Les Propriétés Médicales Des Plantes, Comparées Avec Leurs Formes Extérieures Et Leur Classification Naturelle. Chez Crochard, Libraire. https://doi.org/10.5962/bhl.title.112422.
Galgonek, Jakub, and Jiří Vondrášek. 2021. “IDSM ChemWebRDF: SPARQLing Small-Molecule Datasets.” Journal of Cheminformatics 13 (1). https://doi.org/10.1186/s13321-021-00515-1.
———. 2023. “A Comparison of Approaches to Accessing Existing Biological and Chemical Relational Databases via SPARQL.” Journal of Cheminformatics 15 (1). https://doi.org/10.1186/s13321-023-00729-5.
———. 2024. “The IDSM Mass Spectrometry Extension: Searching Mass Spectra Using SPARQL.” Edited by Peter Robinson. Bioinformatics 40 (4). https://doi.org/10.1093/bioinformatics/btae174.
Jose, Polpass Arul, Anjisha Maharshi, and Bhavanath Jha. 2021. “Actinobacteria in Natural Products Research: Progress and Prospects.” Microbiological Research 246 (May): 126708. https://doi.org/10.1016/j.micres.2021.126708.
Kratochvíl, Miroslav, Jiří Vondrášek, and Jakub Galgonek. 2018. “Sachem: A Chemical Cartridge for High-Performance Substructure Search.” Journal of Cheminformatics 10 (1). https://doi.org/10.1186/s13321-018-0282-y.
Rutz, Adriano, Miwa Dounoue-Kubo, Simon Ollivier, Jonathan Bisson, Mohsen Bagheri, Tongchai Saesong, Samad Nejad Ebrahimi, Kornkanok Ingkaninan, Jean-Luc Wolfender, and Pierre-Marie Allard. 2019. “Taxonomically Informed Scoring Enhances Confidence in Natural Products Annotation.” Frontiers in Plant Science 10 (October). https://doi.org/10.3389/fpls.2019.01329.
Rutz, Adriano, Maria Sorokina, Jakub Galgonek, Daniel Mietchen, Egon Willighagen, Arnaud Gaudry, James G Graham, et al. 2022. “The LOTUS Initiative for Open Knowledge Management in Natural Products Research.” eLife 11 (May). https://doi.org/10.7554/elife.70780.
Waagmeester, Andra, Gregory Stupp, Sebastian Burgstaller-Muehlbacher, Benjamin M Good, Malachi Griffith, Obi L Griffith, Kristina Hanspers, et al. 2020. “Wikidata as a Knowledge Graph for the Life Sciences.” eLife 9 (March). https://doi.org/10.7554/elife.52614.
Willighagen, Egon, Denise Slenter, Adriano Rutz, Daniel Mietchen, and Finn Nielsen. 2025. “Scholia Chemistry: Access to Chemistry in Wikidata,” May. https://doi.org/10.26434/chemrxiv-2025-53n0w.