Title: | Taxonomic Name Standardization |
---|---|
Description: | Matches species names to a taxonomic standard. Resolves synonyms consistently and reproducibly. |
Authors: | Joel Nitta [aut, cre] |
Maintainer: | Joel Nitta <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2024-11-20 04:05:22 UTC |
Source: | https://github.com/joelnitta/taxastand |
A dataset containing taxonomic names and associated metadata for the fern family Hymenophyllaceae. Downloaded from the Catalogue of Life, Version 1.5. All columns are formatted according to the Darwin Core standard. Only includes taxa at the species or infraspecies level.
filmy_taxonomy
A data frame with 2729 rows and 31 variables.
http://www.catalogueoflife.org/
Allows for orthographic differences between query and reference by using fuzzy matching on parsed taxonomic names. Requires taxon-tools to be installed.
ts_match_names( query, reference, max_dist = 10, match_no_auth = FALSE, match_canon = FALSE, collapse_infra = FALSE, collapse_infra_exclude = NULL, simple = FALSE, docker = getOption("ts_docker", default = FALSE), tbl_out = getOption("ts_tbl_out", default = FALSE) )
query |
Character vector or dataframe; taxonomic names to be queried.
If a character vector, missing values not allowed and all values must be unique.
If a dataframe, should be taxonomic names parsed with |
reference |
Character vector or dataframe; taxonomic names to use as reference.
If a character vector, missing values not allowed and all values must be unique.
If a dataframe, should be taxonomic names parsed with |
max_dist |
Max Levenshtein distance to allow during fuzzy matching (total insertions, deletions and substitutions). Default: 10. |
match_no_auth |
Logical; If no author is given in the query and the name (without author)
occurs only once in the reference, accept the name in the reference as a match.
Default: to not allow such a match ( |
match_canon |
Logical; Allow a "canonical name" match if only the genus, species epithet,
and infraspecific epithet (if present) match exactly. Default: to not allow such a match ( |
collapse_infra |
Logical; if the specific epithet and infraspecific epithet are the same, drop the infraspecific rank and epithet from the query. |
collapse_infra_exclude |
Character vector; taxonomic names to exclude
from collapsing with |
simple |
Logical; return the output in a simplified format with only the query
name, matched reference name, and match type. Default: |
docker |
Logical; if TRUE, docker will be used to run taxon-tools (so that taxon-tools need not be installed). |
tbl_out |
Logical vector of length 1; should a tibble be returned?
If |
taxon-tools matches names in two steps:
Scientific names are parsed into their component parts (genus, species, variety, author, etc.).
Names are fuzzily matched following taxonomic rules using the component parts.
For more information on rules used for matching, see the taxon-tools manual.
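To give a feel for what max_dist measures, the sketch below computes plain Levenshtein distances with utils::adist() from base R. This is only an illustration on whole name strings; taxon-tools applies its own rules to the parsed components, so its results may differ from a whole-string comparison.

```r
# Illustration only: whole-string Levenshtein distance with base R.
# taxon-tools matches on parsed name parts, not on the raw string.
query <- "Crepidomanes minutus"
reference <- c("Crepidomanes minutum", "Hymenophyllum polyanthos")

# adist() returns a matrix (queries x references); drop() flattens it
distances <- drop(utils::adist(query, reference))
names(distances) <- reference
distances

# Candidates that fall within a given distance threshold
max_dist <- 10
reference[distances <= max_dist]
```

The first distance is 1, because "minutus" and "minutum" differ by a single substitution.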
Parsing is fairly fast (much faster than matching) but can take some time if the number of names is very large. If multiple queries will be made (e.g., to the same large reference database), it is recommended to first parse the names using ts_parse_names(), and use the results as input to query and/or reference.
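The recommendation above could look roughly like the following sketch: the reference is parsed once and the parsed dataframe is reused for several queries. The query batches and the misspelled name "Hymenophyllum polyanthus" are made up for illustration, and the code only runs the matching if taxastand and taxon-tools are available.

```r
# Sketch: parse the reference once, then reuse it for several queries.
ref_names <- c("Crepidomanes minutum", "Hymenophyllum polyanthos")

# Hypothetical query batches (names invented for this example)
queries <- list(
  batch_1 = "Crepidomanes minutus",
  batch_2 = "Hymenophyllum polyanthus"
)

results <- list()
if (requireNamespace("taxastand", quietly = TRUE) &&
    taxastand::ts_tt_installed()) {
  # Parse the reference a single time
  ref_parsed <- taxastand::ts_parse_names(ref_names)
  # Pass the parsed dataframe as `reference` for each batch
  results <- lapply(queries, function(q) {
    taxastand::ts_match_names(q, ref_parsed, simple = TRUE)
  })
}
results
```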
collapse_infra is useful in situations where the reference database does not use names that have the same specific epithet and infraspecific epithet. For example, reference name "Blechnum lunare" and query "Blechnum lunare var. lunare". In this case, if collapse_infra is TRUE, "Blechnum lunare" will be queried instead of "Blechnum lunare var. lunare". Note that the match_type will be "exact" even though the literal query and the matched name are different (see example below).
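The effect of collapse_infra and collapse_infra_exclude on the query can be pictured with a small base R sketch. The helper collapse_infra_sketch() is hypothetical and is not how taxastand or taxon-tools implement this; it assumes names without authors and only recognizes the ranks "var.", "subsp." and "f.".

```r
# Hypothetical helper, for illustration only (not the package's code).
# Assumes author-free names and the ranks var., subsp. and f.
collapse_infra_sketch <- function(taxa, exclude = NULL) {
  # Match "Genus epithet rank epithet" where both epithets are identical
  pattern <- "^(\\S+) (\\S+) (?:var\\.|subsp\\.|f\\.) \\2$"
  collapsed <- sub(pattern, "\\1 \\2", taxa, perl = TRUE)
  # Names listed in `exclude` are left untouched
  keep <- taxa %in% exclude
  collapsed[keep] <- taxa[keep]
  collapsed
}

collapsed_names <- collapse_infra_sketch(
  c("Crepidomanes minutus", "Blechnum lunare var. lunare", "Bar foo var. foo"),
  exclude = "Bar foo var. foo"
)
collapsed_names
```

Here "Blechnum lunare var. lunare" becomes "Blechnum lunare", while the excluded "Bar foo var. foo" and the plain species name are returned unchanged.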
Dataframe with the following columns (if simple is FALSE):
query: Query name
reference: Matched reference name
match_type: Type of match (for a summary of match types, see taxon-tools manual)
id_query: Unique ID of query
id_ref: Unique ID of reference
genus_hybrid_sign_query: Genus hybrid sign in query
genus_name_query: Genus name of query
species_hybrid_sign_query: Species hybrid sign in query
specific_epithet_query: Specific epithet of query
infraspecific_rank_query: Infraspecific rank of query
infraspecific_epithet_query: Infraspecific epithet of query
author_query: Taxonomic author of query
genus_hybrid_sign_ref: Genus hybrid sign in reference
genus_name_ref: Genus name of reference
species_hybrid_sign_ref: Species hybrid sign in reference
specific_epithet_ref: Specific epithet of reference
infraspecific_rank_ref: Infraspecific rank of reference
infraspecific_epithet_ref: Infraspecific epithet of reference
author_ref: Taxonomic author of reference
If simple is TRUE, only return the first three columns above.
if (ts_tt_installed()) {

  ts_match_names(
    "Crepidomanes minutus",
    c("Crepidomanes minutum", "Hymenophyllum polyanthos"),
    simple = TRUE
  )

  # If you always want tibble output without specifying `tbl_out = TRUE`
  # every time, set the option:
  options(ts_tbl_out = TRUE)
  ts_match_names(
    "Crepidomanes minutus",
    c("Crepidomanes minutum", "Hymenophyllum polyanthos")
  )

  # Example using collapse_infra argument
  ts_match_names(
    c("Crepidomanes minutus", "Blechnum lunare var. lunare",
      "Blechnum lunare", "Bar foo var. foo", "Bar foo"),
    c("Crepidomanes minutum", "Hymenophyllum polyanthos",
      "Blechnum lunare", "Bar foo"),
    collapse_infra = TRUE,
    collapse_infra_exclude = "Bar foo var. foo",
    simple = TRUE
  )

}
Requires taxon-tools or docker to be installed.
ts_parse_names( taxa, tbl_out = getOption("ts_tbl_out", default = FALSE), quiet = FALSE, docker = getOption("ts_docker", default = FALSE) )
taxa |
Character vector; taxon names to be parsed by taxon-tools |
tbl_out |
Logical vector of length 1; should a tibble be returned?
If |
quiet |
Logical; if TRUE, suppress warning messages that would normally be issued |
docker |
Logical; if TRUE, docker will be used to run taxon-tools (so that taxon-tools need not be installed). |
Parses scientific names into their component parts (genus, species, variety, author, etc.).
A dataframe including the following columns.
id: A unique ID number assigned to the input name
name: The input name
genus_hybrid_sign: Hybrid sign for genus
genus_name: Genus name
species_hybrid_sign: Hybrid sign for species
specific_epithet: Specific epithet (name)
infraspecific_rank: Infraspecific rank
infraspecific_epithet: Infraspecific epithet (name)
author: Taxonomic author of the name
# Using local taxon-tools installation
if (ts_tt_installed()) {

  ts_parse_names("Foogenus x barspecies var. foosubsp (L.) F. Bar")
  ts_parse_names(
    "Foogenus x barspecies var. foosubsp (L.) F. Bar", tbl_out = TRUE)

  # If you always want tibble output without specifying `tbl_out = TRUE`
  # every time, set the option:
  options(ts_tbl_out = TRUE)
  ts_parse_names("Foogenus x barspecies var. foosubsp (L.) F. Bar")
  ts_parse_names("Crepidomanes minutum (Blume) K. Iwats.")

}

# Using docker
if (babelwhale::test_docker_installation()) {
  ts_parse_names(
    "Foogenus x barspecies var. foosubsp (L.) F. Bar",
    docker = TRUE)
}
After matching taxonomic names to a reference, some may match synonyms. This function resolves synonyms to their accepted names.
ts_resolve_names( query, ref_taxonomy, max_dist = 10, match_no_auth = FALSE, match_canon = FALSE, collapse_infra = FALSE, collapse_infra_exclude = NULL, docker = getOption("ts_docker", default = FALSE), tbl_out = getOption("ts_tbl_out", default = FALSE) )
query |
Character vector or dataframe; taxonomic names to be resolved.
If a character vector, missing values not allowed and all values must be
unique. If a dataframe, should be taxonomic names matched with
|
ref_taxonomy |
Dataframe; reference taxonomic data adhering to the Darwin Core standard with the following columns:
|
max_dist |
Max Levenshtein distance to allow during fuzzy matching (total insertions, deletions and substitutions). Default: 10. |
match_no_auth |
Logical; If no author is given in the query and the name (without author)
occurs only once in the reference, accept the name in the reference as a match.
Default: to not allow such a match ( |
match_canon |
Logical; Allow a "canonical name" match if only the genus, species epithet,
and infraspecific epithet (if present) match exactly. Default: to not allow such a match ( |
collapse_infra |
Logical; if the specific epithet and infraspecific epithet
are the same, drop the infraspecific rank and epithet from the query. For more
information, see |
collapse_infra_exclude |
Character vector; taxonomic names to exclude
from collapsing with |
docker |
Logical; if TRUE, docker will be used to run taxon-tools (so that taxon-tools need not be installed). |
tbl_out |
Logical vector of length 1; should a tibble be returned?
If |
query can take as input either a character vector of taxonomic names, or the output of ts_match_names(). If the former, it will run ts_match_names() to match the query to ref_taxonomy, then resolve synonyms. If the latter, the scientific names in ref_taxonomy should be the same as those used as reference with ts_match_names() (this is not checked).
ref_taxonomy must be taxonomic data adhering to the Darwin Core standard. Darwin Core includes many terms, but only four (taxonID, acceptedNameUsageID, taxonomicStatus, and scientificName) are required for this function.
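A minimal ref_taxonomy with only the four required columns might look like the sketch below, together with a base R illustration of how a synonym points to its accepted name through acceptedNameUsageID. The taxa, IDs, and status values ("accepted", "synonym") are invented for illustration, and resolve_sketch() is a hypothetical helper that assumes an exact match on scientificName; it is not the logic used by ts_resolve_names().

```r
# Hypothetical two-row reference with the four required Darwin Core terms
ref <- data.frame(
  taxonID = c("1", "2"),
  acceptedNameUsageID = c(NA, "1"),
  taxonomicStatus = c("accepted", "synonym"),
  scientificName = c("Foogenus barspecies L.",
                     "Bargenus barspecies (L.) F. Bar"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow acceptedNameUsageID to the accepted name.
# Assumes the matched name is identical to a scientificName in `ref`.
resolve_sketch <- function(matched_name, ref) {
  hit <- ref[match(matched_name, ref$scientificName), ]
  # Accepted names may have no acceptedNameUsageID; fall back to taxonID
  accepted_id <- ifelse(
    is.na(hit$acceptedNameUsageID), hit$taxonID, hit$acceptedNameUsageID
  )
  ref$scientificName[match(accepted_id, ref$taxonID)]
}

resolved <- resolve_sketch(
  c("Bargenus barspecies (L.) F. Bar", "Foogenus barspecies L.", "Not in ref"),
  ref
)
resolved
```

Both the synonym and the accepted name resolve to "Foogenus barspecies L.", and the name absent from the reference yields NA.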
Dataframe; results of resolving synonyms in matched taxonomic names. Includes the following columns:
query: Query name
resolved_name: Accepted name after resolving synonyms
matched_name: Name matched to query
resolved_status: Taxonomic status of the resolved name (same as taxonomicStatus in ref_taxonomy)
matched_status: Taxonomic status of the matched name (same as taxonomicStatus in ref_taxonomy)
match_type: Type of match (for a summary of match types, see taxon-tools manual)
Names that could not be matched or that resolve to multiple, different synonyms have NA for resolved_name.
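Because unresolved names are flagged with NA in resolved_name, they can be pulled out for manual review with ordinary subsetting. The sketch below assumes taxastand and taxon-tools are available; the second query name, "Foogenus barspecies", is a made-up placeholder, and no particular result is implied.

```r
# Sketch: collect queries whose resolved_name is NA for manual review.
unresolved <- character(0)
if (requireNamespace("taxastand", quietly = TRUE) &&
    taxastand::ts_tt_installed()) {
  data("filmy_taxonomy", package = "taxastand")
  res <- taxastand::ts_resolve_names(
    c("Gonocormus minutum", "Foogenus barspecies"),  # second name is invented
    filmy_taxonomy
  )
  # Rows with NA in resolved_name were not matched or were ambiguous
  unresolved <- res$query[is.na(res$resolved_name)]
}
unresolved
```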
if (ts_tt_installed()) {

  # Load reference taxonomy in Darwin Core format
  data(filmy_taxonomy)

  ts_resolve_names("Gonocormus minutum", filmy_taxonomy)

  # If you always want tibble output without specifying `tbl_out = TRUE`
  # every time, set the option:
  options(ts_tbl_out = TRUE)
  ts_resolve_names("Gonocormus minutum", filmy_taxonomy)

}
Test if taxon-tools is installed
ts_tt_installed()
TRUE if taxon-tools is installed, or FALSE if not.
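One possible use of this check is to decide at the start of a session whether to run taxon-tools locally or through docker. The sketch below sets the ts_docker option that appears as the default for the docker argument; the backend variable is introduced here only for illustration.

```r
# Sketch: prefer a local taxon-tools install, fall back to docker.
backend <- "none"
if (requireNamespace("taxastand", quietly = TRUE)) {
  if (taxastand::ts_tt_installed()) {
    backend <- "local"
  } else if (requireNamespace("babelwhale", quietly = TRUE) &&
             isTRUE(babelwhale::test_docker_installation())) {
    # Make docker the default for functions with a `docker` argument
    options(ts_docker = TRUE)
    backend <- "docker"
  }
}
backend
```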
ts_tt_installed()
Write out parsed names to a text file
ts_write_names(df, path)
df |
Dataframe with parsed names |
path |
Path to write dataframe |
Writes out parsed names in a format that can be used by taxon-tools (each part of the scientific name is separated by the pipe symbol (|), with one name per line).
Path to parsed names
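The pipe-separated layout described above can be imitated in base R, as in the sketch below. The data frame is built by hand rather than by ts_parse_names(), the id value is invented, and the column order is an assumption taken from the column list documented for ts_parse_names() (without the name column); the exact field order expected by taxon-tools is not confirmed here.

```r
# Hand-built stand-in for one parsed name (id is a made-up value).
# Column order is an assumption, not confirmed taxon-tools format.
parsed <- data.frame(
  id = "x1",
  genus_hybrid_sign = "",
  genus_name = "Crepidomanes",
  species_hybrid_sign = "",
  specific_epithet = "minutum",
  infraspecific_rank = "",
  infraspecific_epithet = "",
  author = "(Blume) K. Iwats.",
  stringsAsFactors = FALSE
)

# Join the fields of each row with "|", giving one name per line
pipe_lines <- do.call(paste, c(parsed, sep = "|"))

tmp <- tempfile()
writeLines(pipe_lines, tmp)
written <- readLines(tmp)
file.remove(tmp)
written
```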
if (ts_tt_installed()) {
  parsed_names <- ts_parse_names(
    "Foogenus x barspecies var. foosubsp (L.) F. Bar")
  temp_file <- tempfile()
  ts_write_names(parsed_names, temp_file)
  readLines(temp_file)
  file.remove(temp_file)
}