Strongyloides genotypes (Barratt and Sapp, 2020)
publication ID |
https://doi.org/10.1016/j.ijppaw.2023.02.003 |
persistent identifier |
https://treatment.plazi.org/id/03CC87A7-2138-FB1A-4D18-FCC8FF5AFF21 |
treatment provided by |
Felipe |
scientific name |
Strongyloides genotypes |
2.7. Assigning haplotype names for construction of genotypes
To characterize the genotypes from our reference isolates (and from the St. Kitts vervets), all cox1, 18S HVR-I, and HVR-IV sequences were assigned a haplotype name following earlier haplotype naming conventions developed for Strongyloides sp. (Jaleta et al., 2017; Barratt et al., 2019a; Barratt and Sapp, 2020), with some modifications for cox1 (discussed in a later section). Haplotype names were assigned by BLASTN comparison against the FASTA sequences provided in File S2. These BLASTN searches were executed using the Geneious Prime interface, requiring hits of 100% sequence identity. BLASTN results were exported from Geneious in text format, where each text file contained the list of haplotype names detected in a given isolate. These text files were used to construct a haplotype data sheet (HDS), a condensed format for representing haplotype data (File S1, Tab B and Tab D) that is the required input format for computation of genetic distances using Barratt’s heuristic (Barratt et al., 2021; Jacobson et al., 2022); see https://github.com/Joel-Barratt/Eukaryotyping. In all, the HDS contained 30 genotypes from S. stercoralis and 18 genotypes from the loris-derived Strongyloides sp., plus 191 published S. fuelleborni genotypes and 48 S. fuelleborni genotypes generated here from St. Kitts vervets (see results).
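The naming step can be illustrated with a minimal Python sketch. It is not the Geneious/BLASTN workflow used in the study: exact string matching over full-length sequences stands in for the 100% identity requirement, and the presence/absence table is only an assumed, simplified stand-in for the HDS layout, which is defined in File S1 and the Eukaryotyping repository. The function names and the haplotype and isolate labels are hypothetical.

```python
def assign_haplotypes(reference, isolate_seqs):
    """Return {isolate: sorted haplotype names} using exact full-length matches.

    reference: {haplotype_name: sequence} (hypothetical stand-in for File S2).
    isolate_seqs: {isolate_name: [sequence, ...]}.
    Exact matching is an assumption that approximates a 100% identity filter.
    """
    seq_to_name = {seq.upper(): name for name, seq in reference.items()}
    calls = {isolate: set() for isolate in isolate_seqs}
    for isolate, seqs in isolate_seqs.items():
        for seq in seqs:
            name = seq_to_name.get(seq.upper())
            if name is not None:  # sequences without an exact match stay unnamed
                calls[isolate].add(name)
    return {isolate: sorted(names) for isolate, names in calls.items()}


def build_presence_table(calls):
    """Condense per-isolate haplotype lists into a presence/absence table.

    Returns (haplotype_columns, {isolate: [0/1, ...]}). This layout is an
    illustrative assumption, not the actual HDS specification.
    """
    columns = sorted({name for names in calls.values() for name in names})
    rows = {
        isolate: [1 if col in names else 0 for col in columns]
        for isolate, names in calls.items()
    }
    return columns, rows


if __name__ == "__main__":
    reference = {"hapA": "ACGT", "hapB": "ACGA"}  # toy sequences
    isolates = {"iso1": ["acgt", "ACGA"], "iso2": ["ACGG"]}
    calls = assign_haplotypes(reference, isolates)
    columns, rows = build_presence_table(calls)
    print(columns)
    for isolate, row in rows.items():
        print(isolate, row)
```

In this sketch an isolate with no exact match keeps an all-zero row rather than being dropped, so missing data stays visible to whatever step follows.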
2.8. Establishing minimum data requirements and cox1 haplotype definitions
Barratt’s heuristic was used to compute pairwise genetic distances for phylogenetic tree construction (Barratt et al., 2019b; Nascimento et al., 2020; Jacobson et al., 2022). Barratt’s heuristic was used because it can compute distances for datasets comprising isolates sequenced at different but overlapping combinations of markers. Many of the genotypes examined here were sequenced as part of separate, unrelated studies, so the markers sequenced were not always the same. Barratt’s heuristic accommodates such datasets by imputing missing distance values when comparing isolates that have not been sequenced at the same loci (Jacobson et al., 2022). This method was designed for large datasets, and its imputations become more tenuous as the number of isolates with mismatched markers increases within a dataset and/or as the number of shared loci sequenced between a given isolate pair decreases (Barratt and Sapp, 2020; Jacobson et al., 2022). Consequently, realistic minimum data requirements and maximum limits on the number of markers for which imputation is attempted must be established prior to analysis (Jacobson et al., 2022). A detailed description of how these minimum data requirements and maximum imputation limits were established is provided in Supplementary File S1, Appendix A.
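The idea of a minimum data requirement can be illustrated with a short Python sketch that audits how many markers each isolate pair shares. This is not Barratt’s heuristic and performs no imputation. It only counts shared loci and flags pairs that fall below a threshold. The `min_shared` parameter, the function names, and the toy genotypes are hypothetical; the thresholds actually used are those described in Supplementary File S1, Appendix A.

```python
from itertools import combinations

# Markers named in the text; treating them as a fixed panel is an assumption.
MARKERS = ("cox1", "HVR-I", "HVR-IV")


def sequenced_markers(genotype):
    """Return the set of markers with at least one haplotype call."""
    return {marker for marker in MARKERS if genotype.get(marker)}


def audit_pairs(genotypes, min_shared=1):
    """Summarize marker overlap for every isolate pair.

    genotypes: {isolate: {marker: [haplotype names]}}.
    min_shared: hypothetical minimum number of shared markers per pair.
    """
    report = {}
    for a, b in combinations(sorted(genotypes), 2):
        shared = sequenced_markers(genotypes[a]) & sequenced_markers(genotypes[b])
        not_shared = set(MARKERS) - shared
        report[(a, b)] = {
            "shared": sorted(shared),
            # markers at which a direct comparison is not possible for this pair
            "to_impute": sorted(not_shared),
            "meets_minimum": len(shared) >= min_shared,
        }
    return report


if __name__ == "__main__":
    toy = {
        "A": {"cox1": ["c1"], "HVR-I": ["h1"]},
        "B": {"cox1": ["c2"], "HVR-IV": ["v1"]},
        "C": {"HVR-IV": ["v2"]},
    }
    for pair, summary in audit_pairs(toy).items():
        print(pair, summary)
```

Here isolates A and C share no marker, so any distance between them would rest entirely on imputation. This is the kind of pair that a minimum data requirement is meant to catch.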
No known copyright restrictions apply. See Agosti, D., Egloff, W., 2009. Taxonomic information exchange and copyright: the Plazi approach. BMC Research Notes 2009, 2:53 for further explanation.