Table 1 Tentacle applied to three use cases. All use cases used the same read data consisting of 1,238,598,682 reads with a total size of 407 GiB in compressed FASTQ (2,213 GiB uncompressed) [6]. The examples were run on 30 nodes, with the cluster system login node hosting the master process. The following options were used, pBLAT: -threads=16 -minIdentity=90 -out=blast8; GEM: -T 16 -m 0.04 -e 0.04 --min-matched-bases 0.80 --granularity 2500000; USEARCH: -usearch_local -query_cov 1.0 -id 0.9 -blast6out

From: Tentacle: distributed quantification of genes in metagenomes

| Use case | 1 | 2 | 3 |
| --- | --- | --- | --- |
| | Reads mapped to their contigs | Reads mapped to large DB | Reads mapped to peptide DB |
| Mapper | pBLAT | GEM | USEARCH |
| Type of reference | Per sample contigs (nucleotide) [6] | BGI Refseq geneset (nucleotide) [6] | Resqu; antibiotic resistance gene database (peptide) [53] |
| Reference size (bytes) | approx. 160 MiB per sample | 3.0 GiB | 1 MiB |
| Reference size (sequences) | 6,589,348 | 3,305,138 | 3,019 |
| Runtime (core hours) | 720 | 3,072 | 296 |
| Runtime (wall-clock) | 1h 30m | 6h 24m | 0h 37m |
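
The two runtime rows of Table 1 appear to be related by simple arithmetic: core hours = wall-clock hours × nodes × cores per node. The sketch below checks this, taking the 30 nodes from the caption and assuming 16 cores per node. That figure is an inference from the `-threads=16` and `-T 16` options, not something the table states, and the helper name `core_hours` is purely illustrative.

```python
from fractions import Fraction

NODES = 30           # stated in the table caption
CORES_PER_NODE = 16  # assumption: inferred from -threads=16 / -T 16


def core_hours(hours, minutes, nodes=NODES, cores=CORES_PER_NODE):
    """Convert a wall-clock runtime into core hours (exact arithmetic)."""
    wall_clock = Fraction(hours) + Fraction(minutes, 60)
    return wall_clock * nodes * cores


# Wall-clock runtimes (hours, minutes) as listed in Table 1
runs = {"pBLAT": (1, 30), "GEM": (6, 24), "USEARCH": (0, 37)}

for mapper, (h, m) in runs.items():
    print(f"{mapper}: {float(core_hours(h, m)):.0f} core hours")
# pBLAT: 720 core hours
# GEM: 3072 core hours
# USEARCH: 296 core hours
```

Under that assumption the computed values match the core-hour row for all three use cases, which suggests the core hours were derived from the wall-clock time rather than measured per core.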