...
Data | Archive size | Unpacked data size | Description | Data source |
---|---|---|---|---|
NCBI taxonomy classification | 2.5 Gb | 31 Gb | A set of taxonomy data files from NCBI. These data should be present for any type of NGS classification analysis. | The original data were downloaded from the NCBI FTP (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/). |
NCBI RefSeq bacterial genomes | 130 Gb | 132 Gb | The data can be used to build a database for CLARK-l (the light version of CLARK), CLARK, or Kraken. Because UGENE integrates a modified version of CLARK/CLARK-l, *.gz archives can be provided directly as input when building the database. In particular, "CLARK-l DB: RefSeq bacterial+viral genomes" (see below) was generated using the archived data. Also keep in mind that changing some parameters of the "Classify Sequences with CLARK" element may cause the reference database to be rebuilt, in which case the reference data should be present. Building a Kraken database from *.gz archives is not supported: each *.gz file must be unpacked first, so even more disk space is required. Note that the data were used to build | The original data were downloaded from the NCBI FTP (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/bacteria/bacteria*.genomic.fna.gz). |
NCBI RefSeq viral genomes | 77 Mb | 77 Mb | Similar to "NCBI RefSeq bacterial genomes", although the data are rather small. The reference data are included in the "CLARK-l DB: RefSeq bacterial+viral genomes" and "CLARK-l DB: RefSeq viral genomes" databases (see below). | The original data were downloaded from the NCBI FTP (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral*.genomic.fna.gz). |
NCBI RefSeq GRCh38 human genome | 837 Mb | 838 Mb | Similar to "NCBI RefSeq bacterial genomes". The data are not included in any database but are provided in case you would like to use them when building a custom database. | The original data were downloaded from the NCBI FTP (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_*/hs_ref_GRC*chr*.fa.gz). |
Kraken DB: MiniKraken 4Gb database | 2.5 Gb | 4.3 Gb | | |
CLARK-l DB: RefSeq bacterial+viral genomes | 7.4 Gb | 11 Gb | | |
CLARK-l DB: RefSeq viral genomes | 16 Mb | 72 Mb | | |
DIAMOND DB: UniRef50 | 5.2 Gb | 13 Gb | | |
DIAMOND DB: UniRef90 | 13 Gb | 34 Gb | | |
Total: | 161 Gb | 226 Gb |
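
The description of "NCBI RefSeq bacterial genomes" notes that each *.gz file has to be unpacked before a Kraken database can be built. One way to do this is sketched below in Python, using only the standard library. The function name `unpack_gz_archives` and the directory names in the usage line are hypothetical and are not part of UGENE or Kraken. This is only an illustration of the unpacking step, not a required procedure.

```python
import gzip
import shutil
from pathlib import Path


def unpack_gz_archives(source_dir, target_dir):
    """Decompress every *.gz file in source_dir into target_dir.

    Returns the list of unpacked file paths. The original archives are
    left in place, so the unpacked copies need additional disk space.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    unpacked = []
    for archive in sorted(source.glob("*.gz")):
        # Drop only the final ".gz" suffix, e.g.
        # "bacteria.1.genomic.fna.gz" -> "bacteria.1.genomic.fna"
        destination = target / archive.stem
        with gzip.open(archive, "rb") as compressed, open(destination, "wb") as plain:
            # Stream the data so large genome files are never held in memory.
            shutil.copyfileobj(compressed, plain)
        unpacked.append(destination)
    return unpacked
```

A call such as `unpack_gz_archives("refseq_bacteria", "refseq_bacteria_unpacked")` (both directory names are placeholders) would write the unpacked FASTA files next to the archives. Because the archives are kept, check that the disk has room for both the archived and the unpacked data before running it.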
...