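Scientific Data Submission for Special Projects on Basic Science and Technology Work. The Science and Technology Project Data Submission Service Platform is the project data submission system of the National Basic Science Public Science Data Center. For national key special projects in the basic sciences, it supports submission-plan filing, remote collaborative preparation of scientific data, continuous aggregation of project data by the project team throughout the project life cycle, and issuance of submission certificates, helping science and technology projects at all levels pass acceptance review.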
Pathogenic genome analysis (12) Plasmids in Pathogens genome analysis (3) SARS-CoV-2 analysis (3) Influenza analysis (3) HIV analysis (5) Mycobacterium Tuberculosis genome analysis (1) Phage genome analysis (2) Prokaryotic regulator analysis (2) gcType genome analysis (3) Fungi genome analysis (8) Others (3) Blast analysis tools (5) Convenient analysis tools (7) Metagenome analysis pipeline (3) Assembly tools (25) Genome structural analysis tools (11) Genome annotation tools (4) Community profiling tools (20) Comparative analysis tools (13)
Pathogenic genome analysis: 12
The reference-guided assembly tool is suitable for genome assembly of bacteria and viruses. It mainly uses BWA and Minimap for read alignment and iVar for assembly. The reference genome library for comparison includes 10,600 genomes from 10,401 bacterial strains. Users can also upload their own virus reference genomes for virus genome assembly. Genome integrity is then assessed with QUAST, and CGView is used to draw the complete genome map.
The species identification tool mainly uses KRAKEN2, RNAmmer, BLASTn, Mash, and FastANI to compare the similarity between the target sequence and the reference library sequences, and provide the optimal comparison results for species identification of the pathogenic bacteria.
The BLAST-pathogen tool is suitable for sequence alignment of bacteria, viruses, and influenza. It mainly uses NCBI-BLAST for sequence alignment against 465,736 high-quality assemblies and 53,568,325 high-quality contig sequences from 359 pathogenic bacteria that cause human disease, together with 7,787 high-quality assemblies from 195 viruses and 1,101,148 high-quality contig sequences, of which 1,018,990 are from influenza. chewBBACA is then used to draw a phylogenetic tree of the query sequences and the 20 most similar sequences.
This tool is suitable for bacterial sequence assembly and annotation. It performs reference-free assembly of the raw read data, followed by principal component analysis and gene annotation of the assembled genome. The main gene annotation databases are KEGG, COG, NR, SwissProt, antiSMASH, MetaCyc, PHI, Pfam, CARD, and VFDB. The analysis results are sent to users by email.
The SNP analysis tool is suitable for bacterial and viral SNP analysis. It mainly uses Snippy for SNP calling on bacterial sequences and iVar for SNP calling on viral sequences. The reference genome library for comparison includes 10,600 genomes from 10,401 bacterial strains. Gubbins is then used to construct the core SNP matrix and remove SNPs in recombinant regions. When more than 3 sequences are tested, you can upload a metadata table and use IQ-TREE 2 to draw the phylogenetic tree.
Multilocus Sequence Typing (MLST) scans bacterial genomes against traditional PubMLST typing schemes for sequence typing. The analysis results are sent to users by email.
Core genome multilocus sequence typing (cgMLST) can be used for sequence typing of pathogenic bacteria. We provide cgMLST analysis and 112 downloadable cgMLST schemas for pathogen species or genera, constructed with chewBBACA, which performs schema creation and allele calling on complete or draft genomes. The cgMLST analysis result is an interactive phylogenetic tree visualization based on the chewBBACA result file and the metadata submitted by users (if provided).
This tool simultaneously identifies and annotates insertion sequences (IS), integrative and conjugative elements (ICE), integrons (IN), plasmids, and transposons (Tn) in the genomes of pathogenic bacteria. It then uses Diamond alignment to predict antibiotic resistance genes (ARG) and virulence factors (VF) in the tested genome. By analyzing the positions of ARG and VF on the genome, it determines whether they are associated with mobile genetic elements (MGE). The criteria for inferring horizontal transfer of an ARG or VF are: 1) if the regions within 10 kb upstream and downstream of an ARG or VF both contain the same IS/IN sequence, the ARG or VF is considered potentially transferable; 2) if an ARG or VF lies within the sequence range of an IN, Tn, plasmid, or ICE, it is considered capable of horizontal transfer.
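The two transfer criteria above can be read as simple interval checks. The following is a minimal sketch, not the tool's actual implementation: the `Feature` record, the `may_transfer` function, the element names, and the requirement that a flanking element lie entirely inside the 10 kb window are all assumptions made for illustration.

```python
from dataclasses import dataclass

FLANK = 10_000  # 10 kb window on each side, taken from the criteria above


@dataclass(frozen=True)
class Feature:
    kind: str   # "ARG", "VF", "IS", "IN", "Tn", "plasmid" or "ICE"
    name: str   # hypothetical element or gene label
    start: int  # 1-based inclusive coordinates on a single contig
    end: int


def may_transfer(gene, elements):
    """Return True if either criterion flags the gene as potentially transferable."""
    # Criterion 2: the gene lies inside an IN, Tn, plasmid or ICE region.
    for e in elements:
        if e.kind in {"IN", "Tn", "plasmid", "ICE"} and e.start <= gene.start and gene.end <= e.end:
            return True
    # Criterion 1: the same IS/IN element is found on both sides within 10 kb.
    # Assumption: a flanking element must lie entirely inside the window.
    upstream = {e.name for e in elements
                if e.kind in {"IS", "IN"}
                and gene.start - FLANK <= e.start and e.end < gene.start}
    downstream = {e.name for e in elements
                  if e.kind in {"IS", "IN"}
                  and gene.end < e.start and e.end <= gene.end + FLANK}
    return bool(upstream & downstream)


gene = Feature("ARG", "geneA", 20_000, 21_000)
flanked = [Feature("IS", "IS_x", 15_000, 15_800), Feature("IS", "IS_x", 23_000, 23_800)]
print(may_transfer(gene, flanked))
```

A gene flanked on only one side, or flanked by two different IS elements, is not flagged by criterion 1 in this sketch.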
This tool is suitable for sequence assembly and annotation of fungi, mainly by assembling raw reads and annotating the assembled genomes with functional genes. The gene annotation databases include SignalP, VFDB, CARD, CAZy, NR, FungalP450, DFVF, SwissProt, emapper, and antiSMASH. The analysis results are sent to users by email.
This tool is based on mNGS data and can analyze pathogens from infection samples of different organs in the human body, including blood infections, central nervous system infections, respiratory system infections, bone and joint infections, reproductive and urinary tract infections, abdominal and thoracic infections.
This tool is specifically designed for fungal pangenome analysis, capable of efficiently processing large-scale genome data and accurately identifying and annotating core genes, accessory genes, and unique genes in the pangenome. For the first time, it combines the removal of human and bacterial contamination from fungal genome data, sequence quality control, gene prediction, pangenome analysis, and protein annotation in one workflow, providing a new analytical approach for fungal genome research.
This tool is mainly used for SNP detection and annotation in fungal genomes, and for predicting drug-resistant phenotypes by detecting drug-resistance mutation sites. The reference genomes cover 293 fungal species. The tool predicts 11 drug resistance phenotypes across 4 major classes of antibiotics, based on 233 resistance-associated site mutations in 10 drug-resistance genes.
Plasmids in Pathogens genome analysis: 3
This tool identifies plasmid contigs in bacterial genomes. It accurately and quickly predicts plasmid contigs from fragmented bacterial genome assemblies, which helps in studying the spread of antimicrobial resistance genes and the adaptive evolution of bacterial genomes.
This tool annotates plasmid contigs. It accurately and rapidly identifies drug resistance genes, virulence genes, heavy metal resistance genes, IS, oriT, and other features carried by plasmids, which helps in studying the spread of drug resistance genes and the adaptive evolution of bacterial genomes.
This tool lets users upload plasmid sequences and run a similarity alignment (BLASTn) against all plasmid sequences in PIPdb. The output lists the top 30 hits in descending order of score, with key parameters including query ID, subject ID, qstart, qend, sstart, send, identity, score, and evalue. Users can click any subject ID to view the basic information and distribution of that sequence in PIPdb.
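The ranking step described above amounts to sorting hits by score and keeping the first 30. The sketch below illustrates this on tab-separated text. The column order in `COLUMNS` and the function names `parse_hits` and `top_hits` are assumptions for illustration; the actual PIPdb output layout is not documented here.

```python
# Assumed column order, using the field names listed in the description.
COLUMNS = ["query_id", "subject_id", "qstart", "qend",
           "sstart", "send", "identity", "score", "evalue"]


def parse_hits(text):
    """Parse tab-separated hit lines into dictionaries with typed values."""
    hits = []
    for line in text.strip().splitlines():
        row = dict(zip(COLUMNS, line.split("\t")))
        for key in ("qstart", "qend", "sstart", "send"):
            row[key] = int(row[key])
        for key in ("identity", "score", "evalue"):
            row[key] = float(row[key])
        hits.append(row)
    return hits


def top_hits(hits, limit=30):
    """Return at most `limit` hits, highest score first."""
    return sorted(hits, key=lambda h: h["score"], reverse=True)[:limit]


example = (
    "q1\tplasmid_A\t1\t500\t10\t509\t98.2\t870\t1e-120\n"
    "q1\tplasmid_B\t1\t500\t1\t500\t99.8\t915\t1e-130\n"
)
for hit in top_hits(parse_hits(example)):
    print(hit["subject_id"], hit["score"])
```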
SARS-CoV-2 analysis: 3
This tool identifies single nucleotide polymorphisms (SNPs) and amino acid variations within the SARS-CoV-2 nucleotide sequence and provides a detailed evaluation of associated risks, including antibody and receptor binding interactions and the difficulty of amino acid substitutions. The analysis output includes a comprehensive variant risk evaluation table and a visual representation of variant frequencies, facilitated through integration with the VarEPS database. Results are delivered to users via email, enabling convenient access for further study and analysis.
This tool evaluates the effectiveness of SARS-CoV-2 primer sequences by analyzing the variation frequency of the last three nucleotides in the 3' primer region across different lineages. Additionally, it provides a weighted evaluation of the mutation frequency in this critical region. The results are delivered to users via email for convenient access, facilitating further study and analysis.
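To make the idea of a 3' end variation frequency concrete, here is a minimal sketch that counts mismatches at the last three primer positions across a set of target-site sequences. It is not the tool's method: the function names, the assumption that each site is already aligned to the primer and given in the primer's orientation, and the weights `(0.2, 0.3, 0.5)` are all hypothetical.

```python
def three_prime_mismatch_freq(primer, sites):
    """Mismatch frequency at each of the last three primer positions.

    `sites` are target sequences aligned to the primer, same length and
    orientation (an assumption of this sketch).
    """
    if not sites:
        raise ValueError("at least one target site is required")
    tail = primer[-3:].upper()
    counts = [0, 0, 0]
    for site in sites:
        site_tail = site[-3:].upper()
        for i in range(3):
            if site_tail[i] != tail[i]:
                counts[i] += 1
    return [count / len(sites) for count in counts]


def weighted_score(freqs, weights=(0.2, 0.3, 0.5)):
    """Combine the three frequencies; the weights are illustrative only."""
    return sum(f * w for f, w in zip(freqs, weights))


sites = ["ACGTACG", "ACGTACA", "ACGTATG", "ACGTACG"]
freqs = three_prime_mismatch_freq("ACGTACG", sites)
print(freqs, weighted_score(freqs))
```

Running the same function separately on the sites from each lineage would give one frequency triple per lineage.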
This tool detects SARS-CoV-2 and its lineages in wastewater samples. The analysis includes sample quality control, a summary of detected lineages together with newly identified SNPs and amino acid variations, and an evaluation of SNP frequency and associated risks, such as the impact on antibody and receptor binding interactions and the difficulty of amino acid substitutions. Results are delivered to users via email for further research and analysis.
Influenza analysis: 3
This tool identifies single nucleotide polymorphisms (SNPs) and amino acid variations within the influenza nucleotide sequence and provides a detailed evaluation of associated risks, including antibody and receptor binding interactions and the difficulty of amino acid substitutions. The analysis output includes a comprehensive variant risk evaluation table and a visual representation of variant frequencies, facilitated through integration with the VarEPS-Influ database. Results are delivered to users via email, enabling convenient access for further study and analysis.
This tool evaluates the effectiveness of influenza primer sequences by analyzing the variation frequency of the last three nucleotides in the 3' primer region across different subtypes and clades. Additionally, it provides a weighted evaluation of the mutation frequency in this critical region. The results are delivered to users via email for convenient access, facilitating further study and analysis.
This tool identifies the lineage and clade assignment of H9 influenza virus based on its HA segment nucleotide sequences. It generates a comprehensive table detailing the clade and lineage assignment for each sequence, along with a phylogenetic tree constructed by integrating the submitted sequences with reference sequences from the corresponding lineages. The analysis results are delivered via email, providing convenient access for further in-depth study and comparative analysis.
HIV analysis: 5
HIV sequences are genotyped using phylogenetic tree and BLAST methods, based on the latest HIV sequence reference database and the China-specific CRF01_AE and CRF07_BC reference sequences. The tool also reports sub-cluster information for CRF01_AE and CRF07_BC sequences.
This platform constructs HIV transmission networks based on HIV-TRACE and Cluster Picker. Users only need to upload a FASTA sequence file and the corresponding metadata to obtain the molecular network and several key network evaluation indicators. The network display can be customized.
By integrating and recoding HIV sequence quality control codes, this platform provides an alternative quality control tool for HIV sequencing data. The checks cover sequence length, mixed base ratio, frameshift mutations, stop codon mutations, and hypermutation.
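Three of the listed checks (sequence length, mixed base ratio, and stop codons) are easy to illustrate. The sketch below is a simplified stand-in, not the platform's quality control code: the thresholds `min_length` and `max_mixed_ratio` are hypothetical, the reading frame is assumed to start at the first base, and a length that is not a multiple of three is used only as a crude frameshift hint. Hypermutation is not covered.

```python
AMBIGUOUS = set("RYSWKMBDHVN")        # IUPAC codes representing mixed bases
STOP_CODONS = {"TAA", "TAG", "TGA"}


def qc_report(seq, min_length=1000, max_mixed_ratio=0.05):
    """Basic per-sequence checks; thresholds are illustrative assumptions."""
    seq = seq.upper().replace("-", "")
    mixed = sum(base in AMBIGUOUS for base in seq)
    ratio = mixed / len(seq) if seq else 0.0
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon anywhere before the final codon is counted as internal.
    internal_stops = sum(codon in STOP_CODONS for codon in codons[:-1])
    return {
        "length": len(seq),
        "length_ok": len(seq) >= min_length,
        "mixed_bases": mixed,
        "mixed_ratio_ok": ratio <= max_mixed_ratio,
        "length_multiple_of_3": len(seq) % 3 == 0,
        "internal_stops": internal_stops,
    }


print(qc_report("ATGAAATAGGGRTAA", min_length=10))
```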
This tool trains a convolutional neural network (CNN) model on HIV phenotypic drug resistance data. After phenotypic resistance data for non-B subtypes were added, the model was fine-tuned to improve the accuracy of drug resistance prediction for non-subtype-B sequences.
The molecular network is constructed with the HIV-TRACE method, and sequence typing and drug resistance information are added by the HIV sequence typing tool and the HIV drug resistance analysis tool. On this basis, the transmission risk of HIV transmission clusters and nodes in the network is assessed using self-defined quantitative screening criteria. The risk of cross-regional transmission and drug-resistant transmission is also reported, with adjustable weights.
Mycobacterium Tuberculosis genome analysis: 1
This method utilizes the official tb-profile 4.3.0 Docker image for MTB lineage determination and drug resistance analysis.
Phage genome analysis: 2
This pipeline is designed for quality control, filtering, assembly, and downstream analysis of the next-generation sequencing paired-end reads of bacteriophages. Users can also directly submit their own genome for analysis.
This pipeline predicts prophage sequences in prokaryotic genomes, annotates protein functions, antibiotic resistance genes, and virulence genes, and predicts the family of each prophage sequence.
Prokaryotic regulator analysis: 2
The Prokaryotic Genome Analysis for Global Regulator pipeline analyzes global transcription factors and their target genes across the whole genome sequence submitted by the user. It reports the category, species distribution, and functional annotation of the global regulators and their target genes.
The global transcription factor prediction pipeline identifies global transcription factors among protein sequences submitted by the user. It compares the input against a database of prokaryotic global transcription factors and reports information such as category, function, and species distribution.
gcType genome analysis: 3
The genome assembly and annotation pipeline is suitable for the assembly and annotation of bacterial sequences. It involves assembling and analyzing the raw sequencing data, and then annotating the assembled genome. The raw sequencing data can be assembled using various software tools for both long reads (PacBio or Nanopore) and Illumina short reads. The genomic component analysis includes identifying CRISPR arrays, detecting repetitive structures, predicting non-coding RNAs, prophage prediction, defense system prediction, mobile element detection, and gene prediction using Prodigal. The gene annotation mainly utilizes several databases, including KEGG, GO, COG, NR, Swiss-Prot, antiSMASH, MetaCyc, PHI, Pfam, CARD and VFDB.
The new species identification pipeline uses the genome sequence of the new type strain as the query to perform a similarity search against the gcType 16S rRNA gene and genome sequence reference database. This allows for the identification of the closest related species, which can then be used to perform a phylogenetic analysis.
This tool first uses the Swiss-Prot and Pfam databases to annotate the input type strain protein sequences. Then, against a locally built private type strain protein database, the input sequences are matched and aligned using TM-vec and DeepBLAST (https://www.biorxiv.org/content/10.1101/2022.07.25.501437v1.full.pdf), which consider both the sequence and structural characteristics of the input sequences. Finally, the results of both parts are displayed. If this is your first time using the protein annotation + AI search pipeline, please read the manual before you submit your job.
Fungi genome analysis: 8
This tool identifies fungal species by carrying out pairwise sequence alignment against ITS sequences of the UNITE+INSD dataset from UNITE. The user can customize the maximum number of target sequences. A phylogenetic tree is constructed from those target sequences using FastTree. The result display can be customized.
This tool is suitable for sequence assembly and annotation of fungi, mainly by assembling raw reads and annotating the assembled genomes with functional genes.
This tool is specifically designed for fungal pangenome analysis, capable of efficiently processing large-scale genome data and accurately identifying and annotating core genes, accessory genes, and unique genes in the pangenome.
The platform for identification of quarantine fungi species is based on multiple gene sequence benchmark databases. By integrating sequence alignment software, phylogenetic analysis software, and genetic distance analysis modules, as well as designing multiple query function modules, such as sequence input, sequence alignment, construction of phylogenetic trees, species confirmation and other functional modules, individual genes can be automatically analyzed for phylogeny and the function of individual barcode identification screening can be realized.
The multilocus species identification platform is based on multi-gene reference databases. By integrating sequence alignment, phylogenetic analysis, and genetic distance analysis software modules, together with multiple query function modules, it establishes a 'one-stop' multi-gene phylogenetic typing and screening system for accurate and rapid species identification based on multiple gene sequences.
The high-throughput screening platform establishes a detection method for rapidly screening quarantine species from quarantine samples. By extracting total DNA from quarantine samples, amplifying their ITS2 region, and performing second-generation sequencing, the bioinformatics software modules embedded in this platform are used for primary information analysis to determine whether there is a possibility of multiple quarantine species in the samples to be tested.
This tool marks the species names of poisonous mushrooms with an asterisk, based on the world list of poisonous mushrooms (He et al. 2022, Fungal Biology Reviews). The pairwise sequence alignment is based on ITS sequences of the UNITE+INSD dataset from UNITE. The result display can be customized, and each poisonous mushroom hit links to the species detail page.
This tool supports identification of environmental fungi based on the Internal Transcribed Spacer (ITS). The analysis includes de-duplication, singleton removal (customizable frequency), OTU clustering (customizable sequence identity) against the UNITE+INSD reference database from UNITE, and species annotation using BLAST+. Outputs include an OTU table, a taxonomy table, taxonomy statistics, and their visual display.
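The first two steps of this analysis, de-duplication and singleton removal with a user-chosen frequency cutoff, can be sketched in a few lines. This is an illustrative simplification rather than the tool's pipeline: the function name `dereplicate` and the parameter `min_count` are hypothetical, and exact string identity is used as the duplicate criterion.

```python
from collections import Counter


def dereplicate(reads, min_count=2):
    """Collapse identical reads and drop those seen fewer than `min_count` times.

    Returns a dict of unique sequence -> abundance, most abundant first.
    """
    counts = Counter(read.upper() for read in reads)
    kept = {seq: n for seq, n in counts.items() if n >= min_count}
    # Sort by decreasing abundance, then alphabetically for a stable order.
    return dict(sorted(kept.items(), key=lambda kv: (-kv[1], kv[0])))


reads = ["ACGT", "acgt", "GGCC", "TTAA", "TTAA", "TTAA"]
print(dereplicate(reads))  # "GGCC" occurs once and is removed
```

With `min_count=1` no reads are discarded, so the cutoff controls how aggressively rare sequences are filtered before OTU clustering.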
Others: 3
The website has established a new precision labeling algorithm for antibody V(D)J segments, using a reference library consistent with mainstream IMGT and IgBlast databases to ensure the international applicability of the results obtained by users.
The GRAPE webserver allows non-expert users to improve protein thermostability following our recently developed GRAPE strategy. The strategy combines the advantages of hybrid method and greedy algorithm to search beneficial combination pathways in an expanded mutation library by reducing the dimensionality of the data.
IPGA (https://nmdc.cn/ipga/) is a one-stop web service for analyzing, comparing, and visualizing pan-genomes as well as individual genomes, so users do not need to install any tools. IPGA features a scoring system that helps users evaluate the reliability of pan-genome profiles generated by different packages.
Blast analysis tools: 5
Searches a nucleotide query against a nucleotide database
Searches a protein query against a protein database
Searches a nucleotide query against a protein database
Searches a protein query against a nucleotide database
Searches a nucleotide query against a protein database
Convenient analysis tools: 7
Tool_OrthoANI
MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form
PILER-CR is public domain software for finding CRISPR repeats
tRNAscan-SE identifies transfer RNA genes in genomic DNA or RNA sequences
The RNAmmer 1.2 server predicts 5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences
LEfSe (Linear discriminant analysis Effect Size) determines the features most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance
XSTREAM is a tool for rapidly identifying and modeling the architecture of fundamental Tandem Repeats (TRs) in protein sequences. Due to the general nature of TRs, however, any sequence including DNA (or even numbers!) can be processed
Metagenome analysis pipeline: 3
GenomeAssemblyAnnotationPipe is a pipeline for genome assembly and annotation.
simpleMetagenomeAnalysis is a pipeline for metagenome assembly and annotation.
simpleMetagenomeAnalysis is a pipeline for metagenome annotation.
Assembly tools: 25
SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way
SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines.
Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom. Velvet currently takes in short read sequences, removes errors, then produces high-quality unique contigs. It then uses paired-end read and long read information, when available, to retrieve the repeated areas between contigs
Tool_ALLPATH_LG
Tool_MetaIDBA
MegaHit is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset with 252 Gbps in 44.1 and 99.6 h on a single computing node with and without a graphics processing unit, respectively. MegaHit assembles the data as a whole, i.e. no pre-processing like partitioning and normalization was needed. When compared with previous methods on assembling the soil data, MegaHit generated a three-time larger assembly, with longer contig N50 and average contig length; furthermore, 55.8% of the reads were aligned to the assembly, giving a fourfold improvement
Ray is a parallel de novo genome assembler that utilises the message-passing interface everywhere and is implemented using peer-to-peer communication
Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION)
The program has a capability to clip 5′ and 3′ low-quality regions of reads. It uses base quality values in computation of overlaps between reads, construction of multiple sequence alignments of reads, and generation of consensus sequences. The program also uses forward–reverse constraints to correct assembly errors and link contigs
SSPACE is not a de novo assembler; it is used after a preassembly run. SSPACE is a script that extends and scaffolds preassembled contigs using a number of mate-pair or paired-end libraries
OPERA (Optimal Paired-End Read Assembler) is a sequence assembly program. It uses information from paired-end/mate-pair/long reads to order and orient the intermediate contigs/scaffolds assembled in a genome assembly project, in a process known as Scaffolding. OPERA is based on an exact algorithm that is guaranteed to minimize the discordance of scaffolds with the information provided by the paired-end/mate-pair/long reads (for further details see Gao et al, 2011)
QUAST stands for QUality ASsessment Tool. The tool evaluates genome assemblies by computing various metrics
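One of the metrics commonly reported for genome assemblies, including by QUAST, is N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below shows the calculation in plain Python. It is an illustration of the definition, not QUAST's code, and the function name `n50` is chosen here for convenience.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no contigs given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half:
            return length


contigs = [80, 70, 50, 40, 30, 20]   # total 290, half is 145
print(n50(contigs))                  # 80 + 70 = 150 >= 145, so N50 is 70
```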
REAPR is a tool that evaluates the accuracy of a genome assembly using mapped paired-end reads, without the use of a reference genome for comparison. It can be used at any stage of an assembly pipeline to automatically break incorrect scaffolds and flag other errors in an assembly for manual inspection. It reports misassemblies and other warnings, and produces a new broken assembly based on the error calls
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes
BUSCO assessments are implemented in open-source software, with a large selection of lineage-specific sets of Benchmarking Universal Single-Copy Orthologs. These conserved orthologs are ideal candidates for large-scale phylogenomics studies, and the annotated BUSCO gene models built during genome assessments provide a comprehensive gene predictor training set for use as part of genome annotation pipelines
Transcriptome assembly and differential expression analysis for RNA-Seq
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads
Comparing expression levels of genes and transcripts in RNA-Seq experiments is a hard problem. Cuffdiff is a highly accurate tool for performing these comparisons, and can tell you not only which genes are up- or down-regulated between two or more conditions, but also which genes are differentially spliced or are undergoing other types of isoform-level regulation
Sailfish is a tool for transcript quantification from RNA-seq data. It requires a set of target transcripts (either from a reference or de novo assembly) to quantify. All you need to run sailfish is a fasta file containing your reference transcripts and a (set of) fasta/fastq file(s) containing your reads. Sailfish runs in two phases: indexing and quantification. The indexing step is independent of the reads, and only needs to be run once for a particular set of reference transcripts and choice of k (the k-mer size).
kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads
Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution
Tools for statistical analysis of assembled transcriptomes, including flexible differential expression analysis, visualization of transcript structures, and matching of assembled transcripts to annotation
Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads
Oases is a de novo transcriptome assembler designed to produce transcripts from short read sequencing technologies, such as Illumina, SOLiD, or 454 in the absence of any genomic assembly
SOAPdenovo-Trans is a de novo transcriptome assembler based on the SOAPdenovo framework, adapted to alternative splicing and differing expression levels among transcripts. The assembler provides a more accurate, complete and faster way to construct full-length transcript sets.
Genome structural analysis tools: 11
PILER-CR is public domain software for finding CRISPR repeats
MinCED is a program to find Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such as assembled contigs from metagenomes
tRNAscan-SE identifies transfer RNA genes in genomic DNA or RNA sequences
The RNAmmer 1.2 server predicts 5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences
Prodigal is a protein-coding gene prediction software tool for bacterial and archaeal genomes. The acronym stands for PROkaryotic DYnamic Programming Genefinding Algorithm
Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA
A family of gene prediction programs, such as Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes; Gene Prediction in Eukaryotes; and Gene Prediction in Transcripts
FragGeneScan is an application for finding (fragmented) genes in short reads. It can also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes
XSTREAM is a tool for rapidly identifying and modeling the architecture of fundamental Tandem Repeats (TRs) in protein sequences. Due to the general nature of TRs, however, any sequence including DNA (or even numbers!) can be processed
PRISM is software for split-read mapping (split reads are reads that span a structural variant, SV) and for SV calling from the mapping result. PRISM can detect small insertions and arbitrary-size deletions, inversions and tandem duplications using the direction of discordant read pairs
A probabilistic framework for structural variant discovery
Genome annotation tools: 4
Whole genome annotation is the process of identifying features of interest in a set of genomic DNA sequences, and labelling them with useful information. Prokka is a software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files
DFAST is a flexible and customizable pipeline for prokaryotic genome annotation as well as data submission to the INSDC. It was originally developed as the background engine for the DFAST web service and is also available as a stand-alone command-line tool
InterPro is a database which integrates together predictive information about proteins' function from a number of partner resources, giving an overview of the families that a protein belongs to and the domains and sites it contains
PILER-CR is public domain software for finding CRISPR repeats
Community profiling tools: 20
Obtaining and importing data/Demultiplexing sequences/Sequence quality control and feature table construction/Generate a tree for phylogenetic diversity analyses
LEfSe (Linear discriminant analysis Effect Size) determines the features most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance
PICRUSt (pronounced “pie crust”) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes
MetaCV is a composition- and phylogeny-based algorithm to classify very short metagenomic reads (75-100 bp) into specific taxonomic and functional groups. MetaCV performs as well as BlastX-based methods (in both sensitivity and specificity) on simulated short reads, but runs 300 times faster and thus provides effective and efficient analysis of huge amounts of NGS data
k-SLAM is a program for alignment-based metagenomic analysis of large sets of high-throughput sequence data. k-SLAM uses a k-mer based technique to rapidly find alignments between reads and genomes, which are then validated using the Smith-Waterman algorithm. Alignments are chained together into a pseudo-assembly to increase specificity
Kaiju is a program for sensitive taxonomic classification of high-throughput sequencing reads from metagenomic whole genome sequencing or metatranscriptomics experiments
Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses a novel indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (5.8 GB for all complete bacterial and viral genomes plus the human genome) and classifies sequences at a very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers
a top-down taxonomic profiler for metagenomics
Phylogenetic marker genes are suitable to reconstruct the evolutionary history of organisms and to profile the taxonomic composition of environmental samples. For this purpose, a set of 40 protein-coding phylogenetic marker genes (MGs) has been identified. In the vast majority of known organisms, these 40 MGs occur in single copy, and they have recently been used to delineate prokaryotic organisms at the species level. Due to these properties, they can be used to detect and accurately quantify not only known species, but also those that still lack genomic information. Based on a subset of these MGs that are suitable for shotgun sequencing data, we developed a method for taxonomic composition profiling of environmental samples
StrainEst is a novel, reference-based method that uses the Single Nucleotide Variants (SNV) profiles of the available genomes of selected species to determine the number and identity of coexisting strains and their relative abundances in mixed metagenomic samples
Estimate the distance of each query sequence to the reference.
sourmash is a command-line tool and Python library for computing MinHash sketches from DNA sequences, comparing them to each other, and plotting the results. This allows you to estimate sequence similarity between even very large data sets quickly and accurately
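The MinHash idea behind such sketches can be shown with a standard-library-only example. This does not use the sourmash API: the function names are hypothetical, SHA-1 stands in for the hash function, and reverse complements are ignored for simplicity.

```python
import hashlib


def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(seq, k=5, size=50):
    """Bottom-`size` sketch: the `size` smallest k-mer hash values, sorted."""
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def estimate_jaccard(sketch_a, sketch_b, size=50):
    """Estimate Jaccard similarity of the k-mer sets from two bottom-k sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


a = minhash_sketch("ACGTACGTTGCAACGTAGCTAGCTAGGCTA")
b = minhash_sketch("ACGTACGTTGCAACGTAGCTAGCTAGGCTT")
similarity = estimate_jaccard(a, b)
```

Because only the smallest hash values are kept, the sketch size stays fixed no matter how long the input sequences are, which is what makes comparisons between very large data sets cheap.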
MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling
HUMAnN2 (the HMP Unified Metabolic Analysis Network) is a method for efficiently and accurately determining the presence, absence, and abundance of metabolic pathways in a microbial community from metagenomic or metatranscriptomic sequencing data. It is appropriate for any type of microbial community shotgun sequence profiling
CONCOCT “bins” metagenomic contigs. Metagenomic binning is the process of clustering sequences into clusters corresponding to operational taxonomic units of some level
MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. Users can understand the underlying bins (genomes) of the microbes in their metagenomes by simply providing assembled metagenomic sequences and either read coverage information or the sequencing reads
These tools are easy to use and identify genome bins automatically
Abundance Bin is an abundance-based tool for binning metagenomic sequences, such that the reads classified in a bin belong to species of identical or very similar abundances. Abundance Bin also gives estimations of species abundances and their genome sizes, two important characteristic parameters for a microbial community
R package for identifying viral sequences from metagenomic data using sequence signatures
matching hosts of viruses based on oligonucleotide frequency (ONF) comparison
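An oligonucleotide frequency comparison can be sketched as follows. The tool's actual dissimilarity measure is not stated here, so this example assumes plain Euclidean distance between k-mer frequency vectors as a stand-in; the function names and the toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt


def onf_vector(seq, k=2):
    """Relative frequency of every k-mer over ACGT, in a fixed order."""
    keys = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in counts:  # skip k-mers containing ambiguous bases such as N
            counts[km] += 1
    total = sum(counts.values())
    return [counts[key] / total if total else 0.0 for key in keys]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def closest_host(virus_seq, hosts, k=2):
    """Name of the host whose ONF vector is nearest to the virus's."""
    virus = onf_vector(virus_seq, k)
    return min(hosts, key=lambda name: euclidean(virus, onf_vector(hosts[name], k)))


# Toy sequences, invented for illustration only.
hosts = {"hostA": "ATATATATAT", "hostB": "GCGCGCGCGC"}
best = closest_host("ATATTATA", hosts)
```

In practice the frequency vectors would be computed from whole genomes with a larger k, and a measure that corrects for background nucleotide composition would likely be preferred over raw Euclidean distance.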
Comparative analysis tools: 13
Calculate the ANI between two 16S rRNA gene sequences
CD-HIT stands for Cluster Database at High Identity with Tolerance. The program (cd-hit) takes a fasta format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output. In addition cd-hit outputs a cluster file, documenting the sequence 'groupies' for each nr sequence representative
MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form
BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate.
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes
The index command creates a new index file that allows fast look-up of data in a (sorted) SAM or BAM
Searches a nucleotide query against a nucleotide database (version: latest)
BLAST-Like Alignment Tool, BLAT is a legacy tool for sequence alignment that is not under active development.
DIAMOND is a new high-throughput program for aligning DNA reads or protein sequences against a protein reference database
STAR (Spliced Transcripts Alignment to a Reference) is an ultrafast universal RNA-seq aligner. It not only can perform unbiased de novo detection of canonical junctions, but also can discover non-canonical splices and chimeric (fusion) transcripts, and is capable of mapping full-length RNA sequences.
Tool_TopHat
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome)
The method BLASR (Basic Local Alignment with Successive Refinement) was used to map Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project
Contact Us
Address: No. 1 Beichen West Road, Chaoyang District, Beijing, Institute of Microbiology, Chinese Academy of Sciences
Contact: 010-64806052
Email: nmdc@im.ac.cn
QQ: 3415782117