NanoForms

Click Datasets in the navigation bar, then click the button for adding a new dataset.
Type a name for the new dataset and select its type. For Nanopore MinION sequencing data, select the 'Nanopore sequencing data' option and click the Browse button. Open the directory with the basecalled data and choose one or more files.
For hybrid Nanopore and Illumina assembly, only paired-end Illumina DNA sequencing reads are supported. To upload paired-end Illumina reads, select the 'Illumina sequencing data' option and click the Browse button. Open the directory with the sequencing data and choose the reads.
Supported formats are fastq, fastq.gz, tar and tar.gz. In some browsers, you need to open "Options" in the file dialog and select "All Files" before gzipped files can be selected for upload. Note that our server is designed for small genomes: the total size of the uploaded data cannot exceed 15 GB. Click the button to start the upload and wait until it is finished.
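If you prepare many files, you may want to check them locally before uploading. The following is a minimal sketch, not part of NanoForms. It checks the file extensions listed above and the 15 GB total. The helper names are hypothetical. It also assumes that "15 GB" means 15 × 10^9 bytes, and the server may count it differently.

```python
import sys
from pathlib import Path

# Assumption: 15 GB is read as 15 * 10**9 bytes; the server may use another definition.
MAX_TOTAL_BYTES = 15 * 10**9
SUPPORTED_SUFFIXES = (".fastq", ".fastq.gz", ".tar", ".tar.gz")


def is_supported(filename: str) -> bool:
    """Return True if the file name ends with one of the supported extensions."""
    return filename.lower().endswith(SUPPORTED_SUFFIXES)


def check_upload(paths: list[Path]) -> list[str]:
    """Hypothetical pre-upload check; an empty result means no problems were found."""
    problems = []
    total = 0
    for path in paths:
        if not is_supported(path.name):
            problems.append(f"unsupported format: {path.name}")
        total += path.stat().st_size  # size on disk in bytes
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES} bytes")
    return problems


if __name__ == "__main__":
    # Usage: python check_upload.py reads1.fastq.gz reads2.fastq.gz
    issues = check_upload([Path(p) for p in sys.argv[1:]])
    print("\n".join(issues) or "all files look OK")
```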

A quality test generates a statistical summary and a number of plots that let you assess the quality of an uploaded dataset. Performing a quality test is optional, but if you are new to processing sequencing data, we encourage you to run one and look at the results.

Click Quality tests in the navigation bar, then click the button for adding a new quality test.
Type a name for the new quality test and select a dataset. If you have uploaded more than one dataset, all of them are listed in the dropdown list.
Click the button to start the quality test.
Once the quality test has started, the Quality tests view shows how many steps have finished and how many remain until the analysis is completed. The progress and duration of the analysis are also displayed on the Gantt chart. When the analysis is completed, the workflow status changes from Running to Succeeded or Failed. If the analysis succeeded, you can view the report. If it failed, please contact us to find out what went wrong. You do not have to wait until the analysis is complete; we will email you a notification as soon as it finishes.
If the results are satisfactory, you can proceed to data assembly.
You can also start a quality test from the Datasets list by clicking the button in the row of the selected dataset. In the dataset details view, click the button to create a new quality test on this dataset.
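The summary in the quality test report is produced by tools such as NanoPlot. The sketch below is not NanoPlot. It only computes a few figures of the same kind (read count, total bases, mean read length and read length N50) so that you can see what these numbers mean. It assumes well-formed 4-line FASTQ records.

```python
import gzip
from pathlib import Path


def read_lengths(path: Path) -> list[int]:
    """Return the length of every read in a FASTQ or FASTQ.gz file.

    Assumes well-formed 4-line records (header, sequence, '+', qualities).
    """
    opener = gzip.open if path.suffix == ".gz" else open
    lengths = []
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each record
                lengths.append(len(line.strip()))
    return lengths


def n50(lengths: list[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # empty input


def summarize(lengths: list[int]) -> dict:
    """A few summary statistics similar in spirit to a quality report."""
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "total_bases": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
        "n50": n50(lengths),
    }


print(summarize([100, 200, 300, 400]))
# {'reads': 4, 'total_bases': 1000, 'mean_length': 250.0, 'n50': 300}
```

For a real file, call `summarize(read_lengths(Path("reads.fastq.gz")))`. The file name is only an example.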

Add a dataset and optionally perform a quality test.
Click Data assembly in the navigation bar, then click the button for adding a new data assembly.
Select the 'ONT assembly' option, type a name for the new data assembly and select a dataset. If you have uploaded more than one dataset, all of them are listed in the dropdown list.
If you are a beginner, you can run the analysis with the default parameter values. You can also set the following parameters:
- Flye -g: estimated genome size (for example, 5m or 2.6g). The estimate is used for solid k-mer selection in the initial disjointig assembly stage. Flye is not very sensitive to this parameter, so a rough estimate is sufficient: it is fine if the value is within 0.5x to 2x of the actual genome size. If the final assembly size differs greatly from the initial guess, consider re-running the pipeline with an updated estimate for better results. Note that 5m = 5 megabases (5,000,000 bases) and 2.6g = 2.6 gigabases (2,600,000,000 bases).
- NanoFilt -q: filter on a minimum average read quality score (for example, 10).
- NanoFilt --headcrop: trim n nucleotides from the start of each read (for example, 75).
- Filtlong --min_length: minimum read length threshold (for example, 1000 discards any read shorter than 1 kbp).
- Filtlong --target_bases: keep only the best reads up to this many total bases (for example, 500000000 removes the worst reads until only 500 Mbp remain, which is useful for very large read sets; if the input read set is smaller than 500 Mbp, this setting has no effect).
- Prokka --genus: genus name (default 'Genus').
- Prokka --species: species name (default 'species').
- Prokka --strain: strain name (default 'strain').
- Prokka --plasmid: plasmid name or identifier (default '').
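The genome size notation for Flye -g and the 0.5x to 2x tolerance can be checked with simple arithmetic. The sketch below converts values such as 5m and 2.6g into bases. It then tests whether an estimate falls within 0.5x to 2x of a given size. We assume you compare against the size of the final assembly as a stand-in for the actual genome size. We also assume that a 'k' suffix and plain integers are accepted, so please check the Flye documentation for the exact syntax.

```python
# Multipliers for size suffixes; 'm' and 'g' match the examples above,
# 'k' is an assumption.
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_genome_size(value: str) -> int:
    """Convert a genome size such as '5m' or '2.6g' into a number of bases."""
    value = value.strip().lower()
    if value and value[-1] in SUFFIXES:
        return round(float(value[:-1]) * SUFFIXES[value[-1]])
    return int(value)  # plain number of bases


def estimate_is_reasonable(estimate: str, assembly_bases: int) -> bool:
    """True if the -g estimate lies within 0.5x to 2x of the assembled size."""
    guess = parse_genome_size(estimate)
    return 0.5 * assembly_bases <= guess <= 2 * assembly_bases


print(parse_genome_size("5m"), parse_genome_size("2.6g"))  # 5000000 2600000000
```

If `estimate_is_reasonable` returns False for your finished assembly, that is the situation in which re-running with an updated estimate is recommended.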
Confirm the entered values by clicking the button.
Once the data assembly has started, the Data assembly view shows how many steps have finished and how many remain until the selected data assembly is completed. As with quality tests, the progress and duration of the analysis are also displayed on the Gantt chart. When the analysis is completed, the workflow status changes from Running to Succeeded or Failed. If the analysis succeeded, you can view the report and download the consensus file and the reports. If it failed, please contact us to find out what went wrong. You do not have to wait until the analysis is complete; we will send you an email notification when it finishes.
You can also start a data assembly from the Quality tests list by clicking the button in the row of the selected quality test. In the quality test details view, click the button to create a new data assembly on this dataset.
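The NanoFilt and Filtlong parameters of the ONT assembly work on each read in turn. The sketch below is not the tools' own code. Its helpers (`nanofilt_like`, `filtlong_like`) are hypothetical and only illustrate what each parameter does. We assume the average read quality is computed in error-probability space (how NanoPack tools report it, but check your version). We also assume the start of the read is cropped before its quality is evaluated. Real Filtlong scores reads using both length and identity, whereas here reads are ranked by average quality alone.

```python
import math
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    seq: str
    quals: list[int]  # Phred scores, one per base


def average_quality(quals: list[int]) -> float:
    """Mean read quality via error probabilities (assumed NanoPack-style)."""
    if not quals:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_error)


def nanofilt_like(reads, min_quality=10, headcrop=75):
    """Hypothetical: crop the read start, then keep reads with mean quality >= min_quality."""
    for read in reads:
        cropped = Read(read.name, read.seq[headcrop:], read.quals[headcrop:])
        if cropped.seq and average_quality(cropped.quals) >= min_quality:
            yield cropped


def filtlong_like(reads, min_length=1000, target_bases=500_000_000):
    """Hypothetical: drop short reads, then keep the best reads up to target_bases.

    Ranking uses average_quality only; real Filtlong also weighs read length.
    Returns the kept reads ordered best first.
    """
    long_enough = [r for r in reads if len(r.seq) >= min_length]
    long_enough.sort(key=lambda r: average_quality(r.quals), reverse=True)
    kept, total = [], 0
    for read in long_enough:
        if total + len(read.seq) > target_bases:
            break  # the remaining (worse) reads are discarded
        kept.append(read)
        total += len(read.seq)
    return kept
```

If the input is smaller than `target_bases`, `filtlong_like` keeps every read that passes `min_length`. This mirrors the "no effect" case described for --target_bases.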

Add a dataset and optionally perform a quality test.
Click Data assembly in the navigation bar, then click the button for adding a new data assembly.
Select the 'Hybrid Nanopore and Illumina assembly' option and type a name for the new data assembly. Then select a dataset with Nanopore reads from the first dropdown list. Next, select a dataset with Illumina reads from the second dropdown list and, within the chosen dataset, select the first read and the second read files.
If you are a beginner, you can run the analysis with the default parameter values. You can also set the following parameters:
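The first and second read files of a paired-end run must describe the same fragments in the same order. The sketch below checks this locally by comparing read identifiers, which can help you pick a matching pair. We assume a common header layout in which mates share an ID and differ only by a trailing /1 or /2 or by the comment after the first space. The helper names are hypothetical.

```python
import gzip
from itertools import zip_longest
from pathlib import Path


def normalize_id(header: str) -> str:
    """Strip '@', any comment after whitespace, and a trailing /1 or /2 mate suffix."""
    read_id = header.strip().lstrip("@").split()[0]
    if read_id.endswith(("/1", "/2")):
        read_id = read_id[:-2]
    return read_id


def fastq_ids(path: Path):
    """Yield the normalized read ID of every record in a (gzipped) FASTQ file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of each 4-line record
                yield normalize_id(line)


def mates_match(r1: Path, r2: Path) -> bool:
    """True if both files list the same read IDs in the same order."""
    for id1, id2 in zip_longest(fastq_ids(r1), fastq_ids(r2)):
        if id1 != id2:  # also catches files with different read counts
            return False
    return True
```

For example, `mates_match(Path("sample_1.fastq.gz"), Path("sample_2.fastq.gz"))` should return True for a correctly chosen pair (the file names are only illustrative).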
- NanoFilt -q: filter on a minimum average read quality score (for example, 10).
- NanoFilt --headcrop: trim n nucleotides from the start of each read (for example, 75).
- Illumina fastp -q: the Phred quality value from which a base counts as qualified (for example, 15).
- Prokka --genus: genus name (default 'Genus').
- Prokka --species: species name (default 'species').
- Prokka --strain: strain name (default 'strain').
- Prokka --plasmid: plasmid name or identifier (default '').
Confirm the entered values by clicking the button.
Once the data assembly has started, the Data assembly view shows how many steps have finished and how many remain until the selected data assembly is completed. As with quality tests, the progress and duration of the analysis are also displayed on the Gantt chart. When the analysis is completed, the workflow status changes from Running to Succeeded or Failed. If the analysis succeeded, you can view the report and download the consensus file and the reports. If it failed, please contact us to find out what went wrong. You do not have to wait until the analysis is complete; we will send you an email notification when it finishes.
You can also start a data assembly from the Quality tests list by clicking the button in the row of the selected quality test. In the quality test details view, click the button to create a new data assembly on this dataset.
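The fastp -q value decides which Illumina bases count as qualified. fastp then discards a read when too large a share of its bases is unqualified, controlled by its `-u` option (40% by default, to our knowledge). The sketch below illustrates only this part of fastp's filtering, under those assumed defaults. It also assumes Phred+33 quality encoding. It is not fastp's code, and fastp applies further filters (for example, on N bases and read length).

```python
def phred_from_fastq(quality_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - offset for c in quality_line.strip()]


def count_unqualified(quals: list[int], qualified_phred: int = 15) -> int:
    """Number of bases below the qualified Phred threshold (fastp -q)."""
    return sum(1 for q in quals if q < qualified_phred)


def passes_quality_filter(quals: list[int], qualified_phred: int = 15,
                          unqualified_percent_limit: int = 40) -> bool:
    """True unless more than unqualified_percent_limit % of bases are unqualified."""
    # Integer arithmetic avoids floating-point rounding exactly at the limit.
    unqualified = count_unqualified(quals, qualified_phred)
    return unqualified * 100 <= unqualified_percent_limit * len(quals)


print(phred_from_fastq("II##"))  # [40, 40, 2, 2]
```

With the example value 15, a base of quality 15 is qualified and a base of quality 14 is not. Raising -q therefore makes the read filter stricter.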

The Public badge means that the dataset, together with the quality tests and data assemblies connected with it, is visible to and shared with all users. Public datasets have been added so that you can see what results to expect before starting your own analysis. Four public datasets are available: Bacillus subtilis 2014-3557 SRX6978160 MinION, Lactobacillus reuteri SRX6494061 MinION, Klebsiella pneumoniae SRX6898383 GridION and Klebsiella pneumoniae SRX6898382 Illumina NovaSeq 6000. The data were downloaded from the European Nucleotide Archive (ENA). Note that you can view the public quality tests and data assemblies that have already been performed, and also run your own.

To answer this question, we have prepared a glossary with a short description of each bioinformatics tool used in the application.

Bandage: a program for visualising de novo assembly graphs. Citation: Wick R.R., Schultz M.B., Zobel J. and Holt K.E., 2015. Bandage: interactive visualisation of de novo genome assemblies. Bioinformatics, 31(20), pp. 3350-3352.
fastp: an all-in-one preprocessing tool for FASTQ files, used here for paired-end data. Citation: Chen S., Zhou Y., Chen Y. and Gu J., 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17), pp. i884-i890.
FastQC: a quality control tool for high-throughput sequence data.
Filtlong: a tool for filtering long reads by quality. It uses both read length and read identity when choosing which reads pass the filter.
Flye: a de novo assembler for single-molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. Citation: Lin Y., Yuan J., Kolmogorov M., Shen M.W., Chaisson M. and Pevzner P., 2016. Assembly of long error-prone reads using de Bruijn graphs. PNAS.
Kraken 2: a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Citation: Wood D.E., Lu J. and Langmead B., 2019. Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), p. 257.
KrakenTools: a suite of scripts for post-analysis of Kraken results.
Krona: an interactive visualization tool for exploring the composition of metagenomes. Citation: Ondov B.D., Bergman N.H. and Phillippy A.M., 2011. Krona: interactive metagenomic visualization in a Web browser. BMC Bioinformatics, 12(1), p. 385.
Medaka: a tool to create a consensus sequence from nanopore sequencing data.
NanoFilt: a tool for filtering and trimming long-read sequencing data. Citation: De Coster W., D’Hert S., Schultz D.T., Cruts M. and Van Broeckhoven C., 2018. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), pp. 2666-2669.
NanoPlot: a plotting tool for long-read sequencing data and alignments. Citation: De Coster W., D’Hert S., Schultz D.T., Cruts M. and Van Broeckhoven C., 2018. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), pp. 2666-2669.
Prokka: a software tool that quickly annotates bacterial, archaeal and viral genomes and produces standards-compliant output files. Citation: Seemann T., 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), pp. 2068-2069.
Rebaler: a program for conducting reference-based assemblies using long reads.
QUAST: Quality Assessment Tool for Genome Assemblies. Citation: Mikheenko A., Prjibelski A., Saveliev V., Antipov D. and Gurevich A., 2018. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics, 34(13), pp. i142-i150.

The assembly.fasta file contains the nucleotide sequences, in FASTA format, of the final assembly produced either by the long-read-only workflow or by the hybrid workflow (long reads + Illumina data). It contains contigs and possibly scaffolds.
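To get a quick overview of assembly.fasta, you can list its sequences and their lengths. The sketch below is a minimal FASTA reader written for this purpose. It is not part of NanoForms, and sequence names depend on the assembler and polishing steps, so the names in the example are hypothetical.

```python
from pathlib import Path


def parse_fasta(text: str) -> dict[str, str]:
    """Map each sequence name (first word after '>') to its full sequence."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # store the previous record
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over many lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def describe_assembly(path: Path) -> None:
    """Print the number of sequences, the total length, and each sequence's length."""
    records = parse_fasta(path.read_text())
    total = sum(len(seq) for seq in records.values())
    print(f"{len(records)} sequences, {total} bases in total")
    for name, seq in sorted(records.items(), key=lambda kv: len(kv[1]), reverse=True):
        print(f"{name}\t{len(seq)}")
```

Calling `describe_assembly(Path("assembly.fasta"))` prints one line per contig or scaffold, longest first.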

In this case, please contact us. We will be glad to answer your question.

Frequently Asked Questions (FAQ)

How to add a dataset?

What is a quality test and how to perform it?

How to assemble genomes using long Nanopore sequencing reads?

How to assemble genomes using both Oxford Nanopore Technologies sequencing reads and paired-end Illumina DNA sequencing reads?

What does the 'Public' badge displayed next to the dataset name mean? Where do the test data come from?

What tools are used in the application and how to cite them?

What does the assembly.fasta file contain?

What if I have a question about NanoForms that I have not found an answer to anywhere?