One of the most challenging aspects of sequencing is the relative ease with which hundreds of gigabases of sequence can be produced compared to the difficulty of analyzing these results. Nimbus attempts to simplify the analysis of large sequence datasets by providing customers with workflows that process data using the most popular open source tools in the field. For example, we have workflows that align sequences (BFAST, BWA, Novoalign, and more), call SNVs and small indels (SAMtools mpileup, GATK), and annotate the variants with information from both public and private data sources (dbSNP, the 1000 Genomes Project, OMIM, GeneTest, etc.). We continuously update (and version) these workflows to take advantage of the rapidly evolving tool space, but still allow you to re-run previous versions to keep your analysis consistent.
Most universities, institutions, and large companies have access to computational resources such as a Linux cluster. Yet the extreme growth of sequencer output (approximately a 10-fold increase every two years) far outpaces the growth of information technology resources, both in terms of Moore's law and in terms of what local computing staff can reasonably install. The cloud provides a mechanism to quickly scale up computational resources to support huge amounts of data processing. Providers such as Amazon have huge computational capacity that is both reasonably priced and extremely easy to scale up. We leverage this capacity to process the largest NGS datasets quickly and efficiently.
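To make the size of that gap concrete, the arithmetic can be sketched in a few lines of Python. The 10-fold-every-two-years figure comes from the text above; treating Moore's law as a doubling every two years is an assumption made here for illustration, and the function names are hypothetical.

```python
def growth_factor(fold_per_period: float, period_years: float, years: float) -> float:
    """Total growth after `years`, compounding `fold_per_period` every `period_years`."""
    return fold_per_period ** (years / period_years)


# From the text: sequencer output grows roughly 10-fold every two years.
SEQUENCER = (10.0, 2.0)
# Assumption for illustration: Moore's law as a doubling every two years.
MOORE = (2.0, 2.0)


def gap_after(years: float) -> float:
    """How many times faster sequencer output has grown than compute after `years`."""
    return growth_factor(*SEQUENCER, years) / growth_factor(*MOORE, years)


for years in (2, 4, 6):
    print(f"after {years} years, data has outgrown compute by about {gap_after(years):.0f}x")
```

Under these assumptions the shortfall itself compounds, growing about 5-fold every two years, which is why fixed local hardware falls behind so quickly.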
There are several excellent cloud providers of NGS analysis services. What makes Nimbus Informatics unique is our specialization in analyzing full human genomes/exomes. We provide an end-to-end solution: aligning, re-aligning, re-calibrating quality scores, calling variants, and annotating them. Furthermore, we can provide the resulting data as a website that includes a genome browser instance and a query engine for filtering variants. For example, see the HuRef website at http://huref.nimbusinformatics.com.
We follow industry best practices for securing data on the cloud. Client data is encrypted using GnuPG before being sent to the cloud, working data is always written to encrypted volumes, and results are encrypted when stored or returned to users. We also carefully track data histories, access, and permissions.
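As a rough illustration of the first step, encrypting a file with GnuPG before upload, the sketch below builds a standard `gpg` public-key encryption command. This is not necessarily the invocation Nimbus uses: the file name, the recipient address, and the helper function are hypothetical, and the command is only printed, not executed.

```python
import shlex
from pathlib import Path
from typing import List


def gpg_encrypt_command(source: Path, recipient: str) -> List[str]:
    """Build a gpg command that encrypts `source` to `recipient`'s public key.

    The encrypted copy is written next to the original with a .gpg suffix.
    """
    encrypted = source.with_name(source.name + ".gpg")
    return [
        "gpg",
        "--batch",                   # never prompt interactively
        "--yes",                     # overwrite an existing output file
        "--output", str(encrypted),
        "--recipient", recipient,    # key must already be in the local keyring
        "--encrypt", str(source),
    ]


# Hypothetical file and recipient, for illustration only.
cmd = gpg_encrypt_command(Path("sample_reads.fastq"), "analysis@example.org")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would perform the encryption on a machine with GnuPG installed and the recipient's public key imported; only the resulting `.gpg` file would then be uploaded.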
All of our work, with the exception of private, licensed annotation sources, is made available via the open source SeqWare project. We feel very passionately that, as scientists, we need to have free access to the underlying code used in our research. To that end, we use common open source tools that are widely used and well represented in the published literature. By releasing our code, we can both contribute to and leverage work done by other SeqWare users such as TCGA at the University of North Carolina, Chapel Hill and the ICGC at the Ontario Institute for Cancer Research. We believe software is best when it is free, transparent, and open.
Without Nimbus, the Cannabis Genome would still be on my personal computer assembled and inaccessible to the world.
Aug 18, 2011
Feb 2, 2011
Our public website is now live.
Dec 31, 2010
Nimbus Informatics is now in private beta. If you are interested in receiving an invitation code, please email email@example.com.