Capacity
Standard tools User uploaded tools
https://www.genome.gov/multimedia/slides/tcga4/23_davidsen.pdf
Democratize Cancer Genomics!
• NCI cloud pilot
– Institute for Systems Biology: www.isb-cgc.org
– Seven Bridges Genomics: www.cancergenomicscloud.org
– Broad Institute: www.firecloud.org
The goals of the NCI Cloud Pilots are to democratize access to NCI-generated genomic and related data, and to create a cost-effective way to provide scalable computational capacity to the cancer research community.
The Institute for Systems Biology (ISB) Cloud provides interactive and programmatic access to data, leveraging many aspects of the Google Cloud Platform. The interactive ISB-CGC web-app allows scientists to interactively define and compare cohorts, examine underlying molecular data for specific genes or pathways of interest, and share insights with collaborators. For computational users, programmatic interfaces and GCP tools such as BigQuery, Genomics, and Compute Engine allow users to perform complex queries from R or Python scripts, or run Dockerized workflows on sequence data available in cloud storage.
Seven Bridges Genomics Cancer Genomics Cloud enables researchers to collaborate on the analysis of large cancer genomics datasets in a secure, reproducible, and scalable manner. A rich query system allows researchers to find the exact data of interest and combine it with their own private data. Native implementation of the Common Workflow Language specification makes it easy for developers, analysts, and bench biologists to deploy, customize and run reproducible analysis methods to learn from genomics data faster.
Broad Institute FireCloud is modeled after their Firehose analysis infrastructure and facilitates collaboration and provides a robust, scalable platform accessible to the community at-large. Using the elastic compute capacity of Google Cloud, FireCloud empowers analysts, tool developers, and production managers to perform large-scale analysis, engage in data curation, and store or publish results. Users can upload their own analysis methods and data to workspaces or run the Broad’s best practice tools and pipelines on pre-loaded data.
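The ISB-CGC description above mentions running queries against BigQuery from Python scripts. The following is a minimal sketch of what such a script might look like, not the ISB-CGC interface itself. The table name and the column names (`sample_id`, `gene_symbol`, `variant_class`) are hypothetical placeholders, since the slides do not name any table. The `run_query` helper assumes the `google-cloud-bigquery` package and valid Google Cloud credentials are available.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical table name, for illustration only; the slides name no table.
EXAMPLE_TABLE = "my-project.my_dataset.somatic_mutations"


def build_gene_query(table: str, gene: str, limit: int = 100) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized SQL string plus its parameter values.

    The column names used here are assumptions, not a documented schema.
    The table name cannot be a query parameter, so it must come from
    trusted configuration rather than from user input.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    sql = (
        "SELECT sample_id, gene_symbol, variant_class\n"
        f"FROM `{table}`\n"
        "WHERE gene_symbol = @gene\n"
        f"LIMIT {int(limit)}"
    )
    return sql, {"gene": gene}


def run_query(sql: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the query with the BigQuery client (needs credentials and billing)."""
    # Imported lazily so the rest of the sketch loads without the package.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("gene", "STRING", params["gene"]),
        ]
    )
    rows = client.query(sql, job_config=job_config).result()
    return [dict(row.items()) for row in rows]


if __name__ == "__main__":
    sql, params = build_gene_query(EXAMPLE_TABLE, "TP53", limit=10)
    print(sql)  # inspect the query before spending any cloud budget on it
```

Passing the gene symbol as a query parameter rather than formatting it into the SQL string keeps the query safe to reuse across cohorts and genes.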
Genomon
• Python (2.7.10)
• Perl (5.14.4)
• R (3.3.1)
• bwa (0.7.8)
• blat (v34)
• samtools (1.2)
• Biobambam (0.0.191)
• PCAP-core (20150511)
• htslib (1.3)
• bedtools (2.24.0)
• GenomonPipeline (2.5.3)
• GenomonSV (0.4.2rc)
• GenomonFisher (0.2.0)
• GenomonMutationFilter (0.2.1)
• EBFilter (0.2.1)
• GenomonPostAnalysis (1.4.0)
• GenomonQC (2.0.1)
• GenomonExpression (0.3.0)
• fusionfusion (0.3.0)
• paplot (0.5.5)
• sv_utils (0.4.0b2)
• annot_utils (0.1.0)
• fusion_utils (0.2.0)
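A pinned version list like the one above can be checked mechanically before a pipeline run. The sketch below is one possible way to do this and is not part of Genomon. It parses `name (version)` entries into a manifest and reports differences from a dictionary of installed versions. How the installed versions are collected is left open, and the `installed` values in the usage lines are made up for illustration.

```python
import re
from typing import Dict, List

# A few entries copied from the list above; the remaining tools follow the same shape.
PINNED_LINES = [
    "bwa (0.7.8)",
    "samtools (1.2)",
    "htslib (1.3)",
    "bedtools (2.24.0)",
    "GenomonPipeline (2.5.3)",
]

# Matches "name (version)", e.g. "bwa (0.7.8)".
_ENTRY = re.compile(r"^\s*(?P<name>[\w.-]+)\s*\((?P<version>[^)]+)\)\s*$")


def parse_manifest(lines: List[str]) -> Dict[str, str]:
    """Turn 'name (version)' lines into a {name: version} mapping."""
    manifest: Dict[str, str] = {}
    for line in lines:
        match = _ENTRY.match(line)
        if match is None:
            raise ValueError(f"cannot parse manifest entry: {line!r}")
        manifest[match["name"]] = match["version"]
    return manifest


def find_mismatches(pinned: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """List tools that are missing or whose version differs from the pin."""
    problems: List[str] = []
    for name, wanted in sorted(pinned.items()):
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: missing (want {wanted})")
        elif found != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    pinned = parse_manifest(PINNED_LINES)
    # Made-up installed versions, for illustration only.
    installed = {"bwa": "0.7.8", "samtools": "1.9", "bedtools": "2.24.0"}
    for problem in find_mismatches(pinned, installed):
        print(problem)
```

Comparing versions as exact strings is deliberately strict: for reproducibility, any difference from the pinned version is worth reporting.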
OS
Microsoft Azure Genomon2 RNA 2016 9
• 774 (Cancer Cell Line Encyclopedia (CCLE)) RNA-seq
• STAR + fusionfusion (https://github.com/Genomon-Project/fusionfusion)
• 230 !
By https://www.microsoft.com/ja-jp/casestudies/imsut.aspx
Cloud genome analytical workflow
Dockstore: https://dockstore.org
GA4GH:
Containers and Workflows working group
NCI cloud pilot
• Democratize Cancer Genomics!
“bring the analysis to the data”
•
Data Bio-sphere; by Benedict Paten
(SeqPod)
• 1.
2.
3. &
4. Amazon
5.
Short summary
•
– NCI cloud pilot
• OS
– reproducible
Public aaS
OPINION Open Access
Computing patient data in the cloud:
practical and legal considerations for genetics and genomics research in Europe and internationally
Fruzsina Molnár-Gábor1*, Rupert Lueck2, Sergei Yakneen2 and Jan O. Korbel2*
Abstract
Biomedical research is becoming increasingly large-scale and international. Cloud computing enables the comprehensive integration of genomic and clinical data, and the global sharing and collaborative processing of these data within a flexibly scalable infrastructure. Clouds offer novel research opportunities in genomics, as they facilitate cohort studies to be carried out at unprecedented scale, and they enable computer processing with superior pace and throughput, allowing researchers to address questions that could not be addressed by studies using limited cohorts. A well-developed example of such research is the Pan-Cancer Analysis of Whole Genomes project, which involves the analysis of petabyte-scale genomic datasets from research centers in different locations or countries and different jurisdictions. Aside from the tremendous opportunities, there are also concerns regarding the utilization of clouds; these concerns pertain to perceived limitations in data security and protection, and the need for due consideration of the rights of patient donors and research participants. Furthermore, the increased outsourcing of information technology impedes the ability of researchers to act within the realm of existing local regulations owing to fundamental differences in the understanding of the right to data protection in various legal systems. In this Opinion article, we address the current opportunities and limitations of cloud computing and highlight the responsible use of federated and hybrid clouds that are set up between public and private partners as an adequate solution for genetics and genomics research in Europe, and under certain conditions between Europe
Molnár-Gábor et al. Genome Medicine (2017) 9:58
DOI 10.1186/s13073-017-0449-6