Interactive and exploratory visualization of epigenome-wide data

Bio-IT World, April 23, 2015

Hector Corrada Bravo
Center for Bioinformatics and Computational Biology, University of Maryland

Our motivation

Measuring DNA methylation and understanding its role in expression regulation in solid tumors

Large blocks of hypo-methylation (sometimes megabases long) in colon cancer

Hansen, et al., Nat. Genetics, 2011
Corrada Bravo, et al., BMC Bioinformatics, 2012
Timp, et al., Genome Medicine, 2014
Dinalankara, et al., Cancer Informatics, 2015

Hyper-variable genes are enriched within these blocks.

Consistently hyper-variable genes are tissue-specific.

Blocks can be detected using Illumina bead arrays.

Hyper-variability is enriched within hypo-methylation blocks

R/Bioconductor

  • State-of-the-art computational and statistical analysis platform
  • We develop and apply methods for these analyses in this platform
  • Our collaborators take part in analysis in this platform

What we wanted

  • Data transformation and modeling: data smoothing, region finding (R/Bioconductor: BSmooth, minfi)
  • Genome browsing: search by gene, search by overlap
  • Region analysis: overlap with other data (our own, other labs, UCSC, Ensembl)
  • Regulation: expression data (Gene Expression Barcode)
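
The "search by overlap" and "overlap with other data" items above can be sketched with the GenomicRanges package from Bioconductor. This is a minimal illustration, not the analysis from the talk: the coordinates and the gene names (geneA, geneB, geneC) are made up.

```r
library(GenomicRanges)

# Toy methylation blocks (hypothetical coordinates, for illustration only)
blocks <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(100, 5000, 300), end = c(2000, 9000, 800))
)

# Toy gene annotation (hypothetical names and coordinates)
genes <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(1500, 20000, 100), end = c(2500, 21000, 400)),
  gene = c("geneA", "geneB", "geneC")
)

# Pair every block with every gene it overlaps
hits <- findOverlaps(blocks, genes)

# Genes that fall at least partly inside a block
genes_in_blocks <- genes[unique(subjectHits(hits))]

# Number of overlapping genes per block
genes_per_block <- countOverlaps(blocks, genes)
```

The same pattern applies when the second object comes from another lab's data or a public annotation, as long as both are expressed as genomic ranges on the same genome build.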

Analysis era

  • Funding calls to (strictly) analyze project data
    • Roadmap Epigenomics, ENCODE, TCGA, ...
  • Journals calling for (strictly) analysis papers (e.g., Nature Methods)
  • We have unprecedented ability to measure
  • and lots of publicly available data to contextualize it

[H. Wickham]

Integrative, visual and computational exploratory analysis of genomic data

  • Browser-based
  • Interactive
  • Integration of data
  • Reproducible dissemination
  • Communication with R/Bioc: epivizr package

I want to use a genome browser track as a display device in R!!

[Nat. Methods, 2014]

Walkthrough and Use Case

  • Plug-in data from R with epivizr package
  • Workspaces and filtering
  • Data transformations and customization
  • Navigate and annotate
  • Transformations and Aggregation
  • Add new visualizations
  • Statistically informed visual exploration
  • Reproduce, disseminate and collaborate

Communication with R/Bioc

Using the epivizr package

  • Set up an epivizr session
mgr <- startEpiviz(workspace="qyOTB6vVnff")
  • Add a device with GRanges data
blocks_dev <- mgr$addDevice(colon_blocks, "450k blocks")
  • Subset ranges by width
keep <- width(colon_blocks) > 250000
mgr$updateDevice(blocks_dev, colon_blocks[keep,])
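
The width filter above can be tried without a browser session. The sketch below uses a small made-up GRanges object, toy_blocks, as a stand-in for colon_blocks; only the subsetting step is shown, not the epivizr calls.

```r
library(GenomicRanges)

# Hypothetical stand-in for colon_blocks: three blocks of different widths
toy_blocks <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(1, 500001, 2000001),
                   width = c(100000, 300000, 1200000))
)

# Logical vector marking blocks wider than 250 kb
keep <- width(toy_blocks) > 250000

# Subset the ranges; this is the object one would pass to the device update
big_blocks <- toy_blocks[keep]
```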

Using the epivizr package: browse by regions of interest.

  • What's around the widest blocks?
o <- order(-width(colon_blocks))
slideShowRegions <- colon_blocks[o[1:5],]
slideShowRegions <- slideShowRegions + 1e5
mgr$slideshow(slideShowRegions)
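
To see what the ordering and padding steps produce before handing regions to the slideshow, here is a minimal sketch on made-up data (toy_blocks is hypothetical). Adding a number to a GRanges object extends each range by that amount on both sides.

```r
library(GenomicRanges)

# Hypothetical blocks on three chromosomes
toy_blocks <- GRanges(
  seqnames = c("chr1", "chr2", "chr3"),
  ranges = IRanges(start = c(200001, 500001, 300001),
                   width = c(100000, 400000, 250000))
)

# Order from widest to narrowest and keep the two widest
o <- order(-width(toy_blocks))
top <- toy_blocks[o[1:2]]

# Pad each selected region by 100 kb on both sides for context
padded <- top + 1e5
```

Each padded region is 200 kb wider than the block it came from, so the browser shows the block together with its flanking sequence.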

epivizr uses WebSockets for its connection, the same mechanism shiny uses. Big, big, big thanks to the @rstudio folks for working on this infrastructure.

Plugins, plugins, plugins

Our architecture is dynamically extensible. We can easily integrate new data types and add new visualizations.

Example: adding a new visualization

see: https://gist.github.com/11017650

Datatypes

  • Based on "three-table" design in Bioconductor infrastructure
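
As an illustration of the "three-table" idea, and assuming it refers to the measurement, feature annotation, and sample annotation tables used by Bioconductor containers, the sketch below builds a SummarizedExperiment from made-up values. The sample names and the status column are hypothetical.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Table 1: measurements, one row per genomic feature, one column per sample
beta <- matrix(c(0.9, 0.2, 0.5, 0.8, 0.1, 0.4), nrow = 3,
               dimnames = list(NULL, c("normal_1", "tumor_1")))

# Table 2: feature annotation (genomic coordinates of each row)
row_ranges <- GRanges("chr1", IRanges(start = c(100, 500, 900), width = 1))

# Table 3: sample annotation (one row per column of the measurements)
col_data <- DataFrame(status = c("normal", "tumor"),
                      row.names = c("normal_1", "tumor_1"))

se <- SummarizedExperiment(assays = list(beta = beta),
                           rowRanges = row_ranges,
                           colData = col_data)

# Subsetting by sample keeps all three tables in sync
tumor_only <- se[, se$status == "tumor"]
```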

Build your own browser

  • Standalone version (JavaScript code bundled in epivizr)
  • Browse your favorite genome:
library(epivizr)
library(Mus.musculus)

mgr <- startStandalone(geneInfo=Mus.musculus, geneInfoName="mm10",
                       keepSeqlevels=paste0("chr",c(1:19,"X","Y")))

Analysis era

[H. Wickham]

One interpretation of Big Data: many relevant sources of contextual data

  • Easily access/integrate contextual data
  • Driven by exploratory analysis of immediate data

  • Iterative process
  • Visual and computational exploration go hand in hand

Creativity in exploration

We are building a software system to support creative exploratory analysis of epigenome-wide datasets...

[T. Speed]

Visualization goals

  • Context
    • Integrate and align multiple data sources; navigate; search
    • Connect: brushing
    • Encode: map visualization properties to data on the fly
    • Reconfigure: multiple views of the same data

  • Data
    • Select and filter: tight-knit integration with R/Bioconductor
    • (future) filters on visualization propagate to data environment
  • Model
    • New 'measurements' that result from modeling, perhaps suggested by data context

[Perer & Shneiderman]

Acknowledgements

Florin Chelaru, UMD

  • CBCB@UMD
  • JHU/Harvard: K. Hansen, W. Timp, R. Irizarry, A. Feinberg
  • Genentech: Michael Lawrence
  • RStudio: Joe Cheng, et al.
  • Funding: NIH, Genentech

Check it out: