Interactive and exploratory visualization of epigenome-wide data

ISMB, July 2015

Héctor Corrada Bravo (@hcorrada, hcorrada@umiacs.umd.edu)
Center for Bioinformatics and Computational Biology, University of Maryland

Our motivation

Measuring DNA methylation and understanding its role in gene expression regulation in solid tumors

  • Large blocks of hypomethylation (sometimes several Mb long) in colon cancer
  • Hyper-variable genes are enriched within these blocks.
  • Consistently hyper-variable genes are tissue-specific.

Hansen, et al., Nat. Genetics, 2011
Corrada Bravo, et al., BMC Bioinformatics, 2012
Timp, et al., Genome Medicine, 2014
Dinalankara, et al., Cancer Informatics, 2015

R/Bioconductor

  • State-of-the-art computational and statistical analysis platform
  • We develop and apply methods for these analyses in this platform
  • Our collaborators perform analysis in this platform

What we wanted

  • Data transformation and modeling: data smoothing, region finding (R/Bioconductor: BSmooth, minfi)
  • Genome browsing: search by gene, search by overlap
  • Region analysis: overlap with other data (our own, other labs, UCSC, Ensembl)
  • Regulation: expression data (Gene Expression Barcode)

Analysis era

  • We have an unprecedented ability to measure,
  • and lots of publicly available data to contextualize our measurements
[H. Wickham]

Integrative, visual and computational exploratory analysis of genomic data

  • Browser-based
  • Interactive
  • Data integration from multiple sources
  • Reproducible dissemination
  • Communication with R/Bioc: epivizr package

I want to use a genome browser track as a display device in R!!

[Nat. Methods, 2014]

Plug in data from R with the epivizr package

Workspaces and filtering

Data transformations and customization

More transformations and aggregation

Add new visualizations

Communication with R/Bioc

Using the epivizr package

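  • Set up an epivizr session
library(epivizr)
mgr <- startEpiviz(workspace="qyOTB6vVnff")
  • Calculate a statistic of interest
library(SummarizedExperiment)  # assay(), rowRanges()
library(matrixStats)           # rowSds()

# Get tumor methylation base-pair data (se: SummarizedExperiment created earlier, not shown)
m <- assay(se)[,"tumor"]

# Compute regions with highest variability across CpGs
# (calcWindowStat: helper used in this demo; its definition is not shown)
region_stat <- calcWindowStat(m, step=25, window=80, stat=rowSds)
s <- region_stat[,"stat"]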
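Since `calcWindowStat` is not defined on the slides, here is a minimal base R sketch of what such a sliding-window statistic might compute. The function `window_stat`, its `indices`/`stat` output columns, and the choice of the window's center CpG as its representative index are all assumptions for illustration. `apply(mat, 1, sd)` stands in for `matrixStats::rowSds`.

```r
# Hypothetical stand-in for calcWindowStat: names and output layout are assumptions.
window_stat <- function(x, step, window,
                        stat = function(mat) apply(mat, 1, sd)) {
  stopifnot(window >= 2, length(x) >= window)
  # start position of each window, advancing `step` CpGs at a time
  starts <- seq(1L, length(x) - window + 1L, by = step)
  # one row per window holding `window` consecutive values of x
  mat <- t(vapply(starts, function(s) x[s:(s + window - 1L)], numeric(window)))
  # index of the CpG at the center of each window
  centers <- starts + window %/% 2L
  # across-CpG statistic (standard deviation by default) for every window
  cbind(indices = centers, stat = stat(mat))
}

# Toy usage with simulated methylation values (not real data)
set.seed(1)
m <- c(runif(200, 0.7, 0.9), runif(80, 0, 1), runif(200, 0.7, 0.9))
region_stat <- window_stat(m, step = 25, window = 80)
s <- region_stat[, "stat"]
```

Passing the statistic as a function that maps a windows-by-CpGs matrix to one value per row mirrors the `stat=rowSds` argument on the slide. With matrixStats installed, `stat = matrixStats::rowSds` should give the same result.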

Communication with R/Bioc

Using the epivizr package: browse by regions of interest.

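  • What is around the regions with the highest across-CpG variability?
# get locations in decreasing order of the window statistic
o <- order(s, decreasing=TRUE)
indices <- region_stat[o, "indices"]

# widen each CpG location by 1.25 Mb on both sides (GRanges arithmetic)
slideShowRegions <- rowRanges(se)[indices] + 1250000L

# step through the regions in the browser
mgr$slideshow(slideShowRegions)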
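The rank-and-widen step can also be written with plain vectors, which makes the interval arithmetic explicit. In this sketch, `top_regions` is a hypothetical helper that assumes a single chromosome and integer CpG positions. Clipping the start at position 1 is an extra choice made here and is not part of the slide's GRanges code.

```r
# Hypothetical sketch: rank windows by their statistic and widen the top ones,
# mimicking `rowRanges(se)[indices] + 1250000L` with plain vectors (one chromosome).
top_regions <- function(pos, indices, stat, n = 10L, flank = 1250000L) {
  o <- order(stat, decreasing = TRUE)   # highest statistic first
  keep <- head(indices[o], n)           # representative CpG index of each top window
  center <- pos[keep]                   # genomic position of that CpG
  data.frame(start = pmax(1L, center - flank),  # extra choice: clip at position 1
             end = center + flank)
}

# Toy usage: three CpGs on one chromosome (illustrative values only)
pos <- c(100L, 2000000L, 5000000L)
regions <- top_regions(pos, indices = 1:3, stat = c(0.1, 0.5, 0.3), n = 2L)
regions
```

A GRanges object also carries chromosome names and strand, which is why the slide works with `rowRanges(se)` directly rather than bare positions.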

epivizr uses WebSockets for its connection to the browser, the same as Shiny. Big, big, big thanks to the @rstudio folks for working on this infrastructure.

Statistically informed visual exploration

Build your own browser

  • Standalone version (JavaScript code bundled in the epivizr Bioconductor package)
  • Browse your favorite genome:
library(epivizr)
library(Mus.musculus)

mgr <- startStandalone(geneInfo=Mus.musculus, geneInfoName="mm10",
                          keepSeqlevels=paste0("chr",c(1:19,"X","Y")))

Extensible framework

What are we working on?

  • EpivizWidgets: using epiviz visualizations within, e.g., R Markdown
  • VisualCollaboration: collaborative annotation of workspaces, datasets, visualizations, etc.

Beyond genomics and epigenomics: metagenomics

[Human Microbiome Project]

Coordinates:

Samples:

Hierarchically organized features

Built with epivizr and metagenomeSeq
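To illustrate hierarchically organized features, here is a minimal base R sketch that sums feature counts from a lower taxonomic level to a higher one. The counts, taxonomy labels, and the helper `aggregate_level` are hypothetical toy inputs. The sketch does not use metagenomeSeq's own functions, and it only approximates how the hierarchical view groups features.

```r
# Toy sketch (hypothetical data): aggregate OTU-level counts up a taxonomy.
counts <- matrix(c(5, 0, 3,
                   2, 4, 1,
                   0, 7, 2,
                   1, 1, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("otu", 1:4), c("s1", "s2", "s3")))
genus <- c(otu1 = "Bacteroides", otu2 = "Bacteroides",
           otu3 = "Prevotella", otu4 = "Prevotella")
phylum <- c(Bacteroides = "Bacteroidetes", Prevotella = "Bacteroidetes")

# Hypothetical helper: rowsum() adds up rows of `counts` sharing a group label
aggregate_level <- function(counts, level) {
  rowsum(counts, group = level[rownames(counts)])
}

by_genus <- aggregate_level(counts, genus)       # OTU -> genus
by_phylum <- aggregate_level(by_genus, phylum)   # genus -> phylum
by_genus
by_phylum
```

Each level is computed from the one below it, so any node in the hierarchy can be expanded into its children or collapsed into its parent.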

Analysis era

[H. Wickham]

One interpretation of Big Data: many relevant sources of contextual data

  • Easily access/integrate contextual data
  • Driven by exploratory analysis of immediate data
  • Iterative process
  • Visual and computational exploration go hand in hand

Creativity in exploration

We are building a software system to support creative exploratory analysis of epigenome-wide datasets...

[T. Speed]

Acknowledgements

Florin Chelaru, UMD

  • CBCB@UMD
  • JHU/Harvard: K. Hansen, W. Timp, R. Irizarry, A. Feinberg
  • Genentech: Michael Lawrence
  • RStudio: Joe Cheng, et al.
  • Funding: NIH, Genentech

Check it out: