Metadata for Reprocessed data from 127 Consolidated Epigenomes (111 Roadmap + 16 ENCODE) and Unconsolidated Epigenomes

DATA SOURCE

Google spreadsheet: Metadata and quality control

The spreadsheet contains three sheets (see the bottom of the spreadsheet):

Consolidated_EpigenomeIDs_summary_Table: Main metadata table for 127 consolidated epigenomes
Consolidated_EpigenomeIDs_QC: QC measures for DNase-seq and Histone ChIP-seq datasets from all 127 consolidated epigenomes
Unconsolidated_Release9_QC: QC measures for DNase-seq and Histone ChIP-seq datasets from all unconsolidated epigenomes
Comments in the column headers describe each column (hover over a column header if you open the spreadsheet in a new window, or see the bottom of the sheet if you open it using the Visualize button).

To reduce redundancy, improve data quality, and achieve the uniformity required for our integrative analyses, experiments were systematically equalized and uniformly reprocessed to obtain comprehensive data for 111 consolidated Roadmap epigenomes. Each consolidated epigenome was assigned a short epigenome ID (e.g. E001) and a mnemonic for its name. Key metadata, such as age, sex, anatomy, epigenome class (see below), ethnicity, and solid/liquid status, were summarized for the consolidated epigenomes. Datasets for 16 cell lines from the ENCODE project (epigenome IDs E114 to E129) were processed similarly and used in the integrative analyses. All datasets from the 127 consolidated epigenomes were then uniformly reprocessed (post read mapping) and equalized for read-length-based mappability and sequencing depth, as described in the Processed Data section. The metadata, the mapping of individual Release 9 samples to consolidated epigenome IDs, and various quality control (QC) statistics are summarized in the spreadsheet above.
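The ID scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the released tooling: the helper names (parse_epigenome_id, format_epigenome_id, is_encode) are hypothetical, and the three-digit zero-padded format is inferred from the E001 example. Only the ENCODE range E114 to E129 comes from the text.

```python
import re

# ENCODE cell lines occupy E114 to E129 according to the text above.
ENCODE_FIRST, ENCODE_LAST = 114, 129

# Assumption: every ID is "E" followed by exactly three digits, as in "E001".
_EID_PATTERN = re.compile(r"E(\d{3})")


def parse_epigenome_id(eid: str) -> int:
    """Return the numeric part of an epigenome ID such as 'E001'."""
    match = _EID_PATTERN.fullmatch(eid.strip().upper())
    if match is None:
        raise ValueError(f"not a valid epigenome ID: {eid!r}")
    return int(match.group(1))


def format_epigenome_id(number: int) -> str:
    """Build an epigenome ID from its numeric part, zero-padded to three digits."""
    return f"E{number:03d}"


def is_encode(eid: str) -> bool:
    """True if the ID falls in the ENCODE range stated in the text."""
    return ENCODE_FIRST <= parse_epigenome_id(eid) <= ENCODE_LAST


# The 16 ENCODE epigenome IDs implied by the stated range.
encode_ids = [format_epigenome_id(n) for n in range(ENCODE_FIRST, ENCODE_LAST + 1)]
```

IDs outside the ENCODE range are not automatically valid Roadmap IDs; the summary table in the spreadsheet remains the authoritative list.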

Across the 127 consolidated epigenomes, a total of 105 DNA methylation datasets spanning 96 epigenomes were generated using either bisulfite treatment (WGBS using MethylC-seq, or RRBS) or a combination of MeDIP-seq and MRE-seq assays. DNase-seq chromatin accessibility datasets were available for 53 epigenomes, and mRNA-seq gene expression data for 56 epigenomes. Each of the 127 epigenomes included consolidated ChIP-seq datasets for a core set of five histone modifications (H3K4me1, H3K4me3, H3K27me3, H3K36me3, and H3K9me3) as well as a corresponding whole-cell extract sequenced control. Consolidated H3K27ac and H3K9ac histone ChIP-seq datasets were available for 98 and 62 epigenomes, respectively. A smaller subset of epigenomes had ChIP-seq datasets for additional histone marks, giving a total of 1320 consolidated datasets.

Epigenomes were grouped into classes based on the diversity of assays used to profile them.

Class 1 epigenomes were subjected to a thorough set of assays, including DNA methylation (whole-genome bisulfite sequencing), mRNA expression (RNA-seq), chromatin accessibility (DNase-seq), and ChIP-seq for a large set of histone modifications.
Class 2 epigenomes were used to generate datasets for core histone modifications (ChIP-seq), chromatin accessibility (DNase-seq), DNA methylation (WGBS), and mRNA expression (RNA-seq).
Class 3 epigenomes were used to generate datasets for the core histone modifications (ChIP-seq), chromatin accessibility (DNase-seq), DNA methylation (RRBS or MeDIP/MRE assays), and mRNA expression (microarrays).
Class 4 epigenomes were subjected to the same assays as Class 3 epigenomes except for chromatin accessibility data.
Class 5 epigenomes were subjected to ChIP-seq assays for the minimum set of five core histone modifications.
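The class definitions above can be encoded as a small lookup table, which may help when filtering epigenomes by available data types. This is a sketch under stated assumptions: the dictionary layout, the category keys (histone, accessibility, methylation, expression), and the function names are hypothetical, and None is used here to mean "not listed for this class" in the descriptions above.

```python
# The five core histone modifications named in the text.
CORE_MARKS = frozenset({"H3K4me1", "H3K4me3", "H3K27me3", "H3K36me3", "H3K9me3"})

# Hypothetical encoding of the class descriptions; None = assay not listed.
CLASS_ASSAYS = {
    1: {"histone": "large set", "accessibility": "DNase-seq",
        "methylation": "WGBS", "expression": "RNA-seq"},
    2: {"histone": "core", "accessibility": "DNase-seq",
        "methylation": "WGBS", "expression": "RNA-seq"},
    3: {"histone": "core", "accessibility": "DNase-seq",
        "methylation": "RRBS or MeDIP/MRE", "expression": "microarray"},
    5: {"histone": "core", "accessibility": None,
        "methylation": None, "expression": None},
}
# Class 4 is described as Class 3 without chromatin accessibility data.
CLASS_ASSAYS[4] = {**CLASS_ASSAYS[3], "accessibility": None}


def classes_with(category: str) -> list[int]:
    """Return the classes whose description lists an assay for the category."""
    return sorted(c for c, assays in CLASS_ASSAYS.items()
                  if assays[category] is not None)


def has_core_marks(marks: set[str]) -> bool:
    """True if the given set of histone marks covers all five core marks."""
    return CORE_MARKS <= set(marks)
```

For example, classes_with("accessibility") selects the classes whose description includes DNase-seq. The class actually assigned to each epigenome should be read from the summary table rather than inferred from this sketch.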