Stable unmethylated DNA demarcates expressed genes and their cis-regulatory space in plant genomes
Crisp PA, Marand AP, Noshay JM, Zhou P, Lu Z, Schmitz RJ, Springer NM 2020. Proc Natl Acad Sci U S A

Crop genomes can be very large, with many repetitive elements and pseudogenes. Distilling a genome down to the relatively small fraction of regions that are functionally valuable for trait variation can be like looking for needles in a haystack. The location of these regions is often not obvious, and current detection technologies are impractically expensive and intensive for many research projects. The unmethylated regions in a genome are highly stable during vegetative development and can reveal the locations of potentially expressed genes or cis-regulatory elements. This approach provides a framework toward complete annotation of genes and discovery of cis-regulatory elements using methylation profiles from only a single tissue.

The genomic sequences of crops continue to be produced at a frenetic pace. It remains challenging to develop complete annotations of functional genes and regulatory elements in these genomes. Chromatin accessibility assays enable discovery of functional elements; however, to uncover the full portfolio of cis-elements would require profiling of many combinations of cell types, tissues, developmental stages, and environments. Here, we explore the potential to use DNA methylation profiles to develop more complete annotations. Using leaf tissue in maize, we define ∼100,000 unmethylated regions (UMRs) that account for 5.8% of the genome; 33,375 UMRs are found greater than 2 kb from genes. UMRs are highly stable in multiple vegetative tissues, and they capture the vast majority of accessible chromatin regions from leaf tissue. However, many UMRs are not accessible in leaf, and these represent regions with potential to become accessible in specific cell types or developmental stages. These UMRs often occur near genes that are expressed in other tissues and are enriched for binding sites of transcription factors. The leaf-inaccessible UMRs exhibit unique chromatin modification patterns and are enriched for chromatin interactions with nearby genes. The total UMR space in four additional monocots ranges from 80 to 120 megabases, which is remarkably similar considering the range in genome size of 271 megabases to 4.8 gigabases. In summary, based on the profile from a single tissue, DNA methylation signatures provide powerful filters to distill large genomes down to the small fraction of putative functional genes and regulatory elements.
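The idea of defining UMRs from a single methylation profile and then separating gene-proximal from gene-distal regions can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 100-bp tile size, the 10% methylation cutoff, and the 300-bp minimum length are all assumptions chosen for illustration. Only the 2-kb distance used to flag distal UMRs comes from the text above.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates in bp


def call_umrs(tile_meth: List[Optional[float]], tile_size: int = 100,
              max_meth: float = 0.10, min_len: int = 300) -> List[Interval]:
    """Merge runs of consecutive low-methylation tiles into candidate UMRs.

    tile_meth holds one methylation fraction per fixed-size tile along a
    chromosome; None marks a tile without data. All thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    umrs: List[Interval] = []
    run_start: Optional[int] = None
    for i, m in enumerate(tile_meth):
        low = m is not None and m <= max_meth
        if low and run_start is None:
            run_start = i                      # a new unmethylated run begins
        elif not low and run_start is not None:
            umrs.append((run_start * tile_size, i * tile_size))
            run_start = None                   # the run ended at this tile
    if run_start is not None:                  # a run reaching the last tile
        umrs.append((run_start * tile_size, len(tile_meth) * tile_size))
    # Drop regions shorter than the minimum length.
    return [(s, e) for s, e in umrs if e - s >= min_len]


def distance_to_nearest_gene(umr: Interval, genes: List[Interval]) -> float:
    """Distance in bp from a UMR to the closest gene (0 if they overlap)."""
    s, e = umr
    best = float("inf")                        # no genes: infinitely far
    for gs, ge in genes:
        if gs < e and s < ge:
            d = 0                              # intervals overlap
        elif ge <= s:
            d = s - ge                         # gene lies upstream of the UMR
        else:
            d = gs - e                         # gene lies downstream
        best = min(best, d)
    return best


def split_by_distance(umrs: List[Interval], genes: List[Interval],
                      distal_bp: int = 2000) -> Tuple[List[Interval], List[Interval]]:
    """Partition UMRs into (proximal, distal) using a distance cutoff."""
    proximal, distal = [], []
    for u in umrs:
        target = distal if distance_to_nearest_gene(u, genes) > distal_bp else proximal
        target.append(u)
    return proximal, distal


# Toy example: eleven 100-bp tiles and one hypothetical gene.
meth = [0.9, 0.05, 0.02, 0.0, 0.8, 0.01, 0.9, 0.0, 0.0, 0.0, 0.0]
genes = [(2500, 4000)]
umrs = call_umrs(meth)
proximal, distal = split_by_distance(umrs, genes)
```

In this toy input the single low-methylation tile is discarded by the length filter, and the two surviving regions fall on either side of the 2-kb cutoff. A real analysis would work per chromosome, handle the separate cytosine sequence contexts, and use an interval index instead of the linear scan over genes.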