Which sequence features promote R-loop formation during transcription?

R-loops form when nascent RNA hybridizes back to its DNA template strand, displacing the non-template strand as single-stranded DNA. The sequence features that favor this structure are largely predictable and strongly influence where R-loops arise during transcription.

Sequence determinants that promote R-loop formation

High GC skew and G-richness on the non-template strand make RNA:DNA hybridization thermodynamically favorable, because the resulting G-rich RNA forms an especially stable hybrid with the C-rich template strand. Runs of consecutive guanines and clustered G tracts also allow the displaced non-template strand to fold into G-quadruplexes, which stabilize the R-loop. CpG islands and other GC-rich regions at promoters or terminators are particularly prone to R-loop formation. Low intrinsic nucleosome occupancy and open chromatin leave the DNA more accessible to strand separation, facilitating RNA re-annealing. High transcription rates combined with promoter-proximal pausing increase the dwell time of the nascent RNA near its template, raising the probability of hybridization. Local DNA topology also matters: negative supercoiling behind the elongating RNA polymerase lowers the energy barrier for strand separation and favors hybrid formation.
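The two purely sequence-based features above, GC skew and G-quadruplex potential, can be estimated directly from a sequence. The following is a minimal sketch, not an established R-loop predictor. It assumes the input is the non-template (coding) strand read 5' to 3', uses the conventional skew definition (G - C) / (G + C), and approximates G-quadruplex potential with a commonly used motif of four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The function names, window size, and step are illustrative choices.

```python
import re

# Assumed motif for G-quadruplex candidates: four G runs (>= 3 G each)
# separated by loops of 1-7 nucleotides. This is a heuristic, not a
# guarantee that the structure forms.
G4_MOTIF = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int = 100, step: int = 50):
    """Return a list of (start, skew) pairs over sliding windows.

    Positive skew on the non-template strand marks G-rich stretches,
    the configuration described above as favoring R-loops.
    """
    last_start = max(len(seq) - window, 0)
    return [(i, gc_skew(seq[i:i + window]))
            for i in range(0, last_start + 1, step)]


def find_g4_candidates(seq: str):
    """Return (start, end) spans matching the assumed G4 motif."""
    return [(m.start(), m.end()) for m in G4_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    example = "AAGGGTGGGTGGGTGGGAA"
    print(gc_skew(example))
    print(find_g4_candidates(example))
```

Windows with strongly positive skew that also overlap G-quadruplex candidates would be the regions this heuristic flags. Chromatin state, transcription rate, and supercoiling are not captured by sequence alone.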

Causes, relevance, and biological consequences

These sequence and topological features interact with cellular factors to determine R-loop landscapes. Andrés Aguilera and Tatiana García-Muse (CABIMER, University of Seville) have emphasized that R-loops are not mere byproducts of transcription: when persistent, they can cause genome instability by generating single-stranded DNA that is vulnerable to damage and by provoking collisions between the transcription and replication machineries. In some genomic contexts, R-loops have beneficial roles, helping regulate gene expression, facilitating class switch recombination in immune cells, or marking sites of epigenetic regulation. Their persistence, however, is linked to human disease: neurodegenerative disorders and cancers often show altered R-loop homeostasis.

Organismal and genomic context also shapes R-loop biology. Bacterial genomes, which use different transcription termination mechanisms, exhibit R-loop patterns distinct from those of vertebrate genomes, where CpG-rich promoters and chromatin structure play larger roles. Environmental stressors that change transcription rates or DNA supercoiling can shift where R-loops form, with potential consequences for tissue-specific disease vulnerability. Understanding the sequence features that promote R-loops helps target genome-stability pathways and informs therapeutic strategies that aim to modulate R-loop levels without disrupting their regulatory functions.