We study the molecular mechanisms that define the extent of transcription units generated by RNA polymerase II (Pol II) across mammalian genomes. In particular, we ask how protein coding transcripts differ from long noncoding transcripts in their mode of synthesis and coupled RNA processing.
As depicted in the diagram above, protein coding transcripts (in red) are divided into exonic and intronic sequences and subjected to a variety of co-transcriptional RNA processing reactions to generate translatable mRNA: 5’ end capping, intron removal coupled to exon ligation (splicing), and 3’ end cleavage and polyadenylation (CPA). In addition to these well-established steps in mRNA synthesis, several further co-transcriptional RNA processing mechanisms operate, and these are actively investigated by our research group. First, a fraction of Pol II transcription terminates prematurely, either at cryptic polyA signals (PAS) or through transcript cleavage mediated by the Integrator complex (I). Second, hairpin structures may be excised from introns by the microprocessor complex (M). This releases pre-microRNA, which is subsequently converted into microRNA by cytoplasmic Dicer. Third, CPA cleavage at the gene 3’ end not only promotes release of polyadenylated mRNA, but also exposes the nascent transcript to 5’->3’ exonuclease activity by Xrn2 (X). This ultimately forces Pol II to terminate and dissociate from the gene template by the torpedo mechanism.
As shown in the diagram, long noncoding transcripts (lncRNA, in green) are also synthesized by Pol II but may be formed and processed in a manner distinct from that of protein coding transcripts. Many lncRNA derive from R-loop structures, which are usually detected near the ends of protein coding transcripts. These RNA:DNA hybrids force the non-template DNA strand out of the DNA helix. The single-stranded DNA so formed can act as a template for de novo antisense transcription by Pol II. Such R-loop promoter activity may explain the origin of many lncRNA, especially over protein coding gene promoters (promoter antisense or PROMPT lncRNA) and terminators (gene antisense lncRNA). Furthermore, transcriptional enhancer elements may similarly generate R-loop-dependent lncRNA (eRNA). Some lncRNA are formed independently of protein coding genes in intergenic regions. These are referred to as long intergenic noncoding RNA (lincRNA). All classes of lncRNA are subject to coupled RNA processing and degradation. However, unlike protein coding transcripts, they are usually only weakly spliced and polyadenylated, and they often depend on Integrator to terminate and so restrict their transcription.
All aspects of these coupled transcription and RNA processing mechanisms are under active investigation by our research team, funded by a five-year Investigator Award from the Wellcome Trust (April 2021 to March 2026).