New genes arise through gene duplication, retrotransposition, exon shuffling, gene fusion/fission, and de novo genesis from noncoding DNA. RNA-mediated gene fusions (RMGFs) have been shown to introduce functional novelty, divergent selective pressures and divergent expression profiles relative to their unfused parental genes. However, the frequency and properties of these new genes remain largely unknown. By applying genome-wide networks to next-generation sequencing (NGS) data from great apes, we aim to identify RMGFs, investigate their epigenetic profiles and analyse their potential mechanisms of generation, particularly through segmental duplications (SDs). We then aim to investigate their expression and translation profiles both computationally and experimentally, and to characterise the cis-regulatory mechanisms that govern RMGF transcription. Finally, to better understand the modular structure of RMGFs, we carried out network-based analyses of their Pfam domain usage patterns. We identified 69 RMGFs, including 9 human-specific genes, traced their ancestry across 32 high-quality vertebrate species, and showed that they are significantly enriched in human SDs. qRT-PCR and RNA-seq analyses reveal heterogeneous tissue expression with a bias towards testis-specific expression, in support of the ‘out-of-testis’ hypothesis. Moreover, cis-regulatory analyses of splicing factor binding sites, histone modifications and transcription factor binding sites support this expression profile. Ribosome profiling of human fibroblast cell lines uncovered evidence of translation for 3 RMGFs, all of which remain functionally unannotated. RMGF domain usage does not differ significantly from that of non-fused protein-coding genes, either in human or across vertebrates. Our genome-wide scan across primates shows that RMGFs occur frequently and are enriched in regions of SD, that their transcriptional output and cis-regulatory motifs support the ‘out-of-testis’ hypothesis, and that their domain usage does not differ significantly from that of non-fused genes.