I am a Master’s student at the University of Sheffield. I spend most of my spare time climbing, cycling or windsurfing and thoroughly enjoy a crazy adventure at the weekend. Favourite food: proper chip shop chips!
The prevailing assumption is that all duplicate reads in RNA-Sequencing (RNA-Seq) are the result of biological activity. This is because a high level of biological representation, combined with shorter potential mapping regions, makes biological duplicates more likely than technical ones. This contrasts with DNA-Seq, where any duplicates are assumed to be technical. However, there is a growing body of evidence to suggest that technical duplicates account for a significant proportion of reads in some RNA-Seq experiments, which could affect a wide range of studies. My project investigates the proportion of technical duplicates in RNA-Seq data using a modelling-based approach. The goal is to create a tool that estimates the proportion of true biological duplicates in order to guide downstream analysis.
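To illustrate why shorter mapping regions make duplicates more likely by chance alone, here is a minimal sketch. It is not the model used in my project; it assumes, purely for illustration, that read start positions are drawn uniformly at random from a fixed number of possible positions. The function name and the example numbers (1,000 reads, a 2,000-position transcript, a 3-billion-position genome) are hypothetical.

```python
import math


def expected_duplicate_fraction(n_reads: int, n_positions: int) -> float:
    """Expected fraction of reads sharing a start position with an earlier read.

    Assumption (illustrative only): each read starts at one of `n_positions`
    sites, chosen uniformly and independently. Real RNA-Seq coverage is not
    uniform, so this is only a rough illustration.
    """
    if n_reads <= 0 or n_positions <= 0:
        raise ValueError("n_reads and n_positions must be positive")
    # Probability that a given position receives no read: (1 - 1/L)^n.
    # log1p/expm1 keep the calculation accurate when L is very large.
    log_p_empty = n_reads * math.log1p(-1.0 / n_positions)
    expected_distinct = n_positions * -math.expm1(log_p_empty)
    return 1.0 - expected_distinct / n_reads


if __name__ == "__main__":
    # Hypothetical numbers: the same read count over a short transcript
    # versus a genome-sized region.
    print("transcript:", expected_duplicate_fraction(1_000, 2_000))
    print("genome:    ", expected_duplicate_fraction(1_000, 3_000_000_000))
```

Under this simplified assumption, the same number of reads produces far more coincidental duplicates over a short transcript than over a whole genome, which is why duplicates in DNA-Seq are usually treated as technical while those in RNA-Seq are not.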