What Are UDIs and UMIs and Their Benefits in Next-Generation Sequencing?
If you are interested in sequencing multiple samples in parallel, you may have heard of unique dual indexes (UDIs) and unique molecular identifiers (UMIs). Next-generation sequencing (NGS) workflows have improved significantly through the integration of these artificial nucleotide sequences, or “barcodes,” during library preparation [1]. But what are they, and how do they work? In this article, we explain the features of UDIs and UMIs to help you understand the key benefits they provide in NGS experiments.
What Are Unique Dual Indexes (UDIs)?
UDIs are pairs of index sequences, one incorporated at each end of an NGS library molecule, that let researchers determine which library a sequencing read came from when many libraries are multiplexed and sequenced in parallel. The two index sequences within one UDI are distinct from each other, and neither is reused in any of the other UDIs provided (Figure 1).
- During the final PCR step of most library preparations, each library is assigned its own UDI by index primers that also serve two other purposes: amplifying the library and completing the full-length sequencing adapters.
- After library preparation, every molecule within a library carries the same pair of index sequences: the i7 index in its P7 adapter and the i5 index in its P5 adapter.
- During sequencing, index reads will be generated for every sequenced molecule, allowing each of the millions or billions of reads to be assigned to its respective library.
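Those steps can be sketched in a few lines of Python. This is a minimal illustration and does not reflect how any particular demultiplexing tool works. The sample sheet, index sequences, and read IDs below are hypothetical. The index reads are assumed to be in the same orientation as the sample sheet, and only exact matches are accepted, whereas production demultiplexers typically tolerate a small number of index mismatches.

```python
from collections import defaultdict

# Hypothetical UDI sample sheet: each library gets its own (i7, i5) pair,
# and no i7 or i5 sequence is shared between libraries.
SAMPLE_SHEET = {
    ("AAGCTTGC", "GTACCAGT"): "library_A",
    ("CTGATCGA", "TCAGGTCA"): "library_B",
    ("GGATCCTA", "ACCTGTAG"): "library_C",
}


def demultiplex(reads, sample_sheet):
    """Group read IDs by library using an exact match on the (i7, i5) pair.

    `reads` is an iterable of (read_id, i7_read, i5_read) tuples. Reads whose
    index pair is not in the sample sheet go to "undetermined" rather than
    being forced into a library.
    """
    bins = defaultdict(list)
    for read_id, i7, i5 in reads:
        library = sample_sheet.get((i7, i5), "undetermined")
        bins[library].append(read_id)
    return dict(bins)


# Hypothetical index reads for four sequenced molecules.
reads = [
    ("read1", "AAGCTTGC", "GTACCAGT"),
    ("read2", "CTGATCGA", "TCAGGTCA"),
    ("read3", "AAGCTTGC", "GTACCAGT"),
    ("read4", "AAGCTTGC", "TTTTTTTT"),  # i5 read matches no library
]
print(demultiplex(reads, SAMPLE_SHEET))
# {'library_A': ['read1', 'read3'], 'library_B': ['read2'], 'undetermined': ['read4']}
```

Sending unmatched reads to a separate bin is a deliberate choice. It keeps ambiguous reads out of every library instead of guessing.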
UDIs are Illumina’s recommended indexing strategy, particularly for sequencing on its patterned flow cell instruments, such as the NovaSeq™ 6000 [2].
- UDIs can minimize the effects of index hopping. In this phenomenon, an index sequence from one library ends up on a molecule from another library, so the resulting read is assigned to the wrong library.
- Index hopping is elevated on patterned flow cells and can affect up to 2% of total reads [2]. On a high-output run, this can mean millions of reads assigned to the wrong library, leading to incorrect data interpretation and conclusions.
- UDIs are recommended to mitigate this effect because a hopped read carries an i7/i5 combination that was never assigned to any library, so it can be filtered out. This is not possible with single-index or combinatorial dual-index strategies. With those designs, a hopped read can carry an index, or index pair, that belongs to another library.
- In addition, UDIs allow for a greater number of libraries to be multiplexed and sequenced together than other formats, increasing the overall sequencing throughput and efficiency.
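The filtering argument above can be made concrete with a small simulation. The sketch below uses hypothetical index sequences and a deliberately tiny two-by-two index set. It swaps the i5 index of one read for another library's i5 and checks where the read ends up under each indexing scheme. It also works through the arithmetic behind "millions of reads" for an assumed run size of one billion reads.

```python
# Hypothetical index oligos; any distinct sequences would do.
I7 = ["AAGCTTGC", "CTGATCGA"]
I5 = ["GTACCAGT", "TCAGGTCA"]


def combinatorial_sheet(i7_list, i5_list):
    """Combinatorial dual indexing: every i7 is paired with every i5,
    so each individual index is shared by several libraries."""
    pairs = [(i7, i5) for i7 in i7_list for i5 in i5_list]
    return {pair: f"lib{k}" for k, pair in enumerate(pairs)}


def udi_sheet(i7_list, i5_list):
    """Unique dual indexing: the k-th i7 is used only with the k-th i5."""
    return {pair: f"lib{k}" for k, pair in enumerate(zip(i7_list, i5_list))}


def assign_hopped_read(sheet, true_pair, foreign_i5):
    """Model a read whose i5 index was swapped for another library's i5.

    Returns the library the read would be assigned to, or "undetermined"
    if the resulting i7/i5 combination is not in the sample sheet.
    """
    i7, _ = true_pair
    return sheet.get((i7, foreign_i5), "undetermined")


def expected_hopped_reads(total_reads, hop_rate):
    """Back-of-the-envelope count of reads affected at a given hopping rate."""
    return round(total_reads * hop_rate)


combo = combinatorial_sheet(I7, I5)  # lib0..lib3
udi = udi_sheet(I7, I5)              # lib0..lib1

# A molecule from lib0 (I7[0], I5[0]) ends up carrying I5[1] instead.
print(assign_hopped_read(combo, (I7[0], I5[0]), I5[1]))  # lib1: silently misassigned
print(assign_hopped_read(udi, (I7[0], I5[0]), I5[1]))    # undetermined: filtered out

# Hypothetical run of 1 billion reads at the 2% upper bound cited above.
print(expected_hopped_reads(1_000_000_000, 0.02))  # 20000000
```

Under the combinatorial scheme, the hopped pair happens to be another library's valid pair, so the error is invisible. Under UDIs, the same event produces a combination that can be recognized and discarded.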
What Are Unique Molecular Identifiers (UMIs)?
Another type of barcode is the Unique Molecular Identifier (UMI). Despite the similar acronyms, UMIs serve different functions than UDIs.
- UMIs are nucleotide sequences that are incorporated into all starting molecules of a sample during library preparation, prior to PCR.
- Each starting molecule receives its own sequence, which is ideally unique within the sample.
- After PCR, all copied molecules will have the same UMI as the template molecule.
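The three points above can be simulated directly. This minimal sketch assumes fully random 10-base UMIs and a perfect, error-free PCR. Both are simplifications chosen for illustration. The simulation shows that amplification multiplies reads but not UMIs. It also computes how likely two starting molecules are to draw the same random UMI, which is why uniqueness is a design goal rather than a guarantee.

```python
import random

BASES = "ACGT"


def random_umi(length, rng):
    """Draw one fully random UMI of `length` bases (a hypothetical UMI design)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def tag_molecules(n_molecules, umi_length, rng):
    """Attach one UMI to each starting molecule before any amplification."""
    return [(f"mol{i}", random_umi(umi_length, rng)) for i in range(n_molecules)]


def idealized_pcr(molecules, cycles):
    """Perfect PCR: each cycle doubles the pool, and every copy keeps
    the UMI of the molecule it was copied from."""
    pool = list(molecules)
    for _ in range(cycles):
        pool = pool + pool
    return pool


def collision_probability(n_molecules, umi_length):
    """Chance that at least two of n molecules draw the same random UMI
    (the birthday problem over 4**umi_length possible sequences)."""
    n_possible = 4 ** umi_length
    p_all_distinct = 1.0
    for k in range(n_molecules):
        p_all_distinct *= 1 - k / n_possible
    return 1 - p_all_distinct


rng = random.Random(0)
tagged = tag_molecules(n_molecules=50, umi_length=10, rng=rng)
amplified = idealized_pcr(tagged, cycles=5)
print(len(amplified))                      # 1600 reads (50 molecules x 2**5)
print(len({umi for _, umi in amplified}))  # distinct UMIs: at most 50
print(collision_probability(50, 10))       # small, but not zero
```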
UMIs help distinguish true variants from false variants caused by errors introduced during library preparation or sequencing. A true variant is present in every read that shares a UMI, while a PCR or sequencing error typically appears in only some of those reads. UMIs also improve deduplication, a bioinformatics process that identifies PCR duplicates to determine the original number of starting molecules (Figure 2). Reads that share both an alignment position and a UMI are copies of the same starting molecule, whereas reads at the same position with different UMIs come from distinct molecules. More accurate deduplication in turn improves error correction and gene quantification accuracy, especially for libraries prepared from low input material [3]. Zymo-Seq SwitchFree™ 3’ mRNA Library Kits build UMIs into the reverse transcription step, so users can take advantage of them immediately, if desired, without any additional purchase.
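The deduplication and error-correction logic described above can be sketched as follows. This is a simplified illustration. Each read is reduced to a single alignment position, a UMI, and an equal-length sequence, and UMIs must match exactly. Established tools such as UMI-tools also account for sequencing errors within the UMI itself.

The example contains two molecules that map to the same position but carry different UMIs, so position-only deduplication would undercount them. It then applies a per-base majority vote that removes an error seen in one copy while keeping a variant shared by every copy.

```python
from collections import Counter, defaultdict


def group_by_position_and_umi(reads):
    """Group aligned reads into UMI families.

    Each read is (position, umi, sequence). Reads that share both the
    alignment position and the UMI are treated as PCR duplicates of one
    starting molecule. Only exact UMI matches are grouped here.
    """
    families = defaultdict(list)
    for position, umi, sequence in reads:
        families[(position, umi)].append(sequence)
    return dict(families)


def consensus(sequences):
    """Per-base majority vote across equal-length reads in one UMI family."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


# Hypothetical aligned reads: (position, UMI, read sequence).
reads = [
    (100, "AACGT", "ACGTACGT"),
    (100, "AACGT", "ACGTACGT"),
    (100, "AACGT", "ACGAACGT"),  # PCR/sequencing error at base 4 in one copy
    (100, "GGTCA", "ACGTTCGT"),  # true variant at base 5, present in every copy
    (100, "GGTCA", "ACGTTCGT"),
    (250, "AACGT", "TTGCAATG"),
]

families = group_by_position_and_umi(reads)
print(len({position for position, _, _ in reads}))  # 2 molecules if deduplicated by position only
print(len(families))                                # 3 molecules with UMIs
for (position, umi), sequences in families.items():
    print(position, umi, len(sequences), consensus(sequences))
# 100 AACGT 3 ACGTACGT
# 100 GGTCA 2 ACGTTCGT
# 250 AACGT 1 TTGCAATG
```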
In short, UDIs distinguish molecules between libraries, whereas UMIs distinguish molecules within a library. UDIs and UMIs can be used together to combine their benefits. The Zymo-Seq SwitchFree™ 3’ mRNA Library Kits include both UDIs and UMIs, so users can take advantage of both in their library preparations and sequencing runs.
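The division of labor in that summary can be expressed as a two-level lookup. The index pair decides the library, and the position plus UMI decide the molecule within that library. This is a hypothetical sketch with made-up sequences and exact matching only. It illustrates why the same UMI at the same position in two different libraries still counts as two separate molecules.

```python
from collections import defaultdict

# Hypothetical UDI sample sheet.
SAMPLE_SHEET = {
    ("AAGCTTGC", "GTACCAGT"): "library_A",
    ("CTGATCGA", "TCAGGTCA"): "library_B",
}


def count_molecules_per_library(reads, sample_sheet):
    """UDIs pick the library; (position, UMI) picks the molecule within it.

    Each read is (i7, i5, position, umi). Reads with an i7/i5 pair that is
    not in the sample sheet (for example, after index hopping) are skipped.
    """
    molecules = defaultdict(set)
    for i7, i5, position, umi in reads:
        library = sample_sheet.get((i7, i5))
        if library is None:
            continue
        molecules[library].add((position, umi))
    return {library: len(keys) for library, keys in molecules.items()}


reads = [
    ("AAGCTTGC", "GTACCAGT", 100, "AACGT"),
    ("AAGCTTGC", "GTACCAGT", 100, "AACGT"),  # PCR duplicate in library_A
    ("AAGCTTGC", "GTACCAGT", 100, "GGTCA"),  # second molecule in library_A
    ("CTGATCGA", "TCAGGTCA", 100, "AACGT"),  # same position and UMI, other library
    ("AAGCTTGC", "TCAGGTCA", 100, "TTGCA"),  # unexpected i7/i5 pair: dropped
]
print(count_molecules_per_library(reads, SAMPLE_SHEET))
# {'library_A': 2, 'library_B': 1}
```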
References:
1. MacConaill, L. E., Burns, R. T., Nag, A., Coleman, H. A., Slevin, M. K., Giorda, K., Light, M., Lai, K., Jarosz, M., McNeill, M. S., Ducar, M. D., Meyerson, M., & Thorner, A. R. (2018). Unique, dual-indexed sequencing adapters with UMIs effectively eliminate index cross-talk and significantly improve sensitivity of massively parallel sequencing. BMC Genomics, 19(1), 30. https://doi.org/10.1186/s12864-017-4428-5
2. Illumina. (2018). Effects of Index Misassignment on Multiplexing and Downstream Analysis [White paper]. Illumina, Inc. https://www.illumina.com/content/dam/illumina-marketing/documents/products/whitepapers/index-hopping-white-paper-770-2017-004.pdf?linkId=36607862
3. Fu, Y., Wu, P. H., Beane, T., Zamore, P. D., & Weng, Z. (2018). Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers. BMC Genomics, 19(1), 531. https://doi.org/10.1186/s12864-018-4933-1