In Focus
Looking beyond the exons
The importance of non-coding genetic variants in the discovery and understanding of blood group alleles
Blood group genotyping now plays a major role in the modern blood banking laboratory.
Developing and maintaining an allele nomenclature system that accommodates blood group diversity while remaining consistent across the blood group systems is an important part of the work of the ISBT Red Cell Immunogenetics and Blood Group Terminology (RCIBGT) Working Party. In this effort, reference sequences from the human reference genome (GRCh37 or GRCh38) or RefSeq sequences maintained in National Center for Biotechnology Information (NCBI) databases are used in read alignment and variant calling. These reference sequences are key in sequencing projects, whether Sanger or next-generation sequencing (NGS), as they allow the detection of nucleotide variants, indels or copy number changes in test samples.
When analysing variants, it is important to note which blood group antigen polymorphism a reference sequence represents. The blood group phenotype for a given reference sequence, including the high-prevalence antigens, is often included on the introductory page of the blood group allele tables [1]. For example, the reference allele for the ACKR1 gene encoding the Duffy blood group is FY*01 (FY*A), defined by the nucleotide c.125G in exon 2. Therefore, a sample with the c.125A variant (rs12075G>A) has the FY*02 (FY*B) allele. The ISBT nomenclature used in the tables follows Human Genome Variation Society (HGVS) nomenclature (e.g. c.125G), which often requires conversion to locate the corresponding genomic coordinate on the human genome reference sequence (e.g. GRCh38). The RefSNP (rs) number allows us to track a variant between different reference sequences and nomenclature systems. This capability is one of the features of the upcoming RCIBGT blood group allele database.
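The Duffy example and the coordinate conversion can be illustrated with a short Python sketch. It is a toy under stated assumptions, not the RCIBGT database logic: the function names are hypothetical, the genomic coordinates are invented placeholders rather than real GRCh38 positions, and the conversion assumes a forward-strand gene with no intron offsets or UTR positions.

```python
# Toy lookup keyed on the single variant named in the text (c.125G>A, rs12075).
FY_C125_ALLELES = {
    "G": "FY*01 (FY*A)",  # reference nucleotide at c.125
    "A": "FY*02 (FY*B)",  # variant nucleotide at c.125
}


def call_fy_allele(base_at_c125):
    """Return the ISBT allele name implied by the base observed at ACKR1 c.125."""
    base = base_at_c125.upper()
    if base not in FY_C125_ALLELES:
        raise ValueError(f"Unexpected base at c.125: {base_at_c125!r}")
    return FY_C125_ALLELES[base]


def coding_to_genomic(c_pos, cds_blocks):
    """Map a 1-based coding position (HGVS c.) to a genomic coordinate.

    cds_blocks lists the coding part of each exon as (start, end) genomic
    coordinates, 1-based and inclusive, in transcript order. Assumes the gene
    lies on the forward strand.
    """
    if c_pos < 1:
        raise ValueError("Only positions within the coding sequence are handled")
    remaining = c_pos
    for start, end in cds_blocks:
        block_length = end - start + 1
        if remaining <= block_length:
            return start + remaining - 1
        remaining -= block_length
    raise ValueError(f"c.{c_pos} lies beyond the supplied coding blocks")


# Invented placeholder coordinates, NOT real ACKR1 positions:
# 21 coding bases in the first exon, the rest in the second.
HYPOTHETICAL_CDS_BLOCKS = [(1000, 1020), (1500, 2500)]

print(call_fy_allele("A"))
print(coding_to_genomic(125, HYPOTHETICAL_CDS_BLOCKS))
```

In practice the exon coordinates would come from a transcript annotation for the chosen reference build, and the rs number would serve as the stable key linking the c. position to each build's genomic coordinate.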
Exonic single nucleotide variants (SNVs) are well known to underlie many of our blood group antigens; however, SNVs in the splice site regions are also a common molecular mechanism generating blood group diversity. Adding to this are structural variants, which include large gene rearrangements resulting in hybrid alleles (e.g. in the ABO, MNS and Rh blood group systems). Moreover, deep intronic variants and nucleotide variants in the proximal promoter or other regions can disrupt regulatory motifs and result in altered gene expression. These variants demonstrate that non-coding regions are just as important as the exons. Developments in gene-specific long-range PCRs and long-read sequencing techniques for blood group genes now permit the detection of structural variants [2, 3] and the characterisation of allele-specific variants in the non-coding regions.
Allele-specific promoter and intronic variants have been identified in several blood group genes, including ABO, RHD, FY and ICAM4 [4-7]. These findings may prove useful for distinguishing which non-coding variants are associated with common blood group phenotypes. The significance of intronic regions is highlighted in recent publications reporting the genetic basis of blood group phenotypes that arise from variation in gene regulatory mechanisms [8-10]. The identification of allele-specific non-coding variants is valuable to future gene regulatory studies, as it provides candidate variants with the potential either to disrupt an existing regulatory motif or, as gain-of-function variants, to introduce new motifs [11].
NGS and, more recently, long-range sequencing have an increasing role in the characterisation of our blood group genes. The information generated demands sophisticated bioinformatics skills and provides an opportunity for exploration by the growing group of young professionals in the field, as seen in recent publications [3, 12-14]. In line with the increasing use of big sequencing data, a reference whole-gene sequence representing each major blood group allele is needed more than ever. Moreover, shifting the comparison between alleles and the variants established on these references from manual to computer-facilitated retrieval is essential, and is accommodated in the newly developed RCIBGT database.
When building a reference for each major blood group allele, it is important to keep in mind the differences between ethnic groups. Take the example of the CR1 Helgeson phenotype, where the variant associated with the DACY/YCAD phenotype is in tight linkage in Caucasians but not in Africans [9]. It is not feasible to fit all populations under one reference build; therefore, a population-specific consensus has been proposed to resolve the issue [11]. An added value of a population consensus reference for each major blood group allele is that it would allow identification of evolutionarily conserved regions. A conserved region does not in itself necessarily translate to a functional DNA element, so it should be evaluated alongside biochemical signatures (e.g. histone markers and expression data) [15]. However, when dealing with aberrant blood group phenotypes, it would be wise to start by investigating variation within the conserved regions rather than at positions that tolerate random variation.
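As a rough illustration of the population consensus idea, the Python sketch below derives a per-population consensus from pre-aligned sequences by simple majority vote and then lists the positions on which all populations agree. The majority-vote rule, the function names and the four-base toy sequences are assumptions made for illustration; they are not the method proposed in [11], and agreement between consensus sequences is only a crude stand-in for evolutionary conservation.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Majority-vote consensus of equal-length, pre-aligned sequences.

    Positions with a tie for the most common base are reported as 'N'.
    """
    if not aligned_seqs:
        raise ValueError("At least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("Sequences must be aligned to the same length")

    bases = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        top_count = max(counts.values())
        winners = [base for base, n in counts.items() if n == top_count]
        bases.append(winners[0] if len(winners) == 1 else "N")
    return "".join(bases)


def shared_positions(consensus_by_population):
    """Return 0-based positions where every population consensus agrees."""
    sequences = list(consensus_by_population.values())
    return [
        i
        for i, column in enumerate(zip(*sequences))
        if len(set(column)) == 1 and column[0] != "N"
    ]


# Toy data only: two hypothetical populations, three haplotypes each.
toy_haplotypes = {
    "population_1": ["ACGT", "ACGT", "ACTT"],
    "population_2": ["ACTT", "ACTT", "ACGT"],
}
toy_consensus = {
    name: consensus(seqs) for name, seqs in toy_haplotypes.items()
}
print(toy_consensus)
print(shared_positions(toy_consensus))
```

In this toy case the two populations differ at the third base, so a single reference would misrepresent one of them. The positions shared by both would be the first candidates to examine in a sample with an aberrant phenotype.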
To conclude, while reference sequences are updated by NCBI, developments in long-read and long-range techniques are enabling further improvements to blood group reference sequences by characterising allele-specific variants. We can now traverse in either direction between genotype and phenotype. This will prove valuable in the future discovery of new blood group alleles and also in our understanding of blood group gene regulation.
References
1. International Society of Blood Transfusion Red Cell Immunogenetics and Blood Group Terminology. Blood Group Allele Tables. [cited 02/04/2024].
2. Gueuning M, Thun GA, Trost N, Schneider L, Sigurdardottir S, Engström C, et al. Resolving Genotype-Phenotype Discrepancies of the Kidd Blood Group System Using Long-Read Nanopore Sequencing. Biomedicines. 2024;12(1).
3. Zhang Z, An HH, Vege S, Hu T, Zhang S, Mosbruger T, et al. Accurate long-read sequencing allows assembly of the duplicated RHD and RHCE genes harboring variants relevant to blood transfusion. Am J Hum Genet. 2022;109(1):180-91.
4. Fichou Y, Berlivet I, Richard G, Tournamille C, Castilho L, Férec C. Defining Blood Group Gene Reference Alleles by Long-Read Sequencing: Proof of Concept in the ACKR1 Gene Encoding the Duffy Antigens. Transfus Med Hemother. 2020;47(1):23-32.
5. Gueuning M, Thun GA, Wittig M, Galati A-L, Meyer S, Trost N, et al. Haplotype sequence collection of ABO blood group alleles by long-read sequencing reveals putative A1-diagnostic variants. Blood Advances. 2023;7(6):878-92.
6. Srivastava K, Almarry NS, Flegel WA. Genetic variation of the whole ICAM4 gene in Caucasians and African Americans. Transfusion. 2014;54(9):2315-24.
7. Tounsi WA, Madgett TE, Avent ND. Complete RHD next-generation sequencing: establishment of reference RHD alleles. Blood Advances. 2018;2(20):2713-23.
8. Thun GA, Gueuning M, Sigurdardottir S, Meyer E, Gourri E, Schneider L, et al. Novel regulatory variant in ABO intronic RUNX1 binding site inducing A3 phenotype. Vox Sanguinis. 2024;119(4):377-82.
9. Wu PC, Lee YQ, Möller M, Storry JR, Olsson ML. Elucidation of the low-expressing erythroid CR1 phenotype by bioinformatic mining of the GATA1-driven blood-group regulome. Nature Communications. 2023;14(1):5001.
10. Westman JS, Stenfelt L, Vidovic K, Möller M, Hellberg Å, Kjellström S, et al. Allele-selective RUNX1 binding regulates P1 blood group status by transcriptional control of A4GALT. Blood. 2018;131(14):1611-16.
11. Ballouz S, Dobin A, Gillis JA. Is it time to change the reference genome? Genome Biology. 2019;20(1):159.
12. Gleadall NS, Veldhuisen B, Gollub J, Butterworth AS, Ord J, Penkett CJ, et al. Development and validation of a universal blood donor genotyping platform: a multinational prospective study. Blood Advances. 2020;4(15):3495-506.
13. McGowan EC, O'Brien H, Sarri ME, Lopez GH, Daly JJ, Flower RL, et al. Feasibility for non-invasive prenatal fetal blood group and platelet genotyping by massively parallel sequencing: A single test system for multiple atypical red cell, platelet and quality control markers. British Journal of Haematology. 2024;204(2):694-705.
14. Steiert TA, Fuß J, Juzenas S, Wittig M, Hoeppner MP, Vollstedt M, et al. High-throughput method for the hybridisation-based targeted enrichment of long genomic fragments for PacBio third-generation sequencing. NAR Genomics and Bioinformatics. 2022;4(3).
15. Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov GK, et al. Defining functional DNA elements in the human genome. Proceedings of the National Academy of Sciences. 2014;111(17):6131-38.