New sequence variants detected at DXS10148, DXS10074 and DXS10134 loci

A large amount of population and forensic genetic data is available for X-STRs, supporting the need for a common and accurate nomenclature among laboratories that allows better communication, data exchange, and data comparison. DXS10148, DXS10074 and DXS10134 are commonly used X-STRs, particularly due to their inclusion in the commercial kit Investigator Argus X-12 (Qiagen). Samples from West Africa and Iraq were sequenced for all three X-STRs, allowing the detection of new DNA sequence variants. At DXS10148, variation was detected in the flanking region, four bases downstream from the repeat motif. The sequence AAGG-AAAG has been detected for the first time in the present work as a varying (AAGG-AAAG)1–3 motif. One additional copy compared with the common (AAGG-AAAG)2 adds eight bases to the fragment size of this tetranucleotide STR. This means that two repeats are added to the fragment size of the allele in these cases, while the presence of only one copy reduces the expected allele size by two repeats. At DXS10074, two varying stretches consisting of AC and AG dinucleotide repeats, which also influence the expected allele size, were observed in the upstream flanking region, six bases from the main repeat core. DXS10134 revealed a simpler allele structure in the Guinea-Bissau sample set when compared to the previously described allele nomenclature. This newly detected hidden variation also has an impact on the actual allele nomenclature at this locus, as it contributes to a new class of short alleles so far undetected in other studies. © 2015 Elsevier Ireland Ltd. All rights reserved.


Introduction
X-chromosomal STRs (X-STRs) have been broadly used over the last decade in forensic and population genetics, and a large body of worldwide population data has been produced. From in-house to commercially developed kits (e.g., [1][2][3]), a wide range of highly polymorphic X chromosome markers is now available for forensic identification and, in particular, for specific kinship analysis settings. The large number of studies focusing on X-STRs supports the need for a common and accurate nomenclature among laboratories that allows better communication, data exchange, and data comparison. This has long been emphasized by several international DNA working groups (e.g., [4][5][6][7]), as well as by many other studies focusing on STRs (e.g., [8][9][10][11]); two of them specifically concentrated on nomenclature issues regarding X-STRs [8,9].
In this study, the comparison of sequence data from different populations revealed new sequence variations in the flanking regions of the allele repeat structures at the DXS10148 and DXS10074 loci, and in the main repeat motif of DXS10134, which are presented here. We also discuss the implications of these findings for the originally proposed allele nomenclatures.

X-STR profiling and sample selection for sequencing
Samples were obtained under informed consent from West African countries (with the bulk of data originating from Guinea-Bissau) and from Iraq. Twelve X-STRs were genotyped using the Investigator Argus X-12 kit (Qiagen, Hilden, Germany), following the recommended protocol in the kit's manual [3]. During X-STR profiling a number of silent, rare or new alleles were detected. Consequently, traditional Sanger sequencing was performed to either identify or confirm the genotypes. In addition to the population samples, reference cell line DNA 9948 [12] was also sequenced for all three loci.

Sequencing of DXS10148, DXS10074 and DXS10134 alleles
Primers for singleplex amplification and sequencing reactions were either selected from the literature or newly designed using the online software Primer3 [13] (Table S1). A touchdown PCR (TD-PCR) protocol was applied [14] to avoid unspecific amplification products, observed initially by electrophoresis on a 1.5% agarose gel and ethidium bromide staining. All amplified products were purified with the PCR product cleanup ExoSAP-IT (USB Corporation) following the manufacturer's conditions. Forward and reverse sequencing reactions were performed using the BigDye Terminator v1.1 (Life Technologies) following the recommended protocol. Final sequenced products were purified with a Sephadex in-house filtration column protocol and detected in an ABI 3130 Genetic Analyser capillary electrophoresis system. Results were analysed with the Sequencing analysis software v.5.2 (Life Technologies).
During sequencing of DXS10148 in samples with known genotypes, such as control DNA 9948 (allele 23) and two rare alleles from West Africa (alleles 29 and 32), a minus-two-repeat difference was detected when comparing the sequence data to the genotypes (Table 1). For reference DNA 9948, sequencing revealed the allele structure (GGAA)4-(AAGA)11-(AAAG)4-N8-(AAGG)2 which, according to the proposed nomenclature, corresponds to an allele 21 and not an allele 23 (Table 1). The same was detected in the sequences of the two West African samples: the alleles genotyped as 29 and 32 revealed DNA repeat sequences for alleles 27 and 30, respectively (Table 1). When analysing the flanking regions of the main repeat STR core, an additional eight bases (AAGG-AAAG) were identified in all three samples, at the fourth base downstream from the repeat motif (Table 1 and Fig. S1), when compared to the sequence in Hundertmark et al. [15]. Since DXS10148 is a tetranucleotide STR, these eight additional bases correspond to two repeats added to the fragment size of these alleles [samples with (AAGG-AAAG)3 instead of (AAGG-AAAG)2]. In an Algerian population study [16] it seems that in some alleles (e.g., 14, 23, 28 and 29) the sequenced fragment also does not correspond to the genotyped allele. However, without direct sequence comparisons it is not clear whether the same variation is responsible for this difference.
Additional sequencing data generated and analysed for DXS10148 from two samples from Iraq (allele 25.2) revealed the allele repeat motifs that were expected based on the genotypes obtained by comparison with the Argus X-12 allelic ladder (Table 1). In the Iraqi samples the motif (AAGG-AAAG) was found in duplicate (Table 1, Fig. S2) and not in triplicate as in the West African samples and therefore no differences were detected between fragment sizes and sequenced alleles.
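The arithmetic behind this shift can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not part of the original analysis: the function names are hypothetical, and the sketch assumes that each AAGG-AAAG copy contributes eight bases and that two copies is the baseline matching the allelic ladder.

```python
MOTIF_BASES = 8        # length of one AAGG-AAAG copy
REPEAT_BASES = 4       # DXS10148 is typed as a tetranucleotide STR
REFERENCE_COPIES = 2   # assumed baseline: copies in the common structure


def sequence_allele(block_counts):
    """Sum the repeat blocks that count towards the allele designation."""
    return sum(block_counts)


def size_based_allele(seq_allele, motif_copies):
    """Allele expected from fragment sizing for a given sequence-based allele.

    seq_allele may be an int (21) or a string with a partial repeat ("25.2").
    """
    whole, _, partial = str(seq_allele).partition(".")
    # Each extra (or missing) copy shifts the fragment by 8 bases = 2 repeats.
    shift = (motif_copies - REFERENCE_COPIES) * MOTIF_BASES // REPEAT_BASES
    shifted = int(whole) + shift
    return f"{shifted}.{partial}" if partial else str(shifted)


# Control DNA 9948: (GGAA)4-(AAGA)11-(AAAG)4-N8-(AAGG)2 with three motif copies
allele_9948 = sequence_allele([4, 11, 4, 2])
print(allele_9948, "->", size_based_allele(allele_9948, 3))
# Iraqi samples: two motif copies, so sizing and sequencing agree
print("25.2 ->", size_based_allele("25.2", 2))
```

With three copies the sequence-based alleles 21, 27 and 30 are sized as 23, 29 and 32, while with two copies the designation is unchanged.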
A subset of the West African samples from Guinea-Bissau was sequenced to establish the genetic basis of the detected null (or silent) alleles, hence without previously known genotypes. Sixty-two of the 63 samples sequenced from Guinea-Bissau displayed two copies of the sequence motif described above (as an example, see alleles 39.1 and 41.1 in Table 1). In one sample a single copy of the AAGG-AAAG motif was detected, adding variation to this site (allele 41.1 in Table 1; Fig. S3).
The null alleles sequenced at DXS10148 displayed the same mutation in all cases: a G→A transition at the 9th nucleotide counting from the beginning of the repeat. This position corresponds to the 2nd base of the 3′ end of the forward primer sequence published in Hundertmark et al. [15]. This mutation, which also changes the structure of the repeat, is most likely responsible for the silent alleles found at DXS10148 (see alleles 37.1, 38.1, 39.1 and 41.1 as examples, Table 1). The detection of null alleles at this locus, as well as the single-base mutation observed here, has also been described previously in other studies (e.g., [17]).
The results obtained in this study allowed the identification of the variable motif AAGG-AAAG adjacent to the core repeat region. This finding adds further variation to the relevant repeat region, with an important impact on the allele nomenclature as the allele designation based on fragment sizes does not always match the sequence-based allele structure. This is important in studies where a high frequency of silent alleles is present at this locus, mostly seen in African populations [16][17][18][19], and requires sequencing to identify the genotype. For example, in the case of the West African samples, the two null alleles were sequenced as 37.1 and 38.1 and therefore the genotypes of these samples would be considered as such. However, if the genotypes were obtained by fragment length sizing (if no drop-out had occurred) the corresponding alleles would have been 39.1 and 40.1, respectively. This is because eight additional bases are present in these samples when compared to the reference structure [15], causing a shift of plus 2 repeats (Table 1). The major impact of this is the deviation in the allele frequency distribution that can be introduced in these populations, and consequently during the genetic distance comparisons with other population groups. Furthermore, when analysing the repeat motif sequence of DXS10148, the last AAGG of the repeat (considering the reference nomenclature in Hundertmark et al. [15]) followed by the first AAAG of the flanking sequence should also be included together with the detected variant as it is the same one and considered for the allele designation (Table 1). This means that an additional AAGG-AAAG should be added to the already detected one in this study and become (AAGG-AAAG)2–4.
Table 1. Sequence structure variation observed for DXS10148: reference repeat structure described by Hundertmark et al. [15] and allele structure variation observed in this study.

New sequence variant at DXS10074
DXS10074 has been defined with the simple repeat structure (AAGA)n in the Argus X-12 manual. The given reference allele is an allele 14 displaying fourteen straight repeats as (AAGA)14. In the study by Hering et al. [20] two types of allele structures have been reported for DXS10074. In this latter work, short alleles are defined by containing the repeat block (AAGA)n in which n varies between 7 and 10 as (ACAC)2-(AGAG)4-AA-AAAG-(AAGA)7–10-AAGG-AAGA (sequences represented in italic are not included for allele designation). Longer alleles have been characterized by an additional 12 bases underlined in the following structure: (ACAC)2-(AGAG)4-AA-AAAG-(AAGA)10–18-AAGG-(AAGA)2-AAGG-AAGA. Due to this insertion the authors proposed that in these types the allele count should be n + 3. According to this repeat motif definition [20], an allele 14 has the following structure: (AAGA)11-AAGG-(AAGA)2, which is different from the simple (AAGA)14 reported in the Argus X-12 kit [3]. Either a different motif is being considered or these are just two different structures for the same allele.
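As an illustration, the following Python sketch builds the two candidate structures for allele 14 and compares them. It assumes that the allele designation is simply the length of the counted core divided by four, which is consistent with the n + 3 rule for long alleles; the helper names are hypothetical and not taken from [20] or the kit manual.

```python
def long_core(n):
    """Counted core of a long allele as described in [20]: (AAGA)n-AAGG-(AAGA)2."""
    return "AAGA" * n + "AAGG" + "AAGA" * 2


def simple_core(n):
    """Uninterrupted core as given in the kit manual: (AAGA)n."""
    return "AAGA" * n


def allele_from_core(core):
    """Allele designation as the number of tetranucleotide units (assumption)."""
    if len(core) % 4:
        raise ValueError("core length is not a multiple of four bases")
    return len(core) // 4


hering_14 = long_core(11)   # n = 11, so n + 3 = 14
argus_14 = simple_core(14)

print(allele_from_core(hering_14), allele_from_core(argus_14))
print("same sequence:", hering_14 == argus_14)
```

Both cores have the same length and therefore the same designation, but their sequences differ, which is exactly the ambiguity noted above.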
Six samples from West Africa (Guinea-Bissau) were chosen for sequencing due to the detection of new (previously not described) alleles at this locus (Table 2). Interestingly, all of these alleles were non-consensus types ranging from 11.3 to 15.3 (except for 13.3 which was not observed) as well as a 13.2 allele. Results revealed that the sequence variations responsible for the intermediate alleles (with exception of the 13.2 allele) were all detected in the main repeat motif AAGA (Table 2), but affecting different nucleotide positions, as either an A or a G has been lost, interrupting the core tandem repeat motif. On the other hand, the intermediate allele 13.2 revealed two additional nucleotides (AG) located outside of the core repeat in the upstream flanking region (Table 2). This region is composed of two stretches of dinucleotide repeats, AC and AG motifs, respectively, reported as tetranucleotides [(ACAC)2-(AGAG)4] [20]. We prefer to consider these as dinucleotide repeats as variation at the dinucleotide level has been detected in this study in the following way: (AC)3–4 and (AG)8–9 (Table 2). Three of the samples plus control DNA 9948 have an (AC)4-(AG)8 composition. This seems to also be the same for the 30 PCR amplicons sequenced by Hering et al. [20], as no variation was reported at this level. It appears that this is the most common composition of the dinucleotide stretches, at least in Europeans where more sequence data is available for comparison. For the additional samples, two samples had an (AC)3-(AG)9 composition, which means the loss of one AC dinucleotide is compensated by one additional AG, thus not changing the total amplicon size (Table 2). However, another allele displayed an (AC)4-(AG)9 combination where the additional AG is responsible for the intermediate allele 13.2 as already described above.
The variation detected in the two dinucleotide AC and AG stretches at DXS10074 has not been considered for the allele designation in the originally recommended nomenclature [20]. This variable DNA region also has an impact on the allele nomenclature, as in some cases intermediate x.2 alleles might be due to this variation, such as the 13.2 allele detected in this work. Several intermediate alleles of this type have already been detected in other studies for DXS10074, e.g., a 15.2 allele in a Polish population study [21], a 16.2 allele in Algeria and in the Ivory Coast [16,18], or 14.2 and 19.2 in US populations [19]. Variation could possibly be occurring in the dinucleotide repeats; however, as no sequence data is available for these intermediate alleles in the mentioned studies, this cannot be confirmed.
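How the dinucleotide stretches can produce an intermediate x.2 allele can be sketched as follows. The sketch treats (AC)4-(AG)8 as the baseline composition, as suggested by the samples described above, and converts any net change in flanking bases into a fragment-size-based designation. The function names and the core count of 13 repeats used in the example are hypothetical, chosen only to mirror the 13.2 case.

```python
REFERENCE_AC = 4   # baseline (AC) copies, the most common composition in the text
REFERENCE_AG = 8   # baseline (AG) copies


def flank_offset_bases(ac, ag):
    """Net change in fragment size (bases) caused by the dinucleotide stretches."""
    return 2 * ((ac - REFERENCE_AC) + (ag - REFERENCE_AG))


def sized_allele(core_repeats, ac, ag):
    """Designation expected from fragment sizing for a tetranucleotide core."""
    total_bases = 4 * core_repeats + flank_offset_bases(ac, ag)
    whole, extra = divmod(total_bases, 4)
    return f"{whole}.{extra}" if extra else str(whole)


for ac, ag in [(4, 8), (3, 9), (4, 9)]:
    # 13 core repeats is an assumed value for illustration only
    print(f"(AC){ac}-(AG){ag}:", sized_allele(13, ac, ag))
```

The (AC)3-(AG)9 composition leaves the designation unchanged because the lost AC is compensated by the extra AG, whereas (AC)4-(AG)9 adds two bases and yields an x.2 allele.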

New sequence variant at DXS10134
The X-chromosome STR DXS10134 presents a highly complex allele sequence structure [3,22], ending in ...-[(GAAA)4-AAA]1–2-(GAAA)n. In this proposed sequence, sixteen GAAA repeats located adjacent to the main repetitive core (represented in italic) contribute to the allele designation. These 16 invariable repeats are added to the variable 3′-located GAAA repeats.
Table 2. Sequence structure variation observed at DXS10074 detected in the Guinea-Bissau population.
To resolve doubts about the observed rare genotypes of two West African individuals (exhibiting alleles 35 and 43.2), sequencing was performed. Results for these samples showed the same type of allele structure as the one mentioned above and confirmed the respective genotypes (Table 3). Allele 43.2 showed two additional DNA sequence variations: one GAAA is missing in the adjacent sixteen GAAA repeat region and a total of three [(GAAA)4-AAA] structures are present instead of only one. In six West African male individuals from Guinea-Bissau, off-ladder alleles with very short fragment sizes were observed at DXS10134 that overlapped with the bin set of the adjacent smaller locus DXS7132 of the Argus X-12 kit (Fig. S4). The smallest allele in the Investigator Argus X-12 allelic ladder for DXS10134 has 28 repeats, and the newly detected alleles were estimated as alleles 15 and 16 based on the fragment sizes. To our knowledge these alleles have not been reported so far for this locus. To confirm these short fragment genotypes, all six samples were sequenced (Table 3). These alleles revealed a much simpler allele structure, (GAAA)3-GAGA-(GAAA)4-AA-(GAAA)-GAGA-(GAAA)15–16, compared to the highly complex one described above [3,22]. The allele sequence starts with the same structure but the subsequent tract [GAGA-(GACAGA)3-(GAAA)-GTAA-(GAAA)3-AAA-(GAAA)4-AAA] is missing in these samples (accounting for a loss of eight GAAA repeats when compared to the reference structure [22]). Thus, according to the suggested allele nomenclature, alleles 23 and 24 are assigned by sequencing data instead of alleles 15 and 16, respectively (Table 3). In addition, a new 39.1 allele, so far unreported, was found in the West African sample set from Guinea-Bissau (Table 3). This allele revealed the same complex sequence structure as the initially proposed one. Reference DNA 9948 was also sequenced for DXS10134 and also fits the reference allele nomenclature (Table 3).
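The sequence-based designation of the short alleles can be checked by counting the GAAA units in the structure notation. The Python sketch below is a minimal illustration under the assumption that every GAAA unit in the simplified structure counts towards the allele number; the parser and its name are hypothetical and only handle the bracket-free notation used in this section.

```python
import re

# One token is either "(UNIT)k", "(UNIT)" or a bare "UNIT".
TOKEN = re.compile(r"^\(?([ACGT]+)\)?(\d*)$")


def count_unit(structure, unit="GAAA"):
    """Count copies of `unit` in a structure such as '(GAAA)3-GAGA-(GAAA)4'."""
    total = 0
    for token in structure.split("-"):
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        motif, copies = match.group(1), int(match.group(2) or 1)
        if motif == unit:
            total += copies
    return total


short_15 = "(GAAA)3-GAGA-(GAAA)4-AA-(GAAA)-GAGA-(GAAA)15"
short_16 = "(GAAA)3-GAGA-(GAAA)4-AA-(GAAA)-GAGA-(GAAA)16"
missing_tract = "GAGA-(GACAGA)3-(GAAA)-GTAA-(GAAA)3-AAA-(GAAA)4-AAA"

print(count_unit(short_15), count_unit(short_16), count_unit(missing_tract))
```

The two short structures contain 23 and 24 GAAA units, and the missing tract accounts for eight, in agreement with the values given above.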
In the population samples from Guinea-Bissau a considerably simpler structure was detected for DXS10134, in much shorter amplicons than those observed in the allelic ladder of the Investigator Argus X-12 kit. Such short alleles have not been described elsewhere for DXS10134, not even in the few African groups that have been analysed previously for this system [16][17][18][19][23][24]. Possibly this structure is population specific and confined to the region of Guinea-Bissau, since we did not find these short alleles in other West African populations. However, much more sequence data from other populations would be needed to support this hypothesis.

Conclusions
Based on the data obtained in this work a more complex allele structure [(GGAA)4-(AAGA)X-(AAAG)Y-N8-AAGG-(AAGG-AAAG)Z] than previously described is present at DXS10148. The same was observed for DXS10074, which harbors two varying dinucleotide repeat stretches [(AC)X-(AG)Y-N6-(AAGA)Z] that were not included in the allele nomenclature.
Nevertheless, and despite the importance of these findings, according to the ISFG guidelines [4,6,7], if a previously established nomenclature of an STR is not in accordance with the ISFG recommendations but has been widely utilized, the nomenclature should not be altered to avoid unnecessary confusion. This is the case for DXS10148 and DXS10074 which, due to their inclusion in the only commercial X-STR kit available on the market, have already been widely used for population and forensic genetic applications. Recently, the manufacturer of this kit has critically revised the primers to improve the detection of previously observed null alleles, which are common in the African population. Also, reference has been made in the manual to raise awareness regarding the unusual repeat structure of DXS10148.
The present findings demonstrate the importance of screening individuals from different world-wide populations before developing proposals for new STR nomenclatures. The ISFG DNA commission also strongly recommends [4,7] the screening of diverse population groups as well as the comparison with the Pan troglodytes genome sequence when establishing a new STR nomenclature, since it is crucial to identifying regions that may vary, as demonstrated by other studies [8][9][10][11]. The chimpanzee genome has been completely sequenced and is accessible using sequence similarity search tools, e.g., the online programs BLAST (www.ncbi.nlm.nih.gov/blast/Blast.cgi) and BLAT (www.genome.ucsc.edu), thus replacing the need for reference DNA samples from this species.