MtDNA Profile of West Africa Guineans: Towards a Better Understanding of the Senegambia Region

The matrilineal genetic composition of 372 samples from the Republic of Guiné‐Bissau (West African coast) was studied using RFLPs and partial sequencing of the mtDNA control and coding region. The majority of the mtDNA lineages of Guineans (94%) belong to West African specific sub‐clusters of L0‐L3 haplogroups. A new L3 sub‐cluster (L3h) that is found in both eastern and western Africa is present at moderately low frequencies in Guinean populations. A non‐random distribution of haplogroup U5 in the Fula group, U6 among the “Brame” linguistic family and M1 in the Balanta‐Djola group suggests a correlation between the genetic and linguistic affiliation of Guinean populations. The presence of M1 in Balanta populations supports the earlier suggestion of their Sudanese origin. Haplogroups U5 and U6, on the other hand, were found to be restricted to populations that are thought to represent the descendants of a southern expansion of Berbers. Particular haplotypes, found almost exclusively in East‐African populations, were found in some ethnic groups with an oral tradition claiming Sudanese origin.


Introduction
Unveiling the history of human settlement on the West Coast of Africa is a complex task. That history is the result of a continuous, complex network of migrations, invasions and admixture of peoples of different origins. Fossil evidence suggests a modern human presence in NW Africa around 40000 years before present (YBP) (Alimen, 1987). A pre-Neolithic Capsian culture evolved later, either locally or through diffusion from the Near East (Camps-Faber, 1989). Around 9000 YBP, when the Sahara went through a period of maximum humidity, several Neolithic cultures flourished in the area, bringing together people of sub-Saharan and North African origin (Dutour et al. 1988). The domestication and spread of several African-specific plants probably started in the western Sahel after 4000 YBP. The first phase of largely east and southward oriented Bantu migrations, originating from the cen-territories occupied earlier by other ethnic groups. The origin of the Balantas is uncertain. Some see language affinities with the Sudanese, from whom they could have separated 2000 years ago with the first spread of Kushite migrations (Quintino, 1964). According to Stuhlmann (1910), the group derives from a Bantu branch, which separated in the Pleistocene near the Nile, following Hamitic invasions. The Bijagós inhabit the Archipelago of the same name and some scholars see strong cultural resemblances to Egyptians (Quintino, 1964), but others relate them to the Senegalese Djola. The latter are a rather heterogeneous group, and include the Beafada, who have an oral tradition of coming from Mali (Lopes, 1999). A mass arrival of Fula took place at the beginning of the 19th century. The origin of this ethnic group is unknown, but tradition relates them to Hyksos and Nubians. They show the typical phonetic "glottal catch" which characterizes the whole group.
Here we analyze the mtDNA lineages present in the major ethnic groups of Senegambia, covering a broad number of recognized groups underrepresented in previous studies (Graven et al. 1995; Watson et al. 1997; Rando et al. 1998), and compare them within the broader context of African mtDNA variability (Graven et al. 1995; Watson et al. 1997; Rando et al. 1998, 1999; Krings et al. 1999; Chen et al. 2000; Pereira et al. 2001; Brehm et al. 2002; Salas et al. 2002). Because mtDNA haplogroups show distinct geographic patterns in Africa, their frequency and diversity patterns in West Africa can be informative with respect to the origin of the different ethnic groups from Guiné-Bissau. The presence of Y chromosomes of Eurasian affiliation among populations from Cameroon at a high frequency, as reported recently (Cruciani et al. 2002), raises the intriguing question of back migrations from Eurasia to Africa, a possibility supported here by the presence of particular Eurasian mtDNA lineages among Guineans.

Sampling
A total of 372 blood samples were collected from unrelated Guinean males whose maternal ancestors were known to belong exclusively to a specific ethnic group. The samples were collected either in military camps, with the permission of the Guiné-Bissau Chairman of the Joint Chiefs of Staff, or in villages around the country, with the help of the Ministry of Health. Every participant gave his consent in an individual interview after a detailed explanation of the project. Sample sizes and origins (along with additional information) are specified in Tables 1 and 2. Because of the complex history of the major ethnic groups in Guiné-Bissau, not all of them follow a clear present-day settlement pattern (see Figure 1).

HVS-I and HVS-II Sequencing
The leukocyte fraction of whole blood was used for DNA extraction by standard methods, and the mtDNA hypervariable segment I (HVS-I) of the control region was amplified and sequenced. Sequencing products were separated on a MegaBACE 1000 automatic sequencer, following the manufacturer's specifications, and aligned using Wisconsin Package GCG Version 10.0. All sequences were read between nucleotide positions (nps) 16024 and 16400. Additional information regarding polymorphic sites 185, 186, 189, 195, 236, 297 and 322 in HVS-II was obtained by directly sequencing all samples that could not be unambiguously classified on the basis of HVS-I information alone.

RFLP Testing
In case of ambiguity in defining mtDNA haplogroups on the basis of the HVS-I haplotype, additional data was gathered from restriction fragment length polymorphisms (RFLPs) of diagnostic sites. All restriction digests were made according to the manufacturer's instructions (Fermentas and New England BioLabs). The following polymorphic restriction sites were screened: 322HaeIII,

Haplogroup characterization
The HVS-I sequence types were classified following the nomenclature of African and European mtDNA haplogroups (Quintana-Murci et al. 1999; Macaulay et al. 1999; Rando et al. 1999; Alves-Silva et al. 2000; Chen et al. 2000; Richards et al. 2000; Bandelt et al. 2001; Torroni et al. 1997, 2001; Mishmar et al. 2003; Salas et al. 2002). Here, and in what follows, the nucleotide position (np) number relative to the revised CRS (Anderson et al. 1981; Andrews et al. 1999) is used to designate haplotype-defining mutations. The character state change is specified only for transversions and insertions/deletions. Based on previous knowledge of complete African sequences, the paraphyletic clade L1 is split into two monophyletic units: L0, which captures the previously defined L1a and L1d lineages, and L1, which includes the L1b and L1c clades (Mishmar et al. 2003).
The sub-clades of L0a (pro L1a) and L1b are defined as in Salas et al. (2002).
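The notation convention described above (position only for transitions, position plus derived base for transversions) can be illustrated with a minimal sketch. The function names are hypothetical, the reference bases passed in the example calls are placeholders for illustration rather than values taken from this study, and insertions/deletions are deliberately not handled.

```python
# Minimal sketch of the mutation notation: transitions are written as the
# bare position, transversions carry the derived base as a suffix.
# Function names are hypothetical; indels are out of scope here.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True if the change is purine<->purine or pyrimidine<->pyrimidine."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and derived base are identical")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def format_mutation(position: int, ref: str, alt: str) -> str:
    """Return the label used for a substitution at the given position."""
    if is_transition(ref, alt):
        return str(position)           # transition: position only
    return f"{position}{alt.upper()}"  # transversion: position + derived base


# Example calls; the reference bases here are illustrative assumptions.
print(format_mutation(16293, "A", "G"))  # a transition
print(format_mutation(16114, "C", "A"))  # a transversion to A
```

Under this convention a label such as 16114A signals a transversion to A, whereas a bare 16293 signals a transition.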

Genetic Analysis and Population Comparisons
Median networks of HVS-I haplotypes (Bandelt et al. 1995) were drawn for each haplogroup separately, using the Network 3.1 program (Arne Röhl, www.fluxus-engineering.com/sharenet.htm). Haplogroup frequencies, molecular diversity indexes (FST) and genetic diversity (H; Nei, 1987) for haplotypes and haplogroups, and analysis of molecular variance (AMOVA), were calculated using Arlequin v2.0 (Schneider et al. 2000). Comparisons between populations were assessed by subjecting the (relative) frequency vectors of the haplogroups to a principal component analysis (PCA).
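As a rough illustration of the last step, the sketch below runs a PCA on a small matrix of haplogroup frequency vectors using a singular value decomposition. It is not the software used in the study; the population names and frequencies are invented for the example, and the function name `pca_frequencies` is hypothetical.

```python
# Minimal PCA sketch on haplogroup frequency vectors (rows = populations,
# columns = haplogroups). All data below are invented for illustration.
import numpy as np


def pca_frequencies(freqs: np.ndarray, n_components: int = 2):
    """Return population scores and the explained-variance fractions."""
    centered = freqs - freqs.mean(axis=0)           # center each haplogroup column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # coordinates on the PCs
    explained = (s ** 2) / np.sum(s ** 2)            # fraction of total variance
    return scores, explained[:n_components]


populations = ["pop_A", "pop_B", "pop_C", "pop_D"]   # hypothetical labels
freqs = np.array([
    [0.10, 0.30, 0.40, 0.20],
    [0.15, 0.25, 0.35, 0.25],
    [0.40, 0.10, 0.20, 0.30],
    [0.35, 0.15, 0.25, 0.25],
])

scores, explained = pca_frequencies(freqs)
for name, (pc1, pc2) in zip(populations, scores):
    print(f"{name}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
print("explained variance:", np.round(explained, 3))
```

Populations with similar haplogroup profiles end up close together on the first components, which is the basis for the population comparisons.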
A local database with more than 19000 individuals taken from literature and our unpublished data from worldwide populations was employed to search for exact matches of Guiné-Bissau haplotypes, ignoring length variation in the C stretch of the HVS-I.
Coalescence times were estimated by means of the ρ statistic, assuming that a transition within 16090-16365 corresponds to 20180 years (Forster et al. 1996).
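The calculation can be sketched as follows, assuming that ρ is taken as the average number of transitions separating each sampled sequence from the putative root haplotype, and using the calibration quoted above. The function names and the haplotype counts in the example are hypothetical, and no standard error is computed.

```python
# Minimal sketch of a rho-based coalescence estimate.
# Assumption: rho = mean number of transitions (within the calibrated
# region) between each sampled individual and the root haplotype.

YEARS_PER_TRANSITION = 20180  # calibration quoted in the text


def rho_statistic(haplotypes):
    """haplotypes: iterable of (number_of_individuals, steps_from_root)."""
    haplotypes = list(haplotypes)
    total = sum(count for count, _ in haplotypes)
    if total == 0:
        raise ValueError("no individuals supplied")
    return sum(count * steps for count, steps in haplotypes) / total


def coalescence_age(haplotypes, years_per_transition=YEARS_PER_TRANSITION):
    """Convert rho into years using the mutation-rate calibration."""
    return rho_statistic(haplotypes) * years_per_transition


# Invented example: 5 individuals on the root haplotype, 3 individuals one
# step away, and 2 individuals two steps away.
example = [(5, 0), (3, 1), (2, 2)]
print(rho_statistic(example))
print(round(coalescence_age(example)))
```

The step counts would normally be read off the haplogroup network, so the estimate depends on the choice of root.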

Haplogroup Profiles
The 372 Graven et al. 1995;Watson et al. 1997;Rando et al. 1998;Salas et al. 2002). M1 and U6 are found in North and East Africa, Arabia, and the Middle East, whereas U5 has been sampled at appreciable frequencies only in Europe (Passarino et al. 1998;Quintana-Murci et al. 1999;Richards et al. 2000). The haplogroup profile for each ethnic group separately can be found in the Complementary Material.

L Lineages
Haplogroup L0 was represented in Guineans only by its daughter group L0a1, at marginal frequencies ranging from 1% to 5% (Table 2), in contrast to its frequency in East African populations (e.g. 25% in Mozambique: Watson et al. 1997; Pereira et al. 2001; Salas et al. 2002). Interestingly, only the Balanta, a group claiming Sudanese origin, showed an increased frequency of this clade (11%). Haplogroup L0a has a Paleolithic time depth in East African populations (33,000 years old, Salas et al. 2002). The relatively young coalescence date of L0a1 in Guineans (6400 ± 2600 years, assuming a single founder) suggests that only a small subset of L0a reached Guinea during the Holocene. The founder haplotype of L0a in Guineans, GB4 (see Table 4 in Complementary Material), has an exact match in East Africa, the Middle East and in Cape Verde and Senegal Mandenka populations, indicating that it is not restricted to Guineans. The lack of the L0a2 clade, associated with the 9bp deletion in CoII/tRNA Lys intergenic region, and widespread in Bantu speaking populations all over Africa (Soodyall et al. 1996), suggests that L0a has at least two distinct phylogeographic patterns in Central and West Africa. We cannot rule out the possibility of a Bantu migration to West Africa, as the founder group could have had a composition distinct from that of the groups who participated in the southward migration(s).
Haplogroup L1b is restricted mostly to West African populations (Graven et al. 1995;Watson et al. 1997;Salas et al. 2002) and is represented by two different branches in Guineans. Its major cluster (Figure 2) L1b1 is associated with a transition at np 16293 and includes a frequent sub-clade defined by the combined presence of a transversion to A at np 16114 and a transition at np 16274 that has also been observed in Senegalese Mandenka (Graven et al. 1995) and Wolof (Rando et al. 1998). L1b1 presents a TMRCA of about 36000 years (Figure 2), predating the diversity of L0a1 in Guineans. The matches in this cluster have a West African distribution well represented in Mandenka (haplotypes GB8 and GB20) and their frequency is highest in the Fulani-western and Senegal-eastern language groups (Table 2). GB23 and GB24 are widespread in Africa and are found in nearly all West African populations considered here (Salas et al. 2002). Another West African specific clade, L1c, is present at a relatively low frequency (0-8%) yet with high haplotype diversity in the Guiné-Bissau sample.
Haplogroups L2a-L2c are frequent in Senegambia (Table 2) and reveal signatures of a recent expansion from a limited number of founder haplotypes that are shared between populations of different linguistic affiliation. In contrast, haplotypes belonging to haplogroup L2d are represented by single individuals and do not show a common founder sequence (Figure 2). Fifteen out of 42 L2a haplotypes sampled in Guiné-Bissau had matches elsewhere: in West Africa (Cabo Verde, Brehm et al. 2002; Wolof and Senegalese, Rando et al. 1998; Mandenka, Graven et al. 1995), but also in East, South and North Africa. The geographic distribution of L2b and L2c haplotypes is largely restricted to West Africa. Not surprisingly, most of the haplotype matches are with Cabo Verdeans, Wolof and Senegalese. L2c is the haplogroup that shows the highest extent of shared lineages, with matches in Cape Verde, Senegal Mandenka, mixed Senegalese and São Tomé. The last case is likely due to recent gene flow from the Cape Verde Islands (Brehm et al. 2002). However, several L2 haplotypes observed in Guineans had no matches in other West African populations but did match East and North African sequences. This was the case for the Balanta (BLE) haplotype GB44, matching only with Sudanese (Watson et al. 1999), and GB59, matching with Moroccan sequences. Interestingly, haplotype GB83 (L2b), found in the Mansonca (MSW) group, had an exact match only with Ethiopians (our unpublished data). Likewise, the Fula haplotype GB39 has not been reported in West Africa but appears in East Africa: Lake Turkana (Watson et al. 1997), Nubia, Southern Sudan, Ethiopia and Saudi Arabia (our unpublished data).
Figure 2 MtDNA phylogeny of all Guinean haplogroups and skeletons of various L0, L1, L2 and L3 sub-haplogroups based on HVS-I sequences and coding-region RFLPs. The number of individuals assigned to the haplotypes is shown within the circles. The numbers over the lines represent the HVS-I (-16000 bp) and coding region mutations, with respective restriction sites. Transversions are represented with suffixes (length variation in the C-stretch is disregarded). Recurrent mutations are underlined and a refers to the mutation loss relative to root. The star indicates the putative root of the haplogroup. Coalescence estimates ± sd (in ya) are shown for haplogroups or sub-haplogroups.
Haplogroups L3b, L3d, and L3e are rare or absent in indigenous populations of North and South Africa but well represented in our sample. GB127 and GB134 are particular links of Guinean groups to Northwest African Mozabites, Moroccans and Senegalese. Particularly, GB136 from Fula-related people has been found so far in Hausa and again in Nubians and Sudanese. Apart from Mozambique (6%) the majority of L3d lineages are West African (7% in mixed Senegalese to 12% in Niger/Nigeria) with an estimated age of 42100 (± 10600, Salas et al. 2002). L3f is more frequent in Southeast Africa, ranging from 8% in Kenya/Sudan to 2% in Mozambique. The coalescence time of this haplogroup in West Africa was calculated as 39400 ya (± 10400, Salas et al. 2002), within the error range of the estimate based on Guinean samples (49350± 16200 ya). Haplotype GB178 in Fula shared an exact match with sequences from a wide range of East-African populations (Somalia, Egypt) and even Saudi Arabia. Haplogroup L3h is found in Ethiopia, Cape Verde and Niger/Nigeria at marginal frequencies (∼ 1%) but reaches its highest known frequency in the Ejamat from Guinea (8%). Its coalescent time estimate (14000± 8400 ya) in Guineans is consistent with its late Pleistocene/early Holocene spread around Africa.
No significant differences in haplogroup frequencies were observed between Guinean ethnic groups pooled by their linguistic affiliation. As in their geographic neighbours (Table 2), haplogroups L1b, L1c, L2b, L2c, L2d, L3b, L3d, and L3e cover most of the mtDNA variation (64-85%). The Guiné-Bissau sample shows an overall genetic diversity of 0.901 (sd 0.005), which is significantly higher than in other samples from West Africa (Table 2).
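For readers unfamiliar with the diversity measure reported here, the sketch below computes gene diversity H in its usual unbiased form, H = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of the i-th haplotype or haplogroup in a sample of n individuals. The assumption that the values in this paper follow exactly this estimator is ours; the function name and the toy sample are invented, and no standard deviation is computed.

```python
# Minimal sketch of gene diversity H = n/(n-1) * (1 - sum(p_i^2)).
# The sample below is invented and unrelated to the study data.
from collections import Counter


def gene_diversity(labels):
    """labels: one haplotype (or haplogroup) label per sampled individual."""
    labels = list(labels)
    n = len(labels)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(labels)
    sum_sq = sum((c / n) ** 2 for c in counts.values())  # sum of squared frequencies
    return n / (n - 1) * (1.0 - sum_sq)


sample = ["L2a", "L2a", "L1b", "L3e"]  # hypothetical haplogroup assignments
print(round(gene_diversity(sample), 4))
```

H approaches 1 when nearly every individual carries a different type and falls to 0 when all individuals share the same type.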

M1 and U6 Lineages
Haplogroup M1 has been characterized as an East African remnant of the major Asian haplogroup M (Quintana-Murci et al. 1999). It has been found mostly in Ethiopian populations (17%), its characteristic HVS-I motif being also well represented in Egyptian and Sudanese populations along the Nile Valley (7-8%, Krings et al. 1999). HVS-I haplotypes matching the East African M1 clade have also been identified in Northwest Africans (Plaza et al. 2003, unpublished data), where their frequency can reach 12.8% in Algerians and 4% among Moroccan and Algerian Arabs and Berbers. M1 is generally absent from autochthonous West African populations but was found among Balanta, Baiote, and Djola groups speaking Niger-Congo Atlantic Bak languages. The Guinean M1 haplotypes had exact matches with one West Saharan (Rando et al. 1998), two Mozabite (Côrte-Real et al. 1996), two Iranian and one Saudi Arabian sequence (unpublished data). This lineage derives from a particular cluster defined by a mutation at position 16185, which is also found in Ethiopia, Morocco and North African populations (Plaza et al. 2003, our unpublished results).
Haplogroup U6 is rather frequent in NW Africa, among Algerian Berbers, Moroccans and Mauritanians (Côrte-Real et al. 1996;Rando et al. 1998;Plaza et al. 2003), but is rare or absent in western sub-Saharan Africans. Three different U6 haplotypes were observed in Fula, Mandenka and Manjaco groups. These haplotypes match with sequences of a wide geographic range: North and West Africa (Cabo Verde, Tuareg, Mozabites, Moroccan Arabs and Berbers), East Africa (Nile Valley, Egypt and Ethiopia), the Middle East (Iran) and Mediterranean Europe (Sicily and Portugal, http://www.ahg.com/), suggesting that their spread might be related to the southern expansions of the Berber groups to whom the Fulani languages relate.

European Lineages: U5
Ten individuals out of 372 samples, all related to Fulbe groups, carried mtDNA variants typical of western Eurasia, particularly Europe. Within these mtDNAs belonging to haplogroup U5 nine Fulanis share one particular HVS-I haplotype. Both haplotypes are only one mutational step away from a common node widespread in Europe. Although U5 is one of the most frequent mtDNA variants among western Eurasians (about 460 sequences in our mtDNA HVS-I database) no exact matches to the two Guinean haplotypes were found, as would be expected in the case of recent admixture. On the other hand, the Fulani U5 haplotype appears in a data set of West Africans (Wolof and Serer, Rando et al. 1998) and in Moroccans (unpublished data), pointing to the existence of a common African founder lineage of haplogroup U5. Again, as in haplogroup U6 the linguistic correlation suggests that the spread of the haplotype in Senegambia might be related to the movement of Berber populations. More data from North and West African populations is needed to better characterize the source and the time of the spread of this founder lineage.

AMOVA and Principal Component Analysis
Analysis of molecular variance (AMOVA) in African populations attributed 15.6% to differences between groups, 3% to variation between populations within groups, and 81.6% to differences within populations   Figure 3).

Final Remarks
Roughly 87% of the mtDNA lineages found in the Guinean populations are common in other West African populations. Not surprisingly, the highest number of matches was with Cape Verde followed by other populations from the area (Mandenka, Wolof, Fulbe), but also with Morocco. The notable L haplotype sharing with North Africans testifies to the absence of a real barrier between this region and typical sub-Saharan populations. On the other hand, some Guinean groups (Fula and Balanta for instance) present haplotypes otherwise observed to date in East-African and Middle East populations.
It is interesting to note that the Bantu-associated markers L0a 9bp del CoII/tRNA Lys (Soodyall et al. 1996), L3b motif 16124-16223-16278 (Watson et al. 1997), L3e1, particularly L3e1a characterized by mutation 16185 (Bandelt et al. 2001), and the 16192 L2a1 subclade were not found in our sample. This suggests either that Bantu migrations contributed very little to the gene pool of Guineans, despite the evidence of a Bantu migration starting from Cameroon and spreading towards Ghana, Nigeria, Burkina Faso and Mauritania, or that these migrants had a gene pool distinct from that associated with the southward migrants. The lack of Bantu branches of the Niger-Congo linguistic family among the plethora of languages spoken in Guiné-Bissau is more in agreement with the first hypothesis.
The finding of haplogroup M1 lineages of East African origin, albeit at low frequencies (3-5%), in Guinean groups with linguistic affinities to the Bak superfamily, including Balanta, Baiote and Ejamat languages, supports the earlier suggestion of a Sudanese origin of the Balanta population and their spread to western Africa with Kushitic migrants approximately 2000 years ago. Thereafter they were evidently assimilated into the local population, acquiring its language. In particular, the 16185 mutation might suggest a route through North Africa. The U6 presence in the Guinean pool, although at a low frequency, is not surprising, as these particular lineages have already been reported for this region. It seems plausible that the U5 lineages observed in the Fula arrived in Guiné via the Sahel from North Africa before the slave trade. None of the typical European haplogroups (H, J, and T) were found in the present-day population of Guinea, whereas in North Africa they exist at a fairly high frequency, in contrast to U5 (only 4.5%). This makes it less likely that the presence of U5 in Guiné, in particular, and in Northwest Africa in general, is due to recent admixture with the European population. A possible ancient migration from Asia to Africa was proposed by Cruciani et al. (2002) to explain the presence of some unusual Y-chromosome lineages identified in West Africa. Haplogroup R1 (defined by the M173 mutation), without the further branch-defining mutations (M269 and M17) specific to Europeans, accounted for ∼ 40% of the Y chromosomes in North Cameroon, while not yet having been sampled elsewhere in Africa. More data from Central and Western Africa are needed to cast light on the origin of such idiosyncratic mtDNA and Y-chromosome lineages. Thus, our U5 sequences from the Guinean Fulbe people corroborate Cruciani's hypothesis of a prehistoric migration from Eurasia to West Sub-Saharan Africa, as testified by their restricted and localised present-day distribution.