Genetic origin, admixture, and asymmetry in maternal and paternal human lineages in Cuba

Background: Before the arrival of Europeans in Cuba, the island was inhabited by two Native American groups, the Tainos and the Ciboneys. Most of the current archaeological, linguistic and ancient DNA evidence indicates a South American origin for these populations. In colonial times, Cuban Native American people were replaced by European settlers and African slaves. It is still unknown, however, to what extent their genetic pool intermingled with and was 'diluted' by the arrival of newcomers. To investigate the demographic processes that gave rise to the current Cuban population, we analyzed the hypervariable segment I (HVS-I) and five single nucleotide polymorphisms (SNPs) in the mitochondrial DNA (mtDNA) coding region in 245 individuals, and 40 Y-chromosome SNPs in 132 male individuals. Results: The Native American contribution to present-day Cubans accounted for 33% of the maternal lineages, whereas Africa and Eurasia contributed 45% and 22% of the lineages, respectively. This Native American substrate in Cuba cannot be traced back to a single origin within the American continent, in contrast to the South American origin previously suggested by ancient DNA analyses. Strikingly, no Native American lineages were found for the Y-chromosome, for which the Eurasian and African contributions were around 80% and 20%, respectively. Conclusion: While the ancestral Native American substrate is still appreciable in the maternal lineages, the extensive process of population admixture in Cuba has left no trace of the paternal Native American lineages, mirroring the strong sex bias in the admixture processes that took place during colonial times.


Background
At the time of Columbus's arrival in Cuba in 1492, two different Native American groups inhabited the island: the Ciboneys, spread across the whole island, and the Tainos, mainly occupying the Central and Eastern regions of Cuba [1]. Although little is known of Ciboney culture, including their language, their economy is known to have been based on hunting and gathering (mainly fishing and hunting), and they did not make pottery. The Tainos, in contrast, were sedentary people living in large settlements, whose culture was supported by technically advanced agriculture. The social organization of the Tainos was based on chiefdoms, in which the caciques held social authority. The Tainos spoke an Arawakan language, belonging to the Equatorial sub-family of the Equatorial-Tucanoan family [1].
Who first colonized the Caribbean islands is still a matter of debate. Geographical, archaeological and linguistic evidence [2][3][4], as well as ancient DNA data [5,6], suggests that the Caribbean was most likely populated by successive waves of migration originating in the Lower Orinoco Valley in South America, facilitated by the short distances between the Caribbean islands. In this scenario, the first migration would have involved hunter-gatherer groups arriving around 5000 B.C. (probably the ancestors of the Ciboneys), followed by later migrations of agriculturalists [6]. The Tainos would also have migrated from the Orinoco Valley, around 1000 B.C., either mixing with the pre-existing populations or pushing them westwards.
With the arrival of the Europeans, both the Ciboney and the Taino populations were drastically reduced within a few generations as a consequence of harsh slavery conditions, confrontation with settlers, starvation and infectious diseases. The decline was more dramatic for the Ciboneys, whose population was already shrinking by the time of the Spanish landing in Cuba [7]. From the sixteenth century onwards, the African slave trade to Cuba grew steadily. Slaves were brought mainly as mining laborers to compensate for the dramatic demographic decline of the Native American population [8]. Overall, however, slaves accounted for only a small part of the Cuban population, which was mainly European owing to constant migration from Spain, especially from the Canary Islands. During the second half of the eighteenth century, the introduction of African slaves to Cuba accelerated, dramatically changing the demographic characteristics of the island. In comparison with other American and Caribbean countries, the slave trade to Cuba began earlier and lasted longer. According to revised estimates by Curtin [9], about 702,000 slaves were brought to the island over the whole slave trade period, whereas Pérez de la Riva [10] documented more than 1,300,000 slaves since the sixteenth century. The exact origin of only a small fraction of the African slave population is clearly documented [9]; historical records point to Western (Bight of Benin, North of Congo, Angola, the Bight of Biafra and Sierra Leone) and South-eastern (Mozambique and Madagascar) Africa as the main sources of African slaves. Immigration continued during the nineteenth century, when Cuban institutions intensely promoted Spanish immigration, especially from the Canary Islands, reflecting fears of a growing African presence and the desire to "whiten" the Cuban population.
In addition, Asian individuals, especially from Bengal and South China, arrived in Cuba to replace the slave labor force when slavery became illegal in the nineteenth century. Around 125,000 Chinese were reported to have arrived on the island to work in conditions of semi-slavery [11]. The arrival of immigrants and slaves was not uniform across the island. During the nineteenth century, the island was organized into three departments (see Figure 1): (a) the Western department was the most populated and had the largest slave population and, owing to the development of the sugar industry, the largest concentration of laborers; (b) the Central department was mainly populated by European livestock farmers; and (c) Eastern Cuba was reported to contain similar numbers of Africans and Europeans [12].
Present-day Cuba is an attractive setting in which to study the outcome of a complex history of genetic admixture. The phylogeographic information content of both mitochondrial DNA (mtDNA) and Y-chromosome markers has been investigated in depth, allowing the reconstruction of past demographic and evolutionary events such as human migrations and admixture. MtDNA and Y-chromosome variation show strong phylogeographic structure among continental areas, to the point that almost all haplogroups are confined to a single continent and can be used to trace migrations out of that continent (see, for instance, Figures 9.16 and 9.18 in Jobling et al. [13]). In admixed populations, mtDNA sequences and Y-chromosome haplotypes can therefore be assigned to a continent of origin [14]. In this study we analyzed the complementary information provided by both markers in order to survey (i) the geographic origin of the ancestors of present-day Cubans, (ii) the extent of admixture among them, and (iii) their differential sex-specific contribution to the present-day Cuban gene pool.

Samples
Blood stains from 245 unrelated individuals from the general Cuban population were collected blind to their socioeconomic status in order to avoid ascertainment biases. For each individual, the province of origin of the maternal grandmother and of the paternal grandfather was recorded (see Figure 1) for further analyses of the geographical distribution of lineages within Cuba. Samples consisted of unrelated healthy blood donors, and appropriate informed consent was obtained from all individuals participating in the study. DNA was extracted using standard phenol-chloroform protocols [15].

MtDNA genotyping
The mtDNA control region was PCR-amplified using primers L15996 and H408 [16] following previously published conditions [17]. HVS-I sequences from positions 16024 to 16391 (numbered according to Anderson et al. [18]) were determined for all individuals (GenBank accession numbers EU649796-EU650040) with the ABI PRISM BigDye Terminator v3.1 Cycle Sequencing kit (Applied Biosystems) according to the supplier's recommendations.
Additionally, five SNPs from the mtDNA coding region (10400, 10873, 11719, 12308, and 12705) were genotyped as described in Bosch et al. [19]. These SNPs allowed the classification of mtDNA lineages into six main phylogeographic haplogroups (L, M, N, R, U and HV/H). Two additional sites (7028 and 11251) were also typed in those sequences belonging to HV/H and R as described in Bosch et al. [19]. Information on the control region sequence was used to refine the haplogroup classification.

Y-chromosome genotyping
Thirty-five Y-chromosome SNPs were typed in the 132 male individuals of the Cuban sample as described by Berniell-Lee et al. [20]. Additionally, markers M172, 12f2, M45, M207 and P36 were analyzed as described in Bosch et al. [19]. Based on these biallelic markers, the samples were classified into the main haplogroups and sub-haplogroups following the nomenclature of the Y Chromosome Consortium [21].

Statistical analysis
Intrapopulation genetic diversity parameters were calculated using the Arlequin package [22]. For mtDNA, the average number of pairwise differences and the weighted intralineage mean pairwise difference (WIMP) [23] were calculated. For some analyses, the sample was split into three groups corresponding to the three departments into which the island was divided during the nineteenth century: West (provinces 1 to 5 in Figure 1), Central (provinces 6 to 10) and East (provinces 11 to 15).
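The diversity statistics named above can be illustrated with a short Python sketch. This is not Arlequin's implementation. It assumes aligned sequences of equal length and uses Nei's unbiased estimator for sequence diversity. As a labeled assumption, `wimp_sketch` (a hypothetical helper) takes WIMP to be the within-haplogroup mean pairwise difference averaged with weights proportional to haplogroup sample size; the exact weighting defined in [23] may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations


def pairwise_diff(a: str, b: str) -> int:
    """Count differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def sequence_diversity(seqs: list[str]) -> float:
    """Nei's unbiased sequence (haplotype) diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def mean_pairwise_differences(seqs: list[str]) -> float:
    """Average number of differences over all n(n-1)/2 sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def wimp_sketch(seqs: list[str], haplogroups: list[str]) -> float:
    """ASSUMED form of WIMP: mean pairwise differences within each haplogroup,
    weighted by the haplogroup's share of the sample. Haplogroups with a
    single sequence have no pairs and contribute 0 here."""
    groups: dict[str, list[str]] = defaultdict(list)
    for seq, hg in zip(seqs, haplogroups):
        groups[hg].append(seq)
    total = len(seqs)
    return sum(
        len(members) / total * mean_pairwise_differences(members)
        for members in groups.values()
        if len(members) > 1
    )


# Hypothetical toy data: two haplogroups that differ at most sites.
toy_seqs = ["AAAA", "AAAT", "TTTT", "TTTA"]
toy_hgs = ["X", "X", "Y", "Y"]
print(sequence_diversity(toy_seqs))         # all sequences distinct, so about 1
print(mean_pairwise_differences(toy_seqs))  # 16 differences over 6 pairs
print(wimp_sketch(toy_seqs, toy_hgs))       # 1.0
```

In this toy sample, each haplogroup is internally close but far from the other. The overall mean pairwise difference (about 2.67) is therefore well above the WIMP-style value (1.0), the same qualitative pattern used below to interpret the Cuban data.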
To compare the mtDNA sequences obtained, previously published mtDNA data were used (see Additional file 1): 5,370 African, 4,147 Native American and 8,645 European lineages. Only variation within the range 16090 to 16365 [18] was used for inter-population comparisons. Each mitochondrial sequence found in Cuba was compared with the dataset corresponding to the continental origin of its assigned haplogroup (Africa, America or Eurasia; see Table 1), and each dataset was subdivided into subcontinental regions (see Additional file 1). To estimate the putative origin of the Cuban sequences, we calculated the probability of origin from each subcontinental region using a Bayesian approach. The probability of origin of each subcontinental region was computed from the following quantities: n, the number of Cuban sequences with at least one match in the whole continental dataset; $k_i$, the number of times sequence i is found in the Cuban sample; $p_{is}$, the frequency of sequence i in the subcontinental regional dataset; and $p_{ic}$, the frequency of sequence i in the whole continental dataset. To provide confidence intervals for the estimate of each subcontinental region, we also computed its standard deviation.
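The following Python sketch shows one plausible way to combine these quantities into regional origin probabilities. It is an assumption, not necessarily the authors' exact formula. It assumes a uniform prior over subcontinental regions, so that a Cuban sequence i is assigned to region s with probability $p_{is} / \sum_r p_{ir}$; this version does not need $p_{ic}$. Regional estimates are the $k_i$-weighted average of these assignments over the n matched sequences. The standard deviation is assumed to take the binomial form $\sqrt{P_s(1-P_s)/n}$. The function name, sequence identifiers and region frequencies below are hypothetical.

```python
import math


def origin_probabilities(cuban_counts, region_freqs):
    """Sketch of a regional origin estimate (assumed form, uniform prior).

    cuban_counts: {sequence_id: k_i}, times each sequence occurs in Cuba.
    region_freqs: {region: {sequence_id: p_is}}, frequency of each sequence
                  in that subcontinental dataset (absent means 0).
    Returns (P, SD, n): per-region proportions, assumed binomial standard
    deviations, and n, the number of Cuban sequences with >= 1 match.
    """
    regions = list(region_freqs)
    weighted = {r: 0.0 for r in regions}
    n = 0
    for seq_id, k in cuban_counts.items():
        freqs = [region_freqs[r].get(seq_id, 0.0) for r in regions]
        total = sum(freqs)
        if total == 0:  # no match anywhere on the continent: excluded from n
            continue
        n += k
        for r, p in zip(regions, freqs):
            # Posterior P(region r | sequence i) under a uniform prior.
            weighted[r] += k * p / total
    if n == 0:
        raise ValueError("no Cuban sequence matches the continental dataset")
    P = {r: weighted[r] / n for r in regions}
    SD = {r: math.sqrt(P[r] * (1 - P[r]) / n) for r in regions}
    return P, SD, n


# Hypothetical example: three Cuban sequence types, one with no match.
counts = {"h1": 3, "h2": 1, "h3": 2}
freqs = {
    "North": {"h1": 0.02, "h2": 0.01},
    "Central": {"h1": 0.02},
    "South": {"h2": 0.03},
}
P, SD, n = origin_probabilities(counts, freqs)
print(n, P)
```

Each matched sequence is split across regions with weights that sum to 1, so the regional proportions always sum to 1. The reported breakdowns behave the same way: for example, 38.7% + 26.7% + 34.6% for the Native American lineages.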

Phylogeographic analysis of mtDNA lineages in Cuba
A total of 153 different mtDNA sequences were found in the 245 individuals analyzed. Although the mtDNA sequence diversity (0.994 ± 0.001) and the mean pairwise differences (8.102 ± 3.774) in Cubans are relatively high, the WIMP value is low (2.869), suggesting that the sample set is composed of distantly related haplogroups with low to moderate internal diversity. According to the geographical origin attributed to each mtDNA haplogroup, 45% of the mtDNA sequences found in Cubans are of African origin, 33% of Native American origin and 22% of West Eurasian origin (namely, Europe and the Middle East) (Table 1).
Within the African lineages, the vast majority of sequences belong to sub-Saharan L haplogroups (43.3% of the total sample), whereas a small proportion (2% of the total sample) fall into the typically North African haplogroup U6. Interestingly, most of these U6 sequences (see Additional file 2) belong to the sub-clade U6b1, which is characteristic of the Canary Islands [24]. The main U6b1 profile, A16163G T16172C A16219G T16311C (found three times in Cuba), is in fact highly prevalent in the Canary Islands (~10%) [24], and outside these islands it appears only sporadically, in some other Latin American countries such as Uruguay [25]. Haplogroup A2 is the main Native American haplogroup in Cuba (21.9% of the total sample), accounting for 67% of the Native American mtDNA gene pool. It should be stressed that five out of eight sequences belonging to haplogroup D1 and seven out of 13 sequences belonging to haplogroup C have been previously described in extinct Ciboneys [6] and Tainos [5]. Within the small European fraction, the main haplogroup is H, which represents 43% of the European lineages. No East Asian mtDNA lineages [26] were found in the present sample set.
To obtain rough estimates of the putative geographic origin of Cuban mtDNA lineages at a subcontinental scale, we searched for identical matching sequences in different datasets. The Native American lineages found in Cubans were compared with a dataset of 4,147 published Native American sequences (see Additional file 1), divided into three main geographical regions: North (from Alaska to the Southern USA; n = 2,005), Central (from Mexico to Panama, including the Caribbean islands; n = 485) and South America (from Colombia southwards; n = 1,657). Only fifteen of the 35 Native American lineages found in Cubans are also present in the American dataset (see Additional file 2). The average of the proportions of Cuban sequences found in each geographical region can be used as a proxy for the origin of these lineages within the continent. On this basis, the presumed origin of the Cuban Native American sequences breaks down as follows: 38.7% (SD: 6.9%) North, 26.7% (SD: 6.2%) Central and 34.6% (SD: 6.7%) South America. Although this is a very rough estimate of the putative origin of the lineages within the continent, it shows that the Native American lineages in Cubans cannot be attributed to a single region within the continent (or, alternatively, that the geographical distribution of mtDNA sequences in America is uninformative).
Using the same approach, the African lineages found in Cubans were compared with a dataset of 5,370 African sequences (see Additional file 1), divided into geographical regions according to [27]: East (n = 835), North (n = 1,312), South (n = 264), South-east (n = 416), South-west (n = 157), West (n = 1,184) and Central (n = 1,202). The putative origin of African lineages in Cubans could be traced mainly to Western (30.3%, SD: 5.4%), Central (22.2%, SD: 4.9%), South-western (18.4%, SD: 4.6%) and South-eastern (12.2%, SD: 3.9%) Africa, whereas the rest of the continent (Eastern, Southern and Northern Africa) would together account for less than 17%. These figures agree with the proposed origins of African slaves brought to the Americas [9] and with previous findings based on mtDNA analysis. The Cuban African lineages were also compared with African lineages found in the American continent: North (n = 1,148), Central (n = 83) and South (n = 143). The shared sequence pool could be attributed mainly to North (56%) and Central America (26%) rather than to South America (18%).
Using a European dataset of 8,645 sequences (see Additional file 1), Cuban profiles were also compared with different geographical regions within Europe: South-west (Portugal, Spain, Italy and the Western Mediterranean islands), West (France and the British Isles), Central (Germany, Switzerland, Austria and the Czech Republic), South-east (Balkan countries) and North (Scandinavia). Following the same rationale, the putative origin of the European sequences could be traced mainly to South-western (38.9%, SD: 7.6%), Central (20.8%, SD: 6.3%) and Western (16.7%, SD: 5.8%) Europe. Given that historical documentation testifies to an overwhelmingly Spanish origin of European emigration to Cuba, this result partly corroborates the written record but also reflects the high homogeneity of European mtDNA variation at the haplotype and haplogroup resolution considered here. Notably, a large number of sequences are shared between Cubans and individuals from the Atlantic islands of Madeira, the Azores and the Canary Islands (see Additional file 2).
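The identical-match searches described above require Cuban and reference HVS-I profiles to be compared over a common window. The following is a minimal Python sketch under stated assumptions. Profiles are taken to be stored as lists of variants in the notation used in this paper (e.g. A16163G), and only the 16090 to 16365 window is compared. The helper names, region names and reference profiles are hypothetical. The per-region match frequencies it returns play the role of $p_{is}$ in the estimator of the Statistical analysis section.

```python
import re


def restrict(profile, start=16090, end=16365):
    """Keep only variants whose position lies within [start, end].
    The position is the first run of digits, so 'A16163G' -> 16163 and an
    insertion written as '16193.1C' -> 16193."""
    kept = set()
    for variant in profile:
        match = re.search(r"\d+", variant)
        if match is None:
            raise ValueError(f"no position found in variant {variant!r}")
        if start <= int(match.group()) <= end:
            kept.add(variant)
    return frozenset(kept)


def match_frequencies(cuban_profile, region_datasets, start=16090, end=16365):
    """Per region, the fraction of reference profiles identical to the
    Cuban profile once both are restricted to the same window."""
    target = restrict(cuban_profile, start, end)
    return {
        region: sum(restrict(p, start, end) == target for p in profiles) / len(profiles)
        for region, profiles in region_datasets.items()
    }


# The U6b1 profile quoted in the text, against hypothetical reference sets.
u6b1 = ["A16163G", "T16172C", "A16219G", "T16311C"]
reference = {
    "RegionA": [u6b1, u6b1 + ["A16051G"], ["T16189C"]],
    "RegionB": [["T16189C"], ["C16223T"]],
}
print(match_frequencies(u6b1, reference))  # RegionA: 2/3, RegionB: 0.0
```

Windowing has a side effect. The second RegionA profile differs from U6b1 at position 16051, outside the compared range, yet it still counts as an identical match. This is one reason such matches give only rough regional estimates.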

Phylogeographic structure of Y-chromosome lineages in Cuba
With respect to the Y-chromosome haplogroups, 78.8% of the chromosomes analyzed can be traced back to the West Eurasian gene pool, whereas the African fraction accounts for 19.7% of Cuban lineages (Table 2); two individuals (1.5%) carried Y-chromosomes of East Asian origin. Within the West Eurasian fraction, the vast majority of individuals belong to the West European haplogroup R1 (xR1a). The African lineages found in Cubans are of Western (haplogroups E1, E2, E3a) and Northern (haplogroup E3b2) African origin. Interestingly, we did not observe Native American Y-chromosome lineages, such as those belonging to haplogroups P or Q [28]. Given a sample of n = 132 Y-chromosomes, the highest frequency F that any unobserved haplogroup could have in the population, with 95% confidence, can be estimated from the Poisson distribution as $1 - e^{-Fn} = 0.95$, that is, $F = -\ln(0.05)/n \approx 3.0/132$. Therefore, Native American haplogroups could still be present in the Cuban population at a frequency of up to roughly 2.3%.
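The upper bound quoted above follows from solving $1 - e^{-Fn} = 0.95$ for F. The short Python sketch below reproduces it. For comparison, it also adds the exact binomial version $1 - (1 - F)^n = 0.95$, which is our addition and is not used in the text. The function names are illustrative.

```python
import math


def max_unobserved_freq_poisson(n, confidence=0.95):
    """Largest frequency F still compatible, at the given confidence, with
    seeing zero copies among n chromosomes (Poisson: 1 - exp(-F*n) = conf)."""
    return -math.log(1 - confidence) / n


def max_unobserved_freq_binomial(n, confidence=0.95):
    """Exact binomial counterpart: 1 - (1 - F)**n = conf."""
    return 1 - (1 - confidence) ** (1 / n)


n_y = 132  # Y-chromosomes typed in this study
print(f"Poisson:  {max_unobserved_freq_poisson(n_y):.4f}")
print(f"Binomial: {max_unobserved_freq_binomial(n_y):.4f}")
```

The two lines print 0.0227 and 0.0224. The Poisson form is the familiar "rule of three" (F ≈ 3/n). The binomial bound is slightly smaller, but the difference is negligible at this sample size.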

Population structure
To analyze the geographical distribution of the Native American, European and African lineages within Cuba (see Figure 1), each sample was assigned to one of the three regions into which the island was divided: West, Central and East (see Table 3). No significant differences were found in the proportions of the main mtDNA geographic lineages (χ² = 7.41, 4 d.f., P = 0.116), pointing towards a homogeneous mtDNA genetic landscape across the island. However, significant differences (χ² = 7.74, 2

Discussion
In contrast to the popular belief that the ancestral Native American pool in Cuba was totally erased by the massive arrival of Europeans and African slaves and by centuries of admixture, and despite the absence of distinct Native American ethnic groups in Cuba, the present results demonstrate the persistence of a substantial Native American component in the maternal gene pool. An unexpectedly high proportion of Native American mtDNA has been described previously in other American populations that also experienced dramatic demographic changes in colonial times, such as Puerto Rico [29,30], Brazil [31] and Mexico [32]. The Native American component inferred in Cuba is higher than estimates based on nuclear markers (<5%) [33]. In addition, the frequency of Native American mtDNA lineages in Cuba is higher than in the English-speaking Caribbean countries (5.4%) [34] and in Afro-American populations from Central and South America, such as the Garifuna from Honduras (15.9%) and the Chocó people from Colombia (16.3%) [35] (see Additional file 3). In these cases, the indigenous population was even more dramatically replaced by African slaves and, to a certain extent, by Europeans. Our results differ from a previous independent study carried out in the Cuban province of Pinar del Rio [36], whose authors estimated that 50% of maternal lineages in this province were of European, 46% of African and at most 4% of Native American origin. Our results indicate that the Native American mtDNA haplogroup patterns are statistically homogeneous across the island, with the maternal Native American substrate exceeding 25% in all provinces. Specifically, in Pinar del Rio we detected 33% Native American maternal lineages, a figure that contrasts significantly (χ² = 12.32; 2 d.f., P = 0.002) with the 4% found in the study by Torroni [36].
This difference highlights the risk of population stratification, which can easily arise in case-control disease association studies, for example, and inflate the false-positive rate [26,37]. Forensic genetic studies are also sensitive to population stratification, especially in pseudo-ethnic groups such as the 'Hispanics', a term firmly established in American societies and in particular in the USA [38]. Although our recruitment scheme was designed to capture a representative sample of the Cuban population, and although blood donations were not rewarded, our sample could include hidden socioeconomic biases that would distort ancestry estimates.
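The χ² comparisons reported in this paper (7.41 with 4 d.f. across the three regions; 12.32 with 2 d.f. for the Pinar del Rio contrast) are presumably Pearson tests of homogeneity on contingency tables of lineage counts; the software used is not stated. The dependency-free Python sketch below computes the statistic and its P value. The closed-form tail used here is exact only for even degrees of freedom, which covers both tests. The function names and the example table are hypothetical.

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of counts (rows: regions or studies; columns: lineage categories).
    Assumes no row or column sums to zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df:
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)**j / j!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))


# Reported statistics: P values of about 0.116 and 0.002, as in the text.
print(round(chi2_sf_even_df(7.41, 4), 3), round(chi2_sf_even_df(12.32, 2), 3))

# Hypothetical 2 x 3 table (two samples; Native American, African, European counts).
stat, df = chi_square_homogeneity([[30, 10, 10], [10, 30, 10]])
print(stat, df, chi2_sf_even_df(stat, df))
```

The published statistics, fed into the tail function, give P values that match those reported: 0.116 and 0.002. For odd degrees of freedom, a general chi-square survival function from a statistics library would be needed instead.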
The origin of Native Americans in the Caribbean, such as the Ciboneys and Tainos, is a controversial issue. The Native American mtDNA haplogroup frequencies in contemporary Cuba differ significantly (P < 0.0001) from the haplogroup composition observed in ancient DNA samples from Ciboneys and Tainos. Fifteen ancient Ciboney specimens from Cuba have been analyzed [6] and classified into haplogroups C1 (nine individuals), D1 (five individuals) and A2 (one individual), while a different study [5] analyzed 24 samples of extinct Tainos from the Dominican Republic, on the neighboring island of Hispaniola, and classified them as C1 (18 individuals) and D1 (6 individuals); neither haplogroup A2 nor B2 was observed in these Taino samples. According to Lalueza-Fox et al. [6], the scarcity of haplogroup A2 and the predominance of lineages C1 and D1 in the Caribbean point towards South America as the origin of both the Tainos and the Ciboneys. However, an argument based only on average continental haplogroup frequencies can be misleading, since haplogroup frequencies vary substantially among present-day Native American populations within North, Central and South America alike. A process of progressive island colonization from the Orinoco Valley and/or from the Yucatan provides appropriate ground for the action of genetic drift, and intensive episodes of genetic drift are in fact the rule rather than the exception in other Native American populations. Over half of the Cuban sequences belonging to haplogroups C1 and D1 in the present study have already been described in ancient DNA studies [5,6] of a total of 39 individuals (24 Tainos and 15 Ciboneys). However, these sequences are common in Native American populations (from both North and South America), and many represent founding lineages of the continent. In contrast to the hypothesis of Lalueza-Fox et al. [6], our data suggest that both North and South America could have contributed to the original gene pool of Cuban Native Americans. We anticipate an even more complex scenario, in which Native American people from other parts of the continent, arriving after colonization, could have contributed to the already admixed population. In fact, the importation of Native Americans from Central and North America has already been reported [39]. In addition, sampling effects, such as close maternal relatedness among the individuals analyzed, could have distorted the haplogroup patterns observed in ancient Tainos. This hypothesis would also explain why the predominance of the C1 and D1 haplogroups in these pre-Columbian samples is observed neither in present-day Cuba nor in Puerto Rico [29,30], where the Tainos were also the native inhabitants before the European arrival.
Although a southern origin of Cuban Native Americans in the Orinoco Valley has historically been favored [2][3][4], our results indicate that a substantial genetic input from Central and North America (e.g. the Yucatan or Florida peninsulas) cannot be ruled out. Because haplogroup frequencies are vulnerable to genetic drift, the phylogeographic information carried by the sequences themselves is a necessary complementary tool for locating the origin of the Cuban lineages within the American continent. The comparison of the Cuban sequences with a dataset of more than 4,000 sequences covering the entire continent suggests a multiregional origin within America, since the number of matches was similar for North, Central and South America. We are also aware that this phylogeographic information is of limited use, because Native American lineages carry little phylogeographic signal in the HVS-I region of the mtDNA control region. Higher molecular resolution, based on the analysis of complete mitochondrial genomes (or high-throughput scans of mtDNA coding-region SNPs), could help refine phylogeographic inferences [40,41].
Besides the presence of a maternal Native American substrate in Cuba, the present results reveal a strong sex asymmetry in admixture between European males and non-European females. In contrast to the 33% Native American presence in the female lineages, no Native American fraction was found among the Y-chromosome haplogroups. This result agrees with historical documentation, which records a high prevalence of mestizos of Native American and European ('white') parentage in the first generations after the conquest. The vast majority of European settlers were men, and unions between Spanish men and Native American women were not uncommon during the first generations of settlement [7]. Similar sex biases between Native American and European founders have been previously described in Brazil [31,42,43] and Colombia [44]. For the African component, the asymmetry between mtDNA and Y-chromosome haplogroup frequencies is also noticeable: while African lineages constitute 45% of the maternal lineages, they are present in only 18% of Cuban Y-chromosomes. Although very large numbers of African slaves were brought to Cuba, the African-born slave population experienced extremely high mortality and an unbalanced sex ratio. 'Mulattos' were considered inferior in Cuban society from the beginning of the slave trade [7], so unions between African men and European women were strongly discouraged. In contrast, unions between European masters and African slave women were more common.

Conclusion
This report shows that, despite centuries of inter-ethnic mating between people from different continents, the Native American substrate persists in the present-day Cuban population, contributing about a third of the total maternal lineages. We have also described a noticeable European/Native American, as well as European/African, sex bias between the paternal and maternal ancestries in Cuba. In addition, the origin of the present-day Cuban