High frequencies of Y-chromosome haplogroup O2b-SRY465 lineages in Korea: a genetic perspective on the peopling of Korea

Background
Koreans are generally considered a Northeast Asian group, thought to be related to Altaic-language-speaking populations. However, recent findings have indicated that the peopling of Korea might have been more complex, involving dual origins from both southern and northern parts of East Asia. To understand the male lineage history of Korea, more data from informative genetic markers from Korea and its surrounding regions are necessary. In this study, 25 Y-chromosome single nucleotide polymorphism markers and 17 Y-chromosome short tandem repeat (Y-STR) loci were genotyped in 1,108 males from several populations in East Asia.

Results
In general, we found East Asian populations to be characterized by male haplogroup homogeneity, showing major Y-chromosomal expansions of haplogroup O-M175 lineages. Interestingly, a high frequency (31.4%) of haplogroup O2b-SRY465 (and its sublineage) is characteristic of male Koreans, whereas the haplogroup distribution elsewhere in East Asian populations is patchy. The ages of the haplogroup O2b-SRY465 lineages (~9,900 years) and the pattern of variation within the lineages suggested an ancient origin in a nearby part of northeastern Asia, followed by an expansion in the vicinity of the Korean Peninsula. In addition, the coalescence time (~4,400 years) for the age of haplogroup O2b1-47z, and its Y-STR diversity, suggest that this lineage probably originated in Korea. Further studies with sufficiently large sample sizes to cover the vast East Asian region and using genomewide genotyping should provide further insights.

Conclusions
These findings are consistent with linguistic, archaeological and historical evidence, which suggest that the direct ancestors of Koreans were proto-Koreans who inhabited the northeastern region of China and the Korean Peninsula during the Neolithic (8,000-1,000 BC) and Bronze (1,500-400 BC) Ages.


Background
The Koreans are geographically a Northeast Asian group, who are thought to be most closely related to Altaic language-speaking populations. Anthropological and archaeological evidence suggests that the early Korean population was related to Mongolian ethnic groups, who inhabited the general area of the Altai Mountains and the Lake Baikal regions of southeastern Siberia [1]. Based on archaeological data, the earliest modern human lithic cultures date from 25,000 to 45,000 years ago in the Altai Mountains, southeastern Siberia and the Korean Peninsula [2,3]. According to Korea's founding myths, Gojoseon (the first state-level society) was established around 2,333 BC in the region of southern Manchuria, but later stretched from the northeastern region of China to the northern part of the Korean Peninsula. Thus, the ancient Koreans (proto-Koreans) may have shared a common origin with the Northeast Asian groups who inhabited the general area of southeastern Siberia and Manchuria during the Neolithic (8,000-1,000 BC) and Bronze (1,500-400 BC) Ages [1,4].
However, like many debates on the genetic history of human populations, the origin of the Korean population remains controversial. Studies of classic genetic markers have shown that, among the East Asians, Koreans tend to have the closest genetic affinity with Mongolians [5,6]. These findings support the first model, that a northeastern Asian origin is most likely, followed by a southeastward migration into the Korean Peninsula. The second model suggests a bi- and/or multidirectional route, with one migration through northeastern Asia and one through southeastern Asia. Recent surveys of genetic variation using two haploid markers (mitochondrial DNA and the Y chromosome) showed that the Korean population contains lineages typical of both Southeast and Northeast Asian populations [7-9]. These results led us to consider that the peopling of Korea might have involved multiple events, and that different aspects might be revealed by appropriate additional genetic markers and DNA samples.
Based on recent studies of Y-chromosome variation, the East Asian gene pool is almost completely contained within three major Y-chromosome lineages (haplogroups C, D and K) [7,10]. East Asian populations have major expansions of haplogroup O-M175 lineages (a lineage within K), although there are significant genetic differences in other lineages between the populations [8,11]. Most Y chromosomes found in the Korean population also belong to haplogroup O-M175 and its sublineages [8,9]. The Korean population has a high frequency of the haplogroup O3-M122 lineage, which is shared mainly with Chinese and Southeast Asian populations. By contrast, the haplogroup O2b-SRY465 lineages (and its sublineage) are found with high frequency and diversity specifically among modern populations of Japan and Korea [8,12,13]. These chromosomes are absent in most populations in China, but they have been detected in some samples of Beijing-Han Chinese, Manchurians, Mongolians and Southeast Siberians [7,8,14].
Hammer and Horai [12] hypothesized that the haplogroup O2b-SRY465 and its sublineage, O2b1-47z, might be Yayoi male lineages, which contributed to the contemporary mainland Japanese population via a process of demic diffusion during the Yayoi period from the Korean Peninsula, around 2,300 years ago. Hammer et al. [13] also suggested that the haplogroup O2b1-47z mutation arose on an ancestral O2b-SRY465 chromosome during early phases of the Yayoi migration. This lineage is also distributed sporadically in the Mongol, Manchu and Southeast Siberian populations, and in Indonesia, the Philippines, Vietnam and Micronesia [8,13,14]. Lin et al. [15] suggested that the O2b1-47z Y chromosome associated with the Y2 allele might have originated from an ancestral population in Henan or the southern parts of Shanxi near the Yellow River in China.
Thus, to understand the male lineage history of Korea, more data from such informative genetic markers from Korea and its surrounding regions are necessary. In the present study, 25 Y-chromosome single-nucleotide polymorphism (Y-SNP) markers and 17 Y-chromosome short tandem repeat (Y-STR) loci (Yfiler) were genotyped in 1,108 men from several East Asian populations, not only to identify haplogroup O2b lineages and trace their migration history, but also to distinguish populations with different genetic backgrounds.

Methods
This study was approved by the Ethics Committee and institutional review boards of the Institute of Bio-Science and Technology at Dankook University. Separate written informed consent for enrollment was obtained from all participants.

Subjects
In this study, we analyzed 1,108 men, representatives of several East Asian populations (Korean, Japanese, Mongolian, Chinese, Indonesian, Filipino, Thai, Vietnamese; Table 1). The DNA samples included subsets of the samples examined by Kim et al. [16] and Jin et al. [8], although the exact number of subjects for each population occasionally varied between the studies. In addition, we included new Korean samples collected from 506 people residing in six major provinces in Korea [17]. DNA was prepared from whole blood taken from each participant by a standard method [18], or extracted from buccal cells as previously described [19].

Y-SNP genotyping
Initially, all the samples were analyzed for 12 Y-SNP markers (M9, M45, M89, M119, M122, M174, M175, M214, P31, SRY465, 47z and RPS4Y) using a previously described protocol [17]. The samples belonging to haplogroups C, D, K, NO, O3 and P were subjected to further typing with an additional 13 Y-SNP markers to designate the subclades: two three-plex, three two-plex and one single-plex SNaPshot assays were developed for these 13 Y-SNP markers (Figure 1). The nomenclature of the haplogroups followed that of the Y Chromosome Consortium [20]. Primers for PCR and single-base extension (SBE) reactions were designed using the Primer 3.0 program (http://frodo.wi.mit.edu/primer3/, Cambridge, MA, USA) (see Additional file 1, Table S1; see Additional file 2, Table S2). Conditions of the SNaPshot assays were the same as those previously described [17], with the exception of PCR purification; in our assay, the PCR products were purified by adding 2 μl of an exonuclease I-shrimp alkaline phosphatase preparation (Exo-SAP; USB Corp., Cleveland, OH, USA) to 5 μl PCR product.
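As an illustration of how hierarchical Y-SNP typing leads to a haplogroup call, the following minimal sketch assigns a sample to the deepest branch for which it carries a derived allele. It is not the authors' genotyping pipeline. The tree is deliberately reduced to the marker and haplogroup pairs named in this article (K-M9, O-M175, O3-M122, O2b-SRY465, O2b1-47z and C-RPS4Y), intermediate nodes are omitted, and the names `TREE`, `lineage` and `assign_haplogroup` are hypothetical.

```python
# Simplified marker tree: marker -> (haplogroup label, parent marker).
# Assumption: only marker/haplogroup pairs named in the text are included,
# and intermediate nodes of the full Y-chromosome tree are left out.
TREE = {
    "M9": ("K", None),
    "M175": ("O", "M9"),
    "M122": ("O3", "M175"),
    "SRY465": ("O2b", "M175"),
    "47z": ("O2b1", "SRY465"),
    "RPS4Y": ("C", None),
}


def lineage(marker):
    """Return the path of markers from `marker` up to the root of its branch."""
    path = []
    while marker is not None:
        path.append(marker)
        marker = TREE[marker][1]
    return path


def assign_haplogroup(derived_markers):
    """Name the deepest haplogroup supported by the derived markers.

    Returns None when no marker in the simplified tree is derived, and
    raises ValueError when derived markers fall on incompatible branches.
    """
    known = [m for m in derived_markers if m in TREE]
    if not known:
        return None
    deepest = max(known, key=lambda m: len(lineage(m)))
    if not set(known) <= set(lineage(deepest)):
        raise ValueError("derived markers lie on incompatible branches")
    return f"{TREE[deepest][0]}-{deepest}"


if __name__ == "__main__":
    print(assign_haplogroup({"M9", "M175", "SRY465", "47z"}))
    print(assign_haplogroup({"M9", "M175", "SRY465"}))
```

A sample derived at SRY465 but ancestral at 47z stays at the O2b level in this sketch, which corresponds to the paragroup written as O2b*-SRY465 in the text.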

Results and discussion
This survey of East Asian Y-SNPs identified 18 haplogroups, 15 of which were present in our Korean group (Table 2; see Additional file 3, Table S3). In general, we found the East Asian population groups to be characterized by male haplogroup homogeneity, showing mostly expansions of haplogroup O-M175 lineages (Figure 2). Haplogroups O3a3-P201, O2b*-SRY465, O3a*-M324, C3-M217 and O2b1-47z (a sublineage within O2b) were the most frequently found in Koreans, and accounted for 28.9, 22.5, 15.0, 12.3 and 8.9% of Korean Y chromosomes, respectively. O3-M122 was the commonest Chinese Y-chromosome haplogroup found, and its presence in Korea may originate from demic diffusion by way of south-to-north migration [9,29,30]. C-RPS4Y was the commonest Mongolian Y-chromosome haplogroup (C3 lineage), and this is shared primarily with populations in northern Asia (including Koreans) [9,31,32]. Our network and diversity analyses of C-RPS4Y and O3 may support a southern origin, followed by migration into East Asia via the southern route [10,29] (Figure 3, Table 3). Interestingly, our Japanese group seemed to have both C1-M105 and C3-M217 chromosomes, whereas haplogroup C1-M105 was not present in most East Asian populations (except for Koreans), consistent with the previous report of Hammer et al. [13]. Haplogroup D2-M55 was found at high frequency only in Japanese subjects (29.3%); it was also present in 1.6% of Koreans, but was absent elsewhere in these East Asian populations, except for the Beijing-Han group. Haplogroup D1-M15 was present at extremely Haplogroup N-M231 was present in the Korean and Mongolian groups at moderate frequencies, suggesting that the early Korean population may have shared a common origin with Mongolian ethnic groups who inhabited the general area of the Altai Mountains and Lake Baikal regions of southeastern Siberia [1].
The K-M9 defined chromosomes (L, Q and R subtypes) were also found at low frequencies in the Korean group, and these mainly occur in central and south Asia [13,33].
The presence of these haplogroups in the Korean population implies that the peopling of Korea probably involved multiple events [8,9]. Based on the result of the MDS plot (Figure 4 and Additional file 4, Table S4), the Korean population contains lineages from both the southern and northern areas of East Asia.
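The Korean haplogroup frequencies reported above can be tallied with a few lines of arithmetic. This short sketch uses only the percentages given in the text; the variable names are hypothetical. It shows that the five most frequent haplogroups cover most Korean Y chromosomes, and that O2b*-SRY465 together with its sublineage O2b1-47z reproduces the combined 31.4% quoted in the abstract.

```python
# Korean haplogroup frequencies (%) as reported in the Results text.
korean_freq_pct = {
    "O3a3-P201": 28.9,
    "O2b*-SRY465": 22.5,
    "O3a*-M324": 15.0,
    "C3-M217": 12.3,
    "O2b1-47z": 8.9,
}

# Share of Korean Y chromosomes covered by the five most frequent haplogroups.
top_five_total = round(sum(korean_freq_pct.values()), 1)

# O2b-SRY465 as a whole: the paragroup plus its sublineage O2b1-47z.
o2b_total = round(
    korean_freq_pct["O2b*-SRY465"] + korean_freq_pct["O2b1-47z"], 1
)

print(top_five_total)  # 87.6
print(o2b_total)       # 31.4
```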
The geographical distribution of the O2b*-SRY465 (and its sublineage; together designated O2b-SRY465) in East Asia is shown in Figure 2. Interestingly, we found the high frequency of haplogroup O2b*-SRY465 (22.5%) to be characteristic of Koreans, but its distribution elsewhere in the East Asian populations was very patchy (Table 2, Figure 2). The cluster pattern in the O2b*-SRY465 network (Figure 3) was indicative of a single origin, although people with haplogroup O2b*-SRY465 were found to be distributed widely across both northeastern and southeastern Asia. The genetic differences between the Korean and other East Asian populations were analyzed by AMOVA (Table 4).
Kim et al. Investigative Genetics 2011, 2:10 http://www.investigativegenetics.com/content/2/1/10
When samples were grouped into Southeast Asians (SEA) and Northeast Asians (NEA), AMOVA could not distinguish between them. In addition, Koreans were again not distinct from either SEA or NEA, indicating that a southern versus a northern origin could not be distinguished from the Y-STR-based comparison (Table 4). Previous studies suggested a southeastern Asian origin for O2b-SRY465 [8,13], because the entire haplogroup O in East Asia has been proposed to have a southeastern Asian origin [10]. Under the southern origin hypothesis of the Y chromosome in Asia, no extensive bottleneck or genetic isolation is expected in far-east Asia, because there is no obvious barrier except perhaps a linguistic one (Sino-Tibetan-speaking people versus Altaic-speaking people) [34]. The very low incidence of the O2b*-SRY465 haplogroup in our Chinese population group is a substantial departure from the southern-expansion hypothesis of O2b*-SRY465, and indicates that O2b*-SRY465 has undergone apparent long-term isolation in far-east Asia. Therefore, it is unlikely that southeastern Asia is the place of the early settlement of modern humans carrying O2b chromosomes. In contrast to some other sublineages within haplogroup O, the diversity of O2b*-SRY465 calculated in a previous study [13] indicated that northern populations in East Asia are more polymorphic than southern populations, implying a northern Asian origin of O2b*-SRY465, in accordance with its highest frequency being found in far-east Asia [14,35]. However, as to whether this haplogroup originated in prehistoric Korea or in the Japanese Archipelago within northeastern Asia, the most likely region can again be identified on the basis of the highest frequency and the highest diversity.
Although there was an expected error due to small sample size, the Y-STR diversities (for example, mean number of pairwise differences and average variance ratio) were higher in Koreans than in Japanese (Table 5), consistent with previous results [13]. Both the haplogroup diversity and haplogroup frequency of O2b*-SRY465 thus suggest its early settlement in prehistoric Korea.
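One of the diversity measures mentioned above, the mean number of pairwise differences, can be sketched as follows. This is a minimal illustration and not the software used in the study: it counts, for every pair of haplotypes, the number of Y-STR loci at which the repeat counts differ, then averages over all pairs. The function name and the three-locus haplotypes are hypothetical and are not data from Table 5.

```python
from itertools import combinations


def mean_pairwise_differences(haplotypes):
    """Average number of loci that differ between all pairs of haplotypes.

    Each haplotype is a tuple of repeat counts, one entry per Y-STR locus.
    """
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("at least two haplotypes are required")
    total = sum(
        sum(a != b for a, b in zip(h1, h2))  # loci differing in this pair
        for h1, h2 in pairs
    )
    return total / len(pairs)


# Hypothetical haplotypes, for illustration only.
group_a = [(14, 12, 29), (14, 13, 29), (15, 13, 30), (14, 12, 29)]
group_b = [(14, 12, 29), (14, 12, 29), (14, 13, 29)]

if __name__ == "__main__":
    print(mean_pairwise_differences(group_a))
    print(mean_pairwise_differences(group_b))
```

In this toy example `group_a` is the more diverse of the two, which is the kind of contrast the text draws between the Korean and Japanese samples.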
Age estimates based on our Y-STR data provide further support for the initial expansion of the O2b*-SRY465 in prehistoric Korea, in accordance with the proto-Korean lineages. Based on Korean demographic and population history, the constant-population model among the three different BATWING models may give the best fit to our data. Thus, the time to most recent common ancestor (TMRCA) for the O2b*-SRY465 lineages was 9,900 years, assuming constant population size (Table 6). According to the three different population expansion models, TMRCA within the Korean and Japanese populations and the whole of East Asia ranged from 6,000 to 10,000 years ago. This date corresponds to the early Neolithic Age in Korea. Therefore, the age of the O2b*-SRY465 and the pattern of variation within the lineages suggest a Neolithic proto-Korean founder in a nearby part of northeastern Asia, followed by a population increase in the vicinity of the Korean Peninsula.
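To give a feel for how Y-STR variation translates into a lineage age, the sketch below uses a much simpler estimator than BATWING: the average squared distance (ASD) between each haplotype and an inferred founder haplotype, divided by a per-locus mutation rate. It is not the Bayesian coalescent analysis used in this study and will not reproduce the values in Table 6. The founder is approximated by the per-locus median, and the haplotypes, mutation rate and generation time are placeholder assumptions, not values from the paper.

```python
from statistics import median_low


def asd_tmrca_years(haplotypes, mutation_rate, generation_years):
    """Rough lineage age from the average squared distance to a founder.

    haplotypes:       tuples of Y-STR repeat counts, one entry per locus
    mutation_rate:    assumed mutations per locus per generation
    generation_years: assumed years per generation
    """
    n_loci = len(haplotypes[0])
    # Approximate the founder haplotype by the median allele at each locus.
    founder = [median_low([h[i] for h in haplotypes]) for i in range(n_loci)]
    # Average squared repeat-count difference from the founder over all
    # haplotypes and loci.
    asd = sum(
        (h[i] - founder[i]) ** 2 for h in haplotypes for i in range(n_loci)
    ) / (len(haplotypes) * n_loci)
    generations = asd / mutation_rate
    return generations * generation_years


# Hypothetical two-locus haplotypes and placeholder parameters.
example = [(14, 12), (15, 12), (14, 13), (14, 12)]

if __name__ == "__main__":
    print(asd_tmrca_years(example, mutation_rate=0.001, generation_years=25))
```

Because the result scales directly with the assumed mutation rate and generation time, such point estimates are best read as order-of-magnitude guides; model-based methods such as BATWING additionally account for demography and report uncertainty.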
Interestingly, the O2b1-47z sublineage seems to have diverged about 4,400 years ago (Table 6) rather than in the Yayoi period, consistent with a previous estimate of 4,000 years ago [13]. Therefore, the present data support the possibility of an ancient Korean origin of O2b1-47z, rather than a Japanese origin [13]. Although O2b1-47z is at its highest haplogroup frequency in the Japanese population, the Y-STR data reveal more diversity of O2b1-47z haplotypes in Koreans, as shown by the mean number of pairwise differences and allele size variation ratio (Table 5), supporting an origin of the O2b1-47z mutation in prehistoric Korea. The Japanese samples studied here were derived from Kyushu, Shikoku and southern Honshu (the region closest to Korea), implying that the high frequency of the O2b1 lineage in Japan may be explained by genetic drift [12,16]. This finding is concordant with a previous report of Nonaka et al. [36], showing less diversity in Japan than in Korea (Table 5). The TMRCA of O2b1-47z, and a star-cluster pattern in this study (Figure 3) and a previous study [13], all suggest the possible association of O2b1-47z with the peopling of Korea. However, because most of the Japanese O2b*-SRY465 and O2b1-47z samples were also in (or close to) the core of the cluster (Figure 3), we cannot exclude the possibility that the Japanese and the Koreans derive from the same proto-population outside of Korea (carrying these lineages) at roughly the same time. Thus, further studies with sufficient sampling across the vast East Asian region and with genomewide genotyping should provide further insights.

Conclusions
Our results support the idea that both haplogroups O2b*-SRY465 and O2b1-47z had an in situ origin among Northeast Asians, particularly among the prehistoric Koreans, rather than in southeast Asia or Japan as previously envisaged. The combination of the initial O2b settlement (which became an indigenous proto-Korean component) with the relatively recent O3 and C3 lineages (which include a Chinese component) explains some of the main events that shaped the current Y-chromosome composition of the Korean population. Thus, our findings are consistent with linguistic, archaeological and historical evidence, which suggest that the direct ancestors of Koreans were proto-Koreans who inhabited the northeastern region of China and the Korean Peninsula during the Neolithic (8,000-1,000 BC) and Bronze (1,500-400 BC) Ages.