NIH Public Access Author Manuscript . Author manuscript; available in PMC 2014 December 23.

NIH-PA Author Manuscript

Published in final edited form as: . 2014 ; 40(4): 387–396.

Constructing linkage map based on a four-way cross population

Jiang Beibei1, Yu Shizhou1, Xiao Bingguang2,*, Lou Xiangyang3, and Xu Haiming1,4

Jiang Beibei: [email protected]

1Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China

2Yunnan Academy of Tobacco Agricultural Sciences, Kunming 650031, China

3Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA

4Research Center for Air Pollution and Health, Zhejiang University, Hangzhou 310058, China

Summary

Currently, genetic linkage maps are mostly developed from populations derived from a cross between two homozygous parents. Such populations cover only limited genetic diversity and are inappropriate for some species, such as tobacco, whose genomes show low diversity. In these cases there are often too few polymorphic markers to construct a linkage map, and marker-assisted selection (MAS) and quantitative trait locus (QTL) mapping based on a low-density linkage map are ineffective. This study proposes a method for developing a genetic linkage map based on a four-way cross population. Computer simulation was conducted to investigate the feasibility and effectiveness of the method, and a supporting program was designed. The main procedures and features of the proposed method are as follows: 1) estimating the genetic distance between any pair of markers by the maximum likelihood method; 2) splitting all markers into different groups (linkage groups) by cluster analysis based on the genetic distances between markers; 3) for each linkage group, first determining the two end markers and then determining the marker order by inserting the other markers at appropriate positions through distance analysis of any three neighboring markers. Monte Carlo simulation showed that the proposed method is feasible and effective, and that it is also applicable to populations derived from a cross between two homozygous parents.

Keywords: four-way cross population; genetic distance; genetic map; linkage analysis; Monte Carlo simulation

* Corresponding author: Xiao Bingguang, [email protected].

The mapping population is a key genetic tool in the construction of a genetic map, and the first step in genetic linkage mapping is the selection of an appropriate mapping population. In conventional genetic mapping, populations derived from a cross between two inbred lines[1–3] have been widely applied to linkage analysis. However, owing to the low level of polymorphism between and within species, most maps have included only a small portion of the genome. For example, a joint map from different mapping populations covered only 31% of the cotton genome[4]. If a genetic map with such poor coverage were used for quantitative trait locus (QTL) mapping, only a very small proportion of the genome would be explored and a large amount of QTL information could not be revealed[5]. The greater the differences between the parental lines, the better the genetic map is expected to be, so to obtain a good genetic map researchers use parental lines that differ genetically as much as possible. In breeding practice, however, plants with smaller genetic differences are usually used as stock plants[5]. As a result, QTL mapping results obtained from a mapping population with large genetic differences have low efficiency in marker-assisted selection (MAS) breeding because the genetic background has changed[6]. In addition, for some species with poor genetic polymorphism, the number of usable markers is small[7], and for most outbred species, such as most trees and livestock, inbred lines cannot be developed[8].

A four-way cross (4WC) population, derived from two different single crosses among four inbred lines (L1, L2, L3 and L4), appears frequently in breeding programs. Using a 4WC population directly to construct the genetic map is both economical and practical[8–10]. In addition, a strategy based on the four-way cross can overcome the limitation that inbred lines cannot be developed in outbred species[8].

Because of these advantages, increasing effort has been devoted to developing statistical methods for linkage mapping based on a 4WC population. In 1990, Ritter et al. developed estimators for most of the genetic situations in crosses between heterozygous parents[11], and Arús et al. then contributed solutions for two additional situations[12]. Ritter nearly completed the set in 1996[13], and in the following year Maliepaard et al.[14] presented maximum likelihood estimates of the recombination fraction and LOD (logarithm of odds) score formulas for all possible pairs of markers segregating in the ratio 1:1, 3:1, 1:2:1, or 1:1:1:1 in a full-sib family. Building on these results, Wu et al.[15] and Lu et al.[16] developed the strategy further. Their methods can not only estimate linkage and linkage phases simultaneously but also predict the gene order for a group of markers that may segregate in any possible ratio in a full-sib family. In 2010, Tong et al.[17] put forward a method that can calculate the likelihood value for an order of a large number of markers. Since a 4WC population has the same hereditary basis as a full-sib family, using these methods to construct a genetic linkage map in a 4WC population is feasible, and researchers have applied them to 4WC or analogous populations in QTL mapping[5,10]. The tools used in these studies are usually MapMaker[2] and JoinMap[18]. However, these reports focused only on how to find QTLs in a 4WC population. The former used the "double pseudo-testcross" strategy[19], in which markers segregating in a 1:1 ratio (testcross configuration) are analyzed by traditional statistical approaches. This strategy is not suitable for markers with different segregation patterns, and it has the disadvantage of producing two separate linkage maps, one for each parent. The latter made further improvements and handled markers that show different segregation patterns in both parents. However, its methods for ordering markers in a linkage group and for estimating recombination fractions are based on least squares approximations for multipoint estimates of distances rather than on the maximum likelihood approach, and therefore lose several advantages[15].

In the present study, we propose a maximum likelihood method for constructing a linkage map based on a 4WC population. Monte Carlo simulation was conducted to illustrate the reliability and efficiency of the method, and a corresponding computer program named FMAP was designed. Our research should be beneficial for constructing high-density genetic linkage maps in plant species with poor genetic polymorphism and in outcrossing plant species.

1. Materials and methods

1.1 Four-way cross population

A 4WC population consists of two single crosses and can be expressed as (L1×L2)×(L3×L4) (Fig. 1). In the two diploid parents of a 4WC population, as many as four different alleles may be present at a single locus, and the number of alleles may vary among loci. The genetic configuration of a 4WC population is therefore similar to that of an outbred full-sib family. The only difference between a real 4WC population and a regular full-sib family is that the grandparents of a full-sib family need not be inbred[20]. As a result, the linkage phases of marker loci in the parents of a full-sib family can be ambiguous. Given the marker genotypes of a three-generation pedigree, the linkage phases of the parents may be inferred fairly accurately[9]. For all molecular marker types, alleles are usually recognized as fragments with distinct molecular masses, and one marker corresponds to one locus. Table 1 shows the types of alleles and the segregation ratios for markers at one locus. It shows that the combinations of the two parental genotypes at an informative marker locus, called segregation types, may be aa×ab, ab×ab, aa×bc, ab×ac, ab×bc or ab×cd, where a, b, c, and d denote different alleles at one marker locus; the two characters on the left of the crossing symbol represent the maternal marker genotype and the two characters on the right represent the paternal marker genotype.
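The single-locus segregation types listed above can be checked with a short enumeration. The following is a minimal sketch, not part of the FMAP program: it assumes Mendelian segregation with equally likely gametes, and the function name `segregation` is hypothetical.

```python
from collections import Counter
from itertools import product


def segregation(maternal: str, paternal: str) -> dict:
    """Count offspring genotype classes at one locus of a cross.

    Each parent is a two-character string of allele labels (e.g. "ab").
    Every gamete carries either parental allele with equal probability,
    so the four gamete combinations are equally likely. Genotypes are
    unordered, so "ba" is reported as "ab".
    """
    counts = Counter(
        "".join(sorted(m + p)) for m, p in product(maternal, paternal)
    )
    return dict(sorted(counts.items()))


# Segregation types named in the text: (maternal, paternal) genotypes.
crosses = [("aa", "ab"), ("ab", "ab"), ("aa", "bc"),
           ("ab", "ac"), ("ab", "bc"), ("ab", "cd")]

for mat, pat in crosses:
    # Counts out of four equally likely gamete combinations.
    print(f"{mat} x {pat}: {segregation(mat, pat)}")
```

Counts of 2:2 correspond to a 1:1 ratio, 1:2:1 to the intercross ratio, and 1:1:1:1 to the fully informative classes.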

Linkage phases between any two marker loci are not known a priori in a 4WC population, whereas they are fixed in populations derived from inbred lines, such as backcross and F2 intercross populations[14]. Table 2 lists the segregation types at two loci when gene expression is codominant at every locus. It shows that the number of segregation types in a 4WC population is nearly nine times as large as in a population derived from inbred lines. This means that the 4WC population contains more genetic information, which increases both the polymorphism at every locus and the difficulty of linkage analysis. In a 4WC population, the linkage phase combinations include: 1) coupling (c) in the paternal parent and uninformative in the maternal parent, or vice versa; 2) repulsion (r) in the paternal parent and uninformative in the maternal parent, or vice versa; 3) coupling in both parents (c×c); 4) repulsion in both parents (r×r); and 5) coupling in the maternal parent and repulsion in the paternal parent (c×r), or vice versa (r×c). Hence, there may be two or four possible linkage phases between two informative marker loci in a 4WC population. For example, when the segregation type is aa×ab at both marker loci, the paternal parent has joint genotype abab and its linkage phase is either coupling (aa/bb) or repulsion (ab/ba), while the maternal parent has joint genotype aa/aa, which carries no recombination information. As another example, when the segregation types at the two marker loci are ab×ab and ab×cd, respectively, the linkage phase between the two loci is aa/bb×ac/bd (c×c), aa/bb×ad/bc (c×r), ab/ba×ac/bd (r×c), or ab/ba×ad/bc (r×r). Several statistical methods for inferring the linkage phases between any pair of loci have been well documented[14–16].

1.2 Estimating the genetic distance between two markers

The estimation of genetic distances is the foundation for constructing a genetic linkage map. To calculate the recombination frequency, we need to know the number of recombination events in both parental meioses. If the genotypes of the gametes are known, these events can be counted easily (Table 3). However, the marker genotypes of the gametes cannot always be deduced from the phenotypes of the individuals in the progeny. To estimate the recombination rate (r) and calculate the likelihood of the observed marker data, the maximum likelihood (ML) estimate was used[3,21]. Maliepaard et al.[14] presented all the formulas for estimating the distance between two markers in a full-sib (FS) family. Based on their results, the formulas for estimating r and the corresponding LOD score for two markers in a 4WC population are given in Table 4.

1.3 Establishing linkage groups

Based on the estimated genetic distances between all pairs of markers, linkage groups can be established, with the LOD score of each marker pair serving as the criterion for assigning markers to a linkage group. A LOD threshold, below which linkage is not considered significant, can be set by the user. At any stage of the procedure, there is a set of markers that have been assigned to linkage groups and a set of 'free' markers that have not yet been assigned. At each step, the following decision is made: if none of the 'free' markers is significantly linked (by LOD value) to any of the existing groups, a new linkage group is created; otherwise, the first 'free' marker that shows linkage with an existing group is added to that group. The grouping of markers therefore depends on the critical LOD value set by the user. Higher critical LOD values result in more but smaller linkage groups. A critical LOD value of 3.0 or larger will, in general, prevent incorrect assignment of markers to the same linkage group. When starting from scratch, it is advisable to try several critical LOD values, because this reveals the stability of the grouping and indicates which sets of markers form tight linkage groups, which markers are debatable, and which are definitely floating.

1.4 Ordering markers in linkage groups

Ordering markers in linkage groups can be considered a special case of the traveling salesman problem (TSP)[22–23], for which a number of efficient search algorithms have been developed[24]. In recent years, some TSP methods have been used to solve the marker ordering problem[25].
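To make the TSP analogy concrete, the sketch below scores each candidate marker order by the sum of recombination fractions between adjacent markers and picks the shortest one by exhaustive search. This is an illustration only: the exhaustive search is feasible just for a handful of markers, and the marker names and recombination fractions are hypothetical values chosen for the example.

```python
from itertools import permutations


def order_length(order, r):
    """Sum of recombination fractions between adjacent markers in an order."""
    return sum(r[a][b] for a, b in zip(order, order[1:]))


def best_order(markers, r):
    """Return (length, order) of the shortest order by exhaustive search.

    An order and its mirror image describe the same map, so only the
    permutation whose first marker sorts before its last one is scored.
    """
    best = None
    for perm in permutations(markers):
        if perm[0] > perm[-1]:  # skip the mirrored duplicate
            continue
        length = order_length(perm, r)
        if best is None or length < best[0]:
            best = (length, perm)
    return best


# Hypothetical pairwise recombination fractions for four markers.
pairs = {
    ("M1", "M2"): 0.10, ("M2", "M3"): 0.12, ("M3", "M4"): 0.08,
    ("M1", "M3"): 0.20, ("M2", "M4"): 0.18, ("M1", "M4"): 0.26,
}
markers = ["M1", "M2", "M3", "M4"]

# Build a symmetric lookup table r[a][b] == r[b][a].
r = {m: {} for m in markers}
for (a, b), value in pairs.items():
    r[a][b] = value
    r[b][a] = value

length, order = best_order(markers, r)
print(order, round(length, 2))
```

The number of orders grows factorially with the number of markers, which is why heuristic search is needed for realistic linkage groups.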

In our study, markers are ordered by ant colony optimization (ACO)[26], a family of algorithms inspired by the cooperative behavior of real ants in finding the shortest path from their nest to a food source. ACO has been used successfully to solve various types of discrete optimization problems and has been reported to be one of the best algorithms for the TSP[26]. The more informative multipoint (three-point) analysis may be used to check the local order of markers in a mapped linkage group and to refine the map distance between adjacent markers. Finally, the estimated recombination frequency of each interval is transformed into a genetic distance using Haldane's[27] or Kosambi's[28] map function.
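The final conversion step can be sketched with the standard forms of the two map functions: Haldane's d = -(1/2) ln(1 - 2r) and Kosambi's d = (1/4) ln[(1 + 2r)/(1 - 2r)], with d in Morgans. The function names below are hypothetical, and the code is a minimal sketch rather than the FMAP implementation.

```python
import math


def _check(r: float) -> None:
    """Both map functions are defined for 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")


def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans (assumes no interference)."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in centimorgans (allows for interference)."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.05, 0.10, 0.20):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):.2f} cM, "
          f"Kosambi {kosambi_cm(r):.2f} cM")
```

For small r both functions are close to 100r cM; they diverge as r approaches 0.5, where the distance is unbounded.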

2 Monte Carlo simulation

Assume that there are three markers with a known order on a chromosome in a 4WC population; the segregation types of these three markers are shown in the third column of Table 5. The linkage phases and the recombination rate between the ith and (i+1)th markers are assumed to be those in the second and the third columns of Table 4; the sample size is 200, and the simulation was repeated 100 times.

According to the LOD values, for marker 1 and marker 2, coupling in the maternal parent with repulsion in the paternal parent (c×r), or vice versa (r×c), was impossible, and the estimate for repulsion in both parents (r×r) exceeded 0.5, which has no biological interpretation. Thus, the linkage phase for marker 1 and marker 2 was coupling in both parents (c×c), with an estimated recombination rate of 0.125. For marker 1 and marker 3, all four LOD values were far less than 3, indicating that there was no linkage between the two markers and that the recombination rate was 0.5. The recombination rate for marker 2 and marker 3 was calculated similarly. As expected, the estimated recombination rates were very close to the parameter values that we set, and there were no statistically significant differences between the parameters and the estimates (Table 5).
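A simulation of this kind can be sketched for the simplest fully informative case. The sketch assumes two ab×cd markers in known c×c phase, so that every offspring reveals whether each of its two parental gametes is recombinant; the ML estimate is then the proportion of recombinant gametes among the 2n meioses, and the LOD score compares the likelihood at the estimate with that at r = 0.5. The function names, the seed, and the use of r = 0.1 are illustrative assumptions, and the code does not reproduce the values in Table 5.

```python
import math
import random


def simulate_recombinants(n: int, r: float, rng: random.Random) -> int:
    """Count recombinant gametes among the 2n meioses behind n offspring."""
    return sum(rng.random() < r for _ in range(2 * n))


def estimate_r_and_lod(recombinants: int, n: int):
    """ML estimate of r and LOD score when every gamete is fully informative."""
    meioses = 2 * n
    r_hat = recombinants / meioses
    lod = 0.0
    # Terms with a zero count are skipped to avoid log10(0).
    if recombinants > 0:
        lod += recombinants * math.log10(2.0 * r_hat)
    if recombinants < meioses:
        lod += (meioses - recombinants) * math.log10(2.0 * (1.0 - r_hat))
    return r_hat, lod


rng = random.Random(2014)          # fixed seed, chosen arbitrarily
n, true_r, replicates = 200, 0.1, 100

estimates = []
for _ in range(replicates):
    recombinants = simulate_recombinants(n, true_r, rng)
    r_hat, lod = estimate_r_and_lod(recombinants, n)
    estimates.append(r_hat)

mean_r = sum(estimates) / replicates
print(f"mean estimate of r over {replicates} replicates: {mean_r:.3f}")
```

Configurations in which some offspring classes are ambiguous need the iterative estimators of Table 4 instead of this direct count.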

To verify the correctness of the algorithms for establishing linkage groups and ordering markers, a further simulation was conducted. For populations derived from two inbred lines, we chose a backcross population to examine the program based on the proposed method, and used a t-test to analyze the genetic distances in every linkage group, with 73 markers on 5 chromosomes. The parameter configuration and the simulation results are presented in Table 6. For all five linkage groups, there were no significant differences between the parameters and the estimated values. These results support the validity of the algorithms and demonstrate the feasibility and effectiveness of the proposed method for populations derived from inbred lines.
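The grouping rule of Section 1.3 can be sketched as follows. This is one reading of the described procedure, not the FMAP code: a free marker is taken to be linked to a group when its LOD score with at least one member reaches the threshold, and a new group is seeded with the first remaining free marker. The marker names and LOD values are hypothetical.

```python
def group_markers(markers, lod, threshold=3.0):
    """Assign markers to linkage groups using a critical LOD value.

    `lod(x, y)` returns the pairwise LOD score of markers x and y.
    """
    free = list(markers)
    groups = []
    while free:
        linked = None
        # Look for the first free marker linked to any existing group.
        for marker in free:
            for group in groups:
                if any(lod(marker, member) >= threshold for member in group):
                    linked = (marker, group)
                    break
            if linked:
                break
        if linked:
            marker, group = linked
            group.append(marker)
            free.remove(marker)
        else:
            # No free marker is linked to an existing group: start a new one.
            groups.append([free.pop(0)])
    return groups


# Hypothetical pairwise LOD scores; unlisted pairs are treated as unlinked.
pair_lod = {
    frozenset(("A", "B")): 5.0,
    frozenset(("B", "C")): 4.0,
    frozenset(("D", "E")): 6.0,
}


def lod(x, y):
    return pair_lod.get(frozenset((x, y)), 0.0)


print(group_markers("ABCDE", lod, threshold=3.0))
print(group_markers("ABCDE", lod, threshold=4.5))
```

Raising the threshold from 3.0 to 4.5 in this toy example splits marker C off into its own group, which illustrates the statement that higher critical LOD values give more but smaller linkage groups.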

3 Discussion

At present, genetic linkage map construction is trending toward maps that are highly saturated, practical and generalized[29–30]. Although methods for constructing genetic linkage maps from inbred lines have been well developed over the past 20 years, it is still difficult to apply them to outbred species, because the genetic structure of these species is too complicated for the methods to serve practical breeding purposes effectively. Our method, based on 4WC, can overcome this restriction. It avoids the problems of populations derived from two inbred lines with poor genetic diversity, and it improves map density and coverage. The reasons are as follows. First, the 4WC population provides much more linkage information than a population derived from two inbred homozygous lines. Our approach can deal not only with markers of any segregation type but also with incomplete marker data in a 4WC population. Moreover, all markers in the linkage group, including markers with missing data, are incorporated in the linkage analysis. Therefore, the recombination rate between adjacent markers can be estimated more accurately and stably. Second, the algorithms for establishing linkage groups and ordering markers, which have higher power than other methods even when the sample size is small, are effective and feasible.

Aside from traditional linkage analysis, there is another method in genetic studies called association analysis[31]. Both use statistical theory to explore genetic information, and each has its own advantages. Association analysis relies on natural populations, and its positioning precision can reach the level of a single gene[32]. Since it was proposed, association analysis has been widely used in genetic research. However, some aspects of association analysis still need improvement. First, population stratification can cause false positives[33]. Second, the computation in association analysis is very time-consuming, especially in genome-wide association studies (GWAS)[34]. Compared with association analysis, linkage analysis based on a 4WC population is more likely to control false positives and to compute quickly. With the development of biology, more and more attention is being paid to the analysis of multiple alleles at one gene locus. Extending the conventional theory and methods for two alleles at one gene locus is one purpose of our research. Our method extends current methods for constructing molecular linkage maps and will give breeders and geneticists more options for developing linkage maps, especially for species with low genetic diversity. In the future, it may also be possible to extend our method to map QTLs of complex traits based on the four-way cross population.

Acknowledgments

Foundation item: Supported by the National Natural Science Foundation of China (No. 31271608), the Yunnan Provincial Tobacco Company Project (No. 08A05), the CNTC Science and Technology Project (No. 110200701023), and the National Institutes of Health (No. DA025095).

References

1. Donis-Keller H, Green P, Helms C, et al. A genetic linkage map of the human genome. Cell. 1987; 51(2):319–337. [PubMed: 3664638]
2. Lander ES, Green P, Abrahamson J, et al. MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics. 1987; 1(2):174–181. [PubMed: 3692487]
3. Lander ES, Green P. Construction of multilocus genetic linkage maps in humans. Proceedings of the National Academy of Sciences of the United States of America. 1987; 84(8):2363–2367. [PubMed: 3470801]
4. Ulloa M, Meredith WR, Shappley ZW, et al. RFLP genetic linkage maps from four F(2.3) populations and a joinmap of Gossypium hirsutum L. Theoretical and Applied Genetics. 2002; 104(2/3):200–208. [PubMed: 12582687]
5. Qin HD, Guo WZ, Zhang YM, et al. QTL mapping of yield and fiber traits based on a four-way cross population in Gossypium hirsutum L. Theoretical and Applied Genetics. 2008; 117(6):883–894. [PubMed: 18604518]
6. Guo WZ, Zhang TZ, Zhu XF, et al. Modified backcross pyramiding breeding with molecular marker-assisted selection and its applications in cotton. Acta Agronomica Sinica. 2005; 31(8):963–970.
7. Rebai A, Goffinet B. Power of tests for QTL detection using replicated progenies derived from a diallel cross. Theoretical and Applied Genetics. 1993; 86(8):1014–1022. [PubMed: 24194011]
8. He XH, Qin HD, Hu ZL, et al. Mapping of epistatic quantitative trait loci in four-way crosses. Theoretical and Applied Genetics. 2011; 122(1):33–48. [PubMed: 20827458]
9. Rao SQ, Xu SZ. Mapping quantitative trait loci for ordered categorical traits in four-way crosses. Heredity. 1998; 81(2):214–224. [PubMed: 9750263]
10. Xu SZ. Mapping quantitative trait loci using four-way crosses. Genetical Research. 1996; 68(2):175–181.
11. Ritter E, Gebhardt C, Salamini F. Estimation of recombination frequencies and construction of RFLP linkage maps in plants from crosses between heterozygous parents. Genetics. 1990; 125(3):645–654. [PubMed: 1974227]
12. Arús P, Olarte C, Romero M, et al. Linkage analysis of ten isozyme genes in F1 segregating almond progenies. Journal of the American Society for Horticultural Science. 1994; 119(2):339–344.
13. Ritter E, Salamini F. The calculation of recombination frequencies in crosses of allogamous plant species with applications to linkage mapping. Genetical Research. 1996; 67(1):55–65.
14. Maliepaard M, de Mol NJ, Tomasz M, et al. Mitosene-DNA adducts. Characterization of two major DNA monoadducts formed by 1, 10-bis (acetoxy)-7-methoxymitosene upon reductive activation. Biochemistry. 1997; 36(30):9211–9220. [PubMed: 9230054]
15. Wu RL, Ma CX, Painter I, et al. Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theoretical Population Biology. 2002; 61:349–363. [PubMed: 12027621]
16. Lu Q, Cui YH, Wu RL. A multilocus likelihood approach to joint modeling of linkage, parental diplotype and gene order in a full-sib family. BMC Genetics. 2004; 5(1):20. [PubMed: 15274749]
17. Tong CF, Zhang B, Shi JS. A hidden Markov model approach to multilocus linkage analysis in a full-sib family. Tree Genetics & Genomes. 2010; 6(5):651–662.
18. Stam P. Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. The Plant Journal. 1993; 3(5):739–744.
19. Grattapaglia D, Sederoff R. Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics. 1994; 137(4):1121–1137. [PubMed: 7982566]
20. Knott SA, Marklund L, Haley CS, et al. Multiple marker mapping of quantitative trait loci in a cross between outbred wild boar and large white pigs. Genetics. 1998; 149(2):1069–1080. [PubMed: 9611214]
21. Wedd N. A computer program for constructing a maximum-likelihood map from linkage data and its application to human chromosome 1. Annals of Human Genetics. 1984; 48(4):333–345. [PubMed: 6548618]
22. Korostensky C, Gonnet GH. Using traveling salesman problem algorithms for evolutionary tree construction. Bioinformatics. 2000; 16(7):619–627. [PubMed: 11038332]
23. Gilmore PC, Gomory RE. A solvable case of the traveling salesman problem. Proceedings of the National Academy of Sciences of the United States of America. 1964; 51(2):178–181. [PubMed: 16591142]
24. Kirkpatrick S, Vecchi MP. Optimization by simulated annealing. Science. 1983; 220(4598):671–680. [PubMed: 17813860]
25. Schiex T, Gaspin C. Cartagene: constructing and joining maximum likelihood genetic maps. Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology. 1997; p. 258–267.
26. Dorigo M, Maniezzo V, Colorni A. Ant system: optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics. 1996; 26(1):29–41.
27. Haldane J. The combination of linkage values and the calculation of distances between the loci of linked factors. Journal of Genetics. 1919; 8(29):299–309.
28. Kosambi DD. The estimation of map distances from recombination values. Annals of Eugenics. 1943; 12(1):172–175.
29. Zakharov IA, Miasnikova EM. Computer program of genetic map construction and database of gene localization. Genetika. 1993; 29(6):1047–1049. [PubMed: 8370501]
30. Gustafson JP, Ma XF, Korzun V, et al. A consensus map of rye integrating mapping data from five mapping populations. Theoretical and Applied Genetics. 2009; 118(4):793–800. [PubMed: 19066841]
31. Flint-Garcia SA, Thornsberry JM, Buckler ES. Structure of linkage disequilibrium in plants. Annual Review of Plant Biology. 2003; 54:357–374.
32. Holland JB. Genetic architecture of complex traits in plants. Current Opinion in Plant Biology. 2007; 10(2):156–161. [PubMed: 17291822]
33. Doerge RW. Mapping and analysis of quantitative trait loci in experimental populations. Nature Reviews Genetics. 2002; 3(1):43–52.
34. Stich B, Melchinger AE. Comparison of mixed-model approaches for association mapping in rapeseed, potato, sugar beet, maize, and Arabidopsis. BMC Genomics. 2009; 10:94. [PubMed: 19250529]

Fig. 1. Production of a four-way cross (4WC) population

Table 1. Number of alleles and separation ratio in one gene site of a 4WC population

Number of alleles   Genotype of parent PA   Genotype of parent PB   Genotype of offspring   Separation ratio
1                   aa                      aa                      aa                      -
2                   aa                      bb                      ab                      -
2                   aa                      ab                      aa, ab                  1:1
2                   ab                      ab                      aa, ab, bb              1:2:1
3                   aa                      bc                      ab, ac                  1:1
3                   ab                      ac                      ab, ac, aa, bc          1:1:1:1
3                   ab                      bc                      ab, ac, bb, bc          1:1:1:1
4                   ab                      cd                      ac, bc, ad, bd          1:1:1:1

a, b, c, and d denote different alleles at one marker locus.

Table 2. Number of segregation types at two marker loci of 4WC population

Loc. 1 \ Loc. 2   aa×ab   ab×ab   aa×bc   ab×bc   ab×cc   ab×cd
aa×ab             1       2       3       4       *       5
ab×ab                     6       7       8       9       10
aa×bc                             11      12      *       13
ab×bc                                     14      15      16
ab×cc                                             17      18
ab×cd                                                     19

The two characters on the left of the crossing symbol represent the maternal marker genotype and the two characters on the right represent the paternal marker genotype; a, b, c and d have the same definitions as in Table 1.

* Uninformative in recombination.

13

12

11

10

9

8

7

6

5

4

3

2

1

No.a)

ab

aa

L1

aa

L1

L2

aa

aa

L2

ab

L1

ab

L1

L2

ab

ab

L2

ab

L1

ab

L1

L2

aa

ab

L1

L2

ab

ab

L2

ab

L1

aa

L1

L2

ab

aa

L2

aa

L1

aa

L2

L1

ab

aa

L1

L2

aa

aa

L1

L2

PAb)

Loc.

bc

bc

bc

bc

bc

cd

ab

cc

ab

bc

ab

bc

ab

ab

ab

cd

ab

bc

ab

bc

ab

ab

ab

ab

ab

PBb)

ab

ab

ab

ab

ab

ac

aa

ac

aa

ab

aa

ab

aa

aa

aa

ac

aa

ab

aa

ab

aa

aa

aa

aa

aa

1

ab

ac

ab

bc

ab

ad

aa

bc

aa

ac

aa

ac

aa

ab

aa

ad

aa

ac

aa

ac

aa

ab

aa

ab

aa

2

ab

bb

ab

ab

bc

bc

aa

ac

ab

bb

aa

ab

ab

bb

aa

bc

aa

bb

aa

ab

ab

bb

aa

aa

ab

3

ab

bc

ab

bc

bc

bd

aa

bc

ab

bc

aa

ac

ab

aa

ab

bd

aa

bc

aa

ac

ab

aa

ab

ab

ab

4

ac

ab

ac

ac

ab

ac

bb

ab

ab

ab

bb

ab

ab

ac

ab

ab

ab

ab

ab

5

ac

ac

ac

ad

ab

bc

bb

ac

ab

ac

bb

bb

ab

ad

ab

ac

ab

bb

ab

6

ac

bb

ac

bc

ab

bb

ab

aa

bb

bc

ab

bb

ab

7

ac

bc

ac

bd

ab

bc

ab

ab

bb

bd

ab

bc

ab

8

ac

bb

ab

bb

bb

bb

9

ad

bb

ac

bb

10

Phenotype indicator (f)

bc

bb

bb

bb

11

bd

bb

bc

bb

12

13

14

15

Table 3. Definition of marker phenotype indicators

16

ab

ab

L2

ab

L1

ab

L1

L2

ab

ab

L1

L2

ab

ab

L1

L2

ab

ab

L1

L2

ab

L2

ab

L1

cd

cd

cd

cc

cc

cc

cd

bc

cc

bc

bc

bc

cd

ac

ac

ac

ac

ac

ac

ac

ab

ac

ab

ab

ab

ac

1

ad

ac

ad

ac

bc

ac

ad

ab

bc

ab

ac

ab

ad

2

bc

ac

bc

ac

ac

bc

bc

ab

ac

ac

bb

ab

bc

3

bd

ac

bd

ac

bc

bc

bd

ab

bc

ac

bc

ab

bd

4

ac

ad

ac

bc

ac

ac

ac

bb

ab

ac

ac

5

ad

ad

ad

bc

ad

ac

bc

bb

ac

ac

ad

6

bc

ad

bc

bc

bc

ac

ac

bc

bb

ac

bc

7

bd

ad

bd

bc

bd

ac

bc

bc

bc

ac

bd

8

ac

bc

ac

bc

ab

bb

9

ad

bc

ad

bc

ac

bb

10

Phenotype indicator (f)

bc

bc

bc

bc

bb

bb

11

bd

bc

bd

bc

bc

bb

12

ac

bd

ac

bd

ab

bc

13

ad

bd

ad

bd

ac

bc

14

bc

bd

bc

bd

bb

bc

15

bd

bd

bd

bd

bc

bc

16

Reciprocal crosses have identical definitions.

a) Configuration number according to Table 2.

b) The genotypes of the two parents (PA → PB) at the first (L1) and the second (L2) loci; a, b, c and d have the same definitions as in Table 1.

19

18

17

16

15

14

ab

L2

PBb)

PAb)

Loc.

No.a)

6

4, 5, 12, 13

3

2, 18

1, 11, 17

r̂r×r= 1 − r̂c×c

r×r

r×c

r̂r×c= r̂c×r

r=[(n2 + n4 + n6 + n8) + 2 (n3 + n7) + 2 n5 r2/(1−2r+2r2)]/(2n)

c×c

c×r

r̂r= 1 − r̂c

r̂r= 1 − r̂c

r̂r= 1 − r̂c

r̂r= 1 − r̂c

Estimate of r

r

c

r

c

r

c

r

c

Phaseb)

(n1 + n3 + n5 + n7) log10[2(1 − r̂c)] + (n2 + n3 + n6+ n8) log10(2 r̂c)

(n1 + n4) log10[2(1 − r̂c)] + (n2 + n3) log10(2 r̂c)

(n1 + n6) log10[2(1 − r̂c)] + (n3+ n4) log10(2 r̂c)

(n1 + n4) log10[2(1 − r̂c)] + (n2 + n3) log10(2 r̂c)

LOD score

No.a)

Table 4. Formula for estimating recombination rate and calculating LOD score

r̂r×r= 1 − r̂c×c

r̂=[n1 + 2n3 + n4 + n6 + n7 + n9 + 2n10 + n12 + 2(n5 + n8) r2/(1−2r+2r2)]/(2n)

r×r

c×r

r

c (n1 + n3 + n6 + n8) log10[2(1 − r̂c)] + (n2 + n4 + n5 + n7) log10(2 r̂c)

2(n2 + n5 + n12+ n15) log10[2(1 − r̂c×r)] + (n1 + n4 + n6 + n7 + n10 + n11 + n13 + n16) log10[4 r̂c×r(1 − r̂c×r)] + 2(n3 + n8 + n9 + n14) log10(2 r̂c×r)

2(n1 + n6 + n11+ n16) log10[2(1 − r̂c×c)] + (n2 + n3 + n5 + n8 + n9 + n12 + n14 + n15) log10[4 r̂c×c(1 − r̂c×c)] + 2(n4 + n7 + n10 + n13) log10(2 r̂c×c)

(n1 + n6) log10[2(1 − r̂c)] + (n2+ n5) log10(2 r̂c)

2(n1+ n12) log10[2(1 − r̂c×c)] + 2(n4+ n9) log10(2 r̂c×c) + (n2 + n3 + n5 + n8 + n10 + n11) log10[4 r̂c×c(1 − r̂c×c)]

(n1 + n6) log10[2(1 − r̂c)] + (n2+ n5) log10(2 r̂c)

For several configurations of linkage phases, the estimate of r and the LOD score can be obtained by exchanging, as indicated, the phenotype frequencies in the fully specified preceding formulas. If the estimator contains the recombination frequency itself, then it is an iterative estimator. c: Coupling in the paternal parent and uninformative in the maternal parent, or vice versa; r: Repulsion in the paternal parent and uninformative in the maternal parent, or vice versa; c × c: Coupling in both parents; r × r: Repulsion in both parents; c × r or r × c: Coupling in the maternal parent and repulsion in the paternal parent, or vice versa.

b) For each configuration, two or four linkage phase combinations are distinguished; n: Total number of observed individuals; ni (i = 1, 2, 3, …): Number of observed individuals of the ith genotype; log: Logarithm to base 10.

r̂r= 1 − r̂c

r̂r×c= 1 − r̂c×r

r×c

r̂r×r= 1 − r̂c×c

r×r

r̂c×r=[n1 + n4 + n6 + n7 + n10 + n11 + n13 + n16 + 2(n3 + n8 + n9 + n14)]/(2n)

r̂c×c=[n2 + n3 + n5 + n8 + n9 + n12 + n14 + n15 + 2(n4 + n7 + n10 + n13)]/(2n)

c×c

c×r

r̂r= 1 − r̂c

r

c

r̂r×c= r̂c×r

r=[n2 + n3 + 2n4 + n5 + n8 + 2n9 + n10 + n11 + 2(n6 + n7) r2/(1−2r+2r2)]/(2n)

c×c

r×c

r̂r= 1 − r̂c

r

c

a) Configuration number according to Table 2.

15

14, 16, 19

9

8, 10

7

LOD score

Estimate of r

Phaseb)

No.a)

M3

M2

ab×cd

ab×cd

ab×cd

Segregation type

0.500

0.500

0.100

True value of r

0.500

0.465

r×r

r×c

0.500

c×c

0.483

0.488

r×c

c×r

0.500

0.423

r×r c×r

0.500

c×c

0.500

r×c

0.500

r×r 0.500

0.125

c×c

c×r



Linkage phase

0.326

0.326

0.048

0.048

0.054

0.054

0.590

0.590

0.000

0.000

54.960

54.960

LOD score

0.500±0.000

0.500±0.000

0.112 ± 0.005

c: Coupling in the paternal parent and uninformative in the maternal parent, or vice versa; r: Repulsion in the paternal parent and uninformative in the maternal parent, or vice versa; c×c: Coupling in both parents; r×r: Repulsion in both parents; c×r or r×c: Coupling in the maternal parent and repulsion in the paternal parent, or vice versa.

r: Recombination rate; r̂: Estimate of r; : The average of 100 estimate of r.

M3

M2

M1

M1

Marker

Table 5. Simulation result on estimation of recombination rate between three markers on two chromosomes with known order, linkage phases, and r value

Table 6. t-test of genetic distances for five linkage groups each with equal length of marker interval

Chr.ID   Number of markers   Parameter*   Estimation   t-value   tα=0.05   P-value
1        12                  8.623        7.291        2.102     2.228     0.062
2        15                  10.313       9.700        1.455     2.160     0.170
3        20                  9.026        8.988        0.072     2.101     0.943
4        9                   8.000        8.911        1.873     2.364     0.103
5        17                  12.053       10.780       2.067     2.131     0.056

* Represents genetic distance between two adjacent markers.
