Computer Programs in Biomedicine 5 (1975) 131-141

© North-Holland Publishing Company

A G E N E R A L P R O G R A M T O E S T I M A T E G E N E - F R E Q U E N C I E S BY T H E M E T H O D OF MAXIMUM-LIKELIHOOD M. JAINZ Abteilung Medizinische Statistik und Dokumentation des Klinikums der Universit~t Kiel, 23 Kiel, Bundesrepublik Deutschland

The program estimates the optimal gene-frequencies from observed values of phenotypes by the method of maximumlikelihood. The program does not make assumptions about a specificblood-group-system.This information has to be supplied by input cards. A straightforward iteration process is used for obtaining initial estimates for the gene-frequenciesin question. Then an iterative application of the Newton-Raphsonmethod is used to solve the maximum-likelihoodequations. The program yields the gene-frequencies and their standard deviations, expected values of the phenotypes, expected values of the genotypes. Finally the ehi-square between observed frequencies and (optimal) expected values of phenotypes is computed. Gene-frequencies

Phenotypes

Genotypes

Maximum-likelihood

1. Problem description

2. Mathematical background

In genetics an interesting question is the estimation of so-called gene-frequencies from a set of observed values of phenotypes. Fisher proposed in [1 ] to apply the method of maximum-likelihood, because it yields efficient estimators together with their variances. In general one has a blood-group-system with a set of observable phenotypes. The underlying genetic model consists of a set of genes or gene-complexes

The theory of the computational process may be read in any one of the books [ 1 - 9 ] . We will give a rough sketch only for understanding what is done in the program. Let

x k,

k = 1..... NP.

Let the observed frequencies of the phenotypes be Bi,

i=I,...,NF

with

NF/>NPi.g.

The expected values of the phenotypes are non-linear functions of the gene-frequencies: E i = E i ( x 1, ..., xNp),

i = 1..... NF.

The problem is to find a set of gene-frequencies ..... X? p so that B i ~, E i, in other words find an optimal solution for a system of non-linear equations with more equations than variables.

x k,

k=I,...,NP,

with

~x

k

k =1

be the gene-frequencies of a set of genes or genecomplexes, that are the base of a blood-group-system. The genetic inheritance rules yield a set of NP*(NP+ 1)/2 genotypes, as each gene.complex may be combined with any one from the set. Certain non-empty subsets of the set of genotypes are called phenotypes, as the members of those subsets are indistinguishable by laboratory methods. Now let - as in (1) Bi,

i= 1..... NF

the observed frequencies of phenotypes in a sample. With T = Y'i Bi, the corresponding expected values are

M. Jainz, Estimation of gene-frequencies by the method of maximum-likelihood

132

E i = T ~ . co,gi],

i = 1 ..... N F ,

(1)

l where ] ranges over those genotypes, that belong to phenotype i, and

gi] =XEKi,], 1 *XEKi, L2 is the expected value of the genotype consisting of the gene-complexes EKi, L 1 and EKi,], 2 divided by eli. The indexarray EK contains for each phenotype the corresponding indexpairs, which indicate the genotypes belonging to it. The ci] are easily derived from Z k x k = 1 and Y'i Ei = T , (Z k Xk)2, namely =/1

for

EKi, L 1 = EKi,], 2

(2

for

EKi, L 1 ¢EKi, i,2

ci]

k

..., XNp) .

The problem of finding an optimal solution to

O,

we get

a2L _ axiOx]

~ 1 aEkaEk k Ek Oxi Ox]

(3)

(see [2, p. 305]). For improving an initial estimation X 0 = (x 0 ..... X~v), we apply iteratively the NewtonRaphson method (see [11] or [12])

Xn+I = X n + [(L")-llXn • [L'lx n .

The iteration process is stopped, ifL is no longer increasing, i.e. has reached its maximum (see [10, p. 1781). The elements of the matrix (L") -1 , the so-called information matrix, directly yield the asymptotic variances of the parameters x k, namely Var(xk) = (L")~-1

B i ~ E i is equivalent to maximizing

The necessary condition L' = 0 transforms the problem to findingx~, ..., x~i, so that aL ) = ( 0 ..... 0),

Cov(xi, xj) = (L")/] 1 .

ax i .

(2)

Now let " L" = (Li]) = i\ ~ a2L ]

be the NP.NP-matrix of second partial derivatives of L. For those we have

OEk ae k

(5)

Eqs. (4) and (5) provide a means to compute the variances of those parameters, that have been linearly substituted (plus additive constants). One parameter may always be substituted, as from

with

02L = ~"~ Bk a2Ek

(4)

and

(x 1 .... , XNp) •

ax i

~-Ekk

L n =L(x~ ..... X~p).

L = ~Bk.lnEk(Xl,

L, = ( a L

0 2 E k - ~2 axiOx] OxiOxj

We also get current values

The maximum-likelihood method considers the following likelihood-function:

L = L

~k

~x k = 1 k

follows

NP X1 = 1 - ~ x k .

k=2

A suitable means for constructing substitutions is given if there exists a partitioning of the set of phenotypes, where the generated subsets contain subsets of the set of gene-complexes in consideration. Let the disjoint subsets of gene-complexes be

(xil,...,Xim)

and

(Xim+l .... ,xiNp),

axi% Y ek axiax/ k ek2

and let the partitioning of the phenotypes be defined by the vectors

If Tis large, we may replace B k by E k, so that Bk/E k = 1. As Z k E k = T = const, and thus

D = (d 1 ..... dNF) and

with

d i E (0, 0.5, 1 ) ,

M. Jainz, Estimation o[ gene-frequencies by the method of maximum-likelihood

D' =(ael ..... d'NF)

with

d~=

i

.5

if if if

di=O d i=0.5 di= l

Then we may compute two scalar products with the observation vector B, namely T 1 = D- B and T2 = D. B with T 1 + T 2 = T. For those we have

T

which yields T1

m

k~=2=xik'

NP

r2

2

T

k-m+l

as

~ X k = 1. k Starting with x 0 = 1/NP, i = 1, ..., NP, the process

Bk * ~. gk/. xn+l = ~ k Ek (Xn1 , ..., Xsp) n

Xik '

Xil - T

(6)

xi ,

with the same range for] as in (8) yields the initial estimation for the Newton-Raphson process. I have no convergence proof for this process, but for all systems we considered, it yielded good initial estimations (see also [2, p. 308] ). The computation of chi-square between optimal expected values and observed frequencies is done by X2

which yields T2

133

NP

Xim+l = --~ -- k=m+ 2 X i k

.

(7)

The variances of the substituted variables are computed

by m m-1 Var(xil ) = ~ Var(xi,) + ~ 2*Cov(xq, Xik), /=2 / /'=2

=

~ (Bi-Ei)2 i Ei

with

DF=K-N-1,

where DF = number of degrees of freedom, K = number of classes, N = number of independent parameters. Classes of phenotypes are built so that E i > 1.0 for all i.

3. Program

k>i

as T1/T = const, has no variance (see [10, p. 177]). One should always try to substitute as many parameters as possible because of the sensitivity of the Newton-Raphson process against dependencies between parameters. For the purpose of getting an initial estimation we use a straightforward iteration process, which is constructed in a heuristic manner. Consider the expression

Bk * ~. gM ~k Ek(Xl, .... I XNp)

(8)

(see eq. (1)), where ] ranges over those genotypes, that contain the ith gene-complex. Then (8) yields an estimation of

3.1. Program description The program starts with reading a deck of control cards, which define the blood-group-system in consideration. If the indicator for symbolic phenotype printout is on, the expected-value-functions of the phaenotypes are printed symbolically. Substituted variables are inspected for containing substituted terms in their representations. If yes, these are replaced by their own representations. After reading a set of observed frequencies of phaenotypes, initial values are assigned to the genefrequencies in question. Now the Newton-Raphson method is iteratively applied until the likelihood function L is increasing no longer. A final iteration step is performed and the expected values of phenotypes and genotypes are computed. The variances of the gene-

134

M. Jainz, Estimation of gene-frequencies by the method of maximum-likelihood

frequencies are taken from the information matrix and the results are printed together with chi-square between observed and expected values of phenotypes. The program restarts by reading a new set of observed values. If "STOP" is found, it exits. 3.2. Capacity The outlay of the arrays in the program has been adjusted to the largest system we considered, namely the Rhesus-system, which consists of 30 phenotypes and 12 parameters, 9 of which are independent. The most space-consuming array is the array for storing the partial derivatives of the expected-value-functions of the phenotypes. So the largest system, that may be processed, is limited by the following condition: NF*NP*8 ~< 32 K Bytes, where NF = number of phenotypes, NP = number of independent parameters, provided the program is run under DOS-/360 in double-precision.

4. Input

The numbers are taken by indexing the set of genecomplexes (from 4.1.3), where the elements have been assigned numbers from 1 through to NP + NS. 6. For each substituted variable a set of those genecomplex numbers and constant indices, that occur in its representation, together with their sign. 7. For each constant, which occurs in the representation of the substituted variables, the corresponding vector, whose scalar product with (Bi)/T yields the constant itself (see eqs. (6) and (7)). 4.2. Data cards Two data cards are necessary for each sample of observed values of a system defined as in 4.1. 1. A text card, identifying the material, from which the sample was taken (up to 60 columns). 2. The observed frequencies of the phaenotypes, separated by commas. If the text card contains "STOP" in the first four columns, the job is terminated. 4.3. Deck-setup (see fig. 1)

4.1. Control cards There are seven types of control cards. 1. Contains the following integer values, separated by commas: NP = no. of parameters (gene-complexes) NF = no. of phenotypes NFT = max. no. of genotypes/phenotype NS = no. of substituted variables NST = max. no. of terms in substitutions NE = indicator for symbolic phenotype printout

I

}._,.

(NE --- 0: NO, NE #= 0: YES) 2. Symbolic name of blood.group-system (up to 20 columns) 3. Symbolic names of gene-complexes and constants in the following sequence: (5 columns each) independent variables, dependent (substituted) variables, constants. 4. Symbolic names of phenotypes (10 columns each). 5. For each phenotype a set of pairs of gene-complex numbers, indicating the corresponding genotypes.

~lBIpla

J (H..

I

I

Fig. 1. Deck setup

M. Jainz, Estimation o f gene-frequencies by the method o f maximum-likelihood

6.1. Sample run a. Input deck

b. S)/mbo{ic phenotype printout ,~

/ / JOB PHE1901 MAXIMUM L I K E L I H O O 0 ESTIMATES I I EXEC JMOIMLMD 3,4,2,113~1, ABO-SYSTEM 0 A B I O A B AB OlD1 02020102 03030103 0203 ~,-1,-2, I,I,l,l, EXAMPLE TAKEN FROM KEMPTHORNE, PAGE l T b 2 0 2 , 1 7 9 , 3 5 , b~ STOP

THE PHENOTYPE FUNCTIONS OF THE ABO-S¥S~EM

c. lnitial ABO-SVSTEM

0

=T*IOiOI

A

-T*IA*A+2*O*AI

B

-T*IB~B+2*O~BI

AB

=T*I2*AtB)

THE SUBSTITUTIONS ARE AS FOLLOWS= B

estimation EXAMPLE

=X-O-A

printout

TAKEN FROM KEMPTHDRNE,

PAGE 176

I N I T I A L VALUES

FREQUENCIES OF GENE-CONPLEXES O = ,69842~ A = .251563 B = .050012 1 = I.OOOOOO

d. Final ABO--SYSTEM

EXAMPLE

result

printout

TAKEN FROM KEMPTHCRNE,

paGE 176

T.ITERATION

FREQUENCIES OF GENE-COMPLEXES O = .698428 ~ .016980 A = .251560 ~ ,016132 a = ,050012 L .007600 P HENOTYPE S 0

OBSERVED 202 14T. fl=)

EXPECTED 205°85 I48°7802t!

I79 (~Z,4II

176.99

35 I AB

SUM

NEWTON CORRECTIONS DO =-I,4535799E-IT DA = 1,0911860E-17

6 (

E. 3 ~ |

30°54

(

7°2360~1

l,4gl

10°62

I

2°5162g)

422 C H [ 2 = 2 . 82504;71E* 00 OGF = I

Fig.2. Sample run AB0-system.

(41.46761)

~22°00

GENOTYPES

EXPECTED

O

/0

205.85

a O

/A /A

26°71 ( 6°3282Z) 148.2q (35.1393Zl

B O

/B /B

A

IB

I48.780211

1.06 29°48

( ,2501gl ( b.9859~|

10.62

(

422.00

Z.Slb2gl

135

M. Jainz, Estimation o f gene-frequencies by the method of maximum-likelihood

136

6.2. Sample

run

a. lnput deck / / JOB PHE1901 MAXIMUM L I K E L I H O O D ESTIMATES FOR GENE FREQUENCIES / / EXEC JMOIMLRD 12,30t9,3e6~l, RH-SVSTEN RL R2 RZ RIbl RZW R Re R "e RV RIM RO RYkl EI C' O.CC'E*E' O.CMCeE*EIDCCIEE * D . C k C ' E E " O. C C E ' E * O . C W . E I E * O.CCEE e O . C I C e E e E IDoC IC ~EE ' 0 . C eC tEE O.CC lEE D. CMC'EE O.CCEE O.CW. EE CCeEeE ° CNCIE'E ' CeC'EE ' CC'EE' CMCeEE' CCE' E' CW.Ee E I CW.EE t CIC JEE CC'~EE CWC lEE CCEE CN.EE 011107110106 061L10110606 031109110108020703060102 05111112021005060~.02 0408 01010107 010~'0 1 1 0 0 6 0 6 0 6 0 TO610 010301090307 010506030605060901120612050703100510 11111106 021108110206 02020208 030202090308 050202120508

E O.r'W.EE e C~'CeEa E I CCEE I

03030309 0 5 0 3 0 5 0 5 0 5 0 9 0 3120 512 0606 0607 0610 0608 06090708 06120810 0707

07101010 0709 07 1 2 0 9 1 0 1 0 L 2 0808 0809 0812 0909 09121212 13,- 11,-6,-1,-4~-7, 16,-2t-6t-8, O, O, IS,-2,- 3v-5,- 8,-9, 1,1,0.5,0.5, 1,1,0.5,0.5,1tO. 5,0,O,O,O,O,l,l,l,0.5,0-5,0.5,1t 1,0.5,0.5,0t0,0,0,0 t 0.5,0.5,0.5,0.5tOtO,O,O,1,ltl,O.5,0.5,0,Otl,O.StO.5tltO.5,O.5,0tO,O,O, 1,0.5,0j5, O, O, O, Ov 0 . 5 , 0 . 5 , O , O t O . 5 , 0 . 5 t 0 , 0 . 5 , 1 , 1 , 1 , 1 ~1,0,0,0,0.5,0.5,0.5,0,0,0.5,0.5, 1, l , 1, [ , 1~ EXARPLE TAKEN FROM SACHS 3362, 166, 1207152w1672t130t7,O,16~,1263,210,3,1,0,O,lb75,TT,O,69,2,0, Z, O t O t O ~ o , o , 0 , 0 , O, STOP

c. lnitia[

EXAMPLE

R H-SYSTEM TN | T | A L

estimation

TAKEN FROM SACHS

VALUES

FREQUENCIES OF GENE-COMPLEXES R1 = .396307 R2 = ol.62550 RZ = °000838 R 1M = .016388 RZW = .00009 1 R = .408524 RI = .009833 R '° = .005877 RY = .000063

Fig. 3a.

R eW RO RYM EI CI E

= = = -

=

.000000 .019548 .000000 .850600 . 576500 .149600

printout

M. Jainz, Estimation of gene-frequencies by the method of maximum4ikelihood

6.2. Sample run

b. SymboLic phenotype printout THE P H E N O T y p I : D.CC' E' E'

FUNCTIONS

OF

THE R H - S Y S T E M

A/~E AS FOLLOWS:

= T $ ( 2 * R I * R O + E * R ttRO~,Z S R | * R I

O .CWC ' E ' E ' = T $ {

2 * R IW~R OeZeR ' MSRO÷2 SR l MSR)

DCCeEE'

= T ~ ( 2 5 A Z *R 0 + 2 SR Y*RO~.2 SR L SR" '~'2 eR2 SR ' e 2 5 RZ$ R ÷ 2 5 R l e R2 I

O.CWCeEE'

=T'I'(Z*RZllmRO~2$RO~RYi4eZ~RZ~ReM~ZSRZMiIR÷Z'I'RLMeRZ÷ZSRIN$11 ''

D.CCE'E'

=T*iR L*RI÷Z~I¢R

D.CW. E' E '

= T * ( 25R l * R l g e Z ¢ R 1 SR ' MeR1W$I~ 1 M+Z S R I MSR ' ÷ 2 $ R I M S R ' M )

O.CCEE'

= T $ ( 25R [ $ R Z e Z * R I * R Ve2 S R Z S R ' )

D°CkI.EE'

= T $ ( 25R lSRZ W',.2$R L ~ $ R Z + 2 SR 1W~'RZ u÷Z SRIW SRY'e2~R I S R Y i l e 2 $ R L M S R V l l eZSR ZMSR . ÷2$RZSR

D . C ' C mE t E ' = T $ I R O~R 0+2 SR OSR) O . C I C , EE ,

= T $ ( 25R 2*R 0.,.2 SR ' * sR 0 + 2 $R2 SR )

O.CtC'EE

= 1"$( R 25RZ'.-25R2 SR me )

D , C C ' EE

= T $ ( 2$RZ e R 2 ÷ Z ~ 2 5 R

DoCi~C ' E E

:T*(

D.CCEE

= T*I R ZeRZe-ZSRZSRY|

D.CW,EE

= T $ ( 2 * R Z MSRZo.RZ WSRZ ~ ÷ 2 sRZ I ~ RYe2 SRZ eRYid,t,2$ RZW* RYM )

C'C'E'E'

=T*(ReR

CC'E'E'

=T*I ZSR*R')

¥÷2*RZSR e j )

2*RZN'e'R24"E't'RZ'e'R VWeZ'e'RZMSR g e )

I

CWCOEOE t

= T ~ ( 2$RSR ' g )

C'C#EE '

= T $ 1 2$R~Ni ' ' )

CCOEE .

= T $ ( 25R ~q.V÷ ZSR ' * R ' ' )

CkIC' E E '

= T $ ( 25R sR YW÷Z il'R ' ' il~ ' W|

CCE' E'

=T*iR'eR')

CW.E'E'

= T*(

CCEEt

= T $ ( 25R ' S A Y )

CWoEE'

= T * I 2 * R ' SR Vii+ Z e R Y * R ' g e e ~R ' I I S R Y M )

CoCeEE

=T~IR e,~Ret)

C C ' EE

= T * ! 2~R ' ' v~e,y )

CMCOEE

=T*(2*R

2 * R ' SR ' i ~ ' R ' M~R ' b)

' ' * R YN)

CCEE

=TelR V*RY)

CW.EE

= T$ I 2 * R Y~RY h+ R VWSR Y l i l

[HE

SL~ST[TUHONS

Rel¢

=E I - R O - R - R

~0

=C m - R 2 - R - R se

RYW

=E-R E-RZ-RZ~-R

I

')

ARE AS F O L L O W S :

].-R L M-R o

° s-RY

Fig. 3b. Sample run Rh-system.

e W+ 2 $ R Z N * R t M |

137

M. Jainz, Estimation of gene-[requencies by the method of maximum-likelihood

138

6.2. S a m p l e

d. Finat RH-SYSTER

E)(AMPLE

resutt

run

printout

TAKEN FROM SACHS

5 . ITERAI"ION ************

NEWTON CORRECTIONS ORI = 7.9898963E-"08 DR2 = 1.6359019E-08 ORZ =-9.9857112E-10 DRI~ = 3.256984BE-09 ORZN = I,2710029E-09 DR = 4.4155807E--08 DR + = I.I065598E-09 DR'' =-7.8110189E-10 DRY = 1.1405284E-09

F R E Q U E N C [ E S OF G E N E - C O M P L E X E S = .396366 ~ .001396 R2 = .142607 ¢ .000863 RZ = .000778 ¢ .000321 RI~ = .016388 ¢ .0C089~ RZW = .000091 ~ .O00099 R = .408525 ~ .001519 R+ = .009781 t .O01058 R +' = .005827 ~ .00081T ~Y = .000097 ~ .000164 R~W = .000000 ~ .000578 RO = .019541 ~ .0C1493 RYW = .000000 ~ .000001

RI

P HENOTYPES 0.CC'E'E e

OBSERVED 3362 133.6Z)

EXPECTED 3397.23 133.9723t)

GENOTVPES RL R~ Rl

D.CWCJEeE I

166 (

1.41)

140.30

(

lEO /RO /R

1.4030X)

RIW / R O RgW / R O RIW / R DCC'EE'

1207

ll2.1tl

1211.28

52

(

.51)

69.63

I

/RO /RO IR tt /R = /R IR2

/RO /RYW /R~W /R RIN / R 2 Rl14 / R m+

D.CM.E°E '

O.CCEE °

D,CW.EE =

O.C=C~E=E e

1672

130

(16.71)

(

7 (

0 (

166

(

l°3t|

.It)

.0~)

1.6t1

F i g . 3c. S a m p l e r u n R h - s y s t e m .

1668.60

135.80

7.09

1.06

163.68

.06401 l .O000t) 1.33891!

• 30 • 04 46.20

.003Oil .00041 ) ,4620~ I .27901 I

27.90 6.36 1130.69

.0636t

|

II.30491 l

.494311 RZW RO R2 RZW

O.CCEOE +

6.40 ( .00 ( 133.89 |

112.11281) RZ RY RI R2 aZ RI

D.C~C=EE g

EXP EETED

154.91 I L . 5 ~ 9 L Z l 3.82 ( .0382Z I 3238.51 (32.385lZl

.04 • 00 .00 .74

.00041 l .O000t) .O000t I .0074t I o46741 l .Olqlt l

66.74 1.91

116.4860t1

I

(

(

(

RI RI

/RI

RI RI RIW Rlt¢ RLW

/Rlk/ /R'W /RIw /R I /R' W

RI RI RZ

/RZ /RY /R e

RI RIW RItd RIW RI RLN RZw RZ RZN

/RZ~ /RZ tRZW /R¥ /RYW /RYW /R= /RIb¢ /R~kl

RO RO

/RO / R

/R +

1571.06 ( 1 5 . 7 1 0 6 1 ) 77.54 ( °7754t I

1.35801|

129.91 • O0 2.69 3,21 • 00

( ( ( ( (

1.29glX} .OOOOt ) .0269t ) ,032 11) .O000Z )

6.17 • T7 .15

( ( (

.06 171) .00771 ) .00151 )

.72 .26 .03

( I (

.00721 I .0026t | .0003t !

• O3 • 00

( (

.O003X .O000t

.07091)

.OL06t)

.DO ( • 02 ( • 00 ( • 00 (

| )

.O00Ot) .00021 | .O000t I .0000% |

1.63481) 3.82 159.66

( (

.03821 ) 1.5966t|

M. Jainz, Estimation of gene-frequenciesby the method o f maximum4ikelihood

6.2.Sampie

d. Final

run

result

printout

(continued) D.C'C'EE'

D.C'C'EE O.CC'EE

1243 (IZ.4Z|

210 I 3 I

D.CWC'EE

O.CCEE

t

I

0 (

O.Cki. EE

0 I

Z.l~t .011)

.OZl

.0I) .OZl

1675 ( 1 6 . 7 ~ )

C'C'E'E'

CCOEOE' CuC'E'E '

77 I

.8Sl

1223.18

219.99 2.5q

i12.Z318g)

I

.01 .00

1668.93 79.92

(

I (

I

4~ I

.St,|

4-7.61

I

.47611[)

CC'EE'

2 I

.0g)

1.93

I

,019311

I

2 I

.Og)

.qb

I

.00961)

CId.E'E'

0 I

.OZ)

.00

(

.O000gl

CCEE'

0 I

.Og}

.02

(

,O002Z)

CW.EE'

C I

.0111

.00

(

.O000Z)

C*C'EE

0 I

.Og)

.34

I

.0034t)

CC'EE

0 I

.0~)

.01

(

.OO01Zl

CWC'EE

0 I

.0~}

.00

(

.O000t}

CCEE

0 (

.0~1

.00

I

,OOOOg)

.00

.OZ22Z) .0028Z! oO009Z)

RZM / R 2 R2 /RYbl RZW I R ' '

.2b .00

( (

.0026~1 ,O000Z!

• 0[

(

,O00tg|

RZ RZ

/RZ

• 011

/RY

.00

(

.0000~1

RZW RZM RZN RZ RZM

IRZ /RZN /RY

.00 I • 00 I

.O000Z) ,O00O|l

IRYM

• 00 ,00

I I

.O000ZI .0000Zl

/RYM

.00

(

.0000~1

R

/R

R

IR'

R

/R'~

F~

IR'

R R' R R''

t668.93

10000

10000.00

C H I 2 = 3, 6101659E~" 00 OGF = 7 F i g . 3C. S a m p l e r u n R h - s y s t e m ( c o n t i n u e d ) .

([6.6893S1

I

79.92

(

.7992Z1

.00

(

.O000Z)

'

47.61

(

.476|Z|

/ RY IR' '

.79 1.14

I (

.0079Z1 .OI[4Z!

IRYM /R'W

.00 .00

( I

.OO00Z! .O000Z}

R'

I R'

• gb (

.OOq6Z|

P'

/R'W

R'W I R ' M

.00 .00

.O000Z) .0000I)

R'

lAY

.02

.O002Z|

R'

IRYl¢

RY /ROW ReW / R Y W

.00 • 00 ( • 00

.0000~1 .O000Zl .OO00Z)

R''

IR''

.34

.0034Z)

R' *

I RY

• Ol

.OO01Z]

R*'

/RYkl

,O0

.O000Zl

RY

IRY

.00

.000011

.00 • 00

.O000Zl .00001|

.0000~) RY / RYbl RYM /RYW

SUN

.OOOIll

.000011

CCE"E"

,Og|

2.Z2 I • 28 ( • Oq I

.7992~1

C'C'EF'

0 I

( 2.0337Z) ( °166211

l16.b893g)

.0000111

C~I.EE

/R2 /RY /R' '

203.37 16.62

.O000~l

(

.00

RZ R2 RZ

( .5573ZJ ( .0228SJ ill.bSITgl

.O00l~)

.00

.0~}

IR2 IR* '

.OOZT~)

,Ogl

0 l

R2 R2

55.73 2.28 llbS.17

.025qt)

0 I

CWC'EE'

IRO /RO IR

2.1999~1

(

.27

~2 R' ' R2

10000.00

139

140

M. Jainz, Estimation of gene-frequencies by the method of maximum-likelihood

5. Output

7. Program modifications

The output consists of three parts. 1. Symbolic printout of the representative functions of the expected values of the phenotypes (optional). 2. Listing of the program-generated initial estimations for the gene-frequencies. 3a.Listing of the optimal maximum-likelihood solution for the gene-frequencies together with their standard deviations and final Newton-Raphson corrections. 3b.Listing of the expected values of phenotypes and genotypes. 3c.Chi-square between observed and expected frequencies of the phenotypes and the involved number of degrees of freedom.

As we use the PL/1-subset under DOS, we were forced to specify the upper limits of array dimensions as decimal integer constants. The arrays in the program have been adjusted to the largest system, that we considered (Rh-system). If larger systems are to be processed, the upper limits mentioned above have to be increased.

6. Sample runs

9. Mode of availability

The first sample is taken from Kempthorne [10, p. 176]. The ABe-system contains three gene-complexes A, B,¢. The phenotypes are:

The program can be obtained from the authors free of charge.

(¢,¢) A:

(A,A), (A, ¢) (B,B), (B, ¢)

AB: (A,B). The gene-complex B is substituted by B = 1-0-A, where the constant 1 is generated by 1 = (1 * ~ + 1 *~.+ 1 *AB--)/T, so that the corresponding vector is D 1 =(1,1, 1,1). The necessary input and the output produced by the program may be seen in fig. 2. The second sample run shows the estimation of gene-frequencies for the Rh-system. The observed values of phenotypes have been taken from Sachs [13]. Necessary input and output produced by the program may be seen in figs. 3a-c.

8. Timing One run for the 12-parameter system of Rhesusfactors requires approximately 40 seconds/sample on the IBM system/360.50.

References [1] R.A. Fisher, Statistical methods for research workers, 13th ed. (Oliver and Boyd, Edinburgh-London, 1963) pp. 299-355. [2] C.Radhakrishna Rao, Linear statistical inference and its applications (Wiley, New York-London-Sydney, 1965) pp. 254-290; [3] C. Radhakrishna Rao, Advanced statistical methods in biometrical research (Hafner, Darien, Conn., 1970) pp. 129-175. [4] A. Linder, Statistische Methoden, 3rd ed. (Birkh~'user Verlag, Basel-Stuttgart, 1960) pp. 432-454. [5] E. Weber, Grundriss der biologischen Statistik, 6th ed. (Gustav Fischer, Stuttgart, 1967) pp. 198-214. [6 ] S.S. Wilks, Mathematical statistics (Wiley, New YorkLondon-Sydney, 1962) pp. 344-393. [7] M.G. Kendall and A. Stuart, The advanced theory of statistics, vol. 2 (Charles Griffin, London, 1961) pp. 1-74. [81 C.A.B. Smith, Biomathematics, vol. 2 (Charles Griffin, London, 1969) pp. 368 ft. [9] B.L. van der Waerden, Mathematische Statistik (SpringerVerlag, Berlin-Heidelberg-NewYork, 1965) pp. 148-208 [10] O. Kempthorne, An introduction to genetic statistics (The Iowa State University Press Ames, Iowa, 1969) pp. 164-178.

M. Jainz, Estimation of gene-frequencies by the method of maximum-likelihood [ 11 ] J.S. Berezin and N.P. Zhidkov, Computing methods, vol. 11 (Pergamon Press, Oxford-London-EdinburghNew York-Paris-Frankfurt, 1965) pp. 129-136. [12] A. Ralston, A first course in numerical analysis (McGraw-

141

Hill, New York-St. Louis-S. Francisco-Toronto-London Sydney, 1965) pp. 318-349. [13] V. Sachs and J. Drews, Z. Immun.-Forsch. 141 (1970) 69-84.

A general program to estimate gene-frequencies by the method of maximum-likelihood.

The program estimates the optimal gene-frequencies from observed values of phenotypes by the method of maximum-likelihood. The program does not make a...
464KB Sizes 0 Downloads 0 Views