http://www.bsava.com

STATISTICS UPDATE

Statistics: general linear models (a flexible approach)

M. Scott, D. Flaherty* and J. Currall†

School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8QW
*School of Veterinary Medicine, University of Glasgow, Glasgow G61 1QH
†IT Services, University of Glasgow, Glasgow G12 8QQ

This article moves on to discuss a type of statistical testing different from those we have discussed previously, namely a General Linear Model. This system incorporates a number of other statistical models and is a powerful tool used widely in modern statistics.

Journal of Small Animal Practice (2014) 55, 527–530
DOI: 10.1111/jsap.12260
Accepted: 26 June 2014; Published online: 19 August 2014

INTRODUCTION

In previous articles (Scott et al. 2011a, b, 2012, 2013), we have met a variety of statistical techniques, each chosen because of the nature of the scientific question being investigated, the experiment performed and the properties of the data. We have emphasised that, in many studies, there is a natural response variable, and we wish to find explanations (in explanatory variables or covariates) for the variability observed in that response. The examples we have explored used regression models and hypothesis tests such as the two-sample t test. We could continue this series by introducing a new technique suited to each specific experimental context; instead, in this article, we introduce a more general class of statistical model called a General Linear Model (GLM). One major benefit of the GLM framework is that, rather than having to study the two-sample t test, the analysis of variance (ANOVA) and the simple regression model as separate methods, we can regard all of these as special cases of the GLM.

THE LANGUAGE OF STATISTICAL MODELS

In earlier articles in this series where we have met statistical models, there have been two fundamental elements:

Outcomes or Responses: These are the results of the practical work and of primary importance.

“Causes” or Explanations: These are the conditions or environment within which the outcomes or responses have been observed. In experiments, many of these have been determined by the experimenter, but some (often referred to as “covariates”) may be aspects that the experimenter has no control over but that are relevant to the outcomes or responses. In observational studies, these are usually not under the control of the experimenter but are recorded as possible explanations of the outcomes or responses.

Journal of Small Animal Practice • Vol 55 • October 2014
© 2014 British Small Animal Veterinary Association

Models specify the way in which outcomes and explanations link together. In the simplest case, for an experiment investigating how metabolite production in a cell culture depends on temperature, the statistical model may be communicated as follows:

Metabolite ∼ Temperature

This common notation simply says that metabolite depends somehow on temperature (whether through a linear model or otherwise). However, the amount of metabolite is not completely determined by the temperature, so our statistical model should include a “random error” term, because other factors or causes will have played their part in determining the response. Strictly, therefore, the model formula should have an additional item on the right-hand side to account for all the other influences that affect the production of metabolite:

Metabolite ∼ Temperature + Error

If both metabolite and temperature are measured (and continuous), then this would be a regression situation. However, suppose that the experimenter could only record temperature as low, medium or high. The same model formula would still apply, but the analysis would correspond to a one-way ANOVA. Using the GLM framework, we only need to worry about specifying the correct model formula.

We can generalise this model notation to allow more than one explanatory variable and, even more importantly, we can include both continuous and categorical explanatory variables in the same model. This means that the two-sample t test can be written as a special case of a GLM, that the linear regression model is a general linear model, and that many other apparently unrelated statistical techniques are all just specific examples of a GLM.

Some GLM examples

Table 1 presents examples (mainly from our previous articles) and relates the separate tests to their GLM form. In Table 1, column 2 identifies six different statistical techniques, but in column 3, they are all written in the same GLM language. So, as users of statistics, we no longer need to worry about which hypothesis test we should use; rather, we can describe our analysis in a common way as “response depends on (~) explanatory variable(s)”. This is the power and flexibility of GLM for linear models. Most modern statistics packages (Minitab, SPSS, SAS, R, etc.) have facilities for analysing GLMs and all use syntax similar to that used in Table 1.

Table 1. Relationship between traditional statistical tests and the General Linear Model

Example | Traditional test | GLM word equation
The effect of using brachial plexus nerve blocks (BPB) on heart rates of cats compared to not using BPBs (Scott et al. 2012) | Two sample t test | HEART_RATE ~ BPB
Comparing the effect on mean arterial blood pressure of two different drugs and a saline solution | One-way analysis of variance | EFFECT ~ DRUG_TYPE
Investigating the relationship between the thickness of the systolic outer wall (SOW) of horse heart and heart weight (Scott et al. 2013) | Regression | WEIGHT ~ SOW
Investigating the effect of different anaesthetic regimens on arterial blood pressure (ABP) in dogs (Scott et al. 2011b) | Two-way analysis of variance | ABP ~ REGIMEN + TIME or, under different assumptions, ABP ~ REGIMEN + TIME + REGIMEN × TIME
Investigating the change over time in blood pressure in dogs for different drugs | Analysis of covariance | BLOOD_PRESSURE ~ DRUG + TIME
Investigating the relationship between a variety of ultrasonic measurements on horses’ hearts and heart weight (Scott et al. 2013) | Multiple regression | WEIGHT ~ SOW + DOW + SIW + DIW + SEW + DEW
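To illustrate the claim that the two-sample t test is a special case of a GLM, the following minimal Python sketch fits the word equation HEART_RATE ~ BPB by least squares, coding BPB as 0 (no block) or 1 (block), and compares the result with the classical pooled two-sample t statistic. The heart-rate values and the function names `pooled_t` and `glm_indicator` are hypothetical and chosen purely for illustration; they are not data or code from the studies cited in Table 1.

```python
from math import sqrt
from statistics import mean

# Hypothetical heart rates (illustrative values only, not data from the article).
control = [182.0, 175.0, 190.0, 168.0, 177.0, 185.0]  # BPB = 0
blocked = [160.0, 171.0, 158.0, 166.0, 174.0, 163.0]  # BPB = 1


def pooled_t(a, b):
    """Classical two-sample t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    ssa = sum((v - mean(a)) ** 2 for v in a)
    ssb = sum((v - mean(b)) ** 2 for v in b)
    pooled_var = (ssa + ssb) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var * (1 / na + 1 / nb))


def glm_indicator(a, b):
    """Least-squares fit of y = b0 + b1 * x, with x = 0 for group a and x = 1 for group b."""
    x = [0.0] * len(a) + [1.0] * len(b)
    y = list(a) + list(b)
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual sum of squares and the standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


intercept, slope, t_glm = glm_indicator(control, blocked)
t_classic = pooled_t(control, blocked)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"t from GLM = {t_glm:.4f}, t from two-sample test = {t_classic:.4f}")
```

In this sketch the fitted intercept is the mean of the group coded 0, the slope is the difference between the two group means, and the t statistic for the slope coincides with the pooled two-sample t statistic, which is why a package only needs the model formula and the variable types to choose the analysis.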

Specific GLM examples

In this section, we will introduce three examples which will illustrate a variety of statistical approaches, including one-way and two-way ANOVA and ANCOVA (analysis of covariance), but which will all be tackled in the GLM framework.

FIG 1. Boxplot of median change in heart rate versus test drug (x-axis: test drug, with groups lidocaine, midazolam and saline; y-axis: heart rate difference, after-before)

Example 1: one-way ANOVA

A small clinical study was undertaken to compare two drugs (lidocaine or midazolam) and a saline solution (control group) with respect to the change in heart rate in dogs at induction of general anaesthesia. Sixty dogs were randomly allocated to one of the three treatment groups, and the question of interest was whether the mean change in heart rate differed between the three treatments (and specifically between the two drugs); that is, “Is there an effect of drug on mean change in heart rate?” The response variable is the change or difference in heart rate.

The statistical summary in Table 2 suggests that dogs receiving midazolam had the largest mean change in heart rate (11·05), whereas lidocaine (7·80) and saline (5·05) were broadly similar, but there is considerable variability in the change in heart rate (the largest standard deviation is 16·27). The boxplot in Fig 1 suggests that the heart rate changes in the three groups are broadly similar, as all the boxes overlap. Therefore, it appears unlikely that a formal analysis will be able to detect a statistically significant difference.

Table 3. Output statistics for heart rate change versus drug

Factor | Type | Levels | Values
Drug | fixed | 3 | Lidocaine, Midazolam, Saline

Analysis of variance
Source | DF | Seq SS | Adj SS | Adj MS | F | p
Drug | 2 | 360·8 | 360·8 | 180·4 | 1·13 | 0·329
Error | 57 | 9079·1 | 9079·1 | 159·3 | |
Total | 59 | 9439·9 | | | |

S=12·6207, R2=3·82%, R2(adj)=0·45%
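As a cross-check, the analysis of variance in Table 3 can be approximately rebuilt from the group sizes, means and standard deviations reported in Table 2. The Python sketch below does this with the standard one-way ANOVA decomposition; it is an illustration rather than the software output used in the article. The closed-form expression for the P-value is an assumption of convenience that holds only because the drug term has exactly 2 degrees of freedom.

```python
# Group summaries as reported in Table 2: (n, mean, standard deviation).
groups = {
    "Lidocaine": (20, 7.80, 16.27),
    "Midazolam": (20, 11.05, 9.41),
    "Saline": (20, 5.05, 11.17),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-group ("Drug") and within-group ("Error") sums of squares.
ss_drug = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
ss_error = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_drug = len(groups) - 1
df_error = n_total - len(groups)

ms_drug = ss_drug / df_drug
ms_error = ss_error / df_error
f_stat = ms_drug / ms_error

# Upper-tail probability of the F distribution. This closed form is valid
# only when the numerator has exactly 2 degrees of freedom, as it does here.
assert df_drug == 2
p_value = (1 + df_drug * f_stat / df_error) ** (-df_error / 2)

print(f"Drug : DF={df_drug}, SS={ss_drug:.1f}, MS={ms_drug:.1f}, F={f_stat:.2f}, p={p_value:.3f}")
print(f"Error: DF={df_error}, SS={ss_error:.1f}, MS={ms_error:.1f}")
```

Because the standard deviations in Table 2 are rounded to two decimal places, the error sum of squares computed this way differs slightly from the 9079·1 in Table 3, but the F statistic (about 1·13) and the P-value (about 0·33) should agree with the table to the precision shown.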

The formal analysis is carried out in the GLM framework, where the model would be defined as:

Rate change ∼ drug

and the output is shown in Table 3. As the P-value of 0·329 is greater than 0·05, we cannot reject the null hypothesis that the mean heart rate change is the same in all three treatments; so we would conclude that the test drugs do not explain the observed variation in change in heart rate. Indeed, “drug” explains less than 1% of the variability in heart rate change (R2(adj) is 0·45%).

Table 2. Descriptive statistics: change in heart rate

Variable: change
Test drug | n | Mean | se Mean | sd | Minimum | Q1 | Median | Q3 | Maximum
Lidocaine | 20 | 7·80 | 3·64 | 16·27 | −31·00 | −1·00 | 10·50 | 21·00 | 30·00
Midazolam | 20 | 11·05 | 2·10 | 9·41 | −2·00 | 3·00 | 10·00 | 17·50 | 31·00
Saline | 20 | 5·05 | 2·50 | 11·17 | −18·00 | −2·00 | 6·50 | 12·00 | 24·00

Example 2: two-way ANOVA

A common situation is one where we have two factors (or categorical variables), and we wish to examine their effect on the response of interest. One example would be a study to examine whether the change in heart rate following induction of anaesthesia might depend on both pre-existing temperament characteristics and the sedation level achieved following premedication. We could classify temperament using five descriptors (alert, excited, nervous, quiet and aggressive) and sedation into two levels or intensities (none/mild and moderate/profound). The model in the GLM framework is expressed as:

Rate change ∼ temperament + sedation

The plot in Fig 2 is known as a main effects plot and shows the individual effects of each factor on the response. The analysis under the GLM is shown in Table 4.

FIG 2. Main effects plot of mean change in heart rate versus both temperament and sedation level (data means; panels: temperament, with levels aggressive, alert, excited, nervous and quiet, and sedation level, with levels none/mild and moderate/profound; y-axis: mean change in heart rate)

From Table 4, we can see that the P-values for temperament and sedation level, at 0·182 and 0·189, respectively, are both greater than 0·05, so neither is statistically significant at the 5% level. Therefore, we would conclude that there is no evidence that either temperament or sedation level affects the change in heart rate. We can also see from the R2(adj) value of 3·46% that very little of the variability in the change in heart rate is explained by these two factors.

Table 4. GLM data for effect of temperament and sedation level on heart rate following induction of anaesthesia

Analysis of variance for change in heart rate, using Adjusted SS for tests
Source | DF | Seq SS | Adj SS | Adj MS | F | p
Temperament | 4 | 826·0 | 1002·5 | 250·6 | 1·62 | 0·182
Sedation level | 1 | 273·2 | 273·2 | 273·2 | 1·77 | 0·189
Error | 54 | 8340·7 | 8340·7 | 154·5 | |
Total | 59 | 9439·9 | | | |

S=12·4281, R2=11·64%, R2(adj)=3·46%
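The R2 and R2(adj) figures quoted with Tables 3 and 4 follow directly from the Error and Total rows of each analysis of variance table. The short Python sketch below recomputes them using the usual definitions (R2 as one minus the ratio of error to total sum of squares, and R2(adj) as one minus the ratio of the corresponding mean squares); the function names are illustrative and not taken from any particular statistics package.

```python
def r_squared(ss_error, ss_total):
    """Proportion of the total variation explained by the model."""
    return 1 - ss_error / ss_total


def adjusted_r_squared(ss_error, df_error, ss_total, df_total):
    """R-squared penalised for the number of model terms (ratio of mean squares)."""
    return 1 - (ss_error / df_error) / (ss_total / df_total)


# Error and Total rows copied from Tables 3 and 4:
# (SS error, DF error, SS total, DF total).
tables = {
    "Table 3 (drug)": (9079.1, 57, 9439.9, 59),
    "Table 4 (temperament + sedation)": (8340.7, 54, 9439.9, 59),
}

results = {}
for label, (ss_e, df_e, ss_t, df_t) in tables.items():
    r2 = 100 * r_squared(ss_e, ss_t)
    r2_adj = 100 * adjusted_r_squared(ss_e, df_e, ss_t, df_t)
    results[label] = (r2, r2_adj)
    print(f"{label}: R2 = {r2:.2f}%, R2(adj) = {r2_adj:.2f}%")
```

The adjustment matters here: adding temperament and sedation level raises R2 from under 4% to over 11%, but R2(adj) stays in the low single figures once the five extra model degrees of freedom are taken into account, which is the sense in which these factors explain very little of the variability.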

Example 3: ANCOVA example (effect of weight and diet on basal metabolic rate)

Two diets (a normal protein intake and a reduced protein intake) were compared in 24 patients, randomised 12 to each diet. The key question of interest was the comparison of the basal metabolic rate (BMR) at the end of 3 months between the two diets. However, as the weight of the patients might also influence their BMR, this must also be taken into account (Fig 3).

FIG 3. Scatterplot of basal metabolic rate versus bodyweight on diets with two different protein contents, including the best fitting straight lines for each diet (x-axis: weight; y-axis: BMR; plotting symbols: N = normal, R = reduced)

Figure 3 shows clearly that BMR does depend on weight, and that there is some difference between the diets. The two lines on the plot represent the linear relationship between BMR and weight fitted separately to the two diet groups. These lines are not parallel, which would correspond to the two diets having slightly different effects depending on the weight of the individuals. The formal analysis then involves the explanatory variables weight (continuous) and diet (categorical) to explain the variation in BMR. One way of modelling this situation would be

BMR ∼ weight + diet + weight × diet

Table 5. GLM data for effect of body weight and diet type, and their interaction, on basal metabolic rate (BMR)

Analysis of Variance for BMR, using Adjusted SS for Tests
Source | DF | Seq SS | Adj SS | Adj MS | F | p
Wt | 1 | 0·088905 | 0·130223 | 0·130223 | 147·83 | 0·000
Diet | 1 | 0·040629 | 0·000025 | 0·000025 | 0·03 | 0·868
Diet × wt | 1 | 0·000947 | 0·000947 | 0·000947 | 1·07 | 0·315
Error | 16 | 0·014095 | 0·014095 | 0·000881 | |
Total | 19 | 0·144575 | | | |

S = 0·0296801, R2 = 90·25%, R2(adj) = 88·42%

Table 5 shows the output from fitting this model in the GLM framework. The diet × wt entry in the ANOVA table (Table 5) shows a P-value of 0·315 associated with this interaction term; because this is greater than 0·05, we will remove this term from the model and refit the simpler model with only weight and diet. In this case one should not interpret or discuss the other terms in the model.

BMR ∼ weight + diet

The results of fitting this simpler model are shown in Table 6, where we can see that the P-values for both weight and diet are statistically significant (both are reported as 0·000). The fitted model is plotted in Fig 4 and shows two parallel straight lines; the vertical separation between the two lines represents the difference in the effect of diet. The slope of the lines is given in Table 6 as 0·009064, so that for every 1-unit increase in weight, BMR increases by an average of 0·009064.

Table 6. GLM data for effect of body weight and diet type on basal metabolic rate (BMR)

Analysis of Variance for BMR, using Adjusted SS for Tests
Source | DF | Seq SS | Adj SS | Adj MS | F | p
Wt | 1 | 0·088905 | 0·129289 | 0·129289 | 146·13 | 0·000
Diet | 1 | 0·040629 | 0·040629 | 0·040629 | 45·92 | 0·000
Error | 17 | 0·015041 | 0·015041 | 0·000885 | |
Total | 19 | 0·144575 | | | |

Term | Coeff | se Coeff | T | p
Constant | 0·33999 | 0·04455 | 7·63 | 0·000
wt | 0·009064 | 0·000750 | 12·09 | 0·000

S=0·0297452, R2=89·60%, R2(adj)=88·37%

FIG 4. Fitted model for basal metabolic rate (BMR) versus bodyweight and dietary protein content (x-axis: weight; y-axis: BMR; plotting symbols: N = normal, R = reduced)

Conflict of interest

None of the authors of this article has a financial or personal relationship with other people or organisations that could inappropriately influence or bias the content of the paper.

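Returning to Example 3, the test of the diet × wt interaction in Table 5 can be viewed as a comparison of two nested models: the error sum of squares of the simpler model (Table 6) minus that of the model with the interaction (Table 5) gives the variation attributable to the interaction. The Python sketch below carries out this comparison using only the Error rows of the two tables; it is offered as an illustrative cross-check and is not the procedure used by the authors' software.

```python
# Error rows copied from Table 5 (with interaction) and Table 6 (without).
ss_error_full, df_error_full = 0.014095, 16        # BMR ~ weight + diet + weight x diet
ss_error_reduced, df_error_reduced = 0.015041, 17  # BMR ~ weight + diet

# Extra variation explained by the term that was dropped (the interaction).
extra_ss = ss_error_reduced - ss_error_full
extra_df = df_error_reduced - df_error_full

# F statistic: extra mean square relative to the error mean square of the full model.
f_interaction = (extra_ss / extra_df) / (ss_error_full / df_error_full)

print(f"extra SS = {extra_ss:.6f} on {extra_df} DF, F = {f_interaction:.2f}")
```

The extra sum of squares agrees, to rounding, with the Adj SS of 0·000947 shown for diet × wt in Table 5, and the resulting F statistic of about 1·07 is the value on which the P-value of 0·315 is based.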