Greek version of the General Health Questionnaire: accuracy of translation and validity

Garyfallos G, Karastergiou A, Adamopoulou A, Moutzoukis C, Alagiozidou E, Mala D, Garyfallos A. Greek version of the General Health Questionnaire: accuracy of translation and validity. Acta Psychiatr Scand 1991: 84: 371-378.

The purpose of this study was to examine the accuracy of translation and to test the validity of the Greek version of the General Health Questionnaire (GHQ). In the translation study, the English and the Greek versions of the GHQ were administered to a sample of 50 bilingual respondents. The internal consistency, item-by-item and the subject-by-subject analysis have shown that the 2 versions are equivalent and therefore the Greek translation is highly accurate. In the validity study, 100 consecutive patients attending an internal medicine outpatient clinic completed the Greek version of the GHQ-60 and were interviewed independently using the Present State Examination (PSE). The validity of the shorter forms of the questionnaire (GHQ-30 and GHQ-28) was tested by disembedding the relevant items from the larger set. The correlations obtained between the scores of the questionnaire and the PSE ratings, as well as all the validity indices (sensitivity, specificity, positive predictive value, negative predictive value and overall misclassification rate) were quite satisfactory for all the GHQ forms, thus confirming the validity of the questionnaire in its Greek version. The best cut-off points as found by receiver-operating characteristics analysis were 11/12 for the GHQ-60, 5/6 for the GHQ-30 and 4/5 for the GHQ-28. The revised (CGHQ) scoring system for the GHQ-30 has not been proved superior to the conventional scoring method. The above results are discussed in relation to the pertinent literature and especially the studies carried out in similar settings and in countries with similar cultural backgrounds.

The General Health Questionnaire (GHQ) is one of the most widely used self-administered psychological tests. It aims at detecting current nonpsychotic disorders among people attending outpatient departments in general hospitals or general practices (1). There are 3 different forms (GHQ-60, -30 and -28) that are equally sensitive to psychopathology and have been translated into more than 36 languages (2). A serious problem arises when various psychological tests are used on groups of people other than those they were originally designed for. Such factors as language and culture have an important influence on the reliability and validity of such tests. Each psychological test or scale has been developed for a particular group of people with a specific language and cultural background. The translation of a test into another language is therefore an important and complicated procedure, necessary for its application

G. Garyfallos, A. Karastergiou, A. Adamopoulou, C. Moutzoukis, E. Alagiozidou, D. Mala, A. Garyfallos Community Mental Health Center, Northwestern District of Thessaloniki, Greece

Key words: questionnaire; translation; validity; Greece

George Garyfallos M.D., Community Mental Health Center, H. Panidi 19, Skies, 566 25 Thessaloniki, Greece

Accepted for publication June 1, 1991

in a different language and cultural context. Yet the significance of this step has usually been ignored (3). Once the translation study has been completed, the next important step is to investigate the clinical validity of the test. The validity of the GHQ has been tested in various clinical settings and in a number of other cultures and languages: Italy (4), Spain (5), Mexico (6), Japan (7) and Yugoslavia (8). The most appropriate approach to criterion validity is to compare GHQ responses with the results of an independent and standardized psychiatric assessment, which is assumed to be an error-free measure of true morbidity (2). The majority of the validity studies used the Clinical Interview Schedule (CIS) (9) as external criterion of validity, but some studies (10-13) have tested the GHQ using the Present State Examination (PSE) (14). Goodchild & Duncan-Jones (15) report that they have identified a subset of items in the GHQ-30 for

Garyfallos et al.

which the response “no more than usual” may indicate a chronic illness rather than good health. Thus, they proposed a new revised scoring system in which the response “no more than usual” in these specific items should be scored 1 instead of 0. Using this scoring system, they claim that the GHQ provides a better index of present state. The main aims of this study were: to examine the accuracy of the Greek translation of the GHQ in all its forms (28, 30 and 60); to investigate their clinical validity against the PSE; and to compare the performance of the GHQ-30 scored conventionally to that of the GHQ scored according to Goodchild & Duncan-Jones’ new scoring system.
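The difference between the two scoring systems can be illustrated with a minimal sketch. It assumes that the four response options of each item are coded 0 to 3 in the order printed on the form, with 1 standing for “no more than usual”, and that conventional scoring uses the weights 0-0-1-1. The names `ghq_score`, `cghq_score` and `chronic_items`, the example answers and the choice of revised items are hypothetical; the actual subset of items is the one defined by Goodchild & Duncan-Jones (15) and is not reproduced here.

```python
# Assumption: each answer is coded 0..3 in the order of the printed options,
# with 1 = "no more than usual".
GHQ_WEIGHTS = (0, 0, 1, 1)    # conventional GHQ scoring
CGHQ_WEIGHTS = (0, 1, 1, 1)   # revised weights: "no more than usual" scores 1


def ghq_score(responses):
    """Conventional total: only the last two options count as pathological."""
    return sum(GHQ_WEIGHTS[r] for r in responses)


def cghq_score(responses, chronic_items):
    """Revised total: items listed in chronic_items use the 0-1-1-1 weights,
    all other items keep the conventional 0-0-1-1 weights."""
    total = 0
    for index, r in enumerate(responses):
        weights = CGHQ_WEIGHTS if index in chronic_items else GHQ_WEIGHTS
        total += weights[r]
    return total


# Hypothetical answers to six items; items 0 and 3 are treated as revised items.
responses = [1, 0, 2, 1, 3, 1]
chronic_items = {0, 3}

print(ghq_score(responses))                  # 2
print(cghq_score(responses, chronic_items))  # 4
```

In this example the revised total is higher only because two “no more than usual” answers fall on items flagged for revised scoring; the same answer on item 5 still scores 0.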

Material and methods

Translation study

The GHQ-60, -30 and -28 were translated from the original English into Greek with unanimous consensus by a group consisting of 3 psychiatrists and a clinical psychologist. One of them was fluent in English and the other 3 were bilingual. To secure the reliability of the translation, the bilingual method was adopted. A bilingual is a person who is fluent in 2 languages and has lived for at least 1 year in each country and therefore is also bicultural. Residence in both countries is thought to be necessary for an individual to understand fully the different meaning of various words in both languages and not just be able to translate from one language to another (16, 17). Fifty bilingual subjects were chosen to participate in the study. The subjects completed all 3 forms of the test on 2 occasions with a time gap of 2-5 days. In the first group, the 25 subjects were presented with the English version on the first occasion followed by the Greek version on the second. In the second group the remaining 25 subjects were presented with the Greek version first.

Validity study

The sample consisted of the first 100 consecutive patients aged 15 years or above who attended the Internal Medicine Outpatient Clinic, Kentrikon Hospital, Thessaloniki, Greece and agreed to participate in the study. Nine patients refused to cooperate for various reasons and were replaced by the following consecutive attenders. After each patient had been seen by the physician, a psychiatric nurse explained the purpose and the procedure of the study. Demographic information was collected and the patient was asked to complete the Greek version of the GHQ-60, after which the nurse ensured that all questions had been answered. Each subject was then


interviewed by a second psychiatric nurse who at no time was aware of the GHQ scores. The psychiatric interview was conducted using the Greek version of the PSE (18) and the psychiatric nurse was trained and experienced in its use. The PSE is a semistructured interview and a glossary helps the interviewer to rate items in a standard way. The ratings can then be processed by the computer programs CATEGO (14) and Index of Definition (ID) (19). The ID allows 8 degrees of definition of symptoms to be established; at the threshold level (5) and definite levels (6-8), disorders are sufficiently well defined to be considered pathological, and therefore subjects allocated to level 5 or above are reported as cases. Subjects could also receive a CATEGO diagnosis, which can lead to an ICD-9 diagnosis (20). Spearman rank correlation coefficients between GHQ scores and PSE-ID ratings were estimated, to assess how well the GHQ measures the severity of mental disturbance. The necessary 5 validity coefficients were calculated and the best cut-off points determined. These coefficients are: a) sensitivity: the probability that a true case will be correctly identified by the questionnaire; b) specificity: the probability that a normal person will be correctly identified by the questionnaire; c) positive predictive value (PPV): the probability that a person who scores above the cut-off point on the questionnaire is a true case; d) negative predictive value (NPV): the probability that a subject who scores below the cut-off point is really normal according to the interview as well; and e) overall misclassification rate (OMR): the proportion of subjects wrongly classified by the questionnaire (false positives plus false negatives). The best cut-off point is that which gives the best trade-off between sensitivity and specificity.
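The 5 coefficients defined above all follow from the 2 x 2 table of questionnaire result against interview result, as the following minimal sketch shows. The function name `validity_indices` is hypothetical, and the four cell counts are an assumption: they are not given in the paper, but were chosen to agree with the figures reported for the GHQ-60 in the Results (100 patients, 46 PSE cases, 48 high scorers at the 11/12 cut-off).

```python
def validity_indices(tp, fp, fn, tn):
    """Compute the 5 validity coefficients from a 2 x 2 table.

    tp: cases scoring above the cut-off     fp: non-cases scoring above it
    fn: cases scoring below the cut-off     tn: non-cases scoring below it
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "omr": (fp + fn) / total,
    }


# Assumed counts, consistent with 46 cases, 54 non-cases and 48 high scorers.
indices = validity_indices(tp=40, fp=8, fn=6, tn=46)
for name, value in indices.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts the indices round to 87%, 85%, 83%, 88% and 14%, which matches the values the paper reports for the GHQ-60.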
The easiest method which makes the sensitivity vs specificity trade-off explicit is that of the receiver-operating characteristic (ROC) analysis (2). ROC analysis has been applied in medicine to evaluate the interpretation of radiological images (21), to evaluate recall in depression (22) and to assess competing diagnostic criteria (23). Some investigators have applied ROC analysis to the study of the GHQ (24, 25). A ROC curve is obtained by plotting sensitivity against false-positive rate for all possible GHQ cut-off points. Thus, apart from an easy choice of the best cut-off point, the curve allows for the assessment of the discriminating ability of the GHQ across the total spectrum of morbidity. The better the performance of the test, the closer the ROC curve to the boundary of the square and further away from the leading diagonal. The above-mentioned validity indices were estimated for GHQ-60, -30 and -28. The items of the shorter forms were disembedded from the larger set.
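How a ROC curve yields a cut-off point can be sketched as follows. The sketch computes the empirical (unfitted) ROC points, whereas the paper reports fitted curves, and it selects the cut-off that maximises sensitivity minus false-positive rate (Youden's index); the paper does not state its selection rule, so this criterion is an assumption. The function names and the ten scores are hypothetical.

```python
def roc_points(scores, is_case):
    """Return (threshold, sensitivity, false_positive_rate) for every cut-off.

    A threshold t corresponds to the cut-off written (t-1)/t in the text:
    subjects scoring t or more are treated as high scorers.
    """
    n_cases = sum(is_case)
    n_noncases = len(is_case) - n_cases
    points = []
    for threshold in range(min(scores), max(scores) + 2):
        tp = sum(1 for s, c in zip(scores, is_case) if c and s >= threshold)
        fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= threshold)
        points.append((threshold, tp / n_cases, fp / n_noncases))
    return points


def best_cutoff(points):
    """Pick the point with the largest sensitivity minus false-positive rate."""
    return max(points, key=lambda p: p[1] - p[2])


# Hypothetical data: six non-cases followed by four cases.
scores = [0, 1, 2, 3, 4, 6, 5, 7, 8, 9]
is_case = [False] * 6 + [True] * 4

threshold, sensitivity, fpr = best_cutoff(roc_points(scores, is_case))
print(f"cut-off {threshold - 1}/{threshold}: "
      f"sensitivity {sensitivity:.2f}, false-positive rate {fpr:.2f}")
```

For these made-up scores the selected cut-off is 4/5, where all four cases and one of the six non-cases score above the threshold.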

Greek GHQ

Results

Translation study

Of the 50 bilingual subjects 25 were women; 30 were married and 20 were single. Their mean age was 36 ± 9 years (range 17-55). Forty-six had a university degree and 4 a full secondary education. Initially, the GHQ scoring method (0-0-1-1) was applied on all 3 forms. The endorsement frequencies for each question were assessed, i.e., the proportion of the subjects who answered the questions in a pathological manner. The range of endorsement frequencies for the Greek version was as follows: GHQ-60: 0.00-0.16, GHQ-30: 0.00-0.18 and GHQ-28: 0.02-0.14. For the English version the corresponding results were: 0.00-0.22, 0.00-0.18 and 0.00-0.14, very similar to those of the Greek. Twenty-four items of the GHQ-60, 6 of the GHQ-30 and 8 of the GHQ-28 were found to have exactly the same endorsement frequencies in both languages. The difference in frequency did not exceed 0.10 for any question in any form. The average difference for the GHQ-60 was 0.020 (2.0%), for the GHQ-30 0.026 (2.6%) and for the GHQ-28 0.023 (2.3%). When the item responses were rescored by the Likert method (0-1-2-3), the internal consistency of the test was computed separately for the 2 versions of each form. The resulting alpha coefficients were extremely high and very similar. GHQ-60: English version, 0.94, Greek version, 0.96; GHQ-30: English version, 0.97, Greek version, 0.94; GHQ-28: English version, 0.94, Greek version, 0.93. In addition, by applying the same scoring method, correlations were assessed between the answers given in the English and Greek versions for each question separately and between the 2 versions for each subject. Cohen’s kappa (26-28) was applied. Finally, the correlation of the total scores of the 2 versions was assessed using Pearson’s product-moment correlation. In the item-by-item analysis, the kappas for all items of the GHQ-30 and GHQ-28 were found to be statistically significant.
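The endorsement frequencies and alpha coefficients described above can be computed as in the following minimal sketch. It assumes answers coded 0 to 3 (the Likert coding), treats options 2 and 3 as the pathological answers that score 1 under the 0-0-1-1 method, and uses the standard formula for Cronbach's alpha. The function names and the small data set of 4 subjects by 3 items are hypothetical.

```python
from statistics import pvariance


def endorsement_frequencies(responses):
    """Proportion of subjects answering each item pathologically (option 2 or 3)."""
    n_subjects = len(responses)
    n_items = len(responses[0])
    return [
        sum(1 for row in responses if row[i] >= 2) / n_subjects
        for i in range(n_items)
    ]


def cronbach_alpha(responses):
    """Cronbach's alpha on Likert-coded answers (rows = subjects, columns = items)."""
    k = len(responses[0])
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers: 4 subjects, 3 items, coded 0..3.
responses = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 1, 3],
    [3, 2, 3],
]

print(endorsement_frequencies(responses))    # [0.5, 0.25, 0.5]
print(round(cronbach_alpha(responses), 3))   # 0.931
```

Population variances are used for both the items and the totals; using sample variances throughout would give the same alpha, because only the ratio of the two enters the formula.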
In the GHQ-60, kappa was nonsignificant for 2 questions: “been afraid that you were going to collapse in a public place” and “felt you’re just not able to make a start on anything”. The total scores were significantly correlated for all 3 forms. The product-moment correlation coefficient was 0.96 for the GHQ-60 and GHQ-28 and 0.95 for the GHQ-30. The comparison of the 2 versions for each subject separately shows that 48 subjects of 50 (96%) on GHQ-28 and 47 subjects (94%) on GHQ-30 and GHQ-60 had a significant correlation for the pattern of their answers in Greek and English. (Details for the above kappa correlation coefficients are available on request.)
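The agreement statistic used in the item-by-item and subject-by-subject comparisons can be sketched as follows. The sketch implements the unweighted form of Cohen's kappa; the paper cites references 26-28 without saying whether a weighted variant was used, so the choice of the unweighted form is an assumption. The function name and the two lists of Likert-coded answers are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two paired lists of categorical answers.

    Not defined when chance agreement equals 1, i.e. when both lists
    contain a single identical category; that case is not handled here.
    """
    n = len(ratings_a)
    observed = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical answers of 10 subjects to one item, in each language.
english = [0, 0, 1, 1, 2, 2, 3, 1, 0, 1]
greek = [0, 0, 1, 1, 2, 1, 3, 1, 0, 2]

print(round(cohen_kappa(english, greek), 3))  # 0.714
```

Here 8 of the 10 answers agree (observed agreement 0.80) while chance agreement is 0.30, giving a kappa of about 0.71. The same function applies to the subject-by-subject comparison if the two lists hold one subject's answers to all items in each language.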

Validity study

Of the 100 medical outpatients, 68 were women; 58 were married, 15 single, 19 widowed and 8 divorced. Their mean age was 49 ± 12 years and their mean education in years was 7 ± 4. The analysis of the PSE data according to the ID program revealed that 46 were allocated to levels 5-8 and therefore they have been considered psychiatric cases. The correlation between GHQ scores and PSE-ID ratings was: GHQ-60: 0.78, GHQ-30: 0.77, GHQ-28: 0.77 and CGHQ: 0.74. Fig. 1-4 refer to the curves obtained by the ROC analysis of the GHQ-60, GHQ-30, GHQ-28 and CGHQ respectively. The best threshold scores for the GHQ forms are: GHQ-60: 11/12; GHQ-30: 5/6; GHQ-28: 4/5; and CGHQ: 11/12. At these cut-off points, the GHQ-60 has a sensitivity of 87%, a specificity of 85%, a PPV of 83%, a NPV of 88% and an OMR of 14%; the corresponding indices for the GHQ-30 are 80%, 91%, 88%, 84% and 14%; for the GHQ-28, 89%, 81%, 80%, 90% and 15%, and for the CGHQ, 80%, 83%, 80%, 83% and 18%. At the same cut-off points, the ratios of high to low scorers are 48/52 for the GHQ-60, 42/58 for the GHQ-30 and 51/49 for the GHQ-28. The 4 subscales of GHQ-28 are by no means independent (Table 1). Testing the influence of some demographic variables on the behaviour of the GHQ, the various validity indices were calculated in different subgroups of the sample. Although the differences were not statistically significant for the various GHQ forms and on different cut-off points, a tendency emerged for men and married people to manifest better sensitivity and NPV and for women and nonmarried people to have better specificity and PPV. Finally, the more educated patients (> 6 years of education) yielded higher rates on all 4 indices than did the less educated (≤ 6 years of education).

Discussion

This study clearly indicates through various approaches that the Greek translation of the GHQ-60, -30 and -28 is highly accurate.

Table 1. Intercorrelations between the 4 subscales and total GHQ-28 score

Column headings: A scale, B scale, C scale, D scale
Row headings: B scale, C scale, D scale, 28-item total GHQ score
Subscale rows (values in extracted order): 0.40 0.35 | 0.41 0.55 | 0.33 -
28-item total GHQ score: 0.78 0.83 0.66 0.70




Fig. 1. The fitted ROC curve of the GHQ-60. Horizontal axis: false positive rate; cut-off point: 11/12.

Two important issues were kept in mind concerning the accuracy of the translation. The first is the memory factor. If the 2 versions are completed too closely in time, the accuracy of the results could be influenced by memory effects. On the other hand, if the time gap is too long, a possible alteration of the subject’s psychological state might be a confounding variable, giving false results. This is particularly true for the GHQ, which is designed to detect relatively acute changes, many of which are short-lived (2). A time gap of 2-5 days was considered most appropriate. The second issue concerns the possible effects of the sequence of versions. This was addressed by arranging the subjects so that half answered the English version first and the Greek second, and the other half the reverse. Comparing our findings to those of Chan (29), who used bilingual subjects for the Chinese version of the GHQ-60, the Greek translation manifests fewer discrepancies from the English than did the Chinese. For instance, the differences in endorsement frequencies were not greater than 0.10 between the English and the Greek versions in any question, whereas in the Chinese version the difference was much greater in 7 questions, reaching the 0.30 level. Furthermore, Chan’s study had only 9 questions with equal endorsement frequencies in both versions,

whereas in the present study there were 24 questions, almost 3 times as many. The item-by-item comparison between the 2 versions showed that 2 questions had no significant correlation between the Greek and the English, whereas the Chinese study had 8 such questions. The comparison between the scores in the 2 versions for each subject separately showed them to be highly similar. Minor differences should not be attributed to weaknesses of the translation, as such phenomena occur even when a questionnaire is completed on 2 occasions in the same language. The issue of translation is complex and “the very essence of ethnographic research”, particularly for anthropologists (30). Words and expressions are, in fact, symbols that condense a distinctive set of meanings, a culture-specific semantic network (31). So, the cultural concept appears to play a significant role and therefore greater cultural differences result in more problems in the translation and standardization of a test (32). For instance, there were tremendous difficulties in the translation of the Minnesota Multiphasic Personality Inventory (MMPI) in Japan and Pakistan, because of the great cultural differences between these cultures and that of the United States (3). Kinzie et al. (33) report that the word “guilt” does not discriminate depressed from nondepressed Vietnamese-Americans as the expression “shameful and dishonored” does. The Greek culture is mainly a European one, and is thus much closer to the culture on which the English language is based than is the Chinese. We believe that this is the main reason why the Greek translation of the GHQ seems to manifest fewer discrepancies from the English prototype than the Chinese. Another reason could be that the Chinese translation was the work of a single person, whereas a group of 4 people participated in the English-to-Greek translation. It has been found that translations by 1 person appear to lack accuracy even when the translator is fluent in both languages (34, 35).

The results suggest that all 3 forms of the Greek version of the GHQ yield satisfactory validity. This has been confirmed by the fact that the correlation coefficients are slightly better than those reported as median coefficients by Goldberg & Williams (2) in their overview of 22 known similar studies. The median coefficient for the GHQ-60 was 0.72, for the GHQ-30, 0.59 and for the GHQ-28, 0.76, and the coefficients in this study are 0.78, 0.77 and 0.77 respectively. Comparing our results to those of the studies that investigate the validity of the GHQ vs PSE, only that of Rabins & Brooks (12) gives a better correlation coefficient (0.83) for the GHQ-28.

Fig. 2. The fitted ROC curve of the GHQ-30. Horizontal axis: false positive rate; cut-off point: 5/6.

The sensitivity and the specificity are acceptable and quite similar to the median values reported in Goldberg & Williams’ (2) overview, which were 79% and 87% for the GHQ-60, 81% and 80% for the GHQ-30 and 86% and 82% for the GHQ-28. These validity indices, as well as the other coefficients of this study (PPV, NPV, OMR), are compatible with those of other studies from countries of similar cultural background using comparable samples from medical outpatient departments or primary care settings (25, 36). Greece does not have a system of general practitioners. Therefore, the internal medicine outpatient clinics, especially in urban areas, function to an extent as primary care settings. One reason for the satisfactory results could be the careful translation work from English to Greek. A second reason could be the absence of any time interval between the completion of the GHQ and the PSE interview. Since the GHQ is designed to detect relatively acute changes in state, many of which are short-lived, the consequence of a long delay between the 2 stages might be that a proportion of high scorers would be rated as non-cases on subsequent interview and regarded as false positives. A similar albeit smaller effect can be expected from people becoming ill between the completion of the


Fig. 3. The fitted ROC curve of the GHQ-28. Horizontal axis: false positive rate; cut-off point: 4/5.

questionnaire and the interview, and thus being regarded as false negatives. The above phenomena are more evident if the time gap is longer than a week, as Goldberg & Williams (2) report, suggesting that the delay between stages be as short as possible. Sensitivity and specificity depend on the ratio of high to low GHQ scorers in the sample and a ratio of 50:50 is nearly ideal, giving high values to both indices (37). This ratio in this study does not depart significantly from equality on any of the GHQ forms and it is quite similar to those reported in other studies carried out in similar clinical settings and in countries with similar cultural background (25, 36). Furthermore, it should be pointed out that the proportion of cases (46%) identified by the PSE is not considerably different from that (41%) Lobo et al. (36) found in an internal medicine outpatient clinic in Spain. The best threshold scores, 4/5 for the GHQ-28 and 11/12 for the GHQ-60, are the same as those proposed by the vast majority of other investigators (2). For the GHQ-30, the most popular cut-off point is 4/5 (13 of 31 studies), as the previous authors report, but 6 other studies found 5/6 most appropriate. Our findings do not support those of Finlay-Jones & Murphy (11), who stated that the best threshold score for the GHQ-60 vs the PSE has to


be as high as 19/20, nor the view that higher cut-off points are needed in more expressive cultures. According to this study as well as to those from Spain (36), Italy (4) and Mexico (6), the proposed threshold scores are not consistently higher than those found in England, Iceland and the United States (2). The comparison between the GHQ-30 scored in the usual way and the CGHQ indicates that the revised scoring system does not improve the screening capacity of the instrument, contrary to the suggestion of Goodchild & Duncan-Jones (15). Our results have shown that the 2 methods give exactly the same sensitivity and that the new system provides somewhat lower values for the other validity coefficients as well as for the correlation coefficient (PSE-ID vs CGHQ). Almost the same conclusions have been reported by other investigators (38-40). Goldberg & Williams (2) suggest that it would be sensible for the time being to use the new scoring method in addition to, rather than instead of, the conventional method. The 4 subscales of the scaled version (GHQ-28) are by no means independent (41) and our results indicate that this is also the case for the Greek version. The finding that the highest correlation (0.83) was that between scale B and the total score supports the view that anxiety is a core symptom in the majority of the common mental disorders.




Fig. 4. The fitted ROC curve of the CGHQ. Horizontal axis: false positive rate; cut-off point: 11/12.

As for the influence of some demographic variables on GHQ behaviour, the majority of the investigators (38, 42-44) report that sensitivity is lower in men than in women, whereas this is not the case for specificity. On these issues the present findings indicate the opposite but are in agreement with the view of Tarnopolsky et al. (37) that women manifest higher PPV than men. However, it should be noted that the small sample size does not permit clear and unequivocal conclusions on the effects of sociodemographic variables on GHQ behaviour. Given that the validity of all forms is quite similar, it is difficult to recommend a particular form as being superior to others for use. Staying in line with Goldberg & Williams’ (2) assertion, we can only suggest that if there is not plenty of time for the completion of the test or the subjects are illiterate and the questions must be read aloud, then the shorter forms are more appropriate. Furthermore, if more information is needed beyond a single score for case identification, then the best choice is the GHQ-28. In conclusion, the Greek translation of the GHQ is accurate and the Greek version is equivalent to the original English one. The test in all its forms is valid for use as a screening instrument for detection of psychiatric cases among the Greek population. This

confirms again the good performance of the questionnaire in different languages and cultures and suggests that a common language of psychological distress cuts across cultural barriers (45).

Acknowledgements

The authors wish to express their thanks to Professor David Goldberg for valuable advice. Copyright for the translation and the use of GHQ in Greece has been obtained by NFER Publishing Co.

References

1. GOLDBERG DP. Manual of the General Health Questionnaire. Windsor: NFER-Nelson, 1978.
2. GOLDBERG DP, WILLIAMS P. A user’s guide to the General Health Questionnaire. Windsor: NFER-Nelson, 1988.
3. BUTCHER JN, PANCHERI P. Handbook of cross-national MMPI research. Minneapolis: University of Minnesota Press, 1976.
4. FONTANESI F, GOBETTI C, ZIMMERMANN-TANSELLA CH, TANSELLA M. Validation of the Italian version of the GHQ in a general practice setting. Psychol Med 1985: 15: 411-415.
5. MUNOZ PE, VASQUEZ JL, PASTRANA E, RODRIGUEZ F, ONECA C. Study of the validity of Goldberg’s 60 item GHQ in its Spanish version. Soc Psychiatry 1978: 13: 99-104.

