WORK A Journal of Prevention, Assessment & Rehabilitation

ELSEVIER

Work 7 (1996) 3-11

Cross-translation of the job descriptive index: Validity and reliability procedure for the translation

Ramzi Nasser a, Bahia Diefenbach *,b

a The Emirates Center for Strategic Studies and Research, P.O. Box 4567, Abu Dhabi, United Arab Emirates
b The United Arab Emirates University, United Arab Emirates

Received 20 November 1995; accepted 8 December 1995

Abstract

This study outlines a translation procedure for an attitudinal instrument. It investigated the cross-translation of the 'type of work' subscale of the Job Descriptive Index. In the cross-translation, or committee translation, procedure, two or more translators translate a text from the source to the target language, and an expert then assesses the validity of these translations. In this study, three translators translated the instrument from English to Arabic, and an expert assessed the three translations. The method was evaluated by having 180 bilinguals complete the source-language instrument and later the translated, target-language instrument. The two versions were then compared through ANOVA, correlational analyses and factor analyses. The results indicated high reliability for the Arabic and English versions. The committee translation approach provides a valid method for translation; the results, however, showed that the instrument in the two languages did not show item-to-item similarity or equivalence.

Keywords: Cross-translation; Job descriptive index; Procedure; Reliability

1. Introduction

The generality of hypotheses or theories tested in non-Western cultures is fundamental to cross-cultural research. Researchers attempting to study a social science or educational problem in a foreign culture encounter constraints. The situation is more complicated when psychometric instruments or attitudinal survey questionnaires are adapted or translated for a foreign culture or a target language.

Generally, translations of attitudinal scales and psychometric tests follow one of three methods: the cross-translation (or committee translation) method, the back-translation method and the decentering method. In committee translation, a panel of experts translates from a source to a target language. In back-translation, experts translate from the source language to the target language and then back to the source; valid results are achieved when the back-translation is similar to the source-language text. In decentering, translators translate into the target language and modify the source so that it is equivalent to the translation in the target language.

The committee translation method provides an important and theoretically viable alternative to the back-translation method. Although the method has been applied to narratives and essay texts, its use in the translation of psychometric or single-item questionnaires has been limited.

Translating psychometric tests cross-culturally is problematic for researchers attempting to provide validity and reliability measures. The translation of a question must correspond with the response format, the saliency of the question and the type of response in the context of the target culture; if this correspondence is not satisfied, the meaning of the item could change. Furthermore, educationalists and social scientists rely on tests written in the native language, assuming that bilingual foreign respondents react to the instrument as native speakers do. Often, researchers rely on a professional translator or a friend to provide a translation, with little concern for reliability or validity. These types of translations have not been sound in their applicability, use and limitations.

The back-translation method has often been regarded as a panacea that produces dependable results. The method relies on a number of translators who work with a complementary group of back-translators from the target to the source language. In the back-translation method, however, most translators do not have the same knowledge and cognitive domain in both languages. In most cases, equivalence is desired in the translation from source to target language; however, the translators in both languages are not subject experts and are not familiar with the context, the purpose of the translation, the operational definitions or the criterial basis for the instrument's use. Furthermore, the back-translation method cannot be used effectively where single words (e.g. adverbs or stimuli for word association) are translated without a contextual frame; in such a case it is extremely problematic to capture the meaning in the source language and provide its equivalent in the target language. A significant literature review by Brislin (1970) discussed some of the major problems of the back-translation method, indicating that they result from the translation from source to target language and vice versa, i.e. inaccurate translations from source to target produce wrong translations from the target language back to the source language. However, none of these studies suggests a viable method for the committee translation approach.

Campbell et al. indicated that the committee translation approach is one sound approach for the translation of instruments with single words or adjectives. In this process a group of bilinguals translates from a source to a target language; these translations are then consensually assessed by other members of a committee or by professional translators. Some researchers argue that the committee translation method cannot be used as the only means of translation. One of the disadvantages of the cross-translation approach, in connection with materials or instructions, is the low number of translators who translate from the source to the target language. Such translations often rely on a single person, so that a large number of inadequacies, inequivalences or similarities go undetected, and questions that are supposed to be equivalent in the source and target languages act as different stimuli for different respondents. Sechrest et al. (1972) stated that cultures often have vernacular, classical and written languages that are used interchangeably, which makes an instrument difficult to adapt to a specified group. In some instances words in English have no equivalent in a language like Arabic or Urdu, and overall translations do not capture the meaning of the construct without modification of the linguistic construct in the source language. Where the source language does not have an equivalent term in the target language and an instrument is built on partiality, i.e. the instrument does not fulfill the construct domain of items, many of the attitudinal properties or constructs could be lost in the translation to the target version. Decentering often solves this problem, because modifications are made to the source version to adjust it to the target version: based on the knowledge of a professional translator, the source is modified to fit the content found in the target language. This view regards equivalence of meaning as the most important aspect of translation and as the basis for judging its quality (Nida, 1964; Catford, 1965).

In this paper a committee approach is proposed for the Job Descriptive Index subscale of 'work.' The approach offers a sound translation procedure that supports the validity and reliability of the translation. The cross-translation method proposed should overcome much of the tedium involved in the back-translation procedure. This study does not compare the three methods of translation; instead, it concentrates on the reliability and validity of an application of the cross-translation method, and provides an accumulation of information about a theory and method of translation equivalence as applied to empirical data.

* Corresponding author.

1051-9815/96/$15.00 © 1996 Elsevier Science Ireland Ltd. All rights reserved. PII: S1051-9815(96)00182-9

R. Nasser, B. Diefenbach / Work 7 (1996) 3-11

2. The job descriptive index (JDI)

One of the most widely used instruments for studying job satisfaction is the JDI (Smith, Kendall and Hulin, 1969). It has been translated into several languages: Hebrew (Hulin and Mayer, 1986), Tagalog (DeVera, 1985) and French (Candell and Hulin, 1986). The instrument has been administered to staff at all organizational levels throughout the United States. It measures job satisfaction in five areas: pay, promotion, supervision, type of work and co-workers. The JDI consists of 72 items allocated among the five areas as follows: work, supervision and co-workers have 18 items each; pay and promotion have nine items each. Reliability and validity have been strongly established for the five areas. The split-half internal consistency coefficient is reported at above r = +0.80 for each of the five scales (Hulin, 1966).

The scoring scheme of the JDI is traditionally a 3, 2, 1 scale. The respondent is asked to write 'yes' to agree with the adjective, 'no' to disagree and '?' if undecided. Agreement responses ('yes' to positive items and 'no' to negative items) receive a score of 3; disagreement responses ('no' to positive items and 'yes' to negative items) receive a score of 1; undecided responses ('?') receive a score of 2. In this study the scoring scheme was changed from a three-point scale to a nine-point scale from strongly agree to strongly disagree. Johnson, Smith and Tucker (1982) reported that the internal consistency coefficients were high for the Likert format, with an alpha of 0.87. The alternative scoring scheme makes little difference to the overall estimate of convergent validity. However, Johnson et al. (1982) indicate the advantage of non-significantly skewed responses on the Likert (five-point) scale over the three-point scale.

In this study the sample consisted of university students. One area of the instrument, 'type of work' satisfaction, was reconceptualized as 'school-work' satisfaction.

3. Methodology

This research is concerned with three important procedures and analyses in the cross-translation process:

1. A construct validation of the translation.
2. A test on the source version of the instrument, followed by a retest on the translated target version.
3. A demonstration of comparable ratings by bilinguals on the Arabic and English versions of the test, to provide further evidence for construct validity.

The committee method was used to translate the instrument from the source to the target language. First, three expert translators translated the instrument. They translated all of the instrument's instructions, i.e. the items as well as the information-gathering questions. In addition, the translators were asked to translate the responses. They examined each item phrase carefully and tailored the translation to the target culture. The translators' professional backgrounds (education, degrees and previous experience in the profession) are reported in Table 1.

Table 1
Professional translators' background

Translator         Degree  Education                     Years in profession
A                  Ph.D.   Linguistics                   7 years
B                  M.A.    Philosophy                    6 years
C                  M.A.    English as a second language  5 years
Expert translator  Ph.D.   Linguistics                   9 years

A coding scheme was devised to compare the translations made by the committee of translators, each a single English word or phrase in adjective form translated into an Arabic adjectival phrase. The three translations were compared on syntactic, vocabulary and structural equivalence. A blind expert rater was asked to rate the translations according to the degree of convergence of the items translated by the three main translators. A translation judged correct by the expert rater was scored '3', a translation that diverged slightly from the true meaning (e.g. using a verb instead of an adjective) was scored '2', and a translation that diverged completely from the meaning was scored '1.'

The general paradigm of this scoring procedure is based on Kerlinger's (1986) method of congruence, in which a panel reflects, defines or translates a number of items based on the specification and operationalization of a construct or a translation. In a second step, an expert rater rates whether the items logically or adequately reflect the objective specification (Dagostino and Carifio, 1993) or the item to be translated. Agreement among raters, judged against the expert rater, supports the specification and reflects equivalence, which is evidence for the validity of the instrument across languages. According to Frick and Semmel (1978), assessing the degree of agreement between raters and a criterion is a more appropriate analysis than inter-judge agreement.

A target sample of 180 bilingual students was selected from an American university in Beirut, Lebanon. These students were given the English version of the test. Within an 8-week period the same students took the Arabic version. Questionnaires with incomplete answers were not considered in this study.

4. Results

Two hypotheses were investigated in this study. The first asked whether the translations produced by the translators were in agreement; it was hypothesized that high agreement would reflect equivalence of items. The second asked whether the instruments achieved equivalence in the item responses on the Arabic and English instruments; similar responses would indicate high equivalence of item meaning and translation. In connection with these questions, a repeated measures ANOVA, correlation analyses and a factor analysis were performed.

The first question addressed the inter-rater agreement when all the judges on all activities are analyzed as a group. This was done based on the expert rater's ratings of the translations. Table 2 presents the frequencies and percentages of inter-rater agreement by type across all translations. The square root of the agreement percentage approximates the inter-rater correlation (agreement) coefficient (Kerlinger, 1986). The inter-rater agreement was r = +0.57, which is a modest correlation value.

The first analysis estimated the reliability of the 'school-work' subscale of the job descriptive index for the students. The Cronbach alpha reliability was 0.67 (N = 144) for the 18 items. The reliability of the 18 translated (Arabic version) items was 0.81 (N = 47). The strong alphas demonstrate that the subscale assesses the construct it attempts to assess, i.e. it does so equally reliably among students who take the Arabic and English versions of the test.
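The reliability coefficients reported above are Cronbach alphas. As a minimal sketch of how such a coefficient is computed, assuming complete responses and using the standard formula alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), the following may help; the function name and the example scores are hypothetical and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores for four respondents on three items (not study data).
example = [
    [7, 8, 6],
    [4, 5, 5],
    [9, 9, 8],
    [2, 3, 4],
]
print(round(cronbach_alpha(example), 3))
```

For the 18-item subscale, each row would hold one student's 18 item scores, with negatively worded items reverse-scored first.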

Table 2
Percentages of inter-rater agreements by type of agreement across all translations

Type of agreement                                          Frequency  Percent  Cum. percent  R
1. All three translators agreed                            6          33.3     33.3          0.57
2. Two of three raters agreed; one partially agreed        4          22.2     55.5          0.47
3. Two of three raters agreed; one disagreed               2          11.11    66.6          0.57
4. Two of three judges disagreed or only partially agreed  6          33.3     100
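The percentages in Table 2, and the square-root approximation of the agreement coefficient described in the text, can be recomputed from the frequencies. The sketch below is illustrative only; the labels are shortened and the function name is hypothetical. For the first row it gives a square root of about 0.577, which the paper reports as 0.57.

```python
from math import sqrt

# Frequencies of each agreement type, as reported in Table 2 (18 items in total).
frequencies = {
    "all three agreed": 6,
    "two agreed, one partially agreed": 4,
    "two agreed, one disagreed": 2,
    "two disagreed or only partially agreed": 6,
}


def agreement_summary(freqs):
    """Return (label, proportion, cumulative proportion, sqrt of proportion) per type."""
    total = sum(freqs.values())
    rows, cumulative = [], 0.0
    for label, count in freqs.items():
        proportion = count / total
        cumulative += proportion
        rows.append((label, proportion, cumulative, sqrt(proportion)))
    return rows


for label, p, cum, r in agreement_summary(frequencies):
    print(f"{label:40s} {100 * p:5.1f}% {100 * cum:6.1f}% {r:.3f}")
```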

The alpha on the English version, however, was slightly lower than that on the Arabic version. One explanation for this result is that the students' primary language may be a source of the lower consistency. It was also observed that first-year and second-year students accounted for almost all the respondents who had come from an Arab monolingual home.

Three translators translated these items, and a single expert rater evaluated the convergence of the translations, which were then compared and scored. Because all translators could have the 'wrong' translation yet agree with each other, a deviation score between each rating and the correct response was computed so that this relativity was removed, making the results easily interpretable (Dagostino and Carifio, 1993). Each translator translated the 18 items, and the resulting ratings were correlated. Therefore, to take account of the correlations, an 18 × 3 repeated measures ANOVA was performed to assess the degree of intra- and inter-agreement of the judges.
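A repeated measures design of this kind, with every item rated for every translator, partitions the total sum of squares into a translator effect, an item effect and a residual. The following is a minimal sketch under the assumption of complete data, not the authors' actual computation; the function name and the example ratings are hypothetical.

```python
def repeated_measures_anova(scores):
    """scores[i][j] = rating of item i for translator j.

    Returns the degrees of freedom and the F ratios for translators
    (columns) and items (rows), each tested against the residual mean square.
    """
    n_items, n_judges = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_items * n_judges)
    item_means = [sum(row) / n_judges for row in scores]
    judge_means = [sum(row[j] for row in scores) / n_items for j in range(n_judges)]

    ss_items = n_judges * sum((m - grand) ** 2 for m in item_means)
    ss_judges = n_items * sum((m - grand) ** 2 for m in judge_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_items - ss_judges  # item-by-translator residual

    df_items, df_judges = n_items - 1, n_judges - 1
    df_error = df_items * df_judges
    ms_error = ss_error / df_error
    return {
        "df": (df_judges, df_items, df_error),
        "F_judges": (ss_judges / df_judges) / ms_error,
        "F_items": (ss_items / df_items) / ms_error,
    }


# Hypothetical 3/2/1 ratings for four items and three translators (not study data).
example_ratings = [
    [3, 3, 2],
    [2, 3, 3],
    [3, 2, 3],
    [1, 2, 1],
]
print(repeated_measures_anova(example_ratings))
```

With 18 items and three translators the degrees of freedom are 2, 17 and 34, matching the layout of an 18 × 3 design.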

Table 3
One-way repeated measures ANOVA for the 18 ratings of the three judges

Source       DF  MS    F     p
Translators  2   0.04  0.47  > 0.001
Ratings      17  0.28  1.86  > 0.001
Error        34  0.15

Table 3 presents a one-way repeated measures ANOVA on the 18 translations done by the three translators. As can be seen from Table 3, no significant differences were found between translators (judges) or between ratings (F = 0.47, df = 2, P > 0.001 and F = 1.86, df = 17, P > 0.001, respectively). The three raters agreed with each other and translated the 18 adjective phrases according to the criterion of the expert rater.

Forty-seven of the students who took the English version also took the Arabic version of the test; a number of students were asked to take the same test in Arabic within a 2-month period. Item correlations were computed between the Arabic and English items of the test. The correlations are reported in Tables 4 and 5. The highest correlation, 0.57, is reported for item 11. The lowest, a negative correlation, was -0.12.

Table 4
Correlation matrix between the Arabic and English translations: source language (English) items against the same target language (Arabic) items

Item  r      N
1     0.23   30
2     0.24   30
3     0.25   29
4     0.38   30
5     0.27   30
6     0.14   30
7     -0.12  30
8     0.13   29
9     0.57a  30
10    0.36   30

a Significant at P < 0.01 level.

Understanding the empirical structure of, and the relationships between, all the variables on the subscale of the JDI in both the Arabic and the English versions, and comparing the structures, should provide validity evidence for the equivalence of the two tests. The data were factor analyzed using principal component analysis with unities in the diagonals, an eigenvalue cut-off of 1.0 and a varimax rotation; Table 6 presents the results. First the 18 items of the English version of the test were factor analyzed, followed by the Arabic version. The two factor structures were then compared. On the English version, factor analysis reduced the 18 variables to five factors. The first factor accounted for 20.6% of the variance, and all five factors together accounted for 60% of the variance. On the Arabic version, factor analysis also reduced the 18 variables to five factors. The first factor accounted for 29.5% of the variance, and the five factors accounted for 73.3% of the variance.
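The link between rotated loadings and the variance figures can be checked with a short calculation. The sketch below is illustrative and is not the authors' procedure: it computes an item's communality (h2) as the sum of its squared loadings across the five factors, using three rows as printed in Table 6, and it relies on the fact that with unities in the diagonal each item contributes one unit of variance. The function names are hypothetical.

```python
# Rotated loadings on factors I-V for three items, as printed in Table 6.
loadings = {
    "Fascinating": [0.62, 0.17, -0.08, -0.14, -0.15],
    "Boring": [-0.20, 0.00, 0.11, 0.81, 0.13],
    "Good": [-0.19, 0.80, -0.09, -0.07, 0.03],
}


def communality(item_loadings):
    """Sum of squared loadings of one item across the retained factors."""
    return sum(value ** 2 for value in item_loadings)


def percent_variance_explained(communalities):
    """Mean communality as a percentage of total variance.

    Valid when every item has unit variance (unities in the diagonal),
    so the total variance equals the number of items.
    """
    return 100.0 * sum(communalities) / len(communalities)


for item, values in loadings.items():
    print(f"{item:12s} h2 = {communality(values):.2f}")
```

For these three items the computed values agree, to two decimals, with the h2 column of Table 6 (0.46, 0.73 and 0.69). Applying `percent_variance_explained` to all 18 communalities gives the share of variance attributed to the five factors.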

Table 7 presents the rotated factor structure of the Arabic version of the JDI, obtained in the same way: principal component analysis with unities in the diagonals, an eigenvalue cut-off of 1.0 and a varimax rotation.

Table 5
Source language (English) items 11-18 against the same target language (Arabic) items

Item  r     N
11    0.57  29
12    0.22  30
13    0.26  30
14    0.17  28
15    0.36  29
16    0.05  29
17    0.27  30
18    0.20  30
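Each entry in Tables 4 and 5 is a Pearson correlation between the same item answered in English and in Arabic. The N varies between 28 and 30, which suggests that respondents with a missing answer were dropped item by item; that reading is an assumption, as is everything else in the sketch below, including the function name and the example responses.

```python
from math import sqrt


def pearson_pairwise(english, arabic):
    """Pearson r between two lists of scores for one item.

    Respondents with a missing (None) score in either version are skipped.
    Returns (r, n), where n is the number of complete pairs.
    """
    pairs = [(e, a) for e, a in zip(english, arabic) if e is not None and a is not None]
    n = len(pairs)
    mean_e = sum(e for e, _ in pairs) / n
    mean_a = sum(a for _, a in pairs) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_a = sum((a - mean_a) ** 2 for _, a in pairs)
    return cov / sqrt(var_e * var_a), n


# Hypothetical responses of six bilinguals to one item (not study data).
english_item = [7, 5, 9, None, 3, 6]
arabic_item = [6, 5, 8, 4, None, 7]
r, n = pearson_pairwise(english_item, arabic_item)
print(f"r = {r:.2f} (N = {n})")
```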

Table 6
Rotated factor structure of the 'school-work' area of the English version of the job descriptive index (N = 144)

Item              I      II     III    IV     V      h2
1. Fascinating    0.62*  0.17   -0.08  -0.14  -0.15  0.46
2. Routine        0.03   0.02   0.35   0.68*  0.02   0.48
3. Satisfying     0.05   0.74*  -0.06  0.12   0.13   0.59
4. Boring         -0.20  0.00   0.11   0.81*  0.13   0.73
5. Good           -0.19  0.80*  -0.09  -0.07  0.03   0.69
6. Creative       0.06   0.80*  -0.14  0.00   -0.07  0.60
7. Respected      0.38   0.30   0.25   -0.35  0.15   0.44
8. Hot            0.80*  -0.21  0.03   0.12   -0.12  0.70
9. Pleasant       0.65*  0.39   -0.22  -0.04  0.05   0.62
10. Useful        0.07   0.44*  0.45*  -0.52  0.08   0.67
11. Tiresome      -0.03  -0.06  0.71*  0.00   -0.32  0.61
12. Healthful     0.71   0.02   0.21   -0.24  0.23   0.67
13. Challenging   0.27   0.47   0.31   -0.18  -0.30  0.51
14. On Your Feet  0.72*  -0.07  0.14   -0.04  0.08   0.55
15. Frustrating   0.05   -0.01  0.68*  0.31   -0.03  0.57
16. Simple        0.04   0.05   -0.07  0.06   0.90*  0.78
17. Endless       0.06   -0.09  0.71   0.02   0.11   0.53
18. Gives Sen.    0.16   0.53*  0.21   -0.37  0.01   0.48
Total                                                h2 = 60%

Table 7
Rotated factor structure of the 'school-work' area of the Arabic version of the job descriptive index (N = 47)

Item   I      II     III    IV     V      h2
1.     0.77*  0.16   -0.21  0.13   -0.36  0.81
2.     -0.01  -0.40  0.43   0.50*  0.34   0.70
3.     0.66*  0.43   0.17   0.01   0.11   0.66
4.     -0.01  -0.59  0.60*  0.16   0.10   0.74
5.     0.56*  0.12   0.28   0.61*  -0.05  0.78
6.     0.68*  0.15   0.28   0.02   0.31   0.65
7.     0.58*  0.19   0.38   -0.41  0.35   0.80
8.     0.51*  0.29   0.03   -0.65  0.03   0.77
9.     0.64*  0.60*  -0.17  -0.09  0.01   0.81
10.    0.64*  0.25   0.02   -0.01  0.19   0.50
11.    -0.02  0.15   0.87*  0.12   -0.03  0.79
12.    0.81*  0.20   -0.00  -0.07  0.16   0.72
13.    0.18   0.19   0.82*  -0.09  -0.03  0.75
14.    0.67*  0.05   0.57*  -0.14  -0.12  0.82
15.    -0.04  -0.09  0.72*  0.23   0.12   0.60
16.    0.04   0.19   0.10   0.80*  -0.12  0.70
17.    0.12   0.12   -0.01  -0.09  0.90*  0.84
18.    0.85*  0.16   0.10   0.10   0.08   0.77
Total                                     h2 = 70%

The main purpose of the factor analysis was to determine whether the structures of the source and target tests were similar. On the English version of the JDI, items 1, 8, 9 and 14 had relatively high loadings on the first factor, whereas on the Arabic version a greater number of items loaded on the first factor, including items 1, 5, 6, 7, 8, 9 and 12. Item 14, which loaded on the first factor of the English version, loaded on the second factor of the Arabic version. Items 3, 10 and 18 loaded on the second factor in both versions. Items 11 and 15 loaded on the third factor in both versions, and item 2 on the fourth. In total, nine items loaded on the same factors in both versions. Although the tests showed some similarities, the comparison of factor loadings does not indicate a similar factor structure. However, the communalities in Tables 6 and 7 were relatively high in both versions of the test, suggesting a common feature for the school-work subscale of the JDI. The translation and the expert ratings showed convergence on the translation specification, which provides evidence for the construct validity of the Arabic version of the test. Both the correlational analysis and the factor analysis show some similarities, but these do not warrant concluding that the two tests are equivalent.

5. Discussion

The most practical procedure for the translation of instruments is the committee approach. In the committee procedure a bilingual translator translates an instrument from a source language to a target language. This is particularly true when a content domain is represented as a short set of items or an adjective checklist (Werner and Campbell, 1970). Nevertheless, the translator might not be familiar with the context of the specified research. A cross-validation is therefore necessary in the cross-translation of an instrument. This is accomplished by using the convergent validity paradigm (Campbell and Fiske, 1959), in which one or more experts rate the translations on their adequacy and appropriateness.

This study attempted to present a methodology for the translation of the job descriptive index 'school-work' subscale. To account for equivalence in the translation procedure, several phases of the study were established. Three translators translated the instrument into the target language. An expert rater conducted a criterion-based rating of the translations by comparing the number of errors against the equivalence criteria. Within this process, the expert rater compares the source with the target versions and scores the target translation for its clarity, adequacy and appropriateness. A final version was then developed, and a group of students were given the English version of the instrument.

Equivalence often establishes the universality of the measures; the adequacy of a test translation rests on finding similar validity, reliability and factor structures. The validity of the English version of the JDI has been reported by Smith, Kendall and Hulin (1969). In this study, the construct validity of the Arabic version of the 18-item 'school-work' subscale was assessed by having three translators translate the items and an expert rater establish criteria for the 'correct translations.' One should note that inter-rater agreement is not the same as reliability of observational measures, since translators may have the wrong translations but all agree with each other. Thus, it was necessary to have an expert rater rate the translations. No significant differences were found among the raters, and the translations were construct validated based on the expert criterion.

To measure the reliability of the instrument, a correlation analysis was conducted between the responses on the instruments in both languages. The low correlations for the bilinguals between the source (English) and target (Arabic) versions do not support equivalence between them. These low correlations might have occurred because of the time span between the administrations of the two versions. Although the reliability scores for both the English and Arabic versions of the JDI were high, the factor analysis does not lend support to a similar factor structure. However, some similarities were found in the first factor, which accounted for 20.6% of the variance for the English version and 29.5% for the Arabic version. The latter result does not provide strong, reliable support, as the respondent-to-item ratio was well below the minimum 5:1 ratio. These results do not establish equivalent-form validity between the English and Arabic versions of the test; however, moderate construct validity was established, as the translators showed agreement based on an expert criterion.

In conclusion, the results provide evidence that the Arabic and English versions were not similar in structure. One should keep in mind that exact translation is impossible in principle and, more importantly, that the committee approach does not guarantee equivalent translation or hold the original language open for revision. A condition for equivalence of translation is to have bilinguals respond to items in both languages. The goal should not be identity at the item-by-item level but rather equivalence of means and variances, plus appropriate correlations.
Evidence from these results suggests that test-retest on a group reflects some linguistic and cultural factors; students who responded to the English version responded differently to the Arabic version, as reflected in the results. The selected sample of first-year and second-year students had come from Arab monolingual homes, where the saliency of these items in the native language was reflected in different attitudinal responses across the time gap between the pretest (English version) and the post-test (Arabic version).

A researcher who follows a method of translation in cultures or linguistic backgrounds different from his or her native one faces problems. Instruments cannot be ideally translated into equivalent forms; however, some specific methods can be applied to provide reliable translations. The results of this study suggest that cross-translation may not have been a viable method but could be used with the back-translation or the decentering method to provide more valid and reliable results.

References

Candell, G. and Hulin, C. (1986) Cross-language and cross-cultural comparisons in scale translations: Independent sources of information about item nonequivalence. J. Cross-Cult. Psychol. 17, 417-440.
Dagostino, L. and Carifio, J. (1993) Establishing the logical validity of instructional activities for teaching reading evaluatively. North Eastern Research Association Annual Conference.
deVera, M. (1985) Establishing cultural relevance and measurement equivalence using emic and etic items. Unpublished Doctoral Dissertation, University of Illinois at Urbana-Champaign.
Frick, T. and Semmel, M. (1978) Observer agreement and reliabilities of classroom observational measures. Rev. Educ. Res. 48(1), 157-184.
Hulin, C. (1966) Job satisfaction and turnover in female clerical population. J. Appl. Psychol. 50, 280-285.
Hulin, L. and Mayer, L. (1986) Psychometric equivalence of a translation of the JDI into Hebrew. J. Appl. Psychol. 62, 385-391.
Johnson, S., Smith, P. and Tucker, S. (1982) Response format of the job descriptive index: assessment of reliability and validity by the multitrait-multimethod matrix. J. Appl. Psychol. 67(4), 500-505.
Jung, K., Dalessio, A. and Johnson, S. (1986) Acad. Manage. J. 29(3), 609-616.
Kerlinger, F. (1986) Foundations of Behavioral Research. New York: Holt, Rinehart and Winston.
Lissitz, R. and Green, S. (1975) Effect of the number of scale points on reliability: A Monte Carlo study. J. Appl. Psychol. 60, 10-13.
Smith, P., Kendall, L. and Hulin, C. (1969) The Measurement of Satisfaction in Work and Retirement. Chicago, IL: Rand McNally and Company.
Sechrest, L., Fay, T. and Zaidi (1972) Problems of translation in cross-cultural research. J. Cross-Cult. Psychol. 3(1), 41-56.
Werner, O. and Campbell, D. (1970) Translating, working through interpreters and the problem of decentering. In: Naroll, R. and Cohen, R. (Eds.), A Handbook of Cultural Anthropology. New York: American Museum of Natural History.
