Original Research

Choice of Rating Scale Labels: Implication for Minimizing Patient Satisfaction Response Ceiling Effect in Telemedicine Surveys

Caterina Masino, MA,1 and Tony C.M. Lam, PhD2

1 Telehealth Program, UHN—University Health Network, Toronto, Ontario, Canada.
2 University of Toronto, Toronto, Ontario, Canada.

DOI: 10.1089/tmj.2013.0350

Abstract

Background: Lack of response variability is problematic in surveys because of its detrimental effects on sensitivity and, consequently, on the reliability of responses. In satisfaction surveys, this problem is caused by the ceiling effect resulting from high satisfaction ratings. A potential solution strategy is to manipulate the labels of the rating scale to create greater discrimination of responses at the high end of the response continuum. This study examined the effects of a positive-centered scale on the distribution and reliability of telemedicine satisfaction responses in a highly positive respondent population.

Materials and Methods: In total, 216 telemedicine participants were randomly assigned to one of three experimental conditions as defined by the form of Likert scale: (1) 5-point Balanced Equal-Interval, (2) 5-point Positive-Packed, and (3) 5-point Positive-Centered Equal-Interval.

Results: Although the study findings were not statistically significant, partially because of sample size, the distribution and internal consistency reliability of responses occurred in the direction hypothesized. Loading the rating scale with more positive labels appears to be a useful strategy for reducing the ceiling effect and increasing the discrimination ability of survey responses.

Conclusions: The current research provides a survey design strategy to minimize ceiling effects. Although the findings provide some evidence suggesting the benefit of using rating scales loaded with positive labels, more research is needed to confirm this, as well as to extend it to examine other types of rating scales and the interaction between rating scale formats and respondent characteristics.

Key words: ceiling effect, Likert scale, agreement labels, patient satisfaction, positive responses, telemedicine, patient characteristics, survey design, response bias

Introduction

Patient satisfaction is one of the most widely researched patient-oriented outcomes for assuring quality of services in telemedicine healthcare evaluation,1 and it is typically measured quantitatively through self-administered surveys, the most widely used instruments in healthcare research.


Similar to patient satisfaction findings from conventional healthcare delivery models,2,3 high levels of satisfaction have been uniformly observed in telemedicine patient evaluation research. Four separate systematic reviews of research on surveying patient satisfaction with telemedicine have consistently revealed very high satisfaction responses.4–7 This trend continues in recent surveys of nursing care patients8 and lung transplant recipients9 using home monitoring telemedicine. Low response variability in satisfaction surveys resulting from the ceiling effect is now a well-documented issue in the published conventional healthcare literature. A ceiling effect occurs when responses are not evenly distributed and "skew" toward the favorable end of the scale.10 When responses cluster at the opposite, nonfavorable end of the scale, this is termed the floor effect.11

Regardless of the nature of the high satisfaction responses, be it genuine satisfaction, response biases such as social desirability or acquiescence,3 or both, the ceiling effect stemming from high satisfaction responses makes it impossible for researchers and evaluators to rely on survey results to distinguish respondents with varying levels of satisfaction and, consequently, to analyze these responses through correlational analysis.12–14 Although the presence of the ceiling effect in patient satisfaction surveys is well recognized, only a few healthcare researchers have directly addressed it.11–13 Technically, the ceiling effect is an instrument issue; specifically, the instrument used to measure satisfaction is not sensitive enough to discriminate satisfaction levels among respondents.

To enhance the quality of survey instruments in telemedicine research, Demiris15 presented a set of principles of survey development for telemedicine research applications. Unfortunately, none of the principles pertains to strategies for addressing ceiling or floor effects due to skewed distributions with low variability of satisfaction responses. Regarding response scale design, the only recommendation is the use of a 5-point Likert scale with agree and disagree labels, the format most commonly used in telemedicine research.15 However, researchers and survey methodologists have long argued that survey responses are influenced by the descriptors or anchors used in the rating scale,13 and the typical 5-point Likert scale may not be the most appropriate for measuring improvements in satisfaction in patients who are already well satisfied.13 To combat ceiling effects in patient satisfaction by spreading response distributions, methodologists in the health sciences have proposed the following three rating scale construction strategies.


USE SATISFACTION WITH PRECEDING TREATMENT AS A BENCHMARK
Howorka et al.11 sought to increase the sensitivity of a validated diabetes treatment satisfaction questionnaire to enable the detection of small improvements in insulin treatment effects in an already well-satisfied patient population. The original instrument measured patients' absolute estimation of satisfaction, for example, "How satisfied are you with your current treatment?," and used a 7-point rating scale with only the end points labeled, with "very satisfied" and "very dissatisfied."11 In the modified version, they provided a comparison to the preceding treatment, for example, "Compared with your experience of treatment just over 3 months ago, how satisfied are you with your current treatment?" The rating scale remained a 7-point one but used relative comparison labels for the end points instead, for example, "much more satisfied now" and "much less satisfied now."11

Howorka et al.11 found significant increases in satisfaction in 25% more items in the modified form compared with the original. Reliability (Cronbach's alpha) of the responses to the modified form (0.96) was slightly higher than that of the responses to the original form (0.93). Findings from this study suggest that when a comparison-to-preceding-treatment or reference period is appropriate, a relative rating scale is more effective in detecting small increases in satisfaction than an absolute rating scale.11 This strategy of providing reference points to aid the respondents' memory retrieval process has been demonstrated in a variety of applications.16 However, a relative, comparison-to-preceding-treatment scale is not always appropriate; its use is limited by the nature and context in which telemedicine services are being used. For example, it would not be appropriate under conditions similar to those of the current study, where telemedicine is used in addition to, and does not replace, the existing standard of care.

USE OF A 5-POINT RATING SCALE WITH ALL POINTS LABELED
Irrespective of the type of labels used, the optimal number of scale points is an important factor to consider in scale design. Although scale points have been found to range from 3 to 11,3,7 most researchers agree that a 10-point scale is excessive, and preference is given to 5–7 points, with a 5-point scale being the most common.13 In a recent study that sought to improve the data quality of a validated inpatient satisfaction tool, the following two versions were used: a 10-point rating scale with only the end points labeled and a 5-point scale with all points labeled.13 In the questionnaire with the 10-point rating scale, the response labels varied according to the content of each item; for example, when asking whether the patient received enough information relating to his or her condition, the end-point anchors were "I received very little information" and "I received enough information."13 With the 5-point rating scale, all the scale points were labeled with generic anchors: "not at all," "to a small extent," "to some extent," "to a large extent," and "to a very large extent."13 Findings revealed different rating response distributions between the two forms.


The distribution obtained with the 10-point rating scale was more skewed (a J- or U-shaped distribution), with larger floor and ceiling effects, than that obtained with the 5-point version.13 There was also a significant difference in mean scores, with higher means observed with the 10-point rating scale for the majority of the items (21 out of 24).13 The recommendation to use 5-point all-labeled scales in healthcare13 is congruent with the survey guidelines in telemedicine proposed by Demiris.15
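For readers who wish to quantify the ceiling and floor effects discussed above in their own data, the following is a minimal Python sketch (ours, not taken from either study; the function name and the sample responses are hypothetical illustrations). It reports the share of responses at each extreme category and the statistical skewness of the distribution; note that a pile-up at the top of the scale produces a long tail toward the low end, that is, negative skewness in the statistical sense.

```python
import numpy as np
from scipy.stats import skew

def ceiling_floor_summary(responses, scale_min, scale_max):
    """Summarize how strongly responses pile up at either end of a rating scale."""
    r = np.asarray(responses, dtype=float)
    return {
        "pct_at_ceiling": float(np.mean(r == scale_max)) * 100,  # % choosing the top category
        "pct_at_floor": float(np.mean(r == scale_min)) * 100,    # % choosing the bottom category
        "skewness": float(skew(r)),  # negative value = long tail toward the low end
    }

# Hypothetical 5-point satisfaction responses clustered at the top of the scale
responses = [5, 5, 4, 5, 5, 4, 5, 3, 5, 4, 5, 5]
print(ceiling_floor_summary(responses, scale_min=1, scale_max=5))
```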

OFFSETTING THE MIDDLE AND EXPANDING THE AREA OF INTEREST
The third strategy to expand response variability, and consequently minimize the ceiling effect, is based on the idea that "average" need not be in the middle of a rating scale continuum.12,14 If we expect the responses to be mostly above average, we can move the "average" label below the middle point of the rating scale toward the negative end and consequently create more room to "pack" the rating scale with mostly above-average labels. For example, for a 5-point rating scale the "packed" labels may include "below average," "average," "above average," "very good," and "excellent." In a traditional Likert scale with agreement-type anchors, the typical Balanced Equal-Interval Rating Scale with labels of "strongly disagree," "disagree," "neutral," "agree," and "strongly agree" can be converted to a Positive-Packed Rating Scale by packing the middle labels with positive anchors: "strongly disagree," "neutral," "agree," "very much agree," and "strongly agree." The Balanced and Positive-Packed Rating Scales both use the same end-point labels ("strongly disagree" and "strongly agree"), but although the three middle labels in the Balanced Equal-Interval Rating Scale are equally spaced psychologically, the intervals among the three middle labels in the packed rating scale are bunched up at the agreement end. To create a packed rating scale with labels that are roughly equally spaced, we can replace the "strongly disagree" label with a more neutral label like "disagree" and create a Positive-Centered Equal-Interval Rating Scale with these labels: "disagree," "neutral," "agree," "very much agree," and "strongly agree."

The purpose of our research is to empirically test the strategy of "offsetting the middle and expanding the area of interest" to reduce ceiling effects in satisfaction surveys. Obviously, this strategy should be used only if there is strong evidence, as is the case in telemedicine satisfaction surveys, that the population consistently provides highly positive responses. We explored the feasibility of using Positive-Packed and Positive-Centered Equal-Interval Rating Scales to increase the variability and reliability of responses to satisfaction surveys in the telemedicine patient population. The telemedicine modality examined in the current study uses patient–provider two-way live interactive videoconferencing for patients to see their healthcare professional. We hypothesize that the Positive-Packed and Positive-Centered Rating Scales will provide better discrimination of responses in telemedicine satisfaction surveys than the Balanced Equal-Interval Rating Scale by producing (1) larger response variances and (2) more internally consistent (reliable) responses.

We posit our hypothesis with the assumption that respondents refer to the labels themselves, and not the positions of the labels, to


respond to the items. For example, with the Positive-Centered Equal-Interval Rating Scale, a respondent will pick "very much agree," and not the fourth position from the left, as his or her feeling about the statement. We further assume that researchers report the labels used in their rating scales so that the results can be properly interpreted. In this way, for example, an average response of 4.2 observed in research X, which used a 5-point rating scale with 4 being "very much agree" and 5 being "strongly agree," would be considered more positive than the same average in research Y, which used a 5-point rating scale with 4 being "agree" and 5 being "strongly agree." This is because, generally, we can conclude that participants in research X "very much agreed" with the satisfaction statements, whereas participants in research Y only "agreed" with them.

Finally, unlike other instrument-based bias reduction strategies such as the "forgiving" method17 and the randomized response technique,18 the strategy of loading the rating scale with positive labels tested in our research is not intended to eliminate response biases, such as socially desirable responding or other self-serving biases, that respondents bring with them to the survey. Rather, it is designed to minimize the ceiling effect caused by extreme positive ratings, which can result from genuine satisfaction and response biases in various combinations, so that further analysis can be meaningfully performed. As mentioned earlier, this strategy should be used only if we are certain that the population under survey has a history of providing highly positive satisfaction responses.
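To make the three scale formats concrete, here is a minimal sketch in Python (the dictionary and function names are our own) encoding each scale's label set and the numeric score assigned to each label; the label-to-score assignments follow those reported in Table 1.

```python
# The three 5-point rating scale formats compared in this study.
# Each label maps to the numeric score (1-5) assigned to it.
BALANCED_EQUAL_INTERVAL = {
    "strongly disagree": 1, "disagree": 2, "neutral": 3,
    "agree": 4, "strongly agree": 5,
}
POSITIVE_PACKED = {
    "strongly disagree": 1, "neutral": 2, "agree": 3,
    "very much agree": 4, "strongly agree": 5,
}
POSITIVE_CENTERED_EQUAL_INTERVAL = {
    "disagree": 1, "neutral": 2, "agree": 3,
    "very much agree": 4, "strongly agree": 5,
}

def score(label, scale):
    """Convert a respondent's chosen label to its numeric score on the given scale."""
    return scale[label]

# The same label can carry a different score depending on the scale format:
print(score("agree", BALANCED_EQUAL_INTERVAL))           # 4
print(score("agree", POSITIVE_CENTERED_EQUAL_INTERVAL))  # 3
```

The printed contrast illustrates the paper's caution that results can only be interpreted properly when the labels, not just the numeric positions, are reported.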

Materials and Methods
INSTRUMENT
The current study used the Ontario Telemedicine Network (OTN) Patient Satisfaction Questionnaire (PSQ), developed and copyrighted by the OTN. The scale items are declarative statements pertaining to patients' opinions and perceptions of telemedicine. The 17 items are designed to measure patient satisfaction in the following six categories: (1) Expectations, (2) Access, (3) Technical, (4) Communication, (5) Privacy, and (6) Satisfaction. The statements for each of these six categories are depicted in Figure 1.

The OTN PSQ uses a 5-point Likert rating scale with the following labels: "strongly disagree," "disagree," "neutral," "agree," and "strongly agree" (used in the current study for the Balanced Equal-Interval Rating Scale condition). Previous OTN 2008 evaluation results found that overall satisfaction with telemedicine was 97% (combining "agree" and "strongly agree" responses), with mean ratings ranging from 4.28 to 4.62 (with 5 being the highest rating) across the six categories of items, a mode of 5, and reliability, as measured by Cronbach's alpha, of 0.822.19
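Internal consistency throughout this paper is reported as Cronbach's alpha. As a reference for readers, the following is a minimal sketch (ours, not part of the PSQ validation) of the standard formula, alpha = (k/(k-1))(1 - sum of item variances / variance of total scores), applied to hypothetical item-response data.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) array of numeric ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                               # number of items (17 for the PSQ)
    item_variances = items.var(axis=0, ddof=1)       # variance of each item across respondents
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of respondents' summed scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: 4 respondents x 3 items on a 5-point scale
data = [[5, 4, 5],
        [4, 4, 4],
        [5, 5, 5],
        [3, 4, 3]]
print(round(cronbach_alpha(data), 3))  # higher values = more internally consistent responses
```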

PROCEDURE
The current study was conducted at a multisite academic tertiary-care organization with an established on-site telehealth program and was approved by the Research Ethics Board. The target population was outpatients who were scheduled for a telemedicine videoconference appointment by a clinician staff member at the participating hospital organization and who were fluent and literate in English. A convenience sample of 216 patients scheduled during the 2-month study time frame was formed.

Fig. 1. Patient Satisfaction Questionnaire structure with abbreviated item content by subscore group. Permission for use in this article has been granted by the Ontario Telemedicine Network.

Before the videoconference appointment, the senior investigator contacted eligible participants by phone, and information about the study, the consent form, and the survey was mailed to those who agreed to participate. The senior investigator made a maximum of two follow-up telephone reminders within 2 weeks after the scheduled appointment date. Although participants were reminded that they could withdraw from the study at any time, no one expressed a desire to do so during the reminder calls.

The 216 consenting participants were randomly assigned to one of three survey form conditions with the same statements but differing rating scale anchors: (1) Balanced Equal-Interval Rating Scale ("strongly disagree," "disagree," "neutral," "agree," and "strongly agree"); (2) Positive-Packed Rating Scale ("strongly disagree," "neutral," "agree," "very much agree," and "strongly agree"); and (3) Positive-Centered Equal-Interval Rating Scale ("disagree," "neutral," "agree," "very much agree," and "strongly agree"). Equal numbers of participants (n = 72) were assigned to the three treatment conditions.

We used the following strategies to minimize the nonresponse rate: (1) the purpose of the study, "to examine the effects of rating scale design as a strategy to reduce ceiling effect using a satisfaction survey," was clearly identified to the participants and explicated during the recruitment and consent processes; (2) assurances of anonymity and confidentiality were made; (3) the PSQ is a validated survey instrument consisting of questions that are neither threatening nor sensitive; and (4) a maximum of two reminder calls was made to return the survey by mail, during


which no one expressed the intention to withdraw from the study. In addition, the telemedicine patient population in this clinical environment uses telemedicine for adjunct appointments, with the vast majority of telemedicine appointments requested by the clinicians and not initiated by the patients, thereby limiting self-selection bias toward a preferred use of telemedicine for their appointment.

Results
PARTICIPANT CHARACTERISTICS
Of the 216 surveys, 154 (71%) were returned, as follows: 54 Balanced Equal-Interval Rating Scale, 48 Positive-Packed Rating Scale, and 52 Positive-Centered Equal-Interval Rating Scale. The sample consisted of 74 females (48%) and 80 males (52%). Ages ranged from 18 to over 65 years, with the majority of participants (n = 90, 58%) between 45 and 64 years of age and the second largest group (n = 41, 27%) being 65 years of age and over. Most of the participants (n = 85, 55%) had used telemedicine before. Twenty-one (14%) of the participants reported that they had technical problems with the session. Between-group comparisons on background variables showed only one significant difference, regarding the experience of technical difficulty (chi-squared = 6.391, p = 0.041). The percentages of respondents who experienced problems were 19%, 19%, and 4% for the Balanced, Positive-Packed, and Positive-Centered conditions, respectively.

RESPONSES TO THE THREE FORMS OF THE SURVEY
Table 1 displays, for each of the three survey conditions, the percentage of responses endorsing each of the rating scale labels, the means and standard deviations of responses to the 17 items, the internal consistency reliability coefficients as measured by Cronbach's alpha, and the sample sizes. As seen in Table 1, ratings are very high across all three survey forms. Means are 4.44, 4.78, and 4.67 (with 5 being the highest score) for the Balanced, Positive-Packed, and Positive-Centered conditions, respectively. Frequencies of scale label choice on the three survey forms indicate that the majority of respondents selected the "strongly agree" label, and over 92% of all the participants were "satisfied" (the percentage of responses that were "agree" or "strongly agree" to the positively stated survey items and "disagree" or "strongly disagree" to the one negatively stated item). These means were not statistically significantly different as tested by one-way analysis of variance (F(2, 151) = 0.105, p = 0.901). The results of the current study are comparable to the PSQ 2008 baseline data, which found that overall satisfaction with telemedicine was 97%, with mean item ratings ranging from 4.28 to 4.62 and a mode of 5.19

Despite the higher means (hence the greater ceiling effect) observed in the Positive-Packed and Positive-Centered conditions, the standard deviations in these conditions (8.46 and 8.64) were higher than that obtained in the Balanced Scale condition (6.64). The test of homogeneity of variances revealed no statistically significant difference among the variances (Levene's statistic (2, 151) = 1.875, p = 0.157).

An examination of the frequency of selection of the rating scale labels revealed that in both the Positive-Packed and Positive-Centered conditions there was substantial use of the "very much agree" label and minimal use of the "agree" label. Respondents in the Balanced Scale condition used the "agree" label 39.5% of the time, but only 6.1% in the Positive-Packed Scale and 5.3% in the Positive-Centered Scale conditions. It could be conjectured that a good number of respondents using the Balanced Scale form rounded down their choice and selected "agree," whereas in the Positive-Packed and Positive-Centered Scale conditions respondents seemed to have rounded up their choice to "very much agree." The presence of the intermediate choice "very much agree" between "agree" and "strongly agree" provided better discrimination of positive responses than a scale without it. Consequently, there is some weak evidence supporting the conclusion that surveys using either the Positive-Packed or Positive-Centered Equal-Interval Rating Scale produce higher mean scores but more precise measures of positivity than surveys using the Balanced Equal-Interval Rating Scale.

This conclusion is further supported by results from the reliability analysis. Overall, as seen in Table 1, the reliability analysis yielded high Cronbach's alpha coefficients for all three survey forms, consistent with PSQ baseline data that reported a Cronbach's alpha of 0.822.
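For reference, a minimal sketch of the two significance tests reported above (one-way ANOVA on the condition means and Levene's test on the condition variances), using scipy; the three arrays are hypothetical stand-ins for per-respondent scores, not the study data, so the printed statistics will not reproduce the values reported in the text.

```python
from scipy.stats import f_oneway, levene

# Hypothetical per-respondent mean ratings for each scale condition
balanced          = [4.2, 4.5, 4.6, 4.1, 4.8, 4.4]
positive_packed   = [4.9, 4.6, 4.8, 5.0, 4.7, 4.8]
positive_centered = [4.7, 4.5, 4.9, 4.6, 4.8, 4.6]

# One-way ANOVA: do the condition means differ?
f_stat, p_means = f_oneway(balanced, positive_packed, positive_centered)

# Levene's test: do the condition variances differ (homogeneity of variances)?
w_stat, p_vars = levene(balanced, positive_packed, positive_centered)

print(f"ANOVA:  F = {f_stat:.3f}, p = {p_means:.3f}")
print(f"Levene: W = {w_stat:.3f}, p = {p_vars:.3f}")
```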

Table 1. Descriptive Statistics of Responses for the Three Treatment Conditions

| Condition | Strongly Disagree | Disagree | Neutral | Agree | Very Much Agree | Strongly Agree | Mean | SD | Reliability (Cronbach's Alpha)a | Sample Size |
|---|---|---|---|---|---|---|---|---|---|---|
| Balanced (Equal-Interval) Rating Scale | 0.3% (1)b | 0.9% (2)b | 6.3% (3)b | 39.5% (4)b | X | 52.9% (5)b | 4.44 | 6.64 | 0.888 | 54 |
| Positive-Packed Rating Scale | 1.0% (1)b | X | 3.9% (2)b | 6.1% (3)b | 24.1% (4)b | 64.8% (5)b | 4.78 | 8.46 | 0.898 | 48 |
| Positive-Centered (Equal-Interval) Rating Scale | X | 1.4% (1)b | 2.7% (2)b | 5.3% (3)b | 28.8% (4)b | 61.8% (5)b | 4.67 | 8.64 | 0.907 | 52 |

The first six columns give the percentage of responses to each rating scale label.
a Number of items = 17.
b Score assigned to each category of response in parentheses.
X = label not present in the scale.

Cronbach's alpha values for responses to the Balanced, Positive-Packed, and Positive-Centered Rating Scale survey forms were 0.888, 0.898, and 0.907, respectively. The difference in reliability of responses, albeit small, which could be the result of a ceiling effect on the reliability measures themselves, further suggests that using rating scales loaded with more positive labels when surveying highly positive populations is preferred because such scales better discriminate the responses.

Discussion
In our study we did not use predetermined scale values to guide our selection of labels in constructing the positive-centered scale. Determining, or referring to, published scale values that reflect the varying intensity of different labels when constructing positive-packed rating scales would provide more precise measurement than selecting labels based on an impression of the connotative meaning they convey.20 This is especially true for rating scales that use evaluative and frequency labels. Related to the use of different types of anchors, there is some research evidence suggesting that rating scales with construct-related anchors (anchors that are specific to the items) may produce greater response variability than rating scales with generic agree–disagree anchors.21,22 Further research is needed to explore the use of construct-related rating scales for satisfaction ratings in highly positive telemedicine populations.

Although not statistically significant, which may be partially because of the small sample size in the research design, the findings from the current research are in the direction suggested by the two hypotheses examined: Positive-Packed and Positive-Centered Rating Scales produce higher variance and reliability in rating responses than the traditional Balanced Equal-Interval Rating Scale. There is some evidence from our research supporting the idea that, to enhance discrimination of survey responses, more positive labels should be included in the rating scales. The tendency for respondents to overuse the positive end of the rating scale is very much evident in telemedicine research. Manipulating scale centeredness appears to be a viable and efficient strategy to counteract the ceiling effect, as it offsets the middle and expands the area of interest. The effect of the positively loaded rating scales on the distribution of rating responses, while maintaining high reliability despite high mean scores, is noteworthy.

Parametric statistics, such as the computation of means and standard deviations and the examination of statistically significant differences and correlations through inferential statistics, are typically used in satisfaction survey data analysis. These statistics usually require data with interval-scale measurement properties. Although it has been argued that parametric statistics can be applied to data generated by Likert scales,23 in general we recommend that a Positive-Centered Rating Scale be preferred over a Positive-Packed Rating Scale, as it maintains to a larger degree the equal-interval properties of a scale and consequently can produce more accurate results with parametric statistical analyses.

RESPONSE RATE
Low response rate is a threat to both statistical conclusion validity and construct validity,24 albeit a few recent studies on telemedicine and conventional healthcare delivery suggest that allocating substantial expense to attaining higher response rates may not be cost-effective.9,25 The overall response rate of 71% observed in our study (with a small range, from 67% to 75%, across treatment conditions) is not low, as it is comparable to the response rates (66–68%) reported in other recent healthcare satisfaction surveys,13,26 including telemedicine.9,25 In addition, the differences in response rate among the three conditions are minimal, with 8% between the highest (75%) and the lowest (67%), which suggests a lack of treatment-related attrition bias20 and consequently does not support the conjecture that survey nonresponders are those who felt that the surveyors only wanted positive feedback.
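As a worked check of the response-rate figures above (a small sketch; the per-condition return counts of 54, 48, and 52 of 72 assigned are taken from the Results section):

```python
assigned = 72  # participants assigned per condition
returned = {"Balanced": 54, "Positive-Packed": 48, "Positive-Centered": 52}

for condition, n in returned.items():
    print(f"{condition}: {n / assigned:.0%}")  # 75%, 67%, 72%

overall = sum(returned.values()) / (assigned * 3)
print(f"Overall: {overall:.0%}")  # 71% (154 of 216)
```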

Conclusions
The results of our study should be interpreted in light of the following three potential limitations. First, the accessible population in our study consists of all telemedicine patients scheduled to see a clinician at a multisite academic tertiary-care hospital with an established telehealth program. Participants were not randomly selected, and the sample size was dependent on the number of telemedicine videoconference appointments scheduled by a clinician at the participating sites in the 2-month study time frame. Random selection from the general telemedicine patient population would have been optimal as a strategy to maximize the generalizability of findings, but it was not possible because of both time constraints and the need to incorporate the study into the existing clinical environment. We acknowledge that this limits our findings, which can be generalized to this group but not to situations where videoconferencing becomes a complete substitute for care (rather than an adjunct to in-person appointments, as in this case). Still, this study is useful for this group, further defined by the demographics reported, and provides evidence that when protection against the ceiling effect is needed, there may be a better alternative than the use of negatively worded statements.

Second, the generalizability of findings may be affected by the delivery system, telemedicine. In our study, the majority of respondents indicated that they had used telemedicine before. On the one hand, respondents who had used telemedicine might have a favorable opinion of it, which is why they used it again. On the other hand, those enrolled for the first time might be biased by the novelty of using a new device. We do not feel that the delivery system introduced a biasing effect on our findings. As previously mentioned, this telemedicine patient population uses telemedicine as an adjunct service, with the vast majority of telemedicine appointments requested by the clinicians and not initiated by the patients. Consequently, the participants were not self-selected into the study or the telemedicine method because they preferred the use of telemedicine for their appointment. Also, surveys were self-administered and mailed to participants' homes immediately after the videoconference. In addition, the degree of "disagreement" responses, albeit small, was equally noted on all three survey forms. Consequently, the proportions of positive and negative respondents should be roughly the same.


Finally, the results are not statistically significant; however, the trend is apparent and hence amounts to more than a subjective judgment without evidence. In addition, there was a concerted effort to reduce nonresponse by using the numerous strategies described above, in addition to the collection of demographic information on respondents to permit a more complete description of the sample so that findings can be generalized to similar populations.

We believe that our research contributes to the advancement of research on patient satisfaction in telemedicine. We caution against an arbitrary selection of rating scales without careful consideration of the effects the rating scale labels can have, especially when surveying a highly positive patient population. Our findings complement and add to the existing survey guidelines in telemedicine proposed by Demiris15 by highlighting the importance of the choice of rating scale labels as a strategy to combat the ceiling effect, which remains a significant issue in health science research, including telemedicine.

Acknowledgments The authors thank the University Health Network Telehealth Program for their research participation in addition to the OTN for permission to use the OTN survey for this research and publication.

Disclosure Statement No competing financial interests exist.

REFERENCES

1. Whitten PS, Mair F. Telemedicine and patient satisfaction: Current status and future directions. Telemed J E Health 2000;6:417–423.
2. Avis M, Bond M, Arthur A. Questioning patient satisfaction: An empirical investigation in two outpatient clinics. Soc Sci Med 1997;44:85–92.
3. Sitzia J, Wood N. Patient satisfaction: A review of issues and concepts. Soc Sci Med 1997;45:1829–1843.
4. Currell R, Urquhart C, Wainwright P, et al. Telemedicine versus face to face patient care: Effects on professional practice and health care outcomes. Cochrane Database Syst Rev 2000;(2):CD002098.
5. Kraai IH, Luttik ML, de Jong RM, et al. Heart failure patients monitored with telemedicine: Patient satisfaction, a review of the literature. J Card Fail 2011;17:684–690.
6. Mair F, Whitten P. Systematic review of studies of patient satisfaction with telemedicine. BMJ 2000;320:1517–1520.
7. Williams TL, May CR, Esmail A. Limitations of patient satisfaction studies in telehealthcare: A systematic review of the literature. Telemed J E Health 2001;7:293–312.
8. Findik UY, Unsar S, Sut N. Patient satisfaction with nursing care and its relationship with patient characteristics. Nurs Health Sci 2010;12:162–169.
9. Finkelstein SM, MacMahon K, Lindgren BR, et al. Development of a remote monitoring satisfaction survey and its use in a clinical trial with lung transplant recipients. J Telemed Telecare 2012;18:42–46.
10. Streiner DL, Norman GR. Health measurement scales: A practical guide to their development and use, 4th ed. New York: Oxford University Press, 2008.
11. Howorka K, Pumprla J, Schlusche C, et al. Dealing with ceiling baseline treatment satisfaction level in patients with diabetes under flexible, functional insulin treatment: Assessment of improvements in treatment satisfaction with a new insulin analogue. Qual Life Res 2000;9:915–930.
12. Moret L, Nguyen JM, Pillet N, et al. Improvement of psychometric properties of a scale measuring inpatient satisfaction with care: A better response rate and a reduction of the ceiling effect. BMC Health Serv Res 2007;7:197.
13. Garratt AM, Helgeland J, Gulbrandsen P. Five-point scales outperform 10-point scales in a randomized comparison of item scaling for the Patient Experiences Questionnaire. J Clin Epidemiol 2011;64:200–207.
14. Streiner DL, Norman GR. Health measurement scales: A practical guide to their development and use, 3rd ed. New York: Oxford University Press, 2003.
15. Demiris G. Principles of survey development for telemedicine applications. J Telemed Telecare 2006;12:111–115.
16. Tourangeau R, Rips LJ, Rasinski KA. The psychology of survey response. Cambridge, United Kingdom: Cambridge University Press, 2000.
17. Peter J, Valkenburg PM. The impact of "forgiving" introductions on the reporting of sensitive behavior in surveys: The role of social desirability response style and developmental status. Public Opin Q 2011;75:779–787.
18. Fox JA, Tracy PE. Randomized response: A method for sensitive surveys. Thousand Oaks, CA: Sage, 1986.
19. Keresztes C, Hartford K, Wilk P. Measuring patient satisfaction with telemedicine: Establishing psychometric properties. Paper presented at the Canadian Society of Telehealth Meeting, Ottawa, October 2008.
20. Spector PE. Summated rating scale construction: An introduction. Thousand Oaks, CA: Sage, 1992.
21. Gehlbach H. Agree-disagree: A "strongly disagreeable" response scale. Paper presented at the American Educational Research Association Meeting, New York, April 2008.
22. Barge S, Gehlbach H. Using the theory of satisficing to evaluate the quality of survey data. Res Higher Educ 2012;53:182–200.
23. Norman G. Likert scales, levels of measurement and the "laws" of statistics. Adv Health Sci Educ Theory Pract 2010;15:625–632.
24. Shadish W, Cook T, Campbell D. Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin Company, 2002.
25. Gagnon MP, Orruno E, Asua J, et al. Using a modified technology acceptance model to evaluate healthcare professionals' adoption of a new telemonitoring system. Telemed J E Health 2012;18:54–59.
26. Sitzia J, Wood N. Response rate in patient satisfaction research: An analysis of 210 published studies. Int J Qual Health Care 1998;10:311–317.

Address correspondence to:
Caterina Masino, MA
Telehealth Program
Centre for Global eHealth Innovation
UHN—University Health Network
190 Elizabeth Street, RFE 4S400
Toronto, ON M5G 2C4
Canada

E-mail: [email protected]

Received: November 22, 2013
Revised: March 11, 2014
Accepted: March 12, 2014
