Research in Social and Administrative Pharmacy (2014) 1–8

Original Research

Web search query volume as a measure of pharmaceutical utilization and changes in prescribing patterns

Jacob E. Simmering, M.S.a, Linnea A. Polgreen, Ph.D.a,*, Philip M. Polgreen, M.D., M.P.H.b,c

a Department of Pharmacy Practice and Science, University of Iowa, Iowa City, IA 52242, USA
b Department of Internal Medicine, University of Iowa, Iowa City, IA, USA
c Department of Epidemiology, University of Iowa, Iowa City, IA, USA

Abstract

Background: Monitoring prescription drug utilization is important for both drug safety and drug marketing purposes. However, access to utilization data is often expensive, limited and not timely.
Objectives: To demonstrate and validate the use of web search engine queries as a method for timely monitoring of drug utilization and changes in prescribing behaviors.
Methods: Drug utilization time series were obtained from the Medical Expenditure Panel Survey and normalized search volume was obtained from Google Trends. Correlation between the series was estimated using a cross-correlation function. Changes in the search volume following knowledge events were detected using a cumulative sums changepoint method.
Results: Search volume tracks closely with the utilization rates of several seasonal prescription drugs. Additionally, search volume exhibits changes following known major knowledge events, such as the publication of new information.
Conclusions: Search volume provides a first order approximation to pharmaceutical utilization in the community and can be used to detect changes in prescribing behavior.
© 2014 Elsevier Inc. All rights reserved.

Keywords: Pharmacovigilance; Post-marketing surveillance; Novel data sources

* Corresponding author. Tel.: +1 319 384 3024; fax: +1 319 353 5646. E-mail address: [email protected] (L.A. Polgreen).
1551-7411/$ - see front matter © 2014 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.sapharm.2014.01.003

Introduction

Accurate and timely estimates of pharmaceutical utilization, as well as of changes in interest or demand for pharmaceuticals, are critical for many drug safety and marketing related investigations.1,2 For example, the cumulative burden of an adverse drug event is the result of not only the relative frequency of that particular event but also the number of people taking the drug. Knowing the expected number of people at risk for a particular adverse event, especially given the potential for post-marketing novel events, is important for designing and guiding interventions for drug safety.1,2 Additionally, marketing and other efforts to increase awareness of a specific drug are heavily dependent on detecting any changes in the utilization of the targeted drug.

Historically, data regarding the use of pharmaceuticals have been limited in geographic scope, drawn from a relatively small sample, expensive, difficult to obtain, or not widely available in a timely fashion.3–5 For example, data from IMS Health have no listed price and users are directed to sales staff for a quote.5 Data from more easily accessible sources such as the Medical Expenditure Panel Survey are published only after considerable delay. In addition, there is no national reporting system in place for prescription drug utilization2 as there is for influenza.6 As a result, other than data from individual national pharmacy chains, there are no geographically diverse, current and available estimates of utilization. As a consequence of this delay, the information is too old to support timely modifications to existing interventions or marketing efforts. These data limitations complicate research and marketing efforts.

Investigators in various fields have used Internet search volume as a proxy for consumer interest and have used these data to help forecast sales of consumer goods.
For example, Choi and Varian have shown a strong correlation between search volume at Google and retail sales by type of good, brand-specific sales of automobiles, house sales and even travel.7 Search volume at Yahoo has been shown to track with the number of sales of specific music tracks and movie box office receipts.8 The methods have been extended to health related search topics, most notably influenza and other infectious diseases.9–11 A recent paper using search query data captured by an Internet-search-engine toolbar has shown some evidence for the ability of search queries to detect potential adverse drug events.12 However, these methods have not yet been widely used to explore utilization and changes in utilization of prescription medications.

Prescription medications have some unique properties that differentiate them from other consumer goods. Consumers are less likely to search for low prices on prescription drugs than on other consumer goods, and the demand for medications is induced by a condition the patient did not desire.13 Consumers choose to go to a movie or buy a new car but have little or no agency over their prescription drug use. Additionally, access to such drugs is gated through a prescriber and dispenser. Search volume based methods of surveillance and estimation have not yet been validated on prescription drugs.

Our goal is to demonstrate a relationship between search volume and a nationally representative estimate of actual medication use drawn from the Medical Expenditure Panel Survey (MEPS). Additionally, we examined the responses in search volume following major events expected to change prescribing and demand patterns (we refer to these as “pharmaceutical knowledge events”) to determine if changes in search volume correspond to these knowledge events.

Methods

Google Trends

Search volume data were obtained from Google Trends. Google Trends is a publicly available source that provides normalized Google “search share” values by week in the range 0–100 starting in January of 2004 for a set of common queries.7 The absolute number of searches represented by a given value is not necessarily constant across time, as the total number of searches, and thus search share, may change.7 However, assuming that the total number and relative frequency of searches are roughly constant, changes in search share can be interpreted as changes in the absolute number of searches for a given query. A search volume of 0 means that in the given week the provided set of keywords was not popular enough to have been indexed. Unless otherwise noted, keywords included both the generic name and commonly used trade names.

Validation of search volume as a measure of community utilization

In order to determine if search volume is a reasonable proxy for drug utilization, we first extracted weekly outpatient drug-utilization estimates from the Medical Expenditure Panel Survey (MEPS) for 2004–2009 for nine drugs from various therapeutic classes. MEPS is a yearly, nationally representative panel survey of the community based, non-institutionalized population and includes information about prescription drug use.14 The drugs were chosen for an a priori expectation of seasonal behavior.
Some drugs are used seasonally, which creates within-year variation that allows for meaningful comparison of actual utilization and search volume estimates. We selected amoxicillin, azelastine, azithromycin, benzonatate, cefdinir, ciprofloxacin, levofloxacin, moxifloxacin and olopatadine. All of these drugs are used to treat infections or allergies with a known seasonal pattern and, as such, the drugs are expected to exhibit highly seasonal time series. The selected drugs include examples expected to peak in either the summer or the winter.

Dispensing events for each of the nine drugs listed above were found via regular-expression matching of the MEPS pharmacy-reported name against the generic or trade names of the target drugs. Regular expressions are a formalized and flexible method for string and substring matching.15 Prescription events were found by matching on key substrings of the drug name (e.g., a set of 4–6 characters) to abstract the set of names used for that drug in the MEPS pharmacy-reported-name variable. The resulting list was reviewed to guard against false positive matches (e.g., levothyroxine and levofloxacin both matching the substring “levo”). After removing any false positives, the list of names was then used as a “gold standard” and all records with matching pharmacy-reported names were extracted from the MEPS drug dataset. The search volume was obtained over the same interval using Google Trends.

The cross-correlation function (CCF) between the MEPS-derived medication utilization and the Google search volume was calculated. The CCF is a series of correlations calculated between two series with various lags in one of the series. For example, at a lag of zero, the CCF is the ordinary correlation between the two series. At a lag of +1, the CCF is the correlation between the two series with the second series shifted by one unit of time (e.g., the correlation between a given week in the first series and the following week in the other series).
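The name-matching step described above can be illustrated with a minimal Python sketch. The substring patterns, the sample names and the helper `candidate_names` are hypothetical and chosen only for illustration; they are not the patterns or code used in the study.

```python
import re

# Hypothetical key substrings for two target drugs (illustrative only).
PATTERNS = {
    "levofloxacin": re.compile(r"levo|levaq", re.IGNORECASE),
    "amoxicillin": re.compile(r"amoxi", re.IGNORECASE),
}

def candidate_names(reported_names, pattern):
    """Return the distinct reported names that contain a key substring."""
    return sorted({name for name in reported_names if pattern.search(name)})

# Toy stand-in for the pharmacy-reported-name variable.
reported = ["LEVOFLOXACIN", "LEVAQUIN", "LEVOTHYROXINE", "AMOXICILLIN"]

# Step 1: broad substring match (deliberately over-inclusive).
candidates = candidate_names(reported, PATTERNS["levofloxacin"])

# Step 2: manual review flags false positives such as levothyroxine.
false_positives = {"LEVOTHYROXINE"}
approved = [name for name in candidates if name not in false_positives]

# Step 3: the reviewed list is then used to extract matching records.
records = [{"name": n, "week": i} for i, n in enumerate(reported)]
extracted = [r for r in records if r["name"] in approved]
```

The broad first pass favors recall, and the manual review step is what removes look-alike names, mirroring the levothyroxine example in the text.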

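The cross-correlation function described above can be sketched in Python as follows. This is an illustrative implementation on synthetic seasonal data, not the authors' code; the function name `cross_correlation`, the simulated series and the approximate white-noise bound of 1.96 divided by the square root of the series length are assumptions made for the example.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Sample correlation between x[t] and y[t + k] for k = -max_lag..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = n * x.std() * y.std()
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            num = np.sum(xc[: n - k] * yc[k:])   # y shifted forward by k weeks
        else:
            num = np.sum(xc[-k:] * yc[: n + k])  # y shifted backward by |k| weeks
        ccf[k] = num / denom
    return ccf

# Synthetic example: six years of weekly data with a yearly cycle.
weeks = np.arange(6 * 52)
rng = np.random.default_rng(0)
utilization = np.cos(2 * np.pi * weeks / 52) + rng.normal(0, 0.2, weeks.size)
searches = 50 + 40 * np.cos(2 * np.pi * weeks / 52)

ccf = cross_correlation(utilization, searches, max_lag=78)
# Approximate 95% bound under a white-noise null (an assumption of this sketch).
bound = 1.96 / np.sqrt(weeks.size)
```

For two series sharing a yearly cycle, this produces positive values near lags of 0 and 52 weeks and negative values near 26 weeks, which is the pattern the validation looks for.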

Knowledge events

To determine if “pharmaceutical knowledge events” can be detected in search data, we used a different set of pharmaceutical agents than we used in the validation described above. Our pharmacist reviewer reviewed the list of FDA safety alerts and press releases from third parties (e.g., Heart.org) and identified knowledge events that were likely to have resulted in a major change in practice. These sources were supplemented with the pharmacist’s knowledge of changes in the field that may not have been captured by either of the inputs. In the end, our pharmacist identified seven medications that were subject to some major knowledge event between 2004 and 2011. Reasons for changes in prescribing behavior included name changes, recalls, major safety alerts or efficacy information. We included the two drugs where the name of the medication was changed by the FDA after it was released. These drugs provide a positive control for our methods in that a change in behavior must occur as the drug is no longer ordered or dispensed under the old name. Additionally, as the name changes were essentially instant, they allow us to estimate the minimum amount of time for a change to occur with outstanding supplies. The included drugs, event dates and reason for the event are listed in Table 1. We believe this list provides a reasonable sample of the different possible causes of knowledge events and includes positive controls, making it a reasonable test set for determining if knowledge events are detectable in search volume.

Table 1
Drugs potentially subject to major changes in prescribing behavior

Affected drug                       Alternate drug         Date of event   Event type
Omacor                              Lovaza                 08/01/2007      Name change
Kapidex                             Dexilant               03/04/2010      Name change
Zicam Cold Remedy Nasal Products    N/A                    06/16/2009      Safety alert
Rosiglitazone (Avandia)             Pioglitazone (Actos)   05/21/2007      Safety alert
COX-2 Selective Inhibitors          N/A                    09/30/2004      Safety alert
Vytorin/ezetimibe (Zetia)a          Simvastatin (Zocor)    01/14/2008      Efficacy
Pseudoephedrine                     Phenylephrine          03/09/2006      Regulatory

a Simvastatin, the generic name for part of the Vytorin combination, was not included in the search string for this series as the goal was to measure use of ezetimibe alone or in combination with simvastatin. Including simvastatin would have captured use of simvastatin without ezetimibe or combination drugs (e.g., Vytorin).

The Google Trends search volume was abstracted for each series and changepoints were located. A changepoint is a point at which the time series’ behavior changes in a major way. For example, a series that increased steadily for a number of months and then stopped increasing has a changepoint where the series plateaus. The changepoints for each series were located using a cumulative sums method. Formally, for each week $t$ with search volume $v_t$ in the series, the cumulative sum is $S_t = \sum_{i=1}^{t} (v_i - \bar{v})$, where $\bar{v}$ is the mean search volume over the series. If the series follows a constant process over the interval under study, the observations $v_i$ and $v_{i+k}$, where $k \neq 0$, are independent. If the observations are independent, we would expect the deviations to “cancel” each other out and $E(S_t) = 0$. The week $t$ that maximizes the absolute value of the cumulative sum is taken as the most likely changepoint location $t_c$ in the interval $[1, T]$, where $T$ is the end of the series. As this method will always return the most probable changepoint in the interval $[1, T]$, we bootstrapped an empirical P-value for the resulting changepoint. If $t_c$ was a significant changepoint at $\alpha = 0.05$, we partitioned the series into the intervals $[1, t_c]$ and $[t_c, T]$ and repeated this process recursively until no new significant changepoints were found.

Once the probable changepoint nearest the knowledge event was found, the rate at which the search volume for the affected drug was replaced by search volume for the alternate drug was estimated. This is known as the marginal rate of substitution (MRS). The MRS was estimated by regressing the affected drug’s search volume on the volume for the alternative drug over the 12 months before and after the changepoint. Simple linear regression failed to produce meaningful estimates due to the relatively small number of points and the relatively high number of outliers. By definition, the volume for weeks near the changepoint is very volatile. To minimize the effect of this volatility on the regression model, iteratively re-weighted least squares was used to reduce the influence of the outliers and produce more accurate estimates of the MRS. Iteratively re-weighted least squares is a form of regression where the regression weights are selected via an iterative process to minimize the effect of outliers and better meet the assumption of constant variance.
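The cumulative sums changepoint search described in this section can be sketched in Python as follows. The text does not specify the resampling scheme, so this sketch assumes a permutation-style bootstrap of the statistic max |S_t|; the function names, the minimum segment length `min_len` and the synthetic step series are likewise assumptions made for illustration.

```python
import numpy as np

def max_abs_cusum(v):
    """Return (t, |S_t|) for the index t that maximizes |S_t|."""
    s = np.cumsum(v - v.mean())
    t = int(np.argmax(np.abs(s)))
    return t, float(np.abs(s[t]))

def bootstrap_p_value(v, rng, n_boot=999):
    """Share of shuffled series whose max |S_t| reaches the observed value."""
    _, observed = max_abs_cusum(v)
    exceed = 0
    for _ in range(n_boot):
        _, stat = max_abs_cusum(rng.permutation(v))
        if stat >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)

def find_changepoints(v, alpha=0.05, min_len=8, n_boot=999, seed=0):
    """Recursively split the series at significant CUSUM changepoints."""
    v = np.asarray(v, dtype=float)
    rng = np.random.default_rng(seed)
    found = []

    def recurse(lo, hi):
        if hi - lo < min_len:
            return
        segment = v[lo:hi]
        t, _ = max_abs_cusum(segment)
        if bootstrap_p_value(segment, rng, n_boot) >= alpha:
            return
        cp = lo + t + 1  # first index of the new regime
        if cp <= lo or cp >= hi:
            return
        found.append(cp)
        recurse(lo, cp)
        recurse(cp, hi)

    recurse(0, len(v))
    return sorted(found)

# Synthetic weekly search share that drops from about 20 to about 10 at week 60.
rng = np.random.default_rng(42)
series = np.concatenate([20 + rng.normal(0, 1, 60), 10 + rng.normal(0, 1, 60)])
changepoints = find_changepoints(series)
```

Because the maximizing index is always returned, the resampling step is what separates a real shift from noise; a constant series yields a P-value of 1 under this scheme and is never split.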

Results

Utilization

Of the nine seasonal drugs we considered to determine if search volume is a reasonable measure of drug utilization, only three (amoxicillin, azithromycin and cefdinir) had enough outpatient dispensing events in the MEPS data to construct a time series suitable for analysis. The other six drugs (azelastine, benzonatate, ciprofloxacin, levofloxacin, moxifloxacin and olopatadine) had many weeks with 0 observed fills. This is due to the relatively low rate of use of these drugs, especially compared to the three more-common series, combined with the moderate sample size of MEPS (roughly 30,000 per year). Additionally, these series had very high levels of inter-week variance that dwarfed the expected seasonal variance. When the MEPS sample weights are applied to produce national level estimates, the week-to-week variance explodes. Because of these limitations with azelastine, benzonatate, ciprofloxacin, levofloxacin, moxifloxacin and olopatadine, we only used amoxicillin, azithromycin and cefdinir in our cross-correlation analysis.

The cross-correlation functions for the three considered drugs showed positive correlation between the search volume and MEPS-derived outpatient utilization rates at lags near 0 (Fig. 1). In other words, significant correlation exists between the MEPS-derived use and the search volume in the same week. There is also a strong positive relationship at year intervals and a strong negative relationship at half-year intervals.

Knowledge events

Each of the seven series that we used to examine knowledge events showed a significant changepoint in the search series that coincided with the knowledge-event date. For example, in January 2008, it was reported that combination Vytorin/ezetimibe did not outperform treatment with just simvastatin. Following this event, a significant change in the Vytorin series occurred (Fig. 2). Vytorin/ezetimibe lost search share relative to simvastatin immediately after a spike in search interest in January 2008. A similar pattern was observed in the other series corresponding to the different knowledge events.
Fig. 1. Cross-correlation functions for amoxicillin, azithromycin and cefdinir utilization and Google Trends volume. The height of the bar indicates the correlation coefficient between the MEPS-derived volume series and the shifted Google Trends series. Horizontal lines denote α = 0.05 significance levels; anything outside of the lines is significantly different from 0. Significant positive correlations are found near 0 and 1 year. Negative correlations are found at 0.5 and 1.5 years.

Typically a spike in search volume was associated with the timing of the “knowledge event,” followed by a changepoint in the affected series. The changepoints were all near the event date in time (Table 2). For the drugs with an expected alternative drug, there was an increase in searches for the alternative drug following the event date. This would suggest a change in consumer and provider interest in a drug following the “knowledge event.” The rate of change and interval were similar between knowledge event types.

Finally, the marginal rate of substitution between the primary affected drug and a likely alternate is also shown in Table 2. The estimates for nearly all of the drugs suggest a decrease in search volume for the affected drug following the changepoint. There is a related increase in search volume for the probable alternate drug in all but one case. Rosiglitazone/pioglitazone are outliers in this respect as the marginal rate of substitution is positive.
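As a reminder of how a marginal rate of substitution of this kind could be estimated, the following Python sketch fits a line by iteratively re-weighted least squares. The Methods do not state which weight function was used, so Huber weights with a median-absolute-deviation scale are an assumption here, as are the function name `irls_line`, the tuning constant and the synthetic data.

```python
import numpy as np

def irls_line(x, y, c=1.345, n_iter=50, tol=1e-8):
    """Fit y = a + b*x by IRLS with Huber weights; returns (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    beta = np.zeros(2)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted least squares step via row scaling.
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        resid = y - X @ new_beta
        # Robust residual scale from the median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged or scale == 0:
            break
        # Huber weights: 1 for small residuals, shrinking for large ones.
        u = np.abs(resid) / (c * scale)
        w = 1.0 / np.maximum(u, 1.0)
    return beta[0], beta[1]

# Synthetic pair: affected-drug volume falls 0.6 units per unit of alternate volume,
# with three volatile weeks acting as outliers.
rng = np.random.default_rng(1)
alternate = np.arange(50, dtype=float)
affected = 80 - 0.6 * alternate + rng.normal(0, 0.5, 50)
affected[47:] += 40

ols_slope = np.polyfit(alternate, affected, 1)[0]
_, robust_slope = irls_line(alternate, affected)
```

In this toy example the ordinary least squares slope is pulled toward zero by the three outlying weeks, while the re-weighted fit stays close to the simulated rate of -0.6; a negative slope corresponds to the alternate drug gaining search volume as the affected drug loses it.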

Discussion These results show that search volume for drug-related keywords provides an accurate and timely first-order approximation of actual community utilization. Moreover, search volume changes occurred after major knowledge events with minimal delay. The ability to both estimate the utilization at a given time, including historical utilization, and the sensitivity of drug searches to major shifts in interest suggest many future potential applications of this methodology for drug marketing and pharmacovigilance efforts.

Significant, strong correlation at zero lag between the MEPS-derived utilization and the Google Trends search share was found for the three seasonal drugs considered (amoxicillin, azithromycin and cefdinir). Also, each CCF had a local minimum at 6 months and a local maximum at 1 year. As these drugs are strongly seasonal, a strongly negative correlation would be expected at the half-year mark and the same week in previous and subsequent years would be expected to be positively correlated. This lends support to the interpretation that the two series measure the same phenomenon.

Internet searches were also correlated with knowledge events. For example, following the publication of new efficacy information in January 2008, the search volume for Vytorin/ezetimibe started to decline. The search volume for simvastatin, a likely alternative, increased following the event date. This pattern of “swapping” for the expected alternative drug was observed for nearly all of the considered drug pairs. Notably, for the two drugs that underwent relabeling, changes in Internet searches appeared roughly 2 months after relabeling. This interval is likely the byproduct of 30- and 90-day supplies being standard fills for many chronic drugs. It would take roughly 1–2 months before enough patients have had their medications relabeled or changed. In addition, this method detected the switch away from COX-2 inhibitors and Vytorin in approximately 3 months, suggesting relatively quick detection of these knowledge events.

This pattern of switching after 2 months did not occur for rosiglitazone and pioglitazone. In the interval near the changepoint, the MRS was greater than zero. However, this drug pair was much more sensitive to the specification of the weights in the regression than the other combinations. Moreover, it is not clear that pioglitazone is the best alternative drug. Pioglitazone has been the subject of several safety alerts and there were questions about its safety at the time of the rosiglitazone alerts.16 Given the unclear safety profile of pioglitazone, providers may have been reluctant to move patients from rosiglitazone to pioglitazone. Providers may have decided, given the lack of an alternative drug, to discontinue therapy or replace rosiglitazone with a drug other than pioglitazone. The other considered drug pairs did not have the problem of the alternative drug also being subject to major safety questions.

Fig. 2. Log Google Trends search volume for Vytorin/ezetimibe/Zetia and simvastatin/Zocor. The black vertical line marks the knowledge event of Jan 14, 2008. Following the release of the new efficacy information in Jan 2008, we see an increase in interest in Vytorin/ezetimibe/Zetia followed by a rapid decrease to a new baseline lower than that previously observed. The simvastatin/Zocor series shows a slight increase at the publication date and then moves to a new baseline and does not decrease as the Vytorin/ezetimibe/Zetia series does, reflecting a possible substitution in interest between the two series following the Jan 2008 event.

Table 2
Event dates and CUSUM changepoints

Series                              Date of event   Changepoint   Interval (days)   Marginal rate of substitution
Omacor                              08/01/2007      09/23/2007    52                -0.58 (-0.64, -0.52)
Kapidex                             03/04/2010      05/09/2010    66                -0.55 (-0.62, -0.48)
Zicam Cold Remedy Nasal Products    06/16/2009      10/11/2009    116               N/A
Rosiglitazone                       05/21/2007      11/11/2007    173               1.43 (1.19, 1.68)
COX-2 Selective Inhibitors          09/30/2004      12/26/2004    87                N/A
Vytorin/ezetimibe (Zetia)           01/14/2008      04/13/2008    91                -1.15 (-1.47, -0.83)
Pseudoephedrine                     03/09/2006      08/27/2006    170               -1.18 (-1.24, -1.13)


There are drawbacks to using search volume to characterize medication utilization. The elderly are the largest consumers of medications and also underrepresented among users of search engines resulting in a potential mismatch between the users of the medications and those generating the search data.17,18 However, we theorize that the elderly may have younger caregivers and family members who turn to Google for drug information. Additionally, there are a number of limitations imposed by the nature of the data from Google Trends. The tool reports a normalized share making it possible for the same number of total searches at different times to have two different volume estimates. These properties of Google Trends data make conversion to an absolute scale difficult, if not impossible. Finally, the correlation between Google search volume and actual use as defined in our study depends on the MEPS data. The MEPS sample is nationally representative; however, many drug series have high inter-week variance. The number of observed fills in MEPS may only vary by 1 or 2 fills per week but when the weights are applied this becomes differences on the order of 10,000 fills per week. This can make it difficult to construct meaningful series for even relatively common drugs given the approximately 30,000 subject sample size of MEPS. Therefore, we are limited in the number of drugs for which we can estimate “ground truth” utilization using MEPS. Additionally, the MEPS drug use time series were derived from pharmacy reported names on fills. It is possible that a pharmacy used a very obscure name that contained no elements of the generic name or typical brand names. Patients may also fill, but not take, a prescription. Future applications of these methods should include an exploration of response in search volume following FDA safety alerts such as Black Box Warnings and Risk Evaluation and Mitigation Strategies (REMS). 
For example, do patients search for drug information when receiving a drug with an REMS notification? Does interest spike with a new Black Box Warning? Is this increased awareness or interest maintained? What effect is there on prescribing following the release of a Black Box Warning or REMS?

Conclusions

In spite of the limitations, it appears that Google Trends does provide a reasonable characterization of population medication utilization in near real-time without requiring expensive and time-consuming investigation. Additionally, changepoint analysis using Google Trends search volume appears to be a timely method for the detection of knowledge events: major shifts in not only medication interest but also utilization. The real-time measure of interest and demand for a given medication provides a unique insight into consumer and provider behavior. Thus, it is likely that the value of real-time surveillance over a diverse population at a limited cost exceeds the drawbacks of this approach.

Acknowledgments

Ben Urick, Pharm.D. provided the list of major knowledge events used in this study.

References

1. Avorn J. Evaluating drug effects in the post-Vioxx world: there must be a better way. Circulation 2006;113:2173–2176.
2. Wysowski DK, Swartz L. Adverse drug event surveillance and drug withdrawals in the United States. Arch Intern Med 2005;165:1363–1369.
3. Patwardhan A, Bilkovski R. Comparison: flu prescription sales data from a retail pharmacy in the US with Google Flu Trends and US ILINet (CDC) data as flu activity indicator. PLoS One 2012;7:8.
4. Aitken M, Berndt ER, Cutler DM. Prescription drug spending trends in the United States: looking beyond the turning point. Health Aff 2008;28(1):w151–w160.
5. Global Measures of Dispensing Activity. IMS Health. http://www.imshealth.com/portal/site/imshealth/menuitem.3e17c48750a3d98f53c753c71ad8c22a/?vgnextoid=c5f6e590cb4dc310VgnVCM100000a48d2ca2RCRD&vgnextfmt=default. Accessed 01.05.13.
6. Centers for Disease Control and Prevention. Overview of Influenza Surveillance in the United States, Updated October 2013. http://www.cdc.gov/flu/weekly/overview.htm. Accessed 01.05.13.
7. Choi H, Varian H. Predicting the Present with Google Trends, Published April 2009. http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en/us/googleblogs/pdfs/google_predicting_the_present.pdf. Accessed 01.05.13.
8. Goel S, Hofman JM, Lahaie S, Pennock DM, Watts DJ. Predicting consumer behavior with web search. Proc Natl Acad Sci U S A 2010;107:17486–17490.
9. Polgreen PM, Chen Y, Pennock DM, Nelson FB, Weinstein RA. Using internet searches for influenza surveillance. Clin Infect Dis 2008;47:1443–1448.
10. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature 2008;457:1012–1014.
11. Chan EH, Sahai V, Conrad C, Brownstein JS. Using web search query data to monitor dengue epidemics. PLoS Negl Trop Dis 2011;5:e1206.
12. White RW, Tatonetti NP, Shah NH, Altman RB, Horvitz E. Web-scale pharmacovigilance: listening to signals from the crowd. J Am Med Inform Assoc 2013;20:404–408.
13. Sorensen AT. An empirical model of heterogeneous consumer search for retail prescription drugs. NBER Working Paper Series; 2001.
14. Agency for Healthcare Research and Quality. Survey Background, Updated August 2009. http://meps.ahrq.gov/mepsweb/data_stats/data_overview.jsp. Accessed 01.05.13.
15. R Core Team. Regular Expressions as Used in R, Updated April 2013. http://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html. Accessed December 2013.
16. US Food and Drug Administration. Actos (pioglitazone): Ongoing Safety Review – Potential Increased Risk of Bladder Cancer, Updated August 2011. http://www.fda.gov/Safety/MedWatch/SafetyInformation/SafetyAlertsforHumanMedicalProducts/ucm226257.htm. Accessed 01.05.13.
17. Zickuhr K, Madden M. Older Adults and the Internet. Pew Internet and American Life Project; Published June 2012. http://www.pewinternet.org/~/media//Files/Reports/2012/PIP_Older_adults_and_internet_use.pdf. Accessed 01.05.13.
18. Gu Q, Dillon CF, Burt VL. Prescription drug use continues to increase. NCHS Data Brief 2010;42:1–8.
