International Journal of Sports Physiology and Performance, 2015, 10, 626–629 http://dx.doi.org/10.1123/ijspp.2014-0390 © 2015 Human Kinetics, Inc.

ORIGINAL INVESTIGATION

Large N: A Strategy for Improving Regional Sport Performance

Andrew C. Cornett and Joel M. Stager

It has been hypothesized that large differences in maximal performance can arise between various geopolitical regions solely on the basis of differing numbers of participants in the target activity. While there is evidence in support of this hypothesis for a measure of intellectual performance, the same relationship has not been examined for a measure of physical performance. Purpose: To determine whether the number of participants is a predictor of the best athletic performance in a region. Methods: The 2005–2010 USA Swimming Age Group Detail reports were used to determine the number of competitive swimmers participating in each age group for the 59 local swimming communities in the United States. The USA Swimming performance database provided 50-yd-freestyle times in each community for boys and girls for each age (6–19 y). Simple linear regression was used to examine the relationship between the outcome variable (fastest time) and the predictor variable (log of the number of swimmers) for each combination of age, sex, and calendar year. Results: The log of the number of swimmers in a region was a significant predictor of the best performance in that region for all 168 combinations of age, sex, and calendar year (P < .05) and explained, on average, 41%, and as much as 62%, of the variance in the fastest time. Conclusion: These findings have important implications for the development of regional sport strategic policy. Increasing the number of participants in the target activity appears a viable strategy for improving regional performance.

Keywords: competitive success, sample size, human performance, elite athletes, swimming

In the realm of human performance, it is frequently observed that some geographical regions consistently perform better than others at specific activities.
Using national borders to define “region,” we find 2 of many possible examples in Russia’s prolonged history of top performance in international chess competition1 and China’s dominance over world table tennis contests.2 The reality today is that there is a readily recognized link between sport and nationalism such that there is a great deal of regional pride that typically goes along with success in a particular activity. And, when a region (or nation) is perceived to have underperformed, there is often a collective sense of concern and disappointment in that region. Rarely has this been more evident than with the 2014 FIFA World Cup. Tens of thousands of fans packed the streets of Berlin to celebrate Germany’s championship3 while Brazil’s losses in its final 2 contests were considered a national embarrassment leading Brazil’s president to call for soccer reform.4

While it is politically expedient to demand “reform,” mandates of this nature provoke several relevant questions: How does a geographical region go about improving performance in any particular activity or sport? Are there recognized, effective strategies for improving national performance? And are the variables that constrain or allow success within a region adequately defined, or even minimally understood?

In terms of the sequence for addressing these questions, it seems that the necessary first step in developing improvement strategies is gaining an understanding of the variables responsible for regional disparities in performance. Genetic and environmental differences between regions are 2 common explanations for unusual competitive success. However, before considering these as the explanations for the observed performance differences between groups, the disparity between groups in regard to the best performances must be shown to be different from what would be expected on the basis of statistical sampling.5 The simple reasoning for this is that whenever 2 groups have a similar distribution for a particular variable, the most extreme values are more likely to come from the larger group.5 If it is true that the largest groups or regions have the best performances, then increasing the number of participants in a region for a particular activity would be a viable strategy for improving performance in that region.

But this has proven to be a difficult concept to verify empirically, due, in large part, to the challenge of locating a domain in which the necessary participation and performance data are available. Charness and Gerchak6 conducted the only study we were able to locate that investigated the relationship between participation and performance, and they did so for what is considered a measure of intellectual performance, chess. Charness and Gerchak6 report a significant positive correlation between the rating of the highest-ranked chess player in a country and the log of the number of chess players in that country. The same relationship, however, has yet to be confirmed using other measures of performance such as those in sport, physical, or athletic performance.

The purpose of this study was twofold: to test the hypothesis that the number of participants in a geographic region is a significant predictor of the best athletic performance in that region and to interpret the findings in the context of regional sport strategic policy.

Cornett is with the School of Health Promotion & Human Performance, Eastern Michigan University, Ypsilanti, MI. Stager is with the Counsilman Center for the Science of Swimming, Indiana University, Bloomington, IN. Address author correspondence to Andrew Cornett at [email protected].

Methods

At the outset, we recognize that extremes in performance occur at both ends of the performance distribution, the high- and low-performing ends. Because the best performances are generally of more interest than the worst, we chose to focus this study on the former rather than the latter. We used competitive swimming in the United States to investigate the relationship between the number of participants in a region
and sport performance. Swimming in the United States was well suited for this purpose because USA Swimming (USAS), the main governing body of swimming in the United States, maintains participation and performance records on all of its registered members. USAS divides the country into 59 regions, called local swimming communities, and records the number of registered swimmers in each swimming community for each sex and age. Each year, these participation data are compiled and posted online as the USAS Age Group Detail report.7 We used the 2005 to 2010 reports in this study to obtain the number of registered swimmers, or participants. It is important to note that USAS provides only a single value for the number of participants who are 8 years and younger and 19 years and older, and as a result, the exact number of 6-, 7-, 8-, and 19-year-old swimmers was unknown. We used the number of swimmers 8 years and younger as our estimate of the number of 6-, 7-, and 8-year-old swimmers and the number of swimmers 19 years and older as our estimate of the number of 19-year-old swimmers. While this may be a less sensitive estimate of participation than for the other ages, we assumed that it would provide an unbiased estimate of participation between the regions and thus would still allow us to draw conclusions.

In addition, USAS tracks the swim performances of all its registered age-group swim members, because it requires that performances from each sanctioned meet be sent to USAS, and the data are then compiled in the USAS performance database. We accessed the database through the USAS Web site (www.usaswimming.org) and recorded the best 50-yd freestyle time in each swimming community from 2005 to 2010 for boys and girls of each age from 6 to 19 years. Because we used performance data from 6 consecutive calendar years, it is possible that the same swimmer could be in the data set more than once.
Despite this fact, we chose to treat the observations in different years as independent, which is likely conservative since we would get wider confidence intervals than if we had properly accounted for the repeated measurement. We decided to focus on the 50-yd freestyle performances because more swimmers compete in this event across all age groups and both sexes than any other swimming event. There were 3 independent variables for this analysis: sex (boys and girls), age (6–19 y), and competition year (2005 to 2010). For each of the 168 combinations of levels of the independent variables (2 sexes × 14 ages × 6 competition years), we obtained both the numbers of registered swimmers and the fastest 50-yd freestyle time for each of the 59 swimming communities. We used simple linear regression to examine the relationship between the outcome variable (fastest time in a region) and the predictor variable (the log of the number of registered swimmers in the region). In line with the procedures used by Charness and Gerchak,6 we transformed the participation data by taking the natural log of the number of participants, which served to linearize the relationship between participation and performance. We completed the analysis for each combination of independent variables (again, sex, age, and competition year) for a total of 168 regression analyses. We calculated the mean of the log of the number of participants for each combination of independent variables and then used it with the regression equations to predict the best performance for that particular group size. Next, we back-transformed the mean of the log of the number of participants, doubled it, and took the log. We then used this value and the regression equation to predict the top performance if the number of participants doubled. This enabled us to calculate the predicted improvement in the best regional performance from doubling the number of participants in the region (percentage improvement). 
Finally, we used 3-way ANOVA to determine whether percentage improvement was affected by sex, age, and calendar year.

In the event of a significant main effect, we planned to conduct all pairwise comparisons using a Tukey post hoc test. For all statistical analyses, significance was set at an alpha level of .05.
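The regression and doubling calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function name `doubling_improvement` and the synthetic participant counts and times are hypothetical, and NumPy's ordinary least-squares fit stands in for whatever statistical package was actually used. Because ln(2N) = ln N + ln 2, doubling participation shifts the predicted best time by the fitted slope multiplied by ln 2.

```python
import numpy as np


def doubling_improvement(n_swimmers, best_times):
    """Regress best time on ln(participants); estimate the gain from doubling.

    Returns (slope, intercept, r_squared, pct_improvement).
    """
    log_n = np.log(np.asarray(n_swimmers, dtype=float))
    times = np.asarray(best_times, dtype=float)

    # Simple linear regression: time = intercept + slope * ln(n)
    slope, intercept = np.polyfit(log_n, times, 1)

    # Proportion of variance in the best times explained by ln(n)
    fitted = intercept + slope * log_n
    ss_res = np.sum((times - fitted) ** 2)
    ss_tot = np.sum((times - times.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot

    # Predicted best time at the mean of ln(n)
    mean_log_n = log_n.mean()
    time_now = intercept + slope * mean_log_n

    # Back-transform the mean, double it, and take the log again
    doubled_log_n = np.log(2.0 * np.exp(mean_log_n))
    time_doubled = intercept + slope * doubled_log_n

    # Faster swims mean smaller times, so improvement is the relative drop
    pct_improvement = 100.0 * (time_now - time_doubled) / time_now
    return slope, intercept, r_squared, pct_improvement


# Hypothetical data for 59 regions (NOT the USAS data)
rng = np.random.default_rng(0)
n = rng.integers(20, 2000, size=59)
t = 30.0 - 0.8 * np.log(n) + rng.normal(0.0, 0.5, size=59)

slope, intercept, r2, pct = doubling_improvement(n, t)
print(f"slope={slope:.3f}  R^2={r2:.2f}  improvement from doubling={pct:.2f}%")
```

In the study this calculation would be repeated once for each of the 168 combinations of sex, age, and competition year; the sketch shows a single combination only.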

Results

The log of the number of swimmers in a region was a significant predictor of the top swim time in that region for all 168 combinations of sex, age, and calendar year (P < .05). The mean and median R² values were .41 and .42, respectively, and the minimum and maximum R² values were .10 and .62, respectively.

The 3-way ANOVA on percentage improvement revealed significant main effects for sex and age (P < .05). Percentage improvement was greater for boys (2.44%, 95% CI 2.17–2.71%) than for girls (2.18%, 95% CI 1.97–2.39%). Percentage improvement in performance was greater for 6-year-olds (5.06%, 95% CI 4.33–5.79%) than for 7-year-olds (3.72%, 95% CI 3.08–4.36%), 8-year-olds (2.49%, 95% CI 2.25–2.73%), 9-year-olds (2.23%, 95% CI 1.95–2.52%), 10-year-olds (2.18%, 95% CI 1.98–2.38%), 11-year-olds (2.00%, 95% CI 1.78–2.23%), 12-year-olds (1.91%, 95% CI 1.67–2.15%), 13-year-olds (1.58%, 95% CI 1.46–1.70%), 14-year-olds (1.57%, 95% CI 1.47–1.66%), 15-year-olds (1.59%, 95% CI 1.53–1.66%), 16-year-olds (1.72%, 95% CI 1.60–1.84%), 17-year-olds (1.57%, 95% CI 1.45–1.69%), 18-year-olds (2.00%, 95% CI 1.84–2.16%), and 19-year-olds (2.73%, 95% CI 2.57–2.90%). In addition, it was greater for 7-year-olds than for 8- to 19-year-olds, greater for 8-year-olds than for 13- to 15- and 17-year-olds, and greater for 19-year-olds than for 13- to 16-year-olds. The back-transformed means for the number of participants and percentage improvement for each combination of sex and age are shown in Table 1.

Table 1  Back-Transformed Mean of the Log of the Number of Swimmers (n) and Mean and 95% Confidence Interval (CI) for the Percentage Improvement in Performance Expected From Doubling the Number of Swimmers for Boys and Girls for Each Age From 6 to 19 Years

           Boys                              Girls
                   Improvement (%)                   Improvement (%)
Age (y)    n       Mean    95% CI            n       Mean    95% CI
6          139a    5.35    3.22–7.48         190a    4.77    3.84–5.70
7          137a    4.05    2.55–5.55         187a    3.39    2.95–3.83
8          137a    2.67    2.30–3.04         187a    2.30    1.88–2.72
9          109     2.32    2.07–2.58         162     2.15    1.43–2.87
10         132     2.40    1.98–2.83         201     1.96    1.61–2.30
11         143     2.37    2.10–2.63         221     1.64    1.34–1.94
12         142     2.22    1.86–2.58         221     1.59    1.30–1.88
13         131     1.73    1.43–2.03         198     1.42    1.14–1.70
14         120     1.65    1.41–1.90         178     1.48    1.31–1.65
15         103     1.52    1.22–1.83         146     1.66    1.53–1.79
16         91      1.54    1.30–1.78         119     1.90    1.69–2.12
17         81      1.56    1.36–1.76         96      1.58    1.24–1.93
18         62      1.87    1.39–2.35         66      2.12    1.74–2.51
19         67b     2.91    2.47–3.35         60b     2.56    2.18–2.93

a Represents the number of registered swimmers 8 y and younger, not the number of swimmers of each age.
b Represents the number of registered swimmers 19 y and older, not the number of 19-y-old swimmers.

IJSPP Vol. 10, No. 5, 2015


Discussion

The primary finding of this study is that the number of swimmers in a region is a significant predictor of the best swim performance in that region. In particular, we found that the log of the number of swimmers in a region accounted for, on average, 41%, and as much as 62%, of the variability in the single best performance in a region. This marks the first time, as far as we are aware, that this relationship has been demonstrated for a measure of physical, or more specifically in this case, athletic, performance. Previously, the relationship between participation and performance had only been described for a measure of intellectual performance when Charness and Gerchak6 showed the log of the number of chess players in a country to be a significant predictor of the ranking of the highest-rated player in that country.

In comparing the best swim performance for pairs of communities or the rating of the best chess player for pairs of countries, 2 important points regarding the nature of sampling variability become clear: (1) The best performance is more likely to come from the larger region, but (2) this will not always be the case. Because of the nature of sampling variability, when 2 groups have similar performance distributions, the best performance will, at times, come from the smaller of the 2 groups. The important underlying assumption here is the requirement that the group performance-distribution parameters are the same. If they are the same, then the best performances are most likely to come from the larger group, and there is 1 factor that can influence the expected best performance: the size of the sample, or N. If the distribution parameters are not the same, however, then the best performances would not necessarily be expected to come from the larger group. While there are several distribution parameters that can vary, we will focus on the 2 that are most commonly discussed and reported: location and scale.
The location, often represented by the population mean (μ), is where the center of the distribution falls on the x-axis, while the scale describes the degree to which the distribution is spread out, and is commonly represented by the population standard deviation (σ). If 2 groups have performance distributions with different locations (ie, μ is different) despite having equivalent scales and sample sizes (ie, σ and N are the same), the best performances would be expected to come from the group with the larger μ (see Figure 1). Alternatively, if 2 groups have performance distributions with different scales (ie, σ is different) despite having equivalent locations and sample sizes (ie, μ and N are the same), the best performances would be expected to come from the group with the larger σ (see Figure 2).

These scenarios provide a theoretical basis for developing strategies for improving performance in a particular geographical region. Charness and Gerchak6 previously identified 3 such strategies: the large μ strategy, the large σ strategy, and the large N strategy.

In the large μ strategy, the focus is on increasing μ while allowing σ and N to remain constant. This strategy should be effective because, when considering performance in arbitrary units, any distance from the mean for a distribution for which μ has increased occurs at a higher level of performance than the same distance from the mean on the original distribution (see Figure 1). Charness and Gerchak6 suggest that this can be accomplished by devoting a region’s resources to all members of the participant population equally, regardless of the performance level of the individuals. In doing so, participants of all levels improve to some extent, and thus the distribution shifts to a higher level of performance.

Figure 1 — The large μ strategy. Two distributions plotted in arbitrary performance units. When the original performance distribution (dotted curve) shifts to the right (ie, μ increases), it leads to a higher level of performance at any given point on the shifted distribution (solid curve).

Figure 2 — The large σ strategy. Two distributions plotted in arbitrary performance units. When the original performance distribution (dotted curve) spreads out (ie, σ increases), it leads to a higher level of performance on the broader distribution (solid curve) for any given number of standard deviation units above the mean.

In the large σ strategy, the focus is on increasing σ while allowing μ and N to remain constant. This would act to increase regional performance because any given number of standard deviation units above the mean would occur at a higher level of performance in a broader distribution. Charness and Gerchak6 propose that this can be accomplished by using the majority of a region’s resources to further develop the highest-performing individuals in the region. This strategy is rather intuitive. If the desired goal is to achieve the best performances from a given region, it makes sense to invest a disproportionate amount of resources in the individuals who already perform at the highest level in the region, or the individuals who are predicted to develop into the next, best performers. A key element, then, of the large σ strategy is the existence of a system for identifying which individuals to expend resources on as a means of optimizing regional performance. For the current top performers in a region, it is simply a matter of building a database that allows tracking and ranking the best performances. The process for the future top performers, on the other hand, is certainly more complicated because the top performers in a particular domain at 10 years old, or even 15 years old, are not necessarily still going to be the best at 20 or 25 years old. This represents a conundrum
for the large σ strategy, because if a region is going to commit resources toward a relatively few performers, it must be certain that the resources are being invested in the performers with the highest performance potential.

In the last of the 3 proposed strategies, the large N strategy, the emphasis is on increasing N but letting μ and σ remain constant. This is the strategy that was the focus of the current study. The logic behind this strategy is that by maximizing participation in the target activity, a group or region can rapidly improve on its best performances. Again, this strategy should work because, all else being equal, a region with more participants is more likely to have high-level performances than a region with a smaller participant population (see Figure 3).

Figure 3 — The large N strategy. A distribution plotted in arbitrary performance units. When 2 groups have different numbers of participants but the same performance distribution, the expected maximum value is higher for the larger group. For instance, using the standard normal distribution, the expected maximum value for a sample size of 500 (arrow A) is greater than the expected maximum value for a sample size of 100 (arrow B).

Using the regression equations developed earlier, we estimated the expected improvement in the top performance in a region from doubling the number of participants in that region to be around 2%. It is important to realize, however, that the simple act of doubling the number of participants is unlikely to have any sort of immediate impact on the best performance. The best swimmers in a region are likely already participating, so any added swimmers are likely to be relatively new to the sport. Since it takes time and effort to develop the skills necessary to perform at the highest levels, it might be years before any impact on performance can be detected from the increased number of participants. As a result, it might not be worthwhile to focus attention on adding new swimmers to the sport at the oldest ages—18 or 19 years old—but it is likely beneficial at the earliest ages.

Thus, rather than advocating solely for the large N strategy, we suggest that some hybrid approach, or combination of approaches, may be optimal for improving the best performances in a region. Using swimming as an example, we reason that the large N strategy should be emphasized in the sport and directed toward young potential athletes, assuming, of course, that the region has the necessary resources to do so. The funding emphasis for invoking this strategy might be more focused on marketing and publicity and less on coaching and coaching resources. However, for older cohorts, it may be more effective to gravitate away from the large N strategy and to shift focus to the large σ strategy. While it is difficult to determine the age at which this shift should occur, it would likely depend, in part, on the age at which the future elite performers are able to be identified. Once this can be done, the region can begin to shift its resources to further developing these already high-performing individuals.
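The sampling argument behind the 3 strategies, and the comparison drawn in Figure 3, can be checked numerically. The sketch below is an illustration only and is not part of the original analysis: the function names, the number of Monte Carlo replicates, and the random seeds are assumptions, and performance is treated as normally distributed in arbitrary units where higher is better (for swim times the direction is reversed). Two facts underlie it. If X = μ + σZ, then the maximum of N values of X equals μ + σ times the maximum of N values of Z, so μ, σ, and N each raise the expected best performance. And when 2 groups share one continuous distribution, every participant is equally likely to be the overall best, so the probability that the best comes from the group of size N1 rather than the group of size N2 is N1/(N1 + N2), which is 5/6 for groups of 500 and 100.

```python
import numpy as np


def expected_max(n, mu=0.0, sigma=1.0, reps=4000, seed=0):
    """Monte Carlo estimate of the expected best value among n participants."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=(reps, n))
    # Best performance in each simulated region, averaged over replicates
    return samples.max(axis=1).mean()


def prob_best_from_larger(n_large, n_small, reps=4000, seed=1):
    """Estimate how often the overall best comes from the larger group
    when both groups share the same standard normal distribution."""
    rng = np.random.default_rng(seed)
    best_large = rng.standard_normal((reps, n_large)).max(axis=1)
    best_small = rng.standard_normal((reps, n_small)).max(axis=1)
    return np.mean(best_large > best_small)


# Large N: same distribution, different group sizes (as in Figure 3)
print("expected max, N=100:", round(expected_max(100), 3))
print("expected max, N=500:", round(expected_max(500), 3))

# Large mu and large sigma: same N, shifted or widened distribution
print("expected max, N=100, mu=1:", round(expected_max(100, mu=1.0), 3))
print("expected max, N=100, sigma=2:", round(expected_max(100, sigma=2.0), 3))

# The larger group usually, but not always, holds the best performer
print("P(best from N=500 vs N=100):", round(prob_best_from_larger(500, 100), 3))
```

The last quantity makes point (2) of the Discussion concrete: under identical distributions the smaller group still produces the best performer a nontrivial fraction of the time.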

Conclusion

We began this project with the observation that certain groups or geographical regions routinely outperform others for certain measures of human performance. While it is tempting to look for genetic and environmental differences between the groups or regions in trying to better understand these performance disparities, we conclude that an equally important factor to consider is the number of participants—or, rather, sample size. Consistent performance differences between groups can arise even when the groups have the same performance distribution, so before considering teleological or biological arguments as probable reasons for performance differences, it is important to first determine whether the performance differences are greater than what would be expected on the basis of group differences in the number of participants.

References

1. World Chess Federation. FIDE Tournaments Archive. Athens: FIDE. http://ratings.fide.com/archive.phtml.
2. International Table Tennis Federation. Olympic Medalists. Lausanne: ITTF. http://www.ittf.com/media/News/Statistics/OlympicMedalists.pdf.
3. Khamvongsa M. Germany dances in the streets at World Cup victory parade. The Washington Post. July 15, 2014. http://www.washingtonpost.com/blogs/early-lead/wp/2014/07/15/germany-dances-in-the-streets-at-world-cup-victory-parade/
4. Trevisani P. Brazil’s president calls for soccer reform. Wall Street Journal. July 11, 2014. http://online.wsj.com/articles/brazils-president-calls-for-soccer-reform-1405123123.
5. Bilalic M, Smallbone K, McLeod P, Gobet F. Why are (the best) women so good at chess?: participation rates and gender differences in intellectual domains. Proc Biol Sci. 2009;276:1161–1165. doi:10.1098/rspb.2008.1576
6. Charness N, Gerchak Y. Participation rates and maximal performance: a log-linear explanation for group differences, such as Russian and male dominance in chess. Psychol Sci. 1996;7(1):46–51. doi:10.1111/j.1467-9280.1996.tb00665.x
7. USA Swimming. 2005–2010 Age Group Detail Reports. Colorado Springs: Swimming USA. http://www.usaswimming.org/DesktopDefault.aspx?TabId=1521.
