Volume 8, Issue #2

Interpreting the Clinical Significance of Pain Questionnaires

A comparison of effect sizes of commonly used patient self-report pain instruments, in different pain patient populations, provides an objective ranking of such tools.

Chronic pain is one of the most prevalent problems facing the health-care system today. Over 50 million Americans are affected by pain, creating enormous personal, societal, and financial hardships. Over 80% of all physician visits occur due to complaints of pain, and the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) has mandated that pain be considered the fifth vital sign, in addition to blood pressure, pulse, temperature, and respiration.1 Healthcare costs due to pain are in the tens of billions of dollars annually.2,3 Chronic pain patients are five times more likely to utilize healthcare services,4 and direct medical expenses can exceed $90 billion annually for back pain alone.5 In addition, the consequences of pain include lost earnings, reduced productivity, and increased disability and workers’ compensation benefits.6 Given the enormous impact of pain, the United States Congress declared the Decade of Pain Control and Research, starting January 1, 2001.7

The assessment of pain and its associated effects is a central component of any clinical practice in pain management, from primary care to specialized tertiary rehabilitation settings. In conjunction with clinical interviews, practitioners use various psychometric instruments to assess chronic pain patients, including self-reported measures and clinician-rated scales. Using multiple instruments in combination, rather than relying on any single measure, to assess the efficacy of a given intervention helps to ensure adequate and comprehensive assessments. Over the years, a myriad of psychometric measures have been developed to aid in the assessment and treatment of chronic pain. However, assessments can become complex, onerous, and burdensome to patients as the number of instruments utilized grows. An over-abundance of administered measures may often complicate rather than clarify the assessment process. As new instruments are developed, each measure is individually evaluated, and psychometric properties such as reliability and validity are established. However, instead of the newly developed measures replacing those that are out-of-date or less efficient, most are simply added on to some previously established assessment protocol. Therefore, determining which of the various tests displays the greatest utility in evaluation and responsiveness to change is critically important. The purpose of this article is: (1) to provide practicing clinicians with an update on the newest trends in measuring change in pain, which may be quite daunting, especially since they are still in their “infancy”; and (2) to present a more “tried and proven” approach that can be used in everyday practice today.


The increasing amounts of paperwork that most medical patients have to complete prior to any initial physical evaluation require clinicians to consider the issue of incremental validity, or the extent to which an instrument contributes additional useful and accurate information toward answering a clinical question.8 Although screening tools are helpful in many instances, they are not diagnostic in and of themselves. Ongoing debates persist regarding how best to measure outcomes, and what the targeted outcome actually is for the treatment of chronic pain. Patients’ self-reported pain and disability level, return-to-work rates, and level of functioning have all been offered as the most important outcome to consider.9

The importance of measuring the efficacy of treatments has become an increasingly focal issue as more diverse invasive and non-invasive treatments for chronic pain emerge, and policy-makers grapple with escalating costs. A growing number of assessment measures have been developed in attempts to validate such treatments. Specifically, patient-reported measures have gained greater acceptance in recent years.10 Indeed, a recent Draft Guidance published by several Federal departments and regulatory bodies emphasized the importance of scientifically documenting the psychometric properties of patient-reported measures.11 Research studies are constantly testing and re-testing various measures to prove or disprove their reliabilities and analyze their predictive validities. The reliability (i.e., the consistency of any measure over time) is crucial because measures are typically utilized to evaluate patients undergoing active and ongoing treatment over time. The issue of validity (i.e., the extent to which an instrument measures what it is purported to measure) is also vitally important. Instruments must also demonstrate sensitivity, or the ability to accurately classify patients. Meta-analyses have demonstrated significant inconsistencies in these domains when analyzing chronic pain studies.12 In addition, the statistical properties of any given measure do not necessarily guarantee the practical relevance of the results that it generates.

Minimal Clinically Important Difference

Historically, clinical measures utilized to assess treatment outcome have focused primarily on reliability and validity. The issue of responsiveness (i.e., a measure’s ability to detect change) was less widely studied. Recently, however, the topic of clinical importance has received increasing emphasis, as both clinicians and researchers endeavor to determine a treatment’s practical importance rather than just its statistical significance. Studies with large sample sizes may demonstrate small individual changes in outcome that are unnoticeable to clinicians and patients, but nevertheless reach statistical significance when aggregated. That is to say, just because a change has been found to be statistically significant does not mean that it is necessarily clinically significant! Accordingly, a number of studies have attempted to define clinically meaningful change, and to calculate concrete values that determine the importance of an observed clinical effect. The importance of such interpretive guidance is heightened by the inherently subjective nature of pain and the notoriously recalcitrant reputation of chronic pain. However, any determination of importance is made within a certain context and perspective. A meaningful change for a patient may be very different from that of a clinician, an institution, or a policy-maker. 
Also, individual treatment decisions are based on standards of change that may differ greatly from the treatments, decisions, and effects within a group and, even in a group with negligible effects, some individuals will likely exhibit meaningful changes.13 In addition, the practical importance of a change of any magnitude depends upon the cost required to produce it; a consideration that is often evaluated separately from determinations of clinical significance.14,15 Most recent formulations of clinically important change in the field of pain have tended to pursue a rather “elusive” concrete value that might indicate a clinically significant outcome.

One such approach is the concept of a minimal clinically important difference (MCID). This has been defined as the smallest difference that patients perceive as beneficial.16 Such a benchmark value has also been dubbed the minimal clinically important change (MCIC) and the minimal important change (MIC) when used to describe longitudinal changes within the same patients over time. This variety of terminology, unfortunately, is also paralleled by the diversity in methodology for determining such a value. A great deal of preliminary clinical research is still needed to “sort out” which methodology is the most meaningful and applicable. To provide an initial overview of these concepts, a brief discussion is presented here. Generally, there are two basic methods of calculating a measure’s MCID: (1) distribution-based; and (2) anchor-based approaches [for more explicit information, see Crosby, Kolotkin, & Williams,13 who list seven anchor-based strategies and seven distribution-based methods along with equations for determining clinically meaningful change].

Distribution-based Approaches

These approaches focus on the statistical characteristics of the sample, and compare an observed change to an index of variability in order to determine whether the change is substantive.15 Such approaches have used an array of measures of variability, including the standard error of measurement, the standard error of the mean change, the standard deviation of change, and the standard deviation of the stable group.13 Effect size is a standardized statistical measure that compares score-change after treatment to a pre-treatment standard deviation. Cohen17 established guidelines for interpreting effect size that have been widely accepted and utilized in the literature. We will emphasize this effect size measure in the present article. In contrast, the minimal detectable change (MDC) provides an indication of the smallest change that can be considered greater than the measurement error within a certain level of confidence.15 Demonstrating that an observed change lies outside the range of normal variability, given adequate psychometric precision of the instrument used, is a necessary step toward proving meaningful change. However, the clinical importance of such a difference currently remains to be proven. Statistical information on its own does not address the question of clinical importance.
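
As a rough illustration of the distribution-based idea, the sketch below computes a standard error of measurement (SEM) and a 95% minimal detectable change using a formulation that is common in the measurement literature: SEM = SD × √(1 − reliability), and MDC95 = 1.96 × √2 × SEM. The article itself does not specify these formulas, so they are an assumption here, and the function names and input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd_baseline, reliability):
    """SEM = SD * sqrt(1 - r), where r is a test-retest reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd_baseline * math.sqrt(1.0 - reliability)


def minimal_detectable_change(sd_baseline, reliability, z=1.96):
    """MDC = z * sqrt(2) * SEM; sqrt(2) accounts for error in both measurements."""
    sem = standard_error_of_measurement(sd_baseline, reliability)
    return z * math.sqrt(2.0) * sem


# Hypothetical instrument: baseline SD of 2.0 points, reliability of 0.84
sem = standard_error_of_measurement(sd_baseline=2.0, reliability=0.84)
mdc95 = minimal_detectable_change(sd_baseline=2.0, reliability=0.84)
print(f"SEM = {sem:.2f}, MDC95 = {mdc95:.2f}")
```

A change smaller than the MDC cannot be distinguished from measurement error under these assumptions; a change larger than the MDC is detectable, but, as noted above, not necessarily clinically important.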

Anchor-based Approaches

These approaches to determining the MCID relate a change on a patient-reported outcome measure to some external criterion thought to be indicative of change.15 Such external anchors may include objective outcomes (such as medication use or healthcare utilization) or subjective indications (such as patient reports of improvement). Indeed, patients’ global assessment of treatment efficacy after its completion is the primary anchor used to develop an MCID value. However, it should be clearly noted that evaluating responsiveness on one self-report measure based on another self-report rating is both circular in logic and fraught with bias. For example, global ratings possess unproven validity and reliability, and are also likely subject to contextual and personal influences (including patient expectancy and recall bias factors).

These anchor-based methods are also used in combination with statistical calculations in many ways, such as: determining the mean change score (MCS) of patients reporting significant change; comparing the magnitude of change to the variability of non-changed patients (minimal detectable change, or MDC); and finding the optimal cut-off score to differentiate changed from non-changed patients (optimal cutoff point, or OCP).18 Despite subsequent statistical elaboration, such methods are founded upon patients’ global judgments of their experiences of change. In addition, all of these methods require accurate, non-arbitrary classification of patients into the various groups, a condition that is complicated by self-report effects and scale imperfections, but rarely explicitly addressed in the literature. In sum, anchor-based approaches attempt to address clinical importance by relating observed change to an external anchor, but the relevance and validity of any such external anchor must be firmly established in order to provide any meaningful guidance. This knowledge is currently lacking in the scientific literature.
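
To make two of these anchor-based calculations concrete, the following minimal sketch computes a mean change score for patients whose global rating indicates improvement, and an optimal cutoff point chosen by maximizing sensitivity + specificity − 1 (Youden's index). Youden's index is one common way to pick such a cutoff and is an assumption here, not a method stated in the article; the function names and the patient data are hypothetical.

```python
def mean_change_score(change_scores, improved):
    """Mean change among patients whose anchor rating says 'improved'."""
    selected = [c for c, y in zip(change_scores, improved) if y]
    if not selected:
        raise ValueError("no improved patients in the sample")
    return sum(selected) / len(selected)


def optimal_cutoff(change_scores, improved):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    A patient is classified as 'changed' when change >= cutoff.
    """
    n_pos = sum(1 for y in improved if y)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both improved and non-improved patients")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(change_scores)):
        tp = sum(1 for c, y in zip(change_scores, improved) if y and c >= cutoff)
        tn = sum(1 for c, y in zip(change_scores, improved) if not y and c < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical data: change in pain score and the patient's global rating
changes = [0, 1, 1, 2, 3, 3, 4, 5]
improved = [False, False, False, False, True, False, True, True]

print(mean_change_score(changes, improved))
print(optimal_cutoff(changes, improved))
```

Both results depend entirely on the global rating used as the anchor, which is exactly the limitation described above.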

Of course, the current pursuit of information regarding clinically important change is a worthy endeavor that holds great potential for informing and guiding the treatment of pain. However, the current state of research on MCID in pain treatment provokes more questions than it provides answers. Different studies have supplied a multitude of methods for calculating MCIDs, and these methods result in vastly disparate values. For instance, a recent attempt to develop international consensus regarding minimal important change on five common measures of pain and functional status reviewed studies that found differences in estimates of minimal important change that typically ranged from 25% to 35% of the magnitude of the entire scale (e.g., anywhere from 2 to 29 points on the 100-point Visual Analog Scale).19 The authors concluded that limited and heterogeneous empirical evidence has led to variable results, as well as no consensus method for determining the MCID. Despite these difficulties, the group managed to agree upon a 30% change as a generally useful guide for determining clinical importance. A percentage change may be more appropriate than an absolute number, because the value of the MCID depends not only on the methodology used in its calculation, but also on the magnitude of the initial scores.14,18,20 That is to say, the greater the initial severity of pain, the greater the change has to be before it is considered significant. Also, the impact of the same magnitude of change may depend upon whether it is an improvement or a decline. The context of the treatment and the consequences of the determination must also be contemplated when considering whether a change is clinically meaningful.18 Finally, it is much more credible to conclude that a change surpassing some predetermined threshold is meaningful than to conclude that a change failing to meet it is not. 
All in all, while the quest to provide objective meaning to unfamiliar, subjective units is worthwhile, there is great danger in oversimplifying the task and focusing on a single value that is often arbitrary and unsupported by evidence.14
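
The dependence of a percentage criterion on the initial score can be shown with a short sketch. It applies the 30% guideline mentioned above to two hypothetical patients who each improve by the same two points; the function names and scores are illustrative assumptions only.

```python
def percent_improvement(baseline, follow_up):
    """Percentage reduction from baseline (positive = less pain at follow-up)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a percentage change")
    return 100.0 * (baseline - follow_up) / baseline


def meets_threshold(baseline, follow_up, threshold_pct=30.0):
    """True when the improvement reaches the chosen percentage threshold."""
    return percent_improvement(baseline, follow_up) >= threshold_pct


# Two hypothetical patients, each improving by 2 points on a 0-10 scale
for baseline, follow_up in [(4, 2), (8, 6)]:
    pct = percent_improvement(baseline, follow_up)
    print(baseline, follow_up, f"{pct:.0f}%", meets_threshold(baseline, follow_up))
```

The same absolute change clears the threshold for the patient starting at 4 but not for the patient starting at 8, which is the sense in which a greater initial severity requires a greater change.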

With the above in mind, it becomes clear that very little consensus has been reached on the most appropriate method for determining the minimal clinically important difference in general, or the most accurate value or percentage to be used specifically on measures of pain and disability. Thus, any determination of the clinical importance of statistical results requires the contribution of external factors—whether patient-rated anchors, established guidelines of statistical interpretation, individual judgment of involved clinicians, or collaborative decisions of experts in the field. These still need to be developed!

Effect Size

We can now turn to a discussion of the more traditional way of determining the strength of an evaluated treatment outcome. Effect size has been the established method of quantifying the magnitude of a change by considering the difference between two groups, the effect of a treatment compared to baseline measures, or a measure of the strength of the relationship between two variables.21 The primary benefit of reporting an effect size is having a common term that permits a standardized comparison with other instruments, interventions, or studies about the relative magnitude of a statistically significant effect. Such a comparison can provide an indication of whether a change is clinically or practically relevant.22 Effect size can be expressed using a number of different statistics, most commonly Pearson’s r and Cohen’s d. Cohen’s d is especially useful in evaluating group differences, or pre- to post-treatment effects, and is defined as the difference between the group means divided by the standard deviation of either group (as long as the variances of the two groups are homogeneous).17 A pooled standard deviation is commonly used when calculating Cohen’s d; it is the square root of the average of the squared standard deviations of the two groups. In evaluating the magnitude of change within the same cohort (e.g., pre- vs. post-treatment measures), the difference between the mean of pre-treatment scores and the mean of post-treatment scores is divided by the standard deviation of the pre-treatment scores.21 Cohen17 has proposed the following d values or effect sizes that are now routinely used:
0.2 = small effect size
0.5 = medium effect size
0.8 = large effect size

Thus, a change of one half of a standard deviation is considered medium, and values for small and large effects were selected with the intent that they be readily distinguishable from a medium effect and no effect at all. Although the effect size statistic shares many of the limitations that make determinations of true clinical importance difficult, it represents a well-established, straightforward gauge of practical relevance, and it provides an objective value by which disparate instruments can be compared. Below, we analyze the responsiveness and utility of several commonly-used pain measures by comparing their effect sizes when administered to various pain treatment populations.
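
The calculations described in this section can be sketched in a few lines. The example below follows the definitions given in the text (pooled standard deviation as the square root of the average of the squared standard deviations; within-cohort d as the mean pre- to post-treatment difference divided by the pre-treatment standard deviation) and Cohen's 0.2/0.5/0.8 guidelines. The function names, the "negligible" label for values below 0.2, and the sample scores are assumptions made for illustration.

```python
import math
import statistics


def pooled_sd(sd1, sd2):
    """Square root of the average of the two squared standard deviations."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2) / 2.0)


def cohens_d_between(mean1, mean2, sd1, sd2):
    """Between-group d: difference in means divided by the pooled SD."""
    return (mean1 - mean2) / pooled_sd(sd1, sd2)


def cohens_d_pre_post(pre, post):
    """Within-cohort d: (mean pre - mean post) / SD of pre-treatment scores."""
    return (statistics.mean(pre) - statistics.mean(post)) / statistics.stdev(pre)


def interpret(d):
    """Label |d| using Cohen's guidelines; 'negligible' below 0.2 is our label."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical pain ratings for five patients before and after treatment
pre = [6, 7, 8, 5, 9]
post = [4, 5, 6, 4, 6]
d = cohens_d_pre_post(pre, post)
print(f"d = {d:.2f} ({interpret(d)})")
```

Because d is expressed in standard deviation units, the same function can be applied to instruments with very different score ranges, which is what allows the comparison across measures that follows.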

Background Context

A total of 262 patients completed an interdisciplinary treatment program for chronic pain, and thus had pre- to post-treatment measures. Demographic data for these patients are presented in Table 1. The patients were further divided into three subgroups based on their pain profile (240 patients could be assigned to one of these three groups): musculoskeletal pain subgroup (n=98); any other type of single pain diagnosis (e.g., headache, neuropathy, RSD, fibromyalgia, cancer; n=43); multiple categories of pain (i.e., more than one type of pain complaint; n=99). Paired sample t-tests were conducted to evaluate each measure for pre- to post-treatment change. Cohen’s d was used to express the effect size of each measure.

Table 1. Demographic Variables, Total Interdisciplinary Treatment Completers

Variable                                   n (%)
Age, mean (SD)                             53.73 (14.99)
Gender
  Male                                     81 (30.9)
  Female                                   181 (69.1)
Race
  Caucasian                                217 (84.1)
  African American                         23 (8.8)
  Hispanic                                 12 (4.6)
  Asian                                    2 (0.8)
  Other                                    4 (1.5)
Marital Status
  Single                                   33 (12.6)
  Married                                  169 (66.0)
  Living with significant other            8 (3.1)
  Divorced or separated                    29 (11.3)
  Spouse deceased                          16 (6.3)
Receiving Disability Payments
  Yes                                      47 (18.2)
  No                                       203 (81.2)
Pending litigation related to pain
  Yes                                      217 (86.1)
  No                                       35 (13.9)

Physical/Functional Measures

Tools used to measure the physical and functional status of pain patients include:

  • Visual Analog Scale (VAS)
  • Million Visual Analog Scale (MVAS)
  • Oswestry Low Back Pain Disability Questionnaire (OSW)
  • Medical Outcomes Survey 36-Item Short Form Health Survey (SF-36)

The Visual Analog Scale (VAS) of pain intensity consists of a 10-centimeter horizontal line dashed at 2-point intervals from 0 to 10, ranging from “no pain” to “worst possible pain.” Patients reported their current degree of pain by marking an “X” on a line. The VAS is well-researched and has consistently demonstrated good psychometric properties,23-25 including reliability, validity, and sensitivity.26 However, despite widespread use, few studies have looked at its utility in predicting treatment outcomes.27 McGeary, Mayer, and Gatchel,28 though, have found that the level of pain intensity at pre- and post-treatment in a workers’ compensation rehabilitation setting was significantly associated with treatment outcomes. Higher pain ratings at pre-treatment were related to increased dropout rates, and higher pain intensity ratings at post-treatment indicated a greater risk for poor socioeconomic outcomes.

The Million Visual Analog Scale (MVAS)29 is a self-report questionnaire that addresses the domains of pain and disability. It consists of a 15-item analog scale on which patients respond by indicating on a 10-cm line their level of pain and disability associated with each domain. Scores ranging from 0-39 indicate mildly disabling pain, 40-84 indicate moderately disabling pain, and scores 85 and over indicate severely disabling pain. The MVAS was originally designed to assess physical functioning and disability among patients with chronic low back pain. While few studies have specifically focused on the psychometric properties of this measure, the research that has been done shows promising results.24,30 For example, the MVAS has been demonstrated to be an effective disability rating scale and to have utility in predicting treatment outcomes for patients with chronic disabling spinal disorders.31
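
The MVAS severity bands quoted above translate directly into a small lookup. This is only a sketch of the cut-points as stated in the text; the function name is hypothetical, and scores falling between the stated integer bands (for example 39.5) are assigned to the lower band by assumption.

```python
def classify_mvas(total_score):
    """Map an MVAS total score to the severity bands described in the text."""
    if total_score < 0:
        raise ValueError("MVAS total score cannot be negative")
    if total_score < 40:
        return "mildly disabling pain"
    if total_score < 85:
        return "moderately disabling pain"
    return "severely disabling pain"


for score in (25, 60, 110):
    print(score, classify_mvas(score))
```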

The Oswestry Low Back Pain Disability Questionnaire (OSW)32 is a 10-item self-rated measure that assesses limitations on various activities of daily living due to pain. It was specifically designed for assessment of low back pain. Each item is scored on a 0-5 point scale, with a potential range of total scores from 0 to 50 and higher scores indicating increasing levels of disability. The OSW is widely used, especially in the low back pain patient subpopulation, and several studies have demonstrated its psychometric properties and usefulness as an index of functional limitation.33-38

The Medical Outcomes Survey 36-Item Short Form Health Survey (SF-36)39 is a 36-item multipurpose health survey used to assess quality of life related to health status. It is widely used for routine monitoring and assessment of healthcare treatment outcomes and is reported to have high test-retest reliability coefficients with good internal consistency. While it was not originally developed specifically for a pain population, the SF-36 has been used as an outcome measure in a number of studies focused on the treatment of pain.40-42 It contains eight scales, as well as two standardized summary scales that correspond to patients' overall sense of physical and mental well-being—the Mental Component Scale (MCS) and the Physical Component Scale (PCS). The availability of population-based normative data from various medical populations (such as a spinal population) makes the SF-36 useful for comparative purposes. In addition, elevations on the MCS have been consistently identified in the assessment of pain management patients.43-45 However, the SF-36’s utility as a clinical application for assessing individual patients’ outcomes remains unproven. Thus far, it has demonstrated utility in comparing group changes over time (e.g., pre- to post-rehabilitation) when studied with chronically disabled back pain patients, but it was ineffective when used for individual patient assessment.41

Psychosocial Measures

  • Beck Depression Inventory-II (BDI-II)
  • Pain Medication Questionnaire (PMQ)
  • West-Haven-Yale Multidimensional Pain Inventory (MPI)

The Beck Depression Inventory-II (BDI-II)46 is a 21-item self-report inventory designed to assess the severity of depressive symptoms. Each item is scored from 0 to 3, with a potential range of total scores from 0 to 63. The BDI was originally developed by Beck and colleagues in 1961 and revised in 1996.46,47 The BDI-II has categorical divisions, with 0-13 considered to be minimal depression, 14-19 mild depression, 20-28 moderate depression, and 29-63 severe depression. The BDI-II is a broadly used measure for assessing depression levels in a variety of settings. The relationship between pain and depression has been widely reviewed in the literature48-50 and the BDI-II has been demonstrated to be a valid measure of depression in chronic pain patients.
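
The scoring rules described above (21 items, each scored 0 to 3, summed to a total of 0 to 63, then assigned to a category) can be sketched as follows. The function names are hypothetical, and the sketch only illustrates the arithmetic; it is not a substitute for the published scoring manual.

```python
def score_bdi_ii(item_scores):
    """Sum 21 item scores, each of which must be an integer from 0 to 3."""
    if len(item_scores) != 21:
        raise ValueError("the BDI-II has 21 items")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item is scored from 0 to 3")
    return sum(item_scores)


def classify_bdi_ii(total):
    """Map a total score to the categorical divisions given in the text."""
    if not 0 <= total <= 63:
        raise ValueError("total score must be between 0 and 63")
    if total <= 13:
        return "minimal"
    if total <= 19:
        return "mild"
    if total <= 28:
        return "moderate"
    return "severe"


# Hypothetical response pattern: every item endorsed at level 1
total = score_bdi_ii([1] * 21)
print(total, classify_bdi_ii(total))
```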

The Pain Medication Questionnaire (PMQ)51 is a relatively new self-report screening measure containing 26 items based on behavioral correlates and attitudes suggestive of medication misuse. The PMQ is constructed on a 5-point Likert scale ranging from 0 (“Disagree”) to 4 (“Agree”). Greater potential risk of medication misuse is reflected by an overall higher score. The PMQ was found to be psychometrically sound with regard to test-retest reliability and internal consistency, but there is limited research available on its association with outcomes or its utility in effectively tailoring interventions. However, the PMQ has good potential for research application, particularly considering many pain management programs’ current focus on identifying and treating medication misuse in chronic pain patients.52-56

The West-Haven-Yale Multidimensional Pain Inventory (MPI)57 is a 61-item self-report measure that utilizes a cognitive-behavioral perspective to examine how patients evaluate and manage their pain. This assessment classifies a patient’s responses into one of several coping styles: Adaptive, Interpersonally Distressed, Dysfunctional, Anomalous, Hybrid, or Unanalyzable. The MPI was originally developed and intended to be used for pre-treatment evaluation, not as a measure of treatment outcome. A normative sample of chronic pain patients was used in development and demonstrated good internal consistency.57 However, systematic investigation of outcomes has demonstrated a poor association between the MPI and treatment outcomes in a chronic pain population.58,59

Total Heterogeneous Pain Group (n=262)

Results indicated that the physical and/or functional instruments showing the greatest effect size were the VAS (d=1.27) and the MVAS (d=0.94), with both exhibiting large effect size. The OSW (d=0.67) showed a moderate effect size, while the SF-36/PCS (d=0.19) had the lowest effect size of the physical measures. Among the psychosocial instruments, a moderate effect size was obtained for three of the measures: the BDI-II (d=0.72), the SF-36/MCS (d=0.62), and the PMQ (d=0.79). The MPI was associated with a negligible effect size (d=0.03). Table 2 summarizes the mean differences, standard deviations, effect sizes, and p-values for each of the instruments within the context of the Heterogeneous Pain Group.

Table 2. Effect Size, Total Heterogeneous Pain Group (n=262)

Measure       n      Mean ∆    SD       d          Sig.
VAS           238    3.19      2.50     1.27*      .000
MVAS          227    26.05     27.57    0.94*      .000
OSW           227    5.41      8.03     0.67**     .000
SF-36/PCS     209    4.16      22.20    0.19***    .007
PMQ           87     6.43      8.15     0.79*      .000
BDI-II        214    5.50      7.62     0.72*      .000
SF-36/MCS     209    7.61      12.34    0.62**     .000
MPI           234    0.08      2.47     0.03***    .634

*high effect size; **medium effect size; ***low effect size
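
As a quick arithmetic check, the d values in Table 2 appear to correspond to each measure's mean change divided by the tabulated SD. That relationship is an inference from the numbers rather than a method stated in the article, and the sketch below simply recomputes the ratios from the table and ranks the instruments; the variable and function names are hypothetical.

```python
# (mean change, SD, reported d) for each measure, copied from Table 2
TABLE_2 = {
    "VAS": (3.19, 2.50, 1.27),
    "MVAS": (26.05, 27.57, 0.94),
    "OSW": (5.41, 8.03, 0.67),
    "SF-36/PCS": (4.16, 22.20, 0.19),
    "PMQ": (6.43, 8.15, 0.79),
    "BDI-II": (5.50, 7.62, 0.72),
    "SF-36/MCS": (7.61, 12.34, 0.62),
    "MPI": (0.08, 2.47, 0.03),
}


def effect_size(mean_change, sd):
    """Ratio of mean change to the tabulated SD (assumed relationship)."""
    return mean_change / sd


def rank_by_effect_size(table):
    """Measure names ordered from largest to smallest recomputed ratio."""
    return sorted(table, key=lambda name: effect_size(*table[name][:2]), reverse=True)


for name in rank_by_effect_size(TABLE_2):
    mean_change, sd, reported = TABLE_2[name]
    print(f"{name:10s} recomputed {effect_size(mean_change, sd):.2f}  reported {reported:.2f}")
```

The recomputed ratios agree with the reported values to within rounding, and the resulting order places the VAS and MVAS first and the MPI last.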

Musculoskeletal Pain Subgroup (n=98)

Among patients with musculoskeletal pain, the physical/functional instruments that displayed a large effect size were the VAS (d=1.30), the MVAS (d=0.92), and the OSW (d=0.85). The SF-36/PCS (d=0.63) showed a moderate effect size. In terms of psychosocial instruments, a large effect size was obtained for the PMQ (d=1.00). Both the BDI-II (d=0.68) and the SF-36/MCS (d=0.76) showed a moderate effect size, while a negligible effect was associated with the MPI (d=0.07). Table 3 summarizes the mean differences, standard deviations, effect sizes, and p-values for each of the instruments within the context of the Musculoskeletal Pain Subgroup.

Table 3. Effect Size, Musculoskeletal Pain Subgroup (n=98)

Measure       n      Mean ∆    SD       d          Sig.
VAS           86     3.11      2.28     1.30*      .000
MVAS          82     24.32     26.30    0.92*      .000
OSW           24     6.36      7.46     0.85*      .000
SF-36/PCS     80     5.48      8.76     0.63**     .000
PMQ           24     6.33      6.35     1.00*      .000
BDI-II        88     6.00      8.79     0.68**     .000
SF-36/MCS     80     9.13      11.95    0.76**     .000
MPI           86     0.20      2.66     0.07***    .492

*high effect size; **medium effect size; ***low effect size

Other Pain Subgroup (n=43)

For this group of patients, large effect sizes were associated with the VAS (d=1.29), the MVAS (d=1.15), and the SF-36/PCS (d=0.90). The OSW (d=0.49) showed a moderate effect size. Among the psychosocial instruments, the largest effect size was associated with the PMQ (d=1.21). The BDI-II (d=0.71) displayed a moderate-to-large effect size, while the SF-36/MCS (d=0.43) was associated with a low-to-moderate effect size. The MPI was associated with a small effect (d=0.17). Table 4 summarizes the mean differences, standard deviations, effect sizes, and p-values for each of the instruments within the context of the Other Pain Subgroup.

Table 4. Effect Size, Other Pain Subgroup (n=43)

Measure       n      Mean ∆    SD       d          Sig.
VAS           41     3.17      2.46     1.29*      .000
MVAS          38     24.29     21.04    1.15*      .000
OSW           38     3.84      7.89     0.49***    .005
SF-36/PCS     34     7.56      8.44     0.90*      .000
PMQ           15     8.33      6.88     1.21*      .000
BDI-II        41     5.39      7.64     0.71**     .000
SF-36/MCS     34     5.63      13.11    0.43***    .017
MPI           39     0.41      2.45     0.17***    .302

*high effect size; **medium effect size; ***low effect size

Multiple Pain Category Subgroup (n=99)

Large effect sizes were demonstrated on both the VAS (d=1.25) and the MVAS (d=0.97). The OSW (d=0.65) displayed a moderate-to-large effect size, while the SF-36/PCS (d=0.02) demonstrated a negligible effect. Among the psychosocial instruments, moderate effect sizes were demonstrated on the BDI-II (d=0.76), the SF-36/MCS (d=0.73), and the PMQ (d=0.59). Consistent with the other pain groupings, the MPI (d=0.16) demonstrated minimal responsiveness. Table 5 summarizes the mean differences, standard deviations, effect sizes, and p-values for each of the instruments within the context of the Multiple Pain Subgroup.

Table 5. Effect Size, Multiple Pain Subgroup (n=99)

Measure       n      Mean ∆    SD       d          Sig.
VAS           94     3.36      2.68     1.25*      .000
MVAS          89     29.85     30.66    0.97*      .000
OSW           87     5.59      8.62     0.65**     .000
SF-36/PCS     78     0.85      34.62    0.02***    .829
PMQ           40     5.69      9.63     0.59**     .001
BDI-II        94     5.20      6.82     0.76**     .000
SF-36/MCS     78     8.71      11.94    0.73**     .000
MPI           93     0.38      2.28     0.16***    .115

*high effect size; **medium effect size; ***low effect size

Effect Sizes and Physical/Functional Measures

A comparison of effect sizes among the physical/functional measures is illustrated in Figure 1. The VAS measure of pain intensity yielded the greatest effect size. This conclusion is supported by other research that indicated a patient’s self-report as the best measure of pain60 and the utility of the VAS as a predictor of treatment outcome.28 It should also be noted that the large effect size for the VAS was consistent across all pain groups, indicating that it is applicable across differing types of pain categories. Despite its limitations, the VAS can provide information about pain perception that has proven to be a valuable tool in assessing and treating chronic pain. However, it cannot be concluded that the VAS is the only measure needed to assess outcomes of interdisciplinary treatment for chronic pain patients. This study displays the strength of several other measures in assessing chronic pain patients and treating more than one aspect of specific physical complaints.

The MVAS was demonstrated to be a strong indicator of change when assessing chronic pain patients. This may be due in part to the fact that the first question on the MVAS is the same as the question posed on the VAS, specifically, “How bad is your pain?” Like the simple VAS, the MVAS had a large effect size in the context of the heterogeneous population as well as in the divided groups (i.e., musculoskeletal, other, and multiple), supporting its utility in assessing a number of diagnostic pain categories. As indicated previously, the MVAS has been shown to be reliable in predicting treatment outcomes for patients with chronic disabling spinal disorders.31 The current findings further support the use of the MVAS to determine a patient’s perception of pain and to document treatment outcome.

A moderate effect size was obtained for the OSW in the context of a heterogeneous group (d=0.67). However, the present study’s findings indicate a low effect size for the OSW for patients in the “other” category (d=0.49). When divided into diagnostic groups, patients who experienced any single pain complaint other than musculoskeletal did not demonstrate as high a responsiveness to change on this measure. Beurskens’s study61 displayed a large effect size for the OSW, which likely can be explained by its sample consisting solely of low back pain patients, the specific group for which the OSW was designed. The current results indicate that the OSW is a useful measure when used to assess treatment outcomes for chronic spinal and/or musculoskeletal pain patients, as well as functional limitations within the context of a musculoskeletal pain population.

The small observed SF-36/PCS effect size (d=0.19) was consistent with a previous investigation of the SF-36/PCS within a heterogeneous population of pain patients (d=0.28).43 The fact that the SF-36/PCS showed significantly lower effect sizes than any other physical measure analyzed in this study may be due to the nature of the questions asked in evaluating the physical components related to quality of life. Other physical measures (i.e., VAS, MVAS, and OSW) assess perceived pain, physical disability, and direct limitations to specific activities of daily living due to pain, while the SF-36/PCS focuses on quality of life factors that may not be as consistently defined, and it therefore yielded a smaller effect size. It has been shown that physical and mental components measuring quality of life both contribute to deficits in functioning.45 However, the present results suggest that the quality of life physical components may not be as indicative of impairment as the quality of life mental components when treating a heterogeneous group of chronic pain patients, particularly patients with multiple pain complaints. One possible explanation may be that when a chronic pain patient is experiencing multiple types of pain, the quality of their physical life is not as likely to change drastically in the course of treatment. The division of pain categories in this study revealed that this measure shows a larger effect when used with specific pain groups than when given to patients with more than one type of pain diagnosis.

Effect Sizes and Psychosocial Measures

A comparison of effect sizes among the psychosocial measures is illustrated in Figure 2. The results further support the use of the BDI-II in a chronic pain setting. The moderate-to-large effect size (d=0.65) confirms this measure’s ability to detect significant changes arising from interdisciplinary treatment for chronic pain patients. A previous study found BDI-II effect sizes ranging from 0.87 to 1.67.62 The present observed effect size was likely lower due to the specific population that was studied. In the Reisch et al. study,62 participants included patients who were being treated for various psychiatric disorders as the primary diagnosis and who thus likely had greater changes in depression levels than did the present study population.

The SF-36/MCS assesses quality of life related to health status, specifically a patient’s overall sense of mental well-being. The availability of population-based normative data from various medical populations (such as a spinal population) makes the SF-36 useful for comparative purposes. This study demonstrated a moderate-to-large effect size for the SF-36/MCS (d=0.62) among a heterogeneous sample of patients. The mental component yielded a larger effect size than the physical component, indicating that a measure of mental well-being is more sensitive to pre- to post-treatment change among chronic pain patients receiving interdisciplinary treatment.

This underscores the importance of a biopsychosocial assessment for chronic pain patients. Viewing the results in the context of the different types of pain diagnoses studied in this heterogeneous population, it can be noted that the mental component yielded a moderate effect size for the musculoskeletal and multiple pain categories but a low effect size for the other single diagnosis group. This could indicate that the MCS is not as sensitive to change for patients who have a single pain diagnosis such as headache, neuropathy, fibromyalgia, reflex sympathetic dystrophy, or cancer. In addition, the “other” category used in this study may be too broad to yield results as meaningful as the other more specific parameters for group classification.

The PMQ displayed a moderate-to-large effect size (d=0.79), indicating its utility in assessing change as related to the potential for medication misuse in chronic pain populations. The division of the heterogeneous group into diagnostic categories yielded moderate-to-large effect sizes as well, supporting the use of this measure in a variety of specific pain populations. The single pain categories (musculoskeletal and other single type of diagnoses) revealed a large effect size, and patients with multiple types of pain complaints showed a moderate effect size. These results suggest that patients with more than one type of pain diagnosis may be less likely to report a change in their attitudes and behaviors regarding medication use.

Previous studies have documented the inability of the MPI to satisfactorily document treatment outcome, especially compared to other instruments.58,59 The present results further support the notion that the MPI is not a strong measure when used to assess outcome changes in a heterogeneous chronic pain population. Indeed, the overall effect size was negligible (d=0.02) and consistently within the small effect size range (d < 0.2) across the three different pain groups. These findings reinforce the original conceptualization of the MPI as an instrument of pre-treatment evaluation with little utility in documenting treatment outcome.
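For readers who want to line up the effect sizes quoted in this section, the short Python sketch below collects the d values reported above for the heterogeneous sample and sorts them. The verbal bands are an assumption: they follow Cohen's conventional benchmarks (0.2, 0.5, and 0.8) and do not map exactly onto the labels used in the text, which describes values such as 0.65 as "moderate-to-large." The VAS and MVAS are omitted because the text reports them only as "large," without a numeric value. The function names are illustrative.

```python
# Effect sizes (Cohen's d) quoted in the text for the heterogeneous sample.
reported_d = {
    "OSW": 0.68,
    "SF-36/PCS": 0.19,
    "BDI-II": 0.65,
    "SF-36/MCS": 0.62,
    "PMQ": 0.79,
    "MPI": 0.02,
}


def cohen_band(d):
    """Label |d| using Cohen's conventional benchmarks (an assumption here)."""
    magnitude = abs(d)
    if magnitude < 0.2:
        return "below 0.2"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"


def rank_measures(values):
    """Return (name, d, band) tuples sorted from largest to smallest |d|."""
    ordered = sorted(values.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(name, d, cohen_band(d)) for name, d in ordered]


if __name__ == "__main__":
    for name, d, band in rank_measures(reported_d):
        print(f"{name:10s} d={d:.2f}  {band}")
```

Sorting on the absolute value keeps the ranking meaningful if a measure is scored so that improvement produces a negative d.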

Summary and Conclusions

Due to the prevalence of the problem, the enormity of the suffering, and the significance of the financial burden, chronic pain represents a vitally important problem in modern healthcare. The inherently subjective and multifaceted nature of the affliction hampers attempts at accurate assessment. Despite these difficulties, appropriate understanding and treatment of pain demands adequate measurement of its nature, severity, and consequences, as well as its progression over time. Controversy remains about the most appropriate statistical method to gauge clinical responsiveness, especially in documenting meaningful clinical changes. A multitude of methods have been employed, producing a plethora of values purported to represent the minimal change required to be clinically significant. However, these efforts have thus far failed to adequately address the myriad of theoretical and methodological concerns to arrive at a truly trustworthy value. Statistically-based methods fail to demonstrate clinical importance. Anchor-based approaches are only as good as the criteria on which they are based. Hybrid approaches have yet to establish their ability to accurately classify patients into groups. Accordingly, although the appeal of a cut-off score for significant change is readily apparent, clinically important change is better conceptualized as a guiding consideration rather than a concrete value.

Effect size represents one of the simplest, most well-established statistics for comparing diverse findings and determining practical importance. The main purpose of the present article was to demonstrate the effect size of common measures utilized in chronic pain settings at pre- and post-treatment to determine which measures show more responsiveness in measuring interdisciplinary treatment outcomes. Future research would be beneficial in extending the scope of the current data to other types of diagnoses, treatments, and settings in order to further understand the nature of pain and better inform its treatment. In addition, further investigation of methods for comparing measures and evaluating clinical importance would be helpful in clarifying the complicated interactions and consequences of pain assessment.
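As a rough illustration of how a pre- to post-treatment effect size can be computed, the Python sketch below calculates Cohen's d as the mean change divided by the pooled standard deviation of the two time points. This is one common formulation and is an assumption here, since this section does not state the exact formula used in the study. The function name and the scores are hypothetical, and the sign convention assumes that higher scores mean worse pain, so a positive d indicates improvement.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d_pre_post(pre, post):
    """Standardized mean change: (mean_pre - mean_post) / pooled SD.

    Assumes the same patients were measured at both time points, so both
    lists must have the same length (at least two scores each).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same number of scores")
    if len(pre) < 2:
        raise ValueError("at least two scores per time point are required")
    # With equal group sizes, the pooled variance is the average of the
    # two sample variances.
    pooled_sd = sqrt((stdev(pre) ** 2 + stdev(post) ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("scores have no variability; d is undefined")
    return (mean(pre) - mean(post)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical 0-10 pain intensity ratings for six patients.
    pre_scores = [7, 8, 6, 9, 7, 8]
    post_scores = [5, 6, 5, 7, 4, 6]
    print(f"d = {cohens_d_pre_post(pre_scores, post_scores):.2f}")
```

Because the result is expressed in standard deviation units, it can be compared across instruments that use different scales, which is what makes a side-by-side ranking of measures possible.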

The data presented in the current article are consistent with the biopsychosocial model of pain, emphasizing pain as a complex, multidimensional phenomenon expressed as interactions among biological, psychological and social components.9,63 While certain measures, such as the VAS and MVAS, demonstrated a large effect size, several other instruments displayed moderate-to-large effect sizes (e.g., OSW, BDI-II, PMQ, and SF-36/MCS) and should not be disregarded when choosing a battery of tests for assessment purposes. Gatchel9,64 recommends using multiple measures of change whenever possible, and these data support the usefulness of multiple assessment instruments in providing a comprehensive assessment of chronic pain patients. These data provide guidance for selection of the strongest measures if a comprehensive assessment is not available due to time constraints or financial limitations. However, a comprehensive evaluation makes use of instruments that demonstrate both robust psychometric properties and diverse content domains in order to optimally assess and treat patients with pain in the context of the biopsychosocial model.

Last updated on: February 21, 2011