The pain scale demonstrated construct and content validity; however, the intra- and inter-observer reliability of the items was only satisfactory, suggesting that the items required refinement and readjustment.
Construct validity was demonstrated by the observed differences in scores between the groups at T4 [10]. The differences between GC, GA and GAA show that the scale is able to differentiate between horses with and without pain. Furthermore, because pain scores differed between GC and GCA, the scale was also able to identify different pain intensities. The ability of the scale to measure pain was confirmed by its responsiveness, shown by the change in GC scores between T4, T6 and T24 [2] and by the percentage change in pain scores after surgery and in response to analgesic administration [11].
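The paragraph does not give the formula used for the percentage change in pain scores. A minimal sketch, assuming the usual definition 100 × (later score - reference score) / reference score, is shown below; the function names and the example scores are hypothetical and are not study data. A pain-free baseline score of zero makes the ratio undefined, so the reference is assumed to be a non-zero score (for example, the score immediately before analgesia).

```python
from statistics import median


def percent_change(reference: float, later: float) -> float:
    """Percentage change of a later pain score relative to a reference score."""
    if reference == 0:
        # A zero reference (e.g. a pain-free baseline) makes the ratio undefined.
        raise ValueError("reference score must be non-zero")
    return (later - reference) / reference * 100.0


def median_percent_change(pairs):
    """Median percentage change over (reference, later) score pairs, one per horse."""
    return median(percent_change(ref, later) for ref, later in pairs)


# Hypothetical scores before and after analgesia for three horses (not study data).
example = [(10, 5), (8, 6), (12, 3)]
print(median_percent_change(example))  # -50.0
```

The median is used here rather than the mean because pain scale totals are ordinal; this choice is illustrative only.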
Male and female horses were included in both groups that did not undergo surgery (GA and GAA), which may have been a limitation of the study: because the scale is based on behavioural assessment, sex differences in behaviour could have produced differences in scores between groups at TC. However, this did not occur, and scores were similar in both sexes. The absence of a difference between groups before surgery and/or anaesthesia provides an additional means of confirming the construct validity of the pain scale. Responses to stress may mimic pain behaviour, and since stallions might be expected to be more agitated before surgery, the lack of a difference in baseline scores between treatment groups indicates that the construct validity of the scale was not compromised [6].
Another factor that may have confounded the interpretation of pain behaviours, especially for assessments requiring the evaluator to interact with the horse, was the time allowed for each horse to acclimatise to the stable and to the investigator who interacted with it. Some variability in the degree of this initial interaction between each horse and the investigator was expected. To limit it, one of the inclusion criteria for the study was that the horse must be halter trained and accustomed to interacting with people.
Another limitation of the study is that the pain model used here (castration) probably produces only mild to moderate pain. The scale should therefore also be tested and validated under conditions considered to cause more severe pain.
To date, there are no validated pain scales that measure mild to moderate soft tissue pain in horses, so criterion validity was evaluated by contrasting the variability of the scores given by the standard evaluator with that of the other evaluators [7,11]. The total scores from the pain scale were also correlated with three other classical scales used to measure clinical pain in animals, and a strong positive correlation was found [8]. Although these scales are not validated either, they are frequently used to evaluate pain [12] and lameness in horses [13,14]. Correlation with such classical scales has also been used previously to validate pain scales for visceral pain in horses [3,4].
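The paragraph does not state which correlation coefficient was used. Assuming a rank-based coefficient (Spearman's rho), which is common for ordinal pain scores, a minimal sketch of the calculation is shown below: each variable is converted to ranks, with tied values given the mean of the ranks they span, and the Pearson correlation of the ranks is taken. All names and scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical total scores on the new scale and on one classical scale.
new_scale = [2, 5, 9, 14, 20]
classical = [1, 3, 3, 7, 9]
print(round(spearman(new_scale, classical), 3))  # 0.975
```

In practice a statistics package would be used; the hand-written version only makes the ranking step explicit.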
The discrepancy in variability between the evaluators and the standard evaluator for each item of the scale suggests poor criterion validity. However, an alternative explanation is the limited training of the evaluators, combined with the complexity of the scale and the large number of items it comprises. The blinded evaluators were chosen for their experience in pain-related studies in numerous species, including horses. Furthermore, the item-total correlation, which indicates how much each item contributes to the total score, was moderate for most items. Hence, the very large number of items in the initial scale is likely to have reduced its overall criterion validity.
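The item-total correlation mentioned above can be sketched as follows. This assumes the "corrected" variant, in which each item is correlated with the sum of the remaining items so that the item does not inflate its own correlation, and uses the Pearson coefficient; whether the study used this variant or a rank-based coefficient is an assumption. Function names and scores are hypothetical.

```python
from math import isnan, sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores):
    """Correlate each item with the sum of the other items.

    scores: one row per assessment, one column per scale item.
    """
    n_items = len(scores[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical item scores: four assessments of a three-item scale.
example = [[0, 1, 0], [1, 2, 1], [2, 2, 1], [3, 3, 2]]
for j, r in enumerate(corrected_item_total(example)):
    print(f"item {j}: r = {'undefined' if isnan(r) else round(r, 2)}")
```

An item that never varies (for example, a behaviour that was never observed) yields an undefined correlation, which is itself a sign that the item adds nothing to the total score.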
The variation in intra- and inter-observer reliability among the items may suggest that the proposed instrument has low reliability. However, pain scale validation is not a single step but an iterative process: after items lacking specificity and relevance have been excluded, the instrument should be re-evaluated using the same validity criteria [2,5,7]. The poor reproducibility of some items, such as “Response to approach” and “presence of the observer”, may reflect observation errors caused by the difficulty of viewing the videos, a problem that might be resolved by performing observations in situ. On the other hand, the presence of the observer may itself modify the animals’ behaviour.
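The section does not name the reliability statistic. For ordinal item scores, a weighted Cohen's kappa is one common choice, and a minimal sketch with linear weights is shown below, assuming scores are integers from 0 to n_categories - 1. Inter-observer reliability compares two evaluators scoring the same clips; intra-observer reliability would apply the same function to one evaluator's two rounds of scoring. Names and scores are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted Cohen's kappa for two sets of ordinal scores.

    Scores must be integers in 0..n_categories-1, paired by assessment.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    if n_categories < 2:
        raise ValueError("need at least two score categories")
    k, n = n_categories, len(rater_a)
    # Observed joint proportions of (score from A, score from B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Linear disagreement weights: 0 for agreement, 1 for maximal disagreement.
    weight = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    obs_dis = sum(weight[i][j] * observed[i][j] for i in range(k) for j in range(k))
    # Disagreement expected by chance from the two raters' marginal distributions.
    exp_dis = sum(weight[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - obs_dis / exp_dis


# Hypothetical scores (0-3) given by two evaluators to the same six video clips.
evaluator_1 = [0, 1, 2, 3, 2, 1]
evaluator_2 = [0, 1, 3, 3, 2, 0]
print(round(weighted_kappa(evaluator_1, evaluator_2, 4), 3))
```

Linear weights give partial credit when evaluators differ by one score level, which suits items where adjacent scores describe similar behaviour.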
The scale items with the best relevance, specificity and item-total correlation results were retained in the scale after refinement. However, despite its lack of relevance and low inter-observer reliability, the behaviour “kicking the abdomen” was retained because it is considered a classical abdominal pain-related behaviour [12,15]. Although the inclusion of physiological parameters is questioned by some authors [3], these items are usually included in tools for assessing acute pain in horses [2,5] and in other species [7], and they give the scale a multidimensional character. Heart rate was retained after refinement because it was the only parameter that varied over time, it is easy to evaluate, and it has historically been important in the assessment of pain [12]. Because heart rate increased by more than 25% above pre-operative values (TC) in the animals undergoing surgery (GC and GCA) at T4 and T6, changes in heart rate of more than 25% were considered a relevant indicator of post-operative pain and were therefore included in the scale.
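The 25% heart rate criterion can be expressed as a simple threshold on the percentage change from the pre-operative (TC) value. The sketch below assumes that only increases are flagged, following the reasoning above; flagging decreases of the same size would mean comparing the absolute change instead. The function names, default threshold parameter and readings are hypothetical.

```python
def heart_rate_change_percent(baseline_bpm: float, current_bpm: float) -> float:
    """Percentage change of heart rate relative to the pre-operative (TC) value."""
    if baseline_bpm <= 0:
        raise ValueError("baseline heart rate must be positive")
    return (current_bpm - baseline_bpm) / baseline_bpm * 100.0


def heart_rate_flag(baseline_bpm: float, current_bpm: float,
                    threshold_percent: float = 25.0) -> bool:
    """True when heart rate is more than threshold_percent above baseline.

    Assumption: only increases are flagged; use abs() on the change to
    flag decreases of the same magnitude as well.
    """
    return heart_rate_change_percent(baseline_bpm, current_bpm) > threshold_percent


# Hypothetical readings in beats per minute (not study data).
print(heart_rate_flag(36, 46))  # True: about 27.8% above baseline
print(heart_rate_flag(36, 44))  # False: about 22.2% above baseline
```

The comparison is strict, so an increase of exactly 25% is not flagged, matching "more than 25%".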
As noted in a study describing the behaviour of horses undergoing arthroscopic surgery and laparotomy [16], horses without pain were more likely to stand at the front of the stable than elsewhere in the box. Behaviours such as “head position” and “response to auditory stimuli” were excluded because of their variability and because they might be unduly influenced by environmental stimuli.
Behaviours related to interaction with the observer showed relevance and specificity similar to those reported for an orthopaedic pain scale [2], and item-total correlations similar to those reported in horses undergoing laparotomy [5]. However, these behaviours may also be influenced by the type of management to which the animal is accustomed [17]. In our study, locomotion was also useful for detecting pain after soft tissue surgery, as animals in pain tended to be reluctant to move, in agreement with the altered locomotion reported in horses after orthopaedic surgery [13,14]. However, this contrasts with other studies in which increased locomotion was associated with pain [2,3], indicating that it is the change in locomotion, whether an increase or a decrease, that is useful to evaluate during pain assessment in horses.
Although palpation of the surgical site showed a low item-total correlation in this study, its specificity ranged from moderate to good and the item was relevant. In a previous study, horses undergoing laparotomy showed a high incidence of avoidance responses [5]. In our study, the response to palpation was probably related to inflammation at the surgical incision. However, horses commonly do not tolerate palpation of the inguinal area. Furthermore, when this behaviour was assessed from the video recordings, it may have been misinterpreted. Although the two cameras were placed in diagonally opposite corners of the stable to avoid blind spots, it was difficult to observe the animal when it stood close to the wall directly beneath one of the cameras. In some of these cases, the pelvic limbs could not be seen during palpation of the groin.
This is the first study to identify lifting of the pelvic limb as a pain-related behaviour in the horse, as indicated by its relevance and its moderate specificity and item-total correlation. This item was added to the scale after content validation and before construct validation, because the evaluator observed it in situ while assessing the GC group.
Because a considerable body of work now describes the development of tools for pain assessment, it was possible to evaluate the relevance, specificity and reliability of various pain behaviours previously described as relevant in horses. The low repeatability and reproducibility of some behaviours may indicate that their interpretation is influenced by the evaluator's experience and that they are therefore imprecise. Although the reliability of the total score of the refined scale was not investigated, the sensitive and specific behavioural items and categories may be used to compose a refined scale for future validation, ideally under clinical conditions.
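How item sensitivity and specificity were calculated is not described in this section. A generic sketch is shown below, assuming each assessment has a reference classification (for example, a post-operative assessment in a castrated horse counted as painful and a baseline TC assessment counted as pain-free) and that an item counts as "present" when its score is above zero. Sensitivity is then the proportion of painful assessments in which the item was present, and specificity the proportion of pain-free assessments in which it was absent. The dichotomisation, names and data are all assumptions.

```python
def item_sensitivity_specificity(item_scores, painful):
    """Sensitivity and specificity of one scale item.

    item_scores: the item's score for each assessment.
    painful: matching booleans giving the reference classification.
    An item is treated as present when its score is greater than zero.
    """
    pairs = list(zip(item_scores, painful))
    tp = sum(1 for s, p in pairs if p and s > 0)      # present when painful
    fn = sum(1 for s, p in pairs if p and s == 0)     # absent when painful
    tn = sum(1 for s, p in pairs if not p and s == 0)  # absent when pain-free
    fp = sum(1 for s, p in pairs if not p and s > 0)   # present when pain-free
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both painful and pain-free assessments")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical item scores for three painful and three pain-free assessments.
item = [2, 1, 0, 0, 0, 1]
painful = [True, True, True, False, False, False]
sens, spec = item_sensitivity_specificity(item, painful)
print(round(sens, 3), round(spec, 3))  # 0.667 0.667
```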
It should also be noted that the observer was not present in the box during the initial part of the scale assessment. It is therefore difficult to ascertain accurately how much the evaluator's presence might interfere with pain assessment. Consequently, whenever possible, horses should be observed using a remote monitoring system. Although the time necessary for pain assessment has not been determined, after 700 hours of video analysis we empirically suggest that a 5-minute period would be sufficient for observing the relevant pain-related behaviours in the horse.