We examined the variability, repeatability and test-retest reliability of FVEPs in normal adult horses to evaluate whether this method may be a suitable additional diagnostic tool in the clinical work-up of equine patients with ophthalmological and/or neurological disease. Before a new method can be used in clinical practice, its strengths and weaknesses have to be assessed, so that results from clinical patients can be interpreted correctly and adequate clinical decisions can be made.
The FVEP waveform recorded in all 17 horses in this study consisted of a series of positive and negative wavelets (P1, N1, P2, N2a, P3, N2, P4 and P5), as previously reported [16, 17]. However, in all horses some wavelets were present in only a limited number of recordings. This is similar to results from several human studies. Gastaut & Regis [8] showed that only about 20% of recordings included all wavelets in the human FVEP (peaks I-VI according to their nomenclature), but wavelets IV (around 80 ms) and V (around 130 ms) were present in practically all recordings and in all subjects. In the ISCEV standard for human FVEPs, the waveform is defined as a series of positive and negative wavelets (N1, P1, N2, P2, N3 and P3). Of these wavelets, N2 (around 90 ms) and P2 (around 120 ms) are described as the most robust components [1]. In our study on the equine FVEP, wavelets N1, P2, N2 and P4 were present in all recordings in all 17 horses. Both the intra- and inter-individual coefficients of variation were low (5–11%) for the peak times of P2, N2 and P4. Although present in all recordings, the N1 peak time was more variable (up to 17%), probably because this low-amplitude wavelet is occasionally difficult to localize and measure precisely. Our results indicate that the P2, N2 and P4 peak times are the most robust parameters and should be included in the evaluation of equine clinical patients. Although not present in all recordings, the N2a, P3 and P5 wavelets may still provide valuable clues regarding transmission and processing of visual stimuli. Hence, they may be useful for evaluation of certain conditions, or be valuable for our understanding of the processing of visual input, but this warrants further studies.
Our results show that there is substantial intra- and inter-individual variability in amplitudes in the equine FVEP. Substantial variability in amplitudes has also been shown in studies on the human FVEP [8, 9]. In our study, the coefficients of variation for the amplitudes present in all recordings in all horses (N1P2, P2N2 and N2P4) were up to 30% within horses and more than twice that (up to 64%) between horses. Hence, the variation for these amplitude parameters was considerably higher than that for peak times. Based on the results of our study, a wide range of amplitudes must be considered normal in the FVEP recorded from sedated horses. Therefore, the mere presence of a wavelet and its peak time may often be sufficiently informative, as very few patients are likely to have amplitudes that fall outside the normal range, and only severe abnormalities will cause sufficiently abnormal amplitudes.
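To make the distinction between within-horse and between-horse variation concrete, the following is a minimal sketch of how such coefficients of variation could be computed. It assumes that the within-subject CV is the average of each subject's own CV and that the between-subject CV is taken over the subject means; the exact procedure used in this study is not restated here, and the function names and amplitude values are hypothetical, invented for illustration only.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def within_and_between_cv(measurements):
    """Compute (within-subject CV, between-subject CV) in percent.

    `measurements` maps a subject ID to its repeated measurements.
    Assumption: within-subject CV is the mean of the per-subject CVs,
    and between-subject CV is the CV of the per-subject means.
    """
    within = mean(coefficient_of_variation(v) for v in measurements.values())
    between = coefficient_of_variation([mean(v) for v in measurements.values()])
    return within, between


# Hypothetical amplitudes in microvolts (not data from the study).
example = {
    "horse_a": [4.0, 5.0, 6.0],
    "horse_b": [9.0, 10.0, 11.0],
    "horse_c": [1.5, 2.0, 2.5],
}
within_cv, between_cv = within_and_between_cv(example)
print(f"within-subject CV: {within_cv:.1f}%, between-subject CV: {between_cv:.1f}%")
```

With these invented values the between-subject CV is several times the within-subject CV, which mirrors the qualitative pattern described above for amplitudes.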
The waveforms obtained from left and right eyes during the same recording session were similar. The variability in peak times between eyes in the same horse was low (7% or less). The variation has also been described as quite similar between eyes within the same subject in humans [1]. This low inter-ocular variation facilitates evaluation of clinical patients with suspected unilateral dysfunction, because one eye can serve as a control eye. The coefficients of repeatability (CR) reported in our study represent the range within which the absolute difference between two measurements on the same subject should fall with a 95% probability. Larger differences, outside the range set by the CR values (P2: 5 ms, N2: 18 ms, P4: 18 ms), are likely to indicate abnormal function. Again, amplitudes were more variable between eyes. However, the P2N2 and N2P4 amplitudes may provide important information, with CR values of 1.7 μV and 2.3 μV, respectively.
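As an illustration of how the inter-ocular CR values could be applied to a single patient, the sketch below flags wavelets whose left-right peak-time difference exceeds the CR. The CR values are those given above; the function name and the patient peak times are hypothetical and serve only as an example.

```python
# Inter-ocular CR values for peak times as given in the text (milliseconds).
INTEROCULAR_CR_MS = {"P2": 5.0, "N2": 18.0, "P4": 18.0}


def flag_interocular_differences(left_ms, right_ms, cr_ms=INTEROCULAR_CR_MS):
    """Return wavelets whose absolute left-right peak-time difference exceeds the CR."""
    return [
        wavelet
        for wavelet, cr in cr_ms.items()
        if abs(left_ms[wavelet] - right_ms[wavelet]) > cr
    ]


# Hypothetical peak times in milliseconds (not data from the study).
left = {"P2": 55.0, "N2": 100.0, "P4": 150.0}
right = {"P2": 62.0, "N2": 110.0, "P4": 160.0}

flagged = flag_interocular_differences(left, right)
print(flagged)  # ['P2']: a 7 ms difference exceeds the 5 ms CR
```

A flagged wavelet only indicates that the difference is larger than expected from measurement variability; it still has to be interpreted together with the other clinical findings.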
The waveforms obtained at separate recording sessions appeared quite similar. The coefficients of variation between sessions were low for peak times (3–6%) but higher for amplitudes (24–30%), which is similar to the variability shown between eyes within the same session. Bland-Altman plots with 95% limits of agreement were used to graphically examine the agreement between two measurements from separate recording sessions. The plots show that the mean difference between sessions is low for the P2 peak time, but higher for the N2 and P4 peak times. The mean differences were similar across all amplitude parameters. The coefficients of repeatability were computed to quantify the absolute repeatability, in the same unit as the parameter, with a probability of 95%. Based on our results, differences in peak times between recording sessions falling outside the reported CR values (P2: 5 ms, N2: 16 ms, P4: 39 ms) are likely to indicate either an improvement or a deterioration of a condition. For the amplitudes, the CR values are higher, again supporting the conclusion that differences in amplitudes between sessions will only rarely provide reliable information regarding the progression of a disease or the effect of a treatment. The ICC is a widely used reliability index in test-retest analyses [18]. In our study on the equine FVEP, we found that the ICC values ranged from fair (N2 peak time) to excellent (P2 peak time and P2N2 amplitude) according to the grading system proposed by Cicchetti [19].
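The two quantities discussed above can be sketched as follows: the mean difference (bias) with 95% limits of agreement underlying a Bland-Altman plot, and the verbal grading of an ICC value. This is a minimal sketch, not the analysis pipeline of this study. The limits are computed in the conventional way as bias ± 1.96 standard deviations of the paired differences, the ICC cut-offs are the commonly cited Cicchetti thresholds (below 0.40 poor, 0.40 to 0.59 fair, 0.60 to 0.74 good, 0.75 and above excellent), and the function names and peak times are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(session1, session2):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of the paired differences; the 95% limits of
    agreement are bias +/- 1.96 * sample SD of the differences.
    """
    diffs = [a - b for a, b in zip(session1, session2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def cicchetti_grade(icc):
    """Map an ICC value to the verbal grade commonly attributed to Cicchetti."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical peak times in milliseconds from two sessions (not study data).
first_session = [50.0, 52.0, 54.0, 56.0]
second_session = [49.0, 53.0, 52.0, 57.0]

bias, lower, upper = bland_altman(first_session, second_session)
print(f"bias {bias:.2f} ms, limits of agreement {lower:.2f} to {upper:.2f} ms")
print(cicchetti_grade(0.45), cicchetti_grade(0.80))
```

A bias close to zero with narrow limits corresponds to good agreement between sessions; a wide interval means that only large changes between sessions can be distinguished from measurement variability.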
Some of the variability in our data is due to difficulties in establishing the peak or trough of a specific wavelet precisely. Large-amplitude, pointed wavelets are generally easier to pinpoint than low-amplitude, more rounded or elongated wavelets. N1 was usually a low-amplitude wavelet, where the trough was sometimes difficult to localize precisely. P2, on the other hand, was most often a distinct peak that was easy to discriminate. Thus, it is not surprising that the P2 peak time showed the least variability and the highest repeatability and reliability. Although the N2-complex was always easy to identify, this complex was sometimes wide with a flat bottom (not a distinct trough), and sometimes also included N2a and P3. Therefore, the exact position of N2 was occasionally difficult to determine. P4 was also most often easily discriminated, but in some horses this wavelet was wide without a distinct peak (for example horse 3 in Fig. 2), which made precise marking difficult. In spite of these difficulties in the precise localization of some wavelets, coefficients of variation were low for P2, N2 and P4 peak times. The coefficients of repeatability, which represent the absolute difference between measurements (in the same unit as the parameter), show higher values for both N2 and P4 peak times than for the P2 peak time, which is not surprising given the difficulties described above.
The difficulties in the precise localization of wavelets certainly affected the amplitude measurements as well, for the reasons mentioned above, although probably not as much as the peak times according to our subjective assessment. The large variability shown for amplitudes is more likely due to other factors, such as variation in the level of sedation, muscle and movement artifacts, the temperament of the horse and its responses to external disturbances in the clinical environment. The variation between sessions may also be attributed to minor differences in electrode positions between sessions.
Andersson et al. [9] evaluated the test-retest properties of the human FVEP in 15 awake, normal subjects at three separate recording sessions. They found that precise marking of wavelets was sometimes difficult, due to split peaks and highly variable waveforms within and between individuals. Wide inter-individual ranges for both N2 and P2 peak times and amplitudes (the parameters they evaluated) were described, and a high intra-individual variability over time was reported. Specifically, they concluded that, due to the large variability, the FVEP is unreliable as a tool for detecting increased intracranial pressure, a use that had previously been suggested by other authors. Therefore, they advised caution when interpreting changes in FVEPs in clinical work. It is not possible to make direct comparisons between human and equine FVEPs, because of differences in the overall waveform between species, different electrode positions due to anatomical dissimilarities, and the fact that our horses were sedated and their human subjects fully awake. All these differences may certainly have an impact on the variability. However, some comparisons may still be relevant. We saw some individual variation in waveforms between horses (Fig. 2 and Fig. 3), which is similar to what was described by Andersson et al. [9] in their human subjects. As in our study, Andersson et al. [9] reported that deciding on the exact position of a peak or trough of a wavelet was sometimes difficult. However, our impression is that the wavelets were somewhat easier to discriminate in the equine FVEPs than in the human FVEPs. Also, split peaks were not as prominent or frequent as described by Andersson et al. [9] in some human subjects. The range is narrower for P2 and N2 peak times in the equine FVEP (Table 2), compared to the intervals of more than 50 ms considered to represent the normal ranges for N2 and P2 in the human study.
A limitation of our study is the small sample size, which is due to limited access to horses available for the study. Reported values for reliability and repeatability should therefore be interpreted with some caution. In addition, further studies are needed to evaluate equine FVEPs in horses with diseases of the visual pathways that cause visual impairment. Although we have provided CR values between eyes and sessions in equine FVEPs, their clinical significance warrants further studies. Differences lower than the reported CR values (between eyes and sessions) can still be of importance, because mild dysfunction of the visual pathways may still be present. Therefore, results from FVEP testing should always be put into context with other findings and results from additional tests obtained during the work-up of a patient, including for example ophthalmic and neuro-ophthalmologic examinations, obstacle course testing and diagnostic imaging.