 Research article
 Open Access
 Published:
Comparison of timeseries models for monitoring temporal trends in endemic diseases seroprevalence: lessons from porcine reproductive and respiratory syndrome in Danish swine herds
BMC Veterinary Researchvolume 15, Article number: 231 (2019)
Abstract
Background
Monitoring systems are essential to detect if the number of cases of a specific disease is rising. Data collected as part of voluntary disease monitoring programs is particularly useful to evaluate if control and eradication programs achieve the target. These data are characterized by random noise which makes harder to interpret temporal changes in the data. Monitoring trends in the data is a possible approach to overcome this issue.
The objective of this study was to assess the performance of three timeseries models that allows monitoring trends in data in terms of its adaptability when used to monitor changes in disease seroprevalence at a national scale based on data collected as part of voluntary monitoring programs. We compared two Bayesian forecasting methods and an Exponential smoothing method, specifically a Dynamic Linear Model, a Dynamic Generalized Linear Model and a Holt’s linear trend method, respectively. These three different types of time series models were applied to data on weekly seroprevalence of Porcine Reproductive and Respiratory Syndrome (PRRS) in Danish swine herds.
Results
Comparing the linear crossdependence between the filtered values obtained from the three models and the raw data, we observed that the Holt’s linear trend method shows negative linear dependence for roughly half of the time for breeding/nucleus and multiplier herds, having values close to zero for most of the period in finisher herds.
Conclusions
Bayesian forecasting methods adapt faster to changes in the data, compared to the deterministic Holt’s linear trend method. The practical implication of this greater flexibility is that the Bayesian methods will provide more reliable values of changes in the data and have potential to be implemented as part of a surveillance system in Denmark.
Background
Monitoring systems are essential to detect changes in disease status in a timely and effective manner. Failure of control (and eradication) programs may have a devastating economic impact on herds with susceptible animals.
In the context of endemic diseases, slow and gradual increases in incidence and prevalence are expected to be observed due to the existence of vaccination and health management programs and to previous exposure, which can lead to natural immunity of several individuals in the population [1]. When following up on control and eradication programs, the failure to achieve a target value of disease prevalence within a certain period of time indicates that the implemented strategies should be revised. These programs are often based on laboratory diagnostic tests performed on a regular basis. As a result, the serological results generate large amounts of data characterized by random noise, due to the variation in the disease prevalence and the number of herds tested over time. The frequency of laboratory diagnostic testing also depends on the economic value of the animal and on the disease impact [2]. In these cases, the data differ from those obtained from traditional surveillance (generally focused on identifying new cases) increasing their complexity as disease burden is affected not only by new cases but also by the duration and recovery rate.
In recent years, several studies explored the performance of different temporal monitoring methods in detecting outbreaks of (re)emerging diseases [3, 4]. However, these methods might result in false alarms when applied to laboratory diagnostic data characterized by random noise and, as a consequence, with the costs of investigation of these alarms as well as a lower trust on the monitoring system. One alternative approach could be to monitor the trend of the underlying level of the time series, which can be positive or negative depending on whether the time series exhibits an increasing or decreasing pattern [5, 6]. This is particularly useful for monitoring temporal changes in trends laboratory diagnostic results collected as part of voluntary disease monitoring programs. Based on these data, veterinarian authorities can implement control measures whenever certain thresholds related to the disease status have been exceeded. Furthermore, the efficiency of implemented control measures and eradication programs can be evaluated and redefine whenever the disease prevalence (and incidence) fails to achieve a certain level.
In a previous simulation study [6], monitoring the trend component showed great potential as a basis for monitoring temporal changes in diseases prevalence. However, those methods were not applied to real laboratory diagnostic data collected as part of voluntary disease monitoring programs, where challenges such as shifts in the collection frequency often occur.
The objective of this study was to assess the performance of three timeseries models in terms of their adaptability when used to monitor changes in disease seroprevalence at a national scale based on real world data. We compared two Bayesian forecasting methods, namely a Dynamic Linear Model (DLM) and a Dynamic Generalized Linear Model (DGLM), both with a linear trend component. The third time series model was an exponential smoothing method, specifically a Holts’ linear trend model. These three different types of time series models were applied to data on weekly seroprevalence of Porcine Reproductive and Respiratory Syndrome (PRRS) in Danish swine herds.
Results
Data description
A total of 51,639 laboratory submissions from 5095 Danish swine herds sent between 2007 and 2014 were included in the analysis, corresponding to 386 Red SPF herds, 3441 Blue SPF herds and 2174 Non SPF herds. The median (Q1 — Q3) number of weekly herds tested was 59 (50–68), 53 (42–62), 23 (19–28) for Red SPF, Blue SPF and non SPF herds respectively. The median PRRS seroprevalence from 2010 to 2014 is 0.10 (0.06–0.14) for the Red SPF, 0.32 (0.26–0.37) for the Blue SPF and 0.35 (0.27–0.43) for Non SPF herds. For the Red SPF herds with at least one positive submission throughout the study period, the median (Q1 – Q3) number of weeks between two consecutive submissions was 4 (1–5) weeks. For the Red SPF herds with all submissions negative for PRRS, the median number of weeks between two consecutive submissions was 4 (4–5) weeks. For Blue SPF herds, the corresponding values were 37 (13–52) and 51 (45–54) while for the Non SPF herds they were 32 (3–40) and 25 (2–27) respectively.
Models initiation and parametrization
The DLM was run with deltas of 0.95, 0.97 and 0.98 for Red, Blue and Non SPF herds respectively; deltas of 0.98, 0.97 and 0.98 were used in the DGLM for the same herd types respectively. For PRRS seroprevalence in Red SPF herds, the initial trend was − 0.0004 with an initial level of 0.129 and a beta of 0.001 and alpha of 0.08 for the Holt’s linear trend method; an initial trend of 0.0004 and initial level of 0.373 with a value of beta of 0.001 and alpha of 0.21 was defined for Blue SPF herds. For modeling PRRS seroprevalence in Non SPF herds, the Holt’s linear trend method was defined with an initial trend of − 0.0004, an initial level of 0.404 with a beta of 0.113 and alpha of 0.002.
Modeling performance
Results of the forecast errors obtained for the three modeling approaches are shown in Table 1. The root mean of squared errors (RMSE) of the DGLM is significantly smaller than the RMSE of the DLM for Red and Non SPF herds. The RMSE of Holt’s linear trend method is significantly larger than both Bayesian forecast methods for Red SPF herds and significantly lower for Non SPF herds. For Blue SPF herds the RMSE values only differed on the 4th decimal place.
Comparing the filtered means obtained from the different models (Fig. 1), it is possible to observe that the DLM is the method with the highest flexibility to adapt to changes in PRRS seroprevalence for the different herd types. This is most evident for the period between June 2011 and January 2013 for Non SPF herds where the filtered values of the DLM followed the decreasing shifts in PRRS seroprevalence in contrast with the HW filtered mean, which stayed at a constant level.
Local linear crossdependence
A Kernel smoother parameter of 4 for the DLM and DGLM and of 30 for the Holt’s linear trend method were found to minimize the generalized crossvalidation gamma deviance of the multivariate Evolutionary Wavelet Spectrum analysis. Figure 2 illustrates the local linear crossdependence between PRRS seroprevalence in the different herd types and the filtered values obtained from the three different models for level j = 1, i.e. the wavelet should include 1 time unit corresponding to 1 week. The local linear crossdependence between PRRS seroprevalence in the different herd types and the filtered values obtained from the DLM and the DGLM show a positive linear dependence for most of the time (i.e. the filtered values obtained from these models followed the observed changes in PRRS seroprevalence). The linear crossdependence between PRRS seroprevalence in the different herd types and the filtered values obtained from the Holt’s linear trend method shows a negative linear dependence for roughly half of the time for Red and Blue SPF herds, having values close to zero for most of the period in Non SPF herds. The sum of the local linear crossdependence absolute values obtained for the whole period of study (256 weeks starting from 23rd July 2012 to 31st December 2014) are presented in Table 2.
Discussion
The performance of three timeseries models for monitoring trends in PRRS seroprevalence in different types of Danish swine herds was assessed. Bayesian forecasting methods demonstrated more flexibility to adapt to shifts in PRRS seroprevalence timeseries when compared to the Holt’s linear trend method.
The Red SPF herds are tested for PRRS on a monthly basis, while for Blue SPF herds the frequency of testing is once per year and Non SPF herds have no legal requirements for testing frequency. For Red SPF and Blue SPF herds, the laboratory diagnostic data are a good indication of the betweenherd PRRS seroprevalence at country level. For Non SPF herds, the frequency of testing depends on suspicions of outbreaks, farmer compliance, trading purposes and ongoing control and eradication programs. This resulted in timeseries characterized by random noise, as a result of the variation in the disease prevalence and the number of herds tested over time, as well as larger shifts on the level, as seen in Fig. 1.
One core feature, which separates the two Bayesian methods from the Holt’s linear trend method, is that the Bayesian methods are fundamentally stochastic in their way of updating the parameter vector, while the Holt’s linear trend method is fundamentally deterministic; in the Bayesian approach, the values in the parameter vector are updated at each time step, based not only on the magnitude of the forecast errors but also the level of confidence we can have in the forecast error, expressed as the forecast variance in the Kalman filter [7]. Furthermore, the observational variance, which is used when calculating the forecast variance [7] was dependent on the number of herds observed at a given time step. In other words, the amount by which the model would errorcorrect was influenced by the reliability of the observed prevalence values, as well as the reliability of the modelbased forecasts and forecast errors. With all of these factors, which went into the Bayesian estimation of the most likely underlying true levels of the time series, it is perhaps not surprising that both of the Bayesian methods showed greater flexibility than the deterministic method. This is particularly useful to veterinarian authorities to provide reliable information when implementing (and revising) disease control and eradication programs whenever certain thresholds related to the disease status have been exceeded. Also, in cases where a shift occurs from voluntary to mandatory surveillance programs (or vice versa), it would be expected that the variance of the prevalence would be significantly affected; for the mandatory records, you would expect that the farms being tested are much the same from one month to the next, leading to less variation. The Bayesian methods would be able to take such changes into account when estimating the true prevalence, while this would be impossible for deterministic methods such as the Holt’s linear trend method.
Conclusions
We showed that Bayesian forecasting methods adapt faster to changes in the data, compared to the deterministic Holt’s linear trend method. The practical implication of this greater flexibility is that the Bayesian methods will provide more reliable values of changes in the prevalence and have the potential to be implemented as part of a surveillance system in Denmark. Understanding these differences in utility of various monitoring methods will allow veterinarian authorities to improve the implementation (and revisions) of disease control and eradication programs at national scales.
Methods
Data sources
Laboratory submission data stored in information management systems in the National Veterinary Institute  Technical University of Denmark and in the Laboratory for Swine Diseases  SEGES Pig Research Centre were used to determine the weekly PRRS seroprevalence in Danish swine herds from January 2007 to December 2014. The laboratory diagnostic test results for PRRS from both laboratories were only available for this period of time. Collections of individual blood samples collected on the same day from different animals from a given herd are processed as individual laboratory submissions. Only submissions with at least 10 individual blood samples tested by serological tests (Blocking EnzymeLinked Immunosorbent Assay — ELISA ([8]; IDEXX, Ludwigsburg, Germany) and/or Immunoperoxidase monolayer assay — IPMA [9]) were included in the study. Herds were classified as PRRS seropositive when at least 2 individual blood samples in each submission tested PRRS positive. The countrylevel betweenherd PRRS seroprevalence was calculated weekly as the proportion of PRRS positive herds for a given time period from the total number of herds tested for PRRS on that week. This will be referred to as “PRRS seroprevalence” throughout the manuscript.
In Denmark, the Specific Pathogen Free (SPF) system is a voluntary health program with established rules for monitoring several diseases, including PRRS [10]. The designations “Red”, “Blue” and “Green” are used within the SPF system to classify the herds according to its biosecurity status. Within the SPF herds, the “Red” herds are tested every month for PRRS and the “Blue” and “Green” herds are tested at least once a year. The laboratory submission data were merged with the SPF database and, based on the herd id numbers, the herds were classified as Blue, Red and Non SPF herds (i.e. if the herds were not part of the SPF system). Due to the low number of “Green” SPF herds in the SPF system, these herds were included in the analysis as Blue SPF herds. The individual timeseries for each herd type were created and modeled separately. The seasonality test was performed based on auto correlation plots (ACF function in R) of individual timeseries and by comparing the fitness of the Holt Winters model with and without seasonal components (both “additive” and “multiplicative”). No autocorrelation or seasonality was found in the time series.
The datasets for this manuscript are not publicly available because are owned by the National Veterinary Institute — Technical University of Denmark and by the Pig Research Centre–SEGES.
Modeling
All methods described below were performed in R (version 3.5.0) [11].
Bayesian forecasting methods
Two Bayesian forecast models, a DLM and a DGLM, both with a linear trend, as described by [6] were used to model the different timeseries. The aim of these models is to estimate the underlying true value combining the observed data (i.e. PRRS seroprevalence) with a conditional distribution ϴ_{t} given by D_{t} (ϴ_{t} D_{t}) sequentially for each time step. A linear growth component includes a timevarying slope (or local linear trend) was incorporated in the model to allow the system to adapt to a possible positive or negative growth for each t.
A DLM model is represented by a set of two equations, defined as the observation equation (Eq. 1) and the system equation (Eq. 2).
where V_{t} and W_{t} are referred to as the observational variance and system variance, respectively. The transposed design matrix (F^{′}) had the following structure:
Eq. 2 describes the evolution of ϴ from time t1 to t. The system matrix (G) for a local linear trend model is given as:
In our study, the observational variance was adjusted for the number of submissions in a given week:
where n_{t} was the number of herds tested for PRRS on that week.
Unlike the DLM, the DGLM was based on a binomial distribution. The observation equation for the DGLM was defined as:
For both models, the system variance (w_{t}), which described the evolution of variancecovariance of the system for each time step, was modelled using a discount factor (δ), as previously described by [7].
The parameter vector was updated according to Bayesian principles. This was achieved with modelspecific implementations of the Kalman filter, as described in detail by [7].
In order to account for value of zero in PRRS seroprevalence, the parameter vector would be updated deterministically in accordance with eq. 2, but the stochastic, i.e. Bayesian, aspects of the update [7] would be omitted.
The DLM and DGLM were both programmed and implemented as described by [6]. The R code used for each model can be found as Additional file 1. The same implementation of the DLM and DGLM was validated on simulated data in a previous study [6]. These models have further been validated and used in previous studies [12,13,14,15].
Exponential smoothing method
The Holt linear trend method was also used to model the different timeseries. The method is described by [16] and is an exponentially weighted moving average filter of the level and trend components of a time series. Briefly, the Holt’ linear trend method allows forecasting of data with a trend. The level (l) and the trend (b) are then updated with the following equations:
and the kstep ahead forecast is then:
where l_{t} denotes an estimate of the level of the series at time t, b_{t} denotes an estimate of the trend (or growth) of the series at time t, α is the smoothing parameter for the level (0 < α < 1), β is the smoothing parameter for the trend (0 < β < 1).
The Holt’s linear trend method was used based on the ‘HoltWinters’ function in R.
Models initialization and parameterization
Reference analysis was used to estimate the initial parameters D_{0}~[m_{0}, C_{0}] for the DLM and the DGLM as described by [7]. The DLM and the DGLM models were run on the data between 2007 and 2009 with different values for δ ranging from 0.1 up to 1 by increments of 0.01. The discount factor which minimized the sum of the squared forecast errors was chosen for the analysis. For the DLM and the DGLM, the forecast errors e_{t}, standardized with respect to their variance Q_{t}, such \( {u}_t={e}_t/\sqrt{{\mathrm{Q}}_{\mathrm{t}}} \)
In order to estimate the initial trend to initiate the Holt’s linear trend method, a subset of the seroprevalence for each herd type in 2007 was used. A linear model was used to estimate this initial declining trend where the week of each year was used as predicted variables for PRRSV seroprevalence. The initial level was estimated as the average seroprevalence in 2007 for each herd type. .
The different combinations of the smoothing parameters α and β ranging from 0 up to 1 with increments of 0.00001 where tested in the different subsets with data between 2007 and 2009. The combination of α and β which minimized the sum of the squared forecast errors for the Holt’s linear trend method was used.
Modeling performance
The model fitting was evaluated using the following quantitative measures [7]:
, where e_{t} is forecast errors and N is the total number of observations in data from 1st January 2010 to 31st December 2014.
Additionally, the 95% confidence intervals (CI) for RMSE were calculated, assuming that the prediction errors were normally distributed around a mean of 0 with a constant standard deviation for all forecast errors. Thus, the χ^{2} distribution could be used to estimate the CI as:
, where n is the number of degrees of freedom, equivalent to the number of herds included in the study for each herdtype.
Linear crossdependence
We used the method described by [17] as it provides a measure of the presence (or lack) of local linear crossdependence between PRRS seroprevalence and the filtered values obtained from the three different models for each time step t and level j; the level expresses how many time units the wavelets should be “stretched over” and it can be interpreted as the timeseries lag. The functions ‘mvEWS’ and ‘coherence’ from the R package ‘mvLSW’ (version 1.2.1) [18] were used to estimate and quantify the dynamic linear crossdependence between PRRS seroprevalence and filtered means obtained from the three models with a default wavelet and values ranging from 1 to 30 with increments of 1 were used as Kernel smoothing parameters. The scale of j = 1, which corresponds to 1 week lag, was used to evaluate the crossdependence.
This method requires timeseries with lengths of 2^{J} where J is a positive integer. Data with the latest 256 (2^{8}) observations (i.e. data from 1st February 2010 to 31st December 2014) were used.
Availability of data and materials
The data that support the findings of this study are available from the National Veterinary Institute  Technical University of Denmark and in the Laboratory for Swine Diseases  SEGES Pig Research Centre but restrictions apply to the availability of these data and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the National Veterinary Institute  Technical University of Denmark and in the Laboratory for Swine Diseases  SEGES Pig Research Centre.
Abbreviations
 CI:

Confidence Intervals
 DGLM:

Dynamic Generalized Linear Model
 DLM:

Dynamic Linear Model
 HM:

Holt’s linear trend method
 PRRS:

Porcine Reproductive and Respiratory Syndrome
References
 1.
Carslake D, Grant W, Green LE, Cave J, Greaves J, Keeling M, et al. Endemic cattle diseases: comparative epidemiology and governance. Philos Trans R Soc Lond Ser B Biol Sci. 2011;366:1975–86. https://doi.org/10.1098/rstb.2010.0396.
 2.
Doherr MG, Audigé L. Monitoring and surveillance for rare healthrelated events: a review from the veterinary perspective. Philos Trans R Soc Lond Ser B Biol Sci. 2001;356:1097–106. https://doi.org/10.1098/rstb.2001.0898.
 3.
Burkom HS, Murphy SP, Shmueli G. Automated time series forecasting for biosurveillance. Stat Med. 2007;26:4202–18.
 4.
Lotze T. Preparing biosurveillance data for classic monitoring; 2008.
 5.
Bee Dagum E, Bianconcini S. Time series components. In: Seasonal adjustment methods and real time trendcycle estimation: Springer International Publishing; 2016. p. 29–57. https://doi.org/10.1007/9783319318226_2.
 6.
Lopes Antunes AC, Jensen D, Halasa T, Toft N. A simulation study to evaluate the performance of five statistical monitoring methods when applied to different timeseries components in the context of control programs for endemic diseases. PLoS One. 2017;12:e0173099. https://doi.org/10.1371/journal.pone.0173099.
 7.
West M, Harrison J. Bayesian forecasting and dynamic models. 2nd ed. New York, USA: Springer; 1997.
 8.
Sørensen KJ, Bøtner A, Smedegaard Madsen E, Strandbygaard B, Nielsen J. Evaluation of a blocking Elisa for screening of antibodies against porcine reproductive and respiratory syndrome (PRRS) virus. Vet Microbiol. 1997;56:1–8. https://doi.org/10.1016/S03781135(96)013454.
 9.
Bøtner A, Nielsen J, BilleHansen V. Isolation of porcine reproductive and respiratory syndrome (PRRS) virus in a Danish swine herd and experimental infection of pregnant gilts with the virus. Vet Microbiol. 1994;40:351–60. https://doi.org/10.1016/03781135(94)901228.
 10.
Specific Pathogen Free System. Specific Pathogen Free System. http://spfsus.dk/. Accessed 20 Jan 2016.
 11.
R Core Team. R. A Language and Environment for Statistical Computing. 2017. https://www.rproject.org. Accessed 4 Jan 2017.
 12.
Jensen DB, van der Voort M, Hogeveen H. Dynamic forecasting of individual cow milk yield in automatic milking systems. J Dairy Sci. 2018;101:10428–39.
 13.
Cornou C, Kristensen AR. Use of information from monitoring and decision support systems in pig production: collection, applications and expected benefits. Livest Sci. 2013;157:552–67.
 14.
Ostersen T, Cornou C, Kristensen a R. Detecting oestrus by monitoring sows’ visits to a boar. Comput Electron Agric. 2010;74:51–8. https://doi.org/10.1016/j.compag.2010.06.003.
 15.
Madsen TN, Kristensen AR. A model for monitoring the condition of young pigs by their drinking behaviour. Comput Electron Agric. 2005;48:138–54.
 16.
Hyndman R, Koehler AB, Ord JK, Snyder RD, Hand DJ. Forecasting with exponential smoothing: the state space approach: Springer Science & Business Media; 2008. https://doi.org/10.1111/j.17515823.2009.00085_17.x.
 17.
Park T, Eckley IA, Ombao HC. Estimating timeevolving partial coherence between signals via multivariate locally stationary wavelet processes. IEEE Trans Signal Process. 2014;62:5240–50.
 18.
Taylor S. R Package “mvLSW.” 2017. doi:https://doi.org/10.1109/TSP.2014.2343937>.
Acknowledgments
The authors would like to thank the Pig Research Centre–SEGES for providing part of the data used in this study.
Funding
The work in this study was carried out as part of the agreement between the Technical University of Denmark and the Danish Veterinary and Food Administration regarding contingency, preparedness, and risk analysis regarding animal health.
Author information
Affiliations
Contributions
ACLA contributed to the study design, data management, data analysis, modeling and wrote the manuscript. DJ assisted in designing the study, assisted the data analysis and modeling and reviewed the manuscript. Both authors read and approved the final manuscript.
Corresponding author
Correspondence to Ana Carolina Lopes Antunes.
Ethics declarations
Ethics approval and consent to participate
The study was conducted using surveillance data and did not involve experiments on animals. From an ethical perspective, all of the material collected and used as part of this study was outside the scope of Directive 2010/63. The serum samples used were obtained from blood samples voluntarily collected for monitoring PRRS in Danish swine herds from January 2007 to December 2014. Permission from the National Veterinary Institute  Technical University of Denmark and in the Laboratory for Swine Diseases  SEGES Pig Research Centre was obtained to access the data used in this study.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1:
R code used for a Dynamic Linear Model (DLM) and a Dynamic Generalized Linear Model (DGLM), both with a linear trend component. (TXT 5 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Surveillance
 Time series
 Modeling
 Trends