Data sources
Laboratory submission data stored in information management systems in the National Veterinary Institute - Technical University of Denmark and in the Laboratory for Swine Diseases - SEGES Pig Research Centre were used to determine the weekly PRRS sero-prevalence in Danish swine herds from January 2007 to December 2014. The laboratory diagnostic test results for PRRS from both laboratories were only available for this period of time. Collections of individual blood samples collected on the same day from different animals from a given herd are processed as individual laboratory submissions. Only submissions with at least 10 individual blood samples tested by serological tests (Blocking Enzyme-Linked Immunosorbent Assay — ELISA ([8]; IDEXX, Ludwigsburg, Germany) and/or Immunoperoxidase monolayer assay — IPMA [9]) were included in the study. Herds were classified as PRRS sero-positive when at least 2 individual blood samples in each submission tested PRRS positive. The country-level between-herd PRRS sero-prevalence was calculated weekly as the proportion of PRRS positive herds for a given time period from the total number of herds tested for PRRS on that week. This will be referred to as “PRRS sero-prevalence” throughout the manuscript.
In Denmark, the Specific Pathogen Free (SPF) system is a voluntary health program with established rules for monitoring several diseases, including PRRS [10]. The designations “Red”, “Blue” and “Green” are used within the SPF system to classify the herds according to its biosecurity status. Within the SPF herds, the “Red” herds are tested every month for PRRS and the “Blue” and “Green” herds are tested at least once a year. The laboratory submission data were merged with the SPF database and, based on the herd id numbers, the herds were classified as Blue, Red and Non SPF herds (i.e. if the herds were not part of the SPF system). Due to the low number of “Green” SPF herds in the SPF system, these herds were included in the analysis as Blue SPF herds. The individual time-series for each herd type were created and modeled separately. The seasonality test was performed based on auto correlation plots (ACF function in R) of individual time-series and by comparing the fitness of the Holt Winters model with and without seasonal components (both “additive” and “multiplicative”). No autocorrelation or seasonality was found in the time series.
The datasets for this manuscript are not publicly available because are owned by the National Veterinary Institute — Technical University of Denmark and by the Pig Research Centre–SEGES.
Modeling
All methods described below were performed in R (version 3.5.0) [11].
Bayesian forecasting methods
Two Bayesian forecast models, a DLM and a DGLM, both with a linear trend, as described by [6] were used to model the different time-series. The aim of these models is to estimate the underlying true value combining the observed data (i.e. PRRS sero-prevalence) with a conditional distribution ϴt given by Dt (ϴt| Dt) sequentially for each time step. A linear growth component includes a time-varying slope (or local linear trend) was incorporated in the model to allow the system to adapt to a possible positive or negative growth for each t.
A DLM model is represented by a set of two equations, defined as the observation equation (Eq. 1) and the system equation (Eq. 2).
$$ {Y}_t={F}^{\hbox{'}}.\kern0.5em {\theta}_t+{v}_t,\kern0.5em {v}_t\sim N\left(0,{V}_t\right) $$
(1)
$$ {\theta}_t=G.\kern0.5em {\theta}_{t-1}+{w}_t,\kern0.5em {w}_t\sim N\left(0,{W}_t\right) $$
(2)
where Vt and Wt are referred to as the observational variance and system variance, respectively. The transposed design matrix (F′) had the following structure:
$$ \kern1.75em {\boldsymbol{F}}^{\prime }=\left[1\kern0.5em 0\right] $$
(3)
Eq. 2 describes the evolution of ϴ from time t-1 to t. The system matrix (G) for a local linear trend model is given as:
$$ \kern1em \boldsymbol{G}=\left[\begin{array}{cc}1& 1\\ {}0& 1\end{array}\right] $$
(4)
In our study, the observational variance was adjusted for the number of submissions in a given week:
$$ {v}_t=\frac{Y_{t-1}\ \left(\ 1-{Y}_{t-1}\right)}{n_t} $$
(5)
where nt was the number of herds tested for PRRS on that week.
Unlike the DLM, the DGLM was based on a binomial distribution. The observation equation for the DGLM was defined as:
$$ {P}_t={F}^{\prime }.\kern0.5em {\theta}_t $$
(6)
For both models, the system variance (wt), which described the evolution of variance-covariance of the system for each time step, was modelled using a discount factor (δ), as previously described by [7].
The parameter vector was updated according to Bayesian principles. This was achieved with model-specific implementations of the Kalman filter, as described in detail by [7].
In order to account for value of zero in PRRS sero-prevalence, the parameter vector would be updated deterministically in accordance with eq. 2, but the stochastic, i.e. Bayesian, aspects of the update [7] would be omitted.
The DLM and DGLM were both programmed and implemented as described by [6]. The R code used for each model can be found as Additional file 1. The same implementation of the DLM and DGLM was validated on simulated data in a previous study [6]. These models have further been validated and used in previous studies [12,13,14,15].
Exponential smoothing method
The Holt linear trend method was also used to model the different time-series. The method is described by [16] and is an exponentially weighted moving average filter of the level and trend components of a time series. Briefly, the Holt’ linear trend method allows forecasting of data with a trend. The level (l) and the trend (b) are then updated with the following equations:
$$ {l}_t=\upalpha\ {Y}_t+\left(1-\upalpha \right)\left({l}_{t-1}+{b}_{t-1}\right) $$
(7)
$$ {b}_t=\upbeta\ \left({l}_t-{l}_{t-1}\right)+\left(1-\upbeta \right){b}_{t-1} $$
(8)
and the k-step ahead forecast is then:
$$ {\hat{Y}}_{t+h\mid \mathrm{t}}={l}_t+{b}_tk $$
(9)
where lt denotes an estimate of the level of the series at time t, bt denotes an estimate of the trend (or growth) of the series at time t, α is the smoothing parameter for the level (0 < α < 1), β is the smoothing parameter for the trend (0 < β < 1).
The Holt’s linear trend method was used based on the ‘HoltWinters’ function in R.
Models initialization and parameterization
Reference analysis was used to estimate the initial parameters D0~[m0, C0] for the DLM and the DGLM as described by [7]. The DLM and the DGLM models were run on the data between 2007 and 2009 with different values for δ ranging from 0.1 up to 1 by increments of 0.01. The discount factor which minimized the sum of the squared forecast errors was chosen for the analysis. For the DLM and the DGLM, the forecast errors et, standardized with respect to their variance Qt, such \( {u}_t={e}_t/\sqrt{{\mathrm{Q}}_{\mathrm{t}}} \)
In order to estimate the initial trend to initiate the Holt’s linear trend method, a subset of the sero-prevalence for each herd type in 2007 was used. A linear model was used to estimate this initial declining trend where the week of each year was used as predicted variables for PRRSV sero-prevalence. The initial level was estimated as the average sero-prevalence in 2007 for each herd type. -.
The different combinations of the smoothing parameters α and β ranging from 0 up to 1 with increments of 0.00001 where tested in the different subsets with data between 2007 and 2009. The combination of α and β which minimized the sum of the squared forecast errors for the Holt’s linear trend method was used.
Modeling performance
The model fitting was evaluated using the following quantitative measures [7]:
$$ Root\ Mean\ Squared\ Error\ (RMSE)=\kern0.5em \sqrt{\kern0.5em \frac{\sum_{t=1}^i{e_t}^2\ }{N}} $$
(10)
, where et is forecast errors and N is the total number of observations in data from 1st January 2010 to 31st December 2014.
Additionally, the 95% confidence intervals (CI) for RMSE were calculated, assuming that the prediction errors were normally distributed around a mean of 0 with a constant standard deviation for all forecast errors. Thus, the χ2 distribution could be used to estimate the CI as:
$$ 95\%{CI}_{RMSE}=\left[\sqrt{\frac{n}{\chi_{1-\frac{5}{2},n}^1}}.\kern0.5em RMSE,\sqrt{\frac{n}{\chi_{\frac{5}{2},n}^1}}.\kern0.5em RMSE\ \right] $$
(11)
, where n is the number of degrees of freedom, equivalent to the number of herds included in the study for each herd-type.
Linear cross-dependence
We used the method described by [17] as it provides a measure of the presence (or lack) of local linear cross-dependence between PRRS sero-prevalence and the filtered values obtained from the three different models for each time step t and level j; the level expresses how many time units the wavelets should be “stretched over” and it can be interpreted as the time-series lag. The functions ‘mvEWS’ and ‘coherence’ from the R package ‘mvLSW’ (version 1.2.1) [18] were used to estimate and quantify the dynamic linear cross-dependence between PRRS sero-prevalence and filtered means obtained from the three models with a default wavelet and values ranging from 1 to 30 with increments of 1 were used as Kernel smoothing parameters. The scale of j = 1, which corresponds to 1 week lag, was used to evaluate the cross-dependence.
This method requires time-series with lengths of 2J where J is a positive integer. Data with the latest 256 (28) observations (i.e. data from 1st February 2010 to 31st December 2014) were used.