Data
Data on PRRS diagnostic cases, pig demographics and pig movements were provided by pig producers in one US state. The PRRS data comprised 237 reported PRRS cases and more than 56,731 pig movements occurring between >500 production sites from 2012 to 2015. The state, name and exact number of production sites are omitted here to preserve confidentiality. More information about the data set used regarding farm demographics and pig trade network can be found in Lee et al. [15]. Information on the exact geographical location (x, y coordinates) and number of pigs on the farm were also obtained. PRRS surveillance is primarily conducted in sow farms, nurseries and gilt development units (GDUs), as well as in other types of production sites (e.g., finishers and wean-to-finish farm; WF) Therefore, after an outbreak investigation, we focused only on modeling PRRS cases in sow farms, nurseries and GDU, and the subpopulation of finishers and wean-to-finisher farms that had reported PRRS outbreaks in the last 4 years. In summary a total of 124 farms were considered for the analyses.
Modelling approach
We used a two step approach. First, we used a parameter driven model to identify if there is a spatio-temporal trend of PRRS reported cases from Jan 2012 to July 2014. Then, if a spatio-temporal trend existed, an autoregressive parameter driven model was used to predict the PRRS cases for the last half of 2014 based on different weight matrices. All the analyses were done by using R-INLA software package for R ([19]; www.r-inla.org).
Parameter driven model
We assumed that the general binary observation of PRRS outbreaks in farm i at time t (Y
it
) follows a Binomial distribution of the form:
$$ {Y}_{it}\kern0.1em \sim \kern0.1em \mathrm{B}\mathrm{i}\mathrm{n}\mathrm{o}\mathrm{m}\mathrm{i}\mathrm{a}\mathrm{l}\kern0.1em \left({m}_{it},\kern0.5em {\eta}_{it}\right)\kern0.1em \mathrm{w}\mathrm{i}\mathrm{t}\mathrm{h} \hspace{0.1em} t=1,2,\dots, T; i=1,2,\dots, I $$
(1)
where the number of trials, m
it
, adjusts for possible numbers of tested individuals on farm. In this study we assumed 10% of the animals (i.e. number of trials) in each farm were being tested in a 6-month period. Therefore the probability that the farm is having a outbreak based on the PRRS test result is following binomial distribution. Here T is equal to six corresponding to the number of 6-month time steps from 2012 to 2015, I is equal to 124 (i.e. number of the farms in this study) and η
it
is the probability of having at least one positive animal in the farm, which is specified as recommended by Blangiardo and Cameletti, [3]:
$$ l o g i t\left({\eta}_{i t}\right)={b}_0+{u}_i+{\nu}_i+\left(\kappa +{\delta}_i\right) t $$
(2)
where b
0 is the intercept, which quantifies the average PRRS farm status in the entire study, while u
i
and ν
i
are the area-specific effects. The parameter u
i
is assumed to be structured in space, which takes into account the PRRS status in neighboring farms [18]. We defined neighboring farms as one located within 6 km radius.
Conditional autoregressive [2], was specified as the structure for the u = {u
1
,…, u
n
}. Considering n areas, each characterized by a set of neighbors \( {\mathcal{N}}_i \) and assuming u
i
is the following random variable [3]:
$$ {u}_i\mid \boldsymbol{u}- i\sim Normal\left({\mu}_i+\sum_{j=1}^n{r}_{i j}\left({u}_j-{\mu}_j\right),{s}_i^2\right) $$
(3)
where μ
i
is the mean for the farm i and \( {s}_i^2={\sigma}_u^2/{\mathcal{N}}_i \) is the variance for the same farm, which depends on its number of neighbors (e.g., if an farm has many neighbors then its variance will be smaller). This variance structure recognizes the fact that in the presence of strong spatial correlation, the more neighbors a farm has the more information there is in the data about the value of its random effect. While the variance parameter \( {\sigma}_u^2 \) controls the amount of variation between the spatially structured random effects. The quantity r
ij
indicates the spatial proximity and can be calculated as ϕ ×W
ij
, where W
ij
= \( {a}_{i j}/{\mathcal{N}}_i \), a
ij
is 1 if farms i and j are neighbors and 0 otherwise (note that a
ii
is set to 0, thus W
ii
and r
ii
are 0 as well); and finally, the parameter ϕ controls the properness of the distribution as it was formulated by Cressie [4].
The parameter ν
i
is a spatially unstructured component which follows a Normal distribution of the form\( \mathrm{N}\left(0,{\ \sigma}_{\nu}^2\right) \); where \( {\sigma}_{\nu}^2 \) is the variance of the marginal unstructured component. The main linear trend κ, represents the global time effect. A differential trend δ
i
, which identifies the interaction between time and space, represents the difference between the global trend (κ) and the area specific trend. If δ
i
< 0 then the area specific trend is less steep than the mean trend, whilst δ
i
> 0 implies that the area specific trend is steeper than the mean trend.
Autoregressive parameter driven model
For this step we used a hierarchical Bayesian model similar to the one described by Schrödle et al. [21]. Equation 2 can be re-written using two stages.
$$ \begin{array}{l}\mathrm{Stage}\ 1:\mathrm{logit}\left(\ {\eta}_{i t}\right)={b}_0+{u}_i+{\nu}_i+\left(\kappa +{\delta}_i\right) t+{\zeta}_{i t}\\ {}\mathrm{Stage}\ 2:{\zeta}_{i t}=\lambda .{\zeta}_{i, t-1}+\rho {\sum}_{j\ne i}{\omega}_{j i}.{\zeta}_{j, t-1}+{\epsilon}_{i t}\end{array} $$
(4)
Because PRRS spread follows more an endemic than epidemic pattern in our particular study area u
i
and ν
i
are included in stage one as suggested by [21].
Equation 3 includes an autoregressive process ζ
i
= (ζ
i1, … , ζ
iT
)T for each farm i to model the latent spatial spread of the disease based on the georeferenced location of the farms (e.g. [3]). In the second stage, λ and ρ are the autoregressive parameters. We used N(0, 0.25) distributions as priors for λ and ρ in all the models. The term \( \sum_{j\ne i}{\omega}_{j i}.{\zeta}_{j, t-1} \) is a weighted sum of the past states on other farms j different than the farm of interest (i). Different choices for the weights ω
ij
are possible (e.g., [16]). Here we used five different weights namely: (i) geographical distance weight which contains only the inverse distance (1/i ~ j) between each pair of farms in kilometers, (ii) pig trade (PT
ji
) weight which contains the number of pig shipments between each pair of farms and, (iii) the product between the distance weight and the standardize relative pig trade weight (PT
ji
/number of animals in sender farm), (iv) the product between the standardized distance weight (standardized distance (i ~ j) between each pair of farms in kilometers) and the standardized relative pig trade weight (standardize pig trade (PT
ji
) weight/ number of animals in sender farm), and (v) the product of the distance weight and the pig trade weight. For the geographical distance weight the matrix is symmetric. Other combinations using pig trade weight and distance weight in the denominator were tried with but no convergence was obtained. However, the pig trade weight is not symmetric for each pair of farms, because the number of pig traded from farm j to farm i in our study was different from the number of pig traded from farm i to farm j. The errors ϵ
i
= (ϵ
i1, … , ϵ
iT
)T were assumed to be independent and normally distributed with variance\( {\sigma}_{\epsilon}^2 \).
We calculated the mean of the first (finite) moment of the predictive probability distribution (μP = E(y
iT
| y − T ) as defined by Schrödle et al. [21]. Here, the vector y − T contains all the observations in all regions i up to time T-1.We used μP to categorize the PRRS status of farms into two groups namely negative and positive (i.e., farm predicted as negative or positive by the model). The optimal threshold based on the μP was calculated using the true positive rate (sensitivity) and true negative rate (specificity). The optimal threshold is the threshold that maximizes the distance to the identity line in the ROC curve [24].
Model selection criteria and prediction ability
We used the deviance information criteria (DIC), which can be easily calculated in INLA, as the selection criterion for the best final Bayesian model [22]. Smaller DIC values indicate a better trade-off between complexity and fit.
In order to determine the discriminating power for distinguishing between PRRS negative and positive farms we calculated the area under the Receiver Operating Characteristic (ROC) curve (AUC) based on the calculated the mean of the first (finite) moment of the predictive probability distribution (μP) of each model [8]. The higher the AUC, the better the model performed in predicting the infected and not infected farms in the last half of 2014. We also computed the simulation error and the model prediction error. The simulation error is the number of times in which the PRRS farm status is incorrectly classified based on the calculated μP from 2012- first half of 2014 (5 time steps). The model prediction error is the sum of the false positive farms and the false negative farms based on the calculated μP of the second half of 2014.