Study herds, sampling, bacteriology and ELISA test
The data used here have been described previously by Kranker et al. [10] and originate from three Danish pig herds known to be infected with Salmonella Typhimurium. The herds had moderate to high levels of Salmonella Typhimurium, with a within-herd prevalence of 40% or higher. These prevalence estimates were based on meat-juice samples collected over three months, evaluated using an optical density (OD) cutoff of 20%. Two of the farms, with 650 and 440 sows, respectively, were two-site operations; the remaining farm was a three-site operation with 300 sows. The three herds were self-supplying. In each herd, 10 litters were randomly selected, and in each litter, the ears of six randomly selected piglets were tagged. To account for variation in Salmonella shedding over time, the litters from each herd were divided into two groups of five litters that were raised approximately one month apart. Thus, on each farm there were two cohorts of 30 pigs each, yielding a total of 180 piglets at the start of the study. All ear-tagged pigs from a given cohort were raised together for the entire observation period. The animals were followed longitudinally [10]: they were first tested at the age of four weeks and thereafter at two- to five-week intervals until slaughter. Slaughter age varied between cohorts but averaged around 25 weeks, and each cohort was tested six to seven times in total. On each testing occasion, sera and faeces were collected and tested for the presence of Salmonella spp. At the age of four weeks only faeces were collected, because maternal antibodies still present in the serum could give a false-positive result. An animal was considered serologically positive whenever the serological test gave an OD% > 20, and bacteriologically positive if Salmonella was isolated from the faeces. At this cutoff, the serological test is considered to have a sensitivity of 68% and a specificity of 100% [11].
The bacteriological test is considered 100% specific, with a sensitivity of around 30 to 55% [12]. These test characteristics were incorporated into the statistical model.
Infection status of the pigs
The testing interval differed between cohorts, varying from two to five weeks. A homogeneous dataset was derived by inferring the infection status of each pig every two weeks. A two-week time step was chosen because, on average, it takes two weeks for an animal to test seropositive after being infected. It was therefore assumed that an animal was infectious in the two weeks before becoming seropositive. The most likely infection status of each pig was determined for each two-week time step based on both the faecal shedding and the serology results from the sampling period closest to that time step. Each animal was categorised as susceptible (S), infectious (I) or carrier (R). An animal was considered susceptible if the agent was not present and the animal was at risk of infection. An animal was considered infectious if it was shedding the agent and was a potential source of infection for other animals. An animal that was infected with the agent but not shedding, and therefore not able to infect other animals, was considered a carrier. Given the low sensitivity of the bacteriological culture method, serology offered an alternative and complementary way to assign the infection status of a pig, and both methods were used in parallel to categorise the pigs' status.
Pigs were assigned status S when no bacteria were present in the faecal samples and the OD% was below 20. Status I was assigned from the date when a pig was found to be bacteriologically positive until it was no longer positive by this testing method. Additionally, pigs were assigned status I based on seroconversion: the beginning of the infectious period was set to two weeks prior to the recorded date of seroconversion [13, 14] and its duration was set to four weeks, assuming that a pig would shed Salmonella spp. for an average of four weeks. This average period was based on experimental data on the duration of the shedding period [13, 15]. Thus, information from both tests was used in parallel to classify the pigs. Finally, status I was followed by status R, and pigs could return to status I if they were found to be culture positive later during the study period. It was assumed that no pig would return to the susceptible status after being infected, because of the relatively short life span of finisher pigs (after infection it takes around 112 days to clear the agent from the organs [14], and post-weaned pigs are generally slaughtered before this time). As a particular example of how the classification was performed: if a pig was shedding at a specific testing time, it was considered infectious from the nearest biweekly time step until it became bacteriologically negative, after which it was considered a carrier (from the nearest biweekly time step). If a pig was serologically positive in the presence of a negative culture, it was considered infected and therefore classified as infectious for at least four weeks, from the nearest biweekly time step prior to the testing time. If an animal was bacteriologically and serologically negative, it was considered susceptible.
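As an illustration only, the per-step classification rules above can be sketched as a small function. The function and variable names are our own, not the study's code, and the four-week minimum infectious window after seroconversion is assumed to be enforced outside this per-step rule:

```python
# Illustrative sketch of the S/I/R assignment rules described above.
# Names are hypothetical; the four-week infectious window following
# seroconversion is handled outside this per-step rule for simplicity.

def classify_status(culture_pos, sero_pos, prev_status):
    """Assign S, I or R for one biweekly time step from parallel tests."""
    if culture_pos:
        return "I"                # shedding: infectious
    if sero_pos and prev_status == "S":
        return "I"                # seroconversion: infectious
    if prev_status == "I":
        return "R"                # stopped shedding: carrier
    return prev_status            # no return to S after infection
```

Note that a previously infectious pig that tests culture positive again returns to status I, while a doubly negative pig that was never infected stays susceptible, matching the rules described in the text.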
Given that testing of piglets was restricted to bacteriology (which has low sensitivity) at the beginning of the follow-up period, some piglets infected by the sow could have been erroneously classified as susceptible. For this reason, the analysis in each cohort started at the time infected animals were first detected (by either serology or bacteriology).
Estimation of the transmission parameters
Conventionally, transmission parameters of infectious diseases, including Salmonella spp., in swine herds [14, 16–21] are estimated using regression models, often based on data describing the prevalence of the disease in the country or region to which the particular study refers. As suggested in some studies [5, 22, 23], we first applied stochastic SIR models in the form of Generalised Linear Models (GLMs) to estimate the three transmission parameters. However, preliminary results (not reported here) suggested the presence of overdispersion in the GLMs, pointing to unobserved sources of variation in the data such as cohort heterogeneity. Instead, we here report a framework for stochastic SIR models which i) extends the current GLM framework by including random effects, ii) is implemented in a Bayesian approach, thus allowing the incorporation of prior information (such as the sensitivity of Salmonella tests), iii) explicitly estimates the probability of not detecting infectious animals due to imperfect test sensitivity, and iv) incorporates all sources of uncertainty/variation, thus obtaining more realistic estimates of the transmission parameters. As suggested by some authors [5], the inclusion of random effects automatically accounts for overdispersion by inflating the variance of the response variable, while at the same time allowing for cohort heterogeneity.
Stochastic SIR models (and other variants such as SI or SIS models) are well-established in the animal disease literature [4, 19–21]. The benefit of using a stochastic SIR model is that transmission parameters can be estimated using statistical modelling; here, the conventional stochastic SIR model was extended by explicitly allowing for cohort variation and unobserved temporal effects. The three components of the stochastic SIR model are described in detail below.
Transition from susceptible to infectious
It was assumed that pigs become infected through “infectious contacts”, defined as either contact with other infected animals or contact with their environment (rodents, contaminated muck or feed). The rate at which a given animal has infectious contacts was assumed i) to be constant in time and ii) proportional to the density of infectious animals [19], with constant of proportionality β, i.e. the transmission rate parameter. In other words, infectious contacts per animal happen randomly in time, so that their occurrence can be described by a Poisson process. More precisely, the number of infectious contacts per animal in a period Δt is Poisson distributed with mean λ = β(I/N)Δt, where I is the number of infectious animals and N is the total number of animals at the beginning of Δt. As such, the probability of no infectious contacts per animal in Δt is exp(−β(I/N)Δt), implying that the probability of infection in Δt is p = 1 − exp(−β(I/N)Δt). This means that the number of new cases C at the end of Δt is Binomial with parameters S and p, so that the mean of C is S·p.
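A minimal simulation of this transition, assuming Δt = 1 and illustrative names (not the study's code), might look like:

```python
import math
import random

def new_cases(S, I, N, beta, dt=1.0, rng=random):
    """Draw C ~ Binomial(S, p) new infections in one time step,
    where p = 1 - exp(-beta * (I / N) * dt)."""
    p = 1.0 - math.exp(-beta * (I / N) * dt)
    # Binomial draw as a sum of S independent Bernoulli(p) trials
    return sum(1 for _ in range(S) if rng.random() < p)
```

With I = 0 this basic formulation gives p = 0; the model described later keeps a baseline rate β for that case, reflecting environmental sources of infection.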
Here, the established methodology was extended to allow for the fact that i) λ may vary in time due to exogenous factors and ii) λ may vary across cohorts due to unobserved cohort effects. A random (scaling) effect exp(r_{1jt}) was included for the j^{th} cohort at time t, to get λ_{jt} = β(I/N)exp(r_{1jt})Δt as the mean number of infectious contacts of a random animal in cohort j at time t. Note that Δt denotes the length of a time interval whereas t refers to actual time. On average, exp(r_{1jt}) was assumed to equal one, so that across all cohorts and times the average transmission rate parameter is still β. In this way, variation due to cohort or unknown temporal effects was explicitly modelled, which would otherwise contribute to the uncertainty in estimating β.
All time intervals in the data are equal to two weeks so for clarity, Δt = 1 was set so that one time step Δt corresponds to two weeks. This does not qualitatively affect the estimation of the transmission parameters. Because of the nature of the data, time t is now defined in discrete consecutive (biweekly) time steps.
The model may be formulated as follows:
\begin{array}{l}{\mathrm{C}}_{\mathrm{jt}}\sim \mathrm{Binomial}\left({\mathrm{S}}_{\mathrm{j},\mathrm{t}-1},{\mathrm{p}}_{\mathrm{jt}}\right)\\ {\mathrm{p}}_{\mathrm{jt}}=1-\exp\left\{-\beta\left({\mathrm{I}}_{\mathrm{j},\mathrm{t}-1}/{\mathrm{N}}_{\mathrm{j},\mathrm{t}-1}\right)\exp\left({\mathrm{r}}_{1\mathrm{jt}}\right)\right\}\\ \mathrm{cloglog}\left({\mathrm{p}}_{\mathrm{jt}}\right)=\log\left(\beta\right)+\log\left({\mathrm{I}}_{\mathrm{j},\mathrm{t}-1}\right)-\log\left({\mathrm{N}}_{\mathrm{j},\mathrm{t}-1}\right)+{\mathrm{r}}_{1\mathrm{jt}}\end{array}
(1)
where:

C_{jt} denotes the number of new infectious animals in cohort (j) at the end of time step (t),

S_{j,t−1} is the number of susceptible animals in cohort (j) at the end of time step (t−1),

p_{jt} is the probability of a susceptible animal in cohort (j) at the end of time step (t−1) becoming infectious by the end of time step (t),

cloglog is the complementary log-log transformation,

β is the transmission rate parameter for the transition from susceptible to infectious,

I_{j,t−1} is the number of infectious animals in cohort (j) at the end of time step (t−1),

N_{j,t−1} is the total number of animals in cohort (j) at the end of time step (t−1), and

r_{1jt} is a cohort time-dependent random effect (which is zero on average).
Note that, at the beginning of the study, pigs were considered to be in either the S or the I status, depending on the test results. When there was no infectious pig present at the end of the previous time step, i.e. I_{j,t−1} = 0, the probability of becoming infectious was modelled as cloglog(p_{jt}) = log(β) + r_{1jt}. This is because, even if there are no infectious pigs present, animals can still be infected (e.g. from a contaminated environment, feed, water, etc.). In this formulation, β is seen as the underlying rate of transition for a random pig in an average cohort with no infectious animals, while r_{1jt} allows for unobserved cohort-time effects in the data, e.g. anthropogenic influence, rodents, etc. Note that homogeneous mixing of the pigs in each cohort (i.e. all pigs could come into contact with each other) was assumed, due to the small size of the cohorts.
In using the number of infectious pigs I_{jt} in each cohort at the end of time step t, it was necessary to account for the sensitivity of both the serological and the bacteriological test. Since the specificity of both tests is considered to be 100%, the specificity of the two tests in parallel is also 1. This implies that I_{jt} = Iobs_{jt} + Inob_{jt}, where Iobs_{jt} is the observed value and Inob_{jt} is the number of infectious animals not detected (false-negative pigs). In other words, Iobs_{jt} is a lower bound on the actual I_{jt}. The unobserved variable Inob_{jt} may be incorporated (and thus estimated) in the stochastic model; here it was assumed to have a Binomial distribution with parameters N_{jt} and pND, where pND is the probability of not detecting infectious animals. This probability, pND, naturally depends on the sensitivities of the two tests, which were assumed to be independent. Inob_{jt} was modelled as follows:
\begin{array}{l}{\mathrm{I}}_{\mathrm{jt}}={\mathrm{Iobs}}_{\mathrm{jt}}+{\mathrm{Inob}}_{\mathrm{jt}}\\ {\mathrm{Inob}}_{\mathrm{jt}}\sim \mathrm{Binomial}\left({\mathrm{N}}_{\mathrm{jt}},\mathrm{pND}\right)\\ \mathrm{pND}=\left(1-\mathrm{SenC}\right)\ast \left(1-\mathrm{SenE}\right)\end{array}
(2)
where:

SenC is the sensitivity of the microbiological culture, and

SenE is the sensitivity of the ELISA test.
Treating Inob_{jt} as an unobserved random variable allows formal quantification of the uncertainty in the data due to test sensitivity and constitutes one of the novelties of the proposed model. The Bayesian framework used to estimate the stochastic SIR model (see the Model implementation section below) can easily incorporate the estimation of Inob_{jt} given prior information on SenC and SenE.
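The parallel-testing calculation in Eq. (2) can be checked numerically; a sketch with illustrative names:

```python
def p_not_detected(sen_culture, sen_elisa):
    """pND = (1 - SenC) * (1 - SenE): probability that an infectious
    animal is missed by both tests applied in parallel, assuming the
    two tests are independent."""
    return (1.0 - sen_culture) * (1.0 - sen_elisa)
```

At the prior means used later (SenC = 0.49, SenE = 0.68), pND ≈ 0.16, i.e. roughly one in six infectious pigs goes undetected at a given sampling occasion.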
Transition from infectious (I) to resistant (R)
The rate α at which a random infectious animal in a given cohort becomes a carrier was assumed to be constant in time. As such, the length of time τ until an infectious animal becomes a carrier can be modelled by an exponential distribution with rate parameter α. So, given that the animal is infectious at the start of time interval Δt, the probability pR of becoming a carrier is pR = Pr(τ ≤ Δt) = 1 − exp(−αΔt), since τ is exponentially distributed (recall that Δt = 1 was set for conciseness). As before, a random cohort effect r_{2j} was added to allow for cohort heterogeneity in the data, giving pR_{j} = 1 − exp(−α exp(r_{2j})). The number of new carrier animals Rnew_{jt} at the end of time step t is thus Binomial with parameters I_{jt} and pR_{j}. Note that a single parameter α was used, describing the rate at which a random infectious animal in an average cohort becomes a carrier; cohort variability (not all cohorts are average) was allowed for through r_{2j}, which in turn reduces the uncertainty in estimating α. The I to R transition was modelled as follows:
\begin{array}{l}{\mathrm{Rnew}}_{\mathrm{jt}}\sim \mathrm{Binomial}\left({\mathrm{I}}_{\mathrm{jt}},{\mathrm{pR}}_{\mathrm{j}}\right)\\ \mathrm{cloglog}\left({\mathrm{pR}}_{\mathrm{j}}\right)=\log\left(\alpha\right)+{\mathrm{r}}_{2\mathrm{j}}\end{array}
(3)
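Under the exponential waiting-time assumption, pR_{j} follows directly from α and the cohort effect; a sketch with illustrative names (Δt = 1):

```python
import math

def carrier_prob(alpha, r2j=0.0, dt=1.0):
    """pR_j = 1 - exp(-alpha * exp(r2j) * dt): probability that an
    infectious animal becomes a carrier within one time step."""
    return 1.0 - math.exp(-alpha * math.exp(r2j) * dt)
```

For example, α = log 2 gives pR = 0.5 in an average cohort (r_{2j} = 0), i.e. half of the infectious animals would become carriers within one biweekly step.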
Transition from resistant to infectious
For this compartment of the model, the rate of infectious contacts ν for a random carrier animal was assumed to be constant in time, where ν is the transmission rate parameter for the transition from carrier to infectious. By arguments similar to those for the S to I compartment, the number of infectious contacts per animal in a time period Δt is Poisson distributed with mean νΔt. Since this transition was a rare event (happening only three times in the entire study), the Poisson distribution can be used, as it approximates the Binomial when the probability parameter is close to zero. So if in cohort j there are R_{j,t−1} carrier animals at the end of the previous time step, the number of transitions from R to I in time step t may be modelled as a Poisson variable with mean μ_{jt} = νR_{j,t−1}exp(r_{3j}). More explicitly:
\begin{array}{l}{\mathrm{Inew}}_{\mathrm{jt}}\sim \mathrm{Poisson}\left({\mu}_{\mathrm{jt}}\right)\\ \log\left({\mu}_{\mathrm{jt}}\right)=\log\left(\nu\right)+\log\left({\mathrm{R}}_{\mathrm{j},\mathrm{t}-1}\right)+{\mathrm{r}}_{3\mathrm{j}}\end{array}
(4)
where:

Inew_{jt} denotes the number of new infectious animals (resulting from this transition) in cohort (j) at the end of time step (t),

μ_{jt} is the mean number of carrier animals that become infectious in cohort (j) during time step (t),

ν is the transmission rate parameter for the transition from the carrier to the infectious state,

R_{j,t−1} is the number of carrier animals at the end of time step (t−1) in cohort (j), and

r_{3j} is a cohort random effect that allows for cohort heterogeneity.
Note that R_{j,t−1} = 0 is possible, in which case log(R_{j,t−1}) = 0 was set. The argument for doing this is that the transmission rate parameter ν may be defined as the limit of μ_{jt}/R_{j,t−1} as R_{j,t−1} goes to zero. As such, ignoring the random effect for a moment, μ_{jt}/R_{j,t−1} should tend to a constant (i.e. ν) as R_{j,t−1} goes to zero, rather than to infinity. Note that in our data, R_{j,t−1} = 0 occurred on 20% of occasions. In the hypothetical case that R_{j,t−1} = 0 for the majority of time steps and cohorts, this component of the model (i.e. the transition from R to I) would become redundant, as there would ultimately be almost no information with which to estimate the transition parameter.
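In practice, the log(R) = 0 convention amounts to replacing R by 1 in the Poisson mean when R = 0; a sketch with illustrative names:

```python
import math

def r_to_i_mean(nu, R_prev, r3j=0.0):
    """mu_jt = nu * R_{j,t-1} * exp(r3j), with log(R) set to 0 when
    R = 0, so the mean falls back to nu * exp(r3j)."""
    log_R = math.log(R_prev) if R_prev > 0 else 0.0
    return math.exp(math.log(nu) + log_R + r3j)
```

This keeps the mean finite (it tends to ν rather than blowing up) when no carriers are present, consistent with the limiting argument above.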
Cohort random effects
As indicated above, random cohort effects were incorporated into each transition step to allow for i) cohort heterogeneity/variability in the data, ii) unobserved cohort-specific factors, and iii) unobserved temporal effects in the S to I compartment. These effects were different for each transition, under the assumption that any unobserved cohort factors affect each transition in a different way. For the transitions S to I and R to I, these random effects also allow for factors which affect disease spread but which do not depend on the animals themselves (such as a contaminated environment, feed, water, etc.).
For the transition S to I, the cohort random effects were assumed to be time-varying and autocorrelated, and were modelled as:
\begin{array}{l}{\mathrm{r}}_{1\mathrm{j},\mathrm{t}=1}\sim \mathrm{Normal}\left(0,{\sigma}_{1}^{2}\right)\\ {\mathrm{r}}_{1\mathrm{j},\mathrm{t}}\sim \mathrm{Normal}\left({\mathrm{r}}_{1\mathrm{j},\mathrm{t}-1},{\sigma}_{1}^{2}\right)\end{array}
(5)
where the cohort random effect (r_{1jt}) for time step t depends on the previous cohort random effect at time step (t−1). This cohort time-dependent random effect captures any unobserved dynamic behaviour in the spread of infection within cohorts, such as that due to infected mice.
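Eq. (5) defines a Gaussian random walk over time steps; simulating one cohort's trajectory can be sketched as follows (illustrative names, not the study's code):

```python
import random

def simulate_r1(T, sigma, rng=random):
    """Simulate r_{1j,t} for one cohort over T time steps:
    r_1 ~ N(0, sigma^2), then r_t ~ N(r_{t-1}, sigma^2)."""
    r = [rng.gauss(0.0, sigma)]
    for _ in range(T - 1):
        r.append(rng.gauss(r[-1], sigma))
    return r
```

Because each value is centred on the previous one, successive effects are autocorrelated, letting the transmission rate drift smoothly within a cohort over time.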
For the transition I to R and R to I, the random effects were modelled as:
{\mathrm{r}}_{\mathrm{kj}}\sim \mathrm{Normal}\left(0,{\sigma}_{\mathrm{k}}^{2}\right),\quad \mathrm{k}=2,3
(6)
In a preliminary model-building stage, a cohort time-dependent random effect, r_{2jt}, was considered for the transition I to R; however, the results showed no improvement in model fit. Note that cohort time-dependent random effects were not considered for the Poisson model of the transition R to I: this transition occurred only three times in the entire study, and it would be unreasonable to try to estimate unobserved temporal effects from so few events.
Model implementation
The overall SIR model described above was implemented in a Bayesian framework and fitted using Markov chain Monte Carlo (MCMC). In this framework, parameters are treated as random variables whose “prior” distribution expresses our uncertainty about their values before any data are observed. Prior distributions (priors) are combined with the observed data through Bayes' theorem to produce the posterior distribution of each parameter (posterior). The posteriors express the uncertainty about the model parameters after the data are observed, and all statistical inference is based solely on them. MCMC is a numerical technique which produces samples of values that eventually converge (after a certain “burn-in” number of iterations) to samples from the posterior distribution of each parameter.
There was no historical information with which to inform the prior distributions of log (β), log (α) and log (ν), so Normal distributions with zero mean and a variance of 100 were used, reflecting prior ignorance while avoiding the use of improper prior distributions [24]. For the sensitivity probabilities of both serological and bacteriological tests, a Beta distribution was used as a prior. Previous information about the sensitivity of both tests [11, 12] was used to inform those Beta distributions: a mean of 0.49 for faecal culture and a mean of 0.68 for Danish mix ELISA were assumed, so SenC ~ Beta (48.5, 50.5) and SenE ~ Beta (58.5, 27.5) were specified. These priors have means 0.49 and 0.68 respectively, and variances that match the range of possible values dictated by the findings of [11, 12]. Specificity was assumed to be 100% in both tests. The precision (i.e. the inverse of the variance) of the Normal distribution for each random effect was given a Gamma (0.5, 0.005) prior distribution (large mean and very large variance to indicate prior ignorance).
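The stated Beta priors can be verified against their intended means (the mean of a Beta(a, b) distribution is a/(a + b)); a quick check, not from the paper's code, with illustrative helper names:

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: a / (a + b)."""
    return a / (a + b)

def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))
```

Beta(48.5, 50.5) has mean ≈ 0.49 and Beta(58.5, 27.5) has mean ≈ 0.68, matching the sensitivities of faecal culture and the Danish mix ELISA reported in [11, 12].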
The complete SIR model was implemented in the open-source statistical software WinBUGS [25]. 100,000 posterior samples were collected after a 5,000-sample burn-in to ensure convergence to the posterior distribution [26]. Two MCMC runs were performed, with different initial values, to assess convergence and mixing. The samples were thinned by collecting only one in 10 consecutive samples to reduce autocorrelation in the posterior samples (the R package “coda” [27] was used), so that in total 20,000 samples were retained. Convergence was assessed by inspection of trace plots, and also more formally using the Raftery and Lewis diagnostic and the Gelman-Rubin R-hat diagnostic, which should be sufficiently close to one if convergence has been achieved [28, 29]. Mixing of the chains was assessed by comparing the Monte Carlo (MC) error with the standard deviation for each parameter; ideally, the MC error should be less than 5% of the standard deviation [30] for good mixing.
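The thinning arithmetic can be sketched as follows (illustrative only; the actual analysis used WinBUGS and the coda package): each chain of 100,000 draws thinned to one in 10 leaves 10,000, so the two chains together give the 20,000 retained samples.

```python
def thin(samples, k=10):
    """Keep every k-th consecutive draw to reduce autocorrelation."""
    return samples[::k]
```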
Posterior predictive simulation was used for model checking, as described by Gilks et al. [24]. This technique effectively tests whether the observed data are extreme relative to the posterior predictive distribution of the observations (i.e. the fitted model). The deviance was adopted as the measure for comparison. The technique involves the calculation of a “p-value”, which should not be extreme (close to 0 or 1) if the model fits well.
Calculations of the basic reproduction ratio (R_{0})
Samples from the posterior distribution of R_{0} were calculated from those of β and α using the following formula [5]:
{\mathrm{R}}_{0}={\beta}/{\alpha}
(7)
where β is the transition rate from S to I, and α is the transition rate from I to R.
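Given paired posterior draws of β and α, Eq. (7) is applied draw by draw to obtain the posterior of R_{0}; an illustrative sketch:

```python
def r0_samples(beta_draws, alpha_draws):
    """Posterior samples of R0 = beta / alpha, computed element-wise
    from paired MCMC draws of the two parameters."""
    return [b / a for b, a in zip(beta_draws, alpha_draws)]
```

Because the division is applied per draw, the resulting samples carry the joint posterior uncertainty of β and α into R_{0}.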