Flock and animal selection
Fourteen meat flocks ranging in size from 290 to 1400 adult ewes (median 610) were selected for the study. They all belonged to the same breeders’ association located in the Lot administrative region of France. Inclusion criteria were (i) Causse du Lot purebred closed flocks with no introduction of replacement ewes for at least 4 years, (ii) history of positive serological results and/or of clinical cases of paratuberculosis, and (iii) no history of vaccination against paratuberculosis. Sampling was performed from March 2014 to March 2015, avoiding the month before and after lambing as well as the month after artificial insemination or mating. Although it has been shown that the sensitivity of serological testing may be enhanced in early and late lactation in cattle [15, 21], this sampling scheme was applied to fulfill breeders’ requests to reduce animal stress. Only 2- to 3-year-old ewes were included, using their ear tag as an indicator of their birth cohort. Individual ages at sampling were calculated based on the birth date available from the French Système National d’Information Génétique (SNIG) database. Ewes showing obvious clinical signs of paratuberculosis, if any, were excluded because the target population was sub-clinically infected animals. If no feces could be retrieved from the rectum at the time of sampling, the animal was excluded and the next one fulfilling the inclusion criteria was substituted. Depending on flock size, the target sample size ranged between 60 and 150 ewes per flock.
Sample collection and handling
A handful of feces was sampled from the rectum of selected animals using single-use gloves without lubricant and was placed in an individually identified sterile plastic bag for transportation. In parallel, a five-milliliter blood sample was also collected from the jugular vein in vacuum tubes without anticoagulant (Vacutainer® System). Feces and blood samples were frozen at −20 °C prior to analysis. Animal handling was performed in compliance with the European Commission Directive 2010/63/EU. All farmers gave written consent for their animals to be used in this study.
Laboratory testing
Serological tests
Two commercial ELISA tests were applied to serum samples using an overnight incubation protocol following the manufacturers’ instructions: ELISA A (ID Screen Paratuberculosis Indirect®, batch 602, IDVet, Montpellier, France) and ELISA B (IDEXX paratuberculosis screening® kit, batch 5074, IDEXX, Montpellier, France). Negative and positive controls provided by the manufacturers were included on each ELISA plate, and the manufacturers’ guidelines were strictly followed for interpretation of sample-to-positive (S/P) ratio results: for ELISA A, serum samples with S/P values <60%, between 60 and 70%, and ≥70% were considered negative, doubtful, and positive for MAP antibodies, respectively. For ELISA B, the negative and positive thresholds were 45% and 55%, respectively.
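The interpretation rules above can be written as a small classifier. This is a minimal sketch in Python, not the manufacturers' software: the names `CUTOFFS` and `classify_sp` are hypothetical, and the handling of values falling exactly on an ELISA B threshold (negative below 45%, positive at or above 55%) is an assumption modeled on the ELISA A rule.

```python
# Hypothetical helper: S/P cut-offs (in %) taken from the text.
# Each entry is (upper limit for "negative", lower limit for "positive").
CUTOFFS = {"A": (60.0, 70.0), "B": (45.0, 55.0)}


def classify_sp(sp_percent, kit):
    """Return 'negative', 'doubtful' or 'positive' for an S/P value in %."""
    negative_below, positive_from = CUTOFFS[kit]
    if sp_percent < negative_below:
        return "negative"
    if sp_percent >= positive_from:
        # Boundary handling for ELISA B is an assumption (see lead-in).
        return "positive"
    return "doubtful"
```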
Fecal real-time PCR
First, fecal samples underwent a concentration procedure using the ADIAFILTER system (BioX, Rochefort, Belgium) following the manufacturer’s instructions. Ten grams of feces were rehydrated overnight in 70 mL of bi-distilled sterile water. The top 10 mL of the supernatant were then filtered and centrifuged using the ADIAFILTER® device. Pellets were then resuspended in 500 μL of bi-distilled water and mixed with 300 mg of 150–250 μm silica beads (Silibeads, Sigmund Lindner, Warmensteinach, Germany) for 30 s at 6800 rpm three times in a bead beater (Precellys 24®, Bertin Technologies, Montigny-le-Bretonneux, France). A magnetic bead-based DNA extraction was performed on a Kingfisher Flex® magnetic particle processor (Thermo Fisher Scientific, Courtaboeuf, France) following the NucleoMag 96 tissue protocol (Macherey-Nagel, Hoerdt, France), with addition of an extraction control (ADIAVET™ PARATB REAL TIME, BioX, Rochefort, Belgium) in each plate well. Samples were subjected to qPCR (ADIAVET™ PARATB REAL TIME, BioX, Rochefort, Belgium), following the manufacturer’s instructions. Each sample was also tested for amplification of the internal control. Bi-distilled water and synthetic IS900 DNA provided in the amplification kit were used as negative and positive controls, respectively. Forty-five amplification cycles were performed on a LightCycler 480 (Roche Life Science, Meylan, France), and fluorescent signals were recorded in two channels, with FAM detecting IS900 and VIC detecting the extraction control. Due to the overlapping spectra of the two dyes, a color compensation step was applied. Raw fluorescence data were obtained from the LightCycler 480 and modeled using the qpcR package [27] in R software [28]. Cycle thresholds were determined using the second derivative maximum (CpD2). According to the manufacturer’s recommendations, samples with a cycle threshold (Ct) below 40 were considered positive. A higher threshold (Ct ≤ 42) was also considered, because careful examination of late fluorescence curves indicated that they were associated with low but unambiguously positive results up to 42 Ct, while non-specific amplification could not be ruled out beyond this threshold.
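The two qPCR cut-offs can be applied side by side, as in the following minimal Python sketch. The function name `call_qpcr` is hypothetical, both cut-offs are treated as inclusive (Ct ≤ cut-off), and a missing Ct (`None`) is assumed to mean that no amplification was detected.

```python
def call_qpcr(ct, cutoffs=(40.0, 42.0)):
    """Return {cutoff: True/False} for one sample.

    ct is the cycle threshold, or None when no amplification was detected
    (an assumption about how non-amplifying samples are encoded).
    """
    return {cutoff: (ct is not None and ct <= cutoff) for cutoff in cutoffs}
```

For example, a sample with Ct = 41.2 would be negative at the 40-cycle cut-off and positive at the 42-cycle cut-off.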
All tests were performed blind for other test outcomes.
Target conditions
The purpose of this evaluation was to provide an accurate appraisal of the sensitivity and specificity of two ELISAs and one fecal qPCR for the diagnosis of paratuberculosis in sub-clinically infected 2- to 3-year-old ewes. The target condition for this evaluation was MAP-infected animals that shed enough bacteria in their feces to potentially test positive on fecal PCR at the time of sampling, that mounted an antibody response towards MAP that could be detected by ELISA, or both. Following the Nielsen and Toft (2008) definition [29], this target condition included both infected and infectious animals but probably only a few affected ones, as ewes showing obvious clinical signs of paratuberculosis were excluded on farms. Note that animals passively shedding MAP in their feces [30, 31] as a result of heavy environmental contamination were also included in our target condition.
Statistical analysis
Separate analyses were performed for four scenarios, defined by whether doubtful ELISA results were handled as positive or negative and by the choice of the positive cut-off for fecal qPCR (Ct ≤ 42 or Ct ≤ 40). Based on previous serological results, history of paratuberculosis clinical cases and judgment of practicing veterinarians and technicians supervising the flocks, flocks were grouped into 4 sub-populations according to the within-flock suspected prevalence of infection: very low (3 flocks, 287 sampled ewes), low (5 flocks, 299 sampled ewes), moderate to high (6 flocks, 447 sampled ewes) and very high (2 flocks, 164 sampled ewes).
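The four scenarios and the resulting cross-classified counts can be sketched as follows. This is an illustrative Python sketch only: `SCENARIOS` and `tally` are hypothetical names, the record layout `(ELISA A result, ELISA B result, Ct)` and the test order in the outcome tuple are assumptions, and a missing Ct (`None`) is assumed to mean no amplification.

```python
from collections import Counter
from itertools import product

# 2 ways of handling doubtful ELISA results x 2 qPCR cut-offs = 4 scenarios.
SCENARIOS = list(product(("positive", "negative"), (40.0, 42.0)))


def tally(records, doubtful_as, ct_cutoff):
    """Count the 8 possible (ELISA A, ELISA B, qPCR) outcome combinations.

    records: iterable of (elisa_a, elisa_b, ct), where the ELISA entries are
    'negative', 'doubtful' or 'positive' and ct is a float or None.
    """
    def elisa_positive(result):
        if result == "doubtful":
            return doubtful_as == "positive"
        return result == "positive"

    counts = Counter()
    for elisa_a, elisa_b, ct in records:
        qpcr_positive = ct is not None and ct <= ct_cutoff
        key = (elisa_positive(elisa_a), elisa_positive(elisa_b), qpcr_positive)
        counts[key] += 1
    return counts
```

Running `tally` once per entry of `SCENARIOS` and per sub-population would give the count tables that a multinomial model takes as input.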
Model definition
We applied multiple-population Bayesian latent class models [32, 33] to estimate the diagnostic accuracy of the two ELISAs and the fecal qPCR in the absence of a gold standard.
The models were defined following the approach by Dendukuri and Joseph (2001) [4] that uses a multinomial distribution to model the frequency of the 8 observed combinations of test outcomes. The simplest model assumes conditional independence between tests (i.e., given the true disease state of a sample, the outcome of one test does not have any influence on the probability of a positive or negative outcome in a second test). Under this assumption, the probability of a combination of test outcomes in a given population only depends on the true prevalence within this population and the sensitivities and specificities of diagnostic tests, which are assumed constant across all populations [3]. If Ti+ denotes the event of a positive outcome for test i, i = 1, …, 3, Sei and Spi denote the sensitivity and specificity of test i, respectively, and πj denotes the true prevalence in a given population j, j = 1, …, 4, then the probability of all three tests being positive on a sample in this population is given by
$$ P\left({T}_1^{+},{T}_2^{+},{T}_3^{+}\right)={\pi}_j{Se}_1{Se}_2{Se}_3+\left(1-{\pi}_j\right)\left(1-{Sp}_1\right)\left(1-{Sp}_2\right)\left(1-{Sp}_3\right) $$
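Under conditional independence, every cell of the multinomial follows the same pattern as the all-positive cell. The following minimal Python sketch enumerates all 8 cells for one population; the function name `cell_probs_independent` is hypothetical and the code is an illustration of the formula, not the model code used in the study.

```python
from itertools import product


def cell_probs_independent(prev, se, sp):
    """Probabilities of all outcome combinations under conditional independence.

    prev: true prevalence in the population (pi_j).
    se, sp: sequences of sensitivities and specificities, one per test.
    Returns a dict keyed by tuples of 1 (positive) / 0 (negative).
    """
    probs = {}
    for outcome in product((1, 0), repeat=len(se)):
        p_infected, p_non_infected = prev, 1.0 - prev
        for t, se_i, sp_i in zip(outcome, se, sp):
            p_infected *= se_i if t else 1.0 - se_i
            p_non_infected *= 1.0 - sp_i if t else sp_i
        probs[outcome] = p_infected + p_non_infected
    return probs
```

The 8 probabilities sum to 1 for any valid prevalence, sensitivities and specificities.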
The probabilities of the other combinations of test outcomes can be derived analogously. The assumption of conditional independence between tests may, however, not hold in practice and should be challenged against models allowing for conditional dependence between tests [2]. We considered the approach proposed by Dendukuri and Joseph (2001) [4], where the pairwise dependences between the sensitivities and between the specificities of tests are explicitly modeled by covariance terms (Covse and Covsp). In the fully dependent case, the probability of all three tests being positive on a sample within population j is then given by
$$ P\left({T}_1^{+},{T}_2^{+},{T}_3^{+}\right)={\pi}_j\left({Se}_1{Se}_2{Se}_3+{Covse}_{23}{Se}_1+{Covse}_{13}{Se}_2+{Covse}_{12}{Se}_3+{Covse}_{123}\right)+\left(1-{\pi}_j\right)\left(\left(1-{Sp}_1\right)\left(1-{Sp}_2\right)\left(1-{Sp}_3\right)+{Covsp}_{23}\left(1-{Sp}_1\right)+{Covsp}_{13}\left(1-{Sp}_2\right)+{Covsp}_{12}\left(1-{Sp}_3\right)-{Covsp}_{123}\right) $$
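The all-positive cell generalizes to the other seven cells by flipping the sign of a covariance term once for each test in the term whose outcome is the complement of the event used to define the covariance (a negative result among infected animals, a positive result among non-infected animals). The Python sketch below implements this expansion; the source states only the all-positive cell, so the other cells are derived here by expanding centered indicator variables, and the names `cell_probs_dependent` and `_conditional` are hypothetical.

```python
from itertools import product


def _conditional(a, s, cov):
    """a: per-test marginal probabilities of the observed outcomes.
    s: +1 if the outcome is the 'reference' event of the covariance, else -1.
    cov: dict with keys '12', '13', '23', '123'."""
    return (a[0] * a[1] * a[2]
            + s[1] * s[2] * cov["23"] * a[0]
            + s[0] * s[2] * cov["13"] * a[1]
            + s[0] * s[1] * cov["12"] * a[2]
            + s[0] * s[1] * s[2] * cov["123"])


def cell_probs_dependent(prev, se, sp, covse, covsp):
    """All 8 cell probabilities for three conditionally dependent tests."""
    probs = {}
    for t in product((1, 0), repeat=3):
        # Infected animals: reference event is a positive result.
        a_inf = [se[i] if t[i] else 1.0 - se[i] for i in range(3)]
        s_inf = [1 if t[i] else -1 for i in range(3)]
        # Non-infected animals: reference event is a negative result.
        a_non = [1.0 - sp[i] if t[i] else sp[i] for i in range(3)]
        s_non = [-1 if t[i] else 1 for i in range(3)]
        probs[t] = (prev * _conditional(a_inf, s_inf, covse)
                    + (1.0 - prev) * _conditional(a_non, s_non, covsp))
    return probs
```

For the all-positive cell this reproduces the equation above, including the minus sign in front of the third-order specificity term, and setting every covariance to zero recovers the conditionally independent model.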
Starting from the fully saturated model above, covariance terms were removed one by one following a stepwise backward selection procedure using the Deviance Information Criterion (DIC) as the selection criterion [34]. The DIC evaluates the model fit while penalizing the number of parameters, and it is generally accepted that models with smaller DIC are better supported by the data.
Comparing diagnostic test accuracies
The Bayesian posterior probability of difference (PPD) in sensitivity and specificity between tests was estimated using the Boolean step function in OpenBUGS [12, 16]. If PPD <0.05 or >0.95, we concluded that the sensitivities (or specificities) of two compared tests were significantly different.
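In OpenBUGS, `step(x)` equals 1 when x ≥ 0 and 0 otherwise, so the posterior mean of a node such as `step(Se1 - Se2)` is the proportion of MCMC draws in which the first sensitivity is at least as large as the second. The same quantity can be computed from saved draws, as in this minimal Python sketch; the function names are hypothetical and the draws are assumed to be paired, iteration by iteration.

```python
def posterior_prob_difference(draws_1, draws_2):
    """Proportion of paired draws with parameter 1 >= parameter 2."""
    if len(draws_1) != len(draws_2) or not draws_1:
        raise ValueError("draws must be non-empty and of equal length")
    hits = sum(1 for a, b in zip(draws_1, draws_2) if a - b >= 0)
    return hits / len(draws_1)


def significantly_different(ppd, lower=0.05, upper=0.95):
    """Decision rule from the text: PPD < 0.05 or PPD > 0.95."""
    return ppd < lower or ppd > upper
```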
Serial and parallel testing
The accuracy of serial and parallel testing for the combinations of one ELISA and fecal qPCR was finally evaluated. For two conditionally dependent tests, namely, Test 1 and Test 2, the sensitivity (Seser) and specificity (Spser) of serial testing are given by
$$ {Se}_{ser}={Se}_1{Se}_2+{CovSe}_{12} $$
$$ {Sp}_{ser}=1-\left(\left(1-{Sp}_1\right)\left(1-{Sp}_2\right)+{CovSp}_{12}\right), $$
where CovSe12 and CovSp12 denote the covariance terms for the pairwise dependence of sensitivities and specificities, respectively.
The sensitivity (Separ) and specificity (Sppar) of parallel testing are given by
$$ {Se}_{par}=1-\left(\left(1-{Se}_1\right)\left(1-{Se}_2\right)+{CovSe}_{12}\right) $$
$$ {Sp}_{par}={Sp}_1{Sp}_2+{CovSp}_{12} $$
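The four formulas translate directly into code. The following Python sketch is illustrative only; the function names `serial_accuracy` and `parallel_accuracy` are hypothetical, and the covariance arguments default to zero, which corresponds to conditional independence.

```python
def serial_accuracy(se1, se2, sp1, sp2, cov_se=0.0, cov_sp=0.0):
    """Serial testing: positive only if both tests are positive."""
    se_ser = se1 * se2 + cov_se
    sp_ser = 1.0 - ((1.0 - sp1) * (1.0 - sp2) + cov_sp)
    return se_ser, sp_ser


def parallel_accuracy(se1, se2, sp1, sp2, cov_se=0.0, cov_sp=0.0):
    """Parallel testing: positive if at least one test is positive."""
    se_par = 1.0 - ((1.0 - se1) * (1.0 - se2) + cov_se)
    sp_par = sp1 * sp2 + cov_sp
    return se_par, sp_par
```

A positive sensitivity covariance raises the serial sensitivity and lowers the parallel sensitivity relative to the independent case, which is why ignoring dependence can overstate the benefit of combining tests in parallel.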
Prior distributions
Uniform distributions in the range from 0 to 1 were used as priors for the sensitivity and prevalence parameters. Based on previously published estimates in sheep [16, 35,36,37], the prior mode for the specificity of the ELISAs and the fecal qPCR was set at 0.95, with 95% certainty that the specificity was greater than 0.80. The corresponding Beta distribution, Beta (21.20, 2.06), was generated using the epi.betabuster function of the epiR package in R software [38] and was used as the prior distribution for all specificity parameters.
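The stated properties of the Beta (21.20, 2.06) prior can be checked numerically. The sketch below is a hedged, standard-library Python check rather than a reimplementation of `epi.betabuster`: the function names are hypothetical, the upper-tail probability is approximated with Simpson's rule, and the code assumes both shape parameters exceed 1.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density; valid here for a > 1 and b > 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)


def beta_mode(a, b):
    """Mode of a Beta(a, b) distribution with a > 1 and b > 1."""
    return (a - 1.0) / (a + b - 2.0)


def beta_prob_above(x0, a, b, n=2000):
    """Approximate P(X > x0) with Simpson's rule on [x0, 1]; n must be even."""
    h = (1.0 - x0) / n
    total = beta_pdf(x0, a, b) + beta_pdf(1.0, a, b)
    for k in range(1, n):
        weight = 4.0 if k % 2 else 2.0
        total += weight * beta_pdf(x0 + k * h, a, b)
    return total * h / 3.0
```

With shape parameters 21.20 and 2.06, `beta_mode` returns approximately 0.95 and `beta_prob_above(0.80, 21.20, 2.06)` is close to 0.95, in line with the elicited prior.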
Constraints were defined for covariance terms so that each of the 8 probabilities of combinations of test outcomes was between 0 and 1 [4], and uniform distributions between the lower and upper constraint bounds were used as non-informative priors.
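For a single pair of tests, the admissible range of a covariance term follows from requiring each joint probability of two binary outcomes to lie between 0 and 1. The Python sketch below computes these pairwise bounds only; it does not reproduce the full set of constraints of the three-test model, which also involves the third-order terms, and the function name `pairwise_cov_bounds` is hypothetical.

```python
def pairwise_cov_bounds(p1, p2):
    """Bounds on the covariance of two binary outcomes with means p1 and p2.

    For sensitivities, p1 and p2 are Se1 and Se2; for specificities,
    Sp1 and Sp2. Derived from 0 <= each of the four joint probabilities.
    """
    lower = max(-p1 * p2, -(1.0 - p1) * (1.0 - p2))
    upper = min(p1, p2) - p1 * p2
    return lower, upper
```

A uniform prior between `lower` and `upper` would then play the role of the non-informative prior described above for a two-test model.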
Implementation
Computations were performed with OpenBUGS [39] embedded in R software using the R2OpenBUGS package [40]. Posterior estimates for test sensitivity and specificity were generated using the Markov Chain Monte Carlo (MCMC) sampling method and the Gibbs algorithm. Three simulation chains of 200,000 iterations were run with different starting values, with the first 10,000 iterations discarded as the burn-in period. The chains were then thinned, taking every tenth sample to reduce autocorrelation among the samples. The convergence of the chains following the initial burn-in period was assessed visually by examining the traces, histories, Monte Carlo errors and the Gelman-Rubin diagnostic plots [41, 42]. The posterior distribution of each parameter was summarized using the mean and the 95% posterior credible interval (95% PCI). Analysis and graphing of the MCMC output were conducted using the coda package in R [43].
The aggregated data sets supporting the results of this article and the R2OpenBUGS code used are provided as additional files (Additional files 1 and 2).
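The post-processing of each chain (burn-in removal, thinning, and summary by mean and 95% interval) can be illustrated with a short Python sketch. It is not the R2OpenBUGS code provided in the additional files: the function names are hypothetical, and the interval is an equal-tailed interval read from the sorted draws, which is an assumption about how the 95% PCI was computed.

```python
def retained_draws(chain, burn_in=10_000, thin=10):
    """Drop the burn-in iterations, then keep every `thin`-th draw."""
    return chain[burn_in::thin]


def summarize(draws):
    """Posterior mean and equal-tailed 95% interval (simple rank-based)."""
    ordered = sorted(draws)
    n = len(ordered)
    mean = sum(ordered) / n
    lower = ordered[int(0.025 * (n - 1))]
    upper = ordered[int(0.975 * (n - 1))]
    return mean, lower, upper
```

With 200,000 iterations per chain, a 10,000-iteration burn-in and thinning by 10, each chain contributes 19,000 draws, or 57,000 draws across the three chains.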
Sensitivity analysis and model assumption checking
To assess the influence of prior information on the estimates of model parameters, poorly informative uniform distributions in the range of 0.5 to 1 were also considered for specificities. These truncated distributions were chosen to avoid convergence issues of single MCMC chains due to label switching [44].
To verify the assumption of constant test accuracy across all populations, we first excluded each of the 4 populations and subsequently each of the 14 flocks, one at a time, and re-ran all investigated models.