To reach this goal, the following procedure was followed (Fig. 1). In the first step, a system of ODEs was proposed to represent the theoretical interaction between inflammatory cells and bacteria during mastitis (model [1]) and to obtain their “model predictions”. In the second step, these predictions (= simulated dataset 1) and SCS collected at specific points in time in cows clinically infected (= observed dataset 2) were smoothed with cubic spline models and variable knots (= models [2]). In step 3, time derivatives of models [2] were computed (= models [3]). Finally, in step 4, rates of recruitment were estimated via a linear regression (= models [4a, 4b]) on these time derivatives. All computations were done on SAS 9.1. To achieve normality of distribution, concentrations in all models were expressed under the log scale as it is done routinely.

### Mathematical simulation of an inflammatory response during mastitis

The model [1] obeyed key biological characteristics observed within an infected udder:

$$ \begin{array}{l}\mathrm{dx}/{\mathrm{dt}}_{\mathrm{i}}\kern0.5em =\upgamma\;\mathrm{x}\hbox{-} \upomega \kern0.5em \mathrm{x}\kern0.5em \mathrm{y}\\ {}\mathrm{dy}/{\mathrm{dt}}_{\mathrm{i}}\kern0.5em =\kern0.5em \upupsilon \kern0.5em \left({\mathrm{y}}_0\kern0.5em \hbox{-} \kern0.5em \mathrm{y}\right)\kern0.5em +\upbeta \kern0.5em \mathrm{x}\mathrm{y}\end{array} $$

(1)

where x (t_{i}) symbolizes the bacterial and y (t_{i}) the somatic cell concentrations at a particular point in time (t_{i} = 0, 1, 2, …, 100 time-units; *i* = 1, 2, 3, …, 101). At start, the concentrations are ×_{0} and y_{0}. Across time, the bacterial concentration is controlled by the multiplication rate (γ) and the rate at which each inflammatory cell kills bacteria (ω). This last parameter is a first indicator of the level of resistance of the cow. If the animal is insufficiently resistant, ω is low and its inflammatory cells cannot successfully kill bacteria. This is observed in animals with inherited disorders of phagocytic cells [14] and in cows during the peri-parturient period [15]. During health, the concentration of inflammatory cells is controlled by the first term of the second equation where υ is the rate at which cells are recruited and removed in the absence of infection. So, even in the absence of bacteria, there is a standing stock of cells ready to attack. During mastitis, an extra-concentration of activated cells is recruited at a rate β. This is a second indicator of the level of resistance of the host and the indicator of interest in this study. Indeed, an infected host is not very resistant when β is low because activated somatic cells cannot migrate towards the site of infection. This is observed in diseases such as the bovine leukocyte adhesion deficiency syndrome [16].

Initial values were set at ×_{0} = 2 and y_{0} = 10, and a range of values, from 0.001 to 0.50, were set for the different rates in model [1]. Because it is deterministic, the model doesn’t take into account the discreteness of the populations and their random fluctuations so extinction is not possible. Indeed, as the number of cells decreased, the assumption of continuous cell populations is no longer valid and oscillations may occur [17]. As a solution, a threshold of 0.01 was introduced below which the bacterial population is considered extinct, alike in [10].

The system accepts two equilibria, one in the absence and one in the presence of infection. The linear stability of these equilibria was determined by evaluating the characteristic equation (s) of the Jacobian. The equilibrium is stable when all eigenvalues of its Jacobian matrix have negative real parts; it is unstable if at least one eigenvalue has positive real part [18].

### Simulated and observed datasets

Two datasets were used for the second step of the procedure. The simulated dataset (dataset 1) consisted of the values of the concentrations of bacteria (x (t_{i})) and inflammatory cells (y (t_{i})) obtained with [1]. Simulation steps were executed for a period of 100 time-units or until the infection dies out.

The second dataset (dataset 2) came from a survey of 31 commercial dairy farms conducted between January 2008 and December 2011 in the Walloon region of Belgium (Additional file 1)[19]. Herds were enrolled in the regional dairy herds recording system from which test-day somatic cell scores or SCS (i.e., the log-transform of SCC; denoted s (t_{i})) were obtained. Farmers recorded 756 mastitis cases and 1947 SCS values were used in the analyses. Other information included year of calving, parity, days in milk, and number of days between successive events. Clinical mastitis was diagnosed by the breeder when milk from one or more glands was abnormal in color, viscosity, or consistency, with or without accompanying heat, pain, or redness. Only the first mastitis case per parity was considered. Data were collected from 50 days before up to 50 days after the date of mastitis detection. Lactation must include at least two months of lactation. No information was available on bacterial concentrations.

### Cubic Spline models

For the second step of the analysis, records from both datasets were fitted with cubic splines and variable knots:

$$ \mathrm{U}\left({\mathrm{t}}_{\mathrm{i}}\right)\kern0.5em =\kern0.5em {\mathrm{g}}_0+{\mathrm{g}}_1\;{\mathrm{t}}_{\mathrm{i}}\kern0.5em +{\mathrm{g}}_2\;{{\mathrm{t}}_{\mathrm{i}}}^2\kern0.5em +\kern0.5em {\mathrm{g}}_3\;{{\mathrm{t}}_{\mathrm{i}}}^3+{\Sigma}_{\mathrm{i}}\;{\mathrm{h}}_{\mathrm{i}}\kern0.5em {\left({\mathrm{t}}_{\mathrm{i}}\kern0.5em \hbox{--} \kern0.5em {\mathrm{k}}_{\mathrm{i}}\right)}^3\kern0.5em {\mathrm{d}}_{\mathrm{i}}\kern0.5em +\kern0.5em \mathrm{e}\kern0.5em \left({\mathrm{t}}_{\mathrm{i}}\right) $$

(2)

for *i* = 1, 2, …. 100. The variables U (t_{i}) are either x (t_{i}) and y (t_{i}) (dataset 1) or s (t_{i}) (dataset 2) measured at time t_{i} where t_{i} is, for dataset 1, the time since infection (t_{i} = 1, 2,.., 100) and, for dataset 2, the number of days elapsed between the date of milk recording and the date of the case occurrence (t_{i} = −50 to +50 days). Parameters g_{0}, g_{1}, g_{2}, g_{3} and h_{i} are regression coefficients. The knots k_{i} can be any value of t_{i} and the dummy variable d_{i} = 0 if t_{i} ≤ ki and d_{i} = 1 if t_{i} > k_{i}. Besides effects in [2], the model for dataset 2 included fixed effects of parity (1, 2, ≥3), days in milk (1, 2 …, 300) and herd-year-season (1, 2 *…,* 174) when the case was observed. These last effects are known factors affecting test-day milk yield and SCS (e.g., [20]). Cubic B-splines are the most frequently chosen spline to fit biological systems because they ally simplicity and biological signification and estimates tend to have high variance when the order of the spline gets larger (e.g., [21]).

Statistically significant knots were selected in a stepwise manner. The final model contained only variables with F-statistics for entry and staying in the model significant at the 0.15 level. The selection stopped at a local minimum of the predicted residual sum of squares (PRESS) criterion. The e (t_{i}) were assumed normally and independently distributed with E (e (t_{i})) = 0 and var. (e (t_{i})) = σ_{e}
^{2}. Estimated values of U (t_{i}) (= Û (t_{i})) and of regression coefficients (= ĥ and ĝ) were obtained by minimizing the squared differences between U (t_{i}) and Û (t_{i}). Differences between U (t_{i}) and Û (t_{i}) and R^{2} values (called R1 values in the following text) were used to evaluate the fit of the model.

### Estimation of rates of recruitment

For both datasets and after stepwise selection, time derivatives (step 3) were computed as:

$$ \mathrm{d}\widehat{\mathrm{U}}\kern0.2em \left({\mathrm{t}}_{\mathrm{i}}\right)\kern0.2em {/\mathrm{dt}}_{\mathrm{i}}={\widehat{\mathrm{g}}}_1+2\kern0.5em {\widehat{\mathrm{g}}}_2\kern0.2em {\mathrm{t}}_{\mathrm{i}}\kern0.5em +3\kern0.2em {\widehat{\mathrm{g}}}_3\kern0.2em {{\mathrm{t}}_{\mathrm{i}}}^2+3\kern0.2em {\Sigma}_{\mathrm{p}}\kern0.2em {\widehat{\mathrm{h}}}_{\mathrm{p}}\kern0.2em {\left({\mathrm{t}}_{\mathrm{p}}-{\mathrm{k}}_{\mathrm{p}}\right)}^2\kern0.2em {\mathrm{d}}_{\mathrm{p}} $$

(3)

For \( \widehat{\mathrm{U}}\kern0.1em \left({\mathrm{t}}_{\mathrm{i}}\right)=\kern0.5em {\mathrm{x}\widehat{\Big(}\mathrm{t}}_{\mathrm{i}}\left),\kern0.5em {\mathrm{y}\widehat{\Big(}\mathrm{t}}_{\mathrm{i}}\right)\kern0.5em \mathrm{and}\kern0.5em {\mathrm{s}\widehat{\Big(}\mathrm{t}}_{\mathrm{i}}\Big) \) with *i* = 1, 2, … 100; p indexes the significant segments (*p* = 1, …. ≤100); ĝ and ĥ are the ordinary least-squares estimates obtained for the regression coefficients in [2]. Differences between dU_{i} (model [1]) and dÛ_{i} (model [3]) were computed.

Finally (step 4), the derivatives were regressed on each system of ODE Eqs. For dataset 1 (model [4a]),

$$ \begin{array}{l}\mathrm{d}\widehat{\mathrm{x}}/{\mathrm{dt}}_{\mathrm{i}}=\upgamma \kern0.5em \mathrm{x}\kern0.5em -\upomega \kern0.5em \mathrm{x}\kern0.5em \mathrm{y}\\ {}\mathrm{d}\widehat{\mathrm{y}}/{\mathrm{dt}}_{\mathrm{i}}=\upupsilon \kern0.5em \left({\mathrm{y}}_0\kern0.5em -\mathrm{y}\right)+\upbeta \kern0.5em \mathrm{x}\kern0.5em \mathrm{y}\end{array} $$

(4a)

and for dataset 2 (model [4b]),

$$ \begin{array}{l}\mathrm{d}\mathrm{z}/{\mathrm{dt}}_{\mathrm{i}}\kern0.5em =\kern0.5em \upgamma \kern0.5em \mathrm{z}\kern0.5em -\upomega \kern0.5em \mathrm{z}\mathrm{y}\\ {}\mathrm{d}\widehat{\mathrm{s}}/{\mathrm{dt}}_{\mathrm{i}}\kern0.5em =\kern0.5em \upupsilon \kern0.5em \left({\mathrm{s}}_0\kern0.5em -\mathrm{s}\right)\kern0.5em +\upbeta \kern0.5em \mathrm{z}\;\mathrm{s}.\end{array} $$

(4b)

The R^{2} values (called R2 values in the following text) were computed to assess the fit of the models in estimating ODE rate parameters. In dataset 2, no information was available on bacterial concentrations necessary to solve model [4b]. As an alternative and to prove the concept of the proposed methodology, they were replaced by z. The values of z were simulated thanks to the first equation of [4b] (dynamics similar to [4a]) with γ = 0.1 and the value of ω that gave the highest R2.