Data
The GDV occurrence data set consists of all recorded cases of GDV among the military working dogs (MWD) at the Lackland Air Force Base (LAFB) from January 1993 through December 1998. In each case, the breed of affected dog, its sex, date of birth, age at the onset of the disease and weight were recorded. All dogs were of one of three breeds: German Shepherd, Belgian Malinois, or Dutch Shepherd. The first recorded case of GDV occurred on Jan 6, 1993 and the last one on Dec 25, 1998. The total number of recorded cases (i.e. the days on which GDV case was registered) is 60. Out of 60, only two days involved more than one case of GDV; on both of them, there were 2 affected dogs. All kenneled dogs were checked by staff every 3 hours per organizational standard procedures. Cases were dogs that demonstrated abdominal swelling, tympani of the stomach, and radiographic evidence of gastric dilatation as determined by a veterinarian. Surgical intervention was initiated on all cases, either due to life-threatening condition or for non-emergency prophylactic gastropexy procedure.
The number of dogs under observation at LAFB was not constant but rather changing from month to month. The monthly dog census data was available Oct 1993 through Dec 1997 only, starting with 357 dogs in October 1993 and ending with 281 dog in December 1997. Because of unavailability of census data for 1998, GDV occurrence data for that year were not used.
A large database of weather data was assembled from the National Climatic Data Center at the Kelly Air Force Base, located immediately adjacent to LAFB. It contained hourly data on the wind direction, speed and wind gust; hourly temperature in Fahrenheit degrees, both adjusted and unadjusted for humidity and, finally, atmospheric pressure in inches of mercury, both adjusted to the sea-level, and unadjusted one (in millibars).
Modeling approach
A number of possible models for the GDV occurrence in the dog population were considered. In all of them, the occurrence of GDV on a given day was coded using 1 for a GDV day or 0 for a non-GDV day and used as a response variable. Other quantities, such as, for example, atmospheric pressure and air temperature, were used as predictor variables. The binary response data lends itself to a number of possible approaches including binary GLM with the logistic link function (logistic regression) and a Poisson GLM with a log link function (Poisson regression).
Both response and covariates were recorded over time; this makes it reasonable to view both the response and covariates as time series. Therefore, instead of the usual generalized linear model that assumes the data are iid, we utilized the modified form of it where both response and covariates are autocorrelated over time.
Additionally, a time series approach is useful in this setting because it is quite likely that a number of weather-related (and possibly other) covariates are not accounted for; a large number of possible etiological factors makes it rather difficult to include all of them in one model. Besides air temperature and atmospheric pressure that were considered earlier, air humidity (either absolute or relative) may be one of the possible etiological factors. Humidity is recorded over time and usually exhibits noticeable autocorrelation. Typically, it is thought of as a time series; if the response variable is truly dependent on humidity, its omission causes the response variable to exhibit temporal autocorrelation. This line of thinking suggests that a time series model in the GLM framework may be better at describing GDV occurrence than the regular logistic GLM model considered in [7]. Other covariates that are commonly cited as the likely possible etiological factors of GDV, such as atmospheric pressure, air temperature and others, are also random time-dependent covariates (time series) themselves. Thus, the omission of any one of them is likely to induce additional autocorrelation in the response.
Out of several possible modeling approaches, this research used the one that is based on putting integer valued time series in the generalized linear model framework [13]. Unlike the earlier Markov chain approach, it does not cause the number of parameters to be estimated to grow exponentially with the sample size; it is also broad enough to encompass most of the practically important models. Note that neither Markov property nor stationarity have to be assumed when employing this approach. This is important since both of these properties may be difficult to verify in practice. The resulting models can be estimated using the same method (iterative reweighted least squares, IWLS for short) as regular generalized linear models; the only difference is that the results have to be interpreted conditionally on the past.
GDV occurrence on day t was denoted as Y
t
. Thus, Y
t
was binary. The covariates may include past and present values of explanatory variables as well as past values of Y
t
. The vector of all covariates was denoted as
where p is the number of covariates, k the number of explanatory variables lags and q the number of the past lags of the Y
t
considered; for each 1 ≤ i ≤ p and 1 ≤ j ≤ k,
represented the value of ith covariate at time t - j. The probability of a GDV case on any given day was defined as p
t
which is also a function of the covariate vector z
t
. There are at least two possible ways of modeling probability of GDV on a given day. The first choice is to use the binary GLM with the logit link function – effectively, a logistic regression model. The random component of the model then is a vector of binary values of Y
t
that are autocorrelated over time. The systematic component of the resulting logistic model becomes
where α is the intercept, β is the vector of coefficients and p
t
is the probability of the GDV event on a given day t that depends on the set of covariates z
t
. For each GDV day, the probability of the event is defined as the number of cases observed on that day divided by the total population of dogs recorded on that day. Because all of GDV probabilities were small in the study, it is also possible to model the probability of the GDV event on a given day t using the Poisson regression. This implies that the random component of the model is a vector of Y
t
that are viewed now as Poisson counts. The systematic component of the model relates the average number of cases on day t μ
t
to the covariates using the log link as
where, yet again, α is the intercept, β is the vector of coefficients and μ
t
depends on the set of covariates z
t
as before.
As a first step in the model selection procedure, the daily characteristics, such as max, min and mean, of the atmospheric pressure and air temperature were considered as possible covariates. The temperature was not adjusted for humidity. The lagged values of atmospheric pressure and/or temperature can be viewed as possible GDV etiological factors and thus additional explanatory variables as well. The likelihood-ratio tests were used to verify statistical significance of the model covariates. Explanatory variables that were to be used as a part of z
t
were chosen. Let us denote the minimum daily atmospheric pressure on a given day pmin
t
, maximum daily atmospheric pressure pmax
t
, maximum daily atmospheric temperature tmax
t
, and minimum daily atmospheric temperature tmin
t
. Past values (a day before) of the above were pmint-1, pmaxt-1, tmaxt-1 and tmintt-1. The maximum hourly rise/drop in the atmospheric pressure on the day of GDV event was denoted rp
t
and dp
t
, respectively while the maximum hourly rise/drop in the temperature on the day of GDV event was denoted rt
t
and dt
t
. The dogs' breed was not considered since the dog population was rather homogeneous consisting of three large breeds routinely used as MWD. All non-event days were considered in this analysis as well.
The models were constructed using the process of stepwise forward selection. The covariates were added successively and the log-likelihood ratio for each new covariate is calculated. To avoid premature stop, even if the covariate did not add much to the descriptive ability of the model, as measured by its log-likelihood ratio statistic, the process continued until all of the covariates described earlier had been tried. The number of lags of Y
t
included in the models considered in this research was limited to 3 in order to guarantee parsimoniousness of the models. The residuals of each model can later be analyzed for autocorrelation patterns and additional lags added, if necessary. The coefficients of Yt-1 and Yt-3 had very large log-likelihood ratio p-values regardless of which external covariates were included in the model; more specifically, their log-likelihood ratio p-values exceed 0.1 everywhere and thus were excluded from the final models.
The models selected as final are shown in the Table (1). Except the minimum atmospheric pressure on GDV day and the maximum atmospheric pressure on the day of GDV event and on the day before the GDV event, all of the other external covariates display log-likelihood ratio p-values above the chosen threshold level of 0.10 are not included.
All of the models contained in the Table (1) are fit using the iterative reweighted least squares algorithm commonly applied to fitting of generalized linear models. Given that we are using the canonical link function log for the binary data, the iterative reweighted least squares algorithm coincides with the regular Newton-Raphson algorithm in this case.