Premises data
The data used in this paper were taken from the DEFRA FMD Data archive [9]. The relevant information for the 2,026 mainland IPs was farmhouse coordinates and infection and slaughter dates. Thirteen IPs in this database that were confirmed by serological tests for antibodies to the virus do not have estimated infection dates; we assume that these IPs were infected 10 days before reporting, which is the period suggested by DEFRA in the database. Data for all other livestock holdings in the UK are an amalgam of 2001 census data and DEFRA's list of premises, including all IPs and culled premises from the epidemic; in total there are 185,791 premises. The relevant information for each premises was its farmhouse coordinates.
Road network
The UK road network was taken from the Digimap Meridian™ 2 Database [10]. In this database, road centre-lines are represented as links, and road intersections as nodes. A road link, which connects two nodes, comprises one or more line segments fixed positionally by a series of connected coordinate points. The coordinate system is the National Grid with a resolution of 1 m. The database distinguishes between Motorways, A roads, B roads and minor roads; it does not include private roads, tracks and some minor roads and cul-de-sacs of less than 200 m. We extract from this database the coordinates of all line segments of all road links. We create our own network of nodes and links, where each line segment is a link connected to two nodes. A node contains a list of all other nodes linked to it, and the Euclidean distance to each of these nodes calculated from the line segment coordinates.
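The node-and-link structure described above can be sketched as follows. This is a minimal illustration, assuming line segments arrive as pairs of National Grid coordinates in metres; the function name `build_network` and the choice to keep the shorter of two duplicate links are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import defaultdict


def build_network(segments):
    """Build an adjacency map {node: {neighbour: length_m}} from line segments.

    segments: iterable of ((x1, y1), (x2, y2)) coordinate pairs in metres.
    Every segment end point becomes a node; every segment becomes a link.
    """
    adjacency = defaultdict(dict)
    for a, b in segments:
        if a == b:
            continue  # ignore degenerate zero-length segments (assumption)
        length = math.dist(a, b)  # Euclidean length of the segment
        # If the same pair of nodes appears twice, keep the shorter link (assumption).
        if length < adjacency[a].get(b, math.inf):
            adjacency[a][b] = length
            adjacency[b][a] = length
    return dict(adjacency)


# Two connected segments forming one road link with three coordinate points.
segments = [((0, 0), (3, 4)), ((3, 4), (3, 10))]
network = build_network(segments)
```

Each node therefore carries the list of nodes linked to it together with the Euclidean distance to each, which is the representation the later route calculations rely on.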
Each farmhouse is then assigned to its nearest node in the road network, under the assumption that this node is the closest node to the true farm entrance. The validity of this assumption was checked by hand for 150 randomly chosen premises in Devon, Wales and Cumbria by comparison with Ordnance Survey 1:50000 raster images. Of the 150 premises, 144 (96%) had correctly assigned nodes; the remaining six (4%) were assigned a node within 1 km of their correct node. Fig. 5 shows true road distances from the 150 farmhouses to their entrance on a road in our network, as estimated from the raster images, against the Euclidean distance of the farmhouses to their nearest node in the network (in general, the nearest node does not correspond to the position of the farm entrance onto a road in our network). Premises can be categorised into two types: those with farmhouses adjacent to a road in the network, which tend to be less than 200 m away from their nearest node; and those with farmhouses some distance away from a road, for which the road distance shows a linear trend with the distance from the nearest node (these farmhouses are always connected to a road in our network by a road or track not represented in the Digimap Meridian™ 2 Database). We assume that farmhouses less than 200 m away from their nearest node are 0 m away from a road, and that farmhouses more than 200 m away from their nearest node are -60 + 1.03x metres away from a road (from the linear regression shown in Fig. 5), where x is the distance in metres to their nearest node.
Any redundant nodes are removed from the network to improve computational efficiency. Redundant nodes are nodes at dead ends and nodes that have only two links (in the latter case, the two nodes linked to the redundant node are linked to each other, and the distance between them is the sum of their distances to the redundant node). A node assigned to a premises is never treated as redundant.
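The two rules in this paragraph, the farmhouse-to-road offset and the removal of redundant nodes, can be sketched as below. This is an illustrative sketch only: the names `road_offset` and `prune_redundant` are hypothetical, the treatment of a farmhouse at exactly 200 m is an assumption (the text does not specify it), and repeating the pruning until nothing changes is also an assumption.

```python
import math


def road_offset(dist_to_node_m):
    """Assumed farmhouse-to-road distance (m) from the regression quoted in the text."""
    if dist_to_node_m < 200:
        return 0.0
    # Exactly 200 m is treated with the linear rule here (assumption).
    return -60 + 1.03 * dist_to_node_m


def prune_redundant(adjacency, protected):
    """Remove dead-end nodes and contract two-link nodes, keeping protected nodes."""
    adj = {node: dict(nbrs) for node, nbrs in adjacency.items()}  # work on a copy
    changed = True
    while changed:  # repeat, because one removal can expose another (assumption)
        changed = False
        for node in list(adj):
            if node in protected:
                continue  # nodes assigned to premises are never removed
            nbrs = adj[node]
            if len(nbrs) == 1:  # dead end: drop the node and its single link
                (other,) = nbrs
                del adj[other][node]
                del adj[node]
                changed = True
            elif len(nbrs) == 2:  # pass-through node: join its two neighbours
                (a, da), (b, db) = nbrs.items()
                joined = da + db
                if joined < adj[a].get(b, math.inf):
                    adj[a][b] = joined
                    adj[b][a] = joined
                del adj[a][node]
                del adj[b][node]
                del adj[node]
                changed = True
    return adj


# A chain A-B-C-D with a dead-end spur E off B; A and D are assigned to premises.
adjacency = {
    "A": {"B": 100.0},
    "B": {"A": 100.0, "C": 50.0, "E": 30.0},
    "C": {"B": 50.0, "D": 25.0},
    "D": {"C": 25.0},
    "E": {"B": 30.0},
}
pruned = prune_redundant(adjacency, protected={"A", "D"})
```

In this toy network the spur E is discarded and B and C are contracted away, leaving a single link between A and D whose length is the sum of the original links, so shortest-route lengths between protected nodes are unchanged.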
Calculating shortest and quickest routes
We calculate the shortest route between all pairs of livestock premises in the UK within 10 km of each other. This is done by analysing overlapping 40 × 40 km² regions incremented by 10 km horizontally or vertically. This ensures that all farms within 10 km of an IP are linked to an IP by road. Larger regions are computationally infeasible.
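The overlapping-region scheme can be illustrated with a short sketch. The function names `windows` and `contains`, the bounding box, and the two farm coordinates are all hypothetical; only the 40 km window size and 10 km increment come from the text.

```python
def windows(xmin, ymin, xmax, ymax, size=40_000, step=10_000):
    """Yield (x0, y0, x1, y1) square windows (metres) stepped across a bounding box."""
    y0 = ymin
    while y0 < ymax:
        x0 = xmin
        while x0 < xmax:
            yield (x0, y0, x0 + size, y0 + size)
            x0 += step
        y0 += step


def contains(window, point):
    """True if the point lies inside the half-open window."""
    x0, y0, x1, y1 = window
    x, y = point
    return x0 <= x < x1 and y0 <= y < y1


# Hypothetical 100 km x 50 km bounding box and two farms less than 10 km apart.
tiles = list(windows(0, 0, 100_000, 50_000))
farm_a, farm_b = (38_000, 12_000), (45_000, 19_000)
shared = [w for w in tiles if contains(w, farm_a) and contains(w, farm_b)]
```

Because the window is four times wider than the increment, two farms within 10 km of each other always fall together inside several windows, so their shortest route can be computed without ever processing the whole network at once.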
The road network in a 40 × 40 km² region is converted into an N × N matrix, where N is the number of nodes in the region. The matrix is initialised with the road distances between all linked nodes; elements for pairs of nodes that are not linked are given infinite values. The Floyd-Warshall algorithm [11, 12] is then applied to this matrix, resulting in an N × N matrix in which each element gives the length of the shortest route between its corresponding pair of nodes. The running time of the Floyd-Warshall algorithm scales as N³, where N varies from approximately 100 to 10,000 depending on the density of roads. When N exceeds 10,000 the algorithm's running time exceeds 1 day. The shortest route between any pair of farms is taken as the shortest route between the two nodes assigned to these farms plus the assumed road distance of each farm from the main road. In a very few cases, especially for neighbouring farms, the spatial configuration of a pair of farms and their connecting nodes causes the road distance to be less than the Euclidean distance. For these rare cases we set the road distance equal to the Euclidean distance.
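A compact sketch of this step is given below, using a plain-Python Floyd-Warshall implementation. The function names, the integer node indices and the example distances are assumptions for illustration; a production run over thousands of nodes would need a much faster implementation.

```python
import math


def floyd_warshall(n, links):
    """All-pairs shortest route lengths for n nodes.

    links: iterable of (i, j, length_m) undirected links between node indices.
    Unlinked pairs start at infinity, as described in the text.
    """
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for i, j, length in links:
        if length < dist[i][j]:
            dist[i][j] = length
            dist[j][i] = length
    for k in range(n):  # allow routes that pass through node k
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


def farm_route(dist, node_a, node_b, offset_a, offset_b, euclidean):
    """Road distance between two farms, never shorter than the Euclidean distance."""
    road = dist[node_a][node_b] + offset_a + offset_b
    return max(road, euclidean)


links = [(0, 1, 300.0), (1, 2, 400.0), (0, 2, 1000.0), (2, 3, 250.0)]
dist = floyd_warshall(4, links)
route = farm_route(dist, 0, 3, offset_a=146.0, offset_b=0.0, euclidean=800.0)
```

The triple loop is the source of the N³ scaling mentioned above, and `farm_route` applies the rule that a road distance falling below the Euclidean distance is replaced by the Euclidean distance.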
To find the quickest route between two farms, distances between two nodes in the network are replaced with journey times. We assume that Motorway and trunk road speeds are 112 kph, A, B and minor road speeds are 72 kph, and farmhouse to road junction speed is 16 kph [13].
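The conversion from distance to journey time can be sketched as follows. The road-class labels and function names are hypothetical; the speeds are those stated above. Once link lengths are replaced by these times, the same shortest-route procedure yields quickest routes.

```python
# Assumed speeds in km/h, taken from the text; the class labels are illustrative.
SPEED_KPH = {
    "motorway": 112,
    "trunk": 112,
    "A": 72,
    "B": 72,
    "minor": 72,
    "farm_track": 16,  # farmhouse to road junction
}


def journey_minutes(length_m, road_class):
    """Travel time in minutes for a link of the given length and road class."""
    return length_m * 60 / (SPEED_KPH[road_class] * 1000)


def route_minutes(legs):
    """Total travel time for a route given as (length_m, road_class) legs."""
    return sum(journey_minutes(length, road_class) for length, road_class in legs)


# Hypothetical route: farm track, then a minor road, then a motorway stretch.
legs = [(800, "farm_track"), (3600, "minor"), (11200, "motorway")]
total = route_minutes(legs)
```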
Statistical analysis of distance-based risk
Owing to incomplete or equivocal tracing data, it is not possible to prove conclusively which farm infected which. Therefore we must consider all infectious IPs as possible sources of transmission on the particular day a farm gets infected. However, we can calculate the probabilities of possible transmission events based on known risk factors. We know that risk depends on proximity to an infectious IP ($K(d)$) and on the transmissibility ($\tau$) of the infecting farm [5]. Thus, we assume that the probability of an infectious IP $i$ infecting a susceptible farm $j$ (on the day $t$ when $j$ was infected) is given by
$$p_{i,j} = \frac{\tau_i \, K(d_{i,j})}{\sum_{k \in \mathcal{I}(t)} \tau_k \, K(d_{k,j})}, \qquad (1)$$
where $\mathcal{I}(t)$ is the set of all IPs infectious on day $t$. The denominator normalises $p_{i,j}$ such that the probability of farm $j$ being infected on day $t$ is 1. The transmissibilities, $\tau_i$, are given by [5]
$$\tau_i = T_s N_{s,i} + T_c N_{c,i}, \qquad (2)$$
where $T_s$ is the transmissibility of sheep, $T_c$ the transmissibility of cattle, $N_{s,i}$ the number of sheep and $N_{c,i}$ the number of cattle. Only the relationship between $T_s$ and $T_c$ is required because of the form of Equation 1. We assume that the infectious periods of all IPs begin 3, 4 or 5 days after they become infected and end on the day they are slaughtered [14–16]; the infection and slaughter dates of IPs are taken from the DEFRA FMD Data archive [9].
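The calculation described in this paragraph can be sketched numerically. The sketch assumes that each candidate source is weighted by its transmissibility multiplied by the kernel value at its distance from the infected farm, and that the weights are normalised over all IPs infectious on that day. The kernel `example_kernel` and the values of `t_sheep` and `t_cattle` are hypothetical placeholders, not the fitted quantities of [5].

```python
def transmissibility(n_sheep, n_cattle, t_sheep, t_cattle):
    """Farm-level transmissibility as a weighted sum of sheep and cattle numbers."""
    return t_sheep * n_sheep + t_cattle * n_cattle


def example_kernel(d_m):
    """Hypothetical distance kernel for illustration only, NOT the kernel of [5]."""
    return 1.0 / (1.0 + (d_m / 1000.0) ** 2)


def infection_probabilities(sources, kernel, t_sheep, t_cattle):
    """Probability that each infectious IP was the source of one infection.

    sources: list of (n_sheep, n_cattle, distance_to_infected_farm_m), one entry
    per IP infectious on the day the farm was infected.
    """
    weights = [
        transmissibility(n_s, n_c, t_sheep, t_cattle) * kernel(d)
        for n_s, n_c, d in sources
    ]
    total = sum(weights)  # normalising denominator
    return [w / total for w in weights]


# Three hypothetical infectious IPs around one newly infected farm.
sources = [(500, 100, 800.0), (0, 250, 2500.0), (1200, 0, 6000.0)]
probs = infection_probabilities(sources, example_kernel, t_sheep=1.0, t_cattle=5.0)
# Doubling both species transmissibilities leaves the probabilities unchanged.
rescaled = infection_probabilities(sources, example_kernel, t_sheep=2.0, t_cattle=10.0)
```

The comparison of `probs` with `rescaled` illustrates why only the relationship between the sheep and cattle transmissibilities matters: any common scale factor cancels in the normalisation.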
For a given region, defined in Table 1, only farms in those counties are used in the analysis. For example, for the Cumbria region we assume that only farms in Cumbria can infect Cumbrian farms. Farms in the neighbouring county of Dumfries and Galloway are assumed not to infect Cumbrian farms. Some pre-emptively culled farms may have been infected but never reported. Because it is not possible to say which farms these were or how many of them there were, we cannot include them as IPs in our analysis.
For each IP we find the Euclidean distances and the shortest and the quickest routes between it and all farms it could have infected after 23rd February 2001 and within 10 km (termed possible transmissions), and all farms it could not have infected after 23rd February 2001 and within 10 km (termed non-transmissions). A possible transmission can occur when an IP is infectious on the day another farm was infected (and hence became an IP). A non-transmission between an IP and a farm is defined for three cases: the IP was infectious before the other farm became infected, the IP was infectious before the other farm was pre-emptively culled, and the other farm was never infected or culled.
The mean shortest or quickest route between infectious and susceptible premises in a region is found for possible transmissions (weighted by their probability of occurrence $p_{i,j}$, Equation 1, in which $d_{i,j}$ represents Euclidean distance) and for non-transmissions. The difference between these means is recorded. The next step is to compare this difference to a null distribution. The null hypothesis states that the difference in the means could have arisen by chance. The null distribution is found as follows. One thousand weighted random samples of possible transmissions are taken from the population of all IP-farm pairs. The sampling is done without replacement. The weighting takes into account the fact that the ratio of possible transmissions to IP-farm pairs varies with Euclidean distance. Therefore, the probability of sampling a possible transmission at a given Euclidean distance is conditioned on this ratio at that distance. If we did not do this, we would preferentially sample IP-farm pairs with longer Euclidean distances within the population because these are more numerous. The unsampled IP-farm pairs make up a random sample of non-transmission pairs. The mean shortest or quickest routes of the randomly sampled possible transmissions and non-transmissions are found and their difference calculated. The observed difference in the means is then compared to the null distribution to obtain a p-value.
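The general shape of this randomisation test can be sketched as below. This is a simplified variant and not the authors' exact sampling scheme: it conditions on Euclidean distance by reassigning labels within fixed-width distance bins, it uses unweighted means, and it computes a one-sided p-value. The bin width, function names and example pairs are all assumptions.

```python
import random
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def mean_difference(pairs):
    """Mean route of possible transmissions minus mean route of non-transmissions."""
    trans = [route for _, route, possible in pairs if possible]
    non = [route for _, route, possible in pairs if not possible]
    return mean(trans) - mean(non)


def null_differences(pairs, bin_width=1000.0, n_samples=1000, seed=0):
    """Differences in means after reassigning labels within Euclidean-distance bins."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for idx, (euclid, _, _) in enumerate(pairs):
        bins[int(euclid // bin_width)].append(idx)
    diffs = []
    for _ in range(n_samples):
        labels = [False] * len(pairs)
        for members in bins.values():
            # Keep the number of possible transmissions in each bin fixed, so the
            # ratio of transmissions to pairs at each distance is preserved.
            k = sum(pairs[i][2] for i in members)
            for i in rng.sample(members, k):  # sampling without replacement
                labels[i] = True
        trans = [pairs[i][1] for i in range(len(pairs)) if labels[i]]
        non = [pairs[i][1] for i in range(len(pairs)) if not labels[i]]
        diffs.append(mean(trans) - mean(non))
    return diffs


def p_value(observed, null):
    """One-sided p-value: share of null differences at least as negative (assumption)."""
    return sum(d <= observed for d in null) / len(null)


# Hypothetical pairs: (Euclidean distance m, route distance m, possible transmission?)
pairs = [
    (500, 600, True), (700, 1500, False), (900, 800, True), (800, 1900, False),
    (1500, 1700, True), (1200, 2600, False), (1800, 2900, False), (1600, 3100, False),
]
observed = mean_difference(pairs)
null = null_differences(pairs)
p = p_value(observed, null)
```

Holding the count of possible transmissions fixed within each distance bin is one way to prevent the null samples from drifting towards the more numerous long-distance pairs, which is the concern the paragraph above raises.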
To test if Euclidean distance is a better predictor of risk than shortest or quickest route, the two variables under consideration are swapped, with $d_{i,j}$ in Equation 1 representing shortest or quickest route.
Simulated epidemics
Epidemics were simulated in order to test the power and specificity of the statistical test. The simulations are based on the stochastic simulations of [5]. Briefly, infections of susceptible farms are Poisson processes with rates determined by the susceptibility of the susceptible farms, the transmissibility of all infectious farms and a Euclidean-distance or road based transmission kernel. The rates and the Euclidean-distance based kernel are parameterised using the UK 2001 epidemic [5]. If the Euclidean-distance based transmission kernel is $K_e(e)$ (where $e$ is Euclidean distance), and the Euclidean distance-shortest or quickest route density function of IP-farm pairs (e.g., Fig. 2) is $f(r, e)$ (where $r$ is shortest or quickest route), then the shortest or quickest route based transmission kernel, $K_r(r)$, is given by
The Euclidean-distance kernel is the black line in Fig. 1. Using farms in Devon for f(r, e), the shortest route kernel is the magenta line and the quickest route kernel is the green line. For the first 30 days of the simulated epidemics, IPs are slaughtered 3 days after reporting and farms within 1.5 km of an IP are pre-emptively culled 5 days after reporting. These delays reduce to 1 and 2 days respectively after the first 30 days. There is no dangerous contact culling.
One thousand simulations using the shortest route based transmission kernel were analysed. For an α value of 0.05, shortest route was a significantly better predictor of transmission than Euclidean distance in 98% of cases. However, the test for Euclidean distance as a better predictor of transmission was significant in 15% of cases. Conservatively, therefore, our test has a power of about 85%. An additional 1,000 simulations using the Euclidean-distance based kernel were analysed. For an α value of 0.05, Euclidean distance was a significantly better predictor of transmission than shortest route in > 99.9% of cases. However, the test for shortest route as a better predictor of transmission was significant in just 1% of cases. Conservatively, therefore, our test has a specificity of about 99%.
Test for best distance-based transmission kernel
The following statistical test was developed to determine whether transmission between farms on opposite sides of specific transmission barriers is better modelled using a shortest route based transmission kernel or a Euclidean-distance based one. The distribution of infection probabilities (Equation 1) is found for IPs on opposite sides of a barrier, first with $d_{i,j}$ representing Euclidean distance. The same is then done with $d_{i,j}$ representing shortest route. If these two infection-probability distributions are significantly different from each other, this suggests that transmission across the barrier will be modelled differently under the two kernels. Given that transmission did not occur directly over the barrier, this implies that the shortest route based transmission kernel would be the better model. If, however, the distributions are not significantly different from each other, then transmission across the barrier will not be modelled significantly differently under the two kernels; therefore we can assume that a simple Euclidean-distance based transmission kernel will suffice. The Kolmogorov-Smirnov test was used to compare the distributions.
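The comparison of the two infection-probability distributions can be illustrated with the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical cumulative distribution functions. The sketch below computes only the statistic, not a p-value; the function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (maximum ECDF difference)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest = 0.0
    # The ECDFs only change at observed values, so it suffices to check those.
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


# Hypothetical infection probabilities across a barrier under the two kernels.
p_euclidean = [0.1, 0.2, 0.3, 0.4]
p_shortest_route = [0.3, 0.4, 0.5, 0.6]
d_stat = ks_statistic(p_euclidean, p_shortest_route)
```

In practice a library routine such as `scipy.stats.ks_2samp` would also supply the p-value needed to decide whether the two distributions differ significantly.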