BCMS data was provided by DEFRA from the RADAR project on 24 May 2006, covering movements for the period from January 1999 to April 2006. This was approximately 21 GB of data. The data was loaded into an Oracle™ database (Oracle Corporation, Redwood Shores, California, USA). In this analysis we have selected all records for two complete years, 1 January 2004 to 31 December 2005.
The main table of BCMS data in RADAR consists of "stay on location" records for individual cattle. These include fields for the individual animal identifier, the location identifier, the arrival date, the departure date, the type of movement on to the location (e.g. birth, or a normal trade transfer on to the location) and the type of movement off the location (e.g. death, or a normal trade transfer off). From this table a set of "movement" records may be derived. Each is constructed from a pair of records relating to the same animal, where the departure date of the first record is equal to the arrival date of the second. This rule will miss some movements in which the arrival at the second location is recorded on a later day than the departure from the first. If the rule requiring the dates to be the same were relaxed, however, the extraction would also collect "movements" that were not direct from the first location to the second but actually involved intermediate locations. Because timing is recorded only to the nearest day, two "stay on location" records may sometimes both start and end within one day, so that their temporal sequence cannot be recovered. Some of these cases can be resolved by a further rule: the type of movement off the first location may not be "DEATH". No "movement" record is constructed if either the location-from or the location-to is undefined (coded as "-1" in the RADAR data; 4.4% of all "stays on location"). Because all such locations share the same code, they would all match one another in the next stage of data extraction, which would be misleading. The "movements" are collected from the whole of 2004 and 2005.
Each movement record consists of the location-from, the location-to and the date of the movement. All movements where one of the locations was unknown have been eliminated. Since we are interested in the cattle-holding locations rather than individual animals, the dataset is simplified by removing duplicate records. This reduces the movement of a group of cattle between two locations on the same day to a single contact between those locations, irrespective of the number of cattle involved. The tie strength (i.e. the number of animals moved) is therefore ignored for the purposes of this analysis.
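The pairing and deduplication steps described above can be sketched in Python as follows. This is a minimal illustration, not the Oracle extraction actually used. It assumes each "stay on location" is a dictionary with hypothetical keys `animal`, `location`, `arrival`, `departure` and `move_off`, and that a missing departure is stored as `None`. Only the same-date, not-"DEATH" and not-"-1" rules from the text are applied.

```python
from collections import defaultdict
from datetime import date

UNDEFINED = -1  # RADAR code for an undefined location


def derive_contacts(stays):
    """Pair stays of the same animal into deduplicated (from, to, date) contacts.

    Each stay is a dict with the hypothetical keys animal, location,
    arrival, departure and move_off.
    """
    stays = list(stays)
    # Index stays by (animal, arrival date) so a stay can find its successor.
    by_arrival = defaultdict(list)
    for s in stays:
        by_arrival[(s["animal"], s["arrival"])].append(s)

    contacts = set()  # a set collapses a group movement to a single contact
    for first in stays:
        if first["departure"] is None or first["move_off"] == "DEATH":
            continue  # no onward movement is possible from this stay
        # Successor must arrive on the same day that the first stay ends.
        for second in by_arrival[(first["animal"], first["departure"])]:
            if second is first:
                continue  # a same-day stay must not pair with itself
            if UNDEFINED in (first["location"], second["location"]):
                continue  # undefined locations would all match each other
            contacts.add((first["location"], second["location"], first["departure"]))
    return contacts


# Hypothetical example: two animals moved together from 10 to 20.
example_stays = [
    {"animal": "A1", "location": 10, "arrival": date(2004, 1, 5),
     "departure": date(2004, 3, 1), "move_off": "TRANSFER"},
    {"animal": "A1", "location": 20, "arrival": date(2004, 3, 1),
     "departure": None, "move_off": None},
    {"animal": "A2", "location": 10, "arrival": date(2004, 2, 1),
     "departure": date(2004, 3, 1), "move_off": "TRANSFER"},
    {"animal": "A2", "location": 20, "arrival": date(2004, 3, 1),
     "departure": None, "move_off": None},
    # A move to an undefined location is dropped.
    {"animal": "A3", "location": 10, "arrival": date(2004, 1, 5),
     "departure": date(2004, 4, 2), "move_off": "TRANSFER"},
    {"animal": "A3", "location": UNDEFINED, "arrival": date(2004, 4, 2),
     "departure": None, "move_off": None},
]
print(derive_contacts(example_stays))  # {(10, 20, datetime.date(2004, 3, 1))}
```

Using a set for the output removes duplicates in the same way as the step described above. The two animals moved together therefore yield one contact.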
There is a second BCMS data table that contains details of all the locations in the main table, including the location type, where known. The predominant types are Agricultural Holding (64% of "stays on location" with known location), Slaughterhouse (Red Meat) (19%), Market (15%), Landless Keeper (1%) and Showground (0.3%). For 0.3% of stays with known location the location type is unknown; for the purposes of this analysis these are treated as Agricultural Holdings. Each "movement" record receives a type-from field and a type-to field from the location table, based on its location-from and location-to fields, and finally a unique identity number. These movement records are the nodes of the network. The list of nodes can be used to generate networks for structural investigation and for simulations of disease spread.
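The step that attaches location types and identity numbers could be sketched as below. The sketch assumes the location table has been loaded as a Python dictionary mapping location identifier to type string, which is a hypothetical layout. Following the text, an unknown type (`None`) is treated as an Agricultural Holding. As an added assumption, a location missing from the table entirely is treated the same way.

```python
DEFAULT_TYPE = "Agricultural Holding"  # used when the location type is unknown


def make_nodes(contacts, location_types):
    """Turn (from, to, date) contacts into node records with types and ids.

    `location_types` maps location id -> type string (hypothetical layout
    of the second BCMS table); None or missing types default to
    Agricultural Holding.
    """
    nodes = []
    # Sorting makes the assigned identity numbers reproducible between runs.
    for node_id, (loc_from, loc_to, day) in enumerate(sorted(contacts)):
        nodes.append({
            "id": node_id,
            "from": loc_from,
            "to": loc_to,
            "date": day,
            "type_from": location_types.get(loc_from) or DEFAULT_TYPE,
            "type_to": location_types.get(loc_to) or DEFAULT_TYPE,
        })
    return nodes
```

Each returned record is one node of the movement network. The `from`, `to`, `date` and `type_to` fields are all that the edge-construction rule needs.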
The edges of the network are defined as directed links from a first movement to a second movement, where the location-to of the first movement is the same as the location-from of the second, and the second movement occurs on or after the date of the first. The network thus incorporates the temporal sequencing of the contacts between locations. We are concerned with the risk of disease transmission through these contacts. Some time after an infection has been brought into a location by a contact, the risk that it will be passed on in a subsequent contact will have declined. Edges are therefore not constructed between two movements separated in time by more than an arbitrary limit: if the number of days elapsed since the first movement exceeds the time limit, no link is made (Fig. 2).
Two methods have been used to set time limits, one for locations where an all-in, all-out policy is expected and another for "Farms". For locations where an all-in, all-out policy is expected, this maximum time is set by inspection of the distribution of lengths of "stay on location" for these location types in the expectation that one group of animals will not mix with the next group. For Markets, Showgrounds, Slaughterhouses and Other non-Farm locations the maximum time is thus set at 6, 5, 5 and 4 days, respectively. For Agricultural Holdings and Landless Keepers (referred to here collectively as "Farms"), the maximum will depend on the nature of the disease that is to be modelled. We have chosen two illustrative values for this time limit, and therefore constructed two different networks from the data. The values used (for all Farms) were 7 days and 14 days.
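The edge rule and the type-specific time limits can be combined in a short sketch. The assumptions here go beyond the text:

- Node records follow the hypothetical layout of dictionaries with `id`, `from`, `to`, `date` and `type_to` keys.
- Dates are integer day numbers, which can be obtained with `date.toordinal()`.
- The limit is taken from the type of the shared (intermediate) location.
- Any type not listed falls back to the "Other" value.

The numeric limits themselves are those given above.

```python
from collections import defaultdict

# Maximum gap in days for non-Farm locations (values from the text;
# the type labels are hypothetical).
NON_FARM_LIMITS = {"Market": 6, "Showground": 5, "Slaughterhouse": 5, "Other": 4}
FARM_TYPES = {"Agricultural Holding", "Landless Keeper"}


def time_limit(location_type, farm_limit):
    """Maximum days allowed between arriving at and leaving one location."""
    if location_type in FARM_TYPES:
        return farm_limit  # 7 or 14 days in the two networks
    return NON_FARM_LIMITS.get(location_type, NON_FARM_LIMITS["Other"])


def build_edges(nodes, farm_limit):
    """Return (start_id, end_id) pairs linking movement a to movement b when
    a.to == b.from and 0 <= b.date - a.date <= limit at the shared location."""
    leaving = defaultdict(list)  # location -> movements that start there
    for n in nodes:
        leaving[n["from"]].append(n)
    edges = []
    for a in nodes:
        limit = time_limit(a["type_to"], farm_limit)
        for b in leaving.get(a["to"], []):
            gap = b["date"] - a["date"]  # integer day numbers assumed
            if b is not a and 0 <= gap <= limit:
                edges.append((a["id"], b["id"]))
    return edges
```

The same node list is run once with `farm_limit=7` and once with `farm_limit=14`, giving the two networks. Indexing movements by their location-from avoids comparing every pair of movements.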
The network is defined by an adjacency list consisting of pairs of numbers: the identifier of the node at which an edge starts and the identifier of the node at which it ends. This simple adjacency list thus incorporates both the temporal sequencing of the contacts and the interaction between the properties of the disease being modelled and the types of location on which the cattle are held. The construction described above ensures that every link is a genuine route through which infection can pass from one location, via an intermediary location, to a third location. This contrasts with networks in which the nodes are locations and the edges are movements; such networks lose the temporal sequence information available in the source movement data.
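A toy example shows what a location-as-node network loses. It uses two hypothetical movements: A to B on day 10, and B to C on day 5. A static location network contains the route A, B, C. That route cannot carry infection, because the onward movement from B happened before anything arrived from A. The movement-as-node construction excludes it. For simplicity, no time limit is applied in this comparison.

```python
# Hypothetical movements as (location_from, location_to, day).
movements = [("A", "B", 10), ("B", "C", 5)]


def static_two_step_routes(moves):
    """Routes x -> y -> z in a static network whose nodes are locations."""
    return {(x, y, z)
            for (x, y, _) in moves
            for (y2, z, _) in moves
            if y == y2}


def temporal_two_step_routes(moves):
    """Routes x -> y -> z where the onward movement from y occurs on or
    after the movement into y (no time limit applied here)."""
    return {(x, y, z)
            for (x, y, t1) in moves
            for (y2, z, t2) in moves
            if y == y2 and t2 >= t1}


print(static_two_step_routes(movements))    # {('A', 'B', 'C')}
print(temporal_two_step_routes(movements))  # set()
```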
For the two networks (with a 7-day infectious period, referred to as "the 7-day infection network", and with a 14-day infectious period, referred to as "the 14-day infection network", as described above), the following standard network parameters have been calculated within Oracle, by processing Oracle output in MS Excel, and with our own routines (the Contagion library):
1 Number of nodes in the network and the number of edges (links) between them.
2 Density, measured as the proportion of all the theoretically possible edges between nodes that actually exist. This tends to be small for large networks.
3 Out-degree frequency distribution. The out-degree of a node is the number of edges that start from that node.
4 In-degree frequency distribution. The in-degree of a node is the number of edges that end at that node.
5 Frequency distribution of in- versus out-degree per node ("2-dimensional" frequency distribution).
6 Dyad census, being the count of the types of connection between pairs of nodes and thus giving information on the structure of the network. All possible pairs of nodes in the network are considered and categorised as mutual (edges in both directions), asymmetric (only one edge between them) or null (unconnected) dyads. Uses our routine, dyad_census.
7 Reciprocity, calculated as the proportion of non-null dyads that are mutual, quantifies the reciprocal nature of the linkages.
8 Partial triad census. The triad census is the count of the types of connection between trios of nodes and thus gives information on the structure of the network. All possible trios of nodes in the network are considered and categorised into 16 classes in a manner analogous to dyads. Our census is partial as it does not give a precise count of the null triads (unconnected trios). Uses our routine, triad_census.
9 Clustering coefficient, derived from the triad census as the ratio of triangles (nodes linked to two other nodes, which are themselves interlinked) to triples (nodes linked to two other nodes). This quantifies the tendency of nodes to form interlinked clusters.
10 Frequency distribution of sizes of weak components. These are sets of nodes that are connected to one another when edge direction is ignored, though not necessarily directly or reciprocally. The sizes of these components reflect the overall reachability of any node from any other node. Uses our routine, weak_component_count.
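Several of the parameters listed above can be computed directly from the adjacency list. The sketch below is written in plain Python and is not the Contagion library's `dyad_census` or `weak_component_count` routines. It covers items 1 to 7, 9 and 10 but not the 16-class triad census. It assumes:

- nodes are numbered 0 to n-1;
- there are no self-loops;
- the clustering coefficient and weak components are computed on the undirected version of the network.

At the scale of the BCMS networks, a database or compiled implementation would be needed.

```python
from collections import Counter, defaultdict, deque


def network_summary(n_nodes, edges):
    """Descriptive statistics for a directed network with nodes 0..n_nodes-1.

    `edges` is an adjacency list of (start, end) pairs; self-loops are
    assumed absent and duplicate pairs are ignored.
    """
    edge_set = set(edges)
    n_edges = len(edge_set)
    # Density: existing edges / possible directed edges n(n-1).
    density = n_edges / (n_nodes * (n_nodes - 1))

    # Out-, in- and joint (in, out) degree frequency distributions.
    out_deg = Counter(u for u, _ in edge_set)
    in_deg = Counter(v for _, v in edge_set)
    out_dist = Counter(out_deg[i] for i in range(n_nodes))
    in_dist = Counter(in_deg[i] for i in range(n_nodes))
    joint_dist = Counter((in_deg[i], out_deg[i]) for i in range(n_nodes))

    # Dyad census: each mutual pair is seen twice in the edge set.
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set) // 2
    asymmetric = n_edges - 2 * mutual
    null = n_nodes * (n_nodes - 1) // 2 - mutual - asymmetric
    reciprocity = mutual / (mutual + asymmetric) if mutual + asymmetric else 0.0

    # Undirected neighbour sets for clustering and weak components.
    nbrs = defaultdict(set)
    for u, v in edge_set:
        nbrs[u].add(v)
        nbrs[v].add(u)

    # Clustering: closed triples / all triples centred on each node.
    triples = closed = 0
    for ns in nbrs.values():
        ns = sorted(ns)
        for i in range(len(ns)):
            for j in range(i + 1, len(ns)):
                triples += 1
                if ns[j] in nbrs[ns[i]]:
                    closed += 1
    clustering = closed / triples if triples else 0.0

    # Weak components: breadth-first search ignoring edge direction.
    seen, sizes = set(), Counter()
    for start in range(n_nodes):
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in nbrs.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes[size] += 1

    return {
        "nodes": n_nodes, "edges": n_edges, "density": density,
        "out_degree": out_dist, "in_degree": in_dist, "in_out_degree": joint_dist,
        "dyads": {"mutual": mutual, "asymmetric": asymmetric, "null": null},
        "reciprocity": reciprocity, "clustering": clustering,
        "weak_component_sizes": sizes,
    }
```

For example, `network_summary(4, [(0, 1), (1, 0), (1, 2), (2, 0)])` reports one mutual, two asymmetric and three null dyads. It also reports a clustering coefficient of 1.0 and weak components of sizes 3 and 1.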
In order to demonstrate in principle the simulation of the spread of disease on our networks, we applied the sir_net function, which was originally designed for the simulation of disease spread on a static network with locations as nodes and movements as directed (but not temporally sequenced) links. In the simulations described here:
1. The nodes are movements, and the simulation begins with the random assignment of one movement as "infected". This effectively places infected cattle on the target location of the movement, so that the subsequent movements that start from that location are at risk of infection.
2. The edges are directed and run between movements under the criteria described for these networks above.
3. Each iteration tests the spread of infection along the edges from infected movements, so infection spreads in steps that are not synchronised in time. The number of steps used in the simulations was 100. To help place this in the context of elapsed time, the range of time intervals represented by 100 steps is presented for each network in the Results.
4. Each node remained infectious for only 1 step, as the edges leading from an infected movement should only be tested once.
5. The probability of transmission of infection along an edge (q) was constant for the simulation. To test for transmission along each edge at risk, a random positive number less than 1 was compared with q. Transmission occurred if its value was less than or equal to q.
6. The values of q simulated on each of the networks were 0.1, 0.11, 0.12, 0.15, 0.2, 0.5 and 1.0.
7. 1000 simulations were run for each value of q, for each network.
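The simulation rules listed above can be sketched as a small SIR loop on the movement network. This is a hypothetical re-implementation for illustration only, not the sir_net function itself. It assumes nodes numbered 0 to n-1 and an adjacency list of (start, end) pairs. The uniform draw uses Python's `random.random()`, which returns values in [0.0, 1.0). Its possible return of exactly 0.0 differs negligibly from the "positive number" described in the text.

```python
import random
from collections import defaultdict


def simulate_sir(n_nodes, edges, q, steps=100, rng=None):
    """One SIR run on a movement network; returns the ever-infected nodes."""
    rng = rng if rng is not None else random.Random()
    out_edges = defaultdict(list)
    for u, v in edges:
        out_edges[u].append(v)
    infectious = {rng.randrange(n_nodes)}  # one randomly chosen seed movement
    ever_infected = set(infectious)
    for _ in range(steps):
        newly = set()
        for u in infectious:
            for v in out_edges[u]:
                # Transmit along the edge if a uniform draw is <= q.
                if v not in ever_infected and rng.random() <= q:
                    newly.add(v)
        ever_infected |= newly
        infectious = newly  # nodes stay infectious for one step only
        if not infectious:
            break  # no further spread is possible
    return ever_infected


def run_batch(n_nodes, edges, q_values=(0.1, 0.11, 0.12, 0.15, 0.2, 0.5, 1.0),
              runs=1000, seed=0):
    """Outbreak sizes from `runs` simulations for each value of q."""
    rng = random.Random(seed)  # seeded for reproducibility
    return {q: [len(simulate_sir(n_nodes, edges, q, rng=rng)) for _ in range(runs)]
            for q in q_values}
```

Because each node is infectious for a single step, the edges leading from an infected movement are tested once. A node targeted by several infectious movements in the same step gets one test per edge.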