Targeted collation of contact data typically represents only a small subset of the true population, and if these data are biased this may lead to misinterpretation of recorded contact structures [1–3]. Consequently, heterogeneities in population contact structure can be poorly characterised. The importance of such contact heterogeneities for infectious disease transmission has been highlighted through the development of social network models in humans and movement network models in livestock [5–10]. In Great Britain (GB), the application of network analysis to livestock movements has been uniquely informed by a well-defined, temporally explicit Cattle Tracing System (CTS) database [11, 12]. However, even in this case there is some evidence of potential bias in cattle movement patterns arising through missing or incorrect movement records at the level of the type of enterprise. Such systematic errors, arising from data collection procedures and inaccuracies in reported information, may lead to biases in the quantification of network properties. Bias identification is therefore an important step in ensuring model validity.
Mathematical models of avian influenza (AI) in GB have been largely informed by the Poultry Network Database (PND), which provides poultry network information for a subset of the industry, and the Great Britain Poultry Register (GBPR), which provides more representative demographic information. Although the PND does not reflect temporally explicit movements onto and off farms, shared industry associations have been used to infer potential contacts between farms and have informed stochastic simulation and exploratory models [14–16]. For example, all farms that are associated with a particular slaughterhouse are assumed to be potentially epidemiologically linked to one another. In the absence of epidemic data, and therefore without the ability to validate predictive models for AI control in GB, mathematical models are a valuable tool for exploring the connectivity of the poultry industry. These epidemiological models have investigated the efficacy of current control measures for AI in GB and have identified particular scenarios that could result in a large outbreak [14–16].
The PND was collated in 2006 by the Veterinary Laboratories Agency (VLA). It was designed to identify farms that share industry associations, such as through catching companies (CCs), slaughterhouses (SHs) or being part of a larger integrated company (IC). From this, an estimate of between-farm association frequency (i.e. the maximum number of farms a single farm may be associated with) can be made at the farm level, which can be used to inform logistical considerations during a disease outbreak prior to the implementation of movement restrictions. These between-farm associations inferred from the PND have been used as a proxy for between-farm "contacts", as they are considered to represent potential routes of between-farm spread of infection through personnel, shared equipment and vehicles.
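As an illustration of how a between-farm association frequency might be derived from shared industry associations, the following minimal sketch counts, for each farm, the number of distinct other farms that share at least one SH, CC or IC with it. The function name `association_frequency`, the farm and entity identifiers, and the toy data are all hypothetical and are not taken from the PND; the sketch assumes only that each farm is recorded with the set of industry entities it is associated with.

```python
from collections import defaultdict


def association_frequency(farm_links):
    """Count distinct other farms sharing at least one industry entity.

    farm_links: dict mapping a farm id to the set of industry entities
    (hypothetical ids for SHs, CCs and ICs) the farm is associated with.
    """
    # Invert the mapping: entity -> set of farms associated with it.
    members = defaultdict(set)
    for farm, entities in farm_links.items():
        for entity in entities:
            members[entity].add(farm)

    freq = {}
    for farm, entities in farm_links.items():
        neighbours = set()
        for entity in entities:
            # All farms linked to the same entity are potential contacts.
            neighbours |= members[entity]
        neighbours.discard(farm)  # a farm is not its own contact
        freq[farm] = len(neighbours)
    return freq


# Toy data (invented for illustration only).
links = {
    "farm_A": {"SH1", "CC1"},
    "farm_B": {"SH1"},
    "farm_C": {"CC1", "IC1"},
    "farm_D": {"IC1"},
    "farm_E": set(),  # no recorded industry association
}
freq = association_frequency(links)
print(freq)
```

Because a farm sharing two entities with the same neighbour is counted once, the result is an upper bound on the number of distinct farms potentially linked to each farm, not a count of actual movements.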
Epidemiological evidence from previous outbreaks of AI indicates a role for indirect transmission via fomites, for example through shared equipment, the reuse of disposable egg trays, the movement of vehicles (during chick delivery, the delivery of feed, and the collection of dead birds), the management practices of integrated companies, contaminated bird-carrying crates during slaughterhouse-related farm visits, and the clothing, shoes and hands of farm visitors [18–27]. Such mechanisms of transmission via fomites have also been identified as sources of possible risk through catching company personnel and vehicles associated with slaughterhouse-related farm visits.
Whilst this evidence is largely circumstantial, arising from epidemiological investigations, it is considered likely that AI will share the same mechanisms for between-farm transmission as other pathogens similarly transmitted via the faecal-oral route, such as Salmonella, Campylobacter and those associated with coccidiosis. Fomites have been implicated in poultry flock infections caused by these pathogens and represent possible mechanisms of between-farm transmission; for example, during slaughterhouse-related farm visits via equipment such as bird-carrying crates and pallets, the wheels of forklift trucks and slaughterhouse vehicles, and the boots of drivers and catchers, as well as via staff and equipment shared between different farm premises [20, 30–34]. Evidence from previous outbreaks also suggests that spatial spread, possibly via airborne mechanisms, may play an important role between farms within close proximity [18, 20, 25, 35, 36]. However, this mechanism is considered to be relatively less important for GB compared with countries such as the Netherlands, which has regions of greater poultry farm density.
As a result of the targeted sampling of known SHs and CCs, missing data inherently bias the PND towards large poultry premises. Therefore the PND cannot be considered representative of the entire GB poultry industry and was never intended to be so [Lucy Snow, pers. comm.]. It has been shown that even when individuals are sampled at random, this process may not result in a random representation of their contacts, and consequently of overall network properties [1, 2, 37]. Missing data within the PND are inherently non-random, and therefore systematic differences between the types of farms sampled and those unsampled may further exacerbate the misrepresentation of network properties and the misidentification of high-risk sectors of the poultry industry. The validity of generalising PND-informed network properties to a national scale is potentially reduced by missing farms. Therefore, establishing the likely characteristics of these missing farms, based on the known properties of those that are well characterised, is an important step to inform future data collection exercises. It is only through a more representative characterisation of the poultry industry that contact heterogeneities can be usefully applied to predictive models of poultry disease control.
To our knowledge, the appropriateness of using inferred industry contacts from the PND for informing predictive AI models in GB has not been considered in the published literature. In particular, the potential implications of targeted sampling procedures for predictive modelling of AI control have yet to be quantified. Potential biases in inferred poultry network properties may have important consequences for government preparedness in distributing resources during an outbreak; the extent of between-farm spread may depend on how rapidly, and where, the movement restrictions that limit this risk are implemented. As the human health, animal welfare and economic consequences of a large AI outbreak could potentially be catastrophic [38–44], government and industry preparedness for such an event is vital.
Our aim was to identify geographical areas with biases in the farm contact structure by extrapolating network data informed by the PND to the GBPR, which is more demographically representative of GB poultry farms but lacks detailed information on between-farm associations via the poultry industry. The GBPR was established by the British Department for Environment, Food and Rural Affairs (Defra) in December 2005, and it is mandatory for all commercial farms holding more than 50 birds to record their farm-related details.
Specifically, our objectives were to: (i) determine statistical associations between farm-level factors and network-informed between-farm association frequency, using multivariable logistic regression; (ii) extrapolate the fitted statistical models to each farm recorded in the GBPR, obtaining predicted probabilities for categorical between-farm association frequency; and (iii) compare the regional-level distribution (with GB divided into eleven geographical regions) of PND-informed between-farm association frequencies with estimates following extrapolation to the GBPR.
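The three objectives can be sketched in code as a fit, extrapolate and compare workflow. The example below is a simplified illustration under stated assumptions, not the analysis performed in this study: the outcome is reduced to a binary "high" versus "low" association frequency (the study uses a categorical outcome), the two covariates (log flock size and a meat-production indicator) are hypothetical, the logistic model is fitted by plain gradient ascent rather than a statistical package, and all farm records are invented toy data with only two regions.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(flock_size, is_meat):
    # Hypothetical covariates: log10 flock size centred at 10,000 birds,
    # and a 0/1 indicator for meat production.
    return [math.log10(flock_size) - 4.0, float(is_meat)]


def predict(w, x):
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    # Maximum-likelihood fit by batch gradient ascent; w[0] is the intercept.
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# (i) Toy network-informed sample: (region, flock size, is_meat, high association).
sample = [
    ("East", 1000, 0, 0), ("East", 3000, 1, 0), ("East", 10000, 0, 0),
    ("East", 10000, 1, 1), ("East", 3000, 0, 0), ("East", 30000, 1, 0),
    ("North", 30000, 1, 1), ("North", 30000, 0, 1), ("North", 100000, 1, 1),
    ("North", 100000, 0, 0), ("North", 100000, 0, 1),
]
X = [features(size, meat) for _, size, meat, _ in sample]
y = [high for _, _, _, high in sample]
w = fit_logistic(X, y)

# (ii) Toy register: more small farms, no association data.
register = [
    ("East", 200, 0), ("East", 500, 0), ("East", 1000, 1), ("East", 10000, 1),
    ("North", 300, 0), ("North", 800, 0), ("North", 30000, 1), ("North", 100000, 0),
]

# (iii) Compare observed sample proportions with mean predicted probabilities.
observed, predicted = defaultdict(list), defaultdict(list)
for region, _, _, high in sample:
    observed[region].append(high)
for region, size, meat in register:
    predicted[region].append(predict(w, features(size, meat)))

comparison = {
    region: (sum(observed[region]) / len(observed[region]),
             sum(predicted[region]) / len(predicted[region]))
    for region in observed
}
for region, (obs, pred) in comparison.items():
    print(f"{region}: observed {obs:.2f}, extrapolated {pred:.2f}")
```

A gap between the observed and extrapolated values within a region would indicate where a sample weighted towards large premises may misrepresent that region's contact structure. In practice an established statistical package would be preferable to the hand-written fitting routine, which is used here only to keep the sketch self-contained.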