The ESR1 gene is associated with risk for canine mammary tumours

Background The limited within-breed genetic heterogeneity and an enrichment of disease-predisposing alleles have made the dog a very suitable model for the identification of genes associated with risk for specific diseases. Canine mammary cancer is an example of such a disease. However, the underlying inherited risk factors for canine mammary tumours (CMTs) are still largely unknown. In this study, 52 single nucleotide polymorphisms (SNPs) in ten human cancer-associated genes were genotyped in two different datasets in order to identify genes/alleles associated with the development of CMTs. The first dataset consisted of English Springer Spaniel (ESS) CMT cases and controls. ESS is a dog breed known to be at increased risk of developing CMTs. In the second dataset, dogs from breeds known to have a high frequency of CMTs were compared to dogs from breeds with a lower occurrence of these tumours. Results We found significant associations to CMT for SNPs and haplotypes in the estrogen receptor 1 (ESR1) gene in the ESS material (best PBonf = 0.021). A large number of SNPs, among them several SNPs in ESR1, showed significantly different allele frequencies between the high and low risk breed groups (best PBonf = 8.8E-32, best PBPerm = 0.076). Conclusions The identification of CMT-associated SNPs in ESR1 in two independent datasets suggests that this gene might be involved in CMT development. These findings also support that CMT may serve as a good model for human breast cancer research.


Background
The modern dog breeds are a result of vigorous linebreeding and often originate from a few founding ancestors. This has led to extreme phenotypic variation between breeds, but limited genetic variation within breeds [1]. Some breeds have a considerably higher susceptibility to certain diseases than others. This indicates an enrichment of risk alleles within these specific breeds. Canine cancer is an example of such a disease. Mammary tumours are among the most common canine cancer forms [2]. Elderly, intact bitches are primarily affected by these tumours [3,4], with a higher incidence in breeds such as the English Springer Spaniel, Boxer, Cocker Spaniel and Dachshund [3][4][5][6][7][8]. The differences in breed predisposition clearly indicate a genetic influence on canine mammary tumour (CMT) development. There are many similarities between CMTs and human breast cancer. Among the shared disease characteristics are a spontaneous occurrence of tumours, by which females are primarily affected, and a hormonal influence on tumour development (e.g. oestrogen and progesterone) (as reviewed by Queiroga et al. 2011 [9]). Regional lymph node metastasis seems to be less important as a prognostic factor in dogs than in humans, but metastatic spread is otherwise broadly equal [10]. There are also similarities regarding the histological features and classifications of human breast cancer and CMTs. But while epithelial tumours are by far the most common in humans, CMTs relatively frequently also contain myoepithelial and mesenchymal components. Furthermore studies indicate that CMT and human breast cancer have several mutual prognostic markers and genetic risk factors for disease (as reviewed by Queiroga et al. 2011 [9]). Still, the underlying inherited risk factors for CMTs are largely unknown. 
Previous studies of genotypes, gene and protein expression in CMTs have identified possible candidate genes and pathways, but the results need to be confirmed or are somewhat inconsistent. However, a study on BRCA1 and BRCA2, two genes well-known to be involved in human breast cancer, showed associations of these genes with mammary tumours in English Springer Spaniels [11]. Another CMT study found mutations in the cancer-associated gene TP53 [12]. We have previously identified a considerable number of single nucleotide polymorphisms (SNPs) in ten cancer-associated genes known from studies in dogs and/or humans [13]. Some of these SNPs are likely to be associated with canine cancer, and the protein-changing SNPs are of particular interest due to their potential as functional disease-causing variants. In the present study, we aimed at exploring such CMT associations to identify genes involved in the development of mammary tumours in a case-control population of English Springer Spaniels (ESS) and in a second dataset of high and low risk breeds of CMT.

Results
After quality control, 165 ESS cases and 94 controls were left for analysis (Table 1). Of the 41 SNPs that passed quality control in the ESS case-control dataset, nominal single-SNP association was found for two ESR1 SNPs, one in exon 2 (rs21960513) and one in intron 7-8 (ss244244344) (P Raw = 0.033 and 0.002, respectively, Table 2). Another SNP in ESR1 (ss244244343) showed borderline nominal significance (P Raw = 0.052). The risk alleles for the two SNPs ss244244344 and ss244244343 are extremely common in the ESS cohort (Table 2). The ss244244344 and rs21960513 SNPs were still significant after correcting for multiple testing using 10,000 permutations (P Perm = 0.018 and 0.042, Table 2). The ss244244344 SNP also remained significant after Bonferroni correction. Of the classified cases, 32 malignant and 78 benign were left after quality control (Table 1). In the comparison of cases with malignant diagnosis vs. controls, ss244244344 showed nominal significance (P Raw = 0.016; Additional file 1: Table S3), but the result was not significant after multiple testing correction. We found no significant differences in allele frequencies between benign cases and controls or between malignant and benign cases. All association results are provided in Additional file 1.
We identified 11 LD blocks with D'~1 and LOD ≥ 2 in the ESS case-control dataset. One LD block was identified in each of the genes BRCA2, BRIP1, CDH1, CHEK2, EGFR and PTEN, whereas two and three blocks were found within ESR1 and ERBB2 (HER2), respectively. Only one SNP was genotyped in BRCA1 and STK11, making LD and haplotype analysis impossible. Nominal association was found for one and two haplotypes in each of two ESR1 LD blocks, respectively, and one haplotype showed borderline nominal significance (Table 3). No haplotypes remained significant after correction for multiple testing. As the total cancer risk of an individual is probably a result of risk alleles at multiple loci, we evaluated the combined effect of the risk haplotypes of the two ESR1 LD blocks compared to the protective haplotypes. A borderline Bonferroni-significant association was found for the combined risk haplotypes with an odds ratio of 3.3 (P Bonf = 0.055) ( Table 4).
In the second dataset, 237 dogs of high risk breeds, 191 dogs of low risk breeds and 43 SNPs passed quality control and were included in the study of allele frequencies (Table 1). Of the 31 SNPs with nominal single-SNP associations, 19 were significant after 10⁷ permutations and 23 after Bonferroni correction; among them were the three ESR1 SNPs rs21960513, ss244244343 and ss244244344 (Table 5).
However, there were considerable inter-breed variations in SNP allele frequencies for the high and low risk dataset also within the high and low risk breed groups (Table 6). Thus, the mean allele frequency of the group was often not representative for all the breeds included. The overall differences in allele frequency among breeds also caused general inflation of association P-values, complicating the interpretation. Breed permutation testing was therefore performed to correct for the inflation. None of the SNPs were significant after breed permutation testing (Table 5). To estimate the degree of association seen for the Bonferroni significant polymorphisms, correlation between disease risk and average breed allele frequency for each SNP was calculated. No statistically significant correlations were found.
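The breed permutation idea described above can be illustrated with a minimal Python sketch. It is not the PLINK-based procedure used in the study: as a simplification, it enumerates every possible assignment of the high risk label to breeds instead of drawing 10,000 random permutations, and it uses the absolute difference in pooled allele frequency as the test statistic instead of a χ² statistic. The function names and the input layout (`counts`, `high_risk`) are hypothetical.

```python
from itertools import combinations

def pooled_freq(counts, breeds):
    """Pooled minor-allele frequency over a group of breeds."""
    minor = sum(counts[b][0] for b in breeds)
    total = sum(counts[b][1] for b in breeds)
    return minor / total

def breed_permutation_p(counts, high_risk):
    """Breed-level permutation P-value for one SNP.

    counts: dict mapping breed -> (minor allele count, total allele count)
    high_risk: collection of breeds labelled high risk
    All dogs of a breed keep the same label, so the breed is the unit
    that is permuted (an assumption mirroring the text, not PLINK itself).
    """
    breeds = sorted(counts)

    def stat(high):
        low = [b for b in breeds if b not in high]
        return abs(pooled_freq(counts, high) - pooled_freq(counts, low))

    observed = stat(set(high_risk))
    # Every way of choosing the same number of high risk breeds.
    labellings = list(combinations(breeds, len(high_risk)))
    # Small tolerance so the observed labelling always counts itself.
    hits = sum(stat(set(h)) >= observed - 1e-12 for h in labellings)
    return hits / len(labellings)
```

Because the breed is the permuted unit, the number of distinct labellings is small. With nine breeds split five to four, as in the second dataset, there are only 126 labellings, so under exhaustive enumeration the smallest attainable P-value is 1/126 (about 0.008), regardless of how many dogs are sampled per breed.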
We applied the LD block criteria D'~1 and LOD ≥ 2 to the second dataset to test whether any of the blocks found in the ESS cases and controls could be re-identified in the high and low risk breeds. Not all blocks were present in all breeds, but a 9-kilobase block of eight SNPs in ERBB2 (ss244244354, ss244244355, ss244244357, ss244244358, ss244244360, ss244244361, ss244244363 and ss244244364) was re-identified for all breeds but the Beagle.
When aligned in Sequencher®, the canine ESR1 exons showed a high match percentage (≥84%) to the human exons, except for exons 1 and 8. We found no human cancer-associated polymorphisms in close proximity to the canine ESR1 exon 2 SNP (rs21960513). However, the canine SNP in exon 4 (ss244244343) was positioned one base pair (bp) next to the human rs1801132 SNP (Figure 1). Another human ESR1 polymorphism, rs2228480, aligned to the canine exon 8 at a position 207 bp downstream from the canine exon 8 SNP (ss244244346) (minimum match percentage of 84%) (Figure 1). The canine SNPs rs21960513 and ss244244346 were synonymous, while ss244244343 leads to an amino acid substitution from isoleucine to leucine. Also, ss244244343 is located in a gene region conserved across four species [13]. However, the substitution was predicted to be benign by PolyPhen and tolerated by SIFT [13,14].

Discussion
Publications about the existence, frequency and importance of CMT-associated germline mutations and their role in the tumour development are sparse. In the present survey, we studied SNPs in known cancer-associated genes and observed significant differences in allele and haplotype frequencies for the ESR1 gene in the ESS material. These findings were supported by the high and low risk breed groups and suggested an association of ESR1 alleles with increased risk of CMTs. The ESR1 gene encodes an estrogen receptor which works as a ligand-activated transcription factor in the cell. Besides its normal role in e.g. sexual development and reproductive function, the estrogen receptor is involved in several pathological processes such as breast cancer (as reviewed by Dahlman-Wright et al. 2006 [20]). Previous studies on human breast cancer have suggested that ESR1 polymorphisms are associated with the development of these tumours (Additional file 2). The non-synonymous canine exon 4 SNP (ss244244343) is positioned one bp downstream compared to the position of the human rs1801132 SNP. While rs1801132 is in human codon 325, the canine SNP aligned to a position in human codon 326. Both codons encode amino acids in the hormone binding domain of the human estrogen receptor ( Figure 1). This domain is related to receptor dimerization, chaperone binding and recruitment of co-regulators [21]. Studies on rs1801132 have shown an association with breast malignancies [22,23]. Its C allele has been associated with cancer, suggesting that it interferes with the binding of the GATA-1 and GATA-2 transcription factors to the estrogen receptor [21]. GATA transcription factors interact with the activation factor 2 (AF2) region of the ligand binding domain of the human estrogen receptor [15]. There are structural differences in the human and canine ERα proteins, but the major pocket sites seem to be very similar [24]. 
An association with breast cancer has also been suggested for the human rs2228480 [25], which aligned in proximity of the synonymous canine exon 8 SNP (ss244244346). Rs2228480 is positioned near the F domain of the human estrogen receptor (Figure 1). This domain is believed to be important for the ability of estrogen receptor to distinguish between receptor agonist and antagonist binding [26]. Thus, the observed associations of the canine exon 4 and exon 8 SNPs in our two independent canine datasets are supported by similar effects of closely linked SNPs in the human gene. This suggests ESR1 as an interesting candidate gene for mammary tumours in dogs as well. Given the prior evidence that ESR1 plays a role in human breast cancer etiology, it seems probable that the described loci of the present study might be correlated with a causal variant affecting ESR1 function.
[Table footnotes: a Fisher's exact test applied due to low counts of the CG haplotype (less than five in both cases and controls). b Corrected for the number of LD blocks (11). c For 29 cases and 18 controls, the haplotype combinations could not be fully determined due to genotyping failure of one or more of the ESR1 SNPs.]
Correction for multiple testing is necessary to adjust for the multiple comparisons performed when testing several SNPs or haplotypes for association. However, SNPs within a gene are very closely linked and not independent observations. Correcting for the number of SNPs would therefore be too conservative. Moreover, this study comprises a selection of biologically important and previously cancer-associated genes where it is likely to find an association, rather than a random set of genes.
To assure an appropriate correction for multiple testing of the single SNP and haplotype P-values, we therefore used the number of LD blocks. The fact that only one SNP in ESR1 was significant in the ESS material after Bonferroni correction was somewhat surprising. Possibly, the rest of the SNPs we assessed are not directly causative or in high LD with such variants, creating false negative gene association results. Further, a substantial number of sub-classifications of CMTs exists [27], potentially with different germline mutations contributing to the development of different tumour subtypes and grades of malignancy. Such heterogeneity might complicate the detection of truly cancer-associated mutations. Thus, our study of the ESS dataset might have insufficient power to prove SNP associations to CMT. Still, it has been indicated that about 100 cases and 100 controls should suffice to find loci with strong effects (fivefold) even for complex traits such as cancer [28]. As the one Bonferroni significant ESR1 SNP (ss244244344) is intronic, and we have tested only a limited set of SNPs, this SNP is more likely in LD with a functional mutation than being causative itself. It might also be a false positive. Thus, more studies on the ESR1 gene are required to establish its role in CMT development.
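The effect of correcting for LD blocks rather than individual SNPs can be made concrete with a short worked calculation. This is an illustrative sketch only: it uses the rounded raw P-value reported for ss244244344 (0.002), the 11 LD blocks and the 41 SNPs that passed quality control in the ESS dataset, so the numbers agree with the reported corrected values only up to rounding.

```latex
% Bonferroni correction with m tests
P_{\mathrm{Bonf}} = \min\bigl(1,\; m \cdot P_{\mathrm{Raw}}\bigr)

% m = 11 LD blocks (the correction used in this study)
11 \times 0.002 = 0.022 < 0.05

% m = 41 individual SNPs (the more conservative alternative)
41 \times 0.002 = 0.082 > 0.05

% Equivalent per-test significance thresholds
\frac{0.05}{11} \approx 0.0045, \qquad \frac{0.05}{41} \approx 0.0012
```

Under these rounded inputs the intronic ESR1 SNP passes the block-based threshold but would not pass a threshold based on the number of SNPs, which illustrates why the choice of the number of tests matters for closely linked markers.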
The considerable breed variations in allele frequencies for the high and low risk dataset might be expected due to between-breed genetic heterogeneity for risk alleles at different risk loci. Yet, the differences are interesting as there appears to be a significant breed-specific accumulation of certain coding variants also in genes that are vital for normal cell function. However, the large allele frequency differences between breeds within the same risk group complicated the interpretation of the results. There is potentially a genetic heterogeneity between breeds as to predominant CMT types and associated risk genes/alleles. If the candidate genes in this study are associated with cancer risk in the selected breeds, the variation in allele frequencies indicates that the associated genes/alleles differ between breeds. Another possibility is that cancer-driving mutations are in the regulatory parts of the genes, and the coding SNPs in the present study can be considered markers for functionally active regulatory sites. During breed differentiation there may have been recombinations between regulatory sites and coding parts resulting in different coding SNPs being linked to regulatory variation in different breeds. Consequently, even if a gene has an important role in cancer development, the associated coding SNP allele/haplotype might vary between breeds. Another challenge is the documentation of high and low risk breeds. As a result of e.g. ancestral patterns of geographical establishment, fluctuations in a dog breed's popularity and extensive use of popular sires within a country, the genetic composition of a dog breed can change over time and between different geographical locations. Thus, breed predisposition to CMT might vary from subpopulation to subpopulation. We based the selection of high and low risk breeds in this study on previous publications, but not all of them were Norwegian, and some were up to twenty years old. This could possibly be a source of sampling error in our study. Moreover, the dogs from the high and low risk breeds in our study are randomly selected without knowledge about the CMT phenotype. They would be a mixture of individuals with high and low risk corresponding to the population frequency of CMT. The frequency and effect of CMT-associated risk alleles would need to be relatively strong to be detected in such a material. It might be that our study is underpowered in that respect. However, we have documented large breed variations in allele frequencies of the coding SNPs in important cancer genes, and there are probably similar differences in the frequency of functionally active haplotypes between dog populations.

Conclusions
Cancer is a very complex disease. As in human breast malignancies, it is likely that the development of CMTs is influenced by several genes. The identified association of ESR1 to CMT in the present survey supports the power of the canine model for human breast cancer and the fact that combined studies within and between breeds can add power to the detection of risk alleles also for complex traits. However, the increased risk of CMTs in ESS and other high risk breeds might be due to other SNPs and/or genes than those selected in the present study. There is also a chance that predisposing CMT variants are undetected in our study due to limited power. Nevertheless, this is to our knowledge the first reported association of ESR1 polymorphisms to CMT and supports ESR1 as a candidate gene for canine cancer that should be further studied.

Samples
Two separate datasets were included in this study. The first consisted of blood DNA from English Springer Spaniel CMT cases and controls [11] (Table 1). These were privately owned female dogs registered in the Swedish Kennel Club. Approximately half of the dogs were confirmed unrelated at the parental level, while the rest could be as closely related as siblings. A subset of the cases had been classified as malignant (n = 33) or benign (n = 83) CMTs by histopathology by a veterinary pathologist at the time of the genotyping analysis. The rest of the cases were selected based on a veterinary clinical examination confirming the presence of single or multiple nodules within the mammary glands, but they had not (yet) had their mammary tumours surgically removed or histopathologically evaluated. The control dogs were older than eight years with confirmed absence of CMT after palpation of the mammary glands by a veterinarian. However, some of the samples from the material by Rivera et al. were not available for the present study. The same ESS cohort has been genotyped in a parallel study using Illumina 170 K canine HD SNP array, and multidimensional scaling plots were used to evaluate population stratification (data not shown). In this analysis, an outlier group of 29 dogs was identified. These dogs were consequently removed from further analysis in the present study.
The second dataset consisted of EDTA blood samples from the Canine Biobank at the Norwegian School of Veterinary Science (NSVS). In total, this dataset comprised samples from 450 individuals of nine dog breeds (Table 1). The selected breeds were known to be at either high or low risk of developing CMTs according to previous studies. The CMT status of the individual dogs from the Canine Biobank was unknown. However, given the higher genetic risk of some of the breeds, an increased frequency of associated risk alleles would be expected in these breeds. Representing breeds at high risk, the Boxer, Cocker Spaniel, Dachshund, English Setter and Standard Poodle were selected. Assumed low risk breeds included in the study were the Beagle, Bernese Mountain Dog, Collie and Shetland Sheepdog [3][4][5][6][7][8]. Genomic DNA was extracted from the EDTA blood samples using E.Z.N.A Blood Kit according to the manufacturer's protocol (Omega®, VWR International, West Chester, Pennsylvania, USA). The DNA was analysed for quality and quantity using NanoDrop (Thermo Fisher Scientific, Wilmington, Pennsylvania, USA).

Single nucleotide polymorphism selection and genotyping
All samples were genotyped for 52 previously described canine SNPs [13] (Additional file 3). These SNPs were located in ten genes previously reported to be cancer-associated in humans: BRCA1, BRCA2, BRIP1, CDH1, CHEK2, EGFR, ERBB2 (HER2), ESR1, PTEN and STK11 (Table 7). Twenty of the SNPs were found in coding regions, including 11 synonymous and 9 non-synonymous SNPs. The SNPs were distributed into two pools and genotyped using the Sequenom iPLEX Gold MassARRAY® according to the manufacturer's protocol (Sequenom®, San Diego, California, USA). The genotyping was performed at the Broad Institute, Cambridge, Massachusetts.

Single SNP and haplotype association analysis
Single SNP and haplotype analyses were performed separately for the two different study datasets; the ESS cases were compared to the controls, and the high risk breeds from the Canine Biobank were all compared to the low risk breeds. Single SNP association analysis was also performed for the subset of ESS cases with benign tumours vs. controls, malignant tumours vs. controls and benign vs. malignant tumours. Only samples and SNPs with a genotyping success rate of ≥75% and SNPs with a minor allele frequency (MAF) ≥1% were included in the single SNP and haplotype association analysis. The PLINK software [29,30] was used for analysing allele frequencies, single SNP χ² association and SNP odds ratios. Haploview [31,32] was used to identify LD blocks with a D'~1 and LOD ≥ 2 for each dataset and to generate haplotypes and haplotype association statistics. Odds ratios for haplotypes at each specific locus were estimated using calculators at VassarStats [33]. The nominal (raw) χ² P-values from the single SNP and haplotype analysis were Bonferroni corrected using the number of LD blocks to adjust for the problem of multiple comparisons that arises from evaluating several SNPs or haplotypes. Multiple testing correction using 10,000 permutations for the ESS dataset and 10⁷ permutations for the high and low risk breed dataset was also performed. Further, we did permutation testing by permuting the high/low risk labels simultaneously for all dogs in each breed in combination with PLINK analysis of association, using 10,000 permutations. A P-value of less than 0.05 after correction for multiple testing was considered statistically significant.
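The main steps of the single SNP analysis described above (quality filter, allelic χ² test, odds ratio and Bonferroni correction) can be sketched in a few lines of standard-library Python. This is only an illustration of the calculations, not the PLINK, Haploview or VassarStats implementations used in the study; all function names are hypothetical, and the sketch assumes a plain Pearson χ² test without continuity correction on a 2x2 table of allele counts.

```python
import math

def passes_qc(call_rate, maf, min_call_rate=0.75, min_maf=0.01):
    """Quality filter from the text: call rate >= 75% and MAF >= 1%."""
    return call_rate >= min_call_rate and maf >= min_maf

def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square test (1 df) on a 2x2 allele-count table.

    Returns (chi2 statistic, raw P-value).
    """
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    total = sum(sum(r) for r in table)
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_raw = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_raw

def odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major):
    """Odds ratio of carrying the minor allele in cases vs. controls."""
    return (case_minor * ctrl_major) / (case_major * ctrl_minor)

def bonferroni(p_raw, n_tests):
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p_raw * n_tests)
```

For example, `bonferroni(p_raw, 11)` corrects a raw P-value for the 11 LD blocks identified in the ESS dataset, which is the number of tests the text specifies in place of the number of SNPs.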
Considering each individual breed as the study unit rather than the individual dog, we performed pairwise