Characterization of the aldo-keto reductase 1C gene cluster on pig chromosome 10: possible associations with reproductive traits

Background The rate of pubertal development and weaning to estrus interval are correlated and affect reproductive efficiency of swine. Quantitative trait loci (QTL) for age of puberty, nipple number and ovulation rate have been identified in Meishan crosses on pig chromosome 10q (SSC10) near the telomere, which is homologous to human chromosome 10p15 and contains an aldo-keto reductase (AKR) gene cluster with at least six family members. AKRs are tissue-specific hydroxysteroid dehydrogenases that interconvert weak steroid hormones to their more potent counterparts and regulate processes involved in development, homeostasis and reproduction. Because of their location in the swine genome and their implication in reproductive physiology, this gene cluster was characterized and evaluated for effects on reproductive traits in swine.
Results Screening the porcine CHORI-242 BAC library with a full-length AKR1C4 cDNA identified 7 positive clones, and sample sequencing of 5 BAC clones revealed 5 distinct AKR1C genes (AKR1CL2 and AKR1C1 through 4), which mapped to 126–128 cM on SSC10. Using the IMpRH 7000rad and IMNpRH2 12000rad radiation hybrid panels, these 5 genes mapped between microsatellite markers SWR67 and SW2067. Comparison of sequence data with the porcine BAC fingerprint map showed that the cluster of genes resides in a 300 kb region. Twelve SNPs were genotyped in gilts observed for age at first estrus and ovulation rate from the F8 and F10 generations of one-quarter Meishan descendants of the USMARC resource population. Age at puberty, nipple number and ovulation rate data were analyzed for association with genotypes by MTDFREML using an animal model. One SNP, an isoleucine to phenylalanine substitution in AKR1C2, was associated with age of puberty (p = 0.07) and possibly ovulation rate (p = 0.102). Two SNPs in AKR1C4 were significantly associated with nipple number (p ≤ 0.03) and another was possibly associated with age at puberty (p = 0.09).
Conclusion AKR1C genotypes were associated with nipple number as well as possible effects on age at puberty and ovulation rate. The estimated effects of AKR1C genotypes on these traits suggest that the SNPs are in incomplete linkage disequilibrium with the causal mutations that affect reproductive traits in swine. Further investigations are necessary to identify these mutations and understand how these AKR1C genes affect these important reproductive traits. The nucleotide sequence data reported have been submitted to GenBank and assigned accession numbers [GenBank:DQ474064–DQ474068, GenBank:DQ494488–DQ494490 and GenBank:DQ487182–DQ487184].


Background
In swine, the rate of pubertal development and successful pregnancy in gilts affect the efficient management of breeding females. Selection for growth rate and leanness in modern commercial pigs has resulted in a delay in the onset of puberty [1]. Age at puberty and weaning to estrus interval (WEI) are positively correlated [2], and the primary reason for culling sows is failure to return to estrus after weaning. Quantitative trait loci (QTL) have been identified for age of puberty in the pig on different chromosomes [3,4]. One of the QTL regions is located on the long arm of pig chromosome 10 (SSC10q) near the telomere, which is homologous to human chromosome 10p15 [5,6]. The q-arm of pig chromosome 10 also has QTL for ovulation rate [4] and number of nipples [7][8][9][10]. In the human, an aldo-keto reductase (AKR) gene family (AKR1C) has been identified near the telomere on chromosome 10p15 that contains at least six aldo-keto reductase family 1, member C genes [11]. The homologous region on mouse chromosome 13 contains a cluster of eight or nine AKR1C genes [12]. These duplicated genes maintain a high degree of sequence similarity, but differ greatly in their substrate specificity and tissue expression patterns.
The AKR superfamily comprises monomeric oxidoreductases that catalyze the NADP(H)-dependent reduction of a wide variety of substrates, including steroids, prostaglandins, bile acids, carbohydrates and xenobiotics [13]. AKRs are also thought to deactivate damaging reactive oxygen species like carbonyl compounds from lipids and proteins, leading to their elimination [14,15]. Aldo-keto reductases interconvert weak androgens, estrogens, progestins, mineralocorticoids and glucocorticoids to their more potent counterparts by catalyzing the reduction and oxidation of keto- and hydroxysteroids, respectively, thereby regulating a wide range of physiological processes involved in development, homeostasis and reproduction [16]. In this manner, AKRs regulate the occupancy and transactivation of several steroid receptors in target tissues, leading to transcription of hormone-responsive genes [13]. These steroid substrates can also act directly through "non-genomic" effects, such as formation of neuroactive steroids and activation of ion channels, G-protein-coupled receptors and several kinase signalling pathways. The products of AKR activity have been implicated in prostate disease, breast cancer, obesity, polycystic ovary disease and delay in the onset of puberty in humans [17][18][19][20][21][22]. Because of their location in the swine genome and their implication in directing reproductive physiology, this gene cluster was characterized and evaluated for effects on age at puberty in the pig.

Identification of AKR1C genes
A genomic amplicon spanning exons 4 and 5 of AKR1C4 (GenBank accession number AF473815; [5]) probed against one third of the porcine RPCI-44 BAC library identified 3 clones (62L11, 69L21 and 125A17). A full-length cDNA for AKR1C4 (TC200328, The Institute for Genomic Research (TIGR) [23]) probed against one third of the porcine CHORI-242 BAC library identified seven clones (203C8, 204L24, 226I21, 264H20, 275P11, 315D4 and 319P22). Three of these clones (203C8, 226I21 and 264H20) are represented in contig 10007 of the pig BAC fingerprint map [24]. Southern analysis of a BamHI digest of 8 clones from both libraries using the same cDNA probe showed 5-9 shared bands and dissimilar banding patterns. Five of these BACs from the CHORI-242 library (203C8, 226I21, 264H20, 315D4 and 319P22) were digested with BamHI and HindIII and subcloned for sample sequencing. One BAC clone (CHORI-242-203C8) that contained at least 4 AKR1C genes was nebulized and subcloned into pBluescript to obtain more coverage. The complete AKR1C4 gene was PCR cloned by amplification using exon primers and the RPCI44-125A17 BAC clone as template. Exon sequences for AKR1CL2, AKR1C1, AKR1C2 and AKR1C3 were found in 4 of the 5 CHORI-242 BAC subclone libraries (203C8, 226I21, 315D4 and 319P22) and AKR1C4 sequence was found in CHORI-242-264H20 subclones exclusively. AKR1C2 was found in both sets of BACs in the overlapping region. Sequences corresponding to AKR1CL1 were not found in BAC subclones. The promoter region of AKR1C4 was cloned by probing BamHI/BglII digests of the RPCI BACs with a fragment containing exon 1 and part of intron 1. The promoter contained LSF (late SV40 factor), ERE (estrogen response element) and multiple SP1 and MYC sites [GenBank: DQ494489]. A 3900 bp contig from BAC clone CHORI-242-203C8 upstream of AKR1CL2 contained promoter elements CCAAT, SP1, ETS, GATA, NF-1 and ERE [GenBank: DQ494488] [25].

Identification of cDNAs for AKR1C
TIGR contigs were identified for four of the five AKR1C genes identified from BAC subclone sequence; only AKR1C2 was not represented in EST libraries sequenced or in contigs assembled by TIGR. These mRNAs were confirmed by RT-PCR of overlapping fragments or by completely sequencing individual clones. Amino acid and nucleotide identities ranged from about 50-90% and 73-93%, respectively, among the pig AKR1C transcripts, and the active site residues common to aldo-keto reductases were conserved (Figure 1). No additional ESTs that would represent AKR1CL1 or other AKR1C genes were identified by sequence similarity analyses. Because of the high homology of these genes to all of the human AKR1C genes, they were named by their relative position and conservation of amino acid sequence with human genes. The identity of pig AKR1CL2 could be confidently assigned; pig AKR1CL2 was most similar to human AKR1CL2 and mouse Akr1e1 (Figure 2) and had little similarity to other human AKR1C genes. Pig AKR1C1, followed by AKR1C2, was the most similar to human AKR1C genes (83-85%) and to bovine AKR1C genes; pig AKR1C4 was more closely related to human AKR1C4 and AKR1CL1. Except for AKR1CL2, paralogues were more similar within species than were homologues among species.
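As a rough illustration of how pairwise identity figures like those quoted above can be computed from an alignment, the following sketch counts matching residues over ungapped columns. The function name `percent_identity` and the three toy peptide fragments are hypothetical and are not taken from the pig AKR1C sequences; alignment programs differ in how they treat gaps in the denominator, so this is one convention among several, not necessarily the one used in the study.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned peptide fragments, for illustration only.
toy_fragments = {
    "geneA": "MDPKYQRVEL",
    "geneB": "MDSKYQRVKL",
    "geneC": "MD-KHQCVKL",
}

names = sorted(toy_fragments)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        pid = percent_identity(toy_fragments[first], toy_fragments[second])
        print(f"{first} vs {second}: {pid:.1f}% identity")
```

Running the same function on nucleotide and amino acid alignments separately would give the two ranges reported in the text.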

Mapping and gene organization
SNPs identified in three of the five genes (AKR1CL2, AKR1C2 and AKR1C4, [GenBank: BV102614, BV680543, AF473815, respectively]) and microsatellites SB88-91 [GenBank: DQ487182-84] found in BAC subclone sequences were linkage mapped to SSC10, positions 126-128 cM. The resolution of the map was not great enough to determine the order of genes or markers. Four AKR1C genes (AKR1CL2, AKR1C2, AKR1C3 and AKR1C4) and microsatellite markers SB89, SB90 and SB91 (Table 1) were mapped using the IMpRH 7000rad and IMNpRH2 12000rad radiation hybrid panels and anchored with flanking genes (PRKCQ, GDI2 and IDI1) and microsatellite markers SWR67 and SW2067 (Figure 3). The resolution of the IMNpRH2 12000rad panel was nearly the same as that of the IMpRH 7000rad panel. Comparison of sample sequence data with mRNA sequences and the porcine BAC fingerprint map showed that the complete cluster of genes resides in two overlapping BAC clones that were subcloned, CHORI-242-203C8 and 264H20; clones 226I21, 315D4 and 319P22 completely overlap 203C8 (Figure 4). AKR1CL2, AKR1C1, AKR1C2 and AKR1C3 were all contained within CHORI-242-203C8, and microsatellite markers SB89, SB90 and SB91 were identified from this clone as well. The gene order was determined to be AKR1C4, AKR1C2, AKR1C1, AKR1C3 and AKR1CL2 from centromere to telomere. BAC clone CHORI-242-264H20 contained AKR1C2, AKR1C4, and the urocortin 3 (UCN3), tubulin alpha-like 3 (TUBAL3) and neuroepithelial cell transforming gene 1 (NET1) genes (Figure 4). The orientation of the AKR1C genes was determined by aligning cDNAs or BAC subclone contigs with BES on the BAC fingerprint map [24]. This region corresponds to about 630 kb of human sequence (4.8-5.5 Mb on HSA10) but is contained in only two porcine BAC clones (average insert size of 173 kb), suggesting that this region is about half the size in the pig.
A portion of this reduction is possibly due to the lack of a pig homologue to human AKR1CL1, although it is possible that AKR1CL1 was missed in the BAC sequence survey.
Figure 1. Alignment of pig AKR1C mRNAs. Conserved residues T24, L54, Y55, H117, F118, W227, N306 and Y310 involved in substrate binding are highlighted in yellow [52] and residues highlighted in blue are invariant residues found in all AKRs [53]. Residues Asp274Asn in AKR1C1, Ile16Phe, Tyr216Asn, Val234Phe in AKR1C2 and Val97Phe in AKR1C3 in red are non-synonymous amino acid polymorphisms identified in cDNA and genomic sequence of Meishan and White composite pigs. Dashes indicate residues that are absent, shading indicates identity and asterisks represent 10 bp increments.
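The size comparison in this section rests on simple arithmetic, sketched below. It assumes, purely for illustration, that both overlapping BAC clones have the library's average insert size of 173 kb; the actual insert sizes and the length of the overlap are not stated in the text, so the result is only an upper bound on the pig region.

```python
# Values quoted in the text.
human_region_kb = 630      # approximate span of the homologous human region
avg_bac_insert_kb = 173    # average insert size given for the BAC clones
n_bacs = 2                 # CHORI-242-203C8 and CHORI-242-264H20

# Two overlapping clones cannot span more than the sum of their inserts.
# Assumption: each clone has the average insert size; the true span is
# smaller by the (unstated) length of the overlap.
max_pig_region_kb = n_bacs * avg_bac_insert_kb
max_ratio = max_pig_region_kb / human_region_kb

print(f"pig region is at most {max_pig_region_kb} kb")
print(f"pig/human size ratio is at most {max_ratio:.2f}")
```

Under these assumptions the pig region is no more than roughly 55% of the human span, which is consistent with the "about half the size" statement.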

Expression of pig AKR1C genes in different adult tissues
Gene-specific cDNA for the five different AKR1C genes was amplified from 16 tissues (Table 1). Six tissues expressed all five genes tested (spleen, lung, ovary, adrenal, kidney, and endometrium), and AKR1CL2 and AKR1C4 were the most widely expressed genes (Figure 5). AKR1CL2 was expressed in all tissues except pancreas and brain. Unlike human AKR1C4, pig AKR1C4 was not liver-specific but was expressed in all tissues, and AKR1C2 was the only other AKR1C gene expressed in brain. These expression patterns were also reflected by the number of clones and tissue source of the EST libraries from which these cDNAs were identified [23].

Association of SNPs with phenotypic traits
Polymorphisms were identified from RT-PCR sequences of Meishan and White composite endometrium cDNA. SNPs identified in the coding region were genotyped across generations F8 and F10 of the resource population. Additional SNPs were found by sequencing genomic DNA from animals of similar breed types and were chosen based on their frequency and potential for being a nonsynonymous mutation (Table 2 and Additional File 1). Twelve SNPs were genotyped for association with reproductive traits. Three of these were in the AKR1C2 coding region, one was in AKR1CL2, and eight were in the AKR1C4 gene, including one in the promoter region (Table 2). One SNP (49422_42), an isoleucine to phenylalanine substitution in AKR1C2, was associated with age of puberty (p = 0.07) and possibly ovulation rate (p = 0.102). Another SNP in AKR1C4 (49431_198) was possibly associated with age at puberty (p = 0.093; Table 3). Two other SNPs were significantly associated with nipple number (p ≤ 0.03; Table 3).

Discussion
As in other species, this gene cluster is conserved in the pig, although individual family members have undergone some sequence divergence and specialization of tissue expression, possibly due to duplication of function. Because of high sequence similarity of the genes within species and divergence among species, it is difficult to definitively assign homologues for all members of the gene family, and not all genes found in human or mouse (e.g., hAKR1CL1) are represented in the pig. Gene duplication usually results in tandem duplication of genes or segments along the chromosome [26] and gene conversion can result in a species' paralogues being more closely related than homologues among species [27]. Because these genes are expressed in a multitude of tissues and the expression patterns in pig tissues do not differentiate these genes with expression patterns described in human or mouse, assignment of homologues is even more complicated.
Figure 2. Phylogenetic relationship of AKR1C family members of human, mouse, cow and pig. Multiple sequence alignment was performed by ClustalW and neighbor-joining tree of full-length amino acid sequences is shown. Species is designated by a lower case letter preceding the AKR1C gene family member (h = human; b = bovine; p = porcine and m = mouse). Numbers at branches represent 1000 bootstrapped datasets and scale bar represents substitutions per site. Human genes are from the AKR1C cluster on chromosome 10p15, mouse and bovine genes are from chromosomes 13 in both species and the five pig genes on SSC10q described in this study. In general, except for AKR1CL2, paralogues within species are more related than are homologues among species.
Figure 3. Radiation hybrid maps of AKR1C genes on IMpRH 7000rad and IMNpRH2 12000rad panels.
In addition, the orientation of genes in this cluster is not identical to human or mouse gene order, while gene order of flanking genes (UCN3, TUBAL3, and NET1) is conserved [11]. As more species are fully sequenced, a clearer picture of the evolutionary process of this gene family can be drawn.
Because this pluripotent family of enzymes regulates steroid hormone action in a tissue-specific manner, they are compelling positional candidates for regulating reproductive functions [16,28]. Steroid metabolites of AKR1C enzymes rise at the onset of puberty [22,29] presumably due to increased substrate and enzyme activity. The onset of puberty is marked by hormonal changes directed by neuronal signals that result in activation of the hypothalamic-pituitary-gonadal axis and reproductive maturity [30]. Central to behavioral and gonadal maturity is the release of gonadotropin releasing hormone (GnRH) followed by synthesis and secretion of luteinizing hormone (LH) and follicle stimulating hormone (FSH). An LH surge in turn is essential for stimulating the cascade of events leading to ovulation [31]. The gonadal steroid 3α-hydroxy-4-pregnen-20-one (3αHP) produced from progesterone by AKR1C inhibits GnRH activity on gonadotropes and suppresses FSH release from pituitary cells [32]. Modulation of the GnRH pulse frequency could therefore cause variation in the timing of puberty. Furthermore, as GnRH regulates FSH and LH release, it is possible that AKR1C activity may affect the number of ova shed during an estrus. FSH secretion stimulates the development of antral follicles and FSH levels are greater in some lines of gilts with higher ovulation rate [33,34].
QTL for nipple number have been identified on SSC10q and two SNPs in AKR1C4 were significantly associated with nipple number. Some of these QTL are more proximally located on SSC10q [8][9][10], but one identified in a Meishan/Pietrain cross maps to the same location as the AKR1C gene cluster [7]. Because androgen and antiandrogen treatment in rodents alters nipple development and retention [35], and treatment of rats with an inhibitor of 5α-reductase during gestation inhibits male nipple regression [36], a role for dihydrotestosterone (DHT) is implicated in normal nipple development. AKRs convert DHT, a preferred substrate, to the less active androgen 3α-androstanediol, thereby regulating steroid responsiveness in target tissues [13] such that variation in AKR1C activity could affect nipple development in the pig. AKR1C4 is the most catalytically active isoform for DHT reduction in human [28] and has high expression in mammary tissue (Figure 5). Because it is ubiquitously expressed and ESTs have been identified in porcine embryonic libraries [23], it is probably expressed during embryonic development of mammary tissue.
Figure 4. BAC contig map of region containing pig AKR1C genes.
Considerable support for an association of the AKR1C genotypes with nipple number was detected as well as some indication of an effect on age at puberty and possibly ovulation rate. Age of puberty and ovulation rate at a specific age are negatively correlated traits in Meishan pigs because ovulation rate increases from puberty to later estrus cycles [37] and animals that reach puberty earlier will have had more cycles at the time of measurement and greater number of ova shed; however, this increase in ovulation rate is less dramatic in occidental pigs. Because there was no selection performed on these animals and this is an area of increased recombination, it is assumed that recombination has greatly reduced linkage disequilibrium in this region, facilitating fine-mapping of repro-
Figure 5. RT-PCR amplification of pig AKR1C genes in different tissues from a purebred Meishan sow at day 25 of gestation and mature testis using gene-specific primers (Hyp, hypothalamus; Spl, spleen; Plc, placenta; Mam, mammary tissue; Pan, pancreas; Pit, pituitary; Lng, lung; Lvr, liver; Ovr, ovary; Adr, adrenal; Kid, kidney; LD, longissimus dorsi muscle; End, endometrium; Brn, brain; Int, small intestine; Tst, testis). Amplicon sizes are given in Table 1; marker is a 100 bp ladder.

Conclusion
Variation in the aldo-keto reductase gene cluster on pig chromosome 10 may be associated with age of puberty, nipple number and ovulation rate in swine. Future studies will determine if this variation will be useful for selection of breeding females with greater reproductive efficiency in industry populations.

cDNA synthesis, amplification and sequencing
Porcine cDNAs for aldo-keto reductases (AKRs) were identified from EST sequences deposited in GenBank and assembled at The Institute for Genomic Research (TIGR) [23] or by homology of porcine genomic BAC sequence to human AKRs. Tissues from adult purebred and composite breed animals were collected in RNAlater (Ambion, Austin, TX) and homogenized in Trizol (Invitrogen, Carlsbad, CA) as the source of total RNA for synthesis of RT-PCR template used to identify sequence variation. Tissues from a purebred Meishan sow collected at day 25 of gestation and testis from a mature boar were used for differential expression of AKR1C transcripts. cDNA was synthesized with M-MLV reverse transcriptase (Promega, Madison, WI) using 2 μg of total RNA from endometrium, placenta, testis, ovary, liver, lung, adrenal, kidney, spleen, pituitary, hypothalamus, brain, pancreas, small intestine, skeletal muscle and mammary tissue. These reactions were run for 35 cycles with 20 ng of template as described below. Sequences were extended using 3'-RACE or RT-PCR from exon sequences identified in BAC subclones. Full-length cDNA clones were obtained by iterative screening and self-ligation of inverse PCR (SLIP) [38] of the MARC 1PIG and 2PIG primary libraries before normalization [39].
Table footnotes: 1 Accession numbers beginning with BV are sequences submitted to dbSTS. 2 Location refers to nucleotide position in cDNA. 3 Allele frequencies were calculated from F8 and F10 animals genotyped (ranging from 238-420 animals), and founder Meishan (MS) and founder White composite (WC) animals (7-10 animals each) in the original resource population; ND = not determined. 4 Polymorphisms are identified by IUB code with flanking genomic sequence.

BAC screens and subclone libraries
Filters from the RPCI-44 and CHORI-242 porcine BAC libraries were screened using a random-primed, nearly full-length cDNA of porcine AKR1C4 as probe. The probe was prepared by PCR of the MARC 2PIG library using primers in exons 1 and 9 and radioactively labeled by random priming (Megaprime DNA Labeling System, Amersham, Piscataway, NJ). Positive clones were grown in 100 ml cultures and processed for BAC DNA using a midi-prep column (Marligen Biosciences, Ijamsville, MD). The BACs were digested with BamHI and HindIII separately and subcloned into pBluescript. One 96-well plate of clones was processed for each BAC and enzyme combination and sequenced with T3 and T7 primers. One CHORI-242 BAC (203C8) was sheared with a nebulizer (Invitrogen, Carlsbad, CA), end-repaired, then cloned into pBluescript, and four 384-well plates of clones were sequenced.

PCR and sequencing
Primer pairs for amplification of genomic DNA were designed from porcine AKR1C cDNA sequences and genomic sequence obtained during this study using Primer 3; code available at the Primer 3 Software website [40]. Five μl of the PCR reaction was electrophoresed in 1.5% agarose gels to determine quality of amplification and the remainder was prepared for sequencing. Chromatograms were imported into the MARC database, bases called with Phred, assembled into contigs with Phrap, polymorphisms identified using Polyphred, and assessed using Consed [41].

SNP genotyping
SNPs were mapped using a primer extension assay on the Sequenom MassArray™ system (San Diego, CA, USA). Ten μl PCR reactions contained 10 ng of genomic DNA, 0.25 U HotStar Taq, 1× supplied buffer, 1.5 mM MgCl2, 200 μM dNTPs, and 0.4 μM forward and reverse tailed primers. The primer extension reaction used 0.6 μM of probe primer and was performed according to the manufacturer's recommendations for hME chemistry (Sequenom, San Diego, CA, USA).

Radiation hybrid and linkage mapping
One half of the reaction was loaded on 2% agarose gels and manually genotyped. Data were analyzed for two-point and multipoint linkage with the IMpRH mapping tool [44] and submitted to the IMpRH database [45]. Carthagene [46,47] was used to estimate multipoint marker distance and order using all public markers on chromosome 10 in the IMpRH database [45] for the IMpRH 7000rad panel and those developed in this study for the IMNpRH2 12000rad panel to approximate position of mapped markers. Markers run on the IMpRH 7000rad panel have been submitted to the IMpRH public database.
Linkage analyses were performed as described [48] where TWOPOINT analyses were used to indicate the chromosome linkage group and the ALL, FLIPS and FIXED options were used to determine the multipoint position of the marker (CRIMAP v2.4). Multipoint locations for all mapped markers are based on the latest published swine genetic map [49].

Animals and resource population
Genomic DNA from parents of the MARC reference family (a White composite boar and seven crossbred sows) was used to identify SNPs. Phenotypes and genotypes were collected from animals of generations 8 and 10 (F8 and F10) produced from the original resource population [4] used to identify QTL. The original animals were from reciprocal backcrosses of 10

Phenotypic data and statistical methods
The F8 gilts were observed for first estrus beginning when the oldest gilts reached 120 days of age as described by Rohrer et al. [4], and ovulation rates were determined by counting corpora lutea on the ovaries at slaughter after the third estrus in gilts from the F8 and F10 generations. Genotype probabilities were calculated for all animals using an extended version of GenoProb [50]. Association analyses between phenotypes and genotypes were conducted using MTDFREML [51]. The model fitted included fixed effects for contemporary group and regressions on SNP marker genotype probabilities (i.e., probability of an animal being aa, aA, or AA). Random effects included the animal's polygenic breeding value and residual error. Each marker was analyzed separately. No adjustments to reported p-values were made for multiple comparisons. Twelve SNP markers were analyzed for three traits: age of puberty (191 observations), ovulation rate (233 observations) and nipple number (1144 observations).
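The idea of regressing a phenotype on genotype probabilities while accounting for contemporary group can be sketched in a much-reduced form. The example below is not the MTDFREML animal model used in the study: it omits the polygenic random effect, collapses the three genotype probabilities into an expected allele dosage (an additive-only simplification), and handles the group fixed effect by centering within group, which gives the same slope as ordinary least squares with group indicators. The function names and all data values are hypothetical.

```python
from collections import defaultdict


def expected_dosage(p_aa: float, p_aA: float, p_AA: float) -> float:
    """Expected number of 'A' alleles given the three genotype probabilities."""
    if abs(p_aa + p_aA + p_AA - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_aA + 2.0 * p_AA


def additive_effect(records) -> float:
    """Slope of phenotype on expected dosage after removing group means.

    records: iterable of (group, phenotype, (p_aa, p_aA, p_AA)).
    Simplification: no random animal effect is fitted here.
    """
    by_group = defaultdict(list)
    for group, phenotype, probs in records:
        by_group[group].append((expected_dosage(*probs), phenotype))

    sxy = 0.0
    sxx = 0.0
    for rows in by_group.values():
        mean_x = sum(x for x, _ in rows) / len(rows)
        mean_y = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    if sxx == 0.0:
        raise ValueError("no within-group variation in dosage")
    return sxy / sxx


# Hypothetical records: (contemporary group, phenotype, genotype probabilities).
toy_records = [
    ("G1", 200.0, (1.0, 0.0, 0.0)),
    ("G1", 195.0, (0.0, 1.0, 0.0)),
    ("G1", 190.0, (0.0, 0.0, 1.0)),
    ("G2", 210.0, (1.0, 0.0, 0.0)),
    ("G2", 207.5, (0.5, 0.5, 0.0)),  # uncertain genotype
    ("G2", 200.0, (0.0, 0.0, 1.0)),
]

print(f"estimated effect per copy of A: {additive_effect(toy_records):.2f}")
```

Using probabilities rather than called genotypes lets animals with uncertain genotypes contribute fractional information instead of being dropped.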

Phylogenetic analysis
AKR1C peptide sequences were aligned with ClustalX, the distance matrix was constructed with the program PROTDIST using a Dayhoff PAM matrix model, and a neighbor-joining tree was constructed in PHYLIP (v 3.65). The tree was constructed with the program NEIGHBOR and rooted at a mid-point with the program RETREE. Bootstrap values were derived from 1000 pseudo-datasets generated in SEQBOOT. The tree was viewed in TREEVIEW (v 1.6.6).
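The bootstrap step can be illustrated with a short sketch of column resampling, the general technique behind pseudo-dataset generation: alignment columns are drawn with replacement until a replicate of the original length is obtained. This is not PHYLIP code; the function name `bootstrap_columns`, the seed and the toy peptides are hypothetical, and each replicate would still need to go through distance estimation and tree building to yield bootstrap support values.

```python
import random
from typing import Dict


def bootstrap_columns(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one pseudo-dataset made by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same column indices are used for every sequence so columns stay intact.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


# Hypothetical aligned peptides, for illustration only.
toy_peptides = {
    "pep1": "MDPKYQRVEL",
    "pep2": "MDSKYQRVKL",
    "pep3": "MDAKHQCVKL",
}

rng = random.Random(1)  # fixed seed so the run is repeatable
replicates = [bootstrap_columns(toy_peptides, rng) for _ in range(1000)]
print(len(replicates), "pseudo-datasets, each of length",
      len(replicates[0]["pep1"]))
```

The fraction of replicate trees that contain a given grouping is what is reported as the bootstrap value at that branch.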