Immunoinformatics and analysis of antigen distribution of Ureaplasma diversum strains isolated from different Brazilian states

Background Ureaplasma diversum has numerous virulence factors that contribute to pathogenesis in cattle, including lipid-associated membrane proteins (LAMPs). Therefore, the objectives of this study were to evaluate in silico important characteristics for immunobiological applications and for heterologous expression of 36 LAMPs of U. diversum (UdLAMPs) and, also, to verify by conventional PCR the distribution of these antigens in strains from four Brazilian states (Bahia, Minas Gerais, São Paulo, and Mato Grosso do Sul). The Manatee database was used to obtain the gene and peptide sequences of the antigens. Similarity and identity studies were performed using BLASTp and direct antigenicity was evaluated by the VaxiJen v2.0 server. Epitope prediction for B lymphocytes was performed on the BepiPred v2.0 and CBTOPE v1.0 servers. NetBoLApan v1.0 was used to predict CD8+ T lymphocyte epitopes. Subcellular location and presence of transmembrane regions were verified by the software PSORTb v3.0.2 and TMHMM v2.2, respectively. SignalP v5.0, SecretomeP v2.0, and DOLOP servers were used to predict extracellular secretion signals. Physico-chemical properties were evaluated with the web tools ProtParam, SOLpro, and Protein-Sol. Results In silico analysis revealed that many UdLAMPs have desirable properties for immunobiological applications and heterologous expression. The proteins gudiv_61, gudiv_103, gudiv_517, and gudiv_681 were the most promising. Strains from the 4 states were PCR positive for antigens predicted to be immunogenic and/or to have good characteristics for expression in a heterologous system. Conclusion This work contributes to a better understanding of the immunobiological properties of the UdLAMPs and provides a profile of the distribution of these antigens in different Brazilian states.


Background
U. diversum, a member of the class Mollicutes, is a bovine pathogen related to reproductive disorders [1]. Its most distinctive characteristics are the production of ammonia through urea hydrolysis and the absence of a cell wall [2]. Although U. diversum infection does not always produce clinical signs, the organism can colonize the respiratory and genital/reproductive systems of cattle, generating severe inflammatory conditions often culminating in abortion [3]. It is considered an opportunistic pathogen found in the mucosa and secretions of the vulva, vagina, and udder of cows and in secretions of the respiratory tract of calves [1].
Milk production in cows and spermatogenesis in bulls are also affected. U. diversum produces mastitis along with visible changes in the milk and udder [4]. In bulls, it causes seminal vesiculitis, balanoposthitis, epididymitis, and morphological and functional changes in sperm. Thus, U. diversum colonizes different regions of the reproductive system leading to active semen contamination [5]. Infection of semen for artificial insemination and in vitro fertilization results in serious obstacles to modern bovine reproduction techniques [6].
In addition to urease, U. diversum has sophisticated virulence mechanisms, including LAMPs, a mixture of mycoplasmal lipoproteins expressed on the cell surface that interact directly with host cells. These antigens are considered the main molecular agents associated with pathogenesis in several Mollicutes species and play an important role in host pathogenicity and immunomodulation [7]. In addition to lipoproteins, our research group identified in the bovine ureaplasma genome genes encoding the multiple band antigen (MBA), which contains multiple tandem repeats in the C-terminal region, as well as the gene for hemolysin and the genes for the Mycoplasma Ig binding protein (MIB) and Mycoplasma Ig protease (MIP), the MIB-MIP system, which acts by binding and cleaving the IgG heavy chain [2,8].
The genomic sequencing of a species offers researchers new possibilities for research. Rapid analysis of all or part of the genome allows the construction of primers and the screening of genes coding for virulence factors in the most diverse bacterial strains. Immunoinformatics tools allow the physico-chemical and immunological properties of these molecules to be screened reliably and at low cost [9]. Recombinant DNA technology can, through expression in a heterologous system, allow virulence factors to be analyzed in isolation. Therefore, the objective of this work was to evaluate antigens of U. diversum regarding immunobiological properties and desirable characteristics for expression in a heterologous system, as well as to evaluate the distribution of these antigens in isolates from different regions of Brazil.

U. diversum antigens have low similarity with bovine proteome proteins
BLASTp analyses of the 36 UdLAMPs against bovine proteomes revealed that the maximum similarity occurred between the lipoprotein gudiv_159 and the Tektin-1 protein from Bos taurus taurus (29%). For Bos taurus indicus, similarity was detected only for gudiv_517 (10%). The hybrid showed no significant similarity to any protein (Table 1).
In silico analysis showed that U. diversum antigens have epitopes for B and T lymphocytes
Conformational and linear B cell epitopes were evaluated for the number of regions and the total percentage of amino acids in epitope regions. All proteins showed conformational epitopes for B lymphocytes. The most significant B cell epitopes are listed in Additional Table 1. The number of antigenic regions ranged from 2 in gudiv_388 to 124 in gudiv_398. The proteins with the lowest and highest percentage of amino acids in antigenic regions were gudiv_164 (4.3%) and gudiv_66 (39.9%), respectively (Table 2). Except for 10 proteins (gudiv_546, gudiv_457, gudiv_427, gudiv_442, gudiv_388, gudiv_357, gudiv_331, gudiv_228, gudiv_171, and gudiv_159), all the others have a number of predicted regions greater than or equal to the values for major surface protein 5 (Msp5) from Anaplasma marginale (Table 3). In the prediction of linear epitopes, the number of antigenic regions varied from 1 in gudiv_159 to 84 in gudiv_398. The protein with the highest percentage of amino acids in antigenic regions was gudiv_179 (90.4%). Thirty proteins had a number of antigenic regions greater than or equal to that of Msp5. Eighteen of the 36 UdLAMPs were predicted to be antigenic (score greater than or equal to 0.5 on the VaxiJen server).
In the prediction of major histocompatibility complex class I (MHCI) ligands, all lipoproteins except gudiv_85 and gudiv_159 showed at least one predicted binder for 4 of the 8 bovine lymphocyte antigen (BoLA) MHCI alleles studied (Table 3). Epitopes with strong binding to each BoLA allele are listed in Additional Tables 2 and 3. The maximum number of binders was found between the epitopes of the gudiv_398 protein and the BoLA-2*01201 allele (75 binders). Only three U. diversum antigens (gudiv_85, gudiv_331, and gudiv_388) had fewer binders than the Theileria parva Tp2 antigen in all alleles; of these, gudiv_85 showed no predicted binders for any allele (Table 3).
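As an illustration of how the totals in Table 3 can be tallied, the following minimal Python sketch enumerates 9-amino-acid peptide windows and classifies percentile ranks with the thresholds given in Methods (0.5% for strong and 2% for weak binders). The function names, the strict inequality applied at each threshold, and any rank values passed in are assumptions made for illustration; real ranks would come from NetBoLApan output, which is not reproduced here.

```python
def peptide_windows(sequence, k=9):
    """Return every overlapping k-residue window of a protein sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def classify_binder(rank_percent, strong=0.5, weak=2.0):
    """Label one peptide from its percentile rank (lower rank = better binder).

    Assumption: a peptide is a binder when its rank is strictly below the
    threshold; the paper does not say whether the bounds are inclusive.
    """
    if rank_percent < strong:
        return "strong"
    if rank_percent < weak:
        return "weak"
    return "none"


def count_binders(ranks):
    """Sum strong and weak binders for one protein/allele pair."""
    labels = [classify_binder(r) for r in ranks]
    strong = labels.count("strong")
    weak = labels.count("weak")
    return {"strong": strong, "weak": weak, "total": strong + weak}
```

The "total" field corresponds to the quantity reported per protein and allele, that is, strong plus weak binders expressed as an absolute number.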

Some UdLAMPs have characteristics for heterologous expression in Escherichia coli
Parameters such as molecular weight (MW), instability index, aliphatic index, grand average of hydropathy (GRAVY), and solubility were predicted for the U. diversum antigens. The MW varied between 9.0 and 240.2 kilodaltons (kDa). The proteins with the highest MW were gudiv_398 (240.2 kDa), gudiv_162 (90.5 kDa), and gudiv_180 (88.7 kDa), while those with the lowest MW were gudiv_159, gudiv_85, and gudiv_331, with 13.3, 9.4, and 9.0 kDa, respectively (Table 5). The instability index ranged from 9.16 (gudiv_499) to 67.15 (gudiv_331). In general, proteins with an index below 40 are considered stable; in this study, therefore, only 4 proteins (gudiv_93, gudiv_159, gudiv_331, and gudiv_560) were classified as unstable according to the prediction. Hydrophobicity was assessed with GRAVY; the only GRAVY-positive proteins were gudiv_91, gudiv_228, gudiv_357, and gudiv_546, with values of 0.05, 0.12, 0.61, and 0.05, respectively. As for solubility, the proteins gudiv_91, gudiv_171, gudiv_287, gudiv_357, gudiv_458, and gudiv_560 were predicted to be insoluble by both Protein-Sol and SOLpro. Gudiv_91 and gudiv_357 also presented 4 and 7 transmembrane loops, respectively (Table 5). In total, sixteen proteins were predicted to be soluble by both predictors (Table 5).
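To make two of these indices concrete, the sketch below computes GRAVY (the mean Kyte-Doolittle hydropathy per residue) and the aliphatic index (mole percent of Ala, plus 2.9 times Val, plus 3.9 times Ile and Leu), and applies the stability cutoff of 40 described above. It is a minimal illustration, not the ProtParam implementation; the function names are hypothetical, and the instability index itself is taken as an input because computing it requires a dipeptide weight table that is not reproduced here.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: sum of residue values / length."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    seq = sequence.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def is_predicted_stable(instability_index, cutoff=40.0):
    """Apply the rule used in the text: an index below 40 means stable."""
    return instability_index < cutoff
```

A negative GRAVY value indicates a hydrophilic protein overall, which is why only the four GRAVY-positive proteins are singled out in the text.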
A considerable number of UdLAMPs have a signal for secretion by the classical and non-classical pathways
Classical secretion mediated by a signal peptide (SP) was analyzed with SignalP v5.0. This server predicted an SP in 29 of the 36 proteins studied. The SPs ranged from 18 to 29 amino acids in length, and all showed a cleavage site for signal peptidase II (Sec/SPII). A cysteine immediately after the cleavage site can be seen in the predicted SPs (Table 6). The DOLOP server, which uses a series of criteria to predict bacterial lipoprotein SPs, including the preferred occurrence of amino acids, classified 17 of the 29 proteins predicted by SignalP as carrying typical lipoprotein SPs. Of the twenty-nine proteins predicted by SignalP to have SPs, twenty-five were also predicted by SecretomeP to undergo non-classical (non-signal peptide-mediated) secretion. In addition, some proteins (gudiv_61, gudiv_93, gudiv_162, gudiv_164, gudiv_179, gudiv_287, gudiv_331, gudiv_388, gudiv_546, gudiv_633, and gudiv_663) not identified by DOLOP as having lipoprotein SPs were predicted to be secreted by non-classical pathways (Table 6).

The prediction analysis reveals that UdLAMPs have important characteristics both for immunobiological applications and for expression in a heterologous system
The antigens of U. diversum were classified according to undesirable properties for use in prophylactic and immunodiagnostic measures and undesirable properties for expression in E. coli. The proteins gudiv_61, gudiv_103, gudiv_517, and gudiv_681 passed all parameters and were not retained by any exclusion criterion established in this study (Fig. 1). In addition, a considerable number of UdLAMPs were retained in only one or none of the exclusion criteria.

Table 3 Prediction of binding of UdLAMPs (peptide windows with 9 amino acids) to different BoLA alleles (MHCI) performed through the NetBoLApan v1.0 server. The total of strong and weak connections is expressed in absolute numbers. Columns: UdLAMPs, BoLA-1*02301, BoLA-3*00201, BoLA-2*01201, BoLA-6*01301, BoLA-3*00101, BoLA-6*04101, BoLA-T2C, BoLA-T5. * Mapping of CD8+ T lymphocyte epitopes was also performed for the Theileria parva Tp2 antigen.
Gene coding sequences (CDS) for LAMPs predicted as antigenic are present in strains from different Brazilian states
To verify the distribution of U. diversum antigens in different Brazilian states, the presence of genes for LAMPs in 46 U. diversum strains was investigated by PCR. Table 7 lists the primers constructed. All antigens were detected in strain ATCC 49782. Not considering the ATCC strain, the lowest and highest percentages of amplified antigens occurred for strains S8 and 59, with 5.6 and 83.3%, respectively (Fig. 2). Regarding antigens, the most prevalent were gudiv_759, gudiv_357, and gudiv_91, detected in 87, 84.8, and 82.6% of the strains, respectively. In contrast, the least prevalent were gudiv_402 (2.2%) and gudiv_458 (4.3%). The presence of antigens varied among the strains isolated from the states studied (Fig. 3). In Bahia, the state with the highest number of strains, a total of 35 antigens were detected by PCR. The only strain from Minas Gerais tested positive for seven proteins. The isolates representing Mato Grosso do Sul (805 and 9653) had 27 antigens. In São Paulo, all 13 strains were PCR positive for 34 proteins.
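The two kinds of percentage reported here (strains carrying each antigen, and antigens carried by each strain) can be derived from a PCR presence/absence table as in the minimal Python sketch below. The function names and the dictionary layout are assumptions for illustration, and no real PCR calls are reproduced. The raw counts behind the percentages are not given in the text; back-calculating with 46 strains and 36 antigens, counts such as 40, 39, 38, 1, and 2 of 46 strains and 2 and 30 of 36 antigens would round to the reported values, but these counts are inferred, not reported.

```python
def percent(count, total):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100.0 * count / total, 1)


def antigen_prevalence(detected, antigens):
    """Percentage of strains that are PCR positive for each antigen.

    detected: dict mapping strain name -> set of antigens amplified.
    """
    n_strains = len(detected)
    return {
        antigen: percent(sum(antigen in hits for hits in detected.values()),
                         n_strains)
        for antigen in antigens
    }


def strain_coverage(detected, antigens):
    """Percentage of the antigen panel amplified in each strain."""
    panel = set(antigens)
    return {
        strain: percent(len(hits & panel), len(panel))
        for strain, hits in detected.items()
    }
```

Keeping both summaries as functions of the same table means the per-antigen and per-strain percentages always use the same underlying PCR calls.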

Discussion
Mollicutes lipoproteins are important virulence factors associated with pathogenesis in the reproductive and respiratory tract of infected hosts [10]. In this study, the lipoprotein gudiv_159 had 29% similarity with the Tektin-1 protein from the Bos taurus taurus proteome. For the other UdLAMPs and proteins from other bovine subspecies, all similarity values were less than 12%. Similarity values greater than 25% are relevant when assessing immunological aspects [11]. The similarity between virulence factors and host proteins can make it difficult to develop an adequate immune response, or even generate cross-reaction events with autoantibody production during infection [12]. Mycoplasma hominis, M. fermentans, and M. arthritidis are species of Mollicutes often found in patients with autoimmune diseases [10]. A protective immune response, with the production of effector cells and antibodies able to recognize epitopes of an infectious agent, is essential for fighting infection. Conformational epitopes represent the majority of B cell epitopes (about 90%). However, conformational epitopes usually contain one or a few stretches of linear epitopes [13]. In the prediction, we found that all 36 UdLAMPs have conformational and linear epitopes for B lymphocytes and are predicted as antigenic (VaxiJen predictor). For a considerable number of proteins, the number of conformational and linear epitope regions was greater than or equal to the values for Msp5, one of the main surface proteins of A. marginale, known for its ability to induce antibody production during cattle infection [14]. The presence of these epitopes points to these molecules as agents capable of stimulating the development of a humoral immune response.
U. diversum can also behave as a facultative intracellular pathogen [15]. Thus, the possibility of UdLAMPs being processed and presented via MHCI can lead to activation of the cellular response. In this study, epitopes binding to bovine MHCI alleles were predicted in several UdLAMPs. Furthermore, 33 LAMPs had a number of binders equal to or greater than that of the T. parva Tp2 antigen in all studied alleles. Tp2 is recognized for stimulating CD8+ T cells during bovine T. parva infection [16]. The studied alleles represent cattle destined for the different livestock sectors: five alleles represent Bos taurus taurus (BoLA-6*01301, BoLA-2*01201, BoLA-3*00201, BoLA-1*02301, and BoLA-6*04101), two alleles represent Bos taurus indicus (BoLA-T5 and BoLA-3*00101), and one allele (BoLA-T2C) belongs to a hybrid [17]. Taurine breeds are predominantly found on dairy farms and Zebu cattle are mostly used for meat production [18]. Bovine hybrids are usually produced to combine the commercial and management characteristics of both subspecies [19]. In this case, our prediction data reveal that a considerable number of UdLAMPs can interact with MHCI alleles of cattle destined for different activities in the livestock sector, which may be reflected in activation or inactivation of the immune response.
The identity analysis of UdLAMPs with proteomes of other microorganisms capable of infecting cattle is a useful initial approach for studies aimed at using these antigens, or antibodies raised against them, in immunodetection tests. We found that the proteins gudiv_103, gudiv_159, gudiv_171, gudiv_228, gudiv_517, gudiv_546, gudiv_680, and gudiv_681 did not present significant identity with the proteins of other important Mollicutes that infect cattle. In contrast, 25 proteins showed an identity greater than 30%. According to Rost [20], above a cutoff of 30% identity, 90% of the pairs are homologous. Low identity between proteins of different infectious agents of the same host is related to good specificity in detection tests [21]. Thus, U. diversum proteins with low identity may represent specific targets for immunodiagnostic techniques for detecting this pathogen.

Fig. 1 Distribution of U. diversum antigens according to the prediction parameters. In red, the proteins included in the evaluated parameter and in white those not included. In undesirable parameters for use in prophylactic and immunodiagnostic measures, the relevant parameters for inducing the production of specific antibodies and positive immunomodulation are evaluated. In undesirable parameters for expression in E. coli, predicted parameters related to the production of stable, soluble, secreted proteins and with properties that facilitate the purification process after expression were evaluated.

In addition to the prediction of immunobiological properties, the prediction of properties favorable to expression in a heterologous system can contribute to the large-scale production of a target protein. Some physicochemical properties influence solubility, the formation of inclusion bodies, or proteolysis of the heterologous peptide [22]. In this study, the protein molecular weight ranged from 9.0 to 240.2 kDa. Proteins with a molecular weight between 60 and 70 kDa are well tolerated when E. coli is used as an expression system; however, proteins with a very high molecular weight are not adequately expressed in these bacteria and are therefore degraded or accumulate as inclusion bodies [23]. Small peptides (about 10 kDa) are also difficult to express in stable form because of improper folding, so they are often subject to proteolytic degradation [24].
Our analyses also showed that only gudiv_93, gudiv_159, gudiv_331, and gudiv_560 had an instability index greater than 40; all the others (with an index below 40) were therefore considered stable [25]. Most of the proteins were GRAVY negative, which is related to hydrophilicity [26]. Greater hydrophilicity implies a greater capacity to form hydrogen bonds with water molecules and, consequently, greater solubility [27]. Sixteen proteins were predicted to be soluble by the two predictors used in this work (SOLpro and Protein-Sol) and only two proteins had more than two predicted transmembrane loops. Transmembrane loops are hydrophobic regions that reduce solubility [28]. Expression in the soluble form is desirable because obtaining soluble proteins from insoluble forms inevitably requires a series of processing steps involving strong denaturants followed by renaturation [29]. Even so, these additional steps do not guarantee the production of soluble and functional proteins.
The presence of specific markers capable of directing heterologous peptides to the extracellular medium in an expression system also contributes to the subsequent steps in the production of recombinant proteins [23,30,31]. Here, we show that more than half of the studied proteins were predicted to possess an SP recognized by Sec/SPII and are consequently likely to be lipoproteins capable of being expressed and exported to the extracellular medium by E. coli. The presence of an SP for the classical secretory pathway, or of markers for secretion by a non-classical pathway, facilitates the transport and secretion of the recombinant protein into the extracellular compartment. Secretion into the extracellular medium simplifies purification processes, protects heterologous proteins from proteolysis, decreases endotoxin levels, and improves biological activity and solubility [32].

On the bottom is the total percentage of antigens that each strain carries in its genome (based on PCR results) and on the left is the percentage of strains carrying the coding sequence for each antigen individually.
Bacterial proteins with good properties both for stimulating the immune response and for cloning and expression in a heterologous system are desirable targets for biotechnology [30]. In this study, a filter with exclusion criteria based on the prediction data (Fig. 1) showed that gudiv_61, gudiv_103, gudiv_517, and gudiv_681 are the most promising UdLAMPs for immunobiological applications and for expression in E. coli as a heterologous system. However, the fact that an antigen does not meet all the requirements of Fig. 1 does not rule it out as a target for immunobiological studies or expression in a heterologous system. Depending on the type of analysis, proteins with good immunostimulatory properties but with properties that hinder expression in E. coli could be expressed in other expression systems [33], or even in E. coli, either through fusion with proteins (tags) that increase the size of the expressed product or improve solubility, or by reducing the growth temperature, using weak promoters, and using low concentrations of inducer [24]. Very large proteins, or those with many transmembrane loops, could be studied by producing multiepitope chimeric proteins [34]. Finally, there is also the possibility of using expression systems entirely in vitro [35]. However, these alternatives increase the costs of the process; therefore, including prediction in the planning stages of work that intends to express proteins can reduce project costs, in addition to providing a theoretical forecast of bench tests.
In this work, the PCR detection of 36 UdLAMPs in U. diversum isolates from different regions of Brazil warns of the potential damage to livestock that U. diversum can cause because, in addition to immunomodulation, studies suggest that LAMPs are involved in adherence, invasion, and cell apoptosis [2,7,15,36]. Strains representing the four evaluated states (Bahia, Minas Gerais, São Paulo, and Mato Grosso do Sul) presented proteins with interesting properties for immunological stimulation (Figs. 1 and 3). These data corroborate other studies showing that U. diversum induces variable immune responses in vivo and in vitro [7,37].

Conclusion
It was demonstrated that the U. diversum genome has CDS for molecules with potential for application in UdLAMPs studied were noted in the genome as UdLAMPs and that many of them have signaling of typical lipoprotein secretion. It is well described in the literature that Mollicutes have ingenious molecular mechanisms to change parts of these molecules; however, this initial study contributes to understanding the virulence factors of U. diversum and provides a series of data and approaches that can be used in studying these pathogens.

Methods
Access to genes and analysis of similarity with bovine proteomes

Mapping of B lymphocyte epitopes and antigenicity prediction
The CBTOPE v1.0 server (available at http://crdd.osdd.net/raghava/cbtope/) was used to predict discontinuous (conformational) B lymphocyte epitopes. A threshold of −0.3 was used, and on the probability scale (0-9) amino acids with values greater than four were considered part of conformational epitopes. This server uses a data set of non-redundant protein chains consisting of antibody-interacting residues of B cell epitopes [38]. To predict continuous epitopes, the primary protein sequences were analyzed with BepiPred v2.0 (http://www.cbs.dtu.dk/services/BepiPred/), a predictor trained only on epitope data derived from crystallographic structures held in its internal database. Amino acids with scores greater than 0.5 were considered part of linear B cell epitopes [13]. The protein sequences were also submitted to the VaxiJen v2.0 server (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html); this predictor classifies antigens without relying on sequence alignment. All proteins scoring above the threshold (0.5) were classified as antigenic.
The prediction of B cell epitopes and antigenicity was also performed for the Msp5 ESXA_MYCBO peptide from A. marginale accessed at NCBI under ID number AY527217.1.
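The two summary values reported for each protein in Results (number of antigenic regions and percentage of amino acids in antigenic regions) can be derived from per-residue scores as in the following minimal Python sketch. It assumes that a region is any uninterrupted run of residues scoring above the cutoff and that no minimum region length is imposed; neither detail is stated in the text, and the function names are hypothetical. The default cutoff of 0.5 is the linear-epitope threshold given above.

```python
from itertools import groupby


def epitope_regions(scores, threshold=0.5):
    """Return 1-based (start, end) pairs for runs of residues above threshold.

    Assumption: a region is any contiguous run, with no minimum length.
    """
    regions = []
    position = 0  # number of residues consumed so far
    for above, run in groupby(scores, key=lambda s: s > threshold):
        length = len(list(run))
        if above:
            regions.append((position + 1, position + length))
        position += length
    return regions


def epitope_coverage(scores, threshold=0.5):
    """Percentage of residues whose score exceeds the threshold."""
    n_above = sum(1 for s in scores if s > threshold)
    return 100.0 * n_above / len(scores)
```

For the conformational predictions, the same functions could be reused with the probability-scale cutoff described above (values greater than four) passed as the threshold.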
Mapping of CD8+ T lymphocyte epitopes and identity analysis with proteomes of other Mollicutes
The prediction of binding to MHCI, with peptide windows of 9 amino acids, was performed using the NetBoLApan v1.0 server, accessed at http://www.cbs.dtu.dk/services/NetBoLApan/. A standard threshold of 0.5% was used for strong binders and 2% for weak binders; finally, the numbers of strong and weak binders were added and expressed in absolute numbers. NetBoLApan v1.0 was trained on a dataset of peptides with binding affinity to BoLA molecules [39]. The alleles used in this study were BoLA