Multiparametric and semiquantitative scoring systems for the evaluation of mouse model histopathology - a systematic review

Background Histopathology was initially used, and is still used, to diagnose infectious, degenerative or neoplastic diseases in humans and animals. In addition to qualitative diagnoses, semiquantitative scoring of a lesion's magnitude on an ordinal scale is a task commonly demanded of histopathologists. Multiparametric, semiquantitative scoring systems for mouse model histopathology are a common approach to handle these questions and to include histopathologic information in biomedical research. Results Inclusion criteria were the first description of a multiparametric, semiquantitative scoring system and a comprehensible description of the approach used to evaluate morphologic lesions. A comprehensive literature search using these criteria identified 153 originally designed semiquantitative scoring systems for the analysis of morphologic changes in mouse models, covering almost all organ systems and a wide variety of disease models. Of these, colitis, experimental autoimmune encephalomyelitis, lupus nephritis and collagen-induced osteoarthritis were the disease models with the largest number of different scoring systems. Closer analysis of the identified scoring systems revealed that most studies gave neither a rationale for the selection of the scoring parameters nor a correlation between scoring parameter values and the magnitude of the clinical symptoms. Conclusion Although the decision for a particular scoring system clearly depends on the respective scientific question, this review gives an overview of currently available systems and may therefore allow for a better choice for the respective project.


Background
Histopathology was initially used, and is still used today, to diagnose infectious, degenerative or neoplastic diseases in humans and animals. These qualitative diagnoses are based on the sum of observable changes in the morphology of the analyzed tissue. The recognition of these changes relies on the pattern recognition of the observer and on the comparison of these patterns with the known physiologic variation in tissue morphology in the respective species. Decades of experience in veterinary pathology show that this approach allows for reproducible qualitative diagnoses by the observer but can also be used for semiquantitative scoring of a lesion's magnitude, i.e. on an ordinal scale, for instance with a low, medium or high grade trichotomy which correlates with the clinical relevance of the lesions.
Absolute quantification of a lesion's extent and severity, i.e. on a ratio scale with absolute values of 1, 2, 3 etc., is however difficult with standard, non-automated histopathology, for two main reasons. First, the detection method is not reliable enough. Despite intensive training and attempts to standardize nomenclature and the definition of lesions, there are still unresolved issues of interobserver variation, which may be acceptable for qualitative and semiquantitative evaluation but not for absolute quantitation [1]. Second, in most circumstances it is impossible to objectively justify the interval between two values, so a readout of histopathologic scoring on a ratio scale is impossible.
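The second problem, that the spacing between ordinal grades cannot be justified, can be illustrated with a minimal sketch. The grades, group sizes and the two numeric spacings below are hypothetical and are not taken from any of the reviewed studies. The sketch shows that a comparison of group means depends on the arbitrary spacing chosen for the grades, whereas a comparison of medians does not.

```python
from statistics import mean, median

# Hypothetical ordinal grades (0 = no lesion ... 3 = severe) for two groups.
group_a = [0, 0, 3]
group_b = [1, 1, 1]

# Two equally defensible numeric spacings for the same four grades.
# Both preserve the order of the grades; only the intervals differ.
STRETCH_TOP = {0: 0, 1: 1, 2: 2, 3: 10}
STRETCH_BOTTOM = {0: 0, 1: 5, 2: 6, 3: 7}


def relabel(grades, spacing):
    """Replace each ordinal grade by the number assigned to it in `spacing`."""
    return [spacing[g] for g in grades]


def a_exceeds_b(summary, spacing):
    """True if the summary statistic of group A is larger than that of group B."""
    return summary(relabel(group_a, spacing)) > summary(relabel(group_b, spacing))


for name, spacing in (("stretch top", STRETCH_TOP), ("stretch bottom", STRETCH_BOTTOM)):
    print(name,
          "| mean says A > B:", a_exceeds_b(mean, spacing),
          "| median says A > B:", a_exceeds_b(median, spacing))
```

With these assumed values the mean-based comparison reverses between the two spacings, while the median-based comparison stays the same, which is why ordinal scores are usually summarized and tested with rank-based methods.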
Image analysis with automated calculation of the affected tissue area or of the number of cells per area has been introduced to overcome this problem. These approaches aim at a reliable and reproducible histopathology readout on a ratio scale, to allow proper statistical processing, and at the exclusion of observer bias [2]. Image analysis approaches usually use one or only a few two-dimensional planar sections of the tissue of interest to measure three-dimensional objects. This two-dimensional approach may thus also lead to biased results. Stereology, which is based on systematic random sampling and estimates three-dimensional information, has been developed to avoid this bias [3]. It can therefore be seen as the most sophisticated method for the quantification of histologic information. It is however, by comparison, a laborious and complex method which is established in only a few laboratories.
Table 1 Semiquantitative scoring systems for murine intestinal disease models
Semiquantitative scoring systems are therefore still the most widely used methods to include histopathologic information in biomedical research. These scoring systems usually include multiple parameters which are separately quantified on an ordinal scale and finally combined into a total score. Average scores of the different experimental groups can then be compared by non-parametric statistical tests. The selection of the parameters should be based on the scientific hypothesis or question together with the current knowledge of the morphologic outcome of the investigated disease model. It may therefore be useful to design an individual scoring system for each study that answers the particular scientific question in the best possible way. Standard scoring systems for specific disease models, on the other hand, allow for the comparison of the results of different studies.
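The workflow described above, separate ordinal parameters combined into a total score and then compared between groups with a non-parametric test, can be sketched as follows. This is only an illustration: the parameter names, the 0-3 scale and all animal data are hypothetical, and the Mann-Whitney U statistic is used here as one common example of a rank-based test, not as the test prescribed by any of the reviewed studies. Only the U statistic is computed; a p-value would require an exact table or an approximation that is left out here.

```python
def total_score(parameter_scores):
    """Sum the ordinal scores of all parameters of one animal."""
    return sum(parameter_scores.values())


def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistics (U_x, U_y) of two samples."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical animals, each scored 0-3 for three assumed parameters.
control = [
    {"inflammation": 0, "epithelial_damage": 1, "affected_area": 0},
    {"inflammation": 0, "epithelial_damage": 0, "affected_area": 0},
    {"inflammation": 1, "epithelial_damage": 1, "affected_area": 0},
]
treated = [
    {"inflammation": 2, "epithelial_damage": 2, "affected_area": 1},
    {"inflammation": 3, "epithelial_damage": 2, "affected_area": 2},
    {"inflammation": 2, "epithelial_damage": 1, "affected_area": 1},
]

control_totals = [total_score(a) for a in control]
treated_totals = [total_score(a) for a in treated]
print(control_totals, treated_totals, mann_whitney_u(control_totals, treated_totals))
```

A U value of zero for one group means that every total score in that group is lower than every total score in the other group; how such a statistic is turned into a significance statement depends on the group sizes and the test variant chosen.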
Several standard scoring systems for different mouse models have been introduced or emerged in the past 20 years. Histopathologists are therefore repeatedly requested by cooperating scientists to evaluate the outcome of animal studies using standard scoring systems or to elaborate project specific scoring systems. The present review is intended to give a comprehensive overview on the currently most commonly used multiparametric, semiquantitative scoring systems for mouse model histopathology.

Scoring systems for murine intestinal disease models
Eighteen original scoring systems for colitis models could be identified. Most of these scoring systems were designed for dextran sodium sulfate (DSS)-induced colitis models, but 2,4,6-trinitrobenzene sulfonic acid (TNBS)-induced colitis and several models of immunopathologic colitis were also used to establish scoring systems (Table 1, Figure 1). Eight scoring systems for DSS-induced colitis fulfilled all required parameters and were included in this review. Generally, all of the papers with original DSS colitis scoring systems had a high citation rate, but the scoring system described by Cooper is one of the earliest systems, has the highest citation number to date and can therefore be seen as a prototype for DSS-colitis scoring [4]. It separates the colon into three segments which are then scored by the parameters of crypt loss, inflammation and affected area. Although later studies refined and increased the number of histopathologic parameters, the value of this initial study lies in the separation of the colon into three segments and in the sophisticated approach to the establishment of the scoring system. Remarkably, parameters in this study were chosen and tested according to their correlation with the clinical symptoms of the mice. This is in contrast to the vast majority of scoring systems presented in this review, which only rarely stated the rationale for choosing the included parameters and did not perform a correlation with the clinical symptoms.
When comparing all original colitis models it becomes obvious that a wide variety of appellations is used for the most common parameters, namely inflammation, crypt and surface changes (Table 1). These differences in the nomenclature make it difficult to directly compare the different scoring systems. Less often used parameters in the colitis scoring systems were goblet cell loss, regeneration, muscular and epithelial hyperplasia, edema and the separation between acute and chronic inflammation. In some cases these singularities of the respective scoring system seem to depend on the objectives of the study, while in most cases the rationale for the selection of the parameters was not given.
The number of identified scoring systems for small intestinal disease was considerably lower than for colitis models, and these systems had on average fewer citations (Table 1). Two independent scoring systems were identified for intestinal ischemia. Both include the comprehensible parameters of hyperemia and hemorrhage as well as inflammation and epithelial damage, and the more highly cited publication by Park et al. adds several other, more sophisticated parameters [23,24]. Only two scoring systems for small intestinal enteritis were detected, both of which included villous morphology, epithelial damage and inflammation as the main features (Table 1). Surprisingly, only one semiquantitative scoring system for gastritis was identified. Wang et al. scored the severity of Helicobacter-induced gastritis in a uniparametric scoring of five gastric areas. For the sake of completeness this study was included in this review although it did not fulfill the criterion of multiple parameters [27].

Scoring systems for murine osteoarthritis models
Seventeen original semiquantitative, multiparametric scoring systems were identified for murine osteoarthritis models (Table 2). Three of these, designated as osteoarthritis in Table 2, are scoring systems for human idiopathic arthritis and were transferred to the murine model to allow for comparisons of the model with the human disease. Of these, the score developed by Mankin et al. has by far the highest citation number which is most
Table 2 Semiquantitative scoring systems for murine osteoarthritis models
Arthritis [30] Cartilage destruction (0-6), optional subgrading (subdivision in 2 subgrades) 197
Arthritis [31] Synovial lining, resident cell density, inflammation (each 0-3) 49
Collagen-induced [29] Extent of synovitis, cartilage loss, bone erosions (together 0-3) 713
Collagen-induced [32] Inflammation, cartilage destruction, bone erosion (each 0-3) 273
Collagen-induced [33] Infiltration in the exudate, infiltration of the synovial membrane, cartilage destruction, bone erosion (each 0-3)

Scoring systems for murine renal disease models
Fourteen original multiparametric, semiquantitative scoring systems for murine models of renal diseases fulfilled the required criteria for inclusion in this review (Table 3). Murine lupus erythematosus models were the dominant models in the category of renal disease, with four scoring systems. Austin et al. published the lupus nephritis score with the highest citation number [48]. It uses a complex scoring system with 10 parameters and a scale width of four and five respectively and was thus more sophisticated than the other scoring systems, which used only four different parameters. Glomerular cellularity and proliferation were the most commonly used terms and appeared in all renal scoring systems except the one for an obstructive nephropathy model [61]. In addition, one half of the systems included tubulointerstitial infiltration and fibrosis in the scoring system.

Scoring systems for murine models of neurologic disease
Twenty-two original scoring systems for murine models of central nervous system (CNS) disease were identified (Table 4). Scoring systems for experimental autoimmune encephalomyelitis (EAE) and stroke clearly dominated the results. Due to the wide variety of diseases covered by these systems, the selection of parameters to be analyzed also varied widely and was clearly dependent on the pathophysiology of the disease. But again, the rationale for the inclusion of parameters was not consistently given.
A striking feature of CNS disease scoring systems was the relatively low number of ordinal scales for the parameters' magnitudes and the common inclusion of multiple anatomical sites in the scoring system (Table 4). This discrepancy is not addressed in the respective publications but may be based on the anatomical diversity of the CNS. Furthermore, the inclusion of absolute values like lesions/mm² occurred considerably more often in CNS disease scoring than for other organs, although the reason for this is not comprehensible in each case. Only one scoring system was identified for a disease of the peripheral nervous system; it was developed to evaluate peripheral nerve ischemia in a relatively simple two-tier system with a zero to four scale [66].
Table 3 Semiquantitative scoring systems for murine renal disease models
Renal disease model Scoring system: parameters (scale width) Citation
Lupus nephritis [48] Activity index (glomerular/tubulointerstitial abnormalities (6-tier, each 0-4)); chronicity index (4-tier, each 0-3)

Scoring systems for murine models of pulmonary diseases
Fourteen original semiquantitative, multiparametric scoring systems were identified for pulmonary diseases (Table 5). Of these, pulmonary fibrosis and pulmonary inflammation were the diseases with the highest number of scoring systems and citations. For instance, the scoring system developed by Ashcroft, which is a relatively simple multiparametric but single-scaled system, is commonly used for the evaluation of lung fibrosis [83]. Three scoring systems were developed for models of general acute lung inflammation (Table 5). They used edema and anatomical site-specific inflammation as parameters to evaluate the relative amount of inflammatory response. Similar parameters in a wide variety of combinations were used to develop scoring systems for diverse infectious pneumonia models. This variation is again in most cases not based on a reasoned argument for the inclusion of a certain parameter in a certain model and is therefore not in all cases clearly associated with the supposed pathogen-associated pathogenesis of the respective pneumonia.

Scoring systems for myocardial, vascular and muscular disease models
Three original scoring systems for the evaluation of viral myocarditis were identified [96][97][98]. All of them evaluated myocardial necrosis and inflammation (Table 6). In addition, two of them included calcification as a parameter, while fibrosis and Evans blue staining as a marker of myofiber damage were each used only once.
Focal ischemia [63] 18 areas x neuronal injury (0-5) 23
Global ischemia [64] Infarcts in 3 cerebral regions (0-4), hippocampus infarction (0-4) 100
Global ischemia [65] Eight regions x neuronal cell loss/gliosis/iron deposition/gliosis (0-3) 95
Peripheral nerve ischemia [66] Edema, fiber regeneration (each 0-4) 21
Multiple sclerosis models
Miscellaneous CNS diseases
Oxidative damage [80] 7 areas x necrosis (0-3) 27
Senescence
Three semiquantitative scoring systems for the most important human vascular diseases were identified: atherosclerosis, aneurysms and vasculitis (Table 6) [101][102][103]. They all cover several aspects of the pathogenesis and pathophysiology of the diseases but have so far rarely been cited. The aneurysm scoring system grades the severity of the disease by the extent of the medial and adventitial lesions together with the general size of the lesion [102].
The atherosclerosis scoring system uses a 5-tier system with a 0-1 scale width [101], whereas the vasculitis score uses the parameters infiltration, elastic lamina destruction and intimal thickening, thus indicating that the system may only be useful for the evaluation of larger vessel types [103].
[104] while the other scoring system was developed to quantify the extent of ischemia-induced muscle necrosis by the parameters necrosis, infiltration and hemorrhage [105].

Scoring systems for hepatic and pancreatic diseases
Ten original scoring systems for chronic hepatitis have been developed or used for the quantification of chronic hepatic disease (Table 7). The scoring systems by Ishak and Knodell are both highly cited and cover almost all possible histomorphologic changes in chronically inflamed livers [106,107]. The two identified scoring systems for acute hepatitis quantify lesions by grading the extent of inflammation and necrosis, similar to the Ishak system for chronic hepatitis [108,109]. Five scoring systems for the evaluation of acute pancreatitis have been identified (Table 7). The first and most commonly cited scoring system was published by Schmidt et al. [116]. It uses the five parameters edema, necrosis, inflammation, hemorrhage and fat necrosis to score the extent of pancreatic lesions. All four later scoring systems modified the parameters only marginally, by omitting a single parameter or including vacuolization as an additional marker (Table 7). In addition, one multiparametric and several uniparametric (data not shown) scoring systems were identified for the quantification of insulitis in mouse models. The scoring system by Papaccio et al. uses islet infiltration, atrophy and destruction as parameters for the evaluation of inflammation of the islets of Langerhans [122].

Scoring systems for skin and ocular diseases and miscellaneous disease models
Murine psoriasis models are the only skin disease model with more than one identified scoring system (Table 8). Both scoring systems offer a wide variety of parameters for the evaluation of epidermal and dermal changes in models of this relevant human disease [123,124]. In addition, scoring systems for dermal sclerosis, burn scars, atopic dermatitis and epithelial irritation were identified (Table 8).
Five original scoring systems for the evaluation of ocular diseases were identified. Two of these systems use only one parameter for the evaluation of autoimmune and endotoxin uveitis. For the sake of completeness these scoring systems are also displayed in Table 8, although they do not fulfill requirements for inclusion [133,134]. Furthermore, the identified scoring system for diabetic retinopathy uses two parameters evaluated in absolute numbers of leukocytes per area [130].
Three scoring systems for the evaluation of abdominal adhesions after traumatic or toxic irritation of the peritoneum could be identified. Interestingly, not all scoring systems for abdominal adhesions use fibrosis as a parameter for the grading of the adhesions [135][136][137]. Finally, very helpful scoring systems for the evaluation of embryonic development and wound healing could be identified (Table 8).
Table 7 Semiquantitative scoring systems for murine hepatic and pancreatic disease models
Disease model Scoring system: parameters (scale width) Citations
2,001
Chronic hepatitis [110] Mitotic activity, portal inflammation, ductular proliferation, Councilman bodies, fibrosis (each 0-3)

Scoring systems for systemic diseases and transplant rejection
Three original scoring systems analyzing lesions associated with graft-versus-host disease (GvHD) are available (Table 9). The most cited scoring system, by Hill et al., exclusively covers intestinal lesions associated with GvHD [145]. This system allows a very thorough analysis of intestinal lesions using a wide variety of parameters in the small and large intestine. The other two systems also cover intestinal lesions but provide additional parameters for the analysis of hepatic [146] or hepatic and skin lesions [147]. Three original scoring systems have been developed for the analysis of lesions associated with hemorrhagic shock (Table 9). The scoring system with the highest citation number focuses only on pulmonary lesions and offers a variety of parameters for the evaluation of shock-induced lesions in the lung [148]. The other two scoring systems also include parameters for lung evaluation, but both offer additional parameters for the quantification of intestinal changes [149,150], and one of them also offers scoring systems for renal and hepatic lesions [150] associated with hemorrhagic shock. The identified scoring system for the immunotoxicity of toxins more or less demands the analysis of all immune organs but gives a good guideline in terms of the nomenclature of morphologic changes associated with the toxin application (Table 9) [151]. Finally, the three scoring systems which have been used for murine transplant rejection models were all originally developed for the evaluation of tissues from human patients (Table 9). They are all used unmodified in mouse models to allow for better conclusions from the mouse models for the situation in the human patient.

Discussion
An extensive search of the literature identified 146 originally designed semiquantitative, multiparametric scoring systems for the histopathology of mouse models. These scoring systems cover almost all organ systems and a wide variety of disease models. Colitis, and especially ulcerative colitis, was the disease model with the largest number of different scoring systems, closely followed by experimental autoimmune encephalomyelitis (EAE), lupus nephritis and collagen-induced osteoarthritis.
The number of citations for the publications including the scoring systems ranged from a few citations up to 2176. The citation number clearly reflects the value of the scientific work shown in the papers and thus also indirectly reflects the quality of the included scoring systems. In some cases there is even clear evidence that the high citation number is directly based on the "gold standard" character of the scoring system and its regular use in mouse models or human tissues, for instance the score from Mankin [4,28,106,107]. Nevertheless, careful analysis of the publications also made it obvious that scoring systems in publications with a small number of citations proved to be useful for certain scientific questions.
Complete assessment of all lymphoid organs (each 0-4) 50
Assuming that the main function of scoring systems is the analysis of the influence of experimental factors on the microscopic tissue morphology, the selected parameters should be consciously chosen so that they are able to reflect the potential changes. The lack of a rationale for the selection of the parameters was therefore an emerging and surprising finding during the literature search for this study. Although the selection of parameters in most scoring systems is comprehensibly based on the common knowledge of the pathogenesis of the disease modeled in the mouse, there is only rarely a clear statement or a line of argument for choosing a parameter. Even less often is the correlation between the scoring parameter value and the magnitude of the clinical symptoms, or the differences in the extent of the experimental factor, given, as for instance in the excellent study of Cooper et al. [4]. This lack is most probably due to the time-consuming work involved, although such a correlation may tremendously increase the value and the scientific merit of a scoring system.
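The kind of check that is missing in most studies, a correlation between the histologic score and the magnitude of the clinical symptoms, can be sketched with a rank correlation. The following is a minimal illustration only: the score and symptom values are hypothetical, the clinical index is an assumed placeholder, and Spearman's rank correlation is chosen here because both variables are ordinal; it is not claimed to be the method used by Cooper et al. or any other reviewed study.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation, computed as Pearson correlation of midranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / (var_x * var_y) ** 0.5


# Hypothetical data: total histologic score and an assumed clinical index
# (for example a 0-4 symptom grade) for the same five animals.
histology_total = [2, 5, 7, 9, 4]
clinical_index = [1, 2, 3, 4, 2]
print(round(spearman_rho(histology_total, clinical_index), 3))
```

A coefficient close to 1 would support the choice of a parameter or total score as a reflection of clinical severity, whereas a coefficient near 0 would suggest that the parameter adds little for that purpose.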

Conclusion
In summary, a final judgment of the quality and the usefulness of the scoring systems presented was not an aim of this study and is after all most probably not possible since the value of a scoring system clearly depends on the scientific question, the underlying hypothesis, the model characteristics and the pathogenesis of the disease. This review may however give an overview on currently available scoring systems and may therefore allow for a better choice for the respective project.

Selection of scoring systems
The systematic review was prepared according to the PRISMA guidelines [157]. All items were considered and can be viewed in Additional file 1. Scoring systems were identified by a comprehensive PubMed search (http://www.ncbi.nlm.nih.gov/pubmed/) using a combination of the search terms "mouse", "score" and "histopathology". This led to the identification of 1479 publications by October 30, 2012 (Figure 1). Full text versions of all publications were obtained and analyzed for the description of multiparametric, semiquantitative scoring systems for the histopathology of mouse models. Inclusion of a mouse scoring system in this overview was based on the fulfillment of six criteria. First, the scoring system had to be based on the semiquantitative evaluation of histopathologic changes in murine tissues. Thus, approaches using digital image analysis for absolute quantification of lesion area, cell number or immunohistochemical signals, and scoring systems dominated by immunohistochemical markers as evaluation parameters, were not included.
Second, only scoring systems evaluating more than one histomorphologic parameter were included in the review. Nevertheless, scoring systems with high citation numbers which combined several parameters in a uniparametric score were also included. For instance, if a highly cited scoring system integrated the presence and extent of crypt abscesses, epithelial sloughing and submucosal infiltration into a single score of 0 to 4 the study was also included.
Third, the scoring approach had to be comprehensibly described to allow for reproduction by the reader.
Fourth, the scoring system had to be originally designed for the presented study without citation of former publications. If former publications were cited as the source of the scoring system, the string of citations was followed back to the study originally describing the scoring system. If scoring systems were not referenced to older studies but similar approaches were detected in earlier publications, only the older study was included in this review.
Fifth, the scoring systems were generally grouped by the organ affected and analyzed. Systemic diseases and transplantation models were included in separate groups. If the number of identified scoring systems for a specific disease model exceeded ten, only the ten most cited scoring systems were included in this review.
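The selection rules described above can be summarized in a short sketch. It is a simplified illustration under stated assumptions, not the procedure actually used for this review: the record fields, the class name `ScoringSystem` and the function names are hypothetical, the boolean flags stand for judgments that were made by reading the full text, and tracing citation chains back to the original description is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringSystem:
    """Hypothetical record for one candidate scoring system."""
    reference: str
    disease_model: str
    n_parameters: int
    citations: int
    semiquantitative: bool   # criterion 1: no image-analysis or IHC-dominated scores
    reproducible: bool       # criterion 3: comprehensibly described
    original: bool           # criterion 4: first description of the system
    highly_cited_composite: bool = False  # exception to criterion 2


def eligible(system):
    """Apply the per-system inclusion criteria."""
    multiparametric = system.n_parameters > 1 or system.highly_cited_composite
    return (system.semiquantitative and multiparametric
            and system.reproducible and system.original)


def select(systems, cap=10):
    """Group eligible systems by disease model and keep the `cap` most cited."""
    by_model = defaultdict(list)
    for system in systems:
        if eligible(system):
            by_model[system.disease_model].append(system)
    return {
        model: sorted(group, key=lambda s: s.citations, reverse=True)[:cap]
        for model, group in by_model.items()
    }


# Hypothetical candidates: twelve eligible colitis systems and one
# uniparametric system that is not a highly cited composite score.
candidates = [
    ScoringSystem(f"ref{i}", "colitis", 3, i, True, True, True)
    for i in range(1, 13)
]
candidates.append(ScoringSystem("ref13", "gastritis", 1, 40, True, True, True))

selected = select(candidates)
print({model: [s.reference for s in group] for model, group in selected.items()})
```

In this sketch the cap is applied per disease model after the eligibility filter, so a model with ten or fewer eligible systems keeps all of them.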

Competing interests
The author declares that he has no competing interests.

Author's contributions
RK had the idea of the project, performed the internet search and the data analysis, and wrote the abstract.