Ulcerative Colitis Is Under Dual (Mitochondrial and Nuclear) Genetic Control

Background: Cellular oxidative stress and genetic susceptibility have been implicated in the multifactorial etiology of ulcerative colitis (UC). The association of the nuclear genome with UC has been intensively investigated, but the role of mitochondrial DNA (mtDNA) has received far less attention and may account for part of the missing heritability. This study is a comprehensive analysis of the mtDNA contribution to UC susceptibility. Methods: The association of mitochondrial single-nucleotide polymorphisms (mtSNPs) and haplogroups with UC was tested in 488 cases and 833 controls of European ancestry from the NIDDK IBD Genetics Consortium Ulcerative Colitis Genome-Wide Association Study, available through dbGaP, and from the Illumina Genotype Control Database (studies 64 and 65). Results: No evidence of population stratification was detected using 218 ancestry informative markers for European Americans. Seven of the 58 tested mtSNPs were nominally associated with UC, and A10550G in MT-ND4L withstood the Bonferroni correction (P = 1.29E-06, odds ratio for the G allele [95% confidence interval (CI)] = 4.80 [2.54–9.05]; 10550G allele in 8.1% of patients and 1.9% of controls). A10550G remained equally associated after conditional analyses on the 11 top SNPs of the UC genome-wide association study (GWAS) (6.35E-07 < Pcond < 4.58E-06), which suggests that it constitutes a risk factor independent of nuclear-encoded susceptibility loci. We detected additive (but not multiplicative) epistatic interactions between A10550G and all 11 top GWAS hits. Subhaplogroup K1 increased the risk of UC (P = 0.021, OR [95% CI] = 1.71 [1.08–2.69]), whereas the U5b lineage conferred protection (P = 0.016, OR [95% CI] = 0.34 [0.14–0.82]). Conclusions: These results suggest that UC is under dual mitochondrial and nuclear genetic control, a finding that warrants replication in independent data sets and reinforces the etiopathogenic complexity of the disease.

Crohn's disease and ulcerative colitis (UC) are the most prevalent forms of inflammatory bowel disease (IBD), affecting more than 2.5 million people of European ancestry. 1 UC is a chronic relapsing inflammatory condition characterized by mucosal ulcers in the rectum and colon. Several key players have been implicated in its multifactorial etiology, including environmental factors (e.g., smoking, diet, drugs, geographical and social status, stress), commensal microbiota, epithelial barrier dysfunction, innate and adaptive immune responses, cellular oxidative stress, and genetic susceptibility. [2][3][4][5] Genome-wide association studies (GWAS) and meta-analyses have provided the most rapid novel insights into UC pathogenesis. Among the 163 IBD-associated loci, 110 are shared between UC and Crohn's disease, and the 23 UC-specific loci are primarily immune response mediators (IL10, IL23R, IL26, and MHC genes on chromosome 6p21) and genes involved in epithelial barrier function (ECM1, CDH1, HNF4A, and LAMB1). 5 However, these genetic risk factors collectively explain only approximately 20% to 25% of the heritability 6 and less than 10% of the total disease variance, 5 which suggests that additional genetic contributors (e.g., rare variants, the extranuclear genome) remain to be identified.
Hypothesis-free GWAS have pinpointed common genetic risk factors in the nuclear genome, but the association of the second genome, the mitochondrial DNA (mtDNA), and of the interaction between the two genomes has received far less attention. mtDNA variation has been associated with non-Mendelian, nonmaternally inherited, complex autoimmune 7 and inflammatory 8 disorders. Although mitochondrial single-nucleotide polymorphisms (mtSNPs) are represented on most genotyping platforms used in GWAS, their analysis has been neglected or underreported. 5 Mitochondria are the main intracellular source of reactive oxygen species (ROS) because their primary function is ATP production through oxidative phosphorylation in the electron transport chain. High levels of ROS and oxidative stress have been implicated in the pathophysiology of IBD, 3,4 and the Krebs cycle and oxidative phosphorylation are among the few pathways downregulated in active UC biopsies. 9 These organelles also play a central role in intracellular signaling, apoptosis, and metabolic pathways such as the tricarboxylic acid cycle and the metabolism of amino acids, lipids, cholesterol, and steroids.
The human mtDNA is a haploid, nonrecombining circular genome of approximately 16,600 nucleotides encoding 13 electron transport chain polypeptides, 22 transfer RNAs, and 2 ribosomal RNAs. Particular combinations of mtSNPs define haplogroups and their subclades, which tend to be associated with broad geographic areas and/or populations. 10 Although mtSNPs are presumed to be selectively neutral, those of phylogenetic relevance, and especially those in the mtDNA coding region, may play a functional role in the expression of some complex traits. mtDNA polymorphisms and haplogroups may act independently of or synergistically with nuclear genetic factors, and interactions between mtDNA-encoded and nuclear-encoded genes may explain part of the unexplained genetic variance in UC.
To determine whether the mitochondrial genome may account for part of the missing heritability in UC, we tested the association of mtSNPs and haplogroups in 1321 UC cases and healthy controls of European ancestry from the NIDDK IBD Genetics Consortium Ulcerative Colitis (NIDDK IBD UC) GWAS, 11 available through the NCBI Database of Genotypes and Phenotypes (dbGaP), 12 and from Illumina Genotype Control Database (iControlDB) studies, controlling for population stratification through the analysis of ancestry informative markers (AIMs).

Study Subjects and Genotyping
The criteria and procedures for the recruitment of patients with UC are described in detail elsewhere. 11 Briefly, cases of self-reported white, non-Hispanic European ancestry were ascertained at North American institutions and were selected to have either left-sided or extensive UC (i.e., patients with proctitis only were excluded). Genotyping was performed at the Feinstein Institute for Medical Research with Illumina HumanHap550v3 Genotyping BeadChips (methodology detailed in Silverberg et al 11). The phenotype and genotype data investigated in this study are deposited in the Database of Genotypes and Phenotypes (dbGaP). 12

Population Stratification Assessment
Among the 300 AIMs described as most informative for discriminating northwest and southeast European ancestries among European Americans, which may be used to correct for population substructure (http://genepath.med.harvard.edu/~reich/EUROSNP.htm), 13 156 were directly genotyped in the UC data set because they are included on the Illumina HumanHap550v3 microarrays. Sixty-two ungenotyped AIMs were replaced by a proxy SNP available on the array, identified with the SNP Annotation and Proxy Search (SNAP) tool (http://www.broadinstitute.org/mpg/snap/) 14 (r² = 1 in the CEU population panel and the 1000 Genomes Pilot 1 SNP data set, maximum distance 500 kb), and the remaining 82 AIMs could not be confidently substituted. Of the original 300 AIMs, a panel of 218 AIMs (156 genotyped and 62 proxies; see Table, Supplemental Digital Content 1) was therefore used to assess population stratification. 15 FST values indicate the proportion of the total genetic variance that is attributable to differences between populations rather than to differences between individuals of the same population. This measure of genetic structure ranges from 0 (no genetic differentiation between population units) to 1 (complete differentiation). The λ inflation factor indicates population stratification when λ > 1.000 and is used to correct chi-square association tests when stratification exists. 16
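The text does not state how λ was computed. A common definition is the genomic-control ratio of the median observed 1-degree-of-freedom association chi-square to the median expected under the null (about 0.4549). The sketch below assumes that definition; the function names and the example statistics are hypothetical, not the authors' pipeline.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom:
# the square of the standard normal 75th percentile (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_stats):
    """Genomic-control lambda: observed median chi-square / median expected under the null."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Return (lambda, statistics); statistics are divided by lambda only when lambda > 1."""
    lam = inflation_factor(chi2_stats)
    if lam <= 1.0:
        return lam, list(chi2_stats)
    return lam, [x / lam for x in chi2_stats]


# Usage with hypothetical 1-df chi-square statistics from an association scan
lam, adjusted = genomic_control([0.2, 0.9, 2.0, 3.1, 0.05])
```

With λ = 1.000, as reported for this data set, the correction leaves the statistics unchanged, which is why no ancestry adjustment is needed downstream.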

mtDNA Haplogroup Classification
Given the nonrecombining nature of mtDNA, haplogroups are defined by the haplotypic combination of multiple mtSNPs throughout the molecule rather than by the status of any single point mutation. Among the genotyped mtSNPs, and considering the updated human mtDNA phylogeny in PhyloTree (build 16; http://phylotree.org/), 17 we selected the SNPs of phylogenetic relevance to classify our samples into the haplogroups most prevalent in the North American population of European ancestry: West Eurasian haplogroups R0/HV, H, V, J, T, U, K, N1, I, and W; East Eurasian and Native American haplogroups A, B2, C, and D4; Near Eastern and Central Asian haplogroups M and Z; and African haplogroup L. Clade nomenclature follows that proposed by Torroni et al, 18 Macaulay et al, 19 and Richards et al. 20 Sixteen samples were excluded from the mtDNA haplogroup analysis because they could not be classified within the above-mentioned haplogroups.
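Because a haplogroup is defined by a combination of mtSNPs along a tree, classification can be pictured as a top-down walk that enters a child clade when its defining markers carry the derived allele. The sketch below illustrates this idea only. The tree fragment is hypothetical and heavily simplified, built from markers named in this article (for example A10550G, A7768G, A5656G, and the 10398G reversion in K1); it is not the PhyloTree build 16 topology. The rule of skipping missing calls is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A clade defined by (mtDNA position, derived allele) markers."""
    name: str
    markers: list
    children: list = field(default_factory=list)


def node_matches(node, calls):
    """True if at least one defining marker is called and every called marker is derived.
    Missing calls (absent key or None) are skipped, so one missing SNP does not block a clade."""
    called = [(pos, allele) for pos, allele in node.markers if calls.get(pos) is not None]
    return bool(called) and all(calls[pos] == allele for pos, allele in called)


def classify(root, calls):
    """Descend from the root into the first matching child; return the deepest clade reached."""
    node = root
    while True:
        nxt = next((c for c in node.children if node_matches(c, calls)), None)
        if nxt is None:
            return node.name
        node = nxt


# Hypothetical, simplified tree fragment (NOT PhyloTree): for illustration only.
TREE = Node("unclassified", [], [
    Node("U", [(11467, "G"), (12308, "G"), (12372, "A")], [
        Node("U5b", [(7768, "G")], [Node("U5b1", [(5656, "G")])]),
        Node("K", [(9698, "C"), (3480, "G"), (10550, "G"), (14798, "C")], [
            Node("K1", [(10398, "G")]),
        ]),
    ]),
])

# Usage: a hypothetical K sample with a missing call at 10550 and the 10398A allele
sample = {11467: "G", 12308: "G", 12372: "A", 9698: "C", 3480: "G",
          10550: None, 14798: "C", 10398: "A"}
label = classify(TREE, sample)
```

Skipping missing calls mirrors the situation described later in the text, where individuals were assigned to K from other defining SNPs despite missing calls at A3480G or A10550G.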

Association Analyses
For the mtSNP analysis, quality filtering was applied to retain only mtSNPs with a call rate ≥ 95% and a minor allele frequency (MAF) ≥ 1%. Heterozygous genotype calls were treated as missing. For haplogroup classification, all available mtSNP genotypes were used, because rare variants may be of phylogenetic relevance. For instance, the A5656G mtSNP did not pass quality filtering, but its single occurrence in the haplogroup U phylogeny in PhyloTree proved useful for classifying U5b1 samples (further analyzed within the U5b cluster).
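The filtering rules above can be sketched as follows. This assumes calls arrive as two-letter genotype strings (for example "AA", "AG") or None, and that heterozygous calls are set to missing before the call rate is computed; the source does not state the order of these steps, and the function names and toy data are hypothetical.

```python
from collections import Counter


def to_haploid(calls):
    """Map diploid-style calls ('AA', 'AG', None) to haploid alleles; heterozygous -> missing."""
    return [g[0] if g is not None and len(g) == 2 and g[0] == g[1] else None for g in calls]


def passes_qc(calls, min_call_rate=0.95, min_maf=0.01):
    """Keep an mtSNP only if call rate >= min_call_rate and MAF >= min_maf."""
    alleles = [a for a in to_haploid(calls) if a is not None]
    if not calls or not alleles:
        return False
    if len(alleles) / len(calls) < min_call_rate:
        return False
    counts = Counter(alleles)
    # Monomorphic SNPs have MAF 0 and are dropped.
    maf = min(counts.values()) / len(alleles) if len(counts) > 1 else 0.0
    return maf >= min_maf


def filter_mtsnps(genotypes):
    """genotypes: dict mapping mtSNP name -> list of per-sample calls."""
    return [name for name, calls in genotypes.items() if passes_qc(calls)]


# Hypothetical toy data for 100 samples
genotypes = {
    "mtSNP_1": ["AA"] * 90 + ["GG"] * 8 + ["AG"] + [None],       # call rate 0.98, kept
    "mtSNP_2": ["AA"] * 90 + ["GG"] * 4 + ["AG"] * 4 + [None] * 2,  # call rate 0.94, dropped
    "mtSNP_3": ["AA"] * 100,                                       # monomorphic, dropped
}
kept = filter_mtsnps(genotypes)
```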
For the haplogroup analyses, each haplogroup was compared with all others pooled together. Rare haplogroups (<2%; A, B2, C, D4, N1, L, M, and Z) were grouped in an "others" category, which was not tested for association because pooling these clades has no phylogeographic rationale. To adjust the association analyses for relevant confounding factors, sex was included as a covariate in a binomial logistic regression (log-additive model) computed with R version 3.1.2 (free software). 21 Pairwise conditional analyses and additive and epistatic interactions between mtSNPs and top nuclear GWAS SNPs were also tested with sex-adjusted logistic regression (log-additive model) in R.
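The analyses were run in R; a model of this kind would typically be specified there as `glm(status ~ genotype + sex, family = binomial)`. As a language-neutral illustration only, the pure-Python sketch below fits the same log-additive logistic model by Newton-Raphson and reports the per-allele odds ratio with a Wald 95% CI. It assumes the mtSNP is coded as 0/1 (haploid allele count) and sex as 0/1; the helper names and toy counts are hypothetical and are not the authors' code or data.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, max_iter=50, tol=1e-10):
    """Newton-Raphson maximum likelihood; each row already includes an intercept column."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]  # observed information X'WX
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * xj for b, xj in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, hess


def sex_adjusted_or(allele_counts, sex, status):
    """logit P(UC) = b0 + b1 * allele_count + b2 * sex; returns exp(b1) and its Wald 95% CI."""
    rows = [[1.0, float(g), float(s)] for g, s in zip(allele_counts, sex)]
    beta, hess = fit_logistic(rows, status)
    se = math.sqrt(_solve(hess, [0.0, 1.0, 0.0])[1])  # diagonal of the inverse information
    return math.exp(beta[1]), math.exp(beta[1] - 1.96 * se), math.exp(beta[1] + 1.96 * se)


# Hypothetical toy counts (allele count, sex, cases, controls) built so that the
# per-allele OR is exactly 2 and the sex OR is exactly 3.
toy = [(0, 0, 10, 10), (1, 0, 20, 10), (0, 1, 30, 10), (1, 1, 60, 10)]
geno, sex, status = [], [], []
for g, s, n_case, n_ctrl in toy:
    geno += [g] * (n_case + n_ctrl)
    sex += [s] * (n_case + n_ctrl)
    status += [1] * n_case + [0] * n_ctrl
or_g, ci_low, ci_high = sex_adjusted_or(geno, sex, status)
```

Because mtDNA is haploid, the log-additive coding reduces to a carrier versus noncarrier comparison for each mtSNP; the same model with a haplogroup indicator in place of the genotype gives the "each haplogroup versus all others" test.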
Results were considered nominally significant at the conventional level of P < 0.05. Because some mtSNPs are in linkage disequilibrium and the haplogroup comparisons are not independent, no correction for multiple testing was applied to either the SNP or the haplogroup association analyses, and uncorrected P values are reported.
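The abstract nevertheless states that A10550G withstands the Bonferroni correction. Assuming the correction is taken over the 58 mtSNPs that passed quality control (the number of tests is not stated explicitly), the per-test threshold is 0.05/58, about 8.6E-04. The check below uses uncorrected P values reported in this article; the variable names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


N_MTSNPS_TESTED = 58  # assumption: mtSNPs passing quality control = number of tests
threshold = bonferroni_threshold(0.05, N_MTSNPS_TESTED)

# Uncorrected P values reported in the text
reported_p = {"A10550G": 1.29e-06, "A3480G": 0.030, "A7768G": 1.03e-02}
survivors = [snp for snp, p in reported_p.items() if p < threshold]
```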

ETHICAL CONSIDERATIONS
This study was approved by the ethics committee of the Centro Hospitalar Lisboa Norte, E.P.E./Faculdade de Medicina de Lisboa, Lisboa, Portugal.

Characterization of the Dataset
The NIDDK IBD UC GWAS 11 data available through dbGaP (study accession number phs000345.v1.p1) consist of phenotypes and genotypes of 488 patients obtained with Illumina HumanHap550v3 Genotyping BeadChips (which include 156 mtSNPs), and these are the focus of the work presented here. The remaining data for 540 UC cases genotyped on Illumina HumanHap300v2 BeadChips could not be investigated here because mtSNPs are not represented on these arrays. Control data (N = 833) were obtained from the Illumina iControlDB database. No information was available regarding the age at examination (AAE) of the cases, but the mean ± SD AAE of the controls was 48.2 ± 11.3 years (n = 798). The 2 groups differ significantly in sex distribution (49.2% and 18.0% males among cases and controls, respectively; P = 4.09E-33), and because UC seems to occur more frequently in women than in men, 1 the association analyses reported in this study are adjusted for sex (very similar results were obtained in unadjusted association tests; data not shown).
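The test used for the sex comparison is not named. As a plausibility check, a Pearson chi-square test without continuity correction on counts reconstructed by rounding the reported percentages (an assumption: about 240 of 488 cases and 150 of 833 controls are male) gives a statistic of roughly 143.7 and a P value of the same order of magnitude as the reported 4.09E-33.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]] and its 1-df P value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of a chi-square with 1 df
    return chi2, p


# Counts reconstructed from reported percentages (assumption, not source data):
# 49.2% of 488 cases ~ 240 males; 18.0% of 833 controls ~ 150 males.
cases_m, cases_f = 240, 488 - 240
ctrls_m, ctrls_f = 150, 833 - 150
chi2, p = chi2_2x2(cases_m, cases_f, ctrls_m, ctrls_f)
```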

Population Stratification Assessment
Population stratification is particularly problematic in mtDNA studies owing to the smaller effective population size of mtDNA compared with autosomal markers. 22 Because we investigated the mtDNA association in part of the data set used in the NIDDK IBD UC GWAS, we assessed whether hidden ancestry and/or geographic substructure was present in this subset of samples using genotypes at 218 biallelic AIMs (see Table, Supplemental Digital Content 1, http://links.lww.com/IBD/B189).
To validate the intercontinental resolution of these 218 AIMs, we performed a principal component analysis (PCA) including patients with UC and controls plus HapMap CEU, CHB, JPT, and YRI individuals (Fig. 1A). Figure 1B depicts the top 2 principal components of the PCA performed with patients with UC and controls only, which shows in greater detail that these 2 groups appear to overlap, with no major outliers. Analysis of variance for population differences along the eigenvectors demonstrated that UC cases and controls did not differ with regard to the first 2 principal components, nor in the overall PCA test (P = 0.810). The inflation factor (λ = 1.000) and the FST index (FST between patients with UC and controls = 0.000) support the absence of population stratification and of genetic differentiation, respectively, between these subsets; therefore, adjustment for ancestry in association tests was not required.
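The FST estimator used is not specified. One simple estimator is Nei's GST, which compares total expected heterozygosity (H_T) with mean within-population heterozygosity (H_S); identical allele frequencies in cases and controls give 0, fixed differences give 1. The sketch below assumes two equally weighted groups and no sample-size correction (estimators such as Weir and Cockerham's are often preferred in practice); the function names and allele frequencies are hypothetical.

```python
def heterozygosities(p1, p2):
    """Expected heterozygosities at one biallelic locus for two equally weighted populations.
    p1, p2: frequency of the same allele in population 1 and population 2."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                            # total
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population
    return h_t, h_s


def gst(freqs_pop1, freqs_pop2):
    """Multilocus Nei's G_ST as a ratio of sums: sum(H_T - H_S) / sum(H_T)."""
    num = den = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        h_t, h_s = heterozygosities(p1, p2)
        num += h_t - h_s
        den += h_t
    return 0.0 if den == 0.0 else num / den


# Hypothetical allele frequencies at three AIMs in cases and in controls
cases = [0.31, 0.52, 0.12]
controls = [0.30, 0.50, 0.13]
estimate = gst(cases, controls)
```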
Of the 156 mtSNPs genotyped in our data set, 58 passed quality control and were further analyzed. The mean genotype call rate was 98.1% in patients and 97.6% in controls. Table 1 shows the association results for the mtSNPs nominally associated with UC risk, analyzed using sex-adjusted logistic regression (log-additive model), and further results are given in Supplemental Digital Content 3 (see Table, …). Seven of the 58 tested mtSNPs were nominally associated with UC, and A10550G in MT-ND4L showed the strongest association (P = 1.29E-06, OR [95% CI] = 4.80 [2.54–9.05]). This mtSNP remained equally associated (6.35E-07 ≤ Pcond ≤ 4.58E-06) after sex-adjusted pairwise conditional analyses on the 11 top NIDDK IBD UC GWAS SNPs (see Table, Supplemental Digital Content 4, http://links.lww.com/IBD/B192). Conversely, the association of the top UC GWAS SNPs was not affected (1.09E-05 ≤ Pcond ≤ 0.033) by conditioning on A10550G (see Table, Supplemental Digital Content 4, http://links.lww.com/IBD/B192), which suggests that the mtSNP A10550G constitutes a risk factor independent of nuclear-encoded susceptibility loci.
Because the interplay between multiple independent genetic variants may be important in conferring susceptibility to UC, we tested whether the A10550G mtSNP genetically interacts with each of the top 11 nuclear-encoded GWAS SNPs. Epistasis was explored using an additive interaction model, which reflects multiple genes acting in parallel, and a multiplicative interaction model, which addresses synergistic effects among genes. Additive effects were tested using sex-adjusted logistic regression on a score ranging from 0 to 3, defined as the sum of the number of risk alleles each individual carries at A10550G (0 or 1 G allele, because mtDNA is haploid) and at the nuclear SNP being tested (0 to 2). We detected additive interactions between A10550G and each of the top 11 nuclear SNPs, because the ORs were greater and the P values lower in the additive model (see Table; Table, Supplemental Digital Content 5, http://links.lww.com/IBD/B193). For this particular SNP, the numbers (and percentages) of controls and cases with 0, 1, 2, and 3 risk alleles were 695 (88.3%), 88 (11.2%), 4 (0.5%), and 0 (0%), and 349 (74.1%), 114 (24.2%), 8 (1.7%), and 0 (0%), respectively. As tested using a logistic regression model with an interaction term (log odds = β0 + β1·A10550G + β2·SNP + β3·(A10550G × SNP) + β4·sex), no multiplicative effects (Pmult ≥ 0.395) were detected between the A10550G mtSNP and any of the top 11 GWAS SNPs (see Table, Supplemental Digital Content 5, http://links.lww.com/IBD/B193).
Figure 2 depicts the simplified phylogenetic tree used for haplogroup classification of individuals in our data set. The most common European mtDNA haplogroups were observed among UC cases and controls, and their frequencies in the control data set agree with those previously reported for European populations. 18 Table 2 shows the sex-adjusted haplogroup association results with UC.
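The additive-interaction score reported above for A10550G and the nuclear SNP can be tabulated as follows. The function names are hypothetical; the counts are those reported in the text (787 controls and 471 cases with complete data for this SNP pair), and the computed percentages reproduce the reported ones. In the actual analysis this score was then entered as the predictor in a sex-adjusted logistic regression; this sketch covers only the score definition and its distribution.

```python
def additive_score(mt_g_alleles, nuclear_risk_alleles):
    """Additive epistasis score: G alleles at A10550G (0 or 1, haploid mtDNA)
    plus risk alleles at the nuclear SNP (0, 1, or 2); ranges from 0 to 3."""
    if mt_g_alleles not in (0, 1) or nuclear_risk_alleles not in (0, 1, 2):
        raise ValueError("allele counts out of range")
    return mt_g_alleles + nuclear_risk_alleles


def score_distribution(counts):
    """Percentages (one decimal) of individuals at each score value 0..3."""
    total = sum(counts)
    return [round(100.0 * c / total, 1) for c in counts]


# Counts reported in the text for scores 0, 1, 2, 3
controls = [695, 88, 4, 0]
cases = [349, 114, 8, 0]
controls_pct = score_distribution(controls)
cases_pct = score_distribution(cases)
```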
Unlike other subclades, U5a, U5b, and K1 could be tested for association, because missing calls did not prevent U and K individuals, respectively, from being further subclassified. The K1 subcluster was significantly more frequent in patients with UC than in controls (8.8% versus 5.6%; P = 0.021, OR [95% CI] = 1.71 [1.08–2.69]). Additionally, the U5b lineage seems to confer protection against UC (1.4% in patients with UC versus 3.6% in controls; P = 0.016, OR [95% CI] = 0.34 [0.14–0.82]), consistent with the association of its defining A7768G mtSNP (P = 1.03E-02, OR [95% CI] = 0.30 [0.12–0.75]). The K haplogroup was only marginally associated (P = 0.051) despite the nominal (P = 0.030) and strong (P = 1.29E-06) associations of its defining mtSNPs A3480G and A10550G, respectively. Because haplogroups are defined by a particular combination of alleles at several defining SNPs, this apparent inconsistency may result from the joint effect of the other haplogroup-defining SNPs on top of A3480G and A10550G, and/or from the protective 10398A allele in haplogroup K and its reversion to 10398G in K1 (Fig. 2). Furthermore, it could be partially explained by the larger number of control individuals assigned to haplogroup K (58/833) than carrying the 3480G (53/819) or 10550G (15/787) alleles, because of missing calls (Tables 1 and 2). Some of the 58 control individuals in haplogroup K had missing calls at A3480G or A10550G, but all of them were classified in haplogroup K because their genotypes at other SNPs (e.g., A11467G, A12308G, G12372A, T9698C, and T14798C) supported this classification.
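For orientation, crude (unadjusted) odds ratios can be recomputed from the carrier frequencies reported in this article. They land near, but not exactly on, the reported estimates, which is expected because the published ORs are sex-adjusted and were computed on the non-missing samples for each test. The function name is hypothetical.

```python
def odds_ratio_from_frequencies(freq_cases, freq_controls):
    """Crude OR for carrying a haplogroup or allele, from carrier frequencies in [0, 1)."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Reported carrier frequencies (cases, controls); sex-adjusted ORs in comments
reported = {
    "K1": (0.088, 0.056),      # adjusted OR 1.71
    "U5b": (0.014, 0.036),     # adjusted OR 0.34
    "10550G": (0.081, 0.019),  # adjusted OR 4.80
}
crude = {name: odds_ratio_from_frequencies(*f) for name, f in reported.items()}
```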

CONCLUSIONS
Because the colon of patients with UC is characterized by an energy-deficient state and elevated ROS, 3,4,9 mitochondria may contribute to the pathogenic process through their pivotal role in cellular energy metabolism and ROS generation. Given that ATP depletion compromises the regeneration and barrier function of the colonic mucosa 23,24 and that high ROS levels help sustain chronic inflammation, certain mtSNPs and haplogroups may influence susceptibility to UC. We specifically tested whether the mitochondrial genome contributes to the missing heritability of UC in a subset of samples with no detectable population stratification and in which previously reported top GWAS findings were validated. The chief finding of this association study was that the A10550G variant in MT-ND4L is associated with an approximately 4.8-fold increase in the odds of UC, which is larger than the effect of any single nuclear-encoded variant associated with UC thus far. This supports the notion that UC is under dual (nuclear and mitochondrial) genetic control.
MT-ND4L encodes a core subunit of respiratory chain NADH dehydrogenase (complex I) that is believed to belong to the minimal assembly required for electron transfer from NADH to ubiquinone. Reduced activity of mitochondrial respiratory chain complex II has been observed in the colonic mucosa of patients with UC and in mouse models of induced colitis, whereas the activity of the other complexes has not been consistently described as altered. 3,25,26 Synonymous SNPs such as A10550G (Met27Met) are assumed not to affect protein function because the amino acid sequence is unchanged. However, naturally occurring synonymous codon substitutions in the nuclear genome have been shown to alter in vivo translation kinetics and thus protein folding and function. 27 Nonsynonymous mutations and SNPs in MT-ND4L have been associated with several traits, including colorectal cancer and body mass index. 28,29 Because we assessed the association of only 58 mitochondrial SNPs with UC, it is conceivable that the associated SNP is a proxy for another, untested functional variant in strong linkage disequilibrium with it.
Hudson et al 30 probed the mtDNA association with 11 common late-onset diseases (ankylosing spondylitis, ischemic stroke, multiple sclerosis, Parkinson's disease, primary biliary cirrhosis, psoriasis, schizophrenia, coronary artery disease, hypertension, type 2 diabetes, and UC) investigated by the Wellcome Trust Case Control Consortium (WTCCC). Among these phenotypes, UC was the disease with the largest number of nominal mtDNA associations (23 mtSNPs), 5 of which were from directly genotyped SNPs and the remainder imputed, 30 but none of these associations overlapped with those described herein. Several factors confound the comparison of these 2 studies, starting with the choice of commercial arrays (Affymetrix 6.0 and Illumina HumanHap550v3, respectively) with mostly different mitochondrial polymorphisms. Although the WTCCC data set is larger (2855 cases and 5033 controls after quality control), almost two-thirds of the SNPs tested for association with UC in the WTCCC data set were imputed 30 rather than directly genotyped, as in the NIDDK data set. Additionally, population stratification in the WTCCC study was assessed by PCA performed with mtSNPs, 30 not with nuclear-encoded SNPs, and the numerous clusters observed most likely correspond to the major European haplogroups. 31 Therefore, the reported lack of population stratification among cases and controls using this method in the WTCCC data set is consistent with the modest association of relatively rare mitochondrial subhaplogroups with UC in the NIDDK data set, but it does not exclude the possibility that mtSNP associations in the WTCCC data set are confounded by geographical origin, which is known to correlate well with autosomal principal components. 31 Future studies are therefore warranted to validate and fine-map these associations in other populations and to determine how this variation contributes to UC etiopathogenesis.
Genetic polymorphisms in nuclear-encoded genes involved in mitochondrial energy metabolism (e.g., OCTN, UCP2, DLD, and NF-κB) have been associated with UC. 5,32,33 OCTN and UCP2 have been implicated in intestinal barrier function in vivo, 34,35 and NF-κB is known to regulate epithelial cell proliferation, 24,36 two biological processes disrupted in patients with UC. In our study, we did not test genetic interactions between the mitochondrial genome and these genes because they were not among the top findings of the GWAS under investigation, but we detected an epistatic effect between A10550G and the 11 top SNPs of the NIDDK IBD UC GWAS. This interaction was additive rather than multiplicative, as expected given that the functions of ND4L and of the nuclear genes to which these polymorphisms map (e.g., IL23R, IL26) do not appear to be related.
Gene expression and proteomic characterizations of active UC biopsies have reported significant downregulation of citrate cycle and respiratory chain mitochondrial proteins 9 and high levels of a number of oxidative stress response proteins (e.g., selenium-binding protein, SOD, and thioredoxin-dependent peroxide reductase) and energy-generation proteins (e.g., isocitrate dehydrogenase, L-lactate dehydrogenase B chain, inorganic phosphatase, and enoyl-CoA hydratase). 37 Because mitochondria-related pathways were not enriched among UC GWAS hits, Cardinale et al 9 suggested that perturbations of mitochondrial respiration and ATP production are not etiopathogenic drivers but rather secondary to inflammation. It is, however, important to note that the mitochondrial genome was not investigated in this meta-analysis of UC GWAS scans, 5 so the putative contribution of the 13 mtDNA-encoded polypeptides involved in oxidative phosphorylation was left unassessed. The contribution of oxidative stress to UC is further supported by the successful use, in patients and animal models, of therapeutic drugs that directly or indirectly scavenge free radicals, increase the antioxidative capacity of cells, and inhibit pro-oxidative enzymes. 38,39 Finally, it would be of interest to develop a more comprehensive line of research on the role of mtDNA variation in UC risk (e.g., full mtDNA sequencing, heteroplasmy levels) and to test interactions among mtSNPs, polymorphisms in the nuclear genome, and environmental factors previously reported for UC. A deeper understanding of the cellular and genetic contributors to oxidative damage in UC may expand the existing therapeutic targets toward cellular redox homeostasis.