Genetic studies of paired metabolomes

The GCKD study is an ongoing prospective observational study that enrolled 5,217 adults with CKD between 2010 and 2012. Patients under regular nephrologist care were included if they had an eGFR between 30 and 60 ml min⁻¹ per 1.73 m², or an eGFR >60 ml min⁻¹ per 1.73 m² with UACR >300 mg per g (or urinary protein/creatinine ratio >500 mg per g)53. This study used biomaterials collected at the baseline visit, shipped frozen to a central biobank and stored at −80 °C54. A more detailed description of the study design, standard operating procedures and the recruited study population has been published53,55.


Study design and participants

The GCKD study was registered in the national registry for clinical studies (DRKS 00003971) and approved by the local ethics committees of the participating institutions (universities or medical faculties of Aachen, Berlin, Erlangen, Freiburg, Hannover, Heidelberg, Jena, München and Würzburg)53. All participants provided written informed consent. For this project, metabolites were quantified from stored EDTA plasma and spot urine. Information on genome-wide genotypes, covariates and metabolites was available for 4,960 (plasma) and 4,912 (urine) persons.

Genotyping and imputation

Genotyping and data cleaning in the GCKD study were conducted as follows5,56. Genomic DNA from GCKD participants was genotyped at 2,612,357 variants using Illumina Omni2.5Exome BeadChip arrays. Genotypes were phased with Eagle 2.3 and imputed with minimac3 version 2.0.1 at the Michigan Imputation Server57, using the Haplotype Reference Consortium haplotypes version r1.1 as the reference panel. At the variant level, SNPs with <96% call rate, imputation quality of r² ≤ 0.3, MAF <1% or deviation from Hardy–Weinberg equilibrium (P < 1 × 10⁻¹⁰), as well as all multi-allelic SNPs, were removed.
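The variant-level filters above can be sketched as a single predicate. This is a minimal illustration only, not the study's pipeline: the `Variant` record and its field names are hypothetical, and the sketch assumes that every metric is available for every variant (in practice, call rate applies to genotyped SNPs and imputation quality to imputed ones).

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative assumptions.
    call_rate: float       # fraction of samples with a genotype call
    imputation_r2: float   # imputation quality (r²)
    maf: float             # minor allele frequency
    hwe_p: float           # Hardy-Weinberg equilibrium test P value
    n_alleles: int         # 2 for bi-allelic variants


def passes_variant_qc(v: Variant) -> bool:
    """Return True if the variant survives all filters named in the text."""
    return (
        v.call_rate >= 0.96        # remove SNPs with <96% call rate
        and v.imputation_r2 > 0.3  # remove SNPs with r² <= 0.3
        and v.maf >= 0.01          # remove SNPs with MAF <1%
        and v.hwe_p >= 1e-10       # remove SNPs with HWE P < 1e-10
        and v.n_alleles == 2       # remove multi-allelic SNPs
    )
```

Expressing the filters as one boolean makes the thresholds easy to audit against the text; a real pipeline would more likely apply them with dedicated tools such as PLINK.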

Metabolite identification and quantification

Non-targeted mass spectrometry analysis was performed at Metabolon, and sample preparation was carried out as published by Schlosser et al.5. Automated comparison of the ion features in the experimental samples to a reference library of chemical standard entries (>4,500 purified standards) was used for metabolite identification. Known metabolites reported in this study conformed to confidence level 1 (the highest confidence level of identification) of the Metabolomics Standards Initiative58,59, unless otherwise denoted with an asterisk.

Data cleaning of quantified metabolites

An in-house pipeline was set up for data quality control, filtering and normalization of metabolite concentrations. Specimen pairs with differing sample IDs but a Pearson correlation coefficient greater than 0.9 were removed (no plasma specimens and four pairs of urine specimens). Four plasma specimens and no urine specimens were removed because of >50% missing data. A total of 130 plasma and 131 urine metabolites were removed because fewer than 300 genotyped samples were available.


Definition of additional variables

In the GCKD study, an IDMS-traceable enzymatic assay (Creatinine Plus, Roche) was used to measure serum creatinine levels, for estimating GFR by means of the CKD-EPI formula61, and to measure urine creatinine levels. The Tina-quant Albumin assay (Roche) was used to measure serum and urine albumin, for adjustment and calculation of the UACR, respectively. The GFR was estimated in the ARIC study from serum creatinine and cystatin C using the CKD-EPI formula62.

Genome-wide association study of metabolite levels

Based on log2-transformed metabolite levels, residuals adjusted for age, sex and the first three genetic principal components were generated (similar to previous mGWAS5,6,56,63,64), with plasma levels additionally adjusted for ln(eGFR) and serum albumin. GWAS analyses of these residuals were performed with SNPTEST version 2.5.2, using imputed genotype dosages and linear regression under an additive model. Statistical significance was defined as genome-wide significance after Bonferroni correction for the number of traits tested (plasma: 3.9 × 10⁻¹¹ = 5 × 10⁻⁸/1,296 traits; urine: 3.6 × 10⁻¹¹ = 5 × 10⁻⁸/1,401 traits).
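The two significance thresholds follow directly from dividing the conventional genome-wide level by the number of traits. A short check of that arithmetic, with a helper name (`bonferroni_threshold`) chosen here purely for illustration:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance level

plasma_threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, 1296)
urine_threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, 1401)

print(f"{plasma_threshold:.1e}")  # 3.9e-11
print(f"{urine_threshold:.1e}")   # 3.6e-11
```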

Heritability estimation

A genetic relationship matrix was calculated from all autosomal SNPs with an imputation quality of r² > 0.6 using GCTA-GRM71. GCTA-GREML72 was then used to estimate the proportion of variation in log2-transformed and, in the case of urine, pq-normalized metabolite levels that can be explained by the SNPs for all metabolites that gave rise to an mQTL.

Independent SNP selection and statistical fine mapping

We identified independent signals within mQTL using approximate conditional analyses, with LD information estimated from our study sample. The fine-mapping regions of mQTL were aligned within matrices across metabolites if index SNPs were in LD (r² > 0.8). For each mQTL, the GCTA-COJO Slct algorithm version 1.91.6 (ref. 73) was used to identify independent genome-wide significant SNPs (P_conditional < 3.9 × 10⁻¹¹), using a collinearity cutoff of 0.1. For mQTL with multiple independent SNPs, approximate conditional analyses were carried out conditioning on the other independent SNPs in the region using the GCTA-COJO Cond algorithm to estimate conditional effect sizes. Statistical fine mapping was performed for all independent SNPs per mQTL.

In loci with a single independent SNP, approximate Bayes factors (ABFs) were calculated from the original GWAS effect estimates using Wakefield's formula74 with a standard deviation prior of 1.33. For mQTL with multiple independent SNPs, ABFs were derived from the conditional effect estimates. Each SNP's ABF was used to calculate the posterior probability that the variant drives the association signal (PPA, causal variant). Credible sets were calculated by summing the PPA across PPA-ranked variants until the cumulative PPA was >99%. log2-transformed credible set sizes were regressed on the MAFs of independent index SNPs.
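The fine-mapping steps can be illustrated with a small sketch. It assumes the commonly used form of Wakefield's ABF in favor of association, sqrt(1 − r) · exp(z²r/2) with r = W/(V + W), where V is the squared standard error and W the prior variance; it further assumes one causal variant per signal with equal prior probability for every variant, so that the PPA is each ABF divided by the sum of ABFs. Function names are illustrative and are not taken from the study's code.

```python
import math


def log_abf(beta: float, se: float, prior_sd: float = 1.33) -> float:
    """Natural log of Wakefield's approximate Bayes factor for association."""
    v = se ** 2          # variance of the effect estimate
    w = prior_sd ** 2    # prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def posterior_probabilities(log_abfs: list[float]) -> list[float]:
    """PPA per variant, assuming a single causal variant and flat priors."""
    m = max(log_abfs)  # subtract the maximum to avoid overflow in exp()
    weights = [math.exp(x - m) for x in log_abfs]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(ppas: list[float], coverage: float = 0.99) -> list[int]:
    """Indices of PPA-ranked variants until cumulative PPA exceeds coverage."""
    order = sorted(range(len(ppas)), key=lambda i: ppas[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += ppas[i]
        if cumulative > coverage:
            break
    return chosen
```

Working on the log scale matters here because z-scores at thresholds near 10⁻¹¹ produce Bayes factors that can overflow double precision when exponentiated directly.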

Pairwise colocalization tests of plasma and urine mQTL

To examine whether association patterns with metabolites measured in plasma and/or urine are shared across or within matrices, we conducted pairwise colocalization analyses between mQTLs. When the windows of 500 kb around the index SNPs of two mQTLs overlapped, colocalization was performed within the region of the merged windows using a version of Giambartolomei's colocalization method75 as implemented in a function from the R package gtx, with default parameters and prior definitions. To visualize the effect sizes and explained variance for colocalizing signals for mQTLs detected for the same metabolite across matrices (Extended Data Fig. 4), we used the R package circlize (ref. 76).
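The window rule that decides which mQTL pairs are tested can be sketched as follows. This is an illustration under stated assumptions: the window is taken to extend 500 kb on each side of the index SNP, both index SNPs are assumed to lie on the same chromosome, and the function names are hypothetical.

```python
from typing import Optional, Tuple

FLANK_BP = 500_000  # assumed flank on each side of the index SNP


def window(position: int, flank: int = FLANK_BP) -> Tuple[int, int]:
    """Closed interval around an index SNP, clipped at position 0."""
    return max(0, position - flank), position + flank


def merged_window(pos_a: int, pos_b: int,
                  flank: int = FLANK_BP) -> Optional[Tuple[int, int]]:
    """Merged region if the two windows overlap, otherwise None.

    Both positions must be on the same chromosome; callers should skip
    pairs of index SNPs located on different chromosomes.
    """
    a, b = window(pos_a, flank), window(pos_b, flank)
    if a[0] > b[1] or b[0] > a[1]:
        return None  # no overlap, so no colocalization test for this pair
    return min(a[0], b[0]), max(a[1], b[1])
```

For example, index SNPs at 1.0 Mb and 1.6 Mb yield the merged region 0.5 to 2.1 Mb, whereas SNPs at 1.0 Mb and 3.0 Mb would not be tested.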


SNP annotation was performed by querying the SNiPA database version 3.4 (released 13 November 2020)13, based on the 1000 Genomes phase 3 version 5 and Ensembl version 87 datasets. The retrieved combined annotation-dependent depletion (CADD) score was based on CADD version 1.3. The Ensembl VEP tool was used for the effect prediction of SNPs. SNiPA was used to collect the following annotations for each index SNP: gene hit or close by, regulated genes, CADD score, SnpEff effect impact (exonic and noncoding), mQTL, pQTL, GWAS Catalog, cis eQTL, disease genes (based on ClinVar, OMIM, HGMD and Drugbank) and UK Biobank associations.

Relation of mQTLs to plasma proteins in trans and phenotypes

We also performed colocalization analyses of mQTLs with disease outcomes and biomarker measurements in the UK Biobank, with two representative kidney function traits and with trans pQTLs using the precomputed pQTL data from Sun et al.79 to gain insights into clinical consequences and potential molecular mediators of mQTLs. Association summary statistics between SNPs and 30 biomarkers from the UK Biobank baseline examination, including the liver function markers AST, ALT, GGT, bilirubin and albumin, were computed using BOLT-LMM80 (application no. 20272) in the same subset of European-ancestry participants as previous studies81.

Precomputed GWAS summary statistics of diseases as ascertained in the UK Biobank and analyzed using phecodes were obtained from two sources (1,403 binary traits and 2,325 of 2,989 binary traits82, respectively; traits containing job-coding terms were excluded from the analysis).



Processing of gene expression data from tissue and cell types

To test for over-representation of plasma or urine mQTL-related genes among those highly expressed in specific tissues and cell types, we compiled bulk and single-cell gene expression (RNA-seq) datasets. These included GTEx version 8 (ref. 78), the Human Liver Cell Atlas85, a single-cell dataset and a single-nucleus dataset from the human kidney86,87, a single-cell dataset from the mouse kidney88, a single-cell dataset from the human intestine89 and a single-nucleus dataset from the kidneys of patients with CKD from the Kidney Precision Medicine Project (KPMP)90.

GO, KEGG, tissue and cell type enrichment analyses

Enrichment testing of the 282 identified genes was performed as follows. The number of independent SNPs per gene was computed using GCKD genotypes (PLINK version 1.90 (ref. 93)), and a database of Entrez gene identifiers based on version 3.8.2 was generated. Gene annotation included the number of independent SNPs per gene, gene length, GO terms94 and KEGG pathways95, as well as being Human Protein Atlas tissue or group enriched96; being Human Protein Atlas cell type enhanced, enriched or group enriched97; being a VIP gene from PharmGKB (accessed 5 December 2020)98; being a gene with an actionable drug interaction from the Clinical Pharmacogenetics Implementation Consortium (levels A, A/B and B; accessed 13 January 2021)99; and being among the top 10% highly expressed genes in each GTEx version 8 tissue78 and human85,86,87,89,90 and murine cell types88.

We performed 100 million random draws of as many genes as contained in the respective source list (combined mQTLs, 282; plasma mQTLs, 214; urine mQTLs, 195; plasma-only mQTLs, 87; urine-only mQTLs, 68), matched for deciles of the number of independent SNPs and deciles of gene length, and compared the overlap of each draw with cell types, tissues and terms against the overlap identified for the original source list.
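A matched random-draw test of this kind might look like the sketch below. Several details are assumptions rather than statements from the text: each draw samples, without replacement, as many genes from every (SNP decile, length decile) bin as the source list contains in that bin, and the empirical P value is computed as (draws with overlap at least as large as observed + 1)/(draws + 1). The function and argument names are hypothetical, and the default number of draws is kept small for illustration.

```python
import random
from collections import defaultdict


def empirical_enrichment_p(source_genes, annotated, bins,
                           n_draws=10_000, seed=0):
    """Empirical P value for overlap between source_genes and annotated.

    bins maps every background gene (including all source genes) to its
    (snp_decile, length_decile) tuple; annotated is a set of genes that
    carry the term, tissue or cell type label being tested.
    """
    rng = random.Random(seed)

    # Group the background genes by their matching bin.
    genes_by_bin = defaultdict(list)
    for gene, bin_key in bins.items():
        genes_by_bin[bin_key].append(gene)

    # Count how many source genes fall into each bin.
    needed = defaultdict(int)
    for gene in source_genes:
        needed[bins[gene]] += 1

    observed = sum(gene in annotated for gene in source_genes)

    at_least_as_large = 0
    for _ in range(n_draws):
        overlap = 0
        for bin_key, k in needed.items():
            drawn = rng.sample(genes_by_bin[bin_key], k)
            overlap += sum(gene in annotated for gene in drawn)
        if overlap >= observed:
            at_least_as_large += 1

    return observed, (at_least_as_large + 1) / (n_draws + 1)
```

Matching on both deciles keeps the random gene sets comparable to the source list in the two properties that most affect the chance of a gene being assigned to an mQTL.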

Multiple-testing correction was performed using the Benjamini–Hochberg procedure100.
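For reference, the Benjamini–Hochberg adjustment can be written in a few lines. This is a generic sketch of the standard step-up procedure, not the code used in the study; the function name is illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then called significant at a chosen false discovery rate if its adjusted P value falls below that rate.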


