Large scale proteomic studies create novel privacy considerations

Title: Large scale proteomic studies create novel privacy considerations
Publication Type: Publication
Year: 2023
Authors: Hill AC, Guo C, Litkowski EM, Manichaikul AW, Yu B, Konigsberg IR, Gorbet BA, Lange LA, Pratte KA, Kechris KJ, DeCamp M, Coors M, Ortega VE, Rich SS, Rotter JI, Gerszten RE, Clish CB, Curtis JL, Hu X, Obeidat M-E, Morris M, Loureiro J, Ngo D, O'Neal WK, Meyers DA, Bleecker ER, Hobbs BD, Cho MH, Banaei-Kashani F, Bowler RP
Journal: Sci Rep
Volume: 13
Issue: 1
Pagination: 9254
Date Published: 2023 Jun 07
ISSN: 2045-2322
Keywords: Atherosclerosis; Bayes Theorem; Genome-Wide Association Study; Humans; Polymorphism, Single Nucleotide; Privacy; Proteome
Abstract

Privacy protection is a core principle of genomic but not proteomic research. We identified independent single nucleotide polymorphism (SNP) protein quantitative trait loci (pQTL) from COPDGene and the Jackson Heart Study (JHS), calculated continuous protein-level genotype probabilities, and then applied a naïve Bayesian approach to link SomaScan 1.3K proteomes to genomes for 2812 independent subjects from COPDGene, JHS, the SubPopulations and InteRmediate Outcome Measures In COPD Study (SPIROMICS), and the Multi-Ethnic Study of Atherosclerosis (MESA). We linked 90-95% of proteomes to their correct genome, and for 95-99% the correct genome was among the 1% most likely links. Linking accuracy in subjects with African ancestry was lower (~60%) unless training included diverse subjects. With larger profiling (SomaScan 5K) in the Atherosclerosis Risk in Communities (ARIC) study, correct identification was >99% even in mixed-ancestry populations. We also linked proteomes to proteomes and used the proteome alone to determine features such as sex, ancestry, and first-degree relatives. When serial proteomes are available, the linking algorithm can be used to identify and correct mislabeled samples. This work also demonstrates the importance of including diverse populations in omics research and shows that large proteomic datasets (>1000 proteins) can be accurately linked to a specific genome through pQTL knowledge and should not be considered unidentifiable.
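
The linking idea in the abstract can be illustrated with a minimal sketch on simulated data. This is not the authors' implementation: it assumes each pQTL protein level is Gaussian given an additive genotype (0/1/2), treats pQTLs as independent (the "naïve" assumption), and uses a uniform prior over candidate genomes, so the most probable genome is the one with the highest summed log-likelihood. All function names, sample sizes, allele frequencies, and effect sizes below are hypothetical and chosen only for illustration.

```python
import random
from statistics import fmean, pstdev

def simulate(n, maf, beta, rng):
    """Simulate n subjects: genotypes in {0, 1, 2}; protein = beta * genotype + N(0, 1)."""
    genotypes = [[(rng.random() < p) + (rng.random() < p) for p in maf] for _ in range(n)]
    proteins = [[b * g + rng.gauss(0.0, 1.0) for b, g in zip(beta, row)] for row in genotypes]
    return proteins, genotypes

def fit_genotype_models(proteins, genotypes):
    """Per pQTL, estimate the mean protein level for each genotype and a pooled residual SD."""
    models = []
    for j in range(len(proteins[0])):
        x = [row[j] for row in proteins]
        g = [row[j] for row in genotypes]
        overall = fmean(x)
        means = []
        for k in range(3):
            xs = [xi for xi, gi in zip(x, g) if gi == k]
            # Fall back to the overall mean if a genotype class was never observed.
            means.append(fmean(xs) if xs else overall)
        sd = max(pstdev([xi - means[gi] for xi, gi in zip(x, g)]), 1e-6)
        models.append((means, sd))
    return models

def log_likelihood(proteome, genome, models):
    """Naive Bayes score: sum of Gaussian log densities over pQTLs.
    Terms that are identical for every candidate genome are omitted."""
    return sum(-0.5 * ((x - means[g]) / sd) ** 2
               for x, g, (means, sd) in zip(proteome, genome, models))

def link(proteomes, genomes, models):
    """For each proteome, return (index of best-scoring genome, rank of the true genome).
    Assumes, for this demo only, that proteome i truly belongs to genome i."""
    results = []
    for i, proteome in enumerate(proteomes):
        scores = [log_likelihood(proteome, genome, models) for genome in genomes]
        best = max(range(len(genomes)), key=scores.__getitem__)
        rank = sum(s > scores[i] for s in scores)  # 0 means the true genome scored highest
        results.append((best, rank))
    return results

# Simulated example; every parameter here is an arbitrary assumption.
rng = random.Random(0)
n_pqtl = 200
maf = [rng.uniform(0.1, 0.5) for _ in range(n_pqtl)]   # allele frequencies
beta = [rng.uniform(0.5, 1.0) for _ in range(n_pqtl)]  # per-allele effects in SD units
train_p, train_g = simulate(300, maf, beta, rng)
test_p, test_g = simulate(50, maf, beta, rng)

models = fit_genotype_models(train_p, train_g)
results = link(test_p, test_g, models)
accuracy = sum(best == i for i, (best, _) in enumerate(results)) / len(results)
print(f"top-1 linking accuracy on simulated data: {accuracy:.2f}")
```

The rank returned by `link` corresponds loosely to the abstract's "1% most likely links" measure: a rank of 0 means the true genome was the top candidate. Real pQTL effects are typically weaker and more variable than the simulated ones here, so this sketch says nothing about the accuracy figures reported in the study.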

DOI: 10.1038/s41598-023-34866-6
Alternate Journal: Sci Rep
PubMed ID: 37286633
PubMed Central ID: PMC10247808
Grant List: U01 HL089856 / HL / NHLBI NIH HHS / United States
U01 HL089897 / HL / NHLBI NIH HHS / United States
R01 HL147148 / HL / NHLBI NIH HHS / United States
R01 HL139634 / HL / NHLBI NIH HHS / United States
R01 HL135142 / HL / NHLBI NIH HHS / United States
K08 HL136928 / HL / NHLBI NIH HHS / United States
UL1 TR001420 / TR / NCATS NIH HHS / United States
UL1 TR001079 / TR / NCATS NIH HHS / United States
UL1 TR000040 / TR / NCATS NIH HHS / United States
N01HC95169 / HL / NHLBI NIH HHS / United States
N01HC95168 / HL / NHLBI NIH HHS / United States
N01HC95167 / HL / NHLBI NIH HHS / United States
N01HC95166 / HL / NHLBI NIH HHS / United States
N01HC95165 / HL / NHLBI NIH HHS / United States
75N92020D00007 / HL / NHLBI NIH HHS / United States
N01HC95164 / HL / NHLBI NIH HHS / United States
75N92020D00004 / HL / NHLBI NIH HHS / United States
N01HC95163 / HL / NHLBI NIH HHS / United States
75N92020D00006 / HL / NHLBI NIH HHS / United States
N01HC95162 / HL / NHLBI NIH HHS / United States
75N92020D00003 / HL / NHLBI NIH HHS / United States
N01HC95161 / HL / NHLBI NIH HHS / United States
75N92020D00002 / HL / NHLBI NIH HHS / United States
N01HC95160 / HL / NHLBI NIH HHS / United States
75N92020D00005 / HL / NHLBI NIH HHS / United States
N01HC95159 / HL / NHLBI NIH HHS / United States
HHSN268201500003I / HL / NHLBI NIH HHS / United States
75N92020D00001 / HL / NHLBI NIH HHS / United States
P30 DK063491 / DK / NIDDK NIH HHS / United States
UL1 TR001881 / TR / NCATS NIH HHS / United States
R01 HL105756 / HL / NHLBI NIH HHS / United States
R01 HL117626 / HL / NHLBI NIH HHS / United States
U01 HL120393 / HL / NHLBI NIH HHS / United States
R01 HL120393 / HL / NHLBI NIH HHS / United States
RC2 HL102419 / HL / NHLBI NIH HHS / United States
MS#: MS247
Manuscript Full Title: Large scale proteomic studies create novel privacy considerations
Manuscript Lead/Corresponding Author Affiliation: Clinical Center: Denver (National Jewish Health)
ECI:
Manuscript Status: Published and Public