As a bioinformatician with expertise in computational proteomics, I have the training, experience, and expertise
to perform the proposed research project. I have more than 19 years of experience in bioinformatics algorithm
design, software development, and data analysis. For the last 15 years, my research has focused on mass
spectrometry (MS)-based computational proteomics. I have extensive data analysis experience in peptide and
protein identification, quantification, post-translational modification (PTM) identification, and de novo sequencing
by top-down and bottom-up MS, and I have published more than 70 papers in this area. I designed and developed
the TopPIC suite, the most widely used open-source software package for proteoform identification by top-down MS,
and CHAMPS and TBNovo, two software tools for whole protein sequencing using MS. Collectively, these tools
have been downloaded more than 7000 times by researchers around the world and have been applied to study
various cell systems, such as cardiac muscle cells and cancer cells. I have collaborated on MS-related studies with
leading proteomics laboratories, such as Pacific Northwest National Laboratory and that of Dr. Liangliang Sun at
Michigan State University. I identified a proteomics biomarker in Bronchiolitis Obliterans Syndrome using bottom-up
MS and carried out MS-based qualitative and quantitative proteomics studies and systems biology studies of
several diseases, such as diabetes and colorectal cancer. In collaboration with Dr. Sun, I performed the first
proteome-level comparative study of proteoforms in metastatic and non-metastatic colorectal cells (McCool et
al., Science Advances, 2022).
I am a PI of an active NCI grant titled “Quantitative top-down proteomics of human colorectal
cancer cells and tumors” (R01CA247863, 2/2021 – 1/2026). The aims of the project are to develop
highly sensitive mass spectrometry-based systems for top-down proteomics, gain new insights
into colorectal cancer (CRC) metastasis through quantitative top-down proteomics of two
isogenic CRC cell lines (SW480 and SW620), and understand Lynch Syndrome at the proteoform
level via quantitative top-down proteomics of patients’ tumor samples.
Genome-level and transcriptome-level information cannot accurately reflect protein-level
information because post-transcriptional regulation can modulate protein expression and because
post-translational modifications (PTMs) can influence protein function. Quantitative proteomic
studies of CRC are therefore vital. Many bottom-up proteomics studies have been completed on CRC cells
and tumors, but limited information on proteoforms has been acquired because of the low protein
sequence coverage typically obtained from bottom-up proteomics. Different proteoforms from
the same gene can have drastically different functions. We hypothesize that large-scale and
quantitative top-down proteomics of human CRC cells and tumors will provide new insights into
CRC, leading to better therapies.
In this project, we develop new analytical tools to boost the sensitivity and scale of top-down
proteomics. The new tools will enable large-scale and quantitative top-down proteomics of CRC
cells before and after metastasis as well as CRC tumors from patients with Lynch Syndrome. The
novel analytical tools will boost the sensitivity of top-down proteomics by tenfold and will be
particularly useful to the proteomics community for large-scale top-down proteomics of mass-limited
samples. Quantitative top-down proteomics of CRC cells before and after metastasis will
generate an unprecedented resource for the cancer biology community to gain new insights into
CRC metastasis. Quantitative top-down proteomics of Lynch Syndrome tissues will elucidate
the roles played by mutations and functions of DNA mismatch repair genes in Lynch Syndrome
at the proteoform level.