Genome annotation
Genome annotation is a basic requirement for interpreting omics datasets and translating them into biological and medical insights. I implemented ANNOgesic, the first tool for RNA-seq-based genome annotation of bacteria and archaea (S.H. Yu, 2018, GigaScience). ANNOgesic has been featured on and listed by many platforms and websites. I am currently extending ANNOgesic to more applications and making it more user-friendly for scientists without a bioinformatics background. In addition, an annotation tool for eukaryotic species is under development.
Multi-omics data analysis
High-throughput technologies have revolutionized biology, and numerous tools for analyzing next-generation sequencing (NGS) and proteomics data have been published in recent years. However, a single omics dataset rarely provides a complete picture of a biological system. Each omics layer has its own strengths but also its own blind spots. For instance, transcriptomics readily detects non-coding genes, but it is a poor proxy for phenotype, because transcript levels do not necessarily reflect the protein levels that more directly shape it. Integrating multi-omics data is therefore likely to become a major trend in the coming years. Although the global correlation between transcriptomic and proteomic data is quite low (~10-25%), many studies have still used multi-omics successfully to address specific questions, such as the characterization of protein/RNA complexes. My multi-omics research therefore pursues two aims: improving the global integration of multi-omics data, and identifying the questions that multi-omics approaches are well suited to answer.
Single-cell transcriptomics and proteomics
Thanks to rapid progress in next-generation sequencing technologies, individual cells can now be characterized. Single-cell technologies are key to understanding cell-to-cell heterogeneity and to studying rare cells, whose signals are averaged out in bulk measurements. In recent years, the number of single-cell studies at the genome and transcriptome levels has increased markedly, and these technologies are approaching maturity. Single-cell proteomics, by contrast, is still at an early stage, leaving many open problems to explore. I am involved in several projects with leading scientists in the single-cell proteomics and transcriptomics community (S.H. Yu, 2020, J. Proteome Res.). The aim of these projects is to develop a practical tool for analyzing single-cell proteomic and transcriptomic data.
Deep learning/machine learning
Deep learning, a subset of machine learning and thus of artificial intelligence, has attracted a great deal of attention recently. Many studies have shown that deep learning can make accurate, high-quality predictions in bioinformatics. However, the application of deep learning to omics data analysis remains largely unexplored territory. The goal of this project is to build an accurate deep-learning-based tool for analyzing omics data.