Published September 21, 2021
UBNOW talks with Jihnhee Yu, professor in the Department of Biostatistics and director of the Population Health Observatory, School of Public Health and Health Professions, whose work aims to apply data to medical diagnosis and treatment options.
I have been intrigued by challenging subjects in general. Statistics at first sounded great and sufficiently esoteric to pique my curiosity about the discipline. My undergraduate major was mathematics, which definitely motivated me to study statistics. Also, statistics is a practical discipline, and thus offers more job opportunities. That combination was perfect.
Medical applications and providing answers to medical questions have always been my major interests. Recently, my research has become more involved with survey data, big data and image data analysis.
One of my major methodological contributions has been in nonparametric statistical inference, especially the development of empirical likelihood (EL) methodology, a subject of great importance in the current statistical and biostatistical literature. As an outcome of this work, my colleague Dr. [Albert] Vexler [SPHHP professor of biostatistics] and I published a book, “Empirical Likelihood Methods in Biomedicine and Health” [CRC Press, Taylor & Francis, 2018], which I consider a wonderful accomplishment.
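For readers unfamiliar with the method, empirical likelihood tests a parameter without assuming a parametric distribution for the data. The following is a minimal sketch of one standard EL computation, a test for a population mean, not code from the book; the sample data are hypothetical.

```python
import math

def el_mean_stat(x, mu0):
    """Empirical-likelihood ratio statistic for H0: mean == mu0.

    Solves the Lagrange-multiplier equation by bisection and returns
    -2 log R, which is asymptotically chi-square with 1 df under H0.
    """
    d = [xi - mu0 for xi in x]
    if min(d) >= 0 or max(d) <= 0:
        raise ValueError("mu0 must lie inside the range of the data")
    # Feasible lambda keeps every implied weight positive: 1 + lam * d_i > 0.
    lo = -1.0 / max(d) + 1e-10
    hi = -1.0 / min(d) - 1e-10

    def g(lam):
        # Score equation for the multiplier; strictly decreasing in lam.
        return sum(di / (1.0 + lam * di) for di in d)

    for _ in range(200):  # bisection for g(lam) = 0
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)

# Hypothetical data: the statistic is ~0 at the sample mean and grows
# as mu0 moves away; compare it to the chi-square(1) 5% cutoff, 3.84.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
stat_null = el_mean_stat(sample, 3.0)   # essentially 0
stat_far = el_mean_stat(sample, 4.5)    # much larger
```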
To my understanding, biostatistics or statistics is the discipline of data: it provides people with data-driven information to understand what’s happening in the world. There is a question, and statistics or biostatistics uses the data to answer it. In terms of data, there are two types of problems in this world: one is too little data, and the other is too much data. Statistics and biostatistics deal with both.
From an education point of view, the statistical thought process is extremely important these days. It consists of establishing a hypothesis, measuring certainty or uncertainty given existing or fairly collected data, and testing the hypothesis. Then we reject the hypothesis or modify it. Along the way we can still commit two types of errors: the data used may be insufficient or incorrectly collected, or the model we use to derive information may be wrong. In that regard, statisticians tend to be skeptics, since they start all things with questions, much as in other scientific disciplines, which I consider a good trait of serious researchers.
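The loop described above, stating a hypothesis, quantifying uncertainty from the data, then rejecting or revising, can be sketched with a simple permutation test. The data, group names and significance level here are all hypothetical.

```python
import random

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sample permutation test of H0: groups a and b share one distribution.

    Returns a p-value: the fraction of random label shufflings whose absolute
    difference in group means is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # small-sample correction

# Hypothetical measurements from two groups.
treated = [5.1, 5.3, 4.9, 5.2, 5.0]
control = [1.2, 1.0, 1.4, 0.9, 1.1]
p = permutation_test(treated, control)
# If p falls below a pre-chosen level (say 0.05), we reject H0;
# otherwise we keep or modify the hypothesis, perhaps collecting more data.
```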
I used to work at Roswell Park Comprehensive Cancer Center as a biostatistician, and I held a voluntary faculty position at UB. The tight relationship between the biostatistics departments at Roswell Park and UB gave me an opportunity to join UB. That worked out great: after working and training in a real medical setting, I was better prepared for research with a practical flavor, rather than research that is simply a theoretical exercise.
As a biostatistician, I will continue to explore clinical data and hopefully can make an impact in medicine, especially in diagnosis and the choice of treatment options. In particular, there are forms of data whose valuable information cannot be extracted from each variable stored in a two-dimensional Excel file.
Image data is a good example: individual voxels [an element of volume in 3-D space] mean nothing on their own; the voxels need to be connected to one another to produce a meaningful interpretation. I believe the best instrument for that job is the human brain. Can a data process emulate the steps that are happening inside our brain all the time? What might be an efficient form of data processing that transforms such data into something more traditional that could be stored in a two-dimensional Excel sheet? These questions reflect recent trends in data mining and machine learning. I hope I can contribute to these areas, as they are as intriguing as it gets.
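One toy illustration of such a transformation, not the interviewee's method, is to summarize neighborhoods of voxels into features, so each scan becomes one row of an ordinary two-dimensional table. The scan data below are made up.

```python
def pool_volume(volume, block=2):
    """Average-pool a cubic 3-D intensity array (nested lists) into
    block x block x block neighborhood means, flattened into one feature row.

    Neighboring voxels are aggregated together, so each feature reflects
    a local region rather than an isolated voxel.
    """
    n = len(volume)
    features = []
    for z in range(0, n, block):
        for y in range(0, n, block):
            for x in range(0, n, block):
                total = 0.0
                for dz in range(block):
                    for dy in range(block):
                        for dx in range(block):
                            total += volume[z + dz][y + dy][x + dx]
                features.append(total / block ** 3)
    return features

# A hypothetical 4x4x4 scan whose intensity equals the slice index z:
# pooling yields 8 region features per subject, ready to sit alongside
# other variables in a two-dimensional table.
scan = [[[float(z) for _ in range(4)] for _ in range(4)] for z in range(4)]
row = pool_volume(scan)  # [0.5, 0.5, 0.5, 0.5, 2.5, 2.5, 2.5, 2.5]
```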