Lior Shamir
K-State Computer Science
Machine Learning in Science – Robustness, Reproducibility, and Reliability

January 29, 2024
4:30 p.m.
CW 102 or Zoom
Email office@phys.ksu.edu for the Zoom address

 

Abstract

The size and complexity of scientific databases have reinforced the need for automation that can mine and annotate data, turning data into knowledge and scientific discoveries. One of the emerging paradigms for analyzing large and complex scientific databases is machine learning, which has recently led to a large number of studies. In particular, many of these recent experiments are based on deep neural networks. While machine learning has many important advantages, it also introduces a broad range of limitations that do not exist in “traditional” data analysis. These limitations include complex biases that can very easily lead to false conclusions. These biases are often very difficult to notice, and even highly experienced researchers and machine learning experts can easily be deceived by the nature of these algorithms and the results that they produce. I will show some common practices and uses of machine learning, as well as machine learning experiments that are heavily biased by the machine learning algorithms. Some of these experiments are foundational in machine learning, yet these biases escaped the attention of the experimentalists. I will also show why machine learning bias is extremely difficult to identify and can easily deceive even experienced researchers.