KDD2021 Tutorial

Machine Learning Robustness, Fairness, and their Convergence

August 14, 2021


  1. Overview:
    Responsible AI becomes critical in settings where robustness and fairness must be satisfied together. Traditionally, the two topics have been studied by different communities for different applications. Robust training is designed for noisy or poisoned data, typically image data. In comparison, fair training primarily deals with biased data, typically structured data. Nevertheless, robust training and fair training are fundamentally similar in that both aim to fix the inherent flaws of real-world data. In this tutorial, we first cover state-of-the-art robust training techniques, where most of the research addresses various types of label noise. In particular, we cover label noise modeling, robust training approaches, and real-world noisy data sets. Then, proceeding to the related fairness literature, we discuss pre-processing, in-processing, and post-processing unfairness mitigation techniques, depending on whether the mitigation occurs before, during, or after model training. Finally, we cover the recent trend of combining robust and fair training, which comes in two flavors: the first makes fair training more robust (i.e., robust fair training), and the second treats robustness and fairness as equals and incorporates them into a holistic framework. This tutorial is timely and novel because the convergence of the two topics is increasingly common but has yet to be addressed in tutorials. The tutors have extensive experience publishing papers in top-tier machine learning and data mining venues and developing machine learning platforms.
  2. Outline (3 hours):
    • Introduction (5 minutes)
    • Part I: Robustness to Label Noise (80 minutes)
      • Motivation and Issues (5 minutes)
      • Label Noise Modeling (5 minutes)
      • Robust Training Overview (10 minutes)
      • Robust Architecture Approach (10 minutes)
      • Loss Adjustment Approach (20 minutes)
      • Sample Selection Approach (20 minutes)
      • Real-World Noisy Data Sets (10 minutes)
    • Part II: Fairness to Data Bias (50 minutes)
      • Motivation and Issues (5 minutes)
      • Preparing Unbiased Data (15 minutes)
      • Training on Biased Data (20 minutes)
      • Debiasing a Trained Model (10 minutes)
    • Part III: Convergence of Robustness and Fairness (30 minutes)
      • Fairness-Oriented Approach (15 minutes)
      • Equal Merger (15 minutes)
    • Concluding Remarks (5 minutes)
  3. Presenters:
    • Jae-Gil Lee is an associate professor at the School of Computing, KAIST. Before joining KAIST in 2010, he worked at the IBM Almaden Research Center and the University of Illinois Urbana-Champaign. He earned his Ph.D. in computer science in 2005 from KAIST. His research interests encompass spatio-temporal data mining and scalable machine learning, and he has recently been working on data quality issues for deep learning. He is a senior program committee member of KDD 2021 and has served as an associate editor of IEEE TKDE since 2019.
    • Yuji Roh is a Ph.D. student at the School of Electrical Engineering, KAIST. Her research interests are responsible/trustworthy AI, human-centered AI, and big data-AI integration. She won the Qualcomm Innovation Fellowship Korea in 2020. She received her B.S. degree in Electrical Engineering from KAIST in 2018.
    • Hwanjun Song is a research scientist at NAVER AI Lab. He is particularly interested in designing advanced approaches to handle large-scale and noisy data, which are two main real-world challenges for the practical use of AI. He worked as a research intern at Google Research in 2020. He earned his Ph.D. in Knowledge Service Engineering from KAIST in 2021.
    • Steven Euijong Whang is an associate professor at the School of Electrical Engineering and Graduate School of AI, KAIST. His research interests are responsible AI and big data-AI integration. Previously, he was a Research Scientist at Google Research, where he co-developed the data infrastructure of the TensorFlow Extended (TFX) end-to-end machine learning platform. He received his Ph.D. in computer science in 2012 from Stanford University. He received the Google AI Focused Research Award in 2018, the first in Asia.
  4. Slides: