Data Cleaning and Validation
Overviews
Data Lifecycle
Polyzotis et al 2018 - Data Lifecycle Challenges in Production Machine Learning: A Survey
Data Cleaning
Krishnan et al 2017 - BoostClean: Automated Error Detection and Repair for Machine Learning
(searched “data cleaning ensembling machine learning” on Google Scholar)
2017 - Data Quality Considerations for Big Data and Machine Learning: Going Beyond Data Cleaning and Transformations
Liu & Guo 2020 - Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates
Swayamdipta et al 2020 - Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
Data Validation
Book chapter:
Ch 4 - Data Validation
Talks about TensorFlow Data Validation (TFDV)
Breck et al 2019 - Data Validation for Machine Learning
(from here)
Related Pages
Data Preparation
Dataset Creation