The Cost of Rigidity: How Class-Based Data Limits Machine Learning Performance
Aida Khosh Raftar Nouri, Jeffrey Parsons
The design of data collection interfaces can significantly influence the quality of training data for machine learning (ML) systems. Class-based designs, which structure information into predefined categories, are widely used, but their rigid classifications risk omitting critical details. This study compares class-based data collection with an instance-based approach, in which contributors freely describe observed phenomena, to assess the impact of each on ML performance. We conducted a controlled experiment in the autonomous driving domain: participants (N = 260) described driving scenes using either a class-based or an instance-based interface. ML models trained on the resulting datasets were evaluated for predictive accuracy. Across all models, those trained on instance-based data outperformed those trained on class-based data, with higher accuracy and robustness and improved generalization to real-world complexity. For settings in which data are crowdsourced to train ML models, this work shows the benefits of an instance-based data collection design that integrates structured input with open-ended reporting to enhance ML performance.