cff-version: 1.2.0
abstract: "
This dataset supports the paper “Driving Heterogeneity Identification using Machine Learning: A Review and Framework for Analysis” (Chapter 2 of the PhD dissertation). The research provides a systematic review of existing machine learning (ML)-based approaches for identifying driving heterogeneity. The review organises key concepts and categorisations of driving heterogeneity, highlights the strengths and drawbacks of various methods, and outlines applications of identification analysis. Drawing on the review, the research proposes a structured framework to guide the ML-based identification process, covering data collection and pre-processing, feature selection, ML model training, and performance evaluation. The dataset includes summary statistics on data collection methods over time and an overview of the traffic variables used in the reviewed literature. It is provided as a zipped folder containing .ipynb and .xlsx files, along with a ch2_Readme.txt file that explains the dataset structure and provides usage instructions.