Annual Conference of the IEEE Industrial Electronics Society / 2025
Multi-Dimensional Safety Assessments of LLM-Assisted Driving Systems
Large language models (LLMs), AI systems trained to process and generate human language, are increasingly being integrated into autonomous vehicles, leading to the emergence of LLM-assisted driving systems (LADSs). Rigorous evaluation of LADSs is essential for driving technological progress and building user trust. However, there is currently a lack of evaluation metrics specific to LADSs, and existing LLM evaluation methods are not readily applicable to them. To evaluate the performance of LADSs, this study proposes four assessment indices that collectively consider driving robustness, safety, and ethical decision-making. First, cosine similarity is employed to guide the injection of disturbances, establishing a basis for quantitative input-output analysis. Second, robustness and safety indices are proposed to characterize vehicle performance, while an LLM-based evaluator is used to assess ethical behavior. To enhance alignment with human judgment, a language-numerical optimization algorithm is developed for prompt tuning. By integrating a knowledge base, Cohen's Kappa (κ) between the experienced driver and the LLM-based evaluator reaches 0.81, indicating strong agreement. Additionally, this study is the first to identify and analyze a novel phenomenon, termed "extreme thinking". Building on these results, a multi-dimensional safety assessment index is proposed to evaluate LADSs. The proposed indices and methods are validated using over 1000 data segments collected from both simulations and experiments.
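The reported κ of 0.81 measures how far the agreement between the human driver and the LLM-based evaluator exceeds chance-level agreement. As a minimal illustration of the statistic (the labels and data below are invented for demonstration and are not from the paper), Cohen's Kappa for two raters can be computed as:

```python
# Illustrative sketch of Cohen's kappa between two raters, e.g. a human
# driver and an LLM-based evaluator issuing per-segment ethics verdicts.
# All labels here are hypothetical examples, not the paper's data.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters corrected for chance agreement."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical "ethical" / "unethical" verdicts for ten data segments.
driver = ["e", "e", "u", "e", "u", "e", "e", "u", "e", "e"]
llm    = ["e", "e", "u", "e", "e", "e", "e", "u", "e", "e"]
print(round(cohens_kappa(driver, llm), 2))  # prints 0.74
```

Values above 0.8, such as the paper's 0.81, are conventionally read as near-perfect agreement on the Landis-Koch scale.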