Predictive Modeling of Flight Delays at an Airport Using Machine Learning Methods


Hatıpoğlu I., Tosun Ö.

Applied Sciences (Switzerland), vol.14, no.13, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 14 Issue: 13
  • Publication Date: 2024
  • DOI: 10.3390/app14135472
  • Journal Name: Applied Sciences (Switzerland)
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Agricultural & Environmental Science Database, Applied Science & Technology Source, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
  • Keywords: flight delay prediction, forecasting, machine learning, performance analysis
  • Akdeniz University Affiliated: Yes

Abstract

Flight delays represent a significant challenge in the global aviation industry, resulting in substantial costs and a decline in passenger satisfaction. This study addresses the critical issue of predicting flight delays exceeding 15 min using machine learning techniques. The arrival delays at a Turkish airport are analyzed utilizing a novel dataset derived from airport operations. This research examines a range of machine learning models, including Logistic Regression, Naïve Bayes, Neural Networks, Random Forest, XGBoost, CatBoost, and LightGBM. To address the issue of imbalanced data, additional experiments are conducted using the Synthetic Minority Over-Sampling Technique (SMOTE), in conjunction with the incorporation of meteorological data. This multi-faceted approach ensures robust forecast performance under varying conditions. The SHAP (SHapley Additive exPlanations) method is employed to interpret the relative importance of features within the models. The study is based on a three-year period of flight data obtained from a Turkish airport. The dataset is sufficiently extensive and robust to provide a reliable foundation for analysis. The results indicate that XGBoost is the most proficient model for the dataset, demonstrating its potential to deliver highly accurate predictions with an accuracy of 80%. The impact of weather factors on the predictions is found to be insignificant in comparison to scenarios without weather data in this dataset.
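
The two preprocessing ideas named in the abstract, labeling an arrival as delayed when it exceeds 15 min and oversampling the minority class with SMOTE, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the toy flight records, the two feature columns, and the helper names `label_delay` and `smote` are hypothetical, and the interpolation below is a simplified version of SMOTE written only to show the idea.

```python
import random

DELAY_THRESHOLD_MIN = 15  # threshold stated in the abstract


def label_delay(arrival_delay_min):
    """Return 1 if the arrival delay exceeds 15 minutes, else 0."""
    return 1 if arrival_delay_min > DELAY_THRESHOLD_MIN else 0


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: build n_new synthetic points, each interpolated
    between a random minority sample and one of its k nearest minority
    neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy records: (scheduled hour, taxi-in minutes, arrival delay in minutes).
flights = [
    (8, 6.0, 3), (9, 5.0, 0), (10, 11.0, 22), (13, 7.0, 5), (17, 14.0, 41),
    (18, 8.0, 12), (19, 12.0, 30), (21, 6.0, 7), (22, 5.0, 2),
]

# Features exclude the delay itself; the label is derived from it.
X = [(float(hour), taxi) for hour, taxi, _ in flights]
y = [label_delay(delay) for _, _, delay in flights]

minority = [x for x, label in zip(X, y) if label == 1]
n_new = y.count(0) - y.count(1)  # synthetic samples needed to balance the classes
synthetic = smote(minority, n_new)

X_balanced = X + synthetic
y_balanced = y + [1] * n_new
print(y.count(1), y.count(0), "->", y_balanced.count(1), y_balanced.count(0))
```

In practice, oversampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class ratio; whether the study did so is not stated in the abstract.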