Abstract
|
The objective of this study is to develop a model that analyzes and predicts the occurrence of flight arrival delays in the United States. Macroscopic and microscopic delay factors are discussed. We propose new features which, to the best of our knowledge, have not been used in previous studies, namely the departure and arrival periods of the day (morning, afternoon, evening, night) and the type of aircraft. US domestic flight data for the year 2018, extracted from the Bureau of Transportation Statistics (BTS), were used to train the predictive model. We used Machine Learning classifiers such as Naive Bayes, Decision Trees, K-Nearest Neighbors and Random Forest. To overcome the issue of imbalanced data, sampling techniques were applied, and Grid Search was used to select the best parameters. The performance of each classifier was compared in terms of evaluation metrics, parameter tuning, data sampling and feature selection. As a result, Random Forest proved to be the best classifier, with an accuracy of 93.56% and a satisfactory classification. The experimental results showed that tuning and sampling techniques produced the best classifier, the Multilayer Perceptron (MLP), with an accuracy of 98.42% and a higher number of correctly classified flights.
|
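The period-of-day features and the data balancing step mentioned in the abstract can be illustrated with a minimal sketch. Several things here are assumptions rather than details taken from the study: the period boundaries (night from 00:00, morning from 06:00, afternoon from 12:00, evening from 18:00), the encoding of clock times as hhmm integers, the use of random oversampling as the sampling technique, and the helper names `period_of_day` and `random_oversample`, which are hypothetical.

```python
import random
from collections import Counter

# Assumed period boundaries (starting hour, label); not taken from the study.
PERIODS = [(0, "Night"), (6, "Morning"), (12, "Afternoon"), (18, "Evening")]


def period_of_day(hhmm: int) -> str:
    """Map a clock time encoded as an hhmm integer (e.g. 1435) to a period label."""
    hour = (hhmm // 100) % 24  # a value of 2400 is treated as midnight
    label = PERIODS[0][1]
    for start_hour, name in PERIODS:
        if hour >= start_hour:
            label = name
    return label


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (illustrative only): scheduled departure and arrival as hhmm, 1 = delayed arrival.
flights = [(545, 810), (930, 1215), (1435, 1702), (1820, 2105), (2350, 130), (1210, 1400)]
delayed = [0, 0, 0, 1, 0, 0]

features = [(period_of_day(dep), period_of_day(arr)) for dep, arr in flights]
balanced_x, balanced_y = random_oversample(features, delayed)
print(features[0], Counter(balanced_y))
```

In this sketch the categorical period labels would still need to be encoded numerically before being passed to a classifier, and any resampling should be applied to the training split only so that the evaluation data keep their original class distribution.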
Keywords
|
Machine Learning Classification, Flight Delay Prediction,
Multilayer Perceptron, Random Forest, Decision Trees,
K-Nearest Neighbors
|