Building a Flight Delay Prediction Model on papAI

Flight delays are a major inconvenience for travelers and have a significant impact on the airline industry. To limit their negative effects, it is important to predict accurately when and where delays are likely to occur. According to the EUROCONTROL report for the third quarter of 2022, the average departure delay per flight rose to 23.0 minutes in Q3 2022, up from 10.4 minutes in Q3 2021.


Using machine learning algorithms to analyze past flight data and find patterns related to delays is one method of anticipating flight delays. In this article, we’ll examine how to build a model from this data and apply it to forecast upcoming flights under similar conditions.

Overview of aircraft delays in Europe, Q3 2022


[Figure: three Q3 2022 indicators: the average all-causes departure delay per flight; the share of flights arriving within 15 minutes of, or earlier than, the scheduled arrival time; the proportion of reactionary delay in the average delay per flight]

The trend of higher delays has continued. In Q3 2022, the average delay per flight (the delay to a flight experienced by airlines, airports and passengers against the scheduled time) increased to a 5-year high of 23.0 minutes per flight. This compared to Q3 2021 where the average delay per flight was 10.4 minutes. 

The number of flights increased by 27% when compared to Q3 2021. However, in comparison to Q3 2019, there were 13% fewer flights. 

Arrival punctuality for the quarter also sharply deteriorated, with only 64.5% of flights arriving within 15 minutes or earlier than their scheduled arrival time (STA). 

Analysis into the causes of delay shows reactionary delay contributed the most to the average delay per flight at 11.0 minutes, with airline causes ranking second with 6.1 minutes per flight.

(Eurocontrol source)
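As a quick check on the figures quoted above, the share of each cause in the average delay is a simple division. The sketch below uses only the numbers cited in this section; the variable names are ours and purely illustrative.

```python
# Figures quoted above from the EUROCONTROL Q3 2022 report (minutes per flight).
avg_delay_q3_2022 = 23.0
avg_delay_q3_2021 = 10.4
reactionary_delay = 11.0
airline_delay = 6.1

# Share of the average delay attributable to each cause.
reactionary_share = reactionary_delay / avg_delay_q3_2022
airline_share = airline_delay / avg_delay_q3_2022

# Year-over-year growth factor of the average delay per flight.
growth = avg_delay_q3_2022 / avg_delay_q3_2021

print(f"Reactionary share of average delay: {reactionary_share:.1%}")
print(f"Airline-cause share of average delay: {airline_share:.1%}")
print(f"Average delay vs Q3 2021: x{growth:.2f}")
```

In other words, reactionary delay alone accounts for a little under half of the average delay, and the average delay slightly more than doubled year over year.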

What are the main factors contributing to flight delays?

Several factors can delay a flight, including:

Weather conditions: Flight delays may be caused by adverse weather conditions such as strong winds, thunderstorms and poor visibility. These conditions can disrupt normal air traffic patterns and make it difficult or impossible for aircraft to take off or land safely.

No-show passengers: Flight delays can also be brought on by no-show passengers. The airline may need to wait for a standby customer or rebook the seat when a passenger fails to show up for their flight, which can take time and cause the flight to depart later than scheduled.

Flight crew rest requirements: Mandatory crew rest is another cause of delays. To ensure safety and comply with regulations, airlines must give their pilots and flight attendants enough time to rest. If the crew has not completed its required rest, the flight cannot depart as planned.

Technical problems: Technical issues can also delay flights. These include faults in the aircraft’s mechanical systems, problems with the navigation system, or any other issue that must be fixed before the flight can depart.

Air traffic restrictions: Restrictions on air traffic can potentially delay flights. These can include limitations on the number of flights that can take off or land at a specific airport at a particular time, or limitations on the flight paths that are permitted.

Airport security lines: Flight delays might also result from long airport security lines. These could be lengthy waits at the TSA checkpoint or delays brought on by security measures like pat-downs or bag checks.


What are the impacts of flight delays?

Flight delays have a tremendous effect on the airline sector. Airlines may lose revenue as a result of delays, and their expenses for items like fuel and labour may rise as well. In the US alone, airline delays cost the economy an estimated $16 billion each year in lost productivity and other costs, according to research by the Global Business Travel Association.

The impact of flight delays can be felt in several areas:

Economic impact: Airlines may lose money as a result of flight delays, in addition to incurring higher costs for items like manpower and fuel. In the United States alone, flight delays are estimated to cost the economy $16 billion annually in lost productivity and other costs.

Reputation: Flight delays can damage an airline’s reputation, because customers may come to see the airline as unreliable and untrustworthy. In the long term, this can cost the airline customers and income.

Disruption of personal and professional schedules: Travelers may experience severe schedule interruptions, lost productivity, and increased stress as a result of delays.

Missed connections: Delays can cause travelers to miss connecting flights, which can cost them more in lodging, transportation, and other fees.

Environmental impact: The environment might be negatively impacted by flight delays since extra fuel may need to be burned while the planes are waiting to take off or get to their destination.



In this use case, we train an ML model that predicts whether a flight will be delayed or not, which could help the industry plan better and reduce these extra costs.

We focus on a dataset of US domestic flights from 2018, with around 7 million rows and 28 columns describing each flight, which we will process on the papAI platform.

1- Importing the dataset into the papAI platform

With the papAI platform’s data import tools, you can import any type of data from a wide range of sources, and the import stays seamless and fast regardless of the dataset’s size. In our case, we import the dataset from the local machine, check the preview to confirm that the expected schema is detected, and adjust the import settings where needed so that the data lands correctly in the papAI project.
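To make the idea of a schema preview concrete, here is a minimal sketch in plain Python of what such a check does: read the first rows of a CSV and guess a type for each column. This is not papAI’s implementation; the sample data, the column names and the helper functions `infer_type` and `preview_schema` are all hypothetical.

```python
import csv
import io

# Tiny in-memory stand-in for the real file; column names are hypothetical.
SAMPLE = """fl_date,carrier,distance,arr_delay
2018-01-01,AA,733,12.0
2018-01-01,DL,1024,-4.0
2018-01-02,AA,733,
"""


def infer_type(values):
    """Guess a column type from its non-empty sample values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def preview_schema(text, n_rows=100):
    """Read up to n_rows rows and return {column name: guessed type}."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [row for _, row in zip(range(n_rows), reader)]
    return {col: infer_type([r[col] for r in rows]) for col in reader.fieldnames}


schema = preview_schema(SAMPLE)
print(schema)
```

A preview like this is where you would notice, for example, a delay column detected as text because of an unexpected missing-value marker, and correct the import settings before loading all rows.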

2- Exploring and analyzing the data

Before cleaning, we need to explore what the raw data has to offer. For this purpose, the papAI platform provides statistics and data visualization tools as soon as a dataset is imported, so you can spot interesting patterns early. In our case, American Airlines accumulated around 3.5 million minutes of arrival delay in 2018, the equivalent of nearly 7 years. By contrast, its rival Delta Air Lines had a cumulative arrival delay of about -61 thousand minutes: added up over all its US domestic flights in 2018, its flights arrived nearly a month and a half earlier than scheduled. These simple, intuitive tools make such insights easy to reach.
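The cumulative figures above come from summing arrival delay per carrier and converting minutes into larger units. The sketch below reproduces both steps in plain Python; the sample rows and the helper names are hypothetical, and only the two totals at the end (3.5 million and 61 thousand minutes) come from the article.

```python
from collections import defaultdict

# Hypothetical sample rows: (carrier code, arrival delay in minutes).
# Negative values mean the flight arrived early.
flights = [
    ("AA", 25.0),
    ("AA", -5.0),
    ("DL", -12.0),
    ("DL", 4.0),
    ("AA", 40.0),
]


def total_delay_by_carrier(rows):
    """Sum arrival delay (minutes) per carrier."""
    totals = defaultdict(float)
    for carrier, delay in rows:
        totals[carrier] += delay
    return dict(totals)


def minutes_to_days(minutes):
    """Convert a duration in minutes to days."""
    return minutes / (60 * 24)


totals = total_delay_by_carrier(flights)
print(totals)

# Unit conversions for the totals quoted in the article.
aa_years = minutes_to_days(3_500_000) / 365  # cumulative delay in years
dl_days = minutes_to_days(61_000)            # cumulative early arrival in days
print(f"{aa_years:.1f} years, {dl_days:.0f} days")
```

Note that a cumulative total mixes the size of a carrier with its punctuality, so a per-flight average is a useful companion figure when comparing airlines.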

3- Preparing the dataset for training the ML model

After the EDA step, we have a clear view of what the prediction should deliver. papAI lets you clean and preprocess the raw data either with the simple operations included in the Cleaning module or with more complex operations written as Python, R, or SQL recipes. In our case, we drop the columns that describe the reason for the delay, along with any other information that is not available before departure, and we engineer a target feature indicating whether the flight is delayed or not. The result is a dataset that contains only data known before the aircraft actually departs, so the model is trained on the same information it will have when making a prediction.
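As an illustration of this preparation step, the sketch below drops columns that are only known after departure and derives a binary target. The column names are assumptions, since the article does not list the dataset’s schema, and the 15-minute threshold is also an assumption, borrowed from the punctuality definition quoted earlier rather than from the article’s actual recipe.

```python
# Assumed column names; the article does not list the dataset's schema.
POST_DEPARTURE_COLUMNS = {
    "dep_delay", "carrier_delay", "weather_delay", "late_aircraft_delay",
}
TARGET_SOURCE = "arr_delay"
DELAY_THRESHOLD_MIN = 15  # assumption: delayed = arriving more than 15 minutes late


def prepare(rows):
    """Return (features, labels) using only pre-departure information."""
    features, labels = [], []
    for row in rows:
        delay = row.get(TARGET_SOURCE)
        if delay is None:  # e.g. a cancelled flight has no arrival delay
            continue
        labels.append(1 if delay > DELAY_THRESHOLD_MIN else 0)
        features.append({
            name: value for name, value in row.items()
            if name not in POST_DEPARTURE_COLUMNS and name != TARGET_SOURCE
        })
    return features, labels


raw = [
    {"carrier": "AA", "crs_dep_hour": 18, "distance": 733,
     "dep_delay": 30.0, "weather_delay": 0.0, "arr_delay": 42.0},
    {"carrier": "DL", "crs_dep_hour": 7, "distance": 1024,
     "dep_delay": -2.0, "weather_delay": None, "arr_delay": -9.0},
    {"carrier": "AA", "crs_dep_hour": 9, "distance": 733,
     "dep_delay": None, "weather_delay": None, "arr_delay": None},
]
X, y = prepare(raw)
print(X, y)
```

Keeping the list of excluded columns explicit makes it easy to review for leakage: any column that is filled in only after the flight has departed should appear in it.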

4- Choosing the right model

Once the dataset is set up correctly, you create experiments to test several models and find the one that best answers your needs. papAI offers a wide catalog of built-in algorithms, so you can try multiple options and tune their settings to build your ML pipeline. Our use case is a binary classification: predicting whether a flight is on time or delayed. The experiment creation panel lets you adjust every setting needed to define and train that pipeline.
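The principle behind running several experiments can be sketched without any platform: fit each candidate on a training split, score it on held-out data, and keep the best. The example below is a deliberately tiny illustration in plain Python, not papAI’s algorithm catalog; the toy data and the two candidates (a majority-class baseline and a one-feature decision stump) are our own assumptions.

```python
# Toy data: (scheduled departure hour, label), with 1 = delayed and 0 = on time.
train = [(6, 0), (7, 0), (9, 0), (12, 0), (15, 1),
         (17, 1), (19, 1), (21, 1), (8, 0), (18, 0)]
valid = [(7, 0), (10, 0), (16, 1), (20, 1), (13, 0)]


def fit_majority(rows):
    """Baseline: always predict the most frequent class in the training data."""
    ones = sum(y for _, y in rows)
    majority = 1 if ones * 2 > len(rows) else 0
    return lambda x: majority


def fit_stump(rows):
    """Decision stump: predict 'delayed' when the feature is >= the best threshold."""
    best_t, best_correct = None, -1
    for t in sorted({x for x, _ in rows}):
        correct = sum((1 if x >= t else 0) == y for x, y in rows)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return lambda x: 1 if x >= best_t else 0


def accuracy(model, rows):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


candidates = {"majority": fit_majority(train), "stump": fit_stump(train)}
scores = {name: accuracy(model, valid) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Including a trivial baseline among the candidates is a useful habit: a model is only worth promoting if it clearly beats the score you get by always predicting the most common class.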

5- Evaluating the model’s performance

Once the settings are in place, you can train your experiments. When training is done, click into one of the runs to open the evaluation interface, which provides metrics, tools, and graphs to help you decide which model to promote. Beyond evaluation, the interpretability module shows in depth what influences the model’s predictions. In our case, the model reaches an accuracy of 87%.
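Accuracy is easier to interpret alongside other metrics, especially when delayed flights are a minority of the data. The sketch below computes accuracy, precision and recall from a confusion matrix in plain Python; the labels and predictions are toy values chosen for illustration and are unrelated to the 87% figure above.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 (delayed) as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Accuracy, precision and recall for a binary classifier."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Imbalanced toy example: 8 on-time flights and 2 delayed ones.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_on_time = [0] * 10                     # never predicts a delay
some_model = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]  # catches one of the two delays

print(metrics(y_true, always_on_time))
print(metrics(y_true, some_model))
```

Both sets of predictions reach the same accuracy, yet the first never detects a delay. This is why the evaluation step is worth reading beyond the headline accuracy figure.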

6- Selecting the right model and predicting on a raw dataset

Once you are confident you have chosen the right model for your needs, you promote it and apply it to any other dataset to check how well it predicts the target. In addition, for each individual prediction, you can explore how much each value contributed to the result, and you can change some values to see how the prediction is affected. This is the local interpretability module.

When you are satisfied with the results, you can deploy your model into your local environment and use it to reach the goals of your use case.
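To show what a local explanation and a what-if test look like in the simplest case, the sketch below uses a small linear (logistic) scoring function. Everything in it is hypothetical: the feature names, the hand-set weights and the helper functions are ours, and papAI’s interpretability module may rely on a different technique. For a linear model, each feature’s contribution to the log-odds is simply its weight times its value.

```python
import math

# Hypothetical weights for illustration only; not taken from the article's model.
WEIGHTS = {"departure_hour": 0.15, "distance_100mi": -0.02, "is_hub_origin": 0.4}
BIAS = -2.5


def predict_proba(features):
    """Probability of delay under a logistic model with the weights above."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def contributions(features):
    """Per-feature contribution to the log-odds (weight * value)."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}


def what_if(features, **changes):
    """Change in predicted probability when some feature values are replaced."""
    modified = {**features, **changes}
    return predict_proba(modified) - predict_proba(features)


flight = {"departure_hour": 18, "distance_100mi": 10, "is_hub_origin": 1}
print(f"P(delayed) = {predict_proba(flight):.2f}")
print(contributions(flight))
print(f"Effect of moving departure to 7h: {what_if(flight, departure_hour=7):+.2f}")
```

With these made-up weights, moving the evening flight to an early-morning slot lowers its predicted probability of delay, which is the kind of assumption you would test interactively for a single prediction.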


In conclusion, flight delays are a major problem for the airline industry. They cause significant inconvenience and cost to travelers and airlines alike, not only economically but also in terms of reputation, disrupted schedules, missed connections, and environmental impact. As passenger numbers increase, the problem is likely to get worse unless the industry takes action to reduce delays. By using platforms such as papAI, implementing effective processes, and continuously monitoring and analyzing data, the industry can limit the impact of flight delays and improve the overall traveler experience.

Interested in discovering papAI?

Our commercial team is at your disposal to answer any questions.
