(2024)). These studies also suggest that deep
learning models that partly incorporate physical
processes through physical constraints yield more
accurate predictions (de Bezenac et al., 2020).
ResNet-based models and deep convolutional networks
have shown promising results for medium-range
forecasting (Rasp & Thuerey, 2021), and adaptive
Fourier neural operators have been applied
successfully to high-resolution forecasts (Pathak et
al., 2022). Moreover, graph neural networks have
predicted weather patterns efficiently while showing
good spatial awareness (Sønderby et al., 2020). In
addition, AI-powered precipitation forecasting models
such as MetNet have significantly improved short-term
weather forecasts (Zhang et al., 2021).
The integration of convolutional neural networks into
satellite image analysis has played an important role
in storm detection and severe weather prediction
(Molchanov et al., 2021). Other studies have explored
fusing IoT sensor data with AI techniques for
real-time weather monitoring (Sharma et al., 2022).
Big data analytics incorporated into cloud-based
prediction systems has also greatly enhanced AI's
predictive capabilities (Weyn et al., 2020). In
addition, ensemble learning and data assimilation
methods are being investigated to identify the
best-performing machine learning model for weather
forecasting (Kashinath et al., 2021). Recent
developments aimed at improving prediction accuracy
in meteorology include physics-informed deep learning
methods (Evensen & Monsen, 2021). Together, these
efforts illustrate the growing role of artificial
intelligence, deep learning, and big data analytics
in modern weather forecasting, pointing toward more
reliable and intelligent predictive systems.
3 DATASET COLLECTION AND
PRE-PROCESSING
3.1 Dataset Collection
To inform the forecasts, this project intends to use
diverse, high-quality data from reputable sources (Yu
et al., 2024). The data sources include satellite
observations, ground-based stations, remote sensing
technologies, Internet of Things (IoT) enabled
sensors, and historical weather records. Satellite
observations provide a wide variety of atmospheric
data, including temperature, humidity, cloud cover,
wind speed, and precipitation (Ben-Bouallegue et al.,
2023). These data form the basis for characterizing
long-term weather patterns and severe weather events
on a broad scale (Anandkumar, 2024). Ground-based
weather stations supply real-time meteorological data
on atmospheric pressure, wind direction, temperature
variations, and precipitation (Anandkumar, 2024).
These measurements play a major role in complementing
satellite readings by adding precision to local
forecasts (de Bezenac et al., 2024). Remote sensing
technologies such as Doppler radar and LiDAR add the
capability to observe clouds, storm intensity, and
wind currents, which further supports detailed
short-term weather prediction (Rasp & Thuerey, 2021).
These technologies allow close monitoring of extreme
phenomena such as hurricanes and thunderstorms (Weyn
et al., 2021). Locally deployed IoT-enabled weather
sensors collect real-time meteorological data that
enhance microclimate analysis and short-term
forecasting (Pathak et al., 2021). Long-term
historical weather records gathered from sources
including NOAA, Kaggle, and meteorological agencies
form the basis for training machine learning models
to detect trends and forecast future events (Sønderby
et al., 2020). Weather APIs such as OpenWeatherMap,
Weatherstack, and ClimaCell offer public access to
weather data for locations across the globe, easing
the meteorologist's task of producing accurate
forecasts.
3.2 Data Pre-Processing
Raw meteorological data are often marked by
incompleteness, inconsistency, and noise, so several
preprocessing phases are performed to clean,
normalize, and structure the dataset and thereby
provide high-quality input to the machine learning
models (Ben-Bouallegue et al., 2023).
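As an illustration of the cleaning phase, the
following minimal Python sketch fills gaps in a
series by linear interpolation and then suppresses
isolated spikes with a median filter. It is not the
project's actual pipeline: the function names, the
window size of 3, and the sample temperature readings
are assumptions chosen only for illustration.

```python
from statistics import median


def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest known readings.

    Gaps at the start or end of the series take the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def median_filter(values, window=3):
    """Replace each reading with the median of a centred window (shrunk at the edges)."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Hypothetical hourly temperatures in degrees Celsius.
# None marks a dropped reading; 35.0 is an artificial sensor spike.
raw = [12.0, None, 13.0, 35.0, 13.5, None, None, 15.0]
filled = interpolate_missing(raw)
smoothed = median_filter(filled, window=3)
```

In this sketch the gaps are filled before filtering so
that the median window never sees a missing value. A
median filter is used rather than a moving average
because a single spike is discarded outright instead
of being spread across neighbouring readings.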
Dealing with missing data: Missing values in a
weather dataset can result from sensor failures or
incomplete transmissions. The most commonly used
statistical imputation strategies include mean,
median, or mode replacement and interpolation (de
Bezenac et al., 2020). Noise reduction: Environmental
conditions can affect sensor readings, causing
fluctuations in the recorded values. Moving
averages, median filters, and outlier removal
algorithms are helpful for smoothing the data and