projections can be generated using cohort-component
models and adjusted dynamically using LSTM
networks fed with real-time migration, birth, and
administrative data. Such integration improves both
the accuracy and responsiveness of forecasts.
3 LITERATURE ANALYSIS: APPLICATIONS AND FORECASTING RESULTS
3.1 Datasets Used
The effectiveness of any population forecasting
model depends significantly on the quality, resolution,
and granularity of the datasets used. Across the
reviewed literature, we observe a diversity of data
sources ranging from national census and
administrative registers to high-resolution spatial
grids and real-time datasets.
Traditional demographic projections, as represented
by Aryal, rely mainly on census data, vital registration
systems (including birth and death registrations), and
migration statistics aggregated at the national or
subnational level. These datasets are typically
collected every ten years and constitute the basis for
structured forecasting models, such as
cohort-component projections. While these sources
are standardized and consistent, they often lack
timeliness and are not sufficient to capture rapid
changes in population dynamics, especially in times
of crisis.
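To make the role of these inputs concrete, the following is a minimal sketch of a single cohort-component projection step. It is an illustration under simplifying assumptions, not the procedure of any reviewed study: the function name `project_one_step`, the three broad age groups, the fixed `female_share`, and all numbers are hypothetical, and infant mortality within the interval is ignored.

```python
def project_one_step(population, survival, fertility, net_migration,
                     female_share=0.5):
    """Advance an age-structured population by one projection interval.

    population[i]    : people in age group i at the start of the interval
    survival[i]      : share of group i surviving into group i + 1
                       (the last entry is survival within the open-ended group)
    fertility[i]     : births per woman in group i over the interval
    net_migration[i] : net migrants added to group i at the end of the interval
    """
    n = len(population)
    if not (len(survival) == len(fertility) == len(net_migration) == n):
        raise ValueError("all inputs must have one entry per age group")

    # Births: women in each age group times their fertility rate.
    births = sum(p * female_share * f for p, f in zip(population, fertility))

    projected = [0.0] * n
    projected[0] = births
    # Ageing: survivors of group i move up into group i + 1.
    for i in range(n - 1):
        projected[i + 1] = population[i] * survival[i]
    # The oldest group is open-ended, so its own survivors remain in it.
    projected[-1] += population[-1] * survival[-1]

    # Migration is applied last, as a simple additive adjustment.
    return [p + m for p, m in zip(projected, net_migration)]


# Illustrative numbers only (three broad age groups, not real data).
base = [1000.0, 800.0, 600.0]
result = project_one_step(
    population=base,
    survival=[0.98, 0.95, 0.50],
    fertility=[0.0, 0.5, 0.0],
    net_migration=[10.0, 20.0, -5.0],
)
```

Each argument corresponds to one of the data sources named above: the base population to the census, survival and fertility rates to vital registration, and the migration vector to migration statistics. Because every input is fixed at its last observed value, the step cannot react to changes that occur between data collections.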
Probabilistic and spatial modeling approaches,
represented by the work of Vollset et al. and Chen et
al., utilize more extensive demographic and
socioeconomic information. For example, the Global
Burden of Disease study utilized the World Population
Prospects, United Nations projections, and national
statistical databases to make long-term demographic
projections for 195 countries. Chen et al. extended this
methodology in the Chinese setting by integrating
gridded population data with Shared Socioeconomic
Pathways (SSPs) to facilitate high-resolution
projections under alternative climate and development
futures. Such models call for harmonization of
heterogeneous data sources, which involves intricate
preprocessing and spatial coordination procedures
(Vollset et al., 2020; Chen et al., 2020).
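One simple form that such harmonization can take is rescaling a gridded population surface so that its cells sum to a scenario-level total. The sketch below assumes plain proportional scaling, which is a deliberately simplified stand-in and not the method documented by Chen et al.; the function name `rescale_grid_to_total`, the 2 x 2 grid, and the scenario totals are all hypothetical.

```python
def rescale_grid_to_total(grid, target_total):
    """Scale every cell so the grid sums to target_total.

    The relative share of each cell is preserved, so only the overall
    level changes, not the spatial pattern.
    """
    current_total = sum(sum(row) for row in grid)
    if current_total <= 0:
        raise ValueError("grid must contain a positive population")
    factor = target_total / current_total
    return [[cell * factor for cell in row] for row in grid]


# Hypothetical base-year grid and scenario totals (not real data).
base_grid = [[100.0, 300.0],
             [50.0, 50.0]]
scenario_totals = {"SSP1": 550.0, "SSP3": 600.0}

scenario_grids = {
    name: rescale_grid_to_total(base_grid, total)
    for name, total in scenario_totals.items()
}
```

Proportional scaling keeps the grid consistent with the aggregate projection but assumes that the spatial distribution does not change over time, which is one reason published approaches rely on more elaborate preprocessing.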
Spatial studies, such as Sang et al., relied on
county-level population data in China combined with
satellite-based urbanization metrics and geocoded
administrative boundaries (Sang et al., 2024). Such
fine-grained datasets are essential for modeling urban
growth and spatial population distribution, but they
are limited by availability and standardization,
especially across developing countries.
AI and machine learning methods rely even more
heavily on comprehensive and real-time data.
Grossman et al. used small-area data from Australia’s
national statistical agency, which included historical
population counts and migration flows over multiple
decades (Grossman et al., 2023). The LSTM model
they developed required time series input data
structured into temporal windows, making continuous,
high-frequency datasets essential for model training.
Similarly, Papastefanopoulos et al. evaluated
forecasting models using COVID-19 case data
expressed as percentages of active cases per
population. The time-sensitive nature of such data
reflects the strengths of AI in responding to
fast-moving demographic phenomena.
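The temporal-window structure mentioned above can be illustrated with a short sketch. It shows only the generic sliding-window preparation commonly used before training sequence models; it is not the preprocessing pipeline of Grossman et al., and the function name `make_windows`, the window length, and the sample counts are hypothetical.

```python
def make_windows(series, window, horizon=1):
    """Split a time series into (input window, target) pairs.

    Each sample pairs `window` consecutive observations with the value
    observed `horizon` steps after the end of that window.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be at least 1")
    samples = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((inputs, target))
    return samples


# Hypothetical annual population counts for one small area.
counts = [100, 102, 105, 109, 114, 120]
pairs = make_windows(counts, window=3)
```

With six observations and a window of three, only three training samples result. This is why short or irregular series are a practical obstacle for such models: a missing observation invalidates every window that overlaps it.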
Overall, data quality and availability remain critical
barriers to forecasting performance. While
traditional models tolerate coarse, static data,
probabilistic and AI-based approaches demand
detailed, structured, and consistent inputs that are not
universally accessible.
3.2 Comparative Analysis of Forecasting Results
The reviewed studies highlight distinct performance
characteristics across different forecasting techniques,
each with context-dependent strengths.
In deterministic models such as the
cohort-component approach, Aryal found that forecasts
were generally accurate in countries with stable
population structures and low migration volatility
(Aryal, 2020). However, these models underperformed
in regions experiencing sudden demographic shifts.
For example, Kim and Kim showed that even when
applied sub-nationally, deterministic projections
underestimated the speed of population aging in
South Korea’s rural areas (Kim & Kim, 2020). The
model’s simplicity, while advantageous for
interpretability, leads to rigidity in uncertain or
rapidly evolving conditions.
Probabilistic models introduce robustness by
providing forecast intervals rather than single
estimates. Vollset et al. demonstrated that this
approach reduced errors in long-term global
projections. Their simulations, incorporating
uncertainty in fertility and migration, produced more
realistic estimates, especially for developing
countries with unstable demographic indicators
(Vollset et al., 2020). Yu et al. validated this strength
at the county level, where their Bayesian model