energy-efficient DNNs, contributing to sustainable
mobile AI solutions.
2 RELATED WORK
Liu et al. (2023) propose an energy-constrained pruning method that uses the energy budget as an explicit optimization constraint to reduce computational and memory costs. While
effective, the method assumes consistent energy
budgets across all deployment scenarios, which may
not always be realistic. It may also struggle in highly
dynamic environments with fluctuating
computational loads. The reliance on fine-tuning after
each pruning iteration ensures accuracy retention but
can be computationally intensive, especially for
large-scale networks. Additionally, pruning methods
based on the Frobenius norm might overlook other
factors affecting energy consumption, such as data
movement or memory access. These limitations could
hinder real-world scalability for edge devices.
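For concreteness, a magnitude-based criterion of this kind can be sketched as follows; this is a minimal illustration of Frobenius-norm filter ranking, not Liu et al.'s full energy-constrained procedure, and the shapes and the number of filters pruned are hypothetical:

```python
import numpy as np

def rank_filters_by_frobenius_norm(conv_weights):
    """Rank Conv2d filters by Frobenius norm (illustrative criterion only).

    conv_weights: array of shape (out_channels, in_channels, kh, kw).
    Returns filter indices ordered from smallest to largest norm, i.e.
    the filters a norm-based criterion would prune first, plus the norms.
    """
    flat = conv_weights.reshape(conv_weights.shape[0], -1)
    norms = np.linalg.norm(flat, axis=1)
    return np.argsort(norms), norms

# Example: 16 filters of shape 3x3x3; mark the 4 smallest-norm filters for pruning.
weights = np.random.randn(16, 3, 3, 3)
order, norms = rank_filters_by_frobenius_norm(weights)
prune_candidates = order[:4]
```

Because the score depends only on weight magnitude, it says nothing about data movement or memory access, which is precisely the gap noted above.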
Guo et al. (2023) propose an AR-RNN model for
predicting building energy consumption with limited
historical data, achieving a 5.72% MAPE. However,
the model may introduce bias when faced with
significant shifts in usage patterns or external factors
like weather changes. Its sensitivity to the quality of
data preprocessing, particularly dimensionality reduction and interpolation, poses challenges.
Important features might be removed, leading to
reduced accuracy in more complex or variable
scenarios. Furthermore, limited historical data
inherently constrains the model’s generalizability to
different buildings, making it difficult to scale across
diverse environments without further adaptations or
additional data sources.
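For reference, the MAPE figure quoted above is the mean absolute percentage error between predicted and measured consumption; a minimal sketch, with hypothetical readings:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (assumes y_true has no zeros)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Hypothetical hourly energy readings (kWh) vs. model predictions.
print(mape([120.0, 95.0, 130.0], [115.0, 99.0, 126.0]))  # ~3.8%
```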
Zhao et al. (2023) introduce a divide-and-co-training strategy for achieving better accuracy-efficiency trade-offs. By dividing a large network into
smaller subnetworks and training them
collaboratively, the method enhances performance
and allows for concurrent inference. However,
uneven distribution of tasks across subnetworks can
cause bottlenecks. Improvements also rely heavily on
multi-device availability, which may not be feasible
in all deployment environments. Synchronization
during co-training introduces potential
communication overheads, slowing the training
process. Additionally, co-training effectiveness may
be affected by the quality of data augmentation or
sampling techniques used, limiting the approach’s
efficiency gains on certain datasets.
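The core idea of splitting width and ensembling the parts can be illustrated structurally as follows; this is a minimal sketch that does not reproduce Zhao et al.'s collaborative training, and the architecture and layer sizes are placeholders:

```python
import tensorflow as tf

def small_subnet(width, num_classes=10, name=None):
    """One narrow subnetwork; several of these together replace one wide model."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(width, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ], name=name)

# Split a hypothetical 64-filter model into two 32-filter subnetworks.
subnets = [small_subnet(32, name=f"subnet_{i}") for i in range(2)]

# Each subnetwork is trained (collaboratively in the original strategy);
# at inference their predictions are ensembled, e.g. by simple averaging.
x = tf.random.normal((1, 32, 32, 3))
prediction = tf.reduce_mean(tf.stack([net(x) for net in subnets]), axis=0)
```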
Qin et al. (2023) propose a collaborative learning
framework for dynamic activity inference, designed
to adapt to varying computational budgets by
adjusting network width and input resolution.
However, the framework assumes all configurations
are equally effective, which may not hold in scenarios
involving complex or noisy data. Overfitting can
occur due to excessive knowledge sharing between
subnetworks. Moreover, relying on predefined
configurations limits adaptability to unforeseen
resource constraints or new devices. This
framework’s scalability and robustness under real-world conditions may require further enhancements, such as more adaptive configuration strategies or dynamic input data analysis.
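A simple way to picture the predefined-configuration limitation is a fixed lookup of (width multiplier, input resolution) options selected against a budget; the options, costs, and budget below are purely illustrative:

```python
# Hypothetical (width multiplier, input resolution) -> cost table,
# e.g. in MFLOPs or milliseconds; values are illustrative only.
CONFIG_COSTS = {
    (1.00, 224): 300.0,
    (0.75, 192): 170.0,
    (0.50, 160): 80.0,
    (0.25, 128): 25.0,
}

def pick_config(budget):
    """Pick the most expensive predefined configuration that fits the budget."""
    feasible = {cfg: c for cfg, c in CONFIG_COSTS.items() if c <= budget}
    if not feasible:
        return min(CONFIG_COSTS, key=CONFIG_COSTS.get)  # fall back to the cheapest option
    return max(feasible, key=feasible.get)

print(pick_config(100.0))  # -> (0.5, 160) under these illustrative numbers
```

Any budget or device profile that falls between (or outside) the tabulated options can only be served approximately, which is the adaptability concern raised above.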
Yang et al. (2023) propose a method emphasizing
the role of data movement in energy consumption for
DNNs. While focusing on memory access
optimization, the framework assumes static memory
hierarchies and accurate hardware energy metrics,
which may not reflect real-world variability across
different devices. These assumptions can lead to
inaccurate energy estimations in dynamic or rapidly
evolving hardware environments. Additionally,
optimizing memory access patterns is not always
feasible for highly flexible or frequently changing
DNN architectures. The methodology also does not
account for potential hardware-software co-design
challenges, which could impact its utility in more
diverse deployment scenarios.
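The energy accounting such approaches rely on can be sketched as a sum of compute and data-movement terms; the per-operation energy constants below are ballpark placeholders, not measurements from Yang et al., and in practice they vary across devices, which is exactly the variability the paragraph above points out:

```python
def estimate_layer_energy(macs, dram_accesses, sram_accesses,
                          e_mac=4.6e-12, e_dram=640e-12, e_sram=5e-12):
    """Rough layer energy estimate (Joules): compute term + data-movement terms.

    e_mac, e_dram, e_sram are assumed per-operation energies (placeholders);
    DRAM accesses dominate when data movement is not optimized.
    """
    return macs * e_mac + dram_accesses * e_dram + sram_accesses * e_sram

# Example: 1e9 MACs, 2e6 DRAM accesses, 5e7 SRAM accesses.
print(estimate_layer_energy(1e9, 2e6, 5e7) * 1e3)  # ~6.1 mJ under these assumed constants
```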
Giedra and Matuzevicius (2023) investigate the prediction of inference times for TensorFlow Lite models across different platforms. They evaluate the inference time of Conv2d layers to identify factors such as input size, filter size, and hardware architecture that impact computational efficiency. Their methodology, based on Multilayer Perceptron (MLP) models, achieves high prediction accuracy on CPUs but faces challenges on resource-limited devices like the Raspberry Pi 5 due to data variance and limited input channels. The study emphasizes the
need for hardware-specific optimizations to improve
inference time predictions across various devices.
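A per-layer latency predictor of this kind could, for instance, be set up as follows; the features, measurements, and hyperparameters here are placeholders rather than the authors' configuration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training set: per-layer Conv2d features -> measured latency (ms).
# Feature order: [input_h, input_w, in_channels, out_channels, kernel_size, stride]
X = np.array([
    [224, 224,   3,  32, 3, 2],
    [112, 112,  32,  64, 3, 1],
    [ 56,  56,  64, 128, 3, 2],
    [ 28,  28, 128, 128, 3, 1],
])
y = np.array([4.1, 6.3, 3.8, 2.2])  # illustrative measurements only

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(64, 64),
                                   max_iter=5000, random_state=0))
model.fit(X, y)
print(model.predict([[56, 56, 64, 128, 3, 1]]))  # predicted latency for an unseen layer
```

In practice such a regressor would be trained per hardware target, since the same feature vector maps to very different latencies on different devices.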
M. B. Hossain et al. (2023) focus on optimizing TensorFlow Lite models for low-power systems, primarily through CNN inference time prediction. The authors explore strategies such as pruning, quantization, and Neural Architecture Search (NAS) to reduce model complexity while maintaining accuracy, and propose a methodology that uses Conv2d layers as the basis for predicting time complexity, identifying dependencies between CNN architecture and inference efficiency. The study highlights the critical role of layer configurations and hyperparameter tuning in enhancing model performance on edge devices.
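One common proxy for such per-layer time complexity is the multiply-accumulate (MAC) count of a Conv2d layer, sketched below; this is the standard formula rather than necessarily the exact predictor used in the cited study:

```python
def conv2d_macs(h_in, w_in, c_in, c_out, k, stride=1, padding=0):
    """Multiply-accumulate count of a standard Conv2d layer.

    MACs = H_out * W_out * C_out * (K * K * C_in), with
    H_out = floor((H_in + 2*padding - K) / stride) + 1 (and likewise for W_out).
    """
    h_out = (h_in + 2 * padding - k) // stride + 1
    w_out = (w_in + 2 * padding - k) // stride + 1
    return h_out * w_out * c_out * k * k * c_in

# Example: 3x3 conv from 64 to 128 channels on a 56x56 feature map (stride 1, pad 1).
print(conv2d_macs(56, 56, 64, 128, 3, stride=1, padding=1))  # 231,211,008 MACs
```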