Applications and Developments of Deep Learning in Image Processing

Xiang Wang

School of Business & IT, James Cook University, Singapore Campus, Singapore

Keywords: Deep Learning, Image Processing, Convolutional Neural Networks (CNNs), Generative Adversarial

Networks (GANs), Medical Image Analysis.

Abstract: Deep learning has transformed image processing and is now the most effective approach for many practical problems, including object recognition, image segmentation, and image enhancement. This paper reviews deep learning techniques for image processing and relates them to applications and trends in healthcare, security, and self-driving cars. CNNs and GANs have become two indispensable techniques for medical image analysis, facial recognition, and synthetic image generation. Moreover, advances in deep learning architectures, such as transformer models, and in training methods, such as self-supervised learning, have further improved performance and generalizability. However, high computational requirements, limited data availability, and concerns about fairness and privacy remain unresolved. Drawing on recent literature, this paper discusses the major trends and potential developments that may address these challenges, and outlines the role of deep learning in enhancing image processing technologies.

1 INTRODUCTION

Deep learning has been helpful in areas such as medical image processing, self-driving automobiles, security surveillance, and industrial robotics (Boopathi, Pandey, & Pandey, 2023). Earlier approaches to computer vision relied on hand-crafted feature extraction, which hampered generalization across datasets and use cases. The field has since been transformed by deep learning models, particularly CNNs, which learn a hierarchical representation of the image data (GeeksforGeeks, 2020).

Computer vision based on deep learning performs at a different level than earlier machine learning techniques. Tsirtsakis et al. found that deep learning recognition surpassed feature engineering techniques in precision and speed and performed better on automated vision tasks. Deep learning has therefore been widely adopted, and it is now one of the essential

techniques for solving many related vision tasks

(Tsirtsakis et al., 2025).

Deep learning has been one of the most significant advances in image processing because of its ability to exploit large datasets. Deep neural networks can learn rich patterns from massive datasets, which allows them to detect, segment, and classify objects accurately (Noor & Ige, 2025). Architectures such as ResNet, DenseNet, and Vision Transformers (ViT) have emerged to improve model efficiency and scalability and to tackle issues such as vanishing gradients and inefficient computation (Yan et al., 2025).

The influence of deep learning goes beyond research to practical applications. In healthcare, AI-driven radiology and pathology analysis has enabled earlier disease detection and improved patient outcomes (Pinto-Coelho, 2023). In autonomous driving, deep learning models power real-time perception systems that enable safe navigation in dynamic environments (Gupta et al., 2021). In addition,

Wang, X.

Applications and Developments of Deep Learning in Image Processing.

DOI: 10.5220/0013687900004670

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 2nd International Conference on Data Science and Engineering (ICDSE 2025), pages 277-284

ISBN: 978-989-758-765-8

industries have utilized AI-powered quality control

systems to increase production efficiency and reduce

defects (Finio & Downie, 2024).

However, deep learning in image processing is not without its challenges. Open research problems include high computational costs, the need for large labeled datasets, and limited model interpretability. Solving these problems will be critical to making deep learning systems more broadly accessible and reliable. Future research will seek to increase model efficiency, reduce dependency on large datasets, and enhance explainability to promote broader adoption in industrial scenarios.

2 DEEP LEARNING ARCHITECTURES FOR IMAGE RECOGNITION

Deep learning architectures, and convolutional neural networks in particular, are largely responsible for the rise of image recognition because of the significant improvements in accuracy, speed, and scalability they provide. Architectures that have reshaped the field include CNNs, ResNet, ViT, and other attention-based models. They address important issues such as feature extraction, vanishing gradients, and long-range dependency modeling, making them indispensable in today’s AI solutions.

2.1 Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are the deep learning models most widely used for image recognition. They use convolutional layers for spatial feature extraction, pooling layers for feature size reduction, and fully connected layers for classification (Zhao et al., 2024). This layered design lets CNNs extract both low-level and high-level features from images, which improves accuracy and robustness.
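The convolution, activation, and pooling stages described above can be illustrated with a minimal pure-Python sketch. The toy image, the vertical-edge kernel, and the function names are assumptions made for illustration; a real CNN learns its kernels from data and would use an optimized library.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(w)
        ]
        for i in range(h)
    ]


# Toy 5x5 image with a vertical edge between columns 1 and 2 (illustrative data).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 0, 1] for _ in range(3)]  # hand-picked, not learned

response = conv2d(image, edge_kernel)      # 3x3 map, strongest where the edge is
features = max_pool2d(relu(response))      # 1x1 map after 2x2 pooling
```

The convolution responds most strongly where the kernel overlaps the edge, and pooling keeps only the strongest response in each window, which is the size reduction the text refers to.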

Yann LeCun created LeNet-5 for digit recognition, demonstrating that convolutional layers are well suited to feature extraction (GeeksforGeeks, 2023). AlexNet later combined ReLU activation, data augmentation, and GPU acceleration, which made it practical to train deep networks on large-scale image datasets (Li et al., 2021).

Further progress followed, including VGGNet, created by researchers at Oxford University in 2014. VGGNet showed that stacking small 3x3 convolutional kernels in a deeper network improves accuracy, and that a stack of 3x3 layers covers the same receptive field as a single larger kernel with fewer parameters, although the full network remains computationally demanding (Boesch, 2024). These improvements in CNN architectures have contributed significantly to image recognition and are indispensable for medical imaging, robotic vision, and face recognition.

2.2 Residual Networks (ResNet)

As deep learning models grew deeper, they ran into the vanishing/exploding gradient problem, which made training difficult. To overcome this, Residual Networks (ResNet) employ skip connections, organized into residual blocks, to keep gradient flow efficient even in very deep networks (Mahaur et al., 2022). This design allowed deep models such as ResNet-50 and ResNet-101 to achieve state-of-the-art accuracy on large-scale image classification benchmarks.
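The skip connection can be written as y = ReLU(F(x) + x), where F is the stack of layers inside the block. The sketch below uses two small fully connected layers as a stand-in for F; actual ResNet blocks use convolutions and batch normalization, so the layer type and the names here are illustrative assumptions.

```python
def linear(x, weights):
    """Multiply vector x by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is linear -> ReLU -> linear."""
    fx = linear(relu(linear(x, w1)), w2)
    # The skip connection adds the input back, so the block only has to learn
    # a correction to the identity, and gradients can flow through the sum.
    return relu([f + s for f, s in zip(fx, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

unchanged = residual_block(x, zeros, zeros)      # F(x) = 0, so the output equals x
doubled = residual_block(x, identity, identity)  # F(x) = x, so the output is 2x
```

When the inner layers contribute nothing, the block passes its input through unchanged, which is why adding more residual blocks does not by itself make a network harder to optimize.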

Building on ResNet, DenseNet was introduced in 2017 and improved information flow by connecting each layer to all preceding layers, reducing redundancy and improving gradient propagation (Zhou et al., 2022). This dense connectivity lets deep networks reach good performance with fewer parameters. DenseNet's feature reuse mechanism has proven highly effective in medical imaging applications where deep feature extraction is essential (Wang & Zhang, 2020).

2.3 Vision Transformer (ViT)

Traditional CNNs rely on local receptive fields, which restricts their ability to learn global dependencies within images. The Vision Transformer (ViT) brought the self-attention mechanism from natural language processing (NLP) to image understanding, which allows models to learn long-range dependencies in visual data (Wang et al., 2025). Unlike CNNs, which apply convolutional filters to local spatial regions, ViT splits an image into patches and processes the patch sequence with self-attention.
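As a rough illustration of the two steps just described, the sketch below splits a small image into patches and applies scaled dot-product self-attention to them. It is a simplification: the query, key, and value projections are taken to be the identity, and there are no position embeddings, multiple heads, or learned parameters, all of which a real ViT includes. The function names are hypothetical.

```python
import math


def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch vectors."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    return [
        [image[i + a][j + b] for a in range(patch) for b in range(patch)]
        for i in range(0, h, patch)
        for j in range(0, w, patch)
    ]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average over ALL patches, which is
        # how attention captures dependencies between distant image regions.
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)        # four patches, each flattened to length 4
attended = self_attention(tokens)  # same shape as tokens
```

Because every patch attends to every other patch, the cost grows quadratically with the number of patches, which is one source of the computational cost discussed for ViTs.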

Bugnot noted that ViT models outperform CNNs on large datasets when pretrained on extensive labeled data (Bugnot, 2025). However, ViTs have a significant computational cost, making them

hard to deploy in resource-constrained environments.

Researchers are investigating hybrid models

combining CNNs for feature extraction with

transformer-based self-attention to balance efficiency

and performance.

2.4 Attention-Based Networks

Modern deep learning architectures incorporate attention mechanisms that allow models to focus on key image features while ignoring distractions. One prominent attention model is the Squeeze-and-Excitation (SE) Network (Ghaffarian et al., 2021), which adds a channel-wise attention mechanism to recalibrate features dynamically. SE modules improve classification accuracy by applying global pooling followed by fully connected layers that emphasize important channels and suppress less relevant ones.
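The squeeze, excitation, and recalibration steps can be sketched in a few lines. The weights below are hand-picked for illustration (in practice they are learned), and the single hidden unit stands in for the bottleneck layer; both are assumptions, not values from the cited work.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, bias):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def se_block(feature_maps, w1, b1, w2, b2):
    """Squeeze-and-excitation over a list of C feature maps, each H x W."""
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in feature_maps]
    # Excitation: FC + ReLU bottleneck, then FC + sigmoid gives weights in (0, 1).
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    scales = [sigmoid(s) for s in dense(hidden, w2, b2)]
    # Recalibration: scale every value in a channel by that channel's weight.
    scaled = [[[v * s for v in row] for row in fmap] for fmap, s in zip(feature_maps, scales)]
    return scaled, scales


feature_maps = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]  # C=2, H=W=2
w1, b1 = [[0.5, 0.5]], [0.0]          # 2 channels -> 1 hidden unit (illustrative)
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]  # 1 hidden unit -> 2 channel weights
scaled, scales = se_block(feature_maps, w1, b1, w2, b2)
```

With these weights the first channel receives a scale close to 1 and the second a scale close to 0, showing how the block can emphasize one channel and suppress another.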

The Non-local Network is another attention-based architecture that extends self-attention to model long-range image dependencies. Unlike standard convolutions, non-local operations compute relationships between all positions in the image, giving a global receptive field, and they have proven effective in video analysis and action recognition tasks. These models have brought marked improvements in security surveillance, medical imaging, and real-time image segmentation.

Deep learning architectures have addressed crucial problems in image recognition, including feature extraction, gradient optimization, and global dependency modeling. CNNs remain a fundamental image-processing tool, and ResNet and DenseNet have made it possible to build deeper and more efficient networks. Self-attention mechanisms in Vision Transformers help model long-range dependencies, and attention-based networks improve feature selection and interpretability. As research continues, future advances will probably come from hybrid architectures that combine the advantages of CNNs and transformers while keeping computation efficient. These advances will motivate further innovation in AI-powered image recognition for applications ranging from healthcare to autonomous systems and industrial automation.

3 TRAINING OPTIMIZATION STRATEGIES

Deep learning optimization is critical to improving

accuracy, overcoming overfitting, and increasing

computational efficiency. Various training

techniques have been created to improve model

performance, such as data augmentation, transfer

learning, hyperparameter optimization,

regularization, and self-supervised learning.

3.1 Data Augmentation

Data augmentation artificially expands training datasets and is commonly used to improve model generalization. It is essential in applications where data collection is difficult, such as medical imaging. For example, image transformations in radiology have allowed models to detect diseases in images acquired under different conditions (Islam et al., 2024). More sophisticated techniques such as GANs enrich datasets further by generating synthetic training examples. Chen et al. report that GAN-based augmentation is particularly beneficial in medical imaging when labeled data are limited (Chen et al., 2022).

Another useful augmentation approach is Mixup, which blends two images and their associated labels to generate new training samples and improve robustness (Yang & Xiang, 2024). Augmentation methods also improve face recognition under different lighting conditions and viewing angles and help ensure the reliability of deep learning models in real-world applications.
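A minimal sketch of Mixup, assuming images are same-sized 2D lists and labels are one-hot vectors. The mixing weight is commonly drawn from a Beta(alpha, alpha) distribution; the default alpha of 0.2 and the function names here are illustrative assumptions.

```python
import random


def mixup(image_a, label_a, image_b, label_b, lam):
    """Blend two same-sized images and their one-hot labels with weight lam in [0, 1]."""
    mixed_image = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    mixed_label = [lam * ya + (1.0 - lam) * yb for ya, yb in zip(label_a, label_b)]
    return mixed_image, mixed_label


def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


cat = [[1.0, 1.0], [1.0, 1.0]]   # toy 2x2 "images"
dog = [[0.0, 0.0], [0.0, 0.0]]
image, label = mixup(cat, [1.0, 0.0], dog, [0.0, 1.0], lam=0.75)
# image is 75% cat and 25% dog; label is the matching soft label [0.75, 0.25]
```

Training on such blended pairs encourages the model to behave linearly between examples, which is the robustness benefit the text mentions.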

3.2 Transfer Learning

Transfer learning reuses deep learning models pre-trained on massive datasets to solve other tasks. It significantly reduces training time by letting models reuse previously learned features, making it ideal for applications with limited labeled data (Transfer Learning, 2022). Popular models, including ResNet, VGG, and Vision Transformers (ViT), have been widely adopted for transfer learning in medical imaging and autonomous driving.

CNNs pre-trained on ImageNet have been used successfully for X-ray and MRI classification tasks in medical imaging, reducing the need for large labeled datasets (Li et al., 2023). Fine-tuning pre-trained models for domain-specific tasks is an effective way to achieve high diagnostic

accuracy and efficiency. Transfer learning has

enabled the rapid deployment of AI-based defect

detection systems in industrial automation.

Cross-modal transfer learning is another growing approach, in which models developed on images are adapted for video analysis. It has advanced real-time video analytics for surveillance systems by applying what models learn from still images to moving frames.

3.3 Hyperparameter Optimization

Hyperparameter tuning is an important part of deep learning model optimization. The choice of learning rate, batch size, and optimizer can significantly affect a model's performance and convergence speed (Raiaan et al., 2024). Standard hyperparameter tuning methods include grid search, random search, and Bayesian optimization. AutoML techniques, which automate hyperparameter selection to improve model performance, have also gained popularity.
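Of the methods listed above, random search is the simplest to sketch. In the example below, `toy_validation_loss` is a hypothetical stand-in for training a model and measuring its validation loss, and the search space is an assumption chosen for illustration.

```python
import math
import random


def random_search(objective, space, trials=20, seed=0):
    """Sample configurations at random and keep the one with the lowest objective."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(trials):
        config = {
            # Learning rates are sampled log-uniformly between the bounds.
            "learning_rate": 10 ** rng.uniform(*space["log10_learning_rate"]),
            "batch_size": rng.choice(space["batch_size"]),
        }
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_validation_loss(config):
    """Hypothetical objective: lowest near learning_rate=1e-3 and batch_size=64."""
    lr_term = (math.log10(config["learning_rate"]) + 3.0) ** 2
    batch_term = abs(config["batch_size"] - 64) / 64
    return lr_term + batch_term


space = {"log10_learning_rate": (-5.0, -1.0), "batch_size": [16, 32, 64, 128]}
best_config, best_score = random_search(toy_validation_loss, space, trials=50, seed=1)
```

Grid search would instead enumerate every combination, and Bayesian optimization would use earlier results to choose the next configuration; both can reuse the same `objective` interface.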

Another crucial factor in optimization is learning rate scheduling. Learning rate schedules such as cyclical learning rates and warm restarts can accelerate training convergence and improve final model accuracy. Batch normalization, which standardizes activations during training, also helps stabilize optimization and reduce sensitivity to initialization. In addition, hyperparameter optimization in deep reinforcement learning has helped enhance autonomous vehicle navigation by producing more effective decision-making policies.
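The two learning rate schedules named above can each be expressed as a function of the training step. The sketch below assumes a triangular cyclical schedule and cosine annealing with a fixed restart period; the parameter names are illustrative, and deep learning frameworks ship their own implementations.

```python
import math


def triangular_lr(step, base_lr, max_lr, half_cycle):
    """Cyclical learning rate: rise linearly for half_cycle steps, then fall back."""
    cycle = math.floor(1 + step / (2 * half_cycle))
    x = abs(step / half_cycle - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


def cosine_warm_restarts(step, min_lr, max_lr, period):
    """Cosine annealing from max_lr toward min_lr, restarting every `period` steps."""
    t_cur = step % period  # steps since the last restart
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t_cur / period))


# Learning rates over the first two cycles of each schedule.
triangular = [triangular_lr(s, 0.001, 0.01, half_cycle=4) for s in range(16)]
cosine = [cosine_warm_restarts(s, 0.0, 0.1, period=8) for s in range(16)]
```

In the cosine schedule the rate jumps back to `max_lr` at each restart, which is the "warm restart" that lets training escape a poor region before annealing again.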

3.4 Regularization Techniques

Regularization techniques are used to prevent overfitting and improve model generalization. L2 regularization (weight decay) is a common technique that penalizes large weight values. Another effective technique is dropout, which randomly deactivates neurons during training so that the model learns redundant representations and becomes more robust.
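Both techniques are short to write down. The sketch below assumes the common "inverted" form of dropout, which rescales surviving activations during training so that nothing needs to change at inference, and plain SGD with an L2 penalty of (weight_decay / 2) * sum(w^2). Function names are illustrative.

```python
import random


def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep = 1.0 - p                # assumes p < 1
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def sgd_step_with_l2(weights, grads, lr, weight_decay):
    """One SGD step on loss + (weight_decay / 2) * sum(w^2).

    The penalty adds weight_decay * w to each gradient, so every step
    also shrinks the weights toward zero.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


rng = random.Random(0)
hidden = [1.0] * 8
train_out = dropout(hidden, p=0.5, rng=rng)                  # each entry is 0.0 or 2.0
eval_out = dropout(hidden, p=0.5, rng=rng, training=False)   # unchanged
decayed = sgd_step_with_l2([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.5)
```

Even with zero data gradient, `decayed` moves each weight 5% closer to zero, which is the penalty on large weights described above.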

In residual networks, a newer approach called Stochastic Depth randomly skips layers during training, which cuts training cost while maintaining accuracy. Stochastic depth has shown promise in making deep networks efficient enough for resource-constrained environments such as mobile devices. According to Ioffe & Szegedy, Batch Normalization normalizes the inputs to each layer, which allows faster and more stable training of deep networks (Ioffe & Szegedy, 2015).
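The normalization step can be sketched as follows for a batch of feature vectors. This covers only the training-time computation with batch statistics; at inference, implementations typically substitute running averages of the mean and variance. The names `gamma` and `beta` for the learned scale and shift follow common usage and are otherwise an assumption.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then apply a learned scale and shift.

    batch: list of samples, each a list of feature values.
    """
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in batch]
    for f in range(n_features):
        column = [sample[f] for sample in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n  # biased batch variance
        for i, v in enumerate(column):
            out[i][f] = gamma[f] * (v - mean) / math.sqrt(var + eps) + beta[f]
    return out


# Two features on very different scales end up with comparable ranges.
batch = [[1.0, 10.0], [3.0, 30.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

After normalization both features have approximately zero mean and unit variance across the batch, regardless of their original scale.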

3.5 Self-Supervised Learning

Self-supervised learning (SSL) lets deep learning models learn from unlabeled data by solving pretext tasks. SSL methods such as contrastive learning and predictive modeling have been very successful in reducing the reliance on massive labeled datasets. Contrastive approaches such as SimCLR and MoCo learn features without labels that have been shown to match or outperform supervised pretraining on a number of vision tasks.
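The core of contrastive learning is a loss that pulls two views of the same image together and pushes views of different images apart. The sketch below implements a simplified, one-directional InfoNCE loss over embedding vectors. It is not the exact SimCLR or MoCo objective (SimCLR, for instance, symmetrizes the loss and draws negatives from both views), and the temperature value is an illustrative assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(view_a, view_b, temperature=0.5):
    """Average InfoNCE loss.

    view_b[i] is the positive for view_a[i]; every other row of view_b
    serves as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum_exp - logits[i])  # -log softmax of the positive
    return sum(losses) / len(losses)


embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(embeddings, [[1.0, 0.0], [0.0, 1.0]])     # positives match
mismatched = info_nce(embeddings, [[0.0, 1.0], [1.0, 0.0]])  # positives swapped
```

The loss is low when each anchor is most similar to its own positive and high when it is more similar to another sample, so minimizing it shapes the embedding space without any class labels.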

In medical imaging, self-supervised learning has played a crucial role in training models with fewer labeled medical scans, improving diagnostic accuracy. Pretext tasks such as jigsaw puzzle solving and rotation prediction have improved feature extraction, enabling deep learning models to do more with less data.
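Rotation prediction shows how a pretext task produces labels for free: each unlabeled image is rotated by a known angle, and the model is trained to predict that angle. The sketch below only generates the training pairs; the helper names are hypothetical, and the classifier itself is omitted.

```python
def rotate90(image):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_pretext_samples(image):
    """Return (rotated_image, label) pairs; label k means a rotation of k * 90 degrees."""
    samples = []
    current = [list(row) for row in image]
    for k in range(4):
        samples.append((current, k))
        current = rotate90(current)
    return samples


unlabeled = [[1, 2], [3, 4]]
samples = rotation_pretext_samples(unlabeled)
# samples[1] is ([[3, 1], [4, 2]], 1): the image rotated once, labeled 1
```

One unlabeled image yields four labeled examples, and solving the task requires the network to recognize object orientation, which tends to produce features useful for later supervised tasks.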

SSL has also been applied in industrial environments, such as predictive maintenance systems, where deep learning models analyze sensor data to detect early signs of equipment failure. Anomaly detection based on self-supervised learning has lowered unexpected downtime in industrial automation, resulting in significant cost savings.

Training optimization strategies play a crucial role in improving deep learning models for image processing. Data augmentation, transfer learning, hyperparameter tuning, and regularization all boost model performance, while self-supervised learning reduces the need for labeled data. As deep learning advances, future optimization methods should bring further improvements in computational efficiency and model generalization. Future research will likely focus on AutoML-based hyperparameter selection, extensions of SSL techniques, and energy-efficient training methods. The following section details real-world use cases and challenges of implementing deep learning models for image processing.

4 APPLICATIONS AND CHALLENGES

Deep learning and its state-of-the-art solutions have revolutionized image-processing tasks. This section explores key application areas of deep learning in medical imaging, autonomous driving, security surveillance, and industrial automation, as well as the challenges, including data requirements, computational costs, and model interpretability.

4.1 Applications

4.1.1 Medical Imaging

Advances in deep learning have markedly improved diagnostic accuracy and speed in medical imaging. Deep learning models can detect tumors automatically while reducing false positive rates and supporting early diagnosis. Convolutional Neural Networks (CNNs) and Vision Transformers are increasingly used in cancer cell detection, disease classification, and medical image segmentation (Jiang et al., 2023). Systems built on these models assist in analyzing MRI and CT scans for tumor detection and help radiologists diagnose diseases with greater precision.

Automated systems powered by deep learning improve sensitivity in detecting abnormalities in radiology images and reduce the burden on radiologists. U-Net, a widely used CNN architecture, has significantly improved the accuracy of organ and lesion segmentation. With these advances, diagnosis can be faster and more effective.

Federated learning is gradually becoming a critical instrument in medical AI. A deep learning model is trained across the datasets of multiple institutions without patient information being transferred between them. This makes deep learning-based radiology models more generalizable to different patient populations while respecting ethical and privacy constraints. In medical AI, federated learning thus helps improve disease identification and diagnosis without compromising data protection.
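One common way to combine locally trained models is federated averaging, in which a server averages the clients' parameters weighted by how much data each client holds. The paper does not name a specific aggregation rule, so the choice of federated averaging, the hospital data, and the function name below are all assumptions for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its number of local samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[p] * n for w, n in zip(client_weights, client_sizes)) / total
        for p in range(n_params)
    ]


# Hypothetical round: three hospitals train locally and send back only their
# model parameters; the patient scans themselves never leave each site.
hospital_weights = [[0.0, 1.0], [1.0, 1.0], [4.0, 1.0]]
hospital_sizes = [100, 100, 200]
global_weights = federated_average(hospital_weights, hospital_sizes)
# global_weights == [2.25, 1.0]
```

The averaged parameters are sent back to every site for the next round of local training, so the shared model benefits from all datasets while only parameters cross institutional boundaries.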

The use of deep learning in medical imaging is set to expand as the field develops, with the expectation of more advanced and accurate diagnostic techniques. This could also pave the way for wider use of AI in live clinical decision support systems, to the benefit of both medical professionals and patients. Together, deep learning and federated learning are promising because they improve disease detection without violating patients’ privacy or ethical requirements.

4.1.2 Autonomous Driving

Autonomous driving uses deep learning extensively to achieve real-time object recognition, lane detection, and collision avoidance. Pedestrian, vehicle, and traffic sign recognition models rely on CNNs and LiDAR-based deep learning. Tesla and Waymo use deep learning-driven vision systems to interpret complex driving environments. YOLO (You Only Look Once), a popular deep learning model for object detection, has significantly advanced real-time detection, allowing autonomous vehicles to make decisions quickly while driving (Gheorghe et al., 2024).
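Detectors of this kind usually predict many overlapping candidate boxes, and a typical post-processing step keeps only the best box per object using intersection over union (IoU) and non-maximum suppression. The sketch below shows that step in isolation; it is not taken from any particular YOLO release, and the boxes, scores, and threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped if it overlaps an already kept box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Two near-duplicate detections of one object plus a separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)  # the lower-scoring duplicate is removed
```

Removing duplicates this way is cheap compared with running the network, which helps keep the whole detection pipeline within a real-time budget.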

In addition to object detection, deep learning models based on reinforcement learning allow self-driving systems to make adaptive driving decisions in real time. End-to-end learning approaches, in which deep networks process raw sensor inputs to predict vehicle controls, are being explored to enhance decision-making. One challenge in ensuring the safety and integrity of deep learning-based driving models is testing them rigorously under varied environmental conditions.

4.1.3 Security and Surveillance

Deep learning underpins facial recognition, behavior analysis, and threat detection, all of which enhance security and surveillance. Governments and private organizations monitor public spaces using AI-powered surveillance systems to detect suspicious actions. FaceNet is a deep learning model that performs accurate facial recognition for identity verification. Recurrent neural networks (RNNs) are used to analyze behavior and detect anomalies in surveillance footage, helping to flag security threats in real time.

With deep learning, it is possible to process hundreds of video streams simultaneously and detect unusual activities and potential threats. Nevertheless, ethical concerns about mass surveillance and privacy violations call for fairness-aware AI models that address biases in facial recognition systems (Shabbir et al., 2024).

4.1.4 Industrial Automation

Deep learning is used in manufacturing and quality control for defect detection, product classification, and preventive maintenance. CNNs and transformer models enable effective analysis of product images on assembly lines to identify defects, reducing reliance on manual inspection (Thomas et al., 2023). AI-controlled systems reduce the likelihood of errors and increase production capacity. Deep learning also encourages robotic automation

because industrial robots can analyze data and make precise, timely decisions about a particular manufacturing process.

Applying deep learning to industrial automation has reportedly reduced the number of defects produced by 30 percent and has improved predictive maintenance. In digitized manufacturing, AI-supported predictive maintenance identifies possible machinery breakdowns before they occur, avoiding downtime and costly repairs. Wear and tear can be predicted from sensor data and from records of a machine’s previous performance.

Deep learning also supports quality control through real-time monitoring and anomaly detection. Vision systems built on deep learning can detect minute defects that a human inspector may miss, resulting in higher product reliability. AI-driven automation further supports supply chain management by optimizing inventory tracking, demand forecasting, and logistics.

As deep learning continues to develop, its applications in manufacturing will grow along with it, resulting in increasingly intelligent and adaptive production systems. The future could include self-learning robots that adjust processes based on real-time feedback, as well as more precise defect detection models. Integrating deep learning with industrial automation allows manufacturers to be more efficient, reliable, and cost-effective, making industrial production more innovative and resilient in an increasingly competitive market.

4.2 Challenges

4.2.1 Large-Scale Data Requirements

One major issue with deep learning in image processing is that good results require large amounts of labeled training data. According to Khan et al. (2022), data annotation costs remain a significant bottleneck, especially in domains that require expert labeling, such as medical imaging and autonomous driving. Supervised learning approaches can require millions of labeled images to achieve high accuracy, and assembling and annotating such large datasets is tedious and costly. To address this problem, self-supervised learning and data-efficient training techniques are being researched to limit data dependency.

4.2.2 High Computational Costs

Deep learning models such as CNNs and transformers require substantial computational resources. Training deep networks is expensive in both computation and infrastructure (the cost of GPUs and TPUs). In addition, applications such as autonomous driving and surveillance need efficient models that can process images in real time. Pruning and quantization techniques reduce model size while largely preserving accuracy, so that models can be deployed on edge devices and other low-power hardware.
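Both compression techniques can be illustrated on a flat list of weights. The sketch below assumes unstructured magnitude pruning and symmetric 8-bit quantization with a single scale factor; production toolchains offer more elaborate variants, and the function names here are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights; `sparsity` is the fraction to remove."""
    n_prune = int(len(weights) * sparsity)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    prune_idx = set(by_magnitude[:n_prune])
    return [0.0 if i in prune_idx else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in q_weights]


weights = [0.05, -1.0, 0.5, -0.02]
pruned = magnitude_prune(weights, sparsity=0.5)   # the two smallest weights become 0.0
q_weights, scale = quantize_int8(weights)         # small integers plus one float scale
restored = dequantize(q_weights, scale)           # close to the original weights
```

Pruned weights can be skipped or stored sparsely, and 8-bit integers need a quarter of the memory of 32-bit floats, at the cost of a rounding error of at most half the scale per weight.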

4.2.3 Model Interpretability Issues

Although deep learning models are very accurate, they are often considered black boxes because their decision-making process can be difficult to understand. In medical imaging and autonomous driving, understanding how a model arrives at its conclusion is critical for reliability and trust (Boopathi, Pandey, & Pandey, 2023). Explainable AI (XAI) methods such as Grad-CAM and SHAP increase model transparency and interpretability. Such techniques are crucial in high-stakes scenarios where deep learning decisions must be interpretable by human experts.

5 CONCLUSIONS

Deep learning has revolutionized image processing, improving accuracy, efficiency, and automation. Deep learning models have reached new heights in medical imaging, reducing diagnostic errors and extending early detection capabilities. As a result, AI-powered diagnostic tools that assist radiologists, improve pathology analysis, and enable predictive solutions in healthcare have become available. Deep learning also shapes autonomous driving, where object recognition and real-time decision-making rely heavily on it. Models such as YOLO have significantly improved vehicle perception, making self-driving systems safer and more responsive.

Future research is likely to push for more efficient, interpretable, and accessible deep learning models for image processing. Self-supervised and federated learning are key approaches for reducing the

reliance on labeled data so that AI can be scaled to various applications. Further, significant effort will be needed to make models robust to adversarial attacks and other vulnerabilities so that AI systems can be deployed safely in areas such as cybersecurity and fraud detection.

Finally, deep learning has altered the possibilities

of image processing and has the potential to

revolutionize applications in many industries.

However, these methods pose difficulties regarding

computational costs, data needs, and model

interpretability, which must be overcome for wider

adoption and longevity. Deep learning will remain

one of the cornerstones of artificial intelligence by

further advancing model efficiency, ethical AI

practices, and interpretability, delivering more

reliable and accessible solutions for practical

applications.

REFERENCES

Boesch, G. 2024. Very Deep Convolutional Networks

(VGG) Essential Guide. viso.ai. Retrieved from

https://viso.ai/deep-learning/vgg-very-deep-

convolutional-networks.

Boopathi, S., Pandey, B. K., & Pandey, D. 2023. Advances

in artificial intelligence for image processing. In

Advances in computational intelligence and robotics

book series (pp. 73–95).

Bugnot, R. 2025. Vision Transformers (ViT) explained:

Are they better than CNNs? Towards Data Science.

Retrieved from https://towardsdatascience.com/vision-

transformers-vit-explained-are-they-better-than-cnns/

Chen, Y., et al. 2022. Generative adversarial networks in

medical image augmentation: A review. Computers in

Biology and Medicine, 144, 105382.

Finio, M., & Downie, A. 2024. AI in manufacturing.

IBM.com. Retrieved from

https://www.ibm.com/think/topics/ai-in-manufacturing

GeeksforGeeks. 2020. Residual Networks (ResNet) - Deep

Learning. GeeksforGeeks. Retrieved from

https://www.geeksforgeeks.org/residual-networks-

resnet-deep-learning/

GeeksforGeeks. 2023. Convolutional Neural Network

(CNN) architectures. GeeksforGeeks. Retrieved from

https://www.geeksforgeeks.org/convolutional-neural-

network-cnn-architectures/

Ghaffarian, S., Valente, J., Van Der Voort, M., &

Tekinerdogan, B. 2021. Effect of attention mechanism

in deep learning-based remote sensing image

processing: A systematic literature review. Remote

Sensing, 13(15), 2965.

Gheorghe, C., Duguleana, M., Boboc, R. G., & Postelnicu,

C. C. 2024. Analyzing real-time object detection with

YOLO algorithm in automotive applications: A review.

Computer Modeling in Engineering & Sciences,

141(3), 1939–1981.

Gupta, A., Anpalagan, A., Guan, L., & Khwaja, A. S. 2021.

Deep learning for object detection and scene perception

in self-driving cars: Survey, challenges, and open

issues. Array, 10, 100057.

Ioffe, S., & Szegedy, C. 2015. Batch normalization:

Accelerating deep network training by reducing

internal covariate shift. arXiv.org.

Islam, T., Hafiz, M. S., Jim, J. R., Kabir, M. M., & Mridha,

M. F. 2024. A systematic review of deep learning data

augmentation in medical imaging: Recent advances and

future research directions. Healthcare Analytics, 5,

100340.

Jiang, X., Hu, Z., Wang, S., & Zhang, Y. 2023. Deep

learning for medical image-based cancer diagnosis.

Cancers, 15(14), 3608.

Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S.,

& Shah, M. 2022. Transformers in vision: A survey.

ACM Computing Surveys, 54(10s), 1–41.

Li, M., Jiang, Y., Zhang, Y., & Zhu, H. 2023. Medical

image analysis using deep learning algorithms.

Frontiers in Public Health, 11.

Li, S., Wang, L., Li, J., & Yao, Y. 2021. Image

classification algorithm based on improved AlexNet.

Journal of Physics Conference Series, 1813(1), 012051.

Mahaur, B., Mishra, K. K., & Singh, N. 2022. Improved

residual network based on norm-preservation for visual

recognition. Neural Networks, 157, 305–322.

Mohd Noor, M. H., & Ige, A. O. 2025. A survey on state-

of-the-art deep learning applications and challenges.

Pinto-Coelho, L. 2023. How artificial intelligence is

shaping medical imaging technology: A survey of

innovations and applications. Bioengineering, 10(12),

1435.

Raiaan, M. A. K., et al. 2024. A systematic review of

hyperparameter optimization techniques in

convolutional neural networks. Decision Analytics

Journal, 11, 100470.

Shabbir, A., Arshad, N., Rahman, S., Sayem, M. A., &

Chowdhury, F. 2024. Analyzing surveillance videos in

real-time using AI-powered deep learning techniques.

International Journal of Research in Information

Technology and Computer Communication. Retrieved

from http://www.ijritcc.org

Thomas, J. B., Chaudhari, S. G., K. V., S., & Verma, N. K.

2023. CNN-based transformer model for fault detection

in power system networks. IEEE Transactions on

Instrumentation and Measurement, 72, 1–10.

Transfer Learning. 2022. Transfer learning - an overview.

ScienceDirect Topics. Retrieved from

https://www.sciencedirect.com/topics/computer-

science/transfer-learning

Tsirtsakis, P., Zacharis, G., Maraslidis, G. S., & Fragulis,

G. F. 2025. Deep learning for object recognition: A

comprehensive review of models and algorithms.

International Journal of Cognitive Computing in

Engineering.

Wang, S.-H., & Zhang, Y.-D. 2020. DenseNet-201-based

deep neural network with composite learning factor and

precomputation for multiple sclerosis classification.

ACM Transactions on Multimedia Computing

Communications and Applications, 16(2s), 1–19.

Wang, Y., Deng, Y., Zheng, Y., Chattopadhyay, P., &

Wang, L. 2025. Vision Transformers for image

classification: A comparative survey. Technologies,

13(1), 32.

Yan, H., Mubonanyikuzo, V., Komolafe, T. E., Zhou, L.,

Wu, T., & Wang, N. 2025. Hybrid-RViT: Hybridizing

ResNet-50 and Vision Transformer for enhanced

Alzheimer’s disease detection. PLoS ONE, 20(2),

e0318998.

Yang, L., & Xiang, Y. 2024. AMPLIFY: Attention-based

mixup for performance improvement and label

smoothing in transformer. PeerJ Computer Science, 10,

e2011.

Zhao, X., Wang, L., Zhang, Y., Han, X., Deveci, M., &

Parmar, M. 2024. A review of convolutional neural

networks in computer vision. Artificial Intelligence

Review, 57(4).

Zhou, T., Ye, X., Lu, H., Zheng, X., Qiu, S., & Liu, Y. 2022.

Dense convolutional network and its application in

medical image analysis. BioMed Research

International, 2022, 1–22.
