
4 CONCLUSION AND FUTURE WORK
This paper presented an end-to-end framework for automated plant disease
detection and intervention in agricultural environments. The framework
integrates three modules: VLM-based disease detection using CLIP, LLM-based
planning with GPT-4, and low-level robotic control execution. The CLIP-based
cassava disease detection algorithm improved on the baseline methods,
achieving 83.17% accuracy with precision, recall, and F1-score all at 0.83.
Most notably, our CLIP-Aug model outperformed the MobileNet baseline by 17.65
percentage points and was particularly strong in CMD detection, with 2387
correct detections compared to MobileNet's 171. The LLM-based planning module
translated disease detection results into coherent action sequences,
generating contextually appropriate plans for plant navigation and treatment
application. Our simulation experiments validated that the generated plans
could be executed by the low-level control system, with the robot accurately
navigating to specific 3D coordinates, manipulating objects such as spray
bottles, and performing targeted treatment applications.
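As a reading aid for the figures above, the following minimal sketch shows how accuracy and per-class precision, recall, and F1 are conventionally derived from a confusion matrix, and how a gap in percentage points is computed. The function names and the toy 2x2 matrix are hypothetical and are not the paper's evaluation code or data; the only values taken from the text are 83.17% and 17.65, from which a baseline accuracy of about 65.52% follows if both accuracies were measured on the same test set.

```python
def per_class_metrics(confusion):
    """Return (accuracy, [(precision, recall, f1), ...]) for a square matrix.

    confusion[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                           # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return correct / total, metrics


# Toy matrix for illustration only (NOT results from the paper).
toy = [[8, 2],
       [1, 9]]
accuracy, metrics = per_class_metrics(toy)

# Percentage-point arithmetic using the two figures reported in the text:
# 83.17% accuracy and a 17.65-point gap imply the baseline accuracy below,
# assuming both numbers refer to the same test set.
implied_baseline = round(83.17 - 17.65, 2)
```

A gap in percentage points is a plain difference between two accuracies, so it should not be read as a relative improvement of 17.65%.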
For future work, we plan to incorporate a large-scale plant dataset from
SILAL, one of the largest greenhouse operations in the UAE, and to extend our
pipeline to real greenhouse environments by deploying the system there. To
mitigate the performance degradation that occurs under dynamic-camera
conditions, we plan to incorporate motion-aware image deblurring and to
fine-tune the CLIP encoder on blurred and off-angle augmentations.
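To illustrate what a blurred augmentation could look like, the sketch below builds a linear motion-blur kernel and applies it to a grayscale image using NumPy only. It is one common way to synthesize motion blur and is an assumption on our part, not the augmentation pipeline planned by the authors; the function names `motion_blur_kernel` and `apply_motion_blur` and all parameter values are hypothetical.

```python
import numpy as np


def motion_blur_kernel(length, angle_deg):
    """Return a normalized (length x length) kernel with a line through its centre."""
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd integer")
    kernel = np.zeros((length, length), dtype=np.float64)
    centre = length // 2
    theta = np.deg2rad(angle_deg)
    # Sample points along the line and mark the nearest pixel for each one.
    for t in np.linspace(-centre, centre, 2 * length):
        row = int(round(float(centre - t * np.sin(theta))))
        col = int(round(float(centre + t * np.cos(theta))))
        kernel[row, col] = 1.0
    return kernel / kernel.sum()


def apply_motion_blur(image, kernel):
    """Filter a 2-D grayscale image with the kernel (sliding weighted sum, edge padding)."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    height, width = image.shape
    out = np.zeros((height, width), dtype=np.float64)
    for dr in range(k):
        for dc in range(k):
            out += kernel[dr, dc] * padded[dr:dr + height, dc:dc + width]
    return out


# Illustrative input: a single bright pixel smeared by a 5-pixel horizontal blur.
sharp = np.zeros((9, 9))
sharp[4, 4] = 1.0
kernel = motion_blur_kernel(5, 0.0)
blurred = apply_motion_blur(sharp, kernel)
```

Sampling `length` and `angle_deg` at random per training image would give a range of blur strengths and directions; off-angle views would need a separate geometric transform, which this sketch does not cover.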
ACKNOWLEDGMENTS
This publication is based upon work supported by the Khalifa University of
Science and Technology under Award No. RC1-2018-KUCARS and Autonomous
Underwater Robotic System for Aquaculture Applications: 8474000419.
REFERENCES
Achard, S. (2025). Indoor farming in the UAE: A breakdown in 2025. Details labor dependency (1 worker/500m²) and high operational costs for climate-controlled systems.
Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., et al. (2022). Flamingo: a visual language model for few-shot learning. Advances in neural information processing systems, 35:23716–23736.
Amrani, A., Diepeveen, D., Murray, D., Jones, M. G., and Sohel, F. (2024). Multi-task learning model for agricultural pest detection from crop-plant imagery: A Bayesian approach. Computers and Electronics in Agriculture, 218:108719.
Arshad, M. A., Jubery, T. Z., Roy, T., Nassiri, R., Singh, A. K., Singh, A., Hegde, C., Ganapathysubramanian, B., Balu, A., Krishnamurthy, A., et al. (2024). AgEval: A benchmark for zero-shot and few-shot plant stress phenotyping with multimodal LLMs. arXiv preprint arXiv:2407.19617.
Arshad, M. A., Jubery, T. Z., Roy, T., Nassiri, R., Singh, A. K., Singh, A., Hegde, C., Ganapathysubramanian, B., Balu, A., Krishnamurthy, A., et al. (2025). Leveraging vision language models for specialized agricultural tasks. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 6320–6329. IEEE. Discusses AI-driven disease detection challenges in resource-scarce environments like the UAE.
Dai, W. (2023). Teaching Language Models to See: Building Robust and Versatile Vision-Language Models. PhD thesis, Hong Kong University of Science and Technology (Hong Kong).
Farooq, M. S., Javid, R., Riaz, S., and Atal, Z. (2022). IoT based smart greenhouse framework and control strategies for sustainable agriculture. IEEE Access, 10:99394–99420.
Feuer, B., Joshi, A., Cho, M., Chiranjeevi, S., Deng, Z. K.,
Balu, A., Singh, A. K., Sarkar, S., Merchant, N.,
Singh, A., et al. (2024). Zero-shot insect detection
via weak language supervision. The Plant Phenome
Journal, 7(1):e20107.
Carnegie Endowment for International Peace (2023). Climate change and vulnerability in the Middle East. Highlights how climate stressors exacerbate disease risks in UAE agriculture due to extreme heat and poor governance.
Hoseinzadeh, S. and Garcia, D. A. (2024). AI-driven innovations in greenhouse agriculture: Reanalysis of sustainability and energy efficiency impacts. Energy Conversion and Management: X, 24:100701.
Kaggle (2020). Cassava Leaf Disease Classification. https://www.kaggle.com/competitions/cassava-leaf-disease-classification. Accessed: [Insert Date Accessed].
Karahan, S., Yildirum, M. K., Kirtac, K., Rende, F. S., Butun, G., and Ekenel, H. K. (2016). How image degradations affect deep CNN-based face recognition? In 2016 international conference of the biometrics special interest group (BIOSIG), pages 1–5. IEEE.
Maraveas, C. (2022). Incorporating artificial intelligence
technology in smart greenhouses: Current state of the
art. Applied Sciences, 13(1):14.
Maraveas, C., Piromalis, D., Arvanitis, K. G., Bartzanas, T.,
and Loukatos, D. (2022). Applications of iot for op-
CLIP-LLM: A Framework for Autonomous Plant Disease Management in Greenhouse