Crop Prediction Using Feature Selection and Machine Learning
S. Md Riyaz Naik, Shaik Mohammad Zaheer, Lingala Madhana Damodhar,
Shaik Shahid and Edula Shiva Praneeth Reddy
Department of Computer Science and Engineering, Santhiram Engineering College, Nandyal, Andhra Pradesh, India
Keywords: Crop Yield Prediction, Machine Learning, Stacked Regression, Lasso Feature Selection, Kernel Ridge
Regression, Elastic Net.
Abstract: In many developing countries, farming isn’t just about filling plates; it’s the pulse of economies, supporting
millions of livelihoods. Yet farmers and leaders face a relentless struggle: harvests that swing wildly due to
erratic weather, exhausted soils, and shrinking resources like water and fertile land. This uncertainty ripples
through communities, threatening food security and economic survival. By blending three advanced
algorithms (Lasso Regression, Kernel Ridge Regression, and Elastic Net), the research builds a "stacked" model
designed to predict crop yields with precision. But here’s the game-changer: instead of drowning in endless
data, the model focuses only on the most critical factors, like rainfall, temperature, and pesticide use, through
a process called feature selection.
1 INTRODUCTION
Agriculture isn’t just about planting seeds; it’s the
backbone of economies, feeding billions and
employing millions worldwide. But predicting crop
yields has always been a high-stakes guessing game.
Traditional methods, like relying on past harvest data
or expert opinions, often miss the mark. Why?
Because nature doesn’t follow a script. Climate shifts,
soil changes, and unpredictable weather throw
curveballs that old-school approaches can’t catch.
Enter machine learning, the game-changer.
Imagine a tool that sifts through mountains of data
(rainfall patterns, pesticide use, temperature swings)
and spots hidden trends even seasoned experts might
miss.
By stacking Lasso Regression, Kernel Ridge
Regression, and Elastic Net, the model becomes
a precision tool. But here’s the kicker: pairing them
with feature selection, a process that trims the fat from
datasets, supercharges efficiency. Less clutter means
faster computations and sharper predictions.
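To make the idea concrete, the following is a minimal sketch of Lasso-based feature selection feeding a stacked regressor, written with scikit-learn. It is an illustration under stated assumptions, not the exact pipeline used in this work: the data are synthetic stand-ins, the column roles (rainfall, temperature, pesticide use) are only labels for the example, and the hyperparameters and the linear meta-learner are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the paper's dataset):
# columns 0-2 play the role of rainfall, temperature, and pesticide use;
# columns 3-7 are pure noise that feature selection should discard.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
y = (3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2]
     + rng.normal(scale=0.3, size=400))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: Lasso-based feature selection. The L1 penalty drives the
# coefficients of uninformative columns to exactly zero, and
# SelectFromModel keeps only columns whose coefficient is non-zero.
selector = SelectFromModel(Lasso(alpha=0.1), threshold=1e-5)

# Step 2: stack the three base learners. A linear meta-learner combines
# their out-of-fold predictions (all hyperparameters are illustrative).
stack = StackingRegressor(
    estimators=[
        ("lasso", Lasso(alpha=0.01)),
        ("kernel_ridge", KernelRidge(alpha=1.0, kernel="rbf")),
        ("elastic_net", ElasticNet(alpha=0.01, l1_ratio=0.5)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)

# Scaling, selection, and stacking are chained so that every step is
# fitted on the training split only.
model = make_pipeline(StandardScaler(), selector, stack)
model.fit(X_train, y_train)

selected = model.named_steps["selectfrommodel"].get_support()
r2 = model.score(X_test, y_test)
print("selected feature mask:", selected)
print("held-out R^2:", round(r2, 3))
```

Wrapping the selector and the stack in one pipeline keeps the feature-selection step inside the training fold, which avoids leaking information from the held-out data into the choice of features.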
2 LITERATURE REVIEW
Imagine a world where farmers can predict harvests
with near-surgical precision: no crystal ball needed,
just data. That’s the promise machine learning (ML)
has brought to agriculture, revolutionizing how we
forecast crop yields. Over the past decade, researchers
have tested everything from basic algorithms to
cutting-edge AI, and the results are reshaping the field,
literally. Early pioneers like Veenadhari et al. (2011)
cracked open the potential of ML by linking climate
factors like rainfall and soil health to soybean yields
using Decision Trees. But the real breakthrough came
with Random Forest models. Shekoofa et al. (2014)
showed these "forests" of decision trees could
outsmart traditional stats in predicting maize yields,
thanks to their knack for spotting patterns in messy,
real-world data. By 2016, sugarcane farmers saw a
15% accuracy boost using Random Forest, proving
ML wasn’t just a lab experiment; it worked in the field.
As datasets grew, so did ambition. Researchers
like Gomez et al. (2019) tapped into satellite
imagery and deep learning to predict potato yields,
turning pixels into actionable insights. Meanwhile,
Pandith et al. (2020) pitted K-Nearest Neighbors
(KNN) and Artificial Neural Networks
(ANN) against simpler models for mustard crops.
Spoiler: ANN and KNN won, showing complex
relationships (think soil pH + monsoon timing) matter
more than single factors. Why settle for one algorithm
when you can mix and match? Mishra et al. (2016)
fused techniques like boosting and bagging into
hybrid models, which adapted better to unpredictable
variables (looking at you, climate change). Even older