Authors:
Anjan Kumar Reddy Ayyadapu
Addresses:
Department of Information Technology, Cloudera Inc., Ashburn, Virginia, United States of America.
The rapid growth of data generated by sources ranging from IoT sensors to social media has changed the technology landscape and made predictive analytics in big data environments more important than ever. This work proposes a hybrid machine learning model to improve predictive accuracy, computational performance, and scalability in such environments. The model uses Random Forest (RF) for robust feature selection and Gradient Boosted Decision Trees (GBDT) for improved classification accuracy, both optimised using Apache Spark's MLlib in a distributed setting. Hybridisation is used to overcome the limitations of single-algorithm models in handling high-dimensional, heterogeneous data. The architecture is designed to process data in real time and includes a dynamic pre-processing layer, a parallel training pipeline, and continuous evaluation modules. Predictive performance is validated on benchmark datasets using accuracy, precision, recall, F1 score, and AUC. System-level metrics such as latency, throughput, and scalability are also monitored to assess suitability for real-world deployment. Experiments on healthcare and e-commerce datasets show better predictive power and operational efficiency than single ML models. Scatter plots and 3D graphs show marked improvements in model accuracy and processing time across different data volumes. The result is a scalable, adaptable, and accurate predictive analytics tool for the big data era. Its ease of deployment with Apache Hadoop and Spark indicates readiness for enterprise-level use.
Keywords: Hybrid Machine Learning; Big Data Analytics; Predictive Modelling; Apache Spark; Real-time Processing; Decision-Making; Predictive Analytics; Hybrid Models; Ensemble Models; Scalability and Adaptability.
Received on: 25/07/2024, Revised on: 12/10/2024, Accepted on: 28/11/2024, Published on: 05/03/2025
DOI: 10.64091/ATICS.2025.000103
AVE Trends in Intelligent Computing Systems, 2025 Vol. 2 No. 1, Pages: 27-38
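
The abstract names accuracy, precision, recall, F1 score, and AUC as the measures of predictive performance. As a minimal illustration of how these can be computed for a binary classifier, the following sketch uses only the Python standard library. It is not the authors' evaluation code: the function names (`confusion_counts`, `classification_metrics`, `roc_auc`) and the sample labels and scores are hypothetical, and the AUC is computed with a simple pairwise comparison that suits small examples rather than big data volumes.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels encoded as 0 and 1."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P * N), so it is meant
    for small illustrative inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels, hard predictions, and classifier scores.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    preds = [1, 0, 0, 1, 0, 1, 1, 0]
    probs = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]
    print(classification_metrics(labels, preds))
    print(roc_auc(labels, probs))
```

In a distributed setting such as the one the abstract describes, the same quantities would normally be obtained from the evaluation utilities of the chosen framework instead of being computed by hand.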