A Hybrid Machine Learning Model for Predictive Analytics in Big Data Frameworks

Authors:
Anjan Kumar Reddy Ayyadapu

Addresses:
Department of Information Technology, Cloudera Inc., Ashburn, Virginia, United States of America.

Abstract:

The rapid growth of data generated by sources ranging from IoT sensors to social media has reshaped the technology landscape and made predictive analytics in big data environments more important than ever. This work proposes a hybrid machine learning model that aims to improve predictive accuracy, computational performance, and scalability in such environments. The model uses Random Forest (RF) for robust feature selection and Gradient Boosted Decision Trees (GBDT) for improved classification accuracy, both optimised using Apache Spark's MLlib in a distributed setting. Hybridisation is intended to overcome the limitations of single-algorithm models in handling high-dimensional, heterogeneous data. The architecture is designed to process data in real time and comprises a dynamic pre-processing layer, a parallel training pipeline, and continuous evaluation modules. Predictive performance is validated on benchmark datasets using accuracy, precision, recall, F1 score, and AUC. System-level metrics such as latency, throughput, and scalability are also monitored to assess suitability for real-world deployment. Experiments on healthcare and e-commerce datasets yield better predictive power and operational efficiency than single ML models. Scatter plots and 3D graphs show marked improvements in model accuracy and processing time across different data volumes. The result is a scalable, adaptable, and accurate predictive analytics tool for the big data era. Its ease of deployment on Apache Hadoop and Spark makes it suitable for enterprise-level use.
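
The evaluation metrics named in the abstract can be illustrated with a minimal sketch for a binary classifier. It is not the authors' evaluation code: the function names `binary_metrics` and `auc` are hypothetical, it uses plain Python rather than Spark, and it assumes labels encoded as 0/1 with 1 as the positive class.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels and scores, for illustration only.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    preds = [1, 0, 0, 1, 0, 1, 1, 0]
    probs = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]
    print(binary_metrics(labels, preds))
    print(auc(labels, probs))
```

The pairwise AUC above is quadratic in the number of examples and is meant only to make the definition concrete; at big data scale a distributed evaluator would be the more practical choice.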

Keywords: Hybrid Machine Learning; Big Data Analytics; Predictive Modelling; Apache Spark; Real-time Processing; Decision-Making; Predictive Analytics; Hybrid Models; Ensemble Models; Scalability and Adaptability.

Received on: 25/07/2024, Revised on: 12/10/2024, Accepted on: 28/11/2024, Published on: 05/03/2025

DOI: 10.64091/ATICS.2025.000103

AVE Trends in Intelligent Computing Systems, 2025 Vol. 2 No. 1, Pages: 27-38
