Authors:
Girish Wali, Jessy Christadoss, Praveen Sivathapandi, Anita Kori, Chetan Bulla
Addresses:
Department of Business Intelligence, Citibank, Bengaluru, Karnataka, India. Department of Business Intelligence, Integral Ad Science, New York, United States of America. Department of Technology Architect, Citibank, Bengaluru, Karnataka, India. Department of Artificial Intelligence and Machine Learning, Basaveshwar Engineering College, Bagalkot, Karnataka, India. Department of Information Science and Engineering, Basaveshwar Engineering College, Bagalkot, Karnataka, India.
Abstract:
The rapid adoption of Lakehouse architectures has converged the flexibility of data lakes with the management capabilities of data warehouses. Traditional cleaning pipelines, however, often operate in isolation, correcting errors based on static rules without understanding the upstream context or downstream impact. This study proposes a novel Semantic Lineage Pipeline (SLP) that leverages metadata graphs to automate data repair. By embedding semantic understanding into the lineage tracking, the system can infer correct data values based on the relationships between entities rather than just column constraints. The study employed Apache Spark for processing, Delta Lake for storage, and a custom graph-based lineage parser. The researchers utilised a dataset comprising 491 instances of complex retail transaction logs and IoT sensor readings. The observed outcomes demonstrate that incorporating semantic lineage significantly improves the accuracy of automated repairs compared to traditional isolationist methods. The proposed architecture reduces manual intervention time and increases the reliability of analytical dashboards derived from the Lakehouse.
Keywords: Lakehouse Model; Data Quality; Semantic Lineage; Automated Repair; Metadata Management; Data Governance; Contextual Factors; Business Intelligence.
Received on: 06/03/2025, Revised on: 27/06/2025, Accepted on: 26/08/2025, Published on: 03/01/2026
DOI: 10.64091/ATICS.2026.000284
AVE Trends in Intelligent Computing Systems, 2026 Vol. 3 No. 1, Pages: 48-56