DATA-CENTRIC CLOUD WAREHOUSING FOR MACHINE LEARNING AND HIGH-STAKES DECISION SYSTEMS
Dr. Rafael Montenegro, University of Buenos Aires, Argentina
Abstract
The contemporary data ecosystem is undergoing a profound transformation driven by the convergence of large-scale cloud-native data warehousing, artificial intelligence, and domain-specific analytics. Organizations operating in high-stakes environments such as healthcare, finance, energy, and industrial automation now depend on the seamless integration of data pipelines, scalable storage architectures, and intelligent analytical models to generate actionable insights. This research article advances a comprehensive theoretical and methodological framework for designing intelligence-driven data warehouses that fuse modern extract–transform–load paradigms, massively parallel processing platforms, and explainable machine learning systems. At the core of this framework is the recognition that data warehousing has shifted from being a passive repository of historical facts to an active computational substrate for continuous inference, prediction, and decision support, a transition that is particularly evident in cloud-native platforms such as Amazon Redshift (Worlikar, Patel, & Challa, 2025).
Building on this foundational understanding, the article critically examines how data engineering practices derived from data lakes, data warehouses, and hybrid architectures shape the reliability, scalability, and epistemic trustworthiness of downstream AI models (Mandala, 2021; Mandala, Data Engineering in Cloud-Native Architectures). It argues that the historical dichotomy between data lakes and data warehouses is increasingly inadequate for the complexity of modern analytics, particularly in domains such as predictive healthcare, financial fraud detection, and smart energy systems, where heterogeneous data streams, regulatory constraints, and real-time inference must be harmonized within a unified architectural vision (Yasmeen et al., 2024; Cheng et al., 2024). By integrating insights from explainable artificial intelligence, distributed machine learning, and multi-objective optimization, the article demonstrates that the architecture of the data warehouse itself is now a key determinant of model performance, interpretability, and ethical accountability (Xu et al., 2024; Soleimani et al., 2024).
Methodologically, the study adopts a design science research paradigm, combining architectural analysis, comparative evaluation of ETL strategies, and interpretive synthesis of recent advances in machine learning applications. Rather than relying on numerical experiments or benchmark tables, the analysis develops a detailed conceptual model that links cloud-native data warehousing patterns, such as columnar storage, elastic scaling, and workload management, to the operational needs of intelligent systems across sectors. The descriptive results indicate that when modern data warehouses are designed according to principles of data-centric AI, governance-aware ETL, and explainable analytics, they can materially improve the robustness, transparency, and adaptability of predictive models deployed in high-risk contexts.
The discussion situates these findings within broader scholarly debates on the future of data infrastructure, arguing that intelligence-driven data warehouses represent a new epistemological layer in digital societies, where the architecture of data storage and processing shapes what can be known, predicted, and acted upon. The article concludes by outlining future research directions that bridge data engineering, machine learning theory, and socio-technical governance, emphasizing that the next generation of data warehouses will not merely store data but will actively participate in the production of knowledge.
Keywords
Cloud-native data warehousing, ETL pipelines, predictive analytics, explainable artificial intelligence
References
Li, Y., et al. Research on Adverse Drug Reaction Prediction Model Combining Knowledge Graph Embedding and Deep Learning. 2024 4th International Conference on Machine Learning and Intelligent Systems Engineering. IEEE, 2024.
Mandala, N. R. Security and Compliance in ETL Pipelines. Journal of Scientific and Engineering Research, 8(7), 305–313.
Xu, Q., Feng, Z., Gong, C., Wu, X., Zhao, H., Ye, Z., and Wei, C. Applications of explainable AI in natural language processing. Global Academic Frontiers, 2(3), 51–64.
Soleimani, M., Irani, F. N., and Davoodi, Y. M. Multi-objective optimization of building HVAC operation: Advanced strategy using Koopman predictive control and deep learning. Building and Environment, 248, 111073.
Wang, L., Cheng, Y., Xiang, A., Zhang, J., and Yang, H. Application of Natural Language Processing in Financial Risk Detection. Financial Engineering and Risk Management, 7, 1–10.
Tian, J., Mercier, P., and Paolini, C. Ultra low-power, wearable, accelerated shallow-learning fall detection for elderly at-risk persons. Smart Health, 100498.
Gupta, V., and Kumar, E. AO-SAKEL: arithmetic optimization-based self-adaptive kernel extreme learning for international trade prediction. Evolving Systems.
Cheng, Y., Guo, J., Long, S., Wu, Y., Sun, M., and Zhang, R. Advanced Financial Fraud Detection Using GNN-CL Model. arXiv preprint arXiv:2407.06529.
Worlikar, S., Patel, H., and Challa, A. Amazon Redshift Cookbook: Recipes for building modern data warehousing solutions. Packt Publishing Ltd.
Yasmeen, Z., Machi, S., Maguluri, K. K., Mandala, G., and Reddy, R. Transforming Patient Outcomes: Cutting-Edge Applications of AI and ML in Predictive Healthcare. SEEJPH, 25, S1.
Meng, A., Chen, S., Ou, Z., et al. A hybrid deep learning architecture for wind power prediction based on bi-attention mechanism and crisscross optimization. Energy, 238.
Leong, H. Y., Gao, Y. F., Shuai, J., et al. Efficient Fine-Tuning of Large Language Models for Automated Medical Documentation. arXiv preprint arXiv:2409.09324.
Dang, B., Ma, D., Li, S., Qi, Z., and Zhu, E. Deep learning-based snore sound analysis for the detection of night-time breathing disorders. Applied and Computational Engineering, 76, 109–114.
Mandala, N. R. Data Engineering in Cloud-Native Architectures.
Tian, J., Mercier, P., and Paolini, C. Fall detection through inferencing at the edge. International Symposium on Intelligent Computing and Networking, Springer Nature Switzerland, 376–390.
Cheng, Y., Yang, Q., Wang, L., Xiang, A., and Zhang, J. Research on Credit Risk Early Warning Model of Commercial Banks Based on Neural Network Algorithm. Financial Engineering and Risk Management, 7, 11–19.
Mandala, N. R. ETL in Data Lakes vs. Data Warehouses.
Liu, S., and Zhu, M. Distributed inverse constrained reinforcement learning for multi-agent systems. Advances in Neural Information Processing Systems, 35, 33444–33456.
Copyright License
Copyright (c) 2025 Dr. Rafael Montenegro

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright and Ethics:
- Authors are responsible for obtaining permission to use any copyrighted materials included in their manuscript.
- Authors are also responsible for ensuring that their research was conducted in an ethical manner and in compliance with institutional and national guidelines for the care and use of animals or human subjects.
- By submitting a manuscript to International Journal of Economics Finance & Management Science (IJEFMS), authors agree to transfer copyright to the journal if the manuscript is accepted for publication.