Open Access
Navigating Complexity: A Multidisciplinary Framework for Resilience Engineering, Chaos Testing, and Human Reliability in Distributed Cyber-Physical Ecosystems
Kat Milestone, Institute for Systems Engineering and Project Management, University of Edinburgh, United Kingdom
Abstract
This research presents an extensive investigation into the convergence of resilience engineering, chaos testing, and human reliability across distributed systems and cyber-physical infrastructures. As industrial ecosystems evolve toward hyper-connectivity, the traditional reactive paradigms of fault tolerance are increasingly insufficient. This article synthesizes diverse theoretical perspectives, ranging from project management praxeology and supply chain resilience to microservices testing and cloud availability, to propose a holistic framework for systemic robustness. Through a rigorous analysis of chaos engineering as a pedagogical tool, the study explores how intentional disruption facilitates the development of high-reliability engineering teams. Furthermore, the research addresses the "domino effect" in industrial settings, proposing dynamic modeling approaches to mitigate man-made disasters. By integrating the socio-technical dimensions of human improvisation with the technical rigor of automated fault-tolerance, this paper provides a comprehensive roadmap for architecting systems that do not merely survive turbulence but thrive through it. The findings emphasize that resilience is a continuous process of adaptation, requiring a fundamental shift from static safety protocols to dynamic, experimental learning frameworks.
Keywords
Resilience Engineering, Chaos Testing, Microservices, Human Reliability
Copyright License
Copyright (c) 2026 Kat Milestone

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright and Ethics:
- Authors are responsible for obtaining permission to use any copyrighted materials included in their manuscript.
- Authors are also responsible for ensuring that their research was conducted in an ethical manner and in compliance with institutional and national guidelines for the care and use of animals or human subjects.
- By submitting a manuscript to International Journal of Computer Science & Information System (IJCSIS), authors agree to transfer copyright to the journal if the manuscript is accepted for publication.