
Advancing Graph Processing: A Hardware-Software Co-Design Approach
Rohan Mehta, Department of Computer Engineering, Indian Institute of Technology Bombay, India
Abstract
Graph processing is increasingly important in domains including recommender systems, neuroscience, cybersecurity, and social network analysis (Wu et al., 2023; Bullmore & Sporns, 2009; Wang et al., 2019; Yin et al., 2023; Luo et al., 2023; He et al., 2024). However, the irregular and unstructured nature of graph data poses significant challenges to achieving high performance. This paper explores recent advances in hardware-software co-design techniques that address these challenges and improve the efficiency of graph processing systems. We examine novel architectural approaches, memory management strategies, and software frameworks that collectively contribute to enhanced performance.
Keywords
Graph processing, hardware-software co-design, FPGA acceleration
References
Bai, J. Y., Guo, J., Wang, C. C., Chen, Z. Y., He, Z., Yang, S., ... & Guo, Y. W. (2023). Deep graph learning for spatially-varying indoor lighting prediction. Science China Information Sciences, 66(3), Article 132106.
Ben-Nun, T., Sutton, M., Pai, S., & Pingali, K. (2017). Groute: An asynchronous multi-GPU programming model for irregular computations. ACM SIGPLAN Notices, 52(8), 235-248.
Bullmore, E., & Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3), 186-198.
Chen, D., Gui, C. Y., Zhang, Y., Jin, H., Zheng, L., Huang, Y., & Liao, X. F. (2022). GraphFly: Efficient asynchronous streaming graphs processing via dependency-flow. In 2022 International Conference for High Performance Computing, Networking, Storage and Analysis.
Chen, D., He, H. H., Jin, H., Zheng, L., Huang, Y., Shen, X. Y., & Liao, X. F. (2023). MetaNMP: Leveraging Cartesian-like product to accelerate HGNNs with near-memory processing. In Proceedings of the 50th Annual International Symposium on Computer Architecture, Article 56.
Chen, D., Jin, H., Zheng, L., Huang, Y., Yao, P. C., Gui, C. Y., ... & Zheng, R. (2022). A general offloading approach for near-DRAM processing-in-memory architectures. In 2022 IEEE International Parallel and Distributed Processing Symposium (pp. 246-257).
Chen, X. Y., Chen, Y., Cheng, F., Tan, H. S., He, B. S., & Wong, W. F. (2022). ReGraph: Scaling graph processing on HBM-enabled FPGAs with heterogeneous pipelines. In 55th Annual IEEE/ACM International Symposium on Microarchitecture (pp. 1342-1358).
Chi, P., Li, S. C., Xu, C., Zhang, T., Zhao, J. S., Liu, Y. P., ... & Xie, Y. (2016). PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In 43rd Annual International Symposium on Computer Architecture (pp. 27-39).
Dai, G. H., Huang, T. H., Chi, Y. Z., Xu, N. Y., Wang, Y., & Yang, H. Z. (2017). ForeGraph: Exploring large-scale graph processing on multi-FPGA architecture. In 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 217-226).
Dong, W., Moses, C., & Li, K. (2011). Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World Wide Web (pp. 577-586).
Fang, P., Wang, F., Shi, Z., Feng, D., Yi, Q. X., Xu, X. H., & Zhang, Y. X. (2022). An efficient memory data organization strategy for application-characteristic graph processing. Frontiers of Computer Science, 16(1), 1-14.
Fey, M., & Lenssen, J. E. (2019). Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428.
Gui, C. Y., Zheng, L., He, B. S., Liu, C., Chen, X. Y., Liao, X. F., & Jin, H. (2019). A survey on graph processing accelerators: Challenges and opportunities. Journal of Computer Science and Technology, 34(2), 339-371.
Ham, T. J., Wu, L. S., Sundaram, N., Satish, N., & Martonosi, M. (2016). Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In 49th Annual IEEE/ACM International Symposium on Microarchitecture.
He, D. L., Yuan, P. P., & Jin, H. (2024). Answering reachability queries with ordered label constraints over labeled graphs. Frontiers of Computer Science, 18(1), 1-14.
Hu, M., Strachan, J. P., Li, Z. Y., Grafals, E. M., Davila, N., Graves, C., ... & Williams, R. S. (2016). Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication. In 53rd Annual Design Automation Conference, Article 19.
Huang, Y., Zheng, L., Yao, P. C., Wang, Q. G., Liao, X. F., Jin, H., & Xue, J. L. (2020). A heterogeneous PIM hardware-software co-design for energy-efficient graph processing. In 2020 IEEE International Parallel and Distributed Processing Symposium (pp. 684-695).
Huang, Y., Zheng, L., Yao, P. C., Wang, Q. G., Liao, X. F., Jin, H., & Xue, J. L. (2022). Accelerating graph convolutional networks using crossbar-based processing-in-memory architectures. In 2022 IEEE International Symposium on High-Performance Computer Architecture (pp. 1029-1042).
Copyright License
Copyright (c) 2025 Rohan Mehta

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright and Ethics:
- Authors are responsible for obtaining permission to use any copyrighted materials included in their manuscript.
- Authors are also responsible for ensuring that their research was conducted in an ethical manner and in compliance with institutional and national guidelines for the care and use of animals or human subjects.
- By submitting a manuscript to International Journal of Computer Science & Information System (IJCSIS), authors agree to transfer copyright to the journal if the manuscript is accepted for publication.