Abstract: This paper addresses the low computational efficiency and insufficient flexibility of the Paillier homomorphic encryption algorithm during encryption and homomorphic operations. We design and implement an acceleration scheme for the Paillier algorithm. Using software-hardware co-design, the scheme handles algorithmic precomputation, data interaction, and the parsing of algorithm operations, thereby improving flexibility and reducing resource consumption. Computational throughput and real-time performance are further improved through the customized design and implementation of a dual-high-radix Montgomery modular multiplication core. Test results demonstrate a significant acceleration of the algorithm's critical computational steps. At a computational width of 1024 bits, the average latencies for modular multiplication and modular exponentiation are approximately 0.523 μs and 667.42 μs, respectively. Compared with an Intel Core i9-13900HX processor, these latencies are reduced by approximately 68.74% and 42.76% (corresponding to speedups of 3.20× and 1.75×). The proposed scheme can provide efficient privacy-preserving computation support for secure multi-party computation and federated learning.
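For readers unfamiliar with the operation the core accelerates, the following is a minimal software sketch of word-serial (high-radix) Montgomery modular multiplication. It is an illustration only, not the paper's dual-high-radix hardware design: the function names, the 16-bit word size, and the helper `modmul_via_montgomery` are assumptions introduced here, and Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
def montgomery_multiply(a, b, n, word_bits=16, num_words=None):
    """Return a * b * R^-1 mod n, with R = 2**(word_bits * num_words).

    Illustrative sketch: n must be odd, a < R, and b < n.
    """
    if n % 2 == 0:
        raise ValueError("Montgomery multiplication requires an odd modulus")
    if num_words is None:
        # Smallest word count such that R >= 2**bit_length(n) > n.
        num_words = -(-n.bit_length() // word_bits)
    base = 1 << word_bits
    mask = base - 1
    # n_prime = -n^-1 mod 2^word_bits (precomputed once per modulus).
    n_prime = (-pow(n, -1, base)) % base

    t = 0
    for i in range(num_words):
        # Consume one radix-2^word_bits digit of a per iteration.
        a_i = (a >> (i * word_bits)) & mask
        t += a_i * b
        # Choose m so that t + m*n is divisible by the radix.
        m = ((t & mask) * n_prime) & mask
        t = (t + m * n) >> word_bits
    # t < 2n here, so one conditional subtraction suffices.
    if t >= n:
        t -= n
    return t


def modmul_via_montgomery(a, b, n, word_bits=16):
    """Compute a * b mod n through one Montgomery multiplication (hypothetical helper)."""
    num_words = -(-n.bit_length() // word_bits)
    r = 1 << (word_bits * num_words)
    a_bar = (a * r) % n  # a in Montgomery form; a_bar * b * R^-1 = a * b mod n
    return montgomery_multiply(a_bar, b, n, word_bits, num_words)
```

A larger `word_bits` reduces the number of loop iterations at the cost of wider multipliers per step, which is the general trade-off behind high-radix designs.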