An Efficient Multi-FPGA System-Aware Hypergraph Partitioning Framework
Invited Speaker: Hailong You, Xidian University
Abstract: Multi-FPGA systems (MFS) are now widely used for logic emulation and rapid prototyping of large designs, and they have become the primary solution because of their high flexibility and cost-effectiveness. A multi-FPGA system consists of several FPGAs connected by physical wires or a programmable interconnection network. However, because I/O resources are limited, the FPGAs are usually not fully connected, and the connections between them may be irregular. Therefore, when an inter-FPGA signal starts from one FPGA and heads for another FPGA that is not directly connected to the source, it introduces hops through intermediate FPGAs and increases delay. Furthermore, multi-FPGA systems commonly use time-division multiplexing (TDM) to let multiple signals share a single wire, which increases signal latency. Hops and TDM therefore lengthen signal paths and degrade system performance. In the typical compilation flow of a multi-FPGA system, partitioning and system-level routing are performed in the early stages; these two processes divide a large circuit into smaller sub-circuits and interconnect the FPGAs. As circuit designs become larger and more sophisticated, these steps play an increasingly critical role in system performance and delay. Several related works focus on partitioning and system-level routing; for example, hMETIS (1997), PaToH (2011), KaHyPar (2017), and SpecPart (2022) were all proposed to minimize cut size. However, a multi-FPGA system introduces additional FPGA topology constraints and complex optimization objectives such as hop count and TDM, which make these problems significantly harder to solve. This report introduces MaPart, a novel hypergraph partitioning framework that aims to minimize the maximum path delay in a multi-FPGA system.
In MaPart, the core engine, TopoPart+, employs an improved multi-level partitioning paradigm together with a very fast candidate-FPGA propagation theorem and corollary, and strives to achieve a non-hop partition. When a non-hop partition is infeasible, MaPart combines binary search with TopoPart+ to minimize the maximum hop count during partitioning. TopoPart+ also incorporates two successive local refinement algorithms that optimize the TDM ratio, reduce the total hop count, and alleviate congestion on critical paths. In the system-level routing stage, MaPart integrates a system-level router based on layered graphs, which allows the hop count of each path to be controlled flexibly according to its timing criticality. In experiments on industrial and large-scale designs, TopoPart+ achieves a 96% reduction in cut size compared with the baseline. In an overall performance comparison with the state-of-the-art (SOTA) algorithm, MaPart reduces delay by 37%.
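To make the hop-count objective concrete, the following minimal sketch computes hop distances on an FPGA topology with breadth-first search, derives the FPGAs a neighbor of a fixed node may occupy under a hop bound, and binary-searches the smallest feasible maximum hop bound. It is an illustration under stated assumptions, not MaPart's implementation: the topology is an undirected adjacency list, `candidate_fpgas` is a hypothetical helper rather than the propagation theorem from the talk, `is_feasible(H)` is a hypothetical stand-in for running a partitioner such as TopoPart+ under hop bound `H`, and feasibility is assumed to be monotone in `H`. A bound of 1 corresponds to a non-hop partition.

```python
from collections import deque


def hop_distances(adj):
    """All-pairs hop counts on the FPGA topology (BFS from every FPGA).

    adj[i] lists the FPGAs directly wired to FPGA i; unreachable pairs stay None.
    """
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def candidate_fpgas(dist, fixed_fpga, max_hop):
    """Hypothetical helper: FPGAs a neighbor of a node fixed on `fixed_fpga`
    may use without exceeding `max_hop` (includes `fixed_fpga` itself)."""
    return {g for g, d in enumerate(dist[fixed_fpga]) if d is not None and d <= max_hop}


def min_max_hop(dist, is_feasible):
    """Smallest hop bound H in [1, diameter] with is_feasible(H), else None.

    Assumes monotone feasibility: feasible at H implies feasible at H + 1.
    """
    diameter = max(d for row in dist for d in row if d is not None)
    lo, hi = 1, max(diameter, 1)
    if not is_feasible(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(mid):
            hi = mid  # a tighter bound still works; search lower
        else:
            lo = mid + 1  # infeasible; the answer lies above mid
    return lo


# Four FPGAs wired in a line: 0 - 1 - 2 - 3
line = [[1], [0, 2], [1, 3], [2]]
dist = hop_distances(line)
print(dist[0][3])                            # 3
print(candidate_fpgas(dist, 0, 1))           # {0, 1}
print(min_max_hop(dist, lambda h: h >= 2))   # 2
```

Binary search is only valid if loosening the hop bound never turns a feasible instance infeasible; with a heuristic partitioner this monotonicity is an assumption rather than a guarantee.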
A Security-Driven FPGA Application Development Workflow Based on GCN Algorithm
Presenter: Jing Zhou, Beijing Microelectronics Technology Institute
Abstract: As the complexity of integrated circuit designs increases and the supply chain becomes more globalized, the threat of hardware Trojans has escalated, placing higher security requirements on Electronic Design Automation (EDA) software. This study introduces an FPGA integrated development environment and an FPGA application development process that incorporate a hardware Trojan detection algorithm. The algorithm extracts features from the netlist after FPGA synthesis and uses Graph Convolutional Networks (GCN) to process the rich structural features of the netlist. To address dataset imbalance, we incorporate the GraphSMOTE technique, which synthesizes minority-class samples to improve the model's generalization. In the classification phase, an optimized GCN model determines whether each node is a Trojan node. Comparative experiments with several other models show a significant improvement in detection accuracy: our model achieves the highest F1 score and a high True Positive Rate (TPR), validating the effectiveness and superiority of the GCN-based method for hardware Trojan detection. This study not only enhances the security of the FPGA application development process but also provides new insights and tools for subsequent research in hardware security.
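The two learning ingredients named above can be sketched in plain Python. `gcn_layer` implements the standard symmetric-normalized graph convolution, ReLU(D^-1/2 (A + I) D^-1/2 H W), and `smote_interpolate` shows the SMOTE-style interpolation that GraphSMOTE applies to minority-class node embeddings (GraphSMOTE also trains an edge generator to connect synthetic nodes, which is omitted here). Both functions are illustrative assumptions about how such a pipeline could look, not the authors' optimized model; a practical implementation would use a tensor library.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 . H . W).

    adj: n x n 0/1 adjacency of the netlist graph (nodes = cells/gates).
    feats: n x f node features H; weight: f x k learnable matrix W.
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization keeps high-degree nodes from dominating.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features: norm @ feats.
    agg = [[sum(norm[i][m] * feats[m][c] for m in range(n))
            for c in range(len(feats[0]))] for i in range(n)]
    # Linear transform followed by ReLU: max(0, agg @ weight).
    return [[max(0.0, sum(agg[i][c] * weight[c][k] for c in range(len(weight))))
             for k in range(len(weight[0]))] for i in range(n)]


def smote_interpolate(x, neighbor, lam):
    """Synthetic minority sample x + lam * (neighbor - x), with 0 <= lam <= 1."""
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


# Two connected nodes with scalar features 1 and 3, identity weight.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
print(smote_interpolate([0.0, 0.0], [2.0, 4.0], 0.5))        # [1.0, 2.0]
```

In a detection setting, the last layer's per-node output would feed a Trojan/normal classifier, and `lam` would typically be drawn at random for each synthetic sample, with `neighbor` chosen as a nearby minority-class (Trojan) node.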
FPGA Divide-and-Conquer Placement using Deep Reinforcement Learning
Presenter: Shang Wang, University of Alberta
Abstract: This paper introduces the problem of learning to place logic blocks in Field-Programmable Gate Arrays (FPGAs) and presents a preliminary learning-based method. Unlike previous FPGA placement algorithms, which rely on heuristic search, we employ Deep Reinforcement Learning (DRL) for the placement task, with the objective of minimizing wirelength. To support the agent's decision-making, we design state representations that include chipboard observations and the interconnections between blocks. Additionally, we propose a decomposition training paradigm that addresses the large search space and sparse rewards of the placement problem by dividing the full problem into small subtasks and solving each subtask with DRL. Experiments demonstrate the effectiveness of the decomposition paradigm on FPGA placement tasks.
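The abstract states the objective as minimizing wirelength but does not name the metric. The sketch below assumes half-perimeter wirelength (HPWL), the usual placement estimate, and shows how a subtask's reward could be scored only on nets whose blocks all belong to that subtask. The block names, coordinates, and the `subtask_reward` helper are hypothetical; they illustrate why a terminal reward of negative wirelength is sparse (a net contributes nothing until all of its blocks are placed), not the authors' reward design.

```python
def hpwl(net, pos):
    """Half-perimeter wirelength of one net: bounding-box width + height."""
    xs = [pos[b][0] for b in net]
    ys = [pos[b][1] for b in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def subtask_reward(nets, pos, blocks):
    """Negative HPWL over the nets that lie entirely inside one subtask.

    `blocks` is the set of logic blocks assigned to the subtask (hypothetical
    decomposition); nets with a block outside it, or an unplaced block, are skipped.
    """
    inside = [net for net in nets if all(b in blocks and b in pos for b in net)]
    return -sum(hpwl(net, pos) for net in inside)


pos = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
nets = [["a", "b"], ["b", "c"]]
print(sum(hpwl(n, pos) for n in nets))        # 12
print(subtask_reward(nets, pos, {"a", "b"}))  # -7
```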
PatRouter: An Optimal-Pattern-Oriented Routability-driven Routing Algorithm for FPGA
Presenter: Chen Wu, EagleChip Technology Limited
Abstract: The quality of routing results is one of the most significant aspects of modern FPGA design. Although various attempts have been made to improve routability and runtime, existing approaches pay little attention to the nature of PathFinder-based routers when optimizing the quality of routing results. In this paper, we propose PatRouter, an optimal-pattern-oriented, routability-driven algorithm that improves routing quality. A pattern generator first builds optimal patterns, defined as routing paths with the minimal number of wire segments, to represent the routing resources in the device. On this basis, we propose a pattern-oriented min-segment (PoM) connection router, which routes connections primarily with optimal patterns. Because PoM searches for routing paths among pruned routing resources, it also reduces runtime. Meanwhile, PoM gradually increases the number of wire segments allowed for congested connections to maintain routability. To further improve routability, we design a pattern-oriented A* (PoA) connection router that handles the most congested connections by searching all routing resources. Experiments on our self-defined architecture and benchmarks show that PatRouter improves the quality of routing results while maintaining routability. Specifically, with PatRouter, 46.9% of all connections are routed with the minimal number of wire segments on average, and 22.2% of all connections use sub-optimal patterns within 2 wire segments of optimal. A comparison with the latest approaches on the Titan benchmarks also shows a 1.2× reduction in wirelength.
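One way to picture the min-segment idea is to treat each wire segment as a node in a routing-resource graph, compute the minimal segment count between a connection's source and sink with breadth-first search, and prune every node that cannot lie on a path within `slack` extra segments of that minimum. The sketch below does this; it is an illustration under assumptions, not PatRouter's pattern generator or PoM router. The graph, the `blocked` set standing in for congested segments, and the `slack` parameter are hypothetical. Raising `slack` mimics how PoM gradually admits more wire segments for congested connections, and a search over the whole graph (the role the abstract gives PoA) would be the fallback when pruning leaves no path.

```python
from collections import deque


def bfs_dist(graph, start):
    """Distance (number of switches) from start to every reachable segment."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pattern_route(graph, src, dst, slack=0, blocked=frozenset()):
    """Route src -> dst using only segments on near-minimal paths.

    A segment n is kept if dist(src, n) + dist(n, dst) <= min + slack and it
    is not blocked. Returns the list of segments on the path, or None.
    """
    reverse = {}
    for u, vs in graph.items():
        for v in vs:
            reverse.setdefault(v, []).append(u)
    d_src = bfs_dist(graph, src)
    d_dst = bfs_dist(reverse, dst)
    if dst not in d_src:
        return None
    bound = d_src[dst] + slack
    allowed = {n for n in d_src
               if n in d_dst and d_src[n] + d_dst[n] <= bound and n not in blocked}
    # BFS restricted to the pruned resources, keeping parents for backtracking.
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in graph.get(u, ()):
            if v in allowed and v not in parent:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


rrg = {"s": ["a", "b"], "a": ["t"], "b": ["c"], "c": ["t"]}
print(pattern_route(rrg, "s", "t"))                          # ['s', 'a', 't']
print(pattern_route(rrg, "s", "t", blocked={"a"}))           # None
print(pattern_route(rrg, "s", "t", slack=1, blocked={"a"}))  # ['s', 'b', 'c', 't']
```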
AIP-SEM: An Efficient ML-Boost In-Place Soft Error Mitigation Method for SRAM-based FPGA
Presenter: Zhuoli Wang, Beijing Microelectronics Technology Institute
Abstract: With the wide application of SRAM-based FPGAs in aerospace engineering, the radiation resistance of FPGAs has become more important than ever. To correct errors, the Triple Modular Redundancy (TMR) technique uses a voter and incurs significant PPA overhead. We propose an efficient in-place soft error mitigation method that mitigates soft errors without any additional PPA overhead. Compared with other synthesis-based algorithms, our method uses an XGBoost-based prediction model to predict circuit metrics that are time-consuming to compute. On the EPFL benchmark circuits, our approach achieves up to a 19.42% runtime speedup with less than 3% loss in failure-rate reduction.
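A prediction model typically saves runtime by replacing most expensive metric evaluations with cheap predictions. One common way to exploit such a model is surrogate screening: rank all candidates by the predicted metric and run the exact evaluation only on a short list. The sketch below assumes lower is better (for example, a failure rate) and uses hypothetical `predict` and `evaluate` callables; in practice `predict` could wrap a trained regressor such as XGBoost's `XGBRegressor.predict`. It illustrates the general technique, not the authors' algorithm.

```python
def screen_with_surrogate(candidates, predict, evaluate, k):
    """Return (score, candidate) for the best of the top-k predicted candidates.

    predict: cheap surrogate of the metric (lower is better).
    evaluate: exact but expensive metric; called at most k times.
    """
    shortlist = sorted(candidates, key=predict)[:k]
    if not shortlist:
        return None
    scored = [(evaluate(c), c) for c in shortlist]
    return min(scored, key=lambda pair: pair[0])


calls = []


def exact_metric(c):
    calls.append(c)  # record how often the expensive evaluation runs
    return (c - 4) ** 2


best = screen_with_surrogate(range(10), lambda c: abs(c - 3), exact_metric, k=3)
print(best, len(calls))  # (0, 4) 3
```

The shortlist size `k` trades runtime against the risk that the surrogate ranks the truly best candidate too low, which is the kind of quality loss the abstract bounds at under 3%.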
System Routing and TDM Assignment Optimization in Multi-2.5D FPGA-Based Prototyping Systems
Presenter: Chenxi Huang, Xidian University
Abstract: 2.5-D FPGAs have been used in many multi-FPGA systems (MFS) for prototype verification because of their higher logic capacity and larger number of pins. Such an FPGA is composed of multiple dies connected by special wires. Because these connections are limited, FPGA-based system-level routing may cause internal congestion and lead to implementation failures. In addition, time-division multiplexing (TDM) is used on inter-FPGA connections to improve logic utilization, and each signal is assigned a TDM ratio. However, a signal's inter-FPGA delay increases with its TDM ratio, so both internal congestion and TDM ratios significantly affect system performance. In this paper, we propose a system-level routing framework with hybrid initial routing and two-stage rerouting algorithms that generates legal, high-quality routing results for a 2.5-D MFS. We then propose a three-step framework that generates legal TDM ratios and optimizes system performance, in which a high-quality discretization algorithm based on bottom-up dynamic programming minimizes the performance loss. The experimental results show an average improvement of 8% in solution quality over our baseline algorithms within reasonable runtime. Compared with the winner of the 5th EDA Elite Challenge, our solutions achieve the best results in most cases.
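The TDM discretization step can be illustrated with a small tabulated (bottom-up) dynamic program for the signals that share one inter-FPGA wire. The sketch assumes a legality rule in which the reciprocals of the TDM ratios on a wire sum to at most 1, and a hypothetical set of legal ratio values; the continuous ratios, weights, and legal set are illustrative inputs, not the paper's formulation. Each signal may only be rounded up to a legal ratio at or above its continuous value, and the DP minimizes the weighted sum of the chosen ratios, with weights standing in for timing criticality.

```python
from fractions import Fraction


def discretize_tdm(cont_ratios, weights, legal):
    """Pick a legal TDM ratio per signal on one wire.

    Constraint (assumed): sum(1 / ratio) <= 1 over the signals on the wire.
    Objective: minimize sum(weight * ratio). Returns (cost, ratios) or None.
    """
    # DP table: used wire capacity (exact fraction) -> (best cost, chosen ratios).
    states = {Fraction(0): (0, [])}
    for r_cont, w in zip(cont_ratios, weights):
        options = [r for r in legal if r >= r_cont]  # only round up
        nxt = {}
        for used, (cost, picks) in states.items():
            for r in options:
                u = used + Fraction(1, r)
                if u > 1:
                    continue  # would overfill the wire
                c = cost + w * r
                if u not in nxt or c < nxt[u][0]:
                    nxt[u] = (c, picks + [r])
        states = nxt
        if not states:
            return None  # no legal assignment exists
    return min(states.values(), key=lambda s: s[0])


legal = [2, 4, 6, 8]  # hypothetical legal TDM ratios
print(discretize_tdm([1.5, 1.5, 1.5], [1, 1, 1], legal))   # (10, [2, 4, 4])
print(discretize_tdm([1.5, 1.5, 1.5], [10, 1, 1], legal))  # (28, [2, 4, 4])
```

Keeping only the cheapest entry per used-capacity value is exact here because the remaining choices depend only on how much capacity is left and their costs add up independently. Exact fractions avoid floating-point error in the capacity check.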