A Logic Optimization Method Using Reinforcement Learning
Presenter: Yuting Cai, Hangzhou Dianzi University
Abstract: As design complexity increases, no fixed optimization sequence can be expected to produce optimal results for every design. Automating the exploration of the optimization-sequence space for a given design has therefore attracted a growing body of research based on deep learning methods. In this paper, we introduce a framework that generates scripts for executing logic synthesis flows. The framework integrates a novel, precisely quantified multi-objective reward function and leverages mapped-netlist circuit features to enhance the state representation within the reinforcement learning paradigm. Experimental results on the EPFL benchmarks show that the proposed method outperforms existing exploratory methods in area reduction while meeting delay constraints.
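As a rough, hypothetical illustration of the general idea (not the authors' framework), the sketch below shows a tabular Q-learning agent choosing a sequence of synthesis passes under a combined area/delay reward. The command set, toy environment, and reward weighting are illustrative assumptions; a real flow would invoke a synthesis tool and use the paper's quantified multi-objective reward and mapped-netlist features.

```python
# Hypothetical sketch: an RL agent selecting logic-synthesis passes.
# The environment, command set, and reward model are illustrative assumptions.
import random
from collections import defaultdict

COMMANDS = ["rewrite", "refactor", "balance", "resub"]  # example ABC-style passes

class SynthesisEnv:
    """Toy stand-in for a synthesis tool: each pass perturbs area and delay."""
    def __init__(self, area=1000.0, delay=10.0):
        self.area, self.delay = area, delay
    def step(self, cmd):
        # Placeholder effect model; a real flow would run the chosen pass.
        self.area *= random.uniform(0.95, 1.0)
        self.delay *= random.uniform(0.97, 1.02)
        # Multi-objective reward: reward area reduction, penalize delay violations.
        reward = (1000.0 - self.area) / 1000.0 - max(0.0, self.delay - 10.0)
        return (round(self.area, -1), round(self.delay, 1)), reward

q_table = defaultdict(float)
env, state, eps = SynthesisEnv(), (1000.0, 10.0), 0.1
for _ in range(50):  # explore a 50-step optimization sequence
    if random.random() < eps:
        action = random.choice(COMMANDS)
    else:
        action = max(COMMANDS, key=lambda a: q_table[(state, a)])
    next_state, reward = env.step(action)
    # One-step Q-learning update.
    best_next = max(q_table[(next_state, a)] for a in COMMANDS)
    q_table[(state, action)] += 0.1 * (reward + 0.9 * best_next - q_table[(state, action)])
    state = next_state
```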
An Energy-efficient Multiplier Using Hybrid Approximate Logic Synthesis for Mixed-quantization CNNs
Presenter: Yang Zhang, Southeast University
Abstract: Approximate computing is an emerging paradigm that, by relaxing the requirement for full accuracy in convolutional neural networks (CNNs), reduces design area and power consumption. In circuit design, approximate logic synthesis (ALS) automatically discovers and synthesizes approximate circuits from an exact circuit description. This paper proposes a Hybrid ALS Flow that combines Re-partition XOR-BMF ALS with Cartesian Genetic Programming ALS (CGP-based ALS), and designs Hessian-aware mixed-quantization CNNs. Using the proposed Hybrid ALS Flow, an 8-bit approximate multiplier is designed and applied to the mixed-quantization CNNs. Experiments show that the proposed Re-partition XOR-BMF ALS explores the design space better than BLASYS. Compared with the exact multiplier, the designed approximate multiplier reduces the power-delay product (PDP) by 56.17% in an industrial 28nm process at a 0.8V supply, with accuracy losses of only 1.33% and 2.36% for VGG16 and ResNet50 on CIFAR-100.
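For readers unfamiliar with approximate multipliers, the sketch below shows one common simplification, truncating low-order partial-product columns of an 8-bit multiplier and measuring the resulting error. It is illustrative only and is not the paper's Re-partition XOR-BMF or CGP-derived design.

```python
# Illustrative sketch only: an 8-bit approximate multiplier that truncates the
# low-order partial-product columns; not the paper's Hybrid ALS design.
def approx_mul8(a: int, b: int, truncated_bits: int = 4) -> int:
    """Multiply two 8-bit unsigned ints, discarding low partial-product columns."""
    assert 0 <= a < 256 and 0 <= b < 256
    result = 0
    for i in range(8):
        if (b >> i) & 1:
            partial = a << i
            # Drop columns below the truncation threshold to save logic.
            result += partial & ~((1 << truncated_bits) - 1)
    return result

# Mean relative error over all input pairs gives a quick accuracy estimate.
errs = [abs(approx_mul8(a, b) - a * b) / max(a * b, 1)
        for a in range(256) for b in range(256)]
print(f"mean relative error: {sum(errs) / len(errs):.4f}")
```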
TLED: Training-Based Approximate Layer Exploration in DNNs with Efficient Multipliers
Presenter: Kunlong Li, Fudan University
Abstract: Deep Neural Networks (DNNs) have demonstrated exceptional capabilities in complex tasks, driving advances in computation methods, notably approximate computation. This approach, particularly with approximate multipliers, is crucial for DNN accelerators. Achieving a high-performance accelerator requires not only selecting suitable approximate multipliers but also fine-tuning the network's weights, yet traditional methods treat these steps separately. We introduce trainable TLED-layers that integrate multiplier updates and weight tuning: parameterized approximate multipliers are embedded into the layers, allowing hardware cost to be optimized directly in the loss function, so the network can be trained with conventional methods after layer replacement. This optimizes both hardware efficiency and accuracy, as demonstrated on MNIST and CIFAR-10. On LeNet, the resulting hardware-friendly accelerator reduces the power-delay-area product by 10.17% compared with state-of-the-art approximate multipliers, and a design with only 0.25% accuracy loss achieves a substantial 75.82% reduction in the power-delay-area product. For AlexNet, our approach achieves the highest accuracy among both precise and approximate multipliers. Experimental findings confirm the superiority of the proposed methodology over prior designs.
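To convey how weights and an approximation parameter can be trained jointly against a loss that includes a hardware term, the minimal sketch below optimizes a single linear layer together with a continuous approximation level p. The error model, hardware-cost term, and all names are assumptions for illustration, not the TLED formulation.

```python
# Hypothetical sketch of joint weight/approximation-level training: parameter p
# models the multiplier's approximation degree via a deterministic scaling error,
# and the loss adds a hardware-cost term that favors larger p (cheaper hardware).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=256)

w = np.zeros(4)     # layer weights
p = 0.5             # approximation level in [0, 1]; larger = cheaper but less accurate
lam, lr = 0.2, 0.05

for _ in range(1000):
    scale = 1.0 - 0.05 * p                       # toy error model of the approx multiplier
    pred = scale * (X @ w)
    err = pred - y
    loss = np.mean(err ** 2) + lam * (1.0 - p)   # task loss + hardware-cost term
    grad_w = 2 * scale * X.T @ err / len(y)
    grad_p = 2 * np.mean(err * (-0.05) * (X @ w)) - lam
    w -= lr * grad_w
    p = float(np.clip(p - lr * grad_p, 0.0, 1.0))

print("weights:", np.round(w, 2), "approximation level:", round(p, 2))
```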
PWL-Explorer: A Reconfigurable Architecture for Nonlinear Activation Function with Automatic DSE
Presenter: Yiqing Mao, Fudan University
Abstract: Nonlinear functions are indispensable components of deep neural networks (DNNs). Based on piecewise approximation with a non-uniform segmentation strategy, this paper proposes PWL-Explorer, a highly parameterized, reconfigurable nonlinear core designed in Chisel to implement diverse nonlinear functions. Additionally, we model the design space exploration (DSE) as a multi-objective black-box optimization problem solved via Bayesian optimization, exploring the Pareto front of PWL-Explorer over the precision and area-delay product (ADP) objectives. Experimental results show that, compared with state-of-the-art work on element-wise activation functions, the optimized PWL-Explorer architecture achieves on average 1.58x better ADP together with 2.93x better maximum-absolute-error (MAE) precision. Compared with state-of-the-art work on Softmax, PWL-Explorer achieves 1.05x better ADP together with 40.01x better mean-square-error (MSE) precision. With its carefully engineered hardware design and push-button workflow, PWL-Explorer offers ready support to DNN accelerator developers.
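As a rough software illustration of non-uniform piecewise-linear approximation (not PWL-Explorer's hardware or segmentation algorithm), the sketch below places segment breakpoints where a sigmoid's curvature is highest and reports the maximum absolute error; the curvature-equalization heuristic and segment count are assumptions.

```python
# Hypothetical sketch: non-uniform piecewise-linear (PWL) approximation of a
# nonlinear activation. The segmentation heuristic is illustrative only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Place breakpoints by equalizing cumulative |second derivative| (curvature).
xs = np.linspace(-8, 8, 4001)
curv = np.abs(np.gradient(np.gradient(sigmoid(xs), xs), xs)) + 1e-12
cdf = np.cumsum(curv)
cdf /= cdf[-1]
n_segments = 16
breaks = np.interp(np.linspace(0, 1, n_segments + 1), cdf, xs)

def pwl_eval(x):
    """Evaluate the PWL approximation by interpolating between breakpoints."""
    return np.interp(x, breaks, sigmoid(breaks))

test = np.linspace(-8, 8, 100001)
mae = np.max(np.abs(pwl_eval(test) - sigmoid(test)))
print(f"{n_segments} non-uniform segments -> max abs error {mae:.2e}")
```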
High-Dimensional Analog Circuit Sizing via Bayesian Optimization in the Variational Autoencoder Enhanced Latent Space
Presenter: Wangzhen Li, Fudan University
Abstract: High-dimensional analog circuit sizing with machine learning-based surrogate models suffers from the high sampling cost of evaluating expensive black-box objective functions in huge design spaces. This work addresses the sampling-efficiency challenge by carefully reducing the dimensionality of the input space, enabling efficient optimization for automated analog circuit sizing. We propose a latent space optimization method with an iteratively updated generative model based on a variational autoencoder, which embeds the solution manifold of analog circuits into a low-dimensional, continuous space where the latent variables are optimized using Bayesian optimization (BO). The effectiveness of the proposed method is verified on two real-world analog circuits with 18 and 59 design variables. Compared with BO in the original high-dimensional spaces or in latent low-dimensional spaces built with other embedding strategies, the proposed method achieves 23%-73% improvements in optimization performance under the same runtime limits. We also conduct a technology migration experiment using the pre-trained variational autoencoder model, which demonstrates the necessity of pre-training and the scalability of the proposed method.
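The compressed sketch below illustrates the latent-space optimization loop in general terms: PCA stands in for the paper's variational autoencoder, a random-candidate expected-improvement search stands in for a full BO acquisition step, and the objective is a toy surrogate for a circuit figure of merit. All components are stand-in assumptions, not the authors' implementation.

```python
# Hypothetical sketch of latent-space Bayesian optimization for design sizing.
# PCA replaces the VAE encoder/decoder; the objective replaces circuit simulation.
import numpy as np
from scipy.stats import norm
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):                       # toy stand-in for a circuit figure of merit
    return np.sum((x - 0.3) ** 2, axis=-1)

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 18))    # 18 design variables, 40 initial samples
y = objective(X)

for _ in range(20):
    pca = PCA(n_components=4).fit(X)    # low-dimensional latent space (re-fit each round)
    Z = pca.transform(X)
    gp = GaussianProcessRegressor(normalize_y=True).fit(Z, y)
    # Expected improvement (minimization) over random latent candidates.
    cand = rng.normal(scale=Z.std(axis=0), size=(2000, 4)) + Z.mean(axis=0)
    mu, sd = gp.predict(cand, return_std=True)
    imp = y.min() - mu
    ei = imp * norm.cdf(imp / (sd + 1e-9)) + sd * norm.pdf(imp / (sd + 1e-9))
    z_new = cand[np.argmax(ei)]
    # Decode back to the original design space and evaluate.
    x_new = np.clip(pca.inverse_transform(z_new.reshape(1, -1))[0], 0, 1)
    X = np.vstack([X, x_new])
    y = np.append(y, objective(x_new))

print("best figure of merit:", y.min())
```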