Article
Unified Group-Decomposed MAC with Parallel Arithmetic for Scalable AI Accelerators
The multiply–accumulate (MAC) unit is a core arithmetic building block for high-speed, energy-efficient VLSI computation. Recent studies indicate that MAC operations account for roughly 60–75% of total execution time and over 65% of dynamic power consumption in modern DSP and AI accelerators, while parallel MAC architectures can improve throughput by more than 3× with up to 40% energy reduction when optimized arithmetic units are employed. Traditional MAC designs rely on conventional multipliers and carry-propagation-based adders, which leads to long critical paths, excessive switching activity, poor scalability, and increased area–power overhead when extended to parallel multi-MAC configurations. Moreover, replicating complete arithmetic blocks, each with independent control logic, further increases energy consumption and design complexity in large-scale systems. To address these limitations, the proposed architecture introduces a Unified Group-Decomposed Multiplier (UGDM) that performs hierarchical operand decomposition and parallel partial-product generation, shortening the multiplication critical path and reducing unnecessary signal transitions. For accumulation, a Predictive Skip-Merge Adder (PSMA) combines speculative carry prediction with selective skip–merge paths to accelerate summation without full carry propagation. Multiple MAC units are instantiated in parallel from this single multiplier–adder pair and governed by centralized, shared control logic, enabling synchronized operation, reduced hardware redundancy, and scalable throughput. The entire framework is realized through automated Verilog generation, which keeps the design flexible across operand widths while achieving higher speed, lower energy consumption, and improved suitability for next-generation DSP systems.
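The general ideas behind operand decomposition and skip-based accumulation can be illustrated with a small behavioral model. The Python sketch below is not the UGDM or PSMA design, whose internals the abstract does not specify. It assumes unsigned operands, a fixed group size, and a textbook carry-skip rule standing in for the predictive skip-merge logic. All function names and parameters (`split_groups`, `group_decomposed_multiply`, `carry_skip_add`, `mac`, `group_bits`, `block_bits`) are hypothetical and chosen only for illustration.

```python
def split_groups(x, width, group_bits):
    """Split an unsigned `width`-bit integer into little-endian bit groups."""
    mask = (1 << group_bits) - 1
    return [(x >> s) & mask for s in range(0, width, group_bits)]


def group_decomposed_multiply(a, b, width=16, group_bits=4):
    """Multiply by decomposing both operands into groups (assumed scheme).

    Each partial product depends on a single pair of groups, so in hardware
    all of them could be generated in parallel before being summed.
    """
    a_groups = split_groups(a, width, group_bits)
    b_groups = split_groups(b, width, group_bits)
    partials = [
        (ga * gb) << ((i + j) * group_bits)
        for i, ga in enumerate(a_groups)
        for j, gb in enumerate(b_groups)
    ]
    return sum(partials)


def carry_skip_add(x, y, width=32, block_bits=4):
    """Block-wise addition with a classic carry-skip rule (illustrative only).

    If every bit position in a block propagates (x XOR y is all ones), the
    carry-out of the block equals its carry-in, so the block can be skipped.
    The result wraps modulo 2**width, like a fixed-width accumulator.
    """
    mask = (1 << block_bits) - 1
    carry, result = 0, 0
    for s in range(0, width, block_bits):
        xb, yb = (x >> s) & mask, (y >> s) & mask
        propagate_all = (xb ^ yb) == mask
        total = xb + yb + carry
        result |= (total & mask) << s
        # Skip path: forward the incoming carry unchanged when all bits propagate.
        carry = carry if propagate_all else total >> block_bits
    return result & ((1 << width) - 1)


def mac(acc, a, b):
    """One multiply-accumulate step built from the two sketches above."""
    return carry_skip_add(acc, group_decomposed_multiply(a, b))


if __name__ == "__main__":
    acc = 0
    for a, b in [(3, 5), (1234, 5678), (65535, 65535)]:
        acc = mac(acc, a, b)
    # Compare against Python's built-in arithmetic on a 32-bit accumulator.
    print(acc == (3 * 5 + 1234 * 5678 + 65535 * 65535) % (1 << 32))
```

In this model the group size trades the number of partial products against the width of each small multiplier; a parallel multi-MAC configuration would correspond to calling `mac` on several independent accumulators under one shared control loop.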