containing only 14 transistors, and embedded it within
both array and Wallace Tree multipliers (Choi, J.,
Jeon, J., & Choi, K., 2000). The result was a notable
reduction in both silicon area and dynamic power,
while the hybrid full adders helped restore the
voltage swing lost in earlier GDI-only designs.
Hybrid implementations were found to address
some shortcomings of pure-CMOS and pure-GDI
multipliers, and simulations in 250 nm
technology showed substantial improvements over
traditional designs (Benini, L. et al., 2020). Wallace
Tree-based multipliers benefited in particular from
this arrangement, achieving smaller area and higher
speed than classical structures at larger bit-widths
(Elguibaly, F., 2020).
Jain et al. built on this work by integrating Vedic
mathematics with the Booth and Wallace Tree
algorithms, producing hybrid multiplier designs
that exploit the advantages of each approach. Their
fusion of the Radix-4 Booth and Wallace Tree
algorithms reduced both critical-path delay and area,
making the resulting architecture suitable for
applications that require both computational speed
and hardware efficiency. Similar combinations of
high-speed arithmetic techniques reflect the industry's
move toward multipliers tailored to diverse,
domain-specific requirements.
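To make the Radix-4 Booth side of such hybrids concrete, the following is a minimal textbook sketch of radix-4 Booth recoding, not Jain et al.'s implementation. It assumes 8-bit two's-complement operands; the function names `booth_radix4_digits` and `booth_radix4_multiply` are hypothetical, and the Wallace Tree stage is replaced by a plain sum of the partial-product rows.

```python
def booth_radix4_digits(b: int, n_bits: int = 8) -> list[int]:
    """Radix-4 Booth recoding of b, read as an n_bits two's-complement value.

    Digit i is -2*b[2i+1] + b[2i] + b[2i-1] (with b[-1] = 0), so every
    digit lies in {-2, -1, 0, +1, +2} and sum(d_i * 4**i) equals b.
    """
    assert n_bits % 2 == 0
    bits = [(b >> i) & 1 for i in range(n_bits)]  # two's-complement bits, LSB first
    digits = []
    for i in range(0, n_bits, 2):
        prev = bits[i - 1] if i > 0 else 0  # overlapping bit from the previous group
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
    return digits


def booth_radix4_multiply(a: int, b: int, n_bits: int = 8) -> int:
    """Signed product a*b built from n_bits/2 Booth partial products."""
    # Each row is 0, +-a or +-2a, shifted by two bit positions per digit.
    rows = [(a * d) << (2 * i) for i, d in enumerate(booth_radix4_digits(b, n_bits))]
    # Hardware would compress these rows with a Wallace tree; here they are just summed.
    return sum(rows)
```

For an 8-bit multiplier this yields four partial-product rows instead of eight, which is what shortens the subsequent reduction tree; for example, `booth_radix4_digits(7)` returns `[-1, 2, 0, 0]`, since 7 = -1 + 2·4.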
Munawar et al. modified Dadda multipliers with
carry-select adders (CSLA) and binary-to-excess-1
converters (BEC), which delivered further speed
improvements and power reductions over traditional
Dadda Tree and array multipliers (Fadavi-Ardekani, J.,
2022). Their Cadence-based analysis in 180 nm
technology showed that judicious adder selection and
architectural modularity can help balance energy,
timing, and layout requirements in custom VLSI
accelerators (Cooper, A. R., 1988).
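The carry-select-with-BEC idea can be illustrated behaviorally. In this generic sketch (not Munawar et al.'s circuit), each 4-bit block computes its carry-in-0 sum with one ripple-carry adder and derives the carry-in-1 sum from it with a binary-to-excess-1 converter instead of a second ripple-carry adder. The 16-bit width, 4-bit block size, and all function names are assumptions made for illustration.

```python
def ripple_carry_add(x: int, y: int, cin: int, width: int) -> tuple[int, int]:
    """Bit-level ripple-carry adder: returns (sum mod 2**width, carry_out)."""
    s, carry = 0, cin
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        s |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (carry & (xi ^ yi))
    return s, carry


def bec(value: int, width: int) -> tuple[int, int]:
    """Binary-to-excess-1 converter: value + 1, bit i flips iff all lower bits are 1."""
    out, carry = 0, 1
    for i in range(width):
        vi = (value >> i) & 1
        out |= (vi ^ carry) << i
        carry &= vi  # stays 1 only while every bit seen so far is 1
    return out, carry


def csla_bec_add(x: int, y: int, width: int = 16, block: int = 4) -> tuple[int, int]:
    """Carry-select addition where the carry-in-1 result of each block comes from a BEC."""
    assert width % block == 0
    mask = (1 << block) - 1
    total, carry = 0, 0
    for pos in range(0, width, block):
        xb, yb = (x >> pos) & mask, (y >> pos) & mask
        s0, c0 = ripple_carry_add(xb, yb, 0, block)  # result assuming carry-in = 0
        s1, c_bec = bec(s0, block)                   # result assuming carry-in = 1
        c1 = c0 | c_bec                              # c0 and c_bec are never both 1
        s, carry = (s1, c1) if carry else (s0, c0)   # the "multiplexer" selection
        total |= s << pos
    return total, carry
```

The `if carry` selection stands in for the multiplexer that picks each block's precomputed result once the carry from the lower block arrives; the usual motivation for this variant is that a BEC needs less hardware than a second ripple-carry adder per block.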
Finally, Tung and Huang proposed a high-
performance, pipelined multiply-accumulate (MAC)
unit that integrates addition and accumulation
directly into the partial product reduction process,
thereby reducing the delay and switching activity
of conventional, separate accumulation stages. This approach
is especially beneficial in applications requiring
repeated MAC operations, such as DSP and neural
network inference engines.
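Folding accumulation into the reduction tree can be modeled in a few lines. This is a hypothetical, unpipelined sketch assuming unsigned 8-bit operands and a 24-bit accumulator; `compress_3_to_2` and `mac_step` are illustrative names, and the code does not reproduce Tung and Huang's pipelined design.

```python
def compress_3_to_2(x: int, y: int, z: int) -> tuple[int, int]:
    """Carry-save (3:2) compression of three rows: x + y + z == s + c."""
    s = x ^ y ^ z                                # per-bit full-adder sum
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-bit carry, moved one column left
    return s, c


def mac_step(acc: int, a: int, b: int, n_bits: int = 8, acc_bits: int = 24) -> int:
    """Return (acc + a*b) mod 2**acc_bits, treating acc as one more partial-product row."""
    assert 0 <= a < (1 << n_bits) and 0 <= b < (1 << n_bits)
    assert 0 <= acc < (1 << acc_bits)
    # One shifted copy of a per set bit of b (zero rows kept for clarity).
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    rows.append(acc)  # the accumulator enters the reduction tree directly
    # Wallace-style reduction: compress each full group of three rows until two remain.
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows = []
        for j in range(0, full, 3):
            next_rows.extend(compress_3_to_2(*rows[j:j + 3]))
        rows = next_rows + rows[full:]
    # A single carry-propagate addition serves both the multiply and the accumulate.
    return sum(rows) & ((1 << acc_bits) - 1)
```

Because the accumulator is simply a ninth row alongside the eight partial products, the multiply and the accumulate share one final carry-propagate adder rather than needing one each.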
In summary, recent literature reflects a strong
progression from conventional, speed-oriented
multiplier architectures (e.g., Wallace, Dadda, Braun)
toward designs that optimize for energy efficiency and
silicon area, often at the architectural, logic, and even
transistor levels. Architectural segmentation, selective
approximation, reversible logic, and hybrid
CMOS-GDI implementations are central to these
advancements. The Additive Multiply Module
(AMM) multiplier was developed in this context; it
employs operand segmentation and efficient adder
arrangements to build on these trends, making it
well suited to modern low-power VLSI systems
in which trade-offs between accuracy, power, and area
are key design considerations.
3 METHODOLOGY
3.1 Input Division Strategy
The multiplicand and multiplier are systematically
divided into smaller segments to facilitate partial
product generation.
Multiplicand Division: The 8-bit multiplicand,
denoted as A, is partitioned into two 4-bit segments:
a higher-order part ($A_H$) and a lower-order part ($A_L$).
This division can be expressed mathematically as:
$$A = (A_H \times 2^{4}) + A_L \quad \text{or} \quad A = (A_H \ll 4) + A_L \tag{1}$$
where $A_H$ represents the most significant 4 bits and
$A_L$ the least significant 4 bits of A. For
instance, if $A = (a_7 a_6 a_5 a_4 a_3 a_2 a_1 a_0)_2$, then
$A_H = (a_7 a_6 a_5 a_4)_2$ and $A_L = (a_3 a_2 a_1 a_0)_2$.
Multiplier Division: The 8-bit multiplier, denoted as B,
is divided into four 2-bit segments, ordered from most
significant to least significant: $B_3, B_2, B_1, B_0$.
This decomposition can be represented as [1]:
$$B = (B_3 \times 2^{6}) + (B_2 \times 2^{4}) + (B_1 \times 2^{2}) + B_0 \tag{2}$$
or, equivalently, using bitwise left shifts:
$$B = (B_3 \ll 6) + (B_2 \ll 4) + (B_1 \ll 2) + B_0 \tag{3}$$
Here, $B_3, B_2, B_1, B_0$ are individual 2-bit groups. For
example, if $B = (b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0)_2$, then
$B_3 = (b_7 b_6)_2$, $B_2 = (b_5 b_4)_2$, $B_1 = (b_3 b_2)_2$, and
$B_0 = (b_1 b_0)_2$.
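The segmentation in Eqs. (1) to (3) amounts to a few shifts and masks. The sketch below assumes unsigned 8-bit operands (the text does not state signedness), and the helper names `split_multiplicand` and `split_multiplier` are hypothetical.

```python
def split_multiplicand(a: int) -> tuple[int, int]:
    """Eq. (1): return (A_H, A_L) such that A = (A_H << 4) + A_L."""
    assert 0 <= a < 256
    return (a >> 4) & 0xF, a & 0xF


def split_multiplier(b: int) -> list[int]:
    """Eqs. (2)/(3): return [B3, B2, B1, B0] such that B = (B3<<6)+(B2<<4)+(B1<<2)+B0."""
    assert 0 <= b < 256
    return [(b >> shift) & 0b11 for shift in (6, 4, 2, 0)]


# Usage: A = 1011 0110 (0xB6), B = 10 01 11 01 (157)
a_h, a_l = split_multiplicand(0xB6)              # (11, 6)
b3, b2, b1, b0 = split_multiplier(0b10011101)    # [2, 1, 3, 1]
assert (a_h << 4) + a_l == 0xB6                                  # Eq. (1)
assert (b3 << 6) + (b2 << 4) + (b1 << 2) + b0 == 0b10011101      # Eq. (3)
```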
This division strategy enables the generation of
smaller, less complex partial products, which are
fundamental to the efficient design of the overall
multiplier architecture.
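One way to see why the segments yield smaller partial products: assuming, purely for illustration, that each 4-bit segment of A is multiplied by each 2-bit group of B (the actual generation scheme is the subject of the next subsection), the 8×8 product becomes eight 4×2-bit products of at most 6 bits each, recombined with the shifts from Eqs. (1) and (3). The function names below are hypothetical, and unsigned operands are assumed.

```python
def segmented_products(a: int, b: int) -> list[tuple[int, int]]:
    """Return (partial_product, shift) pairs: each 4-bit half of A times each 2-bit group of B."""
    assert 0 <= a < 256 and 0 <= b < 256
    a_parts = [((a >> 4) & 0xF, 4), (a & 0xF, 0)]           # (A_H, 4), (A_L, 0), Eq. (1)
    b_parts = [((b >> s) & 0b11, s) for s in (6, 4, 2, 0)]  # (B_i, 2i), Eqs. (2)-(3)
    # A 4-bit by 2-bit product is at most 15 * 3 = 45, so it fits in 6 bits.
    return [(ap * bp, sa + sb) for ap, sa in a_parts for bp, sb in b_parts]


def segmented_multiply(a: int, b: int) -> int:
    """Shift each small partial product into place and add them all up."""
    return sum(pp << shift for pp, shift in segmented_products(a, b))
```

In hardware the final summation would be performed by an adder tree; the Python `sum` only checks that the decomposition is exact.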
3.2 Bitwise Partial Product Generation
Following the systematic segmentation of the 8-bit
multiplicand A and multiplier B as described in
Section 3.1, the next critical step involves the parallel
generation of partial products. This phase leverages
the inherent bitwise nature of digital multiplication to
decompose the overall operation into simpler, smaller-