This article is about the general topic of instruction set bit manipulation subsets. For bit manipulation extensions unique to AMD and Intel, see x86 Bit manipulation instruction set.
For further information on bit manipulation in general, see bit manipulation.
Bit manipulation instructions are instructions that perform bit manipulation operations in hardware, rather than requiring several instructions for those operations as illustrated with examples in software.[1] Several leading as well as historic architectures have bit manipulation instructions including ARM, WDC 65C02, the TX-2 and the Power ISA.[2]
Bit manipulation instructions are usually divided into subsets, because an individual instruction can be costly to implement in hardware when the target application does not justify it. Conversely, if there is a justification, performance may suffer when the instruction is excluded. Carrying out the cost-benefit analysis is a complex task: one of the most comprehensive efforts in bit manipulation was a collaboration headed by Claire Wolf, providing justifications, use-cases, C code, proofs and Verilog for each proposed instruction.[3][4]
Particular practical examples include bit banging of GPIO using a low-cost embedded controller such as the WDC 65C02, the 8051 or a Microchip PIC. At the slow clock rates of these CPUs, if bit set, clear and test instructions were not available, such a low-cost CPU would not be viable for the target application.
Note:
In something of a Wikipedia fourth-wall-breaking note: GPUs and other highly specialised workloads such as cryptography tend to result in extremely specialised instructions, without which performance would suck. Examples include AES instruction set extensions that cannot in any way be used for any other purpose. GPUs such as Larrabee[5] and Nyuzi attempted to "dial back" this practice to some extent, only to discover why it is done (performance sucks otherwise... seeing a trend, here?).
This page is not about such specialised instructions, nor even about their functionality. It offers a categorisation of the general-purpose bit-manipulation instructions found in CPUs and CPU families that happen to greatly improve the performance or power consumption of specific algorithms. An example is cryptography making heavy use of rotate, while rotate has many other practical uses elsewhere: just not as many as, say, add. Such ISA design trade-offs are notoriously meticulous but ultimately pragmatic.
If you encounter any type of unusual or important bit manipulation instructions, or any CPU that has them, feel free to add them below, bearing in mind that the page's primary purpose is categorisation, not explicit functional description per se. A helpful task for future readers would be to add pages describing the functionality to the "See also" section. Enjoy the end of the fourth wall...
Also present in the AVX-512 GFNI subset is a bit-matrix affine transformation and its inverse: GF2P8AFFINEQB is effectively an 8x8 bit-matrix multiply (with bitwise, GF(2) arithmetic) applied to each byte, intended for affine transformations in the Galois field GF(2^8).[6]
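As a rough illustration of what an 8x8 bit-matrix affine transformation computes on one byte, the following C sketch multiplies an 8x8 bit matrix by an 8-bit vector using AND and parity, then XORs in a constant. The helper names parity8 and gf2_affine8 are hypothetical, and the layout (row r of the matrix in byte r, bit 0 as the least significant bit) is an assumption chosen for readability; it is not claimed to match the operand layout of the actual instruction.

```c
#include <stdint.h>

/* Parity (XOR of all bits) of an 8-bit value. */
static unsigned parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Hypothetical helper: y = M*x XOR b with bitwise (GF(2)) arithmetic.
 * ASSUMPTION: row r of M is stored in byte r of `matrix`, and bit 0 is
 * the least significant bit. Real hardware may order rows differently. */
static uint8_t gf2_affine8(uint64_t matrix, uint8_t x, uint8_t b)
{
    uint8_t y = 0;
    for (unsigned r = 0; r < 8; r++) {
        uint8_t row = (uint8_t)(matrix >> (8 * r));
        /* Output bit r is the dot product of row r with x, modulo 2. */
        y |= (uint8_t)(parity8(row & x) << r);
    }
    return (uint8_t)(y ^ b);
}
```

With the identity matrix 0x8040201008040201 and a zero constant, the sketch returns its input unchanged; with the same matrix and a constant of 0xFF it returns the bitwise complement of the input.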
Power ISA
Power ISA has a large range of bit manipulation instructions,[7] largely due to its history and relationship with IBM mainframes and the z/Architecture:
masked bit-extract pextd and bit-deposit pdepd, which gather and distribute bits according to a mask instead of the more usual technique of an offset and a length.[9]
An unusual centrifuge instruction, cfuged, which moves masked-bits to the left and unmasked bits to the right, preserving their relative order in both instances. Most ISAs would have one operand expressing the starting position of the sequential bits to extract, plus another for the length: cfuged combines these into one general-purpose bitmask.[9]
8x8-bit transpose vgbbd[10] which treats a 64-bit quantity as an 8x8 2D matrix of bits, and performs a matrix transpose operation. Bit 0 of each byte is therefore gathered into the first byte, bit 1 of each byte into the second, and so on.
a strange but very useful indexing instruction, bpermd,[11] which allows selection of up to eight individual bits from a 64-bit source, by treating each byte of a second 64-bit register as a bit-index into the first.
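The mask-driven extract and deposit, the 8x8 bit transpose and the byte-indexed bit selection listed above can each be emulated in software with a loop, which gives a feel for the work a single instruction replaces. The C sketch below is illustrative only: the function names are hypothetical, and bit 0 is taken to be the least significant bit, whereas the Power ISA documentation numbers bits from the most significant end, so the sketch shows the idea rather than the exact register layout of any instruction.

```c
#include <stdint.h>

/* Gather the bits of src selected by mask into the low end of the result. */
static uint64_t pext64(uint64_t src, uint64_t mask)
{
    uint64_t out = 0;
    unsigned k = 0;                       /* next free result position */
    for (unsigned i = 0; i < 64; i++) {
        if ((mask >> i) & 1u) {
            out |= ((src >> i) & 1u) << k;
            k++;
        }
    }
    return out;
}

/* Scatter the low bits of src to the positions selected by mask. */
static uint64_t pdep64(uint64_t src, uint64_t mask)
{
    uint64_t out = 0;
    unsigned k = 0;                       /* next source bit to consume */
    for (unsigned i = 0; i < 64; i++) {
        if ((mask >> i) & 1u) {
            out |= ((src >> k) & 1u) << i;
            k++;
        }
    }
    return out;
}

/* Treat x as 8 rows of 8 bits (row r = byte r) and transpose it. */
static uint64_t transpose8x8(uint64_t x)
{
    uint64_t out = 0;
    for (unsigned r = 0; r < 8; r++)
        for (unsigned c = 0; c < 8; c++)
            out |= ((x >> (8 * r + c)) & 1u) << (8 * c + r);
    return out;
}

/* Each byte of `indices` names one bit of src; collect the eight selected
 * bits into one byte. ASSUMPTION: an index of 64 or more selects a zero. */
static uint8_t bit_select8(uint64_t src, uint64_t indices)
{
    uint8_t out = 0;
    for (unsigned i = 0; i < 8; i++) {
        unsigned idx = (unsigned)((indices >> (8 * i)) & 0xFFu);
        if (idx < 64)
            out |= (uint8_t)(((src >> idx) & 1u) << i);
    }
    return out;
}
```

For instance, pext64(0xABCD, 0x0F0F) yields 0xBD and pdep64(0xBD, 0x0F0F) yields 0x0B0D, so deposit places the extracted bits back where they came from. Each loop runs for dozens of iterations, which is the cost a dedicated instruction avoids.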
Power ISA v3.1 also introduced a number of additional bit manipulation instructions, including ones that swap the order of bytes within half-words, within words, and across the whole 64-bit register.
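The three byte-swapping granularities can be sketched in portable C with masks and shifts. The function names below are hypothetical and are not the instruction mnemonics; the sketch only shows the intended effect on a 64-bit value.

```c
#include <stdint.h>

/* Swap the two bytes inside every 16-bit half-word. */
static uint64_t byteswap_in_halfwords(uint64_t x)
{
    return ((x & 0x00FF00FF00FF00FFULL) << 8) |
           ((x >> 8) & 0x00FF00FF00FF00FFULL);
}

/* Reverse the four bytes inside every 32-bit word. */
static uint64_t byteswap_in_words(uint64_t x)
{
    x = byteswap_in_halfwords(x);
    /* Then swap the two half-words inside each word. */
    return ((x & 0x0000FFFF0000FFFFULL) << 16) |
           ((x >> 16) & 0x0000FFFF0000FFFFULL);
}

/* Reverse all eight bytes of the 64-bit value. */
static uint64_t byteswap_doubleword(uint64_t x)
{
    x = byteswap_in_words(x);
    /* Then swap the two words. */
    return (x << 32) | (x >> 32);
}
```

Applied to 0x0102030405060708, the three functions give 0x0201040306050807, 0x0403020108070605 and 0x0807060504030201 respectively.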
IBM System/360 through z/Architecture
IBM S/370, S/370-XA, ESA/370, and ESA/390 vector operations
z/Architecture did not support the previous vector facility.[16] However, starting with the 11th edition of the z/Architecture Principles of Operation,[17] it supports the following instructions:
Vector test under mask vtm[21] - sets a Condition Code by testing the bits of one vector register that are selected by a second vector used as a mask: the code indicates whether the selected bits are all zeros, all ones, or a mix of both.
memory-based test-and-set and various masked-test set/clear bit operations, which move or copy a single bit into Condition Codes.[28]
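The three-way outcome of a test under mask can be sketched in C as follows. This is a minimal illustration, not the architecture's definition: the function and enum names are hypothetical, the operands are modelled as byte arrays, and the outcome is reported as a named enum value rather than a numeric condition code, since the exact code numbering is not stated here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three possible outcomes. */
typedef enum { TM_ALL_ZEROS, TM_MIXED, TM_ALL_ONES } tm_result;

/* Examine only the bits of data that are selected by mask. */
static tm_result test_under_mask(const uint8_t *data, const uint8_t *mask,
                                 size_t n)
{
    int saw_zero = 0, saw_one = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t selected = data[i] & mask[i];
        if (selected != 0)
            saw_one = 1;              /* at least one selected bit is 1 */
        if (selected != mask[i])
            saw_zero = 1;             /* at least one selected bit is 0 */
    }
    if (!saw_one)
        return TM_ALL_ZEROS;          /* also covers an all-zero mask */
    if (!saw_zero)
        return TM_ALL_ONES;
    return TM_MIXED;
}
```

A branch on the returned value plays the role that a conditional branch on the Condition Code plays in the hardware version.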
ARM
ARM11 has bitwise test-AND (a bitmasked test) and test-XOR, standard logical bitwise operations including OR-complement, byte, halfword and bit reversing, and conditional byte-selection/merging. Shift and rotate are available on Operand2.[29]
ARM Cortex-A has bit-field set, clear, extract and reverse.[30]
ARM A64 has SWAR-style half-word byte-swapping, bit-field insert and extract, and bit-reversing.[31]
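Bit-field extract, bit-field insert and bit reversal are each a handful of shift-and-mask steps when written in C, which is what makes single-instruction versions attractive. The sketch below uses hypothetical function names and assumes 32-bit operands with lsb below 32 and lsb + width no greater than 32; it illustrates the operations in general and does not reproduce the encoding of any particular ARM instruction.

```c
#include <stdint.h>

/* Mask with the low `width` bits set (width may be 0..32). */
static uint32_t low_mask(unsigned width)
{
    return (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Extract `width` bits of x starting at bit `lsb`, zero-extended. */
static uint32_t bitfield_extract(uint32_t x, unsigned lsb, unsigned width)
{
    return (x >> lsb) & low_mask(width);
}

/* Replace `width` bits of dst starting at `lsb` with the low bits of src. */
static uint32_t bitfield_insert(uint32_t dst, uint32_t src,
                                unsigned lsb, unsigned width)
{
    uint32_t mask = low_mask(width) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Reverse the order of all 32 bits by swapping ever larger groups. */
static uint32_t bit_reverse32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}
```

For example, bitfield_extract(0xABCD1234, 8, 8) gives 0x12, bitfield_insert(0xFFFFFFFF, 0, 4, 8) gives 0xFFFFF00F, and bit_reverse32(1) gives 0x80000000.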
RISC-V
In the standard extensions RISC-V has scalar bitwise operations including shift and arithmetic shift, but no rotate. The omissions are compensated for with additional extensions.
The RISC-V Zb* extensions contain a significant number of bit manipulation instructions.[32] The four groups are broken down into useful categories (the integer subset has min/max, rotate and popcount, for example), and have well-researched justifications for their inclusion and the improvements they bring.[33]
The RISC-V Vector Extension (RVV) has instructions that qualify as hardware-level bit manipulation, but they operate on vector masks rather than on scalar registers as is normally the case. For example, a vector-mask popcount is available.[34] RVV also has per-element bitwise operations.[35]
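Without dedicated instructions, rotate and popcount have to be assembled from shifts, masks and adds. The C sketch below shows one common way of doing so for 32-bit values; the function names are hypothetical, and the code illustrates the multi-instruction sequences that a single rotate or popcount instruction replaces rather than what any particular compiler emits.

```c
#include <stdint.h>

/* Rotate left built from two shifts and an OR. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;                                   /* keep the count in range */
    return (x << n) | (x >> ((32u - n) & 31u)); /* well defined for n == 0 */
}

/* Population count using the classic SWAR (parallel add) technique. */
static unsigned popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
    return (unsigned)((x * 0x01010101u) >> 24);       /* add 4 bytes */
}
```

For example, rotl32(0x12345678, 8) gives 0x34567812 and popcount32(0xFF00FF00) gives 16.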
The 8051 has SETB, CLR and CPL (set, clear and complement bit instructions), and a considerable percentage of its instructions are bit manipulation.[37] Also included are Or-complement and And-complement, which are likewise present in RISC-V Zb*.[38]
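In C, the single-bit and complemented-operand operations named above correspond to the following expressions. The function names are hypothetical and the sketch works on plain 8-bit values with bit numbers 0 to 7; it shows the logical effect only, not how the 8051 addresses individual bits.

```c
#include <stdint.h>

/* Set one bit (the effect of a set-bit instruction). */
static uint8_t bit_set(uint8_t v, unsigned bit)
{
    return (uint8_t)(v | (1u << bit));
}

/* Clear one bit. */
static uint8_t bit_clear(uint8_t v, unsigned bit)
{
    return (uint8_t)(v & ~(1u << bit));
}

/* Invert (complement) one bit. */
static uint8_t bit_complement(uint8_t v, unsigned bit)
{
    return (uint8_t)(v ^ (1u << bit));
}

/* AND with the complement of the second operand. */
static uint8_t and_complement(uint8_t a, uint8_t b)
{
    return (uint8_t)(a & ~b);
}

/* OR with the complement of the second operand. */
static uint8_t or_complement(uint8_t a, uint8_t b)
{
    return (uint8_t)(a | ~b);
}
```

On a CPU without such instructions, each of these typically becomes a load, a logical operation with a mask, and a store, which matters when toggling I/O pins at a slow clock rate.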