Bit manipulation instructions

Bit manipulation instructions are instructions that perform bit manipulation operations in hardware, rather than requiring several instructions to carry out the same operation in software.[1] Several leading as well as historic architectures have bit manipulation instructions, including ARM, the WDC 65C02, the TX-2 and the Power ISA.[2]
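As an illustration of the software cost, population count can be written portably as a loop. This is a minimal sketch and not taken from any cited source; the function name popcount64_sw is an assumption made for this example. A hardware popcount instruction replaces the whole loop with one operation.

```c
#include <stdint.h>

/* Portable population count (illustrative name, not a library function).
   Each iteration clears the lowest set bit, so the loop body runs once
   per 1 bit in x. */
unsigned popcount64_sw(uint64_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```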

Bit manipulation instructions are usually divided into subsets, as individual instructions can be costly to implement in hardware when the target application does not justify them. Conversely, where there is a justification, performance may suffer if the instruction is excluded. Carrying out the cost-benefit analysis is a complex task: one of the most comprehensive efforts in bit manipulation was a collaboration headed by Claire Wolf, providing justifications, use-cases, C code, proofs and Verilog for each proposed instruction.[3][4]

Particular practical examples include bit banging of GPIO using a low-cost embedded controller such as the WDC 65C02, the 8051 or a Microchip PIC. At the slow clock rates of these CPUs, if bit set, clear and test instructions were not available, the low-cost CPU would not be viable for the target application.
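A minimal sketch of bit banging in C follows. The port variable gpio_port, the pin numbers and the function names are all assumptions made for this illustration: a plain variable stands in for a memory-mapped register so that the code can run on any host. On a controller with single-instruction bit set and clear, each helper would typically compile to one instruction rather than a read-modify-write sequence.

```c
#include <stdint.h>

/* Hypothetical output port: a plain variable stands in for a
   memory-mapped GPIO register (assumption for this sketch). */
static volatile uint8_t gpio_port;

#define PIN_CLK  0u   /* assumed clock pin */
#define PIN_DATA 1u   /* assumed data pin */

void pin_set(unsigned pin)   { gpio_port |= (uint8_t)(1u << pin); }
void pin_clear(unsigned pin) { gpio_port &= (uint8_t)~(1u << pin); }
int  pin_test(unsigned pin)  { return (gpio_port >> pin) & 1u; }

/* Shift one byte out, most significant bit first: present the data
   bit, then pulse the clock high and low. */
void bitbang_send_byte(uint8_t value)
{
    for (int i = 7; i >= 0; i--) {
        if ((value >> i) & 1u)
            pin_set(PIN_DATA);
        else
            pin_clear(PIN_DATA);
        pin_set(PIN_CLK);
        pin_clear(PIN_CLK);
    }
}
```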

Note:

GPUs and other highly specialised tasks such as cryptography tend to result in extremely specialised instructions, without which performance would be unacceptably poor. Examples include the AES instruction set extensions, which serve essentially no other purpose. GPUs such as Larrabee[5] and Nyuzi attempted to "dial back" this practice to some extent, only to rediscover why it is done: performance is poor otherwise.

This page is not about such specialised instructions, nor about their functionality. It categorises the general-purpose bit manipulation instructions present in CPUs and CPU families that happen to greatly improve the performance or power consumption of specific algorithms. An example is rotate: cryptography makes heavy use of it, but it has many other practical uses elsewhere, just not as many as, say, add. Such ISA design trade-offs are meticulous but ultimately pragmatic.
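To illustrate what a rotate instruction replaces, the sketch below builds a 32-bit rotate-left from two shifts and an OR. The function name rotl32 is an assumption for this example. Many compilers recognise this idiom and emit a single rotate instruction where the target has one.

```c
#include <stdint.h>

/* Rotate x left by n bit positions (illustrative helper).
   The count is reduced modulo 32, and the right-shift amount is masked
   so that n == 0 never produces an undefined shift by 32. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```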

Unusual or important bit manipulation instructions, and CPUs that provide them, can be added below, bearing in mind that the page's primary purpose is categorisation, not explicit functional description. Pages describing the functionality of such instructions belong in the "See also" section.

Hardware bit manipulation

All the architectures below have instruction subsets and groups where the bit manipulation is provided in hardware.

Intel and AMD (x86)

Power ISA

Power ISA has a large range of bit manipulation instructions,[7] largely due to its history and relationship with IBM mainframes and the z/Architecture:

  • Count leading zeros and count trailing zeros, and masked versions of the same;[8] population count (popcount).[8]
  • Masked bit-extract pextd and bit-deposit pdepd: these gather and distribute bits according to a mask, instead of the more usual technique of an offset and a length.[9]
  • An unusual centrifuge instruction, cfuged, which moves masked bits to the left and unmasked bits to the right, preserving their relative order in both instances. Most ISAs would have operands expressing the starting offset and the number of sequential bits to extract: cfuged replaces these with one general-purpose bitmask.[9]
  • 8x8-bit transpose vgbbd,[10] which treats a 64-bit quantity as an 8x8 2D matrix of bits and performs a matrix transpose operation. Bit 0 of each byte is therefore gathered into the first byte, bit 1 of each byte into the second, and so on.
  • An unusual but very useful indexing instruction, bpermd,[11] which allows selection of up to eight individual bits from a 64-bit source, by treating each byte of a second 64-bit register as a bit index into the first.
  • A bitwise ternary logic instruction, xxeval,[12] controlled by an 8-bit truth table, similar to the ternary logic instruction of AVX-512
  • SWAR-style 8, 16 and 32-bit parity instructions
  • bit-matrix multiply and transpose, which are computationally otherwise very expensive.
  • strategic instructions for accelerating Packed BCD.[13]
  • Power v3.1 also introduced a number of additional bit manipulation instructions including swapping the order of bytes within half-words, words, and the whole 64-bit register.
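The masked extract and deposit operations in the list above can be modelled in software as follows. This is a minimal sketch under stated assumptions: the function names are invented for this example, and the code gathers to and deposits from the low-order end of the word. It illustrates the general technique only; the exact operand and bit-numbering conventions of pextd and pdepd should be checked against the Power ISA.

```c
#include <stdint.h>

/* Masked extract (illustrative): the bits of src selected by mask are
   packed together at the low-order end of the result, keeping their
   relative order. */
uint64_t pext64_sw(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t out_bit = 1;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* lowest set mask bit */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                      /* drop that mask bit */
    }
    return result;
}

/* Masked deposit (illustrative): the low-order bits of src are spread
   out to the positions selected by mask, keeping their relative order. */
uint64_t pdep64_sw(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t in_bit = 1;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);
        if (src & in_bit)
            result |= lowest;
        in_bit <<= 1;
        mask &= mask - 1;
    }
    return result;
}
```

Each loop runs once per set bit in the mask, which is the software cost that a single-instruction implementation removes.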

IBM System/360 through z/Architecture

IBM S/370, S/370-XA, ESA/370, and ESA/390 vector operations

The IBM 3090 introduced an optional vector facility[14] to the System/370-XA and Enterprise Systems Architecture/370 instruction sets. In addition to vector arithmetic and logical operations on multiple integer and floating-point values, it introduced the vector bit manipulation operations count leading zeros (vczvm) and population count (vcovm).[15]

z/Architecture scalar

z/Architecture did not support the previous vector facility.[16] However, starting with the 11th edition of the z/Architecture Principles of Operation,[17] it supports the following instructions:

  • Vector count leading zeros vclz, count trailing zeros vctz[18][19] and vector population count vpopct[20]
  • Vector test under mask vtm,[21] which sets a Condition Code by testing all elements of one register against a second vector used as a mask: whether the selected bits are all zeros, all ones, or a mix of both.
  • Vector GF(2) multiply and multiply-accumulate, vgfm,[22] known as carryless multiply
  • And-complement and others
  • bit-extract and deposit[23]
  • a range of bit, byte and masked insert instructions[24]
  • comprehensive rotate and insert instructions, including masked rotate-and-OR,[25] and shift[26]
  • comprehensive Packed BCD[27]
  • memory-based test-and-set and various masked-test set/clear bit operations, which move or copy a single bit into Condition Codes.[28]
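Carryless multiplication, mentioned in the list above, is ordinary shift-and-add multiplication with the additions replaced by XOR. The sketch below shows the scalar idea only: the function name is an assumption for this example, and it does not model the element layout or accumulation behaviour of vgfm.

```c
#include <stdint.h>

/* Carryless (GF(2) polynomial) multiply of two 32-bit operands, giving
   a 64-bit product. Partial products are combined with XOR, so no
   carries propagate between bit positions. Illustrative helper. */
uint64_t clmul32_sw(uint32_t a, uint32_t b)
{
    uint64_t result = 0;
    for (unsigned i = 0; i < 32; i++) {
        if ((b >> i) & 1u)
            result ^= (uint64_t)a << i;
    }
    return result;
}
```

For example, 3 times 3 gives 5 rather than 9, because (x + 1)(x + 1) = x^2 + 1 when coefficients are taken modulo 2.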

ARM

  • ARM11 has bitwise test-AND (a bitmasked test) and test-XOR; standard logical bitwise operations including OR-complement; byte, halfword and bit reversing; and conditional byte selection/merging. Shift and rotate are available on Operand2.[29]
  • ARM Cortex-A has bit-field set, clear, extract and reverse.[30]
  • ARM A64 has SWAR-style half-word byte-swapping, bit-field insert and extract, and bit-reversing.[31]
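The bit-field and bit-reversing operations listed above can be expressed in portable C as shown below. This is a sketch under stated assumptions: the function names and the (lsb, width) parameter convention are chosen for this example and are not the ARM mnemonics or operand encodings. Callers are assumed to pass 1 <= width and lsb + width <= 32.

```c
#include <stdint.h>

/* Mask of `width` low-order ones; width == 32 is handled separately
   to avoid an undefined shift by 32. */
static uint32_t low_mask(unsigned width)
{
    return (width >= 32u) ? UINT32_MAX : ((UINT32_C(1) << width) - 1u);
}

/* Extract `width` bits of x starting at bit `lsb`, right-justified. */
uint32_t bitfield_extract(uint32_t x, unsigned lsb, unsigned width)
{
    return (x >> lsb) & low_mask(width);
}

/* Replace `width` bits of dst starting at bit `lsb` with the
   low-order bits of src; all other bits of dst are preserved. */
uint32_t bitfield_insert(uint32_t dst, uint32_t src,
                         unsigned lsb, unsigned width)
{
    uint32_t mask = low_mask(width);
    return (dst & ~(mask << lsb)) | ((src & mask) << lsb);
}

/* Reverse the order of all 32 bits, one bit per iteration. */
uint32_t bit_reverse32(uint32_t x)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```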

RISC-V

In the standard extensions RISC-V has scalar bitwise operations including shift and arithmetic shift, but no rotate. The omissions are compensated for with additional extensions.

  • The RISC-V Zb* extensions contain a significant number of bit manipulation instructions.[32] They are broken down into four groups of useful categories (the integer subset has min/max, rotate and popcount, for example), with well-researched justifications for their inclusion and the improvements they bring.[33]
  • The RISC-V Vector Extension (RVV) has instructions that qualify as hardware-level bit manipulation, but on Vector masks rather than Scalar registers as is normally the case. For example, a Vector-mask Popcount is available.[34] RVV also has per-element bitwise operations.[35]
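A population count over a vector mask, as opposed to a scalar register, can be modelled as counting the set bits among the first vl mask bits. The sketch below is an illustration only: the function name, the byte-array representation of the mask and the parameter vl (the number of active elements) are assumptions made for this example, not the RVV programming interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits among the first vl bits of a packed mask.
   Assumed layout for this sketch: mask bit i is stored in
   mask[i / 8] at bit position i % 8. */
size_t mask_popcount(const uint8_t *mask, size_t vl)
{
    size_t count = 0;
    for (size_t i = 0; i < vl; i++)
        count += (mask[i / 8] >> (i % 8)) & 1u;
    return count;
}
```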

Embedded Microcontrollers

Intel

  • The 8086 has TEST, as well as bitwise operations[36]
  • The 8051 has SETB, CLR and CPL (set, clear and invert bit instructions), and a considerable percentage of its instructions are bit manipulation.[37] Also included are Or-complement and And-complement, which are also present in RISC-V Zb*.[38]
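The complement-combining and test operations named in this list have simple C equivalents. The sketch below is byte-wide and purely illustrative: the function names are assumptions for this example, and it does not reproduce the operand forms or flag behaviour of the 8086 or 8051 instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* AND the first operand with the complement of the second. */
uint8_t and_complement(uint8_t a, uint8_t b) { return (uint8_t)(a & ~b); }

/* OR the first operand with the complement of the second. */
uint8_t or_complement(uint8_t a, uint8_t b)  { return (uint8_t)(a | ~b); }

/* TEST-style check: AND the operands, report whether the result is
   zero, and discard the result itself. */
bool test_is_zero(uint8_t a, uint8_t mask)   { return (a & mask) == 0; }
```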

MOS 6502

  • The WDC 65C02 added bit-manipulation: set, reset and test on individual bits.
  • Rockwell added similar extensions (RMB, SMB, BBR and BBS) to the R65C00 series[39]

Microchip PIC

Others

See also

References

z/Architecture Principles of Operation (PDF) (First ed.). IBM. December 2000. SA22-7832-00. Retrieved August 8, 2025.
z/Architecture Principles of Operation (PDF) (Eleventh ed.). IBM. March 2015. SA22-7832-10. Retrieved August 8, 2025.
z/Architecture Principles of Operation (PDF) (Fifteenth ed.). IBM. April 2025. SA22-7832-14. Retrieved July 3, 2025.
Power ISA™ Version 3.1 (PDF) (v3.1 ed.). IBM. May 1, 2020. SA22-7832-14. Retrieved Aug 7, 2025.
IBM System/370 Vector Operations (PDF) (Third ed.). IBM Corporation. August 1986. SA22-7125-2. Retrieved Sep 20, 2018.
  1. ^ "Bit Twiddling Hacks".
  2. ^ "Advanced bit manipulation instructions: Architecture, implementation and applications". ProQuest.
  3. ^ "GitHub - riscv/Riscv-bitmanip at v0.93". GitHub.
  4. ^ https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-draft.pdf
  5. ^ "TomF's talks and papers".
  6. ^ "GF2P8AFFINEQB — Galois Field Affine Transformation".
  7. ^ power3.1 & IBM Power ISA v3.1.
  8. ^ a b power3.1, pp. Power ISA Book I Chapter 3.3.13 Fixed-Point p104.
  9. ^ a b power3.1, pp. Power ISA Book I Chapter 3.3.13 Fixed-Point p106.
  10. ^ power3.1, pp. Power ISA Book I Chapter 6.12.1 Vector Facility p445.
  11. ^ power3.1, pp. Power ISA Book I Chapter 3.3.13 Fixed-Point p105.
  12. ^ power3.1, pp. Power ISA Book I Chapter 7. Vector-Scalar Extension Facility p967.
  13. ^ power3.1, pp. Power ISA Book I Chapter 3.3.15 Fixed-Point p117.
  14. ^ ibm370 & IBM System/370 Vector Operations.
  15. ^ ibm370, pp. 3-7–3-8.
  16. ^ z1, p. 1-1.
  17. ^ z11, p. xxviii.
  18. ^ z15, pp. 22-11–22-12.
  19. ^ z15, pp. 7-289–7-290.
  20. ^ z15, pp. 22–26, 7–424.
  21. ^ z15, p. 22-37.
  22. ^ z15, p. 22-16.
  23. ^ z15, p. 7-36.
  24. ^ z15, p. 7-309.
  25. ^ z15, pp. 7-426–7-430.
  26. ^ z15, p. 7-437.
  27. ^ z15, pp. 8-1–8-14.
  28. ^ z15, pp. 7-458–7-459.
  29. ^ https://pages.cs.wisc.edu/~markhill/restricted/arm_isa_quick_reference.pdf
  30. ^ "Documentation – Arm Developer".
  31. ^ "Documentation – Arm Developer".
  32. ^ "Riscv-bitmanip/Bitmanip/Index.adoc at main · riscv/Riscv-bitmanip". GitHub.
  33. ^ "Riscv-bitmanip/Bitmanip/Overview.adoc at main · riscv/Riscv-bitmanip". GitHub.
  34. ^ "Riscv-v-spec/V-spec.adoc at master · riscvarchive/Riscv-v-spec". GitHub.
  35. ^ "Riscv-v-spec/V-spec.adoc at master · riscvarchive/Riscv-v-spec". GitHub.
  36. ^ "Bit Manipulation Instructions in 8086 | Logical Instructions". 11 August 2018.
  37. ^ https://cs.uok.edu.in/Files/79755f07-9550-4aeb-bd6f-5d802d56b46d/Custom/InstructionSet_UnitII.pdf
  38. ^ "Boolean (Bitwise) instructions in 8051 for bit manipulation". 29 April 2020.
  39. ^ "Rockwell R6500/11, R6500/12 and R6500/15 One-Chip Microcomputers". 7 June 1987. Archived from the original on 3 September 2023. Retrieved 30 April 2020.
  40. ^ https://www.ti.com/lit/pdf/spru198
  41. ^ "TX-2 Documentation".
  42. ^ http://www.bitsavers.org/pdf/mit/tx-2/TX-2_UserHandbook_ch3.pdf