Bit manipulation instructions

From Wikipedia, the free encyclopedia

Bit manipulation instructions are instructions that perform bit manipulation operations in hardware, rather than requiring several instructions for those operations as illustrated with examples in software.[1] Several leading as well as historic architectures have bit manipulation instructions including ARM, WDC 65C02, the TX-2 and the Power ISA.[2]
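
As a rough illustration of the software alternative mentioned above, the following Python sketch counts set bits in two ways: a loop that clears one set bit per iteration, and a branch-free SWAR sequence. The function names and the fixed 64-bit width are assumptions made for this example and are not taken from the cited sources. A hardware population count instruction replaces either routine with a single operation.

```python
MASK64 = (1 << 64) - 1  # assumed register width for this sketch


def popcount_loop(x: int) -> int:
    """Count set bits by repeatedly clearing the lowest set bit."""
    x &= MASK64
    count = 0
    while x:
        x &= x - 1  # clears the lowest set bit
        count += 1
    return count


def popcount_swar(x: int) -> int:
    """Branch-free 64-bit population count using parallel partial sums."""
    x &= MASK64
    # 2-bit fields each hold the count of their two bits
    x = x - ((x >> 1) & 0x5555555555555555)
    # 4-bit fields each hold the count of their four bits
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # 8-bit fields each hold the count of their eight bits
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # the multiply sums all eight byte counts into the top byte
    return ((x * 0x0101010101010101) & MASK64) >> 56
```

Both routines need several operations per word, which is the cost that a dedicated instruction removes.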

Bit manipulation is usually divided into subsets, because an individual instruction can be costly to implement in hardware when the target application does not justify it. Conversely, when there is a justification, performance may suffer if the instruction is left out. Carrying out the cost-benefit analysis is a complex task: one of the most comprehensive efforts in bit manipulation was a collaboration headed by Claire Wolf, providing justifications, use cases, C code, proofs and Verilog for each proposed instruction.[3][4]

Practical examples include bit banging of GPIO on a low-cost embedded controller such as the WDC 65C02, the 8051 or a Microchip PIC. At the slow clock rates of these CPUs, the CPU would not be viable for the target application if bit set, clear and test instructions were not available.

Note:

In something of a Wikipedia fourth-wall-breaking note: GPUs and other highly specialised workloads such as cryptography tend to result in extremely specialised instructions, without which performance would be poor. Examples include the AES instruction set extensions, which cannot be used for any other purpose. GPUs such as Larrabee[5] and Nyuzi attempted to "dial back" this practice to some extent, only to discover why it is done: performance suffers badly otherwise.

This page is not about such specialised instructions, nor about their functionality. It covers the categorisation of general-purpose bit manipulation instructions, as found in CPUs and CPU families, that happen to greatly improve the performance or power consumption of specific algorithms. An example is rotate: cryptography makes heavy use of it, but it has many other practical uses elsewhere, just not as many as, say, add. Such ISA design trade-offs are painstaking but ultimately pragmatic.

If you encounter any unusual or important bit manipulation instructions, or any CPU that has them, feel free to add them below, bearing in mind that the page's primary purpose is categorisation, not explicit functional description. A helpful task for future readers would be to add pages describing the functionality to the "See also" section. Enjoy the end of the fourth wall...

Hardware bit manipulation

All the architectures below have instruction subsets and groups where the bit manipulation is provided in hardware.

Intel and AMD (x86)

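  • The core x86 instruction set contains:
    • BSR (Bit Scan Reverse): returns the index of the most significant set bit, that is, the operand width minus one minus the count of leading zeros; the result is not defined for a zero source
    • BSF (Bit Scan Forward): returns the index of the least significant set bit, which equals the count of trailing zeros; the result is not defined for a zero source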
  • The SSE4 and BMI instruction set extensions contain instructions for:
    • Count leading zeros, lzcnt
    • Count trailing zeros, tzcnt
    • Population count, popcnt
    • Bit extract and bit deposit, pext and pdep
  • The AVX-512 extension includes a bitwise ternary logic instruction, vpternlog. Also noteworthy is a conflict detection instruction, VPCONFLICTD.
  • Also present, in the GFNI subset of AVX/AVX-512, is a bit-matrix affine transformation and its inverse: GF2P8AFFINEQB is effectively an 8x8 bit-matrix multiply over GF(2) applied to each byte, followed by an XOR with a constant.[6]
  • An Intel GFNI technology guide on that AVX/AVX-512 GFNI extension also lists numerous uses, including parallel byte-wise set/clear/invert bit manipulation and 5-bit sign-extension, and points out that the potential is much greater.[7]
  • Intel BCD opcodes
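
The bit extract, bit deposit and ternary logic operations listed above can be modelled in a few lines of Python. This is a behavioural sketch only: the function names, the least-significant-bit-first numbering and the way the three inputs form the truth-table index are assumptions made for illustration, and the vendor documentation remains the authority on the exact operand order of the real instructions.

```python
def bit_extract(src: int, mask: int, width: int = 64) -> int:
    """Gather the bits of src selected by mask into the low bits of the result."""
    result = 0
    out = 0  # next free position in the packed result
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((src >> i) & 1) << out
            out += 1
    return result


def bit_deposit(src: int, mask: int, width: int = 64) -> int:
    """Scatter the low bits of src to the positions selected by mask."""
    result = 0
    taken = 0  # next low bit of src to consume
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((src >> taken) & 1) << i
            taken += 1
    return result


def ternary_logic(a: int, b: int, c: int, table: int, width: int = 64) -> int:
    """Apply an arbitrary 3-input Boolean function bit by bit.

    table is an 8-entry truth table; entry (a_bit << 2) | (b_bit << 1) | c_bit
    gives the output bit (index order assumed for this sketch).
    """
    result = 0
    for index in range(8):
        if (table >> index) & 1:
            term_a = a if index & 4 else ~a
            term_b = b if index & 2 else ~b
            term_c = c if index & 1 else ~c
            result |= term_a & term_b & term_c
    return result & ((1 << width) - 1)
```

With this indexing, a table of 0x96 gives a three-way XOR and 0xCA gives a bitwise select (a chooses between b and c). Extract and deposit are inverses of each other on the masked positions.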

Power ISA

Power ISA has a large range of bit manipulation instructions,[8] largely due to its history and relationship with IBM mainframes and the z/Architecture:

  • Count leading zeros and count trailing zeros, and masked versions of the same.[9] There is a mixture of population count,[9] parity[10] and SWAR-style instructions, but not a full set of each: popcntb is SWAR byte-level 8x8-bit, there is no 4x16-bit popcnth, yet there is a 2x32-bit popcntw and a 64-bit scalar popcntd. Likewise, prtyw is SWAR half-word 4x16-bit but there is no prtyb.
  • Masked bit-extract pextd and bit-deposit pdepd, which gather and distribute bits in place according to a mask instead of the more usual technique of an offset and a length.[11]
  • An unusual centrifuge instruction, cfuged, which moves masked bits to the left and unmasked bits to the right, preserving their relative order in both cases. Most ISAs would have one operand for the offset of the bits to extract plus another for the length: cfuged combines these into one general-purpose bitmask.[12]
  • An 8x8-bit transpose, vgbbd,[13] which treats a 64-bit quantity as an 8x8 2D matrix and performs a matrix transpose operation. Bit 0 of every byte is gathered into the first byte, bit 1 of every byte into the second, and so on.
  • A strange but very useful indexing instruction, bpermd,[14] which allows selection of up to eight individual bits from a 64-bit source by treating each byte of a second 64-bit register as a bit index into the first.
  • A bitwise ternary logic instruction with an 8-bit truth table, xxeval,[15] similar to the one in AVX-512.
  • Instructions for accelerating packed BCD.[16]
  • Power v3.1 also introduced a number of additional bit manipulation instructions including swapping the order of bytes within half-words, words, and the whole 64-bit register.
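
The 8x8 transpose and the bit-indexing operation described above can be sketched as follows. The function names are hypothetical, and the sketch numbers bits and bytes from the least significant end, whereas Power ISA numbers them from the most significant end, so this shows the idea rather than the exact register layout of vgbbd or bpermd.

```python
def transpose8x8(x: int) -> int:
    """Treat a 64-bit value as 8 rows of 8 bits and return its transpose."""
    result = 0
    for row in range(8):
        for col in range(8):
            bit = (x >> (8 * row + col)) & 1
            result |= bit << (8 * col + row)  # swap row and column
    return result


def bit_permute(indices: int, source: int) -> int:
    """Select up to eight bits of source, one per byte of indices.

    Byte i of indices names the source bit copied to result bit i.
    An index of 64 or more yields a zero bit (assumed behaviour for this sketch).
    """
    result = 0
    for i in range(8):
        index = (indices >> (8 * i)) & 0xFF
        if index < 64:
            result |= ((source >> index) & 1) << i
    return result
```

Applying the transpose twice returns the original value, and a single row of ones (0xFF) becomes a single column (0x0101010101010101).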

Cray Supercomputers

In 1990 Cray patented BMM (bit matrix multiply), which could cope with operands of up to 64x64 bits.[17]
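
A bit matrix multiply treats each operand as a matrix of single bits and replaces addition with XOR and multiplication with AND. The sketch below is a generic model over GF(2), not a description of the patented hardware; the row-list representation, the function name `bit_matrix_multiply` and the default size are assumptions chosen for readability.

```python
def bit_matrix_multiply(a_rows, b_rows, n=8):
    """Multiply two n x n bit matrices over GF(2).

    Each matrix is a list of n integers; bit j of entry i is element (i, j).
    """
    result = []
    for i in range(n):
        row = 0
        for k in range(n):
            # element (i, k) of A selects row k of B; XOR is addition in GF(2)
            if (a_rows[i] >> k) & 1:
                row ^= b_rows[k]
        result.append(row)
    return result
```

For example, with n=2 the matrix [[1, 1], [0, 1]] multiplied by itself gives the identity, because 1 + 1 = 0 in GF(2).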

IBM System/360 through z/Architecture

IBM S/370, S/370-XA, ESA/370, and ESA/390 vector operations

The IBM 3090 introduced an optional vector facility[18] to the System/370-XA and Enterprise Systems Architecture/370 instruction sets. In addition to vector arithmetic and logical operations on integer and floating-point values, it introduced two vector bit manipulation operations: count leading zeros (vczvm) and population count (vcovm).[19]

z/Architecture scalar

z/Architecture did not support the previous vector facility.[20] However, starting with the 11th edition of the z/Architecture Principles of Operation,[21] it supports the following instructions:

  • Vector count leading zeros vclz, count trailing zeros vctz[22][23] and vector population count vpopct[24]
  • Vector test under mask vtm,[25] which sets a condition code by testing the bits of one register that are selected by a second vector used as a mask: the code reports whether the selected bits are all zero, all one, or a mix of both
  • Vector GF(2) multiply and multiply-accumulate, vgfm,[26] known as carryless multiply
  • And-complement and others
  • Bit-extract and deposit[27]
  • A range of bit, byte and masked insert instructions[28]
  • Comprehensive rotate and insert instructions, including masked rotate-and-OR,[29] and shift[30]
  • Comprehensive packed BCD[31]
  • Memory-based test-and-set and various masked-test set/clear bit operations, which move or copy a single bit into condition codes[32]

DEC PDP-10

The DEC PDP-6 and PDP-10 had packed BCD[33] and LUT2-style logical operations covering the full suite of two-operand Boolean functions,[34] rather than the three-operand (ternary) logic found in AVX-512 and Power ISA.

ARM

  • ARM11 has bitwise test-AND (a bitmasked test) and test-XOR; standard logical bitwise operations including OR-complement; byte, halfword and bit reversing; and conditional byte selection and merging. Shift and rotate are available on Operand2.[35]
  • ARM Cortex-A has bit-field set, clear, extract and reverse.[36]
  • ARM A64 has SWAR-style half-word byte-swapping, bit-field insert and extract, and bit-reversing.[37]
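
The bit-field and reversal operations named in this list have short software equivalents, sketched below in Python. The function names, argument order and 32-bit default width are assumptions for illustration and do not mirror the ARM assembler syntax.

```python
def bitfield_extract(value: int, lsb: int, width: int) -> int:
    """Return the unsigned field of `width` bits starting at bit `lsb`."""
    return (value >> lsb) & ((1 << width) - 1)


def bitfield_insert(dest: int, src: int, lsb: int, width: int) -> int:
    """Replace a field of dest with the low `width` bits of src."""
    mask = ((1 << width) - 1) << lsb
    return (dest & ~mask) | ((src << lsb) & mask)


def bit_reverse(value: int, bits: int = 32) -> int:
    """Reverse the order of the low `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def swap_bytes_in_halfwords(value: int) -> int:
    """SWAR-style byte swap inside each 16-bit half of a 32-bit value."""
    value &= 0xFFFFFFFF
    return ((value & 0x00FF00FF) << 8) | ((value >> 8) & 0x00FF00FF)
```

Each helper costs a handful of shifts and masks, or a loop in the case of bit reversal, where the hardware needs one instruction.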

RISC-V

In its base integer instruction set, RISC-V has scalar bitwise operations, including logical and arithmetic shifts, but no rotate. The omissions are made up for by additional extensions.

  • The RISC-V Zb* extensions contain a significant number of bit manipulation instructions.[38] They are broken down into four groups by category (the integer subset has min/max, rotate and population count, for example), and each comes with well-researched justifications for its inclusion and the improvements it brings.[39]
  • The RISC-V Vector Extension (RVV) has instructions that qualify as hardware-level bit manipulation, but they operate on vector masks rather than on scalar registers as is normally the case. For example, a vector-mask population count is available.[40] RVV also has per-element bitwise operations.[41]
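
Without a rotate instruction, a rotate has to be assembled from two shifts and an OR, which is the kind of sequence the bit manipulation extensions collapse into one instruction. The Python sketch below shows that construction along with the and-complement and or-complement operations; the function names and the `xlen` parameter are assumptions for this example rather than instruction mnemonics.

```python
def rotate_left(value: int, amount: int, xlen: int = 64) -> int:
    """Rotate left built from two shifts and an OR."""
    mask = (1 << xlen) - 1
    amount %= xlen
    value &= mask
    return ((value << amount) | (value >> (xlen - amount))) & mask


def rotate_right(value: int, amount: int, xlen: int = 64) -> int:
    """Rotate right built from two shifts and an OR."""
    mask = (1 << xlen) - 1
    amount %= xlen
    value &= mask
    return ((value >> amount) | (value << (xlen - amount))) & mask


def and_complement(a: int, b: int, xlen: int = 64) -> int:
    """a AND (NOT b), truncated to xlen bits."""
    return a & ~b & ((1 << xlen) - 1)


def or_complement(a: int, b: int, xlen: int = 64) -> int:
    """a OR (NOT b), truncated to xlen bits."""
    return (a | ~b) & ((1 << xlen) - 1)
```

On a machine with only the shifts, each rotate by a variable amount also needs the subtraction that computes the opposite shift distance, so the saving from a native rotate is more than one instruction.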

Embedded Microcontrollers

Intel

  • The 8086 has TEST, as well as bitwise operations.[42]
  • The 8051 has SETB, CLR and CPL (set, clear and invert bit instructions), and a considerable percentage of its instructions are bit manipulation.[43] Also included are Or-complement and And-complement, which are also present in RISC-V Zb*.[44]

MOS 6502

  • The WDC 65C02 added bit-manipulation: set, reset and test on individual bits.
  • Rockwell added similar extensions (RMB, SMB, BBR and BBS) to the R65C00 series[45]

Microchip PICs

Others

  • Texas Instruments DSPs such as the TMS320C6000 series have set, clear, invert, test, extract and insert bit (or bit-field) instructions.[46]
  • The TX-2 from 1958 had "skip on bit" predication, as well as set, clear, invert and permute bits, and shift and other bitwise operations.[47][48]
  • SuperH has comprehensive memory-based bit manipulation including And-complement and Or-complement, but also has standard register-based test/set/clear and an unusual instruction that replaces bit N (in the range 0 to 7) and copies the replaced bit into the Test register.[49]

See also

References

z/Architecture Principles of Operation (PDF) (First ed.). IBM. December 2000. SA22-7832-00. Retrieved August 8, 2025.
z/Architecture Principles of Operation (PDF) (Eleventh ed.). IBM. March 2015. SA22-7832-10. Retrieved August 8, 2025.
z/Architecture Principles of Operation (PDF) (Fifteenth ed.). IBM. April 2025. SA22-7832-14. Retrieved July 3, 2025.
Power ISA™ Version 3.1 (PDF). IBM. May 1, 2020. Retrieved August 7, 2025.
IBM System/370 Vector Operations (PDF) (Third ed.). IBM Corporation. August 1986. SA22-7125-2. Retrieved September 20, 2018.
DECsystem-10/DECSYSTEM-20 Processor Reference Manual (PDF). Digital Equipment Corporation. AA-H391A-TK, AD-4391A-T1. Retrieved August 8, 2025 – via bitsavers.org.
  1. ^ "Bit Twiddling Hacks".
  2. ^ "Advanced bit manipulation instructions: Architecture, implementation and applications". ProQuest.
  3. ^ "GitHub - riscv/Riscv-bitmanip at v0.93". GitHub.
  4. ^ https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-draft.pdf
  5. ^ "TomF's talks and papers".
  6. ^ "GF2P8AFFINEQB — Galois Field Affine Transformation".
  7. ^ "Galois Field New Instructions (GFNI) Technology Guide". networkbuilders.intel.com.
  8. ^ power3.1, IBM Power ISA v3.1.
  9. ^ a b power3.1, p. 104, Power ISA Book I Chapter 3.3.13 Fixed-Point.
  10. ^ power3.1, p. 103, Power ISA Book I Chapter 3.3.13 Fixed-Point.
  11. ^ power3.1, p. 106.
  12. ^ power3.1, p. 106, Power ISA Book I Chapter 3.3.13 Fixed-Point.
  13. ^ power3.1, p. 445, Power ISA Book I Chapter 6.12.1 Vector Facility.
  14. ^ power3.1, p. 105, Power ISA Book I Chapter 3.3.13 Fixed-Point.
  15. ^ power3.1, p. 967, Power ISA Book I Chapter 7. Vector-Scalar Extension Facility.
  16. ^ power3.1, p. 117, Power ISA Book I Chapter 3.3.15 Fixed-Point.
  17. ^ https://patents.google.com/patent/US5170370A/en
  18. ^ ibm370, IBM System/370 Vector Operations.
  19. ^ ibm370, pp. 3-7–3-8.
  20. ^ z1, p. 1-1.
  21. ^ z11, p. xxviii.
  22. ^ z15, pp. 22-11–22-12.
  23. ^ z15, pp. 7-289–7-290.
  24. ^ z15, pp. 22-26, 7-424.
  25. ^ z15, p. 22-37.
  26. ^ z15, p. 22-16.
  27. ^ z15, p. 7-36.
  28. ^ z15, p. 7-309.
  29. ^ z15, pp. 7-426–7-430.
  30. ^ z15, p. 7-437.
  31. ^ z15, pp. 8-1–8-14.
  32. ^ z15, pp. 7-458–7-459.
  33. ^ pdp10, p. 2.99.
  34. ^ pdp10, p. 2.38, 2.4 Boolean Functions.
  35. ^ https://pages.cs.wisc.edu/~markhill/restricted/arm_isa_quick_reference.pdf
  36. ^ "Documentation – Arm Developer".
  37. ^ "Documentation – Arm Developer".
  38. ^ "Riscv-bitmanip/Bitmanip/Index.adoc at main · riscv/Riscv-bitmanip". GitHub.
  39. ^ "Riscv-bitmanip/Bitmanip/Overview.adoc at main · riscv/Riscv-bitmanip". GitHub.
  40. ^ "Riscv-v-spec/V-spec.adoc at master · riscvarchive/Riscv-v-spec". GitHub.
  41. ^ "Riscv-v-spec/V-spec.adoc at master · riscvarchive/Riscv-v-spec". GitHub.
  42. ^ "Bit Manipulation Instructions in 8086 | Logical Instructions". 11 August 2018.
  43. ^ https://cs.uok.edu.in/Files/79755f07-9550-4aeb-bd6f-5d802d56b46d/Custom/InstructionSet_UnitII.pdf
  44. ^ "Boolean (Bitwise) instructions in 8051 for bit manipulation". 29 April 2020.
  45. ^ "Rockwell R6500/11, R6500/12 and R6500/15 One-Chip Microcomputers". 7 June 1987. Archived from the original on 3 September 2023. Retrieved 30 April 2020.
  46. ^ https://www.ti.com/lit/pdf/spru198
  47. ^ "TX-2 Documentation".
  48. ^ http://www.bitsavers.org/pdf/mit/tx-2/TX-2_UserHandbook_ch3.pdf
  49. ^ https://shared-ptr.com/sh_insns.html