
Structure of the ARM A64 instruction set (2017)

Earlier this year I bought a Raspberry Pi 3 to have as an AArch64
development machine. The fastest way to get familiar with an
instruction set is to write a disassembler for it, and I’ve made one
for 64-bit ARM in R6RS Scheme as part of
the machine-code project.
The instruction set is called ARM A64. Instructions are always 32 bits
wide, and they have a neat structure that is pretty fast to decode in software.

The architecture has 31 integer registers (x0-x30). There is also a
stack pointer register and a zero register that always contains
zeroes. Both these registers are encoded as register number 31, and
it’s up to each instruction if an operand can use the stack pointer or
the zero register. The x30 register is used to store the return
address. These registers are all 64-bit registers and the lower 32
bits can be accessed using the names w0-w30. Operations that write to
the lower 32 bits also clear the upper 32 bits, just like on AMD64.
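
As a minimal sketch of the two rules above (register number 31 meaning
either the stack pointer or the zero register, and 32-bit writes
clearing the upper half), here is a small Python model. The function
names and the allow_sp flag are hypothetical and are not taken from my
disassembler; the names xzr, wzr, sp and wsp are the usual A64 assembler
names.

```python
MASK32 = (1 << 32) - 1

def reg_name(num, is_64bit=True, allow_sp=False):
    """Name an integer register operand from its 5-bit field value.

    allow_sp is a hypothetical flag: each instruction decides whether
    register number 31 means the stack pointer or the zero register.
    """
    if not 0 <= num <= 31:
        raise ValueError("register field is 5 bits wide")
    if num == 31:
        if allow_sp:
            return "sp" if is_64bit else "wsp"
        return "xzr" if is_64bit else "wzr"
    return ("x" if is_64bit else "w") + str(num)

def write_w(old_x_value, new_w_value):
    """Model a write to wN: the old upper 32 bits of xN are discarded."""
    return new_w_value & MASK32

print(reg_name(31))                  # zero register
print(reg_name(31, allow_sp=True))   # stack pointer
print(hex(write_w(0xFFFFFFFFFFFFFFFF, 5)))
```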

There are also 32 registers usable as either floating point registers
or 128-bit vector registers. As vectors they support different
arrangements that are either 64 or 128 bits in total, containing
8-bit, 16-bit, 32-bit or 64-bit quantities. There are many
instructions that operate on multiple quantities at the same time,
which is an interesting way to speed up code: multiple loop iterations
can be run simultaneously.
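
To make the arrangements concrete, here is a rough Python sketch that
lists them and models a lane-wise integer add on packed values. The
helper names are made up for illustration, and not every instruction
accepts every arrangement, so treat the list as the general shape
rather than a rule.

```python
SUFFIX = {8: "b", 16: "h", 32: "s", 64: "d"}

def arrangements():
    """All lane-count/element-size combinations for 64 and 128 bits."""
    return [f"{total // bits}{SUFFIX[bits]}"
            for total in (64, 128)
            for bits in (8, 16, 32, 64)]

def lanes(value, total_bits, elem_bits):
    """Split a packed integer into its lanes, lowest lane first."""
    mask = (1 << elem_bits) - 1
    return [(value >> (i * elem_bits)) & mask
            for i in range(total_bits // elem_bits)]

def pack(values, elem_bits):
    """Pack a list of lane values into one integer, lowest lane first."""
    mask = (1 << elem_bits) - 1
    result = 0
    for i, v in enumerate(values):
        result |= (v & mask) << (i * elem_bits)
    return result

def add_lanes(a, b, total_bits, elem_bits):
    """Add lane by lane; each lane wraps around on its own."""
    mask = (1 << elem_bits) - 1
    sums = [(x + y) & mask
            for x, y in zip(lanes(a, total_bits, elem_bits),
                            lanes(b, total_bits, elem_bits))]
    return pack(sums, elem_bits)

print(arrangements())
a = pack([1, 2, 3, 4], 32)
b = pack([10, 20, 30, 40], 32)
print(lanes(add_lanes(a, b, 128, 32), 128, 32))
```

The four additions in the last line correspond to four iterations of a
scalar loop done in one step, which is the speed-up mentioned above.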

The instructions are documented in the ARM ARM for ARMv8-A. I’ve
counted, not including instruction aliases, 442 instruction mnemonics
(things like ADD, EOR, B.EQ, etc). They are organized in what is
basically a four-level table: main encoding, instruction group, decode
group and instruction. Chapter C4 of the manual follows the same
structure. This structure is nice for fast decoding, but it’s not
strictly necessary since all encodings at the instruction level still
need to have a unique meaning.
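
The first level of that table can be sketched in a few lines of Python.
The bit range (bits 28 to 25) and the group names below are from my
reading of the ARMv8-A manual of the time and should be checked against
it; newer revisions allocate more of this space. The function name is
hypothetical.

```python
# Patterns for bits 28..25, where "x" matches either bit value.
MAIN_ENCODINGS = [
    ("100x", "Data processing (immediate)"),
    ("101x", "Branches, exception generating and system instructions"),
    ("x1x0", "Loads and stores"),
    ("x101", "Data processing (register)"),
    ("x111", "Data processing (SIMD and floating point)"),
]

def main_encoding(insn):
    """Classify a 32-bit instruction word by its top-level group."""
    bits = format((insn >> 25) & 0xF, "04b")
    for pattern, name in MAIN_ENCODINGS:
        if all(p in ("x", b) for p, b in zip(pattern, bits)):
            return name
    return "Unallocated"

for word in (0x8B020020, 0x91000421, 0xD65F03C0, 0xF9400020, 0x1E220820):
    print(hex(word), main_encoding(word))
```

The example words are add x0, x1, x2; add x1, x1, #1; ret;
ldr x0, [x1]; and fmul s0, s1, s2. A real decoder would continue from
each group into the instruction groups, decode groups and instructions
in the same way.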

For each instruction mnemonic there can be multiple variants that
enable the instruction to handle different types of operands. An
example of this is the FMUL instruction that multiplies two floating
point values. In a C program it would look like a = b * c. In A64
assembler it might look like one of these, depending on what the
surrounding code does:

fmul s0, s1, s2            
fmul d0, d1, d2            
fmul v0.2s, v1.2s, v2.2s   
fmul v0.4s, v1.4s, v2.4s   
fmul v0.2d, v1.2d, v2.2d   
fmul s0, s1, v2.s[0]       
fmul d0, d1, v2.d[0]
fmul v0.2s, v1.2s, v2.s[0] 
fmul v0.4s, v1.4s, v2.s[0]
fmul v0.2d, v1.2d, v2.d[0]

That’s quite a few variants for a single mnemonic. Not all mnemonics
have this many variants, but depending on how one counts I estimate
that there are in total around 1000-2000 variants. The instruction set
designers had to fit all these variants into 32 bits, while at the
same time making space for instructions that encode relatively large
immediate operands, and not forgetting about leaving space for future
extensions. As if that wasn’t difficult enough, the instructions
should also be easy to decode with hardware.

I’ve extracted the tables from my disassembler, rendered them with
the bit-field package, and
made them slightly interactive. If you’re reading this in a browser
you can see the encodings below. The thing to notice is that each
layer adds extra fixed bits: fields that must be a fixed 0 or 1 value.
(The last level, the instruction level, is not shown in this table).
Two encodings under the same parent always have some differences in
these fields, so that they can be separated by an instruction decoder.
Click an encoding to expand the next level of encodings.
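
The idea of fixed bits can be sketched as a mask and a value per
encoding. This is an illustration in Python, not how my disassembler
stores its tables; the pattern strings for the two ADD encodings are
written from my reading of the manual, and the function names are
hypothetical.

```python
def compile_pattern(pattern):
    """Turn a 32-character pattern of 0, 1 and x into (mask, value)."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("pattern must describe exactly 32 bits")
    mask = value = 0
    for ch in bits:
        mask <<= 1
        value <<= 1
        if ch in "01":
            mask |= 1          # this bit is fixed
            value |= int(ch)   # and must have this value
        elif ch != "x":
            raise ValueError(f"unexpected character {ch!r}")
    return mask, value

def matches(insn, compiled):
    """True if every fixed bit of the encoding agrees with insn."""
    mask, value = compiled
    return (insn & mask) == value

def distinguishable(a, b):
    """True if some bit is fixed in both encodings with different values."""
    (mask_a, value_a), (mask_b, value_b) = a, b
    return ((mask_a & mask_b) & (value_a ^ value_b)) != 0

# sf op S 01011 shift 0 Rm imm6 Rn Rd
ADD_SHIFTED = compile_pattern("x0001011 xx0 xxxxx xxxxxx xxxxx xxxxx")
# sf op S 01011 00 1 Rm option imm3 Rn Rd
ADD_EXTENDED = compile_pattern("x0001011 001 xxxxx xxx xxx xxxxx xxxxx")

print(matches(0x8B020020, ADD_SHIFTED))    # add x0, x1, x2
print(matches(0x8B224020, ADD_EXTENDED))   # add x0, x1, w2, uxtw
print(distinguishable(ADD_SHIFTED, ADD_EXTENDED))
```

Here the two encodings share most of their fixed bits and are told
apart by bit 21, which is the kind of difference between siblings
described above.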

There are many conventions in the field names. Instructions that take
register operands encode them in fields named Rd, Rn and Rm.
Immediate values (integers, PC-relative offsets, etc) are named imm.
Fields that change the type of operation tend to be called opN or
opcode. In general a few of the fields encode the operation (or the
size of the operation) and the rest encode the operands.
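
As an example of these field conventions, here is a small Python sketch
that pulls the fields out of an ADD (shifted register) instruction. The
bit positions are from my reading of that encoding in the manual; the
function names are hypothetical, the fixed bits are assumed to have
been checked already, and reserved shift amounts are not rejected.

```python
def bits(insn, hi, lo):
    """Extract the field insn[hi:lo], both ends inclusive."""
    return (insn >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_add_shifted(insn):
    """Format ADD (shifted register); assumes the fixed bits matched."""
    sf = bits(insn, 31, 31)      # 1 selects the 64-bit variant
    shift = bits(insn, 23, 22)   # shift type applied to Rm
    rm = bits(insn, 20, 16)      # second source register
    imm6 = bits(insn, 15, 10)    # shift amount
    rn = bits(insn, 9, 5)        # first source register
    rd = bits(insn, 4, 0)        # destination register
    if shift == 3:
        raise ValueError("reserved shift type")

    prefix = "x" if sf else "w"

    def reg(num):
        # In this encoding register 31 is the zero register.
        return f"{prefix}zr" if num == 31 else f"{prefix}{num}"

    text = f"add {reg(rd)}, {reg(rn)}, {reg(rm)}"
    if shift or imm6:
        text += f", {('lsl', 'lsr', 'asr')[shift]} #{imm6}"
    return text

print(decode_add_shifted(0x8B020020))
print(decode_add_shifted(0x0B020C20))
```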

The image below shows the encoding space of the instruction set. The
x axis goes from 0 to 2^16-1 and encodes the lower 16 bits
of the instruction space, and the y axis contains the upper 16 bits.
The different colors denote different decode groups, i.e. all the
encodings at the third level of the table above. (There is probably a
better representation).
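
The mapping from an instruction word to a position in such an image is
simple enough to write down. This is a sketch with a hypothetical
function name; whether y grows upwards or downwards in the actual
rendering is not something the sketch tries to capture.

```python
def pixel(insn):
    """Map a 32-bit instruction word to (x, y) in a 65536 x 65536 grid."""
    if not 0 <= insn <= 0xFFFFFFFF:
        raise ValueError("instruction words are 32 bits wide")
    x = insn & 0xFFFF   # lower 16 bits
    y = insn >> 16      # upper 16 bits
    return x, y

print(pixel(0x8B020020))
```

Words that share their upper 16 bits land on the same row, so an
encoding whose fixed bits are all in the upper half fills whole rows.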

[Image: the 32-bit encoding space. Mostly there are horizontal thick
lines of different colors. This shows that the higher 16 bits tend to
keep similar instructions together, although there is some mirroring
around the middle of the image.]

All the dark spots are places where ARMv8 does not have any allocated
instructions, or the encoding is reserved. For many instructions there
are some fields that have reserved encodings and these are also dark.

Even if instructions are kept to the fixed 32-bit encoding there is
still plenty of room for the instruction set to grow.

ARM A64 is quite a clean instruction set with only a few quirks here
and there in its encoding. Compared to AMD64 it has twice the number
of registers, a clean separation of load/store instructions, clean
RISCy operands (mostly one destination register and two source
registers) and of course the register names and most mnemonics are
totally different. Both have 128-bit vector registers and 64-bit
integer registers and a 64-bit address space. They look quite similar,
except everything’s different.
