Endians - big and little
Confusing behavior in hex dumpers
Consider the following bit of assembly:
; test.asm
bits 16
mov cl, 12
If you assemble this with nasm:
nasm test.asm
You will get a file that should contain the following binary sequence, as per the 8086 reference manual:
1011000100001100
Explanation: there are two bytes here:
- 1011 0 001: the first 4 bits are the immediate-to-register opcode, the 0 bit after that is the w bit, and the 001 bits after that encode the register cl.
- 0000 1100: this is just 12 in binary.
In hexadecimal, this would be
b1 0c
That’s because:
- 1011 is b
- 0001 is 1
- 0000 is 0
- 1100 is c
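The bit layout above can be checked with a few lines of Python. This is only a sketch: the helper name encode_mov_imm8 and its parameter names are made up for illustration, and it covers only the 8-bit immediate-to-register form described here.

```python
def encode_mov_imm8(reg: int, imm: int) -> bytes:
    """Encode `mov r8, imm8` as described above (hypothetical helper)."""
    opcode = 0b1011          # immediate-to-register opcode (4 bits)
    w = 0                    # w bit: 0 means an 8-bit operand
    first = (opcode << 4) | (w << 3) | (reg & 0b111)
    return bytes([first, imm & 0xFF])

CL = 0b001                   # register field for cl

code = encode_mov_imm8(CL, 12)
print(" ".join(f"{b:08b}" for b in code))  # 10110001 00001100
print(" ".join(f"{b:02x}" for b in code))  # b1 0c
```

Both printed lines show the bytes in the order they sit in the file, which is the order the assembler emits them.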
OK, now here comes the first confusing part. If you do a hexdump of the binary using hexdump test, you get:
0000000 0cb1
0000002
Ignore the offset. What you see, though, is that you get 0cb1 instead of b10c. Hold on to that thought for a moment. If you use hexdump -C test instead, you get:
00000000 b1 0c |..|
00000002
Interesting, so now we do see b1 0c. How about our trusted xxd test?
00000000: b10c ..
As you will have guessed, the issue here is one of endianness. If you pass the -e option to xxd, you get the output in little endian. That is, xxd -e test gives:
00000000: 0cb1
And for reference, the rule for hexdump is that without any argument, it is the same as providing -x, which respects the endianness of the system.
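To see why a two-byte word view prints 0cb1, here is a small Python sketch that decodes the same two bytes as a 16-bit integer in both byte orders. It assumes nothing about how hexdump is implemented; it only mimics the decoding step.

```python
import struct
import sys

data = bytes([0xB1, 0x0C])   # the two bytes of the assembled file, in file order

(little,) = struct.unpack("<H", data)  # first byte is the low byte
(big,) = struct.unpack(">H", data)     # first byte is the high byte

print(f"{little:04x}")  # 0cb1, what a little-endian machine shows
print(f"{big:04x}")     # b10c, same digits as the file order
print(sys.byteorder)    # byte order of the machine running this script
```

On a little-endian machine (which is what most desktops are), the first value is the one that matches the plain hexdump output.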
Now, we know that endianness is relevant only for “multi-byte” words, so what is the word size used by xxd and hexdump?
- As per the doc, hexdump -x uses two-byte words in whatever endianness the system is.
- hexdump -C prints byte by byte.
- xxd -e uses 4-byte words! So if you have 4 bytes, xxd -e prints them in the reverse of the order that plain xxd shows. If instead you pass xxd -g2 -e, then the 4 bytes will be treated as two 2-byte words, and each word will be reversed separately.
To illustrate the last point, suppose I have some binary containing the following bytes, b1 0c b9 0c, in that order. Then we have the following:
xxd mytest # this gives: b10c b90c
xxd -e -g2 mytest # this gives: 0cb1 0cb9
xxd -e -g4 mytest # this gives: 0cb90cb1
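The grouping rule can be mimicked in Python. The function below is a rough sketch, not a reimplementation of xxd: the name dump_words and its parameters are hypothetical, and it ignores offsets, the ASCII column, and padding of incomplete groups.

```python
def dump_words(data: bytes, group: int, little_endian: bool) -> str:
    """Format data as hex words of `group` bytes (hypothetical helper)."""
    words = []
    for i in range(0, len(data), group):
        chunk = data[i:i + group]
        if little_endian:
            chunk = chunk[::-1]      # reverse the bytes inside each word
        words.append(chunk.hex())
    return " ".join(words)

data = bytes([0xB1, 0x0C, 0xB9, 0x0C])

print(dump_words(data, 2, little_endian=False))  # b10c b90c  (like xxd)
print(dump_words(data, 2, little_endian=True))   # 0cb1 0cb9  (like xxd -e -g2)
print(dump_words(data, 4, little_endian=True))   # 0cb90cb1   (like xxd -e -g4)
```

The reversal happens inside each word only; the words themselves stay in file order.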
More confusion
So OK, we’ve seen that xxd and hexdump can print things in normal mode or in little endian mode. But then which one is relevant?
To really answer this, we need once again to look at memory. Your file is a sequence of 0s and 1s in memory. Nothing more, nothing less. If all you want to know is that sequence, then endianness has nothing to do with anything. Endianness only matters for decoding a sequence of 0s and 1s.
So if what you want to do is view that sequence of bytes without any decoding, then you have a few options:
- xxd -b test will show you the content of the test file as a series of 0s and 1s.
- xxd test will show the same thing but in hex. You can also type xxd -g1 test to get space-separated bytes.
- hexdump -C test will show the same info as xxd -g1 test.
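The same raw views are easy to reproduce in Python, which makes it clear that no decoding is involved: each byte is printed on its own, in file order. As a simplification, the sketch hard-codes the two bytes from the example rather than reading the file.

```python
data = bytes([0xB1, 0x0C])   # contents of the assembled file

bits = " ".join(f"{b:08b}" for b in data)       # like xxd -b
hex_bytes = " ".join(f"{b:02x}" for b in data)  # like xxd -g1 or hexdump -C

print(bits)       # 10110001 00001100
print(hex_bytes)  # b1 0c
```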
OK, so now when should I care about the little endian versions? Well, for now I haven’t really found a use for them, but that might just be because I haven’t done much low-level stuff, and whatever I have done requires me to know exactly what is in the file, byte by byte. I’ll update this note if I ever have a use case for little endian dumps.