Endians - big and little
Confusing behavior in hex dumpers
Consider the following bit of assembly:
; test.asm
bits 16
mov cl, 12
If you assemble this with nasm:
nasm test.asm
You will get a file that should contain the following binary sequence, as per the 8086 reference manual:
1011000100001100
Explanation: there are 2 bytes here:
- 1011 0 001: the first 4 bits are the immediate-to-register opcode, the 0 bit after that is the w bit (0 selects an 8-bit operand), and the 001 bits after that encode the register cl.
- 0000 1100: this is just 12 in binary.
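As a quick check of the bit layout above, here is a minimal Python sketch that packs the three fields (the 1011 opcode, the w bit and the 3-bit register code) into the first byte and appends the 8-bit immediate. The helper name encode_mov_imm8 is hypothetical, and it only handles the 8-bit (w = 0) form described here.

```python
def encode_mov_imm8(reg_code: int, imm: int) -> bytes:
    """Hypothetical helper: encode 8086 'mov reg8, imm8' (immediate to register, w = 0)."""
    opcode = 0b1011  # immediate-to-register opcode, occupies the top 4 bits
    w = 0            # w = 0 selects an 8-bit register
    if not 0 <= reg_code <= 0b111:
        raise ValueError("register code must fit in 3 bits")
    if not 0 <= imm <= 0xFF:
        raise ValueError("immediate must fit in one byte")
    first = (opcode << 4) | (w << 3) | reg_code
    return bytes([first, imm])


CL = 0b001  # register code for cl when w = 0
encoded = encode_mov_imm8(CL, 12)
print(" ".join(f"{b:08b}" for b in encoded))  # 10110001 00001100
print(encoded.hex(" "))                        # b1 0c
```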
In hexadecimal, this would be
b1 0c
That’s because:
- 1011 is b
- 0001 is 1
- 0000 is 0
- 1100 is c
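The nibble-by-nibble conversion above can be sketched in a few lines of Python: split the 16-bit string into groups of 4 bits, turn each group into one hex digit, then pair the digits back up into bytes.

```python
bits = "1011000100001100"  # the 16-bit sequence from the 8086 encoding

# Split into 4-bit nibbles and map each nibble to a single hex digit.
nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
digits = [format(int(n, 2), "x") for n in nibbles]

for n, d in zip(nibbles, digits):
    print(n, "is", d)

# Two nibbles make one byte, so pair the digits back up.
hex_bytes = ["".join(digits[i:i + 2]) for i in range(0, len(digits), 2)]
print(" ".join(hex_bytes))  # b1 0c
```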
OK, now here comes the first confusing part. If you dump the binary with hexdump test,
you get:
0000000 0cb1
0000002
Ignore the offset. What you should notice is that you get 0cb1 instead of b10c. Hold on
to that thought for a moment. If you use hexdump -C test instead, you get:
00000000 b1 0c |..|
00000002
Interesting: now we do see b1 0c. How about our trusty xxd test?
00000000: b10c ..
As you may have guessed, the issue here is one of endianness. If you pass the -e option
to xxd, you get the output in little endian. That is, xxd -e test gives:
00000000: 0cb1
And for reference, hexdump without any format option behaves essentially like
hexdump -x (only the spacing differs), which reads two-byte words in the endianness of the system.
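To see where the 0cb1 in the default hexdump output comes from, here is a minimal Python emulation (not hexdump itself) that decodes the file as 2-byte words in the host's native byte order, which is what the docs describe for -x. The helper name native_two_byte_words is hypothetical, and padding an odd trailing byte with zero is an assumption of this sketch, not documented hexdump behavior.

```python
import struct
import sys


def native_two_byte_words(data: bytes):
    """Hypothetical helper mimicking hexdump -x: decode 2-byte words in host byte order."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad an odd trailing byte with zero
    count = len(data) // 2
    # '=' means native byte order with standard sizes, so 'H' is exactly 2 bytes.
    words = struct.unpack(f"={count}H", data)
    return [f"{w:04x}" for w in words]


data = bytes([0xB1, 0x0C])  # contents of the assembled file
print(sys.byteorder)                # little on x86
print(native_two_byte_words(data))  # ['0cb1'] on a little-endian host
```

On a big-endian host the same call would return ['b10c'], which is exactly why the default hexdump output depends on the machine you run it on.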
Now, we know that endianness is relevant only for multi-byte words, so what word size
do xxd and hexdump use?
As per the docs:
- hexdump -x uses two-byte words in whatever endianness the system uses.
- hexdump -C prints byte by byte.
- xxd -e uses 4-byte words by default! So if you have 4 bytes, xxd -e (equivalently xxd -e -g4) reverses all 4 of them, while plain xxd prints them in file order. If instead you pass xxd -g2 -e, the 4 bytes are treated as two 2-byte words, and each word is reversed separately.
To illustrate the last point, suppose I have some binary containing the following
bytes b1 0c b9 0c in that order. Then we have the following:
xxd mytest # this gives: b10c b90c
xxd -e -g2 mytest # this gives: 0cb1 0cb9
xxd -e -g4 mytest # this gives: 0cb90cb1
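The grouping rule can be mimicked with a short Python sketch: plain xxd keeps each group in file order, while xxd -e reverses the bytes inside each group. Both helper names are hypothetical, and a partial trailing group is simply reversed as-is, which may not match xxd's exact layout for such groups.

```python
def xxd_plain_groups(data: bytes, group_size: int = 2):
    """Hypothetical helper mimicking plain xxd (default -g2): bytes stay in file order."""
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    return [g.hex() for g in groups]


def xxd_e_groups(data: bytes, group_size: int = 4):
    """Hypothetical helper mimicking xxd -e -g<group_size>: reverse the bytes in each group."""
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    return [g[::-1].hex() for g in groups]


data = bytes.fromhex("b10cb90c")  # the bytes b1 0c b9 0c, in that order
print(xxd_plain_groups(data))  # ['b10c', 'b90c']
print(xxd_e_groups(data, 2))   # ['0cb1', '0cb9']
print(xxd_e_groups(data))      # ['0cb90cb1'] (default group size 4, like xxd -e)
```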
More confusion
So, we've seen that xxd and hexdump can print things either in file order or in little-endian
mode. But then which one is relevant?
To really answer this, we need once again to look at memory. Your file is a sequence of 0s and 1s in memory. Nothing more, nothing less. If all you want to know is that sequence, then endianness has nothing to do with anything. Endianness only matters when you decode that sequence into multi-byte values, for example when you read two bytes as one 16-bit number.
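A small Python illustration of that point: the two bytes of our file never change, and only the byte order we pick when decoding them into a 16-bit number does. Little-endian decoding is what the default hexdump output shows on an x86 machine.

```python
data = bytes([0xB1, 0x0C])  # the two bytes of the file, in file order

# The bytes never change; only how we choose to decode them does.
as_big = int.from_bytes(data, byteorder="big")
as_little = int.from_bytes(data, byteorder="little")

print(data.hex())          # b10c  (raw bytes, no decoding)
print(f"{as_big:04x}")     # b10c  (first byte is the most significant)
print(f"{as_little:04x}")  # 0cb1  (first byte is the least significant)
print(as_big, as_little)   # 45324 3249
```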
So if what you want to do is view that sequence of bytes without any decoding, then you have a few options:
- xxd -b test will show you the content of the test file as a series of 0s and 1s.
- xxd test will show the same thing, but in hex. You can also type xxd -g1 test to get space-separated bytes.
- hexdump -C test will show the same info as xxd -g1 test.
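For comparison, here is a minimal Python sketch that produces the same two undecoded views by reading the file byte by byte. It writes a stand-in file named test into a temporary directory (an assumption so the snippet is self-contained) and leaves out the offsets and ASCII column that xxd and hexdump also print.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test"
    path.write_bytes(bytes([0xB1, 0x0C]))  # stand-in for the file produced by nasm
    raw = path.read_bytes()  # raw bytes in file order, no decoding involved

binary_view = " ".join(f"{b:08b}" for b in raw)  # similar to xxd -b, without offsets
hex_view = raw.hex(" ")                          # similar to xxd -g1 or hexdump -C
print(binary_view)  # 10110001 00001100
print(hex_view)     # b1 0c
```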
OK, so when should I care about the little-endian views? For now I haven’t really found a use for them, but that might just be because I haven’t done much low-level work, and whatever I have done required me to know exactly what is in the file, byte by byte. I’ll update this note if I ever find a use case for little-endian dumps.