Endians - big and little

Posted on May 9, 2024

Confusing behavior in hex dumpers

Consider the following bit of assembly:

; test.asm
bits 16
mov cl, 12

If you assemble this with nasm:

nasm test.asm

You will get a file that should contain the following binary sequence, as per the 8086 reference manual:

1011000100001100

Explanation: there are 2 bytes here:

  • 1011 0 001: the first 4 bits are the immediate-to-register MOV opcode, the 0 bit after that is the w bit (0 means a byte operation), and the last 3 bits, 001, encode the register cl.
  • 0000 1100: this is just 12 in binary.
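As a sanity check, the two bytes can be built by hand. This is a small Python sketch, assuming the immediate-to-register layout described above (1011, then the w bit, then a 3-bit register field, then the 8-bit immediate). encode_mov_imm8 is a hypothetical helper for illustration, not part of nasm or any other assembler.

```python
# Sketch: build "mov cl, 12" by hand from the 8086 immediate-to-register layout.
OPCODE = 0b1011  # 4-bit immediate-to-register MOV opcode
W = 0            # w = 0: byte (8-bit) operation
REG_CL = 0b001   # register field for cl when w = 0


def encode_mov_imm8(reg: int, imm: int) -> bytes:
    """Hypothetical helper: encode MOV r8, imm8 as two bytes."""
    if not 0 <= imm <= 0xFF:
        raise ValueError("immediate must fit in one byte")
    first = (OPCODE << 4) | (W << 3) | reg  # 1011 w rrr
    return bytes([first, imm])


code = encode_mov_imm8(REG_CL, 12)
print("".join(f"{b:08b}" for b in code))  # 1011000100001100
print(code.hex(" "))                      # b1 0c
```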

In hexadecimal, this would be

b1 0c

That’s because:

  • 1011 is b
  • 0001 is 1
  • 0000 is 0
  • 1100 is c
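The nibble-to-hex-digit mapping is mechanical, so it is easy to check with a few lines of Python (a sketch; nothing here is specific to x86):

```python
bits = "1011000100001100"  # the bit sequence from the 8086 encoding

# Split into 4-bit nibbles: each nibble is exactly one hex digit.
nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
hex_digits = [format(int(n, 2), "x") for n in nibbles]
print(list(zip(nibbles, hex_digits)))
# [('1011', 'b'), ('0001', '1'), ('0000', '0'), ('1100', 'c')]

# Pair the digits back up into bytes (two nibbles per byte).
hex_bytes = ["".join(hex_digits[i:i + 2]) for i in range(0, len(hex_digits), 2)]
print(" ".join(hex_bytes))  # b1 0c
```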

Now here comes the confusing part. If you dump the binary with hexdump test, you get:

0000000 0cb1
0000002

Ignore the offset. What you see, though, is 0cb1 instead of b10c. Hold on to that thought for a moment. If you use hexdump -C test instead, you get:

00000000  b1 0c                                             |..|
00000002
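To make the byte-by-byte behavior of hexdump -C concrete, here is a rough Python approximation. It is only a sketch: canonical_dump is a hypothetical name, and the column padding is simplified rather than matching the real tool exactly. The point is that it walks the bytes in file order and never groups them into words.

```python
def canonical_dump(data: bytes) -> str:
    """Approximate `hexdump -C`: bytes in file order, no word grouping.
    Column spacing is simplified compared with the real tool."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; anything else becomes '.'.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<47}  |{ascii_part}|")
    lines.append(f"{len(data):08x}")  # final line: total length in bytes
    return "\n".join(lines)


print(canonical_dump(bytes([0xB1, 0x0C])))
```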

Interesting: now we do see b1 0c. How about our trusty xxd test?

00000000: b10c                                     ..

As you may have guessed, the issue here is endianness. If you pass the -e option to xxd, you get the output in little endian. That is, xxd -e test gives:

00000000:     0cb1

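For reference, the rule for hexdump is that without any options it behaves like hexdump -x (only the spacing differs): it prints the input as two-byte words in the byte order of the host system. On a little-endian machine such as x86, that swaps the bytes within each pair.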
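The default hexdump behavior can be sketched the same way: read pairs of bytes as 16-bit words in the host's byte order and print each word as four hex digits. This Python sketch assumes input of at most 16 bytes (one output line) and zero-pads an odd trailing byte; word_dump is a hypothetical helper, and sys.byteorder stands in for "the endianness of the system".

```python
import sys


def word_dump(data: bytes, byteorder: str = sys.byteorder) -> str:
    """Approximate one line of plain `hexdump` / `hexdump -x`: 2-byte words
    in the given byte order (the host's by default), 4 hex digits each.
    Assumes at most 16 bytes; spacing is simplified."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad an odd trailing byte with zero
    words = [int.from_bytes(data[i:i + 2], byteorder)
             for i in range(0, len(data), 2)]
    return f"{0:07x} " + " ".join(f"{w:04x}" for w in words)


print(word_dump(b"\xb1\x0c"))         # host order; on x86: 0000000 0cb1
print(word_dump(b"\xb1\x0c", "big"))  # 0000000 b10c
```

Passing "big" explicitly shows what a big-endian host would print, which happens to match the file order.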

Recall that endianness only matters for multi-byte words. So what word size do xxd and hexdump use?

  1. According to the man page, hexdump -x uses two-byte words in the system's byte order, whereas hexdump -C prints the input byte by byte.
  2. xxd -e uses 4-byte words by default! So with 4 bytes, xxd -e prints them in the reverse of the order shown by plain xxd. If you instead pass xxd -e -g2, the 4 bytes are treated as two 2-byte words, and each word is reversed separately.

To illustrate the last point, suppose a file mytest contains the bytes b1 0c b9 0c, in that order. Then we have the following:

xxd mytest           # this gives: b10c b90c
xxd -e -g2 mytest    # this gives: 0cb1 0cb9
xxd -e -g4 mytest    # this gives: 0cb90cb1
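The grouping rule can be sketched in Python too. xxd_groups is a hypothetical helper that only reproduces the hex column: it splits the input into groups of group bytes and, when little_endian is set (like -e), reverses the bytes inside each group. Unlike real xxd -e, which defaults to 4-byte groups, this sketch defaults to 2, and it does not treat an incomplete trailing group specially.

```python
def xxd_groups(data: bytes, group: int = 2, little_endian: bool = False) -> str:
    """Mimic only the hex column of xxd: split into `group`-byte groups;
    with little_endian=True (like -e), reverse the bytes in each group."""
    groups = [data[i:i + group] for i in range(0, len(data), group)]
    if little_endian:
        groups = [g[::-1] for g in groups]
    return " ".join(g.hex() for g in groups)


data = bytes([0xB1, 0x0C, 0xB9, 0x0C])
print(xxd_groups(data))                               # b10c b90c
print(xxd_groups(data, group=2, little_endian=True))  # 0cb1 0cb9
print(xxd_groups(data, group=4, little_endian=True))  # 0cb90cb1
```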

More confusion

So we've seen that xxd and hexdump can print the bytes either in file order or as byte-swapped (little-endian) words. Which one is the relevant view?

To really answer this, we need once again to look at memory. Your file is a sequence of 0s and 1s, nothing more, nothing less. If all you want to know is that sequence, endianness is irrelevant. Endianness only matters when you decode that sequence into multi-byte values, such as 16-bit or 32-bit integers.
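Here is what decoding means in practice: the same two bytes give two different integers depending on which byte is taken as the most significant. A short Python sketch using int.from_bytes:

```python
data = bytes([0xB1, 0x0C])  # the two bytes of the file, in file order

as_big = int.from_bytes(data, "big")        # first byte is most significant
as_little = int.from_bytes(data, "little")  # first byte is least significant
print(hex(as_big), as_big)        # 0xb10c 45324
print(hex(as_little), as_little)  # 0xcb1 3249

# The bytes themselves never change; only the interpretation does.
assert as_little.to_bytes(2, "little") == data
```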

So if you want to view that sequence of bytes without any decoding, you have a few options:

  • xxd -b test will show you the content of the test file as a series of 0s and 1s.
  • xxd test will show the same thing but in hex. You can also type xxd -g1 test to get space-separated bytes.
  • hexdump -C test will show the same info as xxd -g1 test.
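None of these views involve endianness, because nothing is decoded: each byte is simply printed in file order. A Python sketch of the idea (bits_dump and bytes_dump are hypothetical names; in practice data would come from something like open("test", "rb").read(), but a literal is used here to keep the example self-contained):

```python
def bits_dump(data: bytes) -> str:
    """Rough equivalent of the byte columns of `xxd -b`: 8 bits per byte."""
    return " ".join(f"{b:08b}" for b in data)


def bytes_dump(data: bytes) -> str:
    """Rough equivalent of the hex columns of `xxd -g1` or `hexdump -C`."""
    return " ".join(f"{b:02x}" for b in data)


data = bytes([0xB1, 0x0C])  # stands in for the contents of the test file
print(bits_dump(data))   # 10110001 00001100
print(bytes_dump(data))  # b1 0c
```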

Ok, so when should I care about the little-endian versions? So far I haven't found a use for them, but that may just be because I haven't done much low-level work, and whatever I have done required me to know exactly what is in the file, byte by byte. I'll update this note if I ever find a use case for little-endian dumps.