12

It is my understanding that in /bin we have some binary executables that are just compiled C programs. Out of curiosity, I decided to play with them: I opened ls with sudo privileges using nano and added the character 0 at the beginning. I saved the file and, to my amusement, confirmed that the ls command indeed no longer works in a terminal.

What I did not expect is that after deleting the character I had added, the ls command still does not work and segfaults.

If the files contained in /bin are nothing but machine code, ones and zeros, why does deleting the character we previously added not restore a working program?

  • 2
Consider that compiled binaries are binary data and not human-editable. By editing compiled binaries in this way, you break the byte-code's readability. That in turn breaks the executable. – Thomas Ward Nov 19 '24 at 18:30
  • 5
    It all depends on the editor you are using. – U. Windl Nov 20 '24 at 09:21
  • 2
    @ThomasWard pedantically, we would not normally use the term "byte code" to refer to machine code (native code); the term "byte code" (or "bytecode") typically refers to machine-independent binary code which is interpreted or JIT-compiled to machine code by a language's runtime - think Java class files, Python .pyc, etc. – nneonneo Nov 21 '24 at 14:48
  • 1
    Nitpick: while many are written in C, some may use C++ or any other compiled language (or a combination). It's even possible to have shell scripts or other scripts run using another interpreter (via the #! first line). – jcaron Nov 22 '24 at 10:41
  • Chances are the editor did more than just add a character; it may have added UTF info to the file, assuming it was supposed to be a text file but was lacking it. – DanW58 Dec 08 '24 at 20:54

3 Answers

27

Editing a binary file with an editor meant for text may or may not work - in practice it usually won't. The editor might make various changes, like fixing line endings (e.g., if a sequence of bytes in the file contained \r\n, the editor might "fix" that to \n, or fix occurrences of just \n to \r\n), adding a trailing newline if the file didn't end with one, etc. There's a high likelihood that the ls binary got corrupted by one of these changes. You can compare the output of od -c or hexdump on the original and modified files to see exactly what changed. And consider using a hex editor in the future (Please recommend a hex editor for shell) for editing non-text files.
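The byte-level comparison suggested above can be sketched like this. It is a minimal illustration using a throwaway file rather than the real /bin/ls, with tr standing in for the editor's line-ending "fix":

```shell
# Simulate a text editor's CR -> LF conversion on a small binary-ish
# file, then list exactly which bytes changed.
printf 'ELF\r\nbinary\rbytes' > /tmp/orig.bin
tr '\r' '\n' < /tmp/orig.bin > /tmp/mangled.bin

# cmp -l prints one line per differing byte:
# 1-based offset, old value, new value (values in octal: 015 = CR, 012 = LF)
cmp -l /tmp/orig.bin /tmp/mangled.bin
```

Only two bytes differ here, but in a real executable every stray 0x0D gets rewritten, and each one may sit inside an instruction or an address.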

Raffa
  • 35,113
muru
  • 207,970
  • 4
Notably, CR+LF issues are not the only probable cause; e.g., TAB and SPACE characters may also get "optimized", depending on the text editor's settings. – Hannu Nov 19 '24 at 17:34
  • 2
    Also, text editors tend to dislike NUL (0x00) characters – Nayuki Nov 20 '24 at 02:06
  • @Nayuki Poorly written ones do. Good text editors have no issues whatsoever because they don’t rely on in-band signaling to indicate the end of a string like C does by default, instead using data structures that make the size of the data explicit in some way (for example, fixed length strings in a rope). – Austin Hemmelgarn Nov 20 '24 at 02:34
  • 6
    Yet another likely effect is normalizing any invalid utf-8 characters, assuming the editor is running under UTF-8 locale. – jpa Nov 20 '24 at 08:34
  • The original Windows Write program had an option to open files with "No Conversion" and could be used to edit binaries. In an idle moment at work, I edited a copy of command.com (DOS 5.0, I think) on a boot floppy disk. A couple of our interns used a copy of that disk to try to troubleshoot the computer of a user who had opened a help desk ticket. They were greatly concerned when the message "Bad command or your mama" appeared. I had to explain. Write could also be used to edit Notepad.exe to display .bat files by default instead of .txt files. – Wastrel Nov 20 '24 at 15:09
  • 1
The term is "binary-safe". Vim, for example, is binary-safe. Qemacs is binary-safe. – Marc Wilson Nov 20 '24 at 16:07
  • line wrapping might also be an issue – ilkkachu Nov 22 '24 at 18:10
23

Repeating your experiment and then comparing with the original using binwalk shows that the cause is indeed the conversion of the 0x0D (CR) character to 0x0A (LF):

$ binwalk -Wi test /bin/ls

OFFSET        test                                                                 /bin/ls

0x00000190    02 00 00 00 06 00 00 00 58 0D 02 00 00 00 00 00 |........X.......| \ 02 00 00 00 06 00 00 00 58 0A 02 00 00 00 00 00 |........X.......|
0x000015B0    01 00 0D 00 03 00 03 00 03 00 03 00 03 00 03 00 |................| / 01 00 0A 00 03 00 03 00 03 00 03 00 03 00 03 00 |................|
0x00001660    43 05 00 00 10 00 00 00 94 91 96 06 00 00 0D 00 |C...............| \ 43 05 00 00 10 00 00 00 94 91 96 06 00 00 0A 00 |C...............|

    Likely one of these bytes was part of a memory address, or of an instruction involved in computing an address. An invalid memory address then results in the segmentation fault.

    When you open the file, nano reports this conversion as (Converted from Mac format).

    You can disable this conversion and the addition of a trailing newline with the command-line options nano -LN. With these options the saved result is identical to the original, but editing binary files this way is still prone to corruption.
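    To see why a single converted byte can break the program, note that the bytes 58 0D 02 00 at offset 0x190 read as the little-endian value 0x00020D58, and after the conversion they read as 0x00020A58. Treating that field as an address is an assumption (the binwalk output alone doesn't prove it), but a quick shell-arithmetic check shows how far off it would then point:

```shell
# 58 0D 02 00 little-endian = 0x00020D58; after nano's CR -> LF
# conversion it becomes 58 0A 02 00 = 0x00020A58.
orig=$(( 0x00020D58 ))
mangled=$(( 0x00020A58 ))

# The value now refers to a location 0x300 bytes earlier than intended:
echo $(( orig - mangled ))   # 768
```

    A pointer that misses its target by 768 bytes will land in unrelated data or unmapped memory, which is exactly the kind of thing that produces a segfault.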

jpa
  • 1,595
  • 3
    "Converted from Mac format" is an interesting historical artifact. Mac switched to newlines like other *nix systems with the introduction of OSX, 23 years ago. – Kevin Nov 20 '24 at 20:40
  • What is the binwalk command you're using? I tried https://github.com/ReFirmLabs/binwalk but it doesn't support a -W option. In the past I've used cmp -l for quickly looking at differences in binary files, but it's terrible (one line per difference, and in octal) – Peter Cordes Nov 21 '24 at 20:05
  • @PeterCordes Debian binwalk package, version 2.3.3+dfsg1-2. I guess the rust reimplementation is not yet feature-complete. – jpa Nov 22 '24 at 08:08
  • @Kevin True.  But how long were Macs using CRs, though?  (And how else should CR-based text be referred to, given that LF-based has been called ‘Unix format’ and similar for half a century?) – gidds Nov 22 '24 at 23:00
0

When doing that, you basically change the file type. To demonstrate, a quick experiment:

cd /bin
sudo cp ls ls_copy  # make sure not to play around with binaries needed by the system
file ls_copy

The last command will generate output similar to the following:

ls_copy: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=36b86f957a1be53733633d184c3a3354f3fc7b12, for GNU/Linux 3.2.0, stripped

Now open ls_copy with Nano using root privileges:

sudo nano ls_copy

What happens here is that Nano opens the binary file ls_copy in text mode. You will see characters, most of which don't seem to make any sense. Add a character like 0 at the beginning or at any other position and save the file by pressing Ctrl+O to write it out. Now run

file ls_copy

again and you will get an output similar to

ls_copy: data

So obviously, ls_copy is no longer recognized as an executable binary; file now classifies it as generic data. Running

ls_copy

will result in an error like

-bash: /usr/bin/ls_copy: cannot execute binary file: Exec format error

due to that.

To edit a binary file, you need to use a binary (hex) editor, not a text editor.
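As a byte-safe alternative, dd can overwrite a single byte in place without any of a text editor's conversions. This is a minimal sketch (it assumes GNU coreutils for dd's status=none, and works on a /tmp copy rather than the real binary):

```shell
# Patch one byte in a copy of ls without any text-mode conversions.
cp /bin/ls /tmp/ls_copy

# Overwrite the first byte (the 0x7f of the ELF magic) with '0':
printf '0' | dd of=/tmp/ls_copy bs=1 seek=0 count=1 conv=notrunc status=none

# Put the original byte back (0x7f = octal 177):
printf '\177' | dd of=/tmp/ls_copy bs=1 seek=0 count=1 conv=notrunc status=none

# Unlike the nano round trip, this restores the file exactly:
cmp -s /tmp/ls_copy /bin/ls && echo "identical to the original"
```

conv=notrunc is the important part: without it, dd would truncate the file after the patched byte.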

noisefloor
  • 1,803