16

I have two external disks that have the same files. One is encrypted, the other is not. The encrypted one has a lot less free space than the unencrypted one. I now assume that this is because of hard links on the unencrypted disk.

So I would like to check whether there are any hard-linked files that might be duplicated on the encrypted disk. How can I identify a hard link?

If you have any other ideas about what could cause the free space issue, I'm open to them. Is it possible that the files need more space because of the encryption?

Jeno
  • Please edit your original question, when the two external disks are connected and their partitions mounted, in order to show the output of the following commands for the two file systems (the encrypted one and the un-encrypted one), df ; sudo lsblk -f ; sudo lsblk -m ; Indent each line 4 spaces to render the output as code. – sudodus Nov 03 '17 at 06:40
  • There are several reasons this question needs clarification. First and foremost, having two links to one file does not affect the amount of space consumed. You can have hundreds of (hard or soft) links to a 10GB file (for example) and the space consumed will be 10GB total (plus a little overhead for the additional directory entries). Please indicate how you arrive at the amount of space used and provide details that show why you feel this may be a concern. – Dennis Aug 14 '24 at 19:11

4 Answers

32
$ find -type f -links +1

That will show all regular files that have more than one link (name) to them. It will not tell you which names are linked to the same file. For that, you could use -samefile or -inum, e.g. find -samefile "$somefile"

In the technical sense, all files (file names) are (hard) links. It's just that files with more than one link pointing to them are the interesting ones here. But even in those cases, there's no way to say that one of them is the "proper" file and the other a link: the links are equal.

As an example:

$ touch a b c
$ ln b b2 ; ln c c2
$ find -type f -links +1
./c2
./b
./b2
./c
$ find -samefile b
./b
./b2
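The point that all names of a file are equal can be checked with stat. This is only a minimal sketch: it assumes GNU stat (the -c format option), and the scratch directory and file names are made up for the demonstration.

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a b
ln b b2

# GNU stat format: %h = number of hard links, %i = inode number, %n = name.
# b and b2 report the same inode and a link count of 2; a reports 1.
stat -c '%h %i %n' a b b2

links_b=$(stat -c %h b)
inode_b=$(stat -c %i b)
inode_b2=$(stat -c %i b2)
```

Neither b nor b2 is marked as "the original"; both are just names for the same inode.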
ilkkachu
  • +1 This is a good answer :-) I will 'borrow from it' to my answer. – sudodus Nov 02 '17 at 19:22
  • 1
    -links +1 is a GNU extension. For better portability (POSIX compliance), use the equivalent \! -links 1 as in: find . -type f \! -links 1. Also, -samefile is likewise a GNU extension for which there is no simple POSIX equivalent (at least not within find). – Wildcard Nov 02 '17 at 22:26
  • @Wildcard, yep. Though I've yet to see an Ubuntu system with a non-GNU userspace. – ilkkachu Nov 03 '17 at 23:26
  • Ah, that is true. I frequent Unix & Linux SE and tend to forget that portability isn't as much of a concern over here. :) – Wildcard Nov 04 '17 at 01:09
  • find . -type f -links +1 -exec ls -i {} \; | sort will find all hard linked files, then list the inode of the file, and sort them in order. This will group all the same files together. – smilingfrog Aug 19 '24 at 22:15
7

Search for hard links

@ilkkachu's and @barrycarter's answers are good. This answer is an alternative that describes some results in more detail.

  • If the linked files are in the same directory tree, you will find them directly.

  • Otherwise you can search in the whole file system from the mount point, but only within the same file system using -xdev, which is important if you search the root partition / and there are other mounted partitions.

    $ sudo find / -xdev -type f -links +1 -ls | sort -n > hard-links-in-root.txt
    

The following is an example, where one hard linked pair is found in the current directory, and two hard linked matches are found in another directory by searching from the mount point /media/multimed-2 of the data partition.

$ sudo find . -xdev -type f -links +1 -ls | sort -n
  5242881    648 -rw-rw-r--   2 olle     nio        657936 jun 30  2015 ./like-this.png
  5242882    940 -rw-rw-r--   2 olle     nio        957688 jun 30  2015 ./from-here.png
 14843905   1620 -rw-r--r--   2 olle     nio       1652803 jun 30  2015 ./img_4810.jpg
 14843905   1620 -rw-r--r--   2 olle     nio       1652803 jun 30  2015 ./mid-sommer-night_4810.jpg

$ find /media/multimed-2/ -samefile ./like-this.png
/media/multimed-2/Photos/2015/06/30/like-this.png
/media/multimed-2/Bilder/kartor/like-this.png

$ find /media/multimed-2/ -samefile ./from-here.png
/media/multimed-2/Photos/2015/06/30/from-here.png
/media/multimed-2/Bilder/kartor/from-here.png
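To gauge whether hard links could explain a difference in used space, one option is to total the bytes that would be added if every extra name became a separate copy. The following is a rough sketch, not a definitive measurement. It assumes GNU find (-printf), uses a made-up scratch tree, counts apparent file sizes rather than allocated blocks, and treats every link as if it were inside the searched tree, so the result is an upper bound.

```shell
# Scratch tree: one 1 MiB file with three names, plus one ordinary file.
demo=$(mktemp -d)
cd "$demo" || exit 1
head -c 1048576 /dev/zero > big
ln big big-link1
ln big big-link2
head -c 4096 /dev/zero > single

# find prints one line per name: inode, link count, size in bytes.
# sort -u collapses the identical lines of one inode into a single line,
# then awk adds (links - 1) * size for each multiply-linked inode.
extra=$(find . -xdev -type f -links +1 -printf '%i %n %s\n' \
        | sort -u \
        | awk '{ extra += ($2 - 1) * $3 } END { printf "%.0f\n", extra }')
echo "Extra bytes if every link became a separate copy: $extra"
```

In the scratch tree the 1 MiB file has two extra names, so the sketch reports 2097152 bytes. On a real disk you would run only the find pipeline, starting from the mount point.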

Other reasons why different amounts of drive space are used

  • Different file systems (ext4, NTFS, FAT32 ...)

  • Different partition size, which causes differences in the overhead (meta-data).

  • Different sector size on the drive (maybe?)

sudodus
4

In theory, hard links should be indistinguishable from regular files (that's sort of the point). If "x" is a hardlink to "y", then "y" is also a hardlink to "x". That being said, the second column of ls -l tells you how many links there are to a given file. If this number is bigger than 1, the file is or has a hardlink somewhere. This may not work for directories, but I'm not sure why. I initially said each file in a directory has a link to that directory, but I was wrong: I found a directory with 10 files whose "link count" was only 2.

Once you've found the hard link, you can do ls -i to see its inode, and then use find's inode option to find other file(s) with the same inode (thus making them hard links to each other). Be sure to restrict find to a specific device, otherwise you may get spurious results.
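As a minimal sketch of that lookup (the directory and file names are invented for the demonstration), the inode number from ls -i can be passed to find's -inum test, with -xdev keeping the search on one file system:

```shell
# Scratch tree with one file that has two names in different directories.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p photos maps
echo data > photos/pic.png
ln photos/pic.png maps/pic-copy.png

# ls -i prints "<inode> <name>"; keep only the inode number.
inode=$(ls -i photos/pic.png | awk '{ print $1 }')

# Inode numbers are only unique within one file system, hence -xdev.
matches=$(find . -xdev -inum "$inode" | sort)
echo "$matches"
```

Both names are listed, because they refer to the same inode.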

To find all hard links at once, have find spit out inodes for all files on a device, and then use things like sort and uniq to find duplicates.
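One way this could look is sketched below. It assumes GNU find (-printf) and uses awk in place of uniq to pick out repeated inode numbers, since the inode field has no fixed width. The scratch files are made up, and file names containing newlines would break the line-based processing.

```shell
# Scratch directory: a has one name, b and c have two names each.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a b c
ln b b2
ln c c2

# Print "inode path" for every regular file and sort by inode number,
# then keep only the lines whose inode number occurs more than once.
dupes=$(find . -xdev -type f -printf '%i %p\n' \
        | sort -n \
        | awk '{ count[$1]++; line[NR] = $0; ino[NR] = $1 }
               END { for (i = 1; i <= NR; i++)
                         if (count[ino[i]] > 1) print line[i] }')
echo "$dupes"
```

The output groups names that share an inode next to each other. Note that this only catches links whose names all lie inside the searched tree, unlike the -links +1 test, which looks at the link count stored in the inode.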

  • 2
    This may not work for directories, but I'm not sure why. I initially said each file in a directory has a link to that directory, but I was wrong: I found a directory with 10 files whose "link count" was only 2. The parent has a link to the directory, the directory itself has a . to it, and each child directory has a .. to it. – tkausl Nov 02 '17 at 16:02
  • 3
    Technically, a hardlink is just an association of a filename to some data. Every normal file (i.e. not symlinks, devices, etc.) is a hardlink. So I'd complain that "If this number is bigger than 1, the file is or has a hardlink somewhere" isn't quite accurate; I'd rather see it say "...the file has another hardlink somewhere" or something like that. This is a minor point though. – David Z Nov 02 '17 at 20:37
  • @DavidZ It's minor, but relevant, because people easily lose sight of that fact. What the OP really wants is a list of all files that aren't directories and have 2 or more link count. – Monty Harder Nov 03 '17 at 15:07
3

You could do something like this:

find . -type f -ls | grep -v " 1 username"

This will list the regular files under the current directory in ls -l style. As @barrycarter said, hard links are indistinguishable from real files, but in this listing they will show up as having more than one link. Using grep -v you weed out the files that have only one link. (The username in the grep command is there to make grep look in the right place for the single 1. Replace it with your own username.)

Jos
  • 1
    Actually, regular ls -l shows this too, and ls -l |perl -anle 'print $F[1]' is a more generic solution (you can also use cut or something). I refer to this column as "the column everyone ignores" :) I was surprised to learn that some packages/software create hard links-- I thought my drive was hard link free, but apparently not. –  Nov 02 '17 at 13:48
  • A directory that is "hard link free" contains no files. Hard link = file = hard link. How or why people differentiate between these two names for the same thing is baffling. It's like saying, "I have cows but no cattle." Simply put: a hard link is a name for a file. It may have other names, and those names may be in other directories. – Dennis Aug 14 '24 at 18:49