How To Find Large Files on Linux

Jun 12, 2022
I was wondering what the best way would be to list the densest folders in terms of disk usage. I once had a situation where an application was (wrongly) creating thousands of small files, and it was hard to find out which folder was eating the disk space.
 

domih

Reputable
Jan 31, 2020

Suggestion
  • cd to a directory you have access to.
  • run:
find . -maxdepth 1 -type d -print0 | xargs -0 -I {} sh -c 'echo $(find "{}" -printf "\n" | wc -l) "{}"' | sort -nr | head -n 10

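This returns the top 10 folders with the most files (not necessarily the biggest folders in size). Each count includes subdirectories as well as files, and the line for "." is the total for the whole tree.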

Example

domih@trx:~$ find . -maxdepth 1 -type d -print0 | xargs -0 -I {} sh -c 'echo $(find "{}" -printf "\n" | wc -l) "{}"' | sort -nr | head -n 10
257681 .
86846 ./.cache
26093 ./.mozilla
19392 ./.config
18697 ./.phoronix-test-suite
15151 ./.gradle
13930 ./clion-2021.3.4
13289 ./.local
10035 ./.cargo
5962 ./.wine

Ref: https://stackoverflow.com/questions/9157138/recursively-counting-files-in-a-linux-directory
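
A variant of the same pipeline, as a sketch only: it hands each directory name to sh as a positional argument instead of splicing it into the command string, so names containing quotes or $ should not break it, and it counts regular files only. The function name count_files_per_dir is made up for this example, and GNU find/xargs are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): rank the immediate
# subdirectories of a directory by the number of regular files they contain,
# counted recursively. Assumes GNU find and GNU xargs.
#   $1 = directory to inspect (default: current directory)
#   $2 = how many lines to show (default: 10)
count_files_per_dir() {
    local root="${1:-.}"
    local limit="${2:-10}"
    find "$root" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -r -I {} sh -c \
            'printf "%s %s\n" "$(find "$1" -type f -printf "\n" | wc -l)" "$1"' _ {} |
        sort -nr |
        head -n "$limit"
}

# Usage: count_files_per_dir ~ 10
```

Unlike the one-liner above, this skips the "." total line and does not count directories, so the numbers will be somewhat lower for the same tree.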

Otherwise, to browse a file system manually, use ncdu, which is more useful than the GUI-based Disk Usage Analyzer (baobab):

# Install ncdu (Debian/Ubuntu shown; use your distribution's package manager otherwise)
sudo apt install ncdu
# Go to the directory you want to analyze, or pass its path as a parameter
ncdu ~
# See the ncdu man page for all options and key bindings.

Ref: comment in https://www.tomshardware.com/how-to/check-disk-usage-linux
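
If ncdu is not installed, or you need something scriptable, a rough non-interactive equivalent can be sketched with du. This is only a sketch: the function name top_dirs_by_size is made up, GNU du is assumed for --max-depth, and sizes are printed in KiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): list a directory and its
# immediate subdirectories by disk usage in KiB, largest first.
# Assumes GNU du. -x keeps du on one file system; permission errors are hidden.
#   $1 = directory to inspect (default: current directory)
#   $2 = how many lines to show (default: 10)
top_dirs_by_size() {
    du -x -k --max-depth=1 -- "${1:-.}" 2>/dev/null |
        sort -nr |
        head -n "${2:-10}"
}

# Usage: top_dirs_by_size ~ 10
```

The line for the starting directory itself is the total, so it normally comes first. This ranks by size rather than by file count, so a folder full of tiny files can still rank low here.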
 
Jun 12, 2022

One thing that I tend to do in these circumstances -- which I don't have time to flesh out right now -- is to take a snapshot of whatever it is I'm looking at (in your case, directories), then take another look, say, 20 seconds later, and see which one has the biggest delta -- e.g., in your case, the most files created in that interval.
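
A minimal sketch of that snapshot idea, under some assumptions: the helper names snapshot_counts and growth_between are made up, GNU find/xargs are assumed, and directory names must not contain tabs or newlines because the snapshots are stored as tab-separated text.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original post) for the snapshot-and-compare idea.

# Print "<dir><TAB><regular file count>" for each immediate subdirectory of $1,
# sorted by directory name so that two snapshots can be joined later.
snapshot_counts() {
    local tab
    tab=$(printf '\t')
    find "${1:-.}" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -r -I {} sh -c \
            'printf "%s\t%s\n" "$1" "$(find "$1" -type f -printf "\n" | wc -l)"' _ {} |
        LC_ALL=C sort -t "$tab" -k1,1
}

# Compare two snapshot files and print "<growth><TAB><dir>", largest growth first.
# Directories present in only one snapshot are counted as 0 in the other.
growth_between() {
    local tab
    tab=$(printf '\t')
    LC_ALL=C join -t "$tab" -a 1 -a 2 -e 0 -o 0,1.2,2.2 "$1" "$2" |
        awk -F '\t' '{ printf "%d\t%s\n", $3 - $2, $1 }' |
        sort -nr
}

# Usage, with the 20-second interval mentioned above:
#   snapshot_counts /var > before.tsv
#   sleep 20
#   snapshot_counts /var > after.tsv
#   growth_between before.tsv after.tsv | head
```

The directory at the top of the output is the one that gained the most files during the interval, which is usually the one the misbehaving application is writing to.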