Findlargedir is a tool that quickly identifies “black hole” directories on any filesystem, meaning directories that hold more than 100k entries in a single flat structure.
When a directory has many entries (files or subdirectories), listing it becomes progressively slower, which degrades the performance of every process that tries to list it. Processes reading large directory inodes block while doing so and spend increasingly long periods in uninterruptible sleep (“D” state). Depending on the filesystem, the slowdown may become visible at around 100k entries and very noticeable at 1M+ entries.
Such directories do not shrink back even after their contents are cleaned up, because most Linux and Un*x filesystems do not support shrinking directory inodes. Common culprits are forgotten web session directories (for example, a PHP sessions folder whose GC interval was set to several days), various cache folders (CMS compiled templates and caches), and POSIX filesystems used to emulate object storage.
The program identifies such directories and reports them based on a per-filesystem calibration, i.e. an estimate of how many directory entries are packed into a directory inode of a given size. During calibration it determines the ratio between directory inode growth and the number of entries, then uses that ratio to scan the filesystem quickly while avoiding expensive, slow directory lookups.
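The calibration idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not findlargedir's own implementation: the helper names `calibrate` and `estimate_entries`, the 10,000-file calibration count, and the use of a directory's `st_size` as the measure of its inode size are hypothetical choices made for this example. Like the real tool, the sketch needs write access, because it creates files in a scratch directory on the filesystem being calibrated.

```python
import os
import tempfile

# Hypothetical sketch: names and the calibration count are illustrative only.
CALIBRATION_ENTRIES = 10_000  # assumed number of test files to create


def calibrate(base_dir: str, entries: int = CALIBRATION_ENTRIES) -> float:
    """Return estimated directory entries per byte of directory st_size
    for the filesystem that holds base_dir (requires write access)."""
    # The scratch directory lives on the same filesystem as base_dir
    # and is removed, with its files, when the block exits.
    with tempfile.TemporaryDirectory(dir=base_dir) as scratch:
        for i in range(entries):
            # Empty files suffice: only the directory entries matter here.
            open(os.path.join(scratch, f"calib_{i:06d}"), "w").close()
        # Measure the directory itself before it is cleaned up.
        size = os.lstat(scratch).st_size
    if size <= 0:
        raise RuntimeError("directory st_size does not grow on this filesystem")
    return entries / size


def estimate_entries(dir_size: int, ratio: float) -> int:
    """Estimate a directory's entry count from its st_size alone."""
    return int(dir_size * ratio)
```

A caller would run `ratio = calibrate("/var/tmp")` once per filesystem and then call `estimate_entries(os.lstat(path).st_size, ratio)` for each directory, which needs a single `lstat()` instead of a full listing. Calibrating per filesystem matters because `st_size` means different things on different filesystems; on ext4, for instance, directories grow in whole-block increments.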
Many tools scan the filesystem (find, du, ncdu, etc.), but none of them use heuristics to avoid expensive lookups, because they are designed to be fully accurate. This tool, by contrast, relies on heuristics so it can alert on problems without getting stuck in problematic folders.
Findlargedir does not follow symlinks. It requires read/write permissions on a calibration directory, where it measures the ratio of directory inode size to number of entries; it then uses that ratio to estimate the number of entries in a directory without actually counting them. Although this method only approximates the real entry count, it is accurate enough to find offending directories quickly.
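A scan that applies such a ratio could look like the sketch below. It is again hypothetical: `scan` and `ALERT_THRESHOLD` are illustrative names, and `ratio` is assumed to be entries per byte of directory `st_size` produced by some earlier calibration step. Symlinks are never followed, because `os.lstat()` and `DirEntry.is_dir(follow_symlinks=False)` both inspect the link itself rather than its target.

```python
import os
import stat
from typing import Iterator, Tuple

# Hypothetical sketch; the threshold mirrors the 100k-entry level above.
ALERT_THRESHOLD = 100_000


def scan(root: str, ratio: float,
         threshold: int = ALERT_THRESHOLD) -> Iterator[Tuple[str, int]]:
    """Yield (path, estimated_entries) for every directory whose estimated
    entry count reaches threshold, without listing flagged directories."""
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            st = os.lstat(path)  # does not follow symlinks
        except OSError:
            continue  # vanished or inaccessible; skip it
        if not stat.S_ISDIR(st.st_mode):
            continue  # a symlinked root is reported as a link and skipped
        # Estimate the entry count from the inode size alone.
        estimate = int(st.st_size * ratio)
        if estimate >= threshold:
            yield path, estimate
            continue  # do not pay for listing a suspected black hole
        try:
            with os.scandir(path) as it:
                for entry in it:
                    # Only real subdirectories are queued; symlinks are not.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            continue  # unreadable directory; skip it
```

Not descending into a flagged directory keeps the scan fast, since the huge directory is never listed. The trade-off is that oversized subdirectories nested inside it go unreported until the parent is dealt with.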
The tool is available for free on GitHub.