Name

fishfood - Calculates file-size frequency-distribution.

Synopsis

fishfood [OPTIONS] [File-path ...]

Description

Counts the number of the specified files which fall within each of a sequence of discrete size-intervals.

Options

-s Int, --binSizeIncrement=Int
Defines the constant size-increase in the arithmetic sequence of bins into which the byte-sizes of files are categorised; defaulting to one standard-deviation.
-r Float, --binSizeRatio=Float
Defines the constant size-ratio in the geometric sequence of bins into which the byte-sizes of files are categorised; an alternative to "binSizeIncrement".
-p[Bool], --deriveProbabilityMassFunction[=Bool]
Whether to derive the Probability Mass Function rather than the Frequency-distribution.
If this option is absent, the value defaults to "False"; if the option is given without its boolean argument, "True" is inferred.
-d Int, --nDecimalDigits=Int
The precision to which fractional auxiliary data is displayed.
--verbosity=(Silent|Normal|Verbose|Deafening)
Produces additional output where appropriate; i.e. file-names, file-size statistics, & column-headers.

Generic Program-information

-v, --version
Outputs version-information & then exits.
-?, --help
Displays help & then exits.

File-paths

If File-path is a single hyphen-minus (-), then the list of file-paths will be read from standard-input. Only plain files are acceptable; no directories, symlinks, sockets, ...
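
The file-path rule above can be sketched in Python as follows. This is a minimal illustration rather than fishfood's own implementation: the names "read_paths" and "plain_file_sizes" are hypothetical, and raising an error for a non-plain file is an assumption, since the text only states that such files are not acceptable.

```python
import os
import stat
import sys


def read_paths(args, stdin=None):
    """Return the file-paths to measure.

    A lone "-" means: read one path per line from standard-input.
    """
    if list(args) == ["-"]:
        stream = sys.stdin if stdin is None else stdin
        return [line.rstrip("\n") for line in stream if line.strip()]
    return list(args)


def plain_file_sizes(paths):
    """Return the byte-size of each path, insisting on plain files."""
    sizes = []
    for path in paths:
        info = os.lstat(path)  # lstat() doesn't follow symlinks
        if not stat.S_ISREG(info.st_mode):
            # Assumption: anything other than a plain file is rejected.
            raise ValueError(f"not a plain file: {path}")
        sizes.append(info.st_size)
    return sizes
```

Because "os.lstat" inspects the path itself, a symlink pointing at a plain file is still rejected in this sketch, in keeping with the list of excluded file-types.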

Exit-status

0 on success, & >0 if an error occurs.

Examples

Example 1

To find the frequency-distribution in the size of any Matroska video-files in a file-system:
fishfood --verbosity=Verbose $(find / -name '*.mkv' 2>/dev/null)    #CAVEAT: for efficiency, one may want to be more precise with the path supplied to "find".

Files=86, mean=394459250.942, standard-deviation=304129537.729
  Bin-size Frequency
========== =========
         0        44
 304129538        29
 608259076         6
 912388614         4
1216518152         2
1520647690         1

The left-hand column defines an arithmetic sequence of bins, whilst the right-hand column defines the number of files from those specified which fall into each. The choice of the increment between each bin has defaulted to one standard-deviation.
From this data one can conclude that there are 44 files whose size lies in the semi-closed interval [0, 1) standard-deviations, decaying monotonically to only one file whose size lies in the semi-closed interval [5, 6) standard-deviations.
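
The arithmetic binning described above can be sketched in Python. This is an illustration under stated assumptions, not fishfood's actual code: the function name "arithmetic_bins" is hypothetical, and the default increment is assumed to be the population standard-deviation rounded to a whole number of bytes (the bin-sizes in the table are whole multiples of 304129538, but the exact estimator isn't documented here).

```python
import statistics
from collections import Counter


def arithmetic_bins(sizes, increment=None):
    """Count the sizes falling in each interval [k*increment, (k+1)*increment).

    Returns a dict mapping each occupied bin's lower bound to its frequency.
    """
    if increment is None:
        # Assumption: population standard-deviation, rounded to whole bytes.
        increment = max(1, round(statistics.pstdev(sizes)))
    # Integer division finds the bin index; multiplying back gives its lower bound.
    counts = Counter((size // increment) * increment for size in sizes)
    return dict(sorted(counts.items()))


print(arithmetic_bins([0, 5, 10, 15, 25], increment=10))
```

Only occupied bins appear in the result, which is consistent with the tables on this page listing no bins with a frequency of zero.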

Example 2

One can alternatively specify the arithmetic increment between bin-sizes, & also derive the probability that a file-size lies in any specific bin.
fishfood --verbosity=Verbose --binSizeIncrement=100000000 --deriveProbabilityMassFunction $(find / -name '*.mkv' 2>/dev/null)

Files=86, mean=394459250.942, standard-deviation=304129537.729
  Bin-size Probability
========== ===========
 100000000 0.209
 200000000 0.302
 300000000 0.233
 400000000 0.035
 500000000 0.070
 600000000 0.023
 800000000 0.035
 900000000 0.035
1000000000 0.012
1100000000 0.012
1200000000 0.012
1400000000 0.012
1600000000 0.012

CAVEAT: the total probability may differ from "1", due to rounding-errors; see "nDecimalDigits".
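
The thirteen probabilities listed above sum to 1.002, which illustrates the caveat. A minimal Python sketch of how such a drift arises follows; the function name "probability_mass_function" is hypothetical, the default of three decimal digits merely matches the output shown on this page, and rounding (rather than truncation) is an assumption.

```python
def probability_mass_function(frequencies, n_decimal_digits=3):
    """Convert {bin: frequency} into {bin: probability}, rounded for display."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("no files were counted")
    return {
        bin_size: round(count / total, n_decimal_digits)
        for bin_size, count in frequencies.items()
    }


# Three equally populated bins: each true probability is 1/3, but each
# displayed value is 0.333, so the displayed total falls short of 1.
pmf = probability_mass_function({0: 1, 10: 1, 20: 1})
print(pmf, sum(pmf.values()))
```

Increasing the number of decimal digits shrinks the discrepancy but doesn't in general eliminate it.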

Example 3

One can alternatively define a geometric sequence of file-size bins, & also read the file-names from standard-input, to bypass any limit applied by the shell to the length of the command-line.
find /etc -type f -readable 2>/dev/null | fishfood --verbosity=Verbose --binSizeRatio=10 -

Files=1735, mean=13846.622, standard-deviation=74846.621
  Bin-size Frequency
========== =========
         0         4    #Though "0" isn't a member of the requested geometric sequence, it's the integral value beneath all fractional values which are.
         1         2
        10       100
       100       563
      1000       794
     10000       188
    100000        83
   1000000         1

From this data one can conclude that there are 4 files whose size is zero, 2 files in the semi-closed interval [1, 10), 100 files in [10, 100), ...
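
The geometric binning can be sketched in the same way. The names "geometric_bin" and "geometric_bins" are hypothetical, and anchoring the sequence at 1 (so that the bins are [1, ratio), [ratio, ratio^2), ...) is an assumption inferred from the bin-sizes in the table above, not a documented guarantee.

```python
from collections import Counter


def geometric_bin(size, ratio):
    """Return the lower bound of the interval [ratio**k, ratio**(k+1)) holding size."""
    if size < 1:
        return 0  # zero-byte files land in the bin labelled 0
    lower = 1
    # Step up through the sequence until the next bound would exceed the size.
    while lower * ratio <= size:
        lower *= ratio
    return lower


def geometric_bins(sizes, ratio):
    """Count the sizes falling into each occupied geometric bin."""
    if ratio <= 1:
        raise ValueError("the ratio must exceed 1")
    counts = Counter(geometric_bin(size, ratio) for size in sizes)
    return dict(sorted(counts.items()))


print(geometric_bins([0, 1, 9, 10, 99, 100, 5000], 10))
```

The loop multiplies instead of taking logarithms, which avoids floating-point surprises at exact powers of an integral ratio.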

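Similarly, to derive the probability mass function for the sizes of image-files, using a geometric sequence of bins with a ratio of 2:
find $HOME -name '*.png' -o -name '*.gif' -o -name '*.jp*g' | fishfood --verbosity=Verbose -r 2 -p -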

Files=878, mean=78365.943, standard-deviation=297831.014
  Bin-size Probability
========== ===========
        32 0.023
        64 0.017
       128 0.008
       256 0.015
       512 0.034
      1024 0.046
      2048 0.047
      4096 0.096
      8192 0.155
     16384 0.179
     32768 0.155
     65536 0.157
    131072 0.032
    262144 0.017
    524288 0.003
   1048576 0.010
   2097152 0.007

When specifying an arithmetic sequence of bin-sizes, the lack of resolution amongst smaller files makes the distribution appear like the decaying exponential of a geometric distribution, but by using a geometric sequence of bin-sizes, it can be seen more clearly to be a log-normal distribution; see "A Large-Scale Study of File-System Contents" by John R. Douceur and William J. Bolosky.

Author

Written by Dr. Alistair Ward.

Bugs

Reporting Bugs

Report bugs to <fishfood@functionalley.com>.

Copyright

Copyright © 2013-2015 Dr. Alistair Ward

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

See Also

