Table of Contents

Name

squeeze - Finds the best subset, of the specified files, which fits into a given space.

Synopsis

squeeze [OPTIONS] [File-path ...]

Description

Finds the subset of the specified File-paths, which fits into the specified space with least room to spare, whilst also meeting the minimum usage-requirements; i.e. a degenerate instance of the "0-1 Knapsack-problem".

Any directories amongst the specified File-paths are treated as atomic units, & therefore only solutions which involve either all or none of the files contained, are returned.

Because of its exponential time-complexity, solutions of increasing suitability are produced lazily, rather than waiting until the optimal solution is known (which might take an inordinately long time).

Options

--verbosity=(Silent|Normal|Verbose|Deafening)
Produces additional explanatory output where appropriate.
CAVEAT: to be effective, this option must precede others.

Generic Program-information

-v, --version
Outputs version-information, & then exits.
-?, --help
Displays help, & then exits.

Selection

-M Bytes, --maximumBytes=Bytes
Defines the maximum available space, in bytes; defaulting to the space available on a DVD, i.e. 4700000000 bytes.
-m Float, --minimumUsageRatio=’Float
Defines the minimum acceptable usage-ratio (in the closed unit-interval [0,1]); defaulting to ’0.99’, i.e. 99% of maximumBytes.
-z[Bool], --includeEmpty[=Bool]
When "True", any empty file or directory, can be a member of all solutions.
The default value, in the absence of this option, is "False", but in the absence of only the boolean argument, "True" will be inferred.
CAVEAT: for n such files or directories, a factor of 2^n times more viable solutions exist, obscuring the minimal solutions on which they’re based.

Test

-r[Int], --randomSeed[=Int]
This option takes an optional integral argument with which to seed the unique random-number generator used for all random operations.
In the absence of this option, the random-number generator will be seeded unpredictably from the operating-system, but in the absence of only the integral argument, "0" will be inferred.
CAVEAT: to be effective, this option must precede either "testPerformanceContinuous" or "testPerformanceDiscrete".
--testPerformanceContinuous=’(Integer, LogNormalDistribution location scale^2)’
Measures the CPU-time required to find the best fit, for the specified number of randomly generated virtual files, the size of which conform to the specified continuous probability-distribution; & then exits.
"A Large-Scale Study of File-System Contents by John R. Douceur and William J. Bolosky" concludes that for arbitrary file-types, the frequency of file-sizes matches this distribution.
--testPerformanceDiscrete=’(Integer, PoissonDistribution lambda)’
Measures the CPU-time required to find the best fit, for the specified number of randomly generated virtual files, the size of which conform to the specified discrete probability-distribution; & then exits.
CAVEAT: the standard-deviation for this distribution is narrower than observed.

File-paths

If File-path is a single hyphen-minus ("-"), then the list of file-paths will be read from standard-input.

Exit-status

0 on success, & >0 if an error occurs.

Examples

Example 1

Say we’ve a directory of audio-files, categorised by artist.
ls -p

    ArabStrap/ BobDylan/ JeffBuckley/ JohnMartyn/ JoniMitchell/ ReservoirDogsOST/
    RichardThompson/ SethLakeman/ SusheelaRaman/ TeddyThompson/ Vangelis/

We want to find which combinations can be stored on a CD without wasting inordinate amounts of space.

squeeze -M 700000000 -m 0.999 *

    699871313    ["BobDylan","RichardThompson","SethLakeman","TeddyThompson"]
    699893320    ["ArabStrap","BobDylan","JeffBuckley","JohnMartyn","SethLakeman","SusheelaRaman"]
    699998310    ["ArabStrap","BobDylan","JoniMitchell","ReservoirDogsOST","SethLakeman","TeddyThompson","Vangelis"]

Note that the proposed solutions don’t split any of the directories, into their constituent files.

We can confirm the validity of the optimal result:

find ArabStrap BobDylan JoniMitchell ReservoirDogsOST SethLakeman TeddyThompson Vangelis -type f -print | perl -e ’use List::Util qw(sum); printf(qq(%d\n), sum map { chomp; (stat)[7] } <>);’

    699998310

N.B.: "du" will return a slightly larger size, since it includes the space required for directory-structures.

Example 2

We can improve on that result if we’re prepared to split some of the artist-specific directories into individual albums. With the expectation of more solutions from which to select, we’ll also raise the bar for what is acceptable.
N.B.: from version 1.0.3.0, the runtime can be passed a "-N" flag to request parallel execution on multiple cores, which as a side-effect, alters the selection of sub-optimal solutions to return. Whether this option actually results in reduced execution-time, depends on the file-size frequency-distribution & on the available hardware.
squeeze -M 700000000 -m 0.99999 ArabStrap BobDylan/* JeffBuckley JohnMartyn JoniMitchell ReservoirDogsOST RichardThompson SethLakeman SusheelaRaman TeddyThompson Vangelis +RTS -N

    699995815    ["ArabStrap","BobDylan/BlondeOnBlonde","BobDylan/Highway61Revisited","BobDylan/ModernTimes","BobDylan/StreetLegal","BobDylan/SubterraneanHomesickBlues","JoniMitchell","RichardThompson","SusheelaRaman","TeddyThompson","Vangelis"]
    699998578    ["BobDylan/Desire","BobDylan/Highway61Revisited","BobDylan/Infidels","BobDylan/StreetLegal","JeffBuckley","JohnMartyn","JoniMitchell","RichardThompson","SethLakeman","SusheelaRaman","TeddyThompson","Vangelis"]
    699999112    ["BobDylan/BlondeOnBlonde","BobDylan/Desire","BobDylan/Highway61Revisited","BobDylan/Infidels","BobDylan/ModernTimes","JeffBuckley","JohnMartyn","ReservoirDogsOST","RichardThompson","SusheelaRaman","TeddyThompson","Vangelis"]

Example 3

If we’re prepared to add individual files from another artist:
squeeze -M 700000000 -m 0.9999999 ArabStrap BobDylan/* JeffBuckley JohnMartyn JoniMitchell ReservoirDogsOST $(find RichardThompson -type f) SethLakeman SusheelaRaman TeddyThompson Vangelis

    699999964    ["ArabStrap","BobDylan/Desire","BobDylan/Infidels","BobDylan/ModernTimes","BobDylan/OhMercy","BobDylan/SubterraneanHomesickBlues","BobDylan/TimeOutOfMind","JeffBuckley","JohnMartyn","JoniMitchell","ReservoirDogsOST","RichardThompson/FrontParlourBallads/RichardThompson-06-MySoulMySoul.ogg","RichardThompson/RumorAndSigh/RichardThompson-12-MotherKnowsBest.ogg","RichardThompson/TheOldKitBag/RichardThompson-01-Gethsemane.ogg","RichardThompson/TheOldKitBag/RichardThompson-11-OutsideOfTheInside.ogg","SethLakeman","SusheelaRaman","TeddyThompson","Vangelis"]
    699999987    ["BobDylan/BlondeOnBlonde","BobDylan/BloodOnTheTracks","BobDylan/Desire","BobDylan/ModernTimes","BobDylan/OhMercy","BobDylan/StreetLegal","BobDylan/SubterraneanHomesickBlues","BobDylan/TimeOutOfMind","JoniMitchell","ReservoirDogsOST","RichardThompson/FrontParlourBallads/RichardThompson-01-LetItBlow.ogg","RichardThompson/IWantToSeeTheBrightLightsTonight/RichardAndLindaThompson-13-TheCalvaryCross[Live-Bonus].ogg","RichardThompson/TheOldKitBag/RichardThompson-01-Gethsemane.ogg","RichardThompson/TheOldKitBag/RichardThompson-06-FirstBreath.ogg","RichardThompson/TheOldKitBag/RichardThompson-11-OutsideOfTheInside.ogg","SethLakeman","SusheelaRaman","TeddyThompson","Vangelis"]
    700000000    ["ArabStrap","BobDylan/Desire","BobDylan/Highway61Revisited","BobDylan/Infidels","BobDylan/ModernTimes","BobDylan/StreetLegal","BobDylan/SubterraneanHomesickBlues","BobDylan/TimeOutOfMind","JeffBuckley","JoniMitchell","ReservoirDogsOST","RichardThompson/FrontParlourBallads/RichardThompson-01-LetItBlow.ogg","RichardThompson/RumorAndSigh/RichardThompson-08-BacklashLoveAffair.ogg","RichardThompson/RumorAndSigh/RichardThompson-12-MotherKnowsBest.ogg","RichardThompson/TheOldKitBag/RichardThompson-01-Gethsemane.ogg","RichardThompson/TheOldKitBag/RichardThompson-04-ALoveYouCantSurvive.ogg","RichardThompson/TheOldKitBag/RichardThompson-06-FirstBreath.ogg","SusheelaRaman","TeddyThompson","Vangelis"]

^C

The exact match isn’t unexpected, given the 2^71 possible combinations. The process was terminated after this solution was found, though where time permits, one may choose to wait for alternative exact matches.

Author

Written by Dr. Alistair Ward.

Bugs

Reporting Bugs

Report bugs to <squeeze@functionalley.com>.

Copyright

Copyright © 2010-2015 Dr. Alistair Ward

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/ >.

See Also


Table of Contents