I have a directory with a very large file in it that is split up into pieces. I need this file to be in one big piece for a particular application (although that will be fixed soon).
In the meantime I wanted some kind of progress display to show me how long combining all of these files was going to take so I whipped this up:
#!/usr/bin/env bash
cat $(ls blocks/blk*.dat | sort) | \
pv -s $(($(du -c blocks | cut -f1 | tail -n 1)*512)) > timsblocks.dat
The files are the Bitcoin blockchain pieces if you're wondering. And what the script does is as follows:
Get the list of block files from ls and make sure they're sorted by name (I'm paranoid):
cat $(ls blocks/blk*.dat | sort)
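As a quick sanity check (sketched here in a scratch directory with hypothetical file names), the shell already expands globs in sorted order, so the extra ls | sort is belt-and-braces rather than strictly necessary:

```shell
# Scratch directory with deliberately out-of-order creation times.
tmp=$(mktemp -d)
touch "$tmp/blk00002.dat" "$tmp/blk00000.dat" "$tmp/blk00001.dat"

# Glob expansion, one name per line: the shell sorts these itself.
globbed=$(printf '%s\n' "$tmp"/blk*.dat)
# The same listing run through an explicit sort.
sorted=$(ls "$tmp"/blk*.dat | sort)

printf '%s\n' "$globbed"
rm -r "$tmp"
```

Both listings come out identical, so the paranoia is free but harmless.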
Get the size of everything in the blocks directory; the -c switch adds a grand total
as the last output line:
du -c blocks
Cut everything off after the first value (similar to awk '{ print $1 }'
but cooler):
cut -f1
Get just the total:
tail -n 1
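Here are those three stages run together on a scratch directory with files of made-up, known sizes (1 KiB and 2 KiB of zeros), to show the shape of the intermediate output: du -c prints one line per argument plus a final "total" line, cut -f1 keeps the first tab-separated field, and tail -n 1 keeps only that total:

```shell
# Scratch directory with two files of known size.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/blk00000.dat" bs=1024 count=1 2>/dev/null
dd if=/dev/zero of="$tmp/blk00001.dat" bs=1024 count=2 2>/dev/null

# du -c ends with a "total" line; cut and tail whittle it down to one number.
total=$(du -c "$tmp" | cut -f1 | tail -n 1)
echo "$total"
rm -r "$tmp"
```

The exact number depends on your du's block size and filesystem overhead, but it's always a single integer, which is what the arithmetic expansion needs.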
Then the text inside the arithmetic expansion $((...))
is treated as a mathematical expression and multiplied by 512, because du
reports sizes in 512-byte blocks and I need them in bytes. (One caveat: the block size depends on your du — BSD du defaults to 512-byte blocks, but GNU du defaults to 1024-byte blocks, so du -k multiplied by 1024 would be more portable.) Finally, that's passed to the -s
switch of pv
to tell it how big the data is. And now my output looks like this:
> ./combine.sh
118GiB 0:10:07 [ 216MiB/s] [==========> ] 33% ETA 0:19:59
Much better than guessing when it'll be done.