Merging 100K+ Gzipped Files Together
To open a large volume of CloudFront logs in a spreadsheet, we needed to merge over 100K .gz files into a single file on OS X Mavericks. The following command failed with a /usr/bin/cat: Argument list too long error:
cat logs/*.gz >> combined_logs.gz
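The error occurs because bash expands the asterisk into every matching filename before running cat, and the resulting argument list exceeds the limit the operating system places on a single command (you can check it with getconf ARG_MAX). To get around this, use find to generate the list and xargs to split it into batches that fit: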
find logs/ -name "*.gz" -print0 | xargs -0 cat >> combined_logs.gz
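The merge works because a gzip file may contain several compressed members back to back, and gunzip decompresses them as one stream. Here is a minimal sketch of that behaviour on a few throwaway files; the scratch directory and the part1.gz style names are made up for illustration and are not part of our setup:

```shell
# Build three tiny gzip files in a scratch directory (hypothetical names).
tmp=$(mktemp -d)
mkdir "$tmp/logs"
for i in 1 2 3; do
  printf 'line %s\n' "$i" | gzip > "$tmp/logs/part$i.gz"
done

# Same find | xargs pattern as above; xargs runs cat in batches that fit the limit.
find "$tmp/logs" -name "*.gz" -print0 | xargs -0 cat > "$tmp/combined_logs.gz"

# gunzip reads the concatenated members as a single stream.
line_count=$(gunzip -c "$tmp/combined_logs.gz" | wc -l | tr -d ' ')
echo "$line_count"
```

Note that find does not promise any particular file order, so if the order of the merged lines matters you will need to sort afterwards.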
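Each log file starts with its own header lines, so the merged file repeats them once per source file. To clean these up you can decompress the combined file and pipe it through sort -u, for example gunzip -c combined_logs.gz | sort -u (or, as in our case, just sort the columns in Excel). Keep in mind that sort -u also reorders the lines and drops any log entries that are exact duplicates.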
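If you would rather keep the log lines in their original order, one alternative to sort -u is an awk filter that prints every non-header line and only the first occurrence of each header line. This is a sketch and not what we ran: it assumes the header lines start with # (as CloudFront's #Version and #Fields lines do), and the sample data and the combined_logs.txt name are invented for the example:

```shell
# Fake a merged file: two gzip members, each with its own header lines.
tmp=$(mktemp -d)
for i in 1 2; do
  printf '#Version: 1.0\n#Fields: date time\n2014-01-0%s 00:00:00\n' "$i" \
    | gzip >> "$tmp/combined_logs.gz"
done

# Print every non-header line; print a header line only the first time it is seen.
gunzip -c "$tmp/combined_logs.gz" \
  | awk '!/^#/ || !seen[$0]++' > "$tmp/combined_logs.txt"

header_count=$(grep -c '^#' "$tmp/combined_logs.txt")
row_count=$(grep -vc '^#' "$tmp/combined_logs.txt")
echo "$header_count header lines, $row_count log lines"
```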
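The same find and xargs pattern can also be adapted for other "Argument list too long" errors from commands such as rm, cp, mv, tar, etc. One caveat: xargs may run the command several times, once per batch, so be careful with commands like tar where each run would create the archive again.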
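As a rough illustration of those variations, here is the pattern applied to mv and rm on some throwaway files. The directory and file names are hypothetical. Because mv needs the destination as its last argument, this sketch uses the -I option of xargs, which runs one mv per file:

```shell
# Scratch files to operate on (hypothetical names).
tmp=$(mktemp -d)
mkdir "$tmp/logs" "$tmp/archive"
touch "$tmp/logs/a.gz" "$tmp/logs/b.gz" "$tmp/logs/c.log" "$tmp/logs/d.log" "$tmp/logs/keep.txt"

# mv: -I {} substitutes each filename so the destination can come last.
find "$tmp/logs" -name "*.gz" -print0 | xargs -0 -I {} mv {} "$tmp/archive/"

# rm: filenames go at the end, so plain batching works.
find "$tmp/logs" -name "*.log" -print0 | xargs -0 rm

archived=$(ls "$tmp/archive" | wc -l | tr -d ' ')
remaining=$(ls "$tmp/logs")
echo "archived: $archived, remaining: $remaining"
```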