Say you have two directories with a bunch of text files. How can you create a third directory that contains every file whose name appears in both input directories, such that every output file is a concatenation of the two input files with that name?
Shell-Foo is a series of fun ways to take advantage of the powers of the shell. In the series, I highlight shell one-liners that I found useful or interesting. Most of the entries should work on bash on Linux, OS X and other UNIX-variants. Some probably work with other shells as well. Your mileage may vary.
Feel free to suggest your own Shell-Foo one-liners!
    comm -12 <(ls input1/) <(ls input2/) | \
      xargs -n 1 -P 8 -I fname sh -c \
      'cat input1/fname input2/fname >combined/fname'
comm takes two sorted files, and prints the lines from the files in 3 columns:
- The first column contains lines that appear in the first file, but not in the second.
- The second column contains lines that appear in the second file, but not in the first.
- The third column contains lines that appear in both files.
comm -12 suppresses the first two columns and prints just the third, resulting in a list of common lines.
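To see the three columns in action, here is a minimal sketch on two throwaway files. The file names and contents are made up for illustration; the only assumption is that both inputs are already sorted, which comm expects.

```shell
# Two small, already-sorted inputs (hypothetical names and contents).
workdir=$(mktemp -d)
printf 'a.txt\nb.txt\nc.txt\n' >"$workdir/left"
printf 'b.txt\nc.txt\nd.txt\n' >"$workdir/right"

# All three columns: lines only in left, lines only in right (indented by
# one tab), and lines in both (indented by two tabs).
comm "$workdir/left" "$workdir/right"

# -1 and -2 suppress the first two columns, leaving only the common lines.
common=$(comm -12 "$workdir/left" "$workdir/right")
printf '%s\n' "$common"

rm -r "$workdir"
```

The second command should list only the names present in both files.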
<(command) takes the output of command and “wraps” it as a file. It’s a neat bash shortcut that saves you from using temporary files like this:
    ls input1/ >input1.ls
    ls input2/ >input2.ls
    comm -12 input1.ls input2.ls
    rm input1.ls input2.ls
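A quick way to convince yourself that the two forms are equivalent is to run both against the same data. This is just a sketch: the directory layout under a temporary directory is invented for the demo, and it needs bash (or another shell with process substitution).

```shell
#!/usr/bin/env bash
# Throwaway directories standing in for input1/ and input2/.
workdir=$(mktemp -d)
mkdir "$workdir/input1" "$workdir/input2"
touch "$workdir/input1/a.txt" "$workdir/input1/b.txt"
touch "$workdir/input2/b.txt" "$workdir/input2/c.txt"

# Temporary-file version.
ls "$workdir/input1" >"$workdir/input1.ls"
ls "$workdir/input2" >"$workdir/input2.ls"
with_tempfiles=$(comm -12 "$workdir/input1.ls" "$workdir/input2.ls")

# Process-substitution version: nothing to create or clean up by hand.
with_procsub=$(comm -12 <(ls "$workdir/input1") <(ls "$workdir/input2"))

# What comm receives is a path; on many systems it looks like /dev/fd/NN.
echo <(true)

printf 'temp files:   %s\n' "$with_tempfiles"
printf 'process subst: %s\n' "$with_procsub"

rm -r "$workdir"
```

Both variables should hold the same list of common names.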
- I wrote about xargs (and flags such as -P) before. The -I replstr flag is used to tell xargs to replace occurrences of “replstr” in the arguments list with the line from stdin. By default, up to 5 occurrences are replaced. This can be controlled using the
In case you’re wondering why I wrapped the entire command in sh -c '...', it’s because I want to redirect the output of every command separately, as opposed to redirecting the outputs of all commands together to one file. To make this clearer, consider the “intuitive alternative”: xargs -n 1 -P 8 -I fname cat input1/fname input2/fname >combined/fname. This will run cat as expected, but the redirection is handled once by the calling shell, before xargs substitutes anything, so the output of every cat ends up in a single file literally named combined/fname (in no guaranteed order, since the commands run in parallel).
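The difference is easy to reproduce on throwaway data. The sketch below assumes made-up file names and contents, and drops -n 1 and -P 8 so the result is deterministic; the naive/ and wrapped/ directories are hypothetical stand-ins for combined/.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/input1" "$workdir/input2" "$workdir/naive" "$workdir/wrapped"
for name in a.txt b.txt; do
    echo "1:$name" >"$workdir/input1/$name"
    echo "2:$name" >"$workdir/input2/$name"
done

(
    cd "$workdir" || exit 1

    # Naive: the calling shell opens naive/fname once, for xargs itself,
    # so every cat writes into that one file.
    printf 'a.txt\nb.txt\n' |
        xargs -I fname cat input1/fname input2/fname >naive/fname

    # Wrapped: xargs substitutes fname inside the quoted string, and each
    # sh -c then performs its own redirection.
    printf 'a.txt\nb.txt\n' |
        xargs -I fname sh -c 'cat input1/fname input2/fname >wrapped/fname'
)

naive_files=$(ls "$workdir/naive")
wrapped_files=$(ls "$workdir/wrapped")
naive_lines=$(wc -l <"$workdir/naive/fname" | tr -d ' ')
wrapped_a=$(cat "$workdir/wrapped/a.txt")

printf 'naive/:   %s\n' "$naive_files"
printf 'wrapped/: %s\n' $wrapped_files

rm -r "$workdir"
```

The naive directory ends up with one file holding everything, while the wrapped directory gets one file per input name.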
This can be easily generalized to any “combining function” (cat in this case). For example, to get a sorted combined file:
    comm -12 <(ls input1/) <(ls input2/) | \
      xargs -n 1 -P 8 -I fname sh -c \
      'cat input1/fname input2/fname | sort >combined/fname'
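If you find yourself swapping the combining function often, one option is to pass it in as an argument. The sketch below is not the one-liner above: it trades xargs for a plain while loop (so it runs sequentially, not in parallel), and combine_common is a hypothetical helper name invented for this example. It assumes the combining command accepts two file arguments and writes to stdout, as cat and sort do.

```shell
#!/usr/bin/env bash
# combine_common DIR1 DIR2 OUTDIR COMMAND [ARGS...]
# Hypothetical helper: for every name present in both DIR1 and DIR2, run
# COMMAND on the two files and store its output under OUTDIR.
combine_common() {
    local in1=$1 in2=$2 out=$3
    shift 3
    mkdir -p "$out"
    comm -12 <(ls "$in1") <(ls "$in2") |
        while IFS= read -r name; do
            "$@" "$in1/$name" "$in2/$name" >"$out/$name"
        done
}

# Example run on throwaway data (made-up names and contents).
workdir=$(mktemp -d)
mkdir "$workdir/input1" "$workdir/input2"
printf 'pear\napple\n' >"$workdir/input1/fruit.txt"
printf 'mango\n' >"$workdir/input2/fruit.txt"
touch "$workdir/input1/only-here.txt"

combine_common "$workdir/input1" "$workdir/input2" "$workdir/combined" sort

combined_files=$(ls "$workdir/combined")
sorted=$(cat "$workdir/combined/fruit.txt")
printf '%s\n' "$sorted"

rm -r "$workdir"
```

Replacing sort with cat in the call gives plain concatenation. The loop copes with spaces in file names, but like the one-liner it still relies on ls printing one name per line.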
I believe this is bash-specific, due to the way I redirected the output of two ls commands into the input of