Shell-Foo: Simple Distributed grep on a GCE Cluster

You just finished running a distributed multi-phase pipeline on a cluster of 200 GCE VMs. Good for you! But something doesn’t look right, and you’d like to investigate by grepping the local log files on every node and doing something with the results. How would you do that?

With only one or two machines, you would ssh into each one and run this command:

$ grep ERROR /tmp/*.log.* >errors

This post describes a simple one-liner that scales this local grep to the whole cluster.
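
To make the idea concrete, here is a rough sketch of one way to fan the grep out, assuming the gcloud CLI is installed and authenticated and that every instance in the current project belongs to the cluster. It is not necessarily the one-liner this post has in mind; the function name cluster_grep and the errors.<node> output files are made up for illustration.

```shell
# Sketch only: run the local grep on every VM in parallel, one result file per node.
# cluster_grep and the errors.<node> file names are hypothetical.
cluster_grep() {
  # Print one "name<TAB>zone" line per instance in the current project.
  gcloud compute instances list --format='value(name,zone.basename())' |
  {
    while read -r name zone; do
      # Run the grep remotely in the background. stdin is redirected so that
      # ssh does not swallow the rest of the instance list.
      gcloud compute ssh "$name" --zone "$zone" \
        --command 'grep ERROR /tmp/*.log.*' > "errors.$name" < /dev/null &
    done
    # Wait inside the same subshell that started the background jobs.
    wait
  }
}
```

Running cluster_grep and then `cat errors.*` would give roughly what the single-machine command above produces, with one file per node. On a 200-node cluster you may want to cap how many ssh sessions run at once.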

Shell-Foo credit for this one: Eyal Fink.

Shell-Foo is a series of fun ways to take advantage of the powers of the shell. In the series, I highlight shell one-liners that I found useful or interesting. Most of the entries should work in bash on Linux, OS X, and other UNIX variants. Some probably work with other shells as well. Your mileage may vary.

Feel free to suggest your own Shell-Foo one-liners!
