Shell Foo: Subtracting Lists With comm

The comm utility is powerful, yet not well known.

Its name comes from the word “common”: the comm utility selects or rejects lines common to two files.

Since learning about this tool, I use it often to:

  1. Get all lines that exist in both files (“select common” lines).
  2. Find lines that are missing from one file compared to another (“reject common” lines).

Just don’t forget that comm requires that the input files are sorted lexically! That shouldn’t be a problem, since you can always use sort before calling comm. But you do need to remember to do this, otherwise you might get unexpected results (which may or may not happen to me on a regular basis).
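
To see why the sorting matters, here is a minimal Python sketch of a merge-style comparison of two lists of lines. It only illustrates the idea and is not comm's actual implementation; the function name comm_sketch and the sample data are made up for this example. It returns the same three groups that comm prints as columns: lines only in the first file, lines only in the second file, and lines common to both.

```python
def comm_sketch(left, right):
    """Compare two lists of lines in a single merge-style pass.

    Hypothetical helper for illustration only. Like comm, it assumes
    both inputs are already sorted.
    """
    only_left, only_right, both = [], [], []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            both.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            # left[i] cannot appear later in a sorted right list
            only_left.append(left[i])
            i += 1
        else:
            only_right.append(right[j])
            j += 1
    # Whatever remains on either side has no partner on the other side
    only_left.extend(left[i:])
    only_right.extend(right[j:])
    return only_left, only_right, both


# Sorted input: the groups come out as expected.
print(comm_sketch(["apple", "banana", "cherry"],
                  ["banana", "cherry", "date"]))
# (['apple'], ['date'], ['banana', 'cherry'])

# Unsorted first list: "apple" is wrongly reported as unique to each side.
print(comm_sketch(["cherry", "apple"], ["apple", "cherry"]))
# (['apple'], ['apple'], ['cherry'])
```

The second call shows the kind of unexpected result you can get when one input is not sorted: the single pass has already moved beyond “apple” in one list by the time it meets it in the other.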

Shell-Foo credit for this one: Eyal Fink.

Shell-Foo is a series of fun ways to take advantage of the powers of the shell. In the series, I highlight shell one-liners that I found useful or interesting. Most of the entries should work in bash on Linux, OS X, and other UNIX variants. Some probably work with other shells as well. Your mileage may vary.

Feel free to suggest your own Shell-Foo one-liners!

Continue Reading…

How To Use SSH With Multiple GitHub Accounts

All I wanted was to have two GitHub accounts, and use them both from the same computer, using SSH. Is this too much to ask?

I’ve been using my personal GitHub account for some time, and associated my default SSH keys with that account. When I created a work account on GitHub, I expected I’d be able to associate the same SSH keys with the work account, so I could git pull/push just as easily.

This proved to be a problem. GitHub does not allow reusing SSH keys across accounts, and I got an error message when I tried to do that.

I managed to overcome this problem by creating new SSH keys for the work account. Read on for the details.

Continue Reading…

Using Python’s print With No Newline

Maybe I’m the only one who didn’t know this, but yesterday I learned how to make Python’s print statement not write \n.

If you’re using Python’s print statement to write stuff to standard output, you’re probably used to its default behavior of ending lines with \n:

> for i in xrange(5): print i
0
1
2
3
4

Sometimes you don’t want it to add the trailing newline. I used to think that in such cases, I needed to switch to sys.stdout:

> import sys
> %paste
for i in xrange(5):
    sys.stdout.write('%d ' % (i))
else:
    sys.stdout.write('\n')  # for the trailing newline after the last iteration

## -- End pasted text --
0 1 2 3 4

Turns out that the print statement supports the same behavior with a trailing comma!

> for i in xrange(5): print i,

0 1 2 3 4

Of course, this is covered in the documentation.
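
The trailing comma only works with the print statement of Python 2, which is what the examples above use. In Python 3, where print is a function, the end keyword argument plays the same role. Here is a small sketch; the helper name print_row is made up for this example.

```python
def print_row(values):
    """Print all values on one line, separated by spaces (Python 3)."""
    for value in values:
        # end=' ' replaces the default trailing newline with a space
        print(value, end=' ')
    print()  # finish the line with a single newline


print_row(range(5))

# Alternative: unpack the values and let print's default separator do the work
print(*range(5))
```

Both calls write the numbers 0 through 4 on one line. The first leaves a trailing space before the newline, the second does not.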

Every day I learn something new…

Shell Foo: Simple Distributed grep on a GCE Cluster

You just finished running a distributed multi-phase pipeline on a cluster of 200 GCE VMs. Good for you! But something doesn’t look right, and you’d like to investigate by grepping the local log files across all nodes and doing something with the results. How would you do that?

If you had only one or two machines, you would ssh into each and run this command:

$ grep ERROR /tmp/*.log.* >errors

This post describes a simple one-liner that scales the local grep to the whole cluster.

Shell-Foo credit for this one: Eyal Fink.

Continue Reading…

Supporting External Libraries In My SCons Shortcuts

Thursday, February 26, 2015

This is the fifteenth post in my SCons series. This post introduces a small enhancement to my SCons shortcuts system – nicer support for external libraries via the with_libs keyword argument.

In recent episodes, I needed to link with the protobuf library to use the Protocol-Buffers-based AddressBook library. I did this by adding the LIBS=['protobuf'] argument to the Program target, which works just fine.

If this works “just fine”, why mess with it? Well, I already mentioned my OCD, didn’t I? I already have a nicer way to refer to libraries I use, so why can’t I write with_libs=['AddressBook::addressbook', 'protobuf']? It looks a bit cleaner.

This would not work as is, because I look up the with_libs entries in a shared dictionary of project-specific libraries (more on that in the post that introduced the shortcuts system), and “protobuf” is not a project library.

This post extends the shortcuts system to also support external libraries. In addition to improved aesthetics, I add a couple of useful features:

  • Support for a configuration-based list of “supported external libraries”. This allows for centralized control of external libraries used across the project, which can be very useful in projects that want to enforce policies about library usage (e.g. licensing requirements).
  • Simpler support for libraries that are not installed system-wide, taking care of icky details, like CPPPATH and LIBPATH crap.
  • Protection against potentially difficult troubleshooting due to library name typos.
  • External library aliases and groups.
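
To make the idea concrete, here is a rough, self-contained Python sketch of how with_libs entries could be resolved against both a project-library dictionary and a configured list of supported external libraries. It is not the code from the post: the names PROJECT_LIBS, SUPPORTED_EXTERNAL_LIBS and resolve_libs, as well as the library path, are hypothetical and chosen only for illustration.

```python
# Hypothetical shared dictionary of project-specific libraries
PROJECT_LIBS = {
    'AddressBook::addressbook': 'build/AddressBook/libaddressbook.a',
}

# Hypothetical centralized configuration of allowed external libraries
SUPPORTED_EXTERNAL_LIBS = {'protobuf'}


def resolve_libs(with_libs, project_libs, external_libs):
    """Split with_libs entries into project library nodes and external names.

    Raises ValueError for names found in neither collection, so that a
    typo fails early instead of surfacing later as a linker error.
    """
    project, external = [], []
    for name in with_libs:
        if name in project_libs:
            project.append(project_libs[name])
        elif name in external_libs:
            external.append(name)
        else:
            raise ValueError('Unknown library: %r' % name)
    return project, external


print(resolve_libs(['AddressBook::addressbook', 'protobuf'],
                   PROJECT_LIBS, SUPPORTED_EXTERNAL_LIBS))
```

Keeping the external names in one configured collection is what would allow both the centralized policy control and the typo protection listed above.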

This episode picks up where the previous episode left off. Read on for the full details, or check out the final result on my GitHub scons-series repository.

Continue Reading…

Adding SCons Proto Builder Shortcut

This is the fourteenth post in my SCons series. The topic of this post is adding a shortcut for the custom SCons Protoc builder from the previous episodes.

The shortcut is in line with the SConscript simplification approach described in an earlier episode. In this installment, I add a new Proto shortcut to the collection, so the address book SConscript can look like this:

"""AddressBook proto-based library SConscript script"""

Import('*')

AbProtos = ['person.proto', 'addressbook.proto']

Proto(AbProtos)
Lib('addressbook', protos=AbProtos)

The final result is available on my GitHub scons-series repository.

Continue Reading…

Don’t Use Python Lists Where Generators Work Better!

Sunday, February 15, 2015

I recently learned about a subtle difference between lists and generators in Python.

The gist: when passing iterables, always prefer generators over lists!

Also: when returning iterables, always prefer generators over lists!

Most of the time, the intuitive way does the Right Thing. I noticed the subtlety in some cases where my intuition didn’t do the Right Thing.

In this post I explain the difference using several examples. The last example is an important one, demonstrating how fnmatch.filter() can misbehave!
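
As one small illustration of the kind of difference involved (not necessarily one of the examples from the full post), consider passing a list versus a generator expression to any(). The helper is_big and the calls lists are made up for this sketch; they just record how many elements actually get evaluated.

```python
def is_big(n, calls):
    """Return True for n > 2, recording every call in the given list."""
    calls.append(n)
    return n > 2


# List comprehension: the whole list is built before any() sees it.
calls_with_list = []
any([is_big(n, calls_with_list) for n in range(10)])

# Generator expression: any() stops pulling values at the first True.
calls_with_generator = []
any(is_big(n, calls_with_generator) for n in range(10))

print(len(calls_with_list))       # 10
print(len(calls_with_generator))  # 4
```

With the list, all ten elements are evaluated up front. With the generator, evaluation stops as soon as any() finds its answer, after four calls.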

Continue Reading…

Fixing the Protoc SCons Builder

This is the thirteenth post in my SCons series.

In the previous episode I integrated Protoc, an existing custom builder, in my SCons project.

This post will demonstrate how this builder fails with non-trivial projects, and suggest some fixes and improvements.

I wanted to share my proposed fix via the SCons wiki page, but I couldn’t create a user… I’d appreciate it if someone with access to that wiki could assist 🙂

The final result is available on my GitHub scons-series repository.

Continue Reading…

The Protoc Builder: Compiling Protocol Buffers With SCons

This is the twelfth post in my SCons series. This post continues exploring ways to work with protocol buffers in a SCons project.

In the previous episode I covered the manual approach to using protocol buffers in a SCons project. As mentioned there, SCons does not know how to compile .proto files into C++ and Python code out of the box.

This post will take the integration a step further, by actually using SCons to compile .proto files.

I am definitely not the first one to suggest this SCons extension. I started using the SCons ProtocBuilder by Scott Stafford. It worked fine for my needs, until it didn’t. This post focuses on integrating Scott’s builder as is. Future posts will deal with fixes and improvements.

The final result is available on my GitHub scons-series repository.

Continue Reading…