Automating Module Discovery in a SCons Project

Thursday, December 25, 2014

This is the ninth post in my SCons series. The topic of this post is automating module discovery.

Up until now, the module directories were hardcoded in site_scons/site_config.py. This was a natural choice, because you needed to specify the modules in the correct order. In the previous episode, I implemented a solution that supports arbitrary module order. Now it makes much more sense to discover modules automatically than to specify them manually.

As a developer in the project, you know that when you create a new module you need to create a SConscript to describe how to build it. It makes sense for the build system to locate all modules to build by looking for SConscript files recursively from the project base directory.

In this episode, I describe how I implemented module auto-discovery by walking the project directory tree.

My implementation also correctly handles common caveats:

  • Avoid walking the build directory.
  • Skip “hidden” directories (like .git, and .whatever from your favorite IDE metadata).

In addition, my implementation provides several useful features:

  • Ability to limit recursion depth.
  • Support for “stop marker files”. If a stop marker file (e.g. .noscons) exists in a project directory, the directory (and all sub-directories) will be skipped.

The episode builds on top of the previous episode. The final result is available on my GitHub scons-series repository.

As a reminder, the (seemingly silly) C++ project is a simple address book program. Refer to a previous post if you need more details.

Basic naive implementation

You may recall that my SCons multi-module framework uses the modules() function in site_scons/site_config.py to obtain the module directories to process. Here’s a first iteration of a modules() function that automates module discovery:

import os

def modules():
    """Generate modules to build (directories with a SConscript file)."""
    for dirpath, dirnames, filenames in os.walk('.'):
        if 'SConscript' in filenames:
            yield dirpath

This is a little too naive, for multiple reasons:

  • The build directory is also in the project tree. This implementation also walks the build directory, which contains links to the modules. The result is that every module will be yielded multiple times, confusing the system.
  • The project directory might also contain hidden directories, like .git and .metadata or other .whatever from various tools. It’s redundant to walk these too, and doing so might even result in phantom modules.
  • For every flavor we iterate over the modules twice. This means that a single scons run may call modules() four times, with the same results every time. It’s a shame to waste time on file-system operations.
  • Once the project becomes complex, it may contain deep “sub-projects”. Some of those may be C/C++ projects, and some may be website projects. There’s no point in walking a deep sub-tree that has no SConscript files.

Improvements to the naive implementation

Some of the improvements take advantage of on-the-fly os.walk recursion manipulation. I use it here to prune the walked directory tree to skip sub-trees. If you’re not familiar with this feature, read this post I wrote about it.
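
For readers who haven’t used it, here is a minimal sketch of that pruning technique on its own. The helper name walk_pruned, the skip_names parameter and the throwaway directory layout are made up for illustration. The only real API involved is os.walk in its default top-down mode, where editing dirnames in place controls which sub-directories get visited.

```python
import os
import tempfile


def walk_pruned(top, skip_names):
    """Yield directories under `top`, never descending into `skip_names`.

    `walk_pruned` and `skip_names` are illustrative names, not part of the
    series code.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        # Slice assignment edits the very list os.walk will recurse into.
        # Rebinding the name (dirnames = [...]) would have no effect.
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        yield os.path.relpath(dirpath, top)


# Build a tiny throwaway tree: keep/sub and skipme/sub
demo_top = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_top, 'keep', 'sub'))
os.makedirs(os.path.join(demo_top, 'skipme', 'sub'))

visited = sorted(walk_pruned(demo_top, skip_names=['skipme']))
print(visited)
```

Neither skipme nor skipme/sub shows up in the output, because os.walk never enters a directory that was removed from dirnames.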

Skipping the build directory and other hidden directories

By checking if dirpath is the build directory, or a hidden directory, I can prune os.walk to skip it:

def modules():
    """Generate modules to build (directories with a SConscript file)."""
    for dirpath, dirnames, filenames in os.walk('.'):
        if '.' == dirpath:
            dirpath = ''
        if os.path.normpath(_BUILD_BASE) == os.path.normpath(dirpath) or os.path.basename(dirpath).startswith('.'):
            dirnames[:] = []
        elif 'SConscript' in filenames:
            yield dirpath

Walking just once

By caching the results of the first os.walk, I can generate subsequent results from the cache instead of repeating the walk:

_CACHED_MODULES = list()

def modules():
    """Generate modules to build (directories with a SConscript file)."""
    if not _CACHED_MODULES:
        # Build the cache
        # ... the walk from above ...
    # Yield modules from cache
    for module in _CACHED_MODULES:
        yield module

Limiting recursion by depth

It’s simple enough to count path separators and prune the tree to limit the depth:

MAX_DEPTH = 7  # or None to disable the limit

def modules():
    """Generate modules to build (directories with a SConscript file)."""
    for dirpath, dirnames, filenames in os.walk('.'):
        if MAX_DEPTH and MAX_DEPTH > 0:
            depth = dirpath.count(os.path.sep)
            if depth >= MAX_DEPTH:
                dirnames[:] = []
        if 'SConscript' in filenames:
            yield dirpath

Note that when depth == MAX_DEPTH, the current directory is processed (because it’s at the allowed depth), but sub-directories are not even walked. This means that the condition depth > MAX_DEPTH should never happen.
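
The effect of the depth limit is easy to check on a throwaway tree. The sketch below is a variant of the snippet above, not the series code: the function name modules_up_to and its top parameter are assumptions, and the depth is computed from the path relative to top so that the location of the temporary directory doesn’t affect the count.

```python
import os
import tempfile


def modules_up_to(top, max_depth):
    """Yield module dirs (relative to `top`) at most `max_depth` levels deep."""
    for dirpath, dirnames, filenames in os.walk(top):
        rel_path = os.path.relpath(dirpath, top)
        # top itself is depth 0, top/Foo is 1, top/Foo/Bar is 2, ...
        depth = 0 if rel_path == '.' else rel_path.count(os.path.sep) + 1
        if depth >= max_depth:
            # Deepest allowed level: process it, but go no deeper
            dirnames[:] = []
        if 'SConscript' in filenames:
            yield rel_path


# Throwaway tree: Writer/SConscript and Foo/Bar/SConscript
demo_top = tempfile.mkdtemp()
for module in ('Writer', os.path.join('Foo', 'Bar')):
    os.makedirs(os.path.join(demo_top, module))
    open(os.path.join(demo_top, module, 'SConscript'), 'w').close()

shallow = sorted(modules_up_to(demo_top, max_depth=1))
deep = sorted(modules_up_to(demo_top, max_depth=2))
print(shallow)
print(deep)
```

With max_depth=1 only Writer is found; raising the limit to 2 also reaches Foo/Bar.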

Limiting recursion with “stop markers”

If we agree that the existence of a file named .noscons indicates that the directory should be skipped, it’s straightforward to implement:

def modules():
    """Generate modules to build (directories with a SConscript file)."""
    for dirpath, dirnames, filenames in os.walk('.'):
        if '.noscons' in filenames:
            dirnames[:] = []
        elif 'SConscript' in filenames:
            yield dirpath

Final implementation – combining and generalizing improvements

The stand-alone improvements I presented above are clear and specific. My final implementation combines all of them, making them more generic along the way.


# List of cached modules to save processing for second call and beyond
_CACHED_MODULES = list()

def modules():
    """Generate modules to build.

    Each module is a directory with a SConscript file.
    """
    if not _CACHED_MODULES:
        # Build the cache
        def build_dir_skipper(dirpath):
            """Return True if `dirpath` is the build base dir."""
            return os.path.normpath(_BUILD_BASE) == os.path.normpath(dirpath)
        def hidden_dir_skipper(dirpath):
            """Return True if `dirpath` last dir component begins with '.'"""
            last_dir = os.path.basename(dirpath)
            return last_dir.startswith('.')
        for module_path in module_dirs_generator(
                max_depth=7, followlinks=False,
                dir_skip_list=[build_dir_skipper, hidden_dir_skipper],
                file_skip_list='.noscons'):
            _CACHED_MODULES.append(module_path)
    # Yield modules from cache
    for module in _CACHED_MODULES:
        yield module

def module_dirs_generator(max_depth=None, followlinks=False,
                          dir_skip_list=None, file_skip_list=None):
    """Use os.walk to generate directories that contain a SConscript file."""
    def should_process(dirpath, filenames):
        """Return True if current directory should be processed."""
        for skip_dir_func in listify(dir_skip_list):
            if skip_dir_func(dirpath):
                return False
        if intersection(filenames, file_skip_list):
            print('scons: |- Skipping %s (skip marker found)' % dirpath)
            return False
        return True
    top = '.'
    for dirpath, dirnames, filenames in os.walk(top, topdown=True,
                                                followlinks=followlinks):
        # Find path relative to top
        rel_path = os.path.relpath(dirpath, top) if (dirpath != top) else ''
        if rel_path:
            if not should_process(rel_path, filenames):
                # prevent os.walk from recursing deeper and skip
                dirnames[:] = []
                continue
            if max_depth:
                # Skip too-deep directories
                max_depth = int(max_depth)
                assert max_depth > 0
                # Calculate current depth relative to top path
                depth = rel_path.count(os.path.sep) + 1
                if depth == max_depth:
                    # prevent os.walk from recursing deeper
                    dirnames[:] = []
                if depth > max_depth:
                    # shouldn't reach here though - shout and skip
                    print('w00t?! Should not reach here ... o_O')
                    continue
        # Yield directory with SConscript file
        if 'SConscript' in filenames:
            yield rel_path

def listify(args):
    """Return args as a list."""
    if args:
        if isinstance(args, list):
            return args
        return [args]
    return []

def intersection(*args):
    """Return the intersection of all iterables passed."""
    args = list(args)
    result = set(listify(args.pop(0)))
    while args and result:
        # Finish the loop either when args is consumed, or result is empty
        result.intersection_update(listify(args.pop(0)))
    return result

This code is divided between site_scons/site_config.py (the modules() function) and site_scons/site_utils.py (the rest). I chose to split it this way because I wanted site_config.py to be minimal, containing only project configuration. The other functions are utility functions that modules() happens to use.

Some notes about generalizing the improvements:

  • Instead of a hardcoded .noscons stop marker, I support a list of marker names.
  • Instead of hardcoded directories to skip, I support a list of skip-functions. For every directory, each function is called with the directory path. The directory (and sub-tree) is skipped if any of the functions returns True.
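
To make the skip-function contract concrete, here is a small sketch of how such predicates compose. hidden_dir_skipper mirrors the one in modules() above; name_skipper, should_skip and the third_party directory name are hypothetical, invented only for this example.

```python
import os


def hidden_dir_skipper(dirpath):
    """Return True if the last component of `dirpath` begins with '.'."""
    return os.path.basename(dirpath).startswith('.')


def name_skipper(name):
    """Return a skip-function that matches directories called `name`.

    Hypothetical helper, not part of the series code.
    """
    def skipper(dirpath):
        return os.path.basename(dirpath) == name
    return skipper


def should_skip(dirpath, skip_funcs):
    """Return True if any skip-function claims `dirpath`."""
    return any(func(dirpath) for func in skip_funcs)


skip_funcs = [hidden_dir_skipper, name_skipper('third_party')]
for path in ['Writer', '.git', os.path.join('libs', 'third_party')]:
    verdict = 'skip' if should_skip(path, skip_funcs) else 'walk'
    print('%s -> %s' % (path, verdict))
```

Any callable that takes a directory path and returns a truthy value fits the same slot, so a new rule can be added to dir_skip_list without touching the walking code.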

Demo

The default run is exactly as it was before:

itamar@legolas sconseries (episodes/08-discovery) $ rm -r build/
itamar@legolas sconseries (episodes/08-discovery) $ scons
scons: Reading SConscript files ...
scons: + Processing flavor debug ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Writer ...
scons: + Processing flavor release ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
... snipped ...
scons: done building targets.

Let’s try out the other features as well!

To do it, I created a stub module:

itamar@legolas sconseries (episodes/08-discovery) $ mkdir -p Foo/Bar
itamar@legolas sconseries (episodes/08-discovery) $ touch Foo/Bar/SConscript
itamar@legolas sconseries (episodes/08-discovery) $ scons
scons: Reading SConscript files ...
scons: + Processing flavor debug ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Foo/Bar ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Foo/Bar ...
scons: |- Second pass: Reading module Writer ...
scons: + Processing flavor release ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Foo/Bar ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Foo/Bar ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.

Depth limit

Changing max_depth to 1 and running scons:

itamar@legolas sconseries (episodes/08-discovery) $ scons
scons: Reading SConscript files ...
scons: + Processing flavor debug ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Writer ...
scons: + Processing flavor release ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.

As expected, Foo/Bar is not processed.

Stop marker

Changing max_depth back to 7, and creating a stop marker file:

itamar@legolas sconseries (episodes/08-discovery) $ touch Foo/Bar/.noscons
itamar@legolas sconseries (episodes/08-discovery) $ scons
scons: Reading SConscript files ...
scons: + Processing flavor debug ...
scons: |- Skipping Foo/Bar (skip marker found)
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Writer ...
scons: + Processing flavor release ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.

As expected, Foo/Bar is skipped. Also, the fact that the skip message appears only once indicates that caching works as expected!

Summary

Once again, this episode brings no change in functionality, but makes the build framework more flexible and developer-friendly.

The automated module discovery described here solves the double-maintenance issue in managing modules. It provides a robust, configurable module scanner that can easily be extended to cover scenarios I didn’t think about here.

For instance, the implementation doesn’t include these ideas, but they can be easily added:

  • Taking the value for max_depth from a command line flag instead of a hardcoded value.
  • Maintaining a list of modules that should not be processed and built by default, unless a specific command line flag is passed. This can be useful, for example, if you maintain a collection of codelabs in the main project tree, but don’t want to build them by default.
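
As a starting point for the second idea, here is one possible shape for the filtering step. Everything in it is an assumption on my part: the function name select_modules, the optional list, and the build_optional boolean, which stands in for whatever command line flag you choose to wire up. How that flag reaches the function is deliberately left out.

```python
def select_modules(discovered, optional, build_optional=False):
    """Return the modules to process, as a list.

    discovered     -- iterable of module paths, e.g. what modules() yields
    optional       -- module paths that are skipped unless explicitly enabled
    build_optional -- stand-in for a command line flag (hypothetical)
    """
    optional = set(optional)
    return [m for m in discovered if build_optional or m not in optional]


discovered = ['AddressBook', 'Writer', 'codelabs/intro']
optional = ['codelabs/intro']

default_run = select_modules(discovered, optional)
full_run = select_modules(discovered, optional, build_optional=True)
print(default_run)
print(full_run)
```

The sketch matches module paths exactly; skipping a whole optional sub-tree would need a prefix check instead.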

I’ll leave these as an exercise for the dedicated reader 😉 . If you implement them, please share back!

The final result is available on my GitHub scons-series repository. Feel free to use / fork / modify. If you do, I’d appreciate it if you share back improvements.

See the scons tag for more in my SCons series. Upcoming episodes that may interest you include supporting SCons help / quiet, and propagating required libraries.
