Multi-Module SCons Project With Separate Build Directory

Thursday, September 25, 2014

This is the third post in my SCons series. The topic of this post is setting up a multi-module C++ project using SCons, with a separate build directory.

In previous posts in the series I introduced the SCons open source build tool, and described a simple C++ example project that uses SCons.

In this post I use the exact same C++ project from the basic example. I rewrite the build scripts, using SCons, to achieve the following properties:

  1. Divide the project into distinct modules. A module maps to a directory in the project tree. Each module contains a SConscript file, describing the targets included in that module.
  2. Separate the build output directory from the module source code directories. I like my project tree clean and tidy, without object files scattered between source files.
  3. Allow build targets in modules to refer to other targets in other modules easily. This is required, for example, when a program in one module uses functions from a static library in another module.

The final result is available on my GitHub scons-series repository. In the rest of this post I explain the details of what I came up with.


As a reminder, the (seemingly silly) C++ project is a simple address book program. Refer to the previous post if you’re interested in more details.

The main SConstruct

The main SConstruct file is in the project root directory. This is where SCons starts processing when it is executed in the project.

You can see the complete file in the GitHub repository. I will paste selected parts in arbitrary order for didactic purposes. The pasted code might be modified, so please don’t copy and paste it as is – prefer the GitHub version!

The heart of the SConstruct file is the module-loop:


# Go over modules to build, and include their SConscript files
for module in modules():
    sconscript_path = os.path.join(module, 'SConscript')
    # Execute the SConscript file, with variant_dir set to the module dir under the project build dir.
    targets = env.SConscript(sconscript_path,
                             variant_dir=os.path.join(build_dir, module),
                             exports={'env': env})
    # Add the targets built by this module to the shared cross-module targets
    #  dictionary, to allow the next modules to refer to these targets easily.
    for target_name in targets:
        # Target key built from module name and target name
        # It is expected to be unique
        target_key = '%s::%s' % (module, target_name)
        assert target_key not in env['targets']
        env['targets'][target_key] = targets[target_name]

The loop iterates over a list of modules (generated by the call to modules()), assuming every element yielded is a module directory. The module SConscript file is processed by the env.SConscript call. The variant_dir argument instructs SCons to place build artifacts for that module in the specified directory (as explained in the SCons user guide). The module SConscript is expected to return a dictionary of the targets it contains (see the content of the SConscripts below). That dictionary is then merged into the shared cross-module targets dictionary, which stores all targets from all modules declared so far.
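The bookkeeping part of the loop can be tried out without SCons. The following is a minimal sketch, assuming plain strings stand in for the node lists that SCons builders return. The helper name register_module_targets is hypothetical and does not come from the repository.

```python
def register_module_targets(registry, module, module_targets):
    """Add one module's targets to the shared registry under 'module::name' keys."""
    for target_name, nodes in module_targets.items():
        target_key = '%s::%s' % (module, target_name)
        # Same uniqueness check as in the module loop
        assert target_key not in registry, 'duplicate target key: %s' % target_key
        registry[target_key] = nodes
    return registry


# Stand-ins for what the two module SConscripts would return
# (real builders return lists of SCons nodes, not strings).
shared_targets = {}
register_module_targets(shared_targets, 'AddressBook',
                        {'addressbook': ['build/AddressBook/libaddressbook.a']})
register_module_targets(shared_targets, 'Writer',
                        {'writer': ['build/Writer/writer']})

print(sorted(shared_targets))
# ['AddressBook::addressbook', 'Writer::writer']
```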

The modules() function can be anything that returns or yields the names of the modules to be built. def modules(): return ['AddressBook', 'Writer'] would suffice.

I used a generator instead of a list, mostly because I’m a smart-ass:

def modules():
    yield 'AddressBook'
    yield 'Writer'

Module SConscripts

The module-level SConscript files are simple enough:

# AddressBook/SConscript
Import('*')

module_targets = dict()
module_targets['addressbook'] = env.Library('addressbook', ['addressbook.cc'])

Return('module_targets')

# Writer/SConscript
Import('*')

module_targets = dict()
module_targets['writer'] = env.Program(
    'writer', ['writer.cc'] + env.get_targets('addressbook'))

Return('module_targets')

Both SConscript files follow a similar pattern. They create a dictionary, declare build targets and save them in the dictionary, and return the dictionary.

The AddressBook module SConscript simply builds an addressbook library. Nothing special about it.

The Writer module SConscript builds the writer program, based on the writer.cc source file, and a weird env.get_targets('addressbook') thingie. This is the interesting part!

Before going into details about env.get_targets, which is an extension I added, let’s first understand what we’re trying to do.

Linking with Libraries

The SCons user guide explains how to build a program that links with libraries. When declaring the Program target, the LIBS and LIBPATH construction variables should be specified. Their values, in turn, are passed to the linker as flags (-l<libname> and -L<libpath> in the case of gcc / clang).

So in the address book example, it would look like this: Program('writer', ['writer.cc'], LIBS=['addressbook'], LIBPATH=['#build/AddressBook']).

I don’t like this… The module-level SConscript should not need to know about the “build directory” explicitly. If it is changed in the main SConstruct, the build will break unless the developer remembers to also change the relevant SConscript files.

One possible solution may look like this: Program('writer', ['writer.cc'], LIBS=['addressbook'], LIBPATH=['../AddressBook']). This is better – no explicit reference to the build dir. Another option would be to do something like this: LIBPATH=['$BUILDDIR/AddressBook'], assuming that SConstruct file did something like this: env['BUILDDIR'] = '#build'. This is also a valid solution.

But I still don’t like this… 🙂

In all 3 variations, the Program declaration still referred to the address book module twice. I named the library I want (in LIBS), and the search path for that library (in LIBPATH). Granted, these are two different things – library name, and the directory that has it. But it feels like it can be cleaner, more elegant.

The fact that SCons files are written in Python allows for powerful customizations using familiar syntax and tools!

Striving for elegance and SConscript simplicity, I wrote the get_targets extension to make it easier to refer to targets from other modules.

The get_targets SCons Extension

First, note that the module-level SConscripts can use env.get_targets thanks to the last line of this excerpt from the main SConstruct:


env = Environment()
# Allow including from project build base dir
env.Append(CPPPATH=['#%s' % (build_dir)])
# Prepare shared targets dictionary
env['targets'] = dict()
# Allow modules to use `env.get_targets('libname1', 'libname2', ...)` as
#  a shortcut for adding targets from other modules to sources lists.
env.get_targets = lambda *args, **kwargs: get_targets(env, *args, **kwargs)

This way, when a module-level SConscript calls env.get_targets(a1, a2, kw1=v1, kw2=v2), it will pass the call to the get_targets function with (env, a1, a2, kw1=v1, kw2=v2). Not completely different from a regular Python instance method.
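The forwarding trick is plain Python and can be seen in isolation. Here is a minimal sketch, assuming a hypothetical FakeEnv class in place of the SCons Environment and a dummy get_targets that only echoes what it receives:

```python
class FakeEnv(object):
    """Hypothetical stand-in for an SCons Environment (illustration only)."""


def get_targets(env, *args, **kwargs):
    # Dummy body: just report what was received
    return env, args, kwargs


env = FakeEnv()
# Same binding as in the SConstruct excerpt: the lambda closes over `env`
env.get_targets = lambda *args, **kwargs: get_targets(env, *args, **kwargs)

received_env, received_args, received_kwargs = env.get_targets('a1', 'a2', kw1='v1')
print(received_env is env, received_args, received_kwargs)
# True ('a1', 'a2') {'kw1': 'v1'}
```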

Now, what do I want this function to do? Let’s characterize the function before going into its implementation.

The end goal is to simplify the way SConscripts refer to targets from other modules. In most cases, this applies to Program targets that need to use Library targets from other modules. Without loss of generality, I’ll use the example to characterize the desired behavior.

The writer program needs to be linked with the addressbook library, in the AddressBook module. We can refer to a target named lib in a module named mod as mod::lib. This identifier is unique in practice because :: is very unlikely to appear in a module directory or target name: a colon is not allowed in Windows file names, and although POSIX permits it, it is rarely used. So, assume that for every library that is built in any module in the project, the list of targets returned from the Library builder is stored in a global dictionary under the unique identifier mod::lib. If this dictionary is globally accessible, then any program that needs to link with this library can simply extend its list of sources with the targets from the dictionary at mod::lib.

We already saw that this dictionary exists in env['targets'], with the main SConstruct updating it after including every module-level SConscript. So all the get_targets(...) function needs to do is look up the dictionary entry for every target identifier passed to it, and return an aggregate list of all the targets it found.

But wait. In most common scenarios, you’re not going to have different modules reusing the same target names, right? In the current example, for instance, the name addressbook is specific enough to refer to the AddressBook::addressbook library uniquely!

So I want get_targets to support short-form queries as well. But I don’t want to force the project to avoid reusing target names in different modules. So a naive approach that creates two entries in the targets dictionary for every target (one for mod::lib and one for lib) will not suffice, because lib might be overwritten.
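To make the overwrite problem concrete, here is a small sketch of the naive scheme. The module names ModA and ModB, the target name util, and the file names are all hypothetical.

```python
def naive_register(registry, module, target_name, nodes):
    """Naive scheme: store every target under both 'mod::lib' and 'lib'."""
    registry['%s::%s' % (module, target_name)] = nodes
    registry[target_name] = nodes  # short form; silently replaces older entries


registry = {}
naive_register(registry, 'ModA', 'util', ['build/ModA/libutil.a'])
naive_register(registry, 'ModB', 'util', ['build/ModB/libutil.a'])

# The fully-qualified entries both survive...
print(registry['ModA::util'], registry['ModB::util'])
# ...but the short name now only points at the module registered last.
print(registry['util'])
# ['build/ModB/libutil.a']
```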

My solution – do a “smart” lookup in get_targets:

  1. If a query contains ::, look up a full mod::lib match in the targets dictionary.
  2. If it’s just a name (no ::), look up any mod::<name> match (and maybe print a warning if more than one matched).
  3. Added bonus – if the query contains *, do a wildcard lookup (allowing, for example, getting all targets from a module with module::*).

The get_targets with this behavior is implemented in site_scons/site_init.py. This file is automatically read by SCons before SConstruct/SConscript files.

You can see the complete function on GitHub. Let’s take a closer look at selected parts.

The query_to_regex helper function decides for every query what kind of query it is. It returns a RegEx matcher object that can be applied against the targets dictionary entries. It also returns a boolean flag, used to decide whether multiple matches are a problem or not.

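    def query_to_regex(query):
        """Return (RegEx, warn-on-multiple-matches flag) for query `query`."""
        # Classify the raw query before escaping it: re.escape() escapes ':'
        # on Python 2 but not on Python 3.7+, so testing the escaped string
        # for '\:\:' is not portable.
        is_wildcard = '*' in query
        is_qualified = '::' in query
        # Escape query string
        escaped = re.escape(query)
        if is_wildcard:
            # It's a wildcard query: every '*' matches any run of characters
            return re.compile('^%s$' % (escaped.replace('\\*', '.*'))), False
        if is_qualified:
            # It's a fully-qualified "Module::Target" query
            return re.compile('^%s$' % (escaped)), True
        # else - it's a target-name-only query: "<any module>::<name>"
        return re.compile(r'^[^:]*:{2}%s$' % (escaped)), True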

The main loop simply goes over the queries in the args list. Every query is converted into a RegEx matcher, and checked against all target names in the dictionary. Matching entries are added to the matched list.

    for query in args:
        qre, warn = query_to_regex(query)
        for target_name in target_names:
            if qre.match(target_name):
                matching_target_names.append(target_name)

The main loop also counts matches (per-query), and prints warnings if no matches were found, or unexpected multiple matches were found.

Finally, an aggregate list of targets is constructed and returned. This is done using the one-liner return reduce(lambda acculist, tname: acculist + env['targets'][tname], matching_target_names, []). If it confuses you (like it confused me to write it), here’s an equivalent spread-out version:

acculist = []
for tname in matching_target_names:
    acculist.extend(env['targets'][tname])
return acculist

The Python docs on the reduce function can help. Note that reduce is a built-in only in Python 2; in Python 3 it lives in the functools module.
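As a quick check that the two forms agree, here is a small sketch that runs both on a stand-in dictionary. The strings replace the SCons node lists that the real dictionary holds, and a plain targets dict replaces env['targets'].

```python
from functools import reduce  # also available from functools on Python 2.6+

targets = {
    'AddressBook::addressbook': ['libaddressbook.a'],
    'Writer::writer': ['writer'],
}
matching_target_names = ['AddressBook::addressbook', 'Writer::writer']

# One-liner form
reduced = reduce(lambda acculist, tname: acculist + targets[tname],
                 matching_target_names, [])

# Spread-out form
acculist = []
for tname in matching_target_names:
    acculist.extend(targets[tname])

print(reduced == acculist, reduced)
# True ['libaddressbook.a', 'writer']
```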

Side notes:

  1. The matched list is a list and not a set, because order may be important, and set is not ordered.
  2. Duplicate matches are avoided by removing the matched entries from the list of available target names in every iteration. It also reduces the number of iterations for every query.
  3. It is possible that after a couple of queries, all available targets are matched. If this happens, it doesn’t make sense to keep iterating over the queries list. I chose to complete the iteration instead of breaking out of the main loop, in order to print all the warnings for all the remaining queries.
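Putting the pieces together, the lookup can be sketched as a standalone script. This is not the function from the repository: it is a simplified illustration that takes the targets dictionary directly instead of an SCons environment, uses strings in place of SCons nodes, and writes its own warning messages to stderr.

```python
import re
import sys


def query_to_regex(query):
    """Return (compiled regex, warn_on_multiple_matches) for a query string."""
    escaped = re.escape(query)
    if '*' in query:
        # Wildcard query: '*' matches any run of characters
        return re.compile('^%s$' % escaped.replace('\\*', '.*')), False
    if '::' in query:
        # Fully-qualified "Module::Target" query
        return re.compile('^%s$' % escaped), True
    # Target-name-only query: any module, then '::', then the name
    return re.compile('^[^:]*::%s$' % escaped), True


def get_targets(targets, *queries):
    """Return the aggregate list of targets matching the queries, in order."""
    available = list(targets)      # names not matched yet (avoids duplicates)
    matched = []                   # a list, to preserve match order
    for query in queries:
        qre, warn_on_multiple = query_to_regex(query)
        hits = [name for name in available if qre.match(name)]
        if not hits:
            sys.stderr.write('warning: query "%s" had no matches\n' % query)
        elif warn_on_multiple and len(hits) > 1:
            sys.stderr.write('warning: query "%s" had multiple matches\n' % query)
        matched.extend(hits)
        available = [name for name in available if name not in hits]
    result = []
    for name in matched:
        result.extend(targets[name])
    return result


targets = {
    'AddressBook::addressbook': ['libaddressbook.a'],
    'Writer::writer': ['writer'],
}
print(get_targets(targets, 'addressbook'))            # short form
print(get_targets(targets, 'Writer::writer'))         # fully qualified
print(get_targets(targets, 'AddressBook::*', '*'))    # wildcards, no duplicates
```

The last call shows the duplicate handling: the second query, *, would match everything, but the AddressBook entry was already consumed by the first query, so it appears only once in the result.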

It Works?

It does!

itamar@legolas sconseries (episodes/02-modules) $ scons
scons: Reading SConscript files ...
scons: |- Reading module AddressBook ...
scons: |- Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
g++ -o build/AddressBook/addressbook.o -c -Ibuild build/AddressBook/addressbook.cc
ar rc build/AddressBook/libaddressbook.a build/AddressBook/addressbook.o
ranlib build/AddressBook/libaddressbook.a
g++ -o build/Writer/writer.o -c -Ibuild build/Writer/writer.cc
g++ -o build/Writer/writer build/Writer/writer.o build/AddressBook/libaddressbook.a
scons: done building targets.

What’s the Catch..?

This all seems nice and fun, doesn’t it? Well, nothing comes for free, does it?

You might have already noticed the caveats and potential problems. Let me list what I’m aware of (and let me know if you spot more!):

  1. The module-level SConscript files are cluttered with the local module_targets dictionary that the main SConstruct expects.
  2. The modules() in the main SConstruct must be given in the correct order!

The first item is no deal breaker (maybe). It does upset my OCD. So in another post in the series I take the task of SConscript simplification to an extreme.

That second item is more significant. Let me show you. If I switch the order of Writer and AddressBook, and try to build, here’s what happens:

itamar@legolas sconseries (episodes/02-modules) $ scons
scons: Reading SConscript files ...
scons: |- Reading module Writer ...
scons: warning: get_targets query "addressbook" had no matches
scons: |- Reading module AddressBook ...
scons: done reading SConscript files.
scons: Building targets ...
g++ -o build/AddressBook/addressbook.o -c -Ibuild build/AddressBook/addressbook.cc
ar rc build/AddressBook/libaddressbook.a build/AddressBook/addressbook.o
ranlib build/AddressBook/libaddressbook.a
g++ -o build/Writer/writer.o -c -Ibuild build/Writer/writer.cc
g++ -o build/Writer/writer build/Writer/writer.o
Undefined symbols for architecture x86_64:
  "PhoneNumber::set_number(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)", referenced from:
      PromptForAddress(Person*) in writer.o
  "PhoneNumber::set_type(PhoneNumber::PhoneType)", referenced from:
      PromptForAddress(Person*) in writer.o
  "Person::set_id(int)", referenced from:
      PromptForAddress(Person*) in writer.o
  "Person::set_name(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)", referenced from:
      PromptForAddress(Person*) in writer.o
  "Person::set_email(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)", referenced from:
      PromptForAddress(Person*) in writer.o
  "Person::name() const", referenced from:
      _main in writer.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
scons: *** [build/Writer/writer] Error 1
scons: building terminated because of errors.

Line #4 of the output shows that get_targets failed to find matches for the addressbook query. Line #12 shows that the writer program isn’t linked with the addressbook library. The result is that the build breaks.

What happened..?

That should be apparent by now. Since Writer/SConscript got processed first, when it reached the call to get_targets, the global targets dictionary still didn’t contain the AddressBook::addressbook library.

Solutions?

  1. Specify the modules in order of dependency. A module may use targets only from modules that appeared before it.
  2. An important implication – you can’t have a target in module A using targets from module B, along with a target from module B using targets from module A. This creates a cyclic dependency graph. You cannot list A and B in an order that satisfies #1. You will have to refactor – maybe create a common module that contains targets from A and B that are used by both.

Better solutions?

Technically, if SConstruct has the full modules list in advance, it can pre-process it. It can determine what targets belong to what modules, and what modules depend on what targets. Using this information, it can construct a module dependency graph, and try to resolve it. The result would be an ordered modules list, that satisfies the requirements (assuming there are no cyclic dependencies).
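The ordering step on its own can be sketched with a depth-first topological sort. This is only an illustration under a strong assumption: the module dependencies are already known and given as a dictionary mapping each module to the modules whose targets it uses. Extracting that information from the SConscripts is the hard part and is not shown. The function name order_modules is hypothetical.

```python
def order_modules(dependencies):
    """Return modules ordered so that each one comes after its dependencies.

    `dependencies` maps a module name to the modules it uses targets from.
    Raises ValueError if the dependency graph contains a cycle.
    """
    ordered = []
    state = {}  # module -> 'visiting' or 'done'

    def visit(module, path):
        if state.get(module) == 'done':
            return
        if state.get(module) == 'visiting':
            raise ValueError('cyclic dependency: %s' % ' -> '.join(path + [module]))
        state[module] = 'visiting'
        for dep in sorted(dependencies.get(module, ())):
            visit(dep, path + [module])
        state[module] = 'done'
        ordered.append(module)

    for module in sorted(dependencies):
        visit(module, [])
    return ordered


print(order_modules({'Writer': {'AddressBook'}, 'AddressBook': set()}))
# ['AddressBook', 'Writer']

try:
    order_modules({'A': {'B'}, 'B': {'A'}})
except ValueError as error:
    print(error)
# cyclic dependency: A -> B -> A
```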

This indeed would be a better solution! It’s also interesting and non-trivial enough to deserve its own post in this series. 🙂

Summary

That was my first SCons extension, supporting the use case of a multi-module project with a separate build directory.

I showed how to use the SConscript function with the variant_dir argument to delegate target declarations to module-level SConscript files, and have the build artifacts created under a separate build directory.

An easy way to declare that a target in one module uses targets from other modules was characterized and implemented.

The final result is available on my GitHub scons-series repository. Feel free to use / fork / modify. If you do, I’d appreciate it if you share back improvements.


See the scons tag for more in my SCons series.
