Fixing the Protoc SCons Builder

This is the thirteenth post in my SCons series.

In the previous episode I integrated Protoc, an existing custom builder, into my SCons project.

This post will demonstrate how this builder fails with non-trivial projects, and suggest some fixes and improvements.

I wanted to share my proposed fix via the SCons wiki page, but I couldn’t create a user account. I’d appreciate it if someone with access to that wiki could help 🙂.

The final result is available on my GitHub scons-series repository.

The original Protoc builder breaks with non-trivial projects

Importing proto-files is broken

Like in other programming languages, you may want to split your protocol-buffer-based data structures into multiple proto-files. In protocol buffers, this can be done with the import keyword. It allows a proto-file to refer to protobuf-messages defined in imported files.

Let’s try splitting the proto-messages in addressbook.proto into two proto-files, person.proto and addressbook.proto (shown here in that order):


// Copyright 2015 The Ostrich / by Itamar O
// Source: https://developers.google.com/protocol-buffers/docs/cpptutorial

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

// Copyright 2015 The Ostrich / by Itamar O
// Source: https://developers.google.com/protocol-buffers/docs/cpptutorial

import "AddressBook/person.proto";

message AddressBook {
  repeated Person person = 1;
}

And update the AddressBook/SConscript accordingly:


"""AddressBook proto-based library SConscript script"""

Import('*')

Protoc([], 'person.proto', PROTOCPROTOPATH='.',
       PROTOCOUTDIR='.', PROTOCPYTHONOUTDIR=None)
Protoc([], 'addressbook.proto', PROTOCPROTOPATH='.',
       PROTOCOUTDIR='.', PROTOCPYTHONOUTDIR=None)
Lib('addressbook', 'addressbook.pb.cc')

Trying to build this with scons reveals the first problem:

(debug) itamar@legolas sconseries (episodes/12-protoc-fixes) $ scons
scons: Reading SConscript files ...
scons: Using active flavor "debug" from your environment
scons: + Processing flavor debug ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Reader ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Reader ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
clang++ -o build/debug/AddressBook/addressbook.pb.o -c -std=c++11 -Wall -fvectorize -fslp-vectorize -g -DDEBUG -Ibuild/debug -I. build/debug/AddressBook/addressbook.pb.cc
build/debug/AddressBook/addressbook.pb.cc:78:5: error: no member named 'protobuf_AddDesc_AddressBook_2fperson_2eproto' in the global namespace
  ::protobuf_AddDesc_AddressBook_2fperson_2eproto();
  ~~^
1 error generated.
scons: *** [build/debug/AddressBook/addressbook.pb.o] Error 1
scons: building terminated because of errors.

Remember the annoyance with the explicit build dir in the include path? Now it’s more than an annoyance… Because protoc compiles build/debug/AddressBook/person.proto, the actual symbols generated contain build_2fdebug_2fAddressBook_2fperson_2eproto. But in addressbook.proto I tried to import AddressBook/person.proto following my regular convention. This caused the protoc compiler to use symbols without the build_2fdebug_2f prefix, and they don’t exist…
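
To make the symbol mismatch concrete, here is a small Python sketch of the name mangling. It is an approximation inferred from the symbols in the error output above (letters and digits are kept, any other character becomes an underscore plus its hex code), not a copy of protoc's implementation. The function names are hypothetical, and the protobuf_AddDesc_ prefix is simply the one seen in the compiler error.

```python
import string

_KEEP = set(string.ascii_letters + string.digits)


def filename_identifier(filename):
    """Approximate the identifier protoc derives from a proto file name.

    Assumption: ASCII letters and digits are kept as-is, and every other
    character is replaced by '_' followed by its two-digit hex code.
    """
    return ''.join(ch if ch in _KEEP else '_%02x' % ord(ch)
                   for ch in filename)


def add_desc_symbol(filename):
    """Name of the registration function, as seen in the compiler error."""
    return 'protobuf_AddDesc_' + filename_identifier(filename)


if __name__ == '__main__':
    # What addressbook.pb.cc calls, based on the import statement:
    print(add_desc_symbol('AddressBook/person.proto'))
    # What person.pb.cc defines, based on the path given to protoc:
    print(add_desc_symbol('build/debug/AddressBook/person.proto'))
```

The first name is the one the compiler could not find, and the second is the one that actually exists in person.pb.cc.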

To prove it, see what happens if I change the import path:


// Copyright 2015 The Ostrich / by Itamar O
// Source: https://developers.google.com/protocol-buffers/docs/cpptutorial

import "build/debug/AddressBook/person.proto";

message AddressBook {
  repeated Person person = 1;
}

Now the debug build works fine!

(debug) itamar@legolas sconseries (episodes/12-protoc-fixes) $ scons
scons: Reading SConscript files ...
scons: Using active flavor "debug" from your environment
scons: + Processing flavor debug ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Reader ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Reader ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
protoc -I. --cpp_out=. build/debug/AddressBook/addressbook.proto
clang++ -o build/debug/AddressBook/addressbook.pb.o -c -std=c++11 -Wall -fvectorize -fslp-vectorize -g -DDEBUG -Ibuild/debug -I. build/debug/AddressBook/addressbook.pb.cc
clang++ -o build/debug/AddressBook/person.pb.o -c -std=c++11 -Wall -fvectorize -fslp-vectorize -g -DDEBUG -Ibuild/debug -I. build/debug/AddressBook/person.pb.cc
ar rc build/debug/AddressBook/libaddressbook.a build/debug/AddressBook/addressbook.pb.o build/debug/AddressBook/person.pb.o
ranlib build/debug/AddressBook/libaddressbook.a
clang++ -o build/debug/Reader/reader.o -c -std=c++11 -Wall -fvectorize -fslp-vectorize -g -DDEBUG -Ibuild/debug -I. build/debug/Reader/reader.cc
clang++ -o build/debug/Reader/reader build/debug/Reader/reader.o build/debug/AddressBook/libaddressbook.a -lprotobuf
clang++ -o build/debug/Writer/writer.o -c -std=c++11 -Wall -fvectorize -fslp-vectorize -g -DDEBUG -Ibuild/debug -I. build/debug/Writer/writer.cc
clang++ -o build/debug/Writer/writer build/debug/Writer/writer.o build/debug/AddressBook/libaddressbook.a -lprotobuf
Install file: "build/debug/Reader/reader" as "build/debug/bin/Reader.reader"
Install file: "build/debug/Writer/writer" as "build/debug/bin/Writer.writer"
scons: done building targets.

But it is clear this is not a viable solution. The build dir is flavor-dependent (build/debug here), so I cannot refer to it explicitly in code that is not auto-generated!

Before I show a solution, let’s see another problem, so we can solve both at the same time 🙂 .

The Protoc tool doesn’t detect imported proto-files

Continuing with the two proto-files example, I keep the import-path hack for now.

What happens if I try to perform a clean build?

(debug) itamar@legolas sconseries (episodes/12-protoc-fixes) $ scons
scons: Reading SConscript files ...
scons: Using active flavor "debug" from your environment
scons: + Processing flavor debug ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Reader ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Reader ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
protoc -I. --cpp_out=. build/debug/AddressBook/addressbook.proto
build/debug/AddressBook/person.proto: File not found.
build/debug/AddressBook/addressbook.proto: Import "build/debug/AddressBook/person.proto" was not found or had errors.
build/debug/AddressBook/addressbook.proto:7:12: "Person" is not defined.
scons: *** [build/debug/AddressBook/addressbook.pb.cc] Error 1
scons: building terminated because of errors.

Wait, what changed?! Just a moment ago the import-path hack worked!

Well, I cheated a bit.

The scons run that failed before the path hack failed only after the protoc targets had been executed. So when I changed the import path, a copy of person.proto was already sitting in build/debug/AddressBook from the previous run, and the protoc compiler was able to locate the imported proto-file. In a clean build, that copy is not there yet when the addressbook.proto target is executed. So protoc fails, and the entire build fails!

If I could force the person.proto target to get built first, the problem would disappear. But that’s a workaround, not a robust solution…

Why don’t we start looking into solutions?

Avoiding explicit build dir in symbol names and import paths

I rewrote the protoc.py code, and completely removed the part that manipulates the source paths in ways I didn’t understand. The result is available on GitHub. Here are the main parts:

def protoc_emitter(target, source, env):
    """Return list of targets generated by Protoc builder for source."""
    for src in source:
        proto = os.path.splitext(str(src))[0]
        if env['PROTOCPPOUT']:
            target.append('%s.pb.cc' % (proto))
            target.append('%s.pb.h' % (proto))
        if env['PROTOPYOUT']:
            target.append('%s_pb2.py' % (proto))
    return target, source

def generate(env):
    """Add Builders and construction variables
    for protoc to the build Environment."""
    try:
        bldr = env['BUILDERS']['Protoc']
    except KeyError:
        action = SCons.Action.Action('$PROTOCOM', '$PROTOCOMSTR')
        bldr = SCons.Builder.Builder(action=action,
                                     emitter=protoc_emitter,
                                     src_suffix='$PROTOCSRCSUFFIX')
        env['BUILDERS']['Protoc'] = bldr

    # pylint: disable=bad-whitespace
    env['PROTOC']          = env.Detect(_PROTOCS) or 'protoc'
    env['PROTOCFLAGS']     = SCons.Util.CLVar('')
    env['PROTOCSRCSUFFIX'] = _PROTOSUFFIX
    # Default proto search path is same dir
    env['PROTOPATH']       = ['.']
    # Default CPP output in same dir
    env['PROTOCPPOUT']     = '.'
    # No default Python output
    env['PROTOPYOUT']      = ''
    proto_cmd     = ['$PROTOC']
    proto_cmd.append('${["--proto_path=%s"%(x) for x in PROTOPATH]}')
    proto_cmd.append('$PROTOCFLAGS')
    proto_cmd.append('${PROTOCPPOUT and "--cpp_out=%s"%(PROTOCPPOUT) or ""}')
    proto_cmd.append('${PROTOPYOUT and "--python_out=%s"%(PROTOPYOUT) or ""}')
    proto_cmd.append('${SOURCES}')
    env['PROTOCOM'] = ' '.join(proto_cmd)

Note that my rewrite changed a couple of things beyond the fix:

  • Changed default Python output to disabled.
  • Shortened the names of the construction variables (for example, PROTOCPROTOPATH became PROTOPATH).
  • Removed FileDescriptorSet support (what’s that??).
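
To illustrate what the emitter above reports to SCons, here is a standalone sketch of the same naming logic. It uses plain strings instead of SCons nodes and ordinary arguments instead of the construction environment, and the function name expected_targets is hypothetical.

```python
import os


def expected_targets(sources, cpp_out='.', py_out=''):
    """Mirror the emitter's naming rules outside of SCons.

    cpp_out and py_out stand in for PROTOCPPOUT and PROTOPYOUT;
    an empty value disables the corresponding outputs.
    """
    targets = []
    for src in sources:
        base = os.path.splitext(str(src))[0]
        if cpp_out:
            targets.append('%s.pb.cc' % base)
            targets.append('%s.pb.h' % base)
        if py_out:
            targets.append('%s_pb2.py' % base)
    return targets


if __name__ == '__main__':
    # Default settings: C++ outputs only.
    print(expected_targets(['build/debug/AddressBook/person.proto']))
    # With Python output enabled as well.
    print(expected_targets(['person.proto'], py_out='.'))
```

Note that the targets are derived from the source path only. The output dir values act as on/off switches here, so the emitter is accurate as long as protoc writes its outputs next to the source file.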

With this Protoc builder in place, I continue with the two proto-files example. I removed the extra '#' (project root) entry from the C++ compiler search directories in site_scons/site_config.py, which is what produced the -I. flag in the compiler commands above, and changed AddressBook/SConscript to this:

"""AddressBook proto-based library SConscript script"""

Import('*')

Protoc([], 'person.proto',
       PROTOPATH=['$BUILDROOT'], PROTOCPPOUT='$BUILDROOT')
Protoc([], 'addressbook.proto',
       PROTOPATH=['$BUILDROOT'], PROTOCPPOUT='$BUILDROOT')
Lib('addressbook', ['addressbook.pb.cc', 'person.pb.cc'])

These changes made the protoc compiler work relative to the build dir. This way, the generated symbols do not contain the build dir prefix, so there’s no need to include the build dir prefix in the proto import path. I can go back to the clean addressbook.proto:

// Copyright 2015 The Ostrich / by Itamar O
// Source: https://developers.google.com/protocol-buffers/docs/cpptutorial

import "AddressBook/person.proto";

message AddressBook {
  repeated Person person = 1;
}
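
To see why working relative to the build dir removes the prefix, here is a rough model of how protoc treats a source file: the name it uses internally is the source path relative to the proto path, and the generated C++ file lands under the cpp_out dir at that same relative name. This is a simplified sketch with a single proto path entry, and the function name protoc_view is hypothetical.

```python
import posixpath


def protoc_view(source, proto_path, cpp_out):
    """Return (name protoc uses for the file, path of the generated .pb.cc).

    Simplified model: one proto_path entry, POSIX-style paths.
    """
    rel_name = posixpath.relpath(source, proto_path)
    stem = posixpath.splitext(rel_name)[0]
    out_cc = posixpath.normpath(posixpath.join(cpp_out, stem + '.pb.cc'))
    return rel_name, out_cc


if __name__ == '__main__':
    src = 'build/debug/AddressBook/person.proto'
    # Original builder: -I. --cpp_out=.
    print(protoc_view(src, '.', '.'))
    # Rewritten builder: --proto_path=build/debug --cpp_out=build/debug
    print(protoc_view(src, 'build/debug', 'build/debug'))
```

In both cases the generated file ends up in the same place, but only the second setup gives the proto-file the flavor-independent name AddressBook/person.proto.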

But this doesn’t solve the second problem:

(debug) itamar@legolas sconseries (episodes/12-protoc-fixes) $ rm -r build/
(debug) itamar@legolas sconseries (episodes/12-protoc-fixes) $ scons
scons: Reading SConscript files ...
scons: Using active flavor "debug" from your environment
scons: + Processing flavor debug ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Reader ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Reader ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
protoc --proto_path=build/debug --cpp_out=build/debug build/debug/AddressBook/addressbook.proto
AddressBook/person.proto: File not found.
AddressBook/addressbook.proto: Import "AddressBook/person.proto" was not found or had errors.
AddressBook/addressbook.proto:7:12: "Person" is not defined.
scons: *** [build/debug/AddressBook/addressbook.pb.cc] Error 1
scons: building terminated because of errors.

Since protoc now runs relative to build/debug, importing AddressBook/person.proto still refers to build/debug/AddressBook/person.proto. As before, this file may not exist yet when running a clean build, depending on the target execution order…

Adding proto-import-scanner to the Protoc tool

It doesn’t make sense that SCons doesn’t know about implicit dependencies between source files, does it? We’ve come to rely on SCons doing the dirty dependency analysis for us, as far as C/C++ code is concerned. Why should it be different with proto-imports?

Well, it shouldn’t! To enable SCons to do the work for us, the Protoc builder has to give SCons a way to deduce these implicit dependencies! This is where SCons Scanners come into play. The original Protoc builder simply didn’t implement a scanner, so SCons was unable to help.

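I implemented the missing scanner on top of the changes described above. The full code is available in the GitHub episode. Here are the main parts. The scanner-related additions are the _PROTOC_SCANNER_RE pattern, the protoc_scanner function, and the SCANNERS registration at the end of generate: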

_PROTOC_SCANNER_RE = re.compile(r'^import\s+\"(.+\.proto)\"\;$', re.M)

def protoc_emitter(target, source, env):
    """Return list of targets generated by Protoc builder for source."""
    for src in source:
        proto = os.path.splitext(str(src))[0]
        if env['PROTOCPPOUT']:
            target.append('%s.pb.cc' % (proto))
            target.append('%s.pb.h' % (proto))
        if env['PROTOPYOUT']:
            target.append('%s_pb2.py' % (proto))
    return target, source

def protoc_scanner(node, env, _):
    """Return list of file nodes that `node` imports"""
    contents = node.get_text_contents()
    # If build location different from sources location,
    #  get the destination base dir as the base for imports.
    nodepath = str(node.path)
    srcnodepath = str(node.srcnode())
    src_pos = nodepath.find(srcnodepath)
    # find() returns -1 (which is truthy) on no match, so compare explicitly
    base_path = nodepath[:src_pos-1] if src_pos > 0 else ''
    imports = [os.path.join(base_path, imp)
               for imp in _PROTOC_SCANNER_RE.findall(contents)]
    return env.File(imports)

def generate(env):
    """Add Builders, Scanners and construction variables
    for protoc to the build Environment."""
    try:
        bldr = env['BUILDERS']['Protoc']
    except KeyError:
        action = SCons.Action.Action('$PROTOCOM', '$PROTOCOMSTR')
        bldr = SCons.Builder.Builder(action=action,
                                     emitter=protoc_emitter,
                                     src_suffix='$PROTOCSRCSUFFIX')
        env['BUILDERS']['Protoc'] = bldr

    # pylint: disable=bad-whitespace
    env['PROTOC']          = env.Detect(_PROTOCS) or 'protoc'
    env['PROTOCFLAGS']     = SCons.Util.CLVar('')
    env['PROTOCSRCSUFFIX'] = _PROTOSUFFIX
    # Default proto search path is same dir
    env['PROTOPATH']       = ['.']
    # Default CPP output in same dir
    env['PROTOCPPOUT']     = '.'
    # No default Python output
    env['PROTOPYOUT']      = ''
    proto_cmd     = ['$PROTOC']
    proto_cmd.append('${["--proto_path=%s"%(x) for x in PROTOPATH]}')
    proto_cmd.append('$PROTOCFLAGS')
    proto_cmd.append('${PROTOCPPOUT and "--cpp_out=%s"%(PROTOCPPOUT) or ""}')
    proto_cmd.append('${PROTOPYOUT and "--python_out=%s"%(PROTOPYOUT) or ""}')
    proto_cmd.append('${SOURCES}')
    env['PROTOCOM'] = ' '.join(proto_cmd)

    # Add the proto scanner (if it wasn't added already)
    env.AppendUnique(SCANNERS=SCons.Scanner.Scanner(function=protoc_scanner,
                                                    skeys=[_PROTOSUFFIX]))

The scanner simply reads the file content, uses a regular expression to find import lines, and returns the imported files as nodes under the same build base dir as the scanned file.
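
The scanner logic can be tried outside of SCons. The sketch below reuses the same regular expression on a sample proto text, and reproduces the base dir computation with plain strings instead of SCons nodes, using an explicit check for the no-match case. The function names and the sample text are made up for the demonstration.

```python
import posixpath
import re

IMPORT_RE = re.compile(r'^import\s+\"(.+\.proto)\"\;$', re.M)


def find_imports(contents):
    """Return the paths named in plain import statements."""
    return IMPORT_RE.findall(contents)


def build_base(node_path, src_path):
    """Return the build dir prefix of node_path, or '' if there is none."""
    pos = node_path.find(src_path)
    return node_path[:pos - 1] if pos > 0 else ''


def scan(node_path, src_path, contents):
    """Return the imported files, relocated under the build base dir."""
    base = build_base(node_path, src_path)
    return [posixpath.join(base, imp) for imp in find_imports(contents)]


SAMPLE = '''\
// sample proto text
import "AddressBook/person.proto";
import public "Other/not_matched.proto";

message AddressBook {
  repeated Person person = 1;
}
'''

if __name__ == '__main__':
    print(scan('build/debug/AddressBook/addressbook.proto',
               'AddressBook/addressbook.proto', SAMPLE))
```

The sample also shows a limitation of this pattern: an import public line, or an import followed by a trailing comment, is not matched, so such files would not be reported as dependencies.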

With the scanner in place, SCons is able to successfully build the project, regardless of target execution order:

(debug) itamar@legolas sconseries (episodes/12-protoc-fixes) $ rm -r build/
(debug) itamar@legolas sconseries (episodes/12-protoc-fixes) $ scons
scons: Reading SConscript files ...
scons: Using active flavor "debug" from your environment
scons: + Processing flavor debug ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Reader ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Reader ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
protoc --proto_path=build/debug --cpp_out=build/debug build/debug/AddressBook/addressbook.proto
protoc --proto_path=build/debug --cpp_out=build/debug build/debug/AddressBook/person.proto
clang++ -o build/debug/AddressBook/addressbook.pb.o -c -std=c++11 -Wall -fvectorize -fslp-vectorize -g -DDEBUG -Ibuild/debug build/debug/AddressBook/addressbook.pb.cc
clang++ -o build/debug/AddressBook/person.pb.o -c -std=c++11 -Wall -fvectorize -fslp-vectorize -g -DDEBUG -Ibuild/debug build/debug/AddressBook/person.pb.cc
ar rc build/debug/AddressBook/libaddressbook.a build/debug/AddressBook/addressbook.pb.o build/debug/AddressBook/person.pb.o
ranlib build/debug/AddressBook/libaddressbook.a
clang++ -o build/debug/Reader/reader.o -c -std=c++11 -Wall -fvectorize -fslp-vectorize -g -DDEBUG -Ibuild/debug build/debug/Reader/reader.cc
clang++ -o build/debug/Reader/reader build/debug/Reader/reader.o build/debug/AddressBook/libaddressbook.a -lprotobuf
clang++ -o build/debug/Writer/writer.o -c -std=c++11 -Wall -fvectorize -fslp-vectorize -g -DDEBUG -Ibuild/debug build/debug/Writer/writer.cc
clang++ -o build/debug/Writer/writer build/debug/Writer/writer.o build/debug/AddressBook/libaddressbook.a -lprotobuf
Install file: "build/debug/Reader/reader" as "build/debug/bin/Reader.reader"
Install file: "build/debug/Writer/writer" as "build/debug/bin/Writer.writer"
scons: done building targets.

You can verify that it works with the --tree=all flag, which prints the dependencies that scons detects.

With the scanner:

(debug) itamar@legolas sconseries (episodes/12-protoc-fixes) $ scons --tree=all build/debug/AddressBook/addressbook.pb.o
scons: Reading SConscript files ...
scons: Using active flavor "debug" from your environment
scons: + Processing flavor debug ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Reader ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Reader ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `build/debug/AddressBook/addressbook.pb.o' is up to date.
+-build/debug/AddressBook/addressbook.pb.o
  +-build/debug/AddressBook/addressbook.pb.cc
  | +-build/debug/AddressBook/addressbook.proto
  | +-build/debug/AddressBook/person.proto
  | +-/usr/bin/protoc
  +-build/debug/AddressBook/addressbook.pb.h
  | +-build/debug/AddressBook/addressbook.proto
  | +-build/debug/AddressBook/person.proto
  | +-/usr/bin/protoc
  +-build/debug/AddressBook/person.pb.h
  | +-build/debug/AddressBook/person.proto
  | +-/usr/bin/protoc
  +-/usr/bin/clang++
scons: done building targets.

And without the scanner:

(debug) itamar@legolas sconseries (episodes/12-protoc-fixes) $ scons --tree=all build/debug/AddressBook/addressbook.pb.o
scons: Reading SConscript files ...
scons: Using active flavor "debug" from your environment
scons: + Processing flavor debug ...
scons: |- First pass: Reading module AddressBook ...
scons: |- First pass: Reading module Reader ...
scons: |- First pass: Reading module Writer ...
scons: |- Second pass: Reading module AddressBook ...
scons: |- Second pass: Reading module Reader ...
scons: |- Second pass: Reading module Writer ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `build/debug/AddressBook/addressbook.pb.o' is up to date.
+-build/debug/AddressBook/addressbook.pb.o
  +-build/debug/AddressBook/addressbook.pb.cc
  | +-build/debug/AddressBook/addressbook.proto
  | +-/usr/bin/protoc
  +-build/debug/AddressBook/addressbook.pb.h
  | +-build/debug/AddressBook/addressbook.proto
  | +-/usr/bin/protoc
  +-build/debug/AddressBook/person.pb.h
  | +-build/debug/AddressBook/person.proto
  | +-/usr/bin/protoc
  +-/usr/bin/clang++
scons: done building targets.

You can see that without the scanner, person.proto is missing from the dependencies of addressbook.pb.cc and addressbook.pb.h.

Summary

Now the Protoc builder is more complete, and more robust. In my opinion, it’s also a bit simpler now.

Even if you’re not following my multi-module approach, you can take the protoc.py custom builder as is, and use it in any other SCons project.

There’s still the issue of a messy SConscript file, because of the way the Protoc builder is integrated. This will be taken care of in the next episode.

The code for this episode is available on my GitHub scons-series repository. Feel free to use / fork / modify. If you do, I’d appreciate it if you share back improvements.

See the scons tag for more in my SCons series.
