Sort Your Include Files!

By Itamar Ostricher, Monday, November 17, 2014

The Google C++ Style Guide defines a guideline for the names and order of includes. It occurred to me that grouping and sorting include files is tedious and error-prone, and that a computer can do it much better. So I wrote a script that does exactly that 🙂 .

The nitpick script is available on my GitHub cpplint fork.

If you’re comfortable with Python, you can figure out the script straight from the nitpick.py source code and the accompanying unit tests. You can also read the rest of the post for a plain-English walkthrough 🙂 .

Background

Quoting from the Google C++ Style Guide:

In “dir/foo.cc” or “dir/foo_test.cc”, whose main purpose is to implement or test the stuff in dir2/foo2.h, order your includes as follows:

1. dir2/foo2.h (preferred location — see details below).
2. C system files.
3. C++ system files.
4. Other libraries’ .h files.
5. Your project’s .h files.

If you use the cpplint tool (and you should), you know it detects violations of this guideline (sort of). You may even silently curse it – “if you’re so smart, why don’t you fix it yourself?!”.

Automating includes-sorting

Let’s see the script in action.

Here’s the example setup:

itamar@legolas cpplint (nitpick-sort-includes) $ ls -R
README               cpplint.pyc          cpplint_unittest.py     nitpick.py          nitpick_test.py
cpplint.py          cpplint_test_header.h     neater.pyc          nitpick.pyc          proj

./proj:
common     foo     mymath

./proj/common:
logging.h

./proj/foo:
bar.cc

./proj/mymath:
factorial.h

The content of the example source file at proj/foo/bar.cc:

// Copyright 2014 The Ostrich

#include "common/util.h"
#include <vector>
#include "mymath/factoiral.h"
#include <map>
#include <algorithm>
#include <stdio.h>
#include "common/logging.h"
#include "foo/bar.h"

int main() {
  return 42;
}

Indeed, cpplint recognizes multiple “include order” issues:

itamar@legolas cpplint (nitpick-sort-includes) $ ./cpplint.py --root proj proj/foo/bar.cc
proj/foo/bar.cc:4:  Found C++ system header after other header. Should be: bar.h, c system, c++ system, libs, other.  [build/include_order] [4]
proj/foo/bar.cc:6:  Found C++ system header after other header. Should be: bar.h, c system, c++ system, libs, other.  [build/include_order] [4]
proj/foo/bar.cc:7:  Found C++ system header after other header. Should be: bar.h, c system, c++ system, libs, other.  [build/include_order] [4]
proj/foo/bar.cc:8:  Found C system header after other header. Should be: bar.h, c system, c++ system, libs, other.  [build/include_order] [4]
Done processing proj/foo/bar.cc
Total errors found: 4

My nitpick.py script can fix the includes order and edit the source file in place:

itamar@legolas cpplint (nitpick-sort-includes) $ ./nitpick.py style --root proj proj/foo/bar.cc
INFO: Stylifying file proj/foo/bar.cc ...
INFO: Writing changes back to filepath proj/foo/bar.cc ...
INFO: Done with file proj/foo/bar.cc ...

The fixed file makes cpplint happy!

itamar@legolas cpplint (nitpick-sort-includes) $ ./cpplint.py --root proj proj/foo/bar.cc
Done processing proj/foo/bar.cc
Total errors found: 0

Here’s the edited source file:

// Copyright 2014 The Ostrich

#include "foo/bar.h"

#include <stdio.h>

#include <algorithm>
#include <map>
#include <vector>

#include "common/logging.h"
#include "common/util.h"
#include "mymath/factoiral.h"

int main() {
  return 42;
}

nitpick.py commands and modules

As shown in the example, I ran the include sorter using the style command with the nitpick.py script.

Currently, style is the only available command. It applies various style modules to the processed files, and sort_includes is the only available module so far.

I chose to write the script this way to simplify adding extra features later on, without adding a new script for every new feature, or changing the usage.

For example, I might add a “fix_header_guard” module to the style command. I might add other commands that do other things.

Features of the include sorter style module

The general behavior of the include sorter module for every processed file is:

1. Find “batches” of #include lines in the source file.
2. Classify each include.
3. Assign include classes to sections.
4. Sort the includes in each section.
5. Rewrite the include “batch” as sorted sections, with a blank line between sections.

Some notes:

- A “batch” of includes is a series of lines starting with #include, ignoring blank lines (see caveats).
- The include sorter classifies includes using the cpplint classification function, to avoid inconsistencies.
  - An exception is the “own include file”. cpplint is pretty tolerant about classifying includes as “likely my header” and “possibly my header”. To deal with it, I wrote a verification function that reflects project-specific naming conventions…
- Empty sections are skipped without extra blank lines.
- Any number of blank lines between the batch and the code that follows is replaced with a single blank line.
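
To make steps 2 to 5 concrete, here is a minimal sketch of regrouping a single batch. It is not the actual nitpick.py code: the section names, the classify() heuristic (a toy stand-in for the cpplint classification function), and the rewrite_batch() helper are all assumptions made for illustration.

```python
import re

# Hypothetical section order, loosely following "bar.h, c system, c++ system, other".
SECTIONS = ["own", "c_system", "cpp_system", "other"]

# Captures the opening delimiter and the included path.
INCLUDE_RE = re.compile(r'^\s*#include\s*([<"])([^>"]+)[>"]')


def classify(line, own_header):
    """Toy classifier (an assumption, not cpplint's real logic)."""
    match = INCLUDE_RE.match(line)
    bracket, name = match.group(1), match.group(2)
    if name == own_header:
        return "own"
    if bracket == "<":
        # Crude heuristic: C system headers end in .h, C++ ones have no extension.
        return "c_system" if name.endswith(".h") else "cpp_system"
    return "other"


def rewrite_batch(lines, own_header):
    """Regroup a batch (include and blank lines only) into sorted sections."""
    sections = {name: [] for name in SECTIONS}
    for line in lines:
        if line.strip():  # blank lines inside a batch are dropped
            sections[classify(line, own_header)].append(line.strip())
    output = []
    for name in SECTIONS:
        if not sections[name]:
            continue  # empty sections add no extra blank line
        if output:
            output.append("")  # exactly one blank line between sections
        output.extend(
            sorted(sections[name], key=lambda l: INCLUDE_RE.match(l).group(2)))
    return output
```

Feeding it the eight include lines of the bar.cc example, with "foo/bar.h" as the own header, yields the same grouping as the edited file shown earlier.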

Preserving include-related comments

Sometimes your include lines may look like this:

#include <algorithm>  // for std::min
#include <iostream>  // NOLINT(readability/streams)

Trailing comments like these are legitimate, and the include sorter shouldn’t remove them – so it doesn’t 🙂 . The includes are sorted by the name of the included file, while preserving the string that follows the closing > (or ").
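
One way to get this behavior is to sort whole lines while using only the included file name as the sort key. The sketch below illustrates the idea; the regular expression and the sort_key() helper are my own assumptions here, not necessarily what nitpick.py does.

```python
import re

# Captures: opening delimiter, included path, and everything after the
# closing delimiter (for example a trailing comment).
INCLUDE_RE = re.compile(r'^\s*#include\s*([<"])([^>"]+)[>"](.*)$')


def sort_key(line):
    """Sort by the included file name only, ignoring any trailing text."""
    return INCLUDE_RE.match(line).group(2)


lines = [
    '#include <iostream>  // NOLINT(readability/streams)',
    '#include <algorithm>  // for std::min',
]
# The full lines are reordered, so the trailing comments travel with them.
sorted_lines = sorted(lines, key=sort_key)
```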

Handling duplicate includes

Sometimes your code may contain duplicate includes (usually due to copy-and-paste programming). The include sorter prints a warning when it detects duplicates, and keeps only one.

Duplicates will not be detected if they are not in the same “include batch”.

Consider the following example of an inconsistent duplicate include:

#include <algorithm>  // for std::min
#include <algorithm>  // for std::max

In this case, the include sorter can’t keep just one without losing information. I chose to print an error in this case, and abort processing of the current file.
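
The two cases can be sketched as follows. This is an illustration under my own assumptions (the dedupe() helper, the exception name, and the warning text are hypothetical), showing one way to tell an exact duplicate from an inconsistent one within a single batch.

```python
import re
import sys

INCLUDE_RE = re.compile(r'^\s*#include\s*[<"]([^>"]+)[>"](.*)$')


class InconsistentDuplicateError(Exception):
    """The same file is included twice, but the two lines differ."""


def dedupe(lines):
    """Drop exact duplicate includes; refuse to guess on inconsistent ones."""
    seen = {}  # included name -> first full line seen for that name
    result = []
    for line in lines:
        name = INCLUDE_RE.match(line).group(1)
        if name not in seen:
            seen[name] = line
            result.append(line)
        elif seen[name].strip() == line.strip():
            # Identical line: safe to keep just one copy.
            print(f"WARNING: dropping duplicate include of {name}",
                  file=sys.stderr)
        else:
            # Same file, different trailing text: dropping either loses info.
            raise InconsistentDuplicateError(name)
    return result
```

A caller would catch the exception, report an error, and skip the current file.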

Detecting wrong include form

Consider the following valid C++ includes snippet:

#include "algorithm"
#include <foo/bar.h>

It works fine, but it’s poor style. System includes should be in <...>, and project-specific includes should be in "...".

The include sorter tries to detect these, and issues warnings accordingly.

For example, if I change foo/bar.cc from the example to use such misguided style, I get these warnings:

itamar@legolas cpplint (nitpick-sort-includes) $ ./nitpick.py style --root proj --quiet --no_edit proj/foo/bar.cc
WARNING: "foo/bar.h" looks like a project-file, but is included with <> in "proj/foo/bar.cc:3": #include <foo/bar.h>
WARNING: "algorithm" looks like a system-file, but is included with "" in "proj/foo/bar.cc:7": #include "algorithm"

I chose to warn instead of trying to fix it, because I may be wrong about the observation. I preferred to leave the fixing to the developer.

The observation is based on the is_project_file() function. It assumes that a file belongs to the current project if either the file exists relative to the project directory, or the first directory element of the include file exists relative to the project directory.

Why not just “existing file”? This is meant to support source files that are generated as part of the build process.
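
A minimal sketch of that rule, assuming a signature of my own choosing (the real is_project_file() in nitpick.py may take different arguments):

```python
import os


def is_project_file(include_path, project_root):
    """Guess whether include_path belongs to the project at project_root."""
    # Rule 1: the file itself exists relative to the project directory.
    if os.path.isfile(os.path.join(project_root, include_path)):
        return True
    # Rule 2: the first directory element exists, so generated headers that
    # are not on disk yet still count as project files.
    first, sep, _ = include_path.partition("/")
    return bool(sep) and os.path.isdir(os.path.join(project_root, first))
```

With the example tree above and proj as the root, "common/util.h" counts as a project file because proj/common exists, while "algorithm" does not.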

When can the observation be wrong? Here are a couple of scenarios:

You happen to have a file with the same name as a system .h-file in the project directory (e.g. math.h).
You have generated source files that are created in a non-existing directory (e.g. “gen/file.h”).

In the first scenario, I recommend you avoid such name collisions…

In the second scenario, I recommend changing the is_project_file() function to meet your needs.

nitpick.py style command line options

The most basic usage of the style command is ./nitpick.py style file1 file2 .... This will make the script process the specified files list, applying all available style modules to each file, with in-place editing.

You can always run ./nitpick.py style --help for the built-in help.

Specifying project root directory

In the example above, I already used one of the flags – --root path/to/subdir. You may be familiar with this flag from cpplint. I use it in the include sorter to determine whether an include file belongs to the current project or not, and whether a header file is “my own”.

This can be seen in the two functions that contain these tests.

Suppressing info messages

You can specify the --quiet flag to suppress info messages. This will make the script print just errors and warnings.

This mode is useful when processing many files, so you don’t miss important warnings or errors in a sea of infos.

Choosing modules to run

By default, the style command applies all available style modules. You can specify -m mod_name (multiple times) to selectively run a subset of available modules.
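
A repeatable flag with an "everything by default" fallback is easy to express with argparse. The sketch below only illustrates that pattern; the AVAILABLE_MODULES registry and the parse_args() helper are assumptions, and the real script may parse its options differently.

```python
import argparse

# Hypothetical registry; sort_includes is the only module the post mentions.
AVAILABLE_MODULES = ["sort_includes"]


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="nitpick.py style")
    # action="append" collects every -m occurrence into one list.
    parser.add_argument("-m", dest="modules", action="append",
                        choices=AVAILABLE_MODULES,
                        help="style module to run (may be repeated)")
    parser.add_argument("files", nargs="*")
    args = parser.parse_args(argv)
    if not args.modules:
        args.modules = list(AVAILABLE_MODULES)  # default: run all modules
    return args
```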

Disabling in-place edit of source files

Sometimes you may not want to let the script perform edits in-place. You can do that by passing the --no_edit flag.

This mode is useful (along with --quiet) to just analyze many files for warnings and errors.

Printing the diff

You can specify --show_diff to have the script write the diff of processed files to STDERR.

This is useful along with --no_edit, if you want to review what changes will be made without actually writing them back.
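
Producing such a diff takes only a few lines with Python's standard difflib module. This is a sketch of the general technique, with helper names and the "(sorted)" label chosen by me for illustration:

```python
import difflib
import sys


def make_diff(original, modified, filepath):
    """Return a unified diff between two versions of a file as one string."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile=filepath,
        tofile=filepath + " (sorted)",
    ))


def show_diff(original, modified, filepath):
    """Write the diff to STDERR, leaving STDOUT free for other output."""
    sys.stderr.write(make_diff(original, modified, filepath))
```

An unchanged file produces an empty diff, so nothing is printed for it.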

Reading from STDIN instead of source files

You can omit the list of source files to make the script process STDIN. In this mode, the modified code is written to STDOUT.

This can be useful if you want to run the script on a file, and have the output written to another file: ./nitpick.py style <infile.cc >outfile.cc.

You will still see info, warning, and error messages, as they are all written to STDERR.
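
The stream separation that makes this work can be sketched like so. The run_filter() helper, its parameters, and the log message are hypothetical; the point is only that the code goes to one stream and the messages to another.

```python
import sys


def run_filter(transform, infile=sys.stdin, outfile=sys.stdout,
               log=sys.stderr):
    """Read source from infile, write transformed code to outfile.

    Messages go to a separate stream, so redirecting STDOUT to a file
    captures only the code.
    """
    source = infile.read()
    log.write("INFO: processing input stream ...\n")  # hypothetical message
    outfile.write(transform(source))
```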

Caveats

Multiple include “batches” are disjoint islands

If your code contains lines that aren’t blank or #include lines between groups of includes, the include sorter processes the different batches independently.

In my projects, this was usually the desired behavior. Consider this example:

// Copyright 2014 The Ostrich

#include "common/defs.h"

#ifdef CUSTOMIZATION_A
// include headers for customization A
#endif

#ifdef CUSTOMIZATION_B
// include headers for customization B
#endif

In this example, sorting all the includes as one batch would change the meaning of the code!
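
Batch detection along these lines could look like the sketch below. It is an illustration, not the script's actual code, and find_batches() is a name I made up: blank lines keep a batch open, while any other non-include line closes it.

```python
def find_batches(lines):
    """Return (start, end) index pairs, end exclusive, one per include batch."""
    batches = []
    start = last_include = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#include"):
            if start is None:
                start = i  # first include opens a new batch
            last_include = i
        elif stripped and start is not None:
            # A non-blank, non-include line (e.g. #ifdef) closes the batch.
            batches.append((start, last_include + 1))
            start = None
    if start is not None:
        batches.append((start, last_include + 1))
    return batches
```

Each returned range can then be sorted on its own, so includes guarded by an #ifdef never migrate outside of it.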

Summary

In this post I introduced my nitpick.py script, with the style command and include sorter module. The script automates the process of grouping and sorting include files according to the Google C++ Style Guide.

The script is available on my GitHub cpplint fork. You’re invited to use it, modify it, share it, etc. If you enhance it, I’d appreciate it if you share back 🙂 .

You can also create issues on GitHub if you run into problems.