C++ Program Performance Optimization (Part 2): The C++ Language Level

Last updated: 15 days ago

Reference: The Most Important Optimizations to Apply in Your C++ Programs (CppCon 2022)

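The speaker groups performance optimizations into roughly three levels:

Changes at the software build stage

Effective use of C++

Hardware-specific optimizations

Continuing from the previous article, this post looks at optimization techniques at the C++ language level.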

Effective use of C++

Annotate your code

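14. Always use constexpr

constexpr is a keyword introduced in C++11. It declares that an expression, variable, or function can be evaluated at compile time, which moves work out of run time and reduces run-time overhead.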

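constexpr variables:

constexpr int size = 1024;                     // compile-time constant
char buffer[size];                             // usable as an array bound
constexpr std::array<int, 3> arr = {1, 2, 3};  // std::array is an aggregate of literal type, so this already works before C++20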
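- The value must be computable at the point of definition. The variable can then be used wherever a constant expression is required, such as array bounds and switch case labels.
- Since C++20, many standard library types such as std::vector, std::string, std::array, and std::span can be used in constant expressions (std::array largely could before C++20 as well).
- A constexpr variable is implicitly const and cannot be modified.
  
  C++20 also introduces constinit, which only guarantees that a variable with static or thread storage duration is initialized at compile time; its value can still be modified at run time.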
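
The first point above can be sketched as follows. This is a minimal illustration, not code from the talk; kBufferSize, kModeFast, kModeSafe, buffer_bytes, and cost_of are hypothetical names chosen for the example.

```cpp
#include <cstddef>

// Hypothetical constants, named only for this illustration.
constexpr std::size_t kBufferSize = 1024;
constexpr int kModeFast = 1;
constexpr int kModeSafe = 2;

// A constexpr variable is accepted as an array bound.
constexpr std::size_t buffer_bytes() {
    char buffer[kBufferSize] = {};
    return sizeof(buffer);
}

// constexpr variables are also accepted as switch case labels.
constexpr int cost_of(int mode) {
    switch (mode) {
        case kModeFast: return 10;
        case kModeSafe: return 20;
        default:        return -1;
    }
}

// Both checks are resolved by the compiler; nothing runs at run time.
static_assert(buffer_bytes() == 1024, "array bound comes from a constexpr variable");
static_assert(cost_of(kModeSafe) == 20, "case labels come from constexpr variables");
```

The function bodies use a local variable and a switch, so this sketch assumes at least C++14.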

constexpr functions:

constexpr int add(int a, int b) {
    return a + b;
}

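C++11: the function body must be a single return statement.
C++14 onward: multiple statements, local variables, loops, and branches are allowed.

C++20 onward: almost everything is allowed, including try/catch blocks and even dynamic memory allocation (which is what makes std::vector usable in constant expressions).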
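
To make the C++11 versus C++14 difference concrete, here is a small sketch. The two factorial functions are hypothetical examples written for this comparison, not code from the talk.

```cpp
// C++11 style: the body is a single return statement, so iteration
// has to be expressed as recursion.
constexpr unsigned long long factorial_cxx11(unsigned n) {
    return n <= 1 ? 1ULL : n * factorial_cxx11(n - 1);
}

// C++14 style: local variables and loops are allowed.
constexpr unsigned long long factorial_cxx14(unsigned n) {
    unsigned long long result = 1;
    for (unsigned i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}

// Both are evaluated by the compiler.
static_assert(factorial_cxx11(10) == 3628800ULL, "10! computed at compile time");
static_assert(factorial_cxx14(10) == 3628800ULL, "10! computed at compile time");
```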

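C++20 also introduces consteval, which is stricter: a consteval function must be evaluated at compile time and cannot be called with run-time values.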

constexpr int foo(int x) { return x + 1; }
consteval int bar(int x) { return x + 1; }

int main() {
	int runtime = 5;
	int a = foo(runtime);  // OK: evaluated at run time
	// int b = bar(runtime); // error: a consteval function cannot take a run-time argument
}

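A constexpr function must be usable at compile time. Depending on the call site, a call that cannot be constant-evaluated is either an error or falls back to run-time evaluation:

If a constexpr function is called in a context that requires a constant expression and the call cannot be evaluated at compile time, compilation fails.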

constexpr int f(int x) {
   int y = x;
   return y;
}
int runtime_value = 5;
constexpr int v = f(runtime_value); // error: runtime_value is not a compile-time constant

In a context that does not require a constant expression, the call falls back to run-time evaluation.

constexpr int square(int x) { return x * x; }
int get_runtime_value();
void foo() {
    int a = get_runtime_value();
    int b = square(a); // square is called like an ordinary function at run time
}

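C++20 introduces std::is_constant_evaluated(), declared in <type_traits>. Inside a function, it reports whether the current evaluation is taking place at compile time.

`constexpr bool std::is_constant_evaluated() noexcept; // declaration`

#include <iostream>
#include <type_traits>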
constexpr int foo(int x) {
    if (std::is_constant_evaluated()) {
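        // compile-time path
        return x * 2;
    } else {
        // run-time path
        std::cout << "Running at runtime\n";
        return x * 3;
    }
}
int main() {
    constexpr int a = foo(10); // evaluated at compile time: takes the x * 2 branch
    int b = foo(10);           // evaluated at run time: takes the x * 3 branch and prints the message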
   	std::cout << "a = " << a << ", b = " << b << "\n";
}

constexpr constructors and classes

struct Point {
    int x, y;
    constexpr Point(int x_, int y_) : x(x_), y(y_) {}
};

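constexpr Point p(1, 2);  // constructed at compile time

The constructor must be usable in constant evaluation.
Every member variable must be of a literal type that can itself be initialized in a constant expression.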
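
A slightly larger sketch shows that member functions can take part in compile-time evaluation too. Vec2 and its dot member are hypothetical, introduced only for this example.

```cpp
// Hypothetical 2D vector type, used only for illustration.
struct Vec2 {
    int x;
    int y;

    constexpr Vec2(int x_, int y_) : x(x_), y(y_) {}

    // A constexpr member function can be called on a constexpr object.
    constexpr int dot(const Vec2& other) const {
        return x * other.x + y * other.y;
    }
};

constexpr Vec2 a(1, 2);
constexpr Vec2 b(3, 4);
constexpr int d = a.dot(b);  // 1*3 + 2*4, computed by the compiler

static_assert(d == 11, "dot product evaluated at compile time");
```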

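if constexpr (C++17 onward) selects a branch at compile time based on a constant expression.

The branch that is not taken is discarded at compile time. Inside a template, a discarded branch that depends on the template parameters is not instantiated, so it may contain code that would not compile for the current type. It still has to be syntactically valid.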

if constexpr (condition) {
    // instantiated only when condition is true
} else {
    // instantiated otherwise
}
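
The effect of discarding a branch is easiest to see when one branch would not compile for some types. The following is a minimal sketch assuming C++17; value_of and kAnswer are hypothetical names.

```cpp
#include <type_traits>

// Hypothetical helper: returns the pointee for pointers, the value itself otherwise.
template <typename T>
constexpr auto value_of(T t) {
    if constexpr (std::is_pointer_v<T>) {
        return *t;  // for T = int this line would be ill-formed, but it is discarded
    } else {
        return t;
    }
}

constexpr int kAnswer = 42;

static_assert(value_of(42) == 42, "non-pointer branch");
static_assert(value_of(&kAnswer) == 42, "pointer branch");
```

With a plain if instead of if constexpr, value_of(42) would fail to compile, because *t is not valid for an int.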

It can replace some older SFINAE and template specialization techniques.

// Old approach: template specialization
template<typename T>
void print_type(T x) {
    std::cout << x << '\n';
}

template<>
void print_type<int>(int x) {
    std::cout << "int: " << x << '\n';
}

// Old approach: enable_if and SFINAE
template<typename T>
typename std::enable_if<std::is_integral<T>::value>::type
process(T x) {
    std::cout << "Integral type\n";
}

template<typename T>
typename std::enable_if<!std::is_integral<T>::value>::type
process(T x) {
    std::cout << "Non-integral type\n";
}

// Old approach: tag dispatch
template<typename T>
void process_impl(T x, std::true_type) {
    std::cout << "Integral\n";
}

template<typename T>
void process_impl(T x, std::false_type) {
    std::cout << "Non-integral\n";
}

template<typename T>
void process(T x) {
    process_impl(x, std::is_integral<T>{});
}

// All of these can be folded into a single function with if constexpr
template<typename T>
void process(T x) {
    if constexpr (std::is_integral_v<T>) {
        std::cout << "Integral\n";
    } else {
        std::cout << "Non-integral\n";
    }
}

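Note: if constexpr (std::is_constant_evaluated()) is a mistake. The condition of if constexpr is itself always constant-evaluated, so the result is always true. Use a plain if (std::is_constant_evaluated()) to test whether the function is being constant-evaluated; the drawback is that both branches are then compiled normally.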

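C++23 therefore introduces if consteval. It tests whether the code is being constant-evaluated, and its first branch is an immediate function context, so that branch may call consteval functions.

// Using if consteval (C++23)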
constexpr int bar(int x) {
    if consteval {
        // taken only during constant evaluation
        return x + 1;
    } else {
        return 42;
    }
}

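15. Declare things const

Declaring a variable const can help the compiler optimize loops, for example by hoisting loop-invariant computations.

Declaring a member function const can help with inlining, constant propagation, loop-invariant hoisting, read-only access from multiple threads, and reuse of values held in cache or registers.
Copy a global variable into a const local (only when the copy is cheap). Once the compiler can see that a branch condition inside a loop is invariant, it can hoist the branch out of the loop.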
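
The last point can be sketched as follows. This is an illustrative example, not the one from the talk; g_use_abs, sum_values, and check_sum_values are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical global setting, invented for this sketch.
bool g_use_abs = true;

long sum_values(const std::vector<int>& values) {
    // Cheap copy into a const local. The branch condition is now provably
    // loop-invariant, even if the loop body later calls code the compiler
    // cannot see into and that might modify the global.
    const bool use_abs = g_use_abs;

    long total = 0;
    for (int v : values) {
        if (use_abs) {
            total += (v < 0) ? -v : v;
        } else {
            total += v;
        }
    }
    return total;
}

// Small usage check.
void check_sum_values() {
    const std::vector<int> values{1, -2, 3};
    g_use_abs = true;
    assert(sum_values(values) == 6);
    g_use_abs = false;
    assert(sum_values(values) == 2);
}
```

Whether the compiler actually moves the branch out of the loop depends on the optimization level and the loop body, so it is worth confirming in the generated assembly.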

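16. Always use noexcept

void f() noexcept promises that the function will not throw. If an exception does escape from it, the program calls std::terminate().

This lets the compiler omit exception-handling code and stack-unwinding information, inline more aggressively, and simplify call sites (no code to check for or propagate exceptions).

A conditional noexcept is sometimes needed, mostly in templates.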

void f() noexcept(false);  // may throw; same as omitting noexcept
void g() noexcept(true);   // must not throw (an escaping exception calls std::terminate); same as plain noexcept

template<typename T>
void func(T&& t) noexcept(std::is_nothrow_move_constructible<T>::value); // func is noexcept only if T's move constructor is noexcept

noexcept can also be used as an operator that checks, at compile time, whether an expression is declared not to throw.
void func() noexcept(noexcept(expr));
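
A minimal sketch of the operator form in use. Quiet, Noisy, and call_run are hypothetical names introduced for this example.

```cpp
#include <utility>

struct Quiet { void run() noexcept {} };
struct Noisy { void run() {} };  // not declared noexcept, so it may throw

// Hypothetical wrapper: it is noexcept exactly when t.run() is.
// The inner noexcept is the operator, the outer one is the specifier.
template <typename T>
void call_run(T& t) noexcept(noexcept(t.run())) {
    t.run();
}

static_assert(noexcept(call_run(std::declval<Quiet&>())), "propagates noexcept");
static_assert(!noexcept(call_run(std::declval<Noisy&>())), "propagates the lack of it");
```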

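Marking the move constructor and move assignment operator noexcept matters. Standard containers such as std::vector move their existing elements during reallocation only if the move is guaranteed not to throw; otherwise they fall back to the copy constructor.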

template<typename T>
void maybe_move_vector(std::vector<T>& vec) {
    vec.push_back(T{});  // if this triggers a reallocation and T's move constructor is not noexcept, the existing elements are copied, not moved
}

The decision can be expressed roughly as follows (this is the rule std::move_if_noexcept applies):

if constexpr (std::is_nothrow_move_constructible_v<T> || !std::is_copy_constructible_v<T>)
    // move
else
    // copy
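
The effect can be observed by counting copies and moves during a reallocation. The sketch below uses two hypothetical element types, ThrowingMove and NoexceptMove, that differ only in the noexcept on the move constructor; force_reallocation and check_relocation_counts are likewise invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element types that count how they are relocated.
struct ThrowingMove {
    static int copies;
    static int moves;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++copies; }
    ThrowingMove(ThrowingMove&&) { ++moves; }  // not noexcept
};
int ThrowingMove::copies = 0;
int ThrowingMove::moves = 0;

struct NoexceptMove {
    static int copies;
    static int moves;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++moves; }
};
int NoexceptMove::copies = 0;
int NoexceptMove::moves = 0;

// Fills a vector to capacity, resets the counters, then forces one
// reallocation. Returns how many existing elements had to be relocated.
template <typename T>
std::size_t force_reallocation() {
    std::vector<T> v;
    v.reserve(4);
    while (v.size() < v.capacity()) {
        v.emplace_back();  // default-constructed in place, no copy or move
    }
    const std::size_t existing = v.size();
    T::copies = 0;
    T::moves = 0;
    v.emplace_back();      // exceeds the capacity, so the storage is reallocated
    return existing;
}

// Small usage check.
void check_relocation_counts() {
    const std::size_t n1 = force_reallocation<ThrowingMove>();
    assert(ThrowingMove::copies == static_cast<int>(n1));  // every element copied
    assert(ThrowingMove::moves == 0);

    const std::size_t n2 = force_reallocation<NoexceptMove>();
    assert(NoexceptMove::moves == static_cast<int>(n2));   // every element moved
    assert(NoexceptMove::copies == 0);
}
```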

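17. Use static for internal linkage

A function marked static is visible only within its own translation unit (normally one .cpp file).
- A less obvious benefit concerns inlining. When a function body is long, the compiler usually declines to inline it even if it is marked inline (in C++, inline is only a hint and cannot force inlining). With static, the compiler knows that every caller is in this translation unit, which makes it more willing to inline the function.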
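
A minimal sketch of a file-local helper. clamp_to_byte, brighten, and check_brighten are hypothetical names used only here.

```cpp
#include <cassert>

// Internal linkage: visible only inside this translation unit. Another .cpp
// file may define its own clamp_to_byte without a clash at link time.
static int clamp_to_byte(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

// Externally visible entry point. Because the compiler can see every caller
// of clamp_to_byte, it is free to inline it here and drop the standalone copy.
int brighten(int pixel, int delta) {
    return clamp_to_byte(pixel + delta);
}

// Small usage check.
void check_brighten() {
    assert(brighten(250, 10) == 255);
    assert(brighten(5, -10) == 0);
    assert(brighten(100, 20) == 120);
}
```

An unnamed namespace gives the same internal linkage and is the more common spelling in modern C++.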

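18. Use [[noreturn]]

Marks a function that never returns to its caller, which helps the compiler optimize the code at the call site. It is typically used on functions that always throw.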

[[noreturn]] void fatalError(const std::string& msg) {
    throw std::runtime_error(msg);
}

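19. Use [[likely]] and [[unlikely]]

Introduced in C++20. These attributes tell the compiler that a branch is likely or unlikely to be taken, so it can arrange the generated branch code accordingly.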
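
A minimal sketch of the syntax, assuming a compiler that understands the C++20 attributes. checked_double is a hypothetical function; the attribute only influences code layout and never changes the result.

```cpp
// Hypothetical function: negative input is treated as the rare error path.
constexpr int checked_double(int value) {
    if (value < 0) [[unlikely]] {
        return -1;         // rare path
    }
    return value * 2;      // common path
}

static_assert(checked_double(21) == 42, "common path");
static_assert(checked_double(-5) == -1, "rare path");
```

A wrong hint costs performance rather than correctness, so it is best reserved for branches whose bias is known from profiling.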

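20. Use [[assume(condition)]]

An attribute introduced in C++23 that tells the compiler to assume a condition always holds, which enables more aggressive optimization. GCC has long offered a similar facility through its __builtin_unreachable() extension.

The compiler trusts the condition regardless of what actually happens at run time.
If condition is actually false, the behavior is undefined (UB).
Unlike assert(...), which is checked only when NDEBUG is not defined (typically debug builds), [[assume]] is never checked; it is purely an optimization hint.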
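
A minimal sketch of the attribute in use, assuming a compiler with C++23 [[assume]] support (older compilers typically ignore the unknown attribute with a warning). sum_blocks_of_four, kData, and the multiple-of-four contract are invented for this example.

```cpp
#include <cstddef>

// Sums n ints. The caller promises that n is a positive multiple of 4
// (a made-up contract for this sketch). Breaking the promise is undefined
// behavior, because the compiler may rely on it, for example to skip the
// remainder handling of a vectorized loop.
constexpr int sum_blocks_of_four(const int* data, std::size_t n) {
    [[assume(n > 0 && n % 4 == 0)]];
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

constexpr int kData[4] = {1, 2, 3, 4};
static_assert(sum_blocks_of_four(kData, 4) == 10, "the contract holds for this call");
```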

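21. Use __restrict

restrict is a C99 keyword; C++ compilers offer it as the __restrict extension. It is used for alias optimization on pointers. It tells the compiler that, during the pointer's lifetime, the memory accessed through this pointer is not accessed or modified through any other pointer.

You must make sure the pointers really do not alias; otherwise the behavior is undefined (UB).

Without restrict, the compiler has to assume that two pointers may refer to the same memory, so it optimizes more conservatively.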
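
A minimal sketch of __restrict on function parameters, assuming GCC, Clang, or MSVC, which all accept this spelling. add_arrays and check_add_arrays are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// The caller promises that out, a and b never overlap. Without that promise
// the compiler must assume a store to out[i] could change a[] or b[], which
// gets in the way of vectorizing the loop.
void add_arrays(float* __restrict out,
                const float* __restrict a,
                const float* __restrict b,
                std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Small usage check with three distinct arrays.
void check_add_arrays() {
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float out[4] = {};
    add_arrays(out, a, b, 4);
    assert(out[0] == 11.0f);
    assert(out[3] == 44.0f);
}
```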

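GCC's function attribute __attribute__((malloc)) works in a similar way: it tells the compiler that the function returns a pointer to a newly allocated memory block that no other pointer aliases.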

void* my_alloc(size_t size) __attribute__((malloc));
void use() {
    int* a = (int*)my_alloc(sizeof(int));
    int* b = (int*)my_alloc(sizeof(int));
    *a = 1;
    *b = 2;  // the compiler may assume this does not affect *a, because a and b point to non-overlapping memory
}

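22. Make functions pure

GCC provides the function attributes __attribute__((pure)) and __attribute__((const)) to declare that a function has pure-function semantics, which enables more aggressive optimization.

The function must have no side effects (writing files, modifying memory, printing output, and so on).
The function's **return value must depend only on its arguments or on global state that it only reads (pure)**; otherwise the behavior is undefined (UB).

// This tells the compiler that get_length(s) only reads what s points to and changes no other state. If the program calls get_length(s) several times, the compiler may compute the result once and reuse it, reducing call overhead.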
__attribute__((pure))
int get_length(const char* s); 

// For example, these two statements
int len1 = get_length(s);
int len2 = get_length(s);

// may be optimized by the compiler into:
int len = get_length(s);
int len1 = len;
int len2 = len;
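
The difference between the two attributes can be sketched with complete definitions. This assumes GCC or Clang; square, count_char, and check_pure_and_const are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// const: the result depends only on the argument values; no memory is read.
__attribute__((const))
int square(int x) {
    return x * x;
}

// pure: may read memory (here, the string that s points to) but has no
// side effects.
__attribute__((pure))
std::size_t count_char(const char* s, char c) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (*s == c) ++n;
    }
    return n;
}

// Small usage check.
void check_pure_and_const() {
    const char* text = "banana";
    // Two identical calls with no writes in between: the compiler is allowed
    // to evaluate count_char once and reuse the result.
    const std::size_t first = count_char(text, 'a');
    const std::size_t second = count_char(text, 'a');
    assert(first == 3 && second == 3);
    assert(square(7) + square(7) == 98);
}
```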

The full list of attributes supported by GCC is in the GCC documentation. Commonly used ones are:

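| Attribute | Meaning and optimization effect |
| --- | --- |
| `__attribute__((hot))` | Hints that the function is on a hot path. The compiler may place it near the front of the code section, which helps cache locality and branch prediction. |
| `__attribute__((cold))` | Indicates that the function is rarely called. The compiler may move it out of line so that it does not pollute the main path. |
| `__attribute__((pure))` | The function depends only on its arguments and read-only global state and has no side effects, so repeated calls can be merged or removed. |
| `__attribute__((const))` | Stricter than `pure`: the function depends only on its arguments, has no side effects, and does not read global variables, which makes it even easier to cache or remove calls. |
| `__attribute__((malloc))` | The returned pointer refers to a new memory block that nothing else aliases. Useful for alias analysis and vectorization. |
| `__attribute__((noreturn))` | The function does not return (like `exit()`). The compiler can drop unreachable paths and simplify the control-flow graph. |
| `__attribute__((assume_aligned(16)))` | The returned pointer has the stated alignment, which helps SIMD vectorization and memory-access optimization. |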

For a cross-compilation toolchain, if you are not sure which attributes are supported, write a test file that carries many attributes:
__attribute__((hot))
__attribute__((cold))
__attribute__((pure))
__attribute__((const))
__attribute__((malloc))
__attribute__((noreturn))
__attribute__((assume_aligned(16)))
void test() {}
Then compile with -Wattributes; the compiler reports unsupported or conflicting attributes as warnings:
<your-toolchain-prefix>-gcc -c test.c -Wall -Wextra -Wattributes


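Avoid redundant copies

23. Choose appropriate parameter types

Here the speaker presents a very detailed roadmap for choosing the type of a function parameter.

A few points that are easy to overlook in everyday production code: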

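Implicit conversion to string. For convenience, callers often pass a const char* to a function whose parameter is a string, which costs an extra string construction.
void func(string str) { /* ... */ }
func("hello");  // constructs a temporary string from the literal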
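Passing by value invokes the copy constructor. Apart from cheap types such as int and float, objects that are expensive to copy should be passed by reference unless a copy is really needed. shared_ptr is often overlooked: when the function does not need to manage the pointee's lifetime, pass a raw pointer or take the shared_ptr by reference.
An old pitfall: iterate over a map with const auto&. The element type of std::map<std::string, int> myMap is std::pair<const std::string, int>, so spelling the loop variable as const std::pair<std::string, int>& (without the const on the key) silently copies every element into a temporary.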
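
The map pitfall can be made visible by comparing addresses: if the loop variable is a hidden copy, its members live at a different address than the element stored in the map. This is a sketch under that idea; Table, the two loop functions, and check_map_iteration are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Table = std::map<std::string, int>;

// Key type written without const: each element is converted into a temporary
// std::pair<std::string, int>, so p refers to a copy, not to the map element.
bool loop_var_is_element_wrong_type(const Table& m) {
    for (const std::pair<std::string, int>& p : m) {
        if (&p.second != &m.at(p.first)) return false;
    }
    return true;
}

// const auto& deduces std::pair<const std::string, int>, so p binds directly
// to the element stored in the map and nothing is copied.
bool loop_var_is_element_auto(const Table& m) {
    for (const auto& p : m) {
        if (&p.second != &m.at(p.first)) return false;
    }
    return true;
}

// Small usage check on a non-empty map.
void check_map_iteration() {
    const Table m{{"a", 1}, {"b", 2}};
    assert(!loop_var_is_element_wrong_type(m));  // hidden copies
    assert(loop_var_is_element_auto(m));         // real elements
}
```

Recent GCC and Clang versions can warn about the first loop through -Wrange-loop-construct.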