In C++ programming, structs are commonly used to group related variables together. However, I realize that the order of declaration of struct members can have a significant impact on the memory size that the struct occupies. This is due to memory alignment in C++, which can lead to the insertion of padding (unused memory) between struct members.
In this post, we will explore how to arrange struct members in a way that optimizes memory usage and minimizes unnecessary padding.
What is Memory Alignment and Padding?
Memory alignment refers to how data types are stored in memory at specific byte boundaries. On most systems, certain data types such as int, long long, char, and bool are required to be stored at memory addresses that are divisible by their respective sizes.
For example:
int usually requires 4-byte alignment.
long long requires 8-byte alignment.
char and bool can be stored at any memory address.
If struct members are not aligned properly, the compiler may insert padding to ensure each data type adheres to its alignment requirements. This padding results in wasted memory.
Example of a Non-Optimized Struct
Let’s look at a simple struct with members declared in a non-optimal order:
#include <bits/stdc++.h>
using namespace std;
struct {
int b; // 4 bytes
char c; // 1 byte
long long a; // 8 bytes
bool d; // 1 byte
} st;
signed main() {
cout << sizeof(st) << endl; // Output the size of the struct
return 0;
}
In this struct:
int b takes 4 bytes.
char c takes 1 byte, but the compiler will add 3 bytes of padding after it to align long long a to an address divisible by 8.
long long a takes 8 bytes and requires 8-byte alignment.
bool d takes 1 byte, but the compiler adds 7 bytes of padding after it to ensure the struct’s total size is a multiple of 8 bytes.
Total size:
4 (int) + 1 (char) + 3 (padding) + 8 (long long) + 1 (bool) + 7 (padding) = 24 bytes
Optimizing the Struct
To reduce padding and optimize memory usage, we can rearrange the struct members so that larger types are declared first, minimizing the need for padding. Here’s how we can improve the struct:
#include <bits/stdc++.h>
using namespace std;
struct {
long long a; // 8 bytes
int b; // 4 bytes
char c; // 1 byte
bool d; // 1 byte
} st;
signed main() {
cout << sizeof(st) << endl; // Output the size of the struct
return 0;
}
Optimized Result:
In this optimized struct:
long long a takes 8 bytes and requires 8-byte alignment. Placing it at the beginning ensures no padding is added after it.
int b takes 4 bytes and requires 4-byte alignment. Placing it after long long ensures no padding.
char c takes 1 byte and requires 1-byte alignment. Placing it after int b results in no padding.
bool d takes 1 byte and requires 1-byte alignment. It fits perfectly after char c, with no padding needed.
Finally, the compiler adds 2 bytes of padding after it to ensure the struct’s total size is a multiple of 8 bytes.
Alignment of the struct: The overall size of the struct must also be divisible by the alignment of the member with the highest alignment requirement. If a struct contains a member like long long (which requires 8-byte alignment), then the total size of the struct must be a multiple of 8 to comply with this alignment requirement.
Total size:
8 (long long) + 4 (int) + 1 (char) + 1 (bool) + 2 (padding) = 16 bytes
Why This Struct Is More Optimized
1. Proper memory alignment: By placing the largest types (long long) first, we ensure no padding is required between members.
2.Reduced padding: Arranging members in decreasing order of size and alignment minimizes the need for padding, resulting in more efficient memory usage.
3. Optimized memory usage: This struct now has a total size of 16 bytes, saving 8 bytes compared to the original version.
Note
In the case of a struct like this:
struct {
double x;
int a[3];
int b;
};
We will declare double x before int a[3], even though the total memory size of a[3] is 12 bytes and x is 8 bytes. This is because double has a larger alignment requirement than int, and placing x first ensures proper alignment.
Therefore, the declaration order depends on the size of the data type rather than the total memory size of the elements.
Conclusion
When working with structs in C++, it's not just about grouping related variables together but also about optimizing memory usage. By arranging members in decreasing order of size and alignment, you can reduce padding and save memory, which is essential for memory-sensitive applications.
We hope this post helps you understand how memory optimization works with structs in C++ and how to make your programs more efficient!
My code was getting memory limit excited at an issue and I accidentally discovered it. :V
On most systems, certain data types such as int, long long, char, and bool are required to be stored at memory addresses that are divisible by their respective sizes.
Probably the compilers are just being conservativeThe compilers are required to produce aligned addresses (An object type imposes an alignment requirement on every object of that type
in C++). But using unaligned addresses are possible and hardly cause performance differences with modern CPUs . Will you code pass with#pragma pack(1)
without the struct change?It has passed, but according to some information I found on the internet, #pragma pack(1) can have a Performance Impact and Lead to Loss of Compiler Optimizations.
what about using
__attribute((packed))__
? Does it have any performance issues, since using it gives the size of your struct 14 bytes?It certainly could. Misaligned pointers are UB according to C standard so it entirely depends on what your compiler does in your specific case since you're using a compiler extension attribute.
In the context of CP I would be curious to see if this can cause any TLEs. In a simple test I added
-march=skylake
to check the case for SIMD, whether with it, the existence of#pragma pack(1)
or not produces the same asm except the memory offsets.Anon Tokyo! (excuse my weird point of attention)
The compilers are following the standard. It's UB so it can work perfectly fine on most modern hardware, but compilers are made to be compatible with (for a reasonable definition of "all") all hardware.
Thanks for pointing out this, I updated my comment.
On memory optimization associated with structs, when we use a data type larger than the required range, eg. using int to store a positive integer less than 1e5, there is a useful trick to cut down memory usage:
Such a struct only takes up 64 bits or 8 bytes of memory, yet it gives us 3 usable variables (on the condition that they don't exceed their respective upper limits).
This is useful knowledge
Aliens magic, wow
And how does memory access to such variables work? You can't just
movq
those, there needs to be extra overhead. Even more complicated is store.https://godbolt.org/z/45GPWGdzv
load adds a
shr
andand
, store adds 4 extra bitwise operations. I'm curious how much this overhead is compared to the cache locality savings, I would think it's basically negligible.The usual rule is negligible with random access, big with sequential access. Exact performance somewhat affected by specific CPU's cache parameters.
does this also happen in the classes ?!
Yes. In C++, classes are just structs with
private
as the default access specifier.