When using ELF-based toolchains (AVR, ARM, and really anything you compile with GCC or clang), the concept of a section is key. (In the simplest terms, a section is just a container for code or data, with a set of flags describing what it contains. In object files, i.e. compiled but not yet linked, sections do not have specific memory addresses yet; everything is expressed relative to the beginning of its section.)
The linker script describes where each section ends up in memory, and it can even discard entire sections. (It can also pull specific symbols out of specific sections.)
objdump -h object-file.o lists each section's size, alignment, flags, and the offset at which its data starts in the object file. For linked files (executables, libraries, firmware images before conversion to hex/raw format), it also tells you where in the memory address space each section will reside.
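As a minimal sketch (the file and symbol names are made up for illustration), here is a C file whose pieces land in the classic sections: code in .text, initialized globals in .data, zero-initialized globals in .bss, and read-only constants typically in .rodata. Running objdump -h on the compiled object shows each of them with its size and flags.

```c
/* sections_demo.c -- illustrative only */
const char banner[] = "hello";   /* read-only data: .rodata (on most targets) */
int counter = 42;                /* initialized global: .data                 */
int scratch[16];                 /* zero-initialized global: .bss             */

int bump(void)                   /* code: .text                               */
{
    return ++counter;
}

/* Typical inspection:
 *   gcc -c sections_demo.c
 *   objdump -h sections_demo.o
 * Each output line shows the section name, size, VMA/LMA, file offset,
 * alignment, and flags such as CONTENTS, ALLOC, LOAD, READONLY, CODE.
 */
```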
Because relocation references in ELF files (for example, a function name your code calls or takes the address of) are based on sections, and compilers can automatically put each function in its own section (-ffunction-sections for gcc and clang), unused functions can be automatically pruned by the linker (--gc-sections for GNU ld), because their sections won't occur in any relocation record. In other words, the "include only used functions from a statically compiled library" feature relies on the library being compiled with each individual function in its own section. These sections are named using a pattern (.text.function_name), so the linker simply hoovers them all up just as efficiently as if they were all in a single section. A similar option (-fdata-sections) exists for data (global variables and structures).
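A minimal sketch (file and function names invented for illustration): with -ffunction-sections each function below gets its own .text.<name> section, and linking with --gc-sections drops the one that is never referenced.

```c
/* gc_demo.c -- illustrative only */
int used(int x)            /* ends up in section .text.used         */
{
    return x + 1;
}

int never_called(int x)    /* ends up in section .text.never_called */
{
    return x * 2;
}

int main(void)             /* ends up in section .text.main         */
{
    return used(41);       /* relocation referencing 'used' only    */
}

/* Typical invocation (GCC with GNU ld):
 *   gcc -ffunction-sections -fdata-sections -c gc_demo.c
 *   objdump -h gc_demo.o          # one .text.<name> section per function
 *   gcc -Wl,--gc-sections gc_demo.o -o gc_demo
 * In the linked gc_demo, the code of never_called() is gone: its section
 * was never referenced by any relocation, so the linker discarded it.
 */
```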
The programmer can also specify a section for each function or data object, via a compiler-specific section attribute (even in assembly language; this is not specific to C, and relies only on the ELF object file format). This means that one can construct an array (of pointers, or of structures) whose elements are automatically collected from all object files into one linear array at link time. The downside is that there is no way to force a specific order for the elements coming from different files. (Within a single file, their order is fixed.) But when you have, say, a hardware interrupt vector table, just declaring it as a suitable structure or pointer array (depending on exactly how the hardware implements the table) and putting it into a specific section lets the linker script place it at the exact correct location in memory in the firmware image.
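Here is a minimal sketch of the collected-array trick, assuming the GNU toolchain (the section name, macro, and types are made up for illustration). When a section's name is a valid C identifier and the linker script does not place it explicitly, GNU ld provides __start_<name> and __stop_<name> symbols that bracket all entries gathered from all object files; for something like a vector table, the linker script would instead KEEP() the named section at the required address.

```c
/* registry.c -- illustrative only; names are invented */
#include <stdio.h>

struct command {
    const char *name;
    void      (*handler)(void);
};

static void cmd_hello(void) { puts("hello"); }
static void cmd_bye(void)   { puts("bye");   }

/* Each entry is placed in the section "commands"; entries may be
 * scattered across translation units. "used" keeps the entry even
 * though nothing references it by name.
 */
#define REGISTER_COMMAND(nm, fn)                          \
    static const struct command entry_##fn               \
    __attribute__((section("commands"), used)) = { nm, fn }

REGISTER_COMMAND("hello", cmd_hello);
REGISTER_COMMAND("bye",   cmd_bye);

/* Provided automatically by GNU ld for orphan sections whose names
 * are valid C identifiers.
 */
extern const struct command __start_commands[];
extern const struct command __stop_commands[];

int main(void)
{
    for (const struct command *c = __start_commands; c < __stop_commands; c++)
        printf("%s\n", c->name);
    return 0;
}
```

Within this one file the two entries keep this order; entries contributed by other object files land around them in an order the linker does not guarantee, which is exactly the caveat above.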
Similarly, if you have compiled your object files with each function in its own section, you can write a simple script to parse the output of objdump -rt object-file(s) and generate the possible call graph of those functions, including calls made via function pointers stored in structures that are never modified at run time. (Again, because ELF relocation records are based on sections, the section a relocation record is listed under identifies the caller and the symbol or section it references identifies the target function, so no disassembly or memory lookups are needed.)
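As a sketch of the kind of script meant here (in C to stick to one language; a few lines of awk or Python would do the same), this reads objdump -r output from standard input and prints caller -> callee edges. It assumes the text layout printed by GNU objdump and an object file built with -ffunction-sections, so that the section name encodes the calling function.

```c
/* callgraph.c -- sketch only: parses GNU objdump -r output,
 * e.g.  objdump -r object-file.o | ./callgraph
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[1024];
    char caller[256] = "";

    while (fgets(line, sizeof line, stdin)) {
        /* "RELOCATION RECORDS FOR [.text.foo]:" starts a new caller. */
        if (!strncmp(line, "RELOCATION RECORDS FOR [", 24)) {
            caller[0] = '\0';
            if (!strncmp(line + 24, ".text.", 6)) {
                char *name = line + 24 + 6;
                char *end  = strchr(name, ']');
                if (end && (size_t)(end - name) < sizeof caller) {
                    memcpy(caller, name, (size_t)(end - name));
                    caller[end - name] = '\0';
                }
            }
            continue;
        }

        /* Relocation lines look like "0000000c R_ARM_CALL  bar+0x4";
         * keep only real rows, whose offset column is pure hex.
         */
        char offset[64], type[64], value[256];
        if (caller[0] &&
            sscanf(line, "%63s %63s %255s", offset, type, value) == 3 &&
            strspn(offset, "0123456789abcdefABCDEF") == strlen(offset)) {
            value[strcspn(value, "+-")] = '\0';   /* strip the addend */
            printf("%s -> %s\n", caller, value);
        }
    }
    return 0;
}
```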
I've found this extremely useful with state machines, for example for visualizing the exact menu structure when the menu entries are defined as structures in flash pointing to each other and/or to transition functions and event handlers.
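To make that concrete (the names are invented for illustration), menu entries of roughly this shape show up directly in the relocation records: each const entry references its neighbours and its handler, so the graph falls out of objdump -r without running any code.

```c
/* menu.c -- illustrative only; names are invented */
struct menu_entry {
    const char              *label;
    const struct menu_entry *next;       /* next sibling in this menu */
    const struct menu_entry *child;      /* submenu, or NULL          */
    void                    (*on_select)(void);
};

static void start_measurement(void) { /* ... */ }
static void show_settings(void)     { /* ... */ }

extern const struct menu_entry menu_measure, menu_settings;

/* const objects typically end up in flash (.rodata or equivalent);
 * with -fdata-sections each entry gets its own .rodata.<name> section,
 * so the relocations tell exactly which entry points at which.
 */
const struct menu_entry menu_measure = {
    "Measure", &menu_settings, 0, start_measurement
};
const struct menu_entry menu_settings = {
    "Settings", &menu_measure, 0, show_settings
};
```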
Because of all this, I claim that analysing memory use based on the division into code, data, uninitialized/zero-initialized data, stack, and heap is an oversimplification. It is better to understand the ELF object file properties and the section shenanigans underneath, because they can provide exact statistics and interrelationships (like why calling some function drags in printf() even when printf() will never be called in practice). The stack is a bit different: real-world sampling can tell you the typical stack use, but you kinda-sorta need static analysis to find out the worst case, and you have to add the worst-case interrupt chain stack frame sizes on top. That makes the stack the most annoying part to estimate/analyze, unless you have a package that can do it at the source level. (The fact that the typical effect of a stack overflow is corruption of unrelated variables, and that even canaries only provide a (high) statistical detection probability, makes stack size one of the most annoying problems to optimize.)
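For the real-world sampling part, a common trick is to paint the unused stack region with a known pattern at startup and later scan for the high-water mark. The sketch below assumes a full-descending stack and uses placeholder linker symbols (__stack_start / __stack_end); substitute whatever your own linker script actually defines.

```c
/* stack_watermark.c -- sketch only; __stack_start and __stack_end are
 * placeholders for the symbols your linker script provides for the
 * stack region (lowest address and one-past-highest address).
 */
#include <stddef.h>
#include <stdint.h>

extern uint8_t __stack_start[];
extern uint8_t __stack_end[];

#define STACK_PAINT 0xAAu

/* Call as early as possible; paints only the currently unused part of
 * the stack, from the bottom of the region up to (roughly) the current
 * stack pointer.
 */
void stack_paint(void)
{
    uint8_t marker;                      /* approximates the current SP */
    for (uint8_t *p = __stack_start; p < &marker; p++)
        *p = STACK_PAINT;
}

/* Number of stack bytes used so far, assuming the stack grows downward
 * from __stack_end.
 */
size_t stack_high_water(void)
{
    const uint8_t *p = __stack_start;
    while (p < __stack_end && *p == STACK_PAINT)
        p++;
    return (size_t)(__stack_end - p);
}
```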
If you have Linux or WSL2 available, you might find this answer and example code regarding the section array use informative. From there, I recommend you simply dive into existing linker scripts (those used in the Arduino cores for various microcontrollers, for example; the Teensy LC (ARM Cortex-M0+) one can be found here), keeping a browser window open to the binutils linker script documentation. The learning curve is steep, but worth it if you want to do serious embedded or microcontroller development.