Which is giving me an idea that I can try on my RISC-V core. In a typical architecture in which the stack pointer is growing downwards, we could just keep track of the *minimum* value of the stack pointer every time it changes, and store that in a dedicated register (like in a CSR.)
If you expand that idea, so that every access to memory is verified in hardware to be within an allowed range or an interrupt is triggered, you'll end up with a segmented memory model with linear address space.
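To make the idea concrete, here is a minimal software model of what such hardware might do. It is only a sketch: the struct, its field names, and monitor_set_sp() are hypothetical, and a real core would do this in logic and raise an interrupt rather than return a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the monitoring state; in hardware these would be CSRs. */
struct stack_monitor {
    uintptr_t sp;       /* current stack pointer */
    uintptr_t min_sp;   /* lowest stack pointer value seen so far */
    uintptr_t limit;    /* lowest address the stack is allowed to reach */
};

/* Models a write to the stack pointer on a downward-growing stack.
 * Returns false where the hardware would trigger an interrupt. */
bool monitor_set_sp(struct stack_monitor *m, uintptr_t new_sp)
{
    m->sp = new_sp;
    if (new_sp < m->min_sp)
        m->min_sp = new_sp;         /* track the deepest stack use */
    return new_sp >= m->limit;      /* range check */
}
```

Reading min_sp after a test run then tells how deep the stack actually got.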
Due to experimenting with named address spaces in C and C++ (no, they are definitely not standardized; just an extension implemented by GCC for C and clang for C and C++), I've been surprised to find how extremely useful these are when they do have hardware support. (Even on x86-64, there are essentially four independent address spaces: the "default" one, the stack (accessed via the SS segment), and FS and GS segments. On many AVRs, being based on Harvard architecture, there are two: code ("progmem"), and data.)
Named address space support as implemented by GCC and Clang uses the type of a pointer variable to indicate its address space, and this can be overridden via explicit casts. The named address space in the type specification does affect overloading and templates (C++) and type generics (C _Generic), so with a workable software engineering mindset, these Just Work.
It seems to me OP has the misguided understanding that types and type specifications are just specifications for how the underlying hardware memory access is done; and that the confusion stems from this fundamental misconception. (You could, and I'm sure someone will, claim that "You're wrong! That's exactly what types are!", based on the fact that compilers and interpreters use the type to perform the correct memory access; to that, I retort that just because a human can be eaten does not make humans "food".)
In a much more fundamental sense, the type specifies everything that is known about the variable or memory access expression at that point, but that the value of the variable or expression itself does not tell us.
(Not realizing this, and being fixated on their own definitions, is why the GCC C++ front-end developers and the C++ standard committee claim that named address space support is impossible/"too hard" to implement for C++; and inversely, having realized the above, the Clang/LLVM C++ developers just went and implemented it because it was needed for OpenCL anyway. I'm very tempted to draw a parallel to a universal list-based structural markup format we could replace XML with, which would both significantly increase parsing speed and minimize RAM usage (especially important for IoT) while allowing basically a complexity explosion in the options for specifying structured metadata associated with each node; and how getting it accepted or standardized without first implementing actual real-world examples handily beating their current XML/XML-derivative "competitors" (and even afterwards!) is basically impossible, because most human minds are set in their ways. Smart people can be scarily stupid, you see.)
In C, the exact meaning of the asterisk (*) is heavily dependent on the context. It can be the binary multiplication operator, the unary dereferencing operator, or a pointer type specifier. Similarly, the ampersand (&) can be the binary 'and' operator, or the unary address-of operator. So, one of the first skills one needs to cultivate is recognizing what context to apply to any statement, lexical sequence, or expression.
For type specifications (for example, when specifying what kind of parameters a function takes), the asterisk-as-pointer-type-specifier is particularly simple: it reads as "is a pointer to". Type specifications themselves are split at asterisks into type specifier sets. The order within a type specifier set is irrelevant: const volatile int, volatile const int, int volatile const, volatile int const, and so on, are all equal, although many humans prefer a specific order to keep things familiar. The order of the sets, however, is important: they are read from right to left. A final wrinkle is that if the type specification names a variable, any const or volatile preceding the variable name, with no asterisk in between, is associated with the variable and not the type.
For example,
    const volatile int *const x;
says, quite literally, "x is a const pointer to a const volatile int". The first const in the English statement, corresponding to the rightmost const in the C declaration, is a promise to the compiler that the code does not try to change the value of variable x. The other const, the leftmost const in the C declaration, is a promise to the compiler that the code does not try to change the value of whatever this pointer refers to. The volatile tells the compiler that although this code won't try to change that value, other code may, and therefore the compiler should not try to cache the value. Aside from the asterisk * that tells us the declared variable x is a pointer, the only thing left is the int: it specifies that the object variable x points to is an int.
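A short sketch may help show which const is which in practice. The two function names below, sum_ints() and fill_ints(), are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* "values is a pointer to const int": the pointer itself may change,
 * but the ints may not be modified through it. */
int sum_ints(const int *values, size_t count)
{
    int total = 0;
    while (count--)
        total += *values++;     /* advancing the pointer is allowed */
    /* *values = 0;                would not compile: target is const */
    return total;
}

/* "values is a const pointer to int": the ints may be modified,
 * but the pointer itself may not be changed. */
void fill_ints(int *const values, size_t count, int fill)
{
    for (size_t i = 0; i < count; i++)
        values[i] = fill;       /* writing through the pointer is allowed */
    /* values++;                   would not compile: pointer is const */
}
```

Uncommenting either of the marked lines should make the compiler reject the code, which is exactly the promise each const expresses.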
The same simple logic applies to all type specifications. For function pointers, the fact that the pointer variable name is in the middle with its parameter specification in parentheses to the right does make things harder to read, but the rules stay the same.
Type specifications can also appear in cast expressions. In C, a cast is an operation that converts the value of an expression to the specified type. These are common in "accessor" functions: small functions whose purpose is to make code more maintainable and easier to read. For example, if you have a binary communications protocol, you might wish to have an accessor function that can convert four consecutive bytes in a specific byte order ("endianness") to a 32-bit signed integer:
    #include <stdint.h>  /* int32_t, uint32_t */

    static inline int32_t get_s32_le(const void *ref)
    {
        const unsigned char *const buf = ref; /* = (const unsigned char *const)ref */
        return (int32_t)(  ((uint32_t)buf[0])
                        | (((uint32_t)buf[1]) << 8)
                        | (((uint32_t)buf[2]) << 16)
                        | (((uint32_t)buf[3]) << 24));
    }
However, some "programmers" think they need to write clever code minimizing the length to show how "good" they are, so they may instead write the above as a macro,
    #define S32_LE(ptr) ((int32_t)( *((const unsigned char *)(ptr)+0) \
    | ((uint32_t)(*((const unsigned char *)(ptr)+1)) << 8) \
    | ((uint32_t)(*((const unsigned char *)(ptr)+2)) << 16) \
    | ((uint32_t)(*((const unsigned char *)(ptr)+3)) << 16) ))
or even as
    #define S32_LE(ptr) ({ const unsigned char *_p = (ptr); (int32_t)( p[0] | (((uint32_t)p[1]) << 8) | (((uint32_t)p[2]) << 16) | (((uint32_t)p[3]) << 24) );})
and congratulate themselves for having "optimized" the heck out of this operation, without realizing that using either GCC or Clang with typical/recommended optimizations enabled (-O2), in the final binaries, all three end up generating the same machine code. Yet, two of the three are quite unreadable (thus likely sources of bugs; can you find the one that I might have inserted there on purpose?).
True, the static inline function does have unnecessary code, like the cast of the first byte to a 32-bit unsigned integer. (It's verbose like me; not for verbosity's sake, but to try and convey the underlying concepts and ideas in as useful a form as possible.) However, since these generate no extra machine code, whether one should keep or drop them depends on which form one believes is easiest to maintain in the long term, keeping the probability of a bug (being accidentally introduced and/or left unnoticed in this code) as low as possible.
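The same accessor style works in the opposite direction, too. As a sketch, here is a hypothetical companion to get_s32_le() that stores a 32-bit signed integer as four little-endian bytes; the name put_s32_le() is an assumption, not something from the code above.

```c
#include <stdint.h>

/* Hypothetical counterpart to get_s32_le(): stores value at ref as four
 * bytes, least significant byte first. */
static inline void put_s32_le(void *ref, int32_t value)
{
    unsigned char *const buf = ref;         /* = (unsigned char *const)ref */
    const uint32_t u = (uint32_t)value;     /* signed to unsigned is well defined (modulo 2^32) */

    buf[0] = (unsigned char)( u        & 0xFFu);
    buf[1] = (unsigned char)((u >>  8) & 0xFFu);
    buf[2] = (unsigned char)((u >> 16) & 0xFFu);
    buf[3] = (unsigned char)((u >> 24) & 0xFFu);
}
```

Here the casts and masks are again mostly there for the human reader; they state explicitly which eight bits end up in which byte.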
If the couple of us still left reading this novel of a post circle back to the original question at hand, using what we learned just above (with the named address spaces being just a spice on top to remind us how oddly useful and strange types can be), we'll find we have the following:
- We have functions void ftoa(float v, char *p, int n); and int intToStr(int v, char *p, int n);
- We have char res[20]; and float n;
- We call ftoa(n, res, 4);
- Within function ftoa(), we have int i = intToStr( (int)v, p, 0);.
A detail in C not yet discussed is that the name of an array variable "decays" to a pointer to its first element. That is, in a very real sense, res is an array of 20 chars, but when we use res in an expression (one that does not just specify a type, so excluding for example expressions like sizeof res, which evaluates to 20 and not to sizeof (char *)), it behaves as if it was declared as char *res;. Indeed, (res + 1) == &(res[1]) is true.
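These claims are easy to check with a few lines of C. This is only a sketch, and the three function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* In the function that owns the array, sizeof sees the whole array. */
size_t size_seen_by_owner(void)
{
    char res[20];
    return sizeof res;          /* 20: the operand of sizeof does not decay */
}

/* In a function that receives it, only a pointer is left. */
size_t size_seen_by_callee(char *p)
{
    return sizeof p;            /* sizeof (char *), whatever the array size was */
}

/* In ordinary expressions, the array name acts as a pointer to element 0. */
bool decay_matches_indexing(void)
{
    char res[20];
    return res == &(res[0]) && (res + 1) == &(res[1]);
}
```

Calling size_seen_by_callee(res) from the owning function is perfectly valid, and that is exactly the point at which the length information is lost.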
This means that when main() calls ftoa(n, res, 4), res decays to a pointer to the first element in an array of 20 chars. The definition of the function ftoa() says the second parameter is a pointer to char, so this is absolutely fine.
To convert the integer part of n to i chars, ftoa() calls the equivalent of int i = intToStr( (int)v, p, 0);. Why is the second parameter just p and not &p? Because p is a pointer to char, that's why. &p would pass a parameter of type char **: a pointer to a pointer to a char.
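If you want the compiler itself to tell you which type an expression has, C11 _Generic can do it. The following is a sketch; the TYPE_NAME() macro and the four helper functions are hypothetical names made up for this illustration.

```c
/* Hypothetical helper: maps a few pointer types to printable names (C11). */
#define TYPE_NAME(expr) _Generic((expr),        \
        char *:       "char *",                 \
        char **:      "char **",                \
        char (*)[20]: "char (*)[20]",           \
        default:      "something else")

/* p is already a pointer to char, so it is passed as-is. */
const char *name_of_p(char *p)          { return TYPE_NAME(p); }

/* &p is a pointer to that pointer. */
const char *name_of_addr_of_p(char *p)  { return TYPE_NAME(&p); }

/* The array name decays to a pointer to its first element. */
const char *name_of_res(void)           { char res[20]; return TYPE_NAME(res); }

/* &res does not decay: it is a pointer to the whole array of 20 chars. */
const char *name_of_addr_of_res(void)   { char res[20]; return TYPE_NAME(&res); }
```

With current GCC and Clang, these yield "char *", "char **", "char *", and "char (*)[20]", respectively.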
Because of pointer arithmetic and array variables decaying to pointers to their first element in C, there is no distinction in C between pointers pointing to a single element, and pointers pointing to several consecutive elements. Put simply, there is no reason to expect p above to point to a single character. If there is no separate parameter specifying how many there are, we just do not know. If the code overflows the buffer, then we can just say "ouch, that wasn't what I intended", and then fix it.
It would be much better to use something like the following function signature here:
    char *float2str(char *buffer, size_t buflen, float value, int decimals);
The first parameter is a pointer to the array of characters used to store the resulting string. (A string in C is just an array of chars with a terminating nul byte, '\0', at the end.) The second parameter is the number of chars in that array. Because we have the wonderful sizeof operator (and in C, sizeof (char) == 1 always), this includes the string-terminating nul byte at the end. The third parameter is the value to be converted, and the fourth is the number of decimal digits desired. The function returns a pointer to the first character of the string describing the value as a decimal number, stored somewhere within the specified buffer, but not necessarily at the beginning of the buffer.
If there is a problem, for example the buffer is too small for the number of decimals desired, the function can return a NULL pointer indicating the conversion was impossible. (For embedded code, I would use a dedicated "error" pointer, one that always points to a nul byte, however. That way checking for errors would be optional, but barring implementation bugs, it would Always Just Work.)
The trick to implementing such a function efficiently is to start at the decimal point. The integral and fractional parts are constructed separately; it does not really matter which one first. Operate on magnitudes only (absolute values); and if the original value was negative, prepend a '-' just before the most significant decimal digit. The integral part advances left via repeated division (of the integral part only) by ten, and the fractional part right via repeated multiplication (of the fractional part only) by ten. The same approach works fine for all fixed-point formats also, and you only need integer division-by-ten-with-remainder (the remainder corresponding to the digit at that position), and fractional multiplication by ten (followed by extracting the integral part of the result as the digit), with rounding only ever applied to the last presented digit.
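To show the shape of the approach without giving away the float version, here is a sketch for unsigned 16.16 fixed-point values only. The function name ufix16_to_str(), the 16.16 format, and round-half-up are all assumptions made for this illustration; sign handling and floats are left out on purpose.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: converts an unsigned 16.16 fixed-point value to a decimal string.
 * Returns a pointer into buffer (not necessarily its start), or NULL if
 * the buffer is too small. The buffer may be partially written on failure. */
char *ufix16_to_str(char *buffer, size_t buflen, uint32_t value, int decimals)
{
    if (!buffer || decimals < 0)
        return NULL;

    /* Chars needed from the decimal point onwards: '.', digits, '\0'. */
    const size_t tail = (decimals > 0 ? (size_t)decimals + 1 : 0) + 1;
    if (buflen < tail + 1)              /* plus at least one integer digit */
        return NULL;

    uint32_t ipart = value >> 16;       /* integral part */
    uint32_t frac  = value & 0xFFFFu;   /* fractional part, in 1/65536ths */

    char *const point = buffer + (buflen - tail);   /* the decimal point */
    char *q = point;

    /* Fractional part: advance right via repeated multiplication by ten. */
    if (decimals > 0) {
        *q++ = '.';
        for (int i = 0; i < decimals; i++) {
            frac *= 10;
            *q++ = (char)('0' + (frac >> 16));
            frac &= 0xFFFFu;
        }
    }
    *q = '\0';

    /* Round half up at the last presented digit, carrying leftwards. */
    if (frac >= 0x8000u) {
        int carry = 1;
        for (char *r = q; carry && r > point + 1; ) {
            --r;
            if (*r == '9') {
                *r = '0';
            } else {
                ++*r;
                carry = 0;
            }
        }
        if (carry)
            ipart++;                    /* carry crossed the decimal point */
    }

    /* Integral part: advance left via repeated division by ten. */
    char *p = point;
    do {
        if (p == buffer)
            return NULL;                /* ran out of room on the left */
        *--p = (char)('0' + ipart % 10);
        ipart /= 10;
    } while (ipart);

    return p;
}
```

For example, the value 245760 (3.75 in 16.16 format) with one decimal yields "3.8", and with two decimals "3.75". Starting at the decimal point is what lets both loops run without first computing how many digits the integral part needs.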
If anyone is interested, I can post an example implementation; however, I'd prefer if OP tried their hand at it first. Not only is it interesting (implemented this way, the function is much faster than e.g. snprintf() in hosted C environments, while still yielding the exact same string for all finite floats/fixed-point numbers, so there is real motivation to implement and test one of their own), but everything I blabbed on about above regarding learning the proper context is easier to learn in practice. You'll find that whenever you read or write code that specifies a type, you automatically think in terms of "type specification context and syntax".
The English equivalent is that lead and lead do not rhyme.