Author Topic: [C] Ow, pointers are making my brain hurt... (Read 7322 times)

HwAoRrDk · « **on:** November 09, 2021, 09:57:57 pm »

In a C program, if I want to pass a pointer to a buffer in to a function, then within that function read from the buffer, but also increment the original pointer, is this the right way?

Code: [Select]

void foo(uint8_t **in) {
    uint8_t n;
    n = *(*in)++; // read a byte from buf, but also increment the original pointer
}

void bar() {
    uint8_t *buf;
    foo(&buf); // After returning, buf pointer will have been incremented
}

My brain is hurting at the moment, so I'm unsure if that's doing it correctly.

cfbsoftware · « **Reply #1 on:** November 09, 2021, 10:29:42 pm »

If pointers are making your brain hurt you should ensure that you understand the basic principles. These are explained really well in Chapter 5: Pointers and Arrays of The C Programming Language by Kernighan and Ritchie. It's less than 30 pages. If you haven't got the book already then go no further until you have.

SiliconWizard · « **Reply #2 on:** November 09, 2021, 10:36:52 pm »

Quote from: HwAoRrDk on November 09, 2021, 09:57:57 pm

In a C program, if I want to pass a pointer to a buffer in to a function, then within that function read from the buffer, but also increment the original pointer, is this the right way?

Code: [Select]
void foo(uint8_t **in) {
    uint8_t n;
    n = *(*in)++; // read a byte from buf, but also increment the original pointer
}

void bar() {
    uint8_t *buf;
    foo(&buf); // After returning, buf pointer will have been incremented
}

My brain is hurting at the moment, so I'm unsure if that's doing it correctly.

Yes this is correct.

rstofer · « **Reply #3 on:** November 09, 2021, 10:53:44 pm »

Quote from: HwAoRrDk on November 09, 2021, 09:57:57 pm

In a C program, if I want to pass a pointer to a buffer in to a function, then within that function read from the buffer, but also increment the original pointer, is this the right way?

Code: [Select]
void foo(uint8_t **in) {
    uint8_t n;
    n = *(*in)++; // read a byte from buf, but also increment the original pointer
}

void bar() {
    uint8_t *buf;
    foo(&buf); // After returning, buf pointer will have been incremented
}

My brain is hurting at the moment, so I'm unsure if that's doing it correctly.

printf() is your friend. Make up some sample cases and print the results.
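A minimal sketch of such a test, assuming a three-byte sample buffer and a variant of foo() that returns the byte it read (the original discards n). The names read_byte and demo_read are made up for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical variant of foo(): returns the byte so it can be printed. */
static uint8_t read_byte(uint8_t **in)
{
    return *(*in)++;  /* read the byte *in points to, then advance *in by one */
}

/* Prints each step and returns how far the caller's pointer has moved. */
static long demo_read(void)
{
    uint8_t data[3] = { 0x11, 0x22, 0x33 };  /* sample data, an assumption */
    uint8_t *buf = data;

    for (int i = 0; i < 3; i++) {
        uint8_t n = read_byte(&buf);
        printf("read 0x%02x, buf is now data + %ld\n",
               (unsigned)n, (long)(buf - data));
    }
    return (long)(buf - data);
}
```

Calling demo_read() from main() prints one line per byte, so you can see both the value read and the pointer moving forward by one each time.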

golden_labels · « **Reply #4 on:** November 10, 2021, 03:22:21 am »

Not trying to combine as many unrelated operations as possible in a single expression may help:

Code: [Select]

n = **in;
++*in;

Or even:

Code: [Select]

uint8_t* buf = *in;
n = *buf;
*in = buf + 1;

Kjelt · « **Reply #5 on:** November 10, 2021, 07:17:34 am »

I hate that code. If someone else reads it or has to modify it, they can only make mistakes.
Instead, declare a buffer and pass the start of the buffer as a const pointer to the function, add an index as a parameter, and let the function increase the index. You never ever want to change the original pointer to your buffer, especially not in a function other than the one that declared it (and owns it). Memory leaks and other dangers are lurking.
Oh yes, it is also good practice to pass the size of the buffer, or you have another problem: the other function may access memory beyond the buffer.
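A rough sketch of that interface, assuming a hypothetical function name read_next and a return convention of 0 for success and -1 for failure (neither is from the original post):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: the buffer pointer is never modified, only *index.
   Returns 0 on success, -1 if an argument is NULL or no byte is left. */
static int read_next(const uint8_t *buf, size_t size, size_t *index, uint8_t *out)
{
    if (!buf || !index || !out || *index >= size)
        return -1;

    *out = buf[*index];  /* read the current byte */
    (*index)++;          /* advance the caller's index, not the buffer pointer */
    return 0;
}
```

The caller keeps the buffer, its size, and the index, so the function cannot run past the end or lose the start of the allocation.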

Nominal Animal · « **Reply #6 on:** November 10, 2021, 10:34:32 am »

When reading pointer types (excluding the variable name), split it into tokens at each asterisk (*), and read the tokens from rightmost to leftmost, replacing each asterisk with "[is] a pointer to".

Thus, const char *volatile p; reads as "p is volatile, a pointer to const char", meaning that the value of p, the pointer, is volatile (the compiler may not assume its value does not change unexpectedly); and it points to a string or char array that we promise not to try and change the contents of.

Because parameters are passed by value, a function that takes say char *s as a parameter, can modify both the pointer s and the value it points to, (*s), but only the latter change is visible to the caller.
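A small sketch of that rule; the function name advance_and_mark is made up for illustration:

```c
/* Hypothetical example: s is a copy of the caller's pointer. */
static void advance_and_mark(char *s)
{
    *s = 'X';  /* changes the data s points to: visible to the caller */
    s++;       /* changes only the local copy of the pointer: not visible */
}
```

After char text[] = "abc"; char *p = text; advance_and_mark(p); the array holds "Xbc", but p still points to text[0].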

In function calls, the name of an array "decays" to a pointer to the first element of an array. (This viewpoint is useful in that it also explains why sizeof(array)/sizeof(*array) returns the number of elements in array array, but passing the array to a function loses that size information, and instead yields (size of a pointer in bytes/size of the target type in bytes).)
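A short sketch of the size-information loss, with hypothetical function names; the array length of 10 is just an example value:

```c
#include <stddef.h>

/* Here values is a real array, so sizeof sees all of its elements. */
static size_t count_in_scope(void)
{
    int values[10] = { 0 };
    return sizeof values / sizeof *values;  /* number of elements in values */
}

/* Here array is only a pointer, whatever the caller passed in:
   sizeof yields the size of a pointer, not of the caller's array. */
static size_t size_seen_by_callee(const int *array)
{
    return sizeof array;
}
```

This is why functions that take an array usually take its element count as a separate parameter.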

These are not the rules C standard defines, but give you the correct intuition that you can then refine if need be.

If your function wants to modify a pointer, but doesn't return anything, return the modified pointer. For example:

Code: [Select]

const char *skip_whitespace(const char *src)
{
    if (src) {
        while (isspace((unsigned char)(*src)))
            src++;
    }
    return src;
}

or say

Code: [Select]

const char *trim(char *src, size_t *len)
{
    if (!src)
        return "";  /* NULL turns into an empty string! */

    /* Skip leading whitespace */
    while (isspace((unsigned char)(*src)))
        src++;

    /* Trim out trailing whitespace */
    char *end = src + strlen(src);
    while (end > src && isspace((unsigned char)(end[-1])))
        end--;
    *end = '\0';

    /* Save length, if requested. */
    if (len)
        *len = (size_t)(end - src);

    return src;
}

In some cases you know the pointer won't be modified. Then it is a good idea to consider what useful value the function calculates that a caller might be interested in. For example, you might need a function that trims a string, converting all linear whitespace to a single space, and returning the final length:

Code: [Select]

size_t trim_to_spaces(char *src)
{
    /* NULL and empty strings have length zero */
    if (!src || !*src)
        return 0;

    /* We keep src unmodified, but: */
    char *s = src;  /* Next source character */
    char *d = src;  /* Next destination character */

    /* Skip any leading whitespace. */
    while (isspace((unsigned char)(*s)))
        s++;

    /* Copy loop. */
    while (*s) {
        if (isspace((unsigned char)(*s))) {
            /* Skip all consecutive/linear whitespace, */
            do {
                s++;
            } while (isspace((unsigned char)(*s)));
            /* and replace it with a single space. */
            *(d++) = ' ';
        } else {
            *(d++) = *(s++);
        }
    }

    /* Remove the possible final space from output. */
    if (d > src && d[-1] == ' ')
        d--;

    /* String ends at d. */
    *d = '\0';

    /* Return the length of the result. */
    return (size_t)(d - src);
}

The idea is that the caller can do either trim_to_spaces(stringvar); or len = trim_to_spaces(stringvar); depending on whether the length of the trimmed and space-compacted string is useful or not.

Before I decide on the function prototype/interface, I like to write a small test case for the key points in the algorithm I want to implement, to see exactly what would be useful there. Unless I'm implementing something I'm already familiar with, I often discover a completely different way of implementing the algorithm than what I originally envisioned, by just changing the helper functions suitably.

A common way to describe variable-length byte data like strings while modifying them, is to use a structure similar to

Code: [Select]

typedef struct {
    char *data;
    size_t  size;  /* Allocated size, i.e. data[0..size-1] are valid accesses */
    size_t  used;  /* Number of bytes used in data, i.e. data[0..used-1] */
} area;
#define  AREA_INIT  { NULL, 0, 0 }

I do believe structures like the above are what Kjelt referred to, above.

Instead of passing three pointers (one to the data pointer, one to the allocated size, and one to the current length of the contents in the buffer), you just pass a pointer to the structure. Any changes the function does to the pointer are not visible to the caller, but any changes it makes to the structure contents are: perfect.

The AREA_INIT macro is useful in that if variables of type area are initially set to AREA_INIT, we don't need a separate "init" function. That is, you declare e.g. area result = AREA_INIT;.

For example, to append data to an area would then be

Code: [Select]

int area_append(area *dst, const void *src, const size_t len)
{
    if (!dst) {
        errno = EINVAL;
        return -1;
    }

    if (dst->used + len >= dst->size) {
        const size_t  new_size = dst->used + len + 1;  /* TODO: Growth policy? */
        void *new_data = realloc(dst->data, new_size);
        if (!new_data) {
            /* Old area is intact, but we cannot get more room.  So this fails, but is not fatal. */
            errno = ENOMEM;
            return -1;
        }
        dst->size = new_size;
        dst->data = new_data;
    }

    if (len > 0) {
        memcpy(dst->data + dst->used, src, len);
        dst->used += len;
    }

    /* In case it is string data, we append a nul byte, just to be nice. */
    dst->data[dst->used] = '\0';

    return 0;
}

and to append a C string,

Code: [Select]

int area_append_string(area *dst, const char *src)
{
    const size_t  len = (src) ? strlen(src) : 0;
    return area_append(dst, src, len);
}
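Regarding the growth-policy TODO in area_append() above: one common choice is to double the allocation until it is large enough, which keeps the number of realloc() calls low. A minimal sketch, assuming a hypothetical helper name area_grow_size and an arbitrary starting size of 16 bytes:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns an allocation size of at least `needed` bytes,
   obtained by doubling `current` (or a starting size of 16 if current is 0). */
static size_t area_grow_size(size_t current, size_t needed)
{
    size_t size = (current > 0) ? current : 16;  /* 16 is an arbitrary choice */

    while (size < needed) {
        if (size > SIZE_MAX / 2)
            return needed;  /* cannot double any further; use the exact size */
        size *= 2;
    }
    return size;
}
```

In area_append(), new_size could then be computed as area_grow_size(dst->size, dst->used + len + 1) instead of the exact size.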

When destroying/freeing an area, we return it into the initial state, so it can be reused:

Code: [Select]

void area_free(area *dst)
{
    if (dst) {
        free(dst->data);  /* Note: free(NULL) is safe, and does nothing. */
        dst->data = NULL;
        dst->size = 0;
        dst->used = 0;
    }
}

Kjelt · « **Reply #7 on:** November 10, 2021, 11:44:03 am »

Quote from: Nominal Animal on November 10, 2021, 10:34:32 am

I do believe structures like the above are what Kjelt referred to, above.

The guy is a newbie and I don't want him to get scared of structs with pointers or double pointers etc.

Just let him pass the three parameters and work it out, that is one level deep and should be comprehensible for beginners.
I have seen 20+ yr experienced programmers go on their behind on double pointer errors. I deal a lot with starter programmers from foreign countries, and the main task is to get them enthusiastic about programming,
and not get them scared by being too smart or showing off (not that you are doing that but you get the point)

mfro · « **Reply #8 on:** November 10, 2021, 12:02:48 pm »

Quote from: HwAoRrDk on November 09, 2021, 09:57:57 pm

In a C program, if I want to pass a pointer to a buffer in to a function, then within that function read from the buffer, but also increment the original pointer, is this the right way?

Code: [Select]
void foo(uint8_t **in) {
    uint8_t n;
    n = *(*in)++; // read a byte from buf, but also increment the original pointer
}

void bar() {
    uint8_t *buf;
    foo(&buf); // After returning, buf pointer will have been incremented
}

My brain is hurting at the moment, so I'm unsure if that's doing it correctly.

"if it was hard to write, it has to be hard to read".

Besides what others have mentioned already (it is most likely not a good idea for the callee to modify a pointer in the caller, etc.), adopt the ultimate wisdom of humanity since the Stone Age:

if you want to eat an elephant: cut it in slices first.

If you can't write something to be easily readable in a single line, write it in several. That's not to your disgrace, but to your reader's convenience (and yours, if you need to revisit the code).

Compiling code into something reasonably performant is the compiler's job (that's why it's called a compiler), not yours (at least not at such an early stage).

Nominal Animal · « **Reply #9 on:** November 10, 2021, 12:26:54 pm »

Quote from: Kjelt on November 10, 2021, 11:44:03 am

I have seen 20+ yr experienced programmers go on their behind on double pointer errors. I deal a lot with starter programmers from foreign countries, and the main task is to get them enthusiastic about programming,
and not get them scared by being too smart or showing off (not that you are doing that but you get the point)

True.

I probably should have left my post with just the initial part, and omitted the examples, as internalizing the pointer reading rule, functions passing variables by value and not by reference (so changes to a passed variable are not visible to the caller), and arrays decaying to pointers to their first elements, really covers most if not all "pain" regarding pointers. That alone is worth talking about and experimenting with for an hour or so.

In my defense, I really like helping others, but I often get excited and too verbose on the net, not having the face-to-face nonverbal feedback that tells me when I need to backtrack and try another approach.

The three string handling examples and the area structures are all cases I've seen others get initially horribly wrong, but spring back when playing with these examples, and discussing why they are written the way they are. Even the isspace((unsigned char)(*src)) is important, as without the (unsigned char) cast, it will fail on non-ASCII characters on OSes and architectures where char is a signed type, which usefully leads to a short side discussion about casting, and what it means. (Say, as opposed to type punning, leaving the latter for a later exercise.)
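A minimal sketch of that cast in use, with a made-up function name count_spaces. Passing a negative value other than EOF to isspace() is undefined behavior, which is exactly what an uncast non-ASCII byte becomes when char is signed:

```c
#include <ctype.h>
#include <stddef.h>

/* Counts whitespace characters in s; safe for bytes >= 0x80
   even on platforms where plain char is a signed type. */
static size_t count_spaces(const char *s)
{
    size_t count = 0;

    for (; *s; s++) {
        /* The cast maps every char value into 0..UCHAR_MAX,
           the range isspace() is defined for. */
        if (isspace((unsigned char)*s))
            count++;
    }
    return count;
}
```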

Quote from: mfro on November 10, 2021, 12:02:48 pm

If you can't write something to be easily readable in a single line, write it in several. That's not to your disgrace, but to your reader's convenience (and yours, if you need to revisit the code).

Very true. Writing readable, easily maintained code is much more important than writing concise or tricky code. That's why coding "tricks" are considered "evil" outside obfuscated code contests and code golf!

If you also learn to write comments that describe the programmer intent behind an operation or a function, say /* We sort the data (using Quicksort), so that we can use binary search to find the entries efficiently later on. */ instead of /* Sort the data */, you're already way ahead. I cannot overstate the value of that kind of commenting skill in real-life projects. I so wish I had learned to write better comments early on, because it is darned hard to learn afterwards.

DavidAlfa · « **Reply #10 on:** November 10, 2021, 02:32:49 pm »

I've done that recently, yes, can be done, but be very careful when modifying it...

Code: [Select]

void something(uint8_t **ptr) {
  // Increase the pointer pointed to by our pointer
  *ptr = *ptr + 1;
  // *ptr++ doesn't work.
}

Nominal Animal · « **Reply #11 on:** November 10, 2021, 03:20:27 pm »

That's because C operator precedence says that *ptr++; is equivalent to *(ptr++);.
You need to write (*ptr)++; or ++(*ptr); to increase the value the pointer points to.

Me, I don't rely on the C operator precedence at all, because the explicit parentheses make it easier to parse for us humans anyway, and the compiler does not mind.
(That is, adding parentheses even when not strictly necessary, does not change the compiled code at all.)
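A small sketch of the difference, using hypothetical function names bump_wrong and bump_right. The (void) cast is only there to silence the "value computed is not used" warning:

```c
#include <stdint.h>

/* *ptr++ parses as *(ptr++): only the local copy ptr (the uint8_t **)
   is advanced, so the caller's uint8_t * is left untouched. */
static void bump_wrong(uint8_t **ptr)
{
    (void)*ptr++;
}

/* (*ptr)++ increments the uint8_t * that ptr points to,
   i.e. the caller's pointer. */
static void bump_right(uint8_t **ptr)
{
    (*ptr)++;
}
```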

DavidAlfa · « **Reply #12 on:** November 10, 2021, 05:05:47 pm »

Nice explanation

Everytime I get such specific in-depth details I feel like "Unga bunga, me know program thing"

HwAoRrDk · « **Reply #13 on:** November 10, 2021, 05:07:21 pm »

Thank you for the 'C Pointers 101', but it really wasn't necessary.

Unfortunately, today I am one of those "20+ yr experienced programmers" (but not all in C) Kjelt referred to being put on his arse by brain fade induced by late night tiredness. Sometimes you just want to get a quick confirmation that what your brain is concocting from the dredged-up knowledge it can muster is actually on the right track.

I was more concerned with what has been latterly discussed with regard to operator precedence. Like, is *(*in)++ actually doing what I think it's doing? Apparently not. Guess I should do the pointer incrementation separately from the read.

Code: [Select]

void foo(uint8_t **in) {
    uint8_t n;
    n = *(*in);
    *in = *in + 1; // or maybe *in += 1?
}

To address some of the other points raised:

Quote from: Kjelt

Instead declare a buffer and pass the start of the buffer as a const pointer to the function, add an index as parameter and let the function increase the index. You never ever want to change the original pointer of your buffer and esp. Not in another function than the one who declared it (and owns it).. Memory leaks and other dangers are lurking.
Oh yes also good practice to pass the size of the buffer or you have another problem that the other function is accessing memory beyond the buffer.

This sub-function only ever (conditionally) reads one byte from the buffer (which is guaranteed to exist and be within bounds); passing an index argument to be incremented locally and/or a buffer size would be pointless. In fact, this function will only ever get called from one other function, because it is mostly just some syntactic sugar to tidy up and abstract logic a little. Decoupling its pointer manipulation from the parent function would be a waste of time.

Quote from: Nominal Animal

If your function wants to modify a pointer, but doesn't return anything, return the modified pointer.

The function does actually need to return something - I just elided that for simplicity. I might actually change it up so that the value returned is via an output argument pointer, and the incremented pointer is the return value.
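A sketch of what that reshaped interface might look like; the name read_u8 and the const qualifier on the input are assumptions, since the real function was not shown:

```c
#include <stdint.h>

/* Hypothetical reshaping of foo(): the byte goes out through *out,
   and the advanced pointer is the return value. */
static const uint8_t *read_u8(const uint8_t *in, uint8_t *out)
{
    *out = *in;     /* hand the byte back via the output argument */
    return in + 1;  /* the caller decides whether to store the new position */
}
```

The caller then writes buf = read_u8(buf, &value); which makes the pointer update visible at the call site.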

SiliconWizard · « **Reply #14 on:** November 10, 2021, 06:18:03 pm »

Quote from: DavidAlfa on November 10, 2021, 02:32:49 pm

I've done that recently, yes, can be done, but be very careful when modifying it...
Code: [Select]
void something(uint8_t **ptr) {
  // Increase the pointer pointed to by our pointer
  *ptr = *ptr + 1;
  // *ptr++ doesn't work.
}

'*ptr++' doesn't work here, as you said, but '(*ptr)++' does, as the OP actually did!
All the lecturing after that about semantics and style was interesting and informative, but at least I think we can congratulate the OP on getting it right, which is not that common for people still uncomfortable with pointers!

DavidAlfa · « **Reply #15 on:** November 10, 2021, 08:30:31 pm »

I rarely used pointers a few years ago, as most of my C code was for small microcontrollers with little RAM, little flash, no dynamic allocation, and mostly done using state machines.
Honestly, not so long ago the "->" was "the strange arrow I see sometimes".

With STM32 and PIC32 everything changed.
With a lot more power, statically allocating everything limited how far a program could grow, so I learned a lot of new things, including pointers.
It's like those things that you never knew about, but when you discover them, you think: where have you been all my life?

Nominal Animal · « **Reply #16 on:** November 12, 2021, 10:56:07 am »

Quote from: HwAoRrDk on November 10, 2021, 05:07:21 pm

Thank you for the 'C Pointers 101', but it really wasn't necessary. Unfortunately, today I am one of those "20+ yr experienced programmers" (but not all in C) Kjelt referred to being put on his arse by brain fade induced by late night tiredness.

Don't be offended, though; we didn't know, and in any case, these posts are indexed by search engines, and it is common for others to stumble on the thread afterwards.

I myself do not write "answers" or "help" dedicated to the asker, but try to expand a bit, covering not just the asker's particular requirements, but also some of the alternatives in case someone else has the same problem, but slightly different requirements.

The reason is that I just cannot do the focused, narrow answers without any context; and helping only the original asker my way is unlikely to be worth anything to anyone. Call it a personality flaw of mine. But at no time should the depth or style of my answers be taken as indicative of my understanding of the asker's knowledge: I always write in the "101 style", trying to keep "everyone along", because I cannot help it. I've accidentally pissed off many members because they thought I was unaware of their knowledge, but that's not the case; I just have that urge of trying to explain things so that even interested passersby can gain from the discussion details, that I cannot seem to control.

Quote from: HwAoRrDk on November 10, 2021, 05:07:21 pm

I was more concerned with what has been latterly discussed with regard to operator precedence. Like, is *(*in)++ actually doing what I think it's doing? Apparently not. Guess I should do the pointer incrementation separately from the read.

Or, do like I do, and use "extra" parentheses to ensure the expression does what you want it to do: *((*in)++) to dereference the pointer twice, and post-increment the dereferenced pointer, or (*(*in))++ to dereference the pointer twice, and post-increment the twice-dereferenced value.
(Well, actually, now that I look at that, I too would prefer to split the dereference and increment to separate statements.)
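The two parenthesized forms side by side, as a sketch with made-up function names:

```c
#include <stdint.h>

/* *((*in)++): reads the byte, then advances the caller's pointer. */
static uint8_t read_and_advance(uint8_t **in)
{
    return *((*in)++);
}

/* (*(*in))++: leaves the caller's pointer alone and increments the byte
   it points to; the value returned is the byte's old value. */
static uint8_t read_and_increment_value(uint8_t **in)
{
    return (*(*in))++;
}
```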

I just cannot memorize details like that; my memory does not work that way. But I can effectively compensate (by using parentheses, and relying on man pages for standard C APIs, like whether memset() takes (pointer, size, value) or (pointer, value, size) – a surprisingly common bug!), and the resulting code is explicit, more reliable (because I don't *trust*, I *check*), and just as efficient as if I relied on the inherent operator precedence order.
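For the record, the standard order is memset(pointer, value, size). A tiny sketch with a hypothetical wrapper name clear_buffer:

```c
#include <string.h>

/* Zeroes `size` bytes starting at buf. */
static void clear_buffer(unsigned char *buf, size_t size)
{
    memset(buf, 0, size);  /* correct order: pointer, value, size */

    /* The swapped call, memset(buf, size, 0), still compiles,
       but it fills zero bytes, so it silently does nothing. */
}
```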

Yeah, I know that cow-orkers can make snide remarks when they see I always have a terminal window open on a man page, or a browser window open to Linux man-pages online (which are useful on non-Linux OSes too, because each page in sections 2 and 3 has a Conforming To section, describing where and when one can expect the function or facility to be available). After a few months, when they get familiar with my output, they tend to change their opinion and adopt the same approach. (It's the higher ups and nontechnical people, who don't get that results are more important than appearance.)

brucehoult · « **Reply #17 on:** November 12, 2021, 12:14:47 pm »

Quote from: HwAoRrDk on November 09, 2021, 09:57:57 pm

In a C program, if I want to pass a pointer to a buffer in to a function, then within that function read from the buffer, but also increment the original pointer, is this the right way?

Code: [Select]
void foo(uint8_t **in) {
    uint8_t n;
    n = *(*in)++; // read a byte from buf, but also increment the original pointer
}

It's perfectly correct, but perhaps not the best way to do it.

I'm not sure what you're planning to do with n. As it stands the compiler will just optimise everything away.

Let's say it's a global:

Code: [Select]

#include <stdint.h>

uint8_t n;

void foo(uint8_t **in) {
    n =  *(*in)++; // read a byte from buf, but also increment the original pointer
}

It's pretty useful to compile the code to assembly language and see what it does ... https://godbolt.org/z/r1bdb1Ers ... with added comments:

Code: [Select]

foo:
        lw      a5,(a0)   # fetch the pointer to the buffer from memory
        addi    a4,a5,1  # increment the pointer value
        sw      a4,(a0)   # store the incremented value back to the original pointer
        lbu     a4,(a5)   # load the byte pointed to by the original value of the pointer
        lui     a5,%hi(n) # load the high bits of the address of n
        sb      a4,%lo(n)(a5) # store the byte from the buffer into the global n
        ret
n:
        .zero   1

An annoying and inefficient thing about this is that the compiler is forced to store the pointer to the buffer in memory, even if it is only held in a register in the calling function.

A better way can be to just pass the pointer, and pass back the updated pointer: https://godbolt.org/z/foz6af9nq

Code: [Select]

#include <stdint.h>

uint8_t n;

uint8_t* foo(uint8_t *in) {
    n =  *in++;
    return in;
}

This produces much less machine code:

Code: [Select]

foo:
        lbu     a4,(a0)  # get a byte from the buffer
        lui     a5,%hi(n) # load the high bits of the address of n
        sb      a4,%lo(n)(a5)  # store the byte to n
        addi    a0,a0,1 # add 1 to the pointer and return it
        ret
n:
        .zero   1

This version has only the (logically necessary) one memory load instead of two, and doesn't need to store the updated pointer value back to memory.

On a modern CPU, using registers is much faster than using memory. And of course fewer instructions are faster than more instructions -- at least in a RISC instruction set. CISC code can look like fewer instructions, but expand to a lot of hidden, expensive micro-ops.

SiliconWizard · « **Reply #18 on:** November 13, 2021, 12:32:06 am »

You're making a good point, although it goes into the optimization category and may be a bit much here. Another two points I'm thinking of: your remark applies to functions that are *called*, but in all cases where said function can be inlined, I think the compiler will probably implement this using registers (when it can). Another point of course is that returning the updated pointer works if you have only one such parameter. As soon as you have more than one, you can't.

Well, to be fair though: you may 1/ reply that having more than one 'pointer to pointer' as arguments in a function might be bad style (dunno) and 2/ that even so, you can use return values using structs. Yeah, returning structs is a bit unusual in C, but it's not forbidden. And if you return say two pointers in a struct, they will still be returned in a pair of registers (usually). I actually do not dislike this style - it's kind of a functional style.

Something like:

Code: [Select]

typedef struct { uint8_t *in; uint8_t *out; } InOutPtrs_t;

InOutPtrs_t foo(InOutPtrs_t InOut)
{
    *InOut.out++ = *InOut.in++;
    return InOut;
}

On RISC-V, you get:

Code: [Select]

foo:
        lbu     a5,0(a0)
        addi    sp,sp,-32
        addi    a0,a0,1
        sb      a5,0(a1)
        addi    a1,a1,1
        addi    sp,sp,32
        jr      ra

Not too shabby! The only thing I'm wondering - sorry if this is yet another brain fart due to time - is why GCC manipulates the sp register while the stack is not even used.
(GCC 11.1.0 here, default RISCV target, -O3)

brucehoult · « **Reply #19 on:** November 13, 2021, 01:50:21 am »

Quote from: SiliconWizard on November 13, 2021, 12:32:06 am

your remark applies to functions that are *called*, but in all cases where said function can be inlined, I think the compiler will probably implement this using registers (when it can).

Yes, if the function is inlined, and that's the only thing taking the address of the pointer, then the taking of the address and subsequent dereference won't be done and it won't be forced to memory.

Quote

Well, to be fair though: you may 1/ reply that having more than one 'pointer to pointer' as arguments in a function might be bad style (dunno) and 2/ that even so, you can use return values using structs. Yeah, Returning structs is a bit unusual in C, but it's not forbidden. And if you return say two pointers in a struct, they will still be returned in a pair of registers (usually). I actually do not dislike this style - it's kind of a functional style.

Right.

What I don't quite understand is why most ABIs limit this to two return registers. It makes perfect sense to use as many registers for function returns as for function arguments -- it's just tail calling the continuation (where the continuation address is passed in ra).

Quote

The only thing I'm wondering - sorry if this is yet another brain fart due to time - is why GCC manipulates the sp register while the stack is not even used.

Bug. It's presumably the result of not running the stack optimisation pass (again?) after doing the "return struct in two registers" optimisation pass.

Clang doesn't touch sp when compiling the same function.

I hope people don't mind using RISC-V for these kinds of examples. I feel the assembly language is significantly more transparent and less cluttered to read than even ARM or MIPS.

SiliconWizard · « **Reply #20 on:** November 13, 2021, 01:57:11 am »

Quote from: brucehoult on November 13, 2021, 01:50:21 am

Quote
Well, to be fair though: you may 1/ reply that having more than one 'pointer to pointer' as arguments in a function might be bad style (dunno) and 2/ that even so, you can use return values using structs. Yeah, Returning structs is a bit unusual in C, but it's not forbidden. And if you return say two pointers in a struct, they will still be returned in a pair of registers (usually). I actually do not dislike this style - it's kind of a functional style.

Right.

What I don't quite understand is why most ABIs limit this to two return registers. It makes perfect sense to use as many registers for function returns as for function arguments -- it's just tail calling the continuation (where the continuation address is passed in ra).

Yes, this is too bad. And yes I think people should be more "aware" of this functional-like approach (which is not commonly seen in C) which looks more elegant, and can even be more efficient. And less bug-prone.

Quote from: brucehoult on November 13, 2021, 01:50:21 am

Quote
The only thing I'm wondering - sorry if this is yet another brain fart due to time - is why GCC manipulates the sp register while the stack is not even used.

Bug. It's presumably the result of not running the stack optimisation pass (again?) after doing the "return struct in two registers" optimisation pass.

Thought so. I'm a bit disappointed currently with GCC for RISC-V. I think it was going better a year or two ago.

Quote from: brucehoult on November 13, 2021, 01:50:21 am

I hope people don't mind using RISC-V for these kinds of examples. I feel the assembly language is significantly more transparent and less cluttered to read than even ARM or MIPS.

Well, it might be a bit harder for people not knowing the RISC-V ISA, but I agree that, compared to ARM, MIPS or even x86, it looks cleaner and is easier to read.
The same function in x86_64 assembly:

Code: [Select]

foo:
        movdqu  (%rdx), %xmm0
        movq    %xmm0, %rdx
        movhlps %xmm0, %xmm1
        paddq   .LC0(%rip), %xmm0
        movq    %rcx, %rax
        movzbl  (%rdx), %ecx
        movq    %xmm1, %rdx
        movups  %xmm0, (%rax)
        movb    %cl, (%rdx)
        ret

Yeah. OK.

brucehoult · « **Reply #21 on:** November 13, 2021, 02:59:31 am »

ARMv7 gcc ... all in memory...

Code: [Select]

foo:
        sub     sp, sp, #8
        add     ip, sp, #8
        stmdb   ip, {r1, r2}
        ldrb    ip, [r1], #1    @ zero_extendqisi2
        strb    ip, [r2], #1
        str     r1, [r0]
        str     r2, [r0, #4]
        add     sp, sp, #8
        bx      lr

ARMv7 Clang ... cleaner, but the returned struct is still in memory...

Code: [Select]

foo:
        ldrb    r3, [r1], #1
        strb    r3, [r2], #1
        stm     r0, {r1, r2}
        bx      lr

ARMv8 (er, Aarch64) gcc...

Code: [Select]

foo:
        ldrb    w2, [x0], 1
        strb    w2, [x1], 1
        ret

Better.

Neither ARM assembly language makes it all that clear what the trailing ", 1" or ", #1" does, for those not familiar.

gcc in general seems to be suffering recently. The code base is far more annoying to work on, and getting approved to upstream changes has been a PITA forever. The focus of most compiler developers has moved on to LLVM probably five or six years ago at least, and LLVM has now overtaken gcc in quality for most ISAs.

PlainName · « **Reply #22 on:** November 13, 2021, 09:02:19 pm »

Quote

A better way can be to just pass the pointer, and pass back the updated pointer:
...
This produces much less machine code:

But it's doing less work - something else, which isn't accounted for here, has to update the pointer to achieve the same effect as the first version.

brucehoult · « **Reply #23 on:** November 13, 2021, 09:23:40 pm »

Quote from: dunkemhigh on November 13, 2021, 09:02:19 pm

Quote
A better way can be to just pass the pointer, and pass back the updated pointer:
...
This produces much less machine code:

But it's doing less work - something else, which isn't accounted for here, has to update the pointer to achieve the same effect as the first version.

"pass back the updated pointer"

Achieving the same result while doing less work is the point.

PlainName · « **Reply #24 on:** November 13, 2021, 09:37:50 pm »

The point was the example assembler for it is doing less because it's not updating the pointer. It may be passing it back, but that does nothing until the caller uses it. To be equivalent, you'd need to show the assembler for the calling function doing the update.

I'm surprised that wasn't clear so perhaps I've missed something?
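For reference, the two calling patterns under discussion would look roughly like this. It is only a sketch with hypothetical names (foo_pp, foo_ret, consume_pp, consume_ret), and it makes no claim about the code any particular compiler emits for the callers:

```c
#include <stdint.h>

static uint8_t n;  /* global, as in the examples earlier in the thread */

/* Pointer-to-pointer version: the callee updates the caller's pointer. */
static void foo_pp(uint8_t **in)
{
    n = *(*in)++;
}

/* Return-value version: the callee hands back the advanced pointer. */
static uint8_t *foo_ret(uint8_t *in)
{
    n = *in++;
    return in;
}

/* Caller for the first version: the address of buf is passed to the callee. */
static uint8_t consume_pp(uint8_t *buf, int count)
{
    while (count-- > 0)
        foo_pp(&buf);
    return n;  /* last byte read */
}

/* Caller for the second version: the update is a plain assignment here. */
static uint8_t consume_ret(uint8_t *buf, int count)
{
    while (count-- > 0)
        buf = foo_ret(buf);
    return n;  /* last byte read */
}
```

Compiling both callers to assembly, in the same way as the earlier examples, would be the way to compare the complete cost of each approach.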

