Author Topic: Oh, C3! (Read 6193 times)

SiliconWizard · « **Reply #25 on:** April 23, 2023, 08:54:50 pm »

Quote from: Siwastaja on April 22, 2023, 05:39:34 am

Quote from: SiliconWizard on April 21, 2023, 08:08:33 pm
'nextcase' doesn't allow jumping around freely, only to the next 'case', as far as I understand it at least.

No, look closer - clearly that was their original idea, but they feature creeped their "fallthrough" to take an argument, allowing you to jump to any other cases, in any order. I'm ridiculing this feature.

Oh, you're right. You had a closer look than I did.

There's this infamous "labelled nextcase". https://c3-lang.org/statements/#nextcase-and-labelled-nextcase

It's not merely a "label": it's an expression that can be evaluated at run time, and it acts as though control flow looped back to the switch with the value given by the "label" (which, again, isn't a label).

That doesn't look good.

Oh, and incidentally, with this construct, if the expression given to nextcase evaluates to a value that is not handled by any 'case', then it becomes an infinite loop. Nice!!

Or does the switch just exit in this case? Who knows, not sure I saw that clearly.

That said, the guy maybe had the typical state machine construct in mind. For which I would favor setting a variable with the next state, rather than directly controlling the flow anyway.
His approach avoids having to put the switch inside a loop, but it then potentially creates an implicit loop, which is horrific.

The benefit of using a variable holding the state is that it's much easier to trace. If you control the flow directly in many places, it makes tracing much more tedious.
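
To illustrate the state-variable style described above, here is a minimal sketch in plain C. The word-counting task, the state names, and the function name `count_words` are made up for this example and do not come from C3 or from the thread. Every transition goes through the single `state` variable, so a breakpoint or a log line on that one variable is enough to trace the flow.

```c
#include <stddef.h>

/* Hypothetical states for a tiny scanner that counts words in a string. */
enum state { ST_SPACE, ST_WORD, ST_DONE };

/* Counts space-separated words. The explicit loop and the 'state'
   variable make every transition visible in one place. */
size_t count_words(const char *s)
{
    enum state state = ST_SPACE;
    size_t words = 0;

    while (state != ST_DONE) {
        char c = *s;

        switch (state) {
        case ST_SPACE:
            if (c == '\0') {
                state = ST_DONE;
            } else if (c != ' ') {
                words++;            /* first character of a new word */
                state = ST_WORD;
            }
            break;
        case ST_WORD:
            if (c == '\0')
                state = ST_DONE;
            else if (c == ' ')
                state = ST_SPACE;
            break;
        case ST_DONE:
            break;
        }

        if (state != ST_DONE)
            s++;                    /* consume the character just examined */
    }
    return words;
}
```

The loop is explicit here, which is exactly the part a labelled nextcase would hide.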

PlainName · « **Reply #26 on:** April 25, 2023, 06:45:31 pm »

Quote

I'm sick and tired of bugs introduced by using sizeof() on an array when you should have used sizeof(a)/sizeof(a[0])

All my projects have:

Code: [Select]

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
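
A small sketch of the difference the macro makes, assuming a made-up `table` array and helper names: `sizeof(table)` is the size in bytes, while `ARRAY_SIZE(table)` is the number of elements.

```c
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical lookup table used only for this illustration. */
static const int table[] = { 10, 20, 30, 40, 50 };

/* Element count: what a loop bound needs. */
size_t table_count(void) { return ARRAY_SIZE(table); }

/* Byte count: what a bare sizeof gives, and the source of the bug
   when it is used as a loop bound by mistake. */
size_t table_bytes(void) { return sizeof(table); }

/* Sums every element, using the element count as the bound. */
int table_sum(void)
{
    int sum = 0;
    for (size_t i = 0; i < ARRAY_SIZE(table); i++)
        sum += table[i];
    return sum;
}
```

Note that the macro silently gives a wrong answer when handed a pointer (for example an array parameter that has decayed inside a function), since `sizeof` then measures the pointer itself.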

Nominal Animal · « **Reply #27 on:** April 25, 2023, 07:19:46 pm »

Ever since the last C-related programming language discussion, I've now and then examined the typical cases where human programmers make most errors in C.

I'm pretty darned convinced that the only fundamental change needed (in addition to the various syntax etc. additions we have discussed here in other threads previously) would be to replace pointers with arrays as the base memory reference type. (There are details wrt. read-only strings I'm not sure about, though.) That would let the compiler do compile-time memory access validation, helping kill buffer-related bugs. All other changes could be done incrementally, by replacing the C standard library.

Whenever I do such experiments (i.e., what would the code look like and what would it compile to, if I tweaked the compiler and libraries just so), I always discover that the end result I desire is obtainable by a smaller 'real' change (but a larger change paradigm/approach/theory-wise) than one would initially assume or believe. Furthermore, the things most developers get stuck on (myself included, unless I monitor myself to explicitly avoid this) end up not affecting the actual language use much; only how it looks on the surface, and how it can be described to other people. Unimportant fluff, in other words.

To repeat from those other threads, I would like additional features that provide an unordered (data-parallel) for loop construct, as well as a way to tell the computer that two independent code sections can be interleaved (that their relative order is unimportant, perhaps at block level). But these are optimizations, things that are somewhat difficult for compilers to do under current C rules; and I have not verified what kind of constructs would be needed, or what changes to the C standard abstract machine model would be required to address these. The all-pointers-are-actually-arrays change, however, would be surprisingly straightforward.

PlainName · « **Reply #28 on:** April 25, 2023, 08:36:25 pm »

Quote

replace pointers with arrays as the base memory reference type

Why would that be better?

DiTBho · « **Reply #29 on:** April 25, 2023, 09:43:49 pm »

Quote from: PlainName on April 25, 2023, 08:36:25 pm

Quote
replace pointers with arrays as the base memory reference type

Why would that be better?

because they can carry their bounds and size along { begin, end, size }

Nominal Animal · « **Reply #30 on:** April 25, 2023, 09:53:48 pm »

Quote from: PlainName on April 25, 2023, 08:36:25 pm

Quote
replace pointers with arrays as the base memory reference type
Why would that be better?

It makes it possible for the compiler (with current gcc and clang static analysis/warning capabilities) to verify buffer accesses are valid. (That is, the compiler knows at compile time if each access is "valid" (safe, within the buffer), "invalid" (overrun/underrun), or "undetermined"; with the last one only affecting code that uses pointers or tricky indexing math whose limits are unknown at compile time.)

To explore this yourself, write some test code using the pattern
type1 somefunc(size_t len, type2 buffer[len], ...)
i.e. instead of pointers, you pass an array; and to avoid having the array auto-decay to a pointer, you need to specify its size too.
If you introduce a typical buffer overrun bug in such a function, no matter how deep in a call chain, the compiler will tell you if you enable the relevant warnings. There are no added run-time checks at all; look at the generated machine code too.
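
A minimal sketch of that pattern, for experimenting. The function names `sum_ints` and `sum_tail` are invented for this example. With warnings such as GCC's -Warray-bounds or -Wstringop-overflow enabled, some mistakes of this kind can be reported at compile time; exactly which ones get diagnosed depends on the compiler version, the flags, and the optimization level, so treat this as something to try rather than a guarantee.

```c
#include <stddef.h>

/* 'len' has to come before the array so that it is in scope
   when the parameter 'buf[len]' is declared. */
long sum_ints(size_t len, const int buf[len])
{
    long total = 0;
    /* Writing 'i <= len' here would be the classic off-by-one overrun. */
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}

/* Passes a smaller sub-array down the call chain: the last 'n'
   elements of 'buf' (or all of it, if 'n' exceeds 'len'). */
long sum_tail(size_t len, const int buf[len], size_t n)
{
    if (n > len)
        n = len;
    return sum_ints(n, buf + (len - n));
}
```

The generated machine code is the same as for the pointer-plus-length version; only the declared parameter types differ.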

This alone does not do anything for existing code, say for example strlen(). The idea is to use the change to rewrite the standard library in a form where pointers are replaced with array references. Currently it is a bit cumbersome, for example strlen() would best be written as
ssize_t strlen(size_t len, const char s[len]);
so at minimum we'd need the compiler to allow the size of a parameter array to be defined later in the explicit parameter list (with ssize_t stolen from POSIX; C currently uses int for it, which is problematic on LP64 architectures).

The necessary changes, as I said, are surprisingly small. Their effect on how easily efficient code can be statically analyzed, however, is surprisingly large. You really do need to experiment with it to see the possibilities. As I also said, read-only/immutable strings have peculiarities I'm not sure yet how best to deal with, but basically the rest of POSIX C -like functionality (i.e., with different API/function signatures, but same or very similar functionality) is quite straightforward. Oh, and functions allocating or reallocating memory, returning an array reference, may need syntactic sugar (as currently they really need to return a struct containing the start address of the allocated memory, and length in bytes).

(What I am not sure about yet, is whether we need a new non-scalar base "type" with two properties, start address and length. Currently, we can do that and slicing (three properties: start address, step size, and count), just fine using structures. But there might be additional compiler optimization/compile time static analysis opportunities, if it was a base type to begin with.)
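
For reference, the structure-based slice mentioned above (start address, step size, count) can be sketched in current C roughly as follows. The type and function names are hypothetical and not from any library.

```c
#include <stddef.h>

/* Hypothetical slice type: three properties, as described above. */
struct int_slice {
    int   *start;   /* address of the first element */
    size_t step;    /* distance between elements, counted in elements */
    size_t count;   /* number of elements reachable through the slice */
};

/* Returns a pointer to element 'i' of the slice, or NULL if 'i'
   is outside it. */
int *slice_at(struct int_slice s, size_t i)
{
    return (i < s.count) ? s.start + i * s.step : NULL;
}

/* Builds a slice covering every second element of a 'len'-element
   array, starting at index 0. */
struct int_slice slice_even(size_t len, int buf[len])
{
    struct int_slice s = { buf, 2, (len + 1) / 2 };
    return s;
}
```

Here the bounds check happens at run time inside `slice_at()`; the open question in the post is whether a built-in type would let the compiler do more of this statically.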

PlainName · « **Reply #31 on:** April 25, 2023, 10:02:12 pm »

Quote from: Nominal Animal on April 25, 2023, 09:53:48 pm

Quote from: PlainName on April 25, 2023, 08:36:25 pm
Quote
replace pointers with arrays as the base memory reference type
Why would that be better?
It makes it possible for the compiler (with current gcc and clang static analysis/warning capabilities) to verify buffer accesses are valid. (That is, the compiler knows at compile time if each access is "valid" (safe, within the buffer), "invalid" (overrun/underrun), or "undetermined"; with the last one only affecting code that uses pointers or tricky indexing math whose limits are unknown at compile time.)

Ah! Of course, I was stuck in pointer mode thinking the array would be passed as a pointer and just look like an array to the programmer. But I see now that's not the idea

SiliconWizard · « **Reply #32 on:** April 25, 2023, 11:11:07 pm »

That's more or less akin to always using the base pointer to an allocated block (rather than accessing it through a pointer that could point arbitrarily inside, or even outside of it) and some index for accessing its content. You also need to store the size.

After which it does look like a full-fledged array indeed.

That's something you can always do in pure C though, even if that means a bit more programming overhead and possibly a bit less opportunity for optimization (even though that would remain to be seen in practice.)

I wrote a header file years ago that I still use to this day (with some minor evolutions). It exposes a few macros to encapsulate dynamic memory management, including "dynamic arrays", which are accessed via indices only (either with or without bounds checking, depending on the use case). I haven't directly called any malloc/realloc/free ever since. The runtime overhead is either zero or extremely small.

Sure having that built-in would be nice, but point is, this approach can still be - at least in essence - used without designing a new language. I for one wouldn't use C without this small "library" I wrote more than a decade ago, at least for anything requiring dynamic allocations. For pure static allocation stuff, part of it can still be used.

The basic idea is to create a type for your 'arrays', something like this:

Code: [Select]

typedef struct
{
    BaseType *Block;    // pointer to your memory block, from static or dynamic allocation
    size_t n;    // current number of items of type 'BaseType' in the memory block
    size_t nMax;    // max number of items of type 'BaseType' in the memory block
}   Array_t;

That can be initialized from a statically-allocated array just like so:

Code: [Select]

BaseType Array[xxx];

Array_t MyArray = { .Block = Array, .n = 0, .nMax = ARRAY_SIZE(Array) };

The variant for dynamically-allocated arrays is also easy. With this simple construct, one can see that handling dynamic arrays becomes 'straightforward'.

Accessing it is just:

Code: [Select]

MyArray.Block[index] // no bounds-checking, can be used with zero overhead when guarantees about 'index' are sufficient
MyArray.Block[index < MyArray.n? index : MyArray.n - 1] // bounds-checking, default to "saturating" the index (assumes MyArray.n > 0; with n == 0, 'n - 1' wraps around)

// You can add variants of bounds-checking that will execute some code in case of an out-of-bounds condition if needed.

With a few macros, that can be declared, and manipulated for just any base type very easily.
Some will find that clunky, especially if they don't like macros, as doing this without macros will be even clunkier.
Others will do this kind of stuff with macros and move on.
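
As a rough idea of what such macros could look like (this is a guess at the general shape, not the actual header mentioned above; the macro and function names are invented):

```c
#include <stddef.h>

/* Hypothetical generator: for a given element type, declares an array
   struct (Block / n / nMax as in the post) plus two small accessors.
   Name_push appends an item and returns 0, or -1 if the array is full.
   Name_get returns item 'i', or 'fallback' if 'i' is out of bounds. */
#define DECLARE_ARRAY_TYPE(Name, BaseType)                          \
    typedef struct {                                                \
        BaseType *Block;                                            \
        size_t    n;                                                \
        size_t    nMax;                                             \
    } Name;                                                         \
    static inline int Name##_push(Name *a, BaseType v)              \
    {                                                               \
        if (a->n >= a->nMax)                                        \
            return -1;                                              \
        a->Block[a->n++] = v;                                       \
        return 0;                                                   \
    }                                                               \
    static inline BaseType Name##_get(const Name *a, size_t i,      \
                                      BaseType fallback)            \
    {                                                               \
        return (i < a->n) ? a->Block[i] : fallback;                 \
    }

/* One line per element type; adding a member later means editing
   only the macro. */
DECLARE_ARRAY_TYPE(IntArray, int)
DECLARE_ARRAY_TYPE(DoubleArray, double)
```

Because each generated type keeps its own element type, passing an `IntArray` where a `DoubleArray` is expected is still a compile-time error.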

In any case, you'll indeed realize that directly playing with pointers is rarely necessary, and can be left to the very occasional and very low-level stuff.

The benefits of keeping all your pointers only pointing to *objects*, rather than potentially pointing arbitrarily *inside* an object are multiple.

PlainName · « **Reply #33 on:** April 25, 2023, 11:43:22 pm »

I do a similar thing but use functions rather than macros. There's an overhead in the function call, but more scope to mess around when debugging without affecting anything else. Also doesn't rely on the programmer remembering to use the safe access when appropriate

Nominal Animal · « **Reply #34 on:** April 26, 2023, 12:12:52 am »

It is very interesting (but unclear) to me exactly why most C programmers – myself included – prefer
rettype funcname(elemtype *ptr, size_t len);
over
rettype funcname(size_t len, elemtype buf[len]);
even though the only difference in machine code is the order of parameters, and the latter API pattern allows much better buffer access checking at compile time, even through deep call chains (each call limiting to a smaller sub-array), helping catch buffer underrun/overrun errors.

The easy answer is inertia (or habit or familiarity or because everyone else does it that way too), but I'm not sure it is the whole answer.
Isn't it interesting how rarely anything like this (arrays-not-pointers) is suggested for "the next C", even though memory or buffer over/underrun bugs are the most common issues in C code?

Quote from: PlainName on April 25, 2023, 10:02:12 pm

Quote from: Nominal Animal on April 25, 2023, 09:53:48 pm
Quote from: PlainName on April 25, 2023, 08:36:25 pm
Quote
replace pointers with arrays as the base memory reference type
Why would that be better?
It makes it possible for the compiler (with current gcc and clang static analysis/warning capabilities) to verify buffer accesses are valid. (That is, the compiler knows at compile time if each access is "valid" (safe, within the buffer), "invalid" (overrun/underrun), or "undetermined"; with the last one only affecting code that uses pointers or tricky indexing math whose limits are unknown at compile time.)
Ah! Of course, I was stuck in pointer mode thinking the array would be passed as a pointer and just look like an array to the programmer. But I see now that's not the idea

Yep. It's more like a cultural change than a technical one, even though its purpose is purely technical: help with compile-time static analysis wrt. buffer accesses.

Quote from: SiliconWizard on April 25, 2023, 11:11:07 pm

That's more or less akin to always using the base pointer to an allocated block (rather than accessing it through a pointer that could point arbitrarily inside, or even outside of it) and some index for accessing its content. You also need to store the size.

Actually, what I want is for the compiler to be aware of the size whenever it is known at compile time.

If you consider the two funcname() definitions at the beginning of this post, you can clearly see the difference between the pointer and the array approach. This difference is the critical one; it is not about adding explicit size information to interfaces that currently use a pointer only. (Except for memory allocation functions: these should return both the allocated size and the base address, instead of just the base address. This would actually be desirable in many grow-as-needed use cases, considered completely separately. Oh, and possibly the string functions, which deserve to be redesigned anyway.)

Quote from: SiliconWizard on April 25, 2023, 11:11:07 pm

That's something you can always do in pure C though, even if that means a bit more programming overhead and possibly a bit less opportunity for optimization (even though that would remain to be seen in practice.)

Note that the change would not cause any change to runtime code, no inherent additional runtime memory or CPU overhead at all.

Many string functions would actually add an explicit size parameter (ABI-wise), but I consider that a plus (and a deficiency in current standard C library string functions). I've discussed the related issues especially in embedded environments before; let's just say that string handling can be done much better (faster, more reliably) even in current C than what the standard C library provides.

Passing an array forwards is trivial even in current C (since C99), although the size of the array must be before the array in the parameter list, but receiving an array from a function call is not supported. Thus far, in my experiments I've simply assumed syntax "elemtype arrayname[sizetype count] = ...;" (declaring two variables at once, initialized by a single function call returning both the base pointer and the size, with the size divided by the element size to obtain the count "automagically"), but I'm sure better syntax can be devised.
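
In current C, the "receive an array from a function" half has to be emulated with a structure. A minimal sketch, with invented names (`struct int_block`, `alloc_ints`, `fill_ints`):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result type: base address and element count together. */
struct int_block {
    int   *base;
    size_t count;
};

/* Allocates 'count' zero-initialized ints. On failure, or when
   'count' is zero, returns { NULL, 0 }. */
struct int_block alloc_ints(size_t count)
{
    struct int_block b = { NULL, 0 };
    if (count > 0) {
        b.base = calloc(count, sizeof *b.base);
        if (b.base)
            b.count = count;
    }
    return b;
}

/* Consumer written in the array-parameter style: size first. */
void fill_ints(size_t len, int buf[len], int value)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = value;
}
```

The caller then writes `fill_ints(b.count, b.base, 0)`, carrying the two fields by hand; the proposed syntax would let the compiler keep them together.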

Quote from: SiliconWizard on April 25, 2023, 11:11:07 pm

The basic idea is to create a type for your 'arrays', something like this:
Code: [Select]
typedef struct
{
    BaseType *Block;    // pointer to your memory block, from static or dynamic allocation
    size_t n;    // current number of items of type 'BaseType' in the memory block
    size_t nMax;    // max number of items of type 'BaseType' in the memory block
}   Array_t;

Yes, I use this pattern extensively. For some reason, I use 'used' for the current number of items, and 'size' for the maximum number of items, and 'item' for the pointer or C99 flexible array member. It is very common to see a variant of
typedef struct {
size_t size;
size_t used;
elemtype item[];
} elem_array;
in my code.
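
A sketch of how such a flexible-array-member structure might be allocated and filled, using int as the element type. The helper names `int_array_new` and `int_array_append` are made up for this example.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t size;    /* maximum number of items */
    size_t used;    /* current number of items */
    int    item[];  /* C99 flexible array member */
} int_array;

/* Allocates header and 'size' items in a single block.
   Returns NULL on overflow or allocation failure. */
int_array *int_array_new(size_t size)
{
    if (size > ((size_t)-1 - sizeof(int_array)) / sizeof(int))
        return NULL;

    int_array *a = malloc(sizeof *a + size * sizeof a->item[0]);
    if (a) {
        a->size = size;
        a->used = 0;
    }
    return a;
}

/* Appends one item; returns 0 on success, -1 if the array is full. */
int int_array_append(int_array *a, int value)
{
    if (a->used >= a->size)
        return -1;
    a->item[a->used++] = value;
    return 0;
}
```

One `free()` releases the whole thing, since the items live in the same allocation as the header.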

Indeed, whenever this information is already available, why don't we tell the C compiler about it, so it can help check the array boundaries for us at compile time?

This is the core of this suggestion. Not to add size and/or used to everywhere (except functions that in my opinion should have had the size from the beginning even in the standard C library), but to help the compiler understand better exactly what us humans intend, and help catch our thinkos at compile time.

DiTBho · « **Reply #35 on:** April 26, 2023, 08:54:36 am »

Quote from: Nominal Animal on April 26, 2023, 12:12:52 am

It is very interesting (but unclear) to me exactly why most C programmers – myself included – prefer
rettype funcname(elemtype *ptr, size_t len);
over
rettype funcname(size_t len, elemtype buf[len]);

actually I do prefer

ans_t funcname(buffer_t buffer)

DiTBho · « **Reply #36 on:** April 26, 2023, 09:08:11 am »

Quote from: Nominal Animal on April 26, 2023, 12:12:52 am

whenever this information is already available, why don't we tell the C compiler about it, so it can help check the array boundaries for us at compile time?

even better, why don't we tell the ICE about it? so it can help automate test-cases and autonomously check boundaries for us at run time?

even better++, why don't we facilitate AI-assisted ICEs? So they can also use that information to identify common patterns in pieces of code that have a high probability of containing bugs.

JPortici · « **Reply #37 on:** April 26, 2023, 10:20:59 am »

Quote from: PlainName on April 25, 2023, 06:45:31 pm

Quote
I'm sick and tired of bugs introduced by using sizeof() on an array when you should have used sizeof(a)/sizeof(a[0])

All my projects have:
Code: [Select]
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

of course, and i should do that.
However a lengthof(x) operator that can only take arrays as an input would have been better than a macro (ISTR lengthof as an extension in some compilers)
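
Until such an operator is available, an arrays-only variant of the macro can be approximated with compiler extensions. The sketch below assumes GCC or Clang: `__typeof__` and `__builtin_types_compatible_p` are extensions, not standard C, and the macro names are invented here.

```c
#include <stddef.h>

/* Evaluates to 0 when 'a' is an array. When 'a' is a pointer, the
   type of 'a' matches the type of '&(a)[0]', the array size below
   becomes negative, and compilation fails.
   Relies on GCC/Clang extensions (assumption: one of those compilers). */
#define MUST_BE_ARRAY(a) \
    (0 * sizeof(char[1 - 2 * __builtin_types_compatible_p( \
        __typeof__(a), __typeof__(&(a)[0]))]))

/* Element count, accepted only for true arrays. */
#define LENGTHOF(a) (sizeof(a) / sizeof((a)[0]) + MUST_BE_ARRAY(a))

/* Hypothetical table used only for this illustration. */
static const char names[][8] = { "alpha", "beta", "gamma" };

size_t names_count(void) { return LENGTHOF(names); }
```

With this, `LENGTHOF(p)` for an `int *p` is rejected at compile time instead of quietly computing `sizeof(int *) / sizeof(int)`.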

SiliconWizard · « **Reply #38 on:** April 26, 2023, 08:40:00 pm »

Quote from: PlainName on April 25, 2023, 11:43:22 pm

I do a similar thing but use functions rather than macros. There's an overhead in the function call, but more scope to mess around when debugging without affecting anything else. Also doesn't rely on the programmer remembering to use the safe access when appropriate

Sure, problem is that you can't avoid macros to generate the type definitions themselves (such as the Array_t example I gave, for any given base type.)
You can kind of work around it by using a void * pointer for the allocated block and add an additional member for the 'element size', but then you lose any basic static check, and you suddenly get an even better way of shooting yourself in the foot than directly messing with pointers.

Or you hand-write every 'array' type definition, which is horrible. Macros are for avoiding to retype the same text over and over again, and that's what I use them for.
Want to add a member to your 'generic' type? Just modify the macro. Macros need some care to avoid the usual pitfalls, but when used with some care, they are infinitely preferable to duplicating code.
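
For comparison, the void * workaround described above might look roughly like this (type and function names are hypothetical). It needs no macro, but note how little the compiler can check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type-erased array: one struct for every element type. */
typedef struct {
    void  *Block;     /* storage, element type unknown to the compiler */
    size_t elemSize;  /* size of one element in bytes */
    size_t n;         /* current number of items */
    size_t nMax;      /* maximum number of items */
} GenericArray;

/* Copies item 'i' into '*out'. Returns 0 on success, -1 if 'i' is
   out of bounds. Nothing verifies that 'out' points at an object of
   elemSize bytes: passing a 'char *' for an array of doubles compiles
   without complaint and overruns the destination. */
int generic_get(const GenericArray *a, size_t i, void *out)
{
    if (i >= a->n)
        return -1;
    memcpy(out, (const char *)a->Block + i * a->elemSize, a->elemSize);
    return 0;
}
```

The index is still checked, but the element type is not, which is the loss of static checking mentioned above.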

Siwastaja · « **Reply #39 on:** April 27, 2023, 09:06:56 am »

Quote from: SiliconWizard on April 26, 2023, 08:40:00 pm

Or you hand-write every 'array' type definition, which is horrible. Macros are for avoiding to retype the same text over and over again, and that's what I use them for.
Want to add a member to your 'generic' type? Just modify the macro. Macros need some care to avoid the usual pitfalls, but when used with some care, they are infinitely preferable to duplicating code.

And I truly believe the C preprocessor is one of its strongest points and reason why C became so popular. People who invent "C replacements" tend to miss this fact. The first thing they do is they remove the preprocessor because it's so inelegant and dangerous; yet fail to come up with something with at least the same capabilities.

C programmers have this love-hate relationship with the preprocessor. It's horrible, but it's surprisingly powerful and makes it possible to do generic programming in C.

DiTBho · « **Reply #40 on:** April 27, 2023, 11:23:51 am »

Quote from: Siwastaja on April 27, 2023, 09:06:56 am

Quote from: SiliconWizard on April 26, 2023, 08:40:00 pm
Or you hand-write every 'array' type definition, which is horrible. Macros are for avoiding to retype the same text over and over again, and that's what I use them for.
Want to add a member to your 'generic' type? Just modify the macro. Macros need some care to avoid the usual pitfalls, but when used with some care, they are infinitely preferable to duplicating code.

And I truly believe the C preprocessor is one of its strongest points and reason why C became so popular. People who invent "C replacements" tend to miss this fact. The first thing they do is they remove the preprocessor because it's so inelegant and dangerous; yet fail to come up with something with at least the same capabilities.

C programmers have this love-hate relationship with the preprocessor. It's horrible, but it's surprisingly powerful and makes it possible to do generic programming in C.

sure! why not? in fact cpp was the first thing being banned and removed entirely in my-c

It can be done, and my-c doesn't miss anything, just it solves problems differently and makes life easier

Whereas C/89/99 ... well, the last bug I fought in the Linux kernel was a typo along the lines of "#define something SPACE MISTAKE etc", which got through the GCC v12 build steps but caused a silent, sneaky, and catastrophic bug, and I wasted three weeks on it

DiTBho · « **Reply #41 on:** April 27, 2023, 11:31:56 am »

Quote from: Siwastaja on April 27, 2023, 09:06:56 am

C programmers have this love-hate relationship with the preprocessor

Also, include those who write analysis software and develop ICE tools.
For all of us cpp is more than terrible.

Siwastaja · « **Reply #42 on:** April 27, 2023, 01:55:14 pm »

CPP = C Plus Plus

DiTBho · « **Reply #43 on:** April 27, 2023, 02:08:10 pm »

CPP = C Pre Processor

belongs to
sys-devel/gcc ---> /usr/$arch-$computer-linux-gnu/gcc-bin/$gcc_version/cpp
dev-lang/gcc_gnat ---> /usr/$arch-$computer-linux-gnu/gnat-bin/$gcc_version/cpp

(gcc_gnat is ... gcc + gnat_ada_core, recompiled as gcc with languages={C, Ada} )

c++ = C plus plus
g++ = GNU C plus plus

DiTBho · « **Reply #44 on:** April 27, 2023, 02:18:08 pm »

cpp also belongs to
overlay@idp: sys-devel/my-c~MIPS5++ ---> /usr/idp/my-c-bin/$my-c_version/cpp

what?

didn't you say that cpp was banned?

Yup!

So why is cpp there in the my-c tree?

to cure your inertia at being tempted to invoke it with ... random punishments in form of
- console blocked for 5 minutes (like with the "SL" ncurses program)
- you can do nothing but get your random insults, ncurses full screen

(so at the end of the day you would like to DELETE it, and you cannot because you don't have root permissions)

SiliconWizard · « **Reply #45 on:** April 27, 2023, 07:41:13 pm »

Quote from: Siwastaja on April 27, 2023, 09:06:56 am

C programmers have this love-hate relationship with the preprocessor. It's horrible, but it's surprisingly powerful and makes it possible to do generic programming in C.

I personally don't hate the preprocessor at all - I find it very useful.

Yep, every attempt at replacing the preprocessor to achieve the same level of generic programming has either led to something much less flexible/powerful, or to truly untamable and unverifiable monsters.

What many people seem to miss - and that Wirth has kept saying over and over again - is that simplicity should be a goal.
C is simple, the C preprocessor is simple.

C++ templates are monsters.

DiTBho · « **Reply #46 on:** April 28, 2023, 10:14:54 am »

Quote from: SiliconWizard on April 27, 2023, 07:41:13 pm

every attempt at replacing the preprocessor to achieve the same level of generic programming has either led to something much less flexible/powerful, or to truly untamable and unverifiable monsters.

every attempt? except my-c, so it's some but not all

DiTBho · « **Reply #47 on:** April 28, 2023, 10:20:45 am »

even replacing #define macro() in cpp with a true compiler built-in macro() mechanism is better

everything that doesn't pre-process the source is better because it doesn't hide information

PlainName · « **Reply #48 on:** April 28, 2023, 10:39:41 am »

Quote

everything that doesn't pre-process the source is better because it doesn't hide information

But isn't that one of the main features of functions? They hide lots of nitty gritty detail behind a simple name (and, of course, let you reuse code without repeating it, which is also what macros can do).

DiTBho · « **Reply #49 on:** April 28, 2023, 03:26:26 pm »

macros (by cpp) vs functions:
- functions are not pre-processed but compiled
- a macro definition is not checked for compile-time errors (only its expansion is, at each point of use), whereas a function is checked where it is defined

the second is what I meant: you lose information during pre-processing.
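
A small sketch of one concrete consequence of that difference, with invented names: a macro pastes its argument text wherever the parameter appears, so an argument with a side effect can run twice, while a function evaluates each argument exactly once and type-checks it.

```c
/* Classic macro: each argument appears twice in the expansion. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Function equivalent: arguments are type-checked and evaluated once. */
static inline int max_int(int a, int b) { return a > b ? a : b; }

/* Counts how many times next_value() has been called. */
static int calls;

static int next_value(void)
{
    calls++;
    return 5;
}

/* Expands to ((next_value()) > (0) ? (next_value()) : (0)),
   so next_value() runs twice. */
int calls_via_macro(void)
{
    calls = 0;
    (void)MAX_MACRO(next_value(), 0);
    return calls;
}

/* next_value() runs once, before max_int() is entered. */
int calls_via_function(void)
{
    calls = 0;
    (void)max_int(next_value(), 0);
    return calls;
}
```

Here `calls_via_macro()` returns 2 and `calls_via_function()` returns 1.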

