Author Topic: Global variables - Evil or not (Read 8622 times)

Nominal Animal · « **Reply #75 on:** October 27, 2022, 10:58:29 am »

One useful way to decide when to use and when to avoid global variables is to consider applications, and the difference between handling a single document or multiple documents per process.

It is useful to divide the state variables in an application into "application state" (application configuration, like which windows are shown and where, what the user interface language is, and so on) and "document state" (document or current task configuration, like page size, content language, and so on).

If you write an application that creates a new process for each document opened, then you can keep document state in global variables just like application state. However, each instance is separate, so if you e.g. change the user interface language in one, other open instances are not affected.
(Note that you do not actually waste memory doing this in current operating systems, because they use virtual memory, and directly map libraries and executables to memory. Each process has its own stack and kernel metadata, and of course its own read-write data memory segments/sections/areas, but there is only one copy of the read-only code segments in memory.)

If you write an application that can handle multiple documents at once, you have "application state" (configuration) and "document state" (document settings). While "application state" is global to the entire process, the "document state" is specific to each document, and you must not keep any document state in global variables. If the user starts a new copy of the application while one is running, instead of running normally, the new instance just sends an event or file open request or new document request to the already running application, and then exits.
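A minimal sketch of that split in C. All names here (app_state, document, document_init, and so on) are hypothetical and only illustrate where each kind of state lives:

```c
#include <string.h>

/* Application state: one per process, so a single global instance is fine. */
struct app_state {
    char ui_language[8];      /* User interface language, e.g. "en" */
    int  show_toolbar;
};
static struct app_state app = { "en", 1 };

const char *app_ui_language(void)
{
    return app.ui_language;
}

/* Document state: one instance per open document, never global. */
struct document {
    char content_language[8];
    int  page_width_mm;
    int  page_height_mm;
};

/* Initialize one document to A4 with the given content language. */
void document_init(struct document *doc, const char *language)
{
    if (!doc)
        return;
    strncpy(doc->content_language, language ? language : "en",
            sizeof doc->content_language - 1);
    doc->content_language[sizeof doc->content_language - 1] = '\0';
    doc->page_width_mm  = 210;
    doc->page_height_mm = 297;
}

/* Changes only the document passed in; other documents are unaffected. */
void document_set_page(struct document *doc, int width_mm, int height_mm)
{
    if (doc) {
        doc->page_width_mm  = width_mm;
        doc->page_height_mm = height_mm;
    }
}
```

Every function that touches document state takes the document as a parameter, so opening a second document is just a second struct document.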

Now, extend that idea to a library, or even to a facility in your own program or microcontroller hardware.

If there cannot be more than one instance because of physical limitations, then global state is fine. However, often it is more useful to be able to create multiple instances of the thing/facility; and then you cannot use global state.

Here is a practical example. I love to use the Xorshift64* pseudo-random number generator, because it is extremely fast, and because if one uses only the 32 high bits of the result, it passes every test in the BigCrush randomness test suite:

Code: [Select]

#include <stdint.h>

/* Xorshift64* PRNG state.
   To randomize the sequence, initialize to any nonzero value.
*/
static uint64_t  prng_state = 1;

static inline uint32_t  prng_u32(void)
{
    uint64_t  x = prng_state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    prng_state = x;
    return (x * UINT64_C(2685821657736338717)) >> 32;
}

I left out the randomizing function, because I like to use the Linux-specific getrandom() call for it, but one could also use POSIX clock_gettime() or even the C89 time() or BSD gettimeofday() for time-based initialization.

The above works, is very fast and very random (assuming you choose a random initial 64-bit seed for prng_state, as otherwise it obviously always produces the same sequence).
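For the time-based initialization mentioned above, here is a minimal sketch assuming POSIX clock_gettime(), with C89 time() as a fallback. The helper name prng_time_seed() is made up for this example, and the bit mixing is just one plausible choice; a clock-derived seed is predictable, so this is only suitable when unpredictability does not matter:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: build a nonzero 64-bit seed from the wall clock.
   Two calls within the same clock tick return the same seed. */
uint64_t prng_time_seed(void)
{
    struct timespec  now;
    uint64_t         seed;

    if (clock_gettime(CLOCK_REALTIME, &now) == 0)
        seed = ((uint64_t)now.tv_sec << 30) ^ (uint64_t)now.tv_nsec;
    else
        seed = (uint64_t)time(NULL);

    /* Xorshift64* needs a nonzero state, so map zero to one. */
    return seed | !seed;
}
```

With the global version above, one would then do prng_state = prng_time_seed(); once at startup.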

However, you only have one generator per process.

Let's say you have a multithreaded process, but you still need repeatable results (based on seeds for each thread or task at hand), so you cannot use one global pseudorandom number generator. What to do?

Well, make it non-global, of course. The first step:

Code: [Select]

typedef  struct {
    uint64_t  state;
} prng;
#define  PRNG_INIT(seed)  { .state = (seed) }

/* void prng_randomize(prng *g); -- omitted, because it tends to be OS-specific */

void prng_init(prng *g, uint64_t seed)
{
    if (g) {
        /* Note: zero seed is not allowed, so we replace that with 1. */
        g->state = seed | !seed;
    }
}

uint32_t prng_u32(prng *g)
{
    if (g) {
        uint64_t  x = g->state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        g->state = x;
        return (x * UINT64_C(2685821657736338717)) >> 32;
    } else {
        return 0;
    }
}

This way, in your own code, you just declare a random number generator you want to use, and initialize it to the desired seed,
prng mine = PRNG_INIT(1); /* Seeded with 1 */
or equivalently
prng mine;
prng_init(&mine, 1);
or randomized via
prng mine;
prng_randomize(&mine);
and then generate the pseudorandom sequence using prng_u32(&mine) calls.
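As a sketch of the repeatable per-task use case, each task can own its generator on the stack. The task_sum() function below is a hypothetical stand-in for real work, and the generator code is repeated only to keep the example self-contained:

```c
#include <stdint.h>

typedef struct {
    uint64_t  state;
} prng;

static void prng_init(prng *g, uint64_t seed)
{
    g->state = seed | !seed;    /* Zero seed is replaced with 1 */
}

static uint32_t prng_u32(prng *g)
{
    uint64_t  x = g->state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    g->state = x;
    return (x * UINT64_C(2685821657736338717)) >> 32;
}

/* Hypothetical task: sums n values drawn from its own private generator.
   No global state is touched, so concurrent tasks cannot disturb each other. */
uint64_t task_sum(uint64_t seed, int n)
{
    prng      g;
    uint64_t  sum = 0;

    prng_init(&g, seed);
    for (int i = 0; i < n; i++)
        sum += prng_u32(&g);

    return sum;
}
```

Calling task_sum() twice with the same seed and count returns the same value, regardless of what any other thread is doing with its own generator.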

(It is straightforward to extend such a state structure to contain both the needed state and a pointer to the generator function, so that one can choose the PRNG implementation at run time. Since the high 32 bits of Xorshift64* beat even Mersenne Twister in randomness, I don't bother anymore, though.)
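A rough sketch of that extension, assuming a hypothetical prng_kind selector and using Marsaglia's 32-bit xorshift (13, 17, 5) purely as a second implementation to choose from:

```c
#include <stdint.h>

typedef struct prng  prng;
struct prng {
    uint64_t   state;
    uint32_t (*next)(prng *g);    /* Generator function, chosen at run time */
};

/* Xorshift64*, returning the high 32 bits. */
static uint32_t xorshift64star_next(prng *g)
{
    uint64_t  x = g->state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    g->state = x;
    return (x * UINT64_C(2685821657736338717)) >> 32;
}

/* 32-bit xorshift; uses only the low 32 bits of the state. */
static uint32_t xorshift32_next(prng *g)
{
    uint32_t  x = (uint32_t)g->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    g->state = x;
    return x;
}

enum prng_kind { PRNG_XORSHIFT64STAR, PRNG_XORSHIFT32 };

void prng_init_kind(prng *g, enum prng_kind kind, uint64_t seed)
{
    if (!g)
        return;
    if (kind == PRNG_XORSHIFT32) {
        uint32_t  s = (uint32_t)seed;
        g->state = s | !s;            /* Zero state is not allowed */
        g->next  = xorshift32_next;
    } else {
        g->state = seed | !seed;      /* Zero state is not allowed */
        g->next  = xorshift64star_next;
    }
}

uint32_t prng_u32(prng *g)
{
    return (g && g->next) ? g->next(g) : 0;
}
```

Callers still only see prng_u32(&mine); which generator runs is part of the state, not of the calling code.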

See how the concept of "state" generalizes? It is quite an intuitive and functional approach to global variables.

A counterexample, and a common one seen in Arduino sketches:

Code: [Select]

void bar(void);
void baz(void);

int i;  /* Global loop index, shared by foo() and bar() */

void foo(void)
{
    /* ... */

    for (i = 0; i < 5; i++) {

        /* ... */

        bar();

    }
}

void bar(void)
{
    /* ... */

    for (i = 0; i < 15; i++) {
        baz();
    }

    /* ... */
}

The bug is obvious: whenever we call bar() from within the loop in foo(), the loop index variable gets modified, and thus the code does something completely different than what a straightforward reading of the code would indicate. In this case, foo() will only call bar() once.

The key is to think about why this is a bug. We've erroneously globalized the internal state of foo(), which then gets unexpectedly modified by a call to bar().
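For comparison, a sketch of the same structure with each loop index kept local to its loop. The two call counters are not part of the original example; they are added only so the behavior can be observed:

```c
static int  bar_calls = 0;    /* Instrumentation only */
static int  baz_calls = 0;    /* Instrumentation only */

static void baz(void)
{
    baz_calls++;
}

static void bar(void)
{
    bar_calls++;

    /* This i belongs to this loop only. */
    for (int i = 0; i < 15; i++) {
        baz();
    }
}

void foo(void)
{
    /* This i is a different object; bar() cannot modify it. */
    for (int i = 0; i < 5; i++) {
        bar();
    }
}
```

Now one call to foo() calls bar() five times and baz() 75 times, which is what a straightforward reading of the code suggests.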

Apologies for the wall of text, but hopefully it was worth reading.

Siwastaja · « **Reply #76 on:** October 27, 2022, 12:57:22 pm »

Keeping state as local as possible is good general advice, not limited to globals vs. locals.

I hate to see C code where all local variables are defined at the start of the function; especially if it's a bunch of loop variables, temporary variables of different types, etc. Even the older C standards allowed definitions at the start of any block, and since C99, variables can be defined in the middle of a block (and inside the for loop initializer, for(int i = 0; ...)).

This not only limits the scope of the variables, actually preventing bugs (because using the name elsewhere now just results in a compiler error about undefined name), but also makes code much more readable as you don't need to scroll far to find what is going on.

The same applies to static state. At least I went through these learning stages: first I just used global variables to keep state. Next I learned they don't need to be exposed globally, make them static. Later I realized I can put them inside a function, if the use is limited to one function, limiting the scope to that function only. And last, I realized the definition doesn't need to be at the start of the function. Consider the differences in readability:

Code: [Select]

int uart_initialized = 0;

void initialize_uart(void)
{
    /* ... do things ... */

    if(!uart_initialized)
    {
        /* ... things ... */
        uart_initialized = 1;
    }

    /* ... things ... */
}

Code: [Select]

void initialize_uart(void)
{
    static int uart_initialized = 0;
    /* ... do things ... */

    if(!uart_initialized)
    {
        /* ... things ... */
        uart_initialized = 1;
    }

    /* ... things ... */
}

Code: [Select]

void initialize_uart(void)
{
    /* ... do things ... */

    static int uart_initialized = 0;
    if(!uart_initialized)
    {
        /* ... things ... */
        uart_initialized = 1;
    }

    /* ... things ... */
}

Code: [Select]

void initialize_uart(void)
{
    /* ... do things ... */

    // you can even limit the scope further!
    {
        static int uart_initialized = 0;
        if(!uart_initialized)
        {
            /* ... things ... */
            uart_initialized = 1;
        }
    }

    /* ... things ... */
}
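A compilable sketch of the last variant. The uart_hw_setup() function and its call counter are hypothetical placeholders for the one-time hardware configuration, added only so the run-once behavior can be checked:

```c
#include <stdbool.h>

static int  hw_setup_count = 0;    /* Instrumentation only */

/* Hypothetical stand-in for the one-time hardware configuration. */
static void uart_hw_setup(void)
{
    hw_setup_count++;
}

void initialize_uart(void)
{
    /* ... work done on every call ... */

    {
        /* Static lifetime, but visible only inside this block. */
        static bool  uart_initialized = false;
        if (!uart_initialized) {
            uart_hw_setup();
            uart_initialized = true;
        }
    }

    /* ... more work done on every call ... */
}
```

The flag keeps its value between calls, yet nothing outside the inner block can read or write it by accident.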

Buriedcode · « **Reply #77 on:** October 27, 2022, 05:41:51 pm »

Quote from: brucehoult on October 20, 2022, 10:57:26 pm

Quote from: Jester on October 20, 2022, 05:47:08 pm
Context:
1) I'm not a programmer I just dabble in C and c++ to a small extent.
2) Embedded application running on a smallish uC for example a small sub module in a automobile, for example: bool wipersOn = false;

I appreciate a global variable in some large program for some large company for example American Airlines would probably be a really bad idea.

The moment you google "global variable" half the comments are DON'T DO IT you will burn in hell.

I have used global variables forever and have never had any issues. Am I missing something?

There are two different issues:

1) static lifetime for a variable. This is absolutely necessary for stateful programming. Sometimes these variables may even need to be in EEPROM rather than RAM so they survive reset/power off.

2) too much visibility of a static variable, such that you can't easily know which parts of the program read and/or write to it.

Tools such as an IDE or even simply grep on the command line can help you find all the places a variable is used.

If a variable is used in a single function then it can be declared as "static" inside that function.

If a variable is used in a small number of functions then it and those functions can be gathered in a single C file and the variable global in that file but marked as static. Other C files will then not be able to refer to it by accident.

In a small program none of this is a problem. The potential problems arise in large programs with poor modularity discipline.

I just wanted to double the visibility of this and highlight the point.

Having worked on 8-bit MCUs with 1k program space, and at the other extreme on a team working on a large multi OS app, certain "rules" simply cannot be applied equally to both ends of that spectrum.
I suppose habits that are completely device/code/compiler independent, such as variable naming conventions, can cover almost every area, but as soon as it relates to architecture the differences are so great that the number of valid sweeping generalisations diminishes.

coppice · « **Reply #78 on:** October 27, 2022, 05:49:53 pm »

Quote from: Buriedcode on October 27, 2022, 05:41:51 pm

Having worked on 8-bit MCUs with 1k program space, and at the other extreme on a team working on a large multi OS app, certain "rules" simply cannot be applied equally to both ends of that spectrum.
I suppose habits that are completely device/code/compiler independent, such as variable naming conventions, can cover almost every area, but as soon as it relates to architecture the differences are so great that the number of valid sweeping generalisations diminishes.

Oh, the same rules can be applied, and often are. Someone used to big software tries writing for a small MCU, fails to adapt and concludes the application just cannot fit. The project is cancelled. If someone more flexible later picks up the problem a product may be produced.

NorthGuy · « **Reply #79 on:** October 27, 2022, 06:07:28 pm »

Quote from: Siwastaja on October 27, 2022, 12:57:22 pm

Next I learned they don't need to be exposed globally, make them static.

You have to. But not all people know this. It is very easy to put a variable into a .c file (as opposed to .h) and think that it is now hidden from the outside. And it is, sort of, true - the compiler which compiles other .c files doesn't see it. But the linker does. So, if someone has two .c files with a variable named "counter", the linker may map both to the same memory location (typical for uninitialized variables when the toolchain uses "common" symbols), creating a complete mess, or reject the build with a multiple-definition error. To prevent this, you must use "static".

This confusion happens because for the compiler there's no difference whether something is in the .h file or the .c file. The compiler doesn't know which part of the code is going to be exposed to other compilation units through the .h file, and which part is not. Hence, the rules for .h and .c files cannot differ. If you put a variable into the .c file, the result is the same as if it was in the .h file. In contrast, Pascal has two different sections - "interface" for exposed variables and "implementation" for anything local to the compilation unit. In C you must use "static".

If someone forgets "static" for a file-scope variable, it will become global for the linker. But this is unintentional. This doesn't affect the use of global variables when you really meant them to be global.
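A small sketch of the pattern, imagining this as the entire contents of a hypothetical counter.c. The function names are made up for the example:

```c
/* counter.c (hypothetical file name) */

/* Internal linkage: the linker never sees this symbol, so another .c file
   can have its own variable named "counter" without any clash. */
static unsigned int  counter = 0;

/* Only these functions are visible to other compilation units. */
void counter_reset(void)
{
    counter = 0;
}

void counter_increment(void)
{
    counter++;
}

unsigned int counter_value(void)
{
    return counter;
}
```

A matching counter.h would declare only the three functions; the variable itself stays private to this file.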

uer166 · « **Reply #80 on:** October 27, 2022, 07:42:16 pm »

Quote from: Siwastaja on October 27, 2022, 12:57:22 pm

Keeping state as local as possible is good general advice, not limited to globals vs. locals.

You can go even further and say that state as a general concept is the root cause of a lot of complexity and some percentage of hard-to-reproduce bugs. I think in certain cases it's actually a good thing to centralize state (whether you make it all global or not), such that the system is easily observable, and all state is accounted for. Just imagine some power converter with a few control loops that use global V/I/PID state/Lead-lag state variables, but where each function is entirely state-less on its own. To me that's a perfectly valid solution that doesn't bury the scope of state to the point that it's hidden.
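Something like this sketch is what I mean. The struct layout, names and the plain PI step are illustrative assumptions, not code from a real converter:

```c
/* State of one PI loop. */
struct pi_state {
    float  integral;
};

/* All converter state in one place, easy to inspect or log. */
struct converter_state {
    float            v_out;     /* Measured output voltage */
    float            i_out;     /* Measured output current */
    struct pi_state  v_loop;    /* Voltage loop state */
};

/* Deliberately global: the whole state of the system is visible here. */
struct converter_state  converter = { 0.0f, 0.0f, { 0.0f } };

/* Holds no state of its own: the result depends only on the arguments,
   and the only thing it modifies is the state object passed in. */
float pi_step(struct pi_state *s, float error, float kp, float ki, float dt)
{
    s->integral += ki * error * dt;
    return kp * error + s->integral;
}
```

A control loop would call pi_step(&converter.v_loop, ...) each cycle, and a debugger or telemetry dump of converter shows everything the controller remembers.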

Idk why I'm still rambling on this other than to show that no advice fits all cases, and that an appeal to authority is not really a valid defense of any argument. Reducing the total state space of a running program is a much more agreeable concept to most though, compared to globals vs. locals vs. shit and sticks.

