Since each ADC sample is 32-bit, let's assume you use 48 bits (6 bytes) for each sample set: 8 bits for the number of 32-bit samples in the set, and 32+8=40 bits for the sample sum. One possibility in C is to use
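#include <stdint.h>

#define SETS 150                      // 30 s window / 0.2 s per set

uint64_t sum_total;                   // Sum of all samples in the window
uint32_t sum_count;                   // Number of samples in the window
uint32_t average;                     // Latest windowed average
volatile uint32_t set_sum[SETS];      // Low 32 bits of each set's sum
volatile uint8_t  set_overflow[SETS]; // High 8 bits of each set's sum
volatile uint8_t  set_count[SETS];    // Number of samples in each set
uint8_t set_index;                    // Set currently being filled
uint8_t set_size;                     // Window length in sets, 2..SETS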
This takes 18+6×SETS = 918 bytes of RAM. Initially, everything is cleared to zero except set_size, which is set to SETS; set_size is the window duration in number of updates (0.2 second units), between 2 and SETS, inclusive. To change the window duration, it is okay to clear everything to zero again (with set_size set to the new duration); previous measurements are then completely ignored, and a new averaging window starts from scratch.
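The reset described above could be wrapped in a small helper. This is only a minimal sketch: window_reset() is a hypothetical name, the clamping of out-of-range sizes is my own assumption, and the declarations are repeated so that the fragment compiles on its own. On real hardware the call would also have to be protected against the sampling interrupt, which is not shown here.

```c
#include <stdint.h>

#define SETS 150

uint64_t sum_total;
uint32_t sum_count;
uint32_t average;
volatile uint32_t set_sum[SETS];
volatile uint8_t  set_overflow[SETS];
volatile uint8_t  set_count[SETS];
uint8_t set_index;
uint8_t set_size;

/* Hypothetical helper: start a fresh averaging window of new_size sets.
   Sizes outside 2..SETS are clamped (an assumption, not a requirement). */
void window_reset(uint8_t new_size)
{
    if (new_size < 2)
        new_size = 2;
    if (new_size > SETS)
        new_size = SETS;

    sum_total = 0;
    sum_count = 0;
    average   = 0;
    set_index = 0;

    /* A plain loop is used because the arrays are volatile. */
    for (int i = 0; i < SETS; i++) {
        set_sum[i]      = 0;
        set_overflow[i] = 0;
        set_count[i]    = 0;
    }

    set_size = new_size;
}
```

Calling window_reset(SETS) at startup gives the full 30 second window; calling it later with a smaller value starts a shorter window from scratch.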
Whenever a 32-bit ADC sample is acquired, you do
if (__builtin_add_overflow(set_sum[set_index], SAMPLE, &(set_sum[set_index])))
    set_overflow[set_index]++;
set_count[set_index]++;
or equivalent, i.e. add the sample value to the current sum, and increment overflow if the 32-bit value overflowed; and finally increment the current count. Normally, each set_count[] value will be between 0 and 80 (=400/5).
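If your compiler does not provide __builtin_add_overflow (it is a GCC/Clang extension), one possible equivalent relies on unsigned arithmetic wrapping around modulo 2^32. A minimal sketch, with add_sample() being a hypothetical name and the declarations repeated so the fragment stands alone:

```c
#include <stdint.h>

#define SETS 150

volatile uint32_t set_sum[SETS];
volatile uint8_t  set_overflow[SETS];
volatile uint8_t  set_count[SETS];
uint8_t set_index;

/* Hypothetical helper, called once per acquired ADC sample. */
void add_sample(uint32_t sample)
{
    /* Unsigned addition wraps modulo 2^32 on overflow. */
    uint32_t sum = set_sum[set_index] + sample;

    /* After a wraparound the result is smaller than either operand. */
    if (sum < sample)
        set_overflow[set_index]++;

    set_sum[set_index] = sum;
    set_count[set_index]++;
}
```

For example, adding 2 to a sum of 0xFFFFFFFF leaves 1 in set_sum[] and increments set_overflow[], so the 40-bit value is 0x100000001 as expected.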
To update the average, you do something like
uint32_t state = begin_atomic();
uint8_t new_count = set_count[set_index];
uint32_t new_sum = set_sum[set_index];
uint8_t new_overflow = set_overflow[set_index];
if (++set_index >= set_size)
    set_index = 0;
uint8_t old_count = set_count[set_index];
uint32_t old_sum = set_sum[set_index];
uint8_t old_overflow = set_overflow[set_index];
set_count[set_index] = 0;
set_sum[set_index] = 0;
set_overflow[set_index] = 0;
end_atomic(state);
sum_count += new_count;
sum_count -= old_count;
sum_total += new_sum + ((uint64_t)new_overflow << 32);  // << binds less tightly than +
sum_total -= old_sum + ((uint64_t)old_overflow << 32);
if (sum_count > 0)
    average = (sum_total + sum_count/2) / sum_count;
else
    average = 0; // Do not report an average
where the +sum_count/2 rounds halfway cases upwards, instead of truncating towards zero. It is somewhat important that this update runs at regular 0.2 second intervals.
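The two pieces of arithmetic in the update, rebuilding the 40-bit set sum and the rounded division, can be checked in isolation on a desktop machine. A minimal sketch; set_total() and rounded_average() are hypothetical helper names, not part of the code above:

```c
#include <stdint.h>

/* Hypothetical helper: rebuild the 40-bit set sum from its two parts.
   The overflow count is widened to 64 bits before shifting, and the
   shift has its own parentheses because << binds less tightly than +. */
uint64_t set_total(uint32_t sum, uint8_t overflow)
{
    return ((uint64_t)overflow << 32) + sum;
}

/* Hypothetical helper: integer division that rounds halves upwards.
   Returns 0 when there are no samples. */
uint32_t rounded_average(uint64_t total, uint32_t count)
{
    if (count == 0)
        return 0;
    return (uint32_t)((total + count / 2) / count);
}
```

For example, a total of 10 over 4 samples (exactly 2.5) yields 3, whereas plain integer division would yield 2; a total of 9 over 4 samples (2.25) yields 2 either way.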
Notice that the atomic (uninterruptible) part only copies the variables to temporaries and clears the next set, so it should not last for more than maybe two dozen clock cycles. All the math is done outside the critical/atomic section, and since it is only done about 5 times a second, it won't need many resources even if you are using an 8-bit microcontroller.
Of course, there are many other approaches one could use; the above is not the only possibility!
The key here is that for each sample set you maintain both the sum of samples, and the number of samples in the sum, separately. You need 30s / 0.2s = 150 such sets. That does not need to be a constant, either, as long as it is between 2 and the number of elements allocated to the arrays as above. (Changing it should always clear everything to zeroes, though; otherwise you'd need to move data around in the arrays to ensure the correct sets are used.)
If one has enough RAM (a few kilobytes) and a 32-bit MCU, and the output report format is not fixed yet, I would consider keeping both the sum of samples and the sum of squared samples in each set, so that in addition to the box-car average (windowed sample mean), you could also report the variance within the window. The minimum variance depends on the amount of noise in the ADC process, but problems that can affect the average, like vibrations, will also increase the variance, so reporting it might be useful. The variance would be windowed exactly the same way as the data is.
You see, if say sum_total is the sum of sum_count samples, and sum_squared is the sum of those samples squared (each sample squared, then summed), then the variance of the samples is (sum_squared - sum_total*sum_total/sum_count)/sum_count, or equivalently sum_squared/sum_count - (sum_total/sum_count)*(sum_total/sum_count), i.e. the mean of the squares minus the square of the mean. (Statistically, the unbiased estimate is (sum_squared - sum_total*sum_total/sum_count)/(sum_count - 1).)
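As a rough illustration, the variance (mean of the squares minus the square of the mean) could be computed from the running sums like this. This is only a sketch under stated assumptions: the function names are hypothetical, the final math is done in double precision (which loses some accuracy when the variance is tiny compared to the squared mean), and sum_squared is assumed to fit in 64 bits, which holds for e.g. 24-bit samples but not for full-range 32-bit ones.

```c
#include <stdint.h>

/* Hypothetical helper: variance of the samples in the window, computed
   as mean of squares minus square of the mean.
   Returns 0.0 when there are no samples. */
double window_variance(uint64_t sum_total, uint64_t sum_squared,
                       uint32_t sum_count)
{
    if (sum_count == 0)
        return 0.0;

    double mean = (double)sum_total / sum_count;
    return (double)sum_squared / sum_count - mean * mean;
}

/* Hypothetical helper: unbiased estimate, dividing by (count - 1).
   Returns 0.0 when there are fewer than two samples. */
double window_variance_unbiased(uint64_t sum_total, uint64_t sum_squared,
                                uint32_t sum_count)
{
    if (sum_count < 2)
        return 0.0;

    double n = sum_count;
    double total = (double)sum_total;
    return ((double)sum_squared - total * total / n) / (n - 1.0);
}
```

For the eight samples 2, 4, 4, 4, 5, 5, 7, 9 the sums are sum_total = 40 and sum_squared = 232, giving a variance of 232/8 - 5×5 = 4 and an unbiased estimate of 32/7, about 4.57. Since this runs only about 5 times a second, software floating point should be tolerable even without an FPU.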