In addition to the systematic checks, I like to run tests based on pseudorandom number sequences, specifically the Xorshift family of PRNGs, because they're very fast and very 'random'.
The basic idea is to use whatever randomness sources you have to generate a valid initial seed state (for the Xorshift family, the only invalid seed state is all zeros), then fill the memory with the sequence starting at that state. You can then verify the memory contents by recalculating the same sequence, restarting at the same seed state.
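As a minimal sketch of the idea in C, assuming a 64-bit Xorshift variant (xorshift64) and a word-aligned test region; the prng_fill/prng_verify names are just illustrative, not from any particular library:

```c
#include <stdint.h>
#include <stddef.h>

/* Marsaglia xorshift64: any nonzero seed is a valid state. */
static inline uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Fill 'count' 64-bit words at 'mem' with the sequence starting from 'seed'.
 * 'volatile' keeps the compiler from optimizing the accesses away. */
static void prng_fill(volatile uint64_t *mem, size_t count, uint64_t seed)
{
    uint64_t state = seed;
    for (size_t i = 0; i < count; i++)
        mem[i] = xorshift64(&state);
}

/* Regenerate the same sequence from the same seed and compare;
 * returns the number of mismatching words. */
static size_t prng_verify(const volatile uint64_t *mem, size_t count, uint64_t seed)
{
    uint64_t state = seed;
    size_t errors = 0;
    for (size_t i = 0; i < count; i++)
        if (mem[i] != xorshift64(&state))
            errors++;
    return errors;
}
```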
The pseudorandom sequence tests should be considered statistical as opposed to deterministic, in the sense that they will never report a false error, but may not catch all possible errors, due to their pseudorandom nature and typically linear access pattern. An even more complex test uses another pseudorandom number sequence to 'randomize' the memory access pattern itself, so that both the memory address and the value written there are random, but repeatable and thus verifiable; see the sketch below.
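The text above describes driving the addresses with a second pseudorandom sequence; a simpler stand-in that still scatters the accesses while writing every word exactly once is an odd-stride walk over a power-of-two-sized region. This sketch reuses xorshift64() from above, and the function names are again just illustrative:

```c
#include <stdint.h>
#include <stddef.h>

/* Walk the region in a scattered but repeatable order: with 'count' a power
 * of two and 'stride' odd, the index (i * stride) & (count - 1) visits every
 * word exactly once. The data written still comes from the xorshift64 stream,
 * so both the address order and the values are "random" yet verifiable. */
static void prng_fill_scattered(volatile uint64_t *mem, size_t count,
                                uint64_t seed, size_t stride)
{
    uint64_t state = seed;
    for (size_t i = 0; i < count; i++)
        mem[(i * stride) & (count - 1)] = xorshift64(&state);
}

static size_t prng_verify_scattered(const volatile uint64_t *mem, size_t count,
                                    uint64_t seed, size_t stride)
{
    uint64_t state = seed;
    size_t errors = 0;
    for (size_t i = 0; i < count; i++)
        if (mem[(i * stride) & (count - 1)] != xorshift64(&state))
            errors++;
    return errors;
}
```

Any odd stride is coprime with a power-of-two count, so the index map is a bijection: every word gets written once during the fill and checked once during the verify, in the same scattered order.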
The benefit of pseudorandom sequence tests is that the longer you run them, the more confidence a clean result (no errors) gives you. It is also important to note that certain architectures (including x86-64, aka AMD64) have "nontemporal" load and store instructions that bypass all caches; on those, you should test both cached and nontemporal accesses, and preferably even mix them.
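As a rough sketch of what the store side of that could look like with GCC or Clang on x86-64, assuming SSE2 and an 8-byte-aligned buffer (note that on ordinary write-back memory the nontemporal hint mainly applies to stores, so the verify pass still goes through the cache):

```c
#include <stdint.h>
#include <stddef.h>
#include <immintrin.h>   /* _mm_stream_si64, _mm_sfence */

/* Same fill as before, but using nontemporal (cache-bypassing) stores.
 * Reuses xorshift64() from the earlier sketch. */
static void prng_fill_nontemporal(uint64_t *mem, size_t count, uint64_t seed)
{
    uint64_t state = seed;
    for (size_t i = 0; i < count; i++)
        _mm_stream_si64((long long *)&mem[i], (long long)xorshift64(&state));
    _mm_sfence();   /* make the streaming stores globally visible before verifying */
}
```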
For example, when commissioning new server hardware (or my own desktop machines), I habitually run similar memory tests (memtest86 and variants) for at least 48 hours straight (usually over a weekend!). If the hardware does not pass, I cannot rely on it, and will fix it or get something else instead. Before starting a big development effort based on a specific microcontroller, I wouldn't mind testing a few units in parallel for a week or more, just to get a better understanding of their reliability.
Finally, cosmic rays can occasionally flip bits or crash processors even when there is nothing physically wrong with the hardware. So, a single error on a single unit is not indicative of anything by itself, and one must take a rather statistical approach to reliability and robustness here.
(Apologies to everyone who already knows all of this. I just wanted to make sure these things were mentioned in the thread, since this thread pops up in web searches for "ram testing", "bit by bit", and "byte by byte".)