Author Topic: Ideas for storing frequently updated data in a FLASH chip (Read 5349 times)

peter-h · « **on:** January 02, 2022, 08:58:32 pm »

I am using an Adesto AT45DB321 4MB chip (serial SPI). It is organised in 512 byte blocks.

2MB of this is a FAT file system and that all works. Obviously this has the issue that if you say create a file and then append a few bytes to it, 100k to 1M times, you will wear out the FAT.

Other parts are used for other stuff, and I have allocated the remaining 512k for general storage, for stuff like logging data. The challenge is how to prevent wearing out this bit also

One could write sequentially, but one must not maintain anything like a byte counter, or a pointer to the end of the data, because that counter will wear out.

ISTM that the only way is to write a unique pattern at the current end of data (and perhaps another at the current start of data, if not at the start of the 512k block) and scan the device for these to find the data block. And hope these patterns will never occur within the data by accident (fairly easy to do if say 8 bytes are used).

I vaguely recall hearing of clever schemes which rely on an erased FLASH being all 1s, and then dropping these to 0, one at a time, and each time you drop one of them to 0, it indicates something, and you have achieved this using just one bit, but I am not sure how to use this. It seems to be creating a different kind of "FAT" really; one which wears out a lot slower.

I don't have the option to keep stuff in RAM and quickly save into FLASH when the power fails; it could have been done but the PSU is an isolated switcher and early enough power fail detection would have needed an extra opto-isolator.

The messy aspect is searching a FLASH with 512 byte blocks for a unique pattern, which is quite likely to span across two blocks. I have already written a function which validates a CRC-32 over a file (in the 2MB file system) and that was messy enough because the 4-byte CRC, taking up the last 4 bytes of the file, can span across two blocks. But the principle is fast enough, at something like 300 kbytes/sec, if using just a 512-byte buffer (another important requirement; I can't load the whole 512k into RAM - it's a 32F417).

This stuff must have been solved for data loggers many times over...

Any ideas much appreciated, as always

wek · « **Reply #1 on:** January 02, 2022, 09:05:15 pm »

There are many ways to skin a cat and the optimal solution depends on the particularities, but a simple and working method is to store sequencially 1 byte length + length byte data, where lenght < 0xFF (and data may contain address if needed).

JW

Benta · « **Reply #2 on:** January 02, 2022, 09:13:56 pm »

Did you consider using MRAM?
https://www.everspin.com/serial-peripheral-interface

nctnico · « **Reply #3 on:** January 02, 2022, 09:14:41 pm »

I have solved this in the past by having a sequence counter using fixed length chuncks of data (ideally equal to the flash block size). The data block with the highest sequence is the last one so start writing after that. Make sure to do a verify after a write and write the data in the next block in case the CRC check fails. This way the flash is written to sequentially and in case a block goes bad, it is skipped automatically.

SiliconWizard · « **Reply #4 on:** January 02, 2022, 09:16:00 pm »

I've used FRAM in the past for frequent data writes (it's somewhat similar to MRAM), and it worked great. The downside is the cost.

SiliconWizard · « **Reply #5 on:** January 02, 2022, 09:18:43 pm »

Otherwise, yes, structure your logged data so that you can identify where the last entry is. Then you make it a rolling log. I almost always structure this kind of data myself anyway, so that would just be a matter of giving the last entry a special type. To make it relatively low overhead in terms of memory size, you can group a number of logged entries in a single structured entry. Typically, a "structured entry" would contain a 'type', payload size, payload, and some checksum or CRC.

amyk · « **Reply #6 on:** January 02, 2022, 10:38:03 pm »

I can't seem to find any mention of write endurance in the datasheet?

The "correct" way to handle wear leveling/block remapping is to use a "flash translation layer" (FTL).

Nusa · « **Reply #7 on:** January 02, 2022, 10:47:45 pm »

To land on workable solutions, first "frequently" needs to be quantified.

peter-h · « **Reply #8 on:** January 02, 2022, 11:02:35 pm »

The endurance is "100,000 program/erase cycles per page minimum".

I don't know if is actually per-byte, or if any write to any part of the 512-byte block will count as writing the whole block.

I don't have the option of hardware changes. I am also a bit concerned about writing a block until the block fails, because the failure may be marginal. OK; I realise that block remapping in real drives is done exactly this way

but maybe they do something to skew the marginal affect in their favour.

As regards "how often", well, what I am looking for is some rough equivalent to proper wear-levelling but for the simplified sequential-write scenario. One can do this calculation for say an SSD, and for say a 256GB SSD with internally implemented wear levelling one gets numbers like 50TB of total data can be written because every block has reached the max # of writes. This endurance can be achieved with either a proper wear levelling file system, or in specific usage scenarios like mine (basically data logging) by writing sequentially and then wrapping back to the start (of the 512k block) so basically the last 512k is always available.

As to why the 2MB FAT-FS filesystem already mentioned did not implement wear levelling, it is because it is intended for occassional usage only. Stuff like firmware updates, config files, etc.

I don't want a complicated solution, don't need multiple or named files, etc.

nctnico · « **Reply #9 on:** January 02, 2022, 11:10:14 pm »

Another question is: is the data essential? If the data is essential then you could add error correction data. If the flash is marginal then this is detected but the chance is high the data can still be recovered. But it comes at the price of having redundant data.

NorthGuy · « **Reply #10 on:** January 02, 2022, 11:35:14 pm »

There must be some sort of period to your data. For example, a year consists of 365(or 366) days. If you write once a day, then designate a fixed location to each calendar day and write into there. This way, you will always have the last 365 days of data in the flash. It's easy to find too - January 2-nd is always at the same location. Store the year within the data so that you know whether the data in the specific location is fresh or stale.

Instead of days, you can use minutes, hours, or whatever units. Your period may be not a year, but a day, a decade, or whatever fits into the flash.

errorprone · « **Reply #11 on:** January 03, 2022, 07:12:37 am »

EEPROM emulation is another set of algorithms used to handle wear on flash devices. Most microcontroller manufactures will have an app note since EEPROM is not embedded in alot of new devices. Here’s an example from ST.

https://www.st.com/resource/en/application_note/dm00311483-eeprom-emulation-techniques-and-software-for-stm32-microcontrollers-stmicroelectronics.pdf

DavidAlfa · « **Reply #12 on:** January 03, 2022, 03:37:50 pm »

EEPROM emulation will have the same flash wearing issue. You use the internal flash as eeprom, that's all.
It's not easy to develop and implement a flash wear algorithm yourself.

Check the SPIFFS filesystem, which handles this:
https://github.com/pellepl/spiffs
https://github.com/nickfox-taterli/spiffs-stm32

Or use a memory that does this by itself, like a SD Card.

By the way, I think you can map the spi memory as addressable memory with the FMSC. Don't remember the details.
Will make it much easier, faster and transparent. Perhabs it had to be quad or octo-spi, but I think that was when used as program memory (executable code).

nctnico · « **Reply #13 on:** January 03, 2022, 04:12:07 pm »

Quote from: DavidAlfa on January 03, 2022, 03:37:50 pm

Or use a memory that does this by itself, like a SD Card.

SD cards typically don't do wear levelling! AFAIK there are some that do but those are for industrial purposes. A solution could be to use eMMC which (AFAIK) does wear levelling.

voltsandjolts · « **Reply #14 on:** January 03, 2022, 04:18:50 pm »

Maybe use a file system that does wear levelling for you, like LittleFS (disclaimer: never used it).

Peabody · « **Reply #15 on:** January 03, 2022, 04:46:25 pm »

It appears your flash chip also has two sectors of static ram. You can accumulate readings in the ram, then write the ram to flash one full sector at a time. That would eliminate the issue of what counts as a write. Of course if you lose power, you lose up to one sector of data, but how important is that to you?

And if you write one sector of data at a time, then to find out where you are on a powerup, you only have to check the first byte of the sector to see if it's FF or something else. A binary search of 2MB to find the first sector beginning with FF would require reading 21 sectors, which would go pretty quickly. Of course as long as power is on, you would just keep track of where you are in your software.

peter-h · « **Reply #16 on:** January 03, 2022, 05:24:02 pm »

"EEPROM emulation will have the same flash wearing issue"

Indeed. I have seen EEPROM or even FAT emulation done in the 32F4's FLASH memory; it won't be quick but it will work, though it doesn't help with this issue.

"Maybe use a file system that does wear levelling for you, like LittleFS"

Looks interesting, but with apparently little usage, it could be yet another "open source bug fixing job for a few months"

which as the owner of my small business I am familiar with.

"It appears your flash chip also has two sectors of static ram. You can accumulate readings in the ram, then write the ram to flash one full sector at a time. That would eliminate the issue of what counts as a write. Of course if you lose power, you lose up to one sector of data, but how important is that to you?"

The strategy will indeed be to accumulate data in RAM and write it to the FLASH storage periodically, say every minute, and you will lose that part if the power goes.

" I think you can map the spi memory as addressable memory with the FMSC."

I don't think the 32F417 can do that. Just checked the data sheet. It would be wonderful; we could have loads of RAM for other, unrelated to this, stuff like MbedTLS which needs amounts of RAM which don't exactly relate to the normal meaning of "embedded"

There are other CPU mfgs which do support SPI RAM, SPI boot code loading, etc.

I think that, no matter how you shake this, the solution must lie in moving along the location of the "end pointer" in FLASH. That's what a wear levelling FS must do too; it moves the FAT along, periodically.

Storing the end pointer in RAM as a temporary measure, and having a unique pattern in FLASH at the end of the data, is a good compromise, if another RTOS task needs to access all the data stored, quickly.

Nominal Animal · « **Reply #17 on:** January 03, 2022, 05:32:04 pm »

I've described the basic scheme in another thread, but basically, you just need to use a wide enough incrementing update counter at the beginning of each page (say, six to eight bytes). Keep the all-zeros and all-ones bit patterns "invalid values" for the counter, so that you can tell from the counter alone if the page is empty.

Because pages in use have consecutive update counters, to find the current page, you only need a binary search to find it. (However, since the AT45DB321 takes 3ms to boot up properly, even if you were to read the 64-bit update counters of all 1024 pages in a linear fashion, it would only add something like 128000 AT45DB321 clock cycles – 1.6ms at 80 MHz clock; so, say a few ms in real life.)

As the pages are used in consecutive fashion, you'll get optimum wear leveling, and don't need to worry about block or sector level erasing.

It does mean that you lose 8 bytes per page for the log. (The AT45DB321E at least supports 528-byte pages – it's actually 33Mbit, not 32Mbit, chip – so putting the page update counter and a 64-bit checksum after the 512 data bytes, would make for a very nice circular buffer.) If you update each page at a time, that's it. Otherwise, you also need to make sure each record you store never crosses a page boundary, and never matches all-zeros/all-ones, the erased bit pattern.

SiliconWizard · « **Reply #18 on:** January 03, 2022, 06:49:30 pm »

As I think the OP themselves suggested and I also suggested, you don't need to do anything particular for wear-leveling for typical data logs, as you can write the data contiguously and never write the same area twice until the log rolls over. That's wear-leveling for free. Some use cases may not lend themselves well to that, but for simple data logging, it certainly does.

As we said, if you don't want to store a "pointer"/index to the last entry, you can just write the last entry with a special "token". Then you'd have to scan all entries at startup to find where to write next, but that only has to be done once at initialization. Just a thought.

Nominal Animal · « **Reply #19 on:** January 03, 2022, 07:48:29 pm »

Quote from: SiliconWizard on January 03, 2022, 06:49:30 pm

As we said, if you don't want to store a "pointer"/index to the last entry, you can just write the last entry with a special "token".

What I suggested, was to use incrementing counter for the token, at the beginning or end of each page (the chip supports reading at any byte boundary without penalties), with valid counters never being all-zero or all-ones bit pattern. No linear scan needed, a binary search will find it just fine, although one must use monotonically increasing counters. The downside is say 8 bytes reserved per 512-byte page, unless 528-byte pages are used.

Basically, there are just two situations. Before all log pages have been written to,
01 02 03 04 05 06 07 -- -- -- -- -- -- -- -- --
where -- denotes erased page (invalid counter, all bits set/all bits clear); and after all log pages have been written to at least once,
35 36 37 38 39 40 -- -- 27 28 29 30 31 32 33 34
where we happen to have case P=2.

With 2^N pages and up to P pre-erased pages, you need to read at most 3+P+N page counters, each taking 128 cycles of the flash chip clock (read command is 5-8 bytes long, and immediately followed by the read data). With this chip, one can do P=0 using the SRAM buffer in it (one can write to the SRAM buffer, and then commit the data, with the chip doing the page erase and copying the data from SRAM to Flash on its own), but the firmware should be ready to deal with nonzero P in case power is lost after the page has been cleared but before the data has been written.

The binary search only needs to find the largest counter, ignoring any invalid counter values (--). If the read-modify-write command is used to update the page contents, then that page may not be full. The page to be replaced (erased and then written to) next is the succeeding page, and will have the counter incremented.

In OP's case, N=10 and P≤1 if one uses the page-based SRAM buffer update (write with page erase), which means one only needs to read 14 pages' counters, to find the latest page ("tail pointer"). That number would of course be saved in the STM32, and only re-discovered at next bootup.

SiliconWizard · « **Reply #20 on:** January 03, 2022, 07:52:31 pm »

Ah, good idea.

Nominal Animal · « **Reply #21 on:** January 03, 2022, 10:54:34 pm »

Although I personally believe that OP should just read the counter on each 1024 pages in linear fashion since that would only take a couple of milliseconds, and the run time savings is not worth the code complexity in this exact case, here is a test bench example for the binary search in case someone else might need the run time savings.

First, implement the search as a separate test program. There are nasty corner cases, for example when the erased page is at the beginning and/or the end of the log area; for example, with P=2, both:
-- 65 66 67 68 69 70 --

There are various strategies, but if one handles the initial case (1 2 3 - - - - -) without linear scanning, then one should never need to access more than N+P+3 pages, where P is the maximum number of erased pages in the log, not including initially erased pages. (It should be obvious: even if the P pages are scanned linearly, the limit holds.) Personally, I'd optimize for the normal operation, but also allow any random page to be erased with just the initial search being slightly slower.

I would consider taking the buffer counters as command-line parameters, with the program simulating filling in a new page. On non-Windows systems, I'd use something along the lines of

Code: [Select]

// SPDX-License-Identifier: CC0-1.0
#define  _POSIX_C_SOURCE  200809L
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>

#ifndef  NO_SEARCH
/* For you to implement, using only 'sectors' (sector count) and 'get_sector(sector number)'.
   It should return the index to the sector with the largest count, given that 0 indicates unused,
   and all used sectors have increasing counters.
*/
extern int  find_largest(int sectors, unsigned long (*get_counter)(int sector));
#endif

/* Internal implementation. */
static int            counters = 0;
static unsigned long *counter = NULL;

#ifndef  NO_SEARCH
static int            accesses = 0;
static unsigned long  get_counter(int sector)
{
    if (sector < 0 || sector >= counters) {
        fprintf(stderr, "get_sector(): Attempted to access sector %d (valid are 0 through %d, inclusive).\n", sector, counters - 1);
        exit(EXIT_FAILURE);
    }
    accesses++;
    return counter[sector];
}
#endif

int parse_ulong(char *src, unsigned long *to)
{
    char          *end;
    unsigned long  val;

    if (!src || !*src)
        return -1;

    errno = 0;
    end = src;
    val = strtoul(src, &end, 0);
    if (errno)
        return -1;
    if (end == src)
        return -1;

    /* Check for trailing garbage. */
    while (isspace((unsigned char)(*end)))
        end++;
    if (*end)
        return -1;

    if (to)
        *to = val;
    return 0;
}

int main(int argc, char *argv[])
{
    if (argc < 3 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        char *arg0 = (argc > 0 && argv && argv[0] && argv[0][0]) ? argv[0] : "(this)";
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", arg0);
        fprintf(stderr, "       %s C1 ... CN\n", arg0);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program simulates a circular log buffer, with N sectors,\n");
        fprintf(stderr, "and C1 to CN representing the sector update counters (positive).\n");
        fprintf(stderr, "Use 0 or - for erased sectors.\n");
        fprintf(stderr, "\n");
        return EXIT_SUCCESS;
    }
    counters = argc - 1;
    counter = malloc(counters * sizeof counter[0]);
    if (!counter) {
        fprintf(stderr, "Out of memory.\n");
        return EXIT_FAILURE;
    }

    /* Parse each counter. */
    for (int i = 0; i < counters; i++) {
        if (parse_ulong(argv[i+1], counter+i)) {
            if (!strcmp(argv[i+1], "-") || !strcmp(argv[i+1], "--")) {
                counter[i] = 0;
            } else {
                fprintf(stderr, "%s: Invalid counter value.\n", argv[i+1]);
                return EXIT_FAILURE;
            }
        }
    }

    /* For verification, locate the largest counter value. */
    int            largest_counter_index = 0;   /* If all unused, use 0. */
    unsigned long  largest_counter_value = 0;
    for (int i = 0; i < counters; i++) {
        if (counter[i] > 0) {
            if (counter[i] > largest_counter_value) {
                largest_counter_value = counter[i];
                largest_counter_index = i;
            }
        }
    }

#ifndef  NO_SEARCH
    int  i = find_largest(counters, get_counter);
    if (i != largest_counter_index) {
        fprintf(stderr, "Failed: Search returned sector %d, while %d is the correct sector:\n%lu",
                        i, largest_counter_index, counter[0]);
        for (int k = 1; k < counters; k++) {
            fprintf(stderr, " %lu", counter[k]);
        }
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    fprintf(stderr, "%d accesses\n", accesses);
    fflush(stderr);
#endif

    /* Use the next sector. */
    counter[(largest_counter_index + 1) % counters] = counter[largest_counter_index] + 1;

    /* To erase additional sectors, you can use e.g.
    counter[(largest_counter_index + 2) % counters] = 0;
    counter[(largest_counter_index + 3) % counters] = 0;
    counter[(largest_counter_index + 4) % counters] = 0;
    */

    /* Write the counters to standard output. */
    printf("%lu", counter[0]);
    for (int k = 1; k < counters; k++) {
        printf(" %lu", counter[k]);
    }
    printf("\n");

    return EXIT_SUCCESS;
}

To compile a baseline (no binary search) version, use e.g. gcc -DNO_SEARCH -Wall -O2 above.c -o example.

Otherwise, create a C source file that implements
int find_largest(int sectors, unsigned long (*get_counter)(int sector));
where get_counter(sector) returns the counter value for the given sector, 0 to sectors-1, inclusive; and the function returns the index to the sector with the highest counter value, given the counter rules I outlined before. If that is in say search.c, use e.g. gcc -Wall -O2 search.c above.c -o example to compile the ./example binary.

The standard output will contain the updated counters, with the next sector overwritten; with the number of get_counter() accesses written to standard error (omitted for the baseline version). If there is any kind of an error, or the search misses, the program will abort and report the error.

Running
./example 17 18 19 20 0 0 7 8 9 10 11 12 13 14 15 16
the standard output (intended for easier comparison) is
17 18 19 20 21 0 7 8 9 10 11 12 13 14 15 16

One can trivially add e.g. randomly erasing further pages, to ensure the search algorithm can handle all corner cases. I omitted that for brevity (as randomizing a PRNG in a portable fashion is annoying; in Linux, I just use Xorshift64* and get_random_bytes()).

DavidAlfa · « **Reply #22 on:** January 04, 2022, 12:55:24 pm »

Not all SD cards did wear leveling, but modern SD usually do, ex. WD Purple cards.
If you think it, it's almost mandatory with current storage capacities, doesn't make sense otherwise, you could buy a 64GB SD card and ruin it by writing a 128Byte text file a lot of times?
You have the sdio interface, I'd make use of it and fatFS.
For the memory mapping, it's possible in newer mcus (L4,F4,H7), like the 32F446. It's a peripheral on its own, not fmc/fsmc, called quadspi and octospi.
Actually, can be pretty fast, as it supports DDR operation and prefetching.
Quadspi also supports dual flash, so it fetches 8 bits at a time using 2 quad-spi memories concurrently, behaving like an octo-spi.
The prefetching will be reading ahead few words, reducing the latency, the only downwards is the energy consumption, as it holds the flash active all the time.
This can be disabled but you'll lose the prefetching features.
It can address up to 256 MB, no matter if the flash is larger.

Another simple idea for ex. "settings.dat".
- Read the contents to ram if small enough. Or write to the disk directly.
- Rename old file to settings_001.old (Or find the first unused number).
- Create a new settings.dat with the ram buffer contents. If not in ram, copy the old file.
- Old files will prevent from using the same area until you delete them.
- When full, delete the "old" files and start again.
- You might add the counter value inside the file itself.

peter-h · « **Reply #23 on:** January 04, 2022, 03:19:14 pm »

Some brilliant ideas here, though I must confess understanding some is above my mental pay grade

I think those brilliant people for putting in such effort to help this really good community.

I think Nominal Animal's method is similar to storing a marker at the start of each 512 byte block, which is a byte count within that block. Sadly, 2 bytes would be needed for the byte count. But one could use special values above the 0-510 range e.g.

bit 15 = 1 = this block is empty
bit 14 = 1 = first data block,
any nonzero value in bits 0-8 < 510 = the data ends within this block

My task is not just to code this but also to implement an API which any pleb can use. I work on the principle that if I can understand something then everybody will have no problem at all, and if I can't then it will be a support nightmare

(this product will have some other people write code for it, later on)

And, thinking about the API "for a data logger", it comes back to implementing a auto-wear-levelling file system

Which I don't have and have no interest in ripping out FatFS because

- it works perfectly
- don't have the time
- the file system is also accessible via USB, as a removable logical block device, 512 byte sectors (hence the FLASH device choice, 512 and not 528) so the same files are visible to Windows etc (with some deliberate limitations in the embedded API e.g. only 8.3 filenames, no subdirectory support...

Any wear-levelling FS would not be visible via USB, unless somehow emulated (complicated, because Windows dives straight in at sector level).

So, back to my 512kbyte area dedicated for data logging. What sort of API should be implemented?

- fast format
- slow format (for secure erase)
- append a data block, size x
- read a data block from offset x, max y bytes (obviously x needs to be < current end of data)
- truncate the data at x (poke the "end of data" marker in, before the previous end of data
- return total data size

I think supporting the above with the more cunning schemes would be quite complicated. It really lends itself to a "end of data" marker scheme. OTOH it would encourage a usage mode where people write just say 100k of data, then read it and do something with it (send it somewhere, write it out to a file in the FatFS filesystem, etc), then format. So the early part of the 512k would get more wear. But vastly less wear than opening a file in FatFS and appending to it, which thrashes the FAT area (like an SSD with WinXP, which always IME works for exactly 1 year, on a 24/7 machine, before you get "NTLDR corrupted"

).

If one wants to keep the wear totally even, one needs to do it differently, with no fixed "start of data" point. One needs both "start of data" and "end of data" points to be continually moving around the 512kbyte block. I need to decide whether this is worth the extra effort. One simple way would be for the "fast format" command to replace the "end of data" marker with a "start of data" marker, and then the next block to write would be written after that.

The job becomes trivially easy if the start of data marker is 0x00, the end of data marker is 0xFF and the data cannot contain either

Then you don't have to do string matching across block boundaries. I am sure there is a slick way but I don't know of one.

Yes indeed I could (can't on the current board) put in an SD card socket and have "unlimited" storage. All SD cards should support the basic SPI mode, if not the others (hence my other thread about whether licensing is needed for the faster modes; the actual algorithms seem to have escaped into the wild, but I can't run my SPI faster than 21mbps anyway, so the 5-10mbps possible with license-free SPI would be more than fine).

Nominal Animal · « **Reply #24 on:** January 04, 2022, 04:55:34 pm »

Note that the way I suggested one could use the 528-byte sectors on the flash chip, is to only expose (via the API and e.g. USB interface) 512 bytes of each sector, and use the 16 bytes per sector for bookkeeping and generation counters. The downside is that no single read or write can then cross the sector boundary, as the extra 16 bytes on the boundary need to be skipped when reading and inserted when writing: any single read or write would be 512 bytes at most. It shouldn't be too difficult to handle it in the API, though.

Quote from: peter-h on January 04, 2022, 03:19:14 pm

If one wants to keep the wear totally even, one needs to do it differently, with no fixed "start of data" point. One needs both "start of data" and "end of data" points to be continually moving around the 512kbyte block. I need to decide whether this is worth the extra effort. One simple way would be for the "fast format" command to replace the "end of data" marker with a "start of data" marker, and then the next block to write would be written after that.

The job becomes trivially easy if the start of data marker is 0x00, the end of data marker is 0xFF and the data cannot contain either Then you don't have to do string matching across block boundaries. I am sure there is a slick way but I don't know of one.

This relates to an escaping/encoding method I'm quite fond of. Basically, if you reserve some values for the markers, use an escaping method where the escape code is different than the code it escapes. In visual terms, let's call your start and end markers < and > and the escape marker \ . Then, for example,
\( encodes <
\) encodes >
\/ encodes \
This way,the < and > markers only exist in the data at the beginning and end, and you can do a linear search to locate the chunk.
I would also reserve the all-bits-clear/set value, whichever the flash erases to.

If so desired, the escape sequences can be expanded to include any pair of reserved marker values, and even run length encoding. Again, it depends on the exact data being logged, and the balance between usefulness versus code complexity.

To repeat, the key is to have the code following the escape code NOT match the code being escaped, but be completely different. That way, the non-escaped codes only exist in the data as the markers. (I can't tell you how annoying I find that so many languages use \\ to escape a single backslash. If only they'd used say \/, a lot of problems would have been mercifully avoided.)

Consider, for example, a scheme where the cyclic log buffer has 510-byte logical sectors, with say the first 2 bytes in each sector encoding a 16-bit update counter, but the data uses the above escaping scheme. A data sequence can cross a sector boundary, but even then the update counter will be in the start of each sector; the data will just 'skip' the counter.

The trick with a 16-bit counter is to consider 1..16383 < 16384..32767 < 32768..49151 < 49152..65534 < 1..16383 when comparing them; this works for up to 16383 sectors in the cyclic buffer. (Again, both 0 and 65535 reserved for erased.)

Even a linear scan will very quickly find the most recent sector written to (in about 64000 flash chip cycles; 0.8ms at 80 MHz); then, reading that sector will find the (last) end marker. If there is no start marker, one simply reads the preceding sector(s), until a start marker is found. (Obviously, the latest start marker is the one that corresponds to the (last) end marker.) A binary search will find the latest written sector even faster.

(That is, the most recent data chunk ends in the sector that has the highest 16-bit counter value, using the cyclic comparison rules as above. The start of that data chunk, if it spans multiple sectors, will need to be found by linearly reading the contents of each preceding sector until one containing a start marker is found. This is not necessary for finding the point at boot time where the next data chunk will begin, though; only when one wants to read the most recent data chunks.)

The complications in this scheme are:

The data written to the Flash may differ (due to escaping), but more importantly, differ even in length.
Because the sector update counter is in a fixed position, you need to either allow an escape pair to cross a sector boundary, or use a padding marker (the escape marker itself is fine!) that encodes no data and is used when there is only one byte available before the sector boundary, but the next data byte needs escaping.
When writing to the log, one must track the offset within the sector, and handle that sector boundary crossing (by including the sector counter in the data).
When reading from the log, one must track the offset within the sector, and skip the sector boundary if the data chunk crosses a sector boundary.
Whenever a data chunk crosses a sector boundary, it is also possible that the next sector has an erased update counter value, or an earlier update counter value. That means that the chunk was only partially stored, and the chunk is incomplete, truncated.

Before implementing this in any MCU, I would definitely do a C/C++ library implementation, and test that thoroughly; especially with pathological ("designed evil") data.
You can even have your ~~victims~~ coworkers and end-users take a look at the interface, to see if it provides the functionality they need from the cyclic log buffer.


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

Author Topic: Ideas for storing frequently updated data in a FLASH chip (Read 5349 times)

Share me