Author Topic: Reflective memory (Read 3691 times)

theoldwizard1 · « **on:** December 04, 2017, 09:03:01 pm »

A number of years ago (okay, ancient history in the electronics world) VMIC developed something called Reflective Memory for VME embedded controller systems. Basically it was a serially connected dual-ported RAM. It was the "hot ticket" back in the 80s and 90s.

Has anyone ever heard of something like this being implemented on a microcontroller chip?

The application has run out of internal RAM and Flash. It can be "divided" in two, with a relatively small amount (1KB?) of reflective memory on two different chips. Reflective memory will work because data updates do not need to be instantaneous, but zero overhead is a requirement.
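To make the idea concrete, here is a minimal software model of what is being asked for. It is only a sketch under assumptions: the class name `ReflectiveNode`, the `link_tick()` method, and the one-update-per-tick behaviour are invented for illustration and do not describe VMIC's product or any real chip. It shows the two properties mentioned above: local reads and writes cost the CPU nothing extra, and the other chip's copy catches up a little later.

```python
from collections import deque


class ReflectiveNode:
    """One chip's copy of the shared region (hypothetical model, not real hardware)."""

    def __init__(self, size=1024):
        self.ram = bytearray(size)  # local copy of the reflected region
        self.tx = deque()           # updates waiting to go out on the serial link
        self.peer = None

    def connect(self, peer):
        # Point-to-point link between exactly two nodes.
        self.peer = peer
        peer.peer = self

    def write(self, addr, value):
        # The CPU performs an ordinary local write; the link logic snoops it.
        self.ram[addr] = value
        self.tx.append((addr, value))

    def read(self, addr):
        # Reads are always served from the local copy.
        return self.ram[addr]

    def link_tick(self):
        # Stands in for the hardware serialiser: deliver one queued update
        # to the peer's RAM per call, with no CPU involvement on either side.
        if self.tx:
            addr, value = self.tx.popleft()
            self.peer.ram[addr] = value
```

After `a.write(16, 0x5A)`, node `b` still reads 0 at address 16 until `a.link_tick()` has run, which is the "not instantaneous" behaviour the application can tolerate.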

C · « **Reply #1 on:** December 04, 2017, 10:33:39 pm »

Keep in mind that you can write to more than one memory chip at a time.

Without this type of memory, each CPU could have some local dual-port RAM with one port connected to the system bus.
Any write to this memory then has to go via the system bus to keep all local copies up to date.

With dual-port RAM chips you can do this, provided you prevent a read while a write is taking place to the same address.

theoldwizard1 · « **Reply #2 on:** December 04, 2017, 10:40:55 pm »

Quote

Each CPU could have some local dual port ram with one port connected to system bus.
Any write to this memory then has to be via the system bus to keep all local copies up to date.

Very true.

The application is extremely cost sensitive. Basically, it is a single board (and currently single controller chip) computer. There is no bus. All memory is on the embedded controller chip. The fewer the number of pins used the better (the rest of the pins are used for data acquisition and control).

C · « **Reply #3 on:** December 05, 2017, 01:52:06 am »

Today I would use CAN Bus to share data between two controllers.

theoldwizard1 · « **Reply #4 on:** December 05, 2017, 03:57:29 am »

Quote from: C on December 05, 2017, 01:52:06 am

Today I would use CAN Bus to share data between two controllers.

Not anywhere near fast enough (I want close to dual-ported RAM speed) and too much overhead.
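A rough back-of-envelope calculation supports this. The sketch below assumes classic CAN 2.0A at 1 Mbit/s with 11-bit identifiers, 8 data bytes per frame, and no bit stuffing; it also assumes every frame carries only data with no address field. All of these are optimistic, so real figures would be worse. The function names are made up for this example.

```python
# Classic CAN 2.0A data frame, 11-bit identifier, bit stuffing ignored.
# SOF + ID + RTR + IDE + r0 + DLC + CRC + CRC delimiter + ACK slot
# + ACK delimiter + EOF + interframe space
FRAME_OVERHEAD_BITS = 1 + 11 + 1 + 1 + 1 + 4 + 15 + 1 + 1 + 1 + 7 + 3


def can_frame_bits(data_bytes=8):
    # Total bits on the wire for one frame carrying `data_bytes` of payload.
    return FRAME_OVERHEAD_BITS + 8 * data_bytes


def time_to_mirror(region_bytes, bitrate=1_000_000, data_bytes=8):
    # Seconds needed to push the whole region across the bus once.
    frames = -(-region_bytes // data_bytes)  # ceiling division
    return frames * can_frame_bits(data_bytes) / bitrate
```

With these assumptions `can_frame_bits()` is 111 bits for 64 bits of payload, and `time_to_mirror(1024)` comes to about 14.2 ms for the 1KB region, several orders of magnitude slower than a dual-ported RAM access.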

fourtytwo42 · « **Reply #5 on:** December 07, 2017, 09:49:28 am »

You're not giving much information! What are the MCU and clock speed you're talking about?
How many pins do you have free?
You talk about wanting close to DPR speed, so are you trying to compare serial with parallel access?
VME reflective memory has nothing to do with your problem unless you wish to implement a very pin-hungry VME interface on your MCU! The company you are referring to makes boards for VME systems, not memory chips.

theoldwizard1 · « **Reply #6 on:** December 07, 2017, 12:31:49 pm »

Quote from: fourtytwo42 on December 07, 2017, 09:49:28 am

You're not giving much information! What are the MCU and clock speed you're talking about?
How many pins do you have free?
You talk about wanting close to DPR speed, so are you trying to compare serial with parallel access?

I just want to know if it has been done before.

Ultimately, it would be a semi-custom chip and high speed serial, about the same speed as one lane of PCI-E, would be adequate. We are talking "chip-to-chip" transfers with no external transceivers/drivers.

Quote from: fourtytwo42 on December 07, 2017, 09:49:28 am

VME reflective memory has nothing to do with your problem unless you wish to implement a very pin-hungry VME interface on your MCU! The company you are referring to makes boards for VME systems, not memory chips.

That reference was just an example. Pin count is very sensitive.
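For a sense of scale, the sketch below works out what "one lane of PCI-E" would give, assuming first-generation PCIe figures (2.5 GT/s with 8b/10b line coding). The update packet layout (16-bit address, 32-bit data, 8 framing bits) is purely hypothetical and chosen only to show the arithmetic; the function names are likewise invented.

```python
def lane_payload_bytes_per_s(gt_per_s=2.5e9, encoding=(8, 10)):
    # PCIe Gen1: 2.5 GT/s with 8b/10b coding gives 250 MB/s per direction.
    data_bits, line_bits = encoding
    return gt_per_s * data_bits / line_bits / 8


def update_latency_s(addr_bits=16, data_bits=32, framing_bits=8, gt_per_s=2.5e9):
    # Serialisation time for one hypothetical update packet, 8b/10b encoded.
    payload_bits = addr_bits + data_bits + framing_bits
    line_bits = payload_bits * 10 / 8
    return line_bits / gt_per_s
```

Under these assumptions a single word update takes roughly 28 ns on the wire, and the full 1KB region could be refreshed in a few microseconds. Wire delay, clock recovery, and any acknowledgement scheme are not modelled.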

tggzzz · « **Reply #7 on:** December 07, 2017, 02:23:27 pm »

Have a look at the XMOS processors. They have hard realtime scalable multicore hardware and software, where

"hard" means the IDE examines the optimised binary to define the maximum program times; there's none of this rubbish run-it-and-hope you see the worst case
"multicore" means up to 32 cores/chip (i.e. 4000MIPS/chip)
"scalable" means both on-chip and across chips

As an added benefit the I/O is "FPGA like", i.e. it contains multiple programmable clocks, SERDES for 250Mb/s per port, plus each port has timers defining when input did occur or output will occur with 4ns resolution.

Most importantly, the programming environment is theoretically sound, being based on Hoare's CSP (communicating sequential processes) and C (with the ill-defined bits omitted).

30000ft flyer: https://www.digikey.com/en/pdf/x/xmos/xcore-architecture
how all the hardware and software fits together: https://www.xmos.com/support/tools/programming?version=latest&component=18344
programming tutorial: https://www.xmos.com/support/tools/programming?component=17653

For an example of the IDE's analysis and guarantees, see reply #13 in https://www.eevblog.com/forum/microcontrollers/cortex-sam3x-interrupt-latency-what-gives!/msg1305944/#msg1305944

theoldwizard1 · « **Reply #8 on:** December 07, 2017, 03:32:32 pm »

Quote from: tggzzz on December 07, 2017, 02:23:27 pm

Have a look at the XMOS processors. They have hard realtime scalable multicore hardware and software, ...

VERY INTERESTING! The design is such that each task has its own core yet can communicate with other cores.
I would like to know more about how their XCONNECT switch handles off-chip messaging.

Also, I find the following statements hard to believe based on my knowledge of chip design:

Each tile contains local SRAM memory, which is shared between all cores on that tile for code and data
Each scheduled core has an allocated slot to access the memory in a single cycle
The xCORE memory will always respond within the allocated cycle

Execution out of RAM is $$$ for large applications.

One BIG benefit for "reflective memory" is that you double the amount of RAM and Flash available to the total application. Yes, it will require some forethought on how to divide the tasks to best utilize these resources.

Clearly a "reflective memory" serial communication channel would be set up for full duplex. Differential drivers may not be required if the link is going to another device on the same board. Yes, this is starting to look like an SPI network on steroids, so clearly it could also be used for "intelligent" I/O chips (just map their registers into the controller chip's memory space), and multiple channels could be used for devices with different latencies.

tggzzz · « **Reply #9 on:** December 07, 2017, 04:19:48 pm »

Quote from: theoldwizard1 on December 07, 2017, 03:32:32 pm

Quote from: tggzzz on December 07, 2017, 02:23:27 pm
Have a look at the XMOS processors. They have hard realtime scalable multicore hardware and software, ...
VERY INTERESTING! The design is such that each task has its own core yet can communicate with other cores.

They are, aren't they! Most MCUs are very similar to each other, and most languages are inherently serial, not parallel. It is rare to find some that are significantly different and, most importantly, unify the hardware and the software.

I have only "kicked the tyres" with a simple design, but I found their documentation remarkably simple, clear, and without strange "gotchas". I haven't found any bugs either.

It is worth realising that the concepts are old and have stood the test of time: CSP is from the 70s, hardware for CSP is from the 80s (Transputer), software for CSP from the 80s (Occam), and XMOS xCore is a decade or so old. Many CSP/Occam concepts are re-materialising in modern languages such as Go and Rust (but I've used neither).

Prof. David May has been involved in all of that, and has avoided past problems.

Quote

I would like to know more about how their XCONNECT switch handles off-chip messaging.

I am not an expert and have not investigated this, however I suspect I can offer a few pointers:

- within a tile all cores/tasks share the same memory on a timesliced basis - think of SMT
- within a tile, inter-task comms can be implemented either via a comms channel or via shared memory
- across tiles, comms have to be implemented by copying memory using a comms channel; clearly this adds latency
- xC ensures all that is transparent. There are restrictions to ensure correctness, e.g. no cross-task aliasing of memory
- if comms can occur between tiles on the same chip, extension to comms between tiles on different chips is trivial. ISTR it requires 5 wires and a serialisation protocol, but you would be wise to verify that
- both i/o and inter-task comms use the same language primitives and xCONNECT. That works very pleasantly - just "think of" the i/o port as a different task.
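The channel style of communication in the points above can be approximated in ordinary Python. This is not xC and says nothing about how xCONNECT is implemented; it is a sketch that uses a bounded `queue.Queue` as a stand-in for a channel, and the task names are invented. The point it illustrates is that each message is a copy, so the two tasks never alias the same memory.

```python
import queue
import threading


def sensor_task(chan, samples):
    # Each message is an immutable copy; nothing is shared with the receiver.
    for s in samples:
        chan.put(bytes(s))
    chan.put(None)  # end-of-stream marker (a convention of this sketch)


def logger_task(chan, out):
    # Blocks until a message arrives, much like a task sleeping on a channel.
    while True:
        msg = chan.get()
        if msg is None:
            break
        out.append(msg)


def run(samples):
    chan = queue.Queue(maxsize=1)  # bounded: the sender blocks if the receiver lags
    out = []
    producer = threading.Thread(target=sensor_task, args=(chan, samples))
    consumer = threading.Thread(target=logger_task, args=(chan, out))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return out
```

Calling `run([bytearray(b"ab"), bytearray(b"c")])` returns the two messages in order. A true CSP channel is a rendezvous with no buffering at all; the one-slot queue is only an approximation of that.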

FFI, dig around on the XMOS website and forum to find more information. I've found these documents particularly useful, but others are more directed at your questions.
XMOS-DIY-USB.pdf
XMOS-Introduction-to-XS1-ports_3.pdf
XMOS-Programming-Guide-_documentation_F-2.pdf
XMOS-XCC-Command-Line-Manual_X6904A.pdf
XMOS-XC-Reference-Manual_8.7-[Y-M].pdf
XMOS-XS1-Architecture_1.0.pdf
XMOS-XS1-Assembly-Language-Manual_8.7-[Y-M].pdf
XMOS-xTIMEcomposer-User-Guide-14_14.x.pdf

Quote

Also I find the following statements hard to believe based on my knowledge of chip design
Each tile contains local SRAM memory, which is shared between all cores on that tile for code and data
Each scheduled core has an allocated slot to access the memory in a single cycle
The xCORE memory will always respond within the allocated cycle

Execution out of RAM is $$$ for large applications.

This is aimed at hard realtime embedded programming, not general purpose systems. See digikey for available processors.

I would question the necessity of having very large memory shared between many cores/tasks. Everything I've heard leads me to believe that "high performance computing" is heavily based on message-passing between separate non-shared memory computers. I believe that general purpose systems will have to go that route, but it will take a generation of kicking and screaming by people wedded to existing languages and implementations.

Although the HPC-vs-xCORE/xC details are very different, many of the high-level paradigms are similar: if you can "think" in one, you can "think" in the other. (The same can be said of Java-vs-C#, for example).

Quote

One BIG benefit for "reflective memory" is that you double the amount of RAM and Flash available to the total application. Yes, it will require some forethought on how to divide the tasks to best utilize these resources.

The number of cores/tasks is a hard limit - with exceptions! Given some reasonable restrictions on how a task is coded, the compiler can silently combine several tasks onto the same core. Essentially this comes down to sequentially merging all the "setup()" parts of the tasks, and merging their "while (1) {select...}" parts into a single select statement.
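As a rough illustration of that merging, the sketch below does by hand what the compiler is described as doing. It is plain Python, not xC, and the `Task` class, `run_combined()`, and the event names are all hypothetical. Each task has a setup step and a set of events its select would wait on; combining them means running the setups in sequence and then servicing one merged dispatch table in a single loop.

```python
class Task:
    """A combinable task: one setup step plus event handlers (hypothetical shape)."""

    def __init__(self, name, events):
        self.name = name
        self.events = events  # event names this task's select waits on
        self.log = []

    def setup(self):
        self.log.append("setup")

    def handle(self, event):
        self.log.append(event)


def run_combined(tasks, event_stream):
    # 1. Run every task's setup part, one after the other.
    for task in tasks:
        task.setup()
    # 2. Build one dispatch table: this is the merged "select".
    table = {}
    for task in tasks:
        for event in task.events:
            table[event] = task
    # 3. A single loop now services events for all tasks on one core.
    for event in event_stream:
        table[event].handle(event)
```

For example, combining a `Task("uart", ["rx"])` and a `Task("led", ["tick"])` and feeding the events `rx, tick, rx` leaves the UART task with two handled events and the LED task with one. The restriction that makes this possible is that each handler must run to completion quickly, since the tasks now share one thread of execution.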

Quote

Clearly a "reflective memory" serial communication channel would be set up for full duplex. Differential drivers may not be required if the link is going to another device on the same board. Yes, this is starting to look like an SPI network on steroids, so clearly it could also be used for "intelligent" I/O chips (just map their registers into the controller chip's memory space), and multiple channels could be used for devices with different latencies.

technix · « **Reply #10 on:** December 07, 2017, 04:52:03 pm »

If you are talking about bigger systems like a PC or up, maybe you can give RDMA a try? You can grab used single-port or dual-port Infiniband cards for fairly cheap, and for a small (2-node to 3-node) IB RDMA network you can use point-to-point connections instead of hunting down an expensive IB switch.

theoldwizard1 · « **Reply #11 on:** December 07, 2017, 09:49:09 pm »

Quote from: tggzzz on December 07, 2017, 04:19:48 pm

I am not an expert and have not investigated this, however I suspect I can offer a few pointers:
within a tile all cores/tasks share the same memory on a timesliced basis - think of SMT
within a tile, inter-task comms can be implemented either via a comms channel or via shared memory
across tiles, comms have to be implemented by copying memory using a comms channel; clearly this adds latency

Your first bullet bothers me! If it is a fixed timeslice, then tasks that are small will waste a lot of time doing nothing. Also, how do you deal with large tasks, or tasks that are triggered by an external event (i.e. an interrupt)?

Regarding the last bullet, how much memory is copied ?

nctnico · « **Reply #12 on:** December 07, 2017, 10:30:38 pm »

Quote from: theoldwizard1 on December 07, 2017, 12:31:49 pm

Quote from: fourtytwo42 on December 07, 2017, 09:49:28 am
You're not giving much information! What are the MCU and clock speed you're talking about?
How many pins do you have free?
You talk about wanting close to DPR speed, so are you trying to compare serial with parallel access?
I just want to know if it has been done before.

Ultimately, it would be a semi-custom chip and high speed serial, about the same speed as one lane of PCI-E, would be adequate. We are talking "chip-to-chip" transfers with no external transceivers/drivers.

Tie two ethernet MACs together and you have a fast and DMA capable data interchange system. In modern microcontrollers you usually have the ethernet RAM on a different bus so the DMA transfers from the ethernet controller and the ethernet buffer memory don't interfere with the microcontroller fetching instructions and data (except when accessing the ethernet buffer RAM). With a relatively simple CPLD/FPGA in between which acts as a HUB you could even create a system where data is shared between several devices.

theoldwizard1 · « **Reply #13 on:** December 07, 2017, 10:51:58 pm »

Quote from: nctnico on December 07, 2017, 10:30:38 pm

Tie two ethernet MACs together and you have a fast and DMA capable data interchange system. In modern microcontrollers you usually have the ethernet RAM on a different bus so the DMA transfers from the ethernet controller and the ethernet buffer memory don't interfere with the microcontroller fetching instructions and data (except when accessing the ethernet buffer RAM).

Maybe ...

I think this still has too much overhead. I would have to see some low-level simulation of the ethernet controller to convince me that this would work.

The goal is ZERO overhead for each processor and no external-to-the-chip drivers.

tggzzz · « **Reply #14 on:** December 07, 2017, 11:02:52 pm »

Quote from: theoldwizard1 on December 07, 2017, 09:49:09 pm

Quote from: tggzzz on December 07, 2017, 04:19:48 pm
I am not an expert and have not investigated this, however I suspect I can offer a few pointers:
within a tile all cores/tasks share the same memory on a timesliced basis - think of SMT
within a tile, inter-task comms can be implemented either via a comms channel or via shared memory
across tiles, comms have to be implemented by copying memory using a comms channel; clearly this adds latency

Your first bullet bothers me! If it is a fixed timeslice, then tasks that are small will waste a lot of time doing nothing. Also, how do you deal with large tasks, or tasks that are triggered by an external event (i.e. an interrupt)?

Regarding the last bullet, how much memory is copied ?

Interrupts? What are they? You can't guarantee timings to the clock cycle if you have either interrupts or caches.

I don't know what you mean by a small or large task. A task is a unit of computation started by a message or I/O, and which has to be completed before the next message or I/O occurs.

Typically you have one task/core dedicated to an I/O peripheral; when there's nothing to do, the core sleeps. The task resumption latency is low; in my application I see it as being instantaneous (10ns), but I believe XMOS states <100ns. Think of it as having the RTOS in hardware.

Each core has a 100MHz clock and executes one instruction every 10ns. The chip runs at 500MHz. Thus there can be 5 100MHz tasks running "simultaneously" in a tile. If you have 8 tasks then the IDE will pessimistically assume they are all running at full speed and will indicate the obvious increase in execution time. (In practice, cores are often waiting for I/O or messages, and consume zero execution time).

The amount of memory copied is defined by the message you send; 1 byte upwards.
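The clock figures above can be turned into a simple worst-case model. This is a sketch based only on the numbers quoted in this post (a 500MHz tile, at most 100MHz per core) and assumes the tile's cycles are shared evenly among the active tasks; the function names are invented and the real scheduler may differ in detail.

```python
def per_task_mhz(active_tasks, tile_mhz=500, core_max_mhz=100):
    # Worst case: every task is runnable all the time and the tile's
    # cycles are split evenly, capped at the per-core maximum.
    if active_tasks < 1:
        raise ValueError("need at least one active task")
    return min(core_max_mhz, tile_mhz / active_tasks)


def instruction_time_ns(active_tasks):
    # Time per instruction for one task under the same assumptions.
    return 1000 / per_task_mhz(active_tasks)
```

Up to five tasks each get the full 100MHz (10ns per instruction). With eight tasks the pessimistic figure drops to 62.5MHz, or 16ns per instruction, which is the "obvious increase in execution time" the IDE would report.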

