Author Topic: RISC-V assembly language programming tutorial on YouTube (Read 63011 times)

westfw · « **Reply #100 on:** December 14, 2018, 01:06:59 am »

Quote

[MSP430 is] a little bit CISCy with memory-to-memory moves and adds. It falls into the PDP11-design M68000 space.

Based on the example I just tried, the code isn't very compact! At least as generated from C by gcc.

Variable instruction length and execution time. Definitely CISCy. Although of "elegantly minimal" form rather than the "we're going to implement cobol in microcode" form. With twice the registers of a PDP11 and half the 68k, I think it qualifies as different enough to be "new." And "relatively" successful.
The MSP430 code gcc produced for your example is depressingly bad. It fails to refactor the array access into pointer-based accesses, dutifully incrementing the index and adding it to each array base on each loop, when it could have used auto-incrementing indexed addressing, I think. (I thought that was an optimization that gcc would do before even getting to cpu-specific code generation. I guess not.)

Quote

I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag and generated "add with carry" from it. gcc on every machine does recognise idioms for things such as rotate and generate rotate instructions

Adding a "rotate" is relatively easy because it's a single instruction. Supporting Carry means retaining awareness of state that isn't part of the C model of how things work. "Which carry were they talking about?" For example, the really short examples that people are posting are all based on having "loop" instructions that don't change the carry bit. We saw how that restricts register choice on x86. MSP430 doesn't have any such looping instructions (that I recall or see in summaries.) So the compiler would have to decide that some math is different than other math, and ... it makes my brain hurt just thinking about it. (ARM has the "S" suffix for instructions to specify that they should update the flags, which is cute, I guess. But I'm not sure it's worth spending a bit on (and indeed, it's not there in Thumb-16).)
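For readers who haven't met it, the carry idiom under discussion is usually written as an unsigned compare after the add. A minimal sketch, assuming 32-bit words; the name `add_carry_cmp` is made up for illustration, and whether a given compiler turns this into an add-with-carry instruction is exactly the open question here:

```c
#include <assert.h>
#include <stdint.h>

/* Add two 32-bit words plus an incoming carry (0 or 1) and report the
 * outgoing carry using only portable unsigned arithmetic. */
static uint32_t add_carry_cmp(uint32_t a, uint32_t b, unsigned carry_in,
                              unsigned *carry_out)
{
    assert(carry_in <= 1u);
    uint32_t partial = a + b;        /* wraps modulo 2^32 */
    unsigned c1 = partial < a;       /* wrapped if the result got smaller */
    uint32_t sum = partial + carry_in;
    unsigned c2 = sum < partial;     /* carry caused by adding carry_in */
    *carry_out = c1 | c2;            /* at most one of the two can be set */
    return sum;
}
```

Note that the carry lives in an ordinary variable here, which is the "state that isn't part of the C model" problem in a nutshell: the compiler has to prove that the variable can stay in the flag between two adds.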

Quote

Quote
Operating systems and compiler runtime libraries are always going to have a little bit of assembler in them...

That's really a sign of a bad design in 2018.

I don't know. In some senses, having actual assembler modules seems "cleaner" than some of the things that compilers get forced into these days. (Consider the whole "sfr |= bitmask;" optimization in AVR...)

Quote

[RISCV is] not going to disappear without trace when the company that owns it goes out of business or loses interest. That's a serious problem ... How much software has been lost as a result of the demise of PDP11, VAX, Alpha, Nova, Eclipse, PA-RISC, [etc]

PDP10. "loses interest." Sigh.

ataradov · « **Reply #101 on:** December 14, 2018, 01:30:28 am »

Quote from: westfw on December 14, 2018, 01:06:59 am

I don't know. In some senses, having actual assembler modules seems "cleaner" than some of the things that compilers get forced into these days. (Consider the whole "sfr |= bitmask;" optimization in AVR...)

Memory mapped registers give you cleaner code. The reason AVR is hard is that it has limited address space. This is not a problem on 32-bit systems.

Essentially what I'm asking for is SCB on Cortex-M devices. It is still standard and defined by the architecture specification, so all vendors have to implement it to get a compliant core.

Special registers result in weird code where you move stuff to/from general purpose registers. And I don't see how this is any better from an implementation point of view. If you are writing the value into the special register, you still have to wait until the register fetch stage. And at that point you know the target address of a store operation and can break the pipeline if necessary.

westfw · « **Reply #102 on:** December 14, 2018, 02:44:51 am »

Quote

seems "cleaner"

Well, for instance, since "rotate" has been mentioned...
It's nice the compiler can be made smart enough to see:

Code: [Select]

((x << n) | (x >> (opsize - n)));

and perhaps generate a "rotate left" instruction. But I'd really rather a rotl(x,n); statement that I KNOW generates the appropriate assembly.
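A hedged sketch of what such a rotl could look like as a plain C helper, assuming a 32-bit operand (the name `rotl32` is hypothetical, not a standard function). Masking the shift counts avoids the undefined shift by the full operand size that the bare expression hits when n is 0. Whether it compiles to a single rotate instruction still depends on the compiler and target:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits.
 * Both shift counts are masked into 0..31, so n == 0 and n >= 32
 * never produce an out-of-range shift. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

With this, `assert(rotl32(0x80000001u, 1) == 0x00000003u);` holds, and a count of 0 or 32 returns x unchanged.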

Or, in the case of ARM, it's nice that the ABI and the hardware agree on which registers get saved, so that ISR functions and normal C functions are indistinguishable. I guess. Other times I wish the ISRs in C code were more easily distinguishable, and that the HW interrupt entry was quicker...

ataradov · « **Reply #103 on:** December 14, 2018, 02:52:50 am »

Quote from: westfw on December 14, 2018, 02:44:51 am

But I'd really rather a rotl(x,n); statement that I KNOW generates the appropriate assembly.

Me too. But that's a question for the compiler/standard library creators. Such things can either be defined as part of the standard (no way it realistically will happen for C) or as part of the library in the form of intrinsics. Intrinsics are easier, but they simply reflect the instruction set with all its limitations on types of arguments.

Quote from: westfw on December 14, 2018, 02:44:51 am

Or, in the case of ARM, it's nice that the ABI and the hardware agree on which registers get saved, so that ISR functions and normal C functions are indistinguishable. I guess. Other times I wish the ISRs in C code were more easily distinguishable, and that the HW interrupt entry was quicker...

That's a matter of future improvement. I'll take ARM's system any day of the week over what we had before. Now to stay competitive they need to make it better. For example, have a register in the NVIC that defines a bit mask of registers to save/restore. Your choice to stay with the default and be compatible with the ABI or do something manually.

RISC-V still has to catch up to what ARM has in this respect. And unfortunately I see no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs.

NorthGuy · « **Reply #104 on:** December 14, 2018, 03:20:22 am »

Quote from: ataradov on December 14, 2018, 02:52:50 am

RISC-V still has to catch up to what ARM has in this respect. And unfortunately I see no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs.

I don't think ARM is somehow targeted to MCUs. Compared to ARM, RISC-V seems cleaner and better, and it is also free. There's no reason to choose ARM over RISC-V.

With MCUs, a big problem is that the instructions are fetched from flash. Flash fetching is slow, so you can only fetch so many instructions per unit of time. A natural way to improve the performance is to make your instructions wider, so that every single instruction can do more, which is CISC, totally different to what you see in either ARM or RISC-V. However, such an approach doesn't seem to be very popular. Everybody wants ARM. Perhaps, 5 years from now everybody will want RISC-V, which is definitely a good thing.

ataradov · « **Reply #105 on:** December 14, 2018, 03:23:24 am »

Quote from: NorthGuy on December 14, 2018, 03:20:22 am

I don't think ARM is somehow targeted to MCUs.

And what is Cortex-M0+ then?

Quote from: NorthGuy on December 14, 2018, 03:20:22 am

There's no reason to choose ARM over RISC-V.

What is the interrupt latency on the RISC-V?

Quote from: NorthGuy on December 14, 2018, 03:20:22 am

Perhaps, 5 years from now everybody will want RISC-V, which is definitely a good thing.

Quite likely, but not without effort on RISC-V part.

brucehoult · « **Reply #106 on:** December 14, 2018, 03:56:59 am »

Quote from: hamster_nz on December 13, 2018, 09:56:58 pm

Quote from: legacy on December 13, 2018, 09:33:59 pm
You did in two days?
... of spare time between the boy going to bed, and me going to bed.

It's not much to look at:

Good work!

One of my current work tasks is helping extend binutils (assembler, disassembler) and Spike to understand the proposed Vector instruction set. Very similar stuff. And then on to llvm...

NorthGuy · « **Reply #107 on:** December 14, 2018, 04:00:19 am »

Quote from: ataradov on December 14, 2018, 03:23:24 am

Quote from: NorthGuy on December 14, 2018, 03:20:22 am
I don't think ARM is somehow targeted to MCUs.
And what is Cortex-M0+ then?

An existing architecture used for a purpose which wasn't intended during original design.

Quote from: ataradov on December 14, 2018, 03:23:24 am

Quote from: NorthGuy on December 14, 2018, 03:20:22 am
There's no reason to choose ARM over RISC-V.
What is the interrupt latency on the RISC-V?

You don't know that. This is ISA, not architecture. You can design your MCU with very low interrupt latency. Or you can design an MCU with a long pipeline and bad interrupt latency. That is actually where the benefit is. Anyone can design their own CPU with the design characteristics they want, and all of them can use the same ISA. Such things were completely impossible with ARM because the core was copyrighted and you had to live with what they gave you.

lucazader · « **Reply #108 on:** December 14, 2018, 04:25:34 am »

Continuing from NorthGuy's comments about interrupt latency being implementation independent:

If you look at SiFive's core design for the E20 core, which is targeted at the same level as the M0+ core by the looks of their marketing material, the interrupt latency is 6 cycles to a C handler whereas an M0+ is 15 cycles.
https://www.sifive.com/cores/e20

Now sure, there is a note on there that this is when using the CLIC vectored mode. But the M0+ also has a vectored interrupt controller.

ataradov · « **Reply #109 on:** December 14, 2018, 04:27:39 am »

Quote from: lucazader on December 14, 2018, 04:25:34 am

the interrupt latency is 6 cycles to a C handler whereas an M0+ is 15 cycles.

Except that Cortex-M saves registers in those 15 cycles, and RISC-V only arrives at a register saving code after those 6.

EDIT: I don't see how it can do 6 cycles to the C code. Maybe someone can point out what it is doing in those 6 cycles?

brucehoult · « **Reply #110 on:** December 14, 2018, 04:37:10 am »

Quote from: hamster_nz on December 13, 2018, 09:27:52 pm

I had to build dummy hardware for the "AON" (Always On) Peripheral, and the "PRCI" (used for clocking control) Peripheral, and it gets as far as attempting to configure the QSPI interface on address 0x10014000.

You'd have an easier time making a generic binary rather than a HiFive1 one, and using stdin/stdout emulation.

Code: [Select]

$ cd freedom-e-sdk/software/hello
$ riscv64-unknown-elf-gcc -O -march=rv32i -mabi=ilp32 hello.c -o hello

objdump that and you'll find main calling puts calling _puts_r calling {strlen, __sinit, __sfvwrite_r}.
Eventually you'll find yourself (after all the bollocks in Newlib) down in _write()

Code: [Select]

00012cd0 <_write>:
   12cd0:       ff010113                addi    sp,sp,-16
   12cd4:       00112623                sw      ra,12(sp)
   12cd8:       00812423                sw      s0,8(sp)
   12cdc:       00000693                li      a3,0
   12ce0:       00000713                li      a4,0
   12ce4:       00000793                li      a5,0
   12ce8:       04000893                li      a7,64
   12cec:       00000073                ecall
   12cf0:       00050413                mv      s0,a0
   12cf4:       00055a63                bgez    a0,12d08 <_write+0x38>
   12cf8:       40800433                neg     s0,s0
   12cfc:       08c000ef                jal     ra,12d88 <__errno>
   12d00:       00852023                sw      s0,0(a0)
   12d04:       fff00413                li      s0,-1
   12d08:       00040513                mv      a0,s0
   12d0c:       00c12083                lw      ra,12(sp)
   12d10:       00812403                lw      s0,8(sp)
   12d14:       01010113                addi    sp,sp,16
   12d18:       00008067                ret

You need to have your emulator implement ecall, and do the right thing based on the code in a7:

57: _close
62: _lseek
63: _read
64: _write
80: _fstat
93: _exit
214: _sbrk

Except for _fstat (which needs a struct remapped), most of those are easy to just pass directly on to your host OS.
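As a rough illustration of the dispatch described above, here is a minimal sketch of an ecall handler for an emulator. The `struct cpu` layout, the function name `handle_ecall`, and the error numbers are assumptions of this sketch, not part of any real emulator. The syscall numbers are the ones listed above, and the "negative return means -errno" convention matches what the `_write` disassembly does with a0. Only `_write` and `_exit` are handled, and the guest fd is ignored for brevity:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum { REG_A0 = 10, REG_A1 = 11, REG_A2 = 12, REG_A7 = 17 };  /* x10..x12, x17 */

enum { SYS_CLOSE = 57, SYS_LSEEK = 62, SYS_READ = 63, SYS_WRITE = 64,
       SYS_FSTAT = 80, SYS_EXIT = 93, SYS_SBRK = 214 };

/* Placeholder error numbers; use whatever the guest C library expects. */
enum { ERR_FAULT = 14, ERR_NOSYS = 38 };

/* Hypothetical emulator state: integer registers plus flat guest RAM. */
struct cpu {
    uint32_t x[32];
    uint8_t *mem;
    uint32_t mem_size;
    int      exited;
    int      exit_code;
};

/* Called when the emulator decodes an ecall instruction. */
static void handle_ecall(struct cpu *cpu, FILE *host_out)
{
    uint32_t a0 = cpu->x[REG_A0], a1 = cpu->x[REG_A1], a2 = cpu->x[REG_A2];

    assert(cpu->mem != NULL);
    switch (cpu->x[REG_A7]) {
    case SYS_WRITE:   /* a0 = fd (ignored here), a1 = guest address, a2 = length */
        if (a1 > cpu->mem_size || a2 > cpu->mem_size - a1) {
            cpu->x[REG_A0] = (uint32_t)(-ERR_FAULT);  /* buffer outside guest RAM */
            break;
        }
        cpu->x[REG_A0] = (uint32_t)fwrite(cpu->mem + a1, 1, a2, host_out);
        break;
    case SYS_EXIT:    /* a0 = exit status */
        cpu->exited = 1;
        cpu->exit_code = (int)a0;
        break;
    default:          /* everything not implemented yet */
        cpu->x[REG_A0] = (uint32_t)(-ERR_NOSYS);
        break;
    }
}
```

The remaining calls follow the same pattern: pull the arguments out of a0..a2, translate any guest pointer into a host pointer, call the host function, and put the result back in a0.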

brucehoult · « **Reply #111 on:** December 14, 2018, 04:49:58 am »

Quote from: brucehoult on December 14, 2018, 04:37:10 am

You'd have an easier time making a generic binary rather than a HiFive1 one, and using stdin/stdout emulation.

And if you cut your program down to ...

Code: [Select]

void _write(int fd, char *s, int len);

int main()
{
    _write(0, "hello world!\n", 13);
    return 0;
}

... you'll have a much smaller binary (I get 1532 bytes, 383 instructions) that still works fine on qemu user:

Code: [Select]

$ riscv64-unknown-elf-gcc -O -march=rv32i -mabi=ilp32 hello.c -o hello
$ size hello
   text	   data	    bss	    dec	    hex	filename
   1532	   1084	     28	   2644	    a54	hello
$ qemu-riscv32 hello
hello world!

Make that work on yours and you'll be sweet :-)

ok .. you'll need a link map something like HiFive1 uses to let you extract the elf to a raw binary. I'm sure you can handle that.

brucehoult · « **Reply #112 on:** December 14, 2018, 05:11:35 am »

Quote from: rstofer on December 13, 2018, 11:56:47 pm

What I really need is a reference book for the RISC-V that covers all the hardware details. Not just at 10,000 feet up but right down in the dirt.

RISC-V overview and tutorial: The RISC-V Reader: An Open Architecture Atlas https://www.amazon.com/RISC-V-Reader-Open-Architecture-Atlas/dp/0999249118

RISC-V instruction set reference Work In Progress (User ISA, and privileged): https://github.com/riscv/riscv-isa-manual/releases/latest

Computer architecture textbook for undergrads, using RISC-V: https://www.amazon.com/Computer-Organization-Design-RISC-V-Architecture/dp/0128122757

Quote

Something I can convert from text to HDL or, better yet, maybe the HDL is given.

Are there any such references?

There is no such thing as The HDL. RISC-V is an ISA specification that anyone can implement any way they want.

And many people have already!

If you want concrete, open, HDL implementations, here is a selection you can study, build, put in an FPGA, and run: https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs

Rocket is the original core design from Berkeley. Many other projects are based on it including SiFive's "freedom" and commercial x3n and x5n cores and BOOM.

PULPino is from ETH Zurich and is what has been used in the new NXP microcontroller SoC (with a RI5CY core and a Zero RISCY core as well as an M0 and an M4F)

PicoRV32 and VexRiscv are very popular for use in small FPGAs.

ReonV is based on the old Leon SPARC open-source implementation with the opcodes changed.

brucehoult · « **Reply #113 on:** December 14, 2018, 05:43:28 am »

Quote from: hamster_nz on December 14, 2018, 12:32:18 am

- I have the real hardware on my desk https://www.sifive.com/boards/hifive1 (one of the early team signature edition boards, no less!), to clear up any of my misunderstandings

Yup. I ordered one of the Signature Edition boards back in Dec 2016 (late Jan by the time it arrived in Moscow) https://twitter.com/BruceHoult/status/824965355755991041

I was pretty impressed that a company with about ten people had got the chip taped out, and back, and it worked at 320 MHz, and made the boards and software (including porting the Arduino libraries), in 18 months after being founded.

Two weeks later, someone on the support forums posted a video of playing the Dr Who theme on a square-wave software synthesizer they'd written for the HiFive1. I responded by spending about six hours on a quick hack to play the same thing from a proper WAV file using straight Arduino digitalWrite():

And then a Queen song (kinda topical right now!):

I think SiFive noticed these videos at the time -- Megan Wachs just asked me about them a couple of weeks ago and got me to resurrect the code for one of the demos in the SiFive booth at the RISC-V Summit last week.

Long story short ... after Michael Clark and I presented the rv8 simulator at CARRV (Workshop on Computer Architecture Research with RISC-V) in Boston in October 2017 various SiFive people took me to dinner and bars and suggested that I might like to come and work for them. As I was already impressed by the HiFive1 and they were at that time already taping-out the FU540 for the linux board it was a pretty easy sell as being more interesting than what I was doing at Samsung :-)

legacy · « **Reply #114 on:** December 14, 2018, 05:56:58 am »

Quote from: rstofer on December 13, 2018, 11:56:47 pm

I was wondering if you planned to implement the various registers that save state through the pipeline. I am interested in detecting and overcoming hazards.

When a customer asked me to help him debug his pipelined CPU (VHDL), we found a CPU which was basically working fine, except for some registers being mysteriously corrupted during execution.

Digging in, I found there were a couple of bugs in how the pipeline was stalling the ALU during divisions and multiplications, plus another bug of the same kind in the load/store.

The pipeline was not correctly stalled, and this caused data corruption.

Books usually don't cover anything at this level of detail, and it makes sense because courses are already too heavy.

Anyway, my customer's CPU has eight stages. Although the instruction and data memory occupy multiple cycles, they are fully pipelined, so that a new instruction can start on every clock. The function of each stage is given as follows: (NOTE: the stages described below are different from those of MIPS R2K)

IF - First half of instruction fetch. Program Counter selection actually happens here, together with the initiation of instruction cache access
IS - Second half of instruction fetch, complete instruction cache access. Note that it's assumed that instruction accesses always hit the instruction cache
ID - Instruction decode, hazard checking (this stage is critical)
RF - RegisterFile Fetch
EX - Execution, which includes effective address calculation, ALU operation, and branch-target computation and condition evaluation (this stage is critical)
DF - Data fetch first half of data cache access
DS - Second half of data fetch, completion of data cache access. Note that the data access always hit the data cache.
WB - Write back for loads and register-register operations.

This working scheme is immediately defective in this description, because EX, DF and DS are assumed to take 1 clock edge, while there are scenarios when they take more than 1 clock edge (e.g. multiplication, division, data not in cache --> access to the ram --> n wait-states ---> n + m clock edges)

Therefore the pipeline needs to be stalled properly. This is usually not considered in books, but it's what you find in reality.
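To make the effect of a multi-cycle stage concrete, here is a small timing model in C. It is only a sketch under simplifying assumptions: strictly in-order issue, no buffering between stages, and no data hazards or forwarding modelled. The function name `pipeline_cycles` and the latency table are invented for illustration and are not taken from the customer's design. A stage that needs extra cycles holds every instruction behind it, which is the stall that has to be implemented correctly in the HDL:

```c
#include <assert.h>

enum { STAGES = 8, EX = 4 };   /* IF IS ID RF EX DF DS WB */

/* Cycles needed to drain n instructions through the pipeline.
 * lat[i][s] = cycles instruction i occupies stage s (1 when nothing stalls). */
static int pipeline_cycles(int n, int lat[][STAGES])
{
    int prev[STAGES + 1] = {0};  /* previous instruction: cycle it entered each stage */
    int cur[STAGES + 1];         /* index STAGES means "has left WB" */
    int done = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* Fetch as soon as the previous instruction has moved on to stage 1. */
        cur[0] = (i == 0) ? 0 : prev[1];
        for (int s = 1; s <= STAGES; s++) {
            int ready = cur[s - 1] + lat[i][s - 1];               /* finished stage s-1 */
            int free_at = (i > 0 && s < STAGES) ? prev[s + 1] : 0; /* stage s vacated */
            cur[s] = ready > free_at ? ready : free_at;           /* otherwise: stall */
        }
        done = cur[STAGES];
        for (int s = 0; s <= STAGES; s++)
            prev[s] = cur[s];
    }
    return done;
}
```

With three instructions that each take one cycle per stage, the model gives 10 cycles. Making the first one spend 4 cycles in EX gives 13: the two younger instructions each wait behind it instead of overwriting its state.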

brucehoult · « **Reply #115 on:** December 14, 2018, 05:58:50 am »

Quote from: westfw on December 14, 2018, 01:06:59 am

The MSP430 code gcc produced for your example is depressingly bad. It fails to refactor the array access into pointer-based accesses, dutifully incrementing the index and adding it to each array base on each loop, when it could have used auto-incrementing indexed addressing, I think. (I thought that was an optimization that gcc would do before even getting to cpu-specific code generation. I guess not.)

Yes, I don't know why. gcc is perfectly capable of doing this on other ISAs.

Quote

Quote
I'm actually very disappointed that manufacturers of machines with condition codes don't seem to have added recognition of the C idiom for the carry flag and generated "add with carry" from it. gcc on every machine does recognise idioms for things such as rotate and generate rotate instructions
Adding a "rotate" is relatively easy because it's a single instruction. Supporting Carry means retaining awareness of state that isn't part of the C model of how things work. "Which carry were they talking about?" For example, the really short examples that people are posting are all based on having "loop" instructions that don't change the carry bit. We saw how that restricts register choice on x86. MSP430 doesn't have any such looping instructions (that I recall or see in summaries.) So the compiler would have to decide that some math is different than other math, and ... it makes my brain hurt just thinking about it.

Yes.

I did a little googling and found that one idiom said to produce near-optimal code with some compilers and ISAs is to do the arithmetic in double precision and then cast/mask/shift the result back down to normal precision:

Code: [Select]

long tmp = (long)a + b + carryIn;
int sum = (int)tmp;
int carry = (int)(tmp >> (sizeof(int)*8));

I've had luck with the same kind of approach to generate instructions such as "give me the high bits of a multiply" in the past, but I haven't checked this idiom for carry myself yet.
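A sketch of the same widening idiom applied to a whole multi-word add, using fixed-width types. That choice matters because long and int are the same width on many ABIs (ILP32 and 64-bit Windows, for example), in which case the shift in the snippet above would not recover the carry. The name `bignum_add` and the least-significant-limb-first layout are assumptions of this sketch, and the code generated for it has not been checked on any particular compiler:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* c[] = a[] + b[] over n 32-bit limbs, least significant limb first.
 * Returns the final carry out (0 or 1). */
static unsigned bignum_add(uint32_t *c, const uint32_t *a,
                           const uint32_t *b, size_t n)
{
    unsigned carry = 0;

    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; i++) {
        /* The 64-bit sum of two 32-bit limbs plus a carry cannot overflow. */
        uint64_t tmp = (uint64_t)a[i] + b[i] + carry;
        c[i] = (uint32_t)tmp;            /* low half is the sum limb */
        carry = (unsigned)(tmp >> 32);   /* high half is the carry, 0 or 1 */
    }
    return carry;
}
```

On a 32-bit target the hope is that the compiler recognises the 64-bit temporary as "add, then add with carry" and never materialises the upper word.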

Quote

(ARM has the "S" suffix for instructions to specify that they should update the flags, which is cute, I guess. But I'm not sure it's worth spending a bit on (and indeed, it's not there in Thumb-16).)

PowerPC does the same thing with a "." suffix on opcodes to update the condition codes in cr0. (cr1..cr7 are updated only by cmp instructions).

legacy · « **Reply #116 on:** December 14, 2018, 06:24:28 am »

Quote from: ataradov on December 14, 2018, 01:30:28 am

Quote from: westfw on December 14, 2018, 01:06:59 am
I don't know. In some senses, having actual assembler modules seems "cleaner" than some of the things that compilers get forced into these days. (Consider the whole "sfr |= bitmask;" optimization in AVR...)

Memory mapped registers give you cleaner code. The reason AVR is hard is that it has limited address space. This is not a problem on 32-bit systems.

HC11 (8bit register machine, 16bit address space) comes with soft registers; basically, gcc uses the first 256 bytes of CPU internal RAM for this.

But are you sure that this makes cleaner code?

ataradov · « **Reply #117 on:** December 14, 2018, 06:44:41 am »

Quote from: legacy on December 14, 2018, 06:24:28 am

But are you sure that this makes cleaner code?

We are talking about different things. Memory-mapped general purpose registers is a horrible idea.

I'm talking about special registers that use special commands to access them (like co-processors on ARM and MIPS) vs just mapping the same special registers into the regular address space where regular load/store instructions can get to them.
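To illustrate the memory-mapped style, here is a minimal C sketch. The register block layout, the field names, and the `SYSCTL_BASE_ADDR` macro are all invented for illustration and do not describe any real core. On a host build, ordinary memory stands in for the hardware so the code can be compiled and run anywhere:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical system-control register block. */
struct sysctl_regs {
    volatile uint32_t vector_base;
    volatile uint32_t irq_enable;
};

#ifdef SYSCTL_BASE_ADDR              /* on real hardware: a fixed address */
#define SYSCTL ((struct sysctl_regs *)SYSCTL_BASE_ADDR)
#else                                /* on a host: plain memory stands in */
static struct sysctl_regs sysctl_fake;
#define SYSCTL (&sysctl_fake)
#endif

/* Enable one interrupt line with an ordinary load, OR, and store. */
static void irq_enable_line(unsigned line)
{
    assert(line < 32u);
    SYSCTL->irq_enable |= (uint32_t)1 << line;
}
```

No compiler support beyond `volatile` is needed for this. With the special-instruction approach (coprocessor moves, or CSR instructions on RISC-V) the same operation has to go through inline assembly or an intrinsic.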

brucehoult · « **Reply #118 on:** December 14, 2018, 08:23:56 am »

Quote from: ataradov on December 14, 2018, 02:52:50 am

RISC-V still has to catch up to what ARM has in this respect. And unfortunately I see no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs.

Then you're not looking.

There is an Embedded ABI being worked on, with fewer volatile registers -- probably a0..a3 and t0..t1 instead of a0..a7 and t0..t6 for Linux, thus cutting down the number of registers that need to be saved on an interrupt to six (plus ra) instead of fifteen. Also, all registers from 16..31 become callee-save, making the ABI identical between rv32i and rv32e. This is being worked on by people in the embedded community.

There is the CLIC (Core Local Interrupt Controller), a backward-compatible enhancement that was *specifically* designed with the needs and inputs of the embedded community. I've already pointed you to this. It provides direct vectoring to C functions decorated with an attribute, plus with a very small amount of ROMable code that can be built into a processor it provides vectoring to *standard* ABI C functions along with features such as interrupt chaining (dispatching to the next handler without restoring and re-saving registers), late dispatch (if a higher priority interrupt comes in while registers are being saved), and similar latencies to those ARM cores provide.

SiFive has developed a small 2-stage pipeline processor core *specifically* for deeply embedded real-time applications. There is no branch prediction .. all taken branches take 2 cycles. Other suppliers such as Syntacore and PULPino have similar cores, for example the Zero RISCy in the new NXP chip. SiFive has also developed an extension for the 3-series and 5-series cores to disable branch prediction for embedded real-time tasks (and of course turning part of the icache into instruction scratchpad).

The Vector Extension working group has been bending over backwards to accommodate the wishes and needs of the embedded community and make the very lowest-end implementations simpler and better performing. As a simple example, the high-end people such as Esperanto, Barcelona Supercomputer Centre, and Krste at SiFive wanted predicated-off vector lanes to be set to zero i.e. vadd dst,src1,src2 should not have to read dst. The embedded guys wanted the predicated-off vector lanes to be "left untouched". In this and in several other areas the design has been modified to better suit small embedded cores.

The Bit Manipulation Working Group is almost exclusively looking at things that primarily the embedded community want.

The claim that there is "no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs" is so far from the clearly obvious truth that it just about has to be trolling.

brucehoult · « **Reply #119 on:** December 14, 2018, 08:58:30 am »

Quote from: ataradov on December 14, 2018, 06:44:41 am

Quote from: legacy on December 14, 2018, 06:24:28 am
But are you sure that this makes cleaner code?
We are talking about different things. Memory-mapped general purpose registers is a horrible idea.

I agree with this. It's very important for execution pipelines that registers have "names" not "numbers". That is, the register an instruction refers to must be specified explicitly in the instruction, and not be subject to modification or calculation.

Quote

I'm talking about special registers that use special commands to access them (like co-processors on ARM and MIPS) vs just mapping the same special registers into the regular address space where regular load/store instructions can get to them.

For peripherals, fine.

But *exactly* the same reasons that apply to general purpose registers not being memory mapped also apply to registers that affect the execution environment of the machine. And more.

brucehoult · « **Reply #120 on:** December 14, 2018, 09:05:34 am »

Quote from: brucehoult on December 14, 2018, 04:49:58 am

Quote from: brucehoult on December 14, 2018, 04:37:10 am
You'd have an easier time making a generic binary rather than a HiFive1 one, and using stdin/stdout emulation.

And if you cut your program down to ...

Even simpler, of course, you could just allocate a few bytes at some address in low memory (non RAM) -- call it STDIO_BASE perhaps. Then make loads from STDIO_BASE read a character from the host OS stdin, stores to STDIO_BASE+1 write a character to the host OS stdout and (optional) stores to STDIO_BASE+2 write a character to the host OS stderr.

That's dead easy both to implement in the simulator and to write programs for.

Bonus points: write implementations of _read(), _write() that do that, so the rest of <stdio> Just Works.
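A minimal sketch of the simulator side of that scheme. The address chosen for `STDIO_BASE`, the function names, the placement of stderr at `STDIO_BASE + 2`, and the 0xFF returned at end of input are all assumptions of this sketch. The idea is that the simulator's load/store path calls these helpers for any address that falls outside guest RAM:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define STDIO_BASE 0x00000100u   /* hypothetical address outside guest RAM */

/* Byte load from the I/O window: STDIO_BASE reads one character of host input. */
static uint8_t mmio_load8(uint32_t addr, FILE *host_in)
{
    assert(host_in != NULL);
    if (addr == STDIO_BASE) {
        int ch = fgetc(host_in);
        return ch == EOF ? 0xFF : (uint8_t)ch;   /* 0xFF marks end of input */
    }
    return 0;                                    /* unmapped: reads as zero */
}

/* Byte store to the I/O window. Returns 1 if the address was handled. */
static int mmio_store8(uint32_t addr, uint8_t value,
                       FILE *host_out, FILE *host_err)
{
    if (addr == STDIO_BASE + 1) { fputc(value, host_out); return 1; }
    if (addr == STDIO_BASE + 2) { fputc(value, host_err); return 1; }
    return 0;                                    /* not an stdio address */
}
```

On the guest side, `_write()` then reduces to a loop of byte stores through a `volatile` pointer to `STDIO_BASE + 1` (or `+ 2` for fd 2), and `_read()` to a loop of byte loads from `STDIO_BASE`.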

brucehoult · « **Reply #121 on:** December 14, 2018, 11:22:54 am »

I'd just like to say thanks to all who have contributed to this thread (and I doubt it's dead yet): rstofer, lucazader, legacy, ehughes, DavidH, hamster_nz, NorthGuy, westfw, ataradov, obiwanjacobi, FlyingDutch

Cheers, guys :-)

westfw · « **Reply #122 on:** December 15, 2018, 07:27:32 am »

Quote

The MSP430 code gcc produced for your example is depressingly bad.

Here's what I get for a hand-written MSP430 version. It's sort-of interesting the way there ends up being a "local variable" for the carry, but I still get to use the addc instruction, thanks to the status also being available as a register...

(Now, it's been a while since I did any MSP430 assembly, I'm not sure I fully understand the C ABI, and I didn't actually ~~compile or~~ test this. But I think it should be pretty close. OTOH, I think some of ~~those auto-incrementing mov instructions may turn out to be 32bits.~~ But... SO MUCH better than gcc did :-( )

Edit: I fixed it up to the point where it will at least go through the assembler OK... (auto-increment indexed addressing doesn't work for a destination...) (maybe I shouldn't clear ALL the flags...)

Code: [Select]

bignumAdd:
        push    sum
        push    savSR

        cmp     #1, cnt         ; if (cnt <= 0) return
        jl      exit
        clrc                     ; is this already clear?
        mov     SR, savSR       ; clear carry to start

loop:   mov     savSR, SR       ; get carry from sum, not cnt decrement.
        mov     @a+, sum        ; get a[n]
        addc    @b+, sum        ;  add b[n]
        mov     sum, 0(c)       ;   store c[n]
        mov     SR, savSR       ; save the carry info
        incd    c               ;  increment destination pointer (by 2)
        dec     cnt             ; decrement count
        jnz     loop            ; next word.
exit:
        pop     savSR
        pop     sum
        ret

westfw · « **Reply #123 on:** December 15, 2018, 07:34:10 am »

Are there any books/curricula written on "comparative assembly language" ?
Kids today are barely exposed to one, I think, and even "back in the day" when we had both IBM360 and PDP11, we didn't really compare them...
I guess some of that gets covered in "computer architecture" classes, but I remember those being a lot more hardware-oriented...
Maybe you can't compare them without a hardware orientation? But it seems like it ought to be possible. I mean, *I* enjoy comparing instruction sets, and my background gives only a pretty vague handwave to the actual implementation...

ataradov · « **Reply #124 on:** December 15, 2018, 07:39:31 am »

Quote from: westfw on December 15, 2018, 07:34:10 am

Maybe you can't compare them without a hardware orientation?

That is 100% the case. At least if you are actually comparing for performance. You can compare for size easily.

There are a number of fun examples for Cortex-M7 where rearranging the order of a couple of absolutely independent instructions changes the speed of execution by a factor of two. This is because CM7 is a dual-issue pipeline and integer and floating point instructions can essentially execute at the same time. It is still the same instruction set as Cortex-M4, but how you write the code now matters.

And comparing any of this to modern X86 is just silly.

