Author Topic: RiscV v.s ARM Cortex instructions. (Read 1524 times)

MT · « **on:** February 27, 2023, 05:15:34 pm »

Seems RISC-V doesn't have an equivalent to the ARM Cortex UXTB instruction, so how many operations would a 32-bit RISC-V take to do the same?

UXTB
Unsigned Extend Byte extracts an 8-bit value from a register, zero-extends it to 32 bits, and writes the result to the destination register.
The instruction can specify a rotation by 0, 8, 16, or 24 bits before extracting the 8-bit value.
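For reference, the behaviour described above can be modelled in C roughly as follows. This is only a sketch of the semantics; the function names `ror32` and `uxtb` are made up for illustration and are not part of any ARM header.

```c
#include <stdint.h>

/* Rotate right by r bits. The masking avoids an undefined shift by 32 when r is 0. */
uint32_t ror32(uint32_t x, unsigned r) {
    r &= 31u;
    return (x >> r) | (x << ((32u - r) & 31u));
}

/* Model of UXTB with an optional rotation: rot is expected to be 0, 8, 16 or 24.
 * The source is rotated right first, then the low byte is zero-extended. */
uint32_t uxtb(uint32_t rm, unsigned rot) {
    return ror32(rm, rot) & 0xFFu;
}
```

With these definitions, `uxtb(x, 8)` picks out bits 15..8 of `x`, and `uxtb(x, 24)` picks out the top byte.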

woofy · « **Reply #1 on:** February 27, 2023, 05:59:37 pm »

Would SRLI followed by ANDI do the job?

SiliconWizard · « **Reply #2 on:** February 27, 2023, 08:04:49 pm »

Quote from: woofy on February 27, 2023, 05:59:37 pm

Would SRLI followed by ANDI do the job?

Yep.

brucehoult · « **Reply #3 on:** February 27, 2023, 08:09:05 pm »

It would indeed. Or just ANDI with 255 if the rotation is zero.

More generally, a "SLLI;SRLI" pair can extract any zero-extended field of any bit size or bit position, and "SLLI;SRAI" can extract any sign-extended field of any bit size or bit position.
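A rough C rendering of the two shift pairs, as a sketch only. The names `extract_unsigned` and `extract_signed` and the `lsb`/`width` parameters are invented for illustration. The signed variant assumes the compiler converts to `int32_t` in two's complement and implements `>>` on negative values as an arithmetic shift, which mainstream compilers do but which older C standards leave implementation-defined.

```c
#include <stdint.h>

/* Zero-extended field of 'width' bits starting at bit 'lsb'.
 * Assumes width >= 1 and lsb + width <= 32. */
uint32_t extract_unsigned(uint32_t x, unsigned lsb, unsigned width) {
    uint32_t top = x << (32u - lsb - width);  /* SLLI: discard bits above the field */
    return top >> (32u - width);              /* SRLI: discard bits below, fill with zeros */
}

/* Sign-extended field, same assumptions as above. */
int32_t extract_signed(uint32_t x, unsigned lsb, unsigned width) {
    int32_t top = (int32_t)(x << (32u - lsb - width));  /* SLLI: field's top bit becomes the sign bit */
    return top >> (32u - width);                        /* SRAI: assumed arithmetic shift */
}
```

For the UXTB case with a rotation of 8, this corresponds to `extract_unsigned(x, 8, 8)`, i.e. a left shift by 16 followed by a logical right shift by 24.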

SiliconWizard · « **Reply #4 on:** February 27, 2023, 08:15:58 pm »

Quote from: brucehoult on February 27, 2023, 08:09:05 pm

It would indeed. Or just ANDI with 255 if the rotation is zero.

More generally, a "SLLI;SRLI" pair can extract any zero-extended field of any bit size or bit position, and "SLLI;SRAI" can extract any sign-extended field of any bit size or bit position.

I looked in the bitmanip extension, and didn't find any instruction there that could do it in a single instruction.
Which I found a bit odd. IIRC, such a "byte extraction" instruction was proposed in early drafts. A lot of proposals have been stripped off. I can understand that, as the first draft I read looked like a monster.
And as often in the RISC-V ISA, they probably favored reducing the number of instructions, and considered that this one could be optimized by fusing two existing instructions.

woofy · « **Reply #5 on:** February 27, 2023, 08:47:24 pm »

Quote from: SiliconWizard on February 27, 2023, 08:15:58 pm

Quote from: brucehoult on February 27, 2023, 08:09:05 pm
It would indeed. Or just ANDI with 255 if the rotation is zero.

More generally, a "SLLI;SRLI" pair can extract any zero-extended field of any bit size or bit position, and "SLLI;SRAI" can extract any sign-extended field of any bit size or bit position.

I looked in the bitmanip extension, and didn't find any instruction there that could do it in a single instruction.
Which I found a bit odd. IIRC, such a "byte extraction" instruction was proposed in early drafts. A lot of proposals have been stripped off. I can understand that, as the first draft I read looked like a monster.
And as often in the RISC-V ISA, they probably favored reducing the number of instructions, and considered that this one could be optimized by fusing two existing instructions.

Yeah, I guess there are diminishing returns in adding too many niche instructions. MIPS does have nice bitfield instructions though (EXT and INS in MIPS32 release 2).

Nominal Animal · « **Reply #6 on:** February 27, 2023, 11:48:22 pm »

In general, examining the operation to be done, and dividing it in different ways can yield much better solutions than just mapping machine instructions from one architecture to another.

If you do not consider the capabilities of the target instruction set architecture, and only look for equivalent instructions or equivalent instruction sequences, you won't find the most efficient ways to implement the underlying sequence of operations.

For ARM Cortex-M4 and -M7, you want the ARMv7-M Architecture Reference Manual. (Both M4 and M7 have the DSP extension built in, i.e. SMLAD and such.)

Even such a simple operation as blending two 15-bit RGB pixel values together, with p = 0 (0%) to 33 (100%),
r' = (r * p + R * (33 - p)) >> 5 = (R*33 + p*(r - R)) >> 5
g' = (g * p + G * (33 - p)) >> 5 = (G*33 + p*(g - G)) >> 5
b' = (b * p + B * (33 - p)) >> 5 = (B*33 + p*(b - B)) >> 5
can be implemented in many different ways on 32-bit architectures, depending on exactly what kind of machine instructions are available and efficient.
You definitely do not need to do six multiplications per pixel as the above might suggest. And yes, a 100% blend of rgb, 0% of RGB, is actually p = 2ⁿ+1 for n-bit color components (n = 5 here), not 2ⁿ. Many disagree, but they're wrong. It is trivial to verify: (2ⁿ-1)*(2ⁿ+1) = 2^(2n)-1, and the right shift is a truncation (rounding towards zero).
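One straightforward way to write the second form of the formulas in C, using one multiplication by p per component, is sketched below. It is not claimed to be the most efficient variant for any particular architecture. The names `blend5` and `blend_rgb555` are made up, and the pixel layout (red in bits 14..10, green in 9..5, blue in 4..0) is an assumption.

```c
#include <stdint.h>

/* Blend one 5-bit component: p = 0 yields old_c (R above), p = 33 yields new_c (r above).
 * The sum equals new_c*p + old_c*(33 - p), so it is never negative for p in 0..33. */
unsigned blend5(unsigned new_c, unsigned old_c, unsigned p) {
    int sum = (int)old_c * 33 + (int)p * ((int)new_c - (int)old_c);
    return (unsigned)sum >> 5;
}

/* Blend two 15-bit pixels component by component (assumed layout: R 14..10, G 9..5, B 4..0). */
uint16_t blend_rgb555(uint16_t fg, uint16_t bg, unsigned p) {
    unsigned r = blend5((fg >> 10) & 31u, (bg >> 10) & 31u, p);
    unsigned g = blend5((fg >> 5) & 31u, (bg >> 5) & 31u, p);
    unsigned b = blend5(fg & 31u, bg & 31u, p);
    return (uint16_t)((r << 10) | (g << 5) | b);
}
```

With p = 33 a full-scale component gives 31*33 = 1023, and 1023 >> 5 = 31, so the maximum value is preserved, which is the point made above about using 2ⁿ+1.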

westfw · « **Reply #7 on:** February 28, 2023, 01:33:56 am »

Quote

a "SLLI;SRLI" pair can extract any zero-extended field of any bit size or bit position

This feels particularly odd coming from the 8-bit microcontroller world without barrel shifters, where shifts (by more than 1 bit, and/or of values wider than 8 bits) are quite expensive.

Similarly, I've seen CM0 compilers implement bit tests with a shift of the desired bit to carry or sign position, followed by a conditional branch. It essentially burns a register, but there is no "and immediate" instruction on CM0 (non-destructive or otherwise), so it ends up quicker and shorter than other possibilities.
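In C terms, the shift-to-sign-position trick looks roughly like the sketch below. The function name `bit_is_set` is invented for illustration, and the cast to `int32_t` assumes the usual two's complement conversion. Whether a given compiler actually emits a shift plus a branch on the sign flag for this is not guaranteed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Test bit k (0..31) by shifting it into bit 31 and checking the sign,
 * instead of ANDing with a mask that would first have to be loaded into a register. */
bool bit_is_set(uint32_t x, unsigned k) {
    return (int32_t)(x << (31u - k)) < 0;
}
```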

brucehoult · « **Reply #8 on:** February 28, 2023, 03:32:18 am »

Quote from: woofy on February 27, 2023, 08:47:24 pm

Quote from: SiliconWizard on February 27, 2023, 08:15:58 pm
Quote from: brucehoult on February 27, 2023, 08:09:05 pm
It would indeed. Or just ANDI with 255 if the rotation is zero.

More generally, a "SLLI;SRLI" pair can extract any zero-extended field of any bit size or bit position, and "SLLI;SRAI" can extract any sign-extended field of any bit size or bit position.

I looked in the bitmanip extension, and didn't find any instruction there that could do it in a single instruction.
Which I found a bit odd. IIRC, such a "byte extraction" instruction was proposed in early drafts. A lot of proposals have been stripped off. I can understand that, as the first draft I read looked like a monster.
And as often in the RISC-V ISA, they probably favored reducing the number of instructions, and considered that this one could be optimized by fusing two existing instructions.

Yeah, I guess there are diminishing returns in adding too many niche instructions.

Very much diminishing returns. The general rule when we were doing the bitmanip extension was that any new instruction had to replace at least 3 or 4 existing instructions unless it was an incredibly common operation -- and preferably many more than that.

Bitfield extract only replaces two instructions (as above), so at most it saves one clock cycle. Any new bitfield extract instruction would have to be a full 4-byte opcode, which means no code size saving at all over the shift pair sequences, which are also 4 bytes if you don't mind the final extracted value being in the same register as the structure it was extracted from (and that register is one of the 8 "C extension" registers).

Quote

MIPS does have nice bitfield instructions though (EXT and INS in MIPS32 release 2).

I proposed adopting the Motorola 88000 instructions ext, extu, mak, mask and maybe set, though the latter doesn't fit RISC-V's 2R 1W pipeline design.

westfw · « **Reply #9 on:** February 28, 2023, 07:06:18 am »

Quote

Very much diminishing returns.

That'd be the point of RISC in general, right? I remember a bunch of "string" instructions added to the PDP10 that ended up being slower than the regular instructions that they would have replaced. Just to please the COBOL folks.



EEVblog Electronics Community Forum
