I like to approach optimization from the compiler's side, and go from there. Work in stages:
1. Any high-level optimizations (to the algorithm) have already been done. (Start there first!!)
2. You can write your ASM function to drop in, so the compiler knows how to use it and you can go back to #1 if a new strategy comes up. (You may end up discarding that function in the process, wasting effort -- hence the emphasis on the front end.)
3. While you still need to comply with the compiler's contract for call/return conventions (the ABI), you have complete freedom to (ab)use the hardware within the function. If you can make better use of it, there you go, that's clock cycles saved!
4. If it looks like #1 is basically done, you can go farther and expand into other functions, optimizing neighboring ones, inlining them in a longer or higher-level function, etc. (The compiler does some of this already, but it won't discover many optimizations it couldn't have already made in the base functions.)
Case study: a simple DSP (digital signal processing) operation. The fundamental operation of DSP is the multiply-accumulate (MAC) instruction, A = A + B * C, used in a loop to convolve two arrays. (Convolution is a big word for a simple process: step through the two arrays together, multiply the elements pairwise, then sum up the results. If you know linear algebra, it's the dot product of two vectors. This isn't a full definition of convolution, more a functional example.)
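As a concrete sketch (the function and variable names here are illustrative, not from the original), the MAC-in-a-loop dot product looks like:

```c
#include <stdint.h>

/* Minimal sketch: convolve two n-element arrays by repeated
 * multiply-accumulate -- the dot product of x and h. */
static int32_t dot_mac(const int16_t *x, const int16_t *h, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];   /* one MAC per element pair */
    return acc;
}
```

Each loop iteration is exactly one A = A + B * C step; a hardware MAC instruction does the whole body in one cycle.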
For example, if we convolve a sequence of digital samples (the latest sample, and the N previous samples), with a sequence of samples representing the impulse response of a filter (also of length N), the result is the next sample in the series of that signal as if it were passed through that filter. This is a FIR (finite impulse response) filter: an impulse input (one sample nonzero, the rest zero) gets multiplied by each element of the filter in turn, all of which can be nonzero, up to length N where the magnitude implicitly drops to zero, because, well, that's the size of the array.
FIR filters are great because you can control the impulse response, well, exactly; that's all the filter array is! And by extension the frequency response, and it's very easy to get certain minimum-phase or equal-delay properties from them. The downside is, if your desired filter has a very long time constant (e.g., a low frequency, narrow bandpass/stop, or long time delay or reverberation), you need as long of an array. And as much sample history. And you need to compute the convolution every sample.
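A hypothetical C sketch of one FIR step, host-testable (history[], fir_step and N are illustrative names, not from the original). history[0] is the newest sample; an impulse fed in gets multiplied by each coefficient in turn as it ages through the array:

```c
#include <stdint.h>

#define N 4  /* filter length; real filters are far longer */

static int16_t history[N];  /* sample history, newest first */

static int32_t fir_step(int16_t in, const int16_t coef[N]) {
    for (int i = N - 1; i > 0; i--)
        history[i] = history[i - 1];       /* age the sample history */
    history[0] = in;
    int32_t acc = 0;
    for (int i = 0; i < N; i++)
        acc += (int32_t)history[i] * coef[i];  /* convolve */
    return acc;
}
```

Feeding an impulse (a 1, then zeros) returns the coefficients one by one, then zero forever after: the finite impulse response is literally the coefficient array.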
So, DSP machines need to crank a lot of data, and tend to be very powerful indeed, if relatively specialized for this one task. (Example: one might use a 200MHz core capable of delivering one MAC per cycle, so it could process around 4500 words per 44.1kHz audio sample; a FIR filter could have a minimum cutoff on the order of 20Hz.)
If we use a history of input and output samples, we can employ feedback to useful effect. We have to be careful: obviously we don't want that accumulating to gibberish, with exponential growth; it has to be stable. And being a feedback process, we expect it to have an exponential (well, discrete time, so technically, geometric) decay: an infinite impulse response (IIR). Whereas the FIR filter coefficients can take any value, the IIR filter coefficients have to be computed carefully. Fortunately, that analysis has been done, so we can design filters using tools, without having to get too in-depth with the underlying theory. (Which revolves around the Z transform. It happens to map to the Fourier transform -- everything the EE knows about analog signals already applies to DSP, if in a somewhat odd way.)
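The simplest possible IIR filter can be sketched in a few lines (names and the use of floating point are illustrative, chosen for clarity rather than matching the fixed-point discussion below): y[n] = a*y[n-1] + x[n]. With |a| < 1 the feedback is stable, and an impulse decays geometrically: 1, a, a^2, a^3, ... forever.

```c
/* One-pole IIR sketch: each output feeds back into the next. */
static double y_prev = 0.0;

static double iir_step(double x, double a) {
    y_prev = a * y_prev + x;   /* feedback from the previous output */
    return y_prev;
}
```

Note the contrast with the FIR case: one coefficient and one state variable give an infinitely long (if ever-shrinking) response, where a FIR filter would need an array as long as the desired decay.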
Anyway, that explains the basic operation. How do we compute it?
Here's a basic MAC operation, taking as parameters a 32-bit "accumulator", a 16-bit sample, and an 8-bit coefficient. (This wouldn't be nearly enough bits for a proper filter, but works well enough for a simple volume control -- we might convolve arrays of live samples with gain coefficients, to create a mixing board. We can fill the arrays however we like, after all; the convolution doesn't have to be across time, it just is when we are filtering a signal.) The format is integer, signed, presumably in fixed point. (A typical case would be both sample and coefficient in fractional format (0.16 and 0.8), so that the result is 8.24, and the top 8 bits are simply discarded, after testing for overflow of course.)
int32_t mac32r16p8(int32_t acc, int16_t r, const int8_t* p) {
return acc + (int32_t)r * *p;
}
In avr-gcc 4.5.4, this compiles to: (comments added inline to help with those unfamiliar with the instruction set)
mac32r16p8:
push r14
push r15
push r16
push r17 ; save no-clobber registers
mov r14,r22
mov r15,r23
mov r16,r24
mov r17,r25 ; r14:r15:r16:r17 = acc
mov r30,r18
mov r31,r19 ; Z = p
ld r22,Z ; r22 = *p
clr r23
sbrc r22,7 ; skip following instruction if bit in register set
com r23 ; bit 7 = sign bit; com = complement -- ah, this is a sign extend operation
mov r24,r23
mov r25,r23 ; sign extend to 32 bits (r22:r23:r24:r25 = (int32_t)*p)
mov r30,r20
mov r31,r21 ; [1]
mov r18,r30
mov r19,r31 ; ?!?!
clr r20
sbrc r19,7 ; oh, sign extending r20:r21 = r...
com r20 ; probably the compiler allocated r18:r19:r20:r21 at [1],
mov r21,r20 ; so it had to move to a temporary register (r30:r31) first. sloppy.
rcall __mulsi3 ; r22-r25 * r18-r21 = r22-r25; 37 cycles
mov r18,r22
mov r19,r23
mov r20,r24
mov r21,r25 ; ?!
add r18,r14
adc r19,r15
adc r20,r16
adc r21,r17 ; acc + product
mov r22,r18
mov r23,r19
mov r24,r20
mov r25,r21 ; return result
pop r17 ; restore no-clobber registers
pop r16
pop r15
pop r14
ret
- Note the surrounding push and pops: the ABI says, don't clobber r2-r17 and r28-r29. This function uses a lot of registers (8 bytes passed in, 4 passed out), so that might happen. Push and pop cost a couple cycles each (most instructions are 1 cycle, but most memory accesses add a 1-2 cycle penalty), so they're a priority to get rid of.
- Most of the instructions are moves. Hmm, that's not a good sign. Why can't we get the registers in the right places, to begin with? Well, calling conventions are fixed by the ABI, nothing we can do about that at this stage, but there's still more shuffling going on than that.
- Though, it looks like acc starts in r22-r25, and that's also where our output goes. Hmmmmm...
- Also, the compiler has already made obvious boners: there is a MOVW instruction which copies registers pairwise; instead of mov r14,r22; mov r15,r23 it should've used movw r14,r22 (and the rest).
- It looks like everything is just setup to use the library wide-multiply call __mulsi3, a 32 x 32 = 32 bit multiply. This sounds terribly inefficient. I mean, for what it does, the library call isn't bad, 37 cycles for that much work -- but we were supposed to be doing 8x16 here. This is ridiculous!
But this is standard practice when the compiler encounters operations that would be too difficult to reason about. GCC will emit MUL instructions for byte multiplies, but anything larger uses library calls.
Naturally, they don't have libraries for every possible combination, only the most common -- signed-signed, signed-unsigned and unsigned-unsigned at 16 and 32 bits each, I think.
Library calls also use a custom ABI, are never inlined, and never pruned (even with -flto). So the unused parts (sign extension and correction) waste code space, too.
So let's have a go at this, eh? What can we get it down to? Here's what I have in my project:
mac32r16p8:
movw r30, r18
ld r18, Z+ ; get multiplier
; r25:r24:r23:r22 += mul_signed(r21:r20, r18)
eor r19, r19 ; use r19 as zero reg
mulsu r18, r20 ; p*lo (result in r0:r1)
sbc r24, r19
sbc r25, r19 ; sign extend
add r22, r0
adc r23, r1
adc r24, r19
adc r25, r19
muls r18, r21 ; p*hi
sbc r25, r19
add r23, r0
adc r24, r1
adc r25, r19
eor r1, r1
ret
- Yup, the push/pop can be removed!
- Wait, how is this even so short? Yeah the compiler is wasteful, but heck! Well, inlining the multiply and stripping it down to the required 8x16 bit operation saves a hell of a lot of effort. The two partial products need to be added together, offset by one byte (the same way a 1-digit x 2-digit multiplication by hand gives a 3-digit result). The MULS(U) instructions return carry set if the result needs to be corrected for signedness; in essence we sign-extend the result, hence the sbc rX, r19 (subtract-with-carry against the zero register) instructions into the accumulator. It's a lot more addition than the 2-3 add/adc needed for an unsigned operation, but it's still better than adjusting for sign after the fact (i.e. using an unsigned multiply).
- Clearly, the semantics of doing all this is more involved than just one instruction. Extra registers need to be allocated (MUL uses r0:r1 as implicit destination; and I've used r19 to hold zero). Probably there's no equivalent in GCC's internal representation, either (where most of the optimization is done). So they give up and call a library.
- Conveniently, acc is passed in the same place the result is returned, so we can just accumulate directly into those registers. (This depends on parameter order -- swapping parameters alone may yield optimization!)
- Only one instruction is needed to maintain compiler compatibility: r1 is used as the zero register, so needs to be zeroed after use.
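The partial-product arithmetic can be modeled in portable C for checking on a host (mac32r16p8_model is an illustrative name; the real routine does the same thing in registers, with MULSU and MULS producing the two partial products):

```c
#include <stdint.h>

/* Reference model: split the 16-bit sample into an unsigned low byte
 * and a signed high byte, form the two partial products, and combine
 * them offset by one byte -- long multiplication, one digit at a time. */
static int32_t mac32r16p8_model(int32_t acc, int16_t r, int8_t p) {
    uint8_t lo = (uint8_t)(r & 0xFF);         /* unsigned low byte of r */
    int8_t  hi = (int8_t)((uint16_t)r >> 8);  /* signed high byte of r  */
    int32_t p_lo = (int32_t)p * lo;           /* what MULSU computes    */
    int32_t p_hi = (int32_t)p * hi;           /* what MULS computes     */
    return acc + p_lo + p_hi * 256;           /* high part shifted one byte */
}
```

Because r equals hi*256 + lo (with lo unsigned and hi signed), the sum of partial products is exactly r * p, matching the plain C version.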
I further inlined this into the looping function. Notice the ld r18, Z+ postincrement instruction; r30:r31 isn't read after this, it's discarded instead. (There's no cycle penalty for ld Z vs. ld Z+ so it doesn't matter that I left it in there.) I can abuse this clobbered value of Z to just run this function in a loop, loading and accumulating all the products right there.
But further, I can inline it, saving a few more cycles (loop overhead being less than call overhead + unrolled loop).
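In C terms, the surrounding loop might look like this (mix and its parameter names are hypothetical, not from the original project); the hand-written version keeps the coefficient pointer live in Z across iterations, advanced for free by the postincrement load:

```c
#include <stdint.h>

/* The original C MAC, repeated so this sketch is self-contained. */
static int32_t mac32r16p8(int32_t acc, int16_t r, const int8_t *p) {
    return acc + (int32_t)r * *p;
}

/* Hypothetical mixer loop: accumulate sample * gain over n channels.
 * Each call advances the coefficient pointer by one -- exactly the
 * ld r18, Z+ behavior when the routine is run back-to-back or inlined. */
static int32_t mix(const int16_t *samples, const int8_t *gains, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac32r16p8(acc, samples[i], &gains[i]);
    return acc;
}
```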
Overall, the optimizations on this project yielded almost a tenfold improvement -- on the meager platform, it went from hardly worth it (two mixer channels, ~50k MACs/s) to reasonably featureful (two biquad filter stages and 8 mixer channels, ~500k MACs/s).
Tim