I've spent the last few days debugging a sketch on an LED controller. Basically there are buttons attached through an I2C IO expander which allow for manual control of the PWM outputs, e.g. an on/off switch, or increasing/decreasing the duty cycle in 20 discrete steps. The device can also be read/written over the serial interface using the modbus protocol. Instead of a master device (Raspberry Pi) polling the micro at regular intervals, the micro has an "interrupt" output that signals the RPi to perform a read operation. This is activated when you press any button on the micro.
This is an abbreviated version of the sketch:
void loop() {
  if (ModbusRTUServer.poll()) {
    // If there's a modbus transaction matching the slave address, process the registers for updated values
    process_modbus_stuff();
    digitalWrite(INTERRUPT_PIN, LOW); // Clear the read trigger
  }
  if (button_press() == true) {
    process_buttons(); // Call any functions associated with each button
    write_modbus_registers(); // Update the modbus registers so they can be read externally
    digitalWrite(INTERRUPT_PIN, HIGH); // Triggers a read operation from an RPi
  }
}
I noticed some odd behaviour when trying to read via modbus. Take for example when pressing the dimmer increase/decrease buttons. After pressing one of those buttons the following would happen:
- Next time through loop(), button_press() would return true
- process_buttons() would check which button was pressed and call the dimmer increase/decrease functions inside a light Class
- Inside the light Class, a dimmer_increase() function would increase a counter in the range 0-20 (dimmer_decrease() does the opposite)
- That 0-20 value would be processed into a 0-255 PWM value and written to the output pins using analogWrite()
- The values inside of the light Class would then be copied to modbus registers.
- INTERRUPT_PIN would then be written high
- Typically 1.5ms later, the RPi would send a read request, and would allow me to see the values of the light Class
This would all work perfectly fine without connecting the RPi (so the interrupt pin would just stay high after the first button press, because there would be no modbus transactions to reset the pin). I was able to confirm that the functions were working correctly not only by controlling a light, but also by manually performing a read operation after pressing a button to see that the registers were showing the correct values.
However, things started behaving strangely once the RPi was also performing a read operation in response to the interrupt pin. As previously mentioned, this was around 1.5ms after the pin goes high. The dimmer would only let me cycle between steps 0 and 1, equating to PWM values of 0 and 12 (which is 255 * 1 / 20, truncated). This made no sense to me, as the counter variable was internal to the light Class, and my understanding is that process_buttons() would finish completely before write_modbus_registers() or digitalWrite(INTERRUPT_PIN, HIGH) ran. Even then, there's a whopping 1.5ms after the digitalWrite before any modbus transaction occurs.
As soon as I stopped the RPi from doing any read operations, everything would work fine again.
On a hunch, I simplified my code so that it bypassed the 0-20 to PWM 0-255 processing, so a button press would perform a simple count += 1; or count -= 1; operation to increment or decrement the counter. Strangely, everything now worked: I could read the counter going between 0 and 20 using the RPi.
The cause of this is beyond my understanding of what the microcontrollers are doing. My understanding is that my functions should execute to completion before moving to the next function. Yet all I changed was removing some mathematical operations to speed up process_buttons().
Confusing? It gets worse.
I know that division operations are generally undesirable as they are quite slow. The 0-20 counter to 0-255 PWM value conversion was calculated using:
- dimmer_0_255 = 255 * current_step / total_steps;
So I had the idea of removing all the division operations from my light Class. I could just precalculate the values I needed and store them in an array, where a read operation would be much faster than division. So a 0-20 counter could return a 0-255 conversion by simply reading an array with the stored values {0, 12, 25, 38, 51, ...}. So I made the replacements to my first example, and tried it without the RPi doing any automatic reads. It worked fine independently, and I could see that it was showing the correct values by doing manual reads after pressing the buttons.
So I started the RPi automatic reader... and now the microcontroller would only count between 0 and about 9 steps.
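For anyone who wants to see the two conversions side by side, here is a minimal desktop C++ sketch of the formula and the lookup-table replacement. It is not the code from my light Class: the names kTotalSteps, step_to_pwm, fill_pwm_table and lookup_pwm are made up for illustration, and the table is filled at startup from the same formula rather than typed in by hand.

```cpp
#include <cstdint>

// Hypothetical stand-in for total_steps in the formula above.
const int kTotalSteps = 20;

// Same arithmetic as the formula: multiply first, then divide, so the
// integer truncation happens only once. The largest intermediate value
// is 255 * 20 = 5100, which still fits in a 16-bit int.
std::uint8_t step_to_pwm(int current_step) {
    return static_cast<std::uint8_t>(255 * current_step / kTotalSteps);
}

// Fill a 21-entry table once, so later conversions are a plain array read.
void fill_pwm_table(std::uint8_t (&table)[kTotalSteps + 1]) {
    for (int step = 0; step <= kTotalSteps; ++step) {
        table[step] = step_to_pwm(step);
    }
}

// Table lookup with the step clamped into the valid 0..kTotalSteps range.
std::uint8_t lookup_pwm(const std::uint8_t (&table)[kTotalSteps + 1], int step) {
    if (step < 0) step = 0;
    if (step > kTotalSteps) step = kTotalSteps;
    return table[step];
}
```

Calling fill_pwm_table() once (for example from setup()) and then lookup_pwm() on each button press gives the same numbers as the formula, so the two versions should only differ in speed, not in result.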
Like bro what. Again, the second I stopped the RPi script, the micro would function as normal. How is any of this possible??
At this point I had completely optimised my light class functions, there were no divisions, only basic math, memory accessing and bit shifting. But even then, how is it possible that something which should occur ~1.5ms after completing all functions could have any effect?
In the end, I "fixed" the problem on a completely different hunch. write_modbus_registers() involves calling the ModbusRTUServer.holdingRegisterWrite() function, and process_modbus_stuff() involves calling the ModbusRTUServer.holdingRegisterRead() function.
Because ModbusRTUServer.poll() doesn't distinguish between read and write transactions, my function was copying the modbus registers back into the light class even on read requests. I guessed that the modbus registers were not being updated before ModbusRTUServer.poll() was called, so the stale register values were resetting the previous dimmer change. But again, that shouldn't matter, should it? It looks like ModbusRTUServer.holdingRegisterWrite() just writes to an array, so that write should be completed before loop() starts again. Not to mention the completely aberrant behaviour of the internal counter in the light Class not being incremented or decremented properly.
It should be mentioned that throughout all of this, I could still write to the micro just fine via modbus, i.e. set the dimmer levels by writing to the registers, so I don't think there are any issues with my modbus processing functions.
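One way to stop read requests from touching the light class is to remember what the sketch last published and only copy a register back when it differs from that. The following is a minimal desktop C++ sketch of that idea, not my actual fix and not the ArduinoModbus API: Light, RegisterMirror, publish and apply_if_changed are hypothetical names, and the holding member merely stands in for a register that a real sketch would access through holdingRegisterRead() and holdingRegisterWrite().

```cpp
#include <cstdint>

// Hypothetical stand-in for the light Class: only the 0-20 step counter.
struct Light {
    int step = 0;
};

// Hypothetical helper that mirrors one holding register.
struct RegisterMirror {
    std::uint16_t holding = 0;    // stands in for the modbus holding register
    std::uint16_t published = 0;  // last value the sketch itself wrote there

    // Call after button handling: copy the light state out to the register.
    void publish(const Light& light) {
        holding = static_cast<std::uint16_t>(light.step);
        published = holding;
    }

    // Call after a modbus transaction. The light is only updated when the
    // register no longer matches what was last published, i.e. when the
    // master actually wrote a new value. Returns true if it was applied.
    bool apply_if_changed(Light& light) {
        if (holding == published) {
            return false;  // read request (or identical write): leave the light alone
        }
        light.step = holding;
        published = holding;
        return true;
    }
};
```

With this gate, a read request leaves the counter untouched even if the registers happen to be one button press behind, while a genuine write from the master still goes through. The trade-off is one extra stored value per mirrored register.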
This has really shaken up my understanding of what is going on under the hood. I'm not a low level programmer. I know the arduino environment is highly abstracted and hides away nearly all the underpinning code. I've guessed as much in looking at how the serial interfaces operate in the background. But from this experience, I can surmise that when I write code in order, I can't guarantee that the code will be executed or completed in order?