Author Topic: Issue with RS485 Communication and Impact of a Faulty IC on the Entire Bus (Read 770 times)

digitalectron · « **on:** September 28, 2024, 07:19:35 am »

Hello everyone,

I’ve designed a board where the SHT30 sensor is connected to an STM32G0 microcontroller, and I can successfully read the temperature and humidity data. These data are sent to the HMI via the Modbus protocol and RS485 bus. The problem occurs when multiple devices are connected to the RS485 line.

After a few hours, if there is a disruption in one of the ADM485 ICs, it affects the entire line. After measuring the voltage, I noticed that the RE and DE pins, which are connected to the microcontroller for enabling data transmission or reception, get stuck at 3.3V. This issue disrupts the entire line and causes the other devices to stop working as well. I find that by cutting off the power and reconnecting the devices, everything works fine again, but after a few hours, the problem reoccurs.
Here is part of my RS485 design:

[Image: schematic of the RS485 section]

I’d really appreciate it if anyone with similar experience could provide a solution to this problem.

Salitronic · « **Reply #1 on:** September 28, 2024, 08:23:25 am »

This doesn't seem to be a case of a faulty RS485 IC; rather, something in your firmware is causing the RE/DE pins to stay high. RE and DE on the RS485 transceiver are inputs, so they cannot get stuck high by themselves.

RS485 transceivers have a number of fail-safe features that result in a high-impedance connection to the bus so as not to disrupt communication. However, if RE/DE are constantly high, then you are holding the bus constantly driven, and this prevents any other device from using the bus.

You should look into the firmware to understand what is causing the RE/DE to stay high.

Side note: you have an on-board 120 ohm termination resistor. If you have more than 2 devices on the bus, you cannot have the resistor mounted on all of them. There should be only two 120 ohm termination resistors, one at each extreme end of the bus. If you have many devices on the bus, all of these parallel resistors might be overloading the drivers and from there causing your firmware to get stuck somehow.
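
To put a number on the termination point: identical resistors in parallel divide by their count, so every extra terminator lowers the load that each driver has to push. A rough sketch in C follows. The function name is made up for illustration, and the 54 ohm figure mentioned afterwards is the differential test load commonly quoted for RS485 drivers, so check the transceiver datasheet before relying on it.

```c
#include <assert.h>

/* Equivalent differential resistance of n identical termination
 * resistors fitted in parallel across the A/B pair.
 * Hypothetical helper, for illustration only. */
double termination_ohms(double r_each, int n_fitted)
{
    assert(r_each > 0.0 && n_fitted > 0);
    return r_each / (double)n_fitted;
}
```

With the two terminators the bus is meant to have, `termination_ohms(120.0, 2)` gives 60 ohm. With a terminator on each of four boards it gives 30 ohm, well below the roughly 54 ohm load a driver is normally specified for.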

Siwastaja · « **Reply #2 on:** September 28, 2024, 08:31:10 am »

Yeah, your code is broken.

digitalectron · « **Reply #3 on:** September 28, 2024, 09:22:38 am »

Thank you for your detailed response!

I forgot to mention that I haven't mounted the 120-ohm resistor on the board; I only have the PSM712 installed for protection. Regarding the RE/DE pins, I've written firmware that should restart the microcontroller if no data is received on the RX pin for a certain period, but it doesn't seem to be working as expected.

What do you think about using a watchdog timer reset in this case? Could it help with the issue, or is there something else I might be missing?

Code: [Select]

void TIM1_BRK_UP_TRG_COM_IRQHandler(void) {
	/* USER CODE BEGIN TIM1_BRK_UP_TRG_COM_IRQn 0 */
	myTime++;
	if (myTime > 10000) {
		if (rxReceive == 0) {
			myTime = 0;
			NVIC_SystemReset();
		}
	}

	/* USER CODE END TIM1_BRK_UP_TRG_COM_IRQn 0 */
	HAL_TIM_IRQHandler(&htim1);
	/* USER CODE BEGIN TIM1_BRK_UP_TRG_COM_IRQn 1 */

	/* USER CODE END TIM1_BRK_UP_TRG_COM_IRQn 1 */
}

tooki · « **Reply #4 on:** September 28, 2024, 10:27:22 am »

Well, I mean... if you just set up the watchdog timer and never clear it, then you'll just reset periodically no matter what.

Can you run your MCU with the debugger going the entire time, so that when the error occurs, you can go in and inspect what's going on?
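
A small host-side model of the watchdog point, assuming a simple down-counter that resets the MCU when it reaches zero. All names are illustrative. On an STM32 the hardware counterpart would be the independent watchdog, refreshed with `HAL_IWDG_Refresh()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a watchdog: a down-counter that must be reloaded
 * ("kicked") before it reaches zero. */
typedef struct {
    uint32_t reload;   /* timeout in ticks */
    uint32_t counter;  /* ticks left before the reset fires */
} wdg_model_t;

void wdg_init(wdg_model_t *w, uint32_t timeout_ticks)
{
    assert(w != NULL && timeout_ticks > 0);
    w->reload = timeout_ticks;
    w->counter = timeout_ticks;
}

/* Called by healthy code to prove it is still running. */
void wdg_kick(wdg_model_t *w)
{
    w->counter = w->reload;
}

/* Advance time by one tick. Returns true when the reset would fire. */
bool wdg_tick(wdg_model_t *w)
{
    if (w->counter > 0) {
        w->counter--;
    }
    return w->counter == 0;
}
```

If nothing ever calls `wdg_kick()`, `wdg_tick()` returns true after `timeout_ticks` ticks every time, which is the periodic reset described above. One common design choice, not taken from this thread, is to kick only from the task that does the real work rather than from a timer interrupt, because an interrupt can keep firing while the task is stuck.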

Siwastaja · « **Reply #5 on:** September 28, 2024, 10:45:14 am »

Quote from: digitalectron on September 28, 2024, 09:22:38 am

I forgot to mention that I haven't mounted the 120-ohm resistor on the board; I only have the PSM712 installed for protection. Regarding the RE/DE pins, I've written firmware that should restart the microcontroller if no data is received on the RX pin for a certain period, but it doesn't seem to be working as expected.

Wait, you have designed a Modbus slave; why would you restart the microcontroller if no data is received? This is a pretty normal condition; not every master polls all the time.

In an RS485 slave it is crucial to correctly read the incoming packet, check for a matching slave_id and a valid CRC, and then, with correct timing, enable the transmitter, output the reply, and immediately disable the transmitter again (see the timing in the Modbus specification). Just follow sensible programming practices and your code should not get stuck in the sending part.

And just to be sure, you have assigned different slave_ids to the different slaves, yes?
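
The receive-side checks described above (matching slave_id, valid CRC) can be sketched in plain C as follows. This is a minimal sketch, not the original poster's code: `modbus_crc16` and `modbus_frame_is_for_me` are names chosen here. The CRC parameters are the standard Modbus RTU ones (initial value 0xFFFF, reflected polynomial 0xA001, low byte transmitted first).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/MODBUS over len bytes of buf. */
uint16_t modbus_crc16(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            } else {
                crc = (uint16_t)(crc >> 1);
            }
        }
    }
    return crc;
}

/* True only if the frame is long enough, is addressed to my_id,
 * and its trailing CRC (low byte first) matches the contents. */
bool modbus_frame_is_for_me(const uint8_t *frame, size_t len, uint8_t my_id)
{
    if (frame == NULL || len < 4) {  /* address + function + 2 CRC bytes */
        return false;
    }
    if (frame[0] != my_id) {
        return false;
    }
    uint16_t expected = modbus_crc16(frame, len - 2);
    uint16_t received = (uint16_t)(frame[len - 2] | (frame[len - 1] << 8));
    return expected == received;
}
```

For example, the request `01 03 00 00 00 01 84 0A` (read one holding register from slave 1) passes for `my_id` 1 and is rejected for any other ID or if any byte is altered. Only after this check passes should the transmitter be enabled.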

digitalectron · « **Reply #6 on:** September 28, 2024, 11:44:53 am »

Quote from: tooki on September 28, 2024, 10:27:22 am

Well, I mean... if you just set up the watchdog timer and never clear it, then you'll just reset periodically no matter what.

Can you run your MCU with the debugger going the entire time, so that when the error occurs, you can go in and inspect what's going on?

Thank you for your suggestion, tooki!

It's quite difficult for me to debug the MCU continuously in this case. I have 4 devices on one bus, and I leave them powered on for an entire day and night. The issue usually happens while I’m away. For example, when I leave the office and come back the next day, I often find that the problem has occurred after 10 or more hours.

Do you have any suggestions for troubleshooting in such a scenario where real-time debugging is challenging?

digitalectron · « **Reply #7 on:** September 28, 2024, 11:52:18 am »

Quote from: Siwastaja on September 28, 2024, 10:45:14 am

Wait, you have designed a Modbus slave; why would you restart the microcontroller if no data is received? This is a pretty normal condition; not every master polls all the time.

In an RS485 slave it is crucial to correctly read the incoming packet, check for a matching slave_id and a valid CRC, and then, with correct timing, enable the transmitter, output the reply, and immediately disable the transmitter again (see the timing in the Modbus specification). Just follow sensible programming practices and your code should not get stuck in the sending part.

And just to be sure, you have assigned different slave_ids to the different slaves, yes?

Thank you for the feedback!

Just to clarify, the HMI broadcasts to the entire RS485 bus using the Modbus protocol. I've implemented a timer in the code that will restart the MCU after 10 seconds if no data is received. Under normal conditions, the MCU receives data from the HMI approximately every second to respond with temperature and humidity readings, and everything works fine until an issue arises—at which point the system gets stuck, and I'm not sure what causes it.

Do you have any thoughts on what might be going wrong in this scenario?

Siwastaja · « **Reply #8 on:** September 28, 2024, 11:55:38 am »

Quote from: digitalectron on September 28, 2024, 11:52:18 am

Just to clarify, the HMI broadcasts to the entire RS485 bus using the Modbus protocol.... to respond with temperature and humidity readings

Well here's your problem. Modbus broadcasts cannot be responded to. The master needs to individually poll the nodes (unicast).

See Modbus specification https://www.modbus.org/docs/Modbus_over_serial_line_V1_02.pdf page 7
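
In code, the rule comes down to one decision per received frame. A minimal sketch, with type and function names invented here, relying only on the fact that address 0 is the Modbus broadcast address:

```c
#include <assert.h>
#include <stdint.h>

#define MB_BROADCAST_ADDR 0u  /* address 0 is reserved for broadcast */

typedef enum {
    MB_IGNORE,             /* frame is for another slave */
    MB_EXECUTE_SILENTLY,   /* broadcast: act on it, never reply */
    MB_EXECUTE_AND_REPLY   /* unicast to us: act on it and reply */
} mb_action_t;

/* Decide what a slave with address my_id does with a frame
 * whose first byte is frame_addr. */
mb_action_t mb_classify_address(uint8_t frame_addr, uint8_t my_id)
{
    if (frame_addr == MB_BROADCAST_ADDR) {
        return MB_EXECUTE_SILENTLY;
    }
    if (frame_addr == my_id) {
        return MB_EXECUTE_AND_REPLY;
    }
    return MB_IGNORE;
}
```

If several slaves replied to the same broadcast, their drivers would be enabled at the same time and the replies would collide on the bus, which is why the master has to poll each slave by its own address.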

tooki · « **Reply #9 on:** September 28, 2024, 12:07:17 pm »

Some avenues to investigate:
- Does the problem occur when a single MCU is on the bus with the HMI?
--> Is this an issue that only occurs as a result of interactions between multiple slave devices, or is it the result of interaction between master and slave?

- Does the problem occur if an MCU is not connected to the bus at all? (Monitor the TX_EN line and see if it goes high even with no communication at all.)
--> Is this an issue that only occurs as a result of interactions between master and slave devices, or is it completely self-contained?

- Can you perform accelerated testing by having the messages from the HMI come at a much higher rate?
--> Is this an overflow, for example a counter that never gets reset and eventually increments to an illegal value, like a signed integer wrapping around to a negative number?

Right now the HMI sends messages about every second. What if you generate those at, say, 50 per second? Then you'd expose such an overflow in a few minutes instead of many hours.
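
The arithmetic behind the accelerated test can be checked with a few lines of C. The helper name is made up, and the 16-bit counter widths below are only examples; nothing in the thread says which counter, if any, actually overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a counter that is incremented once per message
 * reaches counts_to_fail, e.g. 32768 for an int16_t turning
 * negative or 65536 for a uint16_t wrapping to zero. */
double seconds_until_overflow(uint32_t counts_to_fail, double msgs_per_second)
{
    assert(msgs_per_second > 0.0);
    return (double)counts_to_fail / msgs_per_second;
}
```

At 1 message per second a signed 16-bit counter goes negative after 32768 s (about 9.1 hours) and an unsigned one wraps after 65536 s (about 18.2 hours). At 50 messages per second those become roughly 11 and 22 minutes. A 32-bit counter would take far longer than any overnight test at either rate.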

digitalectron · « **Reply #10 on:** September 28, 2024, 12:31:38 pm »

Quote from: Siwastaja on September 28, 2024, 11:55:38 am

Well here's your problem. Modbus broadcasts cannot be responded to. The master needs to individually poll the nodes (unicast).

See Modbus specification https://www.modbus.org/docs/Modbus_over_serial_line_V1_02.pdf page 7

Thank you for pointing that out! I read the PDF thoroughly and reviewed my Modbus code again to ensure proper communication between the HMI and the MCU.

Salitronic · « **Reply #11 on:** September 28, 2024, 12:35:48 pm »

Can you share more of your code, the part handling the TX/RX?

How are you handling the RE/DE: manual GPIO toggle or using the RS485 hardware flow control feature of the STM32?

digitalectron · « **Reply #12 on:** September 28, 2024, 12:36:50 pm »

Quote from: tooki on September 28, 2024, 12:07:17 pm

Some avenues to investigate:
- Does the problem occur when a single MCU is on the bus with the HMI?
--> Is this an issue that only occurs as a result of interactions between multiple slave devices, or is it the result of interaction between master and slave?

- Does the problem occur if an MCU is not connected to the bus at all? (Monitor the TX_EN line and see if it goes high even with no communication at all.)
--> Is this an issue that only occurs as a result of interactions between master and slave devices, or is it completely self-contained?

- Can you perform accelerated testing by having the messages from the HMI come at a much higher rate?
--> Is this an overflow, for example a counter that never gets reset and eventually increments to an illegal value, like a signed integer wrapping around to a negative number?

Right now the HMI sends messages about every second. What if you generate those at, say, 50 per second? Then you'd expose such an overflow in a few minutes instead of many hours.

I want to let you know that I will check your suggestions and try them out. I’ll share my experiments here once I have some results.

digitalectron · « **Reply #13 on:** **Yesterday** at 05:24:01 am »

Quote from: Salitronic on September 28, 2024, 12:35:48 pm

Can you share more of your code, the part handling the TX/RX?

How are you handling the RE/DE: manual GPIO toggle or using the RS485 hardware flow control feature of the STM32?

Code: [Select]

void vTask_Modbus(void *pvParameters) {
	for (;;) {
		/******************************************************
		 * ModBUS
		 ******************************************************/
		if (rxReceive) {
			rxReceive = 0;
			HAL_UART_DMAStop(&huart1);
			HAL_UART_Receive_DMA(&huart1, (uint8_t*) RxData, ModbusSizeReceive); // Enable MODBUS
			if (RxData[0] == SLAVE_ID) {
				switch (RxData[1]) {
				case 0x3:
					readHoldingRegs();
				case 0x06:
					writeSingleReg();
				default:
					modbusException(ILLEGAL_FUNCTION);
					break;
				}
			}
		}
		vTaskDelay(500);
	}
}

.
.
.


void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart) {
	rxReceive = 1;
	myTime = 0;
	HAL_UART_DMAStop(&huart1);
	HAL_UART_Receive_DMA(&huart1, (uint8_t*) RxData, ModbusSizeReceive); // Enable MODBUS

}
.
.
.

uint8_t readHoldingRegs (void)
{
	uint16_t startAddr = ((RxData[2]<<8)|RxData[3]);  // start Register Address
	startAddr = startAddr - 200;//start of register in HMI is 200
	uint16_t numRegs = ((RxData[4]<<8)|RxData[5]);   // number to registers master has requested
	if ((numRegs<1)||(numRegs>125))  // maximum no. of Registers as per the PDF
	{
		modbusException (ILLEGAL_DATA_VALUE);  // send an exception
		return 0;
	}
	uint16_t endAddr = startAddr+numRegs-1;  // end Register
	if (endAddr>49)  // end Register can not be more than 49 as we only have record of 50 Registers in total
	{
		modbusException(ILLEGAL_DATA_ADDRESS);   // send an exception
		return 0;
	}

	TxData[0] = SLAVE_ID;  // slave ID
	TxData[1] = RxData[1];  // function code
	TxData[2] = numRegs*2;  // Byte count
	int indx = 3; 
	for (int i=0; i<numRegs; i++)   // Load the actual data into TxData buffer
	{
		TxData[indx++] = (Holding_Registers_Database[startAddr]>>8)&0xFF;  // extract the higher byte
		TxData[indx++] = (Holding_Registers_Database[startAddr])&0xFF;   // extract the lower byte
		
		//TxData[indx++] = (myTemp>>8)&0xFF;  // extract the higher byte
		//TxData[indx++] = (myTemp)&0xFF;   // extract the lower byte
		startAddr++;  // increment the register address
	}
	sendData(TxData, indx);  
	return 1;  
}
.
.
.
void sendData (uint8_t *data, int size)
{
	uint16_t crc = crc16(data, size);
	data[size] = crc&0xFF;   // CRC LOW
	data[size+1] = (crc>>8)&0xFF;  // CRC HIGH
	HAL_GPIO_WritePin(TX_EN_GPIO_Port, TX_EN_Pin, GPIO_PIN_SET);// RE-DE
	HAL_UART_Transmit(&huart1, data, size+2, 1000);
	HAL_Delay(1);
	HAL_GPIO_WritePin(TX_EN_GPIO_Port,TX_EN_Pin , GPIO_PIN_RESET);// // RE-DE
}

I use a GPIO to handle RE/DE, and I call the following in both the Rx callback and the task to restart the DMA reception. It has worked for me for hours or more.

Code: [Select]

HAL_UART_DMAStop(&huart1);
HAL_UART_Receive_DMA(&huart1, (uint8_t*) RxData, ModbusSizeReceive); // Enable MODBUS

Doctorandus_P · « **Reply #14 on:** **Yesterday** at 07:55:32 am »

I think you made an error. You used a wire instead of labels to connect pins 2 and 3 of the IC on your schematic.

Siwastaja · « **Reply #15 on:** **Yesterday** at 08:36:14 am »

Quote from: Doctorandus_P on Yesterday at 07:55:32 am

I think you made an error. You used a wire instead of labels to connect pins 2 and 3 of the IC on your schematic.

Nothing wrong in that. The most usual RS485 transceiver pinout deliberately uses opposite polarities for the driver enable and receiver enable signals so that they can be tied together into a simpler 1-bit "direction" signal. Doing that, you lose the {inactive, inactive} combination, which might be lower power (or useful if you want to multiplex many transceivers onto a single MCU UART peripheral, an idea applicable to Modbus masters but not slaves), and the {active, active} combination, which would let you "listen back" to what you send and detect some error conditions like a bus short circuit. These features are often not necessary, though.
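
The four combinations can be written out as a small truth-table function. This is only an illustration with invented names, assuming the usual pin polarities: DE active high, RE active low.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    XCVR_RECEIVE,   /* driver off, receiver on */
    XCVR_TRANSMIT,  /* driver on, receiver off */
    XCVR_LOOPBACK,  /* both on: the node hears its own transmission */
    XCVR_IDLE       /* both off */
} xcvr_mode_t;

/* de:   driver enable pin level (active high)
 * re_n: receiver enable pin level (active low) */
xcvr_mode_t xcvr_mode(bool de, bool re_n)
{
    bool driver_on = de;
    bool receiver_on = !re_n;

    if (driver_on && receiver_on) {
        return XCVR_LOOPBACK;
    }
    if (driver_on) {
        return XCVR_TRANSMIT;
    }
    if (receiver_on) {
        return XCVR_RECEIVE;
    }
    return XCVR_IDLE;
}
```

With the two pins tied together only `xcvr_mode(true, true)` (transmit) and `xcvr_mode(false, false)` (receive) are reachable, which is also why a direction line stuck high keeps the driver on the bus.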

PGPG · « **Reply #16 on:** **Yesterday** at 11:22:41 am »

Quote from: Doctorandus_P on Yesterday at 07:55:32 am

I think you made an error. You used a wire instead of labels to connect pins 2 and 3 of the IC on your schematic.

Quote from: Siwastaja on Yesterday at 08:36:14 am

Quote from: Doctorandus_P on Yesterday at 07:55:32 am
I think you made an error. You used a wire instead of labels to connect pins 2 and 3 of the IC on your schematic.
Nothing wrong in that,

Doctorandus_P didn't question the connection itself but the lack of consistency in drawing the schematic.

Doctorandus_P · « **Reply #17 on:** **Yesterday** at 02:11:35 pm »

Quote from: PGPG on Yesterday at 11:22:41 am

Doctorandus_P didn't question the connection itself but the lack of consistency in drawing the schematic.

Indeed. A single 8 pin IC and it has 6 dislocated sections connected with labels.
It's almost surprising he managed to connect a LED and a resistor without using labels.

Salitronic · « **Reply #18 on:** **Yesterday** at 07:19:53 pm »

Quote from: digitalectron on Yesterday at 05:24:01 am

Code: [Select]
if (RxData[0] == SLAVE_ID) {
	switch (RxData[1]) {
	case 0x3:
		readHoldingRegs();
	case 0x06:
		writeSingleReg();
	default:
		modbusException(ILLEGAL_FUNCTION);
		break;
	}

The switch cases for 0x3 and 0x06 lack break statements. This results in fall-through, meaning that after executing readHoldingRegs(), the code continues to execute writeSingleReg() and then modbusException(ILLEGAL_FUNCTION). This might be causing unintended behavior.
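
A host-runnable sketch of the same dispatch with the missing `break` statements added. The three stub functions are stand-ins for the handlers in the posted code and only count how many frames would be sent; the names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Counts frames "sent" so the behaviour can be checked on a PC. */
int frames_sent = 0;

void read_holding_regs_stub(void) { frames_sent++; }
void write_single_reg_stub(void)  { frames_sent++; }
void exception_stub(void)         { frames_sent++; }

/* Dispatch one request. Every case ends in break, so exactly one
 * handler runs and exactly one reply goes out per request. */
void dispatch_function_code(uint8_t function_code)
{
    switch (function_code) {
    case 0x03:
        read_holding_regs_stub();
        break;
    case 0x06:
        write_single_reg_stub();
        break;
    default:
        exception_stub();  /* ILLEGAL_FUNCTION in the posted code */
        break;
    }
}
```

With the breaks in place a 0x03 request produces one frame. In the posted version the same request would fall through all three handlers and put three frames on the bus in a row.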

It's not clear what size you are declaring TxData; check that you do not have a buffer overrun in readHoldingRegs() or sendData().
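
One concrete way an overrun could happen, as far as can be told from the posted snippet: `startAddr - 200` is stored in a `uint16_t`, so a request for address 199 with 2 registers wraps `startAddr` to 65535, `endAddr` then wraps to 0, the `endAddr > 49` check passes, and the loop indexes far outside the table. The sketch below shows a range check that avoids wrap-around. The constants 200 and 50 come from the posted code; the function and macro names are invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REG_BASE_ADDR 200u  /* first register address used by the HMI */
#define REG_COUNT     50u   /* number of entries in the register table */
#define MAX_READ_REGS 125u  /* Modbus limit for function 0x03 */

/* Validate a read request without relying on wrap-around arithmetic.
 * On success the zero-based table index is stored in *first_index. */
bool read_request_in_range(uint16_t req_addr, uint16_t num_regs,
                           uint16_t *first_index)
{
    assert(first_index != NULL);
    if (num_regs < 1u || num_regs > MAX_READ_REGS) {
        return false;
    }
    if (req_addr < REG_BASE_ADDR) {
        return false;  /* subtracting the base would underflow */
    }
    uint32_t first = (uint32_t)req_addr - REG_BASE_ADDR;  /* widen first */
    if (first + num_regs > REG_COUNT) {
        return false;
    }
    *first_index = (uint16_t)first;
    return true;
}

/* Bytes needed for a read reply: address, function, byte count,
 * two bytes per register, and the two CRC bytes sendData() appends. */
size_t read_reply_size(uint16_t num_regs)
{
    return 3u + 2u * (size_t)num_regs + 2u;
}
```

For the 50-register table the largest reply is `read_reply_size(50)`, that is 105 bytes, so TxData needs to be at least that large.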

It is possible that an exception is being raised in HAL_UART_Transmit. If you are using CubeMX, the default Hard/Mem fault ISRs are an infinite loop, and that would leave RE/DE high.

Siwastaja · « **Reply #19 on:** **Today** at 06:40:21 am »

Yeah, for a project like this, write some kind of error handler:

Code: [Select]

void error(int errcode)
{
    __disable_irq();
    GPIOx->MODER = // set the critical control pin as output
    GPIOx->BSRR = // write the critical control pin to safe state (in this case, listen mode, transmitter disabled)
    while(1)
    {
        // blink errcode counts on a LED.
    }
}

Then make sure every fault-type interrupt handler (hardfault, busfault, your default handler for unhandled ISRs, etc.) calls this function. This is especially important for unmaskable highest-priority interrupts like hardfault. Then configure a watchdog that gives you the highest possible priority interrupt (that would be next to hardfault) and call error() there, too. Now it should be impossible for any infinite loop to ever cause a stuck driver.

Then, use different error codes from different call sites so that when things get stuck, you can look at the blinking LED and immediately rule out many wrong leads.
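
A PC-runnable model of that idea, split so the part before the infinite loop can be exercised. Everything here is a stand-in: the globals replace real hardware state, the error code values are arbitrary examples, and on the target the two commented lines would be `__disable_irq()` and a write to the pin's BSRR register as in the error() outline above.

```c
#include <assert.h>
#include <stdbool.h>

/* One code per call site, so the blink count identifies the source. */
enum {
    ERR_HARDFAULT     = 2,
    ERR_WATCHDOG      = 3,
    ERR_UNHANDLED_IRQ = 4,
    ERR_UART_TX       = 5
};

/* Stand-ins for hardware state. */
bool irq_enabled   = true;
bool tx_enable_pin = true;  /* true = transceiver is driving the bus */
int  last_errcode  = 0;
int  led_on_count  = 0;

/* Stub LED driver that just counts "on" events. */
void led_stub(bool on)
{
    if (on) {
        led_on_count++;
    }
}

/* First part of error(): make the bus safe, then record the reason. */
void error_enter(int errcode)
{
    irq_enabled = false;    /* on target: __disable_irq(); */
    tx_enable_pin = false;  /* on target: drive DE/RE low (listen mode) */
    last_errcode = errcode;
}

/* One blink cycle: errcode pulses on the LED. On the target this
 * would run inside while (1) with delays between the pulses. */
void error_blink_once(int errcode, void (*led)(bool on))
{
    for (int i = 0; i < errcode; i++) {
        led(true);
        led(false);
    }
}
```

The ordering is the point of the sketch: the transmitter is released before anything else, so even a handler that never returns leaves the bus free for the other nodes.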

