How can I write this instruction for read this bit or carry its value and clear it in one
For PICs, or in standard C, you cannot.
On the architectures where this is possible, it is only possible via dedicated functions, either built-in or implemented in assembly.
For Intel Itanium (IA-64) support, GCC added a set of __sync built-in functions, but these are considered legacy nowadays. Instead, GCC, Clang, the Intel compilers, and many other C and C++ compilers (but not all!) provide __atomic built-in functions as an extension, following the C++11 memory model, on architectures where those are possible. On architectures where they are not, the built-ins compile to actual function calls, leaving the runtime support library to implement them as best it can.
SDCC, for example, does not provide such built-ins.
The useful built-in here would be __atomic_fetch_and(&variable, ~bitmask, __ATOMIC_SEQ_CST) & bitmask, which atomically clears the bits in bitmask within variable and yields the state those bits had just before, as if in a single operation. (To set those bits, you'd use __atomic_fetch_or(&variable, bitmask, __ATOMIC_SEQ_CST) & bitmask; to toggle, __atomic_fetch_xor(&variable, bitmask, __ATOMIC_SEQ_CST) & bitmask.) The third argument is the memory order, which these built-ins require; __ATOMIC_SEQ_CST is the strictest choice.
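As a minimal sketch of the above, assuming GCC or Clang on a target that supports the __atomic built-ins: the helper names test_and_clear_bits, test_and_set_bits, and example_flags_usage are made up for illustration and are not part of any compiler or library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically clear the bits in `mask` and report
   whether any of them was set just before the clear. */
static inline bool test_and_clear_bits(uint32_t *flags, uint32_t mask)
{
    /* __atomic_fetch_and returns the value held BEFORE the AND. */
    uint32_t old = __atomic_fetch_and(flags, ~mask, __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
}

/* Hypothetical helper: atomically set the bits in `mask` and report
   whether any of them was already set. */
static inline bool test_and_set_bits(uint32_t *flags, uint32_t mask)
{
    uint32_t old = __atomic_fetch_or(flags, mask, __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
}

/* Usage: bit 0 starts out set, so the first clear reports true
   and the second reports false. */
void example_flags_usage(void)
{
    uint32_t flags = 0x05u;
    assert(test_and_clear_bits(&flags, 0x01u));
    assert(!test_and_clear_bits(&flags, 0x01u));
    assert(flags == 0x04u);
}
```

On a compiler without these built-ins (SDCC, say), this will not build, and the same effect needs target-specific means such as briefly disabling interrupts.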
Most architectures don't have such bit manipulation instructions, but do have either compare-exchange (CAS) or load-linked/store-conditional (LL/SC) instructions, which allow this to be done in a tight loop: the value is loaded, the replacement value computed, but the replacement value is only stored if the original value is still unchanged. (CAS detects changes by value, LL/SC by access.) The loop only repeats when something else modified the variable between the load and the store, so it normally does one iteration, occasionally two, and doesn't really "spin". All processes, threads, or interrupts will see either the original value or the replacement value, until the next modification; it truly is "atomic", indivisible in this sense. The only real "cost" is that the exact number of cycles taken varies; otherwise it is a genuinely atomic read-modify-write operation.
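The loop described above can be sketched roughly as follows, again assuming GCC or Clang with the __atomic built-ins. The names test_and_clear_bits_cas and example_cas_usage are hypothetical, and in practice the compiler generates an equivalent loop for you when you call __atomic_fetch_and on such a target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: test-and-clear written as an explicit
   compare-exchange loop. */
static inline bool test_and_clear_bits_cas(uint32_t *flags, uint32_t mask)
{
    /* Load the current value once; the loop refreshes it on failure. */
    uint32_t old = __atomic_load_n(flags, __ATOMIC_RELAXED);
    uint32_t desired;

    do {
        /* Compute the replacement from the value we last saw. */
        desired = old & ~mask;
        /* Store `desired` only if *flags still equals `old`.
           On failure, `old` is overwritten with the current value
           and we try again. */
    } while (!__atomic_compare_exchange_n(flags, &old, desired,
                                          true, /* weak: may fail spuriously */
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_RELAXED));

    /* `old` is the value that was actually replaced. */
    return (old & mask) != 0;
}

/* Usage: bits 1 and 3 start out set; clearing bit 3 reports true. */
void example_cas_usage(void)
{
    uint32_t flags = 0x0Au;
    assert(test_and_clear_bits_cas(&flags, 0x08u));
    assert(flags == 0x02u);
}
```

The weak variant is used because the loop retries anyway; on LL/SC machines it maps more directly onto the hardware than the strong variant.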
Lockless data structures in C use these to implement atomic counters, bit masks, and so on, so you can also see them used in low-level libraries on Linux/BSD/Android/other POSIXy systems. On microcontrollers they are rarer; for example, the Raspberry Pi Pico (RP2040) does not support those either, but does have hardware spinlock support, for implementing fast mutually exclusive locks.