You can use DWT->CYCCNT as suggested above, but it's not enabled by default since it uses the DWT, a debug feature, and it is missing on the M0.
Plus you'd need to consider rollovers, and will have jitter from event to comparison.
The required initialization is mentioned in our code examples. Obviously on parts that don't have DWT, it won't be available.
Roll-over with classic *unsigned* arithmetic is actually a non-problem, as long as the counter doesn't roll over more than once. That is impossible with the above code unless it gets interrupted for a very long time, which would make your delay loop completely borked anyway (a 32-bit counter @100MHz spans over 42s before wrapping). That's nice to know. Keep all your variables unsigned here.
Consider for instance a roll-over, with a start value of 0xFFFFFFF0 and an end value of, say, 0x100. Now what is the 32-bit unsigned difference (0x100 - 0xFFFFFFF0)? It is 0x110, which is still right: 0x10 cycles up to the wrap, plus 0x100 after it.
The only problem, as I said, is if it rolls over more than once.
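To make the roll-over point concrete, here is a minimal sketch in C. The names `elapsed_cycles`, `delay_cycles`, `cycle_reader_t` and the fake counter are made up for illustration and are not from the code discussed above. On a real target the reader function would simply return `DWT->CYCCNT` (after the counter has been enabled); the fake one is only there so the wrap-around case can be tried on a PC.

```c
#include <stdint.h>

/* Hypothetical helper: cycles elapsed between two counter readings.
 * Unsigned subtraction is modulo 2^32, so the result is correct even if
 * the counter wrapped once between 'start' and 'now'. The cast keeps the
 * result 32 bits wide on hosts where int is wider than 32 bits. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t now)
{
    return (uint32_t)(now - start);
}

/* Hypothetical reader type. On target this would wrap DWT->CYCCNT. */
typedef uint32_t (*cycle_reader_t)(void);

/* Busy-wait until at least 'cycles' counter ticks have elapsed.
 * Only valid while the wait is shorter than one full counter period. */
static void delay_cycles(cycle_reader_t read_cycles, uint32_t cycles)
{
    uint32_t start = read_cycles();
    while (elapsed_cycles(start, read_cycles()) < cycles) {
        /* spin */
    }
}

/* Host-side stand-in for the hardware counter (an assumption, for testing
 * only): starts just below the wrap and advances 7 ticks per read. */
static uint32_t fake_counter = 0xFFFFFFF0u;

static uint32_t fake_read_cycles(void)
{
    fake_counter += 7u;
    return fake_counter;
}
```

Calling `delay_cycles(fake_read_cycles, 0x100u)` starts just below the wrap and still returns after at least 0x100 ticks, because the comparison is done on the unsigned difference and never on the raw counter values.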
Jitter? Yes, it will never be cycle accurate, but it is still much more predictable and easier to use than a hand-written delay loop in assembly or, worse, in C (again, unless your target is so simple that you can really write code with cycle-accurate execution, which is not really possible on most ARM CPUs).