The next step is ARM <something or other>, no question about it. There is a question whether STM32 is the way to go, but there is a book, "Mastering STM32", that will hold your hand, starting with installing Eclipse and GCC and working up from there.
https://leanpub.com/mastering-stm32
Any time you can get a book, it's a good place to start.
Another way to go: Look at mbed.org and, specifically, the original mbed (LPC1768). I use these all the time and I really like the online toolchain. I don't have to install anything, the binary is 'drag and drop' onto the device and I can work on my code from any PC in the world that has a web browser. Yes, there are detractors, but ask if they ever did a sizable project actually using the tools. Mostly it's opinion: "I don't like online...".
You will also notice that some STM32 boards are 'mbed compatible' and you will find them listed on the mbed.org site. I'm not sure how well supported they are but the 'drag and drop' will work and that's the important bit.
The mbed infrastructure does not include a hardware debugger. Fine! Just use printf(). I have never found single stepping to be all that productive anyway and I've been doing it for almost 50 years (started in '70).
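To make the printf() approach a little more concrete, here is a minimal sketch in plain C of the kind of trace helper I mean. The names trace_format and TRACE are made up for illustration, they are not part of mbed or any vendor library, and I'm assuming printf()/puts() output ends up somewhere you can read it (on an mbed board that is normally the USB serial port).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not an mbed API): writes "[tag:line] message"
 * into buf and returns the length the full text needs, like snprintf(). */
int trace_format(char *buf, size_t size, const char *tag, int line,
                 const char *fmt, ...)
{
    assert(buf != NULL && size > 0);

    /* Prefix first, so every message says where it came from. */
    int used = snprintf(buf, size, "[%s:%d] ", tag, line);
    if (used < 0 || (size_t)used >= size) {
        return used;            /* error, or the prefix alone filled buf */
    }

    /* Then the caller's printf-style message, truncated if necessary. */
    va_list ap;
    va_start(ap, fmt);
    int more = vsnprintf(buf + used, size - (size_t)used, fmt, ap);
    va_end(ap);

    return (more < 0) ? more : used + more;
}

/* Hypothetical convenience macro: tags the message with the calling
 * function and source line, then prints it followed by a newline. */
#define TRACE(...)                                              \
    do {                                                        \
        char trace_line_[96];                                   \
        trace_format(trace_line_, sizeof trace_line_,           \
                     __func__, __LINE__, __VA_ARGS__);          \
        puts(trace_line_);                                      \
    } while (0)
```

Drop something like TRACE("raw=%d", adc_value); at the points you care about and each line of output tells you which function and source line it came from. For most bugs that gets you as far as single stepping would.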
For a first attempt, I think "Mastering STM32" is the way to start.
Completely different: Cypress PSoC 6 (or PSoC 4 and PSoC 5). There are terrific videos on how to use the boards and the toolchain creates a lot of the underlying code. There is also a graphical component where you lay down the peripherals you will use and interconnect them as necessary. When you build the configuration, all the support code is generated. This is a VERY nice development environment. The PSoC 6 has two ARM cores, a Cortex-M0+ and a Cortex-M4. One can be used to support tasks with high compute needs and the other might support IO. But both cores have access to all peripherals and memory.