QoS (Quality of Service, i.e. traffic priority) is where switches start to cost real money. If you take something like the "cheap" (and soon to be discontinued) Cisco 2960 series, it is possible to get PoE versions, but all 2960 models lack many of the QoS bells and whistles you need for this. Not until you reach the likes of the 3650 (also discontinued) or similar iron do you get the features you're looking for.
My understanding of the broader switch vendor market is somewhat limited, but I have the firm impression that most SoHo gear (like D-Link or NetGear or similar) will have just a fleeting shadow of QoS, and not up to your specification. They work for the naïve case: as long as everything is fine and there is light traffic, they will perform adequately. If you add media streams or multicast, they will show their limitations.
The reason -- beyond market segmentation -- is that to do these things you need a sorting and priority-assignment framework, and the power and resources in the switch to run it. That costs development time and real hardware memory.
Your idea sounds simple: just allow this much from there to there, but not as much from there to there. But there is a bunch of stuff happening under the surface that needs to work.
Zooming in on the Internet connection, we can assume that this is where the bandwidth problem will be. Somewhere there will be a queue, because unless your Internet connection has the same bandwidth as your switch interface, there has to be a way to make packets that arrive at 1 Gbit/s wait before they can be put on a wire running at perhaps 10 Mbit/s. When the row of packets destined for the other side of the WAN link grows too long for the buffer that's pacing them down from 1 Gbit/s to 10 Mbit/s, packets will have to be dropped.
Thus, what we need is a function that makes sure the departure buffer, when full, first drops those packets we consider less important.
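To make that concrete, here is an illustrative sketch in Python (not switch firmware -- real hardware does this in ASICs): a bounded egress buffer that, when full, evicts the least important packet already queued rather than blindly tail-dropping the newcomer.

```python
from collections import deque

class PriorityDropBuffer:
    """Bounded egress buffer. When full, evict the least important
    queued packet first (lower number = more important)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()  # (priority, payload) in arrival order

    def enqueue(self, priority, payload):
        if len(self.queue) >= self.capacity:
            # Find the least important packet currently queued.
            worst = max(self.queue, key=lambda entry: entry[0])
            if worst[0] <= priority:
                return False  # newcomer is no better: drop it instead
            self.queue.remove(worst)  # evict a less important packet
        self.queue.append((priority, payload))
        return True

    def dequeue(self):
        return self.queue.popleft() if self.queue else None
```

The point of the sketch is the decision in `enqueue`: the drop happens at the moment of buffer exhaustion, and it is the classification done earlier that decides who loses.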
How do we know what is important and what is not?
Not counting the work of understanding traffic patterns and judging their importance, there are a number of sorting possibilities, the most common being DSCP tag, IP address, traffic type and interface. For all of those except interface and DSCP tag, the switch needs to be an L3 switch; for DSCP it probably is one too, or at least capable of it. If you select "interface" -- because I haven't said it's expensive yet -- you will indeed get that priority, but since time-critical traffic from the phones passes over the same interface as the generic surf traffic, you will get poor phone performance if you hit it with the same rule. Therefore, you need some way of ensuring that media packets going from an apartment phone to the door phone aren't hit by that hard rule. (They'll probably never touch the Internet connection, but they must be allowed to pass through the switch with priority tags added, since otherwise they won't get past the next step.)
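For reference, the DSCP tag is simply the top six bits of the old IPv4 TOS byte (the Traffic Class byte in IPv6); the bottom two bits are ECN. So reading it is a two-bit shift:

```python
def dscp_from_tos(tos_byte: int) -> int:
    """The DSCP codepoint is the upper 6 bits of the TOS /
    Traffic Class byte; the lower 2 bits are ECN."""
    return tos_byte >> 2

# A TOS byte of 0xB8 carries DSCP 46, which is EF (Expedited
# Forwarding), the conventional marking for voice media.
```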
Thus, you need to identify and mark packets by their importance. This sorting must happen before we hit the queue -- on input, as early as possible. In your case we can identify two levels, high and low, but we must further subdivide into three classes:
- Priority traffic from management systems, "management"
- Priority traffic for voice communications, and its signalling, "voice"
- Scavenger traffic, the rest, "bulk"
How do we do it? Well, on the management interface, if we believe in the illusion that the network is managed and secure, we can mark "management" on ingress by default. On the rest of the internal interfaces we can mark on IP address, if we know the addresses of all the phones, or MAC address, if we know those. It is probably easiest to single out the priority devices, mark those explicitly as "voice", and default all others to "bulk". Most voice equipment can mark its own traffic with DSCP tags, but trusting that means you open up for bulkers marking their traffic with priority DSCP and bypassing your limits.
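An illustrative ingress classifier along those lines (the interface name, the phone addresses and the codepoint choices are assumptions for the example; EF = 46, CS2 = 16 and CS1 = 8 are conventional codepoints for voice, management/OAM and scavenger traffic):

```python
# Example assumptions, not a vendor API: names, addresses and
# codepoints would come from your own addressing plan.
DSCP = {"voice": 46, "management": 16, "bulk": 8}  # EF, CS2, CS1

KNOWN_PHONES = {"10.0.20.11", "10.0.20.12"}  # hypothetical phone IPs

def classify(ingress_port: str, src_ip: str) -> str:
    if ingress_port == "mgmt0":      # the management interface
        return "management"          # trusted by policy, not by tag
    if src_ip in KNOWN_PHONES:       # known voice endpoints
        return "voice"
    return "bulk"                    # everything else is scavenger

def mark(ingress_port: str, src_ip: str) -> int:
    """Rewrite the packet's DSCP on ingress -- never trust
    the tag the sender put there."""
    return DSCP[classify(ingress_port, src_ip)]
```

Note that `mark` always overwrites whatever DSCP the sender supplied; that is exactly the "don't trust the endpoints" point above.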
Then, as I wrote above, on all the interfaces there are queues outbound from the switch: one priority queue and one non-priority queue (or more, but good luck finding more than two in anything except very expensive iron). Here the interfaces differ in their strategy:
- The interface towards the Internet connection needs to prioritise "management" and put the rest into best effort.
- The interface towards the management gear needs to prioritise "voice" and "management".
- The interfaces towards the apartments need to prioritise "voice" and default the rest.
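Put together, each interface is a strict-priority dequeue over two queues, differing only in which classes go into the priority queue. A sketch, again in illustrative Python (the interface names are example assumptions, the class names are the three from above):

```python
from collections import deque

# Which classes each egress interface treats as priority
# (interface names are example assumptions).
PRIORITY_CLASSES = {
    "wan0":  {"management"},           # towards the Internet
    "mgmt0": {"voice", "management"},  # towards the management gear
    "apt":   {"voice"},                # towards the apartments
}

class EgressPort:
    """Two outbound queues with strict-priority dequeue:
    best effort is only serviced when priority is empty."""

    def __init__(self, name):
        self.priority = deque()
        self.best_effort = deque()
        self.prio_classes = PRIORITY_CLASSES[name]

    def enqueue(self, traffic_class, packet):
        if traffic_class in self.prio_classes:
            self.priority.append(packet)
        else:
            self.best_effort.append(packet)

    def dequeue(self):
        if self.priority:
            return self.priority.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None
```

One caveat worth knowing: strict priority like this can starve best effort entirely under sustained priority load, which is why real switches typically pair the priority queue with a policer or shaper.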
This is just a rough sketch, and there are bound to be gotchas. But it sort of conveys the general idea of traffic management in packet-forwarding multiservice networks. And it only prioritises if there's starvation -- when bandwidth is plentiful, everything passes regardless.