I just did a count of a DIY TCP/IP stack running on an STM32F407 chip, and it takes 5668 bytes of FLASH and 160 bytes of RAM (4 TCP connection slots). The on-chip ethernet MAC takes 16kB of RAM though, because it runs 5 tx & 5 rx buffers (needed for 100Mbit throughput).
That stack implements (a very minimalistic) TCP, a simple webserver, UDP, a custom UDP protocol, ARP, ICMP and IPv4.
If you need to add all those extra features listed by nctnico, I estimate it would take a few kB extra. If the application is "richer" on the ethernet side, it may take another few kB on top of that. All in all I don't think it's that tight of a squeeze.
The only difference between the STM32F103 and the STM32F107 is that the 107 packs an ethernet controller, but as I said, those ethernet buffers take quite a lot of RAM. You can run fewer than I do (down to 1/1 tx/rx), but throughput will suffer. You do get 100Mbit speeds in return for "hosting" it on chip.
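To put a rough number on that trade-off, here is a back-of-the-envelope sketch. The 1536 byte buffer size is an assumption (a common choice for one full frame, not a figure from my actual code), and the name eth_buffer_ram is made up for the example. DMA descriptors are left out.

```c
#include <stddef.h>

/* ASSUMPTION: one buffer holds one full Ethernet frame, rounded up to
 * 1536 bytes. The real driver may use a different size. */
#define ETH_BUF_SIZE 1536u

/* RAM taken by the frame buffers alone for a given number of tx and rx
 * slots. DMA descriptors and driver state are not counted. */
static size_t eth_buffer_ram(unsigned tx_bufs, unsigned rx_bufs)
{
    return (size_t)(tx_bufs + rx_bufs) * ETH_BUF_SIZE;
}
```

With that assumed size, 5/5 comes out at 15360 bytes, which is in the same ballpark as the 16kB above, and 1/1 drops to 3072 bytes.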
Other than that, I think the cost difference (I am assuming it's a hobby project) between an ENC28J60 and a PHY is negligible, and it comes down to whether you want a generic SPI <> Ethernet chip, or an (R)MII <> Ethernet chip, which is more tied to a select few CPUs.
The nice thing about chips like ENC28J60 is you can easily port it over to smaller platforms; I run my DIY TCP/IP stack on a PIC24 with 8KB of RAM and that works nicely.
As for the TCP/IP stack, it seems to me that it must be fairly easy to port. It compiles on XC16/32, which is basically GCC. There are only a few things that are platform dependent, like SPI, UART, timers and external interrupts.
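One way to keep those platform-dependent bits in a single place is a struct of function pointers that each port fills in. This is only a sketch of the idea, not how that stack is actually organised: struct platform_ops, timeout_expired and the stub functions are all hypothetical names, and the external interrupt hook is left out to keep it short.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL platform interface: everything the portable code needs
 * from the target, gathered in one struct. */
struct platform_ops {
    uint8_t  (*spi_transfer)(uint8_t out); /* exchange one byte over SPI */
    void     (*uart_putc)(char c);         /* debug output */
    uint32_t (*millis)(void);              /* free-running millisecond tick */
};

/* Portable code only talks to the struct. Unsigned subtraction keeps the
 * timeout check correct when the tick counter wraps around. */
static bool timeout_expired(const struct platform_ops *p,
                            uint32_t start_ms, uint32_t timeout_ms)
{
    return (uint32_t)(p->millis() - start_ms) >= timeout_ms;
}

/* Host-side stubs standing in for a real port (PIC24, STM32, ...). */
static uint32_t fake_ms;                                       /* fake tick */
static uint8_t  stub_spi_transfer(uint8_t out) { return out; } /* loopback */
static void     stub_uart_putc(char c)         { (void)c; }    /* discard */
static uint32_t stub_millis(void)              { return fake_ms; }

static const struct platform_ops host_port = {
    .spi_transfer = stub_spi_transfer,
    .uart_putc    = stub_uart_putc,
    .millis       = stub_millis,
};
```

Porting then means writing one new platform_ops instance per target, while the code above it stays untouched.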
You can probably run the TCP/IP stack on an on-chip MAC easily if you cut the code "in half" at the points of macTx() and macRx(). From my own experience, the layers on top are often just doing some juggling with a bunch of structs, some if(), switch() and CRC calculations; and that's it.
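A minimal sketch of what that cut could look like. Only the names macTx and macRx come from the stack; the signatures, struct mac_driver and the loopback driver are assumptions for illustration. The checksum shown is the standard Internet checksum (RFC 1071) used by IPv4, ICMP, UDP and TCP, which is strictly a ones' complement sum rather than a true CRC, but it is the kind of calculation the upper layers do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED signatures for the two cut points. */
struct mac_driver {
    int    (*macTx)(const uint8_t *frame, size_t len); /* 0 on success */
    size_t (*macRx)(uint8_t *frame, size_t max_len);   /* bytes read, 0 if none */
};

/* Upper-layer helper: RFC 1071 Internet checksum over big-endian 16 bit
 * words. The checksum field must be zero in the data when computing. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len  -= 2;
    }
    if (len == 1)                      /* odd trailing byte, padded with 0 */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)                  /* fold the carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Loopback "MAC" for testing on a PC: whatever is sent comes back once. */
static uint8_t loop_buf[64];
static size_t  loop_len;

static int loop_tx(const uint8_t *frame, size_t len)
{
    if (len > sizeof loop_buf)
        return -1;                     /* frame too large for this stub */
    memcpy(loop_buf, frame, len);
    loop_len = len;
    return 0;
}

static size_t loop_rx(uint8_t *frame, size_t max_len)
{
    size_t n = loop_len < max_len ? loop_len : max_len; /* truncate if needed */
    memcpy(frame, loop_buf, n);
    loop_len = 0;                      /* frame is consumed */
    return n;
}

static const struct mac_driver loopback_mac = { loop_tx, loop_rx };
```

Swapping the ENC28J60 for an on-chip MAC would then come down to providing another mac_driver instance, while inet_checksum and the rest of the upper layers stay as they are.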