If you can't get enough of AI chatbots on the web.
It might seem obvious, but let's also just get this out of the way: if you want to run a large language model on your own hardware, you'll need a GPU with a lot of memory, and probably a lot of system memory as well. It's right there in the name. A lot of the work to get these models running on a single GPU (or a CPU) has focused on reducing those memory requirements.
Getting the models isn't too difficult, at least, but they can be very large. LLaMa-13b, for example, consists of a 36.3 GiB download for the main data, and then another 6.5 GiB for the pre-quantized 4-bit model. Do you have a graphics card with 24GB of VRAM and 64GB of system memory? Then the 30 billion parameter model is only a 75.7 GiB download, plus another 15.7 GiB for the 4-bit version. There's even a 65 billion parameter model, in case you have an Nvidia A100 40GB PCIe card handy, along with 128GB of system memory (well, 128GB of memory plus swap space). Hopefully the people downloading these models don't have a data cap on their internet connection.
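To put those figures in perspective, the weights alone take roughly the parameter count multiplied by the bytes per parameter, and that is exactly what the quantization work squeezes down. The short Python sketch below is only a back-of-the-envelope estimate (it isn't from the article's tooling, and it ignores activations, the KV cache, and whatever extra files are bundled into the downloads), but it shows why a 4-bit 30 billion parameter model squeaks into a 24GB card while the 16-bit version doesn't come close.

```python
def weights_gib(params_billion: float, bits_per_param: float) -> float:
    """Rough size of the model weights alone, in GiB.

    Ignores runtime overhead (activations, KV cache, framework buffers),
    so treat it as a lower bound on the memory actually needed.
    """
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Ballpark figures for the model sizes mentioned above.
    for label, params, bits in [
        ("13B, 16-bit", 13, 16),
        ("13B, 4-bit", 13, 4),
        ("30B, 4-bit", 30, 4),
        ("65B, 16-bit", 65, 16),
    ]:
        print(f"{label}: ~{weights_gib(params, bits):.1f} GiB of weights")
```

Run it and the 4-bit 30B weights come out around 14 GiB, which is why a 24GB card can hold them with room to spare for the rest of the runtime, while 65B at 16 bits lands well past 100 GiB and pushes you toward datacenter hardware (or a lot of system memory and swap).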
https://www.tomshardware.com/news/running-your-own-chatbot-on-a-single-gpu