Regarding the memory...
Would I need external memory for this many operations?
10k is an awfully long FIR filter.
You can easily calculate this, but you need to know how many bits you need per coefficient and how many bits you need per accumulator.
For each FIR filter, you need to maintain 10k accumulators. If an accumulator is 20 bits, that's 200k bits, or 6 BRAM blocks. Every ADC clock you need to fetch every one of these, multiply the most recent ADC reading by the appropriate coefficient, add the product to the value you have read, and store the result into the next slot. If you don't have enough BRAM, you will need to organize a pipeline which reads from external RAM and then writes back. But if you have 5 DSPs working on this, that's 5 x 20 = 100 bits read and 100 bits written every DSP cycle, which you probably won't be able to achieve with external RAM.
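To make the accumulator scheme concrete, here is a minimal software sketch of it (this is the transposed-form FIR structure). The function names and the test data are made up for illustration, and the list `acc` simply stands in for the accumulator memory. It is a model of the arithmetic only, not of the FPGA pipelining.

```python
def transposed_fir(samples, coeffs):
    """Transposed-form FIR: one accumulator slot per tap, updated every sample."""
    n = len(coeffs)
    acc = [0] * n  # accumulator memory; the last slot is never written and stays 0
    out = []
    for x in samples:
        # Output = oldest partial sum plus the newest sample times the first coefficient.
        out.append(acc[0] + coeffs[0] * x)
        # Every other tap: read a slot, add coeff * newest sample, store one slot along.
        for k in range(1, n):
            acc[k - 1] = acc[k] + coeffs[k] * x
    return out


def direct_fir(samples, coeffs):
    """Reference: plain convolution sum, used only to check the result."""
    return [
        sum(coeffs[k] * samples[i - k] for k in range(len(coeffs)) if i - k >= 0)
        for i in range(len(samples))
    ]


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]   # hypothetical ADC readings
    hs = [1, -2, 3]        # hypothetical coefficients
    print(transposed_fir(xs, hs))
    print(direct_fir(xs, hs))
```

Both functions should print the same sequence. The inner loop is the part that costs one read and one write per accumulator per ADC sample.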
You will also need space for the coefficients. For example, 10 bits x 10k = 100k bits, which is another 3 BRAM blocks.
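The block counts can be reproduced with a few lines of arithmetic. This sketch assumes 36 Kb BRAM blocks (as in Xilinx 7-series parts) and looks at raw bit capacity only. It ignores the width/depth configurations a block actually supports, so a real design may need more blocks than this. The helper name `bram_blocks` is made up for the example.

```python
import math

BRAM_BITS = 36 * 1024  # assumption: one block RAM holds 36 Kb


def bram_blocks(entries, bits_per_entry, bram_bits=BRAM_BITS):
    """Blocks needed by raw capacity (ignores port width/depth limits)."""
    return math.ceil(entries * bits_per_entry / bram_bits)


TAPS = 10_000
acc_blocks = bram_blocks(TAPS, 20)    # 20-bit accumulators
coeff_blocks = bram_blocks(TAPS, 10)  # 10-bit coefficients

if __name__ == "__main__":
    print("accumulators:", acc_blocks, "blocks")
    print("coefficients:", coeff_blocks, "blocks")
    print("per filter:  ", acc_blocks + coeff_blocks, "blocks")
```

Change the bit widths to match your own coefficient and accumulator sizes and the per-filter figure follows directly.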
What is easy about an FPGA is that once you build one FIR filter, all of the others will be the same. So an N x N matrix is just N^2 identical filters, which use N^2 identical sets of resources.
If one FIR filter takes 5 DSPs and 9 BRAM blocks, then for 10 x 10 = 100 FIR filters you'll need 500 DSPs and 900 BRAM blocks. These are very approximate estimates though.
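Since the filters are identical, the totals scale linearly with the filter count. A small sketch, using the per-filter figures from above as defaults (the function name is hypothetical):

```python
def matrix_resources(n, dsp_per_filter=5, bram_per_filter=9):
    """Return (filters, DSPs, BRAM blocks) for an n x n matrix of identical FIR filters."""
    filters = n * n
    return filters, filters * dsp_per_filter, filters * bram_per_filter


if __name__ == "__main__":
    for n in (4, 8, 10):
        filters, dsps, brams = matrix_resources(n)
        print(f"{n} x {n}: {filters} filters, {dsps} DSPs, {brams} BRAM blocks")
```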
Given the way FPGAs are built, BRAM will be more of a problem than DSP.
Using one big FPGA to fit all the filters may not be a good idea. Say you want 100 FIR filters: the XC7K480T has 955 BRAM blocks and 1920 DSPs and costs a few $k.
It may be more beneficial to use a number of smaller FPGAs. For example, the XC7A200T has 365 BRAM blocks and 740 DSPs, which might be enough for 40 FIR filters.
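A rough way to compare devices is to divide each resource by the per-filter cost and take the smaller result. The sketch below uses the BRAM and DSP counts quoted above and the 5 DSP / 9 BRAM per-filter estimate. It assumes every block on the chip is available to the filters, which won't be quite true in practice. The names are made up for the example.

```python
import math

# BRAM block and DSP counts as quoted above
DEVICES = {
    "XC7K480T": {"bram": 955, "dsp": 1920},
    "XC7A200T": {"bram": 365, "dsp": 740},
}


def filters_that_fit(device, dsp_per_filter=5, bram_per_filter=9):
    """Return (max filters, limiting resource) for one device."""
    by_bram = device["bram"] // bram_per_filter
    by_dsp = device["dsp"] // dsp_per_filter
    limit = "BRAM" if by_bram <= by_dsp else "DSP"
    return min(by_bram, by_dsp), limit


if __name__ == "__main__":
    wanted = 100  # e.g. a 10 x 10 matrix
    for name, res in DEVICES.items():
        fits, limit = filters_that_fit(res)
        chips = math.ceil(wanted / fits)
        print(f"{name}: {fits} filters (limited by {limit}), {chips} chip(s) for {wanted}")
```

With these numbers BRAM runs out long before DSP on both parts.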