1 pointAmen. And that's a good view for all Xilinx IP. The structure for a simple digital filter is not that complex; you can implement them in HDL. I've done that. Xilinx IP is convenient but usually not the best approach when you have concerns about using up limited resources. The issue for using very fast resources like BRAM and DSP slices is that they are placed in particular locations throughout the device with limited routing resources for the signals between them or other logic. You can let Xilinx balance throughput, resource usage, logic placement, and throughput or you can try to do that yourself. Trying to use 100% of every BRAM or DSP resource in order to minimize the number of BRAM or DSP resources used is not easy. In my experience FPGA vendors are content to have their IP wizards make the customer think that he needs a larger and more expensive device. So that's the trade-off; let the vendors' tools do the work to save time or write your own IP and be responsible for taking care of all the little details that the IP hides form you. I've spent some time experimenting with DSP resources from various FPGA vendors. They are complicated with a lot of modes and depending on how you use them throughput can decline substantially from the ideal. Just read the user's guide and switching specs in the datasheet to get the idea. Generally the DSP slices are arranged to perform optimally with certain topologies but not all. Implementing designs that are iterative or have feedback can get ugly; especially when you try and fit that into a larger design using most of the devices resources. As a general rule, in my experience, use vendor IP and don't ask a lot of questions or design your own IP and be prepared to learn how to handle a lot of details that aren't obvious. Time verses convenience.
1 pointHi, [1 2 3 4 0 1 2 3 4] is not symmetric in a linear-phase sense. That would be e.g. [1 2 3 4 0 4 3 2 1]. You could exploit the shared coefficients manually, see e.g. Figure 3 for the general concept. But this case is so unusual that I doubt tools will take it into account. The tool does nothing magical. If performance matters more than design time, you'll always get better results for one specific problem with manual design. One performance metric is multiplier utilization (e.g. assuming you design for a 200 MHz clock, one DSP delivering 200M operations / second performs at 100 %. Reaching 50+ % is a realistic goal for a simple / single rate structure). For example, do I want to use an expensive BRAM at all, when I could use ring shift registers for delay line and coefficients. Then you only need a small controlling state machine around it that does a full circular shift for each sample, muxing in the new input sample every full cycle (the BRAM makes more sense when the filter serves many channels in parallel, then FF count becomes an issue).