Power utilization frequency


Hi everyone,

I worked on a digital synthesis project where we were supposed to implement a FIR filter, then run the implementation and study the power consumption, resource utilization, frequency, etc. I have never done that and I'm a bit lost. I'd like to know how to interpret the data in the pictures.

Please help me analyze those numbers and figures.

(Also, it says that my design uses 430 LUTs (at least I know what LUT means), but isn't that number a little high? Is my design too... "heavy" or something? What about all the other numbers...?)

Any help would be welcome. Thank you in advance.

:)

Questpow3.PNG

Questpow2.PNG

Questpow.PNG


Hi @sab,

There are some FPGA gurus who will likely have more specific advice with regards to the FIR filter utilization itself, though I suspect the amount of resources it "should" use is largely dependent on how it was designed and what you need it to accomplish. That will also give better insight into how many LUTs your design should be using, though considering that your design is only using about 1% of the available resources on the Kintex chip, it's not what I would call a heavy design, at least for this particular FPGA chip.

As for the resource names, FF is for Flip-Flops, DSP is Digital Signal Processing (slice), IO is the number of input/output pins on the FPGA, and BUFG are global clock buffers.

Thanks,
JColvin


@sab,

Just on its face, I'm surprised you were able to implement an FIR filter in only 430 LUTs and one DSP.  This sounds rather light.

Tell me, though, how many coefficients did your FIR filter have?  (I'm typically looking at 10+, this looks too small for that.)  How many bits did each of those coefficients have?  (I like between 8 and 16, to match the incoming ADC.)  Were the coefficients fixed?  (Vivado can do a lot of optimizations on fixed coefficients, not so much on run-time programmable coefficients.)  How many bits did each of the input samples have?  How many bits were in the output?

All of these have an effect on how much logic an FIR uses.

Dan


@JColvin

Thank you for the reply. I actually guessed for FF (:)) but I don't know how to analyze the numbers, I don't know what is good or bad.

 

@D@n,

Thank you for the reply. The filter response has 16 elements (actually 32, but it's symmetrical, so only 16 elements are stored in that ROM). Yes the coefficients are fixed and have a width of 11 bits.

The input data have a width of 12 bits and are stored in an array of 32 elements through a DPRAM; the calculation begins only after all 32 values are stored. And of course the output data has a width of 23 bits.

 

My design has a SAMPLER that samples the input signal, a DPRAM (with a port for writing and another for reading), a ROM (for the impulse response coefficients), a MULTIPLIER, an ACCUMULATOR, and a SEQUENCER (that manages the FSM for the whole design).
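The datapath described above (a ROM holding half of a symmetric response, a multiplier and an accumulator) can be modeled in a few lines of Python to sanity-check the arithmetic and the bit widths. This is only a sketch under assumptions: the function names are made up for illustration, the pre-add of the two mirrored samples is one possible way to exploit the symmetry and may not be how the actual design does it, and no real coefficient values are known here. It also makes one thing visible: a 12-bit by 11-bit product fits in 23 bits, but the sum of 32 such products can need up to 28 bits in the worst case, so a 23-bit output is only safe if the actual coefficient values keep the sum small enough.

```python
def fir_symmetric(window, half_coeffs):
    """Compute one output sample of a symmetric FIR filter.

    window      -- the 2*n most recent input samples (integers)
    half_coeffs -- the n stored coefficients (first half of the response)
    """
    n = len(half_coeffs)
    if len(window) != 2 * n:
        raise ValueError("window must hold 2 * len(half_coeffs) samples")
    acc = 0
    for k in range(n):
        # Taps k and 2n-1-k share one coefficient, so the two samples
        # are added first and multiplied once (assumed folding scheme).
        acc += half_coeffs[k] * (window[k] + window[2 * n - 1 - k])
    return acc


def worst_case_acc_bits(half_coeffs, sample_bits):
    """Signed bit width that can hold the largest accumulator magnitude.

    Conservative: assumes both +max and -max must be representable.
    """
    max_sample = 1 << (sample_bits - 1)   # magnitude of most negative sample
    max_mag = 2 * sum(abs(c) for c in half_coeffs) * max_sample
    return max_mag.bit_length() + 1       # +1 for the sign bit
```

For example, `worst_case_acc_bits(stored_coeffs, 12)` with the 16 stored coefficients tells you whether 23 bits are really enough for your particular filter.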

Guys, how do you interpret the energy? Is 0.084 W good or bad as total power on-chip for the Kintex?

And does anyone know the clock pin names for KINTEX (xc7k70tfbg484-1) and SPARTAN (xc7s6cpga196-1)? I saw online respectively G7 and A7 on xilinx website but each time the implementation shows a critical warning and says that it's "...not a valid site or package pin name".

Thank you very much for all the replies.

 


@zygot

I wrote what I was asked: to make a power and resources study... and then compare them for the implementation on an artix, kintex, spartan (and something else that I forgot) chip. And finally tell how I can improve my design. First of all, I need to understand what each value really means for my design and I'll see how I can improve it later. I seriously can't explain more because I said everything that I was told...


@sab,

Fascinating!  Which core are you using?  Is it a public core, a Xilinx core, or commercial core, or one of your own that you are evaluating?  If you need a public core that can be used (mostly) cross-platform, then I can provide some that you can then reference in your study if you need to.

Let me also suggest that your coefficients need to be run-time settable for performance measurements.  If you don't, the synthesis tool might remove certain multiplies (multiply by +/- 2^n) and so otherwise bias your result.
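To illustrate the point about fixed coefficients: a multiply by 0 or by +/- 2^n is just a shift (plus possibly a negation), so a tool that knows the constant at synthesis time does not need a real multiplier for it. Here is a small sketch of how one might count how many coefficients fall into that category. The function names and the example list are hypothetical, not taken from the design discussed here.

```python
def is_trivial_multiply(coeff):
    """True if multiplying by coeff needs no real multiplier (0 or +/- 2^n)."""
    mag = abs(coeff)
    # A power of two has exactly one bit set, so mag & (mag - 1) is zero.
    return mag == 0 or (mag & (mag - 1)) == 0


def count_trivial(coeffs):
    """Number of coefficients that reduce to a shift, a negation or zero."""
    return sum(1 for c in coeffs if is_trivial_multiply(c))


# Hypothetical coefficient set, for illustration only.
example = [1, -2, 3, 8, -16, 100, 0, 511]
print(count_trivial(example), "of", len(example), "multiplies are trivial")
```

If a large share of the fixed coefficients is trivial in this sense, the reported LUT and DSP counts will understate what a run-time programmable version of the same filter would need.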

Dan


Apart from being an exercise in learning various device family resources and peculiar features of the tools I guess I'd be asking whoever assigned the project some more questions; because I still don't know what the objective is.

I do have some thoughts.

  • Vivado is happy to use BRAM even if you are only using 1% of it. You must have let Vivado choose how to implement memory, which likely accounts for the "high" LUT usage and 0 BRAM usage. Given the information so far I can't say that I'd consider your LUT and Register rates high.
  • You don't mention what strategies you used for synthesis and implementation
  • The version of Vivado might influence overall optimization of the power profile
  • Driving output pins is usually a significant part of the overall power usage analysis and that depends on IOSTANDARD, IO Bank Vcco, and external termination. Your report would seem to have almost no output loading. Capacitive loading can be significant.
  • Clock rates obviously can be a factor in power dissipation. Series 7 devices can use clock enables to disable parts of a design that don't need to run. Get friendly with the Series 7 Clocking User's Manual. Obviously the lower the clock rate the lower the overall power dissipation. This is a system design issue to solve.
  • Unless you are creating designs for a target that already has the IO pins assigned letting Vivado choose the pins is not a bad idea. You can open the implementation  and change IO pin setting constraints. If your targets are PCBs where the pins have been assigned you need to get those pin locations from the board schematics and assign location constraints.
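The points above about clock rate and capacitive loading follow from the usual first-order estimate of CMOS switching power, P = a * C * V^2 * f, where a is the activity factor (fraction of clock cycles in which the node switches), C the load capacitance, V the supply voltage and f the clock frequency. A rough sketch, with made-up numbers that are not taken from the power report in this thread:

```python
def dynamic_power_w(c_farads, v_volts, f_hz, activity=1.0):
    """First-order switching power estimate: activity * C * V^2 * f."""
    return activity * c_farads * v_volts ** 2 * f_hz


# Hypothetical output pin: 10 pF load, 3.3 V bank, 100 MHz clock,
# switching in half of the clock cycles.
p_full = dynamic_power_w(10e-12, 3.3, 100e6, activity=0.5)
p_half = dynamic_power_w(10e-12, 3.3, 50e6, activity=0.5)
print(f"{p_full * 1e3:.3f} mW at 100 MHz, {p_half * 1e3:.3f} mW at 50 MHz")
```

This only covers the dynamic part. The total on-chip figure in a Vivado power report also includes static (leakage) power, which does not go away when the clock slows down.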

I have no experience with the Spartan 7 family so I don't know if it has any special features for minimizing the power profile. You need to review the Kintex and Spartan 7 literature to see what's different. My general feeling is that it's hard to analyse a design kernel for performance outside of the context that it will be used in an overall design.


On 10/30/2019 at 8:44 PM, D@n said:

Fascinating!  Which core are you using?  Is it a public core, a Xilinx core, or commercial core, or one of your own that you are evaluating?  If you need a public core that can be used (mostly) cross-platform, then I can provide some that you can then reference in your study if you need to.

@D@n

Thank you but sorry, I really don't know. I'm not familiar with that, I just learned how to program in VHDL and that's about it. I'm right now learning a little bit more through this exercise.

@zygot

What's BRAM (I read online block RAM?) and how can I check about BRAM for my design? Anyway, thank you. A lot of helpful infos.


1 hour ago, sab said:

What's BRAM (I read online block RAM?) and how can I check about BRAM for my design?

Block RAM is a hard memory resource in the FPGA. After implementation, the flow summary in Vivado shows resource usage including BRAM. Use the table view to see actual usage of resources. In many cases LUT resources can also be used as memory, though there is generally a timing penalty involved, especially for high clock rates.

Since you are just learning VHDL I'm assuming that your task is to familiarize yourself with the tools. Do avail yourself of the many User Guides and Tutorials that Xilinx has to offer.


On 11/3/2019 at 7:49 PM, zygot said:

Block RAM is a hard memory resource in the FPGA. After implementation, the flow summary in Vivado shows resource usage including BRAM. Use the table view to see actual usage of resources. In many cases LUT resources can also be used as memory, though there is generally a timing penalty involved, especially for high clock rates.

Thanks.

I have a few more questions...

What does "Worst Negative slack" represent? I know it has to be positive, but... for example, for my Artix target, I got 3.96 ns, for Kintex, 5.421 ns and for Spartan, 4.023 ns.

Since I'm making a comparison which one would be better?

Also, I'm to determine the maximum frequency... I found out that it would be the inverse of the time values. So, I get 252.252 MHz for Artix, 184.467 MHz for Kintex and 248.570 MHz for Spartan. Which one is better and how can I interpret it? I first felt like the higher the frequency, the better (because it might represent the speed of the device(?)) but now I'm not sure. I've read very different things about it online...

Thanks.


Hi @sab,

I took a look online and found this Xilinx thread as well as one of Xilinx's User Guides as referenced by zygot that should be of help to you regarding the Worst Negative Slack.

The maximum frequency (presuming you are talking about how fast your design will work) is not simply the inverse of the worst negative slack; the slack is the timing margin left over relative to the clock period you constrained, so the achievable speed depends on how your design has been implemented. The DC and AC Switching Characteristics in each chip's 7 Series datasheet will give you some maximum speeds; however, since your design will inevitably not be solely dedicated to the singular task of getting one clock working at full speed, I would expect something more along the lines of this Xilinx thread. Every design will be different though.
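For a single-clock design, a commonly used rule of thumb estimates the maximum frequency from the constrained clock period and the setup worst negative slack (WNS): Fmax is roughly 1 / (period - WNS). A positive WNS means the design could run with a shorter period than the one you asked for; a negative WNS means it needs a longer one. A small sketch of that arithmetic follows. The 10 ns period is purely an assumption, since the actual clock constraint is not stated in this thread, and the function name is made up for illustration.

```python
def fmax_mhz(period_ns, wns_ns):
    """Estimate Fmax in MHz from the constrained period and the setup WNS."""
    min_period_ns = period_ns - wns_ns   # shortest period that still meets timing
    if min_period_ns <= 0:
        raise ValueError("WNS must be smaller than the clock period")
    return 1000.0 / min_period_ns


# WNS values as reported in the question; the 10 ns period is assumed.
assumed_period_ns = 10.0
for name, wns in [("Artix", 3.96), ("Kintex", 5.421), ("Spartan", 4.023)]:
    print(f"{name}: about {fmax_mhz(assumed_period_ns, wns):.1f} MHz")
```

With the same constraint on every target, the device with the largest WNS has the most margin and therefore the highest estimated Fmax. Treat the result as an estimate for this one implementation run, not as a property of the chip.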

Thanks,
JColvin


Archived

This topic is now archived and is closed to further replies.
