artyz7-20 Troubleshooting a possible power issue

MaxDZ8 · April 3, 2021

Hello, I hope you will be enjoying your vacations if you have been given some.

For me this has meant finally being able to work on my spare-time experiment and finally reach closure on my upgraded design. Let me describe the process.

The ArtyZ7-20 is just the initial prototyping. I'm going to move to real production FPGA boards ASAP (probably in August) but for the time being I'd just want to go ahead with the Arty.
The system is systemverilog RTL and barebone C++.
The initial design was 100Mhz and 6-stage pipe. Vivado estimated about 2.2W power. I suspect it was much lower than that. I ran it over USB2. Let me be clear my mainboard has a fairly beefy usb2 going beyond usual specification.
I later went ahead with a 12-stage pipeline. I was unable to run it on USB2, but it runs rock stable on USB3.
In the last few weeks I've upgraded it to 200Mhz. Vivado now estimates 5.5W.
1. I never expected this to be able to run on USB3 power (I haven't tried, but I doubt my USB3 can deliver 1A)
2. so I've hooked the ARTY through its power jack to a industrial supply (details, if needed, in a later message)
3. The board gives up.

If I leave the board free-running, it hangs almost instantly.

Here's how it goes by stepping it in the debugger:

Booting ok, DHCP fails (ok) and fixed ip is estabilished.
Server correctly found, input requested and correctly received.
Input feed to PL
PL start

As soon as I pass beyond the {4} breakpoint, the card hangs. The debugger will never hit the next breakpoint.

I can tell the thing is more relevant than just software because my PL turns on red LD5 when idle. It would turn on green and eventually blue plus animate LD0-3. This never happens.

I was thinking about hooking a bunch of capacitors to the supply and see if it improves but I guess there might be other issues to consider as well.

Do you have any suggestion?

zygot · April 13, 2021

2 hours ago, MaxDZ8 said:

I need some more time to digest all this content. I feel the need clarify a key thing. There has been a time I when I followed and even advocated for best practices. In my spare time I cannot quite stick to it anymore - just today I got assigned to yet another year-long project which would require a whole team and I'll need to complete in in a few months!

Yeah, I sometimes get sloppy with some projects. But here's a different way to think about making the best use of your very limited time; it's one that I keeps getting thrust into my mindset by complicated projects. For 'side' projects this is all about self-preservation and self-defense. Spinning my wheels is more frustrating than forcing myself to do some sort of code version preservation. There are no hard rules and you don't need versioning tools. Just zipping up a project when you know that there are going to be major changes is often sufficient. Developing some good, simple, habits and processes might only pay off big every so often.... but when they do it saves you so much time. Archiving whole project snapshots is good because you can compare messages, reports, simulations easily. You can also add measurement instrumentation to both the old and new project. If there's an unexpected major change in behavior it might help find unintended changes. Anytime you're using Vivado board design to create code there is that potential. I've seen Vivado do some very strange things when I make minor changes. (I should mention that I never use the board design flow except for ZYNQ projects ).

Somewhere, there's some good balance between working on a project and doing what's needed to evolve a project without encountering undue pain. What works for me doesn't necessarily work for you. Some projects turn out to be as simple and straight-forward as expected. Others, not so much. If you think about it, projects done for personal growth are no different than those done for work ( aside from the usual noise associated with a team project having a strict deadline ). They both take up your time; so why not make the time you have as productive as possible? Just some personal thoughts on the subject.

zygot · April 21, 2021

2 hours ago, MaxDZ8 said:

Pulling in the XADC measurement took more effort in finding the documentation than the work itself but the process has been straightforward.

Yeah, but now that you know how to do it the next time will be easy. I strongly encourage using the XADC for all more serious Series7 projects, fan with heat sink, heat sink without fan or nothing at all. If you don't check on things you are just operating on assumptions. In electronics ignorance is rarely bliss ( for long ), especially of you are pushing your hardware.

I kind of agree with you that LUT52% FF68% usage, with minimal or no IO being driven ( I assume ), and clocked at 200 MHz shouldn't cause your platform power supply to roll over and play dead... but that's the nature of this class of boards. As I mentioned the more expensive FPGA hardware comes with a more robust power supply design. Regardless, I'm pretty sure that 1 unhappy customer out of a thousand isn't going to change the way that these boards are designed. I'm betting that 90% of the projects done on them use less than 20% of resources... just a guesstament.

I hate to mention it at this point but there's still that outside, low probability chance that something has gone wrong with the 12-pipeline 200 MHz build. At this point, personally, I might be curious enough to try a few builds sliding from 100 MHz upward to see the power dissipation curve... but I can get overly curious. Most likely it's the hardware platform.

Thanks for keeping us informed about your experiences.

Edited April 21, 2021 by zygot

zygot · April 3, 2021

It seems to me that you already have a suspect in mind; a power supply issue. So, why not try to either confirm or eliminate that possibility?

If you have a decent oscilloscope you can monitor the core voltages at the point where you think that the problem occurs. If you don't have a scope then you can use the FPGA XADC facility to monitor and set alarms for internal supply voltaes. This might require a bit of alteration to your PL design. Perhaps you could add control bits to allow some functionality in the PL but not others. This would allow you to selectively enable parts of the design.

You don't mention it but the ARM processors are fast enough to cause but faults. I've seen this with standard AXI GPIO IP and trying to flip bits at too high a rate. You can test this by adding delays between successive AXI read/write accesses.

One thing that is important to do when debugging complex (SW + HW ) problems is divide and conquer. Pare down the possible causes of the problem. Often, it turns out to be more complicated than just one thing but a binary approach to eliminating suspects is still the best approach most of the time. It never hurts to take a short break and come back and try and do a fresh review of what's going on with a design and try a few thought experiments. Of course, you've already done your PL design simulation and carefully looked for unexpected changes from previous design iterations.. right? Sometimes, the HW timing reports and messages have a clue as well.

On-board FPGA power supplies are designed to provide a limited range of average and instantaneous power. Trying to change the design behavior by throwing a larger power source or capacitors at it is not a recommended practice. If your prototype platform is limited then you should adjust your prototype performance to suit the platform. Of course, this has as yet to be confirmed.

Oh, and you can instrument your PL design to spit out XADC readings through a HDL UART using 2 spare pins and an external USB TTL UART cable. This will disconnect ARM issues and SW debugging from monitoring the core temperature, voltages, currents etc. In fact, thinking about how to separate HW debugging from SW debugging by instrumenting the PL is never a bad idea...

Edited April 3, 2021 by zygot

MaxDZ8 · April 3, 2021

Thank you Zygot, I have difficulty boiling down the suggestions to a concrete course of action.

For the time being, I notice this:

3 hours ago, zygot said:

You don't mention it but the ARM processors are fast enough to cause but faults. I've seen this with standard AXI GPIO IP and trying to flip bits at too high a rate. You can test this by adding delays between successive AXI read/write accesses

That's a scary possibility! My design pours out a few monitor signals which are fetched to a not-quite-PWM. If memory serves it should be pulsing at about 200khz. It all goes through FPGA fabric.

Indeed, the system does poll HW through AXI quite frequently. Assuming no bugs it should happen a few times a millisecond but I am inclined to look elsewhere as in my debug runs I am stepping through manually and it still hangs at dispatch, it never even gets to execute the following instruction, let alone the polling.

4 hours ago, zygot said:

Of course, you've already done your PL design simulation and carefully looked for unexpected changes from previous design iterations.. right? Sometimes, the HW timing reports and messages have a clue as well.

What can I be looking for?

Last few notes:

I was not going to add caps at random to something featuring sequencing. I meant to add them straight out of the power supply as I know this supply can't boot 3 rPis reliably and I'm pretty sure the hardware I have sythetized is more power hungry than 1 rPi. I know from previous experimentns the caps can smooth out the dynamic loads enough for it to make it through but that's the best I can reason about.
No, I don't have a scope at home. I don't know where to hook the probes, I haven't even looked it up because I'm not skilled enough at soldering to bring out those traces.
From the Arty7-20 reference manual it seems for FPGA 1V is 2.1A typical, while 3.3 is about 1.5A. I am a bit surprised to find out those are alrady in the right range to give me issues.
I understand using the integrated device could help assuming the issue is related to ARM core communication?

zygot · April 3, 2021

2 hours ago, MaxDZ8 said:

have difficulty boiling down the suggestions to a concrete course of action.

Try to eliminate the power supply as the main issue
- implement XADC in PL logic using DRU
- add a UART to spit out alarm conditions and or voltages. There are some code examples in the Project Vault area. There are some good 3V compatible TTL USB UART cables and breakout boards from Adafruit and Sparkfun. I have 4-5 laying around and in general most are in use on some project or another.
- I don't know how your Arm cores connect to your HDL design but try and add some enables to parts of your design to help identify what portion might be causing issues.
Try and create a separate debug path in your HDL. Obviously the SW debugger is of no help once the ARM core has faulted.
What are you looking for in simulation and HW synthesis or P&L messages? I don't know. Doubling your pipeline latency and clock can introduce unexpected design issues. A good simulation testbench can often help identify areas of interest.

The general idea is track down things that you suspect are a problem and divide and conquer parts that might not be obvious areas of problems.

MaxDZ8 · April 7, 2021

Erm, I understand those things up to a certain point.

I have dumped "fpga DRU" in duckduckgo and found those two links across the garbage.

For "XADC in PL logic using DRU", I have found a little more interesting content here and there but I don't quite connect the dots.

What does that even mean?

Anyway, I guess it is a good idea to take a step back and start from easier things first I've left out some important information.

The Arty Z7-20 has a magical green led LD12, labeled "done" it turns on after FPGA bitstream has been uploaded. This is supposed to be on basically all the time.

What I observe: LD12 turns off when compute starts.

I'm not deep enough in the tech to understand how this LED is driven but I'm inclined to believe bitstreams must be truly garbled to interfere with it.

Another interesting LED is LD13 (RED) described in the reference manual as "on when all the supply rails reach their nominal voltage". This also goes off. The process overall is as follows:

Power on +12V rail through jack. Red LD12 on. Green LD8 on.
Run tera term. Connected to COM.
Run Vitis. I launch the project as Assistant tab > Debug > context-menu > Debug> Single Application Debug
In maybe a second fpga is programmed, LD12 on. Debugger on first line of program. Red LD5 on.
(Not considered relevant: I must run the orchestrator here, it's the program cooking the data for easy consumption)
Canonical ethernet traffic is observed
Debugger hits breakpoint at parameter setup. Continue.
Debugger hits breakpoint at 'start compute'. Continue.
LD12 blinks off.

This behaviour is compatible with running from USB3.

I think at this point I'm not sure of how adding UARTs or monitors in either PL or hardened SoC features can help me.

There's still the possibility the PSU might fail in adjusting fast enough. I'm considering taking down the thing with me at work when I can use the scope to monitor 12v voltage in but that's pretty much it.

zygot · April 8, 2021

6 hours ago, MaxDZ8 said:

I have dumped "fpga DRU" in duckduckgo

Yeah, sorry about that. I meant to refer to the DRP. You might refer to UG780 which is the XADC User Guide. I think that there might be a few more related references in the Xilinx documentation.

I was just suggesting that you could try and tie a specific core or IO bank voltage drop to your HDL becoming active.

Trying to address the 12V supply that powers the FPGA power supply is unlikely to be of much help. The supply that powers the FPGA is limited to it's design specification and can only supply current to its outputs regardless of the input power that drives it. If you are certain that you are having a supply issue then you would seem to have two choices:

reduce the throughput of your HDL design. you've stated that you know where it worked before the current implementation.
find a different hardware platform

It's not unusual to scale a prototype design to fit within the capabilities of the prototype hardware you are working with. LD13 is ties to the FPGA supply controller "Power Good" status pin so if this is ever de-asserted while running your application then you have problems that you won't solve with a debugger. LD12 is tied to the FPGA CONFIG DONE pin and if you lose your configuration then the ARM AXI bus is sure to fault as there is nothing in the PL to handshake with.

I was just trying to throw out some suggestions for you to cogitate on.... one never knows what might provide a useful path to get by a roadblock. Personally, I have no problem starting a side project as an effort to solving a seemingly intractable problem. Sometimes these lead to new and fruitful lines of inquiry that I'd not have considered pursuing otherwise. Worst case is that I have a new tool to refer to when I encounter a similar issue.

Edited April 8, 2021 by zygot

MaxDZ8 · April 9, 2021

Thank you for the pointers, I have been interested in using XADC for long time so I will take this chance to read UG780.

It is my understanding bare metal apps run on core 0. I will try rebuilding the Vitis project to run on core 1.

I hoped the issue would be about main PSU dynamic response but if both Z7-10 and Z7-20 use TPS65400, then either the -10 variant is hugely overspec'd or the -20 is underpowered. The -20 is thee times bigger!

Nonetheless, I have been trying to run some... something very similar to the old device from weeks ago (it has a few thousand extra flip flops but that's it). Well, it seems I can't get anything really to run on it anymore!

My plan was to move this on A100 but with the ridiculous situation with silicon now, odds are I'll need to wait at least another 2 months! ?

zygot · April 9, 2021

Most of my bigger Series 7 projects use the XADC to monitor and alarm the substrate temperature. It's just not reasonable to expect cheap general purpose FPGA boards to provide the kind of thermal management capabilities that these devices might need for demanding applications. It's also not reasonable to expect an over-designed power supply to meet the requirements of any application. I suspect that even boards with a heat sink and fan like the Genesys2 can get configured with a design that will over tax the core and IO bank supply design. DDR is typically located close to the FPGA and warms up the board and PCB planes quite a bit. While it appears that you've managed to do something that I haven't, which is make the power supply cry uncle, I've certainly seen substrate temperatures venture into the danger zone on both general purpose platforms and custom designed ones as well.

You can use the XADC fairly easily on ZYNQ devices with the core but accessing the XADC through the DRP in PL logic with a UART provides a means for monitoring temperature and voltages when the cores and the debugger are no longer talking to each other. Of course if you lose your PL configuration that design isn't much use either.

Today's failure isn't necessarily a dead end; it can be an opportunity to learn something new or at least hone a skill. It's really all about your personal curiosity and attitude as to whether failure is a bad thing or a great thing....

Edited April 9, 2021 by zygot

zygot · April 10, 2021

19 hours ago, MaxDZ8 said:

My plan was to move this on A100 but with the ridiculous situation with silicon now, odds are I'll need to wait at least another 2 months!

Xillinx branded boards typically have more robust and heftier power supply designs.. at a cost premium. The ZC702 is a Z7020 board with 2 FMC connectors and might work. Unfortunately, these are no longer in production and the ones in distributor stockrooms have undergone a dramatic price increase over what the originally sold for. I don't know it Xilinx sells them directly anymore. The older Ti power module designs though can be a pain when things go wrong and are a bit clunky. I've had experience with this. Always refer to the schematic as part of your pre-purchase analysis.

I do like the idea of going with a non-ZYNQ platform if you don't need the ARM cores. It is indeed frustrating to find out that your purchased platform can't keep up with your project requirements. Doing your due digilence before purchasing hardware is goo practice and, as you've likely found out an expensive lesson. Fortunately, the Vivado tools can help with power estimation though when a design has lots of output pins being driven accurate estimation requires some detailed analysis.

Edited April 10, 2021 by zygot

MaxDZ8 · April 11, 2021

???

I'm not sure I understand those things. Probably because I have a different mindset.

First, minor thing: XADC user guide is UG480 (four hundred 80).

For the purpose of future readers, I think it is a good idea to document the progress what changed since last time?

On 4/9/2021 at 7:48 PM, MaxDZ8 said:

Nonetheless, I have been trying to run some... something very similar to the old device from weeks ago (it has a few thousand extra flip flops but that's it). Well, it seems I can't get anything really to run on it anymore

It turns in the refactor to time 200Mhz I flipped a bit in the 'start work' functionality which would cause the device to almost never transition back to ready state. By itself, this caused the CPU code stall. I took the occasion to rework a bit the thing to be more robust so I could at least run something.

The data I have acquired today

I can't bother pulling the old, known-to-work design for testing. It was a 6-stage pipeline clocked at 100Mhz with a Vivado estimation of 2118mW (that's accurate because I have it on note). I ran it four hours passively w/o heatsink and it ended clearly above ambient temperature. I wouldn call it even lukewarm.

By now, I have performed a couple extra runs (I have added a small heatsink to the SoC).

6 stages, 100Mhz, I forgot to pick the estimate but if memory serves it was about 2.3W. I left it running the whole night and it was definitely warm. A scanned a bit with an IR thermometer and measured 36C at the SoC and 33C at the TPS65400.
12 stages, 100Mhz. Vivado estimates 3.313W and 63.2C temperature. After about an hour running I measured 45C at the SoC and 39C at the TPS. I would believe +20C between heatsink and core to be enough of a diff, at that point the core would be at 65C. It starts being uncomfortable but I think there's still more thermal headroom... except... I have measured 58C on the caps between the SoC and LD4. I suspect the thermomether might have been fooled by the shiny surfaces, yet the whole board is definitely lukewarm. The situation seems to be worse on the back side.

I would classify both cases as "rock stable".

Additional considerations

On digikey, TPS65400 is 4.20668€/250 pieces. The arty z7-20 is less than 180 EUR. Considering the design costs, all the other components and assembly I think it is reasonable to think Digilent dimensioned the power supply for the -20 variant and shared the design with z7-10.

The reference seems a bit conservative with the currents. TPS65400 datasheet notes buck3 and buck4 can output 2A on first page. Junction is a hefty 150C.

The Arty Z7-20 schematic notes vcc1v5 and vcc1v8 both at 1.8A which sounds good to me. It seems vcc1v0 is a bit underpowered at 2.6A while vcc3v3 seems to be a lot lower at 1.6A but all things considered I doubt it really even needs that much.

Some simple and most likely wrong numbers, adding the wattage of each of those rails gives me 13.8W which would be 2.76 amps from at 5V. Not impossible but definitely more comfortable on 12V. Notably, the power adapter for PYNQ Z1, which is almost the same as Z7-20 pours out 3 amps! I have no clue how a chip such as 7Z020 can possibly dissipate even just the 13 watts from the TPS but I guess there should be some room.

Random thinking

Which clock should I be feeding to my MCMM/PLL? Is there any chance feeding it the AXI clock can give issues? I honestly don't like how it routes, perhaps there is a better candidate?

I need to do more tests.

Edited April 11, 2021 by MaxDZ8
That was supposed to be a question.

zygot · April 11, 2021

4 hours ago, MaxDZ8 said:

For the purpose of future readers, I think it is a good idea to document the progress

I second that motion.

If you open the Xilinx Document Navigator and do a search for XADC you will see quite a few references, some with code examples.

On the subject of your temperature measurements. I also have an IR thermometer as a quick 'safety' check when I connect new external hardware. I do this mostly in case there is bus contention or driving outputs to ground**. As a more general measurement as to how your components are faring when there aren't any defects in a design I would caution against putting too much faith in such measurements. Fortunately, the Series 7 devices all have the capability of measuring substrate temperature using the XADC. This, in my view, is the proper way to assess thermal conditions. All of the components on your board have maximum operating substrate temperature limits that need to be adhered to.

Let's recap the post so far. As I understand it, your initial observation was that once your PL design started running 2 LED indicators indicating the health of the power supply and FPGA configuration status indicated that both immediately indicated failure conditions. After this your communication between the SDK debugger and the ARM cores stopped. It's certainly reasonable, barring other considerations, to suspect that a drop out of some of the power supply rails precipitated these events. The next step might be to try and prove this hypothesis, and perhaps find a way around the root cause.

The ZYNQ has a number of PLLs on the PS side that generate derived clocks for all of the internal peripherals like DDR controller, Ethernet PHY, UART baud rates and of course AXI bus clocks. You can export these clocks to your PL. You can also use the PL MMCM and PLL clock generators to generate PL logic clocks. There's no wrong way to go as long as you adhere to basic clock domain principals for passing signals between clock domains. There are a lot of ways to do this correctly, but of course there are a lot more ways to do it incorrectly. Most high speed data transfer involves elastic storage, like a circular buffer or a FIFO. If there are two clock domains involved then the FIFO and buffer must have 2 clocks.

The full AXI BUS is not trivial to work with and there are a lot of ways to cause bus faults. Bus faults will certainly terminate your debugger session but I have no idea how they could cause the power supply controller to fail to regulate an output rail or cause the logic to lose configuration. ( I also don't do a lot of ZYNQ design work so I haven't had the need to do some serious investigation into all of the details about those devices... and the interesting details are usually not easy to find in the literature )

I'm not at all surprised by the push-back to my suggestion that expecting a cheap general purpose FPGA board aimed at the educational sector be able to handle the full capabilities of the FPGA devices is unreasonable. The Series 7 devices are really quite capable. What separates expensive commercial or military grade products from something like the Arty Z7-20 is testing and guaranteed specifications. That's what you're paying for when you buy expensive products. It's easy to underestimate the cost of over designing a power supply for a low profit margin product. I don't know of any vendor of general purpose FPGA boards in the educational market that provide a demo project that even attempts to explore the maximum operating conditions of their boards. In fact, it seems to me, that for most of these boards the vendors are banking on the user to create projects that use only a small subset of the external interfaces, and IO, and FPGA resources available and that few will ever need to do timing closure because of high clock rates on most of the logic resources. Beyond simply using a beefier power supply other ancillary costs usually go with enhanced performance, like more PCB layers, heavier copper planes, etc. , etc. Estimating production costs verses profit is a complicated business... and many companies don't do it well.

** You might think that this is the result of bad design processes, and in the end I suppose that you'd be correct. But it's a lot easier to get into these conditions than you might realize. A location constraint might be wrong or ignored by the tools ( always , always check the post route pin assignments) or perhaps you didn't get the timing constraints correct. Sometimes the tools automatically resolve a misunderstanding between your source and what they infer as your intent; and the only indication is a warning, among hundreds of messages, that something is terribly amiss. There are a lot of critters in the swamp of FPGA development waiting to take a bite out of you if you fail to notice their presence...

Edited April 11, 2021 by zygot

zygot · April 11, 2021

6 hours ago, MaxDZ8 said:

I flipped a bit in the 'start work' functionality which would cause the device to almost never transition back to ready state

This is where simulation can help, especially when making major modifications to a 'rock solid' design.

6 hours ago, MaxDZ8 said:

I can't bother pulling the old, known-to-work design for testing.

I've found that, when not confined to a code versioning system, archiving a project that is at a 'good stopping point' before making changes is a good practice. This make is easy to refer back to a previous known state in the project. Sometimes it just makes sense to keep archived snapshots of just the HDL source. Sometimes, it makes sense to save the project as a new one so that I can open either the old 'working' version or the new 'in progress' version. This is really no different than standard software development except that the FPGA tools create way more intermediate files on the HD. Not everyone works the same way but I thought that this was worth mentioning... you know, for future readers who might find it interesting.

MaxDZ8 · April 13, 2021

Hello Zygot, thank you again for the input. I don't know what to say more than that, your help is invaluable.

I need some more time to digest all this content. I feel the need clarify a key thing. There has been a time I when I followed and even advocated for best practices. In my spare time I cannot quite stick to it anymore - just today I got assigned to yet another year-long project which would require a whole team and I'll need to complete in in a few months!

I agree with you about power delivery; above I tried to make sense of whatever the power delivery was woefully unreasonable but certainly I am not in the position to assess the quality of those rails, whatever maybe grounds gets to bounce too much, or maybe it's about the dynamic adjustments or an interaction in switching activity. That's what I get, unfortunately. It would be a total let down.

As for source control, I would say Vivado project mode is whoefully inadeguate; I've been fighting it for months by now. Yes, it can be put in source control but it's super very noisy, especially as IP is involved. I honestly have no clue why Xilinx even proposes it as a starting point, at the end of the day I use Vivado just for the device view! The project is in a repository nonetheless, perhaps in the next few days I'll play fiddle with the previous versions as well, for the time being I will re-calibrate my understanding of device temperatures, as this seems to be a solid step forward.

Edited April 13, 2021 by MaxDZ8
Can I write?

MaxDZ8 · April 21, 2021

I think we have managed to find a common point, thank you.

I don't know how I will do in other chapters of this book but for the time being, I would be happy enough with putting a stop on this issue.

Pulling in the XADC measurement took more effort in finding the documentation than the work itself but the process has been straightforward.

The first batch of runs I made passively cooled. I tried:

6 stages, no buffers, 100 Mhz - LUT29% FF30%, Vivado estimations 2.415W, 52.9C. Highest tmax reported 54.7C after about 4 hours.
6 stages, full buffer, 200 Mhz - LUT 29% FF39, Vivado estimations 3.443W, 64.7C. Highest tmax reported 68.2C (this has been run by the day so the temperatures are not completely comparable, it was about 2C warmer). OFC it makes no sense to run this thing...
12 stages, no buffers, 100Mhz - LUT52% FF50%, Vivado estimations 3.328W, 63.4C. Highest tmax reported 70.7C after about 4 hours.

I must admit I am a very surprised by the temperatures, there can be up to 30 degrees between heatsink and core o_o ! This really puts way more emphasis on the industrial temperature range, I'll need to reconsider my hardware choices but considering other news today this is might be good thing.

Nonetheless, after this batch of runs, I have figured it was best to turn on the fan and went to another few tries.

12 stages, full buffer, 100 Mhz - LUT52% FF68%, Vivado estimations 3.188W, 61.8C. I let this run almost 8 hours and the highest tmax reported was 49.7C.
12 stages, full buffer, DSP, 100Mhz - LUT50%, FF67%, Vivado estimations 3.221M, 62.1C. After over 10 hours the highest temperature is 53.32C

All those run off USB3.

With those I think we explored pretty finely this power range. The goal is to run this at 12 stages and 200 Mhz. The design is simple enough as the resource usage is again LUT52% FF68% (I mean, the tool has no trouble at all at PAR nor timing, there are a couple bits replicated here and there but that's it).

As before, running 12 stages at 200 Mhz causes an instant hang. On USB I could understand but the same thing happens on +12. Let's see if I can manage to convey my disappointment. The ArtyZ20 documentation reads:

Quote

A USB 2.0 port can deliver maximum 0.5A of current according to the specifications. This should provide enough power for lower complexity designs. More demanding applications, including any that drive multiple peripheral boards or other USB devices, might require more power than the USB port can provide.

Certainly driving peripheral is a different matter. Yet I think this wording is not sensible. It seems to me USB power can max board capabilities. Granted, the USB host is key but the bottom line is reading this I understand the jack will provide more juice. I do not expect to run arbitrary bitstreams but I do expect there is a class of reasonable bitstreams and I would expect something being 99% pure compute to rank in this range. This is particularly important as a variant Z20 is also sold as PYNQ. I assume the PYNQ won't (by usual designs) run at higher performance than the base clock and so I assume it will be stable.

As much as I appreciate the benefits of having a higher voltage supply I think suggesting the jack can provide more power is a mistake. This doesn't seem to be the case. The seems to be no benefit in using the jack in my case. The board seems to me more limited than the documentation seems to suggest.

I'm tempted to hook to I2C and check TPS settings but I suspect it'll be useless. Perhaps it's just time to accept this is the limit of the board and that's it. It seems my vendor managed to find a couple boards at the bottom of the barrel so perhaps I will be lucky and get some other hardware by the end of the month.

MaxDZ8 · April 23, 2021

Thank you Zygot, honestly I couldn't believe my findings I hoped really there could be another way (but I knew since start your suggestion made sense!).

Yes, I agree. I'm trying various things for the time being as I approach a methodology which gives trustful results. It's taking me ages and I often forget things in the evenings but I'll be getting there... at some point ?

Nonetheless, I already bought a Kria. I am confident its "reasonable" level of performance will satisfy my needs.

Sakshi Verma · December 16, 2021

While troubleshooting any technical problems or power issues you must understand the nature of the problem first.

based on the problem detection we must understand and decide the troubleshooting process. While doing this we may need to use the different technical tools to resolve the problem. Advanced modern tools are sufficiently developed to solve the problem quickly.

artyz7-20 Troubleshooting a possible power issue

Question

Link to comment

Share on other sites

17 answers to this question

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in