
AD2 and WF Communication Failed error popup


Bruce Boyes

Question

Today WaveForms (WF) has thrown two of these errors. I am using the USB cable that came with the AD2, the WF version shown below, and Ubuntu 18.04. I have just started using the AD2 on this Ubuntu machine, and I am trying different USB ports. I have not seen this on a Windows 10 notebook with the same device and the same USB cable. Any ideas? A forum search didn't find this error mentioned. Maybe it is something peculiar to Linux? Happy to send any further details or logs if you tell me where to find them.

 

DptiIOfailed.png

WFVersion.png


13 answers to this question


Hi @Bruce Boyes,

I have moved your question to a more appropriate section of the Forum, where the creator of the WaveForms software will be able to see and respond to your question. As a preliminary question, is this Ubuntu machine a VM on the same Windows 10 notebook you mentioned or a different computer entirely?

Thanks,
JColvin


21 hours ago, attila said:

Hi @Bruce Boyes

ERC 0x2 indicates a communication timeout error.
You could try disabling USB autosuspend in the OS/kernel.

Did that. No better. It is saving each triggered acquisition and it fails after 500 to 2500 events, even with a holdoff of 1 second (so not much data transfer per second). I had it in Option 2 (16K Samples) but had the time base set for only 8K samples; it still failed.
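For anyone following along, one way to disable USB autosuspend per device on Linux is to write `on` to each device's `power/control` file in sysfs. The sketch below assumes the standard `/sys/bus/usb/devices/*/power/control` layout and must run as root on a real system; the function name and the `sysfs_root` parameter are made up for illustration. Booting with the kernel parameter `usbcore.autosuspend=-1` is a global alternative.

```python
from pathlib import Path


def disable_usb_autosuspend(sysfs_root="/sys"):
    """Write 'on' to power/control for every USB entry that exposes it.

    'on' keeps the device powered; 'auto' lets the kernel autosuspend it.
    Returns the list of files written. Needs root on a real system.
    """
    written = []
    pattern = "bus/usb/devices/*/power/control"
    for control in sorted(Path(sysfs_root).glob(pattern)):
        control.write_text("on\n")
        written.append(str(control))
    return written


if __name__ == "__main__":
    for path in disable_usb_autosuspend():
        print("autosuspend disabled:", path)
```

The setting does not survive a reboot or a replug, so a udev rule or a boot-time script would be needed for an unattended logger.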


Here I'll keep a running total of what I have tried and what worked or didn't. I have to find a stable solution or quickly switch to other hardware. We are scheduled to log in the field well before the Christmas holiday, as soon as we can be ready.

In our real application, trigger events are expected to be very infrequent (perhaps not even one per day) but obviously if USB connection to the AD2 fails, and WF just halts, we have failed the whole acquisition attempt.

AD2, Ubuntu 18.04, Asus UN62, Intel Core i5, test signal asserts every 100 msec, holdoff 0 (OpenScope MZ also running in Chrome, sampling the trigger output from the AD2). At 100 kHz: 16 msec/div; at 50 kHz: 32 msec/div.

  1. Option 1 (8K Samples), Time 8192 Samples, 50 kHz, 3999 logged acquisitions success: 4, fails: 0.
  2. Option 2 (16K Samples), Time 16384 Samples, 50 kHz, 3999 logged acquisitions success: 0, fails: 2 @ 3226, 3434 acquisitions
  3. Option 2 (16K Samples), Time 16384 Samples, 100 kHz, 3999 logged acquisitions success: 1, fails: 3 @ 366, 1 after max index reached
  4. Option 2 (16K Samples), Time 8192 Samples, 50 kHz, 3999 logged acquisitions success: 0, fails: 2
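As a sanity check on the time base figures above, the capture window is just samples divided by sample rate. The sketch below assumes 10 horizontal divisions on the scope display (my assumption, not stated in the thread); the function names are illustrative.

```python
def capture_window_ms(samples, rate_hz):
    """Length of one acquisition in milliseconds."""
    return samples / rate_hz * 1000.0


def ms_per_div(samples, rate_hz, divisions=10):
    """Time per division, assuming the window spans `divisions` divisions."""
    return capture_window_ms(samples, rate_hz) / divisions


if __name__ == "__main__":
    for samples, rate_hz in [(8192, 50_000), (16384, 50_000), (16384, 100_000)]:
        print(samples, "samples @", rate_hz, "Hz:",
              round(capture_window_ms(samples, rate_hz), 3), "ms window,",
              round(ms_per_div(samples, rate_hz), 3), "ms/div")
```

Under that assumption, 16384 samples come out to roughly 16 msec/div at 100 kHz and roughly 32 msec/div at 50 kHz, which agrees with the settings listed.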

Communication also fails while just monitoring the scope, as you would in the lab. No logging was happening: Execute was set to "Each Triggered Acquisition", but Export said "Not saving. Maximum index reached". I changed Execute to "Manual" to see if that made any difference: no.

There seems to be a correlation with the Device Manager setting for the number of samples: 8K has never failed and 16K always fails. Oddly, it doesn't seem to be the number of samples sent over USB that matters but the buffer size the AD2 is configured for. Setting 16K but only sending 8K (via the time base) still fails. Sample rate doesn't seem to be the issue.

Also, OpenScope MZ continued to run flawlessly on the Linux system, monitoring trigger out from the AD2.

AD2, Windows 10 Pro, ThinkPad Yoga S1 notebook - this is my main daily-use notebook. I notice startup is much slower on Windows, and so is deleting the temp log files, by something like 100X: Windows deletes 80 files per second, so for 3999 files this takes a while. Anyway, after two tests I unchecked "Noise" per attila's suggestion.

  1. Option 2 (16K Samples), Time 16384 Samples, 100 KHz, 3999 logged acquisitions success: 8, fails: 0. In addition, no comm error while just viewing waveform on screen.

AD2, Windows 10 Pro, Thinkpad x11e notebook - this is the notebook we can leave on site for extended logging

  1. Option 2 (16K Samples) can run for days with no error... then suddenly there is a popup with Comm error 0xA. This was while just viewing on screen, not logging.

 


5 hours ago, attila said:

The data transfer timeout over USB is set to 10 s, which should be more than sufficient to transfer 16k samples (64 kB). I suppose there is some OS/kernel or laptop/hardware issue.

Agreed timeout would not seem to be the issue here... AD2 also has some USB data loss issue with Linux on Raspberry Pi... any connection?
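To put numbers on why the timeout looks generous: the quoted 64 kB for 16k samples is consistent with 2 channels at 2 bytes per sample (that breakdown is my assumption; only the 64 kB and 10 s figures come from the quote). A rough sketch, with illustrative function names:

```python
def buffer_bytes(samples, channels=2, bytes_per_sample=2):
    """Bytes in one acquisition; the channel and sample width are assumed."""
    return samples * channels * bytes_per_sample


def min_throughput_bytes_per_s(total_bytes, timeout_s=10.0):
    """Slowest average rate that still finishes before the timeout."""
    return total_bytes / timeout_s


if __name__ == "__main__":
    size = buffer_bytes(16384)
    print(size, "bytes per acquisition")
    print(min_throughput_bytes_per_s(size), "bytes/s needed to beat a 10 s timeout")
```

That works out to under 7 kB/s, far below what any USB 2.0 link sustains, so a timeout here points to a stalled transfer rather than a slow one.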

When this connection error occurs, what does it really mean (handshake lost, data CRC failed, etc) and what does the software do? Should I expect any better result under Windows 10? Is there any hardware data I can provide to help? Any logging of detailed error information?

Every time this ERC 0x2 communication timeout error happens, I can always reconnect by clicking in Waveforms, so apparently the hardware is OK at some level... so why can't WF automatically do this? Halting on an error means if this happens in the field, our whole data mission is lost.
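What I have in mind is a supervisory loop that does the same thing I do by hand: close the device, wait, reopen, and carry on. A minimal sketch follows; `open_device`, `acquire`, and `close_device` are hypothetical callables standing in for whatever the application or SDK provides, and nothing here is a claim about how WaveForms works internally.

```python
import time


def acquire_with_reconnect(open_device, acquire, close_device, count,
                           max_reconnects=3, backoff_s=1.0, sleep=time.sleep):
    """Collect `count` acquisitions, reopening the device after an OSError.

    Returns (results, reconnects). Re-raises once max_reconnects is used up.
    """
    results = []
    reconnects = 0
    handle = open_device()
    try:
        while len(results) < count:
            try:
                results.append(acquire(handle))
            except OSError:
                # Drop the broken session entirely instead of retrying on it.
                close_device(handle)
                handle = None
                if reconnects >= max_reconnects:
                    raise
                reconnects += 1
                sleep(backoff_s)
                handle = open_device()
    finally:
        if handle is not None:
            close_device(handle)
    return results, reconnects
```

The acquisition that failed is lost, but for infrequent triggers that is far better than the whole logging session halting until someone visits the site.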


On Raspberry Pi there is a problem with the software FIFO in the USB library: the data is altered or bytes are lost. This manifests as wrong data reception or a communication error. I didn't notice any errors in the USB protocol (CRC, handshake), only in the higher-level USB library. Because of this, most of the time you can't even connect with the RPi; in the best case you can perform one acquisition before the error arises.

The communication between the software (WaveForms runtime) and the device (firmware) is based on two FIFO channels, download and upload.
For upload, the software sends a packet requesting a certain number of bytes to be uploaded by the device, then waits to receive this data.
ERC 0x2 indicates that it did not receive the requested data. Because of this, the communication can get out of sequence and a simple retry won't help. Such communication errors should be handled/resolved at the USB level...
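To illustrate why a simple retry can leave the two sides out of sequence, here is a toy model of a request/response upload FIFO. It is not the actual WaveForms protocol: `ToyDevice`, `read_exact`, and the request-id tagging are all invented for the example, and whether a real device allows the FIFO to be flushed is an open question.

```python
from collections import deque


class ToyDevice:
    """Hypothetical device that answers a request with n bytes tagged req_id."""

    def __init__(self, drop_bytes=0):
        self.upload_fifo = deque()
        self.drop_bytes = drop_bytes  # bytes lost from the first reply only

    def request(self, req_id, n):
        payload = [req_id] * n
        if self.drop_bytes:
            payload = payload[:-self.drop_bytes]
            self.drop_bytes = 0
        self.upload_fifo.extend(payload)


def read_exact(fifo, n):
    """Return n bytes, or raise TimeoutError (standing in for ERC 0x2)."""
    if len(fifo) < n:
        raise TimeoutError(f"wanted {n} bytes, only {len(fifo)} arrived")
    return [fifo.popleft() for _ in range(n)]


def naive_retry_demo():
    dev = ToyDevice(drop_bytes=2)
    dev.request(req_id=1, n=8)
    try:
        read_exact(dev.upload_fifo, 8)
    except TimeoutError:
        pass  # retry without clearing the stale bytes
    dev.request(req_id=2, n=8)
    return read_exact(dev.upload_fifo, 8)


def flushed_retry_demo():
    dev = ToyDevice(drop_bytes=2)
    dev.request(req_id=1, n=8)
    try:
        read_exact(dev.upload_fifo, 8)
    except TimeoutError:
        dev.upload_fifo.clear()  # resynchronize before asking again
    dev.request(req_id=2, n=8)
    return read_exact(dev.upload_fifo, 8)
```

In the naive case the second read returns six leftover bytes from request 1 followed by two bytes from request 2, so every later reply is shifted as well. Only the variant that discards the stale data first gets a clean reply.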

I have not seen such a problem except the known RPi issue, but I will try to reproduce it by leaving captures running overnight.


Thanks! I understand your explanation. It does sound like some FIFO control error. That would be great if you could fix this. Leaving a headless Linux system for remote logging is a lot more practical than Windows... in all other experience with USB on a small variety of Linux PC and desktops, I find Linux does a much better/faster job with USB devices, especially development systems where I am downloading new code maybe dozens of times a day or loading test code into 250 new systems. So I am really surprised that in this case, on mainstream Intel hardware on Asus motherboard, USB is worse under Linux. On this particular system I have done exactly those tasks with never a problem. I hope you can find this and get it fixed. I will be interested to hear what the problem and solution are. Sincerely...


On 12/10/2018 at 5:34 AM, attila said:

I have not seen such a problem except the known RPi issue, but I will try to reproduce it by leaving captures running overnight.

Just now there is a new Comm error, 0xA. This is on a ThinkPad Yoga 11e with Win 10 Pro; one instance of WaveForms 3.8.2 is the only application open. The AD2 is in Option 2 and only the Scope is open. Just viewing, not logging.

What is error 0xA?

This is a deal killer if this happens while the system is unattended since all operation ceases and there is no recovery without a visit to the field site.

FYI I have Windows auto update turned off, so it will not download or install without my permission, so that is not the issue here. I have sleep turned off, and just now in addition turned off even checking for updates for 35 days (the most allowed).

FYI in the Windows Defender history I can see it did a scan and installed a threat definition update this morning. I don't see how to turn checking for those off. But that alone, I would hope, would not cause USB comm errors. I just forced a scan now: no comm errors. I'm starting a more rigorous full scan now to see if it causes comm errors: so far, no, but it has 3 hours to go. Previous scans have not found any threats.

AD2_Win10_TPYoga11e_CommFail_0xA_Capture.PNG


5 hours ago, attila said:

Hi @Bruce Boyes

It could be a USB power limitation from the laptop, a USB cable contact issue, or a cold solder joint on the connector.
Try another cable and a powered hub, and inspect the soldering.

Can do. We just received another AD2, so we could test on a whole other PC with different cable and different AD2. This PC is using the cable which came with the AD2.  We have a full electronics shop with stereo microscopes etc so can give it a good look.

And just to be clear, this PC with the 0xA error (what does that mean exactly?) is a completely different PC from the Ubuntu system with the 0x2 error.

Thanks for the reply. We'd like to get this tracked down. We were hoping to use two AD2s as our main remote logging devices. Linux is highly preferred because of all the junk in Windows 10 which cannot be shut off. You'd think this instrumentation would be easy for us to put together but it is not proving to be the case.


Archived

This topic is now archived and is closed to further replies.
