PanelDue causes random print failures

Posted by pantau 
PanelDue causes random print failures
January 29, 2015 06:33PM
I already explained my problem in another thread, but I think it's better to have it as a separate topic, as the problem still exists.


My setup is Ormerod 1 with firmware 1.00b-dc42 and PanelDue 1.01 from Think3DPrint3D.
The web interface is also always running.

While trying to print a file I get this error:
Error: Attempting to extrude with no tool selected.

This happens occasionally, at random positions and times during the print.

This is not a "T0 in the gcode" problem.
To replicate it, I always used the same file, started the printer from power-off, and homed the axes and ran bed levelling before starting the print.

What I found so far:
- The print NEVER fails if the PanelDue is NOT connected (10 prints).
- The print sometimes fails if the PanelDue is connected (5 failures, 4 OK). So this is an intermittent issue.
- After the failure the hot end and the bed are off; the axes are still homed.

What I investigated:
- Accidental touches on the panel:
Nobody was in the room while printing, and there was no vibration. I also put the panel in the "Files" screen, where no buttons exist, and the print still stopped.
- Reboot of Duet for unknown reason:
Didn't happen, a typical log:
23:16:44 Platform Diagnostics: Memory usage: Program static ram used: 43760 Dynamic ram used: 45464 Recycled dynamic ram: 888 Current stack ram used: 1696 Maximum stack ram used: 5756 Never used ram: 2436 Last reset 00:10:06 ago, cause: power up Error status: 0 Bed probe heights: 0.000 0.247 0.272 0.570 0.153 Free file entries: 9 Longest block write time: 0.0ms Slowest main loop (seconds): 0.054688; fastest: 0.000000 Move Diagnostics: MaxStepClocks: 0, minCalcClocks: 999, maxCalcClocks: 0, maxReps: 12 Heat Diagnostics: GCodes Diagnostics: Move available? no Network diagnostics: Free connections: 15 of 16 Free transactions: 23 of 24 Free send buffers: 19 of 20 Webserver Diagnostics:
23:16:00 Error: Attempting to extrude with no tool selected.
23:09:49 File [z-motor-brace.g] sent to print
23:06:45

Another log has an additional entry:
22:46:48 Platform Diagnostics: Memory usage: Program static ram used: 43760 Dynamic ram used: 45464 Recycled dynamic ram: 888 Current stack ram used: 1696
Maximum stack ram used: 5756 Never used ram: 2436 Last reset 00:20:23 ago, cause: power up Error status: 0 Bed probe heights: -0.135 -0.058 -0.087 0.351 -0.143
Free file entries: 9 Longest block write time: 0.0ms Slowest main loop (seconds): 0.054443; fastest: 0.000000 Move Diagnostics: MaxStepClocks: 0, minCalcClocks: 999,
maxCalcClocks: 0, maxReps: 13 Heat Diagnostics: GCodes Diagnostics: Move available? no Network diagnostics: Free connections: 15 of 16
Free transactions: 23 of 24 Free send buffers: 19 of 20 Webserver Diagnostics:
22:46:33 Error: Attempting to extrude with no tool selected.
22:41:15 Error: Setting temperature: no tool selected.
22:41:14 File [z-motor-brace.g] sent to print

The "Error: Setting temperature" message shows up sometimes, but never stops the print.

- Electrical Noise:
I rerouted and changed the wiring harness, but the problem persists. I can leave the harness plugged into the Duet; as long as the panel is not connected at the other end, all prints work well.

- Reset of Panel causing issue:
I didn't observe any reset, but pressing the reset button on the panel never stops the print (immediately).

So after all these tests, I'm pretty sure that it has something to do with the communication between the panel and the printer, but I couldn't identify the root cause.

Anybody with this panel having similar issues?
Any idea what could cause this issue? Or what to investigate next?

Thanks

Peter

Edited 2 time(s). Last edit at 01/29/2015 07:08PM by pantau.
Re: PanelDue causes random print failures
January 29, 2015 07:34PM
Hi Peter, I'm sorry to hear you are having problems.

It appears to me that the printer has executed an M0 or M1 command for some unknown reason. There is no other situation in which all heaters get turned off but the machine has not been reset, which from your logs and your observation that both the hot end and the bed are off (not on standby) is what has happened.

One possibility that occurs to me is that an overrun error is occurring either in the serial port or the input buffer, and that the M105 command that the PanelDue sends is being misinterpreted as M0. But I'm puzzled that I am not seeing this problem and nobody else has reported it.

Unfortunately, the Arduino Due core that manages the serial port does not provide any facilities for detecting overrun errors. This is another case of the Arduino core being a hindrance rather than a help. I already had it in mind to ditch the Arduino core in RepRapFirmware, and this issue adds weight to that plan. However, getting rid of the Arduino core will take a little time.

In the meantime, I could add checksums to the commands sent by PanelDue. The Duet firmware includes code to check the checksum (although I don't know whether it has ever been tested), and reject the command and request a resend if it doesn't compute. Unfortunately, if the * character that introduces the checksum is dropped, this won't help. Another possibility is to reduce the baud rate from 115200 to 57600, which should reduce the chance of dropped characters.
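As a rough illustration of the checksum idea, here is a minimal sketch that assumes the conventional RepRap scheme: the XOR of every byte before the `*`, sent as a decimal number after it. The function names are made up for this example, and the exact framing PanelDue uses (for instance, whether an `N` line number is included) is not stated in this thread.

```python
def gcode_checksum(command: str) -> int:
    """XOR of all bytes in the command (assumed RepRap-style checksum)."""
    cs = 0
    for byte in command.encode("ascii"):
        cs ^= byte
    return cs


def add_checksum(command: str) -> str:
    """Append '*<checksum>' to a command (hypothetical helper)."""
    return f"{command}*{gcode_checksum(command)}"


def verify(line: str) -> bool:
    """Accept a line only if it carries a checksum and the checksum matches."""
    body, sep, tail = line.rpartition("*")
    if not sep:
        return False  # no '*' at all: reject when checksums are mandatory
    try:
        return gcode_checksum(body) == int(tail)
    except ValueError:
        return False  # text after '*' is not a number


print(add_checksum("M105"))   # M105*121
print(verify("M105*121"))     # True
print(verify("M0*2"))         # False: alternate characters were dropped
print(verify("M105121"))      # False: the '*' itself was dropped
```

The last case only gets rejected because this sketch treats a missing checksum as an error; a receiver that accepts lines without a checksum would still execute it.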

Can you attach your gcode file and config.g to a post, so that I can try the same print?



Large delta printer [miscsolutions.wordpress.com], E3D tool changer, Robotdigg SCARA printer, Crane Quad and Ormerod

Disclosure: I design Duet electronics and work on RepRapFirmware, [duet3d.com].
Re: PanelDue causes random print failures
January 29, 2015 07:54PM
Dave,

the files I use are in the attached zip archive. I always used the setbed.g, so it is included as well.
I have M0 in my custom end gcode. But I don't think the gcode interpreter could jump there?
I was also wondering why nobody else has this issue. That's why I also looked into EMI. And I still can't rule it out completely.
I was thinking that would contribute to bit flips on the serial link. But a buffer overrun is another possibility. I haven't looked at all your code, but adding a checksum to the link is not a bad idea anyhow.
I saw you didn't use parity as well. Do you think that could be helpful? Should be easy to switch on, right?

Thanks again

Peter
Attachments:
open | download - print_setup.zip (200.8 KB)
Re: PanelDue causes random print failures
January 29, 2015 07:59PM
If the problem is an overrun error, turning on parity won't help.

I've uploaded a version of 1.00c Duet firmware with the baud rate reduced to 57600 to the usual place, and I'm about to upload a 57600b PanelDue binary for the 4.3 inch screen. Please try them. Look for the files with 57600 in the filename.



Re: PanelDue causes random print failures
January 30, 2015 05:28AM
Quote
dc42
If the problem is an overrun error, turning on parity won't help.

I've uploaded a version of 1.00c Duet firmware with the baud rate reduced to 57600 to the usual place, and I'm about to upload a 57600b PanelDue binary for the 4.3 inch screen. Please try them. Look for the files with 57600 in the filename.

Ok will try in the evening, but shouldn't there be a dependency on which screen I'm in on the PanelDue or if I press buttons if it is an overrun? At least I couldn't see this in my tests.

I will also check the voltage levels on the serial link, haven't done this so far.
Re: PanelDue causes random print failures
January 30, 2015 07:16AM
Quote
pantau
Ok will try in the evening, but shouldn't there be a dependency on which screen I'm in on the PanelDue or if I press buttons if it is an overrun? At least I couldn't see this in my tests.

The PanelDue sends an M105 polling message every 2 seconds or so when you are not pressing anything, no matter what screen you are in. The only exception is when you are in the touch calibration screen.



Re: PanelDue causes random print failures
January 30, 2015 08:11AM
Quote
dc42
The PanelDue sends an M105 polling message every 2 seconds or so when you are not pressing anything, no matter what screen you are in. The only exception is when you are in the touch calibration screen.
That's what I saw in the code. My question was the other way around: Assuming the root cause is a buffer overflow in the Duet: Shouldn't I get more problems if I use/play around with the panel during print as I increase the data flow?
And I haven't seen this dependency so far.
Re: PanelDue causes random print failures
January 30, 2015 09:10AM
115200Bd to signal between Duet and panel sounds pretty optimistic given that the cable length could well be up to a meter or so of ordinary wire.

At: [www.tldp.org]
The max cable length at 56000Bd is cited as 2.6m, so 115200Bd will be shorter - and that assumes true RS232 signalling levels and shielded cable of the correct impedance. If the connection is not made via coax or twisted pairs, and the signal levels fall short of true RS232 levels the distance is considerably shortened. Assuming that you are using a MAX232 or similar for level changing at each end, it will only be achieving +/- 8V or so and with 1m of wire I'd say that errors at that baud rate are very likely.

If all that is needed are a few characters at 2 second polling intervals, and maybe a one-off file-list collection, signalling could probably go down even as low as 600Bd with little or no detriment. Maybe put a 'scope on the Rx line at the Duet end and see what the signal looks like - I've often been amazed at the amount of ringing that occurs over even short cable lengths. These days with serial speeds of 1000Mbps over Ethernet cable and modems running at 75Mbps over standard telephone cable to a street cabinet we tend to forget how sophisticated the transceivers and signalling methods have had to become to achieve anything like that speed. 9600Bd was considered very cutting-edge at one time!

If 115200Bd is really desirable, it may be an idea to use RS422 or similar differential transceivers (on daughter boards) connected with twisted pairs for each signal, or at the very least use separate 100-ohm twisted pairs for Rx and Tx if the signalling is at single-ended RS232 levels.

Dave
Re: PanelDue causes random print failures
January 30, 2015 09:13AM
I don't think the cause is a software buffer overflow, I think it's more likely to be a hardware buffer overflow in the serial port, caused by the baud rate being high, the lack of more than one hardware buffer in the chip, and occasional long interrupt latency during fast head moves. When you adjust temperatures, speed factor etc. the PanelDue doesn't send data until you press Set, so it's difficult to increase the data flow much. Also, the commands sent when you press Set are probably less easily misinterpreted as M0. OTOH if M105 is received and alternate characters are dropped, you get M0.
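The "alternate characters dropped" case can be checked with a few lines of Python. This is only a toy model: it ignores the line terminator and does not claim to reproduce how the UART actually loses bytes. It enumerates what can remain of `M105` when two characters go missing and checks for the stop commands M0 and M1 mentioned earlier in the thread.

```python
from itertools import combinations


def survivors(command: str, n_dropped: int) -> set:
    """All strings that can remain after dropping n_dropped characters."""
    keep = len(command) - n_dropped
    return {
        "".join(command[i] for i in indices)
        for indices in combinations(range(len(command)), keep)
    }


STOP_COMMANDS = {"M0", "M1"}

# Dropping every second character of the poll command:
print("M105"[::2])  # M0

# Any two-character loss that still yields a stop command:
print(sorted(survivors("M105", 2) & STOP_COMMANDS))  # ['M0', 'M1']
```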

I am looking at trying DMA to read the serial port, and/or bypassing the Arduino core so that I can check for overrun errors. But reducing the baud rate may be a quick fix for now.



Re: PanelDue causes random print failures
January 30, 2015 10:34AM
Quote
dmould
Assuming that you are using a MAX232 or similar for level changing at each end, it will only be achieving +/- 8V or so and with 1m of wire I'd say that errors at that baud rate are very likely.
Dave

Dave (dc42),
are the schematics for the PanelDue available? I haven't seen them on GitHub. I don't have access to the board right now, but I don't recall having seen a MAX232, so I assume the link is TTL?
Re: PanelDue causes random print failures
January 30, 2015 11:21AM
I haven't got round to putting up the schematic on github yet, but I will do so soon. The link is at 3.3V logic levels, with a 2.2K series resistor in each signal line and a Schottky protection diode to +5V on the microcontroller side of the resistor. I haven't seen any errors at 115kb in testing with up to 750mm of cable.

The data packets sent by the Duet are typically about 280 bytes long, with a longer response occasionally. However, transmission is not interrupt driven at the Duet end (yet another reason to ditch the Arduino core), so if it takes an excessive amount of time to send the response, printing may be slowed.

Edited 1 time(s). Last edit at 01/30/2015 11:29AM by dc42.



Re: PanelDue causes random print failures
January 30, 2015 01:34PM
Quote
dc42
I haven't got round to putting up the schematic on github yet, but I will do so soon. The link is at 3.3V logic levels, with a 2.2K series resistor in each signal line and a Schottky protection diode to +5V on the microcontroller side of the resistor. I haven't seen any errors at 115kb in testing with up to 750mm of cable.

I really would not trust a logic level serial link at that baud rate over more than a few cm. I've even had errors at 9600Bd on a TTL (5V) logic level link over about 3m of loose wire, and when I looked at the received signal on a 'scope it was not all that surprising. It's OK for something non-critical such as a debug port, and you may get away with it if you have good error detection/correction, but IMO guaranteed error-free comms via a simple asynch serial link either needs a pretty low baud rate or good, impedance matched driver hardware and paired or screened cabling for any distance more than finger-length or so if you cannot tolerate errors. I didn't realise there were so many bytes per poll needed, but 4800Bd would allow just under 2 packets per second (assuming 10 bits per byte on the link), which if the poll rate is every 2 seconds should be plenty sufficient, and it may be possible to get away with 2400Bd.
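The packet-rate estimate can be reproduced with a short calculation. It is a sketch that assumes the typical 280-byte response quoted earlier and 10 bits per byte on the wire (start bit, 8 data bits, stop bit); it ignores the panel's own requests and any idle gaps between bytes.

```python
def packets_per_second(baud: int, packet_bytes: int = 280,
                       bits_per_byte: int = 10) -> float:
    """Upper bound on how many packets of the given size fit in one second."""
    return baud / (bits_per_byte * packet_bytes)


for baud in (2400, 4800, 57600, 115200):
    rate = packets_per_second(baud)
    print(f"{baud:>6} Bd: {rate:6.2f} packets/s, {1 / rate:.3f} s per packet")
```

At 4800Bd this gives about 1.7 packets per second, and at 2400Bd one packet takes about 1.17 s, which still fits inside a 2 second poll interval.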

Apart from direct and ground induced noise issues and increases in rise and fall times, there will be significant ringing on an unmatched line, and if the ringing time period is an unfavourable multiple of the bit rate it can easily cause a 1 to read as a 0 or vice-versa. This means that a short line will in some cases be worse than a longer line because the ringing frequency happens to be less favourable. The baud rate should be such that any ringing is negligible (below 10% signal level at least) within one quarter of a bit-time, and preferably within the first 1/8 bit time.

Dave
Re: PanelDue causes random print failures
January 30, 2015 05:58PM
Quote
dmould
I really would not trust a logic level serial link at that baud rate over more than a few cm. I've even had errors at 9600Bd on a TTL (5V) logic level link over about 3m of loose wire, and when I looked at the received signal on a 'scope it was not all that surprising. It's OK for something non-critical such as a debug port, and you may get away with it if you have good error detection/correction, but IMO guaranteed error-free comms via a simple asynch serial link either needs a pretty low baud rate or good, impedance matched driver hardware and paired or screened cabling for any distance more than finger-length or so if you cannot tolerate errors,

That's a nice theory, so I hung an oscilloscope on the Din and Dout pins of the PanelDue, with a 750mm cable connecting it to the Duet. Here is the trace on Dout:

[Image: oscilloscope trace of the PanelDue Dout signal]

This was at 57600 baud. The Din trace is similar but the rise and fall times are faster as there is no 2.2K series resistor to slow down the signal from the Duet. There is no sign of ringing. This is just as I would expect, because ringing is normally only a problem at higher frequencies than this or with very long cable lengths. So I really don't think the cabling is a problem, even at 115200 baud. I still think the problem is likely to be the serial receive interrupt latency caused by the step ISR. Unfortunately, the UART on the Duet has only a 1-character receive holding buffer. If reducing the baud rate doesn't fix the problem, then I'll try using DMA, or reprogramming the NVIC so that the serial receive ISR can interrupt the step ISR.
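To put a number on the latency argument: with a 1-character holding buffer, a received byte has to be read within roughly one character time or the next byte collides with it. The sketch below is a simplified model that assumes 10 bits per character; the 100 microsecond ISR latency used in the example is an invented figure, not a measurement from the Duet.

```python
def char_time_us(baud: int, bits_per_char: int = 10) -> float:
    """Time on the wire for one character, in microseconds."""
    return bits_per_char / baud * 1_000_000


def would_overrun(isr_latency_us: float, baud: int) -> bool:
    """True if the receive ISR is delayed longer than one character time."""
    return isr_latency_us > char_time_us(baud)


for baud in (115200, 57600):
    print(f"{baud} baud: {char_time_us(baud):.1f} us per character")

# Hypothetical 100 us of blocked interrupts:
print(would_overrun(100, 115200))  # True
print(would_overrun(100, 57600))   # False
```

Halving the baud rate doubles the time available, so it only helps if the worst-case latency falls between the two character times.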

Edited 3 time(s). Last edit at 01/30/2015 06:03PM by dc42.




Re: PanelDue causes random print failures
January 31, 2015 08:46AM
Quote
dc42
Quote
dmould
I really would not trust a logic level serial link at that baud rate over more than a few cm. I've even had errors at 9600Bd on a TTL (5V) logic level link over about 3m of loose wire, and when I looked at the received signal on a 'scope it was not all that surprising. It's OK for something non-critical such as a debug port, and you may get away with it if you have good error detection/correction, but IMO guaranteed error-free comms via a simple asynch serial link either needs a pretty low baud rate or good, impedance matched driver hardware and paired or screened cabling for any distance more than finger-length or so if you cannot tolerate errors,

That's a nice theory, so I hung an oscilloscope on the Din and Dout pins of the PanelDue, with a 750mm cable connecting it to the Duet. Here is the trace on Dout:

This was at 57600 baud. The Din trace is similar but the rise and fall times are faster as there is no 2.2K series resistor to slow down the signal from the Duet. There is no sign of ringing. This is just as I would expect, because ringing is normally only a problem at higher frequencies than this or with very long cable lengths. So I really don't think the cabling is a problem, even at 115200 baud. I still think the problem is likely to be the serial receive interrupt latency caused by the step ISR. Unfortunately, the UART on the Duet has only a 1-character receive holding buffer. If reducing the baud rate doesn't fix the problem, then I'll try using DMA, or reprogramming the NVIC so that the serial receive ISR can interrupt the step ISR.

That trace looks really clean and is a surprise to me, as I have had very different experiences with long lines connected to standard logic outputs. Maybe logic drivers have improved since I last tried driving a long line, or your connection leads happen to be well matched - might be an idea to scope with the leads supplied by the 3rd party PanelDue supplier. Makes me wonder whether RS232 drivers are needed and why the max recommended cable lengths are so short! At that baud rate I would say that you either need a UART FIFO buffer of 5 bytes or so, or an interrupt driven receive routine that isn't masked for many CPU cycles. The interrupt routine can be extremely short & simple so it doesn't impact other processes - just read the character and put it in a receive ring buffer. I have no idea how easy it would be to add a UART interrupt routine to the existing code.
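The "read the character and put it in a ring buffer" idea looks roughly like this. It is a Python model of the data structure only, not firmware code: the class and method names are invented for the example, and a real ISR would be written in C or C++ against the UART registers.

```python
class RxRing:
    """Single-producer, single-consumer ring buffer (holds size - 1 bytes)."""

    def __init__(self, size: int = 16):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0      # next slot the "ISR" writes
        self.tail = 0      # next slot the main loop reads
        self.dropped = 0   # bytes discarded because the buffer was full

    def isr_put(self, byte: int) -> bool:
        """What the receive interrupt would do: store one byte and return."""
        nxt = (self.head + 1) % self.size
        if nxt == self.tail:
            self.dropped += 1
            return False
        self.buf[self.head] = byte
        self.head = nxt
        return True

    def get(self):
        """Called from the main loop; returns None when the buffer is empty."""
        if self.tail == self.head:
            return None
        byte = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.size
        return byte


ring = RxRing(8)
for b in b"M105\n":
    ring.isr_put(b)
print(bytes(iter(ring.get, None)))  # b'M105\n'
```

Counting dropped bytes, as `dropped` does here, is a cheap way to find out whether the buffer is ever too small.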

If the issue is with buffer overflow however, it does not explain why one person will have symptoms pretty frequently and not another, and the issue should be able to be reproduced with a debug program pretty quickly. Maybe Peter is the only person so far to have done much printing with the panel?

[Added] I just noticed your trace is of the DOUT of the PanelDue. Could you try a trace on the DIN on the Duet?

Dave

Edited 1 time(s). Last edit at 01/31/2015 08:48AM by dmould.
Re: PanelDue causes random print failures
January 31, 2015 09:21AM
I've just released Duet firmware 1.00d and PanelDue firmware 1.02. These both allow selection of the baud rate. The Duet firmware also has provision for mandatory checksums in the received commands, and the PanelDue generates the checksums. More details in the Duet and PanelDue firmware threads.

dmould, the serial receive code in the Arduino core is already interrupt-driven. However, I don't currently allow anything to interrupt the step ISR. I've done more than 10 prints with PanelDue connected, and never seen the problem. If pantau is using higher speeds for non-printing moves than I am, or is doing a print with a lot more non-printing moves, then that could increase the chance of the receive character interrupt latency becoming too great.



Re: PanelDue causes random print failures
January 31, 2015 11:08AM
Dave,

you change things faster than I can test them... :-)
Unfortunately, the intermittent nature of this failure makes it hard to test. Yesterday I had a successful 2h print. Today the same print failed after 1:35h with the well-known error. This was with the 57600 baud versions (1.00c / 1.01). So it seems this is not (the only) solution.
I also looked at my link with a scope. It looked really terrible, but actually so terrible that I need to double-check my measurement setup. I need to find the time and will report back.
I will also try the new versions you released, but I set my hope right now on the checksums, so will try this first.

My non-print moves are at 150mm/s, most of the print at 32-40mm/s. How does this compare to your setup?

Thanks

Peter
Re: PanelDue causes random print failures
January 31, 2015 02:39PM
I have non-printing moves set to 100mm/sec in slic3r. Over the last couple of days, I have completed three 3-hour prints (PanelDue enclosures) and several test prints, all with a PanelDue attached.

Quote

I will also try the new versions you released, but I set my hope right now on the checksums, so will try this first.

The checksums are only supported in Duet firmware 1.00d and later, and PanelDue firmware 1.02 and later. But if the scope trace looks terrible, you need to investigate that. I took my scope trace with the probe on the Dout pin of the Molex connector, and the earth clip pressed against the back of the pad for TP1.



Re: PanelDue causes random print failures
January 31, 2015 06:42PM
I checked my serial link with a scope. The scope was connected at Dout of the panel at the Molex connector, with GND at T1. The signal looks good, see attached files.
So signal quality is not the issue.
I upgraded to 1.00d and 1.02, activated the checksum and set the baud rate to 57600.
I only had time for one print (2h) and that went through. But that has happened before, so I will continue to test and keep you updated.
Attachments:
open | download - IMAG002.jpg (34.7 KB)
open | download - IMAG003.jpg (37.3 KB)
Re: PanelDue causes random print failures
January 31, 2015 07:46PM
DOUT is probably not the best place to measure what the Duet is seeing. The DIN on the panel will give a better indication of what the Duet is seeing, but even there it assumes that the logic driver & receiver of the Duet are the same as those of the panel. The signal at the input end of a transmission line is not necessarily the same as the signal at the other end. If there are any components in series with the UART input pin of the Duet's CPU on the Duet board, the signal should be measured after those components.

A dedicated UART usually has a Schmitt input to better reject noise, but when the UART input is multiplexed with other GPIO functions that is not usually the case.

I understand why it is necessary to disable interrupts during critical move routines, but it should be possible to have the interrupt processing so lean that the Rx interrupt can safely be left enabled (if it is put on a FIQ interrupt the whole interrupt process could probably be reduced to 5 or 6 low-cycle machine code instructions). I don't know without looking at the CPU datasheet, but all the ARM SOCs that I have used have UARTs that at least partially emulate a 16550 and can thus be set to have a small Rx receive buffer (16 bytes or so).

Dave
[SOLVED] Re: PanelDue causes random print failures
February 04, 2015 06:00PM
Just a final update.
I had no further errors with Duet firmware 1.00e and PanelDue 1.02 @ 57600Baud and checksum enabled.
For my setup the checksums solved the issue; the decrease in baud rate didn't help at all.
This is after 4 more days with 8 prints.

Thanks again for the great support.

Peter

Edited 1 time(s). Last edit at 02/05/2015 04:01AM by pantau.
Re: PanelDue causes random print failures
February 05, 2015 02:21AM
Thanks for the update! Would you care to add [SOLVED] to the subject line now?



[SOLVED] PanelDue causes random print failures
February 05, 2015 11:07AM
Quote
dc42
Thanks for the update! Would you care to add [SOLVED] to the subject line now?

I tried, but it seems that I can only edit my last post?
Am I overlooking something?
Re: [SOLVED] PanelDue causes random print failures
February 05, 2015 11:21AM
Quote
pantau
Quote
dc42
Thanks for the update! Would you care to add [SOLVED] to the subject line now?

I tried, but it seems that I can just edit my last post?
Am I overlooking something?

That's OK, there seems to be a time limit on editing old posts. Thanks for trying.



Re: PanelDue causes random print failures
February 05, 2015 12:21PM
Enabling checksums has cured the main symptom, but has not solved the root cause. Presumably any command with an incorrect checksum will be ignored - so while the Duet will not execute incorrect commands it will still be missing the occasional command (which is unlikely to cause any big problem but is still not right). It might be an idea to put some temporary debug in the Duet that allows you to see what was received whenever a checksum error occurs (and how often such errors occur). This will determine whether it is missing characters (and if so how many) or receiving incorrect characters. The fact that the error is still occurring at half the baud rate (with checksums not enabled) makes me doubt whether the issue is caused by failure to service the UART in time, because if that is the case I would expect it to occur reasonably frequently at the default higher baud rate on other machines. It may be that Peter has a particularly bad environment in terms of electrical noise, or some other factor that is affecting him alone and it would be good to know the cause for certain so that whatever it is can be either circumvented, avoided, or dismissed as an error in Peter's setup.
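A sketch of what such temporary debug output could feed into: given a line that failed its checksum and the line that was most likely intended (here assumed to be the periodic M105 poll with a RepRap-style checksum, `M105*121`), classify the failure as dropped characters or corrupted characters. The function names and the sample received lines are invented for illustration; no real log data is shown.

```python
from collections import Counter


def is_subsequence(short: str, full: str) -> bool:
    """True if short can be made from full by deleting characters."""
    remaining = iter(full)
    return all(ch in remaining for ch in short)


def classify(expected: str, received: str) -> str:
    """Rough classification of one received line against the expected one."""
    if received == expected:
        return "ok"
    if len(received) < len(expected) and is_subsequence(received, expected):
        return "dropped"      # characters missing, the rest intact
    if len(received) == len(expected):
        return "corrupted"    # same length, at least one wrong character
    return "mixed"


# Hypothetical received lines, for illustration only:
samples = ["M105*121", "M0*2", "M104*121"]
print(Counter(classify("M105*121", rx) for rx in samples))
```

A tally dominated by "dropped" would point towards overruns, while "corrupted" would point towards noise on the link.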

If a slower baudrate is required, it should be possible to greatly reduce the number of characters needed in a packet by using tokens and binary values instead of G-commands and ASCII - one byte can represent 256 different commands or requests, and parameters could probably be sent as a binary number rather than an ASCII string - e.g. the head temperature can be transferred in one byte instead of 3 ASCII numbers. Only user macros & similar need be sent in full. Files can be selected using a parameter of a single byte if they are given a number when the names are first sent from Duet to panel. The code to convert from token/binary to full ASCII would be pretty simple, and I suspect would result in close to a 90% reduction in traffic.
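A minimal sketch of the token idea, with everything in it hypothetical: the token value, the two-byte packet layout and the choice of `M104` as the expanded command are assumptions made for this example, not part of any existing Duet or PanelDue protocol.

```python
import struct

TOKEN_SET_HOTEND_TEMP = 0x01  # hypothetical token value


def encode_set_temp(temp_c: int) -> bytes:
    """Pack a 'set hot end temperature' request into two bytes."""
    if not 0 <= temp_c <= 255:
        raise ValueError("one byte can only carry 0..255 degrees")
    return struct.pack("BB", TOKEN_SET_HOTEND_TEMP, temp_c)


def decode(packet: bytes) -> str:
    """Expand a two-byte packet back into an ASCII G-code command."""
    token, value = struct.unpack("BB", packet)
    if token == TOKEN_SET_HOTEND_TEMP:
        return f"M104 S{value}"
    raise ValueError(f"unknown token {token:#04x}")


packet = encode_set_temp(205)
print(len(packet), decode(packet))  # 2 M104 S205
```

The trade-off is visible even in this small case: a single byte limits the value to whole degrees between 0 and 255, and a binary link still needs framing and error detection of its own.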

Dave