Welcome! Log In Create A New Profile

Advanced

Arduino Firmware Reset Bug

Posted by ZachHoeken 
Arduino Firmware Reset Bug
February 05, 2008 10:51PM
Bad News,

I spent a solid night chasing a firmware bug that causes the Arduino to reset. This is most noticeable when using the stepper exerciser, as well as when you try to print with it.

I can reliably trigger the bug by 'homing' the x and y axes, then doing the 'line test' which does a DDA to the point. after 5-10 steps, the firmware will reset. also, similar things happen when you home either axis, then move a slider.

I'm not 100% sure whats happening, but from what I've gathered, it appears some sort of stack overflow is happening and the thing is restarting. I was able to determine this by using the snapDebug() command and putting it in the cartesian bot snap v1 setup function that gets called at the beginning of the program. when the bug happens, you'll see the output of this debug call in the serial debug terminal.

I am at wits end, and I have tried nearly everything I know to try and fix this bug. I thought the interrupt handlers were too big, so i whittled them down to nothing. I have tried adding an enable/disable call around various different parts. I've refactored many different parts of the code, but nothing helps!!

The aggravating part is that it works fine most of the time. Doing DDA or moving the sliders works GREAT... so long as its not at home. It seems that homing the axes has something to do with it and sets up the conditions for the reset.

If anyone has any idea on how to fix this, or wants to look at the code, please do so! I'm so close with this Arduino stuff and now this hurdle is in the way.

I've attached my latest version of the code for you to peruse. Any suggestions are welcome.
Attachments:
open | download - reprap-arduino-firmware-1.2-rc2.zip (48.3 KB)
VDX
Re: Arduino Firmware Reset Bug
February 06, 2008 01:10AM
Hi Zach,

... when homing, then you drive your actual position down to 0 and then even lower (until the end-switch is activated), so the actual position went negative, as your home-position is defined as "drive into the switch, then drive some steps out and this is your actual 0".

Maybe you have an overflow because of to high possible position-steps, which went even bigger, when counted negative?

Try to set your home-position at 100,100,100 for example, so your homing-in-coordinates stay in the positive branch all the time ...

Viktor
Anonymous User
Re: Arduino Firmware Reset Bug
February 06, 2008 05:18AM
When the bug happens you get the snapDebug output on your terminal? There are two places where it gets called, in the init stuff and at the end of case CMD_DDA in process_cartesian_bot_snap_commands_v1(). Which one do you see when the bug happens? I guess it is the second, so the bug gets triggered by a snap dda command...

When reading the code from there I see a potentially nasty bug:

else if (y_sync_mode == sync_dec)
bot.y.setTarget(bot.y.getPosition() - target);

There seems to be no checking for overflow or underflow. And when X or Y is near home, target can easily get too large and underflow. Try to add some checks for over/underflow here and see if they get triggered.

--Blerik
Re: Arduino Firmware Reset Bug
February 06, 2008 09:40AM
VDX Wrote:
-------------------------------------------------------
> Hi Zach,
>
> ... when homing, then you drive your actual
> position down to 0 and then even lower (until the
> end-switch is activated), so the actual position
> went negative, as your home-position is defined as
> "drive into the switch, then drive some steps out
> and this is your actual 0".
>
> Maybe you have an overflow because of to high
> possible position-steps, which went even bigger,
> when counted negative?
>
> Try to set your home-position at 100,100,100 for
> example, so your homing-in-coordinates stay in the
> positive branch all the time ...
>
> Viktor

when homing, it doesnt use a counter at all. it simply moves the stepper towards home until it hits a limit switch, then when it does it marks that as 0, and notifies the host. i doubt it would be an overflow.
Re: Arduino Firmware Reset Bug
February 06, 2008 09:43AM
blerik Wrote:
-------------------------------------------------------
> When the bug happens you get the snapDebug output
> on your terminal? There are two places where it
> gets called, in the init stuff and at the end of
> case CMD_DDA in
> process_cartesian_bot_snap_commands_v1(). Which
> one do you see when the bug happens? I guess it is
> the second, so the bug gets triggered by a snap
> dda command...
>
> When reading the code from there I see a
> potentially nasty bug:
>
> else if (y_sync_mode == sync_dec)
> bot.y.setTarget(bot.y.getPosition() - target);
>
> There seems to be no checking for overflow or
> underflow. And when X or Y is near home, target
> can easily get too large and underflow. Try to add
> some checks for over/underflow here and see if
> they get triggered.
>
> --Blerik

1. theres a snapDebug() called in setup_cartesian_bot_snap_v1() which is called once at the very beginning of the code execution. when the error happens, this function is getting called again and everything is reset to the initial program state.

thats a good suggestion. i'll look at it. however, i dont think its as simple as that. the bug happens both with DDA and when you try to slide the slider. i'll give it a look though.
Anonymous User
Re: Arduino Firmware Reset Bug
February 06, 2008 11:04AM
There is another call to snapDebug() in the code you attached. It's at the end of case CMD_DDA in process_cartesian_bot_snap_commands_v1(). Are you absolutely sure it's the call in init? And can you describe the ordering of things, eg:

* go to home position in doohickey foo
* slide the slider to blah
* snapDebug prints foobarbaz
* Arduino resets to starting conditions

So everybody that is looking at the code can at least try to reason about the bug?

--Blerik
Re: Arduino Firmware Reset Bug
February 06, 2008 12:02PM
hey blerik,

man, i really wish i would've saved the debug output. that would be immensely helpful. i'll try and get to it tonight, but i probably wont be able to post it until thursday.

Anyway, the snapDebug in CMD_DDA was to test to see how far the code got. here's what i can make it do *every* single time.

* home the Y axis
* start the 'line test' (dda command)
* the dda command completes, the snapDebug() at the end prints out properly
* the steppers take 3-5 steps (tested by setting speed to 0)
* the program crashes, and the first snapDebug() in the setup function gets called.

the bizarre thing is that this doesnt happen if i home the x axis and do a DDA, nor does it happen when neither axis is homed. once this is fixed, hopefully the code will be rock solid. i've done lots of refactoring and looking at the code to optimize it. thats a nice side-benefit i guess =)
Re: Arduino Firmware Reset Bug
February 06, 2008 01:27PM
Well if homing Y does it and X doesn't look for differences between X and Y in the s/w and h/w. Swap things round in the h/w and see which axis has the fault.

E.g. presumably the system is symmetrical so you should be able to swap X and Y motor connections and limit switches to rule out a hardware problem. You should also be able to swapper stepper driver boards.

I assume the code is identical apart from the I/O definitions so try swapping those.


[www.hydraraptor.blogspot.com]
Re: Arduino Firmware Reset Bug
February 06, 2008 01:32PM
PS, doesn't the Arduino have debugging facilities like break points and single step?

It isn't normal for crashed programs to reset unless you have a watchdog enabled or a processor with exception vectors pointing to the reset code.


[www.hydraraptor.blogspot.com]
Re: Arduino Firmware Reset Bug
February 06, 2008 02:05PM
unfortunately it does not have those facilities. its basically an atmega168 with the pins broken out to headers and a nice software suite built up around it. not much in the way of debugging tools.
Anonymous User
Re: Arduino Firmware Reset Bug
February 06, 2008 02:14PM
It would be really helpful if somebody can get a SNAP comms dump of this bug. Then you can really see what's going on, and reason about it. For all we know it is a bug in the host software that leads to a different sequence of SNAP commands when using the Y-axis instead of the X-axis.

There is a CommsDebug preference in the RepRap host code, can anybody with a working setup run the steps Zach has outlined, and post the debug output here?

--Blerik
Re: Arduino Firmware Reset Bug
February 06, 2008 04:20PM
if nobody else can get it, i'll have it up here by Thursday night at the latest. I'm determined to find and fix this bug. It must be eliminated!
Re: Arduino Firmware Reset Bug
February 07, 2008 11:02PM
as promised, here is the latest debug output. it has both Debug and CommsDebug enabled and is generated with the Single_Arduino_Snap firmware + libraries attached to this post. Host software is latest software from subversion.

I've embedded my comments directly in the debug. they should be obvious.

a couple comments now that i've bashed my head against this some more:

1. it appears the serial library is not 100% compatible yet. it would probably be a good idea to get this working 100% before trying the next bug. perhaps that will even fix it.

2. the bug seems to be centered around the home/reset command. probably a good place to look.

3. debugging serial comms is hard.

4. increasing the sleep after opening the serial port is probably a good idea. i'll try it now.

[java] DEBUG: Opening port /dev/tty.usbserial-A4001sts [0.000s/-1202442514370ms]
[java] Stable Library
[java] =========================================
[java] Native lib Version = RXTX-2.1-7
[java] Java lib Version = RXTX-2.1-7
[java] RXTX Warning: Removing stale lock file. /var/lock/LK.003.010.012
[java] DEBUG: Attempting to initialize Arduino [0.047s/47ms]
[java] comms: tx 0->8: 0 ( = 0 ) [2.059s/2012ms]
[java] comms: rx: 54 51 34 0 2 0 0 0 0 3d ( = 84 81 52 0 2 0 0 0 0 61 ) [2.059s/0ms]
[java] Received data packet when expecting ACK
[java] sendMessage error - retrying

this is the intial startup debug. it interrupts the version request which triggers a retry.

[java] comms: tx 0->8: 0 ( = 0 ) [2.160s/101ms]
[java] comms: rx: 54 52 30 0 8 c3 ( = 84 82 48 0 8 195 ) [2.160s/0ms]
[java] comms: rx: 54 53 30 0 8 4c ( = 84 83 48 0 8 76 ) [2.175s/15ms]

first one is an ack. second one is a nak. not sure whats happening here.

[java] DEBUG: Ignored and dropped packet [2.176s/1ms]
[java] DEBUG: X axis - setting maximum torque to: 50 [2.504s/328ms]
[java] comms: tx 0->2: e 22 ( = 14 34 ) [2.504s/0ms]
[java] comms: rx: 54 51 33 0 2 0 1 0 76 ( = 84 81 51 0 2 0 1 0 118 ) [2.527s/23ms]
[java] Received data packet when expecting ACK
[java] sendMessage error - retrying

this is the version reply finally coming through

[java] comms: tx 0->2: e 22 ( = 14 34 ) [2.628s/101ms]
[java] sendMessage error - retrying
[java] comms: rx: 54 53 30 0 2 32 ( = 84 83 48 0 2 50 ) [2.628s/0ms]

looks like a NAK on the last reply? probably not a big deal.

[java] comms: tx 0->2: e 22 ( = 14 34 ) [2.729s/101ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [2.729s/0ms]
[java] DEBUG: X axis - setting position to: 820 [2.729s/0ms]
[java] comms: tx 0->2: 3 34 3 ( = 3 52 3 ) [2.730s/1ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [2.751s/21ms]
[java] DEBUG: Y axis - setting maximum torque to: 50 [2.759s/8ms]
[java] comms: tx 0->3: e 22 ( = 14 34 ) [2.759s/0ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [2.768s/9ms]
[java] DEBUG: Y axis - setting position to: 820 [2.768s/0ms]
[java] comms: tx 0->3: 3 34 3 ( = 3 52 3 ) [2.768s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [2.783s/15ms]
[java] DEBUG: Z axis - setting maximum torque to: 50 [2.786s/3ms]
[java] comms: tx 0->4: e 22 ( = 14 34 ) [2.787s/1ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [2.787s/0ms]
[java] DEBUG: Z axis - setting position to: 10922 [2.787s/0ms]
[java] comms: tx 0->4: 3 aa 2a ( = 3 170 42 ) [2.787s/0ms]
[java] comms: rx: 54 52 30 0 4 60 ( = 84 82 48 0 4 96 ) [2.799s/12ms]

clicked stepper exerciser and started up. everything going fine.

[java] DEBUG: Y axis - home reset at speed 1 [23.737s/20938ms]
[java] DEBUG: Y axis - setting notification on. [23.737s/0ms]
[java] comms: tx 0->3: 7 0 ( = 7 0 ) [23.738s/1ms]
[java] comms: rx: 54 52 30 0 4 60 ( = 84 82 48 0 4 96 ) [23.738s/0ms]
[java] comms: tx 0->3: 10 1 ( = 16 1 ) [23.739s/1ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [23.756s/17ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [23.756s/0ms]
[java] DEBUG: Unexpected ACK received instead of message, not supported yet [23.757s/1ms]

duplicate ack? not sure what caused this.

[java] comms: rx: 54 51 31 0 3 10 57 ( = 84 81 49 0 3 16 87 ) [26.027s/2270ms]
[java] DEBUG: Y axis - setting notification off. [26.027s/0ms]
[java] comms: tx 0->3: 7 ff ( = 7 255 ) [26.028s/1ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [26.043s/15ms]
[java] DEBUG: Y axis - seeking position 0 at speed 1 [26.044s/1ms]
[java] comms: tx 0->3: 5 1 0 0 ( = 5 1 0 0 ) [26.044s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [26.059s/15ms]
[java] DEBUG: Y axis - getting position. It is... [26.260s/201ms]
[java] comms: tx 0->3: 4 ( = 4 ) [26.260s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [26.283s/23ms]
[java] comms: rx: 54 51 33 0 3 4 0 0 a3 ( = 84 81 51 0 3 4 0 0 163 ) [26.284s/1ms]
[java] DEBUG: ...0 [26.284s/0ms]

clicked 'home' for y axis. successful.

[java] DEBUG: X axis - getting position. It is... [45.846s/19562ms]
[java] comms: tx 0->2: 4 ( = 4 ) [45.847s/1ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [45.864s/17ms]
[java] comms: rx: 54 51 33 0 2 4 34 3 d8 ( = 84 81 51 0 2 4 52 3 216 ) [45.864s/0ms]
[java] DEBUG: ...820 [45.864s/0ms]
[java] DEBUG: Y axis - getting position. It is... [45.866s/2ms]
[java] comms: tx 0->3: 4 ( = 4 ) [45.866s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [45.880s/14ms]
[java] comms: rx: 54 51 33 0 3 4 0 0 a3 ( = 84 81 51 0 3 4 0 0 163 ) [45.880s/0ms]
[java] DEBUG: ...0 [45.880s/0ms]

clicked 'line test' dialog comes up.

[java] DEBUG: X axis - getting position. It is... [69.307s/23427ms]
[java] comms: tx 0->2: 4 ( = 4 ) [69.307s/0ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [69.317s/10ms]
[java] comms: rx: 54 51 33 0 2 4 34 3 d8 ( = 84 81 51 0 2 4 52 3 216 ) [69.333s/16ms]
[java] DEBUG: ...820 [69.333s/0ms]
[java] DEBUG: Y axis - getting position. It is... [69.333s/0ms]
[java] comms: tx 0->3: 4 ( = 4 ) [69.333s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [69.349s/16ms]
[java] comms: rx: 54 51 33 0 3 4 0 0 a3 ( = 84 81 51 0 3 4 0 0 163 ) [69.349s/0ms]
[java] DEBUG: ...0 [69.349s/0ms]
[java] DEBUG: X axis - setting sync to 0 [69.349s/0ms]
[java] comms: tx 0->2: 8 0 ( = 8 0 ) [69.349s/0ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [69.365s/16ms]
[java] DEBUG: Y axis - setting sync to 2 [69.365s/0ms]
[java] comms: tx 0->3: 8 2 ( = 8 2 ) [69.365s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [69.381s/16ms]
[java] DEBUG: X axis - dda at speed 1. x1 = 200, deltaY = 100 [69.381s/0ms]
[java] DEBUG: X axis - setting notification on. [69.381s/0ms]
[java] comms: tx 0->2: 7 0 ( = 7 0 ) [69.381s/0ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [69.397s/16ms]
[java] comms: tx 0->2: b 1 c8 0 64 0 ( = 11 1 200 0 100 0 ) [69.398s/1ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [69.413s/15ms]

clicked 'move to x: 200, y:100'

[java] comms: rx: 54 51 34 0 2 0 0 0 0 3d ( = 84 81 52 0 2 0 0 0 0 61 ) [82.034s/12621ms]
[java] DEBUG: Ignored and dropped packet [82.034s/0ms]

stepper takes 10-15 steps and dies. here it is at the beginning again.
Attachments:
open | download - reprap-arduino-firmware-1.2-bughunt.zip (47.9 KB)
Re: Arduino Firmware Reset Bug
February 07, 2008 11:35PM
update:

okay, i found and fixed some of the bugs with the comms. there were 2:

1. the snapDebug was throwing a wrench into the host softwares plans. i took it out and it worked a bit better
2. the reading of commands was slightly broken. if 2 commands were in the buffer, it would read both and try to process the second one not the first. i changed this so that it reads the buffer until the processing lock is set. this helps alot. i committed the code.

unfortunately, things are still broken. i feel alot better about how things are going though. i feel like i made progress and the problem feels within reach now.

i also have a plan for next lab day: (probably monday... going on vacation... mixed good and bad there, hehe. want. to. fix. bug.)

1. modify host software so that i can send special non-SNAP debug data that it will simply ignore and spit out as-is. this will make debugging SO MUCH EASIER.

2. debug the crap out of it with above feature =)
Anonymous User
Re: Arduino Firmware Reset Bug
February 08, 2008 07:26AM
Great big cut-n-paste coming up, my responses are >>

[java] DEBUG: Opening port /dev/tty.usbserial-A4001sts [0.000s/-1202442514370ms]
[java] Stable Library
[java] =========================================
[java] Native lib Version = RXTX-2.1-7
[java] Java lib Version = RXTX-2.1-7
[java] RXTX Warning: Removing stale lock file. /var/lock/LK.003.010.012
[java] DEBUG: Attempting to initialize Arduino [0.047s/47ms]
[java] comms: tx 0->8: 0 ( = 0 ) [2.059s/2012ms]
[java] comms: rx: 54 51 34 0 2 0 0 0 0 3d ( = 84 81 52 0 2 0 0 0 0 61 ) [2.059s/0ms]
[java] Received data packet when expecting ACK
[java] sendMessage error - retrying

this is the intial startup debug. it interrupts the version request which triggers a retry.

[java] comms: tx 0->8: 0 ( = 0 ) [2.160s/101ms]
[java] comms: rx: 54 52 30 0 8 c3 ( = 84 82 48 0 8 195 ) [2.160s/0ms]
[java] comms: rx: 54 53 30 0 8 4c ( = 84 83 48 0 8 76 ) [2.175s/15ms]

first one is an ack. second one is a nak. not sure whats happening here.

>> first one is ack of first request. Second one is nak of second request. It probably naks the second because it is still working on the first.

[java] DEBUG: Ignored and dropped packet [2.176s/1ms]
[java] DEBUG: X axis - setting maximum torque to: 50 [2.504s/328ms]
[java] comms: tx 0->2: e 22 ( = 14 34 ) [2.504s/0ms]
[java] comms: rx: 54 51 33 0 2 0 1 0 76 ( = 84 81 51 0 2 0 1 0 118 ) [2.527s/23ms]
[java] Received data packet when expecting ACK
[java] sendMessage error - retrying

this is the version reply finally coming through

>> now the host doesn't know the version, but it doesn't care. Everything else is in sync...

[java] comms: tx 0->2: e 22 ( = 14 34 ) [2.628s/101ms]
[java] sendMessage error - retrying
[java] comms: rx: 54 53 30 0 2 32 ( = 84 83 48 0 2 50 ) [2.628s/0ms]

looks like a NAK on the last reply? probably not a big deal.

>> It seems to be nak-ing the second request for setpower.

[java] comms: tx 0->2: e 22 ( = 14 34 ) [2.729s/101ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [2.729s/0ms]

>> ack on the setpower request, everything is fine

[java] DEBUG: X axis - setting position to: 820 [2.729s/0ms]
[java] comms: tx 0->2: 3 34 3 ( = 3 52 3 ) [2.730s/1ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [2.751s/21ms]

>> fine as well

[java] DEBUG: Y axis - setting maximum torque to: 50 [2.759s/8ms]
[java] comms: tx 0->3: e 22 ( = 14 34 ) [2.759s/0ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [2.768s/9ms]

>> also fine

[java] DEBUG: Y axis - setting position to: 820 [2.768s/0ms]
[java] comms: tx 0->3: 3 34 3 ( = 3 52 3 ) [2.768s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [2.783s/15ms]

>> fine

[java] DEBUG: Z axis - setting maximum torque to: 50 [2.786s/3ms]
[java] comms: tx 0->4: e 22 ( = 14 34 ) [2.787s/1ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [2.787s/0ms]

>> fine

[java] DEBUG: Z axis - setting position to: 10922 [2.787s/0ms]
[java] comms: tx 0->4: 3 aa 2a ( = 3 170 42 ) [2.787s/0ms]
[java] comms: rx: 54 52 30 0 4 60 ( = 84 82 48 0 4 96 ) [2.799s/12ms]

>> Z-axis doesn't ack here!!!
clicked stepper exerciser and started up. everything going fine.

[java] DEBUG: Y axis - home reset at speed 1 [23.737s/20938ms]
[java] DEBUG: Y axis - setting notification on. [23.737s/0ms]
[java] comms: tx 0->3: 7 0 ( = 7 0 ) [23.738s/1ms]
[java] comms: rx: 54 52 30 0 4 60 ( = 84 82 48 0 4 96 ) [23.738s/0ms]
[java] comms: tx 0->3: 10 1 ( = 16 1 ) [23.739s/1ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [23.756s/17ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [23.756s/0ms]
[java] DEBUG: Unexpected ACK received instead of message, not supported yet [23.757s/1ms]

duplicate ack? not sure what caused this.
>> first request is notify on to Y-axis
>> response is ack, but from _Z-axis_ (delayed ack from above)
>> which explains the dup ack, Z-axis is interfering...
>> But the host doesn't care, so we won't as well smiling smiley

[java] comms: rx: 54 51 31 0 3 10 57 ( = 84 81 49 0 3 16 87 ) [26.027s/2270ms]

>> ^^^ Y-axis notifies that it has found home

[java] DEBUG: Y axis - setting notification off. [26.027s/0ms]
[java] comms: tx 0->3: 7 ff ( = 7 255 ) [26.028s/1ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [26.043s/15ms]

>> fine

[java] DEBUG: Y axis - seeking position 0 at speed 1 [26.044s/1ms]
[java] comms: tx 0->3: 5 1 0 0 ( = 5 1 0 0 ) [26.044s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [26.059s/15ms]

>> fine

[java] DEBUG: Y axis - getting position. It is... [26.260s/201ms]
[java] comms: tx 0->3: 4 ( = 4 ) [26.260s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [26.283s/23ms]
[java] comms: rx: 54 51 33 0 3 4 0 0 a3 ( = 84 81 51 0 3 4 0 0 163 ) [26.284s/1ms]
[java] DEBUG: ...0 [26.284s/0ms]

>> everything fine up to now...
clicked 'home' for y axis. successful.

[java] DEBUG: X axis - getting position. It is... [45.846s/19562ms]
[java] comms: tx 0->2: 4 ( = 4 ) [45.847s/1ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [45.864s/17ms]
[java] comms: rx: 54 51 33 0 2 4 34 3 d8 ( = 84 81 51 0 2 4 52 3 216 ) [45.864s/0ms]
[java] DEBUG: ...820 [45.864s/0ms]
[java] DEBUG: Y axis - getting position. It is... [45.866s/2ms]
[java] comms: tx 0->3: 4 ( = 4 ) [45.866s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [45.880s/14ms]
[java] comms: rx: 54 51 33 0 3 4 0 0 a3 ( = 84 81 51 0 3 4 0 0 163 ) [45.880s/0ms]
[java] DEBUG: ...0 [45.880s/0ms]

>> fine
clicked 'line test' dialog comes up.

[java] DEBUG: X axis - getting position. It is... [69.307s/23427ms]
[java] comms: tx 0->2: 4 ( = 4 ) [69.307s/0ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [69.317s/10ms]
[java] comms: rx: 54 51 33 0 2 4 34 3 d8 ( = 84 81 51 0 2 4 52 3 216 ) [69.333s/16ms]
[java] DEBUG: ...820 [69.333s/0ms]
[java] DEBUG: Y axis - getting position. It is... [69.333s/0ms]
[java] comms: tx 0->3: 4 ( = 4 ) [69.333s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [69.349s/16ms]
[java] comms: rx: 54 51 33 0 3 4 0 0 a3 ( = 84 81 51 0 3 4 0 0 163 ) [69.349s/0ms]
[java] DEBUG: ...0 [69.349s/0ms]
[java] DEBUG: X axis - setting sync to 0 [69.349s/0ms]
[java] comms: tx 0->2: 8 0 ( = 8 0 ) [69.349s/0ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [69.365s/16ms]
[java] DEBUG: Y axis - setting sync to 2 [69.365s/0ms]
[java] comms: tx 0->3: 8 2 ( = 8 2 ) [69.365s/0ms]
[java] comms: rx: 54 52 30 0 3 e3 ( = 84 82 48 0 3 227 ) [69.381s/16ms]
[java] DEBUG: X axis - dda at speed 1. x1 = 200, deltaY = 100 [69.381s/0ms]
[java] DEBUG: X axis - setting notification on. [69.381s/0ms]
[java] comms: tx 0->2: 7 0 ( = 7 0 ) [69.381s/0ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [69.397s/16ms]
>> fine
[java] comms: tx 0->2: b 1 c8 0 64 0 ( = 11 1 200 0 100 0 ) [69.398s/1ms]
[java] comms: rx: 54 52 30 0 2 bd ( = 84 82 48 0 2 189 ) [69.413s/15ms]
>> ^^^ the final command before crashing, the DDA command
clicked 'move to x: 200, y:100'

[java] comms: rx: 54 51 34 0 2 0 0 0 0 3d ( = 84 81 52 0 2 0 0 0 0 61 ) [82.034s/12621ms]
[java] DEBUG: Ignored and dropped packet [82.034s/0ms]

>> above is snapDebug on X-axis, so it seems to have reset itself like you said. >> Everything is fine (enough) up to the DDA. So the bug should be in the DDA code
>> So I'll be looking into that first. I'll let you know what I find...

stepper takes 10-15 steps and dies. here it is at the beginning again.

>>--Blerik
Anonymous User
Re: Arduino Firmware Reset Bug
February 08, 2008 07:28AM
Oh, one more thing. Can you test the same thing but with the X-axis homed, and a DDA to x: 100, y: 200?

That should also generate the bug.

--Blerik
Re: Arduino Firmware Reset Bug
February 08, 2008 09:35AM
hey. i fixed all the nak/ack stuff in the first part of the startup. its caused by the snapDebug stuff. i've figured out a way to get debug data out of the serial comms that wont be interpreted as a packet at all. that will allow me to send debug data as ascii that will be easy to interpret. should be sweet!

yeah, i'll be out of town this weekend, but i'll be back in full force on Monday trying to fix this problem some more. i'll post full-on traces of various different things.
Re: Arduino Firmware Reset Bug
February 08, 2008 11:03PM
ACK! ACK!


Jay
Re: Arduino Firmware Reset Bug
February 13, 2008 10:41AM
good news: i successfully added easy debug capability to the protocol. if you start a packet with the letter 'd' it will print everything you send as ASCII until there is a newline. this makes debugging on the arduino AWESOME. i made lots of progress tracking down the bug with debug statements, but i could not narrow it down to a particular part. it seemed to crash *between* the loop and the interrupt handler. it was a rough night of debugging.

bad news: i've essentially given up on the current cartesian bot snap code. i'm going to start over and re-write that code based on what i know now. it will be faster, smaller, leaner, and better. i guess thats good. i started on the libraries, and everything is looking good.

good news: i went back to the random DDA firmware and ran it on the exact same arduino and it ran successfully for about an hour (til i killed it) this means that all the stuff up to the cartesian bot snap processing works well. if theres a bug, its either in SNAP, or the cartesian bot processing script. i'm assuming that SNAP is fine, so the problem has to be with my implementation of the processing of the cartesian bot commands. i'm going to go back and re-implement each one, step by step. lets hope it works.
Re: Arduino Firmware Reset Bug
February 14, 2008 08:40PM
Tell me if I butcher something.

So I know next to nothing about SNAP protocols, but nophead said in another thread (Sorry if this is inapplicable)

"The confusion here is that SNAP is just the transport layer which ensures the packets get there. USB already has that. The command layer is the thing that is unique to RepRap. If those commands are sent over USB there is no need for them to be wrapped in SNAP packets as well. "

So my question is: Why are there SNAP protocols in the arduino firmware?


Jay
Anonymous User
Re: Arduino Firmware Reset Bug
February 15, 2008 05:39AM
The Arduino firmware speaks SNAP because the host software does. Zach tries to emulate the gen1 electronics, so he doesn't have to change the host software. Which is good engineering, you only change one part at a time. But it is still a pain, since the SNAP layer doesn't add anything when using USB. And it is a lot of code to get bug free.

--Blerik
Re: Arduino Firmware Reset Bug
February 15, 2008 10:13AM
OK, that clears it up for me. Thanks Blerik


Jay
Re: Arduino Firmware Reset Bug
February 15, 2008 12:41PM
yeah, it is a bummer.

however... have no fear. i have monday off work and i plan on spending the whole time on this bug. i have full confidence that i will be able to solve it.

wish me luck!
Re: Arduino Firmware Reset Bug
February 15, 2008 09:35PM
Good Luck!


Jay
Re: Arduino Firmware Reset Bug
February 17, 2008 08:47PM
argh. another day that feels wasted. i've rewritten large chunks of the code, and actually fixed a couple minor bugs. however, the reset bug still looms large.

i've tracked it down to a specific case: home/reset the Y axis, then seek the Y axis. for some reason its ONLY the Y axis. I've tried switching boards, unplugging the stepper boards, nothing helps.

i feel its a software problem only. would someone please look over my code and find the tiny little mistake that is making me go prematurely grey?
Anonymous User
Re: Arduino Firmware Reset Bug
February 18, 2008 05:17AM
It only happens on the Y-axis, and the code for the Y-axis is 100% identical to the X-axis. Maybe it is an overflow in the code somewhere that is only a problem for the Y-axis because of the positioning of the code. This means analyzing assembly dumps of the code to see exactly what is going wrong sad smiley. Could you post a detailed description of the minimal amount of steps to duplicate this error, with all the logging you can get? And an object dump of the code (howto on this page: [www.arduino.cc])?

--Blerik
Re: Arduino Firmware Reset Bug
February 18, 2008 01:08PM
well, here's everything. this is way over my head. if you can figure out what the heck is going wrong, i'd be very appreciative.

there are 2 easy steps to reproduce the bug via the stepper exerciser

1. home the Y axis
2. seek the Y axis anywhere

attached is:

the firmware
the snap debug text
the object dump file
a diff for the host code to capture the debug output from arduino.
Attachments:
open | download - reprap-arduino-firmware-2008-02-18.zip (51.4 KB)
open | download - snap_debug.txt (6.6 KB)
open | download - Single_Arduino_SNAP.objdump (310.2 KB)
open | download - host.diff (3 KB)
Anonymous User
Re: Arduino Firmware Reset Bug
February 19, 2008 05:15AM
This is just a first glance:

984: 82 30 cpi r24, 0x02 ; 2
986: 19 f4 brne .+6 ; 0x98e <__vector_11+0x34>
988: 0e 94 50 04 call 0x8a0 <_Z12interruptDDAv>
98c: 10 c0 rjmp .+32 ; 0x9ae <__vector_11+0x54>
98e: 83 30 cpi r24, 0x03 ; 3
990: 19 f4 brne .+6 ; 0x998 <__vector_11+0x3e>
992: 0e 94 69 04 call 0x8d2 <_Z18interruptHomeResetv>
996: 0b c0 rjmp .+22 ; 0x9ae <__vector_11+0x54>
998: 81 30 cpi r24, 0x01 ; 1
99a: 19 f4 brne .+6 ; 0x9a2 <__vector_11+0x48>
99c: 0e 94 94 04 call 0x928 <_Z13interruptSeekv>

I see that DDA, HomeReset and Seek are all on the same interrupt. Could this problem be related to state getting messed up or not cleaned up so the wrong interrupt handler is called?

I'll keep looking...

*edit* In the code itself the state is kept in a global variable called bot_mode in CartesianBot_SNAP_v1.cpp. This variable is used both inside the interrupt handler and outside. This could lead to subtle timing problems.

*edit* This is probably another red herring. Debugging is such fun...

--Blerik

Edited 2 time(s). Last edit at 02/19/2008 05:53AM by blerik.
Re: Arduino Firmware Reset Bug
February 20, 2008 09:37AM
Great news!!! I juts got this in the email and while I havent had a chance to test it, it seems to be *exactly* what our problem is:

Feel free to try it out and see if it works.

---------------------------------------------------------

Hi Zach,

I'm emailing you this since I'm still waiting for the activation email
for the reprap forum to arrive. Feel free to forward this mail to the
forum if it's helpful.

Sorry for taking so long to test this, but somehow getting the RepRap
host software to run on Mac was a bit tricky.
Anyway, it's running now. I get very similar issues as you; the
arduino resets when trying to do a Line Test.

Looking at the interrupt code, I noticed smth. fishy:
In CartesianBot::enableTimerInterrupt(), you do this:

TIMSK1 |= (1<<ICIE1);
TIMSK1 |= (1<<OCIE1A);

The first of these lines activates the Input _Capture_ Interrupt, for
which I cannot find an interrupt vector. If this interrupt vector is
initialized to 0, the arduino would naturally reset whenever this
interrupt is triggered.
Removing the first line also fixes the sudden reset, although I
haven't tested very much.

I hope this helped. Also, now I have the software running again, so I
should be able to help you on shorter notice next time winking smiley

Cheers,

~/= Marius
Anonymous User
Re: Arduino Firmware Reset Bug
February 20, 2008 11:26AM
Brilliant! Good work.

--Blerik

(bit ashamed that I didn't see that one)
Sorry, only registered users may post in this forum.

Click here to login