Getting more speed out of Arduino Mega

TESKAn

Getting more speed out of Arduino Mega
May 30, 2015 01:56PM

Registered: 9 years ago
Posts: 5

I recently started printing nylon and ran into a problem: pauses caused by the buffer running empty left large blobs in the middle of the printed object. This is no problem at slower speeds, but then printing takes forever. So I decided to see whether something in the code could be optimized, and sure enough, there was. Of particular interest was this piece of code in the prepare_move function:

clamp_to_software_endstops(destination);
  previous_millis_cmd = millis();

#ifdef DELTA
  float difference[NUM_AXIS];
  for (int8_t i=0; i < NUM_AXIS; i++) {
    difference[i] = destination[i] - current_position[i];
  }
  float cartesian_mm = sqrt(sq(difference[X_AXIS]) +
                            sq(difference[Y_AXIS]) +
                            sq(difference[Z_AXIS]));
  if (cartesian_mm < 0.000001) { cartesian_mm = abs(difference[E_AXIS]); }
  if (cartesian_mm < 0.000001) { return; }
  float seconds = 6000 * cartesian_mm / feedrate / feedmultiply;
  int steps = max(1, int(delta_segments_per_second * seconds));
  // SERIAL_ECHOPGM("mm="); SERIAL_ECHO(cartesian_mm);
  // SERIAL_ECHOPGM(" seconds="); SERIAL_ECHO(seconds);
  // SERIAL_ECHOPGM(" steps="); SERIAL_ECHOLN(steps);
  for (int s = 1; s <= steps; s++) {
    float fraction = float(s) / float(steps);
    for(int8_t i=0; i < NUM_AXIS; i++) {
      destination[i] = current_position[i] + difference[i] * fraction;
    }
    calculate_delta(destination);
    plan_buffer_line(delta[X_AXIS], delta[Y_AXIS], delta[Z_AXIS],
                     destination[E_AXIS], feedrate*feedmultiply/60/100.0,
                     active_extruder);
  }
  
#endif // DELTA

Let's break it down.

clamp_to_software_endstops(destination);

So this calls a function to check whether the destination is out of range. OK, but we are making a function call for a couple of ifs. That means storing some stuff on the stack, jumping to the subroutine, doing the ifs, jumping back, restoring stuff from the stack... all of this takes time, which we don't have. So instead, just replace the function call with what's inside the function:

  if (min_software_endstops) 
  {
    if (destination[X_AXIS] < min_pos[X_AXIS]) destination[X_AXIS] = min_pos[X_AXIS];
    if (destination[Y_AXIS] < min_pos[Y_AXIS]) destination[Y_AXIS] = min_pos[Y_AXIS];
    if (destination[Z_AXIS] < min_pos[Z_AXIS]) destination[Z_AXIS] = min_pos[Z_AXIS];
  }

  if (max_software_endstops) 
  {
    if (destination[X_AXIS] > max_pos[X_AXIS]) destination[X_AXIS] = max_pos[X_AXIS];
    if (destination[Y_AXIS] > max_pos[Y_AXIS]) destination[Y_AXIS] = max_pos[Y_AXIS];
    if (destination[Z_AXIS] > max_pos[Z_AXIS]) destination[Z_AXIS] = max_pos[Z_AXIS];
  }

This takes up more flash, but flash we have in abundance.
Some time saved.

  for (int8_t i=0; i < NUM_AXIS; i++) {
    difference[i] = destination[i] - current_position[i];
  }

OK, this is nice to have so that we can automatically support any number of axes... but for me, this is unnecessary looping. Sure, there is not much time to save, but let's do it anyway.

  difference[0] = destination[0] - current_position[0];
  difference[1] = destination[1] - current_position[1];
  difference[2] = destination[2] - current_position[2];
  difference[3] = destination[3] - current_position[3];

Again, a couple of ticks saved.

  if (cartesian_mm < 0.000001) { cartesian_mm = abs(difference[E_AXIS]); }
  if (cartesian_mm < 0.000001) { return; }

Doing same compare twice? What for?
Let's optimize:

  if (cartesian_mm < 0.000001) 
  { 
    cartesian_mm = abs(difference[E_AXIS]); 
    return;
  }

  float seconds = 6000 * cartesian_mm / feedrate / feedmultiply;
  int steps = max(1, int(delta_segments_per_second * seconds));

I had a particular problem with this piece of code. So we are using feedrate to calculate number of delta segments? This means that for different feed rates, we get different accuracy. For example, 1 mm at 20 mm/sec has 10 segments, at twice the speed 5 segments and so on. So instead let's use segments per mm so we get some consistency.

  float fTemp = cartesian_mm * 5;
  int steps = (int)fTemp;

Also note that we removed a division, which takes a bunch more time to complete than multiplication. Also by multiplying distance before converting to int, we get correct steps for fractions of millimeters.
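
To make the difference between the two rules concrete, here is a small standalone sketch in plain C++ (not Marlin code). The helper names are made up for illustration, and the constants assume 200 segments per second for the old rule and the 5 segments per mm used above; feedrate is taken to be in mm/min and feedmultiply in percent.

```cpp
#include <algorithm>

// Assumed values for illustration only.
const float kSegmentsPerSecond = 200.0f;  // old rule: segments per second of travel
const float kSegmentsPerMm = 5.0f;        // new rule: segments per mm of travel

// Old rule: the segment count depends on how long the move takes,
// so it changes with feedrate (mm/min) and feedmultiply (percent).
int steps_per_second_rule(float cartesian_mm, float feedrate, int feedmultiply) {
  float seconds = 6000.0f * cartesian_mm / feedrate / feedmultiply;
  return std::max(1, int(kSegmentsPerSecond * seconds));
}

// New rule: the segment count depends only on the distance travelled.
int steps_per_mm_rule(float cartesian_mm) {
  return std::max(1, int(cartesian_mm * kSegmentsPerMm));
}
```

With these assumed numbers, a 1 mm move at 1200 mm/min (20 mm/s) gets 10 segments under the old rule and 5 segments at 2400 mm/min, while the new rule gives 5 segments in both cases.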

Next is this code segment

  for (int s = 1; s <= steps; s++) {
    float fraction = float(s) / float(steps);
    for(int8_t i=0; i < NUM_AXIS; i++) {
      destination[i] = current_position[i] + difference[i] * fraction;
    }

For each step one division, 4 additions, 4 multiplications. Let's see what we can do.
First, a couple of new variables:

  float addDistance[NUM_AXIS];
  float fractions[NUM_AXIS];

Then we make sure that there is at least one step, precalculate the fractions (the distance moved on each axis in one segment) and zero the added distance.

  if(0 == steps)
 {
   steps = 1;
   fractions[0] = difference[0];
   fractions[1] = difference[1];
   fractions[2] = difference[2];
   fractions[3] = difference[3];
 }
 else
 {
   fTemp = 1 / float(steps);
   fractions[0] = difference[0] * fTemp;
   fractions[1] = difference[1] * fTemp;
   fractions[2] = difference[2] * fTemp;
   fractions[3] = difference[3] * fTemp;
 }
  
  // For number of steps, for each step add one fraction
  // First, set initial destination to current position
  addDistance[0] = 0.0;
  addDistance[1] = 0.0;
  addDistance[2] = 0.0;
  addDistance[3] = 0.0;

Next, the loop. Let's break it down.
First part of the loop: calculate the target position for all axes:

    // Add step fraction
    addDistance[0] += fractions[0];
    addDistance[1] += fractions[1];
    addDistance[2] += fractions[2];
    addDistance[3] += fractions[3];
    // Add to destination
    destination[0] = current_position[0] + addDistance[0];
    destination[1] = current_position[1] + addDistance[1];
    destination[2] = current_position[2] + addDistance[2];
    destination[3] = current_position[3] + addDistance[3];

Up to this point, the above code has replaced steps x (1x divide + 4x add + 4x multiply) with 1x divide + 4x multiply + steps x (8x add). So instead of dividing, multiplying and adding for each step, we divide and multiply once and then reuse the results for the additions in each step. Time saved.

Next inside step loop is calculate_delta

calculate_delta(destination);

Again replace function call with code:

    // X axis
    delta[X_AXIS] = DELTA_DIAGONAL_ROD_2;
    fTemp = delta_tower1_x-destination[X_AXIS];
    delta[X_AXIS] -= (fTemp * fTemp);
    fTemp = delta_tower1_y-destination[Y_AXIS];
    delta[X_AXIS] -= (fTemp * fTemp);
    delta[X_AXIS] = sqrt(delta[X_AXIS]);
    delta[X_AXIS] += destination[Z_AXIS];
     // Y axis
    delta[Y_AXIS] = DELTA_DIAGONAL_ROD_2;
    fTemp = delta_tower2_x-destination[X_AXIS];
    delta[Y_AXIS] -= (fTemp * fTemp);
    fTemp = delta_tower2_y-destination[Y_AXIS];
    delta[Y_AXIS] -= (fTemp * fTemp);
    delta[Y_AXIS] = sqrt(delta[Y_AXIS]);
    delta[Y_AXIS] += destination[Z_AXIS];   
     // Z axis
    delta[Z_AXIS] = DELTA_DIAGONAL_ROD_2;
    fTemp = delta_tower3_x-destination[X_AXIS];
    delta[Z_AXIS] -= (fTemp * fTemp);
    fTemp = delta_tower3_y-destination[Y_AXIS];
    delta[Z_AXIS] -= (fTemp * fTemp);
    delta[Z_AXIS] = sqrt(delta[Z_AXIS]);
    delta[Z_AXIS] += destination[Z_AXIS];
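
The three blocks above are the same formula applied to each tower. As a rough standalone illustration in plain C++ (the struct and function names are made up, not part of Marlin), the per-tower calculation could be sketched like this:

```cpp
#include <cmath>

// Hypothetical type for illustration: XY position of one tower.
struct TowerXY {
  float x;
  float y;
};

// Carriage height for one tower:
//   sqrt(rod^2 - dx^2 - dy^2) + z
// rod_squared plays the role of DELTA_DIAGONAL_ROD_2 above.
// If the target is out of reach, the value under the square root
// goes negative and the result is NaN; this sketch does not guard against that.
float carriage_height(const TowerXY& tower, float rod_squared,
                      float x, float y, float z) {
  float dx = tower.x - x;
  float dy = tower.y - y;
  return std::sqrt(rod_squared - dx * dx - dy * dy) + z;
}
```

Written this way it is again a function call, so on the AVR it only keeps the speed benefit if the compiler inlines it; the unrolled version above avoids relying on that.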

Same for adjust_delta, if you have auto leveling:

    // Adjust delta
    float grid_x = destination[X_AXIS] * AUTOLEVEL_GRID_MULTI;// / AUTOLEVEL_GRID;
    if(2.999 < grid_x) grid_x = 2.999;
    else if(-2.999 > grid_x) grid_x = -2.999;
    
    float grid_y = destination[Y_AXIS] * AUTOLEVEL_GRID_MULTI;// / AUTOLEVEL_GRID;
    if(2.999 < grid_y) grid_y = 2.999;
    else if(-2.999 > grid_y) grid_y = -2.999;
    
    int floor_x = floor(grid_x);
    int floor_y = floor(grid_y);
    float ratio_x = grid_x - floor_x;
    float ratio_y = grid_y - floor_y;
    float z1 = bed_level[floor_x+3][floor_y+3];
    float z2 = bed_level[floor_x+3][floor_y+4];
    float z3 = bed_level[floor_x+4][floor_y+3];
    float z4 = bed_level[floor_x+4][floor_y+4];
    float left = (1-ratio_y)*z1 + ratio_y*z2;
    float right = (1-ratio_y)*z3 + ratio_y*z4;
    float offset = (1-ratio_x)*left + ratio_x*right;
  
    delta[X_AXIS] += offset;
    delta[Y_AXIS] += offset;
    delta[Z_AXIS] += offset;
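
What this block does is a bilinear interpolation between the four probed bed points around the nozzle. Here is a minimal standalone sketch in plain C++, assuming a 7x7 bed_level table with grid indices from -3 to +3 as the indexing above suggests; the function names and the inv_grid_spacing parameter (standing in for AUTOLEVEL_GRID_MULTI) are hypothetical.

```cpp
#include <cmath>

const int kGridPoints = 7;  // assumed: probe points per side, grid indices -3..+3

// Clamp to just inside the probed area so that floor() + 1 stays in the table.
static float clamp_to_grid(float v) {
  if (v > 2.999f) return 2.999f;
  if (v < -2.999f) return -2.999f;
  return v;
}

// Bilinear interpolation of the bed height offset at (x, y).
float bed_offset(const float bed_level[kGridPoints][kGridPoints],
                 float x, float y, float inv_grid_spacing) {
  float grid_x = clamp_to_grid(x * inv_grid_spacing);
  float grid_y = clamp_to_grid(y * inv_grid_spacing);
  int floor_x = int(std::floor(grid_x));
  int floor_y = int(std::floor(grid_y));
  float ratio_x = grid_x - floor_x;
  float ratio_y = grid_y - floor_y;
  // Four surrounding probe points.
  float z1 = bed_level[floor_x + 3][floor_y + 3];
  float z2 = bed_level[floor_x + 3][floor_y + 4];
  float z3 = bed_level[floor_x + 4][floor_y + 3];
  float z4 = bed_level[floor_x + 4][floor_y + 4];
  // Interpolate along Y on both sides, then along X between them.
  float left = (1 - ratio_y) * z1 + ratio_y * z2;
  float right = (1 - ratio_y) * z3 + ratio_y * z4;
  return (1 - ratio_x) * left + ratio_x * right;
}
```

Halfway between a probe point at height 0 and a neighbor at height 1, the interpolated offset comes out as 0.5.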

And last, the code:

	plan_buffer_line(delta[X_AXIS], delta[Y_AXIS], delta[Z_AXIS],
	destination[E_AXIS], feedrate*feedmultiply/60/100.0,
	active_extruder);

feedrate*feedmultiply/60/100.0 is calculated every time we go through the loop, even though it stays the same. So we calculate it once before the loop and use that value for each iteration.

Putting it all together, the code looks like this:


//  clamp_to_software_endstops(destination);
  if (min_software_endstops) 
  {
    if (destination[X_AXIS] < min_pos[X_AXIS]) destination[X_AXIS] = min_pos[X_AXIS];
    if (destination[Y_AXIS] < min_pos[Y_AXIS]) destination[Y_AXIS] = min_pos[Y_AXIS];
    if (destination[Z_AXIS] < min_pos[Z_AXIS]) destination[Z_AXIS] = min_pos[Z_AXIS];
  }

  if (max_software_endstops) 
  {
    if (destination[X_AXIS] > max_pos[X_AXIS]) destination[X_AXIS] = max_pos[X_AXIS];
    if (destination[Y_AXIS] > max_pos[Y_AXIS]) destination[Y_AXIS] = max_pos[Y_AXIS];
    if (destination[Z_AXIS] > max_pos[Z_AXIS]) destination[Z_AXIS] = max_pos[Z_AXIS];
  }
  previous_millis_cmd = millis();
#ifdef DELTA

  float difference[NUM_AXIS];
  float addDistance[NUM_AXIS];
  float fractions[NUM_AXIS];
  
  
  difference[0] = destination[0] - current_position[0];
  difference[1] = destination[1] - current_position[1];
  difference[2] = destination[2] - current_position[2];
  difference[3] = destination[3] - current_position[3];

  float cartesian_mm = difference[X_AXIS] * difference[X_AXIS];
  cartesian_mm += (difference[Y_AXIS] * difference[Y_AXIS]);
  cartesian_mm += (difference[Z_AXIS] * difference[Z_AXIS]);

  cartesian_mm = sqrt(cartesian_mm);
     
  if (cartesian_mm < 0.000001) 
  { 
    cartesian_mm = abs(difference[E_AXIS]); 
    return;
  }
  
  //float frfm = feedrate * feedmultiply;
  
  /*
  // For 1 mm, steps are ((6000 * 1) / (1200 * 100)) * 200
  float seconds = 6000 * cartesian_mm / frfm;// feedrate / feedmultiply;
//  int steps = max(1, int(DELTA_SEGMENTS_PER_SECOND * seconds));
  int steps = int(DELTA_SEGMENTS_PER_SECOND * seconds);
  if(1 > steps) steps = 1;
  */
  // Using steps per mm makes much more sense
  //int mms = (int)cartesian_mm;

  float fTemp = cartesian_mm * 5;
  int steps = (int)fTemp;
  // At least one step
  // Calculate step and fraction
  if(0 == steps)
 {
   steps = 1;
   fractions[0] = difference[0];
   fractions[1] = difference[1];
   fractions[2] = difference[2];
   fractions[3] = difference[3];
 }
 else
 {
   fTemp = 1 / float(steps);
   fractions[0] = difference[0] * fTemp;
   fractions[1] = difference[1] * fTemp;
   fractions[2] = difference[2] * fTemp;
   fractions[3] = difference[3] * fTemp;
 }
  
  // For number of steps, for each step add one fraction
  // First, set initial destination to current position
  addDistance[0] = 0.0;
  addDistance[1] = 0.0;
  addDistance[2] = 0.0;
  addDistance[3] = 0.0;
  // Calculate feedrate*feedmultiply/60/100.0
  // We use this in each for iteration
  float frfm = feedrate*feedmultiply/60/100.0;
  // Then add fraction for each segment step
  for (int s = 1; s <= steps; s++) 
  {
    // Add step fraction
    addDistance[0] += fractions[0];
    addDistance[1] += fractions[1];
    addDistance[2] += fractions[2];
    addDistance[3] += fractions[3];
    // Add to destination
    destination[0] = current_position[0] + addDistance[0];
    destination[1] = current_position[1] + addDistance[1];
    destination[2] = current_position[2] + addDistance[2];
    destination[3] = current_position[3] + addDistance[3];
    
    // Calculate delta
    // X axis
    delta[X_AXIS] = DELTA_DIAGONAL_ROD_2;
    fTemp = delta_tower1_x-destination[X_AXIS];
    delta[X_AXIS] -= (fTemp * fTemp);
    fTemp = delta_tower1_y-destination[Y_AXIS];
    delta[X_AXIS] -= (fTemp * fTemp);
    delta[X_AXIS] = sqrt(delta[X_AXIS]);
    delta[X_AXIS] += destination[Z_AXIS];
     // Y axis
    delta[Y_AXIS] = DELTA_DIAGONAL_ROD_2;
    fTemp = delta_tower2_x-destination[X_AXIS];
    delta[Y_AXIS] -= (fTemp * fTemp);
    fTemp = delta_tower2_y-destination[Y_AXIS];
    delta[Y_AXIS] -= (fTemp * fTemp);
    delta[Y_AXIS] = sqrt(delta[Y_AXIS]);
    delta[Y_AXIS] += destination[Z_AXIS];   
     // Z axis
    delta[Z_AXIS] = DELTA_DIAGONAL_ROD_2;
    fTemp = delta_tower3_x-destination[X_AXIS];
    delta[Z_AXIS] -= (fTemp * fTemp);
    fTemp = delta_tower3_y-destination[Y_AXIS];
    delta[Z_AXIS] -= (fTemp * fTemp);
    delta[Z_AXIS] = sqrt(delta[Z_AXIS]);
    delta[Z_AXIS] += destination[Z_AXIS];  
    
    // Adjust delta
    float grid_x = destination[X_AXIS] * AUTOLEVEL_GRID_MULTI;// / AUTOLEVEL_GRID;
    if(2.999 < grid_x) grid_x = 2.999;
    else if(-2.999 > grid_x) grid_x = -2.999;
    
    float grid_y = destination[Y_AXIS] * AUTOLEVEL_GRID_MULTI;// / AUTOLEVEL_GRID;
    if(2.999 < grid_y) grid_y = 2.999;
    else if(-2.999 > grid_y) grid_y = -2.999;
    
    int floor_x = floor(grid_x);
    int floor_y = floor(grid_y);
    float ratio_x = grid_x - floor_x;
    float ratio_y = grid_y - floor_y;
    float z1 = bed_level[floor_x+3][floor_y+3];
    float z2 = bed_level[floor_x+3][floor_y+4];
    float z3 = bed_level[floor_x+4][floor_y+3];
    float z4 = bed_level[floor_x+4][floor_y+4];
    float left = (1-ratio_y)*z1 + ratio_y*z2;
    float right = (1-ratio_y)*z3 + ratio_y*z4;
    float offset = (1-ratio_x)*left + ratio_x*right;
  
    delta[X_AXIS] += offset;
    delta[Y_AXIS] += offset;
    delta[Z_AXIS] += offset; 
  
    plan_buffer_line(delta[X_AXIS], delta[Y_AXIS], delta[Z_AXIS],
                     destination[E_AXIS], frfm,
                     active_extruder);
  }
#else

So what does this get us? Timing the function to move extruder for 1 mm, I get ~42 msec execution time. With all the changes, time drops to ~20 msec. Quite a difference.

Wurstnase

Re: Getting more speed out of Arduino Mega
May 30, 2015 11:39PM

Registered: 9 years ago
Posts: 4,977

Really awesome work!

Don't you think that the compiler will optimize the normal loop (without fractions)?

You could also declare FORCE_INLINE void clamp_to_software_endstops(...).
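
A rough sketch of what that could look like, as standalone C++ for GCC or Clang. The FORCE_INLINE definition is my assumption of how such a macro is usually written with the always_inline attribute, and the limits in min_pos and max_pos are made-up values for illustration.

```cpp
// Assumed definition; only added here if nothing else has defined it.
#ifndef FORCE_INLINE
#define FORCE_INLINE __attribute__((always_inline)) inline
#endif

enum AxisEnum { X_AXIS = 0, Y_AXIS = 1, Z_AXIS = 2 };

// Illustrative limits only, not taken from any real configuration.
static float min_pos[3] = {-100.0f, -100.0f, 0.0f};
static float max_pos[3] = {100.0f, 100.0f, 200.0f};
static bool min_software_endstops = true;
static bool max_software_endstops = true;

// Same checks as the hand-inlined version, but the compiler is asked
// to paste the body into the caller instead of emitting a call.
FORCE_INLINE void clamp_to_software_endstops(float target[3]) {
  if (min_software_endstops) {
    if (target[X_AXIS] < min_pos[X_AXIS]) target[X_AXIS] = min_pos[X_AXIS];
    if (target[Y_AXIS] < min_pos[Y_AXIS]) target[Y_AXIS] = min_pos[Y_AXIS];
    if (target[Z_AXIS] < min_pos[Z_AXIS]) target[Z_AXIS] = min_pos[Z_AXIS];
  }
  if (max_software_endstops) {
    if (target[X_AXIS] > max_pos[X_AXIS]) target[X_AXIS] = max_pos[X_AXIS];
    if (target[Y_AXIS] > max_pos[Y_AXIS]) target[Y_AXIS] = max_pos[Y_AXIS];
    if (target[Z_AXIS] > max_pos[Z_AXIS]) target[Z_AXIS] = max_pos[Z_AXIS];
  }
}
```

This keeps the function in one place in the source while aiming for the same effect as copying its body into prepare_move.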

Edited 1 time(s). Last edit at 05/30/2015 11:50PM by Wurstnase.

Triffid Hunter's Calibration Guide

--> X <-- Drill for new Monitor

Most important Gcode.

TESKAn

Re: Getting more speed out of Arduino Mega
May 31, 2015 01:56AM

Registered: 9 years ago
Posts: 5

Thanks.
For compiler optimizations, I don't know - I am new to Arduino programming and I just wanted to get the machine to move smoother and faster :)

And it shows that there is still life in the Arduino Mega - I mean I got all the electronics for a 3D printer (Arduino, RAMPS, stepper drivers and full graphic LCD controller) for ~45$. I don't think it is possible to go any cheaper :)

Edit: Also just found a bug. The code

  if (cartesian_mm < 0.000001) { cartesian_mm = abs(difference[E_AXIS]); }
  if (cartesian_mm < 0.000001) { return; }

Had a reason.
So my new solution is to put the second if inside of the first one, so under normal circumstances we only check one if.

  if (cartesian_mm < 0.000001) 
  { 
    cartesian_mm = abs(difference[E_AXIS]); 
    if(cartesian_mm < 0.000001) 
    {
      return;
    }
  }

If we just want to extrude some filament, my first solution won't do anything - it will just return without queueing any moves. The check before the return makes sure that this does not happen.
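
The corrected logic can be tried on its own with a small sketch like the one below. It is plain C++ with a made-up helper name; std::fabs is used here because abs on a float is only safe on Arduino, where it is a macro.

```cpp
#include <cmath>

// Hypothetical helper: works out the length used for segmenting a move.
// difference[] holds X, Y, Z, E in that order.
// Returns false when nothing moves at all, true otherwise.
bool move_length(const float difference[4], float& length_mm) {
  length_mm = std::sqrt(difference[0] * difference[0] +
                        difference[1] * difference[1] +
                        difference[2] * difference[2]);
  if (length_mm < 0.000001f) {
    // No XYZ travel: fall back to the extruder distance.
    length_mm = std::fabs(difference[3]);
    if (length_mm < 0.000001f) {
      return false;  // truly nothing to do
    }
  }
  return true;
}
```

An extrude-only move such as {0, 0, 0, 2.5} is kept with a length of 2.5, and only an all-zero move is dropped.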

Edited 1 time(s). Last edit at 05/31/2015 02:02AM by TESKAn.

AndrewBCN

Re: Getting more speed out of Arduino Mega
May 31, 2015 03:01PM

Registered: 9 years ago
Posts: 977

@ TESKAn

Nice work! I will test your performance patches ASAP and will report back here.

Traumflug

Re: Getting more speed out of Arduino Mega
June 01, 2015 08:46AM

Registered: 13 years ago
Posts: 7,616

Quote
TESKAn
Putting it all together, the code looks like this

Looks like you've put a lot of effort into what Teacup Firmware does for years already: looking at the details, measuring the results. Teacup even has precise measurement tools, which leads to kind of regression testing regarding performance:

[reprap.org]

P.S.: I just realized the above might sound snobbish. It wasn't meant that way.

Edited 1 time(s). Last edit at 06/01/2015 09:14AM by Traumflug.

Generation 7 Electronics	Teacup Firmware	RepRap DIY

TESKAn

Re: Getting more speed out of Arduino Mega
June 01, 2015 12:51PM

Registered: 9 years ago
Posts: 5

No hard feelings ;)

I realize there are better solutions out there, but one thing that comes with age is that you grow tired of fiddling with machines that you just want to use to make stuff for your other projects. It was the same with my PC: 15 years ago I spent tons of time tinkering with it, making DIY watercooling, overclocking; now I just want to turn it ON and work without worrying.
Same with my 3D printer: ~1 year ago, when I decided to build my own and settled on a delta design, Marlin was the firmware I used. And now it is stuck; I got it to work OK for my needs and it will probably stay this way :)

To get back on topic, for anyone wanting to try this.

First, if you have auto leveling, add a line to config.h after #define AUTOLEVEL_GRID:

#define AUTOLEVEL_GRID_MULTI  (1.0 / AUTOLEVEL_GRID)

For this multiplier (which replaces one divide) to be represented exactly, AUTOLEVEL_GRID should be set to a power of 2, e.g. 8, 16, 32, 64...
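
A reciprocal macro of this kind is sensitive to parentheses. Without them, x * 1/AUTOLEVEL_GRID is read as (x * 1) / AUTOLEVEL_GRID, which gives the right value but may still leave a divide in the loop, and the bare expression 1/AUTOLEVEL_GRID is an integer division when the grid size is an integer. The sketch below is plain C++ with made-up macro and function names and an assumed grid size of 16.

```cpp
#define AUTOLEVEL_GRID 16  // assumed value, a power of 2

// Two ways of writing the reciprocal, named here only for comparison.
#define GRID_MULTI_UNPARENTHESIZED 1/AUTOLEVEL_GRID
#define GRID_MULTI_PARENTHESIZED (1.0f / AUTOLEVEL_GRID)

// Expands to (x * 1) / 16: correct value, but written as a division.
float scale_unparenthesized(float x) {
  return x * GRID_MULTI_UNPARENTHESIZED;
}

// Expands to x * (1.0f / 16): the reciprocal is a compile-time constant.
float scale_parenthesized(float x) {
  return x * GRID_MULTI_PARENTHESIZED;
}

// Used on its own, the unparenthesized form is the integer division 1 / 16.
int bare_unparenthesized() {
  return GRID_MULTI_UNPARENTHESIZED;
}
```

Both scale functions return 0.5 for x = 8, while bare_unparenthesized() returns 0. With a power of 2 such as 16 the reciprocal 0.0625 is exact in binary floating point, which is why the power-of-2 advice matters.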

Next, this is my whole prepare_move function in marlin_main.cpp:

void prepare_move()
{
  // Replace function call
//  clamp_to_software_endstops(destination);
  if (min_software_endstops) 
  {
    if (destination[X_AXIS] < min_pos[X_AXIS]) destination[X_AXIS] = min_pos[X_AXIS];
    if (destination[Y_AXIS] < min_pos[Y_AXIS]) destination[Y_AXIS] = min_pos[Y_AXIS];
    if (destination[Z_AXIS] < min_pos[Z_AXIS]) destination[Z_AXIS] = min_pos[Z_AXIS];
  }

  if (max_software_endstops) 
  {
    if (destination[X_AXIS] > max_pos[X_AXIS]) destination[X_AXIS] = max_pos[X_AXIS];
    if (destination[Y_AXIS] > max_pos[Y_AXIS]) destination[Y_AXIS] = max_pos[Y_AXIS];
    if (destination[Z_AXIS] > max_pos[Z_AXIS]) destination[Z_AXIS] = max_pos[Z_AXIS];
  }
  previous_millis_cmd = millis();
  
#ifdef DELTA

  float difference[NUM_AXIS];
  float addDistance[NUM_AXIS];
  float fractions[NUM_AXIS];
  
  
  difference[0] = destination[0] - current_position[0];
  difference[1] = destination[1] - current_position[1];
  difference[2] = destination[2] - current_position[2];
  difference[3] = destination[3] - current_position[3];

  float cartesian_mm = difference[X_AXIS] * difference[X_AXIS];
  cartesian_mm += (difference[Y_AXIS] * difference[Y_AXIS]);
  cartesian_mm += (difference[Z_AXIS] * difference[Z_AXIS]);

  cartesian_mm = sqrt(cartesian_mm);
     
  if (cartesian_mm < 0.000001) 
  { 
    cartesian_mm = abs(difference[E_AXIS]); 
    if(cartesian_mm < 0.000001) 
    {
      return;
    }
  }
  
  // Using steps per mm makes much more sense

  float fTemp = cartesian_mm * 5;
  int steps = (int)fTemp;
  // At least one step
  // Calculate step and fraction
  if(0 == steps)
 {
   steps = 1;
   fractions[0] = difference[0];
   fractions[1] = difference[1];
   fractions[2] = difference[2];
   fractions[3] = difference[3];
 }
 else
 {
   fTemp = 1 / float(steps);
   fractions[0] = difference[0] * fTemp;
   fractions[1] = difference[1] * fTemp;
   fractions[2] = difference[2] * fTemp;
   fractions[3] = difference[3] * fTemp;
 }
  
  // For number of steps, for each step add one fraction
  // First, set initial destination to current position
  addDistance[0] = 0.0;
  addDistance[1] = 0.0;
  addDistance[2] = 0.0;
  addDistance[3] = 0.0;
  // Calculate feedrate*feedmultiply/60/100.0
  // We use this in each for iteration
  float frfm = feedrate*feedmultiply/60/100.0;
  // Then add fraction for each segment step
  for (int s = 1; s <= steps; s++) 
  {
    // Add step fraction
    addDistance[0] += fractions[0];
    addDistance[1] += fractions[1];
    addDistance[2] += fractions[2];
    addDistance[3] += fractions[3];
    // Add to destination
    destination[0] = current_position[0] + addDistance[0];
    destination[1] = current_position[1] + addDistance[1];
    destination[2] = current_position[2] + addDistance[2];
    destination[3] = current_position[3] + addDistance[3];
    
    // Calculate delta
    // X axis
    delta[X_AXIS] = DELTA_DIAGONAL_ROD_2;
    fTemp = delta_tower1_x-destination[X_AXIS];
    delta[X_AXIS] -= (fTemp * fTemp);
    fTemp = delta_tower1_y-destination[Y_AXIS];
    delta[X_AXIS] -= (fTemp * fTemp);
    delta[X_AXIS] = sqrt(delta[X_AXIS]);
    delta[X_AXIS] += destination[Z_AXIS];
     // Y axis
    delta[Y_AXIS] = DELTA_DIAGONAL_ROD_2;
    fTemp = delta_tower2_x-destination[X_AXIS];
    delta[Y_AXIS] -= (fTemp * fTemp);
    fTemp = delta_tower2_y-destination[Y_AXIS];
    delta[Y_AXIS] -= (fTemp * fTemp);
    delta[Y_AXIS] = sqrt(delta[Y_AXIS]);
    delta[Y_AXIS] += destination[Z_AXIS];   
     // Z axis
    delta[Z_AXIS] = DELTA_DIAGONAL_ROD_2;
    fTemp = delta_tower3_x-destination[X_AXIS];
    delta[Z_AXIS] -= (fTemp * fTemp);
    fTemp = delta_tower3_y-destination[Y_AXIS];
    delta[Z_AXIS] -= (fTemp * fTemp);
    delta[Z_AXIS] = sqrt(delta[Z_AXIS]);
    delta[Z_AXIS] += destination[Z_AXIS];  
    
    //*****************************************************************
    // Comment this part out if you don't have auto bed leveling
    // Adjust delta
    float grid_x = destination[X_AXIS] * AUTOLEVEL_GRID_MULTI;// / AUTOLEVEL_GRID;
    if(2.999 < grid_x) grid_x = 2.999;
    else if(-2.999 > grid_x) grid_x = -2.999;
    
    float grid_y = destination[Y_AXIS] * AUTOLEVEL_GRID_MULTI;// / AUTOLEVEL_GRID;
    if(2.999 < grid_y) grid_y = 2.999;
    else if(-2.999 > grid_y) grid_y = -2.999;
    
    int floor_x = floor(grid_x);
    int floor_y = floor(grid_y);
    float ratio_x = grid_x - floor_x;
    float ratio_y = grid_y - floor_y;
    float z1 = bed_level[floor_x+3][floor_y+3];
    float z2 = bed_level[floor_x+3][floor_y+4];
    float z3 = bed_level[floor_x+4][floor_y+3];
    float z4 = bed_level[floor_x+4][floor_y+4];
    float left = (1-ratio_y)*z1 + ratio_y*z2;
    float right = (1-ratio_y)*z3 + ratio_y*z4;
    float offset = (1-ratio_x)*left + ratio_x*right;
  
    delta[X_AXIS] += offset;
    delta[Y_AXIS] += offset;
    delta[Z_AXIS] += offset; 
    
    //*****************************************************************
    // End of comment
  
    plan_buffer_line(delta[X_AXIS], delta[Y_AXIS], delta[Z_AXIS],
                     destination[E_AXIS], frfm,
                     active_extruder);
  }
  
#else

#ifdef DUAL_X_CARRIAGE
  if (active_extruder_parked)
  {
    if (dual_x_carriage_mode == DXC_DUPLICATION_MODE && active_extruder == 0)
    {
      // move duplicate extruder into correct duplication position.
      plan_set_position(inactive_extruder_x_pos, current_position[Y_AXIS], current_position[Z_AXIS], current_position[E_AXIS]);
      plan_buffer_line(current_position[X_AXIS] + duplicate_extruder_x_offset, current_position[Y_AXIS], current_position[Z_AXIS], 
          current_position[E_AXIS], max_feedrate[X_AXIS], 1);
      plan_set_position(current_position[X_AXIS], current_position[Y_AXIS], current_position[Z_AXIS], current_position[E_AXIS]);
      st_synchronize();
      extruder_duplication_enabled = true;
      active_extruder_parked = false;
    }  
    else if (dual_x_carriage_mode == DXC_AUTO_PARK_MODE) // handle unparking of head
    {
      if (current_position[E_AXIS] == destination[E_AXIS])
      {
        // this is a travel move - skip it but keep track of current position (so that it can later
        // be used as start of first non-travel move)
        if (delayed_move_time != 0xFFFFFFFFUL)
        {
          memcpy(current_position, destination, sizeof(current_position)); 
          if (destination[Z_AXIS] > raised_parked_position[Z_AXIS])
            raised_parked_position[Z_AXIS] = destination[Z_AXIS];
          delayed_move_time = millis();
          return;
        }
      }
      delayed_move_time = 0;
      // unpark extruder: 1) raise, 2) move into starting XY position, 3) lower
      plan_buffer_line(raised_parked_position[X_AXIS], raised_parked_position[Y_AXIS], raised_parked_position[Z_AXIS],    current_position[E_AXIS], max_feedrate[Z_AXIS], active_extruder);
      plan_buffer_line(current_position[X_AXIS], current_position[Y_AXIS], raised_parked_position[Z_AXIS], 
          current_position[E_AXIS], min(max_feedrate[X_AXIS],max_feedrate[Y_AXIS]), active_extruder);
      plan_buffer_line(current_position[X_AXIS], current_position[Y_AXIS], current_position[Z_AXIS], 
          current_position[E_AXIS], max_feedrate[Z_AXIS], active_extruder);
      active_extruder_parked = false;
    }
  }
#endif //DUAL_X_CARRIAGE

  // Do not use feedmultiply for E or Z only moves
  if( (current_position[X_AXIS] == destination [X_AXIS]) && (current_position[Y_AXIS] == destination [Y_AXIS])) {
      plan_buffer_line(destination[X_AXIS], destination[Y_AXIS], destination[Z_AXIS], destination[E_AXIS], feedrate/60, active_extruder);
  }
  else {
    plan_buffer_line(destination[X_AXIS], destination[Y_AXIS], destination[Z_AXIS], destination[E_AXIS], feedrate*feedmultiply/60/100.0, active_extruder);
  }
#endif //else DELTA
  for(int8_t i=0; i < NUM_AXIS; i++) {
    current_position[i] = destination[i];
  }

}

thetazzbot

Re: Getting more speed out of Arduino Mega
June 15, 2015 05:57PM

Registered: 8 years ago
Posts: 396

Quote
Traumflug

Quote
TESKAn
Putting it all together, the code looks like this

Looks like you've put a lot of effort into what Teacup Firmware does for years already: looking at the details, measuring the results. Teacup even has precise measurement tools, which leads to kind of regression testing regarding performance:

[reprap.org]

P.S.: I just realized the above might sound snobbish. It wasn't meant that way.

I understand that Teacup has made a lot of progress in many areas but there are many features of Marlin that many users need that Teacup does not have.

In particular I don't see support yet for the RepRapDiscount Full LCD with SD Card. Multiple extruders/Heaters/Thermistors/Fans? Can it utilize 100% the features of a basic RAMPS board?

Traumflug

Re: Getting more speed out of Arduino Mega
June 20, 2015 06:25AM

Registered: 13 years ago
Posts: 7,616

Quote
thetazzbot
In particular I don't see support yet for the RepRapDiscount Full LCD with SD Card. Multiple extruders/Heaters/Thermistors/Fans? Can it utilize 100% the features of a basic RAMPS board?

Number of heaters and temp sensors only limited by the number of available pins. And you might find out that it's easier to add redundant stuff like displays, than getting the movement preparation and stepping algorithm up to perfection.

thetazzbot

Re: Getting more speed out of Arduino Mega
June 23, 2015 01:08PM

Registered: 8 years ago
Posts: 396

Quote
Traumflug

Quote
thetazzbot
In particular I don't see support yet for the RepRapDiscount Full LCD with SD Card. Multiple extruders/Heaters/Thermistors/Fans? Can it utilize 100% the features of a basic RAMPS board?

Number of heaters and temp sensors only limited by the number of available pins. And you might find out that it's easier to add redundant stuff like displays, than getting the movement preparation and stepping algorithm up to perfection.

All I was saying was, you seem to promote Teacup firmware as if it is a 1:1 replacement for Marlin and it is not. So I was just making the "buyer beware" statement that it is not a 1:1 comparison. Sure, Teacup might do one thing better (and that one thing might be the most important in some people's mind), it is not everything. For example, under 300mm/s (I've never even printed that fast so I don't care) I get fine output from Marlin. I'm not even sure what advantage I (a user with a cartesian printer) would gain from Teacup, but I might lose some features in the process. I've read your comments also about "developers not testing their code" or not merging into the main branch... So I'm confused, are you the developer of Teacup? I see an SD branch, an LCD branch, a multi-extruder branch, etc etc... But what I'm looking for is all of those features merged into main/trunk... who does that? I.e. who approves the pull requests?

I admire the work you're doing on Teacup, but for my machine it is not what I need at this time.

Cheers,
Mark

Edited 1 time(s). Last edit at 06/23/2015 01:09PM by thetazzbot.

AndrewBCN

Re: Getting more speed out of Arduino Mega
June 23, 2015 01:58PM

Registered: 9 years ago
Posts: 977

Mark,
I don't think Markus has ever promoted Teacup as a 1:1 replacement for Marlin, and I don't think he even believes that to be true. Each firmware has its advantages and disadvantages, its pros and cons, and I guess we enjoy the freedom of choosing the firmware that best suits our needs, which is a very Good Thing (tm) in my book! :P

Back to the topic of TESKAn's code optimization, I still haven't had the time to test them. Has anybody given these patches a test drive?

Chri

Re: Getting more speed out of Arduino Mega
June 24, 2015 03:01AM

Registered: 12 years ago
Posts: 799

Am I right that this optimization only helps on delta printers, not cartesian ones?

Chri

[chrisu02.wordpress.com] Quadmax Intel Delid Tools

TESKAn

Re: Getting more speed out of Arduino Mega
June 24, 2015 03:28AM

Registered: 9 years ago
Posts: 5

You are correct. The cartesian "core" is already pretty optimized as it uses the GRBL project; also, G code is already in cartesian coordinates, so you don't have to do any expensive calculations to get from G code to stepper movements.
Deltas, on the other hand, have to turn each X/Y move into a series of short moves (0.2 mm moves in my code). As you traverse, let's say, from -100 to +100 on the X axis, it is not like on a cartesian machine where you just command the X axis stepper to move; you have to command all three steppers to move, and not in one single move but in a series of smaller moves, to keep the end effector on a straight line.
So where you have a single move on a cartesian machine, you have hundreds of small moves on a delta machine, and for each move you have to calculate square roots and a bunch of other operations to get where you want to go. And if you can save half the time on each short move, it adds up.
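
To show why the segmentation is needed at all, here is a rough standalone sketch in plain C++. It compares the true carriage height at the midpoint of a straight X move with what you would get by moving the carriage linearly between the start and end heights. The tower position (0, 100), the 250 mm rod length and the function names are all made up for illustration.

```cpp
#include <cmath>

// Carriage height above the effector plane (z = 0) for one tower.
double tower_height(double tower_x, double tower_y, double rod,
                    double x, double y) {
  double dx = tower_x - x;
  double dy = tower_y - y;
  return std::sqrt(rod * rod - dx * dx - dy * dy);
}

// Difference between the true carriage height at the midpoint of a move
// along X (at y = 0) and the average of the heights at both ends,
// i.e. the error of treating the carriage motion as linear.
double midpoint_error(double tower_x, double tower_y, double rod,
                      double x0, double x1) {
  double h0 = tower_height(tower_x, tower_y, rod, x0, 0.0);
  double h1 = tower_height(tower_x, tower_y, rod, x1, 0.0);
  double hm = tower_height(tower_x, tower_y, rod, 0.5 * (x0 + x1), 0.0);
  return hm - 0.5 * (h0 + h1);
}
```

With these made-up numbers, a single unsegmented move from -100 to +100 would be off by more than 20 mm at the midpoint, while for one 0.2 mm segment the error is far below a thousandth of a millimeter.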

Traumflug

Re: Getting more speed out of Arduino Mega
June 24, 2015 06:02AM

Registered: 13 years ago
Posts: 7,616

Quote
thetazzbot
I.e. who approves the pull requests?

I consider it to be a myth that one can simply take such pull requests as-is. Most of them break other code, other platforms or are simply done only up to the "it works for me" state of the particular developer. Maybe Marlin is in its current state because it accepted such code.

Also, AndrewBCN is right, Teacup is not a Marlin clone. Doing such a thing would be pointless. Teacup tries to do things with cleaner code (which is why I brought it up here), less resource consumption and accordingly better performance ... in print speed as well as print quality.

dc42

Re: Getting more speed out of Arduino Mega
June 24, 2015 08:34AM

Registered: 10 years ago
Posts: 14,672

Quote
TESKAn
You are correct. The cartesian "core" is already pretty optimized as it uses the GRBL project; also, G code is already in cartesian coordinates, so you don't have to do any expensive calculations to get from G code to stepper movements.

However, the motor steps are generated at uneven intervals except for moves in particular directions, because of the algorithm used. More advanced firmwares use a different algorithm to generate the steps at even intervals.

Quote
TESKAn
Deltas, on the other hand, have to turn each X/Y move into a series of short moves (0.2 mm moves in my code). As you traverse, let's say, from -100 to +100 on the X axis, it is not like on a cartesian machine where you just command the X axis stepper to move; you have to command all three steppers to move, and not in one single move but in a series of smaller moves, to keep the end effector on a straight line.
So where you have a single move on a cartesian machine, you have hundreds of small moves on a delta machine, and for each move you have to calculate square roots and a bunch of other operations to get where you want to go. And if you can save half the time on each short move, it adds up.

That's not the only way of doing it. RepRapFirmware doesn't segment long moves at all. Instead it calculates the step times for a delta directly by solving the equations of motion.

Large delta printer [miscsolutions.wordpress.com], E3D tool changer, Robotdigg SCARA printer, Crane Quad and Ormerod

Disclosure: I design Duet electronics and work on RepRapFirmware, [duet3d.com].

Wurstnase

Re: Getting more speed out of Arduino Mega
June 24, 2015 08:43AM

Registered: 9 years ago
Posts: 4,977

Quote
Traumflug
I consider it to be a myth that one can simply take such pull requests as-is. Most of them break other code, other platforms or are simply done only up to the "it works for me" state of the particular developer. Maybe Marlin is in its current state because it accepted such code.

No one will accept this code as-is. We already have a discussion on GitHub.
There are some issues with that code which will break some ideas. However, some points in it are good for a PR, but not all. Join us on GitHub and make a PR; a lot of people are there to discuss it.

Quote
dc42

Quote
TESKAn
Deltas, on the other hand, have to turn each X/Y move into a series of short moves (0.2 mm moves in my code). As you traverse, let's say, from -100 to +100 on the X axis, it is not like on a cartesian machine where you just command the X axis stepper to move; you have to command all three steppers to move, and not in one single move but in a series of smaller moves, to keep the end effector on a straight line.
So where you have a single move on a cartesian machine, you have hundreds of small moves on a delta machine, and for each move you have to calculate square roots and a bunch of other operations to get where you want to go. And if you can save half the time on each short move, it adds up.

That's not the only way of doing it. RepRapFirmware doesn't segment long moves at all. Instead it calculates the step times for a delta directly by solving the equations of motion.

Right, but this needs a little bit more than an old 8-bit AVR.
