anhardt / marlin

This project forked from marlinfirmware/marlin

Reprap FW with look ahead. SDcard and LCD support. It works on Gen6, Ultimaker, RAMPS and Sanguinololu

License: GNU General Public License v3.0


marlin's Issues

Speedlimits

What limits the speed of a stepper-driven 3D printer?

G28
G0 X200 F200000 ; 200 meters/minute = 3333 mm/s

This orders the printer to home first and then to move along a 200 mm long line in X at 3333 mm/s, which should take 200/3333 ≈ 0.06 s.
That's unrealistic. No printer can do that.

Processor
Let's calculate µsteps/s. With a common 100 µsteps/mm we get 200*100/0.06 ≈ 333,333 µsteps/second ≈ 333 kHz. Our 16 MHz microcontrollers can step that fast (when doing nothing else), but not every stepper driver can follow.
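The step rate for such a move can be checked with a small helper. This is only a sketch: the function names are made up for illustration and are not Marlin code, and 100 µsteps/mm is just the example value used in the text.

```cpp
#include <cassert>

// Feedrates in G-code are given in mm/min; convert to mm/s.
constexpr double feedrate_to_mm_per_s(double f_mm_per_min) {
  return f_mm_per_min / 60.0;
}

// Required step frequency in Hz (µsteps per second) for a given speed.
// steps_per_mm is machine specific.
inline double step_rate_hz(double speed_mm_per_s, double steps_per_mm) {
  assert(speed_mm_per_s >= 0 && steps_per_mm > 0);
  return speed_mm_per_s * steps_per_mm;
}

// CPU clock cycles available between two step pulses.
inline double cycles_per_step(double f_cpu_hz, double step_rate) {
  assert(step_rate > 0);
  return f_cpu_hz / step_rate;
}
```

For F200000 and 100 µsteps/mm this gives about 333,333 µsteps/s, which leaves a 16 MHz controller roughly 48 clock cycles per step.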

Stepper drivers
Stepper drivers need step pulses that are no shorter than a driver-specific minimum (roughly 1/8 µs to 30 µs), followed by a pause that is at least equally long.

Driver: max. µstep rate, max. full-step rate, µsteps per full step
TMC2100-LA: ~4 MHz, ~15 kHz, 256
A4988: 500 kHz, 31 kHz, 16
DRV8825: 250 kHz, 8 kHz, 32
TB6560AHQ: 15 kHz, 7 kHz, 2

Stepper Motors
Internal magnetic forces limit the step rate of stepper motors. They can't rotate infinitely fast. Speed can be improved by a higher voltage, more current, or another decay mode.
The limit is usually far below 10,000 full steps/second for a bare, unloaded stepper motor. For one with a useful load it can be an order of magnitude lower.

At maximum speed, the force created by the motor is in balance with the force it needs to push the axis forward with this constant speed.
The highest step rates can be reached if accelerated slowly.

Axis Length
Despite what was said above, very low accelerations do not make much sense. For infinitely low accelerations you need infinitely long axes to reach maximum speed. For a nice tool to play with, see the
Prusa Calculator (the vertical axis shows speed, the horizontal axis shows distance).

Acceleration
Looking at those graphs you may have realized that the move we started with (G0 X200 F200000) will take much longer than the estimated ~0.06 s, even if the feedrate could be reached. You need additional time to accelerate and decelerate.
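How much longer can be estimated with a simple trapezoid model. The following is a sketch under simplifying assumptions: the move starts and ends at rest, acceleration and deceleration are equal, and Marlin's jerk is ignored. The function name and the acceleration value used in the note below are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Duration of a move of length d (mm) with requested feedrate v (mm/s)
// and acceleration a (mm/s^2), starting and ending at standstill.
inline double move_time_s(double d, double v, double a) {
  assert(d >= 0 && v > 0 && a > 0);
  const double d_accel = v * v / (2.0 * a);  // distance needed to reach v from rest
  if (2.0 * d_accel >= d) {
    // Triangle profile: the feedrate is never reached.
    return 2.0 * std::sqrt(d / a);
  }
  // Trapezoid profile: accelerate, cruise, decelerate.
  return 2.0 * v / a + (d - 2.0 * d_accel) / v;
}
```

With an assumed acceleration of 3000 mm/s², the 200 mm move from the example never reaches its feedrate and takes about 0.52 s.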

Jerk
F = m * a (force, mass, acceleration) is only that simple when the kinetic system is 'stiff' and the movement 'continuous'. Stepper-based systems are neither 'stiff' nor 'continuous'. Because of the steps, movement can't be continuous. And because you can push the stepper about 1 full step forward or backward without losing steps once you reduce the force again, there is some elasticity. (If you try to move it more than one full step, it will snap to a position n*4 full steps away from the original one.)
This is similar to a spring.

This 'spring' system converts the discontinuous move/force into a much more continuous one.
Without this 'springiness' it would not be possible to make any step without losing it. (F = m*a with a = infinite --> F = infinite)
This elasticity also allows us to start our moves at a speed different from 0. That is called jerk in Marlin.

This allows us to reach the wanted speed earlier, reduces the time the move takes, and brings it closer to the constant speed we ordered with the G-code.
Jerk is what some stepper datasheets describe as 'pull-in torque' (under load) or 'self-start range'.

Moved Mass
The mass a stepper motor has to move has no direct influence on the top speed it can reach (that depends more on the quality of the bearings). But it influences the acceleration we can apply: F = m * a. With a given force/torque, 'a' varies with 'm'.
The masses differ between the axes and between types of printers. Let's only have a look at XYZ. E is different.
Here the moved masses are listed, multiplied by a guessed amount of movement per step:
// The effector may include the E-stepper (direct drive) or not (Bowden).
For a printer like a Prusa i3

X-stepper moves the effector
Y the bed
Z the x-axis + x-stepper + effector

For an Ultimaker mechanism

X moves y-axis + effector
Y x-axis + effector
Z the bed

A CoreXY

A is moving 1/2 effector + 1/2 x-axis
B is moving 1/2 effector + 1/2 x-axis
Z the bed

For a DELTA things are not that simple, but we can guess the worst and best case
for all 3 towers (only one tower moving):

vertical diagonal rod -> tower's sled + 1.2 * effector
horizontal diagonal rod -> tower's sled
above [0,0] -> tower's sled + 1.2 * effector
worst case -> tower's sled + 1.3 * effector
(ask me for the test program)

The most interesting facts here are:

We have to adjust max_acceleration for the worst case.
The worst case is independent of all other steppers. We can test them separately.
We have to adjust max_acceleration per stepper, not per axis!
Calculating/limiting a common Cartesian acceleration is not very useful.

@native_English_speakers
I'd love if you could improve this article in spelling, grammar, ...

Ideas for an alternate heater algorithm

Newton's law of cooling:
T(t)=U+(T(0)−U)*e^(−kt)

With
k = a constant
t = time
U = temperature of the environment
T(0) = temperature at time 0
T(t) = temperature after time t

This matches the temperature of our heaters (nozzle/bed) very well when the heater is off.
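To use the formula, the constant k has to be known for the heater in question. A minimal sketch of how it could be estimated from two temperature readings taken while the heater is off is shown here. The function names are hypothetical and the example numbers in the note are made up, not measured.

```cpp
#include <cassert>
#include <cmath>

// T(t) = U + (T(0) - U) * e^(-k t)
inline double cooled_temperature(double temp_at_0, double ambient, double k, double t) {
  return ambient + (temp_at_0 - ambient) * std::exp(-k * t);
}

// Estimate k from two readings taken dt seconds apart with the heater off.
// Both readings must lie on the same side of the ambient temperature.
inline double estimate_k(double temp_start, double temp_end, double ambient, double dt) {
  assert(dt > 0);
  assert((temp_start - ambient) * (temp_end - ambient) > 0);
  return -std::log((temp_end - ambient) / (temp_start - ambient)) / dt;
}
```

For example, a nozzle that cools from 200 °C to 110 °C within 60 s at 20 °C ambient has halved its distance to ambient, so k = ln(2)/60 ≈ 0.0116 per second.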

The impossibility of safe automatic sensorless homing

Safe automatic sensorless homing is impossible, for several reasons.

A text to read on a long winter evening.

Wording:

Homing an axis is the process of determining a (carriage's) position on an axis and giving that position a number.
That position has to be indicated automatically by something - a sensor.
In contrast to that, manual homing could be sensorless: drive the carriage somewhere and tell the system with a G92 which axis is at which position now.
So-called sensorless homing does not work without a sensor; it uses the Stallguard sensor in a TMC stepper driver.
From now on let's talk about "Stallguard homing" and imply 'automatic' and 'not sensorless', but with a special kind of sensor.

Let's restart with

The impossibility of safe Stallguard homing

What makes Stallguard different from other sensors

If I understand that right:

Stallguard:

Stallguard tries to measure a kind of 'load angle' - the difference between the angle of the static magnetic field of the rotating rotor and the angle of the rotating electromagnetic field of the static coils. Without load that angle is ideally zero. If the angular difference is larger than one full step, the motor will lose (at least) 4 full steps if the load is not decreased immediately. If the angle exceeds two full steps, a loss of at least 4 full steps is guaranteed.
There are two registers in the TMCs. One holds the numerical representation of the 'load angle'. The other holds a value to compare the measured value with. Depending on whether the one value is larger or smaller than the other, a pin can be set. (The direction depends on the version of Stallguard.) Under what exact circumstances the pin is reset is still unclear to me. For now I see it as a pulse of undefined length.
At least the register containing the currently measured load angle can also be read out via the driver's serial (UART or SPI) connection and compared to a threshold stored in the processor. Endstop-pin-less homing can be realized this way. It saves one (ideally interrupt-capable) pin per driver, but needs much more time to read out and process. Software UARTs in particular will often mess up Marlin's timing completely by disabling interrupts while receiving.
However, to distinguish the 'load angle' caused by acceleration or braking from that caused by hitting something immovable, Stallguard needs some steps at a minimum constant speed to calibrate itself. To reach that speed it needs some accelerated steps. The exact total distance for this depends on jerk speed, acceleration and the wanted speed. That value can be different for each axis. For the rest of this article we don't need the exact number, but we have to keep in mind:

Stallguard needs to move SOME_WAY before it is able to detect a stop!

(@tmc-experts: Do you want to contribute the details?)

Legal Positions:

We home an axis when we do not know where we are on that axis; otherwise we could simply G0 or G1 to where we want. Not being homed means: we can't do absolute moves and we don't have working software endstops, but we can (for a machine with independent axes) assume we are in a 'legal' position (somewhere between the one and the other end of the axis). (For machines with axes that are not independent it's a bit more complicated. A DELTA is broken if the difference between the highest and lowest carriage exceeds a threshold.)
(@delta-experts: Can you contribute the details?)

A (not already broken) Machine is switched on in a 'legal' position!

Homable positions:

Normally all 'legal' positions on an axis are homeable. But there are exceptions:

The endstop is not at the very end of the axis and the position is beyond that point. No endstop is left in homing direction.
A dedicated home switch is not at an end of the axis, there is no endstop behind it, and the position is, in homing direction, already behind the home switch.
The position is already at an endstop and the sensor only sends a pulse at the first contact.
The homing sequence could be of importance.
...

With Stallguard, axes are not homeable from positions closer to an end than SOME_WAY.

Could there be a 'sequence' of moves that solves this problem with automatic homing?

No!
It was suggested to move by SOME_WAY against the homing direction before homing.
That will fail silently (but well audible ;-) ) if we are closer than SOME_WAY to the other end.

The only way out is to check manually whether the position is closer than SOME_WAY to the end, and to move away before homing.

But that's not very automatic any more.

The problem is about the same as with lifting Z before homing X and Y, to not scratch the bed with the nozzle before Z is homed - but worse. Lifting Z before Z is homed can cause a crash into the opposite end of the z-axis. But there we can protect the system with a z-max endstop - not so with Stallguard, because here, by definition, the moving distance is too short.

Additional difficulties with Quick- and DELTA-Homing

Quick- and DELTA-Homing are very similar to each other. In both cases, at first a group of axes is moved together until one of them reaches its limit. Then the individual axes are homed. In addition to the above constraint (all axes have to be able to move at least SOME_WAY before hitting), we have to solve further problems here.
With Quick-Homing there is a problem when the start point of homing is less than SOME_WAY away from the diagonal ending at the corner we are homing to. In that case the second arriving axis is less than SOME_WAY away from its endpoint. Because the axes can (normally) be homed individually and we have already traveled most of the way to the end to be homed at, we can safely back up each axis by SOME_WAY before homing it individually. The sequence of homing the individual axes does not matter.
With DELTA-Homing we can do the same, but we have to take care of legal positions. We will have trouble if the position before homing is closer than SOME_WAY to an illegal position. For example, let's assume the C-tower has the highest carriage, A the lowest, and B is somewhere in between, with C and A already at the largest allowed distance. After the common move, C touches the end and the distances between the carriages are the same as before. If A now begins to back up, the distance between the carriages becomes larger than allowed and the machine will break.
A hopefully safe strategy could be:

Move all axes up together until the first axis hits.
Move the first hitting axis back by 2*SOME_WAY, the others by SOME_WAY. That will make the maximum difference between the axes smaller.
Repeat from the beginning until all axes have been at the end at least once. Now the max. difference between the axes should be <= SOME_WAY.
Finally home all axes in random order, without a further back-off.
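The back-off rounds of this strategy can be played through on paper or in a small simulation. The sketch below only models carriage heights as numbers. It ignores the DELTA's legal-position limits, the lower end of the towers and any real Stallguard behaviour, and all names are invented for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Carriage heights on the three towers, measured from the bottom (mm).
using Carriages = std::array<double, 3>;

// Difference between the highest and the lowest carriage.
inline double spread(const Carriages& c) {
  const auto mm = std::minmax_element(c.begin(), c.end());
  return *mm.second - *mm.first;
}

// 'top' is the height at which a carriage hits its end,
// 'some_way' the distance Stallguard needs before it can detect a stop.
// Returns the number of rounds until every carriage has hit the top once.
inline int level_carriages(Carriages& c, double top, double some_way) {
  assert(some_way > 0);
  std::array<bool, 3> has_hit{{false, false, false}};
  int rounds = 0;
  while (!(has_hit[0] && has_hit[1] && has_hit[2])) {
    // 1. Move all carriages up together until the highest one hits.
    std::size_t first = 0;
    for (std::size_t i = 1; i < c.size(); ++i)
      if (c[i] > c[first]) first = i;
    const double travel = top - c[first];
    for (double& h : c) h += travel;
    has_hit[first] = true;
    // 2. Back off: the carriage that hit by 2*SOME_WAY, the others by SOME_WAY.
    for (std::size_t i = 0; i < c.size(); ++i)
      c[i] -= (i == first) ? 2.0 * some_way : some_way;
    ++rounds;
  }
  return rounds;
}
```

In this model every common move after the first one travels at least SOME_WAY, because all carriages were backed off by at least that much in the round before. Only the very first move depends on the unknown start position.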

Good luck!

Please remember:

There can't be a safe Stallguard homing! It needs visual control before hitting the button! Don't use it in a batch or G-code program!

Does this make sense?

AT boelle: This is neither an issue nor a pull request. This is an attempt to start a developer discussion. Please stay away.

Concept: Decelerated fast stop

What are your thoughts on saving the buffered commands so the detection is immediate? Is it possible to inject a command into the buffer on the fly? Is this a bigger task to tackle?

The detection is already close to immediate - it's the reaction to the detection that is delayed.
It's impossible to inject a movement command 'on the fly'. It needs a good amount of preparation to do so.
It's not impossible to improve the reaction time, but it's a job for the 'big boys'.

Currently we have two ways to stop a print.

Stop planning new planner buffer lines and wait until the buffer is empty. This is what is currently used with filament runout, and it can take a while.
Stop planning new planner buffer lines, stop stepping immediately and delete the planner buffer content. This is used as an emergency stop and with power failure detection. This is not recoverable without homing, because the sudden, not decelerated stop may lose steps. The content of the buffer is lost forever. (In the example above the layer change would be lost!)

What Marlin is missing is a coordinated, decelerated fast stop. A concept for that could be:

When detecting 'filament runout' or 'power failure' or any other situation in which we want to stop fast (endstop?) - set a flag as fast as possible (maybe in an interrupt).
If the flag is set
** in the stepper IRQ, go into deceleration mode. In the planner buffer we already have all the needed information. We have the maximum allowed deceleration for every line and we have the point in the line from where we have to decelerate. If the complete line is accelerating or at target speed, this point is the end of that line. However, for the complete buffer content it is guaranteed that the final speed is zero. If we ignore the precalculated point from where to decelerate, and instead decelerate as soon as the 'flag' is set, we can reach the stop much faster (at least in the cases when the buffer is full of long-lasting lines).
We can stop stepping when the speed is at or below jerk speed, or at zero.
** Meanwhile the planner has stopped accepting new moves, finalised planning the last accepted move and optimized the buffer for the highest possible speed (as always). So the planner buffer state is consistent. The stepper ISR did not rewrite the planner buffer; it just ignored the instructions from where to decelerate.
We can now save the machine state outside the planner/stepper.
Now, when the planner does not plan anymore, the speed is zero and no further steps are done, we can save the stepper ISR state (position and where in the last line we stopped) and either save the entire planner buffer, reset it and use it for injecting new moves (for now, to bring the nozzle away from the part), or, if we just want to inject some lines, simply use another (maybe much shorter) line buffer. Now reset the flag and resume stepping.
Now we can do the filament change or whatever.
When we want to resume
** at first we have to restore the machine's physical state (position, the nozzle's priming state, temperatures, etc.)
** then the machine's outer logical state
** then the planner buffer state. For that we stop the stepper IRQ again and copy in the saved content (or switch to the original buffer). Now we have to replan the buffer content. Parts of the first move are already done and the current speed is zero. Most of the work is already done; we just have to shift the points for the end of acceleration and the beginning of deceleration, which is what the 'optimizer' usually does. Now we can enable stepping and accepting new moves from the original queue.
Done.

Remarks:
A decelerated, coordinated fast brake could have a shorter braking distance than just stopping to step. When steps are skipped, the movement is accelerated again on the 'backside' of the 'dents'. Moving 'the field' 'right' means keeping the 'magnet' on the front side of the 'dent'.

In systems that sub-segment moves (deltas) we need to save and recover the number of the sub-move at which we stopped.

Visualisation of x-sled-rotation

// x_rotation.scad

// Animated demonstration of nozzle
// and probe not seeing the same height,
// caused by rotation of the x-sled
// around the x-axis
// when moving in x-direction.
//
// Load in OpenSCAD.
// Start the animation with about 10 FPS
// and 50 steps.
// Observe from all sides, then take
// a look directly along the x-axis.
// See nozzle (red) and probe (blue)
// not flying at the same height.
// Play with the next 3 values.

z_till = 10; // misalignment of the z-towers in °.
// When 0, the machine is perfectly adjusted.

nozzle_dist = 20; // from center of rotation in mm.
// When 0, the effect on the nozzle's height is minimized.

probe_dist = 40; // from center of rotation in mm.
// When same as nozzle_dist the probe will see the same height as the nozzle.
// probe_dist > nozzle_dist --> overcompensation
// probe_dist < nozzle_dist --> undercompensation

rod_len = 300;
rod_dia = 8;
x_rod_dist2 = 25;
cone_size = 10;

module rod(){
  translate([0,0,-rod_len/2])
  cylinder(d= rod_dia, h= rod_len);    
}

module z_rod(){
  rod();
}

module x_rod() {
  rotate([0,90,0]) 
  rod();    
};

module nozzle() {
  color("red")
  translate([0,nozzle_dist,0])
  cylinder(d1=0, d2=cone_size, h=cone_size);
}
module probe() {
  color("blue")
  translate([0,probe_dist,0])
  cylinder(d1 = 0, d2 = cone_size, h = cone_size);
}

rotate([-z_till,0,0]) translate([-rod_len/2,0,0]) z_rod();
rotate([z_till,0,0]) translate([+rod_len/2,0,0]) z_rod();

x_end_shift = x_rod_dist2 * tan(z_till);
echo("x_end_shift",x_end_shift);
x_rot = atan2(x_end_shift, rod_len/2);
echo("x_rot",x_rot);

rotate([0,0,x_rot]) translate([0,0,-x_rod_dist2]) x_rod();
rotate([0,0,-x_rot]) translate([0,0,+x_rod_dist2]) x_rod();

x = $t * rod_len - rod_len/2;
x_sled_rot = 2*z_till * $t - z_till; 
echo("x_sled_rot",x_sled_rot);
rotate([x_sled_rot,0,0]) translate([x,0,0]) nozzle();
rotate([x_sled_rot,0,0]) translate([x,0,0]) probe();

Endstops, Probes & Homeswitches

A little series of articles about endstops, probes and homeswitches.
I want to cover:

Color Mixer

Idea for a steady colour mix:

void reset_colour_bresenham() {
  // Called at the end of setup_colour_bresenham() and from step_colour_bresenham().
  // Resets the line, ready to draw the first point.
  // Sets colour_step to the length of the leading axis.
}

void setup_colour_bresenham(const float colourmix[EXTRUDERS]) {
  // Called when the colour is changed: other virtual tool, set colour, or changed E parameters.
  // (The element type of colourmix is not decided yet; float is only a placeholder.)
}

uint8_t step_colour_bresenham() {
  // Called from get_extruder().
  // Makes a step and returns a bitmask of the steppers that stepped.
  // At the end of the line it calls reset_colour_bresenham().
  return 0; // placeholder; a real implementation must return a non-zero mask
}

uint8_t get_extruder(void) {
  // Called whenever an E-step is needed - forwards and backwards.
  // Returns the index of a stepper.
  static uint8_t stack = 0;
  static uint8_t b = 0;
  do {
    while (stack) {
      if (stack & 1) {
        stack = stack >> 1;
        return b++;  // steps could also be executed here, but then the pulse time can't be used for anything else.
      } else {
        stack = stack >> 1;
        b++;
      }
    }
    b = 0;
    stack = step_colour_bresenham();
    // may produce steps for more than one extruder at once, but only one per extruder
  } while (true);
}
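One possible way to fill in the outline is a multi-axis Bresenham with one error term per extruder. The following is a self-contained sketch, not the author's implementation: the struct, its member names, the integer weights and EXTRUDERS = 3 are all assumptions made for this example. Because the error terms return to their start values after one full line, this variant needs no explicit reset at the end of the line.

```cpp
#include <cassert>
#include <cstdint>

constexpr int EXTRUDERS = 3;  // assumption for this sketch

struct ColourMixer {
  uint16_t mix[EXTRUDERS] = {};  // relative share per extruder
  int32_t  err[EXTRUDERS] = {};  // Bresenham error term per extruder
  uint16_t lead = 0;             // largest share = length of the leading axis
  uint8_t  pending = 0;          // steppers still to be served from the last step
  uint8_t  index = 0;            // stepper that bit 0 of 'pending' belongs to

  // Called when the colour changes.
  void setup(const uint16_t (&weights)[EXTRUDERS]) {
    lead = 0;
    for (int i = 0; i < EXTRUDERS; ++i) {
      mix[i] = weights[i];
      if (mix[i] > lead) lead = mix[i];
    }
    assert(lead > 0);
    for (int i = 0; i < EXTRUDERS; ++i) err[i] = lead / 2;
    pending = 0;
    index = 0;
  }

  // One step along the leading axis; returns a bitmask of the steppers that step.
  uint8_t step() {
    uint8_t mask = 0;
    for (int i = 0; i < EXTRUDERS; ++i) {
      err[i] -= mix[i];
      if (err[i] < 0) {
        err[i] += lead;
        mask |= static_cast<uint8_t>(1u << i);
      }
    }
    return mask;
  }

  // Returns the index of the stepper that takes the next E-step.
  uint8_t get_extruder() {
    for (;;) {
      while (pending) {
        const bool hit = (pending & 1) != 0;
        pending >>= 1;
        const uint8_t current = index++;
        if (hit) return current;
      }
      index = 0;
      pending = step();  // never 0: the leading extruder steps every time
    }
  }
};
```

With the weights 4:2:1, seven consecutive calls of get_extruder() hand out four steps to extruder 0, two to extruder 1 and one to extruder 2, interleaved rather than in blocks.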

How do steppers work (some rarely described facts)

The key here is the different definition of jerk in Marlin and in the rest of the world.
Physicists talk about jerk as the change of acceleration (m/s²) over time - having the unit m/s³.
Marlin's jerk speed is in m/s - the fastest speed we can begin a move with when starting from 0 m/s. A difference in speed.
I hear you thinking - that's impossible. You can't have a jump in speed - because of F=m*a that needs an infinite force. It's the other way around. We don't move a mass. We change the direction of a field. It's about like loading a spring. And the mass of the rotor is forced and accelerated by the spring to rotate.

Imagine a powered stepper motor with only 4 full steps per rotation. Now we mount a lever pointing to 12 o'clock. Now we take a finger and try to displace the lever towards 3 o'clock. We experience an increasing force/moment against our finger, reaching its maximum at 3. Going further to 6, the moment decreases down to zero. Reaching 6, the lever is pulled away from the finger and increasingly accelerated to 9, then decreasingly accelerated to 12, where it reaches its maximum speed. Then, because of the momentum of the mass, it continues its move, now decelerated by the field, until (on a low-friction system) our finger is hit near 6 from the other side. We'll see some oscillations around 12 with decreasing amplitude. In sum we skipped 4 steps.

No displacement between field and rotor - No moment!
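The lever experiment can be summarized by an idealized torque curve. The sketch below assumes a purely sinusoidal relation between displacement and moment with a period of 4 full steps. Real motors deviate from that, and the function name is made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Restoring moment of a powered stepper as a fraction of its maximum,
// for a rotor displaced by 'full_steps' from where the field points.
// Positive values pull the rotor back; negative values push it onwards
// to the next stable position, 4 full steps away.
inline double relative_moment(double full_steps) {
  const double pi = 3.14159265358979323846;
  return std::sin(full_steps * pi / 2.0);
}
```

In the clock picture this gives zero moment at 12, the maximum at 3, zero again at 6, and a moment pointing away from 12 between 6 and 12.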

Now the other way around. Put away your finger. We apply a step. Logically this is an immediate process. We tell the stepper driver to take a step. That's a nearly infinitely fast thing, we just switch a pin to high. Looking at the field, it takes somewhat longer to decrease the current in the one coil and raise it in the other. But compared to the rotor, which now slowly starts to move, changing the field is a fast process. The rotor follows the field, now pointing to 3 o'clock, and is accelerated up to there. If no further step follows in time, the rotor is now decelerated, turns back a bit before 6, and oscillates around 3 for a while until all energy is converted to heat by friction. Now we could repeat our first experiment - displacing the lever with the finger. We will get the same result turned by 90°. We will also lose 4 full steps again.
Now let's see what happens when we don't take full steps, but microsteps. Let's assume 3 microsteps per full step, to stay with the picture of a clock. We readjusted the lever to 12 o'clock. Right, we applied a full step backwards. Pushing the lever to there will not work. Now we can apply the microstep. The field now points to 1 o'clock. The rotor, coming from 12, follows the field, accelerating to 1, has its maximum speed there, swings over up to nearly 2, oscillates around 1 and finally points steadily to 1. All in all this microstep was much less violent than the full step. Overswing, speeds and duration of the oscillation have been much lower. The microstep was smoother than the full step. Repeating the experiment with manually displacing the lever shows an increasing force up to 4, zero at 7, and a return to 1. The needed forces and achieved speeds have been about the same as with full steps (depending on the exact currents the stepper driver applied). We learn that even here the number of lost full steps when displacing manually was 4, or in microsteps 4*3 = 12. We can also learn that we have to apply 3 microsteps, with an unmoved rotor, until the moment can reach its maximum. It further follows that, if we have a system with N microsteps per full step, we can apply 2*N-1 microsteps at once (in our 3-microstep system for example from 12 to 5) while holding the lever, and the lever will still go to the right place and in the right direction when released.

That has consequences for 'double-' and 'quad-stepping' on systems with full- or half-stepping when starting from zero speed. In 'double-stepping' we apply two steps as fast as we can. That means, on a full-step system, we get zero moment, and the direction of the move, when it comes, is undefined (butterfly effect). Applying a quad step makes the rotor think: nothing to do, because the field did not change. On a half-step system we have the same situation with a quad step - the direction of rotation is undetermined. Nevertheless double- and quad-stepping do work. We don't hold the rotor, we can't apply steps infinitely fast and, provided jerk speed or 'junction deviation' is set low enough, we start with a speed below the one where double-stepping begins. When we reach the speed at which we begin with double steps, the rotor does not stand still - its direction is determined.

Double- (quad-)stepping is about the same as switching the microstepping one (two) levels down. For example from 16 to 8 (16 to 4).

Now let's add a very special (unreal) stick-only brake to the 3-microstep system. We adjust the brake to the point where, when the field points to 12, we can position the lever at every place between 11 and 1 (that also means between 5 and 7). Let's point the field to 12 and also the lever. Now we apply a microstep; the field is pointing to 1 but we don't see any change. The moment applied to the rotor by turning the field by 1 hour is just not big enough to exceed the stick force. When we apply a second microstep, the field points to 2, the rotor breaks loose, accelerates up to 2, swings over to about 4, oscillates a bit around 2 until the amplitude is below 1 h difference and finally stands still somewhere between 1 and 3, where 1 and 3 are the most likely places (because oscillations have zero speed at the points where they change direction, so the stick forces can grab). What we will see when we apply the next microstep, pointing the field to 3, depends on the position of the lever when we apply it. At 1 it will end at a place between 2 and 4. At 2 and 3 it will not move, because the stick forces are not exceeded. However, the lever will always end at the place where the field points to, +-1 hour. If we change to more microsteps this will stay the same - but an hour will be another number of microsteps.
If we take a look at the lever we can't say anymore where the field is pointing to. It's somewhere within +-1 hour of the lever. Think about what happens when we increase the stick forces. Right - the margin will grow - but how far can we increase them? Right - until the margin is one full step. Then we don't get any movement, because the stick force is greater than the moment the motor can develop.
When we (again) hold the lever and apply a bunch of steps, we now have to calculate the maximum as 2*N-1-(stick force in microsteps) to always get a movement in the right direction.
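Written down as a function, this rule of thumb looks as follows. It is only a restatement of the formula from the text, with a made-up function name, and it inherits all simplifications of the clock model.

```cpp
#include <cassert>

// Largest number of microsteps that can be applied at once to a resting
// rotor while it is still guaranteed to follow in the intended direction.
// n:     microsteps per full step
// stick: stick-force margin, expressed in microsteps (0 = no stick force)
inline int max_safe_burst(int n, int stick = 0) {
  assert(n > 0 && stick >= 0);
  return 2 * n - 1 - stick;
}
```

For the 3-microstep clock this gives 5 microsteps without stick force and 4 with a stick margin of one hour. For 16 microsteps per full step and no stick force it gives 31.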

Why can't we step infinitely fast? Because it takes some time to readjust the currents in the coils. If we switch off the current in a coil before it has reached its maximum, the maximum moment is not reached. Because the moment goes to zero when the time for a step goes to zero, somewhere we get to the point where the moment is not large enough to overcome the resistance. The angle between rotor and field increases and finally, when the distance is one full step, the motor will skip (at least 4 full steps).

Why can't we accelerate infinitely fast? The not that surprising answer now is - we can! But before you give the Nobel Prize to me, I have to qualify this. We can, at the step level, for a very small number of microsteps. We can safely jump 1 full step, if field and rotor were pointing in the same direction, to get the full moment instantly. Thereafter F=m*a rules! We have to accelerate slowly enough to never exceed the motor's maximum moment at that speed.

Why do we want S-shaped acceleration? The upper part of the S is about the lower moment available at high speeds. You remember - when we step slowly the coils get the full current, which is where we have the highest moment. When stepping fast we don't get currents that high - and not moments that high. So when stepping slowly we have a lot of moment available to realise high accelerations, but when stepping fast a lot less.
The lower part of the S is about the smoothness of the move. When we apply high moments, we get high displacements between field and rotor. Do you remember the oscillations when we applied full steps?

Because of that (and some more elasticities in the system) we can apply Marlin's jerk speed. It describes the length of the pause between the first two step pulses. How much the pause shrinks is described by acceleration. The number of steps per time is speed, the sum of steps is distance. Jerk (in the conventional sense) is the variation of acceleration.

EDIT: Still not satisfied with the S-shape chapter.
EDIT: Still no chapter explaining the dynamic.

Some Remarks about SPI

There seems to be some confusion about what SPI is, what it does, how it works and how it has to be handled. Here are some remarks.

SPI Bus:
The common lines of an SPI bus are 3 lines: a clock (SCLK), a line where the Master is sending and the Slave is receiving (MasterOutSlaveIn, MOSI) and a line where the Master is receiving and the Slave is sending (MasterInSlaveOut, MISO).
Normally one SPI bus for all Slaves is enough. There are four reasons for having more.
a.) There are too many Slaves on the bus. Even though a Slave usually switches its inputs to high impedance when not selected, the current a Master's output pin can provide may not be enough to produce a clean signal.
b.) The data volume may exceed what is possible on one bus (sum_of(bits_to_transfer_per_second_to_a_slave / baudrate_of_slave) over all the slaves > 1).
c.) One (or more) Slaves ignore or misinterpret the SS signal - like the ST7920 driving the REPRAP_DISCOUNT_FULL_GRAPHIC_SMART_CONTROLLER, which receives random data even when SS is high.
d.) SPI transfers are done in an interrupt.
An SPI Master strictly communicates with only one Slave at a time, the one with its SS at low. Driving more than one SS low is only allowed in send-only configurations, when all Slaves have to receive the exact same data and are not sending back any data (MISO disconnected at the Slaves).
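The condition in b.) can be checked with a few lines of code. The struct and function names below are invented for this sketch, and the numbers in the note are arbitrary examples, not measurements of real devices.

```cpp
#include <cassert>
#include <vector>

struct SlaveTraffic {
  double bits_per_second;  // data volume this Slave needs
  double baudrate;         // SPI clock used when talking to this Slave (bit/s)
};

// Fraction of time the bus is busy. A value above 1 means that
// one bus cannot carry the traffic of all listed Slaves.
inline double bus_load(const std::vector<SlaveTraffic>& slaves) {
  double load = 0.0;
  for (const SlaveTraffic& s : slaves) {
    assert(s.baudrate > 0);
    load += s.bits_per_second / s.baudrate;
  }
  return load;
}
```

Two Slaves that need 1 Mbit/s at a 4 MHz clock and 2 Mbit/s at an 8 MHz clock each occupy a quarter of the bus time, so together they load the bus to 50 %. The sketch ignores protocol overhead such as toggling SS between transfers.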

SPI Slave:
An SPI Slave is a device an SPI Master wants to communicate with. Usually it has 4 pins related to SPI communication: an input called SlaveSelect (SS, or CE, or ...), pulled low by the Master to select this specific Slave when the Master wants to talk to it (it may be tied to low if only one SPI Slave is connected to the SPI bus); an input called SerialClock (SCLK), driven by the SPI Master; and at least one of the following two: MasterInSlaveOut and/or MasterOutSlaveIn, depending on whether data is only received (only MOSI, on some display controllers for example), only sent (only MISO, for example the MAX6675 thermometer) or both (for example an SPI SD card reader/writer). There can be multiple Slaves connected to one SPI bus.
There are 4 things an SPI Master has to know about an SPI Slave: which SPI bus it is connected to; which pin to set to select the Slave; the maximum speed the Slave can handle; and finally the SPI mode the Slave expects (clock polarity and the clock edge on which to fetch the data) and the bit order, LSBFIRST/MSBFIRST. (That's not the byte order described by "endianness" (big-endian/little-endian) when sending multi-byte data like int16_t or int32_t - which is simply ignored by transfer16().)
Being an SPI Slave is not handled by the Arduino SPI API.
In Marlin, the CPU Marlin is running on is never an SPI Slave.

SPI Master:
Each SPI bus must have one (and only ONE at a time) SPI Master. That is the one driving the SCLK line. Usually it has to drive SCLK, MOSI and the SS lines for the different Slaves, and to read from the MISO line. (In special cases all but the SCLK line may be omitted.) (In SPI multi-master mode the chips connected to a bus may change their Master/Slave roles - but there is always only one Master at a time.) In Marlin, the CPU Marlin is running on is always the SPI Master. One SPI Master drives one SPI bus. One CPU can have multiple SPI Masters - and so drive multiple SPI buses.

Hardware SPI:
All CPUs Marlin currently runs on have at least one Hardware-SPI peripheral integrated into the processor. Each such set of registers and pins drives one SPI-Bus. In general you can choose from a set of pins for each SPI signal line, but the Arduino SPI library always selects a predefined set of pins, and that selection can't be changed through the Arduino SPI API. So SPI, SPI3, SPI6, ... always use the same pins, unless we work around the Arduino SPI API and use a deeper layer.
Usually the data register is fed and read back byte by byte, but on some processors block transfers via DMA are also possible.
All Hardware-SPI processor peripherals have a dedicated SS input pin (maybe selectable from a set) for being a slave. When pulled low, it automatically switches the peripheral's bus pins from high impedance to active slave operation (SCLK and MOSI as inputs, MISO as output). On the AVRs this pin has no special function in master mode as long as it is configured as an output; if it is left as an input and pulled low, the peripheral drops out of master mode. On Arduino boards this is usually the pin labeled "SS" (if labeled). When being a master this pin may be used as one of the SS outputs, but any otherwise unused pin is OK for that.

Software SPI:
Software SPI is possible with any set of not otherwise used pins. The main disadvantage of using Software-SPI is speed. Here the processor has to send the data bit by bit. This limits the achievable SPI transfer speed - keeping the CPU busy for a longer time when sending the same amount of data.
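To make "bit by bit" concrete, here is a minimal sketch of a Software-SPI byte transfer in SPI mode 0, MSB first. It is not Marlin's implementation: `SoftSpiPins` and `soft_spi_transfer()` are hypothetical names, and the pin access is passed in as callbacks so the sketch does not depend on any particular GPIO API. On real hardware the callbacks would wrap the platform's pin write/read functions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical pin hooks; on a real board these would wrap GPIO accesses.
struct SoftSpiPins {
  std::function<void(bool)> write_sclk;  // drive the clock line
  std::function<void(bool)> write_mosi;  // drive the master's data output
  std::function<bool()>     read_miso;   // sample the slave's data output
};

// Transfers one byte in SPI mode 0 (clock idles low, data is sampled on
// the rising edge), most significant bit first. Returns the byte that
// was read from MISO while sending.
inline uint8_t soft_spi_transfer(const SoftSpiPins &pins, uint8_t out) {
  assert(pins.write_sclk && pins.write_mosi && pins.read_miso);
  uint8_t in = 0;
  for (int bit = 7; bit >= 0; --bit) {
    pins.write_mosi(((out >> bit) & 1) != 0);  // set data while SCLK is low
    pins.write_sclk(true);                     // rising edge: sample point
    in = uint8_t((in << 1) | (pins.read_miso() ? 1 : 0));
    pins.write_sclk(false);                    // falling edge: next bit
  }
  return in;
}
```

Every bit costs the CPU one MOSI write, two SCLK writes and one MISO read, that is 16 clock edges per byte, all executed by the processor itself. That is where the speed disadvantage compared to a hardware shift register comes from.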
In Slave-Mode it would be nice if the SS input and SCLK could trigger an interrupt. Otherwise the CPU is busy polling these pins and can't do much else. (Being a Software-SPI-Slave is not a good idea.)

Hardware SPI vs. Software SPI
On any system Hardware-SPI can transfer data much faster, and with a lower processor load, than Software-SPI. Whether that higher speed is usable depends on:
Is the Slave capable of receiving faster than the Software-SPI-Master can send?
How finely can the SPI speed be adjusted? In the old Arduino API only a divider of the full clock speed can be set, in powers of two (1, 1/2, 1/4, 1/8, ...). If one factor produces an SPI data rate just a bit too fast for the Slave, then the next lower data rate uses only a bit more than half of what the Slave could handle. If the achievable data rate of the Software-SPI is higher than that, as with Marlin's finely tunable Software-SPI device for the ST7920_RRD in U8G on AVRs, then Software-SPI is preferable. (It must be on its own SPI-Bus anyway because of the broken SS, or you have to use tricks to avoid as much visible garbage on the screen as possible.)
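The effect of power-of-two dividers can be put into numbers. The sketch below is an illustration only: `pick_clock_divider()` and `slave_utilization_percent()` are hypothetical helpers, and the 16 MHz base clock and 7 MHz Slave limit used in the note after the code are made-up example values, not data for any real device.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power-of-two divider for which base_hz / divider does not
// exceed slave_max_hz. If even max_divider is not enough, max_divider is
// returned and the resulting clock is still too fast for the slave.
inline uint32_t pick_clock_divider(uint32_t base_hz, uint32_t slave_max_hz,
                                   uint32_t max_divider = 128) {
  assert(base_hz > 0 && slave_max_hz > 0);
  uint32_t divider = 1;
  while (base_hz / divider > slave_max_hz && divider < max_divider)
    divider *= 2;
  return divider;
}

// How much of the slave's maximum data rate the chosen divider uses,
// in percent (rounded down).
inline uint32_t slave_utilization_percent(uint32_t base_hz,
                                          uint32_t slave_max_hz) {
  const uint32_t divider = pick_clock_divider(base_hz, slave_max_hz);
  return uint32_t(uint64_t(base_hz / divider) * 100 / slave_max_hz);
}
```

With a 16 MHz base clock and a Slave that can take 7 MHz, the divider 2 (8 MHz) is slightly too fast, so the divider 4 (4 MHz) is chosen and only 57% of the Slave's capability is used. A Software-SPI that reaches, say, 5 or 6 MHz would win in that case.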
Apart from the duration of the transfer, in Marlin it makes not much of a difference whether Software- or Hardware-SPI is used. Marlin always waits until the transfer has completed before doing something else. Marlin does not do any SPI transfers in interrupts. Even SPI-DMA does not make the transfers much faster; only the gaps between the bytes are shorter. Marlin busy-waits until all data is sent and/or received back.
Making DMA-SPI transfers really useful, by avoiding the busy waits and making that time available for more useful things, would require a completely different structure of Marlin's SPI concept (a major rewrite).

Old Arduino SPI API:
As far as I can remember, the old original Arduino SPI API was made with only the AVRs in mind. You had to do just about everything yourself.
The one SPI object is already defined. SPI.begin() had no parameters.
The pin mode of the SS pin(s) had to be set.

When selecting a slave, the sequence setClockDivider(a); setDataMode(b); setBitOrder(c); digitalWrite(SSPin, LOW); had to be issued. Note: all the set commands have only one parameter. If there was any chance another Slave was still selected you had to check all other SS pins first, and wait.
Then you could transfer the data with recdata = SPI.transfer(sendata);. Note: transfer() has only one parameter when transferring a single byte and two parameters when sending an array.
After transfer to one Slave you had to deselect it by a digitalWrite(SSPin, HIGH);

The intermediate DueExtendedSPI:
When the Due came along the SPI API was extended. SPI.begin(pin) could now take a pin parameter, from a small set of capable pins, for the SS pin. The setters gained the same parameter: setClockDivider(pin, a); setDataMode(pin, b); setBitOrder(pin, c);. SPI.transfer() got two new parameters: the pin, and a flag telling whether the transfer was the final one or not. Because of that, the digitalWrite(SSPin, HIGH|LOW); before and after the transfer can be omitted.

The current Arduino SPI API:
It was meant to simplify switching between slaves. The data for the clock (now a maximum speed instead of a divider), data mode and bit order went into a structure (class) named SPISettings. This struct is passed to SPI.beginTransaction() to switch all three parameters at once. For whatever reason the SS pin is not part of SPISettings, so it is still handled as before. Also, SPI.beginTransaction() does not return anything, such as whether the call was successful or not because maybe another Slave on that bus was still in use.

Adafruit BusIO:
Adafruit BusIO, and here especially Adafruit_SPIDevice, is a game changer. An Adafruit_SPIDevice for the first time describes an SPI-Slave completely, and it adds a relatively fast Software-SPI implementation. The higher-level read(), write() and write_then_read() functions handle the SS pin and the begin/end of the transaction automatically, while transfer() leaves you the full flexibility to do everything yourself.

Planner - optimizer

The primary goal of the planner's optimizer is to build a chain of connected blocks.
The secondary goal is to plan the moves' speeds as close as possible to the wanted speeds, but not faster.

Here I just take a look at the planner's optimizer, presuming the max junction speeds are calculated correctly and chained (the current block's max start speed = the max exit speed of the previous block).

Some presumptions:
a) We can't alter a running block (executed by the stepper irq (tail))
b) If there is no planned block the machine does not move.
c) A tail block is running, if we don't prevent that.

Some consequences:
d) A first block, after a stop, starts from zero speed (or minimum speed, or minimum jerk speed; for now that's all the same) [from c]
e) A last block (head) ends at zero speed. [from c]
f) In case we have only one block in the buffer, after a stop, it starts and ends with zero speed. [from d and e] (filling)
g) In case we have only one block in the buffer, it always ends with zero speed. [from e] (emptying)
h) A second block always starts from zero. [from f and g] (unless we prevented the first block from running; then we can replan the first one)
i) We can exclude the second block from the backward run. [from h] If we are indeed waiting for the second move to join it (filling only), however we do that, we mark both blocks for recalculation and 'recalculate_trapezoids()' does its job.

Stallguard Homing leaves no leeway for braking

All kinetic energy of the moving mass (sled) is transferred to the stop (frame) before Stallguard is able to recognize that it cannot move any further.

It's like using a car's AirBag as a sensor. If the AirBag has triggered - it's too late to brake.

Most other kinds of endstops have at least a tiny bit of stopping distance regardless of how badly they are mounted. (FSRs are one of the exceptions.)

A journey into the depths of `MULTY_STEPPING` and `ADAPTIVE_STEP_SMOOTHING`

and why I think both are currently broken.

#Under construction !!!

What do these features do?
Both of them try to keep the rate of stepper-interrupts between limits.

MULTY_STEPPING cares about the upper limit. A stepper interrupt takes some time. During that time no other code can run. The interrupt can't even run again while it is already running. So there is an upper limit to how often we can run the stepper-interrupt per unit of time.
When we do only one step per stepper-interrupt this also limits the possible step rate. The idea of MULTY_STEPPING is to do more than one step per stepper-interrupt to enable the system to reach higher step rates. This works because doing more than one step per interrupt saves some overhead. With 'DOUBLE_STEPPING', for example, each single stepper-interrupt (now doing two steps) takes longer than before, but less than two stepper-interrupts with 'SINGLE_STEPPING'. The more steps per interrupt, the more time is saved.
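The overhead argument can be written down as a small cost model: one interrupt costs a fixed base part plus a per-step part. The sketch below is an assumption-laden illustration, not Marlin's macros; `isr_cycles()` and `max_step_rate_hz()` are hypothetical names, and the cycle counts used in the note after the code (200 base cycles, 100 cycles per step, 16 MHz CPU) are invented round numbers.

```cpp
#include <cassert>
#include <cstdint>

// Cycles spent in one stepper interrupt that does `steps_per_isr` steps:
// a fixed overhead (entry, exit, block handling) plus a cost per step.
inline uint32_t isr_cycles(uint32_t base_cycles, uint32_t per_step_cycles,
                           uint32_t steps_per_isr) {
  assert(steps_per_isr > 0);
  return base_cycles + per_step_cycles * steps_per_isr;
}

// Upper bound of the step rate if the CPU did nothing but stepper
// interrupts (rounded down).
inline uint32_t max_step_rate_hz(uint32_t cpu_hz, uint32_t base_cycles,
                                 uint32_t per_step_cycles,
                                 uint32_t steps_per_isr) {
  const uint32_t cycles =
      isr_cycles(base_cycles, per_step_cycles, steps_per_isr);
  return uint32_t(uint64_t(cpu_hz) * steps_per_isr / cycles);
}
```

With the invented numbers, single stepping costs 300 cycles per interrupt and tops out at about 53 kHz. Double stepping costs 400 cycles per interrupt (less than 2 x 300) and reaches 80 kHz, quad stepping about 106 kHz. The gain per doubling shrinks because only the fixed part is shared.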

ADAPTIVE_STEP_SMOOTHING cares about the lower limit. When there are only a few stepper-interrupts per unit of time, the CPU is idling most of the time, doing nothing useful. Instead, ADAPTIVE_STEP_SMOOTHING increases the number of stepper-interrupts and does less than one step per stepper-interrupt, which increases the resolution.
In a Bresenham line-drawing algorithm we always have one leading axis: the one with the most steps to go in that line/move. This axis is normally stepped in each of the stepper-interrupts, and the time between the steps is calculated for that axis. If the ratio of the steps per axis is 'uneven', like 100 to 75 (0.75), the pattern for the steps looks about like this:

X: 1111111111111111 -> 16
Y: 1101011010110101 -> 10

Because the algorithm calculates in integers, no half steps can be done. We have to alternate between doing two steps and one step at a time. That can be noticed: the stepper with the shorter way runs a bit rough. The step ratio can't be matched (0.62).
When doubling the resolution the pattern for the same line looks like:

X: 10101010101010101010101010101010 ->16
Y: 10010010010010010010010010010010 ->11

The relation (0.69) can be represented better and the rhythm for the shorter axis is now much more uniform.
The higher the resolution the better the representation of the relation and the more uniform the rhythm.

So if the rate of stepper-interrupts is below a lower limit we can double the number of stepper-interrupts to double the resolution, to 'smooth the steps' (of the non-leading axes).
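The smoothing effect can be tried out with a small Bresenham-style simulation. This is a sketch under my own assumptions, not Marlin's stepper code: `step_pattern()`, `longest_gap()` and `shortest_gap()` are hypothetical helpers, and the error term is seeded with half the tick count. The exact patterns depend on that seeding, so they need not match the illustration above character for character.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Tick pattern of one axis ('1' = step, '0' = no step) for a move whose
// leading axis does lead_steps steps. The move is spread over
// lead_steps << oversampling interrupt ticks.
inline std::string step_pattern(uint32_t axis_steps, uint32_t lead_steps,
                                unsigned oversampling) {
  assert(lead_steps > 0 && axis_steps <= lead_steps);
  const uint32_t ticks = lead_steps << oversampling;
  uint32_t error = ticks / 2;  // assumed seeding, see lead-in
  std::string pattern;
  for (uint32_t t = 0; t < ticks; ++t) {
    error += axis_steps;
    if (error >= ticks) { error -= ticks; pattern += '1'; }
    else pattern += '0';
  }
  return pattern;
}

// Longest distance in ticks between two consecutive steps.
inline uint32_t longest_gap(const std::string &pattern) {
  uint32_t longest = 0, gap = 0;
  bool seen = false;
  for (char c : pattern) {
    ++gap;
    if (c == '1') {
      if (seen && gap > longest) longest = gap;
      seen = true;
      gap = 0;
    }
  }
  return longest;
}

// Shortest distance in ticks between two consecutive steps
// (0 if the pattern holds fewer than two steps).
inline uint32_t shortest_gap(const std::string &pattern) {
  uint32_t shortest = 0, gap = 0;
  bool seen = false;
  for (char c : pattern) {
    ++gap;
    if (c == '1') {
      if (seen && (shortest == 0 || gap < shortest)) shortest = gap;
      seen = true;
      gap = 0;
    }
  }
  return shortest;
}
```

For a leading axis with 16 steps and a second axis with 12 steps (ratio 0.75), `step_pattern(12, 16, 0)` gives `1101110111011101`: the gaps between steps alternate between 1 and 2 ticks. With `step_pattern(12, 16, 1)` the gaps are 2 and 3 ticks of half the length, that is 1 and 1.5 in the original unit, so the rhythm of the shorter axis is noticeably more even.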

What do we pay with?
For MULTY_STEPPING we pay with the opposite effect of ADAPTIVE_STEP_SMOOTHING. The rhythm of the steps becomes less uniform. For the leading axis, where the time between the steps was completely uniform with 'SINGLE_STEPPING', we get blocks of multiple steps following each other as fast as the stepper drivers allow. To the stepper motors this appears as if the field is jumping several micro-steps at once. The stepper will run as if a lower micro-stepping had been selected. Stepper drivers like the TMCs, interpolating up to 256 micro-steps, have to predict the time to the next step pulse they will see. They do that by measuring the time between the last steps. This fails when the rhythm is not uniform. Doing more micro-steps at once than represent one (or two) full steps results in step losses.
For ADAPTIVE_STEP_SMOOTHING we pay with higher reaction times of other processes and/or interrupts. While the stepper interrupt is running nothing else happens. Even lower-priority interrupts are interrupted by the stepper-interrupt on the STs (not so on the AVRs). Everything is suspended unless it is done in hardware, like hardware PWM or DMA. We have to take care to leave at least some time for doing other things. Otherwise the system becomes increasingly unresponsive until either a segment with a lower interrupt rate is worked on or the hardware watchdog bites because its last refresh was too long ago.

How is this implemented?
Let's walk through the relevant parts of stepper.cpp and stepper.h of the current Marlin-2.0.x. (In the hope this will not change as fast as bugfix-2.0.x, and with the risk that the laser changes altered parts of the code relevant here, though I don't think so.)
Let's begin with the definition of some cycle counts for different parts of the stepper-interrupt for some types of CPUs. There is hopefully not that much to say about them. They are estimates, hopefully on the high side, but not very differentiated: for example, it takes longer to save the F4's registers when the FPU is used because there are more of them.
Then the number of processor cycles per step in a stepper-interrupt taking R steps at once is defined.
This is followed by some defines for the maximum step frequency (for one step) when stepping in a stepper-interrupt doing 1 to 128 steps at once. Here the most interesting one for ADAPTIVE_STEP_SMOOTHING is MAX_STEP_ISR_FREQUENCY_1X: one step per stepper-interrupt.
The next lines define MIN_STEP_ISR_FREQUENCY. A comment says this should be 10% of the full processor load, but what is defined here is 100% of the full load. So with MIN_STEP_ISR_FREQUENCY = MAX_STEP_ISR_FREQUENCY_1X the processor can't do anything else but stepping. If the cycle counts for the parts of the interrupt have been estimated too low, stepping at MIN_STEP_ISR_FREQUENCY will load the processor with more than 100%. Let's see later whether that can happen.

Let's have a look into the startup phase of the stepper-interrupt, where a new block was just grabbed and we prepare some local variables we want to work with later: the calculation of oversampling_factor in stepper.cpp, which is later used to shift the step rate left (increase it) when calculating the counter values for the timer that calls the next interrupt. Locally, oversampling is used to shift some counters left (increase them). These values are used for the whole block, including the acceleration and braking phases.
So with what load do we end up? We have a while loop running as long as (max_rate < MIN_STEP_ISR_FREQUENCY), with an additional check whether the newly doubled max_rate exceeds MAX_STEP_ISR_FREQUENCY_1X. Only while the loop condition holds and the doubled rate stays below that limit do we increase oversampling. Because MIN_STEP_ISR_FREQUENCY = MAX_STEP_ISR_FREQUENCY_1X = maximum possible system load, we end up using somewhere between 50% and 100% of the processor's power, depending on the exact value of current_block->nominal_rate. So for the fixed "slow probing speed" it is always the same amount, as long as we don't change it deliberately. The processor load during probing is "randomly" determined by the processor, its frequency and the probing speed, and can be anywhere from 50 to 100%. (If we messed up the estimate for the cycles, maybe more than 100%: 50+x to 100+2x %.)
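The loop described above can be modeled in a few lines to check the 50 to 100% claim. This is a sketch following the description in the text, not a copy of stepper.cpp: `oversampling_for()` and `isr_load_percent()` are hypothetical names, the frequencies in the note after the code are made-up round numbers, and the load is assumed to scale linearly with the interrupt rate.

```cpp
#include <cassert>
#include <cstdint>

// Doubles the rate while it is below min_isr_hz, but stops before
// counting a doubling that would reach max_isr_1x_hz.
// Returns the number of doublings (the oversampling shift).
inline uint8_t oversampling_for(uint32_t nominal_rate, uint32_t min_isr_hz,
                                uint32_t max_isr_1x_hz) {
  assert(nominal_rate > 0);  // a rate of 0 would never leave the loop
  uint8_t oversampling = 0;
  uint32_t max_rate = nominal_rate;
  while (max_rate < min_isr_hz) {
    max_rate <<= 1;
    if (max_rate >= max_isr_1x_hz) break;
    ++oversampling;
  }
  return oversampling;
}

// Resulting interrupt rate as a percentage of the 1x maximum, which is
// taken here as 100% processor load (rounded down).
inline uint32_t isr_load_percent(uint32_t nominal_rate, uint32_t min_isr_hz,
                                 uint32_t max_isr_1x_hz) {
  const uint32_t isr_rate =
      nominal_rate << oversampling_for(nominal_rate, min_isr_hz,
                                       max_isr_1x_hz);
  return uint32_t(uint64_t(isr_rate) * 100 / max_isr_1x_hz);
}
```

With both limits set to 100000, a nominal rate of 50000 ends at 50% load, 30000 at 60%, 49000 at 98% and a slow 1000 at 64%. With the lower limit at 10% of the maximum (10000), the same slow move of 1000 steps/s ends at 16% load, and a move at 30000 is not oversampled at all.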

How to fix ADAPTIVE_STEP_SMOOTHING?
In the light of how well it currently works, limiting MIN_STEP_ISR_FREQUENCY to only 10% of MAX_STEP_ISR_FREQUENCY_1X, as the comment suggests, seems to be extremely conservative.

Dynamic approach
Relation to SLOWDOWN

Fix for MULTY_STEPPING