Comments (23)

bkubicek commented on June 1, 2024

Is this happening mid-print? What is your platform / CPU speed? Could it be related to endstops being triggered when the ISR thing starts? How long does this take, maybe ~4 seconds? How often does this happen?
Is it reproducible, or random?

Thanks for the report!

from marlin.

blende64 commented on June 1, 2024

I can confirm this. It happens while printing, when a layer is almost finished. Printing stops for some seconds while showing many "Error:0 ISR overtaking itself." errors. Then it resumes with the next layer. After about 10-15 layers the firmware has crashed and doesn't respond anymore. (The print was centered, far away from the endstops.)

Marlin 1.0.0 Beta 1 compiled with Arduino 0023
Orca v0.2 with Gen6
Pronterface Version unknown (mid/end October)
SFACT V42.4.1 11.09.15

bkubicek commented on June 1, 2024

Once the ISR sends this message, it's likely that it will fail the next time too, as serial sending blocks. Maybe it would recover faster if we sent the message only once. It is not a solution. But it would be nice to know what causes this.

Anyone: can you print the same file again, and is the error at the same location? What's the clock frequency of the Sanguinololu?

rantenki: what's your microcontroller frequency, 16 MHz?

armyofevilrobots commented on June 1, 2024

I am on a RAMPS 1.3 with an Arduino mega128 @ 16 MHz, IIRC. I won't try again on the same print (yoda, 12 hours into a 13 hour print), but I'll see if it happens with my fine detail test print and report back if it is consistently failing in the same spot.

I also have increased my hardware buffer from 16 to 32 blocks, which seemed to make it more robust, but didn't eliminate the problem (and I may be fooling myself, as I didn't do A/B testing).

blende64 commented on June 1, 2024

I just tried again. It fails reliably, but at random points. One common behavior is that Pronterface becomes very unstable or crashes when printing stops. After restarting Pronterface you can reconnect without restarting the printer. (I restarted the printer before every try to be sure.)

  1. try:
    Starting from home position. It crashed after the first Z move (lifting the nozzle). Many "Error:28265 ISR overtaking itself." and then "endstop hit", but the opto is physically not triggered.
  2. try:
    Printed about 10 layers successfully. Then many "Error:28265 ISR overtaking itself.". Moved 1-2 cm and finally crashed with many "Error:28265 ISR overtaking itself."
  3. try:
    After lifting the nozzle, many ISR errors, then moving about 2 cm, and before this move finished it crashed with this:
    " ...
    Error:26478 ISR overtaking itself.
    Error:26478 ISR overtaking itself.
    Error:26478 ISR overtaking itself.
    Error:
    echo:endstops hit: Z:1.91
    "
    No opto was physically triggered.

Electronics are standard mendel-parts Gen6.

armyofevilrobots commented on June 1, 2024

Pronterface will block waiting for results, so that isn't terribly surprising. I have crashed it by holding the arduino down in reset or unplugging the usb. Since the firmware is crashing, this is kind of expected.

Interesting that the numbers we see ie Error:$integer ISR overtaking itself. are all different... /me off to read source.

armyofevilrobots commented on June 1, 2024

A little further examination of the code has me thinking we are lucky the heaters aren't running away.

Also, I added a local patch to only send the serial error once, instead of every overrun. I'll commit back if it helps.

Finally, I had the crash again, it happens at totally different places on every run for me.

armyofevilrobots commented on June 1, 2024

Re-ran with the following changes:

// in configuration.h
#define BUFSIZE 5
#define BLOCK_BUFFER_SIZE 32

// in stepper.cpp
volatile int errorsent = 0;  // set once the overrun message has been sent

// in ISR(TIMER1_COMPA_vect)
if (busy) {
  if (!errorsent) {  // report only the first overrun of a burst
    SERIAL_ERROR_START
    SERIAL_ERROR(*(unsigned short *)OCR1A);
    SERIAL_ERRORLNPGM(" ISR overtaking itself.");
    errorsent = 1;
  }
  return;
}
// ...etc., the rest of the ISR
errorsent = 0;  // a pass that was not an overrun re-arms the message

This prevented duplicate ISR error sends. It doesn't do anything to address the root cause, although I wonder if a larger minimum segment size would prevent the overruns by merging multiple short moves? I currently have 5, but with 16x microstepping, that is < 1/10 of a mm on my machine. Perhaps I could do 10? Just spitballing...

blende64 commented on June 1, 2024

@rantenki: Thank you, that's it.

Side effect: in all tests before, the print was not centered as it should be. Now it's correctly centered!
...
My 2 hour print just finished successfully with no ISR error.

armyofevilrobots commented on June 1, 2024

Unfortunately this fix isn't addressing the root cause, whatever it might be, although I suspect short segments result in the ISR taking longer than the inter-step time. We can probably figure that out with static analysis, but that doesn't sound like fun. ;) It isn't going to help people with 8 MHz and/or low RAM (e.g. ATmega168) either.
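
A rough way to put numbers on the "ISR taking longer than the inter-step time" suspicion: the cycle budget per step interrupt is the CPU clock divided by the step rate. The sketch below is host-side C++, not Marlin code. The function names, the 10 kHz step rate, and the 1000-cycle ISR cost are made-up illustration values, not measurements from this thread.

```cpp
#include <cstdint>

// CPU cycles available between two step interrupts at a given step rate.
// stepsPerSecond must be non-zero.
constexpr uint32_t cyclesPerStep(uint32_t cpuHz, uint32_t stepsPerSecond) {
    return cpuHz / stepsPerSecond;
}

// True if an ISR that costs isrCycles finishes before the next step is due.
constexpr bool isrFits(uint32_t cpuHz, uint32_t stepsPerSecond, uint32_t isrCycles) {
    return isrCycles < cyclesPerStep(cpuHz, stepsPerSecond);
}

// Assumed example: 10 kHz step rate, ISR cost of 1000 cycles.
static_assert(cyclesPerStep(16000000UL, 10000UL) == 1600, "16 MHz budget");
static_assert(cyclesPerStep(8000000UL, 10000UL) == 800, "8 MHz budget");
```

With those assumed numbers the ISR would fit at 16 MHz but overrun at 8 MHz, which is why a slower board is expected to hit the problem sooner.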

blende64 commented on June 1, 2024

But compared with a failed print this is a big step forward. And with this fix it is more stable for me than the "official" 0.9.xx versions. (Improved resolution/quality would be really great too...)

bkubicek commented on June 1, 2024

I have been thinking a bit about what could be a possible reason for this. The only things I can find that are not 100% deterministic when printing identical files, concerning the ISR's operation, are the following:

  • The fill state of the buffer. If the buffer runs empty, the velocities and accelerations can change. Also, with a low buffer, the ISR might take a different, longer branch. In this case a larger movement buffer could help a tiny bit, as rantenki might or might not have seen.
  • Endstop hits, or an endstop triggered for a very, very short time due to electromagnetic noise. The latter would only show up as lost microsteps (so nearly not at all in a normal print) if the firmware only dropped steps while the endstop is triggered. Against this stands the fact that Marlin would then stop the whole move, which would lead to a probably visible shift. I honestly had one big layer shift, but I think that it was just the normal too-much-extrusion-bumping-into-the-rough-surface problem.
  • It could maybe happen that the stepper ISR is delayed because the temperature-timer interrupt interrupts it, or maybe also a serial receive ISR, of which I honestly don't know the execution time.

If it is the timer, we could do some manual interrupt prioritisation: an enum/define with values 0, 1, 2 stored in a volatile uint8_t. While operating, the moveISR or tempISR writes a 1 or 2 respectively. After finishing, both write a 0.
Before starting, the moveISR checks whether the uint8_t is active, either by itself (very bad) or by the tempISR (continue). If the tempISR is called, it checks whether the moveISR is active, and if so, returns. If this helps, more clever things can also be done, e.g. the tempISR counting how many times it did not go active, and telling the moveISR to let it finish for once after a number of failures.
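
The prioritisation idea above could look roughly like this. It is a host-side sketch with hypothetical names (`isrOwner`, `moveIsrEnter`, `tempIsrEnter`, and so on), not code from Marlin. One deliberate deviation, labelled as an assumption: when the move ISR finishes it restores the previous owner instead of always writing 0, so an interrupted tempISR is not forgotten. It also assumes the check and the write cannot themselves be interrupted, which only holds while interrupts are disabled.

```cpp
#include <cstdint>

enum IsrOwner : uint8_t { OWNER_NONE = 0, OWNER_MOVE = 1, OWNER_TEMP = 2 };
enum MoveEntry : uint8_t { MOVE_RUN, MOVE_RUN_OVER_TEMP, MOVE_OVERRUN };

volatile uint8_t isrOwner = OWNER_NONE;  // who is currently running
volatile uint8_t tempSkips = 0;          // how often the tempISR backed off

// Call at the top of the move ISR. On MOVE_OVERRUN the caller should
// return immediately and must not call moveIsrLeave().
MoveEntry moveIsrEnter(uint8_t &previous) {
    previous = isrOwner;
    if (previous == OWNER_MOVE) return MOVE_OVERRUN;  // "very bad"
    isrOwner = OWNER_MOVE;
    return previous == OWNER_TEMP ? MOVE_RUN_OVER_TEMP : MOVE_RUN;
}

// Call at the end of the move ISR with the value saved by moveIsrEnter().
void moveIsrLeave(uint8_t previous) { isrOwner = previous; }

// Call at the top of the temperature ISR. Returns false if it should
// give way to the move ISR and return at once.
bool tempIsrEnter() {
    if (isrOwner == OWNER_MOVE) {
        tempSkips = tempSkips + 1;
        return false;
    }
    isrOwner = OWNER_TEMP;
    return true;
}

// Call at the end of the temperature ISR.
void tempIsrLeave() {
    isrOwner = OWNER_NONE;
    tempSkips = 0;
}
```

The `tempSkips` counter is the hook for the "let it finish for once" idea: the move ISR could read it and yield after some threshold, which is left out here.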

ErikZalm commented on June 1, 2024

blende64: You also seem to have a problem with your endstops (false triggers).
I will change Marlin so that very small pulses will not be seen as endstop triggers. But you should also fix the problem in your printer.
Increase the distance between the endstop cables and the motor cables. Or even better, use shielded cables.
ISR warnings with a large number are probably caused by the endstops.
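
One common way to ignore very small pulses is to require several consecutive triggered samples before treating the endstop as hit. The sketch below only illustrates that general technique. `EndstopFilter` is a hypothetical helper, not the change that went into Marlin, and the number of required samples is an assumption to be tuned per machine.

```cpp
#include <cstdint>

// Hypothetical helper: reports a hit only after `required` consecutive
// triggered samples, so a single noisy sample is ignored.
class EndstopFilter {
public:
    explicit EndstopFilter(uint8_t required) : required_(required), count_(0) {}

    // Feed one raw sample; returns true once the hit is considered real.
    bool update(bool rawTriggered) {
        if (!rawTriggered) {
            count_ = 0;  // any clear sample resets the run
            return false;
        }
        if (count_ < required_) ++count_;
        return count_ >= required_;
    }

private:
    uint8_t required_;
    uint8_t count_;
};
```

Sampled once per step interrupt, a filter with `required = 3` would need the endstop to read as triggered for three steps in a row, at the cost of stopping up to two steps late on a real hit.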

The ISR warning with 0 and 32 is a FW bug that I am investigating. The OCR1A should not be 0 or 32. I never write it with a value < 100. (At least not directly.)

ErikZalm commented on June 1, 2024

rantenki,

Can you reproduce this error? One of the problems for me is that I cannot reproduce it.

bkubicek commented on June 1, 2024

Erik: I have seen a problem that manifests as a pause in the print, but with autonomous SD card printing. I can try to connect host software over serial to monitor what's going on. I suspect it could be the same cause, so at least I can test. The annoying problem: it has only happened for me 3 times in 3 h of cumulative print time.

ErikZalm commented on June 1, 2024

rantenki,

Can you test with the latest version? It now should display the message only one time and not in the ISR.
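
For readers following along, "only one time and not in the ISR" usually means the ISR just records that an overrun happened and the main loop does the slow serial printing. The sketch below shows that general pattern on the host. It is not the actual Marlin change, and all names (`overrunPending`, `noteOverrunFromIsr`, `drainOverrunReport`) are hypothetical.

```cpp
#include <cstdint>
#include <string>

volatile bool overrunPending = false;     // set by the ISR, cleared by the main loop
volatile uint16_t overrunTimerValue = 0;  // timer value captured at the first overrun

// Would be called from the stepper ISR: records the event, no serial I/O.
void noteOverrunFromIsr(uint16_t timerValue) {
    if (!overrunPending) {  // keep only the first overrun until it is reported
        overrunTimerValue = timerValue;
        overrunPending = true;
    }
}

// Would be called from the main loop: returns the message to print, or an
// empty string if nothing is pending. On a real AVR the 16-bit read and the
// flag clear would need to happen with interrupts disabled.
std::string drainOverrunReport() {
    if (!overrunPending) return std::string();
    const uint16_t value = overrunTimerValue;
    overrunPending = false;
    return "Error:" + std::to_string(static_cast<unsigned>(value)) +
           " ISR overtaking itself.";
}
```

A burst of overruns then produces a single line of output, and the blocking serial write no longer eats into the time available to the next step interrupt.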

armyofevilrobots commented on June 1, 2024

ErikZalm: Yep, I can reproduce anytime I want, although I am out of the lab today so I cannot try again til tomorrow. I'll give the new version a shot tomorrow.

Sorry, my work schedule is a bit hectic so it is tough to carve out time to test things.

ErikZalm commented on June 1, 2024

rantenki,

Can you give a description of how you can reproduce it? That would be a big help.

armyofevilrobots commented on June 1, 2024

Yep, very long runs with lots of very fine detail and short line segments with the default configuration. If I scale yoda down to 1/4 size it happens after a couple of hours of print time. Once I get back later today I can attach the gcode that triggers it (several hours in).

ErikZalm commented on June 1, 2024

I removed the nesting from the stepper ISR. The nesting was what triggered this message.

The nesting was allowed in order to prevent serial errors. This is now done by checking the serial line in the stepper ISR.

armyofevilrobots commented on June 1, 2024

bkubicek/ErikZalm: The newest version as of Saturday doesn't seem to have the same issue, and I didn't get any failures during some pretty long prints which previously have failed. That said, this is not entirely deterministic, so it isn't guaranteed that the bug is fixed until we get a lot of repeated non-failures (and I don't need that many yodas). Pulling the serial writes out of the ISR certainly seems to have reduced the cascading nature of the failure, but I think we need a ton of testing before we can call it closed.

As for the setting of a volatile enum, perhaps we can locate a good Arduino semaphore library (I understand that FreeRTOS has some), because there is always a risk of a race in the check/set conditional that will result in incorrect state being read or set.
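
To illustrate the race: a separate "check, then set" can be interrupted between the two steps, while an atomic test-and-set does both at once. The host-side sketch below uses `std::atomic_flag` from standard C++ to show the idea. The names `stepperBusy`, `tryEnterStepper`, and `leaveStepper` are hypothetical. On an AVR the same effect is typically obtained by doing the check and the write with interrupts disabled, for example inside avr-libc's `ATOMIC_BLOCK` from `<util/atomic.h>`.

```cpp
#include <atomic>

// Cleared means nobody is inside the guarded section.
std::atomic_flag stepperBusy = ATOMIC_FLAG_INIT;

// Atomically checks and sets the flag in one step.
// Returns true if the caller got in, false if someone was already inside.
bool tryEnterStepper() {
    return !stepperBusy.test_and_set(std::memory_order_acquire);
}

// Marks the guarded section as free again.
void leaveStepper() {
    stepperBusy.clear(std::memory_order_release);
}
```

Because the test and the set are one operation, there is no window in which two callers can both observe "not busy" and both proceed.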

Man, stuff like this really reminds me why I don't miss embedded dev ;)

ErikZalm commented on June 1, 2024

rantenki,

Thanks for testing. The Saturday version improved it, but I still had the problem when printing at high speed.
I decided to remove the nesting from the stepper routine. The nesting was needed for the serial communication, but I put a serial check in the stepper ISR, so the nesting is not needed anymore.

github-actions commented on June 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
