Git Product home page Git Product logo

Comments (26)

DeqingSun avatar DeqingSun commented on June 2, 2024

You are right, Timer0Interrupt takes about 60~100 instructions to run, which is about 1~3% of CPU time. It will be better to use private register banks to reduce 50%~70% CPU time for millis. Also, timer0_millis and timer0_overflow_count take exactly 8 bytes.

However, I don't think it is a good idea to reduce the variable size. An overflow of 65s will be a big problem for projects with logging or multitasking.

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

It's not the iRAM usage that is the problem, it's all the extra code for manipulating uint32_t 8 bits at a time. Here's a rough outline of a 16-bit counter ISR in 10 instructions and 16 bytes:

push PSW ; 2B
push A
setb RS1 ; 2B bank switch
inc R6   ; 1B
mov A, R6; 1B
jnz endt0; 2B
inc R7   ; 1B
endt0:
pop A
pop PSW
reti     ; 1B

You could extend it to 32 bits with 6 more instructions if you think 65s is too low. I think 65s is plenty even if you want to log data every hour, as the user could write a simple wait_minutes(uint8_t) that calls delay(60000) in a loop.

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

Yes, I get your idea. I'm saying using a private bank will not use extra iRAM. It will take 8 bytes either way.

For the "BlinkWithoutDelay" type of code. The short overflow may be an issue. Users won't be able to do a timer longer than overflow time. Another concern is, when the user uses code like if (currentMillis - previousMillis >= interval) after the overflow, it will give the wrong result without explicit signed casting (Not tested on SDCC yet, tested on AVR-GCC). So a very long overflow is necessary for beginners.

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

Did 2dd35c2 in https://github.com/DeqingSun/ch55xduino/tree/playground look good?

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

Did 2dd35c2 in https://github.com/DeqingSun/ch55xduino/tree/playground look good?

That's a big improvement. Does the compiler automatically push/pop the Accumulator? I don't think it does.
After I posted the code above, I realized you can avoid using the Accumulator by using:

inc R6
cjne R6, #0, endt0

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

Did 60b7a19 in https://github.com/DeqingSun/ch55xduino/tree/playground look good?

The code looks cleaner but it didn't save code space or run time. I am using inline assembly instead of assembly function, so prologue and epilogue are handled by the compiler. SDCC did do a good job in main.c.lst .

In "CH55X指令周期.PDF" (Instruction cycle), it mentioned:

CH55X assembly instruction overview:
The number of instruction cycles of non-jump instructions is the same as the number of instruction bytes;
Jump instructions containing MOVC/RET/CALL are usually several cycles longer than the number of bytes;
4 or 5 more cycles of MOVC instructions (5 more when the next instruction address is odd);
3 or 4 more cycles of RET/RETI instruction (4 more when the return address is odd);
2 or 3 more cycles for the remaining instructions (3 more when the target address is odd);
If no jump occurs in the conditional jump instruction, the number of cycles is the same as the number of instruction bytes;

I did a test

void loop() {
  // put your main code here, to run repeatedly:
  EA = 0;

  __asm__ ("    nop    \n");//adjust alignment

  __asm__ ("    mov r4,#0                                \n"
           "   clr P1.4                                   \n"
           "    inc r4                                   \n"
           "    cjne r4,#0,test01 \n"
           "test01:               \n"
           "    inc r4                                   \n"
           "    cjne r4,#0,test02 \n"
           "test02:               \n"
           "    inc r4                                   \n"
           "    cjne r4,#0,test03 \n"
           "test03:               \n"
           "    inc r4                                   \n"
           "    cjne r4,#0,test04 \n"
           "test04:               \n"
           "    setb P1.4                                   \n"
          );


  __asm__ ("    mov r4,#0                                \n"
           "   clr P1.4                                   \n"
           "    inc r4                                   \n"
           "    mov a, r4                                 \n"
           "    jnz test11      \n"
           "test11:               \n"
           "    inc r4                                   \n"
           "    mov a, r4                                 \n"
           "    jnz test12      \n"
           "test12:               \n"
           "    inc r4                                   \n"
           "    mov a, r4                                 \n"
           "    jnz test13      \n"
           "test13:               \n"
           "    inc r4                                   \n"
           "    mov a, r4                                 \n"
           "    jnz test14      \n"
           "test14:               \n"
           "    setb P1.4                                   \n"
          );
  EA = 1;
  delay(1);
}

So each inc and branch takes 6 or 7 instruction cycles, depending on alignment. cjne is 3 bytes, move and jnz are 3 bytes, too. They are the same.

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

Thanks for details about instruction timing. I hadn't seen them before, but had found from testing that jumps usually take 3 extra cycles. I hadn't tested alignment though.
Does the compiler generate a prologue and epilogue for the ISR that does a push A/pop A, even though the newer version of the asm code doesn't clobber A?

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

Yes.

                                    739 ;------------------------------------------------------------
                                    740 ;Allocation info for local variables in function 'Timer0Interrupt'
                                    741 ;------------------------------------------------------------
                                    742 ;	/Users/sundeqing/Library/Arduino15/packages/CH55xDuino/hardware/mcs51/0.0.3/cores/ch55xduino/main.c:36: void Timer0Interrupt(void) __interrupt (INT_NO_TMR0) __using(1) //using register bank 1
                                    743 ;	-----------------------------------------
                                    744 ;	 function Timer0Interrupt
                                    745 ;	-----------------------------------------
      00003F                        746 _Timer0Interrupt:
                           00000F   747 	ar7 = 0x0f
                           00000E   748 	ar6 = 0x0e
                           00000D   749 	ar5 = 0x0d
                           00000C   750 	ar4 = 0x0c
                           00000B   751 	ar3 = 0x0b
                           00000A   752 	ar2 = 0x0a
                           000009   753 	ar1 = 0x09
                           000008   754 	ar0 = 0x08
      00003F C0 E0            [24]  755 	push	acc
      000041 C0 D0            [24]  756 	push	psw
      000043 75 D0 08         [24]  757 	mov	psw,#0x08
                                    758 ;	/Users/sundeqing/Library/Arduino15/packages/CH55xDuino/hardware/mcs51/0.0.3/cores/ch55xduino/main.c:64: );
                                    759 ;Increase	timer0_overflow_count on R4~R7     
      000046 0C               [12]  760 	inc	r4                                   
      000047 BC 00 09         [24]  761 	cjne	r4,#0,incTimer0_overflow_countOver$ 
      00004A 0D               [12]  762 	inc	r5                                   
      00004B BD 00 05         [24]  763 	cjne	r5,#0,incTimer0_overflow_countOver$ 
      00004E 0E               [12]  764 	inc	r6                                   
      00004F BE 00 01         [24]  765 	cjne	r6,#0,incTimer0_overflow_countOver$ 
      000052 0F               [12]  766 	inc	r7                                   
      000053                        767 	incTimer0_overflow_countOver$:
                                    768 ;Has	timer0_overflow_count inc by 8?         
      000053 74 07            [12]  769 	mov	a, #7                                
      000055 5C               [12]  770 	anl	a, r4                                
      000056 70 0D            [24]  771 	jnz	incTimer0_millisOver$                
                                    772 ;Increase	timer0_millis on R0~R3             
      000058 08               [12]  773 	inc	r0                                   
      000059 B8 00 09         [24]  774 	cjne	r0,#0,incTimer0_millisOver$         
      00005C 09               [12]  775 	inc	r1                                   
      00005D B9 00 05         [24]  776 	cjne	r1,#0,incTimer0_millisOver$         
      000060 0A               [12]  777 	inc	r2                                   
      000061 BA 00 01         [24]  778 	cjne	r2,#0,incTimer0_millisOver$         
      000064 0B               [12]  779 	inc	r3                                   
      000065                        780 	incTimer0_millisOver$:
                                    781 ;	/Users/sundeqing/Library/Arduino15/packages/CH55xDuino/hardware/mcs51/0.0.3/cores/ch55xduino/main.c:65: }
      000065 D0 D0            [24]  782 	pop	psw
      000067 D0 E0            [24]  783 	pop	acc
      000069 32               [24]  784 	reti
                                    785 ;	eliminated unneeded push/pop dpl
                                    786 ;	eliminated unneeded push/pop dph
                                    787 ;	eliminated unneeded push/pop b

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

Fixed in release 0.0.4

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

If you make the ISR naked, the compiler won't include the superfluous push acc/pop acc, saving 4 bytes and 4 cycles. And if you set/clear the bank1 bit in psw, you'll save the 3-byte mov instruction. You'd need to add a nop to maintain the word alignment of the cjne instructions.

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

@nerdralph The interrupt handler does need to convert timer0_overflow_count to timer0_millis. So acc is needed for every interrupt call. Also, using SETB and CLR to change RS0 and RS1 seems to take 4 bytes.

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

I didn't notice the accumulator use. I think you might be able to remove that, and just count overflows. The adjustment from overflows to micros/millis could be done in the functions themselves.
You only need one SETB call to set RS0, and a CLR to clear it at the end. You don't need to touch RS1, as ch55xduino only uses banks 0 & 1.

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

If we move adjustment to functions, the overflow time will be significantly decreased. I would be more comfortable keeping it similar to the original Arduino algorithm.

Timer 0 interrupt has a pretty high priority. If the user uses RS1, there will be a problem if we don't touch it.

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

I don't think you understand what I mean about only counting overflows in the ISR. The original Arduino algorithm is poorly written, as it makes the ISR more complex in order to simplify the implementation of millis()/micros(). I'm currently writing a millis timer replacement for the AVR core that uses only a timer overflow counter, and doesn't disable interrupts in the millis and micros functions. If you understand AVR asm, it should be easy for you to copy the algorithm for ch55xduino. If not, I'll port it to 8051 assembler after I finish the AVR version.

As for the register banks, if the sketch code can use custom banks, you should document that or else there could be a conflict with bank1 between the timer ISR and the sketch code.

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

You can copy my modified version of delay to reduce the size of ch55xduino. No precision is lost by using 16 bits for micros since it is only incremented by 1000 (10 bits) at a time.
https://github.com/nerdralph/ArduinoShrink/blob/master/src/wiring.c#L17

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

I get the idea of using a smaller data type for the delay. How would you suggest doing the millis function? If we only keep timer0_overflow_count, should we do a 3-bit shift in millis? We will save 7 clockes every 1/8 ms, but there will be 50 clocks more for millis to do the shift. Depending on how frequently millis is called, is it worthwhile?

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

I get the idea of using a smaller data type for the delay. How would you suggest doing the millis function? If we only keep timer0_overflow_count, should we do a 3-bit shift in millis? We will save 7 clockes every 1/8 ms, but there will be 50 clocks more for millis to do the shift. Depending on how frequently millis is called, is it worthwhile?

Saving time in an ISR is more important than saving time in user code. In embedded systems interrupt response time is more important than non-interrupt code. You can also significantly improve the millis() and micros() functions from the Arduino versions by leaving interrupts enabled.
https://github.com/nerdralph/ArduinoShrink/blob/master/src/micros.S

For the correction from overflows to millis, multiplying overflows by 6 and then divide by 256 will give the same accuracy as using the Arduino version. This assumes 16Mhz, so for other speeds the math needs to be adjusted.

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

@nerdralph I made 0036b18 and 07bcda6 . Not in main branch yet.

So in order to get 4 bytes result in millis, timer0_overflow_count is now 5 bytes. millis() function shift the 5 byte timer0_overflow_count by 3 bits to get millis. Since 51 is not efficient at shifting. multiply by 32 is used instead.

How do you think the 2 commits?

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

@nerdralph I made 0036b18 and 07bcda6 . Not in main branch yet.

So in order to get 4 bytes result in millis, timer0_overflow_count is now 5 bytes. millis() function shift the 5 byte timer0_overflow_count by 3 bits to get millis. Since 51 is not efficient at shifting. multiply by 32 is used instead.

How do you think the 2 commits?

The 5 bytes in the AVR version should not be needed for the 8051. In the AVR version of the timer overflow ISR does not stop incrementing at 4 (or even 5) bytes. It will increment the 5th byte after 2^32 overflows, which will take 49.71 days to happen. It will clobber the 6th byte after 34.8 years, so my ArduinoShrink library assumes the AVR will not run continuously without reset for longer than that.

From my first skim of the code, I have a couple ideas:

  1. You don't need to disable interrupts when reading the overflow count. You can read the least significant byte twice, and if it has changed, go back and read the 4 bytes again.
    https://github.com/nerdralph/ArduinoShrink/blob/master/src/millis.S#L23
  2. You might be able to shift efficiently like this:
mov count, #3
loop:
clr C
mov A, R0
rlc
mov R0, A
mov A, R1
rlc
mov R1, A
mov A, R2
rlc
mov R2, A
mov A, R3
rlc
mov R3, A
djnz count, loop

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

I thought of a faster way to increment the t0 overflow count in the ISR:

mov r7, #0
setb C
mov A, r0
addc A, r7
mov r0, A
mov A, r1
addc A, r7
mov r1, A
mov A, r2
addc A, r7
mov r2, A
mov A, r3
addc A, r7
mov r3, A

And here's another version optimized for size (with the counter in r7:r2 (but you'll only need to read r5:r2 for the uint32 count):

mov r0, #0
mov r1, #ar2
setb C
inct0:
mov A, @r1
addc A, r0
mov @r1, A
jz inct0

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

@nerdralph timer0_overflow_count increase 8 times per second. so the 4-byte variable will overflow after 6 days. If we don't use 5 bytes for it, millis will overflow at 2^29, which makes the overflow handling much more complicated. AVR can make timer trigger every 1 millisecond so 4 bytes are enough, but CH55x can not. I think one extra byte is worthy.

Yes, it is possible to do repeated reading to avoid disabling interrupt. But is it necessary to avoid that for only 10 clocks?

using multiply to do shifting takes about 40 bytes of code for 5 bytes, probably 40 clocks. But rlc takes 13 clocks each bit for 4 bytes. It is not faster.

using addc instead of branching is faster in the worse case, but slow is the best cases. Since 255 of 256 times we only need to add one byte, branching is better I think.

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

@nerdralph timer0_overflow_count increase 8 times per second. so the 4-byte variable will overflow after 6 days. If we don't use 5 bytes for it, millis will overflow at 2^29, which makes the overflow handling much more complicated. AVR can make timer trigger every 1 millisecond so 4 bytes are enough, but CH55x can not. I think one extra byte is worthy.

OK, I see now that the only prescaler options for T0 are /12 (which you are using), /4, or /1(no prescale). If you think overflow in 6 days is going to cause problems vs 50 days for the Arduino AVR core, then use the 5th byte.

Yes, it is possible to do repeated reading to avoid disabling interrupt. But is it necessary to avoid that for only 10 clocks?

Why wouldn't you want to improve interrupt latency when it comes at virtually no cost? Checking the lowest byte and reloading takes about the same amount of cycles and code as disabling and re-enabling interrupts.

using multiply to do shifting takes about 40 bytes of code for 5 bytes, probably 40 clocks. But rlc takes 13 clocks each bit for 4 bytes. It is not faster.

Inside interrupts, speed & determinism is most important. Outside interrupts code size is usually more important than speed.

using addc instead of branching is faster in the worse case, but slow is the best cases. Since 255 of 256 times we only need to add one byte, branching is better I think.

The main difference is mov/addc/mov is deterministic - it always takes the same number of cycles. It's not a big difference, so I can see why you may want to keep the inc/cjne version.

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

Why wouldn't you want to improve interrupt latency when it comes at virtually no cost? Checking the lowest byte and reloading takes about the same amount of cycles and code as disabling and re-enabling interrupts.

Checking the lowest byte only is not going to work well. There are other interrupts such as USB one. It may take more than 32ms, although not likely but it will be a bug. And such a bug will be extremely difficult to debug. Checking the lowest 3 bytes will be much safer.

Inside interrupts, speed & determinism is most important. Outside interrupts code size is usually more important than speed.

The main difference is mov/addc/mov is deterministic - it always takes the same number of cycles. It's not a big difference, so I can see why you may want to keep the inc/cjne version.

Inside interrupts I prefer speed. So far I don't see how Arduino style code can be benefited from deterministic. On the code side, I think there is still a lot of other stuff to optimize, such as linking. There is still a lot of unnecessary functions linked to Arduino sketch.

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

Why wouldn't you want to improve interrupt latency when it comes at virtually no cost? Checking the lowest byte and reloading takes about the same amount of cycles and code as disabling and re-enabling interrupts.

Checking the lowest byte only is not going to work well. There are other interrupts such as USB one. It may take more than 32ms, although not likely but it will be a bug. And such a bug will be extremely difficult to debug. Checking the lowest 3 bytes will be much safer.

Now you don't seem to know what you are talking about. Any interrupt taking 32ms is a bug. If I had an embedded programmer working for me that wrote an ISR that took even 1ms on a MCU running at 24Mhz, he'd be fired. Most of the ISR code I write takes less than 100 cycles.

Do realize that any ISR taking more than 125uS will stop the millis counter from working? If you don't understand realtime embedded software development then I don't see much point in trying to help you with ch55xduino.

from ch55xduino.

DeqingSun avatar DeqingSun commented on June 2, 2024

For a professional programmer, a 32ms is a bug. I never wrote such a long interrupt. But this is an Arduino board support with attachInterrupt. You never know what the user will do. Comparing more than 1 byte eliminates the possibility. Or we just add another flag using 1 bit for an interrupt to inform the main code the value is updated.

No, long ISR will not stop millis counter from working. INT_NO_TMR0 has priority only lower to INT_NO_INT0 interrupt. Interrupt nesting will occur.

from ch55xduino.

nerdralph avatar nerdralph commented on June 2, 2024

No, long ISR will not stop millis counter from working. INT_NO_TMR0 has priority only lower to INT_NO_INT0 interrupt. Interrupt nesting will occur.

You are proving my suspicion that you don't understand how 8051 interrupts work. To have T0 interrupt all other ISRs, you'd have to make it a high priority interrupt by setting PT0 bit in the IP register. I'm pretty sure you're not doing that. Github doesn't support searching in forks, so can't quickly search all your repository for PT0, so there's a slight chance I missed it.

I don't have the time to be teaching you this stuff, especially when you think you already know it all. I may check back on the project in a couple months to see if you have progressed.

from ch55xduino.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.