Git Product home page Git Product logo

Comments (11)

MCUdude avatar MCUdude commented on August 19, 2024

The abs macro bit me in the rear today. I was incrementing a variable inside abs, and it ended up incrementing twice.
I'll have to do something about this.

@nerdralph suggested to use __builtin_abs() for the abs macro. Do you know other replacements that will work and isn't performance hogs?

#define abs(x) __builtin_abs(x)

Oh, I found these examples of the Arduino forums as well. Not sure if they are any "safer", faster or slower though:

#define max(a,b) \
       ({ typeof (a) _a = (a); \
           typeof (b) _b = (b); \
         _a > _b ? _a : _b; })

#define min(a,b) \
       ({ typeof (a) _a = (a); \
           typeof (b) _b = (b); \
         _a < _b ? _a : _b; })

from mcudude_corefiles.

MCUdude avatar MCUdude commented on August 19, 2024

Apparently, The abs macro below occupies less flash than __builtin_abs:

#define abs(a) \
       ({ typeof (a) _a = (a); \
         _a > 0 ? _a : -_a; })

from mcudude_corefiles.

nerdralph avatar nerdralph commented on August 19, 2024

from mcudude_corefiles.

MCUdude avatar MCUdude commented on August 19, 2024

It turns out I was wrong. It's the other way around! __builtin_abs occupies less flash, but only when LTO is disabled.
Do you know where I can find the source code for __builtin_abs ?

I tested it with the program below:

#define new_abs(a) \
       ({ typeof (a) _a = (a); \
         _a > 0 ? _a : -_a; })

#define new_abs1(x) __builtin_abs(x)

int32_t val;

void setup() {
  pinMode(0, OUTPUT);
}

void loop() {
  digitalWrite(0, HIGH);
  delay(new_abs1(val));
  digitalWrite(0, LOW); 
  delay(new_abs1(val));
  val--;
}

Compiled size for ATmega1284P when LTO is off and new_abs is used: 1252 bytes (13 bytes of RAM)
Compiled size for ATmega1284P when LTO is off and new_abs1 is used: 1236 bytes (13 bytes of RAM)

from mcudude_corefiles.

SpenceKonde avatar SpenceKonde commented on August 19, 2024

According to the docs, it's a "builtin function of gcc"

As far as I can see, there isnt source for it? Though you can always compile it and examine the assembly listing if you care... I also suspect it matters how many times you call it...

Also.... I think you want to declare val volatile for that test, instead of changing it like you are doing...

from mcudude_corefiles.

MCUdude avatar MCUdude commented on August 19, 2024

As far as I can see, there isnt source for it? Though you can always compile it and examine the assembly listing if you care... I also suspect it matters how many times you call it...

Couldn't find it either...

Also.... I think you want to declare val volatile for that test, instead of changing it like you are doing...

The program works regardless. Care to explain why you think it's necessary in this case?

Anyways, I've come up with these. They are still macros, but they all work as expected:

#define abs(x)       __builtin_abs(x)
#define sq(x)        ({ typeof (x) _x = (x); _x * _x; })
#define min(a,b)     ({ typeof (a) _a = (a); typeof (b) _b = (b); _a < _b ? _a : _b;  })
#define max(a,b)     ({ typeof (a) _a = (a); typeof (b) _b = (b); _a > _b ? _a : _b;  })
#define round(x)     ({ typeof (x) _x = (x);  _x >= 0 ? (long)x + 0.5 : (long)x - 0.5 })
#define radians(deg) ((deg) * DEG_TO_RAD)
#define degrees(rad) ((rad) * RAD_TO_DEG)
#define constrain(x,low,high)   ({ \
  typeof (x) _x = (x);             \
  typeof (low) _l = (l);           \
  typeof (high) _h = (h);          \
  _x < _l ? _l : _x > _h ? _h :_x })

BTW it's very strange that this hasn't been fixed in any of the official Arduino cores yet. The fix is so simple, and the current implementation brings no benefit compared to the "new" one.

On the Arduino reference page they have even documented its "unexpected behavior":

Because of the way the abs() function is implemented, avoid using other functions inside the brackets, it may lead to incorrect results.
abs(a++); // avoid this - yields incorrect results
// use this instead:
abs(a);
a++; // keep other math outside the function

from mcudude_corefiles.

SpenceKonde avatar SpenceKonde commented on August 19, 2024

IIRC the rules that the C standard defines that dictate what the optimizer is allowed to do can occasionally get weird and perverse... because even though, in reality, when you roll an integer over, it wraps around, the C standard doesn't guarantee that, and the compiler is allowed to shit on that... I can't find where I put it, but I remember being so flabbergasted by when I encountered it once that I saved it...I thought I put it in github, but can't find it.

It was something like

byte i = 1;
while (i++) {...}

and under certain circumstances (usually not, but occasionally), the compiler would optimize out the test entirely, because i was always going to be 1 because it's never subtracted from, only added to, and it starts out as 1, so it can never be zero... I was stunned that it was allowed to do that! (IIRC I ran into at least one other wacky unexpected optimization there). And like, it wouldn't do that under normal circumstances, either - make anything slightly more complex, and it wouldn't do that... but in the very simple minimal test case, like one writes for understanding things like flash usage like that...... yeah... wtf...

Since then I always just make things volatile when I want to make sure the compiler doesn't pull some wacky optimization out of the twilight zone to throw my test off the mark.

For example, in your test, new_abs1(val) is only calculated once per iteration of loop... this takes the same flash as your version of loop:

void loop() {
  digitalWrite(0, HIGH);
  int32_t test=new_abs1(val);
  delay(test);
  digitalWrite(0, LOW); 
  delay(test);
  val--;
}

I don't think that was what you meant was it?

On a t1614, just quickly...
As you wrote, or as I noted above, 872. (ie, it's only calculated once)
volatile val as you have it, 914 (ie, calculated twice)
volatile val as I wrote, 884 (ie, calculated once)
volatile val, as I wrote, but switched to the new_abs instead of new_abs1, 924 (oof!)
volatile val, as you have it, switched to new_abs: 1010 (dear lord!)

Why is new_abs made of lose and fail? I may be wrong, but I am suspicious that typeof() is copying the volatility too, because this:

void loop() {
  digitalWrite(0, HIGH);
  int32_t test=val; //can't optimize this away
  delay(new_abs(test));
  digitalWrite(0, LOW); 
  test=val; // or this, so two accesses
  delay(new_abs(test)); //and have to recalculate - compiler can't assume what val, hence test is
  val--;
}

or

void loop() {
  digitalWrite(0, HIGH);
  delay(new_abs((int32_t )val));
  digitalWrite(0, LOW); 
  delay(new_abs((int32_t )val));
  val--;
}

both compile to 906....

And this one:

void loop() {
  digitalWrite(0, HIGH);
  delay(abs(val));
  digitalWrite(0, LOW); 
  delay(abs(val));
  val--;
}

with volatile val, 914, 884 if you use a temp variable like I did above...

But yeah - always declare things volatile if you want to make sure the compiler isn't taking shortcuts you don't want it to... but I guess you need to cast away the volatility right when you pass it to a macro, heh... I wasn't expecting that bit...

from mcudude_corefiles.

SpenceKonde avatar SpenceKonde commented on August 19, 2024

Oh, wait, the abs() that megaTinyCore is using isn't the same as the one the official core uses, it looks like.... lol... okay, I'm done with this for now, have other things I need to get done this weekend, but I can't find any definition of abs() in megaTinyCore, so it's using the one from stdlib.h... which is __builtin_abs()

But yeah - in this case I think I agree with your conclusion: __builtin_abs() it is, but your method doesn't actually prove it

from mcudude_corefiles.

nerdralph avatar nerdralph commented on August 19, 2024

from mcudude_corefiles.

SpenceKonde avatar SpenceKonde commented on August 19, 2024

from mcudude_corefiles.

nerdralph avatar nerdralph commented on August 19, 2024

Hmm - should we be using that flag for our cores, or does it cause other problems?

I only recently became aware of how stupid gcc can be with integer overflow, so I haven't done much testing with -fwrapv.
https://www.avrfreaks.net/forum/failure-truncate-16bits-when-flto

from mcudude_corefiles.

Related Issues (12)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.