Comments (11)
The abs
macro bit me in the rear today. I was incrementing a variable inside abs
, and it ended up incrementing twice.
I'll have to do something about this.
@nerdralph suggested to use __builtin_abs()
for the abs macro. Do you know other replacements that will work and isn't performance hogs?
#define abs(x) __builtin_abs(x)
Oh, I found these examples of the Arduino forums as well. Not sure if they are any "safer", faster or slower though:
#define max(a,b) \
({ typeof (a) _a = (a); \
typeof (b) _b = (b); \
_a > _b ? _a : _b; })
#define min(a,b) \
({ typeof (a) _a = (a); \
typeof (b) _b = (b); \
_a < _b ? _a : _b; })
from mcudude_corefiles.
Apparently, The abs macro below occupies less flash than __builtin_abs:
#define abs(a) \
({ typeof (a) _a = (a); \
_a > 0 ? _a : -_a; })
from mcudude_corefiles.
from mcudude_corefiles.
It turns out I was wrong. It's the other way around! __builtin_abs
occupies less flash, but only when LTO is disabled.
Do you know where I can find the source code for __builtin_abs
?
I tested it with the program below:
#define new_abs(a) \
({ typeof (a) _a = (a); \
_a > 0 ? _a : -_a; })
#define new_abs1(x) __builtin_abs(x)
int32_t val;
void setup() {
pinMode(0, OUTPUT);
}
void loop() {
digitalWrite(0, HIGH);
delay(new_abs1(val));
digitalWrite(0, LOW);
delay(new_abs1(val));
val--;
}
Compiled size for ATmega1284P when LTO is off and new_abs
is used: 1252 bytes (13 bytes of RAM)
Compiled size for ATmega1284P when LTO is off and new_abs1
is used: 1236 bytes (13 bytes of RAM)
from mcudude_corefiles.
According to the docs, it's a "builtin function of gcc"
As far as I can see, there isnt source for it? Though you can always compile it and examine the assembly listing if you care... I also suspect it matters how many times you call it...
Also.... I think you want to declare val volatile for that test, instead of changing it like you are doing...
from mcudude_corefiles.
As far as I can see, there isnt source for it? Though you can always compile it and examine the assembly listing if you care... I also suspect it matters how many times you call it...
Couldn't find it either...
Also.... I think you want to declare val volatile for that test, instead of changing it like you are doing...
The program works regardless. Care to explain why you think it's necessary in this case?
Anyways, I've come up with these. They are still macros, but they all work as expected:
#define abs(x) __builtin_abs(x)
#define sq(x) ({ typeof (x) _x = (x); _x * _x; })
#define min(a,b) ({ typeof (a) _a = (a); typeof (b) _b = (b); _a < _b ? _a : _b; })
#define max(a,b) ({ typeof (a) _a = (a); typeof (b) _b = (b); _a > _b ? _a : _b; })
#define round(x) ({ typeof (x) _x = (x); _x >= 0 ? (long)x + 0.5 : (long)x - 0.5 })
#define radians(deg) ((deg) * DEG_TO_RAD)
#define degrees(rad) ((rad) * RAD_TO_DEG)
#define constrain(x,low,high) ({ \
typeof (x) _x = (x); \
typeof (low) _l = (l); \
typeof (high) _h = (h); \
_x < _l ? _l : _x > _h ? _h :_x })
BTW it's very strange that this hasn't been fixed in any of the official Arduino cores yet. The fix is so simple, and the current implementation brings no benefit compared to the "new" one.
On the Arduino reference page they have even documented its "unexpected behavior":
Because of the way the abs() function is implemented, avoid using other functions inside the brackets, it may lead to incorrect results.
abs(a++); // avoid this - yields incorrect results
// use this instead:
abs(a);
a++; // keep other math outside the function
from mcudude_corefiles.
IIRC the rules that the C standard defines that dictate what the optimizer is allowed to do can occasionally get weird and perverse... because even though, in reality, when you roll an integer over, it wraps around, the C standard doesn't guarantee that, and the compiler is allowed to shit on that... I can't find where I put it, but I remember being so flabbergasted by when I encountered it once that I saved it...I thought I put it in github, but can't find it.
It was something like
byte i = 1;
while (i++) {...}
and under certain circumstances (usually not, but occasionally), the compiler would optimize out the test entirely, because i was always going to be 1 because it's never subtracted from, only added to, and it starts out as 1, so it can never be zero... I was stunned that it was allowed to do that! (IIRC I ran into at least one other wacky unexpected optimization there). And like, it wouldn't do that under normal circumstances, either - make anything slightly more complex, and it wouldn't do that... but in the very simple minimal test case, like one writes for understanding things like flash usage like that...... yeah... wtf...
Since then I always just make things volatile when I want to make sure the compiler doesn't pull some wacky optimization out of the twilight zone to throw my test off the mark.
For example, in your test, new_abs1(val) is only calculated once per iteration of loop... this takes the same flash as your version of loop:
void loop() {
digitalWrite(0, HIGH);
int32_t test=new_abs1(val);
delay(test);
digitalWrite(0, LOW);
delay(test);
val--;
}
I don't think that was what you meant was it?
On a t1614, just quickly...
As you wrote, or as I noted above, 872. (ie, it's only calculated once)
volatile val as you have it, 914 (ie, calculated twice)
volatile val as I wrote, 884 (ie, calculated once)
volatile val, as I wrote, but switched to the new_abs instead of new_abs1, 924 (oof!)
volatile val, as you have it, switched to new_abs: 1010 (dear lord!)
Why is new_abs made of lose and fail? I may be wrong, but I am suspicious that typeof() is copying the volatility too, because this:
void loop() {
digitalWrite(0, HIGH);
int32_t test=val; //can't optimize this away
delay(new_abs(test));
digitalWrite(0, LOW);
test=val; // or this, so two accesses
delay(new_abs(test)); //and have to recalculate - compiler can't assume what val, hence test is
val--;
}
or
void loop() {
digitalWrite(0, HIGH);
delay(new_abs((int32_t )val));
digitalWrite(0, LOW);
delay(new_abs((int32_t )val));
val--;
}
both compile to 906....
And this one:
void loop() {
digitalWrite(0, HIGH);
delay(abs(val));
digitalWrite(0, LOW);
delay(abs(val));
val--;
}
with volatile val, 914, 884 if you use a temp variable like I did above...
But yeah - always declare things volatile if you want to make sure the compiler isn't taking shortcuts you don't want it to... but I guess you need to cast away the volatility right when you pass it to a macro, heh... I wasn't expecting that bit...
from mcudude_corefiles.
Oh, wait, the abs() that megaTinyCore is using isn't the same as the one the official core uses, it looks like.... lol... okay, I'm done with this for now, have other things I need to get done this weekend, but I can't find any definition of abs() in megaTinyCore, so it's using the one from stdlib.h... which is __builtin_abs()
But yeah - in this case I think I agree with your conclusion: __builtin_abs() it is, but your method doesn't actually prove it
from mcudude_corefiles.
from mcudude_corefiles.
from mcudude_corefiles.
Hmm - should we be using that flag for our cores, or does it cause other problems?
I only recently became aware of how stupid gcc can be with integer overflow, so I haven't done much testing with -fwrapv.
https://www.avrfreaks.net/forum/failure-truncate-16bits-when-flto
from mcudude_corefiles.
Related Issues (12)
- Unified tab spacing and line feed
- gitlab CI HOT 2
- delay() on 20 & 24MHz is wrong HOT 4
- Accurate delayMicroseconds for 14.7456 MHz, 11.0592 MHz, 7.3728 MHz, 3.6864 MHz and 1.8432 MHz HOT 7
- Travis fails HOT 2
- Radical suggestion HOT 5
- Atmega8a-PU (DIP-28) @ internal 8Mhz issue HOT 4
- Compatibility of delayMicroseconds HOT 3
- Support for ATmega48PB/88PB/168PB/328PB should be considered HOT 13
- How on earth do I update this subtree? HOT 6
- Update CI script? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mcudude_corefiles.