Comments (11)
Original comment by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).
Hi Xavier,
Oh dear. That seems bad.
It is quite possible that you have run into a problem with the way rios handles things when using the multiprocessing option. That option has never been very well tested, as we use some of the other parallel options instead.
However, before we get too far into that, I should say that I would be surprised if Fmask benefits much from running in parallel. Most of it is (I think) largely I/O limited, rather than compute limited. The one part which is strongly compute limited is the matching of clouds to shadows, and this is not written so as to be able to benefit from parallel processing, and runs across the whole image at once (because it does not know how far it will have to search to find the shadow). So, I would expect that running parallel in the way you are suggesting will not actually help much.
Do you think that is correct?
It is very brave of you to try this, by the way. :-)
Neil
from python-fmask.
Original comment by Xavier Corredor Llano (Bitbucket: XavierCLL, GitHub: XavierCLL).
Hi Neil
Yes, you are right: some processes may need to compute over the whole image, and this makes it impossible to run them in parallel. However, I would like to know which is the best (most tested) option for running in parallel in the RIOS library. The parallel processing works for some parts of the code and it increases the speed of the run (somewhat). I recommend it (maybe this could be a new feature, so that the user can choose whether to run in parallel).
On the other hand, thinking about improving the run time, I saw some processes that may be unnecessary, such as "Computing Statistics" and "Computing Pyramid Layers". Are they really necessary for the Fmask process? Can I disable them? How?
thanks a lot :)
Original comment by Sam Gillingham (Bitbucket: gillins, GitHub: gillins).
Hi Xavier,
It turns out that landsatTOA.py calls info.getNoDataValueFor() which does not work when running in parallel. I've added a paragraph to the RIOS docstring about this: https://bitbucket.org/chchrsc/rios/commits/2131b9e9381e6621978c0a7dc00d39a9f3b4f0c7.
There is no reason why RIOS could not cache the no data value in the ReaderInfo object rather than calling through to GDAL each time, and this would make it run in parallel. Feel free to open a PR against RIOS and we can discuss the finer details.
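The caching idea can be sketched in a few lines. This is only an illustration: `CachedNoDataInfo`, `nodata_for` and `fake_gdal_lookup` are hypothetical names invented here, not part of the RIOS or GDAL APIs, and the sketch assumes that the values can be looked up once, before any work is handed to a sub process.

```python
class CachedNoDataInfo:
    """Hypothetical stand-in for a reader-info object (not the RIOS API)."""

    def __init__(self, lookup, names):
        # Ask the (expensive) lookup once per name, up front, and keep
        # the answers as plain Python values.
        self._nodata = {name: lookup(name) for name in names}

    def nodata_for(self, name):
        # Served from the cache; no call through to the lookup is made.
        return self._nodata[name]


lookup_calls = []


def fake_gdal_lookup(name):
    # Placeholder for the real call through to GDAL; values are made up.
    lookup_calls.append(name)
    return {"toa": 0, "thermal": 32767}[name]


info = CachedNoDataInfo(fake_gdal_lookup, ["toa", "thermal"])

# Repeated queries do not trigger any further lookups.
values = [info.nodata_for("toa") for _ in range(3)]
print(values, len(lookup_calls))
```

Because the object then holds only plain values, it no longer depends on anything that has to be reached through GDAL at the time a block is processed.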
However, I would echo Neil's sentiments about speed. Try removing the call to getNoDataValueFor and see what happens. I tend to find that for simple calculations like this things tend to run slower, not faster, due to the extra communication overhead.
Regarding pyramid layers and statistics, you can disable them with this call on the controls object: http://rioshome.org/en/latest/rios_applier.html#rios.applier.ApplierControls.setCalcStats
Sam.
Original comment by Sam Gillingham (Bitbucket: gillins, GitHub: gillins).
I should have said that the reason for the weird error was that Python 2.x can't seem to report the exception in the sub process properly. If you run with Python 3.x, you will see the exception raised when calling getNoDataValueFor() in the sub process. See http://stackoverflow.com/questions/29080650/debugging-errors-in-python-multiprocessing.
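One common way to make sub-process failures visible regardless of Python version is to catch the exception inside the worker and send the formatted traceback back as text. The following is a minimal sketch of that pattern, not what RIOS does internally; `call_and_capture`, `read_nodata` and `run_blocks` are names made up for this example.

```python
import multiprocessing
import traceback


def call_and_capture(task):
    """Run func(*args); return ("ok", result) or ("error", traceback text)."""
    func, args = task
    try:
        return ("ok", func(*args))
    except Exception:
        # format_exc() turns the active exception into a plain string,
        # which can always be sent back to the parent process.
        return ("error", traceback.format_exc())


def read_nodata(block_id):
    # Hypothetical worker that always fails, standing in for the failing call.
    raise RuntimeError("no data lookup failed for block %d" % block_id)


def run_blocks(block_ids, processes=2):
    """Run read_nodata over block_ids in a pool and return the status tuples."""
    tasks = [(read_nodata, (block_id,)) for block_id in block_ids]
    with multiprocessing.Pool(processes) as pool:
        return pool.map(call_and_capture, tasks)
```

To try it with real sub processes, call `run_blocks([0, 1])` from inside an `if __name__ == "__main__":` guard and print the second element of each returned tuple.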
Original comment by Xavier Corredor Llano (Bitbucket: XavierCLL, GitHub: XavierCLL).
Hi Sam, thanks for your explanation,
I tried replacing the info.getNoDataValueFor() call with None in landsatTOA.py and it works fine! However, in fmask.py, to enable multiprocessing in doPotentialCloudFirstPass, I also replaced info.getNoDataValueFor() with None in potentialCloudFirstPass, and the result is different from the non-parallel version. The error also persists in finalizeAll in fmask.py when changing the ApplierControls.
On the other hand, I passed False to setCalcStats on all the RIOS controls instances to disable "Computing Statistics" and "Computing Pyramid Layers". This improves the run time, which now needs about 50%! However, the result is very different from the normal run (mainly for the shadow), which is weird. I thought that this process was independent. Is setCalcStats part of the Fmask process?
Thanks a lot
Original comment by Sam Gillingham (Bitbucket: gillins, GitHub: gillins).
Hi Xavier
Neil may comment more when he is back from holidays, but getting and seeing the no data (which is what calculating the stats also does) is very important for fmask operation so no surprise you are getting different answers.
My reason for suggesting that you do this was to show that the speed is actually worse when running in parallel. I'm guessing this is what you found, correct?
Sam
Original comment by Xavier Corredor Llano (Bitbucket: XavierCLL, GitHub: XavierCLL).
Hi Sam,
Thanks. It is a good idea to know more about that; maybe Neil can help me with it.
In some tests with some images, the partial parallel processing improves the run time, although not by much. For me it is still better to implement the parallel option, even if it is partial.
Thanks
Original comment by Sam Gillingham (Bitbucket: gillins, GitHub: gillins).
Hi Xavier,
I'm interested that you seem to get an improvement in speed. On my AMD machine running Linux Mint running fmask_usgsLandsatTOA.py with 4 CPUs is actually 6% slower than one. So multiprocessing seems a waste of time in this case.
Are you getting different results from this? And if so, what sort of hardware do you have?
Sam.
Original comment by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).
Hi Xavier,
I have taken a moment to look into all this. There are several points to discuss.
I have modified the null value handling in fmask.py, so that it never uses getNoDataValueFor(), and this should allow it to be run with parallel, if you wish. However, parallel can only be used on the third pass and the final pass sections. Both the first and second passes are accumulating information between processing blocks, so these blocks cannot be run in parallel. I have tested this, and found no useful speed-up, with up to 30 threads. I believe that this is because the processing is I/O limited.
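The difference between the two kinds of pass can be shown with a toy example. This is not the Fmask algorithm: the blocks are plain lists, and `first_pass` and `later_pass` are hypothetical names. It only illustrates a pass in which every block updates one running accumulator, followed by a pass in which each block depends on nothing but itself and a fixed value.

```python
def first_pass(blocks):
    """Sequential: every block updates the same running totals in turn."""
    total = 0
    count = 0
    for block in blocks:
        total += sum(block)
        count += len(block)
    # A whole-image statistic, only known once all blocks have been seen.
    return total / count


def later_pass(block, threshold):
    """Independent: the output depends only on this block and a fixed value."""
    return [value > threshold for value in block]


blocks = [[1, 2, 3], [10, 20, 30], [4, 5, 6]]

threshold = first_pass(blocks)
masks = [later_pass(block, threshold) for block in blocks]
print(threshold, masks)
```

Each call to `later_pass` could be given to a different worker in any order, whereas `first_pass` as written relies on a single accumulator that is updated block by block.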
I modified landsatTOA.py to avoid calculating the stats and pyramids, and instead to set the null value explicitly on the output file. It is important to have the null value set on the TOA image file, as this is used to exclude the null area from consideration in the Fmask algorithm (very important to do this). Doing this made the TOA calculation nearly 50% faster, as you said, but it is only about 10 seconds on my hardware, so I had not really felt it to be important. However, I have committed this change, too, if you think it is helpful. I am guessing that the lack of the null value would have been the reason for the difference in results which you found. I get identical results with these changes.
The stats and pyramids are not calculated in any of the intermediate images from fmask.py - I already had setCalcStats(False) in all cases. The stats and pyramids are calculated for the final output cloud image, but I believe that this is useful, because it makes the image easier to view. It takes only 4 seconds on my hardware, so I am going to leave that in place.
Thanks for being so keen on this project. It is really nice to have a strong interest from users. I hope that this discussion has been helpful to you.
Neil
Original comment by Xavier Corredor Llano (Bitbucket: XavierCLL, GitHub: XavierCLL).
Hi Neil and Sam,
Sorry for the delay, I was very busy the last week.
Neil, you are right, and thanks for your explanation. I made a little test: first I ran Fmask normally, then I enabled the parallel processing, and finally I disabled the stats for some processes (not for the final result). The results are (on my little laptop with 8 cores, for a Landsat 8 image; times for the angle, saturation, TOA and stacked functions):
- time run normal (without parallel): 4:34
- time run (+ parallel): 4:18
- time run (+ parallel and - stats): 4:09
And yes, the parallel processing is not useful for improving the speed, as you said.
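For reference, the three timings can be converted into relative savings with a few lines of Python. The sketch assumes the times are given as minutes:seconds; `to_seconds` and `percent_saved` are helper names made up for this example.

```python
def to_seconds(mmss):
    """Convert a "minutes:seconds" string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_saved(baseline, other):
    """Percentage of the baseline time saved by the other run."""
    return 100.0 * (baseline - other) / baseline


timings = [
    ("normal", "4:34"),
    ("parallel", "4:18"),
    ("parallel, no stats", "4:09"),
]

baseline = to_seconds(timings[0][1])
for label, value in timings:
    seconds = to_seconds(value)
    saved = percent_saved(baseline, seconds)
    print("%-20s %3d s  %4.1f%% less time than normal" % (label, seconds, saved))
```

Under that assumption the runs take 274, 258 and 249 seconds, so the parallel run saves about 5.8% and the parallel run without stats about 9.1% relative to the normal run.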
I made a clone with these changes; maybe it is useful for you or for someone interested in this topic: https://bitbucket.org/XavierCLL/python-fmask
- Commit for parallel: https://bitbucket.org/XavierCLL/python-fmask/commits/68f13ce9d786bdbcb2e87d95c513815e9be39c0c
- Commit for disabling setCalcStats in some functions: https://bitbucket.org/XavierCLL/python-fmask/commits/147a1976e4842a0bcb0feef678a24b5ce70acd41
Thanks a lot, Neil and Sam,
Regards
Original comment by Neil Flood (Bitbucket: neilflood, GitHub: neilflood).
Hi Xavier,
thanks for all that. Good to know that my intuitions on this were correct. Thanks for investigating it thoroughly.
cheers
Neil