Comments (16)
Yes, I suspect it's something to do with multiprocessing. I set threads_=1, but the issue persists. I may try using Celery or AWS Lambda to spread the load. Feel free to close this issue if you want.
from jimutmap.
I will probably try to find a solution by next week.
Let’s see.
If you solve it before then, please feel free to share the solution by opening a PR :)
Best,
-Jimut
OK, today threads_=1 appears to be OK...
Eventually I will have to use a database, since I need to write the image-stitching module at some point.
This tool was created back in 2019 with the (then hypothetical) goal of converting 2D satellite images to 3D using GANs and other related unsupervised deep learning techniques.
I'm not sure when I will get time to work on the project's original purpose :) But I will come up with a fix for the present bug by next week.
I'm quite happy to leave it running over a weekend on my Mac, so speed is not my main concern. Are the generated filenames unique? One suggestion: the download method could return a dictionary of the created files, requests, etc., which could then be appended to a pandas DataFrame, inserted into an SQLite database, and so on.
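A minimal sketch of that suggestion, assuming a hypothetical `download_tiles` function that returns one record per tile (jimutmap's `download()` is not known to return anything like this). The `fetch` callable, the `{tile_id}.png` naming and the `downloads` table are illustrative assumptions; the records are stored with the standard-library `sqlite3` module.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List


# Hypothetical: jimutmap's download() does not return records like these.
def download_tiles(tile_ids: Iterable[str],
                   fetch: Callable[[str, str], bool]) -> List[Dict]:
    """Call fetch(tile_id, path) for each tile and return one record per tile."""
    records = []
    for tile_id in tile_ids:
        path = f"{tile_id}.png"  # illustrative naming scheme, not jimutmap's
        ok = fetch(tile_id, path)
        records.append({"tile_id": tile_id, "path": path, "ok": int(bool(ok))})
    return records


def save_records(conn: sqlite3.Connection, records: List[Dict]) -> None:
    """Upsert the records into a 'downloads' table keyed by tile_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads "
        "(tile_id TEXT PRIMARY KEY, path TEXT, ok INTEGER)"
    )
    # Named placeholders are filled from each record dict.
    conn.executemany(
        "INSERT OR REPLACE INTO downloads (tile_id, path, ok) "
        "VALUES (:tile_id, :path, :ok)",
        records,
    )
    conn.commit()
```

The same list of dicts can also be passed straight to `pandas.DataFrame(records)` if a DataFrame is preferred over SQLite.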
Using threads=1, I left it running overnight and it completed. OK, thanks for looking into this; I will consider other ways to parallelize if I need to in the future. Cheers
Thanks for opening your first issue here! Be sure to follow the issue template!
I may have run into this bug before but ignored it, thinking it was a network connection issue. Here is a workable idea to solve the issue, though I'm not sure it will be efficient. Let’s keep a temporary database for each run containing the tile IDs, with a marker for each tile image and its corresponding road mask. We need to keep downloading until all the markers are set to 1.
ID | Tile | Road mask |
---|---|---|
XXXXX1_YYYYY1 | 1 | 1 |
XXXXX1_YYYYY2 | 1 | 1 |
XXXXX1_YYYYY3 | 0 | 0 |
... | ... | ... |
I am not sure this will be a suitable solution. It might also happen that some tiles do not download at all, and we would need to repeat this several times. Please share suggestions if you find a better way to mitigate the problem :)
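The marker table above could be kept in SQLite roughly like this. It is a minimal sketch, not jimutmap's actual implementation: the `sanity` table, its `tile`/`mask` columns and the `fetch(tile_id, kind)` callable are assumptions, and `max_rounds` bounds the loop so tiles that never download are not retried forever.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple

KINDS = ("tile", "mask")  # one marker column per file type (assumed layout)


def init_sanity_db(conn: sqlite3.Connection, tile_ids: Iterable[str]) -> None:
    """Create the marker table and register every tile ID with both markers at 0."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sanity ("
        "id TEXT PRIMARY KEY, "
        "tile INTEGER NOT NULL DEFAULT 0, "
        "mask INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT OR IGNORE INTO sanity (id) VALUES (?)",
                     [(tile_id,) for tile_id in tile_ids])
    conn.commit()


def pending(conn: sqlite3.Connection) -> List[Tuple[str, int, int]]:
    """Rows whose tile or mask marker is still 0."""
    return conn.execute(
        "SELECT id, tile, mask FROM sanity WHERE tile = 0 OR mask = 0 ORDER BY id"
    ).fetchall()


def mark_done(conn: sqlite3.Connection, tile_id: str, kind: str) -> None:
    """Set one marker to 1; kind is validated because it becomes a column name."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    conn.execute(f"UPDATE sanity SET {kind} = 1 WHERE id = ?", (tile_id,))
    conn.commit()


def download_until_complete(conn: sqlite3.Connection,
                            fetch: Callable[[str, str], bool],
                            max_rounds: int = 5) -> bool:
    """Retry only the missing files until every marker is 1 or rounds run out."""
    for _ in range(max_rounds):
        rows = pending(conn)
        if not rows:
            return True
        for tile_id, tile_done, mask_done in rows:
            for kind, done in zip(KINDS, (tile_done, mask_done)):
                # Only re-fetch the file types still marked 0 for this tile.
                if not done and fetch(tile_id, kind):
                    mark_done(conn, tile_id, kind)
    return not pending(conn)
```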
It appears the first 2-3 requests go through, but subsequent ones do not. I've tried adding some sleep, without success. Could it be a block on the IP or an issue with the auth?
So your suggestion is essentially to use retries? I've used this approach in a previous role as a data engineer.
I am thinking of using retries, as that came to mind at first glance. If the issue is an IP block or authentication, I am not sure whether retries will work. I also thought it might be an issue with the multiprocessing.pool module: when requests are overloaded, I am assuming the operating system might kill some of them internally. In that case, retries will be our only option :)
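A minimal sketch of per-request retries with exponential backoff and random jitter. `fetch` is a placeholder for whatever downloads a single URL, and retrying only on `OSError` (which includes `ConnectionError` and `TimeoutError`) is an assumption; as noted above, retries will not help if the IP is actually blocked.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[str], T], url: str, attempts: int = 4,
                 base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(url); on a retryable error wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise ValueError("attempts must be >= 1")
```

The `sleep` parameter is injectable only so the backoff can be tested without actually waiting.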
I think the maximum number of threads offered by the CPU (4 in my case) will also work.
Thanks for pinpointing the issue; now I am sure it is a threading issue. The only problem with threads=1 is that it will be very slow compared to higher thread counts, since it downloads the tiles one at a time. But increasing the thread count beyond the hardware's capacity may also slow the computer down: on Linux and Windows-based OSes, for example, it slows down considerably and may even hang (deadlock).
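Capping the worker count at what the hardware offers can be sketched with `os.cpu_count()`. The helper names are hypothetical, the warning text only mirrors the message jimutmap prints later in this thread, and `ThreadPoolExecutor` is used purely for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def effective_workers(requested: int) -> int:
    """Return requested, capped at os.cpu_count() and never below 1."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    if requested > available:
        print(f"Sorry, {requested} -- threads unavailable, "
              f"using maximum CPU threads : {available}")
        return available
    return max(1, requested)


def run_parallel(func, items, requested_workers: int):
    """Map func over items with a capped-size thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=effective_workers(requested_workers)) as pool:
        return list(pool.map(func, items))
```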
Retries will again slow things down, since we check the links repeatedly. It looks like I have to use some buffer mechanism that selectively retries the failed links using the database. That will slow things down considerably, but using multiprocessing within the retries may solve the issue. I have an exam tomorrow. Let’s see; I hope to come up with a workable, efficient solution by next week.
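One way to read the "buffer" idea: run the pending links through a pool, keep only the failures, wait, and retry just those in the next batch. This sketch swaps in `concurrent.futures.ThreadPoolExecutor` for `multiprocessing.Pool` (downloads are I/O-bound), keeps the pending list in memory instead of a database, and assumes a hypothetical `fetch(url)` callable that returns `True` on success; the 15-second default wait only echoes the log later in this thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def _attempt(fetch: Callable[[str], bool], url: str) -> bool:
    """Run one download, treating network-level errors (OSError) as a failure."""
    try:
        return bool(fetch(url))
    except OSError:
        return False


def download_in_batches(urls: Iterable[str], fetch: Callable[[str], bool],
                        workers: int = 4, max_batches: int = 5,
                        wait_seconds: float = 15.0,
                        sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Download urls in parallel batches, re-queuing only the failures.

    Returns the URLs that still failed after max_batches batches.
    """
    pending = list(urls)
    for batch in range(1, max_batches + 1):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(lambda u: _attempt(fetch, u), pending))
        # Keep only the links that failed in this batch.
        pending = [u for u, ok in zip(pending, results) if not ok]
        if pending and batch < max_batches:
            sleep(wait_seconds)  # back off before retrying the failures
    return pending
```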
Hi, it should work now. I created a quick-and-dirty patch; it will probably be a bit slow to start. The patch uses the maximum number of threads your CPU can provide, so the thread count is capped at the hardware limit.
Be sure to install the latest version using pip, then check the test.py file and update your code accordingly.
"""
Jimut Bahan Pal
First updated : 22-03-2021
Last updated : 04-04-2022
"""
import os
import glob
import shutil
from jimutmap import api, sanity_check

download_obj = api(min_lat_deg = 10,
                   max_lat_deg = 10.01,
                   min_lon_deg = 10,
                   max_lon_deg = 10.01,
                   zoom = 19,
                   verbose = False,
                   threads_ = 50,
                   container_dir = "myOutputFolder")

# If you don't have Chrome and can't take advantage of the auto access key fetch, set
# download_obj.ac_key = ACCESS_KEY_STRING
# here

# getMasks = False if you just need the tiles
download_obj.download(getMasks = True)

# create the object of class jimutmap's api
sanity_obj = api(min_lat_deg = 10,
                 max_lat_deg = 10.01,
                 min_lon_deg = 10,
                 max_lon_deg = 10.01,
                 zoom = 19,
                 verbose = False,
                 threads_ = 50,
                 container_dir = "myOutputFolder")

# check for missing tiles and download them (creates temporary *.sqlite files, cleaned up below)
sanity_check(min_lat_deg = 10,
             max_lat_deg = 10.01,
             min_lon_deg = 10,
             max_lon_deg = 10.01,
             zoom = 19,
             verbose = False,
             threads_ = 50,
             container_dir = "myOutputFolder")

print("Cleaning up... hold on")

sqlite_temp_files = glob.glob('*.sqlite*')
print("Temporary sqlite files to be deleted = {} ? ".format(sqlite_temp_files))
inp = input("(y/N) : ")
if inp in ('y', 'yes', 'Y'):
    for item in sqlite_temp_files:
        os.remove(item)

## Try to remove tree; if failed show an error using try...except on screen
try:
    chromedriver_folders = glob.glob('[0-9]*')
    print("Temporary chromedriver folders to be deleted = {} ? ".format(chromedriver_folders))
    inp = input("(y/N) : ")
    if inp in ('y', 'yes', 'Y'):
        for item in chromedriver_folders:
            shutil.rmtree(item)
except OSError as e:
    print(f"Error: {e.filename} - {e.strerror}.")
Kindly let me know whether it works.
Note: this patch also force-downloads all the road masks.
I tried 1.4.0, but the issue persists. I set threads=20 and got this nice warning: Sorry, 20 -- threads unavailable, using maximum CPU threads : 8
Running test.py:
(venv) robin@Robins-MacBook-Pro dataset-global-solar-plant-locations % python3 test.py
Initializing jimutmap ... Please wait...
Sorry, 50 -- threads unavailable, using maximum CPU threads : 8
Initializing jimutmap ... Please wait...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 332.47it/s]
Sorry, 50 -- threads unavailable, using maximum CPU threads : 8
Initializing jimutmap ... Please wait...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 4418.78it/s]
Total satellite images to be downloaded = 225
Total roads tiles to be downloaded = 225
Approx. estimated disk space required = 4.39453125 MB
Total number of satellite images needed to be downloaded = 225
Total number of satellite images needed to be downloaded = 225
Batch ============================================================================= 1
===================================================================================
Sorry, 50 -- threads unavailable, using maximum CPU threads : 8
Downloading all the satellite tiles:
Updating sanity db ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1817.74it/s]
Total number of satellite images needed to be downloaded = 210
Total number of satellite images needed to be downloaded = 210
Waiting for 15 seconds... Busy downloading
Batch ============================================================================= 2
===================================================================================
Downloading all the satellite tiles:
Updating sanity db ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 450/450 [00:00<00:00, 177724.75it/s]
Total number of satellite images needed to be downloaded = 0
Total number of satellite images needed to be downloaded = 0
************************* Download Sucessful *************************
Cleaning up... hold on
Temporary sqlite files to be deleted = ['temp_sanity.sqlite'] ?
(y/N) : y
Temporary chromedriver folders to be deleted = ['99'] ?
(y/N) : y
Could you please tell me how many files are expected to be downloaded, and how many of them are actually being downloaded?
I think increasing the sleep in your code might fix the issue.
import time

from jimutmap import api
from tqdm import tqdm

# df (a DataFrame with 'lat' and 'lon' columns) and img_dir are assumed to be defined earlier (not shown)
for index in tqdm(range(len(df))):
    test_lat = df.iloc[index]['lat']
    test_lon = df.iloc[index]['lon']
    extent = 0.01  # half-width of the bounding box, in degrees
    min_lat_deg = test_lat - extent
    max_lat_deg = test_lat + extent
    min_lon_deg = test_lon - extent
    max_lon_deg = test_lon + extent
    print(min_lat_deg, max_lat_deg, min_lon_deg, max_lon_deg)
    download_obj = api(
        min_lat_deg = min_lat_deg,  # min_lat,
        max_lat_deg = max_lat_deg,  # max_lat,
        min_lon_deg = min_lon_deg,  # min_lon,
        max_lon_deg = max_lon_deg,  # max_lon,
        zoom = 16,  # 0 is min, 17 is good
        verbose = False,
        threads_ = 5,
        container_dir = img_dir
    )
    download_obj.download(getMasks = False)
    time.sleep(100)  # wait for download to finish
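Instead of a fixed `time.sleep(100)`, the loop could poll the output folder until the expected number of tiles has appeared or a timeout expires. A minimal sketch; `wait_for_files` is a hypothetical helper, and the file pattern and expected count are assumptions that would need to match what jimutmap actually writes into `img_dir`.

```python
import glob
import os
import time


def wait_for_files(directory: str, expected: int, pattern: str = "*",
                   timeout: float = 300.0, poll: float = 5.0) -> int:
    """Poll directory until >= expected files match pattern or timeout elapses.

    Returns the number of matching files found on the last check.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = len(glob.glob(os.path.join(directory, pattern)))
        if count >= expected or time.monotonic() >= deadline:
            return count
        # Never sleep past the deadline.
        time.sleep(min(poll, max(0.0, deadline - time.monotonic())))
```

In the loop above, `time.sleep(100)` would become something like `wait_for_files(img_dir, expected)`, where `expected` is the number of tiles you expect for that bounding box; a return value below `expected` signals an incomplete download.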
If I use threads=8, images are downloaded in the first iteration but not in subsequent ones. If I use threads=1, images are downloaded at every iteration.
I am not sure about this. Sorry I couldn't solve it; I give up. It's probably a multiprocessing issue.
How long does it take to download all the files using threads=1?