
Comments (16)

robmarkcole commented on May 25, 2024

Yes, I suspect it has something to do with multiprocessing. I set threads_=1 but the issue persists. I may try using Celery or AWS Lambda to spread the load. Feel free to close this issue if you want.

from jimutmap.

Jimut123 commented on May 25, 2024

I will probably try to find a solution by next week.
Let’s see.
If you solve this before then, please feel free to share the solution by opening a PR :)

Best,
-Jimut

robmarkcole commented on May 25, 2024

OK today threads_=1 appears to be OK...

Jimut123 commented on May 25, 2024

Eventually I will have to use a database anyway, since I need to write the image-stitching module at some point.

This tool was created with the hypothetical idea of converting 2D satellite images to 3D using GANs and other related unsupervised deep learning techniques (back in 2019).

I am not sure when I will get time to work on this project for that initial purpose :) But I will come up with a solution to the present bug by next week.

robmarkcole commented on May 25, 2024

I'm quite happy to leave it running over a weekend on my Mac, so speed is not my main concern. Are the generated filenames unique? One suggestion: the download method could return a dictionary describing the created files, the request, etc. This could then be appended to a pandas DataFrame, inserted into an SQLite database, and so on.
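The suggestion above can be sketched roughly as follows. This is only an illustration: jimutmap's download method does not necessarily return anything like this, and the record keys (`filename`, `url`, `status`), the table name `downloads`, and the helpers `open_log` and `record_download` are all hypothetical.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) a small SQLite log of download results."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        "filename TEXT PRIMARY KEY, url TEXT, status INTEGER)"
    )
    return conn


def record_download(conn, record):
    """Insert one download record (a plain dict) into the log."""
    conn.execute(
        "INSERT OR REPLACE INTO downloads (filename, url, status) "
        "VALUES (:filename, :url, :status)",
        record,
    )


# Hypothetical records of the kind a download call could return.
records = [
    {"filename": "tile_1_1.jpg", "url": "https://example.invalid/1/1", "status": 200},
    {"filename": "tile_1_2.jpg", "url": "https://example.invalid/1/2", "status": 403},
]

conn = open_log()
for rec in records:
    record_download(conn, rec)
conn.commit()

# Anything that did not come back with HTTP 200 is a candidate for a retry.
failed = [
    row[0]
    for row in conn.execute(
        "SELECT filename FROM downloads WHERE status != 200 ORDER BY filename"
    )
]
print(failed)
```

Using the filename as the primary key only makes sense if the generated filenames really are unique, which is the open question in this comment.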

robmarkcole commented on May 25, 2024

Using threads=1, I left it running overnight and it completed. OK, thanks for looking into this; I will consider other ways to parallelize if I need to in the future. Cheers.

welcome commented on May 25, 2024

Thanks for opening your first issue here! Be sure to follow the issue template!

Jimut123 commented on May 25, 2024

I may have run into this bug before but ignored it, as I thought it was a network connection issue. Here is a workable idea for solving it, though I am not sure it will be efficient. Let’s keep a temporary database for each run containing the tile IDs, with one marker for the tile image and one for the corresponding road mask. We continue downloading until all the markers are set to 1.

ID             Tile  Road mask
XXXXX1_YYYYY1  1     1
XXXXX1_YYYYY2  1     1
XXXXX1_YYYYY3  0     0
...            ...   ...

I am not sure whether this is a suitable solution. It may also happen that some tiles do not download at all, so we would need to repeat this several times. Please suggest a better way to mitigate the problem if you find one :)
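A minimal sketch of the marker-table idea, assuming SQLite as the temporary database. The table name `sanity`, its columns, and the `fetch` callback are placeholders for illustration and are not taken from jimutmap's actual implementation.

```python
import sqlite3


def init_db(conn, tile_ids):
    """Create the marker table with every tile initially marked as missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sanity ("
        "id TEXT PRIMARY KEY, tile INTEGER DEFAULT 0, road INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO sanity (id) VALUES (?)", [(t,) for t in tile_ids]
    )
    conn.commit()


def pending(conn):
    """Return the IDs whose tile or road-mask marker is still 0."""
    rows = conn.execute(
        "SELECT id FROM sanity WHERE tile = 0 OR road = 0 ORDER BY id"
    )
    return [row[0] for row in rows]


def run_until_complete(conn, fetch, max_rounds=5):
    """Re-run fetch on pending IDs until none remain or max_rounds is hit."""
    for _ in range(max_rounds):
        todo = pending(conn)
        if not todo:
            break
        for tile_id in todo:
            tile_ok, road_ok = fetch(tile_id)
            # MAX keeps a marker at 1 once it has succeeded in an earlier round.
            conn.execute(
                "UPDATE sanity SET tile = MAX(tile, ?), road = MAX(road, ?) "
                "WHERE id = ?",
                (int(tile_ok), int(road_ok), tile_id),
            )
        conn.commit()
    return pending(conn)


# Stand-in for a real download: one tile fails on its first attempt only.
attempts = {}


def flaky_fetch(tile_id):
    attempts[tile_id] = attempts.get(tile_id, 0) + 1
    ok = not (tile_id == "X1_Y3" and attempts[tile_id] == 1)
    return ok, ok


conn = sqlite3.connect(":memory:")
init_db(conn, ["X1_Y1", "X1_Y2", "X1_Y3"])
left = run_until_complete(conn, flaky_fetch)
print(left, attempts)
```

The `max_rounds` bound addresses the concern that some tiles may never download: the loop stops and reports what is still missing instead of retrying forever.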

robmarkcole commented on May 25, 2024

It appears the first 2-3 requests go through, then subsequent ones do not. I've tried adding some sleep, without success. Could it be a block on the IP or an issue with the auth?

So your suggestion is essentially to use retries? I've used this approach in a previous role as a data engineer.

Jimut123 commented on May 25, 2024

I am thinking of using retries, as that was the first thing that came to mind when I saw the issue. If the problem is an IP block or authentication, I am not sure retries will help. I also suspect it might be an issue with the multiprocessing.pool library: when too many requests are issued at once, I assume the operating system might kill some of them internally. In that case, retries will be our only option :)
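For reference, a per-request retry with exponential backoff might look like the sketch below. It is not jimutmap code: `retry`, its parameters, and the `flaky` stand-in are hypothetical, and catching every `Exception` is a simplification that a real downloader would narrow to network errors.

```python
import time


def retry(func, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # simplification: retry on any error
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Stand-in for a tile request that fails twice and then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated failure")
    return "tile bytes"


# Record the delays instead of actually sleeping, to keep the demo instant.
delays = []
result = retry(flaky, sleep=delays.append)
print(result, delays)
```

If the failures come from an IP block or an expired access key, as raised earlier in the thread, backoff alone would not be expected to help; it only addresses transient failures.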

Jimut123 commented on May 25, 2024

I think using the maximum number of threads offered by the CPU (4 in my case) will also work.

Thanks for pinpointing the issue; now I am sure it is a thread issue. The only problem with threads=1 is that it will be very slow compared to the alternatives, since it processes the tiles sequentially. But increasing the thread count beyond the capacity of the hardware may also slow the computer down; on Linux and Windows, for example, it slows down considerably and may even hang.
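Capping a requested worker count at what the machine reports can be sketched as below. `effective_workers` is a hypothetical helper, not part of jimutmap, and it assumes `os.cpu_count()` is an acceptable measure of the available threads.

```python
import os


def effective_workers(requested, available=None):
    """Clamp the requested worker count to the range [1, available]."""
    if available is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


print(effective_workers(50, available=8))
print(effective_workers(1, available=8))
```

Passing `available` explicitly makes the helper easy to test; in normal use it would be left at its default.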

Retries will slow things down again, since we are checking repeatedly. It looks like I need some buffer mechanism that selectively retries the links using the database. That will slow things down considerably, but using multiprocessing within the retries may solve the issue. I have an exam tomorrow. Let’s see, I hope to come up with a workable, efficient solution by next week.

Jimut123 commented on May 25, 2024

Hi, it should work now. I created a quick-and-dirty patch; it will probably be a bit slow to start. The patch caps the thread count at the maximum number of threads your CPU can provide, so that is the upper limit set by the hardware.

Be sure to install the latest version using pip, then check the test.py file and update your code accordingly.

"""
Jimut Bahan Pal
First updated : 22-03-2021
Last updated : 04-04-2022
"""

import os
import glob
import shutil
from jimutmap import api, sanity_check


download_obj = api(min_lat_deg = 10,
                      max_lat_deg = 10.01,
                      min_lon_deg = 10,
                      max_lon_deg = 10.01,
                      zoom = 19,
                      verbose = False,
                      threads_ = 50, 
                      container_dir = "myOutputFolder")

# If you don't have Chrome and can't take advantage of the auto access key fetch, set
# download_obj.ac_key = ACCESS_KEY_STRING
# here

# getMasks = False if you just need the tiles 
download_obj.download(getMasks = True)

# create the object of class jimutmap's api
sanity_obj = api(min_lat_deg = 10,
                      max_lat_deg = 10.01,
                      min_lon_deg = 10,
                      max_lon_deg = 10.01,
                      zoom = 19,
                      verbose = False,
                      threads_ = 50, 
                      container_dir = "myOutputFolder")

sanity_check(min_lat_deg = 10,
                max_lat_deg = 10.01,
                min_lon_deg = 10,
                max_lon_deg = 10.01,
                zoom = 19,
                verbose = False,
                threads_ = 50, 
                container_dir = "myOutputFolder")

print("Cleaning up... hold on")

sqlite_temp_files = glob.glob('*.sqlite*')

print("Temporary sqlite files to be deleted = {} ? ".format(sqlite_temp_files))
inp = input("(y/N) : ")
if inp == 'y' or inp == 'yes' or inp == 'Y':
    for item in sqlite_temp_files:
        os.remove(item)



# Try to remove the temporary chromedriver folders; on failure, print the error.
try:
    chromedriver_folders = glob.glob('[0-9]*')
    print("Temporary chromedriver folders to be deleted = {} ? ".format(chromedriver_folders))
    inp = input("(y/N) : ")
    if inp in ('y', 'yes', 'Y'):
        for item in chromedriver_folders:
            shutil.rmtree(item)
except OSError as e:
    print("Error: %s - %s." % (e.filename, e.strerror))

Please let me know whether it works.

Note: this patch also forces the download of all road masks.

robmarkcole commented on May 25, 2024

I tried 1.4.0 but the issue persists. I set threads=20 and got this nice warning: Sorry, 20 -- threads unavailable, using maximum CPU threads : 8

Running test.py:

(venv) robin@Robins-MacBook-Pro dataset-global-solar-plant-locations % python3 test.py 
Initializing jimutmap ... Please wait...
Sorry, 50 -- threads unavailable, using maximum CPU threads : 8
Initializing jimutmap ... Please wait...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 332.47it/s]
Sorry, 50 -- threads unavailable, using maximum CPU threads : 8
Initializing jimutmap ... Please wait...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 4418.78it/s]
Total satellite images to be downloaded =  225
Total roads tiles to be downloaded =  225
Approx. estimated disk space required = 4.39453125 MB
Total number of satellite images needed to be downloaded =  225
Total number of satellite images needed to be downloaded =  225
Batch =============================================================================  1
===================================================================================
Sorry, 50 -- threads unavailable, using maximum CPU threads : 8
Downloading all the satellite tiles: 
Updating sanity db ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1817.74it/s]
Total number of satellite images needed to be downloaded =  210
Total number of satellite images needed to be downloaded =  210
Waiting for 15 seconds... Busy downloading
Batch =============================================================================  2
===================================================================================
Downloading all the satellite tiles: 
Updating sanity db ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 450/450 [00:00<00:00, 177724.75it/s]
Total number of satellite images needed to be downloaded =  0
Total number of satellite images needed to be downloaded =  0
************************* Download Sucessful *************************
Cleaning up... hold on
Temporary sqlite files to be deleted = ['temp_sanity.sqlite'] ? 
(y/N) : y
Temporary chromedriver folders to be deleted = ['99'] ? 
(y/N) : y

Jimut123 commented on May 25, 2024

Could you please tell me the expected number of files to be downloaded, and how many are actually being downloaded?

I think increasing the sleep in your code might fix the issue.

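import time

from tqdm import tqdm
from jimutmap import api

# df (with 'lat' and 'lon' columns) and img_dir are defined elsewhere in the caller's script.
for index in tqdm(range(len(df))):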
    test_lat = df.iloc[index]['lat']
    test_lon = df.iloc[index]['lon']
    extent = 0.01

    min_lat_deg = test_lat - extent
    max_lat_deg = test_lat + extent
    min_lon_deg = test_lon - extent
    max_lon_deg = test_lon + extent

    print(min_lat_deg, max_lat_deg, min_lon_deg, max_lon_deg)

    download_obj = api(
        min_lat_deg = min_lat_deg, # min_lat,
        max_lat_deg = max_lat_deg, # max_lat,
        min_lon_deg = min_lon_deg, # min_lon,
        max_lon_deg = max_lon_deg, # max_lon,
        zoom = 16, # 0 is min, 17 is good
        verbose = False,
        threads_ = 5, 
        container_dir = img_dir
        )

    download_obj.download(getMasks = False)
    time.sleep(100) # wait for download to finish

robmarkcole commented on May 25, 2024

If I use threads=8, images are downloaded in the first iteration but not in subsequent ones. If I use threads=1, images are downloaded at every iteration.
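One way to put numbers on this per iteration is to count the files in the output directory before and after each download call. The sketch below uses only the standard library; `count_files`, `files_added_by`, and `fake_download` are hypothetical names, and `fake_download` merely stands in for a real `download_obj.download(...)` call.

```python
import os
import tempfile


def count_files(directory):
    """Count regular files directly inside directory."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


def files_added_by(directory, action):
    """Run action() and return how many new files appeared in directory."""
    before = count_files(directory)
    action()
    return count_files(directory) - before


with tempfile.TemporaryDirectory() as out_dir:

    def fake_download():
        # Stand-in for a real download call writing into out_dir.
        for name in ("tile_a.jpg", "tile_b.jpg"):
            with open(os.path.join(out_dir, name), "wb") as handle:
                handle.write(b"\x00")

    added = files_added_by(out_dir, fake_download)

print(added)
```

Logging this count for every iteration of the loop would show exactly where downloads stop, which also answers the expected-versus-actual question asked above.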

Jimut123 commented on May 25, 2024

I am not sure about this. Sorry, I couldn't solve it; I give up. It is probably a multiprocessing issue.
How long does it take to download all the files using threads=1?
