
Comments (10)

jkhenning commented on July 22, 2024

Hi @aimakhotka, as you've mentioned, you'll indeed need to use the full service endpoint (i.e. https://b-ws-hk0m2-pd11-r87.b1.s3.sbercloud.ru:443/<bucket-name>/...) to specify your non-AWS service, as ClearML has no way to understand you're choosing a non-AWS service otherwise.
To make your life easier, you can use the sdk.development.default_output_uri setting in your clearml.conf file instead of specifying this every time you call Task.init()

from clearml-server.

aimakhotka commented on July 22, 2024

Hi @aimakhotka, as you've mentioned, you'll indeed need to use the full service endpoint (i.e. https://b-ws-hk0m2-pd11-r87.b1.s3.sbercloud.ru:443/<bucket-name>/...) to specify your non-AWS service, as ClearML has no way to understand you're choosing a non-AWS service otherwise. To make your life easier, you can use the sdk.development.default_output_uri setting in your clearml.conf file instead of specifying this every time you call Task.init()

Hi @jkhenning, thank you so much for your reply! The catch is that specifying the sdk.development.default_output_uri parameter doesn't work. I specify the same address in clearml.conf, but it still doesn't work without specifying sdk.development.default_output_uri when calling Task.init(), although in theory everything should work. That's why I decided to ask for help.


jkhenning commented on July 22, 2024

Hi @aimakhotka,

I specify the same address in clearml.conf

Where do you specify it? in the sdk.aws.s3 section?

but it still doesn't work without specifying sdk.development.default_output_uri when calling Task.init(), although in theory everything should work

You should either provide it with sdk.development.default_output_uri or with Task.init(output_uri="https://b-ws-hk0m2-pd11-r87.b1.s3.sbercloud.ru:443/<bucket-name>/...") - are you saying that using one of these methods doesn't work?


aimakhotka commented on July 22, 2024

Hi @jkhenning,

Where do you specify it? in the sdk.aws.s3 section?

No, in the sdk.development.default_output_uri.

are you saying that using one of these methods doesn't work?

Yeah, that's exactly what I'm saying. I specify a sdk.development.default_output_uri, but it's like ClearML doesn't see this parameter in config. The method with Task.init() works, so the problem is not in the S3 path.


jkhenning commented on July 22, 2024

So you:

  1. Set sdk.development.default_output_uri in your clearml.conf file, with the value being https://...:443/bucket/...
  2. Run your Python script locally (on the same machine), which uses Task.init() but specifies no output_uri

And the SDK does not use the default output_uri? Can you attach screenshots of how the task looks in the ClearML UI? Specifically the Execution and Info sections?
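Before comparing behaviours, it can help to confirm that the setting is actually present and uncommented in the config file. The following is a minimal sketch, assuming the file's text has already been read into a string. `find_default_output_uri` is a hypothetical helper and not part of ClearML, and the regular expression only covers the simple quoted `key: "value"` form.

```python
import re
from typing import Optional

# Matches an uncommented line such as:
#     default_output_uri: "s3://host:443/bucket/path/"
# Lines that start with "#" do not match, because only spaces or tabs
# are allowed between the start of the line and the key.
_PATTERN = re.compile(
    r'^[ \t]*default_output_uri[ \t]*[:=][ \t]*"([^"]*)"',
    re.MULTILINE,
)


def find_default_output_uri(conf_text: str) -> Optional[str]:
    """Return the first uncommented default_output_uri value, or None."""
    match = _PATTERN.search(conf_text)
    return match.group(1) if match else None


# Illustrative config text; the host and bucket here are placeholders.
sample = '''
sdk {
    development {
        # default_output_uri: "s3://commented-out/"
        default_output_uri: "https://example.invalid:443/bucket/path/"
    }
}
'''

print(find_default_output_uri(sample))
```

This only checks that the key exists in the text it is given; it does not tell you which config file the SDK actually loaded.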


aimakhotka commented on July 22, 2024

Can you attach screenshots of how the task looks in the ClearML UI? Specifically the Execution and Info sections?

Yes, sure.

Here's what happens in the terminal

$ python3 artifacts.py 
ClearML Task: created new task id=7c73ee92bb4a4902b63bd2d7c9e88540
2023-11-10 14:29:52,039 - clearml.storage - ERROR - Failed uploading: Could not connect to the endpoint URL: "https://b-ws-hk0m2-pd11-r87.s3.n-ws-hk0m2-pd11.amazonaws.com/.clearml.2bc71333-b21a-48ee-b875-feb5f2372a15.test"
Traceback (most recent call last):
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/socket.py", line 953, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/httpsession.py", line 464, in send
    urllib_response = conn.urlopen(
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/util/retry.py", line 525, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
    conn.connect()
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connection.py", line 363, in connect
    self.sock = conn = self._new_conn()
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7fe80bb6c5e0>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/storage/helper.py", line 2741, in check_write_permissions
    self.delete(path=dest_path)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/storage/helper.py", line 2726, in delete
    return self._driver.delete_object(self.get_object(path))
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/storage/helper.py", line 599, in delete_object
    object.delete()
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/boto3/resources/factory.py", line 580, in do_action
    response = action(self, *args, **kwargs)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/boto3/resources/action.py", line 88, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/client.py", line 963, in _make_api_call
    http, parsed_response = self._make_request(
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/client.py", line 986, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/endpoint.py", line 202, in _send_request
    while self._needs_retry(
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/endpoint.py", line 354, in _needs_retry
    responses = self._event_emitter.emit(
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 207, in __call__
    if self._checker(**checker_kwargs):
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 284, in __call__
    should_retry = self._should_retry(
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 320, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 363, in __call__
    checker_response = checker(
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 247, in __call__
    return self._check_caught_exception(
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 416, in _check_caught_exception
    raise caught_exception
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/endpoint.py", line 281, in _do_get_response
    http_response = self._send(request)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/endpoint.py", line 377, in _send
    return self.http_session.send(request)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/httpsession.py", line 493, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://b-ws-hk0m2-pd11-r87.s3.n-ws-hk0m2-pd11.amazonaws.com/.clearml.2bc71333-b21a-48ee-b875-feb5f2372a15.test"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aimakhotka/Documents/github/sber/clearml_doc/src/files/artifacts.py", line 56, in <module>
    main()
  File "/home/aimakhotka/Documents/github/sber/clearml_doc/src/files/artifacts.py", line 13, in main
    task = Task.init(project_name='test_cloud', task_name='jkhenning_test')
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/task.py", line 593, in init
    task.output_uri = task.get_project_object().default_output_destination
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/task.py", line 1124, in output_uri
    helper.check_write_permissions(value)
  File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/storage/helper.py", line 2743, in check_write_permissions
    raise ValueError("Insufficient permissions (delete failed) for {}".format(base_url))
ValueError: Insufficient permissions (delete failed) for s3://b-ws-hk0m2-pd11-r87
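The endpoint in the error above ends in amazonaws.com and appears to be built from the bucket name and the region value, rather than pointing at the sbercloud.ru host. A small sketch of how to check which kind of host a URL targets, using only the standard library; `is_aws_endpoint` is a hypothetical helper for illustration:

```python
from urllib.parse import urlsplit


def is_aws_endpoint(url: str) -> bool:
    """Return True if the URL's host is amazonaws.com or a subdomain of it."""
    host = urlsplit(url).hostname or ""
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


# URL copied from the EndpointConnectionError in the traceback.
failing = (
    "https://b-ws-hk0m2-pd11-r87.s3.n-ws-hk0m2-pd11.amazonaws.com/"
    ".clearml.2bc71333-b21a-48ee-b875-feb5f2372a15.test"
)
# URL configured as default_output_uri in the posted clearml.conf.
configured = (
    "https://n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443/"
    "b-ws-hk0m2-pd11-r87/test_clearml_s3_artifacts/"
)

print(is_aws_endpoint(failing))
print(is_aws_endpoint(configured))
```

If the first check is true while the configured URL is not an AWS one, the request was most likely resolved as a plain AWS S3 bucket instead of the custom endpoint.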

Screenshots of ClearML UI

[three screenshots of the task in the ClearML UI; images not preserved]

Configuration, Artifacts, Console, Scalars, Plots, and Debug Samples are empty.

My clearml.conf


# ClearML SDK configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server: http://localhost:8008
    web_server: http://localhost:8080
    files_server: https://http://localhost:8081/

    # Credentials are generated using the webapp, http://62.113.97.251:8080/settings
    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {"access_key": "***", "secret_key": "***"}
}
sdk {
    # ClearML - default SDK configuration

    storage {
        cache {
            # Defaults to <system_temp_folder>/clearml_cache
            default_base_dir: "~/.clearml/cache"
            # default_cache_manager_size: 100
        }

        direct_access: [
            # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
            # or cached, and any download request will return a direct reference.
            # Objects are specified in glob format, available for url and content_type.
            { url: "file://*" }  # file-urls are always directly referenced
        ]
    }

    metrics {
        # History size for debug files per metric/variant. For each metric/variant combination with an attached file
        # (e.g. debug image event), file names for the uploaded files will be recycled in such a way that no more than
        # X files are stored in the upload destination for each metric/variant combination.
        file_history_size: 100

        # Max history size for matplotlib imshow files per plot title.
        # File names for the uploaded images will be recycled in such a way that no more than
        # X images are stored in the upload destination for each matplotlib plot title.
        matplotlib_untitled_history_size: 100

        # Limit the number of digits after the dot in plot reporting (reducing plot report size)
        # plot_max_num_digits: 5

        # Settings for generated debug images
        images {
            format: JPEG
            quality: 87
            subsampling: 0
        }

        # Support plot-per-graph fully matching Tensorboard behavior (i.e. if this is set to true, each series should have its own graph)
        tensorboard_single_series_per_graph: false
    }

    network {
        # Number of retries before failing to upload file
        file_upload_retries: 3

        metrics {
            # Number of threads allocated to uploading files (typically debug images) when transmitting metrics for
            # a specific iteration
            file_upload_threads: 4

            # Warn about upload starvation if no uploads were made in specified period while file-bearing events keep
            # being sent for upload
            file_upload_starvation_warning_sec: 120
        }

        iteration {
            # Max number of retries when getting frames if the server returned an error (http code 500)
            max_retries_on_server_error: 5
            # Backoff factory for consecutive retry attempts.
            # SDK will wait for {backoff factor} * (2 ^ ({number of total retries} - 1)) between retries.
            retry_backoff_factor_sec: 10
        }
    }
    aws {
        s3 {
            # S3 credentials, used for read/write access by various SDK elements

            # The following settings will be used for any bucket not specified below in the "credentials" section
            # ---------------------------------------------------------------------------------------------------


            key: "***"
            secret: "***"
            region: "n-ws-hk0m2-pd11"

            # Or enable credentials chain to let Boto3 pick the right credentials.
            # This includes picking credentials from environment variables,
            # credential file and IAM role using metadata service.
            # Refer to the latest Boto3 docs
            use_credentials_chain: true
            # Additional ExtraArgs passed to boto3 when uploading files. Can also be set per-bucket under "credentials".
            extra_args: {

            }
            # ---------------------------------------------------------------------------------------------------

            credentials: [
		        {
                     #  This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
                     host: "n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443"
                     # Specify explicit keys
		     bucket: "b-ws-hk0m2-pd11-r87"
                     multipart: false
                     secure: true
                 }
            	]
        }
        boto3 {
            pool_connections: 512
            max_multipart_concurrency: 16
            multipart_threshold: 8388608 # 8MB
            multipart_chunksize: 8388608 # 8MB
        }
    }
    google.storage {
        # # Default project and credentials file
        # # Will be used when no bucket configuration is found
        # project: "clearml"
        # credentials_json: "/path/to/credentials.json"
        # pool_connections: 512
        # pool_maxsize: 1024

        # # Specific credentials per bucket and sub directory
        # credentials = [
        #     {
        #         bucket: "my-bucket"
        #         subdir: "path/in/bucket" # Not required
        #         project: "clearml"
        #         credentials_json: "/path/to/credentials.json"
        #     },
        # ]
    }
    azure.storage {
        # max_connections: 2

        # containers: [
        #     {
        #         account_name: "clearml"
        #         account_key: "secret"
        #         # container_name:
        #     }
        # ]
    }

    log {
        # debugging feature: set this to true to make null log propagate messages to root logger (so they appear in stdout)
        null_log_propagate: false
        task_log_buffer_capacity: 66

        # disable urllib info and lower levels
        disable_urllib3_info: true
    }

    development {
        # Development-mode options

        # dev task reuse window
        task_reuse_time_window_in_hours: 72.0

        # Run VCS repository detection asynchronously
        vcs_repo_detect_async: true

        # Store uncommitted git/hg source code diff in experiment manifest when training in development mode
        # This stores "git diff" or "hg diff" into the experiment's "script.requirements.diff" section
        store_uncommitted_code_diff: true

        # Support stopping an experiment in case it was externally stopped, status was changed or task was reset
        support_stopping: true

        # Default Task output_uri. if output_uri is not provided to Task.init, default_output_uri will be used instead.
        default_output_uri: "https://n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443/b-ws-hk0m2-pd11-r87/test_clearml_s3_artifacts/"

        # Default auto generated requirements optimize for smaller requirements
        # If True, analyze the entire repository regardless of the entry point.
        # If False, first analyze the entry point script, if it does not contain other to local files,
        # do not analyze the entire repository.
        force_analyze_entire_repo: false

        # If set to true, *clearml* update message will not be printed to the console
        # this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
        suppress_update_message: false

        # If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
        detect_with_pip_freeze: false

        # Log specific environment variables. OS environments are listed in the "Environment" section
        # of the Hyper-Parameters.
        # multiple selected variables are supported including the suffix '*'.
        # For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
        # This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
        # Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
        log_os_environments: []

        # Development mode worker
        worker {
            # Status report period in seconds
            report_period_sec: 2

            # The number of events to report
            report_event_flush_threshold: 100

            # ping to the server - check connectivity
            ping_period_sec: 30

            # Log all stdout & stderr
            log_stdout: true

            # Carriage return (\r) support. If zero (0) \r treated as \n and flushed to backend
            # Carriage return flush support in seconds, flush consecutive line feeds (\r) every X (default: 10) seconds
            console_cr_flush_period: 10

            # compatibility feature, report memory usage for the entire machine
            # default (false), report only on the running process and its sub-processes
            report_global_mem_used: false

            # if provided, start resource reporting after this amount of seconds
            #report_start_sec: 30
        }
    }

    # Apply top-level environment section from configuration into os.environ
    apply_environment: false
    # Top-level environment section is in the form of:
    #   environment {
    #     key: value
    #     ...
    #   }
    # and is applied to the OS environment as `key=value` for each key/value pair

    # Apply top-level files section from configuration into local file system
    apply_files: false
    # Top-level files section allows auto-generating files at designated paths with a predefined contents
    # and target format. Options include:
    #  contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
    #  format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
    #          base64-encoded contents string, otherwise ignored
    #  path: the target file's path, may include ~ and inplace env vars
    #  target_format: format used to encode contents before writing into the target file. Supported values are json,
    #                 yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
    #  overwrite: overwrite the target file in case it exists. Default is true.
    #
    # Example:
    #   files {
    #     myfile1 {
    #       contents: "The quick brown fox jumped over the lazy dog"
    #       path: "/tmp/fox.txt"
    #     }
    #     myjsonfile {
    #       contents: {
    #         some {
    #           nested {
    #             value: [1, 2, 3, 4]
    #           }
    #         }
    #       }
    #       path: "/tmp/test.json"
    #       target_format: json
    #     }
    #   }
}


jkhenning commented on July 22, 2024

You're specifying both the global sdk.aws.s3 settings:

            key: "***"
            secret: "***"
            region: "n-ws-hk0m2-pd11"
            use_credentials_chain: true

As well as the bucket-specific:

            credentials: [
		        {
                     #  This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
                     host: "n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443"
                     # Specify explicit keys
		     bucket: "b-ws-hk0m2-pd11-r87"
                     multipart: false
                     secure: true
                 }
            	]

You should only specify the bucket-specific one and not use the credentials chain, can you please try it out?
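The advice above can be expressed as a small check over the sdk.aws.s3 section once it has been loaded into a dictionary. This is a sketch only: `find_s3_conflicts` is a hypothetical helper, not a ClearML function, and placing `key` and `secret` inside the bucket entry in the second example is an assumption, since the thread does not show the final config.

```python
from typing import Any, Dict, List


def find_s3_conflicts(s3_cfg: Dict[str, Any]) -> List[str]:
    """List settings that clash with bucket-specific credentials."""
    problems: List[str] = []
    has_bucket_entries = bool(s3_cfg.get("credentials"))
    has_global_keys = bool(s3_cfg.get("key") or s3_cfg.get("secret"))
    if has_bucket_entries and s3_cfg.get("use_credentials_chain"):
        problems.append("use_credentials_chain is true alongside bucket-specific credentials")
    if has_bucket_entries and has_global_keys:
        problems.append("global key/secret set alongside bucket-specific credentials")
    return problems


# Mirrors the sdk.aws.s3 section as posted (secrets masked as in the thread).
posted = {
    "key": "***",
    "secret": "***",
    "region": "n-ws-hk0m2-pd11",
    "use_credentials_chain": True,
    "credentials": [
        {
            "host": "n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443",
            "bucket": "b-ws-hk0m2-pd11-r87",
            "multipart": False,
            "secure": True,
        }
    ],
}

# Assumed shape after the change: only the bucket-specific entry, no chain.
suggested = {
    "use_credentials_chain": False,
    "credentials": [
        {
            "host": "n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443",
            "bucket": "b-ws-hk0m2-pd11-r87",
            "key": "***",      # assumption: keys live in the bucket entry
            "secret": "***",
            "multipart": False,
            "secure": True,
        }
    ],
}

for name, cfg in (("posted", posted), ("suggested", suggested)):
    print(name, find_s3_conflicts(cfg))
```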


aimakhotka commented on July 22, 2024

You should only specify the bucket-specific one and not use the credentials chain, can you please try it out?

That helped! But default_output_uri had to be specified in the format s3://...:443/bucket-name/....
Thank you so much! Why is this happening? What does the use_credentials_chain parameter do?
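For reference, the s3:// form mentioned here keeps the same host, port, and path as the https:// URL and changes only the scheme. A minimal sketch of that conversion with the standard library; `to_s3_uri` is a hypothetical helper, and the input is the default_output_uri value from the posted clearml.conf:

```python
from urllib.parse import urlsplit


def to_s3_uri(url: str) -> str:
    """Rewrite an http(s) endpoint URL as s3://host:port/bucket/path."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https", "s3"):
        raise ValueError("unsupported scheme: {!r}".format(parts.scheme))
    # netloc keeps the explicit port, path keeps the bucket and prefix.
    return "s3://{}{}".format(parts.netloc, parts.path)


original = (
    "https://n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443/"
    "b-ws-hk0m2-pd11-r87/test_clearml_s3_artifacts/"
)
print(to_s3_uri(original))
```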


jkhenning commented on July 22, 2024

This parameter basically tells boto3 to look for credentials in the system's configuration or in an AWS role (in case it's running on an AWS machine) and not use the explicitly provided credentials


aimakhotka commented on July 22, 2024

This parameter basically tells boto3 to look for credentials in the system's configuration or in an AWS role (in case it's running on an AWS machine) and not use the explicitly provided credentials

Ooh, I see. In all the examples I saw, this parameter was "true" and I didn't really understand the description in the documentation, so I didn't even think about it. Thanks, you really helped me out!

