Comments (10)
Hi @aimakhotka, as you've mentioned, you'll indeed need to use the full service endpoint (i.e. https://b-ws-hk0m2-pd11-r87.b1.s3.sbercloud.ru:443/<bucket-name>/...) to specify your non-AWS service, as ClearML has no way to understand you're choosing a non-AWS service otherwise.
To make your life easier, you can use the sdk.development.default_output_uri setting in your clearml.conf file instead of specifying this every time you call Task.init().
from clearml-server.
> Hi @aimakhotka, as you've mentioned, you'll indeed need to use the full service endpoint (i.e. https://b-ws-hk0m2-pd11-r87.b1.s3.sbercloud.ru:443/<bucket-name>/...) to specify your non-AWS service, as ClearML has no way to understand you're choosing a non-AWS service otherwise. To make your life easier, you can use the sdk.development.default_output_uri setting in your clearml.conf file instead of specifying this every time you call Task.init()
Hi @jkhenning, thank you so much for your reply! The catch is that specifying the sdk.development.default_output_uri parameter doesn't work. I specify the same address in clearml.conf, but it still doesn't work without specifying sdk.development.default_output_uri when calling Task.init(), although in theory everything should work. That's why I decided to ask for help.
Hi @aimakhotka,
> I specify the same address in clearml.conf
Where do you specify it? In the sdk.aws.s3 section?
> but it still doesn't work without specifying sdk.development.default_output_uri when calling Task.init(), although in theory everything should work
You should either provide it with sdk.development.default_output_uri or with Task.init(output_uri="https://b-ws-hk0m2-pd11-r87.b1.s3.sbercloud.ru:443/<bucket-name>/..."). Are you saying that using one of these methods doesn't work?
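The second of the two options mentioned here, passing output_uri directly to Task.init(), can be sketched as follows. This is a minimal illustration, not the author's script: the helper names build_init_kwargs and start_task are hypothetical, and the bucket name and sub-path in the URI are left elided exactly as in the thread.

```python
from typing import Optional

# Endpoint quoted in the thread; the bucket name and sub-path are elided there
# and are kept as placeholders here.
S3_ENDPOINT_URI = "https://b-ws-hk0m2-pd11-r87.b1.s3.sbercloud.ru:443/<bucket-name>/..."


def build_init_kwargs(project_name: str, task_name: str,
                      output_uri: Optional[str] = None) -> dict:
    """Collect keyword arguments for Task.init() (hypothetical helper).

    When output_uri is None the key is omitted, so the SDK is expected to fall
    back to sdk.development.default_output_uri from clearml.conf.
    """
    kwargs = {"project_name": project_name, "task_name": task_name}
    if output_uri is not None:
        kwargs["output_uri"] = output_uri
    return kwargs


def start_task(project_name: str, task_name: str,
               output_uri: Optional[str] = None):
    """Start a ClearML task, optionally with an explicit output destination."""
    # Imported lazily so build_init_kwargs() works without clearml installed.
    from clearml import Task
    return Task.init(**build_init_kwargs(project_name, task_name, output_uri))
```

Calling start_task("test_cloud", "jkhenning_test", S3_ENDPOINT_URI) corresponds to the explicit variant; leaving out the third argument relies on the value from clearml.conf.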
Hi @jkhenning,
> Where do you specify it? In the sdk.aws.s3 section?
No, in sdk.development.default_output_uri.
> Are you saying that using one of these methods doesn't work?
Yeah, that's exactly what I'm saying. I specify sdk.development.default_output_uri, but it's as if ClearML doesn't see this parameter in the config. The method with Task.init() works, so the problem is not in the S3 path.
So you:
- Set sdk.development.default_output_uri in your clearml.conf file, with the value being https://...:443/bucket/...
- Run your Python script locally (on the same machine), which uses Task.init() but specifies no output_uri
And the SDK does not use the default output_uri? Can you attach screenshots of how the task looks in the ClearML UI? Specifically the Execution and Info sections?
> Can you attach screenshots of how the task looks in the ClearML UI? Specifically the Execution and Info sections?
Yes, sure.
Here's what happens in the terminal:
$ python3 artifacts.py
ClearML Task: created new task id=7c73ee92bb4a4902b63bd2d7c9e88540
2023-11-10 14:29:52,039 - clearml.storage - ERROR - Failed uploading: Could not connect to the endpoint URL: "https://b-ws-hk0m2-pd11-r87.s3.n-ws-hk0m2-pd11.amazonaws.com/.clearml.2bc71333-b21a-48ee-b875-feb5f2372a15.test"
Traceback (most recent call last):
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/socket.py", line 953, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/httpsession.py", line 464, in send
urllib_response = conn.urlopen(
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connectionpool.py", line 799, in urlopen
retries = retries.increment(
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/util/retry.py", line 525, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connectionpool.py", line 715, in urlopen
httplib_response = self._make_request(
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connectionpool.py", line 404, in _make_request
self._validate_conn(conn)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
conn.connect()
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connection.py", line 363, in connect
self.sock = conn = self._new_conn()
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7fe80bb6c5e0>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/storage/helper.py", line 2741, in check_write_permissions
self.delete(path=dest_path)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/storage/helper.py", line 2726, in delete
return self._driver.delete_object(self.get_object(path))
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/storage/helper.py", line 599, in delete_object
object.delete()
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/boto3/resources/factory.py", line 580, in do_action
response = action(self, *args, **kwargs)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/boto3/resources/action.py", line 88, in __call__
response = getattr(parent.meta.client, operation_name)(*args, **params)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/client.py", line 535, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/client.py", line 963, in _make_api_call
http, parsed_response = self._make_request(
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/client.py", line 986, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/endpoint.py", line 119, in make_request
return self._send_request(request_dict, operation_model)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/endpoint.py", line 202, in _send_request
while self._needs_retry(
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/endpoint.py", line 354, in _needs_retry
responses = self._event_emitter.emit(
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/hooks.py", line 412, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/hooks.py", line 256, in emit
return self._emit(event_name, kwargs)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/hooks.py", line 239, in _emit
response = handler(**kwargs)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 207, in __call__
if self._checker(**checker_kwargs):
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 284, in __call__
should_retry = self._should_retry(
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 320, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 363, in __call__
checker_response = checker(
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 247, in __call__
return self._check_caught_exception(
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/retryhandler.py", line 416, in _check_caught_exception
raise caught_exception
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/endpoint.py", line 281, in _do_get_response
http_response = self._send(request)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/endpoint.py", line 377, in _send
return self.http_session.send(request)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/botocore/httpsession.py", line 493, in send
raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://b-ws-hk0m2-pd11-r87.s3.n-ws-hk0m2-pd11.amazonaws.com/.clearml.2bc71333-b21a-48ee-b875-feb5f2372a15.test"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/aimakhotka/Documents/github/sber/clearml_doc/src/files/artifacts.py", line 56, in <module>
main()
File "/home/aimakhotka/Documents/github/sber/clearml_doc/src/files/artifacts.py", line 13, in main
task = Task.init(project_name='test_cloud', task_name='jkhenning_test')
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/task.py", line 593, in init
task.output_uri = task.get_project_object().default_output_destination
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/task.py", line 1124, in output_uri
helper.check_write_permissions(value)
File "/home/aimakhotka/.pyenv/versions/3.9.0/lib/python3.9/site-packages/clearml/storage/helper.py", line 2743, in check_write_permissions
raise ValueError("Insufficient permissions (delete failed) for {}".format(base_url))
ValueError: Insufficient permissions (delete failed) for s3://b-ws-hk0m2-pd11-r87
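The host in the error above ends in amazonaws.com rather than sbercloud.ru, which suggests the bucket was being treated as an AWS one. A minimal sketch, assuming the URL was composed in AWS's virtual-hosted style (https://<bucket>.s3.<region>.amazonaws.com/<key>) from the bucket name and the region value set in clearml.conf; the helper name is hypothetical and is not part of boto3 or ClearML:

```python
def aws_virtual_hosted_url(bucket: str, region: str, key: str) -> str:
    """Compose an AWS virtual-hosted-style S3 URL (illustrative helper)."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


# Values taken from the log output and the clearml.conf posted in this thread.
url = aws_virtual_hosted_url(
    bucket="b-ws-hk0m2-pd11-r87",
    region="n-ws-hk0m2-pd11",
    key=".clearml.2bc71333-b21a-48ee-b875-feb5f2372a15.test",
)
print(url)
```

The result matches the endpoint reported in the EndpointConnectionError, so the custom host was apparently not applied to this request.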
My clearml.conf:
# ClearML SDK configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server: http://localhost:8008
    web_server: http://localhost:8080
    files_server: https://http://localhost:8081/
    # Credentials are generated using the webapp, http://62.113.97.251:8080/settings
    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {"access_key": "***", "secret_key": "***"}
}
sdk {
    # ClearML - default SDK configuration
    storage {
        cache {
            # Defaults to <system_temp_folder>/clearml_cache
            default_base_dir: "~/.clearml/cache"
            # default_cache_manager_size: 100
        }
        direct_access: [
            # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
            # or cached, and any download request will return a direct reference.
            # Objects are specified in glob format, available for url and content_type.
            { url: "file://*" } # file-urls are always directly referenced
        ]
    }
    metrics {
        # History size for debug files per metric/variant. For each metric/variant combination with an attached file
        # (e.g. debug image event), file names for the uploaded files will be recycled in such a way that no more than
        # X files are stored in the upload destination for each metric/variant combination.
        file_history_size: 100
        # Max history size for matplotlib imshow files per plot title.
        # File names for the uploaded images will be recycled in such a way that no more than
        # X images are stored in the upload destination for each matplotlib plot title.
        matplotlib_untitled_history_size: 100
        # Limit the number of digits after the dot in plot reporting (reducing plot report size)
        # plot_max_num_digits: 5
        # Settings for generated debug images
        images {
            format: JPEG
            quality: 87
            subsampling: 0
        }
        # Support plot-per-graph fully matching Tensorboard behavior (i.e. if this is set to true, each series should have its own graph)
        tensorboard_single_series_per_graph: false
    }
    network {
        # Number of retries before failing to upload file
        file_upload_retries: 3
        metrics {
            # Number of threads allocated to uploading files (typically debug images) when transmitting metrics for
            # a specific iteration
            file_upload_threads: 4
            # Warn about upload starvation if no uploads were made in specified period while file-bearing events keep
            # being sent for upload
            file_upload_starvation_warning_sec: 120
        }
        iteration {
            # Max number of retries when getting frames if the server returned an error (http code 500)
            max_retries_on_server_error: 5
            # Backoff factory for consecutive retry attempts.
            # SDK will wait for {backoff factor} * (2 ^ ({number of total retries} - 1)) between retries.
            retry_backoff_factor_sec: 10
        }
    }
    aws {
        s3 {
            # S3 credentials, used for read/write access by various SDK elements
            # The following settings will be used for any bucket not specified below in the "credentials" section
            # ---------------------------------------------------------------------------------------------------
            key: "***"
            secret: "***"
            region: "n-ws-hk0m2-pd11"
            # Or enable credentials chain to let Boto3 pick the right credentials.
            # This includes picking credentials from environment variables,
            # credential file and IAM role using metadata service.
            # Refer to the latest Boto3 docs
            use_credentials_chain: true
            # Additional ExtraArgs passed to boto3 when uploading files. Can also be set per-bucket under "credentials".
            extra_args: {
            }
            # ---------------------------------------------------------------------------------------------------
            credentials: [
                {
                    # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
                    host: "n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443"
                    # Specify explicit keys
                    bucket: "b-ws-hk0m2-pd11-r87"
                    multipart: false
                    secure: true
                }
            ]
        }
        boto3 {
            pool_connections: 512
            max_multipart_concurrency: 16
            multipart_threshold: 8388608 # 8MB
            multipart_chunksize: 8388608 # 8MB
        }
    }
    google.storage {
        # # Default project and credentials file
        # # Will be used when no bucket configuration is found
        # project: "clearml"
        # credentials_json: "/path/to/credentials.json"
        # pool_connections: 512
        # pool_maxsize: 1024
        # # Specific credentials per bucket and sub directory
        # credentials = [
        #     {
        #         bucket: "my-bucket"
        #         subdir: "path/in/bucket" # Not required
        #         project: "clearml"
        #         credentials_json: "/path/to/credentials.json"
        #     },
        # ]
    }
    azure.storage {
        # max_connections: 2
        # containers: [
        #     {
        #         account_name: "clearml"
        #         account_key: "secret"
        #         # container_name:
        #     }
        # ]
    }
    log {
        # debugging feature: set this to true to make null log propagate messages to root logger (so they appear in stdout)
        null_log_propagate: false
        task_log_buffer_capacity: 66
        # disable urllib info and lower levels
        disable_urllib3_info: true
    }
    development {
        # Development-mode options
        # dev task reuse window
        task_reuse_time_window_in_hours: 72.0
        # Run VCS repository detection asynchronously
        vcs_repo_detect_async: true
        # Store uncommitted git/hg source code diff in experiment manifest when training in development mode
        # This stores "git diff" or "hg diff" into the experiment's "script.requirements.diff" section
        store_uncommitted_code_diff: true
        # Support stopping an experiment in case it was externally stopped, status was changed or task was reset
        support_stopping: true
        # Default Task output_uri. if output_uri is not provided to Task.init, default_output_uri will be used instead.
        default_output_uri: "https://n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443/b-ws-hk0m2-pd11-r87/test_clearml_s3_artifacts/"
        # Default auto generated requirements optimize for smaller requirements
        # If True, analyze the entire repository regardless of the entry point.
        # If False, first analyze the entry point script, if it does not contain other to local files,
        # do not analyze the entire repository.
        force_analyze_entire_repo: false
        # If set to true, *clearml* update message will not be printed to the console
        # this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
        suppress_update_message: false
        # If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
        detect_with_pip_freeze: false
        # Log specific environment variables. OS environments are listed in the "Environment" section
        # of the Hyper-Parameters.
        # multiple selected variables are supported including the suffix '*'.
        # For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
        # This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
        # Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
        log_os_environments: []
        # Development mode worker
        worker {
            # Status report period in seconds
            report_period_sec: 2
            # The number of events to report
            report_event_flush_threshold: 100
            # ping to the server - check connectivity
            ping_period_sec: 30
            # Log all stdout & stderr
            log_stdout: true
            # Carriage return (\r) support. If zero (0) \r treated as \n and flushed to backend
            # Carriage return flush support in seconds, flush consecutive line feeds (\r) every X (default: 10) seconds
            console_cr_flush_period: 10
            # compatibility feature, report memory usage for the entire machine
            # default (false), report only on the running process and its sub-processes
            report_global_mem_used: false
            # if provided, start resource reporting after this amount of seconds
            #report_start_sec: 30
        }
    }
    # Apply top-level environment section from configuration into os.environ
    apply_environment: false
    # Top-level environment section is in the form of:
    #   environment {
    #     key: value
    #     ...
    #   }
    # and is applied to the OS environment as `key=value` for each key/value pair
    # Apply top-level files section from configuration into local file system
    apply_files: false
    # Top-level files section allows auto-generating files at designated paths with a predefined contents
    # and target format. Options include:
    #  contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
    #  format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
    #          base64-encoded contents string, otherwise ignored
    #  path: the target file's path, may include ~ and inplace env vars
    #  target_format: format used to encode contents before writing into the target file. Supported values are json,
    #                 yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
    #  overwrite: overwrite the target file in case it exists. Default is true.
    #
    # Example:
    #   files {
    #     myfile1 {
    #       contents: "The quick brown fox jumped over the lazy dog"
    #       path: "/tmp/fox.txt"
    #     }
    #     myjsonfile {
    #       contents: {
    #         some {
    #           nested {
    #             value: [1, 2, 3, 4]
    #           }
    #         }
    #       }
    #       path: "/tmp/test.json"
    #       target_format: json
    #     }
    #   }
}
You are specifying both the global sdk.aws.s3 settings:
    key: "***"
    secret: "***"
    region: "n-ws-hk0m2-pd11"
    use_credentials_chain: true
as well as the bucket-specific ones:
    credentials: [
        {
            # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
            host: "n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443"
            # Specify explicit keys
            bucket: "b-ws-hk0m2-pd11-r87"
            multipart: false
            secure: true
        }
    ]
You should specify only the bucket-specific settings and not use the credentials chain. Can you please try it out?
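The advice above can be turned into a quick sanity check. The sketch below is purely illustrative: it assumes the sdk.aws.s3 section has already been loaded into a plain Python dict, and the function s3_config_warnings is hypothetical rather than part of ClearML.

```python
def s3_config_warnings(s3: dict) -> list:
    """Flag the combination of settings described in this thread."""
    warnings = []
    has_bucket_entries = bool(s3.get("credentials"))
    if s3.get("use_credentials_chain") and has_bucket_entries:
        warnings.append(
            "use_credentials_chain is true although per-bucket credentials are listed"
        )
    if has_bucket_entries and any(s3.get(k) for k in ("key", "secret", "region")):
        warnings.append(
            "global key/secret/region are set alongside per-bucket credentials"
        )
    return warnings


# The sdk.aws.s3 section as posted in the thread (secrets masked there too).
posted = {
    "key": "***",
    "secret": "***",
    "region": "n-ws-hk0m2-pd11",
    "use_credentials_chain": True,
    "credentials": [
        {
            "host": "n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443",
            "bucket": "b-ws-hk0m2-pd11-r87",
            "multipart": False,
            "secure": True,
        }
    ],
}

# One possible cleaned-up shape; where the key and secret end up is not
# stated in the thread, so they are simply left out here.
cleaned = {"use_credentials_chain": False, "credentials": posted["credentials"]}

for message in s3_config_warnings(posted):
    print("warning:", message)
```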
> You should only specify the bucket-specific one and not use the credentials chain, can you please try it out?
That helped! But default_output_uri had to be specified in the format s3://...:443/bucket-name/... .
Thank you so much! Why is this happening? What does the use_credentials_chain parameter do?
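For reference, switching an https:// endpoint URI to the s3:// form mentioned here can be sketched with the standard library. The helper name to_s3_scheme is hypothetical, and the input is the default_output_uri value from the clearml.conf posted earlier in this thread.

```python
from urllib.parse import urlsplit


def to_s3_scheme(uri: str) -> str:
    """Rewrite an http(s) endpoint URI as s3://host:port/bucket/path."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        # Leave anything that is not an http(s) URI untouched.
        return uri
    return f"s3://{parts.netloc}{parts.path}"


converted = to_s3_scheme(
    "https://n-ws-hk0m2-pd11.s3pd11.sbercloud.ru:443/b-ws-hk0m2-pd11-r87/test_clearml_s3_artifacts/"
)
print(converted)
```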
This parameter basically tells boto3 to look for credentials in the system's configuration or in an AWS role (in case it's running on an AWS machine) and not to use the explicitly provided credentials.
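The effect described here can be illustrated with a small sketch. It is not ClearML's actual implementation: the helper names are hypothetical, and the only library call used is boto3.Session with its documented aws_access_key_id, aws_secret_access_key and region_name arguments.

```python
from typing import Optional


def boto3_session_kwargs(use_credentials_chain: bool,
                         key: Optional[str] = None,
                         secret: Optional[str] = None,
                         region: Optional[str] = None) -> dict:
    """Pick the arguments for boto3.Session (illustrative only)."""
    kwargs = {}
    if region:
        kwargs["region_name"] = region
    if not use_credentials_chain:
        # Explicit credentials are passed through unchanged.
        kwargs["aws_access_key_id"] = key
        kwargs["aws_secret_access_key"] = secret
    # With the chain enabled no keys are passed, so boto3 resolves them itself
    # (environment variables, credential files, instance role).
    return kwargs


def make_session(**settings):
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3
    return boto3.Session(**boto3_session_kwargs(**settings))
```

With use_credentials_chain=True the key and secret arguments are ignored by this sketch, which mirrors the behaviour described in the comment above.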
> This parameter basically tells boto3 to look for credentials in the system's configuration or in an AWS role (in case it's running on an AWS machine) and not use the explicitly provided credentials
Ooh, I see. In all the examples I saw, this parameter was "true" and I didn't really understand the description in the documentation, so I didn't even think about it. Thanks, you really helped me out!