jic-dtool / dtool-create

Dtool plugin for creating datasets and collections

License: MIT License

Python 100.00%

dtool-create's Issues

Resolve absolute path when using ``--symlink-path`` option

At the moment, relative paths are not expanded, so a command such as the one below fails:

dtool create my-first-ds --symlink-path rel/path/to/data

Instead, the user currently needs to do the following:

dtool create my-first-ds --symlink-path `pwd`/rel/path/to/data

This is not intuitive.
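A minimal sketch of the requested behaviour, assuming the option value only needs to be made absolute relative to the current working directory before it is used. ``resolve_symlink_path`` is a hypothetical helper name, not an existing dtool-create function.

```python
import os


def resolve_symlink_path(path):
    """Return an absolute path for the ``--symlink-path`` value.

    Hypothetical helper: expands ``~`` and resolves relative paths against
    the current working directory, so ``rel/path/to/data`` behaves like
    ``$(pwd)/rel/path/to/data``.
    """
    return os.path.abspath(os.path.expanduser(path))


print(resolve_symlink_path("rel/path/to/data"))
```

If the option is declared with click, ``click.Path(resolve_path=True)`` may be a simpler route to the same result.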

Add verbose flag to ``dtool copy``

An end user suggested that it would be useful for the dtool copy command to report which files are being copied across.
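One way a verbose mode could work is to echo each item's relpath just before it is transferred. The sketch below assumes a hypothetical ``copy_items`` loop with an injectable ``copy_one`` transfer function and ``echo`` output function; it is not how ``dtool copy`` is currently structured.

```python
def copy_items(relpaths, copy_one, verbose=False, echo=print):
    """Copy each item and, if ``verbose`` is set, report it first.

    Hypothetical sketch: ``copy_one(relpath)`` performs the actual transfer
    and ``echo`` is where progress messages go (e.g. ``click.echo``).
    """
    copied = []
    for relpath in relpaths:
        if verbose:
            echo("Copying {}".format(relpath))
        copy_one(relpath)
        copied.append(relpath)
    return copied
```

Sending these messages to stderr rather than stdout would keep stdout free for any machine-readable output, such as a copied-to URI.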

Could you show the valid STORAGE values in ``dtool copy --help``?

Add ``dtool item cp`` command to return an item using its original relpath

Many tools and users think of data in terms of file names. The current command for fetching an item:

dtool item fetch

does not return the item using its original relpath; instead, it uses the UUID of the dataset and the item ID.

To make life easier for users, and for tools that make real use of information in the file path, it would be useful to be able to fetch an item using its original relpath. A nice solution for this might be to implement the command:

dtool item cp

Note that this may also enable us to remove the "hack" of appending the file suffix to the end of the abspaths created by dtool item fetch.
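A minimal sketch of what ``dtool item cp`` might do once an item has been fetched: copy the fetched file into a destination directory under its original relpath. ``copy_item_to_relpath`` is hypothetical; the fetched path and the relpath are assumed to be supplied by the caller (presumably from the dataset's item content path and item properties), and relpaths are assumed to be "/"-separated.

```python
import os
import shutil


def copy_item_to_relpath(fetched_path, relpath, dest_dir):
    """Copy a fetched item to ``dest_dir/<relpath>`` and return the new path.

    Hypothetical helper: ``fetched_path`` is where ``dtool item fetch`` put
    the content; ``relpath`` is the item's original, "/"-separated relpath.
    """
    dest_root = os.path.abspath(dest_dir)
    dest_path = os.path.abspath(os.path.join(dest_root, *relpath.split("/")))
    # Refuse relpaths such as "../x" that would escape the destination.
    if os.path.commonpath([dest_root, dest_path]) != dest_root:
        raise ValueError("relpath escapes destination: {}".format(relpath))
    os.makedirs(os.path.dirname(dest_path), exist_ok=True)
    shutil.copyfile(fetched_path, dest_path)
    return dest_path
```

Because the copy keeps the original file name, the suffix is preserved naturally rather than appended to an identifier-based path.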

dtool readme interactive shortcomings

This feedback reached me via email:

dtool readme interactive wäre ja eigentlich nett, ist aber bei nested key value maps echt nicht schön, da es nur die innerste Schlüssel-Ebene anzeigt. Auch kann man kein array als Wert eingeben, oder ich versteh zumindest nicht wie. Außerdem stürzt das ganze ab, wenn im template ein null value angegeben ist - was aber m.E perfekt valides yaml wäre.

DeepL translation:

dtool readme interactive would actually be nice, but it is really not pleasant to use with nested key-value maps, as it only shows the innermost key level. Also, you can't enter an array as a value, or at least I don't understand how. In addition, the whole thing crashes if a null value is specified in the template, which in my opinion is perfectly valid YAML.
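A sketch of how an interactive walk over the template could address all three points: showing the full dotted key path for nested maps, accepting lists as comma-separated input, and tolerating null values. ``prompt_for_values`` and the comma-separated convention are assumptions; ``ask(label, default)`` stands in for whatever prompt function is used (``click.prompt`` could fill that role).

```python
def prompt_for_values(template, ask, prefix=""):
    """Walk a (possibly nested) template and ask for each leaf value.

    Hypothetical sketch: ``ask(label, default)`` returns the answer as a
    string. Scalars are returned as strings; converting them back to
    bool/int/date is left out here.
    """
    result = {}
    for key, value in template.items():
        label = prefix + str(key)
        if isinstance(value, dict):
            # Recurse, keeping the parent keys visible in the prompt label.
            result[key] = prompt_for_values(value, ask, prefix=label + ".")
        elif isinstance(value, list):
            # Lists are entered as comma-separated text.
            default = ", ".join(str(v) for v in value)
            answer = ask(label + " (comma-separated)", default)
            result[key] = [p.strip() for p in answer.split(",") if p.strip()]
        elif value is None:
            # A null in the template is valid YAML; treat it as "no default".
            answer = ask(label, "")
            result[key] = answer if answer != "" else None
        else:
            result[key] = ask(label, str(value))
    return result
```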

Add ``dtool publish`` command to CLI

Feedback from a user: it would be better to incorporate the functionality of the dtool_publish command-line tool from dtool-http into the client as a command named dtool publish.

Make ``dtool copy`` use URIs for both src and dest

Currently:

dtool copy --help
Usage: dtool copy [OPTIONS] DATASET_URI [PREFIX] [STORAGE]

Desired:

dtool copy --help
Usage: dtool copy [OPTIONS] SRC_DATASET_URI DEST_DATASET_URI

Add ``-q/--quiet`` option to ``dtool create`` that only returns the generated URI

Current output:

dtool create my_dataset ~/junk
Created proto dataset file:///Users/olssont/junk/my_dataset
Next steps:
1. Add descriptive metadata, e.g:
   dtool readme interactive file:///Users/olssont/junk/my_dataset
2. Add raw data, eg:
   dtool add item my_file.txt file:///Users/olssont/junk/my_dataset
   Or use your system commands, e.g:
   mv my_data_directory /Users/olssont/junk/my_dataset/data/
3. Convert the proto dataset into a dataset:
   dtool freeze file:///Users/olssont/junk/my_dataset

Output with desired option:

dtool create -q my_dataset ~/junk
file:///Users/olssont/junk/my_dataset
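A minimal sketch of the proposed quiet mode, assuming the command's output is funnelled through one reporting function. ``report_created`` is hypothetical and ``-q/--quiet`` is the proposed option, not an existing one.

```python
def report_created(uri, quiet=False, echo=print):
    """Report a newly created proto dataset.

    Hypothetical sketch of a ``-q/--quiet`` mode: when quiet, only the URI
    is echoed, so a script can capture it directly.
    """
    if quiet:
        echo(uri)
        return
    echo("Created proto dataset {}".format(uri))
    echo("Next steps:")
    # ... the remaining "Next steps" guidance would follow here.
```

With such a flag, a shell script could capture the URI with ``uri=$(dtool create -q my_dataset ~/junk)``. The same pattern would fit the quiet flags requested for dtool copy and dtool freeze.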

inconsistency between date and datetime objects

While using dtool create, I got the error that datetime.date has no attribute date.

The bug is caused by the following lines:

dtool-create/dtool_create/dataset.py

Lines 89 to 98 in 80563dd

        elif isinstance(value, datetime.date):
            def parse_date(value):
                try:
                    date = datetime.datetime.strptime(value, "%Y-%m-%d")
                except ValueError as e:
                    raise click.BadParameter(
                        "Could not parse date, {}".format(e), param=value)
                return date
            new_value = click.prompt(key, default=value, value_proc=parse_date)
            d[key] = new_value.date().isoformat()

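The default value happens to be a date object, not a datetime object, but the parse_date function returns a datetime object. Presumably, when the default is accepted, new_value is that date object itself, which has no .date() method.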
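The underlying Python behaviour can be checked in isolation: ``datetime.datetime`` is a subclass of ``datetime.date``, so the ``isinstance`` check matches both types, but only ``datetime`` objects have a ``.date()`` method. ``to_date`` is a hypothetical normalising helper, shown as an alternative way to make the branch accept either type; it is not part of dtool-create.

```python
import datetime

default = datetime.date(2017, 10, 23)
parsed = datetime.datetime.strptime("2017-10-23", "%Y-%m-%d")

# datetime is a subclass of date, so the isinstance() branch sees both.
print(isinstance(default, datetime.date), isinstance(parsed, datetime.date))  # True True

# Only datetime objects have a .date() method; plain dates do not.
print(hasattr(parsed, "date"), hasattr(default, "date"))  # True False


def to_date(value):
    """Return a plain date, whether given a date or a datetime."""
    # Check for datetime first, because every datetime is also a date.
    if isinstance(value, datetime.datetime):
        return value.date()
    return value


print(to_date(parsed) == to_date(default))  # True
```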

Here is the proposed fix:

        elif isinstance(value, datetime.date):
            def parse_date(value):
                try:
                    date = datetime.datetime.strptime(value, "%Y-%m-%d")
                except ValueError as e:
                    raise click.BadParameter(
                        "Could not parse date, {}".format(e), param=value)
                return date.date()
            new_value = click.prompt(key, default=value, value_proc=parse_date)
            d[key] = new_value.isoformat()

This might be related to #22, since something was changed there from datetime to date.

I wonder why this error is only coming up now, since the code dates from May.

The current Python version is 3.8.

Add ability to update README file with descriptive metadata

The original README should not be deleted but rather renamed to something along the lines of README.yml.timestamp.
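A minimal sketch for a README stored as a local file, assuming a ``YYYYmmddHHMMSS`` timestamp suffix (the issue only asks for something along the lines of README.yml.timestamp). ``backup_readme`` is hypothetical; other storage backends would need their own way of keeping the old copy.

```python
import datetime
import os


def backup_readme(readme_path, now=None):
    """Rename an existing README to ``<name>.<timestamp>``; return the new path.

    Hypothetical sketch: the timestamp format is an assumption, and ``now``
    can be passed in to make the result predictable.
    """
    if now is None:
        now = datetime.datetime.now()
    backup_path = "{}.{}".format(readme_path, now.strftime("%Y%m%d%H%M%S"))
    os.rename(readme_path, backup_path)
    return backup_path
```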

Ability to combine copy with verify/diff to ensure that the copy has been successful

Replace:

dtool copy src dest
dtool diff src dest

With a single command:

dtool copy src dest

Implementation detail:

Create a helper function in dtoolcore.compare that takes two manifests as input
Always check identifiers and sizes
If the hash functions match, also check the hashes; if not, log a logger.info message saying that the hashes could not be compared
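The implementation steps above can be sketched as a single comparison function. This is a hypothetical helper, not an existing dtoolcore.compare function, and the manifest layout it expects (a ``"hash_function"`` name plus an ``"items"`` mapping of identifier to ``{"size_in_bytes": ..., "hash": ...}``) is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def manifests_match(src_manifest, dest_manifest):
    """Return True if two manifests describe the same items.

    Hypothetical helper; the manifest structure is assumed, see lead-in.
    """
    src_items = src_manifest["items"]
    dest_items = dest_manifest["items"]

    # Always check identifiers...
    if set(src_items) != set(dest_items):
        return False
    # ...and sizes.
    for identifier, props in src_items.items():
        if props["size_in_bytes"] != dest_items[identifier]["size_in_bytes"]:
            return False

    # Hashes are only comparable if both sides used the same hash function.
    if src_manifest["hash_function"] != dest_manifest["hash_function"]:
        logger.info("Hashes could not be compared: different hash functions")
        return True
    return all(
        props["hash"] == dest_items[identifier]["hash"]
        for identifier, props in src_items.items()
    )
```

dtool copy could then call this on the source and destination manifests once copying has finished and report a failure if it returns False.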

Add quiet flag to ``dtool copy`` that only returns the URI copied to

This is needed so that one can programmatically discover where the data was copied to when transferring data between different backends. Below is the current output when copying from iRODS to local file storage.

$ dtool copy irods:///jic_archive/f13ef963-37f0-4c3c-a96c-da99e036ea10 ~/junk2
Copying dataset  [------------------------------------]    0%
Dataset copied to file:///Users/olssont/junk2/my-another-dataset

From this it is difficult to programmatically work out where the dataset has been put. The behaviour below would solve this problem.

$ dtool copy -q irods:///jic_archive/f13ef963-37f0-4c3c-a96c-da99e036ea10 ~/junk2
file:///Users/olssont/junk2/my-another-dataset

Python2 issue with unicode in readme

$ dtool readme edit ~/junk/test-unicode-ds
Traceback (most recent call last):
  File "/Users/olssont/envs/dtool/bin/dtool", line 11, in <module>
    sys.exit(dtool())
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/dtool_create/dataset.py", line 278, in edit
    edited_content = click.edit(readme_content)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/termui.py", line 456, in edit
    return editor.edit(text)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/_termui_impl.py", line 425, in edit
    text = text.encode(encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)
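The traceback ends in ``text.encode(encoding)``: on Python 2, calling ``.encode()`` on a byte string first decodes it implicitly as ASCII, which fails on the UTF-8 lead byte ``0xc3``. A minimal sketch of a fix, assuming the README content is UTF-8, is to decode it to a Unicode string before it reaches ``click.edit``. ``ensure_text`` is a hypothetical helper name.

```python
def ensure_text(content, encoding="utf-8"):
    """Return ``content`` as a Unicode string.

    Hypothetical helper. On Python 2 ``bytes`` is an alias of ``str``, so
    this decodes byte strings on both Python 2 and Python 3 and leaves
    text (``unicode`` on Python 2, ``str`` on Python 3) unchanged.
    """
    if isinstance(content, bytes):
        return content.decode(encoding)
    return content
```

The failing call in dataset.py could then become ``click.edit(ensure_text(readme_content))``.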

Readme generated by ``dtool readme interactive`` loses indentation

Current output:

---
description: Dataset description
project: Project name
confidential: false
personally_identifiable_information: false
owners:
- name: Your Name
  email: [email protected]
  username: olssont
creation_date: 2017-10-23
# links:
#  - http://doi.dx.org/your_doi
#  - http://github.com/your_code_repository
# budget_codes:
#  - E.g. CCBS1H10S

Expected:

---
description: Dataset description
project: Project name
confidential: false
personally_identifiable_information: false
owners:
  - name: Your Name
    email: [email protected]
    username: olssont
creation_date: 2017-10-23
# links:
#  - http://doi.dx.org/your_doi
#  - http://github.com/your_code_repository
# budget_codes:
#  - E.g. CCBS1H10S

Ensure corrupted files do not end up in the dtool cache

If a dtool copy command fails because the connection breaks, one can end up with a broken file in the dtool cache. If one then tries to resume the copy, that broken file in the dtool cache can end up being used.

Dataset items that end up in the cache should never be corrupted. Some validation should therefore occur before they are put into it.
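A minimal sketch of "validate before caching", assuming the expected hash is an MD5 hex digest known from the dataset manifest and that the cache lives on a local filesystem. ``store_in_cache`` is hypothetical and holds the whole item in memory for simplicity; a real implementation would likely stream the download.

```python
import hashlib
import os
import tempfile


def store_in_cache(data, expected_md5, cache_path):
    """Put ``data`` at ``cache_path`` only if its MD5 digest is as expected.

    Hypothetical sketch: the content is written to a temporary file in the
    same directory, re-read and hashed, and only then moved into place with
    os.replace(). A failed or corrupted transfer never leaves a file at
    ``cache_path``.
    """
    cache_dir = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        with open(tmp_path, "rb") as fh:
            digest = hashlib.md5(fh.read()).hexdigest()
        if digest != expected_md5:
            raise ValueError("hash mismatch for {}".format(cache_path))
        # Atomic on POSIX when source and target are on the same filesystem.
        os.replace(tmp_path, cache_path)
    except BaseException:
        # Never leave the partial or corrupted temporary file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return cache_path
```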

Add -q/--quiet flag to dtool freeze command

At the moment dtool freeze is very chatty.

$ dtool freeze ~/junk/test > log
$ cat log
Generating manifest
Dataset frozen file:///Users/olssont/junk/test

It would be useful to have a -q/--quiet flag to suppress the writing of this info.

Should ``dtool copy`` be changed to ``dtool cp``?

Sanity checking before running ``dtool freeze``

Feedback from a user:

I would like “dtool freeze” to be more circumspect about proceeding because “freeze”ing a proto is irreversible (or should be seen that way).

I think “dtool freeze” should ask for confirmation with a warning about kicking off a (potentially) long-running process.
A verbose “--dry-run” option would also be a good addition.

Why do I ask this?

From the dtool documentation (http://dtool.readthedocs.io/en/latest/philosophy.html), a proto should resemble:

project_1
  README.yml
  data
    - raw_datafile_1
    - XYZ

But let’s say I create the proto and then copy/move files into it to get:

project_1
  README.yml
  XYZ
  data
    - raw_datafile_1

After freezing, I copy the dataset to iRODS, verify it and delete my local copy.
My understanding (tell me if I’m wrong) is when I retrieve this dataset from iRODS I get:

project_1
  README.yml
  data
    - raw_datafile_1

and XYZ has disappeared.

In this case, if “dtool freeze” had exited with a message about file XYZ, then I wouldn’t have lost XYZ.
Additionally, the dataset structure suggested in the documentation would be enforced.
AFAICT dtool does not enforce the dataset structure (again please tell me if I’m wrong).
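A minimal sketch of a pre-freeze check for a disk-based proto dataset. The set of expected top-level names (``README.yml``, ``data`` and dtool's own ``.dtool`` metadata directory) is an assumption, and ``unexpected_top_level_entries`` is a hypothetical helper, not part of dtool-create.

```python
import os

# Assumption: entries expected at the top level of a disk-based proto
# dataset; ".dtool" holds dtool's administrative metadata.
EXPECTED_TOP_LEVEL = {"README.yml", "data", ".dtool"}


def unexpected_top_level_entries(proto_dir, expected=EXPECTED_TOP_LEVEL):
    """Return sorted names at the top of ``proto_dir`` that would be left out.

    Hypothetical pre-freeze check: anything outside the expected layout
    (such as XYZ in the example above) is reported so that a freeze could
    warn or abort instead of silently leaving it behind.
    """
    return sorted(name for name in os.listdir(proto_dir) if name not in expected)
```

dtool freeze could then refuse to continue, or ask for confirmation as the user suggests, whenever this list is non-empty.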

``dtool readme`` command outputs stack trace when called on frozen dataset

The dtool readme command results in a stack trace when called on a frozen dataset.
It needs some sanity checking to ensure that the dataset provided is a proto dataset.

Example output below:

$ dtool readme interactive symlink:///Users/olssont/junk/my_dataset
Traceback (most recent call last):
  File "/Users/olssont/envs/dtool/bin/dtool", line 11, in <module>
    sys.exit(dtool())
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/dtool_create/dataset.py", line 154, in interactive
    config_path=CONFIG_PATH)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/dtoolcore/__init__.py", line 318, in from_uri
    return cls._from_uri_with_typecheck(uri, config_path, "protodataset")
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/dtoolcore/__init__.py", line 164, in _from_uri_with_typecheck
    "{} is not a {}".format(uri, cls.__name__))
dtoolcore.DtoolCoreTypeError: symlink:///Users/olssont/junk/my_dataset is not a ProtoDataSet

Add ability to get content of readme file using ``dtool readme`` command

At the moment the dtool readme command is just used to edit/update the content of the readme file. It is not possible to get the content back. This is an issue when working with datasets in remote storage locations, such as iRODS. One does not want to have to fetch the whole dataset in order to be able to inspect the content of the readme file.

Suggest updating the behaviour of the dtool readme command to mimic that of dtool name, which can be used both to echo back and to edit the name.

