
dtool-create's Issues

Resolve absolute path when using ``--symlink-path`` option

At the moment, relative paths are not expanded, so a command such as the one below fails:

dtool create my-first-ds --symlink-path rel/path/to/data

Instead, the user currently needs to run:

dtool create my-first-ds --symlink-path `pwd`/rel/path/to/data

This is not intuitive.
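A minimal sketch of the fix, assuming the option value is available as a plain string before the symlink is created. `resolve_symlink_path` is a hypothetical helper name, not part of dtool-create. It expands `~` and anchors a relative path at the current working directory, which is what the `` `pwd`/ `` workaround does by hand.

```python
import os


def resolve_symlink_path(path):
    """Return an absolute version of a --symlink-path value (hypothetical helper).

    os.path.expanduser handles a leading "~", and os.path.abspath joins a
    relative path onto the current working directory and normalises it.
    """
    return os.path.abspath(os.path.expanduser(path))


# A relative path is anchored at the current working directory.
print(resolve_symlink_path("rel/path/to/data"))
```

Note that os.path.abspath does not resolve symlinks inside the path itself; os.path.realpath would, if that behaviour is preferred.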

Add verbose flag to ``dtool copy``

An end user suggested that it would be useful for the dtool copy command to report which files are being copied.
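One way to structure this, sketched under assumptions: the copy loop already knows each item's identifier and relpath, and `copy_items` and `copy_one` are hypothetical names rather than dtool or dtoolcore APIs. The verbose flag only controls whether each relpath is echoed before the item is copied.

```python
def copy_items(items, copy_one, verbose=False, echo=print):
    """Copy (identifier, relpath) pairs, optionally reporting each relpath.

    copy_one is a hypothetical callable that copies a single item by its
    identifier; echo defaults to print but could be click.echo or a logger.
    """
    for identifier, relpath in items:
        if verbose:
            echo(f"Copying {relpath}")
        copy_one(identifier)
```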

Add ``dtool item cp`` command to return an item using its original relpath

Lots of tools and users think of data in terms of file names. The current command for fetching an item:

dtool item fetch

does not return the item using the item's original relpath; instead, it uses the UUID of the dataset and the item ID.

To make life easier for users, and for tools that rely on information in the file path, it would be useful to be able to fetch an item using its original relpath. A nice solution might be to implement the command:

dtool item cp

Note that this may also enable us to remove the "hack" of appending the file suffix to the end of the abspaths created by dtool item fetch.
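A minimal sketch of what `dtool item cp` could do once an item has been fetched, assuming the fetch step returns a local path (as `dtool item fetch` does), that the item's relpath is known, and that relpaths use forward slashes. `copy_to_relpath` is a hypothetical helper. Because the destination is rebuilt from the relpath, the original file name and suffix are preserved without any suffix hack; the sketch also refuses relpaths that would escape the destination directory.

```python
import os
import shutil


def copy_to_relpath(fetched_path, relpath, dest_dir):
    """Copy a fetched item to dest_dir/relpath, creating parent directories.

    relpath is assumed to use forward slashes, so it is split and re-joined
    with the local path separator.
    """
    dest_root = os.path.abspath(dest_dir)
    target = os.path.abspath(os.path.join(dest_root, *relpath.split("/")))
    # Reject relpaths such as "../x" that would land outside dest_dir.
    if os.path.commonpath([dest_root, target]) != dest_root:
        raise ValueError(f"relpath escapes destination: {relpath}")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copyfile(fetched_path, target)
    return target
```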

dtool readme interactive shortcomings

This feedback reached me via email (translated from German):

dtool readme interactive would be nice in principle, but it really doesn't work well for nested key-value maps, as it only shows the innermost key level. Also, you can't enter an array as a value, or at least I don't understand how. In addition, the whole thing crashes if a null value is specified in the template, which in my opinion is perfectly valid YAML.
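A sketch of how the interactive walk could address all three problems, assuming the template has already been loaded into plain Python dicts and lists. `prompt_nested` and the `ask` callback are hypothetical; in dtool-create the prompt would presumably be `click.prompt`. Full key paths make nested keys visible, lists of scalars are entered as comma-separated values, and null defaults are shown as empty strings instead of crashing. Answers come back as strings; converting them to booleans, numbers or dates is left out.

```python
def prompt_nested(template, ask, prefix=""):
    """Return a copy of template with leaf values replaced by ask(path, default).

    - nested mappings are walked recursively and shown as paths like "owners[0].name"
    - lists of scalars are entered as one comma-separated string
    - None (YAML null) defaults are shown as "", and an empty answer is stored as None
    """
    result = {}
    for key, value in template.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            result[key] = prompt_nested(value, ask, path)
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            result[key] = [prompt_nested(v, ask, f"{path}[{i}]") for i, v in enumerate(value)]
        elif isinstance(value, list):
            answer = ask(path, ", ".join(str(v) for v in value))
            result[key] = [part.strip() for part in answer.split(",") if part.strip()]
        else:
            answer = ask(path, "" if value is None else str(value))
            result[key] = answer if answer != "" else None
    return result
```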

Add ``dtool publish`` command to CLI

A user suggested that it would be better to incorporate the functionality of the dtool_publish command-line tool from dtool-http into the client as a command named dtool publish.

Add ``-q/--quiet`` option to ``dtool create`` that only returns the generated URI

Current output:

dtool create my_dataset ~/junk
Created proto dataset file:///Users/olssont/junk/my_dataset
Next steps:
1. Add descriptive metadata, e.g:
   dtool readme interactive file:///Users/olssont/junk/my_dataset
2. Add raw data, eg:
   dtool add item my_file.txt file:///Users/olssont/junk/my_dataset
   Or use your system commands, e.g:
   mv my_data_directory /Users/olssont/junk/my_dataset/data/
3. Convert the proto dataset into a dataset:
   dtool freeze file:///Users/olssont/junk/my_dataset

Output with desired option:

dtool create -q my_dataset ~/junk
file:///Users/olssont/junk/my_dataset
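dtool's CLI is built on click (the tracebacks further down show it), so a real implementation would add a click option; the sketch below uses argparse only to stay dependency-free. `build_parser` and `create_output` are hypothetical names, the base URI is assumed to default to the current directory, and the next-steps text is abbreviated.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring `dtool create [-q] NAME [BASE_URI]`."""
    parser = argparse.ArgumentParser(prog="dtool create")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="only print the URI of the created proto dataset")
    parser.add_argument("name")
    parser.add_argument("base_uri", nargs="?", default=".")  # assumed default
    return parser


def create_output(uri, quiet):
    """Return the lines to print after creating the proto dataset at uri."""
    if quiet:
        return [uri]
    return [
        f"Created proto dataset {uri}",
        "Next steps:",
        "1. Add descriptive metadata, e.g:",
        f"   dtool readme interactive {uri}",
        # ... remaining next-steps text omitted in this sketch ...
    ]


args = build_parser().parse_args(["-q", "my_dataset", "/tmp/junk"])
for line in create_output("file:///tmp/junk/my_dataset", args.quiet):
    print(line)
```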

inconsistency between date and datetime objects

When using dtool create, I got an error saying that datetime.date has no attribute date.

The bug is caused by the following lines:

        elif isinstance(value, datetime.date):
            def parse_date(value):
                try:
                    date = datetime.datetime.strptime(value, "%Y-%m-%d")
                except ValueError as e:
                    raise click.BadParameter(
                        "Could not parse date, {}".format(e), param=value)
                return date
            new_value = click.prompt(key, default=value, value_proc=parse_date)
            d[key] = new_value.date().isoformat()

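The default value is a date object, not a datetime object, but the parse_date function returns a datetime object. When the user accepts the default, new_value can therefore be a date, and calling new_value.date() on it fails.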

Here is the proposed fix:

        elif isinstance(value, datetime.date):
            def parse_date(value):
                try:
                    date = datetime.datetime.strptime(value, "%Y-%m-%d")
                except ValueError as e:
                    raise click.BadParameter(
                        "Could not parse date, {}".format(e), param=value)
                return date.date()
            new_value = click.prompt(key, default=value, value_proc=parse_date)
            d[key] = new_value.isoformat()

This might be related to #22, since something has been changed from datetime to date.

I wonder why this error only comes up now, since the code dates from May.

The current Python version is 3.8.
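The mismatch can be reproduced without click. The sketch below is a standalone version of the proposed `parse_date`, assuming a plain `ValueError` in place of `click.BadParameter`. It also accepts a value that is already a date or datetime, as a hedge against click versions that pass the default through `value_proc`; that guard is an addition, not part of the proposed fix.

```python
import datetime


def parse_date(value):
    """Parse "YYYY-MM-DD" into a datetime.date (standalone sketch of the fix)."""
    if isinstance(value, datetime.datetime):
        return value.date()
    if isinstance(value, datetime.date):
        # The prompt default may already be a date object.
        return value
    try:
        return datetime.datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError as e:
        raise ValueError(f"Could not parse date, {e}") from e


default = datetime.date(2017, 10, 23)  # what the YAML template yields
typed = parse_date("2020-05-14")       # what a user might type

# Only datetime objects have a .date() method, which is why the original
# new_value.date() call fails when new_value is the date default.
print(hasattr(default, "date"), hasattr(datetime.datetime.now(), "date"))
print(parse_date(default).isoformat(), typed.isoformat())
```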

Ability to combine copy with verify/diff to ensure that the copy has been successful

Replace:

dtool copy src dest
dtool diff src dest

With a single command:

dtool copy src dest

Implementation detail:

  1. Create a helper function in dtoolcore.compare that takes two manifests as input
  2. Always check identifiers and sizes
  3. If the hash functions used by the two manifests match, also compare the hashes; if not, log a logger.info message saying that the hashes could not be compared
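A sketch of the helper described in the steps above, assuming the manifest layout dtool uses: a top-level `hash_function` name and an `items` mapping from identifier to a dict with `size_in_bytes` and `hash`. The name `compare_manifests` is hypothetical; the issue only proposes that such a function live in `dtoolcore.compare`.

```python
import logging

logger = logging.getLogger(__name__)


def compare_manifests(src, dest):
    """Return a list of differences between two manifests (empty if they agree)."""
    problems = []
    src_items, dest_items = src["items"], dest["items"]

    # Step 2: identifiers must match ...
    for identifier in sorted(set(src_items) ^ set(dest_items)):
        problems.append(f"only in one manifest: {identifier}")
    common = sorted(set(src_items) & set(dest_items))

    # ... and so must sizes.
    for identifier in common:
        if src_items[identifier]["size_in_bytes"] != dest_items[identifier]["size_in_bytes"]:
            problems.append(f"size mismatch: {identifier}")

    # Step 3: hashes are only comparable if both sides used the same function.
    if src.get("hash_function") == dest.get("hash_function"):
        for identifier in common:
            if src_items[identifier]["hash"] != dest_items[identifier]["hash"]:
                problems.append(f"hash mismatch: {identifier}")
    else:
        logger.info("Hashes could not be compared: different hash functions")
    return problems
```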

Add quiet flag to ``dtool copy`` that only returns the URI copied to

This is needed so that one can programmatically discover where the data was copied to when transferring data between different backends. Below is the output of the current behaviour when copying from iRODS to local file storage.

$ dtool copy irods:///jic_archive/f13ef963-37f0-4c3c-a96c-da99e036ea10 ~/junk2
Copying dataset  [------------------------------------]    0%
Dataset copied to file:///Users/olssont/junk2/my-another-dataset

From this it is difficult to programmatically work out where the dataset has been copied to. The behaviour below would solve this problem.

$ dtool copy -q irods:///jic_archive/f13ef963-37f0-4c3c-a96c-da99e036ea10 ~/junk2
file:///Users/olssont/junk2/my-another-dataset
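Until a quiet flag exists, a script can recover the destination from the current output as a stopgap. The sketch assumes the final message keeps the exact `Dataset copied to <URI>` wording shown above and that URIs contain no whitespace; `extract_copied_uri` is a hypothetical helper. This kind of parsing is brittle, which is the argument for `-q`.

```python
import re

# Matches the final message printed by `dtool copy`, e.g.
# "Dataset copied to file:///Users/olssont/junk2/my-another-dataset"
_COPIED_TO = re.compile(r"Dataset copied to (\S+)")


def extract_copied_uri(output):
    """Return the destination URI from dtool copy's output, or None if absent."""
    match = _COPIED_TO.search(output)
    return match.group(1) if match else None
```

For example, it could be given the captured output of `subprocess.run([...], capture_output=True, text=True)`, reading `stdout` or `stderr` depending on which stream the message is written to.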

Python2 issue with unicode in readme

$ dtool readme edit ~/junk/test-unicode-ds
Traceback (most recent call last):
  File "/Users/olssont/envs/dtool/bin/dtool", line 11, in <module>
    sys.exit(dtool())
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/dtool_create/dataset.py", line 278, in edit
    edited_content = click.edit(readme_content)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/termui.py", line 456, in edit
    return editor.edit(text)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/_termui_impl.py", line 425, in edit
    text = text.encode(encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)
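The traceback ends in `click.edit` calling `text.encode(encoding)`. In Python 2, calling `.encode()` on a byte string first decodes it implicitly with the ASCII codec, which fails on byte 0xc3 (a UTF-8 lead byte for characters such as é). A likely fix, assuming the readme content reaches `click.edit` as a UTF-8 byte string, is to decode it to text first; `ensure_text` is a hypothetical helper that behaves the same on Python 2 and 3.

```python
def ensure_text(content, encoding="utf-8"):
    """Return content as text, decoding bytes with the given encoding.

    On Python 2 `bytes` is an alias of `str`, so byte strings become `unicode`;
    on Python 3 `bytes` becomes `str`. Content that is already text is unchanged.
    """
    if isinstance(content, bytes):
        return content.decode(encoding)
    return content

# e.g. edited_content = click.edit(ensure_text(readme_content))
```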

Readme generated by ``dtool readme interactive`` loses indentation

Current output:

---
description: Dataset description
project: Project name
confidential: false
personally_identifiable_information: false
owners:
- name: Your Name
  email: [email protected]
  username: olssont
creation_date: 2017-10-23
# links:
#  - http://doi.dx.org/your_doi
#  - http://github.com/your_code_repository
# budget_codes:
#  - E.g. CCBS1H10S

Expected:

---
description: Dataset description
project: Project name
confidential: false
personally_identifiable_information: false
owners:
  - name: Your Name
    email: [email protected]
    username: olssont
creation_date: 2017-10-23
# links:
#  - http://doi.dx.org/your_doi
#  - http://github.com/your_code_repository
# budget_codes:
#  - E.g. CCBS1H10S

Ensure corrupted files do not end up in the dtool cache

If a dtool copy command fails because the connection is broken, one can end up with a broken file in the dtool cache. If one then tries to resume the copy, the broken file can remain in the dtool cache.

Dataset items that end up in the cache should never be corrupted. Some validation should therefore occur before they are put into it.
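One common pattern for this, sketched under assumptions: download into a temporary file in the cache directory, check the content against the hash recorded in the manifest, and only then move it into place with `os.replace`, which is atomic when source and target are on the same filesystem. An interrupted transfer then leaves no truncated cache entry. `cache_item` is hypothetical, and MD5 is assumed as the manifest's hash function.

```python
import hashlib
import os
import tempfile


def cache_item(chunks, cache_path, expected_md5):
    """Write byte chunks to cache_path only if their MD5 matches expected_md5.

    Data goes to a temporary ".partial" file in the same directory first, so a
    failed or corrupted transfer never appears under cache_path.
    """
    cache_dir = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".partial")
    md5 = hashlib.md5()
    try:
        with os.fdopen(fd, "wb") as fh:
            for chunk in chunks:
                md5.update(chunk)
                fh.write(chunk)
        if md5.hexdigest() != expected_md5:
            raise ValueError(f"hash mismatch for {cache_path}")
        os.replace(tmp_path, cache_path)  # atomic on the same filesystem
    except BaseException:
        # Covers hash mismatches and interrupted transfers alike.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return cache_path
```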

Add -q/--quiet flag to dtool freeze command

At the moment dtool freeze is very chatty.

$ dtool freeze ~/junk/test > log
$ cat log
Generating manifest
Dataset frozen file:///Users/olssont/junk/test

It would be useful to have a -q/--quiet flag to suppress the writing of this info.
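A sketch of the flag, assuming the two status lines above are the only output; `report_freeze` is a hypothetical stand-in for the messages dtool freeze prints. One design choice worth weighing alongside `-q` is writing these status messages to stderr, so that `dtool freeze ... > log` would not capture them even without the flag.

```python
import sys


def report_freeze(uri, quiet=False, stream=None):
    """Print dtool freeze's status messages unless quiet (hypothetical sketch).

    Messages go to stderr by default so they stay out of redirected stdout.
    """
    stream = sys.stderr if stream is None else stream

    def status(message):
        if not quiet:
            print(message, file=stream)

    status("Generating manifest")
    # ... manifest generation and freezing would happen here ...
    status(f"Dataset frozen {uri}")
```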

Sanity checking before running ``dtool freeze``

Feedback from a user:

I would like “dtool freeze” to be more circumspect about proceeding because “freeze”ing a proto is irreversible (or should be seen that way).

I think “dtool freeze” should ask for confirmation with a warning about kicking off a (potentially) long-running process.
A verbose “--dry-run” option would also be a good addition.

Why do I ask this?

From the dtool documentation (http://dtool.readthedocs.io/en/latest/philosophy.html), a proto should resemble:

project_1

  • README.yml
  • data
    • raw_datafile_1
    • XYZ

But let’s say I create the proto and then copy/move files into the proto to get

project_1

  • README.yml
  • XYZ
  • data
    • raw_datafile_1

After freezing, I copy the dataset to iRODS, verify it and delete my local copy.
My understanding (tell me if I’m wrong) is when I retrieve this dataset from iRODS I get:

project_1

  • README.yml
  • data
    • raw_datafile_1

and XYZ has disappeared.

In this case, if “dtool freeze” had exited with a message about file XYZ, then I wouldn’t have lost XYZ.
Additionally, the dataset structure suggested in the documentation would be enforced.
AFAICT dtool does not enforce the dataset structure (again please tell me if I’m wrong).
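A sketch of the sanity check the user asks for, assuming a proto dataset on local disk whose top level should contain only README.yml, the data directory and dtool's hidden administrative directory (assumed here to be named `.dtool`). `unexpected_entries` and `confirm_freeze` are hypothetical helpers; `ask` defaults to `input` so the prompt can be replaced in tests.

```python
import os

# Top-level names assumed to be allowed in a disk-based proto dataset.
EXPECTED_ENTRIES = {"README.yml", "data", ".dtool"}


def unexpected_entries(proto_path, expected=EXPECTED_ENTRIES):
    """Return top-level entries outside the expected layout (e.g. a stray XYZ)."""
    return sorted(name for name in os.listdir(proto_path) if name not in expected)


def confirm_freeze(proto_path, ask=input):
    """Warn about stray top-level files, then ask before freezing."""
    for name in unexpected_entries(proto_path):
        print(f"Warning: {name} is outside data/ and may not be included in the dataset")
    answer = ask("Freezing is irreversible and may take a long time. Continue? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```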

``dtool readme`` command outputs stack trace when called on frozen dataset

The dtool readme command produces a stack trace when called on a frozen dataset.
It needs some sanity checking to ensure that the dataset provided is a proto dataset.

Example output below:

$ dtool readme interactive symlink:///Users/olssont/junk/my_dataset
Traceback (most recent call last):
  File "/Users/olssont/envs/dtool/bin/dtool", line 11, in <module>
    sys.exit(dtool())
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/dtool_create/dataset.py", line 154, in interactive
    config_path=CONFIG_PATH)
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/dtoolcore/__init__.py", line 318, in from_uri
    return cls._from_uri_with_typecheck(uri, config_path, "protodataset")
  File "/Users/olssont/envs/dtool/lib/python2.7/site-packages/dtoolcore/__init__.py", line 164, in _from_uri_with_typecheck
    "{} is not a {}".format(uri, cls.__name__))
dtoolcore.DtoolCoreTypeError: symlink:///Users/olssont/junk/my_dataset is not a ProtoDataSet
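The traceback shows that `ProtoDataSet.from_uri` already raises `dtoolcore.DtoolCoreTypeError` for a frozen dataset, so the command mainly needs to catch it and exit with a readable message. The sketch below assumes that shape but replaces the dtoolcore pieces with a stand-in exception and an injected `loader` so it runs on its own; `open_proto_dataset` is a hypothetical name.

```python
import sys


class NotAProtoDataSet(Exception):
    """Stand-in for dtoolcore.DtoolCoreTypeError in this self-contained sketch."""


def open_proto_dataset(uri, loader, err=None):
    """Load a proto dataset via loader(uri) or exit cleanly with status 1.

    In dtool-create, loader would be something like
    lambda uri: dtoolcore.ProtoDataSet.from_uri(uri, config_path=CONFIG_PATH),
    and the except clause would catch dtoolcore.DtoolCoreTypeError.
    """
    err = sys.stderr if err is None else err
    try:
        return loader(uri)
    except NotAProtoDataSet:
        print(f"Error: {uri} is not a proto dataset; "
              "dtool readme interactive only works on proto datasets.", file=err)
        raise SystemExit(1)
```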

Add ability to get content of readme file using ``dtool readme`` command

At the moment the dtool readme command is just used to edit/update the content of the readme file. It is not possible to get the content back. This is an issue when working with datasets in remote storage locations, such as iRODS. One does not want to have to fetch the whole dataset in order to be able to inspect the content of the readme file.

I suggest updating the behaviour of the dtool readme command to mimic that of dtool name, which can be used both to echo back and to edit the name.

Add validation of dataset name on creation

It is currently possible to create dataset names containing / and newline characters. This causes problems when copying data from the cloud to disk.

There should also be some limit on the length of the dataset name.
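A minimal validator, assuming a conservative policy of letters, digits, dots, underscores and hyphens, starting with a letter or digit, and a hypothetical 80-character limit; the issue specifies neither rule. Using `re.fullmatch` matters here: with `re.match` and a trailing `$`, a name ending in a newline would still pass, because `$` also matches just before a final newline.

```python
import re

MAX_NAME_LENGTH = 80  # hypothetical limit; the issue leaves the value open
# Letters, digits, ".", "_" and "-", starting with a letter or digit.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_valid_name(name, max_length=MAX_NAME_LENGTH):
    """Return True if name is safe to use as a dataset name (sketch)."""
    return len(name) <= max_length and NAME_PATTERN.fullmatch(name) is not None
```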
