
easydataverse's People

Contributors

abearab, atrisovic, jr-1991, yarikoptic

easydataverse's Issues

folder structure and description of added files not being preserved

Hi,

I was trying to upload a bunch of files with descriptions directly with easyDataverse, but I am running into a small issue. Although the files are uploaded, their folder structure and file descriptions are lost. Does this happen to anyone else, or is it just me?

Cheers
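Some background on the folder structure and descriptions: Dataverse's native API endpoint "Add a File to a Dataset" expects the folder path and the description in a JSON `jsonData` form field, under the keys `directoryLabel` and `description`. If the upload request leaves those keys out, Dataverse has nothing to preserve. The minimal sketch below only builds that payload from a Dataverse-side path. The helper name `build_add_file_json` is hypothetical, and the sketch does not show how easyDataverse itself performs the upload.

```python
import json
import posixpath
from typing import Optional


def build_add_file_json(dv_path: str, description: Optional[str] = None) -> str:
    """Build the `jsonData` form field for Dataverse's add-file endpoint.

    Hypothetical helper: `dv_path` is the path the file should have inside
    the dataset, e.g. "data/raw/measurements.csv".
    """
    # Split "data/raw/measurements.csv" into folder "data/raw" and the file name
    directory, _filename = posixpath.split(dv_path.strip("/"))
    payload = {}
    if description:
        payload["description"] = description
    if directory:
        # Dataverse keeps the folder structure in `directoryLabel`
        payload["directoryLabel"] = directory
    return json.dumps(payload)


print(build_add_file_json("data/raw/measurements.csv", "Raw sensor readings"))
# {"description": "Raw sensor readings", "directoryLabel": "data/raw"}
```

The resulting string would be sent as the `jsonData` part of the multipart POST, next to the file content itself. A file at the dataset root simply gets no `directoryLabel`.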

Error at creating a Dataset from_url

Hey!

I successfully installed easyDataverse, but I get an error when creating a dataset.
This is my code:

from easyDataverse import Dataset
dataset = Dataset.from_url("https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/Y73N6C")

This is the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 1
----> 1 dataset = Dataset.from_url("https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/Y73N6C")

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:40, in pydantic.decorator.validate_arguments.validate.wrapper_function()

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:134, in pydantic.decorator.ValidatedFunction.call()

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:201, in pydantic.decorator.ValidatedFunction.execute()

File ~/PycharmProjects/easyDataverse/easyDataverse/core/dataset.py:359, in Dataset.from_url(cls, url, filedir, download_files, api_token, lib_name)
    353     raise ValueError(
    354         f"Given URL '{url}' is not a valid Dataverse URL since no 'persistenID' is given"
    355     )
    357 dataverse_url = f"https://{parsed_url.hostname}/"
--> 359 return cls.from_dataverse_doi(
    360     doi=doi[0],
    361     filedir=filedir,
    362     lib_name=lib_name,
    363     dataverse_url=dataverse_url,
    364     api_token=api_token,
    365     download_files=download_files,
    366 )

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:40, in pydantic.decorator.validate_arguments.validate.wrapper_function()

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:134, in pydantic.decorator.ValidatedFunction.call()

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:201, in pydantic.decorator.ValidatedFunction.execute()

File ~/PycharmProjects/easyDataverse/easyDataverse/core/dataset.py:416, in Dataset.from_dataverse_doi(cls, doi, filedir, filenames, download_files, lib_name, dataverse_url, api_token)
    412     if not dataverse_url:
    413         raise ValueError(
    414             "Dataverse URL has not been specified in argument 'dataverse_url'. Please specify it to download datasets from your desired installation."
    415         )
--> 416     return cls._fetch_without_lib(
    417         dataset=cls(),
    418         doi=doi,
    419         filedir=filedir,
    420         dataverse_url=dataverse_url,
    421         api_token=api_token,
    422         filenames=filenames,
    423     )
    425 elif not lib_name and "EASYDATAVERSE_LIB_NAME" in os.environ:
    426     dataverse_url, api_token = cls._fetch_env_vars(api_token)

File ~/PycharmProjects/easyDataverse/easyDataverse/core/dataset.py:473, in Dataset._fetch_without_lib(**kwargs)
    468     except KeyError:
    469         warnings.warn(
    470             "No 'API_TOKEN' found in the environment. Please be aware, that you might not have the rights to download this dataset."
    471         )
--> 473 return download_from_dataverse_without_lib(**kwargs)

File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/downloader.py:99, in download_from_dataverse_without_lib(dataset, doi, filedir, dataverse_url, api_token, filenames)
     96 dataset.p_id = doi
     98 # Step 2: Extract all metadatablocks from the given dataset
---> 99 blocks = [
    100     create_block_definitions(block_name, block, dataverse_url)
    101     for block_name, block in metadatablocks.items()
    102 ]
    104 # Step 3: Populate data and assign to dataset
    105 for block in blocks:

File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/downloader.py:100, in <listcomp>(.0)
     96 dataset.p_id = doi
     98 # Step 2: Extract all metadatablocks from the given dataset
     99 blocks = [
--> 100     create_block_definitions(block_name, block, dataverse_url)
    101     for block_name, block in metadatablocks.items()
    102 ]
    104 # Step 3: Populate data and assign to dataset
    105 for block in blocks:

File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/nolibutils.py:95, in create_block_definitions(block_name, block, dataverse_url)
     93 # Turn raw definitions into classes
     94 for field in block["fields"]:
---> 95     _process_field(field, lookup, cls_def, add_funs)
     97 # Now, create the class
     98 block_cls = create_model(
     99     block_name.capitalize(), __base__=(DataverseBase,), **cls_def
    100 )()

File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/nolibutils.py:138, in _process_field(field, lookup, cls_def, add_funs)
    131 PROCESS_MAPPING = {
    132     "primitive": _process_primitive,
    133     "controlledVocabulary": _process_primitive,
    134     "compound": _process_compound,
    135 }
    137 fun = PROCESS_MAPPING[field["typeClass"]]
--> 138 fun(field, lookup, cls_def, add_funs)

File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/nolibutils.py:162, in _process_compound(field, lookup, cls_def, add_funs)
    159 field_name = _camel_to_snake(field_name)
    161 # Generate add method
--> 162 add_funs[f"add_{field_name}"] = _generate_add_method(cls, field_name)
    164 cls_def[field_name] = (dtype, Field(**field_meta))

File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/nolibutils.py:245, in _generate_add_method(target_cls, field)
    242 add_fun = copy.deepcopy(_generic_add_function)
    243 add_fun.__name__ = f"add_{field}"
--> 245 return forge.sign(
    246     forge.self,
    247     *[
    248         forge.kwarg(name, type=dtype, default=forge.empty)
    249         for name, dtype in target_cls.__annotations__.items()
    250     ],
    251     forge.kwarg("_target_cls", default=target_cls, bound=True),
    252     forge.kwarg("_field", default=field, bound=True),
    253 )(add_fun)

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/forge/_revision.py:331, in Revision.__call__(self, callable)
    328         return callable(*mapped.args, **mapped.kwargs)
    330 next_.validate()
--> 331 inner.__mapper__ = Mapper(next_, callable)  # type: ignore
    332 inner.__signature__ = inner.__mapper__.public_signature  # type: ignore
    333 return inner

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/forge/_revision.py:62, in Mapper.__init__(self, fsignature, callable)
     54 def __init__(
     55         self,
     56         fsignature: FSignature,
   (...)
     59     # pylint: disable=W0622, redefined-builtin
     60     # pylint: disable=W0621, redefined-outer-name
     61     private_signature = inspect.signature(callable)
---> 62     public_signature = fsignature.native
     63     parameter_map = self.map_parameters(fsignature, private_signature)
     64     context_param = get_context_parameter(fsignature)

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/forge/_signature.py:1231, in FSignature.native(self)
   1224 @property
   1225 def native(self) -> inspect.Signature:
   1226     """
   1227     Provides a representation of this :class:`~forge.FSignature` as an
   1228     instance of :class:`inspect.Signature`
   1229     """
   1230     return inspect.Signature(
-> 1231         [param.native for param in self if not param.bound],
   1232         return_annotation=self.return_annotation,
   1233     )

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/forge/_signature.py:1231, in <listcomp>(.0)
   1224 @property
   1225 def native(self) -> inspect.Signature:
   1226     """
   1227     Provides a representation of this :class:`~forge.FSignature` as an
   1228     instance of :class:`inspect.Signature`
   1229     """
   1230     return inspect.Signature(
-> 1231         [param.native for param in self if not param.bound],
   1232         return_annotation=self.return_annotation,
   1233     )

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/forge/_signature.py:355, in FParameter.native(self)
    353 if not self.name:
    354     raise TypeError('Cannot generate an unnamed parameter')
--> 355 return inspect.Parameter(
    356     name=self.name,
    357     kind=self.kind,
    358     default=empty.ccoerce_native(self.default),
    359     annotation=empty.ccoerce_native(self.type),
    360 )

File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/inspect.py:2673, in Parameter.__init__(self, name, kind, default, annotation)
   2670     name = 'implicit{}'.format(name[1:])
   2672 if not name.isidentifier():
-> 2673     raise ValueError('{!r} is not a valid parameter name'.format(name))
   2675 self._name = name

ValueError: 'e-mail' is not a valid parameter name

What am I doing wrong? x-)
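The last frame shows where the error comes from. `forge` builds an `inspect.Parameter` for every annotation of the generated compound class, and the sub-field name `'e-mail'` contains a hyphen, so it is not a valid Python identifier. The minimal sketch below assumes the remedy is to sanitize field names before they become parameters. `to_identifier` is a hypothetical helper, not part of easyDataverse. The sketch reproduces the error with the standard library and shows one way to avoid it.

```python
import inspect
import keyword
import re


def to_identifier(name: str) -> str:
    """Turn a metadata field name such as 'e-mail' into a valid Python identifier."""
    # Replace every character that cannot appear in an identifier with "_"
    cleaned = re.sub(r"\W", "_", name)
    # Identifiers may not start with a digit
    if cleaned and cleaned[0].isdigit():
        cleaned = f"_{cleaned}"
    # Avoid clashes with reserved words such as "class" or "from"
    if keyword.iskeyword(cleaned):
        cleaned = f"{cleaned}_"
    return cleaned


# Reproduces the error from the traceback
try:
    inspect.Parameter("e-mail", inspect.Parameter.KEYWORD_ONLY)
except ValueError as err:
    print(err)  # 'e-mail' is not a valid parameter name

# The sanitized name is accepted
param = inspect.Parameter(to_identifier("e-mail"), inspect.Parameter.KEYWORD_ONLY)
print(param.name)  # e_mail
```

The `Dataset` repr in the `list_files` issue below already shows a `PointOfContact` field named `e_mail`, so newer versions may normalize names in a similar way.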

func list_files does not list files

Hey!

I was testing the Dataset functions and noticed that list_files does not list the files but prints all of the metadata instead. See the example here:

dataset = Dataset.from_dataverse_doi(
  doi="doi:10.7910/DVN/TNI7DY",
  dataverse_url="https://dataverse.harvard.edu"
)
dataset.list_files

Result:

<bound method Dataset.list_files of Dataset(metadatablocks={'citation': Citation(title='Replication Code for: "Essay Content and Style are Strongly Related to Household Income and SAT Scores: Evidence from 60,000 Undergraduate Applications"', author=[Author(name='Alvero, AJ', affiliation='Stanford University', identifier_type=None, identifier=None), Author(name='Giebel, Sonia', affiliation='Stanford University', identifier_type='ORCID', identifier='https://orcid.org/ 0000-0002-4730-8980'), Author(name='Gebre-Medhin, Ben', affiliation='Mount Holyoke College', identifier_type='ORCID', identifier='https://orcid.org/ 0000-0002-8140-0406'), Author(name='antonio, anthony', affiliation='Stanford University', identifier_type='ORCID', identifier='https://orcid.org/ 0000-0002-7083-0355'), Author(name='Stevens, Mitchell', affiliation='Stanford University', identifier_type='ORCID', identifier='https://orcid.org/ 0000-0003-2194-9226'), Author(name='Domingue, Ben', affiliation='Stanford University', identifier_type='ORCID', identifier='https://orcid.org/ 0000-0002-3894-9049')], point_of_contact=[PointOfContact(name='Alvero, AJ', affiliation='Stanford University', e_mail='[email protected]')], description=[Description(text='Code used in paper "Essay Content and Style are Strongly Related to Household Income and SAT Scores: Evidence from 60,000 Undergraduate Applications"')], subject=['Arts and Humanities', 'Computer and Information Science', 'Social Sciences', 'Other'], language=['English'], depositor='AJ Alvero', deposit_date='2021-08-27', _metadatablock_name='citation')}, p_id='doi:10.7910/DVN/TNI7DY', files=[File(filename='bootstrap_CI_sat.R', description='Updated variable/column names', file_pid='5595601', local_path='./bootstrap_CI_sat.R', dv_dir=None), 
...

However, dataset.files correctly lists the files:

[File(filename='bootstrap_CI_sat.R', description='Updated variable/column names', file_pid='5595601', local_path='./bootstrap_CI_sat.R', dv_dir=None),
 File(filename='create_cor_figure_S1.R', description='', file_pid='5414271', local_path='./create_cor_figure_S1.R', dv_dir=None),
 File(filename='ctm_essays.R', description='', file_pid='5414269', local_path='./ctm_essays.R', dv_dir=None),
 File(filename='decile_barplots_code.R', description='', file_pid='5414274', local_path='./decile_barplots_code.R', dv_dir=None),
 File(filename='hist_code.R', description='', file_pid='5414284', local_path='./hist_code.R', dv_dir=None),
 File(filename='k_fold_ebrw_on_rhi.R', description='', file_pid='5414270', local_path='./k_fold_ebrw_on_rhi.R', dv_dir=None),
 File(filename='k_fold_ebrw.R', description='', file_pid='5414279', local_path='./k_fold_ebrw.R', dv_dir=None),
....
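The first output starts with `<bound method Dataset.list_files of ...>`. This is how Python prints a method that has been referenced but not called, and the metadata dump is simply the `repr` of the dataset object the method is bound to. Below is a minimal, self-contained illustration. The `Dataset` class here is a hypothetical stand-in, not the easyDataverse class. In particular, whether the real `list_files` returns the names or prints them is not shown here, so the stand-in's return value is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """Stand-in for easyDataverse's Dataset; only mimics the attribute and method names."""
    p_id: str
    files: List[str] = field(default_factory=list)

    def list_files(self) -> List[str]:
        # Return the file names instead of the whole object
        return list(self.files)


dataset = Dataset(p_id="doi:10.7910/DVN/TNI7DY", files=["bootstrap_CI_sat.R", "hist_code.R"])

# Without parentheses: a bound method object whose repr embeds repr(dataset)
print(dataset.list_files)
# With parentheses: the method actually runs
print(dataset.list_files())  # ['bootstrap_CI_sat.R', 'hist_code.R']
```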

Add NetCDF initializer

Following the work in issue IQSS/dataverse#9331 and discussions with @pdurbin and @atrisovic, EasyDataverse should be extended with a classmethod that initializes or adds NetCDF data (only the bounding box for now) to a Dataset object when the geospatial metadata block is present.

This method should follow either @pdurbin's implementation in IQSS/dataverse#9523 or @atrisovic's in pdurbin/dataverse#2. We still need to decide which one is final, unless that has already been settled.

Here is an example notebook that already extracts the bounding box but may need some optimization. Currently it extracts the box based on units only, and I don't think it aligns with both PRs yet.
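Here is a rough sketch of the unit-based approach. NetCDF files that follow the CF conventions mark latitude variables with units such as `degrees_north` and longitude variables with `degrees_east`. A bounding box can therefore be taken from the extremes of those variables. The sketch makes several assumptions:

* The coordinate variables have already been read into plain Python (for example with a NetCDF reader) as a mapping of variable name to `(units, values)`.
* `bounding_box_from_units` and its output keys are hypothetical. They are not the field names of Dataverse's geospatial block.
* Longitude ranges that cross the antimeridian or use 0 to 360 degrees are not handled.

```python
from typing import Dict, List, Sequence, Tuple

# Unit spellings that the CF conventions accept for latitude and longitude
LAT_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE"}


def bounding_box_from_units(variables: Dict[str, Tuple[str, Sequence[float]]]) -> Dict[str, float]:
    """Derive west/east/south/north bounds from variables identified by their units."""
    lats: List[float] = []
    lons: List[float] = []
    for units, values in variables.values():
        if units in LAT_UNITS:
            lats.extend(values)
        elif units in LON_UNITS:
            lons.extend(values)
    if not lats or not lons:
        raise ValueError("No latitude/longitude variables found by units")
    return {
        "west": min(lons),
        "east": max(lons),
        "south": min(lats),
        "north": max(lats),
    }


coords = {
    "lat": ("degrees_north", [45.0, 46.5, 48.0]),
    "lon": ("degrees_east", [5.5, 7.0, 10.5]),
    "time": ("days since 2000-01-01", [0, 1, 2]),
}
print(bounding_box_from_units(coords))
# {'west': 5.5, 'east': 10.5, 'south': 45.0, 'north': 48.0}
```

Mapping these four values onto the geospatial block would be left to the classmethod itself, following whichever of the two PRs is chosen.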

Feature request: Add file tags (categories) to files in a dataset

If I am not mistaken, the categories field of a file in a dataset cannot be set by dataset.add_file() at the moment and is not part of the File class.

https://guides.dataverse.org/en/latest/api/native-api.html#updating-file-metadata

def add_file(self, dv_path: str, local_path: str, description: str = ""):
    """Adds a file to the dataset based on the provided path.

    Args:
        filename (str): Path to the file to be added.
        description (str, optional): Description of the file. Defaults to "".
    """

from typing import Optional

from pydantic import BaseModel


class File(BaseModel):
    filename: str
    description: Optional[str] = None
    file_pid: Optional[str] = None
    local_path: Optional[str] = None
    dv_dir: Optional[str] = None
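Below is one possible shape for the requested feature. It uses a stand-in dataclass rather than the pydantic model above. The file record gains a hypothetical `categories` list. A hypothetical `to_json_data` method builds the `jsonData` body for the linked "Updating File Metadata" endpoint, which receives file tags under the key `categories`. The field and method names are assumptions and are not part of easyDataverse.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileWithCategories:
    """Hypothetical extension of easyDataverse's File model with file tags."""
    filename: str
    description: Optional[str] = None
    file_pid: Optional[str] = None
    local_path: Optional[str] = None
    dv_dir: Optional[str] = None
    categories: List[str] = field(default_factory=list)

    def to_json_data(self) -> str:
        """Build the jsonData form field for a file metadata update."""
        payload = {}
        if self.description is not None:
            payload["description"] = self.description
        if self.categories:
            # Dataverse calls file tags "categories", e.g. ["Data", "Code"]
            payload["categories"] = self.categories
        return json.dumps(payload)


f = FileWithCategories(filename="hist_code.R", description="Histogram code", categories=["Code"])
print(f.to_json_data())  # {"description": "Histogram code", "categories": ["Code"]}
```

Following the linked guide, this string would be POSTed as the `jsonData` form field to `/api/files/{id}/metadata`. The same key could also be passed when a file is first added to a dataset.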
