gdcc / easydataverse
Lightweight Dataverse interface in Python to upload, download and update datasets found in Dataverse installations.
License: MIT License
Hi,
I was trying to upload a bunch of files with descriptions directly with easyDataverse but I am having a little issue. Although the files are uploaded, the information corresponding to the folder structure and file description is lost. Does this happen to anyone else, or is it just me?
Cheers
The panel shows the dataverse, the doi, but not the final link to easily access it.
Maybe this error, "No 'API_TOKEN' found in the environment.", should refer to DATAVERSE_API_TOKEN instead of API_TOKEN, as that is what the user sets/exports.
easyDataverse/easyDataverse/core/dataset.py
Line 470 in 81d8d45
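A minimal sketch of what the suggested wording could look like, assuming the token is read from the DATAVERSE_API_TOKEN environment variable as described above. The helper name read_api_token and its environ parameter are hypothetical and are not part of easyDataverse; only the warning text follows the message quoted in this report.

```python
import os
import warnings

# Variable name taken from the report above: this is what the user exports.
TOKEN_ENV_VAR = "DATAVERSE_API_TOKEN"


def read_api_token(environ=None):
    """Hypothetical helper: return the token, or None after warning with the real variable name."""
    environ = os.environ if environ is None else environ
    try:
        return environ[TOKEN_ENV_VAR]
    except KeyError:
        # Building the message from the same constant keeps the name in sync.
        warnings.warn(
            f"No '{TOKEN_ENV_VAR}' found in the environment. "
            "Please be aware that you might not have the rights to download this dataset."
        )
        return None
```

Deriving the message from the same constant that is used for the lookup means the warning cannot drift away from the variable the user actually has to set.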
Hey!
I successfully installed easyDataverse, but I get an error when creating a dataset.
This is my code:
from easyDataverse import Dataset
dataset = Dataset.from_url("https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/Y73N6C")
This is the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[10], line 1
----> 1 dataset = Dataset.from_url("https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/Y73N6C")
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:40, in pydantic.decorator.validate_arguments.validate.wrapper_function()
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:134, in pydantic.decorator.ValidatedFunction.call()
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:201, in pydantic.decorator.ValidatedFunction.execute()
File ~/PycharmProjects/easyDataverse/easyDataverse/core/dataset.py:359, in Dataset.from_url(cls, url, filedir, download_files, api_token, lib_name)
353 raise ValueError(
354 f"Given URL '{url}' is not a valid Dataverse URL since no 'persistenID' is given"
355 )
357 dataverse_url = f"https://{parsed_url.hostname}/"
--> 359 return cls.from_dataverse_doi(
360 doi=doi[0],
361 filedir=filedir,
362 lib_name=lib_name,
363 dataverse_url=dataverse_url,
364 api_token=api_token,
365 download_files=download_files,
366 )
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:40, in pydantic.decorator.validate_arguments.validate.wrapper_function()
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:134, in pydantic.decorator.ValidatedFunction.call()
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/pydantic/decorator.py:201, in pydantic.decorator.ValidatedFunction.execute()
File ~/PycharmProjects/easyDataverse/easyDataverse/core/dataset.py:416, in Dataset.from_dataverse_doi(cls, doi, filedir, filenames, download_files, lib_name, dataverse_url, api_token)
412 if not dataverse_url:
413 raise ValueError(
414 "Dataverse URL has not been specified in argument 'dataverse_url'. Please specify it to download datasets from your desired installation."
415 )
--> 416 return cls._fetch_without_lib(
417 dataset=cls(),
418 doi=doi,
419 filedir=filedir,
420 dataverse_url=dataverse_url,
421 api_token=api_token,
422 filenames=filenames,
423 )
425 elif not lib_name and "EASYDATAVERSE_LIB_NAME" in os.environ:
426 dataverse_url, api_token = cls._fetch_env_vars(api_token)
File ~/PycharmProjects/easyDataverse/easyDataverse/core/dataset.py:473, in Dataset._fetch_without_lib(**kwargs)
468 except KeyError:
469 warnings.warn(
470 "No 'API_TOKEN' found in the environment. Please be aware, that you might not have the rights to download this dataset."
471 )
--> 473 return download_from_dataverse_without_lib(**kwargs)
File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/downloader.py:99, in download_from_dataverse_without_lib(dataset, doi, filedir, dataverse_url, api_token, filenames)
96 dataset.p_id = doi
98 # Step 2: Extract all metadatablocks from the given dataset
---> 99 blocks = [
100 create_block_definitions(block_name, block, dataverse_url)
101 for block_name, block in metadatablocks.items()
102 ]
104 # Step 3: Populate data and assign to dataset
105 for block in blocks:
File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/downloader.py:100, in <listcomp>(.0)
96 dataset.p_id = doi
98 # Step 2: Extract all metadatablocks from the given dataset
99 blocks = [
--> 100 create_block_definitions(block_name, block, dataverse_url)
101 for block_name, block in metadatablocks.items()
102 ]
104 # Step 3: Populate data and assign to dataset
105 for block in blocks:
File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/nolibutils.py:95, in create_block_definitions(block_name, block, dataverse_url)
93 # Turn raw definitions into classes
94 for field in block["fields"]:
---> 95 _process_field(field, lookup, cls_def, add_funs)
97 # Now, create the class
98 block_cls = create_model(
99 block_name.capitalize(), __base__=(DataverseBase,), **cls_def
100 )()
File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/nolibutils.py:138, in _process_field(field, lookup, cls_def, add_funs)
131 PROCESS_MAPPING = {
132 "primitive": _process_primitive,
133 "controlledVocabulary": _process_primitive,
134 "compound": _process_compound,
135 }
137 fun = PROCESS_MAPPING[field["typeClass"]]
--> 138 fun(field, lookup, cls_def, add_funs)
File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/nolibutils.py:162, in _process_compound(field, lookup, cls_def, add_funs)
159 field_name = _camel_to_snake(field_name)
161 # Generate add method
--> 162 add_funs[f"add_{field_name}"] = _generate_add_method(cls, field_name)
164 cls_def[field_name] = (dtype, Field(**field_meta))
File ~/PycharmProjects/easyDataverse/easyDataverse/tools/downloader/nolibutils.py:245, in _generate_add_method(target_cls, field)
242 add_fun = copy.deepcopy(_generic_add_function)
243 add_fun.__name__ = f"add_{field}"
--> 245 return forge.sign(
246 forge.self,
247 *[
248 forge.kwarg(name, type=dtype, default=forge.empty)
249 for name, dtype in target_cls.__annotations__.items()
250 ],
251 forge.kwarg("_target_cls", default=target_cls, bound=True),
252 forge.kwarg("_field", default=field, bound=True),
253 )(add_fun)
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/forge/_revision.py:331, in Revision.__call__(self, callable)
328 return callable(*mapped.args, **mapped.kwargs)
330 next_.validate()
--> 331 inner.__mapper__ = Mapper(next_, callable) # type: ignore
332 inner.__signature__ = inner.__mapper__.public_signature # type: ignore
333 return inner
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/forge/_revision.py:62, in Mapper.__init__(self, fsignature, callable)
54 def __init__(
55 self,
56 fsignature: FSignature,
(...)
59 # pylint: disable=W0622, redefined-builtin
60 # pylint: disable=W0621, redefined-outer-name
61 private_signature = inspect.signature(callable)
---> 62 public_signature = fsignature.native
63 parameter_map = self.map_parameters(fsignature, private_signature)
64 context_param = get_context_parameter(fsignature)
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/forge/_signature.py:1231, in FSignature.native(self)
1224 @property
1225 def native(self) -> inspect.Signature:
1226 """
1227 Provides a representation of this :class:`~forge.FSignature` as an
1228 instance of :class:`inspect.Signature`
1229 """
1230 return inspect.Signature(
-> 1231 [param.native for param in self if not param.bound],
1232 return_annotation=self.return_annotation,
1233 )
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/forge/_signature.py:1231, in <listcomp>(.0)
1224 @property
1225 def native(self) -> inspect.Signature:
1226 """
1227 Provides a representation of this :class:`~forge.FSignature` as an
1228 instance of :class:`inspect.Signature`
1229 """
1230 return inspect.Signature(
-> 1231 [param.native for param in self if not param.bound],
1232 return_annotation=self.return_annotation,
1233 )
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/site-packages/forge/_signature.py:355, in FParameter.native(self)
353 if not self.name:
354 raise TypeError('Cannot generate an unnamed parameter')
--> 355 return inspect.Parameter(
356 name=self.name,
357 kind=self.kind,
358 default=empty.ccoerce_native(self.default),
359 annotation=empty.ccoerce_native(self.type),
360 )
File ~/opt/miniconda3/envs/easyDVconda/lib/python3.10/inspect.py:2673, in Parameter.__init__(self, name, kind, default, annotation)
2670 name = 'implicit{}'.format(name[1:])
2672 if not name.isidentifier():
-> 2673 raise ValueError('{!r} is not a valid parameter name'.format(name))
2675 self._name = name
ValueError: 'e-mail' is not a valid parameter name
What am I doing wrong? x-)
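The last frame of the traceback points at the cause: the metadata field name 'e-mail' is passed on as a parameter name, and inspect.Parameter rejects anything that is not a valid Python identifier. The following is only a sketch of one possible workaround, not the fix applied in easyDataverse; the helper to_identifier is hypothetical.

```python
import inspect
import keyword
import re


def to_identifier(name: str) -> str:
    """Hypothetical helper: map a metadata field name to a valid Python identifier."""
    # Replace every character that is not a letter, digit or underscore.
    cleaned = re.sub(r"\W", "_", name)
    # Identifiers must not be empty or start with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = f"_{cleaned}"
    # Avoid reserved words such as "class".
    if keyword.iskeyword(cleaned):
        cleaned = f"{cleaned}_"
    return cleaned


# The raw field name reproduces the error from the traceback.
try:
    inspect.Parameter("e-mail", inspect.Parameter.KEYWORD_ONLY)
except ValueError as err:
    print(err)  # 'e-mail' is not a valid parameter name

# After sanitizing, the parameter can be created.
param = inspect.Parameter(to_identifier("e-mail"), inspect.Parameter.KEYWORD_ONLY)
print(param.name)  # e_mail
```

If names are rewritten this way, the original field name still has to be kept somewhere (for example as an alias) so that the metadata can be written back to Dataverse unchanged.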
EasyDataverse needs to be properly documented to improve its usability. Something in the style of Pydantic's docs would be nice.
Hey!
I was testing the Dataset functions and I noticed that list_files does not list files, but all metadata. See the example here:
dataset = Dataset.from_dataverse_doi(
doi="doi:10.7910/DVN/TNI7DY",
dataverse_url="https://dataverse.harvard.edu"
)
dataset.list_files
Result:
<bound method Dataset.list_files of Dataset(metadatablocks={'citation': Citation(title='Replication Code for: "Essay Content and Style are Strongly Related to Household Income and SAT Scores: Evidence from 60,000 Undergraduate Applications"', author=[Author(name='Alvero, AJ', affiliation='Stanford University', identifier_type=None, identifier=None), Author(name='Giebel, Sonia', affiliation='Stanford University', identifier_type='ORCID', identifier='https://orcid.org/ 0000-0002-4730-8980'), Author(name='Gebre-Medhin, Ben', affiliation='Mount Holyoke College', identifier_type='ORCID', identifier='https://orcid.org/ 0000-0002-8140-0406'), Author(name='antonio, anthony', affiliation='Stanford University', identifier_type='ORCID', identifier='https://orcid.org/ 0000-0002-7083-0355'), Author(name='Stevens, Mitchell', affiliation='Stanford University', identifier_type='ORCID', identifier='https://orcid.org/ 0000-0003-2194-9226'), Author(name='Domingue, Ben', affiliation='Stanford University', identifier_type='ORCID', identifier='https://orcid.org/ 0000-0002-3894-9049')], point_of_contact=[PointOfContact(name='Alvero, AJ', affiliation='Stanford University', e_mail='[email protected]')], description=[Description(text='Code used in paper "Essay Content and Style are Strongly Related to Household Income and SAT Scores: Evidence from 60,000 Undergraduate Applications"')], subject=['Arts and Humanities', 'Computer and Information Science', 'Social Sciences', 'Other'], language=['English'], depositor='AJ Alvero', deposit_date='2021-08-27', _metadatablock_name='citation')}, p_id='doi:10.7910/DVN/TNI7DY', files=[File(filename='bootstrap_CI_sat.R', description='Updated variable/column names', file_pid='5595601', local_path='./bootstrap_CI_sat.R', dv_dir=None),
...
However, dataset.files correctly lists the files:
[File(filename='bootstrap_CI_sat.R', description='Updated variable/column names', file_pid='5595601', local_path='./bootstrap_CI_sat.R', dv_dir=None),
File(filename='create_cor_figure_S1.R', description='', file_pid='5414271', local_path='./create_cor_figure_S1.R', dv_dir=None),
File(filename='ctm_essays.R', description='', file_pid='5414269', local_path='./ctm_essays.R', dv_dir=None),
File(filename='decile_barplots_code.R', description='', file_pid='5414274', local_path='./decile_barplots_code.R', dv_dir=None),
File(filename='hist_code.R', description='', file_pid='5414284', local_path='./hist_code.R', dv_dir=None),
File(filename='k_fold_ebrw_on_rhi.R', description='', file_pid='5414270', local_path='./k_fold_ebrw_on_rhi.R', dv_dir=None),
File(filename='k_fold_ebrw.R', description='', file_pid='5414279', local_path='./k_fold_ebrw.R', dv_dir=None),
....
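One thing worth checking: the result above starts with "<bound method Dataset.list_files of Dataset(...)", which is what Python prints when a method is referenced without parentheses, so the method was never actually called. The sketch below shows the difference with a stand-in class; Demo is hypothetical and is not the easyDataverse Dataset, and what dataset.list_files() returns or prints is not shown in this report.

```python
class Demo:
    """Stand-in class for illustration; not the easyDataverse Dataset."""

    def __init__(self):
        self.files = ["a.R", "b.R"]

    def list_files(self):
        return list(self.files)


demo = Demo()

# Without parentheses: the bound method object itself. Its repr embeds the
# repr of the instance, which is why the whole dataset showed up above.
print(callable(demo.list_files))  # True

# With parentheses: the method is called and its result is returned.
print(demo.list_files())  # ['a.R', 'b.R']
```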
Following from the work in issue IQSS/dataverse#9331 and talks with @pdurbin and @atrisovic, EasyDataverse should be extended with a classmethod that initializes/adds NetCDF data (bounding box only atm) to a Dataset object, if the geospatial metadata block is present.
This method should follow the implementation of @pdurbin IQSS/dataverse#9523 or @atrisovic pdurbin/dataverse#2 - We may decide which one is the final one, if not already decided?
Here is an example notebook, which already extracts the bounding box, but may need some optimization. Currently, it's just extracting it based on unit, but I think it doesn't align to both PRs atm.
Currently, JSON schemas are exported upon code generation. There should be a method to skip code generation and obtain only the schemas.
Hi!
Currently, it is not possible to add / change license information of datasets in Dataverse using the easyDataverse library. Support for that would be awesome, though!
If I am not mistaken, the categories field of a file in a dataset cannot be set by dataset.add_file() at the moment and is not part of the File class.
https://guides.dataverse.org/en/latest/api/native-api.html#updating-file-metadata
easyDataverse/easyDataverse/core/dataset.py
Lines 57 to 63 in e835795
easyDataverse/easyDataverse/core/file.py
Lines 7 to 13 in e835795
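Until the File class supports it, categories can presumably be set through the native API endpoint linked above, which takes a jsonData form field on POST to /api/files/{id}/metadata. The sketch below only builds the URL and form fields and does not send anything; the helper build_file_metadata_request, the file id 42 and the description text are made up for illustration and are not part of easyDataverse.

```python
import json


def build_file_metadata_request(base_url, file_id, categories, description=None):
    """Hypothetical helper: return (url, form_fields) for the file-metadata endpoint."""
    url = f"{base_url.rstrip('/')}/api/files/{file_id}/metadata"
    payload = {"categories": list(categories)}
    if description is not None:
        payload["description"] = description
    # The endpoint expects the JSON document in a form field named "jsonData".
    return url, {"jsonData": json.dumps(payload)}


# Placeholder values: 42 is not a real file id.
url, form = build_file_metadata_request(
    "https://demo.dataverse.org", 42, ["Data"], "Raw measurements"
)
print(url)
print(form["jsonData"])
```

To actually send the request, the form fields would be posted as multipart form data together with the X-Dataverse-key header carrying the API token.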
Maybe you can update your package here: https://pypi.org/project/easyDataverse/