dvcurator-python's Introduction

dvcurator -- Automating common Dataverse Curation tasks

The tool is based on QDR's curation practices and will likely require modifications for other repositories.

Functionality

This program executes four main tasks:

  1. Creating a local curation folder (ideally synced elsewhere, e.g. with Dropbox) for editing the data, with subfolders for Original Deposit and QDR Prepared
  2. Downloading the .zip file for the full data project to the Original Deposit folder and an unzipped version to QDR Prepared for further curation
  3. Creating GitHub issues for standard curation tasks and associating them with a GitHub Project for the curation of the data project
  4. Automatically setting metadata for PDF files based on Dataverse metadata
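
The folder layout from the first task can be sketched in a few lines of Python. This is only an illustration, not dvcurator's own code: the function name `make_curation_folders` and the exact subfolder names are assumptions based on the description above.

```python
from pathlib import Path


def make_curation_folders(root, project_name):
    """Create <root>/<project_name>/ with the two curation subfolders.

    The subfolder names are assumed from the README text, not read
    from dvcurator itself.
    """
    project_dir = Path(root) / project_name
    original = project_dir / "Original Deposit"
    prepared = project_dir / "QDR Prepared"
    for folder in (original, prepared):
        # exist_ok=True makes the call safe to repeat on an existing project
        folder.mkdir(parents=True, exist_ok=True)
    return original, prepared
```

Calling `make_curation_folders("/path/to/QDR GA", "example-project")` would return the two folder paths, creating them if they do not exist yet.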

Installing dvcurator-python

Most users will want to use the applications available in the release section.

NOTE: Syracuse University office computers cannot run this program directly. Instead, download the SU Lab Install.bat file (right-click "Raw" at the top and choose "Save Link As") and run it. This installs the program properly and creates a shortcut on your desktop.

Running dvcurator-python

This program is operated primarily through the GUI. If you downloaded the self-contained binaries, just double-click and run.

Running the .exe file from the release page requires no additional software.

A .ini configuration file stores program settings, such as the GitHub and Dataverse tokens. After entering the values in the program, you can save a .ini config file from the "File" menu at the top.
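
For readers unfamiliar with .ini files, here is a minimal sketch of writing and reading such a file with Python's standard `configparser` module. The section name `settings` and the key names are hypothetical; the file that dvcurator writes may be laid out differently.

```python
import configparser


def save_settings(path, github_token, dataverse_token):
    """Write both tokens to an .ini file (section and key names are assumed)."""
    # interpolation=None keeps any '%' characters in a token literal
    config = configparser.ConfigParser(interpolation=None)
    config["settings"] = {
        "github_token": github_token,
        "dataverse_token": dataverse_token,
    }
    with open(path, "w", encoding="utf-8") as handle:
        config.write(handle)


def load_settings(path):
    """Read the tokens back as a plain dict."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(path, encoding="utf-8")
    return dict(config["settings"])
```

Since the file holds tokens in plain text, it is best kept out of shared or public folders.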

Some functions, like downloading public datasets, will operate without API tokens, but expect potential bugs.
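
To illustrate why a token is optional for public datasets, the sketch below builds the URL and headers for a Dataverse "download all files" request. It relies on the Dataverse Data Access API endpoint `/api/access/dataset/:persistentId/` and the `X-Dataverse-key` header; the helper name `build_download_request` and the host in the usage note are hypothetical, and dvcurator may issue its requests differently.

```python
from urllib.parse import urlencode


def build_download_request(host, doi, api_token=None):
    """Return (url, headers) for downloading a dataset as a zip archive.

    The header is only added when a token is supplied, so public
    datasets can be requested anonymously.
    """
    query = urlencode({"persistentId": doi})
    url = f"{host.rstrip('/')}/api/access/dataset/:persistentId/?{query}"
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    return url, headers
```

The resulting pair could be passed to any HTTP client, e.g. `build_download_request("https://dataverse.example.org", "doi:10.1234/abcdef")`.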

Tokens

To be fully functional, the following parameters must be set:

  • A project DOI in the form doi:10.1234/abcdef. Once metadata is loaded for that DOI, use the "Reset dvcurator" button to input a different DOI.
  • A github token
    • To create a github token, go to your github developer settings/personal access tokens at https://github.com/settings/tokens
    • Click on "Generate New Token", and select "Generate New Token (classic)"
    • Give the token a recognizable name such as "QDR Curation" and check the following boxes:
      • repo
      • admin:org
      • project
    • Click "Generate Token" at the bottom of the screen. Make sure to note down your token and keep it safe (you won't be able to access this later)
  • A dataverse API key -- this must be for the dataverse installation you will work with.

Both of these tokens are entered in the main window of dvcurator, under "Github token" and "Dataverse token" respectively.
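
As a quick illustration of the expected DOI form, the check below accepts strings like doi:10.1234/abcdef and rejects bare DOIs or full URLs. The regular expression is an assumption made for this example (a `doi:` prefix, a `10.` registrant code of 4 to 9 digits, and a non-empty suffix); dvcurator's own validation, if any, may differ.

```python
import re

# Assumed pattern: "doi:" + "10." + 4-9 digit registrant code + "/" + suffix
DOI_PATTERN = re.compile(r"^doi:10\.\d{4,9}/\S+$")


def looks_like_project_doi(value):
    """Return True if value has the form doi:10.1234/abcdef."""
    return bool(DOI_PATTERN.match(value.strip()))
```

For example, `looks_like_project_doi("doi:10.1234/abcdef")` is true, while the same DOI without the `doi:` prefix is rejected.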

Other parameters are:

  • QDR GA folder: Where the archive will be downloaded and extracted. This usually points to a folder that syncs with Dropbox, but it does not have to. For QDR GAs, this should be the actual "QDR GA" folder within the QDR Dropbox.

For developers (i.e. not most users!)

The more adventurous can install this package directly through pip. If you have both pip and git installed, this package can be downloaded and installed directly with:

pip install git+https://github.com/QualitativeDataRepository/dvcurator-python

Otherwise, this package can be installed from a zip file:

pip install dvcurator-python-master.zip

If you want to run dvcurator as an interpreted program, the Python library requirements are listed in requirements.txt.

Running

Installations through pip can be run directly, e.g.

python3 -m dvcurator

dvcurator-python's People

Contributors

adam3smith, mccallc

dvcurator-python's Issues

documentation

  • high-level documentation about how functions fit together
  • roxygen type documentation for each function

Update issues

  • Add accessibility
  • Change "add terms" language

Look through Curation Handbook if anything else needs updating

pikepdf metadata missing in some github-compiled versions

Traceback (most recent call last):
  File "pikepdf/__init__.py", line 19, in <module>
  File "PyInstaller/loader/pyimod03_importers.py", line 495, in exec_module
  File "pikepdf/_version.py", line 13, in <module>
  File "importlib/metadata/__init__.py", line 984, in version
  File "importlib/metadata/__init__.py", line 957, in distribution
  File "importlib/metadata/__init__.py", line 548, in from_name
importlib.metadata.PackageNotFoundError: No package metadata was found for pikepdf

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tkinter/__init__.py", line 1921, in __call__
  File "dvcurator/gui.py", line 152, in set_metadata
  File "dvcurator/pdf_metadata.py", line 44, in standard_metadata
  File "PyInstaller/loader/pyimod03_importers.py", line 495, in exec_module
  File "pikepdf/__init__.py", line 21, in <module>
ImportError: Failed to determine version
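
The first traceback ends in `importlib.metadata.PackageNotFoundError`, meaning the frozen build could not find pikepdf's package metadata. A small diagnostic along these lines (a sketch only; `installed_version` is a hypothetical helper, not part of dvcurator) can confirm whether metadata is visible in a given environment. PyInstaller offers a `--copy-metadata` option for bundling package metadata; whether that resolves this particular build is untested here.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package):
    """Return the package's version string, or None if no metadata is found."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The same condition that the traceback above reports for pikepdf
        return None
```

Running `installed_version("pikepdf")` inside the compiled application would return `None` in the failing builds.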


pikepdf fails to set metadata

  File "/usr/lib/python3.7/tkinter/__init__.py", line 1705, in __call__
    return self.func(*args)
  File "/home/michael/Documents/qdr/dvcurator-python/dvcurator/gui.py", line 152, in set_metadata
    dvcurator.pdf_metadata.standard_metadata(metadata_path, self.citation['depositor'])
  File "/home/michael/Documents/qdr/dvcurator-python/dvcurator/pdf_metadata.py", line 51, in standard_metadata
    pdf = pikepdf.open(path, allow_overwriting_input=True)
  File "/usr/lib/python3/dist-packages/pikepdf/__init__.py", line 41, in open
    return Pdf.open(*args, **kwargs)
TypeError: open(): incompatible function arguments. The following argument types are supported:
    1. (filename_or_stream: object, password: str='', hex_password: bool=False, ignore_xref_streams: bool=False, suppress_warnings: bool=True, attempt_recovery: bool=True, inherit_page_attributes: bool=True) -> pikepdf._qpdf.Pdf

Use creator (not depositor) lastname for metadata

Since there can be more than one creator, we'd need to make sure we capture those cases. Preferred behavior would be:

1 author: author1lastname -
2 authors: author1lastname-author2lastname -
3+ authors: author1lastname-et-al -
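
The rule proposed above could be sketched as follows. The function name `author_prefix` is hypothetical, and the sketch returns only the name part, leaving the trailing " -" separator to the caller.

```python
def author_prefix(lastnames):
    """Build the name part of the prefix from a list of creator last names."""
    if not lastnames:
        raise ValueError("at least one creator last name is required")
    if len(lastnames) == 1:
        return lastnames[0]
    if len(lastnames) == 2:
        return f"{lastnames[0]}-{lastnames[1]}"
    # Three or more creators: first last name followed by "et-al"
    return f"{lastnames[0]}-et-al"
```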

Clarify build/versioning process

Currently, automatic builds fail because they're We should probably do proper semantic versioning. Could we mint a 0.0.1 release and then automatically incrementally tag on build? There's really no cost to having some arbitrarily large number in the end. Alternatively, we keep the build workflow manual. I'm also OK with that.
Suggestions for other options welcome.

If we can display the version number in the tool, that'd be helpful for error reporting.
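
As a sketch of the "incrementally tag on build" idea, the helper below bumps the patch component of a three-part version string. The name `bump_patch` is hypothetical, and it assumes plain MAJOR.MINOR.PATCH versions without pre-release or build suffixes.

```python
def bump_patch(version):
    """Return the version with its patch number increased by one."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"
```

Starting from a 0.0.1 release, each build would call `bump_patch` on the latest tag to get the next one, so the patch number simply grows over time.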

projects beta

long term, might be worth exploring the new github projects system. it's currently in beta and they expect API changes

ticket changes

change "assign depositor to project" to "assign curator"
