rseng / rse

tools for assessment and categorization of research software

Home Page: https://rseng.github.io/rse/

License: Mozilla Public License 2.0

Python 45.68% Dockerfile 0.10% Shell 1.16% CSS 23.61% HTML 25.09% Makefile 1.36% Ruby 0.01% JavaScript 2.98%

rseng research-software research-software-encyclopedia

rse's Introduction

Research Software Engineering

Criteria and taxonomy for research software engineering (rseng).

Overview

This repository provides a taxonomy and criteria for research software, intended to be used with the Research Software Encyclopedia. The criteria and taxonomy are maintained separately from the encyclopedia, both for ease of development and because they may be useful on their own, apart from the encyclopedia.

How do I contribute?

You can edit taxonomy or criteria items by opening a pull request against the master branch. When it is merged, an automated task updates the interface served at https://rseng.github.io/rseng. You can also use the rseng software for your own needs, as shown below.

Usage

The library supports programmatic interaction with the criteria and taxonomy (from Python or the command line), as well as generation of output files.

Criteria
Taxonomy
Generate

Criteria

To use the library from Python, first instantiate a CriteriaSet. If you don't provide a file, the library default is used.

from rseng.main.criteria import CriteriaSet
cset = CriteriaSet()
# [CriteriaSet:6]

You can then see the loaded questions. Each has a unique id that hints at what is being asked:

cset.criteria                                                                       
{'RSE-research-intention': <rseng.main.criteria.base.Criteria at 0x7f3d2e85d410>,
 'RSE-domain-intention': <rseng.main.criteria.base.Criteria at 0x7f3d2dab8490>,
 'RSE-question-intention': <rseng.main.criteria.base.Criteria at 0x7f3d2dab8910>,
 'RSE-citation': <rseng.main.criteria.base.Criteria at 0x7f3d2db34810>,
 'RSE-usage': <rseng.main.criteria.base.Criteria at 0x7f3d2db340d0>,
 'RSE-absence': <rseng.main.criteria.base.Criteria at 0x7f3d2db34850>}

You can inspect any particular criterion:

cset.criteria['RSE-usage']
# <rseng.main.criteria.base.Criteria at 0x7f3d2db340d0>

cset.criteria['RSE-usage'].uid
# 'RSE-usage'

cset.criteria['RSE-usage'].question
# 'Has the software been used by researchers?'

cset.criteria['RSE-usage'].options
# ['yes', 'no']

You can further interact with the CriteriaSet, for example by exporting it to a tabular file:

print(cset.export()) # You can also define a "filename" and/or "sep" here.
RSE-research-intention	Is the software intended for research?	yes,no
RSE-domain-intention	Is the software intended for a particular domain?	yes,no
RSE-question-intention	Was the software created with intention to solve a research question?	yes,no
RSE-citation	Has the software been cited?	yes,no
RSE-usage	Has the software been used by researchers?	yes,no
RSE-absence	Would taking away the software be a detriment to research?	yes,no
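The export above has three tab-separated columns: the criterion id, the question, and the comma-separated options. If you need to consume such a file without rseng, a standard-library sketch could look like the following. It assumes the default tab separator and that no field contains that separator. `read_criteria_table` is a hypothetical helper, not part of rseng.

```python
import csv
import io
from typing import Dict, List


def read_criteria_table(text: str, sep: str = "\t") -> List[Dict[str, object]]:
    """Parse uid / question / options rows, as in the export above, into dicts."""
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter=sep):
        if not row:
            continue  # skip blank lines
        uid, question, options = row
        rows.append({"uid": uid, "question": question, "options": options.split(",")})
    return rows
```

For example, `read_criteria_table(cset.export())` would give one dict per criterion, with `options` as a list such as `['yes', 'no']`.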

You can also get a list of all criteria, or iterate through them:

list(cset)
[[Criteria:RSE-research-intention,Is the software intended for research?],
 [Criteria:RSE-domain-intention,Is the software intended for a particular domain?],
 [Criteria:RSE-question-intention,Was the software created with intention to solve a research question?],
 [Criteria:RSE-citation,Has the software been cited?],
 [Criteria:RSE-usage,Has the software been used by researchers?],
 [Criteria:RSE-absence,Would taking away the software be a detriment to research?]]

for criteria in cset:
    print(criteria)

[Criteria:RSE-research-intention,Is the software intended for research?]
[Criteria:RSE-domain-intention,Is the software intended for a particular domain?]
[Criteria:RSE-question-intention,Was the software created with intention to solve a research question?]
[Criteria:RSE-citation,Has the software been cited?]
[Criteria:RSE-usage,Has the software been used by researchers?]
[Criteria:RSE-absence,Would taking away the software be a detriment to research?]

Taxonomy

The taxonomy is used in a similar fashion. Its representation shows the total number of nodes (including nested ones):

from rseng.main.taxonomy import Taxonomy
tax = Taxonomy()
#  [Taxonomy:24]

The default file is validated as it is loaded. As with the criteria, the files live in rseng/main/taxonomy in YAML format and are dated. You can quickly print a human-readable version of the tree:

for name in tax.flatten():
    print(name)
Software to directly conduct research >> Domain-specific software >> Domain-specific hardware
Software to directly conduct research >> Domain-specific software >> Domain-specific optimized software
Software to directly conduct research >> Domain-specific software >> Domain-specific analysis software
Software to directly conduct research >> General software >> Numerical libraries
Software to directly conduct research >> General software >> Data collection
Software to directly conduct research >> General software >> Visualization
Software to support research >> Explicitly for research >> Workflow managers
Software to support research >> Explicitly for research >> Interactive development environments for research
Software to support research >> Explicitly for research >> Provenance and metadata collection tools
Software to support research >> Used for research but not explicitly for it >> Databases
Software to support research >> Used for research but not explicitly for it >> Application Programming Interfaces
Software to support research >> Used for research but not explicitly for it >> Frameworks
Software to support research >> Incidentally used for research >> Operating systems
Software to support research >> Incidentally used for research >> Personal scheduling and task management
Software to support research >> Incidentally used for research >> Version control
Software to support research >> Incidentally used for research >> Text editors and integrated development environments
Software to support research >> Incidentally used for research >> Communication tools or platforms
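To show how a nested tree becomes the ` >> `-joined paths above, here is a minimal recursive sketch. The node layout (a `name` plus an optional `children` list) is a hypothetical structure chosen for illustration. It is not the schema of the taxonomy YAML files, and it is not rseng's `flatten` implementation.

```python
from typing import Dict, Iterator, List, Optional


def flatten(node: Dict, prefix: Optional[List[str]] = None) -> Iterator[str]:
    """Yield one ' >> '-joined path for each leaf under node."""
    path = (prefix or []) + [node["name"]]
    children = node.get("children", [])
    if not children:
        # A leaf: emit the full path from the top-level node down to here.
        yield " >> ".join(path)
    for child in children:
        yield from flatten(child, path)
```

Calling it once for each top-level node ("Software to directly conduct research", "Software to support research") gives one line per leaf category.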

As of version 0.0.13, each taxonomy item has an assigned color, which keeps interface generation consistent. You can import the available colors from rse.utils.colors.browser_palette; they include those with "medium" or "dark" in the name. The following color has not been used yet. Consult the list for others:

mediumvioletred

Generate

After you install rseng, the rseng executable should be on your path. You can generate output files for the taxonomy or criteria into a folder path that doesn't exist yet. For example, to generate the markdown files for the static documentation of the taxonomy and criteria, we run:

Markdown Jekyll Pages

# rseng generate <type>   <path>          <version>
$ rseng generate taxonomy docs/_taxonomy
docs/_taxonomy/RSE-taxonomy-domain-hardware.md
docs/_taxonomy/RSE-taxonomy-optimized.md
docs/_taxonomy/RSE-taxonomy-analysis.md
docs/_taxonomy/RSE-taxonomy-numerical libraries.md
docs/_taxonomy/RSE-taxonomy-data-collection.md
docs/_taxonomy/RSE-taxonomy-visualization.md
docs/_taxonomy/RSE-taxonomy-workflow-managers.md
docs/_taxonomy/RSE-taxonomy-ide-research.md
docs/_taxonomy/RSE-taxonomy-provenance-metadata-tools.md
docs/_taxonomy/RSE-taxonomy-databases.md
docs/_taxonomy/RSE-taxonomy-application-programming-interfaces.md
docs/_taxonomy/RSE-taxonomy-frameworks.md
docs/_taxonomy/RSE-taxonomy-operating-systems.md
docs/_taxonomy/RSE-taxonomy-personal-scheduling-task-management.md
docs/_taxonomy/RSE-taxonomy-version-control.md
docs/_taxonomy/RSE-taxonomy-text-editors-ides.md
docs/_taxonomy/RSE-taxonomy-communication-tools.md

The default version generated for each is "latest", but you can pass another version as the last argument to change that. Here is generation of the criteria, using latest:

# rseng generate <type>   <path>          <version>
$ rseng generate criteria docs/_criteria
docs/_criteria/RSE-research-intention.md
docs/_criteria/RSE-domain-intention.md
docs/_criteria/RSE-question-intention.md
docs/_criteria/RSE-citation.md
docs/_criteria/RSE-usage.md
docs/_criteria/RSE-absence.md

Intended for Visualization (json)

You can also generate a nested (non-flat) version of the taxonomy: a JSON file that plugs easily into d3 hierarchy plots.

# rseng generate taxonomy-json <filename>
$ rseng generate taxonomy-json taxonomy.json
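By default, d3's `d3.hierarchy()` reads nested objects that store their children under a `children` key. The sketch below shows the general shape of such a file by rebuilding a tree from the flattened ` >> ` paths. The fields rseng actually writes may differ, and the root name `"taxonomy"` is a placeholder. Leaves keep an empty `children` list, which d3 treats as having no children.

```python
import json
from typing import Dict, Iterable


def paths_to_tree(paths: Iterable[str], root_name: str = "taxonomy") -> Dict:
    """Nest ' >> '-separated paths into {"name": ..., "children": [...]} nodes."""
    root = {"name": root_name, "children": []}
    for path in paths:
        node = root
        for part in path.split(" >> "):
            # Reuse an existing child with this name, or create it.
            match = next((c for c in node["children"] if c["name"] == part), None)
            if match is None:
                match = {"name": part, "children": []}
                node["children"].append(match)
            node = match
    return root


if __name__ == "__main__":
    example = [
        "Software to support research >> Explicitly for research >> Workflow managers",
        "Software to support research >> Incidentally used for research >> Version control",
    ]
    print(json.dumps(paths_to_tree(example), indent=2))
```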

GitHub Issue Templates

If you want GitHub issue templates that work with a GitHub workflow (both living in your software repository) to annotate items via GitHub, you can generate them with up-to-date criteria or taxonomy items. For criteria:

$ rseng generate criteria-annotation-template

And the template will be generated (with default filename) in the present working directory:

---
name: Annotate Criteria
about: Select this template to annotate criteria for a software repository
title: "[CRITERIA]"
labels: ''
assignees: ''
---

## Repository

<!-- write the name of the repository here-->

## Criteria

<!-- check boxes for criteria to indicate "yes" -->


 - [ ] criteria-RSE-research-intention
 - [ ] criteria-RSE-domain-intention
 - [ ] criteria-RSE-question-intention
 - [ ] criteria-RSE-citation
 - [ ] criteria-RSE-usage
 - [ ] criteria-RSE-absence
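The generated template is plain Markdown with YAML front matter and one checkbox per criterion id. The sketch below renders the same layout from a list of ids, for example `[c.uid for c in cset]`. It uses only the standard library and is an illustration, not rseng's own template code.

```python
from typing import Iterable


def criteria_template(uids: Iterable[str]) -> str:
    """Render a GitHub issue template with one checkbox per criterion id."""
    header = "\n".join([
        "---",
        "name: Annotate Criteria",
        "about: Select this template to annotate criteria for a software repository",
        'title: "[CRITERIA]"',
        "labels: ''",
        "assignees: ''",
        "---",
        "",
        "## Repository",
        "",
        "<!-- write the name of the repository here-->",
        "",
        "## Criteria",
        "",
        '<!-- check boxes for criteria to indicate "yes" -->',
        "",
    ])
    # Each box is labeled with the criterion id prefixed by "criteria-".
    boxes = "\n".join(f" - [ ] criteria-{uid}" for uid in uids)
    return f"{header}\n{boxes}\n"
```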

You can do the same for a GitHub issues taxonomy annotation template:

$ rseng generate taxonomy-annotation-template

---
name: Annotate Taxonomy
about: Select this template to annotate software with taxonomy categories
title: "[TAXONOMY]"
labels: ''
assignees: ''
---

## Repository

<!-- write the name of the repository here-->

## Taxonomy

<!-- check one or more boxes for categories to indicate "yes" -->


 - [ ] RSE-taxonomy-domain-hardware
Software to directly conduct research >> Domain-specific software >> Domain-specific hardware

 - [ ] RSE-taxonomy-optimized
Software to directly conduct research >> Domain-specific software >> Domain-specific optimized software

 - [ ] RSE-taxonomy-analysis
Software to directly conduct research >> Domain-specific software >> Domain-specific analysis software

 - [ ] RSE-taxonomy-numerical libraries
Software to directly conduct research >> General software >> Numerical libraries

 - [ ] RSE-taxonomy-data-collection
Software to directly conduct research >> General software >> Data collection

 - [ ] RSE-taxonomy-visualization
Software to directly conduct research >> General software >> Visualization

 - [ ] RSE-taxonomy-workflow-managers
Software to support research >> Explicitly for research >> Workflow managers

 - [ ] RSE-taxonomy-ide-research
Software to support research >> Explicitly for research >> Interactive development environments for research

 - [ ] RSE-taxonomy-provenance-metadata-tools
Software to support research >> Explicitly for research >> Provenance and metadata collection tools

 - [ ] RSE-taxonomy-databases
Software to support research >> Used for research but not explicitly for it >> Databases

 - [ ] RSE-taxonomy-application-programming-interfaces
Software to support research >> Used for research but not explicitly for it >> Application Programming Interfaces

 - [ ] RSE-taxonomy-frameworks
Software to support research >> Used for research but not explicitly for it >> Frameworks

 - [ ] RSE-taxonomy-operating-systems
Software to support research >> Incidentally used for research >> Operating systems

 - [ ] RSE-taxonomy-personal-scheduling-task-management
Software to support research >> Incidentally used for research >> Personal scheduling and task management

 - [ ] RSE-taxonomy-version-control
Software to support research >> Incidentally used for research >> Version control

 - [ ] RSE-taxonomy-text-editors-ides
Software to support research >> Incidentally used for research >> Text editors and integrated development environments

 - [ ] RSE-taxonomy-communication-tools
Software to support research >> Incidentally used for research >> Communication tools or platforms

Examples in the wild include this one for criteria and this one for the taxonomy. Note that along with adding the templates, you should create labels, one each for taxonomy and criteria. A workflow to automatically update criteria/taxonomy items is being written and will be added soon.

License

Free software: MPL 2.0 License

rse's People

untzag nickledave

rse's Issues

DOC: Revise / add screenshot of how to share google sheets for `rse import`

The rse import command docs are very thorough (if only a user reads them 😇), but the explanation of how to share the sheet might be a bit confusing, since there are two places that say "share": (1) a big, vibrantly colored button in the upper right, which is not what you are referring to, and (2) one tiny word under the File dropdown on the menu bar, which is the one you are referring to.

I thought it was (1) and not (2) and spent a while staring at the box for (2) trying to figure out what I was missing.

One way to make this clearer would be to change this

Share -> Publish to Web -> Form Responses 1 (or the sheet name in first dropdown) -> Comma-separated value (csv) (second dropdown)

to this

File -> Share -> Publish to Web -> Form Responses 1 (or the sheet name in first dropdown) -> Comma-separated value (csv) (second dropdown)

Another way would be to add a screenshot with arrows saying "not this" and "this" (with the dropdown showing)

Extended rse software for custom registry and analysis

We discussed today being able to make a more customized (and simpler) UI for a software database, and I think this should be supported with rse export jekyll-web /docs. Specifically we also want:

- Create example workflows to:
  - Update a database nightly (the nested "data" should update for the repo)
- Ability to manually add a list of tags to an annotation. E.g., currently we have "topics" in a repository's automated metadata, and we should be able to have a list of human-curated tags, with the option to enforce choosing from a particular set.
- Custom UI with a table for rsepedia: instead of the default UI we have now, this should be customizable in a Jekyll template
- As an alternative, a scraper for a .csv file (from a URL) with a set of required fields, where the other columns are custom data; this should provide a template form showing how to do it.
- We want to be able to define a custom domain (bioacoustics software) schema that can plug into the analyzer
- A separate cool idea: can we use Google Scholar to get citation info? A separate UI. https://serpapi.com/google-scholar-api
- Add the ability for rse to parse a CITATION.cff, then ask who cited it (via the above)
- Need an example to explicitly update a record
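The CSV-scraper item above could start from a standard-library sketch like the one below. The required columns `title` and `url` are an assumption, borrowed from the Title/Url columns mentioned in a later issue. Fetching the file from a URL is left out, so the function takes CSV text directly. The function reports every missing column by name on purpose, because an unnamed missing field is hard to debug. `parse_software_csv` is hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def parse_software_csv(text: str, required: Iterable[str] = ("title", "url")) -> List[Dict[str, str]]:
    """Parse CSV text into records, failing loudly if a required column is absent."""
    required = [field.lower() for field in required]
    reader = csv.DictReader(io.StringIO(text))
    # Compare column names case-insensitively, so "Title" matches "title".
    columns = [name.strip().lower() for name in (reader.fieldnames or [])]
    missing = [field for field in required if field not in columns]
    if missing:
        raise ValueError(f"CSV is missing required field(s): {', '.join(missing)}")

    records = []
    for row in reader:
        # Extra unnamed cells are collected under the key None; skip them.
        record = {
            key.strip().lower(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # Keep only rows where every required field has a value.
        if all(record.get(field) for field in required):
            records.append(record)
    return records
```

Any column beyond the required ones is kept as custom data in each record.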

When metrics done, add to base annotation interface

And also add to the repository level annotation view, for the user to see how many have been answered.

Color assignment and dates

We want to minimize changes to the interface, so we should:

1. derive the date from the repo instead of when the post was generated
1. have pre-assigned colors so the files aren't updated with new color generation
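For the second point, one option (an alternative to storing a pre-assigned color in each file) is to derive the color deterministically from the item's id. Regenerating the files then never changes it. The sketch below hashes the id into the "medium"/"dark" subset of a palette, following the README's guidance on which colors to use. `assign_color` is hypothetical. Two ids can map to the same color, so explicitly assigned colors remain the safer choice when uniqueness matters.

```python
import hashlib
from typing import List


def assign_color(uid: str, palette: List[str]) -> str:
    """Pick a color for uid deterministically, independent of palette order."""
    # Keep only the "medium" and "dark" colors, sorted for a stable order.
    candidates = sorted(c for c in palette if "medium" in c or "dark" in c)
    if not candidates:
        raise ValueError("palette has no medium or dark colors")
    # Python's built-in hash() is salted per process, so use a stable digest.
    digest = hashlib.sha256(uid.encode("utf-8")).hexdigest()
    return candidates[int(digest, 16) % len(candidates)]
```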

"'Logger' object has no attribute 'exit'" from `rse import`

Research Software Encyclopedia version: 0.0.42
Python version: 3.10.5
Operating System: Ubuntu (Pop!-OS 22.04 LTS)

Description

I am trying to run rse import on a copy of the original spreadsheet that the example in the docs is copied from.
To do so, in the copy I changed the first two column titles to "Title" and "Url" (to match the example), and then I published to get a link as explained in the docs for the import command.
I then ran the command with the published url (below) for the copy of the spreadsheet.
I hit two errors: the first, because paste inserts backslashes that break the link; the second, because rse is not happy with one of the column titles.
The first error only matters because it made it more obvious that some issue with the Logger is preventing the full traceback of the second error from being logged, i.e., I can't see which required field is missing.

What I Did

error 1 (because of extra backslashes inserted when pasting)

$ rse import --type google-sheet "https://docs.google.com/spreadsheets/d/e/2PACX-1vQkPsu14BG0bErrY0thXymfS55be0spEVX_WpWm2Yy3We8swMO0sIb3iD4Sg-i1lWnxSsiiN5JmWAD-/pub\?gid\=0\&single\=true\&output\=csv"
Traceback (most recent call last):
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/bin/rse", line 8, in <module>
    sys.exit(main())
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/client/__init__.py", line 519, in main
    main(args=args, extra=extra)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/client/imp.py", line 24, in main
    results = importer.scrape(extra)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/main/scrapers/googlesheet.py", line 51, in scrape
    bot.exit(f"Issue parsing {url}: {response.text}")
AttributeError: 'Logger' object has no attribute 'exit'

error 2 (because of some {field} I'm missing in the spreadsheet)

$ rse import --type google-sheet "https://docs.google.com/spreadsheets/d/e/2PACX-1vQkPsu14BG0bErrY0thXymfS55be0spEVX_WpWm2Yy3We8swMO0sIb3iD4Sg-i1lWnxSsiiN5JmWAD-/pub?gid=0&single=true&output=csv" 
Traceback (most recent call last):
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/bin/rse", line 8, in <module>
    sys.exit(main())
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/client/__init__.py", line 519, in main
    main(args=args, extra=extra)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/client/imp.py", line 24, in main
    results = importer.scrape(extra)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/main/scrapers/googlesheet.py", line 63, in scrape
    bot.exit(f"Sheet {url} is missing required field {required}.")
AttributeError: 'Logger' object has no attribute 'exit'
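Both tracebacks end at the same place: the scraper calls `bot.exit(...)`, but `bot` appears to be a plain `logging.Logger`, which has no `exit` method. The resulting `AttributeError` hides the real message, such as which required field is missing. One possible fix is to log the message and then exit explicitly. The sketch below uses a hypothetical helper, not rse's actual code:

```python
import logging
import sys

logger = logging.getLogger(__name__)


def log_and_exit(log: logging.Logger, message: str, status: int = 1) -> None:
    """Log an error message, then terminate with a non-zero exit status."""
    log.error(message)
    sys.exit(status)
```

A call site like `bot.exit(f"Sheet {url} is missing required field {required}.")` would then become `log_and_exit(logger, f"Sheet {url} is missing required field {required}.")`, so the message is printed before the process exits.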

Additional scrapers to add

ascl https://ascl.net/code/all/limit/250
biogrids https://biogrids.org/software/
https://geodynamics.org/software?
and the rest (non data, but software) from https://scicodes.net/participants/

Create simple interface to interact with / visualize repos

The interface will be the basis for assessing criteria and the taxonomy.

rse shell fails without ipython

Research Software Encyclopedia version: 0.0.30
Python version: 3.9.0
Operating System: Debian 10

Description

rse shell fails when ipython is not installed.
No module IPython.
In my opinion this should fall back to another shell if IPython isn't found.
IPython is not (and should not be, in my opinion) a dependency of rse.

What I Did

$ rse shell
INFO:rse.main:Database: filesystem
Traceback (most recent call last):
  File "/home/blaise/miniconda3/envs/rse/bin/rse", line 8, in <module>
    sys.exit(main())
  File "/home/blaise/miniconda3/envs/rse/lib/python3.9/site-packages/rse/client/__init__.py", line 477, in main
    main(args=args, extra=extra)
  File "/home/blaise/miniconda3/envs/rse/lib/python3.9/site-packages/rse/client/shell.py", line 23, in main
    return lookup[shell](args)
  File "/home/blaise/miniconda3/envs/rse/lib/python3.9/site-packages/rse/client/shell.py", line 35, in ipython
    from IPython import embed
ModuleNotFoundError: No module named 'IPython'
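Below is a minimal sketch of the proposed fallback: use IPython when it is installed, and the standard library's `code.interact` otherwise. `choose_shell` is a hypothetical helper, not part of rse. `IPython.start_ipython` accepts a `user_ns` dict to pre-populate the session namespace.

```python
import code
from typing import Callable, Dict, Tuple


def choose_shell() -> Tuple[str, Callable[[Dict], None]]:
    """Prefer IPython when it is installed, otherwise use the stdlib REPL."""
    try:
        from IPython import start_ipython
    except ImportError:  # also covers ModuleNotFoundError
        return "python", lambda namespace: code.interact(local=namespace)
    return "ipython", lambda namespace: start_ipython(argv=[], user_ns=namespace)
```

The shell command could then call `name, launch = choose_shell()` and `launch({"client": client})`, where `client` stands in for whatever objects the shell should expose.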

Update Docker deploy

Deploy needs to happen to ghcr https://github.com/rseng/rse/actions/runs/6602857428/job/17935377256

Add client and taxonomy

Stage 1: Add to client #10
Stage 2: Add annotation functions, and representation in database (larger PR)
Stage 3: Add interface to drive annotation

Given custom tags or taxonomy...

we can add a custom circle plot to show some kind of relationship in the jekyll-web template. https://codepen.io/samsonite123/pen/BxoxYd.

static interface: can we simplify export?

Right now we export an entire page to annotate a software repo - can we simplify this?

Summary and analysis metrics

Given a research software encyclopedia, I should be able to run a function to summarize or derive metrics, including the following:

- summary, which shows raw counts / categories for each repository
- analyze, which can take a set of criteria, an optional threshold, an optional list of repos, and an optional subset of taxonomy items, and return a yes/no response about whether each repository is research software.

This will be the main driving functionality for the original idea.
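Below is a sketch of what `summary` and `analyze` might compute. The data layout is a hypothetical one: a dict mapping repository uid to `{criterion uid: "yes" or "no"}`. The threshold semantics are also an assumption: a repository counts as research software when the fraction of "yes" answers on the chosen criteria meets the threshold. The optional taxonomy subset is left out for brevity.

```python
from typing import Dict, Iterable, Optional

# Hypothetical layout: repo uid -> {criterion uid -> "yes" / "no"}
Annotations = Dict[str, Dict[str, str]]


def summarize(annotations: Annotations) -> Dict[str, Dict[str, int]]:
    """Count "yes" and "no" answers per repository."""
    counts = {}
    for repo, answers in annotations.items():
        values = list(answers.values())
        counts[repo] = {"yes": values.count("yes"), "no": values.count("no")}
    return counts


def analyze(
    annotations: Annotations,
    criteria: Iterable[str],
    threshold: float = 0.5,
    repos: Optional[Iterable[str]] = None,
) -> Dict[str, bool]:
    """True for repos whose fraction of "yes" answers meets the threshold."""
    criteria = list(criteria)
    selected = annotations if repos is None else {r: annotations[r] for r in repos}
    results = {}
    for repo, answers in selected.items():
        yes = sum(1 for c in criteria if answers.get(c) == "yes")
        # With no criteria there is nothing to go on, so answer False.
        results[repo] = bool(criteria) and yes / len(criteria) >= threshold
    return results
```

A missing answer is treated like "no" here. Whether unanswered criteria should instead be excluded from the denominator is a design choice left open.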

User should be able to skip annotation

Bug with parsing github name with .

INFO:rse.main.scrapers.joss:Found repository: https://github.com/swiftsim/swiftsimio
INFO:rse.main.scrapers.joss:Found repository: https://github.com/adbar/htmldate
INFO:rse.main.scrapers.joss:Found repository: https://c4science.ch/source/tamaas/
INFO:rse.main.scrapers.joss:Found repository: https://github.com/ur-whitelab/hoomd-tf/
INFO:rse.main.scrapers.joss:Found repository: https://github.com/logological/gpp.git
INFO:rse.main.scrapers.joss:Found repository: https://github.com/zdelrosario/py_grama
INFO:rse.main.scrapers.joss:Found repository: https://github.com/keurfonluu/toughio
INFO:rse.main.scrapers.joss:Found repository: https://github.com/rhenanbartels/hrv
INFO:rse.main.scrapers.joss:Found repository: https://github.com/csdms/bmi
INFO:rse.main.scrapers.joss:Found repository: https://github.com/donaldRwilliams/BGGM
INFO:rse.main:Database: filesystem
INFO:rse.main.database.filesystem:github/swiftsim/swiftsimio was added to the the database.
INFO:rse.main:github/swiftsim/swiftsimio has been updated.
INFO:rse.main.database.filesystem:github/adbar/htmldate was added to the the database.
INFO:rse.main:github/adbar/htmldate has been updated.
Found 10 results
Traceback (most recent call last):
  File "/usr/local/bin/rse", line 11, in <module>
    load_entry_point('rse==0.0.27', 'console_scripts', 'rse')()
  File "/usr/local/lib/python3.6/dist-packages/rse/client/__init__.py", line 457, in main
    main(args=args, extra=extra)
  File "/usr/local/lib/python3.6/dist-packages/rse/client/scrape.py", line 31, in main
    scraper.create(database=args.database, config_file=args.config_file)
  File "/usr/local/lib/python3.6/dist-packages/rse/main/scrapers/joss.py", line 107, in create
    repo = get_parser(uid)
  File "/usr/local/lib/python3.6/dist-packages/rse/main/parsers/__init__.py", line 39, in get_parser
    raise NotImplementedError(f"There is no matching parser for {uri}")

https://github.com/rseng/software/runs/936400731?check_suite_focus=true

needs to be re-run after fix!
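The uids in the log above follow a `github/<owner>/<repo>` pattern. The parser sketched below is one hedged option:

- it keeps dots inside repository names;
- it drops only a trailing `.git`;
- it ignores extra path segments such as `/tree/master/...`;
- it returns `None` for non-GitHub hosts like c4science.ch, so the caller can skip them instead of raising.

`github_uid` is hypothetical and is not rse's `get_parser`.

```python
from typing import Optional
from urllib.parse import urlparse


def github_uid(url: str) -> Optional[str]:
    """Return 'github/<owner>/<repo>' for a GitHub URL, or None otherwise."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    # Owner and repo are always the first two path segments.
    owner, repo = parts[0], parts[1]
    # Only strip a trailing ".git"; dots elsewhere are valid in repo names.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"github/{owner}/{repo}"
```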

Look for GitHub url in Zenodo identifiers

e.g., see https://zenodo.org/record/2667656/export/json
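Zenodo's JSON export lists links to related resources under `metadata.related_identifiers`. The structure assumed here is a list of dicts, each with an `identifier` key, and it should be checked against the linked record. The sketch below scans those entries for a GitHub URL. It takes the already-downloaded JSON as a dict, so it does no network access. `find_github_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def find_github_url(record: dict) -> Optional[str]:
    """Return the first related identifier that points at github.com, if any."""
    related = record.get("metadata", {}).get("related_identifiers", [])
    for item in related:
        identifier = item.get("identifier", "")
        if urlparse(identifier).netloc.lower() in ("github.com", "www.github.com"):
            return identifier
    return None
```

GitHub-integrated Zenodo releases often point at a `/tree/<tag>` URL, so the result may still need reducing to owner and repository before lookup.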

add static search view to interface

`ERROR:rse.utils.urls:Cannot find endpoint` when running `rse import --type google-sheet`

Research Software Encyclopedia version: installed from branch in #75
Python version: 3.10
Operating System: Ubuntu

Description

I now have my copy of the Google sheet from @rhine3 set up so that I can start import running like so:

rse import --type google-sheet "https://docs.google.com/spreadsheets/d/e/2PACX-1vQkPsu14BG0bErrY0thXymfS55be0spEVX_WpWm2Yy3We8swMO0sIb3iD4Sg-i1lWnxSsiiN5JmWAD-/pub?gid=0&single=true&output=csv"

That is, I no longer get any errors about missing fields (i.e., incorrect column names), now that I removed the message in the hidden row 1 and renamed the first three columns.

full sheet is here:
https://docs.google.com/spreadsheets/d/1Ba1MY4o5Sm1f08IekJcbxAtSjkDN71Z1RZ42kzrofJ0/edit?usp=sharing

However, I now get an error that I'm not sure how to fix; the traceback is below.

Two notes:

- Even with the "Republish automatically" checkbox checked (as shown), Sheets doesn't seem to update; I need to toggle "publish" a couple of times and reload the page before it's correct. Just waiting doesn't seem to fix the problem.
- I also removed extra rows at the bottom with some calculations; that did not fix this error.

What I Did

$ rse import --type google-sheet "https://docs.google.com/spreadsheets/d/e/2PACX-1vQkPsu14BG0bErrY0thXymfS55be0spEVX_WpWm2Yy3We8swMO0sIb3iD4Sg-i1lWnxSsiiN5JmWAD-/pub?gid=0&single=true&output=csv"
INFO:rse.main.import.google-sheet:Found software record: https://github.com/patriceguyot/Acoustic_Indices
INFO:rse.main.import.google-sheet:Found software record: https://www.adobe.com/products/audition.html
INFO:rse.main.import.google-sheet:Found software record: https://www.titley-scientific.com/us/anabat-insight.html
INFO:rse.main.import.google-sheet:Found software record: https://datadryad.org/stash/dataset/doi:10.5061/dryad.221mq23
INFO:rse.main.import.google-sheet:Found software record: https://github.com/ChristianBergler/ANIMAL-SPOT
INFO:rse.main.import.google-sheet:Found software record: https://arbimon.rfcx.org/
INFO:rse.main.import.google-sheet:Found software record: https://soundanalysis.wp.st-andrews.ac.uk/
INFO:rse.main.import.google-sheet:Found software record: https://www.audacityteam.org/download/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/nwolek/audiomoth-scripts
INFO:rse.main.import.google-sheet:Found software record: https://github.com/sarabsethi/audioset_soundscape_feats_sethi2019/tree/master/calc_audioset_feats
INFO:rse.main.import.google-sheet:Found software record: https://autoencoded-vocal-analysis.readthedocs.io/en/latest/index.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/timsainb/AVGN
INFO:rse.main.import.google-sheet:Found software record: http://www.avianz.net/index.php
INFO:rse.main.import.google-sheet:Found software record: http://www.avisoft.com/sound-analysis/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/EricArcher/banter
INFO:rse.main.import.google-sheet:Found software record: https://bitbucket.org/chrisscott/batclassify/src
INFO:rse.main.import.google-sheet:Found software record: https://github.com/macaodha/batdetect
INFO:rse.main.import.google-sheet:Found software record: https://www.batlogger.com/en/products/batexplorer/
INFO:rse.main.import.google-sheet:Found software record: https://www.wsl.ch/en/services-and-products/software-websites-and-apps/batscope-4.html
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/bioacoustics/index.html
INFO:rse.main.import.google-sheet:Found software record: https://birdnet.cornell.edu/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/BirdVox/birdvoxclassify
INFO:rse.main.import.google-sheet:Found software record: https://github.com/BirdVox/birdvoxdetect
INFO:rse.main.import.google-sheet:Found software record: https://github.com/OpenWild/caracal
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/crowsetta
INFO:rse.main.import.google-sheet:Found software record: https://github.com/MarineBioAcousticsRC/DetEdit
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DrCoffey/DeepSqueak
INFO:rse.main.import.google-sheet:Found software record: https://github.com/nilomr/fieldtools
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DenaJGibbon/gibbonR-package
INFO:rse.main.import.google-sheet:Found software record: http://www.oldbird.org/glassofire.htm
INFO:rse.main.import.google-sheet:Found software record: https://www.goldwave.com/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/Cdevenish/hardRain
INFO:rse.main.import.google-sheet:Found software record: https://sites.google.com/view/alcore-suzuki/home/harkbird
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/hybrid-vocal-classifier
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DanWoodrich/INSTINCT
INFO:rse.main.import.google-sheet:Found software record: http://bioacoustics.us/ishmael.html
INFO:rse.main.import.google-sheet:Found software record: https://www.wildlifeacoustics.com/products/kaleidoscope-pro
INFO:rse.main.import.google-sheet:Found software record: https://meridian.cs.dal.ca/2015/04/12/ketos/
INFO:rse.main.import.google-sheet:Found software record: https://koe.io.ac.nz/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/shyamblast/Koogu/tree/v0.6.5
INFO:rse.main.import.google-sheet:Found software record: https://librosa.org/librosa/
INFO:rse.main.import.google-sheet:Found software record: https://rflachlan.github.io/Luscinia/
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/monitoR/index.html
INFO:rse.main.import.google-sheet:Found software record: https://marce10.github.io/ohun/index.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/kitzeslab/opensoundscape
INFO:rse.main.import.google-sheet:Found software record: https://www.pamguard.org/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/TaikiSan21/PAMr
INFO:rse.main.import.google-sheet:Found software record: https://github.com/YannickJadoul/Parselmouth
INFO:rse.main.import.google-sheet:Found software record: https://www.fon.hum.uva.nl/praat/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/shivChitinous/prinia-project
INFO:rse.main.import.google-sheet:Found software record: https://ravensoundsoftware.com/software/raven-lite/
INFO:rse.main.import.google-sheet:Found software record: https://ravensoundsoftware.com/software/raven-pro
INFO:rse.main.import.google-sheet:Found software record: https://www.reaper.fm/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/scikit-maad/scikit-maad
INFO:rse.main.import.google-sheet:Found software record: https://docs.scipy.org/doc/scipy/reference/signal.html
INFO:rse.main.import.google-sheet:Found software record: http://dx.doi.org/10.6084/m9.figshare.3792780
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/seewave/index.html
INFO:rse.main.import.google-sheet:Found software record: https://www.sonicvisualiser.org/
INFO:rse.main.import.google-sheet:Found software record: https://sonobat.com/
INFO:rse.main.import.google-sheet:Found software record: https://doi.org/10.1080/09524622.2013.827588
INFO:rse.main.import.google-sheet:Found software record: https://soundata.readthedocs.io/en/latest/
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/soundecology/vignettes/intro.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/macster110/aipam
INFO:rse.main.import.google-sheet:Found software record: https://github.com/rhine3/specky
INFO:rse.main.import.google-sheet:Found software record: https://github.com/YvesBas/Tadarida-L

https://github.com/YvesBas/Tadarida-D

https://github.com/YvesBas/Tadarida-C
INFO:rse.main.import.google-sheet:Found software record: https://www.cetus.ucsd.edu/technologies_triton.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/yardencsGitHub/tweetynet
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/vak
INFO:rse.main.import.google-sheet:Found software record: https://github.com/HaroldMills/Vesper
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/warbleR/index.html
Found 70 results
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/master/calc_audioset_feats.
Traceback (most recent call last):
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/bin/rse", line 33, in <module>
    sys.exit(load_entry_point('rse', 'console_scripts', 'rse')())
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/rse/rse/client/__init__.py", line 520, in main
    main(args=args, extra=extra)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/rse/rse/client/imp.py", line 28, in main
    importer.create(
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/rse/rse/main/scrapers/googlesheet.py", line 99, in create
    result = update_nonempty(result, data)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/rse/rse/utils/strings.py", line 13, in update_nonempty
    for key, value in source.items():
AttributeError: 'NoneType' object has no attribute 'items'

I can't see anything specifically different about the link that causes the crash:

https://github.com/sarabsethi/audioset_soundscape_feats_sethi2019/tree/master/calc_audioset_feats

I notice that when I click on the link I get a redirect from GitHub, but this happens for all the links in the rows above it too.

Add a generic CsvImporter?

This is a discussion / feature request, not a bug.

Related to discussion here:
rhine3/bioacoustics-software#3 (comment)

I wonder if it would make sense to add an importer/scraper that can take a .csv directly? This way the data could be munged and cleaned, e.g. by a script, before handing it off to rse.

It seems like this might be possible by decoupling some of the logic in the Google Sheets scraper?

Maybe make a mixin that handles the Sheets-specific access, and a more generic csv parser class that could then be combined with another mixin that reads a csv from a local file.

So you'd have something like:

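import csv
from io import StringIO

# ScraperBase is rse's existing scraper base class


class SpreadSheetImporter(ScraperBase):
    def parse(self, f, args):
        # csv parsing logic that's in GoogleSheetImporter now
        # starting with https://github.com/rseng/rse/blob/299ee56d33e4b9868deef4d624cca0608a786cbf/rse/main/scrapers/googlesheet.py#L58
        reader = csv.reader(f, delimiter=",")
        ...
        return self.results


class GoogleSheetMixin:
    def fetch(self, args):
        ...  # request the published sheet as csv, giving `response`
        # wrap the downloaded text in a file-like object for csv.reader
        return StringIO(response.text)


class GoogleSheetImporter(GoogleSheetMixin, SpreadSheetImporter):
    def scrape(self, args):
        # fetch() comes from the mixin, parse() from the shared base class
        f = self.fetch(args)
        return self.parse(f, args)


class CsvMixin:
    def fetch(self, args):
        # here args is assumed to be the path to a local csv file;
        # csv.reader reads an open file just like the StringIO above
        return open(args, newline="")


class CsvImporter(CsvMixin, SpreadSheetImporter):
    def scrape(self, args):
        with self.fetch(args) as f:
            return self.parse(f, args)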

I know this really starts to stretch the definition of "scrape", but I hope you see what I'm suggesting here in terms of decoupling the functionality.

Create an export function to export repos list

add delay for interaction with GitHub api

Based on this run https://github.com/rseng/software/runs/814880232?check_suite_focus=true, we need to add some delay so we don't use up the GitHub quota. Also note the unformatted format string at the end.
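A minimal sketch of the kind of delay being asked for, assuming the `X-RateLimit-Remaining` and `X-RateLimit-Reset` (UTC epoch seconds) headers that the GitHub REST API returns on each response. The function name and the one-second default pause are hypothetical choices for illustration, not part of rse.

```python
import time


def seconds_to_wait(headers, now=None, min_delay=1.0):
    """Return how long to sleep before the next GitHub API call.

    headers: response headers from the previous call (dict-like).
    now: current UNIX time in seconds (defaults to time.time()).
    min_delay: a small fixed pause between calls (hypothetical default).
    """
    now = time.time() if now is None else now
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is not None and reset is not None and int(remaining) == 0:
        # Quota exhausted: wait until the reset time GitHub reports, plus the pause.
        return max(float(reset) - now, 0.0) + min_delay
    # Quota left (or headers missing): just pause briefly between calls.
    return min_delay
```

Between requests, something like `time.sleep(seconds_to_wait(response.headers))` would then keep a long import from burning through the unauthenticated quota in one burst.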

static web export should export static api, e.g., data.json in root

TypeError during `rse export` but still got "Export is complete!"

Research Software Encyclopedia version: 0.0.42
Python version: 3.10.5
Operating System: Ubuntu (Pop!-OS 22.04 LTS)

Description

I was trying to export a site to understand the command, as discussed in #71.

rse export --type static-web docs/

The contents of my directory are from running

rse init .
rse import --type google-sheet "https://docs.google.com/spreadsheets/d/e/2PACX-1vTsPmEWUg8Tr1ZoYTcQ0kTdsCrVskQveSuwfdEHaktHtQG693O4DHQrZotoFd5dXCLAciykAYNf-RSz/pub?gid=0&single=true&output=csv"

as in the import command docs.

Then I ran

rse export --type static-web docs/

and got "Export is complete!", but there was a TypeError right before that (full traceback below).
It's not clear whether I'm doing something wrong.

The site does appear to have been generated:

$ ls docs
api  criteria  data.json  index.html  repository  search  static  taxonomy

From squinting menacingly at the traceback, my guess is that the parser can't handle one of the sources that isn't a GitHub or GitLab link?

What I Did

$ rse export --type static-web docs/ 
Server initialized for gevent.
INFO:engineio.server:Server initialized for gevent.
Starting export for http://127.0.0.1:5000
Research Software Encyclopedia: running on http://127.0.0.1:5000
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:53] "HEAD / HTTP/1.1" 200 118 0.032824
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:53] "GET / HTTP/1.1" 200 4451 0.007007
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:53] "GET /repository/github/nwolek/audiomoth-scripts/annotate-criteria HTTP/1.1" 200 6959 0.134137
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/github/nwolek/audiomoth-scripts/annotate-taxonomy HTTP/1.1" 200 18818 0.100137
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/github/nwolek/audiomoth-scripts HTTP/1.1" 200 1890 0.002798
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/github/ChristianBergler/ANIMAL-SPOT/annotate-criteria HTTP/1.1" 200 7014 0.002462
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/github/ChristianBergler/ANIMAL-SPOT/annotate-taxonomy HTTP/1.1" 200 18830 0.003664
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/github/ChristianBergler/ANIMAL-SPOT HTTP/1.1" 200 1730 0.002515
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/github/patriceguyot/Acoustic_Indices/annotate-criteria HTTP/1.1" 200 6912 0.002571
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/github/patriceguyot/Acoustic_Indices/annotate-taxonomy HTTP/1.1" 200 18833 0.002451
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/github/patriceguyot/Acoustic_Indices HTTP/1.1" 200 1687 0.001672
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/www-adobe-com/products/audition-html/annotate-criteria HTTP/1.1" 200 6868 0.001764
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/www-adobe-com/products/audition-html/annotate-taxonomy HTTP/1.1" 200 18832 0.001930
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/custom/www-adobe-com/products/audition-html HTTP/1.1" 200 865 0.000983
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/soundanalysis-wp-st-andrews-ac-uk/annotate-criteria HTTP/1.1" 200 6888 0.001135
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/soundanalysis-wp-st-andrews-ac-uk/annotate-taxonomy HTTP/1.1" 200 18825 0.001212
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/custom/soundanalysis-wp-st-andrews-ac-uk HTTP/1.1" 200 976 0.000650
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/www-audacityteam-org/download/annotate-criteria HTTP/1.1" 200 6892 0.000749
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/www-audacityteam-org/download/annotate-taxonomy HTTP/1.1" 200 18813 0.000917
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/custom/www-audacityteam-org/download HTTP/1.1" 200 994 0.000580
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/datadryad-org/stash/dataset/doi-10-5061/dryad-221mq23/annotate-criteria HTTP/1.1" 200 7022 0.000706
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/datadryad-org/stash/dataset/doi-10-5061/dryad-221mq23/annotate-taxonomy HTTP/1.1" 200 18883 0.000949
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/custom/datadryad-org/stash/dataset/doi-10-5061/dryad-221mq23 HTTP/1.1" 200 1153 0.000638
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/www-titley-scientific-com/us/anabat-insight-html/annotate-criteria HTTP/1.1" 200 6951 0.000690
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/www-titley-scientific-com/us/anabat-insight-html/annotate-taxonomy HTTP/1.1" 200 18868 0.000914
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/custom/www-titley-scientific-com/us/anabat-insight-html HTTP/1.1" 200 1278 0.000583
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/arbimon-rfcx-org/annotate-criteria HTTP/1.1" 200 6819 0.000666
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /repository/custom/arbimon-rfcx-org/annotate-taxonomy HTTP/1.1" 200 18774 0.000902
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/custom/arbimon-rfcx-org HTTP/1.1" 200 1188 0.000566
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api HTTP/1.1" 200 478 0.000591
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos HTTP/1.1" 200 2187 0.000873
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/parser/github HTTP/1.1" 200 749 0.000682
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/repos/parser/gitlab HTTP/1.1" 200 110 0.000451
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/taxonomy HTTP/1.1" 200 7885 0.000448
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /api/criteria HTTP/1.1" 200 1018 0.000379
ERROR:rse.app.server:Exception on /search [GET]
Traceback (most recent call last):
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/flask_restful/__init__.py", line 271, in error_router
    return original_handler(e)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/app/views/main.py", line 81, in search
    url = RSE_HOST + RSE_URL_PREFIX or flask.request.host_url + RSE_URL_PREFIX
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /search HTTP/1.1" 500 401 0.001437
Issue parsing http://127.0.0.1:5000/search
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /criteria HTTP/1.1" 200 2936 0.005845
INFO:geventwebsocket.handler:127.0.0.1 - - [2022-08-21 17:52:54] "GET /taxonomy HTTP/1.1" 200 4440 0.012862
Generating data export
Export is complete!
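Reading the last frame of that traceback, the crash may be an operator-precedence issue rather than a parsing problem: in `RSE_HOST + RSE_URL_PREFIX or flask.request.host_url + RSE_URL_PREFIX`, `+` binds tighter than `or`, so `RSE_HOST + RSE_URL_PREFIX` is evaluated first and raises `TypeError` when `RSE_HOST` is `None`. The helper below is a hypothetical standalone version of that line, assuming `RSE_HOST` is `None` when no host is configured and `RSE_URL_PREFIX` is a string (which the error message suggests).

```python
def build_search_url(rse_host, url_prefix, request_host_url):
    """Choose the base URL for the search page.

    Applying `or` to the host alone means a missing rse_host falls back
    to the request's host URL instead of being added to a string first.
    """
    return (rse_host or request_host_url) + url_prefix
```

With this shape, the fallback is chosen before any concatenation happens, so a missing host no longer crashes the `/search` page during export.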

bio.tools database

https://bio.tools

GitHub parser should use regular expressions

see #76 (comment)
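A minimal sketch of a regex-based GitHub URL parser, assuming the goal is to pull `owner/repo` out of the URL shapes that show up in the logs in this thread: `/tree/v0.6.5` and `/tree/master/calc_audioset_feats` suffixes should be ignored, and `*.github.io` pages should not be treated as repositories. The pattern and function name are hypothetical, not rse's current implementation.

```python
import re

# github.com/<owner>/<repo> with an optional scheme, "www.", ".git" suffix,
# and any trailing path (e.g. /tree/<ref>/<dir>), query, or fragment.
GITHUB_REPO = re.compile(
    r"^(?:https?://)?(?:www\.)?github\.com/"
    r"(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+?)"
    r"(?:\.git)?(?:[/?#].*)?$",
    re.IGNORECASE,
)


def parse_github_repo(url):
    """Return (owner, repo) for a GitHub repository URL, or None otherwise."""
    match = GITHUB_REPO.match(url.strip())
    if match is None:
        return None
    return match.group("owner"), match.group("repo")
```

Because the repo group is non-greedy and stops at the first `/`, `?`, or `#`, everything after the repository name is discarded before an API URL is built.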

ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/tree/v0.6.5.

Research Software Encyclopedia version: 0.0.44
Python version: 3.10
Operating System: Pop!OS (ubuntish)

Description

I'm back to trying to run rse import on an edited copy of the google-sheet from @rhine3's bioacoustics-software repo.
It now crashes with this:

Found 70 results
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/tree/v0.6.5.

Full traceback below. I think rse now makes it through all the URLs, and the source of this crash is something else? Unless I'm reading the traceback wrong somehow.

What I Did

$ rse import --type google-sheet "https://docs.google.com/spreadsheets/d/e/2PACX-1vQkPsu14BG0bErrY0thXymfS55be0spEVX_WpWm2Yy3We8swMO0sIb3iD4Sg-i1lWnxSsiiN5JmWAD-/pub?gid=0&single=true&output=csv"
INFO:rse.main.import.google-sheet:Found software record: https://github.com/patriceguyot/Acoustic_Indices
INFO:rse.main.import.google-sheet:Found software record: https://www.adobe.com/products/audition.html
INFO:rse.main.import.google-sheet:Found software record: https://www.titley-scientific.com/us/anabat-insight.html
INFO:rse.main.import.google-sheet:Found software record: https://datadryad.org/stash/dataset/doi:10.5061/dryad.221mq23
INFO:rse.main.import.google-sheet:Found software record: https://github.com/ChristianBergler/ANIMAL-SPOT
INFO:rse.main.import.google-sheet:Found software record: https://arbimon.rfcx.org/
INFO:rse.main.import.google-sheet:Found software record: https://soundanalysis.wp.st-andrews.ac.uk/
INFO:rse.main.import.google-sheet:Found software record: https://www.audacityteam.org/download/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/nwolek/audiomoth-scripts
INFO:rse.main.import.google-sheet:Found software record: https://github.com/sarabsethi/audioset_soundscape_feats_sethi2019
INFO:rse.main.import.google-sheet:Found software record: https://autoencoded-vocal-analysis.readthedocs.io/en/latest/index.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/timsainb/AVGN
INFO:rse.main.import.google-sheet:Found software record: http://www.avianz.net/index.php
INFO:rse.main.import.google-sheet:Found software record: http://www.avisoft.com/sound-analysis/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/EricArcher/banter
INFO:rse.main.import.google-sheet:Found software record: https://bitbucket.org/chrisscott/batclassify/src
INFO:rse.main.import.google-sheet:Found software record: https://github.com/macaodha/batdetect
INFO:rse.main.import.google-sheet:Found software record: https://www.batlogger.com/en/products/batexplorer/
INFO:rse.main.import.google-sheet:Found software record: https://www.wsl.ch/en/services-and-products/software-websites-and-apps/batscope-4.html
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/bioacoustics/index.html
INFO:rse.main.import.google-sheet:Found software record: https://birdnet.cornell.edu/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/BirdVox/birdvoxclassify
INFO:rse.main.import.google-sheet:Found software record: https://github.com/BirdVox/birdvoxdetect
INFO:rse.main.import.google-sheet:Found software record: https://github.com/OpenWild/caracal
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/crowsetta
INFO:rse.main.import.google-sheet:Found software record: https://github.com/MarineBioAcousticsRC/DetEdit
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DrCoffey/DeepSqueak
INFO:rse.main.import.google-sheet:Found software record: https://github.com/nilomr/fieldtools
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DenaJGibbon/gibbonR-package
INFO:rse.main.import.google-sheet:Found software record: http://www.oldbird.org/glassofire.htm
INFO:rse.main.import.google-sheet:Found software record: https://www.goldwave.com/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/Cdevenish/hardRain
INFO:rse.main.import.google-sheet:Found software record: https://sites.google.com/view/alcore-suzuki/home/harkbird
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/hybrid-vocal-classifier
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DanWoodrich/INSTINCT
INFO:rse.main.import.google-sheet:Found software record: http://bioacoustics.us/ishmael.html
INFO:rse.main.import.google-sheet:Found software record: https://www.wildlifeacoustics.com/products/kaleidoscope-pro
INFO:rse.main.import.google-sheet:Found software record: https://meridian.cs.dal.ca/2015/04/12/ketos/
INFO:rse.main.import.google-sheet:Found software record: https://koe.io.ac.nz/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/shyamblast/Koogu/tree/v0.6.5
INFO:rse.main.import.google-sheet:Found software record: https://librosa.org/librosa/
INFO:rse.main.import.google-sheet:Found software record: https://rflachlan.github.io/Luscinia/
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/monitoR/index.html
INFO:rse.main.import.google-sheet:Found software record: https://marce10.github.io/ohun/index.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/kitzeslab/opensoundscape
INFO:rse.main.import.google-sheet:Found software record: https://www.pamguard.org/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/TaikiSan21/PAMr
INFO:rse.main.import.google-sheet:Found software record: https://github.com/YannickJadoul/Parselmouth
INFO:rse.main.import.google-sheet:Found software record: https://www.fon.hum.uva.nl/praat/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/shivChitinous/prinia-project
INFO:rse.main.import.google-sheet:Found software record: https://ravensoundsoftware.com/software/raven-lite/
INFO:rse.main.import.google-sheet:Found software record: https://ravensoundsoftware.com/software/raven-pro
INFO:rse.main.import.google-sheet:Found software record: https://www.reaper.fm/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/scikit-maad/scikit-maad
INFO:rse.main.import.google-sheet:Found software record: https://docs.scipy.org/doc/scipy/reference/signal.html
INFO:rse.main.import.google-sheet:Found software record: http://dx.doi.org/10.6084/m9.figshare.3792780
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/seewave/index.html
INFO:rse.main.import.google-sheet:Found software record: https://www.sonicvisualiser.org/
INFO:rse.main.import.google-sheet:Found software record: https://sonobat.com/
INFO:rse.main.import.google-sheet:Found software record: https://doi.org/10.1080/09524622.2013.827588
INFO:rse.main.import.google-sheet:Found software record: https://soundata.readthedocs.io/en/latest/
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/soundecology/vignettes/intro.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/macster110/aipam
INFO:rse.main.import.google-sheet:Found software record: https://github.com/rhine3/specky
INFO:rse.main.import.google-sheet:Found software record: https://github.com/YvesBas/Tadarida-L

https://github.com/YvesBas/Tadarida-D

https://github.com/YvesBas/Tadarida-C
INFO:rse.main.import.google-sheet:Found software record: https://www.cetus.ucsd.edu/technologies_triton.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/yardencsGitHub/tweetynet
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/vak
INFO:rse.main.import.google-sheet:Found software record: https://github.com/HaroldMills/Vesper
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/warbleR/index.html
Found 70 results
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/tree/v0.6.5.
Traceback (most recent call last):
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/bin/rse", line 8, in <module>
    sys.exit(main())
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/client/__init__.py", line 520, in main
    main(args=args, extra=extra)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/client/imp.py", line 28, in main
    importer.create(
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/main/scrapers/googlesheet.py", line 99, in create
    result = update_nonempty(result, data)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/utils/strings.py", line 13, in update_nonempty
    for key, value in source.items():
AttributeError: 'NoneType' object has no attribute 'items'
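The traceback ends in `update_nonempty` iterating over `source.items()` while `source` is `None`, right after the failed lookup for `.../repos/tree/v0.6.5`. One hedged way to keep a single bad URL from aborting the whole import is to treat a missing lookup result as empty before merging. The function below is a hypothetical stand-in, not rse's `update_nonempty`; "non-empty" is assumed here to mean "not `None` and not an empty string".

```python
def merge_nonempty(target, source):
    """Copy non-empty values from source into target, tolerating None.

    Hypothetical helper: a None source (e.g. a failed API lookup) leaves
    target unchanged instead of raising AttributeError.
    """
    if target is None:
        target = {}
    if source is None:
        return target
    for key, value in source.items():
        # Skip values assumed to mean "no data" for this field.
        if value is None or value == "":
            continue
        target[key] = value
    return target
```

This only hides the symptom; the entry would still lack metadata until the URL itself is parsed correctly.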

Look into bitbucket for a resource

E.g., We want to add https://bitbucket.org/geoplexity/easal-dev/src/master/ via https://api.molssi.org/software_detail/61e1cea9c6d2094de2d20395
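A minimal sketch of how a Bitbucket resource could be resolved, assuming it would be looked up through the Bitbucket Cloud REST API 2.0 repository endpoint (`https://api.bitbucket.org/2.0/repositories/{workspace}/{repo_slug}`). The function name and pattern are hypothetical; nothing in this thread shows rse shipping a Bitbucket parser.

```python
import re
from typing import Optional

# bitbucket.org/<workspace>/<repo_slug> plus any trailing path such as /src/master/.
BITBUCKET_REPO = re.compile(
    r"^(?:https?://)?(?:www\.)?bitbucket\.org/"
    r"(?P<workspace>[\w.-]+)/(?P<slug>[\w.-]+?)"
    r"(?:\.git)?(?:[/?#].*)?$",
    re.IGNORECASE,
)


def bitbucket_api_url(url: str) -> Optional[str]:
    """Map a bitbucket.org repository URL to its Bitbucket Cloud 2.0 API endpoint."""
    match = BITBUCKET_REPO.match(url.strip())
    if match is None:
        return None
    return "https://api.bitbucket.org/2.0/repositories/{}/{}".format(
        match.group("workspace"), match.group("slug")
    )
```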

DOC: Add link to rse-web and/or rse-jekyll-web README from tutorials page

as discussed here:
rhine3/bioacoustics-software#3 (comment)

A user might want to know how to serve their own site from an rse-generated database.

You just need to generate the database, and then copy this repository to grab and update from it. https://github.com/rseng/web
Or just copy this entirely :) https://github.com/rseng/rse-jekyll-web/
That repository README has full instructions.
I think perhaps we should provide a link to the repository I shared above here: https://rseng.github.io/rse/tutorials/index.html

Guessing you want a link to one of the READMEs, that will serve as a tutorial?
And that link should be added as another bullet point on tutorials/index?

Add ability to export static flask interface

If I'm a researcher, I should be able to export an explorable interface for my software (the flask interface sans the annotation part).

Hal research software database

https://hal.archives-ouvertes.fr/search/index/?q=%2A&docType_s=SOFTWARE

Add other version control parsers

I think the main one I want to add now is GitLab; I don't see huge usage for others (but can add them on demand).

parsers should only store minimal metadata

E.g., instead of a whole GitHub dump, we should only store the minimum needed, maybe:

url
name
description
homepage
stars/forks
timestamp
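A minimal sketch of trimming a full GitHub dump down to the fields listed above, assuming the input is a repository object from the GitHub REST API (`html_url`, `full_name`, `description`, `homepage`, `stargazers_count`, `forks_count`). Mapping "name" to `full_name` and reading "timestamp" as the fetch time are assumptions; the function name is hypothetical.

```python
from datetime import datetime, timezone

# Minimal field -> assumed key in a GitHub REST API repository object.
GITHUB_FIELDS = {
    "url": "html_url",
    "name": "full_name",
    "description": "description",
    "homepage": "homepage",
    "stars": "stargazers_count",
    "forks": "forks_count",
}


def minimal_metadata(repo_json, now=None):
    """Keep only the minimal fields from a full GitHub repository response."""
    record = {key: repo_json.get(source) for key, source in GITHUB_FIELDS.items()}
    # "timestamp" is taken to mean when the record was fetched (an assumption).
    now = now or datetime.now(timezone.utc)
    record["timestamp"] = now.isoformat()
    return record
```

Everything else in the API response (owner objects, permission flags, dozens of URL templates) is simply dropped.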

Not clear why malformed entries are malformed

Research Software Encyclopedia version: 0.0.46
Python version: 3.10
Operating System: Ubuntu

Description

Hi @vsoch, thank you again for adding the ability to import .csv files directly.
I'm working on a script to clean up a .csv, but I can't figure out why the scraper reports some entries as malformed instead of turning them into a "custom" repo.
I think it has something to do with the format of the URL?

Below is the output from running it.
It seems like anything that's not "https://" will fail.
Is that so? Is it documented somewhere that all URLs need this format, and I'm missing it? I can't find anything about the custom parser.
Sorry if I'm misunderstanding what the main loop in CSVImporter.create is doing.

What I Did

Here's the report of malformed entries.
It seems like anything that's not "https://" fails,
e.g., "www.", "readthedocs.", etc.

(Yes, I haven't set up a GitHub token yet.)

... all the INFO logs here of "Found software record" ...
Found 70 results
WARNING:rse.main.import.csv:Skipping malformed entry www.adobe.com/products/audition.html
WARNING:rse.main.import.csv:Skipping malformed entry www.titley-scientific.com/us/anabat-insight.html
WARNING:rse.main.import.csv:Skipping malformed entry datadryad.org/stash/dataset/doi:10.5061/dryad.221mq23
WARNING:rse.main.import.csv:Skipping malformed entry arbimon.rfcx.org/
WARNING:rse.main.import.csv:Skipping malformed entry soundanalysis.wp.st-andrews.ac.uk/
WARNING:rse.main.import.csv:Skipping malformed entry www.audacityteam.org/download/
WARNING:rse.main.import.csv:Skipping malformed entry autoencoded-vocal-analysis.readthedocs.io/en/latest/index.html
WARNING:rse.main.import.csv:Skipping malformed entry www.avianz.net/index.php
WARNING:rse.main.import.csv:Skipping malformed entry www.avisoft.com/sound-analysis/
WARNING:rse.main.import.csv:Skipping malformed entry bitbucket.org/chrisscott/batclassify/src
WARNING:rse.main.import.csv:Skipping malformed entry www.batlogger.com/en/products/batexplorer/
WARNING:rse.main.import.csv:Skipping malformed entry www.wsl.ch/en/services-and-products/software-websites-and-apps/batscope-4.html
WARNING:rse.main.import.csv:Skipping malformed entry cran.r-project.org/web/packages/bioacoustics/index.html
WARNING:rse.main.import.csv:Skipping malformed entry birdnet.cornell.edu/
WARNING:rse.main.import.csv:Skipping malformed entry www.oldbird.org/glassofire.htm
WARNING:rse.main.import.csv:Skipping malformed entry www.goldwave.com/
WARNING:rse.main.import.csv:Skipping malformed entry sites.google.com/view/alcore-suzuki/home/harkbird
WARNING:rse.main.import.csv:Skipping malformed entry bioacoustics.us/ishmael.html
WARNING:rse.main.import.csv:Skipping malformed entry www.wildlifeacoustics.com/products/kaleidoscope-pro
WARNING:rse.main.import.csv:Skipping malformed entry meridian.cs.dal.ca/2015/04/12/ketos/
WARNING:rse.main.import.csv:Skipping malformed entry koe.io.ac.nz/
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/tree/v0.6.5.
WARNING:rse.main.import.csv:Skipping malformed entry github.com/shyamblast/Koogu/tree/v0.6.5
WARNING:rse.main.import.csv:Skipping malformed entry librosa.org/librosa/
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/rflachlanhub.io/Luscinia.
WARNING:rse.main.import.csv:Skipping malformed entry rflachlan.github.io/Luscinia/
WARNING:rse.main.import.csv:Skipping malformed entry cran.r-project.org/web/packages/monitoR/index.html
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/ohun/index.html.
WARNING:rse.main.import.csv:Skipping malformed entry marce10.github.io/ohun/index.html
WARNING:rse.main.import.csv:Skipping malformed entry www.pamguard.org/
WARNING:rse.main.import.csv:Skipping malformed entry www.fon.hum.uva.nl/praat/
ERROR:rse.utils.urls:Permission denied to query https://api.github.com/repos/shivChitinous/prinia-project: 403, rate limit exceeded
WARNING:rse.main.import.csv:Skipping malformed entry github.com/shivChitinous/prinia-project
WARNING:rse.main.import.csv:Skipping malformed entry ravensoundsoftware.com/software/raven-lite/
WARNING:rse.main.import.csv:Skipping malformed entry ravensoundsoftware.com/software/raven-pro
WARNING:rse.main.import.csv:Skipping malformed entry www.reaper.fm/
ERROR:rse.utils.urls:Permission denied to query https://api.github.com/repos/scikit-maad/scikit-maad: 403, rate limit exceeded
WARNING:rse.main.import.csv:Skipping malformed entry github.com/scikit-maad/scikit-maad
WARNING:rse.main.import.csv:Skipping malformed entry docs.scipy.org/doc/scipy/reference/signal.html
WARNING:rse.main.import.csv:Skipping malformed entry dx.doi.org/10.6084/m9.figshare.3792780
WARNING:rse.main.import.csv:Skipping malformed entry cran.r-project.org/web/packages/seewave/index.html
WARNING:rse.main.import.csv:Skipping malformed entry www.sonicvisualiser.org/
WARNING:rse.main.import.csv:Skipping malformed entry sonobat.com/
WARNING:rse.main.import.csv:Skipping malformed entry doi.org/10.1080/09524622.2013.827588
WARNING:rse.main.import.csv:Skipping malformed entry soundata.readthedocs.io/en/latest/
WARNING:rse.main.import.csv:Skipping malformed entry cran.r-project.org/web/packages/soundecology/vignettes/intro.html
ERROR:rse.utils.urls:Permission denied to query https://api.github.com/repos/macster110/aipam: 403, rate limit exceeded
WARNING:rse.main.import.csv:Skipping malformed entry github.com/macster110/aipam
ERROR:rse.utils.urls:Permission denied to query https://api.github.com/repos/rhine3/specky: 403, rate limit exceeded
WARNING:rse.main.import.csv:Skipping malformed entry github.com/rhine3/specky
ERROR:rse.utils.urls:Permission denied to query https://api.github.com/repos/YvesBas/Tadarida-C: 403, rate limit exceeded
WARNING:rse.main.import.csv:Skipping malformed entry github.com/YvesBas/Tadarida-C
WARNING:rse.main.import.csv:Skipping malformed entry www.cetus.ucsd.edu/technologies_triton.html
ERROR:rse.utils.urls:Permission denied to query https://api.github.com/repos/yardencsGitHub/tweetynet: 403, rate limit exceeded
WARNING:rse.main.import.csv:Skipping malformed entry github.com/yardencsGitHub/tweetynet
ERROR:rse.utils.urls:Permission denied to query https://api.github.com/repos/vocalpy/vak: 403, rate limit exceeded
WARNING:rse.main.import.csv:Skipping malformed entry github.com/vocalpy/vak
ERROR:rse.utils.urls:Permission denied to query https://api.github.com/repos/HaroldMills/Vesper: 403, rate limit exceeded
WARNING:rse.main.import.csv:Skipping malformed entry github.com/HaroldMills/Vesper
WARNING:rse.main.import.csv:Skipping malformed entry cran.r-project.org/web/packages/warbleR/index.html
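If the entries in the cleaned CSV really lack a scheme, as the warnings above suggest, one workaround in the cleanup script is to prepend `https://` before handing the file to rse. This assumes the importer accepts these URLs once they carry a scheme, which the log hints at but does not prove; the helper is hypothetical. Note also that several `github.com/...` entries were skipped right after `403, rate limit exceeded` errors, so those likely need a GitHub token rather than a URL fix.

```python
import re

# A URI scheme (letter, then letters/digits/+/./-) followed by "://".
HAS_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def normalize_url(raw, default_scheme="https"):
    """Prepend a scheme to bare entries such as "www.goldwave.com/"."""
    url = raw.strip()
    if not url or HAS_SCHEME.match(url):
        # Empty cells and URLs that already have a scheme are left alone.
        return url
    return "{}://{}".format(default_scheme, url)
```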
