Comments (12)
Hmm, because the file is in draft, I bet _draft would need to be appended like this:
id = "datafile_12930_draft"
@famuvie do you want to see if you can find your draft file that way with curl? You'll have to pass your API token. Docs on this at https://guides.dataverse.org/en/5.9/api/search.html
@kuriwaki this might also work:
entityId:12930
An example: https://dataverse.harvard.edu/api/search?q=entityId:3371438
(I'm not sure why I suggested id instead of entityId at #113 (comment). The id changes (_draft is dropped on publish) but entityId stays the same.)
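The difference between the two identifiers can be sketched as a small shell helper. This is only an illustration: file_search_query and its second argument are hypothetical names, not part of Dataverse or the R package. It assumes, as described above, that a file's id is datafile_<number> with _draft appended while the file is unpublished, and that entityId is just the number.

```shell
#!/bin/sh
# Hypothetical helper: build the q= value for the Dataverse Search API.
#   $1 = numeric file id (e.g. 12930)
#   $2 = "draft", "published", or anything else for the stable entityId form
file_search_query() {
    case "$2" in
        draft)     printf 'id:datafile_%s_draft\n' "$1" ;;
        published) printf 'id:datafile_%s\n' "$1" ;;
        *)         printf 'entityId:%s\n' "$1" ;;  # same before and after publishing
    esac
}

file_search_query 12930 draft     # prints id:datafile_12930_draft
file_search_query 3371438 any     # prints entityId:3371438
```

Searching by entityId avoids having to know whether the file is still a draft, which is why it is the more convenient form for client code.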
from dataverse-client-r.
I will put a tip about this in the dataverse download vignette. I think it is a limitation that might be common to people who try to download draft datasets, but the current method to edit something seems not too onerous.
Ultimately, the problem in is_ingested() boils down to dataverse_search() not finding the file:
library(dataverse)
server <- Sys.getenv("DATAVERSE_SERVER")
key <- Sys.getenv("DATAVERSE_KEY")
dataverse_search(id = "datafile_12930", type = "file", server = server, key = key)
#> 0 of 0 results retrieved
#> list()
Created on 2022-02-04 by the reprex package (v2.0.1)
It is worth noting that I can find the file using some keywords on the web interface.
dataverse_search() does, however, correctly find a published file.
Not sure how to pass the API token with curl, but it works with dataverse_search():
library(dataverse)
server <- Sys.getenv("DATAVERSE_SERVER")
key <- Sys.getenv("DATAVERSE_KEY")
dataverse_search(id = "datafile_12930_draft", type = "file", server = server, key = key)
#> 1 of 1 result retrieved
#> name type
#> 1 Bovine_2020_2021.tab file
#> url file_id
#> 1 https://dataverse.cirad.fr/api/access/datafile/12930 12930
#> description
#> 1 On this document, there is only the tick data collection between 2020 and 2021.\n\nSome information about variable :\n\n- Vache : Identifier of cow (10 digits)\n- Date : date of slaughterhouse visit\n- Commune : origin cow municipality\n- Eleveur : origin cow breeder\n- Troupeau : municipality and breeder\n- H_marginatus : number of *H_marginatus* collected\n- R_bursa : number of *R_bursa* collected\n- I_ricinus : number of *I_ricinus* collected\n- H_scupense : number of *H_scupense* collected\n- B_annulatus : number of *B_annulatus* collected\n- R_sanguineus : number of *R_sanguineus* collected\n- H_punctata : number of *H_punctata* collected.\n- D_marginatus : number of *D_marginatus* collected\n- Tiques ? : sum of ticks collected
#> file_type file_content_type size_in_bytes
#> 1 Tab-Delimited text/tab-separated-values 69648
#> md5 checksum.type
#> 1 688c6fc5f92e6526a3cd158854027e8b MD5
#> checksum.value unf dataset_name
#> 1 688c6fc5f92e6526a3cd158854027e8b UNF:6:lt7ZJ1diuShhMCd8UWq5zQ== Bovine
#> dataset_id dataset_persistent_id
#> 1 12928 doi:10.18167/DVN1/8Z1ZI9
#> dataset_citation
#> 1 Bartholomee, Colombine, 2022, "Bovine", https://doi.org/10.18167/DVN1/8Z1ZI9, CIRAD Dataverse, DRAFT VERSION, UNF:6:ov2odYXNktsIbiuwc2MDJQ== [fileUNF]
Created on 2022-02-04 by the reprex package (v2.0.1)
I'm not sure how to pass the API token with curl. I'll check.
You can pass it as a header or a query parameter. Please see https://guides.dataverse.org/en/5.9/api/auth.html
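A minimal sketch of both options, assuming SERVER_URL and API_TOKEN are set in the environment. The function names are hypothetical. The header name X-Dataverse-key is the one used in the curl examples in this thread; the key query parameter is my reading of the auth guide linked above, so please check it against the docs for your Dataverse version.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical. SERVER_URL and API_TOKEN
# are assumed to be set by the caller.

# Option 1: send the token in the X-Dataverse-key header.
search_with_header() {    # $1 = search query, e.g. id:datafile_12930_draft
    curl -s -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/search?q=$1"
}

# Option 2: send the token as the `key` query parameter.
search_url_with_key() {   # prints the URL so it can be inspected before use
    printf '%s/api/search?q=%s&key=%s\n' "$SERVER_URL" "$1" "$API_TOKEN"
}

search_with_param() {     # $1 = search query
    curl -s "$(search_url_with_key "$1")"
}
```

The header form keeps the token out of the URL, so it is less likely to end up in shell history or server logs.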
Sorry, I made a mistake in the previous example and have just corrected it. It actually works!
Still, I can't find a hacky way to add "_draft" to the file id. I guess that needs to be fixed in the package :)
Thanks @famuvie for creating an issue. A partial fix is now on dev.
@pdurbin, thanks for pointing out entityId. I implemented it on dev as there seems to be no downside.
I created a test dataset on demo dataverse that is intentionally unpublished. The get commands seem to work, except that my unpublished test file does not have a UNF under the Search API even though it does with the File API. Have you seen this before?
Proper UNF detection is necessary because that is how the package currently determines whether a file is ingested.
> str(dataset_files(dataset = "10.70122/FK2/4XHVAP", server = "demo.dataverse.org")[[1]]$dataFile)
List of 16
$ id : int 1951382
$ persistentId : chr ""
$ pidURL : chr ""
$ filename : chr "mtcars.tab"
$ contentType : chr "text/tab-separated-values"
$ filesize : int 1713
$ storageIdentifier : chr "s3://demo-dataverse-org:17f75571af3-60325bcbb1f1"
$ originalFileFormat : chr "text/csv"
$ originalFormatLabel: chr "Comma Separated Values"
$ originalFileSize : int 1700
$ originalFileName : chr "mtcars.csv"
$ UNF : chr "UNF:6:KRE/AItWGJWd5tJ+bboN7A=="
$ rootDataFileId : int -1
$ md5 : chr "c502359c26a0931eef53b2207b2344f9"
$ checksum :List of 2
..$ type : chr "MD5"
..$ value: chr "c502359c26a0931eef53b2207b2344f9"
$ creationDate : chr "2022-03-10"
> str(dataverse_search(entityId = 1951382, server = "demo.dataverse.org", key = Sys.getenv("DATAVERSE_KEY")))
1 of 1 result retrieved
'data.frame': 1 obs. of 13 variables:
$ name : chr "mtcars.csv"
$ type : chr "file"
$ url : chr "https://demo.dataverse.org/api/access/datafile/1951382"
$ file_id : chr "1951382"
$ file_type : chr "Comma Separated Values"
$ file_content_type : chr "text/csv"
$ size_in_bytes : int 1700
$ md5 : chr "c502359c26a0931eef53b2207b2344f9"
$ checksum :'data.frame': 1 obs. of 2 variables:
..$ type : chr "MD5"
..$ value: chr "c502359c26a0931eef53b2207b2344f9"
$ dataset_name : chr "Permanent draft dataset for testing"
$ dataset_id : chr "1951381"
$ dataset_persistent_id: chr "doi:10.70122/FK2/4XHVAP"
$ dataset_citation : chr "Kuriwaki, Shiro, 2022, \"Permanent draft dataset for testing\", https://doi.org/10.70122/FK2/4XHVAP, Demo Datav"| __truncated__
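Because ingest detection relies on the UNF in the search result, the check itself can be sketched in shell. This is an assumption-laden illustration, not how the package does it: has_unf is a hypothetical helper, and grep is a crude stand-in for a real JSON parser. It assumes the Search API reports the UNF under a key named unf.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the JSON on stdin contains a "unf" key.
# grep is a crude stand-in for a real JSON parser.
has_unf() {
    grep -q '"unf"[[:space:]]*:'
}

# Example with a trimmed-down item shaped like a Search API result
# that lacks a UNF, as in the output above.
if printf '%s\n' '{"items":[{"name":"mtcars.csv","md5":"c502359c26a0931eef53b2207b2344f9"}]}' | has_unf
then
    echo "UNF present"
else
    echo "no UNF in search result"
fi
```

With the sample input this takes the else branch, which matches what the Search API returned for the unpublished test file.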
The get commands seem to go ok except for my unpublished test file does not have a UNF under the SEARCH API even though it does with the File API.
Huh. This is news to me but I see what you mean.
No UNF from the Search API when I look at your unpublished file...
curl -H "X-Dataverse-key:$API_TOKEN" "https://demo.dataverse.org/api/search?q=id:datafile_1951382_draft"
{
"status": "OK",
"data": {
"q": "id:datafile_1951382_draft",
"total_count": 1,
"start": 0,
"spelling_alternatives": {},
"items": [
{
"name": "mtcars.csv",
"type": "file",
"url": "https://demo.dataverse.org/api/access/datafile/1951382",
"file_id": "1951382",
"file_type": "Comma Separated Values",
"file_content_type": "text/csv",
"size_in_bytes": 1700,
"md5": "c502359c26a0931eef53b2207b2344f9",
"checksum": {
"type": "MD5",
"value": "c502359c26a0931eef53b2207b2344f9"
},
"dataset_name": "Permanent draft dataset for testing",
"dataset_id": "1951381",
"dataset_persistent_id": "doi:10.70122/FK2/4XHVAP",
"dataset_citation": "Kuriwaki, Shiro, 2022, \"Permanent draft dataset for testing\", https://doi.org/10.70122/FK2/4XHVAP, Demo Dataverse, DRAFT VERSION"
}
],
"count_in_response": 1
}
}
... but when I look at a published file (different server but shouldn't matter), I do see a UNF:
curl "https://dataverse.harvard.edu/api/search?q=id:datafile_3371438"
{
"status": "OK",
"data": {
"q": "id:datafile_3371438",
"total_count": 1,
"start": 0,
"spelling_alternatives": {},
"items": [
{
"name": "2019-02-25.tab",
"type": "file",
"url": "https://dataverse.harvard.edu/api/access/datafile/3371438",
"file_id": "3371438",
"description": "",
"published_at": "2019-02-26T03:03:13Z",
"file_type": "Tab-Delimited",
"file_content_type": "text/tab-separated-values",
"size_in_bytes": 17232,
"md5": "9bd94d028049c9a53bca9bb19d4fb57e",
"checksum": {
"type": "MD5",
"value": "9bd94d028049c9a53bca9bb19d4fb57e"
},
"unf": "UNF:6:2MMoV8KKO8R7sb27Q5GXtA==",
"file_persistent_id": "doi:10.7910/DVN/TJCLKP/3VSTKY",
"dataset_name": "Open Source at Harvard",
"dataset_id": "3035124",
"dataset_persistent_id": "doi:10.7910/DVN/TJCLKP",
"dataset_citation": "Durbin, Philip, 2017, \"Open Source at Harvard\", https://doi.org/10.7910/DVN/TJCLKP, Harvard Dataverse, DRAFT VERSION, UNF:6:2MMoV8KKO8R7sb27Q5GXtA== [fileUNF]"
}
],
"count_in_response": 1
}
}
Perhaps we don't reindex the file after ingest is complete? I'm not sure. You could test this by making a change to your draft dataset metadata (add a keyword or something). This will reindex the dataset and its files.
Yes! It was sufficient to add a data description to the draft dataset, and it somehow updated. Thank you.
@kuriwaki hmm, I can replicate this on "develop" on my laptop (around 0d853b74e9). When I first upload a file to a draft, the UNF does not appear in search results...
$ curl -s -H "X-Dataverse-key:$API_TOKEN" "http://localhost:8080/api/search?q=id:datafile_5_draft" | jq .
{
"status": "OK",
"data": {
"q": "id:datafile_5_draft",
"total_count": 1,
"start": 0,
"spelling_alternatives": {},
"items": [
{
"name": "2016-06-29.csv",
"type": "file",
"url": "http://localhost:8080/api/access/datafile/5",
"file_id": "5",
"file_type": "Comma Separated Values",
"file_content_type": "text/csv",
"size_in_bytes": 58690,
"md5": "d5de092a84304a9965c787b8dcd27c99",
"checksum": {
"type": "MD5",
"value": "d5de092a84304a9965c787b8dcd27c99"
},
"dataset_name": "zzz",
"dataset_id": "4",
"dataset_persistent_id": "doi:10.5072/FK2/JJK8WY",
"dataset_citation": "Admin, Dataverse, 2022, \"zzz\", https://doi.org/10.5072/FK2/JJK8WY, Root, DRAFT VERSION"
}
],
"count_in_response": 1
}
}
... but if I edit the metadata of the draft dataset (forcing the file to be reindexed), the UNF appears:
$ curl -s -H "X-Dataverse-key:$API_TOKEN" "http://localhost:8080/api/search?q=id:datafile_5_draft" | jq .
{
"status": "OK",
"data": {
"q": "id:datafile_5_draft",
"total_count": 1,
"start": 0,
"spelling_alternatives": {},
"items": [
{
"name": "2016-06-29.tab",
"type": "file",
"url": "http://localhost:8080/api/access/datafile/5",
"file_id": "5",
"file_type": "Tab-Delimited",
"file_content_type": "text/tab-separated-values",
"size_in_bytes": 59208,
"md5": "d5de092a84304a9965c787b8dcd27c99",
"checksum": {
"type": "MD5",
"value": "d5de092a84304a9965c787b8dcd27c99"
},
"unf": "UNF:6:6YVg+pUWsYD52stDkZuzUA==",
"dataset_name": "zzzyyy",
"dataset_id": "4",
"dataset_persistent_id": "doi:10.5072/FK2/JJK8WY",
"dataset_citation": "Admin, Dataverse, 2022, \"zzzyyy\", https://doi.org/10.5072/FK2/JJK8WY, Root, DRAFT VERSION, UNF:6:6YVg+pUWsYD52stDkZuzUA== [fileUNF]"
}
],
"count_in_response": 1
}
}
Please feel free to open an issue about this at https://github.com/IQSS/dataverse/issues if you'd like.
Addressed by 0.3.11.