Comments (12)

pdurbin commented on August 19, 2024

Hmm, because the file is in draft, I bet _draft would need to be appended like this:

id = "datafile_12930_draft"

@famuvie do you want to see if you can find your draft file that way with curl? You'll have to pass your API token. Docs on this at https://guides.dataverse.org/en/5.9/api/search.html

@kuriwaki this might also work:

entityId:12930

An example: https://dataverse.harvard.edu/api/search?q=entityId:3371438

(I'm not sure why I suggested id instead of entityId at #113 (comment). The id changes, because _draft is dropped on publish, but entityId stays the same.)
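
To make the difference concrete, here is a minimal shell sketch that builds both kinds of query string for a file. The helper name `dv_file_query` is hypothetical; the `datafile_<id>` pattern, the `_draft` suffix, and the `entityId:` field are the ones mentioned above. Nothing here contacts a server, it only prints the value you would pass as `q`.

```shell
# Hypothetical helper: build a Search API `q` value for a file.
# Usage: dv_file_query <file_id> <id|entityId> [draft]
dv_file_query() {
  local file_id="$1" field="$2" state="${3:-published}"
  case "$field" in
    entityId)
      # entityId is the same before and after publication.
      printf 'entityId:%s\n' "$file_id"
      ;;
    id)
      # The id carries a _draft suffix while the file is in draft.
      if [ "$state" = "draft" ]; then
        printf 'id:datafile_%s_draft\n' "$file_id"
      else
        printf 'id:datafile_%s\n' "$file_id"
      fi
      ;;
    *)
      echo "unknown field: $field" >&2
      return 1
      ;;
  esac
}

dv_file_query 12930 id draft     # id:datafile_12930_draft
dv_file_query 12930 id           # id:datafile_12930
dv_file_query 12930 entityId     # entityId:12930
```

Because the entityId form does not depend on publication state, it avoids having to know whether the file is still a draft.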

from dataverse-client-r.

kuriwaki commented on August 19, 2024

I will put a tip about this in the dataverse download vignette. I think this limitation might be common for people who try to download draft datasets, but the current workaround, editing something in the draft dataset so that it gets reindexed, does not seem too onerous.

famuvie commented on August 19, 2024

Ultimately, the problem in is_ingested() boils down to dataverse_search() not finding the file:

library(dataverse)
server <- Sys.getenv("DATAVERSE_SERVER")
key <- Sys.getenv("DATAVERSE_KEY")
dataverse_search(id = "datafile_12930", type = "file", server = server, key = key)
#> 0 of 0 results retrieved
#> list()

Created on 2022-02-04 by the reprex package (v2.0.1)

It is worth noting that I can find the file using some keywords on the web interface.

By contrast, dataverse_search() correctly finds a published file.

famuvie commented on August 19, 2024

Not sure how to pass the API token with curl, but it works with dataverse_search():

library(dataverse)
server <- Sys.getenv("DATAVERSE_SERVER")
key <- Sys.getenv("DATAVERSE_KEY")
dataverse_search(id = "datafile_12930_draft", type = "file", server = server, key = key)
#> 1 of 1 result retrieved
#>                   name type
#> 1 Bovine_2020_2021.tab file
#>                                                    url file_id
#> 1 https://dataverse.cirad.fr/api/access/datafile/12930   12930
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              description
#> 1 On this document, there is only the tick data collection between 2020 and 2021.\n\nSome information about variable :\n\n- Vache : Identifier of cow (10 digits)\n- Date : date of slaughterhouse visit\n- Commune : origin cow municipality\n- Eleveur : origin cow breeder\n- Troupeau : municipality and breeder\n- H_marginatus : number of *H_marginatus* collected\n- R_bursa : number of *R_bursa* collected\n- I_ricinus : number of *I_ricinus* collected\n- H_scupense : number of *H_scupense* collected\n- B_annulatus : number of *B_annulatus* collected\n- R_sanguineus : number of *R_sanguineus* collected\n- H_punctata : number of *H_punctata* collected.\n- D_marginatus : number of *D_marginatus* collected\n- Tiques ? : sum of ticks collected
#>       file_type         file_content_type size_in_bytes
#> 1 Tab-Delimited text/tab-separated-values         69648
#>                                md5 checksum.type
#> 1 688c6fc5f92e6526a3cd158854027e8b           MD5
#>                     checksum.value                            unf dataset_name
#> 1 688c6fc5f92e6526a3cd158854027e8b UNF:6:lt7ZJ1diuShhMCd8UWq5zQ==       Bovine
#>   dataset_id    dataset_persistent_id
#> 1      12928 doi:10.18167/DVN1/8Z1ZI9
#>                                                                                                                                         dataset_citation
#> 1 Bartholomee, Colombine, 2022, "Bovine", https://doi.org/10.18167/DVN1/8Z1ZI9, CIRAD Dataverse, DRAFT VERSION, UNF:6:ov2odYXNktsIbiuwc2MDJQ== [fileUNF]

Created on 2022-02-04 by the reprex package (v2.0.1)

pdurbin commented on August 19, 2024

I'm not sure how to pass the API token with curl. I'll check.

You can pass it as a header or a query parameter. Please see https://guides.dataverse.org/en/5.9/api/auth.html
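
A minimal sketch of the two options, assuming the query parameter is named `key` as described in the linked auth guide. The server URL, token, and query are placeholders, and the two functions (hypothetical names) only print the curl commands rather than running them.

```shell
# Placeholders (assumptions): replace with your own server and token.
SERVER="https://demo.dataverse.org"
API_TOKEN="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
QUERY="id:datafile_12930_draft"

# Option 1: send the token in the X-Dataverse-key header.
header_cmd() {
  printf 'curl -H "X-Dataverse-key:%s" "%s/api/search?q=%s"\n' \
    "$API_TOKEN" "$SERVER" "$QUERY"
}

# Option 2: send the token as the `key` query parameter.
param_cmd() {
  printf 'curl "%s/api/search?q=%s&key=%s"\n' \
    "$SERVER" "$QUERY" "$API_TOKEN"
}

header_cmd
param_cmd
```

The header form keeps the token out of the URL, which may matter if URLs end up in logs or shell history. Quoting the URL also stops the shell from interpreting `?` and `&`.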

famuvie commented on August 19, 2024

Sorry, I made a mistake in the previous example and have just corrected it. It actually works!

famuvie commented on August 19, 2024

Still, I can't find a hacky way to add "_draft" to the file id. I guess that needs to be fixed in the package :)

kuriwaki commented on August 19, 2024

Thanks @famuvie for creating an issue. A partial fix is now on dev.
@pdurbin, thanks for pointing out entityId. I implemented it on dev as there seems to be no downside.

I created a test dataset on demo dataverse that is intentionally unpublished. The get commands seem to go ok except for my unpublished test file does not have a UNF under the SEARCH API even though it does with the File API. Have you seen this before?

Proper UNF detection becomes necessary since that's how it currently determines if a file is ingested or not.

> str(dataset_files(dataset = "10.70122/FK2/4XHVAP", server = "demo.dataverse.org")[[1]]$dataFile)
List of 16
 $ id                 : int 1951382
 $ persistentId       : chr ""
 $ pidURL             : chr ""
 $ filename           : chr "mtcars.tab"
 $ contentType        : chr "text/tab-separated-values"
 $ filesize           : int 1713
 $ storageIdentifier  : chr "s3://demo-dataverse-org:17f75571af3-60325bcbb1f1"
 $ originalFileFormat : chr "text/csv"
 $ originalFormatLabel: chr "Comma Separated Values"
 $ originalFileSize   : int 1700
 $ originalFileName   : chr "mtcars.csv"
 $ UNF                : chr "UNF:6:KRE/AItWGJWd5tJ+bboN7A=="
 $ rootDataFileId     : int -1
 $ md5                : chr "c502359c26a0931eef53b2207b2344f9"
 $ checksum           :List of 2
  ..$ type : chr "MD5"
  ..$ value: chr "c502359c26a0931eef53b2207b2344f9"
 $ creationDate       : chr "2022-03-10"
> str(dataverse_search(entityId = 1951382, server = "demo.dataverse.org", key = Sys.getenv("DATAVERSE_KEY")))
1 of 1 result retrieved
'data.frame':	1 obs. of  13 variables:
 $ name                 : chr "mtcars.csv"
 $ type                 : chr "file"
 $ url                  : chr "https://demo.dataverse.org/api/access/datafile/1951382"
 $ file_id              : chr "1951382"
 $ file_type            : chr "Comma Separated Values"
 $ file_content_type    : chr "text/csv"
 $ size_in_bytes        : int 1700
 $ md5                  : chr "c502359c26a0931eef53b2207b2344f9"
 $ checksum             :'data.frame':	1 obs. of  2 variables:
  ..$ type : chr "MD5"
  ..$ value: chr "c502359c26a0931eef53b2207b2344f9"
 $ dataset_name         : chr "Permanent draft dataset for testing"
 $ dataset_id           : chr "1951381"
 $ dataset_persistent_id: chr "doi:10.70122/FK2/4XHVAP"
 $ dataset_citation     : chr "Kuriwaki, Shiro, 2022, \"Permanent draft dataset for testing\", https://doi.org/10.70122/FK2/4XHVAP, Demo Datav"| __truncated__
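
The UNF-based check described above can be sketched in shell as follows. This is not the package's implementation (which is in R): `has_unf` is a hypothetical helper that only tests whether a Search API JSON response contains a `unf` field, and the two sample responses are abridged stand-ins rather than real server output.

```shell
# Hypothetical helper: succeed if the JSON on stdin contains a "unf" key,
# which is treated here as the sign that the file has been ingested.
has_unf() {
  grep -q '"unf"[[:space:]]*:'
}

# Abridged sample responses (assumptions, for illustration only).
without_unf='{"items":[{"name":"mtcars.csv","file_id":"1951382"}]}'
with_unf='{"items":[{"name":"mtcars.tab","unf":"UNF:6:KRE/AItWGJWd5tJ+bboN7A=="}]}'

for response in "$without_unf" "$with_unf"; do
  if printf '%s\n' "$response" | has_unf; then
    echo "UNF present"
  else
    echo "UNF missing"
  fi
done
```

A check like this reports "not ingested" whenever the search index lacks the UNF, even if the File API already reports one, which is the mismatch shown in the two outputs above.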

pdurbin commented on August 19, 2024

> The get commands seem to go ok except for my unpublished test file does not have a UNF under the SEARCH API even though it does with the File API.

Huh. This is news to me but I see what you mean.

No UNF from the Search API when I look at your unpublished file...

curl -H "X-Dataverse-key:$API_TOKEN" "https://demo.dataverse.org/api/search?q=id:datafile_1951382_draft"

{
  "status": "OK",
  "data": {
    "q": "id:datafile_1951382_draft",
    "total_count": 1,
    "start": 0,
    "spelling_alternatives": {},
    "items": [
      {
        "name": "mtcars.csv",
        "type": "file",
        "url": "https://demo.dataverse.org/api/access/datafile/1951382",
        "file_id": "1951382",
        "file_type": "Comma Separated Values",
        "file_content_type": "text/csv",
        "size_in_bytes": 1700,
        "md5": "c502359c26a0931eef53b2207b2344f9",
        "checksum": {
          "type": "MD5",
          "value": "c502359c26a0931eef53b2207b2344f9"
        },
        "dataset_name": "Permanent draft dataset for testing",
        "dataset_id": "1951381",
        "dataset_persistent_id": "doi:10.70122/FK2/4XHVAP",
        "dataset_citation": "Kuriwaki, Shiro, 2022, \"Permanent draft dataset for testing\", https://doi.org/10.70122/FK2/4XHVAP, Demo Dataverse, DRAFT VERSION"
      }
    ],
    "count_in_response": 1
  }
}

... but when I look at a published file (different server but shouldn't matter), I do see a UNF:

curl "https://dataverse.harvard.edu/api/search?q=id:datafile_3371438"
{
  "status": "OK",
  "data": {
    "q": "id:datafile_3371438",
    "total_count": 1,
    "start": 0,
    "spelling_alternatives": {},
    "items": [
      {
        "name": "2019-02-25.tab",
        "type": "file",
        "url": "https://dataverse.harvard.edu/api/access/datafile/3371438",
        "file_id": "3371438",
        "description": "",
        "published_at": "2019-02-26T03:03:13Z",
        "file_type": "Tab-Delimited",
        "file_content_type": "text/tab-separated-values",
        "size_in_bytes": 17232,
        "md5": "9bd94d028049c9a53bca9bb19d4fb57e",
        "checksum": {
          "type": "MD5",
          "value": "9bd94d028049c9a53bca9bb19d4fb57e"
        },
        "unf": "UNF:6:2MMoV8KKO8R7sb27Q5GXtA==",
        "file_persistent_id": "doi:10.7910/DVN/TJCLKP/3VSTKY",
        "dataset_name": "Open Source at Harvard",
        "dataset_id": "3035124",
        "dataset_persistent_id": "doi:10.7910/DVN/TJCLKP",
        "dataset_citation": "Durbin, Philip, 2017, \"Open Source at Harvard\", https://doi.org/10.7910/DVN/TJCLKP, Harvard Dataverse, DRAFT VERSION, UNF:6:2MMoV8KKO8R7sb27Q5GXtA== [fileUNF]"
      }
    ],
    "count_in_response": 1
  }
}

Perhaps we don't reindex the file after ingest is complete? I'm not sure. You could test this by making a change to your draft dataset metadata (add a keyword or something). This will reindex the dataset and its files.

kuriwaki commented on August 19, 2024

Yes! It was sufficient to add a data description to the draft dataset, and it somehow updated. Thank you.

pdurbin commented on August 19, 2024

@kuriwaki hmm, I can replicate this on "develop" on my laptop (around 0d853b74e9). When I first upload a file to a draft, the UNF does not appear in search results...

$ curl -s -H "X-Dataverse-key:$API_TOKEN" "http://localhost:8080/api/search?q=id:datafile_5_draft" | jq .
{
  "status": "OK",
  "data": {
    "q": "id:datafile_5_draft",
    "total_count": 1,
    "start": 0,
    "spelling_alternatives": {},
    "items": [
      {
        "name": "2016-06-29.csv",
        "type": "file",
        "url": "http://localhost:8080/api/access/datafile/5",
        "file_id": "5",
        "file_type": "Comma Separated Values",
        "file_content_type": "text/csv",
        "size_in_bytes": 58690,
        "md5": "d5de092a84304a9965c787b8dcd27c99",
        "checksum": {
          "type": "MD5",
          "value": "d5de092a84304a9965c787b8dcd27c99"
        },
        "dataset_name": "zzz",
        "dataset_id": "4",
        "dataset_persistent_id": "doi:10.5072/FK2/JJK8WY",
        "dataset_citation": "Admin, Dataverse, 2022, \"zzz\", https://doi.org/10.5072/FK2/JJK8WY, Root, DRAFT VERSION"
      }
    ],
    "count_in_response": 1
  }
}

... but if I edit the metadata of the draft dataset (forcing the file to be reindexed), the UNF appears:

$ curl -s -H "X-Dataverse-key:$API_TOKEN" "http://localhost:8080/api/search?q=id:datafile_5_draft" | jq .
{
  "status": "OK",
  "data": {
    "q": "id:datafile_5_draft",
    "total_count": 1,
    "start": 0,
    "spelling_alternatives": {},
    "items": [
      {
        "name": "2016-06-29.tab",
        "type": "file",
        "url": "http://localhost:8080/api/access/datafile/5",
        "file_id": "5",
        "file_type": "Tab-Delimited",
        "file_content_type": "text/tab-separated-values",
        "size_in_bytes": 59208,
        "md5": "d5de092a84304a9965c787b8dcd27c99",
        "checksum": {
          "type": "MD5",
          "value": "d5de092a84304a9965c787b8dcd27c99"
        },
        "unf": "UNF:6:6YVg+pUWsYD52stDkZuzUA==",
        "dataset_name": "zzzyyy",
        "dataset_id": "4",
        "dataset_persistent_id": "doi:10.5072/FK2/JJK8WY",
        "dataset_citation": "Admin, Dataverse, 2022, \"zzzyyy\", https://doi.org/10.5072/FK2/JJK8WY, Root, DRAFT VERSION, UNF:6:6YVg+pUWsYD52stDkZuzUA== [fileUNF]"
      }
    ],
    "count_in_response": 1
  }
}

Please feel free to open an issue about this at https://github.com/IQSS/dataverse/issues if you'd like.

kuriwaki commented on August 19, 2024

Addressed by 0.3.11.
