
dvwebloader's People

Contributors: donsizemore, luddaniel, qqmyers


dvwebloader's Issues

gdcc.io URL wanted?

Hi @qqmyers,

if you want I can create any gdcc.io subdomain for you. Just name it and it shall be done.

If this isn't desirable, simply close this issue.

ERROR","message":"Invalid token=STRING ... Expected tokens are: [COMMA]"

Apologies in advance - this is probably my error, not sure what I'm doing incorrectly. I'm running the curl command and get the error:
{"status":"ERROR","message":"Invalid token=STRING at (line no=7, column no=9, offset=194). Expected tokens are: [COMMA]"}

This is what I have for the curl command:

curl -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools -d \
'{
  "displayName": "Dataverse WebLoader",
  "description": "Upload all  the files in a local directory!",
  "toolName": "dvwebloader",
  "scope": "dataset",
  "contentType":"text/plain"
  "types": [
    "explore"
  ],
  "toolUrl": "https://gdcc.github.io/dvwebloader/src/dvwebloader.html",
  "toolParameters": {
    "queryParameters": [
      {
        "siteUrl": "https://dataverse.ucla.edu"
      },
      {
        "datasetPid": "doi:10.25346/S6/EZYHS4"
      },
      {
        "key": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"    
      }
    ]
  }
}'

Thank you, jamie
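For reference, the parser error points at line 7 of the JSON payload: `"contentType":"text/plain"` is missing a trailing comma before `"types"`. The fixed region of the payload looks like:

```json
  "scope": "dataset",
  "contentType": "text/plain",
  "types": [
    "explore"
  ],
```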

filename validation ConstraintViolationException

With dv-webloader, one may drop a folder containing thousands of files and begin the upload. dv-webloader will upload them. The user sees: (screenshot omitted)

Then, in the JavaScript console, things go splat: (screenshot omitted)

Payara's server.log sez

Constraint violation found in FileMetadata. File Name cannot contain any of the following characters:  / : * ? " < > | ; # . The invalid value is "filename".

  javax.ejb.EJBException: One or more Bean Validation constraints were violated while executing Automatic Bean Validation on callback event: prePersist for class: edu.harvard.iq.dataverse.FileMetadata. Please refer to the embedded constraint violations for details.
        at com.sun.ejb.containers.EJBContainerTransactionManager.processSystemException(EJBContainerTransactionManager.java:723)
        at com.sun.ejb.containers.EJBContainerTransactionManager.completeNewTx(EJBContainerTransactionManager.java:652)
        at com.sun.ejb.containers.EJBContainerTransactionManager.postInvokeTx(EJBContainerTransactionManager.java:482)
        at com.sun.ejb.containers.BaseContainer.postInvokeTx(BaseContainer.java:4601)
        at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2134)
        at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2104)
        at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:220)
        at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:90)
        at com.sun.proxy.$Proxy377.submit(Unknown Source)
        at edu.harvard.iq.dataverse.__EJB31_Generated__EjbDataverseEngineInner__Intf____Bean__.submit(Unknown Source)
        at edu.harvard.iq.dataverse.EjbDataverseEngine.submit(EjbDataverseEngine.java:272)
        at jdk.internal.reflect.GeneratedMethodAccessor1337.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:588)
        at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:408)
        at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4835)
        at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:665)
        at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:834)
        at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:615)
        at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
        at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
        at jdk.internal.reflect.GeneratedMethodAccessor150.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
        at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
        at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:615)
        at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:72)
        at org.jboss.weld.module.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
        at jdk.internal.reflect.GeneratedMethodAccessor144.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
        at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
        at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:375)
        at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4807)
        at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4795)
        at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
        at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:90)
        at com.sun.proxy.$Proxy405.submit(Unknown Source)
        at edu.harvard.iq.dataverse.__EJB31_Generated__EjbDataverseEngine__Intf____Bean__.submit(Unknown Source)
        at edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper.addFiles(AddReplaceFileHelper.java:2148)
        at edu.harvard.iq.dataverse.api.Datasets.addFilesToDataset(Datasets.java:3467)
Caused by: javax.validation.ConstraintViolationException: One or more Bean Validation constraints were violated while executing Automatic Bean Validation on callback event: prePersist for class: edu.harvard.iq.dataverse.FileMetadata. Please refer to the embedded constraint violations for details.
        at org.eclipse.persistence.internal.jpa.metadata.listeners.BeanValidationListener.validateOnCallbackEvent(BeanValidationListener.java:124)
        at org.eclipse.persistence.internal.jpa.metadata.listeners.BeanValidationListener.prePersist(BeanValidationListener.java:86)
        at org.eclipse.persistence.descriptors.DescriptorEventManager.notifyListener(DescriptorEventManager.java:746)
        at org.eclipse.persistence.descriptors.DescriptorEventManager.notifyEJB30Listeners(DescriptorEventManager.java:689)
        at org.eclipse.persistence.descriptors.DescriptorEventManager.executeEvent(DescriptorEventManager.java:233)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerNewObjectClone(UnitOfWorkImpl.java:4468)
        at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.cloneAndRegisterNewObject(RepeatableWriteUnitOfWork.java:613)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalRegisterObject(UnitOfWorkImpl.java:3045)
        at org.eclipse.persistence.internal.sessions.MergeManager.registerObjectForMergeCloneIntoWorkingCopy(MergeManager.java:1102)
        at org.eclipse.persistence.internal.sessions.MergeManager.mergeChangesOfCloneIntoWorkingCopy(MergeManager.java:575)
        at org.eclipse.persistence.internal.sessions.MergeManager.mergeChanges(MergeManager.java:324)
        at org.eclipse.persistence.mappings.CollectionMapping.mergeIntoObject(CollectionMapping.java:1650)
        at org.eclipse.persistence.internal.descriptors.ObjectBuilder.mergeIntoObject(ObjectBuilder.java:4202)
        at org.eclipse.persistence.internal.sessions.MergeManager.mergeChangesOfCloneIntoWorkingCopy(MergeManager.java:612)
        at org.eclipse.persistence.internal.sessions.MergeManager.mergeChanges(MergeManager.java:324)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.mergeCloneWithReferences(UnitOfWorkImpl.java:3637)
        at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.mergeCloneWithReferences(RepeatableWriteUnitOfWork.java:389)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.mergeCloneWithReferences(UnitOfWorkImpl.java:3597)
        at org.eclipse.persistence.internal.jpa.EntityManagerImpl.mergeInternal(EntityManagerImpl.java:648)
        at org.eclipse.persistence.internal.jpa.EntityManagerImpl.merge(EntityManagerImpl.java:625)
        at com.sun.enterprise.container.common.impl.EntityManagerWrapper.merge(EntityManagerWrapper.java:307)
        at edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand.execute(UpdateDatasetVersionCommand.java:129)
        at edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand.execute(UpdateDatasetVersionCommand.java:30)
        at edu.harvard.iq.dataverse.EjbDataverseEngineInner.submit(EjbDataverseEngineInner.java:36)

It would be great if dv-webloader could either percolate the user-friendly error back to the dv-webloader UI, or perhaps pre-emptively search for such characters before attempting to upload files?
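A pre-emptive client-side check could mirror the server's rule. A minimal sketch, assuming the character list quoted in the server.log message above (the function name is a suggestion, not part of dv-webloader today):

```javascript
// Characters rejected by Dataverse's FileMetadata validation, as quoted in
// the server.log message above: / : * ? " < > | ;
const INVALID_FILENAME_CHARS = /[\/:*?"<>|;]/;

// Return the file names that would fail server-side validation, so the UI
// can warn the user before any bytes are uploaded.
function findInvalidFileNames(fileNames) {
  return fileNames.filter((name) => INVALID_FILENAME_CHARS.test(name));
}
```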

Multipart file upload failing relative to S3 direct upload

Current behavior:

When using dvwebloader to upload a file large enough to require multipart upload, the upload appears to fail with a 401 error.
This is coming from dvwebloader, since the same file is uploaded with no difficulty through the normal Dataverse interface.

From the logs it looks like the proper API key is not being passed in that case.


[#|2022-10-06T07:42:43.035+0000|WARNING|Payara 5.2021.1|edu.harvard.iq.dataverse.util.BundleUtil|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(5);_TimeMillis=1665042163035;_LevelValue=900;|
  Could not find key "externaltools.dvwebloader.displayname" in bundle file: |#

Expected behavior:

The file is uploaded successfully.

Reproducing steps:

Upload a file large enough to trigger multipart upload. Currently on https://test-docker.dataverse.no/ the part size is about 1 GB.
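A guess at the shape of the fix, based on the logs: every Dataverse API call, including the multipart calls to /api/datasets/mpupload, must carry the user's token, for example in the X-Dataverse-key header. All names and the HTTP method below are illustrative, not dvwebloader's actual code:

```javascript
// Build the multipart completion request with the API token attached.
// Function, parameter names, and method are hypothetical; the point is that
// the X-Dataverse-key header must be present on the mpupload call too.
function buildCompletionRequest(siteUrl, completionPath, apiKey, eTags) {
  return {
    url: siteUrl + completionPath,
    method: "PUT",
    headers: { "X-Dataverse-key": apiKey },
    body: JSON.stringify(eTags),
  };
}
```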

Failure with dataset containing an ingested file

The check for existing files that match something being uploaded fails, in the case where a file in the dataset has been ingested, with an 'equals is not a function' error. The comparison should just be ===, as equals is not a JavaScript method. This is fixed in #13 and could be back-ported to the main branch if that isn't merged soon.
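For illustration, the shape of the fix (variable names hypothetical): checksums are plain strings in JavaScript, so strict equality is the right comparison.

```javascript
// was: if (existingChecksum.equals(uploadedChecksum)) { ... }
//  -> TypeError: existingChecksum.equals is not a function
// JavaScript strings have no Java-style equals(); use strict equality:
function sameChecksum(existingChecksum, uploadedChecksum) {
  return existingChecksum === uploadedChecksum;
}
```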

Upload failed on pre-GitHub version

Before Jim published the DVWebloader on GitHub, I tried to upload a folder called “b3lyp-631pgs” containing 2 subfolders and a total of 1076 files amounting to approx. 934 MB. After a while the upload process stopped. I sent the log file to Jim.

Part of his answer was: FWIW: It looks like your server has a low part size around 5 MB – probably OK but I think our default is larger so I wasn’t seeing the issue with files in ~10MB range

[Jim: I created this issue because I want to refer to it from our GitHub repo.]

File upload doesn't start

I tried to start file upload by loading the html file into my Chrome browser using this address:

file:///C:/Users/pco000/Desktop/Files/DVWebloader/dvwebloader.html?siteUrl=https://test-docker.dataverse.no/&datasetPid=doi:10.21337/F3YZSW&key=36be7c2a-8f3b-429a-a9bc-b0abeecc809f

As far as I can see, the address has the same structure as the one I used when testing the previous version of DVWebloader, before Jim put it on GitHub.

When I press Enter, I see "Select a Directory" in the upper left corner of the screen (it looks somewhat different from last time: not like a button anymore, just text).

In Chrome Inspect > Console, I see the following:

dvwebloader.css:1 Failed to load resource: net::ERR_FILE_NOT_FOUND
fileupload2.js:1 Failed to load resource: net::ERR_FILE_NOT_FOUND
dvwebloader.html?siteUrl=https://test-docker.dataverse.no/&datasetPid=doi:10.21337/F3YZSW&key=36be7c2a-8f3b-429a-a9bc-b0abeecc809f:18 Uncaught ReferenceError: queueFileForDirectUpload is not defined
at input.onchange (dvwebloader.html?siteUrl=https://test-docker.dataverse.no/&datasetPid=doi:10.21337/F3YZSW&key=36be7c2a-8f3b-429a-a9bc-b0abeecc809f:18:7)

ERROR","message":"Unknown reserved word

error: {"status":"ERROR","message":"Unknown reserved word: https://dataverse.ucla.edu"}

I'm trying to compare this to the examples in https://guides.dataverse.org/en/latest/api/external-tools.html. The reserved words look correct, and I get a similar error if I change the order (pid or key first).

curl -vv -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools -d \
'{
  "displayName": "Dataverse WebLoader",
  "description": "Upload all the files in a local directory!",
  "toolName": "dvwebloader",
  "scope": "dataset",
  "contentType": "text/plain",
  "types": [
    "explore"
  ],
  "toolUrl": "https://gdcc.github.io/dvwebloader/src/dvwebloader.html",
  "toolParameters": {
    "queryParameters": [
      {
        "siteUrl": "https://dataverse.ucla.edu"
      },
      {
        "datasetPid": "doi:10.25346/S6/EZYHS4"
      },
      {
        "key": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
      }
    ]
  }
}'

* About to connect() to localhost port 8080 (#0)
* Trying ::1...
* Connected to localhost (::1) port 8080 (#0)
> POST /api/admin/externalTools HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8080
> Accept: */*
> Content-type: application/json
> Content-Length: 548
>
* upload completely sent off: 548 out of 548 bytes
< HTTP/1.1 400 Bad Request
< Server: Payara Server 5.2021.6 #badassfish
< X-Powered-By: Servlet/4.0 JSP/2.3 (Payara Server 5.2021.6 #badassfish Java/Red Hat, Inc./11)
< Content-Type: application/json;charset=UTF-8
< Connection: close
< Content-Length: 80
< X-Frame-Options: SAMEORIGIN
<
* Closing connection 0
{"status":"ERROR","message":"Unknown reserved word: https://dataverse.ucla.edu"}
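If the external-tools guide linked above is the right reference, the cause is that queryParameters values must be the reserved words themselves, in curly braces, rather than literal values; Dataverse substitutes the actual site URL, PID, and token when launching the tool. A sketch of the corrected block, assuming the reserved words {siteUrl}, {datasetPid}, and {apiToken}:

```json
"toolParameters": {
  "queryParameters": [
    { "siteUrl": "{siteUrl}" },
    { "datasetPid": "{datasetPid}" },
    { "key": "{apiToken}" }
  ]
}
```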

Error message when trying to upload from Linux/Ubuntu/Chromium through Wi-fi

@qqmyers @Louis-wr

Earlier today, I tried to upload some 5 GB files from Linux/Ubuntu/Chromium over Wi-Fi, and got the following console output and error messages:

Successful upload of part 811 of 820
fileupload2.js:292 Successful upload of part 820 of 820
fileupload2.js:292 Successful upload of part 812 of 820

...

fileupload2.js:292 Successful upload of part 817 of 820
fileupload2.js:292 Successful upload of part 818 of 820
fileupload2.js:292 Successful upload of part 819 of 820
fileupload2.js:340 reporting file data_0.bag
dvwebloader.html:1 Access to XMLHttpRequest at 'https://test-docker.dataverse.no/api/datasets/mpupload?globalid=doi:10.21337/BGMM6J&uploadid=NTQ2OTAxMDIxNjY1NjM5NjQ1NTM1&storageidentifier=S3://2002-green-dataversenotest1:183cfdc010b-df234abd0e00' from origin 'https://gdcc.github.io' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.
fileupload2.js:420 Failure: 0
fileupload2.js:421 Failure:
test-docker.dataverse.no/api/datasets/mpupload?globalid=doi:10.21337/BGMM6J&uploadid=NTQ2OTAxMDIxNjY1NjM5NjQ1NTM1&storageidentifier=S3://2002-green-dataversenotest1:183cfdc010b-df234abd0e00:1 Failed to load resource: net::ERR_FAILED
fileupload2.js:375 md5 done
fileupload2.js:429 handling
fileupload2.js:433 handling2
fileupload2.js:551 0 : 4 : 5 : 0

Security topic : Api Key and Signed urls

Hello @qqmyers and everybody, I was wondering about security regarding the use of the API key (&key=xxx) as a query-string parameter in the URL.
Understand that I just want to open a dialogue on this topic.

An API key is required to use the direct DataFile upload/replace APIs, but the security risk seems important to me: a non-IT user may share this URL, or may leave it in the browser history on a shared computer, and give away their level of access on Dataverse.

Security is important, and this issue has been addressed for Dataverse external tools with the Signed URLs option. I don't know if it's possible to use it right now, but it might be an idea to work on this (maybe extend the Dataverse signed-URL scope beyond external tools, if it isn't broader already).

Here is a non-exhaustive list of benefits to consider:

  • No security issue from accidentally sharing an API key
  • A limited authorized scope of API endpoints and time of use (a full day is safe enough)
  • No issue around API key creation and expiration ("Your key is expired, you must renew it before using DVWL...")

What do you think?
Best regards

directory upload not working

Current behavior:

Directory upload is apparently not working at the moment; when uploading a directory we get the following console output and nothing gets uploaded:

before folder/Screenshot_2022-04-08_10-35-37.png fileupload2.js:61:21
Screenshot_2022-04-08_10-35-37.png  fileupload2.js:558:13

Uncaught TypeError: right-hand side of 'in' should be an object, got undefined
    queueFileForDirectUpload https://gdcc.github.io/dvwebloader/src/js/fileupload2.js:559
    onchange https://gdcc.github.io/dvwebloader/src/js/fileupload2.js:62

Expected behavior:

No error; the files are uploaded.

Reproducing steps:

Use dvwebloader to upload a directory.

Exit after successful file upload

I have been able to upload several ~5 GB files organized in several folders. However, I'm not sure how to proceed after successful file upload:

  1. Maybe some instructions could be added, e.g., "You may now close this web browser tab" (or similar).

  2. When refreshing the dataset landing page, one needs to confirm a browser dialog (screenshot omitted).

But after clicking the Continue button, a new DVWebloader tab is opened.

To see the uploaded file, I actually had to close the tab with the dataset landing page also, and then click into the dataset from the repository/collection landing page once more.

Depending on how this tool/feature is further developed, the comments above are possibly irrelevant.

I guess the idea is that folder/file upload, as now implemented in DVWebloader, would eventually be something a Dataverse installation admin could configure as the default, so that depositors would simply click the usual upload button to upload folders/files this way. Another side is download: will it work along the same lines? Maybe we can discuss further progress over email or a call. Thanks!

No verification of duplicated files

Current behavior:

The check regarding file duplication does not trigger when uploading a duplicate of a file with dvwebloader.

Expected behavior :

When uploading a duplicate of a file that is already in the datasets a splash screen should warn the user.
image

Reproducing steps :

Upload the same folder twice with dvwebloader to the same dataset.
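A pre-upload duplicate check could compare the checksums dvwebloader already computes against those in the dataset's file listing. A minimal sketch, with hypothetical field names:

```javascript
// Flag local files whose checksum already appears in the dataset, so a
// warning can be shown before upload. Field names here are hypothetical.
function findDuplicates(localFiles, datasetChecksums) {
  const existing = new Set(datasetChecksums);
  return localFiles.filter((f) => existing.has(f.checksum));
}
```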

document integrating dv-webloader in 5.13+

README.md documents registering dvwebloader as an external tool for 5.12 installations, but 5.13 and above may integrate it with an "Upload Folder" button via two database settings. PR forthcoming.

updating existing dataset refresh/usability

Nick Lauland of TDL reports:

The page says "Close this window and refresh your dataset page to see the uploaded files", but this is only if you are adding to a draft. If you add to an already published dataset, a new draft is created, but the browser page still shows the published version and we've had some confused users.

Perhaps a version check on the dataset, to determine whether to append &version=DRAFT?
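The suggested check could be as simple as branching on whether the dataset already has a published version when building the link back to the dataset page (a sketch; the dataset.xhtml URL is Dataverse's standard landing-page URL, the function name is hypothetical):

```javascript
// If the dataset already has a published version, the upload just created a
// new draft, so link the user to the draft explicitly.
function datasetPageUrl(siteUrl, persistentId, hasPublishedVersion) {
  const base = siteUrl + "/dataset.xhtml?persistentId=" + persistentId;
  return hasPublishedVersion ? base + "&version=DRAFT" : base;
}
```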

Don't restore 'Start Upload' button after an upload has been done

As of now, the dvwebloader can only be used once without being refreshed (this could be fixed). If someone checks/unchecks some files in the interface after an upload, the 'Start Upload' button is restored and it is possible to try another upload. The files go to S3, but the step to add them to the dataset fails. At a minimum, the logic that enables that button (whose purpose is to toggle it depending on whether any files are checked) should not restore it after an upload has been done.

tell users to renew their API tokens

Just tried dv-webloader on a 5.13 test system. Visually the screen hangs, telling me: Getting Dataset Information...

Javascript console sez 401 Unauthorized

Payara log sez attempted access with expired token

It would be user-friendly to percolate this last error back to the browser.
