gdcc / dvwebloader
A web tool for uploading folders of files to a Dataverse dataset
License: Apache License 2.0
Hi @qqmyers,
if you want I can create any gdcc.io
subdomain for you. Just name it and it shall be done.
If this isn't desirable, simply close this issue.
Apologies in advance - this is probably my error, not sure what I'm doing incorrectly. I'm running the curl command and get this error:
{"status":"ERROR","message":"Invalid token=STRING at (line no=7, column no=9, offset=194). Expected tokens are: [COMMA]"}[ec2-user
This is what I have for the curl command:
curl -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools -d \
'{
"displayName": "Dataverse WebLoader",
"description": "Upload all the files in a local directory!",
"toolName": "dvwebloader",
"scope": "dataset",
"contentType":"text/plain"
"types": [
"explore"
],
"toolUrl": "https://gdcc.github.io/dvwebloader/src/dvwebloader.html",
"toolParameters": {
"queryParameters": [
{
"siteUrl": "https://dataverse.ucla.edu"
},
{
"datasetPid": "doi:10.25346/S6/EZYHS4"
},
{
"key": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
]
}
}'
Thank you, jamie
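The "Expected tokens are: [COMMA]" message points at line 7 of the posted body: there is no comma after the "contentType" entry. A minimal sketch (Node.js) that reproduces the parse failure and confirms that adding the comma fixes it:

```javascript
// The manifest body as posted, with no comma after "contentType" (line 7).
const broken = `{
  "displayName": "Dataverse WebLoader",
  "description": "Upload all the files in a local directory!",
  "toolName": "dvwebloader",
  "scope": "dataset",
  "contentType": "text/plain"
  "types": ["explore"]
}`;

let parseError = null;
try {
  JSON.parse(broken);
} catch (e) {
  parseError = e; // SyntaxError: a comma is required between object members
}

// Inserting the missing comma makes the body valid JSON.
const fixed = broken.replace('"text/plain"\n', '"text/plain",\n');
const manifest = JSON.parse(fixed);

console.log(parseError !== null, manifest.types[0]); // true explore
```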
With dv-webloader, one may drop a folder containing thousands of files and begin uploading them. The user sees:
Then, in the javascript console, things go splat:
Payara's server.log sez
Constraint violation found in FileMetadata. File Name cannot contain any of the following characters: / : * ? " < > | ; # . The invalid value is "filename".
javax.ejb.EJBException: One or more Bean Validation constraints were violated while executing Automatic Bean Validation on callback event: prePersist for class: edu.harvard.iq.dataverse.FileMetadata. Please refer to the embedded constraint violations for details.
at com.sun.ejb.containers.EJBContainerTransactionManager.processSystemException(EJBContainerTransactionManager.java:723)
at com.sun.ejb.containers.EJBContainerTransactionManager.completeNewTx(EJBContainerTransactionManager.java:652)
at com.sun.ejb.containers.EJBContainerTransactionManager.postInvokeTx(EJBContainerTransactionManager.java:482)
at com.sun.ejb.containers.BaseContainer.postInvokeTx(BaseContainer.java:4601)
at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2134)
at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2104)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:220)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:90)
at com.sun.proxy.$Proxy377.submit(Unknown Source)
at edu.harvard.iq.dataverse.__EJB31_Generated__EjbDataverseEngineInner__Intf____Bean__.submit(Unknown Source)
at edu.harvard.iq.dataverse.EjbDataverseEngine.submit(EjbDataverseEngine.java:272)
at jdk.internal.reflect.GeneratedMethodAccessor1337.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:588)
at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:408)
at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4835)
at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:665)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:834)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:615)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
at jdk.internal.reflect.GeneratedMethodAccessor150.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:615)
at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:72)
at org.jboss.weld.module.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
at jdk.internal.reflect.GeneratedMethodAccessor144.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:375)
at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4807)
at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4795)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:90)
at com.sun.proxy.$Proxy405.submit(Unknown Source)
at edu.harvard.iq.dataverse.__EJB31_Generated__EjbDataverseEngine__Intf____Bean__.submit(Unknown Source)
at edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper.addFiles(AddReplaceFileHelper.java:2148)
at edu.harvard.iq.dataverse.api.Datasets.addFilesToDataset(Datasets.java:3467)
Caused by: javax.validation.ConstraintViolationException: One or more Bean Validation constraints were violated while executing Automatic Bean Validation on callback event: prePersist for class: edu.harvard.iq.dataverse.FileMetadata. Please refer to the embedded constraint violations for details.
at org.eclipse.persistence.internal.jpa.metadata.listeners.BeanValidationListener.validateOnCallbackEvent(BeanValidationListener.java:124)
at org.eclipse.persistence.internal.jpa.metadata.listeners.BeanValidationListener.prePersist(BeanValidationListener.java:86)
at org.eclipse.persistence.descriptors.DescriptorEventManager.notifyListener(DescriptorEventManager.java:746)
at org.eclipse.persistence.descriptors.DescriptorEventManager.notifyEJB30Listeners(DescriptorEventManager.java:689)
at org.eclipse.persistence.descriptors.DescriptorEventManager.executeEvent(DescriptorEventManager.java:233)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerNewObjectClone(UnitOfWorkImpl.java:4468)
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.cloneAndRegisterNewObject(RepeatableWriteUnitOfWork.java:613)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalRegisterObject(UnitOfWorkImpl.java:3045)
at org.eclipse.persistence.internal.sessions.MergeManager.registerObjectForMergeCloneIntoWorkingCopy(MergeManager.java:1102)
at org.eclipse.persistence.internal.sessions.MergeManager.mergeChangesOfCloneIntoWorkingCopy(MergeManager.java:575)
at org.eclipse.persistence.internal.sessions.MergeManager.mergeChanges(MergeManager.java:324)
at org.eclipse.persistence.mappings.CollectionMapping.mergeIntoObject(CollectionMapping.java:1650)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.mergeIntoObject(ObjectBuilder.java:4202)
at org.eclipse.persistence.internal.sessions.MergeManager.mergeChangesOfCloneIntoWorkingCopy(MergeManager.java:612)
at org.eclipse.persistence.internal.sessions.MergeManager.mergeChanges(MergeManager.java:324)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.mergeCloneWithReferences(UnitOfWorkImpl.java:3637)
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.mergeCloneWithReferences(RepeatableWriteUnitOfWork.java:389)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.mergeCloneWithReferences(UnitOfWorkImpl.java:3597)
at org.eclipse.persistence.internal.jpa.EntityManagerImpl.mergeInternal(EntityManagerImpl.java:648)
at org.eclipse.persistence.internal.jpa.EntityManagerImpl.merge(EntityManagerImpl.java:625)
at com.sun.enterprise.container.common.impl.EntityManagerWrapper.merge(EntityManagerWrapper.java:307)
at edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand.execute(UpdateDatasetVersionCommand.java:129)
at edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand.execute(UpdateDatasetVersionCommand.java:30)
at edu.harvard.iq.dataverse.EjbDataverseEngineInner.submit(EjbDataverseEngineInner.java:36)
It would be great if dv-webloader could either surface this user-friendly error in the dv-webloader UI, or perhaps pre-emptively check for such characters before attempting to upload files?
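A pre-emptive client-side check is straightforward; here is a sketch, using the character list from the constraint message above (the function name is hypothetical):

```javascript
// Characters Dataverse rejects in file names, per the constraint
// violation message above: / : * ? " < > | ; #
const INVALID_FILENAME_CHARS = /[\/:*?"<>|;#]/;

// Return the names that would fail server-side validation, so the UI
// can warn the user before any bytes are uploaded.
function findInvalidNames(fileNames) {
  return fileNames.filter((name) => INVALID_FILENAME_CHARS.test(name));
}

console.log(findInvalidNames(['data_0.bag', 'bad:name.txt', 'a?b.csv']));
// [ 'bad:name.txt', 'a?b.csv' ]
```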
When using dvwebloader to upload a file large enough to require multipart upload, the upload appears to fail with error 401.
This is coming from dvwebloader, since the same file is uploaded with no difficulty through the normal Dataverse interface.
From the logs it looks like the proper API key is not being passed in that case.
[#|2022-10-06T07:42:43.035+0000|WARNING|Payara 5.2021.1|edu.harvard.iq.dataverse.util.BundleUtil|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(5);_TimeMillis=1665042163035;_LevelValue=900;|
Could not find key "externaltools.dvwebloader.displayname" in bundle file: |#
the File being uploaded
Upload a file that uses multipart upload. Currently on https://test-docker.dataverse.no/ the part size is about 1 GB.
The check that looks for existing files matching something being uploaded fails with an 'equals is not a function' error when the file in the dataset has been ingested. The comparison should just be ===, as equals is not a JavaScript method. This is fixed in #13 and could be back-ported to the main branch if that isn't merged soon.
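For illustration, the shape of the fix: `equals` is a Java idiom, while JavaScript strings are compared with strict equality (the function name here is hypothetical):

```javascript
// Comparing two checksum/metadata strings in JavaScript:
function checksumsMatch(existing, uploaded) {
  // was: existing.equals(uploaded) -> TypeError: equals is not a function
  return existing === uploaded;
}

console.log(checksumsMatch('d41d8cd98f', 'd41d8cd98f')); // true
console.log(checksumsMatch('d41d8cd98f', 'deadbeef00')); // false
```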
Before Jim published the DVWebloader on GitHub, I tried to upload a folder called “b3lyp-631pgs” containing 2 subfolders and a total of 1,076 files, approx. 934 MB. After a while the upload process stopped. I sent the log file to Jim.
Part of his answer was: FWIW: It looks like your server has a low part size, around 5 MB – probably OK, but I think our default is larger, so I wasn’t seeing the issue with files in the ~10 MB range
[Jim: I created this issue because I want to refer to it from our GitHub repo.]
As a developer, I would like to be able to add a new translation of texts in a language other than English.
I tried to start file upload by loading the html file into my Chrome browser using this address:
file:///C:/Users/pco000/Desktop/Files/DVWebloader/dvwebloader.html?siteUrl=https://test-docker.dataverse.no/&datasetPid=doi:10.21337/F3YZSW&key=36be7c2a-8f3b-429a-a9bc-b0abeecc809f
As far as I can see, the address has the same structure as the one I used when testing the previous version of DVWebloader, before Jim put it on GitHub.
When I press Enter, I see "Select a Directory" in the upper left corner of the screen (it looks somewhat different from last time, not like a button anymore, just text).
In Chrome Inspect > Console, I see the following:
dvwebloader.css:1 Failed to load resource: net::ERR_FILE_NOT_FOUND
fileupload2.js:1 Failed to load resource: net::ERR_FILE_NOT_FOUND
dvwebloader.html?siteUrl=https://test-docker.dataverse.no/&datasetPid=doi:10.21337/F3YZSW&key=36be7c2a-8f3b-429a-a9bc-b0abeecc809f:18 Uncaught ReferenceError: queueFileForDirectUpload is not defined
at input.onchange (dvwebloader.html?siteUrl=https://test-docker.dataverse.no/&datasetPid=doi:10.21337/F3YZSW&key=36be7c2a-8f3b-429a-a9bc-b0abeecc809f:18:7)
error: {"status":"ERROR","message":"Unknown reserved word: https://dataverse.ucla.edu"}
I'm trying to compare this to the examples in https://guides.dataverse.org/en/latest/api/external-tools.html. The reserved words look correct and I get a similar error if I change the order - pid or key first.
curl -vv -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools -d \
'{
"displayName": "Dataverse WebLoader",
"description": "Upload all the files in a local directory!",
"toolName": "dvwebloader",
"scope": "dataset",
"contentType":"text/plain",
"types": [
"explore"
],
"toolUrl": "https://gdcc.github.io/dvwebloader/src/dvwebloader.html",
"toolParameters": {
"queryParameters": [
{
"siteUrl": "https://dataverse.ucla.edu"
},
{
"datasetPid": "doi:10.25346/S6/EZYHS4"
},
{
"key": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
]
}
}'
POST /api/admin/externalTools HTTP/1.1
User-Agent: curl/7.29.0
Host: localhost:8080
Accept: */*
Content-type: application/json
Content-Length: 548
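The "Unknown reserved word" message suggests that the queryParameters values are expected to be placeholder tokens Dataverse substitutes when the tool is launched, not literal site-specific values. A sketch of the corrected block, assuming the reserved words listed in the external-tools guide ({siteUrl}, {datasetPid}, {apiToken}):

```javascript
// External-tool manifests map each query parameter name to a reserved-word
// placeholder; Dataverse fills in the real values per dataset and user.
const toolParameters = {
  queryParameters: [
    { siteUrl: '{siteUrl}' },
    { datasetPid: '{datasetPid}' },
    { key: '{apiToken}' },
  ],
};

console.log(JSON.stringify(toolParameters.queryParameters[0]));
// {"siteUrl":"{siteUrl}"}
```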
Earlier today, I tried to upload some 5 GB files from Linux/Ubuntu/Chromium through Wi-fi, and got the following error messages:
Successful upload of part 811 of 820
fileupload2.js:292 Successful upload of part 820 of 820
fileupload2.js:292 Successful upload of part 812 of 820
...
fileupload2.js:292 Successful upload of part 817 of 820
fileupload2.js:292 Successful upload of part 818 of 820
fileupload2.js:292 Successful upload of part 819 of 820
fileupload2.js:340 reporting file data_0.bag
dvwebloader.html:1 Access to XMLHttpRequest at 'https://test-docker.dataverse.no/api/datasets/mpupload?globalid=doi:10.21337/BGMM6J&uploadid=NTQ2OTAxMDIxNjY1NjM5NjQ1NTM1&storageidentifier=S3://2002-green-dataversenotest1:183cfdc010b-df234abd0e00' from origin 'https://gdcc.github.io' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.
fileupload2.js:420 Failure: 0
fileupload2.js:421 Failure:
test-docker.dataverse.no/api/datasets/mpupload?globalid=doi:10.21337/BGMM6J&uploadid=NTQ2OTAxMDIxNjY1NjM5NjQ1NTM1&storageidentifier=S3://2002-green-dataversenotest1:183cfdc010b-df234abd0e00:1 Failed to load resource: net::ERR_FAILED
fileupload2.js:375 md5 done
fileupload2.js:429 handling
fileupload2.js:433 handling2
fileupload2.js:551 0 : 4 : 5 : 0
See IQSS/dataverse#9431 - Dataverse supports algorithms other than MD5 (SHA-1, SHA-256, and SHA-512) for checksums. There hasn't been a way for external apps to discover which algorithm is supported but the resolution of #9431 will add a new API call for that which can then be used by dvwebloader.
PR to follow. (fixed at QDR)
Hello @qqmyers and everybody, I was wondering about the security of passing the API key (&key=xxx) as a query string in the URL.
Understand that I just want to open a dialogue on this topic.
An API key is required to use the direct DataFile upload/replace APIs, but the security risk seems important to me; a non-IT user may share this URL, or may leave it in the browser history on a shared computer, and thereby give away their level of access to Dataverse.
Security is important, and this issue has been addressed for Dataverse external tools with the Signed URLs option. I don't know if it's possible to use it right now, but it might be an idea to work on this (maybe extend the Dataverse signed-URL scope beyond external tools only, if needed).
Here is a non-exhaustive list of benefits to consider:
What do you think?
Best regards
Directory upload is apparently not working at the moment; when uploading a directory we get the following console output and nothing gets uploaded:
before folder/Screenshot_2022-04-08_10-35-37.png fileupload2.js:61:21
Screenshot_2022-04-08_10-35-37.png fileupload2.js:558:13
Uncaught TypeError: right-hand side of 'in' should be an object, got undefined
queueFileForDirectUpload https://gdcc.github.io/dvwebloader/src/js/fileupload2.js:559
onchange https://gdcc.github.io/dvwebloader/src/js/fileupload2.js:62
No error/files uploaded
Use Dvwebloader to upload a directory
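The TypeError above is what `path in obj` produces when obj is undefined. A sketch, with hypothetical names, of the kind of guard that would avoid the crash:

```javascript
// `key in obj` throws a TypeError when obj is undefined, which matches
// the console output above. Guarding the lookup keeps the queueing loop alive.
function isAlreadyQueued(uploadedFileMap, relativePath) {
  return uploadedFileMap !== undefined && relativePath in uploadedFileMap;
}

console.log(isAlreadyQueued(undefined, 'folder/Screenshot.png')); // false
console.log(isAlreadyQueued({ 'folder/Screenshot.png': true }, 'folder/Screenshot.png')); // true
```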
I have been able to upload several ~5 GB files organized in several folders. However, I'm not sure how to proceed after successful file upload:
Maybe some instructions could be added, e.g., "You may now close this web browser tab" (or similar).
When refreshing the dataset landing page one needs to confirm the following message:
But after clicking the Continue button, a new DVWebloader tab is opened.
To see the uploaded file, I actually had to close the tab with the dataset landing page also, and then click into the dataset from the repository/collection landing page once more.
Depending on how this tool/feature is further developed, the comments above are possibly irrelevant.
I guess the idea is that folder/file upload as now implemented in DVWebloader would be integrated so that a Dataverse installation admin could configure this kind of upload as the default, and depositors would simply click the usual upload button to upload folders/files this way. Another aspect is download: will it work along the same lines? Maybe we can discuss further progress over email or a call. Thanks!
The check regarding file duplication does not trigger when uploading a duplicate of a file with dvwebloader.
When uploading a duplicate of a file that is already in the dataset, a splash screen should warn the user.
Upload the same folder twice with dvwebloader to the same dataset.
README.md documents externaltool-ness for 5.12 installations, but 5.13 and above may integrate with an "Upload Folder" button via two database settings. PR forthcoming.
Nick Lauland of TDL reports:
The page says "Close this window and refresh your dataset page to see the uploaded files", but this only holds if you are adding to a draft. If you add to an already published dataset, a new draft is created, but the browser page still shows the published version, and we've had some confused users.
Perhaps a version check on the dataset, to determine whether to append &version=DRAFT?
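A sketch of the suggested check (the helper and flag names are hypothetical; dataset.xhtml?persistentId= is Dataverse's standard dataset page URL):

```javascript
// After an upload, link to the draft version when the dataset already has
// a published release, so users don't land on the unchanged published page.
function datasetPageUrl(siteUrl, datasetPid, hasPublishedVersion) {
  let url = `${siteUrl}/dataset.xhtml?persistentId=${datasetPid}`;
  if (hasPublishedVersion) {
    url += '&version=DRAFT'; // the new files live in a freshly created draft
  }
  return url;
}

console.log(datasetPageUrl('https://demo.dataverse.org', 'doi:10.70122/FK2/EXAMPLE', true));
// https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/EXAMPLE&version=DRAFT
```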
As of now, the dvwebloader can only be used once without being refreshed (this could be fixed). If someone checks/unchecks some files in the interface, the 'Start Upload' button is restored and it is possible to try another upload. While the files go to S3, the step to add them to the dataset fails. At a minimum, the logic that turns that button on/off depending on whether some files are checked should not restore it after an upload has been done.
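The minimal change suggested here can be sketched as a one-way flag (all names hypothetical):

```javascript
// Once an upload has completed, the checkbox handler must not re-enable
// the Start Upload button; a completed flag makes the disable one-way.
let uploadCompleted = false;

function updateStartButton(checkedCount, button) {
  button.disabled = uploadCompleted || checkedCount === 0;
}

const button = { disabled: true };
updateStartButton(3, button);  // files checked, nothing uploaded yet
console.log(button.disabled);  // false
uploadCompleted = true;
updateStartButton(3, button);  // re-checking files after an upload
console.log(button.disabled);  // true
```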
Just tried dv-webloader on a 5.13 test system. Visually the screen hangs, telling me: Getting Dataset Information...
Javascript console sez 401 Unauthorized
Payara log sez attempted access with expired token
It would be user-friendly to percolate this last error back to the browser.
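Surfacing the failure is mostly a matter of checking the response status before parsing; a sketch with hypothetical names (the /api/datasets/:persistentId endpoint and X-Dataverse-key header are the standard Dataverse API):

```javascript
// Map an HTTP failure to a message the UI can show instead of hanging
// on "Getting Dataset Information...".
function describeFailure(status) {
  if (status === 401) {
    return 'Unauthorized: your API token may be invalid or expired.';
  }
  return `Request failed with HTTP ${status}.`;
}

// Usage sketch: check resp.ok before proceeding with the upload flow.
async function fetchDatasetInfo(siteUrl, pid, apiKey) {
  const resp = await fetch(
    `${siteUrl}/api/datasets/:persistentId?persistentId=${pid}`,
    { headers: { 'X-Dataverse-key': apiKey } }
  );
  if (!resp.ok) throw new Error(describeFailure(resp.status));
  return resp.json();
}

console.log(describeFailure(401));
```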
I'm reviewing the code on the Dataverse side at IQSS/dataverse#9096 and plan to use this issue to list bugs and typos I find: