
Airavata Managed File Transfers (MFT)

Apache Airavata MFT is a high-performance, multi-protocol data transfer engine that orchestrates data movement and operations across most cloud and on-premises storages. MFT aims to abstract the complexity of heterogeneous storage by providing a unified, simple interface for users to seamlessly access and move data across any storage endpoint. To accomplish this goal, MFT provides simple yet high-performing tools that access most cloud and on-premises storages as seamlessly as local files on a workstation.

Apache Airavata MFT bundles easily deployable agents that automatically determine the optimum network path and open additional multi-channel, parallel data paths to maximize throughput between storage endpoints. MFT uses parallel agents to transfer data between endpoints, taking advantage of multiple network links.

Try Airavata MFT

Airavata MFT requires Java 11+ and Python 3.10+. MFT currently supports Linux and macOS operating systems. Contributions to support Windows are welcome!

Download and Install

The following commands will download Airavata MFT onto your machine and start the MFT service.

pip3 install airavata-mft-cli
mft init

If the installer fails on M1 and M2 Macs complaining about the grpcio installation, follow the solution mentioned here. You might have to uninstall already installed grpcio and grpcio-tools distributions first. For other common installation issues, please refer to the troubleshooting section.

To stop MFT after use:

mft stop

Registering Storages

First, you need to register your storage endpoints with MFT in order to access them. Registering a storage is an interactive process, and you can easily register one without prior knowledge:

mft storage add

This will prompt for the type of storage and the credentials needed to access it. To list already added storages, run:

mft storage list

Accessing Data in Storages

In Airavata MFT, we provide a unified interface to access the data in any storage. Users access data in storages just as they access data on their own computers. MFT internally converts user queries into storage-specific data representations (POSIX, block, objects, ...):

mft ls <storage name>
mft ls <storage name>/<resource path>

Moving Data between Storages

Copying data between storages is as simple as copying data between directories on a local machine. MFT takes care of network path optimization, parallel data path selection, and the selection or creation of suitable transfer agents.

mft cp <source storage name>/<path> <destination storage name>/<path> 

MFT auto-detects whether a directory or a single file is being copied based on the given path.

Troubleshooting and Issue Reporting

This is our very first attempt to release Airavata MFT for community usage, and there might be corner cases we have not noticed. All logs of the MFT service are available in ~/.mft/Standalone-Service-0.01/logs/airavata.log. If you see any error while using MFT, please report it on our GitHub issue page and we will respond as soon as possible. We really appreciate your contributions, as they greatly help to improve the stability of the product.

Common issues

  • The following error can occur if your Python version is older than 3.10:
  ERROR: Could not find a version that satisfies the requirement airavata-mft-cli (from versions: none)
  ERROR: No matching distribution found for airavata-mft-cli

If the error still occurs after installing the right Python version, try creating a virtual environment:

python3.10 -m venv venv
source venv/bin/activate
pip install airavata-mft-cli


airavata-mft's Issues

Encrypt Agent messages in consul

Currently, the messages published to an agent through the controller are unencrypted: https://github.com/apache/airavata-mft/blob/master/controller/src/main/java/org/apache/airavata/mft/controller/TransferDispatcher.java#L73. We need to encrypt the agentTransferRequest object so that only the target agent can decrypt it. Typically, the agent connects to Consul through an SSH tunnel. https://github.com/apache/airavata-mft/blob/master/controller/src/main/java/org/apache/airavata/mft/controller/spawner/SSHProvider.java#L123

One option is to encrypt it with the public key of the Agent that is used to create the SSH tunnel. Another option is to share a symmetric key between the agent and the controller when the initial connection is created and use that key to encrypt/decrypt messages.
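
For illustration, here is a minimal Java sketch of the symmetric-key option using AES/GCM. The class name and the way the key is obtained are assumptions for the sketch, not MFT's actual API:

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.security.SecureRandom;

public class AgentMessageCrypto {

    private static final int GCM_TAG_BITS = 128;
    private static final int IV_BYTES = 12;

    // Controller side: seal a serialized agentTransferRequest with the key
    // shared when the SSH tunnel was established (hypothetical key source).
    public static byte[] encrypt(SecretKey key, byte[] plain) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        byte[] sealed = cipher.doFinal(plain);
        // Prepend the IV so the agent can initialize its own cipher.
        return ByteBuffer.allocate(IV_BYTES + sealed.length).put(iv).put(sealed).array();
    }

    // Agent side: strip the IV and decrypt with the same shared key.
    public static byte[] decrypt(SecretKey key, byte[] message) throws Exception {
        ByteBuffer buf = ByteBuffer.wrap(message);
        byte[] iv = new byte[IV_BYTES];
        buf.get(iv);
        byte[] sealed = new byte[buf.remaining()];
        buf.get(sealed);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        return cipher.doFinal(sealed);
    }
}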

mft cp command with OVH S3 buckets

Describe the bug
I am trying to copy a file from one S3 storage to another. Both buckets are hosted on the OVH provider.
mft ls storage-name works and lists the objects present in the source and destination buckets.
The mft cp storage-name1/file.txt storage-name2/ command creates a file on the destination bucket, but its size is zero.
mft cp command output:

user@mft-master-01:~$ mft cp blaze-test-s3-gra/hello.txt blaze-test-02-s3-gra/
Total number of 1 files to be transferred. Total volume is 6 bytes. Do you want to start the transfer?  [Y/n]: y
  [------------------------------------]    0%
local variable 'prev_percentage' referenced before assignment

There are some errors in the airavata.log file:

2023-08-24 12:44:08,960 [pool-9-thread-20] WARN  com.amazonaws.services.s3.internal.S3AbortableInputStream {} - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2023-08-24 12:44:08,989 [pool-9-thread-20] WARN  com.amazonaws.services.s3.internal.S3AbortableInputStream {} - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2023-08-24 12:44:08,989 [pool-9-thread-20] WARN  com.amazonaws.services.s3.internal.S3AbortableInputStream {} - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2023-08-24 12:44:08,989 [pool-7-thread-8] ERROR org.apache.airavata.mft.transport.s3.S3OutgoingConnector {} - S3 failed to upload chunk to bucket blaze-test-02-s3-gra for resource path hello.txt
2023-08-24 12:44:08,990 [pool-7-thread-8] ERROR org.apache.airavata.mft.agent.TransportMediator {} - Transfer 9abb6f6c-7a06-4ed9-89cf-81bee0e7682b failed with error
java.lang.NullPointerException: null
	at org.apache.airavata.mft.transport.s3.S3OutgoingConnector.complete(S3OutgoingConnector.java:134) ~[?:?]
	at org.apache.airavata.mft.agent.TransportMediator.transferSingleThread(TransportMediator.java:149) ~[mft-agent-service-0.01-SNAPSHOT.jar:0.01-SNAPSHOT]
	at org.apache.airavata.mft.agent.TransferOrchestrator.processTransfer(TransferOrchestrator.java:187) ~[mft-agent-service-0.01-SNAPSHOT.jar:0.01-SNAPSHOT]
	at org.apache.airavata.mft.agent.TransferOrchestrator.lambda$submitTransferToProcess$0(TransferOrchestrator.java:101) ~[mft-agent-service-0.01-SNAPSHOT.jar:0.01-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
2023-08-24 12:44:08,995 [OkHttp http://localhost:8500/...] INFO  org.apache.airavata.mft.controller.MFTController {} - Received state Key 9abb6f6c-7a06-4ed9-89cf-81bee0e7682b/local-agent/da256772-ae53-4cfa-a2b0-a71550defc19/0b8c6f040a1cdf526b05508a3bee7a10/1692881048992 val {"state":"FAILED","publisher":"local-agent","updateTimeMils":1692881048992,"percentage":0.0,"description":"Transfer failed due to java.lang.NullPointerException\n\tat org.apache.airavata.mft.transport.s3.S3OutgoingConnector.complete(S3OutgoingConnector.java:134)\n\tat org.apache.airavata.mft.agent.TransportMediator.transferSingleThread(TransportMediator.java:149)\n\tat org.apache.airavata.mft.agent.TransferOrchestrator.processTransfer(TransferOrchestrator.java:187)\n\tat org.apache.airavata.mft.agent.TransferOrchestrator.lambda$submitTransferToProcess$0(TransferOrchestrator.java:101)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n","childId":null}
2023-08-24 12:44:08,996 [OkHttp http://localhost:8500/...] INFO  org.apache.airavata.mft.admin.MFTConsulClient {} - Saved transfer status {"state":"FAILED","publisher":"local-agent","updateTimeMils":1692881048992,"percentage":0.0,"description":"Transfer failed due to java.lang.NullPointerException\n\tat org.apache.airavata.mft.transport.s3.S3OutgoingConnector.complete(S3OutgoingConnector.java:134)\n\tat org.apache.airavata.mft.agent.TransportMediator.transferSingleThread(TransportMediator.java:149)\n\tat org.apache.airavata.mft.agent.TransferOrchestrator.processTransfer(TransferOrchestrator.java:187)\n\tat org.apache.airavata.mft.agent.TransferOrchestrator.lambda$submitTransferToProcess$0(TransferOrchestrator.java:101)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n","childId":"0b8c6f040a1cdf526b05508a3bee7a10"}
2023-08-24 12:44:08,996 [OkHttp http://localhost:8500/...] INFO  org.apache.airavata.mft.controller.MFTController {} - Deleting key mft/controller/messages/states/9abb6f6c-7a06-4ed9-89cf-81bee0e7682b/local-agent/da256772-ae53-4cfa-a2b0-a71550defc19/0b8c6f040a1cdf526b05508a3bee7a10/1692881048992
2023-08-24 12:44:08,996 [pool-7-thread-8] INFO  org.apache.airavata.mft.agent.TransferOrchestrator {} - Removed transfer 9abb6f6c-7a06-4ed9-89cf-81bee0e7682b from queue with transfer success = false. Total running 13

To Reproduce
Create 2 storages of type S3 (OVH provider endpoint) with mft storage add.
Run command:
mft cp storage-name1/file.txt storage-name2/

Expected behavior
The file should be present on the destination storage with the same size of the source file.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

Add capability for S3 bucket users to choose between Path-Style Access and Virtual-hosted–style access

Is your feature request related to a problem? Please describe.
Path-style access is a way of specifying the location of a bucket or object in Amazon S3 using a URL-like syntax. This is in contrast to virtual-hosted style access, which uses the bucket name as a subdomain in the URL. For example:

Virtual-hosted style: https://bucket-name.s3.amazonaws.com/object-key
Path-style: https://s3.amazonaws.com/bucket-name/object-key

In virtual-hosted style access, the bucket name becomes part of the hostname, so it must be DNS-compatible; in particular, bucket names containing dots break TLS wildcard certificate validation. This can be a problem if you need to access a bucket that has dots in its name. In addition, some regions or S3-compatible endpoints may only support path-style access.

Describe the solution you'd like

  1. Add enablePathStyleAccess property to S3StorageCreateRequest proto file
  2. Make the property enablePathStyleAccess default to false. A client that wants to use path style access will set the value to true
  3. In the S3Util.java class, call withPathStyleAccessEnabled(boolean) on the AmazonS3ClientBuilder object (see the sketch below).

Additional context
Some buckets are not valid DNS names. Setting this flag to true will result in path-style access being used for all requests.
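
A minimal sketch of what step 3 could look like with the AWS SDK for Java v1. The factory class and the way credentials and the endpoint are passed in are illustrative, not the actual structure of S3Util.java:

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3ClientFactory {

    // enablePathStyleAccess would come from the registered storage definition.
    public static AmazonS3 buildClient(String accessKey, String secretKey,
                                       String endpoint, String region,
                                       boolean enablePathStyleAccess) {
        return AmazonS3ClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials(accessKey, secretKey)))
                .withEndpointConfiguration(
                        new AwsClientBuilder.EndpointConfiguration(endpoint, region))
                // true forces URLs like https://endpoint/bucket/key instead of
                // https://bucket.endpoint/key
                .withPathStyleAccessEnabled(enablePathStyleAccess)
                .build();
    }
}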

Vault backend support to Secret Service

The Secret service contains a pluggable security backend framework; currently, only a database-backed implementation exists [1]. However, this is not recommended for production usage because credentials are saved in plain text. Vault [2] is a very popular credential storage tool, and it would be better to have a backend implementation that stores credentials in Vault.

[1] https://github.com/apache/airavata-mft/tree/master/services/secret-service/server/src/main/java/org/apache/airavata/mft/secret/server/backend
[2] https://www.vaultproject.io/
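
For illustration, a minimal sketch of what a Vault-backed implementation could look like, using the community vault-java-driver. The secret path layout and class name are assumptions, not the actual backend interface:

import com.bettercloud.vault.Vault;
import com.bettercloud.vault.VaultConfig;
import com.bettercloud.vault.VaultException;

import java.util.Map;

public class VaultSecretBackend {

    private final Vault vault;

    public VaultSecretBackend(String address, String token) throws VaultException {
        this.vault = new Vault(new VaultConfig().address(address).token(token).build());
    }

    // Store credential fields under a per-secret path instead of database rows.
    public void writeSecret(String secretId, Map<String, Object> fields) throws VaultException {
        vault.logical().write("secret/mft/" + secretId, fields);
    }

    public Map<String, String> readSecret(String secretId) throws VaultException {
        return vault.logical().read("secret/mft/" + secretId).getData();
    }
}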

Storage List Command Fails

M1 MacBook Pro with macOS 13.2.1

$ mkdir AIR
$ cd AIR
$ python3.10 -m venv venv
$ source venv/bin/activate
(venv) $ pip3 install airavata-mft-cli
Collecting airavata-mft-cli
  Using cached airavata_mft_cli-0.1.9-py3-none-any.whl (21 kB)
Collecting typer[all]<0.8.0,>=0.7.0
  Using cached typer-0.7.0-py3-none-any.whl (38 kB)
Collecting pick==2.2.0
  Using cached pick-2.2.0-py3-none-any.whl (4.9 kB)
Collecting airavata_mft_sdk==0.0.1-alpha27
  Using cached airavata_mft_sdk-0.0.1a27-py3-none-any.whl (96 kB)
Collecting grpcio==1.47.0rc1
  Using cached grpcio-1.47.0rc1-cp310-cp310-macosx_13_0_arm64.whl
Collecting grpcio-tools==1.47.0rc1
  Using cached grpcio_tools-1.47.0rc1-cp310-cp310-macosx_13_0_arm64.whl
Collecting google-api-python-client>=2.0.0
  Downloading google_api_python_client-2.83.0-py2.py3-none-any.whl (11.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.2/11.2 MB 3.4 MB/s eta 0:00:00
Collecting six>=1.5.2
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: setuptools in ./venv/lib/python3.10/site-packages (from grpcio-tools==1.47.0rc1->airavata-mft-cli) (67.2.0)
Collecting protobuf<4.0dev,>=3.12.0
  Using cached protobuf-3.20.3-py2.py3-none-any.whl (162 kB)
Collecting click<9.0.0,>=7.1.1
  Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting shellingham<2.0.0,>=1.3.0
  Using cached shellingham-1.5.0.post1-py2.py3-none-any.whl (9.4 kB)
Collecting colorama<0.5.0,>=0.4.3
  Using cached colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Collecting rich<13.0.0,>=10.11.0
  Using cached rich-12.6.0-py3-none-any.whl (237 kB)
Collecting uritemplate<5,>=3.0.1
  Using cached uritemplate-4.1.1-py2.py3-none-any.whl (10 kB)
Collecting google-auth<3.0.0dev,>=1.19.0
  Downloading google_auth-2.17.0-py2.py3-none-any.whl (178 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.1/178.1 kB 13.5 MB/s eta 0:00:00
Collecting google-auth-httplib2>=0.1.0
  Using cached google_auth_httplib2-0.1.0-py2.py3-none-any.whl (9.3 kB)
Collecting google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5
  Using cached google_api_core-2.11.0-py3-none-any.whl (120 kB)
Collecting httplib2<1dev,>=0.15.0
  Using cached httplib2-0.22.0-py3-none-any.whl (96 kB)
Collecting pygments<3.0.0,>=2.6.0
  Using cached Pygments-2.14.0-py3-none-any.whl (1.1 MB)
Collecting commonmark<0.10.0,>=0.9.0
  Using cached commonmark-0.9.1-py2.py3-none-any.whl (51 kB)
Collecting requests<3.0.0dev,>=2.18.0
  Using cached requests-2.28.2-py3-none-any.whl (62 kB)
Collecting googleapis-common-protos<2.0dev,>=1.56.2
  Using cached googleapis_common_protos-1.59.0-py2.py3-none-any.whl (223 kB)
Collecting cachetools<6.0,>=2.0.0
  Using cached cachetools-5.3.0-py3-none-any.whl (9.3 kB)
Collecting rsa<5,>=3.1.4
  Using cached rsa-4.9-py3-none-any.whl (34 kB)
Collecting pyasn1-modules>=0.2.1
  Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting pyparsing!=3.0.0,!=3.0.1,!=3.0.2,!=3.0.3,<4,>=2.4.2
  Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
Collecting pyasn1<0.5.0,>=0.4.6
  Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Collecting idna<4,>=2.5
  Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2022.12.7-py3-none-any.whl (155 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
Collecting charset-normalizer<4,>=2
  Using cached charset_normalizer-3.1.0-cp310-cp310-macosx_11_0_arm64.whl (123 kB)
Installing collected packages: pyasn1, commonmark, urllib3, uritemplate, six, shellingham, rsa, pyparsing, pygments, pyasn1-modules, protobuf, pick, idna, colorama, click, charset-normalizer, certifi, cachetools, typer, rich, requests, httplib2, grpcio, googleapis-common-protos, google-auth, grpcio-tools, google-auth-httplib2, google-api-core, google-api-python-client, airavata_mft_sdk, airavata-mft-cli
Successfully installed airavata-mft-cli-0.1.9 airavata_mft_sdk-0.0.1a27 cachetools-5.3.0 certifi-2022.12.7 charset-normalizer-3.1.0 click-8.1.3 colorama-0.4.6 commonmark-0.9.1 google-api-core-2.11.0 google-api-python-client-2.83.0 google-auth-2.17.0 google-auth-httplib2-0.1.0 googleapis-common-protos-1.59.0 grpcio-1.47.0rc1 grpcio-tools-1.47.0rc1 httplib2-0.22.0 idna-3.4 pick-2.2.0 protobuf-3.20.3 pyasn1-0.4.8 pyasn1-modules-0.2.8 pygments-2.14.0 pyparsing-3.0.9 requests-2.28.2 rich-12.6.0 rsa-4.9 shellingham-1.5.0.post1 six-1.16.0 typer-0.7.0 uritemplate-4.1.1 urllib3-1.26.15

(venv) $ mft init
Setting up MFT Services
Consul process id: 74795
Standalone Service stoping ...
./standalone-service-daemon.sh: line 55: kill: (62156) - No such process
Standalone Service stopped ...
Starting Standalone Service ...
Standalone Service started ...
MFT Started
(venv) $ mft storage list
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/myuser/venv/lib/python3.10/site-packages/airavata_mft_cli/storage/__init__.py:6 │
│ 6 in list_storage                                                                                │
│                                                                                                  │
│   63 │   │   │   │   │   │   │   │     secret_service_host = configcli.secret_service_host,      │
│   64 │   │   │   │   │   │   │   │     secret_service_port = configcli.secret_service_port)      │
│   65 │   list_req = StorageCommon_pb2.StorageListRequest()                                       │
│ ❱ 66 │   list_response = client.common_api.listStorages(list_req)                                │
│   67 │                                                                                           │
│   68 │   console = Console()                                                                     │
│   69 │   table = Table(show_header=True, header_style='bold #2070b2')                            │
│                                                                                                  │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮                     │
│ │   client = <airavata_mft_sdk.mft_client.MFTClient object at 0x10796d8a0> │                     │
│ │ list_req = <class 'rich.pretty.Node'>.__repr__ returned empty string     │                     │
│ ╰──────────────────────────────────────────────────────────────────────────╯                     │
│                                                                                                  │
│ /Users/myuser/venv/lib/python3.10/site-packages/grpc/_channel.py:946 in __call__       │
│                                                                                                  │
│    943 │   │   │   │    compression=None):                                                       │
│    944 │   │   state, call, = self._blocking(request, timeout, metadata, credentials,            │
│    945 │   │   │   │   │   │   │   │   │     wait_for_ready, compression)                        │
│ ❱  946 │   │   return _end_unary_response_blocking(state, call, False, None)                     │
│    947 │                                                                                         │
│    948 │   def with_call(self,                                                                   │
│    949 │   │   │   │     request,                                                                │
│                                                                                                  │
│ ╭──────────────────────────────────── locals ─────────────────────────────────────╮              │
│ │           call = <grpc._cython.cygrpc.SegregatedCall object at 0x1076085c0>     │              │
│ │    compression = None                                                           │              │
│ │    credentials = None                                                           │              │
│ │       metadata = None                                                           │              │
│ │        request = <class 'rich.pretty.Node'>.__repr__ returned empty string      │              │
│ │           self = <grpc._channel._UnaryUnaryMultiCallable object at 0x107adde70> │              │
│ │          state = <grpc._channel._RPCState object at 0x107adf4f0>                │              │
│ │        timeout = None                                                           │              │
│ │ wait_for_ready = None                                                           │              │
│ ╰─────────────────────────────────────────────────────────────────────────────────╯              │
│                                                                                                  │
│ /Users/myuser/venv/lib/python3.10/site-packages/grpc/_channel.py:849 in                │
│ _end_unary_response_blocking                                                                     │
│                                                                                                  │
│    846 │   │   else:                                                                             │
│    847 │   │   │   return state.response                                                         │
│    848 │   else:                                                                                 │
│ ❱  849 │   │   raise _InactiveRpcError(state)                                                    │
│    850                                                                                           │
│    851                                                                                           │
│    852 def _stream_unary_invocation_operationses(metadata, initial_metadata_flags):              │
│                                                                                                  │
│ ╭──────────────────────────────── locals ────────────────────────────────╮                       │
│ │      call = <grpc._cython.cygrpc.SegregatedCall object at 0x1076085c0> │                       │
│ │  deadline = None                                                       │                       │
│ │     state = <grpc._channel._RPCState object at 0x107adf4f0>            │                       │
│ │ with_call = False                                                      │                       │
│ ╰────────────────────────────────────────────────────────────────────────╯                       │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses"
        debug_error_string = "{"created":"@1680205081.363435000","description":"Failed to pick
subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3261,"referenced_errors":
[{"created":"@1680205081.363434000","description":"failed to connect to all
addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
>
(venv) $

Performance issue in parallel file metadata call invokes

When concurrent connections are made to the file metadata fetch API, the response time grows exponentially:

for (int i = 0; i < 20; i++) {
    Thread t = new Thread(new Runnable() {
        @Override
        public void run() {
            long tstartTime = System.currentTimeMillis();
            FileMetadataResponse fileResourceMetadata = client.getFileResourceMetadata(
                    FetchResourceMetadataRequest.newBuilder()
                            .setResourceId("remote-ssh-resource2")
                            .setResourceType("SCP")
                            .setResourceToken("local-ssh-cred")
                            .setTargetAgentId("agent0")
                            .build());
            long tendTime = System.currentTimeMillis();
            System.out.println("Thread " + Thread.currentThread().getName() + " Time " + (tendTime - tstartTime));
        }
    }, i + "");
    t.start();
}

MFT API enhancements for airavata-django-portal integration

Is your feature request related to a problem? Please describe.

The MFT API is missing some features required to implement the data management capabilities of airavata-django-portal.

  • delete file/directory
  • create directory/ies
  • update a file - this allows users to edit files in the gateway

Possibly also (I consider these "nice to have" but aren't necessary):

  • move file/directory
    • this can be implemented by reading and writing a new file in the new location and then deleting the old file in the old location, but a move API could be more efficient
    • probably only need this for moving files/directories within the same resource
  • copy file/directory
    • similar comments to move, this can be implemented by reading and writing a copy of a file, but a copy API could be more efficient
    • for this we might want the ability to copy across resources. The use case is a user cloning another user's experiment and the inputs files are copied. If the source input files are on one resource but the experiment data directory is created on a different resource then copying across resources is needed.

For reference, this is the minimal interface I want to implement using MFT APIs: https://github.com/apache/airavata-django-portal-sdk/blob/mft-integration/airavata_django_portal_sdk/user_storage/backends/base.py

The MFT APIs support implementing all of these methods except for the following:

    def delete(self, resource_path):
        raise NotImplementedError()

    def update(self, resource_path, file):
        raise NotImplementedError()

    def create_dirs(self, resource_path, dir_names=[], create_unique=False):
        """
        Create one or more named subdirectories inside the resource_path.
        resource_path must exist. dir_names will potentially be normalized as
        needed. The intermediate directories may already exist, but if the
        final directory already exists, this method will raise an Exception,
        unless create_unique is True in which the name will be modified until
        a unique directory name is found.
        """
        raise NotImplementedError()

Requesting a workflow feature

Requesting a GitHub workflow feature that builds and releases on a schedule or after merging to the main branch.

Right now it is being done manually.

pip install fails

MacBook-Pro-405:Applications spamidig$ python3.10 -m venv venv
MacBook-Pro-405:Applications spamidig$ source venv/bin/activate
(venv) MacBook-Pro-405:Applications spamidig$ pip install airavata-mft-client
ERROR: Could not find a version that satisfies the requirement airavata-mft-client (from versions: none)
ERROR: No matching distribution found for airavata-mft-client

Capturing Transport Configurations

Right now, MFT does not have a way to take in configurations per transport type (S3, GCS, Azure, local, etc.). For example, MFT's local transport behaves differently depending on whether the local machine supports Direct Memory Access (DMA).

Proposed solution
Create a mechanism to feed transport configurations into MFT using configuration files, so MFT can perform tasks optimized for those configurations.
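
As a sketch of what such a mechanism could look like, the following loads a hypothetical per-transport properties file. The file names and property keys are invented for illustration:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class TransportConfigLoader {

    // Hypothetical layout: one file per transport, e.g. conf/transport-local.properties
    public static Properties load(String transportType) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(
                "conf/transport-" + transportType.toLowerCase() + ".properties")) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // e.g. local.transport.dma.enabled=true in conf/transport-local.properties
        Properties local = load("local");
        boolean dmaEnabled = Boolean.parseBoolean(
                local.getProperty("local.transport.dma.enabled", "false"));
        System.out.println("DMA enabled: " + dmaEnabled);
    }
}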

gRPC installation in macs with arm64e (M1/M2)

This error is only specific to macbooks with apple silicon (M1/M2...)

Description

After installing with "pip install airavata-mft-cli", mft cannot be used because of conflicts with gRPC libraries that are incompatible with the arm64e architecture.

Steps to Reproduce

Create a python virtual environment by "python3.10 -m venv venv"
Activate the above env "source venv/bin/activate"
Install mft-client with "pip install airavata-mft-cli".
Execute "mft --help" to generate the below error.

Traceback (most recent call last):
  File "/opt/homebrew/bin/mft", line 5, in <module>
    from airavata_mft_cli.main import app
  File "/opt/homebrew/lib/python3.10/site-packages/airavata_mft_cli/main.py", line 2, in <module>
    import airavata_mft_cli.storage
  File "/opt/homebrew/lib/python3.10/site-packages/airavata_mft_cli/storage/__init__.py", line 3, in <module>
    import airavata_mft_cli.storage.s3 as s3
  File "/opt/homebrew/lib/python3.10/site-packages/airavata_mft_cli/storage/s3.py", line 4, in <module>
    from airavata_mft_sdk import mft_client
  File "/opt/homebrew/lib/python3.10/site-packages/airavata_mft_sdk/mft_client.py", line 1, in <module>
    import grpc
  File "/opt/homebrew/lib/python3.10/site-packages/grpc/__init__.py", line 22, in <module>
    from grpc import _compression
  File "/opt/homebrew/lib/python3.10/site-packages/grpc/_compression.py", line 15, in <module>
    from grpc._cython import cygrpc
ImportError: dlopen(/opt/homebrew/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-darwin.so, 0x0002): tried: '/opt/homebrew/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))

Expected Behaviour

It is expected to execute mft commands.

Your Environment

M1 MacBook Pro.

File resource download hangs when trying to download file

Description

I'm running download_http.py to generate a download URL for a file resource. When I click on the generated URL to download the file, it just hangs.

Steps to Reproduce

I created an SCP file resource:

...
  {
    "type": "SCP",
    "resourceId":  "remote-ssh-resource",
    "resourceMode": "FILE",
    "resourcePath": "/tmp/test1/solution_summary.json",
    "storageId" : "remote-ssh-storage"
  },

Then I generated the download URL with a Python script (modified download_http.py):

import grpc
import MFTApi_pb2
import MFTApi_pb2_grpc

channel = grpc.insecure_channel('localhost:7004')
stub = MFTApi_pb2_grpc.MFTApiServiceStub(channel)
download_request = MFTApi_pb2.HttpDownloadApiRequest(
    # sourceStoreId ="remote-ssh-storage",
    sourceResourceId = "remote-ssh-resource",
    # sourceResourceId =  "remote-ssh-dir-resource",
    # sourceStoreId ="remote-ssh-dir-resource",
    # sourcePath= "/tmp/test1/foo",
    sourceToken = "local-ssh-cred",
    sourceType= "SCP",
    targetAgent = "agent0",
    # mftAuthorizationToken = ""
    )

result = stub.submitHttpDownload(download_request)
print(result)

When I run the script I get:

$ python download_http.py 
url: "http://localhost:3333/4978023d-c807-46fc-989b-8f550d18021d"
targetAgent: "agent0"

When I load the URL in a browser, the browser just spins and never finishes loading the file.

Expected Behaviour

Loading the URL in the browser would result in a file download.

Your Environment

  • mft branch or release version used: develop
  • Operating system and version: macOS Big Sur

Additional Context

By using logging statements, I know that it gets as far as trying to create a session in SCPReceiver.java

https://github.com/apache/airavata-mft/blob/develop/transport/scp-transport/src/main/java/org/apache/airavata/mft/transport/scp/SCPReceiver.java#L103

Apparently it is stuck on this line in SCPTransportUtil:

https://github.com/apache/airavata-mft/blob/develop/transport/scp-transport/src/main/java/org/apache/airavata/mft/transport/scp/SCPTransportUtil.java#L37

because I can log the line before but none of my logging after this line prints.

Also, I know that the SCP configuration is good since I'm able to fetch metadata for the resource:

request = MFTApi_pb2.FetchResourceMetadataRequest(
    resourceId="remote-ssh-resource",
    resourceType="SCP",
    resourceToken="local-ssh-cred",
    targetAgentId="agent0")
response = stub.getFileResourceMetadata(request)
print(response)

Cleanup of the README

The current README is not helpful for external users who want to understand how the project works. We need to make it simple and provide general information about the project.

MFT Copy command shows wrong count of files while making transfer

Description:
Command: mft cp source_storage_name/path destination_storage_name/path
On executing the above command, it prints the following prompt:
A total number of 2 files to be transferred. Total volume is 91808874 bytes. Do you want to start the transfer? [Y/n]:

The actual number of files to be transferred is 1, but the prompt reports a count of 2.

Steps to reproduce:

  • configure source storage
  • configure destination storage
  • use copy command to transfer files

Expected Behaviour:
The count should be logged as the actual count of files being transferred.

Environment::
M2 Mac

Local user interaface for Airavata MFT

Currently, Airavata MFT can be accessed through its command line interface and the gRPC API. However, a Docker Desktop-like user interface for a locally running Airavata MFT would make it much easier to use. The functionalities of such an interface can be summarized as follows:

  1. Start / Stop MFT Instance
  2. Register/ List/ Remove Storage endpoints
  3. Access data (list, download, delete, upload) in configured storage endpoints
  4. Move data between storage endpoints
  5. Search data across multiple storage endpoints
  6. Analytics - Performance numbers (data transfer rates in each agent)

We can use ElectronJS to develop this cross-platform user interface. The Node.js backend of ElectronJS can use gRPC to connect to Airavata MFT to perform management operations.

MFT Copy command not exiting post completion of transferring files

Description
MFT Copy command not exiting post completion of transferring files

Steps to reproduce:

configure source storage
configure destination storage
use the copy command to transfer files

Expected Behaviour:
The command after transferring the files should exit and allow the user to proceed with the next command.

Environment::
M2 Mac

Setting up the testing framework

We need a proper testing (unit/integration) infrastructure in place to make sure that new features do not break existing ones. Possible frameworks to use are JUnit and Mockito.

Implementing Agent to Agent transfers mechanisms

Is your feature request related to a problem? Please describe.

No

Describe the solution you'd like

Currently there is no Agent to Agent transfer mechanism.

Describe alternatives you've considered

To begin with, I would prefer to try out the UDT and Tsunami protocols and evaluate the feasibility of integrating them with Agents.

Additional context

Some evaluations were already performed for those protocols : https://scialert.net/fulltext/?doi=itj.2009.600.604#:~:text=UDT%3A%20UDT%20is%20more%20complex,control%20and%20flow%20control%20mechanisms.&text=Like%20TCP%2C%20UDT%20also%20uses,the%20number%20of%20unacknowledged%20packets.

Integrate CLI into Python SDK

To simplify distribution of the CLI, it is better to stick with the Python SDK, as anyone can easily do a pip install and access the CLI.

Docker distribution for standalone service

The current MFT distribution is available as binaries, and local installations require Java to run them. We need a Docker distribution of the same binary and an option for users to select the mode in which they want to install MFT locally. Ideally, the installation mode would be chosen at the init command.

mft init --docker

This will greatly benefit non-java and windows users

Helpful links
Distributions: https://github.com/apache/airavata-mft/tags
How the distribution is built: https://github.com/apache/airavata-mft/blob/master/standalone-service/pom.xml#L116
How initialization happens: https://github.com/apache/airavata-mft/blob/master/python-cli/mft_cli/airavata_mft_cli/bootstrap.py

Provide authorization for Agents connecting to Consul

MFT Agents communicate with the controller through the Consul key-value store. https://github.com/apache/airavata-mft/blob/master/common/common-clients/src/main/java/org/apache/airavata/mft/admin/MFTConsulClient.java Consul keys are represented as paths, and each agent has its own path for accessing messages. Currently, there is no mechanism to control access to those paths; anyone can read from them. We can use the access control setup provided by Consul to enforce authorization for agent communication. https://developer.hashicorp.com/consul/tutorials/security/access-control-setup-production The idea is:

  1. No open access to any consul path is provided. All communication should happen through Consul tokens.
  2. When an agent needs to connect to Consul, it is given a Consul token, and the agent can only access a particular path using that token (see the sketch below).
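
A sketch of the agent side of this idea, assuming the orbitz consul-client's ACL token support. The key path and the way the token is delivered are illustrative:

import com.orbitz.consul.Consul;
import com.orbitz.consul.KeyValueClient;

import java.util.Optional;

public class AgentConsulSession {

    public static void main(String[] args) {
        // Token issued to this agent at registration time, scoped by an ACL
        // policy to the agent's own key prefix (delivery mechanism assumed).
        String agentToken = System.getenv("MFT_AGENT_CONSUL_TOKEN");

        Consul consul = Consul.builder()
                .withUrl("http://localhost:8500")
                .withAclToken(agentToken) // every request carries the agent's token
                .build();

        // Reads outside this agent's prefix would be rejected by the ACL policy.
        KeyValueClient kv = consul.keyValueClient();
        Optional<String> message = kv.getValueAsString("mft/agents/agent0/messages/next");
        message.ifPresent(System.out::println);
    }
}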

Provide Swift endpoint configuration through CLI

Currently Swift endpoint configuration is only available through gRPC APIs of resource and secret services. We need to provide this capability through the command line client

Target location of new cli feature implementation : https://github.com/apache/airavata-mft/tree/master/python-cli/mft_cli/airavata_mft_cli/storage
Swift storage protobuf definitions : https://github.com/apache/airavata-mft/tree/master/services/resource-service/stub/src/main/proto/swift , https://github.com/apache/airavata-mft/tree/master/services/secret-service/stub/src/main/proto/swift

Decouple Storage endpoint and resource definition in the resource API

Currently there is no first-class notion of storage endpoints, but we likely need one for the storage browsing interfaces. A resource would be a child of a storage endpoint. For example, a storage endpoint would be an SSH-able compute resource or an S3 bucket, and a resource would be a file inside it. The storage endpoint is what maps directly to credentials; one or many credentials could be assigned to a storage endpoint.
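
For illustration, a minimal sketch of the decoupled model. The class and field names are invented, not the actual proto definitions:

// A Storage is the endpoint that credentials attach to; a Resource is a
// path inside a Storage.
public class StorageModel {

    static class Storage {
        String storageId;                 // e.g. an S3 bucket or an SSH-able host
        String type;                      // "S3", "SCP", ...
        java.util.List<String> secretIds; // one or many credentials per endpoint
    }

    static class Resource {
        String resourceId;
        String storageId;                 // parent storage endpoint
        String resourcePath;              // file or directory inside the storage
    }
}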

Unable to install airavata-mft with Python3.11

Describe the bug
Unable to install airavata-mft with Python 3.11. I get the following error:
-> ERROR: Could not build wheels for grpcio, grpcio-tools, which is required to install pyproject.toml-based projects

To Reproduce
To reproduce,

  1. Install python3.11
  2. If you have already installed airavata-mft, create a virtual environment with python3.11.
  3. To create venv: python3.11 -m venv <path>
  4. To activate venv: source <path>/bin/activate
  5. python3.11 -m pip install airavata-mft-cli
  6. Error I got is shown in the following screenshot.
Screenshot 2023-06-20 at 10 51 33 AM

Expected behavior
To install successfully

Additional context 1
System Configuration:

  1. Processor: 2.4 GHz 8-Core Intel Core i9
  2. Operating System: macOS Ventura Version 13.4
  3. Python version: 3.11.4

Additional context 2
Even though that issue is for M1/M2 Macs, I found it similar to the above error. That issue had a solution, which is to install a specific version of grpcio and grpcio-tools. But installing those gave an error on Python 3.11.

On entering the following in the testvenv activated above:

python3.11 -m pip install grpcio==1.47.0rc1

I got the following error

Screenshot 2023-06-20 at 11 17 08 AM

I tried the second command in the solution of this issue.
On entering the following in the testvenv activated above:

python3 -m pip install grpcio-tools==1.47.0rc1

I got the following error

Screenshot 2023-06-20 at 11 25 07 AM

Additional context 3
I was able to successfully install airavata-mft after downgrading to Python 3.10. Since I was not able to install it with Python 3.11, I am posting this issue.


Verifying java runtime before starting the MFT through CLI

Currently, mft init assumes that the correct Java runtime is installed on the host machine, but there might be cases where two Java versions are installed and MFT ends up picking the wrong one. Example: #93 (comment). It is better to verify this at the very early stage of the init operation and fail fast if the environment is not compatible.
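
The check itself is straightforward; here is a sketch in Java for consistency with the other examples here (in practice it would live in the CLI's Python bootstrap), parsing the major version out of the "java -version" output:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaRuntimeCheck {

    public static void main(String[] args) throws Exception {
        // Run java -version against whatever is on the PATH; the version
        // string is printed on stderr, so merge the streams.
        Process p = new ProcessBuilder("java", "-version")
                .redirectErrorStream(true).start();
        int major = -1;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                Matcher m = Pattern.compile("version \"(\\d+)").matcher(line);
                if (m.find()) {
                    major = Integer.parseInt(m.group(1));
                    break;
                }
            }
        }
        p.waitFor();
        if (major < 11) { // pre-9 runtimes report "1.8...", parsed as 1, so they fail too
            System.err.println("MFT requires Java 11+, found major version " + major);
            System.exit(1);
        }
    }
}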

Prepare MFT for first Source Release

Clean up Airavata MFT Repo making it ready for an Apache Source Release. Convenience binaries can be bundled along with the source release.

This issue is to track all tasks related to the ASF release:

  • [] Check for any missing ASF Headers #73
  • [] Check for missing License Information #80
  • [] Check for incompatible licensed dependencies
  • [] Check for included unexpected binary codes

Provide directory copy functionality in the CLI

Currently, only files can be copied through the CLI. The API has the capability to submit batch file transfers, but there is no public API to list files to create that batch request. Once #60 is fixed, this can be implemented on the CLI side.

Build error on mac M1

I get the following error when I try to build the project using the script: ./scripts/build.sh

[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.5.1:compile (default) on project mft-common-proto: Missing:
[ERROR] ----------
[ERROR] 1) com.google.protobuf:protoc:exe:osx-aarch_64:3.0.2
[ERROR] 
[ERROR]   Try downloading the file manually from the project website.
[ERROR] 
[ERROR]   Then, install it using the command: 
[ERROR]       mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.0.2 -Dclassifier=osx-aarch_64 -Dpackaging=exe -Dfile=/path/to/file
[ERROR] 
[ERROR]   Alternatively, if you host your own repository you can deploy the file there: 
[ERROR]       mvn deploy:deploy-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.0.2 -Dclassifier=osx-aarch_64 -Dpackaging=exe -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR] 
[ERROR]   Path to dependency: 
[ERROR]   	1) org.apache.airavata:mft-common-proto:jar:0.01-SNAPSHOT
[ERROR]   	2) com.google.protobuf:protoc:exe:osx-aarch_64:3.0.2
[ERROR] 
[ERROR] ----------
[ERROR] 1 required artifact is missing.
[ERROR] 
[ERROR] for artifact: 
[ERROR]   org.apache.airavata:mft-common-proto:jar:0.01-SNAPSHOT
[ERROR] 
[ERROR] from the specified remote repositories:
[ERROR]   apache.snapshots (https://repository.apache.org/snapshots, releases=false, snapshots=true),
[ERROR]   central (https://repo.maven.apache.org/maven2, releases=true, snapshots=false)

This is because com.google.protobuf:protoc:3.0.2 does not have support for Apple Silicon. Here's the referenced GitHub issue in the protobuf repo.

guava version issue in dependencies while adding GCS storage through CLI

Description

When trying to add Google Cloud Storage (GCS) buckets using StandaloneServiceApplication, it shows an error as below:

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Failed to fetch file resource metadata. Timed out waiting for the response"
        debug_error_string = "{"created":"@1675112686.986866000","description":"Error received from peer 
ipv6:[::1]:7003","file":"src/core/lib/surface/call.cc","file_line":967,"grpc_message":"Failed to fetch file resource metadata. Timed out waiting for the 
response","grpc_status":13}"
>

This error is caused by an incompatible version of the guava module pulled in by other modules; the Maven dependency tree is as below:

[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ mft-standalone-service ---
[INFO] org.apache.airavata:mft-standalone-service:jar:0.01-SNAPSHOT
[INFO] +- io.github.lognet:grpc-spring-boot-starter:jar:4.7.1:compile
[INFO] |  +- io.grpc:grpc-netty-shaded:jar:1.47.0:compile
[INFO] |  |  +- com.google.guava:guava:jar:31.0.1-android:compile
[INFO] |  |  |  +- com.google.guava:failureaccess:jar:1.0.1:compile
[INFO] |  |  |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
[INFO] |  |  |  +- org.checkerframework:checker-qual:jar:3.12.0:compile

Steps to Reproduce

  • Start the 'StandaloneServiceApplication' from "airavata-mft/standalone-service/src/main/java/org.apache.airavata.mft.standalone.server/StandaloneServiceApplication"
  • Then in terminal execute "mft storage add"
  • Choose "Google Cloud Storage (GCS)"
  • Choose "Enter Manually"
  • Then provide path to the service account credential JSON

Then you would see the above error.

Expected Behavior

It should then use the credentials of GCS and show the available buckets.

Your Environment

MAC OS, arm64e architecture.

Expose file browsing API to global API

Users should be able to list buckets/directories in a storage endpoint using the public gRPC API. This API can be used as a utility API to transfer or sync an entire bucket or directory to another storage.

Readonly web application for MFT

We need a simple web application that shows the metrics of a running MFT setup. It will be part of the MFT CLI, with the option of initialization through the command "mft init --with-dashboard".

Features of the dashboard could be

  1. Show currently running transfers and realtime throughput
  2. Show running agents, registered storages and previous transfers
  3. Access MFT logs
  4. Manually register an agent (This is yet to be implemented)

Resources
https://github.com/apache/airavata-mft/blob/master/python-cli/mft_cli/airavata_mft_cli/base.py#L15
https://github.com/apache/airavata-mft/blob/master/python-cli/mft_cli/airavata_mft_cli/bootstrap.py

Setting Up Airavata for File Transfer - Java

I would like to know if there is a feature to define multiple transfer paths in a database. In my use case, I have over 6000 different paths I want to pick files from and move to different locations. Is there a way to dynamically define my paths?

Hierarchical class loaders for transports

Currently, the MFT standalone service and Agent bundle all the available transport libraries. This makes the distribution very fat and prone to various dependency conflict issues. The ideal way is to have hierarchical class loaders that load transports [3] on demand, keeping the main distribution light and simple. The ConnectorResolver and MetadataCollectorResolver should use separate class loaders to load jars for each transport. Libs for each transport can be organized in subdirectories of the lib directory of the distribution (see the sketch after the references below).

[1] https://github.com/apache/airavata-mft/blob/master/core/src/main/java/org/apache/airavata/mft/core/ConnectorResolver.java#L33
[2] https://github.com/apache/airavata-mft/blob/master/core/src/main/java/org/apache/airavata/mft/core/MetadataCollectorResolver.java
[3] https://github.com/apache/airavata-mft/tree/master/transport
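
A sketch of what per-transport class loading could look like, assuming transport jars organized under lib/transports/<type> as suggested above. The class name and directory layout are illustrative:

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class TransportClassLoaderFactory {

    // Build an isolated class loader over lib/transports/<type>/*.jar whose
    // parent exposes only the core MFT classes, so transports cannot clash.
    public static URLClassLoader forTransport(String transportType,
                                              ClassLoader coreLoader) throws Exception {
        File dir = new File("lib/transports/" + transportType.toLowerCase());
        List<URL> urls = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars != null) {
            for (File jar : jars) {
                urls.add(jar.toURI().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), coreLoader);
    }
}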

Request for Clear Deployment Guide for Apache Airavata MFT

Dear Apache Airavata MFT authors

I hope this message finds you well. I'm exploring the deployment of Apache Airavata MFT for file transfer needs within our organization. While reviewing the available documentation on the official website, I found the information insufficient for setting up Apache Airavata MFT between two machines, especially without involving Java programming.

I'm seeking clarification and a comprehensive tutorial or guide that covers the deployment steps and configuration setup specifically for a two-machine deployment scenario. My aim is to establish file transfer operations between these machines using Apache Airavata MFT, without delving into Java programming.

I would greatly appreciate a guide that includes:

  • Recommended deployment architecture and setup for Apache Airavata MFT between two machines.
  • Step-by-step instructions for installation, configuration, and establishing file transfers without requiring Java programming knowledge.
  • Prerequisites, configurations, and any specific considerations essential for this deployment scenario.

Having a detailed guide tailored to this use case would greatly assist our team in effectively utilizing Apache Airavata MFT for our file transfer requirements.

I understand your time is valuable, and I genuinely appreciate any guidance or documentation you can provide. Additionally, if there are community resources or forums where similar deployment scenarios have been discussed, I'd be grateful for any references.

Thank you very much for your attention to this request. I'm looking forward to your guidance on this matter.

Warm regards,
Khasan
[email protected]
