moov-io / achgateway
Payment gateway enabling automated ACH operations in a distributed and fault-tolerant way.
Home Page: https://moov-io.github.io/achgateway/
License: Apache License 2.0
Currently the events emitted from Inbound.ODFI assume files represent "corrections" or "returns" and are not composed of multiple entry types. That isn't the case in most real-world deployments. We should look at emitting events at the EntryDetail level (with their Batch Header) instead.
Events that are broken:
Slack error alerting as implemented in #101 requires setting up a Slack app, but other Slack alerting in achgateway can use the "legacy webhook" functionality. We're unsure when/if Slack will deprecate webhooks, but it was raised in Slack that requiring both is confusing.
They're picked up by the Inbound path, but we can have a config specific to them.
We should document which Processors can detect inbound files.
Along with #53 we should not allow pending files (those in ./storage/mergable/*) to be modified. They should be immutable once created.
With transform support for encryption and encoding we should support additional algorithms. One popular example would be Vault; this issue is to discuss using its Transit backend vs other options.
Currently releases and docker images are only built for amd64. Building the Docker image manually on arm64 works perfectly fine, but it would be great if the official releases also built for arm64.
ACHGateway Version: v0.16.6
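Assuming GitHub Actions is used for releases, one sketch of cross-building both architectures in a single step with buildx and QEMU (tags and action versions are placeholders to adjust):

```yaml
# Sketch of a release job step; image tag and secrets are placeholders.
- uses: docker/setup-qemu-action@v2
- uses: docker/setup-buildx-action@v2
- uses: docker/build-push-action@v4
  with:
    platforms: linux/amd64,linux/arm64
    push: true
    tags: moov/achgateway:latest
```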
We're seeing some EOF errors that don't get bubbled up to PagerDuty; track down how that can happen and fix it.
In the config.yml, does the cutoff window define whether it is a same-day or standard ACH window? I assume achgateway tries to send whatever it can during that cutoff window. Is the cutoff window the actual cutoff for the bank (with achgateway trying to send files beforehand), or is it the time at which achgateway cuts off payments, starts generating files, and sends them to the SFTP endpoint?
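For context, cutoff windows are configured per shard. A sketch of the shape (field names are from memory and may differ from your version; check the official docs). As I understand it, the window is when achgateway itself triggers merging and upload, so operators typically set it slightly before the bank's actual deadline:

```yaml
Sharding:
  Shards:
    - id: "production"
      cutoffs:
        timezone: "America/New_York"
        windows:
          - "16:15"   # local wall-clock time achgateway merges and uploads
```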
This is a proposal to introduce a new event called TriggerCutoff to achgateway. This event will function the same as a manual/automatic cutoff trigger but allows us to keep state consistent across multiple instances of achgateway.
Imagine a kafka topic with messages file1, file2, triggerCutoff[09:00], file3:
Each instance of achgateway will queue file1 and file2 for a cutoff. Then the triggerCutoff event is consumed, which initiates leadership election and processing. Only after cutoff processing completes is file3 queued for later upload.
Currently instances of achgateway can get out of sync due to clock skew, which might let file3 be consumed by some instances and not others. This produces inconsistent state.
The service producing QueueACHFile events could produce this. Often the producing service needs knowledge of cutoff times to allow setting EffectiveEntryDate to a proper value. Duplication of cutoff time configuration is not ideal.
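The batching semantics above can be sketched with a toy consumer. This is a simplification of the proposal (no leadership election, hypothetical field names), showing only how an in-stream TriggerCutoff marker makes every instance split files identically:

```go
package main

import "fmt"

// message is a simplified stand-in for the events on a shard's topic.
// The triggerCutoff flag and field names are assumptions from the proposal.
type message struct {
	file          string // e.g. "file1"; empty for a trigger message
	triggerCutoff bool
}

// consume queues files until a TriggerCutoff message is seen, then flushes
// everything queued so far as one cutoff batch. Files after the trigger land
// in the next batch, so every instance derives the same split regardless of
// clock skew.
func consume(msgs []message) [][]string {
	var batches [][]string
	var pending []string
	for _, m := range msgs {
		if m.triggerCutoff {
			batches = append(batches, pending)
			pending = nil
			continue
		}
		pending = append(pending, m.file)
	}
	if len(pending) > 0 {
		batches = append(batches, pending)
	}
	return batches
}

func main() {
	batches := consume([]message{
		{file: "file1"}, {file: "file2"},
		{triggerCutoff: true},
		{file: "file3"},
	})
	fmt.Println(batches) // [[file1 file2] [file3]]
}
```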
Similar to the slack, email, and pagerduty notifications from paygate.
https://github.com/moov-io/ach-conductor/blob/master/docs/initial.md#high-level-plan
From Andrew Hamilton in slack,
If using the Audit config, newBlobStorage will happily skip setting the GPG fields if the config is not there, but blobStorage.SaveFile doesn't confirm the cryptor is non-nil before attempting to Disfigure the contents, leading to a panic:
achgateway_1 | panic: runtime error: invalid memory address or nil pointer dereference
achgateway_1 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xa192e2]
achgateway_1 |
achgateway_1 | goroutine 120 [running]:
achgateway_1 | github.com/moov-io/cryptfs.(*FS).Disfigure(0x0, {0xc00027ec00?, 0xc00074a360?, 0x5e?})
achgateway_1 | /src/vendor/github.com/moov-io/cryptfs/cryptfs.go:85 +0x22
achgateway_1 | github.com/moov-io/achgateway/internal/audittrail.(*blobStorage).SaveFile(0xc00071c480, {0xc0004c18c0, 0x36}, {0xc00027ec00?, 0x4?, 0x1?})
achgateway_1 | /src/internal/audittrail/storage_blob.go:62 +0x5e
achgateway_1 | github.com/moov-io/achgateway/internal/incoming/odfi.(*AuditSaver).save(...)
achgateway_1 | /src/internal/incoming/odfi/audit.go:39
achgateway_1 | github.com/moov-io/achgateway/internal/incoming/odfi.process({0xc00021f740, 0x22}, 0xc00071c4a0, {0xc0000a2940, 0x3, 0x4})
achgateway_1 | /src/internal/incoming/odfi/processor.go:130 +0x71a
achgateway_1 | github.com/moov-io/achgateway/internal/incoming/odfi.ProcessFiles(0xc000724350, 0x8?, {0xc0000a2940, 0x3, 0x4})
achgateway_1 | /src/internal/incoming/odfi/processor.go:90 +0x1e5
achgateway_1 | github.com/moov-io/achgateway/internal/incoming/odfi.(*PeriodicScheduler).tick(0xc00017d8c0, 0x16e2b85?)
achgateway_1 | /src/internal/incoming/odfi/scheduler.go:172 +0x307
achgateway_1 | github.com/moov-io/achgateway/internal/incoming/odfi.(*PeriodicScheduler).tickAll(0xc00017d8c0)
achgateway_1 | /src/internal/incoming/odfi/scheduler.go:139 +0x4e5
achgateway_1 | github.com/moov-io/achgateway/internal/incoming/odfi.(*PeriodicScheduler).Start(0xc00017d8c0)
achgateway_1 | /src/internal/incoming/odfi/scheduler.go:104 +0xf9
achgateway_1 | github.com/moov-io/achgateway/internal.NewEnvironment.func5()
achgateway_1 | /src/internal/environment.go:215 +0x32
achgateway_1 | created by github.com/moov-io/achgateway/internal.NewEnvironment
achgateway_1 | /src/internal/environment.go:214 +0x148a
ftp_1 | 2022/06/15 17:09:29 a1985a83d4f35936e07a Connection Terminated
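A minimal sketch of the kind of nil-guard that would avoid this panic. The types below are stand-ins for cryptfs.FS and the real blobStorage, not the actual achgateway code:

```go
package main

import "fmt"

// cryptor stands in for *cryptfs.FS; the interface is illustrative.
type cryptor interface {
	Disfigure(data []byte) ([]byte, error)
}

// blobStorage mirrors the shape of the audittrail storage: the cryptor is
// nil when no GPG config was provided.
type blobStorage struct {
	cryptor cryptor
}

// SaveFile only disfigures (encrypts) the contents when a cryptor was
// actually configured, instead of calling through a nil pointer.
func (bs *blobStorage) SaveFile(path string, data []byte) error {
	if bs.cryptor != nil {
		var err error
		data, err = bs.cryptor.Disfigure(data)
		if err != nil {
			return fmt.Errorf("disfiguring %s: %w", path, err)
		}
	}
	// ... write data to the bucket as before ...
	return nil
}

func main() {
	bs := &blobStorage{} // no GPG config -> nil cryptor
	if err := bs.SaveFile("odfi/file.ach", []byte("101 ...")); err != nil {
		panic(err)
	}
	fmt.Println("saved without panic")
}
```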
Similar to paygate we will want to trigger cutoff times based on a schedule. This will initiate the processing that paygate currently does, and is similar to ach-conductor.
Example: https://github.com/moovfinancial/paygate/blob/master/x/schedule/cutoff.go
As part of our support for OpenShift across all OSS projects we should include an OpenShift-compatible image on each release. We can look at Watchman's image, which is built and pushed on each release.
We should have a multi-node setup of consul running in docker-compose.yml that we can test and develop against. Consul will be a critical component of ach-conductor's orchestration.
Docker Hub: https://hub.docker.com/_/consul/
Examples:
It seems handy to notify (at an INFO level) when we're skipping processing for a day due to it being a holiday.
After consul is set up for development (see #2) we should have code that connects to the cluster and offers leadership capabilities. This can be benchmarked.
Example: https://clivern.com/leader-election-with-consul-and-golang/
I'm thinking that the KV path used for leadership would be /ach-conductor/shards/$shardKey, with $shardKey coming from messages.
In config.yml, where is the merging directory located? Is that on the machine running each gateway or on the SFTP endpoint? I assume it is on the SFTP server because it is nested under Upload.
The file upload component of #7 needs to be implemented. Similar features and config from paygate.
https://github.com/moov-io/ach-conductor/blob/master/docs/initial.md#high-level-plan
FileUploaded events don't populate the Filename field, which looks tricky because pending files can be merged into multiple outgoing files.
Code: https://github.com/moov-io/achgateway/blob/v0.16.10/internal/pipeline/aggregate.go#L219-L225
Pending files that are waiting for the next cutoff (aka inside ./storage/mergable/*) should be encrypted at rest. These files contain PII and sensitive data. While having encryption at other layers is good, we should further protect the data.
With async cancellation messages it's helpful to know when a file was successfully canceled (e.g. skipped during merge/upload).
As a member of the cluster we'll consume ACH files via kafka or HTTP. This allows us to be extensible and async where possible.
We've talked about accepting messages that are very similar to what paygate emits now:
https://github.com/moov-io/ach-conductor/blob/master/docs/initial.md#high-level-plan
Along with kafka the gocloud.dev/pubsub package supports Amazon SQS. You can read their docs for the package interface.
Publish: https://gocloud.dev/howto/pubsub/publish/#sqs
Subscribe: https://gocloud.dev/howto/pubsub/subscribe/#sqs
After #3 we should have leaders upload their files to a trivial FTP/SFTP server. This allows us to benchmark a naive setup to find the maximum performance we can expect.
The uploading would start to consume the upload configuration from paygate (also referenced in our docs):
https://github.com/moov-io/ach-conductor/blob/master/docs/initial.md#configuration
Followup from #108 Upload notifications should support a proper Slack App in the event that "legacy webhooks" are removed in the future. (Also to be consistent with Error alerting)
Setting up achgateway to fully encrypt messages is confusing. The Kafka sub-object has its own TransformConfig, but several parent objects also contain this config. We should simplify this. Likely this is simpler by only having TransformConfig on the parent objects (Events, Inbound).
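A sketch of what the simplified shape might look like. The key names below are illustrative assumptions, not the current schema:

```yaml
Inbound:
  Kafka:
    Brokers: ["kafka:9092"]
    Topic: "ach.incoming"
    # no nested TransformConfig here anymore
  Transform:            # one shared transform config for the whole Inbound path
    Encryption:
      AES:
        Key: "<base64-encoded key>"
```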
In this case we could use the same setup for uploading wire files, ACH files, and BSA/compliance files to partners as needed. You could still have different upload agents for each if needed, but the generic nature of this would go a long way.
I feel like otherwise we'll end up duplicating several parts of achgateway across various systems when some configuration changes would be all that's needed, though I understand that would also mean a fairly large refactor of the codebase.
Following #141 it would be helpful to have an event for when cancelation fails. Either due to errors or the request was processed too late/early.
Our error alerting only offers PagerDuty as a notification channel. We should include slack webhooks, which is useful for lower environments where issues are less urgent.
Source: https://github.com/moov-io/achgateway/blob/v0.6.4/internal/service/model_errors.go#L25-L28
When we manually trigger shards it would be helpful to return the result of each shard in the response. An example is below.
{
"shards": {
"testing": "skipped",
"prod": "completed",
"uat": "errored"
}
}
The possible values at first could be skipped, errored, and completed. The skipped value comes into play when we trigger only some of the shards.
We've run into some issues in the real world where /trigger-cutoff isn't fully applied to every instance properly to maintain consistent state.
After we are consuming files (see #5) we'll stage them similar to how paygate does. The filesystem based layout is something I'd like us to keep because it gives us a lot of isolation, stability, and easy implementation.
The root level object was never used as intended.
Related to #138
Along with acking the message to avoid clogging the system on a failure, it is also prudent that we publish the failed message to a DLQ topic.
As an operator:
As an application dev:
While achgateway currently supports Filename templating for OUTGOING files, it does not support it for INCOMING files. This can lead to a situation where an ODFI lacks the ability to drop files of different types into multiple folders over SFTP and instead wants to distinguish them by filename.
For example, rather than using a folder structure like /incoming/ach-file-name, an ODFI might use a filename structure like INCOMING_ACH_FILE_file-name or RETURN_ACH_FILE-file-name.
However, the current integration does not support this very well unless we process all files as one type, or have our ODFI generate unique credentials for each type of file, which is also a pain.
In an ideal world we'd be able to sort incoming files by filename structure, folder structure, or both. This would provide the maximum level of configurability.
We're seeing lots of logs like the following during shutdown. This is likely from contexts being kept around when we didn't expect them.
2022-06-14 11:48:17 | ts=2022-06-14T16:48:17Z msg="nil message received" app=achgateway level=info version=v0.15.3
2022-06-14 11:48:17 | ts=2022-06-14T16:48:17Z msg="ERROR receiving message: context canceled" app=achgateway errored=true level=info version=v0.15.3
At Moov we've stopped running achgateway with consul support and others have deployed without consul as well. The consul implementation has several known issues and does not work as expected.
On the /trigger-inbound endpoint we don't allow the option of triggering only some of the shards. Instead every shard is triggered by default. We should support triggering a subset of shards, which is how /trigger-cutoff works.
https://github.com/moov-io/achgateway/blob/v0.15.4/internal/incoming/odfi/admin.go#L35
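Assuming the request mirrors /trigger-cutoff, a subset trigger might look like the following (the HTTP method and field name are assumptions, not the current API):

```
PUT /trigger-inbound
{
  "shardNames": ["testing"]
}
```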
Often deployments will mount a NAS / external storage on the local filesystem. Our system could support this with the same encryption as cloud storage.
Please star/comment on this issue if you're interested in the feature.
We should support the ability to store (encrypted) files in an audittrail bucket. This will allow lookups for support folks and records retention requirements.
We should probably read cancelation events on a separate topic. This will help to avoid consumer groups getting in the way (the wrong achgateway instance consuming cancel messages) and to speed up those messages compared to incoming ACH files.
There is a CancelACHFile event, but no HTTP endpoint to submit it. We should support canceling a file over HTTP.
https://github.com/moov-io/achgateway/blob/master/pkg/models/events.go#L147
Similar to paygate we could offer a CLI for viewing files in audittrail storage. This has been helpful before ach-web-viewer existed, but can still be powerful.
Example: moov-io/paygate#621
Ran into the following error recently in production. It was from the ODFI processor failing to parse an ACH file. The alert wasn't delivered to on-call staff.
ERROR sending alert: alerting error: creating event in PagerDuty: HTTP response failed with status code 400 and no JSON error object was present
Relevant Code:
https://github.com/moov-io/achgateway/blob/master/internal/alerting/pagerduty.go#L32
Hey Adam, one thing we noticed yesterday while validating our setup (besides that the errors emitted by crypto ssh can be quite opaque) is that we had a misconfigured path to drop off the outbound files and when we triggered the cutoff to send a file, the logs said 1 of 1 file had been moved successfully (wrote 1 of 1 files to remote agent) but since we didn’t have permission to write to the incorrect destination, the file wasn’t actually delivered. We got a FileUploaded event for this fileID as well. Granted this was during initial setup and we caught it pretty quickly, but was interested in hearing your thoughts on validating the uploaded files in some way after upload. Thinking about an instance in the future when the permissions on our upload server change and we miss that files are not successfully being written.
Source: https://moov-io.slack.com/archives/CD9J8EJKX/p1665672492870269
Yeah, from the ACH Gateway perspective, everything looked successful, but the file was not in the remote destination. I tried an analogous file put operation just via the command line and got a permissions error there:
sftp> put 091218445-ach-json-test.ach /m1/Outbound/ACHFiles/
debug1: Couldn't stat remote file: No such file or directory
Uploading 091218445-ach-json-test.ach to /m1/Outbound/ACHFiles/
remote open("/m1/Outbound/ACHFiles/"): No such file or directory
./Outbound/ACHFiles/ is the path I should have been using and that works with both.
Most of the other Moov services have the /ping endpoint as a convenient built-in healthcheck. I just happened across the fact that this is not the case in achgateway while building a Paw/Postman collection for it. This feels like something that should be added for those that rely on these endpoints.
Instead of using GPG we should offer Vault as a method to encrypt each file. This could use the transit backend.
We should expose the pending files (under ./storage/mergable/) over an API so tools like ach-web-viewer can include them in listings. Ideally these are tagged by the viewer so they can be displayed with appropriate metadata.
Note: This endpoint should be disabled by default because it exposes sensitive data. It should also have a masking config usable like other tools.
We should support uploading individual files to the audittrail. This helps out when manual intervention happens, and the file could come from the request body or somewhere under ./storage/.
Similar to #24 but over HTTP instead of a CLI.
Examples:
POST /audittrail/:id/files
// Nacha or moov-io/ach JSON formatted body
POST /audittrail/:id/files
{
"merging": {
"filepaths": [
"testing/20210825-135000/uploaded/sha1.ach"
]
}
}