NUTRINTG (1) File Fetcher (DEPRECATED/OUTDATED DOCUMENTATION)
The file fetcher is an inbound service that fetches files from the configured SFTP server and stores them in an S3 bucket. It also checks the integrity of files during the transfer and stores the calculated checksum in the FileMetadataStore. A high-level overview of the service is presented below:
Main process steps:
1. The file fetcher polls the SFTP server for new entries. The polling interval is configurable with the application property process.fixed-delay.
2. The file fetcher streams each new file from SFTP to S3.
3. If the upload is successful, the calculated checksum is stored in the FileMetadataStore (a MongoDB collection).
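Steps 2 and 3 can share a single pass over the data: the checksum is computed while the bytes are streamed. The sketch below is an illustration under assumptions, not the service's actual code. ChecksumStreamer and copyWithChecksum are hypothetical names, and plain InputStream/OutputStream stand in for the SFTP download and S3 upload streams. It relies on the JDK's java.security.DigestInputStream, which updates the digest with every byte read through it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not the service's real class): copies a source stream to a
// sink and computes the checksum of the copied bytes in the same pass.
final class ChecksumStreamer {

    private ChecksumStreamer() {}

    /**
     * Copies all bytes from source to sink and returns their hex-encoded digest.
     * The source is closed afterwards; the sink is left open for the caller.
     */
    static String copyWithChecksum(InputStream source, OutputStream sink, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm); // e.g. "MD5", "SHA-256"
        byte[] buffer = new byte[8192];
        try (DigestInputStream in = new DigestInputStream(source, digest)) {
            int read;
            // Each read through DigestInputStream also feeds the bytes into the digest.
            while ((read = in.read(buffer, 0, buffer.length)) != -1) {
                sink.write(buffer, 0, read);
            }
        }
        return toHex(digest.digest());
    }

    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b)); // negative bytes format as unsigned, e.g. ff
        }
        return hex.toString();
    }
}
```

The algorithm name would come from the checksum property in the properties table (MD5 in the example), and the returned hex string is what would be stored in the FileMetadataStore once the upload succeeds.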
Conditions for fetching a file:
- The file name must match the pattern defined in process.fileNamePattern. Files that do not match the pattern are ignored.
- No file with the same destination path (see process.destinationPathTemplate) may already exist on S3.
- The file must not already be uploading, i.e. no ProgressMark may exist for it in the database.
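Taken together, the three conditions act as a single predicate. This is a hedged sketch: FetchConditions and shouldFetch are hypothetical names, the S3 and ProgressMark lookups are reduced to booleans so only the decision logic is shown, and it assumes the whole file name must match the pattern (Matcher.matches()) rather than just a substring of it.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the fetch conditions; the S3 and ProgressMark lookups are
// passed in as booleans so the decision logic can be shown on its own.
final class FetchConditions {

    private FetchConditions() {}

    static boolean shouldFetch(String fileName, Pattern fileNamePattern,
                               boolean existsOnS3, boolean progressMarkExists) {
        // Condition 1: the whole name must match process.fileNamePattern (assumed full match).
        if (!fileNamePattern.matcher(fileName).matches()) {
            return false;
        }
        // Condition 2: nothing may already exist at the file's S3 destination path.
        if (existsOnS3) {
            return false;
        }
        // Condition 3: no ProgressMark, i.e. the file is not currently being uploaded.
        return !progressMarkExists;
    }
}
```

Ordering the checks this way means the cheap, local pattern check runs first, and the remote S3 and database lookups would only be needed for files that pass it.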
Handling Internet connection failures (ProgressMark):
If all conditions for fetching a given file are satisfied, the process stores an initial ProgressMark in the database. It then checks every 5 minutes (configurable with proces.updateProgressMarkOncePerMinutes) whether the upload is still in progress and, if so, updates the ProgressMark to postpone its expiration time.
Thanks to the ProgressMark, if the instance is killed or something else goes wrong, the ProgressMark is no longer refreshed and eventually expires. After that, for example once the instance has been restarted, the upload of the same file starts again automatically; no manual handling is required.
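The ProgressMark behaves like a lease that must be renewed while the upload is running. The sketch below is an in-memory stand-in for the MongoDB collection, written under the assumption that a mark counts as absent once its expiration time has passed (in the real store, the expiration index from the properties table would remove it). ProgressMarks, tryStart, refresh and isInProgress are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the ProgressMark collection (the real store is MongoDB).
final class ProgressMarks {

    private final Map<String, Instant> expiresAt = new ConcurrentHashMap<>();
    private final Duration ttl;

    ProgressMarks(Duration ttl) {
        this.ttl = ttl;
    }

    /** Stores the initial mark; returns false if a live (non-expired) mark already exists. */
    boolean tryStart(String file, Instant now) {
        boolean[] started = {false};
        // compute() is atomic per key, so two pollers cannot both start the same file.
        expiresAt.compute(file, (key, current) -> {
            if (current == null || !current.isAfter(now)) { // absent or already expired
                started[0] = true;
                return now.plus(ttl);
            }
            return current;
        });
        return started[0];
    }

    /** Called periodically while the upload is still progressing: postpones expiration. */
    void refresh(String file, Instant now) {
        expiresAt.computeIfPresent(file, (key, current) -> now.plus(ttl));
    }

    boolean isInProgress(String file, Instant now) {
        Instant deadline = expiresAt.get(file);
        return deadline != null && deadline.isAfter(now);
    }
}
```

With a 1h TTL (the example value in the properties table) and a refresh every 5 minutes, a running upload keeps its mark alive, while a killed instance leaves a mark that expires at most an hour after its last refresh, after which tryStart succeeds again.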
Sequence diagrams
The file fetcher sequence diagram is presented below:
Note that in the Upload Loop there is an asynchronous call to the ProgressObserver with the number of transferred bytes.
The ProgressObserver update is triggered by a scheduler. By the time it runs, the observer has already received all transferred-bytes events from the configured period (progress.updateProgressMarkOncePerMinutes), so it can compute an aggregated result and store it in the database as a ProgressMark.
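A minimal sketch of that aggregation, assuming the upload loop reports bytes from a different thread than the scheduler. ProgressObserver here is a hypothetical class, not the service's own, and the LongConsumer callback stands in for the database update of the ProgressMark. An AtomicLong keeps the per-chunk call cheap, and getAndSet(0) hands the scheduler the bytes accumulated since the previous flush without losing concurrent updates.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical sketch: the upload loop reports bytes asynchronously,
// and a scheduler periodically flushes the aggregated count.
final class ProgressObserver {

    private final AtomicLong bytesSinceLastFlush = new AtomicLong();
    private final LongConsumer onProgress; // stand-in for the ProgressMark database update

    ProgressObserver(LongConsumer onProgress) {
        this.onProgress = onProgress;
    }

    /** Called from the upload loop for each transferred chunk; cheap and thread-safe. */
    void bytesTransferred(long count) {
        bytesSinceLastFlush.addAndGet(count);
    }

    /** Called by the scheduler once per interval; returns the bytes seen since the last flush. */
    long flush() {
        long aggregated = bytesSinceLastFlush.getAndSet(0); // atomic read-and-reset
        if (aggregated > 0) {
            onProgress.accept(aggregated); // upload is still moving: postpone expiration
        }
        return aggregated;
    }
}
```

The periodic trigger could be, for example, a ScheduledExecutorService calling scheduleAtFixedRate(observer::flush, 5, 5, TimeUnit.MINUTES). Skipping the callback when no bytes arrived is what lets a stalled upload's mark expire.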
Most important properties
| Property | Description | Example |
|---|---|---|
| | Directory on the SFTP server from which files can be fetched. | file-dir |
| process.fileNamePattern | File name pattern for files on the SFTP server. | |
| process.destinationPathTemplate | S3 destination path template. Groups matched by the file name pattern (process.fileNamePattern) can be used in it. | |
| process.fixed-delay | Polling interval in milliseconds. | 5000 |
| | Algorithm used to calculate the checksum. The Java platform supports the following: MD5, SHA-1, SHA-256. | MD5 |
| | Expiration index in the ProgressMark repository. | 1h |
| | Interval (in minutes) for updating the ProgressMark database entry while the corresponding file is being uploaded. | 5 |
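The table does not say how matched groups are referenced in process.destinationPathTemplate. Assuming the template uses java.util.regex replacement syntax ($1 for numbered groups, ${name} for named groups), expanding it could look like the sketch below; DestinationPaths and resolve are hypothetical names, and the real template syntax may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: assumes the template uses java.util.regex replacement syntax ($1, ${name}).
final class DestinationPaths {

    private DestinationPaths() {}

    /** Expands the template with the groups captured by the file name pattern. */
    static String resolve(Pattern fileNamePattern, String template, String fileName) {
        Matcher matcher = fileNamePattern.matcher(fileName);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("File name does not match pattern: " + fileName);
        }
        StringBuilder out = new StringBuilder();
        // The match spans the whole input, so this appends only the expanded template.
        matcher.appendReplacement(out, template);
        return out.toString();
    }
}
```

For example, with the pattern report_(\d{4})(\d{2})(\d{2})\.csv and the template reports/$1/$2/$3/report.csv, the file report_20240131.csv would be stored under reports/2024/01/31/report.csv.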
Tips
To re-upload a file, delete it from S3; the file fetcher will then fetch and upload it again automatically on a subsequent poll.