NUTRINTG (1) File Fetcher (DEPRECATED/OUTDATED DOCUMENTATION)

File fetcher is the inbound service that fetches files from the provided SFTP server and stores them in an S3 bucket. It also checks the integrity of files during transfer and stores the calculated checksum in the FileMetadataStore. A high-level overview of the service is presented below:

 

Main process steps:

  1. File fetcher polls the SFTP server for new entries. The polling interval is configurable through application properties (process.fixed-delay).

  2. File fetcher streams each new file from SFTP to S3.

  3. If the upload succeeds, the calculated checksum is stored in the FileMetadataStore (MongoDB collection).
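
The streaming and checksum steps above can be sketched roughly as follows. This is a minimal illustration and not the service's actual code: the class and method names are hypothetical, and in-memory streams stand in for the SFTP download and the S3 upload. It shows how a checksum can be computed as a side effect of copying, so the file is read only once.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: names are illustrative, not taken from the service.
class ChecksumStreamingSketch {

    /** Copies source to sink and returns the hex checksum of the copied bytes. */
    static String copyWithChecksum(InputStream source, OutputStream sink, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        // DigestInputStream updates the digest on every read.
        try (DigestInputStream in = new DigestInputStream(source, digest)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                sink.write(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // In-memory streams stand in for the SFTP source and the S3 destination.
        InputStream sftp = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream s3 = new ByteArrayOutputStream();
        System.out.println(copyWithChecksum(sftp, s3, "MD5"));
    }
}
```

The algorithm name passed in corresponds to the value of process.checksumAlgorithm; only after the copy completes without error would the checksum be written to the metadata store.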

Conditions for fetching a file:

  1. The file name must match the pattern defined in the property process.fileNamePattern. Files that do not match the pattern are ignored.

  2. A file with the same destinationPath (see process.destinationPathTemplate) must not already exist on S3.

  3. The file must not be in the uploading state, that is, no ProgressMark for the new entry may exist in the database.
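
The three conditions above can be combined into a single check, sketched below. All names here are hypothetical, and plain sets stand in for the S3 existence lookup and the ProgressMark collection. Keying ProgressMark entries by file name is also an assumption made only for this example.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of the eligibility check; not the service's actual code.
class FetchEligibilitySketch {
    private final Pattern fileNamePattern;         // from process.fileNamePattern
    private final Set<String> existingS3Keys;      // stand-in for an S3 existence lookup
    private final Set<String> activeProgressMarks; // stand-in for the ProgressMark collection

    FetchEligibilitySketch(Pattern fileNamePattern,
                           Set<String> existingS3Keys,
                           Set<String> activeProgressMarks) {
        this.fileNamePattern = fileNamePattern;
        this.existingS3Keys = existingS3Keys;
        this.activeProgressMarks = activeProgressMarks;
    }

    boolean shouldFetch(String fileName, String destinationPath) {
        // 1. Ignore files whose name does not match the configured pattern.
        if (!fileNamePattern.matcher(fileName).matches()) {
            return false;
        }
        // 2. Skip files that already exist at the destination path.
        if (existingS3Keys.contains(destinationPath)) {
            return false;
        }
        // 3. Skip files that another upload is currently handling.
        return !activeProgressMarks.contains(fileName);
    }

    public static void main(String[] args) {
        FetchEligibilitySketch check = new FetchEligibilitySketch(
                Pattern.compile("rb_.*\\.dat\\.pgp"),
                Set.of("amer/20240131/rb_done.dat.pgp"),
                Set.of("rb_busy.dat.pgp"));
        System.out.println(check.shouldFetch("rb_new.dat.pgp", "amer/20240131/rb_new.dat.pgp"));
    }
}
```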

Handling of connection failures (ProgressMark):

If all conditions for fetching a given file are satisfied, the process stores an initial ProgressMark in the database. It then checks every 5 minutes (configurable with process.updateProgressMarkOncePerMinutes) whether the upload is still in progress and, if so, updates the ProgressMark to postpone its expiration time.

Thanks to ProgressMark, no manual handling is required when an instance is killed or something else goes wrong. Once the ProgressMark has expired, either after the instance restarts or after the expiration period passes, the upload of the same file starts again automatically.
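
The ProgressMark lifecycle described above behaves like a lease with a renewable expiry. The sketch below illustrates that idea with an in-memory map. The class and method names are hypothetical, and expiry is simulated by comparing timestamps, whereas the service itself relies on an expiration index in the database (see process.expireAfter).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the ProgressMark repository.
class ProgressMarkStoreSketch {
    private final Map<String, Instant> expiresAtByFile = new ConcurrentHashMap<>();
    private final Duration expireAfter; // corresponds to process.expireAfter

    ProgressMarkStoreSketch(Duration expireAfter) {
        this.expireAfter = expireAfter;
    }

    /** Creates a mark unless a non-expired one exists; true means the caller may upload. */
    boolean tryStart(String file, Instant now) {
        boolean[] acquired = {false};
        expiresAtByFile.compute(file, (key, expiresAt) -> {
            if (expiresAt == null || !expiresAt.isAfter(now)) {
                acquired[0] = true;           // no mark, or the old one has expired
                return now.plus(expireAfter);
            }
            return expiresAt;                 // another upload still holds the mark
        });
        return acquired[0];
    }

    /** Postpones expiry while the upload is still running. */
    void refresh(String file, Instant now) {
        expiresAtByFile.computeIfPresent(file, (key, old) -> now.plus(expireAfter));
    }

    public static void main(String[] args) {
        ProgressMarkStoreSketch store = new ProgressMarkStoreSketch(Duration.ofHours(1));
        Instant t0 = Instant.parse("2024-01-31T12:00:00Z");
        System.out.println(store.tryStart("a.dat.pgp", t0));                    // first attempt
        System.out.println(store.tryStart("a.dat.pgp", t0.plusSeconds(1800)));  // mark still valid
        System.out.println(store.tryStart("a.dat.pgp", t0.plusSeconds(7200)));  // mark expired
    }
}
```

If the uploading instance dies, nothing calls refresh any more, so the mark eventually expires and the next polling cycle can pick the file up again.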

Sequence diagrams

File fetcher sequence diagram is presented below:

Note that the Upload Loop makes an asynchronous call to the ProgressObserver with the number of transferred bytes.

The ProgressObserver update is triggered by a scheduler. By the time it runs, the observer has already received all transferred-bytes events that occurred in the configured period (process.updateProgressMarkOncePerMinutes), so it can compute the aggregated result and store it in the database as a ProgressMark.
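
The aggregation performed by the ProgressObserver can be illustrated with the sketch below. It is only an assumption about how such an observer might work: the method names are hypothetical and an in-memory counter stands in for the ProgressMark record in the database.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a progress observer; not the service's actual code.
class ProgressObserverSketch {
    private final AtomicLong pendingBytes = new AtomicLong();   // events since the last flush
    private final AtomicLong persistedTotal = new AtomicLong(); // stand-in for the stored ProgressMark

    /** Called from the upload loop for every transferred chunk. */
    void onBytesTransferred(long bytes) {
        pendingBytes.addAndGet(bytes);
    }

    /** Called by the scheduler: folds pending events into the stored total. */
    long flush() {
        long delta = pendingBytes.getAndSet(0); // atomically take everything seen so far
        return persistedTotal.addAndGet(delta);
    }

    public static void main(String[] args) {
        ProgressObserverSketch observer = new ProgressObserverSketch();
        observer.onBytesTransferred(100);
        observer.onBytesTransferred(250);
        System.out.println(observer.flush()); // total transferred so far
    }
}
```

Because the upload loop only increments a counter, it never waits on the database. The scheduled flush performs one write per interval regardless of how many chunks were transferred.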

Most important properties

Property: process.remoteDirectory
Description: Directory on the SFTP server from which files are fetched.
Example: file-dir

Property: process.fileNamePattern
Description: Pattern that file names on the SFTP server must match.
Example: (?<filename>rb_(?<region>amer)_.*_(?<date>[0-9]{8})[0-9]{6}(\\.md5|\\.dat\\.pgp))

Property: process.destinationPathTemplate
Description: Template for the destination path on S3. Matched groups from the process.fileNamePattern property can be used as placeholders.
Example: ${region}/${date}/${filename}

Property: process.fixedDelay
Description: Polling interval in milliseconds.
Example: 5000

Property: process.checksumAlgorithm
Description: Algorithm used to calculate the checksum. The Java Platform supports the following: MD5, SHA-1, SHA-256.
Example: MD5

Property: process.expireAfter
Description: Expiration period applied by the expiration index in the ProgressMark repository.
Example: 1h

Property: process.updateProgressMarkOncePerMinutes
Description: Interval, in minutes, for updating the ProgressMark database entry while the corresponding file is being uploaded.
Example: 5
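
To show how process.fileNamePattern and process.destinationPathTemplate work together, the sketch below expands the example template using the named groups of the example pattern. The class and method names and the sample file name are hypothetical, and the service may resolve placeholders differently.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of template expansion; not the service's actual code.
class DestinationPathSketch {
    // Matches placeholders of the form ${name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Returns the destination path, or empty if the file name does not match. */
    static Optional<String> destinationPath(Pattern fileNamePattern, String template, String fileName) {
        Matcher file = fileNamePattern.matcher(fileName);
        if (!file.matches()) {
            return Optional.empty(); // non-matching files are ignored
        }
        // Replace each ${name} with the value of the named group of the same name.
        String path = PLACEHOLDER.matcher(template)
                .replaceAll(ph -> Matcher.quoteReplacement(file.group(ph.group(1))));
        return Optional.of(path);
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(
                "(?<filename>rb_(?<region>amer)_.*_(?<date>[0-9]{8})[0-9]{6}(\\.md5|\\.dat\\.pgp))");
        String template = "${region}/${date}/${filename}";
        // Sample file name invented for illustration.
        System.out.println(destinationPath(pattern, template, "rb_amer_sales_20240131120000.dat.pgp"));
        // prints Optional[amer/20240131/rb_amer_sales_20240131120000.dat.pgp]
    }
}
```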

Tips

  • To re-upload a file, delete it from S3; File fetcher will automatically re-trigger the upload.