NUTRINTG (2) File Set Analyzer (DEPRECATED/OUTDATED DOCUMENTATION)

The File Set Analyzer is a process that scans an S3 bucket for files and prepares a full Delta message once all expected files meet all requirements. High-level architecture overview:

Main process steps

  1. The process is triggered by a cron expression (process.cron-expression). It then calculates the next expected delta date from the last entry in the FullDelta repository and process.delta-interval-in-days. If the repository contains no entry, the delta date is resolved from the folder name on S3.

  2. If all files required for the delta are present on S3, the process checks the validity of each data file by comparing the checksum stored in FileMetadataStore with the checksum from the corresponding checksum file. (Checksum verification can be enabled or disabled with the checksum-verification.enabled property.)

  3. Once all files have been analyzed and verified, the output message is generated. It lists all data files; checksum files are not included.

  4. Information about the successfully produced full delta is stored in the FullDelta repository.
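The date calculation in step 1 can be sketched as follows. This is a minimal illustration, not the service's actual code: the class and method names are hypothetical, and the rule used in isDue (a delta becomes due at midnight of its date in process.zone-id, shifted by process.midnight-offset-in-ms) is an assumption about how the offset is applied.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Optional;

// Hypothetical helper; names are illustrative and not taken from the file-set-analyzer code base.
final class NextDeltaDateSketch {

    /**
     * Next expected delta date: the last stored delta date plus the interval,
     * or the date resolved from the S3 folder name when nothing is stored yet.
     */
    static LocalDate nextDeltaDate(Optional<LocalDate> lastDelta,
                                   LocalDate dateFromS3Folder,
                                   int deltaIntervalInDays) {
        return lastDelta.map(d -> d.plusDays(deltaIntervalInDays))
                        .orElse(dateFromS3Folder);
    }

    /**
     * ASSUMPTION: the delta is due once "now" reaches midnight of the delta date
     * in the configured zone plus the configured offset in milliseconds.
     */
    static boolean isDue(LocalDate deltaDate, ZoneId zone, long midnightOffsetInMs, Instant now) {
        Instant dueAt = deltaDate.atStartOfDay(zone).toInstant().plusMillis(midnightOffsetInMs);
        return !now.isBefore(dueAt);
    }

    public static void main(String[] args) {
        LocalDate next = nextDeltaDate(Optional.of(LocalDate.of(2024, 1, 15)),
                                       LocalDate.of(2024, 1, 1), 1);
        System.out.println(next); // 2024-01-16
        System.out.println(isDue(next, ZoneId.of("UTC"), 0L,
                                 Instant.parse("2024-01-16T00:00:00Z"))); // true
    }
}
```

With a non-zero offset, for example 3600000, the same delta would only become due at 01:00 in the configured zone under this assumption.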

Spring Integration channel graph and error handling

The channel graph of the file-set-analyzer is shown below:

Processing the message

  1. First, the last delta date is retrieved from the FullDelta repository and the process checks whether the next delta is required.

  2. If the delta is required, processing continues. The file list is retrieved from the S3 bucket, and the process checks that all files are present and have valid checksums (FileMetadataStore holds the checksum that the file-fetcher calculated while sending the file).

  3. If all files are present and verified, the result handler is triggered.

If any error occurs during processing, an appropriate email message is generated.
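The checksum comparison from step 2 might look roughly like the sketch below. The class and method names are hypothetical. It assumes that the checksum file carries the hex digest as its first whitespace-separated token (as md5sum output does) and that a file is accepted without comparison when checksum-verification.enabled is false; neither detail is confirmed by this page.

```java
import java.util.Locale;

// Hypothetical helper; not the actual file-set-analyzer implementation.
final class ChecksumCheckSketch {

    /**
     * Extracts the digest from the checksum file content.
     * ASSUMPTION: the digest is the first whitespace-separated token.
     */
    static String digestFromChecksumFile(String content) {
        String trimmed = content.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty checksum file");
        }
        return trimmed.split("\\s+")[0].toLowerCase(Locale.ROOT);
    }

    /**
     * Compares the checksum stored at transfer time with the one from the checksum file.
     * ASSUMPTION: with verification disabled, every file counts as valid.
     */
    static boolean isValid(boolean verificationEnabled, String storedChecksum, String checksumFileContent) {
        if (!verificationEnabled) {
            return true;
        }
        if (storedChecksum == null) {
            return false;
        }
        String stored = storedChecksum.trim().toLowerCase(Locale.ROOT);
        return stored.equals(digestFromChecksumFile(checksumFileContent));
    }

    public static void main(String[] args) {
        String stored = "D41D8CD98F00B204E9800998ECF8427E";
        String fileContent = "d41d8cd98f00b204e9800998ecf8427e  rb_amer_child_20240115093000.dat.pgp\n";
        System.out.println(isValid(true, stored, fileContent)); // true
    }
}
```

The comparison is case-insensitive because hex digests are often written in either case; a mismatch would be one of the error conditions that leads to the email notification mentioned above.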

Most important properties

| Property | Description | Example |
|---|---|---|
| process.region-name | The region for which the file-set-analyzer is deployed. Each instance of the file-set-analyzer serves a single region. | amer |
| process.zone-id | Time zone of the process. Required to calculate the next delta date correctly. | UTC |
| process.midnight-offset-in-ms | Offset from midnight in milliseconds. Setting this property postpones the time of the required full delta check. | 0 |
| process.delta-interval-in-days | Number of days between consecutive deltas. Required to preserve the order of delta messages. | 1 |
| process.file-name-regex | Regex matched against file names. The example uses the named groups entity, deltaDate, and extension. | rb_amer_(?<entity>.*)_(?<deltaDate>[0-9]{8})[0-9]{6}\\.(?<extension>.*) |
| process.entities | Expected entities. For each entity, the process checks whether a complete set of files with the expected extensions has been received. | child,customer |
| process.extensions | File extensions. A file with each of these extensions is expected for every entity. | dat.pgp,md5 |
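To show how process.file-name-regex, process.entities, and process.extensions could work together, here is a small sketch of a completeness check. It uses the example values listed above; the class name, the method, and the sample file names are hypothetical and only illustrate the idea of requiring one file per entity and extension for a given delta date.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not the actual file-set-analyzer implementation.
final class FileSetCompletenessSketch {

    // Example value of process.file-name-regex from the properties listed above.
    static final Pattern FILE_NAME = Pattern.compile(
            "rb_amer_(?<entity>.*)_(?<deltaDate>[0-9]{8})[0-9]{6}\\.(?<extension>.*)");

    /** Returns the "entity.extension" combinations that are missing for the delta date (yyyyMMdd). */
    static List<String> missing(List<String> fileNames, String deltaDate,
                                List<String> entities, List<String> extensions) {
        Set<String> present = new HashSet<>();
        for (String name : fileNames) {
            Matcher m = FILE_NAME.matcher(name);
            // Only files that match the pattern and belong to the requested delta date count.
            if (m.matches() && m.group("deltaDate").equals(deltaDate)) {
                present.add(m.group("entity") + "." + m.group("extension"));
            }
        }
        List<String> missing = new ArrayList<>();
        for (String entity : entities) {
            for (String extension : extensions) {
                String key = entity + "." + extension;
                if (!present.contains(key)) {
                    missing.add(key);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "rb_amer_child_20240115093000.dat.pgp",
                "rb_amer_child_20240115093000.md5",
                "rb_amer_customer_20240115093000.dat.pgp");
        System.out.println(missing(files, "20240115",
                List.of("child", "customer"), List.of("dat.pgp", "md5"))); // [customer.md5]
    }
}
```

An empty result would mean the file set for that delta date is complete. Note that the doubled backslash in the example property value is an escaped form of a single regex backslash, which is also how it has to be written inside a Java string literal.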