NUTRINTG File Fetcher [WORK IN PROGRESS]

Simple explanation

Every morning, child extracts for all non-USA markets are created together with the Enterprise files. Around 15:00 (Warsaw time), the files for the USA are created. In both cases the files are automatically posted to the Inbound SFTP.

The Fetcher is responsible for fetching all files that have not yet been processed and uploading them to the fetched folder on S3.

Infrastructure

All infrastructure is hosted in the AWS cloud.

S3 bucket - where fetched files are stored

NoSQL DB - MongoDB / DocumentDB - used to store metadata and errored metadata

Most important process properties

| Property | Description | Example |
|---|---|---|
| process.fixed-delay | Polling interval in milliseconds | 5000 |
| process.remoteDirectory | Directory on the source SFTP from which files are fetched | Export/UAT |
| process.fileNamePattern | Pattern for extracting metadata stored in the file name and for filtering out unneeded files | (?<filename>(?<brandOrgCode>[^]*)(?<tableName>\D*)_?(?<date>[0-9]{8})(\.csv)) |
| process.destinationPathTemplate | Path in the S3 bucket where fetched files are stored | fetched/${date}/${filename} |
| process.retryCount | Number of upload attempts when fetching a file | 3 |

Flow chart

 

Flow chart - elements explanation

Source SFTP - One of the SFTP servers where files extracted by SFMC are stored. The poller rotates between these servers.

File filtering - After obtaining the list of files on the current SFTP, we filter out the following files:

  • Files not matching the pattern specified in properties

  • Files already processed - entries for those files are in the sfmcToCdpFileFetcher_fileMetaDataStore collection

  • Files already processed but with errors - entries for those files are in the sfmcToCdpFileFetcher_erroredFileMetaDataStore collection

  • Files currently being processed - entries for those files are in the sfmcToCdpFileFetcher_progressMark collection
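
The filtering rules above can be sketched roughly as follows. This is only an illustration in Python, not the Fetcher's actual code: the three sets stand in for the MongoDB collections, and the function name, pattern and file names are hypothetical.

```python
import re


def select_files_to_fetch(remote_names, pattern, processed, errored, in_progress):
    """Return the remote file names that still need to be fetched."""
    regex = re.compile(pattern)
    selected = []
    for name in remote_names:
        if not regex.fullmatch(name):
            continue  # does not match the configured file name pattern
        if name in processed or name in errored or name in in_progress:
            continue  # already handled, failed earlier, or being handled now
        selected.append(name)
    return selected


# Hypothetical data; a real run would read these from the SFTP and MongoDB.
remote_names = [
    "ABC_Orders_20240114.csv",
    "ABC_Orders_20240115.csv",
    "XYZ_Orders_20240115.csv",
    "XYZ_Items_20240115.csv",
    "readme.txt",
]
to_fetch = select_files_to_fetch(
    remote_names,
    pattern=r"[^_]+_\D+_[0-9]{8}\.csv",  # simplified stand-in pattern
    processed={"ABC_Orders_20240114.csv"},
    errored={"XYZ_Orders_20240115.csv"},
    in_progress={"XYZ_Items_20240115.csv"},
)
print(to_fetch)
```

Only files that pass all four checks reach the next step, so a file that failed earlier is not retried automatically in this sketch.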

Resolve destination path - Based on the file name and the pattern in properties, we extract the batch date. With this information and the path template from properties, we build the destination path in the S3 bucket.
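
A minimal sketch of this step in Python is shown below. It assumes a simplified pattern written with Python's `(?P<name>...)` group syntax rather than the configured one, and the function name and sample file name are hypothetical. The template string is the example value of process.destinationPathTemplate.

```python
import re
from string import Template

# Assumption: simplified stand-in for process.fileNamePattern.
FILE_NAME_PATTERN = re.compile(
    r"(?P<filename>(?P<brandOrgCode>[^_]+)_(?P<tableName>\D+?)_(?P<date>[0-9]{8})\.csv)"
)
# Example value of process.destinationPathTemplate.
DESTINATION_TEMPLATE = Template("fetched/${date}/${filename}")


def resolve_destination_path(file_name):
    """Build the S3 key for a file from the named groups in its name."""
    match = FILE_NAME_PATTERN.fullmatch(file_name)
    if match is None:
        raise ValueError(f"File name does not match pattern: {file_name}")
    # Unused groups (brandOrgCode, tableName) are ignored by substitute().
    return DESTINATION_TEMPLATE.substitute(match.groupdict())


print(resolve_destination_path("ABC_Orders_20240115.csv"))
```

Because the template is filled from named groups, changing the folder layout would only require a different template, as long as it uses group names present in the pattern.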

Add progress mark - Create an entry in MongoDB marking the file as currently being processed.

Upload file to S3 bucket - Upload the file in parts (as configured in properties) to the S3 destination determined by the destination path resolved in the previous step.
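
The sketch below illustrates one way a part-by-part upload with a limited number of attempts could look. It is not the Fetcher's implementation: `upload_part` is a hypothetical callback standing in for the S3 client, the part size is arbitrary, and it assumes that process.retryCount counts attempts for the whole file rather than per part.

```python
import io


def iter_parts(stream, part_size):
    """Yield consecutive chunks of at most part_size bytes from a binary stream."""
    while True:
        chunk = stream.read(part_size)
        if not chunk:
            break
        yield chunk


def upload_with_retry(open_stream, upload_part, part_size, retry_count):
    """Attempt the whole upload up to retry_count times; return the attempt that succeeded."""
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    last_error = None
    for attempt in range(1, retry_count + 1):
        try:
            # Reopen the source on every attempt so parts start from the beginning.
            with open_stream() as stream:
                for number, chunk in enumerate(iter_parts(stream, part_size), start=1):
                    upload_part(number, chunk)
            return attempt
        except OSError as error:
            last_error = error
    raise last_error


# Hypothetical uploader that fails once to demonstrate the retry.
received = {}
calls = {"count": 0}


def flaky_upload_part(number, chunk):
    calls["count"] += 1
    if calls["count"] == 2:
        raise OSError("simulated network error")
    received[number] = chunk


attempts = upload_with_retry(
    lambda: io.BytesIO(b"abcdefghij"), flaky_upload_part, part_size=4, retry_count=3
)
print(attempts, received)
```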

Save metadata - At this point we are sure that the file has been uploaded, so we add it to the sfmcToCdpFileFetcher_fileMetaDataStore collection.

Remove progress mark - We can now remove the entry from the sfmcToCdpFileFetcher_progressMark collection.

Save errored metadata - An entry is added to the sfmcToCdpFileFetcher_erroredFileMetaDataStore collection. Besides the file name, it also contains the stack trace of the error that caused processing of the file to fail.
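
The bookkeeping around a single file can be sketched as below. This is an illustrative Python outline only: the set and dictionaries stand in for the three MongoDB collections, `upload` is a hypothetical callable, and the field names `fileName` and `stackTrace` are assumptions. The sketch also clears the progress mark after a failure, which is an assumption and not something stated above.

```python
import traceback


def process_file(file_name, upload, progress_marks, file_metadata, errored_metadata):
    """Run one file through the mark, upload, and metadata steps. Returns True on success."""
    progress_marks.add(file_name)  # add progress mark
    try:
        upload(file_name)  # upload file to the S3 bucket
        file_metadata[file_name] = {"fileName": file_name}  # save metadata
        return True
    except Exception:
        # Save errored metadata together with the stack trace of the failure.
        errored_metadata[file_name] = {
            "fileName": file_name,
            "stackTrace": traceback.format_exc(),
        }
        return False
    finally:
        # Remove progress mark (assumed to happen on failure as well).
        progress_marks.discard(file_name)


# Hypothetical usage with in-memory stand-ins for the collections.
progress_marks, file_metadata, errored_metadata = set(), {}, {}


def failing_upload(name):
    raise RuntimeError("boom")


ok = process_file("good.csv", lambda name: None, progress_marks, file_metadata, errored_metadata)
bad = process_file("bad.csv", failing_upload, progress_marks, file_metadata, errored_metadata)
print(ok, bad, sorted(file_metadata), sorted(errored_metadata))
```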