NUTRINTG File Processor (DEPRECATED/OUTDATED DOCUMENTATION)


Infrastructure

All infrastructure is hosted in the AWS cloud.

Input queue - the entry point to the service

ACK Timeout = 30 min - the time reserved for the service to acknowledge a message. If no acknowledgement is sent within this time, the message becomes available for consumption again.

Message retention period = 7 days - the maximum time a message can wait in the queue. If it is not consumed within this period, it is removed automatically without being processed.

Max redelivery counter = 1 - the maximum number of retries for processing a message. If the message is not consumed successfully after the last attempt, it is moved to the dead-letter queue.

Dead-letter queue - holds messages that were not processed successfully within the given timeframe (ACK Timeout). The dead-letter queue is shared by all services.
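
The queue settings above could be expressed as SQS queue attributes roughly as follows. This is only a sketch: mapping ACK Timeout to VisibilityTimeout and the max redelivery counter to maxReceiveCount is an assumption, and the region and account number in the ARN are hypothetical.

```python
import json


def build_input_queue_attributes(dead_letter_queue_arn: str) -> dict:
    """Return SQS attributes mirroring the documented queue settings.

    SQS expects every attribute value as a string.
    """
    return {
        "FifoQueue": "true",  # queue names in this document end with .fifo
        # Assumption: ACK Timeout corresponds to the SQS visibility timeout.
        "VisibilityTimeout": str(30 * 60),  # 30 min in seconds
        "MessageRetentionPeriod": str(7 * 24 * 60 * 60),  # 7 days in seconds
        # Assumption: max redelivery counter corresponds to maxReceiveCount.
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dead_letter_queue_arn,
                "maxReceiveCount": 1,
            }
        ),
    }


# Hypothetical ARN; only the queue name comes from the service configuration.
attrs = build_input_queue_attributes(
    "arn:aws:sqs:eu-west-1:123456789012:sfmc-to-cdp-dead-letter-queue-test.fifo"
)
```

The resulting dictionary has the shape accepted by the Attributes parameter of the SQS create_queue and set_queue_attributes calls.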

Output queue - the exit point from the service

Aurora - SQL database; PostgreSQL implementation. Used to execute SQL queries on files in order to create a new set of files to be sent to CDP.

NoSQL DB - MongoDB / DocumentDB - used to store the process configuration

Service configuration

InputQueueName = sfmc-to-cdp-file-processor-input-queue-test.fifo

OutputQueueName = sfmc-to-cdp-file-encrypter-input-queue-test.fifo

DeadLetterQueueName = sfmc-to-cdp-dead-letter-queue-test.fifo (COMMON FOR ALL SFMC-TO-CDP SERVICES)

AuroraConnection

OutputFileBucketName = newSfmcToCdp

OutputFileFolderName = output

To check/discuss

  1. Are files on S3 produced by an SQL query overwritten when the query is executed again?
    Overwriting is not a concern, since the file names contain a different ‘exportDateTime’ for each query execution.

  2. Should files generated by SQL queries be deleted manually? There is probably a rule on S3 that automatically deletes files older than X days (to be confirmed).

  3. Does file-analyzer send a date or a dateTime? Which date/dateTime is assigned to records put into tables? Is the value propagated to further services?
    File-analyzer sends a date. Records in tables are assigned a ‘date’. The ‘date’ values are the same for records in the DB and in the messages sent by File-analyzer.

Processing messages

Flow chart

 

Flow chart description for processing messages

Receive a message on the input queue

The process is triggered by an input message sent to the input SQS queue.

Message format:

{
  "strategyCode": "MJNPOL_PUSH",
  "brandOrgCode": "MJNPOL",
  "date": "20200505"
}

brandOrgCode - used to select records to process from DB

strategyCode - based on the value the File Processor reads configuration

date - used to select records to process from DB; date format 'yyyyMMdd'

Validate the input message

The input message must contain fields:

  • strategyCode

  • brandOrgCode

  • date - in the correct format

If any of the fields is missing, blank, or incorrectly formatted, an error message is sent to the error queue.
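
A minimal sketch of this validation step, assuming the message has already been parsed into a dictionary. The function names and the error texts are hypothetical; only the three field names and the date format come from this page.

```python
from datetime import datetime

REQUIRED_FIELDS = ("strategyCode", "brandOrgCode", "date")


def is_valid_date(value: str) -> bool:
    """Check that value is exactly 8 digits and a real calendar date."""
    if len(value) != 8 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


def validate_input_message(message: dict) -> list:
    """Return a list of validation errors; an empty list means the message is valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = message.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field} is missing or blank")
    date_value = message.get("date")
    # Only check the format when the date is present and non-blank.
    if isinstance(date_value, str) and date_value.strip():
        if not is_valid_date(date_value):
            errors.append("date has an incorrect format")
    return errors


valid = validate_input_message(
    {"strategyCode": "MJNPOL_PUSH", "brandOrgCode": "MJNPOL", "date": "20200505"}
)
invalid = validate_input_message({"strategyCode": "MJNPOL_PUSH", "date": "20201332"})
```

Collecting all errors instead of stopping at the first one lets the error message report every problem in a single pass.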

Send Error Mail

The error mail body is prepared from a template and sent via Amazon Mail Service. The message does not need to be removed from the input queue.

Read DB configuration

Configuration is stored in the NoSQL DB. The primary key is strategyCode.

Configuration contains the following fields:

strategyCode - the primary key

queries - list of queries to execute (the outputFiles entries in the example below, each with a queryTemplate and a fileNamePrefix). Query templates can be populated with brandOrgCode, date and treatmentCodes.

treatmentCodes - list of treatment codes linked to the given brandOrgCodeId

 

example

{
  "strategyCode": "MJNPOL_PUSH",
  "outputFiles": [
    {
      "queryTemplate": "SELECT ... FROM TABLE1 ...",
      "fileNamePrefix": "C_APP-PUSH-ADDR-TRACK_SFMC"
    },
    {
      "queryTemplate": "SELECT ... FROM TABLE2 ...",
      "fileNamePrefix": "C_CMP-METADATA_SFMC"
    }
  ],
  "treatmentCodes": [ "codeA", "codeB" ]
}

 

Execute SQL queries

Queries are executed in parallel. Each queryTemplate is taken from the DB configuration and populated with values (treatmentCodes, exportDateTime, etc.). The process waits for all queries to finish before moving on to the next step.

All queries must write their S3 result files, including headers, to the same path. The output file name is defined in the queries. If a query does not produce any results, a file containing only headers is generated.

If at least one query fails, an error mail is sent.
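
The parallel execution step could look roughly like the sketch below. It uses a thread pool as one possible approach, not necessarily the one used by the service. The run_query callable stands in for the real Aurora call, and all function names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_queries(queries, run_query):
    """Run all queries in parallel and wait until every one has finished.

    Returns (results, errors); errors holds (query, exception) pairs.
    """
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        futures = [pool.submit(run_query, query) for query in queries]
        for query, future in zip(queries, futures):
            try:
                # result() blocks until this particular query is done.
                results.append(future.result())
            except Exception as exc:
                errors.append((query, exc))
    return results, errors


def fake_run_query(query):
    """Stand-in for the database call; fails for TABLE2 to show error handling."""
    if "TABLE2" in query:
        raise RuntimeError("query failed")
    return f"result file for: {query}"


results, errors = execute_queries(
    ["SELECT 1 FROM TABLE1", "SELECT 1 FROM TABLE2"], fake_run_query
)
if errors:
    # At least one query failed, so this is where the error mail would be sent.
    print(f"{len(errors)} query failed")
```

Failures are collected rather than raised immediately, so the remaining queries still run to completion before the error mail decision is made.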

 

outputFileName = {fileNamePrefix}_{brandOrgCode}_{exportDateTime}.dat

fileNamePrefix - value from DB configuration

exportDateTime - the time at which the queries are executed. All files generated in the same batch must have the same value. The format is ‘yyyyMMddHHmmss’.

s3FolderPath = s3://{OutputFileBucketName}/{OutputFileFolderName}/{strategyCode}/{date}

OutputFileBucketName and OutputFileFolderName are defined in service configuration
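
The two naming patterns above can be illustrated with a short sketch. The helper names are hypothetical; the sample values are taken from the service configuration and the examples on this page.

```python
from datetime import datetime


def build_output_file_name(file_name_prefix, brand_org_code, export_date_time):
    """Build {fileNamePrefix}_{brandOrgCode}_{exportDateTime}.dat."""
    stamp = export_date_time.strftime("%Y%m%d%H%M%S")  # yyyyMMddHHmmss
    return f"{file_name_prefix}_{brand_org_code}_{stamp}.dat"


def build_s3_folder_path(bucket_name, folder_name, strategy_code, date):
    """Build s3://{OutputFileBucketName}/{OutputFileFolderName}/{strategyCode}/{date}."""
    return f"s3://{bucket_name}/{folder_name}/{strategy_code}/{date}"


# One exportDateTime is captured per batch and reused for every file in it.
export_date_time = datetime(2020, 5, 6, 14, 23, 56)
file_name = build_output_file_name("C_CMP-METADATA_SFMC", "MJNPOL", export_date_time)
folder_path = build_s3_folder_path("newSfmcToCdp", "output", "MJNPOL_PUSH", "20200505")
```

With these inputs, file_name is C_CMP-METADATA_SFMC_MJNPOL_20200506142356.dat and folder_path is s3://newSfmcToCdp/output/MJNPOL_PUSH/20200505.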

 

Send a message to the output queue

Once the message is sent to the output queue, the message from the input queue is removed.

Message format:

{
  "files": [
    "s3://someBucket/someFolder/output/MJNPOL_PUSH/20200505/C_APP-PUSH_MJNPOL_20200506142356.dat",
    "s3://someBucket/someFolder/output/MJNPOL_PUSH/20200505/C_CMP-PUSH_MJNPOL_20200506142356.dat"
  ],
  "strategyCode": "MJNPOL_REGULAR",
  "date": "20210601"
}

files - list of all files generated by queries
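
A sketch of how the output message body might be assembled. It assumes that strategyCode and date are copied unchanged from the input message, which this page does not state explicitly. The function name is hypothetical.

```python
import json


def build_output_message(s3_folder_path, file_names, strategy_code, date):
    """Serialize the output queue message listing every generated file."""
    return json.dumps(
        {
            "files": [f"{s3_folder_path}/{name}" for name in file_names],
            # Assumption: both values are propagated from the input message.
            "strategyCode": strategy_code,
            "date": date,
        }
    )


body = build_output_message(
    "s3://newSfmcToCdp/output/MJNPOL_PUSH/20200505",
    [
        "C_APP-PUSH-ADDR-TRACK_SFMC_MJNPOL_20200506142356.dat",
        "C_CMP-METADATA_SFMC_MJNPOL_20200506142356.dat",
    ],
    "MJNPOL_PUSH",
    "20200505",
)
```

The input message would be deleted only after this body has been sent to the output queue successfully, so that a failed send leaves the input message available for redelivery.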

 

Glossary

originalMessage - the message read from the input queue