NUTRINTG File Processor (DEPRECATED/OUTDATED DOCUMENTATION)
Infrastructure
All infrastructure is hosted in the AWS cloud.
Input queue - the entry point to the service
ACK Timeout = 30 min - time reserved for acknowledging a consumed message. If the acknowledgement is not sent within this time, the message becomes available to be consumed again.
Message retention period = 7 days - how long a message can wait in the queue to be consumed. If it is not consumed within this time, it is removed automatically without being processed.
Max redelivery counter = 1 - maximum number of retries to process the message. If the message is still not consumed successfully after the last retry, it is moved to the dead-letter queue.
Dead-letter queue - holds messages that were not processed successfully within the given timeframe (ACK Timeout). The dead-letter queue is shared by all services.
Output queue - the exit point from the service
Aurora - SQL database (PostgreSQL-compatible edition). Used to execute SQL queries on the file data in order to create a new set of files to be sent to CDP.
NoSQL DB - MongoDB / DocumentDB - used to store the process configuration
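The queue settings above map onto standard Amazon SQS queue attributes. The sketch below builds such an attribute map for the FIFO input queue; it is not taken from the service's actual infrastructure code. It assumes that "Max redelivery counter" counts retries after the first delivery, so SQS `maxReceiveCount` becomes 1 + 1 = 2, and the dead-letter queue ARN is a placeholder you would supply.

```python
import json

# Values from the Infrastructure section, converted to seconds as SQS expects.
ACK_TIMEOUT_SECONDS = 30 * 60              # ACK Timeout = 30 min
RETENTION_SECONDS = 7 * 24 * 60 * 60       # Message retention period = 7 days
MAX_REDELIVERIES = 1                       # Max redelivery counter = 1


def input_queue_attributes(dead_letter_queue_arn: str) -> dict:
    """Return SQS attributes for the FIFO input queue (a sketch, not the real setup).

    Assumption: the redelivery counter counts retries after the first delivery,
    so SQS maxReceiveCount = 1 (first delivery) + MAX_REDELIVERIES.
    """
    redrive_policy = {
        "deadLetterTargetArn": dead_letter_queue_arn,
        "maxReceiveCount": str(1 + MAX_REDELIVERIES),
    }
    # SQS expects every attribute value as a string.
    return {
        "FifoQueue": "true",
        "VisibilityTimeout": str(ACK_TIMEOUT_SECONDS),
        "MessageRetentionPeriod": str(RETENTION_SECONDS),
        "RedrivePolicy": json.dumps(redrive_policy),
    }
```

The resulting map could be passed as `Attributes` to boto3's `sqs.create_queue(QueueName=..., Attributes=...)`. SQS requires the dead-letter queue of a FIFO queue to be a FIFO queue as well, which matches the `.fifo` suffix of DeadLetterQueueName.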
Service configuration
InputQueueName = sfmc-to-cdp-file-processor-input-queue-test.fifo
OutputQueueName = sfmc-to-cdp-file-encrypter-input-queue-test.fifo
DeadLetterQueueName = sfmc-to-cdp-dead-letter-queue-test.fifo (COMMON FOR ALL SFMC-TO-CDP SERVICES)
AuroraConnection
OutputFileBucketName = newSfmcToCdp
OutputFileFolderName = output
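A minimal sketch of loading these settings at startup, assuming each one is exposed under the same key as listed above (for example as an environment variable). The `ServiceConfig` and `load_config` names are hypothetical. AuroraConnection is left out because its format is not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Keys as listed in the Service configuration section (AuroraConnection omitted).
REQUIRED_KEYS = (
    "InputQueueName",
    "OutputQueueName",
    "DeadLetterQueueName",
    "OutputFileBucketName",
    "OutputFileFolderName",
)


@dataclass(frozen=True)
class ServiceConfig:
    input_queue_name: str
    output_queue_name: str
    dead_letter_queue_name: str
    output_file_bucket_name: str
    output_file_folder_name: str


def load_config(source: Mapping[str, str] = os.environ) -> ServiceConfig:
    """Build the configuration, failing fast if any required key is missing or blank."""
    missing = [key for key in REQUIRED_KEYS if not source.get(key, "").strip()]
    if missing:
        raise KeyError("missing configuration keys: " + ", ".join(missing))
    for key in ("InputQueueName", "OutputQueueName", "DeadLetterQueueName"):
        # The queues listed above are FIFO queues; SQS requires their names to end in ".fifo".
        if not source[key].endswith(".fifo"):
            raise ValueError(f"{key} must name a FIFO queue: {source[key]}")
    return ServiceConfig(
        input_queue_name=source["InputQueueName"],
        output_queue_name=source["OutputQueueName"],
        dead_letter_queue_name=source["DeadLetterQueueName"],
        output_file_bucket_name=source["OutputFileBucketName"],
        output_file_folder_name=source["OutputFileFolderName"],
    )
```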
To check/discuss
Are the S3 files produced by an SQL query overwritten when the query is executed again?
No overwriting mechanism is needed, since the file names contain an ‘exportDateTime’ that differs between query executions. As for whether files generated by SQL queries must be erased manually: I think there is an S3 rule that automatically deletes files older than X days.
Does file-analyzer send a date or a dateTime? Which date/dateTime is assigned to records put into tables? Is the value propagated to further services?
File-analyzer sends a date. Records in tables are assigned a ‘date’. The ‘date’ values are the same for records in the DB and in the messages sent by file-analyzer.
Processing messages
Flow chart
Flow chart description for processing messages
Receive a message on the input queue
The process is triggered by an input message sent to the input SQS queue.
Message format:
{
"strategyCode": "MJNPOL_PUSH",
"brandOrgCode": "MJNPOL",
"date": "20200505"
}
brandOrgCode - used to select the records to process from the DB
strategyCode - the key under which the File Processor reads its configuration
date - used to select the records to process from the DB; date format 'yyyymmdd'
Validate the input message
The input message must contain fields:
strategyCode
brandOrgCode
date - in the correct format
If any of the fields is missing, blank, or incorrectly formatted, an error message is sent to the error queue.
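The checks above can be sketched as follows. `parse_input_message`, `InputMessage` and `ValidationError` are hypothetical names, and collecting all problems into a single error (rather than stopping at the first one) is an assumption about how the error is reported.

```python
import datetime as dt
import json
import re
from dataclasses import dataclass

REQUIRED_FIELDS = ("strategyCode", "brandOrgCode", "date")
DATE_PATTERN = re.compile(r"\d{8}")  # 'yyyymmdd'


class ValidationError(ValueError):
    """Raised when the input message is missing a field or has a bad format."""


@dataclass(frozen=True)
class InputMessage:
    strategy_code: str
    brand_org_code: str
    date: dt.date


def parse_input_message(body: str) -> InputMessage:
    """Parse and validate the JSON body of an input-queue message."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"message is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValidationError("message must be a JSON object")

    # Collect every problem so that the error report can list all of them at once.
    errors = []
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field} is missing or blank")

    parsed_date = None
    raw_date = payload.get("date")
    if isinstance(raw_date, str) and raw_date.strip():
        try:
            # fullmatch rejects '2020-05-05'; strptime rejects impossible dates like '20201305'.
            if not DATE_PATTERN.fullmatch(raw_date):
                raise ValueError(raw_date)
            parsed_date = dt.datetime.strptime(raw_date, "%Y%m%d").date()
        except ValueError:
            errors.append(f"date '{raw_date}' is not in yyyymmdd format")

    if errors:
        raise ValidationError("; ".join(errors))
    return InputMessage(payload["strategyCode"], payload["brandOrgCode"], parsed_date)
```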
Send Error Mail
The error mail body is prepared from a template and sent via Amazon Simple Email Service (SES). The message does not need to be removed from the input queue.
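The actual mail template is not documented, so the template text below is purely illustrative. The sketch fills it with Python's `string.Template`; the placeholder names, including `originalMessage` (see the Glossary), and the `build_error_mail_body` function are assumptions.

```python
from string import Template

# Hypothetical template; the real template is not part of this documentation.
ERROR_MAIL_TEMPLATE = Template(
    "NUTRINTG File Processor could not process a message.\n"
    "Strategy code: $strategyCode\n"
    "Reason: $reason\n"
    "Original message: $originalMessage\n"
)


def build_error_mail_body(original_message: str, strategy_code: str, reason: str) -> str:
    """Fill the template; a blank strategy code is shown as '<unknown>'."""
    return ERROR_MAIL_TEMPLATE.substitute(
        strategyCode=strategy_code or "<unknown>",
        reason=reason,
        originalMessage=original_message,
    )
```

Assuming the mail service is Amazon SES, the body could then be sent with boto3's `ses.send_email(Source=..., Destination={"ToAddresses": [...]}, Message={"Subject": {"Data": ...}, "Body": {"Text": {"Data": body}}})`; the sender and recipient addresses are not documented.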
Read DB configuration
The configuration is stored in the NoSQL DB, with strategyCode as the primary key.
The configuration contains the following fields:
strategyCode - the primary key
outputFiles - list of output files to generate. Each entry contains a queryTemplate (the SQL query to execute; it is a template that can be populated with brandOrgCode, date and treatmentCodes) and a fileNamePrefix (used to build the output file name).
treatmentCodes - list of treatment codes linked to the given brandOrgCode
Example:
{
"strategyCode": "MJNPOL_PUSH",
"outputFiles": [
{
"queryTemplate": "SELECT ... FROM TABLE1 ...",
"fileNamePrefix": "C_APP-PUSH-ADDR-TRACK_SFMC"
},
{
"queryTemplate": "SELECT ... FROM TABLE2 ...",
"fileNamePrefix": "C_CMP-METADATA_SFMC"
}
],
"treatmentCodes": [
"codeA",
"codeB"
]
}
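A sketch of turning the configuration document into typed objects. It assumes the document has already been fetched by strategyCode, for example with pymongo's `collection.find_one({"strategyCode": strategy_code})`; the class and function names are hypothetical, and treating an empty `outputFiles` list as an error is an assumption.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class OutputFile:
    query_template: str
    file_name_prefix: str


@dataclass(frozen=True)
class StrategyConfig:
    strategy_code: str
    output_files: List[OutputFile]
    treatment_codes: List[str]


def parse_strategy_config(document: Mapping[str, Any]) -> StrategyConfig:
    """Map a configuration document to typed objects; extra keys such as '_id' are ignored."""
    output_files = [
        OutputFile(entry["queryTemplate"], entry["fileNamePrefix"])
        for entry in document["outputFiles"]
    ]
    if not output_files:
        raise ValueError(f"no output files configured for {document['strategyCode']}")
    return StrategyConfig(
        strategy_code=document["strategyCode"],
        output_files=output_files,
        treatment_codes=list(document.get("treatmentCodes", [])),
    )
```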
Execute SQL queries
Queries are executed in parallel. Each queryTemplate is taken from the DB configuration and populated with values (treatmentCodes, exportDateTime, etc.). The process waits for all queries to finish before moving to the next step.
All queries must write their result files, including headers, to the same S3 path. The output file name is defined in the queries. If a query returns no results, a file containing only the headers is generated.
If at least one query fails, an error mail is sent.
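The rule that every result file starts with a header row, even when the query returns nothing, can be illustrated with Python's `csv` module. The delimiter and quoting of the real `.dat` files are not documented; comma-separated output and the `render_result_file` name are assumptions.

```python
import csv
import io
from typing import Iterable, Sequence


def render_result_file(headers: Sequence[str], rows: Iterable[Sequence[object]],
                       delimiter: str = ",") -> str:
    """Serialize a query result, always starting with the header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(headers)  # written even when the query returned no rows
    writer.writerows(rows)
    return buffer.getvalue()
```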
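The parallel execution and the wait-for-all rule can be sketched with `concurrent.futures`. The `execute` callable stands in for whatever runs one query against Aurora and is hypothetical; so is passing brandOrgCode, date and treatmentCodes as a parameter dictionary, which would let the database driver bind the values instead of formatting them into the SQL text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def run_queries_in_parallel(
    query_templates: Sequence[str],
    params: Dict[str, object],
    execute: Callable[[str, Dict[str, object]], None],
    max_workers: int = 4,
) -> List[str]:
    """Run every query concurrently, wait for all of them, and return the failures.

    `execute` is a hypothetical callable that runs one query and raises on failure.
    """
    failures: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(execute, template, params) for template in query_templates]
        # result() blocks until that query has finished, so the loop ends only
        # once every query is done.
        for template, future in zip(query_templates, futures):
            try:
                future.result()
            except Exception as exc:  # one failed query must not hide the others
                failures.append(f"{template}: {exc}")
    return failures
```

If the returned list is not empty, the error mail is sent; collecting every failure first means a single mail can report all failed queries.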
outputFileName = {fileNamePrefix}_{brandOrgCode}_{exportDateTime}.dat
fileNamePrefix - value from DB configuration
exportDateTime - the time the queries are executed. All files generated in the same batch must have the same value. The format is ‘yyyyMMddHHmmss’.
s3FolderPath = s3://{OutputFileBucketName}/{OutputFileFolderName}/{strategyCode}/{date}
OutputFileBucketName and OutputFileFolderName are defined in the service configuration.
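The naming rules above can be expressed as small helpers. The function names are hypothetical, and the sketch assumes `now` is taken once when the batch starts; whether the real service uses local time or UTC is not documented.

```python
from datetime import datetime
from typing import List, Sequence

EXPORT_DATE_TIME_FORMAT = "%Y%m%d%H%M%S"  # 'yyyyMMddHHmmss'


def s3_folder_path(bucket: str, folder: str, strategy_code: str, date: str) -> str:
    return f"s3://{bucket}/{folder}/{strategy_code}/{date}"


def output_file_name(file_name_prefix: str, brand_org_code: str, export_date_time: str) -> str:
    return f"{file_name_prefix}_{brand_org_code}_{export_date_time}.dat"


def batch_output_uris(prefixes: Sequence[str], bucket: str, folder: str,
                      strategy_code: str, brand_org_code: str, date: str,
                      now: datetime) -> List[str]:
    """Build the S3 URIs for one batch; exportDateTime is computed once so all files share it."""
    export_date_time = now.strftime(EXPORT_DATE_TIME_FORMAT)
    folder_path = s3_folder_path(bucket, folder, strategy_code, date)
    return [f"{folder_path}/{output_file_name(p, brand_org_code, export_date_time)}"
            for p in prefixes]
```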
Send a message to the output queue
Once the message has been sent to the output queue, the original message is removed from the input queue.
Message format:
{
"files": [
"s3://someBucket/output/MJNPOL_PUSH/20200505/C_APP-PUSH-ADDR-TRACK_SFMC_MJNPOL_20200506142356.dat",
"s3://someBucket/output/MJNPOL_PUSH/20200505/C_CMP-METADATA_SFMC_MJNPOL_20200506142356.dat"
],
"strategyCode": "MJNPOL_PUSH",
"date": "20200505"
}
files - list of all files generated by the queries
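A sketch of building the output message and enforcing the send-then-remove order. The function names and injected callables are hypothetical, and copying strategyCode and date from originalMessage is an assumption based on the example above.

```python
import json
from typing import Callable, List


def build_output_message(files: List[str], strategy_code: str, date: str) -> str:
    """Serialize the output message; strategyCode and date are assumed to come from originalMessage."""
    return json.dumps({"files": files, "strategyCode": strategy_code, "date": date})


def forward_and_acknowledge(message_body: str,
                            send_to_output: Callable[[str], None],
                            delete_from_input: Callable[[], None]) -> None:
    """Send first and remove the input message only if sending did not raise."""
    send_to_output(message_body)
    # Reached only after a successful send; if sending fails, the input message stays
    # in the queue and becomes available again after the ACK Timeout.
    delete_from_input()
```

With boto3, `send_to_output` could wrap `sqs.send_message(QueueUrl=..., MessageBody=..., MessageGroupId=...)` (FIFO queues require a MessageGroupId, plus a MessageDeduplicationId unless content-based deduplication is enabled), and `delete_from_input` could wrap `sqs.delete_message(QueueUrl=..., ReceiptHandle=...)` using the receipt handle of originalMessage.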
Glossary
originalMessage - the message read from the input queue