NUTRINTG (2) File Set Analyzer

Short Summary

Input

cron scheduler of a specific region or region batch

Output

  • message to file-set-analyzer-to-file-decrypter SQS queue

  • email notification about the result

Region-awareness

6 instances with different Spring Boot profiles (amer, sea, eu, us1, us2, us3)

Scalability

not possible; at most 1 instance per profile because of the cron scheduler

Metadata Collections

  • cdpToSfmc_fileMetadataStore

  • cdpToSfmcFileSetAnalyzer_fullDeltas

Flow Chart

  1. Run on a cron schedule. In production, the schedule is set by the process.cron-expression property to 0 0 * * * * (at the start of every hour) in UTC in all regions.

  2. On each run, calculate the next delta according to the following algorithm.

    • Try to retrieve the latest verified delta from the cdpToSfmcFileSetAnalyzer_fullDeltas collection. The latest document is found by sorting by createdDate. An example document:

      { "_id" : ObjectId("61276062e9d552087ff86224"), "region" : "amer_usa1", "deltaDate" : "20211001", "createdDate" : ISODate("2021-10-01T00:00:00.000Z") }
    • If there are no documents, the next delta is required automatically. To calculate it, take the current time in milliseconds in UTC, subtract the value of the process.midnight-offset-in-ms property, and convert the result to a calendar date.

    • If there is a document, check whether its deltaDate is the current day for the region. To do this, take the current time in milliseconds in UTC, subtract the value of the process.midnight-offset-in-ms property, convert the result to a calendar date, and compare it with the deltaDate of the latest document from the database.

    • If the latest document has the current deltaDate, the next delta is not required: send a success email and finish. Otherwise, add N days to deltaDate to get the next delta. N is set by the process.delta-interval-in-days property and is supposed to be 1.

  3. Next, check whether all the necessary files are on S3. Look for the files in the directory whose first part is the value of the process.master-region-name property and whose second part is the delta date (e.g. amer_usa/20211001). File names are matched against the regex from the process.file-name-regex property (e.g. rb_amer_usa_(?<entity>.*)_(?<deltaDate>[0-9]{8})[0-9]{6}\\.(?<extension>.*)). The expected set is built by substituting every value of the process.entities property for entity and every value of the process.extensions property for extension, which yields the Cartesian product of entities and extensions. For example, if process.entities for a region is set to order,customer and process.extensions is set to dat.pgp,md5, then 4 files are expected (listed below). If any of the files is missing, send an error email and retry the whole process during the next run.

    • rb_amer_usa_order_20211001120000.dat.pgp

    • rb_amer_usa_order_20211001120000.md5

    • rb_amer_usa_customer_20211001120000.dat.pgp

    • rb_amer_usa_customer_20211001120000.md5

  4. Files usually arrive in pairs of .dat.pgp and .md5, so checksums can be verified. Whether verification runs is determined by the checksum-verification.enabled property.

  5. If checksum verification is enabled, calculate the checksum of each .dat.pgp file and compare it to the content of the corresponding .md5 file. For verification to succeed, a document for the file has to exist in the cdpToSfmc_fileMetadataStore collection (created by File Sender). The document is then updated with the calculated checksum. If verification fails for any file, send an error email and retry the whole process during the next run.

  6. If verification succeeds, create a list of the data files only (i.e. .dat.pgp), send them in a message to the outbound queue (aws.sqs.queue-url), create a new metadata document in the cdpToSfmcFileSetAnalyzer_fullDeltas collection, and send a success email. An example message is below.

    { "region" : "amer_usa1", "deltaDate" : "20211001", "filePaths" : [ "amer_usa/20211001/rb_amer_usa_order_20211001120000.dat.pgp", "amer_usa/20211001/rb_amer_usa_customer_20211001120000.dat.pgp" ] }
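
The delta calculation in step 2 can be sketched as follows. This is a minimal Python illustration, not the service's actual code (the service itself is a Spring Boot application). The function names regional_date and next_delta are hypothetical, and the 5-hour offset in the usage lines is an assumed value for process.midnight-offset-in-ms.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DATE_FORMAT = "%Y%m%d"  # matches deltaDate values such as "20211001"


def regional_date(now_ms: int, midnight_offset_ms: int) -> str:
    """Subtract process.midnight-offset-in-ms from the UTC time and format as a date."""
    shifted_seconds = (now_ms - midnight_offset_ms) // 1000
    return datetime.fromtimestamp(shifted_seconds, tz=timezone.utc).strftime(DATE_FORMAT)


def next_delta(latest_delta_date: Optional[str], now_ms: int,
               midnight_offset_ms: int, delta_interval_days: int = 1) -> Optional[str]:
    """Return the next deltaDate to verify, or None if no delta is required.

    latest_delta_date is the deltaDate of the most recent document in
    cdpToSfmcFileSetAnalyzer_fullDeltas, or None if the collection is empty.
    """
    today = regional_date(now_ms, midnight_offset_ms)
    if latest_delta_date is None:
        return today  # no verified delta yet: the current regional day is required
    if latest_delta_date == today:
        return None   # already verified today: send a success email and finish
    latest = datetime.strptime(latest_delta_date, DATE_FORMAT)
    return (latest + timedelta(days=delta_interval_days)).strftime(DATE_FORMAT)


# Usage with an assumed offset of 5 hours (hypothetical value).
now_ms = int(datetime(2021, 10, 2, 3, 0, tzinfo=timezone.utc).timestamp() * 1000)
offset_ms = 5 * 60 * 60 * 1000
print(next_delta("20210930", now_ms, offset_ms))
```

Under this reading, the interval is added to the stored deltaDate rather than to the current date, so a region that fell behind by several days would catch up by one delta per successful run.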

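Steps 3, 5 and 6 can be sketched in the same way. The Python code below is an illustration under stated assumptions, not the service's implementation: the function names are hypothetical, one concrete pattern is built per entity and extension pair instead of substituting into the configured Java regex (Python would also need (?P<name>...) for named groups), the checksum is assumed to be MD5 because of the .md5 extension, and the .md5 file is assumed to start with the hex digest.

```python
import hashlib
import json
import re
from itertools import product
from typing import List, Sequence


def missing_files(file_names: Sequence[str], prefix: str, delta_date: str,
                  entities: Sequence[str], extensions: Sequence[str]) -> List[str]:
    """Return "entity.extension" labels for which no file name matches."""
    missing = []
    # Cartesian product of process.entities and process.extensions.
    for entity, ext in product(entities, extensions):
        pattern = re.compile(
            re.escape(f"{prefix}_{entity}_{delta_date}") + r"[0-9]{6}\." + re.escape(ext)
        )
        if not any(pattern.fullmatch(name) for name in file_names):
            missing.append(f"{entity}.{ext}")
    return missing


def checksum_matches(data: bytes, md5_file_content: str) -> bool:
    """Compare the MD5 of the data with the digest stored in the .md5 file.

    Assumption: the .md5 file starts with the hex digest, optionally followed
    by whitespace and a file name.
    """
    tokens = md5_file_content.split()
    expected = tokens[0].lower() if tokens else ""
    return hashlib.md5(data).hexdigest() == expected


def build_message(region: str, delta_date: str, keys: Sequence[str]) -> str:
    """Build the outbound message body containing only the .dat.pgp files."""
    data_files = [key for key in keys if key.endswith(".dat.pgp")]
    return json.dumps({"region": region, "deltaDate": delta_date, "filePaths": data_files})


# Usage with the file names from the example above.
names = [
    "rb_amer_usa_order_20211001120000.dat.pgp",
    "rb_amer_usa_order_20211001120000.md5",
    "rb_amer_usa_customer_20211001120000.dat.pgp",
    "rb_amer_usa_customer_20211001120000.md5",
]
print(missing_files(names[:3], "rb_amer_usa", "20211001",
                    ["order", "customer"], ["dat.pgp", "md5"]))
print(build_message("amer_usa1", "20211001", ["amer_usa/20211001/" + n for n in names]))
```

Returning the list of missing pairs, rather than a plain boolean, mirrors the support note that error emails describe which files are missing.
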
Support Tips and Notes

  1. There are 4 regions, but amer_usa is split into 3 batches, so there are 6 instances of this service and 6 Spring profiles, each with its own properties. The properties can be found at https://bitbucket.org/rbdigital/spring-boot-cdp-to-sfmc-integration. Production logs can be viewed and downloaded from https://kubernetes-dashboard-production.frankfurt.rbdigitalcloud.com under the namespace cdp-to-sfmc-integration. For access, contact the platform team.

  2. Error emails on production are currently sent to the cdp.middleware@rb.com mailbox. Check the mailbox and read the error message. Errors are usually self-explanatory, for example stating which files are missing.

  3. If there are any errors, fix the cause and check whether the delta is verified successfully at the next full hour (assuming the standard cron expression 0 0 * * * *).