NUTRINTG (2) File Set Analyzer
Short Summary
Input | cron scheduler of specific region or region batch |
---|---|
Output | message listing the verified .dat.pgp files, sent to the outbound SQS queue (aws.sqs.queue-url) |
Region-awareness | 6 instances with different Spring Boot profiles (amer, sea, eu, us1, us2, us3) |
Scalability | not possible, max 1 instance for a profile because of cron scheduler |
Metadata Collections | cdpToSfmcFileSetAnalyzer_fullDeltas, cdpToSfmc_fileMetadataStore |
Flow Chart
Run on a schedule (every N minutes). On production, the schedule is set by the process.cron-expression property to 0 0 * * * * (every full hour) in UTC on all regions.
If there is a message, calculate the next delta according to the following algorithm.
Retrieve the latest already verified delta from the cdpToSfmcFileSetAnalyzer_fullDeltas collection. Documents are sorted by createdDate and the most recent one is taken. This is an example of a document.
{ "_id" : ObjectId("61276062e9d552087ff86224"), "region" : "amer_usa1", "deltaDate" : "20211001", "createdDate" : ISODate("2021-10-01T00:00:00.000Z") }
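The lookup of the latest verified delta can be sketched in plain Java as below. This is only an illustration of "sort by createdDate and take the newest": the FullDelta class and the LatestDeltaLookup helper are hypothetical names, and the real service presumably queries MongoDB directly instead of working on an in-memory list.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical in-memory view of a cdpToSfmcFileSetAnalyzer_fullDeltas document. */
final class FullDelta {
    final String region;
    final String deltaDate;
    final Instant createdDate;

    FullDelta(String region, String deltaDate, Instant createdDate) {
        this.region = region;
        this.deltaDate = deltaDate;
        this.createdDate = createdDate;
    }
}

class LatestDeltaLookup {
    /** Returns the document with the greatest createdDate, or empty if there are none. */
    static Optional<FullDelta> latest(List<FullDelta> documents) {
        return documents.stream()
                .max(Comparator.comparing((FullDelta d) -> d.createdDate));
    }
}
```

An empty result corresponds to the "no documents" case described next, where the next delta is automatically required.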
If there are no documents, the next delta is automatically required. To calculate it, take the current time in milliseconds in UTC, subtract the value of the process.midnight-offset-in-ms property, and convert the result to a date.
If there is a document, check whether it covers the current day for the region. To do this, take the current time in milliseconds in UTC and subtract the value of the process.midnight-offset-in-ms property. Then convert the result to a date and see if it matches deltaDate from the latest document in the database.
If the latest document has the current deltaDate, the next delta is not required: send a success email and finish. Otherwise, add N days to deltaDate. The number of days is determined by process.delta-interval-in-days, which is supposed to be set to 1.
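The delta calculation above can be sketched as follows. This is a minimal illustration, not the service's actual code: the DeltaCalculator class and its method names are hypothetical, and the parameters stand in for the process.midnight-offset-in-ms and process.delta-interval-in-days properties. The yyyyMMdd format is taken from the example document.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

class DeltaCalculator {
    // deltaDate format as seen in the example document, e.g. 20211001
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.BASIC_ISO_DATE;

    /** Current day for the region: UTC now minus the midnight offset, as a date string. */
    static String currentDeltaDate(long nowMillisUtc, long midnightOffsetMs) {
        return Instant.ofEpochMilli(nowMillisUtc - midnightOffsetMs)
                .atZone(ZoneOffset.UTC)
                .toLocalDate()
                .format(FORMAT);
    }

    /** Returns the delta to verify next, or empty when the latest delta is already today's. */
    static Optional<String> nextDeltaDate(Optional<String> latestDeltaDate, long nowMillisUtc,
                                          long midnightOffsetMs, int deltaIntervalInDays) {
        String today = currentDeltaDate(nowMillisUtc, midnightOffsetMs);
        if (!latestDeltaDate.isPresent()) {
            return Optional.of(today); // no verified delta yet: the current day is required
        }
        if (latestDeltaDate.get().equals(today)) {
            return Optional.empty(); // already verified today: nothing to do
        }
        LocalDate next = LocalDate.parse(latestDeltaDate.get(), FORMAT)
                .plusDays(deltaIntervalInDays);
        return Optional.of(next.format(FORMAT));
    }
}
```

For example, with a 6-hour offset (21600000 ms) and a current time of 2021-10-01T05:00:00Z, the current day for the region is still 20210930.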
Next, find out whether all the necessary files are on S3. Look for the files in the directory whose first part is the value of the process.master-region-name property and whose second part is the delta date (e.g. amer_usa/20211001). Then check that all the files match the regex from the process.file-name-regex property (e.g. rb_amer_usa_(?<entity>.*)_(?<deltaDate>[0-9]{8})[0-9]{6}\\.(?<extension>.*)). Then substitute entity in the regex with each value of the process.entities property, combined with each extension from the process.extensions property, effectively making a Cartesian product. For example, if process.entities for a region is set to order,customer and process.extensions is set to dat.pgp,md5, then 4 files are expected to be present (listed below). If not all the files are present, send an error email and retry the whole process during the next run.
rb_amer_usa_order_20211001120000.dat.pgp
rb_amer_usa_order_20211001120000.md5
rb_amer_usa_customer_20211001120000.dat.pgp
rb_amer_usa_customer_20211001120000.md5
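One way to express the completeness check is sketched below. It is an assumption about the approach rather than the service's code: instead of literally substituting entities into the regex, it matches every file name against the regex, collects the entity and extension pairs found for the delta date, and reports which pairs of the Cartesian product are missing. FileSetChecker and its parameters are hypothetical names; the named groups come from the example regex.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileSetChecker {
    /** Returns the "entity.extension" pairs that are expected but not present in fileNames. */
    static Set<String> missing(List<String> fileNames, Pattern fileNameRegex, String deltaDate,
                               List<String> entities, List<String> extensions) {
        // Collect every entity/extension pair that is present for this delta date.
        Set<String> found = new HashSet<>();
        for (String name : fileNames) {
            Matcher m = fileNameRegex.matcher(name);
            if (m.matches() && m.group("deltaDate").equals(deltaDate)) {
                found.add(m.group("entity") + "." + m.group("extension"));
            }
        }
        // Walk the Cartesian product of entities and extensions and note the gaps.
        Set<String> missing = new TreeSet<>();
        for (String entity : entities) {
            for (String extension : extensions) {
                String expected = entity + "." + extension;
                if (!found.contains(expected)) {
                    missing.add(expected);
                }
            }
        }
        return missing;
    }
}
```

An empty result means the file set is complete; a non-empty one gives the content for the error email (for instance customer.md5 if only that file has not arrived yet).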
As files usually arrive in pairs of .dat.pgp and .md5, checksum verification can be switched on or off with the checksum-verification.enabled property.
If checksum verification is enabled, calculate the checksum of each .dat.pgp file and compare it to the content of the corresponding .md5 file. For verification to succeed, a document for the file has to exist in the cdpToSfmc_fileMetadataStore collection (created by File Sender). That document is then updated with the calculated checksum. If verification fails for any file, send an error email and retry the whole process during the next run.
If verification succeeds, create a list of only the files with data (i.e. .dat.pgp), send them in a message to the outbound queue (aws.sqs.queue-url), create a new metadata document in the cdpToSfmcFileSetAnalyzer_fullDeltas collection and send a success email. An example of a message is below.
{ "region" : "amer_usa1", "deltaDate" : "20211001", "filePaths" : [ "amer_usa/20211001/rb_amer_usa_order_20211001120000.dat.pgp", "amer_usa/20211001/rb_amer_usa_customer_20211001120000.dat.pgp" ] }
Support Tips and Notes
As there are 4 regions but amer_usa is split into 3 batches, there are 6 instances of this service and 6 Spring profiles, each with its own properties. Properties can be accessed under https://bitbucket.org/rbdigital/spring-boot-cdp-to-sfmc-integration. Production logs can be observed and downloaded from https://kubernetes-dashboard-production.frankfurt.rbdigitalcloud.com under namespace cdp-to-sfmc-integration. For access, the platform team has to be contacted.
Error emails on production are currently sent to the cdp.middleware@rb.com mailbox. Check it and read what the error is. Errors are usually self-explanatory, for example describing which files are missing.
If there are any errors, try to fix them and see whether the delta is successfully verified at the next full hour (if the cron expression is the standard 0 0 * * * *).