NUTRINTG CDP to SFMC Integration
- Sebastian Baltruszewicz
Epsilon CDP spec: MJN_CDP outbound extracts to SFMC_FSD_V1.5.docx
SFMC spec: Reckitt Benckiser BU Strategy Document v1.3.docx
Process description in Polish prepared by developer Tristan Szaniawski for RB security department (Bartlomiej.Pokrzywinski@rb.com): procedura.txt
Preconditions:
Brand-Org-Code is unique globally (particular Brand-Org-Code will not be present within two different Markets / Countries).
1) CDP → 2) SFTP → 3) Middleware
15 of ".dat.pgp" and 15 of ".md5" files for each region (EUR, SEA, AMER). Files contain customers data delta (changes) during the last 24h.
FILE SPECIFICATION | |
Source | CDP DataMart |
---|---|
File Name | RB_<country>_<filetype_YYYYMMDDHHMMSS.dat |
Field Delimiter | Pipe (|) |
Row Delimiter | CRLF |
Encoding | UTF8 |
Primary Key | Individual_ID |
Headers | YES |
Encryption | PGP |
MD5 | Yes |
Datetime fields format | YYYY-MM-DD H24:MM:SS in UTC timezone |
15 Files content | CDP to SFMC Data Dictionary 20190307_required .xlsx |
Files content description
CG_Subscription | Profile | Activity_Detail | KM_Web_Activity | Promotion_History | Survey | Shopify | CSL |
---|---|---|---|---|---|---|---|
Contains all the subscription an Individual is opted-in to. This table will also contain the brand and market level opt-ins for an Individual. Join back to the customer table with the Individual_ID. | Contains a non-consolidated view of all Individuals along with the Source Account and Data Source identifiers. | Contains key actions taken by the consumer either through a GWT website, purchased list or Call Centre. Activities include Registration, Profile edit, Sample Request etc. | Contains consumer browsing activity as captured by KissMetrics on MJN websites such as video viewed, articles read, etc. | Contains all promotion history associated with the caregiver. Promotion history spans all channels including Direct Mail (DM), Email (EM), Text (SM) and Phone (PH). | Contains responses to questions asked in market and/or to capture dynamic data such as nutritionist consultants name | CA only. Extract containing Shopify Order Detail and Product data will be created and sent to SFMC for campaigning daily after the CDP Data Mart refresh completes. | The CSL table contains a listing of customer keys that have been made obsolete. All records listed in the table will programmatically be suppressed by the ESP |
CDP Extracts are sent after the Mart refresh is complete for a particular region.
Host | sftp.skynet.epsilon.com | ||||
Port | 22 | ||||
Authentication | SSH (RSA) | ||||
Region | UAT Username | PROD Username | SFTP folder | Uploading start time (UTC) | Uploading end time (UTC) |
---|---|---|---|---|---|
SEA | rbnk_q_sfmc_sea | rbnk_p_sfmc_sea | (username)_<country>/outgoing | 20:30 | 23:30 |
EU | rbnk_q_sfmc_eu | rbnk_p_sfmc_eu | 23:30 | 00:10 | |
AMER | rbnk_q_sfmc_amer | rbnk_p_sfmc_amer | 09:45 | 11:45 |
Applicatoin description
"cdp-to-sfmc-files-integration" application is deployed on cloud and has entity for each of the region: AMER-PROD-1, EU-PROD-1, SEA-PROD-1. It is started once a day by cron. Configuration is set in the file "application-PROD.properties" and among others contains credentials for connecting with:
- Middlware EC2 server
- MongoDB
- Amazon S3 and Athena services in the AWS cloud
Connection CDP sFTP → EC2 is secured by SSH and EC2 is whitelisted on CDP sFTP.
Scripts are executed on EC2 with SSH by Mule.
Steps
- Download files from CDP sFTP "/outgoing/" folder to Middleware EC2, archive downloaded files.
"1_download_files.sh" script executed on EC2:- Connect to CDP sFTP
- Download all *.DAT.PGP files to "current/encrypted" folder on EC2. Files are downloaded with an approximate speed of 1mln records per hour.
- Download all *.MD5 files to "current/md5" folder on EC2
- Create "/archive/TTT" catalog on CDP sFTP and move there downloaded files for archival purposes. TTT is datetime fragment picked from "customer" file.
- IF not all files are available on CDP sFTP script will return an error code, which will throw an exception in Mule and stop the process.
EP-234 - Getting issue details... STATUS
EP-253 - Getting issue details... STATUS
EP-346 - Getting issue details... STATUS
EP-105 - Getting issue details... STATUS
- Check md5 to verify files are undamaged (all md5 are copied to one file all.md5 and checked by EC2 script):
- Copy checksums (content) from MD5 files to file "all.md5" - "2_calculate_checksums.sh" script.
- - "3_verify_checksums.sh" script.
- Decrypt files to the folder: "current/decrypted" - "4_decrypt_files.sh" script.
- Move *.DAT.PGP and *.MD5 files to Middleware archive folder "current/acrhive/TTT", where TTT is current server datetime.
- Remove quotes and \n from the content of the fields (otherwise Amazon Athena splits one records to two) - "6_remove_line_brakes.sh" script.
EP-249 - Getting issue details... STATUS - Transfer CSV(*.dat) files from EC2 "cureent/decrypted" to S3 ??? Is it up-to-date step if we mounted S3 on EC2?
EP-110 - Getting issue details... STATUS
3) Middleware
- Split files per BU using Amazon Athena in a streaming manner. Requests content are available in the folder: "resources/athena-queries/PROD/ctas/[amer|eu|sea]" in the source code.
EP-59 - Getting issue details... STATUS - Write valid and invalid records to different files
EP-73 - Getting issue details... STATUS
File specification | |
---|---|
Name | deName-marketCode-brandOrgCode-yyyy_MM_dd_HH_mm_ss-***.gz, where:
|
Format | CSV |
Headers | No |
Delimiter | SoH (ASCII /001) |
Compression | Each file is GZIPped (.gz) into it’s own archive. However: EP-224 - Getting issue details... STATUS |
3) Middleware → 4) SFTP → 5)SFMC
- Upload files from S3 to each BU sFTP according to config in MongoDB (to two BU main and)
File destination:**/Triggered_Automations/CDP Import**
If at any stage of the whole process some error occurs error email is sent.
EP-151 - Getting issue details... STATUS
EP-47 - Getting issue details... STATUS
EP-102 - Getting issue details... STATUS
Target Data extension | Refresh | Contain deletes? |
Customer | Delta | Yes |
---|---|---|
CG_Subscription | Delta | No |
Profile | Delta | Yes |
Profile_Item_List | Delta | No |
Child | Delta | Yes |
Child_Enrollment | Delta | No |
Activity_Detail | Delta | No |
Samp_Req_Detail | Delta | No |
KM_Web_Activity | Delta | No |
Promotion_History | Delta | No |
Survey | Delta | No |
Shopify | Delta | No |
Voucher_Redemption | Delta | No |
CSL | Delta | No |