NUTRINTG SFMC to CDP Integration
- Sebastian Baltruszewicz
Repository: https://bitbucket.org/rbdigital/cdp-sfmc-files-to-cdp-integration/
Epsilon CDP spec: RB_SFMC to CDP Integration_FSD_V6.docx
SFMC spec: Reckitt Benckiser BU Strategy Document v1.3.docx
Middleware process documentation for Security team (in Polish): Dokumentacja-SFMC-TO-CDP.docx
Preconditions:
Brand-Org-Code is unique globally (particular Brand-Org-Code will not be present within two different Markets / Countries).
1) SFMC → SFTP → Middleware → SFTP → CDP
From: /wiki/spaces/DCP/pages/374571120
Mulesoft ingest SFMC Return feed and sends it to CDP. This contains the customer engagement data for example open, clicks, bounces and etc.
1) SFMC → SFTP
SFMC transfers data files from SafeHouse to the Enhanced FTP export folder. From here RB have access to them.
Up to date schema:
Remarks:
- There is a mistake in the diagram. Instead of each: SubscriberKey should be Individual_ID.
- FIelds order in the Opens.csv is: ClientID, SendID, SubscriberKey, EmailAddress, SubscriberID, ListID, EventDate, EventType, BatchID, TriggeredSendExternalKey.
- Contacts and Subscribers tables are not sent.
Enterprise sFTP (covers all Business Units) HOST: mc1plnsfxggrfg-l69pj913rd1x4.ftp.marketingcloudops.com | Brand level BU sFTP | Market level BU sFTP | ||||
---|---|---|---|---|---|---|
1 | ALL_EMAIL_TRK_<yyyymmdd>.zip (= Email on diagram) | 24 hours delta | <BRANDORG>_JOURNEY_TRK_<yyyymmdd>.csv (= Journey Builder on diagram) | Full data range | <CountryCode>_JOURNEY_TRK_<yyyymmdd>.csv (= Journey Builder on diagram) | Full data range |
2 | Sent.csv | 24 hours delta | <BRANDORG>_JOURNEY_ACT_TRK_<yyyymmdd>.csv (= Journey Builder on diagram) | Full data range | <CountryCode>_JOURNEY_ACT_TRK_<yyyymmdd>.csv (= Journey Builder on diagram) | Full data range |
3 | Opens.csv | 24 hours delta | <BRANDORG>_SENDLOG_TRK_<yyyymmdd>.csv (=SendLog on diagram) | 24 hours delta | <CountryCode>_SENDLOG_TRK_<yyyymmdd>.csv (=SendLog on diagram) | 24 hours delta |
4 | Complaints.csv | 24 hours delta | <BRANDORG>_SMS_TRK_<yyyymmdd>.csv (=SMS Message Tracking on diagram) | ? Kam should update table (not in the scope) | <CountryCode>_SMS_TRK_<yyyymmdd>.csv (=SMS Message Tracking on diagram) | ? Kam should update table (not in the scope) |
5 | Bounces.csv | 24 hours delta | ||||
6 | SendJobs.csv | 24 hours delta | ||||
7 | Clicks.csv | 24 hours delta | ||||
8 | Unsubs.csv | 24 hours delta | ||||
9 | StatusChanges.csv | 24 hours delta |
Files specification:
CSV generated using DOS/Windows linefeed format and saved in UTF8 format First line will contain field headers CSV files are comma ( , ) DEPRICATED (BU Strategy Document v1.3 is not correct here): delimited with text enclosed in double-quotes ( “ ) No /n /t " characters None of the fields is required = each is nullable Datetime format: mm/dd/yyyy hh:mm:ss AM/PM CST.(server timezone for the extracts is CST)
Based on (5.4.1 File Naming Convention): /wiki/spaces/DCP/pages/374571120
2) SFTP → 3) Middleware
- Cron scheduler checks if files appeared on SFMC SFTP EP-198 - Getting issue details... STATUS EP-197 - Getting issue details... STATUS
- Download all above described files, apart from <BRANDORG>_SMS_TRK_<yyyymmdd>.csv, to S3. EP-191 - Getting issue details... STATUS
- Create folder “yyyymmdd” in the folder “Archive” on SFMC sFTP and move there files for backup purposes.
3) Middleware
- Load files to Athena
- If REPAIR TABLE or CTAS error occurs while communicating with Athena from Mule send templated email (as in EP-96). EP-238 - Getting issue details... STATUS
Generate 3 files (for this step in JSON format) using Amazon Athena queries: EP-192 - Getting issue details... STATUS
and change TriggeredSendExternalKey field value to Null, if it's RTM message EP-410 - Getting issue details... STATUSFile specification (target - for final steps):
Name: Type + <BRANDORG>_YYYYMMddHHmmss.dat Where YYYYMMddHHmmss is in CST time zone. Types: C_CMP-METADATA_, C_RESPONSE_SFMC-, C_SEND_SFMC- Examples: C_SEND_SFMC-MJNCAN_20190319145505.dat, C_CMP-METADATA_RBGBRDUR_20190319145505.dat Headers: required. The case does not matter in the column header. It can be sent as “eventtype” or “EVENTTYPE”. Delimiter: pipe (|), optionally enclosed by double-quotes (“”) Row delimiter: CRLF Encryption: PGP MD5 Checksum: Yes
File 1: C_SEND_SFMC-<BRANDORG>_YYYYMMddHHmmss.json
Click here to expand...It is important to note that by itself the email tracking extract will not have the context of which business unit sent the email. In order to establish this, email SendID has to be joined with jobID in the sendLog data extract when imported into CDP. SendLog captures BrandOrgCode for every email that is sent.
Combine Sent (if the TriggeredSendExternalKey is filled in, then this represents an RTM message) and SendLog (<BRANDORG>_SENDLOG_TRK.csv) files using:
- Sent.SendID = SendLog.JobID
- Sent.Individual_ID= SendLog.Individual_ID
- Sent.BatchID = SendLog.BatchID
Rationale: email tracking extract doesn't have information of which business unit sent the email. jobID extract is joined as it captures BrandOrgCode for every email that is sent.
SELECT s.ClientID, s.SendID, s.Individual_ID, s.EmailAddress, s.SubscriberID, s.ListID, s.EventDate, s.EventType, s.BatchID, s.TriggeredSendExternalKey, sl.Treatment_Code, sl.Child_ID sl.Timestamp, sl.Brand_Org_Code, sl.Profile_ID, sl.SMSJobID, sl.SubID, sl.TriggeredSendID, sl.ErrorCode, sl.SmsBatchID, sl.SmsTriggeredSendID, FROM Sent s INNER JOIN SendLog sl ON s.SendID = sl.JobID AND s.Individual_ID = sl.Individual_ID AND s.BatchID = sl.BatchID
SOURCE COLUMN FINAL COLUMN DATA TYPE
LENGTH
MANDATORY Sent.ClientID ClientID Numeric
Big integer
Y
Sent.SendID SendID Number
Big integer
Y
Sent.Individual_ID Individual_ID String
100
Y
Sent.EmailAddress EmailAddress String
100
Y
Sent.SubscriberID SubscriberID Number
Big integer
N
Sent.ListID ListID Number
Big integer
Y
Sent.EventDate EventDate Datetime
Y
Sent.EventType EventType String
6
Y
Sent.BatchID BatchID Number
Big integer
Y
Sent.TriggeredSendExternalKey TriggeredSendExternalKey String
100
N (Mandatory in case of RTM)
SendLog.Treatment_Code Treatment_Code Text
256
Y
SendLog.Child_ID Child_ID Text
256
N
SendLog.Timestamp SendLogTimestamp Date
N
SendLog.Brand_Org_Code Brand_Org_Code Text
256
Y
SendLog.Profile_ID Profile_ID Text
32
N
sendlog.SMSJobID
SMSJobID
String
N
sendlog.SubID
SubID
String
N
sendlog.TriggeredSendID
TriggeredSendID
String
N
sendlog.ErrorCode
ErrorCode
String
N
sendlog.SmsBatchID
SmsBatchID
String
N
sendlog.SmsTriggeredSendID
SmsTriggeredSendID
String
N
Contains 24h delta.
File 2: C_RESPONSE_SFMC-<BRANDORG>_YYYYMMddHHmmss.json
Click here to expand...Combine Opens, Complaints, Bounces, Clicks, Unsubs using UNION query.
all of those files have below columns:
ClientID EventType EventDate SendID Individual_ID TriggeredSendExternalKey EmailAddress SubscriberID ListID BatchID For the fields that are only on a single file, the values will be NULL for records from one of the other files.
SELECT * FROM ( (SELECT o.ClientID, o.EventType, o.EventDate, o.SendID, o.Individual_ID, (CASE WHEN sl.treatment_code IN (''TREATMENTCODE1'', ''TREATMENTCODE2'', ...) THEN o.TriggeredSendExternalKey ELSE null END) AS TriggeredSendExternalKey, o.EmailAddress, o.SubscriberID, o.ListID, o.BatchID, null AS Domain, null AS BounceCategory, null AS BounceReason, null AS SMTPCode, null AS SendURLID, null AS URLID, null AS URL, null AS Alias, null AS Browser, null AS EmailClient, null AS OperatingSystem, null AS Device FROM Opens o INNER JOIN SendLog sl ON o.SendID = sl.JobID AND o.Individual_ID = sl.Individual_ID WHERE (sl.brandorgcode LIKE ''{2}'') AND (o.export_date = ''{0}'')) UNION (SELECT sc.ClientID, sc.EventType, sc.EventDate, sc.SendID, sc.Individual_ID, (CASE WHEN sl.treatment_code IN (''TREATMENTCODE1'', ''TREATMENTCODE2'', ...) THEN sc.TriggeredSendExternalKey ELSE null END) AS TriggeredSendExternalKey, sc.EmailAddress, sc.SubscriberID, sc.ListID, sc.BatchID, sc.Domain, null AS BounceCategory, null AS BounceReason, null AS SMTPCode, null AS SendURLID, null AS URLID, null AS URL, null AS Alias, null AS Browser, null AS EmailClient, null AS OperatingSystem, null AS Device FROM Spam_Complaints sc INNER JOIN SendLog sl ON sc.SendID = sl.JobID AND sc.Individual_ID = sl.Individual_ID WHERE (sl.brandorgcode LIKE ''{2}'') AND (sc.export_date = ''{0}'')) UNION (SELECT b.ClientID, b.EventType, b.EventDate, b.SendID, b.Individual_ID, (CASE WHEN sl.treatment_code IN (''ESPWEMPOST01'', ''TREATMENTCODE2'', ...) THEN b.TriggeredSendExternalKey ELSE null END) AS TriggeredSendExternalKey, b.EmailAddress, b.SubscriberID, b.ListID, b.BatchID, null AS Domain, b.BounceCategory, b.BounceReason, b.SMTPCode, null AS SendURLID, null AS URLID, null AS URL, null AS Alias, null AS Browser, null AS EmailClient, null AS OperatingSystem, null AS Device FROM Bounces b INNER JOIN SendLog sl ON b.SendID = sl.JobID AND b.Individual_ID = sl.Individual_ID WHERE (sl.brandorgcode LIKE ''{2}'') AND (b.export_date = ''{0}'')) UNION (SELECT c.ClientID, c.EventType, c.EventDate, c.SendID, c.Individual_ID, (CASE WHEN sl.treatment_code IN (''TREATMENTCODE1'', ''TREATMENTCODE2'', ...) THEN c.TriggeredSendExternalKey ELSE null END) AS TriggeredSendExternalKey, c.EmailAddress, c.SubscriberID, c.ListID, c.BatchID, null AS Domain, null AS BounceCategory, null AS BounceReason, null AS SMTPCode, c.SendURLID, c.URLID, c.URL, c.Alias, c.Browser, c.EmailClient, c.OperatingSystem, c.Device FROM Clicks c INNER JOIN SendLog sl ON c.SendID = sl.JobID AND c.Individual_ID = sl.Individual_ID WHERE (sl.brandorgcode LIKE ''{2}'') AND (c.export_date = ''{0}'')) UNION (SELECT u.ClientID, u.EventType, u.EventDate, u.SendID, u.Individual_ID, (CASE WHEN sl.treatment_code IN (''TREATMENTCODE1'', ''TREATMENTCODE1'', ...) THEN u.TriggeredSendExternalKey ELSE null END) AS TriggeredSendExternalKey, u.EmailAddress, u.SubscriberID, u.ListID, u.BatchID, null AS Domain, null AS BounceCategory, null AS BounceReason, null AS SMTPCode, null AS SendURLID, null AS URLID, null AS URL, null AS Alias, null AS Browser, null AS EmailClient, null AS OperatingSystem, null AS Device FROM Unsubs u INNER JOIN SendLog sl ON u.SendID = sl.JobID AND u.Individual_ID = sl.Individual_ID WHERE (sl.brandorgcode LIKE ''{2}'') AND (u.export_date = ''{0}'')))
SOURCE COLUMN FINAL COLUMN DATA TYPE
LENGTH
MANDATORY AllFiles.ClientID ClientID Numeric
Big integer
Y
AllFiles.EventType EventType String
6
Y
AllFiles.EventDate EventDate Datetime
Y
AllFiles.SendID SendID Number
Big integer
Y
AllFiles.Individual_ID Individual_ID String
100
Y
AllFiles.TriggeredSendExternalKey TriggeredSendExternalKey String
100
N, Mandatory in case of RTM
AllFiles.EmailAddress EmailAddress String
100
Y
AllFilest.SubscriberID SubscriberID Number
Big integer
Y
AllFiles.ListID ListID Number
Big integer
Y
AllFiles.BatchID BatchID Number
Big integer
Y
Spam Complaints.Domain Domain String
100
N
Bounces.BounceCategory BounceCategory String
50
N, Mandatory in case of bounce
Bounces.BounceReason BounceReason String
8000
N, Mandatory in case of bounce
Bounces.SMTPCode SMTPCode Numeric
Small integer
N
Clicks.SendURLID SendURLID Number
Big integer
N
Clicks.URLID URLID Number
Big integer
N
Clicks.URL URL String
4000
N
Clicks.Alias Alias String
500
N
Clicks.Browser Browser Text
none
N
Clicks.EmailClient EmailClient Text
none
N
Clicks.OperatingSystem OperatingSystem Text
none
N
Clicks.Device Device
Text
none
N
Contains 24h delta.
File 3: C_CMP-METADATA_SFMC-<BRANDORG>_YYYYMMddHHmmss.json
Click here to expand...Combine SendJobs, JourneyActivity (<BRANDORG>_JOURNEY_ACT_TRK.csv), Journey (<BRANDORG>_JOURNEY_TRK.csv) files using:
- SendJobs File to Journey Activity (left join since not all SendJobs originate from a Journey) using
- SendJobs.TriggeredSendExternalKey = JourneyActivity.JourneyActivityObjectID
- Journey activity File to Journey (left join) using
- JourneyActivity.VersionID = Journey.VersionID
SELECT sj.ClientID sj.SendId, sj.FromName, sj.FromEmail, sj.SchedTime, sj.SentTime, sj.Subject, sj.EmailName, sj.TriggeredSendExternalKey, sj.SendDefinitionExternalKey, sj.JobStatus, sj.PreviewURL, sj.IsMultipart, ja.VersionID, ja.ActivityID, ja.ActivityName, ja.ActviityExternalKey, ja.ActivityType, j.JourneyID, j.JourneyName, j.VersionNumber, j.CreatedDate, j.LastPublishedDate, j.ModifiedDate, j.JourneyStatus FROM sendjobs sj LEFT JOIN journey_activity ja ON sj.TriggeredSendExternalKey = ja.JourneyActivityObjectID LEFT JOIN journey j ON ja.VersionID = j.VersionID + INNER JOIN SendLog (for specific brand_org_code) sl ON sj.SendID = sl.JobID // In order to constraint to specific brand_org_code
SOURCE COLUMN FINAL COLUMN DATA TYPE
LENGTH
Mandatory Fields
SendJobs.ClientID ClientID Numeric
Big integer
Y
SendJobs.SendId SendId Number
Big integer
Y
SendJobs.FromName FromName String
130
N
SendJobs.FromEmail FromEmail String
100
N
SendJobs.SchedTime SchedTime Datetime
mm/dd/yyyy hh:mi:ss AM/PM
N
SendJobs.SentTime SentTime Datetime
mm/dd/yyyy hh:mi:ss AM/PM
N
SendJobs.Subject Subject String
200, but can exceed this value depending on selected extract options
N
SendJobs.EmailName EmailName String
100
N
SendJobs.TriggeredSendExternalKey TriggeredSendExternalKey String
100
Y
SendJobs.SendDefinitionExternalKey SendDefinitionExternalKey String
100
N
SendJobs.JobStatus JobStatus String
30
N
SendJobs.PreviewURL PreviewURL String
300
N
SendJobs.IsMultipart IsMultipart String
5
N
JourneyActivity.VersionID VersionID Text
256
Y
JourneyActivity.ActivityID ActivityID Text
256
N
JourneyActivity.ActivityName ActivityName Text
256
N
JourneyActivity.ActviityExternalKey ActviityExternalKey Text
256
N
JourneyActivity.ActivityType ActivityType Text
256
N
Journey.JourneyID JourneyID Text
256
N
Journey.JourneyName JourneyName Text
256
N
Journey.VersionNumber VersionNumber Number
N
Journey.CreatedDate CreatedDate Date
mm/dd/yyyy hh:mm:ss AM/PM
N
Journey.LastPublishedDate LastPublishedDate Date
mm/dd/yyyy hh:mm:ss AM/PM
N
Journey.ModifiedDate ModifiedDate Date
mm/dd/yyyy hh:mm:ss AM/PM
N
Journey.JourneyStatus JourneyStatus Text
256
N
Contains 24h delta.
- Transfer files to our EC2 host EP-193 - Getting issue details... STATUS
- Unzip files
- Create empty files with headers (if Athena hasn't returned any file and there was no exception) EP-194 - Getting issue details... STATUS
- Convert to csv (extension .dat) and change file names EP-237 - Getting issue details... STATUS
- PGP encrypt files (same production public key for all CDP integrations) frms_prod_key_pub.asc EP-195 - Getting issue details... STATUS
- Create MD5
Original file consolidation description file (for archival purposes): SFMC Return File Consolidation.xlsx
3) Middleware → 4) SFTP
- Transfer files from EC2 to CDP by country daily in a different sets before 19:30 UTC, when the CDP data warehouse batch load starts.
The folder structure will be as shown below. The folder names will be the same as the 3-character ISO country codes for the respective countries.->incoming
-> MEX
-> POL
-> USA - EP-430 - Getting issue details... STATUS
- Transferred (encrypted) files are retained on EC2 (S3 bucket as virtual drive) for 21 days for backup purposes.
EP-196 - Getting issue details... STATUS
EP-349 - Getting issue details... STATUS
EP-347 - Getting issue details... STATUS
EP-197
-
Getting issue details...
STATUS
HOST | sftp.skynet.epsilon.com |
---|---|
UAT Username | RBNK_Q_SFMC |
PROD Username | RBNK_P_SFMC |
Port | 22 |
EP-333 - Getting issue details... STATUS
4) SFTP → 5) CDP
- Decrypt files
- Decompress files
- FRMS and batch load process will perform audit checks to validate file name, record count and data compliance
- Files then go through the ETL process to be loaded into the data warehouse.
- Refresh loads the data to the data mart.
- Subsequently, the extracts from the CDP data mart are pulled and transferred to the SFTP location for pickup by the RB middleware.
- Perform a “test” initial load extract with one or two incremental updates for the initial POC IFCN brand. Then shut the update off so we have a stable environment
- Create some “fake” test subscribers represented by various RB and Epsilon employees (or create a fake BU in SFMC with just those individuals)
- Generate “campaigns” against the test subscribers
- Each respondent has a set of instructions on what to do with the email they receive (open, click, unsubscribe, etc)
- Receive the actual data from those campaigns and response back from SFMC
- Epsilon tests the return file feeds and this is the basis of the UAT
- Once UAT is complete and RB has done all of their campaign development/testing for the real IFCN campaigns they clear the BU and Epsilon sends a new initial load with updates going to SFMC every day thereafter.