/
NUTRINTG SFMC to CDP Integration

NUTRINTG SFMC to CDP Integration

Repository: https://bitbucket.org/rbdigital/cdp-sfmc-files-to-cdp-integration/
Epsilon CDP spec: RB_SFMC to CDP Integration_FSD_V6.docx
SFMC spec: Reckitt Benckiser BU Strategy Document v1.3.docx
Middleware process documentation for Security team (in Polish): Dokumentacja-SFMC-TO-CDP.docx

Preconditions:

Brand-Org-Code is unique globally (particular Brand-Org-Code will not be present within two different Markets / Countries).

1) SFMC → SFTP → Middleware → SFTP → CDP

From: /wiki/spaces/DCP/pages/374571120

Mulesoft ingest SFMC Return feed and sends it to CDP. This contains the customer engagement data for example open, clicks, bounces and etc.

1) SFMC →  SFTP

 Click here to expand...

SFMC transfers data files from SafeHouse to the Enhanced FTP export folder. From here RB have access to them.

Up to date schema:

Remarks:

  1. There is a mistake in the diagram. Instead of each: SubscriberKey should be Individual_ID.
  2. FIelds order in the Opens.csv is: ClientID, SendID, SubscriberKey, EmailAddress, SubscriberID, ListID, EventDate, EventType, BatchID, TriggeredSendExternalKey.
  3. Contacts and Subscribers tables are not sent.


Enterprise sFTP (covers all Business Units)

HOST: mc1plnsfxggrfg-l69pj913rd1x4.ftp.marketingcloudops.com
User: 6231186

Brand level BU sFTPMarket level BU sFTP
1ALL_EMAIL_TRK_<yyyymmdd>.zip 
(= Email on diagram)
24 hours delta<BRANDORG>_JOURNEY_TRK_<yyyymmdd>.csv 
(= Journey Builder on diagram)
Full data range<CountryCode>_JOURNEY_TRK_<yyyymmdd>.csv 
(= Journey Builder on diagram)
Full data range
2Sent.csv24 hours delta<BRANDORG>_JOURNEY_ACT_TRK_<yyyymmdd>.csv 
(= Journey Builder on diagram)
Full data range<CountryCode>_JOURNEY_ACT_TRK_<yyyymmdd>.csv 
(= Journey Builder on diagram)
Full data range
3Opens.csv24 hours delta<BRANDORG>_SENDLOG_TRK_<yyyymmdd>.csv 
(=SendLog on diagram)
24 hours delta<CountryCode>_SENDLOG_TRK_<yyyymmdd>.csv 
(=SendLog on diagram)
24 hours delta
4Complaints.csv24 hours delta<BRANDORG>_SMS_TRK_<yyyymmdd>.csv 
(=SMS Message Tracking on diagram)
? Kam should update table
(not in the scope)
<CountryCode>_SMS_TRK_<yyyymmdd>.csv 
(=SMS Message Tracking on diagram)
? Kam should update table
(not in the scope)
5Bounces.csv24 hours delta



6SendJobs.csv24 hours delta



7Clicks.csv24 hours delta



8Unsubs.csv24 hours delta



9StatusChanges.csv24 hours delta




Files specification:

CSV generated using DOS/Windows linefeed format and saved in UTF8 format
First line will contain field headers
CSV files are comma ( , )     DEPRICATED (BU Strategy Document v1.3 is not correct here): delimited with text enclosed in double-quotes ( “ ) 
No /n /t " characters
None of the fields is required = each is nullable
Datetime format: mm/dd/yyyy hh:mm:ss AM/PM CST.(server timezone for the extracts is CST)

Based on (5.4.1 File Naming Convention): /wiki/spaces/DCP/pages/374571120


2) SFTP → 3) Middleware

 Click here to expand...
  1. Cron scheduler checks if files appeared on SFMC SFTP  EP-198 - Getting issue details... STATUS   EP-197 - Getting issue details... STATUS
  2. Download all above described files, apart from <BRANDORG>_SMS_TRK_<yyyymmdd>.csv, to S3.  EP-191 - Getting issue details... STATUS  
  3. Create folder “yyyymmdd” in the folder “Archive” on SFMC sFTP and move there files for backup purposes.


3) Middleware

 Click here to expand...


  1. Load files to Athena
  2. If REPAIR TABLE or CTAS error occurs while communicating with Athena from Mule send templated email (as in EP-96).  EP-238 - Getting issue details... STATUS
  3. Generate 3 files (for this step in JSON format) using Amazon Athena queries:  EP-192 - Getting issue details... STATUS
    and change TriggeredSendExternalKey field value to Null, if it's RTM message  EP-410 - Getting issue details... STATUS

    File specification (target - for final steps):

    Name: Type +  <BRANDORG>_YYYYMMddHHmmss.dat  Where YYYYMMddHHmmss is in CST time zone. 
    Types: C_CMP-METADATA_, C_RESPONSE_SFMC-, C_SEND_SFMC-  Examples: C_SEND_SFMC-MJNCAN_20190319145505.dat, C_CMP-METADATA_RBGBRDUR_20190319145505.dat
    Headers: required. The case does not matter in the column header. It can be sent as “eventtype” or “EVENTTYPE”.
    Delimiter: pipe (|), optionally enclosed by double-quotes (“”)
    Row delimiter: CRLF
    Encryption: PGP
    MD5 Checksum: Yes

    File 1: C_SEND_SFMC-<BRANDORG>_YYYYMMddHHmmss.json

     Click here to expand...

    It is important to note that by itself the email tracking extract will not have the context of which business unit sent the email. In order to establish this, email SendID has to be joined with jobID in the sendLog data extract when imported into CDP. SendLog captures BrandOrgCode for every email that is sent.  


    Combine Sent (if the TriggeredSendExternalKey is filled in, then this represents an RTM message) and SendLog (<BRANDORG>_SENDLOG_TRK.csv) files using:

    • Sent.SendID = SendLog.JobID
    • Sent.Individual_ID= SendLog.Individual_ID
    • Sent.BatchID = SendLog.BatchID

    Rationale: email tracking extract doesn't have information of which business unit sent the email. jobID extract is joined as it captures BrandOrgCode for every email that is sent.  

    SELECT s.ClientID,
           s.SendID,
           s.Individual_ID,
           s.EmailAddress,
           s.SubscriberID,
           s.ListID,
           s.EventDate,
           s.EventType,
           s.BatchID,
           s.TriggeredSendExternalKey,
           sl.Treatment_Code,
           sl.Child_ID
           sl.Timestamp,
           sl.Brand_Org_Code,
           sl.Profile_ID,
    			sl.SMSJobID,         
    			sl.SubID,            
    			sl.TriggeredSendID,  
    			sl.ErrorCode,       
    			sl.SmsBatchID,      
    			sl.SmsTriggeredSendID,
      FROM Sent s
           INNER JOIN SendLog sl
                   ON s.SendID = sl.JobID AND
                      s.Individual_ID = sl.Individual_ID AND
    					 s.BatchID = sl.BatchID


    SOURCE COLUMNFINAL COLUMN

    DATA TYPE

    LENGTH

    MANDATORY
    Sent.ClientIDClientID

    Numeric

    Big integer

    Y

    Sent.SendIDSendID

    Number

    Big integer

    Y

    Sent.Individual_IDIndividual_ID

    String

    100

    Y

    Sent.EmailAddressEmailAddress

    String

    100

    Y

    Sent.SubscriberIDSubscriberID

    Number

    Big integer

    N

    Sent.ListIDListID

    Number

    Big integer

    Y

    Sent.EventDateEventDate

    Datetime


    Y

    Sent.EventTypeEventType

    String

    6

    Y

    Sent.BatchIDBatchID

    Number

    Big integer

    Y

    Sent.TriggeredSendExternalKeyTriggeredSendExternalKey

    String

    100

    N (Mandatory in case of RTM)

    SendLog.Treatment_CodeTreatment_Code

    Text

    256

    Y

    SendLog.Child_IDChild_ID

    Text

    256

    N

    SendLog.TimestampSendLogTimestamp

    Date


    N

    SendLog.Brand_Org_CodeBrand_Org_Code

    Text

    256

    Y

    SendLog.Profile_IDProfile_ID

    Text

    32

    N

    sendlog.SMSJobID         

    SMSJobID          

    String

     

    N

    sendlog.SubID            

    SubID            

    String

     

    N

    sendlog.TriggeredSendID  

    TriggeredSendID  

    String

     

    N

    sendlog.ErrorCode        

    ErrorCode        

    String

     

    N

    sendlog.SmsBatchID       

    SmsBatchID       

    String

     

    N

    sendlog.SmsTriggeredSendID

    SmsTriggeredSendID

    String

     

    N

    Contains 24h delta.

    File 2: C_RESPONSE_SFMC-<BRANDORG>_YYYYMMddHHmmss.json

     Click here to expand...

    Combine Opens, Complaints, Bounces, Clicks, Unsubs using UNION query.

    all of those files have below columns:

    ClientID
    EventType
    EventDate
    SendID
    Individual_ID
    TriggeredSendExternalKey
    EmailAddress
    SubscriberID
    ListID
    BatchID


    For the fields that are only on a single file, the values will be NULL for records from one of the other files.


    SELECT * FROM (
    (SELECT o.ClientID, o.EventType, o.EventDate, o.SendID, o.Individual_ID, (CASE WHEN sl.treatment_code IN (''TREATMENTCODE1'', ''TREATMENTCODE2'', ...) THEN o.TriggeredSendExternalKey ELSE null END) AS TriggeredSendExternalKey, o.EmailAddress, o.SubscriberID, o.ListID, o.BatchID, null AS Domain, null AS BounceCategory, null AS BounceReason, null AS SMTPCode, null AS SendURLID, null AS URLID, null AS URL, null AS Alias, null AS Browser, null AS EmailClient, null AS OperatingSystem, null AS Device FROM Opens o INNER JOIN SendLog sl ON o.SendID = sl.JobID AND o.Individual_ID = sl.Individual_ID WHERE (sl.brandorgcode LIKE ''{2}'') AND (o.export_date = ''{0}''))
    UNION 
    (SELECT sc.ClientID, sc.EventType, sc.EventDate, sc.SendID, sc.Individual_ID, (CASE WHEN sl.treatment_code IN (''TREATMENTCODE1'', ''TREATMENTCODE2'', ...) THEN sc.TriggeredSendExternalKey ELSE null END) AS TriggeredSendExternalKey, sc.EmailAddress, sc.SubscriberID, sc.ListID, sc.BatchID, sc.Domain, null AS BounceCategory, null AS BounceReason, null AS SMTPCode, null AS SendURLID, null AS URLID, null AS URL, null AS Alias, null AS Browser, null AS EmailClient, null AS OperatingSystem, null AS Device FROM Spam_Complaints sc INNER JOIN SendLog sl ON sc.SendID = sl.JobID AND sc.Individual_ID = sl.Individual_ID WHERE (sl.brandorgcode LIKE ''{2}'') AND (sc.export_date = ''{0}'')) 
    UNION 
    (SELECT b.ClientID, b.EventType, b.EventDate, b.SendID, b.Individual_ID, (CASE WHEN sl.treatment_code IN (''ESPWEMPOST01'', ''TREATMENTCODE2'', ...) THEN b.TriggeredSendExternalKey ELSE null END) AS TriggeredSendExternalKey, b.EmailAddress, b.SubscriberID, b.ListID, b.BatchID, null AS Domain, b.BounceCategory, b.BounceReason, b.SMTPCode, null AS SendURLID, null AS URLID, null AS URL, null AS Alias, null AS Browser, null AS EmailClient, null AS OperatingSystem, null AS Device FROM Bounces b INNER JOIN SendLog sl ON b.SendID = sl.JobID AND b.Individual_ID = sl.Individual_ID WHERE (sl.brandorgcode LIKE ''{2}'') AND (b.export_date = ''{0}'')) 
    UNION 
    (SELECT c.ClientID, c.EventType, c.EventDate, c.SendID, c.Individual_ID, (CASE WHEN sl.treatment_code IN (''TREATMENTCODE1'', ''TREATMENTCODE2'', ...) THEN c.TriggeredSendExternalKey ELSE null END) AS TriggeredSendExternalKey, c.EmailAddress, c.SubscriberID, c.ListID, c.BatchID, null AS Domain, null AS BounceCategory, null AS BounceReason, null AS SMTPCode, c.SendURLID, c.URLID, c.URL, c.Alias, c.Browser, c.EmailClient, c.OperatingSystem, c.Device FROM Clicks c INNER JOIN SendLog sl ON c.SendID = sl.JobID AND c.Individual_ID = sl.Individual_ID WHERE (sl.brandorgcode LIKE ''{2}'') AND (c.export_date = ''{0}'')) 
    UNION 
    (SELECT u.ClientID, u.EventType, u.EventDate, u.SendID, u.Individual_ID, (CASE WHEN sl.treatment_code IN (''TREATMENTCODE1'', ''TREATMENTCODE1'', ...) THEN u.TriggeredSendExternalKey ELSE null END) AS TriggeredSendExternalKey, u.EmailAddress, u.SubscriberID, u.ListID, u.BatchID, null AS Domain, null AS BounceCategory, null AS BounceReason, null AS SMTPCode, null AS SendURLID, null AS URLID, null AS URL, null AS Alias, null AS Browser, null AS EmailClient, null AS OperatingSystem, null AS Device FROM Unsubs u INNER JOIN SendLog sl ON u.SendID = sl.JobID AND u.Individual_ID = sl.Individual_ID WHERE (sl.brandorgcode LIKE ''{2}'') AND (u.export_date = ''{0}'')))


    SOURCE COLUMNFINAL COLUMN

    DATA TYPE

    LENGTH

    MANDATORY
    AllFiles.ClientIDClientID

    Numeric

    Big integer

    Y

    AllFiles.EventTypeEventType

    String

    6

    Y

    AllFiles.EventDateEventDate

    Datetime


    Y

    AllFiles.SendIDSendID

    Number

    Big integer

    Y

    AllFiles.Individual_IDIndividual_ID

    String

    100

    Y

    AllFiles.TriggeredSendExternalKeyTriggeredSendExternalKey

    String

    100

    N, Mandatory in case of RTM

    AllFiles.EmailAddressEmailAddress

    String

    100

    Y

    AllFilest.SubscriberIDSubscriberID

    Number

    Big integer

    Y

    AllFiles.ListIDListID

    Number

    Big integer

    Y

    AllFiles.BatchIDBatchID

    Number

    Big integer

    Y

    Spam Complaints.DomainDomain

    String

    100

    N

    Bounces.BounceCategoryBounceCategory

    String

    50

    N, Mandatory in case of bounce

    Bounces.BounceReasonBounceReason

    String

    8000

    N, Mandatory in case of bounce

    Bounces.SMTPCodeSMTPCode

    Numeric

    Small integer

    N

    Clicks.SendURLIDSendURLID

    Number

    Big integer

    N

    Clicks.URLIDURLID

    Number

    Big integer

    N

    Clicks.URLURL

    String

    4000

    N

    Clicks.AliasAlias

    String

    500

    N

    Clicks.BrowserBrowser

    Text

    none

    N

    Clicks.EmailClientEmailClient

    Text

    none

    N

    Clicks.OperatingSystemOperatingSystem

    Text

    none

    N

    Clicks.Device

    Device

    Text

    none

    N

    Contains 24h delta.

    File 3: C_CMP-METADATA_SFMC-<BRANDORG>_YYYYMMddHHmmss.json

     Click here to expand...

    Combine SendJobs, JourneyActivity (<BRANDORG>_JOURNEY_ACT_TRK.csv), Journey (<BRANDORG>_JOURNEY_TRK.csv) files using:

    • SendJobs File to Journey Activity (left join since not all SendJobs originate from a Journey) using
      • SendJobs.TriggeredSendExternalKey = JourneyActivity.JourneyActivityObjectID
    • Journey activity File to Journey (left join) using
      • JourneyActivity.VersionID = Journey.VersionID
    SELECT sj.ClientID
           sj.SendId,
           sj.FromName,
           sj.FromEmail,
           sj.SchedTime,
           sj.SentTime,
           sj.Subject,
           sj.EmailName,
           sj.TriggeredSendExternalKey,
           sj.SendDefinitionExternalKey,
           sj.JobStatus,
           sj.PreviewURL,
           sj.IsMultipart,
           ja.VersionID,
           ja.ActivityID,
           ja.ActivityName,
           ja.ActviityExternalKey,
           ja.ActivityType,
           j.JourneyID,
           j.JourneyName,
           j.VersionNumber,
           j.CreatedDate,
           j.LastPublishedDate,
           j.ModifiedDate,
           j.JourneyStatus
      FROM sendjobs sj
           LEFT JOIN journey_activity ja
                  ON sj.TriggeredSendExternalKey = ja.JourneyActivityObjectID
           LEFT JOIN journey j
                  ON ja.VersionID = j.VersionID
    
    
    +
           INNER JOIN SendLog (for specific brand_org_code) sl
                   ON sj.SendID = sl.JobID
    
    // In order to constraint to specific brand_org_code


    SOURCE COLUMNFINAL COLUMN

    DATA TYPE

    LENGTH

    Mandatory Fields

    SendJobs.ClientIDClientID

    Numeric

    Big integer

    Y

    SendJobs.SendIdSendId

    Number

    Big integer

    Y

    SendJobs.FromNameFromName

    String

    130

    N

    SendJobs.FromEmailFromEmail

    String

    100

    N

    SendJobs.SchedTimeSchedTime

    Datetime

    mm/dd/yyyy hh:mi:ss AM/PM

    N

    SendJobs.SentTimeSentTime

    Datetime

    mm/dd/yyyy hh:mi:ss AM/PM

    N

    SendJobs.SubjectSubject

    String

    200, but can exceed this value depending on selected extract options

    N

    SendJobs.EmailNameEmailName

    String

    100

    N

    SendJobs.TriggeredSendExternalKeyTriggeredSendExternalKey

    String

    100

    Y

    SendJobs.SendDefinitionExternalKeySendDefinitionExternalKey

    String

    100

    N

    SendJobs.JobStatusJobStatus

    String

    30

    N

    SendJobs.PreviewURLPreviewURL

    String

    300

    N

    SendJobs.IsMultipartIsMultipart

    String

    5

    N

    JourneyActivity.VersionIDVersionID

    Text

    256

    Y

    JourneyActivity.ActivityIDActivityID

    Text

    256

    N

    JourneyActivity.ActivityNameActivityName

    Text

    256

    N

    JourneyActivity.ActviityExternalKeyActviityExternalKey

    Text

    256

    N

    JourneyActivity.ActivityTypeActivityType

    Text

    256

    N

    Journey.JourneyIDJourneyID

    Text

    256

    N

    Journey.JourneyNameJourneyName

    Text

    256

    N

    Journey.VersionNumberVersionNumber

    Number


    N

    Journey.CreatedDateCreatedDate

    Date

    mm/dd/yyyy hh:mm:ss AM/PM

    N

    Journey.LastPublishedDateLastPublishedDate

    Date

    mm/dd/yyyy hh:mm:ss AM/PM

    N

    Journey.ModifiedDateModifiedDate

    Date

    mm/dd/yyyy hh:mm:ss AM/PM

    N

    Journey.JourneyStatusJourneyStatus

    Text

    256

    N

    Contains 24h delta.

  4. Transfer files to our EC2 host  EP-193 - Getting issue details... STATUS
  5. Unzip files
  6. Create empty files with headers (if Athena hasn't returned any file and there was no exception)  EP-194 - Getting issue details... STATUS
  7. Convert to csv (extension .dat) and change file names  EP-237 - Getting issue details... STATUS
  8. PGP encrypt files (same production public key for all CDP integrations)  frms_prod_key_pub.asc  EP-195 - Getting issue details... STATUS
  9. Create MD5


Original file consolidation description file (for archival purposes): SFMC Return File Consolidation.xlsx



3) Middleware → 4) SFTP

 Click here to expand...
  1. Transfer files from EC2 to CDP by country daily in a different sets before 19:30 UTC, when the CDP data warehouse batch load starts. 
    The folder structure will be as shown below. The folder names will be the same as the 3-character ISO country codes for the respective countries. 

    ->incoming
        -> MEX
       
    -> POL
       
    -> USA

  2. EP-430 - Getting issue details... STATUS
  3. Transferred (encrypted) files are retained on EC2 (S3 bucket as virtual drive) for 21 days for backup purposes. 

EP-196 - Getting issue details... STATUS  

EP-349 - Getting issue details... STATUS

EP-347 - Getting issue details... STATUS


EP-197 - Getting issue details... STATUS

HOSTsftp.skynet.epsilon.com

UAT Username

RBNK_Q_SFMC

PROD Username

RBNK_P_SFMC

Port

22

EP-333 - Getting issue details... STATUS


4) SFTP → 5) CDP

 Click here to expand...


  1. Decrypt files
  2. Decompress files
  3. FRMS and batch load process will perform audit checks to validate file name, record count and data compliance
  4. Files then go through the ETL process to be loaded into the data warehouse. 
  5. Refresh loads the data to the data mart. 
  6. Subsequently, the extracts from the CDP data mart are pulled and transferred to the SFTP location for pickup by the RB middleware.



 Test procedure proposed by Epsilon
  • Perform a “test” initial load extract with one or two incremental updates for the initial POC IFCN brand. Then shut the update off so we have a stable environment
  • Create some “fake” test subscribers represented by various RB and Epsilon employees (or create a fake BU in SFMC with just those individuals)
  • Generate “campaigns” against the test subscribers
  • Each respondent has a set of instructions on what to do with the email they receive (open, click, unsubscribe, etc)
  • Receive the actual data from those campaigns and response back from SFMC
  • Epsilon tests the return file feeds and this is the basis of the UAT
  • Once UAT is complete and RB has done all of their campaign development/testing for the real IFCN campaigns they clear the BU and Epsilon sends a new initial load with updates going to SFMC every day thereafter.