Nutrition CDP to SFMC
Please find below instructions on how to test the CDP to SFMC integration:
Architecture
----------------------------------------------------------------
We have four servers (systems):
A) - Epsilon SFTP (sftp.skynet.epsilon.com)
Currently Epsilon receives information about clients from three regions: America (Canada and Mexico), Europe (Spain and Poland) and Asia (Singapore).
Customer data from each region is delivered daily to a dedicated SFTP account:
America: rbnk_X_sfmcamer
Europe: rbnk_X_sfmceu
Asia: rbnk_X_sfmcsea
where X is:
p - production environment
q - development environment
To log into an SFTP account, you must provide the proper user name and RSA key.
On each SFTP account, 30 files arrive daily in the "/outgoing/" directory:
XYZ.dat.pgp - CSV file encrypted with PGP
XYZ.md5 - MD5 checksum of the XYZ.dat.pgp file
where XYZ is a file name in the format rb_RRRR_FFFF_YYYYMMDDHHMMSS:
RRRR - region (amer, eu, sea)
FFFF - table name (actdet, cgsub, child, csl, customer...)
YYYYMMDDHHMMSS - file creation date and time
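For illustration, a single daily file pair for the "customer" table in the America region could look like the lines below (the timestamp is made up), followed by a minimal SFTP session sketch that assumes your RSA key is stored at ~/.ssh/epsilon_rsa:

    rb_amer_customer_20240102033000.dat.pgp
    rb_amer_customer_20240102033000.md5

    # connect to the development (q) America account and list the incoming files
    sftp -i ~/.ssh/epsilon_rsa rbnk_q_sfmcamer@sftp.skynet.epsilon.com
    sftp> ls /outgoing/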
B) - Mule middleware [Mule Cloud]
The Mule application "cdp-to-sfmc-files-integration" runs in the cloud, with one deployment per region:
AMER-PROD-1 - handles the America region in Epsilon
EU-PROD-1 - handles the Europe region in Epsilon
SEA-PROD-1 - handles the Asia region in Epsilon
Configuration for the application can be found in the JAR, in the file "application-PROD.properties", and contains the data necessary to establish connections with:
- server C) using SSH
- MongoDB
- Athena and S3 AWS cloud
The application runs once a day, triggered by CRON.
The application's goal is to send data from the Epsilon SFTP to the SalesForce SFTP.
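For orientation, a hedged sketch of the kind of entries "application-PROD.properties" holds; the property names below are assumptions used only to show the shape of the file and must be checked against the real file inside the JAR:

    # hypothetical keys - verify the real names in application-PROD.properties in the JAR
    sftp.host=cdp-sftp-prod.services.rbcloud.io
    sftp.user=mule_amer
    sftp.key.path=...
    mongodb.connection.string=...
    athena.database=...
    s3.bucket=...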
C) - EC2 middleware [AWS Cloud] (cdp-sftp-prod.services.rbcloud.io)
To log into the server, you must provide the proper user name and RSA key over an SSH or SFTP connection. The user accounts are:
mule_amer - for the America region integration
mule_eu - for the Europe region integration
mule_sea - for the Asia region integration
BASH scripts are responsible for:
- transferring data from server A)
- decrypting and performing basic transformations on files from server A)
- sending files to AWS S3
Because communication with A) runs over SFTP using keys, the keys must be installed on C) by platform engineering.
Bartłomiej Jurek is responsible for installing keys. Each user who logs into A) must have their own key.
A PGP key used to decrypt the files must also be installed on the user account.
If you have any questions regarding keys or how to install them, please contact Tomasz Lewiński or Bartłomiej Jurek.
To be able to send files to S3, platform engineering has to configure the instance so it can use S3 without providing credentials. For any details, please contact Bartłomiej Jurek.
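A minimal sketch of logging into C) and checking the password-less S3 access (typically provided through the instance profile); the key path and bucket name are placeholders:

    # log in as the region user over SSH
    ssh -i ~/.ssh/cdp_rsa mule_amer@cdp-sftp-prod.services.rbcloud.io
    # on the instance, S3 access needs no stored credentials
    aws s3 ls s3://REGION_BUCKET/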
D) - SalesForce SFTP
All access details to the SF SFTP (including server addresses) can be found in the MongoDB database.
Each BusinessUnit has two SFTP servers. Files from Athena have to be transferred to both servers - the master and the slave.
All connection details to SF can be found on MS Teams, or they can be provided by Jan Jastrząb.
How does the process of sending data from Epsilon to SalesForce work?
----------------------------------------------------------------
Once a day (for each region), the process "cdp-to-sfmc-files-integration" runs on B).
This process:
- gets files from A) and transfers them to S3, by:
- running on C), over SSH, the script "1_download_files.sh"
- the script connects to A)
- downloads all *.DAT.PGP files to the "current/encrypted" directory
- downloads all *.MD5 files to the "current/md5" directory
- creates on A) the "/archive/XYZ" directory, where XYZ is the date and time part taken from the "customer" PGP file
- uploads the downloaded PGP and MD5 files to the newly created archive directory
- runs on C), over SSH, the script "2_calculate_checksums.sh", which copies the checksums from the MD5 files into the "all.md5" file (a simplified sketch of steps 2-4 follows this list)
- runs on C), over SSH, the script "3_verify_checksums.sh", which checks the checksums of the PGP files against the expected values in "all.md5"
- runs on C), over SSH, the script "4_decrypt_files.sh", which decrypts the PGP files and puts them in the "current/decrypted" directory
- runs on C), over SSH, the script "5_cleanup_files.sh", which moves the PGP and MD5 files to the "current/archive/XYZ" directory, where "XYZ" is the current server time
- runs on C), over SSH, the script "6_remove_line_brakes.sh", which removes newline characters from records in the CSV files
- moves the CSV files (*.dat) from the "current/decrypted" directory to the S3 bucket by sending a shell command over SSH
- sends a request to Athena to regenerate the tables
- runs queries on Athena; the queries can be found in the "resources/athena-queries/PROD/ctas/[amer|eu|sea]" directory in the source code
- gets from S3 (the same bucket as above) the list of Athena query results and selects those which were executed by the current process.
Results are sorted by brandOrgCode and marketCode. Result file names are filtered with a regular expression which rejects unsupported brandOrgCodes. This expression can be found in the "resources/application-PROD.properties" file.
- for files accepted by the regular expression, data is fetched from MongoDB to establish where those files should be uploaded in SalesForce
- using the data fetched in the previous step, the file is sent to D) (SalesForce)
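A simplified sketch of what the checksum and decryption steps (scripts 2 to 4) amount to on C). The real scripts may differ in paths, options and error handling; the sketch assumes the .md5 files use the standard "checksum  filename" format and that the PGP private key is already in the account's keyring:

    cd current
    # 2) collect the expected checksums into one list
    cat md5/*.md5 > all.md5
    # 3) verify the downloaded PGP files against the expected checksums
    (cd encrypted && md5sum -c ../all.md5)
    # 4) decrypt every PGP file into current/decrypted
    for f in encrypted/*.dat.pgp; do
        gpg --batch --quiet --output "decrypted/$(basename "${f%.pgp}")" --decrypt "$f"
    done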
Known issues
----------------------------------------------------------------
1) Epsilon is late with generating files
There can be a situation when the Mule process starts while not all 30 files are on the SFTP server, because Epsilon was late with generating them.
In this situation, the files that were downloaded will be in the "current/encrypted" and "current/md5" directories on C), and an archive directory containing the downloaded files will have been created on A).
Because not all files are available, the script "1_download_files.sh" returns an error, which causes an exception in Mule and stops the process.
What must be done to solve this issue (a command-level sketch is shown at the end of this point):
- all 30 files from A) from the day on which the error occurred must be copied from A) to a local machine;
it is possible that files will have to be copied from both the "outgoing/" directory and the "outgoing/archive/XYZ" directory
- when all 30 files have been downloaded, delete the "outgoing/archive/XYZ" directory
- remove all files from the "outgoing/" directory
- upload all 30 previously downloaded files back to the "outgoing/" directory
- log into C) and remove all PGP and MD5 files from the "current/encrypted" and "current/md5" directories
- on B), run the CRON for this process and monitor the application log
It may happen (over a weekend) that, due to the above error, there will be more than 30 files on A) because Epsilon has uploaded new files.
In this case we need to run the whole process as above, but we need to sort the files by date.
After cleaning up the files on A) and C), we need to run the process on B) first for the oldest pack of 30 files and then for the newest pack of 30 files.
Remember that archive directory names are built from the date and time of the "customer" file.
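A command-level sketch of the clean-up described above, assuming the America production account and an RSA key at ~/.ssh/epsilon_rsa; "XYZ" stands for the archive directory created by the failed run:

    # on a local machine: download everything, then rebuild /outgoing/ on A)
    mkdir -p recovered
    sftp -i ~/.ssh/epsilon_rsa rbnk_p_sfmcamer@sftp.skynet.epsilon.com <<'EOF'
    get /outgoing/*.dat.pgp recovered/
    get /outgoing/*.md5 recovered/
    get /outgoing/archive/XYZ/* recovered/
    rm /outgoing/archive/XYZ/*
    rmdir /outgoing/archive/XYZ
    rm /outgoing/*
    put recovered/* /outgoing/
    EOF
    # on C): clear the partially downloaded files
    rm -f current/encrypted/*.dat.pgp current/md5/*.md5
    # on B): trigger the CRON for the process and watch the application log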
2) SFTP connection error between A) and C)
The reasons can vary, but it seems that there are some issues with the Epsilon SFTP.
You will probably need to run the procedure from 1) to clean up and run the process again.
3) SFTP connection breaks between B) and D)
In this case we need to do one of the following:
- run the whole process from the beginning (first run the fix from point 1)
- manually copy the files from S3, rename them and send them to D) (see the sketch below)
- redeploy to B) a modified process which will only transfer the files to D)
It may be necessary to set the correct exportDate in flowVars and disable a few blocks.
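A rough sketch of the manual path (the second option above); the bucket, object key, target file name, SF user/host and remote directory are all placeholders that have to be taken from S3 and MongoDB:

    # copy the Athena result from S3 and rename it to the name expected by SalesForce
    aws s3 cp s3://REGION_BUCKET/ATHENA_RESULT_KEY.csv ./EXPECTED_TARGET_NAME.csv
    # upload it to both the master and the slave SFTP server of the BusinessUnit
    sftp -i ~/.ssh/sf_rsa SF_USER@SF_SFTP_HOST <<'EOF'
    put EXPECTED_TARGET_NAME.csv TARGET_DIRECTORY/
    EOF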
4) brandOrgCodes are not uploaded to D) even though they are in the Epsilon drop
They are probably filtered out as not supported. All we need to do is add these brandOrgCodes, using "|", to the regular expression in "application-PROD.properties".
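For example, assuming the filter is a plain alternation (the property name below is hypothetical; check the real one in the file), adding a new brandOrgCode "xyz" looks like this:

    # before (hypothetical property name and values)
    supported.brandOrgCodes.regex=(abc|def)
    # after - "xyz" added with "|"
    supported.brandOrgCodes.regex=(abc|def|xyz)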
5) There is no space left on C)
If there is no space left on C), you may ask platform engineering for additional space or remove some old archives which are no longer needed.
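A sketch of how to check the usage and prune old archives on C); the 30-day retention is only an example, so make sure the archives are really no longer needed before removing them:

    df -h .                      # free space on the volume
    du -sh current/archive/*     # size of each archived drop
    # remove archive directories older than 30 days
    find current/archive -maxdepth 1 -mindepth 1 -type d -mtime +30 -exec rm -r {} +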
Adding a new region
----------------------------------------------------------------
When it is necessary to add a new region (similar to amer, eu, sea), follow the steps below:
- get from Epsilon the SFTP login and key for the new region
- ask platform engineering to create a new user on C), using the name specified for the new region.
The newly created account must have the BASH scripts described above, and the directory structure inside the "current" directory must be created (see the sketch after this list)
- get from platform engineering the login and RSA key needed to connect to this account over SSH
- install on the above account the key which is going to be used for SSH communication
- install on the above account the PGP key used to decrypt the PGP files
- ask platform engineering to configure the above account to have access to AWS
- ask platform engineering to create a new S3 bucket for the new region and add the region name to the properties file
- ask platform engineering to create a new database in Athena and configure it
- make sure that MongoDB contains the data related to the new region and the expected brands
- add the Athena queries to the resources directory for the new region
- modify the local properties
- redeploy the application to B)
- if it is possible to use WORKGROUPs on Athena, then platform engineering must make sure that the newly created S3 bucket and database are configured properly
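A sketch of the directory structure the scripts expect inside the new account's home; the directory names are taken from the process description above, but compare with an existing region's account before relying on it:

    mkdir -p current/encrypted current/md5 current/decrypted current/archive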