NUTRINTG Crowdtwist to CDP Integration (Harmony)
This project is built on the Spring Integration framework. Its purposes are:
Fetch files from the Crowdtwist SFTP.
Check whether the fetched files match the expected files.
Encrypt the files.
Send the encrypted files to the Epsilon SFTP.
Each purpose is handled by its own service:
file-fetcher
file-set-analyzer
file-encrypter
file-sender
Bitbucket URL: https://bitbucket.org/rbdigital/crowdtwist-to-cdp-integration
File Fetcher
The File Fetcher process is triggered every 5 seconds (on all environments; subject to change if necessary). It polls the Crowdtwist SFTP for files whose names match the configured regex patterns and transfers them to S3. Regex patterns supported in production:
(?<filename>(2_(?<clientid>[^]*)USER(PROFILE|ACTIVITY|REDEMPTION)(?<date>[0-9]{8}).zip))
(?<filename>((PURCHASE_Transaction_Detail_|Receipt_Scan_Activity_)(?<year>\\d{4})(?<month>\\d{2}))(?<day>\\d{2})\\d{6}.zip)
(?<filename>(2(?<clientid>[^]*)POINTS_EXPIRATION(?<date>[0-9]{8}).zip))
(?<filename>(295_points_expiration(?<year>\\d{4})(?<month>\\d{2}))(?<day>\\d{2}).csv)
(?<filename>(USAC_historic_receipt_scan_(?<year>\d{4})-(?<month>\d{2}))-(?<day>\d{2})_\d{6}.csv)
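As an illustration of how these patterns expose parts of a file name, here is a minimal sketch that applies the USAC_historic_receipt_scan pattern from the list above and reads its named groups. The class name, method name, and sample file name are hypothetical and not taken from the project, and whether the project matches the whole name or only searches within it is an assumption.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the project: matches a file name
// against one of the listed patterns and reads its named groups.
class FileNamePatternDemo {

    // The USAC_historic_receipt_scan pattern from the list, as a Java string literal
    // (each regex backslash is doubled inside the literal).
    static final Pattern HISTORIC_RECEIPT_SCAN = Pattern.compile(
            "(?<filename>(USAC_historic_receipt_scan_(?<year>\\d{4})-(?<month>\\d{2}))-(?<day>\\d{2})_\\d{6}.csv)");

    // Returns the date encoded in the file name as yyyy-MM-dd,
    // or empty when the whole name does not match the pattern.
    static Optional<String> extractDate(String fileName) {
        Matcher m = HISTORIC_RECEIPT_SCAN.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(m.group("year") + "-" + m.group("month") + "-" + m.group("day"));
    }

    public static void main(String[] args) {
        // Sample names are made up for this sketch.
        System.out.println(extractDate("USAC_historic_receipt_scan_2023-01-15_123456.csv"));
        System.out.println(extractDate("unexpected_file.csv"));
    }
}
```

Note that the dot before the extension is unescaped in the pattern as listed, so it matches any single character, not only a literal dot.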
Crowdtwist SFTP details, alongside the regex patterns, are stored in MongoDB in the crowdtwistToCdpFileFetcher_sftpProperties collection. The bucket name is stored in the aws.s3.bucket property in AWS. Data about polled files is stored in crowdtwistToCdp_fileMetaDataStore. Data about file transfers in progress is stored in the crowdtwistToCdpFileFetcher_progressMark collection. The process ends by sending a message to a queue on AWS SQS. The queue name is stored in the aws.sqs.file-fetcher-to-file-set-analyzer.url property in AWS.
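This README does not say how the data about polled files is used. A common reason for keeping such data is to avoid fetching the same file twice, and the sketch below shows that idea in plain Java. It is an assumption about intent, not the project's implementation: the class and method names are hypothetical, and an in-memory map stands in for a persistent store.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for an accept-once filter. The in-memory map replaces
// whatever persistent store the project actually uses.
class AcceptOnceFilter {

    // file name -> last-modified timestamp of the version already accepted
    private final ConcurrentMap<String, Long> seen = new ConcurrentHashMap<>();

    // Returns true when the file is new or has changed since it was last seen.
    boolean accept(String fileName, long lastModified) {
        Long previous = seen.put(fileName, lastModified);
        return previous == null || previous != lastModified;
    }
}
```

With a poll every 5 seconds, a check of this kind would let repeated polls skip files that were already transferred, while a changed timestamp would cause the file to be picked up again.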
File Set Analyzer
Triggered by a message from the queue. It validates the file name in the queue message against the data in the crowdtwistToCdpFileSetAnalyzer_strategyCodes collection. Currently there is only one strategy code, "code", which serves as a placeholder for possible future extensions. The process ends with an email and a message to a queue on AWS SQS. The queue name is stored in the aws.sqs.file-set-analyzer-to-file-encrypter.url property in AWS.
File Encrypter
Triggered by a message from the queue. It encrypts a file using a public key that is stored in the appropriate YAML file in the infra/secrets folder and uploaded to AWS during the Terraform run. The file is SOPS-protected; contact the platform team for additional details. Information about encrypted files is stored in the crowdtwisttocdpfileencrypter_encryptedFiles collection. The process ends with an email and a message to a queue on AWS SQS. The queue name is stored in the aws.sqs.file-encrypter-to-file-sender.url property in AWS.
File Sender
Triggered by a message from the queue. It sends an encrypted file to the Epsilon SFTP. SFTP connection details are stored in AWS; additional details can be found in the infra/main.tf file. Information about sent files is stored in the crowdtwistToCdpFileSender_sentFiles collection. The process ends with sending an email.
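The hand-offs between the four services described above can be summarised in a small lookup type. This is only a sketch for orientation: the enum and its methods are hypothetical and do not exist in the project, while the service names and property keys are the ones listed in this README.

```java
import java.util.Optional;

// Hypothetical summary type, not part of the project: lists the four services
// in pipeline order with the property that names each one's outbound queue.
enum PipelineStage {
    FILE_FETCHER("file-fetcher", "aws.sqs.file-fetcher-to-file-set-analyzer.url"),
    FILE_SET_ANALYZER("file-set-analyzer", "aws.sqs.file-set-analyzer-to-file-encrypter.url"),
    FILE_ENCRYPTER("file-encrypter", "aws.sqs.file-encrypter-to-file-sender.url"),
    // The File Sender is the last stage: it ends with an email, not a queue message.
    FILE_SENDER("file-sender", null);

    private final String serviceName;
    private final String outboundQueueProperty;

    PipelineStage(String serviceName, String outboundQueueProperty) {
        this.serviceName = serviceName;
        this.outboundQueueProperty = outboundQueueProperty;
    }

    String serviceName() {
        return serviceName;
    }

    // Property holding the outbound queue, or empty for the last stage.
    Optional<String> outboundQueueProperty() {
        return Optional.ofNullable(outboundQueueProperty);
    }

    // Next stage in the pipeline, or empty for the last one.
    Optional<PipelineStage> next() {
        int i = ordinal() + 1;
        return i < values().length ? Optional.of(values()[i]) : Optional.empty();
    }

    public static void main(String[] args) {
        for (PipelineStage stage : values()) {
            System.out.println(stage.serviceName() + " -> "
                    + stage.outboundQueueProperty().orElse("(email only)"));
        }
    }
}
```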
Closing Notes
As always, the best documentation is the code itself. Keep in mind, however, that there are some misleading inconsistencies, which stem from the project structure having been copied from other Harmony projects. A good example is the SQS messages, which carry information about a single file in an array instead of a string.
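When reading or writing a consumer for such a message, it can help to make the single-file expectation explicit. The following is a minimal sketch under the assumption that the payload has already been deserialized into a list of strings. The class and method names are hypothetical, and the real message classes and field names are in the project code.

```java
import java.util.List;

// Hypothetical helper: unwraps the one file reference carried in an
// array-shaped message payload and rejects anything else.
class SingleFileMessage {

    static String onlyFile(List<String> files) {
        if (files == null || files.size() != 1) {
            throw new IllegalArgumentException("Expected exactly one file, got: " + files);
        }
        return files.get(0);
    }
}
```

Failing fast here makes it obvious if a message ever arrives with zero files or with more than one, instead of silently processing only the first.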
The first place to look is the configuration files in src/main/resources of each module. To understand the flow, investigate the IntegrationConfiguration.java files in each module. Lastly, the infra folder contains the Terraform files for AWS.
Together, these should give a solid understanding of the Crowdtwist to CDP integration.