Cloud-Based Data Infrastructure Accelerates Innovation
Executive Summary
Our client is a market research company that provides data-driven workplace culture intelligence. The client sought our expertise in developing a cloud-based data infrastructure. The aim was to create a platform capable of seamlessly integrating data from multiple survey sources and enabling the extraction of data-driven insights. They also wanted to give each of their own clients access only to that client's data, while ensuring the privacy and security of data belonging to other clients. Leveraging AWS and Snowflake, Newtuple Technologies delivered a tailored solution to meet these requirements.
Prior to this, the client relied on spreadsheets to store survey data, which greatly hindered their ability to use the data for machine learning and required a manual, time-consuming workflow to give customers access to their data. The collaboration with Newtuple Technologies marked a significant transformation: we implemented and deployed a robust data infrastructure within four weeks. The new system is efficient and is also ready for integration with GenAI-based solutions.
X%
Reduction in ingestion time from diverse survey platforms
$Y
Cost reduction through the automation of customer data access
Z%
Efficiency improvement due to the implementation of automation processes
High-Level Objectives
- Develop a scalable, cloud-based platform for data storage and analytics.
- Decrease the time required to integrate new survey platforms into the current data system.
- Facilitate seamless data ingestion and run machine learning pipelines for inference.
- Automate the workflow that gives customers access to their data on Snowflake while maintaining privacy.
Problem Statement and Solution
The client conducts bi-annual employee surveys at US-based companies, using multiple survey platforms to evaluate workplace culture. This data is pivotal for educating their clients on cultural intelligence. The client's ability to integrate with a wide range of survey platforms is therefore crucial for extracting actionable, data-driven insights.
Our solution uses the Snowflake platform for data storage and AWS EC2 for data gathering and transformation. We gave the client access to an AWS S3 bucket for uploading survey data. A backend engine running on EC2 processes each upload into the appropriate format and then transfers the formatted data to Snowflake for storage. The solution is highly scalable and can accommodate an extensive client base. The platform also makes it easy to run machine learning pipelines, because users can query the data in their preferred format. Users can execute SQL queries against the stored data through SQLAlchemy.
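The formatting step can be pictured as mapping each platform's export onto one common column layout before loading. The sketch below is illustrative only and is not the delivered backend: the platform names, column headers, and the `normalize_survey_csv` helper are assumptions introduced for this example, and the S3 download and Snowflake load are left out.

```python
import csv
import io

# Hypothetical per-platform column mappings (source column -> common column).
# Real survey exports will use different headers; these are placeholders.
COLUMN_MAPS = {
    "platform_a": {
        "RespondentID": "respondent_id",
        "Question": "question",
        "Answer": "response",
    },
    "platform_b": {
        "resp_id": "respondent_id",
        "item": "question",
        "value": "response",
    },
}


def normalize_survey_csv(raw_csv: str, platform: str, customer_id: str) -> list:
    """Map one platform's CSV export onto the common column layout."""
    if platform not in COLUMN_MAPS:
        raise ValueError(f"no column mapping registered for platform {platform!r}")
    column_map = COLUMN_MAPS[platform]

    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Fail loudly if the export lacks a column the mapping expects.
        missing = [col for col in column_map if col not in record]
        if missing:
            raise ValueError(f"missing columns {missing} in {platform} export")
        # Tag every row with its owner so access can be restricted later.
        row = {"customer_id": customer_id, "platform": platform}
        for source_col, common_col in column_map.items():
            # Short rows yield None for absent fields; treat those as empty.
            row[common_col] = (record[source_col] or "").strip()
        rows.append(row)
    return rows


if __name__ == "__main__":
    export = "RespondentID,Question,Answer\n101,Q1,Agree\n"
    for row in normalize_survey_csv(export, "platform_a", "acme"):
        print(row)
```

Under this design, supporting a new survey platform would mostly mean registering one more mapping entry, which is one plausible way to keep integration time short.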
When a new customer's survey data is added to the platform, the solution detects the change automatically. It then generates credentials that grant access to that customer's data only and emails them to the client. The client can share these credentials with the new customer, who gets exclusive access to their own data while the data of other customers stays private. The entire process, from identifying a new customer to generating and distributing credentials, is fully automated by a Python-based backend running on an AWS EC2 instance.
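One way to picture this provisioning flow is a script that compares the customers present in the ingested data against those already provisioned, then builds the Snowflake statements for a per-customer role, secure view, and user. This is a minimal sketch under stated assumptions, not the actual implementation: the database, schema, table, and warehouse names, the `customer_id` column, and all helper functions are hypothetical. The statements are only built as strings here; executing them and emailing the credentials are omitted.

```python
import re
import secrets
import string

# Hypothetical Snowflake object names; the real ones are not given in the text.
DATABASE = "SURVEY_DB"
SOURCE_TABLE = "SURVEY_DB.PUBLIC.SURVEY_RESPONSES"
VIEW_SCHEMA = "SURVEY_DB.CUSTOMER_VIEWS"
WAREHOUSE = "REPORTING_WH"

# Only plain identifiers are accepted, since the ID is interpolated into SQL.
_SAFE_ID = re.compile(r"^[A-Za-z][A-Za-z0-9_]{0,63}$")


def find_new_customers(ingested_ids, provisioned_ids):
    """Return IDs seen in ingested data that have no access set up yet."""
    return sorted(set(ingested_ids) - set(provisioned_ids))


def generate_password(length: int = 20) -> str:
    """Random alphanumeric password with a lowercase, uppercase, and digit."""
    alphabet = string.ascii_letters + string.digits
    while True:
        pw = "".join(secrets.choice(alphabet) for _ in range(length))
        if (any(c.islower() for c in pw)
                and any(c.isupper() for c in pw)
                and any(c.isdigit() for c in pw)):
            return pw


def build_access_statements(customer_id: str, password: str) -> list:
    """Build SQL that limits one user to a view over one customer's rows."""
    if not _SAFE_ID.match(customer_id):
        raise ValueError(f"unsafe customer id: {customer_id!r}")
    cid = customer_id.upper()
    role, user, view = f"ROLE_{cid}", f"USER_{cid}", f"{VIEW_SCHEMA}.V_{cid}"
    return [
        f"CREATE ROLE IF NOT EXISTS {role}",
        f"CREATE OR REPLACE SECURE VIEW {view} AS "
        f"SELECT * FROM {SOURCE_TABLE} WHERE customer_id = '{customer_id}'",
        f"GRANT USAGE ON DATABASE {DATABASE} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {VIEW_SCHEMA} TO ROLE {role}",
        f"GRANT USAGE ON WAREHOUSE {WAREHOUSE} TO ROLE {role}",
        f"GRANT SELECT ON VIEW {view} TO ROLE {role}",
        f"CREATE USER IF NOT EXISTS {user} PASSWORD = '{password}' "
        f"DEFAULT_ROLE = {role} MUST_CHANGE_PASSWORD = TRUE",
        f"GRANT ROLE {role} TO USER {user}",
    ]


if __name__ == "__main__":
    for customer in find_new_customers(["acme", "globex"], ["acme"]):
        for statement in build_access_statements(customer, generate_password()):
            # Mask the generated password so it does not end up in logs.
            print(re.sub(r"PASSWORD = '[^']*'", "PASSWORD = '***'", statement))
```

Because the role is granted `SELECT` only on its own secure view and never on the underlying table, a customer logging in with these credentials would see only rows tagged with their own ID.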
Architecture and Technical Stack
Summary
The data infrastructure developed for the client significantly enhances their market research workflow efficiency. Comprehensive automation within the solution has yielded numerous benefits, ranging from minimizing manual data ingestion efforts to facilitating customer access to data views. This infrastructure has been designed with a focus on compatibility with GenAI-based solutions, aiming to expedite analytics processes on the stored data.