
Revolutionizing Data Transformation: A Comprehensive Guide to Building an Autonomous Pipeline

As the keynote speaker at the Data Engineering Summit 2023, Rittika Jindal, a seasoned Cloud Architect at Thomson Reuters, walked the audience through the development of a dynamic, autonomous data transformation pipeline. The tech stack comprised AWS Fargate containers, DBT, and Airflow, and focused on the ‘T’ in the ETL or ELT pipeline. The pipeline equips business teams to construct their own data marts independently, relying on SQL and DBT, and to create dashboards for rapid insights without depending excessively on data engineers.

The Nature of the Transformation Pipeline:

Jindal emphasized the concept of a transformation pipeline, distinguishing it from conventional data extraction and loading pipelines. A transformation pipeline primarily targets the ‘T’ in ETL, assuming the data is already within the system and focusing solely on transformation. Thomson Reuters uses this pipeline to drive autonomy in their data analytics, incorporating technologies like DBT, Airflow, and AWS Fargate containers.

Promoting Autonomy in Data Analytics:

The primary goal of this transformation pipeline is to foster autonomy within Thomson Reuters’ data analytics team. Jindal explained that multiple teams operate in their data domain: the data engineering team, responsible for building data pipelines and maintaining years of accumulated data, and the analytics team, which works closely with business units to generate insights, models, and dashboards. To bridge the gap between raw data and data that analysts and data scientists can interpret, Thomson Reuters devised the data transformation pipeline. It empowers the analytics team to transform data and build their own marts using simple SQL queries.

Choosing the Right Tools: DBT and Airflow

After settling on the idea of a data transformation pipeline, the next challenge was choosing the right tools. After weighing multiple factors, including licensing costs, upskilling requirements, and ease of use, DBT emerged as the best fit for data wrangling and transformation. Open-source and user-friendly, DBT’s sole focus is on transforming the data already present in the system.
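To illustrate why DBT suits SQL-fluent analysts, a dbt model is simply a SQL file that dbt materializes in the warehouse. The model, table, and column names below are hypothetical, not from the talk:

```sql
-- models/marts/customer_revenue.sql (hypothetical model)
-- dbt materializes this query as a table in the warehouse;
-- {{ ref() }} resolves dependencies between models automatically.
{{ config(materialized='table') }}

select
    c.customer_id,
    sum(o.order_total) as lifetime_revenue
from {{ ref('stg_customers') }} c
join {{ ref('stg_orders') }} o
    on o.customer_id = c.customer_id
group by c.customer_id
```

Because a model is plain SQL plus `ref()` calls, analysts can build marts without writing pipeline code.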

For workflow orchestration, Jindal and her team considered various tools before settling on Airflow. They needed a tool that could orchestrate not only DBT tasks but also Python code and third-party API calls. Airflow, a platform that lets you script workflows in Python, complemented DBT well and fulfilled these requirements.
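To make the orchestration idea concrete, here is a minimal sketch of what Airflow does for a set of dbt models: derive an execution order from model dependencies and issue one `dbt run --select <model>` command per model. The model names and dependencies are invented for illustration; in the actual pipeline, Airflow tasks would execute these commands rather than a simple loop.

```python
from graphlib import TopologicalSorter

# Hypothetical dbt models and their upstream dependencies
# (what {{ ref() }} would declare inside each model's SQL).
model_deps = {
    "stg_customers": [],
    "stg_orders": [],
    "customer_revenue": ["stg_customers", "stg_orders"],
}

def dbt_command(model: str) -> list[str]:
    """Shell command that runs a single dbt model."""
    return ["dbt", "run", "--select", model]

# Airflow would schedule these as dependent tasks; here we just
# compute a valid execution order and print each model's command.
order = list(TopologicalSorter(model_deps).static_order())
for model in order:
    print(" ".join(dbt_command(model)))
```

Running one model per command is what later allows each model to get its own short-lived Fargate container.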

Implementing the Solution:

The final step was to integrate everything into a seamless architecture, adhering to the architectural principles of scalability and high availability. Thomson Reuters prioritizes building solutions on the cloud and prefers serverless designs. This strategy led to a solution architecture utilizing Amazon ECR for storing Docker images and AWS Fargate for running serverless containers.

The architecture works as follows: the Docker image containing DBT and Python is stored in Amazon ECR. From there, Amazon ECS task definitions are created, which launch Fargate containers. Each container runs the DBT image, executes a single model, and then shuts down. The entire process is orchestrated through Airflow, making the system efficient, scalable, and high-performing.
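A rough sketch of that launch step is below. The cluster name, task definition, subnet, and container name are placeholders (none of these identifiers come from the talk); the shape of the `run_task` parameters follows the standard ECS API.

```python
def build_run_task_kwargs(model: str) -> dict:
    """Parameters for an ECS run_task call that launches one Fargate
    container from the dbt image to run a single model."""
    return {
        "cluster": "dbt-transform-cluster",   # placeholder name
        "taskDefinition": "dbt-runner:1",     # placeholder name
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": ["subnet-PLACEHOLDER"],
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                # Override the container command so this task runs
                # exactly one model, then the container exits.
                {"name": "dbt", "command": ["dbt", "run", "--select", model]}
            ]
        },
    }

def launch_model(model: str) -> None:
    """In the real pipeline, an Airflow task would make this call."""
    import boto3  # requires AWS credentials and network access
    boto3.client("ecs").run_task(**build_run_task_kwargs(model))

kwargs = build_run_task_kwargs("customer_revenue")
print(kwargs["overrides"]["containerOverrides"][0]["command"])
# → ['dbt', 'run', '--select', 'customer_revenue']
```

Because each task runs one model and then terminates, compute is paid for only while a model is building, and models without mutual dependencies can run in parallel containers.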

Conclusion:

Rittika Jindal’s talk shed light on an innovative approach to data transformation, emphasizing the importance of autonomy in data analytics and making intelligent choices in tool selection. By implementing a data transformation pipeline using DBT, Airflow, and AWS Fargate, Thomson Reuters has efficiently streamlined their data analytics process, minimizing dependencies and accelerating insights generation.

AIM Research
AIM Research is the world's leading media and analyst firm dedicated to advancements and innovations in Artificial Intelligence. Reach out to us at info@aimresearch.co