Surya Yerasi
Development
Texas, United States
Skills
Data Engineering
About
Surya Sainath Reddy Yerasi's skills align with System Developers and Analysts (Information and Communication Technology). Surya also has skills associated with Database Specialists (Information and Communication Technology). Surya Sainath Reddy Yerasi has 6 years of work experience.
Work Experience
Data Engineer
Nike
April 2021 - January 2022
- Ingested large volumes of user behavioral data and customer profile data into the analytics data store.
- Extracted real-time data using SQL and analyzed 20+ product streams, activation, and app usage metrics to track sales performance.
- Continuously ensured ROI goals were met, with a 15% boost in GMV targets.
- Designed reusable components, frameworks, and libraries at scale to support analytics products for consumer insights.
- Developed ETL jobs to extract data from sources such as Oracle and Microsoft SQL Server, transform it using Hive Query Language (HQL), and load it into the Hadoop Distributed File System (HDFS).
- Managed daily batch runs on Airflow clusters for data migration between ingestion layers and inventory forecasting.
- Wrote Airflow DAGs using Python and Spark SQL queries and deployed them to the cluster; migrated data across EMR clusters using Spark commands.
- Used Python packages such as Pandas and NumPy.
Data Engineer
Aetna
October 2020 - March 2021
- Developed a workload automation framework in Bash for executing Hive/Beeline and PySpark big data pipelines. The framework supports error handling and alarming, QC validation, and sequential/progressive execution.
- Worked on development and integration of an AWS S3 ingestion framework, helping the Hadoop Ingestion Team ingest data from AWS S3 into the Hadoop data lake.
- Set up ETL pipelines on AWS using Docker, Kafka, S3, AWS Glue, Elasticsearch, and RDS instances to provide real-time audience analysis.
- Automated data ingestion from the AWS Glue Data Catalog to DynamoDB using Hive scripts, AWS Step Functions, and Amazon EventBridge.
- Used CloudFormation to deploy and maintain multi-region infrastructure.
- Implemented the design framework for migrating databases from the Hadoop 2 environment to Hadoop 3.
Data Engineer
Capital One
October 2019 - September 2020
- Supported building the data warehouse platform for Capital One Auto Finance data, which scaled to 100+ TB of data and tens of thousands of queries per week, improving decision-making efficiency by 400%.
- Provided data-driven insights for Auto Finance that increased the rate of customer acquisition by 20%.
- Optimized SQL query design for the highest-priority data science ETL pipeline, migrating its HiveQL queries to PySpark and redesigning intermediate table loading processes, reducing runtime by 30%.
- Redesigned a critical incoming data pipeline with the upstream web development team, migrating data transmission from multiple REST API calls to ingestion of a single batch CSV file in an AWS S3 bucket, enhancing pipeline efficiency and stability.
- Developed ETL programs to move data from DynamoDB to Amazon Redshift using AWS Lambda, with Python functions triggered by specific events based on use cases.
Data Engineer
Syneren Technologies
May 2019 - October 2019
- Initiated the move of on-prem applications to AWS Cloud and migrated 75% of them, reducing infrastructure cost by 70%.
- Implemented an audit framework, partnering closely with business stakeholders to review existing data deliverables and develop new ones, so that metrics are reported accurately and support better business decisions.
- Designed and developed batch and real-time big data analytical applications to monitor mission-critical applications so that the data can be used to drive and improve the business.
- Leveraged Apache Spark as the processing engine to transform data and produce analytics, sourcing data from RDBMS databases and AWS S3 buckets.
- Implemented AWS Data Pipeline to trigger scheduled EMR jobs in the AWS big data stack and to send alerts on the success and failure of the pipeline jobs.
Data Engineer
TIF LABS
June 2017 - July 2018
- Built a cloud data platform on AWS to enable data-driven business decisions and meaningful data insights, and developed email alerts to detect EC2 shortages.
- Built the dimensional model and data warehouse for the loan processing data mart for HDFC Bank.
- Key contributor to the custom production ETL codebase in Python for financial data transformations.
- Ingested data with Kafka, saved raw data to S3, batch processed data with Spark, trained an ALS-based collaborative filtering model, and stored data in Cassandra to provide usable, timely big data for generating insights, improving efficiency by 40%.
- Created aggregates for large-scale data sets, reducing data processing time by 30%.
Data & ML Engineer
Amazon
January 2022 - Present
- Developed a data-driven support system empowering executives to make informed strategic decisions, leading to a 30% increase in production efficiency and annual savings of $7M across 8 facilities.
- Designed and implemented a data lineage solution across the data discovery environment.
- Designed and developed an Industrial Data Fabric for factory visibility by integrating data from various systems, including ERP, ADP, IoT, and M3.
- Developed DAG-level and job-level lineage, which provides a clear understanding of how data flows through the system, identifies potential security risks or vulnerabilities, and reduces costs by identifying unused or redundant processes that can be eliminated.
- Led the whole development process, including writing design review documents, holding design review meetings, implementation, maintenance, and collecting feedback.