
Amudha Eerichetty

Development
North Carolina, United States

Skills

Data Engineering

About

Amudha's skills align with Programmers (Information and Communication Technology), with additional skills associated with System Developers and Analysts (Information and Communication Technology). Amudha has 9 years of work experience.

Work Experience

Hadoop Developer / Data Engineer

Bank of America
June 2023 - Present
  • Responsibilities:
    * Understood the business functionality and analyzed the business requirements of Hadoop projects.
    * Used Sqoop to import data from Oracle into HDFS and to export processed data back into Oracle.
    * Created Hive tables to store the processed results in tabular format.
    * Used Hive queries in Spark SQL to analyze and process data; created Spark DataFrames and performed transformations, conversions, and merges of data based on the requirement documents provided by the client (illustrated in the sketch after this list).
    * Retrieved database/schema metadata from the production environment, including HDFS directory paths, table/view names, application names, group names, and user details, and wrote the schema names to a file.
    * Used a Unix script to iterate through all schemas, fetch their details, and write them into a final Hive table stored in text format; maintained project documentation and the knowledge base.
    * Developed Spark Streaming jobs in Python to consume data from Kafka topics, transform it, and insert it into HBase.
    * Analyzed large data sets by running Hive queries and Spark jobs.
    * Worked with AutoSys jobs for loading data into tables and for small-file compaction on daily, weekly, and monthly schedules.
    * Worked with various data-processing frameworks developed internally at Bank of America.
    * Used Bitbucket to store and manage code and to track and control changes to it.
    * Implemented CI/CD pipelines for ETL workflows, automated with Git.
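
  For illustration, a minimal PySpark sketch of the Hive / Spark SQL pattern described above. The table and column names (customer_txn, account_id, txn_dt, amount, processed.daily_totals) are hypothetical placeholders, not taken from the project.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      # Spark session with Hive support so Spark SQL can query Hive tables.
      spark = (SparkSession.builder
               .appName("hive-spark-sql-example")
               .enableHiveSupport()
               .getOrCreate())

      # Run a Hive query through Spark SQL and get back a DataFrame.
      txns = spark.sql("SELECT account_id, txn_dt, amount FROM customer_txn")

      # DataFrame transformations: filter, derive a column, aggregate.
      daily = (txns.filter(F.col("amount") > 0)
                   .withColumn("txn_day", F.to_date("txn_dt"))
                   .groupBy("account_id", "txn_day")
                   .agg(F.sum("amount").alias("daily_total")))

      # Store the processed result back in Hive in tabular form.
      daily.write.mode("overwrite").saveAsTable("processed.daily_totals")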

Hadoop Developer / Data Engineer

Prudential Insurance
June 2022 - June 2023
  • Responsibilities:
    * Coordinated with the team on features and bugs to be resolved, prioritization, WBS, planning, and task-deviation analysis.
    * Developed integrated Spark streaming code in Databricks to dynamically receive, pre-process, enrich, transform, and validate XML data sets and JSON logs, then load them into the US data mart warehouse (a simplified sketch follows this list).
    * Used SQL to integrate and analyze data from different sources and to tune queries, reducing execution time and improving performance.
    * Prepared data for downstream systems to consume, for ML model building, and for dynamic Power BI visualizations and dashboards.
    * Developed Spark Streaming jobs in Python to consume data from Kafka topics, transform it, and insert it into HBase.
    * Used Spark as a fast, general-purpose processing engine compatible with Hadoop data.
    * Used Spark to design and run both batch processing (similar to MapReduce) and newer workloads such as streaming, interactive queries, and machine learning.
    * Analyzed large data sets by running Hive queries and Spark jobs.
    * Introduced cascade jobs to make data analysis more efficient, as required.
    * Worked on NoSQL databases including HBase and Cassandra.
    * Implemented CI/CD pipelines using ETL workflows for Hadoop projects.
    * Developed modules to pre-process and compute values from incoming XML data sets and JSON logs.
    * Developed a real-time Spark application in Scala to populate a dynamic debtor score for each transaction and action, based on historical click-stream data, structured product entities, and customer web-event data, using Parquet and ORC file formats.
    * Used RESTful APIs to build scalable, interoperable web services that promote loose coupling between client and server, making the APIs easier to evolve and maintain over time.
    * Packaged the code into an Azure DevOps Git repository and maintained it through builds executed with Azure Pipelines.
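
  A minimal Spark Structured Streaming sketch, in Python, of the Kafka-to-warehouse ingestion pattern described above. The broker address, topic name, JSON schema, and output paths are hypothetical, and the sink is simplified to Parquet rather than the project's actual data mart / HBase target.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F
      from pyspark.sql.types import StructType, StructField, StringType, DoubleType

      spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

      # Hypothetical schema for incoming JSON log events.
      schema = StructType([
          StructField("event_id", StringType()),
          StructField("customer_id", StringType()),
          StructField("amount", DoubleType()),
          StructField("event_ts", StringType()),
      ])

      # Consume a Kafka topic as a streaming DataFrame.
      raw = (spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
             .option("subscribe", "policy_events")               # hypothetical topic
             .load())

      # Parse, validate, and lightly enrich the payload before loading downstream.
      events = (raw.selectExpr("CAST(value AS STRING) AS json")
                   .select(F.from_json("json", schema).alias("e"))
                   .select("e.*")
                   .filter(F.col("amount").isNotNull())
                   .withColumn("event_ts", F.to_timestamp("event_ts")))

      # Write the validated stream out; a checkpoint location is required.
      query = (events.writeStream.format("parquet")
               .option("path", "/mnt/datamart/policy_events")         # hypothetical path
               .option("checkpointLocation", "/mnt/chk/policy_events")
               .start())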

Hadoop Developer

Axis Bank
December 2020 - May 2022
  • Responsibilities:
    * Responsible for building scalable distributed data solutions using Hadoop.
    * Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
    * Developed simple to complex MapReduce jobs using Hive and Pig.
    * Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
    * Performed data processing on large sets of structured, unstructured, and semi-structured data.
    * Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
    * Handled data imports from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
    * Replaced MapReduce with PySpark, boosting performance by roughly 3x.
    * Worked with RDDs and DataFrames in Spark to process data at a faster rate.
    * Involved in ETL data cleansing, integration, and transformation using Hive and PySpark (see the sketch after this list).
    * Responsible for migrating from Hadoop MapReduce to Spark, using in-memory distributed computing for real-time fraud detection.
    * Implemented batch processing of data sources using Apache Spark.
    * Involved in creating Hive tables, loading them with data, and writing Hive queries.
    * Provided cluster coordination services through ZooKeeper.
    * As part of a POC, set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
    * Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
    * Used the Apache Oozie workflow scheduler for managing and executing Hadoop jobs.
    * Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
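
  A minimal PySpark sketch of the kind of ETL cleansing and transformation described above. The HDFS paths, column names, and cleansing rules are hypothetical, shown only to illustrate the pattern.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("etl-cleansing").getOrCreate()

      # Read raw delimited data from HDFS (hypothetical path and layout).
      raw = spark.read.option("header", "true").csv("hdfs:///data/raw/transactions")

      # Cleanse: drop duplicate records, discard rows missing a key,
      # enforce a numeric type, and fill a default for a missing category.
      cleaned = (raw.dropDuplicates(["txn_id"])
                    .na.drop(subset=["account_id"])
                    .withColumn("amount", F.col("amount").cast("double"))
                    .na.fill({"channel": "UNKNOWN"}))

      # Write the integrated, transformed data back to HDFS as Parquet,
      # partitioned for efficient downstream Hive and Spark queries.
      (cleaned.write.mode("overwrite")
              .partitionBy("channel")
              .parquet("hdfs:///data/curated/transactions"))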

Hadoop Developer / Sr. Data Analyst

Four Soft Private Limited
September 2018 - November 2020
  • Responsibilities:
    * Worked on analyzing the Hadoop cluster and various big data analytic tools, including Hive, Spark, Sqoop, Flume, and Oozie.
    * Involved in importing and exporting data between local and external file systems, RDBMSs, and HDFS.
    * Worked extensively with Hive Query Language (HQL).
    * Designed a data warehouse using Hive; created and managed Hive tables in Hadoop.
    * Responsible for managing data from disparate sources.
    * Solved performance issues in Hive through an understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
    * Imported data from critical applications into HDFS using Sqoop for data analysis.
    * Worked on setting up Hadoop in a pseudo-distributed environment.
    * Developed MapReduce scripts in Java.
    * Involved in unit testing and test data preparation for various business requirements.
    * Hands-on experience reading and parsing XML and JSON files using Spark and Hive (a JSON example is sketched after this list).
    * Replaced the existing data analysis tool with Hadoop.
    * Moved between agile and waterfall approaches depending on project specifics and client goals, creating detailed project road maps, plans, schedules, and work breakdown structures.
    * Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries.
    * Used Microsoft Team Foundation Server for project tracking, bug tracking, and project management.
    * Participated in Scrum calls, grooming sessions, and demo meetings.
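
  A minimal PySpark sketch of parsing JSON files and querying them with HiveQL-style SQL, as described above. The input path, field names, and query are hypothetical; parsing XML would additionally rely on a package such as spark-xml, which is omitted here.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("json-parse").getOrCreate()

      # Read and parse JSON files from a hypothetical landing directory;
      # Spark infers the schema from the documents.
      shipments = spark.read.json("hdfs:///data/landing/shipments/*.json")

      # Register the parsed data and analyze it with HiveQL-style SQL.
      shipments.createOrReplaceTempView("shipments")
      late = spark.sql("""
          SELECT carrier, COUNT(*) AS late_count
          FROM shipments
          WHERE status = 'LATE'
          GROUP BY carrier
      """)
      late.show()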

Hadoop Developer / Data Analyst

Netxcell Limited
January 2017 - August 2018
  • Responsibilities:
    * Generated heatmaps and count plots using the seaborn and matplotlib packages; detected and removed outliers.
    * Performed dimensionality reduction with t-SNE for visualization; performed feature scaling and data normalization using StandardScaler.
    * Developed credit card risk models to enhance existing risk scorecards and marketing analytics models.
    * Among the candidate models, a Random Forest was selected for its higher degree of comprehensiveness and better performance.
    * Evaluated the predictive model on test data and generated classification reports to identify fraudulent transactions (a sketch of this workflow follows this list).
    * Responsible for using Spark and Hive for data transformation, with intensive use of Spark SQL to analyze vast data stores and uncover insights.
    * Implemented ingestion pipelines to migrate ETL to Hadoop using Spark Streaming and Oozie workflows; loaded unstructured data into the Hadoop Distributed File System (HDFS).
    * Conducted POCs on migrating to Spark and Spark Streaming with Kafka to process live data streams, and compared Spark performance with Hive and SQL.
    * Experienced in writing SQL queries for ETL, with a strong understanding of data warehousing.
    * Highly analytical and process-oriented, with in-depth knowledge of database types; research methodologies; and big data capture, mining, manipulation, and visualization.
    * Experienced with the Hadoop framework and its ecosystem, including HDFS, MapReduce, YARN, Spark, Hive, Impala, Sqoop, and Oozie.
    * Experience in data ingestion using Spark, Sqoop, and Kafka.
    * Experienced in Spark programming with Scala and Python.
    * Expertise in Spark Streaming (Lambda architecture), Spark SQL, and tuning and debugging Spark clusters on Mesos.
    * Mentored junior developers and kept them updated on current technologies such as Hadoop, Spark, and Spark SQL.
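
  A minimal scikit-learn sketch of the scaling, Random Forest training, and classification-report steps described above. The input file, feature columns, and label (is_fraud) are hypothetical, and features are assumed to be numeric.

      import pandas as pd
      from sklearn.model_selection import train_test_split
      from sklearn.preprocessing import StandardScaler
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import classification_report

      # Hypothetical labeled transaction data with numeric features.
      df = pd.read_csv("transactions.csv")
      X = df.drop(columns=["is_fraud"])
      y = df["is_fraud"]

      X_train, X_test, y_train, y_test = train_test_split(
          X, y, test_size=0.2, stratify=y, random_state=42)

      # Scale features; Random Forests do not require it, but scaling was
      # part of the wider pipeline (e.g. for t-SNE visualization).
      scaler = StandardScaler()
      X_train_s = scaler.fit_transform(X_train)
      X_test_s = scaler.transform(X_test)

      model = RandomForestClassifier(n_estimators=200, random_state=42)
      model.fit(X_train_s, y_train)

      # Classification report on held-out test data to assess fraud detection.
      print(classification_report(y_test, model.predict(X_test_s)))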

Data Analyst

Nakshatra IT solutions
January 2015 - December 2016
  • Responsibilities:
    * Leveraged regression and time-series forecasting techniques in Python on current and historical sales data to help project salesperson performance (a sketch follows this list).
    * Extracted program viewership data from logs and cleansed it using Python (NumPy, Pandas) to segment and store the data in SQL and Oracle databases for further analysis and reporting.
    * Developed custom SQL functions, actions, groups, sets, and data-merge procedures to enable rapid development of reports and dashboards.
    * Helped develop multiple Tableau dashboards based on global program viewership information, tailored to content providers' needs and requirements.
    * Involved in troubleshooting, resolving, and escalating data-related issues and in automating data validation procedures to improve data quality.
    * Created test cases and test scripts for multiple ETL workflows.
    * Validated multiple reports created in SSRS and Tableau.
    * Helped develop weekly project status reports, distributed to business and project teams to measure progress against established milestones.
    * Experienced in writing SQL queries for ETL, with a strong understanding of data warehousing.
    * Highly analytical and process-oriented, with in-depth knowledge of database types; research methodologies; and big data capture, mining, manipulation, and visualization.
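
  A minimal pandas / scikit-learn sketch of regression-based sales forecasting of the kind described above. The file name, column names, lag choices, and holdout size are hypothetical.

      import pandas as pd
      from sklearn.linear_model import LinearRegression

      # Hypothetical monthly sales history.
      sales = pd.read_csv("monthly_sales.csv", parse_dates=["month"]).sort_values("month")

      # Build lag features from historical revenue to predict the next month.
      for lag in (1, 2, 3):
          sales[f"lag_{lag}"] = sales["revenue"].shift(lag)
      sales = sales.dropna()

      X = sales[["lag_1", "lag_2", "lag_3"]]
      y = sales["revenue"]

      # Train on all but the last six months; evaluate on the held-out tail.
      X_train, X_test = X.iloc[:-6], X.iloc[-6:]
      y_train, y_test = y.iloc[:-6], y.iloc[-6:]

      model = LinearRegression().fit(X_train, y_train)
      forecast = model.predict(X_test)
      print(pd.DataFrame({"actual": y_test.values, "forecast": forecast}))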

Education

JNTU

Bachelor's