Laxman Bangaru

Development
Porto, Portugal

Skills

Data Engineering
AWS (Amazon Web Services)
Athena (AWS)
Apache Spark
Azure Databricks
S3 (AWS)
Python
Linux
Hive
Azure Storage
Jenkins
Ansible
Agile
MySQL
Scala
SQL
Shell Scripting
PySpark
Hadoop
HDFS
MapReduce
YARN
Sqoop
Apache Oozie
Snowflake
Glue (AWS)
IAM (AWS)
Azure (Microsoft Azure)
Microsoft SQL Server
Cassandra
HBase
Git
Maven
IntelliJ
ETL

About

Laxman is a highly skilled Data Engineer with over six years of experience in Big Data, cloud technologies, and advanced data processing. He has a proven track record of leveraging tools like PySpark, Hadoop, and AWS to build scalable, efficient, and automated data solutions. With expertise in managing end-to-end data pipelines, optimizing performance, and collaborating with cross-functional teams, Laxman excels at delivering data-driven insights across diverse industries such as healthcare, finance, and telecom. His adaptability and ability to quickly master new technologies make him an invaluable asset for any data-focused initiative.

Accomplishments

Streamlined XML Data Processing for MAN Trucks: Laxman designed and implemented a scalable data pipeline using PySpark, AWS Glue, and Snowflake to process large volumes of XML data, ensuring seamless integration, automation, and reliable data insights.
Optimized Data Transformation and Storage: At Mphasis, Laxman revamped data pipelines by converting complex SQL queries into highly efficient Spark DataFrames, resulting in significant performance gains for business-critical operations.

Work Experience

Senior Data Engineer

Capgemini
January 2024 - Present
  • Developed PySpark scripts for processing XML-formatted truck data (see the sketch below).
  • Ensured proper data parsing, extraction, and validation to maintain integrity.
  • Collaborated with AWS specialists to integrate data pipelines using S3, Glue, and Athena.
  • Utilized Snowflake and DBT to enhance ETL and data warehousing processes.
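
A minimal sketch of the kind of PySpark XML ingestion described above, assuming the spark-xml package (com.databricks:spark-xml) is on the classpath; the "Truck" row tag, S3 paths, and column names are illustrative placeholders rather than the actual pipeline:

    # Read XML truck records into a DataFrame (assumes spark-xml is available).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("truck-xml-ingest").getOrCreate()

    raw = (spark.read.format("xml")
           .option("rowTag", "Truck")                 # hypothetical XML row element
           .load("s3://example-bucket/truck-data/"))  # placeholder S3 location

    # Basic validation: keep only records with a vehicle id before writing out.
    valid = raw.filter(F.col("vehicleId").isNotNull())
    valid.write.mode("append").parquet("s3://example-bucket/curated/trucks/")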

Data Engineer

Aubay Consulting
June 2023 - December 2023
  • Implemented Sqoop imports from RDBMS sources and transformed the data with Spark (Scala).
  • Migrated SQL queries into Spark DataFrames to improve performance (see the sketch below).
  • Managed code maintenance through Git and Jenkins, ensuring smooth deployment processes.
  • Conducted Spark tuning activities and created Control-M jobs for automation.
  • Enhanced and updated code based on changing business requirements.
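
A minimal sketch of the SQL-to-DataFrame migration mentioned above; the table and column names are hypothetical. Both forms compile to the same Catalyst plan, so gains from such a rewrite typically come from restructuring the logic, caching intermediate results, and tuning partitions along the way:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("sql-to-dataframe").getOrCreate()

    # Original style: a SQL string executed against a registered table.
    sql_result = spark.sql("""
        SELECT customer_id, SUM(amount) AS total
        FROM transactions
        WHERE status = 'SETTLED'
        GROUP BY customer_id
    """)

    # Migrated style: the same logic as composable DataFrame operations.
    df_result = (spark.table("transactions")
                 .filter(F.col("status") == "SETTLED")
                 .groupBy("customer_id")
                 .agg(F.sum("amount").alias("total")))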

Big Data Developer

Mphasis India Pvt. Ltd.
December 2021 - July 2023
  • Implemented Sqoop imports from RDBMS sources, loaded the files into Hadoop Hive tables with Spark, and transformed the data with PySpark DataFrames based on business requirements (see the sketch below).
  • Created Hive tables, views, and indexes, and loaded and analyzed data using Hive queries.
  • Scheduled jobs in Control-M, created dependencies where needed, and moved the code to production.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Handled DevOps activities such as code maintenance in Git, Maven builds via Jenkins, and deployments with Ansible.
  • Performed Spark tuning and created Control-M jobs.
  • Enhanced existing code based on changing business requirements.
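
A minimal sketch of the Sqoop-to-Hive-to-PySpark flow described above, assuming the data has already landed in a Hive staging table via a Sqoop import; the database, table, and column names are placeholders:

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("hive-transform")
             .enableHiveSupport()   # needed to read and write Hive tables
             .getOrCreate())

    orders = spark.table("staging.orders")   # hypothetical Sqoop-loaded table

    # Business transformation: keep open orders and stamp the load date.
    curated = (orders
               .filter(F.col("order_status") == "OPEN")
               .withColumn("load_dt", F.current_date()))

    curated.write.mode("overwrite").saveAsTable("curated.open_orders")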

Software Engineer

Recvue
September 2021 - December 2021
  • Implemented Sqoop imports from RDBMS sources, loaded the files into Hadoop Hive tables with Spark (Scala), and transformed the data with Spark DataFrames based on business requirements; automated the Sqoop and Spark jobs with shell scripts.
  • Maintained code quality with SonarQube.
  • Converted SQL queries to Scala DataFrames for better performance.
  • Handled DevOps activities such as code maintenance in Git, Maven builds via Jenkins, and deployments with Ansible.
  • Performed Spark tuning and created Oozie jobs.
  • Enhanced existing code based on changing business requirements.

Big Data Developer

Cognizant Technology Solutions India Pvt. Ltd.
March 2020 - September 2021
  • Implemented Sqoop imports from RDBMS sources, loaded the files into Hadoop Hive tables with Spark (Scala), and transformed the data with Spark DataFrames based on business requirements; automated the Sqoop and Spark jobs with shell scripts.
  • Maintained code quality with SonarQube.
  • Converted SQL queries to Scala DataFrames for better performance.
  • Handled DevOps activities such as code maintenance in Git, SBT builds via Jenkins, and deployments with Ansible.
  • Performed Spark tuning and created Oozie jobs.
  • Enhanced existing code based on changing business requirements.

Hadoop Developer

Novartis-Dice-Brazil 2.0
July 2019 - February 2020
  • Wrote transformation logic in Spark to meet business requirements.
  • Used Sqoop to extract and load incremental and non-incremental data from RDBMS systems into Hadoop.
  • Wrote Hive scripts to reduce job execution time.
  • Created Hive tables, views, and indexes, and loaded and analyzed data using Hive queries.
  • Enhanced the code based on business requirements.
  • Scheduled jobs in Control-M, created dependencies where needed, and moved the code to production.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Created the framework and implemented the metadata processes.
  • Designed logical and physical data models.
  • Used Spark SQL to process large volumes of structured data (see the sketch below).
  • Managed and reviewed Hadoop log files.
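
A minimal sketch of using Spark SQL over Hive-managed structured data, as described above; the database, table, view, and column names are illustrative only:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("spark-sql-structured")
             .enableHiveSupport()
             .getOrCreate())

    # A Hive view that downstream jobs and analysts can query directly.
    spark.sql("""
        CREATE OR REPLACE VIEW analytics.active_patients AS
        SELECT patient_id, region, last_visit_dt
        FROM warehouse.patients
        WHERE status = 'ACTIVE'
    """)

    by_region = spark.sql("""
        SELECT region, COUNT(*) AS active_count
        FROM analytics.active_patients
        GROUP BY region
    """)
    by_region.show()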

Hadoop Developer

Anthem
October 2018 - July 2019
  • Built automated, scalable, distributed data solutions using StreamSets Data Collector.
  • Managed jobs with the Control-M scheduler, creating dependencies and triggering job flows.
  • Performed data validations and filtered out error records before loading data into the Hive database.
  • Improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Debugged and troubleshot production systems, profiling workloads and identifying performance bottlenecks.
  • Installed, configured, supported, and managed the Hadoop cluster, its underlying infrastructure, and deployment tools.
  • Programmed extensively with Resilient Distributed Datasets (RDDs).
  • Worked extensively with structured data in HiveQL, including join operations, custom UDFs (see the sketch below), and Hive query optimization.
  • Used Cloudera Manager, an end-to-end tool for managing Hadoop operations, in the Cloudera cluster.
  • Used Apache Flume and Kafka to collect, aggregate, and move large volumes of data from sources such as web servers and databases.
  • Imported and exported data between RDBMS systems and the Hadoop ecosystem using Apache Sqoop.
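
Hive custom UDFs are usually written as Java classes and registered in HiveQL; as a rough PySpark analog of the same idea, here is a sketch of a Python UDF that normalizes member ids (the function, column, and sample values are hypothetical):

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    @F.udf(returnType=StringType())
    def normalize_member_id(raw):
        # Strip whitespace and zero-pad to a fixed width, tolerating nulls.
        return raw.strip().zfill(12) if raw else None

    claims = spark.createDataFrame([(" 123 ",), (None,)], ["member_id"])
    claims.select(normalize_member_id("member_id").alias("member_id")).show()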

Hadoop/Spark Developer

Anthem
January 2018 - September 2018
  • Used the Cloudera distribution for the Hadoop ecosystem.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Used Hive partitioning, bucketing, and collections, and performed different types of joins on Hive tables (see the sketch below).
  • Created Hive external tables to perform ETL on data that is generated daily.
  • Created HBase tables for random lookups as per the requirement of business logic.
  • Performed transformations using Spark and loaded the data into HBase tables.
  • Performed validation on the data ingested to filter and cleanse the data in Hive.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations and actions.
  • Imported data as Parquet files for some use cases using Sqoop to improve processing speed for later analytics.
  • Assisted in exporting analyzed data to the NoSQL databases Cassandra and HBase using Sqoop.
  • Implemented test script to support test-driven development and continuous integration.
  • Worked on tuning the performance of Hive and Pig queries.
  • Tuned the performance of Hadoop clusters and Hadoop MapReduce routines.
  • Managed and reviewed Hadoop log files.
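
A minimal sketch of the partitioned external Hive table pattern behind the bullets above; the paths, database, and column names are placeholders, and the DDL assumes Spark was built with Hive support:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS staging.events (
            event_id STRING,
            payload  STRING
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION '/data/staging/events'
    """)

    # Register partitions written to HDFS by a daily ingest (e.g., a Sqoop job).
    spark.sql("MSCK REPAIR TABLE staging.events")
    spark.sql("SELECT load_date, COUNT(*) AS n FROM staging.events GROUP BY load_date").show()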

Education

Microsoft

Microsoft Certified: Azure Data Engineer Associate
January 2025 - Present
Credential ID: D4B008C830B50175; Certification number: 5EEFFP-D52F83

Kite College of Professional Engineering and Sciences

B.Tech
September 2010 - May 2014
Electronics and Communication Engineering

Sree Triveni Junior College

MPC
June 2008 - April 2010

SSSM Shishu Mandir

SSC
May 2006 - April 2007