Laxman Bangaru

Development
Porto, Portugal

Skills

Data Engineering
AWS (Amazon Web Services)
Athena (AWS)
Apache Spark
Azure Databricks
S3 (AWS)
Python
Linux
Hive
Azure Storage
Jenkins
Ansible
Agile
MySQL
Scala
SQL
Shell Scripting
PySpark
Hadoop
HDFS
MapReduce
YARN
Sqoop
Apache Oozie
Snowflake
Glue (AWS)
IAM (AWS)
Azure (Microsoft Azure)
Microsoft SQL Server
Cassandra
HBase
Git
Maven
IntelliJ
ETL

About

Laxman is a highly skilled Data Engineer with over six years of experience in Big Data, cloud technologies, and advanced data processing. He has a proven track record of leveraging tools like PySpark, Hadoop, and AWS to build scalable, efficient, and automated data solutions. With expertise in managing end-to-end data pipelines, optimizing performance, and collaborating with cross-functional teams, Laxman excels at delivering data-driven insights across diverse industries such as healthcare, finance, and telecom. His adaptability and ability to quickly master new technologies make him an invaluable asset for any data-focused initiative.

Accomplishments

Streamlined XML Data Processing for MAN Trucks: Laxman designed and implemented a scalable data pipeline using PySpark, AWS Glue, and Snowflake to process large volumes of XML data, ensuring seamless integration, automation, and reliable data insights.
Optimized Data Transformation and Storage: At Mphasis, Laxman revamped data pipelines by converting complex SQL queries into highly efficient Spark DataFrames, resulting in significant performance gains for business-critical operations.

Work Experience

Senior Data Engineer

Capgemini
January 2024 - Present
  • Developed PySpark scripts for processing XML-formatted truck data (see the sketch below).
  • Ensured proper data parsing, extraction, and validation to maintain integrity.
  • Collaborated with AWS specialists to integrate data pipelines using S3, Glue, and Athena.
  • Utilized Snowflake and DBT to enhance ETL and data warehousing processes.
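
A minimal sketch of the kind of PySpark XML ingestion described above, assuming the spark-xml package (com.databricks:spark-xml) is on the classpath; the "Truck" row tag, S3 paths, and column names are illustrative placeholders rather than the actual pipeline:

    # Read XML truck records into a DataFrame (assumes spark-xml is available).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("truck-xml-ingest").getOrCreate()

    raw = (spark.read.format("xml")
           .option("rowTag", "Truck")                 # hypothetical XML row element
           .load("s3://example-bucket/truck-data/"))  # placeholder S3 location

    # Basic validation: keep only records with a vehicle id before writing out.
    valid = raw.filter(F.col("vehicleId").isNotNull())
    valid.write.mode("append").parquet("s3://example-bucket/curated/trucks/")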

Data Engineer

Aubay Consulting
June 2023 - December 2023
  • Implemented Sqoop imports from RDBMS sources and transformed the data with Spark (Scala).
  • Migrated SQL queries into Spark DataFrames to improve performance (see the sketch below).
  • Managed code maintenance through Git and Jenkins, ensuring smooth deployment processes.
  • Conducted Spark tuning activities and created Control-M jobs for automation.
  • Enhanced and updated code based on changing business requirements.
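
A minimal sketch of the SQL-to-DataFrame migration mentioned above; the table and column names are hypothetical. Both forms compile to the same Catalyst plan, so gains from such a rewrite typically come from restructuring the logic, caching intermediate results, and tuning partitions along the way:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("sql-to-dataframe").getOrCreate()

    # Original style: a SQL string executed against a registered table.
    sql_result = spark.sql("""
        SELECT customer_id, SUM(amount) AS total
        FROM transactions
        WHERE status = 'SETTLED'
        GROUP BY customer_id
    """)

    # Migrated style: the same logic as composable DataFrame operations.
    df_result = (spark.table("transactions")
                 .filter(F.col("status") == "SETTLED")
                 .groupBy("customer_id")
                 .agg(F.sum("amount").alias("total")))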

Big Data Developer

Mphasis India Pvt. Ltd.
December 2021 - July 2023
  • Implemented Sqoop imports from RDBMS sources, loaded the files into Hadoop Hive tables with Spark, and transformed the data with PySpark DataFrames based on business requirements (see the sketch below).
  • Created Hive tables, views, and indexes, and loaded and analyzed data using Hive queries.
  • Scheduled jobs in Control-M, created dependencies where needed, and moved the code to production.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Handled DevOps activities such as code maintenance in Git, Maven builds via Jenkins, and deployments with Ansible.
  • Performed Spark tuning and created Control-M jobs.
  • Enhanced existing code based on changing business requirements.
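
A minimal sketch of the Sqoop-to-Hive-to-PySpark flow described above, assuming the data has already landed in a Hive staging table via a Sqoop import; the database, table, and column names are placeholders:

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("hive-transform")
             .enableHiveSupport()   # needed to read and write Hive tables
             .getOrCreate())

    orders = spark.table("staging.orders")   # hypothetical Sqoop-loaded table

    # Business transformation: keep open orders and stamp the load date.
    curated = (orders
               .filter(F.col("order_status") == "OPEN")
               .withColumn("load_dt", F.current_date()))

    curated.write.mode("overwrite").saveAsTable("curated.open_orders")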

Software Engineer

Recvue
September 2021 - December 2021
  • Implemented Sqoop imports from RDBMS sources, loaded the files into Hadoop Hive tables with Spark (Scala), and transformed the data with Spark DataFrames based on business requirements; automated the Sqoop and Spark jobs with shell scripts.
  • Maintained code quality with SonarQube.
  • Converted SQL queries to Scala DataFrames for better performance.
  • Handled DevOps activities such as code maintenance in Git, Maven builds via Jenkins, and deployments with Ansible.
  • Performed Spark tuning and created Oozie jobs.
  • Enhanced existing code based on changing business requirements.

Big Data Developer

Cognizant Technology Solutions India Pvt. Ltd.
March 2020 - September 2021
  • Implemented Sqoop imports from RDBMS sources, loaded the files into Hadoop Hive tables with Spark (Scala), and transformed the data with Spark DataFrames based on business requirements; automated the Sqoop and Spark jobs with shell scripts.
  • Maintained code quality with SonarQube.
  • Converted SQL queries to Scala DataFrames for better performance.
  • Handled DevOps activities such as code maintenance in Git, SBT builds via Jenkins, and deployments with Ansible.
  • Performed Spark tuning and created Oozie jobs.
  • Enhanced existing code based on changing business requirements.

Hadoop Developer

Novartis-Dice-Brazil 2.0
July 2019 - February 2020
  • Wrote transformation logic in Spark to meet business requirements.
  • Used Sqoop to extract and load incremental and non-incremental data from RDBMS systems into Hadoop.
  • Wrote Hive scripts to reduce job execution time.
  • Created Hive tables, views, and indexes, and loaded and analyzed data using Hive queries.
  • Enhanced the code based on business requirements.
  • Scheduled jobs in Control-M, created dependencies where needed, and moved the code to production.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Created the framework and implemented the metadata processes.
  • Designed logical and physical data models.
  • Used Spark SQL to process large volumes of structured data (see the sketch below).
  • Managed and reviewed Hadoop log files.
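
A minimal sketch of using Spark SQL over Hive-managed structured data, as described above; the database, table, view, and column names are illustrative only:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("spark-sql-structured")
             .enableHiveSupport()
             .getOrCreate())

    # A Hive view that downstream jobs and analysts can query directly.
    spark.sql("""
        CREATE OR REPLACE VIEW analytics.active_patients AS
        SELECT patient_id, region, last_visit_dt
        FROM warehouse.patients
        WHERE status = 'ACTIVE'
    """)

    by_region = spark.sql("""
        SELECT region, COUNT(*) AS active_count
        FROM analytics.active_patients
        GROUP BY region
    """)
    by_region.show()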

Hadoop Developer

Anthem
October 2018 - July 2019
  • Built automated, scalable, distributed data solutions using StreamSets Data Collector.
  • Managed jobs with the Control-M scheduler, creating dependencies and triggering job flows.
  • Performed data validations and filtered out error records before loading data into the Hive database.
  • Improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Debugged and troubleshot production systems, profiling workloads and identifying performance bottlenecks.
  • Installed, configured, supported, and managed the Hadoop cluster, its underlying infrastructure, and deployment tools.
  • Programmed extensively with Resilient Distributed Datasets (RDDs).
  • Worked extensively with structured data in HiveQL, including join operations, custom UDFs (see the sketch below), and Hive query optimization.
  • Used Cloudera Manager, an end-to-end tool for managing Hadoop operations, in the Cloudera cluster.
  • Used Apache Flume and Kafka to collect, aggregate, and move large volumes of data from sources such as web servers and databases.
  • Imported and exported data between RDBMS systems and the Hadoop ecosystem using Apache Sqoop.
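
Hive custom UDFs are usually written as Java classes and registered in HiveQL; as a rough PySpark analog of the same idea, here is a sketch of a Python UDF that normalizes member ids (the function, column, and sample values are hypothetical):

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    @F.udf(returnType=StringType())
    def normalize_member_id(raw):
        # Strip whitespace and zero-pad to a fixed width, tolerating nulls.
        return raw.strip().zfill(12) if raw else None

    claims = spark.createDataFrame([(" 123 ",), (None,)], ["member_id"])
    claims.select(normalize_member_id("member_id").alias("member_id")).show()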

Hadoop/Spark Developer

Anthem
January 2018 - September 2018
  • Used the Cloudera distribution for the Hadoop ecosystem.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Used Hive partitioning, bucketing, and collections, and performed different types of joins on Hive tables (see the sketch below).
  • Created Hive external tables to perform ETL on data that is generated daily.
  • Created HBase tables for random lookups as per the requirement of business logic.
  • Performed transformations using Spark and loaded the data into HBase tables.
  • Performed validation on the data ingested to filter and cleanse the data in Hive.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations and actions.
  • Imported data as Parquet files for some use cases using Sqoop to improve processing speed for later analytics.
  • Assisted in exporting analyzed data to the NoSQL databases Cassandra and HBase using Sqoop.
  • Implemented test script to support test-driven development and continuous integration.
  • Worked on tuning the performance of Hive and Pig queries.
  • Tuned the performance of Hadoop clusters and Hadoop MapReduce routines.
  • Managed and reviewed Hadoop log files.
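
A minimal sketch of the partitioned external Hive table pattern behind the bullets above; the paths, database, and column names are placeholders, and the DDL assumes Spark was built with Hive support:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS staging.events (
            event_id STRING,
            payload  STRING
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION '/data/staging/events'
    """)

    # Register partitions written to HDFS by a daily ingest (e.g., a Sqoop job).
    spark.sql("MSCK REPAIR TABLE staging.events")
    spark.sql("SELECT load_date, COUNT(*) AS n FROM staging.events GROUP BY load_date").show()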

Education

Microsoft

Microsoft Certified: Azure Data Engineer Associate
January 2025 - Present
Credential ID: D4B008C830B50175; Certification number: 5EEFFP-D52F83

Kite College of Professional Engineering and Sciences

B.Tech
September 2010 - May 2014
Electronics and Communication Engineering

Sree Triveni Junior College

MPC
June 2008 - April 2010

SSSM Shishu Mandir

SSC
May 2006 - April 2007