
Kumar Vaibhav

Development
Karnataka, India

Skills

Data Engineering

About

Kumar Vaibhav's skills align with System Developers and Analysts (Information and Communication Technology), and he also has skills associated with Programmers (Information and Communication Technology). He appears to be a low-to-mid-level candidate with 4 years of experience.

Work Experience

GCP Data Engineer

CME Group, Bangalore
May 2022 - Present
  • CME Group is the world's leading financial derivatives exchange, trading asset classes that include agricultural products, currencies, energy, metals, and stock index futures.
  • Created Spark jobs on Databricks over GCP infrastructure, with Python as the programming language.
  • Wrote Spark code and ran it on Databricks clusters as well as GCP Dataproc; also managed the Databricks Unity Catalog.
  • Interacted with data residing in HDFS using PySpark.
  • Wrote a PySpark program to parse out the needed data, selecting the columns with target information and assigning them names (an illustrative sketch follows this entry).
  • Managed and deployed automation scripts alongside processing scripts.
  • Transferred data from the cluster to a long-term storage system.
  • Executed Hadoop/Spark jobs on Dataproc clusters against data stored in GCS buckets.
  • Led the offshore data engineering team, worked with the onshore team on requirement gathering, and maintained onshore-offshore coordination.
  • Developed multiple Spark Streaming and batch Spark jobs using Python on AWS.
  • Defined and optimized Spark jobs using techniques such as parameter and flag tuning.
  • Performed coverage and unit testing on the Python scripts using pytest.
  • Worked with terabytes of data in HDFS.
  • Developed archival scripts that could be run either in a Jupyter Notebook or on the command line.
  • Developed multiple processing jobs to filter and transform data.
  • Used PySpark modules to store the data on HDFS.
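A minimal, illustrative sketch of the kind of PySpark column-selection job described above; the app name, paths, and column names are hypothetical placeholders, not values from the actual project.

    from pyspark.sql import SparkSession

    # Hypothetical sketch: read raw data, keep only the columns carrying target
    # information, and assign them descriptive names.
    spark = SparkSession.builder.appName("parse_target_columns").getOrCreate()

    raw = spark.read.parquet("hdfs:///data/raw/trades")    # placeholder path
    parsed = (
        raw.select("trade_id", "asset_class", "px")        # placeholder columns
           .withColumnRenamed("px", "price")
    )
    parsed.write.mode("overwrite").parquet("gs://example-bucket/curated/trades")
    spark.stop()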

HCL Tech
June 2021 - May 2022
  • HCLTech is an Indian services-based MNC; worked for the Commonwealth Bank of Australia.
  • Created Spark jobs on Databricks over AWS infrastructure, with Python as the programming language.
  • Developed Kafka producer and consumer programs and ingested data into AWS S3 buckets (an illustrative sketch follows this entry).
  • Configured spark-submit commands to allocate resources to jobs across the cluster.
  • Loaded and transformed large sets of structured and semi-structured data using AWS Glue.
  • Implemented different IAM instance profiles and roles to connect tools in AWS.
  • Evaluated and proposed new tools and technologies to meet the needs of the organization.
  • Applied a strong understanding of AWS tools such as Glue, EMR, S3, Lambda, Redshift, and Athena.
  • Used Avro, Parquet, and ORC data formats for storage in HDFS.
  • Collected log information using custom-engineered input adapters and Kafka.
  • Orchestrated workflows in Apache Airflow to run ETL pipelines across AWS tools.
  • Integrated streams with Spark Streaming for high-speed processing.
  • Developed Spark jobs for data processing and Spark SQL/Streaming jobs for distributed data processing.
  • Wrote simple SQL scripts on the final database to prepare data for visualization with Tableau.
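A minimal, illustrative sketch of the kind of Kafka-to-S3 ingestion described above, using the kafka-python and boto3 libraries; broker address, topic, and bucket names are hypothetical placeholders.

    import boto3
    from kafka import KafkaConsumer, KafkaProducer

    # Producer side: publish a record to a Kafka topic (placeholder broker/topic).
    producer = KafkaProducer(bootstrap_servers="broker.example.com:9092")
    producer.send("transactions", b'{"account_id": 123, "amount": 42.5}')
    producer.flush()

    # Consumer side: read records from the topic and land them in an S3 bucket.
    consumer = KafkaConsumer("transactions",
                             bootstrap_servers="broker.example.com:9092",
                             auto_offset_reset="earliest")
    s3 = boto3.client("s3")
    for i, message in enumerate(consumer):
        s3.put_object(Bucket="example-landing-bucket",
                      Key=f"transactions/msg-{i}.json",
                      Body=message.value)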

AWS Big Data Developer

Wipro
April 2020 - May 2021
  • Wipro is an Indian services-based MNC; worked for the retail client Levi's USA.
  • Developed a Kafka queue system to collect log data without data loss and published it to multiple destinations.
  • Worked on the Cloudera CDH Hadoop distribution, including partitions on Hive external tables and Hive UDFs.
  • Defined and implemented the schema for a custom HBase table.
  • Built daily Sqoop incremental data loads using Sqoop jobs scheduled through an Oozie shell-script action.
  • Wrote a Python validation script to compare row counts between Hive tables and the Oracle RDBMS (an illustrative sketch follows the Cognizant entry below).
  • Worked on Tableau reporting connected to Hive using HiveServer2 (Thrift server).
  • Optimized Hive analytics SQL queries, created tables and views, and wrote custom queries and a Hive-based exception process.
  • Loaded data from the UNIX file system into HDFS.
  • Ingested data using Flume with Kafka as the source and HDFS as the sink.
  • Performed storage capacity management, performance tuning, and benchmarking of clusters.
  • Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.

Business Intelligence Technical Systems Analyst

Cognizant, Noida, India
April 2016 - March 2020
  • Cognizant is a multinational engineering, design, planning, architectural design, project management, and consulting services company.
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Used MySQL day to day to debug and fix issues with client processes.
  • Developed, tested, and implemented a financial-services application to bring multiple clients into a standard database format.
  • Worked with the business functional lead to review and finalize requirements and data-profiling analysis.
  • Responsible for gathering requirements and for designing and developing the applications.
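A minimal, illustrative sketch of the kind of Hive-versus-Oracle row-count validation script described in the Wipro entry above; hostnames, credentials, and table names are hypothetical placeholders.

    import cx_Oracle
    from pyhive import hive

    # Hypothetical connections; real hosts, credentials, and tables differ.
    hive_conn = hive.Connection(host="hive-server.example.com", port=10000)
    oracle_conn = cx_Oracle.connect("scott/tiger@oracle-host.example.com/ORCLPDB")

    def row_count(conn, table):
        # Count rows in the given table using the connection's cursor.
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]

    hive_count = row_count(hive_conn, "sales.orders")       # placeholder table
    oracle_count = row_count(oracle_conn, "SALES.ORDERS")   # placeholder table

    if hive_count == oracle_count:
        print(f"Row counts match: {hive_count}")
    else:
        print(f"Mismatch: Hive={hive_count}, Oracle={oracle_count}")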

Education

Jaypee Institute of Information Technology, Noida

B.Tech
January 2012 - January 2016