Rahul Kasam
Development
Michigan, United States
Skills
Data Engineering
About
Rahul Kasam's skills align with System Developers and Analysts (Information and Communication Technology), and he also has skills associated with Database Specialists (Information and Communication Technology). He has 5 years of work experience.
Work Experience
Associate Hadoop/GCP Data Engineer
FORD MOTOR COMPANY
October 2020 - January 2022
- Led a team of five in developing a Hadoop-based data processing pipeline to process and analyze automotive data, leveraging tools such as HDFS, Hive, and MapReduce.
- Applied data science techniques, including machine learning and predictive modeling, to solve business problems and extract valuable insights from data.
- Monitored and optimized Airflow DAG performance, ensuring the reliability and efficiency of data workflows.
- Developed Terraform modules to automate the deployment of cloud infrastructure, ensuring consistency and scalability.
- Led data migration projects, transferring large datasets between Oracle, Teradata, and other relational database systems while ensuring data consistency.
- Implemented data storage solutions using GCP services such as Cloud Storage and Bigtable for efficient data management.
- Developed a full-stack data supply chain e-commerce platform using Java, Spring Boot, and MySQL.
- Implemented dataflow architectures using Apache Beam to enable efficient data processing and transformation (see the pipeline sketch after this list), and developed systems for monitoring and responding to real-time events.
- Utilized Python and Spark to transform and cleanse raw data, ensuring high-quality inputs for downstream analytics.
- Integrated Dataproc with other GCP services such as BigQuery, Cloud Storage, and Pub/Sub for seamless data ingestion, storage, and analysis.
- Developed and maintained interactive dashboards and reports in data visualization tools such as Tableau, Matplotlib, Power BI, and QlikView to provide stakeholders with real-time insights.
- Specialized in financial data analysis, including the evaluation of financial statements, investment trends, and risk assessment.
- Executed database migration projects, transitioning data between Oracle and SQL Server while mitigating risks and minimizing downtime.
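The Beam-based dataflow work above is easier to picture with a short sketch. The batch pipeline below is a minimal illustration only, assuming a hypothetical CSV telemetry file and field names (vin, sensor, value); it is not the actual Ford pipeline.

```python
# Minimal Apache Beam batch pipeline sketch (hypothetical schema and paths).
import csv
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_row(line):
    """Parse one CSV line into a dict; field names are illustrative only."""
    vin, sensor, value = next(csv.reader([line]))
    return {"vin": vin, "sensor": sensor, "value": float(value)}


def run():
    options = PipelineOptions()  # on GCP this would carry Dataflow runner settings
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://example-bucket/telemetry.csv")
            | "Parse" >> beam.Map(parse_row)
            | "FilterValid" >> beam.Filter(lambda r: r["value"] >= 0)
            | "KeyBySensor" >> beam.Map(lambda r: (r["sensor"], r["value"]))
            | "MeanPerSensor" >> beam.combiners.Mean.PerKey()
            | "Format" >> beam.MapTuple(lambda sensor, mean: f"{sensor},{mean}")
            | "Write" >> beam.io.WriteToText("gs://example-bucket/output/sensor_means")
        )


if __name__ == "__main__":
    run()
```

The same pipeline code can run locally with the DirectRunner or on Dataflow by changing the pipeline options, which is the usual appeal of Beam for this kind of work.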
Big Data Engineer
ACCRETION INFO SYSTEMS PRIVATE LIMITED
May 2017 - July 2018
- Integrated data pipelines with AWS CodePipeline to automate the deployment and management of data-related workflows.
- Leveraged Kinesis Data Streams to process and analyze real-time data feeds, enabling timely decision-making (see the consumer sketch after this list).
- Administered MySQL, PostgreSQL, Oracle, and AWS DynamoDB databases, ensuring data integrity, security, and high availability.
- Conducted troubleshooting to diagnose and fix software and hardware problems.
- Improved overall project efficiency and removed bottlenecks in data pipelines, leading to a 40% reduction in resource utilization.
- Supported the migration of on-premises data to cloud-based platforms, gaining exposure to hybrid data storage architectures.
- Utilized advanced data modeling techniques to create robust data schemas that optimized data storage, retrieval, and analytical capabilities.
- Applied predictive analytics techniques to make data-driven forecasts.
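As a rough illustration of the Kinesis Data Streams consumption described above, the snippet below polls a single shard with boto3. The stream name, region, and JSON payload format are assumptions; a production consumer would more likely use the Kinesis Client Library, enhanced fan-out, or a Lambda trigger rather than raw polling.

```python
# Minimal boto3 sketch of reading a Kinesis Data Stream (illustrative only).
import json
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
STREAM = "example-events"  # hypothetical stream name

# Take the first shard only, for simplicity.
shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in resp["Records"]:
        event = json.loads(record["Data"])  # assumes JSON-encoded payloads
        print(event)                        # real code would transform/load here
    iterator = resp["NextShardIterator"]
    time.sleep(1)  # stay under the per-shard read limits
```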
Big Data and Cloud Computing Intern
November 2016 - May 2017
- Participated in data transformation tasks, cleaning and formatting raw data for analysis, and developed data processing pipelines using Hadoop and Spark (see the PySpark sketch after this list).
- Applied strong quantitative skills to data analysis and decision-making.
- Utilized Git workflows (e.g., Gitflow, GitHub Flow) to streamline the development process.
- Worked with YAML for configuration and data representation.
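A minimal PySpark sketch of the cleaning and formatting step mentioned above might look like the following; the column names, key field, and HDFS paths are hypothetical placeholders.

```python
# Minimal PySpark sketch of cleaning and reformatting raw data (hypothetical schema).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw-data-cleanup").getOrCreate()

raw = spark.read.option("header", True).csv("hdfs:///data/raw/events.csv")

clean = (
    raw.dropDuplicates()
       .na.drop(subset=["event_id"])                        # drop rows missing the key
       .withColumn("event_ts", F.to_timestamp("event_ts"))  # normalize timestamps
       .withColumn("amount", F.col("amount").cast("double"))
)

clean.write.mode("overwrite").parquet("hdfs:///data/clean/events/")
```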
AWS/GCP Data Engineer
NEW YORK LIFE INSURANCE COMPANY
January 2022 - Present
- Developed ETL pipelines using Python, PySpark, and AWS Glue to transform and enrich data from multiple sources.
- Leveraged AWS services, including Amazon S3, Redshift, and Athena, to store and query large volumes of data.
- Implemented data version control and lineage tracking to enhance data governance and ensure data quality.
- Optimized data models in Redshift, resulting in a 30% improvement in query performance and reduced response times.
- Designed and implemented data integration workflows using Apache Airflow for efficient task scheduling and data movement.
- Leveraged GCP services such as BigQuery, Dataflow, and Pub/Sub to build scalable and cost-effective data processing pipelines.
- Implemented serverless and managed-service architectures using AWS Lambda, EC2, EMR, RDS, and ECS to automate data processing and streamline workflows.
- Worked closely with data scientists to deploy machine learning models as scalable APIs using AWS Lambda, REST, GraphQL, and API Gateway.
- Collaborated with business analysts and stakeholders to define data requirements and translate them into scalable, efficient, and reliable data pipeline architectures and solutions.
- Created custom Airflow operators and DAGs to automate data ingestion, transformation, and loading processes (see the operator sketch after this list).
- Conducted performance tuning on PySpark jobs, optimizing resource allocation and improving overall efficiency.
- Led the migration of on-premises data processing workloads to AWS for improved scalability and cost efficiency.
- Utilized Amazon CloudWatch for monitoring and alerting, proactively identifying performance issues in data pipelines.
- Ensured compliance with industry standards and data privacy regulations, maintaining the confidentiality and integrity of sensitive data while fostering a culture of continuous improvement.
- Designed and implemented a scalable, cost-effective data lake architecture on cloud platforms such as AWS S3, Azure Data Lake Storage, and Google Cloud Storage, and worked with data warehouse solutions such as Amazon Redshift, Snowflake, and Google BigQuery.
- Designed, developed, and maintained ETL processes using Pentaho Data Integration (Kettle), creating complex transformations to extract, transform, and load data from various sources into target databases.
- Implemented data modeling and schema design for Cassandra databases, with strong expertise in SQL and NoSQL database management.
- Collaborated with cross-functional teams to define data analysis goals, design experiments, and interpret results to drive data-driven strategies.
- Monitored SQL Server performance using built-in tools and third-party monitoring solutions, optimizing query execution and server resources.
- Integrated JSON data into various applications and systems.
- Applied big data technologies such as Hadoop, Spark, and Apache Kafka to process and analyze large datasets, with demonstrated success in enterprise-level data and software solutions.
- Developed and maintained disaster recovery plans for both Oracle and SQL Server environments, minimizing data loss and ensuring business continuity.
- Configured Jenkins jobs to automate build, test, and deployment processes, improving development efficiency.
- Designed and optimized DynamoDB databases for high-performance data storage.
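As an illustration of the custom Airflow operators and DAGs mentioned above, here is a minimal sketch assuming a recent Airflow 2.x release. The operator name, S3 path, schedule, and target table are hypothetical placeholders, not details of the actual New York Life pipelines.

```python
# Minimal sketch of a custom Airflow operator and DAG for an ingest step
# (operator name, paths, and schedule are hypothetical).
from datetime import datetime

from airflow import DAG
from airflow.models.baseoperator import BaseOperator


class CsvToWarehouseOperator(BaseOperator):
    """Illustrative operator: load a CSV extract into a warehouse table."""

    def __init__(self, source_path: str, target_table: str, **kwargs):
        super().__init__(**kwargs)
        self.source_path = source_path
        self.target_table = target_table

    def execute(self, context):
        # Real code would read self.source_path and COPY it into self.target_table,
        # e.g. via a Redshift or BigQuery hook; here we only log the intent.
        self.log.info("Loading %s into %s", self.source_path, self.target_table)


with DAG(
    dag_id="example_daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    load_policies = CsvToWarehouseOperator(
        task_id="load_policies",
        source_path="s3://example-bucket/extracts/policies.csv",
        target_table="staging.policies",
    )
```

Packaging the load logic in a reusable operator like this keeps each DAG file short and lets the same ingestion step be reused across pipelines with different paths and target tables.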