Ranjith Kusthapuram
Finance Expert
Kansas, United States
Skills
Data Engineering
About
Ranjith Kusthapuram's skills align with Programmers (Information and Communication Technology) and with System Developers and Analysts (Information and Communication Technology). He has 7 years of work experience.
Work Experience
Azure Data Engineer
Commerce Bank
May 2023 - Present
- Description: Commerce Bank is a regional bank delivering a full line of financial services, including business and personal banking, checking, savings, loans (including mortgages and student loans), credit and debit cards, investment services, and wealth management.
Responsibilities:
* Involved in all phases of data acquisition, collection, cleaning, model development, model validation, and visualization to meet the business needs of different teams.
* Developed L0 (raw) tables, applied the relevant use cases, and loaded the results into L1 (curated) tables.
* Created Databricks workflows with multiple subtasks and dependencies to enforce sequential execution and to prevent duplicate loads or accidental filtering of primary records.
* Worked extensively with Databricks platform components, creating interactive clusters for data analysis and development and job clusters for production-grade, scalable computation.
* Built Jenkins pipelines that run unit tests, integration tests, and static analysis tools.
* Wrote and executed MySQL queries from Python using the MySQL connector and MySQLdb packages. Performed load testing and optimization to ensure the pipeline scales efficiently with large data volumes.
* Designed and implemented pipelines in Azure Data Factory, leveraging Linked Services to extract, transform, and load data from diverse sources, including Azure SQL Data Warehouse, Azure Data Lake Storage, and a write-back tool.
* Integrated Kafka as a messaging system to ingest real-time weblogs into Spark Streaming, performing data quality checks and flagging records as bad or passable for further processing (a minimal sketch follows this role's description).
* Leveraged Databricks for batch and streaming data ingestion, transformation, and loading into final tables.
* Applied broadcasting, ADQ, and partitioning techniques that helped the team resolve roughly 30% of potential bottlenecks.
* Reduced workload delays by 20% by optimizing dataset operations for performance and scalability.
* Created and automated job flows using Airflow in Python; Airflow runs on a separate stack for DAG development and executes jobs on EMR or EC2 clusters. Wrote queries in MySQL and native SQL.
* Used a continuous delivery pipeline to deploy microservices, including provisioning Azure environments, and developed modules using Python and shell scripting. Involved in the entire project lifecycle: design, development, deployment, testing, implementation, and support.
* Consulted leadership and stakeholders on design recommendations, identified product and technical requirements, resolved technical problems, and proposed big-data-based analytical solutions.
* Analyzed existing SQL scripts and redesigned them in Spark SQL for faster performance.
* Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications using tools such as Git, Terraform, and Ansible.
* Integrated Azure Data Factory with Blob Storage to move data through Databricks for processing and then into Azure Data Lake Storage and Azure SQL Data Warehouse. Worked on migration of data from an on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
* Supported development of web portals, completed database modelling in PostgreSQL, and provided front-end support in HTML/CSS and jQuery. Created DataStage ETL jobs to populate the data warehouse continuously from source systems such as ODS, flat files, and Parquet.
* Developed multiple notebooks using PySpark and Spark SQL in Databricks for data extraction, analysis, and transformation according to business requirements (a curation sketch follows this role's description).
* Designed and implemented infrastructure as code using Terraform, enabling automated provisioning and scaling of cloud resources on Azure.
Environment: Azure Data Factory, AWS EMR, Spark, Scala, Python, Hive, Sqoop, Oozie, Kafka, YARN, JIRA, S3, Redshift, Athena, Shell Scripting, GitHub, Maven.
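Illustrative sketch (not project code): a minimal PySpark Structured Streaming job of the kind described in the Kafka weblog bullet above, reading weblogs from Kafka and attaching bad/passable quality flags; the broker, topic, schema, checkpoint path, and flag rules are all assumptions.

    # Illustrative only: topic, broker, schema, checkpoint path, and flag rules are assumed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName("weblog-quality-flags").getOrCreate()

    # Hypothetical weblog schema.
    schema = StructType([
        StructField("user_id", StringType()),
        StructField("url", StringType()),
        StructField("status_code", IntegerType()),
    ])

    # Read raw weblog events from Kafka.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
           .option("subscribe", "weblogs")                     # assumed topic
           .load())

    # Parse the JSON payload and attach a simple bad/passable quality flag.
    events = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e")).select("e.*")
    flagged = events.withColumn(
        "quality_flag",
        F.when(F.col("user_id").isNull() | F.col("status_code").isNull(), "bad").otherwise("passable"),
    )

    # Write flagged records to a Delta table for downstream curation.
    query = (flagged.writeStream
             .format("delta")
             .option("checkpointLocation", "/tmp/chk/weblogs")  # assumed path
             .outputMode("append")
             .toTable("l0_weblogs_flagged"))
    query.awaitTermination()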
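Illustrative sketch (not project code): a minimal PySpark notebook-style transformation from a raw L0 table to a curated L1 table, deduplicating on a business key as described in the Databricks notebook bullets above; table names, columns, and curation rules are assumptions.

    # Illustrative only: table names, columns, and dedup/curation rules are assumed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("l0-to-l1-curation").getOrCreate()

    # Read the raw (L0) table.
    l0 = spark.table("l0_transactions")  # assumed table name

    # Keep the latest record per business key so duplicates are not loaded into L1.
    w = Window.partitionBy("transaction_id").orderBy(F.col("ingest_ts").desc())
    l1 = (l0.withColumn("rn", F.row_number().over(w))
            .filter(F.col("rn") == 1)
            .drop("rn")
            .filter(F.col("amount").isNotNull()))  # basic curation rule

    # Write the curated (L1) table, partitioned for downstream query performance.
    (l1.write.format("delta")
       .mode("overwrite")
       .partitionBy("txn_date")
       .saveAsTable("l1_transactions"))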
AWS Data Engineer
Spriggs Bioanalytical consulting
October 2022 - April 2023
- Description: Spriggs Bioanalytical Consulting specializes in the bioanalytical aspects of biopharmaceutical products. It offers expertise in everything from identifying and selecting the right bioanalytical lab to managing the selected lab so that quality bioanalytical data is delivered on time.
Responsibilities:
* Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop. Implemented AJAX, JSON, and JavaScript to create interactive web screens. Built database models, APIs, and views in Python for interactive web-based solutions.
* Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics). Developed a front-end GUI as a stand-alone Python application. Worked with Spark Core, Spark ML, Spark Streaming, Spark SQL, and Databricks.
* Used PySpark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, and pair RDDs. Conducted performance tuning and optimization of the Snowflake data warehouse, improving query execution times and reducing operational costs.
* Designed and deployed a Kubernetes-based containerized infrastructure for data processing and analytics, leading to a 20% increase in data processing capacity.
* Created datasets from S3 using AWS Athena and built visual insights with AWS QuickSight (an illustrative Athena sketch follows this role's description). Monitored data quality and integrity through end-to-end testing, reverse-engineered, and documented existing programs and code.
* Developed remote integration with third-party platforms using RESTful web services.
* Used T-SQL for MS SQL Server and ANSI SQL extensively across disparate databases.
* Used Python-based GUI components for front-end functionality such as selection criteria.
* Responsible for building and testing applications. Handled database issues and connections with SQL and NoSQL databases such as MongoDB by installing and configuring Python packages (Teradata, MySQL, MySQL connector, PyMongo, and SQLAlchemy); a connection sketch follows this role's description.
* Used Azure Data Factory to ingest data from log files and custom business applications, processed it on Databricks per day-to-day requirements, and loaded it into Azure Data Lake.
* Imported real-time weblogs using Kafka as a messaging system, ingested them into Spark Streaming, performed data quality checks, and flagged records as bad or passable.
* Developed a fully automated continuous integration system using Git, Jenkins, and custom tools developed in Python.
* Performed unit testing, system integration testing, and regression testing. Worked successfully in a team environment using Agile Scrum methodologies.
* Performed modeling and analytics with the Power BI reporting tool on tables loaded into the Hive metastore.
* Deployed models as a Python package, as an API for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers.
* Wrote AWS Lambda functions in Spark with cross-functional dependencies that generated custom libraries for delivering the Lambda functions in the cloud. Performed raw data ingestion that triggered a Lambda function and placed refined data into ADLS.
Environment: AWS Cloud, Hadoop, Spark, Hive, Teradata, MLoad, Oozie, Spring Boot, JUnit, IntelliJ, Maven, GitHub, Docker, Kubernetes.
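Illustrative sketch (not project code): creating a queryable dataset over S3 data with AWS Athena from Python, as in the Athena/QuickSight bullet above; the region, Glue database, query, and S3 output location are placeholders.

    # Illustrative only: database, query, S3 output path, and region are assumptions.
    import time
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Start a query against data laid out in S3 and registered in the Glue catalog.
    resp = athena.start_query_execution(
        QueryString="SELECT vendor, COUNT(*) AS orders FROM raw_orders GROUP BY vendor",
        QueryExecutionContext={"Database": "analytics_db"},
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
    )
    query_id = resp["QueryExecutionId"]

    # Poll until the query finishes, then fetch the result rows.
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        for row in rows:
            print([col.get("VarCharValue") for col in row["Data"]])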
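Illustrative sketch (not project code): wiring up SQL and NoSQL connections from Python with SQLAlchemy and PyMongo, as in the database-connectivity bullet above; hosts, credentials, and table/collection names are placeholders.

    # Illustrative only: URIs, credentials, and table/collection names are placeholders.
    from sqlalchemy import create_engine, text
    from pymongo import MongoClient

    # Relational side: MySQL via SQLAlchemy (mysql-connector driver assumed installed).
    engine = create_engine("mysql+mysqlconnector://user:password@mysql-host:3306/labdata")
    with engine.connect() as conn:
        count = conn.execute(text("SELECT COUNT(*) FROM assays")).scalar()
        print(f"assays rows: {count}")

    # NoSQL side: MongoDB via PyMongo.
    mongo = MongoClient("mongodb://mongo-host:27017/")
    docs = mongo["labdata"]["events"].find({"status": "open"}).limit(5)
    for doc in docs:
        print(doc)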
Data Engineer
Liberty General Insurance
April 2019 - June 2022
- Description: Liberty General Insurance Limited is a private general insurance company. It has a vast insurance portfolio covering both personal and corporate insurance, and offers private car insurance, two-wheeler insurance, and health insurance.
Responsibilities:
* Involved in various phases of the Software Development Lifecycle (SDLC) of the application, including requirements gathering, design, development, deployment, and analysis.
* Worked on big data integration and analytics based on Hadoop, SOLR, PySpark, Kafka, Storm, and webMethods.
* Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster; applied the Spark DataFrame API to complete data manipulation within the Spark session.
* Designed Git branching and merging strategies matched to release frequency by implementing the Gitflow workflow on Bitbucket. Developed tools using Python, shell scripting, and XML to automate tasks.
* Used R and Python for exploratory data analysis to compare and assess the effectiveness of the data.
* Installed and automated applications using the configuration management tools Puppet and Chef.
* Developed complex ETL/ELT pipelines, mappings, and scheduled jobs, and created data flows and control flows that improved pipeline performance by 15%. Migrated code across environments using migration/deployment methods such as Git and DevOps pipelines.
* Applied data governance techniques to grant permissions only on view tables rather than main tables, reducing the risk of impact on the main tables by 50%.
* Used Azure Databricks REST API calls to return response payloads containing information about the request, such as a cluster's settings, job permissions, notebooks, and pipelines; these payloads are typically in JSON format (a minimal sketch follows this role's description).
* Created Delta tables and loaded them using DataFrames from files in an Azure Blob location (see the loading sketch after this role's description).
* Extensively used the SQL editor and notebooks to run queries on tables. Performed root-cause analysis for job failures and data issues raised by business users and lead managers.
* Created streaming Delta Live Tables to load data synchronously into target tables. Utilized the Kafka Streams API within Databricks for real-time data processing, transformation, and aggregation, contributing to 18% of project deliverables.
* Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them with Hive queries. Processed image data on the Hadoop distributed system using MapReduce and stored the results in HDFS.
* Implemented Apache Sqoop to efficiently transfer bulk data between Apache Hadoop and relational databases (Oracle) for product-level forecasting. Extracted data from Teradata into HDFS using Sqoop.
* Used AWS to create storage resources and define resource attributes, such as disk type and redundancy, at the service level. Used the Cloud Shell SDK in GCP to configure Dataproc, Cloud Storage, and BigQuery.
* Integrated Kubernetes with cloud-native services, such as AWS EKS and GCP GKE, to leverage additional scalability and managed services.
* Implemented automated data pipelines for data migration, ensuring a smooth and reliable transition to the cloud environment. Used Python's unittest library to test Python programs and other code.
* Utilized Elasticsearch and Kibana for indexing and visualizing real-time analytics results, enabling stakeholders to gain actionable insights quickly.
* Converted SAS code to Python for predictive models using Pandas, NumPy, and scikit-learn. Designed and developed a Java API (Commerce API) that connects to Cassandra through Java services. Used Power BI as a front-end BI tool to design and develop dashboards, workbooks, and complex aggregate calculations.
* Executed the validation process through SIMICS. Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming. Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
Environment: CDH, Pig, Hive, MapReduce, YARN, Oozie, Flume, Sqoop, Impala, Spark, Scala, SQL Server, Teradata, Fast Export, Oracle, Shell Scripting.
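Illustrative sketch (not project code): calling the Azure Databricks REST API to retrieve cluster settings and job metadata as JSON payloads, as in the REST API bullet above; the workspace URL, token, and cluster ID are placeholders.

    # Illustrative only: workspace URL, personal access token, and cluster/job IDs are placeholders.
    import requests

    WORKSPACE = "https://adb-1234567890123456.7.azuredatabricks.net"  # assumed workspace URL
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}     # assumed PAT

    # Fetch a cluster's settings (JSON response payload).
    cluster = requests.get(
        f"{WORKSPACE}/api/2.0/clusters/get",
        headers=HEADERS,
        params={"cluster_id": "0101-000000-abcd123"},
    ).json()
    print(cluster.get("spark_version"), cluster.get("node_type_id"))

    # List jobs in the workspace and print their names.
    jobs = requests.get(f"{WORKSPACE}/api/2.1/jobs/list", headers=HEADERS).json()
    for job in jobs.get("jobs", []):
        print(job["job_id"], job["settings"]["name"])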
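Illustrative sketch (not project code): loading files from an Azure Blob/ADLS location into a Delta table with DataFrames, as in the Delta table bullet above; the storage path and table name are assumptions, and storage credentials are presumed configured on the cluster.

    # Illustrative only: storage account, container, path, and table name are assumed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-load-from-blob").getOrCreate()

    # Read source files from an Azure Blob / ADLS location (abfss path is an assumption;
    # storage credentials are assumed to be configured on the cluster).
    src = (spark.read
           .format("parquet")
           .load("abfss://landing@examplestorage.dfs.core.windows.net/policies/2022/"))

    # Create or append to a Delta table used by downstream curated views.
    (src.write.format("delta")
        .mode("append")
        .saveAsTable("bronze.policies"))

    # Quick sanity check with SQL on the Delta table.
    spark.sql("SELECT COUNT(*) AS row_count FROM bronze.policies").show()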
Data Engineer
Carwale
June 2017 - March 2019
- Description: CarWale is one of India's leading sources of new-car pricing and other car-related information. It offers a complete consumer-focused service with content and tools that simplify car buying in India.
Responsibilities:
* Documented and provided training for the initial deployment, and supported product stabilization and debugging at the deployment stage. Worked on SQL and PL/SQL for backend data transactions and validations.
* Designed and implemented a fault-tolerant data processing framework leveraging Kubernetes and Docker, reducing downtime and increasing system reliability by 25%.
* Designed and built scalable data pipelines to ingest, translate, and analyze large data sets.
* Analyzed existing systems and proposed process and system improvements, including adoption of modern scheduling tools such as Airflow and migration of legacy systems into an enterprise data lake built on Azure Cloud.
* Using Azure cluster services and Azure Data Factory V2, ingested a large volume and variety of data from diverse source systems into Azure Data Lake Gen2.
* Worked on partitioning Kafka messages and setting up replication factors in the Kafka cluster (an illustrative topic-setup sketch follows this role's description).
* Used Python to write data into JSON files for testing Django websites and created scripts for data modelling, import, and export. Spearheaded the HBase setup and utilized Spark and Spark SQL to develop faster data pipelines, resulting in a 60% reduction in processing time and improved data accuracy.
* Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications.
* Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop. Implemented AJAX, JSON, and JavaScript to create interactive web screens.
* Created session beans and controller servlets to handle HTTP requests from Talend. Performed data visualization and designed dashboards with Tableau, generating complex reports including charts, summaries, and graphs to interpret findings for the team and stakeholders.
Environment: Hadoop, Spark, Scala, MapReduce, HDFS, Hive, Java, Teradata, TPump, SQL, Cloudera Manager, Pig, Sqoop, Oozie, ZooKeeper.
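Illustrative sketch (not project code): creating a Kafka topic with an explicit partition count and replication factor using the kafka-python admin client, as in the Kafka bullet above; the broker address, topic name, and counts are assumed.

    # Illustrative only: broker address, topic name, partition and replica counts are assumed.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="broker:9092", client_id="carwale-admin")

    # Create a topic with 6 partitions and a replication factor of 3.
    topic = NewTopic(name="listing-events", num_partitions=6, replication_factor=3)
    admin.create_topics(new_topics=[topic], validate_only=False)

    # Confirm the topic exists.
    print(admin.list_topics())
    admin.close()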