Sathish Kumar Rajendran
Development
Tamil Nadu, India
Skills
Data Engineering
About
Sathish Kumar Rajendran's skills align with Programmers (Information and Communication Technology). He also has skills associated with System Developers and Analysts (Information and Communication Technology). He has 15 years of work experience.
Work Experience
System Analyst
Hexaware Technologies
March 2021 - March 2023
- Project 1: BDF Acceleration and BDFA Core
  Team size: 12
  Client: IQVIA (healthcare)
  Technology: Cloudera, Hadoop, Hive, Spark, HDFS, YARN, Impala, IntelliJ, ESP Cybermation, WinSCP, PuTTY, Hue, Oozie, Airflow, GitLab, Jenkins, Jira, Confluence, Agile, DBeaver
  Technical decoding - Flow 1: A shell script that invokes Spark jars reads data from various sources (SFTP, SQL Server, Oracle, Netezza), covering entities such as Transaction, Product Reference, Period, Manufacturer Name, Category, Subcategory, Division/Location, Product Owner, Patient, Pharmacy, Hospital, Doctor List, Geo Location, Molecule, Country Code, Baseline Service, Expected Billing Date, Milestone Count and Net Revenue for each country's assets (SISO, CPI, LRx, MLC, NPA and CDs), and performs the incremental load (sketched below); a new country requires a historical load first. HQL then categorizes the panels (Hospital, Pharmacy, Ecommerce, Offtake) into levels such as completed, pending and processing for each daily period, and the output is stored as Parquet. Experienced with GDM (Global Data Mapping in Hadoop using HiveQL): based on the business requirement sheet from the Business Analyst or Data Scientist team, built Hive queries using advanced SQL functions to extract data from the base fact and consolidated dimension tables into the target GDM tables. Loaded data into Hive tables according to the panel arriving at the source location, built views on top of the normalized tables, and shared those views with the reporting team for visualization in Power BI.
  Technical decoding - Flow 2: Files arrive downstream and the Move-it team places them from the SFTP location into the S3 data lake at weekly intervals. Each S3 bucket is linked to an SQS notification that triggers a Lambda function to convert the CSV files to Parquet. A weekly PySpark job in Glue extracts the data from the S3 bucket into RDS tables, and Athena is used to validate the data and report to the visualization team. After all data is processed with PySpark DataFrames, Spark pushes the results into the NoSQL database HBase, Phoenix views are created on the HBase tables, and the data is visualized in Tableau.
- Project 2: BDF ABW Mobile Apps Enhancement (Mar 2021 to Jun 2021, 3 months)
  Team size: 4
  Technology: Hive, Impala, shell script, Hue, Power BI
  Description: Fetched the back-end data from Impala tables based on the filters shown in the customer app UI, then produced the final customer delivery file, pipe-delimited and including a header.
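The sketch below is a hedged illustration of the Flow 1 incremental load and panel categorization described for Project 1. It is a minimal Scala Spark sketch only: the JDBC URL, credentials, table names, column names and HDFS paths are assumptions for illustration, not the actual project values.

```scala
// Minimal sketch of the Flow 1 incremental load (illustrative only).
// JDBC details, table/column names and output paths are placeholders, not project values.
import org.apache.spark.sql.SparkSession

object BdfIncrementalLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bdf-incremental-load")
      .enableHiveSupport()
      .getOrCreate()

    // Watermark for the incremental slice, e.g. supplied by the driving shell script
    val lastLoadedDate = args(0) // "2023-01-31"

    // Incremental pull from one relational source (SQL Server shown as an example)
    val transactions = spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<host>:1433;databaseName=bdf") // placeholder
      .option("dbtable", s"(SELECT * FROM dbo.transactions WHERE load_date > '$lastLoadedDate') t")
      .option("user", sys.env("BDF_DB_USER"))
      .option("password", sys.env("BDF_DB_PASS"))
      .load()

    // Land the increment as Parquet in the base fact area
    transactions.write.mode("append")
      .partitionBy("load_date")
      .parquet("/data/bdf/base_fact/transactions")

    // HQL-style categorization of panels into processing levels, stored as Parquet
    // (panel and status_level are assumed column names)
    transactions.createOrReplaceTempView("txn_increment")
    spark.sql(
      """SELECT panel, status_level, COUNT(*) AS record_count
        |FROM txn_increment
        |WHERE panel IN ('Hospital', 'Pharmacy', 'Ecommerce', 'Offtake')
        |GROUP BY panel, status_level""".stripMargin)
      .write.mode("overwrite")
      .parquet("/data/bdf/gdm/panel_status_summary")

    spark.stop()
  }
}
```

In the real pipeline a shell wrapper would loop this pattern over each source and asset, and the historical load for a new country would run the same job without the watermark filter.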
Hadoop Developer, BA Resource at TCS
RA Infotech
December 2020 - February 2021
- Project: Code Migration
  Team size: 6
  Client: ALDI
  Technology: Cloudera, Hadoop, HDFS, YARN, Hive, Impala, Spark, Scala, Hue, Oozie, Spark UI, Bitbucket and Jenkins
  Description: Extracted historical and forecast weather information from the respective APIs for a given set of weather locations. The number of weather locations had to be adaptable in a flexible way in the future, the forecast locations had to match exactly those for which the historical information was extracted, a one-second delay was required between API calls, data from the two APIs was stored in separate Hive tables, and each new API call was appended to the existing Hive table on a daily basis.
  Roles and responsibilities: Data preparation for the Store Forecast project, migrating R code to Scala (see the sketch below). Read all weather API URLs from a configuration properties file, created an RDD from the list of APIs, opened HTTP connections and read data from the API list (RDD), converted the RDD of API data into a Spark DataFrame, and wrote the data into separate Hive tables for the history and forecast API data. Monitored Oozie workflow jobs in Hue production, performed failure analysis and reran failed jobs.
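A minimal Scala sketch of the ingestion flow described above, under stated assumptions: the properties file name, the property key, the JSON response layout and the target Hive table name are illustrative, not the project's real values.

```scala
// Minimal sketch of the weather API ingestion (R-to-Scala migration), illustrative only.
// File name, property key, response schema and table name are assumptions.
import java.io.FileInputStream
import java.util.Properties
import scala.io.Source
import org.apache.spark.sql.SparkSession

object WeatherIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("weather-ingest")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Read the list of weather API URLs from a configuration properties file
    val props = new Properties()
    props.load(new FileInputStream("weather-api.properties")) // hypothetical file name
    val apiUrls = props.getProperty("forecast.api.urls").split(",").map(_.trim).toSeq

    // Create an RDD from the list of APIs; a single partition keeps the calls
    // sequential so the required one-second delay between calls is honoured
    val responses = spark.sparkContext.parallelize(apiUrls, numSlices = 1).map { url =>
      Thread.sleep(1000)
      Source.fromURL(url).mkString // raw JSON payload for one location
    }

    // Convert the RDD of API responses into a Spark DataFrame
    val forecastDf = spark.read.json(responses.toDS())

    // Append the new daily extract to the existing Hive table;
    // a symmetric run handles the historical API into its own table
    forecastDf.write.mode("append").saveAsTable("weather.forecast_raw") // table name assumed

    spark.stop()
  }
}
```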
Data Engineer
Rashi Peripherals Pvt Ltd
May 2018 - July 2020
- Team size: 5
  Technology: Spark, Scala, Hive and Sqoop
  Roles and responsibilities: Created a data lake by using Sqoop to extract vendor, dealer and customer data from various sources into HDFS, including data from the SAP production server, SQL DB and other RDBMS systems. Developed an ingestion module to ingest data into HDFS from heterogeneous data sources. Built a distributed in-memory application using Spark and Spark SQL to run analytics efficiently on large datasets (a sketch follows this entry). Proficient in Scala, and in Hadoop best practices across YARN, HDFS, Hive and MapReduce.
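A minimal sketch of the kind of Spark SQL analytics described above, run over the Sqoop-imported Hive tables. The database, table and column names (sales.dealer_transactions, dealer_id, txn_date, amount) are assumptions for illustration.

```scala
// Minimal sketch of an in-memory Spark SQL aggregation over Sqoop-imported data (illustrative only).
// Database, table and column names are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DealerAnalytics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dealer-analytics")
      .enableHiveSupport()
      .getOrCreate()

    // Read the transaction data that Sqoop landed into Hive
    val txns = spark.table("sales.dealer_transactions")

    // Distributed in-memory aggregation: monthly revenue per dealer
    val monthlyRevenue = txns
      .groupBy(col("dealer_id"), date_format(col("txn_date"), "yyyy-MM").as("month"))
      .agg(sum("amount").as("revenue"))

    // Persist the result back to Hive for downstream reporting
    monthlyRevenue.write.mode("overwrite").saveAsTable("analytics.dealer_monthly_revenue")

    spark.stop()
  }
}
```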
Support Engineer
Rashi Peripherals Pvt Ltd
August 2008 - April 2018
- Team size: 8
  Roles and responsibilities: Managed the overall administration of Windows, Linux and Mac systems and application software. Provided server support for our clients TATA Communications and Zoho Corp at the Chennai location. Handled service management and stock and inventory management for the RMA process. Configured and installed SAP and a web portal for stock management and CRM. Maintained the SAP server that manages product and dealer transaction data. Monitored and maintained system security, upgrades and patch installation. Installed the latest versions of operating systems on demand, as per client requirements. Responsible for helping clients with technical issues with our products.
Senior Bigdata Engineer (on-site)
Accord Innovations Sdn Bhd
April 2023 - Present
- Cyberjaya, Selangor, Malaysia
  Client: Great Eastern Life (Insurance)
  Roles and responsibilities: Building a big data framework that collects data from various sources (SFTP, Oracle, SQL and DB2) and stores it in the BDP data lake, then running the initial LGY/RAW data load from the Move-it path into Hive tables. Supporting three entities (SG, MY and ID) for batch and real-time (CDC) data processing on a daily basis. When a new requirement arrives from the users, liaising with the source system and obtaining table access through the new-table onboarding form, then building the source and target ingestion configs to extract data from the source to the BDP HDFS location (a sketch follows this entry); files are stored in ORC format. Providing admin support for the Informatica data governance tool, onboarding new users to the EDC, DEQ and AXON web UIs, and giving application support to restart the Informatica services.
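A minimal sketch of one config-driven ingestion step from the framework described above: a single source table is pulled and landed as ORC under a BDP HDFS path. The property keys, JDBC details and paths are assumptions, not the framework's actual config layout.

```scala
// Minimal sketch of one config-driven source-to-BDP ingestion step (illustrative only).
// Property keys, JDBC details and HDFS paths are assumptions.
import java.io.FileInputStream
import java.util.Properties
import org.apache.spark.sql.SparkSession

object BdpTableIngest {
  def main(args: Array[String]): Unit = {
    // Source/target ingestion config prepared from the new-table onboarding form (assumed layout)
    val conf = new Properties()
    conf.load(new FileInputStream(args(0))) // e.g. configs/my_fpms_contract_master.properties

    val spark = SparkSession.builder()
      .appName(s"bdp-ingest-${conf.getProperty("target.table")}")
      .enableHiveSupport()
      .getOrCreate()

    // Pull the source table over JDBC (Oracle / SQL Server / DB2, depending on the config)
    val sourceDf = spark.read.format("jdbc")
      .option("url", conf.getProperty("source.jdbc.url"))
      .option("dbtable", conf.getProperty("source.table"))
      .option("user", conf.getProperty("source.user"))
      .option("password", sys.env("BDP_SOURCE_PASS"))
      .load()

    // Land the data in the BDP SI layer as ORC, the storage format noted above
    sourceDf.write.mode("append")
      .format("orc")
      .save(conf.getProperty("target.hdfs.path")) // e.g. /bdp/si/my/fpms/contract_master

    spark.stop()
  }
}
```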
Projects (on-site Malaysia)
Group Data (Data Driven Business)
April 2023 - Present
- Team size: 5
  Client: Great Eastern Life (Insurance)
  Technology: Hortonworks, Sqoop, Hadoop, Hive, Spark, HDFS, YARN, Ambari, Qlik Replicate, Tectia, CyberArk, AutoSys, SNOW, Agile, Informatica (Admin)
  Technical decoding - Batch data processing: An ingestion framework script that includes Sqoop and Spark jars reads data from various sources (SFTP, Oracle, DB2/AS400), covering entities such as Transaction, Contract Master, Contract Product, Product Reference, Policy Endorsement, Policy Period, Fund Cash, Fund Trans, Subcategory, Division/Location, Product Owner, Customer, Policy, Hospital, Doctor List, Geo Location, Master Log, Product Log, Insured List, Pay Due, Policy Type, Net Revenue and so on, for each entity's (MY, SG and ID) source systems (FPMS, FPMSTK, G400, G400TK, MBS400, MBS400TK, CIF, GPA, LIF, PAS). It performs the incremental load via Sqoop and extracts the data to the BDP SI layer on a daily, weekly and monthly basis; a new source system requires a full load first. Provided L2 production support and daily job monitoring.
  Real-time data capture (CDC): Change data capture using the Qlik Replicate tool to capture incremental data from high-volume source tables and store it in an HDFS path as CSV files for the next day's batch ingestion (see the sketch below). Also provided application support to stop and start services during server downtime, patch updates, server upgrades and so on, monitored Qlik tasks and created new tasks as required.
  Informatica (Admin): Provided admin support for the Informatica Data Governance tool (DGT), onboarding new users and granting or revoking user access to the Axon, EDC and DEQ services. Also provided production support and monitored the DEQ jobs, re-triggering or recovering failed workflows to complete the monthly runs.
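The sketch below illustrates the hand-off from the Qlik Replicate CDC drop to the next-day batch run: CSV change records landed in HDFS are read and inserted into the corresponding SI-layer Hive table. The drop path convention, CSV layout and table name are assumptions for illustration.

```scala
// Minimal sketch of folding the Qlik Replicate CDC drop into the batch layer (illustrative only).
// The HDFS drop path, CSV layout and Hive table name are assumptions.
import org.apache.spark.sql.SparkSession

object CdcBatchFold {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cdc-batch-fold")
      .enableHiveSupport()
      .getOrCreate()

    // Previous day's CDC files written by Qlik Replicate as CSV (assumed path convention)
    val businessDate = args(0) // e.g. "2024-05-31"
    val cdcPath = s"/bdp/cdc/my/fpms/policy/$businessDate/*.csv"

    val changes = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(cdcPath)

    // Insert the captured changes into the existing SI-layer Hive table (stored as ORC);
    // downstream daily batch jobs reconcile inserts, updates and deletes
    changes.write.mode("append").insertInto("bdp_si.policy_cdc") // table assumed to exist

    spark.stop()
  }
}
```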