
Vigneswari Mani

Development
Florida, United States

Skills

Data Engineering

About

Vigneswari Mani's skills align with System Developers and Analysts (Information and Communication Technology). She also has skills associated with Database Specialists (Information and Communication Technology). Vigneswari Mani has 8 years of work experience.

Work Experience

CITIBANK N.A
October 2019 - Present
  • USA CDW DATA MIGRATION AND ANALYSIS (Team Size: 3, October 2019 - Present)

    Project Description:
    The Compliance Data Warehouse (CDW) sources data from multiple product processors in LATAM, applies data transformations to generate DIS files, and sends them to the Mantas platform for AML monitoring. CDW does not always receive data directly from product processors; in some countries, local data warehouses sit between the product processor and CDW, increasing the number of data hops and adding complexity to the data flow. This project migrates the current functionality from CDW, an Oracle-based data mart, to EAP, Citi's strategic big data platform. As part of the migration, EAP sources data directly from the product processors, simplifying the data flow and aligning with the AML technology strategy of moving away from Oracle. Sourcing directly from the product processors eliminates the dependency on local data warehouses and reduces the number of incoming feeds by reusing data already available in EAP; for instance, EAP already sources data from PMC (FX rates) and AMC (reference data), so the existing feeds from PMC and AMC to CDW can be stopped after a successful cutover to EAP.

    Role & Contribution:
    * Implemented an ETL framework using Spark with Python and loaded standardized data into Hive tables (an illustrative sketch follows this entry).
    * Designed and built a Data Quality framework covering aspects such as completeness and accuracy, using Python and Spark.
    * Implemented and optimized Spark jobs to improve performance and reduce processing time.
    * Utilized Spark SQL and DataFrame APIs for efficient data manipulation and transformation.
    * Implemented error handling and logging mechanisms to enhance the reliability and maintainability of data pipelines.
    * Troubleshot and resolved Spark job failures, optimizing queries for better performance.
    * Developed Hive queries for data analysis and Sqoop queries to import data into Hadoop.
    * Read and wrote data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
    * Collaborated with data scientists and analysts to design data processing pipelines that met business objectives.
    * Created data fixes for frequently raised production issues and developed scripts with deployment steps to execute the EAP (Enterprise Architectural Platform) workflow in UAT/PROD.
    * Analyzed root causes of recurring issues at the code or data level and delivered code fixes.
    * Performed production issue analysis and resolution and developed automation scripts.
    * Analyzed new features that could be implemented in EAP and evaluated the existing system to determine effectiveness and suggest changes to meet organizational requirements.
    * Coordinated with third-party vendors such as Cloudera, Oracle, and Starburst, and with DBA teams, to understand the root cause of issues and obtain tool fixes or performance-metrics commands.
    * Implemented optimization strategies to enhance query performance and overall system efficiency.
    * Performed system-performance sanity checks whenever a new database or cluster was created or patches were installed on an existing cluster.
    * Wrote PySpark scripts for integration with frameworks such as ETL (Transformation), DQ (Data Quality), RC (Reconciliation), and DP (Data Profiling) on big data technologies.
    * Developed a redaction script to mask PII data when copying it to lower environments (SIT).
    * Created the entire data workflow setup in the UAT and PROD environments.
    * Developed Autosys scripts to run the end-to-end workflow on schedule in UAT/PROD environments.
    * Analyzed and set the number of executors, executor memory, and driver memory required for the application to run smoothly in production, based on available cluster resources.
    * Created pre-install and post-install scripts, maintained the code in Bitbucket, and deployed it from SIT to UAT and PROD through RLM.
    * Performed performance tuning of highly complex Hive queries driven by business logic changes.
    * Proposed Hive and Spark configuration parameters to tune long-running, complex queries.
    * Conducted track-wise calls with the team and vendors (onshore/offshore), discussed regular activities, and provided guidance to the team on showstoppers.

    Technology & Tools: Hive, Spark, PySpark, Autosys, Sqoop, shell scripting, Cloudera distribution
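    The ETL and Data Quality work above can be pictured with a minimal PySpark sketch. This is not the actual CDW/EAP framework code; the HDFS path, table and column names, and the 99% completeness threshold are hypothetical stand-ins, used only to show the shape of a standardize, check, and load step into Hive.

        # Minimal sketch: standardize a raw feed, run a simple completeness check,
        # and load the result into a Hive table. All names here are hypothetical.
        from pyspark.sql import SparkSession, functions as F

        spark = (SparkSession.builder
                 .appName("cdw_migration_sketch")
                 .enableHiveSupport()
                 .getOrCreate())

        # Read a raw product-processor feed from HDFS (hypothetical path and schema).
        raw = spark.read.parquet("/data/raw/pp_transactions")

        # Standardize types and drop records missing the key field.
        std = (raw
               .withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
               .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
               .filter(F.col("account_id").isNotNull()))

        # Completeness check: stop the load if too many rows lost their key field.
        total, kept = raw.count(), std.count()
        if total > 0 and kept / total < 0.99:   # threshold is illustrative only
            raise ValueError(f"Completeness check failed: kept {kept} of {total} rows")

        # Load the standardized data into a partitioned Hive table.
        (std.write
            .mode("overwrite")
            .partitionBy("txn_date")
            .saveAsTable("eap_l1.transactions_std"))   # hypothetical target table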

CITIBANK N.A
March 2016 - May 2019
  • Project 2: Transaction Monitoring Application for Citi Credit Cards (March 2016 - May 2019)

    Project Description:
    This project built a custom transaction monitoring application for Citibank credit cards as an alternative to established products such as Mantas and Actimize. Traditional AML tools cannot handle the huge volumes of data involved, so a new internal application was built on big data technologies. With this application, more than a petabyte of data can be handled, with a look-back period extending to when the customer was on-boarded, and custom scenarios can be written that the traditional tools cannot support. Data from different regions and sources is stored in a data lake in the EAP environment, which is built on the Hadoop platform. The data in EAP is standardized per the requirements so that lookups and codes are uniform across EAP. Several layers are defined where the data resides for specific needs. Data quality is ensured by a custom DQ application that runs on each layer; data is moved to the next layer only if its quality is good. Segmentation and transaction monitoring are the core of the application: with 12 months of data, customers are segmented based on transaction and payment data, and thresholds are defined for each segment. Transaction monitoring runs every month, with scenarios based on the thresholds defined for each segment. Based on the various scenarios, AML transactions are caught and alerted to the Citi CMT system. With this application there are no false positives, and all alerted transactions are genuine. Separate layers for BI enable drill-down into alerted transactions, DQ data, and profiling information.

    Role & Contribution:
    * Developed HQL scripts for table creation in each layer and created Sqoop scripts to import data from source systems into the L0 layer and then the L1 layer of EAP using ingestion frameworks.
    * Interpreted the transformation logic provided in the mapping sheets and wrote complex HQL code to move data from source to the subsequent layers.
    * Wrote scripts for integration with frameworks such as ETL, DQ, RC, and DP, and for validation in higher environments.
    * Developed Oozie workflows and Autosys jobs to run the end-to-end process on schedule.
    * Created pre-install and post-install scripts, maintained the code in the RTC repository, and deployed it from DEV to SIT/UAT through RLM.
    * Coordinated with third parties such as Cloudera and DBA teams to understand issue root causes.
    * Updated and extended existing shell scripts for the DQ/RC/DP modules per the requirements in different layers.
    * Tested the scripts through the framework with different test cases.
    * Fixed errors, validated, and re-checked the scripts to ensure the desired results were produced and the required data quality was achieved.
    * Developed, analyzed, and reviewed the workflow against the designed logic.
    * Performed performance tuning of highly complex Hive queries.
    * Developed Unix shell scripts containing spark-submit and related commands, including email notifications, invoked by Autosys.
    * Analyzed and set the number of executors, executor memory, and driver memory required for the application to run smoothly in production, based on available cluster resources (an illustrative configuration sketch follows this entry).
    * Proposed solutions to automate the DQ runs through shell scripting.
    * Proposed Hive and Spark configuration parameters to tune long-running, complex queries.
    * Held regular calls with the business key leads and the BA team to improve the data model.

    Technology & Tools: Hive, Spark, Autosys, Sqoop, shell scripting, Cloudera distribution
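    The executor sizing and Hive/Spark tuning described above depend on the actual cluster resources; the general shape can be sketched in PySpark as follows. The resource figures, shuffle-partition count, and table names are hypothetical placeholders, not the values used in production.

        # Sketch of executor/driver sizing plus an HQL transformation run from Spark.
        # The numbers below are placeholders; real values would be derived from the
        # cluster's available cores and memory.
        from pyspark.sql import SparkSession

        spark = (SparkSession.builder
                 .appName("eap_l1_to_l2_sketch")
                 .config("spark.executor.instances", "10")
                 .config("spark.executor.memory", "8g")
                 .config("spark.executor.cores", "4")
                 .config("spark.driver.memory", "4g")
                 .config("spark.sql.shuffle.partitions", "200")
                 .enableHiveSupport()
                 .getOrCreate())

        # A transformation expressed as HQL: aggregate standardized L1 transactions
        # into a monthly summary for downstream monitoring (hypothetical schema).
        spark.sql("""
            INSERT OVERWRITE TABLE eap_l2.customer_monthly_summary
            SELECT customer_id,
                   trunc(txn_date, 'MM') AS txn_month,
                   SUM(amount)           AS total_amount,
                   COUNT(*)              AS txn_count
            FROM eap_l1.transactions_std
            GROUP BY customer_id, trunc(txn_date, 'MM')
        """)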

Education

PONDICHERRY ENGINEERING COLLEGE

Bachelor of Technology