Heredion Nzabanita

Development
Texas, United States

Skills

Data Science

About

Heredion Nzabanita's skills align with IT R&D Professionals (Information and Communication Technology). Heredion also has skills associated with Database Specialists (Information and Communication Technology). Heredion Nzabanita has 7 years of work experience.

Work Experience

Data Scientist / ML Engineer (Contract, Remote)

Alpha Recon
May 2023 - January 2024
  • Developed a scraper to extract data from GDELT datasets, and created Azure Functions and data pipelines to facilitate future analysis.
  • Designed and built real-time data pipelines that shorten the time from idea to insight and strengthen data engineering best practices, processes, and standards.
  • Used a Bayesian algorithm to detect anomalies by modeling the likelihood of normal versus abnormal data.
  • Worked with data engineers, data architects, fellow data scientists, and other internal stakeholders to understand product requirements, then designed, built, and monitored data platforms and pipelines that meet the company's requirements.
  • Ran complex SQL queries on GCP (Google BigQuery) to retrieve data from various sources.
  • Developed and implemented machine learning models and AI algorithms to detect anomalies, classify security incidents, and improve threat detection capabilities.
  • Deployed large language models such as Falcon-40B in an Azure Machine Learning workspace over GDELT datasets.
  • Utilized predictive analytics and modeling to forecast potential security risks.
  • Evaluated the performance of machine learning models and optimized them for better accuracy and efficiency.

Machine Learning Engineer

ASAAK
May 2022 - May 2023
  • Developed and deployed algorithms to solve business problems, collaborating with software engineers for seamless integration into production.
  • Implemented machine learning techniques to address product and data science challenges.
  • Created a Safety Score driving model that uses machine learning to analyze location data and identify driving behaviors, resulting in improved driving behavior insights.
  • Trained machine learning models using fine-tuning and weight calibration on a large real-time dataset.
  • Utilized ARIMA models to forecast demand and describe price fluctuations.
  • Ensured data accuracy and quality through consistent validation and cleaning using BigQuery.
  • Delivered well-documented datasets, tools, and reports to key stakeholders for informed decision-making.
  • Collaborated with cross-functional teams to develop and deploy scalable models.
  • Implemented a Majority Vote Classifier that finds the most common label in the data, predicts that label for every point in the dataset, and calculates the error rate of the classifier's predictions.
  • Utilized data mining and statistical analysis to discover insights from large datasets.

Automatic Speech Recognition
January 2022 - December 2022
  • Developed sequence models, such as GRU/LSTM-based models, using PyTorch to transcribe speech with high precision.
  • Utilized CTC loss and optimized the model using greedy and beam decoders for improved performance.

MPI Estimation Using Nightlight Data
2022
  • Analyzed geospatial data using ArcGIS, Python, and statistical techniques.
  • Processed and analyzed nightlight data, calculating statistics and using ridge regression and elastic nets to determine feature significance.

Developing and Investigating the Standalone Credit Scoring Model
2022
  • Built a standalone credit scoring model for risk assessment.
  • Evaluated model performance using a confusion matrix, precision score, and recall score.

Analyzing the Daily Returns of the Stock Market Index for the Dow Jones
2021
  • Performed Principal Component Analysis (PCA) on daily returns of 30 stocks in the Dow Jones Index.
  • Identified unusual stocks based on their distance from the average using correlation matrices, and built a dendrogram to cluster stocks by industrial sector.

Analyzing the Titanic Dataset to Estimate Survival Rate
2021
  • Built a random forest model and performed ROC analysis to predict survival on the Titanic.

NumPy-based Deep Learning Library (myTorch)
2021-2023
  • Developed a custom deep learning library from scratch, implementing various neural network architectures and optimizers.

Implementation of Classical Machine Learning Algorithms from Scratch
2021-2023
  • Implemented various classical machine learning algorithms from scratch, including Kalman Filter, Reinforcement Learning, Ford-Fulkerson algorithm, Constraint Satisfaction, Minimax, Alpha-Beta Pruning, Dijkstra's, Informed Search, Hidden Markov Models, Bayesian Networks, KNN, GMM, K-Means, Kernel Support Vector Machines, Logistic Regression, Decision Tree, Random Forest, Principal Component Analysis, and others.

Modeling the Risk Factors of Stunting among Children under Five in Developing Countries
2018-2021
  • Combined Demographic and Health Survey datasets with hospital datasets to obtain relevant data on potential risk factors of stunting among children under five.
  • Utilized data imputation, normalization, and scaling to clean and preprocess the data, handling missing values, outliers, and inconsistencies.
  • Performed feature engineering to create new features or transformations that may enhance the predictive power of the model.
  • Used logistic regression as an interpretable technique for modeling binary outcomes.
  • Validated the predictive performance of the model using independent datasets or through external validation with data from different regions.

Research Assistant

Carnegie Mellon University
November 2021 - April 2022
  • Analyzed image data for a 3D reconstruction project.
  • Implemented algorithms for feature point matching and camera pose estimation.
  • Applied machine learning algorithms to generate 3D point clouds from matched feature points.
  • Improved 3D model accuracy through depth estimation optimization.
  • Validated the quality and accuracy of reconstructed 3D models using quantitative and qualitative methods.
  • Collaborated with stakeholders to discuss project progress and prototype presentations.
  • Assessed model performance and viability and identified areas for further development.
  • Engaged in regular progress sharing and feedback-seeking with team members and researchers.

Data Scientist

East African Statistical Research and Consultancy Ltd
April 2018 - May 2021
  • Conducted queries using SQL and developed data analysis algorithms.
  • Translated business needs into technical data models, supporting data architecture.
  • Ensured efficient data flow and data quality for insightful analysis.
  • Developed and monitored data analytics models, including predictive models using machine learning techniques.
  • Managed data cleaning processes to maintain high data quality and reliability.
  • Collaborated with stakeholders to leverage data for business solutions and decision-making.
  • Implemented and maintained data management systems, databases, and data warehouses.
  • Optimized SQL queries to improve database performance.

Intern

National Institute of Statistics of Rwanda (NISR)
August 2016 - August 2017
  • Performed data cleaning, preprocessing, and analysis using Python.
  • Conducted data enhancement and visualization using Tableau, Excel, and Matplotlib.
  • Applied statistical methods, such as hypothesis testing and regression analysis, to gain insights from data.
  • Collaborated with the data science team to evaluate and refine models.
  • Employed binary logistic regression to model risk factors of teenage pregnancy using demographic and health survey datasets.

DATA SCIENCE / MACHINE LEARNING PROJECTS

Early Crop Disease Discovery using Leaf Symptom Images with Deep Convolutional Neural Networks
2022
  • Developed an advanced deep learning model for early crop disease detection using the PlantVillage dataset containing over 54,000 images.
  • Implemented an ensemble of deep convolutional neural network models, including ResNet50 and ConvNext50, to achieve a disease prediction accuracy of 99.2%.
  • Utilized various data enhancement and augmentation techniques to improve model performance and symptom recognition.

Data Analyst

Process Tech
June 2023 - Present
  • Analyze and validate large datasets to ensure the precision of information, maintaining data integrity and optimizing data processing workflows.
  • Implement automated workflows that lower manual and operational costs, make well-defined data available for inference in a timely manner, and move the company closer to democratizing data.
  • Produce high-quality, modular, reusable code that incorporates best practices.
  • Ensure processes and systems are running effectively.
  • Perform data process monitoring, adjustments, and qualification in a high-speed, high-throughput manufacturing environment.
  • Build productive and healthy relationships within the department and with other teams.
  • Play a key role in upholding data quality standards and supporting the continuous improvement of data processing capabilities for the change point management task set.
  • Track manufacturing processes and use data trends to identify critical failures that may point to bigger problems.

Education

CARNEGIE MELLON UNIVERSITY

Master of Science in Engineering of Artificial Intelligence
August 2021 - May 2023

RUHENGERI INSTITUTE OF APPLIED SCIENCES

Bachelor of Science in Applied Statistics
September 2013 - March 2018