Jingru Chen
Development
NY, United States
Skills
Machine Learning (ML)
About
Jingru Chen's skills align with IT R&D Professionals (Information and Communication Technology). Jingru also has skills associated with Database Specialists (Information and Communication Technology). Jingru Chen appears to be an entry-level candidate, with 23 months of experience.
Work Experience
Research Assistant
Columbia University
June 2023 - Present
- Built humor detectors in PyTorch by fine-tuning LLMs (Pythia, GPT-2, RoBERTa) on a human-written parallel corpus of funny texts and their unfunny counterparts; these models outperformed those trained on a simple corpus by 7%.
- Expanded the training set with a GPT-4-generated parallel corpus, improving the fine-tuned LLMs' performance by 10%.
- Explored how an LLM's intrinsic ability to detect humorous text, measured by perplexity, changes as model size grows.
Data Scientist Internship
Columbia University
June 2023 - August 2023
- Wrote reference solutions in Python for 30+ data challenges with scikit-learn and Transformers, incorporating feature engineering, ensemble methods, regularization, dimensionality reduction, and hyperparameter tuning.
- Engineered prompts for commercial LLMs (ChatGPT, Colossal-AI) to automatically generate comprehensive data challenges covering a wide range of topics (e.g. fraud detection, recommendation systems, sales prediction).
- Crawled and analyzed interview questions for data scientists and machine learning engineers across 10 industries.
Machine Learning Engineer Internship (Capstone)
L'Oreal
January 2023 - May 2023
- Fine-tuned T5 models with PyTorch on GCP, using the WikiSQL and Spider datasets, to translate spoken questions into executable SQL queries, reaching an execution accuracy of 70% and an exact-match score of 50%.
- Benchmarked the text-to-SQL task with OpenAI Codex (code-davinci-002), achieving an execution accuracy of 80% and an exact-match score of 70%.
- Performed exploratory analysis of the corpora, including correlation analysis, frequency analysis, and distribution visualization of query commands and question words, using NumPy and Matplotlib.
Backend Development Internship
Midea Group
August 2021 - November 2021
- Built, maintained, and monitored a website in Python and Flask that lets more than 100,000 users apply for access to large enterprise databases and get customized views of their data.
- Designed and built relational databases in MySQL, implemented read and write operations with SQLAlchemy, and improved access speed by caching frequently requested and previously retrieved data in Redis.
- Used Celery to handle asynchronously both long-running retrieval tasks (fetching users' permissions, querying databases) and tasks with external dependencies (submitting users' applications, sending error messages by email).
Research Assistant
Zhongnan University of Economics and Law
January 2021 - June 2021
- Identified rumors by training XGBoost, SVM, LSTM, BERT, and ERNIE models with PyTorch, reaching 91% accuracy.
- Deployed a website named Weibo Rumor Detection, built with Python Flask, HTML, and CSS, which received over 500 daily hits.
- Scraped official Chinese rumor texts and factual texts from the Weibo Rumor Clarification website using a Python crawler.