Fred Sun

Education

University College London (UCL) — London, UK
B.Sc. in Mathematics | Sept 2022 – Sept 2025

Research Experience

Centre for Brain Inspired Computing Research (CBICR), Tsinghua University — Beijing, China
Research Assistant | Supervisor: Shangqi Guo | Dec 2025 – Present

Search Agent Research Based on Dynamic Causal Knowledge Graph Construction for Scientific Reasoning

Background: Existing search agents (Tongyi DeepResearch, WebDancer, etc.) excel at general Q&A tasks but lack systematic capabilities for scientific causal reasoning; existing knowledge graph methods rely on static pre-construction (Wikidata) or data-driven construction (CausalKG), failing to meet the dynamic nature and evidence evaluation needs of scientific literature
Research Content: Proposed dynamic causal knowledge graph construction framework, enabling real-time extraction of causal relationships from retrieved literature to build query-specific temporary knowledge graphs; designed multi-stage information extraction pipeline combining few-shot prompting and rule validation to extract structured causal information from scientific papers, targeting 80%+ extraction accuracy
Integrated GRADE evidence quality assessment framework, implementing automated rule-based evaluation and LLM-assisted subjective judgment; implemented structured reasoning mechanism based on knowledge graphs, replacing LLM black-box reasoning with explicit reasoning rules to support evidence chain tracing; constructed CausalReasoningQA evaluation benchmark with multi-dimensional assessment framework

Work Experience

ValiantSec Information Technology Co., Ltd. — Changsha, China
LLM Development Engineer | Jul 2025 – Oct 2025

ValiantSec DTCoder Large Language Model

Led DTCoder dataset construction and fine-tuning optimisation; utilised SWE-Bench to construct bug instances based on real GitHub repositories, leveraging powerful models for PEFT and DPO data synthesis; designed data cleaning pipeline using generator-discriminator cross-model detection and self-training methods
Results: Through LoRA fine-tuning and DPO preference optimisation, DTCoder based on Qwen3-8B and 32B demonstrated excellent performance in vertical domain tasks, achieving over 20% improvement in resolved rate on SWE-Bench and custom test sets

ValiantSec Checklist Code Review Agent

Led Checklist Agent architecture design; proposed 3-stage agent workflow decomposing tasks into "extract issues - check issues - resolve issues" ReAct sub-task flow; developed 20+ type-specific static code review methods integrated into agent toolbox
Results: Based on new workflow, Checklist tool achieved over 30% recall improvement; introduction of static tools and discriminators significantly reduced model hallucination, repetition, and infinite loop issues

Project Experience

Pure RL Training Research Based on DeepSeek-R1-Zero and DeepSWE | Sept 2025 – Dec 2025

Business Challenge: Clients unable to deploy large-parameter models; small models with only PEFT insufficient for actual business scenarios; pure RLHF methods face extremely sparse rewards in early stages, causing small models to easily collapse
Work: Experimentally validated that pure reinforcement learning on base models can achieve good performance without domain-specific PEFT; using Qwen2.5-3B as base model, implemented PPO training with rule-based reward function on Countdown task via veRL framework; model spontaneously generated chain-of-thought, improving accuracy from 7% to 52%
Extended task to interactive code repair scenario, learning a policy from scratch through environment feedback; improved advantage clipping methods in PPO and GRPO, removing KL Loss and Clip_high() components to encourage larger exploration while avoiding early collapse

Technical Skills

Proficient in Python; solid understanding of LLM-related technologies and underlying mathematical principles; familiar with Transformer architecture design, self-attention mechanism, RoPE positional encoding and optimisation schemes
Familiar with mathematical derivation and code implementation of mainstream RL and preference-optimisation algorithms including PPO/DPO/GRPO; experienced in full development pipeline from pretraining and SFT to RLHF
Experienced in data collection and augmentation for real-world engineering; understanding of agent fundamentals and engineering frameworks; near-native English proficiency for rapid reading and learning of technical documentation and papers

