My CV
Fred Sun
Education
University College London (UCL) — London, UK
B.Sc. in Mathematics | Sept 2022 – Sept 2025
Research Experience
Centre for Brain Inspired Computing Research (CBICR), Tsinghua University — Beijing, China
Research Assistant | Supervisor: Shangqi Guo | Dec 2025 – Present
Search Agent Research Based on Dynamic Causal Knowledge Graph Construction for Scientific Reasoning
- Background: Existing search agents (Tongyi DeepResearch, WebDancer, etc.) excel at general Q&A tasks but lack systematic capabilities for scientific causal reasoning; existing knowledge graph methods rely on either static pre-construction (Wikidata) or data-driven construction (CausalKG), and neither handles the dynamic nature of scientific literature or its need for evidence evaluation
- Research Content: Proposed a dynamic causal knowledge graph construction framework that extracts causal relationships from retrieved literature in real time to build query-specific temporary knowledge graphs; designed a multi-stage information extraction pipeline combining few-shot prompting and rule validation to extract structured causal information from scientific papers, targeting 80%+ extraction accuracy
- Integrated the GRADE evidence quality assessment framework, implementing automated rule-based evaluation and LLM-assisted subjective judgment; implemented a structured reasoning mechanism based on knowledge graphs, replacing LLM black-box reasoning with explicit reasoning rules to support evidence chain tracing; constructed the CausalReasoningQA evaluation benchmark with a multi-dimensional assessment framework
Work Experience
ValiantSec Information Technology Co., Ltd. — Changsha, China
LLM Development Engineer | Jul 2025 – Oct 2025
ValiantSec DTCoder Large Language Model
- Led DTCoder dataset construction and fine-tuning optimisation; used SWE-Bench to construct bug instances from real GitHub repositories and used stronger models to synthesise PEFT and DPO training data; designed a data cleaning pipeline using generator-discriminator cross-model detection and self-training methods
- Results: After LoRA fine-tuning and DPO preference optimisation, DTCoder (based on Qwen3-8B and Qwen3-32B) performed strongly on vertical domain tasks, achieving over 20% improvement in resolved rate on SWE-Bench and custom test sets
ValiantSec Checklist Code Review Agent
- Led Checklist Agent architecture design; proposed a three-stage agent workflow that decomposes each task into a ReAct sub-task flow of "extract issues - check issues - resolve issues"; developed 20+ type-specific static code review methods and integrated them into the agent toolbox
- Results: With the new workflow, the Checklist tool achieved over 30% recall improvement; the introduction of static tools and discriminators significantly reduced model hallucination, repetition, and infinite loop issues
Project Experience
Pure RL Training Research Based on DeepSeek-R1-Zero and DeepSWE | Sept 2025 – Dec 2025
- Business Challenge: Clients were unable to deploy large-parameter models; small models tuned with PEFT alone were insufficient for actual business scenarios; pure RLHF methods face extremely sparse rewards in the early stages, which makes small models prone to collapse
- Work: Experimentally validated that pure reinforcement learning on a base model, without domain-specific PEFT, can achieve good performance; using Qwen2.5-3B as the base model, implemented PPO training with a rule-based reward function on the Countdown task via the veRL framework; the model spontaneously generated chain-of-thought reasoning, and accuracy improved from 7% to 52%
- Extended the task to an interactive code repair scenario, learning a policy from scratch through environment feedback; modified the advantage clipping methods in PPO and GRPO, removing the KL loss and Clip_high() components to encourage broader exploration while avoiding early collapse
Technical Skills
- Proficient in Python; solid understanding of LLM-related technologies and their underlying mathematical principles; familiar with Transformer architecture design, the self-attention mechanism, RoPE positional encoding, and related optimisation schemes
- Familiar with the mathematical derivation and code implementation of mainstream RL and preference optimisation algorithms, including PPO, DPO, and GRPO; experienced in the full development pipeline from pretraining through SFT to RLHF
- Experienced in data collection and augmentation for real-world engineering; understanding of agent fundamentals and engineering frameworks; near-native English proficiency, enabling rapid reading of technical documentation and papers