Fred Sun


Education

University College London (UCL) — London, UK
B.Sc. in Mathematics | Sept 2022 – Sept 2025


Research Experience

Centre for Brain Inspired Computing Research (CBICR), Tsinghua University — Beijing, China
Research Assistant | Supervisor: Shangqi Guo | Dec 2025 – Present

Search Agent Research Based on Dynamic Causal Knowledge Graph Construction for Scientific Reasoning

  • Background: Existing search agents (Tongyi DeepResearch, WebDancer, etc.) excel at general Q&A tasks but lack systematic capabilities for scientific causal reasoning; existing knowledge graph methods rely on static pre-construction (Wikidata) or data-driven construction (CausalKG), so they cannot keep pace with evolving scientific literature or meet its evidence evaluation needs
  • Research Content: Proposed a dynamic causal knowledge graph construction framework that extracts causal relationships from retrieved literature in real time to build query-specific temporary knowledge graphs; designed a multi-stage information extraction pipeline combining few-shot prompting and rule validation to extract structured causal information from scientific papers, targeting 80%+ extraction accuracy
  • Integrated the GRADE evidence quality assessment framework, implementing automated rule-based evaluation alongside LLM-assisted subjective judgment; implemented a structured reasoning mechanism based on knowledge graphs, replacing LLM black-box reasoning with explicit reasoning rules to support evidence chain tracing; constructed the CausalReasoningQA evaluation benchmark with a multi-dimensional assessment framework

Work Experience

ValiantSec Information Technology Co., Ltd. — Changsha, China
LLM Development Engineer | Jul 2025 – Oct 2025

ValiantSec DTCoder Large Language Model

  • Led DTCoder dataset construction and fine-tuning optimisation; used SWE-Bench to construct bug instances from real GitHub repositories, leveraging powerful models to synthesise PEFT and DPO training data; designed a data cleaning pipeline using generator-discriminator cross-model detection and self-training methods
  • Results: Through LoRA fine-tuning and DPO preference optimisation, DTCoder (based on Qwen3-8B and Qwen3-32B) performed strongly on vertical domain tasks, achieving over 20% improvement in resolved rate on SWE-Bench and custom test sets

ValiantSec Checklist Code Review Agent

  • Led Checklist Agent architecture design; proposed a 3-stage agent workflow that decomposes each task into a ReAct sub-task flow of "extract issues, check issues, resolve issues"; developed 20+ type-specific static code review methods and integrated them into the agent toolbox
  • Results: With the new workflow, the Checklist tool achieved over 30% recall improvement; the introduction of static tools and discriminators significantly reduced model hallucination, repetition, and infinite loop issues

Project Experience

Pure RL Training Research Based on DeepSeek-R1-Zero and DeepSWE | Sept 2025 – Dec 2025

  • Business Challenge: Clients unable to deploy large-parameter models; small models with PEFT alone insufficient for actual business scenarios; pure RLHF methods face extremely sparse rewards in early stages, so small models collapse easily
  • Work: Experimentally validated that pure reinforcement learning on base models can achieve good performance without domain-specific PEFT; using Qwen2.5-3B as the base model, implemented PPO training with a rule-based reward function on the Countdown task via the veRL framework; the model spontaneously generated chain-of-thought reasoning, and accuracy improved from 7% to 52%
  • Extended the task to an interactive code repair scenario, learning a policy from scratch through environment feedback; improved advantage clipping methods in PPO and GRPO, removing the KL Loss and Clip_high() components to encourage broader exploration while avoiding early collapse

Technical Skills

  • Proficient in Python; solid understanding of LLM-related technologies and their underlying mathematical principles; familiar with Transformer architecture design, the self-attention mechanism, RoPE positional encoding, and optimisation schemes
  • Familiar with the mathematical derivation and code implementation of mainstream RL algorithms including PPO/DPO/GRPO; experienced in the full development pipeline from pretraining and SFT through RLHF
  • Experienced in data collection and augmentation for real-world engineering; good understanding of agent fundamentals and engineering frameworks; near-native English proficiency, enabling rapid reading of technical documentation and papers
