Publications
My research focuses on AI safety, frontier risks, interpretability, and cybersecurity. I have published papers on large language models, AI alignment, and knowledge graphs.
For a complete list of my publications, please visit my Google Scholar profile.
2025
[ArXiv 2025] Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
[ICLR 2026] PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities
[ACL 2025] The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models
[NeurIPS 2025] RepGuard: Adaptive Feature Decoupling for Robust Backdoor Defense in Large Language Models
[ArXiv 2025] Oyster-I: Beyond Refusal—Constructive Safety Alignment for Responsible Language Models
[Cybersecurity 2025] When LLMs Meet Cybersecurity: A Systematic Literature Review
2024
[ICLR 2025] REEF: Representation Encoding Fingerprints for Large Language Models
[ACL 2024] Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
[ArXiv 2024] The Better Angels of Machine Personality: How Personality Relates to LLM Safety
[ArXiv 2024] Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
[ArXiv 2024] From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality Through Four Modalities
2023
[IEEE TrustCom 2023] HackMentor: Fine-tuning Large Language Models for Cybersecurity
2020-2021
[ICCC 2020] Answer Extraction with Graph Attention Network for Knowledge Graph Question Answering
[IEEE Access 2021] Reasoning for Local Graph over Knowledge Graph with a Multi-policy Agent
[CISAI 2020] A General Framework for Chinese Domain Knowledge Graph Question Answering Based on TransE
For the most up-to-date publications and citation metrics, see my Google Scholar profile.