I’m a computer science graduate student at the University of Pennsylvania , working on Large Language Models (LLMs), Vision-Language Models (VLMs), and NLP applications in AI for Science.

At UPenn, I’m fortunate to be advised by Prof. Chris Callison-Burch, Prof. Lyle Ungar, and Delip Rao. I also collaborate with Dr. Xiaodong Yu (AMD GenAI) and Prof. Yunhuai Liu (Peking University).

My research centers on advancing (M)LLMs with Effective, Efficient, and Explainable methods. I care about building models that work better, run cheaper, and fail more predictably. Just as importantly, I want to understand why they work, when they break, and how we can steer them with confidence.

Today, a lot of progress comes from large-scale training and black-box iteration. It works, but it often hides the reasons behind progress and makes reliability harder to reason about. At the same time, scaling alone is starting to feel more incremental. That’s why I focus on two complementary directions:

Research Pipeline: Effective • Efficient • Explainable

From understanding → to reliable impact → to continual improvement

1. Mechanism-driven Understanding (Interpretability + optimization): I study what is happening inside LLMs and VLMs, and how those internal signals can be used to improve models. I look at attention patterns, residual streams, activations, representations, and logits. My goal is not interpretability as a visualization layer. My goal is interpretability that changes how we optimize and control models.
(More: Why I care about interpretability)
2. Model Adaptation (From base models to real experts): I work on adapting foundation models to specific domains and building systems with measurable impact. I'm interested in task-agnostic adaptation pipelines where we can inject real scientific novelty, including post-training (SFT, RL, distillation), efficiency methods (quantization, pruning, layer skipping, routing), and system-level tooling like retrieval and evaluation. I also like going deep into real domains, where novelty often comes from new tasks, synthetic data, and sometimes further architecture optimization.
(More: Why I care about model adaptation)

I am also the co-founder of Savable Koupon AI, where we build AI-driven price tracking, LLM-based product analysis, and recommendation systems for e-commerce. I serve as a reviewer for top conferences like ICLR, ICML, ACL, CVPR, AAAI.

Feel free to reach out for collaboration or just to say hi—shoot me an email: feijianghan@gmail.com (it’s — for me right now).

🔥 News

January 2026: 🎉 ICLR 2026 x 2
January 2026: 🎉 ICASSP 2026
November 2025: 🎉 AAAI 2026 x 2
July 2025: 🎉 COLM 2025
June 2025: 🎉 Paper published in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) 2025
June 2025: 🎉 MOSS@ICML2025

Why I’m excited about these problems

Why I care about interpretability

Interpretability, for me, begins with curiosity. I like watching a system and asking: why did that happen? It feels like being a kid observing insects. You stare long enough, and suddenly a pattern shows up. That moment of “wait, that’s weird” makes me happy.

More rationally, interpretability also serves a long-term goal: building AI systems that are truly reliable, possibly all the way to AGI or even ASI.

If scaling eventually leads to AGI, we may get extremely capable black-box systems. Then the key question becomes safety and alignment. How do we ensure a superintelligent model consistently acts in good faith, and does not quietly deceive people to do harm?
If scaling alone still fails to reach AGI, we will need deeper answers. Why do these models work at all? What factors truly drive their performance?

Good explanations help us trust models in practice. They also guide us to design better models based on principles, not just trial and error.

I often think about how physics matured. First came careful observations (Tycho Brahe). Then hypotheses (Kepler). Then principles (Newton). In AI, we have made huge empirical progress, many interpretability papers open a trained model and hunt for circuits. I love and respect that line of work, such as logit-lens analyses for LLMs/VLMs, sparse autoencoders (SAEs), and recent mechanistic interpretability work from Anthropic. But we still lack “Newton-style” first principles.

I want to ask questions that begin at the level of training processes and model architecture.

Why do compositional features and circuits appear at all? Why do we sometimes see sparsity, low-rank structure, or neatly separated factors after training? Can we connect those outcomes to the equations of gradient-based learning, instead of only collecting evidence after the fact?

For example, I like the ICLR 2025 oral paper Learning Dynamics of LLM Finetuning, which offers explanations for why SFT can lead to hallucinations and why DPO performance may degrade over time.

My hope is that interpretability can slowly move from biology-style observation to physics-style reasoning. If that shift happens, it will feel like a real change of era.

For the near term (the next 3-5 years), I believe a key priority is turning insights into actionable model improvements—making interpretability more than just criticized as useless analysis.

Actionable Mechanistic Interpretability Framework

Actionable Mechanistic Interpretability: From Localizing and Steering to Model Improvement
(Figure from the survey "Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models")

Why I care about model adaptation

I also spend a lot of energy thinking about adaptation. Partly because I do not believe “general” intelligence comes for free.

Scaling has worked, but the returns can slow down. It is unlikely that every new GPT-n will feel as shocking as the earlier jump from GPT-4. At the same time, we have already had LLMs in the real world for several years, but there are still many specialized tasks they cannot do well. Pretraining will never perfectly cover every niche, every workflow, or every kind of expertise.

So I care about a practical question. How do we turn a strong base model into a model that is genuinely useful for a specific need?

I think about this in two layers, as illustrated below:

Novelty in Model Adaptation: Task-Specific vs Task-Agnostic Approaches

Rich Sutton’s “The Bitter Lesson” keeps proving true in the LLM era: general methods that leverage compute—search and learning—consistently outperform methods that rely on human priors and hand-crafted knowledge. With that in mind, First, I want to improve the general “base-to-expert” pipeline (task-agnostic). That includes post-training methods like SFT, RL, and distillation; inference efficiency, such as quantization, pruning, layer skipping, and routing; and retrieval, evaluation, and benchmarks—because the workflow around a model often matters as much as the model itself.

Second, I want to take these tools into real domains and make them work end-to-end. This idea is not new—it was central in the BERT era and remains central now. Beyond popular areas like coding and document analysis, I think many domains that rely on careful human judgment could benefit from LLM-based specialists. Malware or virus detection is one example.

Some see this direction as “just engineering” (data + training). I am drawn to it because I believe that kind of engineering can carry real scientific novelty.

Sometimes the novelty is how you get data when data is scarce. Sometimes it is how you design synthetic data that teaches the right behavior. Sometimes it is how you change representations or architectures when the base model cannot capture a key dependency. Sometimes it is how a new industrial need becomes a new research question.

In the long run, I am optimistic about a system view of intelligence. If we can build many strong, efficient specialists, and let them collaborate as agents, we may reach broad capability in a way that is easier to maintain, easier to adapt, and easier to interpret than betting everything on a single monolithic model.

📝 Selected Publications

For a complete list of publications, please visit my Google Scholar

🔮 Research Interest 1: Uncovering NLP & LLM Internal Mechanism and Interpretability

ICLR 2026

ZeroTuning: Unlocking the Initial Token’s Power to Enhance Large Language Models Without Training

Feijiang Han, Xiaodong Yu, Jianheng Tang, Delip Rao, Weihua Du, Lyle Ungar

Paper | Code & Demo | Blog | Poster | ICLR Talks

TL;DR. Training-free attention tuning can boost frozen LLMs, but prior methods often depend on fragile heuristics to find “important” task tokens. ZeroTuning shows a simpler universal control lever: tune only the initial token (e.g., <BOS>). With tiny head-specific biases on BOS attention logits, we can reshape downstream attention (sharpen/flatten), lower output entropy, and unlock pretrained knowledge—without any parameter updates.

Key Points:

Lightweight + practical: ~4-line change, no KV-cache / decoding changes, kernel-agnostic (works with SDPA & FlashAttention), and works with quantized inference.
Strong, broad gains: across 15 datasets, e.g., on Llama-3.1-8B: +19.9% (classification), +4.5% (QA), +2.1% (dialogue); MT-Bench improves 7.804 → 7.966.
Why it works (mechanism insights): BOS acts as an attention sink, so tuning it gives monotonic control of attention entropy; effects are stronger in earlier layers and heterogeneous across heads (up-effective vs. down-effective). Includes supervised calibration and an unsupervised entropy-minimization variant.

📑 Click to see abstract

Token-level attention tuning, a class of training-free methods including Post-hoc Attention Steering (PASTA, AutoPASTA) and Attention Calibration (ACT), has emerged as a promising way to improve frozen LLMs with interpretable interventions. However, these methods depend on auxiliary heuristics to identify "important" task-specific tokens, which can introduce bias and limit applicability when token importance is unclear or when using optimized kernels where attention maps are inaccessible. We propose a simpler and more elegant alternative: acting only on the initial token (e.g., <BOS> in LLaMA). We show theoretically that adding lightweight biases to this token's attention logits monotonically controls the entropy of the downstream attention distribution--an effect amplified by its natural function as an attention sink. Our empirical analysis reveals that this tuning process can positively affect LLMs and better unlock their pretrained knowledge, with stronger effects in early layers and distinct scaling preferences across attention heads. Building on these insights, we introduce ZeroTuning: a training-free method that improves LLM performance by applying head-specific attention adjustments to the initial token, requiring zero parameter updates. We present two variants: a supervised mode that calibrates on validation examples, and a novel unsupervised mode that directly minimizes the model's output entropy. Our method requires no KV‑cache or decoding changes, and is kernel‑agnostic (works with SDPA and FlashAttention). The method is lightweight and requires only four lines of modification to the standard LlamaAttention code. It achieves broad gains across 15 datasets and outperforms previous, more complex methods; for instance, with Llama-3.1-8B, it yields relative improvements of 19.9% on classification, 4.5% on question answering, and 2.1% on dialogue. ZeroTuning also works out-of-the-box with quantized inference and maintains its performance improvements with increasing context lengths. Our code and runnable demo are available at https://anonymous.4open.science/r/ZeroTuning.

ICASSP 2026

Read Before You Think: Mitigating LLM Comprehension Failures with Step-by-Step Reading

Feijiang Han, Hengtao Cui, Licheng Guo, Zelong Wang, Zhiyuan Lyu

Paper | Blog

TL;DR. Many “reasoning” failures in LLMs are actually comprehension failures—the model misreads the question (semantic misunderstanding), so even Chain-of-Thought can’t reliably help. We introduce Step-by-Step Reading (SSR), a training-free framework that makes models read before they think: parse the question incrementally, keep each reasoning step grounded to the text, and fix backward dependencies via iterative re-contextualization.

Key Points:

Identified Semantic Misunderstanding as a core reasoning bottleneck that persists even with CoT, stemming from the inherent constraints of the unidirectional attention mechanism.
Explained the effectiveness of prompt repetition through the lens of Attention: it helps models suppress focus on low-semantic tokens (e.g., punctuation) and redistribute attention to critical information.
Proposed a training-free framework to resolve these issues by: (1) applying step-by-step reading logic, (2) automatically steering attention to key tokens via self-reference, and (3) resolving backward dependencies through iterative re-contextualization.

📑 Click to see abstract

Large Language Models (LLMs) often fail on complex reasoning tasks due to flawed question comprehension, not just flawed logic. This paper presents a systematic investigation into these comprehension failures. Our work yields three key insights: (1) the step-by-step principle, effective for calculation, can be migrated to the reading process to enhance comprehension; (2) increasing the proportion of question-related tokens (e.g., via repetition) succeeds by refocusing attention, a mechanism that can be explicitly controlled; and (3) backward dependencies represent a core bottleneck for decoder-only models that persists even with strong methods like Chain-of-Thought. Based on these findings, we introduce the Step-by-Step Reading (SSR) family of prompts. This multi-stage approach culminates in SSR++, a method specifically engineered to deepen model comprehension by guiding it to parse questions with finer granularity, focus attention on critical tokens, and resolve backward dependencies through iterative re-contextualization. SSR++ sets a new state-of-the-art on multiple reasoning benchmarks, and our analysis confirms it works by directly mitigating semantic misunderstanding. These results demonstrate that guiding how a model reads is a powerful and efficient method for improving its reasoning ability.

🔍 Research Interest 2: Model Adaptation

COLM 2025

Can LLMs handle WebShell detection? Overcoming Detection Challenges with Behavioral Function-Aware Framework

Feijiang Han, Jiaming Zhang, Chuyi Deng, Jianheng Tang, Yunhuai Liu

Paper | Blog | Poster

TL;DR. WebShell detection is hard for LLMs because a server-side script can span millions of tokens while the truly malicious logic is often just a tiny, obfuscated fragment—so naïvely feeding the whole file dilutes the signal and breaks context limits. We provide the first comprehensive evaluation of LLMs for WebShell detection and introduce BFAD, a behavior-driven, function-aware pipeline that helps LLMs focus on the most indicative code, yielding a +13.82% average F1 improvement and pushing both large and small LLMs toward (or beyond) prior SOTA.

Task: input = {server-side script / PHP file} → output = {WebShell / Benign}.

📑 Click to see abstract

WebShell attacks, where malicious scripts are injected into web servers, pose a significant cybersecurity threat. Traditional machine learning and deep learning methods are often hampered by challenges such as the need for extensive training data, catastrophic forgetting, and poor generalization. Recently, Large Language Models (LLMs) have emerged as a powerful alternative for code-related tasks, but their potential in WebShell detection remains underexplored. In this paper, we make two major contributions: (1) a comprehensive evaluation of seven LLMs, including GPT-4, LLaMA 3.1 70B, and Qwen 2.5 variants, benchmarked against traditional sequence- and graph-based methods using a dataset of 26.59K PHP scripts, and (2) the Behavioral Function-Aware Detection (BFAD) framework, designed to address the specific challenges of applying LLMs to this domain. Our framework integrates three components: a Critical Function Filter that isolates malicious PHP function calls, a Context-Aware Code Extraction strategy that captures the most behaviorally indicative code segments, and Weighted Behavioral Function Profiling (WBFP) that enhances in-context learning by prioritizing the most relevant demonstrations based on discriminative function-level profiles. Our results show that, stemming from their distinct analytical strategies, larger LLMs achieve near-perfect precision but lower recall, while smaller models exhibit the opposite trade-off. However, all baseline models lag behind previous State-Of-The-Art (SOTA) methods. With the application of BFAD, the performance of all LLMs improves significantly, yielding an average F1 score increase of 13.82%. Notably, larger models like GPT-4, LLaMA-3.1-70B, and Qwen-2.5-Coder-14B now outperform SOTA benchmarks, while smaller models such as Qwen-2.5-Coder-3B achieve performance competitive with traditional methods. This work is the first to explore the feasibility and limitations of LLMs for WebShell detection and provides solutions to address the challenges in this task.

AAAI 2026

LaTeX2Layout: High-Fidelity, Scalable Document Layout Annotation Pipeline for Layout Detection

Feijiang Han, Zelong Wang, Bowen Wang, Xinxin Liu, Skyler Cheung, Delip Rao, Chris Callison-Burch, Lyle Ungar

Paper | [Code & Dataset] (Release Due: 2026.4.1)

TL;DR. Layout detection turns a PDF into structured page understanding (bounding boxes + reading order), but current VLMs struggle mainly because high-fidelity supervision is scarce and PDF-parser-based labels are noisy and expensive. We introduce LaTeX2Layout, a scalable data-centric pipeline that extracts pixel-accurate layout ground truth directly from the LaTeX compilation process, enabling large-scale training without manual annotation.

Task: input = {PDF document} → output = {page elements’ bounding boxes + reading order (optionally OCR)}.

📑 Click to see abstract

General-purpose Vision-Language Models (VLMs) are increasingly integral to modern AI systems for document understanding, yet their ability to perform fine-grained layout analysis remains severely underdeveloped. Overcoming this limitation requires large-scale, high-fidelity training datasets. However, current annotation methods that rely on parsing rendered PDFs are costly, error-prone, and difficult to scale. We propose a different paradigm: extracting ground-truth layout directly from the LaTeX compilation process rather than the final PDF. We present LaTeX2Layout, a generalizable procedural pipeline that recovers pixel-accurate bounding boxes and reading order from compiler traces. This enables the generation of a 140K-page dataset, including 120K programmatically generated synthetic variants that more than double the layout diversity of real-world data. Using this dataset, we fine-tune an efficient 3B-parameter VLM with an easy-to-hard curriculum that accelerates convergence. Our model achieves Kendall's tau=0.95 for reading order and mAP@50=0.91 for element grounding, delivering nearly 200% relative improvement over strong zero-shot baselines such as GPT-4o and Claude-3.7.

AAAI 2026

Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification

Feijiang Han

Paper | Video (AI) | Slide (AI) | [Code & Dataset] (Release Due: 2026.4.1)

TL;DR. While WebShell detection answers “malicious or not,” real-world defense also needs attribution and tracking: WebShells come in diverse families with different behaviors and variants. We are the first to systematically study representation learning for automated WebShell family classification.

Task: given a WebShell → predict its family ID.

Key Points:

Benchmark: the first systematic study of representation learning for fine-grained WebShell family classification.
Behavioral view: dynamic function-call traces + LLM-augmented variants for robust behavioral analysis.
Key finding: structural representations (especially tree-based GNNs / Tree-GAT) consistently outperform sequence models for family attribution.

📑 Click to see abstract

Malicious WebShells represent a severe and evolving threat, compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While the research community has achieved considerable success in WebShell detection (distinguishing malicious from benign samples), we argue it is time to advance from passive detection to a new stage of in-depth analysis and proactive defense. A promising and critical direction is the automation of WebShell family classification: identifying the specific malware lineage to understand an adversary's tactics and enable a precise, rapid response. This crucial task, however, remains a largely unexplored area that currently relies on slow, manual expert analysis. To address this gap, we present the first systematic study to automate WebShell family classification. Our method begins with extracting dynamic function call traces to capture inherent behaviors that are resistant to common encryption and obfuscation. To enhance the scale and diversity of our dataset for a more stable evaluation, we augment these real-world traces with new variants synthesized by a Large Language Model (LLM). These augmented traces are then abstracted into sequences, graphs, and trees, providing a foundation to benchmark a comprehensive suite of representation methods. Our evaluation spans classic sequence-based embeddings (CBOW, GloVe), transformers (BERT, SimCSE), and a range of structure-aware algorithms, including Graph Kernels, Graph Edit Distance, Graph2Vec, and various Graph Neural Networks.

Project

ThinknCheck: Grounded Claim Verification with Compact, Reasoning-Driven, and Interpretable Models

Delip Rao, Feijiang Han, Chris Callison-Burch

Poster | [Paper] (Coming Soon)

TL;DR. Efficient scientific claim verification is essential for trustworthy literature review and retrieval—but most strong verifiers are large, expensive, and hard to interpret. We develop ThinknCheck, a compact “reason first, then decide” verifier, and summarize best practices for making small LLMs reliable and interpretable on document-grounded claim verification.

Task: input = {Document, Claim} → output = {True / False}.

Key Points:

1B-scale, 4-bit ThinknCheck verifier trained to “reason first, then decide” for scientific claim verification
New reasoning-augmented datasets LLMAggreFact-Think and GSMClaims for document-grounded scientific and arithmetic claims
Small model matches or surpasses larger specialized verifiers (e.g., MiniCheck-7B) while providing short, interpretable rationales

📑 Click to see abstract

We present ThinknCheck, a 1B-parameter verifier for grounded claim verification that first produces a short, structured rationale and then a binary verdict. We construct LLMAggreFact-Think, a 24.1k reasoning-augmented training set derived from LLMAggreFact, and fine-tune a 4-bit Gemma3 model to follow this format. On LLMAggreFact, ThinknCheck attains 78.1 balanced accuracy (BAcc), surpassing MiniCheck-7B (77.4) with 7x fewer parameters; removing the reasoning step reduces BAcc to 57.5. On SciFact, ThinknCheck reaches 64.7 BAcc, a +14.7 absolute gain over MiniCheck-7B. By contrast, zero-shot chain-of-thought on the base Gemma3-1B harms accuracy relative to direct answers, and preference optimization with a simple format+accuracy reward underperforms supervised reasoning. A qualitative audit of generated rationales indicates current verification datasets over-reward lexical overlap and under-test multi-sentence and numerical reasoning. To probe the latter, we introduce GSMClaims and a domain-specialized variant, ThinknCheck-Science, which improves across benchmarks, including 61.0\% accuracy on GSMClaims. Overall, explicit, supervised reasoning enables compact verifiers that are competitive while remaining resource-efficient and interpretable.

🌟 Research Interest 3: Other Topics (HCI, Big Data Visualization, IoT, Federated and Continual Learning)

Information Sciences 2023

Credit and quality intelligent learning based multi-armed bandit scheme for unknown worker selection in multimedia MCS
Jianheng Tang, Feijiang Han, Kejia Fan, et al.

TL;DR. High-quality training data is the bottleneck for modern multimodal and foundation models, and mobile crowd sensing (MCS) is a scalable way to collect it—but platforms must recruit workers before knowing who is trustworthy or produces high-quality data. We formulate this as an online decision-making problem under uncertainty and propose CQL-MAB, a bandit-style RL scheme that learns workers’ credit (honesty) and quality (data utility) from feedback and selects workers cost-effectively with incentive guarantees.

Why this is RL: it’s a contextual multi-armed bandit: repeatedly choose “arms” (workers), observe stochastic rewards (credit/quality), and minimize regret while respecting budget/auction constraints.

Task: input = {workers’ bids + streaming feedback from their submitted data} → output = {selected worker set (and payments) each round}.

Key Points:

CQL-MAB: jointly models credit (trustworthiness) and quality (data utility) as rewards for bandit-based recruitment under budget.
Two-stage / two-level reward UCB to pick workers while continuously updating beliefs about unknown workers.
Proven properties in reverse auctions: truthfulness, individual rationality, and computational efficiency, plus strong empirical performance (revenue/regret).

📑 Click to see abstract

The field of intelligent multimedia systems, which rely heavily on multimodal models trained on large amounts of high-quality data, has been revolutionized by the use of deep learning. One promising approach to collect such multimodal data is Mobile Crowd Sensing (MCS). However, MCS platforms face a significant challenge in selecting both high-credit and high-quality workers at low cost due to the Post-Unknown Worker Recruitment (PUWR) problem. The PUWR problem makes it difficult to determine the credits and qualities of workers in advance, which can lead to the recruitment of dishonest or low-quality workers. This problem severely affects the quality and quantity of MCS data collection, posing a serious threat to the security and robustness of large-scale multimedia models. To address this issue, we propose a Credit and Quality Learning based Multi-Armed Bandit (CQL-MAB) scheme, which consists of a novel credit identification algorithm, a fine-grained worker quality calculation method, and a two-stage reward-based Multi-Armed Bandit (MAB) for worker selection in reverse auction. The theoretical proof shows that the CQL-MAB scheme achieves the truthfulness, individual rationality, and efficiency of the auction mechanism. A large number of simulation experiments on real data traces are conducted to demonstrate the outstanding performance of CQL-MAB.

UBICOMP 2025 CALM: A Ubiquitous Crowdsourced Analytic Learning Mechanism for Continual Service Construction with Data Privacy Preservation
Kejia Fan, Yuwei Huang, Jiayi He, Feijiang Han, Jianheng Tang, et al.
arXiv 2025 APFL: Analytic Personalized Federated Learning via Dual-Stream Least Squares
Kejia Fan, Jianheng Tang, Zixuan Yang, Feijiang Han, Jiayi Li, et al.
arXiv 2025 ACU: Analytic Continual Unlearning for Efficient and Exact Forgetting with Privacy Preservation
Jianheng Tang, Haotian Zhuang, Dongxiao Fang, Jiayi Li, Feijiang Han, et al.
Information Sciences 2024 MAB-RP: A Multi-Armed Bandit based workers selection scheme for accurate data collection in crowdsensing
Yuwei Lou, Jianheng Tang, Feijiang Han, Anfeng Liu, et al.
Information and Software Technology 2024 Fctree: Visualization of function calls in execution
Fei Zhou, Yifan Fan, Shengchao Lv, Lingxiao Jiang, Zhuo Chen, Jingui Yuan, Feijiang Han, et al.
IEEE IoT Journal 2023 CRL-MABA: a completion rate learning-based accurate data collection scheme in large-scale energy internet
Kejia Fan, Jianheng Tang, Wenbin Xie, Feijiang Han, Yuwei Huang, et al.
IEEE IoT Journal 2023 BTV-CMAB: A bi-directional trust verification-based combinatorial multiarmed bandit scheme for mobile crowdsourcing
Jianheng Tang, Kejia Fan, Wenbin Xie, Feijiang Han, et al.
Computer Communications 2023 A Semi-supervised Sensing Rate Learning based CMAB scheme to combat COVID-19 by trustful data collection in the crowd
Jianheng Tang, Kejia Fan, Wenbin Xie, Lingxiao Zeng, Feijiang Han, et al.

🎖 Honors and Awards

2025 AAAI 2026 Scholarship
2025 COLM 2025 Registration & Travel Grant
2024 Xiaomi Special Scholarship (Top 10 university-wide)
2024 Outstanding Graduate of the Class of 2020
2023 National Scholarship for Outstanding Students

📝 Notes & Experiences

A Comprehensive Guide to the Computer Science PhD Journey: A Survey of Collected Wisdom - This survey distills hard-won CS PhD advice from nearly 20 well-known researchers’ essays and talks into one practical, end-to-end guide. It covers the full arc—from applying and choosing an advisor to building a sustainable research workflow, publishing effectively, and planning your post-PhD path.
A curated list of Best Paper Award winners from top ML/NLP venues (ICLR, NeurIPS, ICML, ACL, EMNLP, NAACL, AAAI, and more), covering 2022–2026 - A resource for developing research taste.
Prompt Papers Are Dead: Embracing Scaling Training (Prompt论文已死，拥抱Scaling Training) - 从 Scaling 与 Training 视角讨论 Prompt 论文的局限与未来研究方向 (Chinese Version)
美国留学申请心得 - 我的美国留学申请经验总结与建议 (Chinese Version)

📅 Collaboration

If you’d like to discuss research collaboration or have any questions, feel free to schedule a meeting with me:

If you feel our backgrounds align and you’d like to collaborate, get help, or seek mentorship, please fill out this short form: Collaboration Interest Form

Misc

Beyond research, I enjoy writing and sharing knowledge. I maintain a blog on Xiaohongshu where I share research experiences, lecture summaries, insights, and paper discussions. I find that writing helps me think more clearly and connect with others in the community.

I also practice traditional Chinese martial arts, including Tai Chi, and health qigong practices such as Mawangdui Daoyin. These practices help me maintain balance and focus, both physically and mentally.