Zhang Yuteng’s Academic Portfolio

Software vulnerability detection is essential for ensuring software security and reliability. While deep learning has advanced the field, real-world vulnerability datasets exhibit two critical yet underexplored challenges: frequency imbalance and difficulty imbalance. First, vulnerability datasets are highly skewed in sample frequency, both between vulnerable and non-vulnerable code and across Common Weakness Enumeration (CWE) categories, where a small number of types dominate while many security-critical ones remain rare. Second, even within the same CWE category, vulnerability instances often exhibit substantial variation in code structure, control-flow patterns, and data dependencies, resulting in heterogeneous learning difficulty. We reinterpret these imbalances from an embedding geometry perspective and observe that they manifest as geometric distortions in hyperspherical representation space. To address this issue, we propose AEGIS, a metric-based framework that learns discriminative vulnerability representations through adaptive-margin metric learning and hyperspherical prototype modeling. Specifically, AEGIS dynamically adjusts angular margins according to both class frequency and embedding dispersion, where learning difficulty is estimated via von Mises–Fisher concentration. Meanwhile, hyperspherical prototypes encourage compact intra-class distributions and stable decision boundaries. Extensive experiments on four public vulnerability datasets demonstrate that AEGIS consistently outperforms baselines by 5%–20% on both binary vulnerability detection and multi-class CWE classification tasks. Further analysis shows that AEGIS produces well-structured embedding geometries, improving robustness, interpretability, and generalization in realistic vulnerability detection scenarios.
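To make the adaptive-margin idea concrete, the following is a minimal NumPy sketch, not the AEGIS implementation. It estimates a per-class von Mises–Fisher concentration with the standard closed-form approximation κ ≈ r̄(d − r̄²)/(1 − r̄²), where r̄ is the mean resultant length of the unit-normalized class embeddings, and then enlarges the angular margin for classes that are rare or dispersed (low κ). The combination rule, the function names, and the parameters `base_margin`, `alpha`, and `beta` are all assumptions made for illustration; the abstract does not specify how AEGIS combines the two signals.

```python
import numpy as np

def vmf_concentration(embeddings):
    """Approximate vMF concentration (kappa) for one class.

    Uses kappa ~= r_bar * (d - r_bar^2) / (1 - r_bar^2), where r_bar is the
    norm of the mean of the unit-normalized embeddings.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    d = x.shape[1]
    r_bar = np.linalg.norm(x.mean(axis=0))
    r_bar = min(r_bar, 1.0 - 1e-6)  # avoid division by zero for collapsed classes
    return r_bar * (d - r_bar**2) / (1.0 - r_bar**2)

def adaptive_margins(embeddings, labels, base_margin=0.2, alpha=0.5, beta=0.5):
    """Hypothetical rule: larger margins for rare and for dispersed classes."""
    classes, counts = np.unique(labels, return_counts=True)
    kappas = np.array(
        [vmf_concentration(embeddings[labels == c]) for c in classes]
    )
    freq_term = 1.0 - counts / counts.max()   # 0 for the most frequent class
    disp_term = 1.0 - kappas / kappas.max()   # 0 for the most compact class
    margins = base_margin * (1.0 + alpha * freq_term + beta * disp_term)
    return dict(zip(classes.tolist(), margins.tolist()))

# Synthetic demo: class 0 is frequent and compact, class 1 is rare and dispersed.
rng = np.random.default_rng(0)
dim = 8
e1, e2 = np.eye(dim)[0], np.eye(dim)[1]
tight = e1 + 0.05 * rng.normal(size=(200, dim))
loose = e2 + 0.5 * rng.normal(size=(20, dim))
embeddings = np.vstack([tight, loose])
labels = np.array([0] * 200 + [1] * 20)

margins = adaptive_margins(embeddings, labels)
print(margins)
```

In this toy setup the rare, dispersed class receives a larger margin than the frequent, compact one, which is the qualitative behavior the abstract describes. In a training loop such a margin would typically be added to the angle between an embedding and its class prototype before the softmax; that step is omitted here.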

AEGIS: Adaptive Embedding Geometry for Imbalanced Software