Publications

(2024). Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack. arXiv preprint.

(2023). WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models. Accepted to ACL 2024.