YuLiang Sun
Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack
We collect a large-scale dataset of 12,974 knowledge-jailbreak pairs and fine-tune a large language model as a jailbreak generator to produce domain…
Shangqing Tu, Zhuoran Pan, Wenxuan Wang, Zhexin Zhang, YuLiang Sun, Jifan Yu, Hongning Wang, Lei Hou, Juanzi Li