NLPExplorer

BlackboxNLP - 2025

Total Papers: 33

Total Papers across all years: 176

Total Citations: 0

From BERT to LLMs: Comparing and Understanding Chinese Classifier Prediction in Language Models

Ziqi Zhang | Jianfei Ma | Emmanuele Chersoni | You Jieshun | Zhaoxin Feng

BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection

Yaniv Nikankin | Dana Arad | Itay Itzhak | Anja Reusch | Adi Simhi | Gal Kesten | Yonatan Belinkov

Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs

Himanshu Beniwal | Sailesh Panda | Birudugadda Srivibhav | Mayank Singh

Normative Reasoning in Large Language Models: A Comparative Benchmark from Logical and Modal Perspectives

Kentaro Ozeki | Risako Ando | Takanobu Morishita | Hirohiko Abe | Koji Mineshima | Mitsuhiro Okada

When LRP Diverges from Leave-One-Out in Transformers

Weiqiu You | Siqi Zeng | Yao-Hung Hubert Tsai | Makoto Yamada | Han Zhao

Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models

Yuval Weiss | David Demitri Africa | Paula Buttery | Richard Diehl Martinez

BlackboxNLP-2025 MIB Shared Task: IPE: Isolating Path Effects for Improving Latent Circuit Identification

Nicolò Brunello | Andrea Cerutti | Andrea Sassella | Mark James Carman

BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods

Philipp Mondorf | Mingyang Wang | Sebastian Gerstner | Ahmad Dawar Hakimi | Yihong Liu | Leonor Veloso | Shijia Zhou | Hinrich Schuetze | Barbara Plank

Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Yonatan Belinkov | Aaron Mueller | Najoung Kim | Hosein Mohebbi | Hanjie Chen | Dana Arad | Gabriele Sarti

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks

Nathalie Maria Kirch | Constantin Niko Weisser | Severin Field | Helen Yannakoudakis | Stephen Casper

Understanding How CodeLLMs (Mis)Predict Types with Activation Steering

Francesca Lucchetti | Arjun Guha

Can Language Neuron Intervention Reduce Non-Target Language Output?

Suchun Xie | Hwichan Kim | Shota Sasaki | Kosuke Yamada | Jun Suzuki

Understanding the Side Effects of Rank-One Knowledge Editing

Ryosuke Takahashi | Go Kamoda | Benjamin Heinzerling | Keisuke Sakaguchi | Kentaro Inui

Circuit-Tracer: A New Library for Finding Feature Circuits

Michael Hanna | Mateusz Piotrowski | Jack Lindsey | Emmanuel Ameisen

On the Representations of Entities in Auto-regressive Large Language Models

Victor Morand | Josiane Mothe | Benjamin Piwowarski

Conference Citation Distribution

Conference Papers have no Citations yet

Topics