NLPExplorer

Nikola Ljubešić

Number of Papers:- 90

Number of Citations:- 311

First ACL Paper:- 2008

Latest ACL Paper:- 2025

Venues:-

COLING

EAMT

SIGUL

ParlaCLARIN

LREC

MWE

BUCC

WNUT

EMNLP

BSNLP

ACL

LT4VAR

LAW

SemEval

NLP+CSS

WASSA

VarDial

ALW

EACL

WAC

RANLP

PEOPLES

NAACL

WMT

TACL

Papers:-

Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects VarDial WS

SlavicNLP 2025 Shared Task: Detection and Classification of Persuasion Techniques in Parliamentary Debates and Social Media BSNLP WS

Identifying Filled Pauses in Speech Across South and West Slavic Languages BSNLP WS

Nikola Ljubešić | Ivan Porupski | Peter Rupnik | Taja Kuzman

Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024) VarDial WS

JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far VarDial WS

Multilingual Power and Ideology identification in the Parliament: a reference dataset and simple baselines ParlaCLARIN WS

Geographic Adaptation of Pretrained Language Models TACL

DIALECT-COPA: Extending the Standard Translations of the COPA Causal Commonsense Reasoning Dataset to South Slavic Dialects VarDial WS

Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining SIGUL WS

VarDial Evaluation Campaign 2024: Commonsense Reasoning in Dialects and Multi-Label Similar Language Identification VarDial WS

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark NAACL

PARSEME corpus release 1.3 MWE

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages EAMT

ParlaMint II: The Show Must Go On LREC ParlaCLARIN

ParlaSpeech-HR - a Freely Available ASR Dataset for Croatian Bootstrapped from the ParlaMint Corpus LREC ParlaCLARIN

Nikola Ljubešić | Danijel Koržinek | Peter Rupnik | Ivo-Pavao Jazbec

The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild LREC

Taja Kuzman | Peter Rupnik | Nikola Ljubešić

Extending the SSJ Universal Dependencies Treebank for Slovenian: Was It Worth It? LAW LREC

Kaja Dobrovoljc | Nikola Ljubešić

Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects COLING VarDial WS

Findings of the VarDial Evaluation Campaign 2021 EACL VarDial

Exploring Stylometric and Emotion-Based Features for Multilingual Cross-Domain Hate Speech Detection EACL WASSA

Ilia Markov | Nikola Ljubešić | Darja Fišer | Walter Daelemans

Social Media Variety Geolocation with geoBERT EACL VarDial

Yves Scherrer | Nikola Ljubešić

Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects EACL VarDial

BERTić - The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian BSNLP EACL

Nikola Ljubešić | Davor Lauc

Sesame Street to Mount Sinai: BERT-constrained character-level Moses models for multilingual lexical normalization EMNLP WNUT

Yves Scherrer | Nikola Ljubešić

MultiLexNorm: A Shared Task on Multilingual Lexical Normalization EMNLP WNUT

Cultural Topic Modelling over Novel Wikipedia Corpora for South-Slavic Languages RANLP

SemEval-2020 Task 3: Graded Word Similarity in Context COLING SemEval

HeLju@VarDial 2020: Social Media Variety Geolocation with BERT Models COLING VarDial

Yves Scherrer | Nikola Ljubešić

Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects COLING VarDial

The LiLaH Emotion Lexicon of Croatian, Dutch and Slovene COLING PEOPLES

Nikola Ljubešić | Ilia Markov | Darja Fišer | Walter Daelemans

A Report on the VarDial Evaluation Campaign 2020 COLING VarDial

Findings of the 2020 Conference on Machine Translation (WMT20) EMNLP WMT

Gigafida 2.0: The Reference Corpus of Written Standard Slovene LREC

CoSimLex: A Resource for Evaluating Graded Word Similarity in Context LREC

Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects NAACL WS

What does Neural Bring? Analysing Improvements in Morphosyntactic Annotation and Lemmatisation of Slovenian, Croatian and Serbian ACL WS

Nikola Ljubešić | Kaja Dobrovoljc

Improving UD processing via satellite resources for morphology WS

Kaja Dobrovoljc | Tomaž Erjavec | Nikola Ljubešić

Bleaching Text: Abstract Features for Cross-lingual Gender Prediction ACL

Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings ACL WS

Nikola Ljubešić | Darja Fišer | Anita Peti-Stantić

Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018) COLING VarDial WS

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign COLING VarDial WS

Comparing CRF and LSTM performance on the task of morphosyntactic tagging of non-standard varieties of South Slavic languages COLING VarDial WS

Nikola Ljubešić

Datasets of Slovene and Croatian Moderated News Comments EMNLP WS

Nikola Ljubešić | Tomaž Erjavec | Darja Fišer

Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) VarDial WS

Findings of the VarDial Evaluation Campaign 2017 VarDial WS

Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages BSNLP WS

Tanja Samardžić | Mirjana Starović | Željko Agić | Nikola Ljubešić

Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text BSNLP WS

Nikola Ljubešić | Tomaž Erjavec | Darja Fišer

Language-independent Gender Prediction on Twitter NLP+CSS WS

Nikola Ljubešić | Darja Fišer | Tomaž Erjavec

Legal Framework, Dataset and Annotation Schema for Socially Unacceptable Online Discourse Practices in Slovene ALW WS

Darja Fišer | Tomaž Erjavec | Nikola Ljubešić

TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data COLING

Nikola Ljubešić | Tanja Samardžić | Curdin Derungs

Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene LREC

Nikola Ljubešić | Tomaž Erjavec

Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor’s Love Affair LREC

Croatian Error-Annotated Corpus of Non-Professional Written Language LREC

Vanja Štefanec | Nikola Ljubešić | Jelena Kuvač Kraljević

Corpus-Based Diacritic Restoration for South Slavic Languages LREC

Nikola Ljubešić | Tomaž Erjavec | Darja Fišer

New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian LREC

Nikola Ljubešić | Filip Klubička | Željko Agić | Ivo-Pavao Jazbec

A Global Analysis of Emoji Usage WAC WS

Nikola Ljubešić | Darja Fišer

Dealing with Data Sparseness in SMT with Factored Models and Morphological Expansion: a Case Study on Croatian EAMT WS

Victor M. Sánchez-Cartagena | Nikola Ljubešić | Filip Klubička

Collaborative Development of a Rule-Based Machine Translator between Croatian and Serbian EAMT WS

Filip Klubička | Gema Ramírez-Sánchez | Nikola Ljubešić

Private or Corporate? Predicting User Types on Twitter WNUT WS

Nikola Ljubešić | Darja Fišer

Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3) VarDial WS

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task VarDial WS

Enlarging Scarce In-domain English-Croatian Corpus for SMT of MOOCs Using Serbian VarDial WS

Maja Popović | Kostadin Cholakov | Valia Kordoni | Nikola Ljubešić

Predicting the Level of Text Standardness in User-generated Content RANLP

Predicting Inflectional Paradigms and Lemmata of Unknown Words for Semi-automatic Expansion of Morphological Lexicons RANLP

Nikola Ljubešić | Miquel Esplà-Gomis | Filip Klubička | Nives Mikelić Preradović

Comparing two acquisition systems for automatically building an English-Croatian parallel corpus from multilingual websites LREC