    Repositories list

    • Code for the paper 'The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis' (Findings of ACL 2024)
      Python
      1700 Updated Oct 4, 2024
    • mt-sft

      Public
      Official implementation for "Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice?"
      0100 Updated Oct 1, 2024
    • Code for the paper 'Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?'
      Python
      GNU Affero General Public License v3.0
      0100 Updated Sep 19, 2024
    • winopron

      Public
      Code for the paper 'Revisiting English Winogender Schemas for Consistency, Coverage, and Grammatical Case'
      Python
      GNU Affero General Public License v3.0
      0100 Updated Sep 19, 2024
    • human

      Public
      Hierarchical Universal Modular ANotator
      TypeScript
      GNU General Public License v3.0
      51160 Updated Sep 18, 2024
    • Python
      0000 Updated Sep 10, 2024
    • A web application for the languagemodels library, enabling the calculation and plotting of surprisal across text using large language models.
      HTML
      0000 Updated Aug 15, 2024
    • Teaching materials for use with the web-based Surprisal Toolkit, in tutorials on information theory and calculating surprisal from large language models.
      Jupyter Notebook
      MIT License
      0000 Updated Aug 14, 2024
    • A simple toolkit to train and evaluate language models.
      Python
      0500 Updated Aug 8, 2024
    • All newly created resources for the HuCLLM@ACL 2024 paper "Human Speech Perception in Noise: Can Large Language Models Paraphrase to Improve It?"
      Python
      0200 Updated Aug 6, 2024
    • 0000 Updated Jul 20, 2024
    • Official Code for "Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning: A Systematic Study"
      Python
      MIT License
      0100 Updated Jul 14, 2024
    • MCSE

      Public
      NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings
      Python
      MIT License
      85300 Updated Jun 10, 2024
    • AAdaM

      Public
      Code for the paper 'AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness'
      Python
      1400 Updated Jun 10, 2024
    • Python
      0200 Updated Mar 21, 2024
    • Code for the paper 'A Lightweight Method to Generate Unanswerable Questions in English' (Findings of EMNLP 2023)
      Python
      GNU Affero General Public License v3.0
      0410 Updated Mar 12, 2024
    • Noisy readback error label generation using a rule-based method
      Python
      0000 Updated Dec 16, 2023
    • llmft

      Public
      Fine-tuning large language models with Hugging Face Transformers and DeepSpeed
      Python
      172901 Updated Dec 11, 2023
    • NoisyNER

      Public
      A dataset for realistic evaluation of noisy label methods
      Python
      01400 Updated Dec 3, 2023
    • ATC-Anno

      Public
      ATC-Anno is an annotation tool for Air Traffic Control data that offers automatic semantic and concept annotation.
      Python
      MIT License
      31001 Updated Nov 17, 2023
    • Molecule Transformers is a collection of recipes for pre-training and fine-tuning molecular transformer language models, including BART, BERT, etc. Full thesis available at https://moleculetransformers.github.io/thesis_cs_msc_Khan_Shahrukh.pdf.
      Python
      MIT License
      0100 Updated Nov 13, 2023
    • Weaker Than You Think: A Critical Look at Weakly Supervised Learning
      Python
      0810 Updated Oct 22, 2023
    • On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
      Python
      Apache License 2.0
      2113232 Updated Sep 6, 2023
    • babylm

      Public
      Jupyter Notebook
      MIT License
      0000 Updated Aug 1, 2023
    • An information theoretic characterization of the relation between self-supervised, discrete speech representations and phonetic categories
      Jupyter Notebook
      0100 Updated Jun 2, 2023
    • msr

      Public
      Meta Self-Refinement for Robust Learning with Weak Supervision
      Python
      1000 Updated Apr 30, 2023
    • Code for our paper "Information-Theoretic Characterization of Vowel Harmony: A Cross-Linguistic Study on Word Lists" @ SIGTYP 2023
      Jupyter Notebook
      GNU General Public License v3.0
      0000 Updated Mar 29, 2023
    • A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English
      Jupyter Notebook
      0100 Updated Feb 2, 2023
    • PyPremise

      Public
      PyPremise - Python tool for the Premise algorithm to identify patterns or explanations of where a machine learning classifier performs well and where it fails.
      Python
      MIT License
      21700 Updated Jan 27, 2023
    • premise

      Public
      Additional Material for the paper "Label-Descriptive Patterns and Their Application to Characterizing Classification Errors" (ICML 2022)
      C++
      0300 Updated Jan 27, 2023