    [{"authors":null,"categories":null,"content":"","date":1721692800,"expirydate":-62135596800,"kind":"term","lang":"en","lastmod":1721692800,"objectID":"c529841b8d115f5b6a5748bc4785fb75","permalink":"","publishdate":"0001-01-01T00:00:00Z","relpermalink":"","section":"authors","summary":"","tags":null,"title":"Alisa Liu*","type":"authors"},{"authors":null,"categories":null,"content":"","date":1721692800,"expirydate":-62135596800,"kind":"term","lang":"en","lastmod":1721692800,"objectID":"0753073516c1480e76b8e0520f3e10fc","permalink":"","publishdate":"0001-01-01T00:00:00Z","relpermalink":"","section":"authors","summary":"","tags":null,"title":"Jonathan Hayase*","type":"authors"},{"authors":null,"categories":null,"content":"Hello! I am a fifth-year PhD student in computer science at the University of Washington, advised by Yejin Choi and Noah Smith. My research area is natural language processing, with interests particularly in decoding-time algorithms and data creation. I am grateful to be supported by the NSF Graduate Research Fellowship and OpenAI SuperAlignment Fellowship.\nPreviously I was an undergraduate at Northwestern University where I majored in computer science and math. There, I was very fortunate to learn about research from Professor Doug Downey, Professor Bryan Pardo, and Dr. Prem Seetharaman.\n","date":1705363200,"expirydate":-62135596800,"kind":"term","lang":"en","lastmod":1705363200,"objectID":"2525497d367e79493fd32b198b28f040","permalink":"","publishdate":"0001-01-01T00:00:00Z","relpermalink":"","section":"authors","summary":"Hello! I am a fifth-year PhD student in computer science at the University of Washington, advised by Yejin Choi and Noah Smith. My research area is natural language processing, with interests particularly in decoding-time algorithms and data creation.","tags":null,"title":"Alisa Liu","type":"authors"},{"authors":["Jonathan Hayase*","Alisa Liu*","Yejin Choi","Sewoong Oh","Noah A. 
Smith"],"categories":null,"content":"","date":1721692800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1721692800,"objectID":"cf3c1dcbde1a204717740f4928d2d2c5","permalink":"https://alisawuffles.github.com/publication/bpe/","publishdate":"2024-07-23T00:00:00Z","relpermalink":"/publication/bpe/","section":"publication","summary":"Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors. However, tuning these models has become increasingly resource-intensive, or impossible when model weights are private. We introduce proxy-tuning, a lightweight decoding-time algorithm that operates on top of black-box LMs to achieve the result of directly tuning the model, but by accessing only its prediction over the output vocabulary. Our method instead tunes a smaller LM, then applies the difference between the predictions of the small tuned and untuned LMs to shift the original predictions of the base model in the direction of tuning, while retaining the benefits of larger scale pretraining. In experiments, when we apply proxy-tuning to Llama2-70B using proxies of only 7B size, we can close 88% of the gap between Llama2-70B and its truly-tuned chat version, when evaluated across knowledge, reasoning, and safety benchmarks. Interestingly, when tested on TruthfulQA, proxy-tuned models are actually more truthful than directly tuned models, possibly because decoding-time guidance better retains the model's factual knowledge. We then demonstrate the generality of proxy-tuning by applying it for domain adaptation on code, and task-specific finetuning on question-answering and math problems. 
Our work demonstrates the promise of using small tuned LMs to efficiently customize large, potentially proprietary LMs through decoding-time guidance.","tags":null,"title":"Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?","type":"publication"},{"authors":["Alisa Liu","Xiaochuang Han","Yizhong Wang","Yulia Tsvetkov","Yejin Choi","Noah A. Smith"],"categories":null,"content":"","date":1705363200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1705363200,"objectID":"2b122eeb97a173ca4cc61d4ab42b76e9","permalink":"https://alisawuffles.github.com/publication/proxy_tuning/","publishdate":"2024-01-16T00:00:00Z","relpermalink":"/publication/proxy_tuning/","section":"publication","summary":"We develop an algorithm for \"tuning\" language models at decoding-time!","tags":["text generation"],"title":"Tuning Language Models by Proxy","type":"publication"},{"authors":["Alisa Liu","Zhaofeng Wu","Julian Michael","Alane Suhr","Peter West","Alexander Koller","Swabha Swayamdipta","Noah A. Smith","Yejin Choi"],"categories":null,"content":"","date":1697760000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1697760000,"objectID":"b48b43b64ec2bc2af513d17233ee3383","permalink":"https://alisawuffles.github.com/publication/ambient/","publishdate":"2023-10-20T00:00:00Z","relpermalink":"/publication/ambient/","section":"publication","summary":"We build a benchmark to evaluate LM understanding of ambiguity, which is an intrinsic feature of language, and find that the task remains extremely challenging, including for GPT-4","tags":["ambiguity","dataset","evaluation","natural language inference"],"title":"We're Afraid Language Models Aren't Modeling Ambiguity","type":"publication"},{"authors":["Jaechan Lee","Alisa Liu","Orevaoghene Ahia","Hila Gonen","Noah A. 
Smith"],"categories":null,"content":"","date":1697673600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1697673600,"objectID":"c5058736da31a5b27cf00c2aa60c496c","permalink":"https://alisawuffles.github.com/publication/tide/","publishdate":"2023-10-19T00:00:00Z","relpermalink":"/publication/tide/","section":"publication","summary":"The translation of ambiguous text presents a challenge for translation systems, as it requires using the surrounding context to disambiguate the intended meaning as much as possible. While prior work has studied ambiguities that result from different *grammatical* features of the source and target language, we study semantic ambiguities that exist in the source (English in this work) itself. In particular, we focus on idioms that are open to both literal and figurative interpretations (e.g., *goose egg*), and collect TIDE, a dataset of 512 pairs of English sentences containing idioms with disambiguating context such that one is literal (*it laid a goose egg*) and another is figurative (*they scored a goose egg*, as in a score of zero). In experiments, we evaluate neural MT models and language models for (i) their **preference** when given an ambiguous subsentence, (ii) their **sensitivity** to disambiguating context, and (iii) the performance **disparity** between figurative and literal source sentences. We find that current MT models consistently translate English idioms literally, even when the context suggests a figurative interpretation. On the other hand, LMs are far more context-aware, although there remain disparities across target languages. Our findings underline the potential of LMs as a strong backbone for context-aware translation.","tags":["ambiguity","dataset","evaluation"],"title":"That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context?","type":"publication"},{"authors":["Ian R. 
McKenzie","18 others","Alisa Liu","Jiacheng Liu","Tom Tseng","Tomasz Korbak","Najoung Kim","Samuel R. Bowman","Ethan Perez"],"categories":null,"content":"","date":1697587200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1697587200,"objectID":"d389bdd8c8c7b9a29ca9db0a7d2ee145","permalink":"https://alisawuffles.github.com/publication/inverse_scaling/","publishdate":"2023-10-18T00:00:00Z","relpermalink":"/publication/inverse_scaling/","section":"publication","summary":"Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at inversescaling.com/data to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. 
Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models.","tags":["dataset","evaluation"],"title":"Inverse Scaling: When Bigger Isn't Better","type":"publication"},{"authors":["Muru Zhang","Ofir Press","William Merrill","Alisa Liu","Noah A. Smith"],"categories":null,"content":"","date":1684713600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1684713600,"objectID":"0105475463132d065c8a017bddcd4b8c","permalink":"https://alisawuffles.github.com/publication/snowballing/","publishdate":"2023-05-22T00:00:00Z","relpermalink":"/publication/snowballing/","section":"publication","summary":"A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements. Hallucinations are often attributed to knowledge gaps in LMs, but we show that LMs sometimes produce hallucinations that they can separately recognize as incorrect. To do this, we construct three question-answering datasets where LMs often state an incorrect answer which is followed by an explanation with at least one incorrect claim. Crucially, we find that GPT-3.5, GPT-4, and LLaMA2-70B-chat can identify 67%, 87%, and 94% of these incorrect claims, respectively. We show that this phenomenon doesn't disappear under higher temperatures sampling, beam search, and zero-shot chain-of-thought prompting. These findings reveal that LM hallucinations can snowball: early mistakes by an LM can lead to more mistakes that otherwise would not be made.","tags":["hallucination"],"title":"How Language Model Hallucinations Can Snowball","type":"publication"},{"authors":["Yizhong Wang","Yeganeh Kordi","Swaroop Mishra","Alisa Liu","Noah A. 
Smith","Daniel Khashabi","Hannaneh Hajishirzi"],"categories":null,"content":"","date":1671580800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1671580800,"objectID":"65fd5b50fdcad78d1167e05b9aebb209","permalink":"https://alisawuffles.github.com/publication/self_instruct/","publishdate":"2022-12-21T00:00:00Z","relpermalink":"/publication/self_instruct/","section":"publication","summary":"Large “instruction-tuned” language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off their own generations. Our pipeline generates instructions, input, and output samples from a language model, then filters invalid or similar ones before using them to finetune the original model. Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations. For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT-001. 
Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and we release our large synthetic dataset to facilitate future studies on instruction tuning.","tags":["text generation","dataset","instruction following"],"title":"Self-Instruct: Aligning Language Models with Self-Generated Instructions","type":"publication"},{"authors":["Skyler Hallinan","Alisa Liu","Yejin Choi","Maarten Sap"],"categories":null,"content":"","date":1671494400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1671494400,"objectID":"72bd9d47190ac45611b625fb662c3ba5","permalink":"https://alisawuffles.github.com/publication/marco/","publishdate":"2022-12-20T00:00:00Z","relpermalink":"/publication/marco/","section":"publication","summary":"Using expert and anti-expert LMs to rewrite toxic text for safety","tags":["text generation","toxicity"],"title":"Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts","type":"publication"},{"authors":["Alisa Liu","Swabha Swayamdipta","Noah A. 
Smith","Yejin Choi"],"categories":null,"content":"","date":1642204800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1642204800,"objectID":"756e63fd5afe5ba0196006ae27fe679a","permalink":"https://alisawuffles.github.com/publication/wanli/","publishdate":"2022-01-15T00:00:00Z","relpermalink":"/publication/wanli/","section":"publication","summary":"We introduce a paradigm for dataset creation based on human and machine collaboration, and demonstrate its empirical effectiveness for collecting a new large-scale NLI dataset","tags":["dataset","text generation","natural language inference"],"title":"WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation","type":"publication"},{"authors":["Jiacheng Liu","Alisa Liu","Ximing Lu","Sean Welleck","Peter West","Ronan Le Bras","Yejin Choi","Hannaneh Hajishirzi"],"categories":null,"content":"","date":1640995200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1640995200,"objectID":"0b33cac39a523ad281eceee3b454f54e","permalink":"https://alisawuffles.github.com/publication/knowledge_gen/","publishdate":"2022-01-01T00:00:00Z","relpermalink":"/publication/knowledge_gen/","section":"publication","summary":"Prompting GPT-3 to generate relevant background knowledge improves performance on a variety of commonsense reasoning tasks","tags":["text generation","commonsense reasoning"],"title":"Generated Knowledge Prompting for Commonsense Reasoning","type":"publication"},{"authors":["Alisa Liu","Maarten Sap","Ximing Lu","Swabha Swayamdipta","Chandra Bhagavatula","Noah A. 
Smith","Yejin Choi"],"categories":null,"content":"","date":1627862400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1627862400,"objectID":"0603c839ecc8847a0334bc6cedb0e026","permalink":"https://alisawuffles.github.com/publication/dexperts/","publishdate":"2021-08-02T00:00:00Z","relpermalink":"/publication/dexperts/","section":"publication","summary":"Steering open-ended text generation toward desired or away from undesired attributes, using expert and anti-expert language models","tags":["text generation","toxicity"],"title":"DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts","type":"publication"},{"authors":["Alisa Liu","Prem Seetharaman","Bryan Pardo"],"categories":null,"content":"","date":1604275200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1604275200,"objectID":"5670b4eefae894fde84d28a2990337f4","permalink":"https://alisawuffles.github.com/publication/ensemble/","publishdate":"2020-11-02T00:00:00Z","relpermalink":"/publication/ensemble/","section":"publication","summary":"Ensemble model for audio source separation, using a confidence measure to mediate among domain-specific models","tags":["source separation","deep clustering"],"title":"Model Selection for Deep Audio Source Separation via Clustering Analysis","type":"publication"},{"authors":["Alex Fang","Alisa Liu","Prem Seetharaman","Bryan Pardo"],"categories":null,"content":"","date":1595030400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1595030400,"objectID":"2a6c41a4f969a5982c323f6a62c270d4","permalink":"https://alisawuffles.github.com/publication/grading_function/","publishdate":"2020-07-18T00:00:00Z","relpermalink":"/publication/grading_function/","section":"publication","summary":"An automatic, interpretable, and musically-motivated grading function for Bach chorales","tags":["music generation"],"title":"Bach or Mock? A Grading Function for Chorales in the Style of J.S. 
Bach","type":"publication"},{"authors":["Alisa Liu","Alex Fang","Gaëtan Hadjeres","Prem Seetharaman","Bryan Pardo"],"categories":null,"content":"","date":1595030400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1595030400,"objectID":"84858b3d3275032965c11e31f112e34a","permalink":"https://alisawuffles.github.com/publication/aug_gen/","publishdate":"2020-07-18T00:00:00Z","relpermalink":"/publication/aug_gen/","section":"publication","summary":"A generative data augmentation method for music generation systems on a resource-constrained domain","tags":["music generation"],"title":"Incorporating Music Knowledge in Continual Dataset Augmentation for Music Generation","type":"publication"},{"authors":["Alisa Liu"],"categories":null,"content":"I really enjoy making study guides! Here are a couple I’m especially proud of, and I hope they are useful resources for other students.\nEECS 396: Statistical Machine Learning The professor shared the topic being tested by every problem on the final, so I organized course content around it.\nMath 312: Number Theory One of my favorite classes (and subjects) of all time. I shared this with the class and passed it on to many students who would take the course after me. I still hear from people who use it, which makes me really happy.\nMath 320: Real Analysis This class makes robust everything we know and love from calculus. In this study guide I try to outline approaches to common problems and provide examples of functions with interesting properties.\nMath 307: Applications of Linear Algebra More of a formula sheet than a study guide.\n","date":1573257600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1573257600,"objectID":"10648a30886333134259f643769e3276","permalink":"https://alisawuffles.github.com/post/study-guides/","publishdate":"2019-11-09T00:00:00Z","relpermalink":"/post/study-guides/","section":"post","summary":"I really enjoy making study guides! 
Here are a couple I’m especially proud of, and I hope they are useful resources for other students.\nEECS 396: Statistical Machine Learning The professor shared the topic being tested by every problem on the final, so I organized course content around it.","tags":null,"title":"Study guides from undergrad","type":"post"},{"authors":["Michael Chen","Mike D’Arcy","Alisa Liu","Jared Fernandez","Doug Downey"],"categories":null,"content":"","date":1559347200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1559347200,"objectID":"be980448a0cf79ef2ae03d613777bb02","permalink":"https://alisawuffles.github.com/publication/codah/","publishdate":"2019-06-01T00:00:00Z","relpermalink":"/publication/codah/","section":"publication","summary":"An adversarially-constructed dataset for common sense QA","tags":["commonsense reasoning","dataset"],"title":"CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense","type":"publication"},{"authors":null,"categories":null,"content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis posuere tellus ac convallis placerat. Proin tincidunt magna sed ex sollicitudin condimentum. Sed ac faucibus dolor, scelerisque sollicitudin nisi. Cras purus urna, suscipit quis sapien eu, pulvinar tempor diam. Quisque risus orci, mollis id ante sit amet, gravida egestas nisl. Sed ac tempus magna. Proin in dui enim. Donec condimentum, sem id dapibus fringilla, tellus enim condimentum arcu, nec volutpat est felis vel metus. Vestibulum sit amet erat at nulla eleifend gravida.\nNullam vel molestie justo. Curabitur vitae efficitur leo. In hac habitasse platea dictumst. Sed pulvinar mauris dui, eget varius purus congue ac. Nulla euismod, lorem vel elementum dapibus, nunc justo porta mi, sed tempus est est vel tellus. Nam et enim eleifend, laoreet sem sit amet, elementum sem. Morbi ut leo congue, maximus velit ut, finibus arcu. In et libero cursus, rutrum risus non, molestie leo. Nullam congue quam et volutpat malesuada. 
Sed risus tortor, pulvinar et dictum nec, sodales non mi. Phasellus lacinia commodo laoreet. Nam mollis, erat in feugiat consectetur, purus eros egestas tellus, in auctor urna odio at nibh. Mauris imperdiet nisi ac magna convallis, at rhoncus ligula cursus.\nCras aliquam rhoncus ipsum, in hendrerit nunc mattis vitae. Duis vitae efficitur metus, ac tempus leo. Cras nec fringilla lacus. Quisque sit amet risus at ipsum pharetra commodo. Sed aliquam mauris at consequat eleifend. Praesent porta, augue sed viverra bibendum, neque ante euismod ante, in vehicula justo lorem ac eros. Suspendisse augue libero, venenatis eget tincidunt ut, malesuada at lorem. Donec vitae bibendum arcu. Aenean maximus nulla non pretium iaculis. Quisque imperdiet, nulla in pulvinar aliquet, velit quam ultrices quam, sit amet fringilla leo sem vel nunc. Mauris in lacinia lacus.\nSuspendisse a tincidunt lacus. Curabitur at urna sagittis, dictum ante sit amet, euismod magna. Sed rutrum massa id tortor commodo, vitae elementum turpis tempus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean purus turpis, venenatis a ullamcorper nec, tincidunt et massa. Integer posuere quam rutrum arcu vehicula imperdiet. Mauris ullamcorper quam vitae purus congue, quis euismod magna eleifend. Vestibulum semper vel augue eget tincidunt. Fusce eget justo sodales, dapibus odio eu, ultrices lorem. Duis condimentum lorem id eros commodo, in facilisis mauris scelerisque. Morbi sed auctor leo. Nullam volutpat a lacus quis pharetra. Nulla congue rutrum magna a ornare.\nAliquam in turpis accumsan, malesuada nibh ut, hendrerit justo. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Quisque sed erat nec justo posuere suscipit. Donec ut efficitur arcu, in malesuada neque. Nunc dignissim nisl massa, id vulputate nunc pretium nec. Quisque eget urna in risus suscipit ultricies. Pellentesque odio odio, tincidunt in eleifend sed, posuere a diam. 
Nam gravida nisl convallis semper elementum. Morbi vitae felis faucibus, vulputate orci placerat, aliquet nisi. Aliquam erat volutpat. Maecenas sagittis pulvinar purus, sed porta quam laoreet at.\n","date":1461715200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1461715200,"objectID":"e8f8d235e8e7f2efd912bfe865363fc3","permalink":"https://alisawuffles.github.com/project/example/","publishdate":"2016-04-27T00:00:00Z","relpermalink":"/project/example/","section":"project","summary":"An example of using the in-built project page.","tags":["Deep Learning"],"title":"Example Project","type":"project"},{"authors":null,"categories":null,"content":"","date":1461715200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1461715200,"objectID":"d1311ddf745551c9e117aa4bb7e28516","permalink":"https://alisawuffles.github.com/project/external-project/","publishdate":"2016-04-27T00:00:00Z","relpermalink":"/project/external-project/","section":"project","summary":"An example of linking directly to an external project website using `external_link`.","tags":["Demo"],"title":"External Project","type":"project"}]